Kafka-backup: Consumer offset

Created on 13 Jun 2019  ·  3 Comments  ·  Source: itadventurer/kafka-backup

@azapps First of all thanks for this wonderful open source project.

I am writing a blog post on backup and restore of Kafka topics in a Kubernetes environment, with another open source project, OpenEBS, providing the underlying persistent container attached storage.

For now, I settled on using Spredfast's S3 connector, but my friend Arash Kaffamanesh pointed me to your work. I had a couple of questions.

At the time of restore, how do I let the consumer know where to start consuming?
Can you please share additional differences with Spredfast's connector?

My Kafka environment runs in Kubernetes. Ideally, I want a backup/restore storage location outside my cluster so that I can get it back in the event of a failure.

The backup location is determined by target.dir, and it becomes difficult to manage a path on a node when the environment is Kubernetes.

All 3 comments

Hi Imran,

> I am writing a blog post on backup and restore of Kafka topics in a Kubernetes environment, with another open source project, OpenEBS, providing the underlying persistent container attached storage.

Backing up Kafka using file system snapshots is not trivial. See https://github.com/azapps/kafka-backup/blob/master/docs/Comparing_Kafka_Backup_Solutions.md for more information about that.

> For now, I settled on using Spredfast's S3 connector, but my friend Arash Kaffamanesh pointed me to your work. I had a couple of questions.

The S3 connector seems perfectly fine if you do not need to restore any consumer offsets. I dove deep into the source code of the S3 connector before dismissing it as a solution for our problems: it does not provide that critical feature, and it is hard to extend to handle that case.

> At the time of restore, how do I let the consumer know where to start consuming?

Currently, the only way is to delete the segments that should not be restored and recreate the index. There will be more information soon about how to achieve that. If you really need to start restoration from a very specific offset, please open an issue; that should not be hard to implement.
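
For illustration, here is a hypothetical sketch of that manual approach. It assumes an on-disk layout where segment files under target.dir are named by partition and starting offset; the exact file name pattern below is an assumption, so verify it against your own backup directory and the repo docs before deleting anything:

```sh
# Hypothetical sketch: trim a backup so restoration stops before a cutoff offset.
# The file naming pattern is an assumption; check your actual target.dir first!
BACKUP_DIR=/var/kafka-backup/my-topic   # hypothetical topic directory
CUTOFF=1000000                          # keep only segments starting below this offset

for f in "$BACKUP_DIR"/segment_partition_*_from_offset_*; do
  # Extract the starting offset from the (assumed) file name pattern
  offset=$(echo "$f" | sed -E 's/.*_from_offset_0*([0-9]+).*/\1/')
  if [ "$offset" -ge "$CUTOFF" ]; then
    rm -v "$f"    # drop this segment (its matching index file is caught too)
  fi
done
# Finally, recreate the index for the remaining segments (see the repo docs).
```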

> Can you please share additional differences with Spredfast's connector?

Again, the S3 connector is not able to sync consumer offsets during restoration. In fact, there is simply no way to do so reliably in the current Kafka version. Thanks to @ryannedolan's work on MirrorMaker 2, there will soon be a way to do so, and kafka-backup uses that API. Luckily, this change is even backward-compatible, and there will be documentation on how to use kafka-backup that way very soon.

Additionally, the S3 connector supports only S3. Currently, kafka-backup backs up only to the file system; you can then use whatever tool you want to move the data to its final destination. I am planning to add support for more storage backends if there is a need.
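
For context, here is a minimal, hypothetical sink configuration sketch. Only target.dir is taken from this discussion; the connector class and the other option names are assumptions that should be verified against the kafka-backup README:

```properties
# Hypothetical kafka-backup sink configuration; verify all names against the README.
name=kafka-backup-sink
# Assumed connector class name
connector.class=de.azapps.kafkabackup.sink.BackupSinkConnector
tasks.max=1
# Back up every topic
topics.regex=.*
# Store records byte-for-byte
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
# Local directory that receives the backup segments
target.dir=/var/kafka-backup
```

From there, a cron job, rsync, or an `aws s3 sync /var/kafka-backup s3://my-bucket/kafka-backup` can ship the directory wherever it needs to go.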

Apart from that, the two projects are architecturally very similar (in fact, the S3 connector, together with MirrorMaker 2, inspired kafka-backup).

> My Kafka environment runs in Kubernetes. Ideally, I want a backup/restore storage location outside my cluster so that I can get it back in the event of a failure.

As far as I know, you are using Strimzi too, so we have the same setup. I will write a blog post soonish on how to do a full backup of Kafka and (do not forget that!) ZooKeeper on Kubernetes and Strimzi.

> The backup location is determined by target.dir, and it becomes difficult to manage a path on a node when the environment is Kubernetes.

Just mount a persistent volume as always, and use a sidecar container to move the data to your final destination. You can even keep the persistent volume relatively small, as you can delete old segments and their indexes as soon as they are finalized. (Documentation is coming.)
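
As a rough sketch of that pattern (all names, images, and paths below are hypothetical placeholders), the backup sink and the sidecar share one volume:

```yaml
# Hypothetical pod spec: a Connect worker running the backup sink plus a
# sidecar that ships finished segments off-cluster. All names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: kafka-backup
spec:
  volumes:
    - name: backup-target
      persistentVolumeClaim:
        claimName: kafka-backup-pvc            # hypothetical PVC
  containers:
    - name: backup-sink                        # Connect worker with kafka-backup
      image: my-registry/kafka-backup:latest   # hypothetical image
      volumeMounts:
        - name: backup-target
          mountPath: /var/kafka-backup         # must match target.dir
    - name: s3-sync                            # sidecar moving data off-cluster
      image: amazon/aws-cli:latest
      # AWS credentials (secret or IAM role) omitted for brevity
      command:
        - sh
        - -c
        - "while true; do aws s3 sync /var/kafka-backup s3://my-bucket/kafka-backup; sleep 300; done"
      volumeMounts:
        - name: backup-target
          mountPath: /var/kafka-backup
```

Deleting finalized segments after a successful sync is what keeps the volume small, as described above.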

If you wait a few more days, I will publish an introductory blog post covering some of your topics. Write me an email or ask @arashkaffamanesh for a draft :wink:

@azapps' contribution is unique and awesome, and I guess the whole community should help the Kafka Backup proposed and implemented by @azapps become a standardised piece of the Kafka ecosystem!

Nothing is perfect, but this implementation by @azapps is brilliant!

For the record, here we go: https://medium.com/@anatolyz/introducing-kafka-backup-9dc0677ea7ee
