Kafka-backup: Consumer offset

Created on 13 Jun 2019  ·  3 Comments  ·  Source: itadventurer/kafka-backup

@azapps First of all thanks for this wonderful open source project.

I am writing a blog post on backup and restore of Kafka topics in a Kubernetes environment, with another open source project, OpenEBS, providing the underlying persistent container attached storage.

For now, I settled on using Spredfast's S3 connector, but my friend Arash Kaffamanesh pointed me to your work. I had a couple of questions.

At the time of restore, how do I let the consumer know where to start consuming?
Can you please share additional differences with Spredfast's connector?

My Kafka environment runs in Kubernetes. Ideally, I want a backup/restore storage location outside my cluster so that I can get it back in the event of a failure.

The backup location is determined by target.dir, and it becomes difficult to manage a path on a node when the environment is Kubernetes.

All 3 comments

Hi Imran,

> I am writing a blog post on backup and restore of Kafka topics in a Kubernetes environment, with another open source project, OpenEBS, providing the underlying persistent container attached storage.

Backing up Kafka using file system snapshots is not trivial. See https://github.com/azapps/kafka-backup/blob/master/docs/Comparing_Kafka_Backup_Solutions.md for more information about that.

> For now, I settled on using Spredfast's S3 connector, but my friend Arash Kaffamanesh pointed me to your work. I had a couple of questions.

The S3 connector seems perfectly fine if you do not need to restore any consumer offsets. I dove deep into the source code of the S3 connector before dismissing it as a solution for our problems: it does not provide that critical feature, and it is hard to extend to handle that case.

> At the time of restore, how do I let the consumer know where to start consuming?

Currently, the only way is to delete the segments that should not be restored and recreate the index. There will be more information soon about how to achieve that. If you really need to start restoration from a very specific offset, please open an issue; that should not be hard to implement.
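
For illustration, here is a hypothetical sketch of that manual approach. It assumes an on-disk layout where segment files under target.dir are named by partition and starting offset; the exact file name pattern below is an assumption, so verify it against your own backup directory and the repo docs before deleting anything:

```sh
# Hypothetical sketch: trim a backup so restoration stops before a cutoff offset.
# The file naming pattern is an assumption; check your actual target.dir first!
BACKUP_DIR=/var/kafka-backup/my-topic   # hypothetical topic directory
CUTOFF=1000000                          # keep only segments starting below this offset

for f in "$BACKUP_DIR"/segment_partition_*_from_offset_*; do
  # Extract the starting offset from the (assumed) file name pattern
  offset=$(echo "$f" | sed -E 's/.*_from_offset_0*([0-9]+).*/\1/')
  if [ "$offset" -ge "$CUTOFF" ]; then
    rm -v "$f"    # drop this segment (its matching index file is caught too)
  fi
done
# Finally, recreate the index for the remaining segments (see the repo docs).
```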

> Can you please share additional differences with Spredfast's connector?

Again, the S3 connector is not able to sync consumer offsets during restoration. In fact, there is simply no way to do so reliably in the current Kafka version. Thanks to @ryannedolan's work on MirrorMaker 2, there will soon be a way to do so, and kafka-backup uses that API. Luckily, this change is even backward-compatible, and there will be documentation on how to use kafka-backup that way very soon.

Additionally, the S3 connector supports only S3. Currently, kafka-backup backs up only to the file system; you can then use whatever tool you want to move the data to its final destination. I am planning to add support for more storage backends if there is a need.
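
For context, here is a minimal, hypothetical sink configuration sketch. Only target.dir is taken from this discussion; the connector class and the other option names are assumptions that should be verified against the kafka-backup README:

```properties
# Hypothetical kafka-backup sink configuration; verify all names against the README.
name=kafka-backup-sink
# Assumed connector class name
connector.class=de.azapps.kafkabackup.sink.BackupSinkConnector
tasks.max=1
# Back up every topic
topics.regex=.*
# Store records byte-for-byte
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
# Local directory that receives the backup segments
target.dir=/var/kafka-backup
```

From there, a cron job, rsync, or an `aws s3 sync /var/kafka-backup s3://my-bucket/kafka-backup` can ship the directory wherever it needs to go.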

Apart from that, the two projects are architecturally very similar (in fact, the S3 connector, together with MirrorMaker 2, inspired kafka-backup).

> My Kafka environment runs in Kubernetes. Ideally, I want a backup/restore storage location outside my cluster so that I can get it back in the event of a failure.

As far as I know, you are using Strimzi too, so we have the same setup. I will write a blog post soonish on how to do a full backup of Kafka and (do not forget that!) ZooKeeper on Kubernetes and Strimzi.

> The backup location is determined by target.dir, and it becomes difficult to manage a path on a node when the environment is Kubernetes.

Just mount a persistent volume as always, and use a sidecar container to move the data to your final destination. You can even keep the persistent volume relatively small, as you can delete old segments and their indexes as soon as they are finalized. (Documentation is coming.)
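
As a rough sketch of that pattern (all names, images, and paths below are hypothetical placeholders), the backup sink and the sidecar share one volume:

```yaml
# Hypothetical pod spec: a Connect worker running the backup sink plus a
# sidecar that ships finished segments off-cluster. All names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: kafka-backup
spec:
  volumes:
    - name: backup-target
      persistentVolumeClaim:
        claimName: kafka-backup-pvc            # hypothetical PVC
  containers:
    - name: backup-sink                        # Connect worker with kafka-backup
      image: my-registry/kafka-backup:latest   # hypothetical image
      volumeMounts:
        - name: backup-target
          mountPath: /var/kafka-backup         # must match target.dir
    - name: s3-sync                            # sidecar moving data off-cluster
      image: amazon/aws-cli:latest
      # AWS credentials (secret or IAM role) omitted for brevity
      command:
        - sh
        - -c
        - "while true; do aws s3 sync /var/kafka-backup s3://my-bucket/kafka-backup; sleep 300; done"
      volumeMounts:
        - name: backup-target
          mountPath: /var/kafka-backup
```

Deleting finalized segments after a successful sync is what keeps the volume small, as described above.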

If you wait a few more days, I will publish an introductory blog post covering some of your topics. Write me an email or ask @arashkaffamanesh for a draft :wink:

@azapps' contribution is unique and awesome, and I guess the whole community should help the Kafka Backup proposed and implemented by @azapps become a standardised piece of the Kafka ecosystem!

Nothing is perfect, but this implementation by @azapps is brilliant!

For the record, here we go: https://medium.com/@anatolyz/introducing-kafka-backup-9dc0677ea7ee
