How to Upgrade Your Spark Streaming Application with a New Checkpoint (With Working Code)
Index:
· Kafka basics: topics, partitions & offsets
· What information is inside the checkpoint?
· How to fetch offset & partition information from the checkpoint folder
· Now the easy part: use Spark to start reading Kafka from a particular offset
· Footnotes
Sometimes we need to make a breaking change that forces us to create a new checkpoint. Some example scenarios:
- A code or application change that alters the processing logic
- A major Spark version upgrade, e.g. from Spark 2.x to Spark 3.x
- The previous deployment was wrong, and you want to reprocess from a certain point
There are plenty of scenarios where you want precise control over which data (i.e., which Kafka offsets) gets processed.
Not every scenario requires a new checkpoint; there is a list of things you can change without one. This blog focuses on the case where a new checkpoint is unavoidable.
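As a preview of the idea the later sections build on: when you abandon a checkpoint, you can tell Spark's Kafka source exactly where to resume by passing a JSON string to its `startingOffsets` option, mapping topic → partition → offset (`-2` means earliest, `-1` means latest). Here is a minimal sketch of building that JSON; the topic name and offset values are illustrative, not from this post.

```python
import json

def build_starting_offsets(topic, partition_offsets):
    """Build the JSON value Spark's Kafka source accepts for startingOffsets.

    partition_offsets maps partition number -> offset, where -2 means
    'earliest' and -1 means 'latest' for that partition.
    """
    # Spark expects partition numbers as JSON object keys, i.e. strings.
    return json.dumps({topic: {str(p): o for p, o in partition_offsets.items()}})

# Hypothetical example: resume topic "events" from offset 1000 on
# partition 0, and from the earliest available offset on partition 1.
offsets_json = build_starting_offsets("events", {0: 1000, 1: -2})
print(offsets_json)  # {"events": {"0": 1000, "1": -2}}
```

This string would then be passed as `.option("startingOffsets", offsets_json)` on a `spark.readStream.format("kafka")` reader; the sections below show where those per-partition offsets come from in the first place.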