How to upgrade your Spark Streaming application with a new checkpoint, with working code

Canadian Data Guy
3 min read · Jan 25, 2023

Index:

· Kafka Basics: Topics, partition & offset
· What information is inside the checkpoint?
· How to fetch information about Offset & Partition from the Checkpoint folder?
· Now the easy part: Use Spark to start reading Kafka from a particular Offset
· Footnote:

Sometimes in life, we need to make breaking changes that require us to create a new checkpoint. Some example scenarios:

  1. You are making a code/application change that alters the processing logic
  2. A major Spark version upgrade, e.g., from Spark 2.x to Spark 3.x
  3. The previous deployment was wrong, and you want to reprocess from a certain point

There are plenty of scenarios where you want to control precisely which data (Kafka offsets) need to be processed.
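As a preview of where this post is heading, here is a minimal sketch of that idea: Spark's Kafka source accepts a startingOffsets JSON string that pins each partition to an exact offset, and it is only honoured when the query starts with a brand-new checkpoint location. The broker address, the topic name "orders", and the offset values below are placeholders for illustration, not values from the post.

```python
from pyspark.sql import SparkSession

# Assumes the spark-sql-kafka-0-10 connector is on the classpath.
spark = SparkSession.builder.appName("restart-from-offsets").getOrCreate()

# Exact per-partition starting positions; -2 means "earliest", -1 means "latest".
starting_offsets = """{"orders": {"0": 1234, "1": 5678, "2": -2}}"""

stream_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "orders")
    # Only applied when the query starts with a fresh checkpoint location.
    .option("startingOffsets", starting_offsets)
    .load()
)
```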

Not every scenario requires a new checkpoint. Here is a list of changes you can make without needing one.

This blog helps you understand how to handle a scenario where a new checkpoint is unavoidable.

Photo by Patrick Tomasso on Unsplash

Kafka Basics: Topics, partition & offset
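Before digging in, it helps to see how a topic, its partitions, and their offsets surface in Spark. The sketch below is illustrative: it assumes a broker at localhost:9092, a topic named "orders", and the spark-sql-kafka connector on the classpath. A batch read exposes each record's topic, partition, and offset as plain DataFrame columns.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-basics").getOrCreate()

# Read the whole topic as a batch DataFrame (earliest to latest).
df = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "earliest")
    .option("endingOffsets", "latest")
    .load()
)

# Every record carries its coordinates: topic -> partition -> offset.
df.select("topic", "partition", "offset").show(truncate=False)
```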

