How to write your first Spark application with Stream-Stream Joins (with working code)

Canadian Data Guy
11 min read · Mar 23, 2023

Source: https://canadiandataguy.com/blog/spark-stream-stream-join/

Have you been wanting to try Streaming but have not been able to take the plunge?

In this single blog post, we will teach you what you need to understand about Streaming Joins, and we will give you working code that you can use in your next Streaming pipeline.

The steps involved:

  1. Create a fake dataset at scale
  2. Set a baseline using traditional SQL
  3. Define Temporary Streaming Views
  4. Inner Joins with optional Watermarking
  5. Left Joins with Watermarking
  6. The cold start edge case: withEventTimeOrder
  7. Cleanup

Photo: https://unsplash.com/photos/GAWiEPB0uEk
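
Before we get to the Spark code, it may help to see the idea behind the watermarks used in steps 4 and 5. The snippet below is a minimal sketch in plain Python, not Spark code and not the pipeline from this blog. It assumes a simplified model: each stream's watermark is the newest event time seen minus an allowed delay, and the join uses the slowest of the per-stream watermarks (Spark's default "min" policy for multiple watermarks). The function names, the stream names `impressions` and `clicks`, and the timestamps are all hypothetical and only there for illustration. Real Spark updates the watermark between micro-batches, so the exact timing will differ.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def advance_watermark(
    current: Optional[datetime],
    batch_event_times: Iterable[datetime],
    delay: timedelta,
) -> Optional[datetime]:
    """Watermark after a batch: newest event time seen minus the allowed delay.

    The watermark never moves backwards, so a batch of old events cannot lower it.
    """
    times = list(batch_event_times)
    if not times:
        return current
    candidate = max(times) - delay
    if current is None or candidate > current:
        return candidate
    return current


def global_watermark(*per_stream: Optional[datetime]) -> Optional[datetime]:
    """Combine per-stream watermarks by taking the slowest one (the "min" policy)."""
    if any(w is None for w in per_stream):
        return None
    return min(per_stream)


def is_too_late(event_time: datetime, watermark: Optional[datetime]) -> bool:
    """In this simplified model, an event older than the watermark may be dropped."""
    return watermark is not None and event_time < watermark


# Hypothetical event times for two made-up streams.
t0 = datetime(2023, 3, 23, 12, 0, 0)
impressions = [t0 + timedelta(minutes=m) for m in (0, 5, 12)]
clicks = [t0 + timedelta(minutes=m) for m in (1, 4)]

# Assumed delays: 10 minutes for impressions, 1 minute for clicks.
wm_impressions = advance_watermark(None, impressions, timedelta(minutes=10))
wm_clicks = advance_watermark(None, clicks, timedelta(minutes=1))
wm_global = global_watermark(wm_impressions, wm_clicks)

print("impressions watermark:", wm_impressions)
print("clicks watermark:     ", wm_clicks)
print("global watermark:     ", wm_global)
print("12:01 event too late? ", is_too_late(t0 + timedelta(minutes=1), wm_global))
```

The point of the sketch is that the slower stream holds the global watermark back, which is why a join keeps state around until both sides have moved past a given event time.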

Index

· What is Stream-Stream Join?
Concept of Stream-Stream Join
Types of Stream-Stream Join
· 1. The Setup: Create a fake dataset at scale
Next, we will break this Delta table into 2 different tables
· 2. Set a baseline using traditional SQL
· Summary so far:
· 3. Define Temporary Streaming Views
· 4. Inner Joins with optional Watermarking
How was the watermark computed in this scenario?
· 5. Left Joins with Watermarking
5.a How Left Joins works differently than an

Canadian Data Guy

https://canadiandataguy.com | Data Engineering & Streaming @ Databricks | Ex Amazon/AWS | All Opinions Are My Own