How to write your first Spark application with Stream-Stream Joins (working code included)
Source: https://canadiandataguy.com/blog/spark-stream-stream-join/
Have you been wanting to try Streaming but have not yet taken the plunge?
In a single blog post, we will teach you what you need to understand about Streaming Joins and give you working code that you can use for your next Streaming Pipeline.
The steps involved:
- Create a fake dataset at scale
- Set a baseline using traditional SQL
- Define Temporary Streaming Views
- Inner Joins with optional Watermarking
- Left Joins with Watermarking
- The cold start edge case: withEventTimeOrder
- Cleanup
Index
· What is Stream-Stream Join?
∘ Concept of Stream-Stream Join
∘ Types of Stream-Stream Join
· 1. The Setup: Create a fake dataset at scale
∘ Next, we will break this Delta table into 2 different tables
· 2. Set a baseline using traditional SQL
· Summary so far:
· 3. Define Temporary Streaming Views
· 4. Inner Joins with optional Watermarking
∘ How was the watermark computed in this scenario?
· 5. Left Joins with Watermarking
∘ 5.a How Left Joins works differently than an…