Lecture #7.2 - Apache Spark - Streaming API
APACHE SPARK
STREAMING API
Agenda
● Spark Streaming
● API
● Summary
Where are we?
1. SPARK STREAMING
Spark Streaming Features
● Declarative API:
The application specifies what to compute on the
events, not how to compute it.
Spark Streaming Features
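Spark Streaming has two streaming APIs:
● DStream API (the original, RDD-based API)
● Structured Streaming (built on DataFrames and Datasets)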
Structured Streaming Basics
Continuous Application
An end-to-end application that reacts to data in
real time by combining:
● streaming jobs
● batch jobs
● joins between streaming and offline data
● interactive ad-hoc queries
1.2 CORE CONCEPTS
Core components
Input Sources
● Kafka
● File
● Socket*
● Rate**
Core components
Output Sinks
● Kafka
● File
● Console *
● Memory *
● Foreach
● ForeachBatch
* used for debugging
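As a rough illustration of the ForeachBatch sink listed above, the PySpark sketch below hands every micro-batch to an ordinary function. `DataStreamWriter.foreachBatch` is part of the PySpark API and calls the function with the batch DataFrame and a batch id; the names `save_batch` and `start_foreach_batch_sink` and the `/tmp/stream-out` path are hypothetical and chosen only for this example.

```python
def save_batch(batch_df, batch_id):
    """Handle one micro-batch; batch_df is a regular (non-streaming) DataFrame."""
    # Hypothetical target directory: one sub-directory per micro-batch.
    batch_df.write.mode("overwrite").parquet(f"/tmp/stream-out/batch={batch_id}")


def start_foreach_batch_sink(streaming_df):
    """Start a query that routes every micro-batch through save_batch.

    streaming_df is assumed to be a streaming DataFrame, for example one
    obtained from spark.readStream. The return value is a StreamingQuery.
    """
    return streaming_df.writeStream.foreachBatch(save_batch).start()
```

Inside the handler the full batch DataFrame API is available, which is why this sink is often used to write to stores that have no dedicated streaming sink.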
Core components
Output Modes
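Structured Streaming names three output modes (append, update, complete), selected in PySpark with `DataStreamWriter.outputMode(...)`. The toy model below is plain Python, not Spark code: it assumes a running count per key as the result table and shows which rows a sink would receive after each micro-batch in complete and update mode. The function name `emitted_rows` is hypothetical. Append mode is left out because, for an aggregation like this one, it only applies once a watermark lets rows be finalized.

```python
from collections import Counter


def emitted_rows(batches, mode):
    """Return, per micro-batch, the rows a sink would see for a running count."""
    if mode not in ("complete", "update"):
        raise ValueError(f"unsupported mode in this toy model: {mode!r}")
    result_table = Counter()  # the continuously updated result table
    outputs = []
    for batch in batches:
        result_table.update(batch)  # fold the new input rows into the counts
        if mode == "complete":
            # Complete: the whole result table is written every time.
            outputs.append(dict(result_table))
        else:
            # Update: only rows whose value changed in this micro-batch.
            outputs.append({key: result_table[key] for key in set(batch)})
    return outputs
```

For the input `[["a", "b", "a"], ["b", "c"]]`, complete mode emits the full table twice, while update mode emits only `b` and `c` for the second micro-batch.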
Core components
Triggers
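A trigger controls when the next chunk of input is processed. The sketch below shows how the trigger kinds could be set on a PySpark `DataStreamWriter`; the `trigger(...)` keyword arguments `processingTime`, `availableNow`, and `continuous` exist in PySpark, whereas the helper `with_trigger` and its `kind` labels are hypothetical names used only here.

```python
def with_trigger(writer, kind, interval="10 seconds"):
    """Return the DataStreamWriter with the requested trigger applied."""
    if kind == "default":
        # No trigger call: a new micro-batch starts as soon as the
        # previous one has finished.
        return writer
    if kind == "fixed-interval":
        # Micro-batches are kicked off at the given processing-time interval.
        return writer.trigger(processingTime=interval)
    if kind == "available-now":
        # Process everything available at start, then stop (Spark 3.3+).
        return writer.trigger(availableNow=True)
    if kind == "continuous":
        # Experimental continuous processing; the interval is the
        # checkpoint interval.
        return writer.trigger(continuous=interval)
    raise ValueError(f"unknown trigger kind: {kind!r}")
```

A typical call would look like `with_trigger(df.writeStream.format("console"), "fixed-interval", "5 seconds").start()`.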
Core components
Event-Time Processing
Structured Streaming supports event-time
processing, i.e., processing data based on
timestamps included in the records, which may
arrive out of order.
Core components
Event-Time Processing
● Event-time data:
Event time refers to timestamp fields embedded
in the data itself, rather than the time at which
a record is processed
Event-Time Processing
● Watermarks:
Let you specify how late data is expected to
arrive in event time, which limits how long the
engine needs to remember old data
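The watermark idea can be sketched with a simplified plain-Python model (not Spark code). It assumes the watermark is the largest event time seen so far minus the allowed delay, and that it advances only between micro-batches; events older than the current watermark are treated as too late. The function name `split_late_events` is hypothetical. In PySpark the delay itself is declared with `DataFrame.withWatermark(eventTime, delayThreshold)`.

```python
from datetime import datetime, timedelta


def split_late_events(batches, delay):
    """Toy model: return (accepted, dropped) event times across micro-batches."""
    watermark = None        # no watermark until some data has been seen
    max_event_time = None
    accepted, dropped = [], []
    for batch in batches:
        for event_time in batch:
            if watermark is not None and event_time < watermark:
                dropped.append(event_time)   # older than the watermark
            else:
                accepted.append(event_time)  # on time, or late but tolerated
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
        # The watermark moves only after the whole micro-batch is processed.
        if max_event_time is not None:
            watermark = max_event_time - delay
    return accepted, dropped


# Example input: the second micro-batch carries two out-of-order events.
t0 = datetime(2024, 1, 1, 12, 0)
example = [
    [t0, t0 + timedelta(minutes=10)],
    [t0 + timedelta(minutes=5), t0 - timedelta(minutes=5)],
]
accepted, dropped = split_late_events(example, delay=timedelta(minutes=10))
```

With a 10-minute delay, the 12:05 event is still accepted after 12:10 has been seen, while the 11:55 event falls behind the 12:00 watermark and is dropped.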
2. API
Input Source
SparkSession.readStream
Input Source
DataStreamReader
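A minimal PySpark sketch of the reader side: `SparkSession.readStream` returns a `DataStreamReader`, and calling `load()` (or a format shortcut such as `json()`) on it yields a streaming DataFrame. The helper names, the path, and the schema string are hypothetical; `format`, `option`, `schema`, `load`, and `json` are existing `DataStreamReader` methods. The rate source is intended for testing and produces rows with `timestamp` and `value` columns.

```python
def read_rate_stream(spark, rows_per_second=5):
    """Streaming DataFrame from the built-in rate source (for testing)."""
    return (
        spark.readStream                 # DataStreamReader
        .format("rate")
        .option("rowsPerSecond", rows_per_second)
        .load()                          # streaming DataFrame
    )


def read_json_stream(spark, path, schema_ddl):
    """Streaming DataFrame over JSON files that appear in a directory.

    File sources need an explicit schema; here it is passed as a DDL
    string such as "id INT, ts TIMESTAMP" (hypothetical columns).
    """
    return spark.readStream.schema(schema_ddl).json(path)
```

Both helpers expect an existing `SparkSession`; nothing is read until a query is started on the returned DataFrame.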
Output Sink
DataFrame.writeStream
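On the writer side, `DataFrame.writeStream` returns a `DataStreamWriter` that is configured step by step. The sketch below assumes a streaming DataFrame and uses the console sink for debugging; `console_writer` and the default query name are hypothetical, while `format`, `outputMode`, `option`, and `queryName` are existing `DataStreamWriter` methods.

```python
def console_writer(streaming_df, query_name="debug_query"):
    """Configure, but do not start, a writer that prints rows to the console."""
    return (
        streaming_df.writeStream          # DataStreamWriter
        .format("console")                # debugging sink
        .outputMode("append")             # only newly added rows
        .option("truncate", "false")      # print full column values
        .queryName(query_name)            # name visible in the Spark UI
    )
```

The returned writer only describes the output; the query begins running when `start()` is called on it.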
Actions
DataStreamWriter.start
Actions
StreamingQuery.awaitTermination(timeout)
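The two actions fit together as sketched below in PySpark: `DataStreamWriter.start()` launches the query and returns a `StreamingQuery`, and `awaitTermination(timeout)` blocks until the query ends or the timeout expires. In PySpark the timeout is given in seconds and the call returns whether the query terminated in time (the Scala overload takes milliseconds). The helpers `run_for` and `main`, the app name, and the chosen durations are hypothetical.

```python
def run_for(writer, seconds):
    """Start the query, wait up to `seconds`, and stop it if still running."""
    query = writer.start()                      # returns a StreamingQuery
    finished = query.awaitTermination(seconds)  # True if it ended in time
    if not finished:
        query.stop()                            # shut the query down ourselves
    return finished


def main():
    # Imported here so the helper above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("lifecycle-sketch")
        .getOrCreate()
    )
    stream = spark.readStream.format("rate").option("rowsPerSecond", 1).load()
    # Print generated rows to the console for about ten seconds.
    run_for(stream.writeStream.format("console"), seconds=10)
    spark.stop()
```

Calling `main()` on a machine with PySpark installed runs the rate source against the console sink and then shuts everything down; without a timeout, `awaitTermination()` would block until the query stops or fails.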
3. SUMMARY
Summary