Stream Processing Using Kafka
Software Consultant
@_himaniarora @pk_official
Who are we?
Paradigms of programming
REQUEST/RESPONSE
BATCH SYSTEMS
STREAM PROCESSING
STREAM PROCESSING WITH KAFKA
2 APPROACHES:
Major Challenges:
FAULT TOLERANCE
TIME
STATE
REPROCESSING
STREAM PROCESSING FRAMEWORKS
SPARK
STORM
SAMZA
FLINK, ETC.
KAFKA STREAMS: ANOTHER WAY OF STREAM PROCESSING
Let's start with Kafka Streams... but wait, what is KAFKA?
Hello Apache Kafka
Powerful
Makes your applications highly scalable, elastic,
distributed, fault-tolerant.
Stateful and stateless processing
Event-time processing with windowing, joins,
aggregations
Lightweight
Low barrier to entry
No processing cluster required
No external dependencies other than Apache Kafka
Capabilities of Kafka Streams
Real-time
Millisecond processing latency
Record-at-a-time processing (no micro-batching)
Seamlessly handles late-arriving and out-of-order data
High throughput
Fully integrated
100% compatible with Apache Kafka 0.10.2 and 0.10.1
Easy to integrate into existing applications and microservices
Runs everywhere: on-premises, public clouds, private clouds,
containers, etc.
Integrates with databases through continuous change data
capture (CDC) performed by Kafka Connect
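To show how low the barrier to entry is, here is a minimal word-count sketch written against the 0.10.2-era DSL that this deck targets. The topic names, store name, and application id are illustrative assumptions, as is the broker at localhost:9092.

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class WordCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");     // illustrative id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> lines = builder.stream("text-lines");        // hypothetical topic
        KTable<String, Long> counts = lines
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
            .groupBy((key, word) -> word)
            .count("word-counts");                                           // local state store
        counts.to(Serdes.String(), Serdes.Long(), "word-counts-output");     // hypothetical topic

        KafkaStreams streams = new KafkaStreams(builder, props);
        streams.start();
    }
}

Note that this is an ordinary Java application: no processing cluster is deployed, and the only runtime dependency is the Kafka cluster itself.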
Key concepts of Kafka Streams
KStream
KTable
Time
Aggregations
Joins
Windowing
Key concepts of Kafka Streams
KStream
A KStream is an abstraction of a record stream.
Each data record represents a self-contained datum in
the unbounded data set.
Using the table analogy, data records in a record
stream are always interpreted as an INSERT.
Let's imagine the following two data records are being
sent to the stream:
("alice", 1) --> ("alice", 3)
Summing the values per user would yield 4 for alice,
because the second record is not an update of the first.
Key concepts of Kafka Streams
KTable
A KTable is an abstraction of a changelog stream.
Each data record represents an update.
Using the table analogy, data records in a changelog
stream are always interpreted as an UPDATE.
Let's imagine the following two data records are being
sent to the stream:
("alice", 1) --> ("alice", 3)
Summing the values per user would yield 3 for alice,
because the second record is an update of the first.
Key concepts of Kafka Streams
Time
A critical aspect in stream processing is the notion
of time.
Kafka Streams supports the following notions of time:
Event Time (when the event occurred at the source)
Processing Time (when the record happens to be processed)
Ingestion Time (when the record is appended to the Kafka topic)
Kafka Streams assigns a timestamp to every data
record via so-called timestamp extractors.
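As a sketch, the notion of time can be switched through configuration. The default extractor reads the event-time timestamp embedded in each Kafka record (which amounts to ingestion time when the topic uses LogAppendTime), while the built-in WallclockTimestampExtractor gives processing-time semantics:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.WallclockTimestampExtractor;

Properties props = new Properties();
// switch from the default event-time extractor to processing time
props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
          WallclockTimestampExtractor.class.getName());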
Key concepts of Kafka Streams
Aggregations
An aggregation operation takes one input stream or
table, and yields a new table.
It is done by combining multiple input records into a
single output record.
In the Kafka Streams DSL, an input stream of an
aggregation operation can be a KStream or a KTable,
but the output stream will always be a KTable.
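A sketch of an aggregation in the DSL, counting occurrences of each word: the input is a KStream, and the output of count() is a KTable (topic and store names are illustrative):

KStream<String, String> words =
    builder.stream(Serdes.String(), Serdes.String(), "words");       // hypothetical topic
// group by the word itself, then count: multiple input records per word
// are combined into a single, continuously updated output record
KTable<String, Long> wordCounts = words
    .groupBy((key, word) -> word, Serdes.String(), Serdes.String())
    .count("word-count-store");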
Key concepts of Kafka Streams
Joins
A join operation merges two input streams and/or
tables based on the keys of their data records, and
yields a new stream/table.
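For example, a sketch of a windowed stream-stream join that pairs up records with the same user key from two hypothetical topics, as long as they occur within five minutes of each other:

KStream<String, String> clicks =
    builder.stream(Serdes.String(), Serdes.String(), "user-clicks"); // hypothetical topics
KStream<String, String> views =
    builder.stream(Serdes.String(), Serdes.String(), "page-views");
// the ValueJoiner combines the two sides into one output value
KStream<String, String> joined = clicks.join(
    views,
    (click, view) -> click + "/" + view,
    JoinWindows.of(5 * 60 * 1000L),                                  // 5-minute join window
    Serdes.String(), Serdes.String(), Serdes.String());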
Key concepts of Kafka Streams
Windowing
Windowing lets you group records that share the same
key into so-called windows, for stateful operations
such as aggregations or joins.
Windows are tracked per record key.
When working with windows, you can specify a
retention period for the window.
This retention period controls how long Kafka Streams
will wait for out-of-order or late-arriving data records
for a given window.
If a record arrives after the retention period of a
window has passed, the record is discarded and will not
be processed in that window.
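A sketch of a windowed count over tumbling one-minute windows, retained for one hour, reusing the views stream from the join sketch above; records arriving after the retention period are dropped for their window:

// count page views per user per minute; until() sets the retention period
KTable<Windowed<String>, Long> viewsPerMinute = views
    .groupByKey(Serdes.String(), Serdes.String())
    .count(TimeWindows.of(60 * 1000L).until(60 * 60 * 1000L),
           "views-per-minute");                                      // illustrative store name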
Inside Kafka Streams
Processor Topology
Stream Partitions and Tasks
In Kafka Connect, so-called Sources import data into
Kafka, and Sinks export data from Kafka.
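Every DSL application compiles down to a processor topology of source, processor, and sink nodes; the low-level Processor API can also assemble one directly. A schematic sketch using the 0.10.2-era TopologyBuilder, with a hypothetical UppercaseProcessor and illustrative topic names:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.TopologyBuilder;

public class TopologyExample {
    public static TopologyBuilder build() {
        TopologyBuilder topology = new TopologyBuilder();
        topology.addSource("Source", "input-topic")                  // hypothetical topics
                .addProcessor("Uppercase", UppercaseProcessor::new, "Source")
                .addSink("Sink", "output-topic", "Uppercase");
        return topology;
    }
}

// the processor node: forwards each value upper-cased to its downstream nodes
class UppercaseProcessor implements Processor<String, String> {
    private ProcessorContext context;
    @Override public void init(ProcessorContext context) { this.context = context; }
    @Override public void process(String key, String value) {
        context.forward(key, value.toUpperCase());
    }
    @Override public void punctuate(long timestamp) { }              // no scheduled work
    @Override public void close() { }
}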
References
https://www.slideshare.net/ConfluentInc/demystifying-stream-processing-with-apache-kafka-69228952
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
http://docs.confluent.io/3.2.0/streams/index.html
http://docs.confluent.io/3.2.0/connect/index.html
Any Questions?
Thank You