
Kafka: A Deep Dive into Real-time Data Streaming
Welcome to our exploration of Apache Kafka, a powerful distributed
streaming platform. Kafka enables you to build real-time data
pipelines for a wide range of applications, from event-driven
architectures to modern data analytics.

by Khanh Truong
Kafka Fundamentals: The Building Blocks

Topics
Categorized streams of data. Producers publish messages to topics, and consumers subscribe to topics to receive data.

Partitions
Each topic is divided into partitions. This enables parallel processing and improves performance.

Brokers
Kafka servers that store and distribute messages. Brokers handle all communication with producers and consumers.

Producers & Consumers
Producers send messages to Kafka topics, while consumers retrieve messages from topics for processing.
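
To make these building blocks concrete, here is a minimal sketch that uses the Java AdminClient to list each topic and its partition count. The broker address localhost:9092 and the class name are assumptions for illustration, and a 3.x client is assumed for allTopicNames().

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class ListTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // ask the brokers for every topic name, then describe each one
            Set<String> names = admin.listTopics().names().get();
            Map<String, TopicDescription> descriptions =
                admin.describeTopics(names).allTopicNames().get();
            for (TopicDescription d : descriptions.values()) {
                System.out.printf("topic=%s partitions=%d%n", d.name(), d.partitions().size());
            }
        }
    }
}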
Zookeeper: The Conductor of Kafka

1. Coordination
Zookeeper manages broker discovery, cluster membership, and leader election.

2. Configuration
It stores Kafka configurations, such as topic metadata and partition assignments.

3. Fault Tolerance
Zookeeper ensures Kafka's resilience to node failures by providing a highly available service.
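
On the broker side, the link to Zookeeper is a couple of settings in the broker's server.properties file. A minimal excerpt, with example values assumed:

# server.properties (excerpt, example values)
broker.id=0                       # unique id of this broker in the cluster
zookeeper.connect=localhost:2181  # Zookeeper ensemble used for coordination
log.dirs=/tmp/kafka-logs          # where this broker stores partition data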
Crafting Kafka Topics: The Data Pipeline Foundation

Partitioning
Dividing topics into partitions allows for parallel processing and scalability.

Replication
Creating multiple copies of partitions across brokers ensures data durability and fault tolerance.

Retention Policies
Defining how long data is stored in Kafka. This helps manage storage space and data freshness.
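
All three knobs come together when a topic is created. A sketch using the Java AdminClient follows; the topic name orders, the 6 partitions, the replication factor of 3, and the 7-day retention are example values, not recommendations.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions for parallelism, 3 replicas for durability (example values)
            NewTopic orders = new NewTopic("orders", 6, (short) 3);
            // retention policy: keep messages for 7 days (retention.ms is in milliseconds)
            orders.configs(Map.of("retention.ms", "604800000"));
            admin.createTopics(List.of(orders)).all().get();
        }
    }
}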
Producing Data to Kafka: Sending Messages into the Stream

Message Serialization
Converting messages into a format suitable for transmission over the network.

Topic Selection
Deciding which topic the message should be published to based on its content.

Asynchronous Communication
Producers send messages without waiting for confirmation, allowing for efficient data streaming.
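
A minimal producer sketch ties these three ideas together: string serializers handle serialization, the topic name selects the destination, and the callback shows the asynchronous acknowledgement. The topic orders, the key, and the payload are assumed example values.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        // serializers convert keys and values into bytes for the network
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "order-42", "{\"amount\":99.50}"); // assumed topic and payload
            // send() returns immediately; the callback fires once the broker acknowledges
            producer.send(record, (metadata, exception) -> {
                if (exception != null) exception.printStackTrace();
                else System.out.printf("written to partition %d at offset %d%n",
                                       metadata.partition(), metadata.offset());
            });
        }
    }
}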
Consuming Data from Kafka: Receiving and Processing Messages

1. Message Deserialization
Converting messages from their network format back into usable objects.

2. Message Handling
Processing messages based on their content and business logic.

3. Consumer Group Assignment
Consumers belong to groups, which enables load balancing and fault tolerance.
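
The consumer side mirrors the producer: deserializers rebuild objects from bytes, the poll loop handles messages, and group.id places the instance in a consumer group. The group name order-processors and the topic orders are assumed example values.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");        // assumed group name
        // deserializers turn the network bytes back into usable objects
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    // message handling: apply business logic to each record
                    System.out.printf("key=%s value=%s%n", r.key(), r.value());
                }
            }
        }
    }
}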
Kafka Consumer Groups: Distributed Processing and Resilience

Load Balancing
Partitions are divided among the consumers in a group, so messages are spread evenly across them.

Fault Tolerance
If a consumer fails, other consumers in the group can take over its work.

Message Offset Management
Consumer groups track their progress through the topic using message offsets.
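
Offset management can be made explicit by disabling auto-commit and committing only after a batch has been handled, so that a replacement consumer resumes from the last committed position. A sketch under the same assumed configuration as above; process() is a hypothetical handler.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class CommittingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");        // same assumed group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit offsets ourselves

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    process(r); // hypothetical business-logic handler
                }
                consumer.commitSync(); // record progress only after the batch is handled
            }
        }
    }

    static void process(ConsumerRecord<String, String> r) {
        System.out.printf("handled key=%s%n", r.key());
    }
}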
Kafka Streams API: Real-time Data Processing Made Easy

1. Data Transformation
Reshaping messages as they flow through the pipeline.

2. Filtering
Selecting specific messages based on criteria.

3. Aggregation
Combining messages to derive insights.

4. Windowing
Processing data over a specific time window.
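
These steps map almost one-to-one onto the Streams DSL. The sketch below filters paid orders, groups them by key, and counts them over five-minute windows; the application id, topic name, and JSON matching are assumptions, and Kafka Streams 3.x is assumed for TimeWindows.ofSizeWithNoGrace().

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.TimeWindows;
import java.time.Duration;
import java.util.Properties;

public class OrderStats {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-stats");        // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("orders")                               // assumed input topic
            .filter((key, value) -> value.contains("\"status\":\"paid\""))     // filtering
            .groupByKey()
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))  // windowing
            .count()                                                           // aggregation
            .toStream()
            .foreach((windowedKey, count) ->
                System.out.println(windowedKey + " -> " + count));

        new KafkaStreams(builder.build(), props).start();
    }
}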
Kafka Connect: Bridging the Gap with External Systems

Data moves through four stages:

1. Data Sources
2. Connectors: Plugins that handle data ingestion from and to external systems.
3. Kafka Topics
4. Data Sinks
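
As a concrete example, the FileStreamSource connector that ships with Kafka can be run in standalone mode from a small properties file; the file path and topic below are assumed example values.

# file-source.properties (example values assumed)
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/tmp/input.txt   # data source: each new line in this file becomes a message
topic=file-lines      # Kafka topic the connector publishes into

Starting it with bin/connect-standalone.sh and a worker config turns the file into a live stream without any custom producer code.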
Extending Kafka: Beyond the Basics

100+ Connectors
Extending Kafka's reach to various data sources and sinks.

1K+ Community
A vibrant community of developers and users contributing to Kafka's growth.

30M+ Messages
Kafka's impressive scalability and throughput, handling tens of millions of messages per day.
