Apache Kafka

Uploaded by

dyvikmanju5

INTERNET OF THINGS

TOPIC: APACHE KAFKA


Group members
Taniya Souza [1DA21CS150]
Srinivasan R [1DA21CS143]
Yashwanth B K [1DA21CS171]
Yashwanth Gowda B [1DA21CS172]
Guide: Prof. Lavanya Santosh, CSE Dept
What is Apache Kafka?
•Apache Kafka is an open-source distributed event-streaming platform.

•Originally developed by LinkedIn and donated to the Apache Software Foundation in 2011.

•Designed to handle high-throughput, low-latency, real-time data streams.


Key Features of Kafka
• Distributed System: Runs as a cluster of brokers for scalability and fault tolerance.

• Durable Storage: Data is stored on disk and replicated across brokers.

• High Throughput: A suitably sized cluster can handle millions of messages per second.

• Low Latency: Delivers messages quickly, typically within milliseconds.

• Decoupling Systems: Allows independent development and scaling of producers and consumers.
Why Use Kafka?
•Ideal for modern data-driven applications.

•Helps in building real-time analytics systems.

•Serves as a backbone for microservices communication.

•Ensures scalability to handle large datasets.

•Integrates with popular big data frameworks like Spark, Flink, and Hadoop.
Core Functions:
•Publish and Subscribe: Enables real-time messaging between producers and consumers through topics.

•Durable Storage: Persistently stores data streams on disk, allowing replay and recovery.

•Scalable Partitioning: Divides topics into partitions for parallel and distributed data processing.

•Fault Tolerance: Ensures data availability and reliability through replication across brokers.

•Real-Time Stream Processing: Processes and analyzes data streams in real time using Kafka Streams or external tools.


Kafka Architecture Overview
• Kafka is a publish-subscribe messaging system with the following components:
• Producers: Publish messages to topics.
• Consumers: Subscribe to topics to consume messages.
• Brokers: Manage the storage and retrieval of messages.
• Topics: Categories to which messages are published.
• Partitions: Break down topics for scalability.
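
How these components fit together can be sketched with a small in-memory model. This is only an illustration of the roles listed above, not Kafka's client API: the class ToyBroker, its methods, and the topic name "sensor-readings" are hypothetical names chosen for the example.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: topic -> partitions -> list of messages."""

    def __init__(self, partitions_per_topic=2):
        self.partitions_per_topic = partitions_per_topic
        # A topic is created on first use with a fixed number of empty partitions.
        self.topics = defaultdict(
            lambda: [[] for _ in range(self.partitions_per_topic)]
        )

    def publish(self, topic, partition, message):
        """Producer side: append a message and return its offset in the partition."""
        log = self.topics[topic][partition]
        log.append(message)
        return len(log) - 1

    def fetch(self, topic, partition, offset):
        """Consumer side: read every message from `offset` onward."""
        return self.topics[topic][partition][offset:]


broker = ToyBroker()
broker.publish("sensor-readings", 0, "temp=21.5")
broker.publish("sensor-readings", 0, "temp=21.7")
broker.publish("sensor-readings", 1, "humidity=40")

first_read = broker.fetch("sensor-readings", 0, 0)
replay = broker.fetch("sensor-readings", 0, 0)  # reading does not remove data
```

Because fetching never deletes anything, a second consumer (or the same one later) can read the same messages again, which is the property that separates a log-based system from a traditional queue.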
Kafka Topics
• A topic is a logical channel for data streams.

• Each topic is divided into partitions for parallel processing.

• Data in topics is retained for a configurable period, even after consumption.

• Topics can have configurations for replication and data retention.

• Example: A “Sales Data” topic could have partitions based on regions.
Producers and Consumers
•Producers: Send data to Kafka topics.

•Push messages to specific partitions.

•Can define custom partitioning logic (e.g., based on keys).

•Consumers: Read data from topics.

•Join consumer groups for parallel processing.

•Kafka ensures that each partition is read by exactly one consumer within a group at any given time.
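
The one-consumer-per-partition rule can be illustrated with a simple round-robin assignment. This is a sketch of the idea only: the function assign_partitions and the consumer names are hypothetical, and real Kafka clients use pluggable assignment strategies that may distribute partitions differently.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by a group of two consumers.
two = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])

# Five consumers but only four partitions: one consumer stays idle.
five = assign_partitions([0, 1, 2, 3], ["c1", "c2", "c3", "c4", "c5"])
```

The second call shows why adding consumers beyond the number of partitions does not increase parallelism: the extra consumer receives no partitions.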


Brokers and Clusters
• A Kafka cluster consists of multiple brokers.

• Brokers: Handle storage and management of data streams.

• Each broker handles a subset of partitions.

• Collaborate to provide fault tolerance and scalability.

• Clusters use ZooKeeper (or KRaft, in newer versions) for managing cluster metadata and coordinating leader election.


Kafka Partitions
• Topics are divided into partitions to distribute data and allow parallelism.

• Data Placement: Messages in a partition are appended and stored in the order they arrive.

• Key-Based Partitioning: Ensures that messages with the same key go to the same partition.

• Example: In a “User Activity” topic keyed by user ID, all events for a given user land in the same partition.
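
Key-based partitioning usually comes down to hashing the key and taking the result modulo the number of partitions. The sketch below uses CRC32 from the Python standard library as a stand-in hash; Kafka's Java client uses a different hash function (murmur2) by default, so the partition numbers here will not match a real cluster. The function name choose_partition and the user IDs are hypothetical.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key to a partition: same key and partition count -> same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 6
events = [("user-17", "login"), ("user-42", "view"), ("user-17", "logout")]

# Every event for the same user ID is routed to the same partition.
routed = [(key, choose_partition(key, NUM_PARTITIONS), action) for key, action in events]
```

Note that the mapping depends on the partition count, so changing the number of partitions of a topic changes where existing keys are routed.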


Offset and Message Ordering
• Offset: A sequential identifier for each message, unique within its partition.

• Used to keep track of consumed messages.

• Kafka guarantees message order within a partition but not across partitions.

• Consumers can reset offsets for reprocessing messages.
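
Offset tracking and resetting can be sketched with a toy consumer that remembers its position in a single partition. The class ToyConsumer and its poll and seek methods are hypothetical names for this illustration and only loosely mirror what real client libraries offer.

```python
class ToyConsumer:
    """Reads one partition (a plain list) and tracks the next offset to read."""

    def __init__(self, log):
        self.log = log
        self.position = 0  # offset of the next message to read

    def poll(self, max_messages=2):
        """Return the next batch in partition order and advance the position."""
        batch = self.log[self.position:self.position + max_messages]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Reset the position so earlier messages can be reprocessed."""
        self.position = offset


partition_log = ["login", "view", "click", "logout"]
consumer = ToyConsumer(partition_log)

first = consumer.poll()   # offsets 0 and 1
second = consumer.poll()  # offsets 2 and 3
consumer.seek(0)          # rewind to the beginning
again = consumer.poll()   # offsets 0 and 1 once more
```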


Durability and Replication
•Kafka ensures durability by replicating data across brokers.

•Leader Replica: Handles all write requests for a partition and, by default, all read requests.

•Follower Replicas: Maintain copies of the leader's log; an in-sync follower takes over if the leader fails.

•Acknowledgments: Producers can configure (via the acks setting) how many replicas must confirm a message before it's considered successful.
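
The interplay of leader, followers, and acknowledgments can be sketched with a toy replicated partition. This model is deliberately simplified: replication is synchronous, the parameter required_acks is a hypothetical stand-in for the producer and broker settings that control acknowledgments, and the broker names are made up.

```python
class ToyPartition:
    """One partition replicated across several brokers, with a single leader."""

    def __init__(self, replica_ids):
        self.replicas = {rid: [] for rid in replica_ids}
        self.alive = set(replica_ids)
        self.leader = replica_ids[0]

    def append(self, message, required_acks):
        """Reject the write unless enough live replicas can confirm it."""
        live = [rid for rid in self.replicas if rid in self.alive]
        if len(live) < required_acks:
            return False
        for rid in live:
            self.replicas[rid].append(message)
        return True

    def fail_leader(self):
        """Simulate a broker crash: a surviving follower becomes the leader."""
        self.alive.discard(self.leader)
        self.leader = sorted(self.alive)[0]

    def read(self):
        """Reads are served by the current leader."""
        return list(self.replicas[self.leader])


partition = ToyPartition(["broker-1", "broker-2", "broker-3"])
written = partition.append("order-1", required_acks=3)

partition.fail_leader()                 # broker-1 goes down
after_failover = partition.read()       # data is still available
strict = partition.append("order-2", required_acks=3)   # only two replicas left
relaxed = partition.append("order-2", required_acks=2)
```

Requiring more acknowledgments gives stronger durability but makes writes fail when too few replicas are available, which is the trade-off producers tune.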


Use Cases of Kafka
•Real-Time Analytics:
Monitor and analyze social media feeds or website activities.
•Log Aggregation:
Centralized logging from distributed systems.
•Event Sourcing:
Capture application changes as a sequence of events.
•Data Integration:
Sync databases and applications.
•Stream Processing:
Process and analyze data in real-time with Kafka Streams or other tools.
Advantages of Kafka
•Scalability: Can scale horizontally by adding brokers.
•Flexibility: Works with multiple programming languages.
•Resilience: Fault-tolerant with replication and partitioning.
•Performance: Handles millions of events per second with low latency.
•Integration: Seamlessly integrates with popular tools like Spark and Flink.
Challenges with Kafka
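• Complex Setup: Requires expertise to configure and maintain.

• Resource-Intensive: Needs substantial disk, memory (for the operating system's page cache), and network capacity to sustain durability and performance.

• Message Duplication: Can occur (for example, when a producer retries a send) unless idempotence or transactions are configured.

• Operational Overhead: ZooKeeper dependency in older versions.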
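
The message duplication challenge can be illustrated with a toy log that rejects repeated sends. The sketch assumes each producer tags its messages with an increasing sequence number, which is the general idea behind idempotent producers; the class DedupLog and the producer ID "producer-1" are hypothetical and this is not how a real broker is implemented.

```python
class DedupLog:
    """Append-only log that drops messages whose sequence number was already seen."""

    def __init__(self):
        self.messages = []
        self.last_sequence = {}  # producer_id -> highest sequence accepted so far

    def append(self, producer_id, sequence, message):
        """Accept a message only if its sequence number is new for this producer."""
        if sequence <= self.last_sequence.get(producer_id, -1):
            return False  # duplicate, for example a retry after a lost acknowledgment
        self.last_sequence[producer_id] = sequence
        self.messages.append(message)
        return True


log = DedupLog()
sent = log.append("producer-1", 0, "payment-received")
retry = log.append("producer-1", 0, "payment-received")  # same send, retried
following = log.append("producer-1", 1, "payment-settled")
```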


SUMMARY
• Apache Kafka is a distributed platform for real-time data streaming and processing, designed for high-throughput, low-latency, and fault-tolerant communication.
• Kafka uses topics for organizing data, partitions for scalability, and replication for reliability, enabling efficient handling of massive data streams.
• Common applications include real-time analytics, event-driven architectures, log aggregation, and data integration between diverse systems.
THANK YOU
