
Kafka Notes

Introduction to Kafka
Apache Kafka is an open-source distributed event streaming platform designed to handle high-
throughput, real-time data feeds. Originally developed by LinkedIn and later open-sourced, Kafka is
written in Scala & Java, and it's used for building real-time data pipelines and streaming applications.
Kafka enables the creation of applications that can process, store, and manage streams of records in
real-time, providing fault tolerance, scalability, and high throughput.
Kafka Architecture
The major components that form the Kafka Architecture include:
1. Producer
Producers are the applications or systems that send data (events) to Kafka topics.
They push messages to Kafka brokers (servers) through topics, which then become available for consumers to read (a minimal producer sketch follows this list).
2. Consumer
Consumers read data from Kafka topics. Each consumer in Kafka subscribes to one or more
topics.
Kafka uses a pull-based consumption model: consumers poll brokers to fetch new records. Consumers can track and process messages from a specific point in the stream (offset).
3. Broker and Cluster
A Kafka Broker is a server that stores data and serves clients (producers and consumers).
Brokers mediate the conversation between producers and consumers and are responsible for delivering messages to the right party.
Kafka clusters are typically made up of multiple brokers for fault tolerance and scalability.
Kafka ensures data replication across brokers, so data is not lost even if one broker fails.
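
To make the producer role concrete, below is a minimal Java producer sketch. It assumes a broker running on localhost:9092 and a topic named events; both are placeholders for your own setup, not names defined in these notes.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // broker to connect to (assumption)
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record key ("user-42") determines which partition of the topic the record lands in.
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
            producer.flush();   // make sure the record is actually sent before exiting
        }
    }
}
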
Other components of Kafka include:
1. Topic
A topic is a category or feed to which Kafka producers send messages.
Messages are simply byte arrays that can hold data in any common format, such as plain strings, JSON, or Avro.
Consumers subscribe to these topics to consume the data.
Topics in Kafka are broken down into partitions. Messages are written to partitions in
append-only fashion.
The partitions are distributed across brokers (and replicated too) for scalability and fault
tolerance.
2. Partition
Kafka topics are divided into partitions. Partitions allow Kafka to horizontally scale and
parallelize message processing.
Each partition is an ordered, immutable sequence of records. Partitions allow Kafka to
distribute the load across different brokers.
3. Offset
An offset is a unique identifier for each record within a partition. Kafka uses offsets to track
which records have been consumed.
Consumers maintain their position using offsets, enabling them to continue consuming from the correct position in the topic stream (the consumer sketch after this list shows offsets being read and committed).
4. ZooKeeper
ZooKeeper is used by Kafka to coordinate and manage distributed processes. It is
responsible for keeping track of broker metadata, managing leader election for partitions,
and maintaining cluster state.
Kafka relies on ZooKeeper to manage distributed system consistency, but in recent versions,
Kafka has been moving towards KRaft mode (Kafka Raft), which removes the need for
ZooKeeper.
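
The consumer side of the same setup is sketched below: it subscribes to the (assumed) events topic on localhost:9092, polls for records, prints each record's partition and offset, and commits its position. The group id notes-demo is arbitrary.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "notes-demo");                 // consumer group this consumer belongs to
        props.put("enable.auto.commit", "false");            // commit offsets manually below
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                // Pull model: the consumer asks the broker for new records.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
                consumer.commitSync();   // record our position (offset) in each partition
            }
        }
    }
}
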
Kafka Workflow
Kafka uses a publish-subscribe (Pub/Sub) model where producers send messages to topics and
consumers subscribe to topics to process messages.
1. Producer Sends Data:
The producer sends a message to a Kafka topic. The message could be a log, event, or any
form of data.
Kafka allows producers to publish messages using synchronous or asynchronous methods,
depending on the configuration.
Producers select which partition of a topic each message is sent to, either through the message key or a custom partitioner. They can also implement a priority scheme, e.g. routing a message to a particular partition depending on its priority.
How long a producer waits for acknowledgement is configurable: with acks=0 it fires and forgets, sending messages as fast as the brokers can handle, while acks=1 waits for the partition leader and acks=all waits for all in-sync replicas (a sketch illustrating synchronous and asynchronous sends follows this workflow).
2. Broker Stores the Message:
Multiple brokers form a cluster to balance the load across the system.
Kafka brokers store messages in partitions. Each partition is an ordered log of messages,
and Kafka appends new messages to the end of the log.
Each message is assigned a unique offset within the partition.
Kafka maintains multiple copies (replicas) of the data to ensure high availability and fault
tolerance.
A single broker instance can handle thousands of reads and writes per second and terabytes of messages.
Replicas of topic partitions are stored on multiple brokers.
3. Consumer Reads Messages:
Consumers subscribe to one or more Kafka topics and read messages.
Consumers can use polling to fetch messages from Kafka topics.
Consumers can be part of a consumer group, where the group collectively consumes the
data. Each partition is consumed by one consumer in the group, ensuring parallel processing
and load balancing.
Having subscribed to a topic, consumers poll the Kafka broker at regular intervals (e.g. every 200 ms).
4. Processing and Acknowledgment:
Consumers process the messages based on their logic (e.g., transforming, storing in a
database, triggering downstream events).
After processing, consumers may commit the offset to Kafka, signifying that the messages
have been successfully processed.
Stream Processing applications can be built using Kafka Streams or integrated with systems
like Apache Flink or Apache Spark for real-time transformations and analytics.
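
As a rough illustration of step 1 above, the sketch below configures a producer with acks=all and performs one synchronous send (blocking until the broker acknowledges) and one asynchronous send with a callback. The broker address and topic name are assumptions, as before.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class WorkflowProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", "all");   // wait for all in-sync replicas ("0" = fire-and-forget, "1" = leader only)
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("events", "key", "value");

            // Synchronous send: block until the broker acknowledges the write.
            producer.send(record).get();

            // Asynchronous send: continue immediately and handle the result in a callback.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("stored at partition %d, offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
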
Kafka Use Cases
Kafka is widely used in various real-time data processing scenarios. Some common use cases include:
1. Real-Time Data Pipelines:
Kafka is used to build reliable data pipelines that transport large volumes of data from one
system to another in real time.
2. Event Sourcing:
Kafka can store streams of immutable event logs, which can be replayed, audited, or used
for reconstructing system states.
3. Log Aggregation:
Kafka is often used for collecting and aggregating logs from various services, providing a
centralized log repository.
4. Real-Time Analytics:
Kafka enables real-time analytics by streaming data to analytics platforms like Apache
Spark, Apache Flink, or other data processing frameworks.
5. Messaging System:
Kafka provides a high-throughput, low-latency messaging system for communication
between microservices and other systems.
6. Stream Processing:
Kafka Streams API allows users to perform stream processing tasks, such as filtering,
joining, and aggregating data in real time.
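
As a sketch of what the Streams API looks like in practice, the example below reads a hypothetical events topic, keeps only records whose value contains the word "error", and writes them to a second topic. The application id, broker address, and topic names are all illustrative.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FilterStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "notes-filter-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("events");
        // Keep only records whose value mentions "error" and write them to another topic.
        events.filter((key, value) -> value.contains("error")).to("error-events");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
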
Kafka Benefits
1. Scalability
Kafka is designed to handle very high throughput, with the ability to scale horizontally by adding
more brokers to the cluster.
Topics are partitioned and distributed across multiple brokers, allowing Kafka to handle large
volumes of data.
2. Fault Tolerance
Kafka replicates data across brokers to ensure fault tolerance. If one broker goes down, another
replica can serve the data.
With an appropriate replication factor and acknowledgement settings, Kafka provides data durability and ensures that data is not lost even in case of broker failures.
3. High Throughput
Kafka is optimized for high-throughput message delivery. It can handle millions of messages per
second due to its efficient storage and distributed architecture.
4. Low Latency
Kafka delivers messages with low latency, making it suitable for real-time data processing and
event-driven applications.
5. Durability
Kafka provides durability by persisting messages to disk in a highly efficient manner. Even after
messages are consumed, they remain available for replay for a configured retention period.
6. Stream Processing
Kafka supports stream processing natively through the Kafka Streams API and integrations with
tools like Apache Flink and Apache Spark for real-time data analytics and processing.
Setting up Kafka in your system
Prerequisites
Before setting up Kafka, ensure that the following software is installed:
1. Java: Kafka is written in Java, so you must have Java 8 or later installed. You can download and
install Java from the official Oracle website or install it via package managers like apt or brew
(macOS).
2. Zookeeper: Kafka relies on Zookeeper for distributed coordination (though newer versions of Kafka are moving to KRaft mode and away from Zookeeper). Zookeeper must be running before you start Kafka; the Kafka download bundles the scripts needed to run it.
3. Kafka: Kafka itself is the core component you need to install.
Steps for setting up Kafka
Step 1: Download and Extract Apache Kafka
Download Kafka:
Go to the official Apache Kafka website.
Download the latest stable version of Kafka. Select the version that fits your operating
system (tar/zip).
Extract the archive:
If you've downloaded the Kafka .tar or .tgz file, extract it to a directory:

tar -xvzf kafka_2.13-2.8.0.tgz

This will extract Kafka into a directory named kafka_2.13-2.8.0.


Step 2: Start Zookeeper
Kafka uses Zookeeper to manage distributed brokers and ensure fault tolerance. Kafka comes with
an embedded Zookeeper server, but you can also set up a separate Zookeeper instance.
Start the Zookeeper server:
Navigate to the Kafka directory:

cd kafka_2.13-2.8.0

- Start Zookeeper using the provided script:

bin/zookeeper-server-start.sh config/zookeeper.properties

This will start the Zookeeper server with the default configuration provided in the
config/zookeeper.properties file.

Note: By default, Zookeeper binds to localhost:2181. You can modify the configuration if
needed.
Step 3: Start Kafka Broker
Start the Kafka broker:
Open another terminal window and navigate to the Kafka directory again.
Run the following command to start the Kafka broker:

bin/kafka-server-start.sh config/server.properties

This will start a single Kafka broker using the default configuration found in
config/server.properties.
By default, Kafka will run on localhost:9092, and it will communicate with the Zookeeper
instance running on localhost:2181.
Step 4: Create a Kafka Topic
Once Kafka is running, you can create a topic to send and consume messages. Topics are logical channels to which producers send messages and from which consumers read.
Create a topic:
Use the following command to create a topic named my_topic:

bin/kafka-topics.sh --create --topic my_topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

This will create a topic with 1 partition and a replication factor of 1.


List Kafka topics:
To verify that the topic has been created, you can list all topics using:

bin/kafka-topics.sh --list --bootstrap-server localhost:9092

Step 5: Produce Messages to Kafka Topic


Now that Kafka is up and running, you can start producing messages to your topic.
Start a producer:
In a new terminal window, start the Kafka producer to send messages to your topic:

bin/kafka-console-producer.sh --topic my_topic --bootstrap-server localhost:9092

Type messages and press Enter to send them to Kafka. Each message will be added to the
my_topic topic.
Step 6: Consume Messages from Kafka Topic
Once messages are being produced to the topic, you can start consuming them.
Start a consumer:
In another terminal window, start a Kafka consumer to read messages from the topic:

bin/kafka-console-consumer.sh --topic my_topic --bootstrap-server localhost:9092 --from-beginning

This will start the consumer and display all messages in my_topic from the beginning.
Note: The --from-beginning flag tells the consumer to read messages from the beginning
of the topic, not just new messages.
Step 7: Test the Setup
Send messages: In the producer terminal, send a few test messages.
Consume messages: In the consumer terminal, the messages will appear as you send them from
the producer.
Step 8: Stopping Kafka and Zookeeper
Stop Kafka:
To stop Kafka, press Ctrl + C in the terminal window where Kafka is running.
Alternatively, you can stop Kafka by running:

bin/kafka-server-stop.sh

Stop Zookeeper:
Similarly, press Ctrl + C in the terminal window where Zookeeper is running.
Or run the following command to stop Zookeeper:

bin/zookeeper-server-stop.sh

Step 9: Kafka Configuration (Optional)


Kafka's configuration files are located in the config/ directory:
server.properties: Configuration for Kafka broker (e.g., ports, log directories, replication
settings).
zookeeper.properties: Configuration for Zookeeper.
You can customize these configuration files to suit your needs, such as changing the broker's
listening port, log directories, or adjusting retention policies for topics.
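
For orientation, a few commonly adjusted settings in config/server.properties are shown below; the values are illustrative (close to the defaults shipped with Kafka), not recommendations.

# Unique id of this broker within the cluster
broker.id=0
# Address the broker listens on for client connections
listeners=PLAINTEXT://localhost:9092
# Directory where partition logs are stored on disk
log.dirs=/tmp/kafka-logs
# Default number of partitions for newly created topics
num.partitions=1
# How long messages are retained before deletion (7 days)
log.retention.hours=168
# Zookeeper instance the broker registers with
zookeeper.connect=localhost:2181
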
Conclusion
Apache Kafka is a powerful, distributed event streaming platform designed to handle large-scale, high-
throughput, and fault-tolerant data streams. It is widely used in various industries for real-time analytics,
messaging, data integration, and stream processing. Kafka's ability to handle millions of events per
second while maintaining durability and scalability makes it an essential component in modern data
architectures.
