Kafka With Spring Boot
So, Kafka was born! It was built to be tough, able to handle tons of messages at once without breaking a sweat. Since it worked like a charm at LinkedIn, it became a big hit and was picked up by the Apache Software Foundation.
From there, it just kept growing, becoming a favorite tool for lots of different businesses thanks to its awesome streaming powers.
Let’s take an example
When a user opens a cab booking app and books a cab, a driver is assigned to that user, and the user then gets constant location updates of the driver. So how do you think this type of communication is done?
[Diagram: the user books a cab and is assigned a driver; the driver updates its location into the DB every second, while the user constantly fetches the driver's updated location from the DB.]
After seeing the diagram, everything seems fine and neat.
Alright, let's put our detective hats on and snoop out the problem in this!
1. Right now there is only one user and one driver, but there can be millions of them.
2. Every second, drivers are pushing their location into the DB, and every second, users are pulling the location from the DB → the DB will not be able to handle these humongous operations.
So the system will crash…
Let’s see the Kafka approach
But first, let's understand Kafka, because this doc is all about Kafka.
Kafka is an open-source distributed event streaming platform used for building real-time data
pipelines and streaming applications
Kafka is designed to handle large volumes of data, allowing seamless communication between systems and applications in
real-time.
It is characterized by its:
1. Scalability
2. Fault tolerance
3. High throughput
These qualities make it a popular choice for organizations across various industries to manage and process streaming data efficiently.
Kafka is often used for use cases such as log aggregation, real-time analytics, monitoring, and messaging systems.
Architecture of a Kafka cluster
[Diagram: mapping of brokers, topics, and partitions inside the cluster]
Understand each unit of kafka
Broker
The architecture of Apache Kafka is designed for distributed, fault-tolerant, and scalable handling of streaming data.
At a high level, there is a cluster containing multiple brokers, where one is a leader and the rest are followers, thereby maintaining replicas of the data. The leader broker hosts a leader partition, where the producer writes and the consumer reads. The rest of the brokers host follower partitions and replicate the same data to different disk locations for fault tolerance.
[Diagram: one leader broker and several follower brokers; the follower brokers contain the same data for fault tolerance, in case we lose the data on the leader broker.]
But still, what is a broker?
Kafka brokers are individual servers within the Kafka cluster. They store and manage data, handle producer and consumer requests, and participate in the replication and distribution of the data.
Understand each unit of kafka
Topic
Partitions are a fundamental aspect of Kafka’s scalability and fault tolerance. A topic in
Kafka is divided into one or more partitions. These partitions allow the messages for a
topic to be spread across multiple nodes in the Kafka cluster, enabling horizontal scaling
of processing. Each partition can be placed on a different server, thus spreading the data
load and allowing for more messages to be processed concurrently.
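For instance, here is a minimal sketch of declaring a topic with multiple partitions from a Spring Boot app using spring-kafka's TopicBuilder (the topic name and the counts are just assumptions for illustration):

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class TopicConfig {

    // Three partitions let the topic's messages be spread across brokers
    // and consumed in parallel; names and counts are illustrative.
    @Bean
    public NewTopic cabLocationTopic() {
        return TopicBuilder.name("cab-location")
                .partitions(3)
                .replicas(1)
                .build();
    }
}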
Understand each unit of kafka
Partition
Partitions also play a crucial role in fault tolerance. In Kafka, each partition can be
replicated across multiple nodes, meaning that copies of the partition’s data are maintained
on different servers. This replication ensures that if a node fails, the data can be retrieved
from another node that has a replica of the partition, thereby minimizing data loss and
downtime.
The number of partitions in a topic directly influences the degree of parallelism in data
processing, as more partitions mean more parallel consumers can read from a topic
concurrently. However, the partition count should be chosen wisely, as too many partitions
can lead to overhead in management and decreased performance due to the increased
number of file handles and network connections.
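As a quick illustration with the same Windows scripts used later in this doc, a topic with both multiple partitions and multiple replicas could be created like this (the topic name and counts are assumptions, and a replication factor of 3 needs at least 3 running brokers):

kafka-topics.bat --create --topic cab-location --partitions 3 --replication-factor 3 --bootstrap-server localhost:9092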
In summary, topics and partitions are central to Kafka’s architecture, providing the
mechanisms for organizing messages and enabling the system to scale out and handle
failures gracefully, thus ensuring reliable data delivery at scale.
Understand each unit of kafka
Zookeeper
In real life, a zookeeper is responsible for the care and management of animals in a zoo,
ensuring their well-being, safety, and maintaining their habitats. Similarly, in the context of
Apache Kafka, Zookeeper is a crucial component responsible for managing and
coordinating the Kafka cluster.
Here are some similarities between a real-life zookeeper and Kafka Zookeeper:
Ensuring Stability and Safety: Both types of zookeepers ensure stability and safety. A
real-life zookeeper ensures the safety and well-being of animals, while Kafka Zookeeper
ensures the stability and reliability of the Kafka cluster by keeping track of its state and
configuration.
Maintaining Order and Organization: Zookeepers in both contexts maintain order and organization. In a zoo, this involves organizing feeding schedules, cleaning habitats, and ensuring proper care routines. In Kafka, Zookeeper maintains the metadata about Kafka topics, brokers, and partitions, ensuring that the cluster remains organized and consistent.
Handling Failures: Both types of zookeepers handle failures. In a zoo, this might involve
responding to medical emergencies or resolving conflicts among animals. In Kafka,
Zookeeper handles failures by electing a leader among Kafka brokers and ensuring that
the cluster can continue to operate even if some nodes fail.
Understand each unit of kafka
Producer
Producers are applications that publish messages to Kafka topics. They create a producer record like this:
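The slide image with the record is not reproduced here, so here is a rough plain-Java sketch instead (the topic, key, and value are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DriverLocationProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A producer record = topic + optional key + value; records with
            // the same key always land in the same partition.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("cab-location", "driver-1", "28.61,77.20");
            producer.send(record);
        }
    }
}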
Understand each unit of kafka
Consumer
In Kafka, consumers team up to handle data from different parts of a topic. This
teamwork helps spread out the work and make things faster. Each consumer in
the team gets assigned specific parts of the topic to focus on. Every message from
those parts goes to just one consumer in the team. Kafka keeps track of who's
doing what, moving things around as needed when new members join or others
leave.
Two main jobs of a Kafka consumer:
1. Consumers keep track of the messages they have processed using an offset, which is a sequential id that uniquely identifies each message within a partition.
2. Consumers can commit these offsets to Kafka, which allows them to resume reading from where they left off in case of a failure or restart, ensuring no message is lost or processed twice.
What are offsets?
They are just the unique id that each message is given.
Kafka has its own offset management (via Zookeeper): it knows which offsets are already filled, so newly produced messages are written starting from the next offset.
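To tie consumer groups and offsets together, here is a minimal plain-Java sketch (the group id, topic name, and manual commit are illustrative choices, not the only way to do it):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class UserLocationConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "cab-users");       // consumers sharing this id split the partitions
        props.put("enable.auto.commit", "false"); // we commit offsets ourselves below
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("cab-location"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync(); // committed offsets let a restart resume from here
            }
        }
    }
}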
Enough with the theory of our beloved Kafka 🥹! :)
Now we are going to run Kafka on our local machine.
There are two steps to start Kafka:
1. Start Zookeeper. Paste the command below in your first cmd window:
zookeeper-server-start.bat ..\..\config\zookeeper.properties
2. Start the Kafka server. Paste the command below in your 2nd cmd window:
kafka-server-start.bat ..\..\config\server.properties
Note: do not close those two cmd windows.
To see all topics, paste the command below in your 3rd cmd window:
kafka-topics.bat --describe --bootstrap-server localhost:9092
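The matching console-producer command is not shown in the original slides; the standard script for the same topic looks like this (older Kafka versions use --broker-list instead of --bootstrap-server):

kafka-console-producer.bat --bootstrap-server localhost:9092 --topic vipul-D-topic

Every line you type into this window is sent as one message to the topic.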
Consumer
kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic vipul-D-topic --from-beginning
O/P
>message1
>message2
>message3
Output: as soon as we send data from the producer, it is consumed by the consumer.
[Screenshot: producer and consumer cmd windows side by side]
That’s all for Kafka on our local machine.
Let’s take the example again
When a user books a cab on the cab booking app, a driver is assigned to that user, and the user then gets constant location updates of the driver.
[Diagram: several drivers, each streaming location updates to their assigned user]
Let’s implement this using Spring Boot.
Create two Spring Boot projects: Cab-book-driver and Cab-book-user.
Add the same dependency in both projects:
Maven: <dependency>
<groupId>org.springframework.kafka</groupId>
<artifactId>spring-kafka</artifactId>
</dependency>
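Besides the dependency, both projects need to know where the broker runs. A minimal application.properties sketch, assuming the local broker started earlier (the group id is an illustrative name):

# Cab-book-driver and Cab-book-user
spring.kafka.bootstrap-servers=localhost:9092

# Cab-book-user only
spring.kafka.consumer.group-id=cab-users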
Create a configuration class in the Cab-book-driver project:
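The original screenshot of the class is not included here, so below is a minimal sketch of what it might look like: a topic declaration plus a scheduled publisher that pushes a dummy location every second (the topic name, key, and coordinates are assumptions, and @Scheduled also needs @EnableScheduling on the application class):

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class DriverKafkaConfig {

    // Topic the driver publishes its location to
    @Bean
    public NewTopic cabLocationTopic() {
        return TopicBuilder.name("cab-location").partitions(3).replicas(1).build();
    }
}

A simple publisher service in the same project:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class LocationPublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public LocationPublisher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Push a dummy location every second, like the driver in our example
    @Scheduled(fixedRate = 1000)
    public void publishLocation() {
        kafkaTemplate.send("cab-location", "driver-1", "lat=28.61,lng=77.20");
    }
}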
Here we are only printing the locations that were sent by our driver:
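The consumer-side screenshot is also missing; here is a minimal listener sketch for the Cab-book-user project (same illustrative topic and group id as above):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class LocationListener {

    // Print every location message the driver publishes
    @KafkaListener(topics = "cab-location", groupId = "cab-users")
    public void onLocation(String location) {
        System.out.println("Driver location: " + location);
    }
}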
After that, build and run the consumer (Cab-book-user) project.
Thanks for your time. Keen scrutiny helps in detecting errors, so if you have suggestions for further improvement, I kindly request that you correct me rather than making fun of the mistakes.