Kafka Basics and Core Concepts | by Aritra Das | inspiringbrilliance | Medium
https://medium.com/inspiredbrilliance/kafka-basics-and-core-concepts-5fd7a68c3193
Source: https://www.confluent.io/
Introduction
Let’s start by answering the question “What is Kafka?”.
Distributed
Kafka works as a cluster of one or more nodes that can live in different
datacenters. We can distribute data and load across the nodes of the Kafka
cluster, and it is inherently scalable, available, and fault-tolerant.
Streaming Platform
Commit Log
This one is my favorite. When you push data to Kafka, it appends the records
to a stream, like appending lines to a log file or, if you are from a database
background, like the WAL. This stream of data can be replayed or read from
any point in time.
Having said all of that, Kafka is commonly used for real-time streaming data
pipelines, i.e. to transfer data between systems, and for building systems
that transform or react to streams of data.
Message
A message is the atomic unit of data in Kafka. Let’s say that you are building
a log monitoring system and you push each log record into Kafka; your log
message is a JSON with a fixed structure, say two keys, “level” and “message”.
When you push this JSON into Kafka you are actually pushing one message.
Kafka saves this JSON as a byte array, and that byte array is a message for
Kafka. That is the atomic unit: a JSON having two keys, “level” and
“message”. But this does not mean you can’t push anything else into Kafka;
you can push strings, integers, JSONs of different schemas, and anything
else, but we generally push different types of messages into different topics
(we will get to know what a topic is soon).
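As a quick sketch of this idea (plain Python, no Kafka client involved; the field values are illustrative, following the logging example above), serializing a message to a byte array and back looks like:

```python
import json

# A log message like the one described above, with "level" and "message" keys.
log_message = {"level": "ERROR", "message": "Connection refused by database"}

# Kafka stores every message as a byte array; a common approach is to
# serialize the JSON to UTF-8 bytes before producing it.
payload: bytes = json.dumps(log_message).encode("utf-8")

# A consumer would decode the bytes back into the original structure.
decoded = json.loads(payload.decode("utf-8"))
assert decoded == log_message
```

To Kafka itself, `payload` is opaque bytes; the JSON structure only matters to the producers and consumers that agree on it.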
Topic
Topics, as the name suggests, are the logical categories of messages in Kafka,
a stream of the same type of data. Going back to our previous example of the
logging system, let’s say our system generates application logs, ingress logs,
and database logs and pushes them to Kafka for other services to consume.
Now, these three types of logs can logically be divided into three topics:
appLogs, ingressLogs, and dbLogs. We can create these three topics in Kafka;
whenever there’s an app log message, we push it to the appLogs topic, and for
database logs, we push to the dbLogs topic. This way we have logical
segregation between messages, sort of like having different tables for
holding different types of data.
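This logical segregation can be sketched as a toy in-memory model (illustrative only, not how Kafka actually stores data; the topic names follow the logging example):

```python
# A toy model of topics: each topic is just a named, append-only list.
topics = {"appLogs": [], "ingressLogs": [], "dbLogs": []}

def publish(topic_name, message):
    """Append a message to the logically separate stream for its category."""
    topics[topic_name].append(message)

publish("appLogs", "user login succeeded")
publish("dbLogs", "slow query: 1.2s")

# Messages stay segregated by category, like rows in different tables.
assert topics["appLogs"] == ["user login succeeded"]
assert topics["ingressLogs"] == []
```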
Partitions
A partition is analogous to a shard in a database and is the core concept
behind Kafka’s scaling capabilities. Let’s say that our system becomes really
popular and there are now millions of log messages per second, so the node
holding the appLogs topic can no longer hold all the data that is coming in.
We could initially solve this by adding more storage to the node, i.e.
vertical scaling, but as we all know, vertical scaling has its limits. Once
that threshold is reached we need to scale horizontally, which means adding
more nodes and splitting the data between them. When we split the data of a
topic into multiple streams, we call each of those smaller streams a
“partition” of that topic.
This image depicts the idea of partitions: a single topic has 4 partitions,
and each of them holds a different set of data. The blocks you see here are
the different messages in that partition. Imagine the topic as an array; due
to memory constraints we have split the single array into 4 smaller arrays.
When we write a new message to the topic, the relevant partition is selected
and the message is appended at the end of that array.
The offset of a message is its index in that array. The numbers on the blocks
in the picture denote the offset: the first block is at the 0th offset and
the last block would be at the (n-1)th offset. The performance of the system
also depends on how you set up partitions; we will look into that later in
the article. (Note that inside Kafka it is not an actual array but a symbolic
one.)
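The array analogy can be made concrete with a small sketch (a toy model of a partitioned topic, not Kafka's actual storage):

```python
# A toy partitioned topic: one list per partition. A message's offset is
# simply its index within the partition it was appended to.
NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]

def append(partition_id, message):
    """Append to the chosen partition and return the new message's offset."""
    partitions[partition_id].append(message)
    return len(partitions[partition_id]) - 1

first = append(0, "msg-a")   # offset 0 in partition 0
second = append(0, "msg-b")  # offset 1 in partition 0
other = append(3, "msg-c")   # offset 0 in partition 3: offsets are per-partition

assert (first, second, other) == (0, 1, 0)
```

Note that offsets restart from 0 in every partition; an offset only identifies a message together with its topic and partition.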
Producer
A producer is the Kafka client that publishes messages to a Kafka topic. Also
one of the core responsibilities of the Producer is to decide which partition
to send the messages to. Depending on various configurations and parameters,
the producer decides the destination partition. Let’s look a bit more into
this.
1. No Key Specified => When no key is given, the producer spreads messages
across the partitions, classically in a round-robin fashion.
2. Key Specified => When a key is specified with the message, the producer
hashes the key to pick a partition (the default Java client uses murmur2
modulo the number of partitions), so the same key always maps to the same
partition. Note that because this is a plain hash-modulo scheme, changing the
number of partitions changes the key-to-partition mapping. So let’s say in
our logging system we use the source node ID as the key; then the logs for
the same node will always go to the same partition. This is very relevant for
the order guarantees of messages in Kafka, as we will shortly see.
3. Partition Specified => You can hardcode the destination partition as well.
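A minimal sketch of key-based partition selection follows. It uses md5 purely to keep the example dependency-free and deterministic; the real Java client hashes the key bytes with murmur2, but the property that matters (same key, same partition) is identical:

```python
import hashlib

NUM_PARTITIONS = 4

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically (hash modulo partition count)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always lands on the same partition, so logs from one
# source node keep their relative order within that partition.
assert partition_for("node-42") == partition_for("node-42")
assert 0 <= partition_for("node-7") < NUM_PARTITIONS
```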
Consumer
So far we have produced messages; to read them we use a Kafka consumer. A
consumer reads messages from a partition in order, so if messages 1, 2, 3, 4
were written to a partition, the consumer will read them in that same order.
Since every message has an offset, every time a consumer reads a message it
stores the offset value in Kafka (or ZooKeeper, in older setups), denoting
the last message the consumer read. So if a consumer node goes down, it can
come back and resume from the last read position.
Also, if at any point a consumer needs to go back in time and read older
messages, it can do so by simply resetting its offset position.
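Offset tracking and replay can be sketched with a toy single-partition consumer (illustrative only; a real consumer commits offsets back to Kafka rather than a local variable):

```python
# One partition of messages, and the consumer's last committed read position.
partition = ["m0", "m1", "m2", "m3", "m4"]
committed_offset = -1  # nothing consumed yet

def poll_one():
    """Read the next message and commit its offset, or return None at the end."""
    global committed_offset
    next_offset = committed_offset + 1
    if next_offset >= len(partition):
        return None
    message = partition[next_offset]
    committed_offset = next_offset  # commit: remember the last read position
    return message

assert poll_one() == "m0"
assert poll_one() == "m1"

# Simulate a crash and restart: the committed offset survives, so
# consumption resumes at "m2" instead of starting over.
assert poll_one() == "m2"

# Replaying older messages is just resetting the offset.
committed_offset = -1
assert poll_one() == "m0"
```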
Consumer Group
A consumer group is a collection of consumers that work together to read
messages from a topic. There are some very interesting concepts here, let’s
go through them.
1. Fan out => Multiple consumer groups can subscribe to the same topic, and
each group independently receives every message. Say you have an OTP service
and you need to send both a text and an email OTP. The OTP service can put
the OTP in Kafka, and then the SMS Service consumer group and the Email
Service consumer group can both receive the message and send the SMS and
email out.
2. Order guarantee => Now, we have seen that a topic can be partitioned and
that multiple consumers can consume from the same topic, so how do you
maintain the order of messages on the consumer end, one might ask. Good
question. Within a consumer group, one partition cannot be read by multiple
consumers: only one consumer in the group gets to read from a given
partition, and that is exactly what preserves per-partition order. Let me
explain.
This would not be possible if the same partition had multiple consumers in
the same group. If the same partition is read by consumers that belong to
different groups, then each consumer group still sees the messages in order.
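Both behaviors, fan-out across groups and exclusive partition ownership within a group, can be sketched in a toy model (the round-robin assignment here is a simplification of Kafka's real partition assignors, and the group and consumer names are illustrative):

```python
# Two partitions of one topic, each holding one OTP message.
partitions = {0: ["otp:1234"], 1: ["otp:5678"]}

def assign(partition_ids, consumers):
    """Round-robin partitions to consumers: each partition gets exactly one owner."""
    owners = {c: [] for c in consumers}
    for i, p in enumerate(partition_ids):
        owners[consumers[i % len(consumers)]].append(p)
    return owners

sms_group = assign([0, 1], ["sms-1", "sms-2"])
email_group = assign([0, 1], ["email-1"])

# Inside one group, no partition has two owners: per-partition order holds.
assert sms_group == {"sms-1": [0], "sms-2": [1]}

# A lone consumer in another group simply owns all partitions.
assert email_group == {"email-1": [0, 1]}

def read_all(owners):
    """Every message reachable through a group's assignment, sorted for comparison."""
    return sorted(m for ps in owners.values() for p in ps for m in partitions[p])

# Both groups read the full topic: the SMS service and the email service
# each receive every OTP message.
assert read_all(sms_group) == read_all(email_group) == ["otp:1234", "otp:5678"]
```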
Broker
A broker is a single Kafka server. Brokers receive messages from producers,
assign offsets to them, and commit them to the partition log, which basically
means writing the data to disk; this gives Kafka its durable nature.
Cluster
A Kafka cluster is a group of broker nodes working together to provide
scalability, availability, and fault tolerance. One of the brokers in the
cluster works as the Controller, which assigns partitions to brokers,
monitors for broker failures, and handles certain other administrative tasks.
Let’s say we create a topic with 5 partitions and a replication factor of 3.
Let’s take partition 0: the leader node for this partition is node 2, and the
data for this partition is replicated on nodes 2, 5, and 1. So one partition
is replicated on 3 nodes, and this behavior is repeated for all 5 partitions.
Also, if you look closely, all the leader nodes for the partitions are
different: to utilize the nodes properly, the Kafka Controller broker
distributes the partitions evenly across all nodes. The replicas are also
evenly distributed, so no node is overloaded. All of this is done by the
Controller broker with the help of ZooKeeper.
Now that you understand clustering, you can see that to scale further we
could partition a topic even more and add a dedicated consumer node for each
partition; that is how we scale horizontally.
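The even spread of leaders and replicas described above can be sketched with a toy placement scheme (simple round-robin, in the spirit of the layout in the figure; Kafka's actual replica placement logic is more involved):

```python
from collections import Counter

# 5 partitions, replication factor 3, across 5 brokers: the leader of
# partition p sits on broker p % 5, followers on the next brokers round-robin.
NUM_BROKERS, NUM_PARTITIONS, REPLICATION_FACTOR = 5, 5, 3

placement = {
    p: [(p + r) % NUM_BROKERS for r in range(REPLICATION_FACTOR)]
    for p in range(NUM_PARTITIONS)
}

# The first replica in each list is the leader; leaders land on different
# brokers, so no single broker handles all the writes.
leaders = [replicas[0] for replicas in placement.values()]
assert sorted(leaders) == [0, 1, 2, 3, 4]

# Each broker hosts the same number of replicas: the load is even.
replica_counts = Counter(b for replicas in placement.values() for b in replicas)
assert set(replica_counts.values()) == {3}
```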
Zookeeper
Kafka does not function without ZooKeeper (at least for now; there are plans
to remove the ZooKeeper dependency in the near future). ZooKeeper works as
the central configuration and consensus management system for Kafka: it
tracks the brokers, topics, partition assignments, and leader election,
basically all the metadata about the cluster.
And these, my friend, were the basic and core concepts of Kafka.
Beyond basics
There are a few more, slightly advanced things that you should know. I won’t
go into detail and will just touch upon them, because I don’t want to
overload you with too much information in one shot.
Producer
Synchronous send
Asynchronous send
ACK 0: Consider the message sent as soon as it is handed off, without waiting
for any acknowledgment | FASTEST
ACK 1: Consider the message sent when the leader broker has received it | FASTER
ACK All: Consider the message sent when all replicas have received it | FAST
You can compress and batch messages on the producer before sending them to
the broker. This gives higher throughput and lower disk usage, but raises CPU
usage.
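Batching plus compression can be illustrated without a broker. Here gzip stands in for the producer-side compression codecs Kafka supports (gzip, snappy, lz4, zstd); a batch of similar messages compresses well because the records share structure:

```python
import gzip
import json

# A batch of similar log messages, as a producer might accumulate them.
batch = [{"level": "INFO", "message": f"request {i} served"} for i in range(100)]

raw = json.dumps(batch).encode("utf-8")
compressed = gzip.compress(raw)

# The producer would send the compressed batch in one go; the receiving
# side decompresses it back into the original records.
assert json.loads(gzip.decompress(compressed)) == batch
assert len(compressed) < len(raw)  # fewer bytes on the wire and on disk
```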
If you use Avro as the serializer/deserializer instead of plain JSON, you
will have to declare your schema upfront, but you get better performance and
lower storage use in return.
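The space saving from declaring a schema upfront can be illustrated without the Avro library: with JSON every message carries its field names, while a schema-based encoding ships only the values. A fixed struct layout stands in here for a real Avro schema (the record fields are illustrative):

```python
import json
import struct

# With JSON, every message repeats its field names on the wire.
record = {"level": 2, "timestamp": 1700000000}
as_json = json.dumps(record).encode("utf-8")

# With a schema agreed upfront, only the values are encoded:
# here, a 1-byte level and an 8-byte timestamp.
SCHEMA = struct.Struct(">Bq")
as_binary = SCHEMA.pack(record["level"], record["timestamp"])

assert len(as_binary) < len(as_json)  # 9 bytes vs 37 bytes here
assert SCHEMA.unpack(as_binary) == (2, 1700000000)
```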
Consumer
Poll loop
The Kafka consumer constantly polls data from the broker, not the other way
round.
Batch size
We can configure how many records and how much data is returned per poll
call.
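A toy poll loop bounded by a batch size (mirroring, loosely, the consumer's `max.poll.records` setting) can be sketched like this:

```python
# One partition of messages and the consumer's current read position.
partition = [f"m{i}" for i in range(7)]
position = 0

def poll(max_records):
    """Return up to max_records messages, advancing the read position."""
    global position
    batch = partition[position:position + max_records]
    position += len(batch)
    return batch

assert poll(3) == ["m0", "m1", "m2"]
assert poll(3) == ["m3", "m4", "m5"]
assert poll(3) == ["m6"]  # fewer records remained than the batch size
assert poll(3) == []      # nothing left; a real consumer would keep polling
```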
Commit offset
When a message is read, we can update the offset position for the consumer;
this is called committing the offset. Auto-commit can be enabled, or the
application can commit the offset explicitly. This can be done both
synchronously and asynchronously.
Ending notes
Kafka is a great piece of software with tons of capabilities, and it can be
used across a wide range of use cases. Kafka fits great into modern-day
distributed systems because it is distributed by design. It was originally
created at LinkedIn and is currently maintained by Confluent. It is used by
top tech companies like Uber, Netflix, Activision, Spotify, Slack, Pinterest,
and Coursera. We looked into the core concepts of Kafka to get you started;
there are plenty of other things, like the Kafka Streams API or ksqlDB, that
we did not talk about in the interest of time.
References
1. Kafka the Definitive Guide
2. https://www.confluent.io/blog/apache-kafka-intro-how-kafka-works/
Happy learning 😃