
Kafka

Kafka is a distributed publish-subscribe messaging system that uses messages as the fundamental data unit, consisting of keys, values, offsets, and timestamps. It organizes messages into topics, which can be regular or compacted, and supports producers that send messages and consumers that retrieve them. Kafka brokers manage data storage and communication within a cluster, ensuring reliability and scalability through partitioning and replication.

Uploaded by Khả Võ Văn
Copyright © All Rights Reserved

Kafka Introduction

Kafka is a distributed publish-subscribe messaging system written in Scala and Java that runs on the Java Virtual Machine (JVM). Client libraries are available for many other programming languages.
Message
A message is the fundamental unit of data in Apache Kafka. It represents a record of information that is produced by producers and consumed by consumers in the Kafka system.
Structure of a Kafka message:
-Key: used to determine the partition of the message. All messages with the same key are sent to the same partition (as long as the number of partitions does not change). The key is optional.
-Value: the actual data, or payload, of the message. Kafka treats the value as opaque bytes, so it can hold data in any format.
-Offset: a unique, sequential number assigned to each message within a partition. The offset identifies the position of a message in that partition.
-Timestamp: the time at which the message was produced (or, if the topic is configured for it, the time at which the broker appended it to the log).
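To make the four fields concrete, here is a minimal Python sketch, assuming a purely in-memory model. `Message` and `PartitionLog` are hypothetical names for illustration, not Kafka client classes; the point is that the offset is assigned when a message is appended to a partition, and that consumers read by offset.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    """Toy model of a Kafka message: key, value, offset, timestamp."""
    key: Optional[bytes]  # optional; used to choose the partition
    value: bytes          # opaque payload
    offset: int           # position within the partition, assigned on append
    timestamp_ms: int     # creation time in milliseconds since the epoch


class PartitionLog:
    """Hypothetical in-memory stand-in for one partition: an append-only list."""

    def __init__(self) -> None:
        self._messages: list[Message] = []
        self._next_offset = 0

    def append(self, key: Optional[bytes], value: bytes,
               timestamp_ms: Optional[int] = None) -> Message:
        # The log, not the producer, assigns the next sequential offset.
        if timestamp_ms is None:
            timestamp_ms = int(time.time() * 1000)
        msg = Message(key, value, self._next_offset, timestamp_ms)
        self._messages.append(msg)
        self._next_offset += 1
        return msg

    def read_from(self, offset: int) -> list[Message]:
        # Return the messages at or after the given offset, in offset order.
        return [m for m in self._messages if m.offset >= offset]
```

For example, appending two messages to an empty `PartitionLog` gives them offsets 0 and 1, and `read_from(1)` returns only the second one.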
TOPIC and PARTITIONS
A topic is a named category, or channel, in which messages are stored and through which they are transmitted from producers to consumers.
Kafka supports two types of topics:
-Regular topic: can be configured with a retention time, a size bound, or both. When messages are older than the retention time, or a partition exceeds the size bound, Kafka may delete those messages to free space. By default, topics are configured with a retention time of 7 days, but it is also possible to store data indefinitely.
-Compacted topic: messages are not deleted based on retention time or size. Instead, Kafka treats a later message as an update to earlier messages with the same key and guarantees never to delete the latest message for each key. Only the older messages with the same key are removed, so the latest version of each key is always kept.
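The two cleanup behaviors can be contrasted in a small Python sketch. It assumes an in-memory list of records; `apply_retention` and `compact` are hypothetical helpers that loosely mirror the topic settings `cleanup.policy=delete` (with `retention.ms` and `retention.bytes`) and `cleanup.policy=compact`. Real Kafka cleans up whole log segments in the background rather than individual records, so treat this as a model of the rules, not of the implementation.

```python
from typing import NamedTuple

SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000  # 604800000 ms, Kafka's default retention.ms


class Record(NamedTuple):
    offset: int
    key: str
    value: str
    timestamp_ms: int
    size_bytes: int


def apply_retention(records: list[Record], now_ms: int,
                    retention_ms: int = SEVEN_DAYS_MS,
                    retention_bytes: int = -1) -> list[Record]:
    """Regular topic: drop records that break the time or size bound (-1 = no limit).

    Simplification: real Kafka deletes whole log segments, not single records.
    """
    kept = list(records)
    if retention_ms >= 0:
        # Time bound: drop records older than retention_ms.
        kept = [r for r in kept if now_ms - r.timestamp_ms <= retention_ms]
    if retention_bytes >= 0:
        # Size bound: drop the oldest records until the partition fits.
        total = sum(r.size_bytes for r in kept)
        while kept and total > retention_bytes:
            total -= kept.pop(0).size_bytes
    return kept


def compact(records: list[Record]) -> list[Record]:
    """Compacted topic: keep only the latest record per key, offsets unchanged."""
    latest: dict[str, int] = {}
    for r in records:
        latest[r.key] = r.offset  # a later record with the same key supersedes earlier ones
    return [r for r in records if latest[r.key] == r.offset]
```

Note that compaction keeps the original offsets of the surviving records; it leaves gaps rather than renumbering them.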
Each topic is split into multiple partitions, which give Kafka its parallelism and scalability: the partitions of a topic can be distributed across different brokers, and consumers can read from multiple partitions in parallel. Message order is guaranteed only within a partition, not across the whole topic.
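The parallelism that partitions provide can be sketched in Python with a thread pool, assuming a topic modeled as a dictionary from partition number to an ordered list of messages. The names `consume_partition` and `consume_topic_in_parallel` are hypothetical, and upper-casing stands in for real processing.

```python
from concurrent.futures import ThreadPoolExecutor


def consume_partition(partition_id: int,
                      messages: list[str]) -> tuple[int, list[tuple[int, str]]]:
    # Inside one partition, messages are handled strictly in offset order.
    processed = [(offset, msg.upper()) for offset, msg in enumerate(messages)]
    return partition_id, processed


def consume_topic_in_parallel(
        topic: dict[int, list[str]]) -> dict[int, list[tuple[int, str]]]:
    # One worker per partition: partitions are independent, so they can run side by side.
    with ThreadPoolExecutor(max_workers=max(1, len(topic))) as pool:
        futures = [pool.submit(consume_partition, pid, msgs)
                   for pid, msgs in topic.items()]
        return dict(f.result() for f in futures)


# A hypothetical topic with three partitions of different lengths.
topic = {0: ["a0", "a1", "a2"], 1: ["b0", "b1"], 2: ["c0", "c1", "c2", "c3"]}
```

Each partition's results come back in offset order, but nothing orders the work of one partition relative to another, which matches the ordering guarantee Kafka gives.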
PRODUCERS
A producer is a client process that publishes (sends) messages to Kafka topics. Producers are responsible for delivering data to Kafka reliably and at scale; for example, a producer can wait for the brokers to acknowledge a write before treating the message as sent.
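A producer specifies the topic to which each message should be sent. The partition within that topic is chosen in one of three ways: the producer names the partition explicitly, the partition is derived from the message key, or, when the message has no key, a default partitioning strategy spreads messages across the partitions.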
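The three ways of choosing a partition can be sketched as a small Python class. `HypotheticalPartitioner` is an illustrative name, not a Kafka API, and it makes two simplifying assumptions: it hashes keys with `zlib.crc32`, whereas Kafka's Java client uses a murmur2 hash, and it rotates keyless messages round-robin, as older clients did, while newer clients use a "sticky" strategy that fills a batch for one partition before moving to the next.

```python
import itertools
import zlib
from typing import Optional


class HypotheticalPartitioner:
    """Toy partitioner: explicit partition, else key hash, else round-robin."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._next_keyless = itertools.cycle(range(num_partitions))

    def choose(self, key: Optional[bytes] = None,
               partition: Optional[int] = None) -> int:
        if partition is not None:
            # 1. The producer named the partition explicitly.
            if not 0 <= partition < self.num_partitions:
                raise ValueError(f"partition {partition} does not exist")
            return partition
        if key is not None:
            # 2. Same key -> same hash -> same partition.
            #    crc32 is a stand-in for the murmur2 hash in Kafka's Java client.
            return zlib.crc32(key) % self.num_partitions
        # 3. No key: rotate through the partitions (simplified round-robin).
        return next(self._next_keyless)
```

Because the key hash is taken modulo the partition count, adding partitions to a topic changes where existing keys land, which is why the same-key guarantee holds only while the number of partitions stays fixed.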
CONSUMERS
A consumer is a client process that reads the messages stored in topics.
A consumer subscribes to one or more topics from which it wants to consume messages. Consumers pull data from Kafka brokers and process it in the order in which it was stored within each partition.
Consumers that share the same group ID form a consumer group. Each consumer in the group is assigned different partitions, so no two consumers in the same group read the same partition at the same time.
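How a group's partitions might be divided can be sketched with a simple round-robin rule in Python. The function `assign_round_robin` is hypothetical; it is similar in spirit to Kafka's round-robin assignor, but Kafka also ships range and sticky assignors and coordinates the assignment through the group protocol, none of which is modeled here.

```python
def assign_round_robin(partitions: list[int],
                       consumers: list[str]) -> dict[str, list[int]]:
    """Deal partitions out to the group's consumers in turn; each partition gets one owner."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    if not consumers:
        return assignment
    members = sorted(consumers)  # a stable order so every member computes the same result
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment
```

If the group has more consumers than partitions, the extra consumers receive no partitions and stay idle.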
After successfully processing messages, the consumer commits its offset to Kafka. The committed offset records which messages it has already processed, so that after a restart or failure it can resume from where it left off.
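The pull-then-commit cycle can be sketched in Python, assuming a partition modeled as a list whose indexes are its offsets (nothing has been deleted) and a plain dictionary standing in for the offsets Kafka stores in its internal `__consumer_offsets` topic. `consume_batch` is a hypothetical helper. Following Kafka's convention, the committed value is the offset of the next message to read.

```python
def consume_batch(log: list[str], committed: dict[tuple[str, int], int],
                  group: str, partition: int, max_records: int = 2) -> list[str]:
    """Pull one batch for (group, partition), process it, then commit the next offset."""
    position = committed.get((group, partition), 0)   # resume after the last commit
    batch = log[position:position + max_records]      # pull: the consumer asks for data
    processed = [record.upper() for record in batch]  # stand-in for real processing
    # Commit only after processing. The committed value is the offset of the
    # next record to read, i.e. last processed offset + 1.
    committed[(group, partition)] = position + len(batch)
    return processed
```

Committing only after processing means a crash between the two steps causes the batch to be processed again on restart (at-least-once delivery); committing before processing would instead risk skipping messages. Offsets are tracked per group, so a second group reads the same partition from the beginning independently.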
BROKERS and CLUSTERS
A Kafka broker is a server (node) within a Kafka cluster that is responsible for storing data and handling communication between producers and consumers. It receives data from producers, stores it in Kafka topics, and sends data to consumers when they request it.
Brokers also manage the partitions of each topic and ensure that the data is distributed across the cluster.
Each partition is replicated to multiple brokers, so the data remains available even if some brokers fail.
Each broker is identified by a unique ID.
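Spreading replicas so that each copy of a partition sits on a different broker can be sketched in Python. `assign_replicas` is a hypothetical, simplified round-robin placement keyed by broker ID; Kafka's own placement logic is more involved (for example, it can take racks into account). The first broker in each list plays the role Kafka calls the preferred leader.

```python
def assign_replicas(num_partitions: int, broker_ids: list[int],
                    replication_factor: int) -> dict[int, list[int]]:
    """Place each partition's replicas on distinct brokers, rotating the starting broker."""
    if not 1 <= replication_factor <= len(broker_ids):
        raise ValueError("replication factor must be between 1 and the number of brokers")
    brokers = sorted(broker_ids)
    n = len(brokers)
    placement: dict[int, list[int]] = {}
    for p in range(num_partitions):
        # Consecutive brokers (wrapping around) are always distinct while rf <= n.
        placement[p] = [brokers[(p + r) % n] for r in range(replication_factor)]
    return placement
```

With three brokers and a replication factor of 2, every broker leads one partition and follows another, so losing any single broker still leaves a copy of every partition.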
For each partition, one broker acts as the leader and the brokers holding the other replicas act as followers. The leader handles read and write requests for the partition, while the followers replicate the leader’s data. If the leader goes down, a new leader is elected from the followers that are fully in sync with it.
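Leader failover for a single partition can be sketched in Python. `PartitionState` and `handle_broker_failure` are hypothetical names, and the sketch assumes the behavior Kafka uses by default: a new leader is chosen only from the in-sync replicas (ISR), the followers that have fully caught up with the leader, so the new leader already holds every committed message. Picking the first remaining in-sync follower is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionState:
    leader: int
    isr: tuple[int, ...]  # in-sync replicas (brokers fully caught up), leader included


def handle_broker_failure(state: PartitionState, failed_broker: int) -> PartitionState:
    """Drop the failed broker from the ISR and, if it led the partition, promote a follower."""
    isr = tuple(b for b in state.isr if b != failed_broker)
    if state.leader != failed_broker:
        return PartitionState(state.leader, isr)  # a follower failed; leader unchanged
    if not isr:
        # No in-sync follower remains: this sketch refuses to elect a stale replica.
        raise RuntimeError("no in-sync replica available; partition is offline")
    return PartitionState(isr[0], isr)  # promote the first remaining in-sync follower
```

Refusing to promote an out-of-sync replica trades availability for consistency: the partition stays offline rather than risk losing messages that only the failed leader had.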
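A cluster is a distributed system consisting of multiple brokers working together. The cluster uses ZooKeeper to manage cluster state, keep track of which broker is the leader of each partition, and monitor the health of brokers. Newer Kafka versions can replace ZooKeeper with KRaft, a consensus protocol built into Kafka itself, and Kafka 4.0 removes ZooKeeper support entirely.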
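Health monitoring can be sketched as a session timeout. With ZooKeeper, each broker keeps a session open and its registration disappears when that session expires; in KRaft mode, brokers send periodic heartbeats to the controller. The Python class below is a hypothetical model of that shared idea, with `ToyClusterMonitor`, its methods, and the timeout value all assumed for illustration.

```python
class ToyClusterMonitor:
    """Hypothetical liveness tracker: a broker is alive while its heartbeats are recent."""

    def __init__(self, session_timeout_ms: int) -> None:
        self.session_timeout_ms = session_timeout_ms
        self.last_heartbeat: dict[int, int] = {}  # broker id -> last heartbeat time (ms)

    def heartbeat(self, broker_id: int, now_ms: int) -> None:
        # Record that the broker was seen alive at now_ms.
        self.last_heartbeat[broker_id] = now_ms

    def live_brokers(self, now_ms: int) -> list[int]:
        # Brokers heard from within the session timeout.
        return sorted(b for b, t in self.last_heartbeat.items()
                      if now_ms - t <= self.session_timeout_ms)

    def failed_brokers(self, now_ms: int) -> list[int]:
        # Brokers whose session has expired; their partitions would need new leaders.
        return sorted(b for b, t in self.last_heartbeat.items()
                      if now_ms - t > self.session_timeout_ms)
```

For example, with a 6000 ms timeout, a broker last heard from at time 0 is reported as failed at time 7000, while one that sent a heartbeat at 5000 is still live.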
KAFKA ARCHITECTURE
