Kafka is a distributed publish-subscribe messaging system that uses messages as the fundamental data unit, consisting of keys, values, offsets, and timestamps. It organizes messages into topics, which can be regular or compacted, and supports producers that send messages and consumers that retrieve them. Kafka brokers manage data storage and communication within a cluster, ensuring reliability and scalability through partitioning and replication.
Kafka
Kafka Introduction
Kafka is a distributed publish-subscribe messaging system written in Scala and Java. It runs on the Java Virtual Machine (JVM) and offers client libraries for many languages.

Message
A message is the fundamental data unit in Apache Kafka. It represents a record of information that is produced by producers and consumed by consumers in a Kafka system.
Structure of a Kafka message:
-Key: determines the partition of the message. With the default partitioning strategy, all messages with the same key are sent to the same partition.
-Value: the actual data (payload) of the message. The value can be any form of data.
-Offset: a unique sequential number assigned to each message in a partition. The offset identifies the position of a message within its partition.
-Timestamp: records the time when the message was produced.

TOPIC and PARTITIONS
A topic is a category or channel in which messages are stored and through which they pass from producers to consumers. Kafka supports two types of topic:
-Regular topic: can be configured with a retention time or a space bound. When messages are older than the configured retention time, or the space bound for a partition is exceeded, Kafka is allowed to delete those messages to free space. By default, topics are configured with a retention time of 7 days, but it is also possible to store data indefinitely.
-Compacted topic: messages are not deleted based on retention time or space bound. Instead, Kafka treats later messages as updates to earlier messages with the same key and guarantees never to delete the latest message per key. Only older messages with the same key are removed, so the latest version of each key is kept.
A topic is split into multiple partitions. Partitions give Kafka parallelism and scalability: consumers can read in parallel from multiple partitions distributed across different brokers.

PRODUCERS
A producer is a client process that publishes (sends) messages to Kafka topics. Producers are responsible for sending data to Kafka in a reliable, distributed and scalable manner.
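To make the message structure, key-based partitioning and compaction described above more concrete, here is a minimal in-memory sketch. It is a toy model and does not use any Kafka client API: the names `Message`, `Topic`, `partition_for` and `compact` are hypothetical, and `zlib.crc32` is only a stand-in hash (Kafka's Java client hashes keys with murmur2).

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Message:
    key: bytes
    value: bytes
    timestamp_ms: int
    offset: int  # position inside its partition, assigned on append


@dataclass
class Topic:
    name: str
    num_partitions: int
    partitions: List[List[Message]] = field(init=False)

    def __post_init__(self) -> None:
        # One append-only log per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Assumption: crc32 stands in for the real hash function.
        # The point is only that equal keys map to the same partition.
        return zlib.crc32(key) % self.num_partitions

    def append(self, key: bytes, value: bytes, timestamp_ms: int) -> Tuple[int, int]:
        """Append a message and return (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        msg = Message(key, value, timestamp_ms, offset=len(log))
        log.append(msg)
        return partition, msg.offset


def compact(log: List[Message]) -> List[Message]:
    """Keep only the latest message per key, as a compacted topic would.

    Offsets of the surviving messages are left unchanged.
    """
    latest = {}
    for msg in log:
        latest[msg.key] = msg  # a later message replaces an earlier one
    return sorted(latest.values(), key=lambda m: m.offset)
```

In this sketch, two calls to `append` with the same key always land in the same partition and receive consecutive offsets there, while `compact` drops every message that has been superseded by a later one with the same key.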
A producer can specify the topic, and the partition of that topic, to which a message should be sent, either by specifying a key or by using a default partitioning strategy.

CONSUMERS
A consumer is a client process that consumes the messages stored in topics. A consumer must subscribe to one or more topics from which it wants to consume messages. Consumers pull data from Kafka brokers and process it in the order in which it was stored in each of the topic's partitions. Each consumer in a consumer group is assigned different partitions, ensuring that no two consumers in the same group read the same partition simultaneously. After successfully processing a message, the consumer commits its offset to Kafka. This lets the consumer keep track of which messages it has processed.

BROKERS and CLUSTERS
A Kafka broker is a server (node) within a Kafka cluster that is responsible for storing data and handling communication between producers and consumers. It receives data from producers, stores it in Kafka topics, and sends data to consumers when they request it. A broker also manages partitions within a topic and ensures data is distributed across the cluster. Data is replicated to multiple brokers so that it remains available even if some brokers fail. Each broker is identified by a unique ID.
Kafka brokers act as leaders or followers for partitions. The leader broker handles read and write requests, while the follower brokers replicate the leader's data. If the leader goes down, a new leader is elected from the followers.
A cluster is a distributed system consisting of multiple brokers working together. The cluster uses ZooKeeper (replaced by the built-in KRaft mode in recent Kafka versions) to manage cluster state, keep track of which broker is the leader of each partition, and monitor the health of brokers.

KAFKA ARCHITECTURE
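As a rough illustration of how two of the mechanisms described above behave, the following sketch models partition assignment inside a consumer group and leader failover for a single partition. It is a simplification under stated assumptions: `assign_partitions` and `PartitionReplicas` are hypothetical names, the round-robin rule is just one possible assignment strategy, and promoting the first follower stands in for Kafka's actual election, which is coordinated by the cluster and restricted to in-sync replicas.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer of the group (round-robin)."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


class PartitionReplicas:
    """Tracks which broker leads one partition and which brokers follow."""

    def __init__(self, leader: int, followers: List[int]) -> None:
        self.leader = leader
        self.followers = list(followers)

    def broker_failed(self, broker_id: int) -> None:
        if broker_id == self.leader:
            if not self.followers:
                raise RuntimeError("no follower available to take over")
            # Assumption: simply promote the first remaining follower.
            self.leader = self.followers.pop(0)
        elif broker_id in self.followers:
            self.followers.remove(broker_id)
```

With four partitions and two consumers, each consumer ends up owning two partitions and no partition is shared. Calling `broker_failed` with the current leader's ID promotes a follower, so the partition stays readable and writable.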