
Apache Kafka

Large Scale Distributed Systems


Summary
1. Introduction to Apache Kafka
2. Kafka Architecture and Components
3. Coordination in a Kafka Cluster



1. Introduction to Kafka



What is Apache Kafka?
• A distributed event streaming platform designed to handle
high-throughput, real-time data feeds
• Processes, stores, and transports data between systems in a reliable and
scalable way

Key Strengths
• Scalability: can handle increasing amounts of data
• Reliability: can absorb common failures without disrupting the service
• Low Latency: operations are fast



Core Use Cases
● Real-Time Data Pipelines
○ Build real-time, scalable data pipelines for continuous data flow between applications
● Real-Time Analytics
○ Support advanced analytics with real-time data feeds for monitoring, predictions, and
insights
● Log Aggregation
○ Consolidate logs and events from different distributed systems, simplifying centralized monitoring
● Microservices Communication
○ Facilitate efficient, decoupled communication between microservices by providing a
durable event store



Core Concepts
● The unit of information that travels through Kafka is a message or record ⇒ an array of bytes representing something
○ e.g. a tweet, a page visit, the CPU load of a node at a specific time
● A stream of records of a particular type is defined as a topic
● A producer “publishes” (i.e. adds) messages to a topic
● Records are stored on servers called Kafka brokers



Core Concepts
● A consumer subscribes to one or more topics from brokers, and consumes from them by pulling (i.e. reading) records
● The consumer reads incoming messages in the order they were written to the source topic (more on this later)
● Similar to a FIFO (First-In/First-Out) queue but…
● …the same topic can have multiple consumers
● this is known as the PubSub messaging pattern
○ Use case: same data used by different applications



Example
Each of 3 producers writes 10 records on topic mytopic; the topic will end up with 30 records; each consumer will eventually read the same 30 records.

[Diagram: 3 producers writing to topic mytopic on a Kafka broker, read by multiple consumers]
Example producer pseudo-code

producer = new Producer(…);                            // Create a producer
message = new Message("test message str".getBytes());  // Create a message as a sequence of bytes
set = new MessageSet(message);                         // Create a set (a.k.a. batch) of 1 or more messages
producer.send("topic1", set);                          // Send the message set to topic topic1


Example consumer pseudo-code

consumer = new Consumer("topic1");   // Create a consumer for topic1
messages = consumer.poll();          // Poll (one or more) messages from topic1
for (message : messages) {           // Use the messages
    bytes = message.payload();
    // do something with the bytes
}
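For reference, a minimal runnable version of both sketches, using the third-party kafka-python library (an assumption; any client library implementing the Kafka protocol works) and a broker assumed to be at localhost:9092:

from kafka import KafkaProducer, KafkaConsumer

# Producer: connect to a broker and publish one message (a sequence of bytes)
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("topic1", b"test message str")
producer.flush()  # block until the message is actually sent

# Consumer: subscribe to topic1 and pull messages, oldest first
consumer = KafkaConsumer("topic1",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")
for message in consumer:
    payload = message.value  # the raw bytes
    # do something with the bytes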



Consumer Offsets
How do different consumers read the same topic?
● Each record in a topic gets assigned an offset, an increasing integer value representing the ordering of records
● Each consumer has an associated consumer offset that represents a bookmark in the topic for that consumer.



Consumer Offsets & Consumer Lag
● As a consumer can read (poll) multiple messages at a time, its consumer offset will increment by the count of the pulled messages
● The difference between the latest offset and the consumer offset is called consumer lag (see the sketch below)
○ It is useful to measure how far behind a consumer is with respect to the production of data
○ e.g. if the latest offset in the topic is 120 and the consumer offset is 100, the consumer lag is 20 records
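A sketch of how lag could be measured with kafka-python (topic, partition, and group names are illustrative):

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092", group_id="my-group")
tp = TopicPartition("mytopic", 0)
consumer.assign([tp])

latest = consumer.end_offsets([tp])[tp]  # latest offset in the partition
current = consumer.position(tp)          # this consumer's bookmark
lag = latest - current                   # how far behind we are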
Speeding up
● Multiple producers writing to the same topic, with the topic stored on a single broker, means that all consumers would connect to the same node to read that topic
● Question: what would be the read/write limit?
○ Write limit: the network capacity (common network capacities for a single off-the-shelf node are nowadays 1 Gbps or 10 Gbps)
○ Read limit: same as the write limit, but divided by the number of consumers
● So having a topic stored on a single node wouldn't scale well
○ Plus additional issues. Which ones can you think of?
■ availability: if the node fails, the full topic becomes unavailable



2. Kafka Architecture



More on Kafka Architecture
Actors in a Kafka cluster:
● Brokers
● Producers
● Consumers
● Coordinator (ZooKeeper/KRaft)

[Diagram: a Kafka cluster]


Producers and Consumers
Producers are client* applications that write data to Kafka. They know nothing about consumers; they only talk to brokers.
Consumers are client* applications that read data from Kafka. They know nothing about producers; they only talk to brokers.

*client means that they use Producer/Consumer classes from a library that implements the Kafka protocol
Brokers
A Kafka broker is a node running the Apache Kafka server software.
Broker responsibilities are:
● to talk to producers, accept messages from them, and write received messages to local broker storage
● to talk to consumers, accept read requests from them, and serve messages from local broker storage
● to coordinate with other brokers


Kafka Cluster
A Kafka cluster is a set of brokers working together as a single system. With multiple brokers we can:
- distribute data among different nodes
- replicate data to increase data availability
- increase system performance by distributing system and network load
- achieve horizontal scalability
  - by adding as many brokers as needed to handle the incoming traffic

How does Kafka distribute data within a cluster?


Topic partitioning
For scalability, Kafka breaks topics into partitions
● A partition is a portion of a topic, that is, a subset of records
● Topics are thus divided into non-overlapping subsets that can be on separate nodes
○ increases write throughput
● The full topic is the sum of all its partitions
● A partition has a max size (usually 1 GB)

[Diagram: example of a topic with 3 partitions]


Topic partitioning
● Partitions keep topics' arrival-order guarantee (FIFO)
○ Order is guaranteed within a partition, but not between different partitions
● Partitions have their own offsets
○ Records are uniquely identified by the tuple <partition #, offset #>
● The # of partitions is set at topic creation time (see the sketch below)
○ it can be changed later, but it's costly
● Brokers need to know where partitions reside:
○ to inform producers on where to write
○ to inform consumers on where to read from
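As an illustration, creating a topic with a fixed number of partitions using kafka-python's admin client (a sketch; the topic name and counts are assumptions):

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
# The partition count is fixed here, at creation time
admin.create_topics([NewTopic(name="mytopic",
                              num_partitions=3,
                              replication_factor=1)])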



Dispatching records
How does a record end up in partition P1 vs P2 vs PN?
● Together with a message value, producers can specify a message key
○ messages are now a compound <k, v>: v is the actual content/payload, k is the key
○ messages with the same key hash will end up in the same partition (see the sketch below)
● Specifying a key is not mandatory
○ if unspecified, messages from the same producer will be dispatched in round-robin fashion among all available partitions
● Advantages of using a key
○ guarantees that all messages sent by a single producer with the same key will end up in the same partition, in the same order they were sent
● Custom partition strategies can be created
○ but they need to avoid partition imbalance!
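A minimal sketch of key-based dispatch (illustrative only; Kafka's default partitioner uses murmur2 hashing rather than Python's built-in hash):

def choose_partition(key: bytes, num_partitions: int) -> int:
    # Same key ⇒ same hash ⇒ same partition, which is what gives per-key ordering
    return hash(key) % num_partitions

With kafka-python, the key is passed alongside the value:

producer.send("mytopic", key=b"user42", value=b"some payload")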





Example
Imagine an Amazon-like web application, where users can do any of:
● add an item to their cart
● remove an item from their cart
● pay for the items in their cart

If we want to ensure that actions for a user are read by a downstream consumer in the same order as they are executed, what could we do?
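One natural answer, sketched with kafka-python (the topic name and event encoding are illustrative): use the user ID as the message key, so all of one user's actions land in the same partition and are therefore read in the order they were produced:

# All events for user 42 share the same key, hence the same partition
producer.send("cart-events", key=b"user-42", value=b"add:item-17")
producer.send("cart-events", key=b"user-42", value=b"remove:item-17")
producer.send("cart-events", key=b"user-42", value=b"pay")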



Anatomy of a Kafka Record
Besides the actual content of the message, a Kafka record contains additional fields:
- topic: self-explanatory
- partition: the partition the record belongs to
- offset: the offset within the partition
- timestamp: the timestamp when the broker added the message
- headers: a set of <key, value> pairs representing message metadata, which might be useful to consumers in specific situations
- key: a key for the message
- value: the actual content of the message, the information we want to transmit


Topic management in brokers
In a streaming context, client applications can produce continuously and without limits to Kafka brokers ⇒ topics can grow indefinitely.
To manage local storage, brokers can enable different kinds of cleanup policies (see the sketch below):
- time-based policy (a.k.a. delete): delete all records that have been stored longer than Δt (a.k.a. retention time)
- key-based policy (a.k.a. compact): keep only the last record for a given key
- time-and-key policy (a.k.a. compact + delete): keep only the last record for a given key, and delete any record that has been stored longer than Δt
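These policies correspond to the standard topic-level configs cleanup.policy and retention.ms; a sketch of setting them at topic creation with kafka-python (the topic name and values are assumptions):

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([NewTopic(
    name="user-state",
    num_partitions=3,
    replication_factor=1,
    topic_configs={
        "cleanup.policy": "compact,delete",  # the time-and-key policy
        "retention.ms": "604800000",         # Δt = 1 week, in milliseconds
    })])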



Topic management in brokers
Cleanup policies can be specified per-topic, with global defaults (delete, Δt = 1 week).
As Kafka provides a generic platform to store and retrieve data, the decision on which cleanup policy should be used depends on the specific application.
Example:
● Let's consider again the Amazon-like application discussed before:
○ What kind of cleanup policy makes sense for such an application?
○ What kind of cleanup policy does NOT make sense for such an application?


Compacted topics
● A compacted topic is a topic that uses the compact cleanup policy

Before compaction:
offset  key  value
0       a    123
1       a    129
2       b    145
3       c    123
4       b    null   ← tombstone: a record with a null value represents deletion
5       a    12

After a while (when Kafka performs compaction during clean-up), only the latest record per key survives:
offset  key  value
3       c    123
5       a    12


3. Coordination in a
Kafka cluster



When do we need coordination in Kafka?
Some use cases:
● Partition replication
○ A replication factor of N implies that each record in the partition must be
replicated to N-1 nodes
○ One node per partition will hold the responsibility of managing the replication
● Consumer Group coordination
○ When many consumers are reading a topic together
● Partition Reassignment
○ If a consumer gets added to a consumer group



Partition Replication
● Replication is the process of keeping multiple copies of the data for the sole purpose of availability, in case one of the brokers goes down and/or is unable to serve requests
● The unit of replication is a partition, not an entire topic
○ the replication factor is specified per-topic
● For a replication factor of N, one broker node is designated as leader, and will act as coordinator, while the N - 1 remaining nodes are followers
○ This follows a master-slave topology


Partition Replication
● The leader
○ keeps the list of brokers that hold a replica of the partition (the followers)
○ connects with clients
○ propagates messages to followers
● A replica is said to be in-sync (ISR) if it's able to keep up with the leader within a specific Δt
○ Being an ISR can change with time, as nodes might slow down, crash, or be added
○ Only an in-sync replica can become the leader if the current one fails
Anatomy of a partition
Kafka partitions implement a Write-Ahead Log (WAL) mechanism, similar to DBs:
● A WAL is an append-only log written on disk before the data is persisted (DB context) or made available to consumers (Kafka context)
● Each entry in a WAL has an ID (the offset, in the Kafka context)
● It guarantees data durability: in case of failures, the WAL can be replayed to restore the state of the system before the failure


Anatomy of a partition
Kafka partitions are further divided into segments:
● A segment is a chunk of a partition's log stored as a file on the broker's disk
● At any point in time, for each partition there's only one "open" segment
● When the max size for a segment is reached, the segment is "closed" and a new one is "opened"
● A partition is the sum of its consecutive segments (see the layout sketch below)
● Segments optimize local storage and cleanup operations
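As an illustration, the on-disk layout of one partition (a sketch; Kafka names the directory <topic>-<partition> and each segment file after the offset of its first record, zero-padded):

mytopic-0/
  00000000000000000000.log    # a closed segment, starting at offset 0
  00000000000000000000.index  # its offset index
  00000000000000012345.log    # the currently "open" segment
  ...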



Anatomy of a partition
In the example:
● 4 brokers
● 4 partitions for the same Topic 1
● Each partition has 1 replica
● At the specific time, all replicas are in-sync

[Diagram: placement of the 4 partitions and their replicas across the 4 brokers]


Partition Leader Election
● It is the process by which Kafka selects a leader broker for each partition
○ It will happen if the current leader for a partition fails
● It aims to be a fast process, where the missing leader is replaced in milliseconds to seconds
● It uses the ISR list as a pre-approved list of candidates
● Each candidate is then identified by <broker_id, latest_epoch>
○ The candidate with the latest epoch is elected as leader
○ In case of epoch equality (as it should theoretically be), the broker with the lowest ID is selected


Partition Leader Election
● So there is no explicit process to elect a leader
● The entire process is managed by the controller node
○ The controller is a node elected among all brokers at cluster startup
● What if the ISR list is empty?
○ Case 1: the cluster is configured to allow out-of-sync replicas to become leaders
■ Potential data loss
○ Case 2: the cluster is blocked, in alarm mode, and awaits an operator to fix it
● So avoiding a direct leader election is preferred, as it's faster


Controller Leader Election
● While a partition leader is elected through this fast process, the controller leader is selected through a race, using the underlying ZooKeeper coordination layer
● The system must guarantee that only one controller leader is elected
● All candidates to become controller will try to write to the same ZooKeeper node
○ ZooKeeper will allow only one of them to write
○ The fastest broker will write its ID in the node


ZooKeeper
How do we guarantee correct semantics? ZK has the notion of ephemeral nodes.
An ephemeral node in ZooKeeper is a temporary node that exists only as long as the session that created it is active:
● Once the session disconnects (e.g., due to failure, timeout, or explicit closing), the ephemeral node is automatically deleted
● Other nodes can listen for a particular node's availability
● Each broker is also a client to ZooKeeper
● Upon controller broker failure, the node(s) it created disappear
● When the node /controller disappears, all listeners are notified; this triggers the creation of a new /controller node (see the sketch below)
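A minimal sketch of this race over an ephemeral node, using the kazoo Python ZooKeeper client (an assumption for illustration; Kafka brokers use their own ZooKeeper client internally):

from kazoo.client import KazooClient
from kazoo.exceptions import NodeExistsError

zk = KazooClient(hosts="localhost:2181")
zk.start()

def try_become_controller(broker_id: bytes) -> bool:
    try:
        # Ephemeral: the znode vanishes if this broker's session dies
        zk.create("/controller", broker_id, ephemeral=True)
        return True                # we won the race
    except NodeExistsError:
        return False               # another broker is already controller

def on_controller_change(event):
    # Fired when /controller disappears: race again
    if not try_become_controller(b"broker-1"):
        zk.exists("/controller", watch=on_controller_change)

if not try_become_controller(b"broker-1"):
    zk.exists("/controller", watch=on_controller_change)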



Leader Election in Kafka
● All of the N nodes working on a partition know:
○ The current leader
○ The N-1 followers, and whether they are in-sync replicas
● In case of failure, each broker in the subset of N replica nodes is a candidate for new leader
● Every partition has an epoch number (a monotonically increasing value) that each broker can increment and get
○ Every broker has a local copy of this value
○ The operation is synchronous ⇒ each broker will get a different value
● Each broker broadcasts a leader election request to the other N - 1 nodes asking to be the leader
○ A request contains at least (broker-id, broker-partition-epoch)


Leader Election in Kafka
Upon receiving a request for leadership, each broker:
● gets the current epoch number for the partition
● compares the epoch in the request with its current local epoch
● if the epoch in the request is lower, it ignores the request
● if it is greater, the node updates its local epoch and votes for the sender broker-id to be the leader
● each broker expects to receive N-1 requests in a finite interval of time
(A sketch of this vote rule follows below.)
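A minimal sketch of the vote rule just described (illustrative pseudologic, not Kafka's actual implementation):

def handle_leadership_request(local_epoch: int, req_broker_id: int, req_epoch: int):
    # Returns (voted_broker_id, new_local_epoch), or None to ignore the request
    if req_epoch <= local_epoch:
        return None                     # stale (or equal) epoch: ignore it
    return (req_broker_id, req_epoch)   # adopt the newer epoch, vote for sender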



Leader Election: The Election System
[Diagram: the election message exchange between brokers]


ZooKeeper & Consensus in Kafka
● The "shared" knowledge of the system is kept in a separate system, called ZooKeeper
○ Newer versions of Kafka are ditching ZooKeeper in favor of another system (KRaft)
● ZooKeeper is a distributed coordination service that provides centralized management for configuration, synchronization, and metadata in distributed systems, ensuring consistency, fault tolerance, and high availability
● The leader election process is in fact an implementation of a broader class of algorithms called consensus algorithms
● The ZooKeeper implementation is called ZAB (ZooKeeper Atomic Broadcast): it is used for leader election, but also for message acknowledgement
ZAB: ZooKeeper Atomic Broadcast
Broadcast: the process of sending a message M to N nodes.
Atomic: if the sender sends M1, then M2, any host that receives all messages will receive them in the same order (total ordering).
Atomicity guarantees that:
● All alive nodes will receive the messages in the same order
● Messages will be delivered once and only once


From compacted topics to materialized views
● In its simplest form, a materialized view (MV) is an in-memory key-value representation of a topic.
○ Compacted topics are the best fit for constructing MVs, because Kafka cleans up unneeded records regularly.
● A materialized view can be used as a self-updating cache. When a record is received in near real-time, we can CREATE / UPDATE / DELETE it in the materialized view accordingly.

Materialization:

TOPIC:
offset  key  value
0       a    123
1       b    145
2       c    123
3       b    null
4       a    12

MV (after materializing the topic):
key  value
c    123
a    12
Example with Python
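A minimal sketch of such a materialization loop, assuming kafka-python, a broker at localhost:9092, and a compacted topic named mytopic:

from kafka import KafkaConsumer

view = {}  # the materialized view: key -> latest value

consumer = KafkaConsumer("mytopic",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")
for record in consumer:
    if record.value is None:
        view.pop(record.key, None)        # tombstone ⇒ DELETE the key
    else:
        view[record.key] = record.value   # CREATE / UPDATE the key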



Reading list for the class
• Kreps, J., Narkhede, N., & Rao, J. “Kafka: A distributed messaging system for log
processing”. In Proceedings of the NetDB. 2011.

