Kafka Development and Functionality
In the technological landscape of the late 2000s and early 2010s, the professional
networking platform LinkedIn found itself at the epicenter of a data explosion. The
company was experiencing an exponential surge in the volume of digitized
information, encompassing not only traditional transactional records like user profiles
and job histories but, more significantly, a torrent of user activity data.1 Every click,
search, profile view, and connection request represented a valuable event that
needed to be captured and leveraged in real-time to power core platform features like
news feeds, analytics, and recommendation engines.1
This real-time requirement created a profound technical crisis. The existing data
infrastructure paradigms of the era were fundamentally ill-equipped to handle
LinkedIn's burgeoning scale and velocity. The available options presented a critical
gap 1:
1. Databases: Traditional relational database management systems were optimized
for "data at rest"—the persistent storage and structured querying of information.
They excelled at transactional integrity but were too slow and cumbersome for
the high-throughput, low-latency ingestion and processing demanded by
real-time data feeds.1
2. Traditional Messaging Systems: Message-oriented middleware, or message
queues, were designed for "data in motion," facilitating asynchronous
communication between applications. However, these systems were typically
architected around a single, centralized broker node. While effective for many
enterprise integration patterns, they were not built for the hyper-scale LinkedIn
required. Attempting to funnel a data volume that was projected to grow by a
factor of 1,000 through a single-node system would inevitably lead to
catastrophic failure.1 Furthermore, these systems lacked the crucial capabilities
of long-term data retention and message replayability, which were becoming
essential for complex analytics and reprocessing scenarios.2
Faced with this "mismatch" between their needs and the available technology, a team
of engineers at LinkedIn—Jay Kreps, Neha Narkhede, and Jun Rao—embarked on
creating a new system from the ground up in 2010.1 The project, christened "Kafka" by
Jay Kreps in homage to the author Franz Kafka, was conceived with a singular design
tenet: to be "a system optimized for writing".3 After approximately a year of
development, the first version was deployed within LinkedIn, where it rapidly became
the central nervous system of the company's data architecture, integrating hundreds
of microservices and data systems in real-time.1
The transformative potential of Kafka was evident far beyond the walls of LinkedIn. In
early 2011, the project was open-sourced and contributed to the Apache Software
Foundation, where it graduated from incubation to a top-level project on October 23,
2012.3 This strategic move catalyzed its adoption across the technology industry.
This period marked a fundamental shift in Kafka's identity. It evolved rapidly from
being perceived as a powerful but specialized "messaging queue" to a "full-fledged
event streaming platform".2 This was not merely a semantic rebranding but a reflection
of a vastly expanded feature set that went far beyond simple message transport. The
integration of durable, long-term storage, native stream processing capabilities via
the Kafka Streams library, and seamless data integration with external systems
through Kafka Connect transformed it into a comprehensive platform for handling
data in motion.3
To truly understand Kafka's architecture and capabilities, one must grasp its central,
foundational abstraction: the distributed, partitioned, and replicated commit log.2
This is not merely one feature among many; it is the architectural DNA from which all
of Kafka's other properties—its performance, scalability, durability, and unique
messaging model—are derived.
A commit log is a simple, append-only data structure. In Kafka, data is always written
to the end of the log, making the data records immutable and the write operations
extremely fast, as they leverage the efficiency of sequential disk I/O rather than the
slower random-access patterns required by traditional databases.14 This aligns
directly with its original design goal of being "optimized for writing".3
This architectural choice has profound implications. The most significant is the
complete decoupling of data producers from data consumers.8 In a traditional
message queue, the broker often manages the state of message delivery, tracking
which consumer has received and acknowledged each message. This creates a tight
coupling and a potential bottleneck at the broker. Kafka's commit log model
obliterates this paradigm. The broker's job is simplified to its essence: append records
to the log and replicate them for fault tolerance. It does not track consumer state.
Instead, the responsibility for tracking consumption progress is shifted entirely to the
client side. Each consumer is responsible for managing its own position, or offset,
within the log.3 This "dumb broker/smart client" philosophy 20 means that consumers
can read data at their own pace, rewind to re-process historical data, or have multiple
independent consumer applications read from the same data stream without
interfering with one another. This capability for data replay is what fundamentally
elevates Kafka from a transient messaging system to a durable event streaming
platform, capable of serving both real-time and historical data processing needs.2
The decision to build Kafka around a distributed commit log was a direct and elegant
solution to the scaling crisis that birthed it. By simplifying the broker's responsibilities,
the architecture inherently supports massive horizontal scalability; adding more
brokers to a cluster is a straightforward way to increase capacity because the core
logic of each node remains simple.2 This same design choice necessitates the
consumer model of groups and offset management, which in turn provides the unique
ability to function as both a queue and a pub/sub system simultaneously. Ultimately,
the distributed log is the architectural north star that has guided Kafka's entire
evolution.
The power and scalability of Apache Kafka stem from a set of well-defined
architectural components that work in concert. Understanding the role of each
component and their interactions is essential for designing, deploying, and managing
robust Kafka-based systems.
A Kafka deployment consists of a cluster of one or more servers, where each server is
referred to as a broker.2 These brokers form the backbone of the Kafka system. Their
primary responsibilities are to receive streams of records from producer clients,
assign them to the correct partitions, store them durably on disk, and serve them to
consumer clients upon request.13 Each broker in the cluster is identified by a unique,
integer-based ID and is responsible for a subset of the partitions in the cluster,
ensuring a balanced distribution of load.16
Within this cluster, one broker is dynamically elected to take on the additional role of
the controller.24 The controller acts as the administrative brain of the cluster. It is
responsible for managing the state of all resources, including topics, partitions, and
replicas. Its key duties include handling broker failures, performing leader elections for
partitions when a leader broker goes down, and managing the addition or removal of
brokers from the cluster.24 By centralizing these state management tasks in a single
controller, Kafka ensures that cluster-wide state changes are handled efficiently and
without race conditions. If the controller broker fails, a new controller is automatically
elected from the remaining healthy brokers in the cluster.25
Clients (producers and consumers) do not need to know the entire topology of the
cluster. They initiate a connection with one or more designated bootstrap servers.
A bootstrap server responds with metadata about the complete cluster, including
the addresses of all brokers and which broker is the leader for which partition. Armed
with this metadata, the client can then establish direct connections to the appropriate
brokers to send or receive data.27
Kafka organizes streams of records into categories called topics.2 A topic is a logical
name that producers publish to and consumers subscribe from. It can be
conceptualized as a feed, analogous to a table in a relational database or a folder in a
filesystem.18
To achieve scalability and parallelism, each topic is divided into one or more
partitions.5 A partition is the fundamental unit of storage and parallelism in Kafka.
Each partition is an ordered, immutable sequence of records—effectively a structured
commit log in its own right.16 When a producer sends a record to a topic, it is
ultimately stored in one of these partitions. This partitioning allows a topic's data and
processing load to be split across multiple brokers in the cluster.16 The number of
partitions for a topic is a critical configuration parameter, as it dictates the maximum
level of parallelism for consumers within a consumer group; a group cannot have more
active consumers than the number of partitions for a topic.31
Within each partition, every record is assigned a unique, immutable, and sequential
integer known as an offset.5 The offset serves as a unique identifier for a record within
its partition. For example, the first record in a partition has an offset of 0, the second
has an offset of 1, and so on. This simple, ordered structure is what allows consumers
to reliably track their read position and enables Kafka to provide its ordering
guarantees.3 The combination of topic name, partition number, and offset uniquely
identifies any record in a Kafka cluster.34
Producers are the client applications responsible for publishing, or writing, streams of
events to Kafka topics.2 A Kafka record produced by these clients is a key-value pair,
accompanied by a timestamp and optional, user-defined headers.5 Both the key and
the value are serialized into byte arrays by the producer before being transmitted over
the network to the broker.27
To maximize throughput, the producer does not transmit each record individually; it
accumulates records bound for the same partition into batches. Two settings govern
this behavior: batch.size, which defines the maximum size of a batch in bytes, and
linger.ms, which sets a maximum time the producer will wait to fill a batch before
sending it. Additionally, compression (configured via compression.type) can be
applied to these batches, further reducing network bandwidth and storage
requirements. Larger batches generally lead to better compression ratios.47
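A minimal producer sketch showing these knobs in context; the broker addresses, topic, key, and value are placeholders, and the tuning numbers are illustrative rather than recommendations.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);       // maximum bytes per batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);           // wait up to 20 ms to fill a batch
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compress entire batches

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records accumulate into a batch for the target partition until
            // batch.size or linger.ms is reached, then ship together.
            producer.send(new ProducerRecord<>("user-activity", "user-42", "profile_view"));
        } // close() flushes any batch that is still lingering
    }
}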
Consumers are the client applications that subscribe to Kafka topics to read and
process the streams of records published by producers.2
The central abstraction for consumption is the consumer group. Consumers identify
themselves with a group.id string.5 This simple mechanism elegantly unifies the two
primary messaging models:
● Queueing Model: When multiple consumer instances share the same group.id,
they form a pool of workers. Kafka distributes the topic's partitions among these
instances, ensuring that each partition is consumed by exactly one member of
the group. This effectively load-balances the processing workload across the
consumers, mimicking the behavior of a traditional message queue.19
● Publish-Subscribe Model: When consumer instances each have a unique
group.id, they act as independent subscribers. In this case, each consumer
receives a full copy of all messages from all partitions of the topic, mirroring the
broadcast behavior of a publish-subscribe system.19 Both modes fall out of the
single sketch below.
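The sketch assumes a topic named user-activity; the broker address and group ID are placeholders. Running several copies of this program with the same group.id spreads the partitions across them (queueing), while giving each copy a unique group.id delivers the full stream to each (publish-subscribe).

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-service"); // shared id => queueing
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("user-activity"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    // Each record is identified by its topic, partition, and offset.
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            r.partition(), r.offset(), r.key(), r.value());
                }
            }
        }
    }
}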
As consumers read records from a partition, they must track their progress. This is
done via offset management. The offset of the last successfully processed record for
each partition is periodically "committed" back to a special, highly-available internal
Kafka topic named __consumer_offsets.31 This committed offset acts as a durable
bookmark. If a consumer instance fails and restarts, or if a rebalance assigns its
partition to another consumer, the new consumer will query the
__consumer_offsets topic to find the last committed offset and resume processing
from that point, ensuring no data is lost and (depending on the commit strategy)
minimizing reprocessing.31 Developers have the choice between automatic offset
committing (enable.auto.commit=true, the default) and manual commits issued
explicitly from application code, which give finer control over when a record
counts as processed.
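As a sketch of the manual approach, the consumer loop above can be adapted so that the offset is committed only after processing finishes, yielding at-least-once behavior; process() stands in for hypothetical application logic.

// Add to the configuration of the previous sketch:
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

// Poll, process, then commit. A crash before commitSync() means the
// records are re-read on restart rather than lost.
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> r : records) {
        process(r); // hypothetical application logic
    }
    if (!records.isEmpty()) {
        consumer.commitSync(); // durable bookmark written to __consumer_offsets
    }
}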
This architecture reveals a deeper principle: a Kafka Consumer Group is more than
just a simple reader. It functions as a persistent, fault-tolerant, and scalable "view" or
"cursor" into the distributed commit log. The state of this view—the collection of
committed offsets—is not an ephemeral detail but is itself stored durably as data
within Kafka. A traditional database view is a logical construct that provides a window
into underlying tables. Similarly, a Kafka topic can be seen as the underlying table of
immutable events (the source of truth). A consumer group, then, defines a specific,
independent consumption process on that data. Because the state of this process
(the offsets) is also stored durably in Kafka, multiple independent applications can
consume the same data stream without impacting one another. One group might be a
real-time analytics engine, another an ETL pipeline to a data warehouse, and a third
an audit service. Each progresses at its own pace, maintaining its own persistent view.
Adding a new application—a new view—is as simple as starting a new consumer
group. This is a profoundly scalable paradigm that stems directly from the
architectural decision to decouple consumption state from the broker and manage it
as a first-class, persistent entity within the Kafka ecosystem itself.
Part III: Guarantees of a Distributed System
Kafka achieves durability and high availability through its replication protocol, which is
built on a leader-follower model.26
● The Leader-Follower Model: When a topic is configured with a replication factor
greater than one, Kafka creates multiple copies, or replicas, of each partition.
These replicas are distributed across different brokers in the cluster to protect
against single-broker failures. For each partition, one replica is designated as the
leader, while the others become followers.5 All produce requests (writes) and
consume requests (reads) for a given partition are exclusively handled by its
leader.16 The followers' sole responsibility is to passively replicate the data from
their leader, fetching new records in sequence to maintain a byte-for-byte
identical copy of the leader's log.53
● In-Sync Replicas (ISR): To manage replication consistency, Kafka maintains a
dynamic set for each partition known as the In-Sync Replicas (ISR).25 This set
contains the leader and any followers that are fully "caught-up" with the leader's
log.25 A follower is considered caught-up if it is actively fetching from the leader
and its log does not lag behind the leader's log by more than a configurable time,
replica.lag.time.max.ms.51 If a follower fails or falls too far behind, the leader
removes it from the ISR. This ISR set is the cornerstone of Kafka's failover
strategy.
● The High Watermark: To prevent consumers from reading data that has not been
fully replicated and could be lost in a leader failure, Kafka uses a concept called
the high watermark. The high watermark is the offset of the last record that has
been successfully copied to all replicas in the ISR.55 Consumers are only permitted
to read records up to this high watermark offset. This ensures that any data a
consumer sees is considered "committed" and will not be lost as long as at least
one replica from the ISR remains available.25
● Leader Election: In the event of a leader broker failure, the cluster controller
initiates a leader election. It selects a new leader from the remaining healthy
replicas that are members of the ISR.25 Because every member of the ISR is
guaranteed to have all committed records (up to the high watermark), this
process ensures that no committed data is lost during the failover.25
● Unclean Leader Election: A critical trade-off between availability and
consistency arises in the catastrophic scenario where all replicas in the ISR for a
partition become unavailable. By default (unclean.leader.election.enable=false),
Kafka prioritizes consistency. It will keep the partition offline and wait for a replica
from the original ISR to come back to life, thus guaranteeing no data loss but
sacrificing availability.25 If this setting is enabled, Kafka will prioritize availability by
electing the first replica to come back online as the new leader, even if it was not
in the ISR. This brings the partition back online quickly but risks losing any data
that had not been replicated to that follower.25
Kafka's durability is not a monolithic feature but a finely-tunable contract between the
producer client and the broker cluster. This contract is defined by the interplay of
three key configuration parameters. Understanding how to orchestrate them is
essential for architecting a system that meets precise data safety and performance
goals.
1. replication.factor: This is a topic-level setting that defines the total number of
copies (replicas) to maintain for each partition of that topic.24 A replication factor
of N means the cluster can tolerate the failure of up to N−1 brokers without losing
data for that topic.19 For any production environment, a replication factor of at
least 3 is the standard best practice, typically distributed across three different
physical racks or availability zones.57
2. min.insync.replicas: This setting, configurable at the broker or topic level,
establishes a minimum threshold for the size of the ISR. When a producer uses
acks=all, the broker will reject the produce request with a
NotEnoughReplicasException error if the number of in-sync replicas is less than this
value.51 This is a
critical safety mechanism. For example, with a replication factor of 3, setting
min.insync.replicas=2 ensures that any acknowledged write has been durably
persisted on at least two separate brokers, protecting against data loss even if
the leader fails immediately after acknowledging the write.
3. Producer acks Setting: This client-side configuration defines the producer's
criteria for considering a write request successful, creating a direct trade-off
between durability and latency.27
○ acks=0: The producer sends the message and immediately considers it
successful without waiting for any acknowledgment from the broker. This is a
"fire-and-forget" mode that offers the highest throughput and lowest latency
but provides the weakest durability guarantees, as messages can be lost in
transit or if the leader fails before writing the record.61
○ acks=1 (Default before Kafka 3.0): The producer waits for an acknowledgment
from the partition leader only. This confirms that the leader has successfully
written the record to its local log. It offers a balance of durability and
performance, but data can still be lost if the leader fails before its followers
have replicated the record.61
○ acks=all (or -1) (Default from Kafka 3.0): The producer waits for an
acknowledgment from the leader after the record has been successfully
replicated to all followers currently in the ISR. This setting provides the
strongest possible durability guarantee, as it ensures that any acknowledged
record exists on multiple brokers.61
The interplay of these three settings defines the system's resilience. Simply setting
replication.factor=3 is insufficient if the producer uses acks=1, as a leader failure can
still cause data loss. Even with acks=all, if min.insync.replicas is not set appropriately
(e.g., it defaults to 1), a write could be acknowledged by a lone leader just before it
fails. Therefore, the gold standard for mission-critical durability in production is the
combination of replication.factor=3, min.insync.replicas=2, and producer acks=all. This
configuration ensures that every acknowledged write is present on at least two
brokers and that the system can tolerate the failure of one broker without any data
loss or loss of write availability.
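A sketch of this gold-standard combination using the Java AdminClient; the topic name, partition count, and broker address are placeholders.

import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 12 partitions, 3 replicas, and at least 2 in-sync copies per write
            NewTopic topic = new NewTopic("payments", 12, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Set.of(topic)).all().get();
        }
        // Producers writing to this topic should then set acks=all
        // (and ideally enable.idempotence=true) to complete the contract.
    }
}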
Kafka's ordering guarantees are precise and directly tied to its partitioning model.
● Within a Partition: Kafka provides a strict ordering guarantee for records within a
single partition. If a producer sends message M1 followed by message M2 to the
same partition, Kafka guarantees they will be written to the log in that order (M1
will have a lower offset than M2), and all consumers of that partition will read
them in that exact same order.19 This is an inviolable property of the append-only
commit log.
● Across Partitions: Conversely, Kafka provides no global ordering guarantee for
records across the different partitions of a topic.46 A consumer reading from a
multi-partition topic may process a record from partition 1 that was produced
chronologically later than a record from partition 0 that it has not yet received.
● Achieving Order for Related Events: To enforce a specific order for a sequence
of related events (for example, all updates for a single customer account), the
application must ensure these events are sent to the same partition. This is
achieved by producing all related records with the same message key (e.g., using
the customer ID as the key). The producer's key-based partitioning logic
guarantees that all records with the same key will be deterministically routed to
the same partition, thereby preserving their relative order.40 A short sketch
follows.
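Assuming a producer configured as in the earlier example, the illustrative lines below send all of one customer's events with the same key; the topic and customer IDs are made up.

// All updates for customer "cust-1001" share a key, so the default
// partitioner routes them to the same partition, and consumers see
// them in production order.
producer.send(new ProducerRecord<>("account-updates", "cust-1001", "address_changed"));
producer.send(new ProducerRecord<>("account-updates", "cust-1001", "email_verified"));
// Events for other customers may land on other partitions; no ordering
// is guaranteed between different keys.
producer.send(new ProducerRecord<>("account-updates", "cust-2002", "password_reset"));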
Message delivery semantics define the guarantees a system provides about whether
a message will be delivered and how many times. Kafka supports all three primary
semantics, which are configurable based on application needs.
● At-Most-Once: In this mode, messages may be lost but are guaranteed never to
be delivered more than once. This behavior can occur in Kafka under specific
failure scenarios. For example, if a producer fails to receive an acknowledgment
from the broker and is configured not to retry, the message might be lost. On the
consumer side, if an application is configured to commit offsets automatically
before processing the data, a crash after the commit but before processing would
cause the message to be skipped upon restart.50 This semantic prioritizes
performance over reliability and is suitable only for use cases that can tolerate
data loss, such as collecting non-critical metrics.
● At-Least-Once: This is Kafka's default guarantee. Messages are guaranteed
never to be lost but may be redelivered as duplicates. This occurs if a producer
sends a message but experiences a temporary network failure and does not
receive an acknowledgment. The producer's retry mechanism will then resend the
message, potentially creating a duplicate in the broker's log. On the consumer
side, if an application processes a message but crashes before it can commit the
corresponding offset, it will re-read and re-process the same message upon
restarting.50 This is the most common semantic used, with applications often
designed to be idempotent (able to handle duplicate messages gracefully).
● Exactly-Once Semantics (EOS): This is the strongest and most complex
guarantee, ensuring that each message is delivered and processed once and only
once. Achieving EOS in a distributed system is a non-trivial challenge that
requires coordination between the client and the broker. Kafka achieves this
through two powerful features introduced in version 0.11:
1. Idempotent Producer: By setting enable.idempotence=true in the producer
configuration, the producer becomes idempotent. The broker assigns a
unique Producer ID (PID) to the producer instance, and the producer includes
a sequence number with every record it sends to a specific partition. The
broker keeps track of the last sequence number it has seen for each (PID,
partition) combination. If it receives a record with a sequence number it has
already processed, it discards the duplicate, preventing data duplication from
producer retries.65 This provides exactly-once delivery guarantees
from the producer to the broker log for a single partition.
2. Transactions: The Kafka Transactional API extends EOS to atomic writes
across multiple topics and partitions. This is essential for
"consume-transform-produce" stream processing applications, where an
application reads a message, processes it, and writes one or more resulting
messages back to Kafka. The entire operation must be atomic. The API allows
a producer to beginTransaction(), send records to multiple partitions, send
the consumer's offsets to the transaction, and then either
commitTransaction() or abortTransaction(). A broker-side component called
the Transaction Coordinator manages the state of these transactions.
Consumers can be configured with isolation.level="read_committed" to
ensure they only ever read records that are part of a successfully committed
transaction, effectively filtering out any data from aborted or in-progress
transactions.65 A condensed sketch of this pattern follows.
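Here, topic names, the group ID, and the transactional ID are placeholders, and the uppercase transform merely stands in for real processing logic.

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceRelay {
    public static void main(String[] args) {
        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "order-enricher");
        c.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        c.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // skip aborted data
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        p.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "order-enricher-1"); // implies idempotence
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c);
             KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.initTransactions();
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;
                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> r : records) {
                        producer.send(new ProducerRecord<>("orders-enriched",
                                r.key(), r.value().toUpperCase())); // stand-in transform
                        offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1)); // next offset to read
                    }
                    // Consumed offsets commit atomically with the output records.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (Exception e) {
                    producer.abortTransaction(); // read_committed consumers never see these writes
                }
            }
        }
    }
}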
A Kafka topic serves as the fundamental channel for the publish-subscribe pattern. A
producer publishes records to a topic, and any application can subscribe to that topic
to receive the records.19 The innovation lies in how consumers subscribe. By labeling
themselves with a group.id, consumers determine whether they divide a topic's
partitions among themselves or each receive the complete stream.
This hybrid model gives Kafka the workload scalability of a traditional queuing system
and the data-sharing flexibility of a publish-subscribe system, all within a unified and
coherent framework.10
When compared to established message queues like RabbitMQ and ActiveMQ, Kafka's
distinct design philosophy becomes clear, leading to different strengths, weaknesses,
and ideal use cases.
● Architectural Philosophy: The primary distinction is the "smart broker" versus
"dumb broker" paradigm.
○ RabbitMQ/ActiveMQ: These systems embody the "smart broker, dumb
client" model.20 The broker is a sophisticated entity responsible for complex
message routing (e.g., AMQP exchanges in RabbitMQ), tracking the delivery
state of every message, managing acknowledgments, and implementing
features like message priorities. The client's logic is correspondingly simpler.
○ Kafka: Kafka follows the "dumb broker, smart client" model.20 The broker's
role is simplified to that of a high-performance, distributed log manager.
Complex logic, such as tracking which messages have been processed (offset
management) and deciding what to consume, is offloaded to the consumer
client.
● Message Retention and Replayability: This is a fundamental differentiator.
○ RabbitMQ/ActiveMQ: These are primarily transient buffers. Messages are
typically stored in memory or on disk until they are successfully consumed
and acknowledged, at which point they are deleted from the queue.15 They are
not designed for long-term storage or data replay.
○ Kafka: Retention is policy-based, not consumption-based. Records are
durably stored on disk for a configurable period (e.g., 7 days or indefinitely) or
until a size limit is reached, irrespective of whether they have been
consumed.2 This turns the broker into a persistent, replayable system of
record, a feature that is foundational to its use as a streaming platform.
● Performance and Scalability:
○ RabbitMQ/ActiveMQ: They offer excellent performance for low-latency,
transactional messaging and can handle moderate to high throughput.
However, their broker-centric design and complex message state
management can become a bottleneck at extreme scales. Scaling is often
achieved through clustering or federation, which can be more complex to
manage than Kafka's model.64
○ Kafka: Kafka is architected from the ground up for extreme throughput
(capable of handling millions of messages per second) and seamless
horizontal scalability.2 Its performance is derived from leveraging sequential
disk I/O and a simplified broker model. Adding more brokers to a cluster is a
straightforward way to scale capacity.
● Ideal Use Cases:
○ RabbitMQ/ActiveMQ: They excel in traditional enterprise messaging
scenarios. This includes acting as a task queue for background job
processing, facilitating request-response (RPC-style) communication
between microservices, and implementing complex routing logic where
messages need to be delivered based on content or priority.21 ActiveMQ's
strong support for the Java Message Service (JMS) API also makes it ideal for
integrating with legacy enterprise systems.83
○ Kafka: It is the superior choice for building real-time data pipelines,
implementing event sourcing patterns, aggregating logs and metrics at a
massive scale, and serving as the backbone for stream processing
applications. Any scenario that requires high throughput, long-term data
retention, and the ability for multiple applications to replay and analyze data
streams is a prime use case for Kafka.9
Latency Profile: Kafka offers low latency optimized for high throughput 3; RabbitMQ
offers very low latency for transactional messages 64; ActiveMQ offers low latency
for moderate workloads.83
Apache Kafka's dominance is not solely due to the performance of its core broker. Its
power is magnified by a rich, integrated ecosystem of tools and libraries that
transform it from a messaging component into a comprehensive, end-to-end data
platform. These components address critical needs in data integration, stream
processing, and data governance.
At the core of the Kafka Connect framework is the Connector plugin. Connectors are
pre-built or custom packages of code that understand how to interface with a specific
external system. There are two types 98:
● Source Connectors: Ingest data from external systems (e.g., polling a database
for new rows, tailing a log file) and publish it to Kafka topics.
● Sink Connectors: Export data from Kafka topics to external systems (e.g., writing
records to an Elasticsearch index, HDFS, or a cloud object store).
Key Features:
● Converters: These plugins handle the serialization and deserialization of data as
it moves between the external system and Kafka. They ensure data is in the
correct format (e.g., JSON, Avro) for both Kafka and the target system.98
● Single Message Transformations (SMTs): These allow for lightweight, in-flight
modification of individual records as they pass through the Connect pipeline.
SMTs can be used to filter records, mask sensitive fields, add metadata, or alter
the structure of a message without requiring a separate stream processing
application.98
● Dead Letter Queues (DLQs): For sink connectors, a DLQ can be configured as a
destination for records that cannot be processed successfully (e.g., due to a data
format error). This prevents the entire pipeline from halting on a single bad record
and allows for later inspection and remediation.98 The configuration sketch
following this list combines these features.
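A hypothetical sink connector configuration in properties form; the connector class is Confluent's Elasticsearch sink, and all names are placeholders.

name=orders-elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=orders

# Converter: how records are (de)serialized at the Connect boundary
value.converter=org.apache.kafka.connect.json.JsonConverter

# SMT: blank out a sensitive field in flight
transforms=mask
transforms.mask.type=org.apache.kafka.connect.transforms.MaskField$Value
transforms.mask.fields=credit_card_number

# DLQ: route unprocessable records aside instead of halting the pipeline
errors.tolerance=all
errors.deadletterqueue.topic.name=orders-dlq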
Purpose: Kafka Streams is a client library, shipped with Apache Kafka, for building
stream processing applications whose input and output data are stored in Kafka
topics.
Key Features:
● High-Level DSL and Processor API: It offers a functional, high-level
Domain-Specific Language (DSL) with common stream processing operators like
map, filter, groupBy, join, and aggregate. For more complex or fine-grained
control, it also provides the lower-level Processor API.3
● Stateful Processing: Kafka Streams has first-class support for stateful
operations, such as windowed aggregations and joins. It manages local state
using embedded, high-performance key-value stores (typically RocksDB), which
allows state to be larger than available memory. For fault tolerance, all updates to
these local state stores are backed up to a compacted changelog topic in Kafka,
allowing state to be fully restored in the event of an application failure.3
● Exactly-Once Semantics (EOS): Kafka Streams is the primary vehicle for
achieving end-to-end exactly-once processing in Kafka. By setting
processing.guarantee=exactly_once, the library leverages Kafka's transactional
capabilities to ensure that for every input record, the processing, state updates,
and resulting output records are completed as a single atomic unit.65 A minimal
DSL sketch follows.
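The sketch below counts page views per user, assuming string-keyed events; the application ID, broker address, and topic names are placeholders.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class PageViewCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> views = builder.stream("page-views");
        KTable<String, Long> counts = views
                .filter((userId, page) -> page != null) // drop malformed events
                .groupByKey()                           // stateful step, backed by RocksDB
                .count();                               // changelog topic provides fault tolerance
        counts.toStream().to("page-view-counts",
                Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}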
Purpose: ksqlDB is an event streaming database built on top of Kafka. Its goal is to
radically simplify the creation of stream processing applications by providing an
interactive, high-level SQL interface.104
How it Works: ksqlDB is not a new storage engine; it operates directly on data stored
in Kafka topics. Under the hood, a ksqlDB server parses the SQL statements
submitted by a user and translates them into Kafka Streams applications. These
applications then run on the ksqlDB servers themselves, consuming from and
producing to the Kafka cluster to perform the requested processing.104
Abstractions: ksqlDB introduces two core abstractions that bridge the gap between
the relational world and the streaming world 105:
● STREAM: Represents an unbounded, append-only sequence of events, directly
mapping to a Kafka topic.
● TABLE: Represents a stateful, materialized view of a stream. It provides a
snapshot of the latest value for each key in the stream and is continuously
updated as new events arrive.
Use Case: ksqlDB significantly lowers the barrier to entry for stream processing. It
empowers developers, data analysts, and data scientists to perform real-time data
exploration, filtering, transformation, and aggregation on live data streams using
familiar SQL syntax, without needing to write complex code in Java or Scala.107
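As an illustrative sketch, the two abstractions look like this in ksqlDB's SQL; the stream, table, topic, and column names are made up.

-- A STREAM over an existing Kafka topic of page-view events
CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
  WITH (KAFKA_TOPIC='page-views', VALUE_FORMAT='JSON');

-- A TABLE materializing the latest per-user view count,
-- continuously updated as new events arrive
CREATE TABLE views_per_user AS
  SELECT user_id, COUNT(*) AS views
  FROM pageviews
  GROUP BY user_id
  EMIT CHANGES;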
Role in Data Governance: In any large-scale data architecture, ensuring data quality
and consistency is a critical challenge. Schema Registry acts as the enforcer of a
"data contract" between producers and consumers.110 By providing a central
repository for schemas, it ensures that all data flowing through Kafka adheres to a
predefined structure, preventing data corruption and downstream processing
failures.110
Schema Evolution: A key feature of Schema Registry is its ability to manage the
evolution of schemas over time. As applications and business requirements change,
data schemas must also change. Schema Registry enforces compatibility rules (e.g.,
backward, forward, full compatibility) when a new version of a schema is registered.
This ensures that new producers do not break old consumers, and new consumers
can still read data produced with old schemas, allowing for independent and
decoupled upgrades of microservices.109
How it Works: The process is highly efficient. When a producer sends a record, its
serializer first checks if the schema is registered. If not, it registers it and receives a
unique schema ID. This small integer ID is then embedded in the record's metadata,
rather than the full, verbose schema. When a consumer receives the record, its
deserializer extracts the schema ID, and if it doesn't have the corresponding schema
cached locally, it requests it from the Schema Registry. This schema is then used to
correctly deserialize the record's payload.109
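A sketch of the producer side of this flow, assuming Confluent's Schema Registry client libraries are on the classpath; the addresses, topic, and the tiny User schema are placeholders.

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Confluent's serializer registers the schema (if new), caches its ID,
        // and embeds that small ID in each record instead of the full schema.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://schema-registry:8081");

        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":[" +
            "{\"name\":\"id\",\"type\":\"string\"}]}");
        GenericRecord user = new GenericData.Record(schema);
        user.put("id", "user-42");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("users", "user-42", user));
        }
    }
}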
The power of the Kafka ecosystem lies in the synergistic way these components build
upon one another, creating a powerful feedback loop that reinforces Kafka's position
as the de facto standard for data in motion. An organization might initially adopt Kafka
for its core message brokering. Soon, the need arises to integrate data from a
relational database. Instead of building a custom, brittle pipeline, the team deploys
Kafka Connect with a pre-built JDBC connector, saving weeks of development time
and gaining a scalable, reliable solution.99 Next, they need to filter and enrich this
incoming data in real-time. Rather than introducing and managing a separate,
complex stream processing cluster, they embed the logic directly into a lightweight
Kafka Streams application, leveraging the same operational model and guarantees as
the rest of their Kafka infrastructure.3 As the volume of processed data grows, data
analysts want to run ad-hoc queries on the live streams. With ksqlDB, they can do so
using familiar SQL, gaining immediate insights without waiting for the data to be
loaded into a traditional data warehouse.106 Finally, as more teams and services begin
to depend on these data streams, maintaining data quality becomes paramount. The
organization implements Schema Registry to enforce data contracts and manage
schema evolution, preventing breaking changes and ensuring the long-term health of
their data ecosystem.110
At this stage, the organization is no longer just using Kafka; they are leveraging a
comprehensive, integrated platform for data integration, processing, querying, and
governance. The value derived is not from any single component but from their
seamless interplay. This creates a significant competitive moat. A competing
technology like Apache Pulsar, even with certain architectural advantages, must not
only rival the Kafka broker but also this entire, deeply integrated and battle-tested
ecosystem, presenting a much higher barrier to displacement.90
Part VI: The Future of Kafka: Recent Developments and Strategic Direction
Apache Kafka is not a static project; it is in a constant state of evolution. Recent and
ongoing developments are fundamentally reshaping its architecture, simplifying its
operation, and expanding its capabilities. These changes are not merely incremental
improvements but strategic moves that address historical limitations and position
Kafka for the next decade of data streaming.
For most of its history, Apache Kafka had a symbiotic but complex relationship with
Apache ZooKeeper. Kafka relied on ZooKeeper for critical cluster coordination tasks,
including storing cluster metadata, tracking broker membership, and, most
importantly, electing the cluster controller.34 This dependency, while functional, was a
significant source of operational friction. It meant that to run a production Kafka
cluster, operators had to deploy, manage, monitor, and secure two separate, complex
distributed systems, each with its own configuration, failure modes, and security
models. This added considerable operational overhead and created a potential
scalability bottleneck, as ZooKeeper's performance could limit the number of
partitions a Kafka cluster could efficiently manage.113
The Kafka community addressed this long-standing challenge with the introduction of
KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum. This KIP
introduced the Kafka Raft (KRaft) protocol, an event-based implementation of the
Raft consensus algorithm built directly into the Kafka brokers themselves.112 In KRaft
mode, the ZooKeeper dependency is eliminated entirely. Instead, a dedicated quorum
of brokers, acting as controllers, uses the KRaft protocol to manage cluster metadata,
which is stored durably in an internal Kafka topic. This makes Kafka a self-contained,
single-system deployment.112
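As an illustration, the essential server.properties entries for a combined-mode KRaft node might look like the following; host names, ports, and IDs are placeholders.

# This node acts as both a broker and a KRaft controller (combined mode)
process.roles=broker,controller
node.id=1

# The controller quorum that replaces ZooKeeper
controller.quorum.voters=1@kafka1:9093,2@kafka2:9093,3@kafka3:9093

# Client traffic on 9092; Raft traffic between controllers on 9093
listeners=PLAINTEXT://kafka1:9092,CONTROLLER://kafka1:9093
controller.listener.names=CONTROLLER
inter.broker.listener.name=PLAINTEXT

# (Before first start, the log directory is formatted with a cluster ID
# using the kafka-storage.sh tool.)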
The transition to KRaft has been a carefully managed, multi-year effort. KRaft was
declared production-ready for new clusters in Apache Kafka 3.3.112 The recent Apache
Kafka 4.0 release in 2025 marks the final step in this evolution, completely removing
the ZooKeeper mode and making KRaft the default and only supported operational
mode.115 For existing ZooKeeper-based clusters, a detailed migration path is provided,
allowing for a phased transition through a "hybrid mode" before the final cutover.120
Another landmark change is Tiered Storage (KIP-405), which lets brokers offload older
log segments to inexpensive remote object storage while serving recent data from
fast local disks. With tiered storage, Kafka solidifies its position as a true,
long-term system of record for event data, combining the low-latency performance of
local storage for real-time access with the cost-effectiveness and virtually
infinite capacity of cloud object storage for historical data.
The Apache Kafka 4.0 release introduced several other game-changing features that
signal the project's future direction.
● New Consumer Rebalance Protocol (KIP-848): Now generally available, this
new protocol revolutionizes how consumer groups handle rebalancing. It replaces
the old "stop-the-world" rebalance mechanism with a more cooperative protocol
that allows consumers to continue processing data from their assigned partitions
while a rebalance is in progress for other partitions. This dramatically reduces
downtime and improves the stability and performance of large, dynamic
consumer groups.118
● Queues for Kafka (KIP-932): This feature, currently in early access, directly
addresses one of the few remaining areas where traditional message queues held
an advantage. It introduces the concept of a Share Group as an alternative to a
consumer group. In a share group, the strict one-to-one mapping between
partitions and consumers is relaxed, allowing the number of consumers to exceed
the number of partitions. This enables true work-queue semantics, where a pool
of consumers can cooperatively process records from the same partitions, with
individual message acknowledgment and delivery tracking. This makes Kafka a
much more viable platform for traditional queuing use cases without sacrificing its
core durability and scalability.3
● Eligible Leader Replicas (ELR) (KIP-966): This preview feature further
strengthens Kafka's consistency guarantees during failover. It introduces a subset
of the ISR, known as the ELR, which contains only those replicas guaranteed to
have the complete data log up to the high-watermark. By restricting leader
elections to only replicas in the ELR, Kafka can further prevent rare edge cases
that could lead to data loss.118
These recent developments are not isolated improvements. They represent a clear
and coherent strategic response to the evolving demands of the data landscape and
the competitive pressures from other platforms, particularly Apache Pulsar. Pulsar's
primary architectural selling points have been its separation of compute and storage
and its native support for both streaming and queuing. Kafka's introduction of KRaft
simplifies its operational model, directly countering the argument of its complexity.
The implementation of Tiered Storage directly mirrors Pulsar's core architectural
benefit, addressing the cost and elasticity arguments that favored Pulsar in
cloud-native deployments. Finally, the introduction of Share Groups (Queues for
Kafka) is a direct answer to Pulsar's flexible queuing capabilities. This roadmap
demonstrates a clear strategy: to systematically re-architect Kafka to incorporate the
best ideas from its competitors, thereby neutralizing its perceived weaknesses while
leveraging its unparalleled ecosystem and market dominance to secure its position as
the leading event streaming platform for the foreseeable future.
Kafka is designed for horizontal scalability, but this requires careful planning.
● Partitioning Strategy: The number of partitions for a topic is one of the most
critical and difficult-to-change decisions. It determines the maximum consumer
parallelism and is a key factor in throughput. A common rule of thumb is to
provision partitions based on the target throughput (e.g., if a single partition can
handle 10 MB/s and the target is 100 MB/s, at least 10 partitions are needed) and
the expected number of consumer instances. It is generally better to
over-partition slightly than to under-partition, as adding partitions later can
disrupt key-based ordering guarantees.28 However, an excessive number of
partitions (e.g., thousands per broker) can increase memory overhead and leader
election time.28
● Broker Sizing and Scaling: Brokers should be sized based on expected network
I/O, disk throughput, and memory requirements. Kafka scales horizontally by
adding more broker nodes to the cluster. After adding brokers, partitions must be
reassigned to the new nodes to balance the load. This rebalancing can be a
resource-intensive operation. Tools like LinkedIn's Cruise Control can be used to
automate the process of generating and executing optimized partition
reassignment plans.124
For most organizations, an active-passive model using asynchronous cross-cluster
replication with a tool such as MirrorMaker 2 provides the best balance of data
protection, cost, and performance for a disaster recovery strategy.
This comprehensive analysis reveals that Apache Kafka is far more than a
high-performance message queue. It is a sophisticated, distributed event streaming
platform, architected around the foundational principle of the replicated commit log.
This core design choice is the wellspring of its defining characteristics: extreme
scalability that supports trillions of events per day; high-throughput performance
derived from sequential I/O patterns; and configurable data durability that allows it to
serve as a fault-tolerant, long-term system of record.
Based on this analysis, the following strategic recommendations can be made for
organizations considering or currently using Apache Kafka.
Exactly-Once Processing (Stream Processing): Use Kafka Streams with
processing.guarantee=exactly_once, or use the Transactional API with an idempotent
producer and isolation.level=read_committed consumers.65 This leverages Kafka's
native support for idempotency and atomic transactions to ensure that end-to-end
"consume-transform-produce" operations are processed exactly once, even in the face
of failures.
1. How (and why) Kafka was created at LinkedIn | Frontier Enterprise, truy cập vào
tháng 7 5, 2025,
https://www.frontier-enterprise.com/unleashing-kafka-insights-from-confluent-j
un-rao/
2. How Apache Kafka Powers Scalable Data Architectures - Peerbits, truy cập vào
tháng 7 5, 2025,
https://www.peerbits.com/blog/everything-you-need-to-about-apache-kafka.ht
ml
3. en.wikipedia.org, truy cập vào tháng 7 5, 2025,
https://en.wikipedia.org/wiki/Apache_Kafka
4. en.wikipedia.org, truy cập vào tháng 7 5, 2025,
https://en.wikipedia.org/wiki/Apache_Kafka#:~:text=Kafka%20was%20originally%
20developed%20at,Rao%20helped%20co%2Dcreate%20Kafka.
5. Kafka — All you want to know. History | by Ramprakash | Analytics Vidhya |
Medium, truy cập vào tháng 7 5, 2025,
https://medium.com/analytics-vidhya/kafka-all-you-want-to-know-b9624e49600
6
6. History of Kafka - Data Lake for Enterprises [Book] - O'Reilly Media, truy cập vào
tháng 7 5, 2025,
https://www.oreilly.com/library/view/data-lake-for/9781787281349/1ed43286-4179-
4c35-b044-4c1b379753d3.xhtml
7. Apache Kafka: Past, Present and Future - Confluent | DE, truy cập vào tháng 7 5,
2025,
https://www.confluent.io/de-de/online-talks/apache-kafka-past-present-future-o
n-demand/
8. What is Apache Kafka? Introduction - Conduktor, truy cập vào tháng 7 5, 2025,
https://learn.conduktor.io/kafka/what-is-apache-kafka-part-1/
9. What is Apache Kafka? | Confluent, truy cập vào tháng 7 5, 2025,
https://www.confluent.io/what-is-apache-kafka/
10.What is Kafka? - Apache Kafka Explained - AWS, truy cập vào tháng 7 5, 2025,
https://aws.amazon.com/what-is/apache-kafka/
11. Using Apache Kafka for log aggregation - Redpanda, truy cập vào tháng 7 5,
2025, https://www.redpanda.com/guides/kafka-use-cases-log-aggregation
12.Use Cases and Architectures for Apache Kafka across Industries ..., truy cập vào
tháng 7 5, 2025,
https://www.kai-waehner.de/blog/2020/10/20/apache-kafka-event-streaming-us
e-cases-architectures-examples-real-world-across-industries/
13.Apache Kafka: Architecture, deployment and ecosystem [2025 guide] -
Instaclustr, truy cập vào tháng 7 5, 2025,
https://www.instaclustr.com/education/apache-kafka/
14.The Past, Present and Future of Message Queue 1 - Vanus AI, truy cập vào tháng
7 5, 2025,
https://www.vanus.ai/blog/the-past-present-and-future-of-message-queue-1/
15.RabbitMQ vs Kafka - Difference Between Message Queue Systems - AWS, truy
cập vào tháng 7 5, 2025,
https://aws.amazon.com/compare/the-difference-between-rabbitmq-and-kafka/
16.How Kafka distributes the topic partitions among the brokers - Codemia, truy
cập vào tháng 7 5, 2025,
https://codemia.io/knowledge-hub/path/how_kafka_distributes_the_topic_partitio
ns_among_the_brokers
17.Apache Kafka: Real-Time Event Streaming Platform Explained | by Tahir | Medium,
truy cập vào tháng 7 5, 2025,
https://medium.com/@tahirbalarabe2/apache-kafka-real-time-event-streaming-
platform-explained-12497b2bed44
18.Apache Kafka documentation, truy cập vào tháng 7 5, 2025,
https://kafka.apache.org/documentation/
19.Documentation - Apache Kafka, truy cập vào tháng 7 5, 2025,
https://kafka.apache.org/081/documentation.html
20.I've been trying to rationalize using either RabbitMQ or Kafka for something I'm... |
Hacker News, truy cập vào tháng 7 5, 2025,
https://news.ycombinator.com/item?id=23259305
21.Kafka Vs RabbitMQ: Key Differences & Features Explained - Simplilearn.com, truy
cập vào tháng 7 5, 2025, https://www.simplilearn.com/kafka-vs-rabbitmq-article
22.Apache Kafka: What It Is, Use Cases and More | Built In, truy cập vào tháng 7 5,
2025, https://builtin.com/data-science/what-is-kafka
23.Overview of Kafka architecture: brokers, topics, partitions, and ..., truy cập vào
tháng 7 5, 2025,
https://www.codefro.com/2023/10/03/overview-of-kafka-architecture-brokers-to
pics-partitions-and-replication/
24.Best Practices for Kafka Production Deployments in Confluent Platform, truy cập
vào tháng 7 5, 2025,
https://docs.confluent.io/platform/current/kafka/post-deployment.html
25.Kafka Replication | Confluent Documentation, truy cập vào tháng 7 5, 2025,
https://docs.confluent.io/kafka/design/replication.html
26.Disaster Recovery and High Availability in Apache Kafka: Best Practices for
Resilient Streaming Systems | by Let's code - Medium, truy cập vào tháng 7 5,
2025,
https://medium.com/@letsCodeDevelopers/disaster-recovery-and-high-availabili
ty-in-apache-kafka-best-practices-for-resilient-streaming-5838122c3329
27.Kafka producer - Redpanda, truy cập vào tháng 7 5, 2025,
https://www.redpanda.com/guides/kafka-architecture-kafka-producer
28.Kafka Topics Choosing the Replication Factor and Partitions Count - Conduktor,
truy cập vào tháng 7 5, 2025,
https://learn.conduktor.io/kafka/kafka-topics-choosing-the-replication-factor-an
d-partitions-count/
29.Starting out with Kafka clusters: topics, partitions and brokers | by Martin Hodges
| Medium, truy cập vào tháng 7 5, 2025,
https://medium.com/@martin.hodges/starting-out-with-kafka-clusters-topics-pa
rtitions-and-brokers-c9fbe4ed1642
30.Kafka Partitions: Essential Concepts for Scalability and Performance - DataCamp,
truy cập vào tháng 7 5, 2025, https://www.datacamp.com/tutorial/kafka-partitions
31.A Beginner's Guide to Kafka® Consumers - Instaclustr, truy cập vào tháng 7 5,
2025, https://www.instaclustr.com/blog/a-beginners-guide-to-kafka-consumers/
32.Architecture - Apache Kafka, truy cập vào tháng 7 5, 2025,
https://kafka.apache.org/34/documentation/streams/architecture
33.How does kafka consumers/producers commit messages/partitions? - Stack
Overflow, truy cập vào tháng 7 5, 2025,
https://stackoverflow.com/questions/61679476/how-does-kafka-consumers-prod
ucers-commit-messages-partitions
34.Understanding Apache Kafka architecture – a definitive guide - Site24x7, truy cập
vào tháng 7 5, 2025,
https://www.site24x7.com/learn/apache-kafka-architecture.html
35.Tutorial: Apache Kafka Producer & Consumer APIs - Azure HDInsight | Microsoft
Learn, truy cập vào tháng 7 5, 2025,
https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-producer-
consumer-api
36.Kafka Producer and Consumer. I talked about Kafka architecture in ..., truy cập
vào tháng 7 5, 2025,
https://medium.com/@cobch7/kafka-producer-and-consumer-f1f6390994fc
37.How to send message to a particular partition in Kafka? - Stack Overflow, truy
cập vào tháng 7 5, 2025,
https://stackoverflow.com/questions/50324249/how-to-send-message-to-a-part
icular-partition-in-kafka
38.Apache Kafka Partition Key: A Comprehensive Guide - Confluent, truy cập vào
tháng 7 5, 2025, https://www.confluent.io/learn/kafka-partition-key/
39.How Producer decides in which Partition it has to put the message? - Stack
Overflow, truy cập vào tháng 7 5, 2025,
https://stackoverflow.com/questions/59389222/how-producer-decides-in-which-
partition-it-has-to-put-the-message
40.Kafka Keys, Partitions and Message Ordering - Lydtech Consulting, truy cập vào
tháng 7 5, 2025,
https://www.lydtechconsulting.com/blog-kafka-message-keys.html
41.dattell.com, truy cập vào tháng 7 5, 2025,
https://dattell.com/data-architecture-blog/does-kafka-guarantee-message-order
/#:~:text=Kafka%20Consumer%20Offset.-,Kafka%20Guarantees%20Order,the%
20message%20ordering%20is%20guaranteed.
42.Does Kafka Guarantee Message Order? - Dattell, truy cập vào tháng 7 5, 2025,
https://dattell.com/data-architecture-blog/does-kafka-guarantee-message-order
/
43.How to produce messages to selected partition using kafka-console-producer? -
Codemia, truy cập vào tháng 7 5, 2025,
https://codemia.io/knowledge-hub/path/how_to_produce_messages_to_selected
_partition_using_kafka-console-producer
44.How to use Apache Kafka to guarantee message ordering? - Medium, truy cập
vào tháng 7 5, 2025,
https://medium.com/latentview-data-services/how-to-use-apache-kafka-to-gua
rantee-message-ordering-ac2d00da6c22
45.Understanding Kafka Producer: How Partition Selection Works - Today I learned,
truy cập vào tháng 7 5, 2025,
https://til.hashnode.dev/understanding-kafka-producer-how-partition-selection-
works
46.Sending Data to a Specific Partition in Kafka - Baeldung, truy cập vào tháng 7 5,
2025, https://www.baeldung.com/kafka-send-data-partition
47.Kafka Producer for Confluent Platform, truy cập vào tháng 7 5, 2025,
https://docs.confluent.io/platform/current/clients/producer.html
48.Optimizing Kafka Performance: Tips for Tuning and Scaling Kafka ..., truy cập vào
tháng 7 5, 2025,
https://medium.com/@nemagan/optimizing-kafka-performance-tips-for-tuning-
and-scaling-kafka-clusters-ebc08153c661
49.Apache Kafka — Understanding how to produce and consume messages? -
Medium, truy cập vào tháng 7 5, 2025,
https://medium.com/@sirajul.anik/apache-kafka-understanding-how-to-produce
-and-consume-messages-9744c612f40f
50.Delivery Semantics for Kafka Consumers | Learn Apache Kafka - Conduktor, truy
cập vào tháng 7 5, 2025,
https://learn.conduktor.io/kafka/delivery-semantics-for-kafka-consumers/
51. Understanding In-Sync Replicas (ISR) in Apache Kafka - GeeksforGeeks, accessed July 5, 2025, https://www.geeksforgeeks.org/understanding-in-sync-replicas-isr-in-apache-kafka/
52. Leader Follower Pattern in Distributed Systems - GeeksforGeeks, accessed July 5, 2025, https://www.geeksforgeeks.org/system-design/leader-follower-pattern-in-distributed-systems/
53. broker - What is a partition leader in Apache Kafka? - Stack Overflow, accessed July 5, 2025, https://stackoverflow.com/questions/60835817/what-is-a-partition-leader-in-apache-kafka/60837212
54. Multi-Geo Replication in Apache Kafka - Confluent, accessed July 5, 2025, https://www.confluent.io/blog/multi-geo-replication-in-apache-kafka/
55. Kafka Replication & Min In-Sync Replicas - Lydtech Consulting, accessed July 5, 2025, https://www.lydtechconsulting.com/blog-kafka-replication.html
56. When does Kafka Leader Election happen? - Codemia, accessed July 5, 2025, https://codemia.io/knowledge-hub/path/when_does_kafka_leader_election_happen
57. Kafka Replication: Concept & Best Practices - GitHub, accessed July 5, 2025, https://github.com/AutoMQ/automq/wiki/Kafka-Replication:-Concept-&-Best-Practices
58. Learning Kafka - Configuring Kafka Producer for Message Durability - Blog, accessed July 5, 2025, https://dsinecos.github.io/blog/Learning-Kafka-Configure-Kafka-Producer-for-Message-Durability
59. What is a partition leader in Apache Kafka? - broker - Stack Overflow, accessed July 5, 2025, https://stackoverflow.com/questions/60835817/what-is-a-partition-leader-in-apache-kafka
60. Understanding Message Durability in Kafka | by Amarendra Singh - Dev Genius, accessed July 5, 2025, https://blog.devgenius.io/understanding-message-durability-in-kafka-8f6e7006aea8
61. Kafka — Data Durability and Availability Guarantees | by Mahesh Saini | The Life Titbits, accessed July 5, 2025, https://medium.com/the-life-titbits/kafka-data-durability-and-availability-guarantees-add5e4340638
62. Kafka Topic Replication | Learn Apache Kafka with Conduktor, accessed July 5, 2025, https://learn.conduktor.io/kafka/kafka-topic-replication/
63. Ensuring Message Ordering in Kafka: Strategies and Configurations | Baeldung, accessed July 5, 2025, https://www.baeldung.com/kafka-message-ordering
64. RabbitMQ vs. Kafka vs. ActiveMQ: A Battle of Messaging Brokers, accessed July 5, 2025, https://www.designgurus.io/blog/rabbitmq-kafka-activemq-system-design
65. Exactly-once Semantics is Possible: Here's How Apache Kafka Does it, accessed July 5, 2025, https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
66. Demystifying Apache Kafka Message Delivery Semantics - Keen IO, accessed July 5, 2025, https://keen.io/blog/demystifying-apache-kafka-message-delivery-semantics-at-most-once-at-least-once-exactly-once/
67. Kafka message delivery semantics: at most once, at least once, exactly once | by Navya PS, accessed July 5, 2025, https://medium.com/@psnavya90/kafka-message-delivery-semantics-at-most-once-at-least-once-exactly-once-14bc48046776
68. Apache Kafka's Exactly-Once Semantics in Spring Cloud Stream Kafka Applications, accessed July 5, 2025, https://spring.io/blog/2023/10/16/apache-kafkas-exactly-once-semantics-in-spring-cloud-stream-kafka/
69. How Kafka achieves exactly-once semantics | by Oleg Potapov - Medium, accessed July 5, 2025, https://oleg0potapov.medium.com/how-kafka-achieves-exactly-once-semantics-57fdb7ad2e3f
70. Exactly-Once Processing in Kafka explained | by sudan - Medium, accessed July 5, 2025, https://ssudan16.medium.com/exactly-once-processing-in-kafka-explained-66ecc41a8548
71. Exactly Once Processing in Kafka with Java | Baeldung, accessed July 5, 2025, https://www.baeldung.com/kafka-exactly-once
72. en.wikipedia.org, accessed July 5, 2025, https://en.wikipedia.org/wiki/Message_queue
73. What Is a Message Queue? | IBM, accessed July 5, 2025, https://www.ibm.com/think/topics/message-queues
74. Apache Kafka vs. ActiveMQ: Differences & Comparison - AutoMQ, accessed July 5, 2025, https://www.automq.com/blog/apache-kafka-vs-activemq-differences-and-comparison
75. Kafka vs RabbitMQ: Key Differences & When to Use Each | DataCamp, accessed July 5, 2025, https://www.datacamp.com/blog/kafka-vs-rabbitmq
76. Kafka vs Message Queue: A Quick Comparison - Linearloop, accessed July 5, 2025, https://www.linearloop.io/blog/kafka-vs-message-queue-a-quick-comparison
77. Benchmarking RabbitMQ vs Kafka vs Pulsar Performance - Confluent, accessed July 5, 2025, https://www.confluent.io/blog/kafka-fastest-messaging-system/
78. Apache Kafka® vs ActiveMQ: 5 key differences and how to choose - Instaclustr, accessed July 5, 2025, https://www.instaclustr.com/education/apache-kafka/apache-kafka-vs-activemq-5-key-differences-and-how-to-choose/
79. Difference between Apache Kafka, RabbitMQ, and ActiveMQ - DEV Community, accessed July 5, 2025, https://dev.to/somadevtoo/difference-between-apache-kafka-rabbitmq-and-activemq-4f1k
80. When to use RabbitMQ over Kafka? [closed] - Stack Overflow, accessed July 5, 2025, https://stackoverflow.com/questions/42151544/when-to-use-rabbitmq-over-kafka
81. When to use Apache kafka instead of ActiveMQ [closed] - Stack Overflow, accessed July 5, 2025, https://stackoverflow.com/questions/44792604/when-to-use-apache-kafka-instead-of-activemq
82. RabbitMQ vs. Apache Kafka | Confluent, accessed July 5, 2025, https://www.confluent.io/learn/rabbitmq-vs-apache-kafka/
83. ActiveMQ vs Kafka: Differences & Use Cases Explained - DataCamp, accessed July 5, 2025, https://www.datacamp.com/blog/activemq-vs-kafka
84. What is Apache Kafka? - Red Hat, accessed July 5, 2025, https://www.redhat.com/en/topics/integration/what-is-apache-kafka
85. Powered By - Apache Kafka, accessed July 5, 2025, https://kafka.apache.org/powered-by
86. Apache Kafka Use Cases: When to Choose It and When to Look Elsewhere - CelerData, accessed July 5, 2025, https://celerdata.com/glossary/apache-kafka-use-cases
87. Use Cases - Apache Kafka, accessed July 5, 2025, https://kafka.apache.org/uses
88. 10 Real-World Event-Driven Architecture Examples in Logistics. Implementing Kafka at Scale to Handle Supply Chain Network - nexocode, accessed July 5, 2025, https://nexocode.com/blog/posts/event-driven-architecture-examples-in-logistics-apache-kafka-to-handle-supply-chain-network/
89. Apache ActiveMQ vs. Kafka | Baeldung, accessed July 5, 2025, https://www.baeldung.com/apache-activemq-vs-kafka
90. Kafka vs Pulsar: Key Differences - Optiblack, accessed July 5, 2025, https://optiblack.com/insights/kafka-vs-pulsar-key-differences
91. Apache Kafka vs. Apache Pulsar: Differences & Comparison - AutoMQ, accessed July 5, 2025, https://www.automq.com/blog/apache-kafka-vs-apache-pulsar-differences-comparison
92. How is Apache Pulsar different from Apache Kafka? - Milvus, accessed July 5, 2025, https://milvus.io/ai-quick-reference/how-is-apache-pulsar-different-from-apache-kafka
93. When would you use Kafka, vs some other broker? : r/apachekafka - Reddit, accessed July 5, 2025, https://www.reddit.com/r/apachekafka/comments/hf8you/when_would_you_use_kafka_vs_some_other_broker/
94. Kafka vs. Pulsar vs. RabbitMQ: Performance, Architecture, and Features Compared, accessed July 5, 2025, https://www.confluent.io/kafka-vs-pulsar/
95. Kafka vs Pulsar: Choosing the Right Stream Processing Platform - RisingWave, accessed July 5, 2025, https://risingwave.com/blog/kafka-vs-pulsar-choosing-the-right-stream-processing-platform/
96. Comparing Apache Pulsar vs. Apache Kafka | 2022 Benchmark Report, accessed July 5, 2025, https://streamnative.io/blog/apache-pulsar-vs-apache-kafka-2022-benchmark
97. A More Accurate Perspective on Pulsar's Performance Compared to Kafka - StreamNative, accessed July 5, 2025, https://streamnative.io/blog/perspective-on-pulsars-performance-compared-to-kafka
98. What is Kafka Connect—Complete Guide - Redpanda, accessed July 5, 2025, https://www.redpanda.com/guides/kafka-tutorial-what-is-kafka-connect
99. What is Kafka Connect? Concepts & Best Practices - AutoMQ, accessed July 5, 2025, https://www.automq.com/blog/kafka-connect-architecture-concepts-best-practices
100. Kafka Connect | Confluent Documentation, accessed July 5, 2025, https://docs.confluent.io/platform/current/connect/index.html
101. Kafka Connectors—Overview, use cases, and best practices - Redpanda, accessed July 5, 2025, https://www.redpanda.com/guides/kafka-cloud-kafka-connectors
102. Architecture - Apache Kafka, accessed July 5, 2025, https://kafka.apache.org/20/documentation/streams/architecture
103. The Nuts and Bolts of Kafka Streams---An Architectural Deep Dive - YouTube, accessed July 5, 2025, https://www.youtube.com/watch?v=2_-WoWlAD5M
104. What is Apache Kafka? Ecosystem - Conduktor, accessed July 5, 2025, https://learn.conduktor.io/kafka/what-is-apache-kafka-part-3/
105. Apache Kafka® and ksqlDB for Confluent Platform, accessed July 5, 2025, https://docs.confluent.io/platform/current/ksqldb/concepts/apache-kafka-primer.html
106. Apache Kafka — Part III — ksqlDB - Medium, accessed July 5, 2025, https://medium.com/@selcuk.sert/apache-kafka-part-iii-ksqldb-f3f1b8cbaf60
107. Introduction to ksqlDB | Baeldung, accessed July 5, 2025, https://www.baeldung.com/ksqldb
108. Mastering ksqldb Tutorial: Your Ultimate Guide - RisingWave: Streaming Database Built on Open Standards, accessed July 5, 2025, https://risingwave.com/blog/mastering-ksqldb-tutorial-your-ultimate-guide/
109. Study Notes 6.11-12: Kafka ksqlDB, Connect & Schema Registry - DEV Community, accessed July 5, 2025, https://dev.to/pizofreude/study-notes-611-12-kafka-ksqldb-connect-schema-registry-2g9n
110. Schema Registry for Confluent Platform | Confluent Documentation, accessed July 5, 2025, https://docs.confluent.io/platform/current/schema-registry/index.html
111. Kafka Schema Registry in Distributed Systems | by Alex Klimenko - Medium, accessed July 5, 2025, https://medium.com/@alxkm/kafka-schema-registry-in-distributed-systems-8a99bad321b1
112. Kafka's Shift from ZooKeeper to Kraft | Baeldung, accessed July 5, 2025, https://www.baeldung.com/kafka-shift-from-zookeeper-to-kraft
113. KRaft: Apache Kafka Without ZooKeeper - SOC Prime, accessed July 5, 2025, https://socprime.com/blog/kraft-apache-kafka-without-zookeeper/
114. Kafka Raft vs. ZooKeeper vs. Redpanda, accessed July 5, 2025, https://www.redpanda.com/guides/kafka-alternatives-kafka-raft
115. The Evolution of Kafka Architecture: From ZooKeeper to KRaft | by Roman Glushach, accessed July 5, 2025, https://romanglushach.medium.com/the-evolution-of-kafka-architecture-from-zookeeper-to-kraft-f42d511ba242
116. Guide to ZooKeeper to KRaft migration - OSO, accessed July 5, 2025, https://oso.sh/blog/guide-to-zookeeper-to-kraft-migration/
117. KRaft - Apache Kafka Without ZooKeeper - Confluent Developer, accessed July 5, 2025, https://developer.confluent.io/learn/kraft/
118. Apache Kafka 4.0 Release: Default KRaft, Queues, Faster Rebalances, accessed July 5, 2025, https://www.confluent.io/blog/latest-apache-kafka-release/
119. Apache Kafka 4.0, accessed July 5, 2025, https://kafka.apache.org/blog
120. From ZooKeeper to KRaft: How the Kafka migration works - Strimzi, accessed July 5, 2025, https://strimzi.io/blog/2024/03/21/kraft-migration/
121. Migrate from ZooKeeper to KRaft on Confluent Platform, accessed July 5, 2025, https://docs.confluent.io/platform/current/installation/migrate-zk-kraft.html
122. Migrating Zookeeper to Kraft | The Write Ahead Log, accessed July 5, 2025, https://platformatory.io/blog/Migrating-Zookeeper-to-kraft/
123. The various tiers of Apache Kafka Tiered Storage - Strimzi, accessed July 5, 2025, https://strimzi.io/blog/2025/04/22/tha-various-tiers-of-apache-kafka-tiered-storage/
124. How to auto scale Apache Kafka with Tiered Storage in Production - OSO, accessed July 5, 2025, https://oso.sh/blog/how-to-auto-scale-apache-kafka-with-tiered-storage-in-production/
125. Kafka performance tuning strategies and practical tips - Redpanda, accessed July 5, 2025, https://www.redpanda.com/guides/kafka-performance-kafka-performance-tuning
126. Kafka monitoring: Tutorials and best practices - Redpanda, accessed July 5, 2025, https://www.redpanda.com/guides/kafka-performance-kafka-monitoring
127. The Hitchhiker's Guide to Disaster Recovery and Multi-Region Kafka ..., accessed July 5, 2025, https://www.warpstream.com/blog/the-hitchhikers-guide-to-disaster-recovery-and-multi-region-kafka
128. DR for Kafka Cluster : r/apachekafka - Reddit, accessed July 5, 2025, https://www.reddit.com/r/apachekafka/comments/1i93cmi/dr_for_kafka_cluster/
129. Building Bulletproof Disaster Recovery for Apache Kafka: A Field-Tested Architecture - OSO, accessed July 5, 2025, https://oso.sh/blog/building-bulletproof-disaster-recovery-for-apache-kafka-a-field-tested-architecture/
130. Replicate Multi-Datacenter Topics Across Kafka Clusters in Confluent Platform, accessed July 5, 2025, https://docs.confluent.io/platform/current/multi-dc-deployments/replicator/index.html
131. Build multi-Region resilient Apache Kafka applications with identical topic names using Amazon MSK and Amazon MSK Replicator | AWS Big Data Blog, accessed July 5, 2025, https://aws.amazon.com/blogs/big-data/build-multi-region-resilient-apache-kafka-applications-with-identical-topic-names-using-amazon-msk-and-amazon-msk-replicator/
132. Failover & Failback Runbooks - JetStream Software, accessed July 5, 2025, http://www.jetstreamsoft.com/wp-content/uploads/2020/05/Failover-Failback-Runbooks_v1.0.pdf
133. Cluster Linking Disaster Recovery and Failover on Confluent Cloud, accessed July 5, 2025, https://docs.confluent.io/cloud/current/multi-cloud/cluster-linking/dr-failover.html
134. Performing a failover or failback - Cloudera Documentation, accessed July 5, 2025, https://docs.cloudera.com/csm-operator/1.2/kafka-replication-deploy-configure/topics/csm-op-replication-failover-failback.html