Apache Kafka Documentation
1. GETTING STARTED
1.1 Introduction
Event streaming is the digital equivalent of the human body's central nervous system. It is
the technological foundation for the 'always-on' world where businesses are increasingly
software-defined and automated, and where the user of software is more software.
Technically speaking, event streaming is the practice of capturing data in real-time from
event sources like databases, sensors, mobile devices, cloud services, and software
applications in the form of streams of events; storing these event streams durably for later
retrieval; manipulating, processing, and reacting to the event streams in real-time as well as
retrospectively; and routing the event streams to different destination technologies as
needed. Event streaming thus ensures a continuous flow and interpretation of data so that
the right information is at the right place, at the right time.
Kafka combines three key capabilities so you can implement your use cases for event
streaming end-to-end with a single battle-tested solution: publishing (writing) and
subscribing to (reading) streams of events, storing them durably and reliably for as long as
you want, and processing them as they occur or retrospectively.
All of this functionality is provided in a distributed, highly scalable, elastic, fault-tolerant,
and secure manner. Kafka can be deployed on bare-metal hardware, virtual machines, and
containers, and on-premises as well as in the cloud. You can choose between self-
managing your Kafka environments and using fully managed services offered by a variety of
vendors.
Servers: Kafka is run as a cluster of one or more servers that can span multiple datacenters
or cloud regions. Some of these servers form the storage layer, called the brokers. Other
servers run Kafka Connect to continuously import and export data as event streams to
integrate Kafka with your existing systems such as relational databases as well as other
Kafka clusters. To let you implement mission-critical use cases, a Kafka cluster is highly
scalable and fault-tolerant: if any of its servers fails, the other servers will take over their
work to ensure continuous operations without any data loss.
Clients: They allow you to write distributed applications and microservices that read, write,
and process streams of events in parallel, at scale, and in a fault-tolerant manner even in
the case of network problems or machine failures. Kafka ships with some such clients
included, which are augmented by dozens of clients provided by the Kafka community:
clients are available for Java and Scala including the higher-level Kafka Streams library, for
Go, Python, C/C++, and many other programming languages as well as REST APIs.
Main Concepts and Terminology
An event records the fact that "something happened" in the world or in your business. It is
also called record or message in the documentation. When you read or write data to Kafka,
you do this in the form of events. Conceptually, an event has a key, value, timestamp, and
optional metadata headers. Here's an example event:
Event key: "Alice"
Event value: "Made a payment of $200 to Bob"
Event timestamp: "Jun. 25, 2020 at 2:06 p.m."
Events are organized and durably stored in topics. Very simplified, a topic is similar to a
folder in a filesystem, and the events are the files in that folder. An example topic name
could be "payments". Topics in Kafka are always multi-producer and multi-subscriber: a
topic can have zero, one, or many producers that write events to it, as well as zero, one, or
many consumers that subscribe to these events. Events in a topic can be read as often as
needed—unlike traditional messaging systems, events are not deleted after consumption.
Instead, you define for how long Kafka should retain your events through a per-topic
configuration setting, after which old events will be discarded. Kafka's performance is
effectively constant with respect to data size, so storing data for a long time is perfectly
fine.
To make your data fault-tolerant and highly available, every topic can be replicated, even
across geo-regions or datacenters, so that there are always multiple brokers that have a
copy of the data just in case things go wrong, you want to do maintenance on the brokers,
and so on. A common production setting is a replication factor of 3, i.e., there will always be
three copies of your data. This replication is performed at the level of topic-partitions.
Kafka APIs
In addition to command line tooling for management and administration tasks, Kafka has
five core APIs for Java and Scala:
The Admin API to manage and inspect topics, brokers, and other Kafka objects.
The Producer API to publish (write) a stream of events to one or more Kafka topics.
The Consumer API to subscribe to (read) one or more topics and to process the
stream of events produced to them.
The Kafka Streams API to implement stream processing applications and
microservices. It provides higher-level functions to process event streams, including
transformations, stateful operations like aggregations and joins, windowing,
processing based on event-time, and more. Input is read from one or more topics in
order to generate output to one or more topics, effectively transforming the input
streams to output streams.
The Kafka Connect API to build and run reusable data import/export connectors that
consume (read) or produce (write) streams of events from and to external systems
and applications so they can integrate with Kafka. For example, a connector to a
relational database like PostgreSQL might capture every change to a set of tables.
However, in practice, you typically don't need to implement your own connectors
because the Kafka community already provides hundreds of ready-to-use
connectors.
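As a brief illustration of the Producer and Consumer APIs, the following Java sketch writes one event and then reads events back. The topic name, bootstrap server, and group id are illustrative rather than taken from this documentation.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QuickTour {
    public static void main(String[] args) {
        // Producer: publish a single event to the illustrative "payments" topic.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("payments", "alice", "paid $200 to bob"));
        }

        // Consumer: subscribe to the same topic and poll once for events.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "payments-reader");
        consumerProps.put("auto.offset.reset", "earliest"); // read from the beginning on first start
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("payments"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("key=%s value=%s%n", record.key(), record.value());
            }
        }
    }
}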
1.2 Use Cases
Here is a description of a few of the popular use cases for Apache Kafka®. For an overview
of a number of these areas in action, see this blog post.
Messaging
Kafka works well as a replacement for a more traditional message broker. Message brokers
are used for a variety of reasons (to decouple processing from data producers, to buffer
unprocessed messages, etc.). In comparison to most messaging systems, Kafka has better
throughput, built-in partitioning, replication, and fault-tolerance, which makes it a good
solution for large-scale message processing applications.
In our experience messaging uses are often comparatively low-throughput, but may require
low end-to-end latency and often depend on the strong durability guarantees Kafka
provides.
Website Activity Tracking
Activity tracking is often very high volume, as many activity messages are generated for
each user page view.
Metrics
Kafka is often used for operational monitoring data. This involves aggregating statistics
from distributed applications to produce centralized feeds of operational data.
Log Aggregation
Many people use Kafka as a replacement for a log aggregation solution. Log aggregation
typically collects physical log files off servers and puts them in a central place (a file server
or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a
cleaner abstraction of log or event data as a stream of messages. This allows for lower-
latency processing and easier support for multiple data sources and distributed data
consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers
equally good performance, stronger durability guarantees due to replication, and much
lower end-to-end latency.
Stream Processing
Many users of Kafka process data in processing pipelines consisting of multiple stages,
where raw input data is consumed from Kafka topics and then aggregated, enriched, or
otherwise transformed into new topics for further consumption or follow-up processing. For
example, a processing pipeline for recommending news articles might crawl article content
from RSS feeds and publish it to an "articles" topic; further processing might normalize or
deduplicate this content and publish the cleansed article content to a new topic; a final
processing stage might attempt to recommend this content to users. Such processing
pipelines create graphs of real-time data flows based on the individual topics. Starting in
0.10.0.0, a light-weight but powerful stream processing library called Kafka Streams is
available in Apache Kafka to perform such data processing as described above. Apart from
Kafka Streams, alternative open source stream processing tools include Apache
Storm and Apache Samza.
Event Sourcing
Event sourcing is a style of application design where state changes are logged as a time-
ordered sequence of records. Kafka's support for very large stored log data makes it an
excellent backend for an application built in this style.
Commit Log
Kafka can serve as a kind of external commit-log for a distributed system. The log helps
replicate data between nodes and acts as a re-syncing mechanism for failed nodes to
restore their data. The log compaction feature in Kafka helps support this usage. In this
usage, Kafka is similar to the Apache BookKeeper project.
1.3 Quick Start
STEP 2: START THE KAFKA ENVIRONMENT
Run the following commands in order to start all services in the correct order:
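For a ZooKeeper-based setup, the sequence is to start ZooKeeper first and then the Kafka broker, each in its own terminal session; the paths below assume the default config files shipped with the distribution.
# Start the ZooKeeper service
$ bin/zookeeper-server-start.sh config/zookeeper.properties

# In another terminal session, start the Kafka broker service
$ bin/kafka-server-start.sh config/server.properties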
Once all services have successfully launched, you will have a basic Kafka environment
running and ready to use.
STEP 3: CREATE A TOPIC TO STORE YOUR EVENTS
Kafka is a distributed event streaming platform that lets you read, write, store, and
process events (also called records or messages in the documentation) across many
machines.
Example events are payment transactions, geolocation updates from mobile phones,
shipping orders, sensor measurements from IoT devices or medical equipment, and much
more. These events are organized and stored in topics. Very simplified, a topic is similar to
a folder in a filesystem, and the events are the files in that folder.
So before you can write your first events, you must create a topic. Open another terminal
session and run:
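For example, to create a topic named quickstart-events (the topic name is illustrative) against a broker listening on localhost:9092:
$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092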
All of Kafka's command line tools have additional options: run the kafka-
topics.sh command without any arguments to display usage information. For example, it
can also show you details such as the partition count of the new topic:
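For instance, the --describe option prints the partition count, leader, and replica assignment of the topic created above:
$ bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server localhost:9092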
STEP 4: WRITE SOME EVENTS INTO THE TOPIC
A Kafka client communicates with the Kafka brokers via the network for writing (or reading)
events. Once received, the brokers will store the events in a durable and fault-tolerant
manner for as long as you need—even forever.
Run the console producer client to write a few events into your topic. By default, each line
you enter will result in a separate event being written to the topic.
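For example (topic name illustrative); type a few lines of your own and press Ctrl-C at any time to stop the producer client:
$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
This is my first event
This is my second event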
STEP 5: READ THE EVENTS
Open another terminal session and run the console consumer client to read the events you
just created:
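For example, reading the illustrative quickstart-events topic from the beginning:
$ bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092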
Feel free to experiment: for example, switch back to your producer terminal (previous step)
to write additional events, and see how the events immediately show up in your consumer
terminal.
Because events are durably stored in Kafka, they can be read as many times and by as
many consumers as you want. You can easily verify this by opening yet another terminal
session and re-running the previous command again.
STEP 6: IMPORT/EXPORT YOUR DATA AS STREAMS OF EVENTS WITH KAFKA CONNECT
You probably have lots of data in existing systems like relational databases or traditional
messaging systems, along with many applications that already use these systems. Kafka
Connect allows you to continuously ingest data from external systems into Kafka, and vice
versa. It is thus very easy to integrate existing systems with Kafka. To make this process
even easier, there are hundreds of such connectors readily available.
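As one illustrative example (not prescribed by this step), the file source and sink connectors shipped with the distribution can be run in standalone mode; the config file names below are the defaults found in the config directory:
$ bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties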
STEP 7: PROCESS YOUR EVENTS WITH KAFKA STREAMS
Once your data is stored in Kafka as events, you can process it with the Kafka Streams client
library for Java and Scala. To give you a first taste, here's how one would implement the
popular WordCount algorithm:
wordCounts.toStream().to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));
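The single line above is the final step of the topology. For context, a minimal sketch of the full WordCount topology might look like the following; the topic names "input-topic" and "output-topic" and the surrounding boilerplate are illustrative rather than taken from the original text, and default String serdes are assumed to be configured in StreamsConfig.
import java.util.Arrays;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();

KStream<String, String> textLines = builder.stream("input-topic");

KTable<String, Long> wordCounts = textLines
    // Split each text line into lowercase words.
    .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
    // Group by the word itself so identical words are counted together.
    .groupBy((keyIgnored, word) -> word)
    // Count occurrences per word; the result is a continuously updated KTable.
    .count();

// Write the running counts back to Kafka.
wordCounts.toStream().to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));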
STEP 8: TERMINATE THE KAFKA ENVIRONMENT
Now that you have reached the end of the quickstart, feel free to tear down the Kafka
environment—or continue playing around.
1. Stop the producer and consumer clients with Ctrl-C, if you haven't done so already.
2. Stop the Kafka broker with Ctrl-C.
3. Lastly, stop the ZooKeeper server with Ctrl-C.
If you also want to delete any data of your local Kafka environment including any events you
have created along the way, run the command:
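Assuming the default data directories under /tmp, the command is:
$ rm -rf /tmp/kafka-logs /tmp/zookeeper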
CONGRATULATIONS!
You have successfully finished the Apache Kafka quickstart.
Read through the brief Introduction to learn how Kafka works at a high level, its main
concepts, and how it compares to other technologies. To understand Kafka in more
detail, head over to the Documentation.
Browse through the Use Cases to learn how other users in our world-wide
community are getting value out of Kafka.
Join a local Kafka meetup group and watch talks from Kafka Summit, the main
conference of the Kafka community.
1.4 Ecosystem
There are a plethora of tools that integrate with Kafka outside the main distribution.
The ecosystem page lists many of these, including stream processing systems, Hadoop
integration, monitoring, and deployment tools.
1.5 Upgrading From Previous Versions
Upgrading from 0.8.x, 0.9.x, 0.10.0.x, 0.10.1.x, 0.10.2.x, 0.11.0.x, 1.0.x, 1.1.x, 2.0.x, 2.1.x,
2.2.x, 2.3.x, 2.4.x, or 2.5.x to 2.6.0
If you are upgrading from a version prior to 2.1.x, please see the note below about the
change to the schema used to store consumer offsets. Once you have changed the
inter.broker.protocol.version to the latest version, it will not be possible to downgrade to a
version prior to 2.1.
For a rolling upgrade:
1. Update server.properties on all brokers and add the following property, where
CURRENT_KAFKA_VERSION refers to the version you are upgrading from. If you are
upgrading from version 0.11.0.x or above, and you have not overridden the message
format, then you only need to override the inter-broker protocol version; if you have
overridden the message format version, leave log.message.format.version at its current
value for now.
o inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g., 2.5, 2.4, etc.)
2. Upgrade the brokers one at a time: shut down the broker, update the code, and
restart it. Once you have done so, the brokers will be running the latest version and
you can verify that the cluster's behavior and performance meets expectations. It is
still possible to downgrade at this point if there are any problems.
3. Once the cluster's behavior and performance has been verified, bump the protocol
version by editing inter.broker.protocol.version and setting it to 2.6.
4. Restart the brokers one by one for the new protocol version to take effect. Once the
brokers begin using the latest protocol version, it will no longer be possible to
downgrade the cluster to an older version.
5. If you have overridden the message format version as instructed above, then you
need to do one more rolling restart to upgrade it to its latest version. Once all (or
most) consumers have been upgraded to 0.11.0 or later, change
log.message.format.version to 2.6 on each broker and restart them one by one. Note
that the older Scala clients, which are no longer maintained, do not support the
message format introduced in 0.11, so to avoid conversion costs (or to take
advantage of exactly once semantics), the newer Java clients must be used.
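To make the sequence concrete, the relevant server.properties entries evolve roughly as follows during the rolling upgrade; the 2.5 starting version is illustrative.
# Step 1 (before upgrading the broker code): pin the protocol to the version you are upgrading from
inter.broker.protocol.version=2.5
# Step 3 (after all brokers run the new code and behavior has been verified): bump the protocol
inter.broker.protocol.version=2.6
# Step 5 (only if the message format version was previously overridden): bump it last
log.message.format.version=2.6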
Notable changes in 2.6.0
Kafka Streams adds a new processing mode (requires broker 2.5 or newer) that
improves application scalability using exactly-once guarantees (cf. KIP-447).
TLSv1.3 has been enabled by default for Java 11 or newer. The client and server will
negotiate TLSv1.3 if both support it and fall back to TLSv1.2 otherwise. See KIP-573 for
more details.
The default value for the client.dns.lookup configuration has been changed
from default to use_all_dns_ips. If a hostname resolves to multiple IP addresses,
clients and brokers will now attempt to connect to each IP in sequence until the
connection is successfully established. See KIP-602 for more details.
NotLeaderForPartitionException has been deprecated and replaced
with NotLeaderOrFollowerException. Fetch requests and other requests intended only
for the leader or follower return NOT_LEADER_OR_FOLLOWER(6) instead of
REPLICA_NOT_AVAILABLE(9) if the broker is not a replica, ensuring that this
transient error during reassignments is handled by all clients as a retriable
exception.
Upgrading from 0.8.x, 0.9.x, 0.10.0.x, 0.10.1.x, 0.10.2.x, 0.11.0.x, 1.0.x, 1.1.x, 2.0.x, 2.1.x,
2.2.x, 2.3.x, or 2.4.x to 2.5.0
If you are upgrading from a version prior to 2.1.x, please see the note below about the
change to the schema used to store consumer offsets. Once you have changed the
inter.broker.protocol.version to the latest version, it will not be possible to downgrade to a
version prior to 2.1.
For a rolling upgrade:
If you are upgrading from version 0.11.0.x or above, and you have not overridden the
message format, then you only need to override the inter-broker protocol version.
Upgrading from 0.8.x, 0.9.x, 0.10.0.x, 0.10.1.x, 0.10.2.x, 0.11.0.x, 1.0.x, 1.1.x, 2.0.x, 2.1.x,
2.2.x, or 2.3.x to 2.4.0
If you are upgrading from a version prior to 2.1.x, please see the note below about the
change to the schema used to store consumer offsets. Once you have changed the
inter.broker.protocol.version to the latest version, it will not be possible to downgrade to a
version prior to 2.1.
If you are upgrading from version 0.11.0.x or above, and you have not overridden the
message format, then you only need to override the inter-broker protocol version.
1. ZooKeeper has been upgraded to 3.5.6. The ZooKeeper upgrade from 3.4.X to 3.5.6 can
fail if there are no snapshot files in the 3.4 data directory. This usually happens in test
upgrades where ZooKeeper 3.5.6 is trying to load an existing 3.4 data dir in which no
snapshot file has been created. For more details about the issue please refer
to ZOOKEEPER-3056. A fix is given in ZOOKEEPER-3056, which is to
set the snapshot.trust.empty=true config in zookeeper.properties before the upgrade.
However, we have observed data loss in standalone cluster upgrades when
using the snapshot.trust.empty=true config. For more details about the issue please
refer to ZOOKEEPER-3644. So we recommend the safe workaround of copying an
empty snapshot file to the 3.4 data directory if there are no snapshot files in the 3.4 data
directory. For more details about the workaround please refer to the ZooKeeper Upgrade
FAQ.
2. An embedded Jetty-based AdminServer was added in ZooKeeper 3.5. The AdminServer is
enabled by default in ZooKeeper and is started on port 8080. The AdminServer is
disabled by default in the ZooKeeper config (zookeeper.properties) provided by the
Apache Kafka distribution. Make sure to update your local zookeeper.properties file
with admin.enableServer=false if you wish to disable the AdminServer. Please
refer to the AdminServer config to configure the AdminServer.
A new Admin API has been added for partition reassignments. Due to changing the
way Kafka propagates reassignment information, it is possible to lose reassignment
state in failure edge cases while upgrading to the new version. It is not
recommended to start reassignments while upgrading.
ZooKeeper has been upgraded from 3.4.14 to 3.5.6. TLS and dynamic
reconfiguration are supported by the new version.
The bin/kafka-preferred-replica-election.sh command line tool has been
deprecated. It has been replaced by bin/kafka-leader-election.sh.
The methods electPreferredLeaders in the Java AdminClient class have been
deprecated in favor of the methods electLeaders.
Scala code leveraging the NewTopic(String, int, short) constructor with literal
values will need to explicitly call toShort on the second literal.
The argument in the constructor GroupAuthorizationException(String) is now used
to specify an exception message. Previously it referred to the group that failed
authorization. This was done for consistency with other exception types and to avoid
potential misuse. The constructor TopicAuthorizationException(String) which was
previously used for a single unauthorized topic was changed similarly.
The internal PartitionAssignor interface has been deprecated and replaced with a
new ConsumerPartitionAssignor in the public API. Some methods/signatures are
slightly different between the two interfaces. Users implementing a custom
PartitionAssignor should migrate to the new interface as soon as possible.
The DefaultPartitioner now uses a sticky partitioning strategy. This means that
records for a specific topic with null keys and no assigned partition will be sent to the
same partition until the batch is ready to be sent. When a new batch is created, a
new partition is chosen. This decreases the latency to produce, but it may result in an
uneven distribution of records across partitions in edge cases. Generally users will
not be impacted, but this difference may be noticeable in tests and other situations
producing records for a very short amount of time.
The blocking KafkaConsumer#committed methods have been extended to allow a
collection of partitions as input parameters rather than a single partition. This enables
fewer request/response iterations between clients and brokers when fetching the
committed offsets for the consumer group. The old overloaded functions are deprecated
and we recommend that users update their code to leverage the new methods (details
can be found in KIP-520).
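A minimal Java sketch of the batched variant; the topic name, partitions, and helper method are illustrative, and the fragment assumes an already-configured consumer and is meant to sit inside an application class.
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Hypothetical helper: fetch committed offsets for several partitions in one round trip.
static Map<TopicPartition, OffsetAndMetadata> committedOffsets(Consumer<String, String> consumer) {
    Set<TopicPartition> partitions = new HashSet<>();
    partitions.add(new TopicPartition("payments", 0));
    partitions.add(new TopicPartition("payments", 1));
    // New overload taking a set of partitions instead of a single TopicPartition.
    return consumer.committed(partitions);
}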
We've introduced a new INVALID_RECORD error in the produce response to distinguish it
from the CORRUPT_MESSAGE error. To be more concrete, previously when a batch of
records was sent as part of a single request to the broker and one or more of the
records failed validation due to various causes (mismatched magic bytes, CRC
checksum errors, null keys for log-compacted topics, etc.), the whole batch would be
rejected with the same and misleading CORRUPT_MESSAGE, and the caller of the
producer client would see the corresponding exception either from the future object
of RecordMetadata returned from the send call or in
the Callback#onCompletion(RecordMetadata metadata, Exception exception) callback.
Now, with the new error code and improved error messages of the exception, producer
callers are better informed about the root cause of why their records failed to be sent.
We are introducing incremental cooperative rebalancing to the clients' group
protocol, which allows consumers to keep all of their assigned partitions during a
rebalance and at the end revoke only those which must be migrated to another
consumer for overall cluster balance. The ConsumerCoordinator will choose the
latest RebalanceProtocol that is commonly supported by all of the consumer's
supported assignors. You can use the new built-in CooperativeStickyAssignor or
plug in your own custom cooperative assignor. To do so you must implement
the ConsumerPartitionAssignor interface and
include RebalanceProtocol.COOPERATIVE in the list returned
by ConsumerPartitionAssignor#supportedProtocols . Your custom assignor can then
leverage the ownedPartitions field in each consumer's Subscription to give partitions
back to their previous owners whenever possible. Note that when a partition is to be
reassigned to another consumer, it must be removed from the new assignment until
it has been revoked from its original owner. Any consumer that has to revoke a
partition will trigger a followup rebalance to allow the revoked partition to safely be
assigned to its new owner. See the ConsumerPartitionAssignor RebalanceProtocol
javadocs for more information.
To upgrade from the old (eager) protocol, which always revokes all partitions before
rebalancing, to cooperative rebalancing, you must follow a specific upgrade path to
get all clients on the same ConsumerPartitionAssignor that supports the cooperative
protocol. This can be done with two rolling bounces, using
the CooperativeStickyAssignor for the example: during the first one, add
"cooperative-sticky" to the list of supported assignors for each member (without
removing the previous assignor -- note that if previously using the default, you must
include that explicitly as well). Then bounce and/or upgrade each member. Once the entire
group is on 2.4+ and all members have the "cooperative-sticky" among their
supported assignors, remove the other assignor(s) and perform a second rolling
bounce so that by the end all members support only the cooperative protocol. For
further details on the cooperative rebalancing protocol and upgrade path, see KIP-
429.
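For instance, the consumer configuration for the first rolling bounce might list the cooperative assignor alongside the previously used one; the range assignor below stands in for whatever assignor the group used before, and the helper method is illustrative.
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.RangeAssignor;

// Hypothetical helper building the consumer properties for the first bounce.
static Properties firstBounceAssignorConfig() {
    Properties props = new Properties();
    // Keep the old assignor in the list so eager and cooperative members can coexist.
    props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
            Arrays.asList(CooperativeStickyAssignor.class.getName(), RangeAssignor.class.getName()));
    // Second bounce (once every member is on 2.4+ with "cooperative-sticky" supported):
    // keep only CooperativeStickyAssignor in the list.
    return props;
}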
There are some behavioral changes to the ConsumerRebalanceListener, as well as a
new API. Exceptions thrown during any of the listener's three callbacks will no longer
be swallowed, and will instead be re-thrown all the way up to
the Consumer.poll() call. The onPartitionsLost method has been added to allow
users to react to abnormal circumstances where a consumer may have lost
ownership of its partitions (such as a missed rebalance) and cannot commit offsets.
By default, this will simply call the existing onPartitionsRevoked API to align with
previous behavior. Note however that onPartitionsLost will not be called when the
set of lost partitions is empty. This means that no callback will be invoked at the
beginning of the first rebalance of a new consumer joining the group.
The semantics of the ConsumerRebalanceListener's callbacks are further changed
when following the cooperative rebalancing protocol described above. In addition
to onPartitionsLost, onPartitionsRevoked will also never be called when the set of
revoked partitions is empty. The callback will generally be invoked only at the end of
a rebalance, and only on the set of partitions that are being moved to another
consumer. The onPartitionsAssigned callback will however always be called, even
with an empty set of partitions, as a way to notify users of a rebalance event (this is
true for both cooperative and eager). For details on the new callback semantics, see
the ConsumerRebalanceListener javadocs.
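A minimal Java sketch of a listener that implements the new callback; the class name and logging are illustrative.
import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

public class LoggingRebalanceListener implements ConsumerRebalanceListener {
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called at the end of every rebalance, even with an empty set of partitions.
        System.out.println("Assigned: " + partitions);
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Under the cooperative protocol, only invoked for partitions actually moving away.
        System.out.println("Revoked: " + partitions);
    }

    @Override
    public void onPartitionsLost(Collection<TopicPartition> partitions) {
        // Invoked when ownership was lost abnormally (e.g. a missed rebalance); offsets
        // for these partitions can no longer be committed by this consumer.
        System.out.println("Lost: " + partitions);
    }
}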
The Scala trait kafka.security.auth.Authorizer has been deprecated and replaced
with a new Java API org.apache.kafka.server.authorizer.Authorizer . The
authorizer implementation class kafka.security.auth.SimpleAclAuthorizer has also
been deprecated and replaced with a new
implementation kafka.security.authorizer.AclAuthorizer . AclAuthorizer uses
features supported by the new API to improve authorization logging and is
compatible with SimpleAclAuthorizer. For more details, see KIP-504.
Upgrading from 0.8.x, 0.9.x, 0.10.0.x, 0.10.1.x, 0.10.2.x, 0.11.0.x, 1.0.x, 1.1.x, 2.0.x, 2.1.x,
or 2.2.x to 2.3.0
If you are upgrading from a version prior to 2.1.x, please see the note below about the
change to the schema used to store consumer offsets. Once you have changed the
inter.broker.protocol.version to the latest version, it will not be possible to downgrade to a
version prior to 2.1.
If you are upgrading from 0.11.0.x, 1.0.x, 1.1.x, 2.0.x, or 2.1.x, and you have not
overridden the message format, then you only need to override the inter-broker
protocol version.
Upgrading from 0.8.x, 0.9.x, 0.10.0.x, 0.10.1.x, 0.10.2.x, 0.11.0.x, 1.0.x, 1.1.x, 2.0.x or
2.1.x to 2.2.0
If you are upgrading from a version prior to 2.1.x, please see the note below about the
change to the schema used to store consumer offsets. Once you have changed the
inter.broker.protocol.version to the latest version, it will not be possible to downgrade to a
version prior to 2.1.
If you are upgrading from 0.11.0.x, 1.0.x, 1.1.x, or 2.0.x and you have not overridden
the message format, then you only need to override the inter-broker protocol version.
Kafka Streams 2.2.1 requires the 0.11 message format or higher and does not work with
older message formats.
The default consumer group id has been changed from the empty string ( "") to null.
Consumers who use the new default group id will not be able to subscribe to topics,
and fetch or commit offsets. The empty string as consumer group id is deprecated
but will be supported until a future major release. Old clients that rely on the empty
string group id will now have to explicitly provide it as part of their consumer config.
For more information see KIP-289.
The bin/kafka-topics.sh command line tool is now able to connect directly to
brokers with --bootstrap-server instead of zookeeper. The old --zookeeper option is
still available for now. Please read KIP-377 for more information.
Kafka Streams depends on a newer version of RocksDB that requires macOS 10.13
or higher.
Upgrading from 0.8.x, 0.9.x, 0.10.0.x, 0.10.1.x, 0.10.2.x, 0.11.0.x, 1.0.x, 1.1.x, or 2.0.0
to 2.1.0
Note that 2.1.x contains a change to the internal schema used to store consumer offsets.
Once the upgrade is complete, it will not be possible to downgrade to previous versions.
See the rolling upgrade notes below for more detail.
If you are upgrading from 0.11.0.x, 1.0.x, 1.1.x, or 2.0.x and you have not overridden
the message format, then you only need to override the inter-broker protocol version.
1. Offset expiration semantics have changed slightly in this version. According to the
new semantics, offsets of partitions in a group will not be removed while the group is
subscribed to the corresponding topic and is still active (has active consumers). If the
group becomes empty, all its offsets will be removed after the default offset retention
period (or the one set by the broker) has passed (unless the group becomes active
again). Offsets associated with standalone (simple) consumers, which do not use
Kafka group management, will be removed after the default offset retention period (or
the one set by the broker) has passed since their last commit.
2. The default for console consumer's enable.auto.commit property when
no group.id is provided is now set to false. This is to avoid polluting the consumer
coordinator cache as the auto-generated group is not likely to be used by other
consumers.
3. The default value for the producer's retries config was changed
to Integer.MAX_VALUE, as we introduced delivery.timeout.ms in KIP-91, which sets an
upper bound on the total time between sending a record and receiving
acknowledgement from the broker. By default, the delivery timeout is set to 2
minutes.
4. By default, MirrorMaker now
overrides delivery.timeout.ms to Integer.MAX_VALUE when configuring the producer.
If you have overridden the value of retries in order to fail faster, you will instead
need to override delivery.timeout.ms.
5. The ListGroup API now expects, as a recommended alternative, Describe
Group access to the groups a user should be able to list. Even though the
old Describe Cluster access is still supported for backward compatibility, using it for
this API is not advised.
6. KIP-336 deprecates the ExtendedSerializer and ExtendedDeserializer interfaces and
propagates the usage of Serializer and Deserializer. ExtendedSerializer and
ExtendedDeserializer were introduced with KIP-82 to provide record headers for
serializers and deserializers in a Java 7 compatible fashion. These interfaces have now
been consolidated since Java 7 support has been dropped.
Jetty has been upgraded to 9.4.12, which excludes TLS_RSA_* ciphers by default
because they do not support forward secrecy, see
https://github.com/eclipse/jetty.project/issues/2807 for more information.
Unclean leader election is automatically enabled by the controller
when unclean.leader.election.enable config is dynamically updated by using per-
topic config override.
The AdminClient has added a method AdminClient#metrics(). Now any application
using the AdminClient can gain more information and insight by viewing the metrics
captured from the AdminClient. For more information see KIP-324
Kafka now supports Zstandard compression from KIP-110. You must upgrade the
broker as well as clients to make use of it. Consumers prior to 2.1.0 will not be able
to read from topics which use Zstandard compression, so you should not enable it
for a topic until all downstream consumers are upgraded. See the KIP for more
detail.
Upgrading from 0.8.x, 0.9.x, 0.10.0.x, 0.10.1.x, 0.10.2.x, 0.11.0.x, 1.0.x, or 1.1.x to
2.0.0
Kafka 2.0.0 introduces wire protocol changes. By following the recommended rolling
upgrade plan below, you guarantee no downtime during the upgrade. However, please
review the notable changes in 2.0.0 before upgrading.
If you are upgrading from 0.11.0.x, 1.0.x, or 1.1.x and you have not overridden the
message format, then you only need to override the inter-broker protocol format.
1. If you are willing to accept downtime, you can simply take all the brokers down,
update the code and start them back up. They will start with the new protocol by
default.
2. Bumping the protocol version and restarting can be done any time after the brokers
are upgraded. It does not have to be immediately after. Similarly for the message
format version.
3. If you are using Java8 method references in your Kafka Streams code you might
need to update your code to resolve method ambiguities. Hot-swapping the jar-file
only might not work.
4. ACLs should not be added to prefixed resources, (added in KIP-290), until all brokers
in the cluster have been updated.
NOTE: any prefixed ACLs added to a cluster, even after the cluster is fully upgraded,
will be ignored should the cluster be downgraded again.
KIP-186 increases the default offset retention time from 1 day to 7 days. This makes
it less likely to "lose" offsets in an application that commits infrequently. It also
increases the active set of offsets and therefore can increase memory usage on the
broker. Note that the console consumer currently enables offset commit by default
and can be the source of a large number of offsets which this change will now
preserve for 7 days instead of 1. You can preserve the existing behavior by setting
the broker config offsets.retention.minutes to 1440.
Support for Java 7 has been dropped; Java 8 is now the minimum version required.
The default value for ssl.endpoint.identification.algorithm was changed to https,
which performs hostname verification (man-in-the-middle attacks are possible
otherwise). Set ssl.endpoint.identification.algorithm to an empty string to restore
the previous behaviour.
KAFKA-5674 extends the lower bound of max.connections.per.ip to zero and therefore
allows IP-based filtering of inbound connections.
KIP-272 added API version tag to the
metric kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower|...}.
This metric now
becomes kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower|...},version={0|1|2|3|...}.
This will impact JMX monitoring tools that do not automatically aggregate. To get the
total count for a specific request type, the tool needs to be updated to aggregate across
different versions.
KIP-225 changed the metric "records.lag" to use tags for topic and partition. The
original version with the name format "{topic}-{partition}.records-lag" has been
removed.
The Scala consumers, which have been deprecated since 0.11.0.0, have been
removed. The Java consumer has been the recommended option since 0.10.0.0.
Note that the Scala consumers in 1.1.0 (and older) will continue to work even if the
brokers are upgraded to 2.0.0.
The Scala producers, which have been deprecated since 0.10.0.0, have been
removed. The Java producer has been the recommended option since 0.9.0.0. Note
that the behaviour of the default partitioner in the Java producer differs from the
default partitioner in the Scala producers. Users migrating should consider
configuring a custom partitioner that retains the previous behaviour. Note that the
Scala producers in 1.1.0 (and older) will continue to work even if the brokers are
upgraded to 2.0.0.
MirrorMaker and ConsoleConsumer no longer support the Scala consumer; they
always use the Java consumer.
The ConsoleProducer no longer supports the Scala producer; it always uses the Java
producer.
A number of deprecated tools that rely on the Scala clients have been removed:
ReplayLogProducer, SimpleConsumerPerformance, SimpleConsumerShell,
ExportZkOffsets, ImportZkOffsets, UpdateOffsetsInZK, VerifyConsumerRebalance.
The deprecated kafka.tools.ProducerPerformance has been removed, please use
org.apache.kafka.tools.ProducerPerformance.
A new Kafka Streams configuration parameter upgrade.from was added to allow a
rolling bounce upgrade from an older version.
KIP-284 changed the retention time for Kafka Streams repartition topics by setting
its default value to Long.MAX_VALUE.
Updated ProcessorStateManager APIs in Kafka Streams for registering state stores to
the processor topology. For more details please read the Streams Upgrade Guide.
In earlier releases, Connect's worker configuration required
the internal.key.converter and internal.value.converter properties. In 2.0, these
are no longer required and default to the JSON converter. You may safely remove
these properties from your Connect standalone and distributed worker
configurations:
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter.schemas.enable=false
KIP-266 adds a new consumer configuration default.api.timeout.ms to specify the
default timeout to use for KafkaConsumer APIs that could block. The KIP also adds
overloads for such blocking APIs to support specifying a specific timeout to use for
each of them instead of using the default timeout set by default.api.timeout.ms. In
particular, a new poll(Duration) API has been added which does not block for
dynamic partition assignment. The old poll(long) API has been deprecated and will
be removed in a future version. Overloads have also been added for
other KafkaConsumer methods
like partitionsFor, listTopics, offsetsForTimes, beginningOffsets, endOffsets and close
that take in a Duration.
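A minimal Java sketch of the Duration-based overloads; the topic name and timeouts are illustrative, and the fragment assumes an already-configured consumer and is meant to sit inside an application class.
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hypothetical helper showing the KIP-266 overloads.
static void pollOnce(KafkaConsumer<String, String> consumer) {
    consumer.subscribe(Collections.singletonList("payments"));
    // poll(Duration) does not block for dynamic partition assignment, unlike the deprecated poll(long).
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    System.out.println("Fetched " + records.count() + " records");
    // Other blocking calls accept an explicit timeout instead of relying on default.api.timeout.ms.
    consumer.partitionsFor("payments", Duration.ofSeconds(5));
    consumer.close(Duration.ofSeconds(5));
}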
Also as part of KIP-266, the default value of request.timeout.ms has been changed to
30 seconds. The previous value was a little higher than 5 minutes to account for
maximum time that a rebalance would take. Now we treat the JoinGroup request in
the rebalance as a special case and use a value derived
from max.poll.interval.ms for the request timeout. All other request types use the
timeout defined by request.timeout.ms
The internal method kafka.admin.AdminClient.deleteRecordsBefore has been
removed. Users are encouraged to migrate
to org.apache.kafka.clients.admin.AdminClient.deleteRecords .
The AclCommand tool --producer convenience option uses the KIP-277 finer grained
ACL on the given topic.
KIP-176 removes the --new-consumer option for all consumer based tools. This
option is redundant since the new consumer is automatically used if --bootstrap-
server is defined.
KIP-290 adds the ability to define ACLs on prefixed resources, e.g. any topic starting
with 'foo'.
KIP-283 improves message down-conversion handling on Kafka broker, which has
typically been a memory-intensive operation. The KIP adds a mechanism by which
the operation becomes less memory intensive by down-converting chunks of
partition data at a time which helps put an upper bound on memory consumption.
With this improvement, there is a change in FetchResponse protocol behavior where
the broker could send an oversized message batch towards the end of the response
with an invalid offset. Such oversized messages must be ignored by consumer
clients, as is done by KafkaConsumer.
Upgrading your Streams application from 1.1 to 2.0 does not require a broker
upgrade. A Kafka Streams 2.0 application can connect to 2.0, 1.1, 1.0, 0.11.0, 0.10.2
and 0.10.1 brokers (it is not possible to connect to 0.10.0 brokers though).
Note that in 2.0 we have removed the public APIs that were deprecated prior to 1.0;
users relying on those deprecated APIs need to make code changes accordingly.
See Streams API changes in 2.0.0 for more details.
Upgrading from 0.8.x, 0.9.x, 0.10.0.x, 0.10.1.x, 0.10.2.x, 0.11.0.x, or 1.0.x to 1.1.x
Kafka 1.1.0 introduces wire protocol changes. By following the recommended rolling
upgrade plan below, you guarantee no downtime during the upgrade. However, please
review the notable changes in 1.1.0 before upgrading.
If you are upgrading from 0.11.0.x or 1.0.x and you have not overridden the message
format, then you only need to override the inter-broker protocol format.
1. If you are willing to accept downtime, you can simply take all the brokers down,
update the code and start them back up. They will start with the new protocol by
default.
2. Bumping the protocol version and restarting can be done any time after the brokers
are upgraded. It does not have to be immediately after. Similarly for the message
format version.
3. If you are using Java8 method references in your Kafka Streams code you might
need to update your code to resolve method ambiguities. Hot-swapping the jar-file
only might not work.
Upgrading your Streams application from 1.0 to 1.1 does not require a broker
upgrade. A Kafka Streams 1.1 application can connect to 1.0, 0.11.0, 0.10.2 and
0.10.1 brokers (it is not possible to connect to 0.10.0 brokers though).
See Streams API changes in 1.1.0 for more details.
Upgrading from 0.8.x, 0.9.x, 0.10.0.x, 0.10.1.x, 0.10.2.x, or 0.11.0.x to 1.0.0
Kafka 1.0.0 introduces wire protocol changes. By following the recommended rolling
upgrade plan below, you guarantee no downtime during the upgrade. However, please
review the notable changes in 1.0.0 before upgrading.
For a rolling upgrade:
1. Update server.properties on all brokers and add the following properties. If you are
upgrading from 0.11.0.x and you have not overridden the message format, you must set
both the message format version and the inter-broker protocol version to 0.11.0.
o inter.broker.protocol.version=0.11.0
o log.message.format.version=0.11.0
2. Upgrade the brokers one at a time: shut down the broker, update the code, and
restart it.
3. Once the entire cluster is upgraded, bump the protocol version by
editing inter.broker.protocol.version and setting it to 1.0.
4. Restart the brokers one by one for the new protocol version to take effect.
5. If you have overridden the message format version as instructed above, then you
need to do one more rolling restart to upgrade it to its latest version. Once all (or
most) consumers have been upgraded to 0.11.0 or later, change
log.message.format.version to 1.0 on each broker and restart them one by one. If
you are upgrading from 0.11.0 and log.message.format.version is set to 0.11.0, you
can update the config and skip the rolling restart. Note that the older Scala
consumer does not support the new message format introduced in 0.11, so to avoid
the performance cost of down-conversion (or to take advantage of exactly once
semantics), the newer Java consumer must be used.
1. If you are willing to accept downtime, you can simply take all the brokers down,
update the code and start them back up. They will start with the new protocol by
default.
2. Bumping the protocol version and restarting can be done any time after the brokers
are upgraded. It does not have to be immediately after. Similarly for the message
format version.
Notable changes in 1.0.0
Topic deletion is now enabled by default, since the functionality is now stable. Users
who wish to retain the previous behavior should set the broker
config delete.topic.enable to false. Keep in mind that topic deletion removes data
and the operation is not reversible (i.e. there is no "undelete" operation).
For topics that support timestamp search, if no offset can be found for a partition,
that partition is now included in the search result with a null offset value. Previously,
the partition was not included in the map. This change was made to make the search
behavior consistent with the case of topics not supporting timestamp search.
If the inter.broker.protocol.version is 1.0 or later, a broker will now stay online to
serve replicas on live log directories even if there are offline log directories. A log
directory may become offline due to an IOException caused by hardware failure. Users
need to monitor the per-broker metric offlineLogDirectoryCount to check whether
there are offline log directories.
Added KafkaStorageException, which is a retriable exception.
KafkaStorageException will be converted to NotLeaderForPartitionException in the
response if the version of the client's FetchRequest or ProducerRequest does not
support KafkaStorageException.
-XX:+DisableExplicitGC was replaced by -XX:+ExplicitGCInvokesConcurrent in the
default JVM settings. This helps avoid out of memory exceptions during allocation
of native memory by direct buffers in some cases.
The overridden handleError method implementations have been removed from the
following deprecated classes in
the kafka.api package: FetchRequest, GroupCoordinatorRequest, OffsetCommitRequest,
OffsetFetchRequest, OffsetRequest, ProducerRequest, and TopicMetadataRequest. This
was only intended for use on the broker, but it is no longer in use and the
implementations have not been maintained. A stub implementation has been
retained for binary compatibility.
The Java clients and tools now accept any string as a client-id.
The deprecated tool kafka-consumer-offset-checker.sh has been removed.
Use kafka-consumer-groups.sh to get consumer group details.
SimpleAclAuthorizer now logs access denials to the authorizer log by default.
Authentication failures are now reported to clients as one of the subclasses
of AuthenticationException. No retries will be performed if a client connection fails
authentication.
Custom SaslServer implementations may throw SaslAuthenticationException to
provide an error message to return to clients indicating the reason for authentication
failure. Implementors should take care not to include any security-critical
information in the exception message that should not be leaked to unauthenticated
clients.
The app-info mbean registered with JMX to provide version and commit id will be
deprecated and replaced with metrics providing these attributes.
Kafka metrics may now contain non-numeric
values. org.apache.kafka.common.Metric#value() has been deprecated and will
return 0.0 in such cases to minimise the probability of breaking users who read the
value of every client metric (via a MetricsReporter implementation or by calling
the metrics() method). org.apache.kafka.common.Metric#metricValue() can be used
to retrieve numeric and non-numeric metric values.
Every Kafka rate metric now has a corresponding cumulative count metric with the
suffix -total to simplify downstream processing. For example, records-consumed-
rate has a corresponding metric named records-consumed-total.
Mx4j will only be enabled if the system property kafka_mx4jenable is set to true. Due
to a logic inversion bug, it was previously enabled by default and disabled
if kafka_mx4jenable was set to true.
The package org.apache.kafka.common.security.auth in the clients jar has been
made public and added to the javadocs. Internal classes which had previously been
located in this package have been moved elsewhere.
When using an Authorizer and a user doesn't have required permissions on a topic,
the broker will return TOPIC_AUTHORIZATION_FAILED errors to requests
irrespective of topic existence on the broker. If the user has the required permissions
and the topic doesn't exist, then the UNKNOWN_TOPIC_OR_PARTITION error code will
be returned.
The config/consumer.properties file was updated to use new consumer config properties.
Upgrading your Streams application from 0.11.0 to 1.0 does not require a broker
upgrade. A Kafka Streams 1.0 application can connect to 0.11.0, 0.10.2 and 0.10.1
brokers (it is not possible to connect to 0.10.0 brokers though). However, Kafka
Streams 1.0 requires 0.10 message format or newer and does not work with older
message formats.
If you are monitoring streams metrics, you will need to make some changes to the
metrics names in your reporting and monitoring code, because the metrics sensor
hierarchy was changed.
A few public APIs, including ProcessorContext#schedule(), Processor#punctuate(),
KStreamBuilder, and TopologyBuilder, are being deprecated by new APIs. We recommend
making the corresponding code changes, which should be very minor since the new APIs
look quite similar, when you upgrade.
See Streams API changes in 1.0.0 for more details.
Upgrading your Streams application from 0.10.2 to 1.0 does not require a broker
upgrade. A Kafka Streams 1.0 application can connect to 1.0, 0.11.0, 0.10.2 and
0.10.1 brokers (it is not possible to connect to 0.10.0 brokers though).
If you are monitoring streams metrics, you will need to make some changes to the
metrics names in your reporting and monitoring code, because the metrics sensor
hierarchy was changed.
A few public APIs, including ProcessorContext#schedule(), Processor#punctuate(),
KStreamBuilder, and TopologyBuilder, are being deprecated by new APIs. We recommend
making the corresponding code changes, which should be very minor since the new APIs
look quite similar, when you upgrade.
If you specify customized key.serde, value.serde and timestamp.extractor in
configs, it is recommended to use their replacement configuration parameters, as
these configs are deprecated.
See Streams API changes in 0.11.0 for more details.
Upgrading your Streams application from 0.10.0 to 1.0 does require a broker
upgrade because a Kafka Streams 1.0 application can only connect to 1.0, 0.11.0,
0.10.2, or 0.10.1 brokers.
There are a couple of API changes that are not backward compatible (cf. Streams API
changes in 1.0.0, Streams API changes in 0.11.0, Streams API changes in 0.10.2,
and Streams API changes in 0.10.1 for more details). Thus, you need to update and
recompile your code. Just swapping the Kafka Streams library jar file will not work
and will break your application.
Upgrading from 0.10.0.x to 1.0.2 requires two rolling bounces with
config upgrade.from="0.10.0" set for first upgrade phase (cf. KIP-268). As an
alternative, an offline upgrade is also possible.
o prepare your application instances for a rolling bounce and make sure that
config upgrade.from is set to "0.10.0" for new version 0.11.0.3
o bounce each instance of your application once
o prepare your newly deployed 1.0.2 application instances for a second round
of rolling bounces; make sure to remove the value for config upgrade.from
o bounce each instance of your application once more to complete the upgrade
Upgrading from 0.10.0.x to 1.0.0 or 1.0.1 requires an offline upgrade (rolling bounce
upgrade is not supported)
o stop all old (0.10.0.x) application instances
o update your code and swap old code and jar file with new code and new jar
file
o restart all new (1.0.0 or 1.0.1) application instances
Kafka 0.11.0.0 introduces a new message format version as well as wire protocol changes.
By following the recommended rolling upgrade plan below, you guarantee no downtime
during the upgrade. However, please review the notable changes in 0.11.0.0 before
upgrading.
Starting with version 0.10.2, Java clients (producer and consumer) have acquired the ability
to communicate with older brokers. Version 0.11.0 clients can talk to version 0.10.0 or
newer brokers. However, if your brokers are older than 0.10.0, you must upgrade all the
brokers in the Kafka cluster before upgrading your clients. Version 0.11.0 brokers support
0.8.x and newer clients.
1. If you are willing to accept downtime, you can simply take all the brokers down,
update the code and start them back up. They will start with the new protocol by
default.
2. Bumping the protocol version and restarting can be done any time after the brokers
are upgraded. It does not have to be immediately after. Similarly for the message
format version.
3. It is also possible to enable the 0.11.0 message format on individual topics using the
topic admin tool (bin/kafka-topics.sh) prior to updating the global
setting log.message.format.version.
4. If you are upgrading from a version prior to 0.10.0, it is NOT necessary to first update
the message format to 0.10.0 before you switch to 0.11.0.
Upgrading your Streams application from 0.10.2 to 0.11.0 does not require a broker
upgrade. A Kafka Streams 0.11.0 application can connect to 0.11.0, 0.10.2 and
0.10.1 brokers (it is not possible to connect to 0.10.0 brokers though).
If you specify customized key.serde, value.serde and timestamp.extractor in
configs, it is recommended to use their replacement configuration parameters, as
these configs are deprecated.
See Streams API changes in 0.11.0 for more details.
Upgrading your Streams application from 0.10.1 to 0.11.0 does not require a broker
upgrade. A Kafka Streams 0.11.0 application can connect to 0.11.0, 0.10.2 and
0.10.1 brokers (it is not possible to connect to 0.10.0 brokers though).
You need to recompile your code. Just swapping the Kafka Streams library jar file
will not work and will break your application.
If you specify customized key.serde, value.serde and timestamp.extractor in
configs, it is recommended to use their replacement configuration parameters, as
these configs are deprecated.
If you use a custom (i.e., user implemented) timestamp extractor, you will need to
update this code, because the TimestampExtractor interface was changed.
If you register custom metrics, you will need to update this code, because
the StreamsMetric interface was changed.
See Streams API changes in 0.11.0 and Streams API changes in 0.10.2 for more
details.
Upgrading a 0.10.0 Kafka Streams Application
Upgrading your Streams application from 0.10.0 to 0.11.0 does require a broker
upgrade because a Kafka Streams 0.11.0 application can only connect to 0.11.0,
0.10.2, or 0.10.1 brokers.
There are a couple of API changes that are not backward compatible (cf. Streams API
changes in 0.11.0, Streams API changes in 0.10.2, and Streams API changes in
0.10.1 for more details). Thus, you need to update and recompile your code. Just
swapping the Kafka Streams library jar file will not work and will break your
application.
Upgrading from 0.10.0.x to 0.11.0.3 requires two rolling bounces with
config upgrade.from="0.10.0" set for first upgrade phase (cf. KIP-268). As an
alternative, an offline upgrade is also possible.
o prepare your application instances for a rolling bounce and make sure that
config upgrade.from is set to "0.10.0" for new version 0.11.0.3
o bounce each instance of your application once
o prepare your newly deployed 0.11.0.3 application instances for a second
round of rolling bounces; make sure to remove the value for
config upgrade.from
o bounce each instance of your application once more to complete the upgrade
Upgrading from 0.10.0.x to 0.11.0.0, 0.11.0.1, or 0.11.0.2 requires an offline upgrade
(rolling bounce upgrade is not supported)
o stop all old (0.10.0.x) application instances
o update your code and swap old code and jar file with new code and new jar
file
o restart all new (0.11.0.0 , 0.11.0.1, or 0.11.0.2) application instances
Notable changes in 0.11.0.0
Unclean leader election is now disabled by default. The new default favors durability
over availability. Users who wish to retain the previous behavior should set the
broker config unclean.leader.election.enable to true.
Producer
configs block.on.buffer.full, metadata.fetch.timeout.ms and timeout.ms have been
removed. They were initially deprecated in Kafka 0.9.0.0.
The offsets.topic.replication.factor broker config is now enforced upon auto
topic creation. Internal auto topic creation will fail with a
GROUP_COORDINATOR_NOT_AVAILABLE error until the cluster size meets this
replication factor requirement.
When compressing data with snappy, the producer and broker will use the
compression scheme's default block size (2 x 32 KB) instead of 1 KB in order to
improve the compression ratio. There have been reports of data compressed with
the smaller block size being 50% larger than when compressed with the larger block
size. For the snappy case, a producer with 5000 partitions will require an additional
315 MB of JVM heap.
Similarly, when compressing data with gzip, the producer and broker will use 8 KB
instead of 1 KB as the buffer size. The default for gzip is excessively low (512 bytes).
The broker configuration max.message.bytes now applies to the total size of a batch
of messages. Previously the setting applied to batches of compressed messages, or
to non-compressed messages individually. A message batch may consist of only a
single message, so in most cases, the limitation on the size of individual messages
is only reduced by the overhead of the batch format. However, there are some subtle
implications for message format conversion (see below for more detail). Note also
that while previously the broker would ensure that at least one message is returned
in each fetch request (regardless of the total and partition-level fetch sizes), the
same behavior now applies to one message batch.
GC log rotation is enabled by default, see KAFKA-3754 for details.
Deprecated constructors of RecordMetadata, MetricName and Cluster classes have
been removed.
Added support for user headers through a new Headers interface providing read and
write access to user headers.
ProducerRecord and ConsumerRecord expose the new Headers API via the Headers
headers() method call.
ExtendedSerializer and ExtendedDeserializer interfaces are introduced to support
serialization and deserialization for headers. Headers will be ignored if the
configured serializer and deserializer are not the above classes.
A new config, group.initial.rebalance.delay.ms, was introduced. This config
specifies the time, in milliseconds, that the GroupCoordinator will delay the initial
consumer rebalance. The rebalance will be further delayed by the value
of group.initial.rebalance.delay.ms as new members join the group, up to a
maximum of max.poll.interval.ms. The default value for this is 3 seconds. During
development and testing it might be desirable to set this to 0 in order to not delay
test execution time.
org.apache.kafka.common.Cluster#partitionsForTopic, partitionsForNode and
availablePartitionsForTopic methods will return an empty list instead of null (which is
considered a bad practice) in case the metadata for the required topic does not
exist.
Streams API configuration parameters timestamp.extractor, key.serde,
and value.serde were deprecated and replaced
by default.timestamp.extractor, default.key.serde, and default.value.serde,
respectively.
For offset commit failures in the Java consumer's commitAsync APIs, we no longer
expose the underlying cause when instances of RetriableCommitFailedException are
passed to the commit callback. See KAFKA-5052 for more detail.
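For example, a commit callback can still treat retriable commit failures generically even though the underlying cause is no longer exposed. A minimal sketch (the logging shown is just a placeholder for an application's own handling):

    import java.util.Map;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.consumer.RetriableCommitFailedException;
    import org.apache.kafka.common.TopicPartition;

    public class CommitCallbackExample {
        // consumer is an existing, subscribed KafkaConsumer instance.
        static void commitWithLogging(KafkaConsumer<String, String> consumer) {
            consumer.commitAsync((Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) -> {
                if (exception instanceof RetriableCommitFailedException) {
                    // The underlying cause is not exposed; the commit can simply be retried
                    // (or picked up by the next commit attempt).
                    System.out.println("Retriable commit failure for " + offsets.keySet());
                } else if (exception != null) {
                    System.err.println("Fatal commit failure: " + exception);
                }
            });
        }
    }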
Kafka 0.11.0 includes support for idempotent and transactional capabilities in the producer.
Idempotent delivery ensures that messages are delivered exactly once to a particular topic
partition during the lifetime of a single producer. Transactional delivery allows producers to
send data to multiple partitions such that either all messages are successfully delivered, or
none of them are. Together, these capabilities enable "exactly once semantics" in Kafka.
More details on these features are available in the user guide, but below we add a few
specific notes on enabling them in an upgraded cluster. Note that enabling EoS is not
required and there is no impact on the broker's behavior if unused.
1. Only the new Java producer and consumer support exactly once semantics.
2. These features depend crucially on the 0.11.0 message format. Attempting to use
them on an older format will result in unsupported version errors.
3. Transaction state is stored in a new internal topic __transaction_state. This topic is
not created until the first attempt to use a transactional request API. Similar to
the consumer offsets topic, there are several settings to control the topic's
configuration. For example, transaction.state.log.min.isr controls the minimum
ISR for this topic. See the configuration section in the user guide for a full list of
options.
4. For secure clusters, the transactional APIs require new ACLs which can be turned on
with the bin/kafka-acls.sh tool.
5. EoS in Kafka introduces new request APIs and modifies several existing ones.
See KIP-98 for the full details.
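As an illustration of the producer-side API, a minimal sketch of a transactional send follows (the topic names, transactional.id, and bootstrap address are placeholders, and error handling is simplified):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TransactionalProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");            // placeholder
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            props.put("enable.idempotence", "true");                      // idempotent delivery
            props.put("transactional.id", "my-transactional-id");         // enables transactions

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                producer.beginTransaction();
                try {
                    producer.send(new ProducerRecord<>("topic-a", "k", "v1"));
                    producer.send(new ProducerRecord<>("topic-b", "k", "v2"));
                    // Either both records become visible to transactional consumers, or neither does.
                    producer.commitTransaction();
                } catch (Exception e) {
                    producer.abortTransaction();
                    throw new RuntimeException(e);
                }
            }
        }
    }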
One of the notable differences in the new message format is that even uncompressed
messages are stored together as a single batch. This has a few implications for the broker
configuration max.message.bytes, which limits the size of a single batch. First, if an older
client produces messages to a topic partition using the old format, and the messages are
individually smaller than max.message.bytes, the broker may still reject them after they are
merged into a single batch during the up-conversion process. Generally this can happen
when the aggregate size of the individual messages is larger than max.message.bytes. There
is a similar effect for older consumers reading messages down-converted from the new
format: if the fetch size is not set at least as large as max.message.bytes, the consumer may
not be able to make progress even if the individual uncompressed messages are smaller
than the configured fetch size. This behavior does not impact the Java client for 0.10.1.0
and later since it uses an updated fetch protocol which ensures that at least one message
can be returned even if it exceeds the fetch size. To get around these problems, you should
ensure 1) that the producer's batch size is not set larger than max.message.bytes, and 2) that
the consumer's fetch size is set at least as large as max.message.bytes.
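A minimal sketch of client settings that respect these two rules, assuming the new Java clients and using max.partition.fetch.bytes as the consumer's per-partition fetch size (the value passed in should be the actual max.message.bytes configured on the broker or topic):

    import java.util.Properties;

    public class FetchSizeAlignment {
        // Rule 1: keep the producer batch size no larger than max.message.bytes.
        static Properties producerProps(int maxMessageBytes) {
            Properties props = new Properties();
            props.put("batch.size", Integer.toString(maxMessageBytes));
            return props;
        }

        // Rule 2: set the per-partition fetch size at least as large as max.message.bytes.
        static Properties consumerProps(int maxMessageBytes) {
            Properties props = new Properties();
            props.put("max.partition.fetch.bytes", Integer.toString(maxMessageBytes));
            return props;
        }
    }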
Most of the discussion on the performance impact of upgrading to the 0.10.0 message
format remains pertinent to the 0.11.0 upgrade. This mainly affects clusters that are not
secured with TLS since "zero-copy" transfer is already not possible in that case. In order to
avoid the cost of down-conversion, you should ensure that consumer applications are
upgraded to the latest 0.11.0 client. Significantly, since the old consumer has been
deprecated in 0.11.0.0, it does not support the new message format. You must upgrade to
use the new consumer to use the new message format without the cost of down-
conversion. Note that 0.11.0 consumers support backwards compatibility with 0.10.0
brokers and upward, so it is possible to upgrade the clients first before the brokers.
0.10.2.0 has wire protocol changes. By following the recommended rolling upgrade plan
below, you guarantee no downtime during the upgrade. However, please review the notable
changes in 0.10.2.0 before upgrading.
Starting with version 0.10.2, Java clients (producer and consumer) have acquired the ability
to communicate with older brokers. Version 0.10.2 clients can talk to version 0.10.0 or
newer brokers. However, if your brokers are older than 0.10.0, you must upgrade all the
brokers in the Kafka cluster before upgrading your clients. Version 0.10.2 brokers support
0.8.x and newer clients.
1. Update server.properties file on all brokers and add the following properties:
o inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2, 0.9.0,
0.10.0 or 0.10.1).
o log.message.format.version=CURRENT_KAFKA_VERSION (See potential
performance impact following the upgrade for the details on what this
configuration does.)
2. Upgrade the brokers one at a time: shut down the broker, update the code, and
restart it.
3. Once the entire cluster is upgraded, bump the protocol version by editing
inter.broker.protocol.version and setting it to 0.10.2.
4. If your previous message format is 0.10.0, change log.message.format.version to
0.10.2 (this is a no-op as the message format is the same for 0.10.0, 0.10.1 and
0.10.2). If your previous message format version is lower than 0.10.0, do not change
log.message.format.version yet - this parameter should only change once all
consumers have been upgraded to 0.10.0.0 or later.
5. Restart the brokers one by one for the new protocol version to take effect.
6. If log.message.format.version is still lower than 0.10.0 at this point, wait until all
consumers have been upgraded to 0.10.0 or later, then change
log.message.format.version to 0.10.2 on each broker and restart them one by one.
Note: If you are willing to accept downtime, you can simply take all the brokers down,
update the code and start all of them. They will start with the new protocol by default.
Note: Bumping the protocol version and restarting can be done any time after the brokers
were upgraded. It does not have to be immediately after.
Upgrading your Streams application from 0.10.1 to 0.10.2 does not require a broker
upgrade. A Kafka Streams 0.10.2 application can connect to 0.10.2 and 0.10.1
brokers (it is not possible to connect to 0.10.0 brokers though).
You need to recompile your code. Just swapping the Kafka Streams library jar file
will not work and will break your application.
If you use a custom (i.e., user implemented) timestamp extractor, you will need to
update this code, because the TimestampExtractor interface was changed.
If you register custom metrics, you will need to update this code, because
the StreamsMetric interface was changed.
See Streams API changes in 0.10.2 for more details.
Upgrading your Streams application from 0.10.0 to 0.10.2 does require a broker
upgrade because a Kafka Streams 0.10.2 application can only connect to 0.10.2 or
0.10.1 brokers.
There are a couple of API changes that are not backward compatible (cf. Streams API
changes in 0.10.2 for more details). Thus, you need to update and recompile your
code. Just swapping the Kafka Streams library jar file will not work and will break
your application.
Upgrading from 0.10.0.x to 0.10.2.2 requires two rolling bounces with
config upgrade.from="0.10.0" set for first upgrade phase (cf. KIP-268). As an
alternative, an offline upgrade is also possible.
o prepare your application instances for a rolling bounce and make sure that
config upgrade.from is set to "0.10.0" for new version 0.10.2.2
o bounce each instance of your application once
o prepare your newly deployed 0.10.2.2 application instances for a second
round of rolling bounces; make sure to remove the value for
config upgrade.mode
o bounce each instance of your application once more to complete the upgrade
Upgrading from 0.10.0.x to 0.10.2.0 or 0.10.2.1 requires an offline upgrade (rolling
bounce upgrade is not supported)
o stop all old (0.10.0.x) application instances
o update your code and swap old code and jar file with new code and new jar
file
o restart all new (0.10.2.0 or 0.10.2.1) application instances
The default values for two configurations of the StreamsConfig class were changed
to improve the resiliency of Kafka Streams applications. The internal Kafka Streams
producer retries default value was changed from 0 to 10. The internal Kafka
Streams consumer max.poll.interval.ms default value was changed from 300000
to Integer.MAX_VALUE.
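If an application needs different values for these embedded clients, they can be overridden through StreamsConfig; a minimal sketch (the application id and bootstrap address are placeholders, and the values shown simply restore the previous defaults for illustration):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class StreamsDefaultsExample {
        static Properties streamsProps() {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // Override the embedded producer's retries back to the old default of 0.
            props.put(StreamsConfig.producerPrefix("retries"), 0);
            // Override the embedded consumer's max.poll.interval.ms back to 300000.
            props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG), 300000);
            return props;
        }
    }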
0.10.1.0 has wire protocol changes. By following the recommended rolling upgrade plan
below, you guarantee no downtime during the upgrade. However, please notice
the Potential breaking changes in 0.10.1.0 before upgrade.
Note: Because new protocols are introduced, it is important to upgrade your Kafka clusters
before upgrading your clients (i.e. 0.10.1.x clients only support 0.10.1.x or later brokers
while 0.10.1.x brokers also support older clients).
1. Update server.properties file on all brokers and add the following properties:
o inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2.0,
0.9.0.0 or 0.10.0.0).
o log.message.format.version=CURRENT_KAFKA_VERSION (See potential
performance impact following the upgrade for the details on what this
configuration does.)
2. Upgrade the brokers one at a time: shut down the broker, update the code, and
restart it.
3. Once the entire cluster is upgraded, bump the protocol version by editing
inter.broker.protocol.version and setting it to 0.10.1.0.
4. If your previous message format is 0.10.0, change log.message.format.version to
0.10.1 (this is a no-op as the message format is the same for both 0.10.0 and
0.10.1). If your previous message format version is lower than 0.10.0, do not change
log.message.format.version yet - this parameter should only change once all
consumers have been upgraded to 0.10.0.0 or later.
5. Restart the brokers one by one for the new protocol version to take effect.
6. If log.message.format.version is still lower than 0.10.0 at this point, wait until all
consumers have been upgraded to 0.10.0 or later, then change
log.message.format.version to 0.10.1 on each broker and restart them one by one.
Note: If you are willing to accept downtime, you can simply take all the brokers down,
update the code and start all of them. They will start with the new protocol by default.
Note: Bumping the protocol version and restarting can be done any time after the brokers
were upgraded. It does not have to be immediately after.
The log retention time is no longer based on last modified time of the log segments.
Instead it will be based on the largest timestamp of the messages in a log segment.
The log rolling time no longer depends on the log segment create time. Instead it is
now based on the timestamps in the messages. More specifically, if the timestamp of
the first message in the segment is T, the log will be rolled out when a new message
has a timestamp greater than or equal to T + log.roll.ms.
The open file handlers of 0.10.0 will increase by ~33% because of the addition of
time index files for each segment.
The time index and offset index share the same index size configuration. Since each
time index entry is 1.5x the size of an offset index entry, users may need to increase
log.index.size.max.bytes to avoid potentially frequent log rolling.
Due to the increased number of index files, on some brokers with a large number of
log segments (e.g. >15K), the log loading process during broker startup could take
longer. Based on our experiments, setting num.recovery.threads.per.data.dir to
one may reduce the log loading time.
Upgrading your Streams application from 0.10.0 to 0.10.1 does require a broker
upgrade because a Kafka Streams 0.10.1 application can only connect to 0.10.1
brokers.
There are a couple of API changes that are not backward compatible (cf. Streams API
changes in 0.10.1 for more details). Thus, you need to update and recompile your
code. Just swapping the Kafka Streams library jar file will not work and will break
your application.
Upgrading from 0.10.0.x to 0.10.1.2 requires two rolling bounces with
config upgrade.from="0.10.0" set for first upgrade phase (cf. KIP-268). As an
alternative, an offline upgrade is also possible.
o prepare your application instances for a rolling bounce and make sure that
config upgrade.from is set to "0.10.0" for new version 0.10.1.2
o bounce each instance of your application once
o prepare your newly deployed 0.10.1.2 application instances for a second
round of rolling bounces; make sure to remove the value for
config upgrade.mode
o bounce each instance of your application once more to complete the upgrade
Upgrading from 0.10.0.x to 0.10.1.0 or 0.10.1.1 requires an offline upgrade (rolling
bounce upgrade is not supported)
o stop all old (0.10.0.x) application instances
o update your code and swap old code and jar file with new code and new jar
file
o restart all new (0.10.1.0 or 0.10.1.1) application instances
The new Java consumer is no longer in beta and we recommend it for all new
development. The old Scala consumers are still supported, but they will be
deprecated in the next release and will be removed in a future major release.
The --new-consumer/--new.consumer switch is no longer required to use tools like
MirrorMaker and the Console Consumer with the new consumer; one simply needs
to pass a Kafka broker to connect to instead of the ZooKeeper ensemble. In addition,
usage of the Console Consumer with the old consumer has been deprecated and it
will be removed in a future major release.
Kafka clusters can now be uniquely identified by a cluster id. It will be automatically
generated when a broker is upgraded to 0.10.1.0. The cluster id is available via the
kafka.server:type=KafkaServer,name=ClusterId metric and it is part of the Metadata
response. Serializers, client interceptors and metric reporters can receive the cluster
id by implementing the ClusterResourceListener interface.
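For example, a client interceptor can obtain the cluster id by also implementing ClusterResourceListener. A minimal sketch (the logging is just a placeholder for an application's own use of the cluster id):

    import java.util.Map;
    import org.apache.kafka.clients.producer.ProducerInterceptor;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.ClusterResource;
    import org.apache.kafka.common.ClusterResourceListener;

    public class ClusterIdLoggingInterceptor<K, V>
            implements ProducerInterceptor<K, V>, ClusterResourceListener {

        @Override
        public void onUpdate(ClusterResource clusterResource) {
            // Called once cluster metadata (including the cluster id) is available.
            System.out.println("Connected to cluster " + clusterResource.clusterId());
        }

        @Override
        public ProducerRecord<K, V> onSend(ProducerRecord<K, V> record) {
            return record; // pass records through unchanged
        }

        @Override
        public void onAcknowledgement(RecordMetadata metadata, Exception exception) {}

        @Override
        public void close() {}

        @Override
        public void configure(Map<String, ?> configs) {}
    }

Such an interceptor would be registered through the producer's interceptor.classes configuration.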
The BrokerState "RunningAsController" (value 4) has been removed. Due to a bug, a
broker would only be in this state briefly before transitioning out of it and hence the
impact of the removal should be minimal. The recommended way to detect if a given
broker is the controller is via the
kafka.controller:type=KafkaController,name=ActiveControllerCount metric.
The new Java Consumer now allows users to search offsets by timestamp on
partitions.
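A minimal sketch of seeking a consumer to the offsets at or after a given timestamp, assuming the partition is already assigned to the consumer (the topic and partition are placeholders):

    import java.util.Collections;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
    import org.apache.kafka.common.TopicPartition;

    public class SeekByTimestampExample {
        static void seekTo(KafkaConsumer<String, String> consumer, long timestampMs) {
            TopicPartition tp = new TopicPartition("payments", 0); // placeholder topic/partition
            Map<TopicPartition, OffsetAndTimestamp> result =
                    consumer.offsetsForTimes(Collections.singletonMap(tp, timestampMs));
            OffsetAndTimestamp offsetAndTimestamp = result.get(tp);
            if (offsetAndTimestamp != null) {
                // Position the consumer at the earliest offset whose timestamp is >= timestampMs.
                consumer.seek(tp, offsetAndTimestamp.offset());
            }
        }
    }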
The new Java Consumer now supports heartbeating from a background thread.
There is a new configuration max.poll.interval.ms which controls the maximum
time between poll invocations before the consumer will proactively leave the group
(5 minutes by default). The value of the configuration request.timeout.ms must
always be larger than max.poll.interval.ms because this is the maximum time that a
JoinGroup request can block on the server while the consumer is rebalancing, so we
have changed its default value to just above 5 minutes. Finally, the default value
of session.timeout.ms has been adjusted down to 10 seconds, and the default value
of max.poll.records has been changed to 500.
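A minimal sketch of how these configs relate in a consumer's properties (the values shown mirror the defaults described above, the bootstrap address is a placeholder, and the request.timeout.ms value is just an example of "larger than max.poll.interval.ms"):

    import java.util.Properties;

    public class ConsumerTimeoutsExample {
        static Properties consumerProps() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("max.poll.records", "500");             // records returned per poll()
            props.put("max.poll.interval.ms", "300000");      // max time between poll() calls
            props.put("session.timeout.ms", "10000");         // heartbeat-based failure detection
            // request.timeout.ms must be larger than max.poll.interval.ms.
            props.put("request.timeout.ms", "305000");
            return props;
        }
    }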
When using an Authorizer and a user doesn't have Describe authorization on a topic,
the broker will no longer return TOPIC_AUTHORIZATION_FAILED errors to requests
since this leaks topic names. Instead, the UNKNOWN_TOPIC_OR_PARTITION error
code will be returned. This may cause unexpected timeouts or delays when using the
producer and consumer since Kafka clients will typically retry automatically on
unknown topic errors. You should consult the client logs if you suspect this could be
happening.
Fetch responses have a size limit by default (50 MB for consumers and 10 MB for
replication). The existing per partition limits also apply (1 MB for consumers and
replication). Note that neither of these limits is an absolute maximum as explained in
the next point.
Consumers and replicas can make progress if a message larger than the
response/partition size limit is found. More concretely, if the first message in the
first non-empty partition of the fetch is larger than either or both limits, the message
will still be returned.
Overloaded constructors were added
to kafka.api.FetchRequest and kafka.javaapi.FetchRequest to allow the caller to
specify the order of the partitions (since order is significant in v3). The previously
existing constructors were deprecated and the partitions are shuffled before the
request is sent to avoid starvation issues.
Notes to clients with version 0.9.0.0: Due to a bug introduced in 0.9.0.0, clients that depend
on ZooKeeper (old Scala high-level Consumer and MirrorMaker if used with the old
consumer) will not work with 0.10.0.x brokers. Therefore, 0.9.0.0 clients should be upgraded
to 0.9.0.1 before brokers are upgraded to 0.10.0.x. This step is not necessary for 0.8.X or
0.9.0.1 clients.
1. Update server.properties file on all brokers and add the following properties:
o inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2 or
0.9.0.0).
o log.message.format.version=CURRENT_KAFKA_VERSION (See potential
performance impact following the upgrade for the details on what this
configuration does.)
2. Upgrade the brokers. This can be done a broker at a time by simply bringing it down,
updating the code, and restarting it.
3. Once the entire cluster is upgraded, bump the protocol version by editing
inter.broker.protocol.version and setting it to 0.10.0.0. NOTE: You shouldn't touch
log.message.format.version yet - this parameter should only change once all
consumers have been upgraded to 0.10.0.0
4. Restart the brokers one by one for the new protocol version to take effect.
5. Once all consumers have been upgraded to 0.10.0, change
log.message.format.version to 0.10.0 on each broker and restart them one by one.
Note: If you are willing to accept downtime, you can simply take all the brokers down,
update the code and start all of them. They will start with the new protocol by default.
Note: Bumping the protocol version and restarting can be done any time after the brokers
were upgraded. It does not have to be immediately after.
The message format in 0.10.0 includes a new timestamp field and uses relative offsets for
compressed messages. The on disk message format can be configured through
log.message.format.version in the server.properties file. The default on-disk message
format is 0.10.0. If a consumer client is on a version before 0.10.0.0, it only understands
message formats before 0.10.0. In this case, the broker is able to convert messages from
the 0.10.0 format to an earlier format before sending the response to the consumer on an
older version. However, the broker can't use zero-copy transfer in this case. Reports from
the Kafka community on the performance impact have shown CPU utilization going from
20% before to 100% after an upgrade, which forced an immediate upgrade of all clients to
bring performance back to normal. To avoid such message conversion before consumers
are upgraded to 0.10.0.0, one can set log.message.format.version to 0.8.2 or 0.9.0 when
upgrading the broker to 0.10.0.0. This way, the broker can still use zero-copy transfer to
send the data to the old consumers. Once consumers are upgraded, one can change the
message format to 0.10.0 on the broker and enjoy the new message format that includes
new timestamp and improved compression. The conversion is supported to ensure
compatibility and can be useful to support a few apps that have not updated to newer
clients yet, but is impractical to support all consumer traffic on even an overprovisioned
cluster. Therefore, it is critical to avoid the message conversion as much as possible when
brokers have been upgraded but the majority of clients have not.
Note: By setting the message format version, one certifies that all existing messages are on
or below that message format version. Otherwise consumers before 0.10.0.0 might break.
In particular, after the message format is set to 0.10.0, one should not change it back to an
earlier format as it may break consumers on versions before 0.10.0.0.
Starting from Kafka 0.10.0.0, the message format version in Kafka is represented as
the Kafka version. For example, message format 0.9.0 refers to the highest message
version supported by Kafka 0.9.0.
Message format 0.10.0 has been introduced and it is used by default. It includes a
timestamp field in the messages and relative offsets are used for compressed
messages.
ProduceRequest/Response v2 has been introduced and it is used by default to
support message format 0.10.0
FetchRequest/Response v2 has been introduced and it is used by default to support
message format 0.10.0
MessageFormatter interface was changed from def writeTo(key: Array[Byte],
value: Array[Byte], output: PrintStream) to def writeTo(consumerRecord:
ConsumerRecord[Array[Byte], Array[Byte]], output: PrintStream)
MessageReader interface was changed from def readMessage():
KeyedMessage[Array[Byte], Array[Byte]] to def readMessage():
ProducerRecord[Array[Byte], Array[Byte]]
MessageFormatter's package was changed from kafka.tools to kafka.common
MessageReader's package was changed from kafka.tools to kafka.common
MirrorMakerMessageHandler no longer exposes the handle(record:
MessageAndMetadata[Array[Byte], Array[Byte]]) method as it was never called.
The 0.7 KafkaMigrationTool is no longer packaged with Kafka. If you need to migrate
from 0.7 to 0.10.0, please migrate to 0.8 first and then follow the documented
upgrade process to upgrade from 0.8 to 0.10.0.
The new consumer has standardized its APIs to accept java.util.Collection as the
sequence type for method parameters. Existing code may have to be updated to
work with the 0.10.0 client library.
LZ4-compressed message handling was changed to use an interoperable framing
specification (LZ4f v1.5.1). To maintain compatibility with old clients, this change
only applies to Message format 0.10.0 and later. Clients that Produce/Fetch LZ4-
compressed messages using v0/v1 (Message format 0.9.0) should continue to use
the 0.9.0 framing implementation. Clients that use Produce/Fetch protocols v2 or
later should use interoperable LZ4f framing. A list of interoperable LZ4 libraries is
available at http://www.lz4.org/
Starting from Kafka 0.10.0.0, a new client library named Kafka Streams is available
for stream processing on data stored in Kafka topics. This new client library only
works with 0.10.x and upward versioned brokers due to message format changes
mentioned above. For more information please read Streams documentation.
The default value of the configuration parameter receive.buffer.bytes is now 64K
for the new consumer.
The new consumer now exposes the configuration
parameter exclude.internal.topics to restrict internal topics (such as the consumer
offsets topic) from accidentally being included in regular expression subscriptions.
By default, it is enabled.
The old Scala producer has been deprecated. Users should migrate their code to the
Java producer included in the kafka-clients JAR as soon as possible.
The new consumer API has been marked stable.
1. Update server.properties file on all brokers and add the following property:
inter.broker.protocol.version=0.8.2.X
2. Upgrade the brokers. This can be done a broker at a time by simply bringing it down,
updating the code, and restarting it.
3. Once the entire cluster is upgraded, bump the protocol version by editing
inter.broker.protocol.version and setting it to 0.9.0.0.
4. Restart the brokers one by one for the new protocol version to take effect
Note: If you are willing to accept downtime, you can simply take all the brokers down,
update the code and start all of them. They will start with the new protocol by default.
Note: Bumping the protocol version and restarting can be done any time after the brokers
were upgraded. It does not have to be immediately after.
Potential breaking changes in 0.9.0.0
Deprecations in 0.9.0.0
0.8.2 is fully compatible with 0.8.1. The upgrade can be done one broker at a time by simply
bringing it down, updating the code, and restarting it.
0.8.1 is fully compatible with 0.8. The upgrade can be done one broker at a time by simply
bringing it down, updating the code, and restarting it.
Release 0.7 is incompatible with newer releases. Major changes were made to the API,
ZooKeeper data structures, protocol, and configuration in order to add replication
(which was missing in 0.7). The upgrade from 0.7 to later versions requires a special
tool for migration. This migration can be done without downtime.
2. APIS
Kafka includes five core APIs: the Producer API, the Consumer API, the Streams API, the
Connect API, and the Admin API.
Kafka exposes all its functionality over a language-independent protocol which has clients
available in many programming languages. However, only the Java clients are maintained
as part of the main Kafka project; the others are available as independent open source
projects. A list of non-Java clients is available here.
The Producer API allows applications to send streams of data to topics in the Kafka cluster.
To use the producer, you can use the following maven dependency:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.6.0</version>
</dependency>
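For example, a minimal producer that sends a single record (the topic name and bootstrap address are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SimpleProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("my-topic", "key", "value"));
                producer.flush(); // make sure the record is actually sent before exiting
            }
        }
    }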
The Consumer API allows applications to read streams of data from topics in the Kafka
cluster.
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.6.0</version>
</dependency>
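For example, a minimal consumer loop (the topic, group id, and bootstrap address are placeholders):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SimpleConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "my-group");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d key=%s value=%s%n",
                                record.offset(), record.key(), record.value());
                    }
                }
            }
        }
    }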
The Streams API allows transforming streams of data from input topics to output topics.
To use Kafka Streams you can use the following maven dependency:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>2.6.0</version>
</dependency>
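For example, a minimal Kafka Streams application that pipes records from one topic to another (the topic names, application id, and bootstrap address are placeholders):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class SimpleStreamsApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Read from the input topic and write each record unchanged to the output topic.
            builder.stream("input-topic").to("output-topic");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }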
To use Kafka Streams DSL for Scala for Scala 2.13 you can use the following maven
dependency:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams-scala_2.13</artifactId>
<version>2.6.0</version>
</dependency>
The Connect API allows building and running reusable connectors that continually import
data from external systems into Kafka or export data from Kafka into external systems.
Many users of Connect won't need to use this API directly; they can use pre-built
connectors without needing to write any code. Additional information on using Connect is
available here.
The Admin API supports managing and inspecting topics, brokers, acls, and other Kafka
objects.
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.6.0</version>
</dependency>
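For example, creating a topic and listing the existing topics with the Admin client (the topic name, partition count, replication factor, and bootstrap address are placeholders):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder

            try (Admin admin = Admin.create(props)) {
                // Create "my-topic" with 3 partitions and a replication factor of 1.
                NewTopic topic = new NewTopic("my-topic", 3, (short) 1);
                admin.createTopics(Collections.singletonList(topic)).all().get();
                System.out.println("Existing topics: " + admin.listTopics().names().get());
            }
        }
    }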
3. CONFIGURATION
Kafka uses key-value pairs in the property file format for configuration. These values can be
supplied either from a file or programmatically.
broker.id
log.dirs
zookeeper.connect
Type: string
Default:
Valid Values:
Importance: high
Update Mode: read-only
advertised.host.name
Type: string
Default: null
Valid Values:
Importance: high
Update Mode: read-only
advertised.listeners
Type: string
Default: null
Valid Values:
Importance: high
Update Mode: per-broker
advertised.port
Type: int
Default: null
Valid Values:
Importance: high
Update Mode: read-only
auto.create.topics.enable
Type: boolean
Default: true
Valid Values:
Importance: high
Update Mode: read-only
auto.leader.rebalance.enable
Type: boolean
Default: true
Valid Values:
Importance: high
Update Mode: read-only
background.threads
The number of threads to use for various background processing tasks
Type: int
Default: 10
Valid Values: [1,...]
Importance: high
Update Mode: cluster-wide
broker.id
The broker id for this server. If unset, a unique broker id will be generated. To avoid
conflicts between ZooKeeper-generated broker ids and user-configured broker ids,
generated broker ids start from reserved.broker.max.id + 1.
Type: int
Default: -1
Valid Values:
Importance: high
Update Mode: read-only
compression.type
Specify the final compression type for a given topic. This configuration accepts the
standard compression codecs ('gzip', 'snappy', 'lz4', 'zstd'). It additionally accepts
'uncompressed' which is equivalent to no compression; and 'producer' which means
retain the original compression codec set by the producer.
Type: string
Default: producer
Valid Values:
Importance: high
Update Mode: cluster-wide
control.plane.listener.name
Name of listener used for communication between controller and brokers. Broker
will use the control.plane.listener.name to locate the endpoint in listeners list, to
listen for connections from the controller. For example, if a broker's config is :
listeners = INTERNAL://192.1.1.8:9092, EXTERNAL://10.1.1.5:9093,
CONTROLLER://192.1.1.8:9094
listener.security.protocol.map = INTERNAL:PLAINTEXT, EXTERNAL:SSL,
CONTROLLER:SSL
control.plane.listener.name = CONTROLLER
On startup, the broker will start listening on "192.1.1.8:9094" with security protocol
"SSL".
On controller side, when it discovers a broker's published endpoints through
zookeeper, it will use the control.plane.listener.name to find the endpoint, which it
will use to establish connection to the broker.
For example, if the broker's published endpoints on zookeeper are :
"endpoints" :
["INTERNAL://broker1.example.com:9092","EXTERNAL://broker1.example.com:9093
","CONTROLLER://broker1.example.com:9094"]
and the controller's config is :
listener.security.protocol.map = INTERNAL:PLAINTEXT, EXTERNAL:SSL,
CONTROLLER:SSL
control.plane.listener.name = CONTROLLER
then controller will use "broker1.example.com:9094" with security protocol "SSL" to
connect to the broker.
If not explicitly configured, the default value will be null and there will be no
dedicated endpoints for controller connections.
Type: string
Default: null
Valid Values:
Importance: high
Update Mode: read-only
delete.topic.enable
Enables delete topic. Delete topic through the admin tool will have no effect if this
config is turned off
Type: boolean
Default: true
Valid Values:
Importance: high
Update Mode: read-only
host.name
DEPRECATED: only used when listeners is not set. Use listeners instead.
hostname of broker. If this is set, it will only bind to this address. If this is not set, it
will bind to all interfaces
Type: string
Default: ""
Valid Values:
Importance: high
Update Mode: read-only
leader.imbalance.check.interval.seconds
The frequency with which the partition rebalance check is triggered by the controller
Type: long
Default: 300
Valid Values:
Importance: high
Update Mode: read-only
leader.imbalance.per.broker.percentage
The ratio of leader imbalance allowed per broker. The controller would trigger a
leader balance if it goes above this value per broker. The value is specified in
percentage.
Type: int
Default: 10
Valid Values:
Importance: high
Update Mode: read-only
listeners
Listener List - Comma-separated list of URIs we will listen on and the listener names.
If the listener name is not a security protocol, listener.security.protocol.map must
also be set.
Specify hostname as 0.0.0.0 to bind to all interfaces.
Leave hostname empty to bind to default interface.
Examples of legal listener lists:
PLAINTEXT://myhost:9092,SSL://:9091
CLIENT://0.0.0.0:9092,REPLICATION://localhost:9093
Type: string
Default: null
Valid Values:
Importance: high
Update Mode: per-broker
log.dir
The directory in which the log data is kept (supplemental for log.dirs property)
Type: string
Default: /tmp/kafka-logs
Valid Values:
Importance: high
Update Mode: read-only
log.dirs
The directories in which the log data is kept. If not set, the value in log.dir is used
Type: string
Default: null
Valid Values:
Importance: high
Update Mode: read-only
log.flush.interval.messages
Type: long
Default: 9223372036854775807
Valid Values: [1,...]
Importance: high
Update Mode: cluster-wide
log.flush.interval.ms
The maximum time in ms that a message in any topic is kept in memory before
flushed to disk. If not set, the value in log.flush.scheduler.interval.ms is used
Type: long
Default: null
Valid Values:
Importance: high
Update Mode: cluster-wide
log.flush.offset.checkpoint.interval.ms
The frequency with which we update the persistent record of the last flush which
acts as the log recovery point
Type: int
Default: 60000 (1 minute)
Valid Values: [0,...]
Importance: high
Update Mode: read-only
log.flush.scheduler.interval.ms
The frequency in ms that the log flusher checks whether any log needs to be flushed
to disk
Type: long
Default: 9223372036854775807
Valid Values:
Importance: high
Update Mode: read-only
log.flush.start.offset.checkpoint.interval.ms
The frequency with which we update the persistent record of log start offset
Type: int
Default: 60000 (1 minute)
Valid Values: [0,...]
Importance: high
Update Mode: read-only
log.retention.bytes
The maximum size of the log before deleting it
Type: long
Default: -1
Valid Values:
Importance: high
Update Mode: cluster-wide
log.retention.hours
The number of hours to keep a log file before deleting it (in hours), tertiary to
log.retention.ms property
Type: int
Default: 168
Valid Values:
Importance: high
Update Mode: read-only
log.retention.minutes
The number of minutes to keep a log file before deleting it (in minutes), secondary to
log.retention.ms property. If not set, the value in log.retention.hours is used
Type: int
Default: null
Valid Values:
Importance: high
Update Mode: read-only
log.retention.ms
The number of milliseconds to keep a log file before deleting it (in milliseconds). If
not set, the value in log.retention.minutes is used. If set to -1, no time limit is applied.
Type: long
Default: null
Valid Values:
Importance: high
Update Mode: cluster-wide
log.roll.hours
The maximum time before a new log segment is rolled out (in hours), secondary to
log.roll.ms property
Type: int
Default: 168
Valid Values: [1,...]
Importance: high
Update Mode: read-only
log.roll.jitter.hours
Type: int
Default: 0
Valid Values: [0,...]
Importance: high
Update Mode: read-only
log.roll.jitter.ms
The maximum jitter to subtract from logRollTimeMillis (in milliseconds). If not set,
the value in log.roll.jitter.hours is used
Type: long
Default: null
Valid Values:
Importance: high
Update Mode: cluster-wide
log.roll.ms
The maximum time before a new log segment is rolled out (in milliseconds). If not
set, the value in log.roll.hours is used
Type: long
Default: null
Valid Values:
Importance: high
Update Mode: cluster-wide
log.segment.bytes
Type: int
Default: 1073741824 (1 gibibyte)
Valid Values: [14,...]
Importance: high
Update Mode: cluster-wide
log.segment.delete.delay.ms
The amount of time to wait before deleting a file from the filesystem
Type: long
Default: 60000 (1 minute)
Valid Values: [0,...]
Importance: high
Update Mode: cluster-wide
message.max.bytes
The largest record batch size allowed by Kafka (after compression if compression is
enabled). If this is increased and there are consumers older than 0.10.2, the
consumers' fetch size must also be increased so that they can fetch record batches
this large. In the latest message format version, records are always grouped into
batches for efficiency. In previous message format versions, uncompressed records
are not grouped into batches and this limit only applies to a single record in that
case. This can be set per topic with the topic level max.message.bytes config.
Type: int
Default: 1048588
Valid Values: [0,...]
Importance: high
Update Mode: cluster-wide
min.insync.replicas
When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the
minimum number of replicas that must acknowledge a write for the write to be
considered successful. If this minimum cannot be met, then the producer will raise
an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
When used together, min.insync.replicas and acks allow you to enforce greater
durability guarantees. A typical scenario would be to create a topic with a replication
factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This will
ensure that the producer raises an exception if a majority of replicas do not receive a
write.
Type: int
Default: 1
Valid Values: [1,...]
Importance: high
Update Mode: cluster-wide
num.io.threads
The number of threads that the server uses for processing requests, which may
include disk I/O
Type: int
Default: 8
Valid Values: [1,...]
Importance: high
Update Mode: cluster-wide
num.network.threads
The number of threads that the server uses for receiving requests from the network
and sending responses to the network
Type: int
Default: 3
Valid Values: [1,...]
Importance: high
Update Mode: cluster-wide
num.recovery.threads.per.data.dir
The number of threads per data directory to be used for log recovery at startup and
flushing at shutdown
Type: int
Default: 1
Valid Values: [1,...]
Importance: high
Update Mode: cluster-wide
num.replica.alter.log.dirs.threads
The number of threads that can move replicas between log directories, which may
include disk I/O
Type: int
Default: null
Valid Values:
Importance: high
Update Mode: read-only
num.replica.fetchers
Type: int
Default: 1
Valid Values:
Importance: high
Update Mode: cluster-wide
offset.metadata.max.bytes
The maximum size for a metadata entry associated with an offset commit
Type: int
Default: 4096 (4 kibibytes)
Valid Values:
Importance: high
Update Mode: read-only
offsets.commit.required.acks
The required acks before the commit can be accepted. In general, the default (-1)
should not be overridden
Type: short
Default: -1
Valid Values:
Importance: high
Update Mode: read-only
offsets.commit.timeout.ms
Offset commit will be delayed until all replicas for the offsets topic receive the
commit or this timeout is reached. This is similar to the producer request timeout.
Type: int
Default: 5000 (5 seconds)
Valid Values: [1,...]
Importance: high
Update Mode: read-only
offsets.load.buffer.size
Batch size for reading from the offsets segments when loading offsets into the
cache (soft-limit, overridden if records are too large).
Type: int
Default: 5242880
Valid Values: [1,...]
Importance: high
Update Mode: read-only
offsets.retention.check.interval.ms
Type: long
Default: 600000 (10 minutes)
Valid Values: [1,...]
Importance: high
Update Mode: read-only
offsets.retention.minutes
After a consumer group loses all its consumers (i.e. becomes empty) its offsets will
be kept for this retention period before getting discarded. For standalone consumers
(using manual assignment), offsets will be expired after the time of last commit plus
this retention period.
Type: int
Default: 10080
Valid Values: [1,...]
Importance: high
Update Mode: read-only
offsets.topic.compression.codec
Compression codec for the offsets topic - compression may be used to achieve
"atomic" commits
Type: int
Default: 0
Valid Values:
Importance: high
Update Mode: read-only
offsets.topic.num.partitions
The number of partitions for the offset commit topic (should not change after
deployment)
Type: int
Default: 50
Valid Values: [1,...]
Importance: high
Update Mode: read-only
offsets.topic.replication.factor
The replication factor for the offsets topic (set higher to ensure availability). Internal
topic creation will fail until the cluster size meets this replication factor requirement.
Type: short
Default: 3
Valid Values: [1,...]
Importance: high
Update Mode: read-only
offsets.topic.segment.bytes
The offsets topic segment bytes should be kept relatively small in order to facilitate
faster log compaction and cache loads
Type: int
Default: 104857600 (100 mebibytes)
Valid Values: [1,...]
Importance: high
Update Mode: read-only
port
Type: int
Default: 9092
Valid Values:
Importance: high
Update Mode: read-only
queued.max.requests
The number of queued requests allowed for data-plane, before blocking the network
threads
Type: int
Default: 500
Valid Values: [1,...]
Importance: high
Update Mode: read-only
quota.consumer.default
DEPRECATED: Used only when dynamic default quotas are not configured for or in
Zookeeper. Any consumer distinguished by clientId/consumer group will get
throttled if it fetches more bytes than this value per-second
Type: long
Default: 9223372036854775807
Valid Values: [1,...]
Importance: high
Update Mode: read-only
quota.producer.default
DEPRECATED: Used only when dynamic default quotas are not configured for or in
Zookeeper. Any producer distinguished by clientId will get throttled if it produces
more bytes than this value per-second
Type: long
Default: 9223372036854775807
Valid Values: [1,...]
Importance: high
Update Mode: read-only
replica.fetch.min.bytes
Minimum bytes expected for each fetch response. If not enough bytes, wait up to
replicaMaxWaitTimeMs
Type: int
Default: 1
Valid Values:
Importance: high
Update Mode: read-only
replica.fetch.wait.max.ms
The maximum wait time for each fetcher request issued by follower replicas. This value
should always be less than replica.lag.time.max.ms to prevent frequent shrinking of the
ISR for low throughput topics
Type: int
Default: 500
Valid Values:
Importance: high
Update Mode: read-only
replica.high.watermark.checkpoint.interval.ms
The frequency with which the high watermark is saved out to disk
Type: long
Default: 5000 (5 seconds)
Valid Values:
Importance: high
Update Mode: read-only
replica.lag.time.max.ms
If a follower hasn't sent any fetch requests or hasn't consumed up to the leader's log
end offset for at least this time, the leader will remove the follower from the ISR
Type: long
Default: 30000 (30 seconds)
Valid Values:
Importance: high
Update Mode: read-only
replica.socket.receive.buffer.bytes
Type: int
Default: 65536 (64 kibibytes)
Valid Values:
Importance: high
Update Mode: read-only
replica.socket.timeout.ms
The socket timeout for network requests. Its value should be at least
replica.fetch.wait.max.ms
Type: int
Default: 30000 (30 seconds)
Valid Values:
Importance: high
Update Mode: read-only
request.timeout.ms
The configuration controls the maximum amount of time the client will wait for the
response of a request. If the response is not received before the timeout elapses the
client will resend the request if necessary or fail the request if retries are exhausted.
Type: int
Default: 30000 (30 seconds)
Valid Values:
Importance: high
Update Mode: read-only
socket.receive.buffer.bytes
The SO_RCVBUF buffer of the socket server sockets. If the value is -1, the OS default
will be used.
Type: int
Default: 102400 (100 kibibytes)
Valid Values:
Importance: high
Update Mode: read-only
socket.request.max.bytes
Type: int
Default: 104857600 (100 mebibytes)
Valid Values: [1,...]
Importance: high
Update Mode: read-only
socket.send.buffer.bytes
The SO_SNDBUF buffer of the socket server sockets. If the value is -1, the OS default
will be used.
Type: int
Default: 102400 (100 kibibytes)
Valid Values:
Importance: high
Update Mode: read-only
transaction.max.timeout.ms
Type: int
Default: 900000 (15 minutes)
Valid Values: [1,...]
Importance: high
Update Mode: read-only
transaction.state.log.load.buffer.size
Batch size for reading from the transaction log segments when loading producer ids
and transactions into the cache (soft-limit, overridden if records are too large).
Type: int
Default: 5242880
Valid Values: [1,...]
Importance: high
Update Mode: read-only
transaction.state.log.min.isr
Type: int
Default: 2
Valid Values: [1,...]
Importance: high
Update Mode: read-only
transaction.state.log.num.partitions
The number of partitions for the transaction topic (should not change after
deployment).
Type: int
Default: 50
Valid Values: [1,...]
Importance: high
Update Mode: read-only
transaction.state.log.replication.factor
The replication factor for the transaction topic (set higher to ensure availability).
Internal topic creation will fail until the cluster size meets this replication factor
requirement.
Type: short
Default: 3
Valid Values: [1,...]
Importance: high
Update Mode: read-only
transaction.state.log.segment.bytes
The transaction topic segment bytes should be kept relatively small in order to
facilitate faster log compaction and cache loads
Type: int
Default: 104857600 (100 mebibytes)
Valid Values: [1,...]
Importance: high
Update Mode: read-only
transactional.id.expiration.ms
The time in ms that the transaction coordinator will wait without receiving any
transaction status updates for the current transaction before expiring its
transactional id. This setting also influences producer id expiration - producer ids are
expired once this time has elapsed after the last write with the given producer id.
Note that producer ids may expire sooner if the last write from the producer id is
deleted due to the topic's retention settings.
Type: int
Default: 604800000 (7 days)
Valid Values: [1,...]
Importance: high
Update Mode: read-only
unclean.leader.election.enable
Indicates whether to enable replicas not in the ISR set to be elected as leader as a
last resort, even though doing so may result in data loss
Type: boolean
Default: false
Valid Values:
Importance: high
Update Mode: cluster-wide
zookeeper.connection.timeout.ms
The max time that the client waits to establish a connection to zookeeper. If not set,
the value in zookeeper.session.timeout.ms is used
Type: int
Default: null
Valid Values:
Importance: high
Update Mode: read-only
zookeeper.max.in.flight.requests
Type: int
Default: 10
Valid Values: [1,...]
Importance: high
Update Mode: read-only
zookeeper.session.timeout.ms
Type: int
Default: 18000 (18 seconds)
Valid Values:
Importance: high
Update Mode: read-only
zookeeper.set.acl
Set client to use secure ACLs
Type: boolean
Default: false
Valid Values:
Importance: high
Update Mode: read-only
broker.id.generation.enable
Enable automatic broker id generation on the server. When enabled the value
configured for reserved.broker.max.id should be reviewed.
Type: boolean
Default: true
Valid Values:
Importance: medium
Update Mode: read-only
broker.rack
Rack of the broker. This will be used in rack aware replication assignment for fault
tolerance. Examples: `RACK1`, `us-east-1d`
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
connections.max.idle.ms
Idle connections timeout: the server socket processor threads close the connections
that idle more than this
Type: long
Default: 600000 (10 minutes)
Valid Values:
Importance: medium
Update Mode: read-only
connections.max.reauth.ms
When explicitly set to a positive number (the default is 0, not a positive number), a
session lifetime that will not exceed the configured value will be communicated to
v2.2.0 or later clients when they authenticate. The broker will disconnect any such
connection that is not re-authenticated within the session lifetime and that is then
subsequently used for any purpose other than re-authentication. Configuration
names can optionally be prefixed with listener prefix and SASL mechanism name in
lower-case. For example,
listener.name.sasl_ssl.oauthbearer.connections.max.reauth.ms=3600000
Type: long
Default: 0
Valid Values:
Importance: medium
Update Mode: read-only
controlled.shutdown.enable
Type: boolean
Default: true
Valid Values:
Importance: medium
Update Mode: read-only
controlled.shutdown.max.retries
Controlled shutdown can fail for multiple reasons. This determines the number of
retries when such failure happens
Type: int
Default: 3
Valid Values:
Importance: medium
Update Mode: read-only
controlled.shutdown.retry.backoff.ms
Before each retry, the system needs time to recover from the state that caused the
previous failure (Controller fail over, replica lag etc). This config determines the
amount of time to wait before retrying.
Type: long
Default: 5000 (5 seconds)
Valid Values:
Importance: medium
Update Mode: read-only
controller.socket.timeout.ms
Type: int
Default: 30000 (30 seconds)
Valid Values:
Importance: medium
Update Mode: read-only
default.replication.factor
Type: int
Default: 1
Valid Values:
Importance: medium
Update Mode: read-only
delegation.token.expiry.time.ms
The token validity time in milliseconds before the token needs to be renewed. Default
value 1 day.
Type: long
Default: 86400000 (1 day)
Valid Values: [1,...]
Importance: medium
Update Mode: read-only
delegation.token.master.key
Master/secret key to generate and verify delegation tokens. Same key must be
configured across all the brokers. If the key is not set or set to empty string, brokers
will disable the delegation token support.
Type: password
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
delegation.token.max.lifetime.ms
The token has a maximum lifetime beyond which it cannot be renewed anymore.
Default value 7 days.
Type: long
Default: 604800000 (7 days)
Valid Values: [1,...]
Importance: medium
Update Mode: read-only
delete.records.purgatory.purge.interval.requests
The purge interval (in number of requests) of the delete records request purgatory
Type: int
Default: 1
Valid Values:
Importance: medium
Update Mode: read-only
fetch.max.bytes
The maximum number of bytes we will return for a fetch request. Must be at least
1024.
Type: int
Default: 57671680 (55 mebibytes)
Valid Values: [1024,...]
Importance: medium
Update Mode: read-only
fetch.purgatory.purge.interval.requests
The purge interval (in number of requests) of the fetch request purgatory
Type: int
Default: 1000
Valid Values:
Importance: medium
Update Mode: read-only
group.initial.rebalance.delay.ms
The amount of time the group coordinator will wait for more consumers to join a
new group before performing the first rebalance. A longer delay means potentially
fewer rebalances, but increases the time until processing begins.
Type: int
Default: 3000 (3 seconds)
Valid Values:
Importance: medium
Update Mode: read-only
group.max.session.timeout.ms
The maximum allowed session timeout for registered consumers. Longer timeouts
give consumers more time to process messages in between heartbeats at the cost
of a longer time to detect failures.
Type: int
Default: 1800000 (30 minutes)
Valid Values:
Importance: medium
Update Mode: read-only
group.max.size
Type: int
Default: 2147483647
Valid Values: [1,...]
Importance: medium
Update Mode: read-only
group.min.session.timeout.ms
The minimum allowed session timeout for registered consumers. Shorter timeouts
result in quicker failure detection at the cost of more frequent consumer
heartbeating, which can overwhelm broker resources.
Type: int
Default: 6000 (6 seconds)
Valid Values:
Importance: medium
Update Mode: read-only
inter.broker.listener.name
Name of listener used for communication between brokers. If this is unset, the
listener name is defined by security.inter.broker.protocol. It is an error to set this and
security.inter.broker.protocol properties at the same time.
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
inter.broker.protocol.version
Type: string
Default: 2.6-IV0
Valid Values: [0.8.0, 0.8.1, 0.8.2, 0.9.0, 0.10.0-IV0, 0.10.0-IV1, 0.10.1-IV0, 0.10.1-IV1,
0.10.1-IV2, 0.10.2-IV0, 0.11.0-IV0, 0.11.0-IV1, 0.11.0-IV2, 1.0-IV0, 1.1-IV0, 2.0-IV0,
2.0-IV1, 2.1-IV0, 2.1-IV1, 2.1-IV2, 2.2-IV0, 2.2-IV1, 2.3-IV0, 2.3-IV1, 2.4-IV0, 2.4-IV1,
2.5-IV0, 2.6-IV0]
Importance: medium
Update Mode: read-only
log.cleaner.backoff.ms
Type: long
Default: 15000 (15 seconds)
Valid Values: [0,...]
Importance: medium
Update Mode: cluster-wide
log.cleaner.dedupe.buffer.size
The total memory used for log deduplication across all cleaner threads
Type: long
Default: 134217728
Valid Values:
Importance: medium
Update Mode: cluster-wide
log.cleaner.delete.retention.ms
Type: long
Default: 86400000 (1 day)
Valid Values:
Importance: medium
Update Mode: cluster-wide
log.cleaner.enable
Enable the log cleaner process to run on the server. Should be enabled if using any
topics with a cleanup.policy=compact including the internal offsets topic. If disabled
those topics will not be compacted and continually grow in size.
Type: boolean
Default: true
Valid Values:
Importance: medium
Update Mode: read-only
log.cleaner.io.buffer.load.factor
Log cleaner dedupe buffer load factor. The percentage full the dedupe buffer can
become. A higher value will allow more log to be cleaned at once but will lead to
more hash collisions
Type: double
Default: 0.9
Valid Values:
Importance: medium
Update Mode: cluster-wide
log.cleaner.io.buffer.size
The total memory used for log cleaner I/O buffers across all cleaner threads
Type: int
Default: 524288
Valid Values: [0,...]
Importance: medium
Update Mode: cluster-wide
log.cleaner.io.max.bytes.per.second
The log cleaner will be throttled so that the sum of its read and write i/o will be less
than this value on average
Type: double
Default: 1.7976931348623157E308
Valid Values:
Importance: medium
Update Mode: cluster-wide
log.cleaner.max.compaction.lag.ms
The maximum time a message will remain ineligible for compaction in the log. Only
applicable for logs that are being compacted.
Type: long
Default: 9223372036854775807
Valid Values:
Importance: medium
Update Mode: cluster-wide
log.cleaner.min.cleanable.ratio
The minimum ratio of dirty log to total log for a log to be eligible for cleaning. If the
log.cleaner.max.compaction.lag.ms or the log.cleaner.min.compaction.lag.ms
configurations are also specified, then the log compactor considers the log eligible
for compaction as soon as either: (i) the dirty ratio threshold has been met and the
log has had dirty (uncompacted) records for at least the
log.cleaner.min.compaction.lag.ms duration, or (ii) if the log has had dirty
(uncompacted) records for at most the log.cleaner.max.compaction.lag.ms period.
Type: double
Default: 0.5
Valid Values:
Importance: medium
Update Mode: cluster-wide
log.cleaner.min.compaction.lag.ms
The minimum time a message will remain uncompacted in the log. Only applicable
for logs that are being compacted.
Type: long
Default: 0
Valid Values:
Importance: medium
Update Mode: cluster-wide
log.cleaner.threads
Type: int
Default: 1
Valid Values: [0,...]
Importance: medium
Update Mode: cluster-wide
log.cleanup.policy
The default cleanup policy for segments beyond the retention window. A comma
separated list of valid policies. Valid policies are: "delete" and "compact"
Type: list
Default: delete
Valid Values: [compact, delete]
Importance: medium
Update Mode: cluster-wide
log.index.interval.bytes
Type: int
Default: 4096 (4 kibibytes)
Valid Values: [0,...]
Importance: medium
Update Mode: cluster-wide
log.index.size.max.bytes
Type: int
Default: 10485760 (10 mebibytes)
Valid Values: [4,...]
Importance: medium
Update Mode: cluster-wide
log.message.format.version
Specify the message format version the broker will use to append messages to the
logs. The value should be a valid ApiVersion. Some examples are: 0.8.2, 0.9.0.0,
0.10.0, check ApiVersion for more details. By setting a particular message format
version, the user is certifying that all the existing messages on disk are smaller than
or equal to the specified version. Setting this value incorrectly will cause consumers
with older versions to break as they will receive messages with a format that they
don't understand.
Type: string
Default: 2.6-IV0
Valid Values: [0.8.0, 0.8.1, 0.8.2, 0.9.0,
0.10.0-IV0, 0.10.0-IV1,
0.10.1-IV0, 0.10.1-IV1,
0.10.1-IV2, 0.10.2-IV0,
0.11.0-IV0, 0.11.0-IV1,
0.11.0-IV2, 1.0-IV0, 1.1-IV0,
2.0-IV0, 2.0-IV1, 2.1-IV0,
2.1-IV1, 2.1-IV2, 2.2-IV0,
2.2-IV1, 2.3-IV0, 2.3-IV1,
2.4-IV0, 2.4-IV1, 2.5-IV0,
2.6-IV0]
Importance: medium
Update Mode: read-only
log.message.timestamp.difference.max.ms
The maximum difference allowed between the timestamp when a broker receives a
message and the timestamp specified in the message. If
log.message.timestamp.type=CreateTime, a message will be rejected if the
difference in timestamp exceeds this threshold. This configuration is ignored if
log.message.timestamp.type=LogAppendTime. The maximum timestamp difference
allowed should be no greater than log.retention.ms to avoid unnecessarily frequent
log rolling.
Type: long
Default: 9223372036854775807
Valid Values:
Importance: medium
Update Mode: cluster-wide
log.message.timestamp.type
Define whether the timestamp in the message is message create time or log append
time. The value should be either `CreateTime` or `LogAppendTime`
Type: string
Default: CreateTime
Valid Values: [CreateTime, LogAppendTime]
Importance: medium
Update Mode: cluster-wide
log.preallocate
Should pre-allocate the file when creating a new segment? If you are using Kafka on
Windows, you probably need to set it to true.
Type: boolean
Default: false
Valid Values:
Importance: medium
Update Mode: cluster-wide
log.retention.check.interval.ms
The frequency in milliseconds that the log cleaner checks whether any log is eligible
for deletion
Type: long
Default: 300000 (5 minutes)
Valid Values: [1,...]
Importance: medium
Update Mode: read-only
max.connections
The maximum number of connections we allow in the broker at any time. This limit
is applied in addition to any per-ip limits configured using max.connections.per.ip.
Listener-level limits may also be configured by prefixing the config name with the
listener prefix, for example, listener.name.internal.max.connections. The broker-wide
limit should be configured based on broker capacity, while listener limits should be
configured based on application requirements. New connections are blocked if
either the listener or broker limit is reached. Connections on the inter-broker listener
are permitted even if the broker-wide limit is reached; in this case, the least recently
used connection on another listener will be closed.
Type: int
Default: 2147483647
Valid Values: [0,...]
Importance: medium
Update Mode: cluster-wide
max.connections.per.ip
The maximum number of connections we allow from each ip address. This can be
set to 0 if there are overrides configured using max.connections.per.ip.overrides
property. New connections from the ip address are dropped if the limit is reached.
Type: int
Default: 2147483647
Valid Values: [0,...]
Importance: medium
Update Mode: cluster-wide
max.connections.per.ip.overrides
Type: string
Default: ""
Valid Values:
Importance: medium
Update Mode: cluster-wide
max.incremental.fetch.session.cache.slots
Type: int
Default: 1000
Valid Values: [0,...]
Importance: medium
Update Mode: read-only
num.partitions
Type: int
Default: 1
Valid Values: [1,...]
Importance: medium
Update Mode: read-only
password.encoder.old.secret
The old secret that was used for encoding dynamically configured passwords. This
is required only when the secret is updated. If specified, all dynamically encoded
passwords are decoded using this old secret and re-encoded using
password.encoder.secret when broker starts up.
Type: password
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
password.encoder.secret
The secret used for encoding dynamically configured passwords for this broker.
Type: password
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
principal.builder.class
Type: class
Default: null
Valid Values:
Importance: medium
Update Mode: per-broker
producer.purgatory.purge.interval.requests
The purge interval (in number of requests) of the producer request purgatory
Type: int
Default: 1000
Valid Values:
Importance: medium
Update Mode: read-only
queued.max.request.bytes
The number of queued bytes allowed before no more requests are read
Type: long
Default: -1
Valid Values:
Importance: medium
Update Mode: read-only
replica.fetch.backoff.ms
Type: int
Default: 1000 (1 second)
Valid Values: [0,...]
Importance: medium
Update Mode: read-only
replica.fetch.max.bytes
The number of bytes of messages to attempt to fetch for each partition. This is not
an absolute maximum, if the first record batch in the first non-empty partition of the
fetch is larger than this value, the record batch will still be returned to ensure that
progress can be made. The maximum record batch size accepted by the broker is
defined via message.max.bytes (broker config) or max.message.bytes (topic config).
Type: int
Default: 1048576 (1 mebibyte)
Valid Values: [0,...]
Importance: medium
Update Mode: read-only
replica.fetch.response.max.bytes
Maximum bytes expected for the entire fetch response. Records are fetched in
batches, and if the first record batch in the first non-empty partition of the fetch is
larger than this value, the record batch will still be returned to ensure that progress
can be made. As such, this is not an absolute maximum. The maximum record batch
size accepted by the broker is defined via message.max.bytes (broker config)
or max.message.bytes (topic config).
Type: int
Default: 10485760 (10 mebibytes)
Valid Values: [0,...]
Importance: medium
Update Mode: read-only
replica.selector.class
The fully qualified class name that implements ReplicaSelector. This is used by the
broker to find the preferred read replica. By default, we use an implementation that
returns the leader.
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
reserved.broker.max.id
Type: int
Default: 1000
Valid Values: [0,...]
Importance: medium
Update Mode: read-only
sasl.client.callback.handler.class
The fully qualified name of a SASL client callback handler class that implements the
AuthenticateCallbackHandler interface.
Type: class
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
sasl.enabled.mechanisms
The list of SASL mechanisms enabled in the Kafka server. The list may contain any
mechanism for which a security provider is available. Only GSSAPI is enabled by
default.
Type: list
Default: GSSAPI
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.jaas.config
JAAS login context parameters for SASL connections in the format used by JAAS
configuration files. JAAS configuration file format is described here. The format for
the value is: 'loginModuleClass controlFlag (optionName=optionValue)*; '. For
brokers, the config must be prefixed with listener prefix and SASL mechanism name
in lower-case. For example, listener.name.sasl_ssl.scram-sha-
256.sasl.jaas.config=com.example.ScramLoginModule required;
Type: password
Default: null
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.kerberos.kinit.cmd
Type: string
Default: /usr/bin/kinit
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.kerberos.min.time.before.relogin
Type: long
Default: 60000
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.kerberos.principal.to.local.rules
A list of rules for mapping from principal names to short names (typically operating
system usernames). The rules are evaluated in order and the first rule that matches
a principal name is used to map it to a short name. Any later rules in the list are
ignored. By default, principal names of the form {username}/{hostname}@{REALM}
are mapped to {username}. For more details on the format please see security
authorization and acls. Note that this configuration is ignored if an extension of
KafkaPrincipalBuilder is provided by the principal.builder.class configuration.
Type: list
Default: DEFAULT
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.kerberos.service.name
The Kerberos principal name that Kafka runs as. This can be defined either in Kafka's
JAAS config or in Kafka's config.
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.kerberos.ticket.renew.jitter
Percentage of random jitter added to the renewal time.
Type: double
Default: 0.05
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.kerberos.ticket.renew.window.factor
Login thread will sleep until the specified window factor of time from last refresh to
ticket's expiry has been reached, at which time it will try to renew the ticket.
Type: double
Default: 0.8
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.login.callback.handler.class
The fully qualified name of a SASL login callback handler class that implements the
AuthenticateCallbackHandler interface. For brokers, login callback handler config
must be prefixed with listener prefix and SASL mechanism name in lower-case. For
example, listener.name.sasl_ssl.scram-sha-256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHandler
Type: class
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
sasl.login.class
The fully qualified name of a class that implements the Login interface. For brokers,
login config must be prefixed with listener prefix and SASL mechanism name in
lower-case. For example, listener.name.sasl_ssl.scram-sha-
256.sasl.login.class=com.example.CustomScramLogin
Type: class
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
sasl.login.refresh.buffer.seconds
The amount of buffer time before credential expiration to maintain when refreshing a
credential, in seconds. If a refresh would otherwise occur closer to expiration than
the number of buffer seconds then the refresh will be moved up to maintain as much
of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a
default value of 300 (5 minutes) is used if no value is specified. This value and
sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the
remaining lifetime of a credential. Currently applies only to OAUTHBEARER.
Type: short
Default: 300
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.login.refresh.min.period.seconds
The desired minimum time for the login refresh thread to wait before refreshing a
credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default
value of 60 (1 minute) is used if no value is specified. This value and
sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the
remaining lifetime of a credential. Currently applies only to OAUTHBEARER.
Type: short
Default: 60
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.login.refresh.window.factor
Login refresh thread will sleep until the specified window factor relative to the
credential's lifetime has been reached, at which time it will try to refresh the
credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default
value of 0.8 (80%) is used if no value is specified. Currently applies only to
OAUTHBEARER.
Type: double
Default: 0.8
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.login.refresh.window.jitter
The maximum amount of random jitter relative to the credential's lifetime that is
added to the login refresh thread's sleep time. Legal values are between 0 and 0.25
(25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently
applies only to OAUTHBEARER.
Type: double
Default: 0.05
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.mechanism.inter.broker.protocol
Type: string
Default: GSSAPI
Valid Values:
Importance: medium
Update Mode: per-broker
sasl.server.callback.handler.class
The fully qualified name of a SASL server callback handler class that implements the
AuthenticateCallbackHandler interface. Server callback handlers must be prefixed
with listener prefix and SASL mechanism name in lower-case. For example,
listener.name.sasl_ssl.plain.sasl.server.callback.handler.class=com.example.CustomPlainCallbackHandler.
Type: class
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
security.inter.broker.protocol
Type: string
Default: PLAINTEXT
Valid Values:
Importance: medium
Update Mode: read-only
ssl.cipher.suites
Type: list
Default: ""
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.client.auth
Configures the Kafka broker to request client authentication. The following settings are
common: required (client authentication is required), requested (client authentication is
requested, but a client may still choose not to provide authentication information), and
none (no client authentication is needed).
Type: string
Default: none
Valid Values: [required, requested, none]
Importance: medium
Update Mode: per-broker
ssl.enabled.protocols
The list of protocols enabled for SSL connections. The default is 'TLSv1.2,TLSv1.3'
when running with Java 11 or newer, 'TLSv1.2' otherwise. With the default value for
Java 11, clients and servers will prefer TLSv1.3 if both support it and fallback to
TLSv1.2 otherwise (assuming both support at least TLSv1.2). This default should be
fine for most cases. Also see the config documentation for `ssl.protocol`.
Type: list
Default: TLSv1.2
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.key.password
The password of the private key in the key store file. This is optional for client.
Type: password
Default: null
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.keymanager.algorithm
The algorithm used by key manager factory for SSL connections. Default value is the
key manager factory algorithm configured for the Java Virtual Machine.
Type: string
Default: SunX509
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.keystore.location
The location of the key store file. This is optional for client and can be used for two-
way authentication for client.
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.keystore.password
The store password for the key store file. This is optional for client and only needed
if ssl.keystore.location is configured.
Type: password
Default: null
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.keystore.type
The file format of the key store file. This is optional for client.
Type: string
Default: JKS
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.protocol
The SSL protocol used to generate the SSLContext. The default is 'TLSv1.3' when
running with Java 11 or newer, 'TLSv1.2' otherwise. This value should be fine for
most use cases. Allowed values in recent JVMs are 'TLSv1.2' and 'TLSv1.3'. 'TLS',
'TLSv1.1', 'SSL', 'SSLv2' and 'SSLv3' may be supported in older JVMs, but their usage
is discouraged due to known security vulnerabilities. With the default value for this
config and 'ssl.enabled.protocols', clients will downgrade to 'TLSv1.2' if the server
does not support 'TLSv1.3'. If this config is set to 'TLSv1.2', clients will not use
'TLSv1.3' even if it is one of the values in ssl.enabled.protocols and the server only
supports 'TLSv1.3'.
Type: string
Default: TLSv1.2
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.provider
The name of the security provider used for SSL connections. Default value is the
default security provider of the JVM.
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.trustmanager.algorithm
The algorithm used by trust manager factory for SSL connections. Default value is
the trust manager factory algorithm configured for the Java Virtual Machine.
Type: string
Default: PKIX
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.truststore.location
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.truststore.password
The password for the trust store file. If a password is not set access to the truststore
is still available, but integrity checking is disabled.
Type: password
Default: null
Valid Values:
Importance: medium
Update Mode: per-broker
ssl.truststore.type
Type: string
Default: JKS
Valid Values:
Importance: medium
Update Mode: per-broker
zookeeper.clientCnxnSocket
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
zookeeper.ssl.client.enable
Set client to use TLS when connecting to ZooKeeper. An explicit value overrides any
value set via the zookeeper.client.secure system property (note the different name).
Defaults to false if neither is set; when true, zookeeper.clientCnxnSocket must be set
(typically to org.apache.zookeeper.ClientCnxnSocketNetty); other values to set may
include zookeeper.ssl.cipher.suites, zookeeper.ssl.crl.enable,
zookeeper.ssl.enabled.protocols, zookeeper.ssl.endpoint.identification.algorithm,
zookeeper.ssl.keystore.location, zookeeper.ssl.keystore.password,
zookeeper.ssl.keystore.type, zookeeper.ssl.ocsp.enable, zookeeper.ssl.protocol,
zookeeper.ssl.truststore.location, zookeeper.ssl.truststore.password,
zookeeper.ssl.truststore.type
Type: boolean
Default: false
Valid Values:
Importance: medium
Update Mode: read-only
zookeeper.ssl.keystore.location
Keystore location when using a client-side certificate with TLS connectivity to
ZooKeeper. Overrides any explicit value set via
the zookeeper.ssl.keyStore.location system property (note the camelCase).
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
zookeeper.ssl.keystore.password
Type: password
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
zookeeper.ssl.keystore.type
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
zookeeper.ssl.truststore.location
Truststore location when using TLS connectivity to ZooKeeper. Overrides any explicit
value set via the zookeeper.ssl.trustStore.location system property (note the
camelCase).
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
zookeeper.ssl.truststore.password
Type: password
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
zookeeper.ssl.truststore.type
Truststore type when using TLS connectivity to ZooKeeper. Overrides any explicit
value set via the zookeeper.ssl.trustStore.type system property (note the
camelCase). The default value of null means the type will be auto-detected based
on the filename extension of the truststore.
Type: string
Default: null
Valid Values:
Importance: medium
Update Mode: read-only
alter.config.policy.class.name
The alter configs policy class that should be used for validation. The class should
implement the org.apache.kafka.server.policy.AlterConfigPolicy interface.
Type: class
Default: null
Valid Values:
Importance: low
Update Mode: read-only
alter.log.dirs.replication.quota.window.num
The number of samples to retain in memory for alter log dirs replication quotas
Type: int
Default: 11
Valid Values: [1,...]
Importance: low
Update Mode: read-only
alter.log.dirs.replication.quota.window.size.seconds
The time span of each sample for alter log dirs replication quotas
Type: int
Default: 1
Valid Values: [1,...]
Importance: low
Update Mode: read-only
authorizer.class.name
Type: string
Default: ""
Valid Values:
Importance: low
Update Mode: read-only
client.quota.callback.class
Type: class
Default: null
Valid Values:
Importance: low
Update Mode: read-only
connection.failed.authentication.delay.ms
Connection close delay on failed authentication: this is the time (in milliseconds) by
which connection close will be delayed on authentication failure. This must be
configured to be less than connections.max.idle.ms to prevent connection timeout.
Type: int
Default: 100
Valid Values: [0,...]
Importance: low
Update Mode: read-only
create.topic.policy.class.name
The create topic policy class that should be used for validation. The class should
implement the org.apache.kafka.server.policy.CreateTopicPolicy interface.
Type: class
Default: null
Valid Values:
Importance: low
Update Mode: read-only
delegation.token.expiry.check.interval.ms
Type: long
Default: 3600000 (1 hour)
Valid Values: [1,...]
Importance: low
Update Mode: read-only
kafka.metrics.polling.interval.secs
Type: int
Default: 10
Valid Values: [1,...]
Importance: low
Update Mode: read-only
kafka.metrics.reporters
A list of classes to use as Yammer metrics custom reporters. The reporters should
implement kafka.metrics.KafkaMetricsReporter trait. If a client wants to expose
JMX operations on a custom reporter, the custom reporter needs to additionally
implement an MBean trait that
extends kafka.metrics.KafkaMetricsReporterMBean trait so that the registered MBean
is compliant with the standard MBean convention.
Type: list
Default: ""
Valid Values:
Importance: low
Update Mode: read-only
listener.security.protocol.map
Map between listener names and security protocols. This must be defined for the
same security protocol to be usable in more than one port or IP. For example,
internal and external traffic can be separated even if SSL is required for both.
Concretely, the user could define listeners with names INTERNAL and EXTERNAL
and this property as: `INTERNAL:SSL,EXTERNAL:SSL`. As shown, key and value are
separated by a colon and map entries are separated by commas. Each listener name
should only appear once in the map. Different security (SSL and SASL) settings can
be configured for each listener by adding a normalised prefix (the listener name is
lowercased) to the config name. For example, to set a different keystore for the
INTERNAL listener, a config with
name listener.name.internal.ssl.keystore.location would be set. If the config for
the listener name is not set, the config will fall back to the generic config
(i.e. ssl.keystore.location).
Type: string
Default: PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
Valid Values:
Importance: low
Update Mode: per-broker
log.message.downconversion.enable
Type: boolean
Default: true
Valid Values:
Importance: low
Update Mode: cluster-wide
metric.reporters
Type: list
Default: ""
Valid Values:
Importance: low
Update Mode: cluster-wide
metrics.num.samples
Type: int
Default: 2
Valid Values: [1,...]
Importance: low
Update Mode: read-only
metrics.recording.level
Type: string
Default: INFO
Valid Values:
Importance: low
Update Mode: read-only
metrics.sample.window.ms
Type: long
Default: 30000 (30 seconds)
Valid Values: [1,...]
Importance: low
Update Mode: read-only
password.encoder.cipher.algorithm
Type: string
Default: AES/CBC/PKCS5Padding
Valid Values:
Importance: low
Update Mode: read-only
password.encoder.iterations
Type: int
Default: 4096
Valid Values: [1024,...]
Importance: low
Update Mode: read-only
password.encoder.key.length
Type: int
Default: 128
Valid Values: [8,...]
Importance: low
Update Mode: read-only
password.encoder.keyfactory.algorithm
Type: string
Default: null
Valid Values:
Importance: low
Update Mode: read-only
quota.window.num
Type: int
Default: 11
Valid Values: [1,...]
Importance: low
Update Mode: read-only
quota.window.size.seconds
Type: int
Default: 1
Valid Values: [1,...]
Importance: low
Update Mode: read-only
replication.quota.window.num
Type: int
Default: 11
Valid Values: [1,...]
Importance: low
Update Mode: read-only
replication.quota.window.size.seconds
Type: int
Default: 1
Valid Values: [1,...]
Importance: low
Update Mode: read-only
security.providers
Type: string
Default: null
Valid Values:
Importance: low
Update Mode: read-only
ssl.endpoint.identification.algorithm
Type: string
Default: https
Valid Values:
Importance: low
Update Mode: per-broker
ssl.engine.factory.class
Type: class
Default: null
Valid Values:
Importance: low
Update Mode: per-broker
ssl.principal.mapping.rules
A list of rules for mapping from distinguished name from the client certificate to
short name. The rules are evaluated in order and the first rule that matches a
principal name is used to map it to a short name. Any later rules in the list are
ignored. By default, distinguished name of the X.500 certificate will be the principal.
For more details on the format please see security authorization and acls. Note that
this configuration is ignored if an extension of KafkaPrincipalBuilder is provided by
the principal.builder.class configuration.
Type: string
Default: DEFAULT
Valid Values:
Importance: low
Update Mode: read-only
ssl.secure.random.implementation
Type: string
Default: null
Valid Values:
Importance: low
Update Mode: per-broker
transaction.abort.timed.out.transaction.cleanup.interval.ms
The interval at which to rollback transactions that have timed out
Type: int
Default: 10000 (10 seconds)
Valid Values: [1,...]
Importance: low
Update Mode: read-only
transaction.remove.expired.transaction.cleanup.interval.ms
Type: int
Default: 3600000 (1 hour)
Valid Values: [1,...]
Importance: low
Update Mode: read-only
zookeeper.ssl.cipher.suites
Specifies the enabled cipher suites to be used in ZooKeeper TLS negotiation (csv).
Overrides any explicit value set via the zookeeper.ssl.ciphersuites system property
(note the single word "ciphersuites"). The default value of null means the list of
enabled cipher suites is determined by the Java runtime being used.
Type: list
Default: null
Valid Values:
Importance: low
Update Mode: read-only
zookeeper.ssl.crl.enable
Type: boolean
Default: false
Valid Values:
Importance: low
Update Mode: read-only
zookeeper.ssl.enabled.protocols
Specifies the enabled protocol(s) in ZooKeeper TLS negotiation (csv). Overrides any
explicit value set via the zookeeper.ssl.enabledProtocols system property (note the
camelCase). The default value of null means the enabled protocol will be the value
of the zookeeper.ssl.protocol configuration property.
Type: list
Default: null
Valid Values:
Importance: low
Update Mode: read-only
zookeeper.ssl.endpoint.identification.algorithm
Type: string
Default: HTTPS
Valid Values:
Importance: low
Update Mode: read-only
zookeeper.ssl.ocsp.enable
Specifies whether to enable Online Certificate Status Protocol in the ZooKeeper TLS
protocols. Overrides any explicit value set via the zookeeper.ssl.ocsp system
property (note the shorter name).
Type: boolean
Default: false
Valid Values:
Importance: low
Update Mode: read-only
zookeeper.ssl.protocol
Specifies the protocol to be used in ZooKeeper TLS negotiation. An explicit value
overrides any value set via the same-named zookeeper.ssl.protocol system
property.
Type: string
Default: TLSv1.2
Valid Values:
Importance: low
Update Mode: read-only
zookeeper.sync.time.ms
Type: int
Default: 2000 (2 seconds)
Valid Values:
Importance: low
Update Mode: read-only
From Kafka version 1.1 onwards, some of the broker configs can be updated without
restarting the broker. See the Dynamic Update Mode column in Broker Configs for the update
mode of each broker config.
To alter the current broker configs for broker id 0 (for example, the number of log cleaner
threads):
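A representative invocation, assuming the broker is reachable at localhost:9092 and an arbitrary new value of 2:
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --alter --add-config log.cleaner.threads=2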
To delete a config override and revert to the statically configured or default value for broker
id 0 (for example, the number of log cleaner threads):
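Again assuming a broker at localhost:9092:
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --alter --delete-config log.cleaner.threads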
All configs that are configurable at cluster level may also be configured at per-broker level
(e.g. for testing). If a config value is defined at different levels, the following order of
precedence is used:
Dynamic per-broker config stored in ZooKeeper
Dynamic cluster-wide default config stored in ZooKeeper
Static broker config from server.properties
Kafka default (see the broker configs above)
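To set a cluster-wide default rather than a per-broker override, kafka-configs.sh accepts --entity-default in place of --entity-name; a minimal sketch (the value shown is arbitrary):
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config log.cleaner.threads=2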
Password config values that are dynamically updated are encrypted before storing in
ZooKeeper. The broker config password.encoder.secret must be configured
in server.properties to enable dynamic update of password configs. The secret may be
different on different brokers.
The secret used for password encoding may be rotated with a rolling restart of brokers. The
old secret used for encoding passwords currently in ZooKeeper must be provided in the
static broker config password.encoder.old.secret and the new secret must be provided
in password.encoder.secret. All dynamic password configs stored in ZooKeeper will be re-
encoded with the new secret when the broker starts up.
In Kafka 1.1.x, all dynamically updated password configs must be provided in every alter
request when updating configs using kafka-configs.sh even if the password config is not
being altered. This constraint will be removed in a future release.
Brokers may be configured with SSL keystores with short validity periods to reduce the risk
of compromised certificates. Keystores may be updated dynamically without restarting the
broker. The config name must be prefixed with the listener prefix listener.name.
{listenerName}. so that only the keystore config of a specific listener is updated. The
following configs may be updated in a single alter request at per-broker level:
ssl.keystore.type
ssl.keystore.location
ssl.keystore.password
ssl.key.password
If the listener is the inter-broker listener, the update is allowed only if the new keystore is
trusted by the truststore configured for that listener. For other listeners, no trust validation
is performed on the keystore by the broker. Certificates must be signed by the same
certificate authority that signed the old certificate to avoid any client authentication failures.
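For example, the keystore of a listener named EXTERNAL on broker 0 might be updated in a single alter request as sketched below (the listener name, file path, and passwords are placeholders):
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-name 0 --alter --add-config 'listener.name.external.ssl.keystore.location=/path/to/new.keystore.jks,listener.name.external.ssl.keystore.password=keystore-password,listener.name.external.ssl.key.password=key-password'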
Truststores may also be updated dynamically without restarting the broker. As with
keystores, the following configs may be updated in a single alter request at per-broker level:
ssl.truststore.type
ssl.truststore.location
ssl.truststore.password
If the listener is the inter-broker listener, the update is allowed only if the existing keystore
for that listener is trusted by the new truststore. For other listeners, no trust validation is
performed by the broker before the update. Removal of CA certificates used to sign client
certificates from the new truststore can lead to client authentication failures.
Default topic configuration options used by brokers may be updated without broker restart.
The configs are applied to topics without a topic config override for the equivalent per-topic
config. One or more of these configs may be overridden at cluster-default level used by all
brokers.
log.segment.bytes
log.roll.ms
log.roll.hours
log.roll.jitter.ms
log.roll.jitter.hours
log.index.size.max.bytes
log.flush.interval.messages
log.flush.interval.ms
log.retention.bytes
log.retention.ms
log.retention.minutes
log.retention.hours
log.index.interval.bytes
log.cleaner.delete.retention.ms
log.cleaner.min.compaction.lag.ms
log.cleaner.max.compaction.lag.ms
log.cleaner.min.cleanable.ratio
log.cleanup.policy
log.segment.delete.delay.ms
unclean.leader.election.enable
min.insync.replicas
max.message.bytes
compression.type
log.preallocate
log.message.timestamp.type
log.message.timestamp.difference.max.ms
From Kafka version 2.0.0 onwards, unclean leader election is automatically enabled by the
controller when the config unclean.leader.election.enable is dynamically updated. In Kafka
version 1.1.x, changes to unclean.leader.election.enable take effect only when a new
controller is elected. Controller re-election may be forced by running:
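One way to force re-election is to delete the /controller znode in ZooKeeper; a minimal sketch, assuming the ZooKeeper ensemble is reachable at localhost:2181:
> bin/zookeeper-shell.sh localhost:2181 delete /controller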
Log cleaner configs may be updated dynamically at cluster-default level used by all brokers.
The changes take effect on the next iteration of log cleaning. One or more of these configs
may be updated:
log.cleaner.threads
log.cleaner.io.max.bytes.per.second
log.cleaner.dedupe.buffer.size
log.cleaner.io.buffer.size
log.cleaner.io.buffer.load.factor
log.cleaner.backoff.ms
The size of various thread pools used by the broker may be updated dynamically at cluster-
default level used by all brokers. Updates are restricted to the range currentSize /
2 to currentSize * 2 to ensure that config updates are handled gracefully (see the example
after this list).
num.network.threads
num.io.threads
num.replica.fetchers
num.recovery.threads.per.data.dir
log.cleaner.threads
background.threads
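For example, assuming a cluster whose brokers currently run 8 I/O threads, the cluster-wide default could be doubled as sketched below:
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --entity-default --alter --add-config num.io.threads=16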
The maximum number of connections allowed for a given IP/host by the broker may be
updated dynamically at cluster-default level used by all brokers. The changes will apply for
new connection creations and the existing connections count will be taken into account by
the new limits.
max.connections.per.ip
max.connections.per.ip.overrides
Listeners may be added or removed dynamically. When a new listener is added, security
configs of the listener must be provided as listener configs with the listener
prefix listener.name.{listenerName}.. If the new listener uses SASL, the JAAS configuration
of the listener must be provided using the JAAS configuration
property sasl.jaas.config with the listener and mechanism prefix. See JAAS configuration
for Kafka brokers for details.
In Kafka version 1.1.x, the listener used by the inter-broker listener may not be updated
dynamically. To update the inter-broker listener to a new listener, the new listener may be
added on all brokers without restarting the broker. A rolling restart is then required to
update inter.broker.listener.name.
In addition to all the security configs of new listeners, the following configs may be updated
dynamically at per-broker level:
listeners
advertised.listeners
listener.security.protocol.map
Configurations pertinent to topics have both a server default as well as an optional per-topic
override. If no per-topic configuration is given the server default is used. The override can be
set at topic creation time by giving one or more --config options. This example creates a
topic named my-topic with a custom max message size and flush rate:
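A minimal sketch of such a command, assuming a single-broker cluster at localhost:9092 (the partition and replication counts are placeholders):
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-topic --partitions 1 --replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1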
Overrides can also be changed or set later using the alter configs command. This example
updates the max message size for my-topic:
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config max.message.bytes=128000
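Overrides can later be inspected or removed with the same tool, for example:
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --describe
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --delete-config max.message.bytes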
The following are the topic-level configurations. The server's default configuration for this
property is given under the Server Default Property heading. A given server default config
value only applies to a topic if it does not have an explicit topic config override.
cleanup.policy
A string that is either "delete" or "compact" or both. This string designates the
retention policy to use on old log segments. The default policy ("delete") will discard
old segments when their retention time or size limit has been reached. The
"compact" setting will enable log compaction on the topic.
Type: list
Default: delete
Valid Values: [compact, delete]
Server Default Property: log.cleanup.policy
Importance: medium
compression.type
Specify the final compression type for a given topic. This configuration accepts the
standard compression codecs ('gzip', 'snappy', 'lz4', 'zstd'). It additionally accepts
'uncompressed' which is equivalent to no compression; and 'producer' which means
retain the original compression codec set by the producer.
Type: string
Default: producer
Valid Values: [uncompressed, zstd, lz4,
snappy, gzip, producer]
Server Default Property: compression.type
Importance: medium
delete.retention.ms
Type: long
Default: 86400000 (1 day)
Valid Values: [0,...]
Server Default Property: log.cleaner.delete.retention.ms
Importance: medium
file.delete.delay.ms
Type: long
Default: 60000 (1 minute)
Valid Values: [0,...]
Server Default Property: log.segment.delete.delay.ms
Importance: medium
flush.messages
This setting allows specifying an interval at which we will force an fsync of data
written to the log. For example if this was set to 1 we would fsync after every
message; if it were 5 we would fsync after every five messages. In general we
recommend you not set this and use replication for durability and allow the operating
system's background flush capabilities as it is more efficient. This setting can be
overridden on a per-topic basis (see the per-topic configuration section).
Type: long
Default: 9223372036854775807
Valid Values: [0,...]
Server Default Property: log.flush.interval.messages
Importance: medium
flush.ms
This setting allows specifying a time interval at which we will force an fsync of data
written to the log. For example if this was set to 1000 we would fsync after 1000 ms
had passed. In general we recommend you not set this and use replication for
durability and allow the operating system's background flush capabilities as it is
more efficient.
Type: long
Default: 9223372036854775807
Valid Values: [0,...]
Server Default Property: log.flush.interval.ms
Importance: medium
follower.replication.throttled.replicas
A list of replicas for which log replication should be throttled on the follower side.
The list should describe a set of replicas in the form [PartitionId]:[BrokerId],
[PartitionId]:[BrokerId]:... or alternatively the wildcard '*' can be used to throttle all
replicas for this topic.
Type: list
Default: ""
Valid Values: [partitionId]:[brokerId],[partitionId]:[brokerId],...
Server Default Property: follower.replication.throttled.replicas
Importance: medium
index.interval.bytes
This setting controls how frequently Kafka adds an index entry to its offset index.
The default setting ensures that we index a message roughly every 4096 bytes. More
indexing allows reads to jump closer to the exact position in the log but makes the
index larger. You probably don't need to change this.
Type: int
Default: 4096 (4 kibibytes)
Valid Values: [0,...]
Server Default Property: log.index.interval.bytes
Importance: medium
leader.replication.throttled.replicas
A list of replicas for which log replication should be throttled on the leader side. The
list should describe a set of replicas in the form [PartitionId]:[BrokerId],[PartitionId]:
[BrokerId]:... or alternatively the wildcard '*' can be used to throttle all replicas for this
topic.
Type: list
Default: ""
Valid Values: [partitionId]:[brokerId],[partitionId]:[brokerId],...
Server Default Property: leader.replication.throttled.replicas
Importance: medium
max.compaction.lag.ms
The maximum time a message will remain ineligible for compaction in the log. Only
applicable for logs that are being compacted.
Type: long
Default: 9223372036854775807
Valid Values: [1,...]
Server Default Property: log.cleaner.max.compaction.lag.ms
Importance: medium
max.message.bytes
The largest record batch size allowed by Kafka (after compression if compression is
enabled). If this is increased and there are consumers older than 0.10.2, the
consumers' fetch size must also be increased so that they can fetch record batches
this large. In the latest message format version, records are always grouped into
batches for efficiency. In previous message format versions, uncompressed records
are not grouped into batches and this limit only applies to a single record in that
case.
Type: int
Default: 1048588
Valid Values: [0,...]
Server Default Property: message.max.bytes
Importance: medium
message.format.version
Specify the message format version the broker will use to append messages to the
logs. The value should be a valid ApiVersion. Some examples are: 0.8.2, 0.9.0.0,
0.10.0, check ApiVersion for more details. By setting a particular message format
version, the user is certifying that all the existing messages on disk are smaller or
equal than the specified version. Setting this value incorrectly will cause consumers
with older versions to break as they will receive messages with a format that they
don't understand.
Type: string
Default: 2.6-IV0
Valid Values: [0.8.0, 0.8.1, 0.8.2, 0.9.0, 0.10.0-IV0, 0.10.0-IV1, 0.10.1-IV0, 0.10.1-IV1, 0.10.1-IV2, 0.10.2-IV0, 0.11.0-IV0, 0.11.0-IV1, 0.11.0-IV2, 1.0-IV0, 1.1-IV0, 2.0-IV0, 2.0-IV1, 2.1-IV0, 2.1-IV1, 2.1-IV2, 2.2-IV0, 2.2-IV1, 2.3-IV0, 2.3-IV1, 2.4-IV0, 2.4-IV1, 2.5-IV0, 2.6-IV0]
Server Default Property: log.message.format.version
Importance: medium
message.timestamp.difference.max.ms
The maximum difference allowed between the timestamp when a broker receives a
message and the timestamp specified in the message. If
message.timestamp.type=CreateTime, a message will be rejected if the difference in
timestamp exceeds this threshold. This configuration is ignored if
message.timestamp.type=LogAppendTime.
Type: long
Default: 9223372036854775807
Valid Values: [0,...]
Server Default Property: log.message.timestamp.difference.max.ms
Importance: medium
message.timestamp.type
Define whether the timestamp in the message is message create time or log append
time. The value should be either `CreateTime` or `LogAppendTime`
Type: string
Default: CreateTime
Valid Values: [CreateTime, LogAppendTime]
Server Default Property: log.message.timestamp.type
Importance: medium
min.cleanable.dirty.ratio
This configuration controls how frequently the log compactor will attempt to clean
the log (assuming log compaction is enabled). By default we will avoid cleaning a log
where more than 50% of the log has been compacted. This ratio bounds the
maximum space wasted in the log by duplicates (at 50% at most 50% of the log
could be duplicates). A higher ratio will mean fewer, more efficient cleanings but will
mean more wasted space in the log. If the max.compaction.lag.ms or the
min.compaction.lag.ms configurations are also specified, then the log compactor
considers the log to be eligible for compaction as soon as either: (i) the dirty ratio
threshold has been met and the log has had dirty (uncompacted) records for at least
the min.compaction.lag.ms duration, or (ii) if the log has had dirty (uncompacted)
records for at most the max.compaction.lag.ms period.
Type: double
Default: 0.5
Valid Values: [0,...,1]
Server Default Property: log.cleaner.min.cleanable.ratio
Importance: medium
min.compaction.lag.ms
The minimum time a message will remain uncompacted in the log. Only applicable
for logs that are being compacted.
Type: long
Default: 0
Valid Values: [0,...]
Server Default Property: log.cleaner.min.compaction.lag.ms
Importance: medium
min.insync.replicas
When a producer sets acks to "all" (or "-1"), this configuration specifies the minimum
number of replicas that must acknowledge a write for the write to be considered
successful. If this minimum cannot be met, then the producer will raise an exception
(either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
When used together, min.insync.replicas and acks allow you to enforce greater
durability guarantees. A typical scenario would be to create a topic with a replication
factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This will
ensure that the producer raises an exception if a majority of replicas do not receive a
write.
Type: int
Default: 1
Valid Values: [1,...]
Server Default Property: min.insync.replicas
Importance: medium
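For example, the scenario above could be applied to an existing topic by raising its override (the topic name is a placeholder):
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config min.insync.replicas=2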
preallocate
True if we should preallocate the file on disk when creating a new log segment.
Type: boolean
Default: false
Valid Values:
Server Default Property: log.preallocate
Importance: medium
retention.bytes
This configuration controls the maximum size a partition (which consists of log
segments) can grow to before we will discard old log segments to free up space if
we are using the "delete" retention policy. By default there is no size limit, only a time
limit. Since this limit is enforced at the partition level, multiply it by the number of
partitions to compute the topic retention in bytes.
Type: long
Default: -1
Valid Values:
Server Default Property: log.retention.bytes
Importance: medium
retention.ms
This configuration controls the maximum time we will retain a log before we will
discard old log segments to free up space if we are using the "delete" retention
policy. This represents an SLA on how soon consumers must read their data. If set
to -1, no time limit is applied.
Type: long
Default: 604800000 (7 days)
Valid Values: [-1,...]
Server Default Property: log.retention.ms
Importance: medium
segment.bytes
This configuration controls the segment file size for the log. Retention and cleaning
is always done a file at a time so a larger segment size means fewer files but less
granular control over retention.
Type: int
Default: 1073741824 (1 gibibyte)
Valid Values: [14,...]
Server Default Property: log.segment.bytes
Importance: medium
segment.index.bytes
This configuration controls the size of the index that maps offsets to file positions.
We preallocate this index file and shrink it only after log rolls. You generally should
not need to change this setting.
Type: int
Default: 10485760 (10 mebibytes)
Valid Values: [0,...]
Server Default Property: log.index.size.max.bytes
Importance: medium
segment.jitter.ms
The maximum random jitter subtracted from the scheduled segment roll time to
avoid thundering herds of segment rolling
Type: long
Default: 0
Valid Values: [0,...]
Server Default Property: log.roll.jitter.ms
Importance: medium
segment.ms
This configuration controls the period of time after which Kafka will force the log to
roll even if the segment file isn't full to ensure that retention can delete or compact
old data.
Type: long
Default: 604800000 (7 days)
Valid Values: [1,...]
Server Default Property: log.roll.ms
Importance: medium
unclean.leader.election.enable
Indicates whether to enable replicas not in the ISR set to be elected as leader as a
last resort, even though doing so may result in data loss.
Type: boolean
Default: false
Valid Values:
Server Default Property: unclean.leader.election.enable
Importance: medium
message.downconversion.enable
Type: boolean
Default: true
Valid Values:
Server Default Property: log.message.downconversion.enable
Importance: low
3.3 Producer Configs
key.serializer
Type: class
Default:
Valid Values:
Importance: high
value.serializer
Type: class
Default:
Valid Values:
Importance: high
acks
The number of acknowledgments the producer requires the leader to have received
before considering a request complete. This controls the durability of records that
are sent. The following settings are allowed:
o acks=0 If set to zero then the producer will not wait for any acknowledgment
from the server at all. The record will be immediately added to the socket
buffer and considered sent. No guarantee can be made that the server has
received the record in this case, and the retries configuration will not take
effect (as the client won't generally know of any failures). The offset given
back for each record will always be set to -1.
o acks=1 This will mean the leader will write the record to its local log but will
respond without awaiting full acknowledgement from all followers. In this
case should the leader fail immediately after acknowledging the record but
before the followers have replicated it then the record will be lost.
o acks=all This means the leader will wait for the full set of in-sync replicas to
acknowledge the record. This guarantees that the record will not be lost as
long as at least one in-sync replica remains alive. This is the strongest
available guarantee. This is equivalent to the acks=-1 setting.
Type: string
Default: 1
Valid Values: [all, -1, 0, 1]
Importance: high
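A quick way to experiment with this setting is the console producer that ships with Kafka, which can pass producer configs on the command line (the topic name and broker address are placeholders):
> bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my-topic --producer-property acks=all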
bootstrap.servers
A list of host/port pairs to use for establishing the initial connection to the Kafka
cluster. The client will make use of all servers irrespective of which servers are
specified here for bootstrapping—this list only impacts the initial hosts used to
discover the full set of servers. This list should be in the
form host1:port1,host2:port2,.... Since these servers are just used for the initial
connection to discover the full cluster membership (which may change dynamically),
this list need not contain the full set of servers (you may want more than one,
though, in case a server is down).
Type: list
Default: ""
Valid Values: non-null string
Importance: high
buffer.memory
The total bytes of memory the producer can use to buffer records waiting to be sent
to the server. If records are sent faster than they can be delivered to the server the
producer will block for max.block.ms after which it will throw an exception.
This setting should correspond roughly to the total memory the producer will use,
but is not a hard bound since not all memory the producer uses is used for buffering.
Some additional memory will be used for compression (if compression is enabled)
as well as for maintaining in-flight requests.
Type: long
Default: 33554432
Valid Values: [0,...]
Importance: high
compression.type
The compression type for all data generated by the producer. The default is none
(i.e. no compression). Valid values are none, gzip, snappy, lz4, or zstd. Compression
is of full batches of data, so the efficacy of batching will also impact the
compression ratio (more batching means better compression).
Type: string
Default: none
Valid Values:
Importance: high
retries
Setting a value greater than zero will cause the client to resend any record whose
send fails with a potentially transient error. Note that this retry is no different than if
the client resent the record upon receiving the error. Allowing retries without
setting max.in.flight.requests.per.connection to 1 will potentially change the
ordering of records because if two batches are sent to a single partition, and the first
fails and is retried but the second succeeds, then the records in the second batch
may appear first. Note additionally that produce requests will be failed before the
number of retries has been exhausted if the timeout configured
by delivery.timeout.ms expires first before successful acknowledgement. Users
should generally prefer to leave this config unset and instead
use delivery.timeout.ms to control retry behavior.
Type: int
Default: 2147483647
Valid Values: [0,...,2147483647]
Importance: high
ssl.key.password
The password of the private key in the key store file. This is optional for client.
Type: password
Default: null
Valid Values:
Importance: high
ssl.keystore.location
The location of the key store file. This is optional for client and can be used for two-
way authentication for client.
Type: string
Default: null
Valid Values:
Importance: high
ssl.keystore.password
The store password for the key store file. This is optional for client and only needed
if ssl.keystore.location is configured.
Type: password
Default: null
Valid Values:
Importance: high
ssl.truststore.location
Type: string
Default: null
Valid Values:
Importance: high
ssl.truststore.password
The password for the trust store file. If a password is not set access to the truststore
is still available, but integrity checking is disabled.
Type: password
Default: null
Valid Values:
Importance: high
batch.size
The producer will attempt to batch records together into fewer requests whenever
multiple records are being sent to the same partition. This helps performance on
both the client and the server. This configuration controls the default batch size in
bytes.
Requests sent to brokers will contain multiple batches, one for each partition with
data available to be sent.
A small batch size will make batching less common and may reduce throughput (a
batch size of zero will disable batching entirely). A very large batch size may use
memory a bit more wastefully as we will always allocate a buffer of the specified
batch size in anticipation of additional records.
Type: int
Default: 16384
Valid Values: [0,...]
Importance: medium
client.dns.lookup
Controls how the client uses DNS lookups. If set to use_all_dns_ips, connect to each
returned IP address in sequence until a successful connection is established. After a
disconnection, the next IP is used. Once all IPs have been used once, the client
resolves the IP(s) from the hostname again (both the JVM and the OS cache DNS
name lookups, however). If set to resolve_canonical_bootstrap_servers_only ,
resolve each bootstrap address into a list of canonical names. After the bootstrap
phase, this behaves the same as use_all_dns_ips. If set to default (deprecated),
attempt to connect to the first IP address returned by the lookup, even if the lookup
returns multiple IP addresses.
Type: string
Default: use_all_dns_ips
Valid Values: [default, use_all_dns_ips, resolve_canonical_bootstrap_servers_only]
Importance: medium
client.id
An id string to pass to the server when making requests. The purpose of this is to be
able to track the source of requests beyond just ip/port by allowing a logical
application name to be included in server-side request logging.
Type: string
Default: ""
Valid Values:
Importance: medium
connections.max.idle.ms
Close idle connections after the number of milliseconds specified by this config.
Type: long
Default: 540000 (9 minutes)
Valid Values:
Importance: medium
delivery.timeout.ms
An upper bound on the time to report success or failure after a call to send() returns.
This limits the total time that a record will be delayed prior to sending, the time to
await acknowledgement from the broker (if expected), and the time allowed for
retriable send failures. The producer may report failure to send a record earlier than
this config if either an unrecoverable error is encountered, the retries have been
exhausted, or the record is added to a batch which reached an earlier delivery
expiration deadline. The value of this config should be greater than or equal to the
sum of request.timeout.ms and linger.ms.
Type: int
Default: 120000 (2 minutes)
Valid Values: [0,...]
Importance: medium
linger.ms
The producer groups together any records that arrive in between request
transmissions into a single batched request. Normally this occurs only under load
when records arrive faster than they can be sent out. However in some
circumstances the client may want to reduce the number of requests even under
moderate load. This setting accomplishes this by adding a small amount of artificial
delay—that is, rather than immediately sending out a record the producer will wait for
up to the given delay to allow other records to be sent so that the sends can be
batched together. This can be thought of as analogous to Nagle's algorithm in TCP.
This setting gives the upper bound on the delay for batching: once we
get batch.size worth of records for a partition it will be sent immediately regardless
of this setting, however if we have fewer than this many bytes accumulated for this
partition we will 'linger' for the specified time waiting for more records to show up.
This setting defaults to 0 (i.e. no delay). Setting linger.ms=5, for example, would
have the effect of reducing the number of requests sent but would add up to 5ms of
latency to records sent in the absence of load.
Type: long
Default: 0
Valid Values: [0,...]
Importance: medium
max.block.ms
Type: long
Default: 60000 (1 minute)
Valid Values: [0,...]
Importance: medium
max.request.size
The maximum size of a request in bytes. This setting will limit the number of record
batches the producer will send in a single request to avoid sending huge requests.
This is also effectively a cap on the maximum uncompressed record batch size.
Note that the server has its own cap on the record batch size (after compression if
compression is enabled) which may be different from this.
Type: int
Default: 1048576
Valid Values: [0,...]
Importance: medium
receive.buffer.bytes
The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the
value is -1, the OS default will be used.
Type: int
Default: 32768 (32 kibibytes)
Valid Values: [-1,...]
Importance: medium
request.timeout.ms
The configuration controls the maximum amount of time the client will wait for the
response of a request. If the response is not received before the timeout elapses the
client will resend the request if necessary or fail the request if retries are exhausted.
This should be larger than replica.lag.time.max.ms (a broker configuration) to
reduce the possibility of message duplication due to unnecessary producer retries.
Type: int
Default: 30000 (30 seconds)
Valid Values: [0,...]
Importance: medium
sasl.client.callback.handler.class
The fully qualified name of a SASL client callback handler class that implements the
AuthenticateCallbackHandler interface.
Type: class
Default: null
Valid Values:
Importance: medium
sasl.jaas.config
JAAS login context parameters for SASL connections in the format used by JAAS
configuration files. JAAS configuration file format is described here. The format for
the value is: 'loginModuleClass controlFlag (optionName=optionValue)*; '. For
brokers, the config must be prefixed with listener prefix and SASL mechanism name
in lower-case. For example, listener.name.sasl_ssl.scram-sha-
256.sasl.jaas.config=com.example.ScramLoginModule required;
Type: password
Default: null
Valid Values:
Importance: medium
sasl.kerberos.service.name
The Kerberos principal name that Kafka runs as. This can be defined either in Kafka's
JAAS config or in Kafka's config.
Type: string
Default: null
Valid Values:
Importance: medium
sasl.login.callback.handler.class
The fully qualified name of a SASL login callback handler class that implements the
AuthenticateCallbackHandler interface. For brokers, login callback handler config
must be prefixed with listener prefix and SASL mechanism name in lower-case. For
example, listener.name.sasl_ssl.scram-sha-256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHandler
Type: class
Default: null
Valid Values:
Importance: medium
sasl.login.class
The fully qualified name of a class that implements the Login interface. For brokers,
login config must be prefixed with listener prefix and SASL mechanism name in
lower-case. For example, listener.name.sasl_ssl.scram-sha-
256.sasl.login.class=com.example.CustomScramLogin
Type: class
Default: null
Valid Values:
Importance: medium
sasl.mechanism
SASL mechanism used for client connections. This may be any mechanism for
which a security provider is available. GSSAPI is the default mechanism.
Type: string
Default: GSSAPI
Valid Values:
Importance: medium
security.protocol
Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL,
SASL_PLAINTEXT, SASL_SSL.
Type: string
Default: PLAINTEXT
Valid Values:
Importance: medium
send.buffer.bytes
The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value
is -1, the OS default will be used.
Type: int
Default: 131072 (128 kibibytes)
Valid Values: [-1,...]
Importance: medium
ssl.enabled.protocols
The list of protocols enabled for SSL connections. The default is 'TLSv1.2,TLSv1.3'
when running with Java 11 or newer, 'TLSv1.2' otherwise. With the default value for
Java 11, clients and servers will prefer TLSv1.3 if both support it and fallback to
TLSv1.2 otherwise (assuming both support at least TLSv1.2). This default should be
fine for most cases. Also see the config documentation for `ssl.protocol`.
Type: list
Default: TLSv1.2
Valid Values:
Importance: medium
ssl.keystore.type
The file format of the key store file. This is optional for client.
Type: string
Default: JKS
Valid Values:
Importance: medium
ssl.protocol
The SSL protocol used to generate the SSLContext. The default is 'TLSv1.3' when
running with Java 11 or newer, 'TLSv1.2' otherwise. This value should be fine for
most use cases. Allowed values in recent JVMs are 'TLSv1.2' and 'TLSv1.3'. 'TLS',
'TLSv1.1', 'SSL', 'SSLv2' and 'SSLv3' may be supported in older JVMs, but their usage
is discouraged due to known security vulnerabilities. With the default value for this
config and 'ssl.enabled.protocols', clients will downgrade to 'TLSv1.2' if the server
does not support 'TLSv1.3'. If this config is set to 'TLSv1.2', clients will not use
'TLSv1.3' even if it is one of the values in ssl.enabled.protocols and the server only
supports 'TLSv1.3'.
Type: string
Default: TLSv1.2
Valid Values:
Importance: medium
ssl.provider
The name of the security provider used for SSL connections. Default value is the
default security provider of the JVM.
Type: string
Default: null
Valid Values:
Importance: medium
ssl.truststore.type
The file format of the trust store file.
Type: string
Default: JKS
Valid Values:
Importance: medium
enable.idempotence
When set to 'true', the producer will ensure that exactly one copy of each message is
written in the stream. If 'false', producer retries due to broker failures, etc., may write
duplicates of the retried message in the stream. Note that enabling idempotence
requires max.in.flight.requests.per.connection to be less than or equal to
5, retries to be greater than 0 and acks must be 'all'. If these values are not explicitly
set by the user, suitable values will be chosen. If incompatible values are set,
a ConfigException will be thrown.
Type: boolean
Default: false
Valid Values:
Importance: low
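As a sketch, an idempotent producer only needs the flag itself, provided acks, retries, and max.in.flight.requests.per.connection are left at compatible values (server and topic names below are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());
    // Implies acks=all and retries > 0; max.in.flight.requests.per.connection must stay <= 5
    props.put("enable.idempotence", "true");

    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    producer.send(new ProducerRecord<>("example-topic", "key", "value"));
    producer.close();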
interceptor.classes
Type: list
Default: ""
Valid Values: non-null string
Importance: low
max.in.flight.requests.per.connection
The maximum number of unacknowledged requests the client will send on a single
connection before blocking. Note that if this setting is set to be greater than 1 and
there are failed sends, there is a risk of message re-ordering due to retries (i.e., if
retries are enabled).
Type: int
Default: 5
Valid Values: [1,...]
Importance: low
metadata.max.age.ms
The period of time in milliseconds after which we force a refresh of metadata even if
we haven't seen any partition leadership changes to proactively discover any new
brokers or partitions.
Type: long
Default: 300000 (5 minutes)
Valid Values: [0,...]
Importance: low
metadata.max.idle.ms
Controls how long the producer will cache metadata for a topic that's idle. If the
elapsed time since a topic was last produced to exceeds the metadata idle duration,
then the topic's metadata is forgotten and the next access to it will force a metadata
fetch request.
Type: long
Default: 300000 (5 minutes)
Valid Values: [5000,...]
Importance: low
metric.reporters
Type: list
Default: ""
Valid Values: non-null string
Importance: low
metrics.num.samples
The number of samples maintained to compute metrics.
Type: int
Default: 2
Valid Values: [1,...]
Importance: low
metrics.recording.level
Type: string
Default: INFO
Valid Values: [INFO, DEBUG]
Importance: low
metrics.sample.window.ms
Type: long
Default: 30000 (30 seconds)
Valid Values: [0,...]
Importance: low
reconnect.backoff.max.ms
Type: long
Default: 1000 (1 second)
Valid Values: [0,...]
Importance: low
reconnect.backoff.ms
The base amount of time to wait before attempting to reconnect to a given host.
This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all
connection attempts by the client to a broker.
Type: long
Default: 50
Valid Values: [0,...]
Importance: low
retry.backoff.ms
The amount of time to wait before attempting to retry a failed request to a given
topic partition. This avoids repeatedly sending requests in a tight loop under some
failure scenarios.
Type: long
Default: 100
Valid Values: [0,...]
Importance: low
sasl.kerberos.kinit.cmd
Type: string
Default: /usr/bin/kinit
Valid Values:
Importance: low
sasl.kerberos.min.time.before.relogin
Type: long
Default: 60000
Valid Values:
Importance: low
sasl.kerberos.ticket.renew.jitter
Type: double
Default: 0.05
Valid Values:
Importance: low
sasl.kerberos.ticket.renew.window.factor
Login thread will sleep until the specified window factor of time from last refresh to
ticket's expiry has been reached, at which time it will try to renew the ticket.
Type: double
Default: 0.8
Valid Values:
Importance: low
sasl.login.refresh.buffer.seconds
The amount of buffer time before credential expiration to maintain when refreshing a
credential, in seconds. If a refresh would otherwise occur closer to expiration than
the number of buffer seconds then the refresh will be moved up to maintain as much
of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a
default value of 300 (5 minutes) is used if no value is specified. This value and
sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the
remaining lifetime of a credential. Currently applies only to OAUTHBEARER.
Type: short
Default: 300
Valid Values: [0,...,3600]
Importance: low
sasl.login.refresh.min.period.seconds
The desired minimum time for the login refresh thread to wait before refreshing a
credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default
value of 60 (1 minute) is used if no value is specified. This value and
sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the
remaining lifetime of a credential. Currently applies only to OAUTHBEARER.
Type: short
Default: 60
Valid Values: [0,...,900]
Importance: low
sasl.login.refresh.window.factor
Login refresh thread will sleep until the specified window factor relative to the
credential's lifetime has been reached, at which time it will try to refresh the
credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default
value of 0.8 (80%) is used if no value is specified. Currently applies only to
OAUTHBEARER.
Type: double
Default: 0.8
Valid Values: [0.5,...,1.0]
Importance: low
sasl.login.refresh.window.jitter
The maximum amount of random jitter relative to the credential's lifetime that is
added to the login refresh thread's sleep time. Legal values are between 0 and 0.25
(25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently
applies only to OAUTHBEARER.
Type: double
Default: 0.05
Valid Values: [0.0,...,0.25]
Importance: low
security.providers
Type: string
Default: null
Valid Values:
Importance: low
ssl.cipher.suites
Type: list
Default: null
Valid Values:
Importance: low
ssl.endpoint.identification.algorithm
Type: string
Default: https
Valid Values:
Importance: low
ssl.engine.factory.class
Type: class
Default: null
Valid Values:
Importance: low
ssl.keymanager.algorithm
The algorithm used by key manager factory for SSL connections. Default value is the
key manager factory algorithm configured for the Java Virtual Machine.
Type: string
Default: SunX509
Valid Values:
Importance: low
ssl.secure.random.implementation
The SecureRandom PRNG implementation to use for SSL cryptography operations.
Type: string
Default: null
Valid Values:
Importance: low
ssl.trustmanager.algorithm
The algorithm used by trust manager factory for SSL connections. Default value is
the trust manager factory algorithm configured for the Java Virtual Machine.
Type: string
Default: PKIX
Valid Values:
Importance: low
transaction.timeout.ms
The maximum amount of time in ms that the transaction coordinator will wait for a
transaction status update from the producer before proactively aborting the ongoing
transaction. If this value is larger than the transaction.max.timeout.ms setting in the
broker, the request will fail with an InvalidTransactionTimeout error.
Type: int
Default: 60000 (1 minute)
Valid Values:
Importance: low
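This setting only matters for a transactional producer, i.e. one configured with a transactional.id. A minimal sketch (the id, topic, and timeout values are illustrative):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());
    props.put("transactional.id", "example-app-txn-1");  // placeholder id, unique per producer instance
    props.put("transaction.timeout.ms", "60000");         // must not exceed the broker's transaction.max.timeout.ms

    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    producer.initTransactions();
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("example-topic", "key", "value"));
    producer.commitTransaction();
    producer.close();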
transactional.id
The TransactionalId to use for transactional delivery. This enables reliability
semantics which span multiple producer sessions since it allows the client to
guarantee that transactions using the same TransactionalId have been completed prior
to starting any new transactions. If no TransactionalId is provided, then the producer
is limited to idempotent delivery. Note that enable.idempotence must be enabled if a
TransactionalId is configured.
Type: string
Default: null
Valid Values: non-empty string
Importance: low
3.4 Consumer Configs
key.deserializer
Deserializer class for key that implements
the org.apache.kafka.common.serialization.Deserializer interface.
Type: class
Default:
Valid Values:
Importance: high
value.deserializer
Deserializer class for value that implements
the org.apache.kafka.common.serialization.Deserializer interface.
Type: class
Default:
Valid Values:
Importance: high
bootstrap.servers
A list of host/port pairs to use for establishing the initial connection to the Kafka
cluster. The client will make use of all servers irrespective of which servers are
specified here for bootstrapping—this list only impacts the initial hosts used to
discover the full set of servers. This list should be in the
form host1:port1,host2:port2,.... Since these servers are just used for the initial
connection to discover the full cluster membership (which may change dynamically),
this list need not contain the full set of servers (you may want more than one,
though, in case a server is down).
Type: list
Default: ""
Valid Values: non-null string
Importance: high
fetch.min.bytes
The minimum amount of data the server should return for a fetch request. If
insufficient data is available the request will wait for that much data to accumulate
before answering the request. The default setting of 1 byte means that fetch
requests are answered as soon as a single byte of data is available or the fetch
request times out waiting for data to arrive. Setting this to something greater than 1
will cause the server to wait for larger amounts of data to accumulate which can
improve server throughput a bit at the cost of some additional latency.
Type: int
Default: 1
Valid Values: [0,...]
Importance: high
group.id
A unique string that identifies the consumer group this consumer belongs to. This
property is required if the consumer uses either the group management functionality
by using subscribe(topic) or the Kafka-based offset management strategy.
Type: string
Default: null
Valid Values:
Importance: high
heartbeat.interval.ms
The expected time between heartbeats to the consumer coordinator when using
Kafka's group management facilities. Heartbeats are used to ensure that the
consumer's session stays active and to facilitate rebalancing when new consumers
join or leave the group. The value must be set lower than session.timeout.ms, but
typically should be set no higher than 1/3 of that value. It can be adjusted even lower
to control the expected time for normal rebalances.
Type: int
Default: 3000 (3 seconds)
Valid Values:
Importance: high
max.partition.fetch.bytes
The maximum amount of data per-partition the server will return. Records are
fetched in batches by the consumer. If the first record batch in the first non-empty
partition of the fetch is larger than this limit, the batch will still be returned to ensure
that the consumer can make progress. The maximum record batch size accepted by
the broker is defined via message.max.bytes (broker config)
or max.message.bytes (topic config). See fetch.max.bytes for limiting the consumer
request size.
Type: int
Default: 1048576 (1 mebibyte)
Valid Values: [0,...]
Importance: high
session.timeout.ms
The timeout used to detect client failures when using Kafka's group management
facility. The client sends periodic heartbeats to indicate its liveness to the broker. If
no heartbeats are received by the broker before the expiration of this session
timeout, then the broker will remove this client from the group and initiate a
rebalance. Note that the value must be in the allowable range as configured in the
broker configuration
by group.min.session.timeout.ms and group.max.session.timeout.ms.
Type: int
Default: 10000 (10 seconds)
Valid Values:
Importance: high
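The two settings are typically tuned together, keeping heartbeat.interval.ms at roughly one third of session.timeout.ms. A sketch with illustrative values (they must stay within the broker's group.min.session.timeout.ms and group.max.session.timeout.ms range):

    import java.util.Properties;

    Properties props = new Properties();
    props.put("session.timeout.ms", "30000");     // consumer is evicted after 30 s without heartbeats
    props.put("heartbeat.interval.ms", "10000");  // send a heartbeat roughly every 10 s, about 1/3 of the session timeout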
ssl.key.password
The password of the private key in the key store file. This is optional for client.
Type: password
Default: null
Valid Values:
Importance: high
ssl.keystore.location
The location of the key store file. This is optional for client and can be used for two-
way authentication for client.
Type: string
Default: null
Valid Values:
Importance: high
ssl.keystore.password
The store password for the key store file. This is optional for client and only needed
if ssl.keystore.location is configured.
Type: password
Default: null
Valid Values:
Importance: high
ssl.truststore.location
Type: string
Default: null
Valid Values:
Importance: high
ssl.truststore.password
The password for the trust store file. If a password is not set access to the truststore
is still available, but integrity checking is disabled.
Type: password
Default: null
Valid Values:
Importance: high
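Taken together, a TLS client normally configures a truststore to verify the broker and, only when the broker requires client authentication, a keystore as well. A sketch with placeholder paths and passwords:

    import java.util.Properties;

    Properties props = new Properties();
    props.put("security.protocol", "SSL");
    props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");  // placeholder path
    props.put("ssl.truststore.password", "truststore-secret");                 // placeholder
    // Only needed if the broker requests client authentication (two-way TLS):
    props.put("ssl.keystore.location", "/etc/kafka/client.keystore.jks");      // placeholder path
    props.put("ssl.keystore.password", "keystore-secret");                     // placeholder
    props.put("ssl.key.password", "key-secret");                               // placeholder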
allow.auto.create.topics
Type: boolean
Default: true
Valid Values:
Importance: medium
auto.offset.reset
What to do when there is no initial offset in Kafka or if the current offset does not
exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the
consumer's group
anything else: throw exception to the consumer
Type: string
Default: latest
Valid Values: [latest, earliest, none]
Importance: medium
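For example, a new consumer group that should start from the beginning of each partition rather than from the end could be configured as sketched below (group, topic, and server names are placeholders):

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "example-group");
    props.put("auto.offset.reset", "earliest");   // used only when no committed offset exists for the group
    props.put("key.deserializer", StringDeserializer.class.getName());
    props.put("value.deserializer", StringDeserializer.class.getName());

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(List.of("example-topic"));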
client.dns.lookup
Controls how the client uses DNS lookups. If set to use_all_dns_ips, connect to each
returned IP address in sequence until a successful connection is established. After a
disconnection, the next IP is used. Once all IPs have been used once, the client
resolves the IP(s) from the hostname again (both the JVM and the OS cache DNS
name lookups, however). If set to resolve_canonical_bootstrap_servers_only ,
resolve each bootstrap address into a list of canonical names. After the bootstrap
phase, this behaves the same as use_all_dns_ips. If set to default (deprecated),
attempt to connect to the first IP address returned by the lookup, even if the lookup
returns multiple IP addresses.
Type: string
Default: use_all_dns_ips
Valid Values: [default, use_all_dns_ips, resolve_canonical_bootstrap_servers_only]
Importance: medium
connections.max.idle.ms
Close idle connections after the number of milliseconds specified by this config.
Type: long
Default: 540000 (9 minutes)
Valid Values:
Importance: medium
default.api.timeout.ms
Specifies the timeout (in milliseconds) for client APIs. This configuration is used as
the default timeout for all client operations that do not specify a timeout parameter.
Type: int
Default: 60000 (1 minute)
Valid Values: [0,...]
Importance: medium
enable.auto.commit
Type: boolean
Default: true
Valid Values:
Importance: medium
exclude.internal.topics
Whether internal topics matching a subscribed pattern should be excluded from the
subscription. It is always possible to explicitly subscribe to an internal topic.
Type: boolean
Default: true
Valid Values:
Importance: medium
fetch.max.bytes
The maximum amount of data the server should return for a fetch request. Records
are fetched in batches by the consumer, and if the first record batch in the first non-
empty partition of the fetch is larger than this value, the record batch will still be
returned to ensure that the consumer can make progress. As such, this is not an
absolute maximum. The maximum record batch size accepted by the broker is
defined via message.max.bytes (broker config) or max.message.bytes (topic config).
Note that the consumer performs multiple fetches in parallel.
Type: int
Default: 52428800 (50 mebibytes)
Valid Values: [0,...]
Importance: medium
group.instance.id
A unique identifier of the consumer instance provided by the end user. Only non-
empty strings are permitted. If set, the consumer is treated as a static member,
which means that only one instance with this ID is allowed in the consumer group at
any time. This can be used in combination with a larger session timeout to avoid
group rebalances caused by transient unavailability (e.g. process restarts). If not set,
the consumer will join the group as a dynamic member, which is the traditional
behavior.
Type: string
Default: null
Valid Values:
Importance: medium
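A static member pairs a fixed group.instance.id with a longer session timeout so that a quick restart does not trigger a rebalance. A sketch; in practice the id would be derived from something stable such as a host or pod name:

    import java.util.Properties;

    Properties props = new Properties();
    props.put("group.id", "example-group");             // placeholder
    props.put("group.instance.id", "consumer-host-1");  // placeholder stable id, unique within the group
    props.put("session.timeout.ms", "120000");          // illustrative: allow 2 minutes for the instance to return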
isolation.level
Type: string
Default: read_uncommitted
Valid Values: [read_committed, read_uncommitted]
Importance: medium
max.poll.interval.ms
The maximum delay between invocations of poll() when using consumer group
management. This places an upper bound on the amount of time that the consumer
can be idle before fetching more records. If poll() is not called before expiration of
this timeout, then the consumer is considered failed and the group will rebalance in
order to reassign the partitions to another member. For consumers using a non-
null group.instance.id which reach this timeout, partitions will not be immediately
reassigned. Instead, the consumer will stop sending heartbeats and partitions will be
reassigned after expiration of session.timeout.ms. This mirrors the behavior of a
static consumer which has shutdown.
Type: int
Default: 300000 (5 minutes)
Valid Values: [1,...]
Importance: medium
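In other words, processing one batch of records must finish before max.poll.interval.ms elapses since the previous poll(), otherwise the consumer is removed from the group. A sketch of the resulting poll loop; the consumer variable is assumed from the earlier sketches and process() is a hypothetical handler:

    import java.time.Duration;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;

    // 'consumer' is a subscribed KafkaConsumer<String, String> as in the earlier sketches
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            process(record);  // hypothetical handler; the whole batch must be handled
        }                     // before max.poll.interval.ms runs out
    }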
max.poll.records
Type: int
Default: 500
Valid Values: [1,...]
Importance: medium
partition.assignment.strategy
A list of class names or class types, ordered by preference, of supported partition
assignment strategies that the client will use to distribute partition ownership
amongst consumer instances when group management is used.
Implementing
the org.apache.kafka.clients.consumer.ConsumerPartitionAssignor interface allows
you to plug in a custom assignment strategy.
Type: list
Default: class org.apache.kafka.clients.consumer.RangeAssignor
Valid Values: non-null string
Importance: medium
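For example, switching from the default RangeAssignor to the CooperativeStickyAssignor that ships with the client, or to a custom ConsumerPartitionAssignor implementation, is a matter of listing the class name in this property. A sketch:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;

    Properties props = new Properties();
    props.put("partition.assignment.strategy", CooperativeStickyAssignor.class.getName());
    // A custom assignor would be listed the same way, e.g. "com.example.MyAssignor" (hypothetical class)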
receive.buffer.bytes
The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the
value is -1, the OS default will be used.
Type: int
Default: 65536 (64 kibibytes)
Valid Values: [-1,...]
Importance: medium
request.timeout.ms
The configuration controls the maximum amount of time the client will wait for the
response of a request. If the response is not received before the timeout elapses the
client will resend the request if necessary or fail the request if retries are exhausted.
Type: int
Default: 30000 (30 seconds)
Valid Values: [0,...]
Importance: medium
sasl.client.callback.handler.class
The fully qualified name of a SASL client callback handler class that implements the
AuthenticateCallbackHandler interface.
Type: class
Default: null
Valid Values:
Importance: medium
sasl.jaas.config
JAAS login context parameters for SASL connections in the format used by JAAS
configuration files. JAAS configuration file format is described here. The format for
the value is: 'loginModuleClass controlFlag (optionName=optionValue)*; '. For
brokers, the config must be prefixed with listener prefix and SASL mechanism name
in lower-case. For example, listener.name.sasl_ssl.scram-sha-
256.sasl.jaas.config=com.example.ScramLoginModule required;
Type: password
Default: null
Valid Values:
Importance: medium
sasl.kerberos.service.name
The Kerberos principal name that Kafka runs as. This can be defined either in Kafka's
JAAS config or in Kafka's config.
Type: string
Default: null
Valid Values:
Importance: medium
sasl.login.callback.handler.class
The fully qualified name of a SASL login callback handler class that implements the
AuthenticateCallbackHandler interface. For brokers, login callback handler config
must be prefixed with listener prefix and SASL mechanism name in lower-case. For
example, listener.name.sasl_ssl.scram-sha-
256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHa
ndler
Type: class
Default: null
Valid Values:
Importance: medium
sasl.login.class
The fully qualified name of a class that implements the Login interface. For brokers,
login config must be prefixed with listener prefix and SASL mechanism name in
lower-case. For example, listener.name.sasl_ssl.scram-sha-
256.sasl.login.class=com.example.CustomScramLogin
Type: class
Default: null
Valid Values:
Importance: medium
sasl.mechanism
SASL mechanism used for client connections. This may be any mechanism for
which a security provider is available. GSSAPI is the default mechanism.
Type: string
Default: GSSAPI
Valid Values:
Importance: medium
security.protocol
Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL,
SASL_PLAINTEXT, SASL_SSL.
Type: string
Default: PLAINTEXT
Valid Values:
Importance: medium
send.buffer.bytes
The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value
is -1, the OS default will be used.
Type: int
Default: 131072 (128 kibibytes)
Valid Values: [-1,...]
Importance: medium
ssl.enabled.protocols
The list of protocols enabled for SSL connections. The default is 'TLSv1.2,TLSv1.3'
when running with Java 11 or newer, 'TLSv1.2' otherwise. With the default value for
Java 11, clients and servers will prefer TLSv1.3 if both support it and fallback to
TLSv1.2 otherwise (assuming both support at least TLSv1.2). This default should be
fine for most cases. Also see the config documentation for `ssl.protocol`.
Type: list
Default: TLSv1.2
Valid Values:
Importance: medium
ssl.keystore.type
The file format of the key store file. This is optional for client.
Type: string
Default: JKS
Valid Values:
Importance: medium
ssl.protocol
The SSL protocol used to generate the SSLContext. The default is 'TLSv1.3' when
running with Java 11 or newer, 'TLSv1.2' otherwise. This value should be fine for
most use cases. Allowed values in recent JVMs are 'TLSv1.2' and 'TLSv1.3'. 'TLS',
'TLSv1.1', 'SSL', 'SSLv2' and 'SSLv3' may be supported in older JVMs, but their usage
is discouraged due to known security vulnerabilities. With the default value for this
config and 'ssl.enabled.protocols', clients will downgrade to 'TLSv1.2' if the server
does not support 'TLSv1.3'. If this config is set to 'TLSv1.2', clients will not use
'TLSv1.3' even if it is one of the values in ssl.enabled.protocols and the server only
supports 'TLSv1.3'.
Type: string
Default: TLSv1.2
Valid Values:
Importance: medium
ssl.provider
The name of the security provider used for SSL connections. Default value is the
default security provider of the JVM.
Type: string
Default: null
Valid Values:
Importance: medium
ssl.truststore.type
Type: string
Default: JKS
Valid Values:
Importance: medium
auto.commit.interval.ms
Type: int
Default: 5000 (5 seconds)
Valid Values: [0,...]
Importance: low
check.crcs
Automatically check the CRC32 of the records consumed. This ensures no on-the-
wire or on-disk corruption to the messages occurred. This check adds some
overhead, so it may be disabled in cases seeking extreme performance.
Type: boolean
Default: true
Valid Values:
Importance: low
client.id
An id string to pass to the server when making requests. The purpose of this is to be
able to track the source of requests beyond just ip/port by allowing a logical
application name to be included in server-side request logging.
Type: string
Default: ""
Valid Values:
Importance: low
client.rack
A rack identifier for this client. This can be any string value which indicates where
this client is physically located. It corresponds with the broker config 'broker.rack'
Type: string
Default: ""
Valid Values:
Importance: low
fetch.max.wait.ms
The maximum amount of time the server will block before answering the fetch
request if there isn't sufficient data to immediately satisfy the requirement given by
fetch.min.bytes.
Type: int
Default: 500
Valid Values: [0,...]
Importance: low
interceptor.classes
Type: list
Default: ""
Valid Values: non-null string
Importance: low
metadata.max.age.ms
The period of time in milliseconds after which we force a refresh of metadata even if
we haven't seen any partition leadership changes to proactively discover any new
brokers or partitions.
Type: long
Default: 300000 (5 minutes)
Valid Values: [0,...]
Importance: low
metric.reporters
Type: list
Default: ""
Valid Values: non-null string
Importance: low
metrics.num.samples
Type: int
Default: 2
Valid Values: [1,...]
Importance: low
metrics.recording.level
Type: string
Default: INFO
Valid Values: [INFO, DEBUG]
Importance: low
metrics.sample.window.ms
Type: long
Default: 30000 (30 seconds)
Valid Values: [0,...]
Importance: low
reconnect.backoff.max.ms
Type: long
Default: 1000 (1 second)
Valid Values: [0,...]
Importance: low
reconnect.backoff.ms
The base amount of time to wait before attempting to reconnect to a given host.
This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all
connection attempts by the client to a broker.
Type: long
Default: 50
Valid Values: [0,...]
Importance: low
retry.backoff.ms
The amount of time to wait before attempting to retry a failed request to a given
topic partition. This avoids repeatedly sending requests in a tight loop under some
failure scenarios.
Type: long
Default: 100
Valid Values: [0,...]
Importance: low
sasl.kerberos.kinit.cmd
Type: string
Default: /usr/bin/kinit
Valid Values:
Importance: low
sasl.kerberos.min.time.before.relogin
Type: long
Default: 60000
Valid Values:
Importance: low
sasl.kerberos.ticket.renew.jitter
Type: double
Default: 0.05
Valid Values:
Importance: low
sasl.kerberos.ticket.renew.window.factor
Login thread will sleep until the specified window factor of time from last refresh to
ticket's expiry has been reached, at which time it will try to renew the ticket.
Type: double
Default: 0.8
Valid Values:
Importance: low
sasl.login.refresh.buffer.seconds
The amount of buffer time before credential expiration to maintain when refreshing a
credential, in seconds. If a refresh would otherwise occur closer to expiration than
the number of buffer seconds then the refresh will be moved up to maintain as much
of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a
default value of 300 (5 minutes) is used if no value is specified. This value and
sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the
remaining lifetime of a credential. Currently applies only to OAUTHBEARER.
Type: short
Default: 300
Valid Values: [0,...,3600]
Importance: low
sasl.login.refresh.min.period.seconds
The desired minimum time for the login refresh thread to wait before refreshing a
credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default
value of 60 (1 minute) is used if no value is specified. This value and
sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the
remaining lifetime of a credential. Currently applies only to OAUTHBEARER.
Type: short
Default: 60
Valid Values: [0,...,900]
Importance: low
sasl.login.refresh.window.factor
Login refresh thread will sleep until the specified window factor relative to the
credential's lifetime has been reached, at which time it will try to refresh the
credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default
value of 0.8 (80%) is used if no value is specified. Currently applies only to
OAUTHBEARER.
Type: double
Default: 0.8
Valid Values: [0.5,...,1.0]
Importance: low
sasl.login.refresh.window.jitter
The maximum amount of random jitter relative to the credential's lifetime that is
added to the login refresh thread's sleep time. Legal values are between 0 and 0.25
(25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently
applies only to OAUTHBEARER.
Type: double
Default: 0.05
Valid Values: [0.0,...,0.25]
Importance: low
security.providers
Type: string
Default: null
Valid Values:
Importance: low
ssl.cipher.suites
Type: list
Default: null
Valid Values:
Importance: low
ssl.endpoint.identification.algorithm
Type: string
Default: https
Valid Values:
Importance: low
ssl.engine.factory.class
Type: class
Default: null
Valid Values:
Importance: low
ssl.keymanager.algorithm
The algorithm used by key manager factory for SSL connections. Default value is the
key manager factory algorithm configured for the Java Virtual Machine.
Type: string
Default: SunX509
Valid Values:
Importance: low
ssl.secure.random.implementation
Type: string
Default: null
Valid Values:
Importance: low
ssl.trustmanager.algorithm
The algorithm used by trust manager factory for SSL connections. Default value is
the trust manager factory algorithm configured for the Java Virtual Machine.
Type: string
Default: PKIX
Valid Values:
Importance: low
3.5 Kafka Connect Configs
config.storage.topic
The name of the Kafka topic where connector configurations are stored
Type: string
Default:
Valid Values:
Importance: high
group.id
A unique string that identifies the Connect cluster group this worker belongs to.
Type: string
Default:
Valid Values:
Importance: high
key.converter
Converter class used to convert between Kafka Connect format and the serialized
form that is written to Kafka. This controls the format of the keys in messages
written to or read from Kafka, and since this is independent of connectors it allows
any connector to work with any serialization format. Examples of common formats
include JSON and Avro.
Type: class
Default:
Valid Values:
Importance: high
offset.storage.topic
The name of the Kafka topic where connector offsets are stored
Type: string
Default:
Valid Values:
Importance: high
status.storage.topic
The name of the Kafka topic where connector and task status are stored
Type: string
Default:
Valid Values:
Importance: high
value.converter
Converter class used to convert between Kafka Connect format and the serialized
form that is written to Kafka. This controls the format of the values in messages
written to or read from Kafka, and since this is independent of connectors it allows
any connector to work with any serialization format. Examples of common formats
include JSON and Avro.
Type: class
Default:
Valid Values:
Importance: high
bootstrap.servers
A list of host/port pairs to use for establishing the initial connection to the Kafka
cluster. The client will make use of all servers irrespective of which servers are
specified here for bootstrapping—this list only impacts the initial hosts used to
discover the full set of servers. This list should be in the
form host1:port1,host2:port2,.... Since these servers are just used for the initial
connection to discover the full cluster membership (which may change dynamically),
this list need not contain the full set of servers (you may want more than one,
though, in case a server is down).
Type: list
Default: localhost:9092
Valid Values:
Importance: high
heartbeat.interval.ms
The expected time between heartbeats to the group coordinator when using Kafka's
group management facilities. Heartbeats are used to ensure that the worker's
session stays active and to facilitate rebalancing when new members join or leave
the group. The value must be set lower than session.timeout.ms, but typically should
be set no higher than 1/3 of that value. It can be adjusted even lower to control the
expected time for normal rebalances.
Type: int
Default: 3000 (3 seconds)
Valid Values:
Importance: high
rebalance.timeout.ms
The maximum allowed time for each worker to join the group once a rebalance has
begun. This is basically a limit on the amount of time needed for all tasks to flush
any pending data and commit offsets. If the timeout is exceeded, then the worker
will be removed from the group, which will cause offset commit failures.
Type: int
Default: 60000 (1 minute)
Valid Values:
Importance: high
session.timeout.ms
The timeout used to detect worker failures. The worker sends periodic heartbeats to
indicate its liveness to the broker. If no heartbeats are received by the broker before
the expiration of this session timeout, then the broker will remove the worker from
the group and initiate a rebalance. Note that the value must be in the allowable range
as configured in the broker configuration
by group.min.session.timeout.ms and group.max.session.timeout.ms.
Type: int
Default: 10000 (10 seconds)
Valid Values:
Importance: high
ssl.key.password
The password of the private key in the key store file. This is optional for client.
Type: password
Default: null
Valid Values:
Importance: high
ssl.keystore.location
The location of the key store file. This is optional for client and can be used for two-
way authentication for client.
Type: string
Default: null
Valid Values:
Importance: high
ssl.keystore.password
The store password for the key store file. This is optional for client and only needed
if ssl.keystore.location is configured.
Type: password
Default: null
Valid Values:
Importance: high
ssl.truststore.location
Type: string
Default: null
Valid Values:
Importance: high
ssl.truststore.password
The password for the trust store file. If a password is not set access to the truststore
is still available, but integrity checking is disabled.
Type: password
Default: null
Valid Values:
Importance: high
client.dns.lookup
Controls how the client uses DNS lookups. If set to use_all_dns_ips, connect to each
returned IP address in sequence until a successful connection is established. After a
disconnection, the next IP is used. Once all IPs have been used once, the client
resolves the IP(s) from the hostname again (both the JVM and the OS cache DNS
name lookups, however). If set to resolve_canonical_bootstrap_servers_only ,
resolve each bootstrap address into a list of canonical names. After the bootstrap
phase, this behaves the same as use_all_dns_ips. If set to default (deprecated),
attempt to connect to the first IP address returned by the lookup, even if the lookup
returns multiple IP addresses.
Type: string
Default: use_all_dns_ips
Valid Values: [default, use_all_dns_ips, resolve_canonical_bootstrap_servers_only]
Importance: medium
connections.max.idle.ms
Close idle connections after the number of milliseconds specified by this config.
Type: long
Default: 540000 (9 minutes)
Valid Values:
Importance: medium
connector.client.config.override.policy
Type: string
Default: None
Valid Values:
Importance: medium
receive.buffer.bytes
The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the
value is -1, the OS default will be used.
Type: int
Default: 32768 (32 kibibytes)
Valid Values: [0,...]
Importance: medium
request.timeout.ms
The configuration controls the maximum amount of time the client will wait for the
response of a request. If the response is not received before the timeout elapses the
client will resend the request if necessary or fail the request if retries are exhausted.
Type: int
Default: 40000 (40 seconds)
Valid Values: [0,...]
Importance: medium
sasl.client.callback.handler.class
The fully qualified name of a SASL client callback handler class that implements the
AuthenticateCallbackHandler interface.
Type: class
Default: null
Valid Values:
Importance: medium
sasl.jaas.config
JAAS login context parameters for SASL connections in the format used by JAAS
configuration files. JAAS configuration file format is described here. The format for
the value is: 'loginModuleClass controlFlag (optionName=optionValue)*; '. For
brokers, the config must be prefixed with listener prefix and SASL mechanism name
in lower-case. For example, listener.name.sasl_ssl.scram-sha-
256.sasl.jaas.config=com.example.ScramLoginModule required;
Type: password
Default: null
Valid Values:
Importance: medium
sasl.kerberos.service.name
The Kerberos principal name that Kafka runs as. This can be defined either in Kafka's
JAAS config or in Kafka's config.
Type: string
Default: null
Valid Values:
Importance: medium
sasl.login.callback.handler.class
The fully qualified name of a SASL login callback handler class that implements the
AuthenticateCallbackHandler interface. For brokers, login callback handler config
must be prefixed with listener prefix and SASL mechanism name in lower-case. For
example, listener.name.sasl_ssl.scram-sha-
256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHa
ndler
Type: class
Default: null
Valid Values:
Importance: medium
sasl.login.class
The fully qualified name of a class that implements the Login interface. For brokers,
login config must be prefixed with listener prefix and SASL mechanism name in
lower-case. For example, listener.name.sasl_ssl.scram-sha-
256.sasl.login.class=com.example.CustomScramLogin
Type: class
Default: null
Valid Values:
Importance: medium
sasl.mechanism
SASL mechanism used for client connections. This may be any mechanism for
which a security provider is available. GSSAPI is the default mechanism.
Type: string
Default: GSSAPI
Valid Values:
Importance: medium
security.protocol
Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL,
SASL_PLAINTEXT, SASL_SSL.
Type: string
Default: PLAINTEXT
Valid Values:
Importance: medium
send.buffer.bytes
The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value
is -1, the OS default will be used.
Type: int
Default: 131072 (128 kibibytes)
Valid Values: [0,...]
Importance: medium
ssl.enabled.protocols
The list of protocols enabled for SSL connections. The default is 'TLSv1.2,TLSv1.3'
when running with Java 11 or newer, 'TLSv1.2' otherwise. With the default value for
Java 11, clients and servers will prefer TLSv1.3 if both support it and fallback to
TLSv1.2 otherwise (assuming both support at least TLSv1.2). This default should be
fine for most cases. Also see the config documentation for `ssl.protocol`.
Type: list
Default: TLSv1.2
Valid Values:
Importance: medium
ssl.keystore.type
The file format of the key store file. This is optional for client.
Type: string
Default: JKS
Valid Values:
Importance: medium
ssl.protocol
The SSL protocol used to generate the SSLContext. The default is 'TLSv1.3' when
running with Java 11 or newer, 'TLSv1.2' otherwise. This value should be fine for
most use cases. Allowed values in recent JVMs are 'TLSv1.2' and 'TLSv1.3'. 'TLS',
'TLSv1.1', 'SSL', 'SSLv2' and 'SSLv3' may be supported in older JVMs, but their usage
is discouraged due to known security vulnerabilities. With the default value for this
config and 'ssl.enabled.protocols', clients will downgrade to 'TLSv1.2' if the server
does not support 'TLSv1.3'. If this config is set to 'TLSv1.2', clients will not use
'TLSv1.3' even if it is one of the values in ssl.enabled.protocols and the server only
supports 'TLSv1.3'.
Type: string
Default: TLSv1.2
Valid Values:
Importance: medium
ssl.provider
The name of the security provider used for SSL connections. Default value is the
default security provider of the JVM.
Type: string
Default: null
Valid Values:
Importance: medium
ssl.truststore.type
Type: string
Default: JKS
Valid Values:
Importance: medium
worker.sync.timeout.ms
When the worker is out of sync with other workers and needs to resynchronize
configurations, wait up to this amount of time before giving up, leaving the group,
and waiting a backoff period before rejoining.
Type: int
Default: 3000 (3 seconds)
Valid Values:
Importance: medium
worker.unsync.backoff.ms
When the worker is out of sync with other workers and fails to catch up within
worker.sync.timeout.ms, leave the Connect cluster for this long before rejoining.
Type: int
Default: 300000 (5 minutes)
Valid Values:
Importance: medium
access.control.allow.methods
Sets the methods supported for cross origin requests by setting the Access-Control-
Allow-Methods header. The default value of the Access-Control-Allow-Methods
header allows cross origin requests for GET, POST and HEAD.
Type: string
Default: ""
Valid Values:
Importance: low
access.control.allow.origin
Type: string
Default: ""
Valid Values:
Importance: low
admin.listeners
List of comma-separated URIs the Admin REST API will listen on. The supported
protocols are HTTP and HTTPS. An empty or blank string will disable this feature.
The default behavior is to use the regular listener (specified by the 'listeners'
property).
Type: list
Default: null
Valid Values: org.apache.kafka.connect.runtime.WorkerConfig$AdminListenersValidator
Importance: low
client.id
An id string to pass to the server when making requests. The purpose of this is to be
able to track the source of requests beyond just ip/port by allowing a logical
application name to be included in server-side request logging.
Type: string
Default: ""
Valid Values:
Importance: low
config.providers
Type: list
Default: ""
Valid Values:
Importance: low
config.storage.replication.factor
Type: short
Default: 3
Valid Values: Positive number not larger than the number of brokers in the Kafka
cluster, or -1 to use the broker's default
Importance: low
connect.protocol
Type: string
Default: sessioned
Valid Values: [eager, compatible, sessioned]
Importance: low
header.converter
HeaderConverter class used to convert between Kafka Connect format and the
serialized form that is written to Kafka. This controls the format of the header values
in messages written to or read from Kafka, and since this is independent of
connectors it allows any connector to work with any serialization format. Examples
of common formats include JSON and Avro. By default, the SimpleHeaderConverter
is used to serialize header values to strings and deserialize them by inferring the
schemas.
Type: class
Default: org.apache.kafka.connect.storage.SimpleHeaderConverter
Valid Values:
Importance: low
inter.worker.key.generation.algorithm
Type: string
Default: HmacSHA256
Valid Values: Any KeyGenerator algorithm supported by the worker JVM
Importance: low
inter.worker.key.size
The size of the key to use for signing internal requests, in bits. If null, the default key
size for the key generation algorithm will be used.
Type: int
Default: null
Valid Values:
Importance: low
inter.worker.key.ttl.ms
The TTL of generated session keys used for internal request validation (in
milliseconds)
Type: int
Default: 3600000 (1 hour)
Valid Values: [0,...,2147483647]
Importance: low
inter.worker.signature.algorithm
Type: string
Default: HmacSHA256
Valid Values: Any MAC algorithm supported by the worker JVM
Importance: low
inter.worker.verification.algorithms
Type: list
Default: HmacSHA256
Valid Values: A list of one or more MAC algorithms, each supported by the worker JVM
Importance: low
internal.key.converter
Converter class used to convert between Kafka Connect format and the serialized
form that is written to Kafka. This controls the format of the keys in messages
written to or read from Kafka, and since this is independent of connectors it allows
any connector to work with any serialization format. Examples of common formats
include JSON and Avro. This setting controls the format used for internal
bookkeeping data used by the framework, such as configs and offsets, so users can
typically use any functioning Converter implementation. Deprecated; will be removed
in an upcoming version.
Type: class
Default: org.apache.kafka.connect.json.JsonConverter
Valid Values:
Importance: low
internal.value.converter
Converter class used to convert between Kafka Connect format and the serialized
form that is written to Kafka. This controls the format of the values in messages
written to or read from Kafka, and since this is independent of connectors it allows
any connector to work with any serialization format. Examples of common formats
include JSON and Avro. This setting controls the format used for internal
bookkeeping data used by the framework, such as configs and offsets, so users can
typically use any functioning Converter implementation. Deprecated; will be removed
in an upcoming version.
Type: class
Default: org.apache.kafka.connect.json.JsonConverter
Valid Values:
Importance: low
listeners
List of comma-separated URIs the REST API will listen on. The supported protocols
are HTTP and HTTPS.
Specify hostname as 0.0.0.0 to bind to all interfaces.
Leave hostname empty to bind to default interface.
Examples of legal listener lists: HTTP://myhost:8083,HTTPS://myhost:8084
Type: list
Default: null
Valid Values:
Importance: low
metadata.max.age.ms
The period of time in milliseconds after which we force a refresh of metadata even if
we haven't seen any partition leadership changes to proactively discover any new
brokers or partitions.
Type: long
Default: 300000 (5 minutes)
Valid Values: [0,...]
Importance: low
metric.reporters
Type: list
Default: ""
Valid Values:
Importance: low
metrics.num.samples
Type: int
Default: 2
Valid Values: [1,...]
Importance: low
metrics.recording.level
Type: string
Default: INFO
Valid Values: [INFO, DEBUG]
Importance: low
metrics.sample.window.ms
Type: long
Default: 30000 (30 seconds)
Valid Values: [0,...]
Importance: low
offset.flush.interval.ms
Type: long
Default: 60000 (1 minute)
Valid Values:
Importance: low
offset.flush.timeout.ms
Maximum number of milliseconds to wait for records to flush and partition offset
data to be committed to offset storage before cancelling the process and restoring
the offset data to be committed in a future attempt.
Type: long
Default: 5000 (5 seconds)
Valid Values:
Importance: low
offset.storage.partitions
The number of partitions used when creating the offset storage topic
Type: int
Default: 25
Valid Values: Positive number, or -1 to use the broker's default
Importance: low
offset.storage.replication.factor
Replication factor used when creating the offset storage topic
Type: short
Default: 3
Valid Values: Positive number not larger than the number of brokers in the Kafka
cluster, or -1 to use the broker's default
Importance: low
plugin.path
List of paths separated by commas (,) that contain plugins (connectors, converters,
transformations). The list should consist of top level directories that include any
combination of:
a) directories immediately containing jars with plugins and their dependencies
b) uber-jars with plugins and their dependencies
c) directories immediately containing the package directory structure of classes of
plugins and their dependencies
Note: symlinks will be followed to discover dependencies or plugins.
Examples:
plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors
Do not use config provider variables in this property, since the raw path is used by
the worker's scanner before config providers are initialized and used to replace
variables.
Type: list
Default: null
Valid Values:
Importance: low
reconnect.backoff.max.ms
Type: long
Default: 1000 (1 second)
Valid Values: [0,...]
Importance: low
reconnect.backoff.ms
The base amount of time to wait before attempting to reconnect to a given host.
This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all
connection attempts by the client to a broker.
Type: long
Default: 50
Valid Values: [0,...]
Importance: low
response.http.headers.config
Type: string
Default: ""
Valid Values: Comma-separated header rules, where each header rule is of the form
'[action] [header name]:[header value]' and optionally surrounded by double quotes if
any part of a header rule contains a comma
Importance: low
rest.advertised.host.name
If this is set, this is the hostname that will be given out to other workers to connect
to.
Type: string
Default: null
Valid Values:
Importance: low
rest.advertised.listener
Sets the advertised listener (HTTP or HTTPS) which will be given to other workers to
use.
Type: string
Default: null
Valid Values:
Importance: low
rest.advertised.port
If this is set, this is the port that will be given out to other workers to connect to.
Type: int
Default: null
Valid Values:
Importance: low
rest.extension.classes
Type: list
Default: ""
Valid Values:
Importance: low
rest.host.name
Hostname for the REST API. If this is set, it will only bind to this interface.
Type: string
Default: null
Valid Values:
Importance: low
rest.port
Type: int
Default: 8083
Valid Values:
Importance: low
retry.backoff.ms
The amount of time to wait before attempting to retry a failed request to a given
topic partition. This avoids repeatedly sending requests in a tight loop under some
failure scenarios.
Type: long
Default: 100
Valid Values: [0,...]
Importance: low
sasl.kerberos.kinit.cmd
Type: string
Default: /usr/bin/kinit
Valid Values:
Importance: low
sasl.kerberos.min.time.before.relogin
Type: long
Default: 60000
Valid Values:
Importance: low
sasl.kerberos.ticket.renew.jitter
Type: double
Default: 0.05
Valid Values:
Importance: low
sasl.kerberos.ticket.renew.window.factor
Login thread will sleep until the specified window factor of time from last refresh to
ticket's expiry has been reached, at which time it will try to renew the ticket.
Type: double
Default: 0.8
Valid Values:
Importance: low
sasl.login.refresh.buffer.seconds
The amount of buffer time before credential expiration to maintain when refreshing a
credential, in seconds. If a refresh would otherwise occur closer to expiration than
the number of buffer seconds then the refresh will be moved up to maintain as much
of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a
default value of 300 (5 minutes) is used if no value is specified. This value and
sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the
remaining lifetime of a credential. Currently applies only to OAUTHBEARER.
Type: short
Default: 300
Valid Values: [0,...,3600]
Importance: low
sasl.login.refresh.min.period.seconds
The desired minimum time for the login refresh thread to wait before refreshing a
credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default
value of 60 (1 minute) is used if no value is specified. This value and
sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the
remaining lifetime of a credential. Currently applies only to OAUTHBEARER.
Type: short
Default: 60
Valid Values: [0,...,900]
Importance: low
sasl.login.refresh.window.factor
Login refresh thread will sleep until the specified window factor relative to the
credential's lifetime has been reached, at which time it will try to refresh the
credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default
value of 0.8 (80%) is used if no value is specified. Currently applies only to
OAUTHBEARER.
Type: double
Default: 0.8
Valid Values: [0.5,...,1.0]
Importance: low
sasl.login.refresh.window.jitter
The maximum amount of random jitter relative to the credential's lifetime that is
added to the login refresh thread's sleep time. Legal values are between 0 and 0.25
(25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently
applies only to OAUTHBEARER.
Type: double
Default: 0.05
Valid Values: [0.0,...,0.25]
Importance: low
scheduled.rebalance.max.delay.ms
The maximum delay that is scheduled in order to wait for the return of one or more
departed workers before rebalancing and reassigning their connectors and tasks to
the group. During this period the connectors and tasks of the departed workers
remain unassigned
Type: int
Default: 300000 (5 minutes)
Valid Values: [0,...,2147483647]
Importance: low
ssl.cipher.suites
Type: list
Default: null
Valid Values:
Importance: low
ssl.client.auth
Configures kafka broker to request client authentication. The following settings are
common:
ssl.client.auth=required If set to required, client authentication is required.
ssl.client.auth=requested This means client authentication is optional. Unlike
required, if this option is set the client can choose not to provide authentication
information about itself.
ssl.client.auth=none This means client authentication is not needed.
Type: string
Default: none
Valid Values:
Importance: low
ssl.endpoint.identification.algorithm
Type: string
Default: https
Valid Values:
Importance: low
ssl.engine.factory.class
Type: class
Default: null
Valid Values:
Importance: low
ssl.keymanager.algorithm
The algorithm used by key manager factory for SSL connections. Default value is the
key manager factory algorithm configured for the Java Virtual Machine.
Type: string
Default: SunX509
Valid Values:
Importance: low
ssl.secure.random.implementation
Type: string
Default: null
Valid Values:
Importance: low
ssl.trustmanager.algorithm
The algorithm used by trust manager factory for SSL connections. Default value is
the trust manager factory algorithm configured for the Java Virtual Machine.
Type: string
Default: PKIX
Valid Values:
Importance: low
status.storage.partitions
The number of partitions used when creating the status storage topic
Type: int
Default: 5
Valid Values: Positive number, or -1 to use the broker's default
Importance: low
status.storage.replication.factor
Replication factor used when creating the status storage topic
Type: short
Default: 3
Valid Values: Positive number not larger than the number of brokers in the Kafka
cluster, or -1 to use the broker's default
Importance: low
task.shutdown.graceful.timeout.ms
Amount of time to wait for tasks to shutdown gracefully. This is the total amount of
time, not per task. All tasks have shutdown triggered, then they are waited on
sequentially.
Type: long
Default: 5000 (5 seconds)
Valid Values:
Importance: low
topic.creation.enable
Type: boolean
Default: true
Valid Values:
Importance: low
topic.tracking.allow.reset
If set to true, it allows user requests to reset the set of active topics per connector.
Type: boolean
Default: true
Valid Values:
Importance: low
topic.tracking.enable
Enable tracking the set of active topics per connector during runtime.
Type: boolean
Default: true
Valid Values:
Importance: low
Source Connector Configs
name
Globally unique name to use for this connector.
Type: string
Default:
Valid Values: non-empty string without ISO control characters
Importance: high
connector.class
Type: string
Default:
Valid Values:
Importance: high
tasks.max
Type: int
Default: 1
Valid Values: [1,...]
Importance: high
key.converter
Converter class used to convert between Kafka Connect format and the serialized
form that is written to Kafka. This controls the format of the keys in messages
written to or read from Kafka, and since this is independent of connectors it allows
any connector to work with any serialization format. Examples of common formats
include JSON and Avro.
Type: class
Default: null
Valid Values:
Importance: low
value.converter
Converter class used to convert between Kafka Connect format and the serialized
form that is written to Kafka. This controls the format of the values in messages
written to or read from Kafka, and since this is independent of connectors it allows
any connector to work with any serialization format. Examples of common formats
include JSON and Avro.
Type: class
Default: null
Valid Values:
Importance: low
header.converter
HeaderConverter class used to convert between Kafka Connect format and the
serialized form that is written to Kafka. This controls the format of the header values
in messages written to or read from Kafka, and since this is independent of
connectors it allows any connector to work with any serialization format. Examples
of common formats include JSON and Avro. By default, the SimpleHeaderConverter
is used to serialize header values to strings and deserialize them by inferring the
schemas.
Type: class
Default: null
Valid Values:
Importance: low
config.action.reload
The action that Connect should take on the connector when changes in external
configuration providers result in a change in the connector's configuration
properties. A value of 'none' indicates that Connect will do nothing. A value of
'restart' indicates that Connect should restart/reload the connector with the updated
configuration properties. The restart may actually be scheduled in the future if the
external configuration provider indicates that a configuration value will expire in the
future.
Type: string
Default: restart
Valid Values: [none, restart]
Importance: low
transforms
Type: list
Default: ""
Valid Values: non-null string, unique transformation aliases
Importance: low
predicates
Type: list
Default: ""
Valid Values: non-null string, unique predicate aliases
Importance: low
errors.retry.timeout
Type: long
Default: 0
Valid Values:
Importance: medium
errors.retry.delay.max.ms
Type: long
Default: 60000 (1 minute)
Valid Values:
Importance: medium
errors.tolerance
Behavior for tolerating errors during connector operation. 'none' is the default value
and signals that any error will result in an immediate connector task failure; 'all'
changes the behavior to skip over problematic records.
Type: string
Default: none
Valid Values: [none, all]
Importance: medium
errors.log.enable
If true, write each error and the details of the failed operation and problematic record
to the Connect application log. This is 'false' by default, so that only errors that are
not tolerated are reported.
Type: boolean
Default: false
Valid Values:
Importance: medium
errors.log.include.messages
Whether to include in the log the Connect record that resulted in a failure. This is
'false' by default, which will prevent record keys, values, and headers from being
written to log files, although some information such as topic and partition number
will still be logged.
Type: boolean
Default: false
Valid Values:
Importance: medium
topic.creation.groups
Type: list
Default: ""
Valid Values: non-null string, unique topic creation groups
Importance: low
Sink Connector Configs
name
Globally unique name to use for this connector.
Type: string
Default:
Valid Values: non-empty string without ISO control characters
Importance: high
connector.class
Type: string
Default:
Valid Values:
Importance: high
tasks.max
Type: int
Default: 1
Valid Values: [1,...]
Importance: high
topics
Type: list
Default: ""
Valid Values:
Importance: high
topics.regex
Regular expression giving topics to consume. Under the hood, the regex is compiled
to a java.util.regex.Pattern. Only one of topics or topics.regex should be specified.
Type: string
Default: ""
Valid Values: valid regex
Importance: high
key.converter
Converter class used to convert between Kafka Connect format and the serialized
form that is written to Kafka. This controls the format of the keys in messages
written to or read from Kafka, and since this is independent of connectors it allows
any connector to work with any serialization format. Examples of common formats
include JSON and Avro.
Type: class
Default: null
Valid Values:
Importance: low
value.converter
Converter class used to convert between Kafka Connect format and the serialized
form that is written to Kafka. This controls the format of the values in messages
written to or read from Kafka, and since this is independent of connectors it allows
any connector to work with any serialization format. Examples of common formats
include JSON and Avro.
Type: class
Default: null
Valid Values:
Importance: low
header.converter
HeaderConverter class used to convert between Kafka Connect format and the
serialized form that is written to Kafka. This controls the format of the header values
in messages written to or read from Kafka, and since this is independent of
connectors it allows any connector to work with any serialization format. Examples
of common formats include JSON and Avro. By default, the SimpleHeaderConverter
is used to serialize header values to strings and deserialize them by inferring the
schemas.
Type: class
Default: null
Valid Values:
Importance: low
config.action.reload
The action that Connect should take on the connector when changes in external
configuration providers result in a change in the connector's configuration
properties. A value of 'none' indicates that Connect will do nothing. A value of
'restart' indicates that Connect should restart/reload the connector with the updated
configuration properties. The restart may actually be scheduled in the future if the
external configuration provider indicates that a configuration value will expire in the
future.
Type: string
Default: restart
Valid Values: [none, restart]
Importance: low
transforms
Type: list
Default: ""
Valid Values: non-null string, unique transformation aliases
Importance: low
predicates
Type: list
Default: ""
Valid Values: non-null string, unique predicate aliases
Importance: low
errors.retry.timeout
Type: long
Default: 0
Valid Values:
Importance: medium
errors.retry.delay.max.ms
Type: long
Default: 60000 (1 minute)
Valid Values:
Importance: medium
errors.tolerance
Behavior for tolerating errors during connector operation. 'none' is the default value
and signals that any error will result in an immediate connector task failure; 'all'
changes the behavior to skip over problematic records.
Type: string
Default: none
Valid Values: [none, all]
Importance: medium
errors.log.enable
If true, write each error and the details of the failed operation and problematic record
to the Connect application log. This is 'false' by default, so that only errors that are
not tolerated are reported.
Type: boolean
Default: false
Valid Values:
Importance: medium
errors.log.include.messages
Whether to include in the log the Connect record that resulted in a failure. This is
'false' by default, which will prevent record keys, values, and headers from being
written to log files, although some information such as topic and partition number
will still be logged.
Type: boolean
Default: false
Valid Values:
Importance: medium
errors.deadletterqueue.topic.name
The name of the topic to be used as the dead letter queue (DLQ) for messages that
result in an error when processed by this sink connector, or its transformations or
converters. The topic name is blank by default, which means that no messages are
to be recorded in the DLQ.
Type: string
Default: ""
Valid Values:
Importance: medium
errors.deadletterqueue.topic.replication.factor
Replication factor used to create the dead letter queue topic when it doesn't already
exist.
Type: short
Default: 3
Valid Values:
Importance: medium
errors.deadletterqueue.context.headers.enable
If true, add headers containing error context to the messages written to the dead
letter queue. To avoid clashing with headers from the original record, all error context
header keys will start with __connect.errors.
Type: boolean
Default: false
Valid Values:
Importance: medium
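As a rough illustration of how the error-handling settings above combine, the following sketch builds a configuration map for a hypothetical sink connector (the connector class, topic, and dead letter queue name are made-up placeholders, not part of the reference above): problematic records are tolerated, logged in full, and routed to a DLQ with error-context headers.

import java.util.HashMap;
import java.util.Map;

public class SinkErrorHandlingConfig {
    public static void main(String[] args) {
        // Hypothetical sink connector configuration illustrating the error-handling
        // and dead-letter-queue options documented above.
        Map<String, String> config = new HashMap<>();
        config.put("name", "example-sink");                                  // placeholder name
        config.put("connector.class", "com.example.ExampleSinkConnector");   // placeholder class
        config.put("tasks.max", "1");
        config.put("topics", "payments");
        config.put("errors.tolerance", "all");                // skip problematic records
        config.put("errors.log.enable", "true");              // log every failure
        config.put("errors.log.include.messages", "true");    // include the failed record itself
        config.put("errors.deadletterqueue.topic.name", "dlq-example-sink");
        config.put("errors.deadletterqueue.topic.replication.factor", "3");
        config.put("errors.deadletterqueue.context.headers.enable", "true");
        config.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}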
application.id
An identifier for the stream processing application. Must be unique within the Kafka
cluster. It is used as 1) the default client-id prefix, 2) the group-id for membership
management, 3) the changelog topic prefix.
Type: string
Default:
Valid Values:
Importance: high
bootstrap.servers
A list of host/port pairs to use for establishing the initial connection to the Kafka
cluster. The client will make use of all servers irrespective of which servers are
specified here for bootstrapping—this list only impacts the initial hosts used to
discover the full set of servers. This list should be in the
form host1:port1,host2:port2,.... Since these servers are just used for the initial
connection to discover the full cluster membership (which may change dynamically),
this list need not contain the full set of servers (you may want more than one,
though, in case a server is down).
Type: list
Default:
Valid Values:
Importance: high
replication.factor
The replication factor for change log topics and repartition topics created by the
stream processing application.
Type: int
Default: 1
Valid Values:
Importance: high
state.dir
Directory location for state store. This path must be unique for each streams
instance sharing the same underlying filesystem.
Type: string
Default: /tmp/kafka-streams
Valid Values:
Importance: high
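For orientation only, here is a minimal sketch of how a Kafka Streams application might supply the high-importance settings just listed (the application id, broker addresses, state directory, and the trivial topology are illustrative placeholders):

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-streams-app");           // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");  // placeholder
        props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);      // replicate internal topics
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams/example-app");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("payments").to("payments-copy");             // trivial placeholder topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}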
acceptable.recovery.lag
The maximum acceptable lag (number of offsets to catch up) for a client to be
considered caught-up for an active task. Should correspond to a recovery time of well
under a minute for a given workload. Must be at least 0.
Type: long
Default: 10000
Valid Values: [0,...]
Importance: medium
cache.max.bytes.buffering
Maximum number of memory bytes to be used for buffering across all threads
Type: long
Default: 10485760
Valid Values: [0,...]
Importance: medium
client.id
An ID prefix string used for the client IDs of internal consumer, producer and restore-
consumer, with pattern '<client.id>-StreamThread-<threadSequenceNumber>-<consumer|producer|restore-consumer>'.
Type: string
Default: ""
Valid Values:
Importance: medium
default.deserialization.exception.handler
Type: class
Default: org.apache.kafka.streams.errors.LogAndFailExceptionHandler
Valid Values:
Importance: medium
default.key.serde
Type: class
Default: org.apache.kafka.common.serialization.Serdes$ByteArraySerde
Valid Values:
Importance: medium
default.production.exception.handler
Type: class
Default: org.apache.kafka.streams.errors.DefaultProductionExceptionHandler
Valid Values:
Importance: medium
default.timestamp.extractor
Type: class
Default: org.apache.kafka.streams.processor.FailOnInvalidTimestamp
Valid Values:
Importance: medium
default.value.serde
Type: class
Default: org.apache.kafka.common.serialization.Serdes$ByteArraySerde
Valid Values:
Importance: medium
max.task.idle.ms
Maximum amount of time a stream task will stay idle when not all of its partition
buffers contain records, to avoid potential out-of-order record processing across
multiple input streams.
Type: long
Default: 0
Valid Values:
Importance: medium
max.warmup.replicas
The maximum number of warmup replicas (extra standbys beyond the configured
num.standbys) that can be assigned at once for the purpose of keeping the task
available on one instance while it is warming up on another instance it has been
reassigned to. Used to throttle how much extra broker traffic and cluster state can
be used for high availability. Must be at least 1.
Type: int
Default: 2
Valid Values: [1,...]
Importance: medium
num.standby.replicas
Type: int
Default: 0
Valid Values:
Importance: medium
num.stream.threads
Type: int
Default: 1
Valid Values:
Importance: medium
processing.guarantee
Type: string
Default: at_least_once
Valid Values: [at_least_once, exactly_once, exactly_once_beta]
Importance: medium
security.protocol
Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL,
SASL_PLAINTEXT, SASL_SSL.
Type: string
Default: PLAINTEXT
Valid Values:
Importance: medium
topology.optimization
Type: string
Default: none
Valid Values: [none, all]
Importance: medium
application.server
A host:port pair pointing to a user-defined endpoint that can be used for state store
discovery and interactive queries on this KafkaStreams instance.
Type: string
Default: ""
Valid Values:
Importance: low
buffered.records.per.partition
Maximum number of records to buffer per partition.
Type: int
Default: 1000
Valid Values:
Importance: low
built.in.metrics.version
Type: string
Default: latest
Valid Values: [0.10.0-2.4, latest]
Importance: low
commit.interval.ms
The frequency with which to save the position of the processor. (Note, if
processing.guarantee is set to exactly_once, the default value is 100, otherwise the
default value is 30000.)
Type: long
Default: 30000 (30 seconds)
Valid Values: [0,...]
Importance: low
connections.max.idle.ms
Close idle connections after the number of milliseconds specified by this config.
Type: long
Default: 540000 (9 minutes)
Valid Values:
Importance: low
metadata.max.age.ms
The period of time in milliseconds after which we force a refresh of metadata even if
we haven't seen any partition leadership changes to proactively discover any new
brokers or partitions.
Type: long
Default: 300000 (5 minutes)
Valid Values: [0,...]
Importance: low
metric.reporters
Type: list
Default: ""
Valid Values:
Importance: low
metrics.num.samples
Type: int
Default: 2
Valid Values: [1,...]
Importance: low
metrics.recording.level
Type: string
Default: INFO
Valid Values: [INFO, DEBUG]
Importance: low
metrics.sample.window.ms
Type: long
Default: 30000 (30 seconds)
Valid Values: [0,...]
Importance: low
partition.grouper
Type: class
Default: org.apache.kafka.streams.processor.DefaultPartitionGrouper
Valid Values:
Importance: low
poll.ms
Type: long
Default: 100
Valid Values:
Importance: low
probing.rebalance.interval.ms
The maximum time to wait before triggering a rebalance to probe for warmup
replicas that have finished warming up and are ready to become active. Probing
rebalances will continue to be triggered until the assignment is balanced. Must be at
least 1 minute.
Type: long
Default: 600000 (10 minutes)
Valid Values: [60000,...]
Importance: low
receive.buffer.bytes
The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the
value is -1, the OS default will be used.
Type: int
Default: 32768 (32 kibibytes)
Valid Values: [-1,...]
Importance: low
reconnect.backoff.max.ms
Type: long
Default: 1000 (1 second)
Valid Values: [0,...]
Importance: low
reconnect.backoff.ms
The base amount of time to wait before attempting to reconnect to a given host.
This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all
connection attempts by the client to a broker.
Type: long
Default: 50
Valid Values: [0,...]
Importance: low
request.timeout.ms
The configuration controls the maximum amount of time the client will wait for the
response of a request. If the response is not received before the timeout elapses the
client will resend the request if necessary or fail the request if retries are exhausted.
Type: int
Default: 40000 (40 seconds)
Valid Values: [0,...]
Importance: low
retries
Setting a value greater than zero will cause the client to resend any request that fails
with a potentially transient error.
Type: int
Default: 0
Valid Values: [0,...,2147483647]
Importance: low
retry.backoff.ms
The amount of time to wait before attempting to retry a failed request to a given
topic partition. This avoids repeatedly sending requests in a tight loop under some
failure scenarios.
Type: long
Default: 100
Valid Values: [0,...]
Importance: low
rocksdb.config.setter
Type: class
Default: null
Valid Values:
Importance: low
send.buffer.bytes
The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value
is -1, the OS default will be used.
Type: int
Default: 131072 (128 kibibytes)
Valid Values: [-1,...]
Importance: low
state.cleanup.delay.ms
The amount of time in milliseconds to wait before deleting state when a partition has
migrated. Only state directories that have not been modified for at
least state.cleanup.delay.ms will be removed
Type: long
Default: 600000 (10 minutes)
Valid Values:
Importance: low
upgrade.from
Type: string
Default: null
Valid Values: [null, 0.10.0, 0.10.1, 0.10.2, 0.11.0, 1.0, 1.1, 2.0, 2.1, 2.2, 2.3]
Importance: low
windowstore.changelog.additional.retention.ms
Added to a window's maintainMs to ensure data is not deleted from the log
prematurely. Allows for clock drift. Default is 1 day.
Type: long
Default: 86400000 (1 day)
Valid Values:
Importance: low
bootstrap.servers
A list of host/port pairs to use for establishing the initial connection to the Kafka
cluster. The client will make use of all servers irrespective of which servers are
specified here for bootstrapping—this list only impacts the initial hosts used to
discover the full set of servers. This list should be in the
form host1:port1,host2:port2,.... Since these servers are just used for the initial
connection to discover the full cluster membership (which may change dynamically),
this list need not contain the full set of servers (you may want more than one,
though, in case a server is down).
Type: list
Default:
Valid Values:
Importance: high
ssl.key.password
The password of the private key in the key store file. This is optional for client.
Type: password
Default: null
Valid Values:
Importance: high
ssl.keystore.location
The location of the key store file. This is optional for client and can be used for two-
way authentication for client.
Type: string
Default: null
Valid Values:
Importance: high
ssl.keystore.password
The store password for the key store file. This is optional for client and only needed
if ssl.keystore.location is configured.
Type: password
Default: null
Valid Values:
Importance: high
ssl.truststore.location
The location of the trust store file.
Type: string
Default: null
Valid Values:
Importance: high
ssl.truststore.password
The password for the trust store file. If a password is not set access to the truststore
is still available, but integrity checking is disabled.
Type: password
Default: null
Valid Values:
Importance: high
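To show how the keystore and truststore settings above are typically used together, here is a minimal sketch of a client configured for two-way TLS authentication (an AdminClient is used purely for illustration; the paths, passwords, and broker address are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class SslClientExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9093");    // placeholder
        props.put(AdminClientConfig.SECURITY_PROTOCOL_CONFIG, "SSL");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks"); // broker CA certs
        props.put("ssl.truststore.password", "truststore-secret");
        props.put("ssl.keystore.location", "/etc/kafka/client.keystore.jks");     // enables two-way auth
        props.put("ssl.keystore.password", "keystore-secret");
        props.put("ssl.key.password", "key-secret");

        try (AdminClient admin = AdminClient.create(props)) {
            System.out.println("Cluster id: " + admin.describeCluster().clusterId().get());
        }
    }
}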
client.dns.lookup
Controls how the client uses DNS lookups. If set to use_all_dns_ips, connect to each
returned IP address in sequence until a successful connection is established. After a
disconnection, the next IP is used. Once all IPs have been used once, the client
resolves the IP(s) from the hostname again (both the JVM and the OS cache DNS
name lookups, however). If set to resolve_canonical_bootstrap_servers_only ,
resolve each bootstrap address into a list of canonical names. After the bootstrap
phase, this behaves the same as use_all_dns_ips. If set to default (deprecated),
attempt to connect to the first IP address returned by the lookup, even if the lookup
returns multiple IP addresses.
Type: string
Default: use_all_dns_ips
Valid Values: [default, use_all_dns_ips, resolve_canonical_bootstrap_servers_only]
Importance: medium
client.id
An id string to pass to the server when making requests. The purpose of this is to be
able to track the source of requests beyond just ip/port by allowing a logical
application name to be included in server-side request logging.
Type: string
Default: ""
Valid Values:
Importance: medium
connections.max.idle.ms
Close idle connections after the number of milliseconds specified by this config.
Type: long
Default: 300000 (5 minutes)
Valid Values:
Importance: medium
default.api.timeout.ms
Specifies the timeout (in milliseconds) for client APIs. This configuration is used as
the default timeout for all client operations that do not specify a timeout parameter.
Type: int
Default: 60000 (1 minute)
Valid Values: [0,...]
Importance: medium
receive.buffer.bytes
The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the
value is -1, the OS default will be used.
Type: int
Default: 65536 (64 kibibytes)
Valid Values: [-1,...]
Importance: medium
request.timeout.ms
The configuration controls the maximum amount of time the client will wait for the
response of a request. If the response is not received before the timeout elapses the
client will resend the request if necessary or fail the request if retries are exhausted.
Type: int
Default: 30000 (30 seconds)
Valid Values: [0,...]
Importance: medium
sasl.client.callback.handler.class
The fully qualified name of a SASL client callback handler class that implements the
AuthenticateCallbackHandler interface.
Type: class
Default: null
Valid Values:
Importance: medium
sasl.jaas.config
JAAS login context parameters for SASL connections in the format used by JAAS
configuration files. JAAS configuration file format is described here. The format for
the value is: 'loginModuleClass controlFlag (optionName=optionValue)*; '. For
brokers, the config must be prefixed with listener prefix and SASL mechanism name
in lower-case. For example, listener.name.sasl_ssl.scram-sha-
256.sasl.jaas.config=com.example.ScramLoginModule required;
Type: password
Default: null
Valid Values:
Importance: medium
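As an illustration of the JAAS value format described above, a client using SASL/PLAIN over TLS could be configured as in the following sketch (the broker address and credentials are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class SaslClientExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9094");   // placeholder
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        // 'loginModuleClass controlFlag (optionName=optionValue)*;' format described above.
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"client\" password=\"client-secret\";");

        try (AdminClient admin = AdminClient.create(props)) {
            System.out.println("Brokers: " + admin.describeCluster().nodes().get().size());
        }
    }
}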
sasl.kerberos.service.name
The Kerberos principal name that Kafka runs as. This can be defined either in Kafka's
JAAS config or in Kafka's config.
Type: string
Default: null
Valid Values:
Importance: medium
sasl.login.callback.handler.class
The fully qualified name of a SASL login callback handler class that implements the
AuthenticateCallbackHandler interface. For brokers, login callback handler config
must be prefixed with listener prefix and SASL mechanism name in lower-case. For
example, listener.name.sasl_ssl.scram-sha-
256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHa
ndler
Type: class
Default: null
Valid Values:
Importance: medium
sasl.login.class
The fully qualified name of a class that implements the Login interface. For brokers,
login config must be prefixed with listener prefix and SASL mechanism name in
lower-case. For example, listener.name.sasl_ssl.scram-sha-
256.sasl.login.class=com.example.CustomScramLogin
Type: class
Default: null
Valid Values:
Importance: medium
sasl.mechanism
SASL mechanism used for client connections. This may be any mechanism for
which a security provider is available. GSSAPI is the default mechanism.
Type: string
Default: GSSAPI
Valid Values:
Importance: medium
security.protocol
Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL,
SASL_PLAINTEXT, SASL_SSL.
Type: string
Default: PLAINTEXT
Valid Values:
Importance: medium
send.buffer.bytes
The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value
is -1, the OS default will be used.
Type: int
Default: 131072 (128 kibibytes)
Valid Values: [-1,...]
Importance: medium
ssl.enabled.protocols
The list of protocols enabled for SSL connections. The default is 'TLSv1.2,TLSv1.3'
when running with Java 11 or newer, 'TLSv1.2' otherwise. With the default value for
Java 11, clients and servers will prefer TLSv1.3 if both support it and fallback to
TLSv1.2 otherwise (assuming both support at least TLSv1.2). This default should be
fine for most cases. Also see the config documentation for `ssl.protocol`.
Type: list
Default: TLSv1.2
Valid Values:
Importance: medium
ssl.keystore.type
The file format of the key store file. This is optional for client.
Type: string
Default: JKS
Valid Values:
Importance: medium
ssl.protocol
The SSL protocol used to generate the SSLContext. The default is 'TLSv1.3' when
running with Java 11 or newer, 'TLSv1.2' otherwise. This value should be fine for
most use cases. Allowed values in recent JVMs are 'TLSv1.2' and 'TLSv1.3'. 'TLS',
'TLSv1.1', 'SSL', 'SSLv2' and 'SSLv3' may be supported in older JVMs, but their usage
is discouraged due to known security vulnerabilities. With the default value for this
config and 'ssl.enabled.protocols', clients will downgrade to 'TLSv1.2' if the server
does not support 'TLSv1.3'. If this config is set to 'TLSv1.2', clients will not use
'TLSv1.3' even if it is one of the values in ssl.enabled.protocols and the server only
supports 'TLSv1.3'.
Type: string
Default: TLSv1.2
Valid Values:
Importance: medium
ssl.provider
The name of the security provider used for SSL connections. Default value is the
default security provider of the JVM.
Type: string
Default: null
Valid Values:
Importance: medium
ssl.truststore.type
Type: string
Default: JKS
Valid Values:
Importance: medium
metadata.max.age.ms
The period of time in milliseconds after which we force a refresh of metadata even if
we haven't seen any partition leadership changes to proactively discover any new
brokers or partitions.
Type: long
Default: 300000 (5 minutes)
Valid Values: [0,...]
Importance: low
metric.reporters
Type: list
Default: ""
Valid Values:
Importance: low
metrics.num.samples
Type: int
Default: 2
Valid Values: [1,...]
Importance: low
metrics.recording.level
Type: string
Default: INFO
Valid Values: [INFO, DEBUG]
Importance: low
metrics.sample.window.ms
Type: long
Default: 30000 (30 seconds)
Valid Values: [0,...]
Importance: low
reconnect.backoff.max.ms
Type: long
Default: 1000 (1 second)
Valid Values: [0,...]
Importance: low
reconnect.backoff.ms
The base amount of time to wait before attempting to reconnect to a given host.
This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all
connection attempts by the client to a broker.
Type: long
Default: 50
Valid Values: [0,...]
Importance: low
retries
Setting a value greater than zero will cause the client to resend any request that fails
with a potentially transient error.
Type: int
Default: 2147483647
Valid Values: [0,...,2147483647]
Importance: low
retry.backoff.ms
The amount of time to wait before attempting to retry a failed request. This avoids
repeatedly sending requests in a tight loop under some failure scenarios.
Type: long
Default: 100
Valid Values: [0,...]
Importance: low
sasl.kerberos.kinit.cmd
Type: string
Default: /usr/bin/kinit
Valid Values:
Importance: low
sasl.kerberos.min.time.before.relogin
Type: long
Default: 60000
Valid Values:
Importance: low
sasl.kerberos.ticket.renew.jitter
Type: double
Default: 0.05
Valid Values:
Importance: low
sasl.kerberos.ticket.renew.window.factor
Login thread will sleep until the specified window factor of time from last refresh to
ticket's expiry has been reached, at which time it will try to renew the ticket.
Type: double
Default: 0.8
Valid Values:
Importance: low
sasl.login.refresh.buffer.seconds
The amount of buffer time before credential expiration to maintain when refreshing a
credential, in seconds. If a refresh would otherwise occur closer to expiration than
the number of buffer seconds then the refresh will be moved up to maintain as much
of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a
default value of 300 (5 minutes) is used if no value is specified. This value and
sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the
remaining lifetime of a credential. Currently applies only to OAUTHBEARER.
Type: short
Default: 300
Valid Values: [0,...,3600]
Importance: low
sasl.login.refresh.min.period.seconds
The desired minimum time for the login refresh thread to wait before refreshing a
credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default
value of 60 (1 minute) is used if no value is specified. This value and
sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the
remaining lifetime of a credential. Currently applies only to OAUTHBEARER.
Type: short
Default: 60
Valid Values: [0,...,900]
Importance: low
sasl.login.refresh.window.factor
Login refresh thread will sleep until the specified window factor relative to the
credential's lifetime has been reached, at which time it will try to refresh the
credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default
value of 0.8 (80%) is used if no value is specified. Currently applies only to
OAUTHBEARER.
Type: double
Default: 0.8
Valid Values: [0.5,...,1.0]
Importance: low
sasl.login.refresh.window.jitter
The maximum amount of random jitter relative to the credential's lifetime that is
added to the login refresh thread's sleep time. Legal values are between 0 and 0.25
(25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently
applies only to OAUTHBEARER.
Type: double
Default: 0.05
Valid Values: [0.0,...,0.25]
Importance: low
security.providers
A list of configurable creator classes each returning a provider implementing
security algorithms. These classes should implement
the org.apache.kafka.common.security.auth.SecurityProviderCreator interface.
Type: string
Default: null
Valid Values:
Importance: low
ssl.cipher.suites
Type: list
Default: null
Valid Values:
Importance: low
ssl.endpoint.identification.algorithm
Type: string
Default: https
Valid Values:
Importance: low
ssl.engine.factory.class
Type: class
Default: null
Valid Values:
Importance: low
ssl.keymanager.algorithm
The algorithm used by key manager factory for SSL connections. Default value is the
key manager factory algorithm configured for the Java Virtual Machine.
Type: string
Default: SunX509
Valid Values:
Importance: low
ssl.secure.random.implementation
Type: string
Default: null
Valid Values:
Importance: low
ssl.trustmanager.algorithm
The algorithm used by trust manager factory for SSL connections. Default value is
the trust manager factory algorithm configured for the Java Virtual Machine.
Type: string
Default: PKIX
Valid Values:
Importance: low
4. DESIGN
4.1 Motivation
We designed Kafka to be able to act as a unified platform for handling all the real-time data
feeds a large company might have. To do this we had to think through a fairly broad set of
use cases.
It would have to have high-throughput to support high volume event streams such as real-
time log aggregation.
It would need to deal gracefully with large data backlogs to be able to support periodic data
loads from offline systems.
It also meant the system would have to handle low-latency delivery to handle more
traditional messaging use-cases.
Finally in cases where the stream is fed into other data systems for serving, we knew the
system would have to be able to guarantee fault-tolerance in the presence of machine
failures.
Supporting these uses led us to a design with a number of unique elements, more akin to a
database log than a traditional messaging system. We will outline some elements of the
design in the following sections.
4.2 Persistence
Kafka relies heavily on the filesystem for storing and caching messages. There is a general
perception that "disks are slow" which makes people skeptical that a persistent structure
can offer competitive performance. In fact disks are both much slower and much faster
than people expect depending on how they are used; and a properly designed disk structure
can often be as fast as the network.
The key fact about disk performance is that the throughput of hard drives has been
diverging from the latency of a disk seek for the last decade. As a result the performance of
linear writes on a JBOD configuration of six 7200rpm SATA drives in a RAID-5 array is about
600MB/sec but the performance of random writes is only about 100k/sec—a difference of
over 6000X. These linear reads and writes are the most predictable of all usage patterns,
and are heavily optimized by the operating system. A modern operating system provides
read-ahead and write-behind techniques that prefetch data in large block multiples and
group smaller logical writes into large physical writes. A further discussion of this issue can
be found in this ACM Queue article; they actually find that sequential disk access can in
some cases be faster than random memory access!
To compensate for this performance divergence, modern operating systems have become
increasingly aggressive in their use of main memory for disk caching. A modern OS will
happily divert all free memory to disk caching with little performance penalty when the
memory is reclaimed. All disk reads and writes will go through this unified cache. This
feature cannot easily be turned off without using direct I/O, so even if a process maintains
an in-process cache of the data, this data will likely be duplicated in OS pagecache,
effectively storing everything twice.
Furthermore, we are building on top of the JVM, and anyone who has spent any time with
Java memory usage knows two things:
1. The memory overhead of objects is very high, often doubling the size of the data
stored (or worse).
2. Java garbage collection becomes increasingly fiddly and slow as the in-heap data
increases.
As a result of these factors using the filesystem and relying on pagecache is superior to
maintaining an in-memory cache or other structure—we at least double the available cache
by having automatic access to all free memory, and likely double again by storing a
compact byte structure rather than individual objects. Doing so will result in a cache of up
to 28-30GB on a 32GB machine without GC penalties. Furthermore, this cache will stay
warm even if the service is restarted, whereas the in-process cache will need to be rebuilt in
memory (which for a 10GB cache may take 10 minutes) or else it will need to start with a
completely cold cache (which likely means terrible initial performance). This also greatly
simplifies the code as all logic for maintaining coherency between the cache and filesystem
is now in the OS, which tends to do so more efficiently and more correctly than one-off in-
process attempts. If your disk usage favors linear reads then read-ahead is effectively pre-
populating this cache with useful data on each disk read.
This suggests a design which is very simple: rather than maintain as much as possible in-
memory and flush it all out to the filesystem in a panic when we run out of space, we invert
that. All data is immediately written to a persistent log on the filesystem without necessarily
flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.
The persistent data structures used in messaging systems are often per-consumer queues
with an associated BTree or other general-purpose random access data structures to
maintain metadata about messages. BTrees are the most versatile data structure available,
and make it possible to support a wide variety of transactional and non-transactional
semantics in the messaging system. They do come with a fairly high cost, though: Btree
operations are O(log N). Normally O(log N) is considered essentially equivalent to constant
time, but this is not true for disk operations. Disk seeks come at 10 ms a pop, and each disk
can do only one seek at a time so parallelism is limited. Hence even a handful of disk seeks
leads to very high overhead. Since storage systems mix very fast cached operations with
very slow physical disk operations, the observed performance of tree structures is often
superlinear as data increases with fixed cache--i.e. doubling your data makes things much
worse than twice as slow.
Intuitively a persistent queue could be built on simple reads and appends to files as is
commonly the case with logging solutions. This structure has the advantage that all
operations are O(1) and reads do not block writes or each other. This has obvious
performance advantages since the performance is completely decoupled from the data size
—one server can now take full advantage of a number of cheap, low-rotational speed 1+TB
SATA drives. Though they have poor seek performance, these drives have acceptable
performance for large reads and writes and come at 1/3 the price and 3x the capacity.
Having access to virtually unlimited disk space without any performance penalty means
that we can provide some features not usually found in a messaging system. For example,
in Kafka, instead of attempting to delete messages as soon as they are consumed, we can
retain messages for a relatively long period (say a week). This leads to a great deal of
flexibility for consumers, as we will describe.
4.3 Efficiency
We have put significant effort into efficiency. One of our primary use cases is handling web
activity data, which is very high volume: each page view may generate dozens of writes.
Furthermore, we assume each message published is read by at least one consumer (often
many), hence we strive to make consumption as cheap as possible.
We have also found, from experience building and running a number of similar systems,
that efficiency is a key to effective multi-tenant operations. If the downstream infrastructure
service can easily become a bottleneck due to a small bump in usage by the application,
such small changes will often create problems. By being very fast we help ensure that the
application will tip-over under load before the infrastructure. This is particularly important
when trying to run a centralized service that supports dozens or hundreds of applications
on a centralized cluster as changes in usage patterns are a near-daily occurrence.
We discussed disk efficiency in the previous section. Once poor disk access patterns have
been eliminated, there are two common causes of inefficiency in this type of system: too
many small I/O operations, and excessive byte copying.
The small I/O problem happens both between the client and the server and in the server's
own persistent operations.
To avoid this, our protocol is built around a "message set" abstraction that naturally groups
messages together. This allows network requests to group messages together and
amortize the overhead of the network roundtrip rather than sending a single message at a
time. The server in turn appends chunks of messages to its log in one go, and the consumer
fetches large linear chunks at a time.
This simple optimization produces an orders-of-magnitude speedup. Batching leads to larger
network packets, larger sequential disk operations, contiguous memory blocks, and so on,
all of which allows Kafka to turn a bursty stream of random message writes into linear
writes that flow to the consumers.
The other inefficiency is in byte copying. At low message rates this is not an issue, but
under load the impact is significant. To avoid this we employ a standardized binary
message format that is shared by the producer, the broker, and the consumer (so data
chunks can be transferred without modification between them).
The message log maintained by the broker is itself just a directory of files, each populated
by a sequence of message sets that have been written to disk in the same format used by
the producer and consumer. Maintaining this common format allows optimization of the
most important operation: network transfer of persistent log chunks. Modern unix operating
systems offer a highly optimized code path for transferring data out of pagecache to a
socket; in Linux this is done with the sendfile system call.
To understand the impact of sendfile, it is important to understand the common data path
for transfer of data from file to socket:
1. The operating system reads data from the disk into pagecache in kernel space
2. The application reads the data from kernel space into a user-space buffer
3. The application writes the data back into kernel space into a socket buffer
4. The operating system copies the data from the socket buffer to the NIC buffer where
it is sent over the network
This is clearly inefficient: there are four copies and two system calls. Using sendfile, this re-
copying is avoided by allowing the OS to send the data from pagecache to the network
directly. So in this optimized path, only the final copy to the NIC buffer is needed.
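The same zero-copy path is available from the JVM: FileChannel.transferTo can delegate to sendfile on Linux, so a sketch like the following (file path, host, and port are made-up) hands file bytes to a socket without copying them through user space:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopyTransfer {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(Paths.get("/tmp/segment.log"),   // placeholder file
                                                 StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(
                     new InetSocketAddress("consumer-host", 9000))) {             // placeholder peer
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // transferTo may use sendfile(2), skipping the user-space copies entirely.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}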
We expect a common use case to be multiple consumers on a topic. Using the zero-copy
optimization above, data is copied into pagecache exactly once and reused on each
consumption instead of being stored in memory and copied out to user-space every time it
is read. This allows messages to be consumed at a rate that approaches the limit of the
network connection.
This combination of pagecache and sendfile means that on a Kafka cluster where the
consumers are mostly caught up you will see no read activity on the disks whatsoever as
they will be serving data entirely from cache.
For more background on the sendfile and zero-copy support in Java, see this article.
In some cases the bottleneck is actually not CPU or disk but network bandwidth. This is
particularly true for a data pipeline that needs to send messages between data centers over
a wide-area network. Of course, the user can always compress its messages one at a time
without any support needed from Kafka, but this can lead to very poor compression ratios
as much of the redundancy is due to repetition between messages of the same type (e.g.
field names in JSON or user agents in web logs or common string values). Efficient
compression requires compressing multiple messages together rather than compressing
each message individually.
Kafka supports this with an efficient batching format. A batch of messages can be clumped
together, compressed, and sent to the server in this form. This batch of messages will be
written in compressed form and will remain compressed in the log and will only be
decompressed by the consumer.
Kafka supports GZIP, Snappy, LZ4 and ZStandard compression protocols. More details on
compression can be found here.
4.4 The Producer
Load balancing
The producer sends data directly to the broker that is the leader for the partition without any
intervening routing tier. To help the producer do this all Kafka nodes can answer a request
for metadata about which servers are alive and where the leaders for the partitions of a
topic are at any given time to allow the producer to appropriately direct its requests.
The client controls which partition it publishes messages to. This can be done at random,
implementing a kind of random load balancing, or it can be done by some semantic
partitioning function. We expose the interface for semantic partitioning by allowing the user
to specify a key to partition by and using this to hash to a partition (there is also an option
to override the partition function if need be). For example if the key chosen was a user id
then all data for a given user would be sent to the same partition. This in turn will allow
consumers to make locality assumptions about their consumption. This style of partitioning
is explicitly designed to allow locality-sensitive processing in consumers.
Asynchronous send
Batching is one of the big drivers of efficiency, and to enable batching the Kafka producer
will attempt to accumulate data in memory and to send out larger batches in a single
request. The batching can be configured to accumulate no more than a fixed number of
messages and to wait no longer than some fixed latency bound (say 64k or 10 ms). This
allows the accumulation of more bytes to send, and a few larger I/O operations on the
servers. This buffering is configurable and gives a mechanism to trade off a small amount
of additional latency for better throughput.
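A rough sketch of this trade-off in the Java producer (topic, broker address, and sizes are placeholders): linger.ms bounds the added latency, batch.size caps the batch, and compression.type compresses whole batches as described in the Efficiency section.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");            // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);           // wait up to 10 ms to fill a batch
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);       // up to 64 KB per batch
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compress whole batches

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("payments", "user-" + i, "event-" + i));
            }
        } // close() flushes any remaining buffered batches
    }
}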
4.5 The Consumer
The Kafka consumer works by issuing "fetch" requests to the brokers leading the partitions
it wants to consume. The consumer specifies its offset in the log with each request and
receives back a chunk of log beginning from that position. The consumer thus has
significant control over this position and can rewind it to re-consume data if need be.
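A brief sketch of that position control with the Java consumer (topic, partition, broker address, and offset are illustrative): poll fetches a chunk from the current position, and seek rewinds it to re-consume earlier data.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SeekingConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        TopicPartition partition = new TopicPartition("payments", 0);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(partition));
            consumer.seek(partition, 42L);   // rewind to an earlier offset to re-consume
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}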
Push vs. pull
An initial question we considered is whether consumers should pull data from brokers or
brokers should push data to the consumer. In this respect Kafka follows a more traditional
design, shared by most messaging systems, where data is pushed to the broker from the
producer and pulled from the broker by the consumer. Some logging-centric systems, such
as Scribe and Apache Flume, follow a very different push-based path where data is pushed
downstream. There are pros and cons to both approaches. However, a push-based system
has difficulty dealing with diverse consumers as the broker controls the rate at which data
is transferred. The goal is generally for the consumer to be able to consume at the
maximum possible rate; unfortunately, in a push system this means the consumer tends to
be overwhelmed when its rate of consumption falls below the rate of production (a denial of
service attack, in essence). A pull-based system has the nicer property that the consumer
simply falls behind and catches up when it can. This can be mitigated with some kind of
backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate
of transfer to fully utilize (but never over-utilize) the consumer is trickier than it seems.
Previous attempts at building systems in this fashion led us to go with a more traditional
pull model.
You could imagine other possible designs which would be only pull, end-to-end. The
producer would locally write to a local log, and brokers would pull from that with consumers
pulling from them. A similar type of "store-and-forward" producer is often proposed. This is
intriguing but we felt not very suitable for our target use cases which have thousands of
producers. Our experience running persistent data systems at scale led us to feel that
involving thousands of disks in the system across many applications would not actually
make things more reliable and would be a nightmare to operate. And in practice we have
found that we can run a pipeline with strong SLAs at large scale without a need for producer
persistence.
Consumer Position
Keeping track of what has been consumed is, surprisingly, one of the key performance
points of a messaging system.
Most messaging systems keep metadata about what messages have been consumed on
the broker. That is, as a message is handed out to a consumer, the broker either records
that fact locally immediately or it may wait for acknowledgement from the consumer. This
is a fairly intuitive choice, and indeed for a single machine server it is not clear where else
this state could go. Since the data structures used for storage in many messaging systems
scale poorly, this is also a pragmatic choice--since the broker knows what is consumed it
can immediately delete it, keeping the data size small.
What is perhaps not obvious is that getting the broker and consumer to come into
agreement about what has been consumed is not a trivial problem. If the broker records a
message as consumed immediately every time it is handed out over the network, then if the
consumer fails to process the message (say because it crashes or the request times out or
whatever) that message will be lost. To solve this problem, many messaging systems add
an acknowledgement feature which means that messages are only marked
as sent not consumed when they are sent; the broker waits for a specific acknowledgement
from the consumer to record the message as consumed. This strategy fixes the problem of
losing messages, but creates new problems. First of all, if the consumer processes the
message but fails before it can send an acknowledgement then the message will be
consumed twice. The second problem is around performance: now the broker must keep
multiple states about every single message (first to lock it so it is not given out a second
time, and then to mark it as permanently consumed so that it can be removed). Tricky
problems must be dealt with, like what to do with messages that are sent but never
acknowledged.
Kafka handles this differently. Our topic is divided into a set of totally ordered partitions,
each of which is consumed by exactly one consumer within each subscribing consumer
group at any given time. This means that the position of a consumer in each partition is just
a single integer, the offset of the next message to consume. This makes the state about
what has been consumed very small, just one number for each partition. This state can be
periodically checkpointed. This makes the equivalent of message acknowledgements very
cheap.
Offline Data Load
Scalable persistence allows for the possibility of consumers that only periodically consume
such as batch data loads that periodically bulk-load data into an offline system such as
Hadoop or a relational data warehouse.
In the case of Hadoop we parallelize the data load by splitting the load over individual map
tasks, one for each node/topic/partition combination, allowing full parallelism in the
loading. Hadoop provides the task management, and tasks which fail can restart without
danger of duplicate data—they simply restart from their original position.
4.6 Message Delivery Semantics
Now that we understand a little about how producers and consumers work, let's discuss the
semantic guarantees Kafka provides between producer and consumer. Clearly there are
multiple possible message delivery guarantees that could be provided: at most once, where
messages may be lost but are never redelivered; at least once, where messages are never lost
but may be redelivered; and exactly once, where each message is delivered once and only
once (this is what people actually want).
It's worth noting that this breaks down into two problems: the durability guarantees for
publishing a message and the guarantees when consuming a message.
Many systems claim to provide "exactly once" delivery semantics, but it is important to read
the fine print, most of these claims are misleading (i.e. they don't translate to the case
where consumers or producers can fail, cases where there are multiple consumer
processes, or cases where data written to disk can be lost).
Prior to 0.11.0.0, if a producer failed to receive a response indicating that a message was
committed, it had little choice but to resend the message. This provides at-least-once
delivery semantics since the message may be written to the log again during resending if
the original request had in fact succeeded. Since 0.11.0.0, the Kafka producer also supports
an idempotent delivery option which guarantees that resending will not result in duplicate
entries in the log. To achieve this, the broker assigns each producer an ID and deduplicates
messages using a sequence number that is sent by the producer along with every message.
Also beginning with 0.11.0.0, the producer supports the ability to send messages to
multiple topic partitions using transaction-like semantics: i.e. either all messages are
successfully written or none of them are. The main use case for this is exactly-once
processing between Kafka topics (described below).
Not all use cases require such strong guarantees. For uses which are latency sensitive we
allow the producer to specify the durability level it desires. If the producer specifies that it
wants to wait on the message being committed this can take on the order of 10 ms.
However the producer can also specify that it wants to perform the send completely
asynchronously or that it wants to wait only until the leader (but not necessarily the
followers) have the message.
Now let's describe the semantics from the point-of-view of the consumer. All replicas have
the exact same log with the same offsets. The consumer controls its position in this log. If
the consumer never crashed it could just store this position in memory, but if the consumer
fails and we want this topic partition to be taken over by another process the new process
will need to choose an appropriate position from which to start processing. Let's say the
consumer reads some messages -- it has several options for processing the messages and
updating its position.
1. It can read the messages, then save its position in the log, and finally process the
messages. In this case there is a possibility that the consumer process crashes after
saving its position but before saving the output of its message processing. In this
case the process that took over processing would start at the saved position even
though a few messages prior to that position had not been processed. This
corresponds to "at-most-once" semantics as in the case of a consumer failure
messages may not be processed.
2. It can read the messages, process the messages, and finally save its position. In this
case there is a possibility that the consumer process crashes after processing
messages but before saving its position. In this case when the new process takes
over the first few messages it receives will already have been processed. This
corresponds to the "at-least-once" semantics in the case of consumer failure. In
many cases messages have a primary key and so the updates are idempotent
(receiving the same message twice just overwrites a record with another copy of
itself).
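The second option above corresponds to the common at-least-once pattern of processing a batch and only then committing offsets; a minimal sketch with the Java consumer (topic and group are placeholders), assuming auto-commit is disabled:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "at-least-once-group");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("payments"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);          // apply the message to the output system first
                }
                // Commit only after processing: a crash before this line causes
                // re-delivery (at-least-once), never silent loss.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());   // placeholder for real processing
    }
}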
So what about exactly once semantics (i.e. the thing you actually want)? When consuming
from a Kafka topic and producing to another topic (as in a Kafka Streams application), we
can leverage the new transactional producer capabilities in 0.11.0.0 that were mentioned
above. The consumer's position is stored as a message in a topic, so we can write the
offset to Kafka in the same transaction as the output topics receiving the processed data. If
the transaction is aborted, the consumer's position will revert to its old value and the
produced data on the output topics will not be visible to other consumers, depending on
their "isolation level." In the default "read_uncommitted" isolation level, all messages are
visible to consumers even if they were part of an aborted transaction, but in
"read_committed," the consumer will only return messages from transactions which were
committed (and any messages which were not part of a transaction).
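A condensed sketch of that consume-transform-produce loop, pairing the transactional producer with a read_committed consumer (the ids, topics, and pass-through transform are placeholders):

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceRelay {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "broker1:9092");            // placeholder
        cProps.put("group.id", "relay-group");
        cProps.put("enable.auto.commit", "false");
        cProps.put("isolation.level", "read_committed");            // hide aborted transactions
        cProps.put("key.deserializer", StringDeserializer.class.getName());
        cProps.put("value.deserializer", StringDeserializer.class.getName());

        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "broker1:9092");            // placeholder
        pProps.put("transactional.id", "relay-1");                  // enables transactions
        pProps.put("key.serializer", StringSerializer.class.getName());
        pProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(Collections.singletonList("input"));
            producer.initTransactions();
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;
                producer.beginTransaction();
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> r : records) {
                    producer.send(new ProducerRecord<>("output", r.key(), r.value()));
                    offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1));
                }
                // Consumed offsets are committed in the same transaction as the output records.
                producer.sendOffsetsToTransaction(offsets, "relay-group");
                producer.commitTransaction();
            }
        }
    }
}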
When writing to an external system, the limitation is in the need to coordinate the
consumer's position with what is actually stored as output. The classic way of achieving
this would be to introduce a two-phase commit between the storage of the consumer
position and the storage of the consumers output. But this can be handled more simply and
generally by letting the consumer store its offset in the same place as its output. This is
better because many of the output systems a consumer might want to write to will not
support a two-phase commit. As an example of this, consider a Kafka Connect connector
which populates data in HDFS along with the offsets of the data it reads so that it is
guaranteed that either data and offsets are both updated or neither is. We follow similar
patterns for many other data systems which require these stronger semantics and for
which the messages do not have a primary key to allow for deduplication.
So effectively Kafka supports exactly-once delivery in Kafka Streams, and the transactional
producer/consumer can be used generally to provide exactly-once delivery when
transferring and processing data between Kafka topics. Exactly-once delivery for other
destination systems generally requires cooperation with such systems, but Kafka provides
the offset which makes implementing this feasible (see also Kafka Connect). Otherwise,
Kafka guarantees at-least-once delivery by default, and allows the user to implement at-
most-once delivery by disabling retries on the producer and committing offsets in the
consumer prior to processing a batch of messages.
4.7 Replication
Kafka replicates the log for each topic's partitions across a configurable number of servers
(you can set this replication factor on a topic-by-topic basis). This allows automatic failover
to these replicas when a server in the cluster fails so messages remain available in the
presence of failures.
Other messaging systems provide some replication-related features, but, in our (totally
biased) opinion, this appears to be a tacked-on thing, not heavily used, and with large
downsides: replicas are inactive, throughput is heavily impacted, it requires fiddly manual
configuration, etc. Kafka is meant to be used with replication by default—in fact we
implement un-replicated topics as replicated topics where the replication factor is one.
The unit of replication is the topic partition. Under non-failure conditions, each partition in
Kafka has a single leader and zero or more followers. The total number of replicas including
the leader constitute the replication factor. All reads and writes go to the leader of the
partition. Typically, there are many more partitions than brokers and the leaders are evenly
distributed among brokers. The logs on the followers are identical to the leader's log—all
have the same offsets and messages in the same order (though, of course, at any given
time the leader may have a few as-yet unreplicated messages at the end of its log).
Followers consume messages from the leader just as a normal Kafka consumer would and
apply them to their own log. Having the followers pull from the leader has the nice property
of allowing the follower to naturally batch together log entries they are applying to their log.
As with most distributed systems automatically handling failures requires having a precise
definition of what it means for a node to be "alive". For Kafka, node liveness has two
conditions:
1. A node must be able to maintain its session with ZooKeeper (via ZooKeeper's
heartbeat mechanism)
2. If it is a follower it must replicate the writes happening on the leader and not fall "too
far" behind
We refer to nodes satisfying these two conditions as being "in sync" to avoid the vagueness
of "alive" or "failed". The leader keeps track of the set of "in sync" nodes. If a follower dies,
gets stuck, or falls behind, the leader will remove it from the list of in sync replicas. The
determination of stuck and lagging replicas is controlled by the replica.lag.time.max.ms
configuration.
We can now more precisely define that a message is considered committed when all in
sync replicas for that partition have applied it to their log. Only committed messages are
ever given out to the consumer. This means that the consumer need not worry about
potentially seeing a message that could be lost if the leader fails. Producers, on the other
hand, have the option of either waiting for the message to be committed or not, depending
on their preference for tradeoff between latency and durability. This preference is controlled
by the acks setting that the producer uses. Note that topics have a setting for the "minimum
number" of in-sync replicas that is checked when the producer requests acknowledgment
that a message has been written to the full set of in-sync replicas. If a less stringent
acknowledgement is requested by the producer, then the message can be committed, and
consumed, even if the number of in-sync replicas is lower than the minimum (e.g. it can be
as low as just the leader).
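To make the interplay concrete, the following sketch (topic name, sizes, and broker address are placeholders) creates a topic with replication factor 3 and min.insync.replicas=2, then writes to it with acks=all so the producer only receives an acknowledgement once at least two in-sync replicas have the message:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableTopicExample {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put("bootstrap.servers", "broker1:9092");                 // placeholder

        try (AdminClient admin = AdminClient.create(adminProps)) {
            NewTopic topic = new NewTopic("durable-payments", 6, (short) 3)  // 6 partitions, RF 3
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "broker1:9092");              // placeholder
        producerProps.put("acks", "all");                   // wait for the full in-sync replica set
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("durable-payments", "key", "value")).get();
        }
    }
}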
The guarantee that Kafka offers is that a committed message will not be lost, as long as
there is at least one in sync replica alive, at all times.
Kafka will remain available in the presence of node failures after a short fail-over period, but
may not remain available in the presence of network partitions.
At its heart a Kafka partition is a replicated log. The replicated log is one of the most basic
primitives in distributed data systems, and there are many approaches for implementing
one. A replicated log can be used by other systems as a primitive for implementing other
distributed systems in the state-machine style.
A replicated log models the process of coming into consensus on the order of a series of
values (generally numbering the log entries 0, 1, 2, ...). There are many ways to implement
this, but the simplest and fastest is with a leader who chooses the ordering of values
provided to it. As long as the leader remains alive, all followers need to only copy the values
and ordering the leader chooses.
Of course if leaders didn't fail we wouldn't need followers! When the leader does die we
need to choose a new leader from among the followers. But followers themselves may fall
behind or crash so we must ensure we choose an up-to-date follower. The fundamental
guarantee a log replication algorithm must provide is that if we tell the client a message is
committed, and the leader fails, the new leader we elect must also have that message. This
yields a tradeoff: if the leader waits for more followers to acknowledge a message before
declaring it committed then there will be more potentially electable leaders.
If you choose the number of acknowledgements required and the number of logs that must
be compared to elect a leader such that there is guaranteed to be an overlap, then this is
called a Quorum.
A common approach to this tradeoff is to use a majority vote for both the commit decision
and the leader election. This is not what Kafka does, but let's explore it anyway to
understand the tradeoffs. Let's say we have 2f+1 replicas. If f+1 replicas must receive a
message prior to a commit being declared by the leader, and if we elect a new leader by
electing the follower with the most complete log from at least f+1 replicas, then, with no
more than f failures, the leader is guaranteed to have all committed messages. This is
because among any f+1 replicas, there must be at least one replica that contains all
committed messages. That replica's log will be the most complete and therefore will be
selected as the new leader. There are many remaining details that each algorithm must
handle (such as precisely defining what makes a log more complete, ensuring log
consistency during leader failure or changing the set of servers in the replica set) but we will
ignore these for now.
This majority vote approach has a very nice property: the latency is dependent on only the
fastest servers. That is, if the replication factor is three, the latency is determined by the
faster follower, not the slower one.
The downside of majority vote is that it doesn't take many failures to leave you with no
electable leaders. To tolerate one failure requires three copies of the data, and to tolerate
two failures requires five copies of the data. In our experience having only enough
redundancy to tolerate a single failure is not enough for a practical system, but doing every
write five times, with 5x the disk space requirements and 1/5th the throughput, is not very
practical for large volume data problems. This is likely why quorum algorithms more
commonly appear for shared cluster configuration such as ZooKeeper but are less common
for primary data storage. For example in HDFS the namenode's high-availability feature is
built on a majority-vote-based journal, but this more expensive approach is not used for the
data itself.
Kafka takes a slightly different approach to choosing its quorum set. Instead of majority
vote, Kafka dynamically maintains a set of in-sync replicas (ISR) that are caught-up to the
leader. Only members of this set are eligible for election as leader. A write to a Kafka
partition is not considered committed until all in-sync replicas have received the write. This
ISR set is persisted to ZooKeeper whenever it changes. Because of this, any replica in the
ISR is eligible to be elected leader. This is an important factor for Kafka's usage model
where there are many partitions and ensuring leadership balance is important. With this ISR
model and f+1 replicas, a Kafka topic can tolerate f failures without losing committed
messages.
For most use cases we hope to handle, we think this tradeoff is a reasonable one. In
practice, to tolerate f failures, both the majority vote and the ISR approach will wait for the
same number of replicas to acknowledge before committing a message (e.g. to survive one
failure a majority quorum needs three replicas and one acknowledgement and the ISR
approach requires two replicas and one acknowledgement). The ability to commit without
the slowest servers is an advantage of the majority vote approach. However, we think it is
ameliorated by allowing the client to choose whether they block on the message commit or
not, and the additional throughput and disk space due to the lower required replication
factor is worth it.
Another important design distinction is that Kafka does not require that crashed nodes
recover with all their data intact. It is not uncommon for replication algorithms in this space
to depend on the existence of "stable storage" that cannot be lost in any failure-recovery
scenario without potential consistency violations. There are two primary problems with this
assumption. First, disk errors are the most common problem we observe in real operation
of persistent data systems and they often do not leave data intact. Secondly, even if this
were not a problem, we do not want to require the use of fsync on every write for our
consistency guarantees as this can reduce performance by two to three orders of
magnitude. Our protocol for allowing a replica to rejoin the ISR ensures that before rejoining,
it must fully re-sync again even if it lost unflushed data in its crash.
Note that Kafka's guarantee with respect to data loss is predicated on at least one replica
remaining in sync. If all the nodes replicating a partition die, this guarantee no longer holds.
However a practical system needs to do something reasonable when all the replicas die. If
you are unlucky enough to have this occur, it is important to consider what will happen.
There are two behaviors that could be implemented:
1. Wait for a replica in the ISR to come back to life and choose this replica as the leader
(hopefully it still has all its data).
2. Choose the first replica (not necessarily in the ISR) that comes back to life as the
leader.
This is a simple tradeoff between availability and consistency. If we wait for replicas in the
ISR, then we will remain unavailable as long as those replicas are down. If such replicas
were destroyed or their data was lost, then we are permanently down. If, on the other hand,
a non-in-sync replica comes back to life and we allow it to become leader, then its log
becomes the source of truth even though it is not guaranteed to have every committed
message. By default from version 0.11.0.0, Kafka chooses the first strategy and favors waiting for a consistent replica. This behavior can be changed using the configuration property unclean.leader.election.enable, to support use cases where uptime is preferable to consistency.
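As a sketch (the topic name is a placeholder), the property can be overridden per topic with kafka-configs.sh for workloads that prefer uptime:
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config unclean.leader.election.enable=true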
This dilemma is not specific to Kafka. It exists in any quorum-based scheme. For example
in a majority voting scheme, if a majority of servers suffer a permanent failure, then you
must either choose to lose 100% of your data or violate consistency by taking what remains
on an existing server as your new source of truth.
When writing to Kafka, producers can choose whether they wait for the message to be acknowledged by 0, 1, or all (-1) replicas. Note that "acknowledgement by all replicas" does not guarantee that the full set of assigned replicas have received the message. By default, when acks=all, acknowledgement happens as soon as all the current in-sync replicas have received the message. For example, if a topic is configured with only two replicas and one fails (i.e., only one in-sync replica remains), then writes that specify acks=all will succeed. However, these writes could be lost if the remaining replica also fails. Although this ensures maximum availability of the partition, this behavior may be undesirable to some users who prefer durability over availability. Therefore, we provide two topic-level configurations that can be used to prefer message durability over availability (a configuration sketch follows the list below):
1. Disable unclean leader election - if all replicas become unavailable, then the partition
will remain unavailable until the most recent leader becomes available again. This
effectively prefers unavailability over the risk of message loss. See the previous
section on Unclean Leader Election for clarification.
2. Specify a minimum ISR size - the partition will only accept writes if the size of the ISR
is above a certain minimum, in order to prevent the loss of messages that were
written to just a single replica, which subsequently becomes unavailable. This
setting only takes effect if the producer uses acks=all and guarantees that the
message will be acknowledged by at least this many in-sync replicas. This setting
offers a trade-off between consistency and availability. A higher setting for minimum
ISR size guarantees better consistency since the message is guaranteed to be
written to more replicas which reduces the probability that it will be lost. However, it
reduces availability since the partition will be unavailable for writes if the number of
in-sync replicas drops below the minimum threshold.
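As a sketch of how these two settings are typically combined (topic name and values are illustrative), a durability-oriented topic might be created as follows, with producers configured to use acks=all:
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-durable-topic --partitions 3 --replication-factor 3 --config min.insync.replicas=2 --config unclean.leader.election.enable=false
With this configuration, a write sent with acks=all is only acknowledged once at least two in-sync replicas have received it, and the partition rejects writes whenever the in-sync replica set shrinks below two.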
Replica Management
The above discussion on replicated logs really covers only a single log, i.e. one topic
partition. However a Kafka cluster will manage hundreds or thousands of these partitions.
We attempt to balance partitions within a cluster in a round-robin fashion to avoid clustering
all partitions for high-volume topics on a small number of nodes. Likewise we try to balance
leadership so that each node is the leader for a proportional share of its partitions.
It is also important to optimize the leadership election process as that is the critical window
of unavailability. A naive implementation of leader election would end up running an
election per partition for all partitions a node hosted when that node failed. Instead, we
elect one of the brokers as the "controller". This controller detects failures at the broker
level and is responsible for changing the leader of all affected partitions in a failed broker.
The result is that we are able to batch together many of the required leadership change
notifications which makes the election process far cheaper and faster for a large number of
partitions. If the controller fails, one of the surviving brokers will become the new controller.
4.8 Log Compaction
Log compaction ensures that Kafka will always retain at least the last known value for each
message key within the log of data for a single topic partition. It addresses use cases and
scenarios such as restoring state after application crashes or system failure, or reloading
caches after application restarts during operational maintenance. Let's dive into these use
cases in more detail and then describe how compaction works.
So far we have described only the simpler approach to data retention where old log data is
discarded after a fixed period of time or when the log reaches some predetermined size.
This works well for temporal event data such as logging where each record stands alone.
However an important class of data streams are the log of changes to keyed, mutable data
(for example, the changes to a database table).
Let's discuss a concrete example of such a stream. Say we have a topic containing user
email addresses; every time a user updates their email address we send a message to this
topic using their user id as the primary key. Now say we send the following messages over
some time period for a user with id 123, each message corresponding to a change in email
address (messages for other ids are omitted):
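(For illustration only; the addresses below are hypothetical.)
123 => bill@example-a.com
123 => bill@example-b.org
123 => bill@example-c.net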
Log compaction gives us a more granular retention mechanism so that we are guaranteed to retain at least the last update for each primary key (e.g. the latest address for user 123 above). By doing this we guarantee that the log contains a full snapshot of the final value for every key, not just keys that changed recently. This means downstream consumers can restore their own state off this topic without us having to retain a complete log of all changes.
Log compaction is useful in a number of scenarios, such as subscribing to database changes, event sourcing, and journaling for high availability. In each of these cases one needs primarily to handle the real-time feed of changes, but occasionally, when a machine crashes or data needs to be re-loaded or re-processed, one needs to do a full load. Log compaction allows feeding both of these use cases off the
same backing topic. This style of usage of a log is described in more detail in this blog post.
The general idea is quite simple. If we had infinite log retention, and we logged each change
in the above cases, then we would have captured the state of the system at each time from
when it first began. Using this complete log, we could restore to any point in time by
replaying the first N records in the log. This hypothetical complete log is not very practical
for systems that update a single record many times as the log will grow without bound even
for a stable dataset. The simple log retention mechanism which throws away old updates
will bound space but the log is no longer a way to restore the current state—now restoring
from the beginning of the log no longer recreates the current state as old updates may not
be captured at all.
Log compaction is a mechanism to give finer-grained per-record retention, rather than the
coarser-grained time-based retention. The idea is to selectively remove records where we
have a more recent update with the same primary key. This way the log is guaranteed to
have at least the last state for each key.
This retention policy can be set per-topic, so a single cluster can have some topics where
retention is enforced by size or time and other topics where retention is enforced by
compaction.
This functionality is inspired by one of LinkedIn's oldest and most successful pieces of
infrastructure—a database changelog caching service called Databus. Unlike most log-
structured storage systems Kafka is built for subscription and organizes data for fast linear
reads and writes. Unlike Databus, Kafka acts as a source-of-truth store so it is useful even
in situations where the upstream data source would not otherwise be replayable.
Here is a high-level picture that shows the logical structure of a Kafka log with the offset for
each message.
The head of the log is identical to a traditional Kafka log. It has dense, sequential offsets
and retains all messages. Log compaction adds an option for handling the tail of the log.
The picture above shows a log with a compacted tail. Note that the messages in the tail of
the log retain the original offset assigned when they were first written—that never changes.
Note also that all offsets remain valid positions in the log, even if the message with that
offset has been compacted away; in this case this position is indistinguishable from the
next highest offset that does appear in the log. For example, in the picture above the offsets
36, 37, and 38 are all equivalent positions and a read beginning at any of these offsets
would return a message set beginning with 38.
Compaction also allows for deletes. A message with a key and a null payload will be treated
as a delete from the log. This delete marker will cause any prior message with that key to be
removed (as would any new message with that key), but delete markers are special in that
they will themselves be cleaned out of the log after a period of time to free up space. The
point in time at which deletes are no longer retained is marked as the "delete retention
point" in the above diagram.
The compaction is done in the background by periodically recopying log segments.
Cleaning does not block reads and can be throttled to use no more than a configurable
amount of I/O throughput to avoid impacting producers and consumers. Log compaction provides the following guarantees:
1. Any consumer that stays caught-up to within the head of the log will see every
message that is written; these messages will have sequential offsets. The
topic's min.compaction.lag.ms can be used to guarantee the minimum length of time that must pass after a message is written before it could be compacted. I.e. it provides a
lower bound on how long each message will remain in the (uncompacted) head. The
topic's max.compaction.lag.ms can be used to guarantee the maximum delay
between the time a message is written and the time the message becomes eligible
for compaction.
2. Ordering of messages is always maintained. Compaction will never re-order
messages, just remove some.
3. The offset for a message never changes. It is the permanent identifier for a position
in the log.
4. Any consumer progressing from the start of the log will see at least the final state of
all records in the order they were written. Additionally, all delete markers for deleted
records will be seen, provided the consumer reaches the head of the log in a time
period less than the topic's delete.retention.ms setting (the default is 24 hours). In
other words: since the removal of delete markers happens concurrently with reads, it
is possible for a consumer to miss delete markers if it lags by more
than delete.retention.ms.
Log compaction is handled by the log cleaner, a pool of background threads that recopy log
segment files, removing records whose key appears in the head of the log. Each compactor
thread works as follows:
1. It chooses the log that has the highest ratio of log head to log tail
2. It creates a succinct summary of the last offset for each key in the head of the log
3. It recopies the log from beginning to end removing keys which have a later
occurrence in the log. New, clean segments are swapped into the log immediately so
the additional disk space required is just one additional log segment (not a fully copy
of the log).
4. The summary of the log head is essentially just a space-compact hash table. It uses
exactly 24 bytes per entry. As a result with 8GB of cleaner buffer one cleaner
iteration can clean around 366GB of log head (assuming 1k messages).
The log cleaner is enabled by default. This will start the pool of cleaner threads. To enable
log cleaning on a particular topic, add the log-specific property
log.cleanup.policy=compact
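As a sketch (the topic name is a placeholder), the policy can also be set on an existing topic with kafka-configs.sh; note that the topic-level override is named cleanup.policy, while log.cleanup.policy is the corresponding broker-level default:
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config cleanup.policy=compact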
log.cleaner.min.compaction.lag.ms
This can be used to prevent messages newer than a minimum message age from being
subject to compaction. If not set, all log segments are eligible for compaction except for the
last segment, i.e. the one currently being written to. The active segment will not be
compacted even if all of its messages are older than the minimum compaction time lag.
The log cleaner can be configured to ensure a maximum delay after which the
uncompacted "head" of the log becomes eligible for log compaction.
log.cleaner.max.compaction.lag.ms
This can be used to prevent logs with a low produce rate from remaining ineligible for compaction for an unbounded duration. If not set, logs that do not exceed min.cleanable.dirty.ratio are not compacted. Note that this compaction deadline is not a hard guarantee since it is still subject to the availability of log cleaner threads and the actual compaction time. You will want to monitor the uncleanable-partitions-count, max-clean-time-secs and max-compaction-delay-secs metrics.
4.9 Quotas
A Kafka cluster has the ability to enforce quotas on requests to control the broker resources used by clients. Two types of client quotas can be enforced by Kafka brokers for each group of clients sharing a quota:
1. Network bandwidth quotas define byte-rate thresholds.
2. Request rate quotas define CPU utilization thresholds as a percentage of network and I/O threads.
It is possible for producers and consumers to produce/consume very high volumes of data
or generate requests at a very high rate and thus monopolize broker resources, cause
network saturation and generally DOS other clients and the brokers themselves. Having
quotas protects against these issues and is all the more important in large multi-tenant
clusters where a small set of badly behaved clients can degrade user experience for the
well behaved ones. In fact, when running Kafka as a service this even makes it possible to
enforce API limits according to an agreed upon contract.
Client groups
The identity of Kafka clients is the user principal which represents an authenticated user in
a secure cluster. In a cluster that supports unauthenticated clients, user principal is a
grouping of unauthenticated users chosen by the broker using a
configurable PrincipalBuilder. Client-id is a logical grouping of clients with a meaningful
name chosen by the client application. The tuple (user, client-id) defines a secure logical
group of clients that share both user principal and client-id.
Quotas can be applied to (user, client-id), user or client-id groups. For a given connection,
the most specific quota matching the connection is applied. All connections of a quota
group share the quota configured for the group. For example, if (user="test-user", client-
id="test-client") has a produce quota of 10MB/sec, this is shared across all producer
instances of user "test-user" with the client-id "test-client".
Quota Configuration
Quota configuration may be defined for (user, client-id), user and client-id groups. It is
possible to override the default quota at any of the quota levels that needs a higher (or even
lower) quota. The mechanism is similar to the per-topic log config overrides. User and (user,
client-id) quota overrides are written to ZooKeeper under /config/users and client-id quota
overrides are written under /config/clients. These overrides are read by all brokers and are
effective immediately. This lets us change quotas without having to do a rolling restart of
the entire cluster. See here for details. Default quotas for each group may also be updated
dynamically using the same mechanism. The order of precedence for quota configuration is:
1. /config/users/<user>/clients/<client-id>
2. /config/users/<user>/clients/<default>
3. /config/users/<user>
4. /config/users/<default>/clients/<client-id>
5. /config/users/<default>/clients/<default>
6. /config/users/<default>
7. /config/clients/<client-id>
8. /config/clients/<default>
Network Bandwidth Quotas
Network bandwidth quotas are defined as the byte rate threshold for each group of clients
sharing a quota. By default, each unique client group receives a fixed quota in bytes/sec as
configured by the cluster. This quota is defined on a per-broker basis. Each group of clients
can publish/fetch a maximum of X bytes/sec per broker before clients are throttled.
Request Rate Quotas
Request rate quotas are defined as the percentage of time a client can utilize on request handler I/O threads and network threads of each broker within a quota window. A quota of n% represents n% of one thread, so the quota is out of a total capacity of ((num.io.threads + num.network.threads) * 100)%. Each group of clients may use a total percentage of up to n% across all I/O and network threads in a quota window before being throttled. For example, with the defaults of num.io.threads=8 and num.network.threads=3, the total capacity is 1100%, and a quota of 200% allows a client group to use the equivalent of two full threads. Since the number of threads allocated for I/O and network threads are typically based on the number of cores available on the broker host, request rate quotas represent the total percentage of CPU that may be used by each group of clients sharing the quota.
Enforcement
By default, each unique client group receives a fixed quota as configured by the cluster. This
quota is defined on a per-broker basis. Each client can utilize this quota per broker before it
gets throttled. We decided that defining these quotas per broker is much better than having
a fixed cluster wide bandwidth per client because that would require a mechanism to share
client quota usage among all the brokers. This can be harder to get right than the quota
implementation itself!
How does a broker react when it detects a quota violation? In our solution, the broker first
computes the amount of delay needed to bring the violating client under its quota and
returns a response with the delay immediately. In case of a fetch request, the response will
not contain any data. Then, the broker mutes the channel to the client so that it does not process any further requests from the client until the delay is over. Upon receiving a response with a non-zero delay duration, the Kafka client will also refrain from sending further requests to the broker during the delay. Therefore, requests from a throttled client are effectively blocked from both sides. Even with older client implementations that do not respect the delay response from the broker, the back pressure applied by the broker via muting its socket channel can still handle the throttling of badly behaving clients. Clients that send further requests to the throttled channel will receive responses only after the delay is over.
Byte-rate and thread utilization are measured over multiple small windows (e.g. 30 windows of 1 second each) in order to detect and correct quota violations quickly. Typically, having large measurement windows (e.g. 10 windows of 30 seconds each) leads to large bursts of traffic followed by long delays, which is not great in terms of user experience.
5. IMPLEMENTATION
5.1 Network Layer
The network layer is a fairly straight-forward NIO server, and will not be described in great
detail. The sendfile implementation is done by giving the MessageSet interface
a writeTo method. This allows the file-backed message set to use the more
efficient transferTo implementation instead of an in-process buffered write. The threading
model is a single acceptor thread and N processor threads which handle a fixed number of
connections each. This design has been pretty thoroughly tested elsewhere and found to be
simple to implement and fast. The protocol is kept quite simple to allow for future
implementation of clients in other languages.
5.2 Messages
Messages consist of a variable-length header, a variable-length opaque key byte array and a
variable-length opaque value byte array. The format of the header is described in the
following section. Leaving the key and value opaque is the right decision: there is a great
deal of progress being made on serialization libraries right now, and any particular choice is
unlikely to be right for all uses. Needless to say a particular application using Kafka would
likely mandate a particular serialization type as part of its usage. The RecordBatch interface
is simply an iterator over messages with specialized methods for bulk reading and writing
to an NIO Channel.
5.3 Message Format
Messages (aka Records) are always written in batches. The technical term for a batch of
messages is a record batch, and a record batch contains one or more records. In the
degenerate case, we could have a record batch containing a single record. Record batches
and records have their own headers. The format of each is described below.
5.3.1 Record Batch
The on-disk format of a record batch is as follows:
baseOffset: int64
batchLength: int32
partitionLeaderEpoch: int32
magic: int8 (current magic value is 2)
crc: int32
attributes: int16
bit 0~2:
0: no compression
1: gzip
2: snappy
3: lz4
4: zstd
bit 3: timestampType
bit 4: isTransactional (0 means not transactional)
bit 5: isControlBatch (0 means not a control batch)
bit 6~15: unused
lastOffsetDelta: int32
firstTimestamp: int64
maxTimestamp: int64
producerId: int64
producerEpoch: int16
baseSequence: int32
records: [Record]
Note that when compression is enabled, the compressed record data is serialized directly
following the count of the number of records.
The CRC covers the data from the attributes to the end of the batch (i.e. all the bytes that
follow the CRC). It is located after the magic byte, which means that clients must parse the
magic byte before deciding how to interpret the bytes between the batch length and the
magic byte. The partition leader epoch field is not included in the CRC computation to avoid
the need to recompute the CRC when this field is assigned for every batch that is received
by the broker. The CRC-32C (Castagnoli) polynomial is used for the computation.
On compaction: unlike the older message formats, magic v2 and above preserves the first
and last offset/sequence numbers from the original batch when the log is cleaned. This is
required in order to be able to restore the producer's state when the log is reloaded. If we
did not retain the last sequence number, for example, then after a partition leader failure, the
producer might see an OutOfSequence error. The base sequence number must be
preserved for duplicate checking (the broker checks incoming Produce requests for
duplicates by verifying that the first and last sequence numbers of the incoming batch
match the last from that producer). As a result, it is possible to have empty batches in the
log when all the records in the batch are cleaned but batch is still retained in order to
preserve a producer's last sequence number. One oddity here is that the firstTimestamp
field is not preserved during compaction, so it will change if the first record in the batch is
compacted away.
A control batch contains a single record called the control record. Control records should
not be passed on to applications. Instead, they are used by consumers to filter out aborted
transactional messages.
The key of a control record conforms to the following schema:
version: int16 (current version is 0)
type: int16 (0 indicates an abort marker, 1 indicates a commit)
The schema for the value of a control record is dependent on the type. The value is opaque to clients.
5.3.2 Record
Record level headers were introduced in Kafka 0.11.0. The on-disk format of a record with
Headers is delineated below.
length: varint
attributes: int8
bit 0~7: unused
timestampDelta: varint
offsetDelta: varint
keyLength: varint
key: byte[]
valueLen: varint
value: byte[]
Headers => [Header]
headerKeyLength: varint
headerKey: String
headerValueLength: varint
Value: byte[]
We use the same varint encoding as Protobuf. More information on the latter can be
found here. The count of headers in a record is also encoded as a varint.
Prior to Kafka 0.11, messages were transferred and stored in message sets. In a message
set, each message has its own metadata. Note that although message sets are represented
as an array, they are not preceded by an int32 array size like other array elements in the
protocol.
Message Set:
In versions prior to Kafka 0.10, the only supported message format version (which is
indicated in the magic value) was 0. Message format version 1 was introduced with
timestamp support in version 0.10.
Similarly to version 2 above, the lowest bits of attributes represent the compression
type.
In version 1, the producer should always set the timestamp type bit to 0. If the topic
is configured to use log append time, (through either broker level config
log.message.timestamp.type = LogAppendTime or topic level config
message.timestamp.type = LogAppendTime), the broker will overwrite the
timestamp type and the timestamp in the message set.
The highest bits of attributes must be set to 0.
When receiving recursive version 0 messages, the broker decompresses them and each
inner message is assigned an offset individually. In version 1, to avoid server side re-
compression, only the wrapper message will be assigned an offset. The inner messages
will have relative offsets. The absolute offset can be computed using the offset from the
outer message, which corresponds to the offset assigned to the last inner message.
The crc field contains the CRC32 (and not CRC-32C) of the subsequent message bytes (i.e.
from magic byte to the value).
5.4 Log
A log for a topic named "my_topic" with two partitions consists of two directories
(namely my_topic_0 and my_topic_1) populated with data files containing the messages for
that topic. The format of the log files is a sequence of "log entries"; each log entry is a 4 byte integer N storing the message length which is followed by the N message bytes. Each
message is uniquely identified by a 64-bit integer offset giving the byte position of the start
of this message in the stream of all messages ever sent to that topic on that partition. The
on-disk format of each message is given below. Each log file is named with the offset of the
first message it contains. So the first file created will be 00000000000.kafka, and each
additional file will have an integer name roughly S bytes from the previous file where S is the
max log file size given in the configuration.
The exact binary format for records is versioned and maintained as a standard interface so
record batches can be transferred between producer, broker, and client without recopying or
conversion when desirable. The previous section included details about the on-disk format
of records.
The use of the message offset as the message id is unusual. Our original idea was to use a
GUID generated by the producer, and maintain a mapping from GUID to offset on each
broker. But since a consumer must maintain an ID for each server, the global uniqueness of
the GUID provides no value. Furthermore, the complexity of maintaining the mapping from a
random id to an offset requires a heavy weight index structure which must be synchronized
with disk, essentially requiring a full persistent random-access data structure. Thus to
simplify the lookup structure we decided to use a simple per-partition atomic counter which
could be coupled with the partition id and node id to uniquely identify a message; this
makes the lookup structure simpler, though multiple seeks per consumer request are still
likely. However once we settled on a counter, the jump to directly using the offset seemed
natural—both after all are monotonically increasing integers unique to a partition. Since the
offset is hidden from the consumer API this decision is ultimately an implementation detail
and we went with the more efficient approach.
Writes
The log allows serial appends which always go to the last file. This file is rolled over to a
fresh file when it reaches a configurable size (say 1GB). The log takes two configuration
parameters: M, which gives the number of messages to write before forcing the OS to flush
the file to disk, and S, which gives a number of seconds after which a flush is forced. This
gives a durability guarantee of losing at most M messages or S seconds of data in the event
of a system crash.
Reads
Reads are done by giving the 64-bit logical offset of a message and an S-byte max chunk
size. This will return an iterator over the messages contained in the S-byte buffer. S is
intended to be larger than any single message, but in the event of an abnormally large
message, the read can be retried multiple times, each time doubling the buffer size, until the
message is read successfully. A maximum message and buffer size can be specified to
make the server reject messages larger than some size, and to give a bound to the client on
the maximum it needs to ever read to get a complete message. It is likely that the read buffer ends with a partial message; this is easily detected by the size delimiting.
The actual process of reading from an offset requires first locating the log segment file in
which the data is stored, calculating the file-specific offset from the global offset value, and
then reading from that file offset. The search is done as a simple binary search variation
against an in-memory range maintained for each file.
The log provides the capability of getting the most recently written message to allow clients
to start subscribing as of "right now". This is also useful in the case the consumer fails to
consume its data within its SLA-specified number of days. In this case when the client
attempts to consume a non-existent offset it is given an OutOfRangeException and can
either reset itself or fail as appropriate to the use case.
Deletes
Data is deleted one log segment at a time. The log manager applies two metrics to identify
segments which are eligible for deletion: time and size. For time-based policies, the record
timestamps are considered, with the largest timestamp in a segment file (order of records
is not relevant) defining the retention time for the entire segment. Size-based retention is
disabled by default. When enabled the log manager keeps deleting the oldest segment file
until the overall size of the partition is within the configured limit again. If both policies are
enabled at the same time, a segment that is eligible for deletion due to either policy will be
deleted. To avoid locking reads while still allowing deletes that modify the segment list we
use a copy-on-write style segment list implementation that provides consistent views to
allow a binary search to proceed on an immutable static snapshot view of the log segments
while deletes are progressing.
Guarantees
Note that two kinds of corruption must be handled: truncation in which an unwritten block is lost due to a crash, and corruption in which a nonsense block is added to the file. The
reason for this is that in general the OS makes no guarantee of the write order between the
file inode and the actual block data so in addition to losing written data the file can gain
nonsense data if the inode is updated with a new size but a crash occurs before the block
containing that data is written. The CRC detects this corner case, and prevents it from
corrupting the log (though the unwritten messages are, of course, lost).
5.5 Distribution
Consumer Offset Tracking
When the coordinator receives an offset fetch request, it simply returns the last committed
offset vector from the offsets cache. In case coordinator was just started or if it just
became the coordinator for a new set of consumer groups (by becoming a leader for a
partition of the offsets topic), it may need to load the offsets topic partition into the cache.
In this case, the offset fetch will fail with a CoordinatorLoadInProgressException and the consumer may retry the OffsetFetchRequest after backing off.
ZooKeeper Directories
The following gives the ZooKeeper structures and algorithms used for co-ordination
between consumers and brokers.
Notation
When an element in a path is denoted [xyz], that means that the value of xyz is not fixed
and there is in fact a ZooKeeper znode for each possible value of xyz. For example /topics/
[topic] would be a directory named /topics containing a sub-directory for each topic name.
Numerical ranges are also given such as [0...5] to indicate the subdirectories 0, 1, 2, 3, 4, 5.
An arrow -> is used to indicate the contents of a znode. For example /hello -> world would
indicate a znode /hello containing the value "world".
Broker Node Registry
This is a list of all present broker nodes, each of which provides a unique logical broker id
which identifies it to consumers (which must be given as part of its configuration). On
startup, a broker node registers itself by creating a znode with the logical broker id under
/brokers/ids. The purpose of the logical broker id is to allow a broker to be moved to a
different physical machine without affecting consumers. An attempt to register a broker id
that is already in use (say because two servers are configured with the same broker id)
results in an error.
Since the broker registers itself in ZooKeeper using ephemeral znodes, this registration is
dynamic and will disappear if the broker is shutdown or dies (thus notifying consumers it is
no longer available).
Broker Topic Registry
/brokers/topics/[topic]/partitions/[0...N]/state --> {"controller_epoch":...,"leader":...,"version":...,"leader_epoch":...,"isr":[...]} (ephemeral node)
Each broker registers itself under the topics it maintains and stores the number of partitions
for that topic.
Cluster Id
The cluster id is a unique and immutable identifier assigned to a Kafka cluster. The cluster
id can have a maximum of 22 characters and the allowed characters are defined by the
regular expression [a-zA-Z0-9_\-]+, which corresponds to the characters used by the URL-
safe Base64 variant with no padding. Conceptually, it is auto-generated when a cluster is
started for the first time.
6. OPERATIONS
Here is some information on actually running Kafka as a production system based on usage
and experience at LinkedIn. Please send us any additional tips you know of.
6.1 Basic Kafka Operations
This section will review the most common operations you will perform on your Kafka
cluster. All of the tools reviewed in this section are available under the bin/ directory of the
Kafka distribution and each tool will print details on all possible commandline options if it is
run with no arguments.
Adding and removing topics
You have the option of either adding topics manually or having them be created
automatically when data is first published to a non-existent topic. If topics are auto-created
then you may want to tune the default topic configurations used for auto-created topics.
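As a sketch (topic name, partition count, and replication factor are illustrative), a topic can be added manually like this:
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-topic --partitions 20 --replication-factor 3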
The replication factor controls how many servers will replicate each message that is
written. If you have a replication factor of 3 then up to 2 servers can fail before you will lose
access to your data. We recommend you use a replication factor of 2 or 3 so that you can
transparently bounce machines without interrupting data consumption.
The partition count controls how many logs the topic will be sharded into. There are several
impacts of the partition count. First each partition must fit entirely on a single server. So if
you have 20 partitions the full data set (and read and write load) will be handled by no more
than 20 servers (not counting replicas). Finally the partition count impacts the maximum
parallelism of your consumers. This is discussed in greater detail in the concepts section.
Each sharded partition log is placed into its own folder under the Kafka log directory. The
name of such folders consists of the topic name, appended by a dash (-) and the partition
id. Since a typical folder name cannot be over 255 characters long, there will be a limitation
on the length of topic names. We assume the number of partitions will not ever be above
100,000. Therefore, topic names cannot be longer than 249 characters. This leaves just
enough room in the folder name for a dash and a potentially 5 digit long partition id.
The configurations added on the command line override the default settings the server has
for things like the length of time data should be retained. The complete set of per-topic
configurations is documented here.
Modifying topics
You can change the configuration or partitioning of a topic using the same topic tool.
Be aware that one use case for partitions is to semantically partition data, and adding
partitions doesn't change the partitioning of existing data so this may disturb consumers if
they rely on that partition. That is if data is partitioned by hash(key) %
number_of_partitions then this partitioning will potentially be shuffled by adding partitions
but Kafka will not attempt to automatically redistribute data in any way.
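As a sketch (the topic name and partition count are placeholders), partitions can be added like this; as noted above, existing data is not redistributed:
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic my-topic --partitions 40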
To add configs:
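(A sketch; the topic name and config are placeholders.)
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config retention.ms=86400000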
To remove a config:
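(A sketch, removing the config added above.)
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --delete-config retention.ms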
Graceful shutdown
The Kafka cluster will automatically detect any broker shutdown or failure and elect new
leaders for the partitions on that machine. This will occur whether a server fails or it is
brought down intentionally for maintenance or configuration changes. For the latter cases
Kafka supports a more graceful mechanism for stopping a server than just killing it. When a
server is stopped gracefully it has two optimizations it will take advantage of:
1. It will sync all its logs to disk to avoid needing to do any log recovery when it restarts
(i.e. validating the checksum for all messages in the tail of the log). Log recovery
takes time so this speeds up intentional restarts.
2. It will migrate any partitions the server is the leader for to other replicas prior to
shutting down. This will make the leadership transfer faster and minimize the time
each partition is unavailable to a few milliseconds.
Syncing the logs will happen automatically whenever the server is stopped other than by a
hard kill, but the controlled leadership migration requires using a special setting:
controlled.shutdown.enable=true
Note that controlled shutdown will only succeed if all the partitions hosted on the broker
have replicas (i.e. the replication factor is greater than 1 and at least one of these replicas is
alive). This is generally what you want since shutting down the last replica would make that
topic partition unavailable.
Balancing leadership
Whenever a broker stops or crashes, leadership for that broker's partitions transfers to
other replicas. When the broker is restarted it will only be a follower for all its partitions,
meaning it will not be used for client reads and writes.
To avoid this imbalance, Kafka has a notion of preferred replicas. If the list of replicas for a
partition is 1,5,9 then node 1 is preferred as the leader to either node 5 or 9 because it is
earlier in the replica list. By default the Kafka cluster will try to restore leadership to the
restored replicas. This behaviour is configured with:
auto.leader.rebalance.enable=true
You can also set this to false, but you will then need to manually restore leadership to the
restored replicas by running the command:
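(One way to do this; the kafka-leader-election.sh tool and the flags shown assume a reasonably recent Kafka release.)
> bin/kafka-leader-election.sh --bootstrap-server localhost:9092 --election-type preferred --all-topic-partitions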
The rack awareness feature spreads replicas of the same partition across different racks.
This extends the guarantees Kafka provides for broker-failure to cover rack-failure, limiting
the risk of data loss should all the brokers on a rack fail at once. The feature can also be
applied to other broker groupings such as availability zones in EC2.
You can specify that a broker belongs to a particular rack by adding a property to the broker
config:
broker.rack=my-rack-id
Mirroring data between clusters
Kafka ships with a tool for mirroring data between clusters (the "mirror maker"), which consumes from a source cluster and produces to a destination cluster. You can run many such mirroring processes to increase throughput and for fault-tolerance (if one process dies, the others will take over the additional load).
Data will be read from topics in the source cluster and written to a topic with the same
name in the destination cluster. In fact the mirror maker is little more than a Kafka
consumer and producer hooked together.
The source and destination clusters are completely independent entities: they can have
different numbers of partitions and the offsets will not be the same. For this reason the
mirror cluster is not really intended as a fault-tolerance mechanism (as the consumer
position will be different); for that we recommend using normal in-cluster replication. The
mirror maker process will, however, retain and use the message key for partitioning so order
is preserved on a per-key basis.
Here is an example showing how to mirror a single topic (named my-topic) from an input
cluster:
> bin/kafka-mirror-maker.sh
--consumer.config consumer.properties
--producer.config producer.properties --whitelist my-topic
Note that we specify the list of topics with the --whitelist option. This option accepts any Java-style regular expression. So you could mirror two topics named A and B using --whitelist 'A|B'. Or you could mirror all topics using --whitelist '.*'. Make sure to quote any regular expression to ensure the shell doesn't try to expand it as a file path. For convenience we allow the use of ',' instead of '|' to specify a list of topics.
Combining mirroring with the configuration auto.create.topics.enable=true makes it
possible to have a replica cluster that will automatically create and replicate all data in a
source cluster even as new topics are added.
Checking consumer position
Sometimes it's useful to see the position of your consumers. We have a tool that will show
the position of all consumers in a consumer group as well as how far behind the end of the
log they are. To run this tool on a consumer group named my-group consuming a topic
named my-topic would look like this:
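(A sketch; the broker address is a placeholder. The output lists, for each partition, the current offset, the log end offset, and the lag, along with the consumer id, host, and client id of the assigned member.)
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group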
Managing Consumer Groups
With the ConsumerGroupCommand tool, we can list, describe, or delete the consumer
groups. The consumer group can be deleted manually, or automatically when the last
committed offset for that group expires. Manual deletion works only if the group does not
have any active members. For example, to list all consumer groups across all topics:
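(A sketch; the broker address is a placeholder. The line that follows shows example output.)
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list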
test-consumer-group
To view offsets, as mentioned earlier, we "describe" the consumer group using the same command shown above.
There are a number of additional "describe" options that can be used to provide more
detailed information about a consumer group:
--members: This option provides the list of all active members in the consumer group. For each member it reports the consumer id, host, client id, and the number of partitions assigned to it (a member with no assignment shows 0 partitions).
--members --verbose: In addition to the information reported by the "--members" option above, this option also lists the partitions assigned to each member (shown as "-" for a member with no assignment).
--offsets: This is the default describe option and provides the same output as the "--
describe" option.
--state: This option provides useful group-level information.
> bin/kafka-consumer-groups.sh --bootstrap-server
localhost:9092 --describe --group my-group --state
COORDINATOR (ID)          ASSIGNMENT-STRATEGY       STATE                #MEMBERS
To manually delete one or multiple consumer groups, the "--delete" option can be used:
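(A sketch; the group names are placeholders. Multiple --group options may be passed to delete several groups at once.)
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --delete --group my-group --group my-other-group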
To reset offsets of a consumer group, the "--reset-offsets" option can be used. This option supports one consumer group at a time. It requires defining the following scopes: --all-topics or --topic. One scope must be selected, unless you use the '--from-file' scenario. Also, first make sure that the consumer instances are inactive. See KIP-122 for more details.
--reset-offsets also has a number of scenarios to choose from (at least one scenario must be selected), such as --to-earliest, --to-latest, --to-offset, --shift-by, --to-datetime and --from-file.
Please note that out-of-range offsets will be adjusted to the available end offset. For example, if the end offset is 10 and an offset shift request asks for 15, then the offset at 10 will actually be selected.
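As a sketch (group and topic names are placeholders), resetting a group's offsets for one topic to the earliest available offsets, using the --to-earliest scenario, looks like this:
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --reset-offsets --group my-group --topic my-topic --to-earliest --execute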
If you are using the old high-level consumer and storing the group metadata in ZooKeeper
(i.e. offsets.storage=zookeeper), pass --zookeeper instead of --bootstrap-server:
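(For example, to list groups whose metadata is stored in ZooKeeper; the ZooKeeper address is a placeholder.)
> bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --list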
Expanding your cluster
Adding servers to a Kafka cluster is easy: just assign them a unique broker id and start up
Kafka on your new servers. However these new servers will not automatically be assigned
any data partitions, so unless partitions are moved to them they won't be doing any work
until new topics are created. So usually when you add machines to your cluster you will
want to migrate some existing data to these machines.
The process of migrating data is manually initiated but fully automated. Under the covers
what happens is that Kafka will add the new server as a follower of the partition it is
migrating and allow it to fully replicate the existing data in that partition. When the new
server has fully replicated the contents of this partition and joined the in-sync replica set, one of the existing replicas will delete its copy of the partition's data.
The partition reassignment tool can be used to move partitions across brokers. An ideal
partition distribution would ensure even data load and partition sizes across all brokers. The
partition reassignment tool does not have the capability to automatically study the data
distribution in a Kafka cluster and move partitions around to attain an even load
distribution. As such, the admin has to figure out which topics or partitions should be
moved around.
The partition reassignment tool can run in 3 mutually exclusive modes:
--generate: In this mode, given a list of topics and a list of brokers, the tool generates
a candidate reassignment to move all partitions of the specified topics to the new
brokers. This option merely provides a convenient way to generate a partition
reassignment plan given a list of topics and target brokers.
--execute: In this mode, the tool kicks off the reassignment of partitions based on the user-provided reassignment plan (using the --reassignment-json-file option). This can either be a custom reassignment plan hand-crafted by the admin or provided by using the --generate option.
--verify: In this mode, the tool verifies the status of the reassignment for all partitions listed during the last --execute. The status can be one of: successfully completed, failed, or in progress.
The partition reassignment tool can be used to move some topics off of the current set of
brokers to the newly added brokers. This is typically useful while expanding an existing
cluster since it is easier to move entire topics to the new set of brokers, than moving one
partition at a time. When used to do this, the user should provide a list of topics that should
be moved to the new set of brokers and a target list of new brokers. The tool then evenly
distributes all partitions for the given list of topics across the new set of brokers. During this
move, the replication factor of the topic is kept constant. Effectively the replicas for all
partitions for the input list of topics are moved from the old set of brokers to the newly
added brokers.
For instance, the following example will move all partitions for topics foo1,foo2 to the new
set of brokers 5,6. At the end of this move, all partitions for topics foo1 and foo2
will only exist on brokers 5,6.
Since the tool accepts the input list of topics as a json file, you first need to identify the
topics you want to move and create the json file as follows:
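(A sketch of the json file; it simply lists the topics to be moved.)
> cat topics-to-move.json
{"topics": [{"topic": "foo1"},
            {"topic": "foo2"}],
 "version":1
}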
Once the json file is ready, use the partition reassignment tool to generate a candidate
assignment:
> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092
--topics-to-move-json-file topics-to-move.json --broker-list "5,6"
--generate
Current partition replica assignment
{"version":1,
"partitions":[{"topic":"foo1","partition":2,"replicas":[1,2]},
{"topic":"foo1","partition":0,"replicas":[3,4]},
{"topic":"foo2","partition":2,"replicas":[1,2]},
{"topic":"foo2","partition":0,"replicas":[3,4]},
{"topic":"foo1","partition":1,"replicas":[2,3]},
{"topic":"foo2","partition":1,"replicas":[2,3]}]
}
{"version":1,
"partitions":[{"topic":"foo1","partition":2,"replicas":[5,6]},
{"topic":"foo1","partition":0,"replicas":[5,6]},
{"topic":"foo2","partition":2,"replicas":[5,6]},
{"topic":"foo2","partition":0,"replicas":[5,6]},
{"topic":"foo1","partition":1,"replicas":[5,6]},
{"topic":"foo2","partition":1,"replicas":[5,6]}]
}
The tool generates a candidate assignment that will move all partitions from topics foo1, foo2 to brokers 5,6. Note, however, that at this point, the partition movement has not started; it merely tells you the current assignment and the proposed new assignment. The
current assignment should be saved in case you want to rollback to it. The new assignment
should be saved in a json file (e.g. expand-cluster-reassignment.json) to be input to the tool
with the --execute option as follows:
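(A sketch; the broker address is a placeholder.)
> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file expand-cluster-reassignment.json --execute
Before starting the move, the tool prints the current replica assignment so that it can be saved for a rollback, for example: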
{"version":1,
"partitions":[{"topic":"foo1","partition":2,"replicas":[1,2]},
{"topic":"foo1","partition":0,"replicas":[3,4]},
{"topic":"foo2","partition":2,"replicas":[1,2]},
{"topic":"foo2","partition":0,"replicas":[3,4]},
{"topic":"foo1","partition":1,"replicas":[2,3]},
{"topic":"foo2","partition":1,"replicas":[2,3]}]
}
Finally, the --verify option can be used with the tool to check the status of the partition
reassignment. Note that the same expand-cluster-reassignment.json (used with the
--execute option) should be used with the --verify option:
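(A sketch; the broker address is a placeholder. For each partition, the tool reports whether the reassignment completed successfully, failed, or is still in progress.)
> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file expand-cluster-reassignment.json --verify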
The partition reassignment tool can also be used to selectively move replicas of a partition
to a specific set of brokers. When used in this manner, it is assumed that the user knows
the reassignment plan and does not require the tool to generate a candidate reassignment,
effectively skipping the --generate step and moving straight to the --execute step.
For instance, the following example moves partition 0 of topic foo1 to brokers 5,6 and
partition 1 of topic foo2 to brokers 2,3:
The first step is to hand craft the custom reassignment plan in a json file:
> cat custom-reassignment.json
{"version":1,"partitions":[{"topic":"foo1","partition":0,"replicas":
[5,6]},{"topic":"foo2","partition":1,"replicas":[2,3]}]}
Then, use the json file with the --execute option to start the reassignment process. As before, the tool first prints the current replica assignment, which should be saved in case you need to roll back:
{"version":1,
"partitions":[{"topic":"foo1","partition":0,"replicas":[1,2]},
{"topic":"foo2","partition":1,"replicas":[3,4]}]
}
The --verify option can be used with the tool to check the status of the partition reassignment. Note that the same custom-reassignment.json (used with the --execute option) should be used with the --verify option.
Decommissioning brokers
The partition reassignment tool does not have the ability to automatically generate a
reassignment plan for decommissioning brokers yet. As such, the admin has to come up
with a reassignment plan to move the replica for all partitions hosted on the broker to be
decommissioned, to the rest of the brokers. This can be relatively tedious as the
reassignment needs to ensure that all the replicas are not moved from the decommissioned
broker to only one other broker. To make this process effortless, we plan to add tooling
support for decommissioning brokers in the future.
Increasing replication factor
Increasing the replication factor of an existing partition is easy. Just specify the extra
replicas in the custom reassignment json file and use it with the --execute option to
increase the replication factor of the specified partitions.
For instance, the following example increases the replication factor of partition 0 of topic
foo from 1 to 3. Before increasing the replication factor, the partition's only replica existed
on broker 5. As part of increasing the replication factor, we will add more replicas on
brokers 6 and 7.
The first step is to hand craft the custom reassignment plan in a json file:
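(A sketch; the json simply lists the full target replica set for the partition, matching the brokers described above.)
> cat increase-replication-factor.json
{"version":1,
 "partitions":[{"topic":"foo","partition":0,"replicas":[5,6,7]}]}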
Then, use the json file with the --execute option to start the reassignment process. As before, the tool first prints the current replica assignment, which here shows the single existing replica on broker 5:
{"version":1,
"partitions":[{"topic":"foo","partition":0,"replicas":[5]}]}
The --verify option can be used with the tool to check the status of the partition reassignment. Note that the same increase-replication-factor.json (used with the --execute option) should be used with the --verify option.
You can also verify the increase in replication factor with the kafka-topics tool:
> bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic foo
--describe
Topic:foo PartitionCount:1 ReplicationFactor:3 Configs:
Topic: foo Partition: 0 Leader: 5 Replicas: 5,6,7 Isr:
5,6,7
Limiting Bandwidth Usage during Data Migration
Kafka lets you apply a throttle to replication traffic, setting an upper bound on the
bandwidth used to move replicas from machine to machine. This is useful when
rebalancing a cluster, bootstrapping a new broker or adding or removing brokers, as it limits
the impact these data-intensive operations will have on users.
There are two interfaces that can be used to engage a throttle. The simplest, and safest, is
to apply a throttle when invoking the kafka-reassign-partitions.sh, but kafka-configs.sh can
also be used to view and alter the throttle values directly.
So for example, if you were to execute a rebalance, with the below command, it would move
partitions at no more than 50MB/s.
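A sketch of such an invocation (the broker address and reassignment file name are placeholders; --throttle is specified in bytes/sec, so 50000000 corresponds to 50MB/s):
> bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --execute --reassignment-json-file bigger-cluster.json --throttle 50000000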
When you execute this script you will see the throttle engage; the tool prints the throttle limit that was applied.
Should you wish to alter the throttle during a rebalance, say to increase the throughput so it completes quicker, you can do this by re-running the execute command with the same reassignment-json-file and a higher --throttle value.
When the --verify option is executed, and the reassignment has completed, the script will confirm that the throttle was removed.
The administrator can also validate the assigned configs using kafka-configs.sh. There are two pairs of throttle configurations used to manage the throttling process. The first pair refers to the throttle value itself. This is configured, at a broker level, using the dynamic properties:
leader.replication.throttled.rate
follower.replication.throttled.rate
The second pair refers to the set of throttled replicas, which is configured at the topic level:
leader.replication.throttled.replicas
follower.replication.throttled.replicas
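As a sketch (the broker address is a placeholder), the broker-level throttle rates can be inspected with kafka-configs.sh; the output includes the leader and follower throttled rates configured on each broker:
> bin/kafka-configs.sh --describe --bootstrap-server localhost:9092 --entity-type brokers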
This shows the throttle applied to both leader and follower side of the replication protocol.
By default both sides are assigned the same throttled throughput value.
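The per-topic lists of throttled replicas can be inspected the same way (the topic name is a placeholder); the values are comma-separated partition:broker-id pairs, such as the illustrative values below:
> bin/kafka-configs.sh --describe --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic
leader.replication.throttled.replicas=1:102,0:101
follower.replication.throttled.replicas=1:101,0:102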
Here we see the leader throttle is applied to partition 1 on broker 102 and partition 0 on
broker 101. Likewise the follower throttle is applied to partition 1 on broker 101 and
partition 0 on broker 102.
By default kafka-reassign-partitions.sh will apply the leader throttle to all replicas that exist
before the rebalance, any one of which might be leader. It will apply the follower throttle to
all move destinations. So if there is a partition with replicas on brokers 101,102, being
reassigned to 102,103, a leader throttle, for that partition, would be applied to 101,102 and a
follower throttle would be applied to 103 only.
If required, you can also use the --alter switch on kafka-configs.sh to alter the throttle
configurations manually.
(1) Throttle Removal:
The throttle should be removed in a timely manner once reassignment completes (by running kafka-reassign-partitions.sh --verify).
(2) Ensuring Progress:
If the throttle is set too low, in comparison to the incoming write rate, it is possible for replication to not make progress. This occurs when:
max(BytesInPerSec) > throttle
Where BytesInPerSec is the metric that monitors the write throughput of producers into each broker.
The administrator can monitor whether replication is making progress, during the rebalance,
using the metric:
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
The lag should constantly decrease during replication. If the metric does not decrease the
administrator should increase the throttle throughput as described above.
Setting quotas
Quotas overrides and defaults may be configured at (user, client-id), user or client-id levels
as described here. By default, clients receive an unlimited quota. It is possible to set custom
quotas for each (user, client-id), user or client-id group.
It is possible to set default quotas for each (user, client-id), user or client-id group by
specifying --entity-default option instead of --entity-name.
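A sketch of both forms using kafka-configs.sh (user and client names are illustrative; older releases configure quotas via --zookeeper instead of --bootstrap-server):
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --add-config 'producer_byte_rate=1024,consumer_byte_rate=2048,request_percentage=200' --entity-type users --entity-name user1 --entity-type clients --entity-name clientA
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --add-config 'producer_byte_rate=1024,consumer_byte_rate=2048,request_percentage=200' --entity-type users --entity-default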
If entity name is not specified, all entities of the specified type are described. For example,
describe all users:
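A sketch of that describe (older releases use --zookeeper instead of --bootstrap-server):
> bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type users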
It is possible to set default quotas that apply to all client-ids by setting these configs on the
brokers. These properties are applied only if quota overrides or defaults are not configured
in Zookeeper. By default, each client-id receives an unlimited quota. The following sets the
default quota per producer and consumer client-id to 10MB/sec.
quota.producer.default=10485760
quota.consumer.default=10485760
Note that these properties are being deprecated and may be removed in a future release.
Defaults configured using kafka-configs.sh take precedence over these properties.
6.2 Datacenters
Some deployments will need to manage a data pipeline that spans multiple datacenters.
Our recommended approach to this is to deploy a local Kafka cluster in each datacenter
with application instances in each datacenter interacting only with their local cluster and
mirroring between clusters (see the documentation on the mirror maker tool for how to do
this).
This deployment pattern allows datacenters to act as independent entities and allows us to
manage and tune inter-datacenter replication centrally. This allows each facility to stand
alone and operate even if the inter-datacenter links are unavailable: when this occurs the
mirroring falls behind until the link is restored at which time it catches up.
For applications that need a global view of all data you can use mirroring to provide clusters
which have aggregate data mirrored from the local clusters in all datacenters. These
aggregate clusters are used for reads by applications that require the full data set.
This is not the only possible deployment pattern. It is possible to read from or write to a
remote Kafka cluster over the WAN, though obviously this will add whatever latency is
required to get to the remote cluster.
Kafka naturally batches data in both the producer and consumer so it can achieve high-
throughput even over a high-latency connection. To allow this though it may be necessary
to increase the TCP socket buffer sizes for the producer, consumer, and broker using
the socket.send.buffer.bytes and socket.receive.buffer.bytes configurations. The
appropriate way to set this is documented here.
The most important producer configurations are:
acks
compression
batch size
(A sketch of these settings follows the list.)
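As a minimal sketch of what these look like in a producer properties file (values are illustrative, not recommendations):
acks=all
compression.type=lz4
batch.size=65536
An example production server configuration: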
# ZooKeeper
zookeeper.connect=[list of ZooKeeper servers]
# Log configuration
num.partitions=8
default.replication.factor=3
log.dir=[List of directories. Kafka should have its own dedicated
disk(s) or SSD(s).]
# Other configurations
broker.id=[An integer. Start with 0 and increment by 1 for each new
broker.]
listeners=[list of listeners]
auto.create.topics.enable=false
min.insync.replicas=2
queued.max.requests=[number of concurrent requests]
Our client configuration varies a fair amount between different use cases.
Java 8 and Java 11 are supported. Java 11 performs significantly better if TLS is enabled,
so it is highly recommended (it also includes a number of other performance
improvements: G1GC, CRC32C, Compact Strings, Thread-Local Handshakes and more).
From a security perspective, we recommend the latest released patch version as older
freely available versions have disclosed security vulnerabilities. Typical arguments for
running Kafka with OpenJDK-based Java implementations (including Oracle JDK) are:
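As a rough sketch of such arguments (heap size and pause targets are illustrative and must be sized for your machines):
-Xmx6g -Xms6g -XX:MetaspaceSize=96m -XX:+UseG1GC
-XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M
-XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+ExplicitGCInvokesConcurrent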
For reference, here are the stats for one of LinkedIn's busiest clusters (at peak) that uses
said Java arguments:
60 brokers
50k partitions (replication factor 2)
800k messages/sec in
300 MB/sec inbound, 1 GB/sec+ outbound
All of the brokers in that cluster have a 90% GC pause time of about 21ms with less than 1
young GC per second.
We are using dual quad-core Intel Xeon machines with 24GB of memory.
You need sufficient memory to buffer active readers and writers. You can do a back-of-the-
envelope estimate of memory needs by assuming you want to be able to buffer for 30
seconds and compute your memory need as write_throughput*30.
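For example, at an assumed peak write throughput of 300 MB/sec this rule of thumb gives 300 MB/sec * 30 sec = roughly 9 GB of memory for buffering, on top of the broker heap itself.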
The disk throughput is important. We have 8x7200 rpm SATA drives. In general disk
throughput is the performance bottleneck, and more disks is better. Depending on how you
configure flush behavior you may or may not benefit from more expensive disks (if you
force flush often then higher RPM SAS drives may be better).
OS
Kafka should run well on any unix system and has been tested on Linux and Solaris.
We have seen a few issues running on Windows and Windows is not currently a well
supported platform though we would be happy to change that.
It is unlikely to require much OS-level tuning, but there are three potentially important OS-
level configurations:
File descriptor limits: Kafka uses file descriptors for log segments and open
connections. If a broker hosts many partitions, consider that the broker needs at
least (number_of_partitions)*(partition_size/segment_size) to track all log segments
in addition to the number of connections the broker makes. We recommend at least
100000 allowed file descriptors for the broker processes as a starting point. Note:
The mmap() function adds an extra reference to the file associated with the file
descriptor fildes which is not removed by a subsequent close() on that file
descriptor. This reference is removed when there are no more mappings to the file.
Max socket buffer size: can be increased to enable high-performance data transfer
between data centers as described here.
Maximum number of memory map areas a process may have (aka
vm.max_map_count). See the Linux kernel documentation. You should keep an eye
at this OS-level property when considering the maximum number of partitions a
broker may have. By default, on a number of Linux systems, the value of
vm.max_map_count is somewhere around 65535. Each log segment, allocated per
partition, requires a pair of index/timeindex files, and each of these files consumes 1
map area. In other words, each log segment uses 2 map areas. Thus, each partition
requires a minimum of 2 map areas, as long as it hosts a single log segment. That is to
say, creating 50000 partitions on a broker will result in the allocation of 100000 map areas
and will likely cause a broker crash with OutOfMemoryError (Map failed) on a system with
default vm.max_map_count. Keep in mind that the number of log segments per
partition varies depending on the segment size, load intensity, retention policy and,
generally, tends to be more than one.
We recommend using multiple drives to get good throughput and not sharing the same
drives used for Kafka data with application logs or other OS filesystem activity to ensure
good latency. You can either RAID these drives together into a single volume or format and
mount each drive as its own directory. Since Kafka has replication the redundancy provided
by RAID can also be provided at the application level. This choice has several tradeoffs.
If you configure multiple data directories partitions will be assigned round-robin to data
directories. Each partition will be entirely in one of the data directories. If data is not well
balanced among partitions this can lead to load imbalance between disks.
RAID can potentially do better at balancing load between disks (although it doesn't always
seem to) because it balances load at a lower level. The primary downside of RAID is that it
is usually a big performance hit for write throughput and reduces the available disk space.
Another potential benefit of RAID is the ability to tolerate disk failures. However our
experience has been that rebuilding the RAID array is so I/O intensive that it effectively
disables the server, so this does not provide much real availability improvement.
Kafka always immediately writes all data to the filesystem and supports the ability to
configure the flush policy that controls when data is forced out of the OS cache and onto
disk. The flush policy can force data to disk after a period of time or after a certain number
of messages has been written; there are several choices in this configuration, sketched below.
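If you do choose application-level flushing, these broker/topic-level properties control it (values are illustrative; by default both are effectively disabled, leaving flushing to the OS and Kafka's background flush):
log.flush.interval.messages=10000
log.flush.interval.ms=1000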
Kafka must eventually call fsync to know that data was flushed. When recovering from a
crash for any log segment not known to be fsync'd Kafka will check the integrity of each
message by checking its CRC and also rebuild the accompanying offset index file as part of
the recovery process executed on startup.
Note that durability in Kafka does not require syncing data to disk, as a failed node will
always recover from its replicas.
We recommend using the default flush settings which disable application fsync entirely.
This means relying on the background flush done by the OS and Kafka's own background
flush. This provides the best of all worlds for most uses: no knobs to tune, great throughput
and latency, and full recovery guarantees. We generally feel that the guarantees provided by
replication are stronger than sync to local disk, however the paranoid still may prefer having
both and application level fsync policies are still supported.
The drawback of using application level flush settings is that it is less efficient in its disk
usage pattern (it gives the OS less leeway to re-order writes) and it can introduce latency as
fsync in most Linux filesystems blocks writes to the file whereas the background flushing
does much more granular page-level locking.
In general you don't need to do any low-level tuning of the filesystem, but in the next few
sections we will go over some of this in case it is useful.
Pdflush has a configurable policy that controls how much dirty data can be maintained in
cache and for how long before it must be written back to disk. This policy is described here.
When Pdflush cannot keep up with the rate of data being written it will eventually cause the
writing process to block, incurring latency in the writes and thereby slowing down the
accumulation of data.
Using pagecache has several advantages over an in-process cache for storing data that will
be written out to disk:
The I/O scheduler will batch together consecutive small writes into bigger physical
writes which improves throughput.
The I/O scheduler will attempt to re-sequence writes to minimize movement of the
disk head which improves throughput.
It automatically uses all the free memory on the machine.
Filesystem Selection
Kafka uses regular files on disk, and as such it has no hard dependency on a specific
filesystem. The two filesystems which have the most usage, however, are EXT4 and XFS.
Historically, EXT4 has had more usage, but recent improvements to the XFS filesystem have
shown it to have better performance characteristics for Kafka's workload with no
compromise in stability.
Comparison testing was performed on a cluster with significant message loads, using a
variety of filesystem creation and mount options. The primary metric in Kafka that was
monitored was the "Request Local Time", indicating the amount of time append operations
were taking. XFS resulted in much better local times (160ms vs. 250ms+ for the best EXT4
configuration), as well as lower average wait times. The XFS performance also showed less
variability in disk performance.
For any filesystem used for data directories, on Linux systems, the following options are
recommended to be used at mount time:
noatime: This option disables updating of a file's atime (last access time) attribute
when the file is read. This can eliminate a significant number of filesystem writes,
especially in the case of bootstrapping consumers. Kafka does not rely on the atime
attributes at all, so it is safe to disable this.
XFS Notes
The XFS filesystem has a significant amount of auto-tuning in place, so it does not require
any change in the default settings, either at filesystem creation time or at mount. The only
tuning parameters worth considering are:
largeio: This affects the preferred I/O size reported by the stat call. While this can
allow for higher performance on larger disk writes, in practice it had minimal or no
effect on performance.
nobarrier: For underlying devices that have battery-backed cache, this option can
provide a little more performance by disabling periodic write flushes. However, if the
underlying device is well-behaved, it will report to the filesystem that it does not
require flushes, and this option will have no effect.
EXT4 Notes
EXT4 is a serviceable choice of filesystem for the Kafka data directories, however getting
the most performance out of it will require adjusting several mount options. In addition,
these options are generally unsafe in a failure scenario, and will result in much more data
loss and corruption. For a single broker failure, this is not much of a concern as the disk can
be wiped and the replicas rebuilt from the cluster. In a multiple-failure scenario, such as a
power outage, this can mean underlying filesystem (and therefore data) corruption that is
not easily recoverable. The following options can be adjusted:
6.6 Monitoring
Kafka uses Yammer Metrics for metrics reporting in the server. The Java clients use Kafka
Metrics, a built-in metrics registry that minimizes transitive dependencies pulled into client
applications. Both expose metrics via JMX and can be configured to report stats using
pluggable stats reporters to hook up to your monitoring system.
All Kafka rate metrics have a corresponding cumulative count metric with suffix -total. For
example, records-consumed-rate has a corresponding metric named records-consumed-
total.
The easiest way to see the available metrics is to fire up jconsole and point it at a running
kafka client or server; this will allow browsing all metrics with JMX.
Security Considerations for Remote Monitoring using JMX
Apache Kafka disables remote JMX by default. You can enable remote monitoring using
JMX by setting the environment variable JMX_PORT for processes started using the CLI or
standard Java system properties to enable remote JMX programmatically. You must enable
security when enabling remote JMX in production scenarios to ensure that unauthorized
users cannot monitor or control your broker or application as well as the platform on which
these are running. Note that authentication is disabled for JMX by default in Kafka and
security configs must be overridden for production deployments by setting the environment
variable KAFKA_JMX_OPTS for processes started using the CLI or by setting appropriate Java
system properties. See Monitoring and Management Using JMX Technology for details on
securing JMX.
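As a sketch, remote JMX can be enabled for a broker started from the CLI roughly like this (the port is illustrative; always secure JMX in production by overriding KAFKA_JMX_OPTS with authentication and SSL settings):
JMX_PORT=9999 bin/kafka-server-start.sh config/server.properties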
Producer monitoring
kafka.producer:type=producer-metrics,client-id="{client-id}"
ATTRIBUTE NAME DESCRIPTION
batch-size-avg: The average number of bytes sent per partition per-request.
batch-size-max: The max number of bytes sent per partition per-request.
batch-split-rate: The average number of batch splits per second
batch-split-total: The total number of batch splits
compression-rate-avg: The average compression rate of record batches.
metadata-age: The age in seconds of the current producer metadata being used.
produce-throttle-time-avg: The average time in ms a request was throttled by a broker
produce-throttle-time-max: The maximum time in ms a request was throttled by a broker
record-error-rate: The average per-second number of record sends that resulted in errors
record-error-total: The total number of record sends that resulted in errors
record-queue-time-avg: The average time in ms record batches spent in the send buffer.
record-queue-time-max: The maximum time in ms record batches spent in the send buffer.
record-retry-rate: The average per-second number of retried record sends
record-retry-total: The total number of retried record sends
record-send-rate: The average number of records sent per second.
record-send-total: The total number of records sent.
record-size-avg: The average record size
record-size-max: The maximum record size
records-per-request-avg: The average number of records per request.
request-latency-avg: The average request latency in ms
request-latency-max: The maximum request latency in ms
requests-in-flight: The current number of in-flight requests awaiting a response.
kafka.producer:type=producer-topic-metrics,client-id="{client-id}",topic="{topic}"
ATTRIBUTE NAME DESCRIPTION
byte-rate: The average number of bytes sent per second for a topic.
byte-total: The total number of bytes sent for a topic.
compression-rate: The average compression rate of record batches for a topic.
record-error-rate: The average per-second number of record sends that resulted in errors for a topic
record-error-total: The total number of record sends that resulted in errors for a topic
record-retry-rate: The average per-second number of retried record sends for a topic
record-retry-total: The total number of retried record sends for a topic
record-send-rate: The average number of records sent per second for a topic.
record-send-total: The total number of records sent for a topic.
Consumer monitoring
kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}"
ATTRIBUTE NAME DESCRIPTION
bytes-consumed-rate: The average number of bytes consumed per second
bytes-consumed-total: The total number of bytes consumed
fetch-latency-avg: The average time taken for a fetch request.
fetch-latency-max: The max time taken for any fetch request.
fetch-rate: The number of fetch requests per second.
fetch-size-avg: The average number of bytes fetched per request
fetch-size-max: The maximum number of bytes fetched per request
fetch-throttle-time-avg: The average throttle time in ms
fetch-throttle-time-max: The maximum throttle time in ms
fetch-total: The total number of fetch requests.
records-consumed-rate: The average number of records consumed per second
records-consumed-total: The total number of records consumed
records-lag-max: The maximum lag in terms of number of records for any partition in this window
records-lead-min: The minimum lead in terms of number of records for any partition in this window
records-per-request-avg: The average number of records in each request
kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}"
ATTRIBUTE NAME DESCRIPTION
bytes-consumed-rate: The average number of bytes consumed per second for a topic
bytes-consumed-total: The total number of bytes consumed for a topic
fetch-size-avg: The average number of bytes fetched per request for a topic
fetch-size-max: The maximum number of bytes fetched per request for a topic
records-consumed-rate: The average number of records consumed per second for a topic
records-consumed-total: The total number of records consumed for a topic
records-per-request-avg: The average number of records in each request for a topic
kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}"
ATTRIBUTE NAME DESCRIPTION
preferred-read-replica: The current read replica for the partition, or -1 if reading from leader
records-lag: The latest lag of the partition
records-lag-avg: The average lag of the partition
records-lag-max: The max lag of the partition
records-lead: The latest lead of the partition
records-lead-avg: The average lead of the partition
records-lead-min: The min lead of the partition
Connect Monitoring
A Connect worker process contains all the producer and consumer metrics as well as
metrics specific to Connect. The worker process itself has a number of metrics, while each
connector and task have additional metrics.
kafka.connect:type=connect-worker-metrics
ATTRIBUTE NAME DESCRIPTION
connector-count: The number of connectors run in this worker.
connector-startup-attempts-total: The total number of connector startups that this worker has attempted.
connector-startup-failure-percentage: The average percentage of this worker's connectors starts that failed.
connector-startup-failure-total: The total number of connector starts that failed.
connector-startup-success-percentage: The average percentage of this worker's connectors starts that succeeded.
connector-startup-success-total: The total number of connector starts that succeeded.
task-count: The number of tasks run in this worker.
task-startup-attempts-total: The total number of task startups that this worker has attempted.
task-startup-failure-percentage: The average percentage of this worker's tasks starts that failed.
task-startup-failure-total: The total number of task starts that failed.
task-startup-success-percentage: The average percentage of this worker's tasks starts that succeeded.
task-startup-success-total: The total number of task starts that succeeded.
kafka.connect:type=connect-worker-metrics,connector="{connector}"
ATTRIBUTE NAME DESCRIPTION
connector-destroyed-task-count: The number of destroyed tasks of the connector on the worker.
connector-failed-task-count: The number of failed tasks of the connector on the worker.
connector-paused-task-count: The number of paused tasks of the connector on the worker.
connector-running-task-count: The number of running tasks of the connector on the worker.
connector-total-task-count: The number of tasks of the connector on the worker.
connector-unassigned-task-count: The number of unassigned tasks of the connector on the worker.
kafka.connect:type=connect-worker-rebalance-metrics
ATTRIBUTE NAME DESCRIPTION
completed-rebalances-total: The total number of rebalances completed by this worker.
connect-protocol: The Connect protocol used by this cluster
epoch: The epoch or generation number of this worker.
leader-name: The name of the group leader.
rebalance-avg-time-ms: The average time in milliseconds spent by this worker to rebalance.
rebalance-max-time-ms: The maximum time in milliseconds spent by this worker to rebalance.
rebalancing: Whether this worker is currently rebalancing.
time-since-last-rebalance-ms: The time in milliseconds since this worker completed the most recent rebalance.
kafka.connect:type=connector-metrics,connector="{connector}"
ATTRIBUTE NAME DESCRIPTION
connector-class: The name of the connector class.
connector-type: The type of the connector. One of 'source' or 'sink'.
connector-version: The version of the connector class, as reported by the connector.
status: The status of the connector. One of 'unassigned', 'running', 'paused', 'failed', or 'destroyed'.
kafka.connect:type=connector-task-metrics,connector="{connector}",task="{task}"
ATTRIBUTE NAME DESCRIPTION
batch-size-avg: The average size of the batches processed by the connector.
batch-size-max: The maximum size of the batches processed by the connector.
offset-commit-avg-time-ms: The average time in milliseconds taken by this task to commit offsets.
offset-commit-failure-percentage: The average percentage of this task's offset commit attempts that failed.
offset-commit-max-time-ms: The maximum time in milliseconds taken by this task to commit offsets.
offset-commit-success-percentage: The average percentage of this task's offset commit attempts that succeeded.
pause-ratio: The fraction of time this task has spent in the pause state.
running-ratio: The fraction of time this task has spent in the running state.
status: The status of the connector task. One of 'unassigned', 'running', 'paused', 'failed', or 'destroyed'.
kafka.connect:type=sink-task-metrics,connector="{connector}",task="{task}"
ATTRIBUTE NAME DESCRIPTION
offset-commit-completion-rate: The average per-second number of offset commit completions that were completed successfully.
offset-commit-completion-total: The total number of offset commit completions that were completed successfully.
offset-commit-seq-no: The current sequence number for offset commits.
offset-commit-skip-rate: The average per-second number of offset commit completions that were received too late and skipped/ignored.
offset-commit-skip-total: The total number of offset commit completions that were received too late and skipped/ignored.
partition-count: The number of topic partitions assigned to this task belonging to the named sink connector in this worker.
put-batch-avg-time-ms: The average time taken by this task to put a batch of sink records.
put-batch-max-time-ms: The maximum time taken by this task to put a batch of sink records.
sink-record-active-count: The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
sink-record-active-count-avg: The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
sink-record-active-count-max: The maximum number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
sink-record-lag-max: The maximum lag in terms of number of records that the sink task is behind the consumer's position for any topic partitions.
sink-record-read-rate: The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.
sink-record-read-total: The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted.
sink-record-send-rate: The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
sink-record-send-total: The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted.
kafka.connect:type=source-task-metrics,connector="{connector}",task="{task}"
ATTRIBUTE NAME DESCRIPTION
poll-batch-avg-time-ms: The average time in milliseconds taken by this task to poll for a batch of source records.
poll-batch-max-time-ms: The maximum time in milliseconds taken by this task to poll for a batch of source records.
source-record-active-count: The number of records that have been produced by this task but not yet completely written to Kafka.
source-record-active-count-avg: The average number of records that have been produced by this task but not yet completely written to Kafka.
source-record-active-count-max: The maximum number of records that have been produced by this task but not yet completely written to Kafka.
source-record-poll-rate: The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
source-record-poll-total: The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
source-record-write-rate: The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
source-record-write-total: The number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted.
kafka.connect:type=task-error-metrics,connector="{connector}",task="{task}"
ATTRIBUTE NAME DESCRIPTION
deadletterqueue-produce-failures: The number of failed writes to the dead letter queue.
deadletterqueue-produce-requests: The number of attempted writes to the dead letter queue.
last-error-timestamp: The epoch timestamp when this task last encountered an error.
total-errors-logged: The number of errors that were logged.
total-record-errors: The number of record processing errors in this task.
total-record-failures: The number of record processing failures in this task.
total-records-skipped: The number of records skipped due to errors.
total-retries: The number of operations retried.
Streams Monitoring
A Kafka Streams instance contains all the producer and consumer metrics as well as
additional metrics specific to Streams. By default Kafka Streams has metrics with two
recording levels: debug and info.
Note that the metrics have a 4-layer hierarchy. At the top level there are client-level metrics
for each started Kafka Streams client. Each client has stream threads, with their own
metrics. Each stream thread has tasks, with their own metrics. Each task has a number of
processor nodes, with their own metrics. Each task also has a number of state stores and
record caches, all with their own metrics.
Use the following configuration option to specify which metrics you want collected:
metrics.recording.level="info"
Client Metrics
Thread Metrics
Task Metrics
All of the following metrics have a recording level of debug, except for metrics dropped-
records-rate and dropped-records-total which have a recording level of info:
All of the following are reported under the MBean kafka.streams:type=stream-task-metrics,thread-id=([-.\w]+),task-id=([-.\w]+):
METRIC/ATTRIBUTE NAME DESCRIPTION
process-latency-avg: The average execution time in ns, for processing.
process-latency-max: The maximum execution time in ns, for processing.
process-rate: The average number of processed records per second across all source processor nodes of this task.
process-total: The total number of processed records across all source processor nodes of this task.
commit-latency-avg: The average execution time in ns, for committing.
commit-latency-max: The maximum execution time in ns, for committing.
commit-rate: The average number of commit calls per second.
commit-total: The total number of commit calls.
record-lateness-avg: The average observed lateness of records (stream time - record timestamp).
record-lateness-max: The max observed lateness of records (stream time - record timestamp).
enforced-processing-rate: The average number of enforced processings per second.
enforced-processing-total: The total number of enforced processings.
dropped-records-rate: The average number of records dropped within this task.
dropped-records-total: The total number of records dropped within this task.
The following metrics are only available on certain types of nodes, i.e., process-rate and
process-total are only available for source processor nodes and suppression-emit-rate and
suppression-emit-total are only available for suppression operation nodes. All of the metrics
have a recording level of debug:
All of the following are reported under the MBean kafka.streams:type=stream-processor-node-metrics,thread-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+):
METRIC/ATTRIBUTE NAME DESCRIPTION
process-rate: The average number of records processed by a source processor node per second.
process-total: The total number of records processed by a source processor node.
suppression-emit-rate: The rate at which records have been emitted downstream from suppression operation nodes.
suppression-emit-total: The total number of records that have been emitted downstream from suppression operation nodes.
All of the following metrics have a recording level of debug. Note that the store-scope value
is specified in StoreSupplier#metricsScope() for user's customized state stores; for built-in
state stores, currently we have:
in-memory-state
in-memory-lru-state
in-memory-window-state
in-memory-suppression (for suppression buffers)
rocksdb-state (for RocksDB backed key-value store)
rocksdb-window-state (for RocksDB backed window store)
rocksdb-session-state (for RocksDB backed session store)
RocksDB Metrics
All of the following metrics have a recording level of debug. The metrics are collected every
minute from the RocksDB state stores. If a state store consists of multiple RocksDB
instances, as is the case for aggregations over time and session windows, each metric
reports an aggregation over the RocksDB instances of the state store. Note that the store-
scope for built-in RocksDB state stores is currently the following:
Others
We recommend monitoring GC time and other stats and various server stats such as CPU
utilization, I/O service time, etc. On the client side, we recommend monitoring the
message/byte rate (global and per topic), request rate/size/time, and on the consumer side,
max lag in messages among all partitions and min fetch request rate. For a consumer to
keep up, max lag needs to be less than a threshold and min fetch rate needs to be larger
than 0.
6.7 ZooKeeper
Stable version
The current stable branch is 3.5. Kafka is regularly updated to include the latest release in
the 3.5 series.
Operationalizing ZooKeeper
Redundancy in the physical/hardware/network layout: try not to put them all in the
same rack, decent (but don't go nuts) hardware, try to keep redundant power and
network paths, etc. A typical ZooKeeper ensemble has 5 or 7 servers, which tolerates
2 and 3 servers down, respectively. If you have a small deployment, then using 3
servers is acceptable, but keep in mind that you'll only be able to tolerate 1 server
down in this case.
I/O segregation: if you do a lot of write type traffic you'll almost definitely want the
transaction logs on a dedicated disk group. Writes to the transaction log are
synchronous (but batched for performance), and consequently, concurrent writes
can significantly affect performance. ZooKeeper snapshots can be one such a
source of concurrent writes, and ideally should be written on a disk group separate
from the transaction log. Snapshots are written to disk asynchronously, so it is
typically ok to share with the operating system and message log files. You can
configure a server to use a separate disk group with the dataLogDir parameter.
Application segregation: Unless you really understand the application patterns of
other apps that you want to install on the same box, it can be a good idea to run
ZooKeeper in isolation (though this can be a balancing act with the capabilities of
the hardware).
Use care with virtualization: It can work, depending on your cluster layout and
read/write patterns and SLAs, but the tiny overheads introduced by the virtualization
layer can add up and throw off ZooKeeper, as it can be very time sensitive
ZooKeeper configuration: It's java, make sure you give it 'enough' heap space (We
usually run them with 3-5G, but that's mostly due to the data set size we have here).
Unfortunately we don't have a good formula for it, but keep in mind that allowing for
more ZooKeeper state means that snapshots can become large, and large
snapshots affect recovery time. In fact, if the snapshot becomes too large (a few
gigabytes), then you may need to increase the initLimit parameter to give enough
time for servers to recover and join the ensemble.
Monitoring: Both JMX and the 4 letter words (4lw) commands are very useful, they
do overlap in some cases (and in those cases we prefer the 4 letter commands, they
seem more predictable, or at the very least, they work better with the LI monitoring
infrastructure)
Don't overbuild the cluster: large clusters, especially in a write heavy usage pattern,
means a lot of intracluster communication (quorums on the writes and subsequent
cluster member updates), but don't underbuild it (and risk swamping the cluster).
Having more servers adds to your read capacity.
Overall, we try to keep the ZooKeeper system as small as will handle the load (plus standard
growth capacity planning) and as simple as possible. We try not to do anything fancy with
the configuration or application layout as compared to the official release as well as keep it
as self contained as possible. For these reasons, we tend to skip the OS packaged versions,
since it has a tendency to try to put things in the OS standard hierarchy, which can be
'messy', for want of a better way to word it.
7. SECURITY
7.1 Security Overview
In release 0.9.0.0, the Kafka community added a number of features that, used either
separately or together, increase security in a Kafka cluster. The following security
measures are currently supported:
It's worth noting that security is optional - non-secured clusters are supported, as well as a
mix of authenticated, unauthenticated, encrypted and non-encrypted clients. The guides
below explain how to configure and use the security features in both clients and brokers.
7.2 Encryption and Authentication using SSL
Apache Kafka allows clients to use SSL for encryption of traffic as well as authentication.
By default, SSL is disabled but can be turned on if needed. The following paragraphs explain
in detail how to set up your own PKI infrastructure, use it to create certificates and configure
Kafka to use these.
The first step of deploying one or more brokers with SSL support is to generate a
public/private keypair for every server. Since Kafka expects all keys and certificates
to be stored in keystores we will use Java's keytool command for this task. The tool
supports two different keystore formats, the Java specific jks format which has been
deprecated by now, as well as PKCS12. PKCS12 is the default format as of Java
version 9, to ensure this format is being used regardless of the Java version in use
all following commands explicitly specify the PKCS12 format.
1. keystorefile: the keystore file that stores the keys (and later the certificate) for
this broker. The keystore file contains the private and public keys of this
broker, therefore it needs to be kept safe. Ideally this step is run on the Kafka
broker that the key will be used on, as this key should never be
transmitted/leave the server that it is intended for.
2. validity: the valid time of the key in days. Please note that this differs from the
validity period for the certificate, which will be determined in Signing the
certificate. You can use the same key to request multiple certificates: if your
key has a validity of 10 years, but your CA will only sign certificates that are
valid for one year, you can use the same key with 10 certificates over time.
To obtain a certificate that can be used with the private key that was just created a
certificate signing request needs to be created. This signing request, when signed by
a trusted CA results in the actual certificate which can then be installed in the
keystore and used for authentication purposes.
To generate certificate signing requests run the following command for all server
keystores created so far.
keytool -keystore server.keystore.jks -alias localhost -validity {validity} -genkey -keyalg RSA -destkeystoretype pkcs12 -ext SAN=DNS:{FQDN},IP:{IPADDRESS1}
This command assumes that you want to add hostname information to the certificate. If this
is not the case, you can omit the extension parameter -ext SAN=DNS:{FQDN},IP:{IPADDRESS1}.
Please see below for more information on this.
Host name verification, when enabled, is the process of checking attributes from the
certificate that is presented by the server you are connecting to against the actual
hostname or ip address of that server to ensure that you are indeed connecting to
the correct server.
The main reason for this check is to prevent man-in-the-middle attacks. For Kafka,
this check has been disabled by default for a long time, but as of Kafka 2.0.0 host
name verification of servers is enabled by default for client connections as well as
inter-broker connections.
Server host name verification may be disabled by
setting ssl.endpoint.identification.algorithm to an empty string.
For dynamically configured broker listeners, hostname verification may be disabled
using kafka-configs.sh:
bin/kafka-configs.sh --bootstrap-server localhost:9093 --entity-type brokers --entity-name 0 --alter --add-config "listener.name.internal.ssl.endpoint.identification.algorithm="
Note:
Normally there is no good reason to disable hostname verification apart from being
the quickest way to "just get it to work" followed by the promise to "fix it later when
there is more time"!
Getting hostname verification right is not that hard when done at the right time, but
gets much harder once the cluster is up and running - do yourself a favor and do it
now!
If host name verification is enabled, clients will verify the server's fully qualified
domain name (FQDN) or ip address against one of the following two fields:
1. Common Name (CN)
2. Subject Alternative Name (SAN)
While Kafka checks both fields, usage of the common name field for hostname
verification has been deprecated since 2000 and should be avoided if possible. In
addition the SAN field is much more flexible, allowing for multiple DNS and IP entries
to be declared in a certificate.
Another advantage is that if the SAN field is used for hostname verification the
common name can be set to a more meaningful value for authorization purposes.
Since we need the SAN field to be contained in the signed certificate, it will be
specified when generating the signing request. It can also be specified when
generating the keypair, but this will not automatically be copied into the signing
request.
To add a SAN field append the following argument -ext SAN=DNS:{FQDN},IP:
{IPADDRESS} to the keytool command:
After this step each machine in the cluster has a public/private key pair which can
already be used to encrypt traffic and a certificate signing request, which is the basis
for creating a certificate. To add authentication capabilities this signing request
needs to be signed by a trusted authority, which will be created in this step.
A certificate authority (CA) is responsible for signing certificates. A CA works like a
government that issues passports: the government stamps (signs) each passport
so that the passport becomes difficult to forge. Other governments verify the stamps
to ensure the passport is authentic. Similarly, the CA signs the certificates, and the
cryptography guarantees that a signed certificate is computationally difficult to
forge. Thus, as long as the CA is a genuine and trusted authority, the clients have a
strong assurance that they are connecting to the authentic machines.
For this guide we will be our own Certificate Authority. When setting up a production
cluster in a corporate environment these certificates would usually be signed by a
corporate CA that is trusted throughout the company. Please see Common Pitfalls in
Production for some things to consider for this case.
Due to a bug in OpenSSL, the x509 module will not copy requested extension fields
from CSRs into the final certificate. Since we want the SAN extension to be present
in our certificate to enable hostname verification, we'll use the ca module instead.
This requires some additional configuration to be in place before we generate our CA
keypair.
Save the following listing into a file called openssl-ca.cnf and adjust the values for
validity and common attributes as necessary.
HOME = .
RANDFILE = $ENV::HOME/.rnd
####################################################################
[ ca ]
default_ca = CA_default # The default ca section
[ CA_default ]
base_dir = .
certificate = $base_dir/cacert.pem # The CA certificate
private_key = $base_dir/cakey.pem # The CA private key
new_certs_dir = $base_dir # Location for new certs after signing
database = $base_dir/index.txt # Database index file
serial = $base_dir/serial.txt # The current serial number
####################################################################
[ req ]
default_bits = 4096
default_keyfile = cakey.pem
distinguished_name = ca_distinguished_name
x509_extensions = ca_extensions
string_mask = utf8only
####################################################################
[ ca_distinguished_name ]
countryName = Country Name (2 letter code)
countryName_default = DE
####################################################################
[ ca_extensions ]
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always, issuer
basicConstraints = critical, CA:true
keyUsage = keyCertSign, cRLSign
####################################################################
[ signing_policy ]
countryName = optional
stateOrProvinceName = optional
localityName = optional
organizationName = optional
organizationalUnitName = optional
commonName = supplied
emailAddress = optional
####################################################################
[ signing_req ]
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid,issuer
basicConstraints = CA:FALSE
keyUsage = digitalSignature, keyEncipherment
Then create a database and serial number file; these will be used to keep track of
which certificates were signed with this CA. Both of these are simply text files that
reside in the same directory as your CA keys (a sketch follows below).
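A sketch of creating those two files, matching the index.txt and serial.txt names in openssl-ca.cnf above:
echo 01 > serial.txt
touch index.txt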
With these steps done you are now ready to generate your CA that will be used to
sign certificates later.
The CA is simply a public/private key pair and certificate that is signed by itself, and
is only intended to sign other certificates.
This keypair should be kept very safe, if someone gains access to it, they can create
and sign certificates that will be trusted by your infrastructure, which means they will
be able to impersonate anybody when connecting to any service that trusts this CA.
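A sketch of generating the CA keypair and self-signed certificate using the openssl-ca.cnf file above (file names follow that configuration; adjust as needed):
openssl req -x509 -config openssl-ca.cnf -newkey rsa:4096 -sha256 -nodes -out cacert.pem -outform PEM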
The next step is to add the generated CA to the clients' truststore so that the
clients can trust this CA:
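A sketch of that import, assuming the CA certificate file cacert.pem from the step above (the truststore name matches the truststore.jks mentioned below):
keytool -keystore truststore.jks -alias CARoot -import -file cacert.pem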
Note: If you configure the Kafka brokers to require client authentication by setting
ssl.client.auth to be "requested" or "required" in the Kafka brokers config then you
must provide a truststore for the Kafka brokers as well and it should have all the CA
certificates that clients' keys were signed by.
Finally, you need to import both the certificate of the CA and the signed certificate
into the keystore:
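A sketch of those two imports; the signed-certificate file name cert-signed is illustrative and should match whatever your CA produced:
keytool -keystore server.keystore.jks -alias CARoot -import -file cacert.pem
keytool -keystore server.keystore.jks -alias localhost -import -file cert-signed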
This will leave you with one truststore called truststore.jks - this can be the same for
all clients and brokers and does not contain any sensitive information, so there is no
need to secure this.
Additionally you will have one server.keystore.jks file per node which contains that
node's keys, certificate and your CA's certificate. Please refer to Configuring Kafka
Brokers and Configuring Kafka Clients for information on how to use these files.
For some tooling assistance on this topic, please check out the easyRSA project
which has extensive scripting in place to help with these steps.
The above paragraphs show the process to create your own CA and use it to sign
certificates for your cluster. While very useful for sandbox, dev, test, and similar
systems, this is usually not the correct process to create certificates for a production
cluster in a corporate environment. Enterprises will normally operate their own CA
and users can send in CSRs to be signed with this CA, which has the benefit of users
not being responsible to keep the CA secure as well as a central authority that
everybody can trust. However it also takes away a lot of control over the process of
signing certificates from the user. Quite often the persons operating corporate CAs
will apply tight restrictions on certificates that can cause issues when trying to use
these certificates with Kafka.
2. Intermediate Certificates
Corporate Root CAs are often kept offline for security reasons. To enable day-
to-day usage, so called intermediate CAs are created, which are then used to
sign the final certificates. When importing a certificate into the keystore that
was signed by an intermediate CA it is necessary to provide the entire chain
of trust up to the root CA. This can be done by simply concatenating (cat-ing)
the certificate files into one combined certificate file and then importing this
with keytool.
3. Failure to copy extension fields
CA operators are often hesitant to copy requested extension fields from
CSRs and prefer to specify these themselves, as this makes it harder for a
malicious party to obtain certificates with potentially misleading or fraudulent
values. It is advisable to double check signed certificates for whether they
contain all requested SAN fields to enable proper hostname verification. The
following command can be used to print certificate details to the console,
which should be compared with what was originally requested:
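A sketch of such a check (the certificate file name is illustrative):
openssl x509 -in certificate.crt -text -noout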
If SSL is not enabled for inter-broker communication (see below for how to enable it),
the listeners property needs to contain both PLAINTEXT and SSL ports:
listeners=PLAINTEXT://host.name:port,SSL://host.name:port
ssl.keystore.location=/var/private/ssl/server.keystore.jks
ssl.keystore.password=test1234
ssl.key.password=test1234
ssl.truststore.location=/var/private/ssl/server.truststore.jks
ssl.truststore.password=test1234
If you want to enable SSL for inter-broker communication, add the following to the
server.properties file (it defaults to PLAINTEXT)
security.inter.broker.protocol=SSL
Due to import regulations in some countries, the Oracle implementation limits the
strength of cryptographic algorithms available by default. If stronger algorithms are
needed (for example, AES with 256-bit keys), the JCE Unlimited Strength Jurisdiction
Policy Files must be obtained and installed in the JDK/JRE. See the JCA Providers
Documentation for more information.
The JRE/JDK will have a default pseudo-random number generator (PRNG) that is
used for cryptography operations, so it is not required to configure the
implementation used with the ssl.secure.random.implementation. However, there are
performance issues with some implementations (notably, the default chosen on
Linux systems, NativePRNG, utilizes a global lock). In cases where performance of
SSL connections becomes an issue, consider explicitly setting the implementation to
be used. The SHA1PRNG implementation is non-blocking, and has shown very good
performance characteristics under heavy load (50 MB/sec of produced messages,
plus replication traffic, per-broker).
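A sketch of that broker setting:
ssl.secure.random.implementation=SHA1PRNG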
Once you start the broker you should be able to see the configured SSL endpoint listed in the server.log.
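To check quickly whether the server keystore and truststore are set up properly, a check along these lines can be used against the SSL port; output like the certificate shown below should appear:
openssl s_client -debug -connect localhost:9093 -tls1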
-----BEGIN CERTIFICATE-----
{variable sized random bytes}
-----END CERTIFICATE-----
subject=/C=US/ST=CA/L=Santa Clara/O=org/OU=org/CN=Sriharsha Chintalapani
issuer=/C=US/ST=CA/L=Santa Clara/O=org/OU=org/CN=kafka/[email protected]
If the certificate does not show up or if there are any other error messages then your
keystore is not setup properly.
SSL is supported only for the new Kafka producer and consumer; the older API is not
supported. The configs for SSL will be the same for both producer and consumer.
If client authentication is not required in the broker, then the following is a minimal
configuration example:
security.protocol=SSL
ssl.truststore.location=/var/private/ssl/client.truststore.jks
ssl.truststore.password=test1234
ssl.keystore.location=/var/private/ssl/client.keystore.jks
ssl.keystore.password=test1234
ssl.key.password=test1234
1. ssl.provider (Optional). The name of the security provider used for SSL
connections. Default value is the default security provider of the JVM.
2. ssl.cipher.suites (Optional). A cipher suite is a named combination of
authentication, encryption, MAC and key exchange algorithm used to
negotiate the security settings for a network connection using TLS or SSL
network protocol.
3. ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1. It should list at least one of
the protocols configured on the broker side
4. ssl.truststore.type=JKS
5. ssl.keystore.type=JKS
kafka-console-producer.sh --bootstrap-server localhost:9093 --topic test --producer.config client-ssl.properties
kafka-console-consumer.sh --bootstrap-server localhost:9093 --topic test --consumer.config client-ssl.properties
1. JAAS configuration
Kafka uses the Java Authentication and Authorization Service (JAAS) for SASL
configuration.
listener.name.sasl_ssl.scram-sha-256.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
    username="admin" \
    password="admin-secret";
listener.name.sasl_ssl.plain.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
    username="admin" \
    password="admin-secret" \
    user_admin="admin-secret" \
    user_alice="alice-secret";
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka_client.keytab"
    principal="[email protected]";
};
-Djava.security.auth.login.config=/etc/kafka/kafka_client_jaas.conf
2. SASL configuration
SASL may be used with PLAINTEXT or SSL as the transport layer using the security
protocol SASL_PLAINTEXT or SASL_SSL respectively. If SASL_SSL is used, then SSL
must also be configured.
1. SASL mechanisms
GSSAPI (Kerberos)
PLAIN
SCRAM-SHA-256
SCRAM-SHA-512
OAUTHBEARER
2. SASL configuration for Kafka brokers
listeners=SASL_PLAINTEXT://host.name:port
If you are only configuring a SASL port (or if you want the Kafka
brokers to authenticate each other using SASL) then make sure you
set the same SASL protocol for inter-broker communication:
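As a sketch (use SASL_SSL instead if your listener is SASL_SSL):
security.inter.broker.protocol=SASL_PLAINTEXT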
SASL authentication is only supported for the new Java Kafka producer and
consumer, the older API is not supported.
KafkaServer {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka_server.keytab"
    principal="kafka/[email protected]";
};

// Zookeeper client authentication
Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka_server.keytab"
    principal="kafka/[email protected]";
};
-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf
Make sure the keytabs configured in the JAAS file are readable by the
operating system user who is starting kafka broker.
Configure SASL port and SASL mechanisms in server.properties as
described here. For example:
listeners=SASL_PLAINTEXT://host.name:port
security.inter.broker.protocol=SASL_PLAINTEXT
sasl.mechanism.inter.broker.protocol=GSSAPI
sasl.enabled.mechanisms=GSSAPI
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
    useKeyTab=true \
    storeKey=true \
    keyTab="/etc/security/keytabs/kafka_client.keytab" \
    principal="[email protected]";
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
    useTicketCache=true;
-Djava.security.krb5.conf=/etc/kafka/krb5.conf
sasl.kerberos.service.name=kafka
KafkaServer {
    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="admin"
    password="admin-secret"
    user_admin="admin-secret"
    user_alice="alice-secret";
};
Pass the JAAS config file location as JVM parameter to each Kafka broker:
-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
    username="alice" \
    password="alice-secret";
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
The default iteration count of 4096 is used if iterations are not specified. A
random salt is created, and the SCRAM identity consisting of salt, iterations,
StoredKey and ServerKey is stored in Zookeeper. See RFC 5802 for details
on SCRAM identity and the individual fields.
Credentials may be deleted for one or more SCRAM mechanisms using the
--alter --delete-config option.
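A sketch of creating and then deleting SCRAM credentials with kafka-configs.sh; the user name and passwords below are illustrative:
bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'SCRAM-SHA-256=[iterations=8192,password=alice-secret],SCRAM-SHA-512=[password=alice-secret]' --entity-type users --entity-name alice
bin/kafka-configs.sh --zookeeper localhost:2181 --alter --delete-config 'SCRAM-SHA-512' --entity-type users --entity-name alice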
KafkaServer {
org.apache.kafka.common.security.scram.ScramLoginModule
required
username="admin"
password="admin-secret";
};
Pass the JAAS config file location as JVM parameter to each Kafka
broker:
-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf
listeners=SASL_SSL://host.name:port
security.inter.broker.protocol=SASL_SSL
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-256 (or SCRAM-SHA-512)
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
    username="alice" \
    password="alice-secret";
security.protocol=SASL_SSL
Add a suitably modified JAAS file similar to the one below to each
Kafka broker's config directory, let's call it kafka_server_jaas.conf for
this example:
KafkaServer {
org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required
unsecuredLoginStringClaim_sub="admin";
};
The property unsecuredLoginStringClaim_sub in
the KafkaServer section is used by the broker when it initiates
connections to other brokers. In this example, admin will appear in the
subject (sub) claim and will be the user for inter-broker
communication.
Pass the JAAS config file location as JVM parameter to each Kafka
broker:
-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf
listeners=SASL_SSL://host.name:port (or SASL_PLAINTEXT if non-production)
security.inter.broker.protocol=SASL_SSL (or SASL_PLAINTEXT if non-production)
sasl.mechanism.inter.broker.protocol=OAUTHBEARER
sasl.enabled.mechanisms=OAUTHBEARER
sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
    unsecuredLoginStringClaim_sub="alice";
sasl.mechanism=OAUTHBEARER
unsecuredLoginExtension_<extensionname>="value" - The extension name is any sequence of lowercase or uppercase alphabet characters. In addition, the "auth" extension name is reserved. A valid extension value is any combination of characters with ASCII codes 1-127.
unsecuredValidatorPrincipalClaimName="value" - Set to a non-empty value if you wish a particular String claim holding a principal name to be checked for existence; the default is to check for the existence of the 'sub' claim.
unsecuredValidatorScopeClaimName="value" - Set to a custom claim name if you wish the name of the String or String List claim holding any token scope to be something other than 'scope'.
unsecuredValidatorRequiredScope="value" - Set to a space-delimited list of scope values if you wish the String/String List claim holding the token scope to be checked to make sure it contains certain values.
unsecuredValidatorAllowableClockSkewMs="value" - Set to a positive integer value if you wish to allow up to some number of positive milliseconds of clock skew (the default is 0).
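As a sketch, these validator options are supplied as JAAS options on the broker's OAUTHBEARER login module; the claim, scope and clock-skew values below are illustrative:
listener.name.sasl_ssl.oauthbearer.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
    unsecuredLoginStringClaim_sub="admin" \
    unsecuredValidatorRequiredScope="kafka" \
    unsecuredValidatorAllowableClockSkewMs="3000";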
Kafka periodically refreshes any token before it expires so that the client can
continue to make connections to brokers. The parameters that impact how
the refresh algorithm operates are specified as part of the
producer/consumer/broker configuration and are as follows. See the
documentation for these properties elsewhere for details. The default values
are usually reasonable, in which case these configuration parameters would
not need to be explicitly set.
Producer/Consumer/Broker Configuration Property
sasl.login.refresh.window.factor
sasl.login.refresh.window.jitter
sasl.login.refresh.min.period.seconds
sasl.login.refresh.buffer.seconds
KafkaServer {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka_server.keytab"
    principal="kafka/[email protected]";

    org.apache.kafka.common.security.plain.PlainLoginModule required
    username="admin"
    password="admin-secret"
    user_admin="admin-secret"
    user_alice="alice-secret";
};
sasl.enabled.mechanisms=GSSAPI,PLAIN,SCRAM-SHA-256,SCRAM-SHA-512,OAUTHBEARER
Specify the SASL security protocol and mechanism for inter-broker
communication in server.properties if required:
SASL mechanism can be modified in a running cluster using the following sequence:
1. User authenticates with the Kafka cluster via SASL or SSL, and obtains a
delegation token. This can be done using Admin APIs or the
kafka-delegation-tokens.sh script (see the example after this list).
2. User securely passes the delegation token to Kafka clients for authenticating
with the Kafka cluster.
3. Token owner/renewer can renew/expire the delegation tokens.
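As referenced in step 1, a token can be created with a command along these lines; client.properties is assumed to hold the SASL/SSL settings used to authenticate, and the renewer principal is illustrative:
bin/kafka-delegation-tokens.sh --bootstrap-server localhost:9092 --create --max-life-time-period -1 --command-config client.properties --renewer-principal User:user1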
4. Token Management
A token has a current life, and a maximum renewable life. By default, tokens
must be renewed once every 24 hours for up to 7 days. These can be
configured using the delegation.token.expiry.time.ms
and delegation.token.max.lifetime.ms config options.
6. Token Authentication
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
    username="tokenID123" \
    password="lAYYSFmLs4bTjf+lTZ1LCHR/ZZFNA==" \
    tokenauth="true";
Kafka ships with a pluggable Authorizer and an out-of-the-box authorizer implementation that
uses ZooKeeper to store all the acls. The Authorizer is configured by
setting authorizer.class.name in server.properties. To enable the out-of-the-box
implementation use:
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
Kafka acls are defined in the general format of "Principal P is [Allowed/Denied] Operation O
From Host H on any Resource R matching ResourcePattern RP". You can read more about
the acl structure in KIP-11 and resource patterns in KIP-290. In order to add, remove or list
acls you can use the Kafka authorizer CLI. By default, if no ResourcePatterns match a
specific Resource R, then R has no associated acls, and therefore no one other than super
users is allowed to access R. If you want to change that behavior, you can include the
following in server.properties.
allow.everyone.if.no.acl.found=true
One can also add super users in server.properties like the following (note that the delimiter
is a semicolon since SSL user names may contain commas). The default PrincipalType string
"User" is case sensitive.
super.users=User:Bob;User:Alice
RULE:pattern/replacement/
RULE:pattern/replacement/[LU]
Example ssl.principal.mapping.rules values are:
RULE:^CN=(.*?),OU=ServiceUsers.*$/$1/,
RULE:^CN=(.*?),OU=(.*?),O=(.*?),L=(.*?),ST=(.*?),C=(.*?)$/$1@$2/L,
RULE:^.*[Cc][Nn]=([a-zA-Z0-9.]*).*$/$1/L,
DEFAULT
By default, the SASL user name will be the primary part of the Kerberos principal. One can
change that by setting sasl.kerberos.principal.to.local.rules to a customized rule in
server.properties. The format of sasl.kerberos.principal.to.local.rules is a list where
each rule works in the same way as auth_to_local in the Kerberos configuration file
(krb5.conf). This also supports an additional lowercase/uppercase rule to force the translated
result to be all lowercase or all uppercase, which is done by adding a "/L" or "/U" to the end of the
rule. Each rule starts with RULE: and contains an expression in one of the following formats.
See the Kerberos documentation for more details.
RULE:[n:string](regexp)s/pattern/replacement/
RULE:[n:string](regexp)s/pattern/replacement/g
RULE:[n:string](regexp)s/pattern/replacement//L
RULE:[n:string](regexp)s/pattern/replacement/g/L
RULE:[n:string](regexp)s/pattern/replacement//U
RULE:[n:string](regexp)s/pattern/replacement/g/U
Valid operation names for the --operation option (default: All) are:
Read
Write
Create
Delete
Alter
Describe
ClusterAction
DescribeConfigs
AlterConfigs
IdempotentWrite
All
Examples
Adding Acls
Suppose you want to add an acl "Principals User:Bob and User:Alice are allowed to
perform Operation Read and Write on Topic Test-Topic from IP 198.51.100.0 and IP
198.51.100.1". You can do that by executing the CLI with following options:
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --add --allow-principal User:Bob
--allow-principal User:Alice --allow-host 198.51.100.0 --allow-host
198.51.100.1 --operation Read --operation Write --topic Test-topic
By default, all principals that don't have an explicit acl allowing access for an
operation on a resource are denied. In rare cases where an allow acl is defined that
grants access to everyone but some principal, we have to use the --deny-principal and
--deny-host options. For example, if we want to allow all users to Read from Test-topic
but deny only User:BadBob from IP 198.51.100.3, we can do so using the following
command:
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --add --allow-principal User:*
--allow-host * --deny-principal User:BadBob --deny-host 198.51.100.3
--operation Read --topic Test-topic
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --add --allow-principal User:Peter
--allow-host 198.51.200.1 --producer --topic *
You can add acls on prefixed resource patterns, e.g. suppose you want to add an acl
"Principal User:Jane is allowed to produce to any Topic whose name starts with
'Test-' from any host". You can do that by executing the CLI with following options:
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --add --allow-principal User:Jane
--producer --topic Test- --resource-pattern-type prefixed
Removing Acls
Removing acls is pretty much the same. The only difference is that instead of the --add
option, users have to specify the --remove option. To remove the acls added by the
first example above we can execute the CLI with the following options:
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --remove --allow-principal User:Bob
--allow-principal User:Alice --allow-host 198.51.100.0 --allow-host
198.51.100.1 --operation Read --operation Write --topic Test-topic
If you want to remove the acl added to the prefixed resource pattern above, you can
execute the CLI with the following options:
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --remove --allow-principal
User:Jane --producer --topic Test- --resource-pattern-type Prefixed
List Acls
We can list acls for any resource by specifying the --list option with the resource. To
list all acls on the literal resource pattern Test-topic, we can execute the CLI with
following options:
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --list --topic Test-topic
However, this will only return the acls that have been added to this exact resource
pattern. Other acls can exist that affect access to the topic, e.g. any acls on the topic
wildcard '*', or any acls on prefixed resource patterns. Acls on the wildcard resource
pattern can be queried explicitly:
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --list --topic *
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --list --topic Test-topic
--resource-pattern-type match
This will list acls on all matching literal, wildcard and prefixed resource patterns.
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --add --allow-principal User:Bob
--producer --topic Test-topic
Similarly to add Alice as a consumer of Test-topic with consumer group Group-1 we
just have to pass --consumer option:
bin/kafka-acls.sh --authorizer-properties
zookeeper.connect=localhost:2181 --add --allow-principal User:Alice
--consumer --topic Test-topic --group Group-1
Note that for the consumer option we must also specify the consumer group. To
remove a principal from a producer or consumer role we just need to pass the --remove
option, as shown below.
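For example, removing Bob's producer acls from Test-topic might look like this sketch, mirroring the add command above:
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --remove --allow-principal User:Bob --producer --topic Test-topic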
Authorization Primitives
Protocol calls usually perform operations on certain resources in Kafka. Knowing these
operations and resources is required to set up effective protection. In this section
we'll list the operations and resources, then list their combinations with the
protocols to see the valid scenarios.
Operations in Kafka
There are a few operation primitives that can be used to build up privileges. These can be
matched up with certain resources to allow specific protocol calls for a given user. These
are:
Read
Write
Create
Delete
Alter
Describe
ClusterAction
DescribeConfigs
AlterConfigs
IdempotentWrite
All
Resources in Kafka
The operations above can be applied on certain resources which are described below.
Topic: this simply represents a Topic. All protocol calls that are acting on topics
(such as reading, writing them) require the corresponding privilege to be added. If
there is an authorization error with a topic resource, then a
TOPIC_AUTHORIZATION_FAILED (error code: 29) will be returned.
Group: this represents the consumer groups in the brokers. All protocol calls that are
working with consumer groups, like joining a group must have privileges with the
group in subject. If the privilege is not given then a GROUP_AUTHORIZATION_FAILED
(error code: 30) will be returned in the protocol response.
Cluster: this resource represents the cluster. Operations that are affecting the whole
cluster, like controlled shutdown are protected by privileges on the Cluster resource.
If there is an authorization problem on a cluster resource, then a
CLUSTER_AUTHORIZATION_FAILED (error code: 31) will be returned.
TransactionalId: this resource represents actions related to transactions, such as
committing. If any error occurs, then a
TRANSACTIONAL_ID_AUTHORIZATION_FAILED (error code: 53) will be returned by
brokers.
DelegationToken: this represents the delegation tokens in the cluster. Actions, such
as describing delegation tokens, could be protected by a privilege on the
DelegationToken resource. Since these objects have somewhat special behaviour in Kafka,
it is recommended to read KIP-48 and the related upstream documentation
at Authentication using Delegation Tokens.
The table below lists the valid operations on resources that are executed by the Kafka
API protocols.
PROTOCOL (API KEY) | OPERATION | RESOURCE | NOTE
PRODUCE (0) | Write | TransactionalId | A transactional producer which has its transactional.id set requires this privilege.
PRODUCE (0) | IdempotentWrite | Cluster | An idempotent produce action requires this privilege.
PRODUCE (0) | Write | Topic | This applies to a normal produce action.
FETCH (1) | ClusterAction | Cluster | A follower must have ClusterAction on the Cluster resource in order to fetch partition data.
FETCH (1) | Read | Topic | Regular Kafka consumers need READ permission on each partition they are fetching.
LIST_OFFSETS (2) | Describe | Topic |
METADATA (3) | Describe | Topic |
METADATA (3) | Create | Cluster | If topic auto-creation is enabled, then the broker-side API will check for the existence of a Cluster level privilege. If it's found then it'll allow creating the topic, otherwise it'll iterate through the Topic level privileges (see the next one).
METADATA (3) | Create | Topic | This authorizes auto topic creation if enabled but the given user doesn't have a cluster level permission (above).
LEADER_AND_ISR (4) | ClusterAction | Cluster |
STOP_REPLICA (5) | ClusterAction | Cluster |
UPDATE_METADATA (6) | ClusterAction | Cluster |
CONTROLLED_SHUTDOWN (7) | ClusterAction | Cluster |
OFFSET_COMMIT (8) | Read | Group | An offset can only be committed if it's authorized to the given group and the topic too (see below). Group access is checked first, then Topic access.
OFFSET_COMMIT (8) | Read | Topic | Since offset commit is part of the consuming process, it needs privileges for the read action.
OFFSET_FETCH (9) | Describe | Group | Similarly to OFFSET_COMMIT, the application must have privileges on group and topic level too to be able to fetch. However in this case it requires describe access instead of read. Group access is checked first, then Topic access.
OFFSET_FETCH (9) | Describe | Topic |
FIND_COORDINATOR (10) | Describe | Group | The FIND_COORDINATOR request can be of "Group" type in which case it is looking for consumer group coordinators. This privilege would represent the Group mode.
FIND_COORDINATOR (10) | Describe | TransactionalId | This applies only on transactional producers and is checked when a producer tries to find the transaction coordinator.
JOIN_GROUP (11) | Read | Group |
HEARTBEAT (12) | Read | Group |
LEAVE_GROUP (13) | Read | Group |
SYNC_GROUP (14) | Read | Group |
DESCRIBE_GROUPS (15) | Describe | Group |
LIST_GROUPS (16) | Describe | Cluster | When the broker checks to authorize a list_groups request it first checks for this cluster level authorization. If none found then it proceeds to check the groups individually. This operation doesn't return CLUSTER_AUTHORIZATION_FAILED.
LIST_GROUPS (16) | Describe | Group | If none of the groups are authorized, then just an empty response will be sent back instead of an error. This operation doesn't return CLUSTER_AUTHORIZATION_FAILED. This is applicable from the 2.1 release.
SASL_HANDSHAKE (17) |  |  | The SASL handshake is part of the authentication process and therefore it's not possible to apply any kind of authorization here.
API_VERSIONS (18) |  |  | The API_VERSIONS request is part of the Kafka protocol handshake and happens on connection and before any authentication. Therefore it's not possible to control this with authorization.
CREATE_TOPICS (19) | Create | Cluster | If there is no cluster level authorization then it won't return CLUSTER_AUTHORIZATION_FAILED but fall back to use topic level, which is just below. That'll throw an error if there is a problem.
CREATE_TOPICS (19) | Create | Topic | This is applicable from the 2.0 release.
DELETE_TOPICS (20) | Delete | Topic |
DELETE_RECORDS (21) | Delete | Topic |
INIT_PRODUCER_ID (22) | Write | TransactionalId |
INIT_PRODUCER_ID (22) | IdempotentWrite | Cluster |
OFFSET_FOR_LEADER_EPOCH (23) | ClusterAction | Cluster | If there is no cluster level privilege for this operation, then it'll check for a topic level one.
OFFSET_FOR_LEADER_EPOCH (23) | Describe | Topic | This is applicable from the 2.1 release.
ADD_PARTITIONS_TO_TXN (24) | Write | TransactionalId | This API is only applicable to transactional requests. It first checks for the Write action on the TransactionalId resource, then it checks the Topic in subject (below).
ADD_PARTITIONS_TO_TXN (24) | Write | Topic |
ADD_OFFSETS_TO_TXN (25) | Write | TransactionalId | Similarly to ADD_PARTITIONS_TO_TXN this is only applicable to transactional requests. It first checks for the Write action on the TransactionalId resource, then it checks whether it can Read on the given group (below).
ADD_OFFSETS_TO_TXN (25) | Read | Group |
END_TXN (26) | Write | TransactionalId |
WRITE_TXN_MARKERS (27) | ClusterAction | Cluster |
TXN_OFFSET_COMMIT (28) | Write | TransactionalId |
TXN_OFFSET_COMMIT (28) | Read | Group |
TXN_OFFSET_COMMIT (28) | Read | Topic |
DESCRIBE_ACLS (29) | Describe | Cluster |
CREATE_ACLS (30) | Alter | Cluster |
DELETE_ACLS (31) | Alter | Cluster |
DESCRIBE_CONFIGS (32) | DescribeConfigs | Cluster | If broker configs are requested, then the broker will check cluster level privileges.
DESCRIBE_CONFIGS (32) | DescribeConfigs | Topic | If topic configs are requested, then the broker will check topic level privileges.
ALTER_CONFIGS (33) | AlterConfigs | Cluster | If broker configs are altered, then the broker will check cluster level privileges.
ALTER_CONFIGS (33) | AlterConfigs | Topic | If topic configs are altered, then the broker will check topic level privileges.
ALTER_REPLICA_LOG_DIRS (34) | Alter | Cluster |
DESCRIBE_LOG_DIRS (35) | Describe | Cluster | An empty response will be returned on authorization failure.
SASL_AUTHENTICATE (36) |  |  | SASL_AUTHENTICATE is part of the authentication process and therefore it's not possible to apply any kind of authorization here.
CREATE_PARTITIONS (37) | Alter | Topic |
CREATE_DELEGATION_TOKEN (38) |  |  | Creating delegation tokens has special rules, for this please see the Authentication using Delegation Tokens section.
RENEW_DELEGATION_TOKEN (39) |  |  | Renewing delegation tokens has special rules, for this please see the Authentication using Delegation Tokens section.
EXPIRE_DELEGATION_TOKEN (40) |  |  | Expiring delegation tokens has special rules, for this please see the Authentication using Delegation Tokens section.
DESCRIBE_DELEGATION_TOKEN (41) | Describe | DelegationToken | Describing delegation tokens has special rules, for this please see the Authentication using Delegation Tokens section.
DELETE_GROUPS (42) | Delete | Group |
ELECT_PREFERRED_LEADERS (43) | ClusterAction | Cluster |
INCREMENTAL_ALTER_CONFIGS (44) | AlterConfigs | Cluster | If broker configs are altered, then the broker will check cluster level privileges.
INCREMENTAL_ALTER_CONFIGS (44) | AlterConfigs | Topic | If topic configs are altered, then the broker will check topic level privileges.
ALTER_PARTITION_REASSIGNMENTS (45) | Alter | Cluster |
LIST_PARTITION_REASSIGNMENTS (46) | Describe | Cluster |
OFFSET_DELETE (47) | Delete | Group |
OFFSET_DELETE (47) | Read | Topic |
You can secure a running cluster via one or more of the supported protocols discussed
previously. This is done in phases:
The specific steps for configuring SSL and SASL are described in sections 7.2 and 7.3.
Follow these steps to enable security for your desired protocol(s).
The security implementation lets you configure different protocols for both broker-client
and broker-broker communication. These must be enabled in separate bounces. A
PLAINTEXT port must be left open throughout so brokers and/or clients can continue to
communicate.
When performing an incremental bounce, stop the brokers cleanly via a SIGTERM. It's also
good practice to wait for restarted replicas to return to the ISR list before moving on to the
next node.
As an example, say we wish to encrypt both broker-client and broker-broker communication
with SSL. In the first incremental bounce, an SSL port is opened on each node:
listeners=PLAINTEXT://broker1:9091,SSL://broker1:9092
We then restart the clients, changing their config to point at the newly opened, secured port:
bootstrap.servers = [broker1:9092,...]
security.protocol = SSL
...etc
In the second incremental server bounce we instruct Kafka to use SSL as the broker-broker
protocol (which will use the same SSL port):
listeners=PLAINTEXT://broker1:9091,SSL://broker1:9092
security.inter.broker.protocol=SSL
In the final bounce we secure the cluster by closing the PLAINTEXT port:
listeners=SSL://broker1:9092
security.inter.broker.protocol=SSL
Alternatively we might choose to open multiple ports so that different protocols can be
used for broker-broker and broker-client communication. Say we wished to use SSL
encryption throughout (i.e. for broker-broker and broker-client communication) but we'd like
to add SASL authentication to the broker-client connection also. We would achieve this by
opening two additional ports during the first bounce:
listeners=PLAINTEXT://broker1:9091,SSL://broker1:9092,SASL_SSL://broker1:9093
We would then restart the clients, changing their config to point at the newly opened, SASL
& SSL secured port:
bootstrap.servers = [broker1:9093,...]
security.protocol = SASL_SSL
...etc
The second server bounce would switch the cluster to use encrypted broker-broker
communication via the SSL port we previously opened on port 9092:
listeners=PLAINTEXT://broker1:9091,SSL://broker1:9092,SASL_SSL://broker1:9093
security.inter.broker.protocol=SSL
The final bounce secures the cluster by closing the PLAINTEXT port.
listeners=SSL://broker1:9092,SASL_SSL://broker1:9093
security.inter.broker.protocol=SSL
ZooKeeper can be secured independently of the Kafka cluster. The steps for doing this are
covered in section 7.6.2.
ZooKeeper supports mutual TLS (mTLS) authentication beginning with the 3.5.x versions.
Kafka supports authenticating to ZooKeeper with SASL and mTLS -- either individually or
both together -- beginning with version 2.5. See KIP-515: Enable ZK client to use the new
TLS supported authentication for more details.
When using mTLS alone, every broker and any CLI tools (such as the ZooKeeper Security
Migration Tool) should identify itself with the same Distinguished Name (DN) because it is
the DN that is ACL'ed. This can be changed as described below, but it involves writing and
deploying a custom ZooKeeper authentication provider. Generally each certificate should
have the same DN but a different Subject Alternative Name (SAN) so that hostname
verification of the brokers and any CLI tools by ZooKeeper will succeed.
When using SASL authentication to ZooKeeper together with mTLS, both the SASL identity
and either the DN that created the znode (i.e. the creating broker's certificate) or the DN of
the Security Migration Tool (if migration was performed after the znode was created) will be
ACL'ed, and all brokers and CLI tools will be authorized even if they all use different DNs
because they will all use the same ACL'ed SASL identity. It is only when using mTLS
authentication alone that all the DNs must match (and SANs become critical -- again, in the
absence of writing and deploying a custom ZooKeeper authentication provider as described
below).
Use the broker properties file to set TLS configs for brokers as described below.
To enable ZooKeeper SASL authentication on brokers, there are two necessary steps:
1. Create a JAAS login file and set the appropriate system property to point to it as
described above
2. Set the configuration property zookeeper.set.acl in each broker to true
The metadata stored in ZooKeeper for the Kafka cluster is world-readable, but can only be
modified by the brokers. The rationale behind this decision is that the data stored in
ZooKeeper is not sensitive, but inappropriate manipulation of that data can cause cluster
disruption. We also recommend limiting the access to ZooKeeper via network segmentation
(only brokers and some admin tools need access to ZooKeeper).
It is possible to use something other than the DN for the identity of mTLS clients by writing
a class that
extends org.apache.zookeeper.server.auth.X509AuthenticationProvider and
overrides the method protected String getClientId(X509Certificate clientCert).
Choose a scheme name and set authProvider.[scheme] in ZooKeeper to be the fully-
qualified class name of the custom implementation; then
set ssl.authProvider=[scheme] to use it.
Here is a sample (partial) ZooKeeper configuration for enabling TLS authentication. These
configurations are described in the ZooKeeper Admin Guide.
secureClientPort=2182
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
authProvider.x509=org.apache.zookeeper.server.auth.X509AuthenticationProvider
ssl.keyStore.location=/path/to/zk/keystore.jks
ssl.keyStore.password=zk-ks-passwd
ssl.trustStore.location=/path/to/zk/truststore.jks
ssl.trustStore.password=zk-ts-passwd
IMPORTANT: ZooKeeper does not support setting the key password in the ZooKeeper
server keystore to a value different from the keystore password itself. Be sure to set the key
password to be the same as the keystore password.
Here is a sample (partial) Kafka Broker configuration for connecting to ZooKeeper with
mTLS authentication. These configurations are described above in Broker Configs.
zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
# define key/trust stores to use TLS to ZooKeeper; ignored unless zookeeper.ssl.client.enable=true
zookeeper.ssl.keystore.location=/path/to/kafka/keystore.jks
zookeeper.ssl.keystore.password=kafka-ks-passwd
zookeeper.ssl.truststore.location=/path/to/kafka/truststore.jks
zookeeper.ssl.truststore.password=kafka-ts-passwd
# tell broker to create ACLs on znodes
zookeeper.set.acl=true
IMPORTANT: ZooKeeper does not support setting the key password in the ZooKeeper client
(i.e. broker) keystore to a value different from the keystore password itself. Be sure to set
the key password to be the same as the keystore password.
If you are running a version of Kafka that does not support security, or are simply running with security
disabled, and you want to make the cluster secure, then you need to execute the following
steps to enable ZooKeeper authentication with minimal disruption to your operations:
9. Perform a rolling restart of brokers setting the JAAS login file and/or defining
ZooKeeper mutual TLS configurations (including connecting to the TLS-enabled
ZooKeeper port) as required, which enables brokers to authenticate to ZooKeeper. At
the end of the rolling restart, brokers are able to manipulate znodes with strict ACLs,
but they will not create znodes with those ACLs
10. If you enabled mTLS, disable the non-TLS port in ZooKeeper
11. Perform a second rolling restart of brokers, this time setting the configuration
parameter zookeeper.set.acl to true, which enables the use of secure ACLs when
creating znodes
12. Execute the ZkSecurityMigrator tool. To execute the tool, there is this
script: bin/zookeeper-security-migration.sh with zookeeper.acl set to secure.
This tool traverses the corresponding sub-trees changing the ACLs of the znodes.
Use the --zk-tls-config-file <file> option if you enable mTLS.
It is also possible to turn off authentication in a secure cluster. To do it, follow these steps:
1. Perform a rolling restart of brokers setting the JAAS login file and/or defining
ZooKeeper mutual TLS configurations, which enables brokers to authenticate, but
setting zookeeper.set.acl to false. At the end of the rolling restart, brokers stop
creating znodes with secure ACLs, but are still able to authenticate and manipulate
all znodes
2. Execute the ZkSecurityMigrator tool. To execute the tool, run this
script bin/zookeeper-security-migration.sh with zookeeper.acl set to unsecure.
This tool traverses the corresponding sub-trees changing the ACLs of the znodes.
Use the --zk-tls-config-file <file> option if you need to set TLS configuration.
3. If you are disabling mTLS, enable the non-TLS port in ZooKeeper
4. Perform a second rolling restart of brokers, this time omitting the system property
that sets the JAAS login file and/or removing ZooKeeper mutual TLS configuration
(including connecting to the non-TLS-enabled ZooKeeper port) as required
5. If you are disabling mTLS, disable the TLS port in ZooKeeper
bin/zookeeper-security-migration.sh --zookeeper.acl=secure
--zookeeper.connect=localhost:2181
bin/zookeeper-security-migration.sh --help
ZooKeeper connections that use mutual TLS are encrypted. Beginning with ZooKeeper
version 3.5.7 (the version shipped with Kafka version 2.5), ZooKeeper supports a server-side
config ssl.clientAuth (case-insensitively: want/need/none are the valid options, the
default is need), and setting this value to none in ZooKeeper allows clients to connect via a
TLS-encrypted connection without presenting their own certificate. Here is a sample
(partial) Kafka Broker configuration for connecting to ZooKeeper with just TLS encryption.
These configurations are described above in Broker Configs.
# connect to the ZooKeeper port configured for TLS
zookeeper.connect=zk1:2182,zk2:2182,zk3:2182
# required to use TLS to ZooKeeper (default is false)
zookeeper.ssl.client.enable=true
# required to use TLS to ZooKeeper
zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
# define trust stores to use TLS to ZooKeeper; ignored unless zookeeper.ssl.client.enable=true
# no need to set keystore information assuming ssl.clientAuth=none on ZooKeeper
zookeeper.ssl.truststore.location=/path/to/kafka/truststore.jks
zookeeper.ssl.truststore.password=kafka-ts-passwd
# tell broker to create ACLs on znodes (if using SASL authentication, otherwise do not set this)
zookeeper.set.acl=true
8. KAFKA CONNECT
8.1 Overview
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and
other systems. It makes it simple to quickly define connectors that move large collections
of data into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics
from all your application servers into Kafka topics, making the data available for stream
processing with low latency. An export job can deliver data from Kafka topics into
secondary storage and query systems or into batch systems for offline analysis.
Kafka Connect currently supports two modes of execution: standalone (single process) and
distributed.
In standalone mode all work is performed in a single process. This configuration is simpler
to set up and get started with and may be useful in situations where only one worker makes
sense (e.g. collecting log files), but it does not benefit from some of the features of Kafka
Connect such as fault tolerance. You can start a standalone process with a command of the
form sketched below.
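Assuming the scripts and sample properties files shipped with the standard distribution, the invocation looks like:
bin/connect-standalone.sh config/connect-standalone.properties connector1.properties [connector2.properties ...]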
The first parameter is the configuration for the worker. This includes settings such as the
Kafka connection parameters, serialization format, and how frequently to commit offsets.
The provided example should work well with a local cluster running with the default
configuration provided by config/server.properties. It will require tweaking to use with a
different configuration or production deployment. All workers (both standalone and
distributed) require a few configs:
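As a sketch, the commonly required worker settings include the following; offset.storage.file.filename applies only to standalone mode, and the values are illustrative:
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.file.filename=/tmp/connect.offsets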
The parameters that are configured here are intended for producers and consumers used
by Kafka Connect to access the configuration, offset and status topics. For configuration of
the producers used by Kafka source tasks and the consumers used by Kafka sink tasks, the
same parameters can be used but need to be prefixed
with producer. and consumer. respectively. The only Kafka client parameter that is inherited
without a prefix from the worker configuration is bootstrap.servers, which in most cases
will be sufficient, since the same cluster is often used for all purposes. A notable exception
is a secured cluster, which requires extra parameters to allow connections. These
parameters will need to be set up to three times in the worker configuration, once for
management access, once for Kafka sources and once for Kafka sinks.
Starting with 2.3.0, client configuration overrides can be configured individually per
connector by using the prefixes producer.override. and consumer.override. for Kafka
sources or Kafka sinks respectively. These overrides are included with the rest of the
connector's configuration properties.
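For example, a hypothetical source connector configuration could raise its producer's compression with an override like this sketch (the connector name is made up):
name=my-source-connector
connector.class=FileStreamSource
producer.override.compression.type=lz4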
The remaining parameters are connector configuration files. You may include as many as
you want, but all will execute within the same process (on different threads).
Distributed mode handles automatic balancing of work, allows you to scale up (or down)
dynamically, and offers fault tolerance both in the active tasks and for configuration and
offset commit data. Execution is very similar to standalone mode:
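A sketch of the typical invocation, using the bundled sample configuration file:
bin/connect-distributed.sh config/connect-distributed.properties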
The difference is in the class which is started and the configuration parameters which
change how the Kafka Connect process decides where to store configurations, how to
assign work, and where to store offsets and task statuses. In distributed mode, Kafka
Connect stores the offsets, configs and task statuses in Kafka topics. It is recommended to
manually create the topics for offsets, configs and statuses in order to achieve the desired
number of partitions and replication factor. If the topics are not yet created when
starting Kafka Connect, the topics will be auto-created with the default number of partitions and
replication factor, which may not be best suited for their usage.
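A sketch of the relevant distributed-mode settings, assuming a three-broker cluster; the topic names, partition counts and replication factors are illustrative:
group.id=connect-cluster
config.storage.topic=connect-configs
config.storage.replication.factor=3
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
offset.storage.partitions=25
status.storage.topic=connect-status
status.storage.replication.factor=3
status.storage.partitions=5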
Note that in distributed mode the connector configurations are not passed on the command
line. Instead, use the REST API described below to create, modify, and destroy connectors.
Configuring Connectors
Connector configurations are simple key-value mappings. For standalone mode these are
defined in a properties file and passed to the Connect process on the command line. In
distributed mode, they will be included in the JSON payload for the request that creates (or
modifies) the connector.
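For instance, a connector could be created through the REST API with a request like the following sketch; the connector name and settings are illustrative:
curl -X POST -H "Content-Type: application/json" \
    --data '{"name":"local-file-source","config":{"connector.class":"FileStreamSource","tasks.max":"1","file":"test.txt","topic":"connect-test"}}' \
    http://localhost:8083/connectors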
Most configurations are connector dependent, so they can't be outlined here. However,
there are a few common options:
name - Unique name for the connector. Attempting to register again with the same
name will fail.
connector.class - The Java class for the connector
tasks.max - The maximum number of tasks that should be created for this connector.
The connector may create fewer tasks if it cannot achieve this level of parallelism.
key.converter - (optional) Override the default key converter set by the worker.
value.converter - (optional) Override the default value converter set by the worker.
The connector.class config supports several formats: the full name or alias of the class for
this connector. If the connector is org.apache.kafka.connect.file.FileStreamSinkConnector,
you can either specify this full name or use FileStreamSink or FileStreamSinkConnector to
make the configuration a bit shorter.
Sink connectors also have a few additional options to control their input. Each sink
connector must set one of the following:
topics - A comma-separated list of topics to use as input for this connector
topics.regex - A Java regular expression of topics to use as input for this connector
For any other options, you should consult the documentation for the connector.
Transformations
Connectors can be configured with transformations to make lightweight message-at-a-time
modifications. To configure transformations for a connector, specify:
transforms - List of aliases for the transformations, specifying the order in which the
transformations will be applied.
transforms.$alias.type - Fully qualified class name for the transformation.
transforms.$alias.$transformationSpecificConfig - Configuration properties for the
transformation
For example, let's take the built-in file source connector and use a transformation to add a
static field.
Throughout the example we'll use schemaless JSON data format. To use schemaless
format, we changed the following two lines in connect-standalone.properties from true to
false:
key.converter.schemas.enable
value.converter.schemas.enable
The file source connector reads each line as a String. We will wrap each line in a Map and
then add a second field to identify the origin of the event. To do this, we use two
transformations:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test.txt
topic=connect-test
transforms=MakeMap, InsertSource
transforms.MakeMap.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.MakeMap.field=line
transforms.InsertSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.InsertSource.static.field=data_source
transforms.InsertSource.static.value=test-file-source
All the lines starting with transforms were added for the transformations. You can see the
two transformations we created: "InsertSource" and "MakeMap" are aliases that we chose
to give the transformations. The transformation types are based on the list of built-in
transformations you can see below. Each transformation type has additional configuration:
HoistField requires a configuration called "field", which is the name of the field in the map
that will include the original String from the file. InsertField transformation lets us specify
the field name and the value that we are adding.
When we ran the file source connector on a sample file without the transformations, and
then read the records using kafka-console-consumer.sh, the results were:
"foo"
"bar"
"hello world"
We then create a new file connector, this time after adding the transformations to the
configuration file. This time, the results will be:
{"line":"foo","data_source":"test-file-source"}
{"line":"bar","data_source":"test-file-source"}
{"line":"hello world","data_source":"test-file-source"}
You can see that the lines we've read are now part of a JSON map, and there is an extra
field with the static value we specified. This is just one example of what you can do with
transformations.
Included transformations
Several widely-applicable data and routing transformations are included with Kafka
Connect:
org.apache.kafka.connect.transforms.InsertField
Insert field(s) using attributes from the record metadata or a configured static value.
Use the concrete transformation type designed for the record key
(org.apache.kafka.connect.transforms.InsertField$Key ) or value
(org.apache.kafka.connect.transforms.InsertField$Value ).
offset.field
Field name for Kafka offset - only applicable to sink connectors. Suffix with ! to make this a
required field, or ? to keep it optional (the default).
Type: string
Default: null
Valid Values:
Importance: medium
partition.field
Field name for Kafka partition. Suffix with ! to make this a required field, or ? to keep
it optional (the default).
Type: string
Default: null
Valid Values:
Importance: medium
static.field
Field name for static data field. Suffix with ! to make this a required field, or ? to keep
it optional (the default).
Type: string
Default: null
Valid Values:
Importance: medium
static.value
Static field value, if field name configured.
Type: string
Default: null
Valid Values:
Importance: medium
timestamp.field
Field name for record timestamp. Suffix with ! to make this a required field, or ? to
keep it optional (the default).
Type: string
Default: null
Valid Values:
Importance: medium
topic.field
Field name for Kafka topic. Suffix with ! to make this a required field, or ? to keep it
optional (the default).
Type: string
Default: null
Valid Values:
Importance: medium
org.apache.kafka.connect.transforms.ReplaceField
Filter or rename fields.
blacklist
Fields to exclude. This takes precedence over the whitelist.
Type: list
Default: ""
Valid Values:
Importance: medium
renames
Field rename mappings.
Type: list
Default: ""
Valid Values: list of colon-delimited pairs, e.g. foo:bar,abc:xyz
Importance: medium
whitelist
Fields to include. If specified, only these fields will be used.
Type: list
Default: ""
Valid Values:
Importance: medium
org.apache.kafka.connect.transforms.MaskField
Mask specified fields with a valid null value for the field type (i.e. 0, false, empty string, and
so on).
For numeric and string fields, an optional replacement value can be specified that is
converted to the correct type.
Use the concrete transformation type designed for the record key
(org.apache.kafka.connect.transforms.MaskField$Key ) or value
(org.apache.kafka.connect.transforms.MaskField$Value ).
fields
Type: list
Default:
Valid Values: non-empty list
Importance: high
replacement
Custom value replacement, that will be applied to all 'fields' values (numeric or non-
empty string values only).
Type: string
Default: null
Valid Values: non-empty string
Importance: low
org.apache.kafka.connect.transforms.ValueToKey
Replace the record key with a new key formed from a subset of fields in the record value.
fields
Field names on the record value to extract as the record key.
Type: list
Default:
Valid Values: non-empty list
Importance: high
org.apache.kafka.connect.transforms.HoistField
Wrap data using the specified field name in a Struct when schema present, or a Map in the
case of schemaless data.
Use the concrete transformation type designed for the record key
(org.apache.kafka.connect.transforms.HoistField$Key ) or value
(org.apache.kafka.connect.transforms.HoistField$Value ).
field
Field name for the single field that will be created in the resulting Struct or Map.
Type: string
Default:
Valid Values:
Importance: medium
org.apache.kafka.connect.transforms.ExtractField
Extract the specified field from a Struct when schema present, or a Map in the case of
schemaless data. Any null values are passed through unmodified.
Use the concrete transformation type designed for the record key
(org.apache.kafka.connect.transforms.ExtractField$Key ) or value
(org.apache.kafka.connect.transforms.ExtractField$Value ).
field
Field name to extract.
Type: string
Default:
Valid Values:
Importance: medium
org.apache.kafka.connect.transforms.SetSchemaMetadata
Set the schema name, version or both on the record's key or value schema.
schema.name
Schema name to set.
Type: string
Default: null
Valid Values:
Importance: high
schema.version
Schema version to set.
Type: int
Default: null
Valid Values:
Importance: high
org.apache.kafka.connect.transforms.TimestampRouter
Update the record's topic field as a function of the original topic value and the record
timestamp.
This is mainly useful for sink connectors, since the topic field is often used to determine the
equivalent entity name in the destination system (e.g. database table or search index name).
timestamp.format
Format string for the timestamp that is compatible with java.text.SimpleDateFormat.
Type: string
Default: yyyyMMdd
Valid Values:
Importance: high
topic.format
Format string which can contain ${topic} and ${timestamp} as placeholders for the
topic and timestamp, respectively.
Type: string
Default: ${topic}-${timestamp}
Valid Values:
Importance: high
org.apache.kafka.connect.transforms.RegexRouter
Update the record topic using the configured regular expression and replacement string.
Under the hood, the regex is compiled to a java.util.regex.Pattern. If the pattern matches
the input topic, java.util.regex.Matcher#replaceFirst() is used with the replacement
string to obtain the new topic.
regex
Type: string
Default:
Valid Values: valid regex
Importance: high
replacement
Replacement string.
Type: string
Default:
Valid Values:
Importance: high
org.apache.kafka.connect.transforms.Flatten
Flatten a nested data structure, generating names for each field by concatenating the field
names at each level with a configurable delimiter character. Applies to Struct when schema
present, or a Map in the case of schemaless data. The default delimiter is '.'.
Use the concrete transformation type designed for the record key
(org.apache.kafka.connect.transforms.Flatten$Key ) or value
(org.apache.kafka.connect.transforms.Flatten$Value ).
delimiter
Delimiter to insert between field names from the input record when generating field
names for the output record
Type: string
Default: .
Valid Values:
Importance: medium
org.apache.kafka.connect.transforms.Cast
Cast fields or the entire key or value to a specific type, e.g. to force an integer field to a
smaller width. Only simple primitive types are supported -- integers, floats, boolean, and
string.
Use the concrete transformation type designed for the record key
(org.apache.kafka.connect.transforms.Cast$Key ) or value
(org.apache.kafka.connect.transforms.Cast$Value ).
spec
List of fields and the type to cast them to of the form field1:type,field2:type to cast
fields of Maps or Structs. A single type to cast the entire value. Valid types are int8,
int16, int32, int64, float32, float64, boolean, and string.
Type: list
Default:
Valid Values: list of colon-delimited pairs, e.g. foo:bar,abc:xyz
Importance: high
org.apache.kafka.connect.transforms.TimestampConverter
Convert timestamps between different formats such as Unix epoch, strings, and Connect
Date/Timestamp types. Applies to individual fields or to the entire value.
Use the concrete transformation type designed for the record key
(org.apache.kafka.connect.transforms.TimestampConverter$Key ) or value
(org.apache.kafka.connect.transforms.TimestampConverter$Value ).
target.type
The desired timestamp representation: string, unix, Date, Time, or Timestamp
Type: string
Default:
Valid Values:
Importance: high
field
The field containing the timestamp, or empty if the entire value is a timestamp
Type: string
Default: ""
Valid Values:
Importance: high
format
A SimpleDateFormat-compatible format for the timestamp. Used to generate the output
when type=string, or used to parse the input if the input is a string.
Type: string
Default: ""
Valid Values:
Importance: medium
org.apache.kafka.connect.transforms.Filter
Drops all records, filtering them from subsequent transformations in the chain. This is
intended to be used conditionally to filter out records matching (or not matching) a
particular Predicate.
Predicates
Transformations can be configured with predicates so that a transformation is applied only
to records which satisfy some condition. For example, suppose you have a source connector
which produces messages to many different topics and you want to:
filter out the records entirely from the topic 'foo'
apply the ExtractField transformation with the field name 'other_field' to records in all topics except the topic 'bar'
To do this we first need to filter out the records destined for the topic 'foo'. The Filter
transformation removes records from further processing, and can use the
TopicNameMatches predicate to apply the transformation only to records in topics which
match a certain regular expression. TopicNameMatches's only configuration property
is pattern, which is a Java regular expression for matching against the topic name. The
configuration would look like this:
transforms=Filter
transforms.Filter.type=org.apache.kafka.connect.transforms.Filter
transforms.Filter.predicate=IsFoo
predicates=IsFoo
predicates.IsFoo.type=org.apache.kafka.connect.predicates.TopicNameMatche
s
predicates.IsFoo.pattern=foo
Next we need to apply ExtractField only when the topic name of the record is not 'bar'. We
can't just use TopicNameMatches directly, because that would apply the transformation to
matching topic names, not topic names which do not match. The transformation's
implicit negate config property allows us to invert the set of records which a predicate
matches. Adding the configuration for this to the previous example we arrive at:
transforms=Filter,Extract
transforms.Filter.type=org.apache.kafka.connect.transforms.Filter
transforms.Filter.predicate=IsFoo
transforms.Extract.type=org.apache.kafka.connect.transforms.ExtractField$
Key
transforms.Extract.field=other_field
transforms.Extract.predicate=IsBar
transforms.Extract.negate=true
predicates=IsFoo,IsBar
predicates.IsFoo.type=org.apache.kafka.connect.predicates.TopicNameMatche
s
predicates.IsFoo.pattern=foo
predicates.IsBar.type=org.apache.kafka.connect.predicates.TopicNameMatche
s
predicates.IsBar.pattern=bar
org.apache.kafka.connect.transforms.predicates.HasHeaderKey
A predicate which is true for records with at least one header with the configured name.
name
Type: string
Default:
Valid Values: non-empty string
Importance: medium
org.apache.kafka.connect.transforms.predicates.RecordIsTombstone
A predicate which is true for records which are tombstones (i.e. have null value).
org.apache.kafka.connect.transforms.predicates.TopicNameMatches
A predicate which is true for records with a topic name that matches the configured regular
expression.
pattern
A Java regular expression for matching against the name of a record's topic.
Type: string
Default:
Valid Values: non-empty string, valid regex
Importance: medium
REST API
Since Kafka Connect is intended to be run as a service, it also provides a REST API for
managing connectors. The REST API server can be configured using
the listeners configuration option. This field should contain a list of listeners in the
following format: protocol://host:port,protocol2://host2:port2 . Currently supported
protocols are http and https. For example:
listeners=http://localhost:8080,https://localhost:8443
By default, if no listeners are specified, the REST server runs on port 8083 using the HTTP
protocol. When using HTTPS, the configuration has to include the SSL configuration. By
default, it will use the ssl.* settings. If the REST API needs a different configuration than
the one used for connecting to Kafka brokers, the fields can be prefixed
with listeners.https. When using the prefix, only the prefixed options will be used and
the ssl.* options without the prefix will be ignored (see the sketch after this list). The
following fields can be used to configure HTTPS for the REST API:
ssl.keystore.location
ssl.keystore.password
ssl.keystore.type
ssl.key.password
ssl.truststore.location
ssl.truststore.password
ssl.truststore.type
ssl.enabled.protocols
ssl.provider
ssl.protocol
ssl.cipher.suites
ssl.keymanager.algorithm
ssl.secure.random.implementation
ssl.trustmanager.algorithm
ssl.endpoint.identification.algorithm
ssl.client.auth
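A sketch of the prefixed form referenced above; the paths and passwords are placeholders:
listeners=https://localhost:8443
listeners.https.ssl.keystore.location=/path/to/rest/keystore.jks
listeners.https.ssl.keystore.password=rest-ks-passwd
listeners.https.ssl.key.password=rest-key-passwd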
The REST API is used not only by users to monitor / manage Kafka Connect. It is also used
for Kafka Connect cross-cluster communication. Requests received on the follower
nodes' REST API will be forwarded to the leader node's REST API. If the URI under which
a given host is reachable differs from the URI it listens on, the configuration
options rest.advertised.host.name, rest.advertised.port and rest.advertised.listener can
be used to change the URI which will be used by the follower nodes to connect with the
leader. When using both HTTP and HTTPS listeners, the rest.advertised.listener option
can also be used to define which listener will be used for the cross-cluster communication.
When using HTTPS for communication between nodes, the
same ssl.* or listeners.https options will be used to configure the HTTPS client.
Kafka Connect also provides a REST API for getting information about connector plugins,
as well as a top-level endpoint:
GET / - return basic information about the Kafka Connect cluster such as the version
of the Connect worker that serves the REST request (including git commit ID of the
source code) and the Kafka cluster ID that it is connected to.
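For example, the installed connector plugins can be listed with a request like the following sketch:
curl http://localhost:8083/connector-plugins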
Kafka Connect provides error reporting to handle errors encountered along various stages
of processing. By default, any error encountered during conversion or within
transformations will cause the connector to fail. Each connector configuration can also
enable tolerating such errors by skipping them, optionally writing each error and the details
of the failed operation and problematic record (with various levels of detail) to the Connect
application log. These mechanisms also capture errors when a sink connector is
processing the messages consumed from its Kafka topics, and all of the errors can be
written to a configurable "dead letter queue" (DLQ) Kafka topic.
To report errors within a connector's converter, transforms, or within the sink connector
itself to the log, set errors.log.enable=true in the connector configuration to log details of
each error and problem record's topic, partition, and offset. For additional debugging
purposes, set errors.log.include.messages=true to also log the problem record key, value,
and headers to the log (note this may log sensitive information).
To report errors within a connector's converter, transforms, or within the sink connector
itself to a dead letter queue topic, set errors.deadletterqueue.topic.name, and
optionally errors.deadletterqueue.context.headers.enable=true .
By default, connectors exhibit "fail fast" behavior immediately upon an error or exception.
This is equivalent to adding the following configuration properties with their defaults to a
connector configuration:
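A sketch of those defaults, using the standard error-handling properties:
# retries are disabled by default
errors.retry.timeout=0
# do not log errors and their contexts
errors.log.enable=false
# do not record errors in a dead letter queue topic
errors.deadletterqueue.topic.name=
# fail on the first error
errors.tolerance=none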
These and other related connector configuration properties can be changed to provide
different behavior. For example, the following configuration properties can be added to a
connector configuration to set up error handling with multiple retries, logging to the
application logs and the my-connector-errors Kafka topic, and tolerating all errors by
reporting them rather than failing the connector task:
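A sketch of such a configuration; the retry and delay values are illustrative:
# retry for at most 10 minutes, waiting up to 30 seconds between consecutive failures
errors.retry.timeout=600000
errors.retry.delay.max.ms=30000
# log error context along with application logs, but do not include configs and messages
errors.log.enable=true
errors.log.include.messages=false
# produce error context into the Kafka topic
errors.deadletterqueue.topic.name=my-connector-errors
# tolerate all errors
errors.tolerance=all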
This guide describes how developers can write new connectors for Kafka Connect to move
data between Kafka and other systems. It briefly reviews a few key concepts and then
describes how to create a simple connector.
To copy data between Kafka and another system, users create a Connector for the system
they want to pull data from or push data to. Connectors come in two
flavors: SourceConnectors import data from another system (e.g. JDBCSourceConnector would
import a relational database into Kafka) and SinkConnectors export data
(e.g. HDFSSinkConnector would export the contents of a Kafka topic to an HDFS file).
Connectors do not perform any data copying themselves: their configuration describes the
data to be copied, and the Connector is responsible for breaking that job into a set
of Tasks that can be distributed to workers. These Tasks also come in two corresponding
flavors: SourceTask and SinkTask.
With an assignment in hand, each Task must copy its subset of the data to or from Kafka. In
Kafka Connect, it should always be possible to frame these assignments as a set of input
and output streams consisting of records with consistent schemas. Sometimes this
mapping is obvious: each file in a set of log files can be considered a stream with each
parsed line forming a record using the same schema and offsets stored as byte offsets in
the file. In other cases it may require more effort to map to this model: a JDBC connector
can map each table to a stream, but the offset is less clear. One possible mapping uses a
timestamp column to generate queries incrementally returning new data, and the last
queried timestamp can be used as the offset.
Each stream should be a sequence of key-value records. Both the keys and values can have
complex structure -- many primitive types are provided, but arrays, objects, and nested data
structures can be represented as well. The runtime data format does not assume any
particular serialization format; this conversion is handled internally by the framework.
In addition to the key and value, records (both those generated by sources and those
delivered to sinks) have associated stream IDs and offsets. These are used by the
framework to periodically commit the offsets of data that have been processed so that in
the event of failures, processing can resume from the last committed offsets, avoiding
unnecessary reprocessing and duplication of events.
Dynamic Connectors
Not all jobs are static, so Connector implementations are also responsible for monitoring the
external system for any changes that might require reconfiguration. For example, in
the JDBCSourceConnector example, the Connector might assign a set of tables to each Task.
When a new table is created, it must discover this so it can assign the new table to one of
the Tasks by updating its configuration. When it notices a change that requires
reconfiguration (or a change in the number of Tasks), it notifies the framework and the
framework updates any corresponding Tasks.
The rest of this section will walk through some code to demonstrate the key steps in
creating a connector, but developers should also refer to the full example source code as
many details are omitted for brevity.
Connector Example
We'll cover SourceConnector as a simple example; SinkConnector implementations are very similar. The example connector streams lines from a file to a Kafka topic, so its configuration consists of the filename to read from and the topic to write to. The easiest method to fill in is taskClass(), which defines the class that should be instantiated in worker processes to actually do the work:
@Override
public Class<? extends Task> taskClass() {
    return FileStreamSourceTask.class;
}

Next, we add some standard lifecycle methods, start() and stop():

@Override
public void start(Map<String, String> props) {
    // The complete version includes error handling as well.
    filename = props.get(FILE_CONFIG);
    topic = props.get(TOPIC_CONFIG);
}

@Override
public void stop() {
    // Nothing to do since no background monitoring is required.
}
Finally, the real core of the implementation is in taskConfigs(). In this case we are only
handling a single file, so even though we may be permitted to generate more tasks as per
the maxTasks argument, we return a list with only one entry:
@Override
public List<Map<String, String>> taskConfigs(int maxTasks) {
    ArrayList<Map<String, String>> configs = new ArrayList<>();
    // Only one input stream makes sense.
    Map<String, String> config = new HashMap<>();
    if (filename != null)
        config.put(FILE_CONFIG, filename);
    config.put(TOPIC_CONFIG, topic);
    configs.add(config);
    return configs;
}
Although not used in the example, SourceTask also provides two APIs to commit offsets in
the source system: commit and commitRecord. The APIs are provided for source systems
which have an acknowledgement mechanism for messages. Overriding these methods
allows the source connector to acknowledge messages in the source system, either in bulk
or individually, once they have been written to Kafka. The commit API stores the offsets in
the source system, up to the offsets that have been returned by poll. The implementation of
this API should block until the commit is complete. The commitRecord API saves the offset in
the source system for each SourceRecord after it is written to Kafka. As Kafka Connect will
record offsets automatically, SourceTasks are not required to implement them. In cases
where a connector does need to acknowledge messages in the source system, only one of
the APIs is typically required.
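For example, a source task reading from a hypothetical queue-like system with per-message acknowledgements might override commitRecord() along these lines (queueClient and the "ackId" offset key are assumptions, not part of the Connect API):
@Override
public void commitRecord(SourceRecord record) throws InterruptedException {
    // Invoked after the record has been written to Kafka; acknowledge the
    // corresponding message in the (hypothetical) source system.
    String ackId = (String) record.sourceOffset().get("ackId");
    queueClient.acknowledge(ackId);
}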
Even with multiple tasks, this method implementation is usually pretty simple. It just has to determine the number of inputs, which may require contacting the remote service it is pulling data from, and then divvy them up among the tasks. Because some patterns for splitting work among tasks are so common, utilities are provided in ConnectorUtils to simplify these cases, as sketched below. Note that this simple example does not include dynamic input. See the discussion in the next section for how to trigger updates to task configs.
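For instance, a connector that assigns a list of tables to its tasks might use ConnectorUtils.groupPartitions() (from org.apache.kafka.connect.util) to split them into at most maxTasks groups; the listTables() helper and the "tables" config key are hypothetical:
@Override
public List<Map<String, String>> taskConfigs(int maxTasks) {
    List<String> tables = listTables(); // hypothetical: query the source system; assumes at least one table
    int numGroups = Math.min(tables.size(), maxTasks);
    List<List<String>> grouped = ConnectorUtils.groupPartitions(tables, numGroups);

    List<Map<String, String>> configs = new ArrayList<>();
    for (List<String> group : grouped) {
        Map<String, String> config = new HashMap<>();
        config.put("tables", String.join(",", group)); // hypothetical config key
        configs.add(config);
    }
    return configs;
}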
Just as with the connector, we need to create a class inheriting from the appropriate
base Task class. It also has some standard lifecycle methods:
@Override
public void start(Map<String, String> props) {
    filename = props.get(FileStreamSourceConnector.FILE_CONFIG);
    stream = openOrThrowError(filename);
    topic = props.get(FileStreamSourceConnector.TOPIC_CONFIG);
}

@Override
public synchronized void stop() {
    stream.close();
}
These are slightly simplified versions, but they show that these methods should be relatively simple and that the only work they should perform is allocating or freeing resources. There are two points to note about this implementation. First, the start() method does not yet handle resuming from a previous offset, which will be addressed in a later section. Second, the stop() method is synchronized. This is necessary because SourceTasks are given a dedicated thread which they can block indefinitely, so they need to be stopped by a call from a different thread in the Worker.
Next, we implement the main functionality of the task, the poll() method which gets events
from the input system and returns a List<SourceRecord>:
@Override
public List<SourceRecord> poll() throws InterruptedException {
    try {
        ArrayList<SourceRecord> records = new ArrayList<>();
        while (streamValid(stream) && records.isEmpty()) {
            LineAndOffset line = readToNextLine(stream);
            if (line != null) {
                Map<String, Object> sourcePartition = Collections.singletonMap("filename", filename);
                Map<String, Object> sourceOffset = Collections.singletonMap("position", streamOffset);
                records.add(new SourceRecord(sourcePartition, sourceOffset, topic, Schema.STRING_SCHEMA, line));
            } else {
                Thread.sleep(1);
            }
        }
        return records;
    } catch (IOException e) {
        // Underlying stream was killed, probably as a result of calling stop. Allow to return
        // null, and driving thread will handle any shutdown if necessary.
    }
    return null;
}
Again, we've omitted some details, but we can see the important steps: the poll() method
is going to be called repeatedly, and for each call it will loop trying to read records from the
file. For each line it reads, it also tracks the file offset. It uses this information to create an
output SourceRecord with four pieces of information: the source partition (there is only one,
the single file being read), source offset (byte offset in the file), output topic name, and
output value (the line, and we include a schema indicating this value will always be a string).
Other variants of the SourceRecord constructor can also include a specific output partition, a
key, and headers.
Note that this implementation uses the normal Java InputStream interface and may sleep if
data is not available. This is acceptable because Kafka Connect provides each task with a
dedicated thread. While task implementations have to conform to the
basic poll() interface, they have a lot of flexibility in how they are implemented. In this case,
an NIO-based implementation would be more efficient, but this simple approach works, is
quick to implement, and is compatible with older versions of Java.
Sink Tasks
The previous sections described how to implement a simple SourceTask. Unlike SourceConnector and SinkConnector, SourceTask and SinkTask use very different interfaces: SourceTask uses a pull interface, while SinkTask uses a push interface. Both share the common lifecycle methods, but the core of a SinkTask is its put() method, which accepts batches of SinkRecords, performs any required translation, and stores them in the destination system; it does not need to ensure the data has been fully written to the destination before returning.
The flush() method is used during the offset commit process, which allows tasks to recover from failures and resume from a safe point such that no events will be missed. The method should push any outstanding data to the destination system and then block until the write has been acknowledged. The offsets parameter can often be ignored, but it is useful in cases where implementations want to store offset information in the destination store to provide exactly-once delivery. For example, an HDFS connector could do this and use atomic move operations to make sure the flush() operation atomically commits the data and offsets to a final location in HDFS.
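A minimal sketch of flush(), assuming the task buffers records in a hypothetical writer client:
@Override
public void flush(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
    try {
        // Push buffered records to the destination and block until acknowledged;
        // throwing here skips the offset commit so the data is redelivered later.
        writer.flushAndWait(); // hypothetical client call
    } catch (IOException e) {
        throw new ConnectException("Failed to flush outstanding records", e);
    }
}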
When error reporting is enabled for a sink connector, the task can obtain an ErrantRecordReporter from its SinkTaskContext and use it to report individual records that could not be processed. Because this API was added in Kafka 2.6, implementations that also need to run on older Connect runtimes should guard against its absence:
@Override
public void start(Map<String, String> props) {
    ...
    try {
        reporter = context.errantRecordReporter(); // may be null if DLQ not enabled
    } catch (NoSuchMethodError | NoClassDefFoundError e) {
        // Will occur in Connect runtimes earlier than 2.6
        reporter = null;
    }
}
@Override
public void put(Collection<SinkRecord> records) {
    for (SinkRecord record: records) {
        try {
            // attempt to process and send record to data sink
            process(record);
        } catch (Exception e) {
            if (reporter != null) {
                // Send errant record to error reporter
                reporter.report(record, e);
            } else {
                // There's no error reporter, so fail
                throw new ConnectException("Failed on record", e);
            }
        }
    }
}
To correctly resume upon startup, the task can use the SourceTaskContext passed into its initialize() method to access the offset data. In initialize(), we would add a bit more code to read the offset (if it exists) and seek to that position:
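A sketch of that extra code, reusing the "filename" and "position" keys from poll() above and a hypothetical seekToOffset() helper:
stream = openOrThrowError(filename);
Map<String, Object> offset = context.offsetStorageReader().offset(
        Collections.singletonMap("filename", filename));
if (offset != null) {
    // The offset value uses the same "position" key written by poll().
    Long lastRecordedOffset = (Long) offset.get("position");
    if (lastRecordedOffset != null)
        seekToOffset(stream, lastRecordedOffset); // hypothetical helper
}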
Of course, you might need to read many keys for each of the input streams.
The OffsetStorageReader interface also allows you to issue bulk reads to efficiently load all
offsets, then apply them by seeking each input stream to the appropriate position.
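For example, a multi-table source task might bulk-load the offsets for all of its assigned tables in a single call (the assignedTables list is hypothetical):
List<Map<String, String>> partitions = new ArrayList<>();
for (String table : assignedTables)
    partitions.add(Collections.singletonMap("table", table));

// One lookup for every assigned partition instead of one call per table.
Map<Map<String, String>, Map<String, Object>> offsets =
        context.offsetStorageReader().offsets(partitions);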
Source connectors need to monitor the source system for changes, e.g. table
additions/deletions in a database. When they pick up changes, they should notify the
framework via the ConnectorContext object that reconfiguration is necessary. For example,
in a SourceConnector:
if (inputsChanged())
    this.context.requestTaskReconfiguration();
The framework will promptly request new configuration information and update the tasks,
allowing them to gracefully commit their progress before reconfiguring them. Note that in
the SourceConnector this monitoring is currently left up to the connector implementation. If
an extra thread is required to perform this monitoring, the connector must allocate it itself.
Ideally this code for monitoring changes would be isolated to the Connector and tasks would
not need to worry about them. However, changes can also affect tasks, most commonly
when one of their input streams is destroyed in the input system, e.g. if a table is dropped
from a database. If the Task encounters the issue before the Connector, which will be
common if the Connector needs to poll for changes, the Task will need to handle the
subsequent error. Thankfully, this can usually be handled simply by catching and handling
the appropriate exception.
SinkConnectors usually only have to handle the addition of streams, which may translate to
new entries in their outputs (e.g., a new database table). The framework manages any
changes to the Kafka input, such as when the set of input topics changes because of a
regex subscription. SinkTasks should expect new input streams, which may require creating
new resources in the downstream system, such as a new table in a database. The trickiest
situation to handle in these cases may be conflicts between multiple SinkTasks seeing a
new input stream for the first time and simultaneously trying to create the new
resource. SinkConnectors, on the other hand, will generally require no special code for
handling a dynamic set of streams.
Kafka Connect allows you to validate connector configurations before submitting a connector to be executed and can provide feedback about errors and recommended values. To take advantage of this, connector developers need to provide an implementation of config() to expose the configuration definition to the framework. The ConfigDef class is used for specifying the set of expected configurations. For each configuration, you can specify the name, the type, the default value, the documentation, the group information, the order in the group, the width of the configuration value, and the name suitable for display in the UI. You can also provide special validation logic for single-configuration validation by overriding the Validator class. Moreover, there may be dependencies between configurations; for example, the valid values and visibility of a configuration may change according to the values of other configurations. To handle this, ConfigDef allows you to specify the dependents of a configuration and to provide an implementation of Recommender to get valid values and set visibility of a configuration given the current configuration values.
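A minimal sketch of what this looks like for the file source connector described earlier (the documentation strings are illustrative):
private static final ConfigDef CONFIG_DEF = new ConfigDef()
    .define(FILE_CONFIG, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH, "Source filename.")
    .define(TOPIC_CONFIG, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH, "The topic to publish data to.");

@Override
public ConfigDef config() {
    return CONFIG_DEF;
}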
The FileStream connectors are good examples because they are simple, but they also have
trivially structured data -- each line is just a string. Almost all practical connectors will need
schemas with more complex data formats.
To create more complex data, you'll need to work with the Kafka Connect data API. Most
structured records will need to interact with two classes in addition to primitive
types: Schema and Struct.
The API documentation provides a complete reference, but here is a simple example
creating a Schema and Struct:
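A minimal sketch of building a Schema with SchemaBuilder and populating a matching Struct (the record fields are illustrative):
Schema schema = SchemaBuilder.struct().name("com.example.User")
    .field("name", Schema.STRING_SCHEMA)
    .field("age", Schema.INT32_SCHEMA)
    .field("admin", SchemaBuilder.bool().defaultValue(false).build())
    .build();

Struct struct = new Struct(schema)
    .put("name", "Barbara Liskov")
    .put("age", 75);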
If you are implementing a source connector, you'll need to decide when and how to create schemas. Where possible, you should avoid recomputing them. For example, if your connector is guaranteed to have a fixed schema, create it statically and reuse a single instance.
However, many connectors will have dynamic schemas. One simple example of this is a
database connector. Considering even just a single table, the schema will not be predefined
for the entire connector (as it varies from table to table). But it also may not be fixed for a
single table over the lifetime of the connector since the user may execute an ALTER
TABLE command. The connector must be able to detect these changes and react
appropriately.
Sink connectors are usually simpler because they are consuming data and therefore do not
need to create schemas. However, they should take just as much care to validate that the
schemas they receive have the expected format. When the schema does not match --
usually indicating the upstream producer is generating invalid data that cannot be correctly
translated to the destination system -- sink connectors should throw an exception to
indicate this error to the system.
When a connector is first submitted to the cluster, a rebalance is triggered between the
Connect workers in order to distribute the load that consists of the tasks of the new
connector. This same rebalancing procedure is also used when connectors increase or
decrease the number of tasks they require, when a connector's configuration is changed, or
when a worker is added or removed from the group as part of an intentional upgrade of the
Connect cluster or due to a failure.
In versions prior to 2.3.0, the Connect workers would rebalance the full set of connectors and their tasks in the cluster as a simple way to make sure that each worker has approximately the same amount of work. This behavior can still be enabled by setting connect.protocol=eager.
Starting with 2.3.0, Kafka Connect uses by default a protocol that performs incremental cooperative rebalancing, which incrementally balances the connectors and tasks across the Connect workers, affecting only tasks that are new, need to be removed, or need to move from one worker to another. Other tasks are not stopped and restarted during the rebalance, as they would have been with the old protocol.
If a Connect worker leaves the group, intentionally or due to a failure, Connect waits
for scheduled.rebalance.max.delay.ms before triggering a rebalance. This delay defaults to
five minutes (300000ms) to tolerate failures or upgrades of workers without immediately
redistributing the load of a departing worker. If this worker returns within the configured
delay, it gets its previously assigned tasks in full. However, this means that the tasks will
remain unassigned until the time specified by scheduled.rebalance.max.delay.ms elapses. If
a worker does not return within that time limit, Connect will reassign those tasks among the
remaining workers in the Connect cluster.
The new Connect protocol is enabled when all the workers that form the Connect cluster
are configured with connect.protocol=compatible, which is also the default value when this
property is missing. Therefore, upgrading to the new Connect protocol happens
automatically when all the workers upgrade to 2.3.0. A rolling upgrade of the Connect
cluster will activate incremental cooperative rebalancing when the last worker joins on
version 2.3.0.
You can use the REST API to view the current status of a connector and its tasks, including
the ID of the worker to which each was assigned. For example, the GET /connectors/file-
source/status request shows the status of a connector named file-source:
{
    "name": "file-source",
    "connector": {
        "state": "RUNNING",
        "worker_id": "192.168.1.208:8083"
    },
    "tasks": [
        {
            "id": 0,
            "state": "RUNNING",
            "worker_id": "192.168.1.209:8083"
        }
    ]
}
Connectors and their tasks publish status updates to a shared topic (configured with status.storage.topic) which all workers in the cluster monitor. Because the workers consume this topic asynchronously, there is typically a (short) delay before a state change is visible through the status API. The following states are possible for a connector or one of its tasks:
UNASSIGNED: The connector/task has not yet been assigned to a worker.
RUNNING: The connector/task is running.
PAUSED: The connector/task has been administratively paused.
FAILED: The connector/task has failed (usually by raising an exception, which is reported in the status output).
In most cases, connector and task states will match, though they may be different for short
periods of time when changes are occurring or if tasks have failed. For example, when a
connector is first started, there may be a noticeable delay before the connector and its
tasks have all transitioned to the RUNNING state. States will also diverge when tasks fail
since Connect does not automatically restart failed tasks. To restart a connector/task
manually, you can use the restart APIs listed above. Note that if you try to restart a task
while a rebalance is taking place, Connect will return a 409 (Conflict) status code. You can
retry after the rebalance completes, but it might not be necessary since rebalances
effectively restart all the connectors and tasks in the cluster.
Starting with 2.5.0, Kafka Connect uses the status.storage.topic to also store information
related to the topics that each connector is using. Connect Workers use these per-
connector topic status updates to respond to requests to the REST endpoint GET
/connectors/{name}/topics by returning the set of topic names that a connector is using. A
request to the REST endpoint PUT /connectors/{name}/topics/reset resets the set of active
topics for a connector and allows a new set to be populated, based on the connector's
latest pattern of topic usage. Upon connector deletion, the set of the connector's active
topics is also deleted. Topic tracking is enabled by default but can be disabled by
setting topic.tracking.enable=false. If you want to disallow requests to reset the active
topics of connectors during runtime, set the Worker
property topic.tracking.allow.reset=false.
It's sometimes useful to temporarily stop the message processing of a connector. For
example, if the remote system is undergoing maintenance, it would be preferable for source
connectors to stop polling it for new data instead of filling logs with exception spam. For
this use case, Connect offers a pause/resume API. While a source connector is paused,
Connect will stop polling it for additional records. While a sink connector is paused, Connect
will stop pushing new messages to it. The pause state is persistent, so even if you restart
the cluster, the connector will not begin message processing again until the task has been
resumed. Note that there may be a delay before all of a connector's tasks have transitioned
to the PAUSED state since it may take time for them to finish whatever processing they
were in the middle of when being paused. Additionally, failed tasks will not transition to the
PAUSED state until they have been restarted.
9. KAFKA STREAMS
Kafka Streams is a client library for processing and analyzing data stored in Kafka. It builds
upon important stream processing concepts such as properly distinguishing between event
time and processing time, windowing support, exactly-once processing semantics and
simple yet efficient management of application state.
Kafka Streams has a low barrier to entry: You can quickly write and run a small-scale proof-
of-concept on a single machine; and you only need to run additional instances of your
application on multiple machines to scale up to high-volume production workloads. Kafka
Streams transparently handles the load balancing of multiple instances of the same
application by leveraging Kafka's parallelism model.