Kafka producer internals


In this section, we will walk through the different Kafka producer components and, at a high level, cover how
messages get transferred from a Kafka producer application to Kafka queues. While writing producer
applications, you generally use the Producer APIs, which expose methods at a fairly abstract level. Behind those
methods, the APIs perform a number of steps before any data is sent, so it is important to understand these
internal steps in order to gain complete knowledge of Kafka producers. We will cover them in this section. First,
we need to understand the responsibilities of Kafka producers apart from publishing messages. Let's look at them
one by one:

Bootstrapping Kafka broker URLs: The producer connects to at least one broker to fetch metadata about
the Kafka cluster. The first broker the producer tries may be down, so to ensure failover, the producer
implementation takes a list of more than one broker URL to bootstrap from. The producer iterates through this
list of Kafka broker addresses until it finds one it can connect to and fetches the cluster metadata from it, as
shown in the sketch that follows.
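The following is a minimal sketch in Java of how a bootstrap broker list is typically supplied to the producer; the broker host names and ports are assumptions made purely for illustration:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class BootstrapConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker addresses: any single reachable broker is enough
        // to bootstrap, but listing several protects against the first being down.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "broker1:9092,broker2:9092,broker3:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The constructor uses the bootstrap list to fetch the cluster metadata.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}
```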

Data serialization: Kafka uses a binary protocol to send and receive data over TCP. This means that while
writing data to Kafka, producers need to send an ordered byte sequence to the defined Kafka broker's
network port. Subsequently, they read the response byte sequence from the Kafka broker in the same ordered
fashion. The Kafka producer serializes every message data object into ByteArrays before sending any record to
the respective broker over the wire. Similarly, it converts any byte sequence received from the broker as a
response back into the message object.
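As a minimal sketch of this responsibility, the following custom serializer turns a message object into a byte array; the StockQuote class and its comma-separated encoding are assumptions made only for this example, and in practice you might instead use the inbuilt StringSerializer or an Avro serializer:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.kafka.common.serialization.Serializer;

// Hypothetical message object used only for this sketch.
class StockQuote {
    final String symbol;
    final double price;

    StockQuote(String symbol, double price) {
        this.symbol = symbol;
        this.price = price;
    }
}

// Converts the message object into an ordered byte sequence before the
// producer writes it to the broker's network port.
public class StockQuoteSerializer implements Serializer<StockQuote> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public byte[] serialize(String topic, StockQuote data) {
        if (data == null) {
            return null;
        }
        return (data.symbol + "," + data.price).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void close() { }
}
```

A serializer like this is registered through the key.serializer or value.serializer producer configuration property.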

Determining topic partition: It is the responsibility of the Kafka producer to determine which topic partition
the data needs to be sent to. If the partition is specified by the caller program, the Producer APIs do not
determine the topic partition themselves and send the data directly to it. However, if no partition is specified,
the producer chooses a partition for the message, generally based on the key of the message data object. You
can also write your own custom partitioner in case you want data to be partitioned according to specific
business logic for your enterprise, as in the sketch that follows.
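Here is a minimal custom partitioner sketch; the rule of routing records keyed "priority" to partition 0 is a hypothetical business rule invented only for illustration:

```java
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class PriorityPartitioner implements Partitioner {
    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionCountForTopic(topic);
        if (keyBytes == null) {
            // No key supplied: fall back to the first partition in this sketch.
            return 0;
        }
        if ("priority".equals(key)) {
            // Hypothetical rule: priority messages always land on partition 0.
            return 0;
        }
        // Everything else is spread across the partitions by the hash of the key.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void close() { }
}
```

The class is then plugged in through the partitioner.class producer configuration property.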

Determining the leader of the partition: Producers send data directly to the leader of the partition. It is the
producer's responsibility to determine the leader of the partition to which it will write messages. To do so,
producers ask any of the Kafka brokers for metadata, and the brokers answer with the active servers and the
leaders of the topic's partitions at that point in time.
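The leader lookup is handled for you inside the client, but the same metadata can be inspected explicitly. A minimal sketch, assuming a reachable broker at broker1:9092 and a topic named test-topic:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.StringSerializer;

public class LeaderLookupSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // partitionsFor() exposes the metadata fetched from the cluster,
            // including the current leader of each partition of the topic.
            List<PartitionInfo> partitions = producer.partitionsFor("test-topic");
            for (PartitionInfo info : partitions) {
                System.out.printf("partition %d, leader broker %s%n",
                        info.partition(), info.leader());
            }
        }
    }
}
```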

Failure handling/retry ability: Handling failure responses and the number of retries is something that needs to be
controlled through the producer application. You can configure the number of retries through the Producer API
configuration, and this has to be decided as per your enterprise standards. Exception handling should be done
through the producer application component; depending on the type of exception, you can trigger different
data flows, as in the sketch that follows.
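A minimal sketch of the retry-related settings together with a synchronous send wrapped in exception handling; the retry count, backoff, topic name, and acknowledgement level are illustrative assumptions rather than recommendations:

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;
import org.apache.kafka.common.serialization.StringSerializer;

public class RetryHandlingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Illustrative values: retry transient failures a few times, wait
        // between attempts, and require acknowledgement from all replicas.
        props.put(ProducerConfig.RETRIES_CONFIG, 3);
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 100);
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            try {
                // get() makes the send synchronous so failures surface right here.
                producer.send(new ProducerRecord<>("test-topic", "key", "value")).get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof RetriableException) {
                    // Transient broker-side problem: the application may retry.
                } else {
                    // Non-retriable: log, alert, or route to an error data flow.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```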

Batching: For efficient message transfers, batching is a very useful mechanism. Through the Producer API
configuration, you can control whether you want to use the producer in asynchronous mode or not. Batching
ensures reduced I/O and optimal utilization of producer memory. While deciding on the number of messages
in a batch, you have to keep end-to-end latency in mind: end-to-end latency increases with the number of
messages in a batch.
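A minimal sketch of the batching-related producer settings; the specific numbers are illustrative assumptions that should be tuned against your own throughput and latency requirements:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public class BatchingConfigSketch {
    // Illustrative batching settings: up to 16 KB of records per partition batch
    // and a 5 ms linger, trading a little end-to-end latency for fewer, larger
    // I/O requests and better use of producer memory.
    static Properties batchingProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432); // 32 MB producer buffer
        return props;
    }
}
```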

Hopefully, the preceding paragraphs have given you an idea of the prime responsibilities of Kafka producers.
Now, we will discuss the Kafka producer data flow. This will give you a clear understanding of the steps
involved in producing Kafka messages.

The internal implementation or sequence of steps in the Producer APIs may differ across
programming languages. Some of the steps can be done in parallel using threads or callbacks.

The following image shows the high-level steps involved in producing messages to the Kafka cluster:

Kafka producer high-level flow

Publishing messages to a Kafka topic starts with calling the Producer APIs with appropriate details such as messages
in string format, the topic, partitions (optional), and other configuration details such as broker URLs. The
Producer API uses the passed information to form a data object in the form of a nested key-value pair. Once the
data object is formed, the producer serializes it into byte arrays. You can either use an inbuilt serializer or
develop your own custom serializer. Avro is one of the commonly used data serializers.

Serialization ensures compliance with the Kafka binary protocol and efficient network transfer.

Next, the partition to which the data needs to be sent is determined. If partition information is passed in the API
call, the producer uses that partition directly. However, if no partition information is passed, the producer
determines the partition to which the data should be sent, generally based on the keys defined in the data
objects. Once the record's partition is decided, the producer determines which broker to connect to in order to
send the message. This is generally done by bootstrapping against the configured brokers and then, based on the
fetched metadata, determining the leader broker.
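Putting these steps together, the following is a minimal end-to-end sketch; the topic name, broker addresses, keys, and values are assumptions made only for illustration, and the two records contrast key-based with caller-specified partitioning:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerFlowSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key-based partitioning: the producer hashes the key "device-42" to
            // pick a partition, looks up that partition's leader, and sends.
            ProducerRecord<String, String> keyed =
                    new ProducerRecord<>("test-topic", "device-42", "temperature=21.5");

            // Explicit partitioning: the caller pins the record to partition 1,
            // so the producer skips the partition decision entirely.
            ProducerRecord<String, String> pinned =
                    new ProducerRecord<>("test-topic", 1, "device-42", "temperature=21.7");

            producer.send(keyed);
            producer.send(pinned);
        }
    }
}
```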

Producers also need to determine the API versions supported by a Kafka broker. This is accomplished by using the
API versions exposed by the Kafka cluster. The goal is that the producer will support different versions of the
Producer APIs; while communicating with the respective leader broker, it should use the highest API version
supported by both the producer and the broker.

Producers send the API version they are using in their write requests. Brokers can reject a write request if a
compatible API version is not reflected in it. This kind of setup ensures incremental API evolution while still
supporting older versions of the APIs.

Once the serialized data object is sent to the selected broker, the producer receives a response from that broker. If
it receives metadata about the respective partition along with the new message offsets, the response is
considered successful. However, if error codes are received in the response, the producer can either throw the
exception or retry, as per the configured behavior.
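A minimal sketch of handling that response asynchronously with a callback; the topic name, key, and value are illustrative assumptions, and the producer instance is assumed to be configured elsewhere:

```java
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ResponseHandlingSketch {
    // 'producer' is assumed to be an already-configured producer instance.
    static void sendWithCallback(KafkaProducer<String, String> producer) {
        producer.send(new ProducerRecord<>("test-topic", "key", "value"), new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception exception) {
                if (exception == null) {
                    // Success: the broker returned the partition and the new offset.
                    System.out.printf("written to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                } else {
                    // Failure: decide here whether to log, retry, or raise the error.
                    exception.printStackTrace();
                }
            }
        });
    }
}
```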

As we move further in the chapter, we will dive deep into the technical side of the Kafka Producer APIs and write
producers using Java and Scala programs.
