Kafka producer internals


In this section, we will walk through the different Kafka producer components and, at a high level, cover how
messages get transferred from a Kafka producer application to Kafka queues. While writing producer
applications, you generally use the Producer APIs, which expose methods at a fairly abstract level. Behind those
methods, the APIs perform a number of steps before any data is sent, so it is important to understand these
internal steps in order to gain complete knowledge of Kafka producers. We will cover them in this section. First,
we need to understand the responsibilities of Kafka producers apart from publishing messages. Let's look at them
one by one:

Bootstrapping Kafka broker URLs: The producer connects to at least one broker to fetch metadata about
the Kafka cluster. The first broker the producer tries may be down, so to ensure failover, the producer
implementation takes a list of more than one broker URL to bootstrap from. The producer iterates through this
list of Kafka broker addresses until it finds one it can connect to and fetches the cluster metadata from it, as
shown in the sketch that follows.
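The following is a minimal sketch in Java of how a bootstrap broker list is typically supplied to the producer; the broker host names and ports are assumptions made purely for illustration:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class BootstrapConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker addresses: any single reachable broker is enough
        // to bootstrap, but listing several protects against the first being down.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "broker1:9092,broker2:9092,broker3:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The constructor uses the bootstrap list to fetch the cluster metadata.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}
```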

Data serialization: Kafka uses a binary protocol to send and receive data over TCP. This means that while
writing data to Kafka, producers need to send an ordered byte sequence to the defined Kafka broker's
network port. Subsequently, they read the response byte sequence from the Kafka broker in the same ordered
fashion. The Kafka producer serializes every message data object into ByteArrays before sending any record to
the respective broker over the wire. Similarly, it converts any byte sequence received from the broker as a
response back into the message object.
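As a minimal sketch of this responsibility, the following custom serializer turns a message object into a byte array; the StockQuote class and its comma-separated encoding are assumptions made only for this example, and in practice you might instead use the inbuilt StringSerializer or an Avro serializer:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.kafka.common.serialization.Serializer;

// Hypothetical message object used only for this sketch.
class StockQuote {
    final String symbol;
    final double price;

    StockQuote(String symbol, double price) {
        this.symbol = symbol;
        this.price = price;
    }
}

// Converts the message object into an ordered byte sequence before the
// producer writes it to the broker's network port.
public class StockQuoteSerializer implements Serializer<StockQuote> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public byte[] serialize(String topic, StockQuote data) {
        if (data == null) {
            return null;
        }
        return (data.symbol + "," + data.price).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void close() { }
}
```

A serializer like this is registered through the key.serializer or value.serializer producer configuration property.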

Determining topic partition: It is the responsibility of the Kafka producer to determine which topic partition
the data needs to be sent to. If the partition is specified by the caller program, the Producer APIs do not
determine the topic partition themselves and send the data directly to it. However, if no partition is specified,
the producer chooses a partition for the message, generally based on the key of the message data object. You
can also write your own custom partitioner in case you want data to be partitioned according to specific
business logic for your enterprise, as in the sketch that follows.
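Here is a minimal custom partitioner sketch; the rule of routing records keyed "priority" to partition 0 is a hypothetical business rule invented only for illustration:

```java
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class PriorityPartitioner implements Partitioner {
    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionCountForTopic(topic);
        if (keyBytes == null) {
            // No key supplied: fall back to the first partition in this sketch.
            return 0;
        }
        if ("priority".equals(key)) {
            // Hypothetical rule: priority messages always land on partition 0.
            return 0;
        }
        // Everything else is spread across the partitions by the hash of the key.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void close() { }
}
```

The class is then plugged in through the partitioner.class producer configuration property.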

Determining the leader of the partition: Producers send data directly to the leader of the partition. It is the
producer's responsibility to determine the leader of the partition to which it will write messages. To do so,
producers ask any of the Kafka brokers for metadata, and the brokers answer with the active servers and the
leaders of the topic's partitions at that point in time.
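The leader lookup is handled for you inside the client, but the same metadata can be inspected explicitly. A minimal sketch, assuming a reachable broker at broker1:9092 and a topic named test-topic:

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.StringSerializer;

public class LeaderLookupSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // partitionsFor() exposes the metadata fetched from the cluster,
            // including the current leader of each partition of the topic.
            List<PartitionInfo> partitions = producer.partitionsFor("test-topic");
            for (PartitionInfo info : partitions) {
                System.out.printf("partition %d, leader broker %s%n",
                        info.partition(), info.leader());
            }
        }
    }
}
```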

Failure handling/retry ability: Handling failure responses and the number of retries is something that needs to be
controlled through the producer application. You can configure the number of retries through the Producer API
configuration, and this has to be decided as per your enterprise standards. Exception handling should be done
through the producer application component; depending on the type of exception, you can trigger different
data flows, as in the sketch that follows.
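A minimal sketch of the retry-related settings together with a synchronous send wrapped in exception handling; the retry count, backoff, topic name, and acknowledgement level are illustrative assumptions rather than recommendations:

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;
import org.apache.kafka.common.serialization.StringSerializer;

public class RetryHandlingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Illustrative values: retry transient failures a few times, wait
        // between attempts, and require acknowledgement from all replicas.
        props.put(ProducerConfig.RETRIES_CONFIG, 3);
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 100);
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            try {
                // get() makes the send synchronous so failures surface right here.
                producer.send(new ProducerRecord<>("test-topic", "key", "value")).get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof RetriableException) {
                    // Transient broker-side problem: the application may retry.
                } else {
                    // Non-retriable: log, alert, or route to an error data flow.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```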

Batching: For efficient message transfers, batching is a very useful mechanism. Through the Producer API
configuration, you can control whether you want to use the producer in asynchronous mode or not. Batching
ensures reduced I/O and optimal utilization of producer memory. While deciding on the number of messages
in a batch, you have to keep end-to-end latency in mind: end-to-end latency increases with the number of
messages in a batch.
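A minimal sketch of the batching-related producer settings; the specific numbers are illustrative assumptions that should be tuned against your own throughput and latency requirements:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public class BatchingConfigSketch {
    // Illustrative batching settings: up to 16 KB of records per partition batch
    // and a 5 ms linger, trading a little end-to-end latency for fewer, larger
    // I/O requests and better use of producer memory.
    static Properties batchingProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432); // 32 MB producer buffer
        return props;
    }
}
```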

Hopefully, the preceding paragraphs have given you an idea of the prime responsibilities of Kafka producers.
Now, we will discuss the Kafka producer data flow. This will give you a clear understanding of the steps
involved in producing Kafka messages.

The internal implementation or sequence of steps in the Producer APIs may differ across
programming languages. Some of the steps can be done in parallel using threads or callbacks.

The following image shows the high-level steps involved in producing messages to the Kafka cluster:

Kafka producer high-level flow

Publishing messages to a Kafka topic starts with calling the Producer APIs with appropriate details such as messages
in string format, the topic, partitions (optional), and other configuration details such as broker URLs. The
Producer API uses the passed information to form a data object in the form of a nested key-value pair. Once the
data object is formed, the producer serializes it into byte arrays. You can either use an inbuilt serializer or
develop your own custom serializer. Avro is one of the commonly used data serializers.

Serialization ensures compliance with the Kafka binary protocol and efficient network transfer.

Next, the partition to which the data needs to be sent is determined. If partition information is passed in the API
call, the producer uses that partition directly. However, if no partition information is passed, the producer
determines the partition to which the data should be sent, generally based on the keys defined in the data
objects. Once the record's partition is decided, the producer determines which broker to connect to in order to
send the message. This is generally done by bootstrapping against the configured brokers and then, based on the
fetched metadata, determining the leader broker.
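Putting these steps together, the following is a minimal end-to-end sketch; the topic name, broker addresses, keys, and values are assumptions made only for illustration, and the two records contrast key-based with caller-specified partitioning:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerFlowSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key-based partitioning: the producer hashes the key "device-42" to
            // pick a partition, looks up that partition's leader, and sends.
            ProducerRecord<String, String> keyed =
                    new ProducerRecord<>("test-topic", "device-42", "temperature=21.5");

            // Explicit partitioning: the caller pins the record to partition 1,
            // so the producer skips the partition decision entirely.
            ProducerRecord<String, String> pinned =
                    new ProducerRecord<>("test-topic", 1, "device-42", "temperature=21.7");

            producer.send(keyed);
            producer.send(pinned);
        }
    }
}
```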

Producers also need to determine the API versions supported by a Kafka broker. This is accomplished by using the
API versions exposed by the Kafka cluster. The goal is that the producer will support different versions of the
Producer APIs; while communicating with the respective leader broker, it should use the highest API version
supported by both the producer and the broker.

Producers send the API version they are using in their write requests. Brokers can reject a write request if a
compatible API version is not reflected in it. This kind of setup ensures incremental API evolution while still
supporting older versions of the APIs.

Once the serialized data object is sent to the selected broker, the producer receives a response from that broker. If
it receives metadata about the respective partition along with the new message offsets, the response is
considered successful. However, if error codes are received in the response, the producer can either throw the
exception or retry, as per the configured behavior.
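A minimal sketch of handling that response asynchronously with a callback; the topic name, key, and value are illustrative assumptions, and the producer instance is assumed to be configured elsewhere:

```java
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ResponseHandlingSketch {
    // 'producer' is assumed to be an already-configured producer instance.
    static void sendWithCallback(KafkaProducer<String, String> producer) {
        producer.send(new ProducerRecord<>("test-topic", "key", "value"), new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception exception) {
                if (exception == null) {
                    // Success: the broker returned the partition and the new offset.
                    System.out.printf("written to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                } else {
                    // Failure: decide here whether to log, retry, or raise the error.
                    exception.printStackTrace();
                }
            }
        });
    }
}
```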

As we move further in the chapter, we will dive deep into the technical side of the Kafka Producer APIs and write
producers using Java and Scala programs.
