Top 10 Problems When Using Apache Kafka
Apache Kafka is one of the most widely used open-source(https://opensource.com/resources/what-open-source) distributed event streaming platforms. Its use cases range from powering mission-critical applications to building and maintaining high-performance data pipelines. If you are considering Apache Kafka for your future projects, you should know both its pros and cons. In today's article, we will stick to the cons. While Apache Kafka is a solid distributed messaging platform, it has some limitations. To paint the picture, we have put together the top 10 problems when using Apache Kafka.
A spike in data volume can cause a Kafka broker to fall behind on message conversion, and the problem has to be addressed as soon as possible: usually, the affected broker has to be fixed before the entire system is fully operational again. To streamline automation, you typically enable a liveness check that verifies the broker's client-serving port is open, and write a small piece of code that restarts the broker whenever the port is not. But if a backed-up broker keeps failing the check, it falls into a dead loop of restarts and your entire infrastructure is rendered useless. Is there a quick fix? Simply turn off the liveness check until the broker recovers.
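Here is a minimal sketch of such a check, assuming the broker runs as a systemd service named "kafka" and serves clients on port 9092 (both are assumptions, adjust them to your environment). It probes the port and restarts the service only a limited number of times, precisely to avoid the dead loop described above.

```python
import socket
import subprocess
import time

BROKER_HOST = "localhost"   # assumed broker address
BROKER_PORT = 9092          # assumed client-serving port
MAX_RESTARTS = 3            # give up instead of restart-looping forever

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

restarts = 0
while True:
    if port_open(BROKER_HOST, BROKER_PORT):
        restarts = 0  # broker is healthy again, reset the counter
    elif restarts < MAX_RESTARTS:
        # Assumes the broker is managed as a systemd unit named "kafka".
        subprocess.run(["systemctl", "restart", "kafka"], check=False)
        restarts += 1
    else:
        print("Broker keeps failing the liveness check; manual intervention needed.")
        break
    time.sleep(30)  # probe every 30 seconds
```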
A newly added broker cannot do useful work until the partition reassignment process is completed. Devs usually forget about this and run the default commands from the documentation. Moving thousands of partitions can take hours, and until all the partitions have been moved, the cluster's performance will suffer. This is why you should be careful and have a plan when you want to add a new broker to the infrastructure.
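As a rough illustration, the plan that kafka-reassign-partitions.sh consumes is just a JSON file listing the target replicas for each partition. The sketch below generates one for a hypothetical "orders" topic being spread onto a new broker (id 4); the topic name, partition count, broker ids, and throttle value are all assumptions.

```python
import json

TOPIC = "orders"               # hypothetical topic
PARTITIONS = 12                # assumed partition count
TARGET_BROKERS = [1, 2, 3, 4]  # existing brokers plus the new broker id 4
REPLICATION_FACTOR = 3

plan = {
    "version": 1,
    "partitions": [
        {
            "topic": TOPIC,
            "partition": p,
            # Spread replicas round-robin across the enlarged broker set.
            "replicas": [TARGET_BROKERS[(p + i) % len(TARGET_BROKERS)]
                         for i in range(REPLICATION_FACTOR)],
        }
        for p in range(PARTITIONS)
    ],
}

with open("reassignment.json", "w") as f:
    json.dump(plan, f, indent=2)

# Then execute with a throttle so the move does not starve client traffic,
# for example (throttle in bytes/sec, an assumed value):
#   kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
#     --reassignment-json-file reassignment.json --execute --throttle 50000000
```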
The best solution would be to use Kafka only for storing data for a brief period and then migrate the data to a relational or non-relational database, depending on your specific requirements.
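A minimal sketch of that migration path, using the kafka-python client and SQLite as a stand-in for whatever relational or non-relational store you actually run; the topic name, broker address, and table schema are assumptions.

```python
import sqlite3
from kafka import KafkaConsumer  # pip install kafka-python

# Assumed topic, consumer group, and broker address.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    group_id="archiver",
    auto_offset_reset="earliest",
    enable_auto_commit=False,
    consumer_timeout_ms=10_000,  # stop iterating once the topic is drained
)

db = sqlite3.connect("archive.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS events "
    "(topic TEXT, partition INTEGER, kafka_offset INTEGER, payload BLOB)"
)

for msg in consumer:
    # Copy each record into long-term storage.
    db.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (msg.topic, msg.partition, msg.offset, msg.value),
    )

db.commit()
consumer.commit()  # only advance Kafka offsets once the data is persisted
consumer.close()
```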
5. Finding Perfect Data Retention Settings
While we are discussing long-term storage problems(https://pandio.com/blog/zero-data-loss-a-reality-in-apache-pulsar-not-true-with-kafka/), let's point out one additional issue related to it. Downstream clients often have completely unpredictable data request patterns, which makes finding optimal data retention settings a real problem.
Kafka stores messages in topics, and this data can take up significant disk space on your brokers. To reclaim that space, you set a retention period or a configurable size limit per topic. If you don't tune the data retention(https://pandio.com/blog/pulsar-with-pandio-dont-use-apache-kafka/) settings correctly, you risk either deleting data that consumers still need or paying far more for storage than you should.
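Back-of-the-envelope sizing helps here. The sketch below estimates the disk a topic will occupy and a per-partition retention.bytes value from an assumed ingest rate, replication factor, and retention window; every number is a placeholder for your own measurements.

```python
# Rough retention sizing; all inputs are assumptions to be replaced with real measurements.
INGEST_MB_PER_SEC = 5      # average producer throughput into the topic
REPLICATION_FACTOR = 3
RETENTION_DAYS = 7
PARTITIONS = 12

SECONDS_PER_DAY = 86_400

# Total disk the topic will occupy across the cluster at steady state.
total_gb = (INGEST_MB_PER_SEC * SECONDS_PER_DAY * RETENTION_DAYS * REPLICATION_FACTOR) / 1024

# retention.ms is a simple time window; retention.bytes applies per partition,
# so divide the unreplicated topic size by the partition count.
retention_ms = RETENTION_DAYS * SECONDS_PER_DAY * 1000
retention_bytes_per_partition = int(
    INGEST_MB_PER_SEC * SECONDS_PER_DAY * RETENTION_DAYS * 1024 * 1024 / PARTITIONS
)

print(f"Cluster disk needed for this topic: ~{total_gb:,.0f} GB")
print(f"retention.ms={retention_ms}")
print(f"retention.bytes={retention_bytes_per_partition}")
```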
Even with the Kafka Streams API, you will have to spend days building complex data pipelines(https://pandio.com/blog/challenges-building-big-data-pipelines/) and managing the interaction between data producers and data consumers, not to mention operating a system this complex. There are other distributed messaging systems that are much better at streamlining ETL(https://pandio.com/blog/what-is-etl-benefits-challenges-recent-advances/) jobs, such as Apache Pulsar.
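To make the point concrete, here is the kind of glue code you end up writing and operating yourself for even a single pipeline stage: a plain consume-transform-produce loop built on kafka-python, with assumed topic names and a placeholder transformation. Error handling, retries, scaling, and exactly-once delivery are all still on you.

```python
import json
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

BOOTSTRAP = "localhost:9092"   # assumed broker address
IN_TOPIC = "raw-events"        # hypothetical input topic
OUT_TOPIC = "clean-events"     # hypothetical output topic

consumer = KafkaConsumer(
    IN_TOPIC,
    bootstrap_servers=BOOTSTRAP,
    group_id="etl-stage-1",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Runs until interrupted.
for record in consumer:
    event = record.value
    # Placeholder transformation: drop obviously bad records, normalise a field.
    if "user_id" not in event:
        continue
    event["user_id"] = str(event["user_id"]).strip()
    producer.send(OUT_TOPIC, value=event)

producer.flush()
```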
You will have to rebalance the cluster every time a major change in the data stream occurs, both by balancing partition leadership and by running the Kafka partition reassignment script. Apache Pulsar, by contrast, keeps its brokers stateless, which makes the scale-out process(https://pandio.com/blog/how-apache-pulsar-solves-kafkas-scalability-issues/) significantly easier.
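A sketch of those two steps wrapped in one place, assuming the standard Kafka CLI tools are on the PATH and a reassignment plan like the one above already exists; the broker address, file name, and throttle value are assumptions.

```python
import subprocess

BOOTSTRAP = "localhost:9092"      # assumed broker address
PLAN_FILE = "reassignment.json"   # plan generated earlier

# Step 1: move partition replicas onto the new broker(s), throttled.
subprocess.run([
    "kafka-reassign-partitions.sh",
    "--bootstrap-server", BOOTSTRAP,
    "--reassignment-json-file", PLAN_FILE,
    "--execute",
    "--throttle", "50000000",     # bytes/sec, an assumed value
], check=True)

# ... wait for the reassignment to finish (check with --verify) before step 2 ...

# Step 2: rebalance partition leadership back to the preferred replicas.
subprocess.run([
    "kafka-leader-election.sh",
    "--bootstrap-server", BOOTSTRAP,
    "--election-type", "PREFERRED",
    "--all-topic-partitions",
], check=True)
```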
Manipulating data on the fly is possible with Kafka, but there are limits. The broker relies on the operating system's zero-copy system calls to move data from disk to the network, and any work that forces it to modify messages in flight takes it off that fast path and makes the entire platform perform significantly slower.