Apache Kafka

Apache Kafka is a distributed messaging system that facilitates data exchange between different parts of a computer system through a publish-subscribe model. It features a robust ecosystem that includes topics, brokers, partitions, and consumers, allowing for high throughput, fault tolerance, and scalability. Key applications of Kafka include real-time data processing for services like Ola and Zomato, and it supports exactly-once message delivery semantics.

Uploaded by anisha.kse

Apache Kafka
Apache Kafka is a distributed messaging system that helps different parts of a
computer system exchange data by publishing and subscribing to topics.
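The publish-subscribe model can be sketched with a tiny in-memory stand-in. The `MiniBus` class and its methods below are hypothetical illustrations of the idea, not a real Kafka client (real clients talk to a broker over the network):

```python
# Minimal in-memory sketch of the publish-subscribe model Kafka implements.
# Hypothetical names; a real Kafka client connects to a broker instead.
from collections import defaultdict

class MiniBus:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        # A subscriber registers interest in a topic.
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber of the topic receives the published message.
        for callback in self.subscribers[topic]:
            callback(message)

bus = MiniBus()
received = []
bus.subscribe("orders", received.append)
bus.publish("orders", {"id": 1, "item": "pizza"})
```

The key property shown here is decoupling: the publisher does not know who the subscribers are, only the topic name.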

[Diagram: a publisher (sender) writes messages to Apache Kafka, and a subscriber (receiver) reads them.]
Why we use Apache Kafka
• Ola driver location updates
• Zomato live food tracking
• Notification systems that reach huge numbers of users
• Increasing database throughput: very frequent per-update reads and writes
(for example, a Zomato delivery rider's location) are difficult for a database
to handle directly, so Kafka buffers the stream

[Diagram: live food tracking flow. The Zomato delivery rider's location
updates go to the Zomato server, which publishes them to a Kafka topic; the
user reads the live location from the topic, and the data is stored in the
database through bulk batch operations.]
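The batching idea in the flow above can be sketched without a real broker. The `topic` list and helper functions below are hypothetical in-memory stand-ins for a Kafka topic and a batching consumer:

```python
# Sketch of the pattern in the slide: frequent location updates are published
# to a topic, and a consumer writes them to the database in bulk batches
# instead of one row per update. Hypothetical in-memory stand-ins.
topic = []  # stand-in for a Kafka topic partition

def publish(message):
    topic.append(message)

def consume_in_batches(batch_size):
    # Group buffered messages into chunks suitable for one bulk DB write each.
    batches = []
    for start in range(0, len(topic), batch_size):
        batches.append(topic[start:start + batch_size])
    return batches

for i in range(7):
    publish({"rider": "zomato-1", "lat": 12.9 + i * 0.001})

batches = consume_in_batches(batch_size=3)
```

Seven individual updates become three bulk writes, which is the throughput gain the slide is pointing at.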
Kafka Architecture
[Diagram: Kafka ecosystem. A producer publishes to topics (Topic A, Topic B)
hosted on brokers (Kafka Broker 1, Kafka Broker 2) inside a Kafka cluster;
each topic is split into partitions, each message in a partition carries an
offset, and a consumer reads from the brokers. Zookeeper coordinates the
cluster.]
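The components in the diagram map onto broker configuration. A minimal, illustrative `server.properties` fragment for one broker might look like this (the values are placeholders, not recommendations):

```properties
# Illustrative broker config for one node in the cluster (example values)
broker.id=1                        # unique id per broker in the cluster
listeners=PLAINTEXT://:9092        # where producers and consumers connect
log.dirs=/var/lib/kafka/data       # where partition data is stored on disk
num.partitions=3                   # default partition count for new topics
default.replication.factor=2      # replicas per partition for fault tolerance
zookeeper.connect=localhost:2181   # Zookeeper used for cluster coordination
```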
Key concepts
• Kafka Ecosystem: The Kafka ecosystem refers to the entire suite of tools,
libraries, and components that complement Apache Kafka for building real-time
data pipelines, stream processing applications, and other distributed systems.

• Kafka Topic: A Kafka topic is a category or feed name to which messages are
published by producers. Topics are divided into partitions to allow parallelism and
scalability within a Kafka cluster. Each message published to a topic is appended
to one of its partitions.

• Kafka Broker: A Kafka broker is a Kafka server that runs in a Kafka cluster. It stores
and manages partitions, handles producer requests, and serves consumer
requests. Brokers are responsible for storing and replicating data across the
cluster.
• Kafka Cluster: A Kafka cluster is a group of Kafka brokers working together to
store and manage topics and handle the load from producers and consumers.
Kafka clusters provide scalability, fault tolerance, and high availability by
distributing data partitions across multiple brokers.

• Partition: A partition is a unit of parallelism in Kafka. Topics are divided into
partitions, and each partition is replicated across multiple brokers for fault
tolerance. Messages within a partition are ordered and assigned a sequential id
called an offset.

• Offset: An offset is a unique identifier assigned to each message within a
partition. Offsets are sequential integers that represent the position of a message
within the partition. Consumers use offsets to track their position in a partition
and retrieve messages.
• Zookeeper: Apache Zookeeper is a centralized service used by Kafka for
managing and coordinating Kafka brokers and maintaining cluster metadata. It
handles tasks such as leader election, maintaining configuration information, and
detecting broker failures. (Newer Kafka versions can also run without Zookeeper,
using KRaft mode.)

• Producer: A Kafka producer is a client application that publishes messages to
Kafka topics. Producers send messages (key-value pairs) to Kafka brokers, which
then append the messages to the appropriate topic partitions based on the
message key (optional) and partitioning strategy.

• Consumer: A Kafka consumer is a client application that subscribes to topics and
reads messages from Kafka brokers. Consumers read messages from partitions,
process them, and maintain their own offset to track their position in each
partition. Consumers can be part of a consumer group for load balancing and
parallelism.
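Several of the concepts above (key-based partitioning, per-partition ordering, offsets) can be sketched in a few lines. The hashing scheme below is illustrative only; real Kafka's default partitioner uses a murmur2 hash of the key:

```python
# Sketch of how a producer's key maps to a partition and how offsets work.
# Hypothetical stand-ins; real Kafka hashes keys with murmur2 on the client.
NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

def produce(key, value):
    # Messages with the same key always land in the same partition,
    # so per-key ordering is preserved.
    p = hash(key) % NUM_PARTITIONS
    partitions[p].append(value)
    return p, len(partitions[p]) - 1  # (partition, offset)

p1, o1 = produce("driver-42", "loc-a")
p2, o2 = produce("driver-42", "loc-b")
```

Because both messages share the key `"driver-42"`, they land in the same partition with consecutive offsets, which is what lets a consumer replay a driver's locations in order.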
Key Features of Kafka
• Distributed Messaging System: Kafka is designed as a distributed messaging system,
providing a unified platform for handling real-time data feeds with high-throughput,
fault tolerance, and horizontal scalability.
• Partitioning: Kafka topics are divided into partitions, allowing data within a topic to be
distributed across multiple Kafka brokers. Partitioning enables horizontal scalability
and improves parallelism for data processing.
• Replication: Kafka replicates partitions across multiple brokers to ensure fault
tolerance and data durability. Replication ensures that data is not lost even if some
brokers or nodes fail.
• High Throughput: Kafka is optimized for high throughput and low latency, making it
suitable for handling large volumes of data and supporting real-time data processing
and analytics.
• Fault Tolerance: Kafka provides built-in replication and leader election mechanisms to
maintain availability and durability of data, even in the event of broker failures.
• Scalability: Kafka scales horizontally by adding more brokers to the cluster and
partitioning topics across multiple nodes. This scalability allows Kafka to handle
increasing data volumes and growing workloads.
• Streaming: Kafka supports stream processing with the Kafka Streams API and
integration with Kafka Connect for connecting Kafka with external systems
such as databases and data lakes.
• Exactly-once Semantics: Kafka supports exactly-once semantics for message
delivery between producers and consumers, using idempotent producers and
transactions. This ensures that messages are processed exactly once, addressing
concerns about data consistency.
• Connectivity and Integration: Kafka Connect simplifies integration with external
systems by providing connectors for various data sources and sinks. It allows
seamless data movement between Kafka and other systems.
• Ecosystem and Community: Kafka has a vibrant ecosystem with support for
monitoring, management, and integration tools. It is backed by a strong
community and active development, ensuring continuous improvement and
innovation.
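The exactly-once idea can be illustrated with offset commits: a consumer that commits its offset after processing, and resumes from the committed offset, does not reprocess messages on a retry. This is a deliberately simplified sketch; real Kafka achieves exactly-once delivery with idempotent producers and transactions, not this mechanism alone:

```python
# Simplified sketch of offset-based resume: a retry after the first pass
# starts from the committed offset and does not double-process messages.
messages = ["a", "b", "c"]   # stand-in for a topic partition
committed_offset = -1        # nothing consumed yet
processed = []

def process_from(offset):
    global committed_offset
    for i in range(offset + 1, len(messages)):
        processed.append(messages[i])
        committed_offset = i  # commit only after successful processing

process_from(committed_offset)   # first run processes a, b, c
process_from(committed_offset)   # "retry" finds nothing new to process
```

The second call is a no-op because the committed offset already points at the last message, which is the behavior "exactly once" is shorthand for.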
