0% found this document useful (0 votes)
9 views4 pages

Class 5 - MsgQueues, PubSub, Kafka

Uploaded by

tysgart
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views4 pages

Class 5 - MsgQueues, PubSub, Kafka

Uploaded by

tysgart
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Async vs Sync

● Synchronous tasks are high-priority tasks that require immediate execution and user feedback. They
are generally associated with user actions that need immediate system response.
● Asynchronous tasks can be processed in the background and are not time-sensitive. They don't need
immediate user feedback and often involve long-running operations that can be offloaded to
background systems.

Message Queues:

● Asynchronous Communication: Message queues enable asynchronous communication between


different services, meaning one service doesn't need to wait for another service to complete its task
before moving on to its next task.
● Load Balancing: They can also help distribute the load evenly among different services or instances
of a service.
● Controlling Throughput: By adjusting the rate at which messages are sent or received, you can
control the throughput of the system.
● Decoupling Components: Message queues decouple the services, meaning the services do not need
to interact with each other directly.
● Scaling: As the load increases, more queues or services reading from the queues can be added to
scale the system.
● Buffering & Throttling: Queues can act as a buffer, holding messages when the processing service is
not ready. Throttling can be implemented to control the rate of message processing based on the
current load on the system.

Distributed Message Queue vs Non-Distributed Message Queue

Non-Distributed Message Queue Distributed Message Queue

Lower: Since the system isn't distributed, a Higher: Distributed queues are designed to
Availability single point of failure can cause the entire avoid single points of failure. If one node fails,
service to be unavailable. the system can still continue to operate.

Depends on the specific queue technology More Robust: Messages in a distributed queue
and its configuration. Some may support can be replicated across multiple nodes,
Message Persistence
persistent messaging, but may not be as ensuring that no data is lost in case of a node
robust as distributed systems. failure.

Limited: The capacity is limited by the Higher: Since the system is distributed, it can
Scalability resources of the single machine where it be easily scaled up by adding more nodes to
operates. the system.

Lower: Being limited to a single machine's Higher: As you can distribute the load across
Throughput resources, the throughput might be limited multiple machines, you can achieve much
compared to distributed systems. higher throughput.
Enabled: Nodes can be spread across
Limited: All the data resides on a single
Geographical different geographical locations which can help
machine, which might be located in one
Distribution in reducing latency and enhancing data
geographic location.
locality.

Higher: The distributed nature of these


Lower: Since there's only a single machine, if
Reliability systems allows for built-in redundancy. If one
it fails, the service becomes unavailable.
node fails, others can take over its load.

Higher: Even when individual nodes fail, the


Lower: A single machine's failure can disrupt
Resilience system as a whole can continue functioning,
the whole system.
making it highly resilient to faults.

Producer/Consumer vs Publish/Subscribe

Producer/Consumer (One-to-one communication): In this pattern, a producer sends messages to a queue,


and a consumer reads from that queue. The key characteristics are:

● The producer and consumer are decoupled.


● The producer adds messages to the queue without knowing about the consumer's state.
● The consumer can consume messages from the queue at its own pace.

Example: Each order on an e-commerce platform (Amazon, for instance) can be seen as a message produced
by the Order Management Service. The Delivery Service, which is responsible for processing these orders,
acts as the consumer. It takes orders from the queue and processes them for delivery.

Tools: RabbitMQ, Apache Kafka, Amazon SQS.

Publish/Subscribe (One-to-many communication): In this pattern, a publisher sends messages to a topic,


and multiple subscribers can receive those messages. The key characteristics are:

● It involves one-to-many communication, where one publisher sends messages to multiple subscribers.
● Subscribers express interest in receiving specific types of messages by subscribing to relevant topics.

Example: When a customer places an order, the Order Management Service publishes a message (order
details). Multiple services like Delivery Service (to process delivery) and Receipt Service (to generate a
receipt) are interested in this message. They subscribe to this topic and receive the message.

Tools: Google Pub/Sub, Apache Kafka, RabbitMQ, AWS SNS.

These two communication patterns serve different purposes and the choice between them depends on the
specific use case. The Producer/Consumer pattern is used when you need to distribute tasks among different
workers (like processing orders). The Publish/Subscribe pattern is used when you want to broadcast
messages to multiple receivers (like notifying different services about a new order).
Kafka

● Distributed streaming platform for real-time data streaming and processing.


● Ideal for high-throughput, responsive applications, surpassing the capabilities of JMS, RabbitMQ, and
AMQP.

Key Kafka Concepts in E-commerce Example

● Producer: Generates and pushes records (messages) into topics. E.g., Order Management Service
creates order messages.
● Consumer: Reads data from Kafka topics. E.g., Delivery Service processes order messages.
● Topic: A category for records where multiple consumers can subscribe. E.g., "Orders" topic for order
messages.
● Broker: Servers storing and managing data in a Kafka cluster.
● Cluster: A set of brokers, scalable without downtime.
● Partition: Divides topics for organization and scalability. Hosted on different servers.
● Offset: Unique record identifier in a partition.
● Replica: Copies of partitions for fault tolerance.
● Consumer Group: A group of consumers that collaboratively process data.

Message Structure

● Key (optional): Used for partitioning topics.


● Value: Event details (e.g., string/object).
● Timestamp.
● Compression type.
● Headers (optional).
● Partition and offset ID (assigned once written to a topic).

How Consumer Consumes and Tracks

● Kafka consumers maintain their position using "offsets."


● Periodic heartbeat updates to Kafka with the latest offset.
● Kafka does not track whether a message is consumed by all consumers.
● Multiple consumers are organized into consumer groups.
● Each consumer group reads from specific partitions, ensuring each message is delivered to only one
consumer.

Replication of Partition

● Kafka replicates each partition across multiple brokers for data reliability and fault tolerance.
● One broker is the "leader," handling data requests, while others are "followers" duplicating the leader's
data.
● Replicas are distributed across brokers, ensuring data availability during broker failures.

Zookeeper and Its Evolution in Kafka

● Zookeeper is a coordination service used for maintaining configuration, synchronization, and group
services in distributed systems.
● Zookeeper was initially essential for Kafka to manage metadata and cluster status.
● Maintaining Zookeeper added complexity and potential single points of failure.
● Since Kafka 2.8.0, Kafka introduced its internal metadata management system, reducing the
dependency on Zookeeper.
● Kafka's internal system, known as KRaft mode, simplifies Kafka's architecture, improves performance,
and enhances reliability.
● Kafka can now operate independently without Zookeeper, making it more manageable and robust.

Kafka vs RabbitMQ

Criteria Apache Kafka RabbitMQ

Log-based publish-subscribe (pub/sub) model Supports multiple messaging models like


optimized for real-time data feeds. Kafka pub/sub, request/reply, and point-to-point.
Messaging Model
retains all messages for a set period, allowing RabbitMQ's focus is more on message
consumers to replay the stream. routing, delivery, and guarantee.

High throughput, handling millions of Good performance for many use-cases, but
Performance messages per second, which makes it ideal for typically doesn't match Kafka's extremely
heavy-load scenarios. high throughput.

Kafka stores data on disk and provides RabbitMQ also provides message durability
Durability intra-cluster replication, ensuring message by storing data on disk and supports
durability. replication between nodes.

More suitable for traditional messaging, task


Best for real-time streaming data analysis, log
Use-cases distribution, and situations where complex
aggregation, and event sourcing.
routing to multiple consumers is needed.

Kafka is more complex to set up and manage, RabbitMQ is easier to set up and manage,
Ease of Use due to its distributed nature and more and it offers a user-friendly web-based
configuration options. management interface.

Supports at-most-once (where messages can


be lost), at-least-once (where messages can
At-least-once delivery is standard, but
Message Delivery be duplicated), and exactly-once (where
exactly-once delivery is also supported with
Semantics message delivery is assured but with
more complex configuration.
considerable performance implications)
delivery semantics.

Kafka provides the producer and consumer RabbitMQ has wide language support with
Language Support API in multiple languages including Java, libraries available for many modern
Python, .NET, Go, etc. programming languages.

Note that the choice between Kafka and RabbitMQ depends on specific use-case requirements, and each has
its strengths and weaknesses.

You might also like