Class 5 - MsgQueues, PubSub, Kafka
● Synchronous tasks are high-priority tasks that require immediate execution and user feedback. They
are generally associated with user actions that need immediate system response.
● Asynchronous tasks can be processed in the background and are not time-sensitive. They don't need
immediate user feedback and often involve long-running operations that can be offloaded to
background systems.
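The split between the two kinds of task can be sketched with Python's standard library. This is only a minimal illustration: `place_order`, `background_jobs`, and `sent_emails` are hypothetical names, and an in-process `queue.Queue` stands in for a real background system.

```python
import queue
import threading

background_jobs = queue.Queue()   # holds work that does not need an immediate answer
sent_emails = []                  # stands in for a slow side effect (hypothetical)

def worker():
    """Process background jobs until a None sentinel arrives."""
    while True:
        job = background_jobs.get()
        if job is None:
            background_jobs.task_done()
            break
        sent_emails.append(f"confirmation for order {job}")
        background_jobs.task_done()

def place_order(order_id):
    """Respond to the user right away and defer the slow part."""
    background_jobs.put(order_id)          # asynchronous: handled later by the worker
    return f"order {order_id} accepted"    # synchronous: immediate user feedback

thread = threading.Thread(target=worker, daemon=True)
thread.start()
response = place_order(42)
background_jobs.put(None)    # tell the worker to stop
background_jobs.join()       # wait until all queued work has been processed
```

The caller gets `response` without waiting for the email step; the worker picks that up from the queue on its own schedule.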
Message Queues:
| Aspect | Non-distributed (single-node) queue | Distributed queue |
| --- | --- | --- |
| Availability | Lower: Since the system isn't distributed, a single point of failure can cause the entire service to be unavailable. | Higher: Distributed queues are designed to avoid single points of failure. If one node fails, the system can still continue to operate. |
| Message Persistence | Depends on the specific queue technology and its configuration. Some may support persistent messaging, but may not be as robust as distributed systems. | More Robust: Messages in a distributed queue can be replicated across multiple nodes, ensuring that no data is lost in case of a node failure. |
| Scalability | Limited: The capacity is limited by the resources of the single machine where it operates. | Higher: Since the system is distributed, it can be easily scaled up by adding more nodes to the system. |
| Throughput | Lower: Being limited to a single machine's resources, the throughput might be limited compared to distributed systems. | Higher: As you can distribute the load across multiple machines, you can achieve much higher throughput. |
| Geographical Distribution | Limited: All the data resides on a single machine, which might be located in one geographic location. | Enabled: Nodes can be spread across different geographical locations which can help in reducing latency and enhancing data locality. |
Producer/Consumer vs Publish/Subscribe
Example (Producer/Consumer): Each order on an e-commerce platform (Amazon, for instance) can be seen as a message produced
by the Order Management Service. The Delivery Service, which is responsible for processing these orders,
acts as the consumer. It takes orders from the queue and processes them for delivery.
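A minimal sketch of this pattern, assuming an in-process `queue.Queue` in place of a real message queue. The function `run_delivery_workers` and the order names are hypothetical. The point it illustrates is that each order is taken off the queue by exactly one worker.

```python
import queue
import threading

def run_delivery_workers(orders, num_workers=2):
    """Put orders on a shared queue and let workers compete for them."""
    order_queue = queue.Queue()
    processed = []                 # (worker_id, order) pairs
    lock = threading.Lock()

    def delivery_worker(worker_id):
        while True:
            order = order_queue.get()
            if order is None:      # sentinel: no more orders for this worker
                break
            with lock:
                processed.append((worker_id, order))

    workers = [
        threading.Thread(target=delivery_worker, args=(i,))
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    for order in orders:           # the producer side
        order_queue.put(order)
    for _ in workers:              # one sentinel per worker
        order_queue.put(None)
    for w in workers:
        w.join()
    return processed

processed = run_delivery_workers(["order-1", "order-2", "order-3"])
```

Which worker handles which order varies from run to run, but every order appears in `processed` exactly once.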
● Publish/Subscribe involves one-to-many communication, where one publisher sends messages to multiple subscribers.
● Subscribers express interest in receiving specific types of messages by subscribing to relevant topics.
Example: When a customer places an order, the Order Management Service publishes a message (order
details). Multiple services like Delivery Service (to process delivery) and Receipt Service (to generate a
receipt) are interested in this message. They subscribe to this topic and receive the message.
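The same example can be sketched as a tiny in-memory broker. The class `Broker`, the topic name "orders", and the two subscriber lists are assumptions made for illustration and do not correspond to any particular messaging product.

```python
from collections import defaultdict

class Broker:
    """Toy publish/subscribe broker: every subscriber of a topic gets every message."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to all subscribers; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)

broker = Broker()
delivery_inbox = []   # what the Delivery Service received
receipt_inbox = []    # what the Receipt Service received
broker.subscribe("orders", delivery_inbox.append)
broker.subscribe("orders", receipt_inbox.append)
delivered_to = broker.publish("orders", {"order_id": 1, "item": "book"})
```

Unlike the producer/consumer case, both services receive their own copy of the same order message.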
These two communication patterns serve different purposes and the choice between them depends on the
specific use case. The Producer/Consumer pattern is used when you need to distribute tasks among different
workers (like processing orders). The Publish/Subscribe pattern is used when you want to broadcast
messages to multiple receivers (like notifying different services about a new order).
Kafka
● Producer: Generates and pushes records (messages) into topics. E.g., Order Management Service
creates order messages.
● Consumer: Reads data from Kafka topics. E.g., Delivery Service processes order messages.
● Topic: A category for records where multiple consumers can subscribe. E.g., "Orders" topic for order
messages.
● Broker: Servers storing and managing data in a Kafka cluster.
● Cluster: A set of brokers, scalable without downtime.
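● Partition: An ordered, append-only subdivision of a topic. A topic's partitions can be hosted on different
brokers, which is what allows a topic to scale.
● Offset: Sequential position that uniquely identifies a record within a partition.
● Replica: Copies of partitions for fault tolerance.
● Consumer Group: A group of consumers that collaboratively process data. Each partition is read by only
one consumer in the group at a time.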
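How topics, partitions, offsets, and consumer groups fit together can be sketched with a toy in-memory model. This is not the Kafka client API: `Topic`, `send`, `read`, and `assign_partitions` are hypothetical names, and `zlib.crc32` is used as a stand-in hash (the Java client's default partitioner hashes keys with murmur2).

```python
import zlib

class Topic:
    """Toy topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        """Append a record; the key decides the partition. Returns (partition, offset)."""
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset):
        """Return all records in a partition starting at the given offset."""
        return self.partitions[partition][offset:]

def assign_partitions(num_partitions, consumers):
    """Round-robin assignment: each partition goes to exactly one group member."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment

orders = Topic("Orders", num_partitions=3)
p1, o1 = orders.send("customer-1", "order A")
p2, o2 = orders.send("customer-1", "order B")   # same key, so same partition
group = assign_partitions(3, ["delivery-1", "delivery-2"])
```

Records with the same key land in the same partition with increasing offsets, so their relative order is preserved; a consumer that remembers its last offset can resume from there with `read`.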
Message Structure
Replication of Partition
● Kafka replicates each partition across multiple brokers for data reliability and fault tolerance.
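● For each partition, one broker is the "leader," handling data requests for that partition, while the
others are "followers" duplicating the leader's data. If the leader fails, a follower can take over as leader.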
● Replicas are distributed across brokers, ensuring data availability during broker failures.
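The leader/follower idea can be sketched as follows. This is a deliberately simplified model with hypothetical names (`ReplicatedPartition`, `fail_broker`): here the leader copies each record to followers immediately, whereas in real Kafka followers fetch from the leader and a new leader is chosen by the cluster controller from the in-sync replicas.

```python
class ReplicatedPartition:
    """Toy model of one partition replicated across several brokers."""

    def __init__(self, broker_ids):
        self.replicas = {broker_id: [] for broker_id in broker_ids}
        self.leader = broker_ids[0]    # first broker starts as leader

    def append(self, record):
        """Write to the leader, then copy to every follower. Returns the offset."""
        self.replicas[self.leader].append(record)
        for broker_id, log in self.replicas.items():
            if broker_id != self.leader:
                log.append(record)
        return len(self.replicas[self.leader]) - 1

    def fail_broker(self, broker_id):
        """Remove a broker; if it was the leader, promote a surviving follower."""
        del self.replicas[broker_id]
        if broker_id == self.leader:
            if not self.replicas:
                raise RuntimeError("no replicas left for this partition")
            self.leader = next(iter(self.replicas))   # simplification of leader election

    def read(self, offset):
        """Serve reads from the current leader."""
        return self.replicas[self.leader][offset]

partition = ReplicatedPartition([1, 2, 3])
partition.append("order A")
partition.append("order B")
partition.fail_broker(1)               # the leader goes down
record_after_failover = partition.read(1)
```

After broker 1 fails, broker 2 becomes the leader and the previously written records are still readable.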
● ZooKeeper is a coordination service used for maintaining configuration, synchronization, and group
services in distributed systems.
● ZooKeeper was initially essential for Kafka to manage metadata and cluster status.
● Maintaining a separate ZooKeeper ensemble added operational complexity and another system that could fail.
● Kafka 2.8.0 introduced an internal, Raft-based metadata management system as an early-access feature,
reducing the dependency on ZooKeeper. It was declared production-ready for new clusters in Kafka 3.3.
● This internal system, known as KRaft mode, simplifies Kafka's architecture and is intended to improve
scalability and reliability.
● Kafka can now operate without ZooKeeper, which makes clusters easier to manage.
Kafka vs RabbitMQ
| Aspect | Kafka | RabbitMQ |
| --- | --- | --- |
| Performance | High throughput, handling millions of messages per second, which makes it ideal for heavy-load scenarios. | Good performance for many use-cases, but typically doesn't match Kafka's extremely high throughput. |
| Durability | Kafka stores data on disk and provides intra-cluster replication, ensuring message durability. | RabbitMQ also provides message durability by storing data on disk and supports replication between nodes. |
| Ease of Use | Kafka is more complex to set up and manage, due to its distributed nature and more configuration options. | RabbitMQ is easier to set up and manage, and it offers a user-friendly web-based management interface. |
| Language Support | Kafka provides the producer and consumer API in multiple languages including Java, Python, .NET, Go, etc. | RabbitMQ has wide language support with libraries available for many modern programming languages. |
Note that the choice between Kafka and RabbitMQ depends on specific use-case requirements, and each has
its strengths and weaknesses.