Understanding Streams in Redis and Kafka – A Visual Guide
© 2022 Redis
Table of Contents

PART 1. Introducing the concept of streams
What are streams?
How streams are related to events
How streams compare to buffering
Processing using just Buffers
Processing using Streams
The challenges of stream processing
Single partition and multiple consumers
Multiple partitions and multiple consumers
In-order and in-parallel message processing
The role of consumer groups in Redis Streams
How messages are acknowledged
Letting the producer know that the message has been delivered
Even when you think you have a firm understanding of it, stream processing can still be a very complex topic. In fact, it's difficult to maintain a good mental model of streaming unless you really understand some stream processing systems.

The goal of this e-book is to help you build that mental model. I'll use text, code snippets, and more than 50 illustrations to explain:

1. How to think about streams and connect the dots between different perspectives so you get a bigger picture
2. Some of the challenges of handling streams
3. How stream processing systems such as Redis Streams and Kafka work. We are using these two systems as examples in the hope that you'll gain a more thorough understanding as opposed to learning just how one system handles the processing.

Even though we'll be covering deep and complex topics, thanks to the format and the illustrations, it should be an easy and fun read overall.

By the end of this, you should have:

• An expert-level theoretical understanding of streams, the challenges of stream processing, and how two stream processing systems (Kafka and Redis Streams) work
• Enough knowledge to do a proof-of-concept of Redis Streams or Kafka and to determine which one is best suited for you
• Enough theoretical knowledge to get a head start on certification for either Redis or Kafka

OK, let's get started.
What are streams?

[Figure: a stream of JSON chat messages arriving over time]
This data can be internal as well as external. It doesn't have to come from the outside world. It could originate from different systems sending messages to each other. For example, a webserver, after receiving payment information, might use a JSON message to tell an email server to send an email. That is machine-to-machine communication. You can also think of these messages as coming in the form of streams, because they can come in small pieces, over time, and at any point in time.

Figure 4: Streams of messages sent using machine-to-machine communication [a web server, app server, analytics server, and email server exchanging JSON messages]
How streams are related to events

An event is simply a mechanism, a trigger that is activated when something has occurred. For example, when someone buys a product, that triggers an event that leads to the creation of a JSON message containing the person's information, payment amount, product info, and so on. This usually originates at the browser or mobile app, and then the message is sent to the server. Here, the event is the act of buying the product, indicating something occurred. And since buying events can occur at any time, the data corresponding to each event arrives over time, as a stream.

[Figure: a server processing one "Buy event" at a time from a JSON stream, with the data corresponding to each "event" arriving over time]
How streams compare to buffering

Processing using just Buffers

If you ask backend engineers who work in Java, NodeJS, and other programming languages, they'll tell you that streams are more efficient than buffers for processing chunks of data. An example should help us to understand their perspective a little better.

Say you have a 10 GB file containing hundreds of typos that say "breams" instead of "streams." Let's look at how to use buffers to replace "breams" with "streams," and then we'll see how the process would work using streams.

[Figure: buffer-based processing: the file is loaded into a buffer (RAM), a data processor fixes the entire file at a time, the fixed data goes into a second buffer (RAM) and is then stored back to disk]

Here is how it works:
1. You first read the entire 10 GB file into RAM (it can be slow to load all that data).
2. You then send this data to a data processor that will fix all the typos from "breams" to "streams."
3. Once the data processor finishes processing, the new data will be stored back in RAM (so you may need an additional 10 GB of memory).
4. After all the processing is done, you write the entire file into a new file.

As you can see, this process not only tends to be slow, but it can also take up a lot of memory.
The challenges of stream processing

Although streams can be a very efficient way of processing huge volumes of data, they come with their own set of challenges. Let's take a look at a few of them.

1. What happens if the consumer is unable to process the chunks as quickly as the producer creates them? Taking our current example, what if the consumer is 50% slower than the producer? If we're starting out with a 10 GB file, that means by the time the producer has processed all 10 GB, the consumer would only have processed 5 GB. What happens to the remaining 5 GB while it's waiting to be processed? Suddenly, the 50-100 bytes allocated for data that still needs to be processed would have to be expanded to 5 GB.

Figure 8: If the consumer is slower than the producer, you'll need additional memory. [a server streams a 10 GB file through a bytes-to-line convertor to a consumer that is 50% slower]
2. And that’s just one nightmare scenario. There are others. For example, what happens if the consumer suddenly dies
while it’s processing a line? You’d need a way of keeping track of the line that was being processed and a mechanism
that would allow you to reread that line and all the lines that follow.
[Figure: a failed consumer: the server is streaming the 10 GB file through the bytes-to-line convertor when the consumer dies mid-line]
3. Finally, what happens if you need to be able to process different events and send them to different consumers? And, to add an extra level of complexity, what if you have interdependent processing, where the process of one consumer depends on the actions of another? There's a real risk that you'll wind up with a complex, tightly coupled, monolithic system that's very hard to manage. This is because these requirements will keep changing as you keep adding and removing different producers and consumers.

For example (Figure 10), let's assume we have a large retail shop with thousands of servers that support shopping through web apps and mobile apps.

Imagine that we are processing three types of data related to payments, inventory, and webserver logs, and that each has a corresponding consumer: a "payment processor," an "inventory processor," and a "webserver events processor." In addition, there is an important interdependency between two of the consumers. Before you can process the inventory, you need to verify payment first. Finally, each type of data has different destinations. If it's a payment event, you send the output to all the systems, such as the database, email system, CRM, and so on. If it's a webserver event, then you send it just to the database. If it's an inventory event, you send it to the database and the CRM.

As you can imagine, this can quickly become quite complicated and messy. And that's not even including the slow-consumer and fault-tolerance issues that we'll need to deal with for each consumer.
Figure 10: The challenge of tight coupling because of multiple producers and consumers [payment, inventory, and webserver events flow through their respective processors; after processing, some data needs to go to all the systems (like payment events data) and some only to a few (like web events data), including the database, CRM, and other web services]
Of course, all of this assumes that you're dealing with a monolithic architecture, that you have a single server receiving and processing all the events. How would you deal with a "microservices architecture"? In this case, numerous small servers (that is, microservices) would be processing the events, and they would all need to be able to talk to each other. Suddenly, you don't just have multiple producers and consumers. You have them spread out over multiple servers.

A key benefit of microservices is that they solve the problem of scaling specific services depending on changing needs. Unfortunately, although microservices solve some problems, they leave others unaddressed. We still have tight coupling between our producers and consumers, and we retain the dependency between the inventory microservices and the payment ones. Finally, the problems we pinpointed in our original streaming example remain problems.

1. We haven't figured out what to do when a consumer crashes.
2. We haven't come up with a method for managing slow consumers that doesn't force us to vastly inflate the size of the buffer.
3. We don't yet have a way to ensure that our data isn't lost.

These are just some of the main challenges. Let's take a look at how to address them.
[Figure 11: The same retail example spread across microservices: webserver events, inventory events, and other data flow to their processors and on to systems such as the database and dashboard; after processing, some data needs to go to all the systems (like payment events data) and some only to a few (like web events data)]
Specialized stream processing systems

As we've seen, streams can be great for processing large amounts of data but also introduce a set of challenges. New specialized systems such as Apache Kafka and Redis Streams were introduced to solve these challenges. In the world of Kafka and Redis Streams, servers no longer lie at the center; the streams do, and everything else revolves around them.

Data engineers and data architects frequently share this stream-centered worldview. Perhaps it's not surprising that when streams become the center of the world, everything is streamlined.

Figure 12 illustrates a direct mapping of the tightly coupled example you saw earlier. Let's see how it works at a high level.

Note: We'll go into the details later in the context of Redis Streams and Kafka to give you an in-depth understanding of the following:

1. Here the streams and the data (events) are first-class citizens, as opposed to the systems that are processing them.
2. Any system that is interested in sending data (producer), receiving data (consumer), or both sending and receiving data (producer + consumer) connects to the stream processing system.
3. Because producers and consumers are decoupled, you can add additional consumers or producers at will. You can listen to any event you want. This makes it perfect for microservices architectures.
4. If the consumer is slow, you can increase consumption by adding more consumers.
5. If one consumer is dependent on another, you can simply listen to the output stream of that consumer and then do your processing. For example, in Figure 11, the inventory service is receiving events from both the inventory stream (purple) and the output of the payment processing stream (orange) before it processes the inventory event. This is how you solve the interdependency problems.
6. The data in the streams is persistent (as in a database). Any system can access any data at any time. If for some reason data wasn't processed, you can reprocess it.

A number of streaming challenges that once seemed formidable, even insurmountable, can readily be solved just by putting streams at the center of the world. This is why more and more people are using Kafka and Redis Streams in their data layer. This is also why data engineers view streams as the center of the world.

Now that we understand what streams, events, and stream processing systems are, let's take a look at Redis Streams and Kafka to understand stream processing and how they solve various challenges. By the end of this, you should be an expert, at least in the theoretical aspects of stream processing, to the extent that you can readily do a proof-of-concept for each system or easily earn Kafka or Redis certification.
Figure 12: When we make streams the center of the world, everything becomes streamlined. [Redis Streams or Kafka hosts the Payments, Webserver, and Inventory streams, along with post-processing streams such as "Payments stream (post processing)" and "Events stream (post processing)"]
So the key thing to remember is that, when you are thinking about Kafka and Redis Streams, you should really
think of Kafka and Redis (not just Redis Streams).
How messages are stored

In Redis Streams, each message by default gets a timestamp as well as a sequence number. Messages are stored in the same order as they arrive and are by default identified by <millisecondsTime>-<sequenceNumber>. The sequence number is provided to accommodate messages that arrive at the exact same millisecond. So if two messages arrived at the exact same millisecond (1518951480106), their ids would look like 1518951480106-0 and 1518951480106-1.

Figure 13: How messages look in Kafka and Redis Streams [a Redis stream holding ids 56628723-0 through 56628727-0, ordered from oldest to newest]
Creating streams

We'll get into all of these in a bit, but for now let's assume you have one partition, one broker, and one replication factor.

Figure 14: How messages look in Kafka for topic Email with one broker, one partition, and one replication factor

Note: the command to create a Kafka topic with one partition and one replication factor would look like this:

> kafka-topics --zookeeper 127.0.0.1:2181 --topic Email --create --partitions 1 --replication-factor 1

Important: Kafka no longer requires ZooKeeper as of Kafka 3.3.1. Since 3.3.1, Kafka uses Kafka Raft (KRaft), which is built in. Instead of "--zookeeper" you now use "--bootstrap-server".
Note: The example below (Figure 14a) shows how these would look in a Kafka cluster. We'll discuss that later, but for now just imagine that there is only one broker.

Figure 14a: A Kafka cluster with three brokers (servers) and two topics (Email and Payment), where the Email topic has three partitions spread across three brokers (10, 11, and 12) and the Payment topic has two partitions spread across two brokers (11 and 12), with the cluster coordinated by Apache ZooKeeper
In Redis, you simply create a stream and give it a key. Note that all the data within this stream is part of this single key ("Email"). Note also that this key and its stream just reside along with other keys and data structures. Redis allows for a number of data structures; a stream is just one of them. (See Figure 15.)

Figure 15: How messages look in Redis for an Email stream [messages 12-0, 12-1, and 12-2, ordered from oldest to newest, are consumed by an Email Service; each message is a hash such as {"name": "raja", "email": ...}]

If the Email stream already exists, it will append the message. If it doesn't exist, Redis will automatically create a stream (using "Email" as its key) and then append the first message. The asterisk will auto-generate the message id (timestamp-sequence) for this message.
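The command in question presumably looks something like this (the field values are illustrative):

XADD Email * name "raja" email "raja@example.com"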
Adding messages
Kafka has a concept called "producers." These are responsible for sending messages. They can also send messages with some options such as acknowledgments, serialization format, and so on.

In the following command, you are using a Kafka producer CLI tool to send three messages to the Email topic.

$ kafka-console-producer --broker-list 127.0.0.1:9092 --topic Email
> my first email
> my second email
> my third email

In Redis Streams, use the XADD command to send the data in a hash to the Email key.

XADD Email * subject "my first email"
XADD Email * subject "my second email"
XADD Email * subject "my third email"

In Redis Streams, you can set up acknowledgments and many other things as part of the Redis server or Redis cluster settings. Remember that these settings will get applied to the entire Redis instance, not just the Redis Streams data structure.
Consuming messages

Both Kafka and Redis Streams have the concepts of consumers and consumer groups. We'll cover just the basics first.

In Kafka, the following command reads all the messages in the Email topic. The "bootstrap-server" is the main Kafka server. The "--from-beginning" flag tells Kafka to send all the data from the beginning. If we don't provide this flag, the consumer will only retrieve messages that arrive after it has connected to Kafka and started to listen.
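A sketch of that command:

$ kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --topic Email --from-beginning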
Response:
> my first email
> my second email
> my third email
Note: The above consumer client will continue to wait for new messages in a blocking fashion and will display them when
they arrive.
1. Consume messages by using XREAD (the equivalent of Kafka's consumer command). In the command below, "BLOCK 0" tells the Redis CLI to maintain the connection forever (0) in a blocking manner. "Email 0" after the keyword "STREAMS" means to get messages from the "Email" stream from the beginning of time.
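A sketch of that command:

XREAD BLOCK 0 STREAMS Email 0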
Response:
1) 1) 1518951480106-0
2) 1) "subject"
2) "my first email"
2) 1) 1518951482479-0
2) 1) "subject"
2) "my second email"
3) 1) 1518951482480-0
2) 1) "subject"
2) "my third email"
Notes:
• If you use "Email $", then it would get only new messages from the "Email" stream. That is, "XREAD BLOCK 0 STREAMS Email $"
• You can use any other timestamp id after the stream name to get the messages after that timestamp id. That is, "XREAD BLOCK 0 STREAMS Email 1518951482479-0"
c. You can use the command XRANGE to get everything from the smallest ("-") timestamp to the latest one ("+").
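A sketch of that command:

XRANGE Email - +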
Response:
1) 1) 1518951480106-0
2) 1) "subject"
2) "my first email"
2) 1) 1518951482479-0
2) 1) "subject"
2) "my second email"
3) 1) 1518951482480-0
2) 1) "subject"
2) "my third email"
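You can also pass a specific id as the starting point, to get that message and everything after it; a sketch, using the second message's id:

XRANGE Email 1518951482479-0 +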
Response:
1) 1) 1518951482479-0
2) 1) "subject"
2) "my second email"
2) 1) 1518951482480-0
2) 1) "subject"
2) "my third email"
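XRANGE also accepts a COUNT option to limit how many messages come back; a sketch that returns just the first message:

XRANGE Email - + COUNT 1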
Response:
1) 1) 1518951480106-0
2) 1) "subject"
2) "my first email"
d. By prefixing the last id with a "(", you can pick up where you left off, starting with the messages that immediately followed the one with that id, and keeping the "+" as the ending point. In the example below, we are retrieving the two messages that come after the message with the id "1518951480106-0".
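A sketch of that command:

XRANGE Email (1518951480106-0 +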
Response:
1) 1) 1518951482479-0
2) 1) "subject"
2) "my second email"
2) 1) 1518951482480-0
2) 1) "subject"
2) "my third email"
Single partition and multiple consumers

[Figure: with a single partition (Partition 0 of the Email topic), every Email Service connected to the topic receives all the messages, in a "fan out"]

Note: Although it doesn't work for this scenario, it works fine for chat messenger clients, where you can connect multiple users to the same topic and they all receive all chat messages.
It works exactly like that in Redis Streams as well.

Figure 17: A "fan out" in Redis Streams [multiple consumers, including an Email Service, each receive all the messages from the same stream]
Multiple partitions and multiple consumers

In Kafka, there is a concept called a partition. You can think of a partition as a physical file on the disk. Partitions are used for scaling purposes. However, you should use them carefully, either with "keys" or with "consumer groups." We'll talk about both of them in a bit. But just know that consumers generally don't care about and are not aware of the partitions. They just subscribe to a "Topic" (the higher-level abstraction) and consume whatever Kafka sends them.

We are going to cover multiple cases of using multiple partitions and multiple consumers, and it may look odd at first.

In the example below, we have created three partitions for the "Email" topic using the following command:
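Presumably something like the earlier create command, now with three partitions:

> kafka-topics --zookeeper 127.0.0.1:2181 --topic Email --create --partitions 3 --replication-factor 1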
Now when we add three messages to the topic, they are automatically distributed across the partitions using a hashing algorithm, so each partition gets just one message in our example. But when consumers connect to this topic (they are not aware of the partitions), all the messages in each partition are sent to every consumer in a fan-out fashion.
Figure 18: A "fan out" when there are multiple partitions (default behavior) [on Broker 1 (Server), the Email topic has three partitions, each holding one message; all three Email Services receive all the messages]
Notes:
• Message order: The order in which consumers receive messages is not guaranteed. For example, "Email Service 1" might receive "message 1", "message 3", and finally "message 2", whereas "Email Service 2" might get them in the order "message 3", "message 1", "message 2". This is because message order is only maintained within a single partition.
• Later, we'll learn more about ordering and how to use keys and consumer groups to alter this default behavior.
Case 2: Multiple partitions (three) and fewer consumers (two)

It still works the same. Each consumer gets all the messages, irrespective of partitions. Message order is not guaranteed.

Figure 19: A "fan out" when there are more partitions than consumers [three partitions on Broker 1; both Email Services receive all the messages]
Case 3: Multiple but fewer partitions (three) and more consumers (four)

It still works the same. Each consumer receives all the messages, irrespective of partitions. Message order is still not guaranteed.

Figure 20: A "fan out" when there are fewer partitions than consumers [three partitions on Broker 1; all four Email Services receive all the messages]
In Redis Streams, there is no such concept as partitions. If you are using a standalone Redis server, you don't need to worry about partitioning. If you do want to distribute messages in the same stream across several servers, then you should use a combination of multiple stream keys and a sharding system like Redis Cluster, or some other application-specific sharding system.

Let's look at how you might implement something resembling partitions in Redis. You can create "partitions" by creating multiple streams and then distributing the data yourself. And on the consumer side, unlike Kafka, since you have direct access to each of these streams, you can consume the data in a fan-out fashion by connecting to all the streams, or by using a key or keys to connect to specific streams.

Say you created three streams: "Email:P0", "Email:P1", and "Email:P2". And say you want to distribute the incoming messages in a round-robin fashion, as sketched below. And finally, you want to consume the data in a "fan-out" fashion and also in a "per-stream" fashion.
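A minimal sketch of the round-robin producer side (the counter key name is illustrative): keep a counter in Redis and compute the modulus in your application.

INCR Email:counter
//suppose INCR returned 4; 4 % 3 = 1, so this message goes to "Email:P1"
XADD Email:P1 * subject "my fourth email"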
Consuming from Redis Stream “partitions” in a “fan out” fashion (Figure 21)
To consume the data in a “fan out” fashion, simply listen to all the streams (Figure 21).
//Consumer 1
XREAD BLOCK 0 STREAMS Email:P0 0 Email:P1 0 Email:P2 0
//Consumer 2
XREAD BLOCK 0 STREAMS Email:P0 0 Email:P1 0 Email:P2 0
//Consumer 3
XREAD BLOCK 0 STREAMS Email:P0 0 Email:P1 0 Email:P2 0
Notes:
• BLOCK 0 = Wait indefinitely for messages.
• "Email:P0 0" = Read all messages from the beginning (0-0).
• By providing multiple stream names and "0"s, each consumer can receive all the messages.
Figure 21: How to implement "partitions" in Redis Streams and consume messages in a "fan out" manner [all three Email Services read from all three streams (Email:P0, Email:P1, and Email:P2), so each receives all the messages]
Consuming from Redis Stream "partitions" in a "per-stream" fashion (Figure 22)

To consume the data in a "per-stream" fashion, have each consumer listen to a single stream.

Figure 22: Consuming in a "per-stream" manner [each Email Service connects to exactly one stream, so it receives only the messages from "Email:P0", "Email:P1", or "Email:P2" respectively]
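A sketch of the corresponding commands, mirroring the fan-out example but with one stream per consumer:

//Consumer 1
XREAD BLOCK 0 STREAMS Email:P0 0
//Consumer 2
XREAD BLOCK 0 STREAMS Email:P1 0
//Consumer 3
XREAD BLOCK 0 STREAMS Email:P2 0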
In-order and in-parallel message processing

Tasks can be handled in order, in parallel, or with a combination of both. As the name implies, with in-order processing, tasks are handled in a specific order. For example, when processing a credit card, we need to first check the validity of the card, then do fraud detection, and then check the balance. With in-parallel processing, tasks are handled simultaneously so they can be completed more quickly. With in-order and in-parallel processing, the system splits the tasks into groups of tasks that need to be handled in order and then assigns those groups to different consumers that can perform those ordered tasks in parallel. Kafka and Redis Streams handle this process a little differently. How they differ will become clearer when you look at each system's implementation.

How Kafka handles it

In Kafka, you can send metadata called a "key" (aka "message key") along with the message. When you do that, messages with the same key will end up in the same partition. This helps with message ordering. Message keys are also useful for other things, such as log compaction, but we won't cover that here.

Secondly, Kafka uses the concept of "consumer groups," where you define a bunch of individual consumers as part of the same consumer group. Kafka will then ensure that messages are distributed across the different consumers that are part of that group. This helps in scaling consumption and also avoids "fan out," so each message is read by only one consumer. Another key aspect of consumer groups is that, assuming the number of consumers is greater than or equal to the number of partitions, each consumer in a group is tied to a single partition and is allowed to read messages from just that partition. It cannot read messages from multiple partitions. This way, when you combine message keys and consumer groups, you'll wind up with highly distributed consumption, although order is still not guaranteed in the event of a consumer failure.
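For instance, the console producer can attach a key to each message once key parsing is enabled; a sketch (the key/value separator choice is arbitrary):

$ kafka-console-producer --broker-list 127.0.0.1:9092 --topic Email --property "parse.key=true" --property "key.separator=:"
> order1234:Payment Received
> order1234:Product Shipped
> order1234:Product Delivered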
Referring to Figure 23, let's say you are processing emails for an e-commerce store. You need to send the following emails, and in the following order:

1. Payment Received
2. Product Shipped
3. Product Delivered

In this case, to make sure they are sent in that order, we can use the order id ("order1234") as the key we send to Kafka to ensure that all the messages end up in the same partition.

Figure 23: How consumer groups and message keys work in Kafka [on Broker 1, the Email topic has three partitions; within the "Email Application" consumer group, Email Service 1 receives all messages in Partition 0, Email Service 2 all messages in Partition 1, and Email Service 3 all messages in Partition 2, while Email Service 4 is an extra stand-by]
How Redis Streams handles it

Although, like Kafka, Redis Streams has a concept of "consumer groups," it operates differently. In fact, you don't need it for this specific use case. We'll learn in the next section how Redis uses consumer groups, but for now let's see how in-order and in-parallel message processing works in Redis.

Let's say someone purchases three different products. For each product, you have "payment_received", "product_shipped", and "product_delivered" messages (for a total of nine), and you want to process them in order but also in parallel.

In the example (Figure 24) below, yellow, purple, and pink represent three products. Each product has three messages representing its different states. As you can see, if you want to process three messages at a time, simply create three streams and send each product's data into a specific stream based on the product id or some other unique identifier. This is similar to "keys" in Kafka. After that, connect each consumer to each stream (i.e., a 1:1 mapping). This is also similar to Kafka. Then you'll get both parallel processing and in-order message processing at the same time. As we already mentioned, unlike with Kafka, with Redis you don't really need consumer groups in this case.

Figure 24: Using Redis Streams to process multiple messages in parallel and in order [each consumer receives all messages, in order, from its own stream, e.g., "Email:P0"]

Creating streams in Redis
INCR email_counter
2. Hash the unique id, such as the order id, and take the modulus of the number of streams to determine which stream (Email:P0, Email:P1, or Email:P2) to send the data to. All we need to do is convert a string like orderId into a number using a popular hash called the "murmur" hash and then take the mod of the number of streams.

Note: Kafka also uses the "murmur" hash for converting a string into a number. There are "murmur" libraries in every language, such as this one in Nodejs. A "murmur" hash, while not strictly necessary, is fast and sufficient given that you don't require cryptographic security.
3. Send the messages to the appropriate stream. Notice that, because of the hash we employed in the above steps, we'll have a 1:1 mapping between the order id and the stream name. So, for example, all the messages with the order id order1234 will go to the "Email:P0" stream.

XADD Email:P0 * id order1234 name "light bulb" price "$1.00" status "payment_received"
XADD Email:P0 * id order1234 name "light bulb" price "$1.00" status "order_shipped"
XADD Email:P0 * id order1234 name "light bulb" price "$1.00" status "order_delivered"
XADD Email:P2 * id order5555 name "Yoga book" price "$31.00" status "payment_received"
XADD Email:P2 * id order5555 name "Yoga book" price "$31.00" status "order_shipped"
XADD Email:P2 * id order5555 name "Yoga book" price "$31.00" status "order_delivered"
4. Here’s how just one of the consumers will receive all three messages.
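Presumably a blocking read on that consumer's stream, along the lines of:

XREAD BLOCK 0 STREAMS Email:P0 0

This would return the three "order1234" messages in the order they were added.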
The role of consumer groups in Redis Streams

In Redis Streams, although there is no concept of "message keys" like in Kafka, you can still get message ordering without them. Redis Streams does have a concept of "consumer groups," but again, it works differently from Kafka. First, let's understand how consumer groups work in Redis Streams, and then later we'll see how Redis Streams handles message ordering.

In Redis Streams, you can connect multiple consumers that are part of the same consumer group to a single stream and do parallel processing without the need for partitions.

In the example below (Figure 25), we have created a consumer group called "Email Application" and have made three consumers part of that group. Each consumer is asking for one message at the same time, for concurrent processing. In this case, Redis Streams simply distributes unread (unconsumed) messages to each consumer.

Figure 25: Each member of the consumer group "Email Application" has concurrently requested one message and has been given one unread message [Email Service 1 receives 12-0, Email Service 2 receives 12-1, and Email Service 3 receives 12-2]

Note: Each consumer within the consumer group needs to identify itself with a name. In Figure 25, we have named the services that are part of the "Email Application" group as "emailService1", "emailService2", and "emailService3".
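A sketch of the commands behind Figure 25 (the group name matches the one used later in this e-book):

//create the group, starting from the beginning of the "Email" stream
XGROUP CREATE Email EmailApplnGroup 0
//each consumer identifies itself by name and asks for one new (">") message
XREADGROUP GROUP EmailApplnGroup emailService1 COUNT 1 STREAMS Email >
XREADGROUP GROUP EmailApplnGroup emailService2 COUNT 1 STREAMS Email >
XREADGROUP GROUP EmailApplnGroup emailService3 COUNT 1 STREAMS Email >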
As mentioned earlier, unlike in Kafka, each consumer within the group can ask for as many messages as it wants.

Figure 26: How consumer groups work when one consumer asks for more messages than the other
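In CLI terms, a consumer simply raises COUNT; a sketch:

//emailService1 asks for two messages at once, emailService2 for one
XREADGROUP GROUP EmailApplnGroup emailService1 COUNT 2 STREAMS Email >
XREADGROUP GROUP EmailApplnGroup emailService2 COUNT 1 STREAMS Email >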
This scenario doesn't have to be limited to a single consumer group. It's possible to have multiple consumer groups, as well as regular consumers that are not part of any group, all consuming messages at the same time. In Figure 27, there are two different consumer groups ("Email Application" and "Payment Application") as well as a regular consumer (Dashboard Service) all consuming the messages.

Figure 27: Multiple consumer groups and non-grouped, or standalone, consumers all consuming messages at the same time [the "Email Application" group (Email Services 1-3) and the "Payment Application" group (Payment Services 1-2) each split the three messages among their members, while the Dashboard Service, which is not part of any group, reads all of them]
And finally, a consumer or a consumer group can consume data from multiple streams (see Figure 28).

Figure 28: The consumer group (Payment Application) and a consumer (Dashboard Service) consuming data from two streams
2. Making the dashboard service get messages from the beginning of both the Email (Email 0) and the Orders (Orders 0) streams, while also waiting for any new messages in a blocking fashion (BLOCK 0).
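A sketch of that command:

XREAD BLOCK 0 STREAMS Email 0 Orders 0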
Now that you have seen the basics of how stream processing works, let’s look at how some of the challenges are
addressed. One of the most effective ways to handle them is via “message acknowledgements.”
How messages are acknowledged

In the context of stream processing, an acknowledgement is simply a way for one system to confirm to another system that it has received a message or that it has processed that message.

Message acknowledgements can be used to solve the following four stream processing challenges:

1. Providing message delivery guarantees for producers
2. Providing message consumption guarantees for consumers
3. Enabling retries after temporary outages
4. Permitting reassignment following a permanent outage

1. Providing message delivery guarantees for producers. Once a message has been sent, how can we be sure that it has been received? We need the streaming system to acknowledge that it has in fact safely stored the incoming message.

Figure 29: How stream processing systems acknowledge message reception to producers [the producer sends a message to the streaming system (Kafka or Redis), which replies, "Yes, I acknowledge that I've safely stored it"]

2. Providing message consumption guarantees for consumers. There needs to be a way for the consumer to acknowledge back to the system that it has successfully processed the message.

Figure 30: A consumer acknowledgement to the streaming system after processing the message [the consumer tells the streaming system (Kafka or Redis), "Yes, I acknowledge that I've processed the message"]
3. Enabling retries after temporary outages. We need to be able to reprocess messages in the event that a consumer dies while processing them. For this, we need a mechanism that enables the consumer to acknowledge to the system that it has processed a message. And if there is an issue, the processing system needs to provide a way to reprocess that message in the case of a temporary failure (Figure 31).
4. Permitting reassignment following a permanent outage. And lastly, if the consumer permanently fails (say,
crashes), we need a way to either assign the job to a different consumer or allow different consumers to find out
about the failure and take over the job (Figure 32).
Figure 32: When a consumer, while attempting to read new messages (1), permanently crashes (2), a new consumer takes over (3), successfully processes the messages, and then sends back an acknowledgment to the streaming system (4)
Now let’s look at how Kafka and Redis Streams handle each one of these.
Letting the producer know that the message has been delivered
Independently, the write is communicated from the primary to the replica, and the replica acknowledges the write back to the primary. These are steps 5 and 6. Independently, the write to the replica is also persisted to disk and acknowledged within the replica. These are steps 7 and 8.
Figure 34: How the strong consistency configuration (option 1) works [the app's write goes through the proxy to the primary and is acknowledged (steps 1-4); the primary replicates to the replica (5, 6); the replica persists the write to storage (7, 8)]
With this flow, the application only gets the acknowledgment for the write after durability is achieved through replication to the replica and to the persistent storage.

With the WAIT command, Redis will make a best-effort attempt to guarantee that, even under a node failure or node restart, an acknowledged write will be recorded. However, there is still a possibility of failure. See the WAIT command for details on the new durability and consistency options.
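For example, a producer could pair its write with WAIT; a sketch that blocks until at least one replica has acknowledged, waiting up to 1000 milliseconds:

XADD Email * subject "my first email"
WAIT 1 1000
//WAIT returns the number of replicas that acknowledged the write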
Option 2: Ack = 1 (wait for the leader, but not replicas, to acknowledge) — a "weak consistency configuration" where you get an acknowledgement only from the leader.

Option 3: Ack = All (wait for all replicas to acknowledge) — a "strong consistency configuration" (wait for all replicas to acknowledge), or use "Redis Raft".
[Figure: within each partition of a topic, messages are identified by sequential offsets (0, 1, 2, 3, ...) over time]
Committing offsets (i.e., consumer acknowledgement). When a consumer processes a message or a bunch of messages, it acknowledges this by telling Kafka the offset it has just consumed. In Kafka, this can be automatic or manual. Following the consumer's acknowledgement, this information is written to an internal topic called "__consumer_offsets", which acts as Kafka's record of how far each consumer group has read.

Figure 37: Consumers have processed up to offset 2 in partition 0, up to offset 4 in partition 1, and up to offset 0 in partition 2 [the committed offset is marked within each partition of the Email topic on Broker 1]
This leads to three delivery methods, each with its own advantages: at most once, at least once, and exactly once.

1. At most once: Here, the consumer commits the offset as soon as it receives messages, before processing them. If it crashes before finishing, the uncommitted work is never redelivered, so each message is processed at most one time.

Figure 38: How at-most-once message processing works
2. At least once: In this case, the consumer will commit only after processing. Let's imagine that, for performance reasons, the consumer is reading three messages at once and committing once after processing all three messages. Let's say it processed two successfully but crashed before it was able to process the third one. In this case, the consumer (or a different consumer) can come back and request these messages from Kafka again. And because the messages were never committed, Kafka will send all three messages again.

Figure 39: How at-least-once processing works [on the first try, the consumer crashes before committing; on the second try, it receives the same messages again, leading to duplicate processing]
In the illustration above, assume that a consumer is processing three messages at a time and committing an offset after it has processed all three.

Here is how it works:

1. A consumer reads messages with offsets 0, 1, and 2.
2. It processes them.
3. It commits offset 2.
4. The next time it asks, it gets messages with offsets 3, 4, and 5.
5. Let's say it processes offsets 3 and 4 but crashes while processing offset 5.
6. A new consumer (or the same consumer) requests messages from Kafka.
7. Kafka will again return messages with offsets 3, 4, and 5.
8. Let's say this time all three are successfully processed. That's good, but it leads to duplicate processing of 3 and 4.

The way to mitigate this is to process the messages in a way that's idempotent. This means that even if you process a message multiple times, the end result won't change. For example, if you set the exact price of some product multiple times in a database, it won't matter. When building distributed applications, if you find that you cannot maintain idempotency when processing messages, you likely need to reconsider your logic to find a way to make it idempotent.

3. Exactly once: As the name suggests, it simply means that you figure out a way to ensure that a message is processed once and no more. For this, you typically need extra support (programming logic) to ensure and guarantee this, because there could be various reasons for duplicate processing. Kafka only provides this level of guarantee out of the box with Kafka-to-Kafka streams.

Now that we've seen how Kafka provides message consumption guarantees, let's take a look at how Redis handles them. But first, in order to do so, we need to delve into the Redis concept of the "pending list."

Remember that Redis Streams does not have a built-in mechanism for partitions. Multiple consumers that are part of the same consumer group can all connect to a single stream and yet still process messages concurrently within that stream. To ensure that these consumers don't process duplicate messages, Redis Streams uses an additional data structure called the "pending list" to keep track of the messages that are currently being processed by one of the consumers.
Looking at Figure 40, "emailService1" has asked for two messages and "emailService2" has asked for one. After the messages have been received, Redis Streams puts a copy (or a pointer) of them in a separate list for each consumer. So "12-0" and "12-1" are added to the list for "emailService1", and "12-2" is added to the list for "emailService2".

1. The "last_delivered_id" ensures that only unread messages are delivered to future requests from consumers of that same group. This is kind of like the "offset commits" in Kafka.
2. The pending lists allow consumers, should they temporarily die during processing (that is, before acknowledgement), to pick up where they left off.
3. The pending lists also allow other consumers to claim pending messages (using XCLAIM) in case the death of the original consumer proves to be permanent.

Figure 40: How pending lists work in Redis Streams
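A sketch of inspecting and claiming pending entries (the ids follow the figure):

//summary of the group's pending messages
XPENDING Email EmailApplnGroup
//another consumer claims message 12-0 if it has been idle for over 60 seconds
XCLAIM Email EmailApplnGroup emailService3 60000 12-0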
Now, let's imagine that "emailService2" has completed its processing (Figure 41) and acknowledges this. Redis Streams responds by removing the processed items from the pending list.

Figure 41: Following the acknowledgment from emailService2, the message "12-2" is removed from the pending list
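The acknowledgment itself is a single command; a sketch:

XACK Email EmailApplnGroup 12-2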
3. Let's consume a message as the "emailService2" consumer that's part of the "EmailApplnGroup", from the "Email" stream.
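Presumably along the lines of:

XREADGROUP GROUP EmailApplnGroup emailService2 COUNT 1 STREAMS Email >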
As you can imagine, with Redis Streams you can easily apply the same delivery approaches used with Kafka. Let's take a look.

1. At most once: In this case, you send an acknowledgement when the messages have been received but before they've been processed. Using Figure 42 for reference, let's imagine that "emailService2" acknowledges before fully processing the message in order to quickly consume more messages, and that losing some message processing doesn't matter. In this case, if the consumer crashes after acknowledgement but before processing the message, then that message's processing would be lost. Note that the message is still in the stream, so you can potentially reprocess it, although you'll never know whether you'll need to or not.

Figure 42: "At most once" processing in Redis Streams
2. At least once: Again, this is very similar to Kafka. A message is only acknowledged after it's been processed. Here, if a consumer acknowledges after processing multiple messages, and it crashes during the processing of one of those messages, then you'll end up processing some messages more than once.

Figure 43: "At-least once" processing in Redis Streams

In the example above, we have a consumer group called "Email Application" with two consumers ("emailService1" and "emailService2").

1. "emailService1" reads the first two messages, while "emailService2" reads the third message at the same time.
2. The pending list of emailService1 stores the first two messages, and similarly, the pending list of emailService2 stores the third message.
3. "emailService1" starts to process both messages (and hasn't committed yet). However, let's say it temporarily crashes after processing the first message but before processing the second message.
4. When "emailService1" comes back later and reads from the pending list, it will again see both messages in that list.
5. As a result, it will process both messages.
6. Because of step 5, the consumer ends up processing the first message twice.

And this is why it's called "at least once." Although ideally all pending messages will be processed in one pass, it may require more. A sketch of the recovery read in step 4 follows.
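Passing "0" instead of ">" makes XREADGROUP return the consumer's own pending entries rather than new messages; a sketch:

//after a restart, re-read this consumer's unacknowledged entries
XREADGROUP GROUP EmailApplnGroup emailService1 STREAMS Email 0
//process them, then acknowledge
XACK Email EmailApplnGroup 12-0 12-1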
3. Exactly once: In Redis Streams, you have multiple ways of ensuring that each message is processed exactly one time.

Figure 44: "Exactly once" (Option 1) processing in Redis Streams (done by processing one message at a time)
Option 2: As an alternative to Option 1, you can also use additional data structures, such as Redis Sets, to keep track of the messages that have already been processed by some consumer. This way, you can check the set and make sure the message's id is not already a member of the set before you process it.

Figure 45: "Exactly once" (Option 2) processing in Redis Streams (using a set data structure to keep track of the messages that have already been processed) [during a second round of processing, the consumer checks the set before reprocessing a message such as 12-1]
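A sketch of the set-based check (the set's key name is illustrative):

//returns 1 if this message id was already processed, 0 otherwise
SISMEMBER Email:processed 12-1
//after successfully processing the message, record it and acknowledge
SADD Email:processed 12-1
XACK Email EmailApplnGroup 12-1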
[Figure: a Kafka cluster: the Email and Payment topics and their partitions are spread across multiple brokers]
Redis Clusters

With Redis, things work pretty much the same way. In the example below, the Redis cluster has three nodes. The messages for "Email" are sent to three different streams that are on three different nodes. The messages for "Payment" are sent to two different streams that are on two different nodes.

The only caveat, if you're using the OSS cluster, is that you don't have a proxy for these cluster nodes. That means your client libraries will need to manage where the data goes by directly connecting to each node within the cluster. But thankfully, the cluster APIs make it very easy, and most of the Redis client libraries in all programming languages already support them.

Figure 47: A Redis OSS cluster with three brokers (servers), two topics (Email and Payment), and five streams [streams such as "Email:P1" and "Payment:P0" are spread across the nodes]
On the other hand, Redis Enterprise provides a proxy layer on top of the clusters. This way, the client can just connect to the proxy and doesn't have to worry about exactly which server the data is going to or coming from.

By the way, in Redis clusters, if the key contains curly braces ("{}"), then only the text within those curly braces will be hashed. This means you can name the keys "Email:{P0}" and "Payment:{P1}".

Figure 47a: A Redis Enterprise cluster with three brokers (servers), two topics (Email and Payment), and five streams [clients connect through a proxy; streams such as "Email:{P1}" are spread across the nodes]
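You can verify which hash slot a key maps to with CLUSTER KEYSLOT; a sketch:

CLUSTER KEYSLOT Email:{P0}
CLUSTER KEYSLOT Payment:{P0}
//both return the same slot number, because only "P0" is hashed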
By the way, in Redis you can run multiple Redis instances on the same node. These are called shards.

Figure 48 illustrates how the Redis cluster helps scale Redis. Here is how it works.

1. Let's say you are running four Redis instances (four shards) on a single node, and imagine you have split the data across these four instances. This is mainly for parallel processing and to fully utilize all the CPU and memory resources.
2. Now, let's say you want to move to two machines, that is, you want to "scale out." So you add a second machine. At this point you have scaled out, and this node is now part of the cluster. But this new node is empty to begin with. It contains no Redis instances or data.
3. Next, let's say you move two shards to the second node in order to better distribute the load. This is called "rebalancing."
4. Finally, in order to increase parallel processing and to fully utilize all the CPUs, you may add more Redis instances and split the data across those instances. This is called "resharding." So let's say you've added two more instances/shards on each node. In the beginning, these new instances won't have any data, so you need to use the resharding process to split and move some of the existing data. In the end, you wind up with a total of eight shards and much higher throughput.

Figure 48: Using Redis Enterprise to increase your throughput by scaling out, rebalancing, and resharding your data
Conclusion

Hopefully this e-book has provided you with a solid foundation for understanding both Kafka and Redis Streams. It's important to note that Kafka is a very robust streaming platform geared towards highly complex, distributed applications with very specific requirements. In contrast, Redis Streams is a great way to add streaming to an existing application that is already using Redis. Redis Streams has much lower management overhead, and if you are already using Redis for, say, caching, then you can implement Redis Streams without setting up and maintaining a separate system.

If you're interested in learning more and taking this further, check out the free Redis Streams course (RU202) offered at Redis University.