
Understanding Streams in Redis and Kafka

A Visual Guide

© 2022 Redis

Table of Contents

PART 1. Introducing the concept of streams
  What are streams?
  How streams are related to events
  How streams compare to buffering
    Processing using just buffers
    Processing using streams
  The challenges of stream processing
  Specialized stream processing systems

PART 2. Comparing the approaches of Kafka and Redis to handling streams
  How messages (event data) are stored
  Creating streams
  Adding messages
  Consuming messages
    With Kafka
    With Redis Streams
  Approaches to scaling consumption
    Single partition and multiple consumers
    Multiple partitions and multiple consumers
    In-order and in-parallel message processing
  The role of consumer groups in Redis Streams
  How messages are acknowledged
    Letting the producer know that the message has been delivered
      With Kafka
      With Redis
      Summary: The two approaches compared
    Letting the consumer know that the message has been received
      The role of offsets in Kafka's consumption acknowledgements
      The role of "pending lists" in Redis' consumption acknowledgements
  The role of clusters in Kafka and Redis
    Kafka Clusters
    Redis Clusters


Part 1: Introducing the concept of streams


Streams are incredibly useful but can be a little confusing to describe. Part of this is due to the fact that they can be
explained in at least two distinct ways and that each description provides only half of the picture. Backend developers
(NodeJS and Java) see streams (in contrast to buffers) as a memory-efficient way of processing large amounts of data.
The big data folks take a different perspective. They view streams as a way of dealing with data that arrives over time or
as a means of decoupling the producers and consumers of that data.

It's like that old story where blind people are asked to describe an elephant. The first one touches the elephant's leg and says the elephant feels like a tree. The second one touches the elephant's trunk and concludes that the elephant feels like a snake, and so on.

Figure 1. Image credit: https://stock.adobe.com/search?k=blind+men+and+elephant&asset_id=460307967


Even when you think you have a firm understanding of it, stream processing can still be a very complex topic. In fact, it's difficult to maintain a good mental model of streaming unless you really understand some stream processing systems.

The goal of this e-book is to help you build that mental model. I'll use text, code snippets, and more than 50 illustrations to explain:

1. How to think about streams and connect the dots between different perspectives so you get a bigger picture
2. Some of the challenges of handling streams
3. How stream processing systems such as Redis Streams and Kafka work. We are using these two systems as examples in the hope that you'll gain a more thorough understanding as opposed to learning just how one system handles the processing.

Even though we'll be covering deep and complex topics, thanks to the format and the illustrations, it should be an easy and fun read overall.

By the end of this, you should have:

• An expert-level theoretical understanding of streams, the challenges of stream processing, and how two stream processing systems (Kafka and Redis Streams) work
• Enough knowledge to do a proof-of-concept of Redis Streams or Kafka and to determine which one is best suited for you
• Enough theoretical knowledge to get a head start on certification for either Redis or Kafka

OK, let's get started.


What are streams?


In computer science, a stream is a sequence of data elements (i.e., series of strings, JSON, binary, raw bytes) that are made available for processing in small chunks over time. As with the contents of a text file, this data may be finite, but even then it will be processed in pieces, one word or one line at a time, in a sequence (word after word, or line after line) until all that data has been processed.

Figure 2: Processing a byte stream one byte at a time

In other cases, the data might be infinite and might never end. For example, say you are processing data in a chat messenger server. You'll only get a chat message to process when someone writes one. And it can happen any time and may continue for as long as people keep chatting.

Figure 3: Processing a JSON stream one JSON document at a time


This data can be internal as well as external. It doesn't have to come from the outside world. It could originate from different systems sending messages to each other. For example, a webserver, after receiving payment information, might use a JSON message to tell an email server to send an email. That is machine-to-machine communication. You can also think of these messages as coming in the form of streams because they can come in small pieces and can come over time and at any point in time.

Figure 4: Streams of messages sent using machine-to-machine communication


How streams are related to events

An event is simply a mechanism, a trigger that is activated when something has occurred. For example, when someone buys a product, that triggers an event that leads to the creation of a JSON message that contains the person's information, payment amount, product info, and so on. This usually originates at the browser or mobile app, and then the message is sent to the server. Here, the event is the act of buying the product, indicating something occurred. And since the buying event can happen at any time, the resulting data (typically JSON) representing that event flows into the system as a stream.

Figure 5: How events generate streams of data


How streams compare to buffering

Processing using just buffers

If you ask backend engineers who work in Java, NodeJS, and other programming languages, they'll tell you that streams are more efficient than buffers for processing chunks of data. They come from the perspective of processing large data inside an application server. An example should help us to understand their perspective a little better.

Say you have a 10 GB file containing hundreds of typos that say "breams" instead of "streams." Let's look at how to use buffers to replace "breams" with "streams," and then we'll see how the process would work using streams.

Figure 6: How buffers are used to process data

Here is how it works:
1. You first read the entire 10 GB file into RAM (it can be slow to load all that data).
2. You then send this data to a data processor that will fix all the typos from "breams" to "streams."
3. Once the data processor finishes processing, the new data will be stored back in RAM (so you may need an additional 10 GB of memory).
4. After all the processing is done, you write the entire file into a new file.

As you can see, this process not only tends to be slow, but it can also take up a lot of memory.


Processing using streams

A better approach is to read the data as a stream. Here the data is transferred in bytes, and you figure out how to group these bytes into tiny chunks, then process each chunk.

Figure 7: How streams are used to process data (basic "stream processing")

Here is how it works:
1. Data comes in bytes; that is, one byte at a time.
2. The producer assembles those bytes into a chunk that you've specified. For example, if you've decided to process the file a line at a time, it keeps appending bytes until it spots a newline character that signals the end of this particular chunk. This chunk is now ready to pass on to the consumer.
3. The consumer processes the line, looks for the existence of the typo and, if it finds one, replaces "breams" with "streams."
4. The processed chunk is then written as a stream to the new file.
5. The whole process is repeated until the end-of-file character is detected. At that point, the process is complete, and the stream is closed.

As you can probably see, compared to buffering, streaming has some clear benefits. It's faster, more efficient, and places significantly less of a burden on memory. Although both streaming and buffering require a buffer, in the case of buffering, that buffer must be large enough to contain the entire file or message. With streaming, the buffer only needs to be large enough to accommodate the size of a specified chunk. Moreover, once the current chunk has been processed, the buffer can be cleared and then used to accommodate the next chunk. As a result, regardless of the size of the file, the buffer consumes only 50-100 bytes of memory at a time. Second, because the entire file doesn't need to be loaded into RAM first, the process can begin right away.

Now that you've seen how backend engineers view streams, let's look at streams through the eyes of big data engineers. But first, in order to do so, we need to better understand some of the challenges of stream processing.
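As a side note (not from the original example, but a handy illustration), classic Unix tools already process data exactly this way. A minimal sketch of the typo fix as a line-by-line stream, with the file name simplified to be shell-friendly:

# sed reads MyProject.txt one line at a time, fixes the typo in that line,
# writes the line out, and reuses its small buffer for the next line.
# The 10 GB file is never fully loaded into memory.
sed 's/breams/streams/g' MyProject.txt > NewMyProject.txt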


The challenges of stream processing

Although streams can be a very efficient way of processing huge volumes of data, they come with their own set of challenges. Let's take a look at a few of them.

1. What happens if the consumer is unable to process the chunks as quickly as the producer creates them? Taking our current example, what if the consumer is 50% slower than the producer? If we're starting out with a 10 GB file, that means by the time the producer has processed all 10 GB, the consumer would only have processed 5 GB. What happens to the remaining 5 GB while it's waiting to be processed? Suddenly, the 50-100 bytes allocated for data that still needs to be processed would have to be expanded to 5 GB.

Figure 8: If the consumer is slower than the producer, you'll need additional memory.


2. And that’s just one nightmare scenario. There are others. For example, what happens if the consumer suddenly dies
while it’s processing a line? You’d need a way of keeping track of the line that was being processed and a mechanism
that would allow you to reread that line and all the lines that follow.

Figure 9: When the consumer fails



3. Finally, what happens if you need to be able to process different events and send them to different consumers? And, to add an extra level of complexity, what if you have interdependent processing, when the process of one consumer depends on the actions of another? There's a real risk that you'll wind up with a complex, tightly coupled, monolithic system that's very hard to manage. This is because these requirements will keep changing as you keep adding and removing different producers and consumers.

For example (Figure 10), let's assume we have a large retail shop with thousands of servers that support shopping through web apps and mobile apps.

Imagine that we are processing three types of data related to payments, inventory, and webserver logs and that each has a corresponding consumer: a "payment processor," an "inventory processor," and a "webserver events processor." In addition, there is an important interdependency between two of the consumers. Before you can process the inventory, you need to verify payment first. Finally, each type of data has different destinations. If it's a payment event, you send the output to all the systems, such as the database, email system, CRM, and so on. If it's a webserver event, then you send it just to the database. If it's an inventory event, you send it to the database and the CRM.

As you can imagine, this can quickly become quite complicated and messy. And that's not even including the slow consumers and fault-tolerance issues that we'll need to deal with for each consumer.


Figure 10: The challenge of tight coupling because of multiple producers and consumers



Of course, all of this assumes that you're dealing with a monolithic architecture, that you have a single server receiving and processing all the events. How would you deal with a "microservices architecture"? In this case, numerous small servers (that is, microservices) would be processing the events, and they would all need to be able to talk to each other. Suddenly, you don't just have multiple producers and consumers. You have them spread out over multiple servers.

A key benefit of microservices is that they solve the problem of scaling specific services depending on changing needs. Unfortunately, although microservices solve some problems, they leave others unaddressed. We still have tight coupling between our producers and consumers, and we retain the dependency between the inventory microservices and the payment ones. Finally, the problems we pinpointed in our original streaming example remain problems.

1. We haven't figured out what to do when a consumer crashes.
2. We haven't come up with a method for managing slow consumers that doesn't force us to vastly inflate the size of the buffer.
3. We don't yet have a way to ensure that our data isn't lost.

These are just some of the main challenges. Let's take a look at how to address them.


Figure 11: The challenges of tight coupling in the microservices world



Specialized stream processing systems

As we've seen, streams can be great for processing large amounts of data but also introduce a set of challenges. New specialized systems such as Apache Kafka and Redis Streams were introduced to solve these challenges. In the world of Kafka and Redis Streams, servers no longer lie at the center; the streams do, and everything else revolves around them.

Data engineers and data architects frequently share this stream-centered worldview. Perhaps it's not surprising that when streams become the center of the world, everything is streamlined.

Figure 12 illustrates a direct mapping of the tightly coupled example you saw earlier. Let's see how it works at a high level.

Note: We'll go into the details later in the context of Redis Streams and Kafka to give you an in-depth understanding of the following:

1. Here the streams and the data (events) are first-class citizens, as opposed to the systems that are processing them.
2. Any system that is interested in sending data (producer), receiving data (consumer), or both sending and receiving data (producer + consumer) connects to the stream processing system.
3. Because producers and consumers are decoupled, you can add additional consumers or producers at will. You can listen to any event you want. This makes it perfect for microservices architectures.
4. If the consumer is slow, you can increase consumption by adding more consumers.
5. If one consumer is dependent on another, you can simply listen to the output stream of that consumer and then do your processing. For example, in Figure 11, the inventory service is receiving events from both the inventory stream (purple) and also the output of the payment processing stream (orange) before it processes the inventory event. This is how you solve the interdependency problems.
6. The data in the streams is persistent (as in a database). Any system can access any data at any time. If for some reason data wasn't processed, you can reprocess it.

A number of streaming challenges that once seemed formidable, even insurmountable, can readily be solved just by putting streams at the center of the world. This is why more and more people are using Kafka and Redis Streams in their data layer.

This is also why data engineers view streams as the center of the world.

Now that we understand what streams, events, and stream processing systems are, let's take a look at Redis Streams and Kafka to understand stream processing and how they solve various challenges. By the end of this, you should be an expert, at least in the theoretical aspects of stream processing, to the extent that you can readily do a proof-of-concept for each system or easily earn Kafka or Redis certification.


Figure 12: When we make streams the center of the world, everything becomes streamlined.



Part 2: Comparing the approaches of Kafka and Redis to handling streams

Apache Kafka is open source (Apache License 2.0, written in Scala) and a leading distributed streaming platform. It's a very feature-rich stream processing system. Kafka also comes with additional ecosystem services such as ksqlDB and Kafka Connect to provide for more comprehensive capabilities.

Redis is an open-source (BSD3, written in C), in-memory database, considered to be the fastest and most loved database. It's also the leading database on AWS. Redis Streams is just one of the capabilities of Redis. With Redis, you'll get a multi-model, multi-data structure database with 6 modules and more than 10 data structures.

So the key thing to remember is that, when you are thinking about Kafka and Redis Streams, you should really think of Kafka and Redis (not just Redis Streams).


How messages (event data) are stored

Although their storage is similar, Kafka and Redis Streams have different ways of identifying each message. In Kafka, each message is given a sequence number that starts with 0. But each message can only be partly identified by its sequence number. That's because of another concept called a "partition" that we'll get into later.

In Redis Streams, each message by default gets a timestamp as well as a sequence number, in the form <millisecondsTime>-<sequenceNumber>. The sequence number is provided to accommodate messages that arrive at the exact same millisecond. So if two messages arrived at the exact same millisecond (1518951480106), their ids would look like 1518951480106-0 and 1518951480106-1.

Figure 13: How messages look in Kafka and Redis Streams


Creating streams

In Kafka, you create what's called a "topic." You can think of this as the name of the stream. However, in Kafka, you also need to understand four key concepts.

1. Partition: You can think of it as a file on the disk.
2. Broker: You can think of it as the actual server.
3. Replication Factor: The number of duplicate copies of the messages you want to keep.
4. ZooKeeper: This is an additional system that you need to use in order to manage Kafka.

We'll get into all these in a bit, but for now let's assume you have one partition, one broker, and one replication factor.

Figure 14: How messages look in Kafka for topic Email with one broker, one partition, and one replication factor

Note: The command to create a Kafka topic with one partition and one replication factor would look like this:

> kafka-topics --zookeeper 127.0.0.1:2181 --topic Email --create --partitions 1 --replication-factor 1

Important: Kafka no longer requires ZooKeeper as of Kafka 3.3.1. Since 3.3.1, Kafka uses Kafka Raft (KRaft), which is built-in. Instead of "--zookeeper" you now use "--bootstrap-server".
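For instance, on a KRaft-based Kafka, the equivalent topic-creation command would look like this (assuming a broker listening on 127.0.0.1:9092, the address used in the consumer examples later in this e-book):

> kafka-topics --bootstrap-server 127.0.0.1:9092 --topic Email --create --partitions 1 --replication-factor 1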


Note: The example below (Figure 14a) shows how these would look in a Kafka cluster. We’ll discuss that later, but for now
just imagine that there is only one broker.

Figure 14a: A Kafka cluster with three brokers (servers) and two topics (Email and Payment), where the Email topic has three partitions that are spread across three brokers (10, 11, and 12) and the Payment topic has two partitions that are spread across two brokers (11 and 12).



In Redis, you simply create a stream and give it a key. Note that all the data within this stream is part of this single key ("Email"). Note also that this key and its stream just reside along with other keys and data structures. Redis allows for a number of data structures. A stream is just one of them. (See Figure 15.)

Figure 15: How messages look in Redis for an Email stream

The command to create a Redis stream would look like this:

XADD Email * email_subject "1st email" email_body "hello world"

If the Email stream already exists, it will append the message. If it doesn't exist, Redis will automatically create a stream (using "Email" as its key) and then append the first message. The asterisk will auto-generate the message id (timestamp-sequence) for this message.
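Note (standard Redis behavior, shown here for illustration): XADD replies with the id it just generated, so running the command above would print something like the sample id from Figure 13:

> XADD Email * email_subject "1st email" email_body "hello world"
"1518951480106-0"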


Adding messages

Kafka has a concept called "producers." These are responsible for sending messages. They can also send messages with some options such as acknowledgments, serialization format, and so on.

In the following command, you are using a Kafka producer CLI tool to send three messages to the Email topic.

$ kafka-console-producer --broker-list 127.0.0.1:9092 --topic Email
> my first email
> my second email
> my third email

In Redis Streams, use the XADD command to send the data in a hash to the Email key.

XADD Email * subject "my first email"
XADD Email * subject "my second email"
XADD Email * subject "my third email"

In Redis Streams, you can set up acknowledgments and many other things as part of the Redis server or Redis cluster settings. Remember that these settings will get applied to the entire Redis instance and not just the Redis Streams data structure.


Consuming messages

Both Kafka and Redis Streams have the concepts of consumers and consumer groups. We'll cover just the basics first.

With Kafka

In Kafka, the following command reads all the messages in the Email topic. The "bootstrap-server" is the main Kafka server. The "--from-beginning" flag tells Kafka to send all the data from the beginning. If we don't provide this flag, the consumer will only retrieve messages that arrive after it has connected to Kafka and started to listen.

$ kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --topic Email --from-beginning

Response:
> my first email
> my second email
> my third email

Note: The above consumer client will continue to wait for new messages in a blocking fashion and will display them when they arrive.


With Redis Streams

In Redis Streams, you have two main options:

1. Consume messages by using XREAD (equivalent to Kafka’s command). In the command below, “BLOCK 0”
tells the Redis CLI to maintain the connection forever (0) in a blocking manner. “Email 0” after the keyword
“STREAMS” means to get messages from the “Email” stream and from the beginning of time.

XREAD BLOCK 0 STREAMS Email 0

Response:
1) 1) 1518951480106-0
2) 1) “subject”
2) “my first email”
2) 1) 1518951482479-0
2) 1) “subject”
2) “my second email”
3) 1) 1518951482480-0
2) 1) “subject”
2) “my third email”

Notes:
• If you use “Email $”, then it would get only new messages from the “Email” stream. That is, “XREAD BLOCK 0 STREAMS
Email $”
• You can use any other timestamp id after the stream name to get messages after that timestamp id. That is, “XREAD
BLOCK 0 STREAMS Email 1518951482479-0”


2. Consume messages by giving a range of ids or by using some special commands.

a. You can use the command XRANGE and get everything from the smallest ("-") timestamp to the latest one ("+").

> XRANGE Email - +

Response:
1) 1) 1518951480106-0
2) 1) “subject”
2) “my first email”
2) 1) 1518951482479-0
2) 1) “subject”
2) “my second email”
3) 1) 1518951482480-0
2) 1) “subject”
2) “my third email”

b. You can also provide timestamps directly.

> XRANGE Email 1518951482479 1518951482480

Response:
1) 1) 1518951482479-0
2) 1) “subject”
2) “my second email”
2) 1) 1518951482480-0
2) 1) “subject”
2) “my third email”


c. You can limit the result by specifying a count.

> XRANGE Email - + COUNT 1

Response:
1) 1) 1518951480106-0
2) 1) “subject”
2) “my first email”

d. By prefixing the last id with a “(“, you can pick up where you left off, starting with the messages that
immediately followed the one with that id and keeping the “+” for the ending point. In the example below,
we are retrieving two messages that come after a message with a “1518951480106-0” id.

> XRANGE Email (1518951480106-0 + COUNT 2

Response:
1) 1) 1518951482479-0
2) 1) “subject”
2) “my second email”
2) 1) 1518951482480-0
2) 1) “subject”
2) “my third email”
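e. Relatedly (not covered above, but standard Redis): if you want the newest messages first, XREVRANGE walks the same range in reverse, from the latest ("+") to the smallest ("-"). For example, to fetch just the latest message:

> XREVRANGE Email + - COUNT 1

Response:
1) 1) 1518951482480-0
2) 1) "subject"
2) "my third email"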


Approaches to scaling consumption

You just saw the basics of producers and consumers in both Kafka and Redis. Now, let's dig in and see how these streaming services scale consumption.

Single partition and multiple consumers

Scenario: Let's imagine you have three emails that need to be processed in no particular order by three email processors (consumers) so you can get the job done in one-third the time.

In Kafka, let's say you connected all three consumers to the Email topic. Then all three messages are sent to all three consumers. So you end up processing duplicate messages. This is called a "fan out."

Figure 16: A "fan out" in Kafka when multiple consumers connect to a single topic

Note: Although fan out doesn't work for this scenario, it works fine in chat messenger clients, where you can connect multiple users to the same topic and they all receive all chat messages.


It works exactly like that in Redis Streams as well.

Figure 17: A "fan out" in Redis Streams


Multiple partitions and multiple consumers

In Kafka, there is a concept called a partition. You can think of a partition as a physical file on the disk. Partitions are used for scaling purposes. However, you should use them carefully, either with "keys" or with "consumer groups." We'll talk about both of them in a bit. But just know that consumers generally don't care about and are not aware of the partitions. They just subscribe to a "Topic" (the higher-level abstraction) and consume whatever Kafka sends them.

We are going to cover multiple cases of just using multiple partitions and multiple consumers, and it may look odd at first.

Case 1: An equal number of partitions and consumers (three each)

In the example below, we have created three partitions for the “Email” topic using the following command:

> kafka-topics --zookeeper 127.0.0.1:2181 --topic Email --create --partitions 3 --replication-factor 1

Now when we add three messages to the topic, they are automatically distributed to each of the partitions using a
hashing algorithm. So each partition gets just one message each in our example. But when consumers connect to this
topic (they are not aware of the partitions), all the messages that are in each partition are sent to each consumer in a
fan-out fashion.
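If you want to double-check the layout, you can describe the topic (a sketch; the exact output columns vary by Kafka version):

> kafka-topics --zookeeper 127.0.0.1:2181 --topic Email --describe
Topic: Email  PartitionCount: 3  ReplicationFactor: 1 ...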


Figure 18: A “fan out” when there are multiple partitions (default behavior)


Notes:
• Message order: The order in which consumers receive messages is not guaranteed. For example, "Email Service 1" might receive "message 1", "message 3" and finally "message 2". Whereas "Email Service 2" might get them in the following order: "message 3", "message 1" and "message 2". This is because message order is only maintained within a single partition.
• Later, we'll learn more about the ordering and how to use keys and consumer groups to alter this default behavior.


Case 2: More partitions (three) and fewer consumers (two)

It still works the same. Each consumer gets all the messages irrespective of partitions. Message order is not guaranteed.

Figure 19: A “fan out” when there are more partitions than consumers



Case 3: Multiple but fewer partitions (three) and more consumers (four)

It still works the same. Each consumer receives all the messages irrespective of partitions. Message order is still random.

Figure 20: A “fan out” when there are fewer partitions than consumers



In Redis Streams, there is no such concept as partitions. If you are using a standalone Redis server, you don't need to worry about partitioning. If you do want to distribute messages in the same stream across several servers, then you should use a combination of multiple stream keys and a sharding system like Redis Cluster, or some other application-specific sharding system.

Let’s look at how you might implement something resembling partitions in Redis.

You can create “partitions” by creating multiple streams and then distributing data yourself. And on the consumer side,
unlike Kafka, since you have direct access to each of these streams, you can consume the data in a fan-out fashion by
connecting to all the streams, or by using a key or keys to connect to specific streams.

Say you created three streams: “Email:P0”, “Email:P1”, and “Email:P2”. And say you want to distribute the incoming
messages in a round-robin fashion. And finally you want to consume data in a “fan-out” fashion and also in a “per-
stream” fashion.


Consuming from Redis Stream “partitions” in a “fan out” fashion (Figure 21)

To consume the data in a “fan out” fashion, simply listen to all the streams (Figure 21).
//Consumer 1
XREAD BLOCK 0 STREAMS Email:P0 0 Email:P1 0 Email:P2 0
//Consumer 2
XREAD BLOCK 0 STREAMS Email:P0 0 Email:P1 0 Email:P2 0
//Consumer 3
XREAD BLOCK 0 STREAMS Email:P0 0 Email:P1 0 Email:P2 0

Notes:
• BLOCK 0 = Wait indefinitely for messages.
• ”Email:P0 0” = Read all messages from the beginning (0-0).
• By providing multiple stream names and “0”s each consumer can receive all messages.

Figure 21: How to implement “partitions” in Redis streams and consume messages in a “fan out” manner



Consuming from Redis Stream "partitions" in a per-stream fashion (Figure 22)

To consume the data in a "per-stream" fashion, simply listen to the stream of your choice. Here message orders are preserved (Figure 22).

//Consumer 1
XREAD BLOCK 0 STREAMS Email:P0 0
//Consumer 2
XREAD BLOCK 0 STREAMS Email:P1 0
//Consumer 3
XREAD BLOCK 0 STREAMS Email:P2 0

To implement a round-robin, you can keep a counter in Redis, say checkout_counter, and increment it (INCR checkout_counter) every time you send a new message to a stream. Then use a modulus of the checkout_counter (checkout_counter % number of streams) to determine which stream you should send the next message to.

The following command creates a "check_out:p0" Redis stream.

XADD check_out:p0 * message 0 cartId 1 items 5 cost $100
INCR checkout_counter //Use this for round-robin

Figure 22: How to implement "partitions" in Redis Streams and consume them in a "per-stream" manner
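For instance, a single dispatch decision might look like this (the counter value and message fields here are made up for illustration):

INCR checkout_counter //Suppose the reply is 7
//7 % 3 = 1, so the next message goes to the check_out:p1 stream
XADD check_out:p1 * message 7 cartId 42 items 2 cost $40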


In-order and in-parallel message processing

Tasks can be handled in order, in parallel, or with a combination of both. As the name implies, with in-order processing, tasks are handled in a specific order. For example, when processing a credit card, we need to first check for validity of the card, then do fraud detection, and then check for a balance. With in-parallel processing, tasks are handled simultaneously so they can be completed more quickly. With in-order and in-parallel processing, the system splits the tasks into groups of tasks that need to be handled in order and then assigns those to different consumers that can perform those ordered tasks in parallel. Kafka and Redis Streams handle this process a little differently. How they differ will become clearer when you look at each system's implementation.

How Kafka handles it

In Kafka, you can send metadata called a "key" (aka "message key") along with the message. When you do that, those messages with the same key will end up in the same partition. This helps in message ordering. Message keys are also useful in other things such as log compaction, but we'll not cover that here.

Secondly, Kafka uses the concept of "consumer groups," where you define a bunch of individual consumers as part of the same consumer group. Kafka will then ensure that messages are distributed across different consumers that are part of that group. This helps in scaling consumption and also avoids "fan out," so each message is read by only one consumer. Another key aspect of consumer groups is that, assuming the number of consumers is greater than or equal to the number of partitions, each consumer in a group is tied to a single partition and is allowed to read messages from just that partition. It cannot read messages from multiple partitions. This way, when you combine message keys and consumer groups, you'll wind up with highly distributed consumption, although order is still not guaranteed in the event of a consumer failure.

Let's look at an example to make it clear.


Referring to Figure 23, let's say you are processing emails for an e-commerce store. You need to send the following emails and in the following order:

1. Payment Received
2. Product Shipped
3. Product Delivered

In this case, to make sure they are sent in that order, we can use the order id ("order1234") as the key we send to Kafka to ensure that all the messages end up in the same partition.

Figure 23: How consumer groups and message keys work in Kafka


And secondly, we can define a consumer group "Email Application" and designate four consumers as part of that group. Kafka will then internally connect each consumer to one partition (1:1 mapping).

• If there are more consumers than partitions, the additional consumers will be kept idle and won't receive any messages. So, in our example, the fourth consumer will be left idle and won't receive any messages.
• If there are fewer consumers than partitions, then some of the consumers will receive data from multiple partitions. However, there will still be only one consumer (that's part of the group) per partition.

Let's see how this actually looks in the CLI.

1. Create a producer to send messages to the "Email" topic with a key. Use the "=" sign to separate the "key" (e.g., orderid1234) and the "value" (the actual message content).

$ kafka-console-producer.sh --broker-list localhost:9092 --topic Email --property "parse.key=true" --property "key.separator=="

2. Then, send three messages with different statuses: "payment_received", "product_shipped" and "product_delivered".

> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "payment_received"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "product_shipped"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "product_delivered"}

3. Create four consumers within the consumer group "EmailApp" and connect them to the "Email" topic (in four different CLI windows).

kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --group EmailApp --topic Email
kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --group EmailApp --topic Email
kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --group EmailApp --topic Email
kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --group EmailApp --topic Email

4. Here's how just one of the consumers will receive all three messages.

> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "payment_received"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "product_shipped"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "product_delivered"}
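By default, the console consumer prints only the message values. If you also want to see each message's key, you can ask it to print them (print.key is a standard console-consumer formatter property):

kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --group EmailApp --topic Email --property print.key=true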


How Redis Streams handle it

Although, like Kafka, Redis Streams has a concept of "consumer groups," it operates differently. In fact, you don't need it for this specific use case. We'll learn in the next section how Redis uses consumer groups, but for now let's see how in-order and in-parallel message processing works in Redis.

Creating streams in Redis is cheap. You simply define multiple streams (essentially simulating "partitions") based on some simple hash algorithm and then send the messages to those different streams. Once you do, you will be able to use different consumers to consume those messages in order and in parallel.

Let's look at an example.

Let's say someone purchases three different products. For each product, you have "payment_received", "product_shipped", and "product_delivered" messages (for a total of nine), and you want to process them in order but also in parallel.

In the example below (Figure 24), yellow, purple, and pink represent three products. Each product has three messages representing its different states. As you can see, if you want to process three messages at a time, simply create three streams and send each product's data into a specific stream based on the product id or some unique identifier. This is similar to "keys" in Kafka. After that, connect each consumer to each stream (i.e., 1:1 mapping). This is also similar to Kafka. Then you'll get both parallel processing and in-order message processing at the same time. As we already mentioned, unlike with Kafka, with Redis you don't really need consumer groups in this case.

Figure 24: Using Redis Streams to process multiple messages in parallel and in order


Let's see how it looks in the CLI.

1. Use a key to keep track of the number of emails so we can use it for round-robin.

INCR email_counter

2. Hash a unique id like the order id and take a modulus of the number of streams to determine which stream (Email:P0, Email:P1, or Email:P2) to send the data to.

var streamName = "Email:P" + (murmurHash("order1234") % 3)

All we need to do is convert a string like the orderId into a number using a popular hash called the "murmur" hash and then take the mod of the number of streams.

Note: Kafka also uses the "murmur" hash for converting a string into a number. There are "murmur" libraries in every language, including NodeJS. A "murmur" hash, while not strictly necessary, is fast and sufficient given you do not require cryptographic security.

3. Send the messages to the appropriate stream. Notice that because of the hash we employed in the above steps, we'll have a 1:1 mapping between the order id and the stream name. So, for example, all the messages with order id order1234 will go to the "Email:P0" stream.

XADD Email:P0 * id order1234 name "light bulb" price "$1.00" status "payment_received"
XADD Email:P0 * id order1234 name "light bulb" price "$1.00" status "order_shipped"
XADD Email:P0 * id order1234 name "light bulb" price "$1.00" status "order_delivered"

XADD Email:P1 * id order2222 name "chair" price "$100.00" status "payment_received"
XADD Email:P1 * id order2222 name "chair" price "$100.00" status "order_shipped"
XADD Email:P1 * id order2222 name "chair" price "$100.00" status "order_delivered"

XADD Email:P2 * id order5555 name "Yoga book" price "$31.00" status "payment_received"
XADD Email:P2 * id order5555 name "Yoga book" price "$31.00" status "order_shipped"
XADD Email:P2 * id order5555 name "Yoga book" price "$31.00" status "order_delivered"


4. Here's how just one of the consumers will receive all three messages.

> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "payment_received"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "order_shipped"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "order_delivered"}


The role of consumer groups in Redis Streams

In Redis Streams, although there is no concept of "message keys" like in Kafka, you can still get message ordering without it. However, it does have a concept of "consumer groups," but again it works differently from Kafka. First, let's understand how consumer groups work in Redis Streams, and then later we'll see how Redis Streams handles message ordering.

In Redis Streams, you can connect multiple consumers that are part of the same consumer group to a single stream and do parallel processing without the need for partitions.

In the example below (Figure 25), we have created a consumer group called "Email Application" and have made three consumers part of that group. Each consumer is asking for one message each at the same time for concurrent processing. In this case, Redis Streams simply distributes unread (unconsumed) messages to each consumer.

Figure 25: Each member of the consumer group "Email Application" has concurrently requested one message and has been given one unread message

Note: Each consumer within the consumer group needs to identify itself with a name. In Figure 25, we have named the services that are part of the "Email Application" group as "emailService1", "emailService2", and "emailService3".


Let's see how it looks in the CLI.

1. Create a stream (Email) and a consumer group (Email Application, "EmailApplnGroup"), and set it to read all messages from the beginning ("0"). Note: If you use "$" instead of "0", then it will send only new messages. Also, if you provide any other id, then it will start reading from that id. Note: MKSTREAM is used to make a new stream if the stream doesn't already exist.

XGROUP CREATE Email EmailApplnGroup 0 MKSTREAM

2. Add three messages to the stream.

XADD Email * subject "1st email" body "Hello world"
XADD Email * subject "2nd email" body "Hello world"
XADD Email * subject "3rd email" body "Hello world"

3. Let's consume messages using the three consumers, each asking concurrently for one email.

XREADGROUP GROUP EmailApplnGroup emailService1 COUNT 1 STREAMS Email >
XREADGROUP GROUP EmailApplnGroup emailService2 COUNT 1 STREAMS Email >
XREADGROUP GROUP EmailApplnGroup emailService3 COUNT 1 STREAMS Email >

4. In this case, each consumer will receive one message.

1) 1) "Email"
2) 1) 1) 1526569495631-0
2) 1) "subject"
2) "1st email"
3) "body"
4) "Hello world"

Notes:
• The ">" means: send me new messages; that is, ones that have not been read by anyone (including me).
• Caution: If you use "0" or a timestamp instead of ">", then Redis Streams will only return messages that have already been read (but not acknowledged) by the current consumer.
• Note that it doesn't return all the messages from the main stream. This is because, when a consumer is part of a group, an additional list called a "pending list" is created and treated as a micro-stream that holds messages for that particular consumer. This is a little tricky to understand; we'll discuss it in a bit.
• "COUNT 1" means give me just one message.


As mentioned earlier, unlike Kafka, each consumer within the group can ask for as many messages as it wants.

In Figure 26, we have "emailService1" asking for two messages instead of one, while at the same time "emailService2" is asking for one. Finally, a little bit later, "emailService3" asks for one message. In this case, emailService1 gets to process two messages, emailService2 gets to process one, but emailService3 doesn't wind up with any because there are no more unclaimed messages available.

Figure 26: How consumer groups work when one consumer asks for more messages than the other
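In CLI terms, the scenario in Figure 26 is just a matter of varying COUNT (a sketch reusing the group and consumer names from the earlier example):

XREADGROUP GROUP EmailApplnGroup emailService1 COUNT 2 STREAMS Email >
XREADGROUP GROUP EmailApplnGroup emailService2 COUNT 1 STREAMS Email >
//A little later; returns nothing because no unclaimed messages remain
XREADGROUP GROUP EmailApplnGroup emailService3 COUNT 1 STREAMS Email >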


This scenario doesn't have to be limited to a single consumer group. It's possible to have multiple consumer groups as well as regular consumers that are not part of any group, all consuming messages at the same time. In Figure 27, there are two different consumer groups ("Email Application" and "Payment Application") as well as a regular consumer (Dashboard Service) that are all consuming the messages.

Figure 27: Multiple consumer groups and non-grouped, or standalone, consumers all consuming messages at the same time

And finally, a consumer or a consumer group can consume data from multiple streams (see Figure 28).

Figure 28: The consumer group (Payment Application) and a consumer (Dashboard Service) consuming data from two streams


To consume from multiple streams, you simply need to list them. This is how it looks in the CLI.

1. Making a consumer group's (Payment Application, Consumer Group 2, "paymentApplnGroup") consumer (paymentService1) get two (COUNT 2) unread messages from the Email stream (Email >) and also from the Orders stream (Orders >):

XREADGROUP GROUP paymentApplnGroup paymentService1 COUNT 2 STREAMS Email > Orders >

2. Making the dashboard service get messages from the beginning from both the Email (Email 0) and the Orders (Orders 0) streams and also waiting for any new messages in a blocking fashion (BLOCK 0):

XREAD BLOCK 0 STREAMS Email 0 Orders 0

Now that you have seen the basics of how stream processing works, let's look at how some of the challenges are addressed. One of the most effective ways to handle them is via "message acknowledgements."

Let's dig in.


How messages are acknowledged

In the context of stream processing, acknowledgement is simply a way for one system to confirm to another system that it has received a message or that it has processed that message.

Message acknowledgements can be used to solve the following four stream processing challenges:

1. Providing message delivery guarantees for producers
2. Providing message consumption guarantees for consumers
3. Enabling retries after temporary outages
4. Permitting reassignment following a permanent outage

1. Providing message delivery guarantees for producers. Once a message has been sent, how can we be sure that it has been received? We need the streaming system to acknowledge that it has in fact safely stored the incoming message.

Figure 29: How stream processing systems acknowledge message reception to producers

2. Providing message consumption guarantees for consumers. There needs to be a way for the consumer to acknowledge back to the system that it has successfully processed the message.

Figure 30: A consumer acknowledgement to the streaming system after processing the message


3. Enabling retries after temporary outages. We need to be able to reprocess messages in the event that a consumer dies while processing them. For this we need a mechanism that enables the consumer to acknowledge to the system that it has processed a message. And if there is an issue, the processing system needs to provide a way to reprocess that message in case of a temporary failure (Figure 31).

Figure 31: If the consumer fails to process the message on the first try, a mechanism is needed that enables it to retry processing the message until it has been successfully processed and that enables it to acknowledge when the message has finally been processed.

4. Permitting reassignment following a permanent outage. And lastly, if the consumer permanently fails (say, crashes), we need a way to either assign the job to a different consumer or allow different consumers to find out about the failure and take over the job (Figure 32).

Figure 32: When a consumer, while attempting to read new messages (1), permanently crashes (2), a new consumer takes over (3), successfully processes the messages, and then sends back an acknowledgment to the streaming system (4).

Now let’s look at how Kafka and Redis Streams handle each one of these.


Letting the producer know that the message has been delivered

With Kafka

In Kafka, you can have the following configurations:

Ack = 0: Send the message but don’t wait for any confirmation (you may lose data, but it will be extremely fast).
Ack = 1: At least one of the nodes in the cluster must acknowledge receipt.
Ack = All: The leader and all of the replicas must acknowledge that they have received the messages. This can be slow but will ensure that the message has been stored successfully in both the leader and the followers.

With Redis

In Redis Streams (especially in Redis Enterprise), you have two ways to acknowledge that a message has been delivered. You can configure Redis clusters to have weak consistency (but more throughput) or strong consistency (with a little less throughput).

Configuring weak consistency and durability

Let’s see how the weak consistency and durability configuration works. Once configured, it works the same way for all types of Redis keys, including Redis Streams. This is somewhat equivalent to “ack=1” in Kafka.

Any updates that are issued to the database are typically performed with the following flow:
1. The application issues a write to the proxy.
2. The proxy communicates with the correct primary “shard” in the system that contains the given key.
3. Once the write operation is complete, an acknowledgement is sent back to the proxy.
4. The proxy sends the acknowledgment back to the application.

Independently, the write is communicated from the primary to the replica, and the replica acknowledges the write back to the primary. These are steps 5 and 6.

Independently, the write to a replica is also persisted to disk and acknowledged within the replica. These are steps 7 and 8.

Figure 33: How weak consistency configuration works


Configuring strong consistency and durability

Here’s how the strong consistency and durability configuration works in Redis Streams. This is equivalent to “ack=All” in Kafka. (To read more about consistency and durability, see https://docs.redislabs.com/latest/rs/concepts/data-access/consistency-durability/.)

Option 1

With the WAIT command, applications can ask to wait for acknowledgments only after replication or persistence is confirmed on the replica. The flow of a write operation with the WAIT command is shown below:
1. The application issues a write.
2. The proxy communicates with the correct primary “shard” in the system that contains the given key.
3. The acknowledgment is sent to the proxy once the write operation completes.
4. The proxy sends the acknowledgement back to the application.

Independently, the write is communicated from the primary to the replica, and the replica acknowledges the write back to the primary. These are steps 5 and 6.

Independently, the write to a replica is also persisted to disk and acknowledged within the replica. These are steps 7 and 8.

Figure 34: How strong consistency configuration (option 1) works

With this flow, the application only gets the acknowledgment for the write after durability is achieved with replication to the replica and to the persistent storage.

With the WAIT command, Redis will make a best-effort attempt to guarantee that even under a node failure or node restart, an acknowledged write will be recorded. However, there is still a possibility of failure.

See the WAIT command documentation for details on the durability and consistency options.
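As a minimal sketch of this option, you might issue a write and then wait for it to reach a replica. The numreplicas value of 1 and the 1000 ms timeout below are illustrative; WAIT returns the number of replicas that acknowledged the write:

XADD Email * subject “7th email” body “Hello world”
WAIT 1 1000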

Summary: The two approaches compared

Consistency and durability, option by option:

Option 1. Kafka: Ack = 0 (doesn’t wait for acknowledgement). Redis (Redis Streams): ignore or don’t wait for acknowledgement.

Option 2. Kafka: Ack = 1 (wait for the leader, but not replicas, to acknowledge). Redis (Redis Streams): the “weak consistency configuration,” where you get acknowledgement only from the leader.

Option 3. Kafka: Ack = All (wait for all replicas to acknowledge). Redis (Redis Streams): the “strong consistency configuration” (wait for all replicas to acknowledge), or use “Redis Raft.”


Letting the consumer know that the message has been received

In order to understand consumption guarantees, you’ll need to know a little more about some of the inner workings of Kafka and Redis Streams, more specifically, the concepts of “offsets” in Kafka and “pending lists” in Redis Streams. These concepts, in conjunction with acknowledgements, will help to solve the challenge of providing consumption guarantees.

So let’s take a look at “offsets” in Kafka and then “pending lists” in Redis Streams before we return to consumption guarantees.

The role of offsets in Kafka’s consumption acknowledgements

Offsets are simply incremental ids that are given to each message within a given partition of a given topic. They start from 0 for each partition, so there can be multiple messages with the same offset id. Therefore, the only way to uniquely identify a message in the entire system is by combining the offset id with the partition id and the topic name (because there can also be multiple topics with the same partition ids).

Figure 36: How offsets look in Kafka


Committing offsets (i.e., consumer acknowledgement). When a consumer processes a message or a batch of messages, it acknowledges this by telling Kafka the offset it has just consumed. In Kafka, this can be automatic or manual. Following the consumer’s acknowledgement, this information is written to an internal topic called “__consumer_offsets”, which acts as a tracking mechanism. This is how Kafka knows what message to send next to each consumer.

Figure 37: Consumers have processed up to offset 2 in partition 0, up to offset 4 in partition 1, and up to offset 0 in partition 2.


This leads to three delivery methods, each with its own advantages:

1. At most once: In this case, the consumer provides acknowledgement as soon as it receives the message, even before it has had a chance to process it. Although this leads to higher throughput, if the consumer dies before it’s able to actually process the message, that message will be lost. That’s why this method is called “at most once”: the consumer has only one chance to request a group of messages, and any messages it is unable to process will be lost.

For example, in Figure 38, a consumer receives three messages and acknowledges the offset for each before processing them. As it turns out, it couldn’t actually process the third message (offset-2) successfully. But since the offset has already been committed, the next time it asks for new messages, Kafka will send them from offset-3 onwards. As a result, the message with offset-2 will never be processed.

Figure 38: How at-most-once message processing works


2. At least once: In this case, the consumer commits only after processing. Let’s imagine that for performance reasons the consumer is reading three messages at once and committing once after processing all three. Let’s say it processed two successfully but crashed before it was able to process the third one. In this case, the consumer (or a different consumer) can come back and request these messages from Kafka again. And because the messages were never committed, Kafka will send all three messages again. As a result, the consumer will end up reprocessing messages that were already processed (i.e., duplicate processing). This approach is called “at least once” because the consumer isn’t limited to a single request.

Figure 39: How at-least-once processing works


In the illustration above, assume that a consumer is processing three messages at a time and committing an offset after it has processed all three. Here is how it works:

1. A consumer reads messages with offsets 0, 1, and 2.
2. It processes them.
3. It commits the offset to 2.
4. The next time it asks, it gets messages with offsets 3, 4, and 5.
5. Let’s say it processes offsets 3 and 4 but crashes while processing offset-5.
6. A new consumer (or the same consumer) requests messages from Kafka.
7. Kafka will again return messages with offsets 3, 4, and 5.
8. Let’s say this time all three are successfully processed. That’s good, but it leads to duplicate processing of 3 and 4.

The way to mitigate this is to process the messages in a way that is idempotent. This means that even if you process a message multiple times, the end result won’t change. For example, if you set the exact price of some product multiple times in a database, it won’t matter. When building distributed applications, if you find that you cannot maintain idempotency when processing messages, you likely need to reconsider your logic to find a way to make it idempotent.
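To make the price example concrete, here is a minimal sketch in the Redis CLI (the key name is illustrative):

SET product:1234:price 19.99
//idempotent: running this once or five times leaves the same end state

INCRBYFLOAT product:1234:price 1.00
//not idempotent: each duplicate delivery changes the price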
3. Exactly once: As the name suggests, this simply means that you figure out a way to ensure that a message is processed once and no more. You typically need extra support (programming logic) to guarantee this, because there can be various reasons for duplicate processing. Kafka only provides this level of guarantee out of the box with Kafka-to-Kafka streams.

Now that we’ve seen how Kafka provides message consumption guarantees, let’s take a look at how Redis handles them. But first, in order to do so, we need to delve into the Redis concept of the “pending list.”

The role of “pending lists” in Redis’ consumption acknowledgements

Remember that Redis Streams does not have a built-in mechanism for partitions. Multiple consumers that are part of the same consumer group can all connect to a single stream and yet still process messages concurrently within that stream.

To ensure that these consumers don’t process duplicate messages, Redis Streams uses an additional data structure called “pending lists” to keep track of messages that are currently being processed by one of the consumers.


Looking at Figure 40, “emailService1” has asked for two messages and “emailService2” has asked for one. After the messages have been received, Redis Streams puts a copy (or a pointer) of them in a separate pending list for each consumer. So “12-0” and “12-1” are added to the list for “emailService1” and “12-2” is added to the list for “emailService2”. In addition, it updates the “last_delivered_id” to “12-2”.

This allows for three key things:

1. The “last_delivered_id” ensures that only unread messages are delivered to future requests from consumers of that same group. This is kind of like the “offset commits” in Kafka.
2. The pending lists allow consumers, should they temporarily die during processing (that is, before acknowledgement), to pick up where they left off.
3. The pending lists also allow other consumers to claim pending messages (using XCLAIM) in case the death of the original consumer proves to be permanent.

Figure 40: How pending lists work in Redis Streams
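As a sketch, this is how you might inspect and reclaim pending messages in the CLI. The group name “EmailApplnGroup” matches the walkthrough later in this section, while the message id and the 60000 ms minimum idle time are illustrative. XPENDING summarizes the group’s pending entries, and XCLAIM lets another consumer (here “emailService3”) take over a message whose original consumer appears to be gone:

XPENDING Email EmailApplnGroup
XCLAIM Email EmailApplnGroup emailService3 60000 12-0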


Now, let’s imagine that “emailService2” has completed its processing (Figure 41) and acknowledges this. Redis Streams responds by removing the processed items from the pending list.

Figure 41: Following acknowledgment from emailService2, the message “12-2” is removed from the pending list


Let’s see how it looks in the CLI.

1. Create a stream (Email) and a consumer group, “Email Application” (EmailApplnGroup), set to read all messages from the beginning (“0”). Note: If you use “$” instead of “0”, it will send only new messages. Also, if you provide any other id, it will start reading from that id. Note: MKSTREAM makes a new stream if the stream doesn’t already exist.

XGROUP CREATE Email EmailApplnGroup 0 MKSTREAM

2. Add six messages to the stream.

XADD Email * subject “1st email” body “Hello world”
XADD Email * subject “2nd email” body “Hello world”
XADD Email * subject “3rd email” body “Hello world”
XADD Email * subject “4th email” body “Hello world”
XADD Email * subject “5th email” body “Hello world”
XADD Email * subject “6th email” body “Hello world”

3. Consume one unread message (“>”) as the “emailService2” consumer, which is part of the “EmailApplnGroup”, from the “Email” stream.

XREADGROUP GROUP EmailApplnGroup emailService2 COUNT 1 STREAMS Email >

//This will return a message that’ll look like this
1) 1) “Email”
   2) 1) 1) 1526569495632-1
         2) 1) “subject”
            2) “3rd email”

4. Imagine we processed that message and acknowledged it.

XACK Email EmailApplnGroup 1526569495632-1
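To confirm, you can ask for the group’s pending entries again; after the XACK, the acknowledged id no longer appears in the summary:

XPENDING Email EmailApplnGroup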


As you can imagine, with Redis Streams you can easily apply the same delivery approaches used with Kafka. Let’s take a look.

1. At most once: In this case, you send an acknowledgement when the messages have been received but before they’ve been processed. Using Figure 42 for reference, let’s imagine that “emailService2” acknowledges before fully processing the message in order to quickly consume more messages, and that losing some message processing doesn’t matter. In this case, if the consumer crashes after acknowledgement but before processing the message, that message’s processing is lost. Note that the message itself is still in the stream, so you can potentially reprocess it, although you’ll never know whether you need to.

Figure 42: “At most once” processing in Redis Streams
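A minimal sketch of this pattern: XREADGROUP supports a NOACK option, which skips the pending list entirely so that a message counts as acknowledged the moment it is delivered:

XREADGROUP GROUP EmailApplnGroup emailService2 COUNT 1 NOACK STREAMS Email >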


2. At least once: Again, this is very similar to Kafka. A message is only acknowledged after it’s been processed. Here, if a consumer acknowledges only after processing multiple messages, and it crashes during the processing of one of those messages, then you’ll end up reprocessing all of the messages, not just the ones that failed to be processed.

Figure 43: “At least once” processing in Redis Streams

In the example above, we have a consumer group called “Email Application” with two consumers (“emailService1” and “emailService2”).

1. “emailService1” reads the first two messages, while “emailService2” reads the third message at the same time.
2. The pending list of emailService1 stores the first two messages, and similarly the pending list of emailService2 stores the third message.
3. “emailService1” starts to process both messages (and hasn’t committed yet). However, let’s say it temporarily crashes after processing the first message but before processing the second message.
4. When “emailService1” comes back later and reads from the pending list, it will again see both messages in that list.
5. As a result, it will process both messages.
6. Because of step 5, the consumer ends up processing the first message twice.

And this is why it’s called “at least once.” Although ideally all pending messages will be processed in one pass, it may require more.


3. Exactly once: In Redis Streams, you have multiple ways of ensuring that each message is processed exactly one time.

Option 1: Because Redis Streams is extremely fast, you can read just one message at a time and acknowledge it after that message has been successfully processed. In this scenario, you’ll have at most one message in the pending list at any time. However, even though Redis Streams is fast, consumers can still be slow to process, so consider the performance of your consumers before using this option.

Figure 44: “Exactly once” (Option 1) processing in Redis Streams (done by processing one message at a time)


Option 2: As an alternative to Option 1, you can use additional data structures, such as Redis Sets, to keep track of messages that have already been processed by some consumer. This way, after reading a message, you can check the set and make sure its id is not already a member before you process it. Storing the processed message ids in a set helps with exactly-once delivery semantics.

Figure 45: “Exactly once” (Option 2) processing in Redis Streams (using a set data structure to keep track of the messages that have already been processed)
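A sketch of Option 2 in the CLI. The set key “processedMsgs” and the message id are illustrative. SISMEMBER returns 1 if the id is already in the set; in that case you skip the processing step and simply acknowledge, otherwise you process, record, and acknowledge:

XREADGROUP GROUP EmailApplnGroup emailService1 COUNT 1 STREAMS Email >
SISMEMBER processedMsgs 12-0
//if it returned 0, process the message, then record and acknowledge it
SADD processedMsgs 12-0
XACK Email EmailApplnGroup 12-0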


The role of clusters in Kafka and Redis

In this section we’ll go over some high-level aspects of clusters. This is a very deep topic, so we’ll only cover its key aspects.

Kafka Clusters

Kafka is a distributed system. That means you typically wouldn’t use it with just one server but would more likely use it with at least three servers. Clusters provide high availability and durability: if one of the servers goes down, the others can still keep serving the clients.

In Kafka, each server is called a broker. In production, Kafka clusters might have anywhere from three brokers (the typical minimum) to hundreds of brokers.

The example below (Figure 46) shows how a typical Kafka cluster would look: a cluster with three brokers (servers) and two topics (Email and Payment), where the Email topic has three partitions spread across three brokers (10, 11, and 12), and the Payment topic has two partitions spread across two brokers (11 and 12).

Figure 46: A Kafka cluster consisting of three brokers (servers), two topics (Email and Payment), and five partitions


Redis Clusters

With Redis, things work pretty much the same way. In the example below, the Redis cluster has three nodes. The messages for “Email” are sent to three different streams that sit on three different nodes. The messages for “Payment” are sent to two different streams that sit on two different nodes.

The only caveat if you’re using the OSS cluster is that you don’t have a proxy for these cluster nodes. That means your client libraries need to manage where the data goes by directly connecting to each node within the cluster. Thankfully, the cluster APIs make this easy, and most Redis client libraries in all major programming languages already support them.

Figure 47: A Redis OSS cluster with three brokers (servers), two topics (Email and Payment), and five streams


On the other hand, Redis Enterprise provides a proxy layer on top of the clusters. This way the client can just connect to the proxy and doesn’t have to worry about exactly which server the data is going to or coming from.

Figure 47a: A Redis Enterprise cluster with three brokers (servers), two topics (Email and Payment), and five streams

By the way, in Redis clusters, if a key contains curly braces (“{}”), then only the text within those curly braces is hashed. Because keys that share the same hash tag always land in the same hash slot, you can name the keys “Email:{P0}” and “Payment:{P1}” to control their placement.
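For example (the key names are illustrative), both of the following keys hash on the tag “P0” and therefore end up in the same hash slot, on the same node:

XADD Email:{P0} * subject “1st email” body “Hello world”
XADD Payment:{P0} * orderId “42” amount “19.99”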


By the way, in Redis you can run multiple Redis instances on the same node. These are called shards. Figure 48 illustrates how a Redis cluster helps scale Redis. Here is how it works:

1. Let’s say you are running four Redis instances (four shards) on a single node, and imagine you have split the data across these four instances. This is mainly for parallel processing and to fully utilize all the CPU and memory resources.
2. Now, let’s say you want to move to two machines, that is, you want to “scale out.” So you add a second machine. At this point you have scaled out, and this node is now part of the cluster. But this new node is empty to begin with: it contains no Redis instances or data.
3. Next, let’s say you move two shards to the second node in order to better distribute the load. This is called “rebalancing.”
4. Finally, in order to increase parallel processing and to fully utilize all the CPUs, you may add more Redis instances and split the data across those instances. This is called “resharding.” Let’s say you’ve added two more instances/shards on each node. In the beginning, these new instances won’t have any data, so you use the resharding process to split and move some of the existing data. In the end you wind up with a total of eight shards and much higher throughput.

Figure 48: Using Redis Enterprise to increase your throughput by scaling out, rebalancing, and resharding your data

Conclusion

Hopefully this ebook has provided you with a solid foundation for understanding both Kafka and Redis Streams. It’s important to note that Kafka is a very robust streaming platform geared towards highly complex, distributed applications with very specific requirements. In contrast, Redis Streams is a great way to add streaming to an existing application that is already using Redis. Redis Streams has much lower management overhead, and if you are already using Redis for, say, caching, then you can implement Redis Streams without setting up and maintaining a separate system.

If you’re interested in learning more and taking this further, check out the free Redis Streams course (RU202) offered at Redis University.
