
Understanding Streams in Redis and Kafka

A Visual Guide

© 2022 Redis

Table of Contents

PART 1. Introducing the concept of streams
    What are streams?
    How streams are related to events
    How streams compare to buffering
        Processing using just Buffers
        Processing using Streams
    The challenges of stream processing
    Specialized stream processing systems

PART 2. Comparing the approaches of Kafka and Redis to handling streams
    How messages (event data) are stored
    Creating streams
    Adding messages
    Consuming messages
        With Kafka
        With Redis Streams
    Approaches to scaling consumption
        Single partition and multiple consumers
        Multiple partitions and multiple consumers
        In-order and in-parallel message processing
    The role of consumer groups in Redis Streams
    How messages are acknowledged
    Letting the producer know that the message has been delivered
        With Kafka
        With Redis
        Summary: The two approaches compared
    Letting the consumer know that the message has been received
        The role of offsets in Kafka's consumption acknowledgements
        The role of "pending lists" in Redis' consumption acknowledgements
    The role of clusters in Kafka and Redis
        Kafka Clusters
        Redis Clusters


Part 1: Introducing the concept of streams


Streams are incredibly useful but can be a little confusing to describe. Partly this is because they can be explained in at least two distinct ways, and each description provides only half of the picture. Backend developers (NodeJS and Java) see streams (in contrast to buffers) as a memory-efficient way of processing large amounts of data. The big data folks take a different perspective: they view streams as a way of dealing with data that arrives over time, or as a means of decoupling the producers and consumers of that data.

It's like that old story where blind people are asked to describe an elephant. The first one touches the elephant's leg and says the elephant feels like a tree. The second one touches the elephant's trunk and concludes that the elephant feels like a snake, and so on.

Picture 1. Image credit: https://medium.com/betterism/the-blind-men-and-the-elephant-596ec8a72a7d


Even when you think you have a firm understanding of it, stream processing can still be a very complex topic. In fact, it's difficult to maintain a good mental model of streaming unless you really understand some stream processing systems.

The goal of this ebook is to help you build that mental model. I'll use text, code snippets, and more than 50 illustrations to explain:

1. How to think about streams and connect the dots between different perspectives so you get a bigger picture
2. Some of the challenges of handling streams
3. How stream processing systems such as Redis Streams and Kafka work. We are using these two systems as examples in the hope that you'll gain a more thorough understanding, as opposed to learning just how one system handles the processing.

Even though we'll be covering deep and complex topics, thanks to the format and the illustrations, it should be an easy and fun read overall.

By the end of this, you should have:

• An expert-level theoretical understanding of streams, the challenges of stream processing, and how two stream processing systems (Kafka and Redis Streams) work
• Enough knowledge to do a proof-of-concept of Redis Streams or Kafka and to determine which one is best suited for you
• Enough theoretical knowledge to get a head start on certification for either Redis or Kafka

OK, let's get started.


What are streams?

In computer science, a stream is a sequence of data elements (i.e., a series of strings, JSON, binary, or raw bytes) that are made available for processing in small chunks over time. As with the contents of a text file, this data may be finite, but even then it will be processed in pieces, one word or one line at a time, in a sequence (word after word, or line after line) until all that data has been processed.

Picture 2: Processing a byte stream, one byte at a time

In other cases, the data might be infinite and might never end. For example, say you are processing data in a chat messenger server. You'll only get a chat message to process when someone writes one. And it can happen at any time and may continue for as long as people keep chatting.

Picture 3: Processing a JSON stream, one JSON document at a time


This data can be internal as well as external. It doesn't have to come from the outside world. It could originate from different systems sending messages to each other. For example, a web server, after receiving payment information, might tell an email server to send an email via a JSON message. That is machine-to-machine communication. You can also think of these messages as coming in the form of streams, because they can come in small pieces and can arrive over time, at any point in time.

Picture 4: Streams of messages sent using machine-to-machine communication, with a web server, app server, analytics server, and email server exchanging JSON messages


How streams are related to events

An event is simply a mechanism, a trigger that is activated when something has occurred. For example, when someone buys a product, that triggers an event that leads to the creation of a JSON message containing the person's information, payment amount, product info, and so on. This usually originates at the browser or mobile app, and then the message is sent to the server. Here, the event is the act of buying the product, indicating something occurred. And since the buying event can happen at any time, the resulting data (typically JSON) representing that event flows into the system as a stream.

Picture 5: How events generate streams of data, with the server processing one "buy event" at a time
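For illustration, the JSON generated by such a buy event might look something like this (a hypothetical shape; the exact fields depend on your application):

{
  "event": "buy",
  "userId": "user42",
  "product": "light bulb",
  "amount": 1.00,
  "timestamp": 1518951480106
}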


How streams compare to buffering

Processing using just Buffers

If you ask backend engineers who work in Java, NodeJS, and other programming languages, they'll tell you that streams are more efficient than buffers for processing chunks of data. They come from the perspective of processing large data inside an application server. An example should help us understand their perspective a little better.

Say you have a 10GB file containing hundreds of typos that say "breams" instead of "streams". Let's look at how to use buffers to replace "breams" with "streams," and then we'll see how the process would work using streams.

Picture 6: How buffers are used to process data

Here is how it works:
1. You first read the entire 10 GB file into RAM (it can be slow to load all that data).
2. You then send this data to a data processor that fixes all the typos, changing "breams" to "streams".
3. Once the data processor finishes, the new data is stored back in RAM (so you may need an additional 10 GB of memory).
4. After all the processing is done, you write the entire result into a new file.

As you can see, this process not only tends to be slow, but it can also take up a lot of memory.
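To make this concrete, here is a minimal Node.js sketch of the buffer approach (Node.js is one of the backend environments mentioned above; the file names are illustrative). Tellingly, Node will actually refuse to read a file anywhere near 10 GB into a single string, which is exactly the limitation being described:

const fs = require("fs");

// 1. Read the entire file into RAM in one go (slow, and needs memory
//    proportional to the file size).
const data = fs.readFileSync("MyProject.txt", "utf8");

// 2-3. Fix every typo in one pass; the result is a second full copy in RAM.
const fixed = data.replaceAll("breams", "streams");

// 4. Write the entire result out as a new file.
fs.writeFileSync("NewMyProject.txt", fixed);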


Processing using Streams

A better approach is to read the data as a stream. Here the data is transferred in bytes, and you figure out how to group these bytes into tiny chunks, then process each chunk.

Picture 7: How streams are used to process data (basic "stream processing", line-by-line)

Here is how it works:
1. Data comes in bytes, that is, one byte at a time.
2. The producer assembles those bytes into chunks of the kind you've specified. For example, if you've decided to process the file a line at a time, it keeps appending bytes until it spots a newline character that signals the end of this particular chunk. This chunk is now ready to pass on to the consumer.
3. The consumer processes the line, looks for the existence of the typo and, if it finds one, replaces "breams" with "streams."
4. The processed chunk is then written as a stream to the new file.
5. The whole process is repeated until the end-of-file character is detected. At that point, the process is complete, and the stream is closed.
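Here is what those steps might look like in Node.js (a sketch, assuming Node 15+ for String.replaceAll; readline plays the role of the bytes-to-line converter, and the file names are illustrative):

const fs = require("fs");
const readline = require("readline");

async function fixTypos() {
  // 1-2. Read the file as a byte stream and let readline assemble the
  //      bytes into one-line chunks.
  const input = fs.createReadStream("MyProject.txt");
  const output = fs.createWriteStream("NewMyProject.txt");
  const lines = readline.createInterface({ input, crlfDelay: Infinity });

  // 3-4. Process one line at a time and stream it out to the new file.
  for await (const line of lines) {
    output.write(line.replaceAll("breams", "streams") + "\n");
  }

  // 5. Close the output stream once the input is exhausted.
  output.end();
}

fixTypos();

Only one line is held in memory at a time, no matter how large the file is. (A production version would also respect backpressure by checking the return value of write().)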
As you can probably see, compared to buffering, streaming has some clear benefits. It's faster, more efficient, and places significantly less of a burden on memory. Although both streaming and buffering require a buffer, in the case of buffering, that buffer must be large enough to contain the entire file or message. With streaming, the buffer only needs to be large enough to accommodate the size of a specified chunk. Moreover, once the current chunk has been processed, the buffer can be cleared and then used to accommodate the next chunk. As a result, regardless of the size of the file, the buffer consumes only 50-100 bytes of memory at a time. Second, because the entire file doesn't need to be loaded into RAM first, the process can begin right away.

Now that you've seen how backend engineers view streams, let's look at streams through the eyes of big data engineers. But first, in order to do so, we need to better understand some of the challenges of stream processing.


The challenges of stream processing

Although streams can be a very efficient way of processing huge volumes of data, they come with their own set of challenges. Let's take a look at a few of them.

1. What happens if the consumer is unable to process the chunks as quickly as the producer creates them? Taking our current example, what if the consumer is 50 percent slower than the producer? If we're starting out with a 10GB file, that means by the time the producer has processed all 10GB, the consumer would only have processed 5GB. What happens to the remaining 5GB while it's waiting to be processed? The 50-100 bytes allocated for data that still needs to be processed would suddenly have to be expanded to 5GB.

Picture 8: If the consumer is slower than the producer, you'll need additional memory.


2. And that’s just one nightmare scenario. There are others. For example, what happens if the consumer suddenly dies
while it’s processing a line? You’d need a way of keeping track of the line that was being processed and a mechanism
that would allow you to reread that line and all the lines that follow.

Picture 9: When the consumer fails



3. Finally, what happens if you need to be able to process different events and send them to different consumers? And, to add an extra level of complexity, what if you have interdependent processing, where the work of one consumer depends on the actions of another? There's a real risk that you'll wind up with a complex, tightly coupled, monolithic system that's very hard to manage. This is because these requirements will keep changing as you keep adding and removing different producers and consumers.

For example (Picture 10), let's assume we have a large retail shop with thousands of servers that support shopping through web apps and mobile apps. Imagine that we are processing three types of data, related to payments, inventory, and webserver logs, and that each has a corresponding consumer: a "payment processor", an "inventory processor", and a "webserver events processor." In addition, there is an important interdependency between two of the consumers: before you can process the inventory, you need to verify payment first. Finally, each type of data has different destinations. If it's a payment event, you send the output to all the systems, such as the database, email system, CRM, and so on. If it's a webserver event, you send it just to the database. If it's an inventory event, you send it to the database and the CRM.

As you can imagine, this can quickly become quite complicated and messy. And that's not even counting the slow-consumer and fault-tolerance issues that we'll need to deal with for each consumer.


Picture 10: The challenge of tight coupling because of multiple producers and consumers. The inventory processor must check with the payment processor before it can reduce the inventory count (a dependency). After processing, some data needs to go to all the systems (like payment events' data), while some goes only to a few (like web events' data).


Of course, all of this assumes that you're dealing with a monolithic architecture, that you have a single server receiving and processing all the events. How would you deal with a "microservices architecture"? In this case, numerous small servers (that is, microservices) would be processing the events, and they would all need to be able to talk to each other. Suddenly, you don't just have multiple producers and consumers; you have them spread out over multiple servers.

A key benefit of microservices is that they solve the problem of scaling specific services depending on changing needs. Unfortunately, although microservices solve some problems, they leave others unaddressed. We still have tight coupling between our producers and consumers, and we retain the dependency between the inventory microservices and the payment ones. Finally, the problems we pinpointed in our original streaming example remain problems:

1. We haven't figured out what to do when a consumer crashes.
2. We haven't come up with a method for managing slow consumers that doesn't force us to vastly inflate the size of the buffer.
3. We don't yet have a way to ensure that our data isn't lost.

These are just some of the main challenges. Let's take a look at how to address them.


Picture 11: The challenges of tight coupling in the microservices world. The same producers, consumers, dependencies, and destinations as in Picture 10, now spread across many small payment, web-event, and inventory services.


Specialized stream processing systems

As we've seen, streams can be great for processing large amounts of data but also introduce a set of challenges. New specialized systems such as Apache Kafka and Redis Streams were introduced to solve these challenges. In the world of Kafka and Redis Streams, servers no longer lie at the center; the streams do, and everything else revolves around them.

Data engineers and data architects frequently share this stream-centered worldview. Perhaps it's not surprising that when streams become the center of the world, everything is streamlined.

Picture 12 is a direct mapping of the tightly coupled example you saw earlier. Let's see how it works at a high level. (Note: we'll go into the details later in the context of Redis Streams and Kafka to give you an in-depth understanding of the following.)

1. Here the streams and the data (events) are first-class citizens, as opposed to the systems that process them.
2. Any system that is interested in sending data (producer), receiving data (consumer), or both sending and receiving data (producer + consumer) connects to the stream processing system.
3. Because producers and consumers are decoupled, you can add additional consumers or producers at will. You can listen to any event you want. This makes it perfect for microservices architectures.
4. If the consumer is slow, you can increase consumption by adding more consumers.
5. If one consumer is dependent on another, you can simply listen to the output stream of that consumer and then do your processing. For example, in Picture 12, the inventory service is receiving events from both the inventory stream and the output of the payment processing stream before it processes the inventory event. This is how you solve the interdependency problems.
6. The data in the streams is persistent (as in a database). Any system can access any data at any time. If for some reason data wasn't processed, you can reprocess it.

A number of streaming challenges that once seemed formidable, even insurmountable, can readily be solved just by putting streams at the center of the world. This is why more and more people are using Kafka and Redis Streams in their data layer, and why data engineers view streams as the center of the world.

Now that we understand what streams, events, and stream processing systems are, let's take a look into Redis Streams and Kafka to understand stream processing and how they solve various challenges. By the end of this, you should be an expert, at least in the theoretical aspects of stream processing, to the extent that you can readily do a proof-of-concept for each system or easily earn Kafka or Redis certification.


Picture 12: When we make streams the center of the world, everything becomes streamlined. Payment, webserver, and inventory events feed input streams in Redis Streams or Kafka; the payment, web-event, and inventory services consume them and emit post-processing streams, which the database, email system, CRM, dashboard, and other web services consume in turn.


Part 2: Comparing the approaches of Kafka and Redis to handling streams

Apache Kafka is open source (Apache License 2.0, written in Scala) and a leading distributed streaming platform. It's a very feature-rich stream processing system. Kafka also comes with additional ecosystem services, such as ksqlDB and Kafka Connect, that provide more comprehensive capabilities.

Redis is an open-source (BSD 3-clause, written in C) in-memory database, considered to be the fastest and most-loved database. It's also the leading database on AWS. Redis Streams is just one of the capabilities of Redis. With Redis, you get a multi-model, multi-data-structure database with six modules and more than 10 data structures.

So the key thing to remember is that, when you are thinking about Kafka and Redis Streams, you should really think of Kafka and Redis (not just Redis Streams).


How messages (event data) are stored

Although their storage is similar, Kafka and Redis Streams have different ways of identifying each message. In Kafka, each message is given a sequence number that starts with 0. But each message can only be partly identified by its sequence number. That's because of another concept called a "partition" that we'll get into later.

In Redis Streams, each message by default gets a timestamp as well as a sequence number and is identified by <millisecondsTime>-<sequenceNumber>. The sequence number is provided to accommodate messages that arrive at the exact same millisecond. So if two messages arrived at the exact same millisecond (1518951480106), their ids would look like 1518951480106-0 and 1518951480106-1.

Picture 13: How messages look in Kafka and Redis Streams. In both systems, messages are stored in the same order as they arrive.
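For example, here is how adding two entries back to back might look in redis-cli (the ids below are illustrative; Redis derives them from the server clock at the moment of insertion):

> XADD Email * subject "first"
"1518951480106-0"
> XADD Email * subject "second"
"1518951480106-1"

The asterisk asks Redis to auto-generate the id; because the second entry arrived within the same millisecond, only its sequence number differs.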


Creating streams

In Kafka, you create what's called a "topic". You can think of this as the name of the stream. However, in Kafka, you also need to understand four key concepts:

1. Partition: You can think of it as a file on the disk.
2. Broker: You can think of it as the actual server.
3. Replication Factor: The number of duplicate copies of the messages you want to keep.
4. Zookeeper: This is an additional system that you need to use in order to manage Kafka.

We'll get into all of these in a bit, but for now let's assume you have one partition, one broker, and a replication factor of one.

Picture 14: How messages look in Kafka for the Email topic with one broker, one partition, and a replication factor of one

Note: the command to create a Kafka topic with one partition and a replication factor of one would look like this:

> kafka-topics --zookeeper 127.0.0.1:2181 --topic Email --create --partitions 1 --replication-factor 1


Note: The example below (Picture 14a) shows how these would look in a Kafka cluster. We'll discuss that later, but for now just imagine that there is only one broker.

Picture 14a: A Kafka cluster with three brokers (servers) and two topics (Email and Payment), where the Email topic has three partitions spread across three brokers (10, 11, and 12) and the Payment topic has two partitions spread across two brokers (11 and 12)
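As a sketch, the layout in Picture 14a could be created with commands like the following (the assignment of partitions to brokers is decided by Kafka, so the exact placement may vary):

> kafka-topics --zookeeper 127.0.0.1:2181 --topic Email --create --partitions 3 --replication-factor 1
> kafka-topics --zookeeper 127.0.0.1:2181 --topic Payment --create --partitions 2 --replication-factor 1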


In Redis, you simply create a stream and give it a key. Note that all the data within this stream is part of this single key ("Email"). Note also that this key and its stream simply reside alongside other keys and data structures. Redis supports a number of data structures; a stream is just one of them. (See Picture 15.)

Picture 15: How messages look in Redis for an Email stream, stored alongside other keys such as a user JSON document, an array ("myArray"), and a string ("myString")

The command to create a Redis stream would look like this:

XADD Email * email_subject "1st email" email_body "hello world"

If the Email stream already exists, this will append the message. If it doesn't exist, Redis will automatically create a stream (using "Email" as its key) and then append the first message. The asterisk tells Redis to auto-generate the message id (timestamp-sequence) for this message.


Adding messages

Kafka has a concept called "producers." These are responsible for sending messages. They can also send messages with some options, such as acknowledgments, serialization format, and so on. In the following command, you are using the Kafka producer CLI tool to send three messages to the Email topic.

$ kafka-console-producer --broker-list 127.0.0.1:9092 --topic Email
> my first email
> my second email
> my third email

In Redis Streams, use the XADD command to send the data as a hash to the Email key.

XADD Email * subject "my first email"
XADD Email * subject "my second email"
XADD Email * subject "my third email"

In Redis Streams, you can set up acknowledgments and many other things as part of the Redis server or Redis cluster settings. Remember that these settings apply to the entire Redis server, not just to the Redis Streams data structure.


Consuming messages

Both Kafka and Redis Streams have the concepts of consumers and consumer groups. We'll cover just the basics first.

With Kafka

In Kafka, the following command reads all the messages in the Email topic. The "bootstrap-server" is the main Kafka server. The "--from-beginning" flag tells Kafka to send all the data from the beginning. If we don't provide this flag, the consumer will only retrieve messages that arrive after it has connected to Kafka and started to listen.

$ kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --topic Email --from-beginning

Response:
> my first email
> my second email
> my third email

Note: The above consumer client will continue to wait for new messages in a blocking fashion and will display them when they arrive.


With Redis Streams

In Redis Streams, you have two main options:

1. Consume messages by using XREAD (equivalent to Kafka’s command). In the command below, “BLOCK 0”
tells the Redis CLI to maintain the connection forever (0) in a blocking manner. “Email 0” after the keyword
“STREAMS” means to get messages from the “Email” stream and from the beginning of time.

XREAD BLOCK 0 STREAMS Email 0

Response:
1) 1) 1518951480106-0
2) 1) “subject”
2) “my first email”
2) 1) 1518951482479-0
2) 1) “subject”
2) “my second email”
3) 1) 1518951482480-0
2) 1) “subject”
2) “my third email”

Notes:
• If you use “Email $”, then it would get only new messages from the “Email” stream. That is, “XREAD BLOCK 0 STREAMS
Email $”
• You can use any other timestamp id after the stream name to get messages after that timestamp id. That is, “XREAD
BLOCK 0 STREAMS Email 1518951482479-0”


2. Consume messages by giving a range of ids or by using some special commands.

a. You can use the command XRANGE and get everything from the smallest (“-”) timestamp to the latest one
(“+”).

> XRANGE Email - +

Response:
1) 1) 1518951480106-0
2) 1) “subject”
2) “my first email”
2) 1) 1518951482479-0
2) 1) “subject”
2) “my second email”
3) 1) 1518951482480-0
2) 1) “subject”
2) “my third email”

b. You can also provide timestamps directly.

> XRANGE Email 1518951482479 1518951482480

Response:
1) 1) 1518951482479-0
2) 1) “subject”
2) “my second email”
2) 1) 1518951482480-0
2) 1) “subject”
2) “my third email”


c. You can limit the result by specifying a count.

> XRANGE Email - + COUNT 1

Response:
1) 1) 1518951480106-0
2) 1) “subject”
2) “my first email”

d. By prefixing the last id with a “(“, you can pick up where you left off, starting with the messages that
immediately followed the one with that id and keeping the “+” for the ending point. In the example below,
we are retrieving two messages that come after a message with a “1518951480106-0” id.

> XRANGE Email (1518951480106-0 + COUNT 2

Response:
1) 1) 1518951482479-0
2) 1) “subject”
2) “my second email”
2) 1) 1518951482480-0
2) 1) “subject”
2) “my third email”


Approaches to scaling consumption

You just saw the basics of producers and consumers in both Kafka and Redis. Now, let's dig in and see how these streaming systems scale consumption.

Single partition and multiple consumers

Scenario: Let's imagine you have three emails that need to be processed, in no particular order, by three email processors (consumers) so you can get the job done in one third of the time.

In Kafka, say you connected all three consumers to the Email topic. Then all three messages are sent to all three consumers, so you end up processing duplicate messages. This is called a "fan out".

Picture 16: A "fan out" in Kafka when multiple consumers connect to a single topic

Note: Although fan out doesn't work for this scenario, it works fine for chat messenger clients, where you can connect multiple users to the same topic and they all receive all chat messages.


It works exactly like that in Redis Streams as well: connect multiple consumers to the same stream, and each receives all the messages.

Picture 17: A "fan out" in Redis Streams, with all messages in the Email stream delivered to each of the three email services
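For example, if all three consumers issue the same XREAD against the single Email stream, each of them receives every message, just as in the Kafka case (XREAD is covered in the consuming section above):

//Consumer 1
XREAD BLOCK 0 STREAMS Email 0
//Consumer 2
XREAD BLOCK 0 STREAMS Email 0
//Consumer 3
XREAD BLOCK 0 STREAMS Email 0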


Multiple partitions and multiple consumers

In Kafka, there is a concept called a partition. You can think of a partition as a physical file on the disk. Partitions are used for scaling purposes. However, you should use them carefully, either with "keys" or with "consumer groups". We'll talk about both of them in a bit. But just know that consumers generally don't care about and are not aware of the partitions. They just subscribe to a "topic" (the higher-level abstraction) and consume whatever Kafka sends them.

We are going to cover multiple cases of using multiple partitions and multiple consumers, and it may look odd at first.

Case 1: An equal number of partitions and consumers (three each)

In the example below, we have created three partitions for the "Email" topic using the following command:

> kafka-topics --zookeeper 127.0.0.1:2181 --topic Email --create --partitions 3 --replication-factor 1

Now when we add three messages to the topic, they are automatically distributed to the partitions using a hashing algorithm, so each partition gets just one message in our example. But when consumers connect to this topic (they are not aware of the partitions), all the messages in each partition are sent to each consumer in a fan-out fashion.


Picture 18: A "fan out" when there are multiple partitions (default behavior)

Notes:
• Message order: The consumers may receive the messages in random order. For example, "Email Service 1" might receive "message 1", "message 3", and finally "message 2", whereas "Email Service 2" might get them in the order "message 3", "message 1", "message 2". This is because order is only maintained within a single partition.
• Later, we'll learn more about ordering and how to use keys and consumer groups to alter this default behavior.


Case 2: More partitions (three) than consumers (two)

It still works the same. Each consumer gets all the messages, irrespective of partitions. Message order is random.

Picture 19: A "fan out" when there are more partitions than consumers


Case 3: Fewer partitions (three) than consumers (four)

It still works the same. Each consumer receives all the messages, irrespective of partitions. Message order is still random.

Picture 20: A "fan out" when there are fewer partitions than consumers


In Redis Streams, there is no such concept as partitions. In many cases you don't even need them. However, because Redis Streams comes with all of Redis's features, you can very easily implement something equivalent.

Let's look at how easy this is to implement.

You can create "partitions" simply by creating multiple streams and then distributing the data yourself. And on the consumer side, unlike Kafka, since you have direct access to each of these streams, you can consume the data in a fan-out fashion by connecting to all the streams, or connect to specific streams by key to consume only their messages.

Say you created three streams: "Email:P0", "Email:P1", and "Email:P2". And say you want to distribute the incoming messages in a round-robin fashion. And finally, you want to consume the data both in a "fan-out" fashion and in a "per-stream" fashion.

1. Use a key to keep track of the number of emails so we can use it for round-robin.

INCR email_counter

2. Take the modulus of email_counter to determine which stream (Email:P0, Email:P1, or Email:P2) to send the data to.

var streamName = "Email:P" + (email_counter % 3)

3. Send each message to the appropriate stream (a minimal sketch of this producer follows below).

XADD <streamName> * subject "email 1 subject"  //Email:P0, Email:P1, or Email:P2, depending on the counter
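Putting the three steps together, here is a minimal producer sketch in Node.js (assuming the ioredis client; the key and stream names match the steps above, and any Redis client with INCR and XADD would work the same way):

const Redis = require("ioredis");
const redis = new Redis(); // connects to 127.0.0.1:6379 by default

async function sendEmail(subject) {
  // Step 1: atomically bump the counter used for round-robin.
  const count = await redis.incr("email_counter");

  // Step 2: use the modulus to pick one of the three "partition" streams.
  const streamName = "Email:P" + (count % 3);

  // Step 3: append the message; "*" auto-generates the message id.
  await redis.xadd(streamName, "*", "subject", subject);
}

sendEmail("email 1 subject");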


Consuming from Redis Stream "partitions" in a "fan out" fashion (Picture 21)

To consume the data in a "fan out" fashion, simply listen to all the streams (Picture 21).

//Consumer 1
XREAD BLOCK 0 STREAMS Email:P0 0 Email:P1 0 Email:P2 0
//Consumer 2
XREAD BLOCK 0 STREAMS Email:P0 0 Email:P1 0 Email:P2 0
//Consumer 3
XREAD BLOCK 0 STREAMS Email:P0 0 Email:P1 0 Email:P2 0

Notes:
• BLOCK 0 = wait indefinitely for messages.
• "Email:P0 0" = read all messages from the beginning (0-0).
• By providing multiple stream names, each followed by "0", each consumer receives all the messages from all the streams.

Picture 21: How to implement "partitions" in Redis Streams and consume messages in a "fan out" manner


Consuming from Redis Stream "partitions" in a per-stream fashion (Picture 22)

To consume the data in a "per-stream" fashion, simply listen to the stream of your choice. Here message order is preserved (Picture 22).

//Consumer 1
XREAD BLOCK 0 STREAMS Email:P0 0
//Consumer 2
XREAD BLOCK 0 STREAMS Email:P1 0
//Consumer 3
XREAD BLOCK 0 STREAMS Email:P2 0

To implement round-robin distribution on the producer side, you can keep a counter in Redis, say checkout_counter, and increment it (INCR checkout_counter) every time you send a new message to a stream. Then use a modulus of the counter (checkout_counter % number of streams) to determine which stream to send the next message to. For example, the following commands append a message to a "check_out:p0" Redis stream (creating it if it doesn't exist) and bump the counter:

XADD check_out:p0 * message 0 cartId 1 items 5 cost $100
INCR checkout_counter  //Use this for round-robin

Picture 22: How to implement "partitions" in Redis Streams and consume them in a "per-stream" manner


In-order and in-parallel message processing

Tasks can be handled in order, in parallel, or with a combination of both. As the name implies, with in-order processing, tasks are handled in a specific order. For example, when processing a credit card, we need to first check the validity of the card, then do fraud detection, and then check the balance. With in-parallel processing, tasks are handled simultaneously so they can be completed more quickly. With in-order and in-parallel processing, the system splits the tasks into groups of tasks that need to be handled in order and then assigns those groups to different consumers that can perform the ordered tasks in parallel. Kafka and Redis Streams handle this process a little differently. How they differ will become clearer when you look at each system's implementation.

How Kafka handles it

In Kafka, you can send metadata called a "key" (aka a "message key") along with the message. When you do that, messages with the same key will end up in the same partition. This helps with message ordering. Message keys are also useful for other things, such as log compaction, but we'll not cover that here.

Secondly, Kafka uses the concept of "consumer groups", where you define a number of individual consumers as part of the same consumer group. Kafka will then ensure that messages are distributed across the different consumers that are part of that group. This helps in scaling consumption and also avoids "fan out", so each message is read by only one consumer. Another key aspect of consumer groups is that each consumer in a group is tied to a single partition and is allowed to read messages from just that partition; it cannot read messages from multiple partitions. This way, when you combine message keys and consumer groups, you wind up with highly distributed consumption where message ordering is guaranteed.

Let's look at an example to make it clear.


Referring to Picture 23, let's say you are processing emails for an ecommerce store. You need to send the following emails, in the following order:

1. Payment Received
2. Product Shipped
3. Product Delivered

In this case, to make sure they are sent in that order, we can use the order id ("order1234") as the key we send to Kafka, ensuring that all the messages end up in the same partition.

Picture 23: How consumer groups and message keys work in Kafka. Each consumer in the "Email Application" group reads from one partition; the fourth consumer is an extra stand-by.


And secondly, we can define a consumer group "Email Application" and designate four consumers as part of that group. Kafka will then internally connect each consumer to one partition (1:1 mapping).

• If there are more consumers than partitions, the additional consumers will be kept idle and won't receive any messages. So, in our example, the fourth consumer will be left idle.
• If there are fewer consumers than partitions, then some of the consumers will receive data from multiple partitions. However, there will still be only one consumer (that's part of the group) per partition.

Let's see how this actually looks in the CLI.

1. Create a producer to send messages to the "Email" topic with a key. Use the "=" sign to separate the "key" (e.g. orderId1234) and the "value" (the actual message content).

$ kafka-console-producer.sh --broker-list localhost:9092 --topic Email --property "parse.key=true" --property "key.separator=="

2. Then, send three messages with different statuses: "payment_received", "product_shipped", and "product_delivered".

> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "payment_received"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "product_shipped"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "product_delivered"}

3. Create four consumers within the consumer group "EmailApp" and connect them to the "Email" topic (in four different CLI windows).

kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --group EmailApp --topic Email
kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --group EmailApp --topic Email
kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --group EmailApp --topic Email
kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --group EmailApp --topic Email

4. Because every message shares the key orderId1234, they all land in the same partition, so just one of the consumers will receive all three messages, in order:

> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "payment_received"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "product_shipped"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "product_delivered"}


How Redis Streams handles it

Although, like Kafka, Redis Streams has a concept of "consumer groups", it operates differently. In fact, you don't need it for this specific use case. We'll learn in the next section how Redis uses consumer groups, but for now let's see how in-order and in-parallel message processing works in Redis.

Creating streams in Redis is cheap. You simply define multiple streams (essentially simulating "partitions") based on some simple hash algorithm and then send the messages to those different streams. Once you do, you will be able to use different consumers to consume those messages in order and in parallel.

Let's take an example. Say someone purchases three different products. For each product, you have "payment_received", "product_shipped", and "product_delivered" messages (for a total of nine), and you want to process them in order but also in parallel.

In the example below (Picture 24), yellow, purple, and pink represent three products. Each product has three messages representing its different states. As you can see, if you want to process three messages at a time, simply create three streams and send each product's data into a specific stream based on the product id or some other unique identifier. This is similar to "keys" in Kafka. After that, connect each consumer to each stream (i.e., 1:1 mapping). This is also similar to Kafka. Then you'll get both parallel processing and in-order message processing at the same time. As we already mentioned, unlike with Kafka, with Redis you don't really need consumer groups in this case.

Picture 24: Using Redis Streams to process multiple messages in parallel and in order, with each email service receiving all messages from its stream, in order


Let's see how it looks in the CLI.

1. Hash a unique id, such as the order id, and take a modulus of the number of streams to determine which stream (Email:P0, Email:P1, or Email:P2) to send the data to.

var streamName = "Email:P" + (murmurHash("order1234") % 3)

All we need to do is convert a string like the order id into a number using a popular hash called the "murmur" hash and then take the modulus of the number of streams. (Note: the earlier round-robin counter isn't used here, because round-robin would scatter an order's messages across streams and break their ordering; hashing the order id keeps them together.)

Note: Kafka also uses the "murmur" hash to convert the message key into a number, and there are "murmur" libraries in every language, including Node.js.

2. Send the messages to the appropriate stream. Notice that, because of the hash we employed in the step above, we have a 1:1 mapping between the order id and the stream name. So, for example, all the messages with order id "order1234" will go to the "Email:P0" stream.

XADD Email:P0 * id order1234 name "light bulb" price "$1.00" status "payment_received"
XADD Email:P0 * id order1234 name "light bulb" price "$1.00" status "product_shipped"
XADD Email:P0 * id order1234 name "light bulb" price "$1.00" status "product_delivered"

XADD Email:P1 * id order2222 name "chair" price "$100.00" status "payment_received"
XADD Email:P1 * id order2222 name "chair" price "$100.00" status "product_shipped"
XADD Email:P1 * id order2222 name "chair" price "$100.00" status "product_delivered"

XADD Email:P2 * id order5555 name "Yoga book" price "$31.00" status "payment_received"
XADD Email:P2 * id order5555 name "Yoga book" price "$31.00" status "product_shipped"
XADD Email:P2 * id order5555 name "Yoga book" price "$31.00" status "product_delivered"


3. Here's how just one of the consumers (the one reading "Email:P0") will receive all three messages, in order.

> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "payment_received"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "product_shipped"}
> orderId1234={id:"orderid1234", name: "light bulb", price: "$1.00", status: "product_delivered"}


The role of consumer groups in Redis Streams

In Redis Streams, although there is no concept of "message keys" as in Kafka, you can still get message ordering without them. Redis Streams does have a concept of "consumer groups", but again it works differently from Kafka. First, let's understand how consumer groups work in Redis Streams; later we'll see how Redis Streams handles message ordering.

In Redis Streams, you can connect multiple consumers that are part of the same consumer group to a single stream and do parallel processing without the need for partitions.

In the example below (Picture 25), we have created a consumer group called "Email Application" and have made three consumers part of that group. Each consumer is asking for one message at the same time, for concurrent processing. In this case, Redis Streams simply distributes unread (unconsumed) messages to each consumer.

Picture 25: Each member of the consumer group "Email Application" has concurrently requested one message and has been given one unread message

Note: Each consumer within the consumer group needs to identify itself with a name. In Picture 25, we have named the services that are part of the "Email Application" group "emailService1", "emailService2", and "emailService3".


Let's see how it looks in the CLI.

1. Create a stream (Email) and a consumer group (Email Application - "EmailApplnGroup"), and set the group to read all messages from the beginning ("0"). If you use "$" instead of "0", the group will receive only new messages; if you provide any other id, it'll start reading from that id. MKSTREAM makes a new stream if the stream doesn't already exist.

XGROUP CREATE Email EmailApplnGroup 0 MKSTREAM

2. Add three messages to the stream.

XADD Email * subject "1st email" body "Hello world"
XADD Email * subject "2nd email" body "Hello world"
XADD Email * subject "3rd email" body "Hello world"

3. Consume messages using the three consumers, each asking concurrently for one email.

XREADGROUP GROUP EmailApplnGroup emailService1 COUNT 1 STREAMS Email >
XREADGROUP GROUP EmailApplnGroup emailService2 COUNT 1 STREAMS Email >
XREADGROUP GROUP EmailApplnGroup emailService3 COUNT 1 STREAMS Email >

4. In this case, each consumer will receive one message. For example, the first consumer receives:

1) 1) "Email"
   2) 1) 1) 1526569495631-0
         2) 1) "subject"
            2) "1st email"
            3) "body"
            4) "Hello world"

Notes:
• The ">" means: send me new messages, that is, ones that have not been read by anyone (including me).
• Caution: If you use "0" or a timestamp instead of ">", then Redis Streams will only return messages that have already been read (but not acknowledged) by the current consumer; it doesn't return the messages from the main stream. This is because, when a consumer is part of a group, an additional list called a "pending list" is created and treated as a micro-stream that holds messages for that particular consumer. This is a little tricky to understand; we'll discuss it in a bit.
• "COUNT 1" means: give me just one message.


As mentioned earlier, unlike Kafka, each consumer within the group can ask for as many messages as it wants. In Picture 26, we have "emailService1" asking for two messages instead of one, while at the same time "emailService2" is asking for one. Finally, a little bit later, "emailService3" asks for one message. In this case, emailService1 gets to process two messages and emailService2 gets to process one, but emailService3 doesn't wind up with any, because there are no more unclaimed messages available.

Picture 26: How consumer groups work when one consumer asks for more messages than the others
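Continuing the CLI example from the previous section, the Picture 26 scenario would look something like this:

XREADGROUP GROUP EmailApplnGroup emailService1 COUNT 2 STREAMS Email >
XREADGROUP GROUP EmailApplnGroup emailService2 COUNT 1 STREAMS Email >
XREADGROUP GROUP EmailApplnGroup emailService3 COUNT 1 STREAMS Email >

With only three unread messages in the stream, the last call comes back empty.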


This scenario doesn't have to be limited to a single consumer group. It's possible to have multiple consumer groups, as well as regular consumers that are not part of any group, all consuming messages at the same time. In Picture 27, there are two different consumer groups ("Email Application" and "Payment Application") as well as a regular consumer (the Dashboard Service, which is not part of any group), all consuming the messages.

Picture 27: Multiple consumer groups and regular consumers all consuming messages at the same time


And finally, a consumer or a consumer group can consume data from multiple streams (see Picture 28).

Picture 28: The consumer group (Payment Application) and a consumer (Dashboard Service) consuming data from two streams (Email and Orders)


To consume from multiple streams, you simply need to list them. This is how it looks in the CLI.

1. Have a consumer group's (Payment Application - "paymentApplnGroup") consumer (paymentService1) get two (COUNT 2) unread messages from the Email stream (Email >) and also from the Orders stream (Orders >):

XREADGROUP GROUP paymentApplnGroup paymentService1 COUNT 2 STREAMS Email > Orders >

2. Have the dashboard service get messages from the beginning of both the Email (Email 0) and Orders (Orders 0) streams, and also wait for any new messages in a blocking fashion (BLOCK 0):

XREAD BLOCK 0 STREAMS Email 0 Orders 0

Now that you have seen the basics of how stream processing works, let's look at how some of the challenges are addressed. One of the most effective ways to handle them is via "message acknowledgements".

Let's dig in.


How messages are acknowledged

In the context of stream processing, acknowledgement is simply a way for one system to confirm to another system that it has received a message or that it has processed that message.

Message acknowledgements can be used to solve the following four stream processing challenges:

1. Providing message delivery guarantees for producers
2. Providing message consumption guarantees for consumers
3. Enabling retries after temporary outages
4. Permitting reassignment following a permanent outage

1. Providing message delivery guarantees for producers. Once a message has been sent, how can we be sure that it has been received? We need the streaming system to acknowledge that it has in fact safely stored the incoming message.

Picture 29: How stream processing systems acknowledge message reception to producers

2. Providing message consumption guarantees for consumers. There needs to be a way for the consumer to acknowledge back to the system that it has successfully processed the message.

Picture 30: A consumer acknowledgement to the streaming system after processing the message


3. Enabling retries after temporary outages. We need to be able to reprocess messages in the event that a consumer dies while processing them. For this we need to have a mechanism that enables the consumer to acknowledge to the system that it has processed a message. And if there is an issue, the processing system needs to provide a way to re-process that message in case of a temporary failure (Picture 31).

Picture 31: If the consumer fails to process the message on the first try, a mechanism is needed that enables it to retry processing the message until it has been successfully processed and that enables it to acknowledge when the message has finally been processed.

4. Permitting reassignment following a permanent outage. And lastly, if the consumer permanently fails (say, crashes), we need a way to either assign the job to a different consumer or allow different consumers to find out about the failure and take over the job (Picture 32).

Picture 32: When a consumer, while attempting to read new messages (1), permanently crashes (2), a new consumer takes over (3), successfully processes the messages, and then sends back an acknowledgment to the streaming system (4).

Now let’s look at how Kafka and Redis Streams handle each one of these.


Letting the producer know that the message has been delivered

With Kafka

In Kafka, you can have the following configurations:

Ack = 0: Send the message but don’t wait for any confirmation (you may lose data, but it will be extremely fast).

Ack = 1: At least one of the nodes in the cluster must acknowledge receipt.

Ack = All: The master and all replicas must acknowledge that they have received the messages. This can be slow but will ensure that the message has been stored successfully in both the master and the replicas.

(A minimal producer-configuration sketch for these three settings appears after Picture 33 below.)

With Redis

In Redis Streams (especially in Redis Enterprise), you have two ways to acknowledge that a message has been delivered. You can configure Redis clusters to have weak consistency (but more throughput) or strong consistency (with a little less throughput).

Configuring weak consistency and durability

Let’s see how the weak consistency and durability configuration works. Once configured, it works the same for all types of Redis keys, including Redis Streams. This is somewhat equivalent to “ack=1” in Kafka.

Any updates that are issued to the database are typically performed with the following flow:

1. The application issues a write to the proxy.
2. The proxy communicates with the correct master “shard” in the system that contains the given key.
3. Once the write operation is complete, an acknowledgement is sent back to the proxy.
4. The proxy sends the acknowledgement back to the application.

Independently, the write is communicated from master to replica, and replication acknowledges the write back to the master. These are steps 5 and 6.

Independently, the write to a replica is also persisted to disk and acknowledged within the replica. These are steps 7 and 8.

Picture 33: How weak consistency configuration works


Configuring strong consistency and durability

Here’s how strong consistency and durability work in Redis Streams. This is equivalent to “ack=All” in Kafka. (To read more about consistency and durability, see https://docs.redislabs.com/latest/rs/concepts/data-access/consistency-durability/)

Option 1

With the WAIT command, applications can ask to wait for acknowledgments only after replication or persistence is confirmed on the replica. The flow of a write operation with the WAIT command is shown below:

1. The application issues a write.
2. The proxy communicates with the correct master “shard” in the system that contains the given key.
3. The acknowledgment is sent to the proxy once the write operation completes.
4. The proxy sends the acknowledgement back to the application.

Independently, the write is communicated from the master to the replica, and replication acknowledges the write back to the master. These are steps 5 and 6.

Independently, the write to a replica is also persisted to disk and acknowledged within the replica. These are steps 7 and 8.

Picture 34: How strong consistency configuration (option 1) works
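
Here is a minimal redis-py sketch of the WAIT-based flow described above; the stream name and replica count are assumptions.

import redis

r = redis.Redis(decode_responses=True)

msg_id = r.xadd("Email", {"subject": "1st email", "body": "Hello world"})

# Block until at least 1 replica acknowledges the write, or 1000 ms elapse.
acked = r.wait(1, 1000)
if acked < 1:
    print(f"Message {msg_id} not confirmed on any replica yet")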


With this flow, the application only gets the acknowledgment from the write after durability is achieved with replication to the replica and to the persistent storage.

With the WAIT command, applications can have a guarantee that even under a node failure or node restart, an acknowledged write will be recorded. See the WAIT command for details on the durability and consistency options.

Option 2

RedisRaft is another Redis project designed to make strong consistency even better. You can learn more about it here: https://redislabs.com/blog/redisraft-new-strong-consistency-deployment-option/

Picture 35: The RedisRaft logo

Summary: The two approaches compared

Consistency and durability | Kafka | Redis (Redis Streams)
Option 1 | Ack = 0 (doesn’t wait for acknowledgement) | Ignore or don’t wait for acknowledgement
Option 2 | Ack = 1 (wait for the leader, but not replicas, to acknowledge) | “Weak consistency” configuration, where you get acknowledgement only from the leader
Option 3 | Ack = All (wait for all replicas to acknowledge) | “Strong consistency” configuration (wait for all replicas to acknowledge), or use RedisRaft


Letting the consumer know that the message has been received

In order to understand consumption guarantees, you’ll need to know a little more about some of the inner workings of Kafka and Redis Streams, more specifically the concepts of “offsets” in Kafka and “pending lists” in Redis Streams. These concepts, in conjunction with acknowledgements, will help to solve the challenge of providing consumption guarantees.

So let’s take a look at “offsets” in Kafka and then “pending lists” in Redis Streams before we return to consumption guarantees.

The role of offsets in Kafka’s consumption acknowledgements

Offsets are simply incremental ids that are given to each message within a given partition of a given topic. They start from 0 for each partition, so there can be multiple messages with the same offset id. The only way to uniquely identify a message in the entire system is by combining the offset id with the partition id and the topic name (because there could be multiple topics with the same partition ids).

Picture 36: How offsets look in Kafka


Committing offsets (i.e., consumer acknowledgement): When a consumer processes a message or a bunch of messages, it acknowledges this by telling Kafka the offset it has just consumed. In Kafka, this can be automatic or manual. Following the consumer’s acknowledgement, this information is written to an internal topic called “__consumer_offsets”, which acts as a tracking mechanism. This is how Kafka knows what message to send next to each consumer.

Picture 37: Consumers have processed up to offset 2 in partition 0, up to offset 4 in partition 1, and up to offset 0 in partition 2.
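
Here is a minimal kafka-python sketch of manual offset commits; the broker address, topic, and group id are assumptions.

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "Email",
    bootstrap_servers="localhost:9092",
    group_id="emailApplnGroup",
    enable_auto_commit=False,  # leave True (the default) for automatic commits
)

for message in consumer:
    print(message.partition, message.offset, message.value)
    consumer.commit()  # records the offset in the __consumer_offsets topic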


This leads to three delivery methods, each with its own advantages.

1. At most once: In this case, the consumer provides acknowledgement as soon as it receives the message, even before it has had a chance to process it. Although this leads to higher throughput, if the consumer dies before it’s able to actually process the message, that message will be lost. That’s why this method is called “at most once.” The consumer has only one chance to request a group of messages. Any messages it is unable to process will be lost.

For example, in Picture 38, a consumer receives three messages and acknowledges the offset for each before processing them. As it turns out, it couldn’t actually process the third message (offset-2) successfully. But since the offset has already been committed, the next time it asks for new messages, Kafka will send them from offset-3 onwards. As a result, the message with offset-2 will fail to be processed.

Picture 38: How at-most-once message processing works


2. At least once: In this case, the consumer will commit only after processing. Let’s imagine that for performance reasons the consumer is reading three messages at once and committing once after processing all three messages. Let’s say it processed two successfully but crashed before it was able to process the third one. In this case, the consumer (or a different consumer) can come back and request these messages from Kafka again. And because the messages were never committed, Kafka will send all three messages again. As a result, the consumer will end up re-processing messages that were already processed (i.e., duplicate processing). This approach is called “at least once” because the consumer isn’t limited to a single request.

Picture 39: How at-least-once processing works


In the illustration above, assume that a consumer is processing three messages at a time and committing an offset after it has processed all three. Here is how it works:

1. A consumer reads messages with offsets 0, 1, and 2.
2. It processes them.
3. It commits offset 2.
4. The next time it asks, it gets messages with offsets 3, 4, and 5.
5. Let’s say it processes offsets 3 and 4 but crashes while processing offset 5.
6. A new consumer (or the same consumer) requests messages from Kafka.
7. Kafka will again return messages with offsets 3, 4, and 5.
8. Let’s say this time all three are successfully processed. That’s good, but it leads to duplicate processing of 3 and 4.

The way to mitigate this is to process the messages in a way that is idempotent. This means that even if you process a message multiple times, the end result won’t change. For example, if you set the exact price of some product in a database multiple times, it won’t matter. However, there are certain situations where it does matter, so you need to be careful. (A tiny illustration follows at the end of this subsection.)

3. Exactly once: As the name suggests, this simply means that you figure out a way to ensure that a message is processed once and no more. For this you typically need extra support (programming logic) to ensure and guarantee it, because there could be various reasons for duplicate processing. Kafka only provides this level of guarantee out of the box with Kafka-to-Kafka streams.

Now that we’ve seen how Kafka provides message consumption guarantees, let’s take a look at how Redis handles them. But first, in order to do so, we need to delve into the Redis concept of the “pending list.”
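
Here is the tiny idempotency illustration promised above; it is plain Python and purely illustrative.

# Idempotent: repeating the write leaves the same end state.
def set_price(db, product_id, price):
    db[product_id] = price       # processing this message twice changes nothing

# Not idempotent: repeating the write changes the result.
def apply_deposit(db, account_id, amount):
    db[account_id] += amount     # duplicate processing double-counts the deposit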

The role of “pending lists” in Redis’ consumption acknowledgements

Remember that in Redis Streams there’s no need for multiple partitions. Multiple consumers that are part of the same consumer group can all connect to a single stream and yet still process messages concurrently within that stream. To ensure that these consumers don’t process duplicate messages, Redis Streams uses an additional data structure called the “pending list” to keep track of messages that are currently being processed by one of the consumers.


Looking at Picture 40, “emailService1” has asked for two messages and “emailService2” has asked for one. After the messages have been received, Redis Streams puts a copy (or a pointer) of them in a separate list for each consumer. So “12-0” and “12-1” are added to the list for “emailService1” and “12-2” is added to the list for “emailService2”. In addition, it updates the “last_delivered_id” to “12-2”.

This allows for three key things:

1. The “last_delivered_id” ensures that only unread messages are delivered to future requests from consumers of that same group. This is kind of like the “offset commits” in Kafka.
2. The pending lists allow consumers, should they temporarily die during processing (that is, before acknowledgement), to pick up where they left off.
3. The pending lists also allow other consumers to claim pending messages (using XCLAIM) in case the death of the original consumer proves to be permanent.

Picture 40: How pending lists work in Redis Streams
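
Here is a minimal redis-py sketch of inspecting pending entries and claiming them for another consumer; the names and the idle threshold are assumptions.

import redis

r = redis.Redis(decode_responses=True)

# Summary of the group's pending entries (count, id range, per-consumer totals).
print(r.xpending("Email", "EmailApplnGroup"))

# Detailed view: up to 10 pending entries belonging to emailService1.
pending = r.xpending_range("Email", "EmailApplnGroup", min="-", max="+",
                           count=10, consumername="emailService1")

# Claim entries idle for over 60 seconds so emailService3 can take over
# after what looks like a permanent failure of the original consumer.
ids = [entry["message_id"] for entry in pending]
if ids:
    claimed = r.xclaim("Email", "EmailApplnGroup", "emailService3",
                       min_idle_time=60000, message_ids=ids)
    print(claimed)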


Now, let’s imagine that “emailService2” has completed its processing (Picture 41) and acknowledges this. Redis Streams responds by removing the processed item from that consumer’s pending list.

Picture 41: Following acknowledgment from emailService2, the message “12-2” is removed from the pending list.


Let’s see how it looks in the CLI.

1. Create a stream (Email) and a consumer group, “Email Application” (EmailApplnGroup), and set it to read all messages from the beginning (“0”). Note: If you use “$” instead of “0”, then it’ll deliver only new messages. Also, if you provide any other id, then it’ll start reading from that id. Note: MKSTREAM makes a new stream if the stream doesn’t already exist.

XGROUP CREATE Email EmailApplnGroup 0 MKSTREAM

2. Add six messages to the stream.

XADD Email * subject "1st email" body "Hello world"
XADD Email * subject "2nd email" body "Hello world"
XADD Email * subject "3rd email" body "Hello world"
XADD Email * subject "4th email" body "Hello world"
XADD Email * subject "5th email" body "Hello world"
XADD Email * subject "6th email" body "Hello world"

3. Let’s consume a message from the “emailService2” consumer that’s part of the “EmailApplnGroup” group from the “Email” stream. (The trailing “>” asks for messages that haven’t yet been delivered to any consumer in this group.)

XREADGROUP GROUP EmailApplnGroup emailService2 COUNT 1 STREAMS Email >

//This will return a message that'll look like this
1) 1) "Email"
   2) 1) 1) 1526569495632-1
         2) 1) "subject"
            2) "3rd email"

4. Imagine we processed that message and we acknowledged that.

XACK Email EmailApplnGroup 1526569495632-1
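
The same flow in redis-py (a sketch; connection details assumed):

import redis

r = redis.Redis(decode_responses=True)

# 1. Create the group, creating the stream too if needed.
try:
    r.xgroup_create("Email", "EmailApplnGroup", id="0", mkstream=True)
except redis.ResponseError:
    pass  # the group already exists

# 2. Add a message.
r.xadd("Email", {"subject": "1st email", "body": "Hello world"})

# 3. Consume one undelivered message as emailService2.
for stream, entries in r.xreadgroup("EmailApplnGroup", "emailService2",
                                    {"Email": ">"}, count=1):
    for msg_id, fields in entries:
        # 4. Process it, then acknowledge.
        print("processing", msg_id, fields)
        r.xack("Email", "EmailApplnGroup", msg_id)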


As you can imagine, with Redis Streams you can easily apply the same delivery approaches used with Kafka. Let’s take a look.

1. At most once: In this case, you send an acknowledgement when the messages have been received but before they’ve been processed. Using Picture 42 for reference, let’s imagine that “emailService2” acknowledges before fully processing the message in order to quickly consume more messages, and that losing some message processing doesn’t matter. In this case, if the consumer crashes after acknowledgement but before processing the message, then that message’s processing would be lost. Note that the message is still in the stream, so you can potentially re-process it, although you’ll never know if you’ll need to or not.

Picture 42: “At most once” processing in Redis Streams
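
A minimal redis-py sketch of this pattern (names assumed): acknowledge right after the read, before doing the work. Note that Redis also offers a NOACK option on XREADGROUP, which skips the pending list entirely and is an even stricter form of at-most-once.

import redis

r = redis.Redis(decode_responses=True)

def process(fields):
    print("sending email", fields)

for stream, entries in r.xreadgroup("EmailApplnGroup", "emailService2",
                                    {"Email": ">"}, count=1):
    for msg_id, fields in entries:
        r.xack("Email", "EmailApplnGroup", msg_id)  # ack first...
        process(fields)  # ...then process; a crash here loses the work

# Alternative: never add the delivered message to the pending list at all.
r.xreadgroup("EmailApplnGroup", "emailService2", {"Email": ">"},
             count=1, noack=True)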


2. At least once: Again, this is very similar to Kafka. A message is only acknowledged after it’s been processed. Here, if a consumer acknowledges only after processing multiple messages, and it crashes during the processing of one of those messages, then you’ll end up re-processing all of the messages, not just the ones that failed to be processed.

Picture 43: “At least once” processing in Redis Streams

In the example above, we have a consumer group called “Email Application” with two consumers (“emailService1” and “emailService2”).

1. “emailService1” reads the first two messages, while “emailService2” reads the third message at the same time.
2. The pending list of emailService1 stores the first two messages, and similarly the pending list of emailService2 stores the third message.
3. “emailService1” starts to process both messages (and hasn’t committed yet). However, let’s say it temporarily crashes after processing the first message but before processing the second message.
4. When “emailService1” comes back later and reads from the pending list, it’ll again see both messages in that list.
5. As a result, it will process both messages.
6. Because of step 5, the consumer ends up processing the first message twice.

And this is why it’s called “at least once.” Although ideally all pending messages will be processed in one pass, it may require more.

3. Exactly once: In Redis Streams, you have multiple ways of ensuring that each message is processed exactly one time.

Option 1: Because Redis Streams is extremely fast, you can read just one message at a time and acknowledge it after that message has been successfully processed. In this scenario, you’ll always have just one message in the pending list.

Picture 44: “Exactly once” (Option 1) processing in Redis Streams (done by processing one message at a time)


Option 2: As an alternative to Option 1, you can also use additional data structures, such as Redis Sets, to keep track of messages that have already been processed by some consumer. This way you can check the set and make sure the message’s id is not already a member of the set before you process it.

Picture 45: “Exactly once” (Option 2) processing in Redis Streams (using a set data structure to keep track of the messages that have already been processed, which helps with exactly-once delivery semantics)
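
A minimal redis-py sketch of this set-based option (the set name is an assumption):

import redis

r = redis.Redis(decode_responses=True)

for stream, entries in r.xreadgroup("EmailApplnGroup", "emailService1",
                                    {"Email": ">"}, count=2):
    for msg_id, fields in entries:
        if not r.sismember("Email:processed", msg_id):
            print("processing", fields)        # do the actual work
            r.sadd("Email:processed", msg_id)  # remember that it's done
        r.xack("Email", "EmailApplnGroup", msg_id)

In production you'd want the membership check, the side effect, and the SADD to happen atomically (for example, via a transaction or a Lua script), since a crash between those steps reopens the duplicate-processing window.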


The role of clusters in Kafka and Redis

In this section we’ll go over some high-level aspects of clusters. This is a very deep topic, so we’ll only cover the key aspects of it.

Kafka Clusters

Kafka is a distributed system. That means you typically wouldn’t use it with just one server but would be more likely to use it with at least three servers. Clusters provide high availability and durability. If one of the servers goes down, the others can still keep serving the clients.

In Kafka each server is called a broker. In production, Kafka clusters might have anywhere from three brokers (the minimum) to hundreds of brokers.

The example below (Picture 46) shows how a typical Kafka cluster would look: a cluster with three brokers (servers) and two topics (Email and Payment), where the Email topic has three partitions spread across three brokers (10, 11, and 12) and the Payment topic has two partitions spread across two brokers (11 and 12).

Picture 46: A Kafka cluster consisting of three brokers (servers), two topics (Email and Payment), and five partitions


Redis Clusters

With Redis, things work pretty much the same way. In the example below, the Redis cluster has three nodes. The messages for “Email” are sent to three different streams that live on three different nodes. The messages for “Payment” are sent to two different streams that live on two different nodes.

The only caveat if you’re using the OSS cluster is that you don’t have a proxy in front of the cluster nodes. That means your client libraries will need to manage where the data goes by directly connecting to each node within the cluster. But thankfully, the cluster APIs make it very easy, and most of Redis’ client libraries in all programming languages already support it.

Picture 47: A Redis OSS cluster with three brokers (servers), two topics (Email and Payment), and five streams


On the other hand, Redis Enterprise provides a proxy layer on top of the clusters. This way the client can just connect to the proxy and doesn’t have to worry about exactly which server the data is going to or coming from.

By the way, in Redis clusters, if the key contains curly braces (“{}”), then only the text within those curly braces will be hashed. This means you can name the keys “Email:{P0}” and “Payment:{P1}”.

Picture 47a: A Redis Enterprise cluster with three brokers (servers), two topics (Email and Payment), and five streams
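
Here is a minimal sketch of hash tags with the redis-py cluster client (the port and key names are assumptions): keys that share the same “{...}” tag hash to the same slot and therefore live on the same node.

from redis.cluster import RedisCluster

rc = RedisCluster(host="localhost", port=7000, decode_responses=True)

# Both keys hash on "P0" because of the {P0} tag, so they land on the same node.
rc.xadd("Email:{P0}", {"subject": "hello"})
rc.xadd("Orders:{P0}", {"item": "42"})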


By the way, in Redis you can run multiple Redis instances on the same node. These are called shards.

Picture 48 illustrates how the Redis cluster helps scale Redis. Here is how it works:

1. Let’s say you are running four Redis instances (four shards) on a single node, and imagine you have split the data across these four instances. This is mainly for parallel processing and to fully utilize all the CPU and memory resources.
2. Now, let’s say you want to move to two machines, that is, you want to “scale out”. So you add a second machine. At this point you have scaled out, and this node is now part of the cluster. But this new node is empty to begin with: it contains no Redis instances or data.
3. Next, let’s say you move two shards to the second node in order to better distribute the load. This is called “rebalancing”.
4. Finally, in order to increase parallel processing and to fully utilize all the CPUs, you may add more Redis instances and split the data across those instances. This is called “resharding”. Let’s say you’ve added two more instances/shards in each node. In the beginning these new instances won’t have any data, so you need to use a process called resharding to split and move some of the existing data. In the end you wind up with a total of eight shards and much higher throughput.

Picture 48: Using Redis Enterprise to increase your throughput by scaling out, rebalancing, and resharding your data

