Caching at Scale With Redis
Cloud Caching Techniques for Enterprise Applications
Lee Atchison
Updated 2021-12-04
To Beth
My life, my soulmate, my love.
Foreword
But when we sat down to make the book a reality, we quickly realized that we
needed an expert, independent voice to be the author. Someone with the
expertise to tell the whole story. Someone with the technical credibility to be
taken seriously by serious developers. Someone who could connect the technical
details to the business impacts. Someone with the wide-ranging background
needed to see the larger picture and tell the story from a vendor-neutral,
third-party perspective. And someone with the independence to always keep the
needs of enterprise readers in mind. Oh, and we needed someone who could
explain it all in plain English and straightforward visuals, so it would make sense
to as many folks as possible—without dumbing things down and while still
bringing real value and insight to experts. A tall order, to be sure.
Lee Atchison was obviously the right person for the job.
Lee spent seven years at Amazon Web Services, where among many other
accomplishments, he was the senior manager responsible for creating the AWS
Elastic Beanstalk Platform-as-a-Service (PaaS). In his eight years as a Principal
Cloud Architect and Senior Director of Strategic Architecture at New Relic, he
was instrumental in crafting the monitoring company’s cloud product strategy.
That’s not all. Lee is a widely quoted thought leader in publications including
diginomica, IT Brief, ProgrammableWeb, CIOReview, and DZone. He has jetted
around the world, giving eagerly received technical talks everywhere from
London to Sydney, Tokyo to Paris, and all over North America. He’s been a
regular guest on technology podcasts, articulating deep technical insights with
humor and good cheer—and he now hosts his own “Modern Digital
Applications” podcast. And perhaps most importantly, he’s an accomplished
author: his well-received and recently updated book, Architecting for Scale: High
Availability for Your Growing Applications, attracts long lines of developers queueing
up at conferences and trade shows for the chance to receive a signed copy.
That’s the story of how this book came to be, and why we think it’s so uniquely
valuable. We hope you find it useful and informative.
Acknowledgements
I’d like to thank Redis for its help creating this book. Specifically, I would like to
thank Fredric Paul. “The Freditor,” as we love to call him, was instrumental in
getting this book published, along with many of my past publications. Fred,
despite being dual-first-named, you are a great friend, and I love working with
you.
I would also like to thank Alec Wagner, who was the editor for this book. I’m
sure I will hear the phrases “passive voice vs. active voice” and “sentence case vs.
upper case” in my sleep for years to come.
And of course, a big thank you to my lovely wife, Beth, who is always there for
me and supports me and my career with such warmth and love. Finally, a shout
out to my fur family: Issie, who contributes background noises (snoring) to my
conference calls; Abbey, who adds smiles and wags; and Buddha, whose love for
keyboard walking is responsible for any typos you may encounter here.
Doing this at the high scale demanded by our modern world is a challenge that
requires significant resources—and those demands are constantly growing based
on ever-changing needs. Handling these needs while maintaining high
availability is simply the cost of entry for any modern application.
While there are many methods, processes, and techniques for making an
application retain high availability at scale, caching is a central technique in almost
all of them. Effective caching is the hallmark of an effectively scalable application.
This book describes what caching is, why it is a cornerstone of effective large-
scale modern applications, and how Redis can help you meet these demanding
caching needs.
It then covers the different types of caching strategies and the architectural
patterns that can be implemented with Redis. We then discuss cache scaling,
cache consistency, and how caching can be utilized in cloud environments. We
end with a discussion of measuring cache performance, and conclude with a
glossary of important cache terminology.
The audience for this book includes senior-level software engineers and software
architects who may already be familiar with Redis and are interested in
expanding their uses of Redis. They are building and operating highly scaled,
complex applications with large and often unwieldy datasets, and they are
interested in how Redis can be used as their primary caching engine for
managing data performance.
You will learn how caching, and hence Redis, fits into your complex
application architecture and how platforms such as Redis Enterprise can help
solve your problems.
“A cache is a place to hide things… no, wait—that’s the wrong type of cache.”
In software engineering, a cache is a data-storage component that can be
accessed faster or more efficiently than the original source of the data. It is a
temporary holding location for information so that repeated access to the same
information can be acquired faster and with fewer resources. Figure 2-1
illustrates a consumer requesting data from a service, and the data then being
returned in response.
But what happens if the data service has to perform some complex operation in
order to acquire the data? This complex operation can be resource intensive,
time intensive, or both. If the service has to perform the complex operation
every time a consumer requests the data, a significant amount of time and/or
resources can be spent retrieving the same data over and over again.
Instead, with a cache, the first time the complex operation is performed, the
result is returned to the consumer, and a copy of the result is stored in the cache.
The next time the data is needed, rather than performing the complex operation
all over again, the result can be pulled directly out of the cache and returned to
the consumer faster and using fewer resources. This particular caching strategy is
commonly referred to as cache-aside—more on that later.
HTTP application cache: When you access a web page, depending on the
website, the page may require a fair amount of processing in order for the page
to be created, sent back to you, and rendered in your browser. However, the
page—or significant portions of the page—may not change much from one
request to the next. A cache is used to store pages and/or portions of pages so
they can be returned to a user faster and using fewer resources than if the cache
was not used. This results in a more responsive website, and the capability for the
website to handle significantly more simultaneous users with a given number of
computational resources. An HTTP application cache is illustrated in section 1
of Figure 2-2.
Web browser cache: Your web browser itself may also cache some content so
that it can be displayed almost instantly rather than waiting for it to be
downloaded across a significant distance and potentially over a slower internet
link. This browser cache is particularly effective for images and other static
content. In Figure 2-2, section 3 is a web browser cache.
CPU cache: There are many caches within computers, including in the central
processing unit (CPU). The core CPU caches executed commands so that
repeated execution of the same or similar instructions can occur considerably
faster. In fact, much of the increased speed of computers in recent years is due to
improvements in how commands are executed and cached, rather than actual
clock speed improvements. Figure 2-4 illustrates this type of cache.
Memory cache: RAM is fast, but not fast enough for high-speed CPUs. In
order for data to be fetched fast enough from RAM to keep up with high-speed
CPUs, RAM is cached into an extremely fast cache and commands are executed
from that cache.
Disk cache: Disk drives are relatively slow at retrieving information. RAM,
used as the storage medium for a cache, can be put in front of the disk so that
repeated common disk operations (such as retrieving the contents of a directory
listing) can be read from the cache rather than waiting for the results to be read
from the disk.
2. The cache must be able to store and properly retrieve the result faster,
using fewer resources, than the original source.
5. The data must be needed more than once. The more times it is needed,
the more effective and useful the cache is.
For a cache to be effective, you need a really good understanding of the statistical
distribution of data access from your application or data source. When your data
access has a normal (bell-curve) distribution, caching is more likely to be effective
compared to a flat data access distribution. There are advanced caching strategies
that are more effective with other kinds of data access distributions.
In a dynamic cache, when a value is changed in the data store, that value is also
changed directly in the cache—an old value is overwritten with a new value. The
cache is called a dynamic cache, because the application using the cache can
write changes directly into the cache. How this occurs can vary depending on
the type of usage pattern employed, but the application using the cache can
fundamentally change data in a dynamic cache.
In a static cache, when a value is stored in the cache, it cannot be changed by the
application. It is immutable. Changes to the data are stored directly into the
underlying data store, and the out-of-date values are typically removed from the
cache. The next time the value is accessed from the cache, it will be missing,
hence it will be read from the underlying data store and the new value will be
stored in the cache.
3. Why Caching?
“Caching allows you to skip doing important things, and yet still benefit from the results… sometimes.”
Consider a simple service that multiplies two numbers. A call to that service
might look like this:
MUL 6 7
result: 42
This service could be called repeatedly from multiple sources with many different
multiplication requests:
MUL 6 7
result: 42
MUL 3 4
result: 12
MUL 373 2389
result: 891,097
MUL 1839 2383
result: 4,382,337
MUL 16 12
result: 192
MUL 3 4
result: 12
As you can see, the service can take on an entire series of multiplication requests,
process them all independently, and return the results.
But did you notice that the last request is the same as the second request? The
same multiplication call has been requested a second time. This poor little service,
however, doesn’t know that it’s already been asked to perform the request, and it
goes about all the work it needs to do to calculate the result … again.
Multiplying 3 and 4 to get 12 may not seem like a very onerous task. But it’s
more complex than you might imagine, and the operations a multiplication
service such as this might be asked to perform could be significantly more
complex. If the service already has performed the same operation and returned
the same result, why should it have to redo the same operation? Although some
services can’t skip performing a repeated operation, in a service such as this,
there is no reason to redo a calculation that has already been performed.
With a cache in place, a request to multiply 3 and 4 proceeds roughly as follows:
1. Check the cache for an entry named “3x4”.
2. If there is an entry, return the stored result without calling the service.
3. If there is not an entry, call the service and get the result of 3 times 4.
4. Store the result of the service call into the cache under the entry “3x4”.
What happens next? Well, the first time anyone wants to get the result of 3 times
4, the request has to consult the service directly, because the value has not
previously been placed in the cache. However, after the value is calculated, the
result is placed in the cache. The cache now stores the operation (3 times 4,
written as 3x4) as a key to the result of the operation (12).
The next time someone wants to get the result of 3 times 4, the request sees that
the result is already stored in the cache, and it simply returns the stored result.
The service itself doesn’t need to perform the calculation, saving time and
resources.
This cache-aside process is illustrated in Figure 3-1. In the top of this figure
(section 1), you can see a consumer making a request to the multiplication service
without a cache. Every time the consumer needs to figure out a result, it must ask
the multiplication service to process the result.
The middle of the figure (section 2) shows that a cache has been hydrated. Now,
whenever a service call is made that calculates a result, the request and the
corresponding result are stored in the cache. That way, when a repeat request
comes in, as shown in the bottom of the figure (section 3), the request can be
satisfied by the cache without ever calling the service at all.
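The flow shown in Figure 3-1 can be sketched in a few lines of code. This is only an illustration of the idea, assuming a local Redis instance, the redis-py client, and a hypothetical slow_multiply() function standing in for the multiplication service:

import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

def slow_multiply(a, b):
    # Stand-in for the "expensive" multiplication service.
    return a * b

def cached_multiply(a, b):
    key = f"{a}x{b}"              # e.g., "3x4", mirroring the example above
    cached = r.get(key)
    if cached is not None:        # cache hit: skip the service call
        return int(cached)
    result = slow_multiply(a, b)  # cache miss: call the service
    r.set(key, result)            # hydrate the cache for next time
    return result

print(cached_multiply(3, 4))  # first call computes and stores 12
print(cached_multiply(3, 4))  # second call is served from the cache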
Caching provides several important benefits, including:
• Performance improvement. Caching improves latency. Latency is the
new outage, and if you can avoid having to do a time-intensive
calculation by simply using a cached result, you can reduce the latency
for all requests that utilize the cache. Over time, this can have a huge
performance impact on your application.
1. Caching can cause the application to skip desired side effects
of targeted operations
A service like the multiplication service has no side effects: calling it does
nothing except compute and return a result. Compare this to a service that
might perform some physical action (such as turning a car’s steering wheel) or
might change data in some other system (such as updating data in a user’s
profile record). These types of services have side effects, because simply calling
the service causes changes to the application, system, or the external world.
• A service that changes the position of a car’s steering wheel has an
observable impact—it has a side effect.
• Software that stores data in a database has an external impact—it’s
changing the state of the database.
Improperly caching services with side effects is the cause of many software
failures and system outages. It is easy to introduce bugs into a system when
adding caching if side effects aren’t properly taken into account.
But this isn’t always the case. Caches are most effective when the following two
criteria are true:
1. In a cache-aside pattern, the resources it takes to check and fill the
cache are significantly smaller than the resources it takes to perform
the backing operation in the first place.
The more both of these two statements are true, the more effective the cache is.
The less either or both of these are true, the less effective a cache is. In extreme
cases, a cache can actually make performance worse. In those cases, there is no
reason to implement the cache in production.
If, for example, the resources it takes to manipulate the cache are greater than the
resources it takes to perform the backing operation in the first place, then having
a cache can make your performance worse.
Additionally, when you check to see if the correct response is in the cache, and it
is unavailable more often than not, then the cache isn’t really helping that much,
and the overhead of checking the cache can actually make performance worse.
Situations like this are not good use cases for caching.
Summary
As we’ve shown, caching can be a great way to improve the performance,
scalability, availability, and reliability of a service. When done properly, caching
is highly valuable in applications of almost any size and complexity.
But be aware that improper caching can actually make performance worse.
Even more important, improper caching can introduce bugs and failures into
your system.
“There are three hard things in building software: maintaining cache consistency and off-by-one errors.”
A typical use case for a cache is as a temporary data store in front of the
system of record. This temporary memory store typically provides faster
access to data than the more-permanent memory store. This is either because
the cache medium used is itself physically faster (e.g., RAM for the cache
compared with hard disk storage for the permanent store), or because the
cache is physically or logically located nearer the consumer of the data (such
as at an edge location or on a local client computer, rather than in a backend
data center).
At the most basic level, this type of cache simply holds duplicate copies of data
that is also stored in the permanent memory store.
When an application needs to access data, it typically first checks to see if the
data is stored in the cache. If it is, the data is read directly from the cache. This
is usually the fastest and most reliable way of getting the data. However, if the
data is not in the cache, then the data needs to be fetched from the underlying
data store. After the data is fetched from the primary data store, it is typically
stored in the cache so future uses of the data will benefit by having the data
available in the cache.
Inline cache
An inline cache—which can include read-through, write-through, and read/
write-through caches—is a cache that sits in front of a data store, and the data
store is accessed through the cache.
Take a look at Figure 4-2. If an application wants to read a value from the data
store, it attempts to read the value from the cache. If the cache has the value, it is
simply returned. If the cache does not have the value, then the cache reads the
value from the underlying data store. The cache then remembers this value and
returns it to the calling application. The next time the value is needed, it can be
read directly from the cache.
Figure 4-2. Inline cache, in which cache consistency is the responsibility of the cache
Cache-aside cache
When an application needs to read a value, it first checks to see if the value is in
the cache. If it is not in the cache, then the application accesses the data store
directly to read the desired value. Then, the application stores the value in the
cache for later use. The next time the value is needed, it is read directly from
the cache.
Cache consistency is the measure of whether data stored in the cache has the
same value as the source data that is stored in the underlying data store.
Maintaining cache consistency is essential for successfully utilizing a cache.
Take a look at Figure 4-5. In this diagram, the value of key “cost” is being
updated to the value “51”. This update is written by the application directly
into the data store. In order to maintain cache consistency, once the value has
been updated in the data store, the value in the cache is simply removed from
the cache, either by the application or by the data store itself. Because the value
is no longer available in the cache, the application has to get the value from the
underlying data store. By removing the newly invalid value from the cache, in a
cache-aside pattern, the next usage of the value will force it to be read from the
underlying data store, guaranteeing that the new value (“51”) will be returned.
Figure 4-5. Invalidating cache on write
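A minimal sketch of this invalidate-on-write flow, assuming the redis-py client and a hypothetical update_data_store() function standing in for the write to the underlying data store:

import redis

r = redis.Redis()

def update_data_store(key, value):
    # Stand-in for writing the new value to the system of record.
    pass

def write_value(key, value):
    update_data_store(key, value)  # 1. write the new value to the data store
    r.delete(key)                  # 2. invalidate the (now stale) cached copy

# The next read of "cost" will miss the cache, be served from the data
# store, and re-populate the cache with the new value ("51").
write_value("cost", 51)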
With a write-behind cache, the value is updated directly in the cache, just like the
write-through approach. However, the write call then immediately returns,
without updating the underlying data store. From the application perspective, the
write was fast, because only the cache had to be updated.
At this point in time, the cache has the newer value, and the data store has an
older value. To maintain cache consistency, the cache then updates the
underlying data store with the new value, at a later point in time. This is typically
a background, asynchronous activity performed by the cache.
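Write-behind behavior is normally provided by the caching layer itself rather than by application code; the following is only a rough sketch of the idea, using a Redis List as an illustrative dirty-key queue and a caller-supplied function standing in for the data-store write.

import redis

r = redis.Redis()

def write_behind_set(key, value):
    # The write is acknowledged as soon as the cache is updated...
    r.set(key, value)
    # ...and the key is queued so the data store can be updated later.
    r.rpush("dirty-keys", key)

def flush_dirty_keys(write_to_data_store):
    # Background task: drain the queue and persist each value.
    while True:
        key = r.lpop("dirty-keys")
        if key is None:
            break
        value = r.get(key)
        write_to_data_store(key, value)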
This would not be a problem if all access to the key was performed through
this cache. However, if there is a mistake or error of some kind and the data is
accessed directly from the underlying data store, or through some other
means, it is possible that the old value will be returned for some period of
time. Whether or not this is a problem depends on your application
requirements. See Figure 4-8.
In this case, an important job of the cache is to try and determine what data will
be needed and make sure that data is available in the cache.
More often, however, the cache will remove older or less frequently used data
from the cache in order to make room for the newer data. Then, if some
application needs that older data in the future, it will need to re-fetch the data
from the underlying data store. This process of removing older or less frequently
used data is called cache eviction, because data is evicted, or removed, from
the cache.
The premise behind an LRU cache is that data that hasn’t been accessed
recently is less likely to be accessed in the future. Because the goal of the cache is
to keep data that will likely be needed in the future, getting rid of data that
hasn’t been used recently helps keep commonly used data available in the cache.
The difference between an LRU and an LFU cache is small. An LRU uses the
amount of time since the data was last accessed, while the LFU uses the number
of times the data was accessed. In other words, the LRU bases its decision on an
access date, while the LFU bases its decision on an access count.
Oldest-stored eviction
In an oldest-stored cache, when the cache is full, the cache looks for the data that
has been in the cache the longest period of time and removes that data first. This
eviction policy is not common in enterprise caches.
This is sometimes called a first-in-first-out (FIFO) cache. The data that was first
inserted into the cache is the data that is first evicted.
This isn’t a commonly used eviction technique, usually one of the algorithmic
approaches is chosen instead. However, this technique is easy for the cache to
implement, and therefore fast for it to execute. But these types of caches are
more likely to evict the wrong data, which means they tend to create a larger
number of cache misses later, when still-needed evicted data is accessed once
again. That’s why random eviction is not typically used in production.
TTL eviction
Session management is a common use case for TTL eviction. A session object
stored in a cache can have a TTL set to represent the amount of time the system
waits before an idle user is logged off. Then, every time the user interacts with
the session, the TTL value is updated and postponed. If the user fails to interact
by the end of the TTL period, the session is evicted from the cache and the user
is effectively logged out.
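A small sketch of that session pattern using redis-py; the 30-minute idle timeout and the key naming are illustrative:

import redis

IDLE_TIMEOUT_SECONDS = 30 * 60   # illustrative idle timeout
r = redis.Redis()

def create_session(session_id, session_data):
    # Store the session and start its TTL countdown.
    r.set(f"session:{session_id}", session_data, ex=IDLE_TIMEOUT_SECONDS)

def touch_session(session_id):
    # Each user interaction pushes the expiration further into the future.
    r.expire(f"session:{session_id}", IDLE_TIMEOUT_SECONDS)

# If the user goes idle past the timeout, Redis evicts the key and the
# user is effectively logged out.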
No eviction (permanent cache)
This is simpler and more efficient for the cache to implement, because it
doesn’t require any eviction algorithm. This method can be used in cases
where the cache is at least as large as the underlying data store. (Of course,
that’s not common, as most caches do not store all of the data from the
underlying data store. Typically, only certain datasets, such as sessions, are
cached.) If the cache has enough capacity to hold all the data, then there is
no chance that the cache will ever fill, and hence eviction is never required.
Cache thrashing
Sometimes a value is removed from the cache, but is then requested again
soon afterwards and thus needs to be re-fetched. This can cause other values
to be removed from the cache, which in turn requires them to be re-fetched
later when requested. This back-and-forth motion can lead to a condition
known as “cache thrashing,” which reduces cache efficiency. Cache
thrashing typically happens when a cache is full and not using the most
appropriate eviction type for the particular use case. Often, simply adjusting
the eviction algorithm or changing the cache size can reduce thrashing.
There is no right or wrong eviction strategy; the correct choice depends on your
application needs and expectations. Most often, LRU or LFU is the best
choice, but which of the two depends on specific usage patterns. Analyzing
data access patterns and distribution is usually required to determine the proper
eviction type for a particular application, but sometimes trial and error is the best
strategy to figure out which algorithm to select. The oldest-stored eviction
strategy is also an option that can be tried and measured against LRU and LFU.
The random-eviction option is not used very often. You can test it in your
application, but in most situations one of the other strategies will work better.
Some applications require maximum performance across an even distribution of
data access, so evictions cannot be tolerated. But when using this option, space
management becomes a concern that must be managed appropriately.
As applications request data, data is read from the permanent data store and
stored in the cache. As time goes on, more and more data is stored in the cache
and available for use by the consumer applications. Over time, this results in
fewer cache misses and more cache hits. The performance of the cache improves
as time goes on. This is called a warm cache.
The process of initially seeding data into a cache is called warming the cache,
or simply cache warmup. When the data is added continuously over time, the
process is referred to as pre-fetching.
Redis as a cache
Redis makes an ideal application data cache. It runs in memory (RAM), which
means it is fast. Redis is often used as a cache frontend for some other, slower but
more permanent data store, such as an SQL database. Redis can also persist its
data, which can be used for a variety of purposes, including automatically
warming the cache during recovery.
By default, when a Redis database fills up, future writes to the database will
simply fail, preventing new data from being inserted into the database. This
mode can be used to implement the permanent cache eviction policy,
described earlier.
Redis can instead be configured to evict data automatically when it fills up, using
the following eviction policy setting:
maxmemory-policy allkeys-lru
When the database is filled, old data will be automatically evicted from the
database in a least-recently used first eviction policy. This mode can be used to
implement an LRU cache.
Redis can also be configured as an LFU cache, using the LFU eviction option:
maxmemory-policy allkeys-lfu
Redis can implement other eviction algorithms as well. For example, it can
implement a random-eviction cache:
maxmemory-policy allkeys-random
There are also “volatile” variants of these policies, which evict only keys that
have an expiration (TTL) set:
maxmemory-policy volatile-lru
maxmemory-policy volatile-lfu
maxmemory-policy volatile-random
Approximation algorithms
It should be noted that the Redis LRU and LFU eviction policies are
approximations. When using the LRU eviction option, Redis does not
always delete the true least-recently used value when a value needs to be
deleted. Instead, it samples several keys and deletes the least-recently used key
among the sampled set.
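The accuracy of that sampling is tunable. As a small sketch (applied here at runtime with CONFIG SET; the same settings can live in redis.conf), a larger maxmemory-samples value makes the approximation closer to a true LRU at some CPU cost:

import redis

r = redis.Redis()
r.config_set("maxmemory", "100mb")               # illustrative memory limit
r.config_set("maxmemory-policy", "allkeys-lru")
# Larger sample sizes approximate true LRU more closely, at extra CPU cost.
r.config_set("maxmemory-samples", 10)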
“Caches are effective when they reduce redundant repeated requests that generate the same result that is equivalent and equal… repeatedly.”
Caches have lots of capabilities, features, and use cases that go beyond simply
storing key-value pairs. This chapter discusses some of these more
advanced aspects of caching.
With cache persistence, contents persist even during power outages and
system reboots. An application might rely on an object being stored in the
cache forever. A persistent cache may be chosen for performance reasons, as
long as it is acceptable for an application to fail and not perform if a value is
removed inappropriately.
Redis can operate as either a volatile or persistent cache. It uses RAM for its
primary memory storage, making it act like a volatile cache, yet permanent
storage can be used to provide the persistent backup and rehydration, so that
Redis can be used as a persistent cache.
Redis persistence
Redis provides two basic persistence mechanisms, which can be used individually
or combined:
1. An append-only file (AOF)
2. Point-in-time snapshots (RDB)
3. A combination of both
Together, these provide a variety of options for making data persistent as needed
while maintaining the performance advantages of being RAM-based.
Redis uses a file called the append-only file (AOF), in order to create a persistent
backup of the primary volatile cache in persistent storage. The AOF file stores a
real-time log of updates to the cache. This file is updated continuously, so it
represents an accurate, persistent view of the state of the cache when the cache
is shut down, depending on configuration and failure scenarios. When the cache
is restarted and cleared, the commands recorded in the AOF log file can be
replayed to re-create the state of the Redis cache at the time of the shutdown.
The result? The cache, while implemented primarily in volatile memory, can be
used as a reliable, persistent data cache.
The option APPENDONLY yes enables the AOF log file. All changes to the
cache will result in an entry being written to this log file, but the log file itself isn’t
necessarily stored in persistent storage immediately. For performance reasons,
you can delay the write to persistent storage for a period of time. This is
controlled via APPENDFSYNC. The following options are available:
• APPENDFSYNC no: This allows the operating system to cache the log
file and wait to persist it to permanent storage when it deems necessary.
• APPENDFSYNC everysec: This flushes the log file to persistent storage
once per second, trading up to one second of potential data loss for
better performance.
• APPENDFSYNC always: This flushes every change to persistent storage
as soon as it is written, providing the strongest durability at a
performance cost.
Suppose, for example, that a single key is created, updated twice, and then
deleted. There would then be four entries in the log file. Ultimately, all four of
these entries are no longer needed to rehydrate the cache, since the entry is now
deleted. The following command will clean up the log file:
BGREWRITEAOF
The result is a log file with the shortest sequence of commands needed to
rehydrate the current state of the cache in memory. You can configure this
command to be executed automatically rather than manually, which is the
recommended best practice.
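As a sketch, the relevant settings can be applied at runtime with CONFIG SET (or placed in redis.conf); the rewrite thresholds shown here are illustrative:

import redis

r = redis.Redis()
r.config_set("appendonly", "yes")        # enable the AOF log
r.config_set("appendfsync", "everysec")  # flush the log to disk once per second
# Rewrite (compact) the AOF automatically once it doubles in size and is
# at least 64 MB, rather than invoking BGREWRITEAOF by hand.
r.config_set("auto-aof-rewrite-percentage", "100")
r.config_set("auto-aof-rewrite-min-size", "64mb")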
The second persistence mechanism is the RDB snapshot, which writes a
point-in-time copy of the entire dataset to a single file. A snapshot can be
created with the following command:
SAVE
This command blocks the Redis server while the snapshot is being created.
Alternatively, you can use:
BGSAVE
This command returns immediately and creates a background job that creates
the snapshot.
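A small sketch of driving snapshots from redis-py; the save schedule shown is illustrative:

import redis

r = redis.Redis()
# Take a snapshot automatically if at least one key changed in 15 minutes.
r.config_set("save", "900 1")
# Or trigger a snapshot explicitly in the background.
r.bgsave()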
If your goal is to create a reliable, persistent cache that can survive process
crashes, system crashes, and other system failures, then the only reliable way to
do that is to use AOF persistence with APPENDFSYNC set to always. No other
method guarantees that the entire state of the cache will be properly stored in
persistent storage at all times. If your goal is to maintain a series of point-in-time
backups for historical and system-recovery purposes (such as saving one backup
per day for an entire month), then the RDB backup is the proper method to
create these backups. This is because the RDB is a single file providing an
accurate snapshot of the database at a given point in time. This snapshot is
guaranteed to be consistent. However, RDB cannot be used to survive system
failures, because any changes made to the system between RDB snapshots will
be lost during a system failure.
So, depending on your requirements, both RDB and AOF can be used to solve
your persistence needs. Used together, they can provide a persistent cache that
can tolerate system failures, along with historical point-in-time snapshot backups.
In Redis on Flash (RoF), all the data keys are still stored in RAM, but the values
of those keys are intelligently stored in a mixture of RAM and SSD flash storage.
Value placement is based on a least-recently used (LRU) eviction policy: more
actively used values are stored in RAM, and less frequently used values are
stored on SSD.
Note that the use of persistent SSD flash memory does not automatically convert
your cache into a persistent cache. This is because the keys are still stored in
RAM, regardless of where your data values are stored, RAM or SSD. Therefore,
using RoF with SSD storage does not remove the requirement of creating AOF
and/or RDB backup files to create a true persistent cache.
As a simple example, imagine a Redis database containing a few Hashes that
represent user-related information, such as first name, last name, and age.
You can then use the RG.PYEXECUTE command to execute a Python script to
perform data cleanup on this information, for example, deleting all users who
are younger than 35 years old.
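A rough sketch of such a RedisGears script, assuming each user Hash stores an age field under a key of the form user:<id>; it is submitted to the server with RG.PYEXECUTE:

# Submitted to Redis with: RG.PYEXECUTE "<script>"
# GB() (GearsBuilder) reads keys matching the pattern passed to run();
# for Hash keys, x['value'] is a dict of the Hash's fields.
GB() \
  .filter(lambda x: int(x['value']['age']) < 35) \
  .foreach(lambda x: execute('DEL', x['key'])) \
  .run('user:*')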
Microservices can also take advantage of Redis as a classic cache. This can be
as an internal, server-side cache storing interim data used internally by a
service. More specifically, a Redis instance can be used as a cache-aside cache
fronting a slower data store, as shown in the Redis data cache example in
Figure 5-3. Cache-aside caches are described in more detail in Chapter 4,
“Basic Caching Strategies.”
For more information, see “How Redis Simplifies Microservices Design Patterns” on The
New Stack (https://thenewstack.io/how-redis-simplifies-microservices-design-patterns).
The RediSearch module allows you to create an index of keys that contain Hash
data types. The index represents the attributes that you plan to query within all
Redis keys that are included in the index. Once the index is created, search
terms can be applied against the index to determine which Hash keys contain
data that matches the search terms. This is illustrated in Figure 5-6.
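For example, an index over the user Hashes described earlier might be created as follows (the index name and schema fields here are illustrative):

import redis

r = redis.Redis()
# Equivalent CLI command:
#   FT.CREATE idx:users ON HASH PREFIX 1 user: SCHEMA first_name TEXT last_name TEXT age NUMERIC
r.execute_command(
    "FT.CREATE", "idx:users",
    "ON", "HASH",
    "PREFIX", "1", "user:",
    "SCHEMA", "first_name", "TEXT", "last_name", "TEXT", "age", "NUMERIC",
)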
This creates a search index using all Hash entries with a key that starts with
“user:”.
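Lists
A Redis List is an ordered sequence of strings that can be pushed to or popped from either end. For example (the key name is illustrative, using redis-py):

import redis

r = redis.Redis()
r.lpush("mylist", "AAA")            # issues: LPUSH mylist AAA
r.lpush("mylist", "BBB")            # issues: LPUSH mylist BBB
r.lpush("mylist", "CCC")            # issues: LPUSH mylist CCC
print(r.lrange("mylist", 0, -1))    # issues: LRANGE mylist 0 -1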
The three LPUSH commands pushed three elements on the list. The LRANGE
command prints the contents of the list from left to right. So, after executing the
three LPUSH commands above, the list contains the values ["CCC", "BBB",
"AAA"], in that order.
The Lists data type is most commonly used for queues and scheduling purposes,
and it can be used for these purposes in some cache scenarios. Some caches
utilize queues, such as Redis Lists, to order or prioritize data that is stored in the
cache.
Sets
A single Redis key can contain a set of strings. A Redis Set is an unordered list of
strings. Unlike the Redis Lists data type, the Redis Sets data type does not dictate
an order of insertion or removal. Additionally, in Redis Sets, a given data value
(a given string) must be unique. So, if you try to insert the same string value
twice into the same set, the value is only inserted once.
In a cache application, sets can be used to test for repeated operations. For
example, you can use a set to determine if a given command has been requested
from the same IP address recently:
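A small sketch of that idea; the key naming and the 60-second window are illustrative:

import redis

r = redis.Redis()

def seen_recently(command, ip_address):
    key = f"recent:{command}"          # one Set per command
    already_seen = r.sismember(key, ip_address)
    r.sadd(key, ip_address)            # duplicates are ignored by the Set
    r.expire(key, 60)                  # "recently" means the last 60 seconds here
    return bool(already_seen)

if seen_recently("MUL 3 4", "192.0.2.10"):
    print("repeat request from this address")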
Hashes
A single Redis key itself can represent a set of key-value pairs in the form of a
Hash. A Hash allows for the creation of custom structured data types and stores
them in a single Redis key entry.
The classic use case for a Redis Hash is to store properties for an object, such as
a user:
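For example (the key and field names are illustrative, using redis-py):

import redis

r = redis.Redis()
# Store several properties of a user under a single key.
r.hset("user:1000", mapping={
    "first_name": "Alex",
    "last_name": "Smith",
    "age": 42,
})
print(r.hgetall("user:1000"))   # returns all of the user's fields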
You can also change individual properties. Again, using the same data as before:
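Continuing the illustrative example:

import redis

r = redis.Redis()
# Update just one field of the Hash; the other fields are untouched.
r.hset("user:1000", "age", 43)
print(r.hget("user:1000", "age"))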
In a cache, you can use Hashes to store more complex properties-based data.
For more information about Redis data types, see https://redis.io/topics/data-types-intro.
Many of these data types will be more useful in traditional database use cases,
rather than traditional cache use cases. One of the benefits of Redis is that it
works great as a cache, but you can also use it as a primary NoSQL database.
Redis’ versatility is one of its greatest strengths.
“How many seconds does it take to change a lightbulb? Zero—the lightbulb was cached.”
Caches are hugely important to building large, highly scalable applications.
They improve application performance and reduce resource requirements,
thus enabling greater overall application scalability.
Once your application has reached a certain size and scale, even your cache will
meet performance limits. There are two types of limits that caches typically run
into: storage limits and resource limits.
Storage limits are limits on the amount of space available to cache data.
Consider a simple service cache, where service results are stored in the cache to
prevent extraneous service calls. The cache has room for only a specific number
of request results. If that number of unique requests is exceeded, then the cache
will fill, and some results will be discarded. The full cache has reached its storage
limit, and the cache can become a bottleneck for ongoing application scaling.
Resource limits are limits on the capability of the cache to perform its
necessary functions—storing and retrieving cached data. Typically, these
resources are either network bandwidth to retrieve the results, or CPU capacity
in processing the request. Consider the same simple service cache. If a single
request is made repeatedly and the result is cached, you won’t run into storage
limits because only a single result must be cached. However, the more the same
service request is made, the more often the single result will be retrieved from the
cache. At some point, the number of requests will be so large that the cache will
run out of the resources required to retrieve the value.
Vertical scaling can increase the amount of RAM available to the cache, thus
reducing the likelihood of the cache reaching a storage limit. But it can also add
larger and more powerful processors and more network bandwidth, which can
reduce the likelihood of the cache reaching a resource limit.
In other words, vertical scaling means increasing the size and computing power
of a single instance or node, while horizontal scaling involves increasing the
number of nodes or instances.
Read replicas
Read replicas are a technique used in open source Redis for improving the read
performance of a cache without significantly impacting write performance. In a
typical simple cache, the cache is stored on a single server, and both read and
write access to the cache occur on that server.
With read replicas, a copy of the cache is also stored on auxiliary servers,
called read replicas. The replicas receive updates from the primary server.
Because each of the auxiliary servers has a complete copy of the cache, a read
request for the cache can access any of the auxiliary servers—as they all have the
same result. Because they are distributed across multiple servers, a significantly
greater number of read requests can be handled, and handled more quickly.
When a write to the cache occurs, the write is performed to the master cache
instance. This master instance then sends a message indicating what has changed
in the cache to all of the read replicas, so that all instances have a consistent set
of cached data.
This model does not improve write performance, but it can increase read
performance for large-scale implementation by spreading the read load across
multiple servers. Additionally, availability can be improved—if any of the read
replicas crash, the load can be shared to any of the other servers in the cluster, so
the system remains operational. For increased availability, if the Redis master
instance fails, one of the read replicas can take over the master role and assume
those responsibilities.
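With open source Redis, a server becomes a read replica by being pointed at the primary. A minimal sketch, with illustrative host addresses, issued against the would-be replica:

import redis

# Connect to the server that should become a read replica...
replica = redis.Redis(host="10.0.0.12", port=6379)
# ...and point it at the primary. Equivalent CLI: REPLICAOF 10.0.0.11 6379
replica.execute_command("REPLICAOF", "10.0.0.11", "6379")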
With sharding, data is distributed across various partitions, each holding only a
portion of the cached information. A request to access the cache (either read or
write) is sent to a shard selector (in Redis Enterprise, this is implemented in a
proxy), which chooses the appropriate shard to which to send the request. In a
generic cache, the shard selector chooses the appropriate shard by looking at the
cache key for the request. In Redis, shard selection is implemented by the proxy
that oversees forwarding Redis operations to the appropriate database shard. It
then uses a deterministic algorithm to specify which shard a particular request
should be sent to. The algorithm is deterministic, which means every request for
a given cache key will go to the same shard, and only that shard will have
information for a given cache key. Sharding is illustrated in Figure 6-2.
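The deterministic key-to-shard mapping can be sketched generically; this illustrates the idea rather than Redis's exact slot algorithm:

import binascii

NUM_SHARDS = 4   # illustrative shard count

def shard_for_key(key: str) -> int:
    # A checksum of the key, reduced to a shard index; the same key
    # always lands on the same shard.
    return binascii.crc32(key.encode()) % NUM_SHARDS

print(shard_for_key("cost"))       # always the same shard for "cost"
print(shard_for_key("user:1000"))  # possibly a different shard

Redis Clustering's actual mapping (CRC16 into hash slots) serves the same purpose: because the mapping is deterministic, a given key always resolves to the same shard.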
But sharding isn’t always simple. In generic caches, choosing a shard selector that
effectively balances traffic across all nodes can require tuning. It can also lower
availability by increasing dependency on multiple instances. Failure of a single
instance can, if not properly managed, bring down the entire cache.
Redis Clustering addresses these issues and makes sharding simpler and easier to
implement. Redis Clustering uses a simple CRC16 on the key in order to select
one of up to 1,000 nodes that contain the desired data. A re-sharding protocol
allows for rebalancing for both capacity and performance management reasons.
Failover protocols improve the overall availability of the cache.
Active-Active (multi-master)
Active-Active, or multi-master, replication is a way to handle higher loads by
improving both the write and the read performance of a cache.
When a write to one of the cache nodes occurs, the instance that receives the
write sends a message indicating what has changed in the cache to all of the
other nodes, so that all instances have a consistent set of cached data.
In the large cache implementation illustrated in Figure 6-3, the cache consists of
at least three servers, each running a copy of the cache software and each with a
complete copy of the cache database. Any of them can handle any type of data
request.
But what happens when two requests come in to update the same cached data
value? In a single-node cache, the requests are serialized and the changes take
place in order, with the last change typically overriding previous changes.
In this model, multiple master database instances are held in different data
centers which can be located across different regions and around the world.
Individual consumers connect to the Redis database instance that is nearest to
their geographic location. The Active-Active Redis database instances are then
synchronized in a multi-master model so that each Redis instance has a
complete and up-to-date copy of the cached data at all times. This model is
called Active-Active because each of the database instances can accept read
and write operations on any key, and the instances are peers in the network.
In that case, when a request comes in to return the result of “3 times 4”, the
cached value will be used rather than a calculated value, and the service will
return “13”, an obviously incorrect result that would most likely never occur in
the real world.
1. When the underlying results change, and the cache is not updated
The user requests a specific value to be read from the slow data source. The
cache is consulted. If the value is not in the cache, the slower backing data
source is consulted, the result is stored in the cache, and the request returns.
However, if the result is stored in the cache, the cached value is returned directly.
But what happens if the value in the underlying data source has changed? In a
cache-aside pattern, if the old value is still in the cache, then the old value will be
returned from future requests, rather than the new value. The cache is now
inconsistent. This is illustrated in Figure 7-2. In order for the cache to become
consistent again, either the old value in the cache has to be updated to the new
value, or the old value has to be removed from the cache, so future requests will
retrieve the correct value from the underlying data store.
This delay could, and should, be quite short—hopefully short enough so that the
delay does not cause any serious problems. However, in some cases it can be
quite lengthy. Whether or not this delay causes a problem is entirely dependent
on the use case, and it is up to the specific application to decide if the delay
causes any issues.
In these cases, one strategy is to set an expire time on the cache, requiring the
cached values to be thrown away and reread from the underlying data store at
regular intervals, limiting the amount of time the cached value may be
inconsistent. While this strategy can reduce the length of time a cached value is
inconsistent, it cannot eliminate the inconsistency entirely.
In many scenarios, multiple cache nodes will have duplicate copies of all or part
of the cache. An algorithm is used to keep the cache values up to date and
consistent across all of the cache nodes. This is illustrated in Figure 7-3.
“Cloudy with a chance of caching.”
Running a single open source Redis instance on premises is rather
straightforward. There are a couple of options for how to set it up, and
none of them are complex.
However, in the cloud, there are many different ways to set up and configure
Redis as a cache server. In fact, there are more ways to set up a Redis cache than
there are cloud providers. This chapter discusses some of the various cloud
options available.
These instances are easy to set up and use, can be turned on/off very quickly,
and are typically charged by the hour or the amount of resources consumed.
This makes them especially well-suited for development, testing, and autoscaled
production environments.
Hosted Redis instances are available from a number of providers, including:
• Redis To Go
• Heroku
• ScaleGrid
• Aiven
• Digital Ocean
In theory, this model works even in cases in which the different regions are
provided by different cloud providers. The only caveat is that the replication
setup commands must be available for configuration by the cloud provider, and
those commands can be restricted on some levels of service from some providers.
Nonetheless, you could set up Redis manually on separate compute instances in
multiple cloud providers and then configure the replication so that your read
replicas from a given cloud provider are connected to a master in another cloud
provider. This is shown in Figure 8-3.
In this model, both reads and writes can be processed from any of the Redis
master instances in any region or with any provider. This improves application
performance dramatically. After a write occurs in a given region, it is
automatically replicated to all the other masters in the cluster.
To take advantage of this type of cluster replication, you must use Redis
Enterprise. For cloud-hosted databases, that means you must either use Redis
Enterprise Cloud instances, or you must roll your own self-hosted Redis instances
on cloud compute instances or container images. None of the major cloud
providers offer multi-region Active-Active deployments natively, nor do they offer
deployments across cloud providers. For this type of large-scale, highly available
distributed architecture, you must use Redis Enterprise.
How does the cache improve performance? The diagram in Figure 9-1 shows
the same cached multiplication service introduced in Chapter 2. This version,
though, shows how much time it takes to retrieve a value from the cache
(hypothetically, 1 millisecond), compared with having the multiplication service
calculate the result (25ms). Put another way, the first time the request is made,
the multiplication service has to be consulted, so the entire operation takes
approximately 25ms. But each subsequent equivalent operation can be
performed by retrieving the value from the cache, which takes only 1ms, in our
example. The result is improved performance for cached operations.
Notice that talking to and manipulating the cache also takes time. So requests
that must call the service also have to first check the cache and add a result to the
cache. This additional effort is called the cache overhead.
The total time a request takes to process as a result of a cache miss is:
Request_Time = Cache_Check + Service_Call_Time + Cache_Add
Request_Time = 1ms + 25ms + 1ms
Request_Time = 27ms
Notice that this total time is greater than the time it takes for the service to
process the request if there was no cache (25ms). The additional 2ms is the
cache overhead.
The total time a request takes to process as a result of a cache hit is:
Request_Time = Cache_Check
Request_Time = 1ms
So, some requests take significantly less time (1ms in our example), while other
requests incur additional overhead (2ms in our example). Without a cache, all
requests would take about the same amount of time (25ms in our example).
In order for a cache to be effective, the overall time for all requests
must be less than the overall time if the cache didn’t exist.
This means, essentially, that there needs to be more cache hits than cache misses
overall. How many more depends on the amount of time spent processing the
cache (the cache overhead) and the amount of time it takes to process a
request to the service (service call time).
The greater the number of cache hits compared with the number of cache
misses, the more effective the cache. Additionally, the greater the service call time
compared with the cache overhead, the more effective the cache.
Let’s look at this in more detail. First, we need to introduce two more terms. The
cache miss rate is the percentage of requests that generate a cache miss.
Conversely, the cache hit rate is the percent of requests that generate a cache
hit. Because each request must either be a cache hit or a cache miss, that means:
Cache_Hit_Rate + Cache_Miss_Rate = 1
When using our multiplication service without a cache, each request takes 25ms.
With a cache, the time is either 1ms or 27ms, depending on whether there was a
cache hit or cache miss. In order for the cache to be effective, the 2ms overhead
of accessing the cache during a cache miss must be offset by some number of
cache hits. Put another way, the total request time without a cache must be
greater than the total request time with a cache for the cache to be considered
effective. Therefore, in order for the cache to be effective:
Request_Time_With_Cache =
( Cache_Miss_Rate * Request_Time_Cache_Miss ) +
( Cache_Hit_Rate * Request_Time_Cache_Hit )
And since:
Cache_Hit_Rate = 1 - Cache_Miss_Rate
Request_Time_With_Cache =
( Cache_Miss_Rate * Request_Time_Cache_Miss ) +
( (1 - Cache_Miss_Rate) * Request_Time_Cache_Hit )
Therefore:
25ms > ( Cache_Miss_Rate * 27ms ) + ( (1 - Cache_Miss_Rate) * 1ms )
25 > ( 26 * Cache_Miss_Rate ) + 1
Cache_Miss_Rate < 24 / 26
Cache_Miss_Rate < 92.3%
Given that cache hit rate + cache miss rate = 1, we can do the same calculation
using the cache hit rate rather than the cache miss rate:
Cache_Hit_Rate > 1 - 92.3%
Cache_Hit_Rate > 7.7%
In other words, in this example, as long as a request can be satisfied by the cache
(cache hit rate) at least 7.7% of the time, then having the cache is more efficient
than not having the cache.
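The arithmetic above can be captured in a small helper function, using the same illustrative numbers from this chapter:

def avg_request_time(hit_rate, service_ms=25.0, cache_check_ms=1.0, cache_add_ms=1.0):
    # A miss pays the cache check, the service call, and the cache add;
    # a hit pays only the cache check.
    miss_ms = cache_check_ms + service_ms + cache_add_ms   # 27ms in this example
    hit_ms = cache_check_ms                                # 1ms in this example
    return (1 - hit_rate) * miss_ms + hit_rate * hit_ms

# Break-even hit rate: the point where the cache neither helps nor hurts.
break_even = (27.0 - 25.0) / (27.0 - 1.0)
print(f"{break_even:.1%}")          # ~7.7%
print(avg_request_time(0.25))       # 20.5ms
print(avg_request_time(0.90))       # 3.6ms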
Doing the math the other way, you could ask a different question. If the average
request time is 25ms without a cache, what would be the average request time if
the cache hit rate was 25%? 50%? 75%? 90%?
Cache_Hit_Rate = 25%:
(1 - 0.25) * 27ms + 0.25 * 1ms
0.75 * 27 + 0.25 * 1
20.25 + 0.25
= 20.5ms
The average request time assuming a cache hit rate of 25% is 20.5ms. Much
faster than the 25ms for no cache!
But it gets better, using our other cache hit rate assumptions:
Cache_Hit_Rate = 50%:
(1 - 0.5) * 27ms + 0.5 * 1ms
0.5 * 27 + 0.5 * 1
13.5 + 0.5
= 14ms
Cache_Hit_Rate = 75%:
(1 - 0.75) * 27ms + 0.75 * 1ms
0.25 * 27 + 0.75 * 1
6.75 + 0.75
= 7.5ms
Cache_Hit_Rate = 90%:
(1 - 0.9) * 27ms + 0.9 * 1ms
0.1 * 27 + 0.9 * 1
2.7 + .9
= 3.6ms
These calculations are all based on the amount of time it takes for the request to
be processed by the multiplication service without the cache (25ms in our
example). But this value is just an assumption. What happens if that value is
larger, say 500ms?
Cache_Hit_Rate = 25%:
(1 - 0.25) * 502ms + 0.25 * 1ms
0.75 * 502 + 0.25 * 1
376.5 + 0.25
= 376.75ms
Cache_Hit_Rate = 50%:
(1 - 0.5) * 502ms + 0.5 * 1ms
0.5 * 502 + 0.5 * 1
251 + 0.5
= 251.5ms
Cache_Hit_Rate = 75%:
(1 - 0.75) * 502ms + 0.75 * 1ms
0.25 * 502 + 0.75 * 1
125.5 + 0.75
= 126.25ms
Cache_Hit_Rate = 90%:
(1 - 0.9) * 502ms + 0.9 * 1ms
0.1 * 502 + 0.9 * 1
50.2 + .9
= 51.1ms
Glossary
Cache eviction. The act of removing (often older or less frequently used) data
from a volatile cache to make room for new data when it is determined the data
is either no longer needed, or is less likely to be needed than other data in the
cache.
Cache hit. A request for data that can be satisfied using data stored in the
cache. If there is a cache hit, it means the cache was successful in handling the
request without engaging the backend data store or service.
Cache locality. The set of data available within a cache. Often, depending on
the application, the data that will be needed in the future can be inferred by data
used in the past, and that inferred data can be loaded in advance into the cache.
This inferred data is the cache locality. This practice is common for caches such
as CPU or memory caches, where reading one memory location often results in
nearby memory locations also being accessed.
Cache miss. A request for data that cannot be satisfied by existing data in the
cache and must be sent to the backend data store or service in order for it to be
processed. In the case of a cache miss, the cache is not successful in satisfying the
request.
Cache overhead. The amount of time in excess of the time it would normally
take without a cache to return the desired data result from the data store or
service. Cache overhead is used in calculating the effectiveness of a cache in
order to improve performance.
Cache warmup. The process of taking a cold cache and converting it into a
warm cache. This process may involve artificially storing data, at startup, into the
cache to make it a warm cache. But, more often than not, a cache starts out cold
and warms up gradually over time as more requests are made. The result is a
cache that goes from mostly cache misses to mostly cache hits.
Data lag. The time it takes for a change made to a value in a database or cache
on one server to be propagated to all other servers in the system.
Flush (cache flush). The action of emptying a cache of all data. When a
cache has been flushed, it no longer contains any data and subsequent requests
will result in cache misses.
Horizontal scaling. The act of scaling a system to support larger requests and
more requests by adding additional servers and other components to the system,
rather than making those servers and other components bigger.
Least Recently Used (LRU). A cache eviction strategy used by full caches to
remove less-likely-to-be-used data from the cache to make room for more-likely-
to-be-used data. The LRU strategy removes data that hasn’t been used for the
longest period of time.
Vertical scaling. The act of scaling a system to support larger requests and
more requests, by making individual servers and other components of the
system bigger and more powerful, rather than adding more servers or
components in parallel.
Volatile cache. A cache in which data is evicted, or removed, from the cache if
there is no longer any room in the cache to store new data. Typically, data that is
old, stagnant, or less likely to be needed by the user is removed from the cache.
Different eviction algorithms (LRU, LFU, etc.) are employed to decide what data
to remove from the cache.
Warm cache. A cache that has sufficient data stored in it so that there is a
reasonably high chance of a request being satisfied from data in the cache,
resulting in a high number of cache hits.
Write conflict. When two writes of different values occur from different
sources to the same location. To resolve the conflict, the system must determine
which value to use.
• Cache coherency defines the behavior of reading and writing the same
data index (key), such that the system behaves as if there is no cache in the
system. In other words, reading a given data value will give the same result
whether it is read through a cache or not.
It might seem that, technically, what we are referring to in this book is closer to
cache coherency than to cache consistency. However, for most people and in
most conversations, if you simply use the phrase “cache consistency,” you will be
well understood. Either way, if you don’t understand these distinctions, don’t
worry too much. The difference is important mostly for those who deal with the
likes of CPU caches.
About the Author
Lee has 34 years of industry experience. Lee spent seven years at Amazon Web
Services, where he led the creation of the company’s first software download
store, created the AWS Elastic Beanstalk service, and managed the migration of
Amazon’s retail platform to a new service-based architecture. Additionally, Lee
spent eight years at New Relic, where he led the construction of a solid service-
based system architecture and system processes, thus allowing New Relic to scale
from a startup to a high-traffic public enterprise.
About Redis
Redis is the world’s most popular in-memory database, and the commercial
provider of Redis Enterprise, which delivers superior performance, matchless
reliability, and unparalleled flexibility for personalization, machine learning, IoT,
search, e-commerce, social, and metering solutions worldwide.