WP 7 Reasons Cache
WHY CACHE A DATABASE IN THE FIRST PLACE?

You have a database. Why would you need to add a cache? Traditionally, caches have been added to databases because they provided a performance boost. Caches temporarily hold data in memory, making data access much faster by side-stepping persistent storage media, like hard disk drives or solid-state drives (SSDs). RAM access (measured in tens of nanoseconds) is about three orders of magnitude faster than SSD access (measured in tens of microseconds).

Simple enough, right? Add a cache and improve application performance. If it were that simple, we’d all just deploy caches and be done with it. However, there is a famous dictum:

There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton, c. 1995

Caches are not as simple as they are often made out to be. In fact, they can be one of the more problematic components of a distributed application architecture. If your data infrastructure relies on caches, your overall approach may be subject to the downsides of caching. To help you assess these downsides, we’ve identified the seven primary problems with using external caches alongside a NoSQL database. We also explain how the Scylla NoSQL database’s native caching alleviates these downsides and provides a simple, scalable solution.

BACKGROUND

Myriad database caching solutions have been developed over the years. They range from simple application caches to in-memory data grids that span global computer clusters. Amazon Web Services, for example, offers a managed cache solution, known as Amazon DynamoDB Accelerator (DAX), that runs in front of its database. Redis, as another example, popularized the idea that a system can be both a store and a cache, all at once.

Before we dive into the problems with these solutions, let’s examine a few generic approaches to database caching.

Pre-caching and Caching

Pre-caching is a process that loads data ahead of time in anticipation of its use. For example, when a web page is retrieved, the pages that users typically jump to when they leave that page might be pre-cached in anticipation. An application might pre-cache files or records that are commonly called for at some point during a session. Pre-caching differs from web and browser caching in that pre-caching stores files that are expected to be used, whereas regular caching deals with files the user has already requested.

Pre-caching requires some method for determining what should be cached ahead of time (i.e., the time and expertise needed to make manual or procedural decisions on what to pre-cache). Moreover, a pre-cache might hold data that never actually gets used. If you are caching in an all-RAM instance, that means you’re spending money on expensive resources to hold low-value data.

Caching, on the other hand, is populated on demand. Since the cache is reactive, the first request for a given piece of data is likely to be much slower than subsequent requests for the same data. Because nothing is primed in the cache, it has to be populated as requests are served, which can slow certain requests until the cache is fully populated.

Side Cache versus Transparent Cache

A further distinction exists between side caches and transparent caches. External cache deployments are typically implemented in the form of a “side cache.” The cache is independent of the database, and it places a heavy burden on the application developer: the application itself is responsible for maintaining cache coherency. The application performs double writes, both to the cache and to the database. Reads are done first from the cache and, if the data isn’t found, a separate read is dispatched to the database.
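The side-cache contract described above is often called the cache-aside pattern. A minimal sketch in Python, using plain dictionaries to stand in for a hypothetical external cache and database client, makes the double bookkeeping explicit:

```python
# Cache-aside (side cache) sketch: the application, not the database,
# is responsible for keeping the cache coherent with the database.
# Plain dicts stand in for real cache and database clients.

db = {}      # stand-in for the database
cache = {}   # stand-in for the external side cache

def write(key, value):
    # Double write: the application must update BOTH stores.
    # If the cache update is lost, readers see stale data until
    # the entry is invalidated or evicted.
    db[key] = value
    cache[key] = value

def read(key):
    # Try the cache first...
    if key in cache:
        return cache[key]
    # ...and on a miss, pay a second round trip to the database,
    # then populate the cache for subsequent reads.
    value = db[key]
    cache[key] = value
    return value
```

Every failure mode (a lost cache write, a database-side repair, an eviction) becomes the application’s problem to detect and handle, which is exactly the coherency burden placed on the developer.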
a cache at all. This is because the request is sent first to the cache and then, upon a miss, to the database. The result is additional latency on an already slow path for uncached data. One may claim that when the entire data set fits in the cache (quite an expensive proposition; see below), the additional latency doesn’t come into play. But most of the time there is more than a single workload or access pattern hitting the database, and some of it will carry the extra-hop cost.
it is frontending. Bottom line: rely on the database, rather than exposing yourself to data loss via the cache.

4. Application complexity — your application needs to handle more cases

Application and operational complexity are problems for external caches. Once you have an external cache, you need to keep the cache up-to-date with the client and the database. For instance, if your database runs repairs, the cache needs to be synced or invalidated. Your client retry and timeout policies need to match the properties of the cache, but also need to function when the cache is down. Usually, such scenarios are hard to test.

5. External caching ruins database caching

Modern databases have embedded caches and complex policies to manage them. When you place a cache in front of the database, most read requests will reach only the external cache, and the database won’t keep these objects in its memory. As a result, the database cache is rendered ineffective, and when requests eventually reach the database, its cache will be cold and the responses will come primarily from the disk.

6. External caching complicates data security

An external cache adds a whole new attack surface to your infrastructure. Encryption, isolation, and access control on data placed in the cache are likely to differ from those at the database layer itself.

7. External caching ignores the database’s intelligence and resources

Databases are very complex and impose high disk I/O workloads on the system. Many of the queries access the same data, and some amount of the working set can be cached in memory to save disk accesses. A good database should have multiple sophisticated logical processes to decide which objects and indexes it should cache, and who should have access to them.

The database also should have various eviction policies (with the least-recently-used policy as a straightforward example) that determine when new data should replace existing, older cached objects.

When scanning a large dataset, say a large range or a full-table scan, a lot of objects are read from the disk. The database can recognize this as a scan, not a regular query, and choose to leave these objects outside its internal cache; an external cache, however, would treat this query like any other and attempt to cache the results. The database automatically synchronizes the content of its cache with the disk and with incoming requests, so the user and the developer do not need to do anything to make it happen. If, for some reason, your database doesn’t respond fast enough, it means that:

• The cache is misconfigured
• It doesn’t have enough RAM for caching
• The working set size and request pattern don’t fit the cache
• The database cache implementation is poor

A BETTER WAY: SCYLLA’S EMBEDDED CACHE

The Scylla NoSQL database offers a better approach to caching, one that addresses the significant problems covered above while also delivering the performance gains that caching promises. To understand how Scylla’s cache implementation addresses these problems, it’s important to first examine how its cache works.

Scylla is designed to be fully compatible with Apache Cassandra. Yet, unlike Cassandra, Scylla does not rely on the default cache offered by Linux. Linux caching is inefficient for database implementations for the following reasons.

The Linux page cache, also called disk cache, improves operating system performance by storing page-size chunks of files in memory to save on expensive disk seeks. The Linux kernel treats files as 4KB chunks by default. This speeds up performance, but only when data is 4KB or larger. The problem is that many common database operations involve data smaller than 4KB. In those cases, Linux’s 4KB minimum leads to high read amplification.
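The cost of that 4KB minimum is easy to quantify. As a rough illustration (the 300-byte row is an arbitrary example, not a figure from this paper), reading one small row through the page cache drags in an entire page:

```python
# Read amplification when small objects are served through the
# Linux page cache: every read touches at least one full 4KB page.

PAGE_SIZE = 4096  # default Linux page cache granularity, in bytes

def read_amplification(object_size: int) -> float:
    """Bytes fetched from disk per byte of useful data."""
    pages = -(-object_size // PAGE_SIZE)  # ceiling division
    return (pages * PAGE_SIZE) / object_size

# A hypothetical 300-byte row costs ~13.7x the bandwidth it needs.
print(round(read_amplification(300), 1))   # -> 13.7
```

Objects of 4KB or larger amortize the page, which is why the penalty falls on the small rows common in database workloads.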
Adding to the problem, the extra data is rarely useful for subsequent queries (since it usually has very poor spatial locality). In most cases, it’s just wasted bandwidth.

Cassandra attempts to alleviate read amplification by adding a key cache and a row cache, which directly store frequently used objects. However, Cassandra’s extra caches increase overall complexity and are very difficult to configure properly. The operator allocates memory to each cache; different ratios produce varying performance characteristics, and different workloads benefit from different settings. The operator also has to decide how much memory to allocate to the JVM’s heap as well as to the off-heap memory structures. Since the allocations are performed at boot time, it’s practically impossible to get them right, especially for dynamic workloads that can change dramatically over time.

There is another problem. Under the hood, the Linux page cache also performs synchronous blocking operations that decrease the performance and predictability of the system. Since Cassandra is unaware that a requested object does not reside in the Linux page cache, an access to a non-resident page will cause Linux to issue a page fault and context switch to read from disk. It will then context switch again to run another thread, while the original thread is paused with its locks held. Eventually, when the disk data is ready (yet another interrupt and context switch), the kernel will schedule the original thread back in.

[Diagram: the architecture of Cassandra’s caches, with layered key, row, and underlying Linux page caches.]

The architects who designed Scylla recognized that a special-purpose cache would deliver better performance than Linux’s default cache. A unified cache can dynamically tune itself to any workload and obviates the need to manually tune multiple different caches, as one is forced to do with Apache Cassandra. Since Scylla caches objects itself, it always controls their eviction and memory footprint.

More importantly, Scylla can dynamically balance the different types of cached data. Scylla does this using a set of controllers (including a memtable controller, a compaction controller, and a cache controller) that enable it to dynamically adjust their sizes. Once data is no longer cached in memory, Scylla will generate a continuation task to read the data asynchronously from disk using direct memory access (DMA), which allows hardware subsystems to access main system memory independently of the CPU.

The C++ Seastar framework on which Scylla is built will execute the continuation task within a microsecond (1 million tasks/core/sec) and rush on to run the next task. There is no blocking, heavyweight context switching, waste, or tuning. For Scylla users, this design means higher ratios of (cheap) disk to (expensive) RAM.

This cache design enables each Scylla node to serve more data, which in turn lets operators
run smaller clusters of more powerful nodes with larger disks. Scylla’s unified cache also simplifies operations, since it eliminates multiple competing caches and dynamically tunes itself at runtime to accommodate varying workloads. Finally, because Scylla has a very efficient internal cache, it obviates the need for a separate external cache, making for a more efficient, reliable, secure, and cost-effective unified solution.

There are also times when you want to make a query that avoids hitting the cache at all. For instance, if you anticipate making a broad ad hoc query, you might wish to bypass the cache and read directly from disk, so that the returned results don’t needlessly take up space in the cache. To support this, Scylla allows individual queries, including filtering queries, to bypass the cache.

PERFORMANCE CHARACTERISTICS OF SCYLLA’S CACHE

Scylla’s cache is much faster than Cassandra’s. To prove this, we ran the same workload on Google Cloud to compare the performance of Scylla against Cassandra. A single-node cluster was running on n1-standard-32, and the loaders were running on n1-standard-4. Both Scylla and Cassandra were run using default configurations.

The latency graphs below display the performance characteristics that were recorded during the test.

[Latency graphs: Cassandra and Scylla.]

As you can see, these latency tests show noticeable differences for the cacheable workload. In a 2-minute span, Cassandra showed 5 spikes of read latencies greater than 10ms, while Scylla never exceeded single-digit read latencies. For write latencies, Cassandra regularly suffered spikes far greater than 10ms every nine seconds, whereas Scylla generally kept write latencies under 10ms.
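The cache-avoiding queries described earlier are expressed in Scylla’s CQL dialect with the BYPASS CACHE clause on SELECT. A sketch, with hypothetical keyspace, table, and column names:

```cql
-- Ad hoc full scan for a report: read from disk without letting
-- the results evict the hot working set from Scylla's cache.
SELECT * FROM analytics.events BYPASS CACHE;

-- The clause composes with filtering over a large range:
SELECT user_id, event_time
  FROM analytics.events
  WHERE event_time > '2019-01-01' ALLOW FILTERING
  BYPASS CACHE;
```

Without the clause, a scan like this would be cached as if it were a regular point query.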
SCYLLA’S CACHE IN THE REAL WORLD
IMVU REINS IN COSTS BY MOVING FROM REDIS TO SCYLLA

A popular social community, IMVU enables people all over the world to interact with each other using 3D avatars on their desktops, tablets, and mobile devices. To meet growing requirements for scale, IMVU decided it needed a more performant solution than its previous database architecture of Memcached in front of MySQL and Redis. They looked for something that would be easier to configure, easier to extend, and, if successful, easier to scale.

They decided on Scylla. Redis had worked well in terms of features, but when IMVU actually rolled it out to a hundred thousand concurrent users, they found the expense difficult to justify. Scylla allowed IMVU to maintain that same responsiveness at a scale ten to a hundred times as large as what Redis could handle.

“Redis was fine for prototyping features, but once we actually rolled it out to a hundred thousand concurrent users, the expenses started getting hard to justify,” said Ken Rudy, a senior software engineer at IMVU. “Scylla is optimized for keeping the data you need in memory and everything else in disk. Scylla allowed us to maintain the same responsiveness for a scale a hundred times what Redis could handle.”

CONCLUSION

Modern applications rely on memory architectures that are both extremely fast and globally distributed. The ideal architecture finds a balance between memory (RAM) and storage (SSD) to deliver high performance along with reliability and consistency.

Many organizations use in-memory caching alongside a NoSQL database. As we have shown in this paper, caching solutions that are not integral to the database can cause enormous headaches. Among many problems, they impose unnecessary cost, complexity, and operational overhead.

If you are using an external cache or in-memory database for high-speed operational performance, you should take a closer look at the Scylla NoSQL database.

NEXT STEPS

Visit scylladb.com to...

• Download Scylla. Check out our download page to run Scylla on AWS, install it locally in a Virtual Machine, or run it in Docker.

• Take Scylla for a Test Drive. Our Test Drive lets you quickly spin up a running cluster of Scylla so you can see for yourself how it performs.
ABOUT SCYLLADB
Scylla is the real-time big data database. A drop-in
alternative to Apache Cassandra and Amazon DynamoDB,
Scylla embraces a shared-nothing approach that increases
throughput and storage capacity as much as 10X that of
Cassandra. AdGear, AppNexus, Comcast, Fanatics, FireEye,
Grab, IBM Compose, MediaMath, Ola Cabs, Samsung,
Starbucks and many more leading companies have
adopted Scylla to realize order-of-magnitude performance
improvements and reduce hardware costs. Scylla is available
in Open Source, Enterprise and fully managed Cloud
editions. ScyllaDB was founded by the team responsible
for the KVM hypervisor and is backed by Bessemer Venture
Partners, Eight Roads Ventures, Innovation Endeavors,
Magma Venture Partners, Qualcomm Ventures, Samsung
Ventures, TLV Partners, Western Digital Capital and Wing
Venture Capital.
For more information: ScyllaDB.com