Gorilla: A fast, scalable, in-memory time series database – Pelkonen et al. 2015
Error rates across one of Facebook’s sites were spiking. The problem had first shown up a few minutes after it started, via an automated alert triggered by an in-memory time series database called Gorilla. One set of engineers mitigated the immediate issue. A
second group set out to find the root cause. They fired up Facebook’s time series correlation
engine built on top of Gorilla, and searched for metrics showing a correlation with the errors.
This showed that copying a release binary to Facebook’s web servers (a routine event) caused
an anomalous drop in memory used across the site…
In the 18 months prior to publication, Gorilla helped Facebook engineers identify and debug
several such production issues.
As of Spring 2015, Facebook’s monitoring systems generated more than 2 billion unique time
series of counters, with about 12 million data points added per second – over 1 trillion data
points per day. Here then are the design goals for Gorilla: sustain that write rate, keep the most recent 26 hours of data in memory, serve reads with latencies low enough to support interactive tooling, and remain available in the face of host and network failures.
To meet the performance requirements, Gorilla is built as an in-memory TSDB that functions
as a write-through cache for monitoring data ultimately written to an HBase data store. To meet the requirement of storing 26 hours of data in memory, Gorilla incorporates a new time
series compression algorithm that achieves an average 12x reduction in size. The in-memory
data structures allow fast and efficient scans of all data while maintaining constant time
lookup of individual time series.
The key specified in the monitoring data is used to uniquely identify a time
series. By sharding all monitoring data based on these unique string keys, each
time series dataset can be mapped to a single Gorilla host. Thus, we can scale
Gorilla by simply adding new hosts and tuning the sharding function to map
new time series data to the expanded set of hosts. When Gorilla was launched
to production 18 months ago, our dataset of all time series data inserted in the
past 26 hours fit into 1.3TB of RAM evenly distributed across 20 machines. Since
then, we have had to double the size of the clusters twice due to data growth,
and are now running on 80 machines within each Gorilla cluster. This process
was simple due to the share-nothing architecture and focus on horizontal
scalability.
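A minimal sketch of the idea in C++ (illustrative only: the paper doesn’t spell out the actual sharding function, and a production system would want a mapping that doesn’t reshuffle every key when hosts are added):

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Map a time series key (e.g. "web.servers.errors") to one Gorilla host.
// Purely illustrative: the real system manages shard-to-host assignment
// dynamically, but the core idea is that the string key alone determines
// which host owns the series.
std::size_t host_for_key(const std::string& key, std::size_t num_hosts) {
    return std::hash<std::string>{}(key) % num_hosts;
}
```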
The in-memory data structure is anchored in a C++ standard library unordered map. This
proved to have sufficient performance and no issues with lock contention. For persistence, Gorilla stores data in GlusterFS, a POSIX-compliant distributed file system with 3x replication.
“HDFS, or other distributed file systems would have sufficed just as easily.” For more details on
the data structures and how Gorilla handles failures, see sections 4.3 and 4.4 in the paper. I
want to focus here on the techniques Gorilla uses for time series compression to fit all of that
data into memory!
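Before moving on to compression, here’s a much-simplified sketch of what such a map-based shard index might look like. The class shape, the names, and the use of std::shared_mutex are my own choices; the real structures described in section 4.3 are richer:

```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Placeholder for the per-series structure holding the closed two-hour
// blocks plus the open block currently being appended to.
struct TimeSeries {};

// Simplified sketch of one shard's in-memory index: the unordered_map gives
// constant-time lookup by key, while the vector of shared_ptrs supports
// efficient scans over every series in the shard.
class TimeSeriesMap {
public:
    std::shared_ptr<TimeSeries> find(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = by_key_.find(key);
        return it == by_key_.end() ? nullptr : it->second;
    }

    void insert(const std::string& key, std::shared_ptr<TimeSeries> series) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        all_series_.push_back(series);
        by_key_.emplace(key, std::move(series));
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, std::shared_ptr<TimeSeries>> by_key_;
    std::vector<std::shared_ptr<TimeSeries>> all_series_;
};
```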
When it comes to time stamps, a key observation is that most sources log points at fixed
intervals (e.g. one point every 60 seconds). Every now and then the data point may be logged a
little bit early or late (e.g., a second or two), but this window is normally constrained. We’re
now entering a world where every bit counts, so if we can represent successive time stamps
with very small numbers, we’re winning… Each data block is used to store two hours of data.
The block header stores the starting time stamp, aligned to this two hour window. The first
time stamp in the block (first entry after the start of the two hour window) is then stored as a
delta from the block start time, using 14 bits. 14 bits is enough to span a bit more than 4 hours
at second resolution so we know we won’t need more than that.
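For concreteness, the alignment arithmetic looks something like this (the function names are mine):

```cpp
#include <cstdint>

// Blocks cover two hours of data; the header stores the block start time
// aligned to the two-hour window, and the first point's time stamp is kept
// as a delta from that start. The delta always fits comfortably in 14 bits,
// since 2^14 seconds is roughly four and a half hours.
constexpr int64_t kBlockSpanSeconds = 2 * 60 * 60;

int64_t aligned_block_start(int64_t unix_seconds) {
    return unix_seconds - (unix_seconds % kBlockSpanSeconds);
}

int64_t first_timestamp_delta(int64_t unix_seconds) {
    return unix_seconds - aligned_block_start(unix_seconds);  // 0 .. 7199
}
```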
For all subsequent time stamps, we compare deltas. Suppose we have a block start time of
02:00:00, and the first time stamp is 62 seconds later at 02:01:02. The next data point is at
02:02:02, another 60 seconds later. Comparing these two deltas, the second delta (60 seconds) is 2 seconds shorter than the first one (62 seconds), so we record -2. How many bits should we use to record that -2? As few as possible, ideally! We can use tag bits to tell us how many bits the actual value is encoded with. The scheme works as follows:

- Compute the delta of deltas, D (the difference between the current delta and the previous one).
- If D is zero, store a single ‘0’ bit.
- If D is in [-63, 64], store ‘10’ followed by the value in 7 bits.
- If D is in [-255, 256], store ‘110’ followed by the value in 9 bits.
- If D is in [-2047, 2048], store ‘1110’ followed by the value in 12 bits.
- Otherwise, store ‘1111’ followed by D in 32 bits.
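Here’s a rough sketch of that encoder. The BitWriter is a stand-in for a real bit-packing stream, and exactly how the signed value is squeezed into 7/9/12 bits is an implementation detail the paper leaves open; I’ve used a simple offset:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal bit-stream writer, just for this sketch.
class BitWriter {
public:
    void write(uint64_t value, int num_bits) {   // append the low num_bits of value
        for (int i = num_bits - 1; i >= 0; --i)
            bits_.push_back((value >> i) & 1u);
    }
    std::size_t bit_count() const { return bits_.size(); }
private:
    std::vector<bool> bits_;
};

// Append one time stamp using the delta-of-deltas scheme listed above.
// `prev` and `prev_delta` carry state from the previous point in the block;
// the very first time stamp is handled separately (a 14-bit delta from the
// block start), so this is only called from the second point onwards.
void append_timestamp(BitWriter& out, int64_t ts,
                      int64_t& prev, int64_t& prev_delta) {
    const int64_t delta = ts - prev;
    const int64_t dod = delta - prev_delta;        // delta of deltas

    if (dod == 0) {
        out.write(0b0, 1);                         // '0' (the ~96% case)
    } else if (dod >= -63 && dod <= 64) {
        out.write(0b10, 2);
        out.write(static_cast<uint64_t>(dod + 63), 7);
    } else if (dod >= -255 && dod <= 256) {
        out.write(0b110, 3);
        out.write(static_cast<uint64_t>(dod + 255), 9);
    } else if (dod >= -2047 && dod <= 2048) {
        out.write(0b1110, 4);
        out.write(static_cast<uint64_t>(dod + 2047), 12);
    } else {
        out.write(0b1111, 4);
        out.write(static_cast<uint64_t>(dod), 32); // fall back to a full 32 bits
    }
    prev = ts;
    prev_delta = delta;
}
```

For the example above, the delta of deltas of -2 lands in the [-63, 64] bucket, so it costs 2 tag bits plus 7 value bits: 9 bits instead of a full time stamp.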
Figure 3 shows the results of time stamp compression in Gorilla. We have found
that about 96% of all time stamps can be compressed to a single bit.
(i.e., 96% of all time stamps occur at regular intervals, such that the delta of deltas is zero).
So much for time stamps; what about the data values themselves?
We discovered that the value in most time series does not change significantly
when compared to its neighboring data points. Further, many data sources only
store integers. This allowed us to tune the expensive prediction scheme in [25]
to a simpler implementation that merely compares the current value to the
previous value. If values are close together the sign, exponent, and first few bits
of the mantissa will be identical. We leverage this to compute a simple XOR of
the current and previous values rather than employing a delta encoding
scheme.
The values are then encoded as follows:

1. The first value in a block is stored with no compression.
2. If the XOR with the previous value is zero (i.e., the value is identical), store a single ‘0’ bit.
3. If the XOR is non-zero, calculate the number of leading and trailing zeros in the XOR’d value and store a ‘1’ bit, followed by one of two cases:
(a) Control bit ‘0’: if the meaningful bits fit within the meaningful bit range of the previous value (at least as many leading zeros and at least as many trailing zeros as before), reuse that block position and store just the meaningful bits of the XOR’d value.
(b) Control bit ‘1’: if the meaningful bits do not fit within the meaningful bit range of the previous value, store the number of leading zeros in the next 5 bits, the length of the meaningful XOR’d value in the next 6 bits, and finally the meaningful bits of the XOR’d value.
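A sketch of that decision logic, with the actual bit emission omitted. The function and struct names are mine, and I’m leaning on C++20’s <bit> helpers for the leading/trailing zero counts:

```cpp
#include <bit>        // std::countl_zero / std::countr_zero (C++20)
#include <cstdint>
#include <cstring>

// View a double's raw bits so it can be XORed with its predecessor.
uint64_t bits_of(double v) {
    uint64_t b;
    std::memcpy(&b, &v, sizeof b);
    return b;
}

struct ValueEncoding {
    int control_bits;   // 0 -> single '0' bit, 2 -> '10' case, 3 -> '11' case
    int total_bits;     // total size of this value's encoding
};

// Decide which of the cases above applies to `value`, given the previous
// value and the leading/trailing zero counts used for the previous encoding.
ValueEncoding plan_value(double value, double previous,
                         int prev_leading, int prev_trailing) {
    const uint64_t x = bits_of(value) ^ bits_of(previous);
    if (x == 0)
        return {0, 1};                              // identical value: one '0' bit

    const int leading    = std::countl_zero(x);
    const int trailing   = std::countr_zero(x);
    const int meaningful = 64 - leading - trailing;

    if (leading >= prev_leading && trailing >= prev_trailing) {
        // '10': the meaningful bits fit inside the previous block position,
        // so only the (previous-width) meaningful bits are stored.
        const int prev_meaningful = 64 - prev_leading - prev_trailing;
        return {2, 2 + prev_meaningful};
    }
    // '11': 5 bits for the leading-zero count, 6 bits for the meaningful
    // length, then the meaningful bits themselves.
    return {3, 2 + 5 + 6 + meaningful};
}
```

XORing the raw IEEE 754 bits is what makes the observation about neighboring values pay off: when only the low-order mantissa bits differ, the XOR’d value has long runs of leading and trailing zeros, and those runs are cheap to encode.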
Roughly 51% of all values are compressed to a single bit since the current and
previous values are identical. About 30% of the values are compressed with the
control bits ‘10’ with an average compressed size of 26.6 bits. The remaining 19%
are compressed with control bits ‘11’, with an average size of 36.9 bits, due to the
extra overhead required to encode the length of leading zero bits and
meaningful bits.
Gorilla’s low latency processing (over 70x faster than the previous system it replaced) enabled the Facebook team to build a number of tools on top. These include horizon charts; aggregated roll-ups, which update based on all completed buckets every two hours; and the correlation engine we saw being used in the opening case study.

Among the lessons learned that the authors call out, three stand out:
1. Prioritize recent data over historical data. Why things are broken right now is a more
pressing question than why they were broken 2 days ago.
2. Read latency matters – without fast reads, the more advanced tools built on top would not have been practical.
3. High availability trumps resource efficiency.
We found that building a reliable, fault tolerant system was the most time
consuming part of the project. While the team prototyped a high performance,
compressed, in-memory TSDB in a very short period of time, it took several
more months of hard work to make it fault tolerant. However, the advantages of
fault tolerance were visible when the system successfully survived both real
and simulated failures.