0% found this document useful (0 votes)

89 views6 pages

Schema Design For Time Series Data in Bigtable

This document describes different schema designs for storing time series data in Cloud Bigtable. It discusses time bucket patterns where each row represents a bucket of time like an hour or day. It also covers single-timestamp patterns where each event is a separate row. Within these approaches, data can be stored in columns or serialized into a single column. The best design depends on the queries and size of the data. The examples illustrate recording weather balloon measurements stored different ways.

Uploaded by

Jo booking

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

89 views6 pages

Schema Design For Time Series Data in Bigtable

Uploaded by

Jo booking

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

Schema design for time series data

cloud.google.com/bigtable/docs/schema-design-time-series

This page describes schema design patterns for storing time series data in Cloud Bigtable.
This page builds on Designing your schema and assumes you are familiar with the
concepts and recommendations described on that page.

A time series is a collection of data that consists of measurements and the times when the
measurements are recorded. Examples of time series include the following:

The plot of memory usage on your computer

Temperature over time on a news report
Stock market prices over a period of time

A good schema results in excellent performance and scalability, and a bad schema can
lead to a poorly performing system. However, no single schema design provides the best
fit for all use cases.

The patterns described on this page provide a starting point. Your unique dataset and the
queries you plan to use are the most important things to consider as you design a schema
for your time-series data.

The basic design patterns for storing time-series data in Bigtable are as follows:

Data for examples

To illustrate the differences between patterns, the examples on this page assume that you
are storing data for an app that records the measurements that weather balloons take
once every minute. We use event to mean a single request that writes one or multiple cells
at the same time. Location IDs correspond with Google Cloud regions.

Measurement Example

Pressure (pascals) 94587

Temperature (Celsius) 9.5

Humidity (percentage) 65

Altitude (meters) 601

Related data Example

1. Timestamps on this page are formatted like `tYYYY-MM-DD-HHMM` for human

readability. In a production table, timestamps are usually expressed as the
number of microseconds since 1970-01-0100:00:00 UTC, like
`1616264288050807`).

1/6
Measurement Example

Balloon ID 3698

Location asia-southeast1

Timestamp1 t2021-03-05-1204

1. Timestamps on this page are formatted like `tYYYY-MM-DD-HHMM` for human

readability. In a production table, timestamps are usually expressed as the
number of microseconds since 1970-01-0100:00:00 UTC, like
`1616264288050807`).

Time buckets
In a time bucket pattern, each row in your table represents a "bucket" of time, such as an
hour, day, or month. A row key includes a non-timestamp identifier, such as week49 , for
the time period recorded in the row, along with other identifying data.

The size of the bucket that you use — such as minute, hour, or day — depends on the
queries that you plan to use and on Bigtable data size limits. For instance, if rows that
contain an hour of data are bigger the recommended maximum size per row of 100 MB,
then rows that represent a half hour or a minute are probably a better choice.

Advantages of time bucket patterns include the following:

You'll see better performance. For example, if you store 100 measurements, Bigtable
writes and reads those measurements faster if they are in one row than if they are in
100 rows.

Data stored in this way is compressed more efficiently than data in tall, narrow
tables.

Disadvantages include the following:

Time-bucket schema design patterns are more complicated than single-timestamp

patterns and can take more time and effort to develop.

Adding new columns for new events

In this time bucket pattern, you write a new column to a row for each event, storing the
data in the column qualifier rather than as a cell value. This means that for each cell, you
send the column family, column qualifier, and timestamp, but no value.

Using this pattern for the sample weather balloon data, each row contains all the
measurements for a single metric, such as pressure , for a single weather balloon, over
the course of a week. Each row key contains the location, balloon ID, metric that you are

2/6
recording in the row, and a week number. Every time a balloon reports its data for a
metric, you add a new column to the row. The column qualifier contains the
measurement, the pressure in Pascals, for the minute identified by the cell timestamp.

In this example, after three minutes a row might look like this:

Row key 94558 94122 95992

us- t2021-03-05- t2021-03-05- t2021-03-05-

west2#3698#pressure#week1 1200 1201 1202

Use cases for this pattern include the following:

You don't need to measure changes in your time series data.

You want to save storage space by using column qualifiers as data.

Adding new cells for new events

In this time bucket pattern, you add new cells to existing columns when you write a new
event. This pattern lets you take advantage of Bigtable's ability to let you store multiple
timestamped cells in a given row and column. It's important to specify garbage collection
rules when you use this pattern.

Using the weather balloon data as an example, each row contains all the measurements
for a single weather balloon over the course of a week. The row key prefix is an identifier
for the week, so you can read an entire week's worth of data for multiple balloons with a
single query. The other row key segments are the location where the balloon operates and
the ID number for the balloon. The table has one column family, measurements , and
that column family has one column for each type of measurement: pressure ,
temperature , humidity , and altitude .

Every time a balloon sends its measurements, the application writes new values to the row
that holds the current week's data for the balloon, writing additional timestamped cells to
each column. At the end of the week, each column in each row has one measurement for
each minute of the week, or 10,080 cells (if your garbage collection policy allows it).

Each column in each row holds a measurement for each minute of the week. In this case,
after three minutes, the first two columns in a row might look like this:

Row key pressure temp

asia-south2#3698#week1 94558 (t2021-03-05-1200) 9.5 (t2021-03-05-1200)

94122 (t2021-03-05-1201) 9.4 (t2021-03-05-1201)

95992 (t2021-03-05-1202) 9.2 (t2021-03-05-1202)

Use cases for this pattern include the following:

3/6
You want to be able to measure changes in measurements over time.

Single-timestamp rows
In this pattern, you create a row for each new event or measurement instead of adding
cells to columns in existing rows. The row key suffix is the timestamp value. Tables that
follow this pattern tend to be tall and narrow, and each column in a row contains only
one cell.

Important: To avoid hotspots, never use a timestamp value as a row key prefix.

Single-timestamp serialized
In this pattern, you store all the data for a row in a single column in a serialized format
such as a protocol buffer (protobuf). This approach is described in more detail on
Designing your schema.

For example, if you use this pattern to store the weather balloon data, your table might
look like this after four minutes:

Row key measurements_blob

us-west2#3698#2021-03-05-1200 protobuf_1

us-west2#3698#2021-03-05-1201 protobuf_2

us-west2#3698#2021-03-05-1202 protobuf_3

us-west2#3698#2021-03-05-1203 protobuf_4

Advantages of this pattern include the following:

Storage efficiency

Speed

Disadvantages include the following:

The inability to retrieve only certain columns when you read the data

The need to deserialize the data after it's read

Use cases for this pattern include the following:

You are not sure how you will query the data or your queries might fluctuate.

Your need to keep costs down outweighs your need to be able to filter data before
you retrieve it from Bigtable.

Each event contains so many measurements that you might exceed the 100 MB per-
row limit if you store the data in multiple columns.

4/6
Single-timestamp unserialized
In this pattern, you store each event in its own row, even if you are recording only one
measurement. The data in the columns is not serialized.

Advantages of this pattern include the following:

It is generally easier to implement than a time-bucket pattern.

You might spend less time refining your schema before using it.

Disadvantages of this pattern often outweigh the advantages:

Bigtable is less performant with this pattern.

Data stored this way is not as efficiently compressed as data in wider columns.

Even when the timestamp is at the end of the row key, this pattern can result in
hotspots.

Use cases for this pattern include the following:

You want to always retrieve all columns but only a specified range of timestamps,
but you have a reason not to store the data in a serialized structure.

You want to store an unbounded number of events.

Using the weather balloon example data, the column family and column qualifiers are the
same as the example using time buckets and new cells. In this pattern, however, every set
of reported measurements for each weather balloon is written to a new row. The
following table shows five rows that are written using this pattern:

Row key pressure temperature humidity altitude

us-west2#3698#2021-03-05-1200 94558 9.6 61 612

us-west2#3698#2021-03-05-1201 94122 9.7 62 611

us-west2#3698#2021-03-05-1202 95992 9.5 58 602

us-west2#3698#2021-03-05-1203 96025 9.5 66 598

us-west2#3698#2021-03-05-1204 96021 9.6 63 624

Additional strategies
If you need to send multiple different queries for the same dataset, consider storing your
data in multiple tables, each with a row key designed for one of the queries.

You can also combine patterns in some cases. For example, you can store serialized data
in rows that represent time buckets, as long as you don't let the rows become too big.

5/6
What's next
Review the steps involved in planning a schema.
Understand the best practices for designing a schema.
Read about the performance you can expect from Bigtable.
Explore the diagnostic capabilities of Key Visualizer.
Work through a tutorial on monitoring time-series data with OpenTSDB and Google
Cloud.

Was this helpful?

6/6

AWS Redshift
No ratings yet
AWS Redshift
145 pages
4 - Dimensional Modeling
No ratings yet
4 - Dimensional Modeling
71 pages
Exam Topics - PDE - Questions-7w1dhd9jefy8p8w9ucpjurqidy
No ratings yet
Exam Topics - PDE - Questions-7w1dhd9jefy8p8w9ucpjurqidy
64 pages
ADO Lecture II 2024-26
No ratings yet
ADO Lecture II 2024-26
67 pages
XCMG Catalogue 2017
67% (6)
XCMG Catalogue 2017
14 pages
20 GFS BigTable
No ratings yet
20 GFS BigTable
36 pages
6.1 GCP - Cloud - Bigtable PDF
100% (1)
6.1 GCP - Cloud - Bigtable PDF
18 pages
Name: Reena Kale Te Comps Roll No: 23 DWM Experiment No: 1 Title: Designing A Data Warehouse Schema For A Case Study and Performing
No ratings yet
Name: Reena Kale Te Comps Roll No: 23 DWM Experiment No: 1 Title: Designing A Data Warehouse Schema For A Case Study and Performing
7 pages
Lecture 33
No ratings yet
Lecture 33
32 pages
Data Warehousing and Data Mining Dec 2023
No ratings yet
Data Warehousing and Data Mining Dec 2023
28 pages
Building A Scalable Time-Series Database Using Postgres: Mike Freedman
No ratings yet
Building A Scalable Time-Series Database Using Postgres: Mike Freedman
45 pages
Unit 5 Lecture 3
No ratings yet
Unit 5 Lecture 3
18 pages
SnowFlake Notes
100% (1)
SnowFlake Notes
40 pages
Schema Design Best Practices - Bigtable Documentation - Google Cloud
No ratings yet
Schema Design Best Practices - Bigtable Documentation - Google Cloud
14 pages
Storage Systems
No ratings yet
Storage Systems
23 pages
Apache Calcite - A Foundational Framework For Optimized Query Processing Over Heterogeneous Data Sources - Sigmod-2018
No ratings yet
Apache Calcite - A Foundational Framework For Optimized Query Processing Over Heterogeneous Data Sources - Sigmod-2018
23 pages
Bigtable Overview - Google Cloud
No ratings yet
Bigtable Overview - Google Cloud
8 pages
Time-Series, Graph Database Deep Dive
No ratings yet
Time-Series, Graph Database Deep Dive
20 pages
DWM Unit 2. Data Warehousing Modeling & OLAP I
100% (2)
DWM Unit 2. Data Warehousing Modeling & OLAP I
16 pages
Week 4
No ratings yet
Week 4
8 pages
Bigtable: A Distributed Storage System For Structured Data
No ratings yet
Bigtable: A Distributed Storage System For Structured Data
23 pages
Big Table
No ratings yet
Big Table
10 pages
Bigtable: A Distributed Storage System For Structured Data
No ratings yet
Bigtable: A Distributed Storage System For Structured Data
26 pages
Bigtable: A Distributed Storage System For Structured Data: Presentation On Paper by
No ratings yet
Bigtable: A Distributed Storage System For Structured Data: Presentation On Paper by
12 pages
Experiment2 E059 DWM PDF
No ratings yet
Experiment2 E059 DWM PDF
10 pages
HBase
No ratings yet
HBase
14 pages
Unit 5
No ratings yet
Unit 5
5 pages
Best Practices High Performance ETL To BigQuery Kerzqp
No ratings yet
Best Practices High Performance ETL To BigQuery Kerzqp
9 pages
Timescaledb: SQL Made Scalable For Time-Series Data: 1 Background
No ratings yet
Timescaledb: SQL Made Scalable For Time-Series Data: 1 Background
7 pages
BDA
No ratings yet
BDA
9 pages
Sqlfordevscom Next Level Database Techniques For Developers Pages 21 30
No ratings yet
Sqlfordevscom Next Level Database Techniques For Developers Pages 21 30
10 pages
Assingment:-2 Submitted To: - Mandeep Ma'Am Submitted By: - Nishant Ruhil UID:-17BCA1513 GROUP:-4 Class: - Bca-4D
No ratings yet
Assingment:-2 Submitted To: - Mandeep Ma'Am Submitted By: - Nishant Ruhil UID:-17BCA1513 GROUP:-4 Class: - Bca-4D
6 pages
Write Examples BigTable
No ratings yet
Write Examples BigTable
4 pages
Google Bigtable
No ratings yet
Google Bigtable
21 pages
Bigtable A System For Distributed Structured Storage: Motivation
No ratings yet
Bigtable A System For Distributed Structured Storage: Motivation
9 pages
Time Series Collections Considerations - MongoDB Manual v8.0
No ratings yet
Time Series Collections Considerations - MongoDB Manual v8.0
2 pages
Experiment No.02: LAB Manual Part A
No ratings yet
Experiment No.02: LAB Manual Part A
10 pages
BigData-Assignment4-CSP 554
No ratings yet
BigData-Assignment4-CSP 554
4 pages
Jumeed Oral Questions
100% (1)
Jumeed Oral Questions
261 pages
Star Schema B
No ratings yet
Star Schema B
10 pages
BA
No ratings yet
BA
6 pages
Bda Mid Ans
No ratings yet
Bda Mid Ans
18 pages
OD 03 PDE Building and Operationalizing Data Processing Systems
No ratings yet
OD 03 PDE Building and Operationalizing Data Processing Systems
34 pages
UCS15E08 - Cloud Computing - Unit 3 Notes
No ratings yet
UCS15E08 - Cloud Computing - Unit 3 Notes
13 pages
Columnar Database
No ratings yet
Columnar Database
18 pages
Timescale DB
No ratings yet
Timescale DB
3 pages
Hbase in Practice
No ratings yet
Hbase in Practice
46 pages
Exam Overview: GCP Data Engineer
100% (5)
Exam Overview: GCP Data Engineer
12 pages
In The Star Schema Design
No ratings yet
In The Star Schema Design
11 pages
Exam Overview: GCP Data Engineer
100% (1)
Exam Overview: GCP Data Engineer
12 pages
Bigtable: A Distributed Storage System For Structured Data
100% (1)
Bigtable: A Distributed Storage System For Structured Data
4 pages
HBase Design Patterns Sample Chapter
No ratings yet
HBase Design Patterns Sample Chapter
16 pages
DatawareHousing Concepts
No ratings yet
DatawareHousing Concepts
20 pages
Big Table
No ratings yet
Big Table
21 pages
AHM13e Chapter - 01 - Solution To Problems and Key To Cases
No ratings yet
AHM13e Chapter - 01 - Solution To Problems and Key To Cases
19 pages
The Ailing Planet Notes
100% (1)
The Ailing Planet Notes
4 pages
Data Storage Services in GCP: Relational Database Data Warehouse Nosql Big Data Database Service
No ratings yet
Data Storage Services in GCP: Relational Database Data Warehouse Nosql Big Data Database Service
15 pages
Untitled
No ratings yet
Untitled
1 page
Sunlight Dishwashing Liquid Msds
No ratings yet
Sunlight Dishwashing Liquid Msds
12 pages
QA For Bank
No ratings yet
QA For Bank
443 pages
School Based Management
No ratings yet
School Based Management
6 pages
Latihan Bahasa Inggris
No ratings yet
Latihan Bahasa Inggris
13 pages
Advance Structures (7th Semester) (B.ARCH)
No ratings yet
Advance Structures (7th Semester) (B.ARCH)
93 pages
Computer Organization: Basic Structure of Computer
No ratings yet
Computer Organization: Basic Structure of Computer
59 pages
Maths PP2 Marking Scheme - Docx Form 3 End Term 3 Excellence
No ratings yet
Maths PP2 Marking Scheme - Docx Form 3 End Term 3 Excellence
9 pages
Guide For The IFT Approval
No ratings yet
Guide For The IFT Approval
34 pages
Generalized Elliptical Distributions Theory and Applications (Thesis) - Frahm (2004)
No ratings yet
Generalized Elliptical Distributions Theory and Applications (Thesis) - Frahm (2004)
145 pages
Required Documents - World Education Services
No ratings yet
Required Documents - World Education Services
6 pages
Organization Behavior: Manish Awasthi
100% (1)
Organization Behavior: Manish Awasthi
11 pages
Importance of ITeS
No ratings yet
Importance of ITeS
12 pages
Public Administration:: Your Unofficially The Compulsory Subject (In The Changed Context)
No ratings yet
Public Administration:: Your Unofficially The Compulsory Subject (In The Changed Context)
4 pages
Employee Survey Questionnaire
No ratings yet
Employee Survey Questionnaire
1 page
Grade 10 - Unit 01
No ratings yet
Grade 10 - Unit 01
2 pages
State Space Control of Systems Tutorial
No ratings yet
State Space Control of Systems Tutorial
15 pages
123GL Undstd Cybersec
No ratings yet
123GL Undstd Cybersec
6 pages
Dividend Payout of Meezan Sovereign Fund and Meezan Cash Fund
No ratings yet
Dividend Payout of Meezan Sovereign Fund and Meezan Cash Fund
11 pages
NHB Ebook Wet Markets
No ratings yet
NHB Ebook Wet Markets
19 pages
RD Rigidsteelconduitimc
No ratings yet
RD Rigidsteelconduitimc
1 page
Journey Management Plan 3
No ratings yet
Journey Management Plan 3
1 page
(Communication Electronic Circuits) Preface
No ratings yet
(Communication Electronic Circuits) Preface
2 pages
Lab 6 Excel
No ratings yet
Lab 6 Excel
6 pages
Biogase and Its Uses
No ratings yet
Biogase and Its Uses
1 page
Unit 3
No ratings yet
Unit 3
3 pages
13 Marquez v. CA
No ratings yet
13 Marquez v. CA
1 page
Computer-Controlled Systems: Theory and Design, Third Edition
From Everand
Computer-Controlled Systems: Theory and Design, Third Edition
Karl J Åström
3/5 (1)
Introduction to Time Series Analysis
From Everand
Introduction to Time Series Analysis
Vikas Rathi
No ratings yet
Jump into JMP Scripting, Second Edition
From Everand
Jump into JMP Scripting, Second Edition
Wendy Murphrey
No ratings yet
Straight Road to Excel 2013/2016 Pivot Tables: Get Your Hands Dirty
From Everand
Straight Road to Excel 2013/2016 Pivot Tables: Get Your Hands Dirty
Sam Akrasi
No ratings yet
Microsoft Azure Database Administrator DP 300
From Everand
Microsoft Azure Database Administrator DP 300
Manish Soni
No ratings yet
Administering Microsoft Azure SQL Solutions DP 300
From Everand
Administering Microsoft Azure SQL Solutions DP 300
Manish Soni
No ratings yet

Schema Design For Time Series Data in Bigtable

Uploaded by

Schema Design For Time Series Data in Bigtable

Uploaded by

Schema design for time series data

The plot of memory usage on your computer

Data for examples

Pressure (pascals) 94587

Temperature (Celsius) 9.5

Altitude (meters) 601

Related data Example

1. Timestamps on this page are formatted like `tYYYY-MM-DD-HHMM` for human

1. Timestamps on this page are formatted like `tYYYY-MM-DD-HHMM` for human

Advantages of time bucket patterns include the following:

Disadvantages include the following:

Time-bucket schema design patterns are more complicated than single-timestamp

Adding new columns for new events

Row key 94558 94122 95992

us- t2021-03-05- t2021-03-05- t2021-03-05-

Use cases for this pattern include the following:

You don't need to measure changes in your time series data.

You want to save storage space by using column qualifiers as data.

Adding new cells for new events

Row key pressure temp

asia-south2#3698#week1 94558 (t2021-03-05-1200) 9.5 (t2021-03-05-1200)

94122 (t2021-03-05-1201) 9.4 (t2021-03-05-1201)

95992 (t2021-03-05-1202) 9.2 (t2021-03-05-1202)

Use cases for this pattern include the following:

Row key measurements_blob

Advantages of this pattern include the following:

Disadvantages include the following:

The need to deserialize the data after it's read

Use cases for this pattern include the following:

Advantages of this pattern include the following:

It is generally easier to implement than a time-bucket pattern.

Disadvantages of this pattern often outweigh the advantages:

Bigtable is less performant with this pattern.

Use cases for this pattern include the following:

You want to store an unbounded number of events.

Row key pressure temperature humidity altitude

us-west2#3698#2021-03-05-1200 94558 9.6 61 612

us-west2#3698#2021-03-05-1201 94122 9.7 62 611

us-west2#3698#2021-03-05-1202 95992 9.5 58 602

us-west2#3698#2021-03-05-1203 96025 9.5 66 598

us-west2#3698#2021-03-05-1204 96021 9.6 63 624

Was this helpful?

You might also like