
Schema design best practices


This page contains information about Bigtable schema design. Before you read this page, you
should be familiar with the overview of Bigtable (/bigtable/docs/overview). The following topics
are covered on this page:

General concepts (#general-concepts): Basic concepts to keep in mind as you design your
schema.

Best practices (#best-practices): Design guidelines that apply to most use cases, broken
down by table component.

Special use cases (#special-use-cases): Recommendations for some specific use cases
and data patterns.

General concepts
Designing a Bigtable schema is different from designing a schema for a relational database. A
Bigtable schema is defined by application logic rather than by a schema definition object or
file. You can add column families to a table when you create or update the table, but columns
and row key patterns are defined by the data that you write to the table.

In Bigtable, a schema is a blueprint or model of a table, including the structure of the following
table components:

Row keys

Column families, including their garbage collection policies

Columns

Key Point: Design your schema for the queries that you plan to use.

In Bigtable, schema design is driven primarily by the queries, or read requests, that you plan to
send to the table. Because reading a row range (/bigtable/docs/reads#row-range) is the fastest
way to read your Bigtable data, the recommendations on this page are designed to help you
optimize for row range reads. In most cases, that means sending a query based on row key
prefixes.

A secondary consideration is the avoidance of hotspots. To prevent hotspots, you need to consider write patterns and how you can avoid accessing a small key space in a short amount of time.

The following general concepts apply to Bigtable schema design:

Bigtable is a key/value store, not a relational store. It does not support joins, and
transactions are supported only within a single row.

Each table has only one index, the row key. There are no secondary indexes. Each row
key must be unique.

Rows are sorted lexicographically by row key, from the lowest to the highest byte string.
Row keys are sorted in big-endian byte order (sometimes called network byte order), the
binary equivalent of alphabetical order, as illustrated in the sketch after this list.

Column families are not stored in any specific order.

Columns are grouped by column family and sorted in lexicographic order within the
column family. For example, in a column family called SysMonitor with column qualifiers
of ProcessName, User, %CPU, ID, Memory, DiskRead, and Priority, Bigtable stores the
columns in this order:

SysMonitor

%CPU DiskRead ID Memory Priority ProcessName User

The intersection of a row and column can contain multiple timestamped cells. Each cell
contains a unique, timestamped version of the data for that row and column.

All operations are atomic at the row level. An operation affects either an entire row or
none of the row.

Ideally, both reads and writes should be distributed evenly across the row space of a
table.

Bigtable tables are sparse. A column doesn't take up any space in a row that doesn't use
the column.
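
To make the sorting behavior concrete, here is a minimal, self-contained Python sketch (the keys are illustrative, not from a real table) showing that row keys compare byte by byte, so a key sorts immediately before its own extensions:

keys = [b"tablet#a0b81f74", b"phone#4c410523#20200501", b"phone#4c410523"]
sorted(keys)
# [b'phone#4c410523', b'phone#4c410523#20200501', b'tablet#a0b81f74']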

Best practices


A good schema results in excellent performance and scalability, and a poorly designed schema
can lead to a poorly performing system. Every use case is different and requires its own
design, but the following best practices apply to most use cases. Exceptions are noted.

Starting at the table level and working down to the row key level, the following sections
describe the best practices for schema design:

Tables (#tables)

Column families (#column-families)

Columns (#columns)

Rows (#rows)

Cells (#cells)

Row keys (#row-keys)

All table elements, especially row keys, should be designed with planned read requests in
mind. Check quotas and limits (/bigtable/quotas#limits-data-size) for recommended and hard size
limits for all table elements.

Because all tables in an instance are stored on the same tablets
(/bigtable/docs/overview#architecture), a schema design that results in hotspots in one table can
affect the latency of other tables in the same instance. Hotspots are caused by frequently
accessing one part of the table in a short period of time.

Tables
Store datasets with similar schemas in the same table, rather than in separate tables.

In other database systems, you might choose to store data in multiple tables based on the
subject and number of columns. In Bigtable, however, it's usually better to store all your data in
one table. You can assign a unique row key prefix to use for each dataset, so that Bigtable
stores the related data in a contiguous range of rows that you can then query by row key prefix.
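
As a rough illustration of how a prefix maps to a contiguous row range, the following Python helper (a sketch under stated assumptions, not part of any Bigtable client library) computes the start and end keys that bound every row key beginning with a given prefix:

def prefix_range(prefix: bytes):
    """Return (start_key, end_key) bounding all row keys that begin with prefix."""
    p = bytearray(prefix)
    # Drop trailing 0xff bytes, then increment the last byte to get the
    # first key that sorts after every key carrying this prefix.
    while p and p[-1] == 0xFF:
        p.pop()
    if not p:
        return prefix, b""  # an all-0xff prefix: scan to the end of the table
    p[-1] += 1
    return prefix, bytes(p)

# prefix_range(b"dataset1#") == (b"dataset1#", b"dataset1$")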

Bigtable has a limit of 1,000 tables per instance, but usually you should have far fewer tables.
Avoid creating a large number of tables for the following reasons:

Sending requests to many different tables can increase backend connection overhead,
resulting in increased tail latency.


Having multiple tables of different sizes can disrupt the behind-the-scenes load
balancing that makes Bigtable function well.

You might justifiably want a separate table for a different use case that requires a different
schema, but you shouldn't use separate tables for similar data. For example, you shouldn't
create a new table because it's a new year or you have a new customer.

Column families
Put related columns in the same column family. When a row contains multiple values that are
related to one another, it's a good practice to group the columns that contain those values in
the same column family. Group data as closely as you can so that your most frequent read
requests return just the information that you need, but no more, without requiring complex
filters.

Create up to about 100 column families per table. Creating more than 100 column families
may cause performance degradation.

Choose short names for your column families. Names are included in the data that is
transferred for each request.

Put columns that have different data retention needs in different column families. This
practice is important if you want to limit storage costs. Garbage collection policies are set at
the column family level, not at the column level. For example, if you only need to keep the most
recent version of a particular piece of data, don't store it in a column family that is set to store
1,000 versions of something else. Otherwise you're paying to store 999 cells of data that you
don't need.
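
As an example, here is a hedged sketch using the google-cloud-bigtable Python client (the project, instance, table, and column family IDs are hypothetical) that gives two column families different garbage collection policies:

from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("my-table")

# Keep only the most recent cell per column in this family.
table.column_family("current_state",
                    gc_rule=column_family.MaxVersionsGCRule(1)).create()

# Keep up to 1,000 versions in a family that needs history.
table.column_family("history",
                    gc_rule=column_family.MaxVersionsGCRule(1000)).create()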

Columns
(Optional) Treat column qualifiers as data. Since you have to store a column qualifier for every
column, you can save space by naming the column with a value. As an example, consider a
table that stores data about friendships in a Friends column family. Each row represents a
person and all their friendships. Each column qualifier can be the ID of a friend. Then the value
for each column in that row can be the social circle the friend is in. In this example, rows might
look like this:

Row key   Column qualifier:value   Column qualifier:value   Column qualifier:value

Jose      Fred:book-club           Gabriel:work             Hiroshi:tennis

Sofia     Hiroshi:work             Seo Yoon:school          Jakob:chess-club

Contrast this schema with a schema for the same data that doesn't treat column qualifiers as
data and instead has the same columns in every row:

Row key   Column qualifier:value   Column qualifier:value

Jose#1    Friend:Fred              Circle:book-club

Jose#2    Friend:Gabriel           Circle:work

Jose#3    Friend:Hiroshi           Circle:tennis

Sofia#1   Friend:Hiroshi           Circle:work

Sofia#2   Friend:Seo Yoon          Circle:school

Sofia#3   Friend:Jakob             Circle:chess-club

The second schema design causes the table to grow much faster.
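
Under the first schema, writing one person's friendships is a single-row mutation. Here is a hedged sketch with the Python client, reusing the table object from the earlier sketch (the row key, family, and values come from the example above):

row = table.direct_row(b"Jose")
row.set_cell("Friends", b"Fred", b"book-club")
row.set_cell("Friends", b"Gabriel", b"work")
row.set_cell("Friends", b"Hiroshi", b"tennis")
row.commit()  # one atomic write covering all of Jose's friendships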

If you're using column qualifiers to store data (#columns), give column qualifiers short but
meaningful names. This approach reduces the amount of data that is transferred for each
request. The maximum size of a column qualifier is 16 KB.

Create as many columns as you need in the table. Bigtable tables are sparse, and there's no
space penalty for a column that is not used in a row. You can have millions of columns in a
table, as long as no row exceeds the maximum limit of 256 MB per row.

Avoid using too many columns in any single row. Even though a table can have millions of
columns, a row shouldn't. A few factors contribute to this best practice:

It takes time for Bigtable to process each cell in a row.

Each cell adds some overhead to the amount of data that's stored in your table and sent
over the network. For example, if you're storing 1 KB (1,024 bytes) of data, it's much more
space-efficient to store that data in a single cell, rather than spreading the data across
1,024 cells that each contain 1 byte.

If your dataset logically requires more columns per row than Bigtable can process efficiently,
consider storing the data as a protobuf in a single column (#query-flux).


Rows
Keep the size of all values in a single row under 100 MB. Make sure that data in a single row
doesn't exceed 256 MB. Rows that exceed this limit can result in reduced read performance.

Keep all information for an entity in a single row. For most use cases, avoid storing data that
you must read atomically, or all at once, in more than one row to avoid inconsistencies. For
example, if you update two rows in a table, it's possible that one row will be updated
successfully and the other update will fail. Make sure your schema does not require more than
one row to be updated at the same time for related data to be accurate. This practice ensures
that if part of a write request fails or must be sent again, that piece of data is not temporarily
incomplete.
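
To sketch the difference in Python (reusing the table object from earlier sketches; the keys, family, and values are hypothetical): a single commit to one row is atomic, while the same data split across two rows needs two commits that can fail independently:

# Atomic: both cells land together or not at all.
row = table.direct_row(b"account#1234")
row.set_cell("balances", b"checking", b"100")
row.set_cell("balances", b"savings", b"250")
row.commit()

# Not atomic: two rows means two commits, and the second can fail after
# the first succeeds, leaving related data temporarily inconsistent.
row_a = table.direct_row(b"account#1234#checking")
row_a.set_cell("balances", b"amount", b"100")
row_a.commit()
row_b = table.direct_row(b"account#1234#savings")
row_b.set_cell("balances", b"amount", b"250")
row_b.commit()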

Exception: If keeping an entity in a single row results in rows that are hundreds of MB, you
should split the data across multiple rows.

Store related entities in adjacent rows, to make reads more efficient.

Cells
Don't store more than 10 MB of data in a single cell. Recall that a cell is the data stored for a
given row and column with a unique timestamp, and that multiple cells can be stored at the
intersection of that row and column. The number of cells retained in a column is governed by
the garbage collection policy (/bigtable/docs/garbage-collection) that you set for the column
family that contains that column.

Row keys
Design your row key based on the queries you will use to retrieve the data. Well-designed row
keys get the best performance out of Bigtable. The most efficient Bigtable queries retrieve data
using one of the following:

Row key

Row key prefix

Range of rows defined by starting and ending row keys

Other types of queries trigger a full table scan, which is much less efficient. By choosing the
correct row key now, you can avoid a painful data migration process later.


Keep your row keys short. A row key must be 4 KB or less. Long row keys take up additional
memory and storage and increase the time it takes to get responses from the Bigtable server.

Store multiple delimited values in each row key. Because the best way to query Bigtable
efficiently is by row key, it's often useful to include multiple identifiers in your row key. When
your row key includes multiple values, it's especially important to have a clear understanding of
how you use your data.

Row key segments are usually separated by a delimiter, such as a colon, slash, or hash symbol.
The first segment or set of contiguous segments is the row key prefix, and the last segment or
set of contiguous segments is the row key suffix.

Well-planned row key prefixes let you take advantage of Bigtable's built-in sorting order to store
related data in contiguous rows. Storing related data in contiguous rows lets you access
related data as a range of rows, rather than running inefficient table scans.

If your data includes integers that you want to store or sort numerically, pad the integers with
leading zeroes. Bigtable stores data lexicographically (#general-concepts). For example,
lexicographically, 3 > 20 but 20 > 03. Padding the 3 with a leading zero ensures that the
numbers are sorted numerically. This tactic is important for timestamps where range-based
queries are used.
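
A quick Python illustration of why the padding matters under lexicographic (string) comparison:

sorted(["3", "20"])    # ['20', '3']   -- unpadded, so 3 sorts after 20
sorted(["03", "20"])   # ['03', '20']  -- padded, so numeric order is preserved
f"{3:04d}"             # '0003'        -- pad to a fixed width before writing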

It's important to create a row key that makes it possible to retrieve a well-defined range of
rows. Otherwise, your query requires a table scan, which is much slower than retrieving
specific rows.

For example, if your application tracks mobile device data, you can have a row key that
consists of device type, device ID, and the day the data is recorded. Row keys for this data
might look like this:

phone#4c410523#20200501
phone#4c410523#20200502
tablet#a0b81f74#20200501
tablet#a0b81f74#20200502

This row key design lets you retrieve data with a single request for:

A device type

A combination of device type and device ID
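
For example, here is a hedged sketch with the Python client (project, instance, and table IDs are hypothetical) that reads all rows for one device with a single range request. Because '$' is the byte that follows '#', the half-open range below covers every key with the prefix phone#4c410523#:

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("mobile-device-data")

for row in table.read_rows(start_key=b"phone#4c410523#",
                           end_key=b"phone#4c410523$"):
    print(row.row_key)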

This row key design would not be optimal if you want to retrieve all data for a given day.
Because the day is stored in the third segment, or the row key suffix, you cannot just request a
range of rows based on the suffix or a middle segment of the row key. Instead, you have to
send a read request with a filter that scans the entire table looking for the day value.

Use human-readable string values in your row keys whenever possible. This practice makes it
easier to use the Key Visualizer tool (/bigtable/docs/keyvis-overview) to troubleshoot issues with
Bigtable.

Often, you should design row keys that start with a common value and end with a granular
value. For example, if your row key includes a continent, country, and city, you can create row
keys that look like the following so that they automatically sort first by values with lower
cardinality:

asia#india#bangalore
asia#india#mumbai
asia#japan#okinawa
asia#japan#sapporo
southamerica#bolivia#cochabamba
southamerica#bolivia#lapaz
southamerica#chile#santiago
southamerica#chile#temuco

Note: In production, you would probably use identifiers that take up less storage, like AS for Asia or 591 for
Bolivia.

Row keys to avoid

Some types of row keys can make it difficult to query your data, and some result in poor
performance. This section describes some types of row keys that you should avoid using in
Bigtable.

Row keys that start with a timestamp. This pattern causes sequential writes to be pushed onto
a single node, creating a hotspot
(/bigtable/docs/schema-design-time-series#ensure_that_your_row_key_avoids_hotspotting). If you put a
timestamp in a row key, precede it with a high-cardinality value like a user ID to avoid hotspots.

Row keys that cause related data to not be grouped. Avoid row keys that cause related data to
be stored in non-contiguous row ranges, which are inefficient to read together.

Sequential numeric IDs. Suppose that your system assigns a numeric ID to each of your
application's users. You might be tempted to use the user's numeric ID as the row key for your
table. However, because new users are more likely to be active users, this approach is likely to
push most of your traffic to a small number of nodes.

A safer approach is to use a reversed version of the user's numeric ID, which spreads traffic
more evenly across all of the nodes for your Bigtable table.
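
One way to reverse an ID, as a hedged Python sketch (the fixed width is an assumption so that all keys stay the same length):

def reversed_id(user_id: int, width: int = 10) -> str:
    """Zero-pad, then reverse the digits: 1234 -> '4321000000'."""
    return f"{user_id:0{width}d}"[::-1]

# Sequential IDs now scatter across the key space:
# reversed_id(1000001) -> '1000001000'
# reversed_id(1000002) -> '2000001000'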

Frequently updated identifiers. Avoid using a single row key to identify a value that must be
updated frequently. For example, if you store memory-usage data for a number of devices once
per second, do not use a single row key for each device that is made up of the device ID and
the metric being stored, such as 4c410523#memusage, and update the row repeatedly. This type
of operation overloads the tablet that stores the frequently used row. It can also cause a row to
exceed its size limit, because a column's previous values take up space until the cells are
removed during garbage collection.

Instead, store each new reading in a new row. Using the memory usage example, each row key
can contain the device ID, the type of metric, and a timestamp, so the row keys are similar to
4c410523#memusage#1423523569918. This strategy is efficient because in Bigtable, creating a
new row takes no more time than creating a new cell. In addition, this strategy lets you quickly
read data from a specific date range by calculating the appropriate start and end keys.

For values that change frequently, such as a counter that is updated hundreds of times each
minute, it's best to keep the data in memory, at the application layer, and write new rows to
Bigtable periodically.

Hashed values. Hashing a row key removes your ability to take advantage of Bigtable's natural
sorting order, making it impossible to store rows in a way that is optimal for querying. For the
same reason, hashing values makes it challenging to use the Key Visualizer tool to
troubleshoot issues with Bigtable. Use human-readable values instead of hashed values.


Values expressed as raw bytes rather than human-readable strings. Raw bytes are fine for
column values, but for readability and troubleshooting, use string values in row keys.

Special use cases


You may have a unique dataset that requires special consideration when designing a schema
to store it in Bigtable. This section describes some, though not all, of the common types of
Bigtable data, along with suggested tactics for storing each in the most optimal way.

Time-based data
Include a timestamp as part of your row key if you often retrieve data based on the time when
it was recorded.

For example, your application might record performance-related data, such as CPU and
memory usage, once per second for many machines. Your row key for this data could combine
an identifier for the machine with a timestamp for the data (for example,
machine_4223421#1425330757685). Keep in mind that row keys are sorted lexicographically
(#row-keys).

Don't use a timestamp by itself or at the beginning of a row key, because this will cause
sequential writes to be pushed onto a single node, creating a hotspot. In this case, you need to
consider write patterns as well as read patterns.

If you usually retrieve the most recent records first, you can use a reversed timestamp in the
row key by subtracting the timestamp from your programming language's maximum value for
long integers (in Java, java.lang.Long.MAX_VALUE). With a reversed timestamp, the records
will be ordered from most recent to least recent.
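
A minimal Python sketch of this technique (the machine ID matches the example above; padding the result to 19 digits is an assumption so that lexicographic and numeric order agree):

import time

JAVA_LONG_MAX = 2**63 - 1  # java.lang.Long.MAX_VALUE

def reversed_ts_micros() -> int:
    now_micros = int(time.time() * 1_000_000)
    return JAVA_LONG_MAX - now_micros

row_key = f"machine_4223421#{reversed_ts_micros():019d}"
# Newer readings produce smaller reversed values, so they sort first.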

Note: A timestamp is usually the number of microseconds since 1970-01-01 00:00:00 UTC.

For information specifically about working with time series data, see Schema design for time
series data (/bigtable/docs/schema-design-time-series).

Multi-tenancy


Row key prefixes provide a scalable solution for a "multi-tenancy" use case, a scenario in which
you store similar data, using the same data model, on behalf of multiple clients. Using one
table for all tenants is the most efficient way to store and access multi-tenant data.

For example, say you store and track purchase histories on behalf of many companies. You
can use your unique ID for each company as a row key prefix. All data for a tenant is stored in
contiguous rows in the same table, and you can query or filter using the row key prefix. Then,
when a company is no longer your customer and you need to delete the purchase history data
you were storing for the company, you can drop the range of rows
(/bigtable/docs/reference/admin/rpc/google.bigtable.admin.v2#google.bigtable.admin.v2.BigtableTableAdmin.DropRowRange)
that use that customer's row key prefix.

For example, if you are storing mobile device data for customers altostrat and
examplepetstore, you can create row keys like the following. Then, if altostrat is no longer
your customer, you drop all rows with the row key prefix altostrat.

altostrat#phone#4c410523#20190501
altostrat#phone#4c410523#20190502
altostrat#tablet#a0b41f74#20190501
examplepetstore#phone#4c410523#20190502
examplepetstore#tablet#a6b81f79#20190501
examplepetstore#tablet#a0b81f79#20190502
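
With the Python client, dropping a former tenant's rows is a single admin call, sketched below under the assumption that the table object was created with an admin client, as in earlier sketches:

# Delete every row whose key starts with the former customer's prefix.
table.drop_by_prefix(b"altostrat")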

In contrast, if you store data on behalf of each company in its own table, you can experience
performance and scalability issues. You are also more likely to inadvertently reach Bigtable's
limit of 1,000 tables per instance. After an instance reaches this limit, Bigtable prevents you
from creating more tables in the instance.

Privacy
Unless your use case demands it, avoid using personally identifiable information (PII) or user
data in row keys or column family IDs. The values in row keys and column families are both
customer data and service data, and applications that use them, like encryption or logging, can
inadvertently expose them to users who shouldn't have access to private data.

For more information about how service data is handled, see the Google Cloud Privacy Notice
(/terms/cloud-privacy-notice).


Domain names
Wide range of domain names

If you're storing data about entities that can be represented as domain names, consider using
a reverse domain name (for example, com.company.product) as the row key. Using a reverse
domain name is an especially good idea if each row's data tends to overlap with adjacent rows.
In this case, Bigtable can compress your data more efficiently.

In contrast, standard domain names that are not reversed can cause rows to be sorted in such
a way that related data is not grouped together in one place, which can result in less efficient
compression and less efficient reads.

This approach works best when your data is spread across many different reverse domain
names.

To illustrate this point, consider the following domain names, automatically sorted in
lexicographic order by Bigtable:

drive.google.com
en.wikipedia.org
maps.google.com

This is undesirable for the use case where you want to query all rows for google.com. In
contrast, consider the same rows where the domain names have been reversed:

com.google.drive
com.google.maps
org.wikipedia.en

In the second example, the related rows are automatically sorted in a way that makes it easy to
retrieve them as a range of rows.
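
A minimal Python helper for producing reversed keys (an illustrative sketch, not part of any client library):

def reverse_domain(domain: str) -> str:
    """maps.google.com -> com.google.maps"""
    return ".".join(reversed(domain.split(".")))

sorted(reverse_domain(d) for d in
       ["drive.google.com", "en.wikipedia.org", "maps.google.com"])
# ['com.google.drive', 'com.google.maps', 'org.wikipedia.en']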

Few domain names

If you expect to store a lot of data for only one or a small number of domain names, consider
other values for your row key. Otherwise, you might push writes to a single node in your cluster,
resulting in hotspots, or your rows might grow too large.

Changing or uncertain queries


If you don't always run the same queries on your data, or you are unsure what your queries will
be, one option is to store all the data for a row in one column instead of multiple columns. With
this approach, you use a format that makes it easy to extract the individual values later, such
as the protocol buffer (https://developers.google.com/protocol-buffers/docs/overview) binary
format or JSON.

The row key is still carefully designed to make sure you can retrieve the data you need, but
each row typically has only one column that contains all the data for the row in a single
protobuf.
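
Here is a hedged sketch of the pattern with the Python client, reusing the table object from earlier sketches and using JSON instead of a protobuf so the example stays self-contained (the family and qualifier names are hypothetical):

import json

reading = {"cpu": 0.42, "memory_mb": 2048, "disk_read_kbps": 310}

row = table.direct_row(b"machine_4223421#1425330757685")
# One column holds the whole serialized record; readers deserialize it.
row.set_cell("data", b"payload", json.dumps(reading).encode("utf-8"))
row.commit()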

Storing data as a protobuf message in one column instead of spreading the data into multiple
columns has advantages and disadvantages. Advantages include the following:

The data takes up less space, so it costs you less to store it.

You maintain a certain amount of flexibility by not committing to column families and
column qualifiers.

Your reading application does not need to "know" what your table schema is.

Some disadvantages are the following:

You have to deserialize the protobuf messages after they are read from Bigtable.

You lose the option to query the data in protobuf messages using filters.

You can't use BigQuery to run federated queries on fields within protobuf messages after
reading them from Bigtable.

What's next
Learn how to design a schema for time-series data
(/bigtable/docs/schema-design-time-series).

Review the steps involved in planning a schema (/bigtable/docs/schema-design-steps).

Review the types of write requests (/bigtable/docs/writes) you can send to Bigtable.


Review the applicable quotas and limits (/bigtable/quotas).

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License
(https://creativecommons.org/licenses/by/4.0/), and code samples are licensed under the Apache 2.0 License
(https://www.apache.org/licenses/LICENSE-2.0). For details, see the Google Developers Site Policies
(https://developers.google.com/site-policies). Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-03-13 UTC.
