BIGTABLE: A DISTRIBUTED STORAGE SYSTEM FOR STRUCTURED DATA

ERIC JOHN S. DAILISAN

PINMALUDPOD, URDANETA CITY, PANGASINAN

09152227260
INTRODUCTION

Traditional relational databases present a view that is composed of multiple tables, each with rows and named columns. Queries, mostly performed in SQL (Structured Query Language), allow one to extract specific columns from a row where certain conditions are met (e.g., a column has a specific value). Moreover, one can perform queries across multiple tables (this is the "relational" part of a relational database). For example, a table of students may include a student's name, ID number, and contact information. A table of grades may include a student's ID number, course number, and grade. We can construct a query that extracts a student's grades by name by searching for the ID number in the student table and then matching that ID number in the grade table. In addition, with traditional databases, we expect ACID guarantees: that transactions will be atomic, consistent, isolated, and durable. However, as the CAP theorem shows, a distributed system cannot guarantee consistency while also providing high availability and network partition tolerance. This makes ACID databases unattractive for highly distributed environments and has led to the emergence of alternative data stores that are targeted at high availability and high performance. Here, we will look at the structure and capabilities of Bigtable.

BODY

Google Bigtable is a distributed, column-oriented data store created by Google Inc. to handle very large amounts of structured data associated with the company's Internet search and Web services operations.

Bigtable was designed to support applications requiring massive scalability; from its first iteration, the technology was intended to be used with petabytes of data. The database was designed to be deployed on clustered systems and uses a simple data model that Google has described as "a sparse, distributed, persistent multidimensional sorted map." Data is kept in order by row key, and the map is indexed by row key, column key, and timestamp. Compression algorithms help achieve high capacity.

Cloud Bigtable is exposed to applications through multiple client libraries, including a supported extension to the Apache HBase library for Java. As a result, it integrates with the existing Apache ecosystem of open-source Big Data software.

Cloud Bigtable's powerful back-end servers offer several key advantages over a self-managed HBase installation:

Incredible scalability

Cloud Bigtable scales in direct proportion to the number of machines in your cluster. A self-managed HBase installation has a design bottleneck that limits performance after a certain threshold is reached. Cloud Bigtable does not have this bottleneck, so you can scale your cluster up to handle more reads and writes.

Simple Administration

Cloud Bigtable handles upgrades and restarts transparently, and it automatically maintains high data durability. To replicate your data, simply add a second cluster to your instance, and replication starts automatically. There is no need to manage masters or regions; just design your table schemas, and Cloud Bigtable will handle the rest for you.

Cluster resizing without downtime

You can increase the size of a Cloud Bigtable cluster for a few hours to handle a large load, then reduce the cluster's size again, all without any downtime. After you change a cluster's size, it typically takes just a few minutes under load for Cloud Bigtable to balance performance across all of the nodes in your cluster.

Open Source

Bigtable itself is a proprietary Google service rather than open-source software, but it is closely tied to the open-source world: its client libraries are open source, and it supports the open-source Apache HBase API. This is a major advantage, as the surrounding tools benefit from the comments and contributions of an active open-source developer base, and users can count on steady improvement over time. It also means that Bigtable adheres to an established industry standard. For example, the HBase API, one of the most widely used APIs for wide-column data stores, is seamlessly supported, so organizations that already use products like HBase will find it especially simple to set up Bigtable for their data.

Security

With large amounts of data, concerns about data security escalate just as much. Bigtable offers replicated storage together with encryption of data, which should help allay these concerns. Customers can also rely on Google's expertise in this area and its long-standing experience in handling the privacy and security of large amounts of data.

Maturity

Because Bigtable has been used internally by a data giant like Google for a significant period of time, it can promise a high level of stability and maturity to its users. It is not at all comparable to a new and untested product, and it may well compare favourably on many fronts with longstanding players in the field. Because of this internal use, customers can also be confident of its continued availability and enhancement. Drawing on its strengths as an organization, Google also lists many service partners, including Pythian, CCRi, and Sungard, as companies that can build platforms to help support a faster transition to Bigtable.

Cloud Bigtable is ideal for applications that need very high throughput and scalability for non-structured key/value data, where each value is typically no larger than 10 MB. Cloud Bigtable also excels as a storage engine for batch MapReduce operations, stream processing/analytics, and machine-learning applications.

You can use Cloud Bigtable to store and query all of the following types of data:

Time-series Data, such as CPU and memory usage over time for multiple servers.

Marketing Data, such as purchase histories and customer preferences.

Financial Data, such as transaction histories, stock prices, and currency exchange rates.

Internet of Things Data, such as usage reports from energy meters and home appliances.

Graph Data, such as information about how users are connected to one another.

To store the underlying data for each of your tables, Cloud Bigtable shards the data into multiple tablets (not a typo: tablets and tables are different things), where each tablet contains a contiguous range of rows within the table. And here's the important thing when it comes to tablets: they can be reassigned to different nodes in your cluster, on demand, allowing Cloud Bigtable to scale and rebalance seamlessly as your usage patterns change.

Expected Performance

Under typical workloads, Cloud Bigtable delivers highly predictable performance. The official documentation lists the throughput you can expect from each node in a Cloud Bigtable cluster, which depends on the type of storage (SSD or HDD) the cluster uses. In general, a cluster's performance increases linearly as you add nodes to the cluster. For example, if you create an SSD cluster with 10 nodes, the cluster can support up to 100,000 QPS for a typical read-only or write-only workload, assuming that each row contains 1 KB of data.

Load Balancing

Each Cloud Bigtable zone is managed by a master process, which balances workload and data volume within clusters. The master splits busier or larger tablets in half and merges less-accessed or smaller tablets together, redistributing them between nodes as needed. If a certain tablet gets a spike of traffic, the master will split the tablet in two, then move one of the new tablets to another node. Cloud Bigtable manages all of the splitting, merging, and rebalancing automatically, saving users the effort of manually administering their tablets. Google's "Understanding Cloud Bigtable Performance" documentation provides more details about this process.

Supported Data Types

Cloud Bigtable treats all data as raw byte strings for most purposes. The only time Cloud Bigtable tries to determine the type is for increment operations, where the target must be a 64-bit integer encoded as an 8-byte big-endian value.

Memory and disk usage

The following sections describe how several components of Cloud Bigtable affect memory and disk usage for your instance.

Empty cells

Empty cells in a Cloud Bigtable table do not take up any space. Each row is essentially a collection of key/value entries, where the key is a combination of the column family, column qualifier, and timestamp. If a row does not include a value for a specific key, the key/value entry is simply not present.

Column qualifiers

Column qualifiers take up space in a row, since each column qualifier used in a row is stored in that row. As a result, it is often efficient to use column qualifiers as data. For example, in a table that records which users each user follows, the column qualifiers in a follows column family can be the usernames of the followed users; the value stored in each of these columns is simply a placeholder.

Compactions

Cloud Bigtable periodically rewrites your tables to remove deleted entries and to reorganize your data so that reads and writes are more efficient. This process is known as a compaction. There are no configuration settings for compactions; Cloud Bigtable compacts your data automatically.

Mutations and deletions

Mutations, or changes, to a row take up extra storage space, because Cloud Bigtable stores mutations sequentially and compacts them only periodically. When Cloud Bigtable compacts a table, it removes values that are no longer needed. If you update the value in a cell, both the original value and the new value will be stored on disk for some amount of time until the data is compacted.

Deletions also take up extra storage space, at least in the short term, because deletions are actually a specialized type of mutation. Until the table is compacted, a deletion uses extra storage rather than freeing up space.
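
The storage behaviour described under Memory and disk usage (sparse rows where absent cells cost nothing, column qualifiers used as data, updates and deletions appended until a compaction) and the 8-byte big-endian rule from Supported Data Types can be illustrated with a small in-memory model. This is a hypothetical sketch, not the Cloud Bigtable client API: the class name ToyTable, its methods, the tombstone marker, the treatment of a missing increment target as 0, and the rule that compaction keeps only the newest live version of each cell are all assumptions made for illustration.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical, in-memory toy model of a single Bigtable-style table.
# It is NOT the Cloud Bigtable client API; names and rules are assumptions
# chosen only to mirror the storage behaviour described in the text.

TOMBSTONE = object()  # marker appended by delete(): a deletion is itself a mutation


@dataclass
class ToyTable:
    # Append-only mutation log of (row, family, qualifier, timestamp, value).
    # Cells that were never written have no entry at all (empty cells cost nothing).
    log: list = field(default_factory=list)

    def put(self, row: bytes, family: str, qualifier: bytes, ts: int, value: bytes) -> None:
        # An update does not overwrite: old and new values both stay in the log.
        self.log.append((row, family, qualifier, ts, value))

    def delete(self, row: bytes, family: str, qualifier: bytes, ts: int) -> None:
        # Deleting appends a tombstone, so storage grows until compaction.
        self.log.append((row, family, qualifier, ts, TOMBSTONE))

    def read_row(self, row: bytes) -> dict:
        """Return {(family, qualifier): value} with the newest live value per cell."""
        newest = {}
        for r, fam, qual, ts, val in self.log:
            if r == row and ((fam, qual) not in newest or ts >= newest[(fam, qual)][0]):
                newest[(fam, qual)] = (ts, val)
        return {k: v for k, (_, v) in newest.items() if v is not TOMBSTONE}

    def increment(self, row: bytes, family: str, qualifier: bytes, ts: int, amount: int) -> int:
        # The target must hold a 64-bit integer as an 8-byte big-endian value.
        # A missing cell is treated as 0 (an assumption of this sketch).
        current = self.read_row(row).get((family, qualifier), struct.pack(">q", 0))
        if len(current) != 8:
            raise ValueError("increment target is not an 8-byte big-endian integer")
        new_value = struct.unpack(">q", current)[0] + amount
        self.put(row, family, qualifier, ts, struct.pack(">q", new_value))
        return new_value

    def compact(self) -> None:
        """Rewrite the log keeping only the newest live value per cell, sorted by key.

        Assumption: a 'keep one version' rule; real Bigtable applies the
        garbage-collection policy configured for each column family.
        """
        newest = {}
        for r, fam, qual, ts, val in self.log:
            key = (r, fam, qual)
            if key not in newest or ts >= newest[key][0]:
                newest[key] = (ts, val)
        self.log = sorted(
            (r, fam, qual, ts, val)
            for (r, fam, qual), (ts, val) in newest.items()
            if val is not TOMBSTONE
        )


if __name__ == "__main__":
    t = ToyTable()
    # Column qualifiers used as data: each followed username is a qualifier.
    t.put(b"alice", "follows", b"bob", 1, b"1")
    t.put(b"alice", "follows", b"carol", 1, b"1")
    t.delete(b"alice", "follows", b"bob", 2)
    t.increment(b"alice", "stats", b"visits", 3, 5)
    t.increment(b"alice", "stats", b"visits", 4, 2)
    print(len(t.log), sorted(t.read_row(b"alice")))  # 5 log entries before compaction
    t.compact()
    print(len(t.log), sorted(t.read_row(b"alice")))  # 2 log entries after compaction
```

In this run the log holds 5 entries before compaction (two puts, a tombstone, and two versions of the counter) but only 2 afterwards, which mirrors the point that overwritten values and deletions occupy space until the table is compacted.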
Data compression

Cloud Bigtable compresses your data automatically using an intelligent algorithm. You cannot configure compression settings for your table. However, it is useful to know how to store data so that it can be compressed efficiently:

• Random data cannot be compressed as efficiently as patterned data. Patterned data includes text, such as the document you are reading now.

• Compression works best if identical values are near each other, either in the same row or in adjoining rows. If you arrange your row keys so that rows with identical chunks of data are next to each other, the data can be compressed efficiently.

Cloud Bigtable and other storage options

Cloud Bigtable is not a relational database; it does not support SQL queries or joins, nor does it support multi-row transactions. Also, it is not a good solution for storing less than 1 TB of data.

• If you need full SQL support for an online transaction processing (OLTP) system, consider Cloud Spanner or Cloud SQL.

• If you need interactive querying in an online analytical processing (OLAP) system, consider BigQuery.

• If you need to store immutable blobs larger than 10 MB, such as large images or movies, consider Cloud Storage.

• If you need to store highly structured objects in a document database, with support for ACID transactions and SQL-like queries, consider Cloud Datastore.

CONCLUSION

In the words of Bigtable's designers (Chang et al.): "We have described Bigtable, a distributed system for storing structured data at Google... Our users like the performance and high availability provided by the Bigtable implementation, and that they can scale the capacity of their clusters by simply adding more machines to the system as their resource demands change over time... Finally, we have found that there are significant advantages to building our own storage solution at Google. We have gotten a substantial amount of flexibility from designing our own data model for Bigtable."

REFERENCES

Paul Krzyzanowski, BigTable, cs.rutgers.edu.

Margaret Rouse, Google Bigtable, techtarget.com.

Google, Overview of Cloud Bigtable, cloud.google.com.

Admin, Cloud Bigtable Launched by Google to Store Big Data, suyati.com.

Colt McAnlis, Cloud Bigtable Performance 101, medium.com.

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber, Bigtable: A Distributed Storage System for Structured Data, Google, Inc.
