Bigtable: A Distributed Storage System For Structured Data
09152227260
INTRODUCTION
Traditional relational databases present a view that is composed of multiple tables, each with rows and named columns. Queries, mostly written in SQL (Structured Query Language), allow one to extract specific columns from a row where certain conditions are met (e.g., a column has a specific value). Moreover, one can perform queries across multiple tables (this is the "relational" part of a relational database). For example, a table of students may include a student's name, ID number, and contact information. A table of grades may include a student's ID number, course number, and grade. We can construct a query that extracts a student's grades by name by searching for the ID number in the student table and then matching that ID number in the grade table.

In addition, with traditional databases, we expect ACID guarantees: that transactions will be atomic, consistent, isolated, and durable. As we saw when we studied distributed transactions, it is impossible to guarantee consistency while also providing high availability and network partition tolerance (the CAP theorem). This makes ACID databases unattractive for highly distributed environments and led to the emergence of alternative data stores that target high availability and high performance. Here, we will look at the structure and capabilities of Bigtable.

BODY
Google Bigtable is a distributed, column-oriented data store created by Google Inc. to handle very large amounts of structured data associated with the company's Internet search and Web services operations.

Bigtable was designed to support applications requiring massive scalability; from its first iteration, the technology was intended to be used with petabytes of data. The database was [...]

Cloud Bigtable is exposed to applications through multiple client libraries, including a supported extension to the Apache HBase library for Java. As a result, it integrates with the existing Apache ecosystem of open-source Big Data software.

Cloud Bigtable's powerful back-end servers offer several key advantages over a self-managed HBase installation:

Incredible scalability
Cloud Bigtable scales in direct proportion to the number of machines in your cluster. A self-managed HBase installation has a design bottleneck that limits performance once a certain threshold is reached. Cloud Bigtable does not have this bottleneck, so you can scale your cluster up to handle more reads and writes.

Simple Administration
Cloud Bigtable handles upgrades and restarts transparently, and it automatically maintains high data durability. To replicate your data, simply add a second cluster to your instance, and replication starts automatically. No more managing masters or regions; just design your table schemas, and Cloud Bigtable will handle the rest for you.

Cluster resizing without downtime
You can increase the size of a Cloud Bigtable cluster for a few hours to handle a large load, then reduce the cluster's size again, all without any downtime. After you change a cluster's size, it typically takes just a few minutes under load for Cloud Bigtable to balance performance across all of the nodes in your cluster.

Open Source
Bigtable's client libraries are available as open source, which is a major advantage, as it broadens the comments and contributions they receive over time. With an active open-source developer base, users can expect steady improvements and additions. This also means that Bigtable follows established industry interfaces. For example, the HBase API, one of the most widely used APIs for this kind of data store, is supported, so organizations that already use HBase should find it especially easy to set up Bigtable for their data.

Security
As the amount of data grows, concerns about data security grow with it. Bigtable offers a replicated storage strategy together with encryption of data, which helps allay these concerns. Customers can also rely on Google's long-standing experience in handling the privacy and security of large amounts of data.

Maturity
[...] customers can also be confident of its continued availability and enhancement. Drawing on its strengths as an organization, Google also lists many of its service partners, including Pythian, CCRi, and Sungard, as companies that can build platforms to help customers transition to Bigtable faster.

Cloud Bigtable is ideal for applications that need very high throughput and scalability for non-structured key/value data, where each value is typically no larger than 10 MB. Cloud Bigtable also excels as a storage engine for batch MapReduce operations, stream processing/analytics, and machine-learning applications.

You can use Cloud Bigtable to store and query all of the following types of data:
Time-series Data, such as CPU and memory usage over time for multiple servers.
Marketing Data, such as purchase histories and customer preferences.
Financial Data, such as transaction histories, stock prices, and currency exchange rates.
Internet of Things Data, such as usage reports from energy meters and home appliances.
Graph Data, such as information about how users are connected to one another.

To store the underlying data for each of your tables, Cloud Bigtable shards the data into multiple tablets (not a typo: tablets and tables are different things), where each tablet contains a contiguous range of rows within the table.

[...] automatically, saving users the effort of manually administering their tablets. The Google Cloud documentation page "Understanding Cloud Bigtable Performance" provides more details about this process.

Supported Data Types
Cloud Bigtable treats all data as raw byte strings for most purposes. The only time Cloud Bigtable tries to determine the type is for increment operations, where the target must be a 64-bit integer encoded as an 8-byte big-endian value.
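The 8-byte big-endian requirement for increments can be illustrated with Python's standard struct module. This is a minimal local sketch, not a Bigtable client call: encode_counter, decode_counter, and increment are hypothetical helper names, the value is assumed to be a signed 64-bit integer, and increment only simulates on a local byte string what the service does to a cell.

```python
import struct


def encode_counter(value: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian bytes ('>q')."""
    return struct.pack(">q", value)


def decode_counter(raw: bytes) -> int:
    """Decode 8 big-endian bytes back into a signed 64-bit integer."""
    if len(raw) != 8:
        raise ValueError("counter cells must hold exactly 8 bytes")
    return struct.unpack(">q", raw)[0]


def increment(raw: bytes, delta: int) -> bytes:
    """Hypothetical local model of an increment: decode, add, re-encode.

    Assumes the cell already holds a valid 8-byte value; a result outside
    the signed 64-bit range makes struct.pack raise struct.error.
    """
    return encode_counter(decode_counter(raw) + delta)


cell = encode_counter(41)
cell = increment(cell, 1)
print(decode_counter(cell))  # 42
print(encode_counter(1))     # b'\x00\x00\x00\x00\x00\x00\x00\x01'
```

Because the byte order is fixed, the most significant byte always comes first; a value stored as text, such as b"42", would not be a valid 8-byte counter under this scheme.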
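The sharding of a table into tablets, each holding a contiguous range of rows, can be modeled with a sorted list of split keys and a binary search. This is a toy sketch under stated assumptions, not Bigtable's implementation: it assumes rows are ordered lexicographically by row key (as in Bigtable), and the class TabletMap, its split_keys field, and the methods tablet_for and tablets_for_scan are hypothetical names chosen for illustration.

```python
from bisect import bisect_left, bisect_right


class TabletMap:
    """Hypothetical toy model of a table split into tablets, where each
    tablet owns a contiguous range of row keys."""

    def __init__(self, split_keys: list[bytes]):
        # Tablet i covers [split_keys[i-1], split_keys[i]): the split key
        # is the first row of the next tablet. Keys compare as raw bytes.
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self.split_keys = list(split_keys)

    def tablet_for(self, row_key: bytes) -> int:
        """Index of the tablet whose row range contains row_key."""
        return bisect_right(self.split_keys, row_key)

    def tablets_for_scan(self, start: bytes, end: bytes) -> range:
        """Tablets touched by a scan over [start, end).

        Because every tablet holds a contiguous range of rows, a range
        scan always touches a contiguous run of tablets.
        """
        if start >= end:
            return range(0)
        first = self.tablet_for(start)
        last = bisect_left(self.split_keys, end)
        return range(first, last + 1)


tablets = TabletMap([b"g", b"p"])
print(tablets.tablet_for(b"apple"))                  # 0
print(tablets.tablet_for(b"grape"))                  # 1
print(tablets.tablet_for(b"zebra"))                  # 2
print(list(tablets.tablets_for_scan(b"h", b"z")))    # [1, 2]
```

The design choice this illustrates is why contiguous ranges matter: a lookup needs only one binary search, and a scan over a key range maps to a single run of neighboring tablets rather than to tablets scattered across the table.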
Memory and disk usage
Empty cells
Column qualifiers
CONCLUSION