Bigtable is a distributed storage system built by Google to handle large amounts of structured data. It uses a sparse, distributed, persistent multidimensional sorted map as its data model. The API allows for creating/deleting tables and column families as well as reading, writing, and querying data. It is built upon other Google technologies like Google File System for storage and Chubby for locking.
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0 ratings0% found this document useful (0 votes)
21 views16 pages
002 Bigtable
Bigtable is a distributed storage system built by Google to handle large amounts of structured data. It uses a sparse, distributed, persistent multidimensional sorted map as its data model. The API allows for creating/deleting tables and column families as well as reading, writing, and querying data. It is built upon other Google technologies like Google File System for storage and Chubby for locking.
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16
Bigtable
David Wyrobnik, MEng
Overview ● What is Bigtable? ● Data Model ● API ● Building Blocks ● Implementation What is Bigtable (high level) ● “Distributed storage system for structured data” - title of paper ● “BigTable is a compressed, high performance, and proprietary data storage system built on Google File System, Chubby Lock Service, SSTable (log- structured storage like LevelDB) and a few other Google technologies.” - wikipedia ● “A Bigtable is a sparse, distributed, persistent multidimensional sorted map” - paper Data Model Data Model ● (row:string, column:string, time:int64) → array of bytes Data Model continued ● Timestamps can be assigned automatically (“real time”) or by client ● Versioned data management, two per-column-family settings for garbage- collection ○ last n versions of a cell should be kept ○ only new-enough versions kept (e.g. only values that were written in the last seven days) API API ● Functions for creating and deleting ○ tables and column families ● Functions for changing ○ clusters, table, and column family metadata (such as control rights) ● Write, delete, and lookup values in individual rows ● Iterate over subset of data in table ● Single-row transactions → perform atomic read-modify-write sequences ● No general transactions across rows, but supports batching writes across rows ● Bigtable can be used with MapReduce (common use case) Building Blocks and Implementation Building Blocks ● Google-File-System (GFS) to store log and data files. ● SSTable file format. ● Chubby as a lock service ● Bigtable uses Chubby ○ to ensure at most one active master exists ○ to store bootstrap location of Bigtable data ○ to discover tablet servers ○ to store Bigtable schema information (column family info for each table) ○ to store access control lists Implementation ● Three major components: ○ library that is linked into every client ○ one master server ○ many tablet servers ● Master mainly responsible for assigning tablets to tablet servers ● Tablet servers can be added or removed dynamically ● Tablet server store typically 10-1000 tablets ● Tablet server handle read and writes and splitting of tablets that are too large ● Client data does not move through master. Tablet Location Tablet Assignment ● Master keeps track of live tablet servers, current assignments, and of unassigned tablets ● Master assigns unassigned tablets to tablet servers by sending a tablet load request ● Tablet servers are linked to files in Chubby directory (servers directory) ● When new master starts: ○ Acquires unique master lock in Chubby ○ Scans live tablet servers ○ Gets list of tablets from each tablet server, to learn which tablets are assigned ○ Scans METADATA table to learn set of existing tablets → adds unassigned tablets to list Tablet Serving Consistency ● Bigtable has a strong consistency model, since operations on rows are atomic and tablets are only served by one tablet server at a time Discussion