Lecture 33
Stores
Introduction
Basics
• AKA (Also Known As): wide-column, columnar
• Data model
• Column family
• Each column is represented as a 3-tuple:
• column name
• value
• timestamp
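The 3-tuple above can be sketched as a small Python type. This is only an illustration: the class name `Column`, the field types, and the sample timestamp are assumptions, not something taken from a particular store.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """One column: (column name, value, timestamp)."""
    name: str
    value: str
    timestamp: int  # assumed here to be an integer, e.g. seconds since the epoch


# Hypothetical sample column; the timestamp value is made up.
col = Column(name="firstName", value="Martin", timestamp=1700000000)
print(col.name, col.value, col.timestamp)
```

Because a `NamedTuple` is still a plain tuple, `col` can also be unpacked as `name, value, ts = col`.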
Data Model: Row
• Row is a collection of columns attached to the same row key
• Columns can be added to any row at any time without having to add them for all rows
// row
"martin-fowler" : {
  firstName: "Martin",
  lastName: "Fowler",
  location: "Boston"
}
// row
"Jack-fowler" : {
  firstName: "Jack",
  lastName: "Fowler"
}
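As a rough sketch of the point that rows need not share the same columns, the two rows above can be modelled as nested Python dictionaries. The `email` column and its value are hypothetical additions for illustration only.

```python
# Each row key maps to its own collection of columns (name -> value).
rows = {
    "martin-fowler": {
        "firstName": "Martin",
        "lastName": "Fowler",
        "location": "Boston",
    },
    "Jack-fowler": {
        "firstName": "Jack",
        "lastName": "Fowler",
    },
}

# Add a column to a single row; no other row has to define it.
# "email" and its value are hypothetical.
rows["Jack-fowler"]["email"] = "jack@example.com"

for key, columns in rows.items():
    print(key, sorted(columns))
```

Nothing is stored for a column that a row does not have, which is one way to picture why such tables are called sparse.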
Data Model: Column Family (CF)
• CF is a set of columns containing related data
• Cassandra
• BigTable
• HBase
• Hypertable
• Accumulo
Ranked list: http://db-engines.com/en/ranking/wide+column+store
BigTable
• Google’s paper: Chang, F. et al. (2008). Bigtable: A Distributed Storage System for Structured Data. ACM TOCS, 26(2), pp. 1–26.
• Multi-dimensional map
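The "multi-dimensional map" can be pictured as a map indexed by row key, column key, and timestamp. The sketch below is a toy in-memory model only, not how Bigtable is implemented; the helper names `put` and `get`, the sample values, and the small integer timestamps are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

# (row key, column key, timestamp) -> value
Key = Tuple[str, str, int]
table: Dict[Key, str] = {}


def put(row: str, column: str, timestamp: int, value: str) -> None:
    """Store one cell version under its three-part key."""
    table[(row, column, timestamp)] = value


def get(row: str, column: str, timestamp: int) -> Optional[str]:
    """Exact-match lookup; returns None when that version does not exist."""
    return table.get((row, column, timestamp))


# Illustrative data: two versions of a page plus one anchor column.
put("com.cnn.www", "contents:", 3, "<html>v3</html>")
put("com.cnn.www", "contents:", 6, "<html>v6</html>")
put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")

print(get("com.cnn.www", "contents:", 6))
```

A plain dictionary is unsorted and lives on one machine, so this only captures the "map" part of the description, not the sorted or distributed parts.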
HBase
• An open-source implementation of Google's BigTable
• Implementation: Java
Cassandra
• Developed at Facebook
• Written in Java
• Operations
• Billions of users
• Google Analytics
• Google Finance
• stock information
• Personalized Search
• YouTube
• and more
A big table
• characteristics:
• sparse
• distributed
• A map
• Is an associative array
• Distributed
• Sorted
• For example, to keep all pages of the same domain next to each other, we reverse the hostname part of the URL (e.g. com.cnn.www) and use it as the row key
• Time based
• A tablet
• reading is efficient
• can be configured
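The reversed-URL row key mentioned above can be sketched in a few lines. The function name `row_key` and the sample hostnames are hypothetical; the point is only that, once the hostname parts are reversed, sorting the keys places pages of the same domain next to each other.

```python
def row_key(host: str) -> str:
    """Reverse the dot-separated parts of a hostname, e.g. www.cnn.com -> com.cnn.www."""
    return ".".join(reversed(host.split(".")))


# Hypothetical hostnames from two domains, deliberately interleaved.
hosts = ["maps.google.com", "www.cnn.com", "mail.google.com", "edition.cnn.com"]

# The map is sorted by row key, so we sort the reversed keys.
keys = sorted(row_key(h) for h in hosts)
for k in keys:
    print(k)
```

With the original hostnames, a sort would group `mail.google.com` and `maps.google.com` only by accident of their first letters; with reversed keys every `com.cnn.*` row comes before every `com.google.*` row.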
Splitting a Tablet
Columns & Column Families
• Column Family
• group of columns
• operations:
• “anchors”: pages that reference the web page identified by the row key
• In the previous example, the same URL may have different content over time
• Timestamps may represent real time, i.e. the time when the cell was added
• At retrieval time
• if there is no exact match, the latest version that is earlier than the specified timestamp is returned
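The retrieval rule above (exact match, otherwise the latest earlier version) can be sketched with Python's standard `bisect` module. The class name `VersionedCell` and its methods are hypothetical and model a single cell only; this is not the API of any particular store.

```python
import bisect
from typing import List, Optional


class VersionedCell:
    """Keeps the versions of one cell ordered by timestamp."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []  # kept sorted in ascending order
        self._values: List[str] = []      # _values[i] belongs to _timestamps[i]

    def put(self, timestamp: int, value: str) -> None:
        i = bisect.bisect_left(self._timestamps, timestamp)
        if i < len(self._timestamps) and self._timestamps[i] == timestamp:
            self._values[i] = value  # overwrite an existing version
        else:
            self._timestamps.insert(i, timestamp)
            self._values.insert(i, value)

    def get(self, timestamp: int) -> Optional[str]:
        # Index of the first version strictly newer than the requested timestamp.
        i = bisect.bisect_right(self._timestamps, timestamp)
        # The version just before it is the exact match or the latest earlier one.
        return self._values[i - 1] if i > 0 else None


cell = VersionedCell()
cell.put(3, "content at t3")
cell.put(6, "content at t6")

print(cell.get(6))  # exact match
print(cell.get(5))  # no exact match: falls back to the version at t3
print(cell.get(2))  # nothing at or before t2
```

Keeping the timestamps sorted makes each lookup a binary search rather than a scan over all versions.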