NoSQL Presentation
NoSQL
Big data
Name-Ajay Kushwaha
Roll no.-2000290110017
NoSQL!
NoSQL databases are currently a hot topic in some parts of
computing, with over a hundred
different NoSQL databases.
No SQL?
• Documents
• Loosely structured sets of key/value pairs in documents, e.g., XML, JSON,
BSON
• Encapsulate and encode data in some standard formats or encodings
• Are addressed in the database via a unique key
• Documents are treated as a whole, avoiding splitting a document into its
constituent name/value pairs
• Allow documents to be retrieved by key or by content
• Notable for:
• MongoDB (used at Foursquare, GitHub, and more)
• CouchDB (used at Apple, BBC, Canonical, CERN, and more)
Document Databases (Document Store)
Document Databases, JSON
{
_id: ObjectId("51156a1e056d6f966f268f81"),
type: "Article",
author: "Derick Rethans",
title: "Introduction to Document Databases with MongoDB",
date: ISODate("2013-04-24T16:26:31.911Z"),
body: "This arti…"
},
{
_id: ObjectId("51156a1e056d6f966f268f82"),
type: "Book",
author: "Derick Rethans",
title: "php|architect's Guide to Date and Time Programming with PHP",
isbn: "978-0-9738621-5-7"
}
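The document-store properties listed above (a unique key per document, documents kept whole in a standard encoding, retrieval by key or by content) can be sketched in a few lines of Python. This is only an illustrative toy, not how MongoDB or CouchDB work internally; the class `DocumentStore` and its methods `insert`, `get`, and `find` are hypothetical names chosen for this sketch.

```python
import json
import uuid


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}  # unique key -> document encoded as JSON text

    def insert(self, doc):
        """Store the whole document under a generated unique key; return the key."""
        key = uuid.uuid4().hex
        self._docs[key] = json.dumps(doc)  # encode in a standard format
        return key

    def get(self, key):
        """Retrieve a document by key, decoded as a whole."""
        return json.loads(self._docs[key])

    def find(self, **criteria):
        """Retrieve documents by content: every given field must match."""
        results = []
        for encoded in self._docs.values():
            doc = json.loads(encoded)
            if all(doc.get(field) == value for field, value in criteria.items()):
                results.append(doc)
        return results


store = DocumentStore()
article_key = store.insert({
    "type": "Article",
    "author": "Derick Rethans",
    "title": "Introduction to Document Databases with MongoDB",
})
store.insert({
    "type": "Book",
    "author": "Derick Rethans",
    "isbn": "978-0-9738621-5-7",
})

by_key = store.get(article_key)                        # lookup by unique key
by_author = store.find(author="Derick Rethans")        # lookup by content
books = store.find(type="Book")
```

Note that the two documents have different fields (`title` vs. `isbn`); the store does not enforce a common schema.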
Key/Value stores
• Store data in a schema-less way
• Store data as maps
• HashMaps or associative arrays
• Provide very efficient average-case access
to data by key (hash lookup)
• Notable for:
• Couchbase (Zynga, Vimeo, NAVTEQ, ...)
• Redis (Craigslist, Instagram, Stack Overflow,
Flickr, ...)
• Amazon Dynamo (Amazon, Elsevier,
IMDb, ...)
• Apache Cassandra (Facebook, Digg,
Reddit, Twitter,...)
• Voldemort (LinkedIn, eBay, …)
• Riak (GitHub, Comcast, Mochi, ...)
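The "data as maps" idea can be illustrated with a minimal sketch built on a Python `dict`, which is a hash map with constant average-time lookup. The class `KeyValueStore` and its methods are hypothetical names for this sketch; real products such as Redis or Riak add persistence, networking, and replication on top of the same basic interface.

```python
class KeyValueStore:
    """Toy key/value store backed by a hash map (illustrative assumption)."""

    def __init__(self):
        self._data = {}  # Python dict: average O(1) get/put/delete

    def put(self, key, value):
        # The store treats the value as opaque; no schema is checked.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)


kv = KeyValueStore()
kv.put("session:1", {"user": "alice", "cart": ["book"]})  # a nested structure
kv.put("counter:visits", 42)                              # a plain integer
kv.put("page:/home", "<html>...</html>")                  # a string

visits = kv.get("counter:visits")
kv.delete("session:1")
missing = kv.get("session:1")
```

The three values have unrelated shapes, which is what "schema-less" means here: only the key is interpreted by the store.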
Sorted Ordered Column-Oriented Stores
• Data are stored in a column-oriented way
• Data efficiently stored
• Avoids consuming space for storing nulls
• Columns are grouped in column-families
• Data isn’t stored as a single table but is stored by column families
• Unit of data is a set of key/value pairs
• Identified by “row-key”
• Ordered and sorted based on row-key
• Notable for:
• Google's Bigtable (used in many
Google services)
• HBase (Facebook, StumbleUpon,
Hulu, Yahoo!, ...)
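A rough sketch of the column-family model described above: each row is identified by a row-key, holds key/value pairs grouped into column families, stores nothing for absent columns, and rows are kept sorted by row-key so that range scans are cheap. The class `ColumnFamilyStore` and its methods are hypothetical; this is not the Bigtable or HBase API.

```python
import bisect


class ColumnFamilyStore:
    """Toy sorted column-family store (hypothetical, in-memory only)."""

    def __init__(self, families):
        self._families = set(families)
        self._row_keys = []  # row-keys kept in sorted order
        self._rows = {}      # row_key -> {family -> {column -> value}}

    def put(self, row_key, family, column, value):
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self._rows:
            bisect.insort(self._row_keys, row_key)  # keep the key order sorted
            self._rows[row_key] = {}
        self._rows[row_key].setdefault(family, {})[column] = value

    def get(self, row_key, family):
        """Return the columns stored for one row in one family (may be empty)."""
        return dict(self._rows.get(row_key, {}).get(family, {}))

    def scan(self, start_key, stop_key):
        """Return row-keys in [start_key, stop_key), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start_key)
        hi = bisect.bisect_left(self._row_keys, stop_key)
        return self._row_keys[lo:hi]


table = ColumnFamilyStore(["profile", "activity"])
table.put("user:003", "profile", "name", "Carol")
table.put("user:001", "profile", "name", "Alice")
table.put("user:001", "activity", "last_login", "2013-04-24")
table.put("user:002", "profile", "name", "Bob")

first_two = table.scan("user:001", "user:003")   # ['user:001', 'user:002']
bob_activity = table.get("user:002", "activity")  # {} (nothing stored, no nulls)
```

Rows were inserted out of order but the scan returns them sorted, and the row without activity data simply has no entry for that family rather than a row of nulls.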
Scaling RDBMS
• Master-Slave
• All writes go to the master; all reads are performed against
the replicated slave databases
• Critical reads may be incorrect as writes may not have been
propagated down
• Large data sets can pose problems as the master needs to duplicate
data to the slaves
• Sharding
• Any DB distributed across multiple machines needs to
know on which machine a piece of data is stored or should be
stored
• A sharding system makes this decision for each row, using
its key
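Both scaling approaches can be sketched briefly. The first part below mimics master-slave replication with an explicit `replicate()` step, so a read issued before propagation returns stale data. The second part picks a shard from a row's key by hashing it. Hash-modulo routing is only one possible sharding scheme and is an assumption here, as are all names (`ReplicatedDatabase`, `shard_for`).

```python
import hashlib


class ReplicatedDatabase:
    """Toy master plus one replica; propagation happens only on replicate()."""

    def __init__(self):
        self.master = {}
        self.replica = {}
        self._pending = []  # writes not yet propagated to the replica

    def write(self, key, value):
        self.master[key] = value            # all writes go to the master
        self._pending.append((key, value))

    def read(self, key):
        return self.replica.get(key)        # all reads go to the replica

    def replicate(self):
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()


def shard_for(key, shard_count):
    """Map a row key to a shard index in [0, shard_count) via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


db = ReplicatedDatabase()
db.write("balance:42", 100)
stale = db.read("balance:42")   # None: the write has not been propagated yet
db.replicate()
fresh = db.read("balance:42")   # 100 once replication has caught up

shards = [{} for _ in range(4)]
for user in ["alice", "bob", "carol"]:
    shards[shard_for(user, 4)][user] = {"name": user}

# The same function locates the row again on lookup.
found = shards[shard_for("bob", 4)]["bob"]
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings and would route the same key to different shards across restarts.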
NoSQL, No ACID