NoSQL Databases and Big Data Storage Systems
What is NoSQL?
NoSQL (Not Only SQL) is a category of database management systems that store and retrieve data
using models other than the tabular relations used in relational databases.
Why NoSQL?
Characteristics:
Definition
The CAP theorem, proposed by Eric Brewer, states that a distributed database system can guarantee
at most two of the following three properties at the same time:
1. Consistency (C) – Every read receives the most recent write or an error.
2. Availability (A) – Every request receives a (non-error) response, without guaranteeing the
most recent write.
3. Partition Tolerance (P) – The system continues to operate despite arbitrary partitioning due to
network failures.
Many NoSQL databases relax strong consistency in order to remain available and partition tolerant,
typically offering eventual consistency instead: replicas may briefly disagree but converge once
updates have propagated.
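To make eventual consistency concrete, here is a minimal in-memory sketch. The `Replica` class, the `anti_entropy` helper, and the last-write-wins rule are assumptions chosen for illustration; real systems use more elaborate replication and conflict-resolution schemes.

```python
class Replica:
    """One node holding a copy of the data (hypothetical, in-memory)."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last-write-wins: keep the value with the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange entries so both replicas converge on the newest value per key."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart", ["book"], timestamp=1)
anti_entropy(r1, r2)

# During a partition, each replica stays available and accepts writes.
r1.write("cart", ["book", "pen"], timestamp=2)
stale = r2.read("cart")  # r2 still answers, but with the older value

# Once the partition heals, the replicas converge.
anti_entropy(r1, r2)
fresh = r2.read("cart")
```

Here `r2` keeps answering during the partition (availability) at the cost of returning a stale value, and only agrees with `r1` after the two replicas synchronize.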
Document Stores
Key Features:
Schema flexibility.
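Schema flexibility can be illustrated with plain Python dictionaries standing in for documents. This is only a sketch: the `users` collection and the `find` helper are hypothetical and do not correspond to any particular database's API.

```python
import json

# Hypothetical "users" collection: documents need not share the same fields.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Lin", "tags": ["admin"], "address": {"city": "Oslo"}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields match all criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(users, name="Lin")
serialized = json.dumps(matches[0])  # a document maps directly to JSON
```

The second document carries `tags` and a nested `address` that the first one lacks, and no schema migration was needed to store it.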
Key-Value Stores
Overview
The key is unique, and the value can be a string, JSON, binary, etc.
Popular Examples:
Redis: In-memory key-value store; supports rich data structures like lists, sets.
Use Cases:
Caching
Session management
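The caching and session use cases both rely on looking up a value by its unique key, usually with an expiry time. The sketch below assumes a hypothetical in-memory `TTLStore` class; it is not the Redis API, only an illustration of the access pattern.

```python
import time


class TTLStore:
    """Minimal key-value store with optional per-key expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be simulated
        self._items = {}     # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def get(self, key, default=None):
        item = self._items.get(key)
        if item is None:
            return default
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # lazily evict expired entries on read
            return default
        return value


# Simulated clock so the example does not need to sleep.
now = [0.0]
store = TTLStore(clock=lambda: now[0])

store.set("session:42", {"user": "ada"}, ttl=30)  # session expires after 30 s
store.set("logo", b"\x89PNG")                     # binary value, no expiry

before = store.get("session:42")
now[0] = 31.0
after = store.get("session:42")
```

Values are opaque to the store, which is why a dictionary and a byte string can sit side by side under different keys.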
Column-Family Stores
Overview
Data is grouped into column families, each containing multiple rows with flexible columns.
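As a rough sketch of this layout, nested dictionaries can model column families that hold rows with differing columns. The `families` structure and the `put` and `get_row` helpers are assumptions for illustration, not the Cassandra or HBase API.

```python
from collections import defaultdict

# column family -> row key -> {column name: value}
families = defaultdict(dict)


def put(family, row_key, column, value):
    """Store one cell; rows in the same family may have different columns."""
    families[family].setdefault(row_key, {})[column] = value


def get_row(family, row_key):
    """Return a copy of all columns stored for a row in one family."""
    return dict(families.get(family, {}).get(row_key, {}))


# Time-series style data: each timestamp becomes its own column.
put("readings", "sensor-1", "2024-01-01T00:00", 21.5)
put("readings", "sensor-1", "2024-01-01T00:05", 21.7)
put("readings", "sensor-2", "2024-01-01T00:00", 19.0)
put("meta", "sensor-1", "location", "roof")

row = get_row("readings", "sensor-1")
```

Note that `sensor-1` has two columns in `readings` while `sensor-2` has one, and only `sensor-1` has any entry in `meta`.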
Key Features:
Popular Examples:
Apache Cassandra: Highly scalable, decentralized wide-column store.
HBase: Built on top of Hadoop HDFS; suitable for real-time read/write access to large
datasets.
Use Cases:
Time-series data
Event logging
Real-time analytics
Graph Databases
Overview
Best suited for applications where relationships between data are crucial.
Graph Model:
Neo4j
Use Cases:
Social networks
Fraud detection
Recommendation systems
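A relationship-centric query such as "friends of friends" shows why graph models suit social networks and recommendations. The sketch below uses a plain adjacency structure; the `friends` mapping and both helper functions are hypothetical, and a graph database would express the same traversal in its own query language.

```python
from collections import defaultdict

# node -> set of directly connected nodes (undirected "friend" relationship)
friends = defaultdict(set)


def add_friendship(a, b):
    friends[a].add(b)
    friends[b].add(a)


def suggest(user):
    """Recommend friends of friends who are not already connected to user."""
    direct = friends.get(user, set())
    candidates = set()
    for friend in direct:
        candidates |= friends.get(friend, set())
    return candidates - direct - {user}


add_friendship("ada", "bob")
add_friendship("bob", "cy")
add_friendship("bob", "dee")
add_friendship("ada", "dee")

suggestions_for_ada = suggest("ada")
suggestions_for_cy = suggest("cy")
```

The traversal follows relationships directly from node to node instead of joining tables, which is the access pattern graph databases are designed around.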
Type             Data Model                Best For                 Examples
Document Store   JSON/BSON/XML documents   Semi-structured data     MongoDB, Couchbase
Key-Value Store  Key-value pairs           Fast lookups, caching    Redis, DynamoDB, Riak
Graph Database   Nodes and relationships   Relationship-heavy data  Neo4j, Amazon Neptune