10 NoSQL Databases
Traditionally, the software industry has used relational databases to store and manage data persistently. NoSQL, often
read as "Not only SQL", is a class of databases that has emerged in the recent past as an alternative to relational
databases.
Carlo Strozzi introduced the term NoSQL in 1998 to name his file-based database.
NoSQL refers to all databases and data stores that are not based on Relational Database Management System
(RDBMS) principles. It relates to large data sets accessed and manipulated on a Web scale. NoSQL does not
represent a single product or technology. It represents a group of products and various related data concepts for
storage and management. The name was later popularized as a hashtag chosen for a tech meetup to discuss these new databases.
Why NoSQL?
With the explosion of social media, user-driven content has grown rapidly and has increased the volume and type of
data that is produced, managed, analyzed, and archived. In addition, new sources of data, such as sensors, Global
Positioning Systems or GPS, automated trackers, and other monitoring systems generate huge volumes of data on a
regular basis.
These large data sets, also called big data, have introduced new challenges and opportunities for data
storage, management, analysis, and archival. In addition, data is becoming increasingly semi-structured and sparse.
This means that RDBMS databases, which require upfront schema definition and relational references, are being called into question.
To resolve the problems related to large-volume and semi-structured data, a new class of database products has
emerged. This class consists of column-based data stores, key/value pair databases,
and document databases. Together, these are called NoSQL. The NoSQL family consists of diverse products, with
each product having its own set of features and value propositions.
Given below is a table that explains the difference between RDBMS (Relational Database Management Systems)
and NoSQL.
RDBMS | NoSQL
Data is stored in a relational model, with rows and columns. A row contains information about an item, while columns contain specific information, such as ‘Model’, ‘Date of Manufacture’, and ‘Color’. | Data is stored in a host of different databases, with different data storage models.
Follows a fixed schema. That is, the columns are defined and locked before data entry. In addition, each row contains data for each column. | Follows dynamic schemas. That is, you can add columns at any time.
Supports vertical scaling. Scaling an RDBMS across multiple servers is a challenging and time-consuming process. | Supports horizontal scaling. You can scale across multiple servers. These servers can be cheap commodity hardware or cloud instances, which makes scaling cost-effective compared to vertical scaling.
Atomicity, Consistency, Isolation, and Durability (ACID) compliant. | Typically not ACID compliant.
Benefits of NoSQL
NoSQL databases are designed to meet the following requirements:
All nodes in a cluster must be able to serve read requests even if some machines are down.
The database must be capable of easily replicating and segregating data between different physical racks in a data center.
This helps avoid downtime caused by hardware outages.
The database must be able to support data distribution designs that span multiple data centers, on-premises or in the cloud.
Types of NoSQL
1. Key-Value database
A key-value database is a big hash table of keys and values. Riak (pronounced REE-awk), Tokyo Cabinet, Redis,
Memcached (pronounced mem-cached), and Scalaris are examples of key-value stores.
2. Document-based database
A document-based database stores documents made up of tagged elements. Examples: MongoDB, CouchDB,
OrientDB, and RavenDB.
3. Column-based database
In a column-based database, each storage block contains data from only one column. Examples: BigTable, Cassandra, HBase, and Hypertable.
4. Graph-based database
A graph-based database is a network database that uses nodes and edges to represent and store data. Examples are Neo4J,
InfoGrid, Infinite Graph, and FlockDB.
The availability of choices in NoSQL databases has its own advantages and disadvantages.
The advantage is that it allows you to choose a design according to your system requirements. However, because you
have to make the choice yourself, there is always a chance that the chosen database product will not suit the
requirements or will not be used properly.
Key-Value Database
From an Application Program Interface or API perspective, a key-value database is the simplest NoSQL database.
This database stores every single item as a key with a value. You can get the value for a key, add a value for a key, or
delete a key. The value is a blob that the database stores without knowing its content. The responsibility lies with the
application to understand what is stored. Typically, key-value databases use primary-key access.
Therefore, they generally offer enhanced performance and scalability. Not all key-value databases have the same
features.
For example, data is not persistent in Memcached, while it is in Riak.
These differences matter when implementing certain solutions.
For example, suppose you need to implement caching of user preferences.
If you store them in Memcached, you may lose all the data when the node goes down and may need to fetch it again
from the source database. If you store the same data in Riak, you may not lose data, but you must consider how to update
stale data. It is important to select a key-value database based on your requirements.
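The caching scenario above can be sketched in a few lines. This is a minimal illustration and not the Memcached or Riak API: `VolatileCache`, `SOURCE_DB`, and `load_preferences` are hypothetical names, and the cache is a plain in-memory dictionary that forgets everything when it restarts.

```python
class VolatileCache:
    """In-memory cache that loses everything on restart (non-persistent behavior)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def restart(self):
        # Simulate the node going down: all cached entries are gone.
        self._data.clear()


# Hypothetical source of truth, standing in for the source database.
SOURCE_DB = {"user:42": {"theme": "dark", "language": "en"}}


def load_preferences(cache, user_key):
    """Return preferences from the cache, falling back to the source database."""
    prefs = cache.get(user_key)
    if prefs is None:
        prefs = SOURCE_DB[user_key]   # cache miss: read from the source
        cache.set(user_key, prefs)    # repopulate the cache
    return prefs


cache = VolatileCache()
first = load_preferences(cache, "user:42")    # miss, then cached
cache.restart()                               # node failure wipes the cache
second = load_preferences(cache, "user:42")   # miss again, reloaded from source
```

With a persistent store the second call would be served from the store itself, which is why the freshness of the stored copy becomes the concern instead.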
A key-value store does not have a defined schema. It relies on client-defined semantics for understanding what the
values are. A key-value store is simple to build and easy to scale. It also tends to have great performance because the
access pattern can be optimized to suit your requirement.
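The get, add, and delete operations and the idea of an opaque value can be sketched as follows. This is an illustrative in-memory model under stated assumptions, not the interface of any real product: the `KeyValueStore` class and the choice of JSON as the client-side encoding are assumptions made for the example.

```python
import json


class KeyValueStore:
    """Minimal key-value store: values are opaque bytes to the store."""

    def __init__(self):
        self._items = {}

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value

    def get(self, key: str):
        return self._items.get(key)

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


store = KeyValueStore()

# The client decides the encoding (JSON here); the store never inspects it.
store.put("user:42", json.dumps({"name": "Ada", "theme": "dark"}).encode("utf-8"))

blob = store.get("user:42")
profile = json.loads(blob.decode("utf-8"))   # only the client knows how to parse the blob

store.delete("user:42")
missing = store.get("user:42")               # None once the key is deleted
```

Because every operation is a lookup by key, the store has nothing to plan or join, which is one reason this model tends to scale easily.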
The advantages and disadvantages of the key-value store include the following.
Advantages:
Queries: You can perform a query by using the key. Range queries on the key, however, are usually not possible.
Schema: Key-value databases have a simple schema: the key is a string and the value is a blob. The client determines how to parse the data.
Usages: Key-value databases access data by using a key.
Disadvantages (the key-value model has some major weaknesses):
It does not provide traditional database capabilities, such as consistency when multiple transactions are executed simultaneously.
As the volume of data increases, maintaining unique keys becomes difficult.
Document Database
A document database stores and retrieves documents in formats such as XML, JavaScript Object Notation or JSON
(pronounced JAY-sahn), and Binary JSON or BSON.
These documents are self-descriptive, hierarchical tree data structures which consist of maps, collections, and
scalar values.
The stored documents can be similar to each other, but not necessarily the same. A document database stores documents in the
value part of a key-value store. You can think of document databases as key-value stores in which
the values can be examined.
MongoDB: Provides a rich query language and many useful features such as built-in support for MapReduce-
style aggregation and geospatial indexes.
Apache CouchDB: Uses JSON for documents, JavaScript for MapReduce indexes, and regular HTTP for its
API.
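The difference from a plain key-value store, namely that the database can look inside the value, can be sketched as follows. This is a toy in-memory model and not the query language of MongoDB or CouchDB; `collection`, `get_path`, and `find` are hypothetical names used only for illustration.

```python
# Two documents in the same collection; their fields overlap but are not identical.
collection = {
    "order:1": {"customer": {"name": "Ada", "city": "London"}, "items": ["pen", "ink"]},
    "order:2": {"customer": {"name": "Lin"}, "items": ["paper"], "gift": True},
}


def get_path(document, path):
    """Follow a dotted path such as 'customer.city'; return None if any part is missing."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, path, expected):
    """Return the keys of documents whose value at `path` equals `expected`."""
    return [key for key, doc in docs.items() if get_path(doc, path) == expected]


london_orders = find(collection, "customer.city", "London")
gift_orders = find(collection, "gift", True)
```

The second document has no `city` field and the first has no `gift` field, yet both queries run without any schema change.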
Column-Based Database
Column-based databases store data in column families as rows. Each row contains multiple columns associated with
a row key.
A column family groups related data that is usually accessed together. For example, you may access a customer's
profile information in one read, but not their order history.
Each column family is like a container of rows in an RDBMS table, where the key identifies the row.
Each row consists of multiple columns. However, the various rows need not have the same columns.
Moreover, you can add a column to any row at any time without adding it to other rows.
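The row-key and column-family structure described above can be modeled with nested dictionaries. This is a sketch under stated assumptions, not the data model of any specific product: the family names `profile` and `orders` and the `put_column` helper are hypothetical.

```python
from collections import defaultdict

# A column family maps row key -> {column name: value}.
# "profile" and "orders" are illustrative family names.
column_families = {
    "profile": defaultdict(dict),
    "orders": defaultdict(dict),
}


def put_column(family, row_key, column, value):
    """Add or overwrite one column in one row, leaving other rows untouched."""
    column_families[family][row_key][column] = value


put_column("profile", "cust:1", "name", "Ada")
put_column("profile", "cust:1", "city", "London")
put_column("profile", "cust:2", "name", "Lin")          # no "city" column for this row
put_column("profile", "cust:2", "newsletter", True)     # a column only this row has
put_column("orders", "cust:1", "2024-01-05", "order:1")

# Reading a profile touches only the "profile" family, not the order history.
profile_row = dict(column_families["profile"]["cust:1"])
columns_cust2 = sorted(column_families["profile"]["cust:2"])
```

Adding the `newsletter` column to one row required no change to any other row, which is the flexibility the text describes.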
The goal of a column-based database is to read and write data to and from hard disk storage efficiently so that
queries return quickly. In this layout, all values of column one are stored physically together, followed by all values of column two.
The data is kept in record order, so the 100th entry for column one and the 100th entry for column two come from the same input
record. This allows you to access individual data elements, such as customer names, as a group by column rather
than individually row by row.
Because values from the same column are stored together, they compress well, and this permits columnar operations
such as MIN, MAX, SUM, COUNT, and AVG to be performed very rapidly. A column-based database management system
(DBMS) is self-indexing, so it uses less disk space
than an RDBMS containing the same data.
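The column layout and its effect on aggregates can be sketched with plain lists. This is an in-memory illustration only, assuming a tiny made-up data set; real systems add compression and on-disk file formats that are not modeled here.

```python
# The same three records, stored two ways.
rows = [
    {"customer": "Ada", "amount": 30},
    {"customer": "Lin", "amount": 50},
    {"customer": "Sam", "amount": 40},
]

# Column layout: one list per column, kept in record order so that
# index i in every list belongs to the same input record.
columns = {
    "customer": [r["customer"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# Aggregates read a single column and never touch "customer".
amounts = columns["amount"]
total = sum(amounts)
average = total / len(amounts)
smallest, largest = min(amounts), max(amounts)

# Record order lets a full record be rebuilt from the columns when needed.
second_record = {name: values[1] for name, values in columns.items()}
```

The aggregate code scans one contiguous list, whereas the row layout would have to visit every record and skip over the fields it does not need.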
The diagram provided in the section shows data being stored in column format rather than row format. It shows that
columns of the same column family are stored together in one file on the hard disk.
Therefore, this data can be retrieved quickly and efficiently.
Cassandra is fast and easily scalable with write operations spread across the cluster.
The cluster does not have a master node, hence, any node can handle the read and write operations.
Graph Database
A graph database lets you store data and its relationships with other data in the form of nodes and edges. Each
relationship can have a set of properties. Edges have a direction, which carries its own meaning, and you can
traverse a relationship in either direction.
All the nodes in the graph are organized by relationships, which helps you explore interesting and hidden patterns between the
nodes.
Examples of Graph Database
Examples of graph databases are Neo4J (pronounced Neo-four-J), Infinite Graph, OrientDB, and FlockDB.
Neo4J is one of the most popular graph databases and is ACID compliant. It is a product of the company
Neo Technology. It is Java based but has bindings for other languages, including Ruby and Python.
FlockDB was created by Twitter for relationship related analytics.
In a graph database, the labeled property graph model is used for modeling the data. It is similar to the
entity-relationship (ER) model used in RDBMS. A property graph contains connected entities, the nodes, which
can hold any number of attributes as key-value pairs.
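A labeled property graph can be sketched with a dictionary of nodes and a list of directed edges. This is a toy model and not the Neo4J API or its query language; the node identifiers, relationship names, and the `outgoing` and `incoming` helpers are hypothetical.

```python
# Nodes hold any number of key-value properties.
nodes = {
    "ada": {"label": "Person", "name": "Ada"},
    "lin": {"label": "Person", "name": "Lin"},
    "acme": {"label": "Company", "name": "Acme"},
}

# Each edge is directed (source -> target) and carries its own properties.
edges = [
    ("ada", "FOLLOWS", "lin", {"since": 2020}),
    ("ada", "WORKS_AT", "acme", {"role": "engineer"}),
    ("lin", "WORKS_AT", "acme", {"role": "analyst"}),
]


def outgoing(node_id, rel_type):
    """Targets reached by following edges of `rel_type` away from `node_id`."""
    return [dst for src, rel, dst, _ in edges if src == node_id and rel == rel_type]


def incoming(node_id, rel_type):
    """Sources of edges of `rel_type` that point at `node_id`."""
    return [src for src, rel, dst, _ in edges if dst == node_id and rel == rel_type]


ada_follows = outgoing("ada", "FOLLOWS")       # follow the edge forward
acme_staff = incoming("acme", "WORKS_AT")      # walk the same kind of edge backward
```

The direction is stored once per edge, yet both helpers can use it, which is what allows a relationship to be explored from either end.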
CAP Theorem
Many NoSQL databases let a developer tune the database's behavior to match the requirements of an application. To do
this well, it is important to understand the following properties described by the CAP theorem:
Consistency
Consistency in CAP theorem refers to atomicity and isolation. Consistency means consistent read and write
operations for the same sets of data so that concurrent operations see the same valid and consistent data state,
without any stale data.
Consistency in ACID
Consistency in ACID means if the data does not satisfy predefined constraints, it is not persisted. Consistency in CAP
theorem is different. In a single-machine database, consistency is achieved using the ACID semantics.
However, in the case of NoSQL databases, which are scaled out and distributed, providing consistency gets
complicated.
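Availability
Availability means that every request received by a working node gets a response, even when other nodes in the
cluster are down.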
Partition Tolerance
Partition tolerance, or fault tolerance, is the third element of the CAP theorem. Partition tolerance measures the ability
of a system to continue its service when a network failure leaves some of its nodes unable to communicate with the others.
The following paragraphs look at MongoDB in terms of the CAP theorem.
By default, MongoDB offers strong consistency. This means after you perform a write operation, you cannot read the
same data until the write operation is successful. MongoDB is a single-master system and by default, all reads go to
the primary node.
Optionally, if you enable reading from secondary nodes, MongoDB becomes eventually consistent and allows
reading of out-of-date results. In addition, MongoDB handles network partitions well by keeping the same data on
multiple nodes, known as a replica set.
Therefore, MongoDB is a consistent and partition-tolerant database that compromises on the availability aspect.
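The difference between reading from the primary and reading from a secondary can be illustrated with a small simulation. This is not MongoDB driver code: the `ReplicaSet` class and its manual `replicate` step are assumptions made to show where a stale read can come from.

```python
class ReplicaSet:
    """Toy single-primary replica set with manual, delayed replication."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._pending = []          # writes not yet copied to the secondary

    def write(self, key, value):
        # All writes go to the primary first.
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Copy outstanding writes to the secondary, in order.
        for key, value in self._pending:
            self.secondary[key] = value
        self._pending.clear()

    def read(self, key, from_secondary=False):
        source = self.secondary if from_secondary else self.primary
        return source.get(key)


rs = ReplicaSet()
rs.write("stock", 10)
rs.replicate()
rs.write("stock", 9)                                  # not yet replicated

fresh = rs.read("stock")                              # primary: latest value
stale = rs.read("stock", from_secondary=True)         # secondary: older value
rs.replicate()
caught_up = rs.read("stock", from_secondary=True)     # secondary after replication
```

Reads from the primary always see the latest write, while reads from the secondary lag until replication runs, which is the eventual consistency trade-off described above.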
Summary
NoSQL represents a class of products and a collection of diverse or related data concepts for storage and
manipulation.
NoSQL databases are used to efficiently manage large-volume and semi-structured data.
The four basic NoSQL database types are: Key-Value, Document-based, Column-based, and Graph-based.
According to the CAP theorem, a distributed computer system cannot provide all three properties (consistency,
availability, and partition tolerance) at the same time.