2- NoSQL
• NoSQL databases (aka "not only SQL") are non-tabular and store data differently
than relational tables.
• A NoSQL database provides a mechanism for storage and retrieval of data that is
modeled in means other than the tabular relations used in relational databases.
• Such databases have existed since the late 1960s, but the name "NoSQL" only
came into common use in the early 21st century.
History of NoSQL
• The acronym NoSQL was first used in 1998 by Carlo Strozzi while naming his
lightweight, open-source “relational” database that did not use SQL.
• The name came up again in 2009 when Eric Evans and Johan Oskarsson used it to
describe non-relational databases. Relational databases are often referred to as SQL
systems.
• The term NoSQL can mean either “No SQL systems” or the more commonly
accepted “Not only SQL,” which emphasizes the fact that some systems might
support SQL-like query languages.
Cont…
• The NoSQL model typically uses a distributed database system, meaning a system spread across multiple
computers. Non-relational systems are often quicker for such workloads, use an ad hoc approach to
organizing data, and process large amounts of differing kinds of data.
• For large, unstructured data sets, NoSQL databases are often a better choice than relational databases
because of their speed and flexibility.
• Not only can NoSQL systems handle both structured and unstructured data, but they can also process
unstructured Big Data quickly.
• This led to organizations such as Facebook, Twitter, LinkedIn, and Google adopting NoSQL systems. These
organizations process tremendous amounts of unstructured data, coordinating it to find patterns and gain
business insights.
Need of NoSQL
• The relationships between the data items you store are not that important.
• The data is growing continuously, and you need to scale the database regularly to handle it.
• A NoSQL database provides more flexibility when it comes to handling data. There is no requirement
to specify a schema before starting to work with the application.
Why Use NoSQL?
• The concept of NoSQL databases became popular with
Internet giants like Google, Facebook, and Amazon, which
deal with huge volumes of data. System response time
can become slow when a single RDBMS server handles
massive volumes of data.
• To resolve this problem, we could “scale up” our systems
by upgrading our existing hardware. This process is
expensive.
• The alternative for this issue is to distribute database load
on multiple hosts whenever the load increases. This
method is known as “scaling out.”
• NoSQL databases are non-relational and are designed
with web applications in mind, so they generally scale
out better than relational databases.
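Scaling out can be illustrated with a minimal sketch that spreads record keys over several hosts by hashing each key. The host names, the `pick_host` and `distribute` helpers, and the modulo placement rule are all assumptions made for illustration; they are not taken from any particular NoSQL product.

```python
import hashlib


def pick_host(key, hosts):
    """Map a key to one host using a stable hash (hypothetical placement rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]


def distribute(keys, hosts):
    """Group keys by the host that would store them."""
    placement = {host: [] for host in hosts}
    for key in keys:
        placement[pick_host(key, hosts)].append(key)
    return placement


# Hypothetical hosts and keys, for illustration only.
hosts = ["host-a", "host-b", "host-c"]
keys = [f"user:{i}" for i in range(12)]
placement = distribute(keys, hosts)

for host, assigned in placement.items():
    print(host, len(assigned), "keys")
```

Simple modulo placement moves most keys whenever a host is added or removed; distributed stores commonly use consistent hashing or range partitioning to limit that reshuffling.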
CAP Theorem (Brewer’s Theorem)
• The CAP theorem, also known as Brewer’s theorem (after its developer, Eric Brewer), is an important concept for
non-relational databases. It states that a distributed data store cannot simultaneously offer more than two of three
established guarantees. Brewer, at the University of California, Berkeley, presented the theory in the fall of 1998, and it
was published in 1999 as the CAP Principle. The three guarantees that cannot all be met simultaneously are:
• Consistency: The data within the database remains consistent, even after an operation has been executed. For instance,
after updating a system, all clients will see the same data.
• Availability: The system is constantly on (always available), with no downtime. Availability means that any client
making a request for data gets a response, even if one or more nodes are down. Put another way, every working
node in the distributed system returns a valid response for any request, without exception.
• Partition Tolerance: The system continues to function even if communication among the servers is no longer reliable,
that is, even if the servers are partitioned into multiple groups that can’t communicate with each other.
Cont…
• NoSQL (non-relational) databases are ideal for
distributed network applications. Unlike their
vertically scalable SQL (relational) counterparts,
NoSQL databases are horizontally scalable and
distributed by design—they can rapidly scale across
a growing network consisting of multiple
interconnected nodes.
• Today, NoSQL databases are classified based on the
two CAP characteristics they support:
Cont…
• CP database: A CP database delivers consistency and partition
tolerance at the expense of availability. When a partition occurs
between any two nodes, the system has to shut down the non-
consistent node (i.e., make it unavailable) until the partition is
resolved.
• AP database: An AP database delivers availability and partition
tolerance at the expense of consistency. When a partition
occurs, all nodes remain available but those at the wrong end
of a partition might return an older version of data than others.
(When the partition is resolved, the AP databases typically
resync the nodes to repair all inconsistencies in the system.)
• CA database: A CA database delivers consistency and
availability across all nodes. It can’t do this if there is a partition
between any two nodes in the system, however, and therefore
can’t deliver fault tolerance.
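The difference between CP and AP behavior during a partition can be sketched with a toy two-replica store. The `ToyStore` class, its `mode` flag, and the `demo` helper are hypothetical names invented for this illustration; real databases are far more involved.

```python
class ToyStore:
    """Two replicas of one value; `mode` picks the behavior during a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.primary = "v1"
        self.secondary = "v1"
        self.partitioned = False

    def write(self, value):
        # Writes land on the primary; replication needs a working link.
        self.primary = value
        if not self.partitioned:
            self.secondary = value

    def read_secondary(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("secondary is unavailable during the partition")
        # AP: always answer, even if the value may be out of date.
        return self.secondary

    def heal(self):
        # Once the partition is resolved, resync the secondary.
        self.partitioned = False
        self.secondary = self.primary


def demo(mode):
    """Return (read during partition, read after healing) for the given mode."""
    store = ToyStore(mode)
    store.partitioned = True
    store.write("v2")
    try:
        during = store.read_secondary()
    except RuntimeError as err:
        during = f"error: {err}"
    store.heal()
    return during, store.read_secondary()


print("CP:", demo("CP"))
print("AP:", demo("AP"))
```

In this sketch the CP store gives up availability on the cut-off replica, while the AP store keeps answering with the older value until the resync.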
The BASE Model
• BASE stands for Basically Available, Soft state, Eventual consistency. Many NoSQL databases are eventually
consistent, but the implementation of eventual consistency may vary across different NoSQL databases.
• NRW is the notation used to describe how the eventual consistency model is implemented
across NoSQL databases, where
• N is the number of data copies that the database has maintained.
• R is the number of copies that an application needs to refer to before returning a read request’s
output.
• W is the number of data copies that need to be written to before a write operation is marked as
completed successfully.
• Using these notation configurations, the databases implement the model of eventual consistency.
Cont…
Read Operations
• If R is set to 1, the read operation will read any data copy, which can be
outdated.
• If R > 1, more than one copy is read, and the most recent value among them is
returned. However, this can slow down the read operation.
• Using N < W + R always ensures that a read operation retrieves the latest
value. This is because the number of written copies plus the number of read
copies is greater than the total number of copies, so the copies read must
overlap the copies written, ensuring that at least one read copy has the
latest version. This is quorum assembly.
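The N < W + R rule can be checked by brute force on small clusters. The sketch below is only an illustration; the function names are hypothetical. It enumerates every possible set of W written copies and R read copies and confirms that they always overlap exactly when R + W > N.

```python
import itertools


def is_quorum(n, r, w):
    """The rule from the text: the read and write sets must overlap."""
    return r + w > n


def reads_always_overlap_writes(n, r, w):
    """Check every choice of w written copies and r read copies out of n."""
    replicas = range(n)
    for written in itertools.combinations(replicas, w):
        for read in itertools.combinations(replicas, r):
            if not set(written) & set(read):
                return False  # a read could miss the latest write entirely
    return True


n = 3
for r in range(1, n + 1):
    for w in range(1, n + 1):
        print(f"N={n} R={r} W={w} quorum={is_quorum(n, r, w)} "
              f"overlap={reads_always_overlap_writes(n, r, w)}")
```

For N = 3, a setting such as R = 2, W = 2 satisfies the rule, while R = 1, W = 1 does not, so a read with those settings may return an outdated copy.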
Categories of NoSQL Database
• Column-oriented databases primarily work on columns and every column is treated individually.
• Values of a single column are stored contiguously.
• A column store keeps data in column-specific files.
• All data within each column datafile have the same type, which makes it ideal for compression.
• Column stores can improve query performance because a query can access only the specific columns it needs.
• High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
• Well suited to data warehouses and business intelligence, customer relationship management
(CRM), library card catalogs, etc.
• Examples of column-oriented databases: BigTable, Cassandra, SimpleDB, etc.
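A small sketch of the idea, using plain Python lists rather than any real column store: the same records are held row by row and column by column, an aggregate touches only the one column it needs, and a same-typed column with repeated values compresses well with run-length encoding. The sample data and variable names are invented for illustration.

```python
import itertools

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Pune", "amount": 120.0},
    {"id": 2, "city": "Pune", "amount": 80.0},
    {"id": 3, "city": "Delhi", "amount": 50.0},
    {"id": 4, "city": "Delhi", "amount": 30.0},
]

# Column-oriented layout: the values of each column are stored contiguously.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregation reads a single column instead of every full record.
total = sum(columns["amount"])
average = total / len(columns["amount"])

# Repeated values in one column can be run-length encoded.
city_runs = [(value, len(list(group)))
             for value, group in itertools.groupby(columns["city"])]

print(total, average)
print(city_runs)
```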
• A column family consists of
multiple rows.
• Each row can contain a different
number of columns from the other
rows, and its columns don’t have to
match the columns in the other rows
(i.e., they can have different column
names, data types, etc.).
• Each column is confined to its
row. It doesn’t span all rows as in a
relational database. Each column
contains a name/value pair, along
with a timestamp.
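A column family as described above can be modeled, very roughly, as a dictionary of rows in which every row holds its own set of columns. The `put` helper, the row keys, and the sample values are hypothetical and serve only to illustrate the structure.

```python
import time


def put(column_family, row_key, column_name, value, timestamp=None):
    """Store one column (name, value, timestamp) inside a single row."""
    row = column_family.setdefault(row_key, {})
    if timestamp is None:
        timestamp = time.time()
    row[column_name] = (value, timestamp)


# Hypothetical "users" column family; timestamps are fixed for readability.
users = {}
put(users, "user1", "name", "Asha", timestamp=1)
put(users, "user1", "email", "asha@example.com", timestamp=1)
put(users, "user2", "name", "Ravi", timestamp=2)
put(users, "user2", "phone", "555-0100", timestamp=2)
put(users, "user2", "city", "Pune", timestamp=3)

# Rows need not share the same columns.
for row_key, row in users.items():
    print(row_key, sorted(row))
```

Here `user1` and `user2` have different column names and different column counts, and each stored column carries its own timestamp.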
Document database
• A document database is a type of nonrelational database that is designed to store and query
data as JSON-like documents.
• Document databases make it easier for developers to store and query data in a database by
using the same document-model format they use in their application code.
• The flexible, semistructured, and hierarchical nature of documents and document databases
allows them to evolve with applications’ needs. The document model works well with use
cases such as catalogs, user profiles, and content management systems where each
document is unique and evolves over time.
• Document databases enable flexible indexing, powerful ad hoc queries, and analytics over
collections of documents.
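The document model can be sketched with plain Python dictionaries standing in for JSON-like documents. The catalog contents and the `find` and `get_path` helpers are assumptions made for illustration and do not correspond to the API of any particular document database.

```python
import json

# A collection of documents; each document has its own shape.
catalog = [
    {"_id": 1, "type": "book", "title": "Intro to Databases",
     "authors": ["A. Author"], "price": 30},
    {"_id": 2, "type": "laptop", "title": "Example Laptop",
     "specs": {"ram_gb": 16, "ssd_gb": 512}, "price": 900},
    {"_id": 3, "type": "book", "title": "Graph Theory Notes",
     "authors": ["B. Writer", "C. Editor"], "price": 45, "edition": 2},
]


def get_path(doc, dotted_path, default=None):
    """Follow a dotted path such as 'specs.ram_gb' into nested documents."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def find(collection, predicate):
    """Ad hoc query: return every document for which predicate(doc) is true."""
    return [doc for doc in collection if predicate(doc)]


books = find(catalog, lambda d: d.get("type") == "book")
big_ram = find(catalog, lambda d: (get_path(d, "specs.ram_gb") or 0) >= 16)

# A document can evolve without changing the others or declaring a schema.
catalog[0]["language"] = "en"

print(json.dumps(books[0], indent=2))
print([d["title"] for d in big_ram])
```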
Graph Based Database
NoSQL Advantages
• High scalability: Scaling up a single server fails when transaction rates and fast-response
requirements increase. In contrast, the new generation of NoSQL databases is
designed to scale out (i.e., to expand horizontally using low-end commodity servers).
• Manageability and administration: NoSQL databases are designed to work mostly with
automated repairs, distributed data, and simpler data models, leading to lower management
and administration overhead.
• Low cost: NoSQL databases are typically designed to work with a cluster of cheap
commodity servers, enabling the users to store and process more data at a low cost.
• Flexible data models: NoSQL databases have a very flexible data model, enabling them to
work with any type of data; they don’t comply with the rigid RDBMS data models. As a result,
any application changes that involve updating the database schema can be easily
implemented.
NoSQL Disadvantages
• Maturity: Many NoSQL databases are relatively young products, and some have key features
that are still to be implemented. Thus, when deciding on a NoSQL database, we should analyse
the product properly to ensure the required features are fully implemented and not still on
the to-do list.
• Support: Support is one limitation that we need to consider. Many NoSQL databases come
from start-ups and were open sourced. As a result, support can be minimal compared with
that of enterprise software companies, and vendors may lack global reach or support resources.
• Limited Query Capabilities: Since NoSQL databases are generally developed to meet
the scaling requirement of the web-scale applications, they provide limited querying
capabilities.
• Expertise: Since NoSQL is an evolving area, expertise on the technology is limited in the
developer and administrator community.