DBMS - Unit 5 (NoSQL Databases)
Prof. S. B. Shinde
Asst Professor, MESCOE Pune
Introduction
In computing systems (web and business applications), an
enormous amount of data is generated on the web every day.
A large share of this data is handled by relational
database management systems (RDBMS).
Distributed Systems
A distributed system consists of multiple computers and
software components that communicate through a computer
network (a local area network or a wide area network).
What is NoSQL?
NoSQL stands for Not Only SQL.
This type of data store may not require a fixed schema, avoids
join operations, and typically scales horizontally.
Why NoSQL?
Today, data is becoming easier to access and capture
through third parties such as Facebook, Google+ and others.
Structured data
RDBMS vs NoSQL
RDBMS                               NoSQL
Structured and organized data       Stands for Not Only SQL
Structured query language (SQL)     No declarative query language
Data and its relationships are      No predefined schema
stored in separate tables           Key-value pair storage, column store,
DDL, DML                            document store, graph databases
Tight consistency                   Eventual consistency rather than ACID
ACID transactions                   properties
                                    Unstructured and unpredictable data
                                    CAP theorem
                                    Prioritizes high performance, high
                                    availability and scalability
                                    BASE transactions
Brief History of NoSQL
The term NoSQL was coined by Carlo Strozzi in 1998. He used
it to name his open-source, lightweight database, which did
not have an SQL interface.
Three Eras of Databases
Before NoSQL
After NoSQL
Types of NoSQL
Document-Oriented Databases:
A document-oriented database stores data in the form of documents,
grouped into collections.
A document can be in JSON, BSON, XML, YAML, or a similar format.
A document is a collection of key-value pairs in which each key gives
access to its value.
Documents are stored in collections in order to group different kinds of
data.
Relational model    Document model
Tables              Collections
Rows                Documents
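The document model described above can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory dictionary in place of a real document database; the names `database`, `insert`, and `find` and the sample student records are hypothetical.

```python
import json

# Hypothetical in-memory "database": collection name -> list of documents.
database = {"students": []}

def insert(collection, document):
    """Store a document (a dict of key-value pairs) in a collection."""
    database.setdefault(collection, []).append(document)

def find(collection, **criteria):
    """Return the documents whose fields match every given key-value pair."""
    return [doc for doc in database.get(collection, [])
            if all(doc.get(key) == value for key, value in criteria.items())]

# Documents in the same collection need not share the same fields.
insert("students", {"name": "Asha", "dept": "Computer", "year": 3})
insert("students", {"name": "Ravi", "dept": "IT", "skills": ["SQL", "Python"]})

matches = find("students", dept="IT")
print(json.dumps(matches))
```

The collection plays the role of a table and each dictionary plays the role of a row, but no column list is declared in advance.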
Column-Oriented Databases:
Column-oriented databases work primarily on columns,
and every column is treated individually.
Column stores can improve query performance because a
query reads only the columns it needs.
High performance on aggregation queries (e.g. COUNT,
SUM, AVG, MIN, MAX).
Used in data warehouses and business intelligence,
customer relationship management (CRM), library card
catalogs, etc.
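To see why aggregation suits a column layout, the following Python sketch stores the same records row-wise and column-wise. It is a simplified illustration with made-up sales data, not the storage format of any particular column store.

```python
# Row-oriented layout: all fields of a record are stored together.
rows = [
    {"id": 1, "city": "Pune", "amount": 250},
    {"id": 2, "city": "Mumbai", "amount": 400},
    {"id": 3, "city": "Pune", "amount": 150},
]

# Column-oriented layout: each column is stored as its own list.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate reads one column and never touches "id" or "city".
amounts = columns["amount"]
total = sum(amounts)
average = total / len(amounts)
print(len(amounts), total, average, min(amounts), max(amounts))
```

In the row layout the same SUM would have to visit every record and skip over the fields it does not need.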
Key-Value Databases:
In a key-value database, each item is stored as an
attribute name (or “key”) together with its value.
Key-value stores are the most basic type of NoSQL database.
Designed to handle huge amounts of data.
A key-value store keeps data as a hash table in which each
key is unique and the value can be a string, JSON, a
BLOB (binary large object), etc.
Key-value stores typically favor the 'Availability' and
'Partition tolerance' aspects of the CAP theorem.
Key-value stores can be used as collections, dictionaries,
associative arrays, etc.
Examples: Redis, Riak, Azure Table Storage, DynamoDB,
Berkeley DB, LevelDB, FoundationDB, etc.
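The hash-table behaviour described above can be sketched with a Python dictionary. This is a toy model under the assumption that everything fits in memory; the class name `KeyValueStore` and the key naming scheme are hypothetical and do not correspond to the API of Redis or any other product listed.

```python
import json

class KeyValueStore:
    """Toy key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # Keys are unique: writing an existing key overwrites its value.
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        if key in self._table:
            del self._table[key]
            return True
        return False

store = KeyValueStore()
store.put("user:1:name", "Asha")                                          # string
store.put("user:1:profile", json.dumps({"dept": "Computer", "year": 3}))  # JSON
store.put("user:1:avatar", b"\x89PNG")                                    # binary value
store.put("user:1:name", "Asha K.")                                       # overwrite

profile = json.loads(store.get("user:1:profile"))
print(store.get("user:1:name"), profile["dept"])
```

The store never inspects the value; interpreting it (for example, parsing the JSON) is left to the application.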
Graph Databases:
A graph database uses graph structures with nodes, edges,
and properties to represent and store data.
Graph databases are faster for associative data sets, which
is why they are gaining popularity.
Graph stores are used to store information about networks,
such as social connections.
Each node represents an entity (such as a student or
business), and each edge represents a connection or
relationship between two nodes.
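The node, edge, and property structure above can be sketched with plain Python containers. The node ids, names, and relationship labels below are invented for illustration, and the traversal is a naive scan rather than the indexed lookup a real graph database would use.

```python
# Nodes: id -> properties of the entity.
nodes = {
    "s1": {"type": "student", "name": "Asha"},
    "s2": {"type": "student", "name": "Ravi"},
    "s3": {"type": "student", "name": "Meera"},
    "b1": {"type": "business", "name": "Campus Cafe"},
}

# Edges: (source node, relationship, target node).
edges = [
    ("s1", "FRIEND_OF", "s2"),
    ("s2", "FRIEND_OF", "s3"),
    ("s3", "WORKS_AT", "b1"),
]

def neighbours(node_id, relationship):
    """Follow outgoing edges of one relationship type from a node."""
    return [dst for src, rel, dst in edges
            if src == node_id and rel == relationship]

def friends_of_friends(node_id):
    """Two-hop traversal: friends of the node's friends."""
    result = []
    for friend in neighbours(node_id, "FRIEND_OF"):
        for fof in neighbours(friend, "FRIEND_OF"):
            if fof != node_id and fof not in result:
                result.append(fof)
    return result

names = [nodes[n]["name"] for n in friends_of_friends("s1")]
print(names)
```

A question such as "friends of friends" is a short traversal here, whereas in a relational schema it would typically need a self-join on a friendship table.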
SQL v/s NoSQL
              SQL Databases                         NoSQL Databases

Types         One type (SQL database) with minor    Many different types, including
              variations.                           key-value stores, document databases,
                                                    wide-column stores, and graph
                                                    databases.

Development   Developed in the 1970s to deal with   Developed in the 2000s to deal with
History       the first wave of data storage        the limitations of SQL databases,
              applications.                         particularly concerning scale,
                                                    replication and unstructured data
                                                    storage.

Examples      MySQL, Postgres, Oracle Database      MongoDB, Cassandra, HBase, Neo4j

Data          A specific language using Select,     Through object-oriented APIs.
Manipulation  Insert, and Update statements.

Consistency   Can be configured for strong          Depends on the product. Some provide
              consistency.                          strong consistency (e.g., MongoDB),
                                                    whereas others offer eventual
                                                    consistency (e.g., Cassandra).

Scaling       Vertically, meaning a single server   Horizontally, meaning that to add
              must be made increasingly powerful    capacity, a database administrator
              in order to deal with increased       can simply add more commodity
              demand.                               servers or cloud instances.

Development   Mix of open-source (e.g., Postgres,   Mostly open-source.
Model         MySQL) and closed-source (e.g.,
              Oracle Database).

Supports      Yes, updates can be configured to     In certain circumstances and at
Transactions  complete entirely or not at all.      certain levels (e.g., document level
                                                    vs. database level).

Data Storage  Individual records are stored as      Varies based on database type.
Model         rows in tables, with each column
              storing a specific piece of data
              about that record, much like a
              spreadsheet.

Schemas       Structure and data types are fixed    Typically dynamic. Records can add
              in advance. To store information      new information on the fly, and
              about a new data item, the table      unlike SQL table rows, dissimilar
              schema must be altered, which may     data can be stored together as
              require taking the database           necessary.
              offline.
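The schema difference in the last row can be illustrated with Python's built-in `sqlite3` module on the SQL side and a plain list of dictionaries standing in for a document store on the NoSQL side. The table name, column names, and records are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# SQL side: the table structure is fixed in advance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students (name) VALUES (?)", ("Asha",))

# Storing a new kind of information fails until the schema is altered.
try:
    conn.execute("INSERT INTO students (name, email) VALUES (?, ?)",
                 ("Ravi", "ravi@example.com"))
    sql_rejected = False
except sqlite3.OperationalError:
    sql_rejected = True

conn.execute("ALTER TABLE students ADD COLUMN email TEXT")
conn.execute("INSERT INTO students (name, email) VALUES (?, ?)",
             ("Ravi", "ravi@example.com"))
sql_rows = conn.execute(
    "SELECT name, email FROM students ORDER BY id").fetchall()

# NoSQL side (sketched with dicts): a record can add a field on the fly.
documents = [{"name": "Asha"}]
documents.append({"name": "Ravi", "email": "ravi@example.com"})

print(sql_rejected, sql_rows, documents)
conn.close()
```

After the `ALTER TABLE`, every existing row gains the new column with a NULL value, while the older document simply has no `email` key.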
Prof. S. B. Shinde
CAP Theorem (Brewer’s Theorem)
The CAP theorem states that there are three basic
requirements which exist in a special relation when
designing applications for a distributed architecture:
Consistency: all nodes see the same data at the same
time.
Availability: every request receives a response indicating
whether it succeeded or failed.
Partition tolerance: the system continues to operate
despite arbitrary message loss or failure of part of the
system.
Theoretically, it is impossible to fulfill all 3 requirements
at once. A distributed system can support only 2 of the 3
characteristics:
CA - Single-site cluster, so all nodes are always in
contact. When a partition occurs, the system blocks.
CP - Some data may not be accessible, but the
rest is still consistent/accurate.
AP - The system is still available under partitioning,
but some of the data returned may be inaccurate.
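The CP and AP trade-offs listed above can be sketched with a toy two-node cluster in Python. This is a deliberately simplified model: the `Cluster` and `Node` classes, the `partitioned` flag, and the rule that a CP cluster refuses writes during a partition are assumptions made for illustration, not the behaviour of any specific database.

```python
class Node:
    def __init__(self):
        self.data = {}

class Cluster:
    """Toy two-node cluster; mode is "CP" or "AP"."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Node(), Node()
        self.partitioned = False  # True when the nodes cannot communicate

    def write(self, node, key, value):
        other = self.b if node is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Give up availability to keep both nodes consistent.
                raise RuntimeError("write refused during partition")
            node.data[key] = value   # AP: accept locally, replicas diverge
        else:
            node.data[key] = value
            other.data[key] = value  # replicate to the other node

    def read(self, node, key):
        return node.data.get(key)

cp = Cluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
try:
    cp.write(cp.a, "x", 2)
    cp_refused = False
except RuntimeError:
    cp_refused = True

ap = Cluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)

print(cp_refused, cp.read(cp.b, "x"), ap.read(ap.a, "x"), ap.read(ap.b, "x"))
```

During the partition the CP cluster stays consistent but rejects the update, while the AP cluster accepts it and node `b` keeps returning the older value.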
The BASE System
The BASE acronym (Basically Available, Soft state, Eventual
consistency) was defined by Eric Brewer, who is also known
for formulating the CAP theorem.
Basically Available indicates that the system does guarantee
availability, in terms of the CAP theorem.
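Soft state indicates that the state of the system may change
over time, even without input. This is because of the eventual
consistency model.
Eventual consistency indicates that the system will become
consistent over time, provided that it receives no new input
during that time.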
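Soft state and eventual consistency can be sketched in Python with replicas that reconcile in the background. This sketch assumes a last-write-wins rule based on a logical timestamp and a hypothetical `anti_entropy` function that pushes each replica's entries to the others; real systems use a variety of reconciliation schemes.

```python
class Replica:
    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last-write-wins: keep the entry with the newest timestamp.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

def anti_entropy(replicas):
    """Background sync: every replica pushes its entries to the others."""
    for src in replicas:
        for dst in replicas:
            if src is not dst:
                for key, (timestamp, value) in list(src.store.items()):
                    dst.write(key, value, timestamp)

r1, r2, r3 = Replica(), Replica(), Replica()
r1.write("cart", ["pen"], timestamp=1)
r3.write("cart", ["pen", "book"], timestamp=2)

before = [r.read("cart") for r in (r1, r2, r3)]  # replicas disagree
anti_entropy([r1, r2, r3])                       # state changes without client input
after = [r.read("cart") for r in (r1, r2, r3)]   # replicas agree
print(before, after)
```

Between the write and the sync, a read may return a stale or missing value; once the sync has run and no new writes arrive, every replica returns the same answer.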
Advantages:                   Disadvantages:
High scalability              Maturity
Dynamic schemas               Enterprise support
Replication                   Transaction support
Auto-sharding                 Expertise (highly skilled
Integrated caching            programmers)
Distributed computing
Low cost
No complicated relationships
Advantages of NoSQL
Scalability: NoSQL databases can be scaled easily and with minimum
effort, so they are well suited to today’s ever-increasing database needs
(big data). NoSQL databases have a scalable architecture, so they can manage
data efficiently and scale out to many inexpensive machines instead of the
costly machines required when scaling an SQL DBMS.
With a dynamic schema, if we want to change the length of a column or add
a new column, we don’t need to change the whole table; instead, the new data
is stored with the new structure without affecting the previous data or
structure. In NoSQL databases we can insert data without a predefined
schema.
Replication provides redundancy and increases data availability. With
multiple copies of data on different database servers, replication protects a
database from the loss of a single server.
Sharding is the process of storing data records across multiple machines and
is MongoDB’s approach to meeting the demands of data growth.
To use replication with sharding, deploy each shard as a replica set.
Many NoSQL databases have an integrated caching mechanism, so
frequently used data is kept in system memory as much as possible,
removing the need for a separate caching layer.
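The combination of sharding and replication described above can be sketched in Python. This is a minimal illustration that assumes hash-based placement and a fixed number of copies per shard; the `Shard` class, the `shard_for` function, and the key names are hypothetical, and this is not how MongoDB itself is implemented.

```python
import hashlib

class Shard:
    """One shard, stored as a small replica set (several identical copies)."""

    def __init__(self, name, replica_count=3):
        self.name = name
        self.replicas = [{} for _ in range(replica_count)]

    def put(self, key, value):
        for replica in self.replicas:  # replication: write every copy
            replica[key] = value

    def get(self, key, replica_index=0):
        return self.replicas[replica_index].get(key)

def shard_for(key, shards):
    """Sharding: a hash of the key decides which shard owns the record."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

shards = [Shard("shard-0"), Shard("shard-1"), Shard("shard-2")]
keys = [f"student:{i}" for i in range(20)]
for key in keys:
    shard_for(key, shards).put(key, {"roll_no": key})

# Losing one server of a replica set does not lose the data.
owner = shard_for("student:7", shards)
owner.replicas.pop(0)
survivor = owner.get("student:7")
print(owner.name, survivor)
```

Each record lives on exactly one shard, which spreads the data across machines, and on every copy within that shard, which protects it from the loss of a single server.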
Disadvantages of NoSQL
Maturity – NoSQL databases are new and emerging technologies. Because they
are under heavy development, bugs and new features keep arising.
Thank You