Introduction to NoSQL


INTRODUCTION TO

NOSQL
COMPUTER SCIENCE AND ENGINEERING
(DATA SCIENCE)

Presented by
V. Nagarjuna
HISTORY OF NOSQL

• The term NoSQL was coined by Carlo Strozzi in 1998. He used it as the name of his
open-source, lightweight database, which did not have an SQL interface.

• In early 2009, when last.fm wanted to organize an event on open-source distributed
databases, Eric Evans, a Rackspace employee, reused the term to refer to databases that are
non-relational, distributed, and do not guarantee atomicity, consistency, isolation, and
durability (ACID), the four defining properties of traditional relational database systems.

• In the same year, NoSQL was discussed and debated extensively at the "no:sql(east)"
conference held in Atlanta, USA.

• From then on, the discussion and practice of NoSQL gained momentum, and NoSQL saw
unprecedented growth.
NOSQL……?

NoSQL refers to non-relational database management systems, which differ from
traditional relational database management systems in some significant ways.
NoSQL databases are designed for distributed data stores with very large-scale
storage needs (for example Google or Facebook, which collect terabits of data
every day for their users). This type of data storage may not require a fixed
schema, avoids join operations, and typically scales horizontally.
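To make "scales horizontally" concrete, here is a minimal sketch of hash-based sharding, one common way to spread keys over several machines. The names `shard_for` and `ShardedStore` are hypothetical, and each "shard" is just an in-memory dict standing in for a separate server.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy store: every shard is a dict standing in for one server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=3)
for user_id in range(100):
    # Values have no fixed schema; any dict would do.
    store.put(f"user:{user_id}", {"id": user_id})

sizes = [len(shard) for shard in store.shards]  # keys held by each shard
```

Each key lives on exactly one shard, so a lookup touches one machine and no join is needed. In this naive scheme, changing the number of shards remaps most keys; consistent hashing is a common refinement.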


WHY NOSQL?

Today, data is becoming easier to access and capture through third
parties such as Facebook, Google+ and others. Personal user information, social
graphs, geolocation data, user-generated content and machine logging data are
just a few examples of data that has been growing exponentially. Providing
these services properly requires processing huge amounts of data, which SQL
databases were never designed for. NoSQL databases evolved to handle
this volume of data properly.


RDBMS VS NOSQL
RDBMS
• Structured and organized data
• Structured Query Language (SQL)
• Data and its relationships are stored in separate tables
• Data Manipulation Language, Data Definition Language
• Tight consistency
NoSQL
• Stands for Not Only SQL
• Often no declarative query language
• No predefined schema
• Key-value pair storage, column store, document store, graph databases
• Eventual consistency rather than ACID properties
• Unstructured and unpredictable data
CAP THEOREM (BREWER’S THEOREM)
It is important to understand the CAP theorem when talking about NoSQL databases, or in fact when
designing any distributed system. The CAP theorem concerns three basic
requirements that stand in a special relation when designing applications for a
distributed architecture: a system cannot guarantee all three at the same time.
Consistency:

The data in the database remains consistent after an operation is executed. For
example, after an update operation all clients see the same data.

Availability:

The system is always on (guaranteed service availability), with no downtime.

Partition tolerance:

The system continues to function even when communication among the servers is unreliable,
i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.

A system can therefore provide two of the three:

CA - Single-site cluster, so all nodes are always in contact. When a partition occurs, the system blocks.

CP - Some data may not be accessible, but the rest is still consistent/accurate.

AP - The system is still available under partitioning, but some of the data returned may be inaccurate.
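The difference between CP and AP behavior can be sketched with a toy two-replica cluster. This is an illustrative model only, not how any particular database is implemented; `Replica`, `TinyCluster` and the `mode` flag are hypothetical names.

```python
class Replica:
    """One node holding a copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TinyCluster:
    """Two replicas; mode is "CP" or "AP" (toy model)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than diverge.
                raise RuntimeError("unavailable during partition")
            # Availability first: accept locally, replicas now disagree.
            replica.data[key] = value
            return
        # Healthy network: apply the write on both replicas.
        replica.data[key] = value
        other.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)


cp = TinyCluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
try:
    cp.write(cp.a, "x", 2)
    cp_write_rejected = False
except RuntimeError:
    cp_write_rejected = True  # the data stays consistent but is not writable

ap = TinyCluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)           # accepted on replica a only
stale_read = ap.read(ap.b, "x")  # replica b still returns the old value
```

In CP mode the cluster gives up availability for writes during the partition; in AP mode it keeps answering, at the cost of a stale read from the replica that missed the update.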
NOSQL PROS/CONS
Advantages:

• High scalability
• Distributed computing
• Lower cost
• Schema flexibility, semi-structured data
• No complicated relationships

Disadvantages:

• No standardization
• Limited query capabilities (so far)
THE BASE

The CAP theorem states that a distributed computer system cannot guarantee all of the
following three properties at the same time:
• Consistency
• Availability
• Partition tolerance
A BASE system gives up on strong consistency.
o Basically Available indicates that the system guarantees availability, in terms of
the CAP theorem.
o Soft state indicates that the state of the system may change over time, even without
input. This is because of the eventual consistency model.
o Eventual consistency indicates that the system will become consistent over time, provided
that it receives no new input during that time.
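Soft state and eventual consistency can be illustrated with two replicas that exchange their newest values in the background. The sketch below assumes a simple "highest version wins" rule; `VersionedReplica` and `anti_entropy` are hypothetical names, and real systems use more elaborate conflict resolution.

```python
class VersionedReplica:
    """Replica that keeps, for each key, the value with the highest version."""

    def __init__(self):
        self.store = {}  # key -> (version, value)

    def put(self, key, value, version):
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(x, y):
    """Exchange entries both ways so both replicas keep the newest version."""
    for key, (version, value) in list(x.store.items()):
        y.put(key, value, version)
    for key, (version, value) in list(y.store.items()):
        x.put(key, value, version)


r1, r2 = VersionedReplica(), VersionedReplica()
r1.put("cart", ["book"], version=1)
r2.put("cart", ["book"], version=1)

r1.put("cart", ["book", "pen"], version=2)  # the update reaches r1 only

stale = r2.get("cart")   # r2 still serves the old value (available, not consistent)
anti_entropy(r1, r2)     # background sync, with no new client input
converged = r1.get("cart") == r2.get("cart")
```

Between the update and the sync, the two replicas disagree; once no new input arrives and the sync has run, they hold the same value.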
ACID VS BASE

ACID            BASE

Atomicity       Basically Available
Consistency     Soft state
Isolation       Eventual consistency
Durability
NOSQL CATEGORIES
There are four general types (the most common categories) of NoSQL databases.
Each of these categories has its own specific attributes and limitations. No
single solution is better than all the others; however, some databases are
better suited to specific problems. To clarify what NoSQL databases are,
let's discuss the most common categories:

• Key-value stores

• Column-oriented

• Graph

• Document oriented
KEY-VALUE STORES

• Key-value stores are the most basic type of NoSQL database.

• They are designed to handle huge amounts of data.

• Several of them (for example Riak) are based on Amazon’s Dynamo paper.

• Key-value stores allow developers to store schema-less data.

• The database stores data as a hash table in which each key is unique and the
value can be a string, JSON, a BLOB (Binary Large Object), etc.

• In some stores (for example Redis), the value stored against a key may also be a
hash, list, set, or sorted set.

• For example, a key-value pair might consist of a key like "Name" that is associated with a
value like "Robin".

• Key-value stores can be used as collections, dictionaries, associative arrays, etc.

• Many key-value stores favor the 'Availability' and 'Partition tolerance' aspects of the CAP theorem.

• Key-value stores work well for shopping cart contents, or individual values like
color schemes, a landing page URI, or a default account number.

• Examples of key-value store databases: Redis, Dynamo, Riak, etc.
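The key-value model described above can be sketched with a plain Python dict, which is itself a hash table. `KeyValueStore` is a hypothetical class for illustration and not the API of Redis, Dynamo, or Riak; the keys other than "Name" are made up.

```python
import json


class KeyValueStore:
    """Minimal in-memory key-value store backed by a hash table (dict)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # Keys are unique: writing an existing key overwrites its value.
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)


store = KeyValueStore()
store.put("Name", "Robin")                                    # string value
store.put("cart:42", json.dumps({"items": ["book", "pen"]}))  # JSON value
store.put("avatar:42", b"\x89PNG")                            # binary (BLOB) value

name = store.get("Name")
cart = json.loads(store.get("cart:42"))

store.put("Name", "Robyn")  # same key, new value
```

The store never inspects the value: it is opaque data found only through its key, which is why no schema is needed.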


COLUMN-ORIENTED DATABASES
• Column-oriented databases work primarily on columns, and every column is treated
individually.
• Values of a single column are stored contiguously.
• A column store keeps its data in column-specific files.
• In column stores, query processors work on columns too.
• All data within each column data file has the same type, which makes it ideal for
compression.
• Column stores can improve query performance because a query can read only the
columns it needs.
• High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
• Used for data warehouses and business intelligence, customer relationship
management (CRM), library card catalogs, etc.
• Examples of column-oriented databases: BigTable, Cassandra, SimpleDB, etc.
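The points above can be illustrated by turning a few rows into per-column lists. This is a sketch with made-up sample data; `to_columns` and `run_length_encode` are hypothetical helpers, and real column stores use far more sophisticated storage and compression.

```python
from itertools import groupby

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "east", "amount": 200},
    {"id": 3, "region": "west", "amount": 80},
]


def to_columns(rows):
    """Column-oriented layout: one contiguous list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress runs of equal neighboring values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregation reads only the "amount" column, not whole rows.
amounts = columns["amount"]
total = sum(amounts)
largest = max(amounts)

# Values of one column share a type and often repeat, so they compress well.
encoded_regions = run_length_encode(columns["region"])
```

The aggregation never touches `id` or `region`, which is the reason column stores do well on queries such as SUM or MAX over a single column.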
GRAPH DATABASES

• A graph data structure consists of a finite (and possibly mutable) set of entities called
nodes or vertices, together with a set of ordered pairs of these entities, called edges or arcs.

• The following picture presents a labeled graph of 6 vertices and 7 edges.

Graph Databases…?
• A graph database stores data in a graph.
• It can represent many kinds of data, especially highly connected data, in an accessible way.
• A graph database is a collection of nodes and edges.
• Each node represents an entity (such as a student or business) and each edge represents
a connection or relationship between two nodes.
• Every node and every edge is identified by a unique identifier.
• Each node knows its adjacent nodes.
• As the number of nodes increases, the cost of a local step (or hop) remains the same.
• Indexes are used for lookups.
• Here is a comparison between the classic relational model and the graph model:
COMPARISON BETWEEN THE CLASSIC RELATIONAL MODEL AND THE GRAPH MODEL

Relational model     Graph model

Tables               Vertices and edges set
Rows                 Vertices
Columns              Key/value pairs
Joins                Edges

Examples of graph databases: OrientDB, Neo4j, Titan, etc.
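A small sketch of the graph model: nodes with key/value properties, edges with unique identifiers, and an adjacency list so that each node knows its neighbors. The node data, the relationship names, and the helpers `neighbors` and `hops` are made up for illustration; this is not the API of OrientDB, Neo4j, or Titan.

```python
from collections import deque

# Nodes: unique id -> key/value properties.
nodes = {
    1: {"label": "student", "name": "Asha"},
    2: {"label": "student", "name": "Ravi"},
    3: {"label": "course", "name": "Databases"},
}

# Edges: unique id -> (source node, relationship, target node).
edges = {
    "e1": (1, "FRIEND_OF", 2),
    "e2": (2, "ENROLLED_IN", 3),
}

# Adjacency list: each node keeps direct references to its neighbors.
adjacency = {}
for source, relationship, target in edges.values():
    adjacency.setdefault(source, []).append((relationship, target))


def neighbors(node_id):
    """One local step (hop): look only at the node's own adjacency list."""
    return [target for _, target in adjacency.get(node_id, [])]


def hops(start, goal):
    """Breadth-first search: fewest edges from start to goal, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None


distance = hops(1, 3)  # Asha -> Ravi -> Databases
```

Following an edge is a lookup in one node's adjacency list, so the cost of a hop does not depend on how many nodes the graph holds. A relational join would instead match rows across whole tables.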


DOCUMENT ORIENTED DATABASES
• A document-oriented database is a collection of documents.
• Data in this model is stored inside documents.
• A document is a key-value collection in which each key gives access to its
value.
• Documents are typically not forced to have a schema and are therefore
flexible and easy to change.
• Documents are stored in collections in order to group different kinds
of data.
• Documents can contain many different key-value pairs, key-array
pairs, or even nested documents.
COMPARISON BETWEEN THE CLASSIC RELATIONAL MODEL AND THE DOCUMENT MODEL

Relational model     Document model

Tables               Collections
Rows                 Documents
Columns              Key/value pairs
Joins                Not available

Examples of document-oriented databases: MongoDB, CouchDB, etc.
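The document model can be sketched with Python dicts as documents and lists as collections. The `insert` and `find` helpers and the sample documents are hypothetical; this is an in-memory illustration, not the MongoDB or CouchDB API.

```python
# Collection name -> list of documents.
collections = {}


def insert(collection, document):
    """Add a document to a collection, creating the collection if needed."""
    collections.setdefault(collection, []).append(document)


def find(collection, **criteria):
    """Return the documents whose top-level keys match all given criteria."""
    return [
        doc
        for doc in collections.get(collection, [])
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Two documents in the same collection with different shapes (no fixed schema).
insert("students", {
    "_id": 1,
    "name": "Robin",
    "courses": ["DBMS", "ML"],          # key-array pair
    "address": {"city": "Hyderabad"},   # nested document
})
insert("students", {"_id": 2, "name": "Asha", "email": "asha@example.com"})

robin = find("students", name="Robin")[0]
city = robin["address"]["city"]  # related data is embedded, so no join is needed
```

Because the address is nested inside the student document, reading it takes one lookup, which is how document stores work around the missing join.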


PRODUCTION DEPLOYMENT
A large number of companies use NoSQL. To name a few:

• Google
• Facebook
• Mozilla
• Adobe
• Foursquare
• LinkedIn
• Digg
• McGraw-Hill Education
• Vermont Public Radio


Thank You
