Module 7 - NoSQL

The document discusses NoSQL databases and provides details about MongoDB and key-value stores. It describes the characteristics of NoSQL systems, different categories including document, key-value, column and graph databases. It also covers MongoDB data model, operations and distributed features like replication and sharding.

Module 7

NoSQL

Dr. S. RENUKA DEVI

Professor
SCOPE
VIT Chennai Campus
What is NoSQL?
Not Only SQL

Most NoSQL systems are distributed databases or distributed storage systems, with a focus on semi-structured data storage, high performance, availability, data replication, and scalability

A structured relational SQL system may not be appropriate for applications storing vast amounts of data because:
SQL systems offer too many services which the application may not need
The traditional relational model may be too restrictive
Some organizations decided to develop their own systems:
Google developed a proprietary NoSQL system known as BigTable, used in many of Google’s applications such as Gmail, Google Maps, and Web site indexing.
Apache HBase is an open source NoSQL system based on similar concepts.
Google's innovation led to the category known as column-based or wide column stores, also referred to as column family stores.
Amazon developed a NoSQL system called DynamoDB, available through Amazon’s cloud services.
This innovation led to the category known as key-value data stores, or sometimes key-tuple or key-object data stores.
Facebook developed a NoSQL system called Cassandra, which is now open source and known as Apache Cassandra.
It uses concepts from both key-value stores and column-based systems.

Other software companies started developing their own solutions, for example MongoDB and CouchDB, which are classified as document-based NoSQL systems or document stores.

Another category of NoSQL systems is the graph-based NoSQL systems, or graph databases
Examples: Neo4j, GraphBase
Characteristics of NoSQL Systems
NoSQL characteristics related to distributed databases and distributed systems
• Scalability: in NoSQL systems, horizontal scalability is generally used, where the distributed system is expanded by adding more nodes for data storage and processing as the volume of data grows.
• Availability, Replication and Eventual Consistency
• Replication Models
  - Master-slave replication requires one copy to be the master copy; all write operations must be applied to the master copy and then propagated to the slave copies
  - Master-master replication allows reads and writes at any of the replicas but may not guarantee that reads at nodes that store different copies see the same values.
• Sharding of Files (also known as horizontal partitioning): distributes the load of accessing the file records across multiple nodes
• High-Performance Data Access: records are located by hashing or range partitioning on object keys
Characteristics of NoSQL Systems
NoSQL characteristics related to data models and query languages
NoSQL systems emphasize performance and flexibility over modeling power and complex querying
Not Requiring a Schema: allows semi-structured, self-describing data
Less Powerful Query Languages: access is mainly through CRUD (create, read, update, delete) operations
Versioning: some systems store multiple versions of a data item, each with a timestamp
Categories of NoSQL Systems
• Document-based NoSQL systems: These systems store data in the form of documents using well-known formats, such as JSON (JavaScript Object Notation)
  - Documents are accessible via their document id
• NoSQL key-value stores: a simple data model based on fast access by the key to the value associated with the key
• Column-based or wide column NoSQL systems: These systems partition a table by column into column families, where each column family is stored in its own files.
• Graph-based NoSQL systems: Data is represented as graphs, and related nodes can be found by traversing the edges using path expressions.
• Hybrid NoSQL systems
• Object databases
• XML databases
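To make the four main categories concrete, the sketch below shapes one small data set four ways using plain Python structures. It is only an illustration of the data models, not the storage format of any real system; the column family name Details and the edge label WORKS_ON are assumptions made for the example.

```python
# Illustrative only: the same small data set shaped for each category.

# Document store: a self-describing document, reachable by its document id.
document = {"_id": "P1", "Pname": "ProductX", "Plocation": "Bellaire"}

# Key-value store: an opaque value looked up by a unique key.
key_value = {"project:P1": '{"Pname": "ProductX", "Plocation": "Bellaire"}'}

# Wide column store: row key -> column family -> column qualifier -> value.
# "Details" is a hypothetical column family name.
wide_column = {"P1": {"Details": {"Pname": "ProductX", "Plocation": "Bellaire"}}}

# Graph store: labeled nodes plus labeled edges between them.
# "WORKS_ON" is a hypothetical relationship label.
graph_nodes = {
    "P1": {"label": "PROJECT", "Pname": "ProductX"},
    "W1": {"label": "WORKER", "Ename": "John Smith"},
}
graph_edges = [("W1", "WORKS_ON", "P1")]

print(document["Pname"])
print(key_value["project:P1"])
print(wide_column["P1"]["Details"]["Plocation"])
print(graph_edges[0])
```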
The CAP Theorem
• CAP refers to three desirable properties of distributed systems with replicated data: consistency, availability, and partition tolerance
• The CAP theorem states that it is not possible to guarantee all three of the desirable properties at the same time in a distributed system with data replication.
• If this is the case, then the distributed system designer would have to choose two properties out of the three to guarantee.
• In a NoSQL distributed data store, a weaker consistency level is often acceptable, and guaranteeing the other two properties (availability, partition tolerance) is important.
• Hence, eventual consistency is often adopted in NoSQL systems.
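Eventual consistency can be illustrated with a toy model: each replica keeps accepting writes while the others are unreachable, and the copies converge once they exchange state. The sketch below assumes a "newest timestamp wins" merge rule, which is one common choice and not something the slides prescribe; the class and function names are hypothetical.

```python
class Replica:
    """One copy of the data; keeps a (timestamp, value) pair per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Accept the write locally, even if other replicas are unreachable.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def synchronize(a, b):
    """Exchange state in both directions; the newest timestamp wins."""
    for key in set(a.data) | set(b.data):
        for source, target in ((a, b), (b, a)):
            if key in source.data:
                timestamp, value = source.data[key]
                target.write(key, value, timestamp)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("x", "old", timestamp=1)
synchronize(r1, r2)

# Network partition: r1 takes a new write that r2 cannot see yet.
r1.write("x", "new", timestamp=2)
stale = r2.read("x")   # r2 stays available but returns the older value

# Partition heals: the replicas converge.
synchronize(r1, r2)
fresh = r2.read("x")
print(stale, fresh)
```

Between the second write and the final synchronization the system is available and partition tolerant, but the two replicas disagree. That is the weaker consistency level the slide refers to.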

Document-Based NoSQL Systems
These systems store data as collections of similar documents
Sometimes known as document stores
Examples include MongoDB and CouchDB

MongoDB Data Model
MongoDB documents are stored in BSON (Binary JSON) format
• Individual documents are stored in a collection.
The operation createCollection is used to create each collection.
For example, the following commands can be used to create a collection called project to hold PROJECT objects from the COMPANY database, and a second collection called worker:
db.createCollection("project", { capped : true, size : 1310720, max : 500 } )
db.createCollection("worker", { capped : true, size : 5242880, max : 2000 } )
MongoDB Data Model
Each document in a collection has a unique ObjectId field, called _id, which is automatically indexed in the collection
The value of ObjectId can be specified by the user, or it can be system-generated
System-generated ObjectIds have a specific format, which combines the timestamp when the object is created (4 bytes, in an internal MongoDB format), the node id (3 bytes), the process id (2 bytes), and a counter (3 bytes) into a 12-byte Id value (recent MongoDB versions replace the node id and process id with a single 5-byte random value; the total is still 12 bytes).
User-generated ObjectIds can have any value specified by the user as long as it uniquely identifies the document
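The byte layout described above can be checked with a short sketch. This is not the MongoDB driver's implementation; make_object_id is a hypothetical helper that simply concatenates a 4-byte timestamp, a 3-byte node id, a 2-byte process id, and a 3-byte counter, to show that the parts add up to 12 bytes.

```python
import os
import struct
import time


def make_object_id(counter, node_id=b"\x01\x02\x03", process_id=None, timestamp=None):
    """Build an ObjectId-like value: timestamp + node id + process id + counter."""
    if len(node_id) != 3:
        raise ValueError("node_id must be exactly 3 bytes")
    ts = int(time.time()) if timestamp is None else timestamp
    pid = (os.getpid() if process_id is None else process_id) & 0xFFFF
    return (
        struct.pack(">I", ts)                       # 4-byte timestamp, big-endian
        + node_id                                   # 3-byte node id
        + struct.pack(">H", pid)                    # 2-byte process id
        + (counter & 0xFFFFFF).to_bytes(3, "big")   # 3-byte counter
    )


oid = make_object_id(counter=1)
print(oid.hex(), len(oid))
```

Because the timestamp occupies the leading bytes, values generated later sort after values generated earlier, at one-second granularity.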
MongoDB Data Model
A collection does not have a schema
The structure of the data fields in documents is chosen based on how documents will be accessed and used
Two common design options for related data:
Denormalized document design with embedded subdocuments
Embedded array of document references
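The two design options can be compared side by side using Python dictionaries as stand-ins for documents. The project and worker values are the ones used in the insert examples in these slides; the field names Workers and WorkerIds and the two helper functions are assumptions made for this sketch.

```python
# Option 1: denormalized design, workers embedded as subdocuments.
project_embedded = {
    "_id": "P1", "Pname": "ProductX", "Plocation": "Bellaire",
    "Workers": [
        {"Ename": "John Smith", "Hours": 32.5},
        {"Ename": "Joyce English", "Hours": 20.0},
    ],
}

# Option 2: the project embeds an array of references to worker documents.
project_refs = {
    "_id": "P1", "Pname": "ProductX", "Plocation": "Bellaire",
    "WorkerIds": ["W1", "W2"],
}
worker_collection = {
    "W1": {"_id": "W1", "Ename": "John Smith", "Hours": 32.5},
    "W2": {"_id": "W2", "Ename": "Joyce English", "Hours": 20.0},
}


def total_hours_embedded(project):
    # One document read is enough: the workers travel with the project.
    return sum(w["Hours"] for w in project["Workers"])


def total_hours_by_reference(project, workers):
    # Each reference needs an extra lookup in the worker collection.
    return sum(workers[wid]["Hours"] for wid in project["WorkerIds"])


print(total_hours_embedded(project_embedded))
print(total_hours_by_reference(project_refs, worker_collection))
```

Embedding favors reads that need the project and its workers together; references avoid duplicating worker data when the same worker appears in several places.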
MongoDB CRUD Operations
MongoDB has several CRUD (create, read, update, delete) operations
Documents can be created and inserted into their
collections using the insert operation, whose format is:
db.<collection_name>.insert(<document(s)>)
The parameters of the insert operation can include either a
single document or an array of documents
Example:
db.project.insert( { _id: "P1", Pname: "ProductX", Plocation: "Bellaire" } )
db.worker.insert( [ { _id: "W1", Ename: "John Smith", ProjectId: "P1", Hours: 32.5 }, { _id: "W2", Ename: "Joyce English", ProjectId: "P1", Hours: 20.0 } ] )
MongoDB CRUD Operations
The delete operation is called remove, and the format
is:
db.<collection_name>.remove(<condition>)

There is also an update operation, which has a condition to select certain documents, and a $set clause to specify the update.
For read queries, the main command is called find,
and the format is:
db.<collection_name>.find(<condition>)
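The semantics of the four operations can be sketched with a toy in-memory collection. This is not the MongoDB API or a driver: Collection is a hypothetical class, conditions are limited to equality matches on fields, and only the $set clause of update is modelled.

```python
class Collection:
    """Toy in-memory collection; conditions are equality matches on fields."""

    def __init__(self):
        self.docs = []

    def insert(self, docs):
        # Accept a single document or an array (list) of documents.
        for doc in (docs if isinstance(docs, list) else [docs]):
            self.docs.append(dict(doc))

    def _matches(self, doc, condition):
        return all(doc.get(field) == value for field, value in condition.items())

    def find(self, condition=None):
        condition = condition or {}
        return [d for d in self.docs if self._matches(d, condition)]

    def update(self, condition, change):
        # Apply the $set clause to every document selected by the condition.
        for doc in self.find(condition):
            doc.update(change["$set"])

    def remove(self, condition):
        self.docs = [d for d in self.docs if not self._matches(d, condition)]


worker = Collection()
worker.insert([
    {"_id": "W1", "Ename": "John Smith", "ProjectId": "P1", "Hours": 32.5},
    {"_id": "W2", "Ename": "Joyce English", "ProjectId": "P1", "Hours": 20.0},
])
worker.update({"_id": "W2"}, {"$set": {"Hours": 25.0}})
on_p1 = worker.find({"ProjectId": "P1"})
worker.remove({"_id": "W1"})
print(len(on_p1), worker.find())
```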
MongoDB Distributed Systems Characteristics
Because MongoDB is a distributed system, the two-phase commit method is used to ensure atomicity and consistency of multi-document transactions

Replication in MongoDB
The concept of replica set is used in MongoDB to create multiple copies of the same data set on different nodes in the distributed system
It uses a variation of the master-slave approach: all write operations must be applied to the primary copy and then propagated to the secondaries
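The write path of a replica set can be modelled in a few lines. The sketch below is a simplification under stated assumptions: ReplicaSet is a hypothetical class, each copy is a dictionary, and propagation is a single explicit step rather than the continuous background process a real system would run.

```python
class ReplicaSet:
    """Toy replica set: one primary copy and several secondary copies."""

    def __init__(self, secondary_count):
        self.primary = {}
        self.secondaries = [{} for _ in range(secondary_count)]
        self.log = []  # ordered record of writes applied at the primary

    def write(self, key, value):
        # Every write is applied to the primary copy first.
        self.primary[key] = value
        self.log.append((key, value))

    def propagate(self):
        # Replay the logged writes, in order, on each secondary.
        for key, value in self.log:
            for copy in self.secondaries:
                copy[key] = value
        self.log.clear()


rs = ReplicaSet(secondary_count=2)
rs.write("P1", "ProductX")
before = [dict(copy) for copy in rs.secondaries]
rs.propagate()
print(before, rs.secondaries)
```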
MongoDB Distributed Systems Characteristics
Sharding in MongoDB
• Sharding of the documents in the collection (also known as horizontal partitioning) divides the documents into disjoint partitions known as shards.
• This allows the system to add more nodes as needed by a process known as horizontal scaling of the distributed system
• The system stores the shards of the collection on different nodes to achieve load balancing
• Each node will process only those operations pertaining to the documents in the shard stored at that node
MongoDB Distributed Systems Characteristics
There are two ways to partition a collection into shards: range partitioning and hash partitioning
Both require that the user specify a particular document field to be used as the basis for partitioning the documents into shards.
The partitioning field, known as the shard key, must have two characteristics:
  - it must exist in every document in the collection
  - it must have an index
Range partitioning creates the chunks by specifying a range of key values
Hash partitioning applies a hash function h(K) to each shard key K, and the partitioning of keys into chunks is based on the hash values
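The two partitioning methods can be contrasted with a pair of small functions. These are sketches, not MongoDB's chunk assignment: the function names are hypothetical, the chunk boundaries are made up, and MD5 is used only as a convenient stable hash function for the example.

```python
import bisect
import hashlib


def range_chunk(shard_key, boundaries):
    """Range partitioning: boundaries are sorted split points.

    Keys below boundaries[0] fall in chunk 0, keys from boundaries[0] up to
    (but not including) boundaries[1] fall in chunk 1, and so on.
    """
    return bisect.bisect_right(boundaries, shard_key)


def hash_chunk(shard_key, chunk_count):
    """Hash partitioning: apply h(K), then map the hash value to a chunk."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % chunk_count


boundaries = ["H", "Q"]  # three chunks: below "H", "H" up to "Q", "Q" and above
for location in ["Bellaire", "Houston", "Stafford"]:
    print(location, range_chunk(location, boundaries), hash_chunk(location, 3))
```

Range partitioning keeps neighboring key values in the same chunk, which suits range queries; hash partitioning scatters neighboring keys, which tends to spread load more evenly.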
MongoDB Distributed Systems Characteristics
• When sharding is used, MongoDB queries are submitted to a module called the query router, which keeps track of which nodes contain which shards based on the particular partitioning method used on the shard keys
• The query (CRUD operation) will be routed to the nodes that contain the shards that hold the documents that the query is requesting
• Sharding and replication are used together; sharding focuses on improving performance via load balancing and horizontal scalability, whereas replication focuses on ensuring system availability when certain nodes fail in the distributed system
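A query router can be sketched as a lookup from shard key to node. The example below assumes range partitioning and models each node as a plain list of documents; QueryRouter and its methods are hypothetical names, not part of MongoDB.

```python
import bisect


class QueryRouter:
    """Toy router for a range-partitioned collection."""

    def __init__(self, boundaries, nodes):
        # nodes[i] stores the shard for range i, so there must be
        # exactly one more node than there are split points.
        if len(nodes) != len(boundaries) + 1:
            raise ValueError("need one node per key range")
        self.boundaries = boundaries
        self.nodes = nodes

    def node_for(self, shard_key):
        return self.nodes[bisect.bisect_right(self.boundaries, shard_key)]

    def insert(self, doc, key_field):
        self.node_for(doc[key_field]).append(doc)

    def find(self, key_field, value):
        # Only the node holding the matching shard is consulted.
        return [d for d in self.node_for(value) if d[key_field] == value]


node_a, node_b, node_c = [], [], []
router = QueryRouter(["H", "Q"], [node_a, node_b, node_c])
router.insert({"_id": "P1", "Plocation": "Bellaire"}, "Plocation")
router.insert({"_id": "P2", "Plocation": "Stafford"}, "Plocation")
print(node_a, node_b, node_c)
print(router.find("Plocation", "Stafford"))
```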
NoSQL Key-Value stores
The key is a unique identifier associated with a data item and is used to locate this data item rapidly
The value is the data item itself, and it can have very different formats for different key-value storage systems
The main characteristic of key-value stores is the fact that every value (data item) must be associated with a unique key, and that retrieving the value by supplying the key must be very fast
NoSQL Key-Value stores
DynamoDB
DynamoDB is an Amazon product and is available as part of Amazon’s AWS/SDK platforms
The basic data model in DynamoDB uses the concepts of tables, items, and attributes
A table in DynamoDB does not have a schema; it holds a collection of self-describing items
Each item will consist of a number of (attribute, value) pairs, and attribute values can be single-valued or multivalued
DynamoDB also allows the user to specify the items in JSON format, and the system will convert them to the internal storage format of DynamoDB
NoSQL Key-Value stores
When a table is created, it is required to specify a table name and a primary key
The primary key will be used to rapidly locate the items in the table
Thus, the primary key is the key and the item is the value for the DynamoDB key-value store
The primary key attribute must exist in every item in the table
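The table, item, and primary key rules above can be modelled with a small class. This is a sketch of the data model only and does not use the AWS SDK: Table, put_item, and get_item are hypothetical names, and the attribute names in the usage lines are made up for illustration.

```python
class Table:
    """Toy schema-less table: primary key value -> self-describing item."""

    def __init__(self, name, primary_key):
        # A table name and a primary key are required at creation time.
        self.name = name
        self.primary_key = primary_key
        self.items = {}

    def put_item(self, item):
        # The primary key attribute must exist in every item.
        if self.primary_key not in item:
            raise ValueError(f"item is missing primary key attribute {self.primary_key!r}")
        self.items[item[self.primary_key]] = dict(item)

    def get_item(self, key_value):
        # The primary key is the key; the whole item is the value.
        return self.items.get(key_value)


project = Table("PROJECT", primary_key="Pno")
# Items need not share attributes, and a value may be multivalued (a list).
project.put_item({"Pno": "P1", "Pname": "ProductX", "Locations": ["Bellaire", "Houston"]})
project.put_item({"Pno": "P2", "Budget": 50000})
print(project.get_item("P1"))
```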
Column-Based or Wide Column NoSQL Systems
BigTable is Google's distributed storage system for big data
An open source system known as Apache HBase is similar to Google BigTable, but it typically uses HDFS (Hadoop Distributed File System) for data storage
HDFS is used in many cloud computing applications
Another well-known example of column-based NoSQL systems is Cassandra
Column-Based or Wide Column NoSQL Systems
BigTable (and HBase) is sometimes described as a sparse multidimensional distributed persistent sorted map, where the word ‘map’ means a collection of (key, value) pairs
One of the main differences that distinguish column-based systems from key-value stores is the nature of the key
In HBase, the key is multidimensional and so has several components: typically, a combination of table name, row key, column, and timestamp
HBase data model
• The data model in HBase organizes data using the concepts of namespaces, tables, column families, column qualifiers, columns, rows, and data cells
• A column is identified by a combination of (column family:column qualifier)
• Data is stored in a self-describing form by associating columns with data values, where data values are strings
• HBase also stores multiple versions of a data item, with a timestamp associated with each version, so versions and timestamps are also part of the HBase data model
HBase data model
Column Families, Column Qualifiers, and Columns
• A table is associated with one or more column families
• Each column family will have a name
• Column families must be specified when the table is created and are not normally changed afterwards
• In the create command, the table name is followed by the names of the column families associated with the table.
• When the data is loaded into a table, each column family can be associated with many column qualifiers
• A column is specified by a combination of ColumnFamily:ColumnQualifier.
• The concept of column family is somewhat similar to vertical partitioning, because columns (attributes) that belong to the same column family, and so tend to be accessed together, are stored in the same files.
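The difference between column families (declared with the table) and column qualifiers (introduced as data is loaded) can be sketched as follows. This is not the HBase client API: HTable, put, and qualifiers are hypothetical names, and the EMPLOYEE table with its Name and Address families is an illustrative assumption.

```python
class HTable:
    """Toy table: column families are fixed, qualifiers appear on load."""

    def __init__(self, name, column_families):
        self.name = name
        self.column_families = frozenset(column_families)  # set at creation
        self.rows = {}  # row key -> {"Family:Qualifier": value}

    def put(self, row_key, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.column_families:
            raise KeyError(f"unknown column family {family!r}")
        # Any qualifier is accepted; it need not have been declared earlier.
        self.rows.setdefault(row_key, {})[f"{family}:{qualifier}"] = value

    def qualifiers(self, family):
        """Collect the qualifiers that currently exist under one family."""
        found = set()
        for columns in self.rows.values():
            for column in columns:
                fam, _, qual = column.partition(":")
                if fam == family:
                    found.add(qual)
        return found


employee = HTable("EMPLOYEE", ["Name", "Address"])
employee.put("row1", "Name:Fname", "John")
employee.put("row1", "Name:Lname", "Smith")
employee.put("row2", "Name:Nickname", "Joy")  # new qualifier, same family
print(sorted(employee.qualifiers("Name")))
```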
Examples in HBase
A cell holds a basic data item in HBase. The key (address) of a cell is specified by a combination of (table, rowid, columnfamily, columnqualifier, timestamp)
If timestamp is left out, the latest version of the item is retrieved
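Cell addressing with versions can be sketched as a map from the multidimensional key to a value. CellStore is a hypothetical class rather than the HBase client API, and integer timestamps are used only to keep the example short.

```python
class CellStore:
    """Toy cell store keyed by (table, rowid, family, qualifier, timestamp)."""

    def __init__(self):
        # (table, row id, column family, column qualifier) -> {timestamp: value}
        self.cells = {}

    def put(self, table, row, family, qualifier, value, timestamp):
        address = (table, row, family, qualifier)
        self.cells.setdefault(address, {})[timestamp] = value

    def get(self, table, row, family, qualifier, timestamp=None):
        versions = self.cells.get((table, row, family, qualifier), {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # no timestamp given: latest version
        return versions.get(timestamp)


store = CellStore()
store.put("EMPLOYEE", "row1", "Address", "City", "Bellaire", timestamp=1)
store.put("EMPLOYEE", "row1", "Address", "City", "Houston", timestamp=2)
print(store.get("EMPLOYEE", "row1", "Address", "City"))
print(store.get("EMPLOYEE", "row1", "Address", "City", timestamp=1))
```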

A namespace is a collection of tables.

NoSQL Graph Databases
The data is represented as a graph, which is a collection of vertices (nodes) and edges
Both nodes and edges can be labeled to indicate the types of entities and relationships they represent
It is generally possible to store data associated with both individual nodes and individual edges
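A labeled property graph and a simple traversal can be sketched in plain Python. This is not the API of Neo4j or any other graph database: Graph and its methods are hypothetical, and the SUPERVISES and WORKS_ON labels are assumptions made for the example.

```python
from collections import deque


class Graph:
    """Toy graph with labels and properties on both nodes and edges."""

    def __init__(self):
        self.nodes = {}  # node id -> {"label": ..., "properties": {...}}
        self.edges = []  # (source id, edge label, target id, properties)

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = {"label": label, "properties": properties}

    def add_edge(self, source, label, target, **properties):
        self.edges.append((source, label, target, properties))

    def reachable(self, start, edge_label):
        """Breadth-first traversal along edges carrying edge_label."""
        seen, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for source, label, target, _ in self.edges:
                if source == current and label == edge_label and target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen - {start}


g = Graph()
g.add_node("e1", "EMPLOYEE", name="John Smith")
g.add_node("e2", "EMPLOYEE", name="Joyce English")
g.add_node("e3", "EMPLOYEE", name="A. Worker")
g.add_node("p1", "PROJECT", name="ProductX")
g.add_edge("e1", "SUPERVISES", "e2")
g.add_edge("e2", "SUPERVISES", "e3")
g.add_edge("e1", "WORKS_ON", "p1", hours=32.5)  # data stored on the edge
print(sorted(g.reachable("e1", "SUPERVISES")))
```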
Any Queries?
