Unit V NoSQL Databases
DATABASE MANAGEMENT SYSTEM
TE 2019 Course
Prof.K. B. Sadafale
Assistant Professor
Syllabus
Unit I: Introduction to Database Management Systems and ER Model
Distributed DBMS Environment
[Figure: Sites 1 to 5 connected to one another through a communication network]
12.3 Introduction
Increased complexity of concurrency control:
Concurrent updates to distinct replicas may lead to
inconsistent data unless special concurrency control
mechanisms are implemented.
One solution: choose one copy as the primary copy and
apply concurrency control operations on the primary copy.
Data Fragmentation
If relation r is fragmented, r is divided into a number of
fragments r1, r2, ..., rn.
These fragments contain sufficient information to allow
reconstruction of the original relation r.
Data can be distributed by storing individual tables at
different sites
Data can also be distributed by decomposing a table and
storing portions at different sites – called Fragmentation
There are two different schemes for fragmenting a relation:
Horizontal Fragmentation
Vertical Fragmentation
Horizontal Fragmentation
Horizontal fragmentation splits the relation by assigning each
tuple of r to one or more fragments.
Vertical fragmentation splits the relation by decomposing the
scheme R of relation r.
Consider the following schema:
Account_schema = (account_number, branch_name, balance)
In horizontal fragmentation, a relation r is partitioned into a
number of subsets r1, r2, ..., rn.
Each tuple of relation r must belong to at least one of the
fragments, so that the original relation can be
reconstructed if needed.
The account relation can be divided into several different
fragments, each of which consists of the tuples of accounts
belonging to a particular branch.
Suppose the banking system has only two branches, Hillside
and Valleyview.
Then there are two different fragments, each defined by a selection (σ) on branch_name:
Account1 = σ branch_name = "Hillside" (account)
Account2 = σ branch_name = "Valleyview" (account)
Horizontal Fragmentation Example
r = r1 ∪ r2 ∪ r3 ∪ ... ∪ rn
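The branch-based split described above can be sketched in plain JavaScript. This is only an illustration: the sample rows and the helper names `select` and `union` are assumptions, not part of the original slides. Each fragment is produced by a selection on branch_name, and the original relation is recovered by taking the union of the fragments.

```javascript
// Hypothetical sample rows for the account relation (values are illustrative).
const account = [
  { account_number: "A-305", branch_name: "Hillside", balance: 500 },
  { account_number: "A-226", branch_name: "Hillside", balance: 336 },
  { account_number: "A-177", branch_name: "Valleyview", balance: 205 },
  { account_number: "A-402", branch_name: "Valleyview", balance: 10000 },
];

// Selection: keep only the tuples that satisfy the predicate.
const select = (relation, predicate) => relation.filter(predicate);

// One horizontal fragment per branch.
const account1 = select(account, (t) => t.branch_name === "Hillside");
const account2 = select(account, (t) => t.branch_name === "Valleyview");

// Reconstruction: union of the fragments, deduplicated on the key.
function union(...fragments) {
  const byKey = new Map();
  for (const fragment of fragments) {
    for (const tuple of fragment) byKey.set(tuple.account_number, tuple);
  }
  return [...byKey.values()];
}

const reconstructed = union(account1, account2);
console.log(account1.length, account2.length, reconstructed.length);
```

Deduplicating on account_number matters because a tuple is allowed to appear in more than one fragment.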
Vertical Fragmentation Example
Fragmentation transparency
Replication transparency
Location transparency
Fragmentation transparency:
Users are not required to know how a relation has
been fragmented.
Replication transparency
Users view each data object as logically unique.
The distributed system may replicate an object to
increase either system performance or data
availability.
Location transparency
Users are not required to know the physical location of
the data.
Types of Data
Big Data involves huge volume, high velocity, and an extensible variety of
data. There are three types: structured data, semi-structured data, and
unstructured data.
Structured data –
Structured data is data whose elements are addressable for effective
analysis.
It has been organized into a formatted repository that is typically a
database.
It covers all data that can be stored in an SQL database in tables with
rows and columns.
Such data has relational keys and can easily be mapped into pre-designed
fields.
Today, structured data is the most widely processed kind of data and the
simplest to manage. Example: relational data
Semi-Structured data –
Semi-structured data is information that does not reside in a relational
database but has some organizational properties that make it easier
to analyze. With some processing, it can be stored in a relational
database (this can be very hard for some kinds of semi-structured data),
but the semi-structured form exists to save space. Example: XML data
Unstructured data –
Unstructured data is data that is not organized in a predefined manner
or does not have a predefined data model, so it is not a good fit for a
mainstream relational database. Alternative platforms exist for storing
and managing unstructured data. It is increasingly prevalent
in IT systems and is used by organizations in a variety of business
intelligence and analytics applications. Example: Word, PDF, text, media
logs.
Properties | Structured data | Semi-structured data | Unstructured data
Technology | It is based on relational database tables | It is based on XML/RDF (Resource Description Framework) | It is based on character and binary data
Query performance | Structured queries allow complex joins | Queries over anonymous nodes are possible | Only textual queries are possible
NoSQL
A JOIN is a means for combining fields from two tables by using values
common to each.
As a special case, a table (base table, view, or joined table) can JOIN to
itself in a self-join.
Atomicity
Consistency
Isolation
Durability
Atomicity means that either all of a transaction happens or none of it
does.
Consistency means that every transaction leaves the data in a valid state
that satisfies all defined rules and constraints.
Isolation means that one transaction cannot read data from another
transaction that is not yet completed.
Durability means that once a transaction has committed, its changes
survive later failures.
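The all-or-nothing behaviour of atomicity can be sketched as follows. This is a minimal in-memory illustration, not how a real DBMS implements transactions; the names `runTransaction`, `debit` and `credit` and the sample balances are assumptions made for this example.

```javascript
// Hypothetical in-memory "database" of account balances.
const balances = { alice: 100, bob: 50 };

// Run every step on a working copy; commit only if all steps succeed.
function runTransaction(state, steps) {
  const working = { ...state };
  try {
    for (const step of steps) step(working);
  } catch (err) {
    // A step failed: discard the working copy, so nothing is applied.
    return { committed: false, state };
  }
  return { committed: true, state: working };
}

const debit = (name, amount) => (s) => {
  if (s[name] < amount) throw new Error("insufficient funds");
  s[name] -= amount;
};
const credit = (name, amount) => (s) => {
  s[name] += amount;
};

// Both steps succeed, so both are applied.
const ok = runTransaction(balances, [debit("alice", 30), credit("bob", 30)]);

// The credit runs first, but the debit fails, so neither step is applied.
const failed = runTransaction(balances, [credit("bob", 500), debit("alice", 500)]);

console.log(ok.committed, ok.state);
console.log(failed.committed, failed.state);
```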
This type of data store may not require a fixed schema, avoids join
operations, and typically scales horizontally.
You need to understand the CAP theorem when you talk about
NoSQL databases, or, in fact, when designing any distributed
system.
The CAP theorem states that there are three basic requirements
which exist in a special relation when designing applications for a
distributed architecture.
Consistency - This means that the data in the database remains
consistent after the execution of an operation.
For example, after an update operation, all clients see the same data.
Key-Value store: we start with this type of database because it is among
the least complex NoSQL options.
These databases are designed for storing data in a schema-less way.
In a key-value store, all of the data consists of an indexed key and a
value, hence the name.
Examples of this type of database include: Cassandra, DynamoDB, Azure Table
Storage (ATS), Riak, BerkeleyDB.
Data model: nodes and edges
Nodes may have properties (including ID)
Edges may have labels or roles
Graph Database
In general, graph databases are useful when you are more interested in
relationships between data than in the data itself:
for example, in representing and traversing social networks, generating
recommendations, or conducting forensic investigations (e.g. pattern
detection).
Types of NoSQL Databases
A database is a collection of structured data or information
which is stored in a computer system and can be accessed
easily.
A database is usually managed by a Database Management
System (DBMS).
A NoSQL database is a non-relational database that stores
data in a non-tabular form.
NoSQL stands for Not only SQL.
The main types are document, key-value, wide-column, and
graph.
Types of NoSQL Database:
Document-based databases
Key-value stores
Column-oriented databases
Graph-based databases
Document-Based Database
The document-based database is a non-relational database. Instead of storing data in rows and
columns (tables), it stores data in documents. A document database
stores data in JSON, BSON, or XML documents.
Documents can be stored and retrieved in a form that is much closer to the data objects used in
applications, which means less translation is required to use the data in applications. In a
document database, particular elements can be accessed through an index value assigned to them,
which makes querying faster.
Collections are groups of documents with similar contents. Documents in a collection are not
required to share the same schema, because document databases have
a flexible schema.
Key features of documents database:
Flexible schema: Documents in the database have a flexible schema, which means they
need not all share the same schema.
Faster creation and maintenance: Creating documents is easy, and minimal maintenance
is required once a document is created.
No foreign keys: There is no dynamic relationship between two documents, so documents can be
independent of one another and no foreign key is required in a document database.
Open formats: Documents are built using XML, JSON, and other open formats.
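The flexible-schema idea can be illustrated with a small sketch in plain JavaScript, where a collection is simply an array of objects. The collection contents and the `hobbies` field are hypothetical; the point is only that documents in one collection may carry different fields and still serialize to JSON.

```javascript
// A hypothetical collection: each document has its own set of fields.
const students = [
  { _id: 1, name: "abc", pin: 1213331 },
  { _id: 2, name: "xyz", pin: 6787, addrss: "pune" },
  { _id: 3, name: "umesh", pin: 5654787, addrss: "Nagpure", hobbies: ["chess"] },
];

// Documents that happen to carry an "addrss" field.
const withAddress = students.filter((doc) => "addrss" in doc);

// The set of field names actually used across the collection.
const fieldNames = [...new Set(students.flatMap((doc) => Object.keys(doc)))];

// Documents map directly to JSON, an open format.
const asJson = JSON.stringify(students[2]);

console.log(withAddress.length, fieldNames, asJson);
```

No schema change was needed to add `hobbies` to a single document, which is the contrast with a relational table.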
Key-Value Stores
A key-value store is a nonrelational database. The simplest form
of a NoSQL database is a key-value store. Every data element in
the database is stored in key-value pairs. The data can be
retrieved by using a unique key allotted to each element in the
database. The values can be simple data types like strings and
numbers or complex objects.
A key-value store is like a relational database with only two
columns: the key and the value.
Key features of the key-value store:
Simplicity.
Scalability.
Speed.
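A key-value store can be approximated with a JavaScript `Map`, as a rough sketch of the access pattern only (no persistence or distribution). The key naming scheme such as `user:1:name` is an assumption made for this example.

```javascript
// An in-memory stand-in for a key-value store.
const store = new Map();

// Values can be simple types or complex objects.
store.set("user:1:name", "abc");
store.set("user:1:profile", { pin: 1213331, city: "pune" });
store.set("counter:visits", 42);

// Every lookup goes through the unique key.
const name = store.get("user:1:name");
const profile = store.get("user:1:profile");

// Deleting is also done by key.
store.delete("counter:visits");
const hasCounter = store.has("counter:visits");

console.log(name, profile.city, hasCounter);
```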
Column Oriented Databases
A column-oriented database is a non-relational database that
stores data in columns instead of rows. This means that when
you want to run analytics on a small number of columns, you
can read those columns directly without consuming memory
with unwanted data.
Columnar databases are designed to read data more efficiently
and retrieve it with greater speed. A columnar database
is used to store large amounts of data. Key features of
column-oriented databases:
Scalability.
Compression.
Very responsive.
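The difference between row and column layout can be sketched as below. The sample rows and the helper `toColumns` are hypothetical; a real columnar engine adds compression and on-disk layout, which this sketch does not attempt.

```javascript
// Row-oriented layout: one object per record.
const rows = [
  { id: 1, name: "abc", likes: 10 },
  { id: 2, name: "pqr", likes: 25 },
  { id: 3, name: "xyz", likes: 7 },
];

// Column-oriented layout: one array per column.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      if (!(field in columns)) columns[field] = [];
      columns[field].push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// An analytic query over one column touches only that column's array.
const totalLikes = columns.likes.reduce((sum, value) => sum + value, 0);

console.log(Object.keys(columns), totalLikes);
```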
Graph-Based databases
Graph-based databases focus on the relationships between
elements. They store data in the form of nodes in the
database. The connections between the nodes are called links
or relationships.
Key features of graph databases:
In a graph-based database, it is easy to identify the
relationships between data by using the links.
Queries return results in real time.
The speed depends on the number of relationships among
the database elements.
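Traversing relationships, the core operation of a graph database, can be sketched with an adjacency list and a breadth-first search. The `follows` graph, its names, and the function `reachableWithin` are all hypothetical and serve only to illustrate following links from node to node.

```javascript
// Hypothetical social graph: each node lists the nodes it links to.
const follows = {
  asha: ["ravi", "meera"],
  ravi: ["meera", "john"],
  meera: ["john"],
  john: [],
};

// Breadth-first traversal: nodes reachable from start within maxHops links.
function reachableWithin(graph, start, maxHops) {
  const distance = new Map([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    const hops = distance.get(node);
    if (hops === maxHops) continue; // do not expand beyond the hop limit
    for (const next of graph[node] || []) {
      if (!distance.has(next)) {
        distance.set(next, hops + 1);
        queue.push(next);
      }
    }
  }
  distance.delete(start); // report only the other nodes
  return distance;
}

const direct = reachableWithin(follows, "asha", 1);
const twoHops = reachableWithin(follows, "asha", 2);
console.log([...direct.keys()], [...twoHops.entries()]);
```

A query such as "friends of friends" is the two-hop case, which is the kind of recommendation workload mentioned for graph databases.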
BASE: Basically Available, Soft state, Eventual
consistency
Basically available means the database is available all
the time, in the sense of the CAP theorem
Soft state means the system state may change
even without input
Eventual consistency means that the system
will become consistent over time
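Soft state and eventual consistency can be illustrated with a toy model of replicas that catch up later. The `Cluster` class below is an assumption made for this sketch and does not describe any particular NoSQL product: a write lands on one replica immediately, and the others receive it only when `propagate()` runs.

```javascript
// Hypothetical cluster: replica 0 accepts writes, the others catch up later.
class Cluster {
  constructor(replicaCount) {
    this.replicas = Array.from({ length: replicaCount }, () => new Map());
    this.pending = []; // updates not yet applied to the other replicas
  }

  write(key, value) {
    this.replicas[0].set(key, value);
    for (let i = 1; i < this.replicas.length; i++) {
      this.pending.push({ replica: i, key, value });
    }
  }

  read(replica, key) {
    return this.replicas[replica].get(key);
  }

  // Background propagation: replica state changes without any new client input.
  propagate() {
    for (const { replica, key, value } of this.pending) {
      this.replicas[replica].set(key, value);
    }
    this.pending = [];
  }
}

const cluster = new Cluster(3);
cluster.write("balance", 100);
const staleRead = cluster.read(2, "balance"); // replica 2 has not caught up yet
cluster.propagate();
const freshRead = cluster.read(2, "balance"); // now consistent with replica 0

console.log(staleRead, freshRead);
```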
Difference between ACID and BASE
Criteria | ACID | BASE
Simplicity | Simple | Complex
RDBMS
- Structured and organized data
- Structured query language (SQL)
- Data and its relationships are stored in separate tables.
- Data Manipulation Language, Data Definition Language
- Tight Consistency
- ACID Transaction
NoSQL
- Stands for Not Only SQL
- No declarative query language
- No predefined schema
- Variants - Key-Value Pair Store, Column Store, Document Store, Graph
Store
- Eventual consistency rather than ACID properties
- Unstructured and unpredictable data
- CAP Theorem
- Prioritizes high performance, high availability
and scalability
key features of NoSQL:
5. Flexible schema
Different Types of NoSQL Systems
• Column-based Systems
– Google’s BigTable
– Facebook’s Cassandra
RDBMS | MongoDB
Database | Database
Table | Collection
Row | Document
Column | Field
Index | Index
Table Join | Embedded documents & Linking
Primary key | Primary Key
Specify any unique column or column combination as primary key. | In MongoDB, the primary key is automatically set to the _id field.
use database_name
To check your list of databases, use the command
show dbs
In MongoDB, you don't need to create a collection explicitly. MongoDB
creates the collection automatically when you insert a
document
>db.tutorialspoint.insert({"name" : "tutorialspoint"})
>show collections
mycol
mycollection
system.indexes
tutorialspoint
MongoDB Drop Database
db.dropDatabase()
>db.mycol.find({key1:value1, key2:value2})
OR in MongoDB
To query documents based on the OR condition,
you need to use the $or operator.
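In the mongo shell the OR form is written as db.mycol.find({$or: [{key1: value1}, {key2: value2}]}). The matching logic behind the AND and OR forms can be sketched in plain JavaScript as below. The `matches` and `find` helpers and the sample documents are assumptions for illustration and cover only equality conditions and $or, not the full MongoDB query language.

```javascript
// Hypothetical documents standing in for the mycol collection.
const mycol = [
  { title: "MongoDB Overview", by_user: "tutorials point", likes: 100 },
  { title: "NoSQL Overview", by_user: "tutorials point", likes: 10 },
  { title: "Neo4j Overview", by_user: "Neo4j", likes: 750 },
];

// A document matches when every top-level condition holds (AND).
// A "$or" condition holds when at least one of its sub-queries matches.
function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) => {
    if (key === "$or") return condition.some((sub) => matches(doc, sub));
    return doc[key] === condition;
  });
}

const find = (collection, query) => collection.filter((doc) => matches(doc, query));

// AND: both fields must match.
const andResult = find(mycol, { by_user: "tutorials point", likes: 100 });

// OR: either sub-query may match.
const orResult = find(mycol, { $or: [{ by_user: "Neo4j" }, { likes: 10 }] });

console.log(andResult.map((d) => d.title), orResult.map((d) => d.title));
```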
>db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)
Example:
>db.mycol.update({'title':'MongoDB Overview'},{$set:
{'title':'New MongoDB Tutorial'}},{multi:true})
MongoDB-Delete Document
>db.COLLECTION_NAME.remove(DELETION_CRITERIA)
If there are multiple matching records and you want to delete only the
first one, set the justOne parameter in the remove() method
>db.COLLECTION_NAME.remove(DELETION_CRITERIA,1)
>db.COLLECTION_NAME.find().sort({KEY:1})
MongoDB Indexing
Indexes are special data structures that store a small portion of the
data set in an easy-to-traverse form.
The index stores the value of a specific field or set of fields, ordered by
the value of the field as specified in the index.
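In the mongo shell an ascending single-field index is created with db.COLLECTION_NAME.createIndex({KEY: 1}). What such an index buys can be sketched in plain JavaScript: a small, sorted list of field values with pointers back to the documents, searched by binary search instead of a full scan. The helpers `buildIndex` and `lookup` are hypothetical, and real MongoDB indexes are B-tree structures rather than sorted arrays.

```javascript
// Hypothetical collection, stored in insertion order.
const students = [
  { name: "pqr", pin: 78776787 },
  { name: "abc", pin: 1213331 },
  { name: "umesh", pin: 5654787 },
];

// The index holds only the indexed field and the document position,
// ordered by the field value.
function buildIndex(collection, field) {
  return collection
    .map((doc, position) => ({ value: doc[field], position }))
    .sort((a, b) => (a.value < b.value ? -1 : a.value > b.value ? 1 : 0));
}

// Binary search over the ordered index; returns the document position or -1.
function lookup(index, value) {
  let low = 0;
  let high = index.length - 1;
  while (low <= high) {
    const mid = (low + high) >> 1;
    if (index[mid].value === value) return index[mid].position;
    if (index[mid].value < value) low = mid + 1;
    else high = mid - 1;
  }
  return -1;
}

const nameIndex = buildIndex(students, "name");
const position = lookup(nameIndex, "umesh");
console.log(nameIndex.map((e) => e.value), students[position]);
```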
>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)
MongoDB Aggregate Functions
$sum:
Sums up the defined value from all documents in the collection.
db.mycol.aggregate([{$group : {_id: "$by_user", num_tutorial : {$sum:"$likes"}}}])
$avg:
Calculates the average of all given values from all documents in the
collection.
db.mycol.aggregate([{$group : {_id: "$by_user", num_tutorial : {$avg:"$likes"}}}])
$min:
Gets the minimum of the corresponding values from all documents in the
collection.
db.mycol.aggregate([{$group : {_id: "$by_user", num_tutorial : {$min:"$likes"}}}])
$max:
Gets the maximum of the corresponding values from all documents in
the collection.
db.mycol.aggregate([{$group : {_id: "$by_user", num_tutorial : {$max:"$likes"}}}])
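What the $group stage computes for these four accumulators can be sketched in plain JavaScript. The sample documents and the helper `groupLikes` are assumptions for illustration; the sketch groups on by_user and computes the sum, average, minimum and maximum of likes in one pass, whereas each shell command above computes one of them.

```javascript
// Hypothetical documents standing in for the mycol collection.
const mycol = [
  { title: "MongoDB Overview", by_user: "tutorials point", likes: 100 },
  { title: "NoSQL Overview", by_user: "tutorials point", likes: 10 },
  { title: "Neo4j Overview", by_user: "Neo4j", likes: 750 },
];

// Group documents by keyField, then apply each accumulator to valueField.
function groupLikes(collection, keyField, valueField) {
  const groups = new Map();
  for (const doc of collection) {
    const key = doc[keyField];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc[valueField]);
  }
  return [...groups].map(([key, values]) => {
    const sum = values.reduce((a, b) => a + b, 0);
    return {
      _id: key,
      sum,
      avg: sum / values.length,
      min: Math.min(...values),
      max: Math.max(...values),
    };
  });
}

const result = groupLikes(mycol, "by_user", "likes");
console.log(result);
```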
MongoDB Sharding
> db.c.find().limit(3)
If there are fewer than three documents matching your query in the collection,
only the number of matching documents will be returned; limit sets an upper
limit, not a lower limit.
Skip: skip works similarly to limit:
> db.c.find().skip(3)
This will skip the first three matching documents and return the rest of the
matches. If there are fewer than three documents in your collection, it will not
return any documents.
Sort
To sort documents in MongoDB, use the sort() method.
Specify 1 for ascending order and -1 for descending order.
>db.COLLECTION_NAME.find().sort({KEY:1})
> db.student.find();
{ "_id" : ObjectId("59798f8ff04853461a205087"), "name" : "abc", "pin" : 1213331 }
{ "_id" : ObjectId("5982e590f6bd1b6d87588a1a"), "name" : "pqr", "pin" : 78776787 }
{ "_id" : ObjectId("5982e5bdf6bd1b6d87588a1b"), "name" : "xyz", "pin" : 6787, "addrss" : "pune" }
{ "_id" : ObjectId("5982e5c6f6bd1b6d87588a1c"), "name" : "xyz", "pin" : 6787, "addrss" : "mumbai" }
{ "_id" : ObjectId("5982e5dcf6bd1b6d87588a1d"), "name" : "umesh", "pin" : 5654787, "addrss" : "Nagpure" }
Limits
> db.student.find().limit(3);
{ "_id" : ObjectId("59798f8ff04853461a205087"), "name" : "abc", "pin" : 1213331 }
{ "_id" : ObjectId("5982e590f6bd1b6d87588a1a"), "name" : "pqr", "pin" : 78776787 }
{ "_id" : ObjectId("5982e5bdf6bd1b6d87588a1b"), "name" : "xyz", "pin" : 6787, "addrss" : "pune" }
Skips
> db.student.find().skip(3);
{ "_id" : ObjectId("5982e5c6f6bd1b6d87588a1c"), "name" : "xyz", "pin" : 6787, "addrss" : "mumbai" }
{ "_id" : ObjectId("5982e5dcf6bd1b6d87588a1d"), "name" : "umesh", "pin" : 5654787, "addrss" : "Nagpure" }
Sort
> db.student.find().sort({name:1});
{ "_id" : ObjectId("59798f8ff04853461a205087"), "name" : "abc", "pin" : 1213331 }
{ "_id" : ObjectId("5982e590f6bd1b6d87588a1a"), "name" : "pqr", "pin" : 78776787 }
{ "_id" : ObjectId("5982e5dcf6bd1b6d87588a1d"), "name" : "umesh", "pin" : 5654787, "addrss" : "Nagpure" }
{ "_id" : ObjectId("5982e5bdf6bd1b6d87588a1b"), "name" : "xyz", "pin" : 6787, "addrss" : "pune" }
{ "_id" : ObjectId("5982e5c6f6bd1b6d87588a1c"), "name" : "xyz", "pin" : 6787, "addrss" : "mumbai" }
> db.student.find().sort({name:-1});
{ "_id" : ObjectId("5982e5bdf6bd1b6d87588a1b"), "name" : "xyz", "pin" : 6787, "addrss" : "pune" }
{ "_id" : ObjectId("5982e5c6f6bd1b6d87588a1c"), "name" : "xyz", "pin" : 6787, "addrss" : "mumbai" }
{ "_id" : ObjectId("5982e5dcf6bd1b6d87588a1d"), "name" : "umesh", "pin" : 5654787, "addrss" : "Nagpure" }
{ "_id" : ObjectId("5982e590f6bd1b6d87588a1a"), "name" : "pqr", "pin" : 78776787 }
{ "_id" : ObjectId("59798f8ff04853461a205087"), "name" : "abc", "pin" : 1213331 }
> db.student.find().sort({name:-1,addrss:1});
{ "_id" : ObjectId("5982e5c6f6bd1b6d87588a1c"), "name" : "xyz", "pin" : 6787, "addrss" : "mumbai" }
{ "_id" : ObjectId("5982e5bdf6bd1b6d87588a1b"), "name" : "xyz", "pin" : 6787, "addrss" : "pune" }
{ "_id" : ObjectId("5982e5dcf6bd1b6d87588a1d"), "name" : "umesh", "pin" : 5654787, "addrss" : "Nagpure" }
{ "_id" : ObjectId("5982e590f6bd1b6d87588a1a"), "name" : "pqr", "pin" : 78776787 }
{ "_id" : ObjectId("59798f8ff04853461a205087"), "name" : "abc", "pin" : 1213331 }
Combination of limits-Skips-Sort
> db.student.find().limit(3).sort({"name":1});
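When these cursor methods are chained, MongoDB applies the sort first, then the skip, then the limit, regardless of the order in which they are written. The sketch below mimics that order on a plain array; the `query` helper is hypothetical, and the documents are the student records shown earlier with the _id field omitted.

```javascript
// The student documents from the examples above (without _id).
const student = [
  { name: "abc", pin: 1213331 },
  { name: "pqr", pin: 78776787 },
  { name: "xyz", pin: 6787, addrss: "pune" },
  { name: "xyz", pin: 6787, addrss: "mumbai" },
  { name: "umesh", pin: 5654787, addrss: "Nagpure" },
];

// Apply sort, then skip, then limit, in that fixed order.
function query(collection, { sort, skip = 0, limit = Infinity } = {}) {
  const result = [...collection];
  if (sort) {
    const fields = Object.entries(sort); // e.g. [["name", 1]]
    result.sort((a, b) => {
      for (const [field, direction] of fields) {
        if (a[field] < b[field]) return -direction;
        if (a[field] > b[field]) return direction;
      }
      return 0;
    });
  }
  return result.slice(skip, skip + limit);
}

// Counterpart of db.student.find().limit(3).sort({"name":1})
const firstThreeByName = query(student, { sort: { name: 1 }, limit: 3 });

// Counterpart of db.student.find().skip(3)
const afterSkip = query(student, { skip: 3 });

console.log(firstThreeByName.map((d) => d.name), afterSkip.map((d) => d.name));
```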
Example:
Problem: A king wants to count the total population of his country.
Solution 1: The king sends one person to count the population of every
city, one city after another.
Solution 2: The king sends one person to each city. Each person counts the
population of that city and, after they return to the kingdom, the per-city
counts are reduced to a single count (by adding the population of each city).
Assuming there are 10 cities in the kingdom and it takes one day to count the
population of each city, counting the total population takes 10 days in
solution 1 and only one day in solution 2.
In MapReduce we have to write 3 functions.
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_example" }
)
db.map_example.find();
Create a collection orders.
Insert the following documents into the orders collection:
{ "cust_id" : "abc123", "ord_date" : new Date("Oct 04,2012"), "price" : 25 }
{ "cust_id" : "xyz123", "ord_date" : new Date("Oct 05,2012"), "price" : 50 }
{ "cust_id" : "xyz456", "ord_date" : new Date("Oct 06,2012"), "price" : 60 }
{ "cust_id" : "xyz789", "ord_date" : new Date("Oct 07,2012"), "price" : 70 }
{ "cust_id" : "xyz789", "ord_date" : new Date("Oct 06,2012"), "price" : 90 }
{ "cust_id" : "xyz789", "ord_date" : new Date("Oct 06,2012"), "price" : 90 }
{ "cust_id" : "abc123", "ord_date" : new Date("Oct 06,2012"), "price" : 90 }
db.orders.mapReduce(mapFunction1, reduceFunction1, {out: "map_example"})
{
"result" : "map_example",
"timeMillis" : 75,
"counts" : {
"input" : 7,
"emit" : 7,
"reduce" : 2,
"output" : 4
},
"ok" : 1
}
Find the documents in map_example as follows:
db.map_example.find()
It gives the total price per customer, e.g.
{ "_id" : "abc123", "value" : 115 }
{ "_id" : "xyz123", "value" : 50 }
{ "_id" : "xyz456", "value" : 60 }
{ "_id" : "xyz789", "value" : 250 }
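The definitions of mapFunction1 and reduceFunction1 are not shown above, so the versions below are assumptions: a map step that emits (cust_id, price) for each order and a reduce step that adds the prices for one customer. The `mapReduce` helper is a plain JavaScript stand-in for the server-side operation, written only to show how the four output documents arise from the seven input documents.

```javascript
// The seven order documents (ord_date omitted, since it is not used here).
const orders = [
  { cust_id: "abc123", price: 25 },
  { cust_id: "xyz123", price: 50 },
  { cust_id: "xyz456", price: 60 },
  { cust_id: "xyz789", price: 70 },
  { cust_id: "xyz789", price: 90 },
  { cust_id: "xyz789", price: 90 },
  { cust_id: "abc123", price: 90 },
];

// Assumed map step: emit one (key, value) pair per order.
const mapFunction1 = (doc, emit) => emit(doc.cust_id, doc.price);

// Assumed reduce step: add up all values emitted for one key.
const reduceFunction1 = (key, values) => values.reduce((a, b) => a + b, 0);

// Stand-in for mapReduce: group emitted values by key, then reduce each group.
function mapReduce(collection, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of collection) map(doc, emit);

  let reduceCalls = 0;
  const output = [...emitted].map(([key, values]) => {
    // Keys with a single value are passed through without calling reduce.
    if (values.length === 1) return { _id: key, value: values[0] };
    reduceCalls += 1;
    return { _id: key, value: reduce(key, values) };
  });
  return { output, reduceCalls };
}

const { output, reduceCalls } = mapReduce(orders, mapFunction1, reduceFunction1);
console.log(output, reduceCalls);
```

Only abc123 and xyz789 have more than one order, which is consistent with the "reduce" : 2 count reported in the command output.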
End of Unit No : 5