CHAPTER 05: MongoDB

Use cases
• Aadhaar is an excellent example of a real-world use
case of MongoDB.
• Shutterfly is a popular internet-based photo
sharing and personal publishing company that
manages a store of more than 6 billion images
with a transaction rate of up to 10,000 operations
per second.
• Shutterfly is one of the companies that
transitioned from Oracle to MongoDB.
Other use cases
• Castlight Health, Thumbtack, Klout, IBM, Citrix, Twitter,
T-Mobile, Zendesk, Sony, Chegg, Techstars, Atlassian,
Udacity, BrightRoll, RetailMeNot, Hootsuite,
SurveyMonkey, Shyp, Criteo, MuleSoft, HackerRank,
Foursquare, HTC, InVision, Intercom.
• Facebook
• GitHub
• YouTube
• Twitter
• LinkedIn
• Slack
• Stack Overflow, and so on.
Source: https://www.mongodb.com/who-uses-mongodb
MongoDB
• MongoDB is a document-oriented NoSQL database
used for high-volume data storage, which came to
light around the mid-2000s.
• MongoDB is an open-source database that uses a
document-oriented data model and a non-structured
query language. It is one of the most powerful
NoSQL databases available today.
• Stores data in JSON-like formats.
• It is a highly scalable, flexible, and distributed NoSQL
database.
MongoDB
• Each database contains collections, which in turn
contain documents. Documents can differ from one
another, with a varying number of fields.
• The size and content of each document can
differ as well.
• MongoDB was first developed by MongoDB Inc.,
known then as 10gen, in October 2007 originally as a
major part in a PaaS (Platform as a Service) product
similar to Windows Azure and Google App Engine.
• The development was shifted to open source in
2009.
Difference between RDBMS and MongoDB
RDBMS          MongoDB
Database       Database
Table, View    Collection
Row            Document (JSON, BSON)
Column         Field
Index          Index
Join           Embedded Document
Foreign Key    Reference
Partition      Shard

Collection
• A grouping of MongoDB documents.
• A collection is the equivalent of an RDBMS
table. A collection exists within a single
database.
• Collections do not enforce a schema.
Sample Representation of a Document in
MongoDB

Collection
Document
Referenced Documents
Embedded Documents:
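The difference between embedded and referenced documents can be sketched with plain JavaScript objects. This is only an illustration: the field names address, addressId, city, and pin, and the helper resolveAddress, are hypothetical and do not come from the slides.

```javascript
// Embedded: the address lives inside the student document itself.
const embeddedStudent = {
  _id: 1,
  StudName: "Raju Patil",
  address: { city: "Pune", pin: "411001" }, // embedded document (sample values)
};

// Referenced: the address lives in its own "collection" (a plain array here)
// and the student stores only the _id of that address.
const addresses = [{ _id: 100, city: "Pune", pin: "411001" }];
const referencedStudent = { _id: 1, StudName: "Raju Patil", addressId: 100 };

// Resolving a reference needs a second lookup, much like following a foreign key.
function resolveAddress(student, addressCollection) {
  return addressCollection.find((a) => a._id === student.addressId) || null;
}

console.log(embeddedStudent.address.city);
console.log(resolveAddress(referencedStudent, addresses).city);
```

Embedding keeps related data together in one read; referencing avoids duplication when many documents share the same related data.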
Theory of NoSQL: CAP
• Many nodes
• Nodes contain replicas of
partitions of data
• Consistency
– all replicas contain the same
version of data
• Availability
– system remains operational
on failing nodes
• Partition tolerance
– multiple entry points
– system remains operational
on system split
• CAP Theorem: satisfying all three at the
same time is impossible

Features/Why MongoDB
• Document-Oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce
(Slide callouts: Agile, Scalable)

Other features
• Easy to install and use
• Detailed documentation
• Various APIs
– JavaScript, Python, Ruby, Perl, Java, Scala, C#,
C++, Haskell, Erlang
• Community (Good)
• Open source

ACID - BASE

ACID (CP)          BASE (AP)
• Atomicity        • Basically Available
• Consistency      • Soft-state
• Isolation        • Eventually consistent
• Durability

Pritchett, D.: BASE: An Acid Alternative (queue.acm.org/detail.cfm?id=1394128)
What is MongoDB
• MongoDB is
– Cross-platform
– Open source
– Non-relational
– Distributed
– NoSQL
– Document oriented data store.
What is MongoDB
• A document-oriented database
– documents encapsulate and encode data (or
information) in some standard formats or encodings
• NoSQL database
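– non-adherence to the widely used relational database model
– highly optimized for retrieve and append operations
• uses BSON/JSON format
• schema-less
– No more configuring database columns with types
• No multi-document transactions in early versions (added in MongoDB 4.0)
• No relational joins (the $lookup aggregation stage, added in 3.2,
performs a left outer join)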
BSON (Binary JSON)
• BSON is a computer data interchange format used
mainly as a data storage and network transfer format in
the MongoDB database.
• It is a binary form for representing simple data
structures, associative arrays (called objects or
documents in MongoDB), and various data types of
specific interest to MongoDB.
• MongoDB represents JSON documents in binary-
encoded format called BSON behind the scenes.
JSON
• JavaScript Object Notation (JSON) is an open, human- and
machine-readable standard that facilitates data
interchange and, along with XML, is the main format for
data interchange used on the modern web.
• JSON supports all the basic data types: numbers, strings,
and Boolean values, as well as arrays and hashes (objects).
• Document databases such as MongoDB use JSON
documents to store records, just as tables and
rows store records in a relational database.
Example
{
'_id' : 1,
'name' : { 'first' : 'John', 'last' : 'Backus' },
'contribs' : [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
'awards' : [
{
'award' : 'W.W. McDowell Award',
'year' : 1967,
'by' : 'IEEE Computer Society'
}, {
'award' : 'Draper Prize',
'year' : 1993,
'by' : 'National Academy of Engineering'
}
]
}
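As a rough check that a document like the one above really is JSON-compatible, it can be serialized to text and parsed back in plain JavaScript. This sketch uses only the standard JSON.stringify and JSON.parse functions; it does not involve BSON or a MongoDB server.

```javascript
// The example document written as a JavaScript object literal.
const doc = {
  _id: 1,
  name: { first: "John", last: "Backus" },
  contribs: ["Fortran", "ALGOL", "Backus-Naur Form", "FP"],
  awards: [
    { award: "W.W. McDowell Award", year: 1967, by: "IEEE Computer Society" },
    { award: "Draper Prize", year: 1993, by: "National Academy of Engineering" },
  ],
};

// Serialize to JSON text, then parse it back into an object.
const text = JSON.stringify(doc);
const parsed = JSON.parse(text);

// Nested documents and arrays survive the round trip.
const awardYears = parsed.awards.map((a) => a.year);
console.log(parsed.name.last, awardYears);
```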
The Basics
• A MongoDB server may have zero or more databases.
• A database may have zero or more collections.
• A collection may have zero or more documents.
– Docs in the same collection don’t even need to have
the same fields
– Docs correspond to the records (rows) of an RDBMS
– Docs can embed other documents
– Documents are addressed in the database via a
unique key
• A document may have one or more fields.
• MongoDB indexes are much like their RDBMS
counterparts.
The Basics
• Simple queries
• Makes sense with most web applications
• Easier and faster integration of data
• Not well suited for heavy and complex
transaction systems.
Unique Key
• Each JSON document should have a unique
identifier. It is the _id key.
• It is similar to the primary key in relational
databases.
• An index is automatically built on the unique
identifier.
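The role of _id as a unique, indexed key can be sketched with an in-memory stand-in for a collection. The class TinyCollection and its methods are hypothetical names used only for illustration; this is not how MongoDB is implemented.

```javascript
// Minimal in-memory stand-in for a collection, indexed by _id.
class TinyCollection {
  constructor() {
    this.docsById = new Map(); // plays the role of the automatic _id index
  }

  // Reject a second document with an _id that is already present.
  insert(doc) {
    if (this.docsById.has(doc._id)) {
      throw new Error(`duplicate _id: ${doc._id}`);
    }
    this.docsById.set(doc._id, doc);
  }

  // Lookup by _id goes straight through the index.
  findById(id) {
    return this.docsById.get(id) || null;
  }
}

const students = new TinyCollection();
students.insert({ _id: 1, StudName: "Raju Patil" });

let duplicateRejected = false;
try {
  students.insert({ _id: 1, StudName: "Someone Else" });
} catch (err) {
  duplicateRejected = true;
}
console.log(duplicateRejected, students.findById(1).StudName);
```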
Database
• It is a collection of collections.
• It is a container for collections.
• The default database in the mongo shell is 'test'; the shell
variable db refers to the current database. Data files are
stored within the data folder.
• A single MongoDB server can house several
databases.
Collection
• A collection may store a number of documents.
• A collection is analogous to a table of an RDBMS.
• A collection can be created on demand.
• A collection exists within a single database.
• A collection can hold several MongoDB
documents.
• A collection does not enforce a schema.
Example: Mongo Collection

{ "_id": ObjectId("4efa8d2b7d284dad101e4bc9"),
"Last Name": "DUMONT", "First Name": "Jean",
"Date of Birth": "01-22-1963" },

{ "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
"Last Name": "PELLERIN", "First Name": "Franck",
"Date of Birth": "09-19-1983",
"Address": "1 chemin des Loges", "City": "VERSAILLES" }

(The _id values are automatically generated by MongoDB.)
Document
• The document is the unit of storing data in a
MongoDB database.
• A document is analogous to a row/record/tuple
in an RDBMS table.
• A document has a dynamic schema. This implies
that the documents in a collection need not have
the same set of fields/key-value pairs.
Support for Dynamic queries
• MongoDB has extensive support for dynamic
queries.
Storing Binary Data
• MongoDB provides GridFS to support the storage of
binary data such as images, audio files, video files,
etc.
• GridFS divides a file into chunks and stores each
chunk of data in a separate document, each with a
maximum size of 255 kB by default.
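The chunking idea can be sketched in plain JavaScript by splitting a buffer into pieces of at most 255 kB. The function splitIntoChunks is a hypothetical helper, not part of any MongoDB driver; the field names n and data mirror the chunk documents GridFS stores, and the 600 kB sample file is made up.

```javascript
// Assumed chunk size: 255 kB, taken as 255 * 1024 bytes.
const CHUNK_SIZE = 255 * 1024;

// Split a Buffer into numbered chunk documents of at most chunkSize bytes.
function splitIntoChunks(data, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({ n, data: data.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

// A made-up 600 kB "file" filled with the byte value 1.
const file = Buffer.alloc(600 * 1024, 1);
const chunks = splitIntoChunks(file);

// Two full chunks plus one smaller final chunk.
console.log(chunks.length, chunks[chunks.length - 1].data.length);
```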
Replication
Sharding
Sharding
• Sharding is a type of database partitioning that separates
very large databases into smaller, faster, more easily
manageable parts called data shards.
• The word shard means a small part of a whole.
• A large dataset is divided and distributed over multiple
servers or shards.
• Each shard is an independent database, and collectively
the shards constitute a single logical database.
Advantages of Sharding
• MongoDB distributes the read and write workload across
the shards in the cluster.
• Sharding reduces the amount of data that each shard
needs to store and manage.
• Sharding reduces the number of operations that each
shard handles.
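How a dataset might be spread over shards can be sketched with a toy hash on a key. This is an assumption-laden illustration, not MongoDB's actual shard-key or chunk-balancing mechanism; toyHash, shardFor, and distribute are hypothetical names, and the sample documents are made up.

```javascript
// Toy polynomial rolling hash over the characters of the key.
function toyHash(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h;
}

// Pick a shard index in the range 0 .. shardCount - 1 for a key.
function shardFor(key, shardCount) {
  return toyHash(key) % shardCount;
}

// Place every document on the shard chosen by its _id.
function distribute(docs, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const doc of docs) {
    shards[shardFor(doc._id, shardCount)].push(doc);
  }
  return shards;
}

const docs = Array.from({ length: 20 }, (_, i) => ({ _id: `C${100 + i}` }));
const shards = distribute(docs, 3);
console.log(shards.map((s) => s.length));
```

Each shard ends up holding only part of the data, which is why each one stores less and handles fewer operations than a single server would.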
Updating information in-place
• MongoDB updates data in place, where it is already stored,
rather than allocating new space for it.
• MongoDB writes to disk once every second.
Terms used in RDBMS and MongoDB
Embedded documents
Database Server & Client
• Server: mongod
• Client: mongo (mongosh in current releases)
Data types
Data type           Description
String              Must be valid UTF-8. Most commonly used data type.
Integer             Can be 32-bit or 64-bit (depends on the server).
Boolean             To store a true/false value.
Double              To store floating-point (real) values.
Min/Max keys        To compare a value against the lowest or highest BSON elements.
Arrays              To store arrays, lists, or multiple values in one key.
Timestamp           To record when a document has been modified or added.
Null                To store a NULL value. A NULL is a missing or unknown value.
Date                To store the current date or time in Unix time format. One can
                    create a Date object and pass day, month, and year to it.
Object ID           To store the document’s id.
Binary data         To store binary data (images, binaries, etc.).
Code                To store JavaScript code in the document.
Regular expression  To store a regular expression.
MQL

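• show dbs: list the database names
• use myDB: switch to the required database (it is created on first write)
• db.dropDatabase(): drop the current database
• db: report the name of the current database
• show collections: list the collections in the current database
• db.version(): report the server version
• db.stats(): report statistics for the current database
• db.help(): list the available db methods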
CRUD

• Create: using insert(), update() and save()


• Read: using find()
• Update: using update()
• Delete: using remove()
Collection

• To create a collection by the name “Person”.

db.createCollection("Person")
Collection

• To drop a collection by the name “food”.

db.food.drop();
Insert and display
• Create a collection by the name “Students”
and store the following data in it.

db.Students.insert({_id: 2, StudName: "Raju Patil",
Grade: "VII", Hobbies: "Internet Surfing"})

db.Students.find()

db.Students.find().pretty()
Update Method

• db.Students.update({_id:4},{$set:
{"StudName":"Aryan Patil"}},{upsert:true});
Output: WriteResult({ "nMatched" : 0,
"nUpserted" : 1, "nModified" : 0, "_id" : 4 })

• db.Students.update({_id:4},{$set:
{"StudName":"Aryan Patil 2"}},{upsert:true});
Output: WriteResult({ "nMatched" : 0 + 1,
"nUpserted" : 0, "nModified" : 1 })
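The two outputs above follow from upsert semantics: the first call finds no document with _id 4 and inserts one, while the second call matches and modifies it. A minimal in-memory sketch of that logic, assuming a plain array as the collection and a hypothetical helper upsertById (not a MongoDB API):

```javascript
// Stand-in for update({_id: id}, {$set: fields}, {upsert: true}) on an array.
function upsertById(collection, id, fields) {
  const existing = collection.find((d) => d._id === id);
  if (!existing) {
    // No match: insert a new document built from the _id and the $set fields.
    collection.push({ _id: id, ...fields });
    return { nMatched: 0, nUpserted: 1, nModified: 0 };
  }
  // Match: apply the fields and report whether anything actually changed.
  const changed = Object.keys(fields).some((k) => existing[k] !== fields[k]);
  Object.assign(existing, fields);
  return { nMatched: 1, nUpserted: 0, nModified: changed ? 1 : 0 };
}

const students = [];
const first = upsertById(students, 4, { StudName: "Aryan Patil" });
const second = upsertById(students, 4, { StudName: "Aryan Patil 2" });
console.log(first, second, students);
```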
Other commands

• Commands
Find Method

• To search for documents from the “Students”
collection based on certain search criteria.

db.Students.find({StudName:"Raju Patil"}).pretty()
Find Method
To display only the StudName and Grade from all
the documents of the Students collection. The
identifier _id should be suppressed and NOT
displayed.

db.Students.find({},{StudName:1,Grade:1,_id:0})
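What the projection does can be sketched on plain objects: keep only the listed fields and drop everything else, including _id. The helper project and the sample documents are hypothetical; a field missing from a document is simply absent from the result.

```javascript
// Keep only the named fields of each document (an inclusion projection with _id suppressed).
function project(docs, fields) {
  return docs.map((doc) =>
    Object.fromEntries(fields.filter((f) => f in doc).map((f) => [f, doc[f]]))
  );
}

const students = [
  { _id: 2, StudName: "Raju Patil", Grade: "VII", Hobbies: "Internet Surfing" },
  { _id: 4, StudName: "Aryan Patil" }, // no Grade field: dynamic schema
];

const projected = project(students, ["StudName", "Grade"]);
console.log(projected);
```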
Find Method
• To find those documents where the Grade is set to
‘VII’

db.Students.find({Grade:{$eq:"VII"}}).pretty()
Find Method
• To find those documents from the Students
collection where the Hobbies is set to either ‘Chess’
or is set to ‘Skating’.

db.Students.find({Hobbies:{$in:
["Chess","Skating"]}}).pretty()
Find Method
• To find documents from the Students collection
where the StudName begins with “M”.

db.Students.find({StudName:/^M/}).pretty();
Find Method
• To find documents from the Students collection
where the StudName has an “e” in any position.

db.Students.find({StudName:/e/}).pretty();
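The two regular expressions used in the queries above behave the same way in plain JavaScript, which makes them easy to try outside the database. The names below are made-up sample values.

```javascript
// Sample StudName values (hypothetical).
const names = ["Mabel", "Raju", "Hersch", "Michelle"];

// /^M/ matches names that begin with "M".
const startsWithM = names.filter((n) => /^M/.test(n));

// /e/ matches names that contain an "e" in any position.
const containsE = names.filter((n) => /e/.test(n));

console.log(startsWithM, containsE);
```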
Number of documents

• To find the number of documents in the Students


collection.

db.Students.count();
Sort

db.Students.find().sort({StudName:1}).pretty();

db.Students.find().sort({StudName:-1}).pretty();

db.Students.find().sort({Grade:1,Hobbies:1}).pretty();
Skip

db.Students.find().skip(2).pretty();

db.Students.find().skip(2).limit(3).pretty();
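Sorting, skipping, and limiting can be sketched on an array. Unlike the skip examples above, this sketch sorts first so that the result is predictable; the sample names are hypothetical.

```javascript
// Sample documents (made-up names).
const students = [
  { StudName: "Raju" },
  { StudName: "Aryan" },
  { StudName: "Mabel" },
  { StudName: "Hersch" },
  { StudName: "Michelle" },
  { StudName: "Vamsi" },
];

// Equivalent in spirit to sort({StudName: 1}): ascending by StudName.
const sorted = [...students].sort((a, b) =>
  a.StudName < b.StudName ? -1 : a.StudName > b.StudName ? 1 : 0
);

// Equivalent in spirit to skip(2).limit(3): drop two documents, keep the next three.
const page = sorted.slice(2, 2 + 3).map((s) => s.StudName);
console.log(page);
```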
Aggregate Function
Customers collection:
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }
{ CustID: "C123", AccBal: 1500, AccType: "C" }

After $match (AccType "S"):
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }

After $group (by CustID, summing AccBal):
{ _id: "C123", TotAccBal: 1400 }
{ _id: "C111", TotAccBal: 1200 }
Aggregate Function
• To first filter on “AccType:S”, then group on
“CustID” and compute the sum of “AccBal”, and
finally keep only those documents whose
“TotAccBal” is greater than 1200, use the
syntax below:

db.Customers.aggregate([
{ $match : { AccType : "S" } },
{ $group : { _id : "$CustID", TotAccBal : { $sum : "$AccBal" } } },
{ $match : { TotAccBal : { $gt : 1200 } } }
]);
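The three pipeline stages can be traced by hand in plain JavaScript. This is a sketch of what the stages compute, using the four Customers documents from the figure; it does not use the MongoDB aggregation engine.

```javascript
// The Customers documents shown in the figure.
const customers = [
  { CustID: "C123", AccBal: 500, AccType: "S" },
  { CustID: "C123", AccBal: 900, AccType: "S" },
  { CustID: "C111", AccBal: 1200, AccType: "S" },
  { CustID: "C123", AccBal: 1500, AccType: "C" },
];

// Stage 1, like $match: keep only savings accounts.
const matched = customers.filter((c) => c.AccType === "S");

// Stage 2, like $group with $sum: total AccBal per CustID.
const totals = new Map();
for (const c of matched) {
  totals.set(c.CustID, (totals.get(c.CustID) || 0) + c.AccBal);
}
const grouped = [...totals].map(([id, sum]) => ({ _id: id, TotAccBal: sum }));

// Stage 3, like the second $match: keep totals strictly greater than 1200.
const result = grouped.filter((g) => g.TotAccBal > 1200);
console.log(result);
```

Only C123 survives the last stage, because the total for C111 is exactly 1200 and $gt is a strict comparison.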
Map Reduce Functions
Customers collection:
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }
{ CustID: "C123", AccBal: 1500, AccType: "C" }

After query (AccType "S"):
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }

After map (values emitted per key):
{ "C123": [ 500, 900 ] }
{ "C111": 1200 }

Customer_Totals (output collection):
{ _id: "C123", value: 1400 }
{ _id: "C111", value: 1200 }
MapReduce functions
• As per the MongoDB documentation, Map-
reduce is a data processing paradigm for
condensing large volumes of data into useful
aggregated results.
• MongoDB uses the mapReduce command for map-
reduce operations. MapReduce is generally used
for processing large data sets.
MapReduce functions

db.collection.mapReduce(
function() { emit(key, value); }, // map function
function(key, values) { return reduceFunction; }, // reduce function
{
out: collection,
query: document
}
)
MapReduce functions

• The map-reduce function first queries the collection,
then maps the result documents to emit key-value
pairs, which are then reduced based on the keys that
have multiple values.
MapReduce functions

• map is a JavaScript function that maps a value to a key
and emits a key-value pair
• reduce is a JavaScript function that reduces or groups all
the documents having the same key
• out specifies the location of the map-reduce query result
• query specifies the optional selection criteria for selecting
documents
Example
Map and Reduce functions
• var map=function()
{
emit(this.CustID, this.AccBal);
}
• var reduce=function(key,values)
{
return Array.sum(values);
}
Example
• Creating map and Reduce functions
• var map=function(){emit (this.custID, this.bal);}
• var reduce=function(key,values){return
Array.sum(values);}
• db.Accounts.mapReduce(
map,
reduce,
{
out:"CustomerTot",
query:{type:"S"}
}
);
Data
(Implement MapReduce)

• custID, bal, type


• C123, 1200, C
• C123, 900, S
• C124, 1200, S
• C123, 1200, C
• C124, 1200, S
• C123, 1200, S
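The query, map, and reduce steps can be traced on the data above with a small in-memory sketch. The helper mapReduce below is hypothetical and only imitates the flow: emit is passed to the map function as an argument (in the mongo shell it is a global), and the sum is written with reduce() because Array.sum is a shell helper that plain JavaScript does not provide.

```javascript
// The six account rows listed above.
const accounts = [
  { custID: "C123", bal: 1200, type: "C" },
  { custID: "C123", bal: 900, type: "S" },
  { custID: "C124", bal: 1200, type: "S" },
  { custID: "C123", bal: 1200, type: "C" },
  { custID: "C124", bal: 1200, type: "S" },
  { custID: "C123", bal: 1200, type: "S" },
];

// In-memory imitation of query -> map -> reduce.
function mapReduce(docs, mapFn, reduceFn, options) {
  const grouped = new Map();
  const emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  // Query step: select documents, then call map with `this` bound to each one.
  docs.filter(options.query).forEach((doc) => mapFn.call(doc, emit));
  // Reduce step: only keys with more than one value need reducing.
  return [...grouped].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

const customerTot = mapReduce(
  accounts,
  function (emit) { emit(this.custID, this.bal); },
  (key, values) => values.reduce((a, b) => a + b, 0),
  { query: (doc) => doc.type === "S" }
);
console.log(customerTot);
```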
Data
(Implement MapReduce)

Name Subject marks


Midhu Science 68
Midhu Maths 98
Midhu Sports 77
Akhil Science 67
Akhil Maths 87
Akhil Sports 89
Anish Science 67
Anish Maths 78
Anish Sports 90
JavaScript Programming

To compute the factorial of a given positive number, the user
is required to create a function by the name “fact” and
insert it into the “system.js” collection.

Big Data and Analytics by Seema Acharya and Subhashini Chellappan.
Copyright 2015, WILEY INDIA PVT. LTD.
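db.system.js.insert({_id:"fact",value:function(n)
{
// Base case covers 0 and 1, so the recursion always terminates.
if (n <= 1)
return 1;
else
return n * fact(n-1);
}
}
)

// Note: db.eval() was deprecated in MongoDB 3.0 and removed in 4.2.
db.eval("fact(5)")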
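The function body can be checked in plain JavaScript before it is stored on the server. This sketch assumes the argument is a non-negative integer and uses a base case of n <= 1 so that an input of 0 also terminates.

```javascript
// Recursive factorial for non-negative integers.
function fact(n) {
  if (n <= 1) {
    return 1; // 0! and 1! are both 1
  }
  return n * fact(n - 1);
}

const result = fact(5);
console.log(result);
```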
db.system.js.insert({_id:"max",value:function(a,b)
{
if (a > b)
return a;
else
return b;
}
}
)
db.eval("max(67,89)")
var big=db.eval("max(78,-1)");
>big
What is Cursor in MongoDB?

• When the db.collection.find() function is used to
search for documents in a collection, it returns a
pointer to the set of matching documents, which is
called a cursor.
• By default, the shell iterates the cursor
automatically when the result of the query is
returned.
• But one can also explicitly go through the items
returned in the cursor one by one.
Example

var mark = db.marks.find();

while (mark.hasNext())
{
print(tojson(mark.next()));
}
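The hasNext()/next() pattern can be sketched with a small array-backed cursor. The class ArrayCursor is a hypothetical stand-in, not the MongoDB cursor implementation, and the two sample documents are made up.

```javascript
// Array-backed stand-in for a cursor: a position that moves through the results.
class ArrayCursor {
  constructor(docs) {
    this.docs = docs;
    this.position = 0;
  }

  // True while there are documents that have not been returned yet.
  hasNext() {
    return this.position < this.docs.length;
  }

  // Return the current document and advance the position.
  next() {
    if (!this.hasNext()) {
      throw new Error("cursor exhausted");
    }
    return this.docs[this.position++];
  }
}

const cursor = new ArrayCursor([
  { Name: "Midhu", Subject: "Science", marks: 68 },
  { Name: "Midhu", Subject: "Maths", marks: 98 },
]);

const seen = [];
while (cursor.hasNext()) {
  seen.push(cursor.next().marks);
}
console.log(seen);
```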

Try yourself

1. Dealing with NULL values
2. Arrays
3. And other commands
