Chapter 5
Use cases
• Aadhaar is a prominent real-world use case of
MongoDB.
• Shutterfly is a popular internet-based photo
sharing and personal publishing company that
manages a store of more than 6 billion images
with a transaction rate of up to 10,000 operations
per second.
• Shutterfly is one of the companies that
transitioned from Oracle to MongoDB.
Other use cases
• Castlight Health, Thumbtack, Klout, IBM, Citrix, Twitter, T-
Mobile, Zendesk, Sony, Chegg, Techstars, Atlassian,
Udacity, BrightRoll, RetailMeNot, Hootsuite,
SurveyMonkey, Shyp, Criteo, MuleSoft, HackerRank,
Foursquare, HTC, InVision, Intercom.
• Facebook
• GitHub
• YouTube
• Twitter
• LinkedIn
• Slack
• Stack Overflow, and so on.
Source: https://www.mongodb.com/who-uses-mongodb
MongoDB
• MongoDB is a document-oriented NoSQL database
used for high-volume data storage; it emerged in
the late 2000s.
• MongoDB is an open-source database that uses a
document-oriented data model and a non-structured
query language. It is one of the most powerful
NoSQL databases available today.
• Stores data in JSON-like formats.
• It is a highly scalable, flexible, and distributed NoSQL
database.
MongoDB
• Each database contains collections, which in turn
contain documents.
• Documents in a collection can differ from one another
in size, content, and number of fields.
• MongoDB was first developed in October 2007 by
10gen (now MongoDB Inc.), originally as a major part
of a PaaS (Platform as a Service) product similar to
Windows Azure and Google App Engine.
• Development shifted to open source in 2009.
Difference between RDBMS and MongoDB
RDBMS          MongoDB
Database       Database
Table, View    Collection
Row            Document (JSON, BSON)
Column         Field
Index          Index
Join           Embedded Document
Foreign Key    Reference
Partition      Shard
Collection
• A grouping of MongoDB documents.
• A collection is the equivalent of an RDBMS
table. A collection exists within a single
database.
• Collections do not enforce a schema.
Sample Representation of a Document in MongoDB
[Figure: a collection and the documents it contains]
Referenced Documents
Embedded Documents:
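As a rough illustration of the two modeling styles named above, the sketch below stores the same information once as an embedded document and once as a reference. It is plain JavaScript that runs without a MongoDB server; the names (embeddedStudent, addresses, address_id, resolveAddress) and all values are hypothetical.

```javascript
// Embedded: the address lives inside the student document.
const embeddedStudent = {
  _id: 1,
  name: "Asha",
  address: { city: "Pune", pin: "411001" }
};

// Referenced: the student stores only the _id of a separate address document.
const addresses = [{ _id: 100, city: "Pune", pin: "411001" }];
const referencedStudent = { _id: 1, name: "Asha", address_id: 100 };

// Resolving a reference needs a second lookup (done by the application here).
function resolveAddress(student, addressDocs) {
  return addressDocs.find(a => a._id === student.address_id) || null;
}

const cityEmbedded = embeddedStudent.address.city;
const cityReferenced = resolveAddress(referencedStudent, addresses).city;
console.log(cityEmbedded, cityReferenced);
```

Embedding keeps related data in one read, while referencing avoids duplicating the address when several documents share it.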
Theory of noSQL: CAP
• Many nodes
• Nodes contain replicas of partitions of data
• Consistency
– all replicas contain the same version of data
• Availability
– system remains operational on failing nodes
• Partition tolerance
– multiple entry points
– system remains operational on system split
CAP Theorem: satisfying all three at the same time is impossible
[Figure: triangle with vertices C, A, and P]
Features/Why MongoDB
• Document-Oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce
• Agile and scalable
Other features
• Easy to install and use
• Detailed documentation
• Various APIs
– JavaScript, Python, Ruby, Perl, Java, Scala, C#,
C++, Haskell, Erlang
• Good community support
• Open source
ACID - BASE
ACID             BASE
• Atomicity      • Basically Available (CP)
• Consistency    • Soft-state
• Isolation      • Eventually consistent (AP)
• Durability
Pritchett, D.: BASE: An Acid Alternative (queue.acm.org/detail.cfm?id=1394128)
What is MongoDB
• MongoDB is
– Cross-platform
– Open source
– Non-relational
– Distributed
– NoSQL
– Document oriented data store.
What is MongoDB
• A document-oriented database
– documents encapsulate and encode data (or
information) in some standard formats or encodings
• NoSQL database
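– does not adhere to the widely used relational database model
– highly optimized for retrieve and append operations
• uses BSON/JSON format
• schema-less
– No more configuring database columns with types
• No multi-document transactions before version 4.0
• No joins in the relational sense (the $lookup aggregation
stage, available since version 3.2, offers a left outer join)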
BSON (Binary JSON)
• BSON is a computer data interchange format used
mainly as a data storage and network transfer format in
the MongoDB database.
• It is a binary form for representing simple data
structures, associative arrays (called objects or
documents in MongoDB), and various data types of
specific interest to MongoDB.
• MongoDB represents JSON documents in binary-
encoded format called BSON behind the scenes.
JSON
• JavaScript Object Notation (JSON) is an open, human and
machine-readable standard that facilitates data
interchange, and along with XML is the main format for
data interchange used on the modern web.
• JSON supports all the basic data types: numbers, strings,
and Boolean values, as well as arrays and hashes.
• Document databases such as MongoDB use JSON
documents in order to store records, just as tables and
rows store records in a relational database.
Example:
{
'_id' : 1,
'name' : { 'first' : 'John', 'last' : 'Backus' },
'contribs' : [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
'awards' : [
{
'award' : 'W.W. McDowell Award',
'year' : 1967,
'by' : 'IEEE Computer Society'
}, {
'award' : 'Draper Prize',
'year' : 1993,
'by' : 'National Academy of Engineering'
}
]
}
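To make the structure of the example document concrete, here is a minimal sketch in plain JavaScript (no MongoDB involved). It writes the document as an object literal, serializes it to strict JSON text, parses it back, and reads nested fields. The variable names are chosen only for this illustration.

```javascript
// The document from the example, written as a JavaScript object literal.
const doc = {
  _id: 1,
  name: { first: "John", last: "Backus" },
  contribs: ["Fortran", "ALGOL", "Backus-Naur Form", "FP"],
  awards: [
    { award: "W.W. McDowell Award", year: 1967, by: "IEEE Computer Society" },
    { award: "Draper Prize", year: 1993, by: "National Academy of Engineering" }
  ]
};

// Serialize to strict JSON text (double-quoted keys and strings), then parse it back.
const text = JSON.stringify(doc);
const parsed = JSON.parse(text);

// Nested documents and arrays are reached with ordinary property access.
const lastName = parsed.name.last;
const firstAwardYear = parsed.awards[0].year;
const contribCount = parsed.contribs.length;
console.log(lastName, firstAwardYear, contribCount);
```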
The Basics
• A MongoDB server may have zero or more databases.
• A database may have zero or more collections.
• A collection may have zero or more documents.
– Docs in the same collection don't even need to have
the same fields
– Docs correspond to records (rows) in an RDBMS
– Docs can embed other documents
– Documents are addressed in the database via a
unique key
• A document may have one or more fields.
• MongoDB indexes are much like their RDBMS
counterparts.
The Basics
• Simple queries
• Makes sense with most web applications
• Easier and faster integration of data
• Not well suited for heavy and complex
transaction systems.
Unique Key
• Each JSON document should have a unique
identifier. It is the _id key.
• It is similar to the primary key in relational
databases.
• An index is automatically built on the unique
identifier.
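The role of _id can be imitated in a few lines of plain JavaScript. The sketch below is a hypothetical in-memory stand-in (TinyCollection is not a MongoDB class): a Map keyed by _id plays the part of the automatic index, and a second insert with the same _id is rejected.

```javascript
// Hypothetical in-memory stand-in for a collection; not MongoDB's implementation.
class TinyCollection {
  constructor() {
    this.byId = new Map(); // plays the role of the automatic index on _id
  }
  insert(doc) {
    if (this.byId.has(doc._id)) {
      throw new Error("duplicate _id: " + doc._id);
    }
    this.byId.set(doc._id, doc);
  }
  findById(id) {
    return this.byId.get(id) || null;
  }
}

const students = new TinyCollection();
students.insert({ _id: 1, StudName: "Raju Patil" });

let duplicateRejected = false;
try {
  students.insert({ _id: 1, StudName: "Someone Else" });
} catch (e) {
  duplicateRejected = true; // the first document stays in place
}
console.log(duplicateRejected, students.findById(1).StudName);
```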
Database
• It is a collection of collections, i.e. a container
for collections.
• In the mongo shell, the variable db refers to the
current database; the default database is 'test'.
Database files are stored within the data folder.
• A single MongoDB server can house several
databases.
Collection
• A collection may store a number of documents.
• A collection is analogous to a table of an RDBMS.
• A collection can be created on demand.
• A collection exists within a single database.
• A collection can hold several MongoDB
documents.
• A collection does not enforce a schema.
Example: Mongo Collection
{ "_id": ObjectId("4efa8d2b7d284dad101e4bc9"),
"Last Name": "DUMONT", "First Name": "Jean",
"Date of Birth": "01-22-1963" },
{ "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
"Last Name": "PELLERIN", "First Name": "Franck",
"Date of Birth": "09-19-1983",
"Address": "1 chemin des Loges", "City": "VERSAILLES" }
Note: the _id values are automatically generated by MongoDB.
Document
• The document is the unit of storing data in a
MongoDB database.
• A document is analogous to a row/record/tuple
in an RDBMS table.
• A document has a dynamic schema. This implies
that documents in the same collection need not have
the same set of fields/key-value pairs.
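A small sketch of what a dynamic schema means for application code, in plain JavaScript with no server involved. The two documents reuse the values from the earlier collection example, with _id simplified to numbers as an assumption of this sketch.

```javascript
// Two documents of one collection with different sets of fields.
const people = [
  { _id: 1, "Last Name": "DUMONT", "First Name": "Jean", "Date of Birth": "01-22-1963" },
  { _id: 2, "Last Name": "PELLERIN", "First Name": "Franck", "Date of Birth": "09-19-1983",
    "Address": "1 chemin des Loges", "City": "VERSAILLES" }
];

// Field counts differ from document to document.
const fieldCounts = people.map(p => Object.keys(p).length);

// Code that reads an optional field has to cope with its absence.
const cities = people.map(p => ("City" in p ? p.City : null));
console.log(fieldCounts, cities);
```

Because nothing enforces a common shape, checks like the one on City move from the database schema into the application.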
Support for Dynamic queries
• MongoDB has extensive support for dynamic
queries.
Storing Binary Data
• MongoDB provides GridFS to support the storage of
binary data such as images, audio files, video files,
etc.
• GridFS divides a file into chunks and stores each
chunk of data in a separate document; by default each
chunk is at most 255 KB.
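The chunking idea can be sketched in plain JavaScript. This is not the GridFS implementation, only the arithmetic behind it: a byte array is cut into pieces of at most 255 KB, and each piece becomes one chunk "document". The function name splitIntoChunks, the file id, and the field names are illustrative assumptions.

```javascript
const CHUNK_SIZE = 255 * 1024; // 255 KB, the chunk size mentioned above

// Split a byte array into chunk "documents"; field names here are illustrative.
function splitIntoChunks(bytes, fileId) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += CHUNK_SIZE, n++) {
    // subarray clamps the end index, so the last chunk may be shorter
    chunks.push({ files_id: fileId, n: n, data: bytes.subarray(offset, offset + CHUNK_SIZE) });
  }
  return chunks;
}

const file = new Uint8Array(1000000); // a 1,000,000-byte stand-in for an image
const chunks = splitIntoChunks(file, "photo-1");
const totalBytes = chunks.reduce((sum, c) => sum + c.data.length, 0);
console.log(chunks.length, chunks[chunks.length - 1].data.length, totalBytes);
```

Only the last chunk is smaller than the chunk size, and the chunk sequence number n is what allows the file to be reassembled in order.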
Replication
Sharding
• Sharding is a type of database partitioning that separates
very large databases into smaller, faster, more easily
manageable parts called data shards.
• The word shard means a small part of a whole.
• A large dataset is divided and distributed over multiple
servers or shards.
• Each shard is an independent database, and collectively
the shards constitute a single logical database.
Advantages of Sharding
• MongoDB distributes the read and write workload across
the shards in the cluster.
• Sharding reduces the amount of data that each shard
needs to store and manage.
• Sharding reduces the number of operations that each
shard handles.
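How a dataset ends up spread over shards can be pictured with a toy router in plain JavaScript. This is a deliberately simplified sketch under stated assumptions: the hash function, the shard count of 3, and the helper names hashKey and shardFor are all made up here and do not reflect MongoDB's actual chunk and balancing logic.

```javascript
// Toy hash-based router: NOT MongoDB's actual chunk/balancer logic.
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

// The same key always maps to the same shard.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

const SHARDS = 3;
const shards = Array.from({ length: SHARDS }, () => []);
for (let id = 1; id <= 30; id++) {
  shards[shardFor(id, SHARDS)].push({ _id: id });
}

const sizes = shards.map(s => s.length);
console.log(sizes);
```

Each shard holds only part of the 30 documents, which is the reduction in per-shard data and operations described above.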
Updating information in-place
• MongoDB updates data in place, i.e. at the location where
it already resides.
• MongoDB writes to disk once every second.
Terms used in RDBMS and MongoDB
Embedded documents
Database Server & Client
• Server: mongod
• Client: mongo
Data types
Data type            Description
String               Must be valid UTF-8. Most commonly used data type.
Integer              Can be 32-bit or 64-bit (depends on the server).
Boolean              To store a true/false value.
Double               To store floating-point (real) values.
Min/Max keys         To compare a value against the lowest or highest BSON elements.
Arrays               To store arrays, lists, or multiple values in one key.
Timestamp            To record when a document has been modified or added.
Null                 To store a NULL value. A NULL is a missing or unknown value.
Date                 To store the current date or time in Unix time format. One can create a Date object and pass day, month and year to it.
Object ID            To store the document's id.
Binary data          To store binary data (images, binaries, etc.).
Code                 To store JavaScript code in the document.
Regular expression   To store a regular expression.
MQL
db.createCollection("Person")
Collection
db.food.drop();
Insert and display
• Create a collection by the name "Students"
and store the following data in it.
db.Students.insert({_id:2, StudName:"Raju Patil",
Grade: "VII", Hobbies: "Internet Surfing"})
db.Students.find()
db.Students.find().pretty()
Update Method
• db.Students.update({_id:4},{$set:
{"StudName":"Aryan Patil"}},{upsert:true});
Output: WriteResult({ "nMatched" : 0,
"nUpserted" : 1, "nModified" : 0, "_id" : 4 })
• db.Students.update({_id:4},{$set:
{"StudName":"Aryan Patil 2"}},{upsert:true});
Output: WriteResult({ "nMatched" : 0 + 1,
"nUpserted" : 0, "nModified" : 1 })
Other commands
• Commands
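Returning to the update method with upsert:true: its two outcomes (insert when nothing matches, modify when something does) can be imitated in plain JavaScript. The helper upsertById below is hypothetical and only handles an equality filter on _id; the result objects mimic the WriteResult fields shown in the update examples and are not produced by a real driver.

```javascript
// In-memory imitation of update(..., {upsert: true}) for an equality filter on _id.
function upsertById(collection, id, setFields) {
  const existing = collection.find(d => d._id === id);
  if (!existing) {
    // Nothing matched: insert a new document built from the filter and $set fields.
    collection.push(Object.assign({ _id: id }, setFields));
    return { nMatched: 0, nUpserted: 1, nModified: 0, _id: id };
  }
  let changed = false;
  for (const [k, v] of Object.entries(setFields)) {
    if (existing[k] !== v) {
      existing[k] = v;
      changed = true;
    }
  }
  return { nMatched: 1, nUpserted: 0, nModified: changed ? 1 : 0 };
}

const studentDocs = [];
const first = upsertById(studentDocs, 4, { StudName: "Aryan Patil" });
const second = upsertById(studentDocs, 4, { StudName: "Aryan Patil 2" });
console.log(first, second, studentDocs);
```

The first call inserts because no document with _id 4 exists yet; the second call finds that document and modifies it, so the collection still holds a single document.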
Find Method
db.Students.find({StudName:"Raju Patil"}).pretty()
Find Method
To display only the Name and Grade from all
the documents of the Students collection. The
identifier _id should be suppressed and NOT
displayed.
db.Students.find({},{StudName:1,Grade:1,_id:0})
Find Method
• To find those documents where the Grade is set to
‘VII’
db.Students.find({Grade:{$eq:"VII"}}).pretty()
Find Method
• To find those documents from the Students
collection where the Hobbies is set to either ‘Chess’
or is set to ‘Skating’.
db.Students.find({Hobbies:{$in:
["Chess","Skating"]}}).pretty()
Find Method
• To find documents from the Students collection
where the StudName begins with "M".
db.Students.find({StudName:/^M/}).pretty();
Find Method
• To find documents from the Students collection
where the StudName has an “e” in any position.
db.Students.find({StudName:/e/}).pretty();
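For readers without a running server, the find filters above can be mirrored with ordinary array operations. The sketch below is plain JavaScript; the sample documents are made up for illustration, and each line is annotated with the query document it imitates.

```javascript
// Sample documents; the values are made up for illustration.
const docs = [
  { _id: 1, StudName: "Michelle Jacintha", Grade: "VII", Hobbies: "Internet Surfing" },
  { _id: 2, StudName: "Mabel Mathews", Grade: "VII", Hobbies: "Baseball" },
  { _id: 3, StudName: "Aryan David", Grade: "VI", Hobbies: "Skating" },
  { _id: 4, StudName: "Hersch Gibbs", Grade: "VII", Hobbies: "Chess" }
];

const names = list => list.map(d => d.StudName);

// {Grade: {$eq: "VII"}}
const gradeVII = names(docs.filter(d => d.Grade === "VII"));
// {Hobbies: {$in: ["Chess", "Skating"]}}
const chessOrSkating = names(docs.filter(d => ["Chess", "Skating"].includes(d.Hobbies)));
// {StudName: /^M/}
const startsWithM = names(docs.filter(d => /^M/.test(d.StudName)));
// {StudName: /e/}
const containsE = names(docs.filter(d => /e/.test(d.StudName)));
// Projection {StudName: 1, Grade: 1, _id: 0}
const projected = docs.map(({ StudName, Grade }) => ({ StudName, Grade }));
console.log(gradeVII, chessOrSkating, startsWithM, containsE, projected);
```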
Number of documents
db.Students.count();
Sort
db.Students.find().sort({StudName:1}).pretty();
db.Students.find().sort({StudName:-1}).pretty();
db.Students.find().sort({Grade:1,Hobbies:1}).pretty();
Skip
db.Students.find().skip(2).pretty();
db.Students.find().skip(2).limit(3).pretty();
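Sorting, skipping, and limiting map directly onto array operations, which makes their combined effect easy to check by hand. The following plain JavaScript sketch uses made-up sample rows and needs no server.

```javascript
const rows = [
  { StudName: "Raju Patil", Grade: "VII" },
  { StudName: "Aryan Patil", Grade: "VI" },
  { StudName: "Meera Shah", Grade: "VII" },
  { StudName: "Dev Rao", Grade: "VIII" },
  { StudName: "Kiran Nair", Grade: "VI" }
];

// Plain string comparison on StudName.
const byName = (a, b) => (a.StudName < b.StudName ? -1 : a.StudName > b.StudName ? 1 : 0);

// sort({StudName: 1}) and sort({StudName: -1}); copy first so rows stays untouched.
const ascending = [...rows].sort(byName);
const descending = [...ascending].reverse();

// skip(2).limit(3) corresponds to slice(skip, skip + limit).
const skip = 2, limit = 3;
const page = ascending.slice(skip, skip + limit).map(r => r.StudName);
console.log(page);
```

Applying skip and limit to a sorted result is the usual basis for paging through documents.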
Aggregate Function
Customers
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }
{ CustID: "C123", AccBal: 1500, AccType: "C" }
$match
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }
$group
{ _id: "C123", TotAccBal: 1400 }
{ _id: "C111", TotAccBal: 1200 }
Aggregate Function
• To first filter on AccType "S", then group by
CustID and compute the sum of AccBal, and finally
keep only those groups whose TotAccBal is greater
than 1200, use the below syntax:
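One plausible pipeline for the steps just described is sketched here; it is a reconstruction, not necessarily the exact syntax the slide showed. The pipeline is written as a plain JavaScript array, as it might be passed to db.Customers.aggregate(pipeline) in the shell. Since that call needs a server, the same three steps are then carried out by hand on the sample Customers documents so the result can be checked.

```javascript
// The pipeline as it might be passed to db.Customers.aggregate(pipeline) in the shell.
const pipeline = [
  { $match: { AccType: "S" } },
  { $group: { _id: "$CustID", TotAccBal: { $sum: "$AccBal" } } },
  { $match: { TotAccBal: { $gt: 1200 } } }
];

// The same three steps done by hand on the sample Customers documents.
const customers = [
  { CustID: "C123", AccBal: 500, AccType: "S" },
  { CustID: "C123", AccBal: 900, AccType: "S" },
  { CustID: "C111", AccBal: 1200, AccType: "S" },
  { CustID: "C123", AccBal: 1500, AccType: "C" }
];

const savings = customers.filter(c => c.AccType === "S");   // first $match
const totals = new Map();                                    // $group with $sum
for (const c of savings) {
  totals.set(c.CustID, (totals.get(c.CustID) || 0) + c.AccBal);
}
const grouped = [...totals].map(([id, sum]) => ({ _id: id, TotAccBal: sum }));
const result = grouped.filter(g => g.TotAccBal > 1200);      // second $match
console.log(result);
```

Customer C111 totals exactly 1200 and is therefore dropped by the final filter, which requires a value strictly greater than 1200.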
Customers
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }
{ CustID: "C123", AccBal: 1500, AccType: "C" }
query
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }
map
{ "C123": [ 500, 900 ] }
{ "C111": 1200 }
Customer_Totals
{ _id: "C123", value: 1400 }
{ _id: "C111", value: 1200 }
MapReduce functions
• As per the MongoDB documentation, map-reduce
is a data processing paradigm for condensing
large volumes of data into useful aggregated
results.
• MongoDB uses the mapReduce command for map-reduce
operations. MapReduce is generally used for
processing large data sets.
MapReduce functions
>db.collection.mapReduce(
  function() { emit(key, value); },                 // map function
  function(key, values) { return reduceFunction; }, // reduce function
  {
    out: collection,
    query: document
  }
)
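To show what the map and reduce functions do with actual data, here is a minimal in-memory imitation in plain JavaScript. It is a teaching sketch, not MongoDB's mapReduce command: the function mapReduce below is hypothetical, emit is passed to the map function as an argument (in the shell it is available globally), and the query option is a predicate function rather than a query document.

```javascript
// Minimal in-memory map-reduce; a teaching sketch, not MongoDB's mapReduce command.
function mapReduce(docs, mapFn, reduceFn, options) {
  const input = options && options.query ? docs.filter(options.query) : docs;
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of input) mapFn.call(doc, emit); // "this" is the current document
  // A key with a single emitted value is passed through without calling reduce.
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values)
  }));
}

const accounts = [
  { CustID: "C123", AccBal: 500, AccType: "S" },
  { CustID: "C123", AccBal: 900, AccType: "S" },
  { CustID: "C111", AccBal: 1200, AccType: "S" },
  { CustID: "C123", AccBal: 1500, AccType: "C" }
];

const customerTotals = mapReduce(
  accounts,
  function (emit) { emit(this.CustID, this.AccBal); },  // map
  (key, values) => values.reduce((a, b) => a + b, 0),    // reduce
  { query: d => d.AccType === "S" }
);
console.log(customerTotals);
```

The map step groups balances by customer, and the reduce step collapses each group into a single total, matching the Customer_Totals output in the diagram.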
MapReduce functions
db.eval("fact(5)")
Big Data and Analytics by Seema Acharya
and Subhashini Chellappan Copyright
2015, WILEY INDIA PVT. LTD.
db.system.js.insert({
  _id: "max",
  // stored function: returns the larger of a and b
  value: function(a, b) {
    if (a > b)
      return a;
    else
      return b;
  }
})
db.eval("max(67,89)")
var big=db.eval("max(78,-1)");
>big
What is Cursor in MongoDB?
• Assuming mark holds a cursor returned by find(),
its documents can be printed one at a time:
while (mark.hasNext())
{
print(tojson(mark.next()));
}
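The hasNext()/next() protocol used in the loop above can be imitated without a server. In this plain JavaScript sketch, makeCursor is a hypothetical helper that wraps an array in an object offering the same two methods, and the sample documents are made up.

```javascript
// Hypothetical cursor over an array, exposing the same hasNext()/next() pair.
function makeCursor(results) {
  let position = 0;
  return {
    hasNext: () => position < results.length,
    next: () => {
      if (position >= results.length) throw new Error("cursor exhausted");
      return results[position++];
    }
  };
}

const mark = makeCursor([
  { _id: 1, StudName: "Raju Patil" },
  { _id: 4, StudName: "Aryan Patil" }
]);

const seen = [];
while (mark.hasNext()) {
  seen.push(JSON.stringify(mark.next())); // the shell version prints tojson(...)
}
console.log(seen);
```

A cursor hands out documents one at a time and remembers its position, so once the loop finishes there is nothing left to read from it.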