Lecture 31

MongoDB provides a mapReduce command to perform aggregation operations on data. The mapReduce operation involves defining a map function to emit key-value pairs and a reduce function to combine values for each key. Results can be stored in a new collection. MongoDB also uses master-slave replication, sharding to partition and distribute data, and BSON to store JSON documents.

Uploaded by

Aman Salman

MongoDB

Advanced
MapReduce with MongoDB

• MongoDB provides the mapReduce command

• Syntax: db.collectionName.mapReduce(mapFunction, reduceFunction, options)
MapReduce Operation

• The map function is defined as a JavaScript function

• The reduce function is also defined as a JavaScript function, using the function keyword
MapReduce Operation

• query: the criteria used to select documents first; if it is not given, the map-reduce is applied to all documents

• out: specifies where to store the MapReduce output

• In the example, order_totals will be a new collection


MapReduce
• The map and reduce functions can be defined outside the mapReduce call

• and then passed to it by name

db.testMR.mapReduce(
  mapFunction1,
  reduceFunction1,
  { out: "map_reduce_example" }
)
Example
• The collection contains documents like the one shown below

{
  cust_id: "abc123",
  ord_date: new Date("Oct 04, 2012"),
  status: 'A',
  price: 25,
  items: [ { sku: "mmm", qty: 5, price: 2.5 },
           { sku: "nnn", qty: 5, price: 2.5 } ]
}

• The MapReduce task: return the total price per customer


Define Map Function

• The map function reads each input document and emits

• cust_id and price as a key-value pair

• We can define the map function in JavaScript

var mapFunction1 = function() {
  emit(this.cust_id, this.price);
};

* this refers to the document that the map function is processing


Define the reduce function

• The reduce function receives the cust_id as the key and an array of all prices emitted for that key as the value

• In JavaScript, we can define the reduce function as

var reduceFunction1 = function(keyCustId, valuesPrices) {
  return Array.sum(valuesPrices);
};
Perform the MapReduce
db.orders.mapReduce(
  mapFunction1,
  reduceFunction1,
  { out: "sum_prices_per_customer" }
)

• We call the mapReduce operator on the collection

• pass the map function name

• pass the reduce function name

• and specify where to store the output
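To make the data flow concrete, here is a minimal plain-JavaScript sketch of what the map and reduce steps compute for the total price per customer. It does not call MongoDB: the helper simulateMapReduce, the sample array sampleOrders, and the way emit is passed in as an argument are all assumptions made for illustration. The real server also groups and reduces values incrementally rather than in a single pass.

```javascript
// Hypothetical in-memory stand-in for db.orders.mapReduce(...); not the MongoDB API.
function simulateMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map(); // key -> array of emitted values

  // Collects every emitted key-value pair, grouped by key.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Map phase: run mapFn with `this` bound to each document.
  for (const doc of docs) {
    mapFn.call(doc, emit);
  }

  // Reduce phase: combine each key's values into a single result.
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: reduceFn(key, values) });
  }
  return results;
}

// Sample data invented for this sketch (same shape as the slide's document).
const sampleOrders = [
  { cust_id: "abc123", price: 25 },
  { cust_id: "abc123", price: 30 },
  { cust_id: "xyz789", price: 10 },
];

const totals = simulateMapReduce(
  sampleOrders,
  function (emit) { emit(this.cust_id, this.price); },   // map
  (key, values) => values.reduce((a, b) => a + b, 0)      // reduce (plain JS instead of Array.sum)
);

console.log(totals);
```

Each element of totals has the same { _id, value } shape as the documents that mapReduce writes to its output collection.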


Other parameters
mapReduce takes

• Besides the map function, the reduce function, and out,

• the mapReduce operator can take

• query: performs selection (filters the input documents)

• sort: sorts the input documents

• limit: limits the number of input documents
Example

• Count the number of movies per year, starting from 2005

Sample document:
{
title: "Anthropoid",
year: 2016,
actors: [ ObjectId("8") ]
}
Example
• Count the number of movies per year, starting from 2005, and sort

Sample document:
{
title: "Anthropoid",
year: 2016,
actors: [ ObjectId("8") ]
}
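The slide's code for this example is not in the text, so what follows is only a sketch of one way the task could be written. The commented mongo shell call uses the query and sort options described earlier; the collection name movies and the output name movies_per_year are assumptions. The runnable part simulates the same logic in plain JavaScript on invented sample data, where only "Anthropoid" (2016) comes from the slide.

```javascript
// Possible mongo shell form (names are assumptions; sort applies to the input documents):
//   db.movies.mapReduce(
//     function () { emit(this.year, 1); },
//     function (key, values) { return Array.sum(values); },
//     { query: { year: { $gte: 2005 } }, sort: { year: 1 }, out: "movies_per_year" }
//   )

// Hypothetical sample data for the in-memory simulation.
const movies = [
  { title: "Anthropoid", year: 2016 },
  { title: "Movie A", year: 2016 },
  { title: "Movie B", year: 2004 },
  { title: "Movie C", year: 2005 },
];

function countMoviesPerYear(docs, fromYear) {
  const counts = new Map(); // year -> number of movies
  docs
    .filter((m) => m.year >= fromYear)   // query: keep movies from fromYear onward
    .sort((a, b) => a.year - b.year)     // sort: order the input by year
    .forEach((m) => {
      // Equivalent to emit(year, 1) followed by summing the 1s per year.
      counts.set(m.year, (counts.get(m.year) || 0) + 1);
    });
  return [...counts].map(([year, count]) => ({ _id: year, value: count }));
}

const perYear = countMoviesPerYear(movies, 2005);
console.log(perYear);
```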
Index
• If the data is not indexed, the DB must scan the entire collection to find documents that match the given conditions

• With indexes, MongoDB can find documents efficiently

• a primary feature for performance

• The primary index is applied on the identifier field (_id)

• created automatically

• Secondary indexes can be applied on any field

• created manually
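The difference between a full scan and an index lookup can be sketched in plain JavaScript. This is a simplification under stated assumptions: the sample documents and the helpers findByScan, buildIndex, and findByIndex are hypothetical, and a Map stands in for MongoDB's actual ordered index structure. In the mongo shell, a secondary index on a field is created with createIndex, for example db.orders.createIndex({ cust_id: 1 }).

```javascript
// Hypothetical sample documents.
const docs = [
  { _id: 1, cust_id: "abc123", price: 25 },
  { _id: 2, cust_id: "xyz789", price: 10 },
  { _id: 3, cust_id: "abc123", price: 30 },
];

// Without an index: every document has to be examined.
function findByScan(collection, field, value) {
  let examined = 0;
  const matches = [];
  for (const doc of collection) {
    examined += 1;
    if (doc[field] === value) matches.push(doc);
  }
  return { matches, examined };
}

// Build a simplified secondary index: field value -> positions of matching documents.
function buildIndex(collection, field) {
  const index = new Map();
  collection.forEach((doc, pos) => {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(pos);
  });
  return index;
}

// With an index: only the matching documents are touched.
function findByIndex(collection, index, value) {
  const positions = index.get(value) || [];
  return { matches: positions.map((p) => collection[p]), examined: positions.length };
}

const scan = findByScan(docs, "cust_id", "xyz789");
const custIndex = buildIndex(docs, "cust_id");
const indexed = findByIndex(docs, custIndex, "xyz789");

console.log(scan.examined, indexed.examined);
```

Both lookups return the same document, but the scan examines all three documents while the index lookup examines one.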
Index types
• Default: _id

• Single field

• user-defined on single field

• Compound fields

• user-defined on multiple fields

• Multikey index

• used to index content stored in arrays

• one index entry for each array element
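The "one index entry for each array element" idea can be illustrated with a short plain-JavaScript sketch. The helper buildMultikeyIndex and the sample documents are hypothetical, and the sorted array of entries is only a stand-in for the real index structure. In the mongo shell, indexing a field that holds an array, for example db.orders.createIndex({ "items.sku": 1 }), produces a multikey index.

```javascript
// Hypothetical documents whose `items` field is an array.
const orders = [
  { _id: 1, items: [{ sku: "mmm" }, { sku: "nnn" }] },
  { _id: 2, items: [{ sku: "nnn" }] },
];

// Emit one index entry per array element, then keep the entries ordered by key.
function buildMultikeyIndex(collection, arrayField, subField) {
  const entries = [];
  for (const doc of collection) {
    for (const element of doc[arrayField] || []) {
      entries.push({ key: element[subField], _id: doc._id });
    }
  }
  entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return entries;
}

const skuIndex = buildMultikeyIndex(orders, "items", "sku");
console.log(skuIndex);
```

Document 1 contributes two entries because its array has two elements, so the two documents yield three index entries in total.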


Index Structures
• Ordered

• values in the indexed field are sorted in either ascending or descending order

• Hashed

• indexes the hash of the values

• Text

• indexes the text of the fields

• useful for full-text search


Behind the scenes: BSON

• MongoDB uses BSON (Binary JSON) to store a binary representation of JSON documents

• JSON objects and arrays are serialized into binary
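As a rough illustration of what "serialized into binary" means, the following Node.js sketch encodes a tiny document by hand, following the BSON layout: a 4-byte little-endian total length, then one element per field (type byte, NUL-terminated field name, value), then a terminating zero byte. The function encodeBson is hypothetical and handles only strings and 32-bit integers. Treating every integer as int32 is an assumption of this sketch; real drivers choose among several numeric types.

```javascript
// Minimal, illustrative BSON encoder: strings and 32-bit integers only.
function encodeBson(doc) {
  const parts = [];
  for (const [name, value] of Object.entries(doc)) {
    const nameBytes = Buffer.from(name + "\0", "utf8"); // field name as a C string
    if (typeof value === "string") {
      const strBytes = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(strBytes.length); // string length includes the trailing NUL
      parts.push(Buffer.from([0x02]), nameBytes, len, strBytes); // 0x02 = string
    } else if (Number.isInteger(value)) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value); // throws if the value does not fit in 32 bits
      parts.push(Buffer.from([0x10]), nameBytes, num); // 0x10 = int32
    } else {
      throw new Error("type not supported by this sketch");
    }
  }
  const body = Buffer.concat(parts);
  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + body.length + 1); // length prefix + elements + terminator
  return Buffer.concat([total, body, Buffer.from([0x00])]);
}

const bytes = encodeBson({ hello: "world" });
console.log(bytes.length, bytes.toString("hex"));
```

For { hello: "world" } the result is 22 bytes long, and the first four bytes (16 00 00 00) encode that length.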


Behind the scenes:
Replication

• Master/slave replication (in current MongoDB replica sets, the roles are called primary and secondary)

• one replica is the master

• other replicas are slaves

• clients perform operations on the master replica
Behind the scenes:
Sharding
• MongoDB automatically partitions the data

• MongoDB partitions a collection

• using an indexed key that is immutable (for example, the _id)

• the data is divided into chunks

• when a chunk grows beyond the configured limit, it is split

• In the background

• MongoDB runs a chunk migration process

• to achieve load balancing
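The chunk-splitting rule can be sketched as a toy simulation in plain JavaScript. Everything here is an assumption made for illustration: chunks are modeled as sorted arrays of shard-key values, the limit MAX_DOCS_PER_CHUNK counts documents (MongoDB's configured limit is a size), and insertKey is a hypothetical helper. Chunk migration between shards is not modeled.

```javascript
// Hypothetical limit for this sketch: maximum number of keys per chunk.
const MAX_DOCS_PER_CHUNK = 4;

// `chunks` is an ordered list of chunks; each chunk is a sorted array of shard-key values.
function insertKey(chunks, key) {
  // Pick the last chunk whose smallest key is <= key (the first chunk otherwise).
  let target = 0;
  for (let i = 0; i < chunks.length; i++) {
    if (chunks[i].length > 0 && chunks[i][0] <= key) target = i;
  }

  const chunk = chunks[target];
  chunk.push(key);
  chunk.sort((a, b) => a - b);

  // Split the chunk in two once it grows beyond the configured limit.
  if (chunk.length > MAX_DOCS_PER_CHUNK) {
    const mid = Math.floor(chunk.length / 2);
    chunks.splice(target, 1, chunk.slice(0, mid), chunk.slice(mid));
  }
}

const chunks = [[]]; // start with a single empty chunk
for (const key of [10, 20, 30, 40, 50, 5, 60]) {
  insertKey(chunks, key);
}

console.log(JSON.stringify(chunks));
```

Inserting the fifth key pushes the single chunk over the limit, so it is split into two key ranges, and later inserts are routed to whichever range contains them.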


Summary
• MongoDB is a JSON document database

• Master/slave replication approach

• Query functionality

• CRUD operations

• Create, Read, Update, Delete

• MapReduce

• Index structures
