MongoDB Notes

Map-reduce is a data processing paradigm that condenses large volumes of data into useful aggregated results. MongoDB provides the mapReduce database command to perform map-reduce operations on collections, including sharded collections. The document demonstrates running a map-reduce operation on the orders1 collection to sum the amounts for each customer ID where the status is "A", outputting the results to the orders_totals collection. Another example shows using map-reduce to count the number of posts by each user from the c1 collection where the status is "active", outputting the results to the post_total collection.

Uploaded by

dojox78612

Academic Year 2022-23

Course : SY BTech IT
Course Name : ADBMSL

(Assignment No 4)

Map-reduce
Map-reduce is a data processing paradigm for condensing large volumes of data into useful
aggregated results. For map-reduce operations, MongoDB provides the mapReduce database
command.
Map-reduce supports operations on sharded collections, both as an input and as an output.
Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to
support deployments with very large data sets and high throughput operations.
Consider the following map-reduce operation:

Prof. Pallavi M. Tekade


> db.orders1.insertMany([{cust_id:"A123",amount:500,status:"A"},
{cust_id:"A123",amount:250,status:"A"},
{cust_id:"B212",amount:200,status:"A"},
{cust_id:"A123",amount:300,status:"D"}])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("59cde47a1ea6abd9870348f9"),
ObjectId("59cde47a1ea6abd9870348fa"),
ObjectId("59cde47a1ea6abd9870348fb"),
ObjectId("59cde47a1ea6abd9870348fc")
]
}

> db.orders1.find()
{ "_id" : ObjectId("59cde47a1ea6abd9870348f9"), "cust_id" : "A123", "amount" : 500, "status" : "A" }
{ "_id" : ObjectId("59cde47a1ea6abd9870348fa"), "cust_id" : "A123", "amount" : 250, "status" : "A" }
{ "_id" : ObjectId("59cde47a1ea6abd9870348fb"), "cust_id" : "B212", "amount" : 200, "status" : "A" }
{ "_id" : ObjectId("59cde47a1ea6abd9870348fc"), "cust_id" : "A123", "amount" : 300, "status" : "D" }

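> db.orders1.mapReduce(
function() { emit(this.cust_id, this.amount); },      // map: emit (cust_id, amount) for each order
function(key, values) { return Array.sum(values); },  // reduce: total the amounts for one customer
{ query: {status:"A"}, out: "orders_totals" }         // read only status "A"; write to orders_totals
)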
{
"result" : "orders_totals",
"timeMillis" : 419,
"counts" : {
"input" : 3,
"emit" : 3,
"reduce" : 1,
"output" : 2
},
"ok" : 1
}

> db.orders_totals.find()
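
The "counts" reported above (3 inputs, 3 emits, 1 reduce, 2 outputs) can be traced by hand. The following is a minimal sketch in plain JavaScript that runs outside MongoDB and imitates the query, map, and reduce phases on the same four documents. It is an illustration under stated assumptions, not MongoDB's implementation: simulateMapReduce and sum are hypothetical helpers, emit is passed to the map function as an argument (in the shell it is a global), and sum stands in for the shell's Array.sum.

```javascript
// Hypothetical helper: imitates query -> map -> group -> reduce in memory.
function simulateMapReduce(docs, query, mapFn, reduceFn) {
  const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
  const groups = new Map(); // key -> array of emitted values

  // Collects every (key, value) pair the map function emits.
  const emit = (key, value) => {
    counts.emit += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Simple equality-only query matcher (an assumption of this sketch).
  const matches = (doc) =>
    Object.entries(query).every(([field, wanted]) => doc[field] === wanted);

  for (const doc of docs) {
    if (!matches(doc)) continue;
    counts.input += 1;
    mapFn.call(doc, emit); // `this` is the current document
  }

  const results = [];
  for (const [key, values] of groups) {
    let value = values[0];
    // reduce is only needed when a key has more than one value
    if (values.length > 1) {
      counts.reduce += 1;
      value = reduceFn(key, values);
    }
    results.push({ _id: key, value });
  }
  counts.output = results.length;
  return { results, counts };
}

// Stand-in for the mongo shell's Array.sum.
const sum = (values) => values.reduce((a, b) => a + b, 0);

const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

const ordersRun = simulateMapReduce(
  orders,
  { status: "A" },
  function (emit) { emit(this.cust_id, this.amount); },
  (key, values) => sum(values)
);

console.log(ordersRun.results); // A123 totals 750, B212 totals 200
console.log(ordersRun.counts);  // input 3, emit 3, reduce 1, output 2
```

In this sketch the status "D" order is filtered out before the map phase, and reduce runs once because only A123 has more than one emitted value.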

Another example
The collection c1 contains one document per post, storing the post's text (post_text), the author's user_name, and the post's status.

> db.c1.insertMany([
{"post_text": "India is an awesome country","user_name": "sachin","status":"active"},
{"post_text": "welcome to India","user_name": "saurav","status":"active"},
{"post_text": "I live in India","user_name": "yuvraj","status":"active"},
{"post_text": "India is great","user_name": "gautam","status":"active"}])

Next, we run mapReduce on the c1 collection to select the active posts, group them by
user_name, and count the number of posts by each user:
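> db.c1.mapReduce(
function() { emit(this.user_name, 1); },              // map: emit 1 per post, keyed by user_name
function(key, values) { return Array.sum(values); },  // reduce: add up the 1s for one user
{ query: {status:"active"}, out: "post_total" }       // read only active posts; write to post_total
)

> db.post_total.find()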
> db.c1.insertMany([{"post_text": "India is an awesome country","user_name": "sachin","status":"active"},
... {"post_text": "welcome to India","user_name": "saurav","status":"active"},
... {"post_text": "I live in India","user_name": "yuvraj","status":"active"},
... {"post_text": "India is great","user_name": "gautam","status":"active"}])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("59dcdd4a26deb533e8c00fb6"),
ObjectId("59dcdd4a26deb533e8c00fb7"),
ObjectId("59dcdd4a26deb533e8c00fb8"),
ObjectId("59dcdd4a26deb533e8c00fb9")
]
}
> db.c1.mapReduce(
... function() { emit(this.user_name,1); },
... function(key, values) {return Array.sum(values)}, {
... query:{status:"active"},
... out:"post_total"
... }
... )
{
"result" : "post_total",
"timeMillis" : 101,
"counts" : {
"input" : 4,
"emit" : 4,
"reduce" : 0,
"output" : 4
},
"ok" : 1
}
> db.post_total.find()
{ "_id" : "gautam", "value" : 1 }
{ "_id" : "sachin", "value" : 1 }
{ "_id" : "saurav", "value" : 1 }
{ "_id" : "yuvraj", "value" : 1 }
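
In the run above "reduce" is 0 because every user_name appears exactly once, so no key has more than one value to combine and each emitted 1 is written out unchanged. When a user has several posts, MongoDB can invoke the reduce function more than once for the same key, feeding earlier reduce results back in as values. Summing the emitted 1s stays correct in that situation, while a reduce that counts values.length does not. The sketch below shows the difference in plain JavaScript; the five posts for "sachin", the chunk size, and the reduceInChunks helper are hypothetical and only imitate the repeated invocation.

```javascript
// Hypothetical emitted values for one user with five active posts:
// the map function emits 1 per post.
const emitted = [1, 1, 1, 1, 1];

// Two candidate reduce functions for counting posts.
const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);
const lengthReduce = (key, values) => values.length; // looks right, is not

// Hypothetical helper that imitates repeated reduce calls:
// reduce the values chunk by chunk, then reduce the partial results.
function reduceInChunks(reduceFn, key, values, chunkSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(key, values.slice(i, i + chunkSize)));
  }
  return reduceFn(key, partials);
}

const viaSum = reduceInChunks(sumReduce, "sachin", emitted, 2);
const viaLength = reduceInChunks(lengthReduce, "sachin", emitted, 2);

console.log(viaSum);    // 5: partial sums 2, 2, 1 add up to the true count
console.log(viaLength); // 3: counts the three partial results, not the posts
```

This is why the example emits 1 per post and reduces with a sum: the result is the same however the values are batched.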
