Notes For Question Bank
Module 3
1. Explain the document model with a suitable example.
MongoDB follows a document-oriented data model, where data is stored as
JSON-like documents (BSON format) instead of traditional rows and columns like in
relational databases.
The key thing to understand about the document model is that data that is
accessed together is stored together. In addition, documents in the same collection need
not have the same set of fields.
For general-purpose use, the document model is the preferred model for many
developers and database administrators.
Key Features of MongoDB's Document Model
MongoDB documents are composed of field-and-value pairs and have the following structure:
{
field1: value1,
field2: value2,
field3: value3,
...
fieldN: valueN
}
The value of a field can be any of the BSON data types, including other documents,
arrays, and arrays of documents. For example, the following document contains
values of varying types:
Example:
var mydoc = {
_id: ObjectId("5099803df3f4948bd2f98391"),
name: { first: "Alan", last: "Turing" },
birth: new Date('Jun 23, 1912'),
death: new Date('Jun 07, 1954'),
contribs: [ "Turing machine", "Turing test", "Turingery" ],
views : NumberLong(1250000)
}
The above fields have the following data types:
_id holds an ObjectId.
name holds an embedded document that contains the fields first and last.
birth and death hold values of the Date type.
contribs holds an array of strings.
views holds a value of the NumberLong type.
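The example above uses mongosh types (ObjectId, NumberLong) that plain JavaScript does not have. As a rough sketch under that limitation, the snippet below models a collection as an ordinary JavaScript array: Date stands in for the BSON Date type, BigInt stands in for NumberLong, _id is omitted, and the second document (with its email field) is purely hypothetical. It shows an embedded document, an array field, and two documents in the same collection with different fields. The getPath helper is a hypothetical stand-in for dot notation, not a MongoDB API.

```javascript
// A "collection" modeled as a plain array of documents (no MongoDB driver involved).
const people = [
  {
    name: { first: "Alan", last: "Turing" },                  // embedded document
    birth: new Date("1912-06-23"),                             // stands in for a BSON Date
    contribs: ["Turing machine", "Turing test", "Turingery"],  // array of strings
    views: 1250000n                                            // BigInt stands in for NumberLong
  },
  // Hypothetical second document: it shares "name" but has a different other field.
  { name: { first: "Jane", last: "Doe" }, email: "jane@example.com" }
];

// Follow a dotted path such as "name.last" (similar in spirit to MongoDB dot notation).
function getPath(doc, path) {
  return path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

const fieldSets = people.map(doc => Object.keys(doc)); // field names differ per document
const lastNames = people.map(doc => getPath(doc, "name.last"));
console.log(fieldSets);
console.log(lastNames); // [ 'Turing', 'Doe' ]
```

Because the name sub-document is stored inside each person, reading a last name needs no join: the data that is read together lives together.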
A multi-document transaction in MongoDB is atomic because all the writes to all the documents are
committed at once; if any write fails, the entire transaction is rolled back.
It is consistent because all of the checks are done within the transaction.
It is isolated because the "snapshot" isolation level is used to guarantee this.
It is durable because it uses the "majority" write concern to commit the data.
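To make atomicity (and, loosely, snapshot isolation) concrete, here is a toy in-memory sketch, not MongoDB's transaction machinery: every operation runs against a private copy of the data, and that copy replaces the real data only if all operations succeed. The runTransaction, debit and credit helpers and the account balances are hypothetical. In mongosh, a real transaction would instead be started on a session (for example with session.startTransaction()) against a replica set or sharded cluster.

```javascript
// Toy all-or-nothing commit (NOT how MongoDB implements transactions).
function runTransaction(store, operations) {
  // Work on a private snapshot so the real store never holds partial changes.
  const snapshot = new Map(store); // shallow copy; values here are plain numbers
  try {
    for (const op of operations) op(snapshot); // any operation may throw to signal failure
  } catch (err) {
    return false; // abort: the original store is untouched (rollback)
  }
  // Commit: reached only if every operation succeeded.
  store.clear();
  for (const [key, value] of snapshot) store.set(key, value);
  return true;
}

const accounts = new Map([["A", 100], ["B", 50]]);

// Hypothetical operations: a debit fails when funds are insufficient.
const debit = (id, amount) => snap => {
  if (snap.get(id) < amount) throw new Error(`insufficient funds in ${id}`);
  snap.set(id, snap.get(id) - amount);
};
const credit = (id, amount) => snap => snap.set(id, snap.get(id) + amount);

const ok = runTransaction(accounts, [debit("A", 30), credit("B", 30)]);
console.log(ok, Object.fromEntries(accounts)); // true { A: 70, B: 80 }

// The credit runs first, then the debit fails: the credit must not survive.
const failed = runTransaction(accounts, [credit("B", 500), debit("A", 500)]);
console.log(failed, Object.fromEntries(accounts)); // false { A: 70, B: 80 }
```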
3. Explain the concept of Scaling-Up and Scaling-Out in detail.
The vertical scaling approach, also referred to as "scaling up," focuses on adding more
resources or processing power to a single machine.
These additions may include CPU and RAM upgrades, which increase the processing speed of a
single server, or additional storage, which increases the capacity of a single machine to
address growing data requirements.
Advantage:
It is simpler than the alternative horizontal scaling approach: because hardware and/or
computing resources are added to only one machine, less complexity is involved.
Since only one machine is being upgraded, vertical scaling is often a more economical
choice in the short term than the horizontal scaling approach.
Disadvantage:
With a single machine, there is a limit to how much it can be upgraded or expanded.
This means that an organization's scalability needs may not be met if its single machine
doesn't have the necessary expansion capacity.
PROS
● The main benefit of vertical scaling is that nothing changes about your database
infrastructure other than the hardware specifications of the machine running the
database.
● As such, it's transparent to the application. The only difference is that you have more
CPUs, memory, and/or storage space.
● Vertical scaling is a good option to try first if massive storage and processing are not
required.
CONS
● The downside of scaling up is that servers with more storage and processing power can
be a lot more expensive.
● There is also a physical limit on the amount of CPUs, memory, network interfaces, and
hard-drives that can be used on a single machine.
● If scaling vertically requires migrating to different hardware, it could result in
downtime or service disruption.
Scaling Out (Horizontal approach) refers to adding more machines to further distribute the
load of the database and increase overall storage and/or processing power.
PROS
● Horizontal scaling is "infinitely scalable": you can always add another machine, even if
you are already using the largest machine available.
● Pricing increases are more predictable.
● Horizontal scaling can also deliver better performance and customer experience. For
example, global clusters can distribute data across regions to deliver better performance in
each region.
CONS
● Horizontal scaling may require application architecture and code changes due to the
distributed nature of the data.
● Database systems that are scaled horizontally can be more complicated to manage and
maintain, leading to more work for you and your team.
● Data must also be distributed evenly across the shards, with no duplicated or lost data,
which adds further complexity.
Sharding
The sharding method of horizontal scaling involves dividing a large database into smaller,
more manageable pieces (called shards) and then distributing the shards across multiple
machines. Each shard contains a subset of the data, and each machine is responsible for
storing, and serving requests for, a specific set of shards. For example, in the illustration
below, the data was divided into shards by price range.
This approach to horizontal scaling improves the system's fault tolerance and availability, as a
failure of one machine does not affect the remaining machines. However, it also adds
complexity, as it is important to make sure data is evenly distributed across the shards and
that no data is duplicated or lost.
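A minimal sketch of range-based partitioning, in the spirit of the price-range example: each document is routed to the shard whose price range contains it. The shard names, range boundaries and products are hypothetical. In MongoDB itself, a sharded cluster partitions data by a shard key (using ranges or hashes) and the routing is handled by the cluster, not by application code like this.

```javascript
// Toy range-based sharding: route each document to a shard by its price.
// Shard names and boundaries are hypothetical; each range is [min, max).
const shardRanges = [
  { shard: "shardA", min: 0, max: 100 },
  { shard: "shardB", min: 100, max: 500 },
  { shard: "shardC", min: 500, max: Infinity }
];

function shardFor(price) {
  const range = shardRanges.find(r => price >= r.min && price < r.max);
  if (!range) throw new RangeError(`no shard covers price ${price}`);
  return range.shard;
}

// Hypothetical product documents, distributed into per-shard lists.
const products = [
  { name: "Mouse", price: 25 },
  { name: "Monitor", price: 220 },
  { name: "Laptop", price: 1200 },
  { name: "Keyboard", price: 100 }
];

const shards = {};
for (const product of products) {
  const target = shardFor(product.price);
  if (!shards[target]) shards[target] = [];
  shards[target].push(product.name);
}
console.log(shards);
```

Non-overlapping ranges mean each document lands on exactly one shard (no duplicates), and throwing for an uncovered price is how this sketch avoids silently losing data. How evenly documents spread depends entirely on the chosen boundaries, which is the balancing concern described above.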
Replication
The replication method of scaling horizontally creates multiple copies of the same database
on multiple machines. Usually, one machine is designated as the primary machine (i.e., the
machine where database changes are first made), and all changes made to that database
are propagated to all other database replicas (i.e., the other machines holding copies of the
same database). This keeps all instances of the database up to date.
In MongoDB, replica sets are employed for this purpose. A primary server (node) accepts all
write operations, and the secondary servers apply those same operations, replicating the
data. If the primary server ever experiences a critical failure, one of the secondary
servers can be elected as the new primary node. When the failed server comes back online, it
rejoins as a secondary once it fully recovers, aiding the new primary.
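The toy model below sketches the sequence just described: the primary records each write in an operation log (oplog), secondaries replay entries they have not applied yet, a secondary is promoted when the primary fails, and the recovered node catches up as a secondary. It is a simplification under loud assumptions: all members share one in-memory log, and "election" simply picks the first healthy member, whereas real replica set elections involve votes and member priorities. The Member and ReplicaSet classes and their methods are hypothetical.

```javascript
// Toy model of primary/secondary replication (NOT MongoDB's implementation).
// Simplifications: one shared in-memory oplog; "election" picks the first healthy member.
class Member {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // this member's copy of the data
    this.applied = 0;      // how many oplog entries this member has applied
    this.up = true;        // false while the member is down
  }
}

class ReplicaSet {
  constructor(names) {
    this.members = names.map(name => new Member(name));
    this.primary = this.members[0];
    this.oplog = [];
  }

  // Only the primary accepts writes; each write is recorded in the oplog.
  write(key, value) {
    if (!this.primary.up) throw new Error("no primary available");
    this.oplog.push({ key, value });
    this.replicate();
  }

  // Every healthy member replays the oplog entries it has not applied yet.
  replicate() {
    for (const member of this.members) {
      if (!member.up) continue;
      while (member.applied < this.oplog.length) {
        const { key, value } = this.oplog[member.applied++];
        member.data.set(key, value);
      }
    }
  }

  // The primary fails; a healthy secondary is promoted to primary.
  failPrimary() {
    this.primary.up = false;
    const candidate = this.members.find(member => member.up);
    if (!candidate) throw new Error("no healthy member to elect");
    this.primary = candidate;
  }

  // A failed member comes back, catches up from the oplog, and stays a secondary.
  recover(name) {
    const member = this.members.find(m => m.name === name);
    member.up = true;
    this.replicate();
  }
}

const replicaSet = new ReplicaSet(["node0", "node1", "node2"]);
replicaSet.write("x", 1);
replicaSet.failPrimary();     // node0 goes down; node1 becomes primary
replicaSet.write("y", 2);     // accepted by the new primary
replicaSet.recover("node0");  // node0 rejoins as a secondary and catches up
console.log(replicaSet.primary.name);                      // node1
console.log(replicaSet.members.map(m => m.data.get("y"))); // [ 2, 2, 2 ]
```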
● The advantage of this horizontal scaling method is that system availability and fault tolerance
are greatly improved. Specifically, if the primary machine has an outage, one of the other
existing machines can be promoted to the status of primary machine. And, since all machines
have the same database with the same data stored, the system can continue to operate without
interruption. In addition, because there are more machines, performance can also improve,
as read requests can be distributed across multiple machines.
● Some of the disadvantages of replication include additional complexity and the risk of
duplicated or lost data (as with sharding). In addition, because replication requires
multiple copies of the database across multiple machines, additional system traffic
and storage requirements are also a concern. This can sometimes lead to additional system
and personnel costs.
The following aggregation pipeline example contains two stages and returns the total order
quantity of medium size pizzas grouped by pizza name:
db.orders.aggregate( [
// Stage 1: Filter pizza order documents by pizza size
{
$match: { size: "medium" }
},
// Stage 2: Group remaining documents by pizza name and calculate total quantity
{
$group: { _id: "$name", totalQuantity: { $sum: "$quantity" } }
}
])
Example output:
[
{ _id: 'Cheese', totalQuantity: 50 },
{ _id: 'Vegan', totalQuantity: 10 },
{ _id: 'Pepperoni', totalQuantity: 20 }
]
Note to remember:
The $match stage:
Filters the pizza order documents to pizzas with a size of medium.
Passes the remaining documents to the $group stage.
The $group stage:
Groups the remaining documents by pizza name.
Uses $sum to calculate the total order quantity for each pizza name. The total is stored
in the totalQuantity field returned by the aggregation pipeline.
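For intuition, the two stages can be mimicked with ordinary array operations. This is a plain-JavaScript sketch, not a call to MongoDB, and the orders documents are hypothetical sample data: filter plays the role of $match, and a Map keyed by pizza name plays the role of $group with $sum.

```javascript
// Plain-JavaScript re-implementation of the two stages, for intuition only.
// The sample orders are hypothetical.
const orders = [
  { name: "Pepperoni", size: "small", quantity: 10 },
  { name: "Pepperoni", size: "medium", quantity: 20 },
  { name: "Cheese", size: "medium", quantity: 30 },
  { name: "Cheese", size: "medium", quantity: 20 },
  { name: "Vegan", size: "large", quantity: 5 },
  { name: "Vegan", size: "medium", quantity: 10 }
];

// Stage 1 ($match): keep only medium pizzas.
const matched = orders.filter(doc => doc.size === "medium");

// Stage 2 ($group with $sum): total quantity per pizza name.
const totals = new Map();
for (const doc of matched) {
  totals.set(doc.name, (totals.get(doc.name) || 0) + doc.quantity);
}
const result = [...totals].map(([name, totalQuantity]) => ({ _id: name, totalQuantity }));
console.log(result);
```

As with a real $group stage, the order of the output groups should not be relied on; a $sort stage would be needed for a guaranteed order.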
8. Explain, with suitable examples, the difference between aggregation pipelines and single-purpose
aggregation methods
1. Aggregation Pipelines
The aggregation pipeline is a framework for data processing, allowing multiple stages to
transform documents sequentially.
Key Features:
Multiple Stages – Allows complex transformations using a sequence of operators.
Efficient Processing – Each stage refines data before passing it to the next.
Flexibility – Supports grouping, filtering, sorting, and computations.
Using an aggregation pipeline, we calculate the average rating for each product:
db.productReviews.aggregate([
{ $group: { _id: "$productId", avgRating: { $avg: "$reviewRating" } } }
]);
Here, $group groups the documents by productId and calculates the average review rating.
Output:
[
{ "_id": "A1", "avgRating": 4.5 },
{ "_id": "B2", "avgRating": 3.0 }
]
2. Single-Purpose Aggregation Methods
• The single purpose aggregation methods aggregate documents from a single collection.
The methods are simple but lack the capabilities of an aggregation pipeline.
• These are specialized functions designed for common aggregation tasks such as
counting, distinct values, and simple grouping.
Key Features
● Simpler & Faster – Optimized for specific queries.
● Limited Functionality – Cannot perform multi-stage transformations.
● Direct Methods – Used when aggregation pipelines are unnecessary.
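Method | Description
db.collection.estimatedDocumentCount() | Returns an approximate count of the documents in a collection or a view.
db.collection.count() | Returns a count of the number of documents in a collection or a view.
db.collection.distinct() | Returns an array of the distinct values for the specified field.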
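To contrast the two approaches, the sketch below uses plain JavaScript on a hypothetical productReviews array. Counting and finding distinct values are single operations that each return one simple result, roughly mirroring what count() and distinct() return; a per-product average needs grouping plus arithmetic, which is the kind of work the $group stage in a pipeline does. None of this code talks to MongoDB.

```javascript
// Plain-JavaScript analogues only; nothing here connects to MongoDB.
// Hypothetical review documents (productId and reviewRating fields only).
const productReviews = [
  { productId: "A1", reviewRating: 5 },
  { productId: "A1", reviewRating: 4 },
  { productId: "B2", reviewRating: 3 }
];

// Single-purpose style: one operation, one simple result.
// Roughly what db.productReviews.count({ productId: "A1" }) would return.
const countA1 = productReviews.filter(r => r.productId === "A1").length;
// Roughly what db.productReviews.distinct("productId") would return.
const distinctIds = [...new Set(productReviews.map(r => r.productId))];

// Pipeline style: grouping plus arithmetic, like $group with $avg.
const acc = new Map();
for (const r of productReviews) {
  const a = acc.get(r.productId) || { sum: 0, count: 0 };
  a.sum += r.reviewRating;
  a.count += 1;
  acc.set(r.productId, a);
}
const avgRatings = [...acc].map(([id, a]) => ({ _id: id, avgRating: a.sum / a.count }));

console.log(countA1);      // 2
console.log(distinctIds);  // [ 'A1', 'B2' ]
console.log(avgRatings);
```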
[
{
"$match": {
category: "Electronics"
}
},
{
"$group": {
_id: "$productName",
totalQuantitySold : {
"$sum": "$quantitySold"
}
}
}
]
OUTPUT:
[
{ "_id": "Smartphone", "totalQuantitySold": 30 },
{ "_id": "Laptop", "totalQuantitySold": 15 },
{ "_id": "Smartwatch", "totalQuantitySold": 8 }
]
Stages used in this pipeline:
$match: The $match stage in MongoDB's aggregation framework is used to filter
documents in a pipeline. It allows you to pass only the documents that meet certain criteria to
the next stage of the pipeline. In the above example it filters the documents to include only
those with the category "Electronics".
$group: enables grouping of documents and applying aggregate functions to the grouped
data. It is commonly used for data analysis, reporting, and summarization. Along with
basic aggregate functions like sum, count, and average, $group supports a variety of
other operations, such as finding the maximum or minimum value in a group ($max, $min),
collecting values into arrays ($push, $addToSet) and calculating standard deviations
($stdDevPop, $stdDevSamp). In the above example, it groups the documents by product name
and calculates the total quantity sold for each product.
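The sketch below approximates, in plain JavaScript, what a few of those $group accumulators compute for each group: $max, $min, $push (collect values into an array) and $stdDevPop (population standard deviation). The sales documents and the output field names are hypothetical; in a real pipeline these would be expressions such as { $max: "$quantitySold" } inside a $group stage.

```javascript
// Plain-JavaScript sketch of what a few $group accumulators compute per group.
// The sales documents are hypothetical.
const sales = [
  { productName: "Laptop", quantitySold: 5 },
  { productName: "Laptop", quantitySold: 10 },
  { productName: "Smartwatch", quantitySold: 2 },
  { productName: "Smartwatch", quantitySold: 6 }
];

// Group quantities by product name (like $push into an array).
const groups = new Map();
for (const doc of sales) {
  if (!groups.has(doc.productName)) groups.set(doc.productName, []);
  groups.get(doc.productName).push(doc.quantitySold);
}

const summary = [...groups].map(([name, qty]) => {
  const mean = qty.reduce((a, b) => a + b, 0) / qty.length;                 // like $avg
  const variance = qty.reduce((a, q) => a + (q - mean) ** 2, 0) / qty.length; // population variance
  return {
    _id: name,
    quantities: qty,               // like $push
    maxQty: Math.max(...qty),      // like $max
    minQty: Math.min(...qty),      // like $min
    stdDevQty: Math.sqrt(variance) // like $stdDevPop
  };
});
console.log(summary);
```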
10. Explain the concept of Map-Reduce in MongoDB. Illustrate this with a use case.
Map-reduce is a data processing paradigm for condensing large volumes of data into
useful aggregated results. To perform map-reduce operations, MongoDB provides
the mapReduce database command. (Starting in MongoDB 5.0, map-reduce is deprecated in
favor of the aggregation pipeline.)
In a map-reduce operation, MongoDB applies the map phase to each input document (i.e.,
the documents in the collection that match the query condition). The map function emits
key-value pairs. For keys that have multiple values, MongoDB applies
the reduce phase, which collects and condenses the aggregated data. MongoDB then stores
the results in a collection. Optionally, the output of the reduce function may pass through
a finalize function to further condense or process the results of the aggregation.
All map-reduce functions in MongoDB are written in JavaScript and run within the mongod process.
Map-reduce operations take the documents of a single collection as input and can
apply arbitrary sorting and limiting before the map stage begins. mapReduce can
return the results of a map-reduce operation inline as a document, or write the results to
a collection.
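The phases described above can be sketched as a small in-memory driver. This is a toy written for illustration, not MongoDB's implementation, and it differs in one visible way: MongoDB's map functions read the document through this and call a global emit(), whereas here emit is passed in as an argument. As a hypothetical use case, it totals a saleAmount per productId; the sales documents are made up.

```javascript
// Toy in-memory map-reduce driver (for illustration; NOT MongoDB's implementation).
function mapReduce(docs, map, reduce, finalize) {
  const emitted = new Map(); // key -> list of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };

  // Map phase: call the map function once per input document.
  for (const doc of docs) map(doc, emit);

  const results = [];
  for (const [key, values] of emitted) {
    // Reduce phase: as in MongoDB, only keys with more than one value are reduced.
    let value = values.length > 1 ? reduce(key, values) : values[0];
    // Optional finalize phase.
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value });
  }
  return results;
}

// Hypothetical use case: total sales amount per product.
const sales = [
  { productId: "P1", saleAmount: 200 },
  { productId: "P2", saleAmount: 50 },
  { productId: "P1", saleAmount: 300 }
];

const totals = mapReduce(
  sales,
  (doc, emit) => emit(doc.productId, doc.saleAmount),    // emit productId -> saleAmount
  (key, values) => values.reduce((sum, v) => sum + v, 0) // add up the amounts per key
);
console.log(totals);
```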
11. Explain in detail the scenarios in which embedded data models are typically used.
Solution:
Map Function: The map function emits each productId along with the saleAmount.
13. Using Map-Reduce in MongoDB, calculate the average review rating for each product
in a collection of customer reviews. Each document in the productReviews collection
contains fields such as productId, reviewRating, reviewText, and reviewDate. Write the Map
and Reduce functions to return the appropriate results. Show a schematic representation of
the Map-Reduce operation for the given scenario.
(Refer: Map-Reduce - MongoDB Manual v7.0)
Map Function: The map function emits each productId along with the
reviewRating.
var mapFunction = function() {
emit(this.productId, { sum: this.reviewRating, count: 1 });
};
Reduce Function: The reduce function accumulates the sum of review ratings
and the count of reviews.
var reduceFunction = function(key, values) {
var reducedValue = { sum: 0, count: 0 };
values.forEach(function(value) {
reducedValue.sum += value.sum;
reducedValue.count += value.count;
});
return reducedValue;
};
Finalize Function: The finalize function calculates the average rating.
var finalizeFunction = function(key, reducedValue) {
return reducedValue.count > 0 ? reducedValue.sum / reducedValue.count : 0;
};
Executing Map-Reduce
Run the Map-Reduce operation on the productReviews collection:
db.productReviews.mapReduce(
mapFunction,
reduceFunction,
{
out: "averageRatings",
finalize: finalizeFunction
}
);
Schematic Representation
-------------------------------------------------------
| productId | reviewRating | reviewText  | reviewDate |
-------------------------------------------------------
| 101       | 4.5          | "Great!"    | 2025-05-12 |
| 101       | 3.8          | "Good"      | 2025-05-14 |
| 102       | 5.0          | "Excellent" | 2025-05-10 |
| 102       | 4.2          | "Nice"      | 2025-05-11 |
-------------------------------------------------------
          |  Map Function
          v
----------------------------------------
| Key (productId) | Value {sum, count} |
----------------------------------------
| 101             | {4.5, 1}           |
| 101             | {3.8, 1}           |
| 102             | {5.0, 1}           |
| 102             | {4.2, 1}           |
----------------------------------------
          |  Reduce Function
          v
----------------------------------------
| Key (productId) | Value {sum, count} |
----------------------------------------
| 101             | {8.3, 2}           |
| 102             | {9.2, 2}           |
----------------------------------------
          |  Finalize Function
          v
-----------------------------
| productId | AverageRating |
-----------------------------
| 101       | 4.15          |
| 102       | 4.6           |
-----------------------------
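To check the schematic, the Map, Reduce and Finalize functions above can be run outside MongoDB with a toy driver that binds this to each document and temporarily provides a global emit(). The driver (simulateMapReduce) is a hypothetical helper written for this check, not a MongoDB API; the input is the four sample reviews from the schematic, keeping only the fields the functions use. Its result objects mirror the { _id, value } shape of the documents mapReduce writes to its output collection.

```javascript
// The Map, Reduce and Finalize functions from the answer above, copied unchanged.
var mapFunction = function() {
  emit(this.productId, { sum: this.reviewRating, count: 1 });
};
var reduceFunction = function(key, values) {
  var reducedValue = { sum: 0, count: 0 };
  values.forEach(function(value) {
    reducedValue.sum += value.sum;
    reducedValue.count += value.count;
  });
  return reducedValue;
};
var finalizeFunction = function(key, reducedValue) {
  return reducedValue.count > 0 ? reducedValue.sum / reducedValue.count : 0;
};

// Hypothetical toy driver: binds `this` to each document and provides a global emit().
function simulateMapReduce(docs, map, reduce, finalize) {
  const emitted = new Map();
  globalThis.emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  try {
    for (const doc of docs) map.call(doc); // map phase
  } finally {
    delete globalThis.emit; // remove the temporary global
  }
  const out = [];
  for (const [key, values] of emitted) {
    const reduced = values.length > 1 ? reduce(key, values) : values[0]; // reduce phase
    out.push({ _id: key, value: finalize(key, reduced) });              // finalize phase
  }
  return out;
}

// The four sample reviews from the schematic (reviewText and reviewDate are not needed).
const reviews = [
  { productId: 101, reviewRating: 4.5 },
  { productId: 101, reviewRating: 3.8 },
  { productId: 102, reviewRating: 5.0 },
  { productId: 102, reviewRating: 4.2 }
];
const averageRatings = simulateMapReduce(reviews, mapFunction, reduceFunction, finalizeFunction);
console.log(averageRatings);
```

Because reduceFunction returns the same { sum, count } shape that mapFunction emits, it stays correct if MongoDB re-reduces partial results; that is why the division happens in finalizeFunction rather than in the reduce step.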
Module 4
1. Explain the purpose of the insertOne() method in MongoDB with syntax and a suitable practical
example.
2. Explain the purpose of the insertMany() method in MongoDB with syntax and a suitable
practical example.
3. Explain the concept of bulk write in MongoDB in detail with syntax. Provide a suitable
example to perform ordered bulk insert with detailed execution steps.
4. Create a MongoDB database and collection to store student information (USN, Sname, Sem,
Branch, CGPA). Perform the below data manipulation and analysis tasks using MongoDB
queries :
a. Demonstrate insertion of multiple documents
b. Display all the documents
c. Display all the students in AIML
d. Display the number of students in CSE
e. Display all the ISE students with CGPA greater than 6, but less than 7.5
5. Create a MongoDB database to store product information (product_id, name, description, price,
category, brand, inventory). Perform the below data manipulation and analysis tasks using
MongoDB queries :
a. Find all products in the electronics category with a price greater than $500
b. Find all products by a specific brand
c. Find products with a price between $100 and $300
d. Find all products in the clothing category
e. Sort products by price in ascending order