Big Data
REPORT ON BIG DATA
BACHELOR OF ENGINEERING IN COMPUTER SCIENCE AND ENGINEERING
This assignment aims to provide a comprehensive understanding of Big Data Analytics through the
installation and demonstration of MongoDB, a widely-used big data tool. As part of the curriculum for
the 8th semester course Big Data Analytics (21CS71), the focus is on exploring MongoDB’s core
functionalities, architecture, and its role in handling and analyzing large volumes of unstructured and
semi-structured data.
The assignment begins with the successful installation and configuration of MongoDB on a local system,
followed by the creation and manipulation of databases and collections using real-time data samples. Core
data operations such as insertion, querying, updating, and deletion (CRUD) are performed using the
MongoDB shell and GUI tools like MongoDB Compass. This hands-on experience introduces students to
MongoDB’s data model, which uses flexible, JSON-like documents, facilitating scalable and efficient
data management suitable for big data environments.
In addition, the assignment explores MongoDB’s capabilities in distributed data processing and
storage, which are essential components of big data systems. Topics such as replication and sharding
are discussed to illustrate how MongoDB ensures high availability, fault tolerance, and horizontal
scalability. The concept of replica sets and the use of quorums in maintaining consistency across
distributed systems directly support Course Outcomes CO1 and CO2, which relate to understanding data
distribution and consistency in big data architectures.
The assignment also includes an overview of aggregation pipelines, indexing strategies, and
MongoDB’s support for parallel data processing, which collectively enhance performance in big data
analytics tasks. Though MapReduce is not the default processing model in MongoDB, its legacy support
is touched upon to align with CO3, introducing students to the fundamentals of parallel computation
within big data systems.
TABLE OF CONTENTS
1 INTRODUCTION
1.1. What is NoSQL?
1.2. What is MongoDB?
1.3. Problem Statement
2 INSTALLATION OF MONGODB
3 MONGODB OPERATIONS
4 CHALLENGES AND FUTURE OF NOSQL DATABASES
5 CONCLUSION
6 REFERENCES
CHAPTER 1
INTRODUCTION
NoSQL databases are designed to efficiently manage large volumes of data that may not
conform to the rigid structures of traditional relational databases. Unlike relational models that
store data in predefined tables with strict schemas, NoSQL systems provide more flexible data
storage options, allowing for rapid development and scalability.
The term “NoSQL” stands for “Not Only SQL,” indicating that these systems support a variety
of data models, including key-value, document, column-family, and graph formats. These
databases are optimized for high performance, horizontal scaling, and distributed data storage.
They are commonly used in applications where data structures frequently change, such as web
applications, social networks, real-time analytics, and Internet of Things (IoT) systems.
NoSQL databases also emphasize eventual consistency and fault tolerance, making them
suitable for distributed environments and big data use cases.
MongoDB is a document-oriented NoSQL database that stores data in BSON (Binary JSON)
format, allowing for dynamic, schema-less structures. It enables developers to work with
complex, nested data without needing to define rigid schemas in advance. This flexibility
makes MongoDB well-suited for applications with rapidly changing data models.
Key features of MongoDB include support for high availability through replica sets, horizontal
scalability via sharding, and robust query capabilities using both the MongoDB shell and GUI
tools like MongoDB Compass. It also offers powerful aggregation pipelines for data
processing and transformation.
In this assignment, MongoDB was installed and configured on a local system. The
demonstration includes creating databases and collections, performing CRUD (Create, Read,
Update, Delete) operations, and exploring the replication model. Particular attention was given
to MongoDB’s consistency model and the role of quorums in maintaining eventual
consistency across distributed systems.
By working with MongoDB, this assignment provides hands-on experience with a leading
NoSQL technology and reinforces theoretical concepts related to unstructured data
management, distributed databases, and modern application development.
CHAPTER 2
INSTALLATION OF MONGODB
This guide outlines the installation of MongoDB 8.0 Community Edition on supported 64-bit
Windows platforms using the MSI installation wizard. It includes basic configuration steps and
considerations for both service-based and manual operation modes.
• If Installed as a Service
MongoDB starts automatically after installation. Modify configuration via
<install dir>\bin\mongod.cfg if needed, then restart the service through
the Windows Services console.
• If Not Installed as a Service
Create the data directory (C:\data\db) and start MongoDB manually via the
command prompt:
"C:\Program Files\MongoDB\Server\8.0\bin\mongod.exe" --dbpath="C:\data\db"
Install mongosh, add it to your system PATH, and run mongosh.exe to connect to the
MongoDB instance.
Refer to the MongoDB documentation for information on CRUD operations and deployment
connections.
Additional Configuration
• bindIp Setting
By default, MongoDB binds to 127.0.0.1. Modify bindIp in the config file or
use --bind_ip to allow external connections. Ensure proper security measures are
in place before exposing MongoDB to public networks.
• PATH Environment Variable
Add C:\Program Files\MongoDB\Server\8.0\bin and the path to
mongosh to the system PATH to simplify command-line access.
• Upgrades
The .msi installer supports automatic upgrades within the same release series (e.g.,
8.0.1 to 8.0.2). For major version upgrades, reinstall MongoDB.
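The bindIp, port, and data-directory settings above live in the YAML configuration file mentioned earlier (<install dir>\bin\mongod.cfg). A minimal sketch follows; the paths, particularly the log location, are assumptions and should match your install:

```yaml
# Minimal mongod.cfg sketch; paths are illustrative.
storage:
  dbPath: "C:/data/db"           # data directory created during installation
net:
  bindIp: 127.0.0.1              # default; widen only with proper security in place
  port: 27017                    # MongoDB's default port
systemLog:
  destination: file
  path: "C:/data/log/mongod.log" # assumed log location
  logAppend: true
```

After editing the file, restart the MongoDB service from the Windows Services console for the changes to take effect.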
CHAPTER 3
MONGODB OPERATIONS
MongoDB uses documents stored inside collections. Each document is a JSON-like structure
(internally BSON – Binary JSON), allowing flexible, hierarchical data. Below are examples of
how to perform each CRUD operation in MongoDB.
In MongoDB, Create operations refer to inserting new documents into a collection. MongoDB
provides two primary methods: insertOne() for a single document and insertMany() for multiple documents.
Each document must follow the BSON (Binary JSON) structure, which allows for flexibility
in types (arrays, objects, booleans, etc.).
Basic Example:

db.students.insertOne({
  name: "Abhishek",
  age: 22,
  branch: "CSE"
})

A document can also contain arrays, nested objects, booleans, and dates:

db.students.insertOne({
  name: "Sneha",
  age: 21,
  branch: "AI & DS",
  marks: [88, 92, 95],
  address: {
    city: "Hyderabad",
    pincode: 500081
  },
  isActive: true,
  created_at: new Date()
})
Basic Example:

db.students.insertMany([
  { name: "Raj", age: 20, branch: "MECH" },
  { name: "Pooja", age: 22, branch: "ECE" },
  { name: "Vikram", age: 23, branch: "CIVIL" }
])

db.students.insertMany([
  { name: "Ananya", gender: "Female" },
  { name: "Amit", age: 25, department: "IT", isGraduated: false }
])
MongoDB collections are schema-less, which means each document can have different fields
or structures.
MongoDB automatically generates a unique _id for each document. However, you can assign
a custom _id if needed:
db.students.insertOne({
  _id: 101,
  name: "Ravi",
  age: 21
})
Note: Attempting to insert another document with the same _id will result in an error.
By default, if one document insertion fails (e.g., due to duplicate _id), the whole operation
stops. You can override this behavior using the ordered: false option:
db.students.insertMany([
  { _id: 1, name: "A" },
  { _id: 1, name: "B" }, // Duplicate _id, causes an error
  { _id: 2, name: "C" }
], { ordered: false }) // Other valid documents are still inserted
• Always include a created_at timestamp for tracking when the data was inserted.
• Use consistent field naming conventions.
• Validate data at the application level or using MongoDB’s schema validation.
• Avoid inserting large arrays or deeply nested structures unless necessary.
Read operations in MongoDB are used to retrieve documents from a collection. MongoDB
provides several methods for querying data, primarily using the find() and findOne()
methods. These operations support filtering, projections, sorting, pagination, and the use of
various query operators.
The find() method returns a cursor to the documents that match the query. If no query is
specified, all documents are returned.
Example:
db.students.find()
Find students older than 21 using the $gt operator:

db.students.find({ age: { $gt: 21 } })

The findOne() method returns the first matching document instead of a cursor:

db.students.findOne({ name: "Abhishek" })
D. Query Operators
• $eq – Equal to
• $ne – Not equal to
• $gt / $gte – Greater than / greater than or equal to
• $lt / $lte – Less than / less than or equal to
• $in – Matches any value in a given array

Examples:

db.students.find({ branch: { $eq: "CSE" } })
db.students.find({ age: { $in: [20, 21] } })
E. Sorting Results
Sorting can be done using the sort() method. Use 1 for ascending and -1 for descending.
db.students.find().sort({ age: -1 })
Use limit() and skip() for pagination or limiting the number of results.
Limit to 5 documents:

db.students.find().limit(5)

Skip the first 5 documents:

db.students.find().skip(5)

Combine skip() and limit() for pagination (second page of 5):

db.students.find().skip(5).limit(5)
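The skip()/limit() pattern generalizes to page-based access: skip (page - 1) * pageSize documents, then limit to pageSize. A plain-JavaScript sketch of that arithmetic (numbering pages from 1 is an assumption):

```javascript
// skip/limit pagination arithmetic; pages are numbered starting at 1.
function pageWindow(page, pageSize) {
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Page 2 with 5 documents per page corresponds to .skip(5).limit(5)
console.log(pageWindow(2, 5)); // { skip: 5, limit: 5 }
```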
G. Counting Documents
You can count how many documents match a query using countDocuments().

Example:

db.students.countDocuments({ branch: "CSE" })
Update operations in MongoDB are used to modify existing documents in a collection. You
can update specific fields, add new fields, or even replace entire documents. MongoDB
provides methods such as updateOne(), updateMany(), and replaceOne().
Example:
db.students.updateOne(
  { name: "Abhishek" },
  { $set: { age: 23 } }
)

You can also use operators like $inc, $unset, $rename, etc.

Increment age by 1:

db.students.updateOne(
  { name: "Abhishek" },
  { $inc: { age: 1 } }
)
Example:
db.students.updateMany(
  { branch: "CSE" },
  { $set: { isEligible: true } }
)
This method replaces the entire document with the new one provided. Useful when you want
to overwrite all fields.
Example:
db.students.replaceOne(
  { name: "Sneha" },
  {
    name: "Sneha",
    age: 22,
    branch: "AI & DS",
    isActive: true
  }
)
Unset a field:
db.students.updateOne(
  { name: "Sneha" },
  { $unset: { isActive: "" } }
)

Rename a field:

db.students.updateOne(
  { name: "Ravi" },
  { $rename: { "branch": "department" } }
)
E. Upsert Option
With upsert: true, MongoDB inserts a new document built from the filter and the update if no
document matches the filter; otherwise it updates the matching document as usual.

Example:

db.students.updateOne(
  { name: "Karthik" },
  { $set: { age: 24, branch: "EEE" } },
  { upsert: true }
)
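To see what upsert changes, here is a plain-JavaScript sketch of the behavior over an in-memory array standing in for the students collection (an illustration, not the MongoDB API):

```javascript
// Sketch of updateOne(filter, { $set: fields }, { upsert: true }) semantics.
const students = [{ name: "Abhishek", age: 22, branch: "CSE" }];

function updateOneWithUpsert(docs, filter, setFields) {
  const doc = docs.find((d) =>
    Object.entries(filter).every(([k, v]) => d[k] === v)
  );
  if (doc) {
    Object.assign(doc, setFields);        // matched: apply the $set fields in place
    return { matchedCount: 1, upsertedCount: 0 };
  }
  docs.push({ ...filter, ...setFields }); // no match: insert filter + $set fields
  return { matchedCount: 0, upsertedCount: 1 };
}

// No student named "Karthik" exists, so a new document is inserted.
const res = updateOneWithUpsert(students, { name: "Karthik" }, { age: 24, branch: "EEE" });
console.log(res); // { matchedCount: 0, upsertedCount: 1 }
```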
Delete operations in MongoDB are used to remove documents from a collection. MongoDB
provides two primary methods for deletion: deleteOne() and deleteMany().

Example:

db.students.deleteOne({ name: "Abhishek" })

This will delete the first document where the name is "Abhishek."

If you want to remove all documents from a collection but not drop the collection itself, use
deleteMany() with an empty query.

Example:

db.students.deleteMany({})
The Aggregation Framework in MongoDB is a powerful tool that allows you to process data
records and return computed results. It is similar to SQL's GROUP BY and JOIN operations.
The aggregation pipeline is a series of stages that process data, transforming it into an
aggregated result.
A. Aggregation Pipeline
The aggregation pipeline is a sequence of stages, each performing a specific operation on the
input data, such as filtering, grouping, or sorting.
Basic Structure:
db.collection.aggregate([
  { stage1 },
  { stage2 },
  { stage3 }
])
Each stage transforms the data and passes it to the next stage.
db.students.aggregate([
  { $match: { branch: "CSE" } }
])

Filters the input to documents where branch is "CSE".

db.students.aggregate([
  { $group: { _id: "$branch", total_students: { $sum: 1 } } }
])

Groups students by branch and calculates the total number of students in each branch.

db.students.aggregate([
  { $sort: { age: -1 } }
])

Sorts students by age in descending order.

db.students.aggregate([
  { $project: { name: 1, age: 1, _id: 0 } }
])

Projects only the name and age fields, excluding the _id field.

db.students.aggregate([
  { $skip: 5 },
  { $limit: 5 }
])

Skips the first five documents and passes the next five down the pipeline, which is useful for pagination.
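To make the $group stage concrete, here is a plain-JavaScript sketch of what { $group: { _id: "$branch", total_students: { $sum: 1 } } } computes over a hypothetical in-memory students array (an illustration, not the MongoDB API):

```javascript
// Hypothetical documents standing in for the students collection.
const students = [
  { name: "Abhishek", branch: "CSE" },
  { name: "Raj", branch: "MECH" },
  { name: "Pooja", branch: "ECE" },
  { name: "Ravi", branch: "CSE" },
];

// Equivalent of { $group: { _id: "$branch", total_students: { $sum: 1 } } }
function groupByBranch(docs) {
  const totals = {};
  for (const doc of docs) {
    totals[doc.branch] = (totals[doc.branch] || 0) + 1; // $sum: 1 counts documents
  }
  // Shape the output like MongoDB's result documents.
  return Object.entries(totals).map(([branch, n]) => ({ _id: branch, total_students: n }));
}

console.log(groupByBranch(students));
```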
MongoDB’s $lookup operator allows you to perform left outer joins between collections.
db.orders.aggregate([
  {
    $lookup: {
      from: "products",
      localField: "product_id",
      foreignField: "_id",
      as: "product_details"
    }
  }
])
This query joins the orders collection with the products collection where product_id in the
orders collection matches the _id in the products collection.
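Conceptually, $lookup performs a left outer join: every order is kept, and orders without a matching product get an empty array. A plain-JavaScript sketch over hypothetical orders and products data:

```javascript
// Hypothetical collections; product_id 99 has no matching product.
const orders = [
  { _id: 1, product_id: 10 },
  { _id: 2, product_id: 99 },
];
const products = [{ _id: 10, name: "Laptop" }];

// Left-outer-join sketch of $lookup: all matching foreign documents are
// collected into the "as" array (empty when nothing matches).
function lookup(local, foreign, localField, foreignField, as) {
  return local.map((doc) => ({
    ...doc,
    [as]: foreign.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const joined = lookup(orders, products, "product_id", "_id", "product_details");
console.log(joined[0].product_details, joined[1].product_details);
```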
db.students.aggregate([
  { $match: { branch: "CSE" } },
  { $count: "total_students_in_cse" }
])

Counts the documents that pass the $match stage and returns the result in a field named total_students_in_cse.
3.3 MapReduce
MapReduce in MongoDB is a powerful tool for performing complex data processing tasks that
require transforming and aggregating large datasets. It is based on the Map and Reduce
operations, which are typically used for parallel processing and can handle large-scale
operations.
A. Overview of MapReduce
• Map Function: The map function processes each document and outputs key-value
pairs.
• Reduce Function: The reduce function groups the results by keys and combines them
into a single output.
Basic Syntax:
db.collection.mapReduce(
  mapFunction,
  reduceFunction,
  { options }
)
Let's assume we have a sales collection, and we want to calculate the total sales per product.
The map function emits a key-value pair for each document, where the key is the product name,
and the value is the sale amount.
The reduce function sums the sale amounts for each product.
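The two functions described above can be sketched as follows. The field names product and amount are assumptions, and emit() is defined locally so the sketch runs as plain JavaScript; in mongosh, the server supplies emit and binds this to the current document.

```javascript
// Hypothetical documents standing in for the sales collection.
const sales = [
  { product: "Laptop", amount: 55000 },
  { product: "Mouse", amount: 500 },
  { product: "Laptop", amount: 60000 },
];

// In MongoDB, emit() is provided by the server; here we collect pairs ourselves.
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Map function: emits (product, amount) for each document.
const mapFunction = function () {
  emit(this.product, this.amount);
};

// Reduce function: sums the sale amounts emitted for one product key.
const reduceFunction = function (key, values) {
  return values.reduce((sum, v) => sum + v, 0);
};

// Simulate the map phase over every document...
for (const doc of sales) mapFunction.call(doc);

// ...then the reduce phase: group emitted values by key, reduce each group.
const grouped = {};
for (const [key, value] of emitted) {
  (grouped[key] = grouped[key] || []).push(value);
}
const totals = Object.fromEntries(
  Object.keys(grouped).map((k) => [k, reduceFunction(k, grouped[k])])
);
console.log(totals); // { Laptop: 115000, Mouse: 500 }
```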
Running MapReduce:

db.sales.mapReduce(
  mapFunction,
  reduceFunction,
  { out: "total_sales_per_product" }
)

This will output the result into a new collection called total_sales_per_product, where each
document will contain the product name and the total sales amount.
C. Output Options
• out: Specifies where to store the output of the MapReduce operation. Options include:
  o A collection (out: "collection_name")
  o Inline results (out: { inline: 1 }), returned directly instead of being written to a collection
  o A merge into an existing collection (out: { merge: "existing_collection" })

db.sales.mapReduce(
  mapFunction,
  reduceFunction,
  { out: { inline: 1 } }
)
D. Limitations and Use Cases of MapReduce
• Performance: MapReduce can be slower than aggregation operations for many use
cases. It requires writing results to disk, which can be inefficient for large datasets.
• Custom Reductions: When you need to define custom logic for combining values,
MapReduce provides flexibility that aggregation may not offer.
• Parallel Processing: For large-scale data processing that can be split into smaller tasks,
MapReduce remains helpful.
3.5 Indexing
A. Types of Indexes
1. Single Field Index: The simplest type of index, created on a single field.
2. Compound Index: An index on multiple fields, which is useful when queries involve
multiple fields.
3. Text Index: Used for text search on string fields.
4. Hashed Index: Primarily used for sharded collections to ensure the distribution of data
across shards.
5. Geospatial Index: Used for queries on location-based data.
6. Wildcard Index: Allows indexing on all fields in a document.
B. Creating an Index
Syntax:
db.students.createIndex({ name: 1 })
This index will allow for faster queries based on the name field in the students collection.
A compound index involves multiple fields. This is useful when you frequently query on
multiple fields together.

db.students.createIndex({ branch: 1, age: -1 })

This index will improve the performance of queries that filter by branch and sort by age in
descending order.
C. Dropping an Index
If an index is no longer needed, it can be dropped to save space and improve performance on
write operations.
Drop a single index by name:

db.students.dropIndex("index_name")

Drop all indexes on the collection (the default _id index is kept):

db.students.dropIndexes()
D. Indexing Strategies
• Use Indexes for Frequently Queried Fields: Index fields that are frequently used in
queries, particularly in find(), sort(), update(), or delete() operations.
• Limit the Number of Indexes: Each index consumes memory, and excessive indexes
can slow down write operations. Keep the number of indexes manageable.
• Use Compound Indexes When Applicable: Compound indexes are particularly useful
when multiple fields are used together in queries. They avoid the need for multiple
single-field indexes.
• Consider Indexing for Sorting: If queries often involve sorting on specific fields,
creating an index on those fields can speed up the sorting process.
• Analyze Query Performance: Use MongoDB's explain() method to understand how
indexes are being used in queries and whether performance improvements are needed.
Consider a query that filters by branch and age:

db.students.find({ branch: "CSE", age: 20 })

Without an index on branch and age, MongoDB will need to scan the entire collection.
Creating a compound index:

db.students.createIndex({ branch: 1, age: 1 })

This index allows MongoDB to quickly locate the documents where branch is "CSE" and age
is 20, improving performance.
CHAPTER 4
CHALLENGES AND FUTURE OF NOSQL DATABASES
While NoSQL databases like MongoDB offer several advantages over traditional relational
databases, such as flexibility in handling unstructured data and scalability, they come with their
own set of challenges:
• Consistency: replication and eventual consistency make strong guarantees harder to provide in distributed deployments.
• Complex querying: some operations that are straightforward in SQL require more effort, and joins are limited.
• Learning curve: developers transitioning from SQL databases must adopt a new data model and query language.
Despite these challenges, the future of NoSQL databases like MongoDB looks promising due
to several emerging trends and advancements:
• Improved multi-document transaction support.
• Tighter integration with cloud-native and serverless architectures.
• Support for time-series data and growing integration with artificial intelligence, machine learning, and big data analytics.
CHAPTER 5
CONCLUSION
In conclusion, NoSQL databases, especially MongoDB, have revolutionized the way data is
stored and processed, offering solutions for modern applications that demand scalability,
flexibility, and high performance. Unlike traditional relational databases, MongoDB’s
document-oriented model allows developers to store data in a more natural, hierarchical format,
making it an ideal choice for handling unstructured and semi-structured data.
MongoDB's strength lies in its ability to scale horizontally, distribute data across multiple
nodes, and provide high availability and fault tolerance. Its flexible schema design allows rapid
iteration and agile development, while its built-in features like the aggregation framework,
MapReduce, and indexing significantly improve query performance and data processing
capabilities.
Despite these advantages, MongoDB is not without its challenges. Issues such as consistency
in distributed systems, complex querying, and the learning curve for developers transitioning
from SQL databases are areas that continue to pose difficulties. However, MongoDB's
continued advancements—particularly in transaction support, improved querying capabilities,
and integration with cloud-native and serverless architectures—are addressing these limitations
and expanding its use cases.
Looking ahead, MongoDB’s integration with emerging technologies like artificial intelligence,
machine learning, and big data analytics will further enhance its utility, enabling businesses to
harness the power of large datasets. The support for graph databases and time-series data,
combined with hybrid models that blend the best of both SQL and NoSQL, ensures that
MongoDB will remain a critical tool in the developer's toolkit.
As the demand for real-time data processing, high availability, and cloud-based applications
continues to rise, MongoDB's role as a leading NoSQL database will only grow. Its growing
ecosystem, improved features, and vast community support make it a powerful choice for
modern applications across industries such as e-commerce, social media, IoT, finance, and
healthcare.
In summary, MongoDB, with its unique features and growing capabilities, provides a robust
solution to the data management challenges of modern applications. Its continuous evolution
ensures that it will remain at the forefront of the NoSQL revolution, empowering developers
and businesses to build scalable, high-performance systems.
CHAPTER 6
REFERENCES
• MongoDB Documentation. (n.d.). MongoDB Manual. MongoDB, Inc. Retrieved from
https://docs.mongodb.com/
• Chodorow, K. (2013). MongoDB: The Definitive Guide. O'Reilly Media.
• Giamas, A. (2017). Mastering MongoDB: The Complete Guide to MongoDB
Development and Administration. Packt Publishing.
• MongoDB Atlas Documentation. (n.d.). MongoDB Atlas: Managed MongoDB in the
Cloud. MongoDB, Inc. Retrieved from https://www.mongodb.com/cloud/atlas
• Rhys, C. (2020). MongoDB in Action. Manning Publications.
• Finkel, H. (2015). Learning MongoDB: A Hands-on Guide to Building Applications
with MongoDB. Packt Publishing.
• O'Reilly Media. (2015). Learning MongoDB. Retrieved from
https://www.oreilly.com/library/view/learning-mongodb/9781785884334/
• Grolinger, K., Hughes, K., & Buckley, K. (2013). Data Management in the Cloud:
Challenges and Opportunities. International Journal of Cloud Computing and
Services Science (IJCCSS), 2(3), 1-18.
• Nunn, M., & Denny, M. (2017). Practical MongoDB: Architecting, Developing, and
Administering MongoDB. Apress.