
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI, KARNATAKA-590 018

REPORT ON

BIG DATA ANALYTICS ASSIGNMENT


Submitted in partial fulfilment of the requirements for the Big Data Analytics (21CS71)
course of the 8th semester.

BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING

KAJAL GUPTA (1JS21CS075)

Under the guidance of


Dr. ABHILASH C B
Associate Professor, CSE Department

JSS ACADEMY OF TECHNICAL EDUCATION, BENGALURU


Department of Computer Science and Engineering
2024 – 2025
JSS MAHAVIDYAPEETHA, MYSURU
ABSTRACT

This assignment aims to provide a comprehensive understanding of Big Data Analytics through the
installation and demonstration of MongoDB, a widely-used big data tool. As part of the curriculum for
the 8th semester course Big Data Analytics (21CS71), the focus is on exploring MongoDB’s core
functionalities, architecture, and its role in handling and analyzing large volumes of unstructured and
semi-structured data.

The assignment begins with the successful installation and configuration of MongoDB on a local system,
followed by the creation and manipulation of databases and collections using real-time data samples. Core
data operations such as insertion, querying, updating, and deletion (CRUD) are performed using the
MongoDB shell and GUI tools like MongoDB Compass. This hands-on experience introduces students to
MongoDB’s data model, which uses flexible, JSON-like documents, facilitating scalable and efficient
data management suitable for big data environments.

In addition, the assignment explores MongoDB’s capabilities in distributed data processing and
storage, which are essential components of big data systems. Topics such as replication and sharding
are discussed to illustrate how MongoDB ensures high availability, fault tolerance, and horizontal
scalability. The concept of replica sets and the use of quorums in maintaining consistency across
distributed systems directly support Course Outcomes CO1 and CO2, which relate to understanding data
distribution and consistency in big data architectures.

The assignment also includes an overview of aggregation pipelines, indexing strategies, and
MongoDB’s support for parallel data processing, which collectively enhance performance in big data
analytics tasks. Though MapReduce is not the default processing model in MongoDB, its legacy support
is touched upon to align with CO3, introducing students to the fundamentals of parallel computation
within big data systems.
TABLE OF CONTENTS

1  INTRODUCTION
   1.1  What is NoSQL?
   1.2  What is MongoDB?
   1.3  Problem Statement

2  MONGODB INSTALLATION PROCESS
   2.1  Pre-Installation Notes
   2.2  Installation Steps
   2.3  Post-Installation Options
   2.4  Connecting to MongoDB

3  WORKING WITH MONGODB
   3.1  CRUD Operations
   3.2  Aggregation Framework
   3.3  MapReduce
   3.4  Indexing and Results

4  CHALLENGES AND FUTURE PROSPECTS
   4.1  Challenges in MongoDB and NoSQL Databases
   4.2  Future Prospects of NoSQL and MongoDB

5  CONCLUSION

6  REFERENCES

CHAPTER 1

INTRODUCTION

1.1 What is NOSQL?

NoSQL databases are designed to efficiently manage large volumes of data that may not
conform to the rigid structures of traditional relational databases. Unlike relational models that
store data in predefined tables with strict schemas, NoSQL systems provide more flexible data
storage options, allowing for rapid development and scalability.

The term “NoSQL” stands for “Not Only SQL,” indicating that these systems support a variety
of data models, including key-value, document, column-family, and graph formats. These
databases are optimized for high performance, horizontal scaling, and distributed data storage.
They are commonly used in applications where data structures frequently change, such as web
applications, social networks, real-time analytics, and Internet of Things (IoT) systems.

NoSQL databases also emphasize eventual consistency and fault tolerance, making them
suitable for distributed environments and big data use cases.

1.2 What is MongoDB?

MongoDB is a document-oriented NoSQL database that stores data in BSON (Binary JSON)
format, allowing for dynamic, schema-less structures. It enables developers to work with
complex, nested data without needing to define rigid schemas in advance. This flexibility
makes MongoDB well-suited for applications with rapidly changing data models.

Key features of MongoDB include support for high availability through replica sets, horizontal
scalability via sharding, and robust query capabilities using both the MongoDB shell and GUI
tools like MongoDB Compass. It also offers powerful aggregation pipelines for data
processing and transformation.

In this assignment, MongoDB was installed and configured on a local system. The
demonstration includes creating databases and collections, performing CRUD (Create, Read,
Update, Delete) operations, and exploring the replication model. Particular attention was given
to MongoDB’s consistency model and the role of quorums in maintaining eventual
consistency across distributed systems.
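
As a small illustration of how quorum-style guarantees are expressed in the shell (this example is
not part of the original demonstration; the document and the timeout value are placeholders), a
write can request acknowledgement from a majority of replica-set members:

db.students.insertOne(
  { name: "Ravi", branch: "CSE" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)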

By working with MongoDB, this assignment provides hands-on experience with a leading
NoSQL technology and reinforces theoretical concepts related to unstructured data
management, distributed databases, and modern application development.

1.3 Problem Statement

In many educational institutions and libraries, managing book inventories, member records, and
borrowing transactions manually or using basic tools like spreadsheets can
lead to inefficiencies, loss of data, and difficulties in scaling operations. To address
these challenges, there is a need for a robust, user-friendly, and secure E-Library
Management System (ELMS) that can streamline library operations and centralize book
and user data.
This project aims to develop a web-based E-Library Management System using
MongoDB as the backend database. The system will allow librarians or administrators
to perform key library functions such as:
• User Authentication: A secure login system for librarians to access and manage the platform.
• Add New Books: A form to collect and store book details (e.g., title, author, genre, ISBN,
  availability) into the MongoDB database.
• Delete Books: The ability to remove book records by ID or title.
• View All Books: Display a list of all books with options to sort, filter, or search.
• Update Book Information (optional): Modify existing book details such as availability status,
  author name, or genre.
User Authentication ensures that only authorized personnel can manage the library’s
data. The system will be developed using a modern technology stack emphasizing
performance, scalability, and ease of use. MongoDB’s schema-less design is especially
well-suited for handling a wide variety of book data formats and evolving library
requirements.
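
As an illustrative sketch (the collection name books and the field names are assumptions drawn
from the list above, and the values are placeholders), a book record for the ELMS could be stored
as follows:

db.books.insertOne({
  title: "Introduction to Big Data",
  author: "A. Author",
  genre: "Reference",
  isbn: "978-0-00-000000-0",
  available: true
})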


CHAPTER 2

MONGODB INSTALLATION PROCESS

This guide outlines the installation of MongoDB 8.0 Community Edition on supported 64-bit
Windows platforms using the MSI installation wizard. It includes basic configuration steps and
considerations for both service-based and manual operation modes.

2.1 Pre-Installation Notes

• The MongoDB Shell (mongosh) must be installed separately.


• MongoDB Compass installation is optional during setup.
• MongoDB can be configured to run as a Windows service.
• VirtualBox is not supported on Hyper-V; disable Hyper-V if using VirtualBox.
• Performance monitoring requires the user to belong to the Performance Monitor
Users and Performance Log Users groups.

2.2 Installation Steps

1. Download the Installer


Obtain the .msi installer from the MongoDB Download Center, selecting version
8.0, Windows as the platform, and msi as the package type.
2. Run the Installer
Launch the downloaded .msi file to begin the installation wizard.
3. Follow the Wizard
o Choose either Complete (recommended) or Custom setup.
o Optionally install MongoDB Compass.
o Choose to install MongoDB as a Windows service:
▪ Configure the service name, data directory (--dbpath), and log
directory (--logpath).
▪ Select a service account (default: Network Service or custom user).


2.3 Post-Installation Options

• If Installed as a Service
MongoDB starts automatically after installation. Modify configuration via
<install dir>\bin\mongod.cfg if needed, then restart the service through
the Windows Services console.
• If Not Installed as a Service
Create the data directory (C:\data\db) and start MongoDB manually via the
command prompt:

"C:\Program Files\MongoDB\Server\8.0\bin\mongod.exe" --
dbpath="C:\data\db"

2.4 Connecting to MongoDB

Install mongosh, add it to your system PATH, and run mongosh.exe to connect to the
MongoDB instance.
Refer to the MongoDB documentation for information on CRUD operations and deployment
connections.
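
For example, assuming a default local deployment listening on port 27017, the following
illustrative command opens a connection:

mongosh "mongodb://127.0.0.1:27017"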

Additional Configuration

• bindIp Setting
By default, MongoDB binds to 127.0.0.1. Modify bindIp in the config file or use
--bind_ip to allow external connections (see the sketch after this list). Ensure proper
security measures are in place before exposing MongoDB to public networks.
• PATH Environment Variable
Add C:\Program Files\MongoDB\Server\8.0\bin and the path to
mongosh to the system PATH to simplify command-line access.
• Upgrades
The .msi installer supports automatic upgrades within the same release series (e.g.,
8.0.1 to 8.0.2). For major version upgrades, reinstall MongoDB.
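
For instance, building on the manual start-up command shown earlier, the following sketch binds
mongod to both the loopback interface and one LAN address (the second address is a placeholder):

"C:\Program Files\MongoDB\Server\8.0\bin\mongod.exe" --dbpath="C:\data\db" --bind_ip 127.0.0.1,192.168.1.50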


CHAPTER 3

WORKING WITH MONGODB


MongoDB is not only known for its ease of setup and flexible schema but also for its rich set
of operations that enable efficient data handling. This section explores how MongoDB supports
essential CRUD operations (Create, Read, Update, Delete), which are the backbone of any
database interaction. Mastery of these operations is crucial for developing, maintaining, and
scaling applications using MongoDB.

3.1 CRUD Operations

MongoDB uses documents stored inside collections. Each document is a JSON-like structure
(internally BSON – Binary JSON), allowing flexible, hierarchical data. Below are examples of
how to perform each CRUD operation in MongoDB.

3.1.1 Create (Insert Operations)

In MongoDB, Create operations refer to inserting new documents into a collection. MongoDB
provides two primary methods:

• insertOne() – for inserting a single document.


• insertMany() – for inserting multiple documents at once.

Each document must follow the BSON (Binary JSON) structure, which allows for flexibility
in types (arrays, objects, booleans, etc.).

A. Inserting a Single Document

Basic Example:

db.students.insertOne({
name: "Abhishek",
age: 22,
branch: "CSE"
})

With Embedded Document and Array:


db.students.insertOne({
name: "Sneha",
age: 21,
branch: "AI & DS",
marks: [88, 92, 95],
address: {
city: "Hyderabad",
pincode: 500081
},
isActive: true,
created_at: new Date()
})

B. Inserting Multiple Documents

Basic Example:

db.students.insertMany([
{ name: "Raj", age: 20, branch: "MECH" },
{ name: "Pooja", age: 22, branch: "ECE" },
{ name: "Vikram", age: 23, branch: "CIVIL" }
])

With Different Fields (Flexible Schema):

db.students.insertMany([
{ name: "Ananya", gender: "Female" },
{ name: "Amit", age: 25, department: "IT", isGraduated: false
}
])

MongoDB collections are schema-less, which means each document can have different fields
or structures.

C. Custom _id Field


MongoDB automatically generates a unique _id for each document. However, you can assign
a custom _id if needed:

db.students.insertOne({
_id: 101,
name: "Ravi",
age: 21
})

Note: Attempting to insert another document with the same _id will result in an error.

D. Error Handling with insertMany()

By default, if one document insertion fails (e.g., due to duplicate _id), the whole operation
stops. You can override this behavior using the ordered: false option:

db.students.insertMany([
{ _id: 1, name: "A" },
{ _id: 1, name: "B" }, // Duplicate _id, causes error
{ _id: 2, name: "C" }
], { ordered: false }) // Inserts other valid documents

E. Best Practices for Create Operations

• Always include a created_at timestamp for tracking when the data was inserted.
• Use consistent field naming conventions.
• Validate data at the application level or using MongoDB’s schema validation (a sketch follows after this list).
• Avoid inserting large arrays or deeply nested structures unless necessary.
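
As a minimal sketch of the schema-validation point above (the collection name and rules are
assumptions for illustration), a validator can be attached when creating a collection:

db.createCollection("students", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "branch"],
      properties: {
        name: { bsonType: "string" },
        age: { bsonType: "int", minimum: 15 }
      }
    }
  }
})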

3.1.2 Read (Find Operations)

Read operations in MongoDB are used to retrieve documents from a collection. MongoDB
provides several methods for querying data, primarily using the find() and findOne()
methods. These operations support filtering, projections, sorting, pagination, and the use of
various query operators.

A. find() – Retrieve Multiple Documents


The find() method returns a cursor to the documents that match the query. If no query is
specified, all documents are returned.

Example:

db.students.find()

With a filter condition:

db.students.find({ age: { $gt: 21 } })

B. findOne() – Retrieve a Single Document

The findOne() method returns the first matching document.

Example:

db.students.findOne({ name: "Abhishek" })

C. Projection – Select Specific Fields

Projection allows you to specify which fields to return in the result.

Return only name and branch:

db.students.find({ age: { $gt: 20 } }, { name: 1, branch: 1, _id: 0 })

D. Query Operators

MongoDB supports various operators for building complex queries.

• $eq – Equal to
• $ne – Not equal to


• $gt, $lt, $gte, $lte – Greater/Less than


• $in, $nin – In/Not in array
• $and, $or, $not – Logical operations
• $regex – Pattern matching

Examples:

Find students in CSE or ECE:

db.students.find({ branch: { $in: ["CSE", "ECE"] } })

Find names starting with 'A':

db.students.find({ name: { $regex: "^A", $options: "i" } })
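
Logical operators can be combined in a single filter. For example, find students who are either
younger than 20 or enrolled in CSE:

db.students.find({ $or: [ { age: { $lt: 20 } }, { branch: "CSE" } ] })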

E. Sorting Results

Sorting can be done using the sort() method. Use 1 for ascending and -1 for descending.

Sort by age descending:

db.students.find().sort({ age: -1 })

F. Limiting and Skipping

Use limit() and skip() for pagination or limiting the number of results.

Limit to 5 documents:

db.students.find().limit(5)

Skip the first 5 documents:

db.students.find().skip(5)

Pagination example (page 2, 5 per page):


db.students.find().skip(5).limit(5)

G. Counting Documents

You can count how many documents match a query using countDocuments().

Count number of students in CSE:

db.students.countDocuments({ branch: "CSE" })

3.1.3 Update Operations

Update operations in MongoDB are used to modify existing documents in a collection. You
can update specific fields, add new fields, or even replace entire documents. MongoDB
provides methods such as updateOne(), updateMany(), and replaceOne().

A. updateOne() – Update a Single Document

Updates the first document that matches the query.

Example:

db.students.updateOne(
{ name: "Abhishek" },
{ $set: { age: 23 } }
)

You can also use operators like $inc, $unset, $rename, etc.

Increment age by 1:

db.students.updateOne(
{ name: "Abhishek" },
{ $inc: { age: 1 } }
)


B. updateMany() – Update Multiple Documents

Updates all documents that match the query.

Example:

db.students.updateMany(
{ branch: "CSE" },
{ $set: { isEligible: true } }
)

C. replaceOne() – Replace an Entire Document

This method replaces the entire document with the new one provided. Useful when you want
to overwrite all fields.

Example:

db.students.replaceOne(
{ name: "Sneha" },
{
name: "Sneha",
age: 22,
branch: "AI & DS",
isActive: true
}
)

Note: Apart from _id, the replacement document must contain every field you want to keep; any fields that are omitted will be removed.

D. Common Update Operators

• $set: Sets the value of a field.


• $unset: Removes a field.
• $inc: Increments a field.


• $rename: Renames a field.


• $addToSet: Adds a value to an array only if it doesn't already exist.
• $push: Adds a value to an array (duplicates allowed).

Unset a field:

db.students.updateOne(
{ name: "Sneha" },
{ $unset: { isActive: "" } }
)

Rename a field:

db.students.updateOne(
{ name: "Ravi" },
{ $rename: { "branch": "department" } }
)
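
Array fields can be modified with $push and $addToSet. As a sketch, using the marks array from
the earlier Sneha document:

Add a mark (duplicates allowed):

db.students.updateOne(
  { name: "Sneha" },
  { $push: { marks: 90 } }
)

Add a mark only if it is not already present:

db.students.updateOne(
  { name: "Sneha" },
  { $addToSet: { marks: 95 } }
)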

E. Upsert Option

Upsert means “update if exists, insert if not.”

Example:

db.students.updateOne(
{ name: "Karthik" },
{ $set: { age: 24, branch: "EEE" } },
{ upsert: true }
)

This will insert a new document if no matching document is found.

3.1.4 Delete Operation

Delete operations in MongoDB are used to remove documents from a collection. MongoDB
provides two primary methods for deletion:

• deleteOne() – Deletes a single document that matches the query.


• deleteMany() – Deletes multiple documents that match the query.

A. deleteOne() – Delete a Single Document

Deletes the first document that matches the provided query.

Example:

db.students.deleteOne({ name: "Abhishek" })

This will delete the first document where the name is "Abhishek."

B. deleteMany() – Delete Multiple Documents

Deletes all documents that match the query.

Example:

db.students.deleteMany({ branch: "CSE" })

This will delete all documents where the branch is "CSE."

C. Deleting All Documents in a Collection

If you want to remove all documents from a collection but not drop the collection itself, use
deleteMany() with an empty query.

Example:

db.students.deleteMany({})

This deletes all documents in the "students" collection.

3.2 Aggregation Framework

The Aggregation Framework in MongoDB is a powerful tool that allows you to process data
records and return computed results. It is similar to SQL's GROUP BY and JOIN operations.
The aggregation pipeline is a series of stages that process data, transforming it into an
aggregated result.


A. Aggregation Pipeline

The aggregation pipeline is a sequence of stages, each performing a specific operation on the
input data, such as filtering, grouping, or sorting.

Basic Structure:

db.collection.aggregate([
{ stage1 },
{ stage2 },
{ stage3 }
])

Each stage transforms the data and passes it to the next stage.

B. Common Aggregation Operators

• $match – Filters documents based on a condition (similar to find()).


• $group – Groups documents by a specified expression and applies accumulator
operations such as $sum, $avg, etc.
• $sort – Sorts documents in ascending or descending order.
• $project – Reshapes each document by including or excluding fields.
• $limit – Limits the number of documents passed to the next stage.
• $skip – Skips a specified number of documents.
• $unwind – Deconstructs an array field into separate documents.
• $lookup – Joins documents from another collection (similar to SQL joins).

C. Examples of Aggregation Stages

1. $match – Filter Documents

db.students.aggregate([
{ $match: { branch: "CSE" } }
])

Filters documents where the branch field is "CSE".


2. $group – Group Documents

db.students.aggregate([
{ $group: { _id: "$branch", total_students: { $sum: 1 } } }
])

Groups students by branch and calculates the total number of students in each branch.

3. $sort – Sort Documents

db.students.aggregate([
{ $sort: { age: -1 } }
])

Sorts the documents in descending order by age.

4. $project – Include or Exclude Fields

db.students.aggregate([
{ $project: { name: 1, age: 1, _id: 0 } }
])

Projects only the name and age fields, excluding the _id field.

5. $limit and $skip – Pagination

db.students.aggregate([
{ $skip: 5 },
{ $limit: 5 }
])

Skips the first 5 documents and limits the result to 5 documents.
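
6. $unwind – Deconstruct an Array Field

A sketch that builds on the marks array from the earlier documents: each array element becomes
its own document, which can then be grouped to compute an average mark per student.

db.students.aggregate([
  { $unwind: "$marks" },
  { $group: { _id: "$name", avg_mark: { $avg: "$marks" } } }
])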

D. Using $lookup for Joins

MongoDB’s $lookup operator allows you to perform left outer joins between collections.

Example – Joining Two Collections:


db.orders.aggregate([
{
$lookup: {
from: "products",
localField: "product_id",
foreignField: "_id",
as: "product_details"
}
}
])

This query joins the orders collection with the products collection where product_id in the
orders collection matches the _id in the products collection.

E. Advanced Aggregation Operations

• $addFields – Adds new fields to the documents.


• $count – Counts the number of documents that match the query.
• $facet – Allows multiple pipelines to be run within a single aggregation.

Example – Using $count:

db.students.aggregate([
{ $match: { branch: "CSE" } },
{ $count: "total_students_in_cse" }
])
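
Example – Using $facet (a minimal sketch; the facet names by_branch and oldest are illustrative):

db.students.aggregate([
  {
    $facet: {
      by_branch: [ { $group: { _id: "$branch", total: { $sum: 1 } } } ],
      oldest: [ { $sort: { age: -1 } }, { $limit: 3 } ]
    }
  }
])

This runs two independent sub-pipelines over the same input in a single aggregation call.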

3.3 MapReduce

MapReduce in MongoDB is a powerful tool for performing complex data processing tasks that
require transforming and aggregating large datasets. It is based on the Map and Reduce
operations, which are typically used for parallel processing and can handle large-scale
operations. Note that MapReduce has been deprecated in recent MongoDB releases in favour of the
aggregation framework; it is covered here mainly to illustrate the underlying model of parallel
computation.

A. Overview of MapReduce


• Map Function: The map function processes each document and outputs key-value
pairs.
• Reduce Function: The reduce function groups the results by keys and combines them
into a single output.

Basic Syntax:

db.collection.mapReduce(
mapFunction,
reduceFunction,
{ options }
)

• mapFunction: Defines how to emit key-value pairs from each document.


• reduceFunction: Defines how to combine results with the same key.
• options: Additional options for the operation, such as the output collection or query.

B. Example of MapReduce Operation

Let's assume we have a sales collection, and we want to calculate the total sales per product.

1. The Map Function

The map function emits a key-value pair for each document, where the key is the product name,
and the value is the sale amount.

var mapFunction = function() {
  emit(this.product, this.amount);
};

2. The Reduce Function

The reduce function sums the sale amounts for each product.

var reduceFunction = function(key, values) {
  return Array.sum(values);
};


3. Running MapReduce

You can execute the MapReduce operation as follows:

db.sales.mapReduce(
mapFunction,
reduceFunction,
{ out: "total_sales_per_product" }
)

This will output the result into a new collection called total_sales_per_product, where each
document will contain the product name and the total sales amount.

C. Output Options

• out: Specifies where to store the output of the MapReduce operation. Options include:
o A collection (out: "collection_name")
o Inline results (out: { inline: 1 }), returned in memory without writing to a collection
o A merge operation (out: { merge: "existing_collection" })

Example with inline output:

db.sales.mapReduce(
mapFunction,
reduceFunction,
{ out: { inline: 1 } }
)

This will return the result directly without storing it in a collection.

D. Limitations of MapReduce

• Performance: MapReduce can be slower than aggregation operations for many use
cases. It requires writing results to disk, which can be inefficient for large datasets.


• Single Threaded: While MongoDB performs some optimizations, the MapReduce
process may not be as fast as fully parallelized solutions in distributed systems.

E. When to Use MapReduce

• Complex Transformations: Use MapReduce when you need to apply complex
transformations that the aggregation framework cannot handle.

• Custom Reductions: When you need to define custom logic for combining values,
MapReduce provides flexibility that aggregation may not offer.

• Parallel Processing: For large-scale data processing that can be split into smaller tasks,
MapReduce is helpful.
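
For comparison, the earlier total-sales computation can also be expressed with the aggregation
framework, which is generally the recommended approach today (a minimal sketch, assuming the
same sales collection):

db.sales.aggregate([
  { $group: { _id: "$product", total_sales: { $sum: "$amount" } } },
  { $out: "total_sales_per_product" }
])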

3.4 Indexing and Results

Indexing in MongoDB is a technique used to improve the performance of query operations by
providing a faster way to access data. MongoDB uses indexes to quickly locate the data without
scanning the entire collection. By default, MongoDB creates an index on the _id field for every
collection.

A. Types of Indexes in MongoDB

MongoDB supports various types of indexes to optimize different types of queries:

1. Single Field Index: The simplest type of index, created on a single field.
2. Compound Index: An index on multiple fields, which is useful when queries involve
multiple fields.
3. Text Index: Used for text search on string fields.
4. Hashed Index: Primarily used for sharded collections to ensure the distribution of data
across shards.
5. Geospatial Index: Used for queries on location-based data.
6. Wildcard Index: Allows indexing on all fields in a document.

B. Creating an Index

You can create an index using the createIndex() method.


Syntax:

db.collection.createIndex({ field_name: 1 })  // 1 for ascending, -1 for descending

Example: Create a Single Field Index

To create an index on the name field in the students collection:

db.students.createIndex({ name: 1 })

This index will allow for faster queries based on the name field in the students collection.

Example: Create a Compound Index

A compound index involves multiple fields. This is useful when you frequently query on
multiple fields together.

db.students.createIndex({ branch: 1, age: -1 })

This index will improve the performance of queries that filter by branch and sort by age in
descending order.

Example: Create a Text Index

A text index allows for full-text search on string fields.

db.articles.createIndex({ content: "text" })

This enables full-text search queries like:

db.articles.find({ $text: { $search: "MongoDB" } })

C. Dropping an Index

If an index is no longer needed, it can be dropped to save space and improve performance on
write operations.

db.students.dropIndex("index_name")


Alternatively, to drop all indexes except the default _id index:

db.students.dropIndexes()

D. Indexing Strategies

• Use Indexes for Frequently Queried Fields: Index fields that are frequently used in
queries, particularly in find(), sort(), update(), or delete() operations.
• Limit the Number of Indexes: Each index consumes memory, and excessive indexes
can slow down write operations. Keep the number of indexes manageable.
• Use Compound Indexes When Applicable: Compound indexes are particularly useful
when multiple fields are used together in queries. They avoid the need for multiple
single-field indexes.
• Consider Indexing for Sorting: If queries often involve sorting on specific fields,
creating an index on those fields can speed up the sorting process.
• Analyze Query Performance: Use MongoDB's explain() method to understand how
indexes are being used in queries and whether performance improvements are needed.
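
For instance, the following sketch reports in its executionStats section whether the query used an
index scan (IXSCAN) or had to fall back to a full collection scan (COLLSCAN):

db.students.find({ branch: "CSE", age: 20 }).explain("executionStats")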

E. Indexing and Performance

• Speed Up Query Execution: Indexes allow MongoDB to quickly locate documents
that match a query, significantly improving query performance.
• Cost on Write Operations: Indexes must be maintained whenever documents are
inserted, updated, or deleted, which can slow down write operations.
• Storage Overhead: Indexes consume additional disk space. The more indexes you
create, the more storage is required.
• Indexing Strategy: Carefully design your indexes to balance query performance with
storage and write efficiency.

F. Example: Query Optimization Using Indexes

Without Index:

db.students.find({ branch: "CSE", age: 20 })

Without an index on branch and age, MongoDB will need to scan the entire collection.


With Compound Index:

db.students.createIndex({ branch: 1, age: 1 })

This index allows MongoDB to quickly locate the documents where branch is "CSE" and age
is 20, improving performance.


CHAPTER 4

CHALLENGES AND FUTURE PROSPECTS

4.1 Challenges in MongoDB and NoSQL Databases

While NoSQL databases like MongoDB offer several advantages over traditional relational
databases, such as flexibility in handling unstructured data and scalability, they come with their
own set of challenges:

• Consistency vs. Availability: NoSQL databases typically prioritize availability and
partition tolerance (AP in the CAP theorem) over strict consistency, which can lead to
challenges when ensuring consistency across distributed systems.
• Complex Queries: Although MongoDB's aggregation framework is powerful, it may
still not offer the same level of querying sophistication as relational databases with
SQL. Complex join operations, for instance, are handled with $lookup, but they are not
as efficient as SQL joins.
• Data Modeling: Designing schemas in MongoDB can be challenging, especially when
deciding between embedding or referencing documents. There is no one-size-fits-all
approach, and the choice depends on the use case.
• Transaction Support: MongoDB has improved support for multi-document
transactions (since version 4.0), but handling transactions in NoSQL databases is still
less mature than in traditional relational databases.
• Learning Curve: For developers accustomed to SQL, transitioning to NoSQL
databases can be difficult. Understanding the nuances of NoSQL data models and
operations requires a shift in mindset and practices.
• Scalability and Sharding Complexity: Although MongoDB supports sharding, the
setup and management of sharded clusters can be complex, especially when scaling
horizontally across large datasets.

4.2 Future Prospects of NoSQL and MongoDB

Despite these challenges, the future of NoSQL databases like MongoDB looks promising due
to several emerging trends and advancements:


• Improved Multi-Document Transactions: MongoDB's improvements in multi-document
ACID transactions open up the possibility for using NoSQL in more critical
applications that require strong consistency.
• Integration with Machine Learning and AI: As NoSQL databases like MongoDB
grow, integration with AI and machine learning frameworks will become more
important, especially for managing large datasets used in model training.
• Serverless and Cloud-Native Deployments: With the rise of serverless architectures
and cloud-native technologies, MongoDB Atlas and other NoSQL databases are
becoming more popular as they provide fully managed services with auto-scaling and
high availability.
• Support for Graph and Time-Series Data: MongoDB’s recent improvements in
handling graph data (via the $graphLookup operator) and time-series data make it more
versatile for new use cases, such as IoT and social networks.
• Better Querying Capabilities: As MongoDB continues to evolve, its aggregation
framework and querying capabilities will improve, allowing it to handle more complex
operations with better performance.
• Hybrid Databases: MongoDB and other NoSQL databases may increasingly offer
hybrid models that combine the best features of both SQL and NoSQL, making them
more adaptable to a wider range of applications.


CHAPTER 5

CONCLUSION

In conclusion, NoSQL databases, especially MongoDB, have revolutionized the way data is
stored and processed, offering solutions for modern applications that demand scalability,
flexibility, and high performance. Unlike traditional relational databases, MongoDB’s
document-oriented model allows developers to store data in a more natural, hierarchical format,
making it an ideal choice for handling unstructured and semi-structured data.

MongoDB's strength lies in its ability to scale horizontally, distribute data across multiple
nodes, and provide high availability and fault tolerance. Its flexible schema design allows rapid
iteration and agile development, while its built-in features like the aggregation framework,
MapReduce, and indexing significantly improve query performance and data processing
capabilities.

Despite these advantages, MongoDB is not without its challenges. Issues such as consistency
in distributed systems, complex querying, and the learning curve for developers transitioning
from SQL databases are areas that continue to pose difficulties. However, MongoDB's
continued advancements—particularly in transaction support, improved querying capabilities,
and integration with cloud-native and serverless architectures—are addressing these limitations
and expanding its use cases.

Looking ahead, MongoDB’s integration with emerging technologies like artificial intelligence,
machine learning, and big data analytics will further enhance its utility, enabling businesses to
harness the power of large datasets. The support for graph databases and time-series data,
combined with hybrid models that blend the best of both SQL and NoSQL, ensures that
MongoDB will remain a critical tool in the developer's toolkit.

As the demand for real-time data processing, high availability, and cloud-based applications
continues to rise, MongoDB's role as a leading NoSQL database will only grow. Its growing
ecosystem, improved features, and vast community support make it a powerful choice for
modern applications across industries such as e-commerce, social media, IoT, finance, and
healthcare.

In summary, MongoDB, with its unique features and growing capabilities, provides a robust
solution to the data management challenges of modern applications. Its continuous evolution
ensures that it will remain at the forefront of the NoSQL revolution, empowering developers
and businesses to build scalable, high-performance systems.


CHAPTER 6

REFERENCES
• MongoDB Documentation. (n.d.). MongoDB Manual. MongoDB, Inc. Retrieved from
https://docs.mongodb.com/
• Chodorow, K. (2013). MongoDB: The Definitive Guide. O'Reilly Media.
• Giamas, A. (2017). Mastering MongoDB: The Complete Guide to MongoDB
Development and Administration. Packt Publishing.
• MongoDB Atlas Documentation. (n.d.). MongoDB Atlas: Managed MongoDB in the
Cloud. MongoDB, Inc. Retrieved from https://www.mongodb.com/cloud/atlas
• Rhys, C. (2020). MongoDB in Action. Manning Publications.
• Finkel, H. (2015). Learning MongoDB: A Hands-on Guide to Building Applications
with MongoDB. Packt Publishing.
• O'Reilly Media. (2015). Learning MongoDB. Retrieved from
https://www.oreilly.com/library/view/learning-mongodb/9781785884334/
• Grolinger, K., Hughes, K., & Buckley, K. (2013). Data Management in the Cloud:
Challenges and Opportunities. International Journal of Cloud Computing and
Services Science (IJCCSS), 2(3), 1-18.
• Nunn, M., & Denny, M. (2017). Practical MongoDB: Architecting, Developing, and
Administering MongoDB. Apress.
