
Assignment 11

CSP 554: Big Data Technologies


Fall 2024

Submitted By: Anayna Nidhi Singh


Hawk ID: A20547310

Step A – Start a Dataproc cluster

Step B – Download the assignment software (mongoex.tar, mongodb-org-4.2.repo) to the master node

Step C – Install the assignment software (mongoex.zip, mongodb-org-7.0.repo)

I used a Debian image instead of Linux 2023, so I followed a different set of steps to install the software:

1) Import the MongoDB GPG key
Command Used: wget -qO - https://pgp.mongodb.com/server-7.0.asc | sudo gpg --dearmor -o /usr/share/keyrings/mongodb-org-7.gpg
2) Add the MongoDB repository
Command Used: echo "deb [signed-by=/usr/share/keyrings/mongodb-org-7.gpg] https://repo.mongodb.org/apt/debian $(lsb_release -cs)/mongodb-org/7.0 main" | sudo tee /etc/apt/sources.list.d/mongodb-org-7.list
3) Update the package database
Command Used: sudo apt update
Step D – Install and start MongoDB

Commands Used:

1) sudo apt install -y mongodb-org
2) sudo systemctl start mongod
3) sudo systemctl status mongod
4) sudo apt install -y mongodb-mongosh

Step E – Start the MongoDB Shell (Command Line Interpreter)

Command Used: mongosh


Step F – Edit mongo query language files

Opened a third terminal connection to the Dataproc master node, called CLI-Term.

Step G – Setting up the assignment database

Commands used:

1) use assignment;
2) load('./load.js');
3) db.unicorns.find();
Exercise 1

Command Used: db.unicorns.find({ weight: { $lt: 500 } });
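As a sanity check (not part of the assignment commands), the semantics of the $lt filter can be mirrored in plain Python. The sample documents below are hypothetical, not the assignment's actual data:

```python
# db.unicorns.find({ weight: { $lt: 500 } }) keeps documents whose
# weight is strictly below 500.
unicorns = [
    {"name": "Aurora", "weight": 450},
    {"name": "Solnara", "weight": 620},
    {"name": "Raleigh", "weight": 300},
]

# Mirror of the $lt comparison; documents missing the field never match.
light = [u for u in unicorns if u.get("weight", float("inf")) < 500]
print([u["name"] for u in light])
```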

Exercise 2

Command Used: db.unicorns.find({ loves: "apple" });
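One subtlety worth noting: because loves is an array, { loves: "apple" } matches any document whose array *contains* "apple", not only documents where the field equals "apple". A plain-Python sketch of that matching rule, on hypothetical documents:

```python
# MongoDB equality against an array field matches either an exact value
# or array membership; this helper mirrors that behavior.
unicorns = [
    {"name": "Horny", "loves": ["carrot", "papaya"]},
    {"name": "Aurora", "loves": ["carrot", "grape"]},
    {"name": "Pilot", "loves": ["apple", "watermelon"]},
]

def matches(doc, field, value):
    v = doc.get(field)
    return v == value or (isinstance(v, list) and value in v)

apple_lovers = [u["name"] for u in unicorns if matches(u, "loves", "apple")]
```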


Exercise 3

Command Used:

db.unicorns.insertOne({
  name: "Malini",
  dob: new Date("2008-11-03"),
  loves: ["pears", "grapes"],
  weight: 450,
  gender: "F",
  vampires: 23,
  horns: 1
});

Find command to show it is in the collection: db.unicorns.find({ name: "Malini" });
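The insert-then-verify pattern above can be sketched in plain Python, treating the collection as a list of dicts (a stand-in for db.unicorns, not the real driver API):

```python
from datetime import datetime

collection = []  # stand-in for db.unicorns

# Equivalent of insertOne: append one new document.
new_doc = {
    "name": "Malini",
    "dob": datetime(2008, 11, 3),
    "loves": ["pears", "grapes"],
    "weight": 450,
    "gender": "F",
    "vampires": 23,
    "horns": 1,
}
collection.append(new_doc)

# Equivalent of find({ name: "Malini" }): filter by exact field match.
found = [d for d in collection if d.get("name") == "Malini"]
```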

Exercise 4

Command Used:

db.unicorns.updateOne(
  { name: "Malini" },
  { $addToSet: { loves: "apricots" } }
);

Command used to verify the above update: db.unicorns.find({ name: "Malini" });
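The key property of $addToSet (as opposed to $push) is that it only appends a value if it is not already in the array. A small Python sketch of that semantics, on a hypothetical document:

```python
def add_to_set(doc, field, value):
    """Mirror of MongoDB's $addToSet: append only if absent."""
    arr = doc.setdefault(field, [])
    if value not in arr:
        arr.append(value)

malini = {"name": "Malini", "loves": ["pears", "grapes"]}
add_to_set(malini, "loves", "apricots")
add_to_set(malini, "loves", "apricots")  # second call is a no-op, no duplicate
```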

Exercise 5

Command Used: db.unicorns.deleteMany({ weight: { $gt: 600 } });

Command used to verify the above deletion: db.unicorns.find({ weight: { $gt: 600 } });

In the output shown below, we can see that all unicorns weighing more than 600 pounds have been deleted:
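The delete-then-verify pair above can be sketched in plain Python (hypothetical sample documents; a list stands in for the collection):

```python
unicorns = [
    {"name": "Aurora", "weight": 450},
    {"name": "Solnara", "weight": 650},
    {"name": "Leia", "weight": 601},
]

# deleteMany({ weight: { $gt: 600 } }): keep only documents NOT matching the filter.
unicorns = [u for u in unicorns if not u.get("weight", 0) > 600]

# Verification find({ weight: { $gt: 600 } }): should now return nothing.
remaining_heavy = [u for u in unicorns if u.get("weight", 0) > 600]
```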
Exercise 6: Summary of the article “Modeling temporal aspects of sensor data for
MongoDB NoSQL database”
The study focuses on addressing the challenges of managing real-time temporal data generated by IoT devices,
particularly from ANT+ sensors used in healthcare. The research question explores how NoSQL databases,
especially MongoDB, can provide a scalable and flexible schema for storing and processing such data. This is
particularly important because traditional relational databases (RDBMS) struggle to handle the demands of modern
applications, including the need for horizontal scaling, schema flexibility, and support for high-velocity data
streams.

The authors hypothesize that a document-oriented database like MongoDB can overcome these limitations by
supporting schema evolution and hierarchical data structures. They also propose that such a model is well-suited for
handling temporal aspects of real-time sensor data, which are critical for applications requiring time-series analysis,
such as remote healthcare monitoring.

To test their hypotheses, the researchers developed a middleware solution and designed a schema tailored for
MongoDB to efficiently store and query temporal data. The middleware was responsible for integrating data from
ANT+ sensors, which transmit timestamped measurements, into MongoDB's JSON-based hierarchical document
model. This design allowed the schema to adapt dynamically to new data formats and structures without requiring
predefined schemas, a common limitation in RDBMS. The study also incorporated an algorithm to handle schema
evolution, ensuring that new data could be seamlessly integrated while preserving the hierarchical organization of
existing data. The researchers analyzed the system's performance in terms of scalability, storage efficiency, and the
ability to maintain temporal order in real-time data streams. Key aspects of the evaluation included the system's
ability to handle large-scale timestamped datasets, the efficiency of queries on hierarchical data, and the robustness
of the schema in dynamic environments.
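To make the embedding idea concrete, one possible document shape is sketched below. This is my own illustration, not a schema taken from the paper; all field names and values are assumed:

```python
from datetime import datetime, timezone

# Hypothetical hierarchical document for timestamped ANT+ sensor readings:
# related data is embedded in one document, avoiding relational joins.
sensor_doc = {
    "device_id": "ant-hr-0042",                              # assumed identifier
    "sensor_type": "heart_rate",
    "patient": {"id": "p-17", "ward": "remote-monitoring"},  # embedded sub-document
    "readings": [                                            # time-ordered measurements
        {"ts": datetime(2024, 11, 1, 9, 0, 0, tzinfo=timezone.utc), "bpm": 72},
        {"ts": datetime(2024, 11, 1, 9, 0, 5, tzinfo=timezone.utc), "bpm": 75},
    ],
}

# Schema evolution: a new attribute ("battery") appears later without any migration.
sensor_doc["readings"].append(
    {"ts": datetime(2024, 11, 1, 9, 0, 10, tzinfo=timezone.utc), "bpm": 74, "battery": 0.93}
)

# Temporal order is preserved within the embedded array.
in_order = all(
    a["ts"] <= b["ts"]
    for a, b in zip(sensor_doc["readings"], sensor_doc["readings"][1:])
)
```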

The results demonstrated MongoDB’s suitability for real-time IoT data management. The hierarchical schema
effectively reduced redundancy by embedding related data, minimizing the need for expensive join operations
typical in relational databases. Query performance improved significantly due to MongoDB’s ability to index both
primary and secondary attributes, even within sub-documents. The system also seamlessly supported schema
evolution, allowing it to handle new data formats dynamically without the need for complex migrations or
redefinitions. For example, as new data attributes were introduced, they were integrated into the existing schema
with minimal disruption, showcasing MongoDB's flexibility. Additionally, the system handled large volumes of
timestamped data efficiently, preserving temporal order while enabling fast query execution. This performance was
particularly beneficial in healthcare scenarios, where timely access to sensor data is critical for decision-making.

The implications of this research are significant, especially in fields where real-time data processing is vital, such as
healthcare and IoT applications. MongoDB’s ability to handle schema evolution and support dynamic queries makes
it a strong candidate for applications requiring both scalability and flexibility. While the study’s findings validate the
potential of MongoDB and other NoSQL databases for temporal data, the authors suggest further research is
necessary. Future studies should focus on incorporating advanced analytics and improving cross-document query
capabilities to enhance the system’s applicability across various domains.

In conclusion, this research highlights MongoDB's ability to address the unique requirements of real-time temporal
data. The findings validate the potential of NoSQL databases as a robust solution for IoT applications, offering
flexibility, scalability, and efficiency in ways traditional databases cannot. Future studies should explore additional
optimizations and advanced processing techniques to further enhance the capabilities of NoSQL systems in big data
environments.
