
3rd International Conference on Innovative Practices in Technology and Management (ICIPTM 2023)

Big Data Query Processing Approach Using MongoDB

Keshav, Sangeeta Rani
World College of Technology and Management, Gurgaon, India
E-mail: sangeeta.sept@gmail.com

979-8-3503-3623-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICIPTM57143.2023.10117738

Abstract — The phrase "Big Data" describes the management of a wide range of structured and unstructured data of increasing speed and volume. These datasets are large and difficult to maintain. Nevertheless, they are used within a number of companies to perform various tasks, for organizational purposes, and to provide a summary of the data currently in use. More precise and accurate business judgments can be made as a result of the growing volume of big data, which is now more affordable and available. The objective of this research paper is to demonstrate how to identify and use only the most significant and important data in a follow-up investigation, help other researchers perform additional analysis, and take into account only a limited amount of data, ensuring that the study will always provide the best results. Although there are other methods and tools to extract data with certain filters, MongoDB uses the NoSQL model as the basis for query processing. Query processing is used to obtain data from a large data collection, and it will continue to play an important role in future research and in the strategies for this work. We examine the behaviour of extracting data from Big Data and of query processing on the basis of the input parameters that are to be used in Machine Learning. This process, also termed Data Mining, exhibits the behaviour of mining data from a large amount of combined data. This paper shows the implementation of MongoDB on the required parameters and produces efficient results.

Keywords — Big Data || MongoDB || NoSQL || Extraction || Machine Learning || Data Mining || Query Processing

1 Introduction

Traditional relational database management systems, such as MySQL, MSSQL, DB2, Oracle, etc., have dominated the industry for the past three decades. They use typical SQL (Structured Query Language). Due to the proliferation of web-scale applications such as Facebook, mobile applications, and RFID, the Internet has become an integral part of the modern world. Thanks to these apps, zettabytes of data are generated every day. Traditional relational databases are not effective in distributed environments due to changing database and application requirements. This has led to the prominence and popularity of NoSQL databases, which benefit from being schema-free, elastic, and extensible. Carlo Strozzi first used the word "NoSQL" in 1998 to describe his lightweight open-source relational database; the normal SQL interface is not exposed by that system. There is now a whole collection of NoSQL-compatible databases, for which the phrase "Not just SQL" is also used. They provide storage and retrieval mechanisms with less restrictive consistency models than relational databases. NoSQL databases have received wide appeal due to their simple architecture, horizontal scalability, and accessibility. Facebook, Yahoo, Google, and Amazon, among others, use them extensively in online real-time and big data applications.

NoSQL entered the database industry with the introduction of Web 2.0. There are mainly five types of NoSQL databases: Key-Value Stores, Document Stores, Column-Oriented Stores, Graph Databases, and Structured Data Stores, represented respectively by databases such as Cassandra, MongoDB, Bigtable, neo4j, and Redis. With the backing of venture capital and open source, small IT companies like MongoDB and Riak have pushed such databases forward. Oracle has also added some NoSQL functionality to its Berkeley database, which it calls Oracle NoSQL. HBase is now at the heart of big data. In this article, the MongoDB NoSQL document storage database, an open-source and schema-less system, is analyzed. It is highly scalable and handles BSON documents. The main attraction of MongoDB for NoSQL users is its map-reduce capability and its ability to automatically shard and grow documents. MongoDB is well-suited for cloud-based, on-premises, and hybrid deployments, and it is easy to understand due to its simple structure. The efforts of other researchers related to MongoDB are detailed in the following section.

As database technology advances, the popularity of NoSQL databases increases. One of them is MongoDB, a NoSQL database that can be hosted in the cloud. As can be seen, the data modelling approach for relational and non-relational databases is completely different. Data modelling in a MongoDB database is based on the database's data and properties, and variations in the data model affect the performance of MongoDB-based applications. In this study, we use two distinct modelling techniques: embedding documents and normalizing collections. Due to embedding, there may be cases where documents



Authorized licensed use limited to: Indian Institute of Technology - Jodhpur. Downloaded on November 25,2023 at 16:34:56 UTC from IEEE Xplore. Restrictions apply.
increase in size after they are created, which can affect database performance; MongoDB imposes a maximum size limit on documents. With references we have the most flexibility for integration, but client-side programs have to perform additional queries to resolve them, and in that case joins cannot be used. To improve performance in mixed environments, it is necessary to establish standardized embedding and reference methods. This study illustrates the variation in query execution time as a function of modelling techniques related to normalization and embedding, and provides a basis for estimating the level of normalization and embedding needed to minimize query execution time.

2 Background

2.1 Background Research Context

Due to the novelty of the NoSQL movement, a number of research and development opportunities exist [1]. This type of database is very popular in academia. In addition to other NoSQL databases, the work done in MongoDB is discussed here. Cassandra and MongoDB, two of the most prominent NoSQL databases, have been analyzed for their key features and security aspects: lack of encryption support for data files, inadequate authentication both between client and server and among server members, and very basic authorization without role-based access control (RBAC) support.

Fig 1. MongoDB

Susceptibility to SQL injection and denial-of-service threats has been highlighted as the biggest challenge that both systems share; in this respect MongoDB offers security much like other NoSQL databases. The study of the essential map-reduce algorithm [3] also describes that NoSQL databases, including MongoDB with key-value stores, provide an effective basis for gathering huge volumes of data. Oracle and MongoDB have been contrasted on a number of characteristics, including theory, differences, functions, restrictions, integrity, distribution, system requirements, architecture, and query performance. The authors stated that MongoDB is very adaptable and horizontally expandable. In addition, a single field may store complex data such as an array, an object, or a reference. Object mapping is likewise simplified with MongoDB, and its map-reduce and data-replication functions make development faster than with Oracle.

Because MongoDB is an open-source database, plugins that simplify development can be built. For storing sensor data, one study examined the performance of SQL and NoSQL databases, namely an open-source SQL database (PostgreSQL) and open-source NoSQL databases (Cassandra and MongoDB); small to medium-sized, non-critical sensor applications where write speed is important may find MongoDB to be the best option.

The distributed NoSQL system provides greater availability for huge amounts of data, but does not allow complex queries to be executed. Previous systems based on inverted lists performed poorly at a large scale. Additionally, secondary indexes are supported by databases like MongoDB, making them transparent to developers [4]. MapReduce and its ability to perform complex operations, such as adding and removing objects from a document collection, have contributed to the immense appeal of MongoDB. Our semantic study of a commercial application was performed to demonstrate the performance difference between embedded and normalized data models. Analysis indicates that a strictly normalized data model may not provide reliable results for some complex queries involving joins, while the embedded data model works well for all queries [5].

It has also been stated that finding problems in code at an early stage is essential for software quality assurance and that this can be done through code review; the researchers examined an open-source Chromium code-review repository that uses MongoDB as a backend, having first investigated and identified promising answers for seven areas of research. Ideas and techniques for implementing auto-sharding in MongoDB, in addition to an improved algorithm based on data transaction frequency, are discussed in [14]; by efficiently distributing data across shards, their method is said to improve the concurrent read and write performance of shards. Comparing the performance of MongoDB with the open-source PostgreSQL RDBMS, TPC-H queries were translated into MongoDB for investigation.

Now, in this article, we explain our deeper analysis with another application that handles terabytes of entries.

2.2 Embedding in MongoDB vs Normalization in MongoDB

The goal of relational database design is to build high-quality relations, thereby minimizing redundant data and storage space.
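To make the reference-based (normalized) side concrete before contrasting the two designs, the sketch below is illustrative only: it models two normalized collections as plain Python lists of dictionaries, linked by a manually maintained reference field (place_id, an invented name), and resolves the reference in client code the way an application must after a find(), since the database does not perform the join for it.

```python
# Two normalized "collections": people reference places by place_id,
# mimicking manual references between MongoDB collections.
people = [
    {"_id": 1, "First_Name": "Keshav", "place_id": 10},
]
places = [
    {"_id": 10, "State": "Delhi", "Country": "India"},
]

def resolve_place(person, place_docs):
    """Client-side join: look up the referenced place document."""
    return next(p for p in place_docs if p["_id"] == person["place_id"])

person = people[0]
place = resolve_place(person, places)
print(person["First_Name"], place["State"])  # Keshav Delhi
```

The extra lookup is exactly the cost the study measures: every reference the application follows is an additional round trip to the database.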

Normalizing relational databases improves data integrity, performance, and space utilization [8]. However, joins are not directly supported by MongoDB, which calls for denormalization. The purpose of this study was to investigate the analytical differences between the normalized data model and the embedded, i.e. denormalized, data model in the context of MongoDB.

In MongoDB, a document is considered an entity: a set of key-value pairs, as shown below. There are three fields (columns), _id, First_Name, and Place, each with its own value.

{ _id:1, First_Name:"Keshav", Place:"Delhi"}

MongoDB refers to an RDBMS row as a "document", and the "table" containing many rows in an RDBMS is called a "collection" in MongoDB. The _id field in MongoDB reflects the RDBMS idea of a primary key. Joins can be approximated by embedding and combining documents; manual references or DBRefs can be used to create the connection.

Normalization was achieved in the present study by manually connecting data sets using references. Denormalization can involve combining documents into a single collection.

Fig 2. Normalization and Denormalization

Figures 2 and 3 show the relationship and the distinction between normalization, denormalization, and embedding.

Fig 3. Embedding

Embedded documents with a one-to-one relationship are shown below:

{_id:1, First_Name:"Keshav", Place:{ State:"Delhi", Country:"India"}}

Similarly, embedded documents with a one-to-many relationship are shown:

{_id:1, First_Name:"Keshav", Children:[{ First_Name:"UnMarried", Age:0}, {First_Name:"Single", Age:2}]}

MongoDB can process such data efficiently, both for storage and for retrieval.

3 Implementation

3.1 MongoDB Theory

Some important concepts of the MongoDB database:

Structure of Database: a structured way to store and access data.
Concept of NoSQL database: data is not kept in related tables.
NoSQL document DB processing: data in MongoDB is stored as documents.
Documents stored in Collections: documents are stored in collections of documents.

MongoDB stores data in the form of collections, and the data itself is stored in the form of documents that belong to those collections [22]. The figures below explain how data, documents, and collections are interrelated [22][1].

Fig 4. Store Data in Documents

Fig 5. Arrangement of Documents in Collections
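The document shapes above can be reproduced verbatim as JSON. The following sketch builds the one-to-one and one-to-many embedded documents as Python dictionaries and serializes them with the standard json module, roughly what a driver does before converting to BSON; the field values are the paper's illustrative ones.

```python
import json

# One-to-one: the Place sub-document is embedded in the parent.
one_to_one = {
    "_id": 1,
    "First_Name": "Keshav",
    "Place": {"State": "Delhi", "Country": "India"},
}

# One-to-many: an array of embedded child documents.
one_to_many = {
    "_id": 1,
    "First_Name": "Keshav",
    "Children": [
        {"First_Name": "UnMarried", "Age": 0},
        {"First_Name": "Single", "Age": 2},
    ],
}

print(json.dumps(one_to_one))
# A single read returns the whole aggregate; no client-side join is needed.
print(one_to_many["Children"][1]["Age"])  # 2
```

Because the children live inside the parent document, one fetch retrieves everything, which is the performance advantage (and the document-growth risk) that embedding trades against references.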

MongoDB Documents: organize and store data as a set of field-value pairs.

MongoDB Collections: an organized store of documents in MongoDB, usually with common fields between documents.

To run MongoDB queries we use Atlas, a database-as-a-service offering with MongoDB at its core for data storage and retrieval. An Atlas deployment works as a cluster deployment [22][2]. Atlas mainly involves three concepts:

Atlas Clusters: groups of servers into which data is inserted.
Atlas Replica Set: a few linked MongoDB instances that store the same data.
A single cluster in Atlas: automatically configured as a replica set.

Fig 5. Cluster Deployment

Fig 6. Relation between Cluster and Replica Set

Fig 7. Dashboard of Atlas

A replica set works on a collection of instances, where an instance runs certain software on a single local machine or in the cloud. Services provided by Atlas are:

• Manage cluster creation
• Run and maintain database deployments
• Use a cloud service provider
• Experiment with new tools and features

MongoDB Atlas clusters are groups of servers that store the data. Replica sets are a set of a few connected MongoDB instances that store the same data [22][2].

Documents are represented in memory by JSON and BSON. JSON stands for JavaScript Object Notation, and its format is:

Initialize and terminate with curly braces "{}"
Separate each key and value with a colon ":"
Separate each key:value pair with a comma ","
"Keys" must be surrounded by quotation marks ""

BSON stands for Binary JSON, and it bridges the gap between the binary representation and the JSON format [22][3]. It is optimized for speed, flexibility, and high performance, with a general-purpose focus. MongoDB has separate import and export statements for both JSON and BSON.

In MongoDB every document must have a unique _id value, which establishes the uniqueness of the document within its collection. All documents in a collection can be identical except for the _id value, or every document in the collection can differ in every other way. ObjectId() is the default value for the _id field unless otherwise specified. MongoDB also has schema validation functionality that allows document structure to be enforced [22][4].

Fig 8. Terminal Query for FindOne()

MQL Operators:

Update Operators: enable modification of data in the database [22][5]. Examples: $inc, $set, $unset.

Query Operators: provide additional ways to locate data within the database.

$ has multiple uses:
Precedes MQL operators
Precedes aggregation pipeline stages
Allows access to field values

Comparison Operators: query operators provide additional ways to locate data within the database; comparison operators specifically allow us to find data within a certain range, using the syntax:

{ <field>: { <operator>: <value> } }

$eq is used as the default operator when an operator is not specified.
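As an illustration of the { <field>: { <operator>: <value> } } semantics, the sketch below evaluates a small subset of MQL comparison operators against in-memory documents, treating a bare value as an implicit $eq the way MongoDB does. This is a teaching simulation written for this article, not the MongoDB query engine, and the sample documents are invented.

```python
# Minimal simulation of MQL comparison operators on in-memory documents.
OPS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}

def matches(doc, query):
    """Return True if doc satisfies every {field: {op: value}} clause.
    A bare value (no operator dict) is treated as an implicit $eq."""
    for field, cond in query.items():
        if not isinstance(cond, dict):
            cond = {"$eq": cond}
        for op, value in cond.items():
            if field not in doc or not OPS[op](doc[field], value):
                return False
    return True

docs = [
    {"_id": 1, "name": "Keshav", "age": 25},
    {"_id": 2, "name": "Rani", "age": 32},
]

over_30 = [d for d in docs if matches(d, {"age": {"$gt": 30}})]
print([d["name"] for d in over_30])  # ['Rani']
```

Against a real deployment the same filter would simply be passed to find(), e.g. db.collection.find({"age": {"$gt": 30}}).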
3.2 MongoDB Queries for Processing

Command to establish a connection [23][1]:

mongo "mongodb+srv://keshav@sandbox.wx2hnvg.mongodb.net/admin"

Commands to show the databases and select one: show dbs and use sample_training.

Fig 9. Show and Select DB Terminal Query

There are multiple operators to perform MongoDB CRUD (Create, Read, Update, Delete) operations:

Update Operators: modify existing details in the database.
$inc: increment a value
$set: assign a different value to an existing field
$unset: remove a field's value

Query Operators: provide additional ways to locate data within the database.

Comparison Operators: compare values on different parameters.
$eq: equal to
$gt: greater than
$gte: greater than or equal to
$ne: not equal to
$lt: less than
$lte: less than or equal to

Logical Operators:
$and: match all of the specified query clauses
$or: match at least one of the query clauses
$nor: fail to match both given clauses
$not: negate the query requirement

The next framework is the Aggregation Framework [23]. In its simplest form, it is another way to query data in MongoDB. The main operations used in this framework are:

listingsAndReviews: first match all the relevant fields, then review and return the result.

Fig 10. Type and JSON format for listingsAndReviews

$group: an operator that takes the incoming stream of data and siphons it into multiple distinct reservoirs.

Fig 11. Filter of $group

$match and $group: non-filtering stages do not modify the original data [23]; instead, they work with the data in the cursor.

Fig 12. Combined query of $match and $group
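The $match-then-$group pattern can be imitated in a few lines of client-side code: filter the documents first, then siphon the survivors into per-key accumulators the way $group does with { $sum: ... }. The city documents below are invented sample data; against a real deployment the equivalent pipeline would be passed to db.collection.aggregate([...]).

```python
from collections import defaultdict

docs = [
    {"city": "NEW YORK", "state": "NY", "pop": 18913},
    {"city": "BROOKLYN", "state": "NY", "pop": 26229},
    {"city": "CHICAGO", "state": "IL", "pop": 12539},
]

# Stage 1: $match -- keep only documents satisfying the predicate.
matched = [d for d in docs if d["pop"] > 15000]

# Stage 2: $group -- accumulate a count and a population sum per state,
# like {"$group": {"_id": "$state", "total": {"$sum": "$pop"}, "n": {"$sum": 1}}}.
groups = defaultdict(lambda: {"total": 0, "n": 0})
for d in matched:
    g = groups[d["state"]]
    g["total"] += d["pop"]
    g["n"] += 1

print(dict(groups))  # {'NY': {'total': 45142, 'n': 2}}
```

Because $match runs before $group, the grouping stage only ever sees the filtered cursor, which is why placing $match early in a pipeline reduces the work later stages must do.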
4 Future Work and Scope

A schema specific to a business application is the subject of testing. Similar work can also be extended to the processing of different types of processes. Our goal is to perform embedding and normalization of these schemas to test for performance differences and to determine an approach for deciding on the level of normalization and/or embedding that achieves better results, depending on the data model type.

SQL can perform more efficiently than MongoDB, but the main aim of this research is to work on big data. SQL has row and column limitations, and MongoDB's architecture overcomes these shortcomings. In the future the amount of data will be higher, so MongoDB will work more efficiently in future applications.

5 Conclusion

In this research the authors implemented MongoDB queries that perform different operations. The system was built using MongoDB and, compared to the MySQL implementation method, adds and queries data using MongoDB. However, respects in which MongoDB is inferior to MySQL were noticed throughout the development: problem-solving takes longer, and post-administration difficulties are no simpler than with MySQL.

MongoDB has multiple applications, such as tweet analysis,

because of its efficiency as a document-oriented database. The JSON format of the data stored in MongoDB facilitates data analysis and processing. We may compare other features of MongoDB and NoSQL databases in the future.

Comparing SQL to MongoDB for simple inserts, updates, and queries, MongoDB offers outstanding overall runtime performance. There is no strict schema definition in MongoDB: in contrast to SQL, which necessitates meticulous schema design, MongoDB is capable of supporting a dynamic schema, such as a document management system with a large number of dynamic fields and just a small number of well-known research data items.

In addition, installing MongoDB requires more effort than installing SQL. SQL is the industry standard and is supported by more companies than MongoDB. MongoDB works best as a distributed database, and theoretically it should beat SQL due to the lack of a fixed schema definition, as relational SQL databases carry significant overhead.

REFERENCES

1. Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.
2. Pousttchi, K., Tilson, D., Lyytinen, K., & Hufenbach, Y. (2015). Introduction to the special issue on mobile commerce: mobile commerce research yesterday, today, tomorrow—what remains to be done?. International Journal of Electronic Commerce, 19(4), 1-20.
3. Kaivo-Oja, J., Virtanen, P., Jalonen, H., & Stenvall, J. (2015, August). The effects of the internet of things and big data to organizations and their knowledge management practices. In International Conference on Knowledge Management in Organizations (pp. 495-513). Springer, Cham.
4. Kościelniak, H., & Puto, A. (2015). BIG DATA in decision making processes of enterprises. Procedia Computer Science, 65, 1052-1058.
5. Bettencourt, L. M. (2014). The uses of big data in cities. Big Data, 2(1), 12-22.
6. Śliwa, P., Krzos, G., & Pondel, M. (2020). Dynamic modelling of inter-organizational networks using the domain knowledge and big data analytics.
7. Kimball, R., & Ross, M. (2011). The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. John Wiley & Sons.
8. Lam, C. (2010). Hadoop in Action. Simon and Schuster.
9. Rhodes, D. R., Yu, J., Shanker, K., Deshpande, N., Varambally, R., Ghosh, D., ... & Chinnaiyan, A. M. (2004). ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia, 6(1), 1-6.
10. Neeraj, N. (2015). Mastering Apache Cassandra. Packt Publishing Ltd.
11. Babu, A. S., & Supriya, M. (2022, April). Blockchain Based Precision Agriculture Model Using Machine Learning Algorithms. In 2022 International Conference on Breakthrough in Heuristics And Reciprocation of Advanced Technologies (BHARAT) (pp. 127-132). IEEE.
12. Braha, D. (Ed.). (2013). Data Mining for Design and Manufacturing: Methods and Applications (Vol. 3). Springer Science & Business Media.
13. Smith, R. N., Aleksic, J., Butano, D., Carr, A., Contrino, S., Hu, F., ... & Micklem, G. (2012). InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics, 28(23), 3163-3165.
14. Chen, R., Shi, J., Chen, Y., Zang, B., Guan, H., & Chen, H. (2019). PowerLyra: Differentiated graph computation and partitioning on skewed graphs. ACM Transactions on Parallel Computing (TOPC), 5(3), 1-39.
15. Ailamaki, A., & Bowers, S. (Eds.). (2012). Scientific and Statistical Database Management: 24th International Conference, SSDBM 2012, Chania, Crete, Greece, June 25-27, 2012, Proceedings (Vol. 7338). Springer.
16. Rahman, M. K. (2015, July). DataViz: High velocity data visualization and retrieval of relevant information from social network. In 2015 6th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1-7). IEEE.
17. Nisar, M. U., Fard, A., & Miller, J. A. (2013, June). Techniques for graph analytics on big data. In 2013 IEEE International Congress on Big Data (pp. 255-262). IEEE.
18. Rani, S., Tripathi, K., Arora, Y., & Kumar, A. (2022, February). Analysis of anomaly detection of malware using KNN. In 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM) (Vol. 2, pp. 774-779). IEEE.
19. Babar, M., Arif, F., Jan, M. A., Tan, Z., & Khan, F. (2019). Urban data management system: Towards Big Data analytics for Internet of Things based smart urban environment using customized Hadoop. Future Generation Computer Systems, 96, 398-409.
20. Mills, B. W. (2015). Borrowing from Nature: A Hybrid Bridge to a New Data Paradigm (Doctoral dissertation, Colorado Technical University).
21. Ertel, W. (2018). Introduction to Artificial Intelligence. Springer.
22. Gupta, S., Rani, S., Dixit, A., & Dev, H. (2019). Features exploration of distinct load balancing algorithms in cloud computing environment. International Journal of Advanced Networking and Applications, 11(1), 4177-4183.
23. Erraji, A., Maizate, A., & Ouzzif, M. (2022). New ETL process for a smart approach of data migration from relational system to MongoDB system. In International Conference on Digital Technologies and Applications (pp. 131-140). Springer, Cham.
24. Rani, S., Kumar, A., Bagchi, A., Yadav, S., & Kumar, S. (2021, August). RPL based routing protocols for load balancing in IoT network. In Journal of Physics: Conference Series (Vol. 1950, No. 1, p. 012073). IOP Publishing.
25. Zhang, D., Pee, L. G., Pan, S. L., & Cui, L. (2022). Big data analytics, resource orchestration, and digital sustainability: A case study of smart city development. Government Information Quarterly, 39(1), 101626.
26. Hoberman, S. (2014). Data Modeling for MongoDB: Building Well-Designed and Supportable MongoDB Databases. Technics Publications.
27. Banker, K., Garrett, D., Bakkum, P., & Verch, S. (2016). MongoDB in Action: Covers MongoDB version 3.0. Simon and Schuster.

