Big Data Query Processing Approach Using MongoDB
World College of Technology and Management, Gurgaon, India
World College of Technology and Management, Gurgaon, India
E-mail: sangeeta.sept@gmail.com
Authorized licensed use limited to: Indian Institute of Technology - Jodhpur. Downloaded on November 25,2023 at 16:34:56 UTC from IEEE Xplore. Restrictions apply.
Normalizing relational databases improves data integrity, performance, and space utilization [8]. However, MongoDB offers only limited support for relational-style joins, which encourages denormalization. The purpose of this study was to investigate the analytical differences between the normalized data model and the embedded (i.e., denormalized) data model in the context of MongoDB.

In MongoDB, a document represents an entity and is a set of key-value pairs, as shown below. The document has three fields (columns), _id, First_Name, and Place, each with its own value.

{ _id:1, First_Name:"Keshav", Place:"Delhi" }

An embedded document with a one-to-one relationship is shown below:

{ _id:1, First_Name:"Keshav", Place:{ State:"Delhi", Country:"India" } }

Similarly, embedded documents with a one-to-many relationship are shown below:

{ _id:1, First_Name:"Keshav", Children:[ { First_Name:"UnMarried", Age:0 }, { First_Name:"Single", Age:2 } ] }

MongoDB can process data efficiently, for example when storing and retrieving it.
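To make the two embedding patterns concrete, the sketch below models the documents above as Python dictionaries and resolves MongoDB-style dot-notation paths such as Place.State. The helper `get_path` is hypothetical: it only approximates how the server walks into embedded documents and arrays, and it is not part of any MongoDB driver.

```python
from typing import Any

# One-to-one: the place is embedded as a sub-document.
one_to_one = {"_id": 1, "First_Name": "Keshav",
              "Place": {"State": "Delhi", "Country": "India"}}

# One-to-many: the children are embedded as an array of sub-documents.
one_to_many = {"_id": 1, "First_Name": "Keshav",
               "Children": [{"First_Name": "UnMarried", "Age": 0},
                            {"First_Name": "Single", "Age": 2}]}


def get_path(doc: Any, path: str) -> list:
    """Hypothetical helper: collect every value reachable by a dot-notation
    path, descending into sub-documents and fanning out over arrays."""
    values = [doc]
    for key in path.split("."):
        next_values = []
        for value in values:
            # Fan out over one level of array so "Children.Age" reaches every element.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        values = next_values
    return values


print(get_path(one_to_one, "Place.State"))    # ['Delhi']
print(get_path(one_to_many, "Children.Age"))  # [0, 2]
```

With embedding, both the parent and its related data come back from a single document read, which is the property the denormalized model relies on.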
What an RDBMS calls a row (record), MongoDB calls a "document". A "table" containing many rows in an RDBMS corresponds to a "collection" of documents in MongoDB. The _id field in MongoDB plays the role of the RDBMS primary key. Joins can be emulated either by embedding related data in a single document or by linking documents, and the link can be created with manual references or DBRefs.

In the present study, normalization was achieved by manually linking data sets through references. Denormalization can involve combining documents into a single collection.

3 Implementation

3.1 MongoDB Theory

Some important concepts of the MongoDB database:

Database: A structured way to store and access data.
NoSQL database: Stores data in a non-tabular form rather than in related tables.
NoSQL document database: Data in MongoDB is stored as documents.
Collections: Documents are stored in collections of documents.

MongoDB stores data as documents, which are grouped into collections [22]. The figure below illustrates how data, documents, and collections are interrelated [22][1].
Fig 3. Embedding
Fig 5. Arrangement of Documents in Collections
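The normalized (referenced) and denormalized (embedded) models compared in this study can be sketched in plain Python, with lists standing in for collections. The collection names `persons` and `places`, the `place_id` reference field, and the helper functions are hypothetical; against a real server, resolving a manual reference would take a second query (or a $lookup stage), and this code only mimics that logic in memory.

```python
# Normalized model: two "collections" linked by a manual reference (place_id).
persons = [{"_id": 1, "First_Name": "Keshav", "place_id": 10}]
places = [{"_id": 10, "State": "Delhi", "Country": "India"}]


def find_one(collection: list, _id: int):
    """Return the document whose _id matches, or None (a primary-key lookup)."""
    return next((doc for doc in collection if doc["_id"] == _id), None)


def resolve_reference(person: dict) -> dict:
    """Manual reference: a second lookup fetches the related document."""
    place = find_one(places, person["place_id"])
    return {**person, "Place": place}


def denormalize(person: dict) -> dict:
    """Embedded model: copy the related fields into the parent document once,
    so later reads need one lookup instead of two."""
    place = find_one(places, person["place_id"])
    embedded = {k: v for k, v in place.items() if k != "_id"}
    doc = {k: v for k, v in person.items() if k != "place_id"}
    doc["Place"] = embedded
    return doc


print(denormalize(persons[0]))
# {'_id': 1, 'First_Name': 'Keshav', 'Place': {'State': 'Delhi', 'Country': 'India'}}
```

The trade-off is the usual one: the referenced form stores each place once and keeps updates in one spot, while the embedded form duplicates data in exchange for single-read retrieval.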
MongoDB documents: A way to organize and store data as a set of field-value pairs.
MongoDB collections: An organized store of documents in MongoDB, usually with common fields between documents.

To run MongoDB queries, we use Atlas. Atlas is a database as a service with MongoDB at its core for data storage and retrieval. Atlas deployments work as cluster deployments [22][2]. Atlas involves three main concepts:

Atlas cluster: A group of servers that stores the data.
Replica set: A few connected MongoDB instances that store the same data.
Single cluster in Atlas: Automatically configured as a replica set.

Fig 7. Dashboard of Atlas

MongoDB documents are displayed as JSON and stored as BSON. JSON stands for JavaScript Object Notation, and its format rules are:

Start and end with curly braces "{}".
Separate each key and value with a colon ":".
Separate each key: value pair with a comma ",".
Surround keys with quotation marks "".

BSON stands for Binary JSON; it bridges the gap between the binary representation and the JSON format [22][3]. It is optimized for speed, flexibility, high performance, and general-purpose use.

MongoDB provides separate import and export tools for JSON (mongoimport, mongoexport) and BSON (mongorestore, mongodump).
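To show concretely how BSON differs from JSON text, the sketch below encodes a document by following the byte layout of the public BSON specification. It is a deliberately partial encoder written for illustration, assuming only string and 32-bit integer values; real applications should rely on a driver's BSON library instead.

```python
import json
import struct


def _cstring(s: str) -> bytes:
    """BSON element names are null-terminated UTF-8 strings."""
    return s.encode("utf-8") + b"\x00"


def encode_bson(doc: dict) -> bytes:
    """Minimal BSON encoder: supports only str (type 0x02) and int32 (type 0x10)."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # String: int32 byte length (including the trailing null), then the bytes.
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # int32 values are stored little-endian.
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
    # Document: int32 total length (counting itself), the elements, then 0x00.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


doc = {"hello": "world"}
print(json.dumps(doc))   # {"hello": "world"}
print(encode_bson(doc))  # b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The length prefixes and type bytes are what let a reader skip over fields without parsing them, one reason BSON is faster to traverse than JSON text.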
within a certain range.

{ <field>: { <operator>: <value> } }

$eq is used as the default operator when no operator is specified.

Fig 10. Type and JSON format for listings and Reviews

$group: An operator that takes the incoming stream of data and siphons it into multiple distinct reservoirs.
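The filter form { <field>: { <operator>: <value> } } and the $group stage can be imitated in a few lines of Python to show their semantics: a bare value is treated as $eq, several operators on one field are combined with AND, and grouping partitions documents by a key before accumulating. The functions `matches` and `group_sum`, the sample `listings` data, and its `room_type` and `price` fields are hypothetical; the sketch covers only $eq, $gt, $gte, $lt, $lte and a $sum-style total, and does not reproduce MongoDB's full matching rules (for example, array traversal or cross-type ordering).

```python
import operator
from collections import defaultdict

# Subset of MongoDB comparison operators mapped to Python comparisons.
OPS = {"$eq": operator.eq, "$gt": operator.gt, "$gte": operator.ge,
       "$lt": operator.lt, "$lte": operator.le}


def matches(doc: dict, query: dict) -> bool:
    """True if doc satisfies every {field: {op: value}} clause (implicit AND).
    Unsupported operators raise KeyError in this sketch."""
    for field, condition in query.items():
        if not isinstance(condition, dict):
            # A bare value means the default operator, $eq.
            condition = {"$eq": condition}
        if field not in doc:
            return False
        for op, value in condition.items():
            if not OPS[op](doc[field], value):
                return False
    return True


def group_sum(docs: list, key: str, field: str) -> dict:
    """Rough analogue of {$group: {_id: "$<key>", total: {$sum: "$<field>"}}}."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc.get(key)] += doc.get(field, 0)
    return dict(totals)


listings = [
    {"name": "A", "room_type": "Private room", "price": 40},
    {"name": "B", "room_type": "Entire home", "price": 120},
    {"name": "C", "room_type": "Private room", "price": 60},
]

# Listings whose price lies within a certain range (40 to 100 inclusive).
in_range = [d["name"] for d in listings if matches(d, {"price": {"$gte": 40, "$lte": 100}})]
print(in_range)                                   # ['A', 'C']
print(group_sum(listings, "room_type", "price"))  # {'Private room': 100, 'Entire home': 120}
```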
3.2 MongoDB Queries for Processing

Command to establish a connection [23][1]:
mongo "mongodb+srv://keshav@sandbox.wx2hnvg.mongodb.net/admin"

Commands to show the databases and select one:
show dbs
use sample_training
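The connection string in the command above packs several pieces of information into one URI. The sketch below uses Python's standard urllib.parse to split it into those parts; it only parses the text and does not contact Atlas. With a driver such as PyMongo, the same string would be passed to MongoClient, and the mongodb+srv scheme tells the driver to discover the cluster's hosts through a DNS SRV lookup.

```python
from urllib.parse import urlsplit

# Connection string from the mongo shell command in the text.
uri = "mongodb+srv://keshav@sandbox.wx2hnvg.mongodb.net/admin"

parts = urlsplit(uri)
print(parts.scheme)            # mongodb+srv (hosts resolved via DNS SRV records)
print(parts.username)          # keshav (no password is embedded in the URI)
print(parts.hostname)          # sandbox.wx2hnvg.mongodb.net
print(parts.path.lstrip("/"))  # admin (database given in the URI path)
```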
because of its efficiency as a document-oriented database.

The JSON format of data stored in MongoDB facilitates data analysis and processing. In future work, other features of MongoDB and NoSQL databases may be compared.

In a comparison with SQL for simple inserts, updates, and queries, MongoDB offered better overall runtime performance. MongoDB does not require an explicit schema definition. Unlike SQL, which requires careful schema design, MongoDB supports a dynamic schema, which suits applications such as a document management system with a large number of dynamic fields and only a small number of well-known research data items.

However, installing MongoDB requires more effort than installing an SQL database. SQL is the industry standard and is supported by more companies than MongoDB. MongoDB is generally considered to work best as a distributed database. In theory, MongoDB should outperform SQL because it does not require a fixed schema definition, whereas relational SQL databases carry significant overhead.

REFERENCES

1. Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.
2. Pousttchi, K., Tilson, D., Lyytinen, K., & Hufenbach, Y. (2015). Introduction to the special issue on mobile commerce: mobile commerce research yesterday, today, tomorrow—what remains to be done? International Journal of Electronic Commerce, 19(4), 1-20.
3. Kaivo-Oja, J., Virtanen, P., Jalonen, H., & Stenvall, J. (2015, August). The effects of the internet of things and big data to organizations and their knowledge management practices. In International Conference on Knowledge Management in Organizations (pp. 495-513). Springer, Cham.
4. Kościelniak, H., & Puto, A. (2015). BIG DATA in decision making processes of enterprises. Procedia Computer Science, 65, 1052-1058.
5. Bettencourt, L. M. (2014). The uses of big data in cities. Big Data, 2(1), 12-22.
6. Śliwa, P., Krzos, G., & Pondel, M. (2020). Dynamic modelling of inter-organizational networks using the domain knowledge and big data analytics.
7. Kimball, R., & Ross, M. (2011). The data warehouse toolkit: the complete guide to dimensional modeling. John Wiley & Sons.
8. Lam, C. (2010). Hadoop in action. Simon and Schuster.
9. Rhodes, D. R., Yu, J., Shanker, K., Deshpande, N., Varambally, R., Ghosh, D., ... & Chinnaiyan, A. M. (2004). ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia, 6(1), 1-6.
10. Neeraj, N. (2015). Mastering Apache Cassandra. Packt Publishing Ltd.
11. Babu, A. S., & Supriya, M. (2022, April). Blockchain Based Precision Agriculture Model Using Machine Learning Algorithms. In 2022 International Conference on Breakthrough in Heuristics And Reciprocation of Advanced Technologies (BHARAT) (pp. 127-132). IEEE.
12. Braha, D. (Ed.). (2013). Data mining for design and manufacturing: methods and applications (Vol. 3). Springer Science & Business Media.
13. Smith, R. N., Aleksic, J., Butano, D., Carr, A., Contrino, S., Hu, F., ... & Micklem, G. (2012). InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics, 28(23), 3163-3165.
14. Chen, R., Shi, J., Chen, Y., Zang, B., Guan, H., & Chen, H. (2019). Powerlyra: Differentiated graph computation and partitioning on skewed graphs. ACM Transactions on Parallel Computing (TOPC), 5(3), 1-39.
15. Ailamaki, A., & Bowers, S. (Eds.). (2012). Scientific and Statistical Database Management: 24th International Conference, SSDBM 2012, Chania, Crete, Greece, June 25-27, 2012, Proceedings (Vol. 7338). Springer.
16. Rahman, M. K. (2015, July). DataViz: High velocity data visualization and retrieval of relevant information from social network. In 2015 6th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1-7). IEEE.
17. Nisar, M. U., Fard, A., & Miller, J. A. (2013, June). Techniques for graph analytics on big data. In 2013 IEEE International Congress on Big Data (pp. 255-262). IEEE.
18. Rani, S., Tripathi, K., Arora, Y., & Kumar, A. (2022, February). Analysis of Anomaly detection of Malware using KNN. In 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM) (Vol. 2, pp. 774-779). IEEE.
19. Babar, M., Arif, F., Jan, M. A., Tan, Z., & Khan, F. (2019). Urban data management system: Towards Big Data analytics for Internet of Things based smart urban environment using customized Hadoop. Future Generation Computer Systems, 96, 398-409.
20. Mills, B. W. (2015). Borrowing from Nature: A Hybrid Bridge to a New Data Paradigm (Doctoral dissertation, Colorado Technical University).
21. Ertel, W. (2018). Introduction to artificial intelligence. Springer.
22. Gupta, S., Rani, S., Dixit, A., & Dev, H. (2019). Features exploration of distinct load balancing algorithms in cloud computing environment. International Journal of Advanced Networking and Applications, 11(1), 4177-4183.
23. Erraji, A., Maizate, A., & Ouzzif, M. (2022). New ETL Process for a Smart Approach of Data Migration from Relational System to MongoDB System. In International Conference on Digital Technologies and Applications (pp. 131-140). Springer, Cham.
24. Rani, S., Kumar, A., Bagchi, A., Yadav, S., & Kumar, S. (2021, August). RPL based routing protocols for load balancing in IoT network. In Journal of Physics: Conference Series (Vol. 1950, No. 1, p. 012073). IOP Publishing.
25. Zhang, D., Pee, L. G., Pan, S. L., & Cui, L. (2022). Big data analytics, resource orchestration, and digital sustainability: A case study of smart city development. Government Information Quarterly, 39(1), 101626.
26. Hoberman, S. (2014). Data Modeling for MongoDB: Building Well-Designed and Supportable MongoDB Databases. Technics Publications.
27. Banker, K., Garrett, D., Bakkum, P., & Verch, S. (2016). MongoDB in action: covers MongoDB version 3.0. Simon and Schuster.