Analysis for an open-source library in database management systems

Rama Hayek¹ and Mohammad Zaher Akkad²*

¹ Department of Software Engineering and Information Systems, Faculty of Information Engineering, University of Aleppo, Aleppo, Syria
² Faculty of Mechanical Engineering and Informatics, Institute of Logistics, University of Miskolc, Miskolc-Egyetemváros, Hungary
* Corresponding author

Pollack Periodica • An International Journal for Engineering and Information Sciences

Received: November 21, 2023 • Revised manuscript received: February 2, 2024 • Accepted: February 8, 2024
DOI: 10.1556/606.2024.00990
© 2024 The Author(s)

ORIGINAL RESEARCH PAPER

ABSTRACT
Spatial data management is crucial for applications such as urban planning and environmental monitoring. Traditional relational databases are commonly used for this purpose, but they struggle with large and complex spatial data. NoSQL databases offer support for unstructured data and scalability. This article compares the performance and disk space usage of SQL Server (a relational database) and MongoDB (a NoSQL database) using an open-source library. Experiments conducted with the OpenStreetMap dataset of Central America show that MongoDB outperformed the relational SQL Server database in most cases, offering practical advantages for spatial data management in Geographic Information System applications.

KEYWORDS
spatial data, relational database, NoSQL, OpenStreetMap

1. INTRODUCTION
Geographic Information Systems (GIS) are computer-based systems used to manage, store, manipulate, analyze, and present spatial or geographic data [1]. GIS data typically spans many types of data, such as maps, satellite images, and geospatial data [2]. GIS data has traditionally been managed and analyzed with Relational DataBase Management Systems (RDBMS) because of their strong data consistency, transactional support, and Atomicity, Consistency, Isolation, and Durability (ACID) properties [3]. Relational databases have served as a strong foundation for managing and storing spatial data objects for close to two decades [4].

Today, geolocated and geospatial data keeps growing, ranging from massive volumes of satellite imagery to user-generated content in varied formats, and this growth has exposed performance and scalability challenges in Structured Query Language (SQL) relational databases [3]. Because of these limitations, companies need efficient ways to improve response time both when data is stored and when it is retrieved. Non-relational (NoSQL) databases are one of the techniques that have emerged in the recent past to handle requests on large datasets [2, 4]. Many organizations now use NoSQL to manage their GIS data [5]. NoSQL databases are more flexible and scalable and can handle large quantities of unstructured and semi-structured data, making them well suited for managing geospatial data [6].
The objective of this article is to create an open-source library that enables the analysis of query performance and disk space utilization in two commonly used DataBase Management Systems (DBMSs): SQL Server and MongoDB. Both systems can handle spatial data, which makes them suitable for querying spatial datasets. This article contributes to the existing literature by introducing an open-source library, developed in the C# programming language as a console application, that can be used within DBMS or GIS modules. It also investigates and compares the query performance and disk space usage of relational and NoSQL databases, both of which are extensively employed in spatial analysis.

2. THEORETICAL BACKGROUND

Traditionally, geospatial databases such as Oracle Spatial and PostGIS have been effective tools for managing geospatial data. They provide spatial data types, indexing capabilities, and spatial operations specifically designed for handling geospatial information. In the past decade, the availability of GPS-enabled devices has led to a significant increase in geospatial data production. This has resulted in the emergence of “Geospatial Big Data,” which refers to the unstructured and semi-structured location-based data being generated [7]. While SQL relational databases were traditionally effective for managing geospatial data, the rise of geospatial big data has necessitated more efficient handling and storage solutions. NoSQL databases have emerged as promising alternatives, offering improved performance and scalability for managing geospatial big data. These databases are specifically designed to handle large-scale and unstructured data, making them well suited to the challenges associated with geospatial big data [8]. NoSQL databases do not use tables, rows, and columns to organize and store data. Instead, they use a variety of data models that do not require a predefined schema. The different types of NoSQL DBMSs are:

• Key-Value Store (e.g., DynamoDB);
• Document Store (e.g., MongoDB and CouchDB);
• Wide Column Store/Column Families (e.g., Hadoop/HBase and Cassandra); and
• Graph Databases (e.g., Neo4J and OrientDB) [9].

NoSQL databases are considered one of the techniques to handle the emergent requirements of geo-big data [6]. To aid decision-making when selecting and optimizing geospatial database systems, comparative studies between SQL and NoSQL DBMSs have highlighted the varying strengths and limitations of different systems. In addition, several frameworks are available for evaluating geospatial databases, such as GEOYCSB, Geographica, and GeoBenchmark Suite. Each framework offers its own set of features and performance metrics for assessing the efficiency and suitability of geospatial database solutions. Researchers have proposed frameworks that consider factors such as query optimization, application performance estimation, and disk space measurement, helping to navigate the complex landscape of selecting an appropriate DBMS for a GIS application. There are two main reasons for relying on these two DBMSs (SQL Server and MongoDB) [10–12]. First, they both support spatial data types. Second, they have a large user base.

Starting with a brief literature review, there has been growing interest in recent years in finding a balance between traditional relational databases and the emerging opportunities offered by NoSQL databases. As a result, several studies have compared the performance of these two database types. For example, one study [13] specifically compared the performance of MongoDB, a popular NoSQL document-oriented database, with an SQL database in executing common queries. Its findings indicated that MongoDB often outperformed SQL in terms of speed and overall performance. Another study [11] examined the performance of MongoDB and PostGIS, a spatial extension for the PostgreSQL database, in handling geospatial data. The researchers assessed the loading capabilities of both databases using a NodeJS-based web application that simulated large amounts of geospatial data. Although the results indicated that MongoDB outperformed PostGIS in loading geospatial data, they might differ in real-world scenarios with actual geospatial data and complex queries. In 2018, the study [14] conducted a performance comparison between a document-based NoSQL database and a relational database for a Voluntary Geographic Information System (VGIS) data storage architecture. The study suggested advantages of the document-based NoSQL database in terms of feasibility and performance, but it did not consider more complex types of queries or additional comparison metrics. Furthermore, the study [15] in 2020 provided a detailed analysis of Create, Read, Update, and Delete (CRUD) operations and their impact on application performance. The study compared the well-known MySQL database with CouchDB, a less-studied non-relational database. The analysis considered a complex database scenario with multiple joins and explored different data structures, offering a comprehensive analysis of CRUD operations and query complexity. However, the number of records used was relatively small, which may not fully capture scalability and performance challenges at larger scales. Additional studies [4, 16, 17] also contributed to the understanding of the performance advantages of MongoDB in specific scenarios and query types. All these studies primarily focused on read queries and did not extensively investigate write operations.

Based on the research gaps in the literature reviewed above, this article aims to provide a comprehensive evaluation of SQL Server and MongoDB by considering the complete set of CRUD operations and disk space utilization, using the openly available OpenStreetMap (OSM) dataset of Central America. By analyzing both query performance and disk space, the evaluation covers both of these aspects of database management systems. The research involves the development of an open-source library for spatial query performance analysis. By comparing the spatial query performance of SQL Server and MongoDB, this research aims to provide valuable insights that can facilitate informed decision-making in the field of spatial data management. To sum up, the contributions of the article are as follows:

• The spatial query performance and disk space usage of SQL Server and MongoDB are evaluated, considering their support for spatial data types;
• An open-source library is developed that facilitates the analysis of query performance between these two database management systems;
• The experiments are conducted on the openly available OSM dataset of Central America, ensuring the relevance and practicality of the findings;
• The outcomes of this article provide a valuable resource for organizations and practitioners seeking effective geospatial data management and decision-making processes;
• The outcomes have significant implications for GIS education, as the methodology used enables the teaching of both relational and NoSQL DBMSs using real-life datasets.

3. APPLIED METHODOLOGY

This article focuses on analyzing the query performance of CRUD operations on two well-known database management systems: SQL Server and MongoDB. The methodology involves several steps to ensure a comprehensive analysis. The first step is to import the openly available OSM locations dataset of Central America into SQL Server. This initial import is crucial to ensure that each record in the dataset has a unique identifier, which may not be readily available in the raw data. Next, the OSM dataset is converted into GeoJSON objects, which serve as the basis for importing the data into MongoDB. This conversion step allows for a comparative analysis between SQL Server and MongoDB, as both databases have access to the same dataset in a compatible format.

In the experiments, the real-world mechanisms used by businesses and enterprises were simulated by following the actual transaction registration model employed by database engines. Standard commands such as INSERT were used instead of more efficient instructions such as BULK INSERT, to accurately reflect real-world data insertion patterns [13]. The same principle applies to the other basic commands, such as DELETE and UPDATE. By simulating the transaction registration model commonly used by businesses, the performance of SQL Server and MongoDB is evaluated accurately under realistic conditions.

Regarding data querying, queries are divided into two categories: selecting all data and k-Nearest Neighbors (kNN) queries [17]. Selecting all data is important for data verification, exploration, and performance benchmarking, allowing researchers to ensure data accuracy, understand its structure, and establish a performance baseline. kNN queries, on the other hand, are essential for spatial analysis and location-based applications, enabling the evaluation of the database systems' efficiency and accuracy in handling proximity searches. By dividing the queries into these categories, the article covers both fundamental data retrieval and spatial analysis, providing a comprehensive assessment of the database systems' performance and capabilities.

The final test measures the disk space utilization of the two databases, SQL Server and MongoDB. This test aims to evaluate the storage efficiency and space consumption of each database system. By comparing the disk space requirements of the two databases, researchers can assess their effectiveness in managing and storing the dataset from a storage utilization perspective. This analysis provides insights into the efficiency of data storage and can help inform decisions regarding database sizing, resource allocation, and cost optimization. By incorporating these considerations into the experimental design, the experiments were kept closely aligned with the actual mechanisms and challenges faced by GIS businesses and applications.

4. EXPERIMENTS AND ANALYSIS

The aim is to develop an open-source library that facilitates systematic query performance and disk storage analyses between SQL Server and MongoDB. Conducting a comparative study involves performing multiple experiments to gather results for analysis. These results are then used to evaluate and discuss the performance characteristics and disk space utilization of the two database systems. The applied source code can be found on GitHub [18].

4.1. Application data

The data was extracted from OSM [19], a free and open-source project that aims to create a map of the world using crowdsourced data. The OSM dataset is a collection of geospatial data contributed by users all over the world. It includes information on roads, buildings, land use, points of interest, and other geographic features, and it is available in a variety of formats, including XML and PBF (the format used in this article). The OSM dataset is used by various organizations and individuals for applications including navigation, urban planning, disaster response, and environmental analysis. It offers a collaborative and constantly updated source of geospatial data [20]. The experiments were executed on four OSM datasets containing different numbers of location points: 1,000,000, 10,000,000, 20,000,000, and 30,000,000. All experiments were conducted on a laptop with a Core i5 processor and 12 GB of RAM, running 64-bit Windows 11.

4.2. Applied tests

In the study, the OSM data was stored in both SQL Server and MongoDB (utilizing the 2dsphere index for efficient spatial querying). No constraints were applied to the databases. Queries were executed on a single node without distributed computing. The following subsections present the results of the experiments and their analysis. The limitations of the experiments are also presented and discussed.

4.2.1. Performance analysis. The experiments in this research include insert, update, delete, and comprehensive select queries on the two database engines. First, the SQL queries were designed to run on an MS SQL Server database. They were then converted into equivalent queries in MongoDB syntax to run in MongoDB. To evaluate the performance of these operations, 10,000 data points were chosen as an average sample size for the number of records, as in most of the related studies. The time taken to run the queries on both databases is compared in the experimental results. Execution times were measured in microseconds in both the relational and the NoSQL database.

Fig. 3. UPDATE execution times in SQL Server and MongoDB
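The timing procedure described in Section 4.2.1 (10,000 records per operation, one standard statement per record rather than a bulk load, durations in microseconds) can be sketched as follows. This is a minimal, hedged illustration and not the authors' C# library [18]: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server, and the table name `locations`, its columns, the synthetic coordinates, and the UPDATE and DELETE predicates are all hypothetical.

```python
import sqlite3
import time
from typing import Callable, Dict

SAMPLE_SIZE = 10_000  # records per operation, as described in Section 4.2.1


def timed_us(fn: Callable[[], None]) -> float:
    """Run fn() once and return its wall-clock duration in microseconds."""
    start = time.perf_counter_ns()
    fn()
    return (time.perf_counter_ns() - start) / 1_000


def run_crud_benchmark(conn: sqlite3.Connection, n: int = SAMPLE_SIZE) -> Dict[str, float]:
    """Time row-by-row INSERT, SELECT-all, UPDATE and DELETE over n points.

    Each statement is committed on its own (one transaction per record),
    mimicking standard commands instead of a bulk load.
    """
    # Hypothetical schema: one row per location point.
    conn.execute("CREATE TABLE locations (id INTEGER PRIMARY KEY, lon REAL, lat REAL)")

    def insert_rows() -> None:
        for i in range(n):
            # Synthetic points near Central America (lon ~ -90, lat ~ 15).
            conn.execute(
                "INSERT INTO locations (id, lon, lat) VALUES (?, ?, ?)",
                (i, -90.0 + i * 1e-4, 15.0 + i * 1e-4),
            )
            conn.commit()

    def select_all() -> None:
        conn.execute("SELECT id, lon, lat FROM locations").fetchall()

    def update_rows() -> None:
        for i in range(n):
            conn.execute("UPDATE locations SET lat = lat + 0.001 WHERE id = ?", (i,))
            conn.commit()

    def delete_rows() -> None:
        for i in range(n):
            conn.execute("DELETE FROM locations WHERE id = ?", (i,))
            conn.commit()

    # The dict literal is evaluated left to right, so the operations run in
    # the order insert -> select -> update -> delete.
    return {
        "insert": timed_us(insert_rows),
        "select_all": timed_us(select_all),
        "update": timed_us(update_rows),
        "delete": timed_us(delete_rows),
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for op, micros in run_crud_benchmark(conn).items():
        print(f"{op:>10}: {micros:,.0f} microseconds")
    conn.close()
```

Committing after every statement is the design choice that mirrors the per-record transaction model; replacing the loops with a single transaction or `executemany` would be the kind of bulk path the study deliberately avoided. A MongoDB counterpart would time the equivalent per-document calls (for example, PyMongo's `insert_one`, `find`, `update_one`, and `delete_one`) against the same sample.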
In the following, one subsection is assigned to each experiment, in which the corresponding operation or query is described and its results are illustrated.

4.2.2. Insert operations. The results of the insert operations in both databases, at the previously mentioned size scales and with 10,000 records inserted at a time, indicate that SQL Server completes insert operations approximately two times faster than MongoDB. MongoDB's insert performance could be improved by inserting entire documents at once, which eliminates the need for individual row-level processing. SQL Server's relational structure, on the other hand, requires explicit data type definitions for each column, which can lead to faster insert performance in certain scenarios, especially when inserting one record at a time. The results are shown in Fig. 1.

4.2.3. Delete operations. The delete operations were also evaluated in these experiments. The results in Fig. 2 show that MongoDB outperforms SQL Server by a factor of about 13 when the number of records reaches 30 million.

4.2.4. Update operations. Update operations were performed at the same database size scales. As the results show, the MongoDB database engine is clearly superior, especially as data sizes increase. The results are shown in Fig. 3.

4.3. Querying in different modes

As mentioned earlier, the queries on the data have been divided into two main queries (Q1 and Q2), whose results are reviewed below.

4.3.1. Q1 loading all locations. The goal of the first query is to retrieve all locations from both the SQL Server and MongoDB databases. This query helps assess the efficiency of retrieving and handling a large volume of geospatial data from each database. By testing the query's performance, scalability, and response time, GIS analysts can evaluate a database's ability to handle and process extensive geospatial datasets. This query is particularly useful for assessing the overall data loading capabilities and for ensuring that the database can handle the entire geospatial dataset without bottlenecks or performance issues.

4.3.2. Q2 k-nearest neighbors analysis. kNN queries, which find the nearest neighbors of a query instance, are widely used in GIS and other fields for tasks such as data classification and clustering, and researchers have dedicated significant attention to optimizing kNN operations. This subsection discusses the experimental setup and findings of a kNN query comparison between SQL Server and MongoDB. In this experiment, a random point is selected, and its nearest neighbors are determined based on its coordinates.

A value of k = 1,000 was selected as an average value of k, as in the related studies [16, 17]. The aim is to find the points nearest to a given point, ordered by distance. This query is essential for assessing the efficiency of a database in conducting proximity-based searches, which are crucial in various geospatial analysis tasks. These tasks often involve finding nearby locations, performing spatial clustering, or executing spatial joins. By evaluating the response time and accuracy of the query, GIS analysts can determine the database's suitability for handling geospatial analysis tasks that heavily rely on proximity-based operations.
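The kNN query described above (rank every point by its distance to a chosen point and keep the nearest k, with k = 1,000 in the experiments) can be illustrated with a brute-force sketch. This is a hedged illustration of the query semantics only, not how either engine executes it: MongoDB answers such queries through its 2dsphere index and SQL Server through its own spatial index. The helper names are hypothetical, and points are assumed to be (longitude, latitude) pairs in degrees, in GeoJSON order. The sketch contrasts a great-circle (haversine) distance, assumed here to approximate the spherical distance used with a 2dsphere index, with a planar Euclidean distance on raw coordinates, which is what a distance on a planar geometry column amounts to.

```python
import heapq
import math
from typing import Callable, List, Tuple

# A location as (longitude, latitude) in degrees, GeoJSON coordinate order.
Point = Tuple[float, float]

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical model)


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two points on a sphere."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def euclidean_deg(a: Point, b: Point) -> float:
    """Planar distance between raw (lon, lat) pairs, in degrees."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def knn(points: List[Point], query: Point, k: int = 1_000,
        dist: Callable[[Point, Point], float] = haversine_m) -> List[Point]:
    """Brute-force kNN: the k points closest to query, nearest first."""
    # nsmallest(k, ...) is equivalent to sorted(..., key=key)[:k].
    return heapq.nsmallest(k, points, key=lambda p: dist(p, query))


if __name__ == "__main__":
    query = (0.0, 60.0)
    candidates = [(1.5, 60.0), (0.0, 61.0)]
    print(knn(candidates, query, k=1))                      # great-circle ranking
    print(knn(candidates, query, k=1, dist=euclidean_deg))  # planar ranking
```

The usage example uses k = 1 for brevity. At latitude 60°, a degree of longitude covers only about half the ground distance of a degree of latitude, so the two rankings disagree: the great-circle version returns `[(1.5, 60.0)]` (about 83 km away), while the planar version returns `[(0.0, 61.0)]` (about 111 km away). At Central American latitudes the distortion is smaller, but the choice of distance function can still change which points fall inside the nearest 1,000.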
In SQL Server, calculating the neighboring points would typically require spatial extensions, such as the geometry data type, and spatial distance functions, such as Euclidean distance. In MongoDB, on the other hand, the $near operator is used to find the documents closest to the specified point and sort them by proximity. The results are then limited to 1,000 points using the limit function.

Fig. 1. INSERT execution times in SQL Server and MongoDB

For the first query, Q1, the average execution times are shown in the following figures. After analyzing the graphs in Figs 4 and 5, it becomes evident that MongoDB consistently outperforms SQL Server in all scenarios, from small datasets to large ones. For the second query, Q2, the query execution times are presented as follows.
From Figs 6 and 7, it is also observed that the kNN execution time for MongoDB in Q2 is significantly smaller than that of MS SQL Server as the amount of data grows. On smaller datasets, the difference in response time is less significant than on the larger datasets.

Fig. 2. DELETE execution times in SQL Server and MongoDB

Fig. 4. Q1 average execution times in SQL Server

Fig. 5. Q1 average execution times in MongoDB

Fig. 6. Q2 average execution times in SQL Server

Fig. 7. Q2 average execution times in MongoDB

4.4. Disk space analysis

Regarding disk space usage, the NoSQL database proved to be the better solution (see Fig. 8). MongoDB uses disk space more efficiently than SQL Server for the same amount of data, primarily due to its flexible, schema-less format and its lower relational overhead, such as transaction logging and metadata management.

Fig. 8. Comparing disk space between SQL Server and MongoDB (size in GB)

5. CONCLUSIONS

The research conducted in this article investigated the performance and disk space utilization of relational and NoSQL databases in GIS applications. The results consistently demonstrated that MongoDB, a NoSQL database, outperformed the relational database in most cases. This superior performance makes MongoDB an advantageous choice for handling the complex querying needs of these systems, where real-time data retrieval and processing are critical.

The findings are specific to the investigated versions of SQL Server and MongoDB but remain valuable for teaching GIS based on relational DBMSs. The developed library has the potential to support teaching both relational and NoSQL DBMSs using real-life datasets and can be extended to support other DBMSs. Future research directions include analyzing more spatial data types, such as line and multi-line objects. The test scope could also be expanded to encompass important spatial features, such as geometric operations, spatial indexing strategies, and support for 3D/4D spatial data, to further evaluate the practical applicability and performance of geospatial databases in diverse scenarios.

REFERENCES

[1] H. Goyal, C. Sharma, and N. Joshi, “An integrated approach of GIS and spatial data mining in big data,” Int. J. Comput. Appl., vol. 169, no. 11, pp. 1–6, 2017.
[2] W. Tampubolon, W. Reinhardt, S. Sumaryono, and S. T. L. Tobing, “NoSQL standard and approach for geospatial database collection,” in Seminar Nasional Geomatika, Cibinong, Indonesia, April 14, 2021, pp. 321–326.
[3] E. Baralis, A. D. Valle, P. Garza, C. Rossi, and F. Scullino, “SQL versus NoSQL databases for geospatial applications,” in IEEE International Conference on Big Data, Boston, MA, USA, December 11–14, 2017, pp. 3388–3397.
[4] S. Agarwal and K. S. Rajan, “Analyzing the performance of NoSQL vs. SQL databases for Spatial and Aggregate queries,” in Conference Proceedings on Free and Open-Source Software for Geospatial, vol. 17, Boston, USA, September 20, 2017, Art no. 2.
[5] M. Hasan, E. Panidi, and V. Badenko, “Comperative evaluation of NoSQL and relational databases performance while analysing semi structured GeoSpatial Data,” in International Scientific Conference GEOBALCANICA, Saint Petersburg, Russia, August 22, 2019, pp. 541–549.
[6] D. Guo and E. Onstein, “State-of-the-art geospatial information processing in NoSQL Databases,” ISPRS Int. J. Geo-Information, vol. 9, no. 5, 2020, Art no. 331.
[7] J. G. Lee and M. Kang, “Geospatial big data: Challenges and opportunities,” Big Data Res., vol. 2, no. 2, pp. 74–81, 2015.
[8] Z. Liu, H. Guo, and C. Wang, “Considerations on geospatial big data,” IOP Conf. Ser. Earth Environ. Sci., vol. 46, 2016, Art no. 012058.
[9] J. K. Chen and W. Z. Lee, “An introduction of NoSQL databases based on their categories and application industries,” Algorithms, vol. 12, no. 5, 2019, Art no. 106.
[10] T. Jia, X. Zhao, Z. Wang, D. Gong, and G. Ding, “Model transformation and data migration from relational database to MongoDB,” in Proceedings of IEEE International Congress on Big Data, San Francisco, CA, USA, June 27–July 02, 2016, pp. 60–67.
[11] D. Laksono, “Testing spatial data deliverance in SQL and NoSQL database using NodeJS fullstack web app,” in Proceedings of 4th International Conference on Science and Technology, Yogyakarta, Indonesia, August 7–8, 2018, pp. 1–5.
[12] Geospatial queries, 2023. [Online]. Available: https://www.mongodb.com/docs/manual/geospatial-queries/. Accessed: Nov. 1, 2023.
[13] S. H. Aboutorabi, M. Rezapour, M. Moradi, and N. Ghadiri, “Performance evaluation of SQL and MongoDB databases for big e-commerce data,” in International Symposium on Computer Science and Software Engineering, Tabriz, Iran, August 18–19, 2015, pp. 1–7.
[14] D. C. M. Maia, M. Holanda, and B. D. C. Camargos, “Performance analysis on voluntary geographic information systems with document-based NoSQL Database,” in Proceedings on Developments and Advances in Intelligent Systems and Applications, Maristela, Holandia, Maristela, January 1, 2018, pp. 181–197.
[15] C. A. Győrödi, D. V. Dumşe-Burescu, D. R. Zmaranda, R. Győrödi, G. A. Gabor, and G. D. Pecherle, “Performance analysis of NoSQL and relational databases with CouchDB and MySQL for application’s data storage,” Appl. Sci., vol. 10, no. 23, 2020, Art no. 8524.
[16] İ. B. Coşkun, S. Sertok, and B. Anbaroglu, “K-nearest neighbor query performance analysis on a large-scale taxi dataset: PostgreSQL vs. MongoDB,” Int. Arch. Photogrammetry, Remote Sensing Spat. Inf. Sci., vol. XLII-2/W13, pp. 1531–1538, 2019.
[17] B. Anbaroglu and A. Mobasheri, “Spatial query performance analyses on a big taxi trip origin-destination dataset,” in Proceedings on Open Source Geospatial Science for Urban Studies, Lecture Notes in Intelligent Transportation and Infrastructure, vol. 2020, pp. 37–53.
[18] Applied source code, 2024. [Online]. Available: https://github.com/remihk94/source_lib. Accessed: Jan. 30, 2024.
[19] OpenStreetMap, 2023. [Online]. Available: https://www.openstreetmap.org/about. Accessed: Nov. 1, 2023.
[20] OpenStreetMap and its use as open data, 2023. [Online]. Available: https://www.e-education.psu.edu/geog585/node/738. Accessed: Nov. 1, 2023.

Open Access statement. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited, a link to the CC License is provided, and changes – if any – are indicated.
