Analysis For An Open-Source Library in Database Ma
management systems
Received: November 21, 2023 • Revised manuscript received: February 2, 2024 • Accepted: February 8, 2024
DOI: 10.1556/606.2024.00990
© 2024 The Author(s)
ORIGINAL RESEARCH PAPER
ABSTRACT
Spatial data management is crucial for applications like urban planning and environmental monitoring.
While traditional relational databases are commonly used, they struggle with large and complex spatial
data. NoSQL databases provide support for unstructured data and scalability. This article compares the
performance and disk space usage of SQL Server (a relational database) and MongoDB (a NoSQL
database) using an open-source library. Experiments conducted with the OpenStreetMap dataset from
Central America show that the MongoDB database outperformed the relational SQL Server database in
most cases, offering practical advantages for spatial data management in Geographic Information
System applications.
KEYWORDS
spatial data, relational database, NoSQL, OpenStreetMap
1. INTRODUCTION
Geographic Information Systems (GIS) are computer-based systems used to manage, store, manipulate, analyze, and present spatial or geographic data [1]. GIS data typically comprises many types of data, such as maps, satellite images, and other geospatial data [2]. GIS data has traditionally been managed and analyzed using Relational DataBase Management Systems (RDBMS) because of their strong data consistency, transactional support, and Atomicity, Consistency, Isolation, and Durability (ACID) properties [3]. Relational databases have served as a solid foundation for managing and storing spatial data objects for close to two decades [4].
The rapid growth of geolocated and geospatial data, ranging from massive volumes of satellite imagery to user-generated content in varied formats, has exposed performance and scalability challenges in Structured Query Language (SQL) relational databases [3]. Because of these limitations, companies need efficient ways to improve response times both when data is provided and when it is retrieved. Non-relational (NoSQL) databases are one of the techniques that have emerged in the recent past to handle requests on large datasets [2, 4]. Many organizations now use NoSQL to manage their GIS data [5]. NoSQL databases are more flexible and scalable and can handle large quantities of unstructured and semi-structured data, making them well-suited for managing geospatial data [6].
The objective of this article is to create an open-source library that enables the analysis of query performance and disk space utilization in two commonly used DataBase Management Systems (DBMSs): SQL Server and MongoDB. Both systems can handle spatial data, making them suitable for querying spatial datasets. This article contributes to the existing literature by introducing an open-source library, developed in the C# programming language as a console application, that can be utilized within DBMS or GIS modules. It also investigates and compares the query performance and disk space usage of relational and NoSQL databases, both of which are extensively employed in spatial analysis.
Corresponding author. E-mail: [email protected]
2. THEORETICAL BACKGROUND
Traditionally, geospatial databases such as Oracle Spatial and PostGIS have been effective tools for managing geospatial data. They provide spatial data types, indexing capabilities, and spatial operations that are specifically designed for handling geospatial information. In the past decade, the availability of GPS-enabled devices has led to a significant increase in geospatial data production. This has resulted in the emergence of “Geospatial Big Data,” which refers to the unstructured and semi-structured location-based data generated [7]. While SQL relational databases were traditionally effective for managing geospatial data, the rise of geospatial big data has necessitated more efficient handling and storage solutions. NoSQL databases have emerged as promising alternatives, offering improved performance and scalability for managing geospatial big data. These databases are specifically designed to handle large-scale and unstructured data, making them well-suited for the challenges associated with geospatial big data [8]. NoSQL databases do not use tables, rows, and columns to organize and store data. Instead, they use a variety of data models that do not require a predefined schema. The main types of NoSQL DBMSs are:
• Key-Value Store (e.g., DynamoDB);
• Document Store (e.g., MongoDB and CouchDB);
• Wide Column Store/Column Families (e.g., Hadoop/HBase and Cassandra); and
• Graph Databases (e.g., Neo4J and OrientDB) [9].
NoSQL databases are considered one of the techniques to handle the emergent requirements of geo-big data [6]. To aid decision-making when selecting and optimizing geospatial database systems, comparative studies between SQL and NoSQL DBMSs have highlighted the varying strengths and limitations of different systems. In addition, several frameworks are available for evaluating geospatial databases, such as GEOYCSB, Geographica, and GeoBenchmark Suite. Each framework offers its own set of features and performance metrics for assessing the efficiency and suitability of geospatial database solutions. Researchers have proposed frameworks that consider factors like query optimization, application performance estimation, and disk space measurement, helping to navigate the complex landscape of selecting an appropriate DBMS for a GIS application. There are two main reasons for relying on SQL Server and MongoDB as the DBMSs in this article [10–12]. First, they both support spatial data types. Second, they both have a large user base.
Starting with a brief literature review, there has been a growing interest in recent years in finding a balance between traditional relational databases and the emerging opportunities offered by NoSQL databases. As a result, several studies have been conducted to compare the performance of these two database types. For example, one study [13] specifically compared the performance of MongoDB, a popular NoSQL document-oriented database, with an SQL database in terms of executing common queries. The findings of this study indicated that MongoDB often outperformed SQL in terms of speed and overall performance. Another study [11] focused on examining the performance of MongoDB and PostGIS, a spatial extension for SQL databases, in handling geospatial data. The researchers assessed the loading capabilities of both databases using a NodeJS-based web application that simulated large amounts of geospatial data. Although the results of the study indicated that MongoDB outperformed PostGIS in loading geospatial data, they might differ in real-world scenarios with actual geospatial data and complex queries. A 2018 study [14] conducted a performance comparison between a document-based NoSQL database and a relational database for Voluntary Geographic Information System (VGIS) data storage architecture. The study suggested the advantages of the document-based NoSQL database in terms of feasibility and performance, but it did not consider more complex types of queries or comparison metrics. Furthermore, a 2020 study [15] provided a detailed analysis of Create, Read, Update, and Delete (CRUD) operations and their impact on application performance. The study compared the well-known MySQL database with CouchDB, a less-studied non-relational database. The analysis considered a complex database scenario with multiple joins and explored different data structures. This study offered a comprehensive analysis of CRUD operations and query complexity. However, the number of records used is relatively small, which may not fully capture scalability and performance challenges at larger scales. Additional studies [4, 16, 17] also contributed to the understanding of the performance advantages of MongoDB in specific scenarios and query types. All these studies primarily focused on reading queries and did not extensively investigate writing operations.
Based on the research gaps in the previously mentioned literature, this article aims to provide a comprehensive evaluation of SQL Server and MongoDB by considering the complete CRUD operations and disk space utilization using the openly available OpenStreetMap (OSM) dataset of Central America. By analyzing both query performance and disk space usage, the article covers both aspects of database management system analysis. The research involves the development of an open-source library for spatial query performance analysis. By comparing the spatial query performance of SQL Server and MongoDB, this research aims to provide valuable insights that can facilitate informed decision-making in the field of spatial data management. To sum up, the contributions of the article are as follows:
• The evaluation of the spatial query performance and disk space usage of SQL Server and MongoDB is provided, considering their support for spatial data types;
• An open-source library is developed that facilitates the analysis of query performance between these two database management systems;
• The experiments are conducted on the openly available OSM dataset of Central America, ensuring the relevance and practicality of the findings;
• The outcomes of this article provide a valuable resource for organizations and practitioners seeking effective geospatial data management and decision-making processes;
• Outcomes have significant implications for GIS education, as the used methodology enables the teaching of both relational and NoSQL DBMSs using real-life datasets.
3. APPLIED METHODOLOGY
This article focuses on analyzing the query performance of CRUD operations on two well-known database management systems: SQL Server and MongoDB. The methodology involves several steps to ensure a comprehensive analysis. The first step is to import the openly available OSM locations dataset of Central America into SQL Server. This initial import is crucial to ensure that each record in the dataset has a unique identifier, which may not be readily available in the raw data. Next, the OSM dataset is converted into GeoJSON objects, which serve as the basis for importing the data into MongoDB. This conversion step allows for a comparative analysis between SQL Server and MongoDB, as both databases will have access to the same dataset in a compatible format.
In the experiments, real-world mechanisms used by businesses and enterprises were simulated by considering the actual transaction registration model employed by database engines. Standard commands such as INSERT were used instead of more efficient instructions such as Bulk Insert, in order to reflect real-world data insertion patterns [13]. This principle applies to other basic commands as well, like DELETE and UPDATE. By simulating the transaction registration model commonly used by businesses, the performance of SQL Server and MongoDB is evaluated under realistic conditions.
Regarding the querying of data, queries are divided into two categories: selecting all data and k-Nearest Neighbors (kNN) queries [17]. Selecting all data is important for data verification, exploration, and performance benchmarking, allowing researchers to ensure data accuracy, understand its structure, and establish a performance baseline. On the other hand, kNN queries are essential for spatial analysis and location-based applications, enabling the evaluation of database systems’ efficiency and accuracy in handling proximity searches. By dividing the queries into these categories, the article covers fundamental data retrieval and spatial analysis aspects, providing a comprehensive assessment of the database systems’ performance and capabilities.
The final test involves measuring the disk space utilization of the two databases: SQL Server and MongoDB. This test aims to evaluate the storage efficiency and space consumption of each database system. By comparing the disk space requirements of the two databases, researchers can assess their effectiveness in managing and storing the dataset from a storage utilization perspective. This analysis provides insights into the efficiency of data storage and can help inform decisions regarding database sizing, resource allocation, and cost optimization. By incorporating these considerations, the experimental design was aligned closely with the actual mechanisms and challenges faced by GIS businesses and applications.
4. EXPERIMENTS AND ANALYSIS
The aim is to develop an open-source library that facilitates systematic query performance and disk storage analyses between SQL Server and MongoDB. Conducting a comparative study involves performing multiple experiments to gather results for analysis. These results are then used to evaluate and discuss the performance characteristics and disk space utilization of the two database systems. The applied source code can be found on GitHub [18].
4.1. Application data
The data was extracted from the OSM [19], which is a free and open-source project that aims to create a map of the world using crowdsourced data. The OSM dataset is a collection of geospatial data that has been contributed by users all over the world. This dataset includes information on roads, buildings, land use, points of interest, and other geographic features. The OSM dataset is available in a variety of formats, including XML and PBF (the format used in this article). The OSM dataset is used by various organizations and individuals for applications including navigation, urban planning, disaster response, and environmental analysis. It offers a collaborative and constantly updated source of geospatial data [20]. The experiments were executed on four OSM datasets that contain different numbers of location points: 1,000,000, 10,000,000, 20,000,000, and 30,000,000. All experiments were conducted on a laptop with a Core i5 processor and 12 GB of RAM, running a 64-bit Windows 11 operating system.
4.2. Applied tests
In the study, the OSM data was stored in both SQL Server and MongoDB (utilizing the 2dsphere index for efficient spatial querying). Additionally, no constraints were applied to the databases. The queries were executed on a single node without distributed computing. The following presents the results of the experiments, as well as the analysis. The limitations of the experiments are also presented and discussed.
4.2.1. Performance analysis. Experiments in this research include the insert, update, delete, and comprehensive select queries on the two database engines. First, the SQL language queries were designed to run on an MS SQL Server database. Then the queries were converted to the same queries in
International Conference on Science and Technology, Yogyakarta, Indonesia, August 7–8, 2018, pp. 1–5.
[12] Geospatial queries, 2023. [Online]. Available: https://www.mongodb.com/docs/manual/geospatial-queries/. Accessed: Nov. 1, 2023.
[13] S. H. Aboutorabi, M. Rezapour, M. Moradi, and N. Ghadiri, “Performance evaluation of SQL and MongoDB databases for big e-commerce data,” in International Symposium on Computer Science and Software Engineering, Tabriz, Iran, August 18–19, 2015, pp. 1–7.
[14] D. C. M. Maia, M. Holanda, and B. D. C. Camargos, “Performance analysis on voluntary geographic information systems with document-based NoSQL Database,” in Proceedings on Developments and Advances in Intelligent Systems and Applications, Maristela, Holandia, Maristela, January 1, 2018, pp. 181–197.
[15] C. A. Győrödi, D. V. Dumşe-Burescu, D. R. Zmaranda, R. Győrödi, G. A. Gabor, and G. D. Pecherle, “Performance analysis of NoSQL and relational databases with CouchDB and MySQL for application’s data storage,” Appl. Sci., vol. 10, no. 23, 2020, Art no. 8524.
[16] İ. B. Coşkun, S. Sertok, and B. Anbaroglu, “K-nearest neighbor query performance analysis on a large-scale taxi dataset: PostgreSQL vs. MongoDB,” Int. Arch. Photogrammetry, Remote Sensing Spat. Inf. Sci., vol. XLII-2/W13, pp. 1531–1538, 2019.
[17] B. Anbaroglu and A. Mobasheri, “Spatial query performance analyses on a big taxi trip origin-destination dataset,” in Proceedings on Open Source Geospatial Science for Urban Studies, Lecture Notes in Intelligent Transportation and Infrastructure, vol. 2020, pp. 37–53.
[18] Applied source code, 2024. [Online]. Available: https://github.com/remihk94/source_lib. Accessed: Jan. 30, 2024.
[19] OpenStreetMap, 2023. [Online]. Available: https://www.openstreetmap.org/about. Accessed: Nov. 1, 2023.
[20] OpenStreetMap and its use as open data, 2023. [Online]. Available: https://www.e-education.psu.edu/geog585/node/738. Accessed: Nov. 1, 2023.
Open Access statement. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited, a link to the CC License is provided, and changes – if any – are indicated.