Performance Evaluation of IoT Data Management Using MongoDB Versus MySQL Databases in Different Cloud Environments
ABSTRACT The Internet of Things (IoT) introduces a new challenge for Database Management Systems
(DBMSs). In IoT, large numbers of sensors are used in daily life. These sensors generate a huge amount
of heterogeneous data that needs to be handled by an appropriate DBMS. The challenge that IoT poses for
a DBMS is how to store and manipulate this huge amount of heterogeneous data. DBMSs can be
categorized into two main types: relational DBMSs and non-relational DBMSs. This paper aims
to provide a thorough comparative evaluation of two popular open-source DBMSs: MySQL as a relational
DBMS and MongoDB as a non-relational DBMS. The comparison evaluates the performance
of inserting and retrieving a huge amount of IoT data, and the performance of the two
databases on cloud computing resources with different specifications. This paper also proposes
two prediction models and compares them for estimating the response time in terms of the size
of the database and the specifications of the cloud instance. These models help to select the appropriate
DBMS to manage and store a certain size of data on an instance with particular specifications based on the
estimated response time. The results indicate that MongoDB outperforms MySQL in terms of latency and
database size as the amount of tested data increases. Moreover, MongoDB uses resources more
efficiently than MySQL, which needs high-capability resources and still delivers lower performance.
INDEX TERMS IoT, DBMS, SQL, NoSQL, MySQL, MongoDB, AWS, cloud, multiple non-linear regressions.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
110656 VOLUME 8, 2020
M. Eyada et al.: Performance Evaluation of IoT Data Management using MongoDB versus MySQL Databases
TABLE 1. Instance specification.

database and NoSQL database on IoT data. The main metrics
in these experiments are the response time and the size of the
database. This evaluation is performed in three main parts.
First, it examines the impact of increasing workloads on the
two databases. This part helps to compare the performance of
the two databases handling a large scale of IoT data. Second,
it tests the effect of improving the capabilities of the cloud
instance on increasing the performance of the two databases.
It helps in deciding which database can save resources better.
Third, it proposes a prediction model that is concluded from
statistical analysis on the measured data of the introduced
experiments. This part applies two approaches of prediction
and compares them to estimate the latency of the
data in terms of the data size and instance performance.
It also evaluates these two estimation approaches and defines
which one is more accurate. The proposed prediction model
provides the flexibility to evaluate and compare the two types
of databases on any size of data and any instance. It selects
the DBMS with low estimated latency.

B. SOFTWARE AND HARDWARE
This section discusses the utilized software and hardware in
the proposed performance evaluation. Node.JS LTS version
10.16.3 and NPM version 6.10.2 are used to process the collected
data. Ubuntu 16.04 LTS version is used as an operating
system to set up MongoDB, MySQL and Node.JS.
Elastic Compute Cloud (EC2) is used during the implementation
of this comparison. EC2 remains a core service
of the AWS cloud platform. It provides a customer with
an opportunity to build and host a software system on the
Amazon virtual servers (EC2 Instances). EC2 is a virtual
private server (VPS) within a cloud, where storage can be
resizable and almost unlimited. With the EC2 service, three types
of instances were used: t3.large (referred to as VM1), t3.xlarge
(referred to as VM2) and t3.2xlarge (referred to as VM3). The
T3 instances feature the Intel Xeon Platinum 8000 series
(Skylake-SP) processor with a sustained all-core Turbo CPU
clock speed of up to 3.1 GHz. Additionally, there is support
for the new Intel Advanced Vector Extensions 512 (AVX-512)
instruction set [33]. Table 1 shows the specifications of
the instances.

C. DATABASE SETUP
1) IoT BENCHMARK
The pollution database [34] is used as a base for this
paper. This database is based on collecting information about
elements of air pollution indoors and outdoors. Indoor
air is frequently and more severely polluted than outdoor
air, which can be refined naturally. The IoT sensor data that
depends on the temporal status and variables related to the
spatial characteristics of the space being measured should
be considered differently from other homogeneous inputs,
such as image and audio, because it is considered heterogeneous
data [35]. Some changes are added to increase the flexibility
and decrease the redundancy for working with the intended
database. The database can be partitioned into three parts:
the sensors part, the location (longitude and latitude) part and
the timestamp part. The first part is concerned with adding
a new sensor to a specific station. This part is a multi-value
part, so it can be changed from adding a new field for each new
sensor to adding a new record for each new sensor by converting
the sensor data fields (i.e. the number of fields is equal to
the number of all sensors) to only two fields: the sensor_id field
and the sensor_data field. This change increases flexibility as
it allows adding any number of sensors to the database
without affecting its structure.
The second part is concerned with storing the longitude
and latitude of the location. This part can be normalized into
an independent table or collection that contains three fields,
station_id, longitude and latitude, for each station location, in
addition to the original table or collection, in which the
longitude and latitude fields are replaced with the station_id field. This change
reduces the redundancy in the original data, as for every
sensor data insertion to a table or collection there is no need
to add the station location. The third part, the timestamp
part, is concerned with the instant at which sensor data is received. This
part of the data is important and must be added to the original
table or collection.
With these changes, the database becomes more flexible
to deal with the MySQL and MongoDB DBMSs. After editing
the database, it becomes able to receive and store data from
any number of towns, stations and sensors, as well as saving the
location of every station (the longitude and the latitude) with
the date and time of every record.

2) MYSQL DATABASE SETUP
MySQL server 8.0.11 is the version that is used in this evaluation.
Table 2 shows the SQL statements that are used to create
the tested MySQL database schema specifying the structure
of a database for managing a series of town's base stations
and the related sensors. Two tables are created for the MySQL
schema: station_location and town_name.
The station_location table saves the location of every
station. A referential integrity constraint is implemented with
FOREIGN KEY to prevent any station from being inserted in the town
table without being present in the station_location table. The
main target of dividing the dataset into two tables is to reduce
the data redundancy.

3) MONGODB DATABASE SETUP
MongoDB version 4.2 is the current stable release version
that is applied in this comparison. As in MySQL, two
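The restructuring described in the IoT benchmark subsection can be illustrated with a minimal Python sketch. It is not the authors' Node.JS code: the field names station_id, longitude, latitude, sensor_id and sensor_data come from the text, while the `timestamp` key, the sensor names `ozone` and `no2`, the sample values and both function names are assumptions made only for illustration. The second function mimics, in plain Python, the effect of the FOREIGN KEY constraint described for the MySQL schema.

```python
# Fields that are not sensor measurements (the "timestamp" key is an assumed name).
FIXED_FIELDS = ("station_id", "longitude", "latitude", "timestamp")


def split_wide_record(wide):
    """Turn one wide record (one field per sensor) into a station-location
    row plus one long row per sensor with only sensor_id and sensor_data."""
    station_row = {
        "station_id": wide["station_id"],
        "longitude": wide["longitude"],
        "latitude": wide["latitude"],
    }
    reading_rows = [
        {
            "station_id": wide["station_id"],  # replaces longitude/latitude
            "timestamp": wide["timestamp"],
            "sensor_id": name,
            "sensor_data": value,
        }
        for name, value in wide.items()
        if name not in FIXED_FIELDS
    ]
    return station_row, reading_rows


def insert_reading(known_station_ids, readings, row):
    """Append a reading only if its station is already registered,
    imitating a referential integrity check."""
    if row["station_id"] not in known_station_ids:
        raise ValueError("unknown station_id: %r" % (row["station_id"],))
    readings.append(row)


# Hypothetical sample record; the values are made up.
wide = {
    "station_id": 7,
    "longitude": 25.6,
    "latitude": 45.65,
    "timestamp": "2019-06-25T10:00:00",
    "ozone": 101.0,
    "no2": 38.5,
}
station_row, reading_rows = split_wide_record(wide)
```

Adding a third sensor to the sample record only adds one more long row; neither the station row nor the set of fields in a reading row changes, which is the flexibility argument made in the text.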
FIGURE 5. Insert query Instance T3.xlarge.
FIGURE 7. The storage size of MongoDB based hybrid model, MongoDB based reference model, and MySQL databases.
FIGURE 13. Select query Instance T3.xlarge.
FIGURE 14. Select query Instance T3.2xlarge.

database dealing with a wide range of IoT data. In addition,
the results show that MySQL can work well with a high-performance
instance, unlike MongoDB, which works very
well with all instance specifications.

V. STATISTICAL ANALYSIS
In this section, a statistical analysis is introduced to estimate
the latency of data from a measured data size and instance
performance using two approaches: Multiple Linear Regression
and Multiple Non-linear Regression. This estimation
is implemented on both database types, MySQL and MongoDB,
for twofold aims. The first aim is to compare the
two types of databases in terms of latency. The second aim

3) LINEAR REGRESSION
A simple Linear Regression is used to illustrate the relation
between the dependent variable y and the independent variable
x based on the regression equation [36].

\[ y = a_1 x_1 + a_0 \tag{1} \]

The proposed evaluation needs to find a relation between
three variables: latency, data size and instance performance.
The dependent variable (latency) is related to two independent
variables (data size and instance performance). In this
case, the Multiple Linear Regression can be implemented as
follows [35]:

\[ y = a_2 x_2 + a_1 x_1 + a_0 \tag{2} \]

where x1 is the data size, x2 is the instance performance and y
is the latency. Table 9 shows the parameter notations which
are used in this evaluation.
The determinant method is used to solve the previous equation
and get the final equation of prediction for both MongoDB
and MySQL latency as [36]:

\[
\begin{vmatrix}
y & x_1 & x_2 & 1 \\
\sum_{i=1}^{n} y & \sum_{i=1}^{n} x_1 & \sum_{i=1}^{n} x_2 & n \\
\sum_{i=1}^{n} x_1 y & \sum_{i=1}^{n} x_1^2 & \sum_{i=1}^{n} x_1 x_2 & \sum_{i=1}^{n} x_1 \\
\sum_{i=1}^{n} x_2 y & \sum_{i=1}^{n} x_1 x_2 & \sum_{i=1}^{n} x_2^2 & \sum_{i=1}^{n} x_2
\end{vmatrix} = 0 \tag{3}
\]

Substituting the values from Table 8 into equation (3) gives
the values of a2, a1, and a0.
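TABLE 8. Dataset information: latency, data size, and instance performance.
TABLE 9. Parameters' notations used in the proposed evaluation.

\[
\begin{vmatrix}
y & x_1^2 & x_2^2 & x_1 x_2 & x_1 & x_2 & 1 \\
\sum_{i=1}^{n} y & \sum_{i=1}^{n} x_1^2 & \sum_{i=1}^{n} x_2^2 & \sum_{i=1}^{n} x_1 x_2 & \sum_{i=1}^{n} x_1 & \sum_{i=1}^{n} x_2 & n \\
\sum_{i=1}^{n} x_1^2 y & \sum_{i=1}^{n} x_1^4 & \sum_{i=1}^{n} x_1^2 x_2^2 & \sum_{i=1}^{n} x_1^2 x_1 x_2 & \sum_{i=1}^{n} x_1^2 x_1 & \sum_{i=1}^{n} x_1^2 x_2 & \sum_{i=1}^{n} x_1^2 \\
\sum_{i=1}^{n} x_2^2 y & \sum_{i=1}^{n} x_1^2 x_2^2 & \sum_{i=1}^{n} x_2^4 & \sum_{i=1}^{n} x_2^2 x_1 x_2 & \sum_{i=1}^{n} x_2^2 x_1 & \sum_{i=1}^{n} x_2^2 x_2 & \sum_{i=1}^{n} x_2^2 \\
\sum_{i=1}^{n} x_1 x_2 y & \sum_{i=1}^{n} x_1^2 x_1 x_2 & \sum_{i=1}^{n} x_2^2 x_1 x_2 & \sum_{i=1}^{n} x_1^2 x_2^2 & \sum_{i=1}^{n} x_2 x_1^2 & \sum_{i=1}^{n} x_1 x_2^2 & \sum_{i=1}^{n} x_1 x_2 \\
\sum_{i=1}^{n} x_1 y & \sum_{i=1}^{n} x_1^2 x_1 & \sum_{i=1}^{n} x_2^2 x_1 & \sum_{i=1}^{n} x_1^2 x_2 & \sum_{i=1}^{n} x_1^2 & \sum_{i=1}^{n} x_1 x_2 & \sum_{i=1}^{n} x_1 \\
\sum_{i=1}^{n} x_2 y & \sum_{i=1}^{n} x_1^2 x_2 & \sum_{i=1}^{n} x_2^2 x_2 & \sum_{i=1}^{n} x_1 x_2^2 & \sum_{i=1}^{n} x_1 x_2 & \sum_{i=1}^{n} x_2^2 & \sum_{i=1}^{n} x_2
\end{vmatrix} = 0 \tag{7}
\]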
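The first row of the determinant in equation (7) lists the terms x1², x2², x1x2, x1, x2 and 1, which suggests a second-order model in the data size x1 and the instance performance x2. The Python sketch below fits such a model by least squares, assuming that form; it solves the normal equations by Gauss-Jordan elimination rather than by expanding the 7 x 7 determinant. The function names, the ordering of the coefficients and the sample data are assumptions made for illustration, and the data are synthetic rather than taken from Table 8.

```python
def features(x1, x2):
    """Terms of the assumed second-order model, in the column order of (7)."""
    return [x1 * x1, x2 * x2, x1 * x2, x1, x2, 1.0]


def solve(A, b):
    """Solve A c = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_quadratic(samples):
    """samples: iterable of (x1, x2, y). Returns the six fitted coefficients."""
    rows = [(features(x1, x2), y) for x1, x2, y in samples]
    k = 6
    # Normal equations (X^T X) c = X^T y, built from sums of products.
    xtx = [[sum(f[i] * f[j] for f, _ in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(f[i] * y for f, y in rows) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, x1, x2):
    """Estimated latency for a data size x1 and instance performance x2."""
    return sum(c * f for c, f in zip(coeffs, features(x1, x2)))


# Synthetic samples on a 3 x 3 grid, generated from known coefficients.
true_coeffs = [0.5, -1.0, 2.0, 3.0, -2.0, 4.0]
samples = [(a, b, predict(true_coeffs, a, b))
           for a in (1.0, 2.0, 3.0) for b in (1.0, 2.0, 3.0)]
fitted = fit_quadratic(samples)
```

Fitting one coefficient vector per DBMS and calling `predict` for the same pair of inputs gives two estimated latencies, and the DBMS with the lower estimate would be the one selected.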
[2] B. Diene, J. Rodrigues, O. Diallo, E. Ndoye, and V. V. Korotaev, "Data management techniques for Internet of Things," Mech. Syst. Signal Process., vol. 138, Apr. 2020, Art. no. 106564.
[3] S. Kontogiannis, C. Asiminidis, and G. Kokkonis, "Comparing relational and NoSQL databases for carrying IoT data," J. Sci. Eng. Res., vol. 6, no. 1, pp. 125–133, 2019.
[4] R. Čerešnák and M. Kvet, "Comparison of query performance in relational a non-relation databases," Transp. Res. Procedia, vol. 40, pp. 170–177, Jan. 2019.
[5] B. Jose and S. Abraham, "Analysis of aggregate functions in relational databases and NoSQL databases," Int. J. Comput. Sci. Eng., vol. 6, no. 6, pp. 74–79, Jul. 2018.
[6] W. Ali, M. U. Shafique, M. A. Majeed, and A. Raza, "Comparison between SQL and NoSQL databases and their relationship with big data analytics," Asian J. Res. Comput. Sci., pp. 1–10, Oct. 2019.
[7] J. Dizdarević, F. Carpio, A. Jukan, and X. Masip-Bruin, "A survey of communication protocols for Internet of Things and related challenges of fog and cloud computing integration," ACM Comput. Surv., vol. 51, no. 6, pp. 1–29, Feb. 2019.
[8] Y. Liu, K. Akram Hassan, M. Karlsson, Z. Pang, and S. Gong, "A data-centric Internet of Things framework based on azure cloud," IEEE Access, vol. 7, pp. 53839–53858, 2019.
[9] A. Celesti, A. Galletta, L. Carnevale, M. Fazio, A. Lay-Ekuakille, and M. Villari, "An IoT cloud system for traffic monitoring and vehicular accidents prevention based on mobile sensor data processing," IEEE Sensors J., vol. 18, no. 12, pp. 4795–4802, Jun. 2018.
[10] Z. Daher and H. Hajjdiab, "Cloud storage comparative analysis Amazon simple storage vs. microsoft azure blob storage," Int. J. Mach. Learn. Comput., vol. 8, no. 1, pp. 85–89, Feb. 2018.
[11] V. M. Ionescu and J. M. Lopez-Guede, "Comparing Google Cloud and Microsoft Azure platforms for undergraduate laboratory use," in Proc. Int. Workshop Soft Comput. Models Ind. Environ. Appl. (SOCO), San Sebastián, Spain, Oct. 2016, pp. 795–802.
[12] J. Kaur and M. Sharma, "Extending IoTs into the cloud-based platform for examining Amazon Web services," in Examining Cloud Computing Technologies Through the Internet of Things. Hershey, PA, USA: IGI Global, 2018, pp. 216–227. [Online]. Available: http://www.igi-global.com
[13] M. Laaziri, K. Benmoussa, S. Khoulji, and M. L. Kerkeb, "A Comparative study of PHP frameworks performance," Procedia Manuf., vol. 32, pp. 864–871, Jan. 2019.
[14] J. M. Volk and M. A. Turner, "PRMS-Python: A Python framework for programmatic PRMS modeling and access to its data structures," Environ. Model. Softw., vol. 114, pp. 152–165, Apr. 2019.
[15] D. Laksono, "Testing spatial data deliverance in SQL and NoSQL database using NodeJS fullstack Web app," in Proc. 4th Int. Conf. Sci. Technol. (ICST), Yogyakarta, Indonesia, Aug. 2018, pp. 1–5.
[16] M. Ohyver, J. V. Moniaga, I. Sungkawa, B. E. Subagyo, and I. A. Chandra, "The comparison firebase realtime database and MySQL database performance using Wilcoxon signed-rank test," Procedia Comput. Sci., vol. 157, pp. 396–405, Jan. 2019.
[17] L. Bienvenu and R. Downey, "On low for speed oracles," J. Comput. Syst. Sci., vol. 108, pp. 49–63, Mar. 2020.
[18] P. Senellart, L. Jachiet, S. Maniu, and Y. Ramusat, "ProvSQL: Provenance and probability management in postgreSQL," Proc. VLDB Endowment, vol. 11, no. 12, pp. 2034–2037, Aug. 2018.
[19] R. R. Parmar and S. Roy, "MongoDB as an efficient graph database: An application of document oriented NOSQL database," Data Intensive Comput. Appl. Big Data, vol. 29, pp. 331–358, Feb. 2018.
[20] M. Ben Brahim, W. Drira, F. Filali, and N. Hamdi, "Spatial data extension for cassandra NoSQL database," J. Big Data, vol. 3, no. 1, Dec. 2016.
[21] H. V. Le and A. Takasu, "G-HBase: A high performance geographical database based on HBase," IEICE Trans. Inf. Syst., vol. E101.D, no. 4, pp. 1053–1065, 2018.
[22] C. Asiminidis, G. Kokkonis, and S. Kontogiannis, "Managing IoT data using relational schema and JSON fields, a comparative study," IOSR J. Comput. Eng., vol. 20, no. 6, pp. 46–52, 2019.
[23] V. Jain, "MongoDB and NoSQL databases," Int. J. Comput. Appl., vol. 167, no. 10, pp. 8887–8975, 2017.
[24] J. Fjällid, "A comparative study of databases for storing sensor data," M.S. thesis, Dept. Comp. Sci., Tech. Univ., Stockholm, Sweden, 2019.
[25] Y.-S. Kang, I.-H. Park, J. Rhee, and Y.-H. Lee, "MongoDB-based repository design for IoT-generated RFID/Sensor big data," IEEE Sensors J., vol. 16, no. 2, pp. 485–497, Jan. 2016.
[26] B. Maity, S. Sen, and N. C. Debnath, "Retracted: Challenges of implementing data warehouse in MongoDB environment," J. Fundam. Appl. Sci., vol. 10, no. 4S, pp. 222–228, 2018.
[27] C. Gyorödi, R. Gyorödi, and R. Sotoc, "A comparative study of relational and non-relational database models in a Web-based application," Int. J. Adv. Comput. Sci. Appl., vol. 6, no. 11, pp. 78–83, 2015.
[28] L. Kumar, S. Rajawat, and K. Joshi, "Comparative analysis of NoSQL (MongoDB) with MySQL database," Int. J. Modern Trends Eng. Res., vol. 2, no. 5, pp. 120–127, May 2015.
[29] Z. Bicevska and I. Oditis, "Towards NoSQL-based data warehouse solutions," Procedia Comput. Sci., vol. 104, pp. 104–111, Jan. 2017.
[30] C. Li and J. Gu, "An integration approach of hybrid databases based on SQL in cloud computing environment," Softw., Pract. Exper., vol. 49, no. 3, pp. 401–422, Mar. 2019.
[31] Y. Rasheed, M. Qutqut, and F. Almasalha, "Overview of the current status of NoSQL database," Int. J. Comput. Sci. Netw. Secur., vol. 19, no. 4, pp. 47–53, Apr. 2019.
[32] C. Asiminidis, G. Kokkonis, and S. Kontogiannis, "Database systems performance evaluation for IoT applications," Int. J. Database Manage. Syst., vol. 10, no. 06, pp. 1–14, Dec. 2018.
[33] T3. US. Accessed: Jun. 25, 2019. [Online]. Available: https://aws.amazon.com/ec2/instance-types/t3
[34] Romania. Pollution Measurements for the City of Brasov in Romania. Accessed: Jun. 25, 2019. [Online]. Available: http://iot.ee.surrey.ac.uk:8080/datasets.html#weather
[35] J. Moon, S. Kum, and S. Lee, "A heterogeneous IoT data analysis framework with collaboration of edge-cloud computing: Focusing on indoor PM10 and PM2.5 status prediction," Sensors, vol. 19, no. 14, p. 3038, Jul. 2019.
[36] D. J. Hand, "Statistical challenges of administrative and transaction data," J. Roy. Stat. Soc., A (Statist. Soc.), vol. 181, no. 3, pp. 555–605, Jun. 2018.
[37] B. Shyti and D. Valera, "The regression model for the statistical analysis of albanian economy," Int. J. Math. Trends Technol., vol. 62, no. 2, pp. 90–96, Oct. 2018.
[38] M. El Genidy, "Multiple nonlinear regression of the Markovian arrival process for estimating the daily global solar radiation," Commun. Statist.-Theory Methods, vol. 48, no. 22, pp. 5427–5444, Oct. 2018.
[39] M. M. El Genidy, "Multiple non linear regression model for the maximum number of migratory bird types during migration years," Commun. Statist.-Theory Methods, vol. 46, no. 16, pp. 7969–7975, Aug. 2017.

MAHMOUD EYADA received the B.Sc. degree in computer and mathematical science from the Faculty of Science, Port Said University, in 2016. He is a Server Side Back-END Developer, a Database Analyzer, and an IoT Systems Designer who has worked as a freelancer on worldwide projects since his last year in college. His research interests include database management systems, data analysis, the Internet of Things, Internet protocols, wireless sensor networks, and embedded systems.

WALAA SABER received the B.Sc. and M.Sc. degrees in computer and control engineering from Suez Canal University in 2001 and 2008, respectively, and the Ph.D. degree in computer and control engineering from Port Said University, Egypt, in 2014. She is an Assistant Professor with the Electrical Engineering Department, Port Said University. Her research interests include computer networking, including cloud computing, clustering, and the Internet of Things.

MOHAMMED M. EL GENIDY received the Ph.D. degree in statistics and computer science from Mathematics Department, Faculty of Science, Mansoura University, Mansoura, Egypt, in 2001. He is currently an Assistant Professor of statistics with the Faculty of Science, Mathematics and Computer Science Department, Port Said University, Port Said, Egypt. His research interests include statistics, probability, order statistics, queues theory, regression, estimation, distributions, mathematical programming, statistical tests, and hypotheses test.

FATHY AMER received the B.Sc. degree in military science and the B.Sc. degree in electrical and communication engineering from the Military Technical College (MTC), Cairo, Egypt, in 1970, the M.Sc. degree in electrical, mechatronics, and communication engineering (major area: electronics and communication) from Azhar University, Cairo, in 1985, and the Ph.D. degree in philosophy in computer science from Computer Science Department, Azhar University. From May 1970 to June 1993, he was an Engineer Officer and a Lecturer with the Military Technical College and various places in the armed forces in the field of computers, information technology, electrical engineering, communications, and electronic insurance. From September 2012 to August 2013, he was the Dean of the Higher Institute of Engineering and Technology, Obour, Egypt. From September 2013 to September 2016, he was a Professor Emeritus with the Department of Information Technology, Faculty of Computers and Information, Cairo University. Since October 2016, he has been the Vice Dean of Community Affairs and the Environment, Misr University for Science and Technology. He is currently a Professor with the 6th of October University. His research interests include local, expanding and international networking, databases, multimedia, the Internet of Things, and information technology.