Article
Monitoring and Predicting Air Quality with IoT Devices
Claudia Banciu 1, * , Adrian Florea 1 and Razvan Bogdan 2
1 Department of Computer Science and Electrical Engineering, Lucian Blaga University of Sibiu,
550025 Sibiu, Romania; [email protected]
2 Department of Computers and Information Technology, “Politehnica” University of Timisoara,
300006 Timișoara, Romania; [email protected]
* Correspondence: [email protected]
Abstract: The growing concern about air quality and its influence on human health has prompted
the development of sophisticated monitoring and forecast systems. This article gives a thorough
investigation into forecasting the air quality index (AQI) with an Internet of Things (IoT) device that
analyzes temperature, humidity, PM10, and PM2.5 levels. The dataset used for this analysis comprises
5869 data points across six critical parameters essential for accurate air quality prediction. The data
from these sensors is sent to the ThingSpeak cloud platform for storage and preliminary analysis.
The system forecasts AQI using a TensorFlow-based regression model, delivering real-time insights.
The combination of IoT technology and machine learning improves the accuracy and responsiveness
of air quality monitoring systems, making it a useful tool for environmental management and public
health protection. This work compares the effectiveness of feedforward neural network
models trained with the ‘adam’ and ‘RMSprop’ optimizers over different numbers of epochs
with that of the random forest algorithm with varying numbers of estimators in forecasting AQI.
The models were trained using both types of regression analysis: linear regression and random
forest regression. The findings show that the model achieves a high degree of accuracy, with the
predictions closely aligning with the actual AQI values, thus having the potential to significantly
reduce the negative health impact associated with poor air quality, protecting public health and
alerting users when pollution levels are higher than allowed. Specifically, the random forest model
with 100 estimators delivers the best overall performance for both AQI 10 and AQI 2.5, achieving the
lowest Mean Absolute Error (MAE) of 0.2785 for AQI 10 and 0.2483 for AQI 2.5. This integration of
IoT technology and advanced predictive analysis addresses the significant worldwide issue of air
pollution by identifying pollution hotspots and enabling decision-makers to react quickly and
develop effective strategies to reduce pollution sources.
Citation: Banciu, C.; Florea, A.; Bogdan, R. Monitoring and Predicting Air Quality with IoT Devices.
Processes 2024, 12, 1961. https://doi.org/10.3390/pr12091961
Keywords: Internet of Things; prediction; machine learning; sensors; air quality index; environment
Academic Editors: Silvia Carpitella,
Manuel Herrera, Bruno Melo Brentan
and Joaquín Izquierdo
information-sensing devices and/or controlled online anytime. The issue of air pollution is
relevant and significant since it affects the entire natural ecosystem, and endangers people’s
health, being also associated with infectious disease transmission [2]. According to the
World Health Organization, polluted air causes millions of deaths each year, primarily due
to diseases such as heart disease, stroke, chronic obstructive pulmonary
disease, lung cancer, and acute respiratory infections [3]. To mitigate these consequences, it
is critical to accurately assess air quality and anticipate its degradation in the short term
caused by changes in wind directions and intensification, or other calamities. Platforms
equipped with particle detectors may alert citizens to the rising quantities of pollen or dust.
In such cases, it may be recommended that people avoid regions that may be hazardous to
their health, take a different route, and find the nearest pharmacies where antihistamine
medications may be purchased [4].
The need for integrated IoT devices in environmental monitoring is urgent because some
cities rely on old, fixed infrastructure and lack advanced sensor technologies, optimized
connectivity, and sustainable energy solutions. These gaps hinder real-time data acquisition,
processing, and analysis, limiting informed decision making and sustainable resource
management [5]. These issues could be addressed by
designing advanced IoT devices that revolutionize environmental data collection [6], ensure
relevance across applications, and provide actionable insights for stakeholders [7].
This paper explores merging IoT systems and machine learning algorithms to predict
the air quality index, which is crucial for public health and environmental management. Machine
learning (ML) is a subset of the artificial intelligence (AI) domain that involves training
algorithms to recognize patterns and make predictions based on data, gradually improving
their accuracy. The learning system of an ML algorithm is divided into three main parts: a
decision process, an error function, and an iterative ‘evaluate and optimize’ process. In
environmental monitoring, machine learning algorithms can analyze complex datasets
generated by IoT devices, learning from past data to make accurate predictions about future
air quality conditions. By processing large amounts of data, machine learning models can
identify trends, detect anomalies, and provide early warnings, making them invaluable
for forecasting pollution levels and helping to mitigate health risks. The study uses an
IoT device to measure indoor environmental characteristics like temperature, humidity,
PM10, and PM2.5. PM10 and PM2.5 refer to particulate matter with diameters less than
10 µm and 2.5 µm, respectively (e.g., allergens, like pollens, mold spores, dust mites, and
cockroaches). PM2.5 is more harmful to human health than PM10 due to its smaller size and
ability to penetrate deeper into the respiratory system [8]. The data are stored on
ThingSpeak for analysis. A TensorFlow-based regression model predicts air quality index
(AQI), providing timely alerts and insights for preventive actions. This could help identify
pollutants, highlight high concentrations, and predict future dangerous areas. Machine
learning algorithms provide accurate and fast air quality estimates, crucial for public health
responses and environmental policy development. These technologies detect complex
environmental data patterns, increasing AQI forecast accuracy. Real-time information and
forecasts limit exposure to harmful pollutants, minimizing health risks. Combining these
technologies creates scalable, cost-effective air quality monitoring networks. This AIoT
system (AI + IoT) is a prototype and, at this stage, has been tested only indoors
(in a private house). It can be applied on industrial shop floors, in offices, in school
classrooms, etc. It is also easily extendable to include other sensors and to be tested outdoors
to help city-level decision-makers or local environmental agencies. The machine
learning algorithms are not affected by the place where the IoT system is applied.
Briefly, the research objectives are the following:
RO1: Identifying hotspot areas from the air and noise pollution point of view by developing
an IoT system (the hardware and software components) responsible for data gathering,
reading sensors’ values, and uploading them on the cloud/web server;
RO2: Developing the software application (data analytics level of IoT system) for con-
tinuous information about indoor and outdoor environment quality, possible threats,
and advice;
RO3: Implementing the prediction algorithms of weather parameters and pollutant trends;
RO4: Implementing the module for data visualization and proposing suggestions for
decision-makers.
The rest of this paper is structured as follows: Section 2 reviews previous work and
challenges in the field; Section 3 describes the proposed solution—the hardware system
implemented, the methodology of data collection, the metrics involved, and analysis
processes; and Section 4 briefly highlights the results and significant findings. Section 5 reflects
on the limitations of this work and provides a more comprehensive analysis of the challenges
of air quality monitoring in outdoor environments. Finally, Section 6 summarizes the
key points and suggests future research directions.
2. Related Work
In recent years, significant progress has been achieved in the ability to monitor and,
in some situations, anticipate air quality. Government monitoring stations are accurate
but are limited in number and have high operational costs [9].
To address these restrictions, researchers have combined IoT devices to improve flexibility
and cost-effectiveness in monitoring air quality [10]. The Internet of Things (IoT) and ma-
chine learning have significantly improved air quality monitoring and prediction systems.
IoT-enabled systems collect real-time data on air pollutants, which are then evaluated using
machine learning techniques to forecast air quality levels [11,12]. The rapid expansion of
IoT technology has revolutionized industries with remote monitoring and sophisticated
analytics [13]. Monitoring air quality is crucial for resolving health issues and mitigating
the effects of poor air quality on public health [14]. The rapidly expanding field of IoT mon-
itoring indoor parameters includes sensor technology, data administration, user experience,
health consequences, calibration, validation, and integration [15].
For example, Kumar Sai et al. [16] proposed an inexpensive IoT-based air quality
monitoring system based on Arduino and the MQ series (specifically MQ135 and MQ7).
These sensors detect ammonia, carbon dioxide, alcohol, smoke, and carbon monoxide. This
arrangement allows for the cost-effective and adaptable display of contaminants in the
air. The system analyzes air quality using data obtained from several sensors, proving the
feasibility of using low-cost sensors for environmental monitoring. This strategy not only
improves the availability of air quality monitoring, but also ensures that IoT can be used to
address environmental challenges. The authors emphasize the importance of accessible air
quality monitoring to raise awareness and improve public health.
Karnati [17] assessed IoT-based air pollution monitoring systems focusing on big data
and machine learning. They highlighted the need for smart devices and advanced analytics
for effective air quality control plans.
Air pollution affects human health, plant life, and wildlife. Traditional methods like
lab analysis and expensive models are no longer effective. Recent research focuses on smart
devices using machine learning algorithms, big data technologies, and IoT to collect and
analyze air data [18]. The aim is to improve air pollution models and address research
challenges, focusing on data sources, monitoring, and forecasting models. Some of the
shortcomings of the data collection systems are the fixed and aging infrastructure of the
local environmental protection agencies and the shifts in traffic and industrial hubs
in certain cities, which mean that data are still collected from areas that are no longer
as relevant as they were 10–15 years ago [4].
Al horr et al. [19] explore the impact of indoor factors such as temperature,
humidity, air quality, and illumination on occupant health, well-being, and productivity.
Identifying the best parameter ranges for human comfort and performance is a research goal.
Zhang et al. [20] assess indoor particulate matter in urban households, highlighting health
hazards and practical ways to reduce exposure. They suggest improving ventilation, using
air purifiers, using cleaner cooking methods, and reducing indoor PM-generating activities.
Pope and Dockery’s [21] study documents the severe health consequences of PM2.5 exposure,
emphasizing the link between fine particulate matter and increased risk of cardiovascular
and respiratory disorders. They call for strict air quality standards and policies to protect
public health and stress the need for ongoing research and effective treatments to
reduce these risks. Gope, Dawn, and Das’ study [22] on the impact of COVID-19 lockdown
measures on air quality found that these measures positively impacted the environment,
resulting in lower air pollutant concentrations in cities worldwide. The study also found a
significant decrease in PM2.5 and PM10 levels during lockdown periods, demonstrating the
effectiveness of lockdown measures in improving air quality. These findings contribute to a
growing body of research on the environmental effects of pandemic-induced societal shifts.
The studies [23,24] examine the impact of COVID-19 on air quality in 87 major cities
globally, highlighting the environmental impacts of reduced human activity. The authors
suggest using the pandemic as a natural experiment to understand the relationship between
human activities and air quality. They advocate for sustainable urban planning; promoting
public transport, cycling, and walking; and accelerating the adoption of cleaner energy
sources. They also suggest implementing air quality regulations to maintain the lower pollution
levels observed during lockdowns and raising public awareness about improved air quality. The
studies found a strong correlation between reduced human activity and improved air quality,
suggesting that long-term improvements can be achieved through the sustainable practices
adopted during the pandemic.
Monitoring indoor elements using IoT devices is driven by a desire to create healthier
and more sustainable indoor environments [25]. Researchers intend to increase occupant
well-being and productivity by addressing research questions on sensor technology, data
management, user experience, health implications, calibration and validation, and inte-
gration and interoperability [19]. By solving these research concerns, progress can be
achieved in monitoring indoor parameters with IoT devices, resulting in healthier and
more sustainable indoor environments.
In [26], the authors developed a monitoring network in the city of Salerno (placed
at the seaside in southern Italy) composed of three collection stations of air quality in
relatively highly crowded areas. The stations combine information from sensors for PM10
detection, temperature, humidity, pressure, and wind direction in order to better evaluate
the pollution degree. The authors applied an interpolation model to determine the areas of
highest pollution concentration within the monitored area based on weather conditions,
traffic speed and volume, and street geometry. Unfortunately, the applicability of the
solution is limited by legal regulations on the location of IoT monitoring systems
(data collection stations). At least in Romania, only city halls have the right to install
such monitoring systems in public spaces.
Most research studies do not simultaneously present information about developing
an IoT system and applying machine learning algorithms on data (produced by their IoT
system or obtained from other sources). Usually, the topics are split: studies discuss either
the development of the IoT system (the hardware perspective) or the machine
learning algorithms (the software perspective). Unlike the previously mentioned papers,
this work combines both topics in a holistic approach, targeting a social problem, namely
air quality monitoring with an impact on human health and employee productivity in shop
floors or offices. The novelty of this AIoT solution consists of developing a fully functional
IoT system that produces real data regarding indoor air pollution which are then analyzed
and used as input in machine learning algorithms for predicting air quality index based on
indoor environmental characteristics like temperature, humidity, PM10, and PM2.5. The
developed application can be used as a decision or warning tool for the population by
employers (if it is used indoors) or local authorities (if it is used outdoors).
3. Proposed Solution
This article presents practical solutions for addressing challenges in air quality monitoring
systems using IoT-based data acquisition and TensorFlow-based analysis. The strategy aims to
enhance the effectiveness and efficiency of these systems, enabling better environmental
monitoring and management to support healthier living conditions. Indoor parameters play a
crucial role in occupant health, well-being, and productivity [27]. The project aims to collect
natural environmental parameters such as temperature, humidity, and air quality, which are
influenced by suspended particles such as PM10 and PM2.5, which are complex mixtures of very
small particles and liquid droplets. The sensors used for collecting the data are high-precision
sensors with low power consumption.
Figure 1 represents the system architecture, which reflects the flow of data from collection to
prediction, integrating the export of data from ThingSpeak as a CSV file for analysis with
TensorFlow. This provides a streamlined process for real-time air quality monitoring and
forecasting.
Figure 1. System architecture.
3.1. Hardware Part
Hardware represents the physical portion of a computer model, as opposed to software, which
deals with the logical part.
Figure 2 presents the structure of the device. The processing unit with a Wireless Local Area
Network (WLAN) module reads data from the connected sensors. The data are sent via HTTP to the
cloud at configurable intervals (currently 1 min). The device has 3 sensors connected and is
currently in use for temperature, humidity, and dust particles. The system could be easily
extended with gas or other sensors.
Figure 2. Hardware architecture.
3.1.1. Raspberry Pi 4 B
The Raspberry Pi 4 Model B offers enhanced multimedia capabilities, computing speed, memory,
and networking with a 64-bit quad-core processor, dual displays, and PoE capability [28].
3.1.2. Temperature Sensor TMP117
TMP117 is a precise digital temperature sensor [29]. The temperature sensor uses digital data
and Raspberry Pi’s GPIO pins for easy connection. To use the I2C communication protocol, SDA
and SCL pins must be connected to microcontroller pins, requiring I2C-6 channel creation.
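As an illustration of what an I2C read of this sensor yields, the TMP117 datasheet specifies that the temperature result is a 16-bit two's-complement value with a resolution of 0.0078125 °C per least significant bit. The sketch below covers only the conversion of the two raw bytes into degrees Celsius. The function name is hypothetical, and the bus access itself (for example through an SMBus library on the I2C-6 channel) is omitted because the paper does not show it.

```python
def tmp117_raw_to_celsius(msb: int, lsb: int) -> float:
    """Convert the two bytes of a TMP117 temperature reading to degrees Celsius.

    Assumption (from the TMP117 datasheet, not from the paper): the reading is a
    16-bit two's-complement value, most significant byte first, with
    0.0078125 degC per least significant bit.
    """
    raw = (msb << 8) | lsb
    if raw & 0x8000:          # sign bit set: negative temperature
        raw -= 1 << 16
    return raw * 0.0078125
```

For instance, the byte pair `0x0C, 0x80` corresponds to 25 °C.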
3.1.3. Humidity Sensor HIH-4030
The project utilized a SparkFun breakout board for Honeywell’s HIH-4030 humidity sensor,
measuring relative humidity and providing an analog output voltage for easy data processing [30].
The analog sensor requires a converter from analog data into digital values for humidity
calculation, influenced by ambient temperature using Formula (1).
$$\text{True RH} = \frac{\text{Sensor RH}}{1.0546 - 0.00216\,T}, \quad T \text{ in } ^{\circ}\text{C}, \tag{1}$$
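Formula (1) can be applied directly in the acquisition script. The following is a minimal sketch, assuming the uncompensated sensor reading has already been converted from the ADC voltage to percent relative humidity; the function name is hypothetical and not taken from the paper.

```python
def true_relative_humidity(sensor_rh: float, temp_c: float) -> float:
    """Temperature-compensate a relative humidity reading using Formula (1).

    sensor_rh: uncompensated relative humidity in %RH
    temp_c:    ambient temperature in degrees Celsius
    """
    return sensor_rh / (1.0546 - 0.00216 * temp_c)
```

At 25 °C the denominator is 1.0006, so the correction is negligible; it grows as the temperature moves away from that point.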
Figure 3. Interconnecting components.
Figure 4 depicts the electrical diagram and shows how each sensor is connected to the
microcontroller. In the provided diagram, the color coding for the connections is as follows:
red wires are used for power connections (3.3 V and 5 V), black wires are used for ground
connections, yellow wires represent SDA (I2C data), orange wires are for SCL (I2C Clock), blue
wires are used for the analog output from the HIH 4030 to A0 on the ADS 1015, and green wires
are for the connections to the SDS011.
Figure 4. Electrical diagram.
The flow diagram below indicates the process of data gathering and transmission via an IoT
device and describes a data-gathering system based on a Raspberry Pi. The method begins with
turning on the Raspberry Pi, and then running an application that activates the sensors to
collect data. The system then looks for a Wi-Fi connection; if one is not available, the program
terminates. If Wi-Fi is enabled, the system connects to a cloud database, and transfers and
stores the collected data. After storing, the system waits for one minute before
Figure 5. Flowchart RPI application.
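The flow in Figure 5 can be sketched as a simple loop. This is an illustration under stated assumptions rather than the authors' application: the three callables (`read_sensors`, `wifi_available`, `upload`) are hypothetical placeholders for the sensor drivers, the connectivity check, and the cloud upload, and the loop is assumed to repeat after each one-minute wait.

```python
import time
from typing import Callable, Dict, Optional


def run_acquisition(
    read_sensors: Callable[[], Dict[str, float]],
    wifi_available: Callable[[], bool],
    upload: Callable[[Dict[str, float]], None],
    interval_s: float = 60.0,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Collect, check connectivity, upload, wait; return the number of uploads.

    The program terminates as soon as no Wi-Fi connection is available,
    mirroring the flowchart. max_cycles is an added convenience for testing.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        sample = read_sensors()      # activate the sensors and collect data
        if not wifi_available():     # no connection: terminate the program
            break
        upload(sample)               # transfer and store the data in the cloud
        cycles += 1
        sleep(interval_s)            # wait (1 min by default) before the next cycle
    return cycles
```

Passing the hardware and network operations in as callables keeps the control flow testable on a machine without sensors attached.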
3.2. Software Part
This component oversees gathering data, delivering it to a database, and predicting it.
Software is a collection of instructions, data, or programs used to run machines and complete
certain tasks.
3.2.1. Acquisition of Data by Sensors
The Raspberry Pi microcontroller works by running an application developed using the
Python 3.9 programming language.
Using the collected PM10 and PM2.5 values, AQI is calculated as in (2). This requires
installing the Python library python-aqi [33]. This library converts between pollutant
concentrations (µg/m3 or ppm) and AQI values using the United States Environmental Protection
Agency (EPA) algorithm [34,35] (Table 1).
$$I = \frac{I_{high} - I_{low}}{BP_{high} - BP_{low}} \cdot (C - BP_{low}) + I_{low}, \tag{2}$$
where
$I$ = the (Air Quality) index
$C$ = the pollutant concentration
$BP_{low}$ = the concentration breakpoint that is ≤ $C$
$BP_{high}$ = the concentration breakpoint that is ≥ $C$
$I_{low}$ = the index breakpoint corresponding to $BP_{low}$
$I_{high}$ = the index breakpoint corresponding to $BP_{high}$
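To make the interpolation in (2) concrete, the following minimal sketch implements it directly instead of calling python-aqi. The breakpoint rows are an assumption: they are the PM2.5 (24 h average, µg/m3) values from the EPA table in use before its 2024 revision and may differ from Table 1. The function and constant names are hypothetical, and the concentration is assumed to be given at the table's resolution of one decimal place.

```python
from typing import Sequence, Tuple

# Rows of (BP_low, BP_high, I_low, I_high).
# ASSUMPTION: illustrative pre-2024 EPA breakpoints for PM2.5; replace with the
# values from Table 1 when reproducing the paper's results.
PM25_BREAKPOINTS: Tuple[Tuple[float, float, int, int], ...] = (
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
)


def aqi_from_concentration(
    c: float,
    breakpoints: Sequence[Tuple[float, float, int, int]] = PM25_BREAKPOINTS,
) -> float:
    """Linearly interpolate the index I for concentration C, as in Equation (2)."""
    for bp_low, bp_high, i_low, i_high in breakpoints:
        if bp_low <= c <= bp_high:
            return (i_high - i_low) / (bp_high - bp_low) * (c - bp_low) + i_low
    raise ValueError(f"concentration {c} is outside the breakpoint table")
```

For example, a concentration of 23.75 µg/m3 lies halfway through the 12.1 to 35.4 band, so the index lies halfway between 51 and 100, at about 75.5.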
3.2.2. ThingSpeak Cloud Computing Platform
The collected data are sent to ThingSpeak, which is an ‘Application Programming Interface’
(API) and web service for the ‘Internet of Things’ (IoT) [36]. ThingSpeak serves as a valuable
tool for displaying and analyzing the environmental data collected through the IoT-based
monitoring system implemented in the study. A new channel is created on ThingSpeak and data is
sent to it using an HTTP GET request. The URL was constructed with the Write API Key and the
appropriate field values, enabling the transmission of data to the ThingSpeak platform for
visualization and analysis.
The communication between the collecting stations and the central node is wireless. At 1 min
intervals, samples of data are collected and sent to the device. In order to connect and send
data to the ThingSpeak platform, the device is connected to a Wireless Local Area Network
(WLAN) and Hyper-Text Transfer Protocol (HTTP) is used, as can be seen in Figure 6.
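The HTTP GET request described above can be sketched with the Python standard library alone. This is an illustration, not the authors' code: the function names are hypothetical, the Write API Key is a placeholder supplied by the caller, and the mapping of values to `field1`, `field2`, and so on simply follows the order in which they are passed.

```python
from typing import Sequence
from urllib.parse import urlencode
from urllib.request import urlopen

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def build_update_url(write_api_key: str, field_values: Sequence[float]) -> str:
    """Build the GET URL carrying the Write API Key and one value per channel field."""
    params = {"api_key": write_api_key}
    for index, value in enumerate(field_values, start=1):
        params[f"field{index}"] = value
    return f"{THINGSPEAK_UPDATE_URL}?{urlencode(params)}"


def send_update(write_api_key: str, field_values: Sequence[float], timeout_s: float = 10.0) -> str:
    """Send one sample to the channel and return the raw response body."""
    with urlopen(build_update_url(write_api_key, field_values), timeout=timeout_s) as response:
        return response.read().decode("ascii").strip()
```

Using `urlencode` rather than manual string concatenation ensures that every parameter is separated by `&` and that values are escaped correctly.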
ThingSpeak offers 4 channels for free, where each channel can contain a maximum of 8 fields.
As the implemented IoT system collects 6 factors, only 6 fields are required. After creating
the channel, it is accessible through an API key. Having only 1 collecting device, this channel
is enough for this IoT system. Uploading data to ThingSpeak is made with HTTP requests. A GET
method is called to the ThingSpeak platform’s URL with the parameters (the API key and the
values for the 6 fields), e.g.:
URL = f'https://api.thingspeak.com/update?api_key=""&field1={temp_db}&field2={hum_db}&field3={pm_ten_db}&field4={pm_twofive_db}&field5={aqi_ten_db}&field6={aqi_twofive_db}'
The method of uploading data from the device was chosen due to the fact that connecting the
collecting station to the internet would increase the power consumption of the microcontroller.
In this way, the running time of the device increases.
The data are precisely read by the sensors, which include the temperature sensor, humidity
sensor, dust sensor, and index quality, as illustrated in Figure 7, and thus, no human
intervention is required in the sensing process. The IoT device is responsible for collecting
the environmental parameters, for processing the raw data and calculating the air quality index
based on the particle matter data, and for transmitting the data to the cloud database using
Wi-Fi.
The second component represents the cloud database, and it oversees storing the transmitted
data securely for real-time and historical analysis. It also provides functionality to export
the stored data as a CSV file for further analysis.
The last part is represented by the data analysis and prediction model. Firstly, the dataset
must be imported for analysis. Then, the prediction model would be created using
TensorFlow-based analysis. In the end, the resulting data will be displayed and visualized for
easy interpretation by the user.
Figure 7. Software architecture.
3.2.3.The
Making
secondPredictions
component represents the cloud database, and it oversees storing the
transmitted data securely
The proposed forimports
system real-timetheand
airhistorical analysis.
quality dataset It Google
into also provides functionality
Collab, saving it in
to export the stored data as a CSV file for further analysis.
CSV format for easy computer analysis. These data are easily analyzed using the Tensor-
FlowThe last part
(version is represented
2.17.0) by thewith
package included datathe
analysis
Googleand prediction
Collab program.model. Firstly, [37]
TensorFlow the
dataset must be imported for analysis. Then, the prediction model would be
is an open-source machine learning library developed by Google that offers two modes of created using
TensorFlow-based
execution: the eager analysis.
mode andIn the end,mode.
graph the resulting data will be displayed and visualized
for easy interpretation by the user.
The dataset includes 6 critical parameters that aid in air quality prediction. Initially,
the dataset
3.2.3. MakingisPredictions
preprocessed using appropriate approaches to remove inconsistent and
missing valued data, and the necessary features from the dataset are chosen to improve
The proposed system imports the air quality dataset into Google Colab, saving it in
CSV format for easy computer analysis. These data are easily analyzed using the TensorFlow
(version 2.17.0) package included with the Google Colab environment. TensorFlow [37]
is an open-source machine learning library developed by Google that offers two modes of
execution: the eager mode and graph mode.
The dataset includes 6 critical parameters that aid in air quality prediction. Initially,
the dataset is preprocessed using appropriate approaches to remove inconsistent and
missing valued data, and the necessary features from the dataset are chosen to improve the
Processes 2024, 12, 1961 11 of 25
outcomes. The dataset is then divided into two parts: training and testing to evaluate the
model’s performance.
Regression analysis [38] is a type of predictive modeling technique that examines the
relationship between a dependent and an independent variable. This technique is used for
forecasting or predicting, time series modeling, and determining the causal relationship
between variables.
1. Air quality dataset collection
The project uses an air quality dataset from the ThingSpeak cloud, available in CSV
format. The data, spanning 5 days from 17 to 21 May 2024 and taken from an office with
two people, provide per-minute average readings of the air components. The
dataset has 5869 rows and 8 columns.
2. Data preprocessing
This is a data mining method that converts raw data into a comprehensible format by
cleansing it, filling in missing values, smoothing noisy data, resolving inconsistencies, and
converting decimal values to suitable floats.
The errors or inconsistencies in the data are identified and corrected. This involves
removing duplicates, handling outliers, and correcting data entry errors. The missing
values are handled. The data are normalized or standardized by converting them to a
consistent scale, which is often necessary for algorithms that require normalized input
data. Decimal values are converted to suitable floats to maintain precision and ensure
consistent analysis.
In detail, the following preprocessing steps have been performed:
• Handling Missing Values: The method dropna(inplace = True) addresses missing values
by removing the rows with any missing data. This ensures that the dataset used for
training and testing does not contain null or missing entries.
• Outlier Detection and Removal using IQR: Outliers are detected using the Interquartile
Range (IQR) method. The IQR is calculated by subtracting the first quartile (Q1) from
the third quartile (Q3). Outliers are defined as values that lie below Q1−1.5IQR or
above Q3 + 1.5IQR. These rows are removed from the dataset.
• Data Transformation: The PowerTransformer with the yeo–johnson method [39] is used
to transform the selected numeric columns (temperature, humidity, pm10, and pm25).
The Yeo–Johnson transformation is suitable for data that include both positive and
negative values. It adjusts the distribution to be more Gaussian-like without los-
ing information. By making the data more Gaussian-like, it can often improve the
performance and stability of the machine learning models.
• Converting Data Types to Float: To ensure consistency in the data format, the numeric
columns are converted to float. This is crucial for accurate computations, especially in
machine learning models, which often require numeric input data in the float format.
• Normalization or Standardization: The code uses the StandardScaler method to stan-
dardize the feature values. This scales the data so that it has a mean of 0 and a standard
deviation of 1, which helps certain machine learning models perform better, especially
those that are sensitive to the scale of input data.
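The outlier removal and standardization steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the article applies pandas `dropna` and scikit-learn's `PowerTransformer` and `StandardScaler`, none of which are reproduced here, the `pm10` readings are hypothetical, and the quartile interpolation used by `statistics.quantiles` may differ slightly from the pandas default.

```python
import statistics


def iqr_bounds(values):
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def remove_outliers(values):
    """Keep only the values that lie inside the IQR fences."""
    low, high = iqr_bounds(values)
    return [v for v in values if low <= v <= high]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (population std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical PM10 readings; 95.0 plays the role of a sensor spike.
pm10 = [10.0, 11.0, 12.0, 11.5, 10.5, 12.5, 11.0, 95.0]
cleaned = remove_outliers(pm10)
scaled = standardize(cleaned)
print(cleaned)
print([round(v, 3) for v in scaled])
```

Removing outliers before standardizing matters here: a single spike would otherwise inflate the standard deviation and compress all the ordinary readings toward zero.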
3. Splitting training and test dataset
Separating datasets into training and testing sets is a crucial step in evaluating data
mining methods. Most data are used for training, while a smaller amount is used for
testing. The model is built on the training data and then evaluated against the test set,
which exposes discrepancies between the two and guides improvements to the model.
The data are split as follows: 80% was used to train the model and 20% was used for
testing it. By training the model on one subset and testing it on another, it can be assessed
how well the model generalizes to new data, helping to identify and minimize overfitting
or underfitting.
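The 80/20 split can be sketched as follows. The function name `split_train_test`, the seed, and the decision to shuffle before splitting are assumptions for illustration; the article does not state whether the rows were shuffled, and the integers below merely stand in for the 5869 raw rows.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-ins for the dataset rows (row indices only).
rows = list(range(5869))
train_rows, test_rows = split_train_test(rows)
print(len(train_rows), len(test_rows))
```

For time-ordered sensor data, a chronological split (training on the first days and testing on the last) is a common alternative to shuffling, because it avoids testing on readings that are adjacent in time to training readings.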
4. Feature selection
Machine learning models’ performance is heavily influenced by the data features used
to train them. Irrelevant or partially relevant features may have a negative impact on the
model performance. In this project, attributes such as ‘temperature’, ‘humidity’, ‘pm10’,
and ‘pm25’ were chosen for better results. The relevance of a feature is measured as the
total reduction of the splitting criterion brought by that feature.
The feature importance is measured using various techniques, such as the following:
• Correlation Analysis: Evaluating the correlation between each feature and the tar-
get variable.
• Feature Importance Scores: Using algorithms like random forests to rank feature
importance.
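As an illustration of the correlation analysis mentioned above, the Pearson coefficient between a feature and the target can be computed as in this sketch. All sample values are hypothetical and the function name `pearson` is illustrative; it is not taken from the authors' code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical readings: PM2.5 tracks the AQI closely, humidity does not.
pm25 = [5.0, 7.0, 9.0, 12.0, 15.0]
humidity = [40.0, 42.0, 39.0, 41.0, 40.0]
aqi25 = [21.0, 29.0, 38.0, 50.0, 57.0]

print(round(pearson(pm25, aqi25), 3))
print(round(pearson(humidity, aqi25), 3))
```

A coefficient close to 1 or -1 marks a feature worth keeping, while a value near 0 suggests the feature carries little linear information about the target.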
5. Regression analysis
The processed datasets are used to generate a function that plots training and valida-
tion data for a sequential neural network with backpropagation learning. This model is
designed for regression tasks, where the goal is to predict a continuous numerical value.
Linear regression is a widely used predictive analysis method that assesses the ef-
fectiveness of a group of predictor variables in predicting an outcome and identifies the
most significant predictor variables. It is a statistical technique utilized to analyze the
correlation between a dependent variable (y—which is the target prediction) and one or
more independent variables (x) [40].
Consider the model function (3) which describes a line with slope β and y-intercept α.
y = α + β x, (3)
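For a single predictor, the intercept α and slope β of model (3) have a closed-form least-squares solution, sketched below. This is an ordinary least-squares illustration on made-up points, not the authors' training procedure, which fits a neural network by backpropagation.

```python
def fit_line(xs, ys):
    """Least-squares estimates (alpha, beta) for the model y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical points that lie exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
alpha, beta = fit_line(xs, ys)
print(alpha, beta)
```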
The scientific literature reveals the following benefits of using linear regression:
• Simplicity and Interpretability: Linear regression is a computationally efficient, simple,
and easy-to-understand technique. The correlation between the variables is easily
understood because the coefficients show the precise influence of each predictor.
• Low Variance: Compared to more complex models, linear regression is less prone to
overfitting because it is based on a single model.
• Helpful for Small Datasets: It works effectively with smaller datasets with a linear
relationship.
There are also disadvantages of linear regression:
• Assumption of Linearity: Linear regression assumes a linear relationship between
the independent and dependent variables, which limits its effectiveness in capturing
complex patterns.
• Sensitivity to Outliers: Linear regression is sensitive to outliers, which can distort the
model and affect its prediction accuracy.
• Limited in Handling Multicollinearity: Multicollinearity among predictor variables
can impact the stability of the model coefficients.
Backpropagation is a fundamental technique for training neural networks involving
the backward propagation of errors. It involves computing the difference between the
predicted output and the actual target value and adjusting the weights based on the gradient
of this error. This technique helps the model learn from past errors and improve its predictions
over time, thereby enhancing performance.
Random forest regression is a machine learning technique for predicting continuous
outcomes by aggregating the predictions of numerous decision trees. It operates by building
a ‘forest’ of decision trees during training, with each tree employing a random part of the
training data and a random subset of the characteristics [41]. In random forest regression,
the final prediction is obtained by averaging the predictions of all the individual trees in
the forest, resulting in a more accurate and stable forecast than a single decision tree model.
This strategy reduces the variance of regression predictors using bagging while keeping
the bias largely constant.
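The bagging-and-averaging idea can be sketched with very small trees. The following is a toy illustration under strong assumptions: each tree is a one-split stump on a single feature, the random feature subset is omitted because only one feature is used, and the data points are hypothetical. It is not the implementation used in the article.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a stump) on a single feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # The bootstrap sample held a single distinct x: predict the mean.
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_estimators=100, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict(trees, x):
    """The forest prediction is the average of the individual tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical PM2.5 readings and AQI values.
xs = [5.0, 7.0, 9.0, 12.0, 15.0, 18.0]
ys = [21.0, 29.0, 38.0, 50.0, 57.0, 64.0]
forest = fit_forest(xs, ys, n_estimators=100)
print(round(predict(forest, 6.0), 2), round(predict(forest, 16.0), 2))
```

Each stump alone produces only two output levels, but averaging many stumps trained on different bootstrap samples yields a smoother prediction, which is the variance reduction the paragraph above refers to.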
In the coefficient of determination, $R^2 = 1 - SS_{RES}/SS_{TOT}$, $SS_{RES}$ is the sum
of the squares of the residual errors (the differences between the observed and predicted
values) and $SS_{TOT}$ is the total sum of squares (the squared differences between the
observed values and their mean).
Mean squared error (MSE) is a statistical measure that quantifies the average squared
difference between the observed and predicted values (6). After calculating the squared
difference for each data point, the average of those squared differences is calculated. MSE
is more sensitive to outliers than MAE since squaring allocates higher weights to larger errors [43].
$$\mathrm{MSE} = \frac{\sum (y_i - \hat{y}_i)^2}{n}, \tag{6}$$
The square root of MSE is referred to as the root mean squared error (RMSE). This
measure is frequently used since it is simple to read and has the same unit as the dependent
variable. It gives an indicator of the average error magnitude [44].
The root mean squared error (RMSE) is one of two primary performance measures for
a regression model. It calculates the average difference between the values predicted by a
model and the actual values. It estimates the model’s ability to predict the target value (7).
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n}}, \tag{7}$$
The model performs better when the root mean squared error decreases. A perfect
model (a hypothetic model that consistently predicts the precise expected value) would
have a root mean squared error of zero.
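The error measures discussed so far can be illustrated with a short plain-Python sketch. The values in `y_true` and `y_pred` are hypothetical and the function names are illustrative; the article itself reports using the `tf.keras.metrics` classes rather than hand-written functions.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error, Formula (6)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, Formula (7)."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_RES / SS_TOT."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted AQI values.
y_true = [20.0, 25.0, 30.0, 35.0]
y_pred = [21.0, 24.0, 31.0, 33.0]
print(mae(y_true, y_pred), mse(y_true, y_pred))
print(round(rmse(y_true, y_pred), 4), round(r_squared(y_true, y_pred), 4))
```

In this example the single error of 2 contributes 4 of the 7 units of squared error, which shows why MSE and RMSE react more strongly to large deviations than MAE.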
In statistics, the Mean Absolute Percentage Error (MAPE), sometimes referred to as
mean absolute percentage deviation (MAPD), is a metric of forecasting method prediction
accuracy. MAPE is the average of Absolute Percentage Errors (APEs). Let At and Ft denote
the actual and forecast values at data point t, respectively [45]. It typically expresses
accuracy as a ratio specified by the following Formula (8), where N is the number of
data points:
$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{A_t - F_t}{A_t} \right| \times 100, \tag{8}$$
However, MAPE has one big drawback: it generates infinite or undefined results when
the actual values are 0 or close to zero, which is typical in particular domains. If the real
values are very small (often less than one), MAPE produces extraordinarily large percentage
errors (outliers), whereas zero actual values produce infinite MAPEs.
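A minimal sketch of Formula (8) that makes the zero-value drawback explicit is given below. Raising an error for zero actual values is a design choice made for this illustration, not something described in the article, and the sample values are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, Formula (8), in percent."""
    if any(a == 0 for a in actual):
        # Division by a zero actual value is undefined, as noted in the text.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return total / len(actual) * 100


# Hypothetical actual and forecast AQI values.
actual = [20.0, 25.0, 40.0]
forecast = [22.0, 24.0, 38.0]
print(round(mape(actual, forecast), 4))
```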
The Root Mean Squared Logarithmic Error (RMSLE) is determined by applying the log
function to the actual and predicted values and then subtracting them. RMSLE is resistant
to outliers when both minor and large errors are considered [46].
$$\mathrm{RMSLE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \log(y_i + 1) - \log(\hat{y}_i + 1) \right)^2}, \tag{9}$$
The Symmetric Mean Absolute Percentage Error (SMAPE) is defined by Formula (10):
$$\mathrm{SMAPE} = \frac{100}{N} \sum_{t=1}^{N} \frac{|F_t - A_t|}{(|A_t| + |F_t|)/2}, \tag{10}$$
The absolute difference between At and Ft is divided by half the sum of the
absolute values of the actual and predicted values. The result of this calculation is
summed over every fitted point t and then divided by the number of fitted points N.
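Formulas (9) and (10) can be sketched in plain Python as follows. The function names and the sample values are illustrative assumptions; note that SMAPE is undefined when an actual value and its forecast are both zero, a case this sketch does not handle.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, Formula (9)."""
    total = sum(
        (math.log1p(y) - math.log1p(p)) ** 2 for y, p in zip(y_true, y_pred)
    )
    return math.sqrt(total / len(y_true))


def smape(actual, forecast):
    """Symmetric mean absolute percentage error, Formula (10), in percent."""
    total = sum(
        abs(f - a) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast)
    )
    return 100 / len(actual) * total


# Hypothetical actual and forecast AQI values.
actual = [20.0, 25.0, 40.0]
forecast = [22.0, 24.0, 38.0]
print(round(rmsle(actual, forecast), 4), round(smape(actual, forecast), 4))
```

`math.log1p(y)` computes log(y + 1) directly, which matches the shifted logarithm in Formula (9) and stays accurate for values close to zero.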
Mean Directional Accuracy (MDA) measures how often the predicted direction of
change matches the actual direction of change. It is useful for time series and directional
forecasts. It compares the forecast direction (upward or downward) to the actual realized
direction. It is defined by the following Formula (11), where $y_i$ is the actual value at time i
and $\hat{y}_i$ is the forecast value at time i. Variable N represents the number of forecasting points
and I[·] is the indicator function that equals 1 if the condition inside is true and 0 otherwise.
$$\mathrm{MDA} = \frac{1}{N} \sum_{i=2}^{N} I\left[ (y_i - y_{i-1}) \cdot (\hat{y}_i - \hat{y}_{i-1}) > 0 \right] \tag{11}$$
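Formula (11) translates into a few lines of Python, sketched below with hypothetical series. The sketch divides by N exactly as written in Formula (11); some other definitions divide by N - 1, the number of consecutive pairs, so values from different tools may not be directly comparable.

```python
def mda(actual, forecast):
    """Mean directional accuracy as written in Formula (11)."""
    n = len(actual)
    hits = sum(
        1
        for i in range(1, n)
        if (actual[i] - actual[i - 1]) * (forecast[i] - forecast[i - 1]) > 0
    )
    return hits / n


# Hypothetical series: the forecast misses only the dip at the third point.
actual = [10.0, 12.0, 11.0, 13.0, 14.0]
forecast = [10.0, 11.0, 12.0, 13.0, 15.0]
print(mda(actual, forecast))
```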
The Median Absolute Error (MedAE) is a robust statistical measure used to evaluate
the accuracy of predictions, particularly in the presence of outliers. It is defined as the
median of the absolute differences between the predicted and actual values, making it less
sensitive to extreme values compared to other metrics like Mean Absolute Error (MAE) or
Root Mean Square Error (RMSE).
The loss is calculated by taking the median of all the absolute differences between the
target and the prediction, by following Formula (12):
$$\mathrm{MedAE} = \mathrm{median}\left( |y_i - \hat{y}_i| \right), \tag{12}$$
In the source code, we easily changed the methods depending on the needs (e.g.,
from ‘tf.keras.metrics.MeanSquaredError’ to ‘tf.keras.metrics.MeanAbsolutePercentageError’)
when compiling the model.
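The robustness of the median absolute error in Formula (12) can be seen in a small sketch. The values are hypothetical, with one deliberately large error, and the function names are illustrative rather than taken from the authors' code.

```python
import statistics


def medae(y_true, y_pred):
    """Median absolute error, Formula (12)."""
    return statistics.median(abs(y - p) for y, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error, shown for comparison."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values; the last prediction is a gross outlier.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 18.0, 30.0, 100.0]
print(medae(y_true, y_pred), mae(y_true, y_pred))
```

The single error of 60 dominates the mean absolute error, while the median stays close to the typical error size.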
4. Results
The graphs presented below were generated in the ThingSpeak platform based on
the data collected by the implemented IoT device. Figures 8 and 9 represent the charts that
contain all the entries of AQI 10 and AQI 2.5, respectively, in the database.
Figure 8. Graph generated in ThingSpeak web application for AQI 10.
Figure 9. Graph generated in ThingSpeak web application for AQI 2.5.
The simple method of graphing and calculating using a machine learning methodology
is discussed below.
Figure 10 depicts a heat map of the air quality dataset’s properties in a graphical
format. The color scale runs from −1 to 1, with dark green indicating stronger positive
correlations (closer to 1), brown indicating stronger negative correlations (closer to −1),
and white or light green expressing no correlation (closer to zero).
Based on the provided heatmap, AQI 2.5 shows a strong positive correlation with
PM2.5, indicating that as this parameter increases, AQI 2.5 also increases
significantly. Additionally, AQI 2.5 has moderate positive correlations with PM10 and
AQI 10, suggesting a moderate relationship with these parameters. However, AQI 2.5 does
not exhibit strong correlations with temperature or humidity, as indicated by their lower
correlation values. On the other hand, AQI 10 demonstrates moderate positive correlations
with both PM10 and PM2.5, but it does not show strong correlations with temperature
or humidity. This analysis highlights that while AQI 2.5 and AQI 10 are influenced by
particulate matter levels, they are not significantly affected by temperature and humidity.
The model is trained using both types of regression analysis: linear regression and
random forest regression. For linear regression, different numbers of epochs are used (50,
100, 500, and 1000), and for random forest, two values for the number of decision trees (100
and 1000). The following figures represent the accuracy of the predictions after 50 epochs.
The following six figures (Figures 11–16) compare the actual AQI 10 or AQI 2.5 with the
predicted one. Each scatter plot displays the actual values of AQI along the X-axis
and the predicted values along the Y-axis. Each point represents an individual data
entry, and the dashed red line indicates the ideal relationship where the predicted values
exactly match the actual measurements. Correlating the results from Figures 11–16 with
the previous charts (Figures 8 and 9, the graphs generated in the ThingSpeak web application
for AQI), it is easier to understand why for a single value on the X-axis (taken at different
times in the measurement interval) there are more (predicted) values on the Y-axis. To some
extent, this is also the purpose of machine learning algorithms: to reduce the prediction
error over multiple iterations.
The results in Figure 11 are obtained after the dataset was trained using the ‘adam’
optimizer for the prediction model, while Figure 12 contains the results using the ‘RMSprop’
optimizer. It can be observed that the first model (with the ‘adam’ optimizer) is more
accurate than the other one.
Figure 12. Predicted vs. actual AQI 10: prediction model with ‘RMSprop’ optimizer.
The following diagrams compare the actual with the predicted value of AQI 2.5. Most
of the data points in Figure 13 cluster more closely around the red
dashed line than the ones in Figure 14, suggesting that the
first model’s predictions are more accurate than those of the other one.
Figure 14. Predicted vs. actual AQI 2.5: prediction model with ‘RMSprop’ optimizer.
The values obtained indicate that our model prediction is performing well and,
according to Figures 11 and 13, our data fits into the ‘good’ range of the AQI categories,
which can be seen in Table 1 [35]. However, the low values of AQI can be due to the
rather high temperatures (above 22 degrees Celsius) at the time of data collection (May
2024), because it is known that PM ratios decrease with increasing temperature [2]. On
the other hand, the particular context of the analyzed environment, with fewer household
activities like cooking on stoves and indoor smoking, kept indoor particulate emissions low.
Figures 15 and 16 represent the accuracy of the predictions obtained with random
forest regression using 1000 estimators (decision trees).
Figure 16. Predicted vs. actual AQI 2.5: random forest regression.
Table 2 reflects the results generated by the Mean Absolute Error, Mean Squared Error,
Root Mean Squared Error, and R-squared metrics.
Table 2. AQI results obtained after evaluating the models with MAE, MSE, RMSE and R2.
Prediction model      Epochs       AQI 10                            AQI 2.5
                                   MAE     MSE     RMSE    R2        MAE     MSE     RMSE    R2
‘adam’ optimizer      50           0.3289  0.1968  0.4437  0.9383    0.3205  0.1557  0.3946  0.9953
                      100          0.3276  0.1972  0.4440  0.9382    0.3211  0.1573  0.3966  0.9952
                      500          0.3246  0.2032  0.4508  0.9363    0.3247  0.1622  0.4028  0.9951
                      1000         0.3174  0.2102  0.4585  0.9341    0.3145  0.1494  0.3865  0.9955
‘RMSprop’ optimizer   50           0.3550  0.2202  0.4693  0.9309    0.3283  0.1678  0.4096  0.9949
                      100          0.3142  0.1901  0.4360  0.9404    0.3432  0.1795  0.4237  0.9946
                      500          0.3145  0.1995  0.4467  0.9374    0.3222  0.1581  0.3976  0.9952
                      1000         0.3088  0.2018  0.4492  0.9367    0.3361  0.1690  0.4111  0.9949
                      n_estimator
Random Forest         100          0.2785  0.2095  0.4577  0.9343    0.2483  0.1516  0.3894  0.9954
                      1000         0.2778  0.2086  0.4568  0.9346    0.2482  0.1503  0.3877  0.9955
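As a quick consistency check on Table 2, each reported RMSE should equal the square root of the corresponding MSE, up to rounding. The sketch below verifies this for the AQI 10 columns; the numbers are copied from the table, while the variable names and the tolerance are assumptions of this illustration.

```python
import math

# (MSE, RMSE) pairs copied from the AQI 10 columns of Table 2.
table2_aqi10 = [
    (0.1968, 0.4437), (0.1972, 0.4440), (0.2032, 0.4508), (0.2102, 0.4585),  # 'adam'
    (0.2202, 0.4693), (0.1901, 0.4360), (0.1995, 0.4467), (0.2018, 0.4492),  # 'RMSprop'
    (0.2095, 0.4577), (0.2086, 0.4568),                                      # random forest
]

# Largest gap between sqrt(MSE) and the reported RMSE across all rows.
max_gap = max(abs(math.sqrt(mse) - rmse) for mse, rmse in table2_aqi10)
print(max_gap < 5e-4)
```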
With the ‘adam’ optimizer, the Mean Absolute Error (MAE) for AQI 10 decreases slightly
with increasing epochs (from 0.3289 at 50 epochs to 0.3174 at 1000 epochs), while for AQI 2.5
the MAE reaches its lowest value (0.3145) at 1000 epochs. The root mean squared error (RMSE),
however, does not follow the same trend: for AQI 10 it rises from 0.4437 to 0.4585, and for
AQI 2.5 it drops below its 50-epoch value only at 1000 epochs (0.3865). The R-squared values
are consistently high for both AQI 10 and AQI 2.5, indicating that the model explains a
significant portion of the data’s variance. This suggests that the ‘adam’ optimizer is
effective in predicting data.
The ‘RMSprop’ optimizer improves the MAE for AQI 10 from 0.35508 at 50 epochs
to 0.30883 at 1000 epochs, while the MAE for AQI 2.5 fluctuates, ending at 0.33618 at
1000 epochs, slightly above its 50-epoch value. The RMSE follows a similar pattern: for
AQI 10 it is lower at 1000 epochs than at 50 epochs, whereas for AQI 2.5 it fluctuates
without a clear trend. The R2 values remain high, indicating the optimizer effectively
explains data variance; they are comparable to those of the ‘adam’ optimizer (slightly
higher for AQI 10 from 100 epochs onward, slightly lower for AQI 2.5 in most cases).
The random forest model with 100 estimators has a lower MAE for AQI 10 and AQI
2.5 compared to the neural network models using the ‘adam’ and ‘RMSprop’ optimizers.
Increasing the number of estimators slightly reduces the MAE, indicating a marginal
improvement. The RMSE values are comparable to the neural network models, and the R2
values are high, indicating a strong fit to the data.
Table 3 summarizes the findings generated by the Mean Absolute Percentage Er-
ror, Root Mean Squared Log Error, Symmetric Mean Absolute Percentage Error, Mean
Directional Accuracy, and Median Absolute Error metrics.
For the ‘adam’ optimizer, as the epochs increase from 50 to 1000, there is a noticeable
improvement in the accuracy metrics for both AQI PM10 and PM2.5. For instance, the
MAPE for AQI PM10 decreases from 5.8233 at 50 epochs to 5.5584 at 1000 epochs, while the
SMAPE drops from 2.8391 to 2.7145, indicating a reduction in error.
Similarly, for AQI PM2.5, the MAPE reduces from 2.9556 at 50 epochs to 2.8293 at
1000 epochs, and the MedAE decreases from 0.3009 to 0.2645, demonstrating enhanced
prediction accuracy with more training.
In contrast, the ‘RMSprop’ optimizer exhibits fluctuating results across epochs; for
example, the MAPE for AQI PM10 varies from 5.7068 to 5.6649, while the MDA slightly
decreases from 0.8154 to 0.8108. AQI PM2.5 performance remains relatively stable, with the
MDA around 0.9000 but some inconsistencies in other metrics like the SMAPE and MedAE.
Table 3. AQI measurements resulted after evaluating the model with MAPE, RMSLE, SMAPE, MDA
and MedAE.
The random forest model, evaluated with 100 and 1000 estimators, shows consistent
performance, maintaining a lower MAPE of around 4.8627 for AQI PM10 and 2.3673 for
AQI PM2.5 compared to the optimizers. However, the SMAPE values are higher, such as
4.7675 for AQI PM10, highlighting differences in how errors are distributed.
Overall, the ‘adam’ optimizer demonstrates gradual improvement with more epochs,
the ‘RMSprop’ optimizer shows mixed stability, and random forest provides consistent but
distinct error dynamics, especially in the MAPE and SMAPE comparisons.
5. Discussions
One limitation of our study is that the IoT system has been tested only indoors. Although
the IoT system can be adapted relatively easily for outdoor environments from a physical
point of view, a more comprehensive analysis from a holistic view (physical, economic,
authorization and legislation, etc.) is provided below.
Outdoor air quality monitoring comes with challenges beyond adding sensors.
The first is powering the IoT systems placed on poles, either through a direct connection
to the city power grid or through solar batteries; the latter will offer a lifetime that
differs from one area to another and from one season to another, and will introduce
variability in the collection of data from the sensors. Another challenge
consists of transmitting data to the cloud and ensuring the connectivity of the IoT
system (wired networks, Wi-Fi, RFID, GSM, etc.). These constraints, together with the number
of systems to be deployed on poles approximately 200 m apart, will have an
economic impact. Another impediment to outdoor implementation is obtaining the agreement of
local or county authorities to place IoT systems in the city for monitoring, as well as ensuring
their maintenance in the case of malfunctions. The system implemented in this phase, which
supports this scientific paper, is only a prototype with the aim of generating data that can
then be used in the application of artificial intelligence algorithms for prediction. The IoT
system was tested indoors to avoid some of the previously mentioned challenges.
However, in 2017, one of the co-authors developed such a system for outdoor air
quality monitoring, using a prototype mounted on cars, GSM data transmission through
mobile phones, and GPS sensors to identify the geographic position, which is relevant
when weather conditions change [4]. Sensors that must be added for outdoor monitoring
are a gas sensor for measuring CO2 and NOx (e.g., SainSmart MQ135 Sensor Air Quality
Sensor and Hazardous Gas Detection Module) and a barometric pressure sensor (e.g., BMP085
Digital Barometric Pressure Measurement Sensor). For communication, either a combined
GPS/GSM unit for location and communication or a Wi-Fi connectivity and GPS unit could
be used.
Another limitation consists of the relatively small dataset and the use of machine
learning algorithms with fixed parameters. For any kind of optimization problem, the
hyperparameters can be tuned and varied in a design space exploration process. An
automatic design space exploration process based on genetic algorithms or other
evolutionary algorithms consumes a lot of execution time and was outside the initial
scope of this work. However, the authors will investigate this idea as a further development.
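As a rough illustration of what such a design space exploration could look like, the following sketch runs a small genetic algorithm over two random forest hyperparameters. It is not part of the presented work: the search bounds, the population settings and every function name are assumptions, and `placeholder_error` is a synthetic stand-in that would have to be replaced by an actual training and validation run returning an error such as the MAPE.

```python
import random

# Hypothetical search space; the bounds are illustrative, not taken from the study.
SPACE = {"n_estimators": (10, 1000), "max_depth": (2, 30)}


def placeholder_error(cfg):
    """Synthetic stand-in for the validation error of a model trained with cfg.

    Assumption: in a real exploration this function would train a model and
    return a measured error; the formula below only makes the sketch runnable.
    """
    return abs(cfg["n_estimators"] - 300) / 1000 + abs(cfg["max_depth"] - 12) / 30


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each hyperparameter is copied from one of the two parents."""
    return {k: rng.choice((a[k], b[k])) for k in SPACE}


def mutate(cfg, rng, rate=0.3):
    """With probability `rate`, resample a hyperparameter within its bounds."""
    return {
        k: rng.randint(*SPACE[k]) if rng.random() < rate else v
        for k, v in cfg.items()
    }


def evolve(evaluate, generations=20, pop_size=12, elite=2, seed=0):
    """Minimize `evaluate` with truncation selection, crossover, mutation and elitism."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        population.sort(key=evaluate)
        history.append(evaluate(population[0]))
        parents = population[: pop_size // 2]  # keep the better half as parents
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = population[:elite] + children  # elites survive unchanged
    return min(population, key=evaluate), history


best, history = evolve(placeholder_error)
print("best configuration:", best)
print("error of best configuration:", placeholder_error(best))
```

Every call to the evaluation function corresponds to one full training run, so even this small setting (12 configurations over 20 generations) implies a few hundred trainings, which is consistent with the execution-time concern raised above.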
Despite these limitations, the current work has succeeded in presenting an integrated
AIoT system in which the physical IoT system is functional and produces real air pollution
data, which are then analyzed and used as input to machine learning algorithms (the AI
component of the system) for prediction. The random forest model achieves the best
performance, benefiting from its robustness to outliers and noise and its ability to capture
non-linear relationships between the predictor and response variables.
Author Contributions: Conceptualization, A.F. and R.B.; methodology, A.F. and R.B.; software, C.B.;
validation, C.B., R.B. and A.F.; formal analysis, A.F.; investigation, C.B.; resources, C.B.; data curation,
C.B.; writing—original draft preparation, C.B., R.B. and A.F.; writing—review and editing, C.B. and
A.F.; visualization, C.B.; supervision, R.B.; project administration, A.F. All authors have read and
agreed to the published version of the manuscript.
Funding: This work was partially developed in the project CoDEMO (Co-Creative Decision-Makers for
5.0 Organizations), grant number 101104819, an initiative supported by the Erasmus+ funding
mechanism ERASMUS-EDU-2022-PI-ALL-INNO-EDU-ENTERP (Alliances for Education and Enterprises).
Data Availability Statement: The data presented in this study are available by request from the
corresponding author.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Fahmi, N.; Prayitno, E.; Musri, T.; Supria, S.; Ananda, F. An Implementation Environmental Monitoring Real-time IoT Technology.
In Proceedings of the 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET), Prague, Czech
Republic, 20 July 2022; pp. 1–4. [CrossRef]
2. Li, H.; Xu, X.L.; Dai, D.W.; Huang, Z.Y.; Ma, Z.; Guan, Y.J. Air pollution and temperature are associated with increased COVID-19
incidence: A time series study. Int. J. Infect. Dis. 2020, 97, 278–282. [CrossRef] [PubMed]
3. HEPA (Health and Energy Platform of Action). Call to Action to Increase Climate Resilience of Health Care Facilities & Air
Quality Through Sustainable Energy. 2022. Available online:
https://www.who.int/publications/m/item/call-to-action-to-increase-climate-resilience-of-health-care-facilities---air-quality-through-sustainable-energy
(accessed on 10 June 2024).
4. Florea, A.; Berntzen, L.; Johannessen, M.R.; Stoica, D.; Naicu, I.S.; Cazan, V. Low cost mobile embedded system for air quality
monitoring. In Proceedings of the Sixth International Conference on Smart Cities, Systems, Devices and Technologies (SMART),
Venice, Italy, 25–29 June 2017; pp. 25–29.
5. Ullo, S.L.; Sinha, G.R. Advances in Smart Environment Monitoring Systems Using IoT and Sensors. Sensors 2020, 20, 3113.
[CrossRef] [PubMed]
6. Laha, S.R.; Pattanayak, B.K.; Pattnaik, S. Advancement of Environmental Monitoring System Using IoT and Sensor: A
Comprehensive Analysis. AIMS Environ. Sci. 2022, 9, 771–800. [CrossRef]
7. Hassan, M.N.; Islam, M.R.; Faisal, F.; Semantha, F.H.; Siddique, A.H.; Hasan, M. An IoT based Environment Monitoring System.
In Proceedings of the 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), Thoothukudi, India, 3–5
December 2020; pp. 1119–1124. [CrossRef]
8. Sun, Y.; Xue, Y.; Jiang, X.; Jin, C.; Wu, S.; Zhou, X. Estimation of the PM2.5 and PM10 Mass Concentration over Land from FY-4A
Aerosol Optical Depth Data. Remote Sens. 2021, 13, 4276. [CrossRef]
9. Mokrani, H.; Lounas, R.; Bennai, M.T.; Salhi, D.E.; Djerbi, R. Air Quality Monitoring Using IoT: A Survey. In Proceedings of the
2019 IEEE International Conference on Smart Internet of Things (SmartIoT), Tianjin, China, 9–11 August 2019; pp. 127–134.
10. Tran, Q.; Dang, Q.; Le, T.; Nguyen, H.-T.; Tan, L. Air Quality Monitoring and Forecasting System using IoT and Machine Learning
Techniques. In Proceedings of the 2022 6th International Conference on Green Technology and Sustainable Development (GTSD
2022), Nha Trang City, Vietnam, 29 July 2022; pp. 786–792.
11. Lalitha, K.V.; Naveen, A.V.; Supriya, A.S.V.; Nagalakshmi, P.S.R.; Kantham, P.S. Realtime Air Quality Evaluator Using Iot and
Machine Learning. Int. J. Eng. Res. Technol. 2024, 13. [CrossRef]
12. Khanna, A.; Kaur, S. Internet of Things (IoT), Applications and Challenges: A Comprehensive Review. Wirel. Pers. Commun. 2020,
114, 1687–1762. [CrossRef]
13. Saini, J.; Dutta, M.; Marques, G. Indoor Air Quality Monitoring Systems Based on Internet of Things: A Systematic Review. Int. J.
Environ. Res. Public Health 2020, 17, 4942. [CrossRef] [PubMed]
14. Abdulmalek, S.; Nasir, A.; Jabbar, W.A.; Almuhaya, M.A.M.; Bairagi, A.K.; Khan, M.A.; Kee, S.H. IoT-Based Healthcare-Monitoring
System towards Improving Quality of Life: A Review. Healthcare 2022, 10, 1993. [CrossRef] [PubMed]
15. Gangwar, A.; Singh, S.; Mishra, R.; Prakash, S. The State-of-the-Art in Air Pollution Monitoring and Forecasting Systems Using
IOT, Big Data, and Machine Learning. Wirel. Pers. Commun. 2023, 130, 1699–1729. [CrossRef]
16. Sai, K.B.K.; Mukherjee, S.; Sultana, H.P. Low Cost IoT Based Air Quality Monitoring Setup Using Arduino and MQ Series Sensors
with Dataset Analysis. Procedia Comput. Sci. 2019, 165, 322–327. [CrossRef]
17. Karnati, H. IoT-Based Air Quality Monitoring System with Machine Learning for Accurate and Real-time Data Analysis. arXiv
2023, arXiv:2307.00580. [CrossRef]
18. Cincinelli, A.; Martellini, T. Indoor Air Quality and Health. Int. J. Environ. Res. Public Health 2017, 14, 1286. [CrossRef] [PubMed]
[PubMed Central]
19. Al Horr, Y.; Arif, M.; Katafygiotou, M.; Mazroei, A.; Kaushik, A.; Elsarrag, E. Impact of indoor environmental quality on occupant
well-being and comfort: A review of the literature. Int. J. Sustain. Built Environ. 2016, 5, 1–11. [CrossRef]
20. Zhang, L.; Ou, C.; Magana-Arachchi, D.; Vithanage, M.; Vanka, K.S.; Palanisami, T.; Masakorala, K.; Wijesekara, H.; Yan, Y.;
Bolan, N.; et al. Indoor Particulate Matter in Urban Households: Sources, Pathways, Characteristics, Health Effects, and Exposure
Mitigation. Int. J. Environ. Res. Public Health 2021, 18, 11055. [CrossRef]
21. Pope, C.A., 3rd; Dockery, D.W. Health effects of fine particulate air pollution: Lines that connect. J. Air Waste Manag. Assoc. 2006,
56, 709–742. [CrossRef] [PubMed]
22. Gope, S.; Dawn, S.; Das, S.S. Effect of Covid-19 Pandemic on Air Quality: A Study Based on Air Quality Index. Environ. Sci.
Pollut. Res. 2021, 28, 35564–35583. [CrossRef] [PubMed]
23. Sarmadi, M.; Rahimi, S.; Rezaei, M.; Sanaei, D.; Dianatinasab, M. Air quality index variation before and after the onset of
COVID-19 pandemic: A comprehensive study on 87 capital, industrial and polluted cities of the world. Environ. Sci. Eur. 2021,
33, 134. [CrossRef]
24. Vadrevu, K.P.; Eaturu, A.; Biswas, S.; Lasko, K.; Sahu, S.; Garg, J.K.; Justice, C. Spatial and temporal variations of air pollution
over 41 cities of India during the COVID-19 lockdown period. Sci. Rep. 2020, 10, 16574. [CrossRef]
25. Li, C.; Wang, J.; Wang, S.; Zhang, Y. A review of IoT applications in healthcare. Neurocomputing 2024, 565, 127017. [CrossRef]
26. Sofia, D.; Giuliano, A.; Gioiella, F.; Barletta, D.; Poletto, M. Modeling of an air quality monitoring network with high space-time
resolution. Comput. Aided Chem. Eng. 2018, 43, 193–198.
27. Himeur, Y.; Elnour, M.; Fadli, F.; Meskin, N.; Petri, I.; Rezgui, Y.; Bensaali, F.; Amira, A. AI-big data analytics for building
automation and management systems: A survey, actual challenges and future perspectives. Artif. Intell. Rev. 2023, 56, 4929–5021.
[CrossRef] [PubMed]
28. Raspberry Pi Trading Ltd. Raspberry Pi 4 Computer Model B. Available online:
https://www.raspberrypi.com/products/raspberry-pi-4-model-b/specifications/ (accessed on 11 June 2024).
29. Texas Instruments. TMP117 High-Accuracy, Low-Power, Digital Temperature Sensor with SMBus™- and I2C-Compatible
Interface. Available online: https://pdf1.alldatasheet.com/datasheet-pdf/view/1083213/TI1/TMP117.html (accessed on 16
September 2022).
30. Honeywell. HIH-4030 Datasheet(PDF). Available online:
https://pdf1.alldatasheet.com/datasheet-pdf/view/1242920/HONEYWELL/HIH-4030.html (accessed on 12 June 2024).
31. Texas Instruments. ADS1015 Datasheet(PDF). Available online:
https://pdf1.alldatasheet.com/datasheet-pdf/view/541221/TI1/ADS1015.html (accessed on 12 June 2024).
32. Microcontrollerslab. Nova PM SDS011 Dust Particle Sensor for Air Quality Measurement. Available online:
https://microcontrollerslab.com/nova-pm-sds011-dust-sensor-pinout-working-interfacing-datasheet/#2D_Model (accessed
on 12 June 2024).
33. Pypi. Python-aqi 0.6.1. Available online: https://pypi.org/project/python-aqi/ (accessed on 12 June 2024).
34. United States Environmental Protection Agency. Technical Assistance Document for the Reporting of Daily Air Quality—The
Air Quality Index (AQI). EPA 454/B-18-007. 2018. Available online:
https://www.airnow.gov/sites/default/files/2020-05/aqi-technical-assistance-document-sept2018.pdf (accessed on 13 June 2024).
35. United States Environmental Protection Agency. Final Reconsideration of the National Ambient Air Quality Standards for
Particulate Matter. 2024. Available online:
https://www.epa.gov/pm-pollution/final-reconsideration-national-ambient-air-quality-standards-particulate-matter-pm
(accessed on 13 June 2024).
36. Maureira, M.A.G.; Oldenhof, D.; Teernstra, L. ThingSpeak—An API and Web Service for the Internet of Things. 2014. Available
online: https://staas.home.xs4all.nl/t/swtr/documents/wt2014_thingspeak.pdf (accessed on 13 June 2024).
37. Ahlawat, S. Introduction to TensorFlow. In Reinforcement Learning for Finance; Apress: Berkeley, CA, USA, 2023; pp. 5–137.
[CrossRef]
38. Aarthi, A.; Gayathri, P.; Gomathi, N.R.; Kalaiselvi, S.; Gomathi, V. Air quality prediction through regression model. Int. J. Sci.
Technol. Res. 2020, 9, 923–928. Available online:
http://www.ijstr.org/final-print/mar2020/Air-Quality-Prediction-Through-Regression-Model.pdf (accessed on 14 June 2024).
39. Weisberg, S. Yeo-Johnson Power Transformations; Department of Applied Statistics, University of Minnesota: St. Paul, MN,
USA, 2001.
40. Aityan, S.K. Linear Regression. In Business Research Methodology; Springer: Berlin/Heidelberg, Germany, 2022; pp. 359–394.
[CrossRef]
41. Chen, L.; Gamage, P.W.; Ryan, J. Debias random forest regression predictors. J. Stat. Res. 2023, 56, 115–131. [CrossRef]
42. Chicco, D.; Warrens, K.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE,
MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623. [CrossRef]
43. Foong, N.; Chin, L.H.; Hoe, Q.S. Mean squared error—A tool to evaluate the accuracy of parameter estimators in regression.
J. Qual. Meas. Anal. 2008, 4, 71–80.
44. Hodson, T.O. Root-mean-square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. 2022,
15, 5481–5487. [CrossRef]
45. Kim, S.; Kim, H. A new metric of absolute percentage error for intermittent demand forecasts. Int. J. Forecast. 2016, 32, 669–679.
[CrossRef]
46. Jadon, A.; Patil, A.; Jadon, S. A Comprehensive Survey of Regression-Based Loss Functions for Time Series Forecasting. In Data
Management, Analytics and Innovation, Proceedings of the International Conference on Data Management, Analytics & Innovation, Vellore,
India, 19–21 January 2024; Springer: Singapore, 2024; Volume 998, pp. 117–147.
47. Kreinovich, V.; Nguyen, H.T.; Ouncharoen, R. How to Estimate Forecasting Quality: A System-Motivated Derivation of Symmetric
Mean Absolute Percentage Error (SMAPE) and Other Similar Characteristics; University of Texas at El Paso: El Paso, TX, USA, 2014.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.