Article
Smart Air Quality Monitoring IoT-Based Infrastructure for
Industrial Environments
Laura García 1,† , Antonio-Javier Garcia-Sanchez 2,†,∗ , Rafael Asorey-Cacheda 2,† , Joan Garcia-Haro 2,†
and Claudia-Liliana Zúñiga-Cañón 3,†
1 Instituto de Investigación para la Gestión Integrada de Zonas Costeras, Universitat Politècnica de València,
46730 Valencia, Spain
2 Department of Information and Communications Technologies, Universidad Politécnica de Cartagena,
30202 Cartagena, Spain
3 Research Group COMBA I+D, Universidad Santiago de Cali, Cali 760035, Colombia
* Correspondence: [email protected]
† These authors contributed equally to this work.
Abstract: Deficient air quality in industrial environments creates a number of problems that affect
both the staff and the ecosystems of a particular area. To address this, periodic measurements must
be taken to monitor the pollutant substances discharged into the atmosphere. However, the deployed
system should also be adapted to the specific requirements of the industry. This paper presents a
complete air quality monitoring infrastructure based on the IoT paradigm that is fully integrable
into current industrial systems. It includes the development of two highly precise compact devices
to facilitate real-time monitoring of particulate matter concentrations and polluting gases in the air.
These devices are able to collect other information of interest, such as the temperature and humidity
of the environment or the Global Positioning System (GPS) location of the device. Furthermore,
machine learning techniques have been applied to the Big Data collected by this system. The results
identify Gaussian Process Regression as the technique with the highest accuracy across
the air quality data sets gathered by the devices. This provides our solution with, for instance, the
intelligence to predict when safety levels might be surpassed.
Keywords: air quality monitoring; particulate matter; polluting gas; machine-learning
Citation: García, L.; Garcia-Sanchez, A.-J.; Asorey-Cacheda, R.; Garcia-Haro, J.; Zúñiga-Cañón, C.-L.
Smart Air Quality Monitoring IoT-Based Infrastructure for Industrial Environments. Sensors
2022, 22, 9221. https://doi.org/10.3390/s22239221
Academic Editor: Luigi Ferrigno
Received: 27 October 2022
Accepted: 23 November 2022
Published: 27 November 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Air quality is one of the main aspects to consider in environmental impact assessment.
It not only affects the environment but also has an effect on the health of the people living
and working near industrial activities. Industrial spaces are particularly susceptible to air
contamination due to the manipulation of chemical components and processes that emit
polluting gases and small particles into the air. These types of particles are made up of
a complex mixture of solid, solid and liquid, or liquid particles of organic and inorganic
substances suspended in the air [1]. They can penetrate the respiratory tract, reaching
greater depths the smaller their size [2]. The WHO states that particulate matter is affecting
the world’s population more than any other type of pollutant. In addition, it recommends
not exceeding the levels of particles specified in Table 1 [3]. Moreover, polluting gases
that could potentially harm human health are also present in the air. These gases are often
monitored together with particulate matter levels to establish air quality indexes, such as
the European Air Quality Index [4], which provides an air quality evaluation ranging from
good to extremely poor based on the data gathered from over 3500 stations and following
the EU air quality directives [5]. Sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone
(O3), and carbon monoxide (CO) are among the most harmful substances. Table 2 shows
the threshold values that the WHO recommends not exceeding [6]. As a consequence,
industries are requesting ways to track and control their emissions in real time to avoid
facing penalties for exceeding the limits established where they operate.
WHO Recommended Values   Annual Average   24 h Average   8 h Average   Hourly Average   10 min Average
SO2                      –                20 µg m−3      –             –                500 µg m−3
NO2                      40 µg m−3        –              –             200 µg m−3       –
O3                       –                –              100 µg m−3    –                –
Air quality is usually monitored using precision stations deployed at specific locations,
many of which are integrated into (official) meteorology monitoring stations. However,
as interest in monitoring air quality to improve health and implement better policies has
increased, the development of portable and affordable devices has multiplied. Specifically,
there are many proposals in the context of Smart Cities for indoor and outdoor air quality
monitoring [7]. These types of systems are usually deployed under the Internet of Things
(IoT) umbrella [8], where devices are used in a variety of locations. The data gathered can be
processed to map a city’s air quality [9] or provide the air quality history of a location. Other
proposals include devices such as UAVs to monitor air quality from different areas [10].
These devices can communicate with others through wireless connections or connect to the
Internet with a cabled Ethernet connection to forward data to the server for storage and
processing. Among the available wireless technologies, WiFi, ZigBee, or LoRa are popular
choices for IoT air quality monitoring. Monitored data analysis has also evolved to include
Artificial Intelligence (AI) techniques. Machine learning, as a part of the solutions AI
provides, is suitable to perform predictions and estimations of air quality [11]. Furthermore,
environments such as industrial facilities may need to adopt certain standards to ensure
interoperability with other systems and devices, for example, the INSPIRE specifications
based on infrastructures for spatial data [12].
This paper presents an air quality measurement architecture comprised of devices
capable of measuring polluting gases and suspended particulate matter. To facilitate the
installation of these devices so that they are totally independent of the electrical network,
an anti-vandalism structure, with a solar panel on top and housing an auxiliary battery
to provide continuous power, has been designed. In addition, a LoRaWAN network has
been designed, implemented, and deployed, through which the devices communicate with
a server that processes and stores the data captured in databases for further analysis. An
alert system has also been developed with email alerts and/or instant message alarms.
We have designed and implemented software modules to communicate our solution with
OPC systems. These systems are widely used in industry to control PLCs and the
information exchanges between their systems. Moreover, the Big Data generated has been
extensively analyzed using several machine learning techniques to determine which is the
most accurate one. Therefore, it is possible to forecast trends of future data and predict,
for instance, inadmissible pollution levels. The combination of the above functionalities
results in a robust air quality measurement architecture that can be fully integrated with the
systems traditionally used by industry. This makes it an ideal industry-focused solution for
real-time air quality monitoring. Specifically, the proposed system would benefit industries
working with soils, stones, grains, or other materials that can produce particulate matter as
well as industries susceptible to high levels of NO2 , O3 , SO2 , and CO.
The rest of the paper has been organized as follows. Section 2 presents the related
work. In Section 3, the developed infrastructure is detailed. Section 4 describes the design
Sensors 2022, 22, 9221 3 of 45
of the monitoring devices. Section 5 discusses the obtained results. Finally, Section 6
concludes.
2. Related Work
Insufficient air quality can lead to severe health problems, but precision air quality
monitoring devices may be costly and difficult to deploy in a variety of settings. As a
result, several studies have been conducted to determine the correlation between more
affordable sensors and precision air quality stations. Evangelos Bagkis et al. analyzed
in [13] the performance of an air quality monitoring device compared to a reference station.
Seven machine learning algorithms were employed to model a correction factor. The results
showed that variations in the meteorological conditions affected the quality of the data.
Among all the techniques tested, the Convolutional Neural Network (CNN) obtained the
best overall performance. Furthermore, the average of several estimators improved the
metrics. Byoung Gook Loh et al. used Web Query and Machine Learning to calibrate a
device for PM2.5 particulate matter monitoring [14]. The algorithms employed as regression
models for the calibration were k-nearest neighbors, Extreme Gradient Boosting, Support
Vector Machine, and Random Forest. Stratified k-fold cross-validation was applied to
evaluate the performance. Results showed that the best performance was obtained by the
Extreme Gradient Boosting algorithm.
As air monitoring devices have become more affordable, and health concerns about
poor air quality have increased, interest in deploying air quality monitoring systems in
different environments has grown. Andrew Rebeiro-Hargrave et al. developed in [9] a
system for urban air quality monitoring. Their system can generate history graphs and
pollution maps from the information gathered. The devices are portable and low-cost and
can monitor air pollutant gases (O3 , CO, and NO2 ) and PM2.5 particles. The system was
tested in Helsinki, and a pollution profile of the city was created. It was presented as a
tool to generate policies to improve the city’s air quality. Aditiyo Hermawan Kuncor et
al. designed in [15] an air quality monitoring system based on IoT that was deployed
and tested in the city of Tasikmalaya. CO, O3 , and CH4 were monitored using MQ-7,
MQ-131, and MQ-4 sensors, respectively. Arduino microprocessors were employed to
obtain the data from the sensors and forward them through the Internet to be visualized
in real-time. Steven J. Johnston et al. presented in [16] a system for air quality monitoring.
The devices were equipped with four PM sensors and LoRaWAN transceivers. Tests
were performed in the city of Southampton in the UK. The results proved that the system
performed correctly on a city scale. Furthermore, some of the low-cost sensors were
suitable for monitoring particles and detecting trends. These types of systems for air quality
monitoring are usually deployed as Wireless Sensor Networks (WSN), such as the one
presented by Patricia Arroyo et al. [17]. Data were gathered from the ZigBee nodes and
forwarded to the server through the gateway. Cloud computing systems stored, processed,
and monitored volatile organic compounds, such as benzene, xylene, ethylbenzene, and
toluene. Data were processed using Support Vector Machine and a backpropagation
learning algorithm. The results showed good behavior in obtaining concentrations of
volatile organic compounds. The work by Ivan Popović et al. proposed a framework for air
quality monitoring in urban environments [18]. The system was designed with a layered
architecture based on fog computing supporting real-time operation to perform activities
such as fault diagnosis and automatic reporting. The processing performed on the sensors
was presented as microservices located on the different layers of the architecture. The
system was deployed and tested in a time frame of six months, monitoring O3 , CO, CO2 ,
SO2 , NO, and NO2 , as well as PM1 , PM2.5 , and PM10 . Meteorological parameters, such as
air temperature, humidity, and pressure were also monitored. The results corroborated
good system performance.
In addition to urban environments, indoor and industrial environments are also
considered for air quality measurement deployments. JunHo Jo et al. presented in [19]
an indoor air quality monitoring system able to measure CO, CO2 , VOC, and aerosol
concentrations, in addition to air temperature and humidity. The data were forwarded
to the server in real time through LTE and could be accessed through a web server or a
specifically developed application. The system was tested at Hanyang University, and
the results showed the prototype implementation and data collected from the application.
Judith Molka-Danielsen et al. in [20] studied the deployment of an air quality monitoring
system in industrial environments, specifically, a logistics shipping base. The authors
discussed how to process the obtained data to evaluate the impact of high CO2 levels
in an industrial workplace. They stressed the importance of the correct transformation
of data to facilitate their visualization. Furthermore, the authors suggested using smart
closed-loop systems that could detect spikes of potentially harmful polluting gases and
trigger actuators to provide more ventilation.
The popularity and effectiveness of AI has led to the introduction of machine learning
techniques in air quality monitoring systems. C. Amuthadevi et al. used different machine
learning approaches to develop an air quality monitoring model [11]. The selected machine
learning methods were Non-Linear Artificial Neural Network (ANN), Neuro-Fuzzy, Statis-
tical Multilevel Regression, and Deep Learning Long-Short-Term Memory (DL-LSTM). To
determine the accuracy of the different methods, the RMSE (root-mean-square error), R2 ,
and MAPE parameters were used. Outcomes showed that DL-LSTM presented the best
result among the tested methods. Dixian Zhu et al. employed machine learning to develop
a forecasting model for air quality [21]. They used a refined model with regularization
to enforce the prediction models. MTL, nuclear norm regularization, Frobenius norm
regularization, and l2,1 -norm regularization were compared. The results of the experiments
proved the efficiency of their proposed method. Similarly, Naomi Zimmerman et al. [22]
employed machine learning to create calibration models to improve the performance of
low-cost sensors for air quality monitoring. The results of testing univariate Linear Re-
gression, Random Forests, and Empirical Multiple Linear Regression showed Random
Forests to be the one that enabled low-cost sensors to meet the requirements of the US
EPA Air Sensors Guidebook. In addition, differences in NO2 concentrations were found
over distances of less than 1.5 km. Finally, other works, such as [23], dealt with data
acquisition and IoT communication architectures from a general perspective.
The existing proposals have been developed considering the environment of the
deployment without taking into account any standards in device design apart from
communication protocols. This could be due to a lack of standards for certain environments.
However, considering the existing standards in industrial environments is important to
ensure the integration, interoperability, and scalability of feasible solutions. Therefore, the
solution proposed in this work integrates high-quality components calibrated with
precision sensors. We adopt the OPC standard, which is widely used in industrial environments.
Furthermore, analyzing the acquired air quality data with machine learning techniques has
contributed to determining the best algorithm for processing air quality data in industrial
environments and predicting trends for future datasets.
3. Architecture Description
In this section, the proposed system architecture for air quality monitoring in industrial
environments is described.
Figure 1 shows the general scheme of the developed infrastructure. The goal is to
plot air quality for end-clients from the data collected by a series of sensors. The acquired
data is forwarded by the devices using LoRa communication technology on the 868 MHz
frequency band, as stated in the LoRaWAN specification for deployments in Europe [24].
The server has a data management system in charge of storing data in an InfluxDB
database [25]. This type of database stores time series of data and manages the huge
amounts of data generated by the devices, applications, and infrastructures, providing a
timestamp for each of them when stored. This is useful for their later representation in
graphs using the Grafana software. Grafana is a web server that depicts data time series in
graph format [26]. This web server represents the evolution (over time) of the particulate
matter or polluting gases and the internal parameters of the devices, such as communication
link quality levels, allowing the visualization and supervision of the deployment. In this
way, any device connected to the internal network of a client company can display the
data. Furthermore, different types of data sources can be configured in Grafana. In this
proposal, the InfluxDB database and the Prometheus monitoring system were configured
to visualize the state of the servers and the operating services. The server integrates an
OPCUA (OPC Unified Architecture) server and, if necessary, there is also a Windows virtual
machine where a proxy can obtain the data stored in the OPCUA server and send
them to the client company’s OPCDA (OPC Data Access) server. The use of Windows is
required since OPCDA relies on the Microsoft DCOM protocol.
The data reaching the OPCDA server is then forwarded to an external server through
a VPN to facilitate a secure connection. A copy of the data is stored on this server as well,
and Grafana can represent the data from any device with an Internet connection since it
has its own public IP address that the client can access. This server also has an alert system.
As a basic service, Grafana’s integrated default warning system can be used to inform the
user when abnormal levels of contamination are detected. As an advanced service, the
Prometheus Alertmanager tool will notify the network manager of any failure detected in any
of the provided services. In the following subsections, all these components are described
in detail.
Figure 1. General scheme of the developed infrastructure (diagram labels: PM/Gas sensors, LoRa
gateway, Server, LoRa server, Data controller, Telemetry, OPCUA client, Windows virtual machine,
Industrial OPCDA server, VPN connection, Preventive maintenance, Server alert manager).
These devices, called microcontrollers, are responsible for data acquisition and their
transmission to the network. The LoRa class establishing the form of communication must
be configured in each of the devices. The LoRaWAN specification defines three classes of
devices: class A, class B, and class C [27]. Our devices are able to operate as class A or
C. The difference between these classes lies in how long the receive windows are open:
class A devices open two short receive windows after each uplink transmission, whereas
class C devices keep their receive window open almost continuously [29]. These receive windows
enable communication from the gateway to the devices. However, the longer the receive
window is open, the higher the energy consumption of the device. As a result, class A
devices are more energy efficient and are usually powered by a battery [27]. For this reason,
the microcontroller in this proposal has been configured as a class A device.
LoRaWAN provides two activation methods for devices: Over-The-Air Activation
(OTAA) and Activation By Personalization (ABP). The devices have a unique identifier
known as DevEUI, which is defined by the manufacturer. However, to identify the device
and all the communications coming from it in a LoRa network, a not necessarily
unique identifier, known as DevAddr, is used. A DevAddr is assigned to the device
regardless of whether OTAA or ABP is employed. For ABP, the device has a DevAddr and
static session keys stored in its memory. Therefore, a network activation process is not
even necessary. In the case of OTAA, the device must initiate a join process with the
network, and the DevAddr and the keys change each time a new session is established [30].
For this proposal, both activation methods have been used. However, ABP was prioritized.
This is because OTAA requires a high-quality connection to ensure that the data packets
coming from the gateway, which are necessary to activate the device, are received. This
could be a problem for devices located in areas with limited radio coverage.
A series of scripts related to the subscribed topics of interest has been developed in
Python. These topics include: (i) link quality data, such as RSSI (Received Signal Strength
Indicator) or SNR (signal-to-noise ratio), contained in a topic carrying bridging information,
and (ii) environmental pollution data received from the sensors, which are included in the
topics of each application configured in the Chirpstack server. When data reach the
MQTT server, these scripts decode, analyze, and store them in different InfluxDB databases.
Redundant databases help ensure that the data are reliably stored in our system.
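As an illustration of the decoding and storage step described above, the following minimal sketch converts one JSON message carrying link quality data into an InfluxDB line-protocol string. It is not the authors' script: the function name, the measurement name, and the JSON field names (deviceName, rxInfo, rssi, snr, timestamp_ns) are assumptions made for the example, and the actual message schema should be checked against the deployed Chirpstack version.

```python
import json


def _escape_tag(value):
    """Escape the characters that are special in line-protocol tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def uplink_to_line_protocol(payload_json, measurement="link_quality"):
    """Turn one uplink message (JSON text) into an InfluxDB line-protocol string.

    All field names read from the message are hypothetical placeholders.
    """
    msg = json.loads(payload_json)
    device = _escape_tag(msg["deviceName"])
    rx = msg["rxInfo"][0]  # first gateway that received the frame
    fields = {"rssi": float(rx["rssi"]), "snr": float(rx["snr"])}
    field_set = ",".join(f"{key}={value}" for key, value in sorted(fields.items()))
    # Line protocol layout: <measurement>,<tag_set> <field_set> <timestamp in ns>
    return f"{measurement},device={device} {field_set} {int(msg['timestamp_ns'])}"


example = json.dumps({
    "deviceName": "pm-node-1",
    "rxInfo": [{"rssi": -97, "snr": 7.5}],
    "timestamp_ns": 1669500000000000000,
})
line = uplink_to_line_protocol(example)
```

The resulting string could then be written to each of the redundant databases by whichever InfluxDB client the deployment uses.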
are presented in detail. Note that these devices are manufactured by
Qartia Smart Technologies [36].
Programming in MicroPython is very simple since the code is loaded into the internal
memory with a program designed for a specific type of microcontroller. Communication
with the microcontroller is usually set by emulating a serial interface [38], and the program-
ming is carried out through a terminal application on a computer. We have configured the
device to send data every five minutes, which is a reasonable value for industrial processes.
These data are the average particle concentrations registered since the previous data were
sent, as well as the temperature, humidity, and GPS positioning. To communicate with
the different sensors connected to the microcontroller, MicroPython has a library capable of
handling different communication interfaces [37], such as SPI, used in communication with
the particle sensor; UART, for communication with the GPS receiver; and I2C, needed to
read data from the temperature and humidity sensors.
Inside the device, there is a charge controller that manages the recharging process of
an internal high-capacity lithium battery. It has approximately two-days’ autonomy in
case it is disconnected from the power grid or the auxiliary battery. When an occasional
measurement in a specific area is needed, our solution is portable, since the device can
be moved to the required location. Furthermore, a casing has been designed keeping in
mind that the device must withstand adverse weather conditions. It includes air input and
output on both sides for the particle sensor. A nozzle faces downwards to ensure that the
device is protected and the particle concentrations are correctly monitored.
The flow diagram of the software developed for this device is represented in Figure 5.
After pressing the ON button, placed on one side next to the charge connector, the device
begins the boot up sequence of the integrated particle sensor. Then, the request for the
histogram is initiated and an average of the particle concentration data from the previous 5
min is obtained. Then, the device collects the temperature and humidity data and calculates
the percentage of remaining battery. Next, the device finds the GPS location, which is only
acquired in open spaces. All this information is sent to the LoRaWAN network. After that,
the histogram request sequence starts again to continue running the software in a loop.
Figure 5. Flow diagram of the operation of the particulate matter measuring device.
The sensors react to gases by generating voltage levels on their electrodes. These
levels must be measured to calculate the gas concentration in the air. The measurement is
performed using two electrodes: a working electrode and an auxiliary electrode. This
two-electrode arrangement compensates for the errors caused by the effects of ambient
temperature and humidity. The aim is to obtain the equivalence between the voltage levels
and the accurate concentration of each of the gases in µg m−3. The calibration of the
sensors was twofold. First, all the sensors were calibrated under laboratory conditions.
Second, all the sensors were calibrated again under working conditions (outdoors). To do
this, the sensor devices took measurements for several weeks alongside an official air
quality station belonging to the administration. The results of both measurement systems
were compared, and the measurements of our devices were adjusted using linear regression.
This second calibration allowed us to take into account the impact of interfering gases on
our sensors outside laboratory conditions.
The device includes analog-to-digital converters to obtain digital values from the
voltage measured at the electrodes of the sensors. Then, the microcontroller integrated
into the device reads these values and sends them through LoRa to the server to carry out
postprocessing actions on the data. Similar to the particulate matter measuring device, the
pollutant gas measuring device integrates a GPS and temperature and humidity sensor
which, in this case, are necessary for postprocessing since the gas sensors are even more
sensitive to temperature and humidity. In addition, this device includes a charge controller
and a battery with enough capacity for three days of continuous use.
The software developed for this device reads the voltage values of the main and
auxiliary electrodes of the gas sensors for 1 minute, carrying out one measurement per
second and calculating the average value. This value is stored to later be forwarded to the
server. After this, the GPS location of the device is attained and, lastly, the device reads
the current temperature and humidity. Once all the necessary data are collected, they are
dispatched to the LoRaWAN network for postprocessing and storage at the server. The
flow chart of the operation of the device is presented in Figure 7.
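The measurement window described above (one reading per second for one minute, followed by averaging) can be sketched as follows. This is only an illustrative sketch, not the firmware of the device: the function name and the read_voltages callable, which stands in for the analog-to-digital converter access, are hypothetical, and the sleep function is injected so that the logic can be exercised without waiting.

```python
import time


def average_electrode_voltages(read_voltages, samples=60, period_s=1.0, sleep=time.sleep):
    """Average the working and auxiliary electrode voltages over one window.

    read_voltages is a hypothetical callable returning a tuple
    (working_voltage, auxiliary_voltage) for one gas sensor.
    """
    total_working = 0.0
    total_auxiliary = 0.0
    for index in range(samples):
        working, auxiliary = read_voltages()
        total_working += working
        total_auxiliary += auxiliary
        if index < samples - 1:
            sleep(period_s)  # wait until the next one-second reading
    return total_working / samples, total_auxiliary / samples


# Usage with three fake readings and no real waiting.
fake_readings = iter([(0.30, 0.10), (0.32, 0.12), (0.34, 0.14)])
mean_working, mean_auxiliary = average_electrode_voltages(
    lambda: next(fake_readings), samples=3, sleep=lambda _seconds: None
)
```

On the device, the two averages would be stored and later forwarded to the server together with the GPS location, temperature, and humidity.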
The device was calibrated by obtaining voltage samples from the two electrodes of each
of the sensors over a long time window (one month) with the device placed near an official
station with highly calibrated gas sensors. Once the necessary samples were obtained, the
data were stored in CSV format. The CSV file includes hourly averages of the voltage
data from the working and auxiliary electrodes of the four sensors, the ambient temperature
and humidity data at the time the sample was taken, and the gas concentrations from the
official station. The data from the official station are public and updated every hour with the
average concentration of the previous hour.
The device was calibrated using Python and the Linear Regression method, which
is one of the most well-known algorithms in machine learning. Linear Regression uses
an equation to obtain a predicted value from the input data. A coefficient is assigned to
each input value. When this coefficient is zero, the input value does not influence the
result of the prediction [40]. To calculate the coefficients of the Linear Regression algorithm,
data from the official station were used as the expected result. The voltage values from
the electrodes and the ambient temperature and humidity obtained at the time of taking
these samples were used as input values to calibrate each device. As a result, different
adjustment coefficients were generated for each of the gas sensors. These coefficients must
be multiplied by the input data to obtain the gas concentration in µg m−3 .
Note that measurements from other sensors were also considered since there is cross
interference among them. Therefore, a sensor positioned to measure the concentration of
a specific gas may react to the presence of other gases, producing undesirable voltage in
its electrodes.
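The calibration idea described above can be illustrated with a minimal ordinary least squares sketch in plain Python. It is a sketch under stated assumptions rather than the authors' calibration code: the function names are hypothetical, the data are synthetic, only one sensor's two electrodes plus temperature and humidity are used as inputs, and no intercept term is included (a constant column of ones could be appended to the inputs if one is desired). In the setting described in the text, the electrode voltages of the other gas sensors would be added as further input columns to account for cross interference.

```python
def fit_linear_calibration(rows, targets):
    """Least-squares coefficients b solving (X^T X) b = X^T y by Gauss-Jordan elimination."""
    n = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("input columns are linearly dependent")
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * p for x, p in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict_concentration(coefficients, inputs):
    """Multiply each input by its coefficient and sum (concentration in ug/m3)."""
    return sum(c * x for c, x in zip(coefficients, inputs))


# Synthetic example: inputs are [working V, auxiliary V, temperature, humidity].
rows = [
    [0.30, 0.10, 20.0, 50.0],
    [0.45, 0.10, 20.0, 50.0],
    [0.30, 0.20, 20.0, 50.0],
    [0.30, 0.10, 30.0, 50.0],
    [0.30, 0.10, 20.0, 70.0],
    [0.40, 0.15, 25.0, 60.0],
]
assumed_coefficients = [2.0, -1.5, 0.1, 0.05]  # made up to generate the targets
targets = [predict_concentration(assumed_coefficients, r) for r in rows]
fitted = fit_linear_calibration(rows, targets)
```

Because the synthetic targets are generated without noise, the fitted coefficients should reproduce the assumed ones; with real station data the fit would only approximate the reference concentrations.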
The results of this calibration process were highly satisfactory. Concentration values
very similar (up to around 90%) to those of the official station were obtained. Therefore,
our device is a good option as a small and affordable solution to measure the concentration
of gases and particulate matter in the air of industrial environments.
However, it is necessary to verify beforehand that there is enough radio coverage for the
devices to establish a connection to the gateway.
Some factors, such as deploying the solar panel inclined and facing south, were considered.
Moreover, the structure includes a large battery that can store the energy provided
by the solar panel and provide the necessary weight to keep the structure stable without
anchoring it to the floor, although the latter is recommended for improved safety. Both the
battery and the solar panel controller are placed inside a watertight box in the lower part
of the structure. The cables from the solar panel are connected to the controller through
the lower part of the box. The power cords of the two devices monitoring air quality are
also connected to the controller using USB ports through the lower part of the box. The
controller is connected to the battery.
The solar panel is placed on the top of the structure protecting the suspended partic-
ulate matter measuring device and the polluting gas measuring device. This structure is
encased in a metal grid with a door. The structure and solar panel are shown in Figure 8.
Figure 8. Storage structure for the air quality monitoring sensors in its development stage.
5. Results
This section presents the graphical interface and the alert features of our system, as
well as the machine learning study determining the best prediction-making algorithm based
on the data obtained from the sensors of the different devices discussed in the previous
sections. These predictions enable early detection of dangerous levels of gases or
particulate matter and the reporting of the corresponding alerts.
Figure 9 shows how the information about particulate matter in the air is organized. It has been
designed to show the average concentration of the previous half hour on the upper left
part of the panel. There are also warning indicators that vary in color as concentrations
increase. For example, the PM10 display is green if it does not surpass the 25 µg m−3
threshold, yellow if the concentration ranges from 25 to 50 µg m−3 , and red when it exceeds
50 µg m−3 . These values are based on the WHO’s recommendations for annual and daily
measurements, as indicated in Section 1. In the case of PM2.5, the same criteria were applied.
Lastly, for PM1 , the same values as PM2.5 were used since the WHO does not provide
any recommendation.
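The color rule of the PM10 indicator can be expressed as a small function. This is a minimal sketch of the thresholds stated above, not the Grafana panel configuration itself; the function name is hypothetical, and treating a value of exactly 25 µg m−3 as green and exactly 50 µg m−3 as yellow is an assumption, since the text does not specify the boundary cases.

```python
def pm_indicator_color(concentration_ug_m3, low=25.0, high=50.0):
    """Map a particulate matter concentration to the indicator color.

    Defaults follow the PM10 thresholds quoted in the text (25 and 50 ug/m3).
    """
    if concentration_ug_m3 <= low:
        return "green"   # does not surpass the lower threshold
    if concentration_ug_m3 <= high:
        return "yellow"  # between the lower and upper thresholds
    return "red"         # exceeds the upper threshold


colors = [pm_indicator_color(value) for value in (10.0, 30.0, 75.0)]
```

Other thresholds can be passed through the low and high parameters for indicators that use different limits.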
The graph on the upper right part of the panel shows the evolution of particulate
levels. The graph with the daily average of the previous seven days is displayed below.
The central left part shows the location of the device, forwarded by the integrated GPS. To
do this, the WorldMap extension must be installed in Grafana. The real-time temperature
and humidity values and the graphs representing their evolution are located on the lower
part of the panel.
Figure 9. Visualization panel for the particulate matter concentrations in the air.
Figure 10 shows the panel with the concentrations of the four different gases monitored
by the device. The upper part shows the geolocation of the device. The graphs of the
individual concentrations of each gas are divided into sections. Each section shows
the average concentration of the previous half hour, the evolution of the concentration by
hour, and the average daily concentration. The panel also includes a section displaying
the temperature and humidity.
Sensors 2022, 22, 9221 15 of 45
Figure 10. Visualization panel for concentrations of the four gases in the air.
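The half-hour and daily averages shown on both panels reduce to simple aggregations over timestamped readings. The source does not show the queries behind the panels, so the following is only a sketch of the arithmetic; the function names and the `(timestamp, value)` data layout are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def trailing_mean(readings, now, window=timedelta(minutes=30)):
    """Mean of the values whose timestamp lies in (now - window, now].

    `readings` is an iterable of (datetime, float) pairs (assumed layout).
    Returns None when no reading falls inside the window.
    """
    values = [v for t, v in readings if now - window < t <= now]
    return sum(values) / len(values) if values else None


def daily_means(readings):
    """Return {date: mean value} for every calendar day present in `readings`."""
    buckets = defaultdict(list)
    for t, v in readings:
        buckets[t.date()].append(v)
    return {day: sum(vs) / len(vs) for day, vs in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [
        (datetime(2022, 1, 1, 10, 0), 10.0),
        (datetime(2022, 1, 1, 10, 20), 20.0),
        (datetime(2022, 1, 1, 10, 40), 30.0),
        (datetime(2022, 1, 2, 9, 0), 40.0),
    ]
    print(trailing_mean(sample, datetime(2022, 1, 1, 10, 45)))
    print(daily_means(sample))
```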
Regarding the alerts, Figure 11 shows an example of the alert that is forwarded to a
client when the established threshold (50 µg m⁻³) is surpassed for an hour. This alert is
sent through Grafana, and the email shows the evolution of the particulate matter levels
over time.
Figure 11. Alert for high PM10 particulate matter levels in the air using the service integrated
in Grafana.
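The condition behind this alert, a threshold surpassed continuously for one hour, can be sketched as follows. This is not Grafana's implementation; the function name and the `(timestamp, value)` input format are hypothetical, and readings are assumed to be sorted by time.

```python
from datetime import datetime, timedelta


def first_sustained_exceedance(readings, threshold=50.0, hold=timedelta(hours=1)):
    """Return the timestamp at which the alert condition is first met, or None.

    The condition is met once every reading has stayed above `threshold`
    for at least `hold`. `readings` must be (datetime, float) pairs in
    chronological order (assumed layout).
    """
    start = None  # beginning of the current run of readings above threshold
    for t, v in readings:
        if v > threshold:
            if start is None:
                start = t
            if t - start >= hold:
                return t
        else:
            start = None  # any reading at or below the threshold resets the run
    return None


if __name__ == "__main__":
    base = datetime(2022, 1, 1, 10, 0)
    series = [(base + timedelta(minutes=10 * i), 60.0) for i in range(7)]
    print(first_sustained_exceedance(series))
```

Resetting the run on a single low reading is a design choice of this sketch; a production rule might instead evaluate an average over the hold period.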
Figure 12 shows an alert received by the network administrator reporting a series of
crashed services. This alert was forwarded using the Prometheus Alertmanager.
The industry may use these alerts to define specific policies and determine the actions
to be performed to reduce pollutant emissions. Furthermore, as the location of all the
deployed devices is known, the device that activates the alert indicates where the
pollution originates.
observations, which were employed by each of the machine learning techniques under
consideration. Tests were performed on the data series acquired from each device both with
the outlier values retained and with them removed. Figures referring to
the statistical results presented in Tables 3–5 are available in Appendix A. Finally, note
that all the devices were placed in the same location as the official station with the goal of
(i) calibrating our devices and (ii) verifying the proper operation of our complete system in
real time.
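The text here does not state which rule was used to flag outliers. As an illustration only, the sketch below assumes Tukey's fences (values beyond 1.5 times the interquartile range from the quartiles); the function name `tukey_filter` and the factor `k` are assumptions, not the authors' procedure.

```python
import statistics


def tukey_filter(values, k=1.5):
    """Return the values that lie inside Tukey's fences.

    Assumed rule: keep v when Q1 - k*IQR <= v <= Q3 + k*IQR, with the
    quartiles computed by statistics.quantiles (inclusive method).
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


if __name__ == "__main__":
    series = [10, 11, 12, 13, 14, 15, 100]
    print(tukey_filter(series))
```

Running each model once on the raw series and once on the filtered series reproduces the "with outliers" and "without outliers" comparison described above.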
Table 3. Statistical details of the data for the polluting gas measuring device.
Table 4. Statistical details of the data for the particulate matter measuring device.
5.2.1. Machine Learning Techniques Applied to the Data from the Polluting Gas
Measuring Device
This subsection presents the results of applying the selected machine learning techniques
to the data from the polluting gas measuring device.
Table 6 lists the RMSE and R² results for CO gas for each algorithm. As
can be seen, the tests were performed only once because the data contained no outlier values. The
Gaussian Process Regression algorithm provided the best results, followed by the Random
Forest Regression and k-nearest neighbors algorithms. For the Gaussian Process Regression
algorithm, we used an exponential kernel, which fitted the dataset better than
Random Forest Regression (where variance reduction, derived from the mean squared error
metric, is applied as the selection criterion), k-nearest neighbors (based on Euclidean distance),
Support Vector Machine (using a radial basis function kernel), or Linear Regression.
However, the differences in processing time must also be considered, since the
Gaussian Process Regression algorithm required more processing
and memory resources than the remaining techniques under consideration. Specifically, it
needed over five hundred times more processing time than
Random Forest Regression or k-nearest neighbors. The Support Vector Machine algorithm
offered the worst results for all types of data gathered except the humidity data, for which
the Linear Regression algorithm provided the worst results.
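The two metrics reported in the tables follow the standard definitions, RMSE = sqrt(mean((y - ŷ)²)) and R² = 1 - SS_res / SS_tot. The sketch below implements them directly; the function names are illustrative, and the source does not state how the evaluation data were selected.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(rmse(observed, predicted), r2_score(observed, predicted))
```

A model that always predicts the mean of the observations scores R² = 0, which puts the Linear Regression and Support Vector Machine values in the tables into context.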
The Gaussian Process Regression algorithm was also the best machine learning
technique for the data from all the other sensors in the device, as shown
in Table 7 for the NO2 data, Table 8 for the O3 data, Table 9 for the SO2 data, Table 10
for the temperature data, and Table 11 for the humidity data. The graphical results
for the Gaussian Process Regression technique are shown in Figures 13 and 14. The
graphical results for the other machine learning techniques are available in Appendix A
due to space constraints. The Random Forest Regression technique obtained better results
than k-nearest neighbors for the data from every sensor. However, this difference
is minimal in terms of R². Only the results for the NO2 and O3
sensors showed a greater, but still small, difference in accuracy.
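To make the exponential kernel concrete, the following is a minimal pure-Python sketch of Gaussian Process Regression with k(x, x') = σ² exp(-|x - x'| / ℓ). It is not the authors' implementation: the one-dimensional input (for example, a time index), the hyperparameter values, and all function names are assumptions chosen for illustration.

```python
import math


def exp_kernel(a, b, length=1.0, variance=1.0):
    """Exponential kernel: variance * exp(-|a - b| / length)."""
    return variance * math.exp(-abs(a - b) / length)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gpr_fit(xs, ys, noise=1e-10, length=1.0):
    """Fit a GP with a constant mean and return its posterior-mean predictor."""
    mean = sum(ys) / len(ys)
    K = [[exp_kernel(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, [y - mean for y in ys])  # (K + noise*I)^-1 (y - mean)

    def predict(x):
        return mean + sum(a * exp_kernel(x, xi, length) for a, xi in zip(alpha, xs))

    return predict


if __name__ == "__main__":
    predict = gpr_fit([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 5.0, 4.0])
    print(predict(2.0), predict(2.5))
```

Solving the n × n system is what makes this technique far more expensive than Random Forest Regression or k-nearest neighbors as the number of observations grows, in line with the processing-time difference noted above. With a noise term this small, the predictor also reproduces the points it was fitted to almost exactly.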
Table 6. CO results.
ML Technique                   RMSE                 R²
Linear Regression              0.0606371383         0.3457382025
Random Forest                  0.0027354389         0.9986685398
k-nearest Neighbors            0.0058150782         0.9939829269
Support Vector Machine         0.0658647137         0.2280666840
Gaussian Process Regression    3.360786 × 10⁻¹⁰     1.0
Table 8. O3 results.
ML Technique                   RMSE                 R²
Linear Regression              2.4381959621         0.6823210137
Random Forest                  0.1537149787         0.9987373481
k-nearest Neighbors            0.3163279069         0.9946528017
Support Vector Machine         3.5221020014         0.3370891440
Gaussian Process Regression    1.780470 × 10⁻¹¹     1.0
ML Technique                   RMSE                 R²
Linear Regression              3.0491692194         0.7579604526
Random Forest                  0.0698949997         0.9998728210
k-nearest Neighbors            0.1536363707         0.9993855149
Support Vector Machine         5.3579532919         0.2526550146
Gaussian Process Regression    3.742109 × 10⁻¹²     1.0
Figure 13. Gaussian Process Regression graphical results for the polluting gas measuring device of
(a) the CO sensor data, (b) the NO2 data with outliers, (c) the NO2 data without outliers, and (d) the
O3 data with outliers.
5.2.2. Machine Learning Techniques Applied to the Data from the Suspended Particulate
Matter Measuring Device
The results for the suspended particulate matter measuring device are presented in
this subsection.
Similar to the results obtained for the previous device, the Gaussian Process Regression
technique had the best results in terms of accuracy, followed by Random Forest Regression
and k-nearest neighbors. Linear Regression presented the worst results for the PM2.5, PM1, and
humidity data. For the rest of the cases, the worst algorithm was Support Vector Machine.
These results are presented in Table 12 for the PM2.5 data, Table 13 for the PM1 data,
Table 14 for the PM10 data, Table 15 for the temperature data, and Table 16 for the
humidity data. Figures 15 and 16 show the graphical results of the Gaussian Process
Regression algorithm for the data captured by the sensors in this device. The rest of the
graphical results are available in Appendix A.
It is important to note that the differences between the results for the Random Forest
Regression and k-nearest neighbors are greater than in the previous case. Thus, Random
Forest Regression is the best algorithm if there are strict processing time requirements.
Figure 14. Gaussian Process Regression graphical results for the polluting gas measuring device of
(a) the O3 data without outliers, (b) the SO2 data, (c) the temperature data, (d) the humidity data with
outliers, and (e) the humidity data without outliers.
Table 12. PM2.5 results for the suspended particulate matter measuring device.
Table 13. PM1 results for the suspended particulate matter measuring device.
Table 14. PM10 results for the suspended particulate matter measuring device.
Table 15. Temperature results for the suspended particulate matter measuring device.
ML Technique                   RMSE                  R²
Linear Regression              3.2755739004          0.69075451886
Random Forest                  0.15767071532         0.99928347618
k-nearest Neighbors            0.35182904520         0.99643227035
Support Vector Machine         5.08413090899         0.25499016306
Gaussian Process Regression    4.0773021 × 10⁻¹²     1.0
Table 16. Humidity results for the suspended particulate matter measuring device.
Figure 15. Gaussian Process Regression graphical results for the suspended particulate matter
measuring device of (a) the PM2.5 sensor data with outliers, (b) the PM2.5 data without outliers, (c)
the PM1 data with outliers, and (d) the PM1 data without outliers.
Figure 16. Gaussian Process Regression graphical results for the suspended particulate matter
measuring device of (a) the PM10 data with outliers, (b) the PM10 data without outliers, (c) the
temperature data, (d) the humidity data with outliers, and (e) the humidity data without outliers.
J.G.-H. and A.-J.G.-S.; funding acquisition, R.A.-C. and A.-J.G.-S. All authors have read and agreed to
the published version of the manuscript.
Funding: This research was supported by the project grants PID2020-116329GB-C22 and
TED2021-129336B-I00, funded by MCIN/AEI/10.13039/501100011033. This research was also
supported by the project “Crowdsourcing Optimized Wireless Sensor Network Deployment (CRoWD)”,
grant No. 613-621119-852, funded by Dirección General de Investigaciones of Universidad Santiago de Cali.
Finally, this work is a result of an internship funded by the Autonomous Community of the Region
of Murcia through the Fundación Seneca - Agencia de Ciencia y Tecnologia de la Región de Murcia
(Seneca Foundation - Agency for Science and Technology in the Region of Murcia) and the European
programme NextGenerationEU.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author. The data are not publicly available due to privacy constraints.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
Appendix A
The data presented in the appendix are the graphical results from the statistical
analysis of the data and the machine learning results for each technique applied to the
data gathered by all the sensors that comprise the particulate matter measuring devices
and the polluting gas measuring device.
Figure A1. Statistics for (a) CO, (b) NO2, (c) O3, (d) SO2, (e) temperature, and (f) humidity for the
polluting gas measuring device.
Figure A2. Statistics for (a) PM2.5, (b) PM1, (c) PM10, (d) temperature, and (e) humidity for particulate
matter measuring device #1.
Figure A3. Statistics for (a) PM2.5, (b) PM1, (c) PM10, (d) temperature, and (e) humidity for particulate
matter measuring device #2.
Appendix A.2. Graphics of the Results for the Machine Learning Techniques Applied to the Data
Obtained from the Air Quality Monitoring Devices
Figure A4. CO results for (a) Linear Regression, (b) Random Forest Regression, (c) KNN, and
(d) Support Vector Machine.
Figure A5. NO2 results with outliers for (a) Linear Regression, (b) Random Forest Regression, (c) KNN,
and (d) Support Vector Machine, and without outliers for (e) Linear Regression, (f) Random Forest
Regression, (g) KNN, and (h) Support Vector Machine.
Figure A6. O3 results with outliers for (a) Linear Regression, (b) Random Forest Regression, (c) KNN,
and (d) Support Vector Machine, and without outliers for (e) Linear Regression, (f) Random Forest
Regression, (g) KNN, and (h) Support Vector Machine.
Figure A7. SO2 results for (a) Linear Regression, (b) Random Forest Regression, (c) KNN, and
(d) Support Vector Machine.
Figure A8. Temperature results for (a) Linear Regression, (b) Random Forest Regression, (c) KNN, and
(d) Support Vector Machine.
Figure A9. Humidity results with outliers for (a) Linear Regression, (b) Random Forest Regression, (c)
KNN, and (d) Support Vector Machine, and without outliers for (e) Linear Regression, (f) Random
Forest Regression, (g) KNN, and (h) Support Vector Machine.
Figure A10. PM2.5 results of particulate matter measuring device #1 with outliers for (a) Linear
Regression, (b) Random Forest Regression, (c) KNN, and (d) Support Vector Machine, and without outliers
for (e) Linear Regression, (f) Random Forest Regression, (g) KNN, and (h) Support Vector Machine.
Figure A11. PM1 results of particulate matter measuring device #1 with outliers for (a) Linear
Regression, (b) Random Forest Regression, (c) KNN, and (d) Support Vector Machine, and without outliers
for (e) Linear Regression, (f) Random Forest Regression, (g) KNN, and (h) Support Vector Machine.
Figure A12. PM10 results of particulate matter measuring device #1 with outliers for (a) Linear
Regression, (b) Random Forest Regression, (c) KNN, and (d) Support Vector Machine, and without outliers
for (e) Linear Regression, (f) Random Forest Regression, (g) KNN, and (h) Support Vector Machine.
Figure A13. Temperature results of particulate matter measuring device #1 for (a) Linear
Regression, (b) Random Forest Regression, (c) KNN, and (d) Support Vector Machine.
Figure A14. Humidity results of particulate matter measuring device #1 with outliers for (a)
Linear Regression, (b) Random Forest Regression, (c) KNN, and (d) Support Vector Machine, and
without outliers for (e) Linear Regression, (f) Random Forest Regression, (g) KNN, and (h) Support
Vector Machine.
Figure A15. PM2.5 results of particulate matter measuring device #2 with outliers for (a) Linear
Regression, (b) Random Forest Regression, (c) KNN, and (d) Support Vector Machine, and without outliers
for (e) Linear Regression, (f) Random Forest Regression, (g) KNN, and (h) Support Vector Machine.
Figure A16. PM1 results of particulate matter measuring device #2 with outliers for (a) Linear
Regression, (b) Random Forest Regression, (c) KNN, and (d) Support Vector Machine, and without outliers
for (e) Linear Regression, (f) Random Forest Regression, (g) KNN, and (h) Support Vector Machine.
Figure A17. PM10 results of particulate matter measuring device #2 with outliers for (a) Linear
Regression, (b) Random Forest Regression, (c) KNN, and (d) Support Vector Machine, and without outliers
for (e) Linear Regression, (f) Random Forest Regression, (g) KNN, and (h) Support Vector Machine.
Figure A18. Temperature results of particulate matter measuring device #2 with outliers for (a)
Linear Regression, (b) Random Forest Regression, (c) KNN, and (d) Support Vector Machine, and
without outliers for (e) Linear Regression, (f) Random Forest Regression, (g) KNN, and (h) Support
Vector Machine.
Figure A19. Humidity results of particulate matter measuring device #2 with outliers for (a)
Linear Regression, (b) Random Forest Regression, (c) KNN, and (d) Support Vector Machine, and
without outliers for (e) Linear Regression, (f) Random Forest Regression, (g) KNN, and (h) Support
Vector Machine.
References
1. Pražnikar, Z.; Pražnikar, J. The Effects of Particulate Matter Air Pollution on Respiratory Health and on the Cardiovascular
System. Jure Praznikar Prisp. 2011, 27, 5.
2. Alattar, N.; Yousif, J. Evaluating Particulate Matter (PM2.5 and PM10 ) Impact on Human Health in Oman Based on a
Hybrid Artificial Neural Network and Mathematical Models. In Proceedings of the 2019 International Conference on
Control, Artificial Intelligence, Robotics Optimization (ICCAIRO), Majorca Island, Spain, 3–5 May 2019; pp. 129–135.
https://doi.org/10.1109/ICCAIRO47923.2019.00028.
3. Suhaimi, N.F.; Jalaludin, J.; Bakar, S.A. Deoxyribonucleic acid (DNA) methylation in children exposed to air pollu-
tion: A possible mechanism underlying respiratory health effects development. Rev. Environ. Health 2021, 36, 77–93.
https://doi.org/doi:10.1515/reveh-2020-0065.
4. European Air Quality Index, 2021. Available online: https://www.eea.europa.eu/themes/air/air-quality-index (accessed on 13
November 2022).
5. Revision of the Ambient Air Quality Directives. Available online: https://environment.ec.europa.eu/topics/air/air-quality/
revision-ambient-air-quality-directives_en (accessed on 13 November 2022).
6. World Health Organization. New WHO Global Air Quality Guidelines Aim to Save Millions of Lives from Air Pollution; World Health
Organization (WHO): Geneva, Switzerland.
7. Shelestov, A.; Sumilo, L.; Lavreniuk, M.; Vasiliev, V.; Bulanaya, T.; Gomilko, I.; Kolotii, A.; Medianovskyi, K.; Skakun, S.
Indoor and outdoor air quality monitoring on the base of intelligent sensors for smart city. In Proceedings of the XVIII
International Conference on Data Science and Intelligent Analysis of Information, New York, NY, USA, 20–22 July 2018; Springer:
Berlin/Heidelberg, Germany, 2018; pp. 134–145.
8. Mansour, S.; Nasser, N.; Karim, L.; Ali, A. Wireless sensor network-based air quality monitoring system. In Proceedings of the
2014 International Conference on Computing, Networking and Communications (ICNC), Honolulu, HI, USA, 3–6 February 2014;
pp. 545–550.
9. Rebeiro-Hargrave, A.; Motlagh, N.H.; Varjonen, S.; Lagerspetz, E.; Nurmi, P.; Tarkoma, S. MegaSense: Cyber-physical system
for real-time urban air quality monitoring. In Proceedings of the 2020 15th IEEE Conference on Industrial Electronics and
Applications (ICIEA), Virtual, 9–13 November 2020; pp. 1–6.
10. Hu, Z.; Bai, Z.; Yang, Y.; Zheng, Z.; Bian, K.; Song, L. UAV aided aerial-ground IoT for air quality sensing in smart city:
Architecture, technologies, and implementation. IEEE Netw. 2019, 33, 14–22.
11. Amuthadevi, C.; Vijayan, D.; Ramachandran, V. Development of air quality monitoring (AQM) models using different machine
learning approaches. J. Ambient. Intell. Humaniz. Comput. 2021, 1 1–13.
12. European Commission. Available online: https://inspire.ec.europa.eu/inspire-directive/2 (accessed on 13 November 2022).
13. Bagkis, E.; Kassandros, T.; Karteris, M.; Karteris, A.; Karatzas, K. Analyzing and improving the performance of a particulate
matter low cost air quality monitoring device. Atmosphere 2021, 12, 251.
14. Loh, B.G.; Choi, G.H. Calibration of portable particulate matter–monitoring device using web query and machine learning. Saf.
Health Work. 2019, 10, 452–460.
15. Kuncoro, A.H.; Mellyanawaty, M.; Sambas, A.; Maulana, D.S.; Mamat, M. Air Quality Monitoring System in the City of
Tasikmalaya based on the Internet of Things (IoT). J. Adv. Res. Dyn. Control. Syst. 2020, 12, 2473–2479.
16. Johnston, S.J.; Basford, P.J.; Bulot, F.M.; Apetroaie-Cristea, M.; Easton, N.H.; Davenport, C.; Foster, G.L.; Loxham, M.; Morris, A.K.;
Cox, S.J. City scale particulate matter monitoring using LoRaWAN based air quality IoT devices. Sensors 2019, 19, 209.
17. Arroyo, P.; Herrero, J.L.; Suárez, J.I.; Lozano, J. Wireless sensor network combined with cloud computing for air quality
monitoring. Sensors 2019, 19, 691.
18. Popović, I.; Radovanovic, I.; Vajs, I.; Drajic, D.; Gligorić, N. Building Low-Cost Sensing Infrastructure for Air Quality Monitoring
in Urban Areas Based on Fog Computing. Sensors 2022, 22, 1026.
19. Jo, J.; Jo, B.; Kim, J.; Kim, S.; Han, W. Development of an IoT-based indoor air quality monitoring platform. J. Sens. 2020, 2020
1–14 .
20. Molka-Danielsen, J.; Engelseth, P.; Wang, H. Large scale integration of wireless sensor network technologies for air quality
monitoring at a logistics shipping base. J. Ind. Inf. Integr. 2018, 10, 20–28.
21. Zhu, D.; Cai, C.; Yang, T.; Zhou, X. A machine learning approach for air quality prediction: Model regularization and optimization.
Big Data Cogn. Comput. 2018, 2, 5.
22. Zimmerman, N.; Presto, A.A.; Kumar, S.P.; Gu, J.; Hauryliuk, A.; Robinson, E.S.; Robinson, A.L.; Subramanian, R. A machine
learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring. Atmos.
Meas. Tech. 2018, 11, 291–313.
23. Sun, P.; Liu, W.; Xu, Y.; Wang, L. Research on application of real-time database for air quality automatic monitoring system. Earth
Environ. Sci. 2021, 675, 012023.
24. LoRaWAN 1.1 Specification, 2017. Available online: https://lora-alliance.org/wp-content/uploads/2020/11/lorawantm_
specification_-v1.1.pdf (accessed on 24 February 2021).
25. Influxdata. Influxdara Act in Time. Available online: https://www.influxdata.com (accessed on 24 February 2022).
Sensors 2022, 22, 9221 45 of 45
26. Kychkin, A.; Deryabin, A.; Vikentyeva, O.; Shestakova, L. Architecture of Compressor Equipment Monitoring and Control Cyber-
Physical System Based on Influxdata Platform. In Proceedings of the 2019 International Conference on Industrial Engineering,
Applications and Manufacturing (ICIEAM), Sochi, Russia, 25–29 March 2019; pp. 1–5.
27. The Things Network. LoRaWAN. Available online: https://www.thethingsnetwork.org/docs/lorawan/what-is-lorawan/
(accessed on 24 February 2022).
28. Chirpstack. Available online: https://www.chirpstack.io (accessed on 24 November 2022).
29. Elbsir, H.E.; Kassab, M.; Bhiri, S.; Bedoui, M.H. Evaluation of lorawan class b efficiency for downlink traffic. In Proceedings
of the 2020 16th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob),
Thessaloniki, Greece, 12–14 October 2020; pp. 105–110.
30. The Things Stack. ABP vs OTAA. Available online: https://www.thethingsindustries.com/docs/devices/abp-vs-otaa/ (accessed
on 24 February 2022).
31. Tantitharanukul, N.; Osathanunkul, K.; Hantrakul, K.; Pramokchon, P.; Khoenkaw, P. MQTT-topics management system for
sharing of open data. In Proceedings of the 2017 International Conference on Digital Arts, Media and Technology (ICDAMT),
Chiang Mai, Thailand, 1–4 March 2017; pp. 62–65.
32. Guo, Z.X.; Xie, X.Q.; Ni, Z.G. The application of OPC DA in factory data acquisition. In Proceedings of the 2012 IEEE International
Conference on Computer Science and Automation Engineering (CSAE), Zhangjiajie, China, 25–27 May 2012; Volume 2, pp.
209–212.
33. OPC Foundation. OPC Unified Architecture. 2021. Available online: https://opcfoundation.org/about/opc-technologies/opc-ua
(accessed on 27 February 2022).
34. Tolon, C. Monitoring Availability Metrics with Blackbox Exporter and Sysdig. 2021. Available online: https://sysdig.com/blog/
blackbox-exporter-sysdig/ (accessed on 28 February 2022).
35. Avdakovic, S.; Dedovic, M.M.; Dautbasic, N.; Dizdarevic, J. The influence of wind speed, humidity, temperature and air pressure
on pollutants concentrations of PM10—Sarajevo case study using wavelet coherence approach. In Proceedings of the 2016 XI
International Symposium on Telecommunications (BIHTEL), Sarajevo, Bosnia and Herzegovina, 24–26 October 2016; pp. 1–6.
36. Qartia. Available online: https://qartia.com (accessed on 24 November 2022).
37. George Robotics Limited. MicroPython—Python for Microcontrollers. Available online: https://micropython.org (accessed on
21 February 2022).
38. Gaspar, G.; Fabo, P.; Kuba, M.; Flochova, J.; Dudak, J.; Florkova, Z. Development of IoT applications based on the MicroPython
platform for industry 4.0 implementation. In Proceedings of the 2020 19th International Conference on Mechatronics-Mechatronika
(ME), Prague, Czech Republic, 2–4 December 2020; pp. 1–7.
39. IQAir. 10 Most Harmful Pollutants You’re Breathing Every Day. 2021. Available online: https://www.iqair.com/us/blog/
health-and-wellness/10-most-harmful-air-pollutants (accessed on 22 February 2022).
40. Brownlee, J. Linear Regression for Machine Learning. 2016. Available online: https://machinelearningmastery.com/linear-
regression-for-machine-learning/ (accessed on 24 February 2022).
41. Tukey, J.W. Addison—Wesley Series in Behavioral Science: Quantitative Methods. In Exploratory Data Analysis; Sage: Thousand
Oaks, CA, USA, 1979.