MDPI Sandy S Special Issue
1 Department of Informatics & Telecommunications, University of Peloponnese, Karaiskaki 70, Tripoli, Greece;
2 School of Electrical and Computer Engineering, National Technical University of Athens, Polytechnioupoli Zografou, 15772, Athens, Greece;
* Correspondence: [email protected];(F.A.)
Abstract: Particulate matter pollution is a significant concern on a global scale, as it poses detrimental effects on human health. In order to implement effective mitigation measures, it is essential to have an accurate and efficient forecasting service. In this work, we pave the road towards a general framework for forecasting particulate matter concentrations using publicly available data from low-cost sensors, in conjunction with state-of-the-art machine learning algorithms. Specifically, for addressing the time variability, we utilize a novel Long Short-Term Memory (LSTM) neural network, which offers a sense of interpretability. For the first time in this field, we model the spatial dependence of particulate matter pollution in urban agglomerations by incorporating features such as population density and mean floor area ratio. Our approach is applicable to any type of sensor, and as a case study, we apply it to Patras, a previously unstudied Greek port city, to predict PM2.5 concentrations. Our model demonstrates forecasting accuracy comparable to the resolution of the sensors, as well as meaningful [...]
Copyright: © 2023 by the authors. Submitted to Atmosphere for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Particulate matter (PM) pollution poses a major health concern worldwide, according to the World Health Organization [1]. Among the PM fractions most influential to human health [2] are PM10 (particles with aerodynamic diameter less than 10 µm) [3], PM2.5 (aerodynamic diameter less than 2.5 µm) and PM1.0 (aerodynamic diameter less than 1.0 µm).
The most abundant natural PM particles are sea salt, originating from the earth's oceans, mineral dust, originating from arid and semi-arid areas, and volcanic and biogenic emissions [4]. Anthropogenic particles are produced from industrial complexes (i.e. petro- [...] residential heating [5], biomass burning [6], and more. These particles may be transported over long distances from their source (> 1000 km) by mesoscale and synoptic circulations, depending on their aerodynamic properties and chemical reactivity [7]. The turbulent [...] PM concentrations within the lower layers of the troposphere, as well [8]. PM particles influence the energy budget of the atmosphere by scattering and absorbing solar radiation and by absorbing and re-emitting infrared radiation. Some of these particles interact with water vapor and other hydrometeors in clouds, hence influencing cloud dynamics and precipitation characteristics, such as the total amount produced and the maximum rates [7]. A sub-case of the general picture, of profound interest, is the distribution of PM concentrations in dense urban environments, where the landscape increases the complexity, e.g. roads between high buildings (the so-called street canyons), presenting high aerodynamic roughness. Circulations can also be induced by localized steep temperature gradients, [...] heavily influenced by meteorological conditions such as wind speed and relative humidity, which are also extremely time-dependent, modeling and forecasting PM pollution in urban [...]
Towards forecasting particulate matter pollution, there are two main approaches, namely transport models (e.g. CALPUFF [10], ADMS-5 [11,12], CAMx [13] etc.) and [...] allow for somewhat coarse modelling of large spatial scales, from 100 m (ADMS-5) to a whole hemisphere (CAMx), while CFD models allow for very detailed modelling, though focused on small scales. It is shown in [10] that dispersion models are less accurate within the complex agglomeration of a city, thus a prominent approach would be to employ a CFD framework in such environments. However, this task could be very costly in terms of computational resources and, for some applications, practically impossible. For example, [10] studied an area of about 1.2 km², which is much smaller than the area of a medium-sized city, which could be of the order of 10 km². Moreover, an ever-present difficulty in both of the aforementioned approaches is the need for detailed modelling of the pollution sources, to be used as input to the simulation [17]. Another difficulty is that the resulting modeling framework cannot be generalized easily, as each city has its own, time-dependent, emissions budget. A way to overcome the above difficulties is to use a purely model-agnostic approach, that is, Machine Learning algorithms and specifically Artificial Neural Networks (ANN).
[...] literature, with the first work of this kind published almost two decades ago, i.e. [18]. In the last few years, the community has been actively exploring Deep Learning approaches to particulate matter prediction, with very good results [19–32]. Despite the accurate predictions of ANNs, and the fact that they often outperform classical machine learning algorithms, they receive criticism as being "black boxes", e.g. [33]. In order to maintain the excellent predictability while also gaining insight into the results, there are approaches offering a sense of interpretability of the final prediction outcome. A recent work [34] constructs a novel kind of Long Short-Term Memory (LSTM) network that allows for both high-quality predictions and interpretations of the final result. In this work, we employ LSTM networks of the latter kind to predict PM concentrations and we define a set of features, able to properly quantify the spatial dependency of the phenomenon, coming [...]
The plan of this work is as follows: in Section 2 we describe our method, in Section 3 we present our setup and in Section 4 we discuss our results. Finally, in Section 5 we draw [...]
[...] Long Short-Term Memory (LSTM) networks. Notably, we use a single LSTM instance for the whole city, rather than one instance per sensor, as done e.g. in [19]. The spatial variability [...] (MLP) to combine the results of particular LSTM instances that are used per sensor, [20,29] employ a Convolutional Recurrent NN (CNN), while the latter work also utilizes geospatial features, such as longitude, latitude and distance from the city center. In general, there exists a number of works that employ combinations of CNN with LSTM cells [29? ].
The main advantage of our approach lies in the fact that one can easily add or remove sensors, while also being able to incorporate moving sensors, for example on drones [35] or city buses [36]. This feature is not trivial in the context of previous works in the field [37]. Moreover, our approach is applicable to any kind of sensor, with the minimum requirement of applying sensor-specific calibration procedures during the data pre-processing stage. One can also take into account specific properties of a particulate sensors subnet, for example [...]
Version April 4, 2023 submitted to Atmosphere 3 of 14
As stated before, we employ a modified LSTM network. LSTMs are commonly used in tasks such as language translation, speech recognition, and time series forecasting, e.g. [38]. The structure of an LSTM cell consists of four components, namely the input gate, the forget gate, the output gate and the hidden state. The input gate is responsible for determining which pieces of information from the input data should be stored. This is done by weighting the input data using a sigmoid function, which produces a value between 0 and 1 for each element provided. Values close to 0 indicate that the corresponding element of the input data should not be passed on, while values close to 1 indicate that it should be passed on. The forget gate is responsible for determining which pieces of information from the previous time step should be retained and which should be forgotten. This is also done using a sigmoid function, which produces a value between 0 and 1 for each element of the previous time step's data. Values close to 0 indicate that the corresponding element should be forgotten, while values close to 1 indicate that it should be retained. The output gate is responsible for determining which pieces of information from the current time step should be passed on to the next time step. This is again done using a sigmoid function of the current input and the previous hidden state; the resulting values between 0 and 1 weight the internal cell state, which has itself been updated through the input and forget gates, and the result is passed on to the next time step. Finally, the hidden state is carried over from one time step to the next, and represents the "memory" of the LSTM, updated per time step according to the input, forget and output gates. This allows the LSTM to model long-term dependencies in the [...]
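The gate mechanics described above can be made concrete with a minimal NumPy sketch of one time step of a textbook LSTM cell. This is not the modified (IMV-LSTM) network used in this work; the function name `lstm_step`, the dictionary layout of the weights and the toy dimensions are all illustrative assumptions. Note that the textbook formulation tracks an internal cell state alongside the hidden state.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One time step of a textbook LSTM cell (all names are illustrative).

    W, U and b are dicts keyed by "i", "f", "o", "g": input gate,
    forget gate, output gate and candidate values, respectively.
    """
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # how much new input to store
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # how much old state to retain
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # how much state to expose
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate values
    c_t = f * c_prev + i * g     # updated cell state
    h_t = o * np.tanh(c_t)       # updated hidden state
    return h_t, c_t


# Toy run: 5 time steps, 3 input features, 4 hidden units, random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {key: rng.normal(size=(n_hid, n_in)) for key in "ifog"}
U = {key: rng.normal(size=(n_hid, n_hid)) for key in "ifog"}
b = {key: np.zeros(n_hid) for key in "ifog"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Because every gate output lies strictly between 0 and 1, each element of the hidden state stays inside (-1, 1) regardless of the input scale.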
Recently, [34] proposed a novel LSTM, named "IMV-LSTM", that offers a sense of interpretability of the final result, while maintaining high accuracy of its predictions. The underlying idea is to store the information from each variable (i.e. feature) in a specific part of the hidden state matrix and to update the latter matrix in such a way that the aforementioned separation is retained. Thus, after training, one can assess the individual contribution of each feature to the prediction. The authors of [34] describe two particular LSTM implementations, namely the tensor IMV-LSTM and the full IMV-LSTM, which differ in the update scheme for the forget, input and output gates.
Some details on interpretability and a few things about a formal consideration
The available features in our dataset can be categorized in the following subsets:
• Meteorological features: Relative Humidity, Average Wind Speed, Pressure, Average [...]
• Geospatial features: Mean Floor Area Ratio, which corresponds to the mean ratio of total built floor area to the area of the piece of land under study [39]. Note that the total built floor area of a building is calculated, which means that for buildings with multiple floors, the Mean Floor Area Ratio can be larger than 1. The Mean Floor Area Ratio quantifies the "density" of buildings in a given area, which contribute to the phenomenon via [...] with regard to the Mean Population Density, which is the average population density [...]
• Time-related features: In order to inform the model about the periodicity of the phenomenon, regarding daily (hours), weekly (days) and seasonal (months) variability of both the human-related emissions and the meteorological conditions, we parametrize each timestamp via the following features:
$$ \text{cos\_time} = \cos\left(\frac{2\pi \cdot \text{time}}{T}\right), \qquad \text{sin\_time} = \sin\left(\frac{2\pi \cdot \text{time}}{T}\right) \tag{1} $$
where time ∈ {hours, days, months} and T ∈ {24, 7, 12}, respectively.
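As an illustration of Eq. (1), the following minimal sketch encodes a timestamp component as a cosine/sine pair. The helper name `cyclical_encode` and the sample hours are assumptions made for the example, not part of the original pipeline.

```python
import numpy as np


def cyclical_encode(value, period):
    """Return (cos_time, sin_time) as in Eq. (1) for a given period T."""
    angle = 2.0 * np.pi * np.asarray(value, dtype=float) / period
    return np.cos(angle), np.sin(angle)


# Hour of day with T = 24; days would use T = 7 and months T = 12.
hours = np.array([0, 6, 12, 23])
cos_hour, sin_hour = cyclical_encode(hours, 24)

# Distance in the encoded plane between 23:00 and 00:00, and between 12:00 and 00:00.
d_23_0 = np.hypot(cos_hour[3] - cos_hour[0], sin_hour[3] - sin_hour[0])
d_12_0 = np.hypot(cos_hour[2] - cos_hour[0], sin_hour[2] - sin_hour[0])
```

The point of the pair is continuity across the wrap-around: 23:00 ends up close to 00:00 in the encoded plane, whereas the raw values 23 and 0 are far apart.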
[Figure 1, right panel: histogram of data point counts per PM sensor id (upper) and pie chart of the corresponding percentages (lower). Sensor ids: 741, 749, 1030, 1566, 1672, 1712, 5078, 5092, 14857, 14877, 23759, 30765, 56113, 56229, 56453, 57523, 101589, 101597, 101609, 101611, 146920.]
Figure 1. Left panel: Meteorological (green) and Particulate Matter (blue) sensor locations in Patras, Greece. Right panel: on the upper side, the histogram of the data points per PM sensor is presented, while on the lower part, the corresponding percentages are depicted in a pie chart.
[...] works on source apportionment via chemical analysis.
2. complex interplay between meteorological stability conditions and emissions
3. some generalities on health effects.
4. There are installed sensors, properly cite people that set up the network.
The greater Patras area (city center and its suburbs) was selected for this analysis, located in the northwestern Peloponnese (38° 14′ N, 21° 4′ E), approximately 220 km west [...] Köppen–Geiger climate classification), with daily average temperatures ranging from 6.1 °C in January to 25.3 °C in August. The wettest month is November, with an average accumulated precipitation of 118 mm, while July is the driest, with 4.2 mm [40].
The study of air quality over Patras is of significant scientific interest, as it is the third largest city in Greece, home to more than 200,000 residents. There are many sources of particulate matter, with comparable contributions to total PM loading and characterized by notable spatiotemporal variations [41]. The southern part of the city features an international port, particularly active during the summer season. It primarily serves passenger ships rather than cargo vessels. To the southwest (16 km from the city center), there is a small industrial zone, consisting of a number of light industries (pharmaceuticals, food, beverages, etc.). The contribution of biomass burning, such as the burning of agricultural waste (olive tree branches), from rural areas surrounding Patras is estimated to be up to 7% for PM2.5 and 10% for PM10 [42]. Studies suggest that during highly polluted days, this contribution can reach up to 50%, due to low mixing [6].
Anthropogenic PM particles in Patras mainly consist of organic aerosols (OA) and sulfates. The main sources of OA are very oxygenated [...] from other primary emissions (including cooking OA) [43]. Traffic is the main source of anthropogenic PM10 particles (46.2%) [44]. On the other hand, the most common natural PM10 particles observed over Patras consist of long-range transported Saharan dust, mainly contributing to the total PM10. Extreme events of dust transport over Greece are frequent throughout the year [45]. A 2011 study suggested that secondary sulfate (34%), traffic emissions (34%), biomass burning (11%), shipping (10%), sea salt (11%), and mineral dust (2%) were the major PM2.5 sources in the city center. For a suburban site in Patras, the same study suggested that secondary sulfate (34%), traffic emissions (25%), biomass burning (15%), mineral dust (10%), and sea salt (5%) were the major PM2.5 contributors [46]. Biomass burning for residential heating is the most important organic aerosol source [...]
Fotis: This should be split in paragraphs, i.e. general info, met info
[...] microgram (µg) per m³, measured by PurpleAir PA-II sensors [48]. Historical data are publicly available via the PurpleAir API [49], in 10 min intervals. PurpleAir PA-II devices contain two PMS5003 laser particle counters, a BME280 environmental sensor, and an ESP8266 [...] PMS5003 sensors is based on the modulation of light intensity as particles pass through the measurement cavity. The latter modulation, the so-called nephelometric response, is directly proportional to the concentration of particles in terms of both mass and number.
The available features from the PurpleAir API [49] are as follows:
[...] 5.0_um_count_i, 10.0_um_count_i, in units of particle number per 0.1 dm³. Essentially, these are counts of particles with diameters equal to or larger than the stated value.
• Calculated concentration values, namely pm1.0_cf_1_i, pm1.0_atm_i, pm2.5_atm_i, pm2.5_cf_1_i, pm10.0_atm_i, pm10.0_cf_1_i, in units of µg/m³,
where i ∈ {a, b} corresponds to each one of the two PMS5003 sensors, hereafter channels. The transformation algorithm from particle counts to concentration values is proprietary, so it can essentially be considered part of the measurement process. In line with the literature [50? ], we use features with the label CF=1, which correspond to assuming [...]
Meteorological variables, specifically pressure, temperature, and relative humidity, are measured via the BME280 sensor. However, the latter sensor is located right above the PMS5003 sensors, thus heat dissipation from the sensors increases the measured temperatures (by 2.7 °C to 5.3 °C) and yields a drier RH (by 9.7% to 24.3%) [51]. This effect is not a constant shift but a time-varying fluctuation, thus one could anticipate that it could enhance or alleviate physics imprints on the data. Moreover, as [52] found, non-physical maxima and minima of temperature and relative humidity appear in the dataset, at a frequency of 1 per 107 measurements, attributed to either electronic noise or [...]
$$ Z_j = \sum_{i=1}^{k} w_{ij} Z_i \tag{2} $$
The contribution of each meteorological station $i$ is determined by the weights $w_{ij} = C / l_{ij}$, where $l_{ij}$ is the distance between meteorological station $i$ and sensor $j$, and $C$ is a normalization [...]
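Eq. (2) with weights proportional to 1/l_ij is an inverse-distance weighting. A minimal sketch follows, under the assumption (the source text is cut off at this point) that the normalization C is chosen per sensor so that the weights sum to one; the function name `idw_interpolate` and the sample values are illustrative.

```python
import numpy as np


def idw_interpolate(station_values, distances):
    """Inverse-distance weighted estimate Z_j = sum_i w_ij * Z_i.

    station_values: shape (k,), one value Z_i per meteorological station.
    distances: shape (k, m), l_ij between station i and sensor j (all > 0).
    Assumption: the weights for each sensor are normalized to sum to one.
    """
    station_values = np.asarray(station_values, dtype=float)
    inv = 1.0 / np.asarray(distances, dtype=float)
    weights = inv / inv.sum(axis=0, keepdims=True)  # w_ij = C_j / l_ij
    return weights.T @ station_values


# Two stations reporting 10 and 20, located 1 km and 3 km from one sensor.
z_sensor = idw_interpolate([10.0, 20.0], [[1.0], [3.0]])
```

In this toy case the weights are 0.75 and 0.25, so the nearer station dominates the estimate. A station located exactly at the sensor (zero distance) would need special handling, which this sketch does not include.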
[Figure 2: grid of scatter-plot panels, one per sensor id; x-axis: PM2.5 (µg/m³), channel A; y-axis: PM2.5 (µg/m³), channel B.]
Figure 2. Scatter plots for the two channels of PurpleAir sensors in Patras with id ∈ {741, 749, 1030, 1566, 1672, 1712, 5078, 5092, 14857, 14877, 23759, 30765} that satisfy the criterion (??). The dashed line corresponds to the least-squares fit; the corresponding parameters can be found in Table 1.
[Figure 3: grid of scatter-plot panels, one per sensor id; x-axis: PM2.5 (µg/m³), channel A; y-axis: PM2.5 (µg/m³), channel B.]
Figure 3. Scatter plots for the two channels of PurpleAir sensors in Patras with id ∈ {56113, 56229, 56453, 57523, 101589, 101597, 101609, 101611, 146920} that satisfy the criterion (??). The dashed line corresponds to the least-squares fit; the corresponding parameters can be found in Table ??.
Table 1. General properties of our dataset. The 'id' column corresponds to the sensor id in the PurpleAir network; a, b are the coefficients of the linear fit between the channels; µ_scatter, σ_scatter, µ̃, σ_median are the mean scatter with the corresponding standard deviation, and the median with the corresponding sigma. Scatter is defined as the orthogonal distance between each (PM2.5,channel A, PM2.5,channel B) value and the fitted line. Also, r_k and the corresponding p-values are measures of correlation, with k being either 'Pearson' or 'Spearman'. A detailed description can be found in the text.
$$ \frac{\left| PM_{A,2.5} - PM_{B,2.5} \right|}{PM_{A,2.5} + PM_{B,2.5}} \le a\% \tag{3} $$
From the PA-II sensor output features presented above, we select "pm2.5_cf_1", in line with the literature [55? ], so $PM_{A,2.5} \equiv$ pm2.5_cf_1_a and $PM_{B,2.5} \equiv$ pm2.5_cf_1_b.
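The agreement criterion of Eq. (3) can be applied as a boolean mask over paired channel readings. The sketch below is a minimal illustration: the threshold value is not stated in this part of the text, so the 10% used here is purely a placeholder, and the function name `channel_agreement_mask` is hypothetical.

```python
import numpy as np


def channel_agreement_mask(pm_a, pm_b, threshold):
    """True where |PM_A - PM_B| / (PM_A + PM_B) <= threshold, as in Eq. (3).

    Readings whose sum is not positive cannot be evaluated and are rejected.
    """
    pm_a = np.asarray(pm_a, dtype=float)
    pm_b = np.asarray(pm_b, dtype=float)
    total = pm_a + pm_b
    rel_diff = np.full(total.shape, np.inf)
    np.divide(np.abs(pm_a - pm_b), total, out=rel_diff, where=total > 0)
    return rel_diff <= threshold


# Placeholder threshold of 10%; the value used in the study is not given here.
mask = channel_agreement_mask([10.0, 10.0, 0.0], [11.0, 30.0, 0.0], threshold=0.10)
```

In this toy input the first pair agrees within the threshold, the second pair disagrees by 50%, and the third pair is rejected because both channels read zero.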
The measurements from each sensor after employing the aforementioned cut (Eq. 3) are shown in Figs. 2 and 3. In order to assess the linearity between the two channels, we employ Pearson's correlation coefficient and Spearman's rank-order correlation coefficient, as implemented in the free and open-source Python library scipy [56]. Pearson's coefficient measures the linear relationship between two variables, while Spearman's is a non-parametric measure of the monotonicity of that relationship. In both cases $r_k \in [-1, +1]$, where $k \in \{\text{Pearson}, \text{Spearman}\}$, with 0 implying no correlation, +1 positive correlation and -1 negative correlation. The difference between the Spearman and Pearson correlation coefficients lies in the underlying assumptions, the most notable being normality in the case of Pearson's [57]. The corresponding p-value quantifies the probability of the same or a more extreme r value appearing due to random fluctuations between uncorrelated datasets. By observing the aforementioned values in Table 1, we deduce that both criteria strongly support linearity between the two channels, at least in the region where the vast majority of the measurements lie. As a further step, we perform a linear fit between the two channels and we calculate the orthogonal distances between each $(PM_{2.5,\mathrm{channel\,A}}, PM_{2.5,\mathrm{channel\,B}})$ point and the fitted line.
$$ PM_{B,2.5} = a \cdot PM_{A,2.5} + b \tag{4} $$
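A minimal sketch of this step, run on synthetic data since the sensor measurements are not reproduced here: it computes both correlation coefficients with scipy, fits the line of Eq. (4) by ordinary least squares, and evaluates the orthogonal distance of each point from the fitted line. The synthetic slope, intercept and noise level are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the two channels of one sensor (not real data).
rng = np.random.default_rng(42)
pm_a = rng.uniform(0.0, 100.0, size=500)
pm_b = 0.95 * pm_a + 1.0 + rng.normal(0.0, 2.0, size=500)

# Correlation coefficients and their p-values.
r_pearson, p_pearson = stats.pearsonr(pm_a, pm_b)
r_spearman, p_spearman = stats.spearmanr(pm_a, pm_b)

# Least-squares fit of PM_B = a * PM_A + b (Eq. 4).
a, b = np.polyfit(pm_a, pm_b, deg=1)

# Orthogonal distance of each point from the line a*x - y + b = 0.
scatter = np.abs(a * pm_a - pm_b + b) / np.sqrt(a**2 + 1.0)
```

The orthogonal distance differs from the vertical residual by the factor 1/sqrt(a^2 + 1), so the two nearly coincide only when the slope is small.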
The parameters of the fit are given in Table 1 for all sensors. In the latter table we also present some statistical measures, namely the mean value per sensor, the corresponding standard deviation and the median with its dispersion measure. In order to construct the final PM2.5 to be used by our model, we take the weighted average within 1 h, where for weights we use the inverse of the orthogonal distance mentioned before, addressing at the same time the fact that normality of the PM measurements within a 1 h interval cannot be safely assumed in general [58]. Here we apply a quality cut-off, namely we exclude measurements where the scatter is more than 1 µg/m³. After this step, we construct the PM2.5,chan.avg measurements as the mean value between the two channels, [...]
Finally, as the sensitivity of the sensors is generally found to be highly related to meteorological quantities such as humidity and temperature [51–53,55], we apply a calibration procedure. Among the various linear and non-linear calibration curves employed in the literature (see e.g. [52]), we choose to use the calibration curve proposed in [52], as TODO: add 2-3 lines regarding the selection of this curve, which reads as [...]
where RH is the Relative Humidity in %, calculated using the following expression [59]:
$$ RH = \exp\left( \frac{a\,b\,\left(T_{DP,avg} - T_{avg}\right)}{\left(T_{avg} + b\right)\left(T_{DP,avg} + b\right)} \right) \cdot 100 \tag{7} $$
where a = 17.368, b = 238.88 °C and $T_{avg}$, $T_{DP,avg}$ are the temperature and dew point temperature averages within 1 h, directly measured from the meteorological stations.
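Eq. (7) translates directly into code. The sketch below uses the constants quoted above; the function name `relative_humidity` and the example temperatures are illustrative.

```python
import numpy as np

A = 17.368   # dimensionless constant a of Eq. (7)
B = 238.88   # constant b of Eq. (7), in degrees Celsius


def relative_humidity(t_avg, t_dp_avg):
    """Relative humidity in % from temperature and dew point (both in deg C)."""
    t = np.asarray(t_avg, dtype=float)
    t_dp = np.asarray(t_dp_avg, dtype=float)
    exponent = A * B * (t_dp - t) / ((t + B) * (t_dp + B))
    return 100.0 * np.exp(exponent)


rh_saturated = relative_humidity(20.0, 20.0)  # dew point equals temperature
rh_example = relative_humidity(20.0, 10.0)    # dew point 10 deg C below temperature
```

As a sanity check, a dew point equal to the air temperature gives exactly 100%, and a dew point of 10 °C at 20 °C gives roughly 52.5%.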
We use the dataset described in Sect. ?? and, according to standard practice, we scale it via standard min-max scaling. Note that the min-max scaling is applied neither to the parametrized time features, which lie in the [−1, 1] interval, nor to the wind direction feature ("winddirAvg"). Instead, the latter feature is transformed via sine/cosine in order [...]
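The two preprocessing transforms mentioned here can be sketched as follows. This is an illustration only: the helper names and sample values are assumptions, and the sketch says nothing about which data split the scaling statistics were computed on in the actual study.

```python
import numpy as np


def minmax_scale(x):
    """Rescale a feature column linearly to [0, 1] (assumes max > min)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo)


def encode_wind_direction(degrees):
    """Map a wind direction in degrees to a (cos, sin) pair on the unit circle."""
    rad = np.deg2rad(np.asarray(degrees, dtype=float))
    return np.cos(rad), np.sin(rad)


pressure_scaled = minmax_scale([1000.0, 1010.0, 1020.0])
wind_cos, wind_sin = encode_wind_direction([0.0, 90.0, 360.0])
```

With the sine/cosine pair, directions of 0° and 360° map to the same point, which a linear rescaling of the raw angle would not achieve.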
• Usage of the Nash–Sutcliffe model efficiency coefficient, along the same lines as [26].
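For reference, the Nash–Sutcliffe efficiency in its standard form is one minus the ratio of the squared prediction error to the variance of the observations around their mean. A minimal sketch, with an illustrative function name and toy values:

```python
import numpy as np


def nash_sutcliffe_efficiency(observed, predicted):
    """NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = np.sum((observed - predicted) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance


obs = np.array([10.0, 12.0, 15.0, 11.0])
nse_perfect = nash_sutcliffe_efficiency(obs, obs)
nse_mean = nash_sutcliffe_efficiency(obs, np.full_like(obs, obs.mean()))
```

A perfect forecast scores 1, while a forecast that always predicts the observed mean scores 0; negative values indicate a model worse than that baseline.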
TODO
3. Results on predictions for timescales: 1h, 12h, 24h, 48h, 1 week
4. Feature importance per timescale and in general for these timescales. NOTES: On "large" prediction times, time-features seem to be more important. On "less" pred [...]
The assumptions of CF=1 are justified by the findings of ??? the guys that analyzed [...]
The time features essentially quantify/describe emission, as the meteorological vari- [...]
We train 'full' and 'tensor' LSTMs N = 20 times and we present and discuss the corresponding results on feature importance, for different prediction timescales.
[Figure 4: bar chart; legend: tensor, full; y-axis: Percentage (%); x-axis: Features (Auto-regressive, RH, tempAvg, windspeedAvg, Pressure, Mean Fl. Area R., winddirAvg, pressureTrend, month, hour, day).]
Figure 4. Feature importance in percentages, for the feature set of ??, where the two features for each time were aggregated, for a prediction window of 48 hours.
In order to construct the "mean" value of the feature importance, we minimized the following expression:
$$ O(f_{mean}) = \left( \sum_{j=1}^{N} \frac{1}{RMSE_j} \right)^{-1} \sum_{j=1}^{N} \frac{1}{RMSE_j} \sqrt{ \sum_{i=1}^{k} \left( f_{ij} - f_{i,mean} \right)^2 } + \left( \sum_{i=1}^{k} f_{i,mean} - 1 \right) \cdot 10^{\lambda} \tag{8} $$
where k is the length of the feature importance vector and $RMSE_j$ is the RMSE corresponding to run j. The first term is simply the weighted average of the Euclidean distances in the feature space between feature importance vector j and the "mean", where the weights are the inverse RMSE values. The second term corresponds to the condition that all percentages add to 1. The λ parameter is an arbitrary integer, and we set λ = 5.
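The minimization of Eq. (8) can be sketched with scipy on synthetic importance vectors. Several choices below are assumptions rather than the authors' procedure: the penalty term is written with an absolute value (a signed penalty would be unbounded below), the Nelder-Mead optimizer and the starting point are arbitrary, and the data are randomly generated.

```python
import numpy as np
from scipy.optimize import minimize


def objective(f_mean, importances, rmse, lam=5):
    """Eq. (8): RMSE-weighted mean Euclidean distance plus a sum-to-one penalty."""
    weights = 1.0 / rmse
    distances = np.sqrt(((importances - f_mean) ** 2).sum(axis=1))
    weighted_avg = (weights * distances).sum() / weights.sum()
    # Assumption: absolute value, so that the penalty is bounded below.
    penalty = abs(f_mean.sum() - 1.0) * 10.0 ** lam
    return weighted_avg + penalty


# Synthetic stand-in: N = 20 runs, k = 4 features, each row summing to one.
rng = np.random.default_rng(0)
N, k = 20, 4
true_importance = np.array([0.4, 0.3, 0.2, 0.1])
importances = true_importance + rng.normal(0.0, 0.01, size=(N, k))
importances /= importances.sum(axis=1, keepdims=True)
rmse = rng.uniform(1.0, 2.0, size=N)

x0 = importances.mean(axis=0)  # plain average as the starting point
result = minimize(objective, x0, args=(importances, rmse), method="Nelder-Mead")
f_mean = result.x
```

Runs with a lower RMSE pull the "mean" vector more strongly towards their own importance values, and the large penalty factor keeps the components summing to one to within a very small tolerance.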
4. Conclusions
To be rephrased... We report that the LSTM network shows a forecasting accuracy that is comparable to the sensor's resolution, combined with meaningful interpretations of its results, which provide insight into the Physics of the problem. That said, in parallel with a re-training schedule, our model can be used as an accurate, low-cost, early-warning system, in order to attenuate the health hazards of particulate matter pollution, by providing [...]
Funding: This work was supported in part by project ENIRISST+ under grant agreement No. MIS 5047041 from the General Secretary for ERDF & CF, under Operational Programme Competitiveness, Entrepreneurship and Innovation 2014-2020 (EPAnEK) of the Greek Ministry of Economy and Development (co-financed by Greece and the EU through the European Regional Development Fund).
Acknowledgments: FA wants to thank Dr Sandy Fameli, from University of the Aegean, for very [...]
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or [...]
Appendix A
• Meteorological features: Solar Radiation High, uv-High, Humidity Low, Humidity High, Humidity Average, Temperature High, Temperature Low, Temperature Average, Wind Speed High, Wind Speed Low, Wind Speed Average, Wind Gust High, Wind Gust Low, Wind Gust Average, Dew Point High, Dew Point Low, Dew Point Average, Wind Chill High, Wind Chill Low, Wind Chill Average, Heat Index High, Heat Index Low, Heat Index Average, Pressure Maximum, Pressure Minimum, [...]
References
1. World Health Organization. WHO global air quality guidelines: particulate matter (PM2.5 and PM10), ozone, nitrogen dioxide, sulfur dioxide and carbon monoxide; World Health Organization, 2021.
2. Goldberg, M. A systematic review of the relation between long-term exposure to ambient air pollution and chronic diseases. [...]
3. Khaniabadi, Y.O.; Goudarzi, G.; Daryanoosh, S.M.; Borgini, A.; Tittarelli, A.; De Marco, A. Exposure to PM10, NO2, and O3 and impacts on human health. Environmental Science and Pollution Research 2017, 24, 2781–2789.
4. Seinfeld, J.H.; Pandis, S.N. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change; Wiley, 2016.
5. Kaskaoutis, D.; Grivas, G.; Oikonomou, K.; Tavernaraki, P.; Papoutsidaki, K.; Tsagkaraki, M.; Stavroulas, I.; Zarmpas, P.; Paraskevopoulou, D.; Bougiatioti, A.; et al. Impacts of severe residential wood burning on atmospheric processing, water-soluble organic aerosol and light absorption, in an inland city of Southeastern Europe. Atmospheric Environment 2022, p. 119139.
6. Papadakis, G.; Megaritis, A.; Pandis, S. Effects of olive tree branches burning emissions on PM2.5 concentrations. Atmospheric [...]
7. Levin, Z.; Cotton, W.R. Aerosol Pollution Impact on Precipitation: A Scientific Review; Springer, 2009.
Version April 4, 2023 submitted to Atmosphere 13 of 14
8. Su, T.; Li, Z.; Li, C.; Li, J.; Han, W.; Shen, C.; Tan, W.; Wei, J.; Guo, J. The significant impact of aerosol vertical structure on lower 305
atmosphere stability and its critical role in aerosol–planetary boundary layer (PBL) interactions. Atmospheric Chemistry and Physics 306
9. Pearlmutter, D.; Bitan, A.; Berliner, P. Microclimatic analysis of “compact” urban canyons in an arid zone. Atmospheric Environment 308
10. Toscano, D.; Marro, M.; Mele, B.; Murena, F.; Salizzoni, P. Assessment of the impact of gaseous ship emissions in ports using 310
physical and numerical models: The case of Naples. Building and Environment 2021, 196, 107812. 311
11. Merico, E.; Dinoi, A.; Contini, D. Development of an integrated modelling-measurement system for near-real-time estimates of 312
harbour activity impact to atmospheric pollution in coastal cities. Transportation Research Part D: Transport and Environment 2019, 313
12. Progiou, A.; Bakeas, E.; Evangelidou, E.; Kontogiorgi, C.; Lagkadinou, E.; Sebos, I. Air pollutant emissions from Piraeus port: 315
External costs and air quality levels. Transportation Research Part D: Transport and Environment 2021, 91, 102586. 316
13. Wang, J.; Xing, J.; Mathur, R.; Pleim, J.E.; Wang, S.; Hogrefe, C.; Gan, C.M.; Wong, D.C.; Hao, J. Historical trends in PM2.5-related premature mortality during 1990–2010 across the northern hemisphere. Environmental Health Perspectives 2017, 125, 400–408.
14. Jeanjean, A.P.; Monks, P.S.; Leigh, R.J. Modelling the effectiveness of urban trees and grass on PM2.5 reduction via dispersion and deposition at a city scale. Atmospheric Environment 2016, 147, 1–10.
15. Lauriks, T.; Longo, R.; Baetens, D.; Derudi, M.; Parente, A.; Bellemans, A.; Van Beeck, J.; Denys, S. Application of improved CFD modeling for prediction and mitigation of traffic-related air pollution hotspots in a realistic urban street. Atmospheric Environment
16. Hao, C.; Xie, X.; Huang, Y.; Huang, Z. Study on influence of viaduct and noise barriers on the particulate matter dispersion in street canyons by CFD modeling. Atmospheric Pollution Research 2019, 10, 1723–1735.
17. Fameli, K.M.; Assimakopoulos, V.D. The new open Flexible Emission Inventory for Greece and the Greater Athens Area (FEI-GREGAA): Account of pollutant sources and their importance from 2006 to 2012. Atmospheric Environment 2016, 137, 17–37.
18. Pérez, P.; Trier, A.; Reyes, J. Prediction of PM2.5 concentrations several hours in advance using neural networks in Santiago,
19. Zhao, J.; Deng, F.; Cai, Y.; Chen, J. Long short-term memory-Fully connected (LSTM-FC) neural network for PM2.5 concentration
20. Qin, D.; Yu, J.; Zou, G.; Yong, R.; Zhao, Q.; Zhang, B. A novel combined prediction scheme based on CNN and LSTM for urban
21. Zhou, Y.; Chang, F.J.; Chang, L.C.; Kao, I.F.; Wang, Y.S. Explore a deep learning multi-output neural network for regional multi-step-ahead air quality forecasts. Journal of Cleaner Production 2019, 209, 134–145.
22. Wu, X.; Wang, Y.; He, S.; Wu, Z. PM2.5/PM10 ratio prediction based on a long short-term memory neural network in Wuhan,
23. Zhang, B.; Zhang, H.; Zhao, G.; Lian, J. Constructing a PM2.5 concentration prediction model by combining auto-encoder with Bi-LSTM neural networks. Environmental Modelling & Software 2020, 124, 104600.
24. Li, T.; Hua, M.; Wu, X. A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5). IEEE Access 2020, 8, 26933–26940.
25. Pak, U.; Ma, J.; Ryu, U.; Ryom, K.; Juhyok, U.; Pak, K.; Pak, C. Deep learning-based PM2.5 prediction considering the spatiotemporal correlations: A case study of Beijing, China. Science of The Total Environment 2020, 699, 133561.
26. Qiao, W.; Wang, Y.; Zhang, J.; Tian, W.; Tian, Y.; Yang, Q. An innovative coupled model in view of wavelet transform for predicting short-term PM10 concentration. Journal of Environmental Management 2021, 289, 112438.
27. Zhang, L.; Na, J.; Zhu, J.; Shi, Z.; Zou, C.; Yang, L. Spatiotemporal causal convolutional network for forecasting hourly PM2.5 concentrations in Beijing, China. Computers & Geosciences 2021, 155, 104869.
28. Zhang, B.; Zou, G.; Qin, D.; Lu, Y.; Jin, Y.; Wang, H. A novel Encoder-Decoder model based on read-first LSTM for air pollutant
29. Yan, R.; Liao, J.; Yang, J.; Sun, W.; Nong, M.; Li, F. Multi-hour and multi-site air quality index forecasting in Beijing using CNN, LSTM, CNN-LSTM, and spatiotemporal clustering. Expert Systems with Applications 2021, 169, 114513.
30. Mao, W.; Wang, W.; Jiao, L.; Zhao, S.; Liu, A. Modeling air quality prediction using a deep learning approach: Method optimization and evaluation. Sustainable Cities and Society 2021, 65, 102567.
31. Du, M.; Chen, Y.; Liu, Y.; Yin, H. A Novel Hybrid Method to Predict PM2.5 Concentration Based on the SWT-QPSO-LSTM
32. Hu, K.; Guo, X.; Gong, X.; Wang, X.; Liang, J.; Li, D. Air quality prediction using spatio-temporal deep learning. Atmospheric
33. Castelvecchi, D. Can we open the black box of AI? Nature News 2016, 538, 20.
34. Guo, T.; Lin, T.; Antulov-Fantulin, N. Exploring interpretable LSTM neural networks over multi-variable data. In Proceedings of the International Conference on Machine Learning. PMLR, 2019, pp. 2494–2504.
35. Hedworth, H.A.; Sayahi, T.; Kelly, K.E.; Saad, T. The effectiveness of drones in measuring particulate matter. Journal of Aerosol
36. Kaivonen, S.; Ngai, E.C.H. Real-time air pollution monitoring with sensors on city bus. Digital Communications and Networks 2020, 6, 23–30.
37. Li, X.; Peng, L.; Yao, X.; Cui, S.; Hu, Y.; You, C.; Chi, T. Long short-term memory neural network for air pollutant concentration predictions: Method development and evaluation. Environmental Pollution 2017, 231, 997–1004.
38. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Computation 2019, 31, 1235–1270. https://doi.org/10.1162/neco_a_01199.
40. Global Modeling and Assimilation Office (GMAO), Goddard Earth Sciences Data and Information Services Center (GES DISC). MERRA-2 instU_2d_lfo_Nx: 2d, 2d,diurnal,Instantaneous,Single-Level,Assimilation,Land Surface Forcings V5.12.4, 2023. Accessed 10.01.23, https://doi.org/10.5067/FC3BVJ88Y8A2.
41. Kosmopoulos, G.; Salamalikis, V.; Matrali, A.; Pandis, S.N.; Kazantzidis, A. Insights about the Sources of PM2.5 in an Urban Area from Measurements of a Low-Cost Sensor Network. Atmosphere 2022, 13. https://doi.org/10.3390/atmos13030440.
42. Manousakas, M.; Papaefthymiou, H.; Diapouli, E.; Migliori, A.; Karydas, A.; Bogdanovic-Radovic, I.; Eleftheriadis, K. Assessment of PM2.5 sources and their corresponding level of uncertainty in a coastal urban area using EPA PMF 5.0 enhanced diagnostics. Science of the Total Environment 2017, 574, 155–164.
43. Kostenidou, E.; Florou, K.; Kaltsonoudis, C.; Tsiflikiotou, M.; Vratolis, S.; Eleftheriadis, K.; Pandis, S.N. Sources and chemical characterization of organic aerosol during the summer in the eastern Mediterranean. Atmospheric Chemistry and Physics 2015, 15, 11355–11371. https://doi.org/10.5194/acp-15-11355-2015.
44. Manousakas, M.; Diapouli, E.; Papaefthymiou, H.; Kantarelou, V.; Zarkadas, C.; Kalogridis, A.C.; Karydas, A.G.; Eleftheriadis, K. XRF characterization and source apportionment of PM10 samples collected in a coastal city. X-Ray Spectrometry 2018, 47, 190–200.
45. Matthaios, V.N.; Triantafyllou, A.G.; Koutrakis, P. PM10 episodes in Greece: Local sources versus long-range transport—observations and model simulations. Journal of the Air & Waste Management Association 2017, 67, 105–126, [https://doi.org/10.1080/109
46. Manousakas, M.I.; Florou, K.; Pandis, S.N. Source Apportionment of Fine Organic and Inorganic Atmospheric Aerosol in an Urban Background Area in Greece. Atmosphere 2020, 11. https://doi.org/10.3390/atmos11040330.
47. Florou, K.; Papanastasiou, D.K.; Pikridas, M.; Kaltsonoudis, C.; Louvaris, E.; Gkatzelis, G.I.; Patoulias, D.; Mihalopoulos, N.; Pandis, S.N. The contribution of wood burning and other pollution sources to wintertime organic aerosol levels in two Greek cities. Atmospheric Chemistry and Physics 2017, 17, 3145–3163. https://doi.org/10.5194/acp-17-3145-2017.
50. Kosmopoulos, G.; Salamalikis, V.; Pandis, S.; Yannopoulos, P.; Bloutsos, A.; Kazantzidis, A. Low-cost sensors for measuring airborne particulate matter: Field evaluation and calibration at a South-Eastern European site. Science of The Total Environment 2020, 748, 141396.
51. Holder, A.L.; Mebust, A.K.; Maghran, L.A.; McGown, M.R.; Stewart, K.E.; Vallano, D.M.; Elleman, R.A.; Baker, K.R. Field evaluation of low-cost particulate matter sensors for measuring wildfire smoke. Sensors 2020, 20, 4796.
52. Barkjohn, K.K.; Gantt, B.; Clements, A.L. Development and application of a United States-wide correction for PM2.5 data collected with the PurpleAir sensor. Atmospheric Measurement Techniques 2021, 14, 4617–4637. https://doi.org/10.5194/amt-14-4617-2021.
53. Ardon-Dryer, K.; Dryer, Y.; Williams, J.N.; Moghimi, N. Measurements of PM2.5 with PurpleAir under atmospheric conditions. Atmospheric Measurement Techniques 2020, 13, 5441–5458.
55. Stavroulas, I.; Grivas, G.; Michalopoulos, P.; Liakakou, E.; Bougiatioti, A.; Kalkavouras, P.; Fameli, K.M.; Hatzianastassiou, N.; Mihalopoulos, N.; Gerasopoulos, E. Field evaluation of low-cost PM sensors (Purple Air PA-II) under variable urban air quality conditions, in Greece. Atmosphere 2020, 11, 926.
56. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 2020, 17, 261–272. https://doi.org/10.1038/s41592-019-0686-2.
57. Kowalski, C.J. On the effects of non-normality on the distribution of the sample product-moment correlation coefficient. Journal of the Royal Statistical Society: Series C (Applied Statistics) 1972, 21, 1–12.
58. Alolayan, M.A.; Brown, K.W.; Evans, J.S.; Bouhamra, W.S.; Koutrakis, P. Source apportionment of fine particles in Kuwait City. Science of the Total Environment 2013, 448, 14–25.
59. Buck, A.L. New Equations for Computing Vapor Pressure and Enhancement Factor. Journal of Applied Meteorology and Climatology 1981, 20, 1527–1532. https://doi.org/10.1175/1520-0450(1981)020<1527:NEFCVP>2.0.CO;2.