Air Quality Prediction
Model evaluation
Evaluate the performance of the model on test data to determine how well it generalizes to new, unseen data. Metrics such as the mean squared error (MSE) can be used to measure the model's accuracy.
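As a minimal sketch of this evaluation step, assuming a trained model object named model and held-out arrays X_test and y_test (placeholder names, not taken from this work), the mean squared error can be computed as follows:

import numpy as np

# Assumed to exist: a trained model with a predict() method and a held-out test split.
y_pred = np.asarray(model.predict(X_test), dtype=float).reshape(-1)
y_true = np.asarray(y_test, dtype=float).reshape(-1)

# Mean squared error: average of squared differences between predictions and true values.
mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)  # same units as the pollutant concentration, easier to interpret

print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")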
Model deployment
Finally, deploy the trained model on the Arduino platform, where it can use the data from the sensors to make predictions in real time.
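One possible route for this deployment step, sketched here only as an illustration and not as the exact procedure used in this work, is to convert the trained Keras model into a TensorFlow Lite flatbuffer, which can then be embedded in the Arduino firmware (for example as a C array used with TensorFlow Lite for Microcontrollers). The model object below is a placeholder, and operator support for LSTM layers must be checked for the specific board:

import tensorflow as tf

# Assumed to exist: the trained Keras model from the training step.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Optional size optimisation for the small flash/RAM budget of a microcontroller.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the flatbuffer; it can be turned into a C array (e.g. with `xxd -i`)
# and compiled into the Arduino sketch.
with open("air_quality_model.tflite", "wb") as f:
    f.write(tflite_model)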
Figure 1: Device design

Figure 1 shows the collection of air-quality data with multiple parameters from the desired locations using sensors such as PM10, PM2.5, carbon monoxide, wind vane and wind speed, and temperature sensors, among others. The data collected from the multiple sensors is sent to the Arduino microcontroller, which forwards it to a local server such as a PC or laptop. The data must be pre-processed before it is passed to the training model; pre-processing includes cleaning, normalization, and formatting of the file.

Cleaning the data
It is common to have missing values in any dataset; this typically happens while collecting data from different sources. To overcome this problem, the rows with missing data are eliminated, and the remaining values are converted into numerical form so that the model can process them easily.

IV. ALGORITHM
LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) architecture designed to handle long-term dependencies between input sequences. LSTM models are useful for processing and predicting time-series data such as speech, text, and video.
At a high level, an LSTM consists of three main components: an input gate, a forget gate, and an output gate. Each of these gates is implemented using a sigmoid activation function and controls the flow of information through the LSTM.
The input gate determines which parts of the input sequence should be used to update the LSTM's internal state. The part of the internal state that should be forgotten is decided by the forget gate. The output gate determines which parts of the internal state should be used to produce the output sequence.
With these gates, the LSTM cell is able to carry information across time. The cell state is updated using a combination of the input gate and the forget gate, and can be thought of as the "memory" of the LSTM.
The equations that govern the behavior of an LSTM are as follows:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t * c_{t-1} + i_t * c̃_t
h_t = o_t * tanh(c_t)

Here, i_t, f_t, and o_t are the outputs of the input, forget, and output gates, the subscript t denotes the current time step, and h_{t-1} is the output of the previous time step. W_i, W_f, W_o, and W_c are the weight matrices for the input, forget, output, and candidate gates respectively; U_i, U_f, U_o, and U_c are the weight matrices for the recurrent connections; and b_i, b_f, b_o, and b_c are bias terms.
During training, the weights and biases of an LSTM are updated using backpropagation through time (BPTT), which involves calculating the gradients with respect to the parameters of the LSTM.
Overall, LSTMs are a powerful tool for processing and predicting time-series data, thanks to their capacity to capture long-term dependencies and remember important information across multiple time steps.
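To make the gate equations above concrete, the following is a minimal NumPy sketch of a single LSTM cell step; the weights, biases, and states are randomly initialised placeholders rather than trained parameters:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step following the gate equations above.
    W, U, b are dicts keyed by 'i', 'f', 'o', 'c'."""
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                          # updated cell state
    h_t = o_t * np.tanh(c_t)                                    # hidden state / output
    return h_t, c_t

# Toy dimensions: 5 sensor features per time step, 8 hidden units.
n_in, n_hidden = 5, 8
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(n_hidden, n_in)) for k in 'ifoc'}
U = {k: rng.normal(size=(n_hidden, n_hidden)) for k in 'ifoc'}
b = {k: np.zeros(n_hidden) for k in 'ifoc'}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
x = rng.normal(size=n_in)          # one time step of (scaled) sensor readings
h, c = lstm_step(x, h, c, W, U, b)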
V. MATHEMATICAL MODEL

Mathematical model of an LSTM algorithm
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) architecture that can be represented mathematically. At a high level, it consists of three gates: the input gate, the forget gate, and the output gate. Each gate is a sigmoid function that takes as input the previous hidden state and the current input (x_t), and outputs a value between 0 and 1. These gate values manage the flow of information through the different cells.

The input gate determines which information must be stored in the cell state (C_t). It is represented mathematically as:
i_t = σ(W_i * [h_{t-1}, x_t] + b_i)

The forget gate determines which information should be discarded from the cell state:
f_t = σ(W_f * [h_{t-1}, x_t] + b_f)

The information that leaves the cell is determined by the output gate:
o_t = σ(W_o * [h_{t-1}, x_t] + b_o)

The current input (x_t) is combined with the previous hidden state (h_{t-1}) to produce a candidate cell state (C̃_t), which is then combined with the previous cell state using the input gate and the forget gate. The updated cell state (C_t) is passed through the output gate to produce the current hidden state (h_t), which is the output of the LSTM cell. These operations are represented mathematically as:
C̃_t = tanh(W_c * [h_{t-1}, x_t] + b_c)
C_t = f_t * C_{t-1} + i_t * C̃_t
h_t = o_t * tanh(C_t)

The input gate (i_t) controls how much of the new input (x_t) should be added to the cell state (C_t) at each time step. It is defined as:
i_t = σ(W_i * [h_{t-1}, x_t] + b_i)
where W_i is a weight matrix, b_i is a bias vector, and the [h_{t-1}, x_t] notation indicates the concatenation of the previous hidden state (h_{t-1}) and the current input (x_t).

The forget gate (f_t) controls how much of the previous cell state (C_{t-1}) is retained at each time step. It is defined as:
f_t = σ(W_f * [h_{t-1}, x_t] + b_f)
where W_f is a weight matrix and b_f is a bias vector, with [h_{t-1}, x_t] again denoting the concatenation of the previous hidden state and the current input.

The output gate (o_t) controls how much of the current cell state (C_t) should be output at each time step. It is defined as:
o_t = σ(W_o * [h_{t-1}, x_t] + b_o)
where W_o is a weight matrix and b_o is a bias vector.

The cell state (C_t) is updated considering the input and forget gates together with the candidate cell state (C̃_t), which is calculated as:
C̃_t = tanh(W_c * [h_{t-1}, x_t] + b_c)
where W_c is a weight matrix and b_c is a bias vector. The updated cell state is then computed as:
C_t = f_t * C_{t-1} + i_t * C̃_t

The hidden state (h_t) is computed as:
h_t = o_t * tanh(C_t)
This output is used for further processing or as the output of the current time step.

In summary, the LSTM algorithm uses a set of mathematical equations to determine which information to store, forget, and output at each time step, depending on the previous hidden state and the current input. This allows it to capture long-term dependencies and to selectively remember or forget information over time, making it well suited to tasks such as language modelling and time-series analysis.

VI. RESULTS AND DISCUSSIONS
We plotted the actual values (first plot) and the predicted values (second plot). One can see visually that the two distributions are almost the same, which indicates that our predictions are very accurate.

Fig 6.1: Actual value graph
Fig 6.2: Predicted value graph
Computed model accuracy for CO
Fig 6.5: Accuracy of NO2
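A minimal sketch of how such actual-versus-predicted plots can be produced is given below; the two series are synthetic placeholders standing in for the real test outputs, which would come from the evaluation step:

import numpy as np
import matplotlib.pyplot as plt

# Placeholder series; in practice these come from y_test and model.predict(X_test).
rng = np.random.default_rng(1)
y_true = np.cumsum(rng.normal(size=200)) + 50.0
y_pred = y_true + rng.normal(scale=0.5, size=200)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
ax1.plot(y_true)
ax1.set_title("Actual values")
ax1.set_ylabel("Pollutant level")
ax2.plot(y_pred)
ax2.set_title("Predicted values")
ax2.set_xlabel("Time step")
ax2.set_ylabel("Pollutant level")
plt.tight_layout()
plt.show()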
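The per-pollutant accuracy figures referred to above can be summarised with standard regression metrics; the sketch below uses synthetic placeholder data for each pollutant, as the layout of the real test results is not given here:

import numpy as np

def regression_scores(y_true, y_pred):
    """Return MAE, RMSE and R^2 for one pollutant."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, r2

# Placeholder data standing in for the per-pollutant test results.
rng = np.random.default_rng(2)
for name in ("PM2.5", "CO", "NO2"):
    y_true = np.abs(rng.normal(loc=40.0, scale=10.0, size=300))
    y_pred = y_true + rng.normal(scale=2.0, size=300)
    mae, rmse, r2 = regression_scores(y_true, y_pred)
    print(f"{name}: MAE={mae:.2f}  RMSE={rmse:.2f}  R^2={r2:.3f}")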
VII. CONCLUSION
In conclusion, the use of machine learning and Arduino in air quality prediction has shown promising results in providing accurate and real-time measurements of air quality. Machine learning algorithms such as linear regression, decision trees, and neural networks have been used to estimate air quality parameters such as PM2.5, CO, and NO2 with high accuracy.
By integrating machine learning algorithms with an Arduino-based sensor network, air quality monitoring can be done in real time, providing continuous updates on the air quality status. This can be useful for both individuals and governments in taking the necessary actions to reduce pollution levels and promote public health.