Air Quality Prediction Using Machine Learning Algorithms - A Review
Abstract-Predicting air quality is a necessary step for governments to take, as air quality is becoming a major concern for human health. The Air Quality Index measures the quality of air. The pollutants that cause air pollution include carbon dioxide, nitrogen dioxide and carbon monoxide, which are released by burning natural gas, coal and wood and by industries and vehicles. Air pollution can cause severe diseases such as lung cancer and brain disease, and can even lead to death. Machine learning algorithms help in determining the air quality index. Much research is being done in this field, but the results are still not accurate. Datasets are available from Kaggle and from air quality monitoring sites, and are divided into two parts, training and testing. The machine learning algorithms employed for this task are Linear Regression, Decision Tree, Random Forest, Artificial Neural Network and Support Vector Machine.

Keywords: Air Quality Index, Decision Tree, Support Vector Machine, Random Forest, Linear Regression

I. INTRODUCTION

Air pollution is dangerous to human health and should be reduced quickly in both urban and rural areas, so it is necessary to predict air quality accurately. There are many types of pollution, such as water, air and soil pollution, but the most important among these is air pollution, which should be controlled immediately because humans inhale oxygen from the air.

There are various causes of air pollution. Outdoor air pollution is caused by industries, factories and vehicles, while indoor air pollution arises when the air inside a house is contaminated by smoke, chemicals or smells. The pollutants that cause air pollution are of two types: primary pollutants and secondary pollutants.

Primary pollutants include:

Carbon dioxide (CO2): Carbon dioxide plays an important role in causing air pollution. It is also known as a greenhouse gas. Global warming, a major concern, is caused by the increase of carbon dioxide in the air. CO2 is exhaled by humans and is also released by burning fossil fuels.

Sulphur oxides (SOx): Sulphur dioxide (SO2) is released by burning coal and petroleum and by various industries. When it reacts in the presence of a catalyst (NO2), it forms H2SO4, causing acid rain, which is a major cause of air pollution.

Nitrogen oxides (NOx): Most commonly nitrogen dioxide (NO2), which is produced by thunderstorms and by rises in temperature.

Carbon monoxide (CO): Carbon monoxide is produced by burning coal and wood and is released by vehicles. It is an odorless, colorless, toxic gas. It forms smog in the air and is thus a primary pollutant.

Toxic metals: Examples are lead and mercury.

Chlorofluorocarbons (CFCs): Chlorofluorocarbons are released by air conditioners and refrigerators. They react with other gases and damage the ozone layer, so ultraviolet rays reach the earth's surface and harm human beings.

Garbage, sewage and industrial processes also cause air pollution.

Particles originating from dust storms, forests and volcanoes, in solid or liquid form, also cause air pollution.

Secondary pollutants include:

Ground-level ozone: It lies just above the earth's surface and forms when hydrocarbons react with nitrogen oxides in the presence of sunlight.

Acid rain: When sulphur dioxide reacts with nitrogen dioxide, oxygen and water in the air, thus
ISBN: 978-1-7281-8337-4/20/$31.00 ©2020 IEEE
Authorized licensed use limited to: Central Michigan University. Downloaded on May 14,2021 at 04:10:26 UTC from IEEE Xplore. Restrictions apply.
2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN)
Benzene also contributes to air pollution, and its concentration can be determined from its association with CO [7].

[Figure: machine learning process and prediction process]
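The association between benzene and CO mentioned above is typically examined with a correlation coefficient. The following is a minimal sketch of the Pearson correlation in plain Python, not the method of [7] itself. The function name `pearson_r` and both lists of readings are hypothetical and invented purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sensor readings, invented for illustration only.
co_mg_m3 = [1.0, 2.0, 3.0, 4.0, 5.0]
benzene_ug_m3 = [4.1, 8.3, 11.9, 16.2, 19.8]

r = pearson_r(co_mg_m3, benzene_ug_m3)
print(f"Pearson r between CO and benzene: {r:.3f}")
```

A value of r close to 1 would suggest that CO readings carry useful information for estimating benzene, while a value near 0 would suggest they do not.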
[4] Aditya C R et al. (2018) used autoregression to detect whether the air is polluted, and note that linear regression can be used to determine the level of PM2.5. The drawback is that the model cannot determine the PM2.5 level once atmospheric conditions change. It takes meteorological conditions such as wind speed and temperature into account.

[5] Zhang et al. (2018) proposed a wavelet neural network, regarded as a robust method for determining the level of air pollutants. Their work, however, does not determine the appropriate wavelet function or the proper number of hidden layers, which leads to inaccurate prediction of air pollutants.

[6] Timothy M. Amado and Jennifer C. Dela Cruz (2018) considered a neural network with integrated sensors. It gives accurate air pollutant levels but is slow because it uses the entire training set, and it cannot work with an incomplete data set.

[7] Arwa et al. (2018) determined benzene concentration using an Artificial Neural Network (ANN) and a Support Vector Machine, but the error between actual and predicted values is not predicted accurately. The association of benzene with CO can be determined using correlation.

[8] Kang et al. (2018) compared models including ANN, Random Forest, Decision Trees, the least squares support vector machine and the Deep Belief Network. The Deep Belief Network comes out superior because it takes hourly prediction of data into account. The drawback is that sensor data quality suffers from many issues caused by device faults.

[9] Mejía et al. (2018) obtained the best PM10 predictions with Random Forest. The model does not accurately predict the level of dangerous pollutants, but it can work with an incomplete data set.

[10] Lidia et al. (2015) developed the Airvlc model, which not only predicts the levels of CO, NO and PM2.5 through sensors but also warns the public about dangerous outdoor air pollutants.

[11] Nandini K and G Fathima (2019) used multinomial logistic regression and decision trees to predict air pollutant levels. Their work involves interpolation, prediction and feature analysis. The authors use K-means clustering to form three clusters, classified as high, medium and low air pollution levels, and the dataset is split into training and testing sets. After K-means clustering, multinomial logistic regression and decision trees are used to predict upcoming values. With multinomial logistic regression the actual values are much closer to the predicted values, so it is considered to give the more accurate result.

[12] Desislava Ivanova and Angel Elenkov (2019) used the Raspberry Pi platform with the multilayer perceptron machine learning algorithm to predict air pollutants accurately. The multilayer perceptron handles both classification, which is used for discrete values, and regression, which is used for continuous values. The authors use discrete values and train the multilayer perceptron with backpropagation, so the input is not passed to an activation function that results in 0 or 1; the output instead indicates how large the difference between the predicted and actual values is. The coefficient of determination (R2) obtained is good, but could be improved further by incremental feeding.

[13] J. Angelin Jebamalar and A. Sasi Kumar (2019) use a hybrid of a light tree and a light gradient boosting model. In the proposed method the PM2.5 level is captured by a sensor attached to a Raspberry Pi and stored in the cloud, and the hybrid model is then used for prediction. The hybrid model is compared with Linear Regression, Lasso Regression, Support Vector Regression, Neural Network, Random Forest, Decision Tree and XGBoost, and comes out best at detecting PM2.5, because boosting is a sequential ensemble in which each model corrects the errors of the previous one. It takes less space and can handle large amounts of data; its limitation is that it takes more time. Images could be used in future work to predict the PM2.5 level more accurately.

[14] Soubhik et al. (2019) compared algorithms including Linear regression, Neural network regression, Lasso regression, ElasticNet regression, Decision Forest, Extra Trees, Boosted decision tree, XGBoost, KNN and Ridge regression to predict air pollutant levels. Extra Trees gives the best accuracy, with features ranked in decreasing order of importance to predict upcoming values.

[15] Burhan BARAN (2019) proposed air quality prediction using the Extreme Learning Machine (ELM). ELM is a single-layer feed-forward artificial neural network with a fast learning speed. The activation functions used are sigmoidal, sine and hard-limit, and the sigmoidal function gives the best accuracy. A larger number of hidden neurons is used
and 10-fold cross-validation is used to calculate the air pollutant level using ELM.

[16] Pasupuleti et al. (2020) compare decision tree, linear regression and random forest. The major air pollutants and meteorological conditions are recorded using the Arduino platform. Random forest gives the most accurate result because it reduces overfitting, which lowers the error, but its drawbacks are higher memory use and high cost.

[17] Haotian Jing and Yingchun Wang (2020) predicted the air quality index using XGBoost. It combines weak classifiers, each addressing the shortcomings of the previous one, into a strong classifier, thereby reducing the error between predicted and actual values. It uses k-fold cross-validation. The mean absolute error and the coefficient of determination are used to measure the difference between actual and predicted values. The drawback is that it relies on previous values and is affected by outliers, i.e. unwanted pollutants in the air.

[18] Maryam Aljanabi, Mohammad Shkoukani and Mohammad Hijjawi (2020) predict ground-level ozone from temperature, humidity, wind speed and wind direction. The machine learning algorithms used are MLP, XGBoost, SVR and DTR. The data set is selected by area and then preprocessed: any fluctuation in the data is smoothed using Holt-Winters, moving average and Savitzky-Golay methods, and Savitzky-Golay was observed to give the best result. Feature selection is then performed using forward wrapper feature selection, since some unwanted features should not be taken into account when predicting pollutant levels. The algorithms listed above are then applied and compared by coefficient of determination, root mean square error and mean absolute error, and MLP comes out as the superior model. It predicts ozone on a day-to-day basis using MLP.

[19] Shu Wang et al. (2020) use a gated recurrent neural network to predict the air quality level and compare it with MLP and SVR. The gated recurrent neural network performs better because it is less affected by sensor drift, but it is more sensitive to variability and humidity in the atmosphere.

[20] Zhao et al. (2020) use the CERT model, formed by combining forward and recurrent neural networks, but it predicts the pollutant level day-wise rather than hourly, covers only Northwest China, and does not take meteorological data into account.

III. COMPARATIVE ANALYSIS OF RESULTS OBTAINED BY VARIOUS RESEARCHERS WITH DIFFERENT ALGORITHMS

Table 1. Results obtained by various researchers with different algorithms

| Ref. | Technique | Prediction performance | Pollutants | Areas |
|------|-----------|------------------------|------------|-------|
| [1] | Neural Network | Accuracy: 92.3% | SO2, NO2, O3, CO, PM2.5, PM10 | Republic of Macedonia |
| [2] | RNN | Accuracy: 80.27% | CO, NO2, O3, SO2, PM2.5, PM10 | Atlanta-Sandy Springs-Roswell |
| [4] | Logistic regression, Autoregression | Mean accuracy: 0.99859; standard deviation of accuracy: 0.000612 | Temperature, wind speed, dewpoint, pressure, PM2.5 | Populating and developing countries |
| [6] | Neural Network | Accuracy: 99.56 | Air temperature, relative humidity, MQ2, MQ135, MQ5 | General |
| [7] | Artificial Neural Network | MRE: -0.16 | Benzene concentration | General |
| [8] | Deep Belief Network | Error: 1.6% | PM2.5 | China |
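The studies reviewed here compare models using measures such as the coefficient of determination, the root mean square error and the mean absolute error [17], [18]. The following is a minimal sketch of how these three measures can be computed from actual and predicted values, using their standard definitions. The function names and the sample numbers are hypothetical, chosen only for illustration, and are not taken from any of the cited papers.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def r2(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical pollutant levels, invented for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(f"RMSE = {rmse(actual, predicted):.4f}")
print(f"MAE  = {mae(actual, predicted):.4f}")
print(f"R2   = {r2(actual, predicted):.4f}")
```

Lower RMSE and MAE and an R2 closer to 1 indicate predictions that lie closer to the actual values. RMSE penalises large individual errors more heavily than MAE, which is one reason studies often report both.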
RMSE can be calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

MAE can be calculated as:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and $n$ is the number of observations.

IV. CONCLUSION

Much research has been done, and different algorithms come out superior under different conditions, such as the pollutant chosen and the area selected. Various studies have used meteorological data such as temperature, wind speed and humidity to predict upcoming pollutant levels accurately. Neural network and boosting models come out superior to the other algorithms.

REFERENCES
[1]. Kostandina Veljanovska and Angel Dimoski, Air Quality Index Prediction Using Simple Machine Learning Algorithms, 2018, International Journal of Emerging Trends & Technology in Computer Science (IJETTCS).
[2]. Xiaosong Zhao, Rui Zhang, Jheng-Long Wu and Pei-Chann Chang, A Deep Recurrent Neural Network for Air Quality Classification, 2018, Journal of Information Hiding and Multimedia Signal Processing.
[3]. Savita Vivek Mohurle, Richa Purohit and Manisha Patil, A study of fuzzy clustering concept for measuring air pollution index, 2018, International Journal of Advanced Science and Research.
[4]. Aditya C R, Chandana R Deshmukh, Nayana D K and Praveen Gandhi Vidyavastu, Detection and Prediction of Air Pollution using Machine Learning Models, 2018, International Journal of Engineering Trends and Technology (IJETT).
[5]. Shan Zhang, Xiaoli Li, Yang Li and Jianxiang Mei, Prediction of Urban PM2.5 Concentration Based on Wavelet Neural Network, 2018, IEEE.
[6]. Timothy M. Amado and Jennifer C. Dela Cruz, Development of Machine Learning-based Predictive Models for Air Quality Monitoring and Characterization, 2018, IEEE.
[7]. Arwa Shawabkeh, Feda Al-Beqain, Ali Rodan and Maher Salem, Benzene Air Pollution Monitoring Model using ANN and SVM, 2018, IEEE.
[8]. Gaganjot Kaur Kang, Jerry Zeyu Gao, Sen Chiao, Shengqiang Lu and Gang Xie, Air Quality Prediction: Big Data and Machine Learning Approaches, 2018, International Journal of Environmental Science and Development.
[9]. Nicolás Mejía Martínez, Laura Melissa Montes, Ivan Mura and Juan Felipe Franco, Machine Learning Techniques for PM10 Levels Forecast in Bogotá, 2018, IEEE.
[10]. Lidia Contreras Ochando, Cristina I. Font Julian, Francisco Contreras Ochando and Cesar Ferri, Airvlc: An application for real-time forecasting urban air pollution, 2015, Proceedings of the 2nd International Workshop on Mining Urban Data, Lille, France.
[11]. Nandini K and G Fathima, Urban Air Quality Analysis and Prediction Using Machine Learning, 2019, IEEE.
[12]. Desislava Ivanova and Angel Elenkov, Intelligent System for Air Quality Monitoring Assessment using the Raspberry Pi Platform, 2019, IEEE.
[13]. J. Angelin Jebamalar and A. Sasi Kumar, PM2.5 Prediction using Machine Learning Hybrid Model for Smart Health, 2019, International Journal of Engineering and Advanced Technology (IJEAT).
[14]. Soubhik Mahanta, T. Ramakrishnudu, Rajat Raj Jha and Niraj Tailor, Urban Air Quality Prediction Using Regression Analysis, 2019, IEEE.
[15]. Burhan BARAN, Prediction of Air Quality Index by Extreme Learning Machines, 2019, IEEE.
[16]. Venkat Rao Pasupuleti, Uhasri, Pavan Kalyan, Srikanth and Hari Kiran Reddy, Air Quality Prediction Of Data Log By Machine Learning, 2020, IEEE.
[17]. Haotian Jing and Yingchun Wang, Research on Urban Air Quality Prediction Based on Ensemble Learning of XGBoost, 2020, E3S Web of Conferences.
[18]. Maryam Aljanabi, Mohammad Shkoukani and Mohammad Hijjawi, Ground-level Ozone Prediction Using Machine Learning Techniques: A Case Study in Amman, Jordan, 2020, International Journal of Automation and Computing.
[19]. Shu Wang, Yuhuang Hu, Javier Burgués, Santiago Marco and Shih-Chii Liu, Prediction of Gas Concentration Using Gated Recurrent Neural Networks, 2020, IEEE.
[20]. Zhili Zhao, Jian Qin, Zhaoshuang He, Huan Li, Yi Yang and Ruisheng Zhang, Combining forward with recurrent neural networks for hourly air quality prediction in Northwest of China, 2020, Environmental Science and Pollution Research.