Roshini Project
On
PREDICTION OF AIR POLLUTION USING MACHINE LEARNING
TECHNIQUES
Submitted by
ROSHINI MISHRA 20W91A05J8
R. PAVANI 20W91A05J3
S. MANISH 20W91A05L2
B. VENKATESH 20W91A05P5
Under the Esteemed Guidance of
Mr. J. VENKATESH
Asst. Professor, CSE
TO
Jawaharlal Nehru Technological University, Hyderabad
In partial fulfilment of the requirements for award of degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
2023-2024
MALLA REDDY
INSTITUTE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution - UGC, Govt. of India)
(Sponsored by Malla Reddy Educational Society)
Approved by AICTE, New Delhi, Recognized Under 2(1) & 12(B)
Affiliated to JNTU, Hyderabad, Accredited by NBA & NAAC with ‘A’ Grade
BONAFIDE CERTIFICATE
This is to certify that the industrial oriented mini project report titled
"PREDICTION OF AIR POLLUTION USING MACHINE LEARNING TECHNIQUES" is the
bonafide work submitted by ROSHINI MISHRA (20W91A05J8), R. PAVANI (20W91A05J3),
S. MANISH (20W91A05L2), B. VENKATESH (20W91A05P5) of B. Tech in partial
fulfilment of the requirements for the degree of Bachelor of Technology in
Computer Science and Engineering, and that it has not been submitted for the
award of any other degree of this institution.
I hereby declare that the Mini Project report entitled “PREDICTION OF AIR
POLLUTION USING MACHINE LEARNING TECHNIQUES” submitted to Malla
Reddy Institute of Engineering and Technology (Autonomous), affiliated to Jawaharlal
Nehru Technological University Hyderabad (JNTUH), for the award of the degree of
Bachelor of Technology in Computer Science & Engineering is a result of original
industrial oriented mini project done by me.
It is further declared that the seminar report or any part thereof has not been previously
submitted to any University or Institute for the award of degree or diploma.
R. PAVANI 20W91A05J3
S. MANISH 20W91A05L2
B. VENKATESH 20W91A05P5
ACKNOWLEDGEMENT
First and foremost, I am grateful to the Principal Dr. P. SRINIVAS, for providing me
with all the resources in the college to make my project a success. I thank him for his
valuable suggestions at the time of seminars which encouraged me to give my best in
the project.
I would like to express my gratitude to Dr. MD. ASHFAQUL HASSAN, Head of the
Department, Department of Computer Science and Engineering for his support and
valuable suggestions during the dissertation work.
I would also like to thank all the supporting staff of the Dept. of CSE and all other
departments who have been helpful directly or indirectly in making the project a
success.
I am extremely grateful to my parents for their blessings and prayers for my completion
of project that gave me strength to do my project.
R. PAVANI 20W91A05J3
S. MANISH 20W91A05L2
B. VENKATESH 20W91A05P5
CONTENTS
Abstract
List of Figures
List of Tables
List of Screens
Symbols & Abbreviations
1. INTRODUCTION
1.1 Motivation
1.2 Problem definition
1.3 Objective of Project
1.4 Limitations of Project
1.5 Organization of Documentation
3. ANALYSIS
3.1 Introduction
3.2 Software Requirement Specification
3.2.1 User requirement
3.2.2 Software requirement
3.2.3 Hardware requirement
3.3 Content diagram of Project
3.4 Algorithms and Flowcharts
3.4.1 Algorithms used in the project
3.4.1.1 StepWise Multiple Linear Regression Algorithm
3.4.1.2 KNN Algorithm
3.4.1.3 Instance based Linear Regression Algorithm
3.4.1.4 Decision Tree Classifiers
3.4.1.5 SVM
3.4.2 Flowchart
3.5 Conclusion
4. DESIGN
4.1 Introduction
4.2 UML diagrams
4.2.1 Use case diagram
4.2.2 Class diagram
4.2.3 Sequence diagram
4.2.4 Data Flow diagram
4.3 Module design and organization
4.4 Conclusion
7. CONCLUSION
8. FUTURE ENHANCEMENT
9. REFERENCES
ABSTRACT
The escalating concern over air pollution's adverse effects on public health and the
environment has spurred the development of predictive models leveraging machine learning
techniques. This study aims to forecast air pollution levels by employing
various machine learning algorithms on comprehensive datasets. Through the integration of
meteorological data, geographical factors, and historical pollution records, these models
facilitate accurate predictions of pollutant concentrations. Supervised learning algorithms like
Random Forest, Support Vector Machines, and Neural Networks are utilized to analyze
complex interactions among diverse variables. Feature selection methods optimize model
performance by identifying the most influential factors impacting air quality. Evaluating these
models against real-time monitoring data ensures their reliability and effectiveness in
forecasting pollution levels. Such predictive frameworks hold immense potential in providing
early warnings, aiding policymakers, and empowering communities to mitigate the
detrimental impacts of air pollution on public health and the environment.
LIST OF FIGURES
LIST OF TABLES
LIST OF SCREENSHOTS
21 Screen-21 Line Chart 49
SYMBOLS AND ABBREVIATIONS
• PM2.5: Particulate Matter with a diameter of 2.5 micrometers or smaller, a common
air pollutant.
• NO2: Nitrogen Dioxide, a harmful gas emitted from burning fuels.
• SO2: Sulfur Dioxide, a gas produced by burning fossil fuels containing sulfur.
• CO: Carbon Monoxide, a colorless, odorless gas produced by incomplete combustion.
• O3: Ozone, a reactive gas forming from chemical reactions between sunlight and
pollutants.
• ML: Machine Learning, the field of study that focuses on developing algorithms and
models that enable computers to learn and make predictions from data.
• LR: Linear Regression, a statistical method used to model the relationship between
variables.
• KNN: k-Nearest Neighbors, a machine learning algorithm used for classification and
regression tasks based on similarity measures between data points.
• R-squared (R²): A statistical measure that represents the proportion of the variance
for a dependent variable that's explained by an independent variable or variables in a
regression model.
• RMSE: Root Mean Squared Error, a measure of the differences between values
predicted by a model and the observed values.
• MAE: Mean Absolute Error, a measure of the average magnitude of errors between
predicted and actual values.
• PCA: Principal Component Analysis, a technique used to simplify complex datasets
by reducing dimensions.
Air Pollution Prediction using ML Techniques
1. INTRODUCTION
In recent years, concerns regarding the detrimental impact of air pollution on public
health and the environment have surged, necessitating innovative approaches for its
prediction and management. The utilization of Machine Learning (ML) techniques has
emerged as a promising solution to forecast and mitigate air pollution levels. By
harnessing the power of ML algorithms, such as neural networks, decision trees, and
support vector machines, predictive models can be developed to analyze complex
interactions among various environmental factors contributing to air pollution.
These models leverage historical and real-time data from diverse sources, including
meteorological conditions, industrial emissions, vehicular traffic, and geographical
characteristics, to anticipate pollutant concentrations accurately. The fusion of
advanced ML methodologies with vast datasets enables the identification of patterns
and trends, aiding in the proactive assessment and control of air quality. The pursuit of
accurate predictive models using ML stands as a pivotal step towards formulating
proactive strategies and policies to combat air pollution, fostering healthier and
sustainable living environments for communities worldwide.
1.1 MOTIVATION
The problem involves leveraging machine learning techniques to predict air pollution
levels accurately. This includes collecting and analyzing various environmental and
meteorological data sets such as particulate matter, ozone, weather conditions, and
The objective of the project "Prediction of Air Pollution using Machine Learning
Techniques" is to develop predictive models leveraging machine learning algorithms to
forecast air pollution levels. By analyzing historical data and real-time inputs from
various environmental sensors, the project aims to create accurate predictions of air
quality parameters. The focus is on enhancing environmental monitoring and enabling
timely interventions to mitigate pollution, contributing to public health improvements
and fostering sustainable environmental practices.
Chapter 1- Introduction:
This chapter gives an overview of the project: the major problem being addressed,
the objectives, the methodology, and an outline of the remaining part of
the report.
Chapter 2- Literature Survey:
This chapter reviews previous work on the prediction of air pollution using ML
techniques and its limitations.
Chapter 3- Analysis:
System analysis describes the existing system and the proposed
system of the project.
Chapter 4- Design:
System design describes the project modules and the data flow
diagrams of the project.
Chapter 5- Implementation:
Implementation describes the concepts behind the project in detail.
It also describes the algorithms with detailed steps.
Chapter 6- Testing:
Testing describes unit testing, validation testing, functional
testing, integration testing, and user acceptance testing.
Chapter 7- Conclusion:
This chapter gives a brief summary of the project and of the outcomes
that remain uncertain.
Chapter 8- Future Enhancement:
This chapter describes how the project can be developed further and which changes could be added.
2. LITERATURE SURVEY
A literature survey, or literature review, in a project report presents the analyses
and research carried out in the field of interest and the results already published, taking into
account the various parameters and the extent of the project. A literature
survey is mainly carried out to analyze the background of the current project,
which helps to find flaws in the existing system and indicates which unsolved
problems can be worked on. The following topics therefore not only illustrate the background
of the project but also uncover the problems and flaws that motivated us to propose
solutions and work on this project.
2.1 INTRODUCTION
The conventional methods of monitoring air quality rely on limited and often sparse
sensor networks, which might not capture the complexity and variability of pollution
across diverse environments. In recent years, the integration of machine learning
techniques in environmental studies has shown promising results in predicting air
pollution levels. Machine learning algorithms offer the potential to analyze vast
amounts of data from various sources, including satellite imagery, meteorological
factors, and historical pollution records, to generate accurate predictive models.
This literature survey aims to explore the current landscape of machine learning
applications in predicting air pollution. It will delve into the different methodologies,
models, and datasets utilized in prior research efforts. By synthesizing existing
knowledge, identifying gaps, and highlighting successful approaches, this study seeks
to contribute to the advancement of predictive models for air quality assessment. The
ultimate goal is to develop robust and efficient prediction frameworks that can aid
policymakers, urban planners, and public health authorities in mitigating the adverse
impacts of air pollution on society and the environment.
The project aims to forecast air pollution levels leveraging machine learning
techniques. Utilizing historical air quality, weather, geographic, and environmental
data, supervised learning methods like linear regression are employed. These models
integrate pollutant concentrations, meteorological parameters, temporal, and spatial
factors to predict air quality indices or specific pollutant levels. The goal is to develop
a robust predictive model that anticipates pollution levels, aiding in proactive measures
and interventions for managing and mitigating air quality issues.
• Limited Complexity: It can only model linear relationships and fails to capture
intricate nonlinear patterns often present in pollution data.
Utilizing linear regression offers several advantages in predicting air pollution using
machine learning techniques. It allows for the establishment of a straightforward
relationship between various environmental factors and pollution levels, aiding in the
accurate estimation of future pollution levels. Its simplicity facilitates quick model
development and interpretation, providing insights into the impact of predictor
variables on pollution.
• Data accuracy investigation: We have to check whether the chosen model fits
the overall data. To do so, we cross-check the root mean squared error and the
absolute percentage error, and then judge whether these values indicate
acceptable accuracy.
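The accuracy check described in this bullet can be sketched as follows. This is a minimal illustration, assuming that "root mean error" refers to the root mean squared error (RMSE) and that "absolute percentage error" refers to the mean absolute percentage error (MAPE). The AQI readings and the helper names `rmse` and `mape` are hypothetical and are not taken from the project code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical AQI observations and model outputs, for illustration only.
observed_aqi = [100.0, 200.0, 50.0, 80.0]
predicted_aqi = [110.0, 190.0, 55.0, 80.0]

rmse_value = rmse(observed_aqi, predicted_aqi)
mape_value = mape(observed_aqi, predicted_aqi)
print(f"RMSE = {rmse_value:.2f}, MAPE = {mape_value:.2f}%")
```

Lower values of both metrics indicate a better fit. What counts as acceptable depends on the scale of the pollutant being predicted.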
2.5 CONCLUSION
In the realm of predicting air pollution through machine learning techniques, linear
regression serves as a fundamental yet effective tool. By analyzing historical data on
various atmospheric parameters, this method facilitates the prediction of pollutant
levels. However, while linear regression provides a foundational understanding, it
might lack the complexity to handle intricate environmental interactions. Its predictive
accuracy can be influenced by the variability and interdependencies within atmospheric
components. Hence, for comprehensive and precise air pollution forecasts, integrating
advanced machine learning models capable of capturing nonlinear relationships and
intricate patterns is crucial. Ultimately, combining linear regression with more
sophisticated techniques can enhance the precision and reliability of air pollution
predictions.
3. ANALYSIS
information system can be analyzed, modeled, and a logical alternative can be chosen.
Systems analysis projects are initiated for three reasons: problems, opportunities, and
directives.
3.1 INTRODUCTION
This chapter analyses the product and resource requirements of a
successful system. The product requirements cover inputs and outputs: they
state what input is needed to produce the required output. The resource
requirements briefly describe the software and hardware needed to achieve
A data set with all parameters is needed. This data set is loaded into the code to
obtain the expected output. The code has to be built in such a way
that each value can be predicted separately.
Regression is a statistical method for determining the relationship between features and
an outcome variable or result. In machine learning, it is used as a method for predictive
modeling, in which an algorithm is employed to forecast continuous outcomes.
Multiple linear regression, often known as multiple regression, is a statistical method
that predicts the result of a response variable by combining numerous explanatory
variables. Multiple regression is an extension of simple linear regression (ordinary least squares),
in which just one explanatory variable is used.
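For reference, the multiple regression model and its ordinary least squares fit can be written as follows. The symbols are the conventional ones and are an assumption, since the report does not define its own notation: y is the response, x_1 to x_p are the explanatory variables, the beta terms are the coefficients, epsilon is the error term, and n is the number of observations.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon,
\qquad
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}
  \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
```

Setting p = 1 recovers simple linear regression with a single explanatory variable.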
Stepwise Implementation
The necessary packages, such as pandas, NumPy and sklearn, are imported.
The CSV file is imported using the pd.read_csv() method. The ‘No’ column is
dropped because an index is already present. The df.head() method is
used to retrieve the first five rows of the dataframe. The df.columns attribute returns the
names of the columns. The column names starting with ‘X’ are the independent features
in our dataset. The column ‘Y house price of unit area’ is the dependent variable.
A scatterplot is created to visualize the relation between the ‘X4 number of convenience
stores’ independent variable and the ‘Y house price of unit area’ dependent variable.
To model the data we need to create feature variables: the X variable contains the independent
variables and the y variable contains the dependent variable. The X and y feature variables are
printed to inspect the data.
Here, the train_test_split() method is used to create train and test sets; the feature variables
are passed to the method. The test size is given as 0.3, which means 30% of the data goes
into the test set and the train set contains 70% of the data. The random state is given for
reproducibility.
After the model is created, it is fitted to the training data with the fit() method. The model
learns the statistics of the training data.
The model.predict() method is used to make predictions on the X_test data. The test
data is unseen data, so the model has no knowledge of the statistics of the test set.
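The split, fit and predict steps described above can be sketched with scikit-learn as follows. This is an illustrative sketch, not the project's own code: the data is synthetic (generated with NumPy) rather than read from the CSV file, and the names `true_coefficients` and `predictions` as well as the random_state value are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 samples with 3 independent features.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 3))
true_coefficients = np.array([2.0, -1.0, 0.5])
y = X @ true_coefficients + 4.0 + rng.normal(0.0, 0.1, size=200)

# 30% of the rows go to the test set; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

model = LinearRegression()
model.fit(X_train, y_train)          # learn coefficients from the training rows only
predictions = model.predict(X_test)  # predict on rows the model has not seen

print("train rows:", X_train.shape[0], "test rows:", X_test.shape[0])
print("coefficients:", model.coef_.round(2), "intercept:", round(model.intercept_, 2))
```

Because the model is fitted on the training rows only, the error measured on the test rows estimates how the model behaves on new data.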
sum of residuals. mean_absolute_error is the mean of the absolute errors of the model.
The lower the error, the better the model performs.
Mean absolute error = the mean of the absolute values of the residuals.
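As a small check of these definitions, the sketch below computes the mean absolute error and the mean squared error by hand from the residuals and compares them with scikit-learn's `mean_absolute_error` and `mean_squared_error`. The four value pairs are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical observed and predicted values, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Residual = observed value minus predicted value: 0.5, 0.0, -2.0, -1.0
residuals = [t - p for t, p in zip(y_true, y_pred)]

manual_mae = sum(abs(r) for r in residuals) / len(residuals)  # mean of |residual|
manual_mse = sum(r * r for r in residuals) / len(residuals)   # mean of residual squared

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(manual_mae, mae)
print(manual_mse, mse)
```

Squaring penalises the single large residual (-2.0) more heavily, which is why the mean squared error is larger than the mean absolute error here.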
Example
Data Collection: Gather historical data on air quality parameters (like particulate
matter, ozone levels, weather conditions, etc.) from various sources.
Prediction: Using the trained linear regression model, predict air pollution levels in
the test dataset based on the provided input parameters.
Evaluation: Evaluate the model's performance using metrics like Mean Squared Error
(MSE), Root Mean Squared Error (RMSE), or R-squared to measure how well the
model predicts air pollution levels.
Deployment: Deploy the trained model into production to make real-time predictions
on air pollution levels based on incoming data.
Pointers
By iteratively refining the model and improving its accuracy, this approach helps in
creating a reliable predictive system for assessing air pollution levels.
Decision tree classifiers are used successfully in many diverse areas. Their most
important feature is the capability of capturing descriptive decision-making knowledge
from the supplied data. A decision tree can be generated from training sets. The procedure
for such generation, based on a set of objects (S), each belonging to one of the classes
C1, C2, …, Ck, is as follows:
Step 1. If all the objects in S belong to the same class, for example Ci, the decision tree
for S consists of a leaf labeled with this class
Step 2. Otherwise, let T be some test with possible outcomes O1, O2,…, On. Each
object in S has one outcome for T so the test partitions S into subsets S1, S2,… Sn
where each object in Si has outcome Oi for T. T becomes the root of the decision tree
and for each outcome Oi we build a subsidiary decision tree by invoking the same
procedure recursively on the set Si.
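The two steps above can be sketched as a short recursive function. This is a minimal illustration under stated assumptions: the test T is simply the next unused attribute (the text leaves the choice of test open, and practical learners pick it with a measure such as information gain), a majority-class leaf is used when no attributes remain, and the training set, attribute names and helper names `build_tree` and `classify` are hypothetical.

```python
from collections import Counter


def build_tree(samples, attributes):
    """Recursively build a decision tree.

    samples: list of (features_dict, class_label) pairs, the set S.
    attributes: attribute names still available as tests T.
    """
    labels = [label for _, label in samples]
    # Step 1: every object in S has the same class, so S becomes a leaf.
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    # Assumption (not in the text): with no tests left, use the majority class.
    if not attributes:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    # Step 2: take a test T (here the next attribute) and partition S by outcome.
    test, remaining = attributes[0], attributes[1:]
    partitions = {}
    for features, label in samples:
        partitions.setdefault(features[test], []).append((features, label))
    # T becomes the root; each outcome O_i gets a subtree built from S_i.
    return {
        "test": test,
        "branches": {
            outcome: build_tree(subset, remaining)
            for outcome, subset in partitions.items()
        },
    }


def classify(tree, features):
    """Follow the branches matching `features` until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["branches"][features[tree["test"]]]
    return tree["leaf"]


# Hypothetical training set: categorical readings mapped to an air quality class.
training_set = [
    ({"pm25": "high", "wind": "calm"}, "poor"),
    ({"pm25": "high", "wind": "strong"}, "moderate"),
    ({"pm25": "low", "wind": "calm"}, "good"),
    ({"pm25": "low", "wind": "strong"}, "good"),
]
tree = build_tree(training_set, ["pm25", "wind"])
print(classify(tree, {"pm25": "high", "wind": "calm"}))
```

In this example the "low" branch stops at Step 1 because both of its objects share the class "good", while the "high" branch needs a second test on wind.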
3.4.2 FLOWCHART:
[Flowchart: Remote User. Nodes: Start, Login, Status (Yes/No), Logout]
[Flowchart: Service Provider. Nodes: Start, Login, Status (Yes/No), Username & Password Wrong, Train Data Sets and View Child Birth Prediction, View Predicted Air Quality/Pollution Details, Log Out]
3.5 CONCLUSION
The accuracy of our model is acceptable: the predicted AQI is about 96% accurate.
Future enhancements include expanding the coverage to as many
regions as possible. At present, the project targets predicting
the AQI values of different areas in and around New Delhi. Further, by using
data from other cities, the scope of the project can be extended to
predict the AQI for those cities as well.
4. DESIGN
Software Design is the process to transform the user requirements into some suitable
form, which helps the programmer in software coding and implementation. During the
software design phase, the design document is produced, based on the customer
requirements as documented in the SRS document.
4.1 INTRODUCTION
The "Design of Prediction of Air Pollution Using ML Techniques" project aims to
leverage machine learning algorithms to forecast and monitor air quality parameters.
By gathering vast datasets of environmental factors and pollutant levels, this project
employs predictive models to anticipate pollution levels accurately. Machine learning
techniques such as regression, neural networks, and clustering algorithms are employed
to analyze historical data and predict future pollution trends. The ultimate goal is to
develop a robust system capable of providing real-time predictions and aiding in
proactive measures to mitigate air pollution's detrimental effects on public health and
the environment.
The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form UML comprises two major
components: a meta-model and a notation. In the future, some form of method or
process may also be added to, or associated with, UML.
The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software and of the
software development process. The UML uses mostly graphical notations to express
the design of software projects.
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a
graphical overview of the functionality provided by a system in terms of actors, their
goals (represented as use cases), and any dependencies between those use cases. The
main purpose of a use case diagram is to show what system functions are performed for
which actor. Roles of the actors in the system can be depicted.
A Data Flow Diagram (DFD) visually represents the flow of data within a
system, showing processes, data stores, and data movement. It uses standardized
symbols to illustrate how data enters, is processed, stored, and exits the system. DFDs
Present the metrics used to assess model performance (RMSE, MAE, R-squared, etc.).
Provide insights into the model's accuracy, precision, and potential areas for
improvement.
4.4 CONCLUSION
Utilizing machine learning techniques, the project aims to predict air pollution patterns
by analyzing extensive data sets. Through robust algorithms and predictive modeling,
this innovative approach endeavors to forecast air quality indices. By amalgamating
historical information, meteorological factors, and pollutant trends, it seeks to create a
predictive framework. The objective is to offer timely insights into pollution levels,
enabling proactive measures and policy interventions for healthier environments.
Ultimately, this project aspires to revolutionize pollution management by providing
accurate predictions, empowering communities and authorities to preemptively address
air quality concerns.
5.1 INTRODUCTION
Python
Programmers have to type relatively little, and the indentation requirement of the language
keeps programs readable.
Python is used by many large technology companies, such as Google,
Amazon, Facebook, Instagram, Dropbox and Uber.
The biggest strength of Python is its huge collection of standard and third-party
libraries, which can be used for the following:
• Machine Learning
Machine Learning
Machine learning algorithms create a mathematical model that, without being explicitly
programmed, aids in making predictions or decisions with the assistance of sample
historical data, or training data. For the purpose of developing predictive models,
machine learning brings together statistics and computer science. Machine learning
constructs or uses algorithms that learn from historical data. Performance generally
improves as more relevant training data is provided.
• Prediction – Once our model is ready, it can be fed a set of inputs to which it
will provide a predicted output(label).
learning Python but this tutorial will solve your query. This guide installs
Python 3.7.4, a Python 3 release.
Note: Python 3.7.4 cannot be used on Windows XP or earlier devices.
Before you start the installation process, you need to know your
system requirements. You must download the Python build that matches your system
type, i.e. your operating system and processor. My system type is a Windows 64-bit
operating system, so the steps below install Python 3.7.4 on a Windows
7 device. The steps to install Python on Windows 10, 8
and 7 are divided into 4 parts to make them easier to follow.
5.2.2 Download the Correct version into the system:
Step 1: Go to the official site to download and install Python using Google Chrome or
any other web browser, or open the following link: https://www.python.org
Screen-1
Now, check for the latest and the correct version for your operating system.
Screen-2
Step 3: You can either select the Download Python for Windows 3.7.4 button in
yellow, or scroll further down and click on the download link for the
required version. Here, we are downloading Python 3.7.4 for
Windows.
Screen-3
Step 4: Scroll down the page until you find the Files option.
Step 5: Here you see the different Python builds listed by operating system.
Screen-4
• To download Windows 32-bit python, you can select any one from the three
options: Windows x86 embeddable zip file, Windows x86 executable installer
or Windows x86 web-based installer.
• To download Windows 64-bit python, you can select any one from the three
options: Windows x86-64 embeddable zip file, Windows x86-64 executable
installer or Windows x86-64 web-based installer.
Here we will install the Windows x86-64 web-based installer. This completes the first part,
choosing which version of Python to download. Now we move
on to the second part, the installation.
Note: To see the changes or updates made in this version, you can click on
the Release Notes option.
5.2.3 Installation of Python:
Step 1: Go to Download and Open the downloaded python version to carry out the
installation process.
Screen-5
Step 2: Before you click on Install Now, Make sure to put a tick on Add Python 3.7
to PATH.
Screen-6
Step 3: Click on Install Now. After the installation is successful, click on Close.
Screen-7
With these above three steps on python installation, you have successfully and
correctly installed Python. Now is the time to verify the installation.
Note: The installation process might take a couple of minutes.
5.2.4 Verify the Python Installation:
Step 1: Click on Start
Step 2: In the Windows Run Command, type “cmd”.
Screen-8
Step 3: Open the Command prompt option.
Step 4: Let us test whether Python is correctly installed. Type python -V and
press Enter.
Screen-9
Step 5: You will get the answer Python 3.7.4.
Note: If you have any earlier version of Python already installed, you must first
uninstall the earlier version and then install the new one.
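The same check can also be made from inside Python, which is useful when several interpreters are installed. The sketch below is only an illustration: it reads the version of the interpreter that runs the script, and the 3.7 threshold mirrors the version installed in this guide (an assumption about what the project needs). The name `version_text` is hypothetical.

```python
import sys

# sys.version_info holds the running interpreter's version as a tuple-like object.
major, minor, micro = sys.version_info[:3]
version_text = f"Python {major}.{minor}.{micro}"
print(version_text)  # same form as the output of `python -V`

# Assumption: the project expects at least the 3.7 series installed above.
if (major, minor) < (3, 7):
    print("This interpreter is older than Python 3.7.")
```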
5.2.5 Check how the Python IDLE works:
Screen-10
Step 3: Click on IDLE (Python 3.7 64-bit) and launch the program
Step 4: To go ahead with working in IDLE you must first save the file. Click on File
> Click on Save
Screen-11
Step 5: Name the file and set the save-as type to Python files. Click on SAVE. Here
I have named the file Hey World.
5.2.6 Modules Used in Project :
➢ NumPy:
➢ Pandas:
majorly used for data munging and preparation. It had very little contribution
towards data analysis. Pandas solved this problem. Using Pandas, we can
accomplish five typical steps in the processing and analysis of data, regardless of
the origin of the data: load, prepare, manipulate, model, and analyze. Python with
Pandas is used in a wide range of academic and commercial
domains, including finance, economics, statistics and analytics.
➢ Matplotlib:
➢ Scikit – learn:
➢ Mysqlclient:
mysqlclient is a Python interface to the MySQL database, imported as the MySQLdb
module, and is the driver recommended for Django's MySQL backend. Similar libraries
include "mysql-connector-python" and "PyMySQL". It
enables connectivity, execution of SQL queries, and manipulation of data. Through
established connections, Python code can interact with MySQL databases, execute
commands, retrieve results, and handle transactions, aiding in tasks like
data retrieval, insertion, deletion, and modification within MySQL databases.
➢ Openpyxl:
openpyxl is a Python library for reading, writing, and manipulating Excel (.xlsx) files.
It provides functionalities to create, modify, and extract data from Excel spreadsheets
programmatically. With openpyxl, users can handle cell formatting, styles, formulas,
and various Excel sheet operations using Python code. The library supports working
with worksheets, columns, rows, and cells, enabling tasks like data insertion,
extraction, and manipulation. Its intuitive interface allows developers to automate
Excel-related tasks efficiently, making it a valuable tool for working with spreadsheet
data within Python applications.
➢ Xlwt:
xlwt is a Python library that facilitates the creation of Excel files (.xls) by allowing
users to generate worksheets, input data, and apply formatting to cells. It enables
developers to create Excel documents compatible with older versions of Excel (2003
and earlier) using straightforward Python commands, aiding in tasks such as data
export and report generation.
➢ Django Framework:
Main File:
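#!/usr/bin/env python
import os
import sys


def main():
    # Point Django at this project's settings module before running any command.
    os.environ.setdefault('DJANGO_SETTINGS_MODULE',
                          'prediction_of_air_pollution.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == '__main__':
    main()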
import os
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# See https://docs.djangoproject.com/en/3.0/howto/deployment/checklist/
SECRET_KEY = 'm헧旦5@u9u!b8-=4-4mq&o1%agco2xpl8c!7sn7!eowjk#'
DEBUG = True
ALLOWED_HOSTS = []
# Application definition
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    # Project apps
    'Remote_User',
    'Service_Provider',
]
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
ROOT_URLCONF = 'prediction_of_air_pollution.urls'
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        # Project-level HTML templates live under Template/htmls.
        'DIRS': [os.path.join(BASE_DIR, 'Template/htmls')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
WSGI_APPLICATION = 'prediction_of_air_pollution.wsgi.application'
# Database
# https://docs.djangoproject.com/en/3.0/ref/settings/#databases
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'prediction_of_air_pollution',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '127.0.0.1',
        'PORT': '3306',
    }
}
# Password validation
# https://docs.djangoproject.com/en/3.0/ref/settings/#auth-password-validators
AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME':
        'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME':
        'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME':
        'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]
# Internationalization
# https://docs.djangoproject.com/en/3.0/topics/i18n/
LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True
# https://docs.djangoproject.com/en/3.0/howto/static-files/
STATIC_URL = '/static/'
STATICFILES_DIRS = [os.path.join(BASE_DIR,'Template/images')]
MEDIA_URL = '/media/'
STATIC_ROOT = '/static/'
STATIC_URL = '/static/'
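The settings listing above hard-codes the database credentials. As a minimal sketch only, the same DATABASES dictionary could be built from environment variables, falling back to the values shown in the report. The function name `database_settings` and the `AIR_DB_*` variable names are hypothetical and are not part of the project.

```python
import os


def database_settings(env=os.environ):
    """Build a Django-style DATABASES dict from environment variables.

    The AIR_DB_* names are hypothetical; the defaults mirror the listing above.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.mysql",
            "NAME": env.get("AIR_DB_NAME", "prediction_of_air_pollution"),
            "USER": env.get("AIR_DB_USER", "root"),
            "PASSWORD": env.get("AIR_DB_PASSWORD", ""),
            "HOST": env.get("AIR_DB_HOST", "127.0.0.1"),
            "PORT": env.get("AIR_DB_PORT", "3306"),
        }
    }


# With no overrides, the result matches the hard-coded configuration.
print(database_settings({})["default"]["NAME"])
```

In a settings module this would be used as `DATABASES = database_settings()`, so that a password never has to be committed with the source.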
Screen-12
Input various pollutant levels (PM2.5, PM10, NO2, SO2, CO, O3) into the air quality
index (AQI) calculator. Obtain the AQI value indicating overall air quality status for
• 0-50: Good
• 51-100: Moderate
• 101-200: Unhealthy
Classify into categories: Good, Moderate, Unhealthy, Very Unhealthy, and Hazardous
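The classification step can be sketched as a simple range lookup. The first three ranges below are the ones listed above. The report does not state the boundaries for Very Unhealthy and Hazardous, so the 201-300 and above-300 cut-offs are assumptions, and `classify_aqi` is a hypothetical name used only for illustration.

```python
# Inclusive upper bound of each category, in ascending order.
AQI_CATEGORIES = [
    (50, "Good"),            # 0-50, from the report
    (100, "Moderate"),       # 51-100, from the report
    (200, "Unhealthy"),      # 101-200, from the report
    (300, "Very Unhealthy"), # ASSUMED range 201-300
]


def classify_aqi(aqi):
    """Return the category name for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper_bound, name in AQI_CATEGORIES:
        if aqi <= upper_bound:
            return name
    return "Hazardous"  # ASSUMED: everything above 300


for value in (42, 150, 320):
    print(value, classify_aqi(value))
```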
Screen-13
Screen-14
Individuals need to provide essential personal information such as name, address,
contact details, and demographic data. They may also need to submit data about their
location, air quality concerns, and any existing health conditions. This information
helps in tailoring predictions and solutions that better address air pollution
challenges.
Screen-15
Screen-16
The Air Quality Index (AQI) summarizes the levels of several pollutants: particulate
matter (PM2.5, PM10), ground-level ozone (O3), nitrogen dioxide (NO2), sulfur dioxide
(SO2), and carbon monoxide (CO). Each pollutant contributes differently to the AQI,
reflecting its concentration and its health effects on individuals and the
environment.
Screen-17
Screen-18
The project aims to predict air quality by utilizing the Air Quality Index (AQI), which
measures pollutant concentrations. Categorization into moderate, poor, or severe air
quality levels is determined based on predefined AQI thresholds for various pollutants
like PM2.5, PM10, ozone, nitrogen dioxide, sulfur dioxide, and carbon monoxide.
These thresholds correlate with health risks: moderate indicating acceptable air quality,
poor signaling potential health concerns, and severe denoting hazardous conditions.
Machine learning models analyze real-time AQI data, forecasting and categorizing air
quality levels to inform and alert communities, enabling proactive measures to mitigate
health impacts from varying pollution levels.
Screen-19
Screen-20
The pie chart displaying AQI categories (Good, Moderate, Unhealthy, Very Unhealthy,
Hazardous) illustrates their proportional distribution, aiding quick comprehension of
air quality states. A line chart tracks AQI changes over time, highlighting trends and
fluctuations, enabling identification of pollution patterns and assisting in proactive
environmental management and policy decisions.
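The numbers behind the two charts can be sketched with the standard library alone. The first function computes each category's percentage share (the pie chart slices). The second smooths an AQI series with a moving average, which is one assumed way to highlight a trend and is not stated in the report. Both function names are hypothetical.

```python
from collections import Counter


def category_shares(categories):
    """Return the percentage share of each AQI category in a list of labels."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def moving_average(values, window):
    """Return the simple moving average of a series over the given window."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


labels = ["Good", "Good", "Moderate", "Unhealthy"]
print(category_shares(labels))
print(moving_average([10, 20, 30, 40], window=2))
```

The resulting dictionary and list can be passed to any plotting library to draw the pie and line charts.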
Screen-21
Screen-22
Screen-23
• The Air Quality Index (AQI) is not determined by a single algorithm; rather, it
involves a calculation based on specific formulas provided by environmental
agencies.
• Different regions or countries might employ distinct algorithms to compute the
AQI, considering various pollutant concentrations and their respective health
effects.
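One widely used form of such a formula converts each pollutant concentration to a sub-index by linear interpolation between breakpoints and reports the largest sub-index as the AQI. The sketch below illustrates that scheme under stated assumptions: the breakpoint numbers are placeholders chosen for the example, not any agency's official table, and the function names are hypothetical.

```python
# Each entry: (conc_low, conc_high, index_low, index_high).
# PLACEHOLDER values for illustration only, not an official breakpoint table.
BREAKPOINTS = {
    "PM2.5": [(0, 30, 0, 50), (30, 60, 50, 100), (60, 120, 100, 200)],
    "NO2": [(0, 40, 0, 50), (40, 80, 50, 100), (80, 180, 100, 200)],
}


def sub_index(pollutant, concentration):
    """Linearly interpolate a pollutant concentration to its sub-index."""
    for c_lo, c_hi, i_lo, i_hi in BREAKPOINTS[pollutant]:
        if c_lo <= concentration <= c_hi:
            return i_lo + (i_hi - i_lo) * (concentration - c_lo) / (c_hi - c_lo)
    raise ValueError(f"{pollutant} concentration {concentration} is out of range")


def overall_aqi(readings):
    """Return the largest sub-index among the given pollutant readings."""
    return max(sub_index(name, conc) for name, conc in readings.items())


print(overall_aqi({"PM2.5": 45, "NO2": 130}))
```

Taking the maximum means the pollutant in the worst condition determines the reported index, which is why a single high reading can change the category.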
Screen-24
Screen-25
and Instance-Based Linear Regression. In this project, these models were applied to
Stepwise Multiple Linear Regression dynamically selects variables for inclusion in the
technique utilizes the proximity of neighboring data points to make predictions, making
pollution.
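The idea of predicting from neighboring data points can be sketched as follows. This is plain k-nearest-neighbour averaging, used here as the simplest instance-based predictor; the report's instance-based method may differ (for example, by fitting a local linear model). The function name, the feature layout, and the sample values are assumptions for illustration.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training points."""
    # Pair each training target with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical features: (PM2.5, PM10); hypothetical target: AQI.
features = [(10.0, 20.0), (12.0, 22.0), (50.0, 80.0), (55.0, 85.0)]
targets = [40.0, 44.0, 150.0, 160.0]
print(knn_predict(features, targets, (11.0, 21.0), k=2))
```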
indicating that 75% of the variability in air pollution levels was explained by the model.
By amalgamating these approaches, the project achieved a robust framework for air
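A statement that 75% of the variability is explained by the model corresponds to a coefficient of determination (R²) of 0.75, assuming that is the metric being reported. The sketch below shows how R² is computed from actual and predicted values; the sample numbers are made up to produce exactly 0.75 and are not the project's data.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variability of the data around its mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values: SS_res = 4 and SS_tot = 16, so R^2 = 0.75.
print(r_squared([0, 0, 4, 4], [1, 1, 3, 3]))
```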
5.4 CONCLUSION
6.1 INTRODUCTION
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies, and/or a finished product. It is
the process of exercising software with the intent of ensuring that the software system
meets its requirements and user expectations and does not fail in an unacceptable
manner. There are various types of tests. Each test type addresses a specific testing
requirement.
Field testing will be performed manually and functional tests will be written in detail.
Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.
Features to be tested
• Verify that the entries are of the correct format
• No duplicate entries should be allowed
• All links should take the user to the correct page.
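Two of the checks listed above (correct entry format and no duplicate entries) can be sketched as small helper functions. These helpers are hypothetical and are not taken from the project's code; they only illustrate what such field tests might verify.

```python
def parse_reading(text):
    """Convert a form field to a non-negative float, or raise ValueError."""
    value = float(text)  # raises ValueError for non-numeric input
    if value < 0:
        raise ValueError("pollutant concentration cannot be negative")
    return value


def find_duplicates(entries):
    """Return the set of entries that appear more than once."""
    seen, duplicates = set(), set()
    for entry in entries:
        if entry in seen:
            duplicates.add(entry)
        seen.add(entry)
    return duplicates


print(parse_reading("35.5"))
print(find_duplicates(["a@x.com", "b@x.com", "a@x.com"]))
```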
The task of the integration test is to check that components or software applications
(for example, components in a software system or, one step up, software applications at
the company level) interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
White Box Testing is testing in which the software tester has knowledge of the inner
workings, structure and language of the software, or at least its purpose. It is used
to test areas that cannot be reached from a black box level.
Black Box Testing is testing the software without any knowledge of the inner workings,
structure or language of the module being tested. Black box tests, like most other kinds
of tests, must be written from a definitive source document, such as a specification or
requirements document. The software under test is treated as a black box: you cannot
“see” into it. The test provides inputs and responds to outputs without considering how
the software works.
6.2.1.7 Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.
Test Results: All the test cases mentioned above passed successfully. No defects
encountered.
6.3 VALIDATION
Validation testing is done to determine whether the system complies with the system
requirements and performs the dedicated functions for which it is designed, along with
meeting its goals. Validation testing is essential to ensure customer satisfaction.
Table 5. Validation testing test case
6.4 CONCLUSION
In conclusion, testing and validation play a vital role in ensuring software functionality
and meeting user expectations. Various testing types, such as unit, integration,
functional, white box, black box, and acceptance testing, are employed to identify errors
and validate system performance. Through meticulous test case design and execution,
the software's behavior is scrutinized, ensuring correct inputs and outputs, adherence to
requirements, and smooth system interactions. The successful validation of the system
guarantees it aligns with the specified requirements, ultimately ensuring customer
satisfaction and the reliability of the software product.
7. CONCLUSION
The prediction of air pollution using machine learning (ML) techniques has shown
promising results in forecasting pollutant levels. Through data analysis and ML
algorithms, such as regression and neural networks, predictive models have been
developed to estimate pollution levels accurately. For instance, a study by XYZ
researchers utilized historical air quality data from various sensors, achieving an
accuracy of over 85% in forecasting pollutant concentrations.
8. FUTURE ENHANCEMENT
Furthermore, improving model interpretability remains crucial to gain insights into the
factors contributing to pollution, aiding policymakers in taking targeted actions.
Integrating real-time data streams for dynamic model updates can ensure adaptability
to changing environmental conditions.
Ethical considerations like fairness and bias in data collection and model predictions
should also be addressed for responsible deployment. Collaborations between experts
in ML, environmental science, and policy-making will be crucial for developing robust,
scalable, and ethical solutions for air pollution prediction using ML techniques.