
An industrial oriented mini project report

On
PREDICTION OF AIR POLLUTION USING MACHINE LEARNING
TECHNIQUES
Submitted by
ROSHINI MISHRA 20W91A05J8
R. PAVANI 20W91A05J3
S. MANISH 20W91A05L2
B. VENKATESH 20W91A05P5
Under the Esteemed Guidance of
Mr. J. VENKATESH
Asst. Professor, CSE
TO
Jawaharlal Nehru Technological University, Hyderabad
In partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


MALLA REDDY INSTITUTE OF ENGINEERING AND TECHNOLOGY
(UGC AUTONOMOUS)
(Sponsored by Malla Reddy Educational society)
(Affiliated to JNTU, Hyderabad)
Maisammaguda, Dhulapally post, Secunderabad-500014.

2023-2024
MALLA REDDY
INSTITUTE OF ENGINEERING & TECHNOLOGY
(Autonomous Institution - UGC, Govt. of India)
(Sponsored by Malla Reddy Educational Society)
Approved by AICTE, New Delhi, Recognized Under 2(1) & 12(B)
Affiliated to JNTU, Hyderabad, Accredited by NBA & NAAC with ‘A’ Grade

Department of Computer Science and Engineering

BONAFIDE CERTIFICATE

This is to certify that the industrial oriented mini project report titled
"PREDICTION OF AIR POLLUTION USING MACHINE LEARNING TECHNIQUES" is a bonafide
record of work submitted by ROSHINI MISHRA (20W91A05J8), R. PAVANI (20W91A05J3),
S. MANISH (20W91A05L2) and B. VENKATESH (20W91A05P5) of B. Tech in partial
fulfilment of the requirements for the degree of Bachelor of Technology in Computer
Science and Engineering, and that it has not been submitted for the award of any other
degree of this institution.

Internal Guide Sign Head of Department Sign

External Examiner Sign


DECLARATION

I hereby declare that the Mini Project report entitled “PREDICTION OF AIR
POLLUTION USING MACHINE LEARNING TECHNIQUES”, submitted to Malla
Reddy Institute of Engineering and Technology (Autonomous), affiliated to Jawaharlal
Nehru Technological University Hyderabad (JNTUH), for the award of the degree of
Bachelor of Technology in Computer Science & Engineering, is the result of an original
industrial oriented mini project done by me.

It is further declared that this project report, or any part thereof, has not been previously
submitted to any University or Institute for the award of a degree or diploma.

ROSHINI MISHRA 20W91A05J8

R. PAVANI 20W91A05J3

S. MANISH 20W91A05L2

B. VENKATESH 20W91A05P5
ACKNOWLEDGEMENT

First and foremost, I am grateful to the Principal Dr. P. SRINIVAS, for providing me
with all the resources in the college to make my project a success. I thank him for his
valuable suggestions at the time of seminars which encouraged me to give my best in
the project.

I would like to express my gratitude to Dr. MD. ASHFAQUL HASSAN, Head of the
Department, Department of Computer Science and Engineering for his support and
valuable suggestions during the dissertation work.

I offer my sincere gratitude to my project coordinator Dr. N. NARENDHAR and
internal guide Mr. J. VENKATESH, who have supported me throughout this project
with their patience and valuable suggestions.

I would also like to thank all the supporting staff of the Dept. of CSE and all other
departments who have been helpful directly or indirectly in making the project a
success.

I am extremely grateful to my parents for their blessings and prayers, which gave me
the strength to complete my project.

ROSHINI MISHRA 20W91A05J8

R. PAVANI 20W91A05J3

S. MANISH 20W91A05L2

B. VENKATESH 20W91A05P5
CONTENTS

CONTENT PAGE NO.

Abstract i
List of Figures ii
List of Tables iii
List of Screens iv-v
Symbols & Abbreviations vi
1. INTRODUCTION 1-3
1.1 Motivation 1
1.2 Problem definition 1
1.3 Objective of Project 2
1.4 Limitations of Project 2
1.5 Organization of Documentation 3

2. LITERATURE SURVEY 4-6
2.1 Introduction 4
2.2 Existing System 5
2.3 Disadvantages of Existing system 5
2.4 Proposed System 5-6
2.5 Conclusion 6

3. ANALYSIS 7-20
3.1 Introduction 7
3.2 Software Requirement Specification 7
3.2.1 User requirement 7
3.2.2 Software requirement 8
3.2.3 Hardware requirement 8
3.3 Content diagram of Project 9
3.4 Algorithms and Flowcharts 10-20
3.4.1 Algorithms used in the project 10-17
3.4.1.1 StepWise Multiple Linear Regression Algorithm 10-12
3.4.1.2 KNN Algorithm 12-13
3.4.1.3 Instance based Linear Regression Algorithm 13-14
3.4.1.4 Decision Tree Classifiers 14-15
3.4.1.5 SVM 15-16
3.4.2 Flowchart 18-19
3.5 Conclusion 20

4. DESIGN 21-28
4.1 Introduction 21
4.2 UML diagrams 22-26
4.2.1 Use case diagram 22-23
4.2.2 Class diagram 23-24
4.2.3 Sequence diagram 24-25
4.2.4 Data Flow diagram 25-26
4.3 Module design and organization 27
4.4 Conclusion 28

5. IMPLEMENTATION & RESULTS 29-52
5.1 Introduction 29-30
5.2 Explanation of Key functions 30-39
5.3 Method of Implementation 39-52
5.3.1 Source Code 39-44
5.3.2 Output Screens 44-51
5.3.3 Result Analysis 51-52
5.4 Conclusion 52

6. TESTING & VALIDATION 53-57
6.1 Introduction 53
6.2 Design of test cases and scenarios 53-56
6.3 Validation 56-57
6.4 Conclusion 57

7. CONCLUSION 58
8. FUTURE ENHANCEMENT 59
9. REFERENCES 60-61
ABSTRACT

The escalating concern over air pollution's adverse effects on public health and the
environment has spurred the development of predictive models leveraging machine learning
techniques. This study aims to forecast air pollution levels by applying various machine
learning algorithms to comprehensive datasets. By integrating meteorological data,
geographical factors, and historical pollution records, these models aim to predict pollutant
concentrations accurately. Supervised learning algorithms such as Random Forest, Support
Vector Machines, and Neural Networks are used to analyze complex interactions among
diverse variables. Feature selection methods optimize model performance by identifying the
factors that most influence air quality. Evaluating these models against real-time monitoring
data tests their reliability and effectiveness in forecasting pollution levels. Such predictive
frameworks can provide early warnings, aid policymakers, and help communities mitigate
the detrimental impacts of air pollution on public health and the environment.

LIST OF FIGURES

S.No  Figure No  Description                                 Page No
1     3.3        Proposed System Architecture                 9
2     3.4        Multiple Linear Regression Analysis          12
3     3.4        KNN Algorithm Working Visualization          13
4     3.4        Instance-based Linear Regression Analysis    14
5     3.4        Decision Tree Classifiers Analysis           15
6     3.4        Support Vector Machine Analysis              16
7     4.1        Overview of Proposed System                  21
8     4.2        Use Case Diagram                             23
9     4.2        Class Diagram                                24
10    4.2        Sequence Diagram                             25
11    4.2        Dataflow Diagram                             26
12    6.2        Software Testing Types                       53

LIST OF TABLES

S.No  Table No  Description                     Page No
1     3.1       Air Quality Type Ratio          17
2     6.2       Test Cases                      56
3     6.3       Validation Testing Test Cases   56-57

LIST OF SCREENSHOTS

S.No  Screenshot No  Description                                    Page No
1     Screen-1       Installation of Python                          31
2     Screen-2       Click on Download Tab                           32
3     Screen-3       Select Version of Python                        32
4     Screen-4       Install Windows x86-64 Web-based Installer      33
5     Screen-5       Open the Downloaded Python Version              34
6     Screen-6       Put a Tick on Add Python 3.7 to PATH            34
7     Screen-7       Installation Successful                         35
8     Screen-8       Click on Close                                  35
9     Screen-9       Open Command Prompt                             36
10    Screen-10      Check Python Version                            36
11    Screen-11      Python IDLE Working                             37
12    Screen-12      Login Page                                      44
13    Screen-13      Registration Page                               45
14    Screen-14      Filling of Registration Page                    45
15    Screen-15      Profile Details                                 46
16    Screen-16      Air Quality Prediction Page                     46
17    Screen-17      Filling of AQP Page Based on Trained Dataset    47
18    Screen-18      Air Quality Prediction Results                  47
19    Screen-19      View All Remote Users                           48
20    Screen-20      Pie Chart                                       48
21    Screen-21      Line Chart                                      49
22    Screen-22      Air Pollution Prediction Details                49
23    Screen-23      Air Quality Prediction Type                     50
24    Screen-24      Air Quality Prediction Type Ratio               50
25    Screen-25      Analysis of AQP                                 51

SYMBOLS AND ABBREVIATIONS
• PM2.5: Particulate Matter with a diameter of 2.5 micrometers or smaller, a common
air pollutant.
• NO2: Nitrogen Dioxide, a harmful gas emitted from burning fuels.
• SO2: Sulfur Dioxide, a gas produced by burning fossil fuels containing sulfur.
• CO: Carbon Monoxide, a colorless, odorless gas produced by incomplete combustion.
• O3: Ozone, a reactive gas; at ground level it forms through sunlight-driven reactions
between pollutants such as nitrogen oxides and volatile organic compounds.
• ML: Machine Learning, the field of study that focuses on developing algorithms and
models that enable computers to learn and make predictions from data.
• LR: Linear Regression, a statistical method used to model the relationship between
variables.
• KNN: k-Nearest Neighbors, a machine learning algorithm used for classification and
regression tasks based on similarity measures between data points.
• R-squared (R²): A statistical measure that represents the proportion of the variance
for a dependent variable that's explained by an independent variable or variables in a
regression model.
• RMSE: Root Mean Squared Error, the square root of the mean of the squared differences
between values predicted by a model and the observed values.
• MAE: Mean Absolute Error, a measure of the average magnitude of errors between
predicted and actual values.
• PCA: Principal Component Analysis, a technique used to simplify complex datasets
by reducing dimensions.
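For reference, the usual textbook definitions of the three error and fit measures listed above are given below. The notation is introduced here and is not taken from the report: $n$ observations $y_i$, model predictions $\hat{y}_i$, and observed mean $\bar{y}$.

$$
\begin{aligned}
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{aligned}
$$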

Air Pollution Prediction using ML Techniques

1. INTRODUCTION

In recent years, concerns regarding the detrimental impact of air pollution on public
health and the environment have surged, necessitating innovative approaches for its
prediction and management. The utilization of Machine Learning (ML) techniques has
emerged as a promising solution to forecast and mitigate air pollution levels. By
harnessing the power of ML algorithms, such as neural networks, decision trees, and
support vector machines, predictive models can be developed to analyze complex
interactions among various environmental factors contributing to air pollution.

These models leverage historical and real-time data from diverse sources, including
meteorological conditions, industrial emissions, vehicular traffic, and geographical
characteristics, to anticipate pollutant concentrations accurately. The fusion of
advanced ML methodologies with vast datasets enables the identification of patterns
and trends, aiding in the proactive assessment and control of air quality. The pursuit of
accurate predictive models using ML stands as a pivotal step towards formulating
proactive strategies and policies to combat air pollution, fostering healthier and
sustainable living environments for communities worldwide.

1.1 MOTIVATION

Predicting air pollution using machine learning techniques can support early detection,
mitigation, and public health management. By applying advanced algorithms to large
datasets, this project aims to forecast pollution levels, enabling timely interventions and
policy decisions. It fosters a proactive approach to safeguarding human health and the
environment, giving communities the information they need to minimize exposure risks
and implement sustainable measures. Ultimately, the project aims to contribute to a
healthier, cleaner future by using technology to predict and help prevent air pollution.

1.2 PROBLEM DEFINITION

The problem involves leveraging machine learning techniques to predict air pollution
levels accurately. This includes collecting and analyzing environmental and
meteorological datasets covering particulate matter, ozone, weather conditions, and
geographical factors. The objective is to develop predictive models capable of
forecasting air quality indices or pollutant concentrations in specific locations. By
using historical data and advanced algorithms, the aim is to create a reliable system
that anticipates potential air pollution episodes and issues alerts about them, supporting
proactive measures and public awareness to mitigate the adverse effects on human
health and the environment.

1.3 OBJECTIVE OF THE PROJECT

The objective of the project "Prediction of Air Pollution using Machine Learning
Techniques" is to develop predictive models leveraging machine learning algorithms to
forecast air pollution levels. By analyzing historical data and real-time inputs from
various environmental sensors, the project aims to create accurate predictions of air
quality parameters. The focus is on enhancing environmental monitoring and enabling
timely interventions to mitigate pollution, contributing to public health improvements
and fostering sustainable environmental practices.

1.4 LIMITATIONS OF PROJECT

• Data Availability: Limited availability or quality of historical air pollution data
for training the machine learning models can hinder accuracy.
• Feature Selection: Challenges in identifying and selecting relevant features
impacting air pollution prediction due to complex interactions among various
contributing factors.
• Model Generalization: Difficulty in developing a model that generalizes well
across diverse geographical locations or varying environmental conditions.
• Computational Resources: High computational requirements for processing
extensive datasets and running complex machine learning algorithms.
• Model Interpretability: Complex machine learning models might lack
interpretability, making it challenging to explain predictions to stakeholders or
experts.
• Regulatory Constraints: Compliance with regulatory standards and policies
for air quality prediction models might pose limitations.


1.5 ORGANISATION OF DOCUMENTATION

Chapter 1- Introduction:

This chapter provides an overview of the project, the major problem being addressed,
the objectives and methodologies, and information about the remaining parts of the
report.

Chapter 2- Literature Survey:

This chapter reviews previous work related to the project and its limitations.

Chapter 3- Analysis:

This chapter analyzes the requirements of the system and describes the algorithms and
flowcharts used in the project.

Chapter 4- Design:

This chapter describes the project modules and the UML and data flow diagrams of the
project.

Chapter 5- Implementation & Results:

This chapter describes the implementation of the project in detail, including the
algorithms and their detailed steps.

Chapter 6- Testing & Validation:

This chapter describes unit testing, validation testing, functional testing, integration
testing, and user acceptance testing.

Chapter 7- Conclusion:

This chapter gives a brief summary of the project and its outcomes.

Chapter 8- Future Enhancement:

This chapter describes how the project can be developed further in the future.


2. LITERATURE SURVEY

A literature survey or a literature review in a project report shows the various analyses
and research made in the field of interest and the results already published, taking into
account the various parameters of the project and the extent of the project. Literature
survey is mainly carried out to analyze the background of the current project, which
helps to find flaws in the existing system and points to unsolved problems we can work
on. The following topics therefore not only illustrate the background of the project but
also uncover the problems and flaws that motivated us to propose solutions and work
on this project.

2.1 INTRODUCTION

Air pollution is a pervasive environmental concern with far-reaching implications for
public health and ecological balance. Rapid industrialization, urbanization, and
vehicular emissions have significantly contributed to the deterioration of air quality
worldwide. Monitoring and predicting air pollution levels are crucial for implementing
effective mitigation strategies and safeguarding human health.

The conventional methods of monitoring air quality rely on limited and often sparse
sensor networks, which might not capture the complexity and variability of pollution
across diverse environments. In recent years, the integration of machine learning
techniques in environmental studies has shown promising results in predicting air
pollution levels. Machine learning algorithms offer the potential to analyze vast
amounts of data from various sources, including satellite imagery, meteorological
factors, and historical pollution records, to generate accurate predictive models.

This literature survey aims to explore the current landscape of machine learning
applications in predicting air pollution. It will delve into the different methodologies,
models, and datasets utilized in prior research efforts. By synthesizing existing
knowledge, identifying gaps, and highlighting successful approaches, this study seeks
to contribute to the advancement of predictive models for air quality assessment. The
ultimate goal is to develop robust and efficient prediction frameworks that can aid
policymakers, urban planners, and public health authorities in mitigating the adverse
impacts of air pollution on society and the environment.


2.2 EXISTING SYSTEM

The existing system forecasts air pollution levels using machine learning techniques.
It uses historical air quality, weather, geographic, and environmental data with
supervised learning methods such as linear regression. These models
integrate pollutant concentrations, meteorological parameters, temporal, and spatial
factors to predict air quality indices or specific pollutant levels. The goal is to develop
a robust predictive model that anticipates pollution levels, aiding in proactive measures
and interventions for managing and mitigating air quality issues.

2.3 DISADVANTAGES OF EXISTING SYSTEM

• The system has not implemented the Stepwise Multiple Linear Regression method.

• The system has not implemented an Instance-based Linear Regression model.

• Limited Accuracy: In scenarios where the relationship between predictors and
pollution levels is non-linear or dynamic, linear regression might provide
inaccurate predictions.

• Limited Complexity: It can only model linear relationships and fails to capture
intricate nonlinear patterns often present in pollution data.

2.4 PROPOSED SYSTEM

Utilizing linear regression offers several advantages in predicting air pollution using
machine learning techniques. It allows for the establishment of a straightforward
relationship between various environmental factors and pollution levels, aiding in the
accurate estimation of future pollution levels. Its simplicity facilitates quick model
development and interpretation, providing insights into the impact of predictor
variables on pollution.

• Data collection: We collected data from various reliable sources, such as the
Delhi Government website.


• Exploratory analysis: In this phase of the project, we explore the data along
various dimensions, such as identification of outliers, consistency checks, and
missing values.

• Data manipulation: In this stage, missing values are filled in (imputed) using
the mean of the corresponding attribute.

• Data accuracy analysis: To check whether the model fits the overall data, we
compute the root mean squared error and the mean absolute percentage error, and
use these measures to judge whether the accuracy is acceptable.
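The preprocessing and accuracy checks described above can be sketched in plain Python as follows. This is a minimal sketch, not the project's code. The report names the checks but not the exact methods, so the 1.5 × IQR outlier rule and the MAPE formula are assumptions, and the sample readings are made up.

```python
import math
from statistics import mean, quantiles

def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))

# Hypothetical PM2.5 readings; None marks a missing value.
pm25 = [85.0, None, 92.0, 88.0, None, 310.0]
filled = impute_with_mean(pm25)
print("filled:", filled)
print("outliers:", iqr_outliers(filled))
print("RMSE:", rmse([80, 95, 110], [85, 90, 120]), "MAPE:", mape([80, 95, 110], [85, 90, 120]))
```

Mean imputation is simple, but it is pulled upward by extreme readings such as the 310.0 above. Running the outlier check before imputing is one way to reduce that effect.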

2.5 CONCLUSION

In the realm of predicting air pollution through machine learning techniques, linear
regression serves as a fundamental yet effective tool. By analyzing historical data on
various atmospheric parameters, this method facilitates the prediction of pollutant
levels. However, while linear regression provides a foundational understanding, it
might lack the complexity to handle intricate environmental interactions. Its predictive
accuracy can be influenced by the variability and interdependencies within atmospheric
components. Hence, for comprehensive and precise air pollution forecasts, integrating
advanced machine learning models capable of capturing nonlinear relationships and
intricate patterns is crucial. Ultimately, combining linear regression with more
sophisticated techniques can enhance the precision and reliability of air pollution
predictions.


3. ANALYSIS

Analysis is the process by which an individual studies a system such that an
information system can be analyzed, modeled, and a logical alternative can be chosen.
Systems analysis projects are initiated for three reasons: problems, opportunities, and
directives.

3.1 INTRODUCTION

This chapter analyzes the product and resource requirements of the system. The product
requirements cover the inputs and outputs, that is, which inputs are needed to produce
the required output. The resource requirements briefly describe the software and
hardware needed to achieve the required functionality.

3.2 SOFTWARE REQUIREMENT SPECIFICATION

A Software Requirements Specification (SRS) is a document that describes the nature
of a project, software, or application. In simple words, an SRS document is a manual
for a project, prepared before the project or application is started. It is also known as
an SRS report or software document, and it can be prepared for a project, a piece of
software, or any kind of application.

3.2.1 USER REQUIREMENT


To execute this project, a user needs a platform for writing code and internet access for
looking up project surveys and the syntax used in the project. The user also needs
software on which to run the program.

A dataset containing all the parameters is needed. The dataset is loaded into the code to
obtain the expected output, and the code is built so that each value can be predicted
separately.


3.2.2 SOFTWARE REQUIREMENT


▪ Operating system : Windows 7 Ultimate.
▪ Coding Language : Python.
▪ Front-End : Python.
▪ Back-End : Django-ORM.
▪ Designing : HTML, CSS, JavaScript.
▪ Database : MySQL (WAMP Server).

3.2.3 HARDWARE REQUIREMENT


▪ Processor - Pentium IV
▪ RAM - 4 GB (min)
▪ Hard Disk - 20 GB
▪ Key Board - Standard Windows Keyboard
▪ Mouse - Two or Three Button Mouse
▪ Monitor - SVGA


3.3 CONTENT DIAGRAM OF PROJECT

Fig.1. Proposed System architecture


3.4 ALGORITHMS AND FLOWCHARTS


3.4.1 ALGORITHMS USED IN THE PROJECT:
3.4.1.1 STEPWISE MULTIPLE LINEAR REGRESSION ALGORITHM:

Regression is a statistical method for determining the relationship between features and
an outcome variable or result. In machine learning, it is used for predictive modeling,
in which an algorithm is employed to forecast continuous outcomes. Multiple linear
regression, often known as multiple regression, is a statistical method that predicts the
value of a response variable by combining several explanatory variables. Multiple
regression is an extension of simple linear regression (ordinary least squares), which
uses just one explanatory variable.

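The multiple linear regression model is

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$

where $y$ is the dependent variable,

$x_1, x_2, x_3, \ldots$ are the independent variables,

$b_0$ is the intercept of the line, and

$b_1, b_2, \ldots$ are the coefficients.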

Stepwise Implementation

Step 1: Import the necessary packages

The necessary packages, such as pandas, NumPy, and sklearn, are imported.

Step 2: Import the CSV file:

The CSV file is imported using the pd.read_csv() method. The ‘No’ column is dropped
because an index is already present. The df.head() method retrieves the first five rows
of the dataframe, and the df.columns attribute returns the column names. The columns
whose names start with ‘X’ are the independent features in our dataset, and the column
‘Y house price of unit area’ is the dependent variable. Because there is more than one
independent (explanatory) variable, this is a multiple linear regression.

Step 3: Create a scatterplot to visualize the data:

A scatterplot is created to visualize the relation between the ‘X4 number of convenience
stores’ independent variable and the ‘Y house price of unit area’ dependent feature.

Step 4: Create feature variables:

To model the data, we create feature variables: X contains the independent variables
and y contains the dependent variable. X and y are printed to inspect the data.

Step 5: Split data into train and test sets:

Here, the train_test_split() method is used to create the train and test sets, and the
feature variables are passed to the method. The test size is set to 0.3, which means
30% of the data goes into the test set and the training set contains the remaining 70%.
The random state is fixed so that the split is reproducible.

Step 6: Create a linear regression model

A linear regression model is created using the LinearRegression() class, which is
imported from the sklearn.linear_model package. Because X contains more than one
feature, this class fits a multiple linear regression.

Step 7: Fit the model with training data.

After the model is created, it is fitted to the training data with the fit() method, which
estimates the intercept and coefficients from the training set.

Step 8: Make predictions on the test data set.

The model.predict() method is used to make predictions on the X_test data. The test
data is unseen: the model has no knowledge of the statistics of the test set.

Step 9: Evaluate the model with metrics.

The multiple linear regression model is evaluated with the mean_squared_error and
mean_absolute_error metrics. Comparing these errors with the mean of the target
variable shows how well the model is predicting. mean_squared_error is the mean of
the squared residuals, and mean_absolute_error is the mean of the absolute values of
the residuals. The smaller the error, the better the model performance.
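Steps 1 to 9 can be condensed into the sketch below. It is illustrative only. The report's CSV file is not reproduced here, so a synthetic DataFrame with hypothetical air-quality columns (temperature, humidity, wind_speed, pm25) stands in for it. The scatterplot of Step 3 is omitted, and the value random_state=101 is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Steps 1-2 (stand-in for pd.read_csv): synthetic data with hypothetical columns.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "temperature": rng.uniform(10, 40, n),
    "humidity": rng.uniform(20, 90, n),
    "wind_speed": rng.uniform(0, 10, n),
})
# Made-up linear relationship plus noise, only so there is a target to fit.
df["pm25"] = (50 + 1.5 * df["temperature"] + 0.8 * df["humidity"]
              - 6.0 * df["wind_speed"] + rng.normal(0, 5, n))

# Step 4: X holds the independent variables, y the dependent variable.
X = df[["temperature", "humidity", "wind_speed"]]
y = df["pm25"]

# Step 5: 70/30 split; a fixed random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

# Steps 6-7: create the model and fit it on the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 8: predict on the unseen test set.
predictions = model.predict(X_test)

# Step 9: evaluate, and compare the errors with the mean of the target.
mse = mean_squared_error(y_test, predictions)
mae = mean_absolute_error(y_test, predictions)
print("intercept b0:", model.intercept_)
print("coefficients b1..b3:", model.coef_)
print("MSE:", mse, "RMSE:", np.sqrt(mse), "MAE:", mae)
print("mean of target:", y.mean())
```

The fitted `intercept_` and `coef_` correspond to $b_0$ and $b_1, b_2, b_3$ in the regression equation.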

Fig 2 Multiple Linear regression Analysis
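The steps above fit one multiple regression on all the features. The "stepwise" part of the method's name usually refers to choosing predictors one at a time. The report does not state which selection criterion was used. The minimal forward-selection sketch below assumes adjusted R² as the criterion, and its feature names and data are hypothetical. Backward elimination and p-value thresholds are common alternatives.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit ordinary least squares with an intercept and return adjusted R^2."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # intercept column for b0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares b0..bp
    residuals = y - A @ coef
    r2 = 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def forward_stepwise(X, y, names):
    """Greedily add the feature that most improves adjusted R^2; stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining:
        scores = {j: adjusted_r2(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:
            break                                 # no candidate improves the model
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return [names[j] for j in selected], best_score

# Hypothetical data: only the first two features actually drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)
chosen, score = forward_stepwise(X, y, ["temperature", "humidity", "noise_a", "noise_b"])
print(chosen, round(score, 3))
```

Adjusted R² can still admit a weak or irrelevant feature by chance. A significance test or cross-validated error gives a stricter stopping rule.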

3.4.1.2 KNN ALGORITHM

➢ Simple, but a very powerful classification algorithm
➢ Classifies based on a similarity measure
➢ Non-parametric
➢ Lazy learning
➢ Does not “learn” until the test example is given
➢ Whenever we have new data to classify, we find its K nearest neighbors in the
training data

Example

➢ A new example is classified using the k closest training examples in feature space
➢ Feature space means the space spanned by the input (feature) variables; a distance
measure such as Euclidean distance decides which examples are closest
➢ Learning is instance-based and therefore lazy: no model is built in advance, and the
training instances closest to the input vector are searched only when a test or
prediction request arrives
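The behaviour described above (lazy, similarity-based majority voting) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the project's implementation. Euclidean distance and k=3 are assumptions, and the (PM2.5, NO2) readings and labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples."""
    # Lazy learning: nothing is fitted in advance; distances are computed per query.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical (PM2.5, NO2) readings with air-quality labels; not project data.
train_X = [(40, 20), (45, 25), (50, 22), (180, 60), (200, 70), (210, 65)]
train_y = ["Moderate", "Moderate", "Moderate", "Poor", "Poor", "Poor"]
print(knn_predict(train_X, train_y, (48, 24)))
print(knn_predict(train_X, train_y, (190, 66)))
```

Because the vote depends on raw distances, features on very different scales should be normalized first. On a tied vote, `most_common` returns the label seen first among the nearest neighbours.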


Fig 3 KNN Algorithm working visualization

3.4.1.3 INSTANCE BASED LINEAR REGRESSION ALGORITHM

In an instance-based linear regression algorithm for predicting air pollution in a
machine learning project, several steps are involved:

Data Collection: Gather historical data on air quality parameters (like particulate
matter, ozone levels, weather conditions, etc.) from various sources.

Data Preprocessing: Clean the data by handling missing values, normalizing
features, and splitting it into training and testing sets.

Instance-Based Learning: Implement linear regression, a supervised learning
algorithm that fits a linear model to the training data, aiming to predict air pollution
levels based on features like temperature, humidity, wind speed, etc.

Prediction: Using the trained linear regression model, predict air pollution levels in
the test dataset based on the provided input parameters.


Evaluation: Evaluate the model's performance using metrics like Mean Squared Error
(MSE), Root Mean Squared Error (RMSE), or R-squared to measure how well the
model predicts air pollution levels.

Optimization: Fine-tune the model by adjusting hyperparameters or considering
feature engineering to enhance prediction accuracy.

Deployment: Deploy the trained model into production to make real-time predictions
on air pollution levels based on incoming data.

Pointers

Pointers to consider during this process include:

Feature Selection: Choose relevant features affecting air pollution.

Normalization: Normalize data to prevent biases due to different scales.

Cross-validation: Use cross-validation to check robustness and detect overfitting.

Model Interpretation: Interpret the model's coefficients to understand which features
significantly impact air pollution.
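The normalization, cross-validation and interpretation pointers can be combined in one scikit-learn pipeline. This is a minimal sketch with hypothetical data, and the two features (temperature and wind speed) are assumptions. For ordinary least squares, scaling does not change the predictions. Its main effect is to make coefficient magnitudes comparable across features, and it matters much more for distance-based methods such as KNN.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: temperature (°C) and wind speed (m/s) -> pollutant level.
rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(10, 40, 150), rng.uniform(0, 10, 150)])
y = 60 + 1.2 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(0, 4, 150)

# Normalization: StandardScaler is refit on each training fold inside the
# pipeline, so the held-out fold never influences the scaling.
model = make_pipeline(StandardScaler(), LinearRegression())

# Cross-validation: 5 shuffled folds scored by RMSE (sklearn reports it negated).
cv = KFold(n_splits=5, shuffle=True, random_state=0)
rmse_per_fold = -cross_val_score(model, X, y, cv=cv,
                                 scoring="neg_root_mean_squared_error")
print("RMSE per fold:", rmse_per_fold.round(2), "mean:", rmse_per_fold.mean().round(2))

# Model interpretation: coefficients on standardized inputs are comparable in size.
model.fit(X, y)
print("standardized coefficients:", model[-1].coef_)
```

Similar RMSE values across folds suggest the model is not overfitting one particular split of the data.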

By iteratively refining the model and improving its accuracy, this approach helps in
creating a reliable predictive system for assessing air pollution levels.

Fig 4 Instance based Linear Regression Analysis
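The steps above describe a global linear regression. In the wider literature, "instance-based" linear regression often refers to locally weighted linear regression. That variant stores the training instances and fits a separate weighted least-squares line around each query point. The NumPy sketch below is offered as an assumption about what the term can cover, not as the project's code. The Gaussian kernel, the bandwidth `tau` and the hourly data are all illustrative choices.

```python
import numpy as np

def lwr_predict(X_train, y_train, x_query, tau=1.0):
    """Locally weighted linear regression: weighted OLS fitted around one query point."""
    X_train = np.asarray(X_train, dtype=float)
    if X_train.ndim == 1:                      # single feature -> column vector
        X_train = X_train[:, None]
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    y_train = np.asarray(y_train, dtype=float)

    # Gaussian kernel weights: instances close to the query get weight near 1.
    sq_dist = np.sum((X_train - x_query) ** 2, axis=1)
    w = np.exp(-sq_dist / (2 * tau ** 2))

    # Weighted least squares with an intercept: solve (A^T W A) theta = A^T W y.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    AtW = A.T * w                              # same as A.T @ diag(w)
    theta = np.linalg.solve(AtW @ A, AtW @ y_train)
    return float(np.concatenate([[1.0], x_query]) @ theta)

# Hypothetical non-linear daily cycle (made-up numbers, not project data).
hours = np.linspace(0, 24, 97)
pm25 = 60 + 25 * np.sin(2 * np.pi * hours / 24)
print(lwr_predict(hours, pm25, 6.0, tau=1.5))
```

A small `tau` tracks local curvature closely but is sensitive to noise. A large `tau` approaches a single global linear fit.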


3.4.1.4 DECISION TREE CLASSIFIERS


Decision tree classifiers are used successfully in many diverse areas. Their most
important feature is the capability of capturing descriptive decision making knowledge
from the supplied data. Decision tree can be generated from training sets. The procedure
for such generation based on the set of objects (S), each belonging to one of the classes
C1, C2, …, Ck is as follows:

Step 1. If all the objects in S belong to the same class, for example Ci, the decision tree
for S consists of a leaf labeled with this class.
Step 2. Otherwise, let T be some test with possible outcomes O1, O2, …, On. Each
object in S has one outcome for T, so the test partitions S into subsets S1, S2, …, Sn,
where each object in Si has outcome Oi for T. T becomes the root of the decision tree,
and for each outcome Oi we build a subsidiary decision tree by invoking the same
procedure recursively on the set Si.
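Steps 1 and 2 translate directly into a recursive function. The procedure above leaves the choice of test T open. This sketch assumes T is the categorical attribute with the highest information gain, as in ID3. It also adds a majority-class leaf for the case where no tests remain, so that the recursion always terminates. The attributes and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def build_tree(rows, labels, attributes):
    """Recursively build a tree; rows are dicts of categorical attribute values."""
    # Step 1: every object in S has the same class -> leaf labelled with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Extra stopping rule (assumption): no tests left -> leaf with the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    def remainder(attr):
        """Size-weighted entropy of the subsets S1..Sn produced by testing attr."""
        subsets = {}
        for row, label in zip(rows, labels):
            subsets.setdefault(row[attr], []).append(label)
        return sum(len(s) / len(labels) * entropy(s) for s in subsets.values())

    # Step 2: choose test T (assumed: lowest remainder, i.e. highest information
    # gain), partition S by its outcomes and recurse on each subset Si.
    best = min(attributes, key=remainder)
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    for outcome in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == outcome]
        tree[best][outcome] = build_tree([rows[i] for i in idx],
                                         [labels[i] for i in idx], rest)
    return tree

def classify(tree, row):
    """Follow test outcomes from the root until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree

# Hypothetical categorical observations, not taken from the project's dataset.
rows = [{"wind": "low", "traffic": "heavy"}, {"wind": "low", "traffic": "light"},
        {"wind": "high", "traffic": "heavy"}, {"wind": "high", "traffic": "light"},
        {"wind": "low", "traffic": "heavy"}]
labels = ["Poor", "Moderate", "Moderate", "Moderate", "Poor"]
tree = build_tree(rows, labels, ["wind", "traffic"])
print(tree)
print(classify(tree, {"wind": "low", "traffic": "heavy"}))
```

`classify` raises a `KeyError` for an outcome that never occurred in training. A production tree would need a default branch for that case.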

Fig 5 Decision tree classifiers Analysis

3.4.1.5 SVM (Support Vector Machine)


In classification tasks, a discriminant machine learning technique aims to find, based
on an independent and identically distributed (iid) training dataset, a discriminant
function that can correctly predict labels for newly acquired instances. Unlike
generative machine learning approaches, which require computing conditional
probability distributions, a discriminant classification function takes a data point x and
assigns it to one of the classes in the classification task. Although they are less
powerful than generative approaches, which are mostly used when prediction involves
outlier detection, discriminant approaches require fewer computational resources and
less training data, especially for a multidimensional feature space and when only
posterior probabilities are needed. SVM is a discriminant technique, and because its
training problem is a convex optimization problem, it always returns the same optimal
hyperplane parameters, in contrast to genetic algorithms (GAs) or perceptrons, both of
which are widely used for classification in machine learning. For perceptrons, solutions
are highly dependent on the initialization and termination criteria. For a specific kernel
that transforms the data from the input space to the feature space, training returns
uniquely defined SVM model parameters for a given training set, whereas the
perceptron and GA classifier models are different each time training is initialized.
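The determinism described above can be checked with scikit-learn's SVC, which solves the convex SVM training problem through libsvm. This sketch, on hypothetical two-feature data, trains a linear-kernel SVM twice on the same training set and confirms that the learned hyperplane is identical. It is an illustration, not the project's own training code.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical standardized features (e.g. PM2.5, NO2) and binary labels
# (0 = acceptable, 1 = poor air quality).
X = np.array([[-1.5, -1.0], [-1.0, -1.2], [-0.8, -0.5],
              [0.9, 1.1], [1.2, 0.7], [1.6, 1.4]])
y = np.array([0, 0, 0, 1, 1, 1])

# Train the same model twice; the convex problem has one optimum.
first = SVC(kernel="linear", C=1.0).fit(X, y)
second = SVC(kernel="linear", C=1.0).fit(X, y)

print("w =", first.coef_, "b =", first.intercept_)
print("identical:", np.allclose(first.coef_, second.coef_)
      and np.allclose(first.intercept_, second.intercept_))
print(first.predict([[1.0, 1.0], [-1.0, -1.0]]))
```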

Fig 6 Support Vector Machine Analysis


AIR QUALITY TYPE RATIO

Air Pollution Prediction Type    Ratio (%)
Poor                             2.78
Very Poor                        2.78
Severe                           2.78
Moderate                         91.67

Table 3.1 Air Quality Type Ratio
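The ratios in Table 3.1 are each class's share of all predictions, expressed as a percentage. A minimal sketch, assuming a hypothetical list of 36 predicted labels whose counts (1, 1, 1 and 33) reproduce the table's values:

```python
from collections import Counter


def type_ratios(predictions):
    """Return {label: percentage of all predictions}, rounded to 2 decimals."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Hypothetical predicted labels (not the project's actual output).
predictions = ["Poor"] + ["Very Poor"] + ["Severe"] + ["Moderate"] * 33
print(type_ratios(predictions))
# {'Poor': 2.78, 'Very Poor': 2.78, 'Severe': 2.78, 'Moderate': 91.67}
```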


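3.4.2 FLOWCHART: REMOTE USER

[Figure: remote user flowchart. Start → Register and Login → Login with Username &
Password. If the status is No (wrong username or password), the user returns to login;
if Yes, the user can Predict Air Pollution Type, View Your Profile, and Logout.]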


3.4.2 FLOWCHART: SERVICE PROVIDER

[Figure: service provider flowchart. Start → Login with Username & Password. If the
status is No (wrong username or password), the provider returns to login; if Yes, the
provider can: Train Data Sets and View Prediction; View Train and Test Result; View
Predicted Air Quality/Pollution Details; Find Air Quality/Pollution Prediction Ratio on
Data Sets; Find Air Quality/Pollution Prediction Ratio Results; Download Trained Data
Sets; View All Remote Users; Log Out.]


3.5 CONCLUSION
The precision of our model is acceptable: the predicted AQI has a precision of 96%.
At present, this project predicts AQI values for various areas around New Delhi. Future
upgrades include expanding the coverage to as many regions as possible. Further, by
using data from other cities, the scope of this project can be extended to predict AQI
for those cities as well.


4. DESIGN

Software design is the process of transforming user requirements into a suitable form
that helps the programmer with coding and implementation. During the software design
phase, the design document is produced based on the customer requirements
documented in the SRS (Software Requirements Specification) document.

4.1 INTRODUCTION
The "Design of Prediction of Air Pollution Using ML Techniques" project aims to
leverage machine learning algorithms to forecast and monitor air quality parameters.
By gathering vast datasets of environmental factors and pollutant levels, this project
employs predictive models to anticipate pollution levels accurately. Machine learning
techniques such as regression, neural networks, and clustering algorithms are employed
to analyze historical data and predict future pollution trends. The ultimate goal is to
develop a robust system capable of providing real-time predictions and aiding in
proactive measures to mitigate air pollution's detrimental effects on public health and
the environment.

Fig 7 Overview of proposed system


4.2 UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard is
managed by the Object Management Group (OMG), which adopted it in 1997.

The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form, UML consists of two major
components: a meta-model and a notation. In the future, some form of method or
process may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing,
constructing and documenting the artifacts of software systems, as well as for business
modeling and other non-software systems.

The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems.

UML is a very important part of developing object-oriented software and of the
software development process. UML mostly uses graphical notations to express the
design of software projects.

4.2.1 USECASE DIAGRAM

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a
graphical overview of the functionality provided by a system in terms of actors, their
goals (represented as use cases), and any dependencies between those use cases. The
main purpose of a use case diagram is to show what system functions are performed for
which actor. Roles of the actors in the system can be depicted.


Fig.8 Use Case Diagram

4.2.2 CLASS DIAGRAM

In software engineering, a class diagram in the Unified Modeling Language (UML) is
a type of static structure diagram that describes the structure of a system by showing
the system's classes, their attributes, operations (or methods), and the relationships
among the classes. It explains which class contains which information.


Fig 9 Class Diagram

4.2.3 SEQUENCE DIAGRAM

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called event
diagrams or event scenarios.


Fig.10 Sequence Diagram

4.2.4 DATAFLOW DIAGRAM

A Data Flow Diagram (DFD) visually represents the flow of data within a system,
showing processes, data stores, and data movement. DFDs come from structured
analysis rather than UML, but they are often used alongside UML diagrams. A DFD
uses standardized symbols to illustrate how data enters the system, is processed and
stored, and exits it. DFDs highlight data transformations, emphasizing inputs, outputs,
and interactions between system components. These diagrams simplify complex
systems, aiding in understanding system functionalities, data sources, and interactions.
DFDs are hierarchical, allowing abstraction from finer details to higher-level overviews,
which aids system analysis, design, and communication among stakeholders.

Fig 11 Data Flow Diagram


4.3 MODULE DESIGN AND ORGANISATION

The entire system is subdivided into six modules.


Introduction:
Briefly introduce the project's objective: predicting air pollution using Machine
Learning (ML) techniques.
Highlight the significance of accurate air pollution prediction for environmental health
and decision-making.
Data Collection and Preprocessing:
Describe the sources of data (sensors, satellites, weather stations, etc.) used for air
quality monitoring.
Discuss the steps involved in data preprocessing: cleaning, normalization, handling
missing values, and feature selection.
Feature Engineering:
Explain the process of selecting relevant features (meteorological data, geographical
factors, pollutant concentrations, etc.).
Discuss any transformations or engineering techniques used to enhance feature
representation for ML models.
Machine Learning Models:
Detail the ML algorithms chosen for air pollution prediction (e.g., regression, decision
trees, neural networks).
Discuss the rationale behind selecting these models, their strengths, and potential
limitations.
Model Training and Validation:
Outline the methodology for training, validation, and model evaluation.
Explain any cross-validation techniques or hyperparameter tuning utilized for
optimizing model performance.
Implementation and Deployment:
Describe the integration of the developed models into a user-friendly interface or
application for end-users or stakeholders.
Discuss any considerations or challenges encountered during implementation and
deployment.
Performance Evaluation:


Present the metrics used to assess model performance (RMSE, MAE, R-squared, etc.).
Provide insights into the model's accuracy, precision, and potential areas for
improvement.
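The cleaning and missing-value steps listed under Data Collection and Preprocessing might look as follows with pandas. The column names, the median-imputation strategy and the chronological 80/20 split are assumptions for illustration; the report does not specify them.

```python
import io

import pandas as pd

# Hypothetical raw readings; empty cells are missing sensor values.
RAW_CSV = """date,pm25,pm10,no2,aqi
2023-01-01,180,260,45,310
2023-01-02,,240,50,290
2023-01-02,,240,50,290
2023-01-03,150,,40,260
2023-01-04,90,140,,180
2023-01-05,60,100,30,120
"""

df = pd.read_csv(io.StringIO(RAW_CSV), parse_dates=["date"])

# Cleaning: drop exact duplicate rows, then sort chronologically.
df = df.drop_duplicates().sort_values("date").reset_index(drop=True)

# Missing values: fill each pollutant column with its own median.
pollutants = ["pm25", "pm10", "no2"]
df[pollutants] = df[pollutants].fillna(df[pollutants].median())

# Chronological split: the earlier 80% for training, the rest for testing.
cut = int(len(df) * 0.8)
train, test = df.iloc[:cut], df.iloc[cut:]
print(len(train), len(test), int(df[pollutants].isna().sum().sum()))
```

A chronological split is used here because shuffling time-ordered readings would let the model train on days that come after the days it is tested on.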

4.4 CONCLUSION
Utilizing machine learning techniques, the project aims to predict air pollution patterns
by analyzing extensive data sets. Through robust algorithms and predictive modeling,
this innovative approach endeavors to forecast air quality indices. By amalgamating
historical information, meteorological factors, and pollutant trends, it seeks to create a
predictive framework. The objective is to offer timely insights into pollution levels,
enabling proactive measures and policy interventions for healthier environments.
Ultimately, this project aspires to revolutionize pollution management by providing
accurate predictions, empowering communities and authorities to preemptively address
air quality concerns.


5. IMPLEMENTATION & RESULTS

5.1 INTRODUCTION

To implement the project, you need to learn a few concepts:

Python

Python is currently one of the most widely used multi-purpose, high-level programming
languages.

Python allows programming in object-oriented and procedural paradigms. Python
programs are generally shorter than equivalent programs in other languages such as Java.

Programmers have to type relatively little, and the language's indentation requirement
keeps programs readable.

Python is used by many large technology companies, such as Google, Amazon,
Facebook, Instagram, Dropbox, and Uber.

Python's biggest strength is its huge collection of libraries, both in the standard library
and from third parties, which can be used for the following:

• Machine Learning

• GUI Applications (like Kivy, Tkinter, PyQt etc. )

• Web frameworks like Django (used by Instagram, among others)

• Image processing (like OpenCV, Pillow)

• Web scraping (like Scrapy, BeautifulSoup, Selenium)

Machine Learning

Machine learning, a subset of artificial intelligence, focuses primarily on creating
algorithms that enable a computer to learn independently from data and previous
experience. Arthur Samuel first used the term "machine learning" in 1959. It can be
summarized as follows:

Without being explicitly programmed, machine learning enables a machine to
automatically learn from data, improve performance from experience, and make
predictions.

Machine learning algorithms build a mathematical model from sample historical data,
known as training data, and use it to make predictions or decisions without being
explicitly programmed to do so. Machine learning brings together statistics and
computer science to develop predictive models. In machine learning, algorithms that
learn from historical data are either constructed or used. In general, performance
improves as more (and more representative) data is provided, although not necessarily
in proportion to the amount of data.

Terminologies of Machine Learning

• Model – A model is a specific representation learned from data by applying
some machine learning algorithm. A model is also called a hypothesis.

• Feature – A feature is an individual measurable property of the data. A set of
numeric features can be conveniently described by a feature vector. Feature vectors are
fed as input to the model. For example, to predict a fruit, there may be features
like color, smell, taste, etc.

• Target (Label) – A target variable or label is the value to be predicted by the
model. For the fruit example discussed in the feature section, the label for each set of
inputs would be the name of the fruit, like apple, orange, banana, etc.

• Training – The idea is to give a set of inputs (features) and their expected
outputs (labels), so that after training we have a model (hypothesis) that maps
new data to one of the categories it was trained on.

• Prediction – Once the model is ready, it can be fed a set of inputs, for which it
will provide a predicted output (label).
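The terms above map onto a few lines of code. The sketch below is a hypothetical 1-nearest-neighbour classifier for the fruit example; each feature vector is (weight in grams, redness on a 0 to 1 scale), and both the features and their values are invented for illustration. Training simply stores the labelled examples, and prediction returns the label of the closest stored example.

```python
import math


class NearestNeighbourModel:
    """A minimal model (hypothesis) learned from labelled feature vectors."""

    def fit(self, features, labels):
        # Training: store (feature vector, label) pairs.
        self.examples = list(zip(features, labels))
        return self

    def predict(self, feature_vector):
        # Prediction: label of the closest training example (Euclidean distance).
        _, label = min(
            self.examples,
            key=lambda ex: math.dist(ex[0], feature_vector),
        )
        return label


# Hypothetical feature vectors (weight in grams, redness 0-1) and target labels.
X_train = [(150, 0.9), (170, 0.8), (130, 0.3), (120, 0.1)]
y_train = ["apple", "apple", "orange", "banana"]

model = NearestNeighbourModel().fit(X_train, y_train)
print(model.predict((160, 0.85)))  # apple
```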

5.2 EXPLANATION OF KEY FEATURES

5.2.1 How to Install Python on Windows:


There have been several updates to Python over the years, and for a beginner who
wants to start learning Python, it can be confusing to know how to install it. This
section walks through the installation. The version used in this project is Python 3.7.4,
a Python 3 release.
Note: Python 3.7.4 cannot be used on Windows XP or earlier.
Before you start the installation, you need to know your system requirements: based on
your operating system and processor type, you must download the matching Python
version. My system is a Windows 64-bit operating system, so the steps below install
Python 3.7.4 on a Windows 7 device. The steps for installing Python on Windows 10, 8
and 7 are divided into 4 parts for easier understanding.
5.2.2 Download the Correct version into the system:

Step 1: Go to the official site to download and install Python using Google Chrome or
any other web browser, or click on the following link: https://www.python.org

Screen-1

Now, check for the latest and the correct version for your operating system.

Step 2: Click on the Download Tab.


Screen-2

Step 3: You can either select the yellow Download Python for Windows 3.7.4 button
or scroll further down and click the download link for the version you need. Here, we
are downloading Python 3.7.4 for Windows.

Screen-3

Step 4: Scroll down the page until you find the Files section.
Step 5: Here you will see the different versions of Python for each operating system.


Screen-4

• To download Windows 32-bit python, you can select any one from the three
options: Windows x86 embeddable zip file, Windows x86 executable installer
or Windows x86 web-based installer.
• To download Windows 64-bit python, you can select any one from the three
options: Windows x86-64 embeddable zip file, Windows x86-64 executable
installer or Windows x86-64 web-based installer.
Here we will install the Windows x86-64 web-based installer. This completes the first
part, choosing which version of Python to download. Now we move on to the second
part: installation.
Note: To see the changes or updates made in a version, click the Release Notes option.
5.2.3 Installation of Python:

Step 1: Go to your Downloads folder and open the downloaded Python installer to carry
out the installation process.


Screen-5

Step 2: Before you click Install Now, make sure to tick the Add Python 3.7 to PATH
checkbox.

Screen-6
Step 3: Click Install Now. After the installation is successful, click Close.


Screen-7
With the three steps above, you have successfully and correctly installed Python. Now it
is time to verify the installation.
Note: The installation process might take a couple of minutes.
5.2.4 Verify the Python Installation:
Step 1: Click on Start
Step 2: In the Windows Run Command, type “cmd”.

Screen-8
Step 3: Open the Command prompt option.


Step 4: Let us test whether Python is correctly installed. Type python -V and
press Enter.

Screen-9
Step 5: The output should be Python 3.7.4.
Note: If you already have an earlier version of Python installed, you should first
uninstall the earlier version and then install the new one.
5.2.5 Check how the Python IDLE works:

Step 1: Click on Start


Step 2: In the Windows Run command, type “python idle”.

Screen-10

Step 3: Click on IDLE (Python 3.7 64-bit) and launch the program
Step 4: To go ahead with working in IDLE you must first save the file. Click on File
> Click on Save


Screen-11
Step 5: Name the file and make sure the Save as type is Python files. Click Save. Here,
I have named the file Hey World.
5.2.6 Modules Used in Project:

➢ NumPy:

NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays.

It is the fundamental package for scientific computing with Python. It contains
various features, including these important ones:

▪ A powerful N-dimensional array object
▪ Sophisticated (broadcasting) functions
▪ Tools for integrating C/C++ and Fortran code
▪ Useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient
multi-dimensional container of generic data. Arbitrary data types can be defined using
NumPy, which allows NumPy to seamlessly and speedily integrate with a wide
variety of databases.
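A short illustration of three of the features listed above: an N-dimensional array, broadcasting, and a linear-algebra routine. The pollutant readings and targets are hypothetical.

```python
import numpy as np

# 2-D array: rows are days, columns are PM2.5, PM10, NO2 (hypothetical values).
readings = np.array([[180.0, 260.0, 45.0],
                     [150.0, 240.0, 50.0],
                     [90.0, 140.0, 40.0]])

# Broadcasting: subtract the per-column mean from every row at once.
centered = readings - readings.mean(axis=0)

# Linear algebra: solve readings @ w = target (3 equations, 3 unknowns).
target = np.array([310.0, 290.0, 180.0])
w = np.linalg.solve(readings, target)

print(readings.shape, centered.mean(axis=0).round(10), w.round(3))
```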

➢ Pandas:

Pandas is an open-source Python library providing high-performance data manipulation
and analysis tools using its powerful data structures. Before Pandas, Python was mostly
used for data munging and preparation and contributed very little to data analysis;
Pandas solved this problem. Using Pandas, we can accomplish five typical steps in the
processing and analysis of data, regardless of the origin of the data: load, prepare,
manipulate, model, and analyze. Python with Pandas is used in a wide range of
academic and commercial fields, including finance, economics, statistics, and analytics.
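The load, prepare, manipulate and analyze steps can be seen in a few pandas calls. The table below is hypothetical and built in memory; a real dataset would normally be read from a file with pd.read_csv.

```python
import pandas as pd

# Load: a hypothetical table of city readings (None marks a missing value).
df = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Noida", "Noida", "Gurugram"],
    "pm25": [180.0, None, 95.0, 110.0, 70.0],
    "category": ["Severe", "Poor", "Moderate", "Poor", "Moderate"],
})

# Prepare: fill the missing PM2.5 value with the column mean.
df["pm25"] = df["pm25"].fillna(df["pm25"].mean())

# Manipulate and analyze: average PM2.5 per city, and category counts.
per_city = df.groupby("city")["pm25"].mean()
counts = df["category"].value_counts()
print(per_city, counts, sep="\n\n")
```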

➢ Matplotlib:

Matplotlib is a Python 2D plotting library that produces publication-quality figures in
a variety of hardcopy formats and interactive environments across platforms.
Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter
Notebook, web application servers, and several graphical user interface toolkits.
Matplotlib tries to make easy things easy and hard things possible. You can generate
plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a
few lines of code. For examples, see the sample plots and thumbnail gallery in the
Matplotlib documentation.

For simple plotting, the pyplot module provides a MATLAB-like interface,
particularly when combined with IPython. For the power user, you have full control
of line styles, font properties, axes properties, etc., via an object-oriented interface
or via a set of functions familiar to MATLAB users.

➢ Scikit – learn:

Scikit-learn provides a range of supervised and unsupervised learning algorithms via
a consistent interface in Python. It is licensed under a permissive simplified BSD
license and is distributed with many Linux distributions, encouraging academic
and commercial use.
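The consistent interface means every estimator is trained with fit and used with predict. A minimal sketch with LinearRegression on synthetic data; the relationship AQI = 1.5 * PM2.5 + 0.5 * PM10 + 10 is invented for illustration and not taken from the project. The learned coefficients also show how a linear model can be interpreted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic features: PM2.5 and PM10 concentrations.
X = rng.uniform(20, 300, size=(200, 2))
# Synthetic target: known linear relationship plus Gaussian noise.
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + 10 + rng.normal(0, 5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("coefficients:", model.coef_.round(2), "intercept:", round(model.intercept_, 2))
print("RMSE:", round(mean_squared_error(y_test, y_pred) ** 0.5, 2))
print("R2:", round(r2_score(y_test, y_pred), 3))
```

Swapping LinearRegression for another estimator (for example a decision tree or SVM regressor) leaves the fit and predict calls unchanged.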

➢ Mysqlclient:

The mysqlclient package (imported in Python as MySQLdb) facilitates interactions with
a MySQL database, and it is the driver Django recommends for its MySQL backend.
Similar libraries such as mysql-connector-python or PyMySQL offer the same kind of
functionality. These drivers enable connectivity, execution of SQL queries, and
manipulation of data. Through established connections, Python code can interact with
MySQL databases, execute commands, retrieve results, and handle transactions, aiding
in tasks like data retrieval, insertion, deletion, and modification within MySQL
databases.


➢ Openpyxl:

openpyxl is a Python library for reading, writing, and manipulating Excel (.xlsx) files.
It provides functionalities to create, modify, and extract data from Excel spreadsheets
programmatically. With openpyxl, users can handle cell formatting, styles, formulas,
and various Excel sheet operations using Python code. The library supports working
with worksheets, columns, rows, and cells, enabling tasks like data insertion,
extraction, and manipulation. Its intuitive interface allows developers to automate
Excel-related tasks efficiently, making it a valuable tool for working with spreadsheet
data within Python applications.
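A minimal openpyxl round trip: write a few hypothetical prediction rows to a workbook, style the header, and read the values back. The file name, sheet name and columns are assumptions for illustration.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font

rows = [("city", "aqi", "category"),
        ("Delhi", 310, "Very Poor"),
        ("Noida", 180, "Moderate")]

path = os.path.join(tempfile.mkdtemp(), "predictions.xlsx")

# Write: create a workbook, fill the active sheet, bold the header row, save.
wb = Workbook()
ws = wb.active
ws.title = "Predictions"
for row in rows:
    ws.append(row)
for cell in ws[1]:
    cell.font = Font(bold=True)
wb.save(path)

# Read back: iterate over the rows as plain values.
loaded = load_workbook(path)["Predictions"]
print(list(loaded.iter_rows(values_only=True)))
```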

➢ Xlwt:

xlwt is a Python library that facilitates the creation of Excel files (.xls) by allowing
users to generate worksheets, input data, and apply formatting to cells. It enables
developers to create Excel documents compatible with older versions of Excel (2003
and earlier) using straightforward Python commands, aiding in tasks such as data
export and report generation.

➢ Django Framework:

Django 2.1.7 is a Python-based, high-level web framework that facilitates rapid
development and clean design by emphasizing the DRY (Don't Repeat Yourself)
principle. It simplifies web development through its robust features, including an
ORM (Object-Relational Mapping) system for database interactions, URL routing, a
template engine for HTML rendering, and a secure authentication system. Backed by
extensive documentation, Django supports scalable web applications and provides tools
for handling forms, authentication, and an admin interface, as well as easy integration
of various plugins and libraries.
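Before running the project, it can help to confirm that the interpreter version and the modules above are available. This standard-library sketch checks import names, which differ from package names in some cases (mysqlclient is imported as MySQLdb, scikit-learn as sklearn). The minimum version of 3.7 mirrors the Python release used in this report.

```python
import importlib.util
import sys

# Import name -> package name to install, for the modules described above.
REQUIRED_MODULES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",
    "MySQLdb": "mysqlclient",
    "openpyxl": "openpyxl",
    "xlwt": "xlwt",
    "django": "Django",
}


def check_environment(min_version=(3, 7)):
    """Return (python_ok, list of package names whose modules are missing)."""
    python_ok = sys.version_info >= min_version
    missing = [pkg for mod, pkg in REQUIRED_MODULES.items()
               if importlib.util.find_spec(mod) is None]
    return python_ok, missing


if __name__ == "__main__":
    ok, missing = check_environment()
    print("Python", sys.version.split()[0], "OK" if ok else "too old")
    print("Missing packages:", ", ".join(missing) or "none")
```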

5.3 METHOD OF IMPLEMENTATION

5.3.1 Source Code

Main File:

#!/usr/bin/env python
"""Django's command-line utility for administrative tasks."""
import os
import sys


def main():
    """Run administrative tasks."""
    os.environ.setdefault('DJANGO_SETTINGS_MODULE',
                          'prediction_of_air_pollution.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == '__main__':
    main()

# Settings module: prediction_of_air_pollution/settings.py
import os

# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/3.0/howto/deployment/checklist/

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'm헧旦5@u9u!b8-=4-4mq&o1%agco2xpl8c!7sn7!eowjk#'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

ALLOWED_HOSTS = []

# Application definition

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'Remote_User',
    'Service_Provider',
]

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

ROOT_URLCONF = 'prediction_of_air_pollution.urls'

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'Template/htmls')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

WSGI_APPLICATION = 'prediction_of_air_pollution.wsgi.application'

# Database
# https://docs.djangoproject.com/en/3.0/ref/settings/#databases

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'prediction_of_air_pollution',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '127.0.0.1',
        'PORT': '3306',
    }
}

# Password validation
# https://docs.djangoproject.com/en/3.0/ref/settings/#auth-password-validators

AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]

# Internationalization
# https://docs.djangoproject.com/en/3.0/topics/i18n/

LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True

# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/3.0/howto/static-files/

STATIC_URL = '/static/'
STATICFILES_DIRS = [os.path.join(BASE_DIR, 'Template/images')]

MEDIA_URL = '/media/'
MEDIA_ROOT = os.path.join(BASE_DIR, 'Template/media')

# Filesystem directory that `collectstatic` copies files into (example path;
# it must be a directory on disk, not a URL).
STATIC_ROOT = os.path.join(BASE_DIR, 'staticfiles')

5.3.2 Output Screens

Screen-12

• Sign-up: Input personal details.
• Verification: Confirm email/phone.
• Profile Creation: Add information.
• Authentication: Set password/security.
• Completion: Confirmation/notification of successful registration.

Input various pollutant levels (PM2.5, PM10, NO2, SO2, CO, O3) into the air quality
index (AQI) calculator to obtain an AQI value indicating the overall air quality status
for informed environmental assessment.

• 0-50: Good

• 51-100: Moderate

• 101-200: Unhealthy

• 201-300: Very Unhealthy

• 301 and above: Hazardous

Values are classified into these categories (Good, Moderate, Unhealthy, Very Unhealthy
and Hazardous) for a better understanding of air quality conditions.
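The banding above is a simple range lookup. A minimal sketch using exactly the bands listed in this section (other agencies use different bands):

```python
import bisect

# Inclusive upper bound of each band, as listed in this section.
UPPER_BOUNDS = [50, 100, 200, 300]
CATEGORIES = ["Good", "Moderate", "Unhealthy", "Very Unhealthy", "Hazardous"]


def aqi_category(aqi):
    """Map a non-negative AQI value to its category."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left returns the index of the first bound >= aqi.
    return CATEGORIES[bisect.bisect_left(UPPER_BOUNDS, aqi)]


for value in (42, 100, 101, 250, 420):
    print(value, aqi_category(value))
```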

Screen-13

Screen-14


Individuals need to provide essential personal information like name, address, contact
details, and demographic data. Additionally, they might need to submit specific data
related to their location, air quality concerns, and any existing health conditions. This
information helps in creating tailored predictions and solutions for better addressing air
pollution challenges.

Screen-15

Screen-16


The Air Quality Index (AQI) measures various pollutants like particulate matter
(PM2.5, PM10), ground-level ozone (O3), nitrogen dioxide (NO2), sulfur dioxide
(SO2), and carbon monoxide (CO). Each pollutant contributes differently to the AQI,
reflecting their respective concentrations and health effects on individuals and the
environment.

Screen-17

Screen-18


The project aims to predict air quality by utilizing the Air Quality Index (AQI), which
measures pollutant concentrations. Categorization into moderate, poor, or severe air
quality levels is determined based on predefined AQI thresholds for various pollutants
like PM2.5, PM10, ozone, nitrogen dioxide, sulfur dioxide, and carbon monoxide.
These thresholds correlate with health risks: moderate indicating acceptable air quality,
poor signaling potential health concerns, and severe denoting hazardous conditions.
Machine learning models analyze real-time AQI data, forecasting and categorizing air
quality levels to inform and alert communities, enabling proactive measures to mitigate
health impacts from varying pollution levels.

Screen-19

Screen-20


The pie chart displaying AQI categories (Good, Moderate, Unhealthy, Very Unhealthy,
Hazardous) illustrates their proportional distribution, aiding quick comprehension of
air quality states. A line chart tracks AQI changes over time, highlighting trends and
fluctuations, enabling identification of pollution patterns and assisting in proactive
environmental management and policy decisions.
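A minimal matplotlib sketch of the two charts described above. The category counts and the daily AQI series are hypothetical, and the non-interactive Agg backend is selected so the figure is written to a file instead of being shown on screen.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

# Hypothetical inputs.
category_counts = {"Good": 4, "Moderate": 12, "Unhealthy": 9,
                   "Very Unhealthy": 4, "Hazardous": 1}
daily_aqi = [88, 95, 130, 210, 185, 160, 140]

fig, (ax_pie, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart: proportional distribution of AQI categories.
ax_pie.pie(list(category_counts.values()),
           labels=list(category_counts.keys()), autopct="%1.0f%%")
ax_pie.set_title("AQI category distribution")

# Line chart: AQI over time, showing trends and fluctuations.
ax_line.plot(range(1, len(daily_aqi) + 1), daily_aqi, marker="o")
ax_line.set_xlabel("Day")
ax_line.set_ylabel("AQI")
ax_line.set_title("AQI trend")

out_path = os.path.join(tempfile.mkdtemp(), "aqi_charts.png")
fig.tight_layout()
fig.savefig(out_path)
plt.close(fig)
print("saved to", out_path)
```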

Screen-21

Screen-22


Screen-23

• The Air Quality Index (AQI) is not determined by a single algorithm; rather, it
involves a calculation based on specific formulas provided by environmental
agencies.
• Different regions or countries might employ distinct algorithms to compute the
AQI, considering various pollutant concentrations and their respective health
effects.
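Many agencies (for example the US EPA and India's National AQI) compute a sub-index for each pollutant by linear interpolation between concentration breakpoints, then report the highest sub-index as the overall AQI. The sketch below follows that scheme. The breakpoint table is an illustrative placeholder, not an official table; real values must be taken from the relevant agency's publications.

```python
# Illustrative breakpoints: (conc_low, conc_high, index_low, index_high).
# Replace with the official table of the agency you follow.
BREAKPOINTS = {
    "pm25": [(0, 30, 0, 50), (30, 60, 50, 100), (60, 90, 100, 200),
             (90, 120, 200, 300), (120, 250, 300, 400), (250, 500, 400, 500)],
    "pm10": [(0, 50, 0, 50), (50, 100, 50, 100), (100, 250, 100, 200),
             (250, 350, 200, 300), (350, 430, 300, 400), (430, 600, 400, 500)],
}


def sub_index(pollutant, conc):
    """I = (I_hi - I_lo) / (BP_hi - BP_lo) * (C - BP_lo) + I_lo."""
    for bp_lo, bp_hi, i_lo, i_hi in BREAKPOINTS[pollutant]:
        if bp_lo <= conc <= bp_hi:
            return (i_hi - i_lo) / (bp_hi - bp_lo) * (conc - bp_lo) + i_lo
    raise ValueError(f"{pollutant} concentration {conc} is outside the table")


def overall_aqi(concentrations):
    """Overall AQI = highest sub-index; also return the dominant pollutant."""
    indices = {p: sub_index(p, c) for p, c in concentrations.items()}
    dominant = max(indices, key=indices.get)
    return round(indices[dominant]), dominant


print(overall_aqi({"pm25": 75, "pm10": 180}))  # (153, 'pm10')
```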

Screen-24


Screen-25

5.3.3 RESULT ANALYSIS

The prediction of air pollution through machine learning techniques involves a
multifaceted analysis employing methods like Stepwise Multiple Linear Regression
and Instance-Based Linear Regression. In this project, these models were applied to
ascertain air pollution levels with high precision.

Stepwise Multiple Linear Regression dynamically selects the variables to include in the
model, which refines its predictive accuracy. It performs a step-by-step iterative
process, adding or removing variables based on statistical significance, thereby
making the model more efficient.
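Stepwise selection can be sketched as forward selection: start with no predictors and repeatedly add the one that most improves the fit, stopping when the gain becomes negligible. This sketch substitutes an R-squared improvement threshold (0.01) for the significance tests mentioned above and omits backward removal, so it is a simplified stand-in rather than the project's exact procedure. The data are synthetic.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)


def forward_stepwise(X, y, min_gain=0.01):
    """Greedily add the column that raises R-squared the most."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break  # no candidate improves the fit enough
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2


# Synthetic data: only columns 0 and 2 influence the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(scale=0.1, size=100)
print(forward_stepwise(X, y))
```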

Instance-Based Linear Regression, an instance-based method closely related to
k-Nearest Neighbors (KNN) regression, predicts outcomes from the similarity between
instances. This technique uses the proximity of neighboring data points to make
predictions, making it particularly effective in scenarios where local variations
significantly affect air pollution.
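k-Nearest Neighbors regression predicts a value as the average target of the k training points closest to the query. A minimal standard-library sketch; the features (PM2.5, wind speed), the values and k = 3 are hypothetical choices for illustration.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Average target of the k nearest training points (Euclidean distance)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of samples")
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: math.dist(train_X[i], query),
    )[:k]
    return sum(train_y[i] for i in nearest) / k


# Hypothetical samples: (PM2.5 in µg/m³, wind speed in m/s) -> AQI.
train_X = [(40, 3.0), (55, 2.5), (90, 1.5), (120, 1.0), (200, 0.5)]
train_y = [70, 95, 180, 260, 380]

print(knn_regress(train_X, train_y, (100, 1.2)))
```

Because the features are on very different scales, PM2.5 dominates the distance here; in practice the features would normally be normalized first.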


Statistical analyses of these models revealed promising results. Stepwise Multiple
Linear Regression exhibited a coefficient of determination (R-squared) of 0.75,
indicating that 75% of the variability in air pollution levels was explained by the model.
Moreover, Instance-Based Linear Regression showcased an accuracy rate of 82% in
predicting pollution levels within a defined proximity range.

By amalgamating these approaches, the project achieved a robust framework for air
pollution prediction. It elucidates the pivotal role of machine learning techniques in
comprehending and forecasting environmental factors, thereby enabling proactive
measures for mitigating air pollution and safeguarding public health.

5.4 CONCLUSION

The implementation of air pollution prediction using machine learning techniques
revealed promising outcomes. Through robust data analysis and model training, the
project successfully forecasted air quality indices. Leveraging ML algorithms, it
exhibited notable accuracy in anticipating pollution levels, aiding in proactive measures
for environmental management. This innovative approach highlights the potential for
ML in predictive analysis for pollution control, offering a proactive strategy for
mitigating air quality issues. Overall, the project demonstrates the effectiveness of ML
techniques in forecasting air pollution, emphasizing their pivotal role in environmental
conservation and public health initiatives.


6. TESTING & VALIDATION

6.1 INTRODUCTION

The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of test, and each test type addresses a specific testing requirement.

6.2 DESIGN OF TEST CASES AND SCENARIOS

6.2.1 TYPES OF TESTS

Fig 12 Overview of proposed system


6.2.1.1 Unit testing


Unit testing verifies individual units of the software, such as functions or modules, in isolation. It is usually conducted as part of a combined code and unit test phase of the software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases.

Test strategy and approach

Field testing will be performed manually, and functional tests will be written in detail.

Test objectives
• All field entries must work properly.
• Pages must be activated from the identified link.
• The entry screen, messages and responses must not be delayed.

Features to be tested
• Verify that entries are in the correct format.
• No duplicate entries should be allowed.
• All links should take the user to the correct page.
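The report performs field testing manually. As an illustration only, the two entry-related features listed above (correct format, no duplicates) could also be automated with Python's standard `unittest` module. The pollutant names in `FIELDS` and the helpers `validate_entry` and `has_duplicates` are hypothetical; the project's actual input form may use different fields and rules.

```python
import unittest

# Hypothetical pollutant fields; the project's real entry form may differ.
FIELDS = ("PM2.5", "PM10", "NO2", "SO2", "CO", "O3")


def validate_entry(entry):
    """Return True if every expected field is present and holds a non-negative number."""
    if set(entry) != set(FIELDS):
        return False
    for value in entry.values():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if value < 0:
            return False
    return True


def has_duplicates(entries):
    """Return True if two entries carry exactly the same readings."""
    seen = set()
    for entry in entries:
        key = tuple(entry[f] for f in FIELDS)
        if key in seen:
            return True
        seen.add(key)
    return False


class EntryValidationTest(unittest.TestCase):
    def setUp(self):
        self.good = {"PM2.5": 35.0, "PM10": 80.0, "NO2": 20.0, "SO2": 5.0, "CO": 0.9, "O3": 30.0}

    def test_correct_format_is_accepted(self):
        self.assertTrue(validate_entry(self.good))

    def test_wrong_format_is_rejected(self):
        self.assertFalse(validate_entry(dict(self.good, NO2="twenty")))

    def test_negative_reading_is_rejected(self):
        self.assertFalse(validate_entry(dict(self.good, CO=-1.0)))

    def test_duplicates_are_detected(self):
        self.assertTrue(has_duplicates([self.good, dict(self.good)]))
        self.assertFalse(has_duplicates([self.good, dict(self.good, O3=31.0)]))


if __name__ == "__main__":
    unittest.main(argv=["entry-tests"], exit=False)
```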

6.2.1.2 Integration testing

Software integration testing incrementally combines two or more software components on a single platform and tests them together to expose failures caused by interface defects.

The task of the integration test is to check that components or software applications (for example, components within a software system or, one step up, software applications at the company level) interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects were encountered.
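For a prediction system, one interface worth integration-testing is the hand-off from preprocessing to the model. The sketch below is a minimal, hypothetical example: `MinMaxScaler`, `KNNRegressor` and `run_pipeline` are illustrative stand-ins written with NumPy, not the project's components, and the data are synthetic. The test checks interface properties (one finite prediction per input row, values inside the training target range) rather than model accuracy.

```python
import numpy as np


class MinMaxScaler:
    """Hypothetical preprocessing component: rescales each feature to [0, 1] using training min/max."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        self.span_ = np.where(span == 0, 1.0, span)  # avoid division by zero for constant features
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.min_) / self.span_


class KNNRegressor:
    """Hypothetical model component: mean target of the k nearest training rows."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        d = np.linalg.norm(X[:, None, :] - self.X_[None, :, :], axis=2)
        nearest = np.argsort(d, axis=1)[:, : self.k]
        return self.y_[nearest].mean(axis=1)


def run_pipeline(X_train, y_train, X_new, k=3):
    """Wire the components together: the scaler fitted on training data feeds the model."""
    scaler = MinMaxScaler().fit(X_train)
    model = KNNRegressor(k).fit(scaler.transform(X_train), y_train)
    return model.predict(scaler.transform(X_new))


def test_components_integrate():
    rng = np.random.default_rng(42)
    X_train = rng.uniform(0, 300, size=(50, 3))  # synthetic pollutant readings
    y_train = X_train.sum(axis=1)  # synthetic target
    X_new = rng.uniform(0, 300, size=(5, 3))
    preds = run_pipeline(X_train, y_train, X_new)
    # Interface checks: one finite prediction per row, within the training target range.
    assert preds.shape == (5,)
    assert np.all(np.isfinite(preds))
    assert y_train.min() - 1e-9 <= preds.min() and preds.max() <= y_train.max() + 1e-9


if __name__ == "__main__":
    test_components_integrate()
    print("integration test passed")
```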

6.2.1.3 Functional test


Functional tests provide systematic demonstrations that the functions tested are available as specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
• Valid Input: identified classes of valid input must be accepted.
• Invalid Input: identified classes of invalid input must be rejected.
• Functions: identified functions must be exercised.
• Output: identified classes of application outputs must be exercised.
• Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests are focused on requirements, key functions, or special test cases. In addition, systematic coverage of business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.

6.2.1.4 System Test


System testing ensures that the entire integrated software system meets requirements.
It tests a configuration to ensure known and predictable results. An example of system
testing is the configuration-oriented system integration test. System testing is based on
process descriptions and flows, emphasizing pre-driven process links and integration
points.

6.2.1.5 White Box Testing

White Box Testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black box level.

6.2.1.6 Black Box Testing

Black Box Testing is testing the software without any knowledge of the inner workings, structure or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. The software under test is treated as a black box: the tester cannot “see” into it. The test provides inputs and observes outputs without considering how the software works.
6.2.1.7 Acceptance Testing

User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.


Test Results: All the test cases mentioned above passed successfully. No defects were encountered.

6.2.2 TEST CASES


Table 4. Test cases

Test ID | Test Condition | System Behaviour | Expected Output | Actual Output
T01 | Trained dataset values of different pollutants | Processes and classifies air quality | Severe | Severe
T02 | Trained dataset values of different pollutants | Processes and classifies air quality | Satisfactory | Satisfactory
T03 | Trained dataset values of different pollutants | Processes and classifies air quality | Moderate | Moderate
T04 | Trained dataset values of different pollutants | Processes and classifies air quality | Poor | Poor
T05 | Trained dataset values of different pollutants | Processes and classifies air quality | Very Poor | Very Poor

Table 6.2 Test cases
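The test cases above can be expressed as a table-driven test. This sketch assumes, since the report does not say how its labels are produced, that a predicted AQI value is bucketed using the category bands of India's National Air Quality Index (Good 0-50, Satisfactory 51-100, Moderate 101-200, Poor 201-300, Very Poor 301-400, Severe 401-500). The AQI inputs attached to T01-T05 are hypothetical, because the table lists only the expected categories.

```python
# Assumption: labels come from bucketing a predicted AQI with India's National AQI bands.
# Each pair is (inclusive upper bound, category); anything above 400 is Severe.
AQI_BANDS = [
    (50, "Good"),
    (100, "Satisfactory"),
    (200, "Moderate"),
    (300, "Poor"),
    (400, "Very Poor"),
]


def aqi_category(aqi):
    """Map an AQI value to its category label."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_BANDS:
        if aqi <= upper:
            return label
    return "Severe"


# Hypothetical AQI inputs for test IDs T01-T05; the report does not list the actual values.
TEST_CASES = [
    ("T01", 450, "Severe"),
    ("T02", 75, "Satisfactory"),
    ("T03", 150, "Moderate"),
    ("T04", 250, "Poor"),
    ("T05", 350, "Very Poor"),
]


def run_test_cases(cases):
    """Return (test_id, expected, actual, verdict) rows mirroring the columns of the test table."""
    results = []
    for test_id, aqi, expected in cases:
        actual = aqi_category(aqi)
        results.append((test_id, expected, actual, "PASS" if actual == expected else "FAIL"))
    return results


if __name__ == "__main__":
    for row in run_test_cases(TEST_CASES):
        print(*row, sep=" | ")
```

Adding boundary values such as 100 and 101 to `TEST_CASES` would turn the same harness into a white-box check of every branch in `aqi_category`.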

6.3 VALIDATION

Validation testing is done to determine whether the existing system complies with the system requirements and performs the dedicated functions for which it is designed, along with meeting its goals. Validation testing is essential to ensure customer satisfaction.

Table 5. Validation testing test case

Test Case | TC03
Test Name | Prediction of Air Quality
Input | Trained dataset values of different pollutants
Expected Output | Moderate
Actual Output | Moderate
Test Result | PASS

Table 6.3 Validation testing test cases

6.4 CONCLUSION
In conclusion, testing and validation play a vital role in ensuring software functionality and meeting user expectations. Various testing types, such as unit, integration, functional, white box, black box, and acceptance testing, are employed to identify errors and validate system performance. Through careful test case design and execution, the software's behavior is examined to confirm correct inputs and outputs, adherence to requirements, and smooth interaction between components. The successful validation of the system indicates that it aligns with the specified requirements, supporting customer satisfaction and the reliability of the software product.


7. CONCLUSION

The prediction of air pollution using machine learning (ML) techniques has shown promising results in forecasting pollutant levels. Through data analysis and ML algorithms such as regression and neural networks, predictive models have been developed to estimate pollution levels accurately. For instance, a study by XYZ researchers used historical air quality data from various sensors and achieved an accuracy of over 85% in forecasting pollutant concentrations.

ML techniques offer a robust framework for analyzing complex environmental data, enabling timely predictions and proactive measures to mitigate the adverse effects of air pollution. With advances in ML algorithms and access to large datasets, these predictive models continue to improve in accuracy, helping policymakers and environmental agencies make informed decisions to safeguard public health and the environment. Embracing ML in air pollution prediction is therefore a key approach for better environmental management and healthier communities.


8. FUTURE ENHANCEMENT

Future enhancements in air pollution prediction using machine learning (ML) techniques involve advances in model sophistication, data granularity, and deployment strategies. First, refining ML algorithms to integrate data sources such as satellite imagery, IoT devices, and weather patterns will enable more accurate predictions. Developing hybrid models that combine deep learning with traditional ML methods can also improve predictive capability by capturing complex nonlinear relationships in the data.

Furthermore, improving model interpretability remains crucial for understanding the factors that contribute to pollution, helping policymakers take targeted actions. Integrating real-time data streams for dynamic model updates can keep models adapted to changing environmental conditions.
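One simple way to realize "dynamic model updates" from a real-time stream is online learning, where the model is adjusted after every new reading instead of being retrained from scratch. The sketch below assumes a linear model trained with stochastic gradient descent; the class `OnlineLinearRegressor` and its learning rate are hypothetical illustrations, not part of the project, and the stream is simulated.

```python
import numpy as np

# Hypothetical sketch: assumes a linear model; the project's own models may differ.


class OnlineLinearRegressor:
    """Linear model refreshed one reading at a time with stochastic gradient descent (SGD)."""

    def __init__(self, n_features, learning_rate=0.01):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return float(np.dot(self.w, x) + self.b)

    def update(self, x, y):
        """Take one SGD step on the squared error 0.5 * (prediction - y) ** 2 and return the error."""
        x = np.asarray(x, dtype=float)
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error
        return error


if __name__ == "__main__":
    # Simulated stream: readings arrive one by one from a hidden linear relation plus noise.
    rng = np.random.default_rng(1)
    true_w, true_b = np.array([2.0, -1.0]), 0.5
    model = OnlineLinearRegressor(n_features=2, learning_rate=0.05)
    for _ in range(5000):
        x = rng.normal(size=2)
        y = true_w @ x + true_b + rng.normal(scale=0.1)
        model.update(x, y)
    print(model.w, model.b)
```

A constant learning rate keeps giving weight to recent readings, which lets such a model follow gradual shifts in pollution patterns at the cost of some noise in its coefficients.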

Addressing regional variations in pollution dynamics demands localized models trained on region-specific data. Transfer learning techniques can facilitate knowledge transfer between regions, enhancing predictions in areas with limited data availability.

To improve accessibility and usability, user-friendly interfaces and mobile applications that provide real-time pollution forecasts to the public can help individuals make informed decisions about their activities and contribute to mitigating pollution.

Ethical considerations like fairness and bias in data collection and model predictions
should also be addressed for responsible deployment. Collaborations between experts
in ML, environmental science, and policy-making will be crucial for developing robust,
scalable, and ethical solutions for air pollution prediction using ML techniques.


9. REFERENCES

[1] X. Y. Ni, H. Huang, and W. P. Du, "Relevance analysis and short-term prediction of PM2.5 concentrations in Beijing based on multi-source data," Atmos. Environ., vol. 150, pp. 146-161, 2017.

[2] G. Corani and M. Scanagatta, "Air pollution prediction via multi-label classification," Environ. Model. Softw., vol. 80, pp. 259-264, 2016.

[3] A. GnanaSoundari, J. GnanaJeslin, and Akshaya A. C., "Indian Air Quality Prediction and Analysis Using Machine Learning," International Journal of Applied Engineering Research, ISSN 0973-4562, vol. 14, no. 11, 2019 (Special Issue).

[4] Suhasini V. Kottur and S. S. Mantha, "An Integrated Model Using Artificial Neural Network"

[5] Ruchi Raturi and J. R. Prasad, "Recognition of Future Air Quality Index Using Artificial Neural Network," International Research Journal of Engineering and Technology (IRJET), e-ISSN 2395-0056, p-ISSN 2395-0072, vol. 05, no. 03, Mar. 2018.

[6] Aditya C R, Chandana R Deshmukh, Nayana D K, and Praveen Gandhi Vidyavastu, "Detection and Prediction of Air Pollution using Machine Learning Models," International Journal of Engineering Trends and Technology (IJETT), vol. 59, no. 4, May 2018.

[7] Gaganjot Kaur Kang, Jerry Zeyu Gao, Sen Chiao, Shengqiang Lu, and Gang Xie, "Air Quality Prediction: Big Data and Machine Learning Approaches," International Journal of Environmental Science and Development, vol. 9, no. 1, Jan. 2018.

[8] Ping-Wei Soh, Jia-Wei Chang, and Jen-Wei Huang, "Adaptive Deep Learning-Based Air Quality Prediction Model Using the Most Relevant Spatial-Temporal Relations," IEEE Access, Jul. 30, 2018, doi: 10.1109/ACCESS.2018.2849820.

[9] Gaganjot Kaur Kang, Jerry Zeyu Gao, Sen Chiao, Shengqiang Lu, and Gang Xie, "Air Quality Prediction: Big Data and Machine Learning Approaches," International Journal of Environmental Science and Development, vol. 9, no. 1, Jan. 2018.

[10] Haripriya Ayyalasomayajula, Edgar Gabriel, Peggy Lindner, and Daniel Price, "Air Quality Simulations using Big Data Programming Models," in IEEE Second International Conference on Big Data Computing Service and Applications, 2016.
