
Random Forest Algorithm

The document describes using a random forest algorithm for air quality monitoring and forecasting. It provides an overview of random forest algorithms, including how they work, their model description, and the algorithm steps. As an example, it discusses using a random forest classifier to predict air quality index values based on various input variables.

Uploaded by: zalakthakkar
Copyright © All Rights Reserved

Random Forest: A Supervised Machine Learning Algorithm for Air Quality Monitoring and Forecasting
Prepared by: Zalak L. Thakker
Research scholar
Batch: February 2021
08/25/2021
Index
1. Introduction to AI, ML and Deep Learning
2. Types of Machine Learning
3. How a Supervised Algorithm Works
4. Workflow for a Supervised Machine Learning Algorithm
5. Supervised Learning Algorithms
6. Regression Algorithms
7. Random Forest Algorithm
8. Random Forest Algorithm Model Description
9. Algorithm for Random Forest
10. Random Forest Classifier
11. Example of Random Forest Algorithm
12. Example: Air Quality Index (AQI) Prediction Using a Random Forest Classifier
13. References

Types of machine learning [11]

Workflow for supervised machine learning algorithm

Supervised Learning Algorithms

• Linear Regression
• Decision Tree Regression
• Random Forest
• Gradient Boosting
• Support Vector Machine
• Logistic Regression
• Artificial Neural Networks
• Naïve Bayesian classifiers
• K-Nearest Neighbours

Regression Algorithms [5][6]

• Linear Regression
• Lasso Regression
• Support Vector Regression
• Neural Network
• Random Forest
• Decision Tree
• XGBoost

Linear Regression [6]
• Linear regression is a linear model, i.e. a model that assumes a linear relationship between the independent input variables (x) and the single dependent output variable (y), so that y can be predicted from a linear combination of the input variables (x).

• Linear regression is called 'Simple Linear Regression' when there is only a single input (independent) variable (x).

• In Multiple Linear Regression there are several independent variables (x) and a single dependent variable (y), i.e. AQI in my case.
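A minimal sketch of simple linear regression fitted by ordinary least squares: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. The function names and the toy data are illustrative assumptions, not taken from the referenced papers.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) that minimise the sum of squared errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = cov(x, y) / var(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    return slope * x + intercept


# Toy data generated from y = 2x + 1 (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Multiple linear regression applies the same least-squares idea to several inputs at once, usually with matrix methods rather than this closed form.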

Lasso Regression
• Least Absolute Shrinkage and Selection Operator (LASSO) is a type of linear regression that adds a penalty equal to the sum of the absolute values of the regression coefficients to the least-squares error. This penalty shrinks the coefficients towards zero and sets some of them exactly to zero, which selects the most useful predictors, reduces overfitting and can thus lower the prediction error.
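A minimal sketch of how the L1 penalty shrinks coefficients, using coordinate descent with soft-thresholding (one common way to fit LASSO, not necessarily the solver used in the cited work). It assumes the columns of X and y are already centred, so no intercept is fitted; the objective scaling, the function names and the toy data are assumptions for illustration.

```python
def soft_threshold(a, t):
    """Shrink a towards zero by t; values inside [-t, t] become exactly 0."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/(2n)) * sum((y - Xw)**2) + alpha * sum(|w_j|).

    Assumes X (list of rows) and y are centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            if z == 0:
                w[j] = 0.0  # an all-zero column carries no information
                continue
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w


# Toy centred data: y depends only on the first feature (y = 3 * x1)
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3, 3, -3, -3]
print(lasso_coordinate_descent(X, y, alpha=0.5))  # [2.5, 0.0]
```

With alpha=0.5 the second coefficient is exactly 0.0 because that feature says nothing about y, and the first is shrunk from 3 to 2.5; a larger penalty such as alpha=5.0 drives both coefficients to zero.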

Support Vector Regression
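• Support vector regression (in its basic, linear form) fits a function while ignoring errors that fall within a threshold ε of the target, i.e. it uses a margin-based (ε-insensitive) loss function. It minimizes the errors outside this margin while keeping the function as flat as possible (maximizing the margin), which helps it achieve high prediction accuracy.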
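A sketch of the quantity a linear SVR minimises, assuming the standard primal form: half the squared norm of the weights (the flatness, or margin, term) plus C times the sum of ε-insensitive losses. Only the objective is evaluated here; the weights, C, epsilon and data are illustrative values, and training would normally be done with a dedicated solver.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def linear_svr_objective(w, b, X, y, C, epsilon):
    """0.5 * ||w||^2 (keeps the function flat, i.e. a wide margin)
    + C * sum of epsilon-insensitive losses (penalises points outside the tube)."""
    regulariser = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        pred = sum(wj * xj for wj, xj in zip(w, xi)) + b
        loss += epsilon_insensitive_loss(yi, pred, epsilon)
    return regulariser + C * loss


# Illustrative values: only the second point lies outside the epsilon tube
X = [[1.0], [2.0], [3.0]]
y = [1.05, 2.5, 3.0]
print(linear_svr_objective([1.0], 0.0, X, y, C=1.0, epsilon=0.1))  # approximately 0.9
```

The first point misses by 0.05, which is inside the tube and costs nothing; the second misses by 0.5 and contributes 0.5 - 0.1 = 0.4, added to the 0.5 regularisation term.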

Neural Network
• A neural network is a supervised algorithm used in machine learning to make predictions based on existing data. The input layer takes the input variables from the existing data.
• The hidden layers transform these inputs using weights that are optimized with backpropagation, which adjusts the weights to reduce the prediction error.
• The output layer combines the values passed through the input and hidden layers to produce the prediction.
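A minimal sketch of the three layers and of backpropagation: one input, a hidden layer of two tanh units and one linear output, trained with plain stochastic gradient descent on squared error. The layer sizes, learning rate, fixed initial weights and toy data (y = 2x) are assumptions chosen for a reproducible illustration, not the network used for AQI.

```python
import math


def forward(params, x):
    """Input -> hidden layer of tanh units -> one linear output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    y_hat = sum(v * h for v, h in zip(w2, hidden)) + b2
    return hidden, y_hat


def train_step(params, x, y, lr=0.05):
    """One backpropagation + gradient-descent step on L = 0.5 * (y_hat - y)**2."""
    w1, b1, w2, b2 = params
    hidden, y_hat = forward(params, x)
    err = y_hat - y  # dL/dy_hat
    # Gradients of the output layer
    grad_w2 = [err * h for h in hidden]
    grad_b2 = err
    # Propagate the error back through tanh: d tanh(z)/dz = 1 - tanh(z)**2
    grad_z = [err * v * (1.0 - h * h) for v, h in zip(w2, hidden)]
    grad_w1 = [g * x for g in grad_z]
    grad_b1 = grad_z
    return (
        [w - lr * g for w, g in zip(w1, grad_w1)],
        [b - lr * g for b, g in zip(b1, grad_b1)],
        [v - lr * g for v, g in zip(w2, grad_w2)],
        b2 - lr * grad_b2,
    )


def mean_squared_error(params, data):
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)


# Toy data from y = 2x; fixed initial weights so the run is reproducible
data = [(i / 10, 2 * i / 10) for i in range(-5, 6)]
params = ([0.5, -0.3], [0.0, 0.1], [0.2, 0.4], 0.0)
loss_before = mean_squared_error(params, data)
for _ in range(200):
    for x, y in data:
        params = train_step(params, x, y)
loss_after = mean_squared_error(params, data)
print(loss_before, loss_after)
```

After training, the error on the toy data should be well below its starting value; a real AQI model would take many input variables and would normally be built with a library.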

Random Forest
• A Random Forest is an ensemble technique capable of performing both regression and classification tasks using multiple decision trees and a technique called Bootstrap Aggregation, commonly known as bagging.
• The random forest usually gives better results than a single decision tree because it does not rely on one tree; it combines many decision trees to produce a better prediction.
• The random forest model combines the predictions of many base models, each trained independently on a different bootstrap sample of the data. For regression, the final model averages the base models:
• g(x) = (1/K) * [f1(x) + f2(x) + ... + fK(x)]
• where g is the final model, K is the number of trees and each base model fk is a simple decision tree. For classification, the trees vote instead and the majority class is returned. This general technique of combining multiple models to obtain better predictive performance is called model ensembling.
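A sketch of the aggregation step, assuming each tree has already produced its prediction: averaging, g(x) = (1/K) * sum of fk(x), for regression, and majority voting for classification. The function names and prediction values are hypothetical.

```python
from collections import Counter


def aggregate_regression(tree_predictions):
    """Regression forest: the average of the K trees' predictions."""
    return sum(tree_predictions) / len(tree_predictions)


def aggregate_classification(tree_votes):
    """Classification forest: each tree votes and the most common class wins.
    Counter.most_common breaks ties in favour of the class seen first."""
    return Counter(tree_votes).most_common(1)[0][0]


print(aggregate_regression([120.0, 135.0, 141.0]))            # 132.0
print(aggregate_classification(["Poor", "Moderate", "Poor"]))  # Poor
```

Because the trees are trained independently, they could also be built in parallel; only this final combination step needs all of them.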

Decision Tree
• Decision tree regression trains a model in the structure of a tree to predict the target.

• It searches every distinct value of each predictor and chooses the split that best separates the target variable.

• For classification trees, the main objective is to maximize the information gain at each split as the tree keeps splitting in depth; regression trees instead choose the split that most reduces the variance (squared error) of the target.
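To make the split search concrete, a sketch for a classification tree on one predictor: every distinct value is tried as a threshold and the one with the highest information gain (entropy reduction) is kept. The function names, PM2.5 readings and Good/Poor labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


def best_threshold(values, labels):
    """Try every distinct value of one predictor as a split point (x <= t)
    and return (threshold, gain) with the highest information gain."""
    best = (None, 0.0)
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue
        gain = information_gain(labels, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical PM2.5 readings with a Good/Poor label (illustrative only)
pm25 = [20, 35, 40, 90, 110, 150]
label = ["Good", "Good", "Good", "Poor", "Poor", "Poor"]
print(best_threshold(pm25, label))  # (40, 1.0)
```

Splitting at PM2.5 <= 40 separates the two classes perfectly, so the gain equals the full 1 bit of entropy in the parent node.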

XGBoost
• eXtreme Gradient Boosting (XGBoost) is a boosting ensemble machine learning algorithm that trains the individual trees sequentially, each new tree correcting the errors of the ones before it.
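A sketch of the sequential idea behind boosting, using plain gradient boosting with squared loss and one-split regression trees (stumps). This is not XGBoost itself, which adds regularisation, second-order gradient information and many engineering optimisations; the function names, data, number of rounds and learning rate are assumptions.

```python
def fit_stump(xs, residuals):
    """Depth-1 regression tree: one threshold, the mean residual on each side.
    Assumes xs contains at least two distinct values."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: stumps are added one after another,
    each fitted to the residuals (negative gradients) of the ensemble so far."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def baseline(x):
    return sum(ys) / len(ys)  # always predicts the mean


# Hypothetical 1-D data with a jump between x = 3 and x = 4
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 12, 11, 30, 32, 31]
model = gradient_boost(xs, ys)
print(mse(baseline, xs, ys), mse(model, xs, ys))
```

The learning rate shrinks each stump's contribution, so the ensemble approaches the data gradually over many rounds instead of in one step.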

• The performance of each algorithm is measured using the following metrics:

• MEAN ABSOLUTE ERROR (MAE) is the average of the absolute differences between the actual and predicted values.

• ROOT MEAN SQUARED ERROR (RMSE) is the square root of the average of the squared differences between the actual and predicted values.
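The two metrics written out as functions; the actual and predicted AQI values are made up for illustration.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [100, 150, 200]
predicted = [110, 140, 230]
print(mean_absolute_error(actual, predicted))      # about 16.67
print(root_mean_squared_error(actual, predicted))  # about 19.15
```

Because errors are squared before averaging, RMSE is always at least as large as MAE and penalises large errors (here the 30-point miss) more heavily.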

Random Forest Algorithm
• Random Forest is also a tree-based algorithm that combines the qualities (features) of multiple decision trees to make decisions.
• It can therefore be described as a 'forest' of trees, hence the name "Random Forest". The term 'Random' comes from the fact that the forest is made of randomly created decision trees, each trained on a random sample of the data and using random subsets of the features.
• Random forests are an ensemble learning method for classification, regression and other tasks that works by constructing many decision trees and outputting the class that is the mode of the individual trees' classes (classification) or the mean of their predictions (regression).
• In simple terms: "The random forest algorithm is a supervised learning model; it uses labelled data to 'learn' how to classify unlabelled data."

Random Forest Algorithm Model Description
• The Random Forest algorithm uses an ensemble method, which is better than a single decision tree because averaging the results reduces over-fitting.
• It is based on the divide-and-conquer approach.
• A collection of decision tree classifiers is known as a forest.
• Attribute selection measures such as information gain, gain ratio and the Gini index are used to choose the splitting attribute when generating the individual decision trees.
• Each tree depends on an independent random sample.
• In a classification problem, each tree votes and the most popular class is chosen as the outcome.
• In the case of regression, the average of all the trees' outputs is taken as the final result [1].
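A sketch of two of the attribute selection measures named above, for a candidate split of a node into groups of class labels: the weighted Gini index of the child nodes (lower is better) and the gain ratio, i.e. information gain divided by split information (higher is better). The function names and labels are illustrative assumptions.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: probability that two labels drawn at random (with replacement) differ."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_scores(groups):
    """Return (weighted Gini of the children, gain ratio) for a split into groups."""
    parent = [y for g in groups for y in g]
    n = len(parent)
    weighted_gini = sum(len(g) / n * gini(g) for g in groups)
    info_gain = entropy(parent) - sum(len(g) / n * entropy(g) for g in groups)
    # Split information penalises splits into many small groups
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return weighted_gini, gain_ratio


print(split_scores([["Good", "Good"], ["Poor", "Poor"]]))          # (0.0, 1.0)
print(split_scores([["Good", "Good", "Poor"], ["Poor", "Poor"]]))
```

A perfect split gives pure children (Gini 0) and the maximum gain ratio; the mixed split scores worse on both measures.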

Algorithm for Random Forest [1]:
A. Let the number of training cases be N and the number of variables (features) in the classifier be M.

B. A number m of input variables is used to determine the decision at each node of the tree; m should be much smaller than M.

C. Select a training set for this tree by choosing N times with replacement from all N available training cases (a bootstrap sample).

D. Use the remaining (out-of-bag) cases to estimate the error of the tree by predicting their classes.

E. For each node of the tree, randomly choose m variables on which to base the decision at that node. The best split is calculated from these m variables in the training set.

F. Each tree is fully grown and not pruned.
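A compact sketch of steps A to F for classification, written from scratch so that each step is visible (comments mark the step letters). Assumptions: numeric features, class labels that are plain strings, the Gini index as the split measure, m = int(sqrt(M)) when m is not given, and hypothetical [PM2.5, PM10] data. One deviation from step F: a node that none of its m sampled features can split becomes a majority-class leaf. Library implementations such as scikit-learn's RandomForestClassifier are far more efficient.

```python
import random
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, m, rng):
    """Grow an unpruned tree (step F). Leaves are class labels;
    internal nodes are (feature, threshold, left_subtree, right_subtree)."""
    if len(set(y)) == 1:
        return y[0]
    best = None
    for f in rng.sample(range(len(X[0])), m):  # step E: m random features per node
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # the sampled features cannot split this node
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], m, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], m, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, m=None, seed=0):
    rng = random.Random(seed)
    N, M = len(X), len(X[0])  # step A
    if m is None:
        m = max(1, int(M ** 0.5))  # step B: m much smaller than M
    trees, oob_votes = [], [[] for _ in range(N)]
    for _ in range(n_trees):
        sample = [rng.randrange(N) for _ in range(N)]  # step C: N draws with replacement
        tree = build_tree([X[i] for i in sample], [y[i] for i in sample], m, rng)
        trees.append(tree)
        for i in set(range(N)) - set(sample):  # step D: predict the out-of-bag cases
            oob_votes[i].append(predict_tree(tree, X[i]))
    scored = [(majority(v), y[i]) for i, v in enumerate(oob_votes) if v]
    oob_error = sum(p != t for p, t in scored) / len(scored) if scored else None
    return trees, oob_error


def predict_forest(trees, row):
    """Majority vote of all trees."""
    return majority([predict_tree(tree, row) for tree in trees])


# Hypothetical [PM2.5, PM10] readings with AQI_Bucket-style labels
X = [[20, 40], [30, 55], [35, 60], [45, 80], [120, 200], [150, 260], [180, 300], [200, 350]]
y = ["Good", "Good", "Good", "Good", "Poor", "Poor", "Poor", "Poor"]
trees, oob_error = random_forest(X, y, n_trees=25, seed=42)
print(predict_forest(trees, [10, 30]), predict_forest(trees, [250, 400]), oob_error)
```

The out-of-bag error from step D comes for free: each training case is scored only by the trees whose bootstrap sample did not contain it.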


Random Forest Classifier

Example of Random Forest Algorithm

Example: Air Quality Index (AQI) Prediction Using a Random Forest Classifier

• Air Quality Index: an air quality index (AQI) is a number used by government agencies to communicate to the public how polluted the air currently is or how polluted it is forecast to become.
• As the AQI increases, an increasingly large percentage of the population is likely to be exposed, and people may experience increasingly severe health effects.
• The Central Pollution Control Board (CPCB), India, initiated the National Ambient Air Quality Monitoring (NAAQM) programme in 1984 with 7 stations at Agra and Anpara.

Importance of Air Quality Monitoring
• Determine whether air quality meets national standards
• Determine the highest pollutant concentrations
• Understand how pollutants behave and how they relate to the weather
• Validate pollution models, which are used to test 'what if' scenarios
• Forecast air quality
• Evaluate the effectiveness of air pollution control programs
• Evaluate the effects of air pollution on public health
• Track the progress of plans for meeting air quality standards
• Determine air quality trends
• Develop responsible and cost-effective pollution control strategies and policy decisions

Interpretation of Air Quality Using IND-AQI
• Formulating an AQI involves two main steps:

• formation of sub-indices (one for each pollutant), and

• aggregation of the sub-indices to obtain an overall AQI.

AQI Formula
(1) The sub-index (Ip) for a given pollutant concentration (Cp), based on the 'linear segmented principle', is calculated as:

Ip = [(IHI - ILO) / (BHI - BLO)] * (Cp - BLO) + ILO

where:
• BHI = breakpoint concentration greater than or equal to the given concentration
• BLO = breakpoint concentration smaller than or equal to the given concentration
• IHI = AQI value corresponding to BHI
• ILO = AQI value corresponding to BLO; subtract one from ILO if ILO is greater than 50

(2) Finally,
AQI = Max(Ip), where p = 1, 2, ..., n denotes the n pollutants.
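A sketch of the two AQI steps: a sub-index per pollutant from the linear segmented formula (including the subtract-one rule for ILO stated above), then the maximum over pollutants. The breakpoint values in the example are placeholders, not the official CPCB breakpoints, which should be taken from the CPCB AQI table [13]; the other sub-index values are hypothetical.

```python
def sub_index(cp, b_lo, b_hi, i_lo, i_hi):
    """Ip = [(IHI - ILO) / (BHI - BLO)] * (Cp - BLO) + ILO."""
    if not b_lo <= cp <= b_hi:
        raise ValueError("Cp must lie between BLO and BHI")
    if i_lo > 50:
        i_lo -= 1  # rule from the slide: subtract one from ILO if ILO > 50
    return (i_hi - i_lo) * (cp - b_lo) / (b_hi - b_lo) + i_lo


def overall_aqi(sub_indices):
    """AQI = Max(Ip); also report which pollutant determines it."""
    pollutant = max(sub_indices, key=sub_indices.get)
    return sub_indices[pollutant], pollutant


# PLACEHOLDER breakpoints for illustration only (not the official CPCB values)
ip_pm25 = sub_index(cp=75, b_lo=60, b_hi=90, i_lo=101, i_hi=200)
print(ip_pm25)                                                       # 150.0
print(overall_aqi({"PM2.5": ip_pm25, "PM10": 92.0, "SO2": 35.0}))    # (150.0, 'PM2.5')
```

Taking the maximum means the reported AQI is driven by the worst pollutant, so a single high sub-index cannot be hidden by low values elsewhere.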
AQI values, corresponding ambient concentrations (health breakpoints) and associated likely health impacts

Features considered
-->City
-->Date
-->PM2.5 (particulate matter with a diameter of 2.5 micrometres or less)
-->PM10 (particulate matter with a diameter of 10 micrometres or less)
-->SO2 (Sulphur Dioxide)
-->NOx (Nitrogen Oxides)
-->NH3 (Ammonia)
-->CO (Carbon Monoxide)
-->O3 (Ozone or Trioxygen)
-->Benzene
-->Toluene
-->Xylene
-->AQI
-->AQI_Bucket
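A sketch of preparing one record of the features listed above for a random forest classifier that predicts AQI_Bucket, assuming records arrive as dictionaries keyed by these column names. The bucket boundaries follow the six IND-AQI categories (Good 0 to 50, Satisfactory 51 to 100, Moderate 101 to 200, Poor 201 to 300, Very Poor 301 to 400, Severe 401 to 500); the exact label strings, the helper names and the sample record values are assumptions.

```python
# Pollutant columns as listed on the slide
POLLUTANTS = ["PM2.5", "PM10", "SO2", "NOx", "NH3", "CO", "O3",
              "Benzene", "Toluene", "Xylene"]

# Upper bound of each IND-AQI category
BUCKETS = [(50, "Good"), (100, "Satisfactory"), (200, "Moderate"),
           (300, "Poor"), (400, "Very Poor")]


def aqi_bucket(aqi):
    """Map an AQI value to its category; anything above 400 is 'Severe'."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, name in BUCKETS:
        if aqi <= upper:
            return name
    return "Severe"


def to_example(record):
    """Split one record into pollutant inputs and the AQI_Bucket label.
    City, Date and AQI are left out of the inputs (AQI would leak the label)."""
    return [record[p] for p in POLLUTANTS], record["AQI_Bucket"]


# Hypothetical record (values are made up)
record = {"City": "ExampleCity", "Date": "2020-01-01", "PM2.5": 80.0, "PM10": 150.0,
          "SO2": 10.0, "NOx": 30.0, "NH3": 20.0, "CO": 1.0, "O3": 35.0,
          "Benzene": 2.0, "Toluene": 5.0, "Xylene": 1.0, "AQI": 180.0,
          "AQI_Bucket": "Moderate"}
x, label = to_example(record)
print(len(x), label, aqi_bucket(record["AQI"]))  # 10 Moderate Moderate
```

Rows produced this way can be fed to any classifier, including the from-scratch forest sketched earlier, once rows with missing pollutant values have been handled.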

References
1. Implementation of Machine Learning Algorithms for Analysis and Prediction of Air Quality
https://www.ijert.org/implementation-of-machine-learning-algorithms-for-analysis-and-prediction-of-air-quality
2. Comparative Analysis of Machine Learning Techniques for Predicting Air Quality in Smart Cities
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8746201
3. A Comparative Study on Prediction of Indian Air Quality Index Using Machine Learning Algorithms
http://www.jcreview.com/fulltext/197-1592899246.pdf
4. Air Quality Prediction based on Supervised Machine Learning Methods
https://www.ijitee.org/wp-content/uploads/papers/v8i9S4/I11320789S419.pdf
5. PM2.5 Prediction using Machine Learning Hybrid Model for Smart Health
https://www.ijeat.org/wp-content/uploads/papers/v9i1/A1187109119.pdf
6. Air Quality Prediction Model using Supervised Machine Learning Algorithms
http://ijsrcseit.com/paper/CSEIT206435.pdf
7. Indian Air Quality Prediction and Analysis Using Machine Learning
https://www.ripublication.com/ijaerspl2019/ijaerv14n11spl_34.pdf
8. Machine Learning Basics: Random Forest Regression
https://towardsdatascience.com/machine-learning-basics-random-forest-regression-be3e1e3bb91a
9. Learn Types of Machine Learning Algorithms with Ultimate Use Cases
https://data-flair.training/blogs/types-of-machine-learning-algorithms/
10. https://www.javatpoint.com/machine-learning-random-forest-algorithm
11. Application of machine learning in rheumatic disease research
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6610179/
12. RAQ–A Random Forest Approach for Predicting Air Quality in Urban Sensing Systems
https://www.researchgate.net/publication/290210861_RAQ-A_Random_Forest_Approach_for_Predicting_Air_Quality_in_Urban_Sensing_Systems
13. Air Quality Index
https://cpcb.nic.in/National-Air-Quality-Index/
