Random Forest-Supervised ML

Random Forest is a powerful supervised machine learning algorithm used for both regression and classification tasks, combining multiple decision trees to improve prediction accuracy. The document covers the ensemble techniques of Bagging and Boosting: Bagging uses Bootstrap Aggregation to create independent models, while Boosting focuses on correcting the errors of previous models. It also discusses the implementation of Random Forest using scikit-learn, highlighting its advantages over a single decision tree in terms of accuracy and feature ranking.


Supervised Learning: Random Forest

Prepared By ARCHANA
Random forest
• Random forest is another powerful supervised ML algorithm which can be used for both regression and
classification problems.
• The general technique of random decision forests was first proposed by Ho in 1995. Random forest is an
ensemble of decision trees or it can be thought of as a forest of decision trees.
• Since random forest combines many decision tree models into one, it is known as an ensemble algorithm. For example, a single decision tree built to predict EUR/1000 ft could return an erroneous value due to the variance in its predictions.
• One way to avoid this variance when predicting the EUR/1000 ft is to take predictions from hundreds or thousands of decision trees and use the average of those trees to calculate the final answer.
• Combining many decision trees into a single model is essentially the fundamental concept behind random forest. The prediction made by a single decision tree could be inaccurate, but when many predictions are combined and averaged, the result is closer to the true value.
• The reason random forest is typically more accurate than a single decision tree is because much more
knowledge is incorporated from many predictions.
• For regression problems, random forest uses the average of the decision trees for final prediction.
However, as previously mentioned, classification problems can also be solved using random forest by
taking a majority vote of the predicted class.
• Fig. below illustrates the difference between a single decision tree and a random forest, which consists of an ensemble of decision trees.
• There are two main ways of combining multiple decision trees into one model, and they are as follows:
Working of Random Forest Algorithm
• Before understanding the working of the random forest algorithm in
machine learning, we must look into the ensemble learning technique.
• Ensemble simply means combining multiple models. Thus a collection of
models is used to make predictions rather than an individual model.
• Ensemble uses two types of methods:
Bagging
Boosting
Bagging
• Bagging, also known as Bootstrap Aggregation, serves as the ensemble technique in the
Random Forest algorithm. Here are the steps involved in Bagging:
• Selection of Subset: Bagging starts by choosing a random sample, or subset, from the
entire dataset.
• Bootstrap Sampling: Each model is then created from these samples, called Bootstrap
Samples, which are taken from the original data with replacement. This process is known
as row sampling.
• Bootstrapping: The step of row sampling with replacement is referred to as
bootstrapping.
• Independent Model Training: Each model is trained independently on its corresponding
Bootstrap Sample. This training process generates results for each model.
• Majority Voting: The final output is determined by combining the results of all models
through majority voting. The most commonly predicted outcome among the models is
selected.
• Aggregation: This step, which involves combining all the results and generating the final
output based on majority voting, is known as aggregation.
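The bagging steps above can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not the Random Forest implementation itself: the names (bagging_predict, X, y, X_new, n_models) are assumed placeholders, array inputs and binary 0/1 labels are assumed for the majority vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X, y, X_new, n_models=10, random_state=0):
    """Bootstrap-aggregate several independently trained decision trees."""
    rng = np.random.default_rng(random_state)
    all_preds = []
    for _ in range(n_models):
        # Bootstrapping: row sampling with replacement from the original data.
        idx = rng.integers(0, len(X), size=len(X))
        # Independent model training on each Bootstrap Sample.
        model = DecisionTreeClassifier().fit(X[idx], y[idx])
        all_preds.append(model.predict(X_new))
    # Aggregation: majority voting across the models (assumes 0/1 labels).
    votes = np.stack(all_preds)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```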
Boosting
• Boosting is one of the techniques that use the concept of ensemble learning. A boosting algorithm combines multiple simple models (also known as weak learners or base estimators) to generate the final output. This is done by building the weak models in series, each one correcting the errors of the previous ones.
• There are several boosting algorithms; AdaBoost was the first really
successful boosting algorithm that was developed for the purpose of binary
classification.
• AdaBoost is an abbreviation for Adaptive Boosting and is a prevalent
boosting technique that combines multiple “weak classifiers” into a single
“strong classifier.” Other boosting techniques exist as well.
Steps Involved in Random Forest Algorithm
Step 1: In this model, a subset of data points and a subset of features are selected for constructing each decision tree. Simply put, n random records and m features are taken from a data set containing k records.
Step 2: Individual decision trees are constructed for each sample.
Step 3: Each decision tree will generate an output.
Step 4: The final output is based on majority voting for classification or averaging for regression.
• What is AdaBoost? AdaBoost, short for Adaptive Boosting, is an ensemble
machine learning algorithm that can be used in a wide variety of
classification and regression tasks. It is a supervised learning algorithm that
is used to classify data by combining multiple weak or base learners (e.g.,
decision trees) into a strong learner. AdaBoost works by weighting the
instances in the training dataset based on the accuracy of previous
classifications.
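As a hedged illustration of this description, scikit-learn's AdaBoostClassifier boosts one-level decision trees (decision stumps) by default and re-weights the training instances after each round. The variable names X_train, y_train, and X_test below are assumed to come from an existing train/test split.

```python
from sklearn.ensemble import AdaBoostClassifier

# Boost decision stumps (the default base learner) over 50 rounds,
# re-weighting instances that were misclassified in earlier rounds.
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
ada.fit(X_train, y_train)
y_pred = ada.predict(X_test)
```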
• Boosting (used, for example, in gradient boosting) is also an ensemble technique, but the models are built sequentially as opposed to independently.
• In boosting, more weights are placed on instances with incorrect predictions. Therefore, the focus in boosting is on the challenging cases that are being predicted inaccurately.
• As opposed to bagging, where an equally weighted average is used, boosting uses a weighted average in which more weight is applied to the models with better performance.
• In other words, in boosting, the samples that were predicted inaccurately get a
higher weight which would then lead to sampling them more often.
• This is the main reason why bagging can be performed independently while boosting is performed sequentially.
• AdaBoost Algorithm
• Freund and Schapire first presented AdaBoost, a boosting-based ensemble modelling approach, in 1997. Boosting has since become a popular strategy for dealing with binary classification problems. These algorithms boost prediction power by combining a large number of weak learners into strong learners.
• Boosting algorithms work on the idea of first building a model on the training dataset and then building a second model to correct the faults of the first. This process is repeated until the errors are reduced and the dataset is accurately predicted. In this way, boosting combines numerous models (weak learners) to produce the final result (a strong learner).
• There are three widely used kinds of boosting algorithms:
• AdaBoost (Adaptive Boosting)
• Gradient Boosting
• XGBoost (Extreme Gradient Boosting)
What is AdaBoost Algorithm in Machine Learning?
• AdaBoost in machine learning is one of these predictive modelling
techniques. AdaBoost, also known as Adaptive Boosting, is a Machine
Learning approach that is utilised as an Ensemble Method.
• AdaBoost's most commonly used estimator is a decision tree with one level, that is, a decision tree with just one split. These trees are often referred to as Decision Stumps.
• This approach constructs a model and assigns equal weights to all data points. It then applies larger weights to incorrectly classified points.
• In the following model, the points with greater weights are given more attention. The algorithm continues to train models until the error becomes small.

• AdaBoost in Machine Learning
• To illustrate, imagine you created a decision tree algorithm using the Titanic
dataset and obtained an accuracy of 80%. Following that, you use a new
method and assess the accuracy, which is 75% for KNN and 70% for Linear
Regression.
• When we develop a new model on the same dataset, the accuracy varies.
What if we combine all of these algorithms to create the final prediction?
Using the average of the outcomes from various models will yield more
accurate results. In this method, we can improve prediction power.
• Understanding the Working of the AdaBoost Classifier Algorithm
• Step 1:
• The example below illustrates the AdaBoost algorithm on a small dataset. It is a classification challenge since the target column is binary. First and foremost, these data points will be weighted; at first, all of the weights will be equal.
Step 2:
• We will examine how well "Gender" classifies the samples, followed by
how the variables (Age and Income) categorise the samples. We'll make a
decision stump for each characteristic and then compute each tree's Gini
Index. Our first stump will be the tree with the lowest Gini Index.
• Let's suppose Gender has the lowest Gini Index in our dataset; thus it will be our first stump.
Step 3:
• Using this approach, we will now determine the "Amount of Say" or
"Importance" or "Influence" for this classifier in categorising the data points:
A Total Error (TE) of 0 represents a flawless stump, while a Total Error of 1 represents a bad stump.
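The formula referenced above appeared as an image in the original slides; the standard AdaBoost expression for the amount of say in terms of the Total Error is assumed to be the one intended:

```latex
\alpha \;=\; \frac{1}{2}\,\ln\!\left(\frac{1 - \mathrm{TE}}{\mathrm{TE}}\right)
```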
• According to the graph above, when there is no misclassification, there is no error (Total Error = 0); hence the "amount of say (alpha)" will be a large positive value.
• When the classifier predicts half correctly and half incorrectly, the Total Error equals 0.5, and the classifier's significance (amount of say) equals 0.
• If all of the samples were improperly classified, the error will be very large (close to 1), and our alpha value will be a large negative value.
Step 4:
• You're probably asking why it's required to determine a stump's TE and
performance. The answer is simple: we need to update the weights since if
the same weights are used in the next model, the result will be the same as it
was in the previous model.
• The weights of the incorrect predictions will be increased, while the weights of the correct predictions will be decreased. When we create our next model after updating the weights, the points with higher weights will be given greater emphasis.
• After determining the classifier's significance and total error, we must
update the weights using the following formula:
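The update formula was shown as an image in the original slides; the standard AdaBoost update, which matches the description above (misclassified samples gain weight, correctly classified samples lose weight, and the weights are then normalized), is:

```latex
w_i^{\text{new}} =
\begin{cases}
w_i \cdot e^{\alpha}, & \text{if sample } i \text{ was misclassified},\\[4pt]
w_i \cdot e^{-\alpha}, & \text{if sample } i \text{ was correctly classified},
\end{cases}
\qquad
w_i^{\text{new}} \leftarrow \frac{w_i^{\text{new}}}{\sum_j w_j^{\text{new}}}
```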
• We know that the entire sum of the sample weights must equal one, but if
we add all of the new sample weights together, we get 0.8004. To get this
amount equal to 1, we will normalise these weights by dividing all the
weights by the entire sum of updated weights, which is 0.8004. Hence, we
get this dataset after normalising the sample weights, and the sum is now
equal to 1.
• Step 5:
• We must now create a fresh dataset to see whether or not the mistakes have
decreased. To do this, we will delete the "sample weights" and "new sample
weights" columns and then split our data points into buckets based on the
"new sample weights.”
Step 6:
• We're nearly there. The method now chooses random values ranging from 0
to 1. Because improperly categorised records have greater sample weights,
the likelihood of picking them is relatively high.
• Assume the five random numbers chosen by our algorithm are 0.38, 0.26, 0.98, 0.40, 0.55.
• Now we'll examine where these random numbers go in the bucket and
create our new dataset, which is displayed below.
This is our new dataset, and we can see that the data point that
was incorrectly categorised has been picked three times since it
has a greater weight.
Step 7:
• This now serves as our new dataset, and we must repeat all of the preceding steps: give each data point an equal weight; determine the stump that best classifies the new group of samples by calculating each tree's Gini Index and picking the one with the lowest; compute the "Amount of Say" and "Total Error" to update the prior sample weights; and normalize the newly calculated sample weights. Iterate through these steps until a low training error is obtained.
• Assume that we have built three decision trees (DT1, DT2, and DT3) sequentially with
regard to our dataset. If we transmit our test data now, it will go through all of the
decision trees, and we will eventually find which class has the majority, and we will
make predictions for our test dataset based on that.
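In standard notation (a sketch, not taken from the slides), the combined AdaBoost prediction weights each stump's vote by its amount of say:

```latex
H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t \, h_t(x)\right)
```

Here h_t(x) is the prediction of the t-th stump (for example DT1, DT2, DT3 above) and alpha_t is its amount of say.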
Conclusion
• AdaBoost is a powerful and widely used machine learning algorithm that has been
successfully applied to classification and regression tasks in a wide variety of
domains. It is an effective method for combining multiple weak or base learners into a
single strong learner, and has been shown to have good generalization performance. Its
ability to weight instances based on previous classifications makes it robust to noisy
and imbalanced datasets, and it is computationally efficient and less prone to
overfitting.
Random forest implementation using scikit-learn
• In this section, the same TOC data set used for the decision tree will be applied to random forest regression.
• import the "RandomForestRegressor" as follows:
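The import shown as an image in the original slide would read as follows (assuming scikit-learn is installed):

```python
from sklearn.ensemble import RandomForestRegressor
```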
• Next, let's define the parameters inside the "RandomForestRegressor." There are multiple important hyperparameters within a random forest model, such as "n_estimators," "criterion," "max_depth," etc.
• "n_estimators“ defines the number of trees in the forest. Usually the higher this number, the more
accurate the model is without leading to overfitting.
• In the example below, "n_estimators" is set to be 5000 which means 5000 independent decision trees will
be constructed and the average of the 5000 trees will be used as the predicted value for each prediction
row.
• "criterion" of "mse" was chosen for this model which means variance reduction is desired. Since
bootstrapping aggregation that was discussed is desired to be chosen for this model, "bootstrap" was set to
"True”.
• If "bootstrap" is set to "False," the whole data set is used to build each decision tree.
• "n_jobs" is set to "-1" in an attempt to use all processors. If this is not desired, simply change from 1 to a
different integer value.
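A hedged reconstruction of the model definition described above: the exact code from the slide is not shown, the parameter values follow the surrounding text, and random_state is an assumed addition for reproducibility. Note that the "mse" criterion was renamed "squared_error" in scikit-learn 1.0 and later.

```python
rf = RandomForestRegressor(
    n_estimators=5000,  # 5000 independent decision trees, averaged for each prediction
    criterion='mse',    # variance reduction; use 'squared_error' on scikit-learn >= 1.0
    bootstrap=True,     # bootstrap aggregation (bagging), as discussed
    n_jobs=-1,          # use all processors
    random_state=42     # assumed seed, not specified in the original slides
)
```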
• Next, let’s apply these defined "rf" parameters to the training inputs and
output features (X_train,y_train) and obtain the accuracy of both
training and testing sets as shown below:
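A minimal sketch of this step, assuming X_train, X_test, y_train, and y_test already exist from a prior train/test split:

```python
rf.fit(X_train, y_train)

# R2 accuracy of the training and testing sets
print('Training R2:', rf.score(X_train, y_train))
print('Testing R2:', rf.score(X_test, y_test))
```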
• As can be observed, the testing R2 is 81.82% compared to 68.33% for the decision tree.
• Therefore, without doing further parameter fine-tuning, the random forest algorithm appears to be
outperforming the decision tree. Let’s also visualize the cross plots of actual versus predicted
training and testing data sets as follows:
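A hedged sketch of the cross plots using matplotlib (the styling of the original figures is unknown):

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(y_train, rf.predict(X_train), alpha=0.5)
axes[0].set_xlabel('Actual (training)')
axes[0].set_ylabel('Predicted (training)')
axes[1].scatter(y_test, rf.predict(X_test), alpha=0.5)
axes[1].set_xlabel('Actual (testing)')
axes[1].set_ylabel('Predicted (testing)')
plt.tight_layout()
plt.show()
```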
• Next, let’s also obtain MAE, MSE, and RMSE for the testing set as follows:
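A sketch of the error metrics for the testing set using sklearn.metrics:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_pred = rf.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print('MAE:', mae, 'MSE:', mse, 'RMSE:', rmse)
```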
• As illustrated, the MAE, MSE, and RMSE values are lower than those of the decision tree model. Next, let's also obtain the feature ranking using random forest as follows:
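A hedged sketch of the feature ranking via the fitted model's feature_importances_ attribute, assuming X_train is a pandas DataFrame with named columns:

```python
import pandas as pd

# Importances sum to 1; higher values indicate more influential input features.
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))
```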
• As illustrated above, the important features obtained by random forest are different from those obtained from the decision tree.
• This is primarily attributed to the higher accuracy of the random forest model.
• The recommendation is to go with the model with higher accuracy which is the random forest model
in this particular example.
• Tree-based algorithms, such as decision tree, random forest, extra trees, etc., use percentage
improvement in the purity of the node to naturally rank the input features.
• As previously discussed, in classification problems, the idea is to minimize Gini impurity (if Gini
impurity is selected).
• Therefore, nodes that lead to the greatest reduction in Gini impurity occur near the top of the trees, while nodes with the least reduction occur near the bottom. This is how tree-based algorithms perform feature ranking.
• To be consistent with the decision tree model, let’s also do a five-fold cross-validation to observe the
resulting average R2 for the random forest model as follows:
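A sketch of the five-fold cross-validation, assuming the full feature matrix X and target y are available:

```python
from sklearn.model_selection import cross_val_score

scores = cross_val_score(rf, X, y, cv=5, scoring='r2')
print('Average cross-validation R2:', scores.mean())
```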

• On average, the cross-validation R2 for the random forest is 77.48% as compared to 63.03% for the decision tree model.
THANK YOU
