Random Forest-Supervised ML

Random Forest is a powerful supervised machine learning algorithm used for both regression and classification tasks, combining multiple decision trees to improve prediction accuracy. The document covers the ensemble techniques of Bagging and Boosting: Bagging uses Bootstrap Aggregation to create independent models, while Boosting focuses on correcting the errors of previous models. It also discusses the implementation of Random Forest using scikit-learn, highlighting its advantages over a single decision tree in terms of accuracy and feature ranking.


Supervised Learning: Random Forest

Prepared By ARCHANA
Random forest
• Random forest is another powerful supervised ML algorithm which can be used for both regression and
classification problems.
• The general technique of random decision forests was first proposed by Ho in 1995. Random forest is an
ensemble of decision trees or it can be thought of as a forest of decision trees.
• Since random forest combines many decision tree models into one, it is known as an ensemble algorithm. For example, a single decision tree built to predict EUR/1000 ft could return an erroneous value due to the variance in its predictions.
• One way to avoid this variance when predicting the EUR/1000 ft is to take predictions from hundreds or thousands of decision trees and use the average of those trees to calculate the final answer.
• Combining many decision trees into a single model is essentially the fundamental concept behind random forest. The prediction made by a single decision tree could be inaccurate, but when many predictions are combined and averaged, the result is closer to the true value.
• The reason random forest is typically more accurate than a single decision tree is because much more
knowledge is incorporated from many predictions.
• For regression problems, random forest uses the average of the decision trees for final prediction.
However, as previously mentioned, classification problems can also be solved using random forest by
taking a majority vote of the predicted class.
• Fig. below illustrates the difference between a single decision tree and a random forest, which consists of an ensemble of decision trees.
• There are two main ways of combining multiple decision trees into one model, and they are as follows:
Working of Random Forest Algorithm
• Before understanding the working of the random forest algorithm in
machine learning, we must look into the ensemble learning technique.
• Ensemble simply means combining multiple models. Thus a collection of
models is used to make predictions rather than an individual model.
• Ensemble uses two types of methods:
Bagging
Boosting
Bagging
• Bagging, also known as Bootstrap Aggregation, serves as the ensemble technique in the
Random Forest algorithm. Here are the steps involved in Bagging:
• Selection of Subset: Bagging starts by choosing a random sample, or subset, from the
entire dataset.
• Bootstrap Sampling: Each model is then created from these samples, called Bootstrap
Samples, which are taken from the original data with replacement. This process is known
as row sampling.
• Bootstrapping: The step of row sampling with replacement is referred to as
bootstrapping.
• Independent Model Training: Each model is trained independently on its corresponding
Bootstrap Sample. This training process generates results for each model.
• Majority Voting: The final output is determined by combining the results of all models
through majority voting. The most commonly predicted outcome among the models is
selected.
• Aggregation: This step, which involves combining all the results and generating the final
output based on majority voting, is known as aggregation.
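The bagging steps above can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not the Random Forest implementation itself: the names (bagging_predict, X, y, X_new, n_models) are assumed placeholders, array inputs and binary 0/1 labels are assumed for the majority vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X, y, X_new, n_models=10, random_state=0):
    """Bootstrap-aggregate several independently trained decision trees."""
    rng = np.random.default_rng(random_state)
    all_preds = []
    for _ in range(n_models):
        # Bootstrapping: row sampling with replacement from the original data.
        idx = rng.integers(0, len(X), size=len(X))
        # Independent model training on each Bootstrap Sample.
        model = DecisionTreeClassifier().fit(X[idx], y[idx])
        all_preds.append(model.predict(X_new))
    # Aggregation: majority voting across the models (assumes 0/1 labels).
    votes = np.stack(all_preds)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```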
Boosting
• Boosting is one of the techniques that use the concept of ensemble learning. A boosting algorithm combines multiple simple models (also known as weak learners or base estimators) to generate the final output. This is done by building the weak models in series, each one correcting the errors of the previous ones.
• There are several boosting algorithms; AdaBoost was the first really
successful boosting algorithm that was developed for the purpose of binary
classification.
• AdaBoost is an abbreviation for Adaptive Boosting and is a prevalent
boosting technique that combines multiple “weak classifiers” into a single
“strong classifier.” Other boosting techniques exist as well.
Steps Involved in Random Forest Algorithm
Step 1: In this model, a subset of data points and a subset of features are selected for constructing each decision tree. Simply put, n random records and m features are taken from a data set containing k records.
Step 2: Individual decision trees are constructed for each sample.
Step 3: Each decision tree will generate an output.
Step 4: The final output is based on majority voting for classification or averaging for regression.
• What is AdaBoost? AdaBoost, short for Adaptive Boosting, is an ensemble
machine learning algorithm that can be used in a wide variety of
classification and regression tasks. It is a supervised learning algorithm that
is used to classify data by combining multiple weak or base learners (e.g.,
decision trees) into a strong learner. AdaBoost works by weighting the
instances in the training dataset based on the accuracy of previous
classifications.
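As a hedged illustration of this description, scikit-learn's AdaBoostClassifier boosts one-level decision trees (decision stumps) by default and re-weights the training instances after each round. The variable names X_train, y_train, and X_test below are assumed to come from an existing train/test split.

```python
from sklearn.ensemble import AdaBoostClassifier

# Boost decision stumps (the default base learner) over 50 rounds,
# re-weighting instances that were misclassified in earlier rounds.
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
ada.fit(X_train, y_train)
y_pred = ada.predict(X_test)
```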
• Boosting (used, for example, in gradient boosting) is also an ensemble technique, but the models are built sequentially as opposed to independently.
• In boosting, more weights are placed on instances with incorrect predictions. Therefore, the focus in boosting is on the challenging cases that are being predicted inaccurately.
• As opposed to bagging, where an equally weighted average is used, boosting uses a weighted average in which more weight is applied to the models with better performance.
• In other words, in boosting, the samples that were predicted inaccurately get a
higher weight which would then lead to sampling them more often.
• This is the main reason why bagging can be performed independently while boosting is performed sequentially.
• AdaBoost Algorithm
• Freund and Schapire first presented AdaBoost, a boosting-based ensemble modelling approach, in 1997. Boosting has since become a popular strategy for dealing with binary classification problems. These algorithms boost prediction power by combining a large number of weak learners into strong learners.
• Boosting algorithms work on the idea of first building a model on the training dataset and then building a second model to correct the faults of the first. This process is repeated until the errors are reduced and the dataset is accurately predicted. In this way, boosting combines numerous models (weak learners) to produce the final result (a strong learner).
• There are three widely used kinds of boosting algorithms:
• AdaBoost (Adaptive Boosting)
• Gradient Boosting
• XGBoost (Extreme Gradient Boosting)
What is AdaBoost Algorithm in Machine Learning?
• AdaBoost in machine learning is one of these predictive modelling
techniques. AdaBoost, also known as Adaptive Boosting, is a Machine
Learning approach that is utilised as an Ensemble Method.
• AdaBoost's most commonly used estimator is a decision tree with one level, that is, a decision tree with just one split. These trees are often referred to as Decision Stumps.
• This approach constructs a model and assigns equal weights to all data points. It then applies larger weights to incorrectly classified points.
• In the following model, the points with greater weights are given more attention. The algorithm continues to train models until the error becomes small.

• AdaBoost in Machine Learning
• To illustrate, imagine you created a decision tree algorithm using the Titanic
dataset and obtained an accuracy of 80%. Following that, you use a new
method and assess the accuracy, which is 75% for KNN and 70% for Linear
Regression.
• When we develop a new model on the same dataset, the accuracy varies.
What if we combine all of these algorithms to create the final prediction?
Using the average of the outcomes from various models will yield more
accurate results. In this method, we can improve prediction power.
• Understanding the Working of the AdaBoost Classifier Algorithm
• Step 1:
• The example below illustrates the AdaBoost algorithm on a small dataset. It is a classification challenge since the target column is binary. First and foremost, these data points will be weighted; at first, all of the weights will be equal.
Step 2:
• We will examine how well "Gender" classifies the samples, followed by
how the variables (Age and Income) categorise the samples. We'll make a
decision stump for each characteristic and then compute each tree's Gini
Index. Our first stump will be the tree with the lowest Gini Index.
• Let's suppose Gender has the lowest Gini Index in our dataset; thus it will be our first stump.
Step 3:
• Using this approach, we will now determine the "Amount of Say" or
"Importance" or "Influence" for this classifier in categorising the data points:
A Total Error (TE) of 0 represents a flawless stump, while a Total Error of 1 represents a bad stump.
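The formula referenced above appeared as an image in the original slides; the standard AdaBoost expression for the amount of say in terms of the Total Error is assumed to be the one intended:

```latex
\alpha \;=\; \frac{1}{2}\,\ln\!\left(\frac{1 - \mathrm{TE}}{\mathrm{TE}}\right)
```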
• According to the graph above, when there is no misclassification, there is no error (Total Error = 0); hence the "amount of say (alpha)" will be a large positive value.
• When the classifier predicts half correctly and half incorrectly, the Total Error equals 0.5, and the classifier's significance (amount of say) equals 0.
• If all of the samples were improperly classified, the error will be very large (close to 1), and our alpha value will be a large negative value.
Step 4:
• You're probably asking why it's required to determine a stump's TE and
performance. The answer is simple: we need to update the weights since if
the same weights are used in the next model, the result will be the same as it
was in the previous model.
• The weights of the incorrect predictions will be increased, while the weights of the correct predictions will be decreased. When we create our next model after updating the weights, the points with higher weights will be given greater emphasis.
• After determining the classifier's significance and total error, we must
update the weights using the following formula:
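The update formula was shown as an image in the original slides; the standard AdaBoost update, which matches the description above (misclassified samples gain weight, correctly classified samples lose weight, and the weights are then normalized), is:

```latex
w_i^{\text{new}} =
\begin{cases}
w_i \cdot e^{\alpha}, & \text{if sample } i \text{ was misclassified},\\[4pt]
w_i \cdot e^{-\alpha}, & \text{if sample } i \text{ was correctly classified},
\end{cases}
\qquad
w_i^{\text{new}} \leftarrow \frac{w_i^{\text{new}}}{\sum_j w_j^{\text{new}}}
```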
• We know that the entire sum of the sample weights must equal one, but if
we add all of the new sample weights together, we get 0.8004. To get this
amount equal to 1, we will normalise these weights by dividing all the
weights by the entire sum of updated weights, which is 0.8004. Hence, we
get this dataset after normalising the sample weights, and the sum is now
equal to 1.
• Step 5:
• We must now create a fresh dataset to see whether or not the mistakes have
decreased. To do this, we will delete the "sample weights" and "new sample
weights" columns and then split our data points into buckets based on the
"new sample weights.”
Step 6:
• We're nearly there. The method now chooses random values ranging from 0
to 1. Because improperly categorised records have greater sample weights,
the likelihood of picking them is relatively high.
• Assume the five random numbers chosen by our algorithm are 0.38, 0.26, 0.98, 0.40, 0.55.
• Now we'll examine where these random numbers go in the bucket and
create our new dataset, which is displayed below.
This is our new dataset, and we can see that the data point that
was incorrectly categorised has been picked three times since it
has a greater weight.
Step 7:
• This now serves as our new dataset, and we must repeat all of the preceding steps: give each data point an equal weight; determine the stump that best classifies the new group of samples by calculating each tree's Gini Index and picking the one with the lowest; compute the "Amount of Say" and "Total Error" to update the prior sample weights; and normalize the newly calculated sample weights. Iterate through these steps until a low training error is obtained.
• Assume that we have built three decision trees (DT1, DT2, and DT3) sequentially with
regard to our dataset. If we transmit our test data now, it will go through all of the
decision trees, and we will eventually find which class has the majority, and we will
make predictions for our test dataset based on that.
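In standard notation (a sketch, not taken from the slides), the combined AdaBoost prediction weights each stump's vote by its amount of say:

```latex
H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t \, h_t(x)\right)
```

Here h_t(x) is the prediction of the t-th stump (for example DT1, DT2, DT3 above) and alpha_t is its amount of say.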
Conclusion
• AdaBoost is a powerful and widely used machine learning algorithm that has been
successfully applied to classification and regression tasks in a wide variety of
domains. It is an effective method for combining multiple weak or base learners into a
single strong learner, and has been shown to have good generalization performance. Its
ability to weight instances based on previous classifications makes it robust to noisy
and imbalanced datasets, and it is computationally efficient and less prone to
overfitting.
Random forest implementation using scikit-learn
• In this section, the same TOC data set used for the decision tree will be applied to random forest regression.
• import the "RandomForestRegressor" as follows:
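The import shown as an image in the original slide would read as follows (assuming scikit-learn is installed):

```python
from sklearn.ensemble import RandomForestRegressor
```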
• Next, let's define the parameters inside the "RandomForestRegressor." There are multiple important hyperparameters within a random forest model, such as "n_estimators," "criterion," "max_depth," etc.
• "n_estimators“ defines the number of trees in the forest. Usually the higher this number, the more
accurate the model is without leading to overfitting.
• In the example below, "n_estimators" is set to be 5000 which means 5000 independent decision trees will
be constructed and the average of the 5000 trees will be used as the predicted value for each prediction
row.
• "criterion" of "mse" was chosen for this model which means variance reduction is desired. Since
bootstrapping aggregation that was discussed is desired to be chosen for this model, "bootstrap" was set to
"True”.
• If "bootstrap" is set to "False," the whole data set is used to build each decision tree.
• "n_jobs" is set to "-1" in an attempt to use all processors. If this is not desired, simply change from 1 to a
different integer value.
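A hedged reconstruction of the model definition described above: the exact code from the slide is not shown, the parameter values follow the surrounding text, and random_state is an assumed addition for reproducibility. Note that the "mse" criterion was renamed "squared_error" in scikit-learn 1.0 and later.

```python
rf = RandomForestRegressor(
    n_estimators=5000,  # 5000 independent decision trees, averaged for each prediction
    criterion='mse',    # variance reduction; use 'squared_error' on scikit-learn >= 1.0
    bootstrap=True,     # bootstrap aggregation (bagging), as discussed
    n_jobs=-1,          # use all processors
    random_state=42     # assumed seed, not specified in the original slides
)
```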
• Next, let’s apply these defined "rf" parameters to the training inputs and
output features (X_train,y_train) and obtain the accuracy of both
training and testing sets as shown below:
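A minimal sketch of this step, assuming X_train, X_test, y_train, and y_test already exist from a prior train/test split:

```python
rf.fit(X_train, y_train)

# R2 accuracy of the training and testing sets
print('Training R2:', rf.score(X_train, y_train))
print('Testing R2:', rf.score(X_test, y_test))
```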
• As can be observed, the testing R2 is 81.82% compared to 68.33% for the decision tree.
• Therefore, without doing further parameter fine-tuning, the random forest algorithm appears to be
outperforming the decision tree. Let’s also visualize the cross plots of actual versus predicted
training and testing data sets as follows:
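A hedged sketch of the cross plots using matplotlib (the styling of the original figures is unknown):

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(y_train, rf.predict(X_train), alpha=0.5)
axes[0].set_xlabel('Actual (training)')
axes[0].set_ylabel('Predicted (training)')
axes[1].scatter(y_test, rf.predict(X_test), alpha=0.5)
axes[1].set_xlabel('Actual (testing)')
axes[1].set_ylabel('Predicted (testing)')
plt.tight_layout()
plt.show()
```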
• Next, let’s also obtain MAE, MSE, and RMSE for the testing set as follows:
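A sketch of the error metrics for the testing set using sklearn.metrics:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_pred = rf.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print('MAE:', mae, 'MSE:', mse, 'RMSE:', rmse)
```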
• As illustrated, the MAE, MSE, and RMSE values are lower than those of the decision tree model. Next, let's also obtain the feature ranking using random forest as follows:
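A hedged sketch of the feature ranking via the fitted model's feature_importances_ attribute, assuming X_train is a pandas DataFrame with named columns:

```python
import pandas as pd

# Importances sum to 1; higher values indicate more influential input features.
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))
```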
• As illustrated above, the important features obtained by random forest are different from those obtained from the decision tree.
• This is primarily attributed to the higher accuracy of the random forest model.
• The recommendation is to go with the model with higher accuracy which is the random forest model
in this particular example.
• Tree-based algorithms, such as decision tree, random forest, extra trees, etc., use percentage
improvement in the purity of the node to naturally rank the input features.
• As previously discussed, in classification problems, the idea is to minimize Gini impurity (if Gini
impurity is selected).
• Therefore, nodes that lead to the greatest reduction in Gini impurity occur near the top of the trees, while nodes with the least reduction occur near the bottom. This is how tree-based algorithms perform feature ranking.
• To be consistent with the decision tree model, let’s also do a five-fold cross-validation to observe the
resulting average R2 for the random forest model as follows:
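A sketch of the five-fold cross-validation, assuming the full feature matrix X and target y are available:

```python
from sklearn.model_selection import cross_val_score

scores = cross_val_score(rf, X, y, cv=5, scoring='r2')
print('Average cross-validation R2:', scores.mean())
```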

• On average, the cross-validation R2 for the random forest is 77.48% as compared to 63.03% for the decision tree model.
THANK YOU
