
Machine Learning

By: Ammara Yaseen


Agenda:
1. Overfitting and Underfitting
2. Ensemble Learning
3. Bagging and Boosting
4. Random Forest
5. OOB Evaluation and error
Overfitting
• A statistical model is said to be overfitted when it performs well on the training data but does not make accurate predictions on testing data.
• When a model is trained too long or with too much capacity, it starts learning from the noise and inaccurate entries in our data set.
• Broadly speaking, overfitting means our training has focused so much on the particular training set that the model has missed the general pattern. As a result, the model is not able to adapt to new data because it is too closely tied to the training set.
• Ways to avoid overfitting include using a linear algorithm if we have linear data, or constraining parameters such as the maximal depth if we are using decision trees.
Overfitting

• Reasons for Overfitting:
• High variance and low bias.
• The model is too complex.
• The size of the training data is not enough.

• Techniques to Reduce Overfitting
• Increase the training data.
• Reduce model complexity (see the sketch after this list).
• Early stopping during the training phase (monitor the loss over the training period and stop training as soon as the loss begins to increase).
• Ridge regularization and Lasso regularization.
• Use dropout for neural networks to tackle overfitting.
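
As a rough illustration of reducing model complexity, the sketch below (an assumption for illustration, not part of the original slides) limits the depth of a scikit-learn decision tree; the shallower tree usually generalizes better than an unconstrained one.

# Minimal sketch: limiting tree depth to reduce overfitting (illustrative only)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)                  # no depth limit
shallow_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)  # constrained depth

print("unconstrained:", deep_tree.score(X_train, y_train), deep_tree.score(X_test, y_test))
print("max_depth=4  :", shallow_tree.score(X_train, y_train), shallow_tree.score(X_test, y_test))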
Underfitting
• Underfitting, on the other hand, means the model
has not captured the underlying logic of the data. It
doesn’t know what to do with the task we’ve given it
and, therefore, provides an answer that is far from
correct.
Underfitting

Reasons for Underfitting

• The model is too simple, so it may not be capable of representing the complexities in the data.
• The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
• The size of the training dataset used is not enough.
• Excessive regularization is used to prevent overfitting, which constrains the model from capturing the data well.
• Features are not scaled.
Techniques to Reduce Underfitting
• Increase model complexity (a short sketch follows this list).
• Increase the number of features by performing feature engineering.
• Remove noise from the data.
• Increase the number of epochs or the duration of training to get better results.
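
As one hedged illustration of increasing model complexity (the dataset and models below are assumptions, not from the original slides), adding polynomial features lets a linear model fit a curved relationship it would otherwise underfit.

# Minimal sketch: adding polynomial features to reduce underfitting (illustrative only)
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)   # quadratic target

plain = LinearRegression().fit(X, y)                                           # too simple: underfits
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print("linear R^2    :", plain.score(X, y))
print("polynomial R^2:", poly.score(X, y))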
Good Fit

Ideally, a model that makes predictions with zero error is said to have a good fit on the data. This situation is achievable at a spot between overfitting and underfitting. To understand it, we have to look at the performance of our model over time, while it is learning from the training dataset.

As time passes, our model keeps learning, and the error on the training and testing data keeps decreasing. If it learns for too long, the model becomes more prone to overfitting due to the presence of noise and less useful details, and the performance of the model decreases. In order to get a good fit, we stop at a point just before the testing error starts increasing (a rough sketch follows). At this point, the model is said to perform well on the training dataset as well as on our unseen testing dataset.
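
The stopping rule described above can be sketched roughly as follows (an illustrative assumption, not the slides' own code): train incrementally and stop once the validation error stops improving.

# Minimal sketch: stop training just before validation error starts increasing (illustrative only)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = SGDClassifier(loss="log_loss", random_state=0)   # logistic loss; older scikit-learn versions use loss="log"
best_error, patience = np.inf, 0
for epoch in range(100):
    model.partial_fit(X_train, y_train, classes=np.unique(y))
    val_error = 1.0 - model.score(X_val, y_val)
    if val_error < best_error:
        best_error, patience = val_error, 0
    else:
        patience += 1
        if patience >= 5:          # validation error has stopped improving: stop here
            break
print("stopped at epoch", epoch, "with validation error", round(best_error, 3))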
Overfitting, Appropriate Fitting, Underfitting
Ensemble Learning

Ensemble learning is a machine learning technique that enhances accuracy and resilience in
forecasting by merging predictions from multiple models. It aims to mitigate errors or biases
that may exist in individual models by leveraging the collective intelligence of the ensemble.

The underlying concept behind ensemble learning is to combine the outputs of diverse
models to create a more precise prediction. By considering multiple perspectives and utilizing
the strengths of different models, ensemble learning improves the overall performance of
the learning system. This approach not only enhances accuracy but also provides resilience
against uncertainties in the data. By effectively merging predictions from multiple models,
ensemble learning has proven to be a powerful tool in various domains, offering more robust
and reliable forecasts.
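
A hedged example of combining the outputs of diverse models (the specific models and dataset below are assumptions for illustration) is a simple voting ensemble:

# Minimal sketch: combining diverse models with majority voting (illustrative only)
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",                     # each model casts one vote per sample
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))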
Bagging and Boosting

Bagging is an Ensemble Learning technique which aims to reduce learning error through the implementation of a set of homogeneous machine learning algorithms. The key idea of bagging is the use of multiple base learners which are trained separately on random samples from the training set and which, through a voting or averaging approach, produce a more stable and accurate model.
Bagging and Boosting

The two main components of the bagging technique are random sampling with replacement (bootstrapping) and the set of homogeneous machine learning algorithms (ensemble learning). The bagging process is quite easy to understand: first, “n” subsets are extracted from the training set, then these subsets are used to train “n” base learners of the same type. To make a prediction, each of the “n” learners is fed the test sample, and the outputs of the learners are averaged (in the case of regression) or voted on (in the case of classification); a code sketch follows below.
Whole bagging process
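
A minimal sketch of the process just described, assuming scikit-learn's BaggingClassifier and a synthetic dataset (both assumptions for illustration): each base tree is trained on a bootstrap sample and the ensemble votes on the final class.

# Minimal sketch: bagging with bootstrap samples and majority voting (illustrative only)
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),   # "n" homogeneous base learners (older versions call this base_estimator)
    n_estimators=50,                      # n = 50 bootstrap subsets, one tree per subset
    bootstrap=True,                       # sample with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)
print("bagging accuracy:", bagging.score(X_test, y_test))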
Boosting

Boosting is an Ensemble Learning technique that, like bagging, makes use of a set of base learners to improve the stability and effectiveness of an ML model. The idea behind a boosting architecture is the generation of sequential hypotheses, where each hypothesis tries to improve or correct the mistakes made by the previous one.

The central idea of boosting is the implementation of homogeneous ML algorithms in a sequential way, where each of these ML algorithms tries to improve the stability of the model by focusing on the errors made by the previous ML algorithm. The way in which the errors of each base learner are taken into account by the next base learner in the sequence is the key differentiator between the variations of the boosting technique.
Boosting
(AdaBoost diagram: datasets B1, B2, B3 and the combined model B4)
The Explanation for Training the Boosting Model:

The diagram above explains the AdaBoost algorithm in a very simple way. Let’s try to understand it in a stepwise process:

• B1 consists of 10 data points of two types, plus(+) and minus(-): 5 are plus(+) and the other 5 are minus(-), and each one is initially assigned equal weight. The first model tries to classify the data points and generates a vertical separator line, but it wrongly classifies 3 plus(+) as minus(-).

• B2 consists of the 10 data points from the previous model, in which the 3 wrongly classified plus(+) are weighted more, so that the current model tries harder to classify these pluses(+) correctly. This model generates a vertical separator line that correctly classifies the previously wrongly classified pluses(+), but in this attempt it wrongly classifies three minuses(-).

• B3 consists of the 10 data points from the previous model, in which the 3 wrongly classified minus(-) are weighted more, so that the current model tries harder to classify these minuses(-) correctly. This model generates a horizontal separator line that correctly classifies the previously wrongly classified minuses(-).

• B4 combines B1, B2, and B3 in order to build a strong prediction model which is much better than any individual model used; a short code sketch follows.
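
A hedged sketch of the same idea with scikit-learn's AdaBoostClassifier (the dataset and settings below are illustrative assumptions): each shallow tree is fitted to a re-weighted version of the data that emphasizes the previous tree's mistakes.

# Minimal sketch: AdaBoost with shallow trees as sequential base learners (illustrative only)
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a "stump", like the single separator lines in B1-B3
    n_estimators=50,                                # 50 sequential hypotheses (older versions call this base_estimator)
    random_state=0,
)
boosting.fit(X_train, y_train)
print("AdaBoost accuracy:", boosting.score(X_test, y_test))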
Boosting

Disadvantages of Boosting Algorithms

Boosting algorithms also have some disadvantages; these are:
• Boosting algorithms are vulnerable to outliers.
• It is difficult to use boosting algorithms for real-time applications.
• They are computationally expensive for large datasets.
Random Forest
Random Forest is a technique that uses ensemble learning: it combines many weak classifiers to provide solutions to complex problems.
As the name suggests, a random forest consists of many decision trees. Rather than depending on one tree, it takes the prediction from each tree and, based on the majority vote of those predictions, predicts the final output.

The main difference between a single decision tree and a Random Forest is that Random Forest is a bagging method that uses subsets of the original dataset to make predictions, and this property of Random Forest helps to overcome overfitting. Instead of building a single decision tree, Random Forest builds a number of decision trees with different sets of observations. One big advantage of this algorithm is that it can be used for classification as well as regression problems.
Random Forest

Steps involved in the Random Forest Algorithm

Step 1 – We first make subsets of our original data. We perform row sampling and feature sampling, meaning we select rows and columns with replacement and create subsets of the training dataset.

Step 2 – We create an individual decision tree for each subset we take.

Step 3 – Each decision tree gives an output.

Step 4 – The final output is decided by majority voting if it is a classification problem, and by averaging if it is a regression problem (a short sketch follows these steps).
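
These steps correspond to what scikit-learn's RandomForestClassifier performs internally; the snippet below is a minimal, assumed illustration rather than the slides' own code.

# Minimal sketch: Random Forest with row and feature sampling handled internally (illustrative only)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,        # Steps 1-2: 100 bootstrap subsets, one tree per subset
    max_features="sqrt",     # feature sampling: each split considers a random subset of columns
    bootstrap=True,          # row sampling with replacement
    random_state=0,
)
forest.fit(X_train, y_train)
print("random forest accuracy:", forest.score(X_test, y_test))  # Steps 3-4: trees vote, majority wins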
OOB (Out Of Bag) Evaluation

Since sampling is done with replacement, about one-third of the data is not used to train each tree, and this data is called the out-of-bag samples.

To get the OOB evaluation we need to set a parameter called oob_score to True. The score we get from the OOB samples and the score on the test dataset are roughly the same. In this way, we can use these left-out samples to evaluate our model (see the sketch below).
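
A minimal sketch of OOB evaluation, assuming the same kind of illustrative dataset as above: with oob_score=True, scikit-learn scores each training sample using only the trees that did not see it during training.

# Minimal sketch: out-of-bag evaluation with oob_score=True (illustrative only)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X_train, y_train)

print("OOB score :", round(forest.oob_score_, 3))                 # estimated from the left-out samples
print("test score:", round(forest.score(X_test, y_test), 3))      # usually close to the OOB score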
