Module 2
[8 Sessions] [Apply]
Suppose you pose a complex question to thousands of random people, then aggregate their
answers. In many cases you will find that this aggregated answer is better than an expert's
answer.
Suppose you have trained a few classifiers, each achieving about 80% accuracy.
You may have:
• Logistic Regression classifier
• SVM classifier
• Random Forest classifier
• K-Nearest Neighbors classifier
A very simple way to create an even better classifier is to aggregate the predictions of each
classifier and predict the class that gets the most votes. This majority-vote classifier is
called a hard voting classifier.
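A minimal sketch of such a hard voting classifier with scikit-learn's VotingClassifier; the make_moons dataset and the untuned hyperparameters are illustrative assumptions, not part of the notes.

```python
# Hard voting: each classifier casts one vote, the majority class wins.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("svc", SVC()),
        ("rf", RandomForestClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # predict the class that gets the most votes
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))  # often beats each individual classifier
```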
Sampling both training instances and features is called the Random Patches method.
Keeping all training instances but sampling features (i.e., bootstrap_features=True and/or
max_features smaller than 1.0) is called the Random Subspaces method.
This is particularly useful when you are dealing with high-dimensional inputs (such as
images).
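A minimal sketch of the two sampling schemes with scikit-learn's BaggingClassifier; the sampling fractions below are illustrative assumptions.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Random Patches: sample both training instances and features.
patches_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    max_samples=0.7, bootstrap=True,            # sample instances
    max_features=0.6, bootstrap_features=True,  # sample features
)

# Random Subspaces: keep all training instances, sample only features.
subspaces_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    max_samples=1.0, bootstrap=False,           # keep every instance
    max_features=0.6, bootstrap_features=True,  # sample features
)
```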
Random Forests
Random Forest is an ensemble of Decision Trees, generally trained via the bagging
method (or sometimes pasting).
Instead of building a BaggingClassifier and passing it a DecisionTreeClassifier, you can
use the RandomForestClassifier class, which is more convenient and optimized
for Decision Trees.
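A minimal sketch of the two roughly equivalent ensembles; max_leaf_nodes=16 and n_estimators=500 are illustrative choices, not prescribed values.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging a decision tree that samples features at each split ...
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=500, n_jobs=-1, random_state=42,
)

# ... is roughly equivalent to the dedicated, optimized class.
rnd_clf = RandomForestClassifier(
    n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42,
)
```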
Pruning: to reduce the complexity of the Decision Tree algorithm
Random Forest: an improved version of the Decision Tree
Bagging using Random Forest Trees
• Random Forest is a specific ensemble method that
utilizes bagging as its underlying technique.
• Random forest is one of the most popular tree-based
supervised learning algorithms. It is also flexible and
easy to use.
• The algorithm can be used to solve both classification
and regression problems.
• A Random Forest typically combines hundreds of decision
trees, training each tree on a different sample of the
observations.
Algorithm
• Step 1: The algorithm selects random samples from
the provided dataset.
• Step 2: The algorithm builds a decision tree for
each selected sample, then obtains a prediction
from each tree.
• Step 3: Voting is performed over the predictions:
for a classification problem it uses the mode, and
for a regression problem it uses the mean.
• Step 4: Finally, the algorithm selects the most
voted prediction as the final result, as sketched below.
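A from-scratch sketch of the four steps for classification, assuming X_train, y_train and X_new are NumPy arrays; it only illustrates the steps and is not a replacement for the optimized library implementation.

```python
import numpy as np
from scipy import stats
from sklearn.tree import DecisionTreeClassifier

def random_forest_predict(X_train, y_train, X_new, n_trees=100, seed=42):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_trees):
        # Step 1: draw a random (bootstrap) sample of the dataset.
        idx = rng.integers(0, n, size=n)
        # Step 2: build a decision tree on the sample and get its predictions.
        tree = DecisionTreeClassifier(max_features="sqrt")
        tree.fit(X_train[idx], y_train[idx])
        all_preds.append(tree.predict(X_new))
    # Steps 3-4: vote and keep the mode (use the mean for regression).
    return stats.mode(np.array(all_preds), axis=0).mode.ravel()
```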
Extremely Randomized Trees ensemble
When you are growing a tree in a Random Forest, at each node only a random subset of
the features is considered for splitting.
Trees can be made even more random by also using random thresholds for each feature
rather than searching for the best possible thresholds.
A forest of such extremely random trees is simply called an Extremely Randomized
Trees (Extra-Trees) ensemble.
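In scikit-learn this corresponds to the ExtraTreesClassifier class, which shares the RandomForestClassifier API; a minimal sketch with illustrative hyperparameters:

```python
from sklearn.ensemble import ExtraTreesClassifier

# Random thresholds per feature instead of searching for the best split.
extra_clf = ExtraTreesClassifier(
    n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42,
)
# extra_clf.fit(X_train, y_train) then extra_clf.predict(X_test), as usual.
```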
Boosting
Boosting (originally called hypothesis boosting) refers to any Ensemble method that
can combine several weak learners into a strong learner.
The general idea of most boosting methods is to train predictors sequentially, each
trying to correct its predecessor.
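As one concrete example (not detailed in these notes), AdaBoost trains each new predictor to pay more attention to the instances its predecessors misclassified; a minimal sketch with decision stumps as the weak learners and illustrative hyperparameters:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Sequential ensemble of weak learners (depth-1 "stumps").
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=200, learning_rate=0.5, random_state=42,
)
# ada_clf.fit(X_train, y_train)
```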
Stacking
Instead of using trivial functions (such as hard voting) to aggregate the predictions of
all predictors in an ensemble, we train a model to perform this aggregation.
The training set is split into two subsets. The first subset is used to train the predictors in the
first layer.
Next, the first-layer predictors are used to make predictions on the second (held-out) subset.
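A minimal sketch of this first step, assuming the make_moons data and the particular first-layer predictors below (both illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Split the training set itself: subset one trains the first layer,
# subset two is held out for the blender.
X_sub1, X_sub2, y_sub1, y_sub2 = train_test_split(
    X_train, y_train, test_size=0.5, random_state=42)

layer1 = [LogisticRegression(), SVC(), RandomForestClassifier(random_state=42)]
for clf in layer1:
    clf.fit(X_sub1, y_sub1)

# First-layer predictions on the held-out subset, one column per predictor.
held_out_preds = np.column_stack([clf.predict(X_sub2) for clf in layer1])
```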
Training the Blender
We can create a new training set using these predicted values as input features (one feature
per first-layer predictor, which makes this new training set three-dimensional here) while
keeping the target values. The blender is trained on this new training set, so it learns to
predict the target value given the first layer's predictions.
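Continuing the sketch above (reusing layer1, held_out_preds and y_sub2), the blender is just another model fitted on the first layer's held-out predictions; using LogisticRegression as the blender is an illustrative choice.

```python
from sklearn.linear_model import LogisticRegression

# The held-out predictions become the input features of the new training set;
# the targets stay the same, so the blender learns: layer-1 outputs -> target.
blender = LogisticRegression()
blender.fit(held_out_preds, y_sub2)

def stacked_predict(X_new):
    # Run the first layer, then feed its predictions to the blender.
    layer1_preds = np.column_stack([clf.predict(X_new) for clf in layer1])
    return blender.predict(layer1_preds)
```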
Predictions in a multilayer stacking ensemble
• The training set is split into three subsets: the first subset is used to train the first layer,
the second is used to create the training set for the second layer (using the first layer's
predictions), and the third is used to create the training set for the third layer (using the
second layer's predictions).
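For reference, scikit-learn's StackingClassifier automates a two-layer stack; note that it trains the blender on cross-validated predictions rather than on a single held-out subset, and a deeper stack can be built by nesting another StackingClassifier as the final_estimator. The estimators below are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

stack_clf = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("svc", SVC()),
        ("rf", RandomForestClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # the blender
    cv=5,  # cross-validated first-layer predictions train the blender
)
# stack_clf.fit(X_train, y_train)
```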