
Module II: Ensemble Learning

[8 Sessions] [Apply]

Ensemble Learning – using subset of instances – Bagging, Pasting; using subset of features – Random Patches and Random Subspaces methods; Voting Classifier, Random Forest; Boosting – AdaBoost, Gradient Boosting, Extremely Randomized Trees, Stacking.
Why Ensemble Learning?

Suppose you ask a complex question to thousands of random people, then aggregate their answers. In many cases you will find that this aggregated answer is better than an expert's answer.

Similarly, if you aggregate the predictions of a group of predictors (such as classifiers or regressors), you will often get better predictions than with the best individual predictor.

A group of predictors is called an ensemble; thus, this technique is called Ensemble Learning, and an Ensemble Learning algorithm is called an Ensemble method.
Voting Classifiers

Suppose you have trained a few classifiers, each one achieving about 80% accuracy.
You may have:
• Logistic Regression classifier
• SVM classifier
• Random Forest classifier
• K-Nearest Neighbors classifier
A very simple way to create an even better classifier is to aggregate the predictions of each classifier and predict the class that gets the most votes. This majority-vote classifier is called a hard voting classifier.

Surprisingly, this voting classifier often achieves a higher accuracy than the best classifier in the ensemble.

Hard Voting Classifier Predictions


The following code creates and trains a voting classifier in Scikit-Learn, composed of
three diverse classifiers:
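The code itself is not reproduced in this extract; the sketch below shows what such a hard voting classifier looks like in Scikit-Learn, assuming a training split x_train, y_train is available (for example, the split created in the Python coding example later in this module):

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Three diverse base classifiers
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

# Hard voting: predict the class that gets the most votes
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(x_train, y_train)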

The Voting Classifier outperforms the individual classifiers, with an accuracy of 90.4%.
Bagging and Pasting
Another approach is to use the same training algorithm for every predictor, but to train
them on different random subsets of the training set.
When sampling is performed with replacement, this method is called bagging (short for
bootstrap aggregating ).
When sampling is performed without replacement, it is called pasting.

Pasting/Bagging Training Set sampling and Training


Bagging Classifier
Python coding for ensemble learning
import pandas as pd
from pandas import read_csv
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn import tree
from sklearn import svm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the dataset and separate the features from the target column
df = read_csv("diabetes.csv")
x = np.array(df.drop(columns=["Outcome"]))
y = np.array(df["Outcome"])

# Hold out 20% of the data for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)

# Three individual classifiers
model1 = tree.DecisionTreeClassifier()
model2 = svm.SVC(kernel='sigmoid', C=1, gamma=1)
model3 = LogisticRegression(solver='liblinear', random_state=0)

model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)

prediction1 = model1.predict(x_test)
print("Decision Tree")
print(accuracy_score(prediction1, y_test)*100)

prediction2 = model2.predict(x_test)
print("support vector machine")
print(accuracy_score(prediction2, y_test)*100)

prediction3 = model3.predict(x_test)
print("LogisticRegression")
print(accuracy_score(prediction3, y_test)*100)

Output:
Decision Tree
70.77922077922078
support vector machine
65.5844155844156
LogisticRegression
74.02597402597402
# Ensemble of models
estimator = []
estimator.append(('LR', LogisticRegression(solver='lbfgs', multi_class='multinomial', max_iter=200)))
estimator.append(('SVC', svm.SVC(gamma='auto', probability=True)))
estimator.append(('DTC', tree.DecisionTreeClassifier()))

# Voting classifier with hard voting
from sklearn.ensemble import VotingClassifier
hard_voting = VotingClassifier(estimators=estimator, voting='hard')
hard_voting.fit(x_train, y_train)
prediction4 = hard_voting.predict(x_test)
print("Ensemble learning")
print(accuracy_score(prediction4, y_test)*100)

Output:
Ensemble learning
75.32467532467533


The following code trains an ensemble of 500 Decision Tree classifiers, each trained on 100 training instances randomly sampled from the training set with replacement (this is an example of bagging, but if you want to use pasting instead, just set bootstrap=False).
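The code itself does not appear in this extract; a minimal sketch under those settings, reusing the x_train, y_train split from the earlier diabetes example, might look like this:

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 500 trees, each trained on 100 instances sampled with replacement (bagging);
# set bootstrap=False to sample without replacement (pasting) instead
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1)
bag_clf.fit(x_train, y_train)
y_pred = bag_clf.predict(x_test)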
A single Decision Tree versus a bagging ensemble of 500 trees
Random Patches and Random Subspaces
The BaggingClassifier class supports sampling the features as well. This is controlled by two
hyperparameters: max_features and bootstrap_features.
They work the same way as max_samples and bootstrap, but for feature sampling instead of
instance sampling.
Thus, each predictor will be trained on a random subset of the input features.

Sampling both training instances and features is called the Random Patches method. Keeping all training instances but sampling features (i.e., bootstrap_features=True and/or max_features smaller than 1.0) is called the Random Subspaces method.

This is particularly useful when you are dealing with high-dimensional inputs (such as
images).
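As an illustration (not from the slides), a Random Subspaces ensemble can be configured with BaggingClassifier, again assuming the earlier x_train, y_train split; the 0.5 feature fraction is an arbitrary choice:

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Random Subspaces: keep all training instances (bootstrap=False, max_samples=1.0)
# but train each tree on a random subset of the features
subspace_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=False, max_samples=1.0,
    bootstrap_features=True, max_features=0.5, n_jobs=-1)
subspace_clf.fit(x_train, y_train)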
Random Forests

Random Forest is an ensemble of Decision Trees, generally trained via the bagging
method (or sometimes pasting).

Instead of building a Bagging Classifier and passing it a Decision Tree Classifier, you can use the Random Forest Classifier class, which is more convenient and optimized for Decision Trees.

Random Forest tends to combine hundreds of decision trees and then trains each decision tree on a different sample of the observations.
Decision Tree is a series of Nodes

Pruning - To reduce the complexity of the Decision Tree Algorithm

Random Forest - An improved version of Decision Tree
Bagging using Random Forest Trees
• Random Forest is a specific ensemble method that
utilizes bagging as its underlying technique.
• Random forest is one of the most popular tree-based supervised learning algorithms. It is also flexible and easy to use.
• The algorithm can be used to solve both classification
and regression problems.
• Random forest tends to combine hundreds of decision
trees and then trains each decision tree on a different
sample of the observations.
Algorithm
• Step 1: The algorithm selects random samples from the dataset provided.
• Step 2: The algorithm will create a decision tree for
each sample selected. Then it will get a prediction
result from each decision tree created.
• Step 3: Voting will then be performed for every
predicted result. For a classification problem, it will
use mode, and for a regression problem, it will
use mean.
• Step 4: And finally, the algorithm will select the
most voted prediction result as the final prediction.
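As an illustrative sketch (not from the slides), these steps correspond to Scikit-Learn's RandomForestClassifier; the hyperparameter values below are arbitrary and the earlier diabetes split is assumed:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 500 trees, each trained on a bootstrap sample of the training set;
# predictions are aggregated by majority vote
rf_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=100)
rf_clf.fit(x_train, y_train)
print(accuracy_score(rf_clf.predict(x_test), y_test)*100)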
Extremely Randomized Trees ensemble

When you are growing a tree in a Random Forest, at each node only a random subset of
the features is considered for splitting.
Trees can be made even more random by also using random thresholds for each feature rather than searching for the best possible thresholds.
A forest of such extremely random trees is simply called an Extremely Randomized Trees ensemble (Extra-Trees).
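A sketch of this in Scikit-Learn (illustrative; it assumes the same x_train, y_train split as before) uses the ExtraTreesClassifier class, which has essentially the same API as RandomForestClassifier:

from sklearn.ensemble import ExtraTreesClassifier

# Extra-Trees: random split thresholds per candidate feature instead of
# searching for the best possible threshold
extra_clf = ExtraTreesClassifier(n_estimators=500, n_jobs=-1, random_state=100)
extra_clf.fit(x_train, y_train)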
Boosting
Boosting (originally called hypothesis boosting) refers to any Ensemble method that
can combine several weak learners into a strong learner.

The general idea of most boosting methods is to train predictors sequentially, each
trying to correct its predecessor.

The most popular Boosting methods are:
• AdaBoost (Adaptive Boosting)
• Gradient Boosting
AdaBoost
A first base classifier (such as a Decision Tree) is trained and used to make predictions on the training set. The relative weight of misclassified training instances is then increased. A second classifier is trained using the updated weights and again makes predictions on the training set; the weights are updated, and so on.

AdaBoost sequential training with instance weight updates


The first classifier gets many instances wrong, so their weights get
boosted.
The second classifier therefore does a better job on these instances,
and so on.
The plot on the right represents the same sequence of predictors
except that the learning rate is halved (i.e., the misclassified instance
weights are boosted half as much at every iteration).
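A minimal sketch of AdaBoost in Scikit-Learn (illustrative, not the slide's code; the stump depth, number of estimators, and learning rate are arbitrary, and the earlier training split is assumed):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Sequentially trained decision stumps; learning_rate scales how strongly
# misclassified instance weights are boosted at each iteration
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=200,
    learning_rate=0.5, random_state=100)
ada_clf.fit(x_train, y_train)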
Gradient Boosting
• Gradient Boosting works by sequentially adding
predictors to an ensemble, each one correcting
its predecessor. However, instead of tweaking the
instance weights at every iteration like AdaBoost
does, this method tries to fit the new predictor to
the residual errors made by the previous
predictor.
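An illustrative sketch with Scikit-Learn's GradientBoostingClassifier (not from the slides; hyperparameter values are arbitrary and the earlier x_train, y_train split is assumed):

from sklearn.ensemble import GradientBoostingClassifier

# Each new shallow tree is fit to the residual errors of the ensemble built so far;
# learning_rate scales the contribution of each tree
gb_clf = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=100)
gb_clf.fit(x_train, y_train)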

Stacking
Instead of using trivial functions (such as hard voting) to aggregate the predictions of
all predictors in an ensemble, we train a model to perform this aggregation.

Each of the bottom three predictors predicts a different value (3.1, 2.7, and 2.9), and then the final predictor (called a blender, or a meta learner) takes these predictions as inputs and makes the final prediction (3.0).
Training the First layer

The training set is split into two subsets. The first subset is used to train the predictors in the first layer.
Next, the first-layer predictors are used to make predictions on the second (held-out) subset.

Training the Blender
We can create a new training set using these predicted values as input features (which makes this new training set three-dimensional), while keeping the target values. The blender is trained on this new training set, so it learns to predict the target value given the first layer's predictions.
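As an illustrative sketch (not from the slides), Scikit-Learn's StackingClassifier implements this idea: it uses cross-validated predictions of the first-layer estimators to build the blender's training set. The estimator list and blender below are arbitrary choices, and the earlier x_train, y_train split is assumed:

from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# First layer: three diverse predictors; final_estimator is the blender (meta learner)
stack_clf = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier()),
                ('svc', SVC(probability=True)),
                ('lr', LogisticRegression(max_iter=200))],
    final_estimator=LogisticRegression(),
    cv=5)
stack_clf.fit(x_train, y_train)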

Predictions in a multilayer stacking ensemble
• The first training subset is used to train the first layer, the second one is used to create the training set used to train the second layer, and the third one is used to create the training set used to train the third layer.

• Once this is done, we can make a prediction for a new instance by going through each layer sequentially.
