Ensemble Methods in Machine Learning
● Bagging. Building multiple models (typically of the same type) from different subsamples of the training
dataset.
● Boosting. Building multiple models (typically of the same type) each of which learns to fix the prediction
errors of a prior model in the chain.
● Voting. Building multiple models (typically of differing types) and using simple statistics (like calculating the
mean) to combine their predictions.
Bagging Algorithms
Bootstrap Aggregation or bagging involves taking multiple samples from your training dataset (with replacement)
and training a model for each sample.
The final output prediction is averaged across the predictions of all of the sub-models.
Bagging performs best with algorithms that have high variance. A popular example is decision trees, often
constructed without pruning.
The example below uses the BaggingClassifier with the Classification and Regression Trees algorithm
(DecisionTreeClassifier). A total of 100 trees are created.
# Bagged Decision Trees for Classification
import pandas
from sklearn import model_selection
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
#url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv('diabetes.csv')
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
seed = 7
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
cart = DecisionTreeClassifier()
num_trees = 100
model = BaggingClassifier(estimator=cart, n_estimators=num_trees, random_state=seed)  # use base_estimator=cart on scikit-learn < 1.2
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
Random Forest
Random forest is an extension of bagged decision trees.
Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the
correlation between the individual classifiers. Specifically, rather than greedily choosing the best split point over all
features in the construction of the tree, only a random subset of features is considered for each split.
You can construct a Random Forest model for classification using the RandomForestClassifier class.
The example below demonstrates Random Forest for classification with 100 trees and split points chosen
from a random selection of 3 features.
# Random Forest Classification
import pandas
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier
#url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv('diabetes.csv')
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
seed = 7
num_trees = 100
max_features = 3
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
Extra Trees
Extra Trees (Extremely Randomized Trees) are another modification of bagging in which randomized trees are
constructed from samples of the training dataset, with split points chosen at random rather than optimized.
You can construct an Extra Trees model for classification using the ExtraTreesClassifier class.
The example below provides a demonstration of extra trees with the number of trees set to 100 and splits chosen
from 7 random features.
# Extra Trees Classification
import pandas
from sklearn import model_selection
from sklearn.ensemble import ExtraTreesClassifier
#url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv('diabetes.csv')
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
seed = 7
num_trees = 100
max_features = 7
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = ExtraTreesClassifier(n_estimators=num_trees, max_features=max_features, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
Boosting Algorithms
Boosting ensemble algorithms create a sequence of models, each of which attempts to correct the mistakes of the
models before it in the sequence.
Once created, the models make predictions which may be weighted by their demonstrated accuracy and the
results are combined to create a final output prediction.
The two most common boosting ensemble machine learning algorithms are:
1. AdaBoost
2. Stochastic Gradient Boosting
AdaBoost
AdaBoost was perhaps the first successful boosting ensemble algorithm. It generally works by weighting
instances in the dataset by how easy or difficult they are to classify, allowing the algorithm to pay more or less
attention to them in the construction of subsequent models.
You can construct an AdaBoost model for classification using the AdaBoostClassifier class.
The example below demonstrates the construction of 30 decision trees in sequence using the AdaBoost
algorithm.
# AdaBoost Classification
import pandas
from sklearn import model_selection
from sklearn.ensemble import AdaBoostClassifier
#url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv('diabetes.csv')
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
seed = 7
num_trees = 30
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = AdaBoostClassifier(n_estimators=num_trees, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
Stochastic Gradient Boosting
Stochastic Gradient Boosting (also called Gradient Boosting Machines) is one of the most sophisticated
ensemble techniques. It is also proving to be perhaps one of the best techniques available for
improving performance via ensembles.
You can construct a Gradient Boosting model for classification using the GradientBoostingClassifier class.
The example below demonstrates Stochastic Gradient Boosting for classification with 100 trees.
# Stochastic Gradient Boosting Classification
import pandas
from sklearn import model_selection
from sklearn.ensemble import GradientBoostingClassifier
#url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv('diabetes.csv')
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
seed = 7
num_trees = 100
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
model = GradientBoostingClassifier(n_estimators=num_trees, random_state=seed)
results = model_selection.cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
Voting Ensemble
Voting is one of the simplest ways of combining the predictions from multiple machine learning algorithms.
It works by first creating two or more standalone models from your training dataset. A Voting Classifier can then
be used to wrap your models and average the predictions of the sub-models when asked to make predictions for
new data.
You can create a voting ensemble model for classification using the VotingClassifier class.
The code below combines the predictions of logistic regression, classification and regression trees, and support
vector machines for a classification problem.
# Voting Ensemble for Classification
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
#url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pandas.read_csv('diabetes.csv')
array = dataframe.values
X = array[:,0:8]
Y = array[:,8]
seed = 7
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
# create the sub models
estimators = []
model1 = LogisticRegression(max_iter=1000)  # a higher max_iter helps avoid convergence warnings on the unscaled data
estimators.append(('logistic', model1))
model2 = DecisionTreeClassifier()
estimators.append(('cart', model2))
model3 = SVC()
estimators.append(('svm', model3))
# create the ensemble model
ensemble = VotingClassifier(estimators)
results = model_selection.cross_val_score(ensemble, X, Y, cv=kfold)
print(results.mean())
Stacking
A stacking classifier is an ensemble method where the output from multiple classifiers is passed as an input
to a meta-classifier for the task of the final classification. The individual classification models are trained
based on the complete training set, then the meta-classifier is fitted based on the outputs (meta-features) of
the individual classification models.
Stacking has been used successfully in several machine learning competitions on Kaggle and is definitely a
must-know technique. Stacking is an ensemble technique that uses a new model to learn
how to best combine the predictions from two or more models trained on your dataset.
Voting Vs Stacking
Voting - the sub-models' predictions are combined with a fixed, constant rule (such as majority vote or averaging)
that gives the answer.
Stacking - the algorithm treats the sub-models' predictions as a new representation of the problem and creates
another abstraction layer, a meta-classifier, that learns how to predict the correct label from those k votes.
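To make the contrast concrete, the minimal sketch below (the choice of base estimators here is an arbitrary assumption) wraps the same base models first in a VotingClassifier, which combines their predictions with a fixed majority-vote rule, and then in a StackingClassifier, which trains a logistic regression meta-classifier on the base models' predictions.
# Voting vs. stacking over the same base models (illustrative sketch)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier, StackingClassifier

# the same three base models in both ensembles (an arbitrary choice for illustration)
base_models = [('logistic', LogisticRegression(max_iter=1000)),
               ('cart', DecisionTreeClassifier()),
               ('svm', SVC())]

# Voting: predictions are combined with a fixed rule (a hard majority vote here)
voting = VotingClassifier(estimators=base_models, voting='hard')

# Stacking: a meta-classifier is trained on the base models' cross-validated predictions
stacking = StackingClassifier(estimators=base_models,
                              final_estimator=LogisticRegression(max_iter=1000),
                              cv=5)
Either ensemble can then be fit and scored like any other scikit-learn estimator, for example with cross_val_score as in the examples above.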
Stacking vs bagging and boosting
1. Bagging (stands for Bootstrap Aggregating): we use bagging for combining weak learners (base models) of high
variance. Bagging aims to produce a model with lower variance than the individual weak models. Bagging takes
advantage of the bootstrapping technique: sampling different sets of data from a given training set with
replacement. After bootstrapping the training dataset, we train a model on each of the different sets and aggregate the
results. Unlike bagging, in stacking, the models are typically different (e.g. not all decision trees) and fit on the same
dataset (instead of samples of the training dataset).
2. Boosting: in boosting the learners are trained sequentially. The algorithm learns models sequentially in a very
adaptive way (a base model depends on the previous ones) and combines them following a deterministic strategy.
Unlike boosting, in stacking, a single model is used to learn how to best combine the predictions from the
contributing models (instead of correcting the predictions of prior models).
Bagging trains models in parallel, boosting trains them sequentially, and stacking creates a new meta-model on top of the contributing models.
Stacking Scikit-Learn
Below, stacking is applied to two machine learning problems with the help of Scikit-Learn. Scikit-learn is a free
software machine learning library for the Python programming language. It features various classification,
regression and clustering algorithms, including support vector machines, linear regression, logistic
regression, k-means clustering and many more.
The first problem is the famous iris problem, in which, given some attributes, we have to classify an iris
flower as Setosa, Versicolor, or Virginica, its three species. The second problem is wine
recognition, in which we have to classify the wine into three categories. Both of these datasets are available
in the Scikit-learn library. In the code below, get_models() builds the candidate models (the standalone classifiers
plus a stacking ensemble of them) and evaluate_model() scores each one on both datasets with repeated stratified
k-fold cross-validation; the definitions shown are one plausible way to implement these helpers.
from numpy import mean
from numpy import std
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import StackingClassifier
from matplotlib import pyplot
from sklearn.datasets import load_wine, load_iris
# define the datasets: wine and iris
X, y = load_wine().data, load_wine().target
X1, y1 = load_iris().data, load_iris().target
# Note: get_models() and evaluate_model() are sketched here as one plausible
# implementation based on the imported estimators
def get_models():
    # the standalone base models plus a stacking ensemble built from the same models
    models = dict()
    models['lr'] = LogisticRegression(max_iter=1000)
    models['knn'] = KNeighborsClassifier()
    models['cart'] = DecisionTreeClassifier()
    models['svm'] = SVC()
    models['bayes'] = GaussianNB()
    level0 = [('lr', LogisticRegression(max_iter=1000)), ('knn', KNeighborsClassifier()),
              ('cart', DecisionTreeClassifier()), ('svm', SVC()), ('bayes', GaussianNB())]
    models['stacking'] = StackingClassifier(estimators=level0,
                                            final_estimator=LogisticRegression(max_iter=1000), cv=5)
    return models
def evaluate_model(model):
    # score a model on both datasets with repeated stratified k-fold cross-validation
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    scores1 = cross_val_score(model, X1, y1, scoring='accuracy', cv=cv, n_jobs=-1)
    return scores, scores1
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names, results1 = list(), list(), list()
for name, model in models.items():
    scores, scores1 = evaluate_model(model)
    results.append(scores)
    results1.append(scores1)
    names.append(name)
    print('>%s -> %.3f (%.3f)---Wine dataset' % (name, mean(scores), std(scores)))
    print('>%s -> %.3f (%.3f)---Iris dataset' % (name, mean(scores1), std(scores1)))
# plot model performance for comparison
pyplot.rcParams["figure.figsize"] = (15, 6)
pyplot.boxplot(results, labels=[s + "-wine" for s in names], showmeans=True)
pyplot.show()
pyplot.boxplot(results1, labels=[s + "-iris" for s in names], showmeans=True)
pyplot.show()