Ensemble Learning Algorithms

Uploaded by

Dharaneesh .R.P
Copyright
© © All Rights Reserved

Ensemble Learning Algorithms

With Python

Make Better Predictions with


Bagging, Boosting, and Stacking

Jason Brownlee

Disclaimer
The information contained within this eBook is strictly for educational purposes. If you wish to apply
ideas contained in this eBook, you are taking full responsibility for your actions.
The author has made every effort to ensure the accuracy of the information within this book was
correct at time of publication. The author does not assume and hereby disclaims any liability to any
party for any loss, damage, or disruption caused by errors or omissions, whether such errors or
omissions result from accident, negligence, or any other cause.
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic or
mechanical, recording or by any information storage and retrieval system, without written permission
from the author.

Acknowledgements
Special thanks to my copy editor Sarah Martin and my technical editors Michael Sanderson, Arun
Koshy, Andrei Cheremskoy, and John Halfyard.

Copyright

© Copyright 2021 Jason Brownlee. All Rights Reserved.


Ensemble Learning Algorithms With Python

Edition: v1.1

Contents

Copyright
Contents
Preface

I Introduction

II Bagging
1 Bagged Decision Trees Ensemble
1.1 Tutorial Overview
1.2 Bagging Ensemble Algorithm
1.3 Evaluate Bagging Ensembles
1.3.1 Bagging for Classification
1.3.2 Bagging for Regression
1.4 Bagging Hyperparameters
1.4.1 Explore Number of Trees
1.4.2 Explore Number of Samples
1.4.3 Explore Alternate Algorithm
1.5 Bagging Extensions
1.5.1 Pasting Ensemble
1.5.2 Random Subspace Ensemble
1.5.3 Random Patches Ensemble
1.6 Common Questions
1.7 Further Reading
1.8 Summary

Preface

Predictive skill can be the most important outcome for some modeling projects. This can
be the case when slightly better predictions can result in a large benefit to the organization.
A popular example is Netflix, where slightly better recommendations are known to result in
better customer retention with the platform. This motivated the one-million-dollar Netflix
prize, which was won using a large ensemble of models. On predictive modeling problems where
predictive performance is most important, like machine learning competitions, ensembles are
almost universally among the top and winning solutions. Ensemble learning algorithms are
required if you want the best results.
Ensemble learning used to be an advanced subfield of machine learning, left to the experts.
This was for two main reasons. The first is that many ensemble learning algorithms are a more
complex type of model, requiring the careful training and integration of multiple other machine
learning models. This makes them challenging to implement and challenging to train correctly
in a way that avoids data leakage, and in turn, optimistically misleading results. The second
reason is that ensemble learning is computationally expensive as instead of fitting and evaluating
one model, a single ensemble requires fitting tens, hundreds, or even thousands of models. This
used to require large computational resources and expertise with parallel programming.
Thankfully, things have changed. Desktop computers are now incredibly fast and are multi-core
by default. We also have access to a suite of advanced ensemble algorithms in modern
machine learning libraries such as scikit-learn in Python, as well as highly efficient third-party
implementations of some of the more powerful ensemble algorithms in libraries like XGBoost and
LightGBM. It has never been easier to rapidly evaluate advanced ensemble learning algorithms
on your own predictive modeling projects. The problem has changed from how to implement
ensemble methods correctly to which ensemble methods are available and how they can be
tailored to specific machine learning projects. That is why I
created this book.
I designed this book to take you on a tour of the most effective ensemble machine learning
algorithms and show you exactly how they can be used to address classification and regression
problems, and how to configure and tune the techniques to get the most out of them. I wanted
to skip the theory and math for each method, which may be interesting but do not tell you how
to actually configure and use the methods, and focus on showing you exactly how to get a result
so that you can bring modern and powerful ensemble learning algorithms to your own projects
as fast as possible. Ensemble learning is important to machine learning, and I believe that if it
is taught at the right level for practitioners, it can be a fascinating, fun, directly applicable, and
immeasurably useful toolbox of techniques. I hope that you agree.

Jason Brownlee
2021

Part I

Introduction

Welcome

Welcome to Ensemble Learning Algorithms With Python. Ensemble learning algorithms are
those techniques that combine the predictions of two or more machine learning algorithms with
the goal of improving predictive skill. Ensemble learning algorithms are a more advanced subfield
of machine learning, often turned to on machine learning projects when predictive performance
is the most important objective. As such, ensembles are widely used by top participants and
winners of machine learning competitions.
Traditionally, ensembles have been challenging to implement due to their increased computational
cost and complexity, which can introduce data leakage and result in optimistic estimates
of model performance. Modern libraries, such as scikit-learn and related third-party libraries,
now make working with ensembles straightforward for beginners and advanced practitioners
alike. I designed this book to teach you the techniques for ensemble learning step-by-step with
concrete and executable examples in Python.

Who Is This Book For?


Before we get started, let’s make sure you are in the right place. This book is for developers who
may know some applied machine learning. Maybe you know how to work through a predictive
modeling problem end-to-end, or at least most of the main steps, with popular tools. The
lessons in this book do assume a few things about you, such as:

• You know your way around basic Python for programming.

• You may know some basic NumPy for array manipulation.

• You may know some basic scikit-learn for modeling.

This guide was written in the top-down and results-first machine learning style that you’re
used to from Machine Learning Mastery.

About Your Outcomes


This book will teach you the techniques for ensemble learning that you need to know as a
machine learning practitioner. After reading and working through this book, you will know:

• The intuition behind drawing upon a crowd or multiple experts when making important
decisions and how this intuition carries over to ensemble learning algorithms.

• The benefits of ensemble learning techniques for predictive modeling for both lifting
predictive skill and improving model robustness.

• How to develop and evaluate multi-model algorithms for classification and regression
problems, providing a precursor to ensemble learning.

• How to develop, configure, and evaluate bagging ensembles for classification and regression
predictive modeling problems.

• How to develop and evaluate extensions to bagging, such as random subspace, random
forest, and extra trees ensembles.

• How to develop, configure, and evaluate adaptive boosting (AdaBoost) and gradient
boosting ensembles for classification and regression predictive modeling problems.

• How to develop and evaluate efficient implementations of gradient boosting ensembles,
such as extreme gradient boosting (XGBoost) and light gradient boosting machines
(LightGBM).

• How to develop, configure, and evaluate stacking ensembles for classification and regression
predictive modeling problems.

• How to develop and evaluate simpler stacking ensembles such as voting and weighted
average ensembles.

• How to develop and evaluate extensions to stacking, such as model blending and super
learner ensembles.

This book is not a substitute for an undergraduate course on ensemble learning (if such
courses exist) or a textbook for such a course, although it could complement such materials. For
a good list of top papers, textbooks, and other resources on ensemble learning, see the Further
Reading section at the end of each tutorial.

How to Read This Book


This book was written to be read linearly, from start to finish. That being said, if you know the
basics and need help with a specific technique, then you can skip straight to that section and
get started. This book was designed for you to read on your workstation, on the screen, not on
a tablet or eReader. My hope is that you have the book open right next to your editor and run
the examples as you read about them.
This book is not intended to be read passively or be placed in a folder as a reference text. It
is a playbook, a workbook, and a guidebook intended for you to learn by doing and then apply
your new understanding with working Python examples. To get the most out of the book, I
would recommend playing with the examples in each tutorial. Extend them, break them, then
fix them.

About the Book Structure


This book was designed around major ensemble learning techniques that are directly relevant to
real-world problems. There are a lot of things you could learn about ensemble learning, from
theory to abstract concepts to APIs. My goal is to take you straight to developing an intuition
for the elements you must understand with laser-focused tutorials.
The tutorials were designed to focus on how to get results with ensemble learning methods.
As such, the tutorials give you the tools to both rapidly understand and apply each technique
or operation. There is a mixture of both tutorial lessons and practical examples to introduce
the methods and give plenty of opportunities to practice using them. Each of the tutorials is
designed to take you about one hour to read through and complete, excluding the extensions
and further reading.
You can choose to work through the lessons one per day, one per week, or at your own pace.
I think momentum is critically important, and this book is intended to be read and used, not to
sit idle. I recommend picking a schedule and sticking to it. The tutorials are divided into six
parts; they are:
• Part 1: Foundation: Discover the power of ensemble learning techniques, why they are
important to getting good performance on your project, and how to develop an intuition
for what is being learned.
• Part 2: Background: Discover the background required for ensemble learning including
the diversity of ensemble members, techniques for combining predictions, the complexity
of ensemble models, and the main types of ensemble methods.
• Part 3: Multiple Models: Discover machine learning techniques that involve explicitly
using multiple models that provide the foundation for ensemble learning methods.
• Part 4: Bagging: Discover the bootstrap aggregation, or bagging, family of ensemble
learning techniques including random forest, extra trees, and related methods.
• Part 5: Boosting: Discover the boosting family of ensemble learning techniques, including
adaptive boosting, gradient boosting, and modern efficient implementations like
extreme gradient boosting and light gradient boosting machines.
• Part 6: Stacking: Discover the stacked generalization or stacking family of ensemble
learning methods, including voting, blending, and related methods.
Each part targets a specific learning outcome, and so does each tutorial within each part.
This acts as a filter to ensure you are only focused on the things you need to know to get to a
specific result and do not get bogged down in the math or near-infinite number of digressions.
The tutorials were not designed to teach you everything there is to know about each of the
methods. They were designed to give you an understanding of how they work, how to use them,
and how to interpret the results the fastest way I know how: to learn by doing.

About Python Code Examples


The code examples were carefully designed to demonstrate the purpose of a given lesson. For
this reason, the examples are highly targeted.
• Algorithms were demonstrated on synthetic and small standard datasets to give you the
context and confidence to bring the techniques to your own projects.

• Model configurations used were discovered through trial and error and are skillful, but
not optimized. This leaves the door open for you to explore new and possibly better
configurations.

• Code examples are complete and standalone. The code for each lesson will run as-is with
no code from prior lessons or third parties needed beyond the installation of the required
packages.

A complete working example is presented with each tutorial for you to inspect and copy-paste.
All source code is also provided with the book and I would recommend running the provided
files whenever possible to avoid any copy-paste issues. The provided code was developed in a
text editor and is intended to be run on the command line. No special IDE or notebooks are
required. If you are using a more advanced development environment and are having trouble,
try running the example from the command line instead.
Many machine learning algorithms are stochastic. This means that they may make different
predictions when the same model configuration is trained on the same training data. On top of
that, each experimental problem in this book is based on generating stochastic predictions. As a
result, you may not get exactly the same sample output presented in this book. This
is by design. I want you to get used to the stochastic nature of the machine learning algorithms.
If this bothers you, please note:

• You can re-run a given example a few times and your results should be close to the values
reported.

• You can make the output consistent by fixing the random number seed.

• You can develop a robust estimate of the skill of a model by fitting and evaluating it
multiple times and taking the average of the final skill score (highly recommended).
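
The last two points can be illustrated with a minimal sketch. It assumes scikit-learn is installed; the helper name evaluate(), the small dataset size, and the seed value are choices made here for illustration and are not taken from the book's examples.

```python
# sketch: repeatable results via a fixed seed, and averaging across repeated evaluations
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier

# hypothetical helper: evaluate a bagging model with every source of randomness seeded
def evaluate(seed):
    # the seed controls the synthetic dataset, the model, and the cross-validation splits
    X, y = make_classification(n_samples=200, n_features=10, random_state=seed)
    model = BaggingClassifier(random_state=seed)
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=seed)
    return cross_val_score(model, X, y, scoring='accuracy', cv=cv)

# running twice with the same seed should give identical scores
first = evaluate(1)
second = evaluate(1)
print('Identical scores: %s' % (list(first) == list(second)))
# the average across all folds and repeats is a more robust estimate than any single score
print('Mean Accuracy: %.3f' % mean(first))
```

Removing the random_state arguments restores the run-to-run variation described above.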

All code examples were tested on a POSIX-compatible machine with Python 3. All code
examples will run on modest and modern computer hardware. I am only human, and there
may be a bug in the sample code. If you discover a bug, please let me know so I can fix it and
correct the book (and you can request a free update at any time).

About Further Reading


Each lesson includes a list of further reading resources. This may include:

• Research papers.

• Books and book chapters.

• Webpages.

• API documentation.

• Open-source projects.

Wherever possible, I have listed and linked to the relevant API documentation for key objects
and functions used in each lesson so you can learn more about them. When it comes to research
papers, I have listed those that are first to use a specific technique or first in a specific problem
domain. These are not required reading but can give you more technical details, theory, and
configuration details if you’re looking for it. Wherever possible, I have tried to link to the freely
available version of the paper on the arXiv preprint archive. You can search for and download
any of the papers listed on Google Scholar Search. Wherever possible, I have tried to link to
books on Amazon.
I don’t know everything, and if you discover a good resource related to a given lesson, please
let me know so I can update the book.

About Getting Help


You might need help along the way. Don’t worry; you are not alone.

• Help with a technique? If you need help with the technical aspects of a specific
operation or technique, see the Further Reading section at the end of each tutorial.

• Help with APIs? If you need help with using a Python library, see the list of resources
in the Further Reading section at the end of each lesson, and also see Appendix A.

• Help with your workstation? If you need help setting up your environment, I would
recommend using Anaconda and following my tutorial in Appendix B.

• Help in general? You can shoot me an email. My details are in Appendix A.

Next
Are you ready? Let’s dive in!
This is Just a Sample

Thank you for your interest in Ensemble Learning Algorithms With Python.


This is just a sample of the full text. You can purchase the complete book online from:
https://machinelearningmastery.com/ensemble-learning-algorithms-with-python/

Part II

Bagging

Chapter 1

Bagged Decision Trees Ensemble

Bagging is an ensemble machine learning algorithm that combines the predictions from many
decision trees. It is easy to implement given that it has few key hyperparameters and
sensible heuristics for configuring these hyperparameters. Bagging performs well in general and
provides the basis for a whole family of decision tree ensemble algorithms, such as the popular
random forest and extra trees ensemble algorithms, as well as the lesser-known Pasting, Random
Subspaces, and Random Patches ensemble algorithms. In this tutorial, you will discover how to
develop Bagging ensembles for classification and regression. After completing this tutorial, you
will know:

• Bagging ensemble is an ensemble created from decision trees fit on different samples of a
dataset.

• How to use the Bagging ensemble for classification and regression with scikit-learn.

• How to explore the effect of Bagging model hyperparameters on model performance.

Let’s get started.

1.1 Tutorial Overview


This tutorial is divided into five parts; they are:

1. Bagging Ensemble Algorithm

2. Evaluate Bagging Ensembles

3. Bagging Hyperparameters

4. Bagging Extensions

5. Common Questions


1.2 Bagging Ensemble Algorithm


Bootstrap Aggregation, or Bagging for short, is an ensemble machine learning algorithm.
Specifically, it is an ensemble of decision tree models, although the bagging technique can also
be used to combine the predictions of other types of models. As its name suggests, bootstrap
aggregation is based on the idea of the bootstrap sample. A bootstrap sample is a sample of a
dataset with replacement. Replacement means that a sample drawn from the dataset is replaced,
allowing it to be selected again and perhaps multiple times in the new sample. This means that
the sample may have duplicate examples from the original dataset. The bootstrap sampling
technique is used to estimate a population statistic from a small data sample. This is achieved
by drawing multiple bootstrap samples, calculating the statistic on each, and reporting the
mean statistic across all samples.
An example of using bootstrap sampling would be estimating the population mean from a
small dataset. Multiple bootstrap samples are drawn from the dataset, the mean calculated on
each, then the mean of the estimated means is reported as an estimate of the population mean.
Surprisingly, the bootstrap method provides a robust and accurate approach to estimating
statistical quantities compared to a single estimate on the original dataset.
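
The procedure described above can be sketched in a few lines with NumPy. The data values here are randomly generated stand-ins, and the number of bootstrap samples (1,000) is an arbitrary choice for illustration.

```python
# sketch: bootstrap estimate of a population mean from a small data sample
from numpy import mean
from numpy.random import default_rng

# seeded generator so the sketch is repeatable
rng = default_rng(1)
# a small data sample (made-up values for illustration)
data = rng.normal(loc=50.0, scale=5.0, size=30)
# draw bootstrap samples with replacement and record the mean of each
n_bootstraps = 1000
means = [mean(rng.choice(data, size=len(data), replace=True)) for _ in range(n_bootstraps)]
# report the mean of the bootstrap means as the estimate
print('Sample mean: %.3f' % mean(data))
print('Bootstrap estimate: %.3f' % mean(means))
```

Because each bootstrap sample is drawn with replacement, some rows of the data appear several times in a sample and others not at all.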
This same approach can be used to create an ensemble of decision tree models. This is
achieved by drawing multiple bootstrap samples from the training dataset and fitting a decision
tree on each. The predictions from the decision trees are then combined to provide a more
robust and accurate prediction than a single decision tree (typically, but not always).

Bagging predictors is a method for generating multiple versions of a predictor and


using these to get an aggregated predictor. [...] The multiple versions are formed by
making bootstrap replicates of the learning set and using these as new learning sets.

— Bagging Predictors, 1996.

Predictions are made for regression problems by averaging the prediction across the decision
trees. Predictions are made for classification problems by taking the majority vote prediction
for the classes from across the predictions made by the decision trees. The bagged decision trees
are effective because each decision tree is fit on a slightly different training dataset, which in
turn allows each tree to have minor differences and make slightly different skillful predictions.
Technically, we say that the method is effective because the trees have a low correlation between
predictions and, in turn, prediction errors.
Decision trees, specifically unpruned decision trees, are used as they slightly overfit the
training data and have a high variance. Other high-variance machine learning algorithms can
be used, such as a k-nearest neighbors algorithm with a low k value, although decision trees
have proven to be the most effective.
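
To make the mechanics concrete, the sketch below builds a small bagged ensemble by hand: each unpruned tree is fit on its own bootstrap sample and the class labels are combined by majority vote. It is an illustration only. The train/test split, the 25 trees, and the seeds are assumptions made here, and scikit-learn's own Bagging classes may combine predictions differently (for example, by averaging predicted probabilities).

```python
# sketch: bagging decision trees by hand with a majority vote (illustration only)
from numpy import array
from numpy.random import default_rng
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# define dataset and hold back a test set
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
    random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
rng = default_rng(1)
trees = []
# an odd number of trees avoids tied votes on a binary problem
for _ in range(25):
    # bootstrap sample: row indices drawn with replacement
    ix = rng.integers(0, len(X_train), size=len(X_train))
    # unpruned tree fit on the bootstrap sample
    tree = DecisionTreeClassifier(random_state=1)
    tree.fit(X_train[ix], y_train[ix])
    trees.append(tree)
# one row of predicted labels per tree
votes = array([tree.predict(X_test) for tree in trees])
# majority vote for labels 0 and 1: predict 1 when more than half of the trees vote 1
yhat = (votes.mean(axis=0) > 0.5).astype(int)
print('Bagged accuracy: %.3f' % (yhat == y_test).mean())
```

For regression, the vote would be replaced by the mean of the per-tree predictions.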

If perturbing the learning set can cause significant changes in the predictor con-
structed, then bagging can improve accuracy.

— Bagging Predictors, 1996.

Bagging does not always offer an improvement. For low-variance models that already perform
well, bagging can result in a decrease in model performance.

The evidence, both experimental and theoretical, is that bagging can push a good
but unstable procedure a significant step towards optimality. On the other hand, it
can slightly degrade the performance of stable procedures.

— Bagging Predictors, 1996.

1.3 Evaluate Bagging Ensembles


The scikit-learn Python machine learning library provides an implementation of Bagging
ensembles for machine learning via the BaggingRegressor and BaggingClassifier classes.
Both models operate the same way and take the same arguments that influence how the decision
trees are created. Randomness is used in the construction of the model. This means that
each time the algorithm is run on the same data, it will produce a slightly different model.
When using machine learning algorithms that have a stochastic learning algorithm, it is good
practice to evaluate them by averaging their performance across multiple runs or repeats of
cross-validation. When fitting a final model, it may be desirable to either increase the number
of trees until the variance of the model is reduced across repeated evaluations, or to fit multiple
final models and average their predictions. Let’s take a look at how to develop a Bagging
ensemble for both classification and regression.
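
As a brief aside before the worked examples, the idea of fitting multiple final models and averaging their predictions might look like the sketch below. The number of models (5), the seeds, and the choice to predict the first row of the training data are assumptions made here for illustration.

```python
# sketch: average the predictions of several final models that differ only by random seed
from numpy import mean
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# define dataset
X, y = make_regression(n_samples=500, n_features=20, n_informative=15, noise=0.1,
    random_state=5)
# fit several final models on all available data, one per seed
models = [BaggingRegressor(random_state=seed).fit(X, y) for seed in range(5)]
# predict a single row with every model
row = X[:1]
preds = [model.predict(row)[0] for model in models]
# the averaged prediction varies less across runs than any single model's prediction
print('Individual: %s' % ['%.1f' % p for p in preds])
print('Averaged: %.1f' % mean(preds))
```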

1.3.1 Bagging for Classification


In this section, we will look at using Bagging for a classification problem. First, we can use the
make_classification() function to create a synthetic binary classification problem with 1,000
examples and 20 input features. The complete example is listed below.
# synthetic binary classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
random_state=5)
# summarize the dataset
print(X.shape, y.shape)

Listing 1.1: Example of creating the synthetic classification dataset.


Running the example creates the dataset and summarizes the shape of the input and output
components.
(1000, 20) (1000,)

Listing 1.2: Example output from creating the synthetic classification dataset.
Next, we can evaluate a Bagging algorithm on this dataset. We will evaluate the model
using repeated stratified k-fold cross-validation, with three repeats and 10 folds. We will report
the mean and standard deviation of the accuracy of the model across all repeats and folds.
# evaluate bagging algorithm for classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
random_state=5)
# define the model
model = BaggingClassifier()
# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and collect the results
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Listing 1.3: Example of evaluating bagging on a classification dataset.


Running the example reports the mean and standard deviation accuracy of the model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few times and
comparing the average outcome.

In this case, we can see the Bagging ensemble with default hyperparameters achieves a
classification accuracy of about 86 percent on this synthetic dataset.
Mean Accuracy: 0.856 (0.037)

Listing 1.4: Example output from evaluating bagging on a classification dataset.


We can also use the Bagging model as a final model and make predictions for classification.
First, the Bagging ensemble is fit on all available data, then the predict() function can be
called to make predictions on new data. The example below demonstrates this on our binary
classification dataset.
# make predictions using bagging for classification
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
random_state=5)
# define the model
model = BaggingClassifier()
# fit the model on the whole dataset
model.fit(X, y)
# make a single prediction
row = [-4.7705504, -1.88685058, -0.96057964, 2.53850317, -6.5843005, 3.45711663,
-7.46225013, 2.01338213, -0.45086384, -1.89314931, -2.90675203, -0.21214568,
-0.9623956, 3.93862591, 0.06276375, 0.33964269, 4.0835676, 1.31423977, -2.17983117,
3.1047287]
yhat = model.predict([row])
# summarize the prediction
print('Predicted Class: %d' % yhat[0])

Listing 1.5: Example of using bagging for making a prediction on a classification dataset.
Running the example fits the Bagging ensemble model on the entire dataset and then uses it
to make a prediction on a new row of data, as we might when using the model in an application.

Predicted Class: 1

Listing 1.6: Example output from using bagging for making a prediction on a classification
dataset.
Now that we are familiar with using Bagging for classification, let’s look at the API for
regression.

1.3.2 Bagging for Regression


In this section, we will look at using Bagging for a regression problem. First, we can use the
make_regression() function to create a synthetic regression problem with 1,000 examples and
20 input features. The complete example is listed below.
# synthetic regression dataset
from sklearn.datasets import make_regression
# define dataset
X, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1,
random_state=5)
# summarize the dataset
print(X.shape, y.shape)

Listing 1.7: Example of creating the synthetic regression dataset.


Running the example creates the dataset and summarizes the shape of the input and output
components.
(1000, 20) (1000,)

Listing 1.8: Example output from creating the synthetic regression dataset.
Next, we can evaluate a Bagging algorithm on this dataset. As we did with the last section,
we will evaluate the model using repeated k-fold cross-validation, with three repeats and 10
folds. We will report the mean absolute error (MAE) of the model across all repeats and folds.
The complete example is listed below.

Note: The scikit-learn API flips the sign of the MAE to transform it from minimizing error to
maximizing negative error. This means that large magnitude positive errors become large
negative errors (e.g. 100 becomes -100) and a perfect model has no error with a value of 0.0. It
also means that we can safely ignore the sign of the mean MAE scores.

# evaluate bagging ensemble for regression
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.ensemble import BaggingRegressor
# define dataset
X, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1,
random_state=5)
# define the model
model = BaggingRegressor()
# define the evaluation procedure
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and collect the results
n_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# report performance
print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Listing 1.9: Example of evaluating bagging on a regression dataset.


Running the example reports the mean and standard deviation of the MAE of the model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few times and
comparing the average outcome.

In this case, we can see that the Bagging ensemble with default hyperparameters achieves a
MAE of about 100.
MAE: -101.133 (9.757)

Listing 1.10: Example output from evaluating bagging on a regression dataset.
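
If the negative sign in the reported score is distracting, one option is to take the absolute value of the scores before summarizing them. The sketch below is a variation written for illustration, not one of the book's listings, and mae_scores is a name introduced here.

```python
# sketch: report a positive MAE by flipping the sign of the scikit-learn scores
from numpy import absolute
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.ensemble import BaggingRegressor

# define dataset
X, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1,
    random_state=5)
# define the model
model = BaggingRegressor()
# define the evaluation procedure
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# scores are negated MAE values, so each one is zero or negative
n_scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=cv)
# flip the sign to recover the MAE
mae_scores = absolute(n_scores)
# report performance
print('MAE: %.3f (%.3f)' % (mean(mae_scores), std(mae_scores)))
```

The standard deviation is unaffected by the sign flip, so only the reported mean changes.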


We can also use the Bagging model as a final model and make predictions for regression.
First, the Bagging ensemble is fit on all available data, then the predict() function can be
called to make predictions on new data. The example below demonstrates this on our regression
dataset.
# bagging ensemble for making predictions for regression
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
# define dataset
X, y = make_regression(n_samples=1000, n_features=20, n_informative=15, noise=0.1,
random_state=5)
# define the model
model = BaggingRegressor()
# fit the model on the whole dataset
model.fit(X, y)
# make a single prediction
row = [0.88950817, -0.93540416, 0.08392824, 0.26438806, -0.52828711, -1.21102238,
-0.4499934, 1.47392391, -0.19737726, -0.22252503, 0.02307668, 0.26953276, 0.03572757,
-0.51606983, -0.39937452, 1.8121736, -0.00775917, -0.02514283, -0.76089365, 1.58692212]
yhat = model.predict([row])
# summarize the prediction
print('Prediction: %d' % yhat[0])

Listing 1.11: Example of using bagging for making a prediction on a regression dataset.
Running the example fits the Bagging ensemble model on the entire dataset, and the model is
then used to make a prediction on a new row of data, as we might when using the model in an
application.
Prediction: -134

Listing 1.12: Example output from using bagging for making a prediction on a regression
dataset.
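For regression, the prediction of a Bagging ensemble is the average of the predictions made by its members. The sketch below checks this by hand on a small dataset. It relies on the estimators_ and estimators_features_ attributes of a fitted BaggingRegressor, and the dataset size and random_state values are assumptions made only for illustration.

```python
# sketch: a bagging regressor's prediction is the mean of its members' predictions
from numpy import allclose
from numpy import mean
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
# small synthetic dataset (sizes are assumptions, chosen to run quickly)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, noise=0.1,
    random_state=5)
# fit the ensemble on the whole dataset
model = BaggingRegressor(n_estimators=10, random_state=1)
model.fit(X, y)
# take a few rows to predict
rows = X[:5]
# collect the prediction of each member, using the feature columns it was fit on
member_preds = [est.predict(rows[:, feats])
    for est, feats in zip(model.estimators_, model.estimators_features_)]
# average across the members
manual = mean(member_preds, axis=0)
print(manual)
print(model.predict(rows))
print(allclose(manual, model.predict(rows)))
```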
Now that we are familiar with using the scikit-learn API to evaluate and use Bagging
ensembles, let’s look at configuring the model.

1.4 Bagging Hyperparameters


In this section, we will take a closer look at some of the hyperparameters you should consider
tuning for the Bagging ensemble and their effect on model performance.

1.4.1 Explore Number of Trees


An important hyperparameter for the Bagging algorithm is the number of decision trees used in
the ensemble. Typically, the number of trees is increased until the model performance stabilizes.
Intuition might suggest that more trees will lead to overfitting, although this is not the case.
Bagging and related ensembles of decision trees (like random forest) appear to be
somewhat immune to overfitting the training dataset given the stochastic nature of the learning
algorithm. The number of trees can be set via the n_estimators argument and defaults to 10.
The example below explores the effect of the number of trees with values from 10 to 5,000.
# explore bagging ensemble number of trees effect on performance
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
from matplotlib import pyplot

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
        n_redundant=5, random_state=5)
    return X, y

# get a list of models to evaluate
def get_models():
    models = dict()
    # define number of trees to consider
    n_trees = [10, 50, 100, 500, 1000, 5000]
    for n in n_trees:
        models[str(n)] = BaggingClassifier(n_estimators=n)
    return models

# evaluate a given model using cross-validation
def evaluate_model(model, X, y):
    # define the evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate the model and collect the results
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    return scores

# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names = list(), list()
for name, model in models.items():
    # evaluate the model
    scores = evaluate_model(model, X, y)
    # store the results
    results.append(scores)
    names.append(name)
    # summarize the performance along the way
    print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
# plot model performance for comparison
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()

Listing 1.13: Example of evaluating the effect of the number of trees in the bagging ensemble.
Running the example first reports the mean accuracy for each configured number of decision
trees.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few times and
compare the average outcome.

In this case, we can see that performance improves on this dataset until about 100 trees and
remains flat after that.
>10 0.855 (0.037)
>50 0.876 (0.035)
>100 0.882 (0.037)
>500 0.885 (0.041)
>1000 0.885 (0.037)
>5000 0.885 (0.038)

Listing 1.14: Example output from evaluating the effect of the number of trees in the bagging
ensemble.
A box and whisker plot is created for the distribution of accuracy scores for each configured
number of trees. We can see the general trend of no further improvement beyond about 100
trees.

Figure 1.1: Box Plot of Bagging Ensemble Size vs. Classification Accuracy.
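The flattening of accuracy as trees are added can also be observed without refitting the ensemble for each size. The sketch below fits one ensemble and then averages the class probabilities of only the first k trees on a hold-out set. The train/test split, the dataset size, and the chosen values of k are assumptions for illustration, and a single split gives a much noisier estimate than the repeated cross-validation used above.

```python
# sketch: accuracy of the first k members of a single fitted bagging ensemble
from numpy import argmax
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
# small synthetic dataset (sizes are assumptions, chosen to run quickly)
X, y = make_classification(n_samples=500, n_features=20, n_informative=15,
    n_redundant=5, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
# fit one ensemble with the largest number of trees of interest
model = BaggingClassifier(n_estimators=100, random_state=1)
model.fit(X_train, y_train)
# class probabilities predicted by each fitted tree on the hold-out rows
probas = [est.predict_proba(X_test[:, feats])
    for est, feats in zip(model.estimators_, model.estimators_features_)]
# score the ensemble formed by the first k trees only
accuracies = dict()
for k in [1, 10, 50, 100]:
    avg = mean(probas[:k], axis=0)
    yhat = model.classes_[argmax(avg, axis=1)]
    accuracies[k] = mean(yhat == y_test)
    print('>%d trees: %.3f' % (k, accuracies[k]))
```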

1.4.2 Explore Number of Samples


The size of the bootstrap sample can also be varied. The default is to create a bootstrap sample
that has the same number of examples as the original dataset. Using a smaller sample can
increase the variance of the resulting decision trees and could result in better overall performance.
The number of samples used to fit each decision tree is set via the max_samples argument. The
example below explores different sized samples as a ratio of the original dataset from 10 percent
to 100 percent (the default).
# explore bagging ensemble number of samples effect on performance
from numpy import mean
from numpy import std
from numpy import arange
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
from matplotlib import pyplot

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
        n_redundant=5, random_state=5)
    return X, y

# get a list of models to evaluate
def get_models():
    models = dict()
    # explore ratios from 10% to 100% in 10% increments
    for i in arange(0.1, 1.1, 0.1):
        key = '%.1f' % i
        models[key] = BaggingClassifier(max_samples=i)
    return models

# evaluate a given model using cross-validation
def evaluate_model(model, X, y):
    # define the evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate the model and collect the results
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    return scores

# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names = list(), list()
for name, model in models.items():
    # evaluate the model
    scores = evaluate_model(model, X, y)
    # store the results
    results.append(scores)
    names.append(name)
    # summarize the performance along the way
    print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
# plot model performance for comparison
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()

Listing 1.15: Example of evaluating the effect of the number of samples in the bagging ensemble.
Running the example first reports the mean accuracy for each sample set size.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few times and
compare the average outcome.

In this case, the results suggest that performance generally improves with an increase in
the sample size, highlighting that the default of 100 percent of the size of the training dataset
is sensible. It might also be interesting to explore a smaller sample size with a corresponding
increase in the number of trees, in an effort to average out the higher variance of the
individual models.
>0.1 0.810 (0.036)
>0.2 0.836 (0.044)
>0.3 0.844 (0.043)
>0.4 0.843 (0.041)
>0.5 0.852 (0.034)
>0.6 0.855 (0.042)
>0.7 0.858 (0.042)
>0.8 0.861 (0.033)
>0.9 0.866 (0.041)
>1.0 0.864 (0.042)

Listing 1.16: Example output from evaluating the effect of the number of samples in the bagging
ensemble.
A box and whisker plot is created for the distribution of accuracy scores for each sample
size. We see a general trend of increasing accuracy with sample size.

Figure 1.2: Box Plot of Bagging Sample Size vs. Classification Accuracy.
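It can help to see what a bootstrap sample of a given size actually contains. Because rows are drawn with replacement, a sample the same size as the dataset holds only about 63 percent of the distinct rows (roughly 1 - 1/e), with the rest being repeats. The sketch below illustrates this with NumPy alone. The row count and seed are assumptions, and this is an illustration of the idea rather than the sampling code used inside scikit-learn.

```python
# sketch: how many distinct rows end up in a bootstrap sample
from numpy import unique
from numpy.random import default_rng
# seeded generator so the sketch is repeatable (seed is an arbitrary choice)
rng = default_rng(1)
n_rows = 1000
# full-size bootstrap sample: draw n_rows row indices with replacement
full = rng.integers(0, n_rows, size=n_rows)
frac_full = len(unique(full)) / n_rows
# half-size bootstrap sample, comparable to max_samples=0.5
half = rng.integers(0, n_rows, size=n_rows // 2)
frac_half = len(unique(half)) / n_rows
print('Full-size sample covers %.1f%% of the distinct rows' % (frac_full * 100))
print('Half-size sample covers %.1f%% of the distinct rows' % (frac_half * 100))
```

Smaller samples cover fewer distinct rows, which is one way to see why each tree becomes more variable as max_samples is reduced.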

1.4.3 Explore Alternate Algorithm


Decision trees are the most common algorithm used in a bagging ensemble. The reason for this
is that they are easy to configure to have a high variance and because they perform well in
general. Other algorithms can be used with bagging and must be configured to have a modestly
high variance. One example is the k-nearest neighbors algorithm where the k value can be
set to a low value. The algorithm used in the ensemble is specified via the base_estimator
argument (renamed to estimator in scikit-learn 1.2) and must be set to an instance of the
algorithm and algorithm configuration to use.

The example below demonstrates using a KNeighborsClassifier as the base algorithm used
in the bagging ensemble. Here, the algorithm is used with default hyperparameters where k is
set to 5.
# evaluate bagging with knn algorithm for classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
random_state=5)
# define the model
model = BaggingClassifier(base_estimator=KNeighborsClassifier())
# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and collect the results
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Listing 1.17: Example of evaluating the effect of changing the base algorithm in the bagging
ensemble.
Running the example reports the mean and standard deviation accuracy of the model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few times and
compare the average outcome.

In this case, we can see the Bagging ensemble with KNN and default hyperparameters
achieves a classification accuracy of about 88 percent on this synthetic dataset.
Mean Accuracy: 0.888 (0.036)

Listing 1.18: Example output from evaluating the effect of changing the base algorithm in the
bagging ensemble.
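The keyword used to pass the base algorithm depends on the installed version of scikit-learn: older releases use base_estimator and newer releases use estimator. The helper below is one possible way to write code that works with either. The function name make_bagging is hypothetical and not part of scikit-learn, and the dataset size is an assumption chosen to run quickly.

```python
# sketch: build a bagging ensemble regardless of the base algorithm keyword name
from inspect import signature
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier

# hypothetical helper: picks whichever keyword the installed version accepts
def make_bagging(base, **kwargs):
    params = signature(BaggingClassifier).parameters
    key = 'estimator' if 'estimator' in params else 'base_estimator'
    kwargs[key] = base
    return BaggingClassifier(**kwargs)

# small synthetic dataset (sizes are assumptions, chosen to run quickly)
X, y = make_classification(n_samples=200, n_features=20, n_informative=15,
    n_redundant=5, random_state=5)
# bag a KNN model with a low k value
model = make_bagging(KNeighborsClassifier(n_neighbors=3), n_estimators=5, random_state=1)
model.fit(X, y)
# confirm the members are KNN models
print(type(model.estimators_[0]).__name__, len(model.estimators_))
```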
We can test different values of k to find the right balance of model variance to achieve good
performance as a bagged ensemble. The below example tests bagged KNN models with k values
between 1 and 20.
# explore bagging ensemble k for knn effect on performance
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from matplotlib import pyplot

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
        n_redundant=5, random_state=5)
    return X, y

# get a list of models to evaluate
def get_models():
    models = dict()
    # evaluate k values from 1 to 20
    for i in range(1,21):
        # define the base model
        base = KNeighborsClassifier(n_neighbors=i)
        # define the ensemble model
        models[str(i)] = BaggingClassifier(base_estimator=base)
    return models

# evaluate a given model using cross-validation
def evaluate_model(model, X, y):
    # define the evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate the model and collect the results
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    return scores

# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names = list(), list()
for name, model in models.items():
    # evaluate the model
    scores = evaluate_model(model, X, y)
    # store the results
    results.append(scores)
    names.append(name)
    # summarize the performance along the way
    print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
# plot model performance for comparison
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()

Listing 1.19: Example of evaluating the effect of changing the configuration of KNN as the base
algorithm in the bagging ensemble.
Running the example first reports the mean accuracy for each k value.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few times and
compare the average outcome.

In this case, the results suggest a small k value such as two to four results in the best mean
accuracy when used in a bagging ensemble.
>1 0.884 (0.025)
>2 0.890 (0.029)
>3 0.886 (0.035)
>4 0.887 (0.033)
>5 0.878 (0.037)
>6 0.879 (0.042)
>7 0.877 (0.037)
>8 0.877 (0.036)
>9 0.871 (0.034)
>10 0.877 (0.033)
>11 0.876 (0.037)
>12 0.877 (0.030)
>13 0.874 (0.034)
>14 0.871 (0.039)
>15 0.875 (0.034)
>16 0.877 (0.033)
>17 0.872 (0.034)
>18 0.873 (0.036)
>19 0.876 (0.034)
>20 0.876 (0.037)

Listing 1.20: Example output from evaluating the effect of changing the configuration of KNN
as the base algorithm in the bagging ensemble.
A box and whisker plot is created for the distribution of accuracy scores for each k value.
We see a general trend of the best accuracy at small k values, then a modest
decrease in performance as the variance of the individual KNN models used in the ensemble is
reduced with larger k values.

Figure 1.3: Box Plot of Bagging KNN Number of Neighbors vs. Classification Accuracy.

1.5 Bagging Extensions


There are many modifications and extensions to the bagging algorithm in an effort to improve
the performance of the approach. Perhaps the most famous is the random forest algorithm.
There are a number of less famous, although still effective, extensions to bagging that may be
interesting to investigate. This section demonstrates some of these approaches, such as the pasting
ensemble, the random subspace ensemble, and the random patches ensemble. We are not comparing
the results of these extensions on the dataset, but rather providing working examples of how to
use each technique that you can copy-paste and try with your own dataset.

1.5.1 Pasting Ensemble


The Pasting Ensemble is an extension to bagging that involves fitting ensemble members based
on random samples of the training dataset instead of bootstrap samples. The approach is
designed to use smaller sample sizes than the training dataset in cases where the training dataset
does not fit into memory.
The procedure takes small pieces of the data, grows a predictor on each small piece
and then pastes these predictors together. A version is given that scales up to
terabyte data sets. The methods are also applicable to on-line learning.

— Pasting Small Votes for Classification in Large Databases and On-Line, 1999.
The example below demonstrates the Pasting ensemble by setting the bootstrap argument
to False and setting the number of samples used in the training dataset via max_samples to a
modest value, in this case, 50 percent of the training dataset size.
# evaluate pasting ensemble algorithm for classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
random_state=5)
# define the model
model = BaggingClassifier(bootstrap=False, max_samples=0.5)
# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and collect the results
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Listing 1.21: Example of evaluating a pasting ensemble.


Running the example reports the mean and standard deviation accuracy of the model.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few times and
compare the average outcome.
In this case, we can see the Pasting ensemble achieves a classification accuracy of about 85
percent on this dataset.
Mean Accuracy: 0.848 (0.039)

Listing 1.22: Example output from evaluating a pasting ensemble.
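The practical difference between pasting and standard bagging is whether rows can repeat within the sample given to each ensemble member. The sketch below inspects the row indices drawn for each member through the estimators_samples_ attribute of a fitted BaggingClassifier. The small number of members and the fixed random_state are assumptions made to keep the output short.

```python
# sketch: compare the row samples drawn by pasting and by standard bagging
from numpy import unique
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
    random_state=5)
# pasting: half-size samples drawn without replacement
pasting = BaggingClassifier(n_estimators=5, bootstrap=False, max_samples=0.5, random_state=1)
pasting.fit(X, y)
# standard bagging: full-size samples drawn with replacement
bagging = BaggingClassifier(n_estimators=5, random_state=1)
bagging.fit(X, y)
# report the sample size and number of distinct rows for each member
for name, model in [('pasting', pasting), ('bagging', bagging)]:
    for idx in model.estimators_samples_:
        print('>%s rows=%d distinct=%d' % (name, len(idx), len(unique(idx))))
```

Each pasting member should report as many distinct rows as rows drawn, whereas each bagging member reports fewer distinct rows because some are repeated.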

1.5.2 Random Subspace Ensemble


A Random Subspace Ensemble is an extension to bagging that involves fitting ensemble members
based on datasets constructed from random subsets of the features in the training dataset. It
is similar to the random forest except the data samples are random rather than a bootstrap
sample and the subset of features is selected for the entire decision tree rather than at each split
point in the tree.
The classifier consists of multiple trees constructed systematically by pseudorandomly
selecting subsets of components of the feature vector, that is, trees constructed in
randomly chosen subspaces.
— The Random Subspace Method For Constructing Decision Forests, 1998.
A worked example of the Random Subspace Ensemble is explored in the next chapter.
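In the meantime, the idea can be sketched with the same BaggingClassifier class used throughout this chapter. The configuration below is an assumption about how a random subspace ensemble might be set up, not necessarily the approach taken later in the book: bootstrap sampling of rows is turned off so every member sees all rows, and max_features limits each member to a random subset of 10 columns. The estimators_features_ attribute shows which columns each member received.

```python
# sketch: a random-subspace-style ensemble using BaggingClassifier
from numpy import unique
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
# small synthetic dataset (sizes are assumptions, chosen to run quickly)
X, y = make_classification(n_samples=200, n_features=20, n_informative=15, n_redundant=5,
    random_state=5)
# all rows for every member, but only 10 randomly chosen features each
model = BaggingClassifier(n_estimators=5, bootstrap=False, max_features=10, random_state=1)
model.fit(X, y)
# report the feature subset and number of distinct rows used by each member
for feats, rows in zip(model.estimators_features_, model.estimators_samples_):
    print('features=%s distinct_rows=%d' % (sorted(feats.tolist()), len(unique(rows))))
```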

1.5.3 Random Patches Ensemble


The Random Patches Ensemble is an extension to bagging that involves fitting ensemble members
based on datasets constructed from random subsets of rows (samples) and columns (features) of
the training dataset. It does not use bootstrap samples and might be considered an ensemble
that combines both the random sampling of the dataset of the Pasting ensemble and the random
sampling of features of the Random Subspace ensemble.

We investigate a very simple, yet effective, ensemble framework that builds each
individual model of the ensemble from a random patch of data obtained by drawing
random subsets of both instances and features from the whole dataset.

— Ensembles on Random Patches, 2012.

The example below demonstrates the Random Patches Ensemble with decision trees created
from a random sample of the training dataset limited to 50 percent of the size of the training
dataset, and with a random subset of 10 features.
# evaluate random patches ensemble algorithm for classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
random_state=5)
# define the model
model = BaggingClassifier(bootstrap=False, max_features=10, max_samples=0.5)
# define the evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate the model and collect the results
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report performance
print('Mean Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

Listing 1.23: Example of evaluating a random patches ensemble.


Running the example reports the mean and standard deviation accuracy of the model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation
procedure, or differences in numerical precision. Consider running the example a few times and
compare the average outcome.

In this case, we can see the Random Patches Ensemble achieves a classification accuracy of
about 84 percent on this dataset.
Mean Accuracy: 0.845 (0.036)

Listing 1.24: Example output from evaluating a random patches ensemble.
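To make the term patch concrete, the sketch below draws a single random patch by hand with NumPy: a random half of the rows and a random 10 of the 20 columns, both chosen without replacement. This is an illustration of the idea only and is not the code scikit-learn uses internally. The seed and variable names are arbitrary.

```python
# sketch: draw one random patch (subset of rows and columns) by hand
from numpy.random import default_rng
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
    random_state=5)
# seeded generator so the sketch is repeatable (seed is an arbitrary choice)
rng = default_rng(1)
# choose 50 percent of the rows without replacement
rows = rng.choice(X.shape[0], size=X.shape[0] // 2, replace=False)
# choose 10 of the columns without replacement
cols = rng.choice(X.shape[1], size=10, replace=False)
# the patch is the intersection of the chosen rows and columns
X_patch, y_patch = X[rows][:, cols], y[rows]
print(X_patch.shape, y_patch.shape)
```

Each member of a Random Patches Ensemble would be fit on a different patch drawn in this way.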



1.6 Common Questions


In this section we will take a closer look at some common sticking points you may have with
the bagging ensemble procedure.

Q. What algorithm should be used in the ensemble?


The algorithm should have a moderate variance, meaning it is moderately dependent upon the
specific training data. A decision tree, often unpruned, is the default model to use because it
works well in practice. Other algorithms can be used as long as they are configured to have a
moderate variance.

... it is well known that Bagging should be used with unstable learners, and generally,
the more unstable, the larger the performance improvement.

— Page 52, Ensemble Methods, 2012.

Q. How many ensemble members should be used?


The performance of the model will converge with an increase of the number of decision trees to
a point, then remain level. Therefore, keep increasing the number of trees until the performance
stabilizes on your dataset.

... the performance of Bagging converges as the ensemble size, i.e., the number of
base learners, grows large ...

— Page 52, Ensemble Methods, 2012.
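The advice to keep adding trees until performance stabilizes can be turned into a simple loop. The sketch below is one possible stopping rule, not a standard procedure: it tries increasing ensemble sizes and stops when the gain in mean accuracy falls below a tolerance. The function name find_stable_size, the tolerance of 0.005, the candidate sizes, and the 3-fold evaluation are all assumptions chosen for illustration and speed.

```python
# sketch: grow the ensemble until cross-validated accuracy stops improving
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier

# hypothetical helper: returns the last size that still gave a meaningful gain
def find_stable_size(X, y, sizes, tol=0.005):
    prev_n, prev_score = None, None
    for n in sizes:
        model = BaggingClassifier(n_estimators=n, random_state=1)
        score = mean(cross_val_score(model, X, y, scoring='accuracy', cv=3))
        print('>%d %.3f' % (n, score))
        # stop once adding trees no longer improves accuracy by at least tol
        if prev_score is not None and score - prev_score < tol:
            return prev_n
        prev_n, prev_score = n, score
    return prev_n

# small synthetic dataset (sizes are assumptions, chosen to run quickly)
X, y = make_classification(n_samples=300, n_features=20, n_informative=15, n_redundant=5,
    random_state=5)
sizes = [10, 20, 40, 80]
chosen = find_stable_size(X, y, sizes)
print('Chosen number of trees: %d' % chosen)
```

A single noisy score can stop the loop early, so in practice a more robust evaluation, such as the repeated cross-validation used earlier in this tutorial, may be preferable.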

Q. Won’t the ensemble overfit with too many trees?


No. Bagging ensembles are very unlikely to overfit in general.

Q. How large should the bootstrap sample be?


It is good practice to make the bootstrap sample as large as the original dataset. That is,
100 percent of the size, or the same number of rows as the original dataset.

Q. What problems are well suited to bagging?


Generally, bagging is well suited to problems with small or modest sized datasets. But this is a
rough guide. If you’re unsure, try it and see.

Bagging is best suited for problems with relatively small available training datasets.

— Page 12, Ensemble Machine Learning, 2012.

1.7 Further Reading


This section provides more resources on the topic if you are looking to go deeper.

Papers
• Bagging predictors, 1996.
https://link.springer.com/article/10.1007/BF00058655

• Pasting Small Votes for Classification in Large Databases and On-Line, 1999.
https://link.springer.com/article/10.1023/A:1007563306331

• The Random Subspace Method For Constructing Decision Forests, 1998.
https://ieeexplore.ieee.org/abstract/document/709601

• Ensembles on Random Patches, 2012.
https://link.springer.com/chapter/10.1007/978-3-642-33460-3_28

Books
• Pattern Classification Using Ensemble Methods, 2010.
https://amzn.to/2zxc0F7

• Ensemble Methods, 2012.
https://amzn.to/2XZzrjG

• Ensemble Machine Learning, 2012.
https://amzn.to/2C7syo5

APIs
• sklearn.ensemble.BaggingClassifier API.
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html

• sklearn.ensemble.BaggingRegressor API.
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html

Articles
• Bootstrap aggregating, Wikipedia.
https://en.wikipedia.org/wiki/Bootstrap_aggregating

1.8 Summary
In this tutorial, you discovered how to develop Bagging ensembles for classification and regression.
Specifically, you learned:

• Bagging ensemble is an ensemble created from decision trees fit on different samples of a
dataset.

• How to use the Bagging ensemble for classification and regression with scikit-learn.

• How to explore the effect of Bagging model hyperparameters on model performance.



Next
In the next section, we will take a closer look at an extension to Bagging called Random
Subspace Ensembles.
This is Just a Sample

Thank you for your interest in Ensemble Learning Algorithms With Python.

This is just a sample of the full text. You can purchase the complete book online from:
https://machinelearningmastery.com/ensemble-learning-algorithms-with-python/
