
A Comprehensive Guide to Ensemble Learning (with Python codes)


Aishwarya Singh, June 18, 2018

Introduction

When you want to purchase a new car, will you walk up to the first car shop and purchase one based on the advice of the dealer? It’s highly unlikely.

https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/[30-Apr-19 3:15:10 AM]

You would likely browse a few web portals where people have posted their reviews and compare different car models, checking their features and prices. You would also probably ask your friends and colleagues for their opinion. In short, you wouldn’t reach a conclusion directly, but would instead make a decision considering the opinions of other people as well.

Ensemble models in machine learning operate on a similar idea. They combine the decisions from multiple models to improve the overall performance. This can be achieved in various ways, which you will discover in this article.

The objective of this article is to introduce the concept of ensemble learning and to explain the algorithms that use this technique. To cement your understanding of this diverse topic, we will explain the advanced algorithms in Python using a hands-on case study on a real-life problem.

Note: This article assumes a basic understanding of machine learning algorithms. I would recommend going through this article to familiarize yourself with these concepts.

Table of Contents

1. Introduction to Ensemble Learning
2. Basic Ensemble Techniques
2.1 Max Voting
2.2 Averaging
2.3 Weighted Average
3. Advanced Ensemble Techniques
3.1 Stacking
3.2 Blending
3.3 Bagging
3.4 Boosting
4. Algorithms based on Bagging and Boosting
4.1 Bagging meta-estimator
4.2 Random Forest
4.3 AdaBoost
4.4 GBM
4.5 XGB
4.6 Light GBM
4.7 CatBoost
A Complete Tutorial to
1. Introduction to Ensemble Learning

Let’s understand the concept of ensemble learning with an example. Suppose you are a movie director and you have created a short movie on a very important and interesting topic. Now, you want to take preliminary feedback (ratings) on the movie before making it public. What are the possible ways by which you can do that?

A: You may ask one of your friends to rate the movie for you.
Now it’s entirely possible that the person you have chosen loves you very much and doesn’t want to break your heart by providing a 1-star rating to the horrible work you have created.

B: Another way could be to ask 5 of your colleagues to rate the movie.
This should provide a better idea of the movie. This method may provide honest ratings for your movie. But a problem still exists. These 5 people may not be “Subject Matter Experts” on the topic of your movie. Sure, they might understand the cinematography, the shots, or the audio, but at the same time they may not be the best judges of dark humour.

C: How about asking 50 people to rate the movie?
Some of them can be your friends, some can be your colleagues, and some may even be total strangers.

The responses, in this case, would be more generalized and diversified, since now you have people with different sets of skills. As it turns out, this is a better approach to get honest ratings than the previous cases we saw.

With these examples, you can infer that a diverse group of people is likely to make better decisions than individuals. The same is true for a diverse set of models compared with single models. This diversification in machine learning is achieved by a technique called ensemble learning.

Now that you have got the gist of what ensemble learning is, let us look at the various techniques in ensemble learning along with their implementations.

2. Simple Ensemble Techniques

In this section, we will look at a few simple but powerful techniques, namely:

1. Max Voting
2. Averaging
3. Weighted Averaging

2.1 Max Voting

The max voting method is generally used for classification problems. In this technique, multiple models are used to make predictions for each data point. The prediction by each model is considered a ‘vote’. The prediction made by the majority of the models is used as the final prediction.

For example, suppose you asked 5 of your colleagues to rate your movie out of 5, and three of them rated it 4 while two of them gave it a 5. Since the majority gave a rating of 4, the final rating will be taken as 4. You can consider this as taking the mode of all the predictions.

The result of max voting would be something like this:

Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
5             4             5             4             4             4
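A minimal sketch of the same rule in plain Python: it takes the mode of the five ratings in the table, then applies the same vote row by row to three arrays of class labels. The arrays pred1, pred2 and pred3 are hypothetical placeholders standing in for the outputs of three fitted classifiers, and majority_vote is an illustrative helper, not a library function.

```python
from collections import Counter

import numpy as np


def majority_vote(votes):
    """Return the most common value in `votes` (ties go to the value seen first)."""
    return Counter(votes).most_common(1)[0][0]


# Ratings from the five colleagues in the table
ratings = [5, 4, 5, 4, 4]
final_rating = majority_vote(ratings)  # the mode of the ratings

# Hypothetical class predictions from three models for four data points
pred1 = np.array(["Y", "N", "Y", "Y"])
pred2 = np.array(["Y", "Y", "N", "Y"])
pred3 = np.array(["N", "N", "N", "Y"])

# One vote per model for each data point; the majority label wins
final_pred = np.array([majority_vote(v) for v in zip(pred1, pred2, pred3)])
print(final_rating, final_pred)
```

With an odd number of voters and two classes a tie cannot occur, which is one reason an odd number of models is often used for hard voting.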


Sample Code:

Here x_train consists of the independent variables in the training data and y_train is the target variable for the training data. The validation set is x_test (independent variables) and y_test (target variable).

```python
import numpy as np
from statistics import mode
from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)

pred1 = model1.predict(x_test)
pred2 = model2.predict(x_test)
pred3 = model3.predict(x_test)

# For each test row, the final prediction is the most common of the three votes
final_pred = np.array([mode([p1, p2, p3]) for p1, p2, p3 in zip(pred1, pred2, pred3)])
```

Alternatively, you can use the VotingClassifier class in sklearn as follows:

```python
from sklearn import tree
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression

model1 = LogisticRegression(random_state=1)
model2 = tree.DecisionTreeClassifier(random_state=1)
# voting='hard' predicts the class label chosen by the majority of the estimators
model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')
model.fit(x_train, y_train)
model.score(x_test, y_test)
```


2.2 Averaging

Similar to the max voting technique, multiple predictions are made for
each data point in averaging. In this method, we take an average of
predictions from all the models and use it to make the final prediction.
Averaging can be used for making predictions in regression problems
or while calculating probabilities for classification problems.

For example, in the case below, the averaging method would take the average of all the values, i.e. (5+4+5+4+4)/5 = 4.4.

Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
5             4             5             4             4             4.4
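The arithmetic above, and the probability averaging used in the sample code, can be sketched with NumPy. The three predict_proba-style arrays are made-up values for two data points, assumed only for illustration.

```python
import numpy as np

# Ratings from the five colleagues in the table
ratings = np.array([5, 4, 5, 4, 4])
final_rating = ratings.mean()  # (5 + 4 + 5 + 4 + 4) / 5

# Hypothetical predict_proba outputs of three models for two data points
# (column 0: probability of class 0, column 1: probability of class 1)
proba1 = np.array([[0.9, 0.1], [0.4, 0.6]])
proba2 = np.array([[0.7, 0.3], [0.2, 0.8]])
proba3 = np.array([[0.8, 0.2], [0.6, 0.4]])

# Element-wise mean over the three models, then the most probable class per row
avg_proba = np.mean([proba1, proba2, proba3], axis=0)
final_class = avg_proba.argmax(axis=1)
print(final_rating, avg_proba, final_class)
```

Averaging probabilities rather than hard labels lets a confident model outweigh two lukewarm ones. In the second row, one model leans towards class 0, but the averaged probability still favours class 1.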

Sample Code:

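```python
from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)

# predict_proba returns one column of probabilities per class
pred1 = model1.predict_proba(x_test)
pred2 = model2.predict_proba(x_test)
pred3 = model3.predict_proba(x_test)

# Average the class probabilities of the three models
finalpred = (pred1 + pred2 + pred3) / 3
```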

2.3 Weighted Average

This is an extension of the averaging method. All models are assigned different weights defining the importance of each model for prediction. For instance, if two of your colleagues are critics, while the others have no prior experience in this field, then the answers by these two colleagues are given more importance than those of the other people.

The result is calculated as $(5 \times 0.23) + (4 \times 0.23) + (5 \times 0.18) + (4 \times 0.18) + (4 \times 0.18) = 4.41$.

          Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
weight    0.23          0.23          0.18          0.18          0.18
rating    5             4             5             4             4             4.41
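As a quick check of the 4.41 figure, NumPy's np.average accepts a weights argument. This sketch reuses only the ratings and weights from the table. The weighted sum of model probabilities in the sample code that follows works the same way.

```python
import numpy as np

ratings = np.array([5, 4, 5, 4, 4])
weights = np.array([0.23, 0.23, 0.18, 0.18, 0.18])  # these weights sum to 1

# np.average divides by weights.sum(), so weights that don't sum to 1 also work
final_rating = np.average(ratings, weights=weights)

# The same result written out as an explicit weighted sum
manual = (ratings * weights).sum()
print(round(final_rating, 2))
```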

Sample Code:

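```python
from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)

pred1 = model1.predict_proba(x_test)
pred2 = model2.predict_proba(x_test)
pred3 = model3.predict_proba(x_test)

# Weighted sum of class probabilities; the weights sum to 1
finalpred = (pred1 * 0.3 + pred2 * 0.3 + pred3 * 0.4)
```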

3. Advanced Ensemble techniques

Now that we have covered the basic ensemble techniques, let’s move
on to understanding the advanced techniques.


3.1 Stacking

Stacking is an ensemble learning technique that uses predictions from multiple models (for example decision tree, knn or svm) to build a new model. This model is used for making predictions on the test set. Below is a step-wise explanation for a simple stacked ensemble:

1. The train set is split into 10 parts.
2. A base model (suppose a decision tree) is fitted on 9 parts and predictions are made for the 10th part. This is done for each part of the train set.
3. The base model (in this case, decision tree) is then fitted on the whole train dataset.
4. Using this model, predictions are made on the test set.
5. Steps 2 to 4 are repeated for another base model (say knn), resulting in another set of predictions for the train set and test set.
6. The predictions from the train set are used as features to build a new model.
7. This model is used to make final predictions on the test set, using the base models' test-set predictions as its features.

Sample code:

We first define a function to make predictions on n folds of the train dataset and on the test dataset. This function returns the predictions for train and test for each model.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def Stacking(model, train, y, test, n_fold):
    # shuffle=True is needed when random_state is set (recent sklearn versions raise an error otherwise)
    folds = StratifiedKFold(n_splits=n_fold, shuffle=True, random_state=1)
    # Out-of-fold predictions, stored at each row's own position so they stay
    # aligned with y (assumes numerically encoded class labels)
    train_pred = np.empty(train.shape[0], dtype=float)
    for train_indices, val_indices in folds.split(train, y.values):
        x_train, x_val = train.iloc[train_indices], train.iloc[val_indices]
        y_train = y.iloc[train_indices]
        model.fit(x_train, y_train)
        train_pred[val_indices] = model.predict(x_val)
    # Steps 3-4: refit on the whole train set, then predict the test set once
    model.fit(train, y)
    test_pred = model.predict(test)
    return test_pred.reshape(-1, 1), train_pred
```

Now we’ll create two base models: decision tree and knn.

```python
import pandas as pd
from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier

model1 = tree.DecisionTreeClassifier(random_state=1)
test_pred1, train_pred1 = Stacking(model=model1, n_fold=10, train=x_train, test=x_test, y=y_train)
train_pred1 = pd.DataFrame(train_pred1)
test_pred1 = pd.DataFrame(test_pred1)

model2 = KNeighborsClassifier()
test_pred2, train_pred2 = Stacking(model=model2, n_fold=10, train=x_train, test=x_test, y=y_train)
train_pred2 = pd.DataFrame(train_pred2)
test_pred2 = pd.DataFrame(test_pred2)
```

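Create a third model, logistic regression, on the predictions of the decision tree and knn models.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Level-one features: one column of predictions per base model
df = pd.concat([train_pred1, train_pred2], axis=1)
df_test = pd.concat([test_pred1, test_pred2], axis=1)

model = LogisticRegression(random_state=1)
model.fit(df, y_train)
model.score(df_test, y_test)
```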

In order to simplify the above explanation, the stacking model we have created has only two levels. The decision tree and knn models are built at level zero, while a logistic regression model is built at level one. Feel free to create multiple levels in a stacking model.
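scikit-learn (version 0.22 and later) also provides a StackingClassifier that computes the cross-validated level-zero predictions internally. The sketch below uses a synthetic dataset from make_classification as an assumed stand-in for the loan data. It reuses the same decision tree and knn base models with a logistic regression meta-model, and sets stack_method="predict" so that the meta-model sees hard labels, as in the code above. By default, StackingClassifier would pass predicted probabilities instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (an assumption, for a self-contained run)
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = StackingClassifier(
    # Level zero: decision tree and knn
    estimators=[("dt", DecisionTreeClassifier(random_state=1)),
                ("knn", KNeighborsClassifier())],
    # Level one: logistic regression on the base models' predictions
    final_estimator=LogisticRegression(random_state=1),
    cv=10,                   # out-of-fold predictions from 10 folds
    stack_method="predict",  # feed hard labels rather than probabilities
)
model.fit(x_train, y_train)
score = model.score(x_test, y_test)
print(score)
```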


3.2 Blending

Blending follows the same approach as stacking but uses only a holdout (validation) set from the train set to make predictions. In other words, unlike stacking, the predictions are made on the holdout set only. The holdout set and the predictions are used to build a model, which is then run on the test set. Here is a detailed explanation of the blending process:

1. The train set is split into training and validation sets.
2. Model(s) are fitted on the training set.
3. The predictions are made on the validation set and the test set.
4. The validation set and its predictions are used as features to build a new model.
5. This model is used to make final predictions on the test set together with its predictions (the meta-features).

Sample Code:

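We’ll build two models, decision tree and knn, on the train set in order to make predictions on the validation set. Here x_val and y_val are a validation split carved out of the training data (step 1). The 20% split size below is only an example.

```python
import pandas as pd
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 1: hold out part of the training data as the validation set
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=1)

model1 = tree.DecisionTreeClassifier()
model1.fit(x_train, y_train)
val_pred1 = pd.DataFrame(model1.predict(x_val), columns=['dt_pred'])
test_pred1 = pd.DataFrame(model1.predict(x_test), columns=['dt_pred'])

model2 = KNeighborsClassifier()
model2.fit(x_train, y_train)
val_pred2 = pd.DataFrame(model2.predict(x_val), columns=['knn_pred'])
test_pred2 = pd.DataFrame(model2.predict(x_test), columns=['knn_pred'])
```

Combining the meta-features and the validation set, a logistic regression model is built to make predictions on the test set.

```python
from sklearn.linear_model import LogisticRegression

# reset_index lines the original rows up with the 0..n-1 index of the prediction
# frames; the meta-features must be numeric, so this assumes numerically encoded labels
df_val = pd.concat([x_val.reset_index(drop=True), val_pred1, val_pred2], axis=1)
df_test = pd.concat([x_test.reset_index(drop=True), test_pred1, test_pred2], axis=1)

model = LogisticRegression()
model.fit(df_val, y_val)
model.score(df_test, y_test)
```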

3.3 Bagging

The idea behind bagging is to combine the results of multiple models (for instance, all decision trees) to get a generalized result. Here’s a question: if you create all the models on the same set of data and combine them, will it be useful? There is a high chance that these models will give the same result, since they are getting the same input. So how can we solve this problem? One of the techniques is bootstrapping.

Bootstrapping is a sampling technique in which we create subsets of observations from the original dataset, with replacement. The size of the subsets is the same as the size of the original set.

The Bagging (or Bootstrap Aggregating) technique uses these subsets (bags) to get a fair idea of the distribution (complete set). The size of subsets created for bagging may be less than the original set.

1. Multiple subsets are created from the original dataset, selecting observations with replacement.
2. A base model (weak model) is created on each of these subsets.
3. The models run in parallel and are independent of each other.
4. The final predictions are determined by combining the predictions from all the models.
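The four steps can be sketched from scratch as follows. This is an illustrative implementation, not the scikit-learn one. It runs on a synthetic dataset from make_classification (an assumed stand-in for real data), uses decision trees as the base model and combines them by majority vote. The choice of 25 models is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
x_train, y_train, x_test, y_test = X[:200], y[:200], X[200:], y[200:]

n_models = 25
all_preds = []
for _ in range(n_models):
    # Step 1: bootstrap sample, same size as the training set, drawn with replacement
    idx = rng.integers(0, len(x_train), size=len(x_train))
    # Steps 2-3: an independent base model per bag (the iterations could run in parallel)
    base_model = DecisionTreeClassifier(random_state=0).fit(x_train[idx], y_train[idx])
    all_preds.append(base_model.predict(x_test))

# Step 4: majority vote; with 0/1 labels, a mean of at least 0.5 means most models voted 1
all_preds = np.array(all_preds)  # shape (n_models, n_test)
final_pred = (all_preds.mean(axis=0) >= 0.5).astype(int)
accuracy = (final_pred == y_test).mean()
print(accuracy)
```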

3.4 Boosting

Before we go further, here’s another question for you: if a data point is incorrectly predicted by the first model, and then by the next (and probably by all the models), will combining the predictions provide better results? Such situations are taken care of by boosting.

Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model. The succeeding models are dependent on the previous model. Let’s understand the way boosting works in the steps below.

1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights.
3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset.
5. Errors are calculated using the actual values and predicted values.
6. The observations which are incorrectly predicted are given higher weights.
(Here, the three misclassified blue-plus points will be given higher weights.)
7. Another model is created and predictions are made on the dataset.
(This model tries to correct the errors from the previous model.)
8. Similarly, multiple models are created, each correcting the errors of the previous model.
9. The final model (strong learner) is the weighted mean of all the models (weak learners).

Thus, the boosting algorithm combines a number of weak learners to form a strong learner. The individual models would not perform well on the entire dataset, but they work well for some part of the dataset. Thus, each model actually boosts the performance of the ensemble.
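A minimal sketch of this loop, written in the style of classic two-class AdaBoost (a specific boosting algorithm, discussed in section 4.3). One deviation from the steps above: instead of drawing a subset in step 1, every weak learner is trained on the full synthetic dataset, and per-point weights (passed as sample_weight) carry the extra emphasis on misclassified points. The dataset, the 20 rounds and the depth-1 trees are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=300, n_features=8, random_state=2)
y = np.where(y01 == 1, 1, -1)  # labels in {-1, +1} simplify the weight update

n_rounds = 20
weights = np.full(len(X), 1 / len(X))  # step 2: equal weights to start
learners, alphas = [], []

for _ in range(n_rounds):
    # Steps 3-4: fit a weak learner (a depth-1 tree) using the current weights
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
    pred = stump.predict(X)
    # Step 5: weighted error rate, clipped to avoid division by zero and log(0)
    err = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # how much this learner counts in the final vote
    # Step 6: misclassified points are multiplied by exp(+alpha), correct ones by exp(-alpha)
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()
    learners.append(stump)
    alphas.append(alpha)

# Step 9: strong learner = sign of the alpha-weighted sum of the weak learners' votes
scores = sum(a * m.predict(X) for a, m in zip(alphas, learners))
final_pred = np.sign(scores)
train_accuracy = (final_pred == y).mean()
print(train_accuracy)
```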


4. Algorithms based on Bagging and Boosting

Bagging and Boosting are two of the most commonly used techniques
in machine learning. In this section, we will look at them in detail.
Following are the algorithms we will be focusing on:

Bagging algorithms:

Bagging meta-estimator
Random forest

Boosting algorithms:

AdaBoost
GBM
XGBM
Light GBM
CatBoost

For all the algorithms discussed in this section, we will follow this procedure:

Introduction to the algorithm
Sample code
Parameters

For this article, I have used the Loan Prediction Problem. You can download the dataset from here. Please note that a few code lines (reading the data, splitting into train-test sets, etc.) will be the same for each algorithm. In order to avoid repetition, I have written this common code below, and discuss only the algorithm-specific code afterwards.

```python
# importing important packages
import pandas as pd
import numpy as np

# reading the dataset
df = pd.read_csv("/home/user/Desktop/train.csv")

# filling missing values (assigning back avoids relying on inplace=True for a single column)
df['Gender'] = df['Gender'].fillna('Male')
```

Similarly, fill values for all the columns. EDA, missing value treatment and outlier treatment have been skipped for the purposes of this article. To understand these topics, you can go through this article: Ultimate guide for Data Exploration in Python using NumPy, Matplotlib and Pandas.

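```python
# split dataset into train and test
from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.3, random_state=0)

x_train = train.drop('Loan_Status', axis=1)
y_train = train['Loan_Status']
x_test = test.drop('Loan_Status', axis=1)
y_test = test['Loan_Status']

# create dummies
x_train = pd.get_dummies(x_train)
x_test = pd.get_dummies(x_test)
# a category missing from one split would otherwise leave the two frames with different columns
x_test = x_test.reindex(columns=x_train.columns, fill_value=0)
```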

Let’s jump into the bagging and boosting algorithms!

4.1 Bagging meta-estimator

Bagging meta-estimator is an ensembling algorithm that can be used for both classification (BaggingClassifier) and regression (BaggingRegressor) problems. It follows the typical bagging technique to make predictions. Following are the steps for the bagging meta-estimator algorithm:

1. Random subsets are created from the original dataset (bootstrapping).
2. The subset of the dataset includes all features.
3. A user-specified base estimator is fitted on each of these smaller
sets.
4. Predictions from each model are combined to get the final result.

Code:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn import tree

model = BaggingClassifier(tree.DecisionTreeClassifier(random_state=1))
model.fit(x_train, y_train)
model.score(x_test, y_test)
```

0.75135135135135134

Sample code for regression problem:

```python
from sklearn import tree
from sklearn.ensemble import BaggingRegressor

model = BaggingRegressor(tree.DecisionTreeRegressor(random_state=1))
model.fit(x_train, y_train)
model.score(x_test, y_test)
```

Parameters used in the algorithms:

base_estimator (renamed estimator in scikit-learn 1.2):
It defines the base estimator to fit on random subsets of the dataset.
When nothing is specified, the base estimator is a decision tree.

n_estimators:
It is the number of base estimators to be created.
The number of estimators should be carefully tuned, as a large number would take a very long time to run, while a very small number might not provide the best results.

max_samples:
This parameter controls the size of the subsets.
It is the maximum number of samples to train each base estimator.

max_features:
Controls the number of features to draw from the whole dataset.
It defines the maximum number of features used to train each base estimator.

n_jobs:
The number of jobs to run in parallel.
Set this value equal to the number of cores in your system.
If -1, the number of jobs is set to the number of cores.

random_state:
It seeds the random sampling. When the random_state value is the same for two models, the random selection is the same for both models.
This parameter is useful when you want to compare different models.
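The sketch below sets these parameters explicitly. It runs on a synthetic make_classification dataset (an assumption, for a self-contained run), and the parameter values are illustrative rather than tuned. The base estimator is passed positionally, which works whether your scikit-learn version calls it base_estimator or estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = BaggingClassifier(
    DecisionTreeClassifier(random_state=1),  # base estimator (first positional argument)
    n_estimators=50,   # number of bagged trees
    max_samples=0.8,   # each tree is trained on 0.8 * n rows drawn with replacement
    max_features=0.5,  # ... using a random half of the columns, fixed per tree
    n_jobs=1,          # use -1 to fit on all available cores
    random_state=1,    # reproducible sampling
)
model.fit(x_train, y_train)
score = model.score(x_test, y_test)
print(score)
```

Note that max_features here draws one feature subset per tree. Random forest, covered next, instead draws a new subset at every split.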

4.2 Random Forest

Random Forest is another ensemble machine learning algorithm that follows the bagging technique. It is an extension of the bagging estimator algorithm. The base estimators in random forest are decision trees. Unlike the bagging meta-estimator, random forest randomly selects a set of features, which are used to decide the best split at each node of the decision tree.

Looking at it step-by-step, this is what a random forest model does:

1. Random subsets are created from the original dataset (bootstrapping).
2. At each node in the decision tree, only a random set of features is considered to decide the best split.
3. A decision tree model is fitted on each of the subsets.
4. The final prediction is calculated by averaging the predictions from all decision trees.

Note: The decision trees in random forest can be built on a subset of data and features. In particular, the sklearn implementation of random forest gives every decision tree all the features, and a subset of features is randomly selected for splitting at each node.

To sum up, random forest randomly selects data points and features, and builds multiple trees (a forest).

Code:

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)
```

0.77297297297297296

You can see feature importance by using model.feature_importances_ in random forest.

```python
# Print each feature with its importance, sorted alphabetically by feature name
for i, j in sorted(zip(x_train.columns, model.feature_importances_)):
    print(i, j)
```

The result is as below (excerpt):

ApplicantIncome 0.180924483743
CoapplicantIncome 0.135979758733
Credit_History 0.186436670523
Property_Area_Urban 0.0167025290557
Self_Employed_No 0.0165385567137
Self_Employed_Yes 0.0134763695267

Sample code for regression problem:

```python
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(x_train, y_train)
model.score(x_test, y_test)
```

Parameters

n_estimators:
It defines the number of decision trees to be created in a random forest.
Generally, a higher number makes the predictions stronger and more stable, but a very large number can result in higher training time.

criterion:
It defines the function that is to be used for splitting.
The function measures the quality of a split for each feature and chooses the best split.

max_features:
It defines the maximum number of features considered for each split in a decision tree.
Increasing max_features usually improves the performance of individual trees, but a very high number can reduce the diversity among the trees.

max_depth:
Random forest has multiple decision trees. This parameter defines the maximum depth of the trees.

min_samples_split:
Used to define the minimum number of samples a node must contain before a split is attempted.
If the number of samples is less than the required number, the node is not split.

min_samples_leaf:
This defines the minimum number of samples required to be at a leaf node.
A smaller leaf size makes the model more prone to capturing noise in the training data.

max_leaf_nodes:
This parameter specifies the maximum number of leaf nodes for each tree.
The tree stops splitting when the number of leaf nodes reaches max_leaf_nodes.

n_jobs:
This indicates the number of jobs to run in parallel.
Set the value to -1 if you want it to run on all cores in the system.

random_state:
This parameter seeds the random selection.
It is used for comparison between various models.
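A sketch that sets several of these parameters and ranks the resulting feature importances from most to least important. The synthetic make_classification data, the generic column names feature_0 to feature_5 and the parameter values are all assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=1)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(
    n_estimators=200,     # number of trees
    criterion="gini",     # split-quality measure
    max_features="sqrt",  # features considered at each split
    max_depth=None,       # no depth limit ...
    min_samples_split=4,  # ... but only split nodes holding at least 4 samples
    min_samples_leaf=2,   # and keep at least 2 samples in every leaf
    n_jobs=1,
    random_state=1,
)
model.fit(x_train, y_train)
score = model.score(x_test, y_test)

# Importances sum to 1; sort from most to least important
importances = pd.Series(model.feature_importances_, index=x_train.columns)
print(importances.sort_values(ascending=False))
```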

4.3 AdaBoost

Adaptive boosting or AdaBoost is one of the simplest boosting algorithms. Usually, decision trees are used for modelling. Multiple sequential models are created, each correcting the errors from the last model. AdaBoost assigns weights to the observations which are incorrectly predicted and the subsequent model works to predict these values correctly.

Below are the steps for performing the AdaBoost algorithm:

1. Initially, all observations in the dataset are given equal weights.
2. A model is built on a subset of data.
3. Using this model, predictions are made on the whole dataset.
4. Errors are calculated by comparing the predictions and actual values.
5. While creating the next model, higher weights are given to the data points which were predicted incorrectly.
6. Weights can be determined using the error value. For instance, the higher the error, the more weight is assigned to the observation.
7. This process is repeated until the error function does not change, or the maximum limit of the number of estimators is reached.
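Step 6 leaves the weighting rule open. One common choice, used by classic two-class (discrete) AdaBoost, is sketched below. It assumes labels $y_i \in \{-1, +1\}$, a weak learner $h_t$ at round $t$, and point weights $w_i$ that start at $1/n$. scikit-learn's implementation differs in details such as a learning_rate factor.

```latex
\varepsilon_t = \sum_{i:\,h_t(x_i) \neq y_i} w_i ,
\qquad
\alpha_t = \frac{1}{2} \ln\frac{1 - \varepsilon_t}{\varepsilon_t}

w_i \leftarrow \frac{w_i \, e^{-\alpha_t y_i h_t(x_i)}}{\sum_j w_j \, e^{-\alpha_t y_j h_t(x_j)}} ,
\qquad
H(x) = \operatorname{sign}\Big( \sum_t \alpha_t h_t(x) \Big)
```

A misclassified point has $y_i h_t(x_i) = -1$, so its weight is multiplied by $e^{\alpha_t}$. A correctly classified point is multiplied by $e^{-\alpha_t}$. A learner with a lower weighted error $\varepsilon_t$ receives a larger say $\alpha_t$ in the final vote $H(x)$.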

Code:

```python
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)
```

0.81081081081081086

Sample code for regression problem:

```python
from sklearn.ensemble import AdaBoostRegressor

model = AdaBoostRegressor()
model.fit(x_train, y_train)
model.score(x_test, y_test)
```

Parameters

base_estimator (renamed estimator in scikit-learn 1.2):
It helps to specify the type of base estimator, that is, the machine learning algorithm to be used as the base learner.

n_estimators:
It defines the number of base estimators.
The default value is 50, but you may need a higher value to get better performance.

learning_rate:
This parameter controls the contribution of the estimators in the final combination.
There is a trade-off between learning_rate and n_estimators.

max_depth:
Defines the maximum depth of the individual estimator. In scikit-learn it is set on the base estimator (for example DecisionTreeClassifier(max_depth=1)) rather than on AdaBoostClassifier itself.
Tune this parameter for best performance.

n_jobs:
Specifies the number of processors it is allowed to use.
Set the value to -1 for the maximum number of processors allowed. Note that scikit-learn's AdaBoostClassifier does not accept this parameter, since its estimators are fitted sequentially.

random_state:
An integer value to specify the random data split.
A definite value of random_state will always produce the same results if given the same parameters and training data.
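A sketch showing where these parameters go in scikit-learn, on a synthetic make_classification dataset (an assumption, for a self-contained run). The tree depth is set on the base estimator, which is passed positionally so that the code works whether your version calls the argument base_estimator or estimator. staged_score reports the test accuracy after each boosting round, which makes the learning_rate/n_estimators trade-off visible. The specific values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),  # base estimator; max_depth is set here
    n_estimators=100,
    learning_rate=0.5,  # smaller steps usually need more estimators
    random_state=1,
)
model.fit(x_train, y_train)

# Test accuracy after 1, 2, ..., n boosting rounds
staged = list(model.staged_score(x_test, y_test))
print(len(staged), staged[-1])
```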


4.4 Gradient Boosting (GBM)

Gradient Boosting or GBM is another ensemble machine learning algorithm that works for both regression and classification problems. GBM uses the boosting technique, combining a number of weak learners to form a strong learner. Regression trees are used as base learners, and each subsequent tree in the series is built on the errors calculated by the previous tree.

We will use a simple example to understand the GBM algorithm. We

have to predict the age of a group of people using the below data:

1. The mean age is assumed to be the predicted value for all observations in the dataset.
2. The errors are calculated using this mean prediction and the actual values of age.
3. A tree model is created using the errors calculated above as the target variable. Our objective is to find the best split to minimize the error.
4. The predictions by this model are combined with the predictions from step 1.
5. This value calculated above is the new prediction.
6. New errors are calculated using this predicted value and the actual value.
7. Steps 2 to 6 are repeated until the maximum number of iterations is reached (or the error function does not change).
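The steps above can be sketched in plain Python. This is a minimal illustration and not scikit-learn's implementation. It makes several assumptions. There is a single numeric feature. The base learners are one-split regression trees ("stumps"). Each tree's output is scaled by a `learning_rate` of 0.5; the steps above correspond to a learning rate of 1. The function names and the `experience`/`age` values are hypothetical data made up for illustration.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree (a "stump") to the residuals.

    Every threshold between distinct feature values is tried; the one with
    the lowest squared error wins, and each side predicts its mean residual.
    """
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:  # all feature values identical: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda xi: mean
    _, t, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= t else right_mean


def gbm_fit(x, y, n_iterations=20, learning_rate=0.5):
    base = sum(y) / len(y)                    # step 1: predict the mean age
    predictions = [base] * len(y)
    trees = []
    for _ in range(n_iterations):             # step 7: repeat
        residuals = [yi - pi for yi, pi in zip(y, predictions)]  # steps 2 and 6
        tree = fit_stump(x, residuals)        # step 3: fit a tree to the errors
        trees.append(tree)
        predictions = [pi + learning_rate * tree(xi)             # steps 4 and 5
                       for pi, xi in zip(predictions, x)]
    return base, trees, learning_rate


def gbm_predict(model, xi):
    base, trees, learning_rate = model
    return base + learning_rate * sum(tree(xi) for tree in trees)


def mse(y, preds):
    return sum((a - b) ** 2 for a, b in zip(y, preds)) / len(y)


experience = [1, 2, 3, 10, 12, 15]   # hypothetical feature
age = [22, 25, 24, 40, 42, 48]       # hypothetical target

model = gbm_fit(experience, age)
baseline_mse = mse(age, [sum(age) / len(age)] * len(age))
fitted_mse = mse(age, [gbm_predict(model, xi) for xi in experience])
print(baseline_mse, fitted_mse)
```

Each new tree only has to explain what the previous trees got wrong, so the training error falls with every iteration. Smaller learning rates need more iterations to reach the same fit.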

Code:

from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(learning_rate=0.01, random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)

0.81621621621621621

Sample code for regression problem:

from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor()
model.fit(x_train, y_train)
model.score(x_test, y_test)

Parameters


min_samples_split
Defines the minimum number of samples (or observations)
which are required in a node to be considered for splitting.
Used to control over-fitting. Higher values prevent a model
from learning relations which might be highly specific to
the particular sample selected for a tree.

min_samples_leaf
Defines the minimum samples required in a terminal or
leaf node.
Generally, lower values should be chosen for imbalanced
class problems because the regions in which the minority
class will be in the majority will be very small.

min_weight_fraction_leaf
Similar to min_samples_leaf but defined as a fraction of
the total number of observations instead of an integer.

max_depth
The maximum depth of a tree.
Used to control over-fitting as higher depth will allow the
model to learn relations very specific to a particular
sample.
Should be tuned using CV.

max_leaf_nodes
The maximum number of terminal nodes or leaves in a
tree.
Can be defined in place of max_depth. Since binary trees
are created, a depth of ‘n’ would produce a maximum of
2^n leaves.
If this is defined, GBM will ignore max_depth.

max_features
The number of features to consider while searching for the best split. These will be randomly selected.
As a rule of thumb, the square root of the total number of features works well, but we should check up to 30-40% of the total number of features.
Higher values can lead to over-fitting, but this generally depends on the case.


4.5 XGBoost

XGBoost (extreme Gradient Boosting) is an advanced implementation of the gradient boosting algorithm. XGBoost has proved to be a highly effective ML algorithm, extensively used in machine learning competitions and hackathons. XGBoost has high predictive power and can be close to 10 times faster than other gradient boosting implementations, depending on the data and hardware. It also includes a variety of regularization techniques, which reduce overfitting and improve overall performance. Hence it is also known as a ‘regularized boosting’ technique.

Let us see how XGBoost is comparatively better than other techniques:

1. Regularization:
The standard GBM implementation has no regularization, unlike XGBoost.
This regularization helps XGBoost reduce overfitting.

2. Parallel Processing:
XGBoost implements parallel processing and is faster than GBM.
XGBoost also supports implementation on Hadoop.

3. High Flexibility:
XGBoost allows users to define custom optimization objectives and evaluation criteria, which adds a whole new dimension to the model.

4. Handling Missing Values:
XGBoost has an in-built routine to handle missing values.

5. Tree Pruning:
XGBoost makes splits up to the max_depth specified and then starts pruning the tree backwards, removing splits beyond which there is no positive gain.

6. Built-in Cross-Validation:
XGBoost allows a user to run cross-validation at each iteration of the boosting process, so it is easy to get the exact optimum number of boosting iterations in a single run.

Code:

Since XGBoost takes care of missing values itself, you do not have to impute them. You can skip the missing value imputation step from the code mentioned above. Follow the remaining steps as always and then apply XGBoost as below.

import xgboost as xgb
model = xgb.XGBClassifier(random_state=1, learning_rate=0.01)
model.fit(x_train, y_train)
model.score(x_test, y_test)

0.82702702702702702

Sample code for regression problem:

import xgboost as xgb
model = xgb.XGBRegressor()
model.fit(x_train, y_train)
model.score(x_test, y_test)

Parameters

nthread
This is used for parallel processing; enter the number of cores in the system.
If you wish to run on all cores, do not set this value. The algorithm will detect it automatically.

eta
Analogous to learning rate in GBM.
Makes the model more robust by shrinking the weights on
each step.

min_child_weight
Defines the minimum sum of weights of all observations
required in a child.
Used to control over-fitting. Higher values prevent a model
from learning relations which might be highly specific to
the particular sample selected for a tree.

max_depth
It is used to define the maximum depth.
Higher depth will allow the model to learn relations very specific to a particular sample.

gamma
A node is split only when the resulting split gives a positive
reduction in the loss function. Gamma specifies the
minimum loss reduction required to make a split.
Makes the algorithm conservative. The values can vary
depending on the loss function and should be tuned.

subsample
Same as the subsample of GBM. Denotes the fraction of
observations to be randomly sampled for each tree.
Lower values make the algorithm more conservative and
prevent overfitting but values that are too small might lead
to under-fitting.

colsample_bytree
It is similar to max_features in GBM.
Denotes the fraction of columns to be randomly sampled
for each tree.
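Here is a small sketch of how a boosting tree can score a candidate split. It makes gamma, min_child_weight and eta concrete, and it follows the split-gain and leaf-weight formulas published by the XGBoost authors. It assumes squared-error loss, for which each row's gradient is (prediction - target) and its hessian is 1. `reg_lambda` stands for the L2 regularization term on leaf weights. The function names are hypothetical, and this is only an illustration of the formulas. XGBoost's internal code may scale the gain differently.

```python
def structure_score(grads, hessians, reg_lambda):
    """G^2 / (H + lambda) for the rows that fall in one node."""
    G, H = sum(grads), sum(hessians)
    return G * G / (H + reg_lambda)


def leaf_weight(grads, hessians, reg_lambda=1.0, eta=1.0):
    """Optimal leaf value -G / (H + lambda), shrunk by the learning rate eta."""
    return -eta * sum(grads) / (sum(hessians) + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a node into left/right, minus gamma."""
    gain = 0.5 * (structure_score(g_left, h_left, reg_lambda)
                  + structure_score(g_right, h_right, reg_lambda)
                  - structure_score(g_left + g_right, h_left + h_right, reg_lambda))
    return gain - gamma


def should_split(g_left, h_left, g_right, h_right,
                 reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Split only if both children are heavy enough and the gain is positive."""
    if sum(h_left) < min_child_weight or sum(h_right) < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0


y = [1.0, 2.0, 10.0, 11.0]
prediction = 6.0                          # current prediction for every row (the mean)
grads = [prediction - yi for yi in y]     # [5.0, 4.0, -4.0, -5.0]
hessians = [1.0] * len(y)

gain = split_gain(grads[:2], hessians[:2], grads[2:], hessians[2:])
print(gain)                                                            # 27.0
print(should_split(grads[:2], hessians[:2], grads[2:], hessians[2:], gamma=30.0))
print(leaf_weight(grads[:2], hessians[:2], eta=0.3))
```

A larger gamma rejects splits whose gain is too small, which makes the algorithm conservative. A larger min_child_weight rejects splits that leave too little hessian weight in a child. eta shrinks each leaf's contribution, so a single tree cannot move the predictions too far.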

4.6 Light GBM

Before discussing how Light GBM works, let’s first understand why we need this algorithm when we have so many others (like the ones we have seen above). Light GBM tends to outperform the other algorithms when the dataset is extremely large: compared to them, Light GBM takes less time to run on a huge dataset.

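LightGBM is a gradient boosting framework that uses tree-based algorithms and follows a leaf-wise approach, while other algorithms work in a level-wise pattern. In level-wise growth, every node at the current depth is split before the tree grows deeper; in leaf-wise growth, the leaf whose split reduces the loss the most is split next, wherever it is in the tree. The images below will help you understand the difference better.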


Leaf-wise growth may cause over-fitting on smaller datasets, but that can be avoided by using the ‘max_depth’ parameter for learning. You can read more about Light GBM and its comparison with XGB in this article.
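The leaf-wise strategy can be sketched as a priority queue of leaves. Leaves are ordered by how much splitting them would reduce the squared error. This is a simplified illustration on a single numeric feature, not LightGBM's histogram-based implementation. `num_leaves` plays the role of the parameter of the same name, and all function names are hypothetical.

```python
import heapq


def sse(values):
    """Sum of squared errors around the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y, idx):
    """Best threshold split of the rows in idx: (gain, left_rows, right_rows)."""
    idx = sorted(idx, key=lambda i: x[i])
    parent = sse([y[i] for i in idx])
    best = (0.0, None, None)
    for k in range(1, len(idx)):
        if x[idx[k - 1]] == x[idx[k]]:
            continue  # cannot split between equal feature values
        left, right = idx[:k], idx[k:]
        gain = parent - sse([y[i] for i in left]) - sse([y[i] for i in right])
        if gain > best[0]:
            best = (gain, left, right)
    return best


def grow_leaf_wise(x, y, num_leaves):
    """Always split the leaf with the largest gain until num_leaves is reached."""
    root = list(range(len(x)))
    gain, left, right = best_split(x, y, root)
    counter = 0  # tie-breaker so heapq never compares the row lists
    heap = [(-gain, counter, root, left, right)]
    finished = []  # leaves that have no useful split
    while heap and len(heap) + len(finished) < num_leaves:
        _, _, rows, left, right = heapq.heappop(heap)
        if left is None:
            finished.append(rows)
            continue
        for child in (left, right):
            counter += 1
            g, l, r = best_split(x, y, child)
            heapq.heappush(heap, (-g, counter, child, l, r))
    return finished + [item[2] for item in heap]


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, 1, 5, 5, 9, 20]
leaves = grow_leaf_wise(x, y, num_leaves=3)
print(leaves)
```

A level-wise learner would split both children of the root before going deeper. The leaf-wise version instead spends its leaf budget wherever the loss drops most. That is also why it can overfit small datasets unless the depth is capped.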

Code:

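import lightgbm as lgb
import numpy as np
from sklearn.metrics import accuracy_score

train_data = lgb.Dataset(x_train, label=y_train)

# define parameters
params = {'learning_rate': 0.001}

# train for 100 boosting rounds
model = lgb.train(params, train_data, 100)

y_pred = model.predict(x_test)

# threshold the continuous predictions at 0.5 to get 0/1 class labels
y_pred = np.where(y_pred >= 0.5, 1, 0)

accuracy_score(y_test, y_pred)

0.81621621621621621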

Sample code for regression problem:

import lightgbm as lgb
from sklearn.metrics import mean_squared_error

train_data = lgb.Dataset(x_train, label=y_train)
params = {'learning_rate': 0.001}
model = lgb.train(params, train_data, 100)

# predict on the test set before computing the error
y_pred = model.predict(x_test)

# root mean squared error
rmse = mean_squared_error(y_test, y_pred) ** 0.5

Parameters

num_iterations:
It defines the number of boosting iterations to be
performed.

num_leaves :
This parameter is used to set the number of leaves to be
formed in a tree.
In case of Light GBM, since splitting takes place leaf-wise
rather than depth-wise, num_leaves must be smaller than
2^(max_depth), otherwise, it may lead to overfitting.

min_data_in_leaf :
The minimum number of data points required in a leaf.
A very small value may cause overfitting.
It is also one of the most important parameters in dealing with overfitting.

max_depth:
It specifies the maximum depth or level up to which a tree
can grow.
A very high value for this parameter can cause overfitting.

bagging_fraction:
It is used to specify the fraction of data to be used for each
iteration.
This parameter is generally used to speed up the training.

max_bin :
Defines the max number of bins that feature values will be
bucketed in.
A smaller value of max_bin can save a lot of time as it
buckets the feature values in discrete bins which is
computationally inexpensive.
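The idea behind max_bin can be illustrated by bucketing a feature into at most max_bin bins and then searching for splits only between bins. The sketch below uses simple equal-frequency cut points. LightGBM's own bin construction is more involved, so treat `bin_edges` and `to_bins` as hypothetical helpers.

```python
import bisect


def bin_edges(values, max_bin):
    """Cut points at evenly spaced ranks, giving at most max_bin bins."""
    vals = sorted(values)
    n = len(vals)
    # duplicate cut points are merged by the set
    cuts = {vals[(b * n) // max_bin] for b in range(1, max_bin)}
    return sorted(cuts)


def to_bins(values, edges):
    """Bin index of each value = number of cut points <= value."""
    return [bisect.bisect_right(edges, v) for v in values]


values = list(range(100))          # hypothetical feature with 100 distinct values
edges = bin_edges(values, max_bin=4)
bins = to_bins(values, edges)
print(edges, len(set(bins)))
```

Without binning, a split search over this feature would consider 99 thresholds. After binning it only considers the len(edges) cut points. That saving is what a smaller max_bin buys, at the cost of coarser splits.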


4.7 CatBoost

Handling categorical variables is a tedious process, especially when you have a large number of such variables. When your categorical variables have too many labels (i.e. they have high cardinality), performing one-hot encoding on them greatly increases the dimensionality (one new column per label), and it becomes really difficult to work with the dataset.

CatBoost can automatically deal with categorical variables and does not require extensive data preprocessing like other machine learning algorithms. Here is an article that explains CatBoost in detail.
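The sketch below contrasts the two approaches on a toy column. One-hot encoding adds one column per distinct label, while a target-statistics encoding replaces the category with a single number. CatBoost's approach is based on ordered target statistics: each row is encoded using only the targets of rows that come before it, which avoids leaking the row's own label. This is a simplified version with two assumptions. The rows are in a single fixed order, whereas CatBoost uses random permutations. The prior is a hypothetical 0.5 with weight 1. The function names are illustrative.

```python
def one_hot_width(categories):
    """Number of new columns one-hot encoding would add for this feature."""
    return len(set(categories))


def ordered_target_stats(categories, targets, prior=0.5, prior_weight=1.0):
    """Encode each row from the targets of *earlier* rows with the same category."""
    sums, counts = {}, {}
    encoded = []
    for cat, y in zip(categories, targets):
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded.append((s + prior_weight * prior) / (c + prior_weight))
        # update the running statistics only after encoding the current row
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded


cats = ['A', 'B', 'A', 'A', 'B']   # hypothetical categorical column
y = [1, 0, 1, 0, 1]                # hypothetical binary target
print(one_hot_width(cats))
print(ordered_target_stats(cats, y))
```

The first occurrence of each category falls back to the prior. Later rows move toward that category's observed target rate.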

Code:

The CatBoost algorithm deals with categorical variables effectively. Thus, you should not perform one-hot encoding for categorical variables. Just load the files, impute missing values, and you’re good to go.

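import numpy as np
from catboost import CatBoostClassifier

model = CatBoostClassifier()

# positions of the non-float columns in df, a quick way to find candidate categorical features
categorical_features_indices = np.where(df.dtypes != float)[0]

# cat_features gives the positions of the categorical columns in x_train
model.fit(x_train, y_train, cat_features=[0, 1, 2, 3, 4, 10], eval_set=(x_test, y_test))
model.score(x_test, y_test)

0.80540540540540539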

Sample code for regression problem:

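import numpy as np
from catboost import CatBoostRegressor

model = CatBoostRegressor()

# positions of the non-float columns in df, a quick way to find candidate categorical features
categorical_features_indices = np.where(df.dtypes != float)[0]

# cat_features gives the positions of the categorical columns in x_train
model.fit(x_train, y_train, cat_features=[0, 1, 2, 3, 4, 10], eval_set=(x_test, y_test))
model.score(x_test, y_test)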

Parameters

loss_function:
Defines the metric to be used for training.

iterations:
The maximum number of trees that can be built.
The final number of trees may be less than or equal to this
number.

learning_rate:
Defines the learning rate.
Used for reducing the gradient step.

border_count:
It specifies the number of splits for numerical features.
It is similar to the max_bin parameter.

depth:
Defines the depth of the trees.

random_seed:
This parameter is similar to the ‘random_state’ parameter
we have seen previously.
It is an integer value to define the random seed for
training.

This brings us to the end of the ensemble algorithms section. We have covered quite a lot in this article!

End Notes

Ensemble modeling can significantly boost the performance of your model and can sometimes be the deciding factor between first place and second! In this article, we covered various ensemble learning techniques and saw how these techniques are applied in machine learning algorithms. Further, we implemented the algorithms on our loan prediction dataset.

This article should have given you a solid understanding of this topic. If you have any suggestions or questions, do share them in the comment section below. Also, I encourage you to implement these algorithms at your end and share your results with us!


Aishwarya Singh

An avid reader and blogger who loves exploring the endless world of data science and artificial intelligence. Fascinated by the limitless applications of ML and AI; eager to learn and discover the depths of data science.


55 COMMENTS

JOAQUIN
June 18, 2018 at 1:07 pm

Really nice article! And just when I needed it most. Could you please upload the dataset you used? I'm getting an error regarding the shapes when implementing the Stacking Ensemble.

Thank you!

AISHWARYA SINGH
June 18, 2018 at 1:27 pm

Hi Joaquin,

Glad you found this useful. You can download the dataset from this link.

SURAJ PANDEY
July 8, 2018 at 7:47 pm

model.score(df_test, y_test) is failing as the shape of df_test and y_test doesn’t match.

AISHWARYA SINGH
July 9, 2018 at 9:36 am

Hi Suraj,

Can you please print the shape and head of your df_test and y_test, and show the results?

AHMED
September 5, 2018 at 11:31 pm

I have the same problem dealing with the titanic dataset:
The shape of `df`: (623, 2)
The shape of `df_test`: (2680, 2)
The shape of `test_y`: (268,)
The shape of `test_X`: (268, 11)
The shape of `train_X`: (623, 11)
The shape of `train_X`: (623,)

AISHWARYA SINGH
September 6, 2018 at 11:04 am

Ideally df_test should have 268 rows.

CHRIS PALME
October 11, 2018 at 4:57 pm

Yeah, bad formatting in the method. The line before the return statement should be on the same level as the return statement: test_pred=np.appe so it gets run once; at the moment it gets run for each fold. Also, to set up the empty array it should be test_pred=np.emp

VARSH
October 26, 2018 at 11:33 pm

Same problem here, with the Pima Indian Diabetes dataset.
Shape of df_test: (847, 2)
Shape of y_test: (77, )

ADITYA
June 18, 2018 at 1:49 pm

Nice Article !!!

AISHWARYA SINGH
June 18, 2018 at 9:38 pm

Thanks Aditya

SANJOY DATTA
June 18, 2018 at 5:00 pm

Thank you. This is great content. Been following it from the beginning. 2 issues:

Getting NameError: tree is not defined.

Secondly, from section 4 onwards, there is a dataset to work on. But no dataset is referred to for the sections before 4, so I cannot run the code on data.

NameError                                 Traceback (most recent call last)
in ()
      3 from sklearn.ensemble import BaggingClassifier
      4 #model = tree.DecisionTreeClassifier()
----> 5 model = BaggingClassifier(tree.DecisionTreeClassifier(random_state=1))
      6 model.fit(x_train, y_train)
      7 model.score(x_test,y_test)

NameError: name ‘tree’ is not defined

For beginners like me, it will need a little more detail to follow the full notebook.

AISHWARYA SINGH
June 18, 2018 at 6:11 pm

Hi Sanjoy,

The code for voting and averaging can be used with any dataset, and hence no particular dataset is attached to that section. You can try implementing the code on the loan prediction dataset, and if you face any issues do let me know.

Regarding the error ‘tree not found’, please use the following line of code: from sklearn import tree. Thank you for pointing it out. I will update the same in the post.

SANJOY DATTA
June 19, 2018 at 3:48 pm

Thank you for your response.

Now getting this error. Changed Y/N to 1/0 hoping that would take care of NaN at least. But the problem persists.

ValueError: Input contains NaN, infinity or a value too large for dtype(‘float64’).

AISHWARYA SINGH
June 19, 2018 at 4:05 pm

Please impute the missing values in your dataset. Steps for data preprocessing have not been included in this article. You can fill the missing values using df['Column_name'].fillna('value',i

SANJOY DATTA
June 19, 2018 at 7:15

Thank you for your patience. Missed the instruction for other fields. Done that. Now have a different problem.

After getting dummies for x_train and x_test, the number of X variables turns out to be different: 449 for train and 205 for test for a 30% test set.

It changes as we change to 20%: 511 for train and 143 for test.

For a 10% test set, it further changes to 572 and 82 respectively.

Obviously the range of unique values within train and test is causing this. Loan_ID is the main contributor here, since each of the 614 examples has a unique ID.

Hence the error now for the 30% test separation is:

ValueError: Number of features of the model must match the input. Model n_features is 449 and input n_features is 205.

If I remove Loan_ID as input, I get model score 1.0.

Do these make any sense?

AISHWARYA SINGH
June 19, 2018 at 7:24 pm

Remove the Loan_ID before creating dummies. Generally an ID is unique for every data point and hence should not be used for training the model. After removing the loan ID, fit the model on the remaining features and test on the validation set.

UMAR L U
June 18, 2018 at 7:45 pm

Wonderful article

AISHWARYA SINGH
June 18, 2018 at 9:38 pm

Thank you

ABHINAV JAIN
June 19, 2018 at 2:44 pm

Hi, in section 3.3 (bagging), at one point you mentioned that “The size of the subsets is the same as the size of the original set.” when you are explaining bootstrapping, and in the next paragraph you are saying “The size of subsets created for bagging may be less than the original set.” Please make it clear. Nice Article !!

AISHWARYA SINGH
June 19, 2018 at 3:20 pm

Hi Abhinav,

In bootstrapping, the size of the subsets is the same as the size of the original set, while in bagging, the size of each subset may be equal to or less than the size of the original set.

MEHARUNNISA
June 24, 2018 at 12:37 am

Kindly explain the same using the R language.

AISHWARYA SINGH
June 25, 2018 at 11:54 am

Hi meharunnisa,

Here is an article on ensemble learning in R: How to build Ensemble Models in machine learning? (with code in R)

ISHIT
June 29, 2018 at 4:23 pm

Can you please explain how you calculated Prediction 2 in gradient boosting?

AISHWARYA SINGH
June 29, 2018 at 5:00 pm

Hi Ishit,

In this case, I have taken a simple example to explain the concept. The residuals are considered as the target for the next decision tree. The decision tree splits such that similar targets are in the same node. Then the average of each node is calculated, and this is assigned to all the values in that node as the new predictions. I have made some updates; please check if they clarify your doubt now. If you still face any issue, do let me know.

AJAY
July 1, 2018 at 12:25 am

Thanks for the article, Aishwarya!

AISHWARYA SINGH
July 2, 2018 at 10:23 am

Hi Ajay,

Glad you liked it!

RAM
July 2, 2018 at 12:03 pm

Can you please explain how you calculated Prediction 2 in gradient boosting?
For Prediction 1 you use the following method:
mean age = sum of all ages / number of people
Residual 1 = age - mean age

In the same way, how have you calculated Prediction 2?

AISHWARYA SINGH
July 2, 2018 at 7:38 pm

Hi,

We create a decision tree on the residuals. Let us suppose that the decision tree splits such that all positive numbers are in one leaf node while negative ones are in the other (just an example; the results are much more complicated). The average for each leaf node is taken as the predicted value. Further, these values are combined with the mean and new residuals are created.

JORGE REYES-SPINDOLA
July 11, 2018 at 10:17 pm

Thank you for a very informative article. Just one issue: when fitting the BaggingRegressor to the training data, I get the following error:
ValueError: could not convert string to float: ‘Y’

I’m assuming it’s because we need to convert the y_train to {1,0}.

Am I correct?

Thanks much
Jorge

AISHWARYA SINGH
July 12, 2018 at 11:34 pm

Hi Jorge,

If your target variable is ‘Y’ and ‘N’, you should use BaggingClassifier instead of BaggingRegressor.

HRUSHIKESH
July 16, 2018 at 1:25 pm

Great article. Keep up the good work !

AISHWARYA SINGH
July 17, 2018 at 4:55 pm

Thank you

CHIRANJIT
July 18, 2018 at 12:48 pm

This is very nice. How do you see this in production?

Anyway, great work.

AYMEN
July 22, 2018 at 1:36 pm

A nice article.

But if I need to use boosting or bagging with different models (decision tree, random forest, logistic regression), how can I implement it?

AISHWARYA SINGH
July 31, 2018 at 3:21 pm

Hi Aymen,

If you look at the code for the bagging classifier, you will observe that we can provide the classifier we wish to use. As an example, I have used a decision tree; you can use random forest or logistic regression.

MANOJ
August 27, 2018 at 12:16 pm

Hi! In the stacking function, you are initiating “test_pred” with some random floats of shape (test.shape[0],1)
code: test_pred = np.empty(test.shape[0],1,float)

Later in the same function, you are appending the predicted values of the “test” dataset to the already existing “test_pred”.
1) If “test_pred” is the predictions of the “test” dataset that we pass to the function, they should have the same number of rows. But the way it was coded, the number of rows will be twice the number of rows of the “test” dataset, since “test_pred” was already initiated with some random numbers (which the empty command generates) in rows equal to the rows in “test”, and then an equal number of additional predictions is added to those already existing rows (while appending the predictions of test). Need some clarification…
Example: in the example shown, the shape of “test_pred” should be (154,1), since the “test” dataset passed was of shape (154,8), but the shape of the “test_pred” the function returns is twice that, i.e. (308,1).

2) And is there any particular reason why “test_pred” was not initiated like “train_pred” with an empty array of shape (0,1), instead of shape (test.shape[0],1)?

AISHWARYA SINGH
August 27, 2018 at 8:15 pm

Hi manoj,

When I use np.empty(shape), it should give me an empty array of the shape assigned. If you are getting an error, replace this line and define test_pred in the same way as train_pred is defined.

PO-YU KAO
December 22, 2018 at 5:55 am

Hello Manoj,

I think `test_pred=np.append(test_pred,model.predict(test))` should be placed outside the for loop.

SHIVA
October 17, 2018 at 5:05 am

Thanks for this awesome post. Do you have any posts explaining a stacked ensemble of pretrained deep learning models with image inputs? Can you point to any resources otherwise?

AISHWARYA SINGH
October 17, 2018 at 2:12 pm

Hi Shiva,

I haven’t researched ensembles of pretrained models yet. If I come across a relevant post, I’ll share it with you.

ROSA
October 22, 2018 at 7:25 pm

Hey, thanks very much for your help. I am trying to run the stacking method and I got this error: AttributeError: ‘numpy.ndarray’ object has no attribute ‘values’. Can you please explain why? P.S. I am new to programming. Thanks in advance.

AISHWARYA SINGH
October 22, 2018 at 7:53 pm

Hi,

Looks like you are using .values on an array. Convert it into a dataframe and use the command.

JINGMIAO SHEN
November 1, 2018 at 2:09 am

How could I have missed such a great article before???
I have become your fan now, AISHWARYA!!!
Love the “Concept + Code” blog, easy to follow and implement.
Appreciate your time !!!

AISHWARYA SINGH
November 1, 2018 at 12:18 pm

Thanks a lot!

SADIQ
December 27, 2018 at 7:45 pm

Thanks for the detailed and organized article.

Could you please help me with the following issue?
df has 2 features and we fitted the level-one model to df and y_train. My question is how we can use this model to predict x_test, as we need to get y_test (predicted y for the test data set) for x_test. The model was fitted with 2 features but x_test has 20 features, so I could not use the model for x_test.
For example, if I want to use the level-one model to predict Loan_Status for the Loan Prediction competition after model.fit(df, y_train), how can I use model.predict(x_test)? It shows the following error:

ValueError: X has 20 features per sample; expecting 2

AISHWARYA SINGH
January 2, 2019 at 11:45 am

Hi Sadiq,

The dataset you train on and the dataset on which you want to predict should have the same number of features. df should have 20 features, or you will have to drop the remaining 18 features from x_test. In which part of the code in the article (or under which section) did you face the error?

SARWOJOWO
January 30, 2019 at 2:02 pm

Same issue; I have an error like this: ValueError: Found array with dim 4. Estimator expected <= 2. How do I solve this?

HUSEYIN
January 2, 2019 at 11:27 am

Well organized and informative article.

I have a question:

What do you think about their usage in real life? Although there are powerful boosting algorithms (like “XGBoost”), do we still need stacking, blending or voting based learning?

Thank you

AISHWARYA SINGH
February 20, 2019 at 2:16 pm

Hi,

I am aware that we have powerful algorithms that are able to give excellent performance. But the idea behind covering the concepts of stacking and blending was to start with the basics and then move to complex algorithms.

SHUKRITY SI
January 4, 2019 at 11:20 pm

It is a very useful article for ensemble methods. But while using blending, I get the error “cannot concatenate a non-NDFrame object”. Can you please guide me to avoid the error?

AISHWARYA SINGH
February 20, 2019 at 2:25 pm

Hi shukrity, please check the type of the data you are using. Is it a dataframe?

DENNIS CARTIN
January 7, 2019 at 7:20 pm

Nice article. However, I am looking for an ensemble of Keras models. Can you share your knowledge please?

SHUKRITY SI
January 13, 2019 at 4:15 am

Can you please help me out with my problem regarding stacking? In my dataset, the size of the train set is 7116 and the size of the test set is 1780. So df_test and y_test should be the same size (1780). But the size of df_test is shown as 10680, so a value error arises from this inconsistency. Please tell me how I can solve this problem.

AISHWARYA SINGH
February 20, 2019 at 2:21 pm

Could you share the notebook with me? Or the code, so that I can copy, paste and check at my end.

SHAMEER
February 9, 2019 at 2:37 pm

Nice Article !!!

1 page
Maths of Machine Learning
No ratings yet
Maths of Machine Learning
75 pages
DataTable JSON Serialization in JSON - Net and JavaScriptSerializer
No ratings yet
DataTable JSON Serialization in JSON - Net and JavaScriptSerializer
9 pages
How Can I Become A Data Scientist - Quora
100% (1)
How Can I Become A Data Scientist - Quora
16 pages
Python Cheat Sheet: Print Print ("Hello World") Input Input ("What's Your Name")
100% (1)
Python Cheat Sheet: Print Print ("Hello World") Input Input ("What's Your Name")
16 pages
M5 - Custom Model Building With SQL in BigQuery ML Slides
No ratings yet
M5 - Custom Model Building With SQL in BigQuery ML Slides
32 pages
The Gainz Manual
No ratings yet
The Gainz Manual
28 pages
44th International Mathematical Olympiad.. Short-Listed Problems and Solutions (Tokyo, 2003) (71s) - MSCH
No ratings yet
44th International Mathematical Olympiad.. Short-Listed Problems and Solutions (Tokyo, 2003) (71s) - MSCH
71 pages
Numpy
No ratings yet
Numpy
15 pages
001-2023-0921 DLMDSBDT01 Course Book
No ratings yet
001-2023-0921 DLMDSBDT01 Course Book
124 pages
Research Paper Presentation Pandas Moshiul Arefin
No ratings yet
Research Paper Presentation Pandas Moshiul Arefin
30 pages
Sampling Techniques - Towards Data Science
No ratings yet
Sampling Techniques - Towards Data Science
10 pages
Data Visualization - Getting Started With Plotly
No ratings yet
Data Visualization - Getting Started With Plotly
37 pages
Huffman Code - Brilliant Math & Science Wiki
No ratings yet
Huffman Code - Brilliant Math & Science Wiki
18 pages
Multiple Linear Regression
100% (1)
Multiple Linear Regression
14 pages
Probability and Statistics Notes - Akshansh
0% (2)
Probability and Statistics Notes - Akshansh
101 pages
Statistical Learning Theory
No ratings yet
Statistical Learning Theory
4 pages
Apache Griffin: Data Quality Solution For Both Streaming and Batch
No ratings yet
Apache Griffin: Data Quality Solution For Both Streaming and Batch
32 pages
Migrating Big Data Analytics
No ratings yet
Migrating Big Data Analytics
16 pages
Linear Algebra and Matrices: Methods For Dummies 21 October, 2009 Elvina Chu & Flavia Mancini
No ratings yet
Linear Algebra and Matrices: Methods For Dummies 21 October, 2009 Elvina Chu & Flavia Mancini
33 pages
Data Science Links
No ratings yet
Data Science Links
1 page
Machine Learning Super Cheatsheet (Prof. Pedram Jahangiry)
No ratings yet
Machine Learning Super Cheatsheet (Prof. Pedram Jahangiry)
2 pages
Tamil Rosary
No ratings yet
Tamil Rosary
12 pages
MACHINE LEARNING ALGORITHM Unit-II
No ratings yet
MACHINE LEARNING ALGORITHM Unit-II
115 pages
Python For Non-Programmers - 1-1
No ratings yet
Python For Non-Programmers - 1-1
19 pages
New Learning of Python by Practical Innovation and Technology
From Everand
New Learning of Python by Practical Innovation and Technology
Sudhir Pathania
No ratings yet
Effective Amazon Machine Learning
From Everand
Effective Amazon Machine Learning
Alexis Perrier
No ratings yet
Learn R By Coding
From Everand
Learn R By Coding
Thomas Kurnicki
No ratings yet
Introduction to Python Programming: Learn Coding with Hands-On Projects for Beginners
From Everand
Introduction to Python Programming: Learn Coding with Hands-On Projects for Beginners
Kiet Huynh
No ratings yet
Offline First Web Development: Design and build robust offline-first apps for exceptional user experience even when an internet connection is absent
From Everand
Offline First Web Development: Design and build robust offline-first apps for exceptional user experience even when an internet connection is absent
Daniel Sauble
No ratings yet
The Datadog Handbook: A Guide to Monitoring, Metrics, and Tracing
From Everand
The Datadog Handbook: A Guide to Monitoring, Metrics, and Tracing
Robert Johnson
No ratings yet
An Adaptive Simulated Annealing Algorithm PDF
No ratings yet
An Adaptive Simulated Annealing Algorithm PDF
9 pages
1D Diffusion PDE
No ratings yet
1D Diffusion PDE
23 pages
Pressure Buildups Vs Static Gradient Survey Flowing Gradient Survey
100% (1)
Pressure Buildups Vs Static Gradient Survey Flowing Gradient Survey
1 page
Zolotukhin Reservoir Engineering 002
86% (14)
Zolotukhin Reservoir Engineering 002
217 pages
Design and Fabrication of Fluidized-Bed Reactor
No ratings yet
Design and Fabrication of Fluidized-Bed Reactor
12 pages
Huawei: Quick Start Guide Guide de Démarrage Rapide Schnellstartanleitung Guida Di Avvio Rapido Guía de Inicio Rápido
No ratings yet
Huawei: Quick Start Guide Guide de Démarrage Rapide Schnellstartanleitung Guida Di Avvio Rapido Guía de Inicio Rápido
170 pages
Listado de Prestadores RN 38-2019
No ratings yet
Listado de Prestadores RN 38-2019
13 pages
Nu 540 Es Om Manual
No ratings yet
Nu 540 Es Om Manual
65 pages
MIS602 - Assessment 3 - 20240603
No ratings yet
MIS602 - Assessment 3 - 20240603
5 pages
Advanced Web Programming
No ratings yet
Advanced Web Programming
266 pages
Ar Pp2 Sample Conract Documents Reeport
No ratings yet
Ar Pp2 Sample Conract Documents Reeport
26 pages
Thebook Polyvance Plastic Weld
No ratings yet
Thebook Polyvance Plastic Weld
16 pages
Proctoring Chart 31-01-2022
No ratings yet
Proctoring Chart 31-01-2022
5 pages
Tests Based On Asymptotic Properties
No ratings yet
Tests Based On Asymptotic Properties
6 pages
Strategic Materials and Computational Design Ceramic Engineering and Science Proceedings Volume 31 Yanchun Zhou Instant Download
No ratings yet
Strategic Materials and Computational Design Ceramic Engineering and Science Proceedings Volume 31 Yanchun Zhou Instant Download
46 pages
Sterilization of Water Using Bleaching Powder: A Chemistry Investigatory Project
No ratings yet
Sterilization of Water Using Bleaching Powder: A Chemistry Investigatory Project
14 pages
Miniature Rotor Craft
No ratings yet
Miniature Rotor Craft
16 pages
Computer Graphics - Saurabh Kumar (01714402009) Bca 3 Year
100% (1)
Computer Graphics - Saurabh Kumar (01714402009) Bca 3 Year
35 pages
Translate - Google Search PDF
No ratings yet
Translate - Google Search PDF
1 page
Chordette Gem With Apt-X
No ratings yet
Chordette Gem With Apt-X
2 pages
(HK) L&S Curriculum Guide Eng
No ratings yet
(HK) L&S Curriculum Guide Eng
215 pages
Autonomous Maintenance Pillar in World Class Manufacturing WCM
No ratings yet
Autonomous Maintenance Pillar in World Class Manufacturing WCM
13 pages
20-Loc - de Falta em Tempo Real para Redes de Transmissão 2-2012
No ratings yet
20-Loc - de Falta em Tempo Real para Redes de Transmissão 2-2012
8 pages
Easy System For Thesis in Information Technology
100% (3)
Easy System For Thesis in Information Technology
7 pages
Generate Yearly Report - 2020.10 Exercise Hints
No ratings yet
Generate Yearly Report - 2020.10 Exercise Hints
8 pages
GVC Diagnostic - Mapping - EBRD Region
No ratings yet
GVC Diagnostic - Mapping - EBRD Region
6 pages
Vector - Matrix Calculus
No ratings yet
Vector - Matrix Calculus
10 pages
Ggu Mba Borchure 2022 1659962789705
No ratings yet
Ggu Mba Borchure 2022 1659962789705
24 pages
Auditing Vouching
No ratings yet
Auditing Vouching
6 pages
The Manufacture, Storage and Import of Hazardous Chemicals Rules, 1989
0% (1)
The Manufacture, Storage and Import of Hazardous Chemicals Rules, 1989
58 pages