2.4 Ensemble methods: lecture notes
J.A. Matos
Contents
1 Motivation
   1.1 Setup
   1.2 Ensemble models
   1.3 Combining homogeneous classifiers
3 Random Trees
   3.1 Random Forests
   3.2 Extremely Randomized Trees
   3.3 Feature importance evaluation
   3.4 Implementation
4 Boosting methods
   4.1 Idea
   4.2 AdaBoost
   4.3 Gradient boosting
   4.4 Regularization
   4.5 Implementation
   4.6 Perturbation of the test examples
1 Motivation
1.1 Setup
Basic idea
The main idea behind any multiple (predictive) model is based on the observation that different
learning algorithms explore different regions of the hypothesis space and therefore tend to make different errors on the same data.
Definition
In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain
better predictive performance than could be obtained from any of the constituent learning algo-
rithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a
machine learning ensemble consists of only a concrete finite set of alternative models, but typically
allows for much more flexible structure to exist among those alternatives.
From https://en.wikipedia.org/wiki/Ensemble_learning
Error correlation
Given a set of $n$ classifiers, let $\hat{f}_i(x)$ denote the prediction of classifier $i$, for $i = 1, \dots, n$, and let $f(x)$ denote the true class of record $x$.
We define the error correlation between classifiers $i$ and $j$ as
$$\varphi_{ij} = p\!\left(\hat{f}_i(x) = \hat{f}_j(x) \,\middle|\, \hat{f}_i(x) \neq f(x) \vee \hat{f}_j(x) \neq f(x)\right).$$
This is the probability that both classifiers commit the same error, given that at least one of them fails.
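As a rough illustration, $\varphi_{ij}$ can be estimated empirically from the predictions of two fitted classifiers on a validation set. The sketch below is only a minimal example; the function and variable names (and the classifiers clf_a, clf_b) are illustrative, not part of the notes.

```python
import numpy as np

def error_correlation(pred_i, pred_j, y_true):
    """Estimate phi_ij = P(f_i == f_j | f_i != y or f_j != y) on a sample."""
    pred_i, pred_j, y_true = map(np.asarray, (pred_i, pred_j, y_true))
    # records on which at least one of the two classifiers fails
    at_least_one_wrong = (pred_i != y_true) | (pred_j != y_true)
    if not at_least_one_wrong.any():
        return 0.0  # neither classifier ever fails on this sample
    # among those records, the fraction where both classifiers agree
    # (if they agree and one is wrong, they committed the same error)
    return float(np.mean(pred_i[at_least_one_wrong] == pred_j[at_least_one_wrong]))

# hypothetical usage with two already fitted classifiers clf_a and clf_b:
# phi = error_correlation(clf_a.predict(X_val), clf_b.predict(X_val), y_val)
```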
Purpose
• Combine models that produce uncorrelated errors or, preferably, negatively correlated errors;
• Each model should perform better than random guessing.
• In averaging methods, the driving principle is to build several estimators independently and
then to average their predictions. On average, the combined estimator is usually better than
any single base estimator because its variance is reduced.
• By contrast, in boosting methods, base estimators are built sequentially and one tries to reduce
the bias of the combined estimator. The motivation is to combine several weak models to
produce a powerful ensemble.
Classification vs regression
Here we will discuss mainly classification models, but the same ideas apply equally well to regression models.
To understand how the relation works, recall the first method that we studied (k-nearest neighbours) and how regression is implemented there compared with classification. E.g., the base models can be combined by averaging their outputs (for regression) or by voting (for classification).
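To make this concrete, here is a minimal sketch (names and array shapes are only illustrative) of combining the predictions of several already fitted models by voting or averaging:

```python
import numpy as np

def majority_vote(predictions):
    """predictions: array of shape (n_models, n_samples) with integer-encoded class labels."""
    predictions = np.asarray(predictions)
    # for each sample (column), pick the most frequent label among the models
    return np.array([np.bincount(column).argmax() for column in predictions.T])

def average_outputs(predictions):
    """predictions: array of shape (n_models, n_samples) with real-valued outputs."""
    # simple average across the models, one value per sample
    return np.mean(np.asarray(predictions), axis=0)
```

scikit-learn wraps essentially this logic in the VotingClassifier and VotingRegressor meta-estimators mentioned at the end of these notes.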
Optimization methods
It should be noted that, by construction, all the methods presented here are optimization methods.
Note
The ensemble methods are optimization methods even if the underlying methods are not.
E.g., consider an ensemble of nearest-neighbour methods. The ensemble is an optimization
method although the nearest-neighbour models themselves are not.
• One of the ways to ensure that we generate different models is by manipulating the training dataset (see the bootstrap sketch after this list).
• The learning algorithm is run several times using different distributions of the training dataset.
This works well if the algorithm is unstable, i.e. an algorithm where a small change in the training
dataset can induce a large change in the output representation (as we saw with Decision
Trees, where the tree structure can change a lot with small changes in the input), though not necessarily in the
output prediction.
• Injecting Randomness;
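As a small illustration of manipulating the training dataset, the sketch below draws bootstrap samples (sampling with replacement), so that each run of the learning algorithm sees a slightly different distribution of the data; the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sample(X, y):
    """Draw a bootstrap sample: same size as the data, indices drawn with replacement."""
    n = len(X)
    idx = rng.integers(0, n, size=n)
    return X[idx], y[idx]  # X and y are assumed to be NumPy arrays

# each call yields a different training set; an unstable learner
# (e.g. a decision tree) trained on each sample can therefore differ a lot
```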
Definition
Bootstrap aggregating, also called bagging (from bootstrap aggregating), is a machine learning ensemble meta-algorithm
designed to improve the stability and accuracy of machine learning algorithms used in statistical
classification and regression. It also reduces variance and helps to avoid overfitting. Although it
is usually applied to decision tree methods, it can be used with any type of method. Bagging is a
special case of the model averaging approach.
From https://en.wikipedia.org/wiki/Bootstrap_aggregating
Bagging
2.2 Implementation
Python Implementation
This can be used for several classes of estimators that we have already used, see:
• https://scikit-learn.org/stable/modules/ensemble.html#bagging-meta-estimator
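A minimal sketch of the bagging meta-estimator described on that page, wrapping a k-nearest-neighbours base estimator (the parameter values are only illustrative; scikit-learn >= 1.2 is assumed, older versions use base_estimator instead of estimator):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 20 k-NN models, each trained on a random 50% subset of the training samples
bagging = BaggingClassifier(
    estimator=KNeighborsClassifier(),
    n_estimators=20,
    max_samples=0.5,
    random_state=0,
)
bagging.fit(X_train, y_train)
print(bagging.score(X_test, y_test))
```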
Advantages:
• Many weak learners aggregated together typically outperform a single learner over the entire set, and
overfit less;
• Reduces variance for high-variance, low-bias weak learners, which can improve statistical
efficiency;
• Can be performed in parallel, as each separate bootstrap sample can be processed on its own before
combination.
Disadvantages:
• For a weak learner with high bias, bagging will also carry that high bias into its aggregate;
3 Random Trees
3.1 Random Forests
Random forests
One of the main methods that applies bagging is Random Forests.
This extension combines bagging with:
• random selection of features, in order to construct a collection of decision trees with controlled
variance.
https://en.wikipedia.org/wiki/Random_forest
Example
The Random Forest model uses bagging with decision trees, which are models with high variance.
It also makes a random selection of features to grow each tree (injecting randomness). Several such random
trees make a Random Forest.
3.4 Implementation
Python
In random forests (see RandomForestClassifier and RandomForestRegressor classes), each
tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from
the training set.
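A minimal usage sketch of these classes (the data set and parameter values are only illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 100 trees, each grown on a bootstrap sample, with a random subset of
# the features considered at every split
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```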
4 Boosting methods
4.1 Idea
Boosting
4.2 AdaBoost
Example: AdaBoost
The core principle of AdaBoost is to fit a sequence of weak learners (i.e., models that are only
slightly better than random guessing, such as small decision trees) on repeatedly modified versions
of the data. The predictions from all of them are then combined through a weighted majority vote
(or sum) to produce the final prediction. The data modifications at each so-called boosting iteration
consist of applying weights $w_1, w_2, \dots, w_N$ to each of the training samples. Initially, those weights are all set
to $w_i = 1/N$, so that the first step simply trains a weak learner on the original data. For each successive
iteration, the sample weights are individually modified and the learning algorithm is reapplied to
the reweighted data. At a given step, those training examples that were incorrectly predicted by the
boosted model induced at the previous step have their weights increased, whereas the weights are
decreased for those that were predicted correctly. As iterations proceed, examples that are difficult
to predict receive ever-increasing influence.
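A minimal sketch using scikit-learn's AdaBoostClassifier with a depth-1 decision tree (a "decision stump") as the weak learner; the parameter values are only illustrative, and scikit-learn >= 1.2 is assumed for the estimator keyword:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 200 boosting iterations; at each one the sample weights are updated
# and a new stump is fitted to the reweighted data
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=200,
    learning_rate=0.5,
    random_state=0,
)
ada.fit(X_train, y_train)
print(ada.score(X_test, y_test))
```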
4.3 Gradient boosting
General idea
Let us consider a gradient boosting algorithm with $M$ stages. At each stage $m$ ($1 \le m \le M$)
of gradient boosting, suppose some imperfect model $F_m$ (for low $m$, this model may simply return
$\hat{y}_i = \bar{y}$, where the right-hand side is the mean of $y$). In order to improve $F_m$, our algorithm should
add some new estimator, $h_m(x)$. Thus,
$$F_{m+1}(x) = F_m(x) + h_m(x) = y,$$
or, equivalently,
$$h_m(x) = y - F_m(x).$$
Therefore, gradient boosting will fit $h_m$ to the residual $y - F_m(x)$. As in other boosting variants,
each $F_{m+1}$ attempts to correct the errors of its predecessor $F_m$. A generalization of this idea to
loss functions other than squared error, and to classification and ranking problems, follows from
the observation that the residuals $h_m(x)$ for a given model are proportional to the negative
gradients of the mean squared error (MSE) loss function (with respect to $F(x)$):
$$L_{\mathrm{MSE}} = \frac{1}{n}\bigl(y - F(x)\bigr)^2,$$
$$-\frac{\partial L_{\mathrm{MSE}}}{\partial F} = \frac{2}{n}\bigl(y - F(x)\bigr) = \frac{2}{n}\,h_m(x).$$
So, gradient boosting could be specialized to a gradient descent algorithm, and generalizing it entails
"plugging in" a different loss and its gradient.
We are usually given a training set $\{(x_1, y_1), \dots, (x_n, y_n)\}$ of known sample values of $x$ and
corresponding values of $y$. In accordance with the empirical risk minimization principle, the
method tries to find an approximation $\hat{F}(x)$ that minimizes the average value of the loss function
on the training set, i.e., minimizes the empirical risk. It does so by starting with a model
consisting of a constant function $F_0(x)$, and incrementally expands it in a greedy fashion:
$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma),$$
$$F_m(x) = F_{m-1}(x) + \arg\min_{h_m \in H} \left[ \sum_{i=1}^{n} L\bigl(y_i, F_{m-1}(x_i) + h_m(x_i)\bigr) \right],$$
where $h_m \in H$ is a base learner function.
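To make the derivation concrete, here is a from-scratch sketch of gradient boosting for regression with the squared-error loss, where each stage fits a small tree to the current residuals (all names and parameter values are only illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_stages=100, learning_rate=0.1):
    # F_0(x): for the squared-error loss the minimizing constant is the mean of y
    f0 = float(np.mean(y))
    prediction = np.full(len(y), f0)
    trees = []
    for _ in range(n_stages):
        residual = y - prediction      # negative gradient of the MSE loss (up to a factor 2/n)
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residual)          # h_m approximates the residuals
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(X, f0, trees, learning_rate=0.1):
    # the same learning_rate used at fit time must be used here
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)
```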
4.4 Regularization
Regularization
Fitting the training set too closely can lead to degradation of the model's generalization ability.
Several so-called regularization techniques reduce this overfitting effect by constraining the fitting
procedure.
• One natural regularization parameter is the number of gradient boosting iterations $M$ (i.e.
the number of trees in the model when the base learner is a decision tree). Increasing $M$
reduces the error on the training set, but setting it too high may lead to overfitting. An optimal
value of $M$ is often selected by monitoring the prediction error on a separate validation data set.
Besides controlling $M$, several other regularization techniques are used.
• Another regularization parameter is the depth of the trees. The higher this value, the more
likely the model is to overfit the training data. Both parameters are illustrated in the sketch below.
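A sketch of how these two parameters appear in scikit-learn's gradient boosting estimator, with the number of iterations limited by early stopping on an internal validation split (the parameter values are only illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, random_state=0)

# n_estimators is the upper bound on M, max_depth is the depth of each tree;
# training stops early once the score on a 20% validation split has not
# improved for 10 consecutive iterations
gb = GradientBoostingClassifier(
    n_estimators=500,
    max_depth=3,
    validation_fraction=0.2,
    n_iter_no_change=10,
    random_state=0,
)
gb.fit(X, y)
print(gb.n_estimators_)  # number of boosting stages actually used
```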
4.5 Implementation
Implementations
Gradient boosting usually gives very good results and is one of the most used methods in machine learning
problems, together with neural networks and random forests. In Python there are two well-regarded
implementations:
scikit-learn https://scikit-learn.org/stable/modules/ensemble.html#gradient-boosted-trees
Some of those algorithms support missing values and categorical data, removing the need for additional preprocessing such as imputation (see the sketch below).
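For instance, scikit-learn's histogram-based gradient boosting estimators accept missing values encoded as NaN directly; a minimal sketch (the toy data are illustrative, and scikit-learn >= 1.0 is assumed):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

# toy data with a missing value: no imputation step is needed
X = np.array([[0.0, 1.0], [1.0, np.nan], [2.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 1, 1])

clf = HistGradientBoostingClassifier(min_samples_leaf=1).fit(X, y)
print(clf.predict(X))
```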
Individual decision trees intrinsically perform feature selection by selecting appropriate split
points. This information can be used to measure the importance of each feature; the basic idea
is: the more often a feature is used in the split points of a tree, the more important that feature
is. This notion of importance can be extended to decision tree ensembles by simply averaging the
impurity-based feature importance of each tree (a short sketch follows the links below).
• https://scikit-learn.org/stable/modules/ensemble.html#interpretation-with-feature-importance
• https://scikit-learn.org/stable/modules/ensemble.html#random-forest-feature-importance
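A minimal sketch of reading the impurity-based importances from a fitted forest, as described on the pages above (the data set is only illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# impurity-based importance of each feature, averaged over the trees
for name, importance in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```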
For combining heterogeneous models, scikit-learn also provides voting-based meta-estimators (a usage sketch follows):
• Classification: VotingClassifier
• Regression: VotingRegressor
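A minimal usage sketch of VotingClassifier, combining heterogeneous models by majority ("hard") vote; the choice of base models is only illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# hard voting (the default): each model gets one vote, the majority class wins
voting = VotingClassifier(estimators=[
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(voting, X, y, cv=5).mean())
```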