
Pattern Recognition

(Pattern Classification)
AdaBoost (Adaptive Boosting)
Hypothesis set and Algorithm

Second Edition
Contents
1. Boosting
2. AdaBoost
3. AdaBoost and margin maximization
4. Multiclass boosting algorithms
5. Appendix: Decision Tree

This chapter is mostly based on:


Foundations of Machine Learning, 2nd Ed., by Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar, MIT Press, 2018

1- Boosting
Boosting
• In 1988, Kearns and Valiant posed the theoretical question of whether a "weak (non-complex)" learner that performs just slightly better than random guessing can be "boosted" into an arbitrarily accurate "strong (complex)" learning algorithm
• Schapire came up with the first provable polynomial-time boosting algorithm in 1989
• Freund developed a much more efficient boosting learner in 1990. The first experiments with these early boosting algorithms were carried out by Drucker, Schapire and Simard on an OCR task.
• The AdaBoost algorithm, introduced in 1995 by Freund and Schapire, is a very practical machine learning tool.
Recall from Chapter 1
Other Model selection methods
• Boosting (Ensemble methods)
• Minimum description length (MDL)
• Akaike’s information criterion (AIC)*
• Bayesian information criterion (BIC)
• Focused information criterion (FIC)
•…
Note: MDL "gives a selection criterion formally identical to BIC approach" for large number of
SAMPLEs
* Akaike, H. (1974), "A new look at the statistical model identification", IEEE Transactions on Automatic Control, 19 (6): 716–723, Bibcode: 1974ITAC...19..716A, doi:10.1109/TAC.1974.1100705, MR 0423716.

From Chapter 1: Model Selection Problem
• A key problem in the design of learning algorithms is the choice of the hypothesis set H. This is known as the model selection problem
• How should H be chosen?
• A rich or complex enough H could contain the ideal Bayes classifier
• On the other hand, learning with such a complex family becomes a very difficult task
• The choice of H is subject to a trade-off that can be analyzed in terms of:
• estimation error (empirical error)
• approximation error (hypothesis set complexity)
• We focus on the case of binary classification, but much of what is discussed can be straightforwardly extended to different tasks and loss functions
Boosting is a Model selection strategy
• Generalization is done by boosting (amplifying) the accuracy of a weak hypothesis set to address the complexity (bias) trade-off issue:
• The error of a learner can be decomposed into a sum of approximation error and estimation error: a smaller approximation error (bias) generally means a larger estimation error
• A learner is thus faced with the problem of picking a good trade-off between these two considerations.

$$\underbrace{R(h) - R^*}_{\text{excess error: how good } h \text{ is}} \;=\; \underbrace{\big(R(h) - R(h^*)\big)}_{\text{estimation error: how good } h \text{ is in } H} \;+\; \underbrace{\big(R(h^*) - R^*\big)}_{\text{approximation error (bias): how good } H \text{ is}}$$

where $h^*$ is the best hypothesis in $H$ and $R^*$ is the Bayes error.
Boosting the complexity of H
• The boosting paradigm allows the learner to have smooth control over this trade-off
• Learning starts with a basic, simple (weak) hypothesis set (which might have a large approximation error), and as it progresses, the set grows richer (more complex)
• This is just the opposite of regularization-based model selection, which begins with a very complex H and whose learning process tries to reduce the complexity up to the best trade-off

Boosting the computational complexity
• A boosting algorithm amplifies the accuracy of weak learners (a simple hypothesis set)
• Intuitively, one can think of a weak learner as an algorithm that uses a simple learner to output a hypothesis that comes from an easy-to-learn hypothesis set H and performs just slightly better than a random guess
• When a weak learner can be implemented efficiently, boosting provides a tool for aggregating such weak hypotheses to gradually approximate good predictors for larger, and harder to learn, concepts.

Ensemble methods
• Ensemble methods are general techniques in machine learning for combining several hypotheses (predictors) to create a more accurate one.
• For a non-trivial learning task, it is often difficult to directly devise an accurate algorithm satisfying the strong PAC-learning requirements.
• A large ensemble of diverse weak classifiers can have exceptional performance.
• Boosting stemmed from the theoretical question of whether an efficient weak learner can be "boosted" into an efficient strong learner.

From Chapter 1: Definition 2.3: PAC-learning definition
• A concept class $C$ is said to be PAC-learnable if there exists an algorithm $A$ and a polynomial function $\mathrm{poly}(\cdot,\cdot)$ such that for any $\epsilon > 0$ and $\delta > 0$, for all distributions $D$ on $X$ and for any target concept $c \in C$, the following holds for any sample size $m \ge \mathrm{poly}(1/\epsilon, 1/\delta)$:

$$\mathbb{P}_{S \sim D^m}\big[ R(h_S) \le \epsilon \big] \ge 1 - \delta \qquad (2.4)$$

($1-\delta$: confidence; $\epsilon$: error; $1-\epsilon$: accuracy)

• When such an algorithm $A$ exists, it is called a PAC-learning algorithm for $C$
• (2.4): $C$ is PAC-learnable if $h_S$ is approximately correct (error at most $\epsilon$) with high probability (at least $1-\delta$)

Definition 7.1 (Weak learning)
• The following gives a formal definition of weak learners.
• Let $n$ be a number such that the computational cost of representing any element $x \in X$ is at most $O(n)$, and denote by $\mathrm{size}(c)$ the maximal cost of the computational representation of $c \in C$.
• Definition 7.1 (Weak learning): A concept class $C$ is said to be weakly PAC-learnable if there exists an algorithm $A$, $\gamma > 0$, and a polynomial function $\mathrm{poly}(\cdot,\cdot,\cdot)$ such that for any $\delta > 0$, for all distributions $\mathcal{D}$ on $X$ and for any target concept $c \in C$, the following holds for any sample size $m \ge \mathrm{poly}(1/\delta, n, \mathrm{size}(c))$:

$$\mathbb{P}_{S \sim \mathcal{D}^m}\left[ R(h_S) \le \frac{1}{2} - \gamma \right] \ge 1 - \delta \qquad (7.1)$$

• where $h_S$ is the hypothesis returned by algorithm $A$ when trained on sample $S$.


Base classifier
• When such an algorithm exists, it is called a weak learning algorithm for $C$, or a weak learner.
• The hypotheses returned by a weak learning algorithm are called base classifiers.

Boosting algorithms
• Key idea behind boosting algorithms is to use a weak learning
algorithm to build a strong learner, that is, an accurate PAC-learning
algorithm.
• Boosting techniques use an ensemble method: they combine
different base classifiers returned by a weak learner to create a more
accurate predictor.

Contents
1. Boosting
2. AdaBoost
3. AdaBoost and margin maximization
4. Multiclass boosting algorithms
5. Appendix: Decision Tree

2- AdaBoost
AdaBoost algorithm
• $H$ is the hypothesis set out of which the base classifiers are selected (the base classifier set).
• Pseudocode: each base classifier $h_t$ is a function mapping from $X$ to $\{-1, +1\}$.
• The algorithm takes as input a labeled sample $S = ((x_1, y_1), \ldots, (x_m, y_m))$, with $y_i \in \{-1, +1\}$ for all $i \in [m]$.
• $\alpha_t$ is the mixture (ensemble) weight of $h_t$; $Z_t$ is a normalization factor that ensures the weights $\mathcal{D}_{t+1}(i)$ sum to one.
• Each base classifier's error should be less than random: $\epsilon_t < \frac{1}{2}$.
Ensemble of base classifiers
• At each iteration $t$ of the loop (lines 3-8 of the pseudocode), a new base classifier $h_t$ is selected that minimizes the error on the training sample weighted by the distribution $\mathcal{D}_t$:

$$h_t \in \operatorname*{argmin}_{h \in H} \; \mathbb{P}_{i \sim \mathcal{D}_t}\big[ h(x_i) \neq y_i \big] = \operatorname*{argmin}_{h \in H} \; \sum_{i=1}^{m} \mathcal{D}_t(i)\, 1_{h(x_i) \neq y_i}$$

• $\epsilon_t = \mathbb{P}_{i \sim \mathcal{D}_t}\big[ h_t(x_i) \neq y_i \big]$ (error of base classifier $h_t$ at round $t$),
• then $\alpha_t = \frac{1}{2}\log\frac{1-\epsilon_t}{\epsilon_t}$, where $\frac{1-\epsilon_t}{\epsilon_t}$ is the ratio of accuracy to error.

Higher weight for misclassified examples
• The distribution $\mathcal{D}_{t+1}$ substantially increases the weight on $x_i$ if it is incorrectly classified by $h_t$ ($y_i h_t(x_i) < 0$), and, on the contrary, decreases it if $x_i$ is correctly classified.
• This has the effect of focusing more, at the next round of boosting, on the examples incorrectly labeled by $h_t$, and less on those correctly classified:

$$\mathcal{D}_{t+1}(i) \leftarrow \frac{\mathcal{D}_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$

• where $\alpha_t$ is the mixture weight and $Z_t = \sum_{i=1}^{m} \mathcal{D}_t(i)\, e^{-\alpha_t y_i h_t(x_i)}$ is the normalization factor.
• A simple mathematical derivation of the algorithm: Rojas, R. (2009). AdaBoost and the Super Bowl of Classifiers: A Tutorial Introduction to Adaptive Boosting. Freie Universität Berlin, Tech. Report.

$$Z_t = 2\big[\epsilon_t(1-\epsilon_t)\big]^{1/2}, \qquad \alpha_t = \frac{1}{2}\log\frac{1-\epsilon_t}{\epsilon_t}, \qquad \mathcal{D}_{t+1}(i) \leftarrow \frac{\mathcal{D}_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$

[Plots of $\alpha_t$ and $Z_t$ as functions of $\epsilon_t$: for $\epsilon_t < 0.5$, $\alpha_t > 0$, and $\alpha_t$ grows as $\epsilon_t$ decreases]
$h$: a linear mixture (combination) of base classifiers
• After $T$ rounds of boosting, the hypothesis (classifier) returned by AdaBoost is based on the sign of the function $h = \sum_{t=1}^{T} \alpha_t h_t$, which is a non-negative linear combination of the base classifiers $h_t$.
• The weight $\alpha_t$ assigned to $h_t$ in $h$ is a logarithmic function of the ratio $\frac{1-\epsilon_t}{\epsilon_t}$.
• A more accurate $h_t$ has a higher $\frac{1-\epsilon_t}{\epsilon_t}$ and a higher $\alpha_t$.
• Thus, a more accurate $h_t$ is assigned a larger weight in $h$.
Linear mixture of base classifiers
• $h$ is a linear function in the space of base classifier outputs:
• $h(x) = \boldsymbol{\alpha} \cdot \mathbf{h}(x)$, with $\mathbf{h}(x) = (h_1(x), \ldots, h_T(x))^{\top}$ and $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_T)^{\top}$.
• The vector of base hypothesis values $\mathbf{h}(x)$ can be viewed as a feature vector associated to $x$, similar to $\Phi(x)$ in kernel SVM, and $\boldsymbol{\alpha}$ is the weight vector that was denoted by $w$.
• AdaBoost: $h(x) = \boldsymbol{\alpha} \cdot \mathbf{h}(x)$; SVM: $h(x) = w \cdot \Phi(x) + b$.
• SVM: a linear mixture of features.

Base classifier in ensemble methods
• A combination of several diverse suboptimal learners may lead to a better overall learner.
• Hypotheses of an unstable hypothesis set are diverse
• Unstable means: small changes in the training sample can cause substantial changes in the hypothesis trained on the sample
• Unstable hypotheses are versatile models which react to small changes in the training sample
• Unstable classifiers play a major role in ensemble methods
• Examples of unstable hypothesis sets:
• decision tree classifiers
• some neural networks
• SVM is a stable learning machine

H in AdaBoost
• The family of base classifiers H typically used with AdaBoost in practice is that of decision trees of depth one, known as stumps (a root and 2 leaves).
• Boosting stumps are threshold functions associated to a single feature
• If the data is in $\mathbb{R}^N$ ($N$ features), we can associate a stump to each of the $N$ components
• To determine the stump with the minimal weighted error at each round of boosting, the best feature and its best threshold must be computed.
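To make the stump search concrete, the following is a hedged brute-force sketch of how the best (feature, threshold, polarity) triple could be found at one round; the exhaustive scan over all observed feature values and the function name are illustrative choices, not the implementation used in the book.

```python
import numpy as np

def best_stump(X, y, D):
    """Return (feature index, threshold, polarity) minimizing the D-weighted error.

    X: (m, n) data matrix, y: (m,) labels in {-1, +1}, D: (m,) distribution.
    Brute-force sketch: every observed value of every feature is tried as a threshold.
    """
    m, n = X.shape
    best = (None, None, None, np.inf)
    for j in range(n):                                   # scan features
        for theta in np.unique(X[:, j]):                 # candidate thresholds
            for polarity in (+1, -1):                    # which side is labeled +1
                pred = polarity * np.where(X[:, j] > theta, 1, -1)
                err = np.sum(D[pred != y])               # weighted error of this stump
                if err < best[3]:
                    best = (j, theta, polarity, err)
    return best[:3]
```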

Example 1 – Ensemble of linear classifiers

Example 1 – Ensemble of linear classifiers

[Figure: base classifiers $h_1(x)$, $h_2(x)$ and the ensemble $h = \boldsymbol{\alpha} \cdot \mathbf{h}$; the training error curve shows values around 0.35, 0.10, and 0.05, with snapshots at $t = 5$ and $t = 40$ rounds]
Example 2 – Ensemble of stumps
• Best thresholds (decision boundaries) at each boosting round: $h_1$, $h_2$, $h_3$.
• Visualization of the final classifier $h$, constructed as a linear combination of the base classifiers:

$$h = \sum_{t=1}^{3} \alpha_t h_t$$

• The weights $\mathcal{D}_t(i)$ are updated at each round.
Example 3 (10 points with initial weight 0.1 each; 3 misclassified at round 1, so $\epsilon_1 = 0.3$ and $\alpha_1 = \frac{1}{2}\log\frac{1-\epsilon_1}{\epsilon_1} = 0.424$)

$$Z_t = 2\big[\epsilon_t(1-\epsilon_t)\big]^{0.5}$$

• correctly classified points: $\mathcal{D}_2(i) = \dfrac{0.1\, e^{-0.424}}{2(0.3 \times 0.7)^{0.5}} = 0.071$
• incorrectly classified points: $\mathcal{D}_2(i) = \dfrac{0.1\, e^{+0.424}}{2(0.3 \times 0.7)^{0.5}} = 0.167$
• $\epsilon_2 = 3 \times 0.071 = 0.213$, $\alpha_2 = 0.653$
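These numbers can be verified with a few lines of arithmetic. The snippet below reproduces them under the assumption, consistent with the example, that the 10 points start with equal weight 0.1, that 3 are misclassified at round 1, and that the 3 points misclassified at round 2 were correctly classified at round 1.

```python
import numpy as np

eps1 = 0.3                                   # 3 of 10 equally weighted points misclassified
alpha1 = 0.5 * np.log((1 - eps1) / eps1)     # ~= 0.424
Z1 = 2 * np.sqrt(eps1 * (1 - eps1))          # normalization factor

D2_correct = 0.1 * np.exp(-alpha1) / Z1      # ~= 0.071 (weight of a correctly classified point)
D2_incorrect = 0.1 * np.exp(+alpha1) / Z1    # ~= 0.167 (weight of a misclassified point)
eps2 = 3 * D2_correct                        # ~= 0.213 (3 previously correct points now misclassified)
alpha2 = 0.5 * np.log((1 - eps2) / eps2)     # ~= 0.653
print(alpha1, D2_correct, D2_incorrect, eps2, alpha2)
```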

Example 3 (continued)

$$\boldsymbol{\alpha} = [\alpha_1\ \ \alpha_2\ \ \alpha_3]^{\top} = [0.424\ \ 0.653\ \ 0.908]^{\top}, \qquad \mathbf{h} = [h_1\ \ h_2\ \ h_3]^{\top}$$

• Range of $h(x) = \boldsymbol{\alpha} \cdot \mathbf{h}(x)$: $|h(x)| \le 0.424 + 0.653 + 0.908$

Normalized version of $h$
• We will denote by $\bar{h}$ the normalized version of the function returned by AdaBoost:

$$\bar{h}(x) = \frac{\sum_{t=1}^{T} \alpha_t h_t(x)}{\sum_{t=1}^{T} \alpha_t} = \frac{\boldsymbol{\alpha} \cdot \mathbf{h}(x)}{\|\boldsymbol{\alpha}\|_1} = \bar{\boldsymbol{\alpha}} \cdot \mathbf{h}(x), \qquad \bar{\boldsymbol{\alpha}} = \frac{\boldsymbol{\alpha}}{\|\boldsymbol{\alpha}\|_1}$$

Recall from Chapter 1: Definition 5.1 - margins
• The definition of the SVM solution is based on the notion of margin. If we omit outliers, the training data is correctly separated by the hyperplane $h(\Phi(x)) = w \cdot \Phi(x) + b = 0$ with a margin $\rho$:
• geometric margin at a point $x_i$ = distance from $x_i$ to the hyperplane $w \cdot \Phi(x) + b = 0$:

$$\rho_h(x_i) = \frac{y_i\,\big(w \cdot \Phi(x_i) + b\big)}{\|w\|_2}$$

• geometric margin of the classifier: $\rho_h = \min_i \rho_h(x_i)$; the width of the separating band is $2\rho_h$
• $\|w\|_2 = \big(w_1^2 + w_2^2 + \cdots + w_N^2\big)^{0.5}$

Recall from Chapter 1: Soft margin - Hard margin
• For the soft-margin SVM, a vector $x_i$ with slack $\xi_i > 0$ can be viewed as an outlier
• $x_i$ with $0 < \xi_i \le 1$ is correctly classified by the hyperplane but is still considered an outlier, that is, $\xi_i > 0$
• If we omit outliers, the training data is correctly separated by $h$ with a margin that we refer to as the soft margin, as opposed to the hard margin in the separable case

Definition 7.3 (L1-geometric margin of $h$)
• $\rho_h(x)$ = L1-geometric margin at a point $x$ with label $y$, for a linear function $h = \sum_{t=1}^{T} \alpha_t h_t$ with $\boldsymbol{\alpha} \neq 0$, defined as:

$$\rho_h(x) = \frac{y \sum_{t=1}^{T} \alpha_t h_t(x)}{\|\boldsymbol{\alpha}\|_1} = \frac{y\, \boldsymbol{\alpha} \cdot \mathbf{h}(x)}{\|\boldsymbol{\alpha}\|_1} = y\, \bar{\boldsymbol{\alpha}} \cdot \mathbf{h}(x) = y\, \bar{h}(x)$$

(compare with the SVM margin $y\,(w \cdot \Phi(x))/\|w\|_2$)

• $\rho_h$ = L1-geometric margin of $h$: if we omit misclassified examples, the training data is correctly separated by $h$ with a margin that we refer to as the soft margin. The L1-margin of $h$ over a sample is its minimum margin at the points in that sample:

$$\rho_h = \min_{i \in [m]} \rho_h(x_i) = \min_{i \in [m]} \frac{|\boldsymbol{\alpha} \cdot \mathbf{h}(x_i)|}{\|\boldsymbol{\alpha}\|_1} = \min_{i \in [m]} \big|\bar{\boldsymbol{\alpha}} \cdot \mathbf{h}(x_i)\big|$$
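As a small illustration, the L1-geometric margin of an ensemble over a sample could be computed as follows; the array layout (a precomputed matrix of base-classifier outputs) is an assumption made for this sketch.

```python
import numpy as np

def l1_margin(alpha, H, y):
    """L1-geometric margin of the ensemble over a labeled sample.

    alpha: (T,) non-negative mixture weights.
    H:     (m, T) matrix with H[i, t] = h_t(x_i) in {-1, +1}.
    y:     (m,) labels in {-1, +1}.
    """
    scores = H @ alpha                               # alpha . h(x_i) for each example
    margins = y * scores / np.abs(alpha).sum()       # rho_h(x_i) = y_i alpha.h(x_i) / ||alpha||_1
    return margins.min()                             # rho_h = min_i rho_h(x_i)
```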

Analogy with margin in SVM
• Consider $h(x) = \boldsymbol{\alpha} \cdot \mathbf{h}(x)$: the vector of base hypothesis values $\mathbf{h}(x)$ can be viewed as a feature vector associated to $x$, similar to $\Phi(x)$ in kernel SVM, and $\boldsymbol{\alpha}$ is the weight vector that was denoted by $w$.
• For ensemble linear combinations such as those returned by AdaBoost, additionally, the weight vector is non-negative: $\boldsymbol{\alpha} \ge 0$.
• This gives a notion of geometric margin for such ensemble functions which differs from the one introduced for SVMs only by the use of the L1 norm instead of the L2 norm.

Contents
1. Boosting
2. AdaBoost
3. AdaBoost and margin maximization
4. Multiclass boosting algorithms
5. Appendix: Decision Tree

3- AdaBoost and margin maximization
AdaBoost and margin maximization
• The maximum margin for a linearly separable sample is given by

$$\operatorname*{argmax}_{\boldsymbol{\alpha}} \; \min_{i \in [m]} \rho_h(x_i) = \operatorname*{argmax}_{\boldsymbol{\alpha}} \; \min_{i \in [m]} \frac{y_i\, \boldsymbol{\alpha} \cdot \mathbf{h}(x_i)}{\|\boldsymbol{\alpha}\|_1}$$

• By definition, using the confidence margin, the optimization problem can be written as:

$$\max_{\boldsymbol{\alpha},\, \rho} \; \rho \quad \text{subject to: } \; y_i\, \boldsymbol{\alpha} \cdot \mathbf{h}(x_i) \ge \rho \;\; (i \in [m]), \quad \sum_{t=1}^{T} \alpha_t = 1, \quad \alpha_t \ge 0$$

• This is a linear program (LP), that is, a convex optimization problem with a linear objective function and linear constraints. There are several different methods for solving relatively large LPs in practice: the simplex method, interior-point methods, or a variety of special-purpose solutions.
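As an illustration, this LP can be handed to an off-the-shelf solver. The sketch below uses scipy.optimize.linprog with variables $(\alpha_1, \ldots, \alpha_T, \rho)$ under the formulation written above (maximize $\rho$ subject to $y_i\, \boldsymbol{\alpha} \cdot \mathbf{h}(x_i) \ge \rho$, $\sum_t \alpha_t = 1$, $\alpha_t \ge 0$); it is a schematic, not code from the book.

```python
import numpy as np
from scipy.optimize import linprog

def max_margin_weights(H, y):
    """Maximize the L1-margin: decision variables are (alpha_1..alpha_T, rho).

    H: (m, T) matrix of base-classifier outputs, y: (m,) labels in {-1, +1}.
    """
    m, T = H.shape
    c = np.zeros(T + 1)
    c[-1] = -1.0                                            # minimize -rho <=> maximize rho
    # margin constraints: rho - y_i * (H[i] @ alpha) <= 0 for every example i
    A_ub = np.hstack([-(y[:, None] * H), np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.concatenate([np.ones(T), [0.0]])[None, :]     # sum_t alpha_t = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * T + [(None, None)]               # alpha_t >= 0, rho unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:T], res.x[-1]                             # (alpha, achieved margin rho)
```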

Theorem 7.7: Bound on the empirical margin loss (training error)
• Let $\bar{h}$ denote the normalized function returned by AdaBoost after $T$ rounds of boosting and assume for all $t$ that $\epsilon_t \le \frac{1}{2}$, which implies $\alpha_t \ge 0$. Then, for any confidence margin $\rho > 0$, the following holds:

$$\hat{R}_{S,\rho}(\bar{h}) = \frac{1}{m}\sum_{i=1}^{m} 1_{y_i \bar{h}(x_i) \le \rho} \;\le\; 2^T \prod_{t=1}^{T} \sqrt{\epsilon_t^{\,1-\rho}\,(1-\epsilon_t)^{\,1+\rho}}$$

• Note that $y_i \bar{h}(x_i)/\rho = \rho_h(x_i)/\rho$, and that the margin-loss indicator $1_{y_i \bar{h}(x_i) \le \rho}$ is upper bounded by the exponential $e^{-\left(y_i \bar{h}(x_i)/\rho - 1\right)}$, a convex surrogate.

Theorem 7.7: Bound on the empirical loss
• Furthermore, assume that for all $t \in [T]$, $\gamma \le \big(\tfrac{1}{2} - \epsilon_t\big)$.
• $\gamma$ is known as the edge.
• Note: $\epsilon_t < \tfrac{1}{2}$ means $\gamma > 0$.
• For $\rho < \gamma$, the empirical margin loss is upper bounded:

$$\hat{R}_{S,\rho}(\bar{h}) \le \left[(1-2\gamma)^{1-\rho}(1+2\gamma)^{1+\rho}\right]^{T/2}$$

• Note that for $\rho < \gamma$ we have $(1-2\gamma)^{1-\rho}(1+2\gamma)^{1+\rho} < 1$; therefore the empirical margin loss decreases exponentially fast when $T$ increases and becomes zero for sufficiently large $T$.
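A quick numeric check of this exponential decrease, using illustrative values of the edge $\gamma$ and the margin $\rho$ (with $\rho < \gamma$):

```python
import numpy as np

gamma, rho = 0.1, 0.05                    # illustrative edge and confidence margin, rho < gamma
base = (1 - 2 * gamma) ** (1 - rho) * (1 + 2 * gamma) ** (1 + rho)
for T in (10, 50, 100):
    print(T, base ** (T / 2))             # bound shrinks exponentially in T since base < 1
```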

Note on the bound on the empirical loss
• The value of $\gamma$ and the accuracy of the base classifiers do not need to be known to the AdaBoost algorithm.
• The algorithm adapts to the accuracy of the base classifiers and defines a solution based on the $\epsilon_t$'s.
• In practice, the error $\epsilon_t$ may increase as a function of $t$. This is because boosting presses the weak learner to concentrate on instances that are harder and harder to classify, for which even the best base classifier could not achieve an error significantly better than random.

Note on the bound on the empirical loss
• If $\epsilon_t$ becomes close to 0.5 relatively fast as a function of $t$, then the bound of Theorem 7.7 becomes uninformative:

$$\epsilon_t \to 0.5 \;\Rightarrow\; 0 < \gamma \le (0.5 - \epsilon_t) \;\Rightarrow\; \gamma \cong 0 \;\Rightarrow\; \hat{R}_{S,\rho}(\bar{h}) \le \left[(1-2\gamma)^{1-\rho}(1+2\gamma)^{1+\rho}\right]^{T/2} \approx 1$$

VC-dimension-based analysis of AdaBoost
• The family of functions out of which AdaBoost selects its output after $T$ rounds of boosting is

$$\mathcal{F}_T = \left\{ \mathrm{sgn}\left(\sum_{t=1}^{T} \alpha_t h_t\right) : \alpha_t \ge 0,\; h_t \in H,\; t \in [T] \right\}$$

VC-dimension-based analysis of AdaBoost
• The VC-dimension of $\mathcal{F}_T$ can be bounded as follows in terms of the VC-dimension $d$ of the family of base hypotheses $H$:

$$\mathrm{VCdim}(\mathcal{F}_T) \le 2(d+1)(T+1)\log_2\big((T+1)e\big)$$

• The bound suggests that AdaBoost could overfit for large values of $T$, and indeed this can occur.
• However, in many cases, it has been observed empirically that the generalization error of AdaBoost decreases as a function of the number of rounds of boosting $T$, as illustrated in Figure 7.5.

Test error of AdaBoost
• Test error as a function of the number of rounds of boosting

Figure 7.5
An empirical result using AdaBoost with C4.5 decision trees as base learners. In this example, the training error goes to zero after about 5 rounds of boosting, yet the test error continues to decrease for larger values of $T$ (a reduction of bias due to increasing hypothesis set complexity).

Generalization bound, using margin bound and VC-dimension complexity
• Corollary 7.6: Let $H$ be a family of functions taking values in $\{+1, -1\}$ with VC-dimension $d$.
• Fix $\rho > 0$. Then, for any $\delta > 0$, with probability at least $1 - \delta$, the following holds for all $h \in \mathrm{conv}(H)$:

$$R(h) \le \hat{R}_{S,\rho}(h) + \frac{2}{\rho}\sqrt{\frac{2d\log\frac{em}{d}}{m}} + \sqrt{\frac{\log\frac{1}{\delta}}{2m}} \qquad (7.15)$$

• $\mathrm{conv}(H)$ stands for "convex combinations of base hypotheses"
• The bound holds for binary classification
• $\rho$ can be chosen as a larger quantity for which $\hat{R}_{S,\rho}(h)$ still vanishes, while the complexity term becomes more favorable since it decreases as $\rho$ increases
• $\rho$ is a free parameter that is typically determined via cross-validation

Empirical observation - Linearly separable case
• The empirical margin loss becomes zero for sufficiently large $T$
• In some tasks, the generalization error decreases as a function of $T$ even after the error on the training sample is zero.
• This means that the geometric margin continues to increase
• The margin-based analysis (7.15) supports the theoretical explanation for these empirical observations

AdaBoost in practice
• AdaBoost may admit a negative edge $\gamma$, in which case the weak learning condition ($\epsilon_t < \frac{1}{2}$ for all $t$) does not hold.
• AdaBoost may result in a few base classifiers $h_t$ with large total mixture weights ($\alpha_t$)
• This can happen because the algorithm increasingly concentrates on a few examples that are hard to classify and whose weights keep growing. Only a few base classifiers might achieve the best performance for hard examples. These base classifiers, with relatively large total mixture weights, dominate the ensemble and therefore solely dictate the classification decision.
• The performance of the resulting ensemble is typically poor since it almost entirely hinges on that of a few base classifiers.

L1-regularized AdaBoost
Two common remedies:
1. Limiting the number of rounds of boosting $T$, which is also known as early stopping.
2. Controlling the magnitude of the $\alpha_t$'s. This can be done by augmenting the objective function of AdaBoost with a regularization term based on a norm of the vector of mixture weights. It is referred to as L1-regularized AdaBoost:

$$\min_{\boldsymbol{\alpha}} \; \underbrace{\frac{1}{m}\sum_{i=1}^{m} e^{-y_i h(x_i)}}_{\text{convex, differentiable upper bound on the zero-one loss (see Theorem 7.7)}} \; + \; \underbrace{\lambda\,\|\boldsymbol{\alpha}\|_1}_{\text{regularization term}}, \qquad h(x_i) = \sum_{t=1}^{T} \alpha_t h_t(x_i) \qquad (7.31)$$
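A minimal sketch of evaluating the objective (7.31) for a given weight vector; its minimization (e.g. by coordinate descent) is not shown, and the matrix of base-classifier outputs is assumed precomputed.

```python
import numpy as np

def l1_regularized_objective(alpha, H, y, lam):
    """Objective (7.31): exponential loss of the ensemble plus an L1 penalty.

    alpha: (T,) mixture weights, H: (m, T) base-classifier outputs in {-1, +1},
    y: (m,) labels in {-1, +1}, lam: regularization strength lambda.
    """
    scores = H @ alpha                              # h(x_i) = sum_t alpha_t h_t(x_i)
    exp_loss = np.mean(np.exp(-y * scores))         # convex upper bound on the 0-1 loss
    return exp_loss + lam * np.abs(alpha).sum()     # add lambda * ||alpha||_1
```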

A practical application: face detector
• Viola and Jones’s algorithm: cascade of AdaBoost ensembles
• A square window is slid over the image and each window is classified as positive (face) or negative (non-face).

[Diagram: a cascade $AdaBoost_1 \to AdaBoost_2 \to AdaBoost_3$; a window is passed to the next stage only if the current stage classifies it as positive (P)]
[Figure: an example of the output of the Viola–Jones face detector]
Contents
1. Boosting
2. AdaBoost
3. AdaBoost and margin maximization
4. Multiclass boosting algorithms
5. Appendix: Decision Tree

4- Multiclass boosting algorithms
Recall from Chapter 1: Multiclass classification
• Let $X$ denote the input space and $Y$ the output space of $k$ classes (labels), and let $D$ be an unknown distribution over $X$ according to which input points are drawn. We will distinguish between two cases:
• the mono-label case, where $Y$ is a finite set of classes that we mark with numbers for convenience, $Y = \{1, \ldots, k\}$, and
• the multi-label case, where $Y = \{-1, +1\}^k$.

Recall from Chapter 1: Multiclass classification
• In the mono-label case, each example is labeled with a single class, while in the multi-label case it can be labeled with several. The multi-label case can be illustrated with text documents, which can be labeled with several different relevant topics
• Multi-label example with $k = 3$ classes: 1-sport, 2-business, 3-society. The positive components of a vector $y_i \in \{-1, +1\}^3$ indicate the classes associated with example $x_i$:
• $y_i = (+1, -1, +1)$, that is, $x_i$ is labeled as sport and society.

$$h(x_i, y_i[1]) \quad h(x_i, y_i[2]) \quad h(x_i, y_i[3])$$
(sport) (business) (society)

Multiclass boosting algorithms
• We describe a boosting algorithm for multi-class classification called
AdaBoost.MH
• The multi-label setting is:
• Training set: $S = ((x_1, y_1), \ldots, (x_m, y_m))$, with $y_i \in \{-1, +1\}^k$
• Consider base classifiers $h_t : X \times Y \to \{-1, +1\}$ and the ensemble $f(x, l) = \sum_{t=1}^{T} \alpha_t h_t(x, l)$, where $l \in [k]$
• Empirical loss:

$$F(\boldsymbol{\alpha}) = \frac{1}{m}\sum_{i=1}^{m}\sum_{l=1}^{k} e^{-y_i[l]\, f(x_i, l)} \qquad (9.13)$$

• $F$ is convex and differentiable
AdaBoost.MH exactly coincides with AdaBoost
• AdaBoost.MH exactly coincides with AdaBoost applied to the training sample derived from $S$ by splitting each labeled point $(x_i, y_i)$ into $k$ labeled examples $((x_i, l), y_i[l])$, with each example $(x_i, l)$ in $X \times Y$ and its label $y_i[l]$ in $\{-1, +1\}$; $y_i[l]$ denotes the $l$-th coordinate of $y_i$:

$$(x_i, y_i) \;\longrightarrow\; \big((x_i, 1), y_i[1]\big), \ldots, \big((x_i, k), y_i[k]\big), \qquad i \in [m]$$

• Let $S'$ denote the resulting sample (with $x_i \in \mathbb{R}^n$, each derived example $(x_i, l)$ can be encoded as a point in $\mathbb{R}^{n+1}$); then

$$S' = \Big(\big((x_1, 1), y_1[1]\big), \big((x_1, 2), y_1[2]\big), \ldots, \big((x_1, k), y_1[k]\big), \ldots, \big((x_m, 1), y_m[1]\big), \ldots, \big((x_m, k), y_m[k]\big)\Big) \quad (mk \text{ examples})$$
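A minimal sketch of this sample derivation, assuming the inputs are given as a NumPy matrix and the labels as an m × k matrix with entries in {−1, +1}:

```python
import numpy as np

def derive_binary_sample(X, Y):
    """Split a multi-label sample into the binary sample S' used by AdaBoost.MH.

    X: (m, n) inputs, Y: (m, k) label matrix with entries in {-1, +1}.
    Returns X_prime of shape (m*k, n+1) and y_prime of shape (m*k,).
    """
    m, n = X.shape
    k = Y.shape[1]
    X_rep = np.repeat(X, k, axis=0)                      # each x_i repeated k times
    labels = np.tile(np.arange(1, k + 1), m)[:, None]    # class index l as the (n+1)-th feature
    X_prime = np.hstack([X_rep, labels])                 # (x_i, l) encoded in R^{n+1}
    y_prime = Y.reshape(-1)                              # binary label y_i[l]
    return X_prime, y_prime
```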
Example
• The class index $l$ is appended to $x_i$ as the $(n+1)$-th feature, so each derived example $((x_i, l), y_i[l])$ is a point in $\mathbb{R}^{n+1}$ with a binary label.

AdaBoost.MH algorithm
[Pseudocode: AdaBoost run on the derived sample $S'$, with $(x, l) \in \mathbb{R}^{n+1}$; at each round $t$ a base classifier $h_t : X \times Y \to \{-1, +1\}$ and its mixture weight $\alpha_t$ are computed, and the output is $f = \sum_{t=1}^{T} \alpha_t h_t$]
AdaBoost.MH exactly coincides with AdaBoost
• $S'$ contains $mk$ examples, and the expression of the objective function in (9.13) coincides exactly with that of the objective function of AdaBoost for the sample $S'$
• The theoretical analysis, along with the other observations presented for AdaBoost so far, also applies here
• Now we will focus on aspects related to computational efficiency and to the weak learning condition that are specific to the multi-class scenario

Complexity of the AdaBoost.MH algorithm
• The complexity of the algorithm is that of AdaBoost applied to a sample of size $mk$.
• For $X \subseteq \mathbb{R}^n$, using boosting stumps as base classifiers, the per-round cost therefore grows with the derived sample size $mk$ and the dimension $n+1$.
• Thus, for a large number of classes $k$, the algorithm may become impractical using a single processor.
• Weak learning condition: at each round there exists a base classifier $h_t : X \times Y \to \{-1, +1\}$ such that $\epsilon_t < \frac{1}{2}$.
• This may be hard to achieve if some classes are difficult to distinguish.

Boosting Algorithms
• Boosting algorithms can differ in how they create and aggregate weak learners during the sequential process
• Three popular types of boosting methods include:
1. Adaptive boosting or AdaBoost: Yoav Freund and Robert Schapire are credited with the creation of the AdaBoost algorithm. This method operates iteratively, identifying misclassified data points and adjusting their weights to minimize the training error. The model continues to optimize in a sequential fashion until it yields the strongest predictor.

Boosting Algorithms
2. Gradient boosting: It works by sequentially adding predictors to an ensemble, with each one correcting the errors of its predecessor. However, instead of changing the weights of data points like AdaBoost, gradient boosting trains each new predictor on the residual errors of the previous one. The name gradient boosting is used because it combines the gradient descent algorithm with the boosting method (a minimal sketch of this idea is given below).

Extreme gradient boosting, or XGBoost: XGBoost is an implementation of gradient boosting that is designed for computational speed and scale. XGBoost leverages multiple cores on the CPU, allowing learning to occur in parallel during training.
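A minimal sketch of the "fit the residuals of the predecessor" idea for squared-loss regression; `fit_regressor(X, r)` is a hypothetical callback returning a weak regressor with a `.predict` method, and the shrinkage value is an illustrative choice.

```python
import numpy as np

def gradient_boost(X, y, fit_regressor, T=100, lr=0.1):
    """Gradient boosting sketch for squared loss: each new learner fits the residuals."""
    base = y.mean()                          # start from the constant predictor
    pred = np.full(len(y), base)
    learners = []
    for _ in range(T):
        residuals = y - pred                 # negative gradient of the squared loss
        g = fit_regressor(X, residuals)      # weak regressor fit to the residuals
        pred += lr * g.predict(X)            # shrunken additive update
        learners.append(g)

    def predict(X_new):
        out = np.full(len(X_new), base)
        for g in learners:
            out += lr * g.predict(X_new)
        return out
    return predict
```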

AdaBoost offers several advantages
• It is simple, its implementation is straightforward, and the time complexity of each round of boosting as a function of the sample size is rather favorable
• When using decision stumps, the cost of each round of boosting grows only with the sample size and the number of features. Of course, if the dimension of the feature space is very large, then the algorithm could in fact become quite slow.
• When using decision stumps, the algorithm selects only the features that increase its predictive power during training, which can help to reduce dimensionality (feature selection)
• It benefits from a rich theoretical analysis
Drawbacks of AdaBoost
• The parameter $T$ (the stopping criterion) and the base classifier set must be selected, and this choice is crucial to the performance of the algorithm.
• The VC-dimension analysis shows that larger values of $T$ can lead to overfitting. In practice, $T$ is typically determined via a validation set.
• The complexity $d$ of the family of base classifiers $H$ appears in the VC-dimension bound. It is important to control $d$ in order to guarantee generalization: a higher $d$ may lead to overfitting.

Drawbacks of AdaBoost
• A serious disadvantage of AdaBoost is its performance in the presence of noise. The distribution weight assigned to examples that are harder to classify substantially increases with the number of rounds. Noisy samples may end up dominating the $\mathcal{D}_t$'s.
• Sequential training in boosting is hard to scale up. Since each $h_t$ is built on its predecessors, boosting models can be computationally expensive
• XGBoost has been introduced to address scalability issues seen in other types of boosting methods

How to benefit from the noise drawback
• The behavior of AdaBoost in the presence of noise can in fact be used as a useful feature for detecting outliers, that is, examples that are incorrectly labeled or that are hard to classify.
• Examples with large weights after a certain number of rounds of boosting can be identified as outliers.
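A hedged sketch of this heuristic: after running the boosting loop while keeping the final (or a late-round) distribution, flag the examples carrying the largest weights. The cutoff fraction is an illustrative choice.

```python
import numpy as np

def flag_outliers(D, top_fraction=0.05):
    """Return indices of the examples with the largest boosting weights.

    D: final (or late-round) AdaBoost distribution over the m training examples.
    top_fraction: illustrative cutoff; examples above it are treated as suspect.
    """
    k = max(1, int(top_fraction * len(D)))
    return np.argsort(D)[-k:]              # indices of the k largest weights
```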

Applications of boosting
• Boosting algorithms are well suited for artificial intelligence projects
across a broad range of industries, including:
• Healthcare: Boosting is used to lower errors in medical data
predictions, such as predicting cardiovascular risk factors and cancer
patient survival rates
• Finance: Boosting is used with deep learning models to automate
critical tasks, including fraud detection, pricing analysis, and more
•…

Contents
1. Boosting
2. AdaBoost
3. AdaBoost and margin maximization
4. Multiclass boosting algorithms
5. Appendix: Decision Tree

5- Appendix: Decision Tree
Decision Tree
• Decision trees can be used as weak learners with boosting to define
effective learning algorithms
• Decision trees are typically fast to train and evaluate and relatively
easy to interpret

[Figure: a binary decision tree with root question $x_1 > a_1$, internal questions $x_1 > a_2$, $x_2 > a_3$, $x_2 > a_4$, and leaves 1-5]

Decision Tree
• Figure 9.2 shows a simple example in the case of a space based on features $x_1$ and $x_2$, as well as the partition it represents
• A leaf defines a region of $X$ formed by the set of sample points corresponding to the same traversal of the tree
• A label is assigned to each leaf: the class with majority representation among the training points falling in a leaf region defines the label of that leaf.

[Figure 9.2: the decision tree of the previous slide and the corresponding partition of the $(x_1, x_2)$ space; the majority label of the training examples in a region (e.g. the leaf 1 region) is the label of that leaf]
Binary decision tree
• Definition 9.5 - A binary decision tree is a representation of a partition of the feature space
• As in Figure 9.2, each interior node of a decision tree corresponds to a question related to a feature (attribute)
• It can be:
• a numerical question of the form $x_j \le a$ for a feature variable $x_j$, $j \in [N]$, and some threshold $a \in \mathbb{R}$, as in the example of Figure 9.2, or
• a categorical question such as $x_j \in \{\text{blue, red, green}\}$, when feature $x_j$ takes a categorical value such as a color

More complex node questions
• More complex node questions result in partitions based on more complex decision surfaces
• Example: binary space partition (BSP) trees partition the space into convex polyhedral regions, based on questions of the form $w \cdot x \le a$

Prediction/partitioning
• To predict the label of any point $x$, we start at the root node of the decision tree and go down the tree until a leaf is found, by moving to the right child of a node when the response to the node question is positive, and to the left child otherwise. When we reach a leaf, we associate $x$ with the label of this leaf
• A leaf defines a region of $X$ formed by the set of points corresponding to the same traversal of the tree
• By definition, no two regions intersect and all points belong to exactly one region

Learning
• The label of a leaf is determined using the training sample: the class with majority representation among the labeled training examples falling in a leaf region defines the label of that leaf, with ties broken arbitrarily
• There are different training algorithms; we mention two here
• Greedy: this is motivated by the fact that the general problem of finding a decision tree with the smallest error is NP-hard
• Grow-then-prune: first a very large tree is grown until it fully fits the training sample. Then, the resulting tree is pruned back to minimize an objective function defined (based on generalization bounds) as the sum of an empirical error and a complexity term

1- Greedy algorithm

GreedyDecisionTrees(S):
1. tree ← root node
2. for t ← 1 to T do
3.   SPLIT(tree, n_t, q_t)
4. return tree

[Figure: the tree grown greedily, with the node split at each round $t = 1, 2, \ldots, T = 9$ labeled by its question $q_1, \ldots, q_9$]

• The SPLIT procedure splits node $n_t$ by making it an internal node with question $q_t$ and two leaf children, each labeled with the dominating class of the region it defines, with ties broken arbitrarily. The root node is initially a leaf whose label is the class that has majority over the entire sample. The pair $(n_t, q_t)$ selected at round $t$ is defined on the next slide.

Node impurity (error in a node)
• The pair $(n, q)$ is chosen so that the node impurity is maximally decreased, according to some measure of impurity $F(n)$
• The decrease in impurity (information gain) of the split $(n, q)$ is given by:

$$\tilde{F}(n, q) = F(n) - \big[\eta(n, q)\, F(n_-(n, q)) + (1 - \eta(n, q))\, F(n_+(n, q))\big]$$

• $\eta(n, q)$ is the fraction of the examples in the region defined by $n$ that are moved to $n_-(n, q)$; $m_n$ is the number of examples at $n$; $p_l(n)$ denotes the fraction of examples at $n$ that belong to class $l$
• $m_{n_-(n,q)} = \eta(n, q) \times m_n$ and $m_{n_+(n,q)} = (1 - \eta(n, q)) \times m_n$

Node impurity definitions
For the mono-label multiclass case ($k$ labels), the impurity of a node $n$ can be defined in 3 ways:

$$F(n) = \begin{cases} -\sum_{l=1}^{k} p_l(n)\, \log_2 p_l(n) & \text{Entropy} \\[4pt] \sum_{l=1}^{k} p_l(n)\,\big(1 - p_l(n)\big) & \text{Gini index} \\[4pt] 1 - \max_{l \in [k]} p_l(n) & \text{Misclassification} \end{cases}$$

• For any node $n$ and class $l \in [k]$, $p_l(n)$ denotes the fraction of points at $n$ that belong to class $l$.
• All three functions are concave, which ensures that the impurity decrease $\tilde{F}(n, q)$ is non-negative.

Figure 9.4: binary case ($k = 2$); the three node impurity definitions plotted as a function of the fraction of positive examples at $n$.
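A minimal computational sketch of the three impurity measures and of the resulting impurity decrease (information gain) for a candidate split; class labels are assumed to be given as an integer array, and the split is described by a boolean mask sending each point to one of the two children.

```python
import numpy as np

def impurity(labels, kind="gini"):
    """Entropy, Gini index, or misclassification impurity of a set of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                        # p_l(n): class fractions at the node
    if kind == "entropy":
        return -np.sum(p * np.log2(p))
    if kind == "gini":
        return np.sum(p * (1 - p))
    return 1 - p.max()                               # misclassification impurity

def impurity_decrease(labels, mask, kind="gini"):
    """Information gain F~(n, q) = F(n) - [eta F(n_minus) + (1 - eta) F(n_plus)].

    mask[i] is True if example i is sent to the child n_minus by question q.
    """
    eta = mask.mean()                                # fraction of points sent to n_minus
    return (impurity(labels, kind)
            - eta * impurity(labels[mask], kind)
            - (1 - eta) * impurity(labels[~mask], kind))
```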

2- Grow-then-prune
• First a very large tree is grown until it fully fits the training sample, or until no more than a very small number of points are left at each leaf
• Then the resulting tree is pruned back to minimize an objective function defined (based on generalization bounds) as the sum of an empirical error and a complexity term. The complexity can be expressed in terms of the size of $\widetilde{\mathrm{tree}}$, the set of leaves of the tree. The resulting objective is

$$G_\lambda(\mathrm{tree}) = \underbrace{\sum_{n \in \widetilde{\mathrm{tree}}} |n|\, F(n)}_{\text{empirical error (impurity)}} \; + \; \underbrace{\lambda\, |\widetilde{\mathrm{tree}}|}_{\text{complexity (number of leaves)}} \qquad (9.15)$$

• where $|n|$ is the number of sample points at leaf $n$ and $\lambda \ge 0$ is a regularization parameter determining the trade-off between misclassification, or more generally impurity, and tree complexity

