ML - 5
03/06/2022
UNIVERSITY OF ENGINEERING & MANAGEMENT, KOLKATA
Basic Concept
• Different Techniques
• Some of the commonly used Ensemble techniques are
discussed below
Bagging
• Bagging, or Bootstrap Aggregation, is a powerful, effective and simple ensemble method. It builds multiple versions of a training set using the bootstrap, i.e. sampling with replacement, and it can be used with any type of model for classification or regression. Bagging is only effective when used with unstable, non-linear models (i.e. models where a small change in the training set can cause a significant change in the fitted model).
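A minimal sketch of bagging, assuming scikit-learn is available and using a toy dataset from make_classification (both assumptions, not part of the original slides):

```python
# Bagging sketch: each model is trained on a bootstrap sample of the training set.
# BaggingClassifier's default base estimator is a decision tree, an unstable
# learner, which is exactly the case where bagging helps most.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print("Bagging accuracy:", bag.score(X_test, y_test))
```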
Boosting
• Boosting is a meta-algorithm which can be viewed as a model-averaging method. It is the most widely used ensemble method and one of the most powerful learning ideas. It was originally designed for classification but can also be profitably extended to regression. The original boosting algorithm combined three weak learners to generate a strong learner.
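A minimal boosting sketch using scikit-learn's AdaBoostClassifier (the library choice is an assumption; the slides do not prescribe one), on the same kind of toy data:

```python
# Boosting sketch: AdaBoost fits a sequence of weak learners (shallow decision
# stumps by default) and reweights the training data after each round so that
# misclassified points get more attention.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X_train, y_train)
print("Boosting accuracy:", boost.score(X_test, y_test))
```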
Stacking
• Stacking combines multiple classifiers generated by applying different learning algorithms to a single dataset consisting of pairs of feature vectors and their classifications. The technique has two phases: in the first phase, a set of base-level classifiers is generated; in the second phase, a meta-level classifier is learned which combines the outputs of the base-level classifiers.
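A minimal two-phase stacking sketch, again assuming scikit-learn (the choice of base learners and meta-learner below is illustrative, not from the slides):

```python
# Stacking sketch: phase 1 trains base-level classifiers built with different
# learning algorithms; phase 2 trains a meta-level classifier (here logistic
# regression) on their cross-validated outputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("nb", GaussianNB())],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```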
Bagging
• Bagging
• Sampling with replacement from the training data (each row lists the Data IDs drawn in one bootstrap round):

Original Data       1   2   3   4   5   6   7   8   9   10
Bagging (Round 1)   7   8   10  8   2   5   10  10  5   9
Bagging (Round 2)   1   4   9   1   2   3   2   7   3   2
Bagging (Round 3)   1   8   5   10  5   5   9   6   3   7
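A small sketch of how such rounds can be drawn with numpy (the seed is illustrative, so the printed IDs will not match the table above exactly):

```python
# Sampling with replacement: each round draws 10 IDs from 1..10, so some IDs
# repeat within a round and others are left out entirely.
import numpy as np

rng = np.random.default_rng(0)
ids = np.arange(1, 11)  # data IDs 1..10
for round_no in range(1, 4):
    sample = rng.choice(ids, size=len(ids), replace=True)
    print(f"Bagging (Round {round_no}):", sample)
```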
[Figure: Bagging applied to one-dimensional training data. Each bootstrap round fits a decision stump that splits x at a threshold (around 0.3 or 0.8) and predicts +1 or -1 on either side.]
Bagging Summary
• Works well if the base classifiers are unstable
(complement each other)
• Increased accuracy because it reduces the
variance of the individual classifier
• Does not focus on any particular instance of the
training data
– Therefore, less susceptible to model over-fitting
when applied to noisy data
• What if we want to focus on particular instances
of the training data?
Definition
• Lazy learners
• Lazy learners simply store the training data and wait until test data appear. When they do, classification is conducted based on the most related data in the stored training data. Compared to eager learners, lazy learners spend less time on training but more time on prediction.
• Ex. k-nearest neighbor, Case-based reasoning
• Eager learners
• Eager learners construct a classification model from the given training data before receiving data to classify. They must be able to commit to a single hypothesis that covers the entire instance space. Because of this model construction, eager learners take a long time to train and less time to predict.
• Ex. Decision Tree, Naive Bayes, Artificial Neural Networks
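A hedged sketch contrasting the two, assuming scikit-learn: the lazy k-NN learner does almost no work at fit time and defers the effort to prediction time, while the eager decision tree does the opposite.

```python
import time

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for name, model in [("lazy: k-NN", KNeighborsClassifier(n_neighbors=5)),
                    ("eager: decision tree", DecisionTreeClassifier())]:
    t0 = time.perf_counter()
    model.fit(X, y)        # k-NN essentially just stores the training data here
    t1 = time.perf_counter()
    model.predict(X)       # k-NN performs the neighbor search here
    t2 = time.perf_counter()
    print(f"{name}: train {t1 - t0:.3f}s, predict {t2 - t1:.3f}s")
```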
• k-Nearest-Neighbor Method
– first described in the early 1950s
– It has since been widely used in the area of pattern
recognition.
– The training instances are described by n attributes.
– Each instance represents a point in an n-dimensional space.
– A k-nearest-neighbor classifier searches the pattern
space for the k training instances that are closest to
the unknown instance.
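A minimal numpy-only sketch of the idea (the data and the helper name knn_predict are illustrative):

```python
# Each training instance is a point in n-dimensional space; an unknown instance
# is labelled by a majority vote among its k nearest training points.
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # k closest training points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority class

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0
```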
• Boosting
• An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
– Initially, all N records are assigned equal weights
– Unlike bagging, weights may change at the end of each boosting round
• Explanation:
The AdaBoost diagram (datasets B1 to B4) can be understood as a stepwise process:
• B1 consists of 10 data points of two types, plus (+) and minus (-), 5 of each, and each point is initially assigned equal weight. The first model tries to classify the data points and generates a vertical separator line, but it wrongly classifies 3 pluses (+) as minuses (-).
• B2 consists of the same 10 data points, in which the 3 wrongly classified pluses (+) are weighted more heavily so that the current model tries harder to classify them correctly. This model generates a vertical separator line that correctly classifies the previously misclassified pluses (+), but in this attempt it wrongly classifies three minuses (-).
• B3 consists of the same 10 data points, in which the 3 wrongly classified minuses (-) are weighted more heavily so that the current model tries harder to classify them correctly. This model generates a horizontal separator line that correctly classifies the previously misclassified minuses (-).
• B4 combines B1, B2, and B3 to build a strong prediction model which is much better than any individual model used.
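The B1-to-B4 walkthrough can be sketched in code. This is an illustrative AdaBoost-style loop with decision stumps as weak learners (the function names and the +1/-1 label convention are assumptions, not from the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=3):
    """Labels y must be +1 / -1. Returns the stumps and their vote weights."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # equal weights initially (B1)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # weak learner on weighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)        # weighted error
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # classifier importance
        w = w * np.exp(-alpha * y * pred)      # up-weight misclassified points (B2, B3)
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted vote of the weak learners, as in the combined model B4.
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)
```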
Boosting
• Equal weights are assigned to each training instance (1/N in the first round)
• After a classifier Ci is learned, the weights are adjusted so that the subsequent classifier Ci+1 "pays more attention" to data that were misclassified by Ci
• The final boosted classifier C* combines the votes of the individual classifiers
– The weight of each classifier's vote is a function of its accuracy
• AdaBoost is a popular boosting algorithm
• Testing:
– For each class c, sum the weights of each classifier that assigned
class c to X (unseen data)
– The class with the highest sum is the WINNER!
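A small sketch of that voting rule (assuming classifiers with a scikit-learn-style predict method; the helper name weighted_vote is illustrative):

```python
from collections import defaultdict

def weighted_vote(classifiers, weights, x):
    # For each class c, sum the weights of the classifiers that predict c for x,
    # then return the class with the highest total.
    totals = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        totals[clf.predict([x])[0]] += w
    return max(totals, key=totals.get)
```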
Example: Error and Classifier Weight in AdaBoost
• Importance of a classifier:
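Using the standard AdaBoost definitions (presumably what this slide refers to), the weighted error of classifier $C_i$ and its importance $\alpha_i$ are

$$\varepsilon_i = \frac{1}{N}\sum_{j=1}^{N} w_j\,\delta\big(C_i(x_j) \neq y_j\big), \qquad \alpha_i = \frac{1}{2}\ln\frac{1-\varepsilon_i}{\varepsilon_i},$$

so an accurate classifier gets a large positive vote weight, while one close to random guessing ($\varepsilon_i \approx 0.5$) gets a weight near zero.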
Example: Data Instance Weight in AdaBoost
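Again using the standard AdaBoost update (presumably the formula this slide shows), after round $i$ the weight of instance $j$ becomes

$$w_j^{(i+1)} = \frac{w_j^{(i)}}{Z_i}\times\begin{cases} e^{-\alpha_i} & \text{if } C_i(x_j) = y_j \\ e^{\alpha_i} & \text{if } C_i(x_j) \neq y_j \end{cases}$$

where $Z_i$ is a normalization factor chosen so that the new weights sum to 1: correctly classified instances are down-weighted and misclassified ones are up-weighted.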
Illustrating AdaBoost
[Figure: the data points used for training, each starting with an equal initial weight.]
Thank You