AIML Unit-IV
Nearest Neighbor Classifiers
Basic idea:
– If it walks like a duck, quacks like a duck, then it's probably a duck.
(Figure: classify an unknown test record by computing its distance to the training records and taking the class of the closest ones.)
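A minimal sketch of this idea in Python, assuming Euclidean distance and a plain majority vote over the k closest training records (the data set and function names are illustrative only):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Distance from the test record to every training record (Euclidean)
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Majority vote over the labels of the k nearest training records
    nearest_labels = y_train[np.argsort(dists)[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Illustrative data: two numeric attributes, two classes
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(["duck", "duck", "goose", "goose"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))   # -> duck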
Euclidean distance can give counter-intuitive results for binary vectors, e.g.
111111111110 vs 011111111111
000000000001 vs 100000000000
Each pair differs in exactly two positions, so both pairs are at the same Euclidean distance (√2 ≈ 1.4142), even though the first pair consists almost entirely of 1s and the second almost entirely of 0s.
P(Y | X) = P(X | Y) P(Y) / P(X)
Approach:
– Compute the posterior probability P(Y | X1, X2, …, Xd) using the Bayes theorem:
P(Y | X1 X2 … Xd) = P(X1 X2 … Xd | Y) P(Y) / P(X1 X2 … Xd)
Example: P(Refund = Yes | Evade = Yes) = 0 — if one of the conditional probabilities is zero, the entire naive Bayes product becomes zero, which motivates smoothing (e.g., the Laplace or m-estimate correction).
For a continuous attribute, naive Bayes can use a normal distribution, e.g. for Income = 120 in class No (class-conditional sample mean 110, sample variance σ²):
P(Income = 120 | No) = 1 / √(2πσ²) · exp( −(120 − 110)² / (2σ²) )
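A quick numerical check of this density in Python (a sketch: the class-conditional mean 110 comes from the example above, while the sample variance of 2975 is the value used in the textbook's worked example and should be treated as an assumption here):

import math

def normal_pdf(x, mu, var):
    # Gaussian class-conditional density used by naive Bayes for continuous attributes
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(round(normal_pdf(120, 110, 2975), 4))   # ~0.0072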
Name Give Birth Can Fly Live in Water Have Legs Class
human yes no no yes mammals
python no no no no non-mammals
salmon no no yes no non-mammals
whale yes no yes no mammals
frog no no sometimes yes non-mammals
komodo no no no yes non-mammals
bat yes yes no yes mammals
pigeon no yes no yes non-mammals
cat yes no no yes mammals
leopard shark yes no yes no non-mammals
turtle no no sometimes yes non-mammals
penguin no no sometimes yes non-mammals
porcupine yes no no yes mammals
eel no no yes no non-mammals
salamander no no sometimes yes non-mammals
gila monster no no no yes non-mammals
platypus no no no yes mammals
owl no yes no yes non-mammals
dolphin yes no yes no mammals
eagle no yes no yes non-mammals

A: attributes of the test record (Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no)
M: mammals (7 records), N: non-mammals (13 records)

P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
P(A | M) P(M) = 0.06 × 7/20 = 0.021
P(A | N) P(N) = 0.0042 × 13/20 = 0.0027
Since P(A | M) P(M) > P(A | N) P(N), the test record is classified as a mammal.
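The same computation written out in Python (a sketch that simply reproduces the counts read off the table above):

# Test record A: Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no
p_A_given_M = (6/7) * (6/7) * (2/7) * (2/7)        # ~0.06
p_A_given_N = (1/13) * (10/13) * (3/13) * (4/13)   # ~0.0042
p_M, p_N = 7/20, 13/20                             # class priors (7 mammals, 13 non-mammals)

score_M = p_A_given_M * p_M                        # ~0.021
score_N = p_A_given_N * p_N                        # ~0.0027
print("mammals" if score_M > score_N else "non-mammals")   # -> mammals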
(Figure: example of a directed acyclic graph over nodes A, B, C, and D, illustrating that D is the parent of C, A is a child of C, B is a descendant of D, and D is an ancestor of A.)
(Figure: the naive Bayes assumption drawn as a Bayesian network — the attribute nodes X1, X2, X3, X4, …, Xd share the class node as their only parent.)
(Figure: example Bayesian network with nodes including Exercise, Diet, Blood Pressure, and Chest Pain.)
Example graph of a logistic regression curve fitted to data. The curve shows the probability of
passing an exam (binary dependent variable) versus hours studying (scalar independent variable).
The logistic regression model estimates P(y = 1 | x) = 1 / (1 + exp(−(wᵀx + b))), so the odds P(y = 1 | x) / P(y = 0 | x) equal exp(wᵀx + b), where w and b are the parameters of the model and aᵀ denotes the transpose of a vector a. Note that if wᵀx + b > 0, then x belongs to class 1, since its odds are greater than 1; otherwise, x belongs to class 0.
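A small sketch of this decision rule in Python (the weight vector w and bias b below are purely illustrative, not fitted to any data):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    p = sigmoid(w @ x + b)                   # P(y = 1 | x) under the logistic model
    return (1 if w @ x + b > 0 else 0), p    # class 1 exactly when the odds exceed 1

w = np.array([1.5, -0.5])                    # illustrative parameters
b = -1.0
print(predict(np.array([2.0, 1.0]), w, b))   # (1, ~0.82)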
Ensemble Methods
Construct a set of base classifiers from the training data and predict the class of a test record by combining (e.g., by majority vote) the predictions made by the individual classifiers.
Example: Why Do Ensemble Methods Work?
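The standard illustration from the textbook: with 25 base classifiers whose errors are independent and each equal to 0.35, a majority-vote ensemble errs only when 13 or more base classifiers err, which happens with probability about 0.06. A quick check in Python:

from math import comb

eps, n = 0.35, 25
ensemble_error = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(round(ensemble_error, 3))   # ~0.06, far below the base error of 0.35

This also hints at the two necessary conditions discussed next: the base classifiers should be better than random guessing and should make largely independent errors.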
Necessary Conditions for Ensemble Methods
(Figure: comparison between the errors of the base classifiers and the error of the ensemble classifier.)
Rationale for Ensemble Learning
Bias-Variance Decomposition
Analogous problem of reaching a target y by firing projectiles
from x (regression problem)
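For squared loss this analogy corresponds to the standard decomposition of the expected error (stated here for reference; the textbook presents a generalized version that applies to other loss functions as well):

E[(ŷ(x) − y)²] = (E[ŷ(x)] − f(x))² + E[(ŷ(x) − E[ŷ(x)])²] + σ²
               = Bias² + Variance + Noise

where f(x) is the true target function, the expectation is over training sets, and σ² is the irreducible noise in y.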
Bias-Variance Trade-off and Overfitting
(Figure: illustration of the bias-variance trade-off — high-variance models overfit, while high-bias models underfit.)
Constructing Ensemble Classifiers
An ensemble can be constructed by manipulating the training set (e.g., bagging and boosting), manipulating the input features (e.g., random forests), manipulating the class labels, or manipulating the learning algorithm itself.
The first three approaches are generic methods that are applicable to any classifier, whereas the fourth approach depends on the type of classifier used.
The base classifiers for most of these approaches can be generated sequentially (one after another) or in parallel (all at once).
Bagging (Bootstrap AGGregatING)
Each round draws a bootstrap sample of the training records by sampling with replacement, for example (record indices):
Original Data 1 2 3 4 5 6 7 8 9 10
Bagging (Round 1) 7 8 10 8 2 5 10 10 5 9
Bagging (Round 2) 1 4 9 1 2 3 2 7 3 2
Bagging (Round 3) 1 8 5 10 5 5 9 6 3 7
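A minimal sketch of how such bootstrap samples can be drawn with numpy (the random seed is only there to make the illustration reproducible):

import numpy as np

rng = np.random.default_rng(seed=0)
record_ids = np.arange(1, 11)            # records 1..10

for r in range(1, 4):
    # Sampling with replacement: some records repeat, others are left out of the round
    sample = rng.choice(record_ids, size=record_ids.size, replace=True)
    print(f"Bagging (Round {r}): {sample}")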
Bagging Example
Consider 1-dimensional data set:
Original Data:
x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
y 1 1 1 -1 -1 -1 -1 1 1 1
(Figure: each base classifier is a decision stump that splits on x at a threshold k, predicting y_left on the True branch and y_right on the False branch.)
Bagging Example
Bagging Round 1:
x: 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.9 0.9
y: 1 1 1 1 -1 -1 -1 -1 1 1
Stump: x <= 0.35: y = 1; x > 0.35: y = -1

Bagging Round 2:
x: 0.1 0.2 0.3 0.4 0.5 0.5 0.9 1 1 1
y: 1 1 1 -1 -1 -1 1 1 1 1
Stump: x <= 0.7: y = 1; x > 0.7: y = 1

Bagging Round 3:
x: 0.1 0.2 0.3 0.4 0.4 0.5 0.7 0.7 0.8 0.9
y: 1 1 1 -1 -1 -1 -1 -1 1 1
Stump: x <= 0.35: y = 1; x > 0.35: y = -1

Bagging Round 4:
x: 0.1 0.1 0.2 0.4 0.4 0.5 0.5 0.7 0.8 0.9
y: 1 1 1 -1 -1 -1 -1 -1 1 1
Stump: x <= 0.3: y = 1; x > 0.3: y = -1

Bagging Round 5:
x: 0.1 0.1 0.2 0.5 0.6 0.6 0.6 1 1 1
y: 1 1 1 -1 -1 -1 -1 1 1 1
Stump: x <= 0.35: y = 1; x > 0.35: y = -1

Bagging Round 6:
x: 0.2 0.4 0.5 0.6 0.7 0.7 0.7 0.8 0.9 1
y: 1 -1 -1 -1 -1 -1 -1 1 1 1
Stump: x <= 0.75: y = -1; x > 0.75: y = 1

Bagging Round 7:
x: 0.1 0.4 0.4 0.6 0.7 0.8 0.9 0.9 0.9 1
y: 1 -1 -1 -1 -1 1 1 1 1 1
Stump: x <= 0.75: y = -1; x > 0.75: y = 1

Bagging Round 8:
x: 0.1 0.2 0.5 0.5 0.5 0.7 0.7 0.8 0.9 1
y: 1 1 -1 -1 -1 -1 -1 1 1 1
Stump: x <= 0.75: y = -1; x > 0.75: y = 1

Bagging Round 9:
x: 0.1 0.3 0.4 0.4 0.6 0.7 0.7 0.8 1 1
y: 1 1 -1 -1 -1 -1 -1 1 1 1
Stump: x <= 0.75: y = -1; x > 0.75: y = 1
Bagging Example
Use a majority vote (the sign of the sum of the base-classifier predictions) to determine the class predicted by the ensemble:
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 1 1 1 -1 -1 -1 -1 -1 -1 -1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 -1 -1 -1 -1 -1 -1 -1
4 1 1 1 -1 -1 -1 -1 -1 -1 -1
5 1 1 1 -1 -1 -1 -1 -1 -1 -1
6 -1 -1 -1 -1 -1 -1 -1 1 1 1
7 -1 -1 -1 -1 -1 -1 -1 1 1 1
8 -1 -1 -1 -1 -1 -1 -1 1 1 1
9 -1 -1 -1 -1 -1 -1 -1 1 1 1
10 1 1 1 1 1 1 1 1 1 1
Sum 2 2 2 -6 -6 -6 -6 2 2 2
Predicted class (sign) 1 1 1 -1 -1 -1 -1 1 1 1
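The same vote computed directly from the prediction table above (a numpy sketch):

import numpy as np

# Rows: predictions of the 10 bagged stumps at x = 0.1, 0.2, ..., 1.0
preds = np.array([
    [ 1,  1,  1, -1, -1, -1, -1, -1, -1, -1],
    [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
    [ 1,  1,  1, -1, -1, -1, -1, -1, -1, -1],
    [ 1,  1,  1, -1, -1, -1, -1, -1, -1, -1],
    [ 1,  1,  1, -1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1,  1,  1,  1],
    [-1, -1, -1, -1, -1, -1, -1,  1,  1,  1],
    [-1, -1, -1, -1, -1, -1, -1,  1,  1,  1],
    [-1, -1, -1, -1, -1, -1, -1,  1,  1,  1],
    [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
])
sums = preds.sum(axis=0)
print(sums)             # [ 2  2  2 -6 -6 -6 -6  2  2  2]
print(np.sign(sums))    # ensemble prediction: [ 1  1  1 -1 -1 -1 -1  1  1  1]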
Boosting
Records that are wrongly classified will have their
weights increased in the next round
Records that are classified correctly will have their
weights decreased in the next round
Original Data 1 2 3 4 5 6 7 8 9 10
Boosting (Round 1) 7 3 2 8 7 9 4 10 6 3
Boosting (Round 2) 5 4 9 4 2 5 1 7 4 2
Boosting (Round 3) 4 4 8 10 4 5 4 6 3 4
Record 4 is hard to classify; its weight keeps increasing, so it is sampled more and more often in later rounds.
AdaBoost
Importance of a classifier:
α_i = (1/2) · ln( (1 − ε_i) / ε_i )
where ε_i is the weighted error rate of the i-th base classifier.
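For example (a short sketch; the error rates below are illustrative):

import math

def alpha(eps):
    # Importance of a base classifier with weighted error rate eps
    return 0.5 * math.log((1 - eps) / eps)

for eps in (0.05, 0.25, 0.45):
    print(eps, round(alpha(eps), 3))
# 0.05 -> 1.472, 0.25 -> 0.549, 0.45 -> 0.1: a low error gives a high importance,
# while an error close to 0.5 (random guessing) gives an importance close to 0.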
AdaBoost Algorithm
Weight update:
w_j^(i+1) = ( w_j^(i) / Z_i ) × exp(−α_i)  if record j is classified correctly by base classifier C_i
w_j^(i+1) = ( w_j^(i) / Z_i ) × exp(+α_i)  if record j is misclassified by C_i
where Z_i is the normalization factor that keeps the weights summing to 1.
AdaBoost Algorithm
Original Data:
x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
y 1 1 1 -1 -1 -1 -1 1 1 1
(Figure: each base classifier is a decision stump that splits on x at a threshold k, predicting y_left on the True branch and y_right on the False branch.)
AdaBoost Example
Training sets for the first 3 boosting rounds:
Boosting Round 1:
x 0.1 0.4 0.5 0.6 0.6 0.7 0.7 0.7 0.8 1
y 1 -1 -1 -1 -1 -1 -1 -1 1 1
Boosting Round 2:
x 0.1 0.1 0.2 0.2 0.2 0.2 0.3 0.3 0.3 0.3
y 1 1 1 1 1 1 1 1 1 1
Boosting Round 3:
x 0.2 0.2 0.4 0.4 0.4 0.4 0.5 0.6 0.6 0.7
y 1 1 -1 -1 -1 -1 -1 -1 -1 -1
Summary:
Round Split Point Left Class Right Class alpha
1 0.75 -1 1 1.738
2 0.05 1 1 2.7784
3 0.3 1 -1 4.1195
AdaBoost Example
Weights
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
2 0.311 0.311 0.311 0.01 0.01 0.01 0.01 0.01 0.01 0.01
3 0.029 0.029 0.029 0.228 0.228 0.228 0.228 0.009 0.009 0.009
Classification
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 -1 -1 -1 -1 -1 -1 -1 1 1 1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 -1 -1 -1 -1 -1 -1 -1
Sum 5.16 5.16 5.16 -3.08 -3.08 -3.08 -3.08 0.397 0.397 0.397
Predicted class (sign) 1 1 1 -1 -1 -1 -1 1 1 1
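The final prediction can be reproduced from the classification table and the alpha values in the summary above (a numpy sketch):

import numpy as np

preds = np.array([                 # per-round stump predictions at x = 0.1, ..., 1.0
    [-1, -1, -1, -1, -1, -1, -1,  1,  1,  1],   # round 1
    [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1],   # round 2
    [ 1,  1,  1, -1, -1, -1, -1, -1, -1, -1],   # round 3
])
alphas = np.array([1.738, 2.7784, 4.1195])      # classifier importances

weighted = alphas @ preds                       # weighted vote at each x
print(weighted.round(3))                        # matches the Sum row above (up to rounding)
print(np.sign(weighted).astype(int))            # [ 1  1  1 -1 -1 -1 -1  1  1  1]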
Random Forest Algorithm
Construct an ensemble of decision trees by manipulating the training set as well as the input features.
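A minimal, hedged example of this idea using scikit-learn's RandomForestClassifier (the data set and parameter values are purely illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each tree is grown on a bootstrap sample of the records (training-set manipulation)
# and considers only a random subset of the features at every split (feature manipulation).
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))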
Characteristics of Random Forest
Gradient Boosting
Constructs a series of models
– Models can be any predictive model that has a
differentiable loss function
– Commonly, trees are the chosen model
◆ XGBoost (eXtreme Gradient Boosting) is a popular package because of its impressive performance
Boosting can be viewed as optimizing the loss
function by iterative functional gradient descent.
Implementations of various boosted algorithms are available in Python, R, MATLAB, and more.
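As a hedged illustration, scikit-learn's GradientBoostingClassifier is one such Python implementation (the parameter values below are only illustrative; the xgboost package offers a similar fit/predict interface):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A sequence of shallow trees, each fitted to the gradient of the loss of the current ensemble
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gb.fit(X_tr, y_tr)
print(gb.score(X_te, y_te))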
References:
1. Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, 2nd Edition, Pearson, 2019. ISBN-10: 9332571406, ISBN-13: 978-9332571402.
2. Tom M. Mitchell, Machine Learning, Indian Edition, McGraw Hill Education, 2013. ISBN-10: 1259096955.
3. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, 2006. ISBN: 1-55860-901-6.