CS3491 - ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
CS3491-AI ML - Chapter 5

INTRODUCTION TO MACHINE LEARNING
CHAPTER 5: Multivariate Methods
Multivariate Data
• Multiple measurements (sensors)
• d inputs/features/attributes: d-variate
• N instances/observations/examples
• The N × d data matrix (rows are instances, columns are attributes):

\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & \vdots & \ddots & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}

3
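A minimal numpy sketch of this layout (the data values and the use of numpy are assumptions, not part of the slides): rows are the N instances, columns are the d attributes.

import numpy as np

# Hypothetical data matrix: N = 4 instances (rows), d = 3 attributes (columns)
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, 6.0],
    [5.8, 2.7, 5.1],
])
N, d = X.shape    # N instances of d-variate data
print(N, d)       # 4 3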
Multivariate Parameters

Mean: E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T

Covariance: \sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)

Correlation: \mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}

\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}

4
Parameter Estimation

Sample mean \mathbf{m}: \quad m_i = \frac{\sum_{t=1}^{N} x_i^t}{N}, \quad i = 1, \ldots, d

Covariance matrix \mathbf{S}: \quad s_{ij} = \frac{\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)}{N}

Correlation matrix \mathbf{R}: \quad r_{ij} = \frac{s_{ij}}{s_i s_j}

5
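A short sketch of these estimators with numpy (the synthetic sample is an assumption; the 1/N normalization matches the slide, whereas np.cov defaults to 1/(N-1)):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # hypothetical sample: N = 100, d = 3

m = X.mean(axis=0)                   # sample mean: m_i = (1/N) sum_t x_i^t
Xc = X - m                           # centered data
S = (Xc.T @ Xc) / X.shape[0]         # covariance matrix S with 1/N normalization
s = np.sqrt(np.diag(S))              # standard deviations s_i
R = S / np.outer(s, s)               # correlation matrix R: r_ij = s_ij / (s_i s_j)
print(R)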
Estimation of Missing Values
• What to do if certain instances have missing attributes?
• Ignore those instances: not a good idea if the sample is small
• Use 'missing' as an attribute: may give information
• Imputation: fill in the missing value
  - Mean imputation: use the most likely value (e.g., the mean)
  - Imputation by regression: predict the missing value based on the other attributes

6
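A minimal sketch of mean imputation (numpy and the toy data are assumptions; NaN marks a missing attribute value):

import numpy as np

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan],
              [5.0, 6.0]])                        # hypothetical data with missing entries

col_means = np.nanmean(X, axis=0)                 # per-attribute mean over observed values
X_imputed = np.where(np.isnan(X), col_means, X)   # fill each gap with its column mean
print(X_imputed)

Imputation by regression would instead fit a predictor of the missing attribute on the other attributes and fill in its prediction.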
Multivariate Normal Distribution

\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})

p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]

7
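A sketch that evaluates this density directly from the formula (the parameter values are assumptions; scipy.stats.multivariate_normal.pdf would give the same numbers):

import numpy as np

def mvn_pdf(x, mu, Sigma):
    # Density of x ~ N_d(mu, Sigma), computed from the formula above
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)        # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

mu = np.array([0.0, 0.0])                             # hypothetical mean
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])                        # hypothetical covariance
print(mvn_pdf(np.array([0.2, -0.1]), mu, Sigma))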
Multivariate Normal Distribution
• Mahalanobis distance: (x - μ)^T Σ^{-1} (x - μ) measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and correlations)
• Bivariate case, d = 2:

\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}

p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[ -\frac{1}{2(1-\rho^2)} \left( z_1^2 - 2\rho z_1 z_2 + z_2^2 \right) \right]

where z_i = (x_i - \mu_i)/\sigma_i.

8
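A sketch checking the bivariate identity numerically (all parameter values are assumptions): the matrix form of the Mahalanobis distance equals the z_1, z_2 form above.

import numpy as np

mu = np.array([1.0, 2.0])
s1, s2, rho = 1.0, 2.0, 0.6                    # hypothetical sigma_1, sigma_2, rho
Sigma = np.array([[s1**2,     rho*s1*s2],
                  [rho*s1*s2, s2**2]])

x = np.array([2.0, 1.0])
diff = x - mu
maha = diff @ np.linalg.solve(Sigma, diff)     # (x - mu)^T Sigma^{-1} (x - mu)

z1, z2 = diff[0] / s1, diff[1] / s2            # standardized coordinates
maha_z = (z1**2 - 2*rho*z1*z2 + z2**2) / (1 - rho**2)
print(np.isclose(maha, maha_z))                # True: the two forms agree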
Bivariate Normal
[Figures omitted: bivariate normal density examples]

9-10
Independent Inputs: Naive Bayes
• If the x_i are independent, the off-diagonal entries of Σ are 0, and the Mahalanobis distance reduces to a weighted (by 1/σ_i) Euclidean distance:

p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left[ -\frac{1}{2} \sum_{i=1}^{d} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2 \right]

• If the variances are also equal, this reduces to the Euclidean distance.

11
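A sketch of the factorization (the parameter values are assumptions): with a diagonal Σ the joint density equals the product of the d univariate normal densities.

import numpy as np

mu = np.array([0.0, 1.0, -1.0])                # hypothetical per-feature means
sigma = np.array([1.0, 0.5, 2.0])              # hypothetical per-feature std. deviations
x = np.array([0.3, 0.8, -2.0])

z = (x - mu) / sigma
p_joint = np.exp(-0.5 * np.sum(z**2)) / ((2*np.pi)**(len(x)/2) * np.prod(sigma))
p_prod = np.prod(np.exp(-0.5 * z**2) / (np.sqrt(2*np.pi) * sigma))
print(np.isclose(p_joint, p_prod))             # True: joint density = product of marginals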
Parametric Classification
• If p(x | C_i) ~ N(μ_i, Σ_i):

p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right]

• The discriminant functions are

g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i)
= -\frac{d}{2} \log 2\pi - \frac{1}{2} \log |\boldsymbol{\Sigma}_i| - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \log P(C_i)

12
Estimation of Parameters
Given class indicators r_i^t (r_i^t = 1 if x^t belongs to C_i, 0 otherwise):

\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}

\mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t}

\mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}

Plugging these into the discriminant (and dropping the constant term):

g_i(\mathbf{x}) = -\frac{1}{2} \log |\mathbf{S}_i| - \frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)

13
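A sketch of these plug-in estimates and the resulting discriminant (the two-class toy data and the numpy usage are assumptions):

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical two-class training set in d = 2 dimensions
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),
               rng.normal([3.0, 3.0], 1.5, size=(50, 2))])
y = np.array([0]*50 + [1]*50)

priors, means, covs = [], [], []
for c in np.unique(y):
    Xc = X[y == c]                           # instances with r_i^t = 1
    priors.append(len(Xc) / len(X))          # P_hat(C_i)
    means.append(Xc.mean(axis=0))            # m_i
    D = Xc - Xc.mean(axis=0)
    covs.append(D.T @ D / len(Xc))           # S_i

def g(x, i):
    diff = x - means[i]
    maha = diff @ np.linalg.solve(covs[i], diff)
    return -0.5*np.log(np.linalg.det(covs[i])) - 0.5*maha + np.log(priors[i])

x_new = np.array([1.0, 1.0])
print(np.argmax([g(x_new, i) for i in range(2)]))   # class with the largest g_i(x)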
Different S_i
• Quadratic discriminant:

g_i(\mathbf{x}) = -\frac{1}{2} \log |\mathbf{S}_i| - \frac{1}{2} \left( \mathbf{x}^T \mathbf{S}_i^{-1} \mathbf{x} - 2\mathbf{x}^T \mathbf{S}_i^{-1} \mathbf{m}_i + \mathbf{m}_i^T \mathbf{S}_i^{-1} \mathbf{m}_i \right) + \log \hat{P}(C_i)
= \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}

where

\mathbf{W}_i = -\frac{1}{2} \mathbf{S}_i^{-1}

\mathbf{w}_i = \mathbf{S}_i^{-1} \mathbf{m}_i

w_{i0} = -\frac{1}{2} \mathbf{m}_i^T \mathbf{S}_i^{-1} \mathbf{m}_i - \frac{1}{2} \log |\mathbf{S}_i| + \log \hat{P}(C_i)

14
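A sketch that forms W_i, w_i, w_{i0} from assumed values of S_i, m_i, P_hat(C_i) and verifies the rewritten quadratic form against the direct discriminant:

import numpy as np

S = np.array([[2.0, 0.3],
              [0.3, 1.0]])                     # hypothetical S_i
m = np.array([1.0, -1.0])                      # hypothetical m_i
prior = 0.4                                    # hypothetical P_hat(C_i)

S_inv = np.linalg.inv(S)
W = -0.5 * S_inv                               # W_i
w = S_inv @ m                                  # w_i
w0 = -0.5 * m @ S_inv @ m - 0.5*np.log(np.linalg.det(S)) + np.log(prior)

x = np.array([0.5, 0.2])
g_quadratic = x @ W @ x + w @ x + w0
g_direct = (-0.5*np.log(np.linalg.det(S))
            - 0.5*(x - m) @ S_inv @ (x - m) + np.log(prior))
print(np.isclose(g_quadratic, g_direct))       # True: same discriminant value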
[Figure: class likelihoods p(x | C_i) and the posterior P(C_1 | x); the discriminant is where P(C_1 | x) = 0.5]

15
Common Covariance Matrix S
• Shared common sample covariance S:

\mathbf{S} = \sum_i \hat{P}(C_i) \mathbf{S}_i

• The discriminant reduces to

g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)

which is a linear discriminant:

g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}

where

\mathbf{w}_i = \mathbf{S}^{-1} \mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2} \mathbf{m}_i^T \mathbf{S}^{-1} \mathbf{m}_i + \log \hat{P}(C_i)

16
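A sketch of pooling the class covariances into a shared S and classifying with the linear discriminant (the toy data are assumptions):

import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(40, 2)),
               rng.normal([2.0, 2.0], 1.0, size=(60, 2))])   # hypothetical data
y = np.array([0]*40 + [1]*60)

priors, means, covs = [], [], []
for c in np.unique(y):
    Xc = X[y == c]
    priors.append(len(Xc) / len(X))
    means.append(Xc.mean(axis=0))
    D = Xc - Xc.mean(axis=0)
    covs.append(D.T @ D / len(Xc))

S = sum(p * Si for p, Si in zip(priors, covs))   # shared S = sum_i P_hat(C_i) S_i
S_inv = np.linalg.inv(S)

def g(x, i):                                     # linear discriminant w_i^T x + w_i0
    w = S_inv @ means[i]
    w0 = -0.5 * means[i] @ S_inv @ means[i] + np.log(priors[i])
    return w @ x + w0

print(np.argmax([g(np.array([1.0, 1.0]), i) for i in range(2)]))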
Common Covariance Matrix S
[Figure omitted: example with a shared covariance matrix]

17
Diagonal S
• When the x_j, j = 1, ..., d, are independent, Σ is diagonal:

p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i) \quad \text{(Naive Bayes' assumption)}

g_i(\mathbf{x}) = -\frac{1}{2} \sum_{j=1}^{d} \left( \frac{x_j^t - m_{ij}}{s_j} \right)^2 + \log \hat{P}(C_i)

• Classify based on the weighted Euclidean distance (in s_j units) to the nearest mean.

18
Diagonal S
[Figure omitted: diagonal covariance; the variances may be different]

19
Diagonal S, equal variances
• Nearest mean classifier: classify based on the Euclidean distance to the nearest mean:

g_i(\mathbf{x}) = -\frac{\| \mathbf{x} - \mathbf{m}_i \|^2}{2s^2} + \log \hat{P}(C_i)
= -\frac{1}{2s^2} \sum_{j=1}^{d} \left( x_j^t - m_{ij} \right)^2 + \log \hat{P}(C_i)

• Each mean can be considered a prototype or template, and this is template matching.

20
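A sketch of the nearest mean (template matching) rule; with equal priors and equal variances the class with the closest mean wins (the class means are assumptions):

import numpy as np

means = np.array([[0.0, 0.0],
                  [3.0, 3.0],
                  [0.0, 4.0]])                 # hypothetical class means (templates) m_i

def nearest_mean(x):
    d2 = np.sum((means - x)**2, axis=1)        # squared Euclidean distance ||x - m_i||^2
    return np.argmin(d2)                       # smallest distance = largest g_i(x)

print(nearest_mean(np.array([2.5, 2.0])))      # -> 1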
Diagonal S, equal variances
[Figure omitted]

21
Model Selection

Assumption                    Covariance matrix         Number of parameters
Shared, hyperspheric          S_i = S = s^2 I           1
Shared, axis-aligned          S_i = S, with s_ij = 0    d
Shared, hyperellipsoidal      S_i = S                   d(d+1)/2
Different, hyperellipsoidal   S_i                       K d(d+1)/2

• As we increase complexity (a less restricted S), bias decreases and variance increases.
• Assume simple models (allow some bias) to control variance (regularization).

22
Discrete Features
• Binary features: p_{ij} \equiv p(x_j = 1 \mid C_i)
• If the x_j are independent (Naive Bayes'):

p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j} (1 - p_{ij})^{(1 - x_j)}

• The discriminant is linear:

g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i)
= \sum_j \left[ x_j \log p_{ij} + (1 - x_j) \log (1 - p_{ij}) \right] + \log P(C_i)

• Estimated parameters:

\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}

23
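A sketch of the binary-feature Naive Bayes classifier (the toy data are assumptions; the +1/+2 smoothing that keeps the logarithms finite is an addition, not on the slide):

import numpy as np

X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]])                      # hypothetical binary features
y = np.array([0, 0, 1, 1])

classes = np.unique(y)
priors = np.array([(y == c).mean() for c in classes])
# p_hat_ij = sum_t x_j^t r_i^t / sum_t r_i^t, here with Laplace smoothing (assumption)
p = np.array([(X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in classes])

def g(x, i):
    return np.sum(x*np.log(p[i]) + (1 - x)*np.log(1 - p[i])) + np.log(priors[i])

x_new = np.array([1, 0, 1])
print(classes[np.argmax([g(x_new, i) for i in range(len(classes))])])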
Discrete Features
• Multinomial (1-of-n_j) features: x_j \in \{v_1, v_2, \ldots, v_{n_j}\}

p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)

• If the x_j are independent:

p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}

g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)

\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}

24
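A sketch of estimating p_hat_ijk for one multinomial feature (the categorical toy data and the +1 smoothing are assumptions): each value is one-hot encoded into the indicators z_jk.

import numpy as np

xj = np.array([0, 1, 1, 2, 0, 2])              # hypothetical feature with n_j = 3 values
y  = np.array([0, 0, 0, 1, 1, 1])
n_values = 3

Z = np.eye(n_values)[xj]                       # one-hot indicators z_jk per instance
classes = np.unique(y)
# p_hat_ijk = sum_t z_jk^t r_i^t / sum_t r_i^t (with +1 smoothing, an assumption)
p = np.array([(Z[y == c].sum(axis=0) + 1) / ((y == c).sum() + n_values) for c in classes])
print(p)                                       # row i, column k: estimate of p(x_j = v_k | C_i)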
Multivariate Regression

r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \epsilon

• Multivariate linear model:

g(\mathbf{x}^t) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t

E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2} \sum_t \left[ r^t - \left( w_0 + w_1 x_1^t + \cdots + w_d x_d^t \right) \right]^2

• Multivariate polynomial model: define new higher-order variables
  z_1 = x_1, z_2 = x_2, z_3 = x_1^2, z_4 = x_2^2, z_5 = x_1 x_2
  and use the linear model in this new z space (basis functions, kernel trick, SVM: Chapter 10).

25
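A sketch of fitting the multivariate linear model by least squares, i.e. minimizing the error function E above (the synthetic data are assumptions; np.linalg.lstsq returns the minimizer of the sum of squared errors):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))                  # hypothetical inputs, d = 2
r = 1.0 + 2.0*X[:, 0] - 3.0*X[:, 1] + rng.normal(scale=0.1, size=200)

D = np.column_stack([np.ones(len(X)), X])      # augment with a column of 1s for w_0
w, *_ = np.linalg.lstsq(D, r, rcond=None)      # [w_0, w_1, w_2], approx [1, 2, -3]
print(w)

# Polynomial model: add higher-order z variables and reuse the same linear solver
Z = np.column_stack([X, X[:, 0]**2, X[:, 1]**2, X[:, 0]*X[:, 1]])
Dz = np.column_stack([np.ones(len(X)), Z])
wz, *_ = np.linalg.lstsq(Dz, r, rcond=None)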
