Applied Multivariate Statistics - Review
Sample covariance: $\widehat{\mathrm{Cov}}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
Sample correlation: $r_{xy} = \widehat{\mathrm{Cor}}(x, y) = \frac{\widehat{\mathrm{Cov}}(x, y)}{\hat{\sigma}_x \hat{\sigma}_y}$
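A minimal R sketch of both estimators (the toy vectors x and y are made up for illustration):

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
cov(x, y)   # sample covariance, with the 1/(n-1) scaling
cor(x, y)   # equals cov(x, y) / (sd(x) * sd(y))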
Scatterplot: Correlation is scale invariant
Intuition and pitfalls for correlation
Correlation captures only the LINEAR relation
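A quick R illustration of the pitfall: a perfect but non-linear dependence can have correlation near zero (the symmetric grid x is a made-up example):

x <- seq(-3, 3, by = 0.1)   # symmetric around 0
y <- x^2                    # y is fully determined by x, but not linearly
cor(x, y)                   # essentially 0: correlation misses the relation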
Covariance matrix / correlation matrix:
Table of pairwise values
True covariance matrix: $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$
True correlation matrix: $C_{ij} = \mathrm{Cor}(X_i, X_j)$
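In R, cov() and cor() applied to a data frame return exactly these pairwise tables; a sketch on the built-in mtcars data (the column choice is arbitrary):

cov(mtcars[, c("mpg", "hp", "wt")])   # sample covariance matrix
cor(mtcars[, c("mpg", "hp", "wt")])   # sample correlation matrix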
Sq. Mahalanobis distance: $MD^2(x) = (x - \mu)^T \Sigma^{-1} (x - \mu)$
= squared distance from the mean in standard deviations, IN THE DIRECTION OF x
Multivariate Normal Distribution: most common model choice
$f(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^p \, |\Sigma|}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$
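R's mahalanobis() computes the squared distance MD^2 directly; a sketch with center and covariance estimated from data (mtcars and the column choice are just for illustration):

X <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
md2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))  # squared distances
head(sqrt(md2))                                            # Mahalanobis distances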
Mahalanobis distance: Example
$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 25 & 0 \\ 0 & 1 \end{pmatrix}$
Point (0, 10): MD = 10
Mahalanobis distance: Example
$\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 25 & 0 \\ 0 & 1 \end{pmatrix}$
Point (10, 7): MD = 7.3
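Both example values can be checked in R; note that mahalanobis() returns MD^2, so take the square root:

mu    <- c(0, 0)
Sigma <- matrix(c(25, 0, 0, 1), nrow = 2)
sqrt(mahalanobis(c(0, 10), mu, Sigma))  # 10: ten sds along the 2nd axis
sqrt(mahalanobis(c(10, 7), mu, Sigma))  # sqrt(100/25 + 49/1) = sqrt(53) = 7.28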
Glyphplots: Stars
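Star glyphs come from base R's stars(); a sketch on a few rows of mtcars (the subset is arbitrary):

stars(mtcars[1:9, 1:7])   # one star per car, one ray per variable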
Mosaic plot with shading
R: Function "mosaic" in package "vcd"
Shading flags surprisingly small and surprisingly large observed cell counts
p-value of independence test: highly significant
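A sketch with the vcd function named above, on the built-in HairEyeColor table (the two-way margin is an arbitrary choice); shade = TRUE adds the residual-based shading and the independence-test p-value in the legend:

library(vcd)
tab <- margin.table(HairEyeColor, c(1, 2))  # Hair x Eye counts
mosaic(tab, shade = TRUE)                   # shaded cells = surprising counts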
Outliers: Theory of Mahalanobis Distance
Outliers: Check for multivariate outlier
Outliers: chisq.plot
Outlier easily detected!
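chisq.plot lives in the mvoutlier package; a minimal sketch, assuming iris as the example data (the function plots squared Mahalanobis distances against chi-square quantiles and is interactive):

library(mvoutlier)
X <- as.matrix(iris[, 1:4])
chisq.plot(X)   # points far above the line are multivariate outlier candidates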
Missing values: Problem of Single Imputation
Remedy (multiple imputation):
1. Impute several times
2. Do the standard analysis for each imputed data set; get estimate and std. error
3. Aggregate the results
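A sketch of this workflow with the mice package (nhanes is a small example data set shipped with mice; the regression is arbitrary):

library(mice)
imp  <- mice(nhanes, m = 5, printFlag = FALSE)  # impute several times
fits <- with(imp, lm(chl ~ bmi))                # standard analysis per imputed set
summary(pool(fits))                             # aggregate estimates and std. errors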
Idea of MDS
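Classical MDS in R is cmdscale(); a sketch on the built-in eurodist road distances between European cities:

fit <- cmdscale(eurodist, k = 2)        # 2-D configuration preserving distances
plot(fit, type = "n")
text(fit, labels = labels(eurodist))    # cities placed by their mutual distances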
Distance: To scale or not to scale…
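The choice matters because dist() in raw units is dominated by the variables with the largest numbers; a sketch contrasting the two options on USArrests (an arbitrary built-in example):

d_raw    <- dist(USArrests)          # raw units: Assault dominates the distances
d_scaled <- dist(scale(USArrests))   # each variable standardized to sd = 1 first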
Dissimilarity for mixed data: Gower's Dissimilarity
Aggregate: $d(i, j) = \frac{1}{p} \sum_{f=1}^{p} d_{ij}^{(f)}$
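Gower's dissimilarity is implemented in daisy() from the cluster package; flower is a small mixed-type example data set in that package:

library(cluster)
d <- daisy(flower, metric = "gower")  # averages per-variable dissimilarities d_ij^(f)
as.matrix(d)[1:3, 1:3]                # pairwise dissimilarities in [0, 1]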
PCA (Version 1): Orthogonal directions
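A minimal PCA sketch with prcomp(); scale. = TRUE works on standardized variables (USArrests is again just a convenient example):

pc <- prcomp(USArrests, scale. = TRUE)  # orthogonal directions of maximal variance
pc$rotation                             # loadings: the PC directions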
How many PCs: Blood Example
Rule 1: 5 PCs
Rule 2: 3 PCs
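The usual inputs to such rules can be read off summary() and the scree plot (the 5 vs. 3 counts above are specific to the blood example, not reproduced here):

pc <- prcomp(USArrests, scale. = TRUE)
summary(pc)                     # proportion and cumulative variance per PC
screeplot(pc, type = "lines")   # look for the elbow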
Biplot: Show info on samples AND variables
Approximately true:
• Data points: projection on first two PCs; distance in biplot ~ true distance
• Projection of a sample onto an arrow gives the original (scaled) value of that variable
• Arrow length: variance of the variable
• Angle between arrows: correlation
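These properties can be inspected with base R's biplot():

biplot(prcomp(USArrests, scale. = TRUE))  # points = samples, arrows = variables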
Supervised Learning: LDA
$P(C|X) = \frac{P(C)\,P(X|C)}{P(X)} \propto P(C)\,P(X|C)$
Prior / prevalence $P(C)$: fraction of samples in that class
$P(X|C)$: find some estimate; assume $X|C \sim N(\mu_C, \Sigma)$
Bayes rule: choose the class where $P(C|X)$ is maximal
(rule is "optimal" if all types of error are equally costly)
In practice: estimate $P(C)$, $\mu_C$, $\Sigma$
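A sketch of this estimation step with MASS::lda(), which fits exactly these quantities from the data (iris is an arbitrary example):

library(MASS)
fit <- lda(Species ~ ., data = iris)  # estimates P(C), mu_C, pooled Sigma
fit$prior                             # class fractions used as priors
predict(fit, iris[1, ])$class         # class with maximal posterior P(C|X)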
LDA: Orthogonal directions of best separation
1st Linear Discriminant = 1st Canonical Variable (contrast with the 1st Principal Component)
Linear decision boundary
Classify to which class? Consider:
• Prior
• Mahalanobis distance to class center
LDA: Quality of classification
Training error vs. test error
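A sketch of the honest evaluation: fit on a training set, measure misclassification on a held-out test set (the 2/3 split and iris data are arbitrary choices):

library(MASS)
set.seed(1)
train <- sample(nrow(iris), 100)                       # 2/3 of the data for training
fit   <- lda(Species ~ ., data = iris, subset = train)
pred  <- predict(fit, iris[-train, ])$class
mean(pred != iris$Species[-train])                     # test misclassification rate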