02 Murphy Multi Variate Distanc

This document discusses multivariate distance and similarity measures. It defines key terms like the multivariate sample mean, multivariate variance, and covariance matrix. It then describes different distance measures, including univariate distance, univariate z-score distance, bivariate Euclidean distance, and multivariate Euclidean distance. It notes that Euclidean distance does not account for differences in variable variances or correlations. The document concludes by introducing the Mahalanobis distance, which accounts for these factors by using the covariance matrix.

Multivariate Distance and Similarity
Robert F. Murphy
Cytometry Development Workshop 2000

General Multivariate Dataset

We are given values of p variables for n independent observations.
Construct an n × p matrix M consisting of vectors X_1 through X_n, each of length p.

Multivariate Sample Mean

Define the mean vector I of length p:

$$I(j) = \frac{\sum_{i=1}^{n} M(i,j)}{n}$$ (matrix notation)

or

$$I = \frac{\sum_{i=1}^{n} X_i}{n}$$ (vector notation)
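As a minimal sketch of the two notations above, the following computes the mean vector with NumPy. The 5 × 2 data matrix is hypothetical and chosen only so the arithmetic is easy to check; note that NumPy indexes from 0, whereas the slides index from 1.

```python
import numpy as np

# Hypothetical data: n = 5 observations (rows) of p = 2 variables (columns).
M = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, 8.0],
    [5.0, 5.0],
])
n, p = M.shape

# Matrix notation: I(j) is the sum over i of M(i, j), divided by n.
I_loop = np.array([sum(M[i, j] for i in range(n)) / n for j in range(p)])

# Vector notation: average the row vectors X_1 ... X_n in one call.
I = M.mean(axis=0)

print(I)  # mean of variable 1 is 3.0, mean of variable 2 is 5.0
```

Both forms give the same vector; the vectorized call is simply the idiomatic way to write it.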

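Multivariate Variance

Define the variance vector σ² of length p:

$$\sigma^2(j) = \frac{\sum_{i=1}^{n} \left(M(i,j) - I(j)\right)^2}{n-1}$$ (matrix notation)

or

$$\sigma^2 = \frac{\sum_{i=1}^{n} (X_i - I)^2}{n-1}$$ (vector notation, with the square taken element by element)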
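The sample variance with the n − 1 denominator can be sketched as follows. The data matrix is a hypothetical example; `ddof=1` is the NumPy argument that selects the n − 1 denominator.

```python
import numpy as np

# Hypothetical data: n = 5 observations (rows) of p = 2 variables (columns).
M = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, 8.0],
    [5.0, 5.0],
])
n = M.shape[0]
I = M.mean(axis=0)  # mean vector

# Explicit form: squared deviations from the mean, summed over
# observations and divided by n - 1.
var_explicit = ((M - I) ** 2).sum(axis=0) / (n - 1)

# Equivalent built-in; ddof=1 gives the n - 1 denominator.
var = M.var(axis=0, ddof=1)

print(var)  # variances of the two variables: 2.5 and 5.0
```

Leaving `ddof` at its default of 0 would divide by n instead, which does not match the definition above.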

Covariance Matrix

Define a p × p matrix cov (called the covariance matrix) analogous to σ²:

$$\mathrm{cov}(j,k) = \frac{\sum_{i=1}^{n} \left(M(i,j) - I(j)\right)\left(M(i,k) - I(k)\right)}{n-1}$$

Note that the covariance of a variable with itself is simply the variance of that variable:

$$\mathrm{cov}(j,j) = \sigma^2(j)$$
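A short sketch of the covariance matrix, again on a hypothetical data matrix. It computes the matrix from the definition, compares it with `np.cov`, and checks that the diagonal holds the per-variable variances.

```python
import numpy as np

# Hypothetical data: n = 5 observations (rows) of p = 2 variables (columns).
M = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, 8.0],
    [5.0, 5.0],
])
n = M.shape[0]

# Subtract the mean vector from every observation.
centered = M - M.mean(axis=0)

# Definition: cov(j, k) is the sum over i of the products of deviations,
# divided by n - 1. The matrix product computes all p x p entries at once.
cov_explicit = centered.T @ centered / (n - 1)

# Built-in equivalent; rowvar=False says that columns are the variables.
cov = np.cov(M, rowvar=False)

# The diagonal of the covariance matrix is the variance vector.
variances = M.var(axis=0, ddof=1)

print(cov)
```

For this data the result is a 2 × 2 matrix with 2.5 and 5.0 on the diagonal and 2.5 off the diagonal.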

Univariate Distance

The simple distance between the values of a single variable j for two observations i and l is

$$M(i,j) - M(l,j)$$

Univariate z-score Distance

To measure distance in units of standard deviation between the values of a single variable j for two observations i and l, we define the z-score distance

$$\frac{M(i,j) - M(l,j)}{\sigma(j)}$$
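The univariate distance and its z-score version can be sketched as below. The data matrix and the choice of observations and variable are hypothetical, and the indices are 0-based, so `i = 0` is the first observation.

```python
import numpy as np

# Hypothetical data: n = 5 observations (rows) of p = 2 variables (columns).
M = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, 8.0],
    [5.0, 5.0],
])

i, l, j = 0, 3, 0  # two observations and one variable (0-based indices)

# Simple univariate distance: a signed difference, as in the formula above.
diff = M[i, j] - M[l, j]  # 1.0 - 4.0 = -3.0

# Sample standard deviation of each variable (n - 1 denominator).
sigma = M.std(axis=0, ddof=1)

# z-score distance: the same difference in units of standard deviation.
z = diff / sigma[j]

print(diff, z)
```

Taking `abs(diff)` and `abs(z)` would give unsigned distances if the direction of the difference does not matter.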

Bivariate Euclidean Distance

The most commonly used measure of distance between two observations i and l on two variables j and k is the Euclidean distance

$$\sqrt{\left(M(i,j) - M(l,j)\right)^2 + \left(M(i,k) - M(l,k)\right)^2}$$

Multivariate Euclidean Distance

This can be extended to more than two variables:

$$\sqrt{\sum_{j=1}^{p} \left(M(i,j) - M(l,j)\right)^2}$$
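A minimal sketch of the Euclidean distance between two observations, written first for two variables and then for any number of variables. The data matrix and the chosen observations are hypothetical.

```python
import numpy as np

# Hypothetical data: n = 5 observations (rows) of p = 2 variables (columns).
M = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, 8.0],
    [5.0, 5.0],
])

i, l = 0, 3  # the two observations to compare (0-based indices)

# Bivariate form: variables j = 0 and k = 1 written out explicitly.
d_bivariate = np.sqrt((M[i, 0] - M[l, 0]) ** 2 + (M[i, 1] - M[l, 1]) ** 2)

# Multivariate form: sum the squared differences over all p variables.
d_multivariate = np.sqrt(((M[i] - M[l]) ** 2).sum())

# The same quantity via the vector 2-norm.
d_norm = np.linalg.norm(M[i] - M[l])

print(d_multivariate)  # square root of 45, about 6.708
```

With p = 2 the three results coincide; the last two forms work unchanged for any p.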

Effects of variance and covariance on Euclidean distance

[Figure: an ellipse with two labeled points, A and B.]

The ellipse shows the 50% contour of a hypothetical population.

Points A and B have similar Euclidean distances from the mean, but point B is clearly more different from the population than point A.

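Mahalanobis Distance

To account for differences in variance between the variables, and to account for correlations between variables, we use the Mahalanobis distance

$$D^2 = (X_i - X_l)\,\mathrm{cov}^{-1}\,(X_i - X_l)^T$$

Here X_i and X_l are row vectors of length p, so the right-hand side is a scalar, and D is its square root.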
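The following sketch illustrates the Mahalanobis distance on a hypothetical dataset. The helper `mahalanobis` and the points `A` and `B` are assumptions made for illustration, not values taken from the slides; the points are placed at equal Euclidean distance from the mean, one along the direction of correlation and one across it.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance D between two length-p vectors x and y."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # D^2 = diff * cov^-1 * diff^T; for 1-D arrays, @ handles the transpose.
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Hypothetical data: n = 5 observations (rows) of p = 2 variables (columns).
M = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, 8.0],
    [5.0, 5.0],
])
mean = M.mean(axis=0)           # [3, 5]
cov = np.cov(M, rowvar=False)   # [[2.5, 2.5], [2.5, 5.0]]

# Hypothetical points, both at Euclidean distance sqrt(5) from the mean.
A = mean + np.array([1.0, 2.0])    # along the positive correlation
B = mean + np.array([2.0, -1.0])   # against the correlation

eucl_A = np.linalg.norm(A - mean)
eucl_B = np.linalg.norm(B - mean)
d_A = mahalanobis(A, mean, cov)    # D^2 = 0.8
d_B = mahalanobis(B, mean, cov)    # D^2 = 5.2

print(eucl_A, eucl_B)  # equal Euclidean distances
print(d_A, d_B)        # B is farther than A in Mahalanobis terms
```

The same function applies to any pair of observations, for example `mahalanobis(M[0], M[3], cov)`. For large p or a nearly singular covariance matrix, `np.linalg.solve(cov, diff)` is generally preferable to forming the inverse explicitly.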
