
Module 2, Part 2: Random vectors, covariance, multivariate Normal distribution


TMA4268 Statistical Learning V2025

Stefanie Muff, Department of Mathematical Sciences, NTNU

January 17, 2025

1 / 53
Overview

• Random vectors

• The covariance and correlation matrix

• The multivariate normal distribution

2 / 53
Random vector
• A random vector X_{(p×1)} is a p-dimensional vector of
random variables. For example:
  • Weight of cork deposits in p = 4 directions (N, E, S, W).
  • Factors to predict body fat: bmi, age, weight, hip
    circumference, ...
• Joint distribution function: f(x).
• From the joint distribution function to marginal (and
conditional) distributions:

$$f_1(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_p)\, dx_2 \cdots dx_p$$

• The cumulative distribution (definite integrals!) is used to
calculate probabilities.
3 / 53
Moments
The moments are important properties of the distribution of X.
We will look at:
• E: Mean of random vector and random matrices.
• Cov: Covariance matrix.
• Corr: Correlation matrix.
• E and Cov of multiple linear combinations.

4 / 53
The Cork deposit data

• Classical multivariate data set from Rao (1948).


• Weight of bark deposits of n = 28 cork trees in p = 4
directions (N, E, S, W).
corkds=as.matrix(
  read.table("https://www.math.ntnu.no/emner/TMA4268/2019v/data/corkMKB.txt")
)
dimnames(corkds)[[2]]=c("N","E","S","W")
head(corkds)
## N E S W
## [1,] 72 66 76 77
## [2,] 60 53 66 63
## [3,] 56 57 64 58
## [4,] 41 29 36 38
## [5,] 32 32 35 36
## [6,] 30 35 34 26
dim(corkds)
## [1] 28 4

5 / 53
Look at the data (always the first thing to do):
library(GGally)
corkds <- as.data.frame(corkds)
ggpairs(corkds)

[ggpairs plot: scatterplot matrix of N, E, S, W with marginal densities on the diagonal
and pairwise correlations above: Corr(N,E) = 0.885, Corr(N,S) = 0.905, Corr(N,W) = 0.883,
Corr(E,S) = 0.826, Corr(E,W) = 0.769, Corr(S,W) = 0.923.]
6 / 53
• Here we have a random sample of n = 28 cork trees from
the population and observe a p = 4 dimensional random
vector for each tree.
• This leads us to the definition of random vectors and a
random matrix for cork trees:

 
$$X_{(28\times 4)} = \begin{pmatrix}
X_{11} & X_{12} & X_{13} & X_{14} \\
X_{21} & X_{22} & X_{23} & X_{24} \\
X_{31} & X_{32} & X_{33} & X_{34} \\
\vdots & \vdots & \vdots & \vdots \\
X_{28,1} & X_{28,2} & X_{28,3} & X_{28,4}
\end{pmatrix}$$

7 / 53
The mean vector

• Random vector X_{(p×1)} with mean vector µ_{(p×1)}:

$$X_{(p\times 1)} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix},
\quad \text{and} \quad
\mu_{(p\times 1)} = E(X) = \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_p) \end{pmatrix}.$$

• Note that E(X_j) is calculated from the marginal
distribution of X_j and contains no information about
dependencies between X_j and X_k for k ≠ j.

8 / 53
Rules for the mean I

Random matrix X_{(n×p)} and random matrix Y_{(n×p)}:

E(X + Y ) = E(X) + E(Y ) .

(Rules of vector and matrix addition)

9 / 53
Rules for the mean II
• Random matrix X_{(n×p)} and conformable constant matrices
A and B:
E(AXB) = A E(X) B
Proof: Board
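The proof is done on the board; as a quick numerical illustration (a sketch with arbitrarily chosen matrices, not part of the original slides), note that by linearity the same identity holds exactly for elementwise sample means of simulated random matrices:

set.seed(1)
A <- matrix(c(1, 2, 0, 1), nrow = 2)                # 2 x 2 constant matrix
B <- matrix(c(1, 0, 1, 1, 2, 0), nrow = 3)          # 3 x 2 constant matrix
Xs <- replicate(1000, matrix(rnorm(6), nrow = 2))   # 1000 simulated 2 x 3 random matrices
EX <- apply(Xs, c(1, 2), mean)                      # elementwise sample mean of X
AXB <- apply(Xs, 3, function(X) A %*% X %*% B)      # A X B for each simulated X
EAXB <- matrix(rowMeans(AXB), nrow = 2)             # elementwise sample mean of A X B
all.equal(EAXB, A %*% EX %*% B)                     # TRUE: sample version of E(AXB) = A E(X) B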

10 / 53
Q:
• What are the univariate analogues of the formulas on the
previous two slides (which you studied in your first
introductory course in statistics)?

11 / 53
The covariance
In the introductory statistics course we defined the covariance

$$\sigma_{ij} = \mathrm{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)] = E(X_i \cdot X_j) - \mu_i \mu_j.$$

• What is the covariance called when i = j?


• What does it mean when the covariance is
• negative
• zero
• positive?

12 / 53
Make a scatter plot for negative, zero and positive correlation
(see also R example).
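A minimal sketch of such scatter plots (the sample size and correlation values below are arbitrary choices):

set.seed(123)
n <- 200
x <- rnorm(n)
par(mfrow = c(1, 3))
for (rho in c(-0.9, 0, 0.9)) {
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)   # gives Cor(x, y) = rho in the population
  plot(x, y, main = paste("rho =", rho))
}
par(mfrow = c(1, 1))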

13 / 53
Variance-covariance matrix
• Consider a random vector X_{(p×1)} with mean vector µ_{(p×1)}:

$$X_{(p\times 1)} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix},
\quad \text{and} \quad
\mu_{(p\times 1)} = E(X) = \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_p) \end{pmatrix}$$

• Variance-covariance matrix Σ (real and symmetric)

$$\Sigma = \mathrm{Cov}(X) = E[(X - \mu)(X - \mu)^T]
= \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{1p} & \sigma_{2p} & \cdots & \sigma_p^2
\end{pmatrix}
= E(XX^T) - \mu\mu^T$$

14 / 53
15 / 53
• The diagonal elements of Σ, σ_ii = σ_i², are the variances.
• The off-diagonal elements are the covariances
σ_ij = E[(X_i − µ_i)(X_j − µ_j)] = σ_ji.
• Σ is called the variance, covariance or variance-covariance
matrix, and is denoted both Var(X) and Cov(X).
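For the cork data the sample version of Σ can be computed directly from the centred data matrix; this small check (assuming corkds from the earlier slides) reproduces what var()/cov() in R returns, with the usual 1/(n − 1) denominator:

Xc <- scale(corkds, center = TRUE, scale = FALSE)     # subtract the column means
S <- t(Xc) %*% Xc / (nrow(corkds) - 1)                # sample variance-covariance matrix
all.equal(S, var(corkds), check.attributes = FALSE)   # TRUE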

16 / 53
Exercise: the variance-covariance matrix
Let X_{(4×1)} have variance-covariance matrix

$$\Sigma = \begin{pmatrix}
2 & 1 & 0 & 0 \\
1 & 2 & 0 & 1 \\
0 & 0 & 2 & 1 \\
0 & 1 & 1 & 2
\end{pmatrix}.$$

Explain what this means.

17 / 53
Correlation matrix

Correlation matrix ρ (real and symmetric):

$$\rho = \begin{pmatrix}
\frac{\sigma_1^2}{\sqrt{\sigma_1^2\sigma_1^2}} & \frac{\sigma_{12}}{\sqrt{\sigma_1^2\sigma_2^2}} & \cdots & \frac{\sigma_{1p}}{\sqrt{\sigma_1^2\sigma_p^2}} \\
\frac{\sigma_{12}}{\sqrt{\sigma_1^2\sigma_2^2}} & \frac{\sigma_2^2}{\sqrt{\sigma_2^2\sigma_2^2}} & \cdots & \frac{\sigma_{2p}}{\sqrt{\sigma_2^2\sigma_p^2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\sigma_{1p}}{\sqrt{\sigma_1^2\sigma_p^2}} & \frac{\sigma_{2p}}{\sqrt{\sigma_2^2\sigma_p^2}} & \cdots & \frac{\sigma_p^2}{\sqrt{\sigma_p^2\sigma_p^2}}
\end{pmatrix}
= \begin{pmatrix}
1 & \rho_{12} & \cdots & \rho_{1p} \\
\rho_{12} & 1 & \cdots & \rho_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{1p} & \rho_{2p} & \cdots & 1
\end{pmatrix}$$

$$\rho = (V^{\frac{1}{2}})^{-1}\, \Sigma\, (V^{\frac{1}{2}})^{-1},
\quad \text{where} \quad
V^{\frac{1}{2}} = \begin{pmatrix}
\sqrt{\sigma_1^2} & 0 & \cdots & 0 \\
0 & \sqrt{\sigma_2^2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{\sigma_p^2}
\end{pmatrix}$$

18 / 53
Exercise: the correlation matrix

Let X_{(4×1)} have variance-covariance matrix

$$\Sigma = \begin{pmatrix}
2 & 1 & 0 & 0 \\
1 & 2 & 0 & 1 \\
0 & 0 & 2 & 1 \\
0 & 1 & 1 & 2
\end{pmatrix}.$$

Find the correlation matrix.

19 / 53
A:
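One way to check the hand calculation is cov2cor() in R; a minimal sketch for the Σ given in the exercise (here every variance is 2, so each non-zero covariance of 1 becomes a correlation of 1/2):

Sigma <- matrix(c(2, 1, 0, 0,
                  1, 2, 0, 1,
                  0, 0, 2, 1,
                  0, 1, 1, 2), nrow = 4, byrow = TRUE)
cov2cor(Sigma)   # divides each sigma_ij by sigma_i * sigma_j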

20 / 53
Linear combinations

Consider a random vector X_{(p×1)} with mean vector µ = E(X)
and variance-covariance matrix Σ = Cov(X).
The linear combinations

$$Z = CX = \begin{pmatrix}
\sum_{j=1}^p c_{1j} X_j \\
\sum_{j=1}^p c_{2j} X_j \\
\vdots \\
\sum_{j=1}^p c_{kj} X_j
\end{pmatrix}$$

have
E(Z) = E(CX) = Cµ
Cov(Z) = Cov(CX) = CΣC^T
Exercise: Follow the proof - what are the most important
transitions?
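One key transition is that covariance is bilinear: Cov(CX) = C Cov(X) C^T. The same identity holds exactly for sample covariance matrices, which gives a quick numerical check (a sketch; this C is an arbitrary 2 × 4 matrix, not the contrast matrix of the exercise on the next slides, and corkds is the cork data loaded earlier):

C <- matrix(c(1, -1, 0,  0,
              0,  2, 1, -1), nrow = 2, byrow = TRUE)
Z <- as.matrix(corkds) %*% t(C)                 # rows of Z are z_i = C x_i
all.equal(cov(Z), C %*% cov(corkds) %*% t(C),
          check.attributes = FALSE)             # TRUE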

21 / 53
22 / 53
Exercise: Linear combinations

$$X = \begin{pmatrix} X_N \\ X_E \\ X_S \\ X_W \end{pmatrix},
\quad \mu = \begin{pmatrix} \mu_N \\ \mu_E \\ \mu_S \\ \mu_W \end{pmatrix},
\quad \text{and} \quad
\Sigma = \begin{pmatrix}
\sigma_N^2 & \sigma_{NE} & \sigma_{NS} & \sigma_{NW} \\
\sigma_{NE} & \sigma_E^2 & \sigma_{ES} & \sigma_{EW} \\
\sigma_{NS} & \sigma_{ES} & \sigma_S^2 & \sigma_{SW} \\
\sigma_{NW} & \sigma_{EW} & \sigma_{SW} & \sigma_W^2
\end{pmatrix}$$

Scientists would like to compare the following three contrasts:
N−S, E+W and (E+W)−(N+S), and define a new random vector
Y_{(3×1)} = C_{(3×4)} X_{(4×1)} giving the three contrasts.
• Write down C.
• Explain how to find E(Y_1) and Cov(Y_1, Y_3).
• Use R to find the mean vector, covariance matrix and
correlation matrix of Y, using the mean vector and
covariance matrix for X given below.

23 / 53
Find C such that Y_{(3×1)} = C_{(3×4)} X_{(4×1)} gives the three
contrasts above:

24 / 53
Cov(Y ) = Cov(CX) = ...

25 / 53
corkds <- as.matrix(read.table("https://www.math.ntnu.no/emner/TMA4268/2019v/data/corkMKB.txt"))
dimnames(corkds)[[2]] <- c("N","E","S","W")
mu=apply(corkds,2,mean)
mu
Sigma=var(corkds)
Sigma

## N E S W
## 50.53571 46.17857 49.67857 45.17857
## N E S W
## N 290.4061 223.7526 288.4378 226.2712
## E 223.7526 219.9299 229.0595 171.3743
## S 288.4378 229.0595 350.0040 259.5410
## W 226.2712 171.3743 259.5410 226.0040
(C <- matrix(c(1,0,-1,0,0,1,0,1,-1,1,-1,1),byrow=T,nrow=3))

## [,1] [,2] [,3] [,4]


## [1,] 1 0 -1 0
## [2,] 0 1 0 1
## [3,] -1 1 -1 1
C %*% Sigma %*% t(C)

## [,1] [,2] [,3]


## [1,] 63.53439 -38.57672 21.02116
## [2,] -38.57672 788.68254 -149.94180
## [3,] 21.02116 -149.94180 128.71958

26 / 53
The covariance matrix - more requirements?

Random vector X_{(p×1)} with mean vector µ_{(p×1)} and covariance
matrix

$$\Sigma = \mathrm{Cov}(X) = E[(X - \mu)(X - \mu)^T]
= \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{1p} & \sigma_{2p} & \cdots & \sigma_p^2
\end{pmatrix}$$

27 / 53
• The covariance matrix is by construction symmetric, and it
is common to require that the covariance matrix is positive
semidefinite. This means that, for every vector b ≠ 0,

$$b^T \Sigma b \ge 0.$$

• Why do you think that is?

Hint: Is it possible that the variance of the linear combination
Y = b^T X is negative?
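A covariance matrix has only non-negative eigenvalues, and b^T Σ b is exactly the variance of b^T X. A small sketch using the Σ from the earlier exercises:

Sigma <- matrix(c(2, 1, 0, 0,
                  1, 2, 0, 1,
                  0, 0, 2, 1,
                  0, 1, 1, 2), nrow = 4, byrow = TRUE)
eigen(Sigma)$values          # all eigenvalues are >= 0, so Sigma is positive semidefinite
b <- c(1, -2, 0.5, 1)        # an arbitrary non-zero vector
t(b) %*% Sigma %*% b         # Var(b^T X): never negative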

28 / 53
Random vectors - Single-choice exercise

Quiz on www.menti.com

29 / 53
Question 1: Mean of sum
X and Y are two bivariate random vectors with E(X) = (1, 2)^T
and E(Y) = (2, 0)^T. What is E(X + Y)?

• A: (1.5, 1)^T
• B: (3, 2)^T
• C: (−1, 2)^T
• D: (1, −2)^T

30 / 53
Question 2: Mean of linear combination
X is a 2-dimensional random vector with E(X) = (2, 5)^T, and
b = (0.5, 0.5)^T is a constant vector. What is E(b^T X)?

• A: 3.5
• B: 7
• C: 2
• D: 5

31 / 53
Question 3: Covariance
X is a p-dimensional random vector with mean µ. Which of the
following defines the covariance matrix?

• A: E[(X − µ)^T (X − µ)]
• B: E[(X − µ)(X − µ)^T]
• C: E[(X − µ)(X − µ)]
• D: E[(X − µ)^T (X − µ)^T]

32 / 53
Question 4: Mean of linear combinations
X is a p-dimensional random vector with mean µ and covariance
matrix Σ. C is a constant matrix. What is then the mean of the
k-dimensional random vector Y = CX?

• A: Cµ
• B: CΣ
• C: CµC^T
• D: CΣC^T

33 / 53
Question 5: Covariance of linear combinations
X is a p-dimensional random vector with mean µ and covariance
matrix Σ. C is a constant matrix. What is then the covariance
of the k-dimensional random vector Y = CX?

• A: Cµ
• B: CΣ
• C: CµC^T
• D: CΣC^T

34 / 53
Question 6: Correlation
X is a 2-dimensional random vector with covariance matrix

$$\Sigma = \begin{pmatrix} 4 & 0.8 \\ 0.8 & 1 \end{pmatrix}$$

The correlation between the two elements of X is:

• A: 0.10
• B: 0.25
• C: 0.40
• D: 0.80

35 / 53
The multivariate normal distribution

Why is the mvN so popular?


• Many natural phenomena may be modelled using this
distribution (just as in the univariate case).
• Multivariate version of the central limit theorem: the sample
mean will be approximately multivariate normal for large
samples.
• Good interpretability of the covariance.
• Mathematically tractable.
• Building block in many models and methods.

36 / 53
3D multivariate Normal distributions
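The 3D surface plots on this slide are not reproduced here; a minimal sketch that draws a similar surface for a bivariate normal density, assuming the mvtnorm package is installed, is:

library(mvtnorm)                               # assumed available; provides dmvnorm()
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
x <- seq(-3, 3, length.out = 60)
y <- seq(-3, 3, length.out = 60)
z <- outer(x, y, function(x, y) dmvnorm(cbind(x, y), mean = mu, sigma = Sigma))
persp(x, y, z, theta = 30, phi = 25, xlab = "x1", ylab = "x2", zlab = "density")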

37 / 53
The multivariate normal (mvN) pdf
The random vector X_{(p×1)} is multivariate normal N_p with mean µ
and (positive definite) covariance matrix Σ. The pdf is:

$$f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}
\exp\Big\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \Big\}$$

Questions:
• How does this compare to the univariate version?

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \Big\}$$

• Why do we need the constant in front of the exp?
• What is the dimension of the part inside the exp?
• What happens if the determinant |Σ| = 0 (degenerate case)?
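To connect the formula to something concrete, here is a sketch that codes the pdf by hand and compares it with dmvnorm() from the mvtnorm package (the package, the point x0 and the parameters are arbitrary choices):

mvnpdf <- function(x, mu, Sigma) {
  p <- length(mu)
  Q <- t(x - mu) %*% solve(Sigma) %*% (x - mu)     # quadratic form in the exponent
  as.numeric(exp(-0.5 * Q) / ((2 * pi)^(p / 2) * sqrt(det(Sigma))))
}
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
x0 <- c(1, -1)
mvnpdf(x0, mu, Sigma)                              # manual formula
mvtnorm::dmvnorm(x0, mean = mu, sigma = Sigma)     # should give the same value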

38 / 53
Four useful properties of the mvN
Let X_{(p×1)} be a random vector from N_p(µ, Σ).
1. The graphical contours of the mvN are ellipsoids (can be
shown using spectral decomposition).
2. Linear combinations of components of X are (multivariate)
normal.
3. All subsets of the components of X are (multivariate)
normal (special case of the above).
4. Zero covariance implies that the corresponding components
are independently distributed (in contrast to general
distributions).

If you need a refresher, you might find this video useful:


https://www.youtube.com/watch?v=eho8xH3E6mE

39 / 53
All of these are proven in TMA4267 Linear Statistical Models.
Result 4 is rather useful! If you have a bivariate normal and the
covariance is 0, then your variables are independent.

40 / 53
Contours of multivariate normal distribution
• Contours of constant density for the p-dimensional normal
distribution are ellipsoids defined by x such that

$$(x - \mu)^T \Sigma^{-1} (x - \mu) = b$$

where b > 0 is a constant.
• These ellipsoids are centered at µ and have axes $\pm\sqrt{b\lambda_i}\, e_i$,
where $\Sigma e_i = \lambda_i e_i$ (eigenvector for λ_i), for i = 1, ..., p.
• To see this, the spectral decomposition of the covariance
matrix is useful.
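A sketch of how the spectral decomposition gives the ellipse in two dimensions (the Σ and the constant b below are arbitrary choices; b = qchisq(0.95, 2) is a common pick for a "95%" contour):

Sigma <- matrix(c(2, 1, 1, 1), nrow = 2)
mu <- c(0, 0)
b <- qchisq(0.95, df = 2)
ee <- eigen(Sigma)                               # eigenvalues lambda_i and eigenvectors e_i
theta <- seq(0, 2 * pi, length.out = 200)
# points on the contour (x - mu)^T Sigma^(-1) (x - mu) = b:
ell <- t(mu + ee$vectors %*% diag(sqrt(b * ee$values)) %*% rbind(cos(theta), sin(theta)))
plot(ell, type = "l", asp = 1, xlab = "x1", ylab = "x2")
# the half-axes of the ellipse are +/- sqrt(b * lambda_i) * e_i:
arrows(mu[1], mu[2],
       mu[1] + sqrt(b * ee$values) * ee$vectors[1, ],
       mu[2] + sqrt(b * ee$values) * ee$vectors[2, ])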

41 / 53
Note:
In M4: Classification the mvN is very important and we will
often draw contours of the mvN as ellipses (in 2D space). This is
the reason why we do that.

42 / 53
Identify the mvNs from their contours
Let
$$\Sigma = \begin{pmatrix} \sigma_x^2 & \rho\,\sigma_x\sigma_y \\ \rho\,\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}.$$

Contours have been generated for the following four parameter settings:

• 1: σ_x = 1, σ_y = 2, ρ = −0.3
• 2: σ_x = 1, σ_y = 1, ρ = 0
• 3: σ_x = 1, σ_y = 1, ρ = 0.5
• 4: σ_x = 1, σ_y = 2, ρ = 0

Match the distributions to the figures on the next slide.
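To explore this yourself, a sketch that draws contour plots for the four settings (again assuming the mvtnorm package; the grid and layout are arbitrary):

library(mvtnorm)
settings <- list(c(sx = 1, sy = 2, rho = -0.3),
                 c(sx = 1, sy = 1, rho = 0),
                 c(sx = 1, sy = 1, rho = 0.5),
                 c(sx = 1, sy = 2, rho = 0))
x <- seq(-4, 4, length.out = 100)
y <- seq(-4, 4, length.out = 100)
par(mfrow = c(2, 2))
for (s in settings) {
  Sigma <- matrix(c(s["sx"]^2,                   s["rho"] * s["sx"] * s["sy"],
                    s["rho"] * s["sx"] * s["sy"], s["sy"]^2), nrow = 2)
  z <- outer(x, y, function(x, y) dmvnorm(cbind(x, y), mean = c(0, 0), sigma = Sigma))
  contour(x, y, z, main = paste0("sx = ", s["sx"], ", sy = ", s["sy"], ", rho = ", s["rho"]))
}
par(mfrow = c(1, 1))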

43 / 53
[Four contour plots, panels A-D, each showing contours of a bivariate normal density over roughly (−4, 4) × (−4, 4).]

Take a look at the contour plots: when are the contours circles,
and when are they ellipses?
44 / 53
Multiple choice - multivariate normal
A second quiz on www.menti.com

Choose the correct answer. Let’s go!

45 / 53
Question 1: Multivariate normal pdf
The probability density function is

$$\Big(\frac{1}{2\pi}\Big)^{p/2} \det(\Sigma)^{-1/2} \exp\Big\{-\frac{1}{2} Q\Big\},$$

where Q is
• A: (x − µ)^T Σ^{−1} (x − µ)
• B: (x − µ) Σ (x − µ)^T
• C: Σ − µ

46 / 53
Question 2: Trivariate normal pdf
What graphical form does the solution of f(x) = constant have?
• A: Circle
• B: Parabola
• C: Ellipsoid
• D: Bell shape

47 / 53
Question 3: Multivariate normal distribution
X_p ∼ N_p(µ, Σ), and C is a k × p constant matrix. Y = CX is
• A: Chi-squared with k degrees of freedom
• B: Multivariate normal with mean kµ
• C: Chi-squared with p degrees of freedom
• D: Multivariate normal with mean Cµ

48 / 53
Question 4: Independence
Let X ∼ N_3(µ, Σ), with

$$\Sigma = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 3 & 2 \\ 0 & 2 & 5 \end{pmatrix}.$$

Which two variables are independent?


• A: X1 and X2
• B: X1 and X3
• C: X2 and X3
• D: None – but two are uncorrelated.

49 / 53
Question 5: Constructing independent variables?
Let X ∼ N_p(µ, Σ). How can I construct a vector of independent
standard normal variables from X?
• A: Σ(X − µ)
• B: Σ^{−1}(X + µ)
• C: Σ^{−1/2}(X − µ)
• D: Σ^{1/2}(X + µ)

50 / 53
51 / 53
Further reading/resources

• Videos on YouTube by the authors of ISL, Chapter 2

52 / 53
Acknowledgements

Thanks to Mette Langaas, who developed the first slide set in
2018 and 2019, and to Julia Debik for contributing to this
module.

53 / 53
