Stat
Overview
• Random vectors
Random vector
• A random vector X(p×1) is a p-dimensional vector of random variables. For example:
• Weight of cork deposits in p = 4 directions (N, E, S, W).
• Factors to predict body fat: bmi, age, weight, hip circumference, ...
• Joint distribution function: f(x).
• From the joint distribution function to marginal (and conditional) distributions.
\[ f_1(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_p) \, dx_2 \cdots dx_p \]
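As a small numerical illustration (a sketch, not part of the cork example): take a bivariate density with two independent standard normal components and integrate out x2; the result matches the known N(0, 1) marginal.

# Joint density of two independent standard normal variables
f <- function(x1, x2) dnorm(x1) * dnorm(x2)
# Marginal of x1: integrate the joint density over x2
f1 <- function(x1) integrate(function(x2) f(x1, x2), lower = -Inf, upper = Inf)$value
f1(0.5)    # numerical marginal
dnorm(0.5) # exact N(0, 1) marginal; the two values agree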
The Cork deposit data
Look at the data (always the first thing to do):
library(GGally)
corkds <- as.data.frame(read.table("https://www.math.ntnu.no/emner/TMA4268/2019v/data/corkMKB.txt"))
colnames(corkds) <- c("N","E","S","W")
ggpairs(corkds)
[ggpairs scatterplot matrix of N, E, S, W: histograms on the diagonal, pairwise scatterplots below, and correlations above: N-E 0.885, N-S 0.905, N-W 0.883, E-S 0.826, E-W 0.769, S-W 0.923.]
• Here we have a random sample of n = 28 cork trees from
the population and observe a p = 4 dimensional random
vector for each tree.
• This leads us to the definition of random vectors and a
random matrix for cork trees:
\[ X_{(28 \times 4)} = \begin{pmatrix} X_{11} & X_{12} & X_{13} & X_{14} \\ X_{21} & X_{22} & X_{23} & X_{24} \\ X_{31} & X_{32} & X_{33} & X_{34} \\ \vdots & \vdots & \vdots & \vdots \\ X_{28,1} & X_{28,2} & X_{28,3} & X_{28,4} \end{pmatrix} \]
The mean vector
Rules for the mean I
Rules for the mean II
• Random matrix X(n×p) and conformable constant matrices A and B:
E(AXB) = A E(X) B
Proof: Board
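A quick Monte Carlo illustration of this rule (a sketch with made-up matrices, not a proof): average AXB over many random draws of X and compare with A E(X) B.

set.seed(1)
A <- matrix(c(1, 2, 0, 1), nrow = 2)  # constant matrices
B <- matrix(c(1, 0, 1, 1), nrow = 2)
EX <- matrix(c(1, 2, 3, 4), nrow = 2) # E(X)
# Average A X B over many draws of X = EX + standard normal noise
mean_AXB <- Reduce(`+`, lapply(1:10000, function(i) {
  X <- EX + matrix(rnorm(4), nrow = 2)
  A %*% X %*% B
})) / 10000
mean_AXB       # close to ...
A %*% EX %*% B # ... the exact value A E(X) B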
Q:
• What are the univariate analogues of the formulas on the previous two slides (which you studied in your first introductory course in statistics)?
The covariance
In the introductory statistics course we defined the covariance between two random variables Xi and Xj as σij = Cov(Xi, Xj) = E[(Xi − µi)(Xj − µj)].
Make a scatter plot for negative, zero and positive correlation
(see also R example).
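A minimal sketch of such plots (assuming the MASS package for mvrnorm): simulate bivariate normal data with correlation −0.9, 0 and 0.9.

library(MASS) # for mvrnorm
par(mfrow = c(1, 3))
for (rho in c(-0.9, 0, 0.9)) {
  Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
  x <- mvrnorm(200, mu = c(0, 0), Sigma = Sigma)
  plot(x, main = paste("correlation", rho), xlab = "x1", ylab = "x2")
}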
Variance-covariance matrix
• Consider a random vector X(p×1) with mean vector µ(p×1):
\[ X_{(p \times 1)} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}, \quad \text{and} \quad \mu_{(p \times 1)} = E(X) = \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_p) \end{pmatrix} \]
The variance-covariance matrix of X is
\[ \Sigma = \text{Cov}(X) = E\left[ (X - \mu)(X - \mu)^T \right] \]
• The diagonal elements in Σ, σii = σi², are variances.
• The off-diagonal elements are covariances σij = E[(Xi − µi)(Xj − µj)] = σji.
• Σ is called the variance, covariance, or variance-covariance matrix, and is denoted both Var(X) and Cov(X).
Exercise: the variance-covariance matrix
Let X(4×1) have variance-covariance matrix
\[ \Sigma = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 \\ 0 & 0 & 2 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix}. \]
Correlation matrix
\[ \rho = (V^{1/2})^{-1} \Sigma (V^{1/2})^{-1}, \quad \text{where} \quad V^{1/2} = \begin{pmatrix} \sqrt{\sigma_1^2} & 0 & \cdots & 0 \\ 0 & \sqrt{\sigma_2^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\sigma_p^2} \end{pmatrix} \]
Exercise: the correlation matrix
A:
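One way to arrive at the answer in R: cov2cor converts a covariance matrix to a correlation matrix. Since all variances here equal 2, each correlation is the corresponding covariance divided by 2.

Sigma <- matrix(c(2, 1, 0, 0,
                  1, 2, 0, 1,
                  0, 0, 2, 1,
                  0, 1, 1, 2), nrow = 4, byrow = TRUE)
cov2cor(Sigma) # rho_ij = sigma_ij / 2: nonzero off-diagonal entries become 0.5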
Linear combinations
For the linear combination Z = CX, where C is a constant matrix, we have
E(Z) = E(CX) = Cµ
Cov(Z) = Cov(CX) = CΣC^T
Exercise: Follow the proof - what are the most important
transitions?
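A numerical check of both rules (a sketch with a made-up µ, Σ and C, assuming MASS::mvrnorm): simulate many draws of X, form Z = CX, and compare the sample mean and covariance of Z with Cµ and CΣC^T.

library(MASS) # for mvrnorm
set.seed(1)
mu <- c(1, 2, 3)
Sigma <- matrix(c(2, 1, 0,
                  1, 2, 1,
                  0, 1, 2), nrow = 3, byrow = TRUE)
C <- matrix(c(1, -1, 0,
              0, 1, -1), nrow = 2, byrow = TRUE)
X <- mvrnorm(100000, mu = mu, Sigma = Sigma)
Z <- X %*% t(C)       # each row of Z is z = C x
colMeans(Z)           # close to C mu
C %*% mu
cov(Z)                # close to C Sigma C^T
C %*% Sigma %*% t(C)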
Exercise: Linear combinations
\[ X = \begin{pmatrix} X_N \\ X_E \\ X_S \\ X_W \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_N \\ \mu_E \\ \mu_S \\ \mu_W \end{pmatrix}, \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_N^2 & \sigma_{NE} & \sigma_{NS} & \sigma_{NW} \\ \sigma_{NE} & \sigma_E^2 & \sigma_{ES} & \sigma_{EW} \\ \sigma_{NS} & \sigma_{ES} & \sigma_S^2 & \sigma_{SW} \\ \sigma_{NW} & \sigma_{EW} & \sigma_{SW} & \sigma_W^2 \end{pmatrix} \]
Find C such that Y(3×1) = C(3×4) X(4×1) gives the three contrasts above:
Cov(Y) = Cov(CX) = ...
corkds <- as.matrix(read.table("https://www.math.ntnu.no/emner/TMA4268/2019v/data/corkMKB.txt"))
dimnames(corkds)[[2]] <- c("N","E","S","W")
mu <- apply(corkds, 2, mean) # sample mean vector
mu
Sigma <- var(corkds)         # sample covariance matrix
Sigma
## N E S W
## 50.53571 46.17857 49.67857 45.17857
## N E S W
## N 290.4061 223.7526 288.4378 226.2712
## E 223.7526 219.9299 229.0595 171.3743
## S 288.4378 229.0595 350.0040 259.5410
## W 226.2712 171.3743 259.5410 226.0040
(C <- matrix(c(1, 0, -1, 0, 0, 1, 0, 1, -1, 1, -1, 1), byrow = TRUE, nrow = 3)) # one contrast per row
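With mu, Sigma and C from the chunk above, the rules for linear combinations give the mean and covariance of Y = CX directly:

C %*% mu             # E(Y) = C mu
C %*% Sigma %*% t(C) # Cov(Y) = C Sigma C^T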
The covariance matrix - more requirements?
• The covariance matrix is by construction symmetric, and it is common to require that the covariance matrix is positive semidefinite. This means that, for every vector b ≠ 0,
b^T Σ b ≥ 0.
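In practice, positive semidefiniteness can be checked through the eigenvalues, which must all be non-negative. A small sketch, reusing Sigma computed from the cork data earlier:

eigen(Sigma, symmetric = TRUE)$values # all positive, so this Sigma is positive definite
b <- rnorm(4)                         # for any such b, b^T Sigma b >= 0
t(b) %*% Sigma %*% b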
Random vectors - Single-choice exercise
Quiz on www.menti.com
Question 1: Mean of sum
X and Y are two bivariate random vectors with E(X) = (1, 2)^T and E(Y) = (2, 0)^T. What is E(X + Y)?
• A: (1.5, 1)^T
• B: (3, 2)^T
• C: (−1, 2)^T
• D: (1, −2)^T
Question 2: Mean of linear combination
X is a 2-dimensional random vector with E(X) = (2, 5)^T, and b = (0.5, 0.5)^T is a constant vector. What is E(b^T X)?
• A: 3.5
• B: 7
• C: 2
• D: 5
Question 3: Covariance
X is a p-dimensional random vector with mean µ. Which of the
following defines the covariance matrix?
Question 4: Mean of linear combinations
X is a p-dimensional random vector with mean µ and covariance
matrix Σ. C is a constant matrix. What is then the mean of the
k-dimensional random vector Y = CX?
• A: Cµ
• B: CΣ
• C: CµC^T
• D: CΣC^T
Question 5: Covariance of linear combinations
X is a p-dimensional random vector with mean µ and covariance
matrix Σ. C is a constant matrix. What is then the covariance
of the k-dimensional random vector Y = CX?
• A: Cµ
• B: CΣ
• C: CµC^T
• D: CΣC^T
Question 6: Correlation
X is a 2-dimensional random vector with covariance matrix
\[ \Sigma = \begin{pmatrix} 4 & 0.8 \\ 0.8 & 1 \end{pmatrix} \]
What is the correlation between the two components of X?
• A: 0.10
• B: 0.25
• C: 0.40
• D: 0.80
The multivariate normal distribution
3D multivariate Normal distributions
The multivariate normal (mvN) pdf
The random vector X(p×1) is multivariate normal Np with mean µ and (positive definite) covariance matrix Σ. The pdf is:
\[ f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\Big\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \Big\} \]
Questions:
• How does this compare to the univariate version?
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \Big\} \]
• Why do we need the constant in front of the exp?
• What is the dimension of the part in exp?
• What happens if the determinant |Σ| = 0 (degenerate case)?
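To connect the pdf formula above to code, here is a minimal sketch (assuming the mvtnorm package, with a made-up µ and Σ) comparing dmvnorm with the formula written out by hand:

library(mvtnorm) # for dmvnorm
mu <- c(0, 0)
Sigma <- matrix(c(2, 0.8, 0.8, 1), nrow = 2)
x <- c(1, 1)
dmvnorm(x, mean = mu, sigma = Sigma)
# The same value from the formula on this slide
p <- length(mu)
Q <- t(x - mu) %*% solve(Sigma) %*% (x - mu)
(2 * pi)^(-p / 2) * det(Sigma)^(-1 / 2) * exp(-0.5 * Q)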
Four useful properties of the mvN
Let X(p×1) be a random vector from Np(µ, Σ).
1. The graphical contours of the mvN are ellipsoids (can be shown using spectral decomposition).
2. Linear combinations of components of X are (multivariate)
normal.
3. All subsets of the components of X are (multivariate)
normal (special case of the above).
4. Zero covariance implies that the corresponding components
are independently distributed (in contrast to general
distributions).
All of these are proven in TMA4267 Linear Statistical Models.
Result 4 is rather useful! If you have a bivariate normal vector with covariance 0, then the two variables are independent.
Contours of multivariate normal distribution
• Contours of constant density for the p-dimensional normal
distribution are ellipsoids defined by x such that
$(x - \mu)^T \Sigma^{-1} (x - \mu) = b$
Note:
In M4: Classification the mvN is very important, and we will often draw contours of the mvN as ellipses (in 2D space). This slide explains why those contours are ellipses.
Identify the mvNs from their contours
Let
\[ \Sigma = \begin{pmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}. \]
[Four contour plots, panels A-D, over x, y ∈ (−4, 4), each showing contours of constant density for a bivariate normal distribution.]
Take a look at the contour plots: when are the contours circles, and when are they ellipses?
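A minimal sketch for drawing such contour plots (assuming the mvtnorm package): with equal variances and ρ = 0 the contours are circles, otherwise they are ellipses.

library(mvtnorm)
grid <- seq(-4, 4, length.out = 100)
contour_mvn <- function(Sigma, main) {
  # density evaluated on a 100 x 100 grid
  z <- outer(grid, grid, function(x, y) dmvnorm(cbind(x, y), sigma = Sigma))
  contour(grid, grid, z, main = main)
}
par(mfrow = c(1, 2))
contour_mvn(diag(2), "equal variances, rho = 0: circles")
contour_mvn(matrix(c(2, 0.8, 0.8, 1), nrow = 2), "rho != 0: ellipses")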
Multiple choice - multivariate normal
A second quiz on www.menti.com
Question 1: Multivariate normal pdf
The probability density function is
\[ f(x) = \Big( \frac{1}{2\pi} \Big)^{p/2} \det(\Sigma)^{-1/2} \exp\big\{ -\tfrac{1}{2} Q \big\} \]
where Q is
• A: $(x - \mu)^T \Sigma^{-1} (x - \mu)$
• B: $(x - \mu) \Sigma (x - \mu)^T$
• C: $\Sigma - \mu$
Question 2: Trivariate normal pdf
What graphical form does the solution to f(x) = constant have?
• A: Circle
• B: Parabola
• C: Ellipsoid
• D: Bell shape
Question 3: Multivariate normal distribution
Xp ∼ Np(µ, Σ), and C is a k × p constant matrix. Y = CX is
• A: Chi-squared with k degrees of freedom
• B: Multivariate normal with mean kµ
• C: Chi-squared with p degrees of freedom
• D: Multivariate normal with mean Cµ
Question 4: Independence
Let X ∼ N3(µ, Σ), with
\[ \Sigma = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 3 & 2 \\ 0 & 2 & 5 \end{pmatrix}. \]
Which components of X are independent?
Question 5: Constructing independent variables?
Let X ∼ Np(µ, Σ). How can I construct a vector of independent
standard normal variables from X?
• A: $\Sigma (X - \mu)$
• B: $\Sigma^{-1} (X + \mu)$
• C: $\Sigma^{-1/2} (X - \mu)$
• D: $\Sigma^{1/2} (X + \mu)$
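A numerical illustration of option C (a sketch, assuming the mvtnorm package for sampling): compute Σ^(−1/2) from the spectral decomposition, transform the draws, and check that the result has mean 0 and identity covariance.

library(mvtnorm)
set.seed(1)
mu <- c(1, 2)
Sigma <- matrix(c(2, 0.8, 0.8, 1), nrow = 2)
e <- eigen(Sigma, symmetric = TRUE) # Sigma^(-1/2) via the spectral decomposition
SigmaInvSqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
X <- rmvnorm(100000, mean = mu, sigma = Sigma)
Z <- t(SigmaInvSqrt %*% (t(X) - mu)) # z = Sigma^(-1/2) (x - mu), row by row
colMeans(Z) # close to (0, 0)
cov(Z)      # close to the identity matrix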
Further reading/resources
Acknowledgements