
Fundamentals of Statistical Learning

In this chapter we present the statistical fundamentals needed to understand the concepts in the topics that follow.

1 Expected Value
Let X = [x_1 \cdots x_p]^T be a random p-dimensional vector.

Definition. The expected value of X is defined as

E(X) = µ = \begin{pmatrix} E(x_1) \\ \vdots \\ E(x_p) \end{pmatrix}.
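As a quick numerical illustration, the following minimal Python/NumPy sketch estimates E(X) componentwise by the sample mean (the dimension, distribution, and sample size here are our own illustrative choices, not part of the text):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0, 0.5])                       # true mean vector (illustrative)
    X = rng.normal(loc=mu, scale=1.0, size=(100_000, 3))  # 100k samples of a 3-dimensional X

    # The sample mean approximates E(X) = mu componentwise.
    print(X.mean(axis=0))   # close to [1.0, -2.0, 0.5]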

Properties
i If a is a constant vector, then E(a) = a.

Proof. E(a) = \begin{pmatrix} E(a_1) \\ \vdots \\ E(a_p) \end{pmatrix} = \begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix} = a.

ii E(a + X) = a + E(X).

Proof.

E(a + X) = \begin{pmatrix} E(a_1 + x_1) \\ \vdots \\ E(a_p + x_p) \end{pmatrix} = \begin{pmatrix} a_1 + E(x_1) \\ \vdots \\ a_p + E(x_p) \end{pmatrix} = \begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix} + \begin{pmatrix} E(x_1) \\ \vdots \\ E(x_p) \end{pmatrix} = a + E(X).

iii If A is a matrix (or a vector) such that the products AX and XA exist, then

a) E(AX) = AE(X),
b) E(XA) = E(X)A.

Proof.

a) E(AX) = E \begin{pmatrix} \sum_{j=1}^{p} a_{1j} x_j \\ \vdots \\ \sum_{j=1}^{p} a_{pj} x_j \end{pmatrix} = \begin{pmatrix} E(\sum_{j=1}^{p} a_{1j} x_j) \\ \vdots \\ E(\sum_{j=1}^{p} a_{pj} x_j) \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{p} a_{1j} E(x_j) \\ \vdots \\ \sum_{j=1}^{p} a_{pj} E(x_j) \end{pmatrix} = A E(X).

Part b) is proved analogously.

iv E(X + Y) = E(X) + E(Y).

Proof.

E(X + Y) = E \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_p + y_p \end{pmatrix} = \begin{pmatrix} E(x_1) + E(y_1) \\ \vdots \\ E(x_p) + E(y_p) \end{pmatrix} = \begin{pmatrix} E(x_1) \\ \vdots \\ E(x_p) \end{pmatrix} + \begin{pmatrix} E(y_1) \\ \vdots \\ E(y_p) \end{pmatrix} = E(X) + E(Y).

v If X and Y are independent, then E(XY) = E(X)E(Y).

Proof. We know that two random variables are independent if f(X, Y) = f(X)f(Y), or equivalently f(X|Y) = f(X). Then

E(XY) = \iint XY f(X, Y) \, dX \, dY
      = \iint XY f(X) f(Y) \, dX \, dY
      = \int X f(X) \, dX \int Y f(Y) \, dY = E(X)E(Y).
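The properties above can be checked empirically. A minimal Monte Carlo sketch in Python/NumPy (the matrix A, the means, and the distributions are arbitrary illustrative choices, not part of the text):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    mu = np.array([1.0, 2.0, 3.0])
    X = rng.normal(size=(n, 3)) + mu           # samples with E(X) = mu
    A = np.array([[1.0, 0.0, 2.0],
                  [0.5, -1.0, 0.0]])

    # Property iii a): E(AX) = A E(X) = A mu; estimate E(AX) from transformed samples.
    print((X @ A.T).mean(axis=0))              # close to A @ mu = [7.0, -1.5]
    print(A @ mu)

    # Property v): for independent x, y, E(xy) = E(x)E(y).
    x = rng.normal(2.0, 1.0, n)
    y = rng.normal(-1.0, 1.0, n)
    print((x * y).mean(), x.mean() * y.mean())  # both close to -2.0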

2 Dispersion matrix
Definition. The variance-covariance matrix of X, or dispersion matrix, denoted by Σ, is defined by

D(X) = Σ = E[(X − µ)(X − µ)^T].

Note that

(X − µ)(X − µ)^T = \begin{pmatrix} (x_1 − µ_1)^2 & \cdots & (x_1 − µ_1)(x_p − µ_p) \\ \vdots & \ddots & \vdots \\ (x_p − µ_p)(x_1 − µ_1) & \cdots & (x_p − µ_p)^2 \end{pmatrix},

therefore

Σ = \begin{pmatrix} V(x_1) & \cdots & cov(x_1, x_p) \\ \vdots & \ddots & \vdots \\ cov(x_p, x_1) & \cdots & V(x_p) \end{pmatrix}.
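A minimal Python/NumPy sketch of this definition (the covariance values below are illustrative assumptions): the sample dispersion matrix is the average of the outer products (X − µ)(X − µ)^T, with µ estimated by the sample mean.

    import numpy as np

    rng = np.random.default_rng(2)
    Sigma_true = np.array([[2.0, 0.5],
                           [0.5, 1.0]])
    X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma_true, size=100_000)

    # Average of outer products (X - mu)(X - mu)^T over the samples.
    Xc = X - X.mean(axis=0)
    Sigma_hat = (Xc.T @ Xc) / len(X)
    print(Sigma_hat)                    # close to Sigma_true
    print(np.cov(X, rowvar=False))      # same estimate (up to the n-1 divisor)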

Theorem. The variance-covariance matrix is positive-semidefinite.

Proof. A square symmetric real matrix A is positive-semidefinite if

y^T A y ≥ 0   ∀ y ∈ R^p.

We now show that the covariance matrix satisfies this condition.

y^T Σ y = y^T E[(X − µ)(X − µ)^T] y
        = E[y^T (X − µ) (X − µ)^T y]            (each factor y^T(X − µ) and (X − µ)^T y is 1×1)
        = E[((X − µ)^T y)^T ((X − µ)^T y)]
        = E[((X − µ)^T y)((X − µ)^T y)]          (a^T = a if a ∈ R)
        = E[((X − µ)^T y)^2] ≥ 0.
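A minimal numerical check of the theorem in Python/NumPy (the random data below is an illustrative assumption): a sample covariance matrix has nonnegative eigenvalues, and y^T Σ y ≥ 0 for an arbitrary y, as in the proof.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(1_000, 4))      # 1000 samples of a 4-dimensional vector
    Sigma = np.cov(X, rowvar=False)

    # Positive-semidefinite: all eigenvalues of the symmetric matrix Sigma are >= 0.
    print(np.linalg.eigvalsh(Sigma))     # all nonnegative

    # And y^T Sigma y >= 0 for any y.
    y = rng.normal(size=4)
    print(y @ Sigma @ y >= 0)            # True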

Properties
i If b ∈ R^p is a constant vector, then D(b) = 0_{p×p}.

Proof.

D(b) = E \begin{pmatrix} (b_1 − b_1)(b_1 − b_1) & \cdots & (b_1 − b_1)(b_p − b_p) \\ \vdots & \ddots & \vdots \\ (b_p − b_p)(b_1 − b_1) & \cdots & (b_p − b_p)(b_p − b_p) \end{pmatrix} = E \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix} = 0_{p×p}.

ii If b ∈ R^p is a constant vector, then D(b + X) = D(X).

Proof. Since the constant b cancels inside each entry,

D(b + X) = E \begin{pmatrix} (b_1 + x_1 − b_1 − µ_1)^2 & \cdots & (b_1 + x_1 − b_1 − µ_1)(b_p + x_p − b_p − µ_p) \\ \vdots & \ddots & \vdots \\ (b_p + x_p − b_p − µ_p)(b_1 + x_1 − b_1 − µ_1) & \cdots & (b_p + x_p − b_p − µ_p)^2 \end{pmatrix} = D(X).

iii If A is a constant matrix, or a vector, such that AX exists, then

D(AX) = AΣAT .

Proof.

D(AX) = E[(AX − Aµ)(AX − Aµ)^T]
      = E[A(X − µ)(A(X − µ))^T]
      = E[A(X − µ)(X − µ)^T A^T]
      = A E[(X − µ)(X − µ)^T] A^T = A Σ A^T.
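Property iii can also be verified numerically. A minimal Python/NumPy sketch (the matrix A and the covariance Σ are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(4)
    Sigma = np.array([[1.0, 0.3, 0.0],
                      [0.3, 2.0, 0.5],
                      [0.0, 0.5, 1.5]])
    X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=200_000)
    A = np.array([[1.0, -1.0, 0.0],
                  [0.0,  2.0, 1.0]])

    # D(AX) estimated from the transformed samples vs. A Sigma A^T.
    D_AX = np.cov(X @ A.T, rowvar=False)
    print(np.allclose(D_AX, A @ Sigma @ A.T, atol=0.2))   # True up to Monte Carlo error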

iv If X and Y are independent random vectors, then

D(X + Y) = D(X) + D(Y).

Proof. Before proving this property, we first show that if X and Y are independent, then x_i and y_j are also independent for all (i, j) ∈ {1, …, p}^2; that is, the components of X and Y are also independent. Let X and Y be two independent random vectors, so that

f(X, Y) = f(X)f(Y).

For simplicity of notation, assume that X = [x_1, x_2]^T and Y = [y_1, y_2]^T. Let us prove that f(x_1 | y_1) = f(x_1):

f(x_1 | y_1) = f(x_1, y_1) / f(y_1)
             = (1 / f(y_1)) \iint f(x_1, x_2, y_1, y_2) \, dx_2 \, dy_2        (marginal probability)
             = (1 / f(y_1)) \iint f(x_1, x_2) f(y_1, y_2) \, dx_2 \, dy_2      (independence)
             = (1 / f(y_1)) \int f(x_1, x_2) \, dx_2 \int f(y_1, y_2) \, dy_2
             = f(x_1) f(y_1) / f(y_1) = f(x_1).
Then cov(x_i, y_j) = 0 for all (i, j) ∈ {1, …, p}^2. Now we have

D(X + Y) = \begin{pmatrix} cov(x_1 + y_1, x_1 + y_1) & \cdots & cov(x_1 + y_1, x_p + y_p) \\ \vdots & \ddots & \vdots \\ cov(x_p + y_p, x_1 + y_1) & \cdots & cov(x_p + y_p, x_p + y_p) \end{pmatrix}.

Then we only have to note that, since cov(x_i, y_j) = cov(y_i, x_j) = 0,

cov(x_i + y_i, x_j + y_j) = cov(x_i, x_j) + cov(x_i, y_j) + cov(y_i, x_j) + cov(y_i, y_j)
                          = cov(x_i, x_j) + cov(y_i, y_j).

Therefore D(X + Y) = D(X) + D(Y).
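A minimal numerical sketch of property iv in Python/NumPy (independence is enforced by drawing X and Y separately; the covariance matrices are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 200_000
    X = rng.multivariate_normal([0, 0], [[1.0, 0.2], [0.2, 1.0]], size=n)
    Y = rng.multivariate_normal([0, 0], [[2.0, -0.5], [-0.5, 1.0]], size=n)  # independent of X

    lhs = np.cov(X + Y, rowvar=False)
    rhs = np.cov(X, rowvar=False) + np.cov(Y, rowvar=False)
    print(np.allclose(lhs, rhs, atol=0.05))   # True: D(X + Y) = D(X) + D(Y)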

3 Covariance matrix
Let X = [x_1 \cdots x_p]^T and Y = [y_1 \cdots y_q]^T be two random vectors of dimension p and q, respectively, with mean values µ and ν.
Definition. The covariance matrix between X and Y is defined by

C(X, Y) = E[(X − µ)(Y − ν)^T] = \begin{pmatrix} cov(x_1, y_1) & \cdots & cov(x_1, y_q) \\ \vdots & \ddots & \vdots \\ cov(x_p, y_1) & \cdots & cov(x_p, y_q) \end{pmatrix}.
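A minimal Python/NumPy sketch of this definition (the joint distribution below is an illustrative assumption): the sample cross-covariance is the average of the outer products (X − µ)(Y − ν)^T, a p × q matrix.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 100_000
    # Draw a 5-dimensional vector and split it into X (p = 3) and Y (q = 2).
    Z = rng.multivariate_normal(np.zeros(5),
                                np.eye(5) + 0.3 * np.ones((5, 5)), size=n)
    X, Y = Z[:, :3], Z[:, 3:]

    # C(X, Y) = E[(X - mu)(Y - nu)^T], estimated with sample means.
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    C_hat = (Xc.T @ Yc) / n
    print(C_hat)                  # each entry close to 0.3, the true cross-covariance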

Properties
i C(X, X) = D(X).

ii C(X, Y) = C(Y, X)^T.

iii If A_{n×p} and B_{m×q} are real matrices, then C(AX, BY) = A C(X, Y) B^T.

Proof.

C(AX, BY) = E[(AX − Aµ)(BY − Bν)^T]
          = E[A(X − µ)(Y − ν)^T B^T]
          = A E[(X − µ)(Y − ν)^T] B^T
          = A C(X, Y) B^T.

iv Let U and V be two p- and q-dimensional random vectors with means γ and δ, respectively. Then

a) C(U + X, Y) = C(U, Y) + C(X, Y),
b) C(U, V + Y) = C(U, V) + C(U, Y).

Proof.

C(U + X, Y) = E[(U + X − γ − µ)(Y − ν)^T]
            = E[((U − γ) + (X − µ))(Y − ν)^T]
            = E[(U − γ)(Y − ν)^T + (X − µ)(Y − ν)^T]
            = C(U, Y) + C(X, Y).

Part b) is proved analogously.

v D(X + U) = D(X) + D(U) + C(X, U) + C(U, X).

Proof.

D(X + U) = C(X + U, X + U)
         = C(X, X) + C(X, U) + C(U, X) + C(U, U)
         = D(X) + D(U) + C(X, U) + C(U, X).
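Property v can be checked numerically when X and U are correlated, so that the cross-covariance terms do not vanish. A minimal Python/NumPy sketch (the joint distribution is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(7)
    n = 200_000
    # Draw (X, U) jointly so that C(X, U) is nonzero.
    Z = rng.multivariate_normal(np.zeros(4),
                                np.eye(4) + 0.4 * np.ones((4, 4)), size=n)
    X, U = Z[:, :2], Z[:, 2:]

    Xc, Uc = X - X.mean(axis=0), U - U.mean(axis=0)
    C_XU = (Xc.T @ Uc) / n        # C(X, U); C(U, X) is its transpose
    lhs = np.cov(X + U, rowvar=False)
    rhs = np.cov(X, rowvar=False) + np.cov(U, rowvar=False) + C_XU + C_XU.T
    print(np.allclose(lhs, rhs, atol=0.1))   # True: the two cross terms are needed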
