
Mathematics Primer

Machine Intelligence

Neural Information Processing Group

www.ni.tu-berlin.de


1 Linear Algebra
Transpose, Inverse, Rank and Trace
Determinant
Eigenanalysis
Matrix Gradient
2 Analysis
Metrics
Jacobi and Hessian
Taylor Series
Optimization
3 Probability Theory
Combinatorics
Random Variables and Vectors
Conditional Probabilities and Independence
Expectations and Moments



Linear Algebra


Matrix Multiplication, Transpose and Inverse

Consider matrices A ∈ R^{N×M}, B ∈ R^{M×P} with elements (A)_{ij} = a_{ij}, (B)_{ij} = b_{ij}.
The product A B ∈ R^{N×P} has elements (A B)_{ij} = Σ_{r=1}^{M} a_{ir} b_{rj}.
The transpose A^⊤ has elements (A^⊤)_{ij} = a_{ji}.
The inverse A^{-1} of a square matrix satisfies A A^{-1} = A^{-1} A = I.
The following identities hold:

(A B)⊤ = B⊤ A⊤

(A B)−1 = B−1 A−1


(A⊤ )−1 = (A−1 )⊤
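
A quick numerical sanity check of these identities (a NumPy sketch; the matrix sizes and entries are arbitrary, and the random square matrices are almost surely invertible):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # A in R^{3x4}
B = rng.standard_normal((4, 2))   # B in R^{4x2}

# (A B)^T = B^T A^T
print(np.allclose((A @ B).T, B.T @ A.T))                     # True

# For square invertible matrices: (A B)^{-1} = B^{-1} A^{-1} and (A^T)^{-1} = (A^{-1})^T
C = rng.standard_normal((3, 3))
D = rng.standard_normal((3, 3))
print(np.allclose(np.linalg.inv(C @ D),
                  np.linalg.inv(D) @ np.linalg.inv(C)))      # True
print(np.allclose(np.linalg.inv(C.T), np.linalg.inv(C).T))   # True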


Rank and Trace

Linear independence
A set of vectors {a_1, . . . , a_N} is linearly independent if Σ_{i=1}^{N} α_i a_i = 0 holds only if all α_i = 0. This means none of the vectors can be expressed as a linear combination of the others.

Rank
The rank rank(A) of a matrix A is the maximum number of linearly
independent rows (or columns).

Trace
The trace of a square matrix A ∈ R^{N×N} is defined as Tr(A) = Σ_{i=1}^{N} a_{ii}.

It holds:
Tr(A B) = Tr(B A)
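
A small NumPy sketch (the example matrices are arbitrary) illustrating rank, trace, and the cyclic property Tr(AB) = Tr(BA):

import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],    # second row = 2 * first row -> rows linearly dependent
              [0., 1., 1.]])
B = np.random.default_rng(1).standard_normal((3, 3))

print(np.linalg.matrix_rank(A))                        # 2, since row 2 is a multiple of row 1
print(np.trace(A))                                     # 1 + 4 + 1 = 6
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))    # True: Tr(AB) = Tr(BA)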


Determinant

The determinant det(A) shows certain properties of a square matrix A


det(A) = 0 iff the rows (or columns) are linearly dependent
det(A) ≠ 0 iff A is invertible

Note:
Determinant of the identity matrix: det(I) = 1
Determinant of a transposed matrix: det(A) = det(A⊤ )
Determinant of a product of two matrices:

det(A B) = det(A) det(B)


Determinant calculation (general)

Calculation of the determinant of an N × N matrix A:

det(A) = Σ_j A_{ij} C_{ij}

Row i can be any row; the result is always the same. The cofactors C_{ij} are defined as C_{ij} = (−1)^{i+j} det([A]_{∅ij}), where [A]_{∅ij} is the submatrix that remains when the i-th row and j-th column are removed.
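
A direct (and deliberately inefficient) Python sketch of this cofactor expansion, compared against np.linalg.det:

import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along row 0 (illustrative, not efficient)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # remove row 0 and column j
        total += (-1) ** j * A[0, j] * det_cofactor(minor)      # sign (-1)^{0+j} of cofactor C_{0j}
    return total

A = np.array([[0., -1., 1.], [-3., -2., 3.], [-2., -2., 3.]])
print(det_cofactor(A), np.linalg.det(A))   # both -1 (up to rounding)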


Determinant calculation (special cases)

2 × 2: |A| = | a b ; c d | = ad − bc

3 × 3: |A| = | a b c ; d e f ; g h i |
           = a · | e f ; h i | − b · | d f ; g i | + c · | d e ; g h |
           = aei + bfg + cdh − ceg − bdi − afh


Determinant and Inverse

The inverse A^{-1} of a square matrix A exists iff det(A) ≠ 0 (matrix not singular).
Calculation of the inverse matrix:

A^{-1} = adj[A] / det(A)

where the adjoint adj[A] of A is the matrix whose elements are the cofactors:

(adj[A])_{ij} = C_{ji}

The determinant of an inverse matrix is given by

det(A^{-1}) = 1 / det(A)
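
A Python sketch of the cofactor/adjoint formula, compared against np.linalg.inv (for illustration only; numerically one would never compute an inverse this way):

import numpy as np

def inverse_via_adjoint(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)    # cofactor C_ij
    adj = C.T                                                   # (adj[A])_ij = C_ji
    return adj / np.linalg.det(A)

A = np.array([[2., 1.], [5., 3.]])                              # det(A) = 1
print(inverse_via_adjoint(A))                                   # [[ 3. -1.] [-5.  2.]]
print(np.allclose(inverse_via_adjoint(A), np.linalg.inv(A)))    # True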


Eigendecomposition of a Matrix

Problem: Find the Eigenvectors and Eigenvalues of an N × N matrix A.


Consider the system of linear equations:

A x = λx
(A − λI) x = 0

Solutions: N Eigenvectors x = v_i and corresponding Eigenvalues λ = λ_i.
B x = 0 has non-trivial solutions iff det(B) = 0.
Therefore, non-trivial λ are the roots of the characteristic
polynomial:
p(λ) ≡ det(A − λI) = 0


Eigenvalues and Eigenvectors

Characteristic Equation:

p(λ) ≡ det(A − λI) = 0

Polynomial of order N
N (not necessarily distinct) solutions
Number of non-zero Eigenvalues: rank(A) (for diagonalizable A)
In general: Eigenvalues are complex
For symmetric matrices (A = A⊤ ): Eigenvalues are real
Determinant: det(A) = Π_{i=1}^{N} λ_i
Trace: Tr(A) = Σ_{i=1}^{N} λ_i


Eigendecomposition of a Matrix in R^3
Example

A = [  0  −1  1 ]
    [ −3  −2  3 ]
    [ −2  −2  3 ]

Eigenvalues: det(A − λI) = −λ³ + λ² + λ − 1 = 0
⇒ λ_1 = 1, λ_2 = 1, λ_3 = −1
Find each eigenvector x = v_i associated with each eigenvalue λ_i:

(A − λ_i I) x = [ 0−λ_i   −1      1     ] [ x_1 ]
                [ −3      −2−λ_i  3     ] [ x_2 ]  =  0
                [ −2      −2      3−λ_i ] [ x_3 ]

⇒ λ_1, λ_2 Eigenspace: {(x_1, x_2, x_3) | −x_1 − x_2 + x_3 = 0}
⇒ λ_3 Eigenspace: {(t, 3t, 2t) | t ∈ ℝ}

Eigendecomposition of a Matrix in R^3
Example

A = [  0  −1  1 ]
    [ −3  −2  3 ]
    [ −2  −2  3 ]

The rank of A is rank(A) = 3.
The number of non-zero Eigenvalues is 3 ✓
The trace of A is Tr(A) = 0 − 2 + 3 = 1.
The sum of the Eigenvalues is −1 + 1 + 1 = 1 ✓
The determinant of A is det(A) = −1.
The product of the Eigenvalues is (−1) · 1 · 1 = −1 ✓
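
These checks can be reproduced numerically, e.g. with NumPy:

import numpy as np

A = np.array([[ 0., -1., 1.],
              [-3., -2., 3.],
              [-2., -2., 3.]])

eigvals, eigvecs = np.linalg.eig(A)
print(np.sort(eigvals.real))                    # approx [-1., 1., 1.]
print(np.linalg.matrix_rank(A))                 # 3
print(np.trace(A), eigvals.sum().real)          # 1.0  1.0   (trace = sum of eigenvalues)
print(np.linalg.det(A), eigvals.prod().real)    # -1.0 -1.0  (det = product of eigenvalues)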


Matrix Gradient
The gradient of a function f : R^N → R is given by

∇f ≡ ( ∂f/∂x_1, . . . , ∂f/∂x_N )^⊤

Examples:
linear:    f : x ↦ a^⊤ x,     ∇f(x) = a
quadratic: f : x ↦ x^⊤ A x,   ∇f(x) = (A^⊤ + A) x

Consider a scalar-valued function f of the elements of an N × M matrix W, f : W ↦ R, f(W) = f(w_{11}, . . . , w_{NM}).
The matrix gradient of f w.r.t. W is defined as

∂f/∂W = [ ∂f/∂w_{11}  · · ·  ∂f/∂w_{N1} ]
        [     ⋮                  ⋮      ]
        [ ∂f/∂w_{1M}  · · ·  ∂f/∂w_{NM} ]
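
A small numerical check of the quadratic-form gradient ∇(x^⊤ A x) = (A^⊤ + A) x against central differences (a sketch; the matrix A and the point x are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

f = lambda x: x @ A @ x                    # f(x) = x^T A x
grad_analytic = (A.T + A) @ x

eps = 1e-6
grad_numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(4)])          # central differences per coordinate
print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))   # True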



Analysis


Definitions from Functional Analysis

Functions, Functionals and Operators


Two sets M and N are connected by a functional dependency, if to each x ∈ M there corresponds a unique element y ∈ N. This functional dependency is called
a function if M and N are sets of numbers
a functional if M is a set of functions and N a set of numbers
an operator if both sets are sets of functions

Example: Linear integral operator T with kernel k(t, x):


(T f)(x) = ∫_a^b k(t, x) f(t) dt


Infimum and Supremum

Infimum, Supremum
Let D be a subset of R. A number K is called supremum (infimum) of
D, if K is the smallest upper bound (largest lower bound) of D:

x ≤ K (x ≥ K), ∀ x ∈ D

We write: sup D = K (inf D = K).

Examples:
For the closed interval D = [a, b], a ≤ b : sup D = b, inf D = a.
For D = { n/(n+1) : n ∈ ℕ }: sup D = 1.


Metric Space

Metric
A metric (or distance function) on a set X is a non-negative mapping

d : X × X → R+

(x, y) ↦ d(x, y)
with the following characteristics
1 Positive definiteness: d(x, y) = 0 iff x = y, d(x, y) > 0 otherwise
2 Symmetry: d(x, y) = d(y, x), ∀ x, y ∈ X
3 Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z), ∀ x, y, z ∈ X

The pair (X, d) forms a metric space


d(x, y) is called the distance between x and y.


Jacobi and Hessian

The matrix of the partial derivatives of a vector-valued function f : R^N → R^M is known as the Jacobi matrix and is given by

J_f ≡ ∂f/∂x = [ ∂f_1/∂x_1  · · ·  ∂f_1/∂x_N ]
              [     ⋮                 ⋮     ]
              [ ∂f_M/∂x_1  · · ·  ∂f_M/∂x_N ]

The square matrix of second-order partial derivatives of a scalar-valued function f : R^N → R is called the Hessian matrix and is given by

H_f ≡ ∂²f/∂x² = [ ∂²f/∂x_1²      ∂²f/∂x_1∂x_2   · · ·  ∂²f/∂x_1∂x_N ]
                [ ∂²f/∂x_2∂x_1   ∂²f/∂x_2²      · · ·  ∂²f/∂x_2∂x_N ]
                [     ⋮               ⋮          ⋱          ⋮       ]
                [ ∂²f/∂x_N∂x_1   ∂²f/∂x_N∂x_2   · · ·  ∂²f/∂x_N²    ]
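
A finite-difference sketch of both matrices for a concrete function (the example functions and step sizes are assumptions, not from the slides):

import numpy as np

def jacobian(f, x, eps=1e-6):
    """Numerical Jacobi matrix of a vector-valued f: R^N -> R^M via central differences."""
    x = np.asarray(x, dtype=float)
    return np.stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(x.size)], axis=1)

def hessian(g, x, eps=1e-4):
    """Numerical Hessian of a scalar-valued g: R^N -> R (Jacobian of its gradient)."""
    grad = lambda x: jacobian(lambda y: np.atleast_1d(g(y)), x, eps)[0]
    return jacobian(grad, x, eps)

f = lambda x: np.array([x[0] * x[1], np.sin(x[0])])    # R^2 -> R^2
g = lambda x: x[0] ** 2 + 3 * x[0] * x[1]              # R^2 -> R

x0 = np.array([1.0, 2.0])
print(jacobian(f, x0))   # approx [[2, 1], [cos(1), 0]]
print(hessian(g, x0))    # approx [[2, 3], [3, 0]]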


Taylor Series

Taylor Series in R
Let f : I → R be an infinitely often differentiable function, and x0 ∈ I.
Then the Taylor series around x0 is defined as

f(x) = Σ_{n=0}^{∞} (1/n!) · dⁿf/dxⁿ|_{x_0} · (x − x_0)ⁿ
     = f(x_0) + f′(x_0)·(x − x_0) + ½ f′′(x_0)·(x − x_0)² + . . .

Taylor Series in R^N
Let f be an infinitely smooth scalar-valued function with domain in R^N:

f(x) = f(x_0) + ∇f(x_0)^⊤ (x − x_0) + ½ (x − x_0)^⊤ H_f(x_0) (x − x_0) + . . .
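
A small sketch comparing truncated Taylor expansions of exp(x) around x_0 = 0 with the exact value (the function and truncation orders are chosen only for illustration):

import math

def taylor_exp(x, x0=0.0, order=4):
    """Truncated Taylor series of exp around x0: every derivative of exp equals exp(x0)."""
    return sum(math.exp(x0) / math.factorial(n) * (x - x0) ** n
               for n in range(order + 1))

x = 1.0
for order in (1, 2, 4, 8):
    print(order, taylor_exp(x, order=order), math.exp(x))
# the truncated series approaches e = 2.71828... as the order grows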


Local Extrema

Let f be a scalar-valued function R^N → R.

Critical Points
A point x_0 where ∇f(x_0) = 0 is called a critical point of f.

Local Extrema
A critical point x0 of f is
a minimum of f, if all Eigenvalues of (H_f)(x_0) are positive
(the Hessian is positive definite)
a maximum of f, if all Eigenvalues of (H_f)(x_0) are negative
(the Hessian is negative definite)
a saddle point (no extremum) of f, if (H_f)(x_0) has both positive and negative Eigenvalues (the Hessian is indefinite); if some Eigenvalues are zero, the test is inconclusive
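
A sketch of this classification for two assumed example Hessians, using their eigenvalues:

import numpy as np

def classify(hessian_at_x0):
    eigvals = np.linalg.eigvalsh(hessian_at_x0)    # Hessian is symmetric
    if np.all(eigvals > 0):
        return "minimum"
    if np.all(eigvals < 0):
        return "maximum"
    if np.any(eigvals > 0) and np.any(eigvals < 0):
        return "saddle point (no extremum)"
    return "test inconclusive (zero eigenvalues)"

H_bowl   = np.array([[2., 0.], [0., 2.]])     # Hessian of f(x, y) = x^2 + y^2 at (0, 0)
H_saddle = np.array([[2., 0.], [0., -2.]])    # Hessian of f(x, y) = x^2 - y^2 at (0, 0)
print(classify(H_bowl))      # minimum
print(classify(H_saddle))    # saddle point (no extremum)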


Convexity

Convex Functions
Let U ⊂ R^N be open and convex. A function f : U → R is called (strictly) convex, if for all x_1, x_2 ∈ U with x_1 ≠ x_2 and all 0 < λ < 1

f(λx_1 + (1 − λ)x_2) ≤ λf(x_1) + (1 − λ)f(x_2)   (with "<" for strict convexity)

Concave Functions
f is called concave, if (−f ) is convex.


The Lagrange Method (Equality Constraints)


Problem: Maximization of a function f(w) : R^N → R under some equality constraints g_i(w) = 0, ∀i ∈ {1, . . . , k}:

f(w) → max, s.t. g_i(w) = 0, ∀i ∈ {1, . . . , k}

Solution: Form the Lagrangian

L(w, λ_1, . . . , λ_k) = f(w) + Σ_{i=1}^{k} λ_i g_i(w),

where λ_1, . . . , λ_k are called Lagrange multipliers. Find the stationary points (saddle points) of the Lagrangian w.r.t. both w and all the λ_i:

∂L(w, λ_1, . . . , λ_k)/∂w = ∂f(w)/∂w + Σ_{i=1}^{k} λ_i ∂g_i(w)/∂w = 0

and

∂L(w, λ_1, . . . , λ_k)/∂λ_i = g_i(w) = 0, ∀i.
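
As an illustration (a SymPy sketch; the objective and constraint are assumed, not from the slides): maximize f(w) = w_1 + w_2 subject to g(w) = w_1² + w_2² − 1 = 0 by solving the stationarity conditions of the Lagrangian.

import sympy as sp

w1, w2, lam = sp.symbols('w1 w2 lam', real=True)
f = w1 + w2
g = w1**2 + w2**2 - 1                      # constraint g(w) = 0
L = f + lam * g                            # Lagrangian

stationary = sp.solve([sp.diff(L, v) for v in (w1, w2, lam)], [w1, w2, lam], dict=True)
print(stationary)
# two stationary points: w = ±(1/sqrt(2), 1/sqrt(2)); the maximum is the positive one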

The Lagrange Method (Inequality Constraints)


Now: Maximization of a function f (w) under some inequality constraints.
f(w) → max, s.t. h_i(w) ≤ 0, ∀i ∈ {1, . . . , k}

Solution: Find the stationary points of the Lagrangian

L(w, λ_1, . . . , λ_k) = f(w) + Σ_{i=1}^{k} λ_i h_i(w),

w.r.t. w under the constraints

hi (w) ≤ 0, ∀i

λi ≥ 0, ∀i
λi · hi (w) = 0, ∀i,
which are known as the Karush-Kuhn-Tucker (KKT) conditions.
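
In practice such problems are typically handed to a numerical solver. A SciPy sketch (objective and constraint are assumed for illustration): maximize f(w) = −(w_1 − 1)² − (w_2 − 2)² subject to h(w) = w_1 + w_2 − 2 ≤ 0.

import numpy as np
from scipy.optimize import minimize

f = lambda w: -(w[0] - 1) ** 2 - (w[1] - 2) ** 2     # objective to be maximized
h = lambda w: w[0] + w[1] - 2                        # constraint h(w) <= 0

# SciPy minimizes and expects inequality constraints in the form fun(w) >= 0,
# so we minimize -f under the constraint -h(w) >= 0.
res = minimize(lambda w: -f(w), x0=np.zeros(2), method='SLSQP',
               constraints=[{'type': 'ineq', 'fun': lambda w: -h(w)}])
print(res.x)   # approx [0.5, 1.5]: the unconstrained maximum (1, 2) violates h <= 0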
Probability Theory


Combinatorics

Consider a set consisting of n elements. The power set is the set of all subsets; its cardinality is 2^n.

Permutation: arrangement of n elements in a certain order
# without repetitions: P_n = n!
# with repetitions (k ≤ n repeated elements): P_n^(k) = n!/k!

Combination: choice of k out of n elements regardless of order
# without repetitions: C_n^(k) = (n choose k) = n!/(k!(n−k)!)
# with repetitions: C_n^(k) = (n+k−1 choose k)

Variation: choice of k out of n elements taking their order into account
# without repetitions: V_n^(k) = k! · (n choose k)
# with repetitions: V_n^(k) = n^k
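
These counts are available directly in Python's standard library (a small sketch with n = 5, k = 2 as an assumed example):

import math

n, k = 5, 2
print(math.factorial(n))                     # P_n = 120 permutations of 5 elements
print(math.comb(n, k))                       # C_n^(k) = 10 combinations without repetition
print(math.comb(n + k - 1, k))               # 15 combinations with repetition
print(math.factorial(k) * math.comb(n, k))   # V_n^(k) = 20 variations without repetition
print(n ** k)                                # 25 variations with repetition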


Random Variable

Consider a set Ω of elementary events w, e.g. all possible outcomes of an experiment. The mapping

Ω → R ⊂ ℝ
w ↦ X(w) ≡ X

is called a random variable.
If R consists of a finite or countably infinite number of elements, then X is called a discrete random variable.
If R = ℝ or R consists of intervals from ℝ, then X is called a continuous random variable.

Example: Roll dice


w1 : 1 comes up → X(w1 ) = 1, . . . , w6 : 6 comes up → X(w6 ) = 6


Distribution of a Random Variable


The cumulative distribution function (cdf) or simply distribution
function of a random variable X at point z is defined as the probability
that X ≤ z:
FX (z) = P (X ≤ z)

Allowing z to vary in (−∞, ∞) defines the cdf for all values of X.


0 ≤ FX ≤ 1, a nondecreasing and continuous function for continuous
X.
Example: Roll ideal dice, where P(X = i) = 1/6 ∀i:

F_X(z) = 0    for z < 1
         1/6  for 1 ≤ z < 2
         2/6  for 2 ≤ z < 3
         ...
         1    for z ≥ 6


Probability Density of a Continuous Variable

The probability density function (pdf) pX of a continuous X is


obtained as the derivative of its cdf:
p_X(z) = dF_X(x)/dx |_{x=z}

In practice, the cdf is computed from the known pdf using the inverse relationship

F_X(z) = ∫_{−∞}^{z} p_X(t) dt

Example: the Gaussian distribution N(µ, σ²)

cdf:  F(z) ≡ P(X ≤ z) = 1/(σ√(2π)) · ∫_{−∞}^{z} exp( −(x−µ)²/(2σ²) ) dx
pdf:  p(z) = 1/(σ√(2π)) · exp( −(z−µ)²/(2σ²) )
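
A quick numerical check of the pdf–cdf relationship for the Gaussian (a SciPy sketch; µ and σ are arbitrary):

import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 2.0
z = np.linspace(-5, 7, 200)

cdf = norm.cdf(z, loc=mu, scale=sigma)
pdf = norm.pdf(z, loc=mu, scale=sigma)

# the pdf should match the numerical derivative of the cdf
pdf_numeric = np.gradient(cdf, z)
print(np.max(np.abs(pdf - pdf_numeric)))   # small (limited by the grid spacing)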


Distribution of a Random Vector

The distribution function of a random vector X:

Ω → R^N ⊂ ℝ^N
w ↦ X(w) ≡ X
at a point z is given by

FX (z) = P (X ≤ z)


Distribution of a Random Vector


Example

Toss a German 2 Euro and a German 20 Cent coin.


w(1) = {2 Euro: eagle, 20 Cent: gate} → X(w(1) ) = (1, 1)⊤
w(2) = {2 Euro: eagle, 20 Cent: number} → X(w(2) ) = (1, 2)⊤
w(3) = {2 Euro: number, 20 Cent: gate} → X(w(3) ) = (2, 1)⊤
w(4) = {2 Euro: number, 20 Cent: number} → X(w(4) ) = (2, 2)⊤



 0
for (z1 < 1) ∨ (z2 < 1)
for (1 ≤ z1 < 2)
 1/4 ∧ (1 ≤ z2 < 2)



FX (z) = 1/2 for (1 ≤ z1 < 2) ∧ (2 ≤ z2 )
3/4 for (2 ≤ z1 ) ∧ (1 ≤ z2 < 2)





1 for (2 ≤ z1 ) ∧ (2 ≤ z2 )


Conditional Probabilities

Conditional Probabilities
Consider two discrete random variables X and Y . The conditional
probability of Y given X:

P(Y = y | X = x) = P(X = x, Y = y) / P(X = x),   P(X = x) ≠ 0

Conditional Probability Densities


Consider two continuous random vectors X, Y and their joint probability
density. The conditional probability density of Y given X: Probability for
finding Y ∈ [y, y + dy] if we already know that X ∈ [x, x + dx].

p(y|x) = p(x, y) / p(x)   almost everywhere in X


Independence

Statistical Independence of Continuous Random Vectors


The random vectors X and Y are statistically independent iff

p(y|x) = p(y) or equivalently p(x, y) = p(x)p(y)


Marginals

Law of Total Probability (Discrete Random Variables)


Marginalisation over Y:
P(X = x) = Σ_k P(X = x, Y = y_k)

Marginal Densities (Continuous Random Vectors)


Given the joint density pX,Y (x, y) of two random vectors X and Y, the
marginal density pX (x) is obtained by integrating over the other random
vector:

p_X(x) = ∫_{−∞}^{∞} p_{X,Y}(x, ỹ) dỹ


Bayes’ Theorem

Bayes’ Theorem (Discrete Random Variables)

P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
                 = P(X = x | Y = y) P(Y = y) / Σ_k P(X = x | Y = y_k) P(Y = y_k)

Bayes’ Theorem (Continuous Random Vectors)

p(y|x) = p(x|y) p(y) / p(x) = p(x|y) p(y) / ∫ p(x|ỹ) p(ỹ) dỹ
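
A discrete sketch (the numbers are an assumed example, not from the slides): a binary test X for a condition Y, where Bayes' theorem turns the likelihood P(X|Y) and prior P(Y) into the posterior P(Y|X).

# Prior P(Y): y1 = condition present, y2 = condition absent
prior = {'y1': 0.01, 'y2': 0.99}
# Likelihood P(X = positive | Y)
likelihood = {'y1': 0.95, 'y2': 0.05}

# Marginal P(X = positive) via the law of total probability
evidence = sum(likelihood[y] * prior[y] for y in prior)

# Posterior P(Y | X = positive) via Bayes' theorem
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}
print(posterior)   # approx {'y1': 0.161, 'y2': 0.839}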


Decomposition

Factorization of a joint pdf (or cdf), as given by the Chain Rule:

p(x1 , . . . , xd ) = p(x1 )p(x2 |x1 ) . . . p(xd |x1 , . . . , xd−1 )

Special case: Statistical Independence


p(x_1, . . . , x_d) = p(x_1) p(x_2) . . . p(x_d) = Π_{k=1}^{d} p(x_k)

Special case: 1st order Markov chain

p(x1 , . . . , xd ) = p(xd |xd−1 )p(xd−1 |xd−2 ) . . . p(x2 |x1 )p(x1 )


Expectations

In Practice: Probability density usually unknown


However: Expectations of functions can be directly estimated from
the data

The expectation of a scalar-, vector- or matrix-valued function g(X) of a


random vector X, as defined below, can be estimated from a dataset of k
i.i.d. samples x(1) , x(2) , . . . , x(k) :
⟨g(X)⟩ ≡ ∫_{−∞}^{∞} g(x) p_X(x) dx ≈ (1/k) Σ_{j=1}^{k} g(x^{(j)})

Linearity: ⟨aX + bY + c⟩ = a⟨X⟩ + b⟨Y⟩ + c


pX known ⇒ Expectations of arbitrary function available
Expectations for all functions g known ⇒ pX can be determined
⇒ Statistics of X completely known
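
A sketch of this sample estimate: for X ~ N(0, 1) and g(x) = x², the expectation ⟨g(X)⟩ = 1 is approximated by the sample average (the distribution and g are assumptions chosen for illustration).

import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)     # k i.i.d. samples of X ~ N(0, 1)

g = lambda x: x ** 2
estimate = np.mean(g(samples))             # (1/k) * sum_j g(x^(j))
print(estimate)                            # close to <X^2> = 1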

Moments

Moments of a random vector X = (X1 , . . . , Xn ) are typical expectations


used to characterize it. They are obtained when g(X) consists of products
of components of X.

Examples:
1st order: ⟨X_i⟩ = ∫ p(x_i) x_i dx_i ... mean value µ_i, µ = (µ_1, . . . , µ_n)
2nd order: ⟨X_i X_j⟩ ... correlation between X_i and X_j
3rd order: ⟨X_i X_j X_k⟩ ... e.g. skewness


Correlation Matrix

The correlation matrix of a random vector X contains all second order moments ⟨X_i X_j⟩:

R_X = ⟨X X^⊤⟩

Symmetry: R_X = R_X^⊤
Positive semidefinite: a^⊤ R_X a ≥ 0, ∀a
⇒ all eigenvalues real and nonnegative
⇒ all eigenvectors are mutually orthogonal


Covariance Matrix

The covariance matrix of a random vector X is given by

C_X ≡ ⟨(X − µ_X)(X − µ_X)^⊤⟩ = ⟨X X^⊤⟩ − µ_X µ_X^⊤ = R_X − µ_X µ_X^⊤

and the components C_{ij} are calculated as

C_{ij} = ⟨X_i X_j⟩ − µ_i µ_j = ∫∫ p(x_i, x_j) x_i x_j dx_i dx_j − µ_i µ_j.

C_{ii} = σ_i² ... variance of X_i


For zero mean (“centered”), the correlation and covariance matrices
are identical.
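
A sketch estimating both matrices from samples and checking C_X = R_X − µ_X µ_X^⊤ (the underlying distribution is an arbitrary assumption):

import numpy as np

rng = np.random.default_rng(0)
mean_true = np.array([1.0, -2.0])
X = rng.standard_normal((100_000, 2)) + mean_true     # samples of a 2-d random vector

mu = X.mean(axis=0)                    # sample mean
R = (X.T @ X) / X.shape[0]             # correlation matrix  <X X^T>
C = R - np.outer(mu, mu)               # covariance matrix   R_X - mu mu^T
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))    # True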


Uncorrelatedness and Independence

Two random vectors X and Y are uncorrelated iff their cross-covariance matrix C_{XY} = ⟨X Y^⊤⟩ − µ_X µ_Y^⊤ = 0.
Uncorrelatedness implies that

R_{XY} = ⟨X Y^⊤⟩ = ⟨X⟩⟨Y^⊤⟩ = µ_X µ_Y^⊤,

while independence implies that

⟨g(X) h(Y)⟩ = ⟨g(X)⟩ ⟨h(Y)⟩   for any g, h

⇒ Independence much stronger property than uncorrelatedness


Special property of jointly Gaussian random vectors:
uncorrelatedness = independence
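
A classic sketch of the difference (an assumed example): for X ~ N(0, 1) and Y = X², X and Y are uncorrelated (⟨XY⟩ = ⟨X³⟩ = 0 = ⟨X⟩⟨Y⟩), yet Y is fully determined by X and therefore not independent of it.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2                                              # dependent on x by construction

print(np.mean(x * y) - np.mean(x) * np.mean(y))         # approx 0: uncorrelated
print(np.mean(x**2 * y) - np.mean(x**2) * np.mean(y))   # approx 2, not 0:
# with g(x) = x^2 and h(y) = y the factorization <g(X)h(Y)> = <g(X)><h(Y)> fails,
# so X and Y cannot be independent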
