
Lecture Notes on

Independent Component Analysis


Laurenz Wiskott
Institut für Neuroinformatik
Ruhr-Universität Bochum, Germany, EU

11 December 2016

Contents

1 Intuition
  1.1 Mixing and unmixing
  1.2 How to find the unmixing matrix?
  1.3 Sources can only be recovered up to permutation and rescaling
  1.4 Whiten the data first
  1.5 A generic ICA algorithm

2 Formalism based on cumulants
  2.1 Moments and cumulants
  2.2 Cross-cumulants of statistically independent components are zero
  2.3 Components with zero cross-cumulants are statistically independent
  2.4 Rotated cumulants
  2.5 Contrast function
  2.6 Givens-rotations
  2.7 Optimizing the contrast function
  2.8 The algorithm

3 Other resources
  3.1 Written material
  3.2 Videos
  3.3 Software
  3.4 Exercises

These lecture notes depend on my lecture notes on principal component analysis and are largely based on
(Hyvärinen et al., 2001; Blaschke and Wiskott, 2004).

© 2009, 2011–2013, 2016 Laurenz Wiskott (homepage https://www.ini.rub.de/PEOPLE/wiskott/). This work (except for all figures from other sources, if present) is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/. Figures from other sources have their own copyright, which is generally indicated. Do not distribute parts of these lecture notes showing figures with non-free copyrights (here usually figures I have the rights to publish but you don't, like my own published figures). Figures I do not have the rights to publish are grayed out, but the word Figure, Image, or the like in the reference is often linked to a pdf.
More teaching material is available at https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/.

1 Intuition
1.1 Mixing and unmixing
In contrast to principal component analysis, which deals with the second-order moments of a data distribution, independent component analysis focuses on higher-order moments, which can, of course, be of very diverse and very complex nature. In (linear) independent component analysis (ICA) one assumes¹ a very simple model of the data, namely that it is a linear mixture (D: Mischung) of some statistically independent sources (D: Quellen) s_i, and one often even assumes that the number of sources I is the same as the dimensionality N of the data. Each source is characterized by a probability density function (pdf) (D: Wahrscheinlichkeitsdichtefunktion) p_{s_i}(s_i), and the joint pdf of the sources is simply the product of the individual pdfs; for two sources we have

    p_s(s_1, s_2) = p_{s_1}(s_1) p_{s_2}(s_2) .    (1)

For an example see figure 1.

¹ Important text (but not inline formulas) is set in bold face; marginal marks indicate important formulas worth remembering as well as less important formulas, which I also discuss in the lecture; + marks sections that I typically skip during my lectures.

Figure 1: Individual and joint pdfs of two sources, p_{s_1}(s_1), p_{s_2}(s_2), and p_s(s_1, s_2). (CC BY-SA 4.0)

It is assumed that the data is generated by mixing the sources linearly like

x := Ms , (2)

with an invertible square mixing matrix M. The task of independent component analysis then is to find a
square matrix U that inverts this mixture, so that

y := Ux (3)

recovers the sources. Consider first two examples where we know the mixing.

Example 1: Figure 2 shows an example of the two-dimensional mixture

    x := s_1 d_1 + s_2 d_2 = (d_1  d_2) (s_1, s_2)^T ,   with M := (d_1  d_2) ,    (4)

of the sources of figure 1, where the mixing is orthogonal. This means that the matrix M is orthogonal if we assume the vectors d_i to be normalized. Notice that the vectors d_i indicate how a single source is distributed over the different data vector components. I therefore call them distribution vectors (D: Verteilungsvektoren), but this is not a standard term. They do not indicate the mixing of all sources on one component of the data vector, thus calling them mixture vectors would be misleading. Notice also that the fact that the data is concentrated around d_2 is a consequence of s_1 (not s_2!) being concentrated around zero.

Figure 2: A two-dimensional orthogonal mixture of the two sources given above. (CC BY-SA 4.0)

Since the vectors d_i are orthogonal, extracting the sources from the mixture can be done by multiplying with these vectors, for instance

    y_1 := d_1^T x                                        (5)
         = d_1^T (s_1 d_1 + s_2 d_2)       (using (4))    (6)
         = s_1 d_1^T d_1 + s_2 d_1^T d_2                  (7)
         = s_1 ,     (since d_1^T d_1 = 1 and d_1^T d_2 = 0)    (8)

and likewise for y_2.


The unmixing matrix therefore is

    U := ( d_1^T )
         ( d_2^T ) ,    (9)

since

    U M = ( d_1^T ) (d_1  d_2)          (using (9) and (4))    (10)
          ( d_2^T )
        = ( d_1^T d_1   d_1^T d_2 )
          ( d_2^T d_1   d_2^T d_2 )                            (11)
        = ( 1   0 )
          ( 0   1 ) .                                          (12)
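To make the orthogonal example concrete, here is a minimal numerical sketch (my own illustration, not part of the original notes; the source distributions and the angle are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two statistically independent, zero-mean sources: s1 peaked around zero, s2 broad.
s = np.vstack([rng.laplace(size=10000), rng.uniform(-2, 2, size=10000)])

# Orthogonal mixing: normalized, mutually orthogonal distribution vectors d1, d2.
theta = 0.6
d1 = np.array([np.cos(theta), np.sin(theta)])
d2 = np.array([-np.sin(theta), np.cos(theta)])
M = np.column_stack([d1, d2])    # mixing matrix, cf. eq. (4)
x = M @ s                        # mixed data, cf. eq. (2)

# Because M is orthogonal, projecting onto d1 and d2 recovers the sources, eqs. (5)-(8).
print(np.allclose(d1 @ x, s[0]), np.allclose(d2 @ x, s[1]))   # True True
```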

Example 2: If the distribution vectors are not orthogonal, matters become a bit more complicated, see figure 3. It is somewhat counterintuitive that now the vectors e_i to extract the sources from the mixture do not have to point in the direction of the corresponding distribution vectors but rather must be orthogonal to all other distribution vectors, so that

    e_i^T d_j = δ_ij .    (13)

Figure 3: A two-dimensional non-orthogonal mixture of the two sources given above, with distribution vectors d_1, d_2 and extraction vectors e_1, e_2. (CC BY-SA 4.0)

Notice that in the figure e_1 has the same angle to d_1 as e_2 has to d_2, and that e_1 must be shorter than e_2, because d_1 is longer than d_2, to keep the inner products e_i^T d_i equal.
With the extraction vectors (D: Extraktionsvektoren) e_i we get the unmixing matrix

    U := ( e_1^T )
         ( e_2^T )    (14)

and verify

    U M = ( e_1^T ) (d_1  d_2)          (using (14) and (4))    (15)
          ( e_2^T )
        = ( e_1^T d_1   e_1^T d_2 )
          ( e_2^T d_1   e_2^T d_2 )                             (16)
        = ( 1   0 )
          ( 0   1 ) .                    (using (13))           (17)
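For the non-orthogonal case, the extraction vectors e_i satisfying (13) are simply the rows of M^{-1}. A small self-contained sketch (again my own illustration; the vectors d_1, d_2 are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
s = np.vstack([rng.laplace(size=10000), rng.uniform(-2, 2, size=10000)])

# Non-orthogonal distribution vectors (linearly independent, but not orthogonal).
d1 = np.array([2.0, 1.0])
d2 = np.array([0.5, 1.0])
M = np.column_stack([d1, d2])
x = M @ s

# The extraction vectors e_i are the rows of the inverse of M; they fulfill e_i^T d_j = delta_ij, eq. (13).
U = np.linalg.inv(M)
e1, e2 = U[0], U[1]
print(np.allclose(e1 @ d1, 1.0), np.allclose(e1 @ d2, 0.0))   # True True
print(np.allclose(U @ x, s))                                  # sources recovered: True
```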

1.2 How to find the unmixing matrix?


So far we have only derived unmixing matrices if the mixing matrix was known. But we have not said anything
about how to find the unmixing matrix if only the data is given and nothing is known about the mixing (or
the sources) apart from it being linear. So we need some statistical criteria to judge whether an
unmixing matrix is good or not. There are two fundamental approaches, either one compares the joint
pdf of the unmixed data with the product of its marginals, or one maximizes the non-Gaussianity of the
marginals.

Make the output signal components statistically independent: A fundamental assumption of ICA
is that the data is a linear mixture of statistically independent sources. If we unmix the data the resulting
output signal components should therefore again be statistically independent. Thus a possible criterion for
whether the unmixing is good or not is whether
    p_y(y_1, y_2) = p_{y_1}(y_1) p_{y_2}(y_2)    (18)

holds or not. In practice one measures the difference between the joint distribution and the product of marginals
and tries to minimize it. For instance one can use the Kullback-Leibler divergence between p_y(y_1, y_2) and p_{y_1}(y_1) p_{y_2}(y_2). In this lecture I will focus on cross-cumulants as a measure of statistical
independence.
This approach necessarily optimizes the unmixing matrix as a whole.

Make the output signal components non-Gaussian: Another approach is based on the observation
that, if you mix two sources, the mixture tends to be more Gaussian than the sources. Applied
over and over again this culminates in the central limit theorem of statistics, which basically states that a
mixture of infinitely many variables has a Gaussian distribution. Turning this argument around, it might be a good strategy to search for output signal components that are as different from a Gaussian as possible. These are then most likely the sources. To measure non-Gaussianity, one often uses kurtosis (D: Kurtosis), see below.
This approach permits extracting one source after the other. One then typically first extracts the most non-Gaussian signal, eliminates the corresponding dimension from the data, and then finds the second most non-Gaussian signal.

1.3 Sources can only be recovered up to permutation and rescaling


It is clear that if y_1 and y_2 are statistically independent, then 3y_2 and 0.5y_1 are also statistically independent. Thus, there is no way to tell the order of the sources or their scale. A similar argument holds for the approach based on non-Gaussianity. Thus, the sources can in principle only be recovered up to a permutation and rescaling.

1.4 Whiten the data first


The least one can expect from the estimated sources is that they are uncorrelated. To fix the arbitrary
scaling factor, it is also common to require the output data components to have unit variance (zero mean
is assumed in any case). It is therefore a good idea to whiten or sphere the data first, because then
we know that the data projected onto any normalized vector has zero mean and unit variance and that
the data projected onto two orthogonal vectors are uncorrelated. Thus, the unmixing matrix must be
orthogonal and the unmixing problem reduces to finding the right rotation, which is still difficult enough.
Intuitively, whitening brings us from the situation in figure 3 to that in figure 2.
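A whitening step can be sketched as follows, here via PCA whitening (a minimal illustration under the zero-mean/unit-variance conventions stated above; the function name and data are my own):

```python
import numpy as np

def whiten(x):
    """PCA-whiten data of shape (dim, samples): zero mean, unit covariance."""
    x = x - x.mean(axis=1, keepdims=True)            # remove the mean
    eigval, eigvec = np.linalg.eigh(np.cov(x))       # eigendecomposition of the covariance
    W = np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T    # whitening matrix W
    return W @ x, W

rng = np.random.default_rng(0)
x = np.array([[2.0, 0.5], [1.0, 1.0]]) @ np.vstack([rng.laplace(size=5000),
                                                    rng.uniform(-1, 1, size=5000)])
xw, W = whiten(x)
print(np.round(np.cov(xw), 2))   # approximately the identity matrix
```

After this step, the remaining unmixing transformation can indeed be restricted to a rotation.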

1.5 A generic ICA algorithm


Summarizing what we have learned so far, a generic ICA algorithm is conceptually relatively simple and
works as follows:

1. remove the mean and whiten the data,


2. rotate the data such that either

(a) the output signal components are as statistically independent as possible, or


(b) the output signal components are most non-Gaussian.
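In practice one rarely implements this pipeline from scratch; for instance, FastICA from scikit-learn (see section 3.3) bundles centering, whitening, and the rotation search. A minimal usage sketch (the data here is artificial and the parameter choices are only illustrative):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = np.column_stack([rng.laplace(size=2000), rng.uniform(-1, 1, size=2000)])  # independent sources
A = np.array([[2.0, 1.0], [0.5, 1.0]])                                        # some mixing matrix
X = S @ A.T                                                                   # observed mixtures, shape (samples, dims)

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)     # estimated sources (up to permutation and scaling)
A_est = ica.mixing_              # estimated mixing matrix
```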

2 Formalism based on cumulants


There is a whole zoo of different ICA algorithms. I focus here on a method based on higher-order cumulants.

2.1 Moments and cumulants
Moments and cumulants are a convenient way of describing statistical distributions. If all moments or all
cumulants are known, the distribution is uniquely determined. Thus, moments and cumulants are equivalent
descriptions of a distribution, although one usually only uses the lower moments or cumulants. If ⟨·⟩ indicates the average over all data points, then the moments of some vectorial data y, written in terms of single components, are defined as

    first moment      ⟨y_i⟩              (19)
    second moment     ⟨y_i y_j⟩          (20)
    third moment      ⟨y_i y_j y_k⟩      (21)
    fourth moment     ⟨y_i y_j y_k y_l⟩  (22)
    higher moments    ... .

A disadvantage of moments is that higher moments contain information that can already be expected from
lower moments, for instance

    ⟨y_i y_j⟩ = ⟨y_i⟩ ⟨y_j⟩ + ? ,    (23)

or for zero-mean data

    ⟨y_i y_j y_k y_l⟩ = ⟨y_i y_j⟩ ⟨y_k y_l⟩ + ⟨y_i y_k⟩ ⟨y_j y_l⟩ + ⟨y_i y_l⟩ ⟨y_j y_k⟩ + ? .    (24)

It would be nice to have a definition for only the part that is not known yet, much like the variance for the
second moment of a scalar variable,

    ⟨(y - ⟨y⟩)^2⟩ = ⟨yy⟩ - ⟨y⟩ ⟨y⟩ ,    (25)

where one has removed the influence of the mean from the second moment. This idea can be generalized
to mixed and higher moments. If one simply subtracts off from the higher moments what one
would expect from the lower moments already, one gets corrected moments that are called
cumulants (D: Kumulanten). For simplicity we assume zero mean data, then
    C_i    := ⟨y_i⟩ = 0   (by the zero-mean assumption),                                        (26)
    C_ij   := ⟨y_i y_j⟩ ,                                                                       (27)
    C_ijk  := ⟨y_i y_j y_k⟩ ,                                                                   (28)
    C_ijkl := ⟨y_i y_j y_k y_l⟩ - ⟨y_i y_j⟩ ⟨y_k y_l⟩ - ⟨y_i y_k⟩ ⟨y_j y_l⟩ - ⟨y_i y_l⟩ ⟨y_j y_k⟩ .   (29)

Notice that there are no terms to be subtracted off from C_ij and C_ijk due to the zero-mean constraint. Any term that one might subtract would include a factor of the form ⟨y_i⟩, which is zero.
For scalar variables these four cumulants have nice intuitive interpretations. We know already that C_i and C_ii are the mean and variance of the data, respectively. C_iii (often normalized by C_ii^{3/2}) is the skewness (D: Schiefe) of the distribution, which indicates how much the distribution is tilted to the left or to the right. C_iiii (often normalized by C_ii^2) is the kurtosis (D: Kurtosis) of the distribution, which indicates how peaky the distribution is. A uniform distribution has negative kurtosis and is called sub-Gaussian; a peaky distribution with heavy tails has positive kurtosis and is called super-Gaussian. A Gaussian distribution has zero kurtosis, and all its cumulants beyond the second vanish as well.
Cumulants have the important property that if you add two (or more) statistically independent variables, the cumulants of the sum equal the sums of the cumulants of the variables.

Figure 4: Illustration of the effect of the first four cumulants in one variable, i.e. mean, variance, skewness, and kurtosis (each panel compares distributions with the cumulant below, equal to, and above its reference value). (CC BY-SA 4.0)
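The scalar cumulants just discussed are easy to estimate from data; a small sketch (my own illustration) that reproduces the sign pattern of the kurtosis for sub- and super-Gaussian distributions:

```python
import numpy as np

def cumulants_1d(y):
    """Estimate mean (C_i), variance (C_ii), skewness (C_iii), and kurtosis (C_iiii) of a scalar variable."""
    m = y.mean()
    yc = y - m                          # zero-mean data, as assumed in (26)-(29)
    C2 = np.mean(yc**2)
    C3 = np.mean(yc**3)
    C4 = np.mean(yc**4) - 3 * C2**2     # eq. (29) with i = j = k = l
    return m, C2, C3, C4

rng = np.random.default_rng(0)
print(cumulants_1d(rng.normal(size=100000))[3])          # ~0: Gaussian
print(cumulants_1d(rng.uniform(-1, 1, size=100000))[3])  # < 0: sub-Gaussian (uniform)
print(cumulants_1d(rng.laplace(size=100000))[3])         # > 0: super-Gaussian (peaky, heavy tails)
```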

2.2 Cross-cumulants of statistically independent components are zero


First notice that if the random variables of a higher moment can be split into two statistically
independent groups, then the moment can be written as a product of the two lower moments
of the groups. For instance, if y_i and y_j are statistically independent of y_k, then

    ⟨y_i y_j y_k⟩ = ∫∫∫ y_i y_j y_k p(y_i, y_j, y_k) dy_k dy_j dy_i                                      (30)
                  = ∫∫∫ y_i y_j y_k p(y_i, y_j) p(y_k) dy_k dy_j dy_i    (due to statistical independence)    (31)
                  = ∫∫ y_i y_j p(y_i, y_j) dy_j dy_i  ∫ y_k p(y_k) dy_k                                  (32)
                  = ⟨y_i y_j⟩ ⟨y_k⟩ .                                                                    (33)

This has an important implication for cumulants of statistically independent variables. We have argued
above that a cumulant can be interpreted as the corresponding moment of identical structure minus all that can be expected from lower-order moments (or cumulants) already. If the random variables can be split into two statistically independent groups, then the corresponding moment can be written as a product of two lower moments, which means that it can be completely predicted from these lower moments. As a consequence the cumulant is zero, because there is nothing left that could not be predicted. Thus we can state (without proof) that if a set of random variables is statistically independent, then all cross-cumulants vanish. For instance, for statistically independent random variables y_i and y_j with zero mean we get

    C_iij  = ⟨y_i y_i y_j⟩                                                                       (34)
           = ⟨y_i y_i⟩ ⟨y_j⟩      (using (33), since y_i and y_j are statistically independent)  (35)
           = 0                    (since the data is zero-mean, ⟨y_j⟩ = 0) ,                     (36)

    C_iijj = ⟨y_i y_i y_j y_j⟩ - ⟨y_i y_i⟩ ⟨y_j y_j⟩ - ⟨y_i y_j⟩ ⟨y_i y_j⟩ - ⟨y_i y_j⟩ ⟨y_i y_j⟩     (37)
           = ⟨y_i y_i⟩ ⟨y_j y_j⟩ - ⟨y_i y_i⟩ ⟨y_j y_j⟩ - 2 ⟨y_i⟩ ⟨y_j⟩ ⟨y_i⟩ ⟨y_j⟩    (using (33))    (38)
           = 0                    (since the data is zero-mean) .                                (39)
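These identities are easy to check numerically; a quick sketch (my own illustration) estimating the cross-cumulant C_iijj of (37) for two independent variables:

```python
import numpy as np

rng = np.random.default_rng(0)
yi = rng.laplace(size=200000)
yj = rng.uniform(-1, 1, size=200000)
yi -= yi.mean()                          # enforce zero mean, as assumed above
yj -= yj.mean()

# Cross-cumulant C_iijj according to eq. (37):
C_iijj = (np.mean(yi**2 * yj**2)
          - np.mean(yi**2) * np.mean(yj**2)
          - 2 * np.mean(yi * yj)**2)
print(C_iijj)                            # close to zero (up to sampling noise)

# For dependent variables, e.g. yk = yi**2 - <yi**2>, the same expression is clearly non-zero.
yk = yi**2 - np.mean(yi**2)
print(np.mean(yi**2 * yk**2) - np.mean(yi**2) * np.mean(yk**2) - 2 * np.mean(yi * yk)**2)
```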

2.3 Components with zero cross-cumulants are statistically independent


The converse is also true (again without proof). If all cross-cumulants vanish then the random
variables are statistically independent. Notice that pairwise statistical independence does not generally
suffice for the overall statistical independence of the variables. Consider, for instance, the three binary
variables A, B, and C = A xor B.
Thus, the cross-cumulants can be used to measure the statistical dependence between the output signal
components in ICA. Of course, one cannot use all cross-cumulants, but for instance one can require

    C_ij = δ_ij      (unit covariance matrix) ,                     (40)

and

    Σ_{ijk ≠ iii} C_ijk^2      minimal,                             (41)

or

    Σ_{ijkl ≠ iiii} C_ijkl^2   minimal,                             (42)

or

    Σ_{ijk ≠ iii} C_ijk^2 + Σ_{ijkl ≠ iiii} C_ijkl^2   minimal.     (43)

Usually one uses the fourth-order cumulants, because signals are often symmetric and then the third-order cumulants vanish in any case. Assuming we disregard third-order cumulants, the optimization can be stated as follows: Given some multi-dimensional input data x, find the matrix U that produces output data

    y = U x    (44)

that minimizes (42) under the constraint (40). The constraint is trivial to fulfill by whitening the data. Once the data has been whitened with some matrix W to obtain the whitened data x' := W x, the only transformation that is left to do is a rotation by a rotation matrix R, as we have discussed earlier. Thus, we have

    y = R x'    (45)
      = R W x ,   with U := R W .    (46)

2.4 Rotated cumulants
Let x' and y denote the whitened and the rotated data, respectively, and C^{x'}_αβγδ and C^y_ijkl the corresponding cumulants. Then we can write the cumulants of y in terms of the cumulants of x':

    C^y_ijkl = ⟨y_i y_j y_k y_l⟩ - ⟨y_i y_j⟩ ⟨y_k y_l⟩ - ⟨y_i y_k⟩ ⟨y_j y_l⟩ - ⟨y_i y_l⟩ ⟨y_j y_k⟩    (47)
             = ⟨ Σ_α R_iα x'_α  Σ_β R_jβ x'_β  Σ_γ R_kγ x'_γ  Σ_δ R_lδ x'_δ ⟩
               - ⟨ Σ_α R_iα x'_α  Σ_β R_jβ x'_β ⟩ ⟨ Σ_γ R_kγ x'_γ  Σ_δ R_lδ x'_δ ⟩ - ... - ...        (48)
                 (since by (45) y_i = Σ_α R_iα x'_α)
             = Σ_αβγδ R_iα R_jβ R_kγ R_lδ ( ⟨x'_α x'_β x'_γ x'_δ⟩ - ⟨x'_α x'_β⟩ ⟨x'_γ x'_δ⟩ - ... - ... )    (49)
             = Σ_αβγδ R_iα R_jβ R_kγ R_lδ C^{x'}_αβγδ ,    (50)

where the term in parentheses in (49) is just C^{x'}_αβγδ. The fact that the new cumulants are simply linear combinations of the old cumulants reflects the multilinearity of cumulants.

2.5 Contrast function


It is interesting to note that the square sum over all cumulants of a given order does not change
under a rotation, as can be easily verified for fourth order.
    Σ_ijkl (C^y_ijkl)^2 = Σ_ijkl ( Σ_αβγδ R_iα R_jβ R_kγ R_lδ C^{x'}_αβγδ )^2      (using (50))    (51)
        = Σ_ijkl Σ_αβγδ Σ_α'β'γ'δ' R_iα R_jβ R_kγ R_lδ C^{x'}_αβγδ R_iα' R_jβ' R_kγ' R_lδ' C^{x'}_α'β'γ'δ'    (52)
        = Σ_αβγδ Σ_α'β'γ'δ' C^{x'}_αβγδ C^{x'}_α'β'γ'δ' (Σ_i R_iα R_iα') (Σ_j R_jβ R_jβ') (Σ_k R_kγ R_kγ') (Σ_l R_lδ R_lδ')    (53)
        = Σ_αβγδ (C^{x'}_αβγδ)^2 ,    (54)

since each of the four sums in parentheses in (53) equals a Kronecker delta (δ_αα', δ_ββ', δ_γγ', δ_δδ') due to the orthogonality of R.

This implies that minimizing the square sum over all cross-cumulants of order four is equivalent to maximizing the square sum over the kurtoses of all components, i.e.

    minimize   Σ_{ijkl ≠ iiii} C_ijkl^2        (55)

    maximize   Ψ_4 := Σ_i C_iiii^2 ,           (56)

since the sum of these two quantities is constant. This is one way of formalizing our intuition that making all components statistically independent is equivalent to making them as non-Gaussian as possible. Maximizing Ψ_4 is obviously much easier than minimizing the square sum over all cross-cumulants. Thus, we will use it as our objective function or contrast function.
A corresponding relationship also holds for third-order cumulants. However, if the sources are symmetric, which they might well be, then their skewness is zero in any case and maximizing their square sum is of little use. One therefore usually considers fourth- rather than third-order cumulants for ICA. Considering both simultaneously, however, might even be better.
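The rotation invariance (51)–(54), and hence the equivalence of (55) and (56), can be verified numerically. The following sketch (my own illustration) estimates the full fourth-order cumulant tensor of zero-mean 2D data before and after a random rotation:

```python
import numpy as np

def cumulant4(y):
    """Fourth-order cumulant tensor C_ijkl of zero-mean data y of shape (dim, samples), eq. (29)."""
    m4 = np.einsum('it,jt,kt,lt->ijkl', y, y, y, y) / y.shape[1]   # fourth-order moments
    c2 = np.cov(y, bias=True)                                      # second-order moments
    return (m4 - np.einsum('ij,kl->ijkl', c2, c2)
               - np.einsum('ik,jl->ijkl', c2, c2)
               - np.einsum('il,jk->ijkl', c2, c2))

rng = np.random.default_rng(0)
y = np.vstack([rng.laplace(size=100000), rng.uniform(-1, 1, size=100000)])
y -= y.mean(axis=1, keepdims=True)

phi = rng.uniform(0, 2 * np.pi)
R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
C, C_rot = cumulant4(y), cumulant4(R @ y)

kurt2 = lambda C: np.sum(np.einsum('iiii->i', C)**2)   # Psi_4: square sum of kurtoses, eq. (56)
print(np.sum(C**2), np.sum(C_rot**2))                  # total square sum: unchanged by the rotation, eq. (54)
print(kurt2(C), kurt2(C_rot))                          # Psi_4 itself does change with the rotation
```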

2.6 Givens-rotations
A rotation matrix in 2D is given by

    R = ( cos φ   -sin φ )
        ( sin φ    cos φ ) .    (57)

In higher dimensions a rotation matrix can be quite complex. To keep things simple, we use so-called Givens rotations (D: Givens-Rotation), which are defined as a rotation within the 2D subspace spanned by two axes n and m. For instance a rotation within the plane spanned by the second and the fourth axis (n = 2, m = 4) of a four-dimensional space is given by

    R = ( 1     0     0     0    )
        ( 0   cos φ   0  -sin φ  )
        ( 0     0     1     0    )
        ( 0   sin φ   0   cos φ  ) .    (58)

It can be shown that any general rotation, i.e. any orthogonal matrix with positive determinant, can be written as a product of Givens rotations. Thus, we can find the general rotation matrix of the ICA problem by applying a series of Givens rotations, each time improving the objective function (42) a bit.

2.7 Optimizing the contrast function


We have argued above that ICA can be performed by repeatedly applying a Givens rotation to the whitened data. Each time two axes (n and m) that define the rotation plane are selected at random, and then the rotation angle is optimized to maximize the value of the contrast function. By applying enough of such Givens rotations, the algorithm should eventually converge to the globally optimal solution (although we have no guarantee for that). Writing the contrast function as a function of the rotation angle yields

    Ψ_4(φ) := Σ_i (C^y_iiii)^2                                                   (59)
            = Σ_i ( Σ_αβγδ R_iα R_iβ R_iγ R_iδ C^{x'}_αβγδ )^2     (using (50))  (60)
            = K + Σ_{i=n,m} ( Σ_{α,β,γ,δ = n,m} R_iα R_iβ R_iγ R_iδ C^{x'}_αβγδ )^2 .    (61)

For i ≠ n, m the entries of the rotation matrix are R_iξ = δ_iξ for ξ = α, β, γ, δ and do not depend on φ. Thus, these terms are constant and are contained in K.
For i = n, m the entries of the rotation matrix are R_iξ = 0 for ξ ≠ n, m and R_iξ ∈ {±cos(φ), ±sin(φ)} for ξ = n, m (with ξ standing for α, β, γ, δ). Thus, the sums over α, β, γ, δ can be restricted to n, m, and each resulting term contains eight cosine- or sine-factors, because R_iξ ∈ {±cos(φ), ±sin(φ)} and due to the squaring. Thus, Ψ_4 can always be written as

    Ψ_4(φ) = K + Σ_{p=0}^{8} k'_p cos(φ)^{8-p} sin(φ)^p .    (62)

The old cumulants C^{x'}_αβγδ are all contained in the constants k'_p.
We can simplify Ψ_4(φ) even further with the following two considerations: Firstly, a rotation by a multiple of 90° should have no effect on the contrast function, because that would only flip or exchange the components of y. Thus, Ψ_4 must have a 90° periodicity and can always be written like

    Ψ_4(φ) = A_0 + A_4 cos(4φ + φ_4) + A_8 cos(8φ + φ_8) + A_12 cos(12φ + φ_12) + A_16 ...    (63)

Secondly, products of two sin- and cos-functions produce constants and frequency-doubled terms, e.g.

    sin(φ)^2      = (1 - cos(2φ))/2 ,    (64)
    cos(φ)^2      = (1 + cos(2φ))/2 ,    (65)
    sin(φ) cos(φ) = sin(2φ)/2 .          (66)

Products of eight sin- and cos-functions therefore produce at most eightfold frequencies (three times frequency doubling). This limits equation (63) to terms up to 8φ. Thus, finally we get the following contrast function for one Givens rotation:

    Ψ_4(φ) = A_0 + A_4 cos(4φ + φ_4) + A_8 cos(8φ + φ_8) .    (67)


Again the constants contain the old cumulants C^{x'}_αβγδ in some complicated but computable form. It is relatively simple to find the maximum of this function once the constants are known.

2.8 The algorithm


A cumulant-based algorithm for independent component analysis could now look as follows (a code sketch follows the list):

1. Whiten the data x with whitening matrix W and create y = Wx.

2. Select two axes/variables y_n and y_m at random.

3. Rotate y in the plane spanned by y_n and y_m such that Ψ_4(φ) is maximized. This leads to a new y.

4. Go to step 2 unless a suitable convergence criterion is fulfilled, for instance, the rotation angle has been smaller than some threshold for the last 1000 iterations.

5. Stop.
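For illustration, here is a compact toy implementation of the whole scheme (entirely my own sketch, not the code from the notes). Instead of determining the constants A_0, A_4, A_8 of (67) analytically, it simply evaluates Ψ_4(φ) on a grid of angles, which is slower but keeps the code short:

```python
import numpy as np

def psi4(y):
    """Contrast Psi_4, eq. (56): square sum of the kurtoses of the components of y (shape (dim, samples))."""
    c2 = np.mean(y**2, axis=1)
    return np.sum((np.mean(y**4, axis=1) - 3 * c2**2)**2)

def givens_rotate(y, n, m, phi):
    """Apply a Givens rotation in the (n, m)-plane to the data y."""
    y = y.copy()
    yn, ym = y[n].copy(), y[m].copy()
    y[n] = np.cos(phi) * yn - np.sin(phi) * ym
    y[m] = np.sin(phi) * yn + np.cos(phi) * ym
    return y

def cumulant_ica(x, sweeps=100, n_angles=180, seed=0):
    """Toy cumulant-based ICA: whiten, then repeated Givens rotations maximizing Psi_4."""
    rng = np.random.default_rng(seed)
    x = x - x.mean(axis=1, keepdims=True)                  # step 1: remove the mean ...
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    y = (np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T) @ x    # ... and whiten
    angles = np.linspace(0.0, np.pi / 2, n_angles)         # 90-degree periodicity, cf. eq. (67)
    for _ in range(sweeps):
        n, m = rng.choice(y.shape[0], size=2, replace=False)              # step 2: random pair of axes
        phi = max(angles, key=lambda a: psi4(givens_rotate(y, n, m, a)))  # step 3: best angle on the grid
        y = givens_rotate(y, n, m, phi)
    return y                                               # estimated sources (up to permutation/scaling)

# Example: unmix a random 2D mixture of a Laplacian and a uniform source.
rng = np.random.default_rng(1)
s = np.vstack([rng.laplace(size=5000), rng.uniform(-1, 1, size=5000)])
x = np.array([[2.0, 1.0], [0.5, 1.0]]) @ s
y = cumulant_ica(x)
print(np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:]).round(2))   # close to a permutation matrix
```

Note that this sketch uses a fixed number of sweeps instead of the convergence criterion described in step 4, which keeps the code shorter but is cruder.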

3 Other resources
Numbers in square brackets indicate sections of these lecture notes to which the corresponding item is related.

3.1 Written material


ICA on Wikipedia
https://en.wikipedia.org/wiki/Independent_component_analysis

3.2 Videos
Abstract conceptual introduction to ICA from Georgia Tech

1. ICA objective
https://www.youtube.com/watch?v=2WY7wCghSVI (2:13)
2. Mixing and unmixing / blind source separation / cocktail party problem
https://www.youtube.com/watch?v=wIlrddNbXDo (4:17)
3. How to formulate this mathematically
https://www.youtube.com/watch?v=pSwRO5d266I (5:15)
4. Application to artificially mixed sound
https://www.youtube.com/watch?v=T0HP9cxri0A (3:25)
5. Quiz Questions PCA vs ICA
https://www.youtube.com/watch?v=TDW0vMz_3ag (0:32)
6. Quiz Answers PCA vs ICA with further comments
https://www.youtube.com/watch?v=SjM2Qm7N9CU (10:04)
7. More on differences between PCA and ICA
https://www.youtube.com/watch?v=e4woe8GRjEI (7:17)
01:23–01:34 I find this statement confusing. Transposing a matrix is a very different operation than rotating data, and I find it highly non-trivial that PCA works also on the transposed data matrix, see the section on singular value decomposition in the lecture notes on PCA.
02:49–02:59 Hm, this is not quite true. Either the mean has been removed from the data, which is normally the case, then the average face is at the origin and cannot be extracted with PCA. Or the mean has not been removed, then it is typically the first eigenvector (not the second) that goes through the mean, although not exactly, as one can see in one of the python exercises. There are marked differences around the eyes.

Lecture on ICA based on cumulants by Santosh Vempala, Georgia Institute of Technology.
https://www.youtube.com/watch?v=KSIA908KNiw (52:22)
00:45–06:31 Introducing the ICA problem definition
06:31–07:50 Would PCA solve the problem? No!
07:50–15:04 Formulation of a deflation algorithm based on cumulants
15:04–20:35 (Reformulation with tensor equations)
20:35 (Two problems in solving the ICA problem)
* 21:32 (What if the second and fourth order cumulants are zero?)
25:26 (Algorithm: Fourier PCA)
* ...
...

3.3 Software
FastICA in scikit-learn, a python library for machine learning
http://scikit-learn.org/stable/modules/decomposition.html#independent-component-analysis-ica
Examples using FastICA in scikit-learn
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html#sklearn.decomposition.FastICA

3.4 Exercises
Analytical exercises by Laurenz Wiskott
https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/IndependentComponentAnalysis-Exercises.pdf
https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/IndependentComponentAnalysis-Solutions.pdf
Python exercises by Laurenz Wiskott
https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/IndependentComponentAnalysis-PythonExercises.zip
https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/IndependentComponentAnalysis-PythonSolutions.zip

References
Blaschke, T. and Wiskott, L. (2004). CuBICA: Independent component analysis by simultaneous third- and fourth-order cumulant diagonalization. IEEE Transactions on Signal Processing, 52(5):1250–1256.
Hyvärinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis. John Wiley & Sons.
