
Lecture Notes on

Independent Component Analysis


Laurenz Wiskott
Institut für Neuroinformatik
Ruhr-Universität Bochum, Germany, EU

11 December 2016

Contents

1 Intuition
  1.1 Mixing and unmixing
  1.2 How to find the unmixing matrix?
  1.3 Sources can only be recovered up to permutation and rescaling
  1.4 Whiten the data first
  1.5 A generic ICA algorithm

2 Formalism based on cumulants
  2.1 Moments and cumulants
  2.2 Cross-cumulants of statistically independent components are zero
  2.3 Components with zero cross-cumulants are statistically independent
  2.4 Rotated cumulants
  2.5 Contrast function
  2.6 Givens-rotations
  2.7 Optimizing the contrast function
  2.8 The algorithm

3 Other resources
  3.1 Written material
  3.2 Videos
  3.3 Software
  3.4 Exercises

These lecture notes depend on my lecture notes on principal component analysis and are largely based on
(Hyvärinen et al., 2001; Blaschke and Wiskott, 2004).

© 2009, 2011–2013, 2016 Laurenz Wiskott (homepage https://www.ini.rub.de/PEOPLE/wiskott/). This work (except for all figures from other sources, if present) is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/. Figures from other sources have their own copyright, which is generally indicated. Do not distribute parts of these lecture notes showing figures with non-free copyrights (here usually figures I have the rights to publish but you don't, like my own published figures). Figures I do not have the rights to publish are grayed out, but the word Figure, Image, or the like in the reference is often linked to a pdf.
More teaching material is available at https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/.

1 Intuition
1.1 Mixing and unmixing
In contrast to principal component analysis, which deals with the second-order moments of a data distribution, independent component analysis focuses on higher-order moments, which can, of course, be of very diverse and very complex nature. In (linear) independent component analysis (ICA) one assumes¹ a very simple model of the data, namely that it is a linear mixture (D: Mischung) of some statistically independent sources (D: Quellen) s_i, and one often even assumes that the number of sources I is the same as the dimensionality N of the data. Each source is characterized by a probability density function (pdf) (D: Wahrscheinlichkeitsdichtefunktion) p_{s_i}(s_i), and the joint pdf of the sources is simply the product of the individual pdfs; for two sources we have

    p_s(s_1, s_2) = p_{s_1}(s_1) p_{s_2}(s_2) .    (1)

For an example see figure 1.

¹ Important text (but not inline formulas) is set in bold face; marginal marks indicate important formulas worth remembering as well as less important formulas, which I also discuss in the lecture; + marks sections that I typically skip during my lectures.

Figure 1: Individual and joint pdfs of two sources, p_{s_1}(s_1), p_{s_2}(s_2), and p_s(s_1, s_2). (CC BY-SA 4.0)

It is assumed that the data is generated by mixing the sources linearly like

x := Ms , (2)

with an invertible square mixing matrix M. The task of independent component analysis then is to find a
square matrix U that inverts this mixture, so that

y := Ux (3)

recovers the sources. Consider first two examples where we know the mixing.

Example 1: Figure 2 shows an example of the two-dimensional mixture

    x := s_1 d_1 + s_2 d_2 = (d_1  d_2) (s_1, s_2)^T ,   with M := (d_1  d_2) ,    (4)

of the sources of figure 1, where the mixing is orthogonal. This means that the matrix M is orthogonal if we assume the vectors d_i to be normalized. Notice that the vectors d_i indicate how a single source is distributed over the different data vector components. I therefore call them distribution vectors (D: Verteilungsvektoren), but this is not a standard term. They do not indicate the mixing of all sources on one component of the data vector, thus calling them mixture vectors would be misleading. Notice also that the fact that the data is concentrated around d_2 is a consequence of s_1 (not s_2!) being concentrated around zero.

Figure 2: A two-dimensional orthogonal mixture of the two sources given above. (CC BY-SA 4.0)

Since the vectors d_i are orthogonal, extracting the sources from the mixture can be done by multiplying with these vectors, for instance

    y_1 := d_1^T x                                        (5)
         = d_1^T (s_1 d_1 + s_2 d_2)       (using (4))    (6)
         = s_1 d_1^T d_1 + s_2 d_1^T d_2                  (7)
         = s_1 ,     (since d_1^T d_1 = 1 and d_1^T d_2 = 0)    (8)

and likewise for y_2.


The unmixing matrix therefore is

    U := ( d_1^T )
         ( d_2^T ) ,    (9)

since

    U M = ( d_1^T ) (d_1  d_2)          (using (9) and (4))    (10)
          ( d_2^T )
        = ( d_1^T d_1   d_1^T d_2 )
          ( d_2^T d_1   d_2^T d_2 )                            (11)
        = ( 1   0 )
          ( 0   1 ) .                                          (12)
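To make the orthogonal example concrete, here is a minimal numerical sketch (my own illustration, not part of the original notes; the source distributions and the angle are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two statistically independent, zero-mean sources: s1 peaked around zero, s2 broad.
s = np.vstack([rng.laplace(size=10000), rng.uniform(-2, 2, size=10000)])

# Orthogonal mixing: normalized, mutually orthogonal distribution vectors d1, d2.
theta = 0.6
d1 = np.array([np.cos(theta), np.sin(theta)])
d2 = np.array([-np.sin(theta), np.cos(theta)])
M = np.column_stack([d1, d2])    # mixing matrix, cf. eq. (4)
x = M @ s                        # mixed data, cf. eq. (2)

# Because M is orthogonal, projecting onto d1 and d2 recovers the sources, eqs. (5)-(8).
print(np.allclose(d1 @ x, s[0]), np.allclose(d2 @ x, s[1]))   # True True
```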

Example 2: If the distribution vectors are not orthogonal, matters become a bit more complicated, see figure 3. It is somewhat counterintuitive that now the vectors e_i to extract the sources from the mixture do not have to point in the direction of the corresponding distribution vectors but rather must be orthogonal to all other distribution vectors, so that

    e_i^T d_j = δ_ij .    (13)

Figure 3: A two-dimensional non-orthogonal mixture of the two sources given above, with distribution vectors d_1, d_2 and extraction vectors e_1, e_2. (CC BY-SA 4.0)

Notice that in the figure e_1 has the same angle to d_1 as e_2 has to d_2, and that e_1 must be shorter than e_2, because d_1 is longer than d_2, to keep the inner products e_i^T d_i equal.
With the extraction vectors (D: Extraktionsvektoren) e_i we get the unmixing matrix

    U := ( e_1^T )
         ( e_2^T )    (14)

and verify

    U M = ( e_1^T ) (d_1  d_2)          (using (14) and (4))    (15)
          ( e_2^T )
        = ( e_1^T d_1   e_1^T d_2 )
          ( e_2^T d_1   e_2^T d_2 )                             (16)
        = ( 1   0 )
          ( 0   1 ) .                    (using (13))           (17)
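For the non-orthogonal case, the extraction vectors e_i satisfying (13) are simply the rows of M^{-1}. A small self-contained sketch (again my own illustration; the vectors d_1, d_2 are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
s = np.vstack([rng.laplace(size=10000), rng.uniform(-2, 2, size=10000)])

# Non-orthogonal distribution vectors (linearly independent, but not orthogonal).
d1 = np.array([2.0, 1.0])
d2 = np.array([0.5, 1.0])
M = np.column_stack([d1, d2])
x = M @ s

# The extraction vectors e_i are the rows of the inverse of M; they fulfill e_i^T d_j = delta_ij, eq. (13).
U = np.linalg.inv(M)
e1, e2 = U[0], U[1]
print(np.allclose(e1 @ d1, 1.0), np.allclose(e1 @ d2, 0.0))   # True True
print(np.allclose(U @ x, s))                                  # sources recovered: True
```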

1.2 How to find the unmixing matrix?


So far we have only derived unmixing matrices if the mixing matrix was known. But we have not said anything
about how to find the unmixing matrix if only the data is given and nothing is known about the mixing (or
the sources) apart from it being linear. So we need some statistical criteria to judge whether an
unmixing matrix is good or not. There are two fundamental approaches, either one compares the joint
pdf of the unmixed data with the product of its marginals, or one maximizes the non-Gaussianity of the
marginals.

Make the output signal components statistically independent: A fundamental assumption of ICA
is that the data is a linear mixture of statistically independent sources. If we unmix the data the resulting
output signal components should therefore again be statistically independent. Thus a possible criterion for
whether the unmixing is good or not is whether
    p_y(y_1, y_2) = p_{y_1}(y_1) p_{y_2}(y_2)    (18)

holds or not. In practice one measures the difference between the joint distribution and the product of marginals
and tries to minimize it. For instance one can use the Kullback-Leibler divergence between p_y(y_1, y_2) and p_{y_1}(y_1) p_{y_2}(y_2). In this lecture I will focus on cross-cumulants as a measure of statistical
independence.
This approach necessarily optimizes the unmixing matrix as a whole.

Make the output signal components non-Gaussian: Another approach is based on the observation
that, if you mix two sources, the mixture tends to be more Gaussian than the sources. Applied
over and over again this culminates in the central limit theorem of statistics, which basically states that a
mixture of infinitely many variables has a Gaussian distribution. Turning this argument around, it might be a good strategy to search for output signal components that are as different from a Gaussian as possible. These are then most likely the sources. To measure non-Gaussianity, one often uses kurtosis (D: Kurtosis), see below.
This approach permits extracting one source after the other. One then typically first extracts the most non-Gaussian signal, eliminates the corresponding dimension from the data, and then finds the second most non-Gaussian signal.

1.3 Sources can only be recovered up to permutation and rescaling


It is clear that if y_1 and y_2 are statistically independent, then 3y_2 and 0.5y_1 are also statistically independent. Thus, there is no way to tell the order of the sources or their scale. A similar argument holds for the approach based on non-Gaussianity. Thus, the sources can in principle only be recovered up to a permutation and rescaling.

1.4 Whiten the data first


The least one can expect from the estimated sources is that they are uncorrelated. To fix the arbitrary
scaling factor, it is also common to require the output data components to have unit variance (zero mean
is assumed in any case). It is therefore a good idea to whiten or sphere the data first, because then
we know that the data projected onto any normalized vector has zero mean and unit variance and that
the data projected onto two orthogonal vectors are uncorrelated. Thus, the unmixing matrix must be
orthogonal and the unmixing problem reduces to finding the right rotation, which is still difficult enough.
Intuitively, whitening brings us from the situation in figure 3 to that in figure 2.
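A whitening step can be sketched as follows, here via PCA whitening (a minimal illustration under the zero-mean/unit-variance conventions stated above; the function name and data are my own):

```python
import numpy as np

def whiten(x):
    """PCA-whiten data of shape (dim, samples): zero mean, unit covariance."""
    x = x - x.mean(axis=1, keepdims=True)            # remove the mean
    eigval, eigvec = np.linalg.eigh(np.cov(x))       # eigendecomposition of the covariance
    W = np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T    # whitening matrix W
    return W @ x, W

rng = np.random.default_rng(0)
x = np.array([[2.0, 0.5], [1.0, 1.0]]) @ np.vstack([rng.laplace(size=5000),
                                                    rng.uniform(-1, 1, size=5000)])
xw, W = whiten(x)
print(np.round(np.cov(xw), 2))   # approximately the identity matrix
```

After this step, the remaining unmixing transformation can indeed be restricted to a rotation.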

1.5 A generic ICA algorithm


Summarizing what we have learned so far, a generic ICA algorithm is conceptually relatively simple and
works as follows:

1. remove the mean and whiten the data,


2. rotate the data such that either

(a) the output signal components are as statistically independent as possible, or


(b) the output signal components are most non-Gaussian.
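In practice one rarely implements this pipeline from scratch; for instance, FastICA from scikit-learn (see section 3.3) bundles centering, whitening, and the rotation search. A minimal usage sketch (the data here is artificial and the parameter choices are only illustrative):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = np.column_stack([rng.laplace(size=2000), rng.uniform(-1, 1, size=2000)])  # independent sources
A = np.array([[2.0, 1.0], [0.5, 1.0]])                                        # some mixing matrix
X = S @ A.T                                                                   # observed mixtures, shape (samples, dims)

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)     # estimated sources (up to permutation and scaling)
A_est = ica.mixing_              # estimated mixing matrix
```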

2 Formalism based on cumulants


There is a whole zoo of different ICA algorithms. I focus here on a method based on higher-order cumulants.

2.1 Moments and cumulants
Moments and cumulants are a convenient way of describing statistical distributions. If all moments or all
cumulants are known, the distribution is uniquely determined. Thus, moments and cumulants are equivalent
descriptions of a distribution, although one usually only uses the lower moments or cumulants. If ⟨·⟩ indicates the average over all data points, then the moments of some vectorial data y, written in terms of single components, are defined as

    first moment      ⟨y_i⟩              (19)
    second moment     ⟨y_i y_j⟩          (20)
    third moment      ⟨y_i y_j y_k⟩      (21)
    fourth moment     ⟨y_i y_j y_k y_l⟩  (22)
    higher moments    ... .

A disadvantage of moments is that higher moments contain information that can already be expected from
lower moments, for instance

    ⟨y_i y_j⟩ = ⟨y_i⟩ ⟨y_j⟩ + ? ,    (23)

or for zero-mean data

    ⟨y_i y_j y_k y_l⟩ = ⟨y_i y_j⟩ ⟨y_k y_l⟩ + ⟨y_i y_k⟩ ⟨y_j y_l⟩ + ⟨y_i y_l⟩ ⟨y_j y_k⟩ + ? .    (24)

It would be nice to have a definition for only the part that is not known yet, much like the variance for the
second moment of a scalar variable,

    ⟨(y - ⟨y⟩)^2⟩ = ⟨yy⟩ - ⟨y⟩ ⟨y⟩ ,    (25)

where one has removed the influence of the mean from the second moment. This idea can be generalized
to mixed and higher moments. If one simply subtracts off from the higher moments what one
would expect from the lower moments already, one gets corrected moments that are called
cumulants (D: Kumulanten). For simplicity we assume zero mean data, then
    C_i    := ⟨y_i⟩ = 0   (by the zero-mean assumption),                                        (26)
    C_ij   := ⟨y_i y_j⟩ ,                                                                       (27)
    C_ijk  := ⟨y_i y_j y_k⟩ ,                                                                   (28)
    C_ijkl := ⟨y_i y_j y_k y_l⟩ - ⟨y_i y_j⟩ ⟨y_k y_l⟩ - ⟨y_i y_k⟩ ⟨y_j y_l⟩ - ⟨y_i y_l⟩ ⟨y_j y_k⟩ .   (29)

Notice that there are no terms to be subtracted off from C_ij and C_ijk due to the zero-mean constraint. Any term that one might subtract would include a factor of the form ⟨y_i⟩, which is zero.
For scalar variables these four cumulants have nice intuitive interpretations. We know already that C_i and C_ii are the mean and variance of the data, respectively. C_iii (often normalized by C_ii^{3/2}) is the skewness (D: Schiefe) of the distribution, which indicates how much the distribution is tilted to the left or to the right. C_iiii (often normalized by C_ii^2) is the kurtosis (D: Kurtosis) of the distribution, which indicates how peaky the distribution is. A uniform distribution has negative kurtosis and is called sub-Gaussian; a peaky distribution with heavy tails has positive kurtosis and is called super-Gaussian. A Gaussian distribution has zero kurtosis, and all its cumulants beyond the second vanish as well.
Cumulants have the important property that if you add two (or more) statistically independent variables, the cumulants of the sum equal the sums of the cumulants of the variables.

Figure 4: Illustration of the effect of the first four cumulants in one variable, i.e. mean, variance, skewness, and kurtosis (each panel compares distributions with the cumulant below, equal to, and above its reference value). (CC BY-SA 4.0)
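The scalar cumulants just discussed are easy to estimate from data; a small sketch (my own illustration) that reproduces the sign pattern of the kurtosis for sub- and super-Gaussian distributions:

```python
import numpy as np

def cumulants_1d(y):
    """Estimate mean (C_i), variance (C_ii), skewness (C_iii), and kurtosis (C_iiii) of a scalar variable."""
    m = y.mean()
    yc = y - m                          # zero-mean data, as assumed in (26)-(29)
    C2 = np.mean(yc**2)
    C3 = np.mean(yc**3)
    C4 = np.mean(yc**4) - 3 * C2**2     # eq. (29) with i = j = k = l
    return m, C2, C3, C4

rng = np.random.default_rng(0)
print(cumulants_1d(rng.normal(size=100000))[3])          # ~0: Gaussian
print(cumulants_1d(rng.uniform(-1, 1, size=100000))[3])  # < 0: sub-Gaussian (uniform)
print(cumulants_1d(rng.laplace(size=100000))[3])         # > 0: super-Gaussian (peaky, heavy tails)
```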

2.2 Cross-cumulants of statistically independent components are zero


First notice that if the random variables of a higher moment can be split into two statistically
independent groups, then the moment can be written as a product of the two lower moments
of the groups. For instance, if y_i and y_j are statistically independent of y_k, then

    ⟨y_i y_j y_k⟩ = ∫∫∫ y_i y_j y_k p(y_i, y_j, y_k) dy_k dy_j dy_i                                      (30)
                  = ∫∫∫ y_i y_j y_k p(y_i, y_j) p(y_k) dy_k dy_j dy_i    (due to statistical independence)    (31)
                  = ∫∫ y_i y_j p(y_i, y_j) dy_j dy_i  ∫ y_k p(y_k) dy_k                                  (32)
                  = ⟨y_i y_j⟩ ⟨y_k⟩ .                                                                    (33)

This has an important implication for cumulants of statistically independent variables. We have argued
above that a cumulant can be interpreted as the corresponding moment of identical structure minus all that can be expected from lower-order moments (or cumulants) already. If the random variables can be split into two statistically independent groups, then the corresponding moment can be written as a product of two lower moments, which means that it can be completely predicted from these lower moments. As a consequence the cumulant is zero, because there is nothing left that could not be predicted. Thus we can state (without proof) that if a set of random variables is statistically independent, then all cross-cumulants vanish. For instance, for statistically independent random variables y_i and y_j with zero mean we get

    C_iij  = ⟨y_i y_i y_j⟩                                                                       (34)
           = ⟨y_i y_i⟩ ⟨y_j⟩      (using (33), since y_i and y_j are statistically independent)  (35)
           = 0                    (since the data is zero-mean, ⟨y_j⟩ = 0) ,                     (36)

    C_iijj = ⟨y_i y_i y_j y_j⟩ - ⟨y_i y_i⟩ ⟨y_j y_j⟩ - ⟨y_i y_j⟩ ⟨y_i y_j⟩ - ⟨y_i y_j⟩ ⟨y_i y_j⟩     (37)
           = ⟨y_i y_i⟩ ⟨y_j y_j⟩ - ⟨y_i y_i⟩ ⟨y_j y_j⟩ - 2 ⟨y_i⟩ ⟨y_j⟩ ⟨y_i⟩ ⟨y_j⟩    (using (33))    (38)
           = 0                    (since the data is zero-mean) .                                (39)
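These identities are easy to check numerically; a quick sketch (my own illustration) estimating the cross-cumulant C_iijj of (37) for two independent variables:

```python
import numpy as np

rng = np.random.default_rng(0)
yi = rng.laplace(size=200000)
yj = rng.uniform(-1, 1, size=200000)
yi -= yi.mean()                          # enforce zero mean, as assumed above
yj -= yj.mean()

# Cross-cumulant C_iijj according to eq. (37):
C_iijj = (np.mean(yi**2 * yj**2)
          - np.mean(yi**2) * np.mean(yj**2)
          - 2 * np.mean(yi * yj)**2)
print(C_iijj)                            # close to zero (up to sampling noise)

# For dependent variables, e.g. yk = yi**2 - <yi**2>, the same expression is clearly non-zero.
yk = yi**2 - np.mean(yi**2)
print(np.mean(yi**2 * yk**2) - np.mean(yi**2) * np.mean(yk**2) - 2 * np.mean(yi * yk)**2)
```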

2.3 Components with zero cross-cumulants are statistically independent


The converse is also true (again without proof). If all cross-cumulants vanish then the random
variables are statistically independent. Notice that pairwise statistical independence does not generally
suffice for the overall statistical independence of the variables. Consider, for instance, the three binary
variables A, B, and C = A xor B.
Thus, the cross-cumulants can be used to measure the statistical dependence between the output signal
components in ICA. Of course, one cannot use all cross-cumulants, but for instance one can require

    C_ij = δ_ij      (unit covariance matrix) ,                     (40)

and

    Σ_{ijk ≠ iii} C_ijk^2      minimal,                             (41)

or

    Σ_{ijkl ≠ iiii} C_ijkl^2   minimal,                             (42)

or

    Σ_{ijk ≠ iii} C_ijk^2 + Σ_{ijkl ≠ iiii} C_ijkl^2   minimal.     (43)

Usually one uses the fourth-order cumulants, because signals are often symmetric and then the third-order cumulants vanish in any case. Assuming we disregard third-order cumulants, the optimization can be stated as follows: Given some multi-dimensional input data x, find the matrix U that produces output data

    y = U x    (44)

that minimizes (42) under the constraint (40). The constraint is trivial to fulfill by whitening the data. Once the data has been whitened with some matrix W to obtain the whitened data x' := W x, the only transformation that is left to do is a rotation by a rotation matrix R, as we have discussed earlier. Thus, we have

    y = R x'    (45)
      = R W x ,   with U := R W .    (46)

2.4 Rotated cumulants
Let x' and y denote the whitened and the rotated data, respectively, and C^{x'}_αβγδ and C^y_ijkl the corresponding cumulants. Then we can write the cumulants of y in terms of the cumulants of x':

    C^y_ijkl = ⟨y_i y_j y_k y_l⟩ - ⟨y_i y_j⟩ ⟨y_k y_l⟩ - ⟨y_i y_k⟩ ⟨y_j y_l⟩ - ⟨y_i y_l⟩ ⟨y_j y_k⟩    (47)
             = ⟨ Σ_α R_iα x'_α  Σ_β R_jβ x'_β  Σ_γ R_kγ x'_γ  Σ_δ R_lδ x'_δ ⟩
               - ⟨ Σ_α R_iα x'_α  Σ_β R_jβ x'_β ⟩ ⟨ Σ_γ R_kγ x'_γ  Σ_δ R_lδ x'_δ ⟩ - ... - ...        (48)
                 (since by (45) y_i = Σ_α R_iα x'_α)
             = Σ_αβγδ R_iα R_jβ R_kγ R_lδ ( ⟨x'_α x'_β x'_γ x'_δ⟩ - ⟨x'_α x'_β⟩ ⟨x'_γ x'_δ⟩ - ... - ... )    (49)
             = Σ_αβγδ R_iα R_jβ R_kγ R_lδ C^{x'}_αβγδ ,    (50)

where the term in parentheses in (49) is just C^{x'}_αβγδ. The fact that the new cumulants are simply linear combinations of the old cumulants reflects the multilinearity of cumulants.

2.5 Contrast function


It is interesting to note that the square sum over all cumulants of a given order does not change
under a rotation, as can be easily verified for fourth order.
    Σ_ijkl (C^y_ijkl)^2 = Σ_ijkl ( Σ_αβγδ R_iα R_jβ R_kγ R_lδ C^{x'}_αβγδ )^2      (using (50))    (51)
        = Σ_ijkl Σ_αβγδ Σ_α'β'γ'δ' R_iα R_jβ R_kγ R_lδ C^{x'}_αβγδ R_iα' R_jβ' R_kγ' R_lδ' C^{x'}_α'β'γ'δ'    (52)
        = Σ_αβγδ Σ_α'β'γ'δ' C^{x'}_αβγδ C^{x'}_α'β'γ'δ' (Σ_i R_iα R_iα') (Σ_j R_jβ R_jβ') (Σ_k R_kγ R_kγ') (Σ_l R_lδ R_lδ')    (53)
        = Σ_αβγδ (C^{x'}_αβγδ)^2 ,    (54)

since each of the four sums in parentheses in (53) equals a Kronecker delta (δ_αα', δ_ββ', δ_γγ', δ_δδ') due to the orthogonality of R.

This implies that minimizing the square sum over all cross-cumulants of order four is equivalent to maximizing the square sum over the kurtoses of all components, i.e.

    minimize   Σ_{ijkl ≠ iiii} C_ijkl^2        (55)

    maximize   Ψ_4 := Σ_i C_iiii^2 ,           (56)

since the sum of these two quantities is constant. This is one way of formalizing our intuition that making all components statistically independent is equivalent to making them as non-Gaussian as possible. Maximizing Ψ_4 is obviously much easier than minimizing the square sum over all cross-cumulants. Thus, we will use it as our objective function or contrast function.
A corresponding relationship also holds for third-order cumulants. However, if the sources are symmetric, which they might well be, then their skewness is zero in any case and maximizing their square sum is of little use. One therefore usually considers fourth- rather than third-order cumulants for ICA. Considering both simultaneously, however, might even be better.
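The rotation invariance (51)–(54), and hence the equivalence of (55) and (56), can be verified numerically. The following sketch (my own illustration) estimates the full fourth-order cumulant tensor of zero-mean 2D data before and after a random rotation:

```python
import numpy as np

def cumulant4(y):
    """Fourth-order cumulant tensor C_ijkl of zero-mean data y of shape (dim, samples), eq. (29)."""
    m4 = np.einsum('it,jt,kt,lt->ijkl', y, y, y, y) / y.shape[1]   # fourth-order moments
    c2 = np.cov(y, bias=True)                                      # second-order moments
    return (m4 - np.einsum('ij,kl->ijkl', c2, c2)
               - np.einsum('ik,jl->ijkl', c2, c2)
               - np.einsum('il,jk->ijkl', c2, c2))

rng = np.random.default_rng(0)
y = np.vstack([rng.laplace(size=100000), rng.uniform(-1, 1, size=100000)])
y -= y.mean(axis=1, keepdims=True)

phi = rng.uniform(0, 2 * np.pi)
R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
C, C_rot = cumulant4(y), cumulant4(R @ y)

kurt2 = lambda C: np.sum(np.einsum('iiii->i', C)**2)   # Psi_4: square sum of kurtoses, eq. (56)
print(np.sum(C**2), np.sum(C_rot**2))                  # total square sum: unchanged by the rotation, eq. (54)
print(kurt2(C), kurt2(C_rot))                          # Psi_4 itself does change with the rotation
```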

2.6 Givens-rotations
A rotation matrix in 2D is given by

    R = ( cos φ   -sin φ )
        ( sin φ    cos φ ) .    (57)

In higher dimensions a rotation matrix can be quite complex. To keep things simple, we use so-called Givens rotations (D: Givens-Rotation), which are defined as a rotation within the 2D subspace spanned by two axes n and m. For instance a rotation within the plane spanned by the second and the fourth axis (n = 2, m = 4) of a four-dimensional space is given by

    R = ( 1     0     0     0    )
        ( 0   cos φ   0  -sin φ  )
        ( 0     0     1     0    )
        ( 0   sin φ   0   cos φ  ) .    (58)

It can be shown that any general rotation, i.e. any orthogonal matrix with positive determinant, can be written as a product of Givens rotations. Thus, we can find the general rotation matrix of the ICA problem by applying a series of Givens rotations, each time improving the objective function (42) a bit.

2.7 Optimizing the contrast function


We have argued above that ICA can be performed by repeatedly applying a Givens rotation to the whitened data. Each time two axes (n and m) that define the rotation plane are selected at random, and then the rotation angle is optimized to maximize the value of the contrast function. By applying enough of such Givens rotations, the algorithm should eventually converge to the globally optimal solution (although we have no guarantee for that). Writing the contrast function as a function of the rotation angle yields

    Ψ_4(φ) := Σ_i (C^y_iiii)^2                                                   (59)
            = Σ_i ( Σ_αβγδ R_iα R_iβ R_iγ R_iδ C^{x'}_αβγδ )^2     (using (50))  (60)
            = K + Σ_{i=n,m} ( Σ_{α,β,γ,δ = n,m} R_iα R_iβ R_iγ R_iδ C^{x'}_αβγδ )^2 .    (61)

For i ≠ n, m the entries of the rotation matrix are R_iξ = δ_iξ for ξ = α, β, γ, δ and do not depend on φ. Thus, these terms are constant and are contained in K.
For i = n, m the entries of the rotation matrix are R_iξ = 0 for ξ ≠ n, m and R_iξ ∈ {±cos(φ), ±sin(φ)} for ξ = n, m (with ξ standing for α, β, γ, δ). Thus, the sums over α, β, γ, δ can be restricted to n, m, and each resulting term contains eight cosine- or sine-factors, because R_iξ ∈ {±cos(φ), ±sin(φ)} and due to the squaring. Thus, Ψ_4 can always be written as

    Ψ_4(φ) = K + Σ_{p=0}^{8} k'_p cos(φ)^{8-p} sin(φ)^p .    (62)

The old cumulants C^{x'}_αβγδ are all contained in the constants k'_p.
We can simplify Ψ_4(φ) even further with the following two considerations: Firstly, a rotation by a multiple of 90° should have no effect on the contrast function, because that would only flip or exchange the components of y. Thus, Ψ_4 must have a 90° periodicity and can always be written like

    Ψ_4(φ) = A_0 + A_4 cos(4φ + φ_4) + A_8 cos(8φ + φ_8) + A_12 cos(12φ + φ_12) + A_16 ...    (63)

Secondly, products of two sin- and cos-functions produce constants and frequency-doubled terms, e.g.

    sin(φ)^2      = (1 - cos(2φ))/2 ,    (64)
    cos(φ)^2      = (1 + cos(2φ))/2 ,    (65)
    sin(φ) cos(φ) = sin(2φ)/2 .          (66)

Products of eight sin- and cos-functions therefore produce at most eightfold frequencies (three times frequency doubling). This limits equation (63) to terms up to 8φ. Thus, finally we get the following contrast function for one Givens rotation:

    Ψ_4(φ) = A_0 + A_4 cos(4φ + φ_4) + A_8 cos(8φ + φ_8) .    (67)


Again the constants contain the old cumulants C^{x'}_αβγδ in some complicated but computable form. It is relatively simple to find the maximum of this function once the constants are known.

2.8 The algorithm


A cumulant-based algorithm for independent component analysis could now look as follows (a code sketch follows the list):

1. Whiten the data x with whitening matrix W and create y = Wx.

2. Select two axes/variables y_n and y_m at random.

3. Rotate y in the plane spanned by y_n and y_m such that Ψ_4(φ) is maximized. This leads to a new y.

4. Go to step 2 unless a suitable convergence criterion is fulfilled, for instance, the rotation angle has been smaller than some threshold for the last 1000 iterations.

5. Stop.
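For illustration, here is a compact toy implementation of the whole scheme (entirely my own sketch, not the code from the notes). Instead of determining the constants A_0, A_4, A_8 of (67) analytically, it simply evaluates Ψ_4(φ) on a grid of angles, which is slower but keeps the code short:

```python
import numpy as np

def psi4(y):
    """Contrast Psi_4, eq. (56): square sum of the kurtoses of the components of y (shape (dim, samples))."""
    c2 = np.mean(y**2, axis=1)
    return np.sum((np.mean(y**4, axis=1) - 3 * c2**2)**2)

def givens_rotate(y, n, m, phi):
    """Apply a Givens rotation in the (n, m)-plane to the data y."""
    y = y.copy()
    yn, ym = y[n].copy(), y[m].copy()
    y[n] = np.cos(phi) * yn - np.sin(phi) * ym
    y[m] = np.sin(phi) * yn + np.cos(phi) * ym
    return y

def cumulant_ica(x, sweeps=100, n_angles=180, seed=0):
    """Toy cumulant-based ICA: whiten, then repeated Givens rotations maximizing Psi_4."""
    rng = np.random.default_rng(seed)
    x = x - x.mean(axis=1, keepdims=True)                  # step 1: remove the mean ...
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    y = (np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T) @ x    # ... and whiten
    angles = np.linspace(0.0, np.pi / 2, n_angles)         # 90-degree periodicity, cf. eq. (67)
    for _ in range(sweeps):
        n, m = rng.choice(y.shape[0], size=2, replace=False)              # step 2: random pair of axes
        phi = max(angles, key=lambda a: psi4(givens_rotate(y, n, m, a)))  # step 3: best angle on the grid
        y = givens_rotate(y, n, m, phi)
    return y                                               # estimated sources (up to permutation/scaling)

# Example: unmix a random 2D mixture of a Laplacian and a uniform source.
rng = np.random.default_rng(1)
s = np.vstack([rng.laplace(size=5000), rng.uniform(-1, 1, size=5000)])
x = np.array([[2.0, 1.0], [0.5, 1.0]]) @ s
y = cumulant_ica(x)
print(np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:]).round(2))   # close to a permutation matrix
```

Note that this sketch uses a fixed number of sweeps instead of the convergence criterion described in step 4, which keeps the code shorter but is cruder.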

3 Other resources
Numbers in square brackets indicate sections of these lecture notes to which the corresponding item is related.

3.1 Written material


ICA on Wikipedia
https://en.wikipedia.org/wiki/Independent_component_analysis

3.2 Videos
Abstract conceptual introduction to ICA from Georgia Tech

1. ICA objective
https://www.youtube.com/watch?v=2WY7wCghSVI (2:13)
2. Mixing and unmixing / blind source separation / cocktail party problem
https://www.youtube.com/watch?v=wIlrddNbXDo (4:17)
3. How to formulate this mathematically
https://www.youtube.com/watch?v=pSwRO5d266I (5:15)
4. Application to artificially mixed sound
https://www.youtube.com/watch?v=T0HP9cxri0A (3:25)
5. Quiz Questions PCA vs ICA
https://www.youtube.com/watch?v=TDW0vMz_3ag (0:32)
6. Quiz Answers PCA vs ICA with further comments
https://www.youtube.com/watch?v=SjM2Qm7N9CU (10:04)
7. More on differences between PCA and ICA
https://www.youtube.com/watch?v=e4woe8GRjEI (7:17)
01:23–01:34 I find this statement confusing. Transposing a matrix is a very different operation than rotating data, and I find it highly non-trivial that PCA works also on the transposed data matrix, see the section on singular value decomposition in the lecture notes on PCA.
02:49–02:59 Hm, this is not quite true. Either the mean has been removed from the data, which is normally the case, then the average face is at the origin and cannot be extracted with PCA. Or the mean has not been removed, then it is typically the first eigenvector (not the second) that goes through the mean, although not exactly, as one can see in one of the python exercises. There are marked differences around the eyes.

Lecture on ICA based on cumulants by Santosh Vempala, Georgia Institute of Technology.
https://www.youtube.com/watch?v=KSIA908KNiw (52:22)
00:45–06:31 Introducing the ICA problem definition
06:31–07:50 Would PCA solve the problem? No!
07:50–15:04 Formulation of a deflation algorithm based on cumulants
15:04–20:35 (Reformulation with tensor equations)
20:35 (Two problems in solving the ICA problem)
* 21:32 (What if the second and fourth order cumulants are zero?)
25:26 (Algorithm: Fourier PCA)
* ...
...

3.3 Software
FastICA in scikit-learn, a python library for machine learning
http://scikit-learn.org/stable/modules/decomposition.html#independent-component-analysis-ica
Examples using FastICA in scikit-learn
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html#sklearn.decomposition.FastICA

3.4 Exercises
Analytical exercises by Laurenz Wiskott
https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/IndependentComponentAnalysis-Exercises.pdf
https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/IndependentComponentAnalysis-Solutions.pdf
Python exercises by Laurenz Wiskott
https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/IndependentComponentAnalysis-PythonExercises.zip
https://www.ini.rub.de/PEOPLE/wiskott/Teaching/Material/IndependentComponentAnalysis-PythonSolutions.zip

References
Blaschke, T. and Wiskott, L. (2004). CuBICA: Independent component analysis by simultaneous third- and fourth-order cumulant diagonalization. IEEE Transactions on Signal Processing, 52(5):1250–1256.
Hyvärinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis. John Wiley & Sons.
