Data Mining Essentials 2: Data Mining in Practice, with Python
Spring 2015
Rosanne Liu
[email protected]
Outline
• Why Python?
• Intro to Python
• Intro to Scikit-Learn
• Unsupervised Learning
– Demo on PCA, K-Means
• Supervised Learning
– Demo on Linear Regression, Logistic Regression
Why Python?
Think about the scientist's needs:
§ Get data (simulation, experiment control)
§ Manipulate and process data.
§ Visualize results... to understand what we are doing!
§ Communicate results: produce figures for reports or publications, write presentations.
Why Python?
– Easy
• Easy to learn, easily readable
• Scientists first, programmers second
– Efficient
• Managing memory is easy, if you just don't care
– A single language for everything
• Avoid learning a new software for each new problem
More to Take Away
The Use of Python: Simple demos
0 – Python Intro.ipynb
What is Scikit-learn?
The use of Scikit-Learn: unsupervised learning demos
PCA Summary
• Identify important variables in projection matrix W:
1 – PCA.ipynb
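A minimal sketch of reading variable importance off the projection matrix, not the contents of the notebook. The synthetic data and the names `W` and `top_feature` are assumptions made for illustration. scikit-learn stores the principal directions as rows of `components_`, so W is taken here as its transpose.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): three independent features
# with very different variances.
rng = np.random.RandomState(0)
X = np.column_stack([
    10.0 * rng.randn(200),   # high-variance feature
    1.0 * rng.randn(200),    # moderate-variance feature
    0.01 * rng.randn(200),   # almost constant feature
])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)     # projected data, shape (200, 2)

# components_ has shape (n_components, n_features); its transpose is the
# projection matrix W with one column per principal component.
W = pca.components_.T        # shape (3, 2)

# For each component, the feature carrying the largest absolute weight.
top_feature = np.abs(W).argmax(axis=0)

print("W =\n", W)
print("most important feature per component:", top_feature)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

PCA is sensitive to feature scale, so on real data the features are often standardized first; the raw scales are kept here only to make the weights easy to read.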
K-Means Algorithm
2 – k means.ipynb
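A short sketch of K-Means with scikit-learn, not the notebook itself. The two synthetic blobs and the choice of `n_clusters=2` are assumptions made so that the expected grouping is obvious.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated 2D blobs (synthetic, for illustration only).
rng = np.random.RandomState(0)
blob_a = rng.randn(50, 2) * 0.3 + np.array([0.0, 0.0])
blob_b = rng.randn(50, 2) * 0.3 + np.array([5.0, 5.0])
X = np.vstack([blob_a, blob_b])

# n_init=10 reruns the algorithm from 10 random initializations and keeps
# the run with the lowest within-cluster sum of squares (inertia_).
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

print("cluster centers:\n", km.cluster_centers_)
print("inertia:", km.inertia_)
```

The cluster indices (0 or 1) are arbitrary; only the grouping of points is meaningful.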
Outline
• Why Python?
• Intro to Python
• Intro to Scikit-Learn
• Unsupervised Learning
– Demo on PCA, K-Means
• Supervised Learning
– Demo on Linear Regression, Logistic Regression, kNN
The use of Scikit-Learn: supervised learning demos
Linear Regression
[Figure panels: 1D, 2D]
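A hedged sketch of the 1D and 2D cases with scikit-learn's `LinearRegression`. The generating equations and noise level below are made up for illustration; they are not taken from the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)

# 1D case: y = 2*x + 1 plus a little noise (assumed ground truth).
x1 = rng.uniform(-1, 1, size=(100, 1))
y1 = 2.0 * x1[:, 0] + 1.0 + 0.01 * rng.randn(100)
model_1d = LinearRegression().fit(x1, y1)

# 2D case: y = 3*x_a - 2*x_b + 0.5 plus a little noise (assumed ground truth).
x2 = rng.uniform(-1, 1, size=(100, 2))
y2 = 3.0 * x2[:, 0] - 2.0 * x2[:, 1] + 0.5 + 0.01 * rng.randn(100)
model_2d = LinearRegression().fit(x2, y2)

print("1D slope, intercept:", model_1d.coef_, model_1d.intercept_)
print("2D weights, intercept:", model_2d.coef_, model_2d.intercept_)
```

The same `fit` call covers both cases; only the number of columns in the input array changes.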
• Logistic regression works well if the data is linearly separable, but…
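One plausible reading of the trailing "but…" is that a linear decision boundary struggles when the classes are not linearly separable. The sketch below compares two synthetic datasets, both assumptions for illustration: two separated blobs, and an XOR-style pattern.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Linearly separable: two blobs far apart.
X_sep = np.vstack([
    rng.randn(100, 2) * 0.5 - 2.0,
    rng.randn(100, 2) * 0.5 + 2.0,
])
y_sep = np.array([0] * 100 + [1] * 100)

# Not linearly separable: XOR pattern (class 1 when x and y share a sign).
X_xor = rng.uniform(-1, 1, size=(400, 2))
y_xor = (X_xor[:, 0] * X_xor[:, 1] > 0).astype(int)

# Training accuracy of a plain logistic regression on each dataset.
acc_sep = LogisticRegression().fit(X_sep, y_sep).score(X_sep, y_sep)
acc_xor = LogisticRegression().fit(X_xor, y_xor).score(X_xor, y_xor)

print("accuracy on separable blobs:", acc_sep)
print("accuracy on XOR data:", acc_xor)
```

On the XOR data no single straight line can separate the classes, so accuracy stays far below that of the separable case.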
K Nearest Neighbors
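A minimal sketch of k nearest neighbors with scikit-learn's `KNeighborsClassifier`. The XOR-style data, the 300/100 train/test split, and `n_neighbors=5` are assumptions chosen for illustration. Because kNN classifies by the labels of nearby points, it does not need a straight-line boundary.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# XOR-style data: class 1 when both coordinates share a sign (synthetic).
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Simple hold-out split: first 300 points for training, last 100 for testing.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# Each test point gets the majority label among its 5 nearest training points.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
acc = knn.score(X_test, y_test)

print("kNN test accuracy on XOR data:", acc)
```

Most remaining errors fall near the axes, where neighbors from both classes mix.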