Dimensionality Reduction
Dimensionality of input
Supervised
Train using selected subset
Estimate error on validation data set
Unsupervised
Look at the input only (e.g., age, income, and savings)
Select the subset of 2 features that carries most of the information about the person
Mutual information can quantify how much information features share
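A minimal sketch of this unsupervised idea, assuming scikit-learn is available; the age/income/savings data below is synthetic, and mutual_info_regression is used here to estimate how much information one feature carries about another, so one member of a highly redundant pair can be dropped.

import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Hypothetical data: columns are age, income, savings (one row per person).
rng = np.random.default_rng(0)
age = rng.uniform(20, 70, 500)
income = 1000 * age + rng.normal(0, 5000, 500)   # strongly tied to age
savings = rng.uniform(0, 1e5, 500)               # mostly independent
X = np.column_stack([age, income, savings])
names = ["age", "income", "savings"]

# Estimate mutual information between every pair of features.
for i in range(X.shape[1]):
    for j in range(i + 1, X.shape[1]):
        mi = mutual_info_regression(X[:, [i]], X[:, j])[0]
        print(f"MI({names[i]}, {names[j]}) = {mi:.3f}")

# A pair with high mutual information is redundant: keeping one of the
# two preserves most of the information about the person.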
Forward search
Start from an empty set of features
Try each of the remaining features
Estimate the classification/regression error of adding each specific feature
Select the feature that gives the maximum improvement in validation error
Stop when there is no significant improvement (a code sketch follows below)
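A minimal sketch of forward search, assuming a scikit-learn-style classifier and a single held-out validation split; the estimator, the improvement threshold eps, and the helper name forward_search are illustrative choices, not part of the slides.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def forward_search(X, y, eps=1e-3):
    """Greedy forward feature selection using validation error (sketch)."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
    selected, best_err = [], np.inf
    remaining = set(range(X.shape[1]))
    while remaining:
        # Try adding each remaining feature; measure validation error.
        errs = {}
        for f in remaining:
            cols = selected + [f]
            model = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
            errs[f] = 1.0 - model.score(X_val[:, cols], y_val)
        f_best = min(errs, key=errs.get)
        if best_err - errs[f_best] < eps:   # stop: no significant improvement
            break
        selected.append(f_best)
        best_err = errs[f_best]
        remaining.remove(f_best)
    return selected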
Backward search
Start with the original set of size d
Drop the feature whose removal has the smallest impact on error; repeat
Floating search
Two types of steps: add k features, then remove l
Requires more computation than plain forward or backward search
Feature Extraction
Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
PCA: Motivation
Taking Derivatives
Maximizing Var(z1) = w1ᵀΣw1 subject to ‖w1‖ = 1 (via a Lagrange multiplier) yields Σw1 = αw1, so w1 is the eigenvector of Σ with the largest eigenvalue.
What PCA does
z = Wᵀ(x − m)
where the columns of W are the eigenvectors of Σ, and m is the sample mean
Centers the data at the origin and rotates the axes
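A minimal NumPy sketch of this projection: estimate m and Σ from the sample, take the top-k eigenvectors as the columns of W, and compute z = Wᵀ(x − m) for every row; the helper name pca is illustrative.

import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    m = X.mean(axis=0)                       # sample mean
    S = np.cov(X, rowvar=False)              # sample covariance Σ
    eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort in descending order
    W = eigvecs[:, order[:k]]                # columns = top-k eigenvectors
    Z = (X - m) @ W                          # z = Wᵀ(x − m) for each row
    return Z, W, m, eigvals[order]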
How to choose k?
Proportion of Variance (PoV) explained:
PoV = (λ1 + λ2 + … + λk) / (λ1 + λ2 + … + λd)
where the λi are the eigenvalues of Σ sorted in descending order
Choose k so that PoV is high enough (e.g., above 0.9), or look for the elbow in the scree plot
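Continuing the sketch above, k can be picked as the smallest value whose PoV reaches a threshold; the 0.9 default here is an illustrative choice.

import numpy as np

def choose_k(eigvals_desc, threshold=0.9):
    """Smallest k whose proportion of variance reaches the threshold."""
    pov = np.cumsum(eigvals_desc) / np.sum(eigvals_desc)
    return int(np.searchsorted(pov, threshold) + 1)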
Factor Analysis
Model: x − μ = Vz + ε, where z holds the k latent factors and ε is noise
V is a d × k matrix of factor loadings (k < d)
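A minimal sketch using scikit-learn's FactorAnalysis on synthetic data generated from the model above; components_ holds the estimated loadings (shape k × d) and transform returns the factor scores z.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
Z_true = rng.normal(size=(300, 2))            # k = 2 hidden factors
V_true = rng.normal(size=(5, 2))              # d x k loading matrix
X = Z_true @ V_true.T + 0.1 * rng.normal(size=(300, 5))  # x = Vz + noise

fa = FactorAnalysis(n_components=2).fit(X)
V_hat = fa.components_.T      # estimated loadings, shape (d, k)
Z_hat = fa.transform(X)       # estimated factor scores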
FA Usage
Linear Discriminant Analysis (LDA)
Means and scatter after projection
Good Projection
Maximize Fisher's criterion:
J(w) = (m1 − m2)² / (s1² + s2²)
where m1, m2 are the projected class means and s1², s2² are the within-class scatters
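A minimal two-class sketch: the w maximizing J(w) is proportional to S_W⁻¹(m1 − m2), where S_W is the within-class scatter; the helper name is illustrative.

import numpy as np

def fisher_lda_direction(X1, X2):
    """Direction w maximizing J(w) = (m1 - m2)^2 / (s1^2 + s2^2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    S1 = (X1 - m1).T @ (X1 - m1)
    S2 = (X2 - m2).T @ (X2 - m2)
    Sw = S1 + S2
    w = np.linalg.solve(Sw, m1 - m2)   # w ∝ Sw^{-1}(m1 - m2)
    return w / np.linalg.norm(w)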
Expectation-Maximization Algorithm
The Expectation-Maximization (EM) algorithm can also be used with latent variables (variables that are not directly observable and must be inferred from the values of other observed variables) in order to estimate their values, provided the general form of the probability distribution governing those latent variables is known. This algorithm is at the base of many unsupervised clustering algorithms in machine learning.
Algorithm:
E-step: estimate the expected values of the latent variables, given the current parameter estimates
M-step: re-estimate the parameters to maximize the expected complete-data likelihood
Repeat until the parameter estimates converge
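A minimal EM sketch for a two-component 1-D Gaussian mixture, where the latent variable is the component label; the initialization and iteration count are illustrative choices, not the slides' own code.

import numpy as np

def em_gmm_1d(x, n_iter=100):
    """EM for a 2-component 1-D Gaussian mixture (illustrative sketch)."""
    # Crude initialization of the weights, means, and variances.
    pi = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i).
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates maximizing the expected likelihood.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var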
Advantages of the EM algorithm
The likelihood is guaranteed not to decrease with each iteration.
The E-step and M-step are often easy to implement for many problems.
Solutions to the M-step often exist in closed form.
Disadvantages of the EM algorithm
Convergence can be slow.
It converges only to a local optimum.
It requires both the forward and backward probabilities (numerical optimization requires only the forward probability).
K-Nearest Neighbor
Features
All instances correspond to points in an n-dimensional Euclidean space
Classification is delayed until a new instance arrives
Classification is done by comparing the feature vectors of the different points
The target function may be discrete- or real-valued
1-Nearest Neighbor
3-Nearest Neighbor
K-Nearest Neighbor
Example: classify a new sample with acid durability = 3 and strength = 7 (a code sketch follows below)
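A minimal 3-NN sketch for this example; the four training samples below are hypothetical, chosen only to illustrate the majority vote.

import numpy as np

# Hypothetical training data: (acid durability, strength) -> quality label.
X_train = np.array([[7, 7], [7, 4], [3, 4], [1, 4]], dtype=float)
y_train = np.array(["Bad", "Bad", "Good", "Good"])

def knn_classify(x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances
    nearest = y_train[np.argsort(dists)[:k]]         # labels of k nearest
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

print(knn_classify(np.array([3.0, 7.0])))  # query: durability = 3, strength = 7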
Summary
Feature selection
Supervised: drop features whose removal does not introduce large errors (measured on a validation set)
Unsupervised: keep only uncorrelated features (drop features that don't add much information)
Feature extraction
Linearly combine features into a smaller set of features
Unsupervised
PCA: explains most of the total variability
FA: explains most of the common variability
Supervised
LDA: best separates class instances
Missing data
Remove the data (column/row)
Replace it with the average value if the data is numeric
Replace it with a default value
Linear regression / Bayesian regression
Clustering techniques
Based on observed/historical data
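A minimal sketch of the first two replacement strategies using scikit-learn's SimpleImputer; the toy matrix is illustrative. The regression- and clustering-based approaches would instead fit a model on the rows with observed values and predict the missing entries.

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])

# Replace missing numeric values with the column mean.
mean_imp = SimpleImputer(strategy="mean")
print(mean_imp.fit_transform(X))

# Or replace them with a fixed default value.
default_imp = SimpleImputer(strategy="constant", fill_value=0.0)
print(default_imp.fit_transform(X))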