Learning Associations
• Basket analysis:
  P(Y | X): the probability that somebody who buys product X also
  buys product Y, where X and Y are products/services.
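The conditional probability above can be estimated directly from transaction data. A minimal sketch, with hypothetical baskets and product names:

```python
def conditional_prob(x, y, baskets):
    """Estimate P(Y | X): the fraction of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)

# Hypothetical transaction data: each basket is a set of purchased products.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

p = conditional_prob("bread", "butter", baskets)   # 2 of the 3 bread baskets contain butter
```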
Classification: Applications
• Aka pattern recognition
• Face recognition: pose, lighting, occlusion (glasses,
  beard), make-up, hair style
• Character recognition: different handwriting styles.
• Speech recognition: temporal dependency.
  – Use of a dictionary or the syntax of the language.
  – Sensor fusion: combine multiple modalities; e.g., visual (lip
    image) and acoustic for speech.
• Medical diagnosis: from symptoms to illness.
Face Recognition
[Figure: test images]
Prediction: Regression
Regression Applications
Different Data Analysis Tasks
Unsupervised Learning
• Unsupervised learning is a type of machine learning
  algorithm used to draw inferences from datasets
  consisting of input data without labeled responses.
• Clustering: grouping similar instances
• Other applications:
  – Predicting the weather
  – Estimating the height distribution of people in a school
  – Summarization
Reinforcement Learning
• Topics:
  – Policies: what actions should an agent take in a particular
    situation
  – Utility estimation: how good is a state (→ used by the policy)
• No supervised output, but a delayed reward
• Credit assignment problem (what was responsible for
  the outcome)
• Applications:
  – Game playing
  – Robot in a maze
  – Multiple agents, partial observability, ...
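The topics above (a policy, utility estimation, delayed reward) can be sketched with tabular Q-learning on a hypothetical toy problem: a 1-D corridor of 5 states with reward only at the rightmost state. All parameters below are illustrative:

```python
import random

random.seed(0)
N_STATES = 5
ACTIONS = (1, -1)                      # move right / move left
alpha, gamma, eps = 0.5, 0.9, 0.1      # step size, discount, exploration rate
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy policy: mostly exploit, sometimes explore
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0   # delayed reward at the goal only
        # Q-learning update (utility estimation): the bootstrapped target
        # propagates the delayed reward backwards, assigning credit to
        # earlier moves.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# The learned policy: in every non-terminal state, move right toward the reward.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```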
Clustering Strategies
• K-means
  – Iteratively re-assign points to the nearest cluster
    center
• Agglomerative clustering
  – Start with each point as its own cluster and iteratively
    merge the closest clusters
• Mean-shift clustering
  – Estimate the modes of the probability density function
• Spectral clustering
  – Partition the nodes of a graph whose links carry
    similarity weights
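The K-means strategy above fits in a few lines. A minimal sketch, assuming 2-D points, Euclidean distance, and a naive first-k initialization (real implementations use better initializations such as k-means++):

```python
def kmeans(points, k, iters=20):
    centers = list(points[:k])            # naive init: the first k points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: (p[0] - centers[i][0])**2 + (p[1] - centers[i][1])**2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centers[i] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    return centers

pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
centers = sorted(kmeans(pts, 2))          # two well-separated blobs
```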
f(image of an apple) = “apple”
f(image of a tomato) = “tomato”
f(image of a cow) = “cow”
Slide credit: L. Lazebnik
The machine learning framework
y = f(x)
where y is the output (the prediction), f is the prediction function, and
x is the image feature.
Testing: apply the learned model f to the features of a previously unseen
test image to obtain a prediction.
Slide credit: D. Hoiem and L. Lazebnik
Features
• Raw pixels
• Histograms
• GIST descriptors
• …
Slide credit: L. Lazebnik
Classifiers: Nearest neighbor
[Figure: training examples from class 1 and class 2 in feature space; a
test example is assigned the label of its nearest training example]
Underfitting and Overfitting
[Figure: training error and test error curves as a function of model
complexity]
Slide credit: D. Hoiem
How to reduce variance?
1-nearest neighbor
[Figure: classes x and o in the (x1, x2) plane; the query point + takes
the label of its single nearest neighbor]
3-nearest neighbor
[Figure: the same data; the query point + takes the majority label of its
3 nearest neighbors]
5-nearest neighbor
[Figure: the same data; the query point + takes the majority label of its
5 nearest neighbors]
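The 1-, 3-, and 5-nearest-neighbor figures above differ only in k. A minimal sketch (assuming 2-D points and Euclidean distance) makes the role of k explicit; increasing k averages over more neighbors, which is one answer to the variance question above:

```python
from collections import Counter

def knn_predict(train, query, k):
    """Return the majority label among the k nearest training examples."""
    nearest = sorted(train,
                     key=lambda t: (t[0][0] - query[0])**2 + (t[0][1] - query[1])**2)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training examples: two well-separated classes.
train = [((0, 0), "o"), ((0, 1), "o"), ((1, 0), "o"),
         ((5, 5), "x"), ((5, 6), "x"), ((6, 5), "x")]
```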
Classifiers: Logistic Regression
• Maximize the likelihood of the label given the data, assuming a
  log-linear model
[Figure: male vs. female examples plotted by pitch of voice (x1) and
height (x2)]
log [ P(x1, x2 | y = 1) / P(x1, x2 | y = −1) ] = wᵀx
P(y = 1 | x1, x2) = 1 / (1 + exp(−wᵀx))
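A minimal sketch of fitting this model by gradient ascent on the log-likelihood. The data are hypothetical (standardized pitch and height values), labels are coded 0/1, and the bias term is omitted to match the bare wᵀx form above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical (pitch, height) examples; label 1 = male, 0 = female.
data = [((1.0, 1.0), 1), ((1.2, 0.9), 1), ((-1.0, -1.0), 0), ((-0.8, -1.1), 0)]
w = [0.0, 0.0]
lr = 0.5
for _ in range(200):
    for (x1, x2), y in data:
        p = sigmoid(w[0] * x1 + w[1] * x2)   # P(y = 1 | x1, x2)
        # Gradient ascent on the log-likelihood: the gradient is (y - p) * x.
        w[0] += lr * (y - p) * x1
        w[1] += lr * (y - p) * x2
```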
Classifiers: Linear SVM
[Figure: linearly separable classes x and o in the (x1, x2) plane]
• Find a linear function to separate the classes:
  f(x) = sgn(wᵀx + b)
Classifiers: Linear SVM
[Figure: classes x and o in the (x1, x2) plane, with an o among the x’s]
• Find a linear function to separate the classes:
  f(x) = sgn(wᵀx + b)
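One simple way to find such a function is stochastic sub-gradient descent on the hinge loss (a Pegasos-style sketch; the data and hyperparameters below are illustrative, not the canonical SVM solver):

```python
def sgn(v):
    return 1 if v >= 0 else -1

# Hypothetical separable data: label +1 for class x, -1 for class o.
data = [((2.0, 2.0), 1), ((2.5, 1.5), 1), ((-2.0, -2.0), -1), ((-1.5, -2.5), -1)]
w, b = [0.0, 0.0], 0.0
lam, lr = 0.01, 0.1                    # regularization strength, step size
for _ in range(200):
    for (x1, x2), y in data:
        margin = y * (w[0] * x1 + w[1] * x2 + b)
        # Sub-gradient step on lam/2 * ||w||^2 + max(0, 1 - margin)
        if margin < 1:                 # inside the margin: pull toward the point
            w[0] += lr * (y * x1 - lam * w[0])
            w[1] += lr * (y * x2 - lam * w[1])
            b += lr * y
        else:                          # outside the margin: only weight decay
            w[0] -= lr * lam * w[0]
            w[1] -= lr * lam * w[1]

def f(x):
    """The learned linear decision function f(x) = sgn(w.x + b)."""
    return sgn(w[0] * x[0] + w[1] * x[1] + b)
```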
Nonlinear SVMs
• Datasets that are linearly separable work out great:
  [Figure: 1-D points on a number line, separable by a single threshold
  at 0]
• But if the dataset is not linearly separable, map it to a
  higher-dimensional space:
  Φ: x → φ(x)
  [Figure: classes x and o in the (x1, x2) plane that become linearly
  separable after the mapping]
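The mapping idea can be illustrated with an assumed feature map φ(x) = (x, x²): 1-D data that no single threshold separates becomes linearly separable in the lifted space:

```python
# Hypothetical 1-D data: class "x" sits far from 0, class "o" near 0,
# so no single threshold on x separates them.
data = [(-2.0, "x"), (2.0, "x"), (-0.5, "o"), (0.5, "o")]

def phi(x):
    # Assumed feature map: lift the 1-D input to 2-D.
    return (x, x * x)

# In the lifted space the second coordinate alone separates the classes
# (threshold the x^2 axis at 1.0):
separable = all((phi(x)[1] > 1.0) == (label == "x") for x, label in data)
```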
Classification Process
1. Classification tasks
2. Building a classifier
3. Evaluating a classifier
Classifying Mushrooms
https://archive.ics.uci.edu/ml/datasets/Mushroom
Classifying Iris Plants
Classification Tasks
◆ Given:
  ◆ A set of classes
  ◆ Instances (examples) of each class
http://www.business-insight.com/html/intelligence/bi_overfitting.html
Classification Tasks
◆Given: A set of
labeled instances
◆Generate: A
method (aka model)
that when given a
new instance it will
hypothesize its class
80
Classifying a New Instance
Classifying New Instances
Training and Test Sets
Training instances
(training set)
Test instances
(test set)
Contamination
Training instances
(training set)
Test instances
(test set)
About Classification Tasks
2. Building a Classifier
What is a Modeler?
◆ A mathematical/algorithmic approach to generalize from
  instances so it can make predictions about instances that it
  has not seen before
◆ Its output is called a model
Types of Modelers/Models
◆ Logistic regression
◆ Decision trees
◆ Random forests
◆ Kernel methods
◆ Genetic algorithms
◆ Neural networks
http://tjo-en.hatenablog.com/entry/2014/01/06/234155 93
http://tjo-en.hatenablog.com/entry/2014/01/06/234155 94
http://tjo-en.hatenablog.com/entry/2014/01/06/234155 95
What Modeler to Choose?
◆ Logistic regression
◆ Naïve Bayes classifiers
◆ Support vector machines (SVMs)
◆ Decision trees
◆ Random forests
◆ Kernel methods
◆ Genetic algorithms (GAs)
◆ Neural networks: perceptrons

Data scientists try different modelers, with different parameters, and
check the accuracy to figure out which one works best for the data at
hand.
Ensembles
◆ An ensemble method uses several
algorithms that do the same task,
and combines their results
◆ “Ensemble learning”
http://magizbox.com/index.php/machine-learning/ds-model-building/ensemble/
3. Evaluating a Classifier
Classification Accuracy
Evaluating a Classifier:
n-fold Cross Validation
◆ Suppose m labeled instances
◆ Divide into n subsets (“folds”) of equal size
◆ Train on the other n−1 folds and test on the held-out fold, rotating
  so that each fold serves as the test set once

Precision = TP / (TP + FP)        Recall = TP / (TP + FN)
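Both ideas on this slide can be sketched directly: splitting m instances into n folds, and computing precision and recall from TP, FP, FN counts (example labels below are hypothetical):

```python
def make_folds(instances, n):
    """Deal instances round-robin into n folds of (nearly) equal size."""
    folds = [[] for _ in range(n)]
    for i, inst in enumerate(instances):
        folds[i % n].append(inst)
    return folds

def precision_recall(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

folds = make_folds(list(range(10)), 5)      # m = 10 instances, n = 5 folds
p, r = precision_recall([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```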
Evaluating a Classifier:
Other Metrics
Overfitting
◆ A model overfits the training data when it is very accurate on that
  data but does not do as well on new test data
[Figure: Model 1 vs. Model 2 fit to the same data]
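An extreme illustration of the definition above: a "model" that simply memorizes the training set is perfect on it but learns nothing that transfers (all data below are hypothetical):

```python
# A lookup-table "model" that memorizes every training instance.
train = [((0, 0), "o"), ((1, 1), "x"), ((0, 1), "o"), ((1, 0), "x")]
table = {x: y for x, y in train}

def memorizer(x):
    return table.get(x, "o")             # unseen inputs get a default guess

# Zero error on the training data...
train_err = sum(memorizer(x) != y for x, y in train) / len(train)

# ...but wrong on every new instance here.
unseen = [((2, 2), "x"), ((3, 0), "x")]
test_err = sum(memorizer(x) != y for x, y in unseen) / len(unseen)
```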
Induction
When Facing a Classification Task
◆ What features to choose
  ◆ Try defining different features
  ◆ For some problems, hundreds and maybe thousands of features
    may be possible
  ◆ Sometimes the features are not directly observable (i.e., there
    are “latent” variables)
◆ What classes to choose
  ◆ Edible / poisonous?
  ◆ Edible / poisonous / unknown?
◆ How many labeled examples
  ◆ May require a lot of work
◆ What modeler to choose
  ◆ Better to try different ones
What to remember about classifiers

Resources: Datasets
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
Resources: Journals
• Journal of Machine Learning Research
www.jmlr.org
• Machine Learning
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis and
Machine Intelligence
• Annals of Statistics
• Journal of the American Statistical Association
• ...
Resources: Conferences
Some Machine Learning References
• General
– Tom Mitchell, Machine Learning, McGraw Hill, 1997
– Christopher Bishop, Neural Networks for Pattern
Recognition, Oxford University Press, 1995
• Adaboost
– Friedman, Hastie, and Tibshirani, “Additive logistic
regression: a statistical view of boosting”, Annals of
Statistics, 2000
• SVMs
– http://www.support-vector.net/icml-tutorial.pdf