CS373 Lecture18.1
Learning
CS 373
Purdue University
Dan Goldwasser
Multiclass Classification Tasks
• So far, our discussion has been limited to binary predictions
– Well, almost (?)
• What happens if the labels we predict are not binary?
– Many interesting classification problems are not!
– Part-of-speech (POS) tagging: noun, verb, determiner, ...
– Document classification: sports, finance, politics
– Sentiment: positive, negative, objective
Example: One-vs-All
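One-vs-All trains one binary classifier per label (label k against all other labels) and predicts the label whose classifier gives the highest score. The following is a minimal sketch under several assumptions. It uses NumPy, linear scorers trained with the binary perceptron, and a made-up toy dataset. The helper names are hypothetical and do not come from the slides.

```python
import numpy as np

def train_binary_perceptron(X, t, epochs=20):
    """Binary perceptron; t holds +1/-1 targets. Returns one weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            if target * (w @ x) <= 0:      # mistake (or exactly on the boundary)
                w += target * x
    return w

def train_one_vs_all(X, y, num_classes, epochs=20):
    """One binary classifier per class: class k (+1) versus the rest (-1)."""
    return np.stack([
        train_binary_perceptron(X, np.where(y == k, 1, -1), epochs)
        for k in range(num_classes)
    ])                                      # shape (num_classes, num_features)

def predict_one_vs_all(W, X):
    """Predict the class whose classifier assigns the highest score."""
    return np.argmax(X @ W.T, axis=1)

# Hypothetical toy data: three clusters; the last feature is a constant bias term.
X = np.array([[ 2.0,  0.0, 1.0], [ 3.0,  0.5, 1.0],   # class 0
              [-2.0,  2.0, 1.0], [-3.0,  2.5, 1.0],   # class 1
              [-2.0, -2.0, 1.0], [-3.0, -2.5, 1.0]])  # class 2
y = np.array([0, 0, 1, 1, 2, 2])
W = train_one_vs_all(X, y, num_classes=3)
print(predict_one_vs_all(W, X))  # prints [0 0 1 1 2 2]
```

Each of the three binary problems here is linearly separable, so every classifier separates its own class. The argmax then recovers the correct label. When the binary scores are not calibrated against each other, this argmax can be unreliable, and that limitation motivates the joint formulation below.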
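Feature function notation
• For examples with label $i$ we want: $w_i^\top x > w_j^\top x$ for all $j \neq i$
• Alternative notation: stack all weight vectors into $w = [w_1; w_2; \dots; w_K]$ and define a feature function $\Phi(x, y)$ that places $x$ in the block belonging to label $y$ and zeros elsewhere. Then $w^\top \Phi(x, i) > w^\top \Phi(x, j)$ is equivalent to $w_i^\top x > w_j^\top x$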
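The equivalence can be checked numerically. The following is a minimal sketch, assuming NumPy and 0-based labels. `phi` is a hypothetical name for $\Phi(x, y)$, implemented with the block-copy construction described above.

```python
import numpy as np

def phi(x, y, num_classes):
    """Joint feature map Phi(x, y): copy x into label y's block, zeros elsewhere."""
    d = x.shape[0]
    out = np.zeros(num_classes * d)
    out[y * d:(y + 1) * d] = x
    return out

rng = np.random.default_rng(0)
K, d = 4, 3
W = rng.normal(size=(K, d))   # row k is the per-label weight vector w_k
w = W.reshape(-1)             # stacked vector [w_1; ...; w_K] (row-major order)
x = rng.normal(size=d)

# The stacked score w^T Phi(x, k) equals the per-label score w_k^T x, so comparing
# stacked scores for labels i and j is the same as comparing w_i^T x and w_j^T x.
for k in range(K):
    assert np.isclose(w @ phi(x, k, K), W[k] @ x)
print("stacked scores match per-label scores")
```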
Example
• The same pattern is encoded as different features associated with
different classes.
• The weights capture the relationship between the pattern and the
output class.
Multiclass Perceptron
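The content of this slide was not recoverable. For reference, a standard multiclass perceptron predicts $\hat{y} = \arg\max_k w_k^\top x$. On a mistake, it adds $x$ to the correct label's weight vector and subtracts $x$ from the predicted label's weight vector. The following is a minimal sketch of that textbook rule, not a transcription of the slide. It assumes NumPy and a made-up toy dataset, and the function name is hypothetical.

```python
import numpy as np

def train_multiclass_perceptron(X, y, num_classes, epochs=20):
    W = np.zeros((num_classes, X.shape[1]))   # row k is the weight vector w_k
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            pred = int(np.argmax(W @ x))      # highest-scoring label (ties -> lowest index)
            if pred != label:
                W[label] += x                 # promote the correct label
                W[pred] -= x                  # demote the wrongly predicted label
                mistakes += 1
        if mistakes == 0:                     # every training example is correct
            break
    return W

# Hypothetical toy data; the last feature is a constant bias term.
X = np.array([[ 2.0,  0.0, 1.0], [ 3.0,  0.5, 1.0],
              [-2.0,  2.0, 1.0], [-3.0,  2.5, 1.0],
              [-2.0, -2.0, 1.0], [-3.0, -2.5, 1.0]])
y = np.array([0, 0, 1, 1, 2, 2])
W = train_multiclass_perceptron(X, y, num_classes=3)
print(np.argmax(X @ W.T, axis=1))  # prints [0 0 1 1 2 2]
```

Unlike One-vs-All, this rule updates only the two weight vectors involved in a mistake. The update is driven by the comparison $w_y^\top x > w_{\hat{y}}^\top x$, which is exactly the constraint in the stacked notation.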
• Binary SVM:
– Minimize ||w|| such that the points closest to the hyperplane have a score of ±1
• Multiclass SVM
– Each label has its own weight vector
– Maximize the multiclass margin
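Margin in the Multiclass Case
Revise the definition for the multiclass case:
• The margin is the difference between the score of the correct label and the scores of competing labels: $\gamma(x, y) = w_y^\top x - \max_{y' \neq y} w_{y'}^\top x$
[Figure: score of the correct label vs. competing labels, annotated with the margin and a positive slack]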
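The margin and slack can be computed directly from a vector of per-label scores. The following is a minimal sketch, assuming NumPy, hypothetical scores, and a required margin of 1 (by analogy with the binary SVM's ±1). The slack is the amount by which an example falls short of that margin.

```python
import numpy as np

def multiclass_margin(scores, label):
    """Score of the correct label minus the best competing score."""
    competitors = np.delete(scores, label)
    return scores[label] - competitors.max()

def slack(scores, label):
    """Shortfall from a margin of 1; zero when the margin is at least 1."""
    return max(0.0, 1.0 - multiclass_margin(scores, label))

scores = np.array([2.5, 0.5, 2.0])   # hypothetical scores w_k^T x for 3 labels
print(multiclass_margin(scores, 0))  # 0.5: label 0 beats label 2 by 0.5
print(slack(scores, 0))              # 0.5: correct, but short of the margin of 1
print(slack(scores, 1))              # 3.0: label 1 is outscored (margin -2.0)
```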
Introduction to Machine Learning. Fall 2015 18
K. Crammer, Y. Singer: ”On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines”, JMLR, 2001
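For reference, the multiclass SVM of Crammer and Singer is commonly written as the quadratic program below. This is a standard restatement rather than a transcription of the slide. $C$ is an assumed regularization constant and $(x_i, y_i)$, $i = 1, \dots, n$, are the training examples.

```latex
\min_{w_1,\dots,w_K,\;\xi}\; \frac{1}{2}\sum_{k=1}^{K}\|w_k\|^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
w_{y_i}^\top x_i - w_k^\top x_i \ge 1 - \xi_i \;\; \forall i,\ \forall k \neq y_i,
\qquad \xi_i \ge 0 \;\; \forall i
```

At the optimum, $\xi_i = \max\big(0,\, 1 - (w_{y_i}^\top x_i - \max_{k \neq y_i} w_k^\top x_i)\big)$. The slack is therefore positive exactly when the multiclass margin of example $i$ is below 1.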
Multiclass classification so far
• Learning:
• Prediction: $\hat{y} = \arg\max_{y} w_y^\top x$
Cost-Sensitive Multiclass Classification
• Sometimes we are willing to “tolerate” some mistakes more than others
Cost-Sensitive Multiclass Classification
• We can organize the labels as a hierarchy:
• Define a distance metric:
– Δ(y, y′) = tree distance between y and y′ (the number of edges on the path connecting them)
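The tree distance can be computed by walking both labels up to their lowest common ancestor. The following is a minimal sketch that assumes the hierarchy is stored as a child-to-parent dictionary. The label hierarchy and function names are hypothetical.

```python
def path_to_root(node, parent):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def tree_distance(a, b, parent):
    """Delta(a, b): number of edges on the path between labels a and b."""
    path_a = path_to_root(a, parent)
    depth_in_b = {node: i for i, node in enumerate(path_to_root(b, parent))}
    for i, node in enumerate(path_a):
        if node in depth_in_b:              # first shared node = lowest common ancestor
            return i + depth_in_b[node]
    raise ValueError("labels are not in the same tree")

# Hypothetical label hierarchy (child -> parent).
parent = {
    "basketball": "sports", "soccer": "sports",
    "stocks": "finance", "banking": "finance",
    "sports": "root", "finance": "root",
}
print(tree_distance("basketball", "soccer", parent))  # 2: siblings under "sports"
print(tree_distance("basketball", "stocks", parent))  # 4: path goes through "root"
```

Under this Δ, confusing two sports is cheaper than confusing a sport with a finance label. The same numbers can weight mistakes in a cost-sensitive loss.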
Reminder: Subgradient descent
Slides by Sebastian Nowozin and Christoph H. Lampert “structured models in computer vision” tutorial CVPR 2011
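For a convex $f$ that is not differentiable everywhere, a subgradient $g$ at $w$ satisfies $f(v) \ge f(w) + g^\top (v - w)$ for all $v$. Subgradient descent steps along $-g$ with a decreasing step size. A single step need not decrease $f$, so the method keeps the best iterate seen so far. The following is a minimal sketch, assuming NumPy, the step size $1/\sqrt{t}$, and an illustrative objective chosen only for this example.

```python
import numpy as np

def subgradient_descent(f, subgrad, w0, steps=1000):
    """Minimize a convex, possibly non-differentiable f; return the best iterate."""
    w = np.asarray(w0, dtype=float)
    best_w, best_f = w.copy(), f(w)
    for t in range(1, steps + 1):
        w = w - subgrad(w) / np.sqrt(t)    # step size 1/sqrt(t)
        if f(w) < best_f:                  # subgradient steps are not monotone
            best_w, best_f = w.copy(), f(w)
    return best_w, best_f

# Illustrative objective: f(w) = |w_1 - 1| + |w_2 + 2|, minimized at (1, -2).
center = np.array([1.0, -2.0])

def f(w):
    return np.abs(w - center).sum()

def subgrad(w):
    # np.sign gives 0 at a kink, and 0 lies in the subdifferential [-1, 1] there.
    return np.sign(w - center)

w_best, f_best = subgradient_descent(f, subgrad, w0=[0.0, 0.0])
print(w_best, f_best)
```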
Subgradient for the MC case
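The derivation on these slides was not recoverable. The following is a sketch of the standard subgradient of the multiclass hinge loss. It assumes the unconstrained form of the multiclass SVM objective, with a regularization weight $\lambda$ and the loss averaged over $n$ examples.

```latex
F(W) = \frac{\lambda}{2}\sum_{k=1}^{K}\|w_k\|^2 + \frac{1}{n}\sum_{i=1}^{n}\ell_i(W),
\qquad
\ell_i(W) = \max\Big(0,\; 1 + \max_{k \neq y_i} w_k^\top x_i - w_{y_i}^\top x_i\Big)

\hat{k}_i = \arg\max_{k \neq y_i} w_k^\top x_i

g_k = \lambda w_k + \frac{1}{n}\sum_{i:\,\ell_i(W) > 0}\Big(\mathbb{1}[k = \hat{k}_i] - \mathbb{1}[k = y_i]\Big)\, x_i
```

Examples with $\ell_i(W) = 0$ contribute $0$, which is a valid choice even when the hinge is exactly at its kink. When several competitors tie for $\hat{k}_i$, any one of them gives a valid subgradient.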
Subgradient descent for the MC case
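The following is a minimal sketch of full-batch subgradient descent on $\frac{\lambda}{2}\sum_k \|w_k\|^2 + \frac{1}{n}\sum_i \max\big(0, 1 + \max_{k \neq y_i} w_k^\top x_i - w_{y_i}^\top x_i\big)$. It assumes NumPy, the step size $1/(\lambda t)$ (a common choice for $\lambda$-strongly convex objectives), and a made-up toy dataset. The function names are hypothetical.

```python
import numpy as np

def scores_and_competitor(W, X, y):
    """All scores, plus the highest-scoring wrong label for each example."""
    rows = np.arange(X.shape[0])
    scores = X @ W.T                      # scores[i, k] = w_k^T x_i
    competing = scores.copy()
    competing[rows, y] = -np.inf          # exclude the correct label
    return scores, competing.argmax(axis=1)

def mc_hinge_objective(W, X, y, lam):
    """(lam/2) * sum_k ||w_k||^2 + mean multiclass hinge loss."""
    rows = np.arange(X.shape[0])
    scores, k_hat = scores_and_competitor(W, X, y)
    losses = np.maximum(0.0, 1.0 + scores[rows, k_hat] - scores[rows, y])
    return 0.5 * lam * np.sum(W ** 2) + losses.mean()

def mc_subgradient(W, X, y, lam):
    """One subgradient of the objective above with respect to W."""
    n = X.shape[0]
    rows = np.arange(n)
    scores, k_hat = scores_and_competitor(W, X, y)
    violated = 1.0 + scores[rows, k_hat] - scores[rows, y] > 0
    G = lam * W                           # gradient of the regularizer (new array)
    for i in np.flatnonzero(violated):
        G[y[i]] -= X[i] / n               # raise the correct label's score
        G[k_hat[i]] += X[i] / n           # lower the strongest competitor's score
    return G

def train_mc_svm(X, y, num_classes, lam=0.01, steps=500):
    """Subgradient descent with step size 1/(lam * t); keeps the best iterate."""
    W = np.zeros((num_classes, X.shape[1]))
    best_W, best_obj = W.copy(), mc_hinge_objective(W, X, y, lam)
    for t in range(1, steps + 1):
        W = W - mc_subgradient(W, X, y, lam) / (lam * t)
        obj = mc_hinge_objective(W, X, y, lam)
        if obj < best_obj:
            best_W, best_obj = W.copy(), obj
    return best_W, best_obj

# Hypothetical toy data; the last feature is a constant bias term.
X = np.array([[ 2.0,  0.0, 1.0], [ 3.0,  0.5, 1.0],
              [-2.0,  2.0, 1.0], [-3.0,  2.5, 1.0],
              [-2.0, -2.0, 1.0], [-3.0, -2.5, 1.0]])
y = np.array([0, 0, 1, 1, 2, 2])
W, obj = train_mc_svm(X, y, num_classes=3)
print(obj, np.argmax(X @ W.T, axis=1))
```

Prediction uses the same rule as before, $\hat{y} = \arg\max_y w_y^\top x$. Any misclassified training example contributes a hinge loss of at least 1. An objective below $1/n$ therefore guarantees that every training example is classified correctly.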