
Data Mining & Machine Learning
CS 373
Purdue University

Dan Goldwasser
[email protected]
Multiclass Classification Tasks
• So far, our discussion was limited to binary predictions
– Well, almost (?)
• What happens if our decision is not over binary labels?
– Many interesting classification problems are not!
– POS tagging: noun, verb, determiner, ...
– Document classification: sports, finance, politics
– Sentiment: positive, negative, objective

How can we approach these problems?
• Can the problem be reduced to a binary classification problem?
Multiclass classification
• We will look into two approaches:
– Combining multiple binary classifiers
• One-vs-All
• All-vs-All
– Training a single classifier
• Extending SVM to the multiclass case
One-vs-All
Assumption: Each class can be separated from the rest using a binary classifier
• Learning: Decomposed into learning k independent binary classifiers, one corresponding to each class
– An example (x, y) is treated as positive for class y and negative for all other classes
– Assume m examples and k class labels (roughly m/k examples per class)
– Classifier f_i then sees m/k positive and (k-1)m/k negative examples
• Decision: Winner Takes All
– f(x) = argmax_i f_i(x) = argmax_i (v_i · x), where v_i is the weight vector of the i-th binary classifier (a sketch is given below)

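To make the decomposition concrete, here is a minimal one-vs-all sketch in Python (the helper names, the train_binary argument, and the assumption that each binary classifier is a plain weight vector v_i are illustrative, not from the slides):

import numpy as np

def train_one_vs_all(X, y, k, train_binary):
    # Train k binary classifiers; classifier i treats label i as positive.
    classifiers = []
    for i in range(k):
        y_i = np.where(y == i, 1.0, -1.0)          # label i -> +1, all other labels -> -1
        classifiers.append(train_binary(X, y_i))   # any binary learner returning a weight vector
    return classifiers

def predict_one_vs_all(classifiers, x):
    # Winner takes all: return the label whose classifier scores x highest.
    scores = [np.dot(v_i, x) for v_i in classifiers]
    return int(np.argmax(scores))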
Example: One-vs-All
Feature function notation
• For an example with label i we want: w_i^T x > w_j^T x for every j ≠ i
• Alternative notation: stack all the weight vectors into a single vector w = (w_1, …, w_k)
• Define features jointly over the input and output: φ(x, y) places x in the block corresponding to class y (and zeros elsewhere), so that w^T φ(x, y) = w_y^T x
• Then w^T φ(x, i) > w^T φ(x, j) is equivalent to w_i^T x > w_j^T x
Example
• The same pattern is encoded as different features associated with
different classes.
• The weights capture the relationship between the pattern and the
output class.

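As a concrete (hypothetical) illustration of this encoding, with k = 3 classes and a two-dimensional input x = (2, 5):
φ(x, 1) = (2, 5, 0, 0, 0, 0)
φ(x, 2) = (0, 0, 2, 5, 0, 0)
φ(x, 3) = (0, 0, 0, 0, 2, 5)
With the stacked weight vector w = (w_1, w_2, w_3), the score w^T φ(x, y) picks out exactly w_y^T x, so the same input pattern interacts with a different block of weights for each class.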
Multiclass Perceptron
(Algorithm figure from CIML.info)
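The CIML algorithm figure is not reproduced here; below is a minimal Python sketch of the standard multiclass perceptron update (function and variable names are assumptions):

import numpy as np

def multiclass_perceptron(X, y, k, epochs=10):
    # One weight vector per class; predict with argmax, update only on mistakes.
    n, d = X.shape
    W = np.zeros((k, d))
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            y_hat = int(np.argmax(W @ x_i))   # highest-scoring label
            if y_hat != y_i:
                W[y_i] += x_i                 # promote the true label's weights
                W[y_hat] -= x_i               # demote the mistakenly predicted label's weights
    return W

In the joint feature notation, the same update is w ← w + φ(x, y) − φ(x, ŷ).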
Multiclass Logistic Regression
• Recall: logistic regression learns a probabilistic classifier, using the sigmoid function to model the conditional probability of the label.
• Training objective: find w that maximizes the conditional likelihood:
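The slide's formula is shown as an image; in standard notation, the binary model and objective are:
P(y = 1 | x; w) = σ(w^T x) = 1 / (1 + exp(−w^T x))
w* = argmax_w Σ_i log P(y_i | x_i; w)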
Multiclass Logistic Regression
• The multiclass version can be rewritten as:
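A standard way to write the multiclass model, using the joint features φ(x, y) introduced earlier (the slide's own equation is an image):
P(y | x; w) = exp(w^T φ(x, y)) / Σ_{y'} exp(w^T φ(x, y'))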
Multiclass Logistic Regression
• The training objective: find w that maximizes the conditional likelihood of the data {(x_i, y_i)}
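Written out (standard form, not reproduced from the slide image):
w* = argmax_w Σ_i log P(y_i | x_i; w) = argmax_w Σ_i [ w^T φ(x_i, y_i) − log Σ_y exp(w^T φ(x_i, y)) ]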
Multiclass Logistic Regression
• Equivalently: minimize the negative log-likelihood of the data
• The gradient balances two quantities:
– the expected feature counts given the current model
– the feature counts for the gold (observed) data
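The gradient behind these two annotations is the standard one for the negative log-likelihood (stated here since the slide's equation is an image):
∇_w NLL(w) = Σ_i [ E_{y ~ P(y | x_i; w)} φ(x_i, y) − φ(x_i, y_i) ]
At the optimum, the model's expected feature counts match the observed (gold) feature counts.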


How did we get here?
• Maximize the data likelihood → maximize the log-likelihood
• Equivalently, minimize the negative log-likelihood of the data
• The gradient compares the expected feature counts under the current model with the feature counts of the gold (observed) data


Multiclass SVM
• A single classifier optimizing a global objective
– Extend the SVM framework to the multiclass setting
• Binary SVM:
– Minimize ||w|| such that the closest points to the hyperplane have a score of +/- 1
• Multiclass SVM:
– Each label has a different weight vector
– Maximize the multiclass margin
Margin in the Multiclass case
Revise the definition for the multiclass case:
• The margin is the difference between the score of the correct label and the scores of competing labels
(Figure: data points colored by label; the margin is the gap between the correct label's score and the closest competing label's score)
• SVM objective: minimize the total norm of the weights s.t. the true label is scored at least 1 more than the second best.
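Written out (a standard formulation matching the description above):
margin(w) = min_i [ w^T φ(x_i, y_i) − max_{y ≠ y_i} w^T φ(x_i, y) ]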
Hard Multiclass SVM
• Objective: regularization term (minimize the norm of the weights)
• Constraint: the score of the true label has to be higher by at least 1 than the score of any other label
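In formula form (the standard hard-margin multiclass SVM; the slide's own equation is an image):
min_w ½ ||w||²
s.t. w^T φ(x_i, y_i) − w^T φ(x_i, y) ≥ 1   for all i and all y ≠ y_i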
Soft Multiclass SVM
• Relax the hard constraints using slack variables (one positive slack ξ_i per example)
• Objective: the regularizer plus a penalty on the slack variables
• Constraint: the score of the true label should have a margin of at least 1 − ξ_i over every other label
K. Crammer, Y. Singer: ”On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines”, JMLR, 2001
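In formula form (a standard Crammer–Singer-style soft-margin objective, consistent with the citation above; the slide's own equation is an image):
min_{w, ξ} ½ ||w||² + C Σ_i ξ_i
s.t. w^T φ(x_i, y_i) − w^T φ(x_i, y) ≥ 1 − ξ_i   for all i and all y ≠ y_i,   with ξ_i ≥ 0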
Multiclass classification so far
• Learning:
• Prediction:
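The two bullets correspond to the standard pair of formulas (filled in here since the slide shows them as images):
• Learning: find w minimizing ½ ||w||² + C Σ_i ξ_i subject to the multiclass margin constraints above
• Prediction: f(x) = argmax_y w^T φ(x, y)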
Cost-Sensitive Multiclass Classification
• Sometimes we are willing to “tolerate” some mistakes more than others
Cost-Sensitive Multiclass Classification
• We can think about the labels as a hierarchy (e.g., a tree over the classes)
• Define a distance metric:
– Δ(y, y’) = tree distance between y and y’
• We would like to incorporate this distance into our learning model


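One standard way to incorporate Δ into the learning model is margin rescaling (a sketch of the usual formulation, not necessarily the slides' exact notation):
min_{w, ξ} ½ ||w||² + C Σ_i ξ_i
s.t. w^T φ(x_i, y_i) − w^T φ(x_i, y) ≥ Δ(y_i, y) − ξ_i   for all i and all y ≠ y_i,   with ξ_i ≥ 0
Labels that are far from the gold label in the hierarchy must be separated by a larger margin.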


Cost-Sensitive Multiclass Classification
• Instead, we can have an unconstrained version:
• Question: what is the subgradient of this loss function?
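The unconstrained version is obtained by solving for the slack variables (a standard rewriting; the slide's equation is an image):
min_w ½ ||w||² + C Σ_i max_y [ Δ(y_i, y) + w^T φ(x_i, y) − w^T φ(x_i, y_i) ]
with the convention Δ(y_i, y_i) = 0, so the inner max is always ≥ 0.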
Reminder: Subgradient descent
(Slides by Sebastian Nowozin and Christoph H. Lampert, “Structured Models in Computer Vision” tutorial, CVPR 2011)
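A brief recap of the standard definitions (not taken verbatim from the cited tutorial):
• g is a subgradient of a convex function f at w if f(w') ≥ f(w) + g^T (w' − w) for all w'. Where f is differentiable, the gradient is the only subgradient; at kinks (such as the max in the hinge loss) there can be many.
• Subgradient descent repeats: pick any subgradient g of the objective at the current w and update w ← w − η_t g, typically with a decreasing step size η_t.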
Subgradient for the MC case

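A subgradient of the unconstrained objective can be written out as follows (a standard derivation, stated here since the slides' steps are images). For example i, let the loss-augmented prediction be
ŷ_i = argmax_y [ Δ(y_i, y) + w^T φ(x_i, y) ]
Then φ(x_i, ŷ_i) − φ(x_i, y_i) is a subgradient of the per-example loss (the zero vector when ŷ_i = y_i), and
g = w + C Σ_i [ φ(x_i, ŷ_i) − φ(x_i, y_i) ]
is a subgradient of the full regularized objective.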
Subgradient descent for the MC case
• Question: what is the difference between this algorithm and the perceptron variant for multiclass classification?
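A minimal stochastic subgradient descent sketch for the cost-sensitive multiclass SVM (the function names, the regularization constant lam, and the constant step size are illustrative assumptions):

import numpy as np

def phi(x, y, k):
    # Joint feature map: place x in the block of class y, zeros elsewhere.
    d = x.shape[0]
    f = np.zeros(k * d)
    f[y * d:(y + 1) * d] = x
    return f

def sgd_multiclass_svm(X, y, k, delta, lam=0.01, C=1.0, epochs=10, eta=0.1):
    # Stochastic subgradient descent on the regularized multiclass hinge loss.
    # delta(y_true, y_other) is the label-distance cost, with delta(y, y) == 0.
    n, d = X.shape
    w = np.zeros(k * d)
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            # loss-augmented prediction: argmax_y [ delta(y_i, y) + w . phi(x_i, y) ]
            scores = [delta(y_i, yy) + w @ phi(x_i, yy, k) for yy in range(k)]
            y_hat = int(np.argmax(scores))
            g = lam * w                                   # subgradient of the regularizer
            if y_hat != y_i:
                g = g + C * (phi(x_i, y_hat, k) - phi(x_i, y_i, k))
            w = w - eta * g
    return w

Regarding the question above: the perceptron uses the plain argmax (no Δ term), updates only on mistakes, and has no regularizer or step size, whereas the subgradient update uses the loss-augmented argmax and always shrinks w through the regularization term.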
