Beyond binary classification
The slides are closely adapted from Subhransu Maji’s slides
Learning with imbalanced data
• One class might be rare (e.g., face detection)
• Mistakes on the rare class cost more:
‣ cost of misclassifying y=+1 is α (with α > 1)
‣ cost of misclassifying y=-1 is 1
• Why? We want a better F-score (or average precision)
binary classification → α-weighted binary classification
Suppose we have an algorithm to train a binary classifier; can we use it to train the α-weighted version?
Training by sub-sampling
• Input: an α-weighted dataset D. Output: a sample (x, y).
• While true
‣ Sample (x, y) ~ D
‣ Sample t uniformly from [0, 1]
‣ If y = +1 or t < 1/α
➡ return (x, y)
• This sub-samples the negatives by a factor of 1/α
• Claim: this sub-sampling reduction turns any algorithm for binary classification into one for α-weighted binary classification
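A minimal Python sketch of this sub-sampling reduction, assuming a dataset of (x, y) pairs with y in {+1, -1}; `data` and `alpha` are placeholder names:

```python
import random

def subsample_draw(data, alpha):
    """Draw one example, keeping every positive but keeping each
    negative only with probability 1/alpha."""
    while True:
        x, y = random.choice(data)               # sample (x, y) ~ D
        if y == +1 or random.random() < 1.0 / alpha:
            return (x, y)

# Usage: build a training set for an ordinary binary learner.
# data = [(x1, +1), (x2, -1), ...]; alpha = 5.0
# sample = [subsample_draw(data, alpha) for _ in range(10000)]
```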
Modifying training
• To train, simply:
‣ Subsample negatives and train a binary classifier.
‣ Alternatively, supersample positives and train a binary classifier.
‣ Which one is better?
• For some learners we don’t need to keep copies of the positives
‣ Decision tree
➡ Modify accuracy to the weighted version
‣ kNN classifier
➡ Take weighted votes during prediction
‣ Perceptron? (one possible answer is sketched below)
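For the perceptron, one possible answer (a sketch, not the only option) is to scale each mistake-driven update by the example's weight, which has roughly the same effect as keeping α copies of each positive:

```python
import numpy as np

def weighted_perceptron(X, y, alpha, epochs=10):
    """Perceptron where each positive example counts alpha times:
    instead of keeping copies, scale its updates by alpha."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:        # mistake on this example
                c = alpha if yi > 0 else 1.0  # per-example weight
                w += c * yi * xi
                b += c * yi
    return w, b
```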
Multi-class classification
• The label is one of K different classes.
• Some classifiers are inherently multi-class —
‣ kNN classifiers: vote among the K labels, pick the one with the highest vote
(break ties arbitrarily)
‣ Decision trees: use multi-class histograms to determine the best feature to
split on. At the leaves, predict the most frequent label.
• Question: can we take a binary classifier and turn it into multi-class?
One-vs-all (OVA) classifier
• Train K classifiers, each to distinguish one class from the rest
• Prediction: pick the class with the highest score: ŷ = argmax_k f_k(x),
where f_k is the score function of the k-th classifier
• Example
‣ Perceptron: f_k(x) = w_k · x
➡ May have to calibrate the weights (e.g., fix the norm to 1) since we are
comparing the scores of classifiers
➡ In practice, doing this right is tricky when there are a large number of classes
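A minimal OVA sketch; `train_binary` is a hypothetical stand-in for any learner that returns a real-valued scoring function f(x):

```python
import numpy as np

def ova_train(X, y, K, train_binary):
    """Train K classifiers, the k-th distinguishing class k (+1)
    from all other classes (-1)."""
    return [train_binary(X, np.where(y == k, +1, -1)) for k in range(K)]

def ova_predict(classifiers, x):
    """Pick the class whose classifier assigns the highest score;
    this implicitly assumes the K scores are comparable (calibrated)."""
    return int(np.argmax([f(x) for f in classifiers]))
```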
One-vs-one (OVO) classifier
• Train K(K-1)/2 classifiers, each to distinguish one class from another
• Each classifier votes for the winning class in a pair
• The class with most votes wins
• Example
‣ Perceptron: the classifier for pair (i, j) votes for i if sign(w_ij · x) = +1, and for j otherwise
➡ Calibration is not an issue since we are taking the sign of the score
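A matching OVO sketch under the same hypothetical `train_binary`; note that only the sign of each pairwise score is used, so no calibration is needed:

```python
import numpy as np
from itertools import combinations

def ovo_train(X, y, K, train_binary):
    """Train K(K-1)/2 classifiers, one per pair of classes (i, j);
    label +1 means class i, label -1 means class j."""
    models = {}
    for i, j in combinations(range(K), 2):
        mask = (y == i) | (y == j)
        models[(i, j)] = train_binary(X[mask], np.where(y[mask] == i, +1, -1))
    return models

def ovo_predict(models, K, x):
    """Each pairwise classifier casts one vote; the most-voted class wins."""
    votes = np.zeros(K)
    for (i, j), f in models.items():
        votes[i if f(x) > 0 else j] += 1
    return int(np.argmax(votes))
```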
Classification tree
• Needs only O(log K) classifier evaluations at inference time
[Figure: a tree of SVMs. The root SVM decides between {1, 3} and {2, 4}; the two child SVMs then decide 3 vs 1 and 2 vs 4.]
Directed acyclic graph (DAG) classifier
• DAG SVM [Platt et al., NIPS 2000]
‣ Faster testing: O(K) instead of O(K(K-1)/2)
‣ Has some theoretical guarantees
Figure from Platt et al.
Ranking
Ranking
• Input: query (e.g. “cats”)
• Output: a sorted list of items
• How should we measure performance?
• The loss function is trickier than in the binary classification case
‣ Example 1: All items in the first page should be relevant
‣ Example 2: All relevant items should be ahead of irrelevant items
Learning to rank
• For simplicity, let's assume we are learning to rank for a single given query.
• Learning to rank:
‣ Input: a list of items
‣ Output: a function that takes a set of items and returns a sorted list
• Approaches
‣ Pointwise approach:
➡ Assumes that each document has a numerical score.
➡ Learn a model to predict the score (e.g. linear regression).
‣ Pairwise approach:
➡ Ranking is approximated by a classification problem.
➡ Learn a binary classifier that can tell which item is better given a pair.
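A minimal sketch of the pointwise approach, assuming each training item comes with a numerical relevance score; plain least squares stands in for the regression learner:

```python
import numpy as np

def pointwise_rank(X_train, scores, X_items):
    """Fit a linear model to the relevance scores, then sort new
    items by their predicted score (most relevant first)."""
    w, *_ = np.linalg.lstsq(X_train, scores, rcond=None)
    return np.argsort(-(X_items @ w))
```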
Naive rank train
• Create a dataset D with binary labels; let x_ij denote features comparing items i and j
‣ Initialize: D ← ∅
‣ For every i and j such that i ≠ j
➡ If item i is more relevant than item j
• Add a positive point: (x_ij, +1)
➡ If item i is less relevant than item j
• Add a negative point: (x_ij, -1)
• Learn a binary classifier f on D
• Ranking
‣ Initialize: score_i ← 0 for all i
‣ For every i and j such that i ≠ j
➡ Calculate prediction: y ← f(x_ij)
➡ Update scores: score_i ← score_i + y, score_j ← score_j − y
‣ Return the items sorted by decreasing score
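A sketch of naive rank train and the induced ranker; `pair_features` (which builds x_ij) and `train_binary` are hypothetical stand-ins:

```python
import numpy as np

def naive_rank_train(items, relevance, pair_features, train_binary):
    """Build pairwise binary data: f(x_ij) > 0 should mean
    item i is more relevant than item j."""
    D_x, D_y = [], []
    n = len(items)
    for i in range(n):
        for j in range(n):
            if i != j and relevance[i] != relevance[j]:
                D_x.append(pair_features(items[i], items[j]))
                D_y.append(+1 if relevance[i] > relevance[j] else -1)
    return train_binary(np.array(D_x), np.array(D_y))

def naive_rank(f, items, pair_features):
    """Score every item by its pairwise wins, then sort."""
    n = len(items)
    score = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                y = f(pair_features(items[i], items[j]))   # +1 or -1
                score[i] += y
                score[j] -= y
    return np.argsort(-score)      # item indices, best first
```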
Problems with naive ranking
• Naive rank train works well for bipartite ranking problems
‣ Where the goal is to predict whether an item is relevant or not. There is no
notion of an item being more relevant than another.
• A better strategy is to account for the positions of the items in the list
• Denote a ranking by a permutation σ
‣ If item u appears before item v, we have σ_u < σ_v
• Let the space of all permutations of M objects be Σ_M
• A ranking function maps M items to a permutation: f : X → Σ_M
• A cost function ω (omega)
‣ ω(i, j) is the cost of placing an item at position i when it should be at position j
• Ranking loss: ℓ(σ, σ̂) = Σ_{u ≠ v} [σ_u < σ_v] [σ̂_v < σ̂_u] ω(σ_u, σ_v)
ω-rank loss functions
• To be a valid loss function, ω must:
‣ Be symmetric: ω(i, j) = ω(j, i)
‣ Be monotonic: if i < j < k or i > j > k, then ω(i, j) ≤ ω(i, k)
‣ Satisfy the triangle inequality: ω(i, k) ≤ ω(i, j) + ω(j, k)
• Examples:
‣ Kemeny loss: ω(i, j) = 1 whenever i ≠ j (every misordered pair costs 1)
‣ Top-K loss: ω(i, j) = 1 if i ≠ j and min(i, j) ≤ K, and 0 otherwise (only mistakes involving the top K positions are penalized)
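The two example losses and the ranking loss above, as a sketch (positions are assumed 1-indexed, matching the slides):

```python
def kemeny_omega(i, j):
    """Kemeny loss: every misordered pair costs 1."""
    return 1.0 if i != j else 0.0

def topk_omega(K):
    """Top-K loss: a misordered pair costs 1 only if it involves
    one of the top K positions."""
    return lambda i, j: 1.0 if i != j and min(i, j) <= K else 0.0

def ranking_loss(sigma, sigma_hat, omega):
    """Sum omega over pairs that the true ranking sigma orders one
    way and the predicted ranking sigma_hat orders the other way.
    sigma[u] is the position of item u."""
    M = len(sigma)
    return sum(omega(sigma[u], sigma[v])
               for u in range(M) for v in range(M)
               if sigma[u] < sigma[v] and sigma_hat[v] < sigma_hat[u])
```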
ω-rank train
• Create a dataset D with binary labels and instance weights; let x_ij denote features comparing items i and j
‣ Initialize: D ← ∅
‣ For every i and j such that i ≠ j
➡ If σ_i < σ_j (item i is more relevant)
• Add a positive point: (x_ij, +1) with weight ω(σ_i, σ_j)
➡ If σ_i > σ_j (item j is more relevant)
• Add a negative point: (x_ij, -1) with weight ω(σ_i, σ_j)
• Learn a binary classifier f on D (each instance has a weight)
• Ranking
‣ Initialize: score_i ← 0 for all i
‣ For every i and j such that i ≠ j
➡ Calculate prediction: y ← f(x_ij)
➡ Update scores: score_i ← score_i + y, score_j ← score_j − y
Structured SVM
Predicting Syntactic Trees
Slide from Dan Roth (UIUC)
Parsing
We map x and y together into another feature space with a joint mapping function Φ(x, y).
Slide from Dan Roth (UIUC)
Original SVM
• Minimize ||w||²/2 subject to y_i (w · x_i) ≥ 1 for every training example (x_i, y_i)
Structured SVM
• We want the score for the right answer to be at least 1 better than for any
other possible answer: w · Φ(x, y) ≥ w · Φ(x, ŷ) + 1 for all ŷ ≠ y
• Note that the space of possible outputs can be huge.
‣ So the inference step, ŷ = argmax over outputs of w · Φ(x, ŷ), may be difficult
• Not all wrong answers are “equal”: some are closer to the desired answer, so
…
Structured SVM
• Note that the space of possible outputs can be huge.
‣ So inference may be difficult.
‣ What about learning?
• We know that only a subset of the constraints correspond to support vectors
‣ So if we are given the solution w, we can find the active constraints
• What if we don't know w? Can we search for those constraints iteratively?
How?
Structured SVM
• Algorithm:
‣ (assuming we have a reasonable number of support vectors)
‣ 1. Learn a w given some constraints
➡ can be a random sample initially
‣ 2. Apply w to the data and find the most violated constraints
➡ Find the wrong answer ŷ that maximizes the score w · Φ(x, ŷ)
➡ A data point contributes no constraint if all possible outputs already satisfy the margin
‣ 3. Add them to the constraint set, and go to step 1.
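A schematic sketch of this constraint-generation loop. `solve_qp` (a QP solver over the current constraint set), `phi` (the joint feature map Φ), and `argmax_output` (problem-specific inference over wrong answers) are hypothetical stand-ins, and the constraint set simply starts empty here rather than from a random sample:

```python
def structured_svm_train(examples, phi, argmax_output, solve_qp, rounds=20):
    """Iteratively: (1) solve for w on the current constraints,
    (2) add each example's most violated constraint, (3) repeat."""
    constraints = []                      # (x, y_true, y_wrong) triples
    w = None
    for _ in range(rounds):
        w = solve_qp(constraints, phi)    # step 1: learn w
        added = 0
        for x, y in examples:
            y_hat = argmax_output(w, x, exclude=y)       # step 2: decode
            if w @ phi(x, y) - w @ phi(x, y_hat) < 1:    # margin violated?
                constraints.append((x, y, y_hat))
                added += 1
        if added == 0:                    # all margins satisfied: done
            break
    return w
```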
Structured SVM
• What is a good mapping function for multi-class SVM?
Structured SVM
• What is a good mapping function for multi-class SVM?
‣ Will it solve the score calibration problem?
[Figure: the weight vector w is a stack of per-class weight vectors, w for class 1 through w for class n.]
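One standard choice, consistent with the figure (a sketch): stack the per-class weight vectors into one long w, and let Φ(x, y) place x in class y's block, so that w · Φ(x, y) = w_y · x:

```python
import numpy as np

def multiclass_phi(x, y, K):
    """Joint feature map: x goes into the block for class y,
    zeros everywhere else."""
    d = len(x)
    phi = np.zeros(K * d)
    phi[y * d:(y + 1) * d] = x
    return phi
```

All class scores are then compared within a single w learned under shared margin constraints, which is one way this formulation sidesteps the per-classifier score calibration issue.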
Example: Deformable part models (DPM)
Human pose estimation
Sample results
Example: Deformable part models (DPM)
Human pose estimation · Face pose estimation · Object detection
Felzenszwalb, Girshick, McAllester, Ramanan. "Object Detection with Discriminatively Trained Part-Based Models" TPAMI 2010
Yang & Ramanan, "Articulated Pose Estimation using Flexible Mixtures of Parts" CVPR 2011
Zhu & Ramanan, "Face Detection, Pose Estimation, and Landmark Localization in the Wild", CVPR 2012
Collective classification
• Predicting multiple correlated variables
[Figure: input (features) → output (labels), with a joint objective over the labels.]
Collective classification
• Predicting multiple correlated variables
‣ Independent predictions can be noisy
‣ Use the labels of nearby vertices as features, e.g., a histogram of the labels in a 5x5 neighborhood
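A sketch of such neighborhood features for an image labeling task, assuming an integer label map produced by a first round of independent predictions:

```python
import numpy as np

def label_histograms(label_map, num_labels, radius=2):
    """For each pixel, a normalized histogram of the predicted labels
    in the 5x5 window around it (radius=2), usable as extra features
    for a second round of classification."""
    H, W = label_map.shape
    feats = np.zeros((H, W, num_labels))
    for i in range(H):
        for j in range(W):
            window = label_map[max(0, i - radius):i + radius + 1,
                               max(0, j - radius):j + radius + 1]
            counts = np.bincount(window.ravel(), minlength=num_labels)
            feats[i, j] = counts / counts.sum()
    return feats
```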
Stacking classifiers
• Train two classifiers
• The first is trained to predict the output from the input
• The second is trained on the input plus the output of the first classifier
Stacking classifiers
• Train a stack of N classifiers
• The ith classifier is trained on the input plus the outputs of the previous i-1 classifiers
• Overfitting is an issue: the classifiers are accurate on training data but not on
test data, leading to a cascade of overconfident classifiers
• Solution: train on held-out data (a sketch follows below)
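A minimal two-level sketch using scikit-learn, where the held-out predictions come from cross-validation so the second level never trains on predictions a model made for its own training data; logistic regression is just a placeholder learner:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def train_stack(X, y):
    """Level 1 predicts y from X; level 2 predicts y from X plus
    level 1's held-out (cross-validated) predicted probabilities."""
    level1 = LogisticRegression(max_iter=1000).fit(X, y)
    p1 = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                           cv=5, method="predict_proba")
    level2 = LogisticRegression(max_iter=1000).fit(np.hstack([X, p1]), y)
    return level1, level2

def predict_stack(level1, level2, X):
    return level2.predict(np.hstack([X, level1.predict_proba(X)]))
```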
Summary
• Learning with imbalanced data
‣ Implicit and explicit sampling can be used to train binary classifiers for the
weighted loss case
• Beyond binary classification
‣ Multi-class classification
➡ Some classifiers are inherently multi-class
➡ Others can be combined using: one-vs-one, one-vs-all methods
‣ Ranking
➡ Ranking loss functions to capture distance between permutations
➡ Pointwise and pairwise methods
‣ Structured prediction
➡ Change the SVM to predict structured outputs rather than binary ones
‣ Collective classification
➡ Stacking classifiers trained with held-out data
Slides credit
• Slides are closely adapted from the CIML book by Hal Daumé III and Subhransu
Maji's course.
• Images for collective classification are from the PASCAL VOC dataset
‣ http://pascallin.ecs.soton.ac.uk/challenges/VOC/
• Some of the discussion is based on Wikipedia
‣ http://en.wikipedia.org/wiki/Learning_to_rank