Three Approaches to Ordinal Classification

Krzysztof Dembczyński, Wojciech Kotłowski

Institute of Computing Science


Poznań University of Technology

EURO 2009, Bonn, July 8, 2009


1 Three Approaches to Ordinal Classification

2 Boosting-like Approach

3 Ordinal Matrix Factorization

4 Conclusions
Ordinal classification consists in predicting a label taken from a finite and ordered set for an object described by some attributes.

This problem shares characteristics of both multi-class classification and regression, but:
• the order between class labels cannot be neglected,
• the scale of the decision attribute is not cardinal.
Examples:
• A recommender system predicting the rating of a movie for a given user.
• Email filtering into ordered groups such as: important, normal, later, or spam.
Nature of ordinal classification:
• Classification with ordered class labels?
• Degenerate ranking problem?
1 Three Approaches to Ordinal Classification

2 Boosting-like Approach

3 Ordinal Matrix Factorization

4 Conclusions
Notation:
• K – number of classes
• y – actual label
• ŷ – predicted label
• x – attributes
• f (x) – prediction (ranking or utility) function
• L(·) – loss function
• J·K – Boolean test
Ordinal Classification – Probability Estimation:
• Prediction risk is defined by a loss matrix

$L(y, \hat{y}) = (l_{y,\hat{y}})_{K \times K}$

with V-shaped rows and zeros on the diagonal, e.g., for K = 4:

$$L = \begin{pmatrix}
0 & 1 & 2 & 3 \\
1 & 0 & 1 & 2 \\
2 & 1 & 0 & 1 \\
3 & 2 & 1 & 0
\end{pmatrix}$$
Ordinal Classification – Probability Estimation:
• Bayes decision for the loss matrix L(y, ŷ) is given by:

$$\hat{y}^* = \arg\min_{\hat{y}} \sum_{k=1}^{K} \Pr(y = k \mid x)\, L(k, \hat{y}).$$

• To solve the problem, we need to estimate the conditional probabilities Pr(y = k|x) – a lot of algorithms . . .
• We can decompose the problem into K − 1 binary problems by utilizing the order of the labels y: the results are then estimates of Pr(y > k|x), k = 1, . . . , K − 1.
• To satisfy monotonicity of Pr(y > k|x), k = 1, . . . , K − 1, we use isotonic regression (see the sketch below).
• Other possibilities are allowed . . .
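
A minimal sketch of the decomposition-plus-isotonization step described above. All names are hypothetical, the pool-adjacent-violators routine is written from scratch, and labels are assumed to be 1, . . . , K:

```python
import numpy as np

def pava_decreasing(v):
    """Pool Adjacent Violators: L2-project v onto non-increasing sequences.
    Used to monotonize the K-1 estimates Pr(y > k | x) over k."""
    vals, wts, sizes = [], [], []
    for x in v:
        vals.append(x); wts.append(1.0); sizes.append(1)
        # merge adjacent blocks while the non-increasing order is violated
        while len(vals) > 1 and vals[-2] < vals[-1]:
            v2, w2, s2 = vals.pop(), wts.pop(), sizes.pop()
            v1, w1, s1 = vals.pop(), wts.pop(), sizes.pop()
            w = w1 + w2
            vals.append((w1 * v1 + w2 * v2) / w); wts.append(w); sizes.append(s1 + s2)
    return np.concatenate([np.full(s, val) for val, s in zip(vals, sizes)])

# Raw estimates of Pr(y > k | x), k = 1..K-1, from K-1 independent binary
# models; they need not be monotone before isotonization.
raw = np.array([0.9, 0.7, 0.75, 0.2])
iso = pava_decreasing(raw)              # -> [0.9, 0.725, 0.725, 0.2]

# Recover the class distribution: Pr(y = k) = Pr(y > k-1) - Pr(y > k),
# with Pr(y > 0) = 1 and Pr(y > K) = 0.
exceed = np.concatenate(([1.0], iso, [0.0]))
probs = exceed[:-1] - exceed[1:]
```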
Ordinal Classification – Probability Estimation:
• Given Pr(y = k|x), k = 1, . . . , K, the optimal prediction is:


$$\hat{y}^* = \begin{cases}
\arg\max_k \Pr(y = k \mid x), & \text{for } l_{y\hat{y}} = [\![\, y \ne \hat{y} \,]\!], \\
\operatorname{median}(y \mid x), & \text{for } l_{y\hat{y}} = |y - \hat{y}|, \\
\mathbb{E}(y \mid x), & \text{for } l_{y\hat{y}} = (y - \hat{y})^2.
\end{cases}$$

• Absolute-error loss seems the most natural, since its Bayes decision is the median, which does not depend on the scale of the labels.
• Any function of the probability distribution can be used for
object ranking.
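
The case analysis above translates directly into code. A small sketch, assuming a vector p of estimated probabilities Pr(y = k|x) for labels 1..K (the function name is hypothetical):

```python
import numpy as np

def bayes_prediction(p, loss="absolute"):
    """p[k-1] = estimated Pr(y = k | x) for labels k = 1..K."""
    labels = np.arange(1, len(p) + 1)
    if loss == "zero_one":          # l = [y != yhat]   -> mode of the distribution
        return labels[np.argmax(p)]
    if loss == "absolute":          # l = |y - yhat|    -> median of the distribution
        return labels[np.searchsorted(np.cumsum(p), 0.5)]
    if loss == "squared":           # l = (y - yhat)^2  -> mean (not a label in general)
        return float(labels @ p)
    raise ValueError(loss)

p = np.array([0.1, 0.175, 0.0, 0.525, 0.2])
print(bayes_prediction(p, "zero_one"), bayes_prediction(p, "absolute"))  # 4 4
```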
Ordinal Classification – Degenerate Ranking:
• Prediction risk is defined by a rank loss computed over pairs
of objects:

$$L(y_{\circ\bullet}, f(x_\circ), f(x_\bullet)) = [\![\, y_{\circ\bullet} \, (f(x_\circ) - f(x_\bullet)) \le 0 \,]\!],$$

where $y_{\circ\bullet} = \operatorname{sgn}(y_\circ - y_\bullet)$, and $f(x)$ is a ranking (or utility) function.

$$y_{i_1} > y_{i_2} > y_{i_3} > \ldots > y_{i_{N-1}} > y_{i_N}$$
$$f(x_{i_1}) > f(x_{i_3}) > f(x_{i_2}) > \ldots > f(x_{i_{N-1}}) > f(x_{i_N})$$

(the swapped pair $i_2$, $i_3$ is a misranked pair penalized by the rank loss)
Ordinal Classification – Degenerate Ranking:
• This approach ranks the objects.
• To assign class labels, one has to compute thresholds on the range of the ranking function with respect to a given loss matrix.
• Rank loss minimization is strictly connected with maximization of the AUC criterion used in binary classification.
• Minimization of the rank loss on the training set has quadratic complexity with respect to the number of objects; however, in the case of K ordered classes, the algorithm can work in linear time, as in the sketch below.
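
A sketch of the linear-time idea for K ordered classes: count misranked pairs in one sweep over the objects sorted by score. Names are hypothetical, classes are assumed to be 1..K, and ties in f are glossed over:

```python
import numpy as np

def rank_loss(y, f, K):
    """Count pairs with y_i > y_j but f(x_i) <= f(x_j).
    Naively O(N^2); here O(N log N) for sorting plus O(N K) for one sweep.
    Ties in f are treated only approximately in this sketch."""
    seen = np.zeros(K + 1, dtype=np.int64)   # seen[c] = #processed objects of class c
    loss = 0
    for i in np.argsort(f, kind="stable"):   # ascending scores
        loss += int(seen[y[i] + 1:].sum())   # earlier (lower-score) objects of higher class
        seen[y[i]] += 1
    return loss

y = np.array([3, 1, 2, 3, 1]); f = np.array([0.2, 0.5, 0.1, 0.9, -1.0])
print(rank_loss(y, f, K=3))   # 2 misranked pairs
```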
Ordinal Classification – Threshold Loss:
• Prediction risk is defined by the threshold loss:

$$L(y, f(x), \theta) = \sum_{k=1}^{K-1} [\![\, y_k \, (f(x) - \theta_k) \le 0 \,]\!],$$

where $\theta = (\theta_0, \ldots, \theta_K)$ are consecutive thresholds to be computed simultaneously with $f(x)$, and $y_k = 1$ if $y > k$, $y_k = -1$ if $y \le k$.

[Figure: the real line of f(x) values from −5 to 5, partitioned by the consecutive thresholds θ0 = −∞, θ1 = −3.5, θ2 = −1.2, . . . , θK−2 = 1.2, θK−1 = 3.8, θK = ∞ into K intervals, one per class.]
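
Once f(x) and the inner thresholds θ1 < · · · < θK−1 are fixed, labeling reduces to a search. A sketch using the illustrative threshold values from the figure (the function name is hypothetical):

```python
import numpy as np

def predict_label(f_x, theta_inner):
    """theta_inner = (theta_1, ..., theta_{K-1}), increasing;
    theta_0 = -inf and theta_K = +inf are implicit.
    Predicts yhat = 1 + #{k : theta_k < f(x)}."""
    return int(np.searchsorted(theta_inner, f_x, side="left")) + 1

theta = [-3.5, -1.2, 1.2, 3.8]            # K = 5 classes
print(predict_label(0.3, theta))          # 3: between theta_2 and theta_3
print(predict_label(-4.0, theta))         # 1: below theta_1
```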
Ordinal Classification – Threshold Loss:
• This approach shares characteristics of the previous two.
• An object is compared to the thresholds instead of to all other training objects – lower complexity, although linear algorithms exist for rank loss minimization in the ordinal classification setting.
• Joint solution of all K − 1 binary problems – no need for isotonization of conditional probabilities, but the result is a single value.
• A weighted threshold loss can approximate any loss matrix.
1 Three Approaches to Ordinal Classification

2 Boosting-like Approach

3 Ordinal Matrix Factorization

4 Conclusions
Boosting-like Algorithms for Three Approaches:
• Prediction function is an ensemble of decision rules:
$$f(x) = \alpha_0 + \sum_{m=1}^{M} r_m(x).$$

• We used a boosting approach to learn f(x): in each iteration, a single rule is generated by concentrating on the examples that were hardest to classify correctly by the previous rules with respect to a given loss function. A toy version of this loop is sketched below.
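
A toy sketch of such a loop. This is not the authors' ENDER/RankRules/ORDER code: for brevity it uses squared-error residuals in place of the gradients of the actual (exponential rank / threshold / absolute-error) losses, and all names are hypothetical.

```python
import numpy as np

class Rule:
    """A single decision rule: a condition on one attribute plus a response
    alpha that is added to f(x) whenever the condition fires."""
    def __init__(self, feat, thr, side, alpha):
        self.feat, self.thr, self.side, self.alpha = feat, thr, side, alpha
    def covers(self, X):
        col = X[:, self.feat]
        return col <= self.thr if self.side == "le" else col > self.thr

def fit_rule(X, residuals):
    """Greedily pick the condition concentrating on the largest residuals,
    i.e., the examples the current ensemble gets most wrong."""
    best, best_score = None, -np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for side in ("le", "gt"):
                mask = X[:, j] <= thr if side == "le" else X[:, j] > thr
                if not mask.any():
                    continue
                score = abs(residuals[mask].sum()) / np.sqrt(mask.sum())
                if score > best_score:
                    best_score = score
                    best = Rule(j, thr, side, residuals[mask].mean())
    return best

def boost(X, y, M=100, shrinkage=0.1):
    alpha0 = float(np.median(y))             # constant term of the ensemble
    f = np.full(len(y), alpha0)
    rules = []
    for _ in range(M):
        rule = fit_rule(X, y - f)            # residuals: hardest examples dominate
        rule.alpha *= shrinkage
        f += rule.alpha * rule.covers(X)
        rules.append(rule)
    return alpha0, rules

def predict(x_row, alpha0, rules):           # f(x) = alpha_0 + sum_m r_m(x)
    X = x_row.reshape(1, -1)
    return alpha0 + sum(r.alpha * float(r.covers(X)[0]) for r in rules)
```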
Boosting-like Algorithms for Three Approaches:
• Ordinal ENDER – decomposes the problem into a sequence of binary problems for estimating Pr(y > k|x); uses isotonic regression for isotonization of the estimates; the final prediction is the median of the computed class distribution.
• RankRules – minimizes (exponential) rank loss;
parameterized to minimize absolute-error.
• ORDER – minimizes (exponential) threshold loss;
parameterized to minimize absolute-error.

• ENDER-Abs – a reference algorithm constructing an ensemble of decision rules by direct minimization of the absolute error.

All the algorithms work in linear time with respect to the number of training examples (plus log-linear time for sorting, used once in the preprocessing phase).
Experimental Results:
• Comparison of Ordinal ENDER, RankRules, ORDER, and ENDER-Abs.
• 19 benchmark sets taken from Luis Torgo's repository – transformed from regression to ordinal classification settings.
• Average ranks are computed with respect to the mean absolute error obtained on each data set.
• Critical difference in average ranks is CD = 1.076.

[Critical difference diagram: average ranks of ENDER−Abs, RankRules, ORDER, and Ordinal ENDER on a scale from 4 to 1; CD = 1.076.]
Experimental Results:
• There is almost no quantitative difference in performance or time consumption: RankRules is slightly slower.
• Qualitative differences: Ordinal ENDER is related to probability estimation, while RankRules is related to AUC maximization.
• Ensembles of decision rules are competitive with RankBoost-AE, ORBoost-All, and SVM-IMC.
1 Three Approaches to Ordinal Classification

2 Boosting-like Approach

3 Ordinal Matrix Factorization

4 Conclusions
Ordinal Matrix Factorization:
• Given a sparse matrix Y of observed values, build a model based on matrix factorization:

$$Y \approx \hat{Y} = U V^T,$$

where $U$ is an $I \times M$ matrix and $V^T$ is an $M \times J$ matrix.


• The prediction is then defined by:

$$\hat{y}_{ij} = \sum_{m=1}^{M} u_{im} v_{jm}.$$

• Example: I is the number of users and J is the number of movies in a movie recommender system, and M is the number of features describing users and movies.
• For learning, we use gradient descent applied alternately to the U and V matrices with respect to a given loss function, as in the sketch below.
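
A minimal sketch of this alternating scheme under a squared-error loss (the slides consider ordinal losses instead); all names and hyperparameter values are hypothetical:

```python
import numpy as np

def factorize(entries, I, J, M=10, lr=0.02, reg=0.05, epochs=1000, seed=0):
    """entries: iterable of (i, j, y_ij) observed cells of the sparse I x J matrix Y.
    Learns Y ~ U V^T by gradient descent applied alternately to U and V."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((I, M))
    V = 0.1 * rng.standard_normal((J, M))
    for _ in range(epochs):
        for i, j, y in entries:              # update U with V fixed
            err = U[i] @ V[j] - y            # gradient of the squared error
            U[i] -= lr * (err * V[j] + reg * U[i])
        for i, j, y in entries:              # update V with U fixed
            err = U[i] @ V[j] - y
            V[j] -= lr * (err * U[i] + reg * V[j])
    return U, V

# yhat_ij = sum_m u_im * v_jm, i.e. U[i] @ V[j]
entries = [(0, 0, 5.0), (0, 2, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
U, V = factorize(entries, I=3, J=3, M=2)
print(U[0] @ V[0])   # should be close to the observed rating 5.0
```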
Ordinal Matrix Factorization for Three Approaches:
• Decomposition schema for probability estimation.
• Minimization of rank loss.
• Minimization of threshold loss.
• Hypothesis: all the approaches perform similarly.
• For all three approaches, linear-time algorithms exist; minimization of the (exponential) rank loss, however, is the most demanding.
• No satisfactory results yet :(
• Work in progress . . .
1 Three Approaches to Ordinal Classification

2 Boosting-like Approach

3 Ordinal Matrix Factorization

4 Conclusions
Conclusions:
• Nature of ordinal classification?
• Three approaches to ordinal classification.
• Boosting-like algorithms: qualitative rather than quantitative differences between these approaches.
• Ordinal matrix factorization: work in progress . . .
