This document presents a method for training logistic regression models to optimize the F-measure performance metric instead of accuracy. It formulates F-measure as a rational function of model utilities to approximate its optimization. Experimental results on a text summarization task show this F-measure training can outperform maximum likelihood training.


Maximum Expected F-Measure Training of Logistic Regression Models


Martin Jansche, HLT 2005

presented by Philip Zigoris


Motivation

• Learning algorithms generally optimize 0-1 accuracy.
• Often this is not the performance measure we are concerned with.
• This tends to be the case with datasets heavily skewed towards one class, or when the cost of an error differs between classes.
Outline


Review: Logistic Regression

Review: F_alpha Performance Measure

Optimizing F_alpha
– Formulation and Algorithm
– Comparison to ML
– Experimental Results

Conclusion
Review: Logistic Regression

Sample: $(x_i, y_i) \in \mathbb{R}^k \times \{\pm 1\}$

Model: $\Pr(+1 \mid x, \theta) = \frac{1}{1 + e^{-x \cdot \theta}} = g(x \cdot \theta)$

Classifier: $y_{\mathrm{MAP}}(x) = \arg\max_y \Pr(y \mid x, \theta)$

Objective: $\theta^* = \arg\max_{\theta} \prod_i g\bigl(y_i (x_i \cdot \theta)\bigr)$
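As a reading aid, here is a minimal NumPy sketch of these definitions (the helper names g, prob_pos, y_map, and log_likelihood are mine, not from the slides):

```python
import numpy as np

def g(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def prob_pos(X, theta):
    """Pr(+1 | x, theta) = g(x . theta) for each row x of X.
    Assumes X already contains a constant column if a bias term is wanted."""
    return g(X @ theta)

def y_map(X, theta):
    """MAP classifier: predict +1 when Pr(+1 | x, theta) >= 1/2, else -1."""
    return np.where(prob_pos(X, theta) >= 0.5, 1, -1)

def log_likelihood(theta, X, y):
    """Log of the ML objective: sum_i log g(y_i * (x_i . theta)), with y_i in {+1, -1}."""
    return np.sum(np.log(g(y * (X @ theta))))
```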
Review: F-measure

Counts on a sample (rows: true label, columns: predicted label):

             Predicted +1   Predicted -1
  True +1    A              B
  True -1    C              D

A: true positives, B: misses, C: false alarms, D: true negatives

Precision: $P = A/(A+C)$    Recall: $R = A/(A+B)$

$F_\alpha(R, P) = \left(\frac{\alpha}{R} + \frac{1-\alpha}{P}\right)^{-1} = \frac{A}{A + \alpha B + (1-\alpha) C} = F_\alpha(A, B, C)$
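The two forms of F_α above are equivalent; a small sketch (my own helper names) that checks this for a concrete count vector:

```python
def f_alpha_from_pr(recall, precision, alpha=0.5):
    """F_alpha as an inverse-weighted combination of recall and precision."""
    return 1.0 / (alpha / recall + (1.0 - alpha) / precision)

def f_alpha_from_counts(A, B, C, alpha=0.5):
    """Equivalent form in terms of true positives A, misses B, false alarms C."""
    return A / (A + alpha * B + (1.0 - alpha) * C)

# Example: A=3, B=0, C=1 gives recall 3/3 = 1 and precision 3/4;
# both forms yield F_0.5 = 6/7.
assert abs(f_alpha_from_counts(3, 0, 1) - f_alpha_from_pr(1.0, 0.75)) < 1e-12
```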
Section 4: Relation to Expected Utility

Express F as a rational function of a vector-valued utility:

$U_S = \frac{1}{n} \begin{bmatrix} \sum_i I[y_{\mathrm{MAP}}(x_i) = +1]\, I[y_i = +1] \\ \sum_i I[y_{\mathrm{MAP}}(x_i) = -1]\, I[y_i = +1] \\ \sum_i I[y_{\mathrm{MAP}}(x_i) = +1]\, I[y_i = -1] \end{bmatrix} = \frac{1}{n} \begin{bmatrix} A \\ B \\ C \end{bmatrix}$
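A short sketch of this construction (the function name is mine): the empirical utility vector simply stacks the normalized counts obtained from hard MAP decisions on the sample.

```python
import numpy as np

def utility_vector(y_pred, y_true):
    """U_S = (1/n) * [A, B, C] for labels and predictions in {+1, -1}."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    n = len(y_true)
    A = np.sum((y_pred == 1) & (y_true == 1))    # true positives
    B = np.sum((y_pred == -1) & (y_true == 1))   # misses
    C = np.sum((y_pred == 1) & (y_true == -1))   # false alarms
    return np.array([A, B, C]) / n
```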
(Approximately) Optimizing F

Similar to logistic regression, replace the hard decision with the model probability:

$I[y_{\mathrm{MAP}}(x_i) = +1] \approx \Pr(+1 \mid x_i, \theta) = g(x_i \cdot \theta)$

We can then approximate A, B, C (with $n_{\mathrm{pos}}$ the number of positive training examples):

$\tilde{A}(\theta) = \sum_{i:\, y_i = +1} g(x_i \cdot \theta)$

$\tilde{B}(\theta) = n_{\mathrm{pos}} - \tilde{A}(\theta)$

$\tilde{C}(\theta) = \tilde{m}_{\mathrm{pos}}(\theta) - \tilde{A}(\theta)$, where $\tilde{m}_{\mathrm{pos}}(\theta) = \sum_i g(x_i \cdot \theta)$

$\tilde{F}_\alpha(\theta) = \frac{\tilde{A}(\theta)}{\alpha\, n_{\mathrm{pos}} + (1-\alpha)\, \tilde{m}_{\mathrm{pos}}(\theta)}$
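A sketch of the resulting training procedure, assuming NumPy and SciPy are available (the function names and the choice of a generic quasi-Newton optimizer are mine; the slides only state that the smoothed objective can be optimized with standard techniques):

```python
import numpy as np
from scipy.optimize import minimize

def soft_f_alpha(theta, X, y, alpha=0.5):
    """Smoothed F_alpha: the hard decision I[y_MAP(x_i) = +1] is replaced by g(x_i . theta)."""
    y = np.asarray(y)
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))   # g(x_i . theta) for every example
    A_soft = np.sum(p[y == 1])               # ~A: expected true positives
    m_pos = np.sum(p)                        # ~m_pos: expected number of predicted positives
    n_pos = np.sum(y == 1)                   # n_pos: actual number of positive examples
    return A_soft / (alpha * n_pos + (1.0 - alpha) * m_pos)

def train_max_f(X, y, alpha=0.5, theta0=None):
    """Maximize the smoothed F_alpha by minimizing its negation with BFGS."""
    if theta0 is None:
        theta0 = np.zeros(X.shape[1])
    result = minimize(lambda th: -soft_f_alpha(th, X, y, alpha), theta0, method="BFGS")
    return result.x
```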
Comparison to maximum likelihood: Toy dataset
x    y
0   +1
1   -1
2   +1
3   +1
Comparison to maximum likelihood: Toy dataset
Maximum likelihood gives the all-+1 classifier (0.35, 0.57):
• Recall is 1
• Precision is 3/4
• F.5 = 6/7 ≈ 0.86

Classifier trained with the F.5 approximation (20, 15):
• Gives the all-+1 classifier (results the same as ML)
• F.25 = 4/5 ≈ 0.8

Classifier trained with the F.25 approximation (-20, 15) labels the first two examples negative:
• F.5 = 4/5 ≈ 0.8
• F.25 = 8/9 ≈ 0.89
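To make the comparison concrete, the F_α values quoted above can be reproduced directly from the counts (a small self-contained check; the helper name is mine):

```python
def f_alpha(A, B, C, alpha):
    """Exact F_alpha from true positives A, misses B, false alarms C."""
    return A / (A + alpha * B + (1 - alpha) * C)

# Toy labels: x = 0, 1, 2, 3 with y = +1, -1, +1, +1.

# All-positive classifier (found by ML and by the F.5 approximation): A=3, B=0, C=1.
print(f_alpha(3, 0, 1, 0.50))   # 6/7 ~ 0.857
print(f_alpha(3, 0, 1, 0.25))   # 4/5 = 0.8

# Classifier found by the F.25 approximation (x=0 and x=1 labelled negative): A=2, B=1, C=0.
print(f_alpha(2, 1, 0, 0.50))   # 4/5 = 0.8
print(f_alpha(2, 1, 0, 0.25))   # 8/9 ~ 0.889
```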
Experiments: Text Summarization

Task: Classify sentences (and sentence-like units) as belonging to the summary


Data:
•3535 train, 408 test instances
•29 features (1 binary, 28 real/integer valued)
•All features present

Results: F-measure training can outperform maximum likelihood training on this task.

Data source: Sameer Maskey and Julia Hirschberg. Comparing lexical, acoustic/prosodic, structural and discourse features for speech summarization. In Proceedings of Interspeech 2005.
Conclusions
Main idea: Approximate the MAP classification decision with the probability itself. This gives a continuous objective over the parameters that can be optimized with standard techniques.
Main criticism:
Experiments are inconclusive.
