Task Loss Estimation for Sequence Prediction

Bahdanau, Dzmitry; Serdyuk, Dmitriy; Brakel, Philémon; Ke, Nan Rosemary; Chorowski, Jan; Courville, Aaron; Bengio, Yoshua

arXiv:1511.06456 (cs)

[Submitted on 19 Nov 2015 (v1), last revised 19 Jan 2016 (this version, v4)]

Abstract: Often, the performance on a supervised machine learning task is evaluated with a task loss function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance and the BLEU score. A common workaround for this problem is to instead optimize a surrogate loss function, such as cross-entropy or hinge loss. For this remedy to be effective, it is important to ensure that minimization of the surrogate loss results in minimization of the task loss, a condition that we call consistency with the task loss. In this work, we propose another method for deriving differentiable surrogate losses that provably meet this requirement. We focus on the broad class of models that define a score for every input-output pair. Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss. A distinct feature of such an approach is that it defines the desirable value of the score for every input-output pair. We use this property to design specialized surrogate losses for Encoder-Decoder models, which are often used for sequence prediction tasks. In our experiment, we benchmark on the task of speech recognition. Using a new surrogate loss instead of cross-entropy to train an Encoder-Decoder speech recognizer brings a significant ~13% relative improvement in terms of Character Error Rate (CER) in the case when no extra corpora are used for language modeling.
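
The core idea in the abstract, treating the model's score for an input-output pair as an estimate of the task loss and training on the estimation error, can be illustrated with a small sketch. This is not the authors' implementation: the squared-error form, the function names, and the toy scores below are all assumptions chosen for illustration, and the task loss is taken to be the edit distance mentioned in the abstract.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (the task loss here)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def estimation_error(score, candidate, reference):
    """Squared gap between a score and the true task loss.

    The squared-error form is an assumption made for this sketch.
    """
    return (score - edit_distance(candidate, reference)) ** 2


def predict(scores):
    """Return the candidate whose estimated loss (score) is lowest."""
    return min(scores, key=scores.get)


# Hypothetical scores that a model might assign to three candidate outputs.
reference = "cat"
scores = {"cat": 0.4, "cut": 1.2, "dog": 2.5}

# Surrogate loss: total estimation error over the candidates.
surrogate = sum(estimation_error(s, c, reference) for c, s in scores.items())

# Prediction: the candidate with the lowest estimated task loss.
best = predict(scores)
```

In this toy setting, driving the estimation error to zero makes each score equal to the true edit distance, so the lowest-scoring candidate is also the one with the lowest task loss. That is the intuition behind the consistency condition described in the abstract.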

Comments:	Submitted to ICLR 2016
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1511.06456 [cs.LG]
	(or arXiv:1511.06456v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1511.06456

Submission history

From: Dzmitry Bahdanau
[v1] Thu, 19 Nov 2015 23:51:31 UTC (54 KB)
[v2] Fri, 27 Nov 2015 22:53:47 UTC (80 KB)
[v3] Fri, 8 Jan 2016 15:28:19 UTC (89 KB)
[v4] Tue, 19 Jan 2016 20:48:19 UTC (90 KB)
