Visual attention models for scene text recognition

Ghosh, Suman K.; Valveny, Ernest; Bagdanov, Andrew D.

Computer Science > Computer Vision and Pattern Recognition

arXiv:1706.01487 (cs)

[Submitted on 5 Jun 2017]

Abstract: In this paper we propose an approach to lexicon-free recognition of text in scene images. Our approach relies on an LSTM-based soft visual attention model learned from convolutional features. A set of feature vectors is derived from an intermediate convolutional layer, each vector corresponding to a different area of the image. This permits encoding of spatial information into the image representation. In this way, the framework is able to learn how to selectively focus on different parts of the image. At every time step the recognizer emits one character using a weighted combination of the convolutional feature vectors according to the learned attention model. Training can be done end-to-end using only word-level annotations. In addition, we show that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results. We validate the performance of our approach on the standard SVT and ICDAR'03 scene text datasets, showing state-of-the-art performance in unconstrained text recognition.
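The abstract describes two mechanisms: a soft attention step that turns the grid of convolutional feature vectors into one context vector per decoding step, and a beam search whose scores also include an explicit language model. The plain-Python sketch below illustrates both under stated assumptions. A dot product stands in for the learned attention scorer. The recognizer and language model are hypothetical toy callables (`toy_model`, `toy_lm`). Adding a weighted LM log-probability (the hypothetical `lm_weight`) to each hypothesis score is one common way to integrate a language model, not necessarily the exact scheme used in the paper.

```python
import math

END = "$"  # hypothetical end-of-word symbol


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, hidden):
    """One soft-attention step over region feature vectors.

    features: one vector per spatial cell of an intermediate conv feature map.
    hidden:   current decoder (LSTM) state, same length as each feature vector.
    ASSUMPTION: a plain dot product replaces the learned attention scorer.
    Returns the attention weights (summing to 1) and the context vector,
    i.e. the weighted combination of the feature vectors.
    """
    scores = [sum(f * h for f, h in zip(vec, hidden)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)]
    return weights, context


def beam_search_with_lm(step_log_probs, lm_log_prob, beam_width=3, lm_weight=0.5, max_len=10):
    """Beam search whose hypothesis scores add a weighted LM log-probability.

    step_log_probs(prefix) -> {char: log p_recognizer(char | image, prefix)}
    lm_log_prob(prefix, char) -> log p_lm(char | prefix)
    ASSUMPTION: this additive fusion (with hypothetical lm_weight) is one
    common scheme, not necessarily the paper's exact integration.
    """
    beams = [("", 0.0)]  # (prefix, cumulative score)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for ch, lp in step_log_probs(prefix).items():
                new_score = score + lp + lm_weight * lm_log_prob(prefix, ch)
                if ch == END:
                    finished.append((prefix, new_score))  # word complete
                else:
                    candidates.append((prefix + ch, new_score))
        if not candidates:
            break
        # Keep only the beam_width best partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    pool = finished or beams
    return max(pool, key=lambda c: c[1])[0]


def toy_model(prefix):
    """Hypothetical recognizer: visually prefers 'n' over 'h' at position 1."""
    dists = [{"t": 0.9, "l": 0.1}, {"n": 0.6, "h": 0.4}, {"e": 1.0}]
    dist = dists[len(prefix)] if len(prefix) < len(dists) else {END: 1.0}
    return {ch: math.log(p) for ch, p in dist.items()}


def toy_lm(prefix, ch):
    """Hypothetical character LM that strongly favors the spelling 'the'."""
    return math.log(0.9) if ("the" + END).startswith(prefix + ch) else math.log(0.1)


weights, context = soft_attention([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
print(weights, context)
print(beam_search_with_lm(toy_model, toy_lm, lm_weight=0.0))  # tne
print(beam_search_with_lm(toy_model, toy_lm, lm_weight=1.0))  # the
```

With `lm_weight=0.0` the visually preferred but misspelled hypothesis "tne" wins. Raising the weight lets the language model overturn it in favor of "the". In a real system the attention scores, LSTM state and per-character distributions would come from trained networks rather than toy tables.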

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1706.01487 [cs.CV]
	(or arXiv:1706.01487v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1706.01487

Submission history

From: Suman Ghosh
[v1] Mon, 5 Jun 2017 18:34:37 UTC (1,531 KB)
