Levenshtein OCR

Da, Cheng; Wang, Peng; Yao, Cong

Computer Science > Computer Vision and Pattern Recognition

arXiv:2209.03594 (cs)

[Submitted on 8 Sep 2022 (v1), last revised 14 Nov 2022 (this version, v2)]

Title: Levenshtein OCR

Authors: Cheng Da, Peng Wang, Cong Yao

Abstract: A novel scene text recognizer based on Vision-Language Transformer (VLT) is presented. Inspired by the Levenshtein Transformer from NLP, the proposed method (named Levenshtein OCR, or LevOCR for short) explores an alternative way of automatically transcribing textual content from cropped natural images. Specifically, we cast scene text recognition as an iterative sequence refinement process. The initial prediction sequence produced by a pure vision model is encoded and fed into a cross-modal transformer, where it interacts and fuses with the visual features to progressively approximate the ground truth. The refinement process is accomplished via two basic character-level operations, deletion and insertion, which are learned with imitation learning and allow for parallel decoding, dynamic length change and good interpretability. Quantitative experiments demonstrate that LevOCR achieves state-of-the-art performance on standard benchmarks, and qualitative analyses verify the effectiveness and advantages of the proposed algorithm. Code is available at this https URL.
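To make the deletion/insertion refinement concrete, the sketch below computes the kind of "expert" edit actions that imitation learning in the Levenshtein Transformer style imitates. Given a current prediction and the ground truth, it determines which characters to delete and what to insert into each gap. This is a minimal, hypothetical illustration, not the authors' code. It assumes an alignment based on the longest common subsequence (LCS), which yields a minimal insert/delete edit script. The names `lcs_alignment`, `expert_actions` and `apply_refinement` are invented here. In LevOCR itself, these actions are predicted by the cross-modal transformer from visual and linguistic features rather than computed from the label, and this sketch does not model that prediction step.

```python
from typing import List, Tuple


def lcs_alignment(pred: str, target: str) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of one longest common subsequence of pred and target."""
    n, m = len(pred), len(target)
    # dp[i][j] = LCS length of the suffixes pred[i:] and target[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if pred[i] == target[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if pred[i] == target[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def expert_actions(pred: str, target: str) -> Tuple[List[bool], List[str]]:
    """Hypothetical oracle: a deletion mask over pred plus the text to insert per slot.

    Deleting every character outside the LCS and then inserting the missing
    target characters into the gaps is a minimal insert/delete edit script.
    """
    pairs = lcs_alignment(pred, target)
    kept = {i for i, _ in pairs}
    delete_mask = [i not in kept for i in range(len(pred))]
    # After deletion, the kept characters form the LCS, which leaves
    # len(pairs) + 1 slots: before, between and after the kept characters.
    insertions: List[str] = []
    prev_j = -1
    for _, j in pairs:
        insertions.append(target[prev_j + 1:j])
        prev_j = j
    insertions.append(target[prev_j + 1:])
    return delete_mask, insertions


def apply_refinement(pred: str, delete_mask: List[bool], insertions: List[str]) -> str:
    """One refinement step: drop flagged characters, then fill every slot."""
    kept = [c for c, drop in zip(pred, delete_mask) if not drop]
    assert len(insertions) == len(kept) + 1, "need one insertion slot per gap"
    out = [insertions[0]]
    for c, ins in zip(kept, insertions[1:]):
        out.append(c)
        out.append(ins)
    return "".join(out)


if __name__ == "__main__":
    pred, target = "Levenshtien", "Levenshtein"  # a typical transposition-style OCR error
    mask, ins = expert_actions(pred, target)
    print(mask, ins)
    print(apply_refinement(pred, mask, ins))
```

For example, `expert_actions("Levenshtien", "Levenshtein")` flags one character for deletion and schedules one single-character insertion. Applying both with `apply_refinement` reproduces the target in one step. Because the deletion mask and all slot insertions are decided independently per position, a model that predicts them can apply them in parallel, which corresponds to the parallel decoding and dynamic length change mentioned in the abstract.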

Comments:	Accepted by ECCV 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2209.03594 [cs.CV]
	(or arXiv:2209.03594v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2209.03594

Submission history

From: Cheng Da
[v1] Thu, 8 Sep 2022 06:46:50 UTC (979 KB)
[v2] Mon, 14 Nov 2022 06:09:39 UTC (979 KB)
