
Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection

Yuliang Liu, Lianwen Jin+


College of Electronic Information Engineering
South China University of Technology
[email protected]
arXiv:1703.01425v1 [cs.CV] 4 Mar 2017

Abstract

Detecting incidental scene text is a challenging task because of multi-orientation, perspective distortion, and variation of text size, color and scale. Retrospective research has only focused on using rectangular bounding boxes or horizontal sliding windows to localize text, which may result in redundant background noise, unnecessary overlap or even information loss. To address these issues, we propose a new Convolutional Neural Networks (CNNs) based method, named Deep Matching Prior Network (DMPNet), to detect text with a tighter quadrangle. First, we use quadrilateral sliding windows in several specific intermediate convolutional layers to roughly recall the text with a higher overlapping area, and then a shared Monte-Carlo method is proposed for fast and accurate computing of the polygonal areas. After that, we design a sequential protocol for relative regression which can exactly predict text with a compact quadrangle. Moreover, an auxiliary smooth Ln loss is also proposed for further regressing the position of text, which has better overall performance than L2 loss and smooth L1 loss in terms of robustness and stability. The effectiveness of our approach is evaluated on a public word-level, multi-oriented scene text database, ICDAR 2015 Robust Reading Competition Challenge 4 "Incidental scene text localization". The performance of our method is evaluated by F-measure and found to be 70.64%, outperforming the existing state-of-the-art method with F-measure 63.76%.

(a) Rectangular bounding box causes unnecessary overlap. (c) Marginal text cannot be exactly localized with a rectangle. (d) Rectangular bounding box brings redundant noise.
Figure 1. Comparison of quadrilateral bounding box and rectangular bounding box for localizing texts.

1. Introduction

Scene text detection is an important prerequisite [32, 31, 37, 1, 34] for many content-based applications, e.g., multilingual translation, blind navigation and automotive assistance. Especially, the recognition stage always stands in need of localizing scene text in advance; thus it is a significant requirement for detection methods that can tightly and robustly localize scene text.

Camera-captured scene text is often of low quality; these texts may have multiple orientations, perspective distortions, and variation of text size, color or scale [40], which makes detection a very challenging task [39]. In the past few years, various existing methods have successfully been used for detecting horizontal or near-horizontal texts [2, 4, 23, 11, 10]. However, due to the horizontal rectangular constraints, multi-oriented text is hard to recall in practice, e.g. the low accuracies reported in ICDAR 2015 Competition Challenge 4 "Incidental scene text localization" [14].

Recently, numerous techniques [35, 36, 13, 39] have been devised for multi-oriented text detection; these methods used rotated rectangles to localize oriented text. However, Ye and Doermann [34] indicated that because of character distortion, the boundary of text may lose its rectangular shape, and the rectangular constraints may result in redundant background noise, unnecessary overlap or even information loss when detecting distorted incidental scene text, as shown in Figure 1. It can be seen from the figure that rectangle-based methods must face three kinds of circumstances: i) redundant information may reduce the reliability of the detected confidence [18] and make subsequent recognition harder [40]; ii) marginal text may not be localized completely; iii) when using non-maximum suppression [21], unnecessary overlap may eliminate true predictions.

To address these issues, in this paper we propose a new Convolutional Neural Networks (CNNs) based method, named Deep Matching Prior Network (DMPNet), toward tighter text detection. To the best of our knowledge, this is the first attempt to detect text with a quadrangle. Basically, our method consists of two steps: roughly recalling text and finely adjusting the predicted bounding box. First, based on the priori knowledge of textual intrinsic shape, we design different kinds of quadrilateral sliding windows in specific intermediate convolutional layers to roughly recall text by comparing the overlapping area with a predefined threshold. During this rough procedure, because numerous polygonal overlapping areas between the sliding window (SW) and ground truth (GT) need to be computed, we design a shared Monte-Carlo method to solve this issue, which is qualitatively proved more accurate than the previous computational method [30]. After roughly recalling text, those SWs with a higher overlapping area would be finely adjusted for better localization; different from existing methods [2, 4, 23, 11, 10, 35, 36, 39] that predict text with rectangles, our method can use quadrangles for tighter localization of scene text, owing to the sequential protocol we propose and the relative regression we use. Moreover, a new smooth Ln loss is also proposed for further regressing the position of text, which has better overall performance than L2 loss and smooth L1 loss in terms of robustness and stability. Experiments on the public word-level and multi-oriented dataset, ICDAR 2015 Robust Reading Competition Challenge 4 "Incidental scene text localization", demonstrate that our method outperforms previous state-of-the-art methods [33] in terms of F-measure.

We summarize our contributions as follows:

• We are the first to put forward prior quadrilateral sliding windows, which significantly improve the recall rate.

• We propose a sequential protocol for uniquely determining the order of the 4 points of an arbitrary plane convex quadrangle, which enables our method to use relative regression to predict quadrilateral bounding boxes.

• The proposed shared Monte-Carlo computational method can fast and accurately compute the polygonal overlapping area.

• The proposed smooth Ln loss has better overall performance than L2 loss and smooth L1 loss in terms of robustness and stability.

• Our approach shows state-of-the-art performance in detecting incidental scene text.

2. Related work

Reading text in the wild has been extensively studied in recent years because scene text conveys numerous valuable information that can be used in many intelligent applications, e.g. autonomous vehicles and blind navigation. Unlike generic objects, scene text has unconstrained lengths, shapes and especially perspective distortions, which make it hard for text detection to simply adopt techniques from other domains. Therefore, the mainstream of text detection methods has always focused on the structure of individual characters and the relationships between characters [40], e.g. connected component based methods [38, 27, 22]. These methods often use the stroke width transform (SWT) [9] or maximally stable extremal regions (MSER) [20, 24] to first extract character candidates, and then use a series of subsequent steps to eliminate non-text noise for exactly connecting the candidates. Although accurate, such methods are somewhat limited in preserving various true characters in practice [3].

Another mainstream method is based on sliding windows [2, 15, 8, 17], which shift a window over each position of an image at multiple scales to detect text. Although this method can effectively recall text, the classification of the locations can be sensitive to false positives because the sliding windows often carry various background noise.

Recently, Convolutional Neural Networks [28, 6, 26, 19, 25] have been proved powerful enough to suppress false positives, which enlightened researchers in the area of scene text detection; in [10], Huang et al. integrated MSER and CNN to significantly enhance performance over conventional methods; Zhang et al. utilized a Fully Convolutional Network [39] to efficiently generate a pixel-wise text/non-text salient map, which achieves state-of-the-art performance on public datasets. It is worth mentioning that the common ground of these successful methods is to utilize textual intrinsic information for training the CNN. Inspired by this promising idea, instead of using constrained rectangles, we design numerous quadrilateral sliding windows based on the textual intrinsic shape, which significantly improves the recall rate in practice.

3. Proposed methodology

This section presents details of the Deep Matching Prior Network (DMPNet). It includes the key contributions that make our method reliable and accurate for text localization: firstly, roughly recalling text with quadrilateral sliding windows; then, using a shared Monte-Carlo method for fast
(a) Comparison of recalling scene text. (b) Horizontal sliding windows. (c) Proposed quadrilateral sliding windows.
Figure 2. Comparison between horizontal sliding windows and quadrilateral sliding windows. (a): The black bounding box represents ground truth; red represents our method; blue represents the horizontal sliding window. It can be seen that a quadrilateral window can recall text more easily than a rectangular window, with a higher overlapping area. (b): Horizontal sliding windows used in [19]. (c): Proposed quadrilateral sliding windows. Different quadrilateral sliding windows are distinguished with different colors.

and accurate computing of polygonal areas; and finally, finely localizing text with a quadrangle and designing a smooth Ln loss for moderately adjusting the predicted bounding box.

3.1. Roughly recall text with quadrilateral sliding window

Previous approaches [19, 26] have successfully adopted sliding windows in the intermediate convolutional layers to roughly recall text. Although these methods [26] can accurately learn region proposals based on the sliding windows, they have been too slow for real-time or near real-time applications. To raise the speed, Liu [19] simply evaluates a small set of prior windows of different aspect ratios at each location in several feature maps with different scales, which can successfully detect both small and big objects. However, horizontal sliding windows are often hard-pressed to recall multi-oriented scene text in our practice. Inspired by the recent successful methods [10, 39] that integrate textual features and CNNs, we put forward numerous quadrilateral sliding windows based on the textual intrinsic shape to roughly recall text.

During this rough procedure, an overlapping threshold is used to judge whether a sliding window is positive or negative. If a sliding window is positive, it will be used to finely localize the text. Basically, a small threshold may bring a lot of background noise, reducing the precision, while a large threshold may make text harder to recall. But if we use quadrilateral sliding windows, the overlapping area between sliding window and ground truth can be large enough to reach a higher threshold, which is beneficial to improving both the recall rate and the precision, as shown in Figure 2. As the figure presents, we reserve the horizontal sliding windows while simultaneously designing several quadrangles inside them based on the prior knowledge of textual intrinsic shape: a) two rectangles at 45 degrees are added inside the square; b) two long parallelograms are added inside the long rectangle; c) two tall parallelograms are added inside the tall rectangle.

With these flexible sliding windows, the rough bounding boxes become more accurate, and thus the subsequent fine procedure can more easily localize text tightly. In addition, because of less background noise, the confidence of these quadrilateral sliding windows can be more reliable in practice, which can be used to eliminate false positives.

3.1.1 Shared Monte-Carlo method

As mentioned earlier, for each ground truth, we need to compute its overlapping area with every quadrilateral sliding window. However, the previous method [30] can only compute rectangular areas, with unsatisfactory computational accuracy; thus we propose a shared Monte-Carlo method that has both high speed and high accuracy when computing the polygonal area. Our method consists of two steps.

a) First, we uniformly sample 10,000 points in the circumscribed rectangle of the ground truth. The area of the ground truth (SGT) can be computed by calculating the ratio of overlapping points to total points, multiplied by the area of the circumscribed rectangle. In this step, all points inside the ground truth are reserved for sharing computation.

b) Second, if the circumscribed rectangle of a sliding window and the circumscribed rectangle of a ground truth do not intersect, the overlapping area is considered zero and no further computation is needed. If the overlapping area is not zero, we use the same sampling strategy to compute the area of the sliding window (SSW) and then calculate how many of the reserved points from the first step fall inside the sliding window. The ratio of inside points to total points, multiplied by the area of the circumscribed rectangle, is the overlapping area. Specially, this step is suitable for GPU parallelization, because we can make each thread responsible for calculating one sliding window against the specified ground truth, and thus we can handle thousands of sliding windows in a short time.

Note that we use a method proposed in [12] to judge whether a point is inside a polygon; this method is also known as the crossing number algorithm or the even-odd rule algorithm [5]. The comparison between the previous method and our algorithm is shown in Figure 3; our method shows satisfactory performance for computing polygonal areas in practice.

Figure 3. Comparison between previous method and our method in computing overlapping area.
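The two-step sharing scheme above can be sketched in a few lines of Python. This is a minimal single-threaded illustration under our reading of the procedure (function names are ours; the paper additionally assigns one GPU thread per sliding window), with the inside test implemented via the even-odd rule:

```python
import random

def point_in_quad(pt, quad):
    """Even-odd rule: toggle on each edge crossed by a rightward ray from pt."""
    x, y = pt
    inside = False
    n = len(quad)
    for i in range(n):
        (x1, y1), (x2, y2) = quad[i], quad[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

def overlap_area(gt, sw, n_samples=10000, seed=0):
    """Estimate |GT intersect SW| by sampling in GT's circumscribed rectangle."""
    xs = [p[0] for p in gt]
    ys = [p[1] for p in gt]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    rect_area = (xmax - xmin) * (ymax - ymin)
    rng = random.Random(seed)
    pts = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
           for _ in range(n_samples)]
    # step a): points inside GT are reserved once, so every window shares them
    reserved = [p for p in pts if point_in_quad(p, gt)]
    # step b): fraction of reserved points that also fall inside SW
    hits = sum(1 for p in reserved if point_in_quad(p, sw))
    return hits / n_samples * rect_area
```

Because the points inside the ground truth are reserved once, step b) only re-tests those points against each candidate window instead of resampling, which is what makes the method "shared".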

3.2. Finely localize text with quadrangle

The fine procedure focuses on using those sliding windows with a higher overlapping area to tightly localize text. Unlike a horizontal rectangle, which can be determined by two diagonal points, we need to predict the coordinates of four points to localize a quadrangle. However, simply using the 4 points to shape a quadrangle is prone to self-contradiction, because the subjective annotation may make the network ambiguous in deciding which is the first point. Therefore, before training, it is essential to order the 4 points in advance.

Sequential protocol of coordinates. The proposed protocol can be used to determine the sequence of the four points of a plane convex quadrangle, and contains four steps, as shown in Figure 4. First, we determine the first point as the one with minimum value x. If two points simultaneously have the minimum x, then we choose the point with the smaller value y as the first point. Second, we connect the first point to the other three points; the third point is the one found on the line with the middle slope. The second and the fourth points lie on opposite sides (defined as the "bigger" side and "smaller" side) of this middle line. Here, we write the middle line as Lm: ax + by + c = 0, and we define an undetermined point P(xp, yp). If Lm(P) > 0, we assume P is on the "bigger" side; if Lm(P) < 0, P is assumed to be on the "smaller" side. Based on this assumption, the point on the "bigger" side is assigned as the second point, and the last point is regarded as the fourth point. The last step is to compare the slopes of the two diagonals (line13 and line24). From the line with the bigger slope, we choose the point with smaller x as the new first point. Specially, if the bigger slope is infinite, the point with smaller y is chosen as the first point. Similarly, we find the third point, and then the second and fourth points can be determined again. After finishing these four steps, the final sequence of the four points of a given convex quadrangle can be uniquely determined.

Figure 4. Procedure of uniquely determining the sequence of four points from a plane convex quadrangle.

Based on the sequential protocol, DMPNet can clearly learn and regress the coordinates of each point by computing its relative position to the central point. Different from [26], which regresses two coordinates and two lengths for a rectangular prediction, our regressive method predicts two coordinates and eight lengths for a quadrilateral detection. For each ground truth, the coordinates of the four points are reformatted to (x, y, w1, h1, w2, h2, w3, h3, w4, h4), where x, y are the central coordinates of the minimum circumscribed horizontal rectangle, and wi, hi are the relative position of the i-th point (i = {1, 2, 3, 4}) to the central point. As Figure 5 shows, the coordinates of the four points are (x1, y1, x2, y2, x3, y3, x4, y4) = (x + w1, y + h1, x + w2, y + h2, x + w3, y + h3, x + w4, y + h4). Note that wi and hi can be negative. Actually, eight coordinates are enough to determine the position of a quadrangle; the reason why we use ten coordinates is that we can thereby avoid regressing 8 absolute coordinates, which do not contain relative information and are more difficult to learn in practice [6]. Inspired by [26], we also use Lreg(pi; p∗i) = R(pi − p∗i) for the multi-task loss, where R is our proposed loss function (smooth Ln), described in section 3.3. p∗ = (p∗x, p∗y, p∗w1, p∗h1, p∗w2, p∗h2, p∗w3, p∗h3, p∗w4, p∗h4) represents the ten parameterized coordinates of the predicted bounding box, and p = (px, py, pw1, ph1, pw2, ph2, pw3, ph3, pw4, ph4) represents the ground truth.

Figure 5. The position of each point of the quadrangle can be calculated from the central point and the relative lengths.

From the given coordinates, we can calculate the minimum x (xmin) and maximum x (xmax) of the circumscribed rectangle, and hence the width of the circumscribed horizontal rectangle wchr = xmax − xmin. Similarly, we can get the height hchr = ymax − ymin. We adopt the parameterizations of the 10 coordinates as follows:

dx = (p∗x − px)/wchr,  dy = (p∗y − py)/hchr,
dwi = (p∗wi − pwi)/wchr,  dhi = (p∗hi − phi)/hchr,  for i = 1, 2, 3, 4.

This can be thought of as fine regression from a quadrilateral sliding window to a nearby ground-truth box.

3.3. Smooth Ln loss

Different from [19, 26], our approach uses a proposed smooth Ln loss instead of smooth L1 loss to further localize scene text. Smooth L1 loss is less sensitive to outliers than the L2 loss used in R-CNN [7]; however, this loss is not stable enough in adjusting to the data, which means the regression line may jump a large amount for a small adjustment, or only a little modification is made for a big adjustment. For the proposed smooth Ln loss, the regressive parameters are continuous functions of the data, which means that for any small adjustment of a data point, the regression line will always move only slightly, improving the precision in localizing small text. For a bigger adjustment, the regression always moves a moderate step based on the smooth Ln loss, which can accelerate the convergence of the training procedure in practice. As mentioned in section 3.2, the regression loss, Lreg, is defined over a tuple of true bounding-box regression targets p∗ and a predicted tuple p for the text class. The smooth L1 loss proposed in [6] is given by:

Lreg(p; p∗) = Σ_{i∈S} smoothL1(pi, p∗i),   (1)

in which,

smoothL1(x) = 0.5x²  if |x| < 1;  |x| − 0.5  otherwise.   (2)

The x in the function represents the error between the predicted value and the ground truth (x = w · (p − p∗)). The deviation (derivative) function of smoothL1 is:

deviationL1(x) = x  if |x| < 1;  sign(x)  otherwise.   (3)

As equation 3 shows, the deviation function of smooth L1 is piecewise, while the smooth Ln loss is a continuously differentiable function. The proposed smooth Ln loss is given by:

Lreg(p; p∗) = Σ_{i∈S} smoothLn(pi, p∗i),   (4)

in which,

smoothLn(x) = (|x| + 1) ln(|x| + 1) − |x|,   (5)

and the deviation function of smoothLn is:

deviationLn(x) = sign(x) · ln(sign(x) · x + 1).   (6)
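Equations (5) and (6), together with the smooth L1 baseline of equation (2), can be checked numerically with a short sketch (function names are ours):

```python
import math

def smooth_l1(x):
    # Eq. (2): quadratic near zero, linear for |x| >= 1
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def smooth_ln(x):
    # Eq. (5): (|x| + 1) ln(|x| + 1) - |x|
    a = abs(x)
    return (a + 1.0) * math.log(a + 1.0) - a

def smooth_ln_deviation(x):
    # Eq. (6): sign(x) * ln(|x| + 1); continuous for all x, unlike eq. (3)
    s = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
    return s * math.log(abs(x) + 1.0)
```

The deviation stays at or below |x| everywhere, which is the |x| ≥ |deviationLn(x)| property used to argue that the loss is less sensitive to outliers than L2.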
(a) forward loss functions. (b) backward deviation functions.
Figure 6. Visualization of differences among three loss functions (L2, smooth L1 and smooth Ln). Here, the L2 function uses the same coefficient 0.5 as the smooth L1 loss.

property     L2 loss   smooth L1 loss   smooth Ln loss
Robustness   Worst     Best             Good
Stability    Good      Worst            Best

Table 1. Different properties of different loss functions. Robustness represents the ability to resist outliers in the data, and stability represents the capability of adjusting the regressive step.

Equations 5 and 6 are both continuous functions given by a single expression. For equation 6, it is easy to prove that |x| ≥ |deviationLn(x)|, which means the smooth Ln loss is also less sensitive to outliers than the L2 loss used in R-CNN [7]. An intuitive representation of the differences among the three loss functions is shown in Figure 6. The comparisons of their properties in terms of robustness and stability are summarized in Table 1. The results demonstrate that the smooth Ln loss promises better text localization and relatively tighter bounding boxes around the texts.

4. Experiments

Our testing environment is a desktop running Ubuntu 14.04 64-bit with a TitanX GPU. In this section, we quantitatively evaluate our method on the public dataset ICDAR 2015 Competition Challenge 4 "Incidental Scene Text" [14]; as far as we know, this is the only dataset in which texts are both word-level and multi-oriented. All results of our method are evaluated by its online evaluation system, which calculates the recall rate, precision and F-measure to rank the submitted methods. The general criteria for these three indexes can be explained as below:

• Recall rate evaluates the ability to find text.

• Precision evaluates the reliability of the predicted bounding boxes.

• F-measure is the harmonic mean (Hmean) of recall rate and precision, and is always used for ranking the methods.

Particularly, we simply use the official 1000 training images as our training set without any extra data augmentation, but we have modified some rectangular labels to quadrilateral labels to adapt them to our method.

Dataset - ICDAR 2015 Competition Challenge 4 "Incidental Scene Text" [14]. Different from the previous ICDAR competitions, in which the text is well-captured, horizontal, and typically centered in images, this dataset includes 1000 training images and 500 testing incidental scene images, where text may appear in any orientation and any location, with small size or low resolution, and the annotations of all bounding boxes are marked at the word level.

Baseline network. The main structure of DMPNet is based on the VGG-16 model [28]. Similar to Single Shot Detector [19], we use the same intermediate convolutional layers to apply quadrilateral sliding windows. All input images are resized to 800x800 to preserve tiny texts.

Experimental results. For comprehensively evaluating our algorithm, we collect and list the competition results [14] in Table 2. The previous best method on this dataset, proposed by Yao et al. [33], achieved an F-measure of 63.76%, while our approach obtains 70.64%. The precision of the two methods is comparable, but the recall rate of our method has greatly increased, which is mainly due to the quadrilateral sliding windows described in section 3.1.

Figure 7 shows several detection results taken from the test set of ICDAR 2015 Challenge 4. DMPNet can robustly localize all kinds of scene text with less background noise. However, due to the complexity of incidental scenes, some false detections still exist, and our method may fail to recall some inconspicuous text, as shown in the last column of Figure 7.
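As a quick sanity check on the Hmean column of Table 2, the harmonic mean of the reported recall and precision reproduces the reported F-measure (function name is ours):

```python
def hmean(recall, precision):
    """F-measure: harmonic mean of recall and precision."""
    return 2.0 * recall * precision / (recall + precision)

# DMPNet's reported recall and precision on ICDAR 2015 Challenge 4
print(round(hmean(68.22, 73.23), 2))  # 70.64
```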
Table 2. Evaluation on the ICDAR 2015 competition on robust reading challenge 4 "Incidental Scene Text" localization.

Algorithm                    Recall (%)   Precision (%)   Hmean (%)
Baseline (SSD-VGGNet)        25.48        63.25           36.33
Proposed DMPNet              68.22        73.23           70.64
Megvii-Image++ [33]          56.96        72.40           63.76
CTPN [29]                    51.56        74.22           60.85
MCLAB FCN [14]               43.09        70.81           53.58
StardVision-2 [14]           36.74        77.46           49.84
StardVision-1 [14]           46.27        53.39           49.57
CASIA USTB-Cascaded [14]     39.53        61.68           48.18
NJU Text [14]                35.82        72.73           48.00
AJOU [16]                    46.94        47.26           47.10
HUST MCLAB [14]              37.79        44.00           40.66
Deep2Text-MO [36]            32.11        49.59           38.98
CNN Proposal [14]            34.42        34.71           34.57
TextCatcher-2 [14]           34.81        24.91           29.04

Figure 7. Experimental results of samples on ICDAR 2015 Challenge 4, including multi-scale and multi-language word-level text. Our method can tightly localize text with less background information, as shown in the first two columns. The top three images of the last column are failure cases of recalling by the proposed method. Specially, some labels are missed in some images, which may reduce our accuracy, as with the red bounding box in the fourth image of the last column.

5. Conclusion and future work

In this paper, we have proposed a CNN-based method, named Deep Matching Prior Network (DMPNet), that can effectively reduce background interference. The DMPNet is the first attempt to adopt quadrilateral sliding windows, which are designed based on the priori knowledge of textual intrinsic shape, to roughly recall text. We then use a proposed sequential protocol and a relative regression method to finely localize text without self-contradiction. Due to the requirement of computing numerous polygonal overlapping areas in the rough procedure, we proposed a shared Monte-Carlo method for fast and accurate calculation. In addition, a new smooth Ln loss is used for further adjusting the prediction, which shows better overall performance than L2 loss and smooth L1 loss in terms of robustness and stability. Experiments on the well-known ICDAR 2015 robust reading challenge 4 dataset demonstrate that DMPNet can achieve state-of-the-art performance in detecting incidental scene text. In the following, we discuss an issue related to our approach and briefly describe our future work.

Ground truth of the text. Texts in camera-captured images often exhibit perspective distortion. However, rectangular constraints on labeled data may bring a lot of background noise, and information may be lost when labels fail to contain all of the marginal text. As far as we know, ICDAR 2015 Challenge 4 is the first dataset to use quadrilateral labeling, and our method proves the effectiveness of utilizing quadrilateral labeling. Thus, quadrilateral labeling for scene text may be more reasonable.

Future work. The high recall rate of the DMPNet mainly depends on numerous prior-designed quadrilateral sliding windows. Although our method has been proved effective, the man-made shapes of the sliding windows may not be optimal. In the future, we will explore using shape-adaptive sliding windows toward tighter scene text detection.

References

[1] A. Bissacco, M. Cummins, Y. Netzer, and H. Neven. PhotoOCR: Reading text in uncontrolled conditions. In IEEE International Conference on Computer Vision, pages 785–792, 2013.
[2] X. Chen and A. L. Yuille. Detecting and reading text in natural scenes. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 366–373, 2004.
[3] H. Cho, M. Sung, and B. Jun. Canny text detector: Fast and robust scene text localization algorithm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3566–3573, 2016.
[4] B. Epshtein, E. Ofek, and Y. Wexler. Detecting text in natural scenes with stroke width transform. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2963–2970. IEEE, 2010.
[5] M. Galetzka and P. O. Glauner. A correct even-odd algorithm for the point-in-polygon (PIP) problem for complex polygons. CVPR, 2012.
[6] R. Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015.
[7] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
[8] S. M. Hanif and L. Prevost. Text detection and localization in complex scene images using constrained adaboost algorithm. In 2009 10th International Conference on Document Analysis and Recognition, pages 1–5. IEEE, 2009.
[9] W. Huang, Z. Lin, J. Yang, and J. Wang. Text localization in natural images using stroke feature transform and text covariance descriptors. In International Conference on Computer Vision, pages 1241–1248, 2013.
[10] W. Huang, Y. Qiao, and X. Tang. Robust scene text detection with convolution neural network induced MSER trees. In ECCV, pages 497–511, 2014.
[11] M. Jaderberg, A. Vedaldi, and A. Zisserman. Deep features for text spotting. In European Conference on Computer Vision, pages 512–528. Springer, 2014.
[12] K. Hormann and A. Agathos. The point in polygon problem for arbitrary polygons. Computational Geometry, 20(3):131–144, 2001.
[13] L. Kang, Y. Li, and D. Doermann. Orientation robust text line detection in natural images. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 4034–4041. IEEE, 2014.
[14] D. Karatzas, S. Lu, F. Shafait, S. Uchida, E. Valveny, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, and M. Iwamura. ICDAR 2015 competition on robust reading. In International Conference on Document Analysis and Recognition, 2015.
[15] K. I. Kim, K. Jung, and H. K. Jin. Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Transactions on Pattern Analysis & Machine Intelligence, 25(12):1631–1639, 2003.
[16] H. I. Koo and D. H. Kim. Scene text detection via connected component clustering and nontext filtering. IEEE Transactions on Image Processing, 22(6):2296–2305, 2013.
[17] J.-J. Lee, P.-H. Lee, S.-W. Lee, A. L. Yuille, and C. Koch. Adaboost for text detection in natural scene. In ICDAR, pages 429–434, 2011.
[18] M. Li and I. K. Sethi. Confidence-based active learning. IEEE Transactions on Pattern Analysis & Machine Intelligence, 28(8):1251–61, 2006.
[25] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640, 2015.
[26] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, pages 1–1, 2016.
[27] C. Shi, C. Wang, B. Xiao, Y. Zhang, S. Gao, and Z. Zhang. Scene text recognition using part-based tree-structured character detection. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2961–2968, 2013.
[28] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[29] Z. Tian, W. Huang, T. He, P. He, and Y. Qiao. Detecting text in natural image with connectionist text proposal network. Springer International Publishing, 2016.
[30] Z. Tu, Y. Ma, W. Liu, X. Bai, and C. Yao. Detecting texts of arbitrary orientations in natural images. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1083–1090, 2012.
[31] J. J. Weinman, Z. Butler, D. Knoll, and J. Feild. Toward integrated scene text reading. IEEE Transactions on Software Engineering, 36(2):375–87, 2014.
[32] J. J. Weinman, E. Learned-Miller, and A. R. Hanson. Scene text recognition using similarity and a lexicon with sparse belief propagation. IEEE Transactions on Pattern Analysis & Machine Intelligence, 31(10):1733–46, 2009.
[33] C. Yao, J. Wu, X. Zhou, C. Zhang, S. Zhou, Z. Cao, and Q. Yin. Incidental scene text understanding: Recent progresses on ICDAR 2015 robust reading competition challenge 4. PAMI, 2015.
[34] Q. Ye and D. Doermann. Text detection and recognition in imagery: A survey. IEEE Transactions on Pattern Analysis & Machine Intelligence, 37(7):1480–1500, 2015.
[35] C. Yi and Y. Tian. Text string detection from natural scenes by structure-based partition and grouping. IEEE Transactions on Image Processing, 20(9):2594–605, 2011.
[19] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed. [36] X. C. Yin, W. Y. Pei, J. Zhang, and H. W. Hao. Multi-
Ssd: Single shot multibox detector. arXiv preprint orientation scene text detection with adaptive clustering.
arXiv:1512.02325, 2015. 2, 3, 5, 6 IEEE Transactions on Pattern Analysis & Machine Intelli-
[20] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide- gence, 37(9):1930–7, 2015. 1, 2, 7
baseline stereo from maximally stable extremal regions. Im- [37] X. C. Yin, X. Yin, K. Huang, and H. W. Hao. Robust text de-
age & Vision Computing, 22(10):761–767, 2004. 2 tection in natural scene images. IEEE Transactions on Pat-
[21] A. Neubeck and L. V. Gool. Efficient non-maximum sup- tern Analysis & Machine Intelligence, 36(5):970–83, 2014.
pression. In International Conference on Pattern Recogni- 1
tion, pages 850–855, 2006. 2 [38] A. Zamberletti, L. Noce, and I. Gallo. Text localization based
[22] L. Neumann and J. Matas. Real-time scene text localization on fast feature pyramids and multi-resolution maximally sta-
and recognition. In IEEE Conference on Computer Vision ble extremal regions. In Asian Conference on Computer Vi-
and Pattern Recognition, pages 3538–3545, 2012. 2 sion, pages 91–105. Springer, 2014. 2
[23] L. Neumann and J. Matas. Scene text localization and recog- [39] Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai.
nition with oriented stroke detection. In IEEE International Multi-oriented text detection with fully convolutional net-
Conference on Computer Vision, pages 97–104, 2013. 1, 2 works. arXiv preprint arXiv:1604.04018, 2016. 1, 2, 3
[24] D. Nistr and H. Stewnius. Linear time maximally stable ex- [40] Y. Zhu, C. Yao, and X. Bai. Scene text detection and recog-
tremal regions. In Computer Vision - ECCV 2008, European nition: recent advances and future trends. Frontiers of Com-
Conference on Computer Vision, Marseille, France, October puter Science, 10(1):19–36, 2016. 1, 2
12-18, 2008, Proceedings, pages 183–196, 2008. 2