A Machine-Learning Approach For Analyzing Document
Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
ARTICLE INFO

Article history:
Received 16 April 2007
Received in revised form 3 March 2008
Accepted 12 March 2008

Keywords: Binary decision; Document layout analysis; Reading order; Support vector machine; Taboo box; Textline; Text region

ABSTRACT

The purpose of document layout analysis is to locate textlines and text regions in document images, mostly via a series of split-or-merge operations. Before applying such an operation, however, it is necessary to examine the context to decide whether the place chosen for the operation is appropriate. We thus view document layout analysis as a matter of solving a series of binary decision problems, such as whether to apply, or not to apply, a split-or-merge operation to a chosen place. To solve these problems, we use support vector machines to learn whether or not to apply the previously mentioned operations from training documents in which all textlines and text regions have been located and their identities labeled. The proposed approach is very effective for analyzing documents that allow both horizontal and vertical reading orders. When applied to a test data set composed of eight types of layout structure, the approach's accuracy rates for identifying textlines and text regions are 98.83% and 96.72%, respectively.

© 2008 Elsevier Ltd. All rights reserved.
1. Introduction top-down approach, one example is the recursive X--Y cut method [1]
that relies on projection profiles to cut a textual region into several
Document layout analysis involves operations that divide a docu- sub-regions; however, it may fail in text regions that lack a fully
ment into textlines composed of homogeneous characters, and into extended horizontal or vertical cut. Similarly, methods that exploit
text regions composed of homogeneous textlines. Analyzing the lay- maximal white rectangles [2] or white streams [3] may also fail to
out structure of some documents, such as Chinese and Japanese doc- find large enough white margins. For this reason, Lee and Ryu [4]
uments, is particularly challenging because they allow two reading proposed a multi-scale analysis method that examines a document in
orders. The reading order is defined by the order of characters in a various scales. All of the above methods were designed for Western
textline. In Western documents, the reading order is always hori- documents only.
zontal, while in Chinese and Japanese documents, it can be horizon- In the bottom-up approach, examples are the document spectrum
tal or vertical. Sometimes, such documents may contain both types method [5], the minimal-cost spanning tree method [6], and the
of textlines. In these cases, the complexity of layout analysis is of- component-based algorithm [7]. These methods construct textlines
ten increased, since it can be conducted in two possible directions, based on the distances between connected components (referred to
whereas in documents with a single reading order there is no such as components hereafter). Many bottom-up methods were also de-
freedom. signed for documents with a single reading order. Xi et al. [8], on the
other hand, applied the spanning tree method to documents with
two reading orders. They used the spanning tree as a pre-classifier
1.1. Background to gather components into sub-graphs. In this approach, one cru-
cial step involves cutting away a vertical sub-graph that has been
The various layout analysis methods proposed thus far can be wrongly merged into a horizontal textline, or vice versa. Relying on
categorized as top-down, bottom-up, or hybrid approaches. In the some heuristics to solve this problem, the authors achieved an ac-
curacy rate of 87.2% in the analysis of 25 documents. Chen et al. [9]
developed a rule-based bottom-up method for documents with two
∗ Corresponding author. Tel.: +886 02 2788 3799; fax: +886 02 2782 4814. reading orders. The rules for merging components into textlines are
E-mail addresses: [email protected] (C.-C. Wu), based on the concept of "nearest-neighbor connect-strength'', which
[email protected] (C.-H. Chou), [email protected] (F. Chang). varies according to the size similarity, distance, and offset of the
components. The accuracy rate reported for this approach was 83.2% step removes textlines grown under a false reading-order assump-
using an un-specified number of documents. tion, and the second eliminates overlap between the remaining
Okamoto and Takahashi [10] proposed a hybrid segmentation textlines.
method for documents with two reading orders. It partitions a doc- The advantage of dividing the problem into several sub-problems
ument into blocks based on field separators and white streams, and is that they are simple and can be solved one at a time. If we tried to
then merges the components of each block into textlines. Unfortu- solve all the sub-problems simultaneously, we would have to deal
nately, the authors did not specify the method's parameters or detail with mutually conflicting examples, which would complicate the
its performance. learning process. Thus, when we grow textlines initially, we do not
The methods of Liu et al. [11] and Chang et al. [12] are adaptive in worry whether they are under-extended, over-extended, or whether
the sense that split-or-merge operations are performed using esti- they should be there at all. These issues can be dealt with at a later
mated parameter values. Liu et al.'s approach applies the operations stage when we have more information.
in both top-down and bottom-up directions, whereas Chang et al.'s All our sub-problems involve a decision about whether to per-
method only conducts split operations when low-level objects have form a certain split-or-merge operation. For this reason, we employ
been merged into high-level ones. Although the latter method did support vector machines (SVM) [18,19] to construct the decision
not have a unified approach for estimating its parameters, it achieved function. In the learning phase, we prepare both positive and nega-
a rather good performance on its data set. A performance report was tive examples. The former are examples to which we apply a certain
not provided for the former method. operation P, and the latter are examples to which we do not apply
Since all methods split or merge objects based on certain param- P. The learning procedure produces a decision function, or classifier,
eters, parameter estimation is crucial in layout analysis. Machine- which is used in the layout analysis to decide whether or not to
learning techniques can be especially useful in this regard. Even so, conduct P.
very few of the methods proposed thus far have adopted a learn-
ing approach. Learning methods have been used to separate tex- 1.3. The steps in our procedure
tual areas from graphical areas [13], and to classify text regions
as headline, main text, etc. [14,15]. In surveys of learning methods Our procedure for layout analysis involves four steps, which we
used in document image analysis, Marinai et al. [16] consider neural describe briefly below.
network methods for document analysis and recognition, while Liu In Step I, we form both horizontal and vertical textlines out of
et al. [17] consider learning methods for handwritten digit recog- certain components. To avoid forming many textlines that are es-
nition. To the best of our knowledge, the only work that applies a sentially the same, we impose restrictions on the growth of new
machine-learning technique to layout analysis is that of Laven et textlines. Following this operation, we apply another operation to
al. [15]. They derived a logistic regression classifier from a train- find similar textlines and merge them.
ing data set and used it to analyze journal formats. The error rate In Step II, we remove textlines with false reading orders. We do
was 56% for one test data set and 25.7% for another test data set so by first grouping those textlines that are compatible in terms of
(cf. [15, Table 2]). size, but differ in reading order. Then, based on certain features that
are extracted from each group, we identify and remove the textlines
1.2. Our contribution in each group that were grown according to a wrong reading-order
assumption.
In this paper, our objective is to provide a method for analyz- In Step III, we remove areas of overlap between the remaining
ing documents composed of two reading orders. To resolve such a textlines. Two types of textline overlap are possible: orthogonal and
hard problem, we feel that we need to go beyond the rule-based parallel. For both types, we have to assign the overlapping zone to
approach and take advantage of a more structured procedure that one textline only.
can learn from examples. Of course, learning by itself does not In Step IV, we form text regions out of related textlines. Then,
guarantee a good solution. The success of learning depends on in another step, we identify the multiple column structure in the
a good mechanism and also on the problem it has to solve. We related textlines.
divide our problem into several sub-problems, which arise natu- The flowchart for the above procedure is shown in Fig. 1.
rally when we employ a bottom-up procedure. At the beginning The remainder of this paper is organized as follows. Sections 2,
of the procedure, we grow components into textlines; and at the 3, 4, and 5 are devoted the four steps, respectively. Section 6 con-
end, we grow textlines into text regions. Two intermediate steps tains the experimental results. Finally, in Section 7, we present our
are also necessary to resolve conflicts between textlines. The first conclusions.
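The binary-decision formulation above can be sketched in a few lines. The sketch below uses a simple perceptron as a stand-in for the SVM learner (only the interface matters here: a signed decision value whose non-negative side means "apply operation P"); the one-dimensional "similarity" features and the training values are synthetic, not the paper's actual features:

```python
def train_decision_function(X, y, lr=0.1, epochs=200):
    """Train a linear decision function f(x) = w.x + b from positive (+1) and
    negative (-1) examples. A perceptron stands in for the SVM used in the
    paper; the sign convention of the decision value is the same."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        updated = False
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified example: nudge the boundary
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                updated = True
        if not updated:          # converged on separable data
            break
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b

# Synthetic 1-D "similarity" features: apply operation P (+1) when the value
# is high, do not apply it (-1) when the value is low.
X = [[0.95], [0.90], [0.88], [0.92], [0.15], [0.20], [0.10], [0.25]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
decide = train_decision_function(X, y)
print(decide([0.93]) >= 0)  # a clearly "apply" case
print(decide([0.12]) >= 0)  # a clearly "do not apply" case
```

In the paper's setting, each sub-problem gets its own decision function of this shape, trained on feature vectors extracted from the labeled documents.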
3202 C.-C. Wu et al. / Pattern Recognition 41 (2008) 3200 -- 3213
Our experimental results show that when we set this parameter to 0.8, the final accuracy rate of the layout analysis (expressed as an F1 score, which we define in Section 6) is the same as when we set it to 1; however, we only spend 1/4 of the time forming textlines. For this reason, we set its value to 0.8. Note that the restriction on forming textlines is designed to save computing time; it does not eliminate all duplicated textlines.

Our second operation merges two textlines if they are judged as essentially the same; that is, if they overlap, have the same reading order, and they are similar. To decide whether two textlines are similar, we again employ a learning procedure. We examine any two labeled textlines, A and B, in the training data set that belong to the same labeled text region, and extract the following feature from them:

4. min(A_height, B_height) / max(A_height, B_height).

We then use an SVM learning procedure to train a binary decision function, whose non-negative values indicate that two textlines are similar, and whose negative values indicate that they are dissimilar.

3. Step II: removal of textlines with false reading orders

Since we are dealing with documents that have two reading orders, we must form textlines in both the horizontal and the vertical directions. Some of these textlines may be spurious due to the false assumption about their reading order; however, we can only identify them after we have completed the formation process. In Section 3.1, we explain how to detect and remove such textlines. After this step, we may find that some of the remaining textlines overlap. We discuss the solution to this problem in Section 3.2.

3.1. Finding compatible textlines

Two textlines are said to have conflicting reading orders if they intersect and are compatible in terms of size, but their reading orders are different. Recall that, in Section 2, we constructed a decision function to decide whether two textlines are similar. The function can also be used here to decide whether two textlines are compatible. However, since we are now dealing with textlines composed of opposite reading orders, we have to modify the involved feature accordingly. Suppose we are given a horizontal textline A and a vertical textline B; the feature we extract from them is

5. min(A_height, B_width) / max(A_height, B_width).

Next, we need to put all textlines into disjoint groups. We start by adding an arbitrary textline to a group G, and then recursively add textlines whose reading orders conflict with some members of G. When this is done, we form another group, and so on until all textlines have been partitioned into disjoint groups. Note that the above operation will always result in the same disjoint groups, irrespective of the choice of the first textline. Then, we decide the valid reading order for each group. The following feature is very useful for this purpose.

Given a group of textlines obtained in the above manner, we examine the components that comprise each textline in the group. We want each component to "guess", from its own viewpoint, whether it belongs to a horizontal or a vertical textline. If it believes that it belongs to a horizontal textline, we mark taboo boxes in the vertical direction (i.e., above and below the component); otherwise, we mark taboo boxes in the horizontal direction (i.e., on the left and right of the component). In the special case where a component starts or ends a textline, we mark one more taboo box before or after the textline, respectively.

3.1.1. Marking taboo boxes

Figs. 4 and 5 show how we mark taboo boxes. The left-hand panel of each figure shows a document with an area highlighted in green, while the right-hand panel shows the document with the green area expanded. For a component C in the right-hand panel, let S_L, S_R, S_A, and S_B be the distances between C and the boxes located in the areas to the left, right, above and below C, respectively. If no box is located on one side of C, say the left side, then S_L is the distance between C and the left margin of the document.

The rules for marking taboo boxes are as follows:

- If min(S_L, S_R) > max(S_A, S_B), then C probably belongs to a vertical textline, and we draw taboo boxes on the left and the right of C (Fig. 4).
- If min(S_A, S_B) > max(S_L, S_R), then C probably belongs to a horizontal textline, and we draw taboo boxes above and below C.
- If S_L > max(S_A, S_B) and min(S_A, S_B) > S_R, then C probably starts a horizontal textline, and we draw taboo boxes above, below, and on the left of C.
- If S_R > max(S_A, S_B) and min(S_A, S_B) > S_L, then C probably ends a horizontal textline, and we draw taboo boxes above, below, and on the right of C.
- If S_A > max(S_L, S_R) and min(S_L, S_R) > S_B, then C probably starts a vertical textline, and we draw taboo boxes above, on the left, and on the right of C (Fig. 5).
- If S_B > max(S_L, S_R) and min(S_L, S_R) > S_A, then C probably ends a vertical textline, and we draw taboo boxes below, on the left, and on the right of C.

Fig. 6a shows all the horizontal taboo boxes (i.e., the boxes on the right and left of components) in the document in Fig. 5, while Fig. 6b shows all the vertical taboo boxes (i.e., the boxes above and below the components) in the same document. Recall that we partition the set of textlines into disjoint groups. Let GP be one such group. We determine the majority reading order of GP as follows. Let B_GP be the box that encloses GP, V_taboo be the total area of the vertical taboo boxes that overlap with B_GP, and H_taboo be the total area of horizontal taboo boxes that overlap with B_GP. If V_taboo > H_taboo, the majority reading order of GP is horizontal; otherwise, it is vertical. We call textlines with the majority reading order major textlines; other textlines are called minor textlines.

3.2. Removing false reading orders

In most cases, the majority reading order of GP is the reading order of the textlines deemed to be genuine. Thus, we should only retain the major textlines in GP and remove the minor ones. However, this solution may cause a problem if two text regions are very close together, as shown in Fig. 7a. When textlines are formed and grouped, we end up with a single group because a textline, T, over-extends from one region to the other so that all textlines are pulled into the same group. In this case, the majority reading order is vertical. After removing all horizontal textlines, we find a few components that do not belong to any textlines (Fig. 7b). We call these components orphans.

To resolve the problem caused by orphans, we adopt the following remedy. Whenever the removal of minor textlines causes the occurrence of orphans, we retain those textlines (Fig. 7c), and look for possible points to break all the textlines into two groups. The cutting points must lie on the retained textlines or on the textlines that intersect with them. Since these points are located in gaps between consecutive components, we examine all the gaps in the textlines.

After performing the cut operation, we examine all the textlines whose reading orders still conflict with those of other textlines. Let U denote such a textline. If the removal of U would not create any orphans, we remove it; otherwise, we retain U.
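The six marking rules can be written down directly. The sketch below takes the four distances S_L, S_R, S_A, S_B as inputs and returns the sides of C on which taboo boxes would be drawn; the set-valued return and the fall-through case for ambiguous components are our own illustrative choices, not specified in the paper:

```python
def taboo_sides(SL, SR, SA, SB):
    """Return the sides of component C on which taboo boxes are drawn,
    following the six marking rules (L=left, R=right, A=above, B=below)."""
    if min(SL, SR) > max(SA, SB):
        return {"L", "R"}        # C probably lies in a vertical textline
    if min(SA, SB) > max(SL, SR):
        return {"A", "B"}        # C probably lies in a horizontal textline
    if SL > max(SA, SB) and min(SA, SB) > SR:
        return {"A", "B", "L"}   # C probably starts a horizontal textline
    if SR > max(SA, SB) and min(SA, SB) > SL:
        return {"A", "B", "R"}   # C probably ends a horizontal textline
    if SA > max(SL, SR) and min(SL, SR) > SB:
        return {"A", "L", "R"}   # C probably starts a vertical textline
    if SB > max(SL, SR) and min(SL, SR) > SA:
        return {"B", "L", "R"}   # C probably ends a vertical textline
    return set()                 # no rule fires: component stays unmarked

# A component whose nearest neighbors are above and below it is judged to
# belong to a vertical textline, so taboo boxes go on its left and right:
print(taboo_sides(SL=40, SR=35, SA=5, SB=6))
```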
Fig. 4. The green area in (a) is expanded in (b). Taboo boxes (red) are drawn on the left and right of the component (yellow).
Fig. 5. The green area in (a) is expanded in (b). Taboo boxes are drawn above, on the left, and on the right of the component (yellow).
Fig. 6. (a) All horizontal taboo boxes are highlighted in yellow; (b) all vertical taboo boxes are highlighted in brown.
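The majority-reading-order test reduces to comparing two accumulated overlap areas. A minimal sketch, assuming boxes are represented as (x0, y0, x1, y1) tuples (a representation we choose for illustration):

```python
def overlap_area(box1, box2):
    """Area of intersection of two axis-aligned boxes (x0, y0, x1, y1)."""
    w = min(box1[2], box2[2]) - max(box1[0], box2[0])
    h = min(box1[3], box2[3]) - max(box1[1], box2[1])
    return max(0, w) * max(0, h)

def majority_reading_order(b_gp, v_taboo_boxes, h_taboo_boxes):
    """Decide the majority reading order of a group GP from the total areas
    of vertical (V_taboo) and horizontal (H_taboo) taboo boxes overlapping
    its enclosing box B_GP. Vertical taboo boxes are marked by components
    that believe they lie in horizontal textlines, hence the comparison."""
    v_taboo = sum(overlap_area(b_gp, t) for t in v_taboo_boxes)
    h_taboo = sum(overlap_area(b_gp, t) for t in h_taboo_boxes)
    return "horizontal" if v_taboo > h_taboo else "vertical"

# More vertical taboo area than horizontal -> the group reads horizontally.
print(majority_reading_order((0, 0, 10, 10), [(0, 0, 4, 4)], [(0, 0, 2, 2)]))
```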
In the above example, making a cut at an appropriate gap in a textline T (the red rectangle in Fig. 7c) separates the upper and lower text regions. All the vertical textlines in the region above the cut can now be removed, since doing so does not create any orphans. However, removal of the horizontal textlines in that region would create orphans; therefore, we retain those textlines, as shown in Fig. 7d, which is the solution we desire.

To ensure that cuts are made at the appropriate gaps, we use an SVM learning procedure to train a decision function, which decides whether or not to perform a cut at a particular gap. To prepare
Fig. 7. (a) Textlines with conflicting reading orders. (b) When all minor textlines are removed, certain components become orphans. (c) The result of retaining the minor textlines whose removal would create orphans. The textline T is over-extended and should be cut at the red rectangle. (d) The result of the cut operation and removal of the textlines that do not contain orphans.
Fig. 8. T is an over-extended textline; T_width is the width of T, and G is the height of the gap marked by the red rectangle. MWR is a rectangle whose height is G and whose width is five times T_width; its central line coincides with that of the gap. LWR is a similar rectangle whose right margin coincides with the central line of the gap; and RWR is another similar rectangle whose left margin coincides with the central line of the gap.
positive and negative examples, we first form textlines for the documents in the training data set by using the method described in Section 2. Note that these textlines are not the same as the labeled textlines in the data set, since some of them are over-extended. We then examine all the gaps between consecutive components in the textlines thus formed. If a gap does not fall within any labeled text
6. G / T_width,
7. R_l = |LWR ∩ (white pixels)| / |LWR|,
8. R_m = |MWR ∩ (white pixels)| / |MWR|,
9. R_r = |RWR ∩ (white pixels)| / |RWR|,
10. max(R_l, R_m, R_r),
11. R_tl = |LWR ∩ (taboo boxes)| / |LWR|,
12. R_tm = |MWR ∩ (taboo boxes)| / |MWR|,
13. R_tr = |RWR ∩ (taboo boxes)| / |RWR|,
14. max(R_tl, R_tm, R_tr),
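Features 7–10 are ratios of white pixels within the three windows LWR, MWR, and RWR; features 11–14 are computed the same way over a taboo-box mask instead of the white-pixel mask. A minimal sketch, assuming the image is given as a list of rows in which 1 marks a white (background) pixel:

```python
def white_ratio(image, box):
    """R = |box ∩ (white pixels)| / |box| for a binary image given as a list
    of rows, where 1 marks a white pixel. The box is (x0, y0, x1, y1) with
    exclusive upper bounds; pixel-grid coordinates are assumed."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    white = sum(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return white / area

# A toy 4x4 image: the top two rows are white, the bottom two are black.
img = [[1, 1, 1, 1],
       [1, 1, 1, 1],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
Rl = white_ratio(img, (0, 0, 2, 4))  # left window: half white
Rm = white_ratio(img, (1, 0, 3, 4))  # middle window: half white
Rr = white_ratio(img, (2, 0, 4, 4))  # right window: half white
print(max(Rl, Rm, Rr))               # feature 10
```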
4. Step III: elimination of textline overlaps

Before forming text regions out of textlines, we must ensure that no overlaps exist between the textlines. The overlaps caused by textlines with false reading orders are removed by the method described in the previous section. Here, we deal with the remaining overlaps, which can be divided into two types: orthogonal overlaps (i.e., overlaps between textlines with opposite reading orders), and parallel overlaps (i.e., overlaps between textlines with the same reading order). In both cases, the overlapping zone must be assigned to one textline only.

Fig. 9. The intersection of V and H results in the overlapping zone OVLP; GAP1, GAP2, GAP3, and GAP4 are the gaps between OVLP and the components above it, on the right of it, below it, and on the left of it, respectively.

54. max(A_width, B_width) / min(A_width, B_width),
55–63. Nine features extracted from the GAP and its neighborhood. These features are also similar to features 6–14.
5. Step IV: formation of text regions

Fig. 12. (a) VD_AB is the vertical distance between textlines A and B.

Fig. 13. (a) D_AB is the distance between column A and column B, and RG denotes the region. (b) The vertical projection profile of RG.
B be the right-hand part, and D_AB be the distance between A and B. We then compute the following three features:

72. D_AB / STD_height,
73. min(A_width, B_width) / STD_height,
74. RG_height / STD_height,

where STD_height is the standard height of textlines in RG, defined as follows: let S = {T_height : T is a textline in RG}. We let each mem-

Table 1
The eight types of layout structure and the number of documents in the training data set for each type of structure, where 'H' stands for horizontal and 'V' for vertical

Type of layout structure                                        Number of samples
1. H-headlines, H-textlines, and rectangle-shaped contents      192
2. H-headlines, H-textlines, and L-shaped contents               67
3. H-headlines, V-textlines, and rectangle-shaped contents       62
4. H-headlines, V-textlines, and L-shaped contents              188
5. V-headlines, V-textlines, and rectangle-shaped contents      182
6. V-headlines, V-textlines, and L-shaped contents              115
7. Mixture of text and pictures                                 196
8. Official documents of the R.O.C.                              50
Total number of training documents                             1052
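Features 72–74 are straightforward ratios once STD_height is known. A minimal sketch that takes STD_height as an input, since its full definition is cut off in this excerpt:

```python
def column_features(d_ab, a_width, b_width, rg_height, std_height):
    """Features 72-74 for the multiple-column decision. std_height is the
    'standard height' of textlines in RG; the paper derives it from the set
    of textline heights in RG, so it is passed in here as a given value."""
    return (d_ab / std_height,                    # feature 72
            min(a_width, b_width) / std_height,   # feature 73
            rg_height / std_height)               # feature 74

# Hypothetical measurements (in pixels) for two candidate columns A and B:
print(column_features(d_ab=12.0, a_width=200.0, b_width=180.0,
                      rg_height=480.0, std_height=24.0))  # -> (0.5, 7.5, 20.0)
```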
Fig. 14. The eight types of document layout listed in Table 1 are shown in (a)--(h), respectively.
(III) elimination of textline overlaps (Section 4); and (IV) formation of text regions (Section 5).

Table 2
The performance of SVM decision functions on all sub-problems

Step   Sub-problem                                          Involved features   Cross-validation accuracy rate (%)
I      Forming textlines                                    1–3                 99.99
       Finding similar textlines                            4                   99.98
II     Finding compatible textlines                         5                   Not applied^a
       Removing false reading orders                        6–14                98.20
III    Eliminating orthogonal textline overlaps             15–53               98.30
       Eliminating parallel textline overlaps               54–63               96.40
IV     Finding related textlines in orthogonal directions   64–68               99.76
       Finding related textlines in parallel directions     69–71               99.83
       Formation of multiple columns                        72–74               99.99

^a This sub-problem uses the decision function that was trained for the previous sub-problem.

To evaluate the end-to-end performance of our layout-analysis solution, we prepared 25 additional documents and their ground-truth files for each of the eight types of documents in Table 1; a total of 200 test documents. Before testing, we trained SVM decision functions on all the feature vectors extracted from the training data set, instead of four-fifths of them as before. We then applied the decision functions to the test documents and registered, in the output files, the textlines and text regions they found. To evaluate the test results, we compared the output files with the ground-truth files as follows.

A bounding box B_output in an output file is said to match a box B_ground-truth in the corresponding ground-truth file if both the x- and y-coordinates of their corner points differ by less than 5 pixels (cf. Section 3.3 in [15]). We define I(B_ground-truth, B_output) to be B_ground-truth ∩ B_output if the two boxes match, or a null set if they do not. Next, we define the following measures:

Recall rate = I(B_ground-truth, B_output) / B_ground-truth

Precision rate = I(B_ground-truth, B_output) / B_output
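The matching test and the measure I(B_ground-truth, B_output) can be sketched as follows, with boxes as (x0, y0, x1, y1) tuples and areas in pixels. The per-document rates accumulate these quantities over all boxes; the sketch shows a single pair:

```python
def boxes_match(b_truth, b_out, tol=5):
    """Two boxes (x0, y0, x1, y1) match if every corner coordinate differs
    by less than `tol` pixels, as in the evaluation protocol."""
    return all(abs(t - o) < tol for t, o in zip(b_truth, b_out))

def area(box):
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def intersection_area(b1, b2):
    w = min(b1[2], b2[2]) - max(b1[0], b2[0])
    h = min(b1[3], b2[3]) - max(b1[1], b2[1])
    return max(0, w) * max(0, h)

def match_area(b_truth, b_out):
    """I(B_ground-truth, B_output): the intersection area if the boxes
    match, otherwise zero (the 'null set')."""
    return intersection_area(b_truth, b_out) if boxes_match(b_truth, b_out) else 0

# One matched pair: recall = I / area(truth), precision = I / area(output).
truth = (10, 10, 110, 40)
out = (12, 9, 108, 41)
i = match_area(truth, out)
print(i / area(truth), i / area(out))
```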
Table 3
Recall rates, precision rates, and F1 scores for textlines and text regions

Type of object   Recall rate (%)   Precision rate (%)   F1 (%)
Textline         98.96             98.71                98.83
Text region      96.91             96.54                96.72

Table 6
The results of the compared layout analysis approaches

Approach                Type of object   Recall rate (%)   Precision rate (%)   F1 (%)
The learning approach   Textline         98.96             98.71                98.83
                        Text region      96.91             96.54                96.72

Step   Recall rate (%)   Precision rate (%)   F1 (%)
I      34.41              7.13                11.8
II     38.45             36.09                37.23
III    92.27             90.01                91.12
IV     98.96             98.71                98.83

200 test documents. The results show that even though the learning approach does not lead by much in textlines, it makes significant headway in text regions. The two examples shown in Figs. 15 and 16 provide the reason: layout errors may only affect a few textlines, but they can damage very large text regions, thereby decreasing the accuracy rates significantly in text regions.
Fig. 15. Left panel: an error (located in the green area) produced by the rule-based approach; right panel: the correct result obtained by the learning approach.
Fig. 16. Upper panel: an error (located in the green area) produced by the rule-based approach; lower panel: the correct result obtained by the learning approach.
less important the feature is deemed to be. This idea was adopted in the context of SVM by Guyon et al. [24] and Rakotomamonjy [25]. In Table 7, we present the merit of each feature to the corresponding SVM problem. The features are ordered according to their merits.

6.4. Unsolved problems

We conclude this section by considering some examples that our method could not analyze correctly. In Fig. 17, we have two horizontal headlines. Our mistake is caused by the lower headline, denoted as L. The green area contains three Arabic digits that are much higher than the other characters. We face an ambiguous situation here: L is not an ordinary textline because its characters do not have the same height. Furthermore, it cannot be divided into two textlines, since no clues (e.g., sufficiently large gaps and/or taboo boxes) exist between the digits and their neighboring characters. Our method combines L with the upper headline to form a single textline.

Fig. 18 shows another ambiguous situation, where a horizontal textline, referred to as K, incorporates two smaller characters, located in the green area. These characters must be read vertically; thus, K is not an ordinary textline, since it contains two types of characters. It cannot be divided into two textlines either, because no clues exist between the two parts. Our method again merges the two textlines to form a single textline.

In Fig. 19, the green area contains a watermark, which interferes with our textline-formation process. However, our method cannot remove a graphical object that overlaps with textual objects.

In the last example (Fig. 20a), the text region consists of only two vertical textlines. Unfortunately, the total area of the vertical taboo boxes exceeds that of the horizontal taboo boxes (Figs. 20b and c), because some gaps within the vertical textlines are larger than the gaps between the textlines. In this instance, our method judges that the majority reading order of the region is horizontal. Because of this wrong decision, all subsequent operations are inappropriate,
Table 7
The 74 features, referred to by their indices, are ordered according to their merits with respect to the SVM problems in which they are involved as variables

Step I. Sub-problem: forming textlines
Index   3    2    1
Merit   613  432  321

Step I. Sub-problem: finding similar textlines
Index   4
Merit   Not applied^a

Step II. Sub-problem: finding compatible textlines
Index   5
Merit   Not applied^a

Step II. Sub-problem: removing false reading orders
Index   8    9    10   7    12   13   14   11   6
Merit   613  540  362  312  276  265  264  252  150

Step III. Sub-problem: eliminating orthogonal textline overlaps
Index   25   36   60   18   16   43   15   42   44   32   49   39   52
Merit   122  105  99   89   87   59   58   51   42   41   38   38   28
Index   45   53   21   34   23   24   26   33   41   35   17   51   27
Merit   26   25   22   22   22   22   22   22   18   15   12   7    7
Index   28   19   40   37   31   38   29   30   22   48   47   20   46
Merit   7    6    5    4    4    3    2    2    2    2    1    1    1

Step III. Sub-problem: eliminating parallel textline overlaps
Index   54   57   63   56   60   59   58   55   62   61
Merit   85   55   38   36   34   29   27   22   19   3

Step IV. Sub-problem: finding related textlines in orthogonal direction
Index   64   68   67   65   66
Merit   6    5    4    2    1

Step IV. Sub-problem: finding related textlines in parallel direction
Index   71   70   69
Merit   535  486  329

Step IV. Sub-problem: formation of multiple columns
Index   74   72   73
Merit   3031 1723 934

Note that the decimal parts of the merit scores have been omitted.
^a Since only one feature is involved in this problem, there is no need to compute the merit of that feature.

Fig. 19. A watermark overlaps one of the text regions.

Fig. 20. (a) A text region composed of only two vertical textlines. (b) The horizontal taboo boxes are highlighted in yellow. (c) The vertical taboo boxes are highlighted in brown. (d) The incorrect result of our analysis. (e) The correct result.
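The merit scores come from an SVM-based feature-ranking idea [24,25]: the smaller a feature's influence on the trained decision function, the less important it is deemed to be. A hedged sketch of the ranking step, using squared weights of a linear decision function (the weight values shown are hypothetical):

```python
def rank_features_by_merit(weights):
    """Rank feature indices by an SVM-style merit score: the squared weight
    each feature receives in a trained linear decision function, in the
    spirit of SVM-based feature ranking. Larger merit = more important."""
    merits = {i + 1: w * w for i, w in enumerate(weights)}  # 1-based indices
    return sorted(merits, key=merits.get, reverse=True)

# Hypothetical weights for three features of one sub-problem:
print(rank_features_by_merit([0.4, -1.3, 0.9]))  # -> [2, 3, 1]
```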
After completing one step with the help of an SVM decision function,
we train another decision function, based on the information ob-
tained in the previous steps. This "learning-while-doing'' strategy is
very effective in terms of the time spent constructing solutions and
Fig. 17. Two horizontal headlines; the lower one contains two types ofcharacters. the accuracy of the solutions. We believe the same strategy could be
used for many types of documents, not just those we experimented
with in this paper.
Acknowledgment
resulting in a messy outcome (Fig. 20d). The correct result is shown [1] M. Krishnamoorthy, G. Nagy, S. Seth, M. Viswanathan, Syntactic segmentation
and labeling of digitized pages from technical journals, IEEE Trans. Pattern Anal.
in Fig. 20e.
Mach. Intell. 15 (7) (1993) 737--747.
[2] D.J. Ittner, H.S. Baird, Language-free layout analysis, in: Second International
7. Conclusion Conference on Document Analysis and Recognition, 1993, pp. 336--340.
[3] T. Pavlidis, J. Zhou, Page segmentation and classification, Graphical Models
Image Process. 54 (6) (1992) 484--496.
Since we view document layout analysis as a series of split-or- [4] S.-W. Lee, D.-S. Ryu, Parameter-free geometric document layout analysis, IEEE
merge operations, we have to learn how to proceed step-by-step Trans. Pattern Anal. Mach. Intell. 23 (11) (2001) 1240--1256.
when we start to construct our solutions. Our approach involves [5] L. O'Gorman, The document spectrum for page layout analysis, IEEE Trans.
Pattern Anal. Mach. Intell. 15 (11) (1993) 1162--1173.
four steps, namely, textline formation, removal of false reading or- [6] A. Simon, J.-C. Pret, A.P. Johnson, A fast algorithm for bottom-up document
ders, elimination of textline overlaps, and text-region formation. layout analysis, IEEE Trans. Pattern Anal. Mach. Intell. 19 (3) (1997) 273--277.
C.-C. Wu et al. / Pattern Recognition 41 (2008) 3200 -- 3213 3213
[7] F. Liu, Y. Luo, M. Yoshikawa, D. Hu, A new component based algorithm for newspaper layout analysis, in: Sixth International Conference on Document Analysis and Recognition, Seattle, USA, 2001, pp. 1176--1180.
[8] J. Xi, J. Hu, L. Wu, Page segmentation of Chinese newspapers, Pattern Recognition 35 (12) (2002) 2695--2704.
[9] M. Chen, X. Ding, J. Liang, Analysis, understanding and representation of Chinese newspaper with complex layout, in: International Conference on Image Processing, Vancouver, Canada, 2000, pp. 590--593.
[10] M. Okamoto, M. Takahashi, A hybrid page segmentation method, in: Second International Conference on Document Analysis and Recognition, Tsukuba, Japan, 1993, pp. 743--748.
[11] J. Liu, Y.Y. Tang, C.Y. Suen, Chinese document layout analysis based on adaptive split-and-merge and qualitative spatial reasoning, Pattern Recognition 30 (8) (1997) 1265--1278.
[12] F. Chang, S.-Y. Chu, C.-Y. Chen, Chinese document layout analysis using adaptive regrouping strategy, Pattern Recognition 38 (2) (2005) 261--271.
[13] K. Etemad, D. Doermann, R. Chellappa, Multiscale segmentation of unstructured document pages using soft decision integration, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1) (1997) 92--96.
[14] A. Dengel, F. Dubiel, Computer understanding of document structure, Int. J. Imaging Syst. Technol. 7 (1996) 271--278.
[15] K. Laven, S. Leishman, S. Roweis, A statistical learning approach to document image analysis, in: Eighth International Conference on Document Analysis and Recognition, Seoul, Korea, 2005, pp. 357--361.
[16] S. Marinai, M. Gori, G. Soda, Artificial neural networks for document analysis and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 27 (1) (2005) 23--35.
[17] C.L. Liu, K. Nakashima, H. Sako, H. Fujisawa, Handwritten digit recognition: benchmarking of state-of-the-art techniques, Pattern Recognition 36 (10) (2003) 2271--2285.
[18] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (1995) 273--297.
[19] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[20] F. Chang, C.-J. Chen, C.-J. Lu, A linear-time component-labeling algorithm using contour tracing technique, Comput. Vision Image Understanding 93 (2) (2004) 206--220.
[21] C.J. van Rijsbergen, Information Retrieval, Butterworths, London, 1979.
[22] D.D. Lewis, Evaluating and optimizing autonomous text classification systems, in: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 95), 1995, pp. 246--254.
[23] R. Kohavi, G. John, Wrappers for feature subset selection, Artif. Intell. 97 (1997) 273--324.
[24] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning 46 (2002) 389--422.
[25] A. Rakotomamonjy, Variable selection using SVM-based criteria, J. Mach. Learn. Res. 3 (2003) 1357--1370.
About the Author---CHUNG-CHIH WU received the B.S. degree from the Department of Information Engineering and Computer Science, Feng Chia University, Taiwan, in
2001, and the M.S. degree from the Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, in 2003. He joined the Institute
of Information Science, Academia Sinica as a research assistant in 2004. His research interests include document layout analysis and image processing.
About the Author---CHIEN-HSING CHOU received the B.S. and M.S. degrees from the Department of Electrical Engineering, Tamkang University, Taiwan, in 1997 and
1999, respectively, and the Ph.D. degree in Electrical Engineering from Tamkang University, Taiwan, in 2003. He is currently a postdoctoral fellow at
the Institute of Information Science, Academia Sinica, Taiwan. His research interests include pattern recognition, neural networks, and image processing.
About the Author---FU CHANG received the B.A. degree in Philosophy from National Taiwan University in 1973, the M.S. degree in Mathematics from North Carolina
State University in 1978, and the Ph.D. degree in Mathematical Statistics from Columbia University in 1983. He worked as an assistant professor in the Department of Applied Mathematics, Operations Research and Statistics, State University of New York at Stony Brook (1983--1984), and as a member of technical staff at Bell Communications Research, Inc.
(1984--1986) and at AT&T Bell Laboratories (1986--1990). He joined the Institute of Information Science, Academia Sinica, as an associate research fellow in 1990. His current research activities
are focused on machine learning, document analysis and recognition, and cognitive science.