2008 Guerzhoy-Segmentation of Rectangular Objects Lying


Segmentation of Rectangular Objects Lying on an Unknown Background in a Small Preview Scan Image

Michael Guerzhoy* Hui Zhou*

Abstract

We describe a method to segment rectangular objects that lie on a slightly textured background of an a-priori unknown colour. Our contribution consists of a fast and accurate background colour approximation method, a set of heuristics for accurate detection of rectangle sides, and procedures to generate imprecise hypotheses of rectangles, adjust hypotheses to fit the rectangles in the image, and verify or reject the hypotheses. Our algorithm is capable of detecting overlapping and touching objects such as photos, receipts, and business cards on a very small-sized preview scan image (79 by 109 pixels) on a coloured/textured background.

1. Introduction

Automatic cropping of objects in a scanned image considerably speeds up the process of scanning for the user. Ideally, a cropping is proposed by the graphical user interface after obtaining only a low-resolution preview image, since the preview image is faster to obtain (in our experiments, we use a 79x109 pixels preview image). While the colour of the background of the scanned image is usually known, some scenarios call for online background colour estimation.

We are motivated by the task of extracting approximately rectangular documents from the preview image. One application of the algorithm is speeding up the process of digitizing photographs by allowing the user to put the photographs on the flatbed scanner in an arbitrary manner and on an arbitrary background and still be able to store each photo automatically in a separate file (Figure 1).

Ordinarily, the background of the scanned image is white. This makes segmentation of mostly white objects, such as business cards or receipts, challenging. In fact, many scanner utilities don't provide such functionality. Our algorithm allows the user to automatically crop their rectangular documents by providing their own non-white background (Figure 2). Again, being able to scan multiple cards or receipts at once is useful.

Figure 1. Photo segmentation in preview

Figure 2. Scanning in a card using our algorithm

In our algorithm, we first estimate the background
colour, and then proceed to segment the rectangular
objects in the image using the estimated background
colour.
Background colour estimation is particularly
challenging when most of the background is occluded
and a single colour dominates the foreground objects
(for instance, this would happen when scanning several
photos with snow scenes lying on a white background).
We use the assumption that background-coloured pixels will appear contiguously to generate background colour hypotheses, and choose a hypothesis based on the edge statistics of the image. We detect the background/non-background edges in the image and use perceptual grouping to find the edges of the objects and then the objects themselves. Throughout, we use our large synthetically generated dataset to learn to classify edges and, later, rectangle hypotheses, and to verify or reject them.

Figure 3. Algorithm outline

Popular approaches to the problem of segmentation of rectangular objects include the Generalized Hough Transform, which detects rectangles directly following edge detection [9]; a Hough transform of the edge space and subsequent search for rectangle hypotheses in Hough space [6]; and distance transforms, which may be used to detect corners and group them into rectangles [11]. The methods above can be directly applied to the problem of segmentation of rectangular objects in a scanned image; however, they do not make use of all of the available domain knowledge. Namely, we can assume that the background is at least approximately uniformly coloured and that the objects are not. We use this assumption to explicitly detect the colour of the background.

Herley [3, 4, 5] addresses the problem of detecting rectangular objects on a scanned image as well as background colour detection under the assumption that the objects are separable by grid-aligned line-segments. Our approach applies when these assumptions don't hold.

Our algorithm is outlined in Section 2. The details of the algorithm are given in Sections 3 through 6. We provide experimental results in Section 7.

2. Algorithm outline

The algorithm proceeds as outlined in Figure 3.

3. Background colour estimation

Widely used general background/foreground segmentation methods include pixel clustering using graph-cuts [1], mean-shift and variations, edge detection, and combinations of the above [2]. These methods are of non-linear complexity in the number of pixels being clustered, and do not make strong model assumptions about the input image. The straightforward colour histogram-based method takes the peak of the colour histogram of the image as the background colour. This simple method, however, often fails for our task. Herley [4] proposes a linear-time background colour detection scheme that uses the assumption that the foreground objects are separable by grid-aligned line-segments. Herley constructs a histogram of those colours that account for a majority of pixels in at least k rows or columns. This method, however, sometimes fails in cases where the rectangular objects are not separable by grid-aligned line-segments. A more aggressive approach is therefore needed.

We use the following observations for the background colour B:
a. Since the background is approximately constant, we will observe uniformly coloured contiguous line-segments along image rows and columns that are of colour B
b. The edges that occur near pixels of colour B will tend to be stronger than those that occur near pixels of other colours
c. Given the size, shape, and number of the rectangular objects in the image, assuming
some fixed resolution, it is possible to predict the number of edge points near pixels of colour B that will occur in the image.

Our idea is to select the colour C that accounts for the plurality of uniformly-coloured line-segments in the image (observation a) subject to constraints on the strength and quantity of edge points on background/non-background boundaries that were detected by assuming that the background colour is C (observations b and c).

3.1. Segmenting lines into uniformly-coloured segments

We first segment the rows and columns of the image into segments that are roughly the same colour. To do this, we propose a one-pass method, which is explained below.

Consider a line with pixels numbered 0, 1, 2, …. Let colour(k) be the RGB colour vector of pixel k on the line. Suppose the line contains two differently-coloured line segments, $s_1 = [k_1, k_2]$ and $s_2 = [k_2 + 1, k_3]$. Assume that the colours of the pixels on the line-segments are normally distributed around a mean colour vector $C_i$, and that the standard deviation of the colour $\sigma_i$ is significantly smaller than the difference between the mean colour of $s_1$ and the mean colour of $s_2$. Formally,

$$\mathrm{colour}(k) \sim N(C_1, \sigma_1), \quad k \in [k_1, k_2]$$
$$\mathrm{colour}(k) \sim N(C_2, \sigma_2), \quad k \in [k_2 + 1, k_3]$$
$$k_1 \le k_2 \le k_3$$
$$C_1 \ne C_2$$
$$\max(\|\sigma_1\|_\infty, \|\sigma_2\|_\infty) \ll \|C_1 - C_2\|_\infty$$

We then define

$$\mathrm{mean}([i, j]) := \Big( \sum_{k=i}^{j} \mathrm{colour}(k) \Big) / (j - i + 1)$$

It is easy to see that $\mathrm{mean}([k_1, i+1]) - \mathrm{mean}([k_1, i])$ is symmetrically distributed around $(0\ 0\ 0)^T$ for $i < k_2$, and is symmetrically distributed around (approximately) $k_2 (C_2 - C_1) / i^2$ for $i \ge k_2$ (taking $k_1 = 0$ for simplicity).

Therefore, we can detect that we passed point $k_2$ if the vectors $D_i = \mathrm{mean}([k_1, i+1]) - \mathrm{mean}([k_1, i])$ point in the same direction (roughly $C_2 - C_1$) for $i = j, j+1, \ldots, j+N-1$ (N can be adjusted). We can therefore obtain an approximate segmentation of a line into uniformly-coloured segments by sequentially finding the end point of the line-segments that comprise the line. This method, with N = 3, turns out to work well enough for our purposes for image resolution 79x109. Note that the chosen optimal N will vary with resolution and, to a much lesser extent, the expected typical size of the rectangular objects to be detected, since different run lengths of the same colour would be expected for different resolutions and sizes of objects.

3.2. Selecting candidate background colours

In order to select colours that are likely to be background colours, we assume that there are many long background-coloured line-segments in the image.

Two voting arrays of size 16x16x16 are set up, one for colour means and one for colour variances. The cell (i, j, k) in the means array stores the number of votes for the colour 16*(i, j, k), and the cell (i, j, k) in the variance array stores the (diagonal) variance that corresponds to the colour candidate 16*(i, j, k). The variance will tend to be larger both when there is a genuine variation in background colour in the image and when the estimate 16*(i, j, k) is imprecise.

When a line segment votes, we increment the appropriate cell in the colour means voting array by 1, and the appropriate cell in the colour variances array with the (diagonal) variance of the segment.

The voting array is then smoothed to avoid aliasing, the variances array is normalized, and the colour with the highest vote count is considered to be the most likely background colour. Background colour candidates with high vote counts are then used in edge detection, as described below.

3.3. Edge detection

Given a potential background colour $C = (r \pm \sigma_r, g \pm \sigma_g, b \pm \sigma_b)$, we can attempt to detect edge points near the boundary of the background. For a pixel of colour C that lies on the boundary of C-coloured and non-C-coloured regions, edge strength is calculated. For other pixels, edge strength is set to 0. More specifically, a pixel p is of colour C if

$$|c_i - p_i| < \max(\sigma_i, \delta), \quad i \in \{r, g, b\}$$

where $\delta$ can be chosen to optimize performance.

The pixel p is on the boundary given the background colour C if

$$(C(p) \oplus C(L(p))) \lor (C(p) \oplus C(B(p)))$$

where C(p) indicates that p is of colour C, L(p) is the pixel to the left of p, and B(p) is the pixel to the bottom of p.
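The colour-membership test and the boundary test from Section 3.3 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `is_colour` and `boundary_mask`, the list-of-rows image layout, and the choice to skip neighbours that fall outside the image are all assumptions made for the sketch.

```python
def is_colour(pixel, colour, sigma, delta):
    """True if every channel of `pixel` is within max(sigma_i, delta) of `colour`."""
    return all(abs(c - p) < max(s, delta)
               for p, c, s in zip(pixel, colour, sigma))


def boundary_mask(image, colour, sigma, delta):
    """Mark pixels whose colour-C membership differs from the left or bottom neighbour.

    `image` is a list of rows, each row a list of (r, g, b) tuples; row 0 is the top.
    Neighbours outside the image are skipped (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    # C(p) for every pixel, computed once.
    member = [[is_colour(image[y][x], colour, sigma, delta) for x in range(width)]
              for y in range(height)]
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # C(p) XOR C(L(p)): membership changes across the left neighbour.
            left = x > 0 and member[y][x] != member[y][x - 1]
            # C(p) XOR C(B(p)): membership changes across the bottom neighbour.
            below = y + 1 < height and member[y][x] != member[y + 1][x]
            mask[y][x] = left or below
    return mask
```

The mask flags pixels on both sides of a membership change; restricting edge strength to pixels of colour C, as the text describes, would be a further filter using the same membership values.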
This boundary test ensures that we only detect edge pixels on the boundaries of background-coloured and non-background-coloured regions in the image.

Edge strength, which will be used in the next step, is calculated as follows:

$$\mathrm{EdgeStrength} = \delta(U(p), B(p)) + \delta(L(p), R(p))$$

where

$$\delta(a, b) = \min\big(\max_{i \in \{r, g, b\}} |a_i - b_i|,\ \Delta\big), \quad \Delta = 127,$$

U(p) is the pixel to the top of p, and R(p) is the pixel to the right of p.

3.4. Detecting the background colour

When the image consists mostly of foreground objects, the background colour determined by only using line-segment information is not reliable. The reason is that the background line-segments tend to be shorter and there are fewer of them for an image with more foreground objects.

We observe that: 1) the average edge strength is higher when the true background colour is used; 2) when the true background is used, the number of background/non-background edge pixels is correlated with the number of foreground objects in the image.

These two observations allow us to heuristically pick the true background colour by combining the edge statistics and the results of the voting by line-segments.

Here follows the procedure to determine the background colour:
1. Determine the most likely background colour candidates using line segmentation by only considering line-segments longer than α*Length(scanline) for several α between 0 and 1.
2. Among the different colour candidates, choose ones that occur most frequently
3. Select the colour that has edge point count within reasonable range whose average edge strength is the largest (we tune the parameters here using our training dataset)

Note that this is effective because we need only arbitrate between different background colour candidates when the background is mostly occluded, since otherwise there is a clear winner in the vote. Since the edge count statistics for this case are more constrained (we know that there are several objects in the image), it is easier to tune the parameters in (3).

4. Feature Grouping

We group edge points into line-segments, which are then grouped into right-angle corners that form our rectangle hypotheses.

Grouping edge points into line-segments proceeds as follows: a small group of spatially 8-connected edge pixels is selected, and is fitted to a line-segment using weighted Total Least Squares, i.e., we compute the major eigenvector of the weighted covariance matrix of the coordinates of the edge pixels, with the weight corresponding to the edge strength. We add neighboring pixels that lie in proximity and along the line-segment until the minor eigenvalue of the weighted variance-covariance matrix exceeds a threshold. This is similar to the UpWrite method [10].

If two line-segments are nearly perpendicular and their ends are sufficiently close, their intersection is marked as a right-angle corner, and the angle of the bisector of the corner is recorded.

Groups of right-angle corners form a rectangle hypothesis if their configuration forms a plausible rectangle, i.e., if two oriented right-angle corners can be seen as lying on the diagonal of a hypothesized rectangle.

Rectangle hypotheses that nearly coincide are merged. This is done to avoid adjusting duplicate hypotheses, since the adjustment operation described below is relatively expensive.

5. Rectangle side detection

Given an inaccurate estimation of the endpoints of a side s of a rectangle and an estimate of the background colour, we need to find the accurate endpoints of s. This is required for the hypothesis adjustment stage immediately after rectangle hypothesis formation.

We devise a score function that reaches maximum when the estimated edge is a true edge, and perform local search to maximize this function: we compute four edge statistics and then learn a function that combines them for optimal detection. Learning-based edge detection in a somewhat different context has been explored in [7].

The score function takes as inputs the image, the end points of the rectangle side AB, the center coordinates of the rectangle, and the estimate of the background colour $(r \pm \sigma_r, g \pm \sigma_g, b \pm \sigma_b)$.

We compute the following for points uniformly spaced out across the interval AB:

(1) The colour difference in RGB computed along the normal to AB pointing outside the rectangle, where the colour edge between $a = (r_1, g_1, b_1)$ and
$b = (r_2, g_2, b_2)$ is $\delta(a, b) = \min\big(\max_{i \in \{r, g, b\}} |a_i - b_i|,\ 127\big)$.

(2) The difference in "non-backgroundedness" computed along the normal to AB pointing outside the rectangle, where the "non-backgroundedness" of pixel $a = (r_1, g_1, b_1)$ given background estimate $c \pm \sigma = (r \pm \sigma_r, g \pm \sigma_g, b \pm \sigma_b)$ is

$$\mathrm{nbg} = \min\big(4,\ \max_i (|a_i - c_i| / \sigma_i)\big)$$

We then compute the medians and means of (1) and (2) above for a total of four statistics. We learn an optimal score function that combines these four statistics.

We learn a function of the form $f(s_1, s_2, s_3, s_4) = a s_1 + b s_2 + c s_3 + d s_4$ such that f is maximized when applied to a true edge.

For this purpose we generate 100,000 synthetic samples with photos lying on different backgrounds, and quantize each parameter variable into 40 values. For each ground-truth rectangle side in the images, we compute the edge statistics for it and for the edges that are parallel to it and lie close to it. We then proceed iteratively as follows:
• Set (a, b) to initial values, and find the value of (c, d) such that the edges localized by the algorithm coincide most closely with the ground truth
• Having found (c, d), search for the optimal (a, b)
• Search for the optimal (a, c)
• Do the same for other combinations

This appears to be a good way to find f, and the function is fast to compute and not memory intensive. The learning procedure, while non-standard, appears to work well enough for our purposes.

Having found a good edge score function, it is a trivial matter to find the accurate endpoints of the rectangle side given by its approximate endpoints AB by using hill-climbing.

The method described here is also used to refine the cropping in higher resolutions given a cropping in the preview image as an input.

5. Rectangle hypotheses

5.1. Rectangle scoring

We obtain several rectangle hypotheses, and then select the true hypotheses and reject the false ones. The rectangle score function is obtained in the following way:
• We generate rectangle hypotheses in 100,000 synthetic samples as in Section 4.
• For each hypothesis, we adjust it (see below), and calculate the four statistics defined in Section 4 for each side, as well as the average non-backgroundedness score of the pixels inside the rectangle, with the non-backgroundedness of a pixel $(p_r, p_g, p_b)$ given background $(b_r \pm \sigma_r, b_g \pm \sigma_g, b_b \pm \sigma_b)$ defined as $\max_i (|p_i - b_i| / \sigma_i)$.
• We normalize each of the statistics to [0, 1] using the maximum and minimum obtained in each sample for each particular statistic
• We train a decision-stumps-boosted-by-discrete-AdaBoost classifier using these statistics and the ground truth data. That is, we set the target to 1 if the rectangle hypothesis corresponds to a true rectangle, and 0 otherwise

The output of the discrete AdaBoost classifier serves as the rectangle score for a rectangle hypothesis. The greater the score, the more likely it is that the rectangle hypothesis corresponds to a true rectangle.

5.2. Rectangle adjustment

The statistics defined in Section 4 and used here for scoring the rectangle are sensitive to shifts. We therefore perform for a hypothesis H a local search for each side, or for the whole rectangle when searching for a correct angle (see Section 4), to find the hypothesis H' in the local neighborhood of H such that the sum of the scores of its sides is maximal.

6. Rectangle Detection

We accept rectangle hypotheses iteratively. The adjusted rectangle hypothesis with the highest score is accepted at each iteration unless it overlaps with previously accepted hypotheses by more than 30% of either rectangle's area. The sides of the accepted hypothesis are added into the line-segment list, and we backtrack to corner detection (Section 4) until no new hypotheses are accepted.

We output the list of accepted hypotheses when the loop terminates. The termination condition is as follows: the loop terminates when at least 90% of pixels classified as non-background lie within accepted hypotheses. A pixel $(p_r, p_g, p_b)$ is classified as non-background given background $(b_r \pm \sigma_r, b_g \pm \sigma_g, b_b \pm \sigma_b)$ if $\max_i (|p_i - b_i| / \sigma_i)$ is greater than a threshold.
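The acceptance rule of Section 6 (take the highest-scoring hypothesis unless it overlaps an accepted one by more than 30% of either rectangle's area) can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the authors' procedure: rectangles are taken to be axis-aligned `(x0, y0, x1, y1)` tuples, scores are supplied by the caller, and the backtracking to corner detection and the 90% coverage termination test are omitted. The names `area`, `overlap_area`, and `accept_hypotheses` are hypothetical.

```python
def area(rect):
    """Area of an axis-aligned rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    return max(0, x1 - x0) * max(0, y1 - y0)


def overlap_area(a, b):
    """Area of the intersection of two axis-aligned rectangles."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)


def accept_hypotheses(hypotheses, max_overlap=0.30):
    """Greedily accept (score, rect) hypotheses in order of decreasing score.

    A hypothesis is rejected if its intersection with any accepted rectangle
    exceeds `max_overlap` of either rectangle's area, i.e. of the smaller one.
    """
    accepted = []
    for _score, rect in sorted(hypotheses, key=lambda h: h[0], reverse=True):
        too_much = any(
            overlap_area(rect, other) > max_overlap * min(area(rect), area(other))
            for other in accepted
        )
        if not too_much:
            accepted.append(rect)
    return accepted
```

Comparing against the smaller of the two areas is equivalent to testing the 30% limit against each rectangle separately, which is why a single comparison suffices. Rotated rectangles, as produced by the method described above, would need a polygon intersection in place of `overlap_area`.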
7. Results

Table 1 shows the results of running the algorithm on our sample set of a total of 90 images (79x109 pixels) of multiple photos obtained with an EPSON Stylus CX5400 in professional mode, with colour and textured backgrounds. The runtime of the algorithm is around 50 milliseconds per image on a 2GHz CPU. See Figure 4 for samples from the test set.

In Table 1, each correctly detected photo counts as a true positive, each detection that does not correspond to a photo counts as a false positive, photos with no corresponding detections count as false negatives, and photos for which the boundaries obtained aren't accurate are counted as incorrect detects.

The main failure modes are:
• Objects where two opposite corners cannot be detected because of the absence of the necessary edges in the image because of similar colours in the foreground and the background
• Pairs of objects which form rectangles, so that the pair is detected as a single object

Our background colour detection algorithm works correctly on 89 of the 90 test samples.

Figure 4. Samples from our test set for 2, 3 and 4 photos in an image

Table 1. Experimental results

# of photos in image   True positives   False positives   False negatives   Incorrect detects
1                      10               0                 0                 0
2                      30               0                 3                 1
2 (touching)           20               0                 0                 2
3                      26               0                 0                 1
3 (touching)           38               2                 5                 2
4                      36               0                 0                 0

8. Conclusions

We have introduced an approach for rectangular object segmentation in preview of scanned images, including:
• a background colour estimation method;
• an algorithm that uses perceptual organization to detect rectangles on an image given a background colour; and
• a method to adjust detected rectangles to better fit the objects.

The approach can deal with overlapping and touching objects, and thus can be used for streamlining the scanning of multiple rectangular documents. Because it only requires an image with small resolution, it can be integrated into scanning software for segmentation at the pre-scan stage.

We described an algorithm that relies in part on counting background/non-background edge points given a hypothesized background colour to select the correct background colour, and that uses one-pass scanline segmentation in order to use the fact that background patches are contiguous in the image in the background colour detection process. These ideas could conceivably be adapted to other settings.

Our algorithm uses a synthetically generated sample set in order to learn the parameters for edge detection and the procedure that accepts or rejects hypotheses.

Finally, we described applications for rectangular objects segmentation on an unknown background, which include scanning multiple mostly-white documents by placing them on a non-white background.

9. References

[1] Y. Boykov, V. Kolmogorov, "An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 9, Sept. 2004

[2] C. M. Christoudias, B. Georgescu, P. Meer, "Synergism in low level vision," 16th International Conference on Pattern Recognition, Quebec City, Canada, volume IV, pages 150–155, 2002

[3] C. Herley, "Recursive method to extract rectangular objects from scans," International Conference on Image Processing, vol. 3, pp. III-989-92, 14-17 Sept. 2003

[4] C. Herley, "Recursive Method to Detect and Segment Multiple Rectangular Objects in Scanned Images," technical report MSR-TR-2004-01, Microsoft Research, 2004

[5] C. Herley, "Efficient inscribing of noisy rectangular objects in scanned images," International Conference on Image Processing, Vol. 4, 2399-2402, 24-27 Oct. 2004

[6] C. Jung, R. Schramm, "Rectangle Detection based on a Windowed Hough Transform," Proceedings of the XVII Brazilian Symposium on Computer Graphics and Image Processing 1530-1834, 2004

[7] S. Konishi, A. L. Yuille, J. M. Coughlan, S. C. Zhu, "Statistical Edge Detection: Learning and Evaluating Edge Cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 1, pp. 57-74, Jan. 2003

[8] C. Rothwell, "The Importance of Reasoning about Occlusions during Hypothesis Verification in Object Recognition," Rapport de recherche N.2673, INRIA-Sophia Antipolis, Team ROBOTVIS, 1995

[9] Y. Zhu et al., "Automatic Particle Detection through Efficient Hough Transforms," IEEE Transactions on Medical Imaging 22(9): 1053-1062, 2003

[10] R. A. McLaughlin, M. D. Alder, "The Hough Transform Versus the UpWrite," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 4, April 1998

[11] Z. Yu, C. Bajaj, "Detecting circular and rectangular particles based on geometric feature detection in electron micrographs," Journal of Structural Biology 145 (2004) 168–180
