A Weak Structure Model for Regular Pattern Recognition Applied to Facade Images

Radim Tyleček and Radim Šára
Center for Machine Perception
Faculty of Electrical Engineering,
Czech Technical University,
Prague, Czech Republic
Abstract. We propose a novel method for recognition of structured images and demonstrate it on detection of windows in facade images. Given an ability to obtain local low-level data evidence on primitive elements of a structure (like a window in a facade image), we determine their most probable number, attribute values (location, size) and neighborhood relation. The embedded structure is weakly modeled by pair-wise attribute constraints, which allow structure and attribute constraints to mutually support each other. We use a very general framework of reversible jump MCMC, which allows simple implementation of a specific structure model and plug-in of almost arbitrary element classifiers. The Markov chain controls the classifier by telling it where to look, without wasting too much time on unpromising locations.
We have chosen the domain of window recognition in facade images to demonstrate that the result is an efficient algorithm achieving the performance of other strongly informed methods for regular structures like grids, while our general model covers loosely regular configurations as well.
1 Introduction
Recent development in the construction of virtual worlds like Google Earth or Bing Maps 3D heads toward a higher level of detail and fidelity. The popularity of applications such as Street View shows that reconstruction of urban environments plays an important role in this area. While acquisition of extensive high-resolution data for this purpose is feasible today, their automated processing is now the limiting factor for delivering a more realistic experience, and it is a task for computer vision. In urban settings, typical acquired data are images of building facades, and their interpretation can help discover 3D structure and reduce the complexity of the resulting model; for example, it would allow going beyond the planar assumptions in the dense street view reconstruction presented in [1]. Complexity is particularly important when the representation has to scale with the size of cities in applications such as [2], whose authors plan to combine range data with images. The work of [3], dealing directly with structural regularity in 3D data, also supports our ideas.
While facades as man-made scenes exhibit intensive regularity and structure when compared to arbitrary natural scenes, they still present a great variety of styles, configurations and appearance. The design of a general facade model that is able to cover their range is thus a challenging problem, and several approaches have been proposed to deal with it.
Shape grammars, as introduced in [4] and later picked up by [5], are the basis of all recent methods that use procedural modeling to overcome the limitations of traditional segmentation techniques. The idea of shape grammars is that an image can be explained by combining rules and symbols.
Some aspects of the probabilistic approach were first discussed in [6], including the use of Reversible Jump Markov Chain Monte Carlo (RJMCMC). The proposed grammar is simple, based on splitting, and the results are demonstrated for highly regular facades only. In a similar fashion, [7] determines the structure by splitting the facade into a regular grid of individual tiles and subdividing them. Mayer and Reznik [8] presented a pipeline for multi-view interpretation, where heuristics based on interest points were designed to detect positions of windows, and subsequently used MCMC to localize their borders. Ripperda [9] has designed a comprehensive dictionary of rules, on which the proposed method substantially depends; the results presented on simple facades show that this approach has difficulty achieving good localization.
The most recent method of [10] combines trained randomized forest classifiers with a shape grammar to segment Haussmannian facades into eight classes. Their model assumes windows form a grid while allowing different intervals. In the second step, positions of rows and columns are stochastically estimated by a specific random walk algorithm that does not propose dimension changes. They evaluated their results quantitatively on a limited dataset of Haussmannian facades in Paris, which is available online.
The majority of the mentioned algorithms for single-view facade interpretation work with a hard constraint on grid configurations of windows and employ strong domain-specific heuristics. Additionally, they require user design of a specific grammar or training, and both processes are prone to overfitting. Our contribution is the design of a segmentation framework with the following properties:
- a general model allows a simple implementation avoiding strong domain-specific heuristics,
- structure is not modeled by a global grid, but softly by local pair-wise constraints, allowing loosely regular configurations,
- different element classifiers can be conveniently plugged in,
- efficient interpretation is achieved as the classifier is guided by the sampler and need not even visit all image pixels in practice,
- the number, spacing and exact size of facade elements need not be known in advance, and the method does not rely on preprocessing that can fail, e.g. in irregular cases like in Fig. 4.
Since windows are the most prominent elements of a facade, we choose detection
of window-like image elements to be the target of this paper.
model p(I, k, A, X, N)
  image likelihood (Sec. 4) p(I|k, A, X, N)
    edge (4.1) p(J|k, A, X, N)
    color (4.2) p(C|k, A, X, N)
  structural model (Sec. 3) p(k, A, X, N)
    attribute constraints (3.1) p(A|k, N, X)
    str. prior (3.2) p(k, N, X)
      structural regularity p(N, X|k)
      structural complexity p(k)
Fig. 1. Hierarchy in probability model, numbers in brackets are section references.
2 Structural Recognition Framework
We consider the problem of recognizing elements in an image, like windows in a facade. Our model parameters (variables) consist of the complexity $k$ (the number of windows), shape attributes $A$ (e.g. size, aspect), location attributes $X$ (window center locations) and the element neighborhood relation $N$. The recognition task can then be formulated as follows: given image data $I$, we search for model parameters $\theta = (k, A, X, N)$ by finding the mode of the joint distribution $p(I, \theta)$,
\[ \theta^* = \arg\max_{\theta}\ p(I|\theta)\, p(\theta), \tag{1} \]
which is computed with Bayes' theorem from the data likelihood $p(I|\theta)$ and the structural model prior $p(\theta)$. We will decompose our probability model hierarchically as shown in Fig. 1 and propose pdfs specific for the task of window detection in facade images. Then we can apply the stochastic RJMCMC framework to find the optimal value $\theta^*$ by effectively sampling from the space of possible combinations of parameters $\theta$. More details on its implementation will be given in the following sections.
3 Structural Model
The structural model is based on pair-wise element neighborhood and attribute constraints, yielding a bottom-up approach. We are given a set of $k \in \mathbb{N}$ element locations $X = \{x_i \in \mathbb{R}^2;\ i = 1, \dots, k\}$. Our neighborhood representation is based on a planar graph $G(X) = \{V(X), D(X)\}$, where vertices $V(X) = \{v_i;\ i = 1, \dots, k\}$ correspond to elements and edges $D(X) = \{(u, v);\ u, v \in V(X)\}$ to the relative neighborhood relationship between them.
Since we are dealing with image elements attributed by their locations $X$ in the image plane, we can limit the edge set $D(X)$ to a reasonable planar subgraph, and the Relative Neighborhood Graph (RNG) turns out to be a natural choice [11]. It is defined by the following condition: two points $u$ and $v$ are connected by an edge whenever there does not exist a third point $r$ that is closer to both $u$ and $v$ than they are to each other (in the Euclidean metric). It is known that the RNG is a unique subgraph of the Delaunay Triangulation (DT) and can be computed from it efficiently, in $O(n)$ time. This choice defines a function $X \to G(X)$, where the graph is uniquely constructed from a set of element locations $X$.
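The RNG condition can be checked directly from its definition. The following Python sketch is a brute-force illustration (cubic in the number of points), not the efficient construction from the Delaunay triangulation mentioned in the text; the function name is hypothetical.

```python
from itertools import combinations
from math import dist


def relative_neighborhood_graph(points):
    """Return the RNG edges as a set of index pairs (u, v) with u < v."""
    edges = set()
    for u, v in combinations(range(len(points)), 2):
        d_uv = dist(points[u], points[v])
        # The edge is blocked if some third point r is closer to BOTH
        # u and v than u and v are to each other.
        blocked = any(
            max(dist(points[u], points[r]), dist(points[v], points[r])) < d_uv
            for r in range(len(points))
            if r not in (u, v)
        )
        if not blocked:
            edges.add((u, v))
    return edges
```

For four points on the corners of a unit square, this keeps the four sides and rejects both diagonals, which is the grid-like neighborhood one would expect for regularly placed windows.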
We define neighbors as elements that are in immediate proximity of each other and that share some attributes. This neighborhood $N$ is to be recovered as a part of the solution, and we represent it by binary labels $N = \{l_{uv} \in \{0, 1\};\ (u, v) \in D(X)\}$ for edges, indicating mutual neighborhood of two elements when $l_{uv} = 1$. Two such elements are then members of the same structural component, where all connected elements are related by attribute similarity constraints. Labels $l_{uv} = 0$ allow the existence of dissimilar elements in proximity of each other.
An edge $(u, v)$ has an orientation attribute $o_{uv} \in \{h, v\}$, which is a function of the locations $x_u$, $x_v$ of the elements at its endpoints. It is given by the angle $\phi$ between the vertical direction and the line connecting the element locations. The case $|\phi| < \frac{\pi}{4}$ determines vertical orientation ($v$), the other case is horizontal ($h$). This choice defines a function $D(X) \to \{h, v\}$.
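As a small illustration of the orientation rule, the sketch below classifies an edge by its angle to the vertical axis. It assumes the label v for vertical and h for horizontal edges, as used in the alignment model of Sec. 3.2; the function name and the handling of the exact diagonal (assigned to h) are assumptions.

```python
from math import atan2, pi


def edge_orientation(x_u, x_v):
    """Return 'v' for a (mostly) vertical edge and 'h' for a horizontal one."""
    dx = x_v[0] - x_u[0]
    dy = x_v[1] - x_u[1]
    # Unsigned angle between the edge and the vertical axis, in [0, pi/2].
    phi = atan2(abs(dx), abs(dy))
    return "v" if phi < pi / 4 else "h"
```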
The prior probability model $p(k, N, X, A) = p(A|k, N, X)\, p(k, N, X)$ splits into attribute constraints $p(A|k, N, X)$ and structure prior $p(k, N, X)$. The parameters of the underlying distributions were chosen empirically.
3.1 Attribute Constraints
The attribute constraints evaluate the similarity of two neighboring elements (in terms of $N$); such attributes can be shape or appearance.
For facades, we assume our elements can be represented by a rectangular shape template with its borders parallel to the image borders. The shape attributes $A = \{W, H, T\} = \{(w_i, h_i, t_i);\ i = 1, \dots, k\}$ are described in Fig. 2, and the column width $t_i = t$ is given and fixed. Our attribute constraints will then reflect the fact that neighboring windows most probably have the same dimensions.
Fig. 2. Left: Window shape template is parametrized by its width $w_i \in (0, 1)$, height $h_i \in (0, 1)$, both relative to the image height $I_h$, and the width of the central column $t_i \in (0, 1)$ relative to the window width. Right: Shape template (red) is matched with image edges (blue).
We start with the decomposition
\[ p(A|k, N, X) = p(W|H, k, N, X)\, p(H|k, N, X)\, \mathbf{1}(A|X), \tag{2} \]
where $p(W|H, k, N, X) = \prod_{i=1}^{k} p(w_i|h_i)$ is the aspect ratio term with distribution $p(w_i|h_i) = \mathrm{Beta}\left(\frac{w_i}{w_i + h_i};\ \alpha_r, \beta_r\right)$. When any of the windows overlaps with another, we set the unit (indicator) function $\mathbf{1}(A|X) = 0$, effectively avoiding such a window configuration.
To model constraints on heights $H$, we introduce a set of latent variables $h_c$, one for each component $c$ of graph $G(X)$ with neighborhood $N$. The height similarity within components is enforced in
\[ p(H|k, N, X) = \prod_{c} \Big[ p(h_c) \prod_{i \in V_c} p(h_i|h_c) \Big], \tag{3} \]
where $c$ ranges over the set of all components, $V_c$ is the set of windows in the component $c$ and $p(h_c) = \mathrm{Beta}(h_c;\ \alpha_h, \beta_h)$ is the common height prior. Each height in a component $c$ should most probably be equal to $h_c$, which is expressed by $p(h_i|h_c) = \mathcal{N}(h_i - h_c;\ 0, \sigma_h)$.
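To make the height constraint (3) concrete, here is a minimal Python sketch of its logarithm. It assumes that the height prior is a beta density and that $\sigma_h$ is a standard deviation; the function names and all numeric parameter values used with it are illustrative, since the text only states that the parameters were chosen empirically.

```python
from math import lgamma, log, pi


def beta_logpdf(x, a, b):
    """Log-density of Beta(a, b) at x in (0, 1)."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x)


def normal_logpdf(x, mean, sigma):
    """Log-density of a normal distribution with standard deviation sigma."""
    return -0.5 * log(2 * pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)


def log_height_prior(components, alpha_h, beta_h, sigma_h):
    """Log of Eq. (3).

    components: list of (h_c, [h_i, ...]) pairs, one per connected
    component, where h_c is the latent common height of the component.
    """
    total = 0.0
    for h_c, heights in components:
        total += beta_logpdf(h_c, alpha_h, beta_h)
        total += sum(normal_logpdf(h_i - h_c, 0.0, sigma_h) for h_i in heights)
    return total
```

A component whose windows all share the latent height scores higher than one with a deviating window, which is how the prior pulls neighboring windows toward equal size.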
3.2 Structural Prior
The structure prior $p(k, N, X) = p(N, X|k)\, p(k)$ combines structural regularity $p(N, X|k)$ and complexity $p(k)$.
Structural Regularity. In order to model multiple assumptions on $p(N, X|k)$, we express it as a probability mixture [12]:
\[ p(N, X|k) = \lambda_1\, p_a(X|N)\, p(N) + \lambda_2\, p_s(X|N)\, p(N) + \lambda_3\, p_c(N|X)\, p(X), \tag{4} \]
where $\sum_{i=1}^{3} \lambda_i = 1$, $\lambda_1 = \lambda_2 = \lambda_3 = \frac{1}{3}$, and $k$ was omitted in $p(\cdot)$ for simplicity. We assume element locations in $p(X)$ are mutually independent and uniformly distributed in the image. The neighborhood prior $p(N) = \prod_{(u,v)} p(l_{uv})$ takes into account the possibility of suppressing an edge, where $p(l_{uv} = 0) = p_{sup}$, $p(l_{uv} = 1) = 1 - p_{sup}$ and $p_{sup} = 0.01$ is the probability of a suppressed edge.
Alignment. The first assumption on the position of elements is that neighboring elements should be horizontally or vertically aligned. We model this by measuring the angles $\alpha(x_u, x_v) \in \left(-\frac{\pi}{4}, \frac{\pi}{4}\right)$ between the line connecting the element locations $x_u$, $x_v$ and the horizontal ($o_{uv} = h$) resp. vertical ($o_{uv} = v$) direction, and express them in
\[ p_a(X|N) = \prod_{(u,v) \in D(X)} p(x_u, x_v|l_{uv}), \tag{5} \]
where $p(x_u, x_v|l_{uv} = 1) = \mathrm{Beta}(\bar{\alpha}(x_u, x_v);\ 50, 50)$ and $\bar{\alpha}(x_u, x_v) = \frac{2}{\pi}\left(\alpha_{uv} + \frac{\pi}{4}\right) \in (0, 1)$ is the angle normalized to the unit interval. The probability in the case of a suppressed edge is $p(x_u, x_v|l_{uv} = 0) = p_{a0}$.
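The alignment term for an active edge can be sketched as follows. This is an illustration only: it assumes a symmetric beta density with both shape parameters equal to 50 over the normalized angle, folds the edge direction onto the nearest image axis, and clamps the normalized angle so that the logarithm stays finite on exact diagonals. All function names are hypothetical.

```python
from math import atan2, lgamma, log, pi


def beta_logpdf(x, a, b):
    """Log-density of Beta(a, b) at x in (0, 1)."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x)


def alignment_angle(x_u, x_v):
    """Signed deviation of the edge from the nearest axis, in [-pi/4, pi/4)."""
    theta = atan2(x_v[1] - x_u[1], x_v[0] - x_u[0])  # angle to the horizontal
    return (theta + pi / 4) % (pi / 2) - pi / 4


def log_alignment(x_u, x_v, shape=50.0, eps=1e-9):
    """Log of p(x_u, x_v | l_uv = 1) under the assumptions stated above."""
    alpha_bar = (2 / pi) * (alignment_angle(x_u, x_v) + pi / 4)
    alpha_bar = min(max(alpha_bar, eps), 1 - eps)  # avoid log(0) on diagonals
    return beta_logpdf(alpha_bar, shape, shape)
```

A perfectly horizontal or vertical edge maps to a normalized angle of 0.5, the mode of the symmetric density, so any tilt lowers the score.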
Spacing. The second assumption is that the distances between elements in a horizontal or vertical neighborhood should most probably be equal. We model this by comparing distances to horizontal and vertical neighbors in
\[ p_s(X|N) = \prod_{(u,v,z) \in D^2(X)} p(x_u, x_v, x_z|l_{uv}, l_{vz}), \tag{6} \]
where $(u, v, z)$ denotes a pair of edges $(u, v)$, $(v, z)$, $u \neq z$, with the common vertex $v$ and the same orientation. The distance term is expressed by $p(x_u, x_v, x_z|l_{uv} = l_{vz} = 1) = \mathrm{Beta}\left(\frac{\delta_{uv}}{\delta_{uv} + \delta_{vz}};\ 50, 50\right)$, where $\delta_{uv} = |x_u - x_v|$ are distances to the neighbors. As in the previous case, the probability in the cases with any suppressed edge is $p(x_u, x_v, x_z|l_{uv} \neq 1 \lor l_{vz} \neq 1) = p_{s0}$.
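The spacing term compares the two distances around a common vertex. The sketch below assumes, as for alignment, a symmetric beta density with both shape parameters equal to 50 over the distance ratio, and that the three locations are distinct; the names are illustrative.

```python
from math import dist, lgamma, log


def beta_logpdf(x, a, b):
    """Log-density of Beta(a, b) at x in (0, 1)."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x)


def log_spacing(x_u, x_v, x_z, shape=50.0):
    """Log of p(x_u, x_v, x_z | l_uv = l_vz = 1) for the edge pair (u, v), (v, z)."""
    d_uv = dist(x_u, x_v)
    d_vz = dist(x_v, x_z)
    ratio = d_uv / (d_uv + d_vz)  # equals 0.5 when both gaps are the same
    return beta_logpdf(ratio, shape, shape)
```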
Configurations. We model higher-order dependencies in the structure configurations with
\[ p_c(N|X) = \prod_{i=1}^{k} p(l_{ij} \mid (i, j) \in D(X)), \tag{7} \]
where the probabilities $p(l_{ij} \mid (i, j) \in D(X))$ model the expected degree of a given vertex $i$, including the orientation of the edges $(i, j)$ connected to it; e.g. the typical grid configuration is to have two vertical and two horizontal edges incident with vertex $i$.
With the grid assumption and the window size prior, we can estimate the number of rows $m = \frac{1}{2h}$, where $h$ is the most probable relative window height, and the number of columns $n$ analogously from the height and aspect ratio priors, assuming the space between the windows to be equal to the window size. This heuristic plays only a minor role in our model and helps us to derive the vertex configuration probability $p(l_{ij} \mid (i, j) \in D(X))$. It is given in Table 1, where rows and columns correspond to the number of horizontal and vertical edges connected to the window vertex. The maximum degree of a vertex in the RNG is six, with at most three horizontal and three vertical edges.
Table 1. Neighborhood configuration prior $p(l_{ij} \mid (i, j) \in D(X))$, where $\deg_h(i)$, $\deg_v(i)$ are functions of the neighboring labels $l_{ij}$. Here $p_{c0} = 10^{-4}$ is the probability of a single (unstructured) window, $p_{c1} = 0.099$ is the probability of a single row or column of windows, $p_{c2} = 0.9$ is the probability of a window grid, and $p_{c3} = 10^{-5}$ is the probability of denser configurations.
deg_h(i), deg_v(i)   0h               1h               2h                      3h
0v                   p_c0             1/2 p_c1         1/(m-2) p_c1            p_c3
1v                   1/2 p_c1         1/4 p_c2         2/(m-2) p_c2            p_c3
2v                   1/(n-2) p_c1     2/(n-2) p_c2     1/((m-2)(n-2)) p_c2     p_c3
3v                   p_c3             p_c3             p_c3                    p_c3
Structural Complexity. The prior for the number of elements can be modeled with a Poisson distribution $p(k) = \mathrm{Pois}(k;\ mn)$ based on the estimates of the number of rows $m$ and columns $n$ given above.
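The complexity prior is a standard Poisson mass function with rate $mn$. A minimal Python sketch in the log domain, taking the row and column estimates as given inputs (the function names are illustrative):

```python
from math import lgamma, log


def log_poisson(k, rate):
    """Log of the Poisson probability mass function Pois(k; rate)."""
    return k * log(rate) - rate - lgamma(k + 1)


def log_complexity_prior(k, m, n):
    """Log of p(k) for an expected grid of m rows and n columns."""
    return log_poisson(k, m * n)
```

With, say, m = 5 and n = 4, the prior favors about 20 windows, while still leaving non-zero probability for facades with far fewer or far more elements.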
4 Data Likelihood
The data likelihood $p(I|k, A, X, N)$ is solely task-specific and can be chosen arbitrarily as long as it can be evaluated by means of a probability density or likelihood ratio.
In the task of window detection in facade images, the input is an image $I = \{i;\ i = 1, \dots, I_w I_h\}$ defined as a set of pixels, where $I_w$, $I_h$ are the image width and height. We assume the image is rectified, i.e. the window borders are parallel to the image borders.
We want to express the probability of observing image $I$ if window parameters and structure are given. We combine two features, image edges $J$ and color $C$, in $p(I|k, A, X, N) = p(J|k, A, X, N)\, p(C|k, A, X, N)$. We use color to detect regions of interest and edge features for localization of the window borders.
4.1 Edge Likelihood
We assume that window borders correspond to edges, and use the Canny detector to find them. However, this model will not fully hold in real-world situations when we obtain the input by detecting edges in a picture: there can be windows which do not have all pixels with underlying edges and, vice versa, some edges do not belong to any window at all. The latter case will typically prevail.
We use a binary imaging model for window edges represented by an oriented edge image $J = \{J_i \in \{0, 1, 2\};\ i \in I\}$, where $J_i = 1$ if pixel $i$ belongs to a horizontal edge detected in $I$ (foreground), resp. $J_i = 2$ for a vertical edge; otherwise $J_i = 0$ (background). We define $d(J) \in (0, 1)$ as a distance transform of the edge image $J$ normalized by $\max(I_h, I_w)$. We use the gradient of $d(J)$ to distinguish between horizontal and vertical edges. Similarly, we introduce an edge image $R(A, X)$ rendered from the current configuration specified by attributes $A$, $X$ and the shape template in Fig. 2 with nearest-neighbor discretization. Assuming pixel independence, we can write $p(J|A, X) = \prod_{i \in I} p(J_i|R_i(A, X))$, where the probability of observing a pixel $i$ in the edge image $J$ given the rendered configuration $R$ is
\[ \begin{aligned}
p(J_i = 0 \mid R_i = 0) &= p_{TN} = 1 - 2 p_{FN}, \\
p(J_i \in \{1, 2\} \mid R_i = 0) &= p_{FN} = 0.1, \\
p(J_i = 0 \mid R_i \in \{1, 2\}) &= p_{FP}(d(i))\,(1 - p_{FX}), \quad d(i) > 0, \\
p(J_i = 1 \mid R_i = 1) = p(J_i = 2 \mid R_i = 2) &= p_{TP} = p_{FP}(0), \\
p(J_i = 2 \mid R_i = 1) = p(J_i = 1 \mid R_i = 2) &= p_{FX},
\end{aligned} \tag{8} \]
where $p_{FP}(d(i))$ is a distribution over the normalized distance $d(i)$ with parameters $(500, 1)$; it makes rectangles close to edges more probable and acts as a guide for directing the random walk. The value $p_{FX} = 10^{-9}$ is the probability assigned when the edge specified by the configuration crosses an image edge with opposite direction.
The edge likelihood can be efficiently evaluated from pre-computed integral edge images, one for each orientation, yielding constant computational complexity $O(1)$ per edge; this speed-up is possible thanks to rectified images and helps make random sampling (described in Sect. 5) very efficient.
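The constant-time evaluation rests on the standard integral image (summed-area table) technique. The sketch below shows the generic idea on a small 0/1 edge mask in pure Python; how the per-pixel log-probabilities of Eq. (8) are accumulated into such tables is not specified in the text, so this is only an illustration of the data structure, with hypothetical function names.

```python
def integral_image(img):
    """Summed-area table with a zero border: S[y][x] = sum of img[0:y][0:x]."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def box_sum(table, x0, y0, x1, y1):
    """Sum over pixels with x0 <= x < x1 and y0 <= y < y1, in O(1)."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]
```

Because the images are rectified, every side of a window template is an axis-aligned pixel run, so its support in the edge image is a one-pixel-thick box and costs four table lookups regardless of its length.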
4.2 Color Likelihood
A pixel color classifier matches the input RGB color image $C = \{c_i \in (0, 1)^3;\ i \in I\}$ with a unimodal Gaussian distribution $\mathcal{N}(\bar{C}, \Sigma_C)$ for window pixels. Its mean $\bar{C} = (0.33, 0.36, 0.38) \in (0, 1)^3$ and covariance $\Sigma_C$ of window color were trained on a single representative facade image and correspond to dark colors; the higher mean in the blue channel is related to the reflection of the sky in window glass. We use the classifier to segment pixels into either the foreground (window) or the background (non-window) set, $C_f \cup C_b = I$. Assuming pixel independence, the probability of observing the segmented image is
\[ p(C|A, X) = \prod_{i \in C_f} p_f(c_i|A, X) \prod_{j \in C_b} p_b(c_j|A, X), \tag{9} \]
where the foreground color model is expressed by $p_f(c_i|A, X) = \mathcal{N}(c_i;\ \bar{C}, \Sigma_C)$, the background probability $p_b(c_j|A, X) = p_b$ is constant and we evaluate foreground pixels only. Similarly to the edge likelihood, the color likelihood can be evaluated using pre-computed integral images in linear time.
5 Recognition Algorithm
We have chosen the reversible jump Markov chain Monte Carlo (RJMCMC) framework [13], which fits our task of finding the most probable interpretation of the input image in terms of the target probability $p(\theta, I)$ in (1), which has a very complex pdf as it is a joint probability of both attributes and structure. Our solution $\theta^*$ is found as the most probable parameter value the chain visits in a given number of samples.
While the MCMC algorithm is simple, we need to carefully design the proposal distribution $q$, which should approximate the target distribution $p(\theta, I)$ well while being easy to sample from. We should point out that the quality of the resulting interpretation is determined by the probability model, whereas the time necessary to reach the solution is influenced by the proposal distributions. It turns out that by exploiting the estimated structure we can efficiently guide the random walk of our chain by repeatedly sampling the new state $\theta'$ from the vicinity of the current state $\theta$, using the conditional probability $q(\theta'|\theta)$.
We use an independent sampler $q(\theta|I)$ to initialize the Markov chain, which samples the initial state $\theta_0$ either from the prior distribution $q(\theta)$ or exploits some image information in $q(\theta|I)$. This involves sampling the number of elements $k \sim q(k)$ first and then their attribute values $(X, A) \sim q(X, A)$ independently. In practice we choose the sampler to start with $k_0 = 1$.
The conditional sampler $q(\theta'|\theta, I)$ is a mixture of individual samplers, each of which modifies a subset of parameters based on a specific proposal distribution $q_m(\theta'|\theta, I)$. The main sampler only chooses from $q(m)$ which of the individual samplers $m$ will be used to propose the next move. We will now propose the set of samplers that will explore the space of parameters $\theta$. Their design must fulfill the Markov chain properties of detailed balance and reversibility of all moves, i.e. for a given move $m$ there must always exist a reverse move $m'$, and their probability ratio must be reflected in the acceptance probability of the Metropolis-Hastings (MH) algorithm:
\[ A = \min\left\{ 1,\ \frac{p(\theta', I)}{p(\theta, I)} \cdot \frac{q(m'|\theta')}{q(m|\theta)} \right\}. \tag{10} \]
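The acceptance rule (10) is usually evaluated in the log domain to avoid numerical underflow of the joint probability. A minimal sketch, assuming the caller supplies the log target values and the log probabilities of the forward and reverse moves (all argument names are illustrative):

```python
import math
import random


def mh_accept(log_p_new, log_p_old, log_q_reverse, log_q_forward, rng=random):
    """Metropolis-Hastings test; returns (accepted, acceptance_probability)."""
    log_ratio = (log_p_new - log_p_old) + (log_q_reverse - log_q_forward)
    # min{1, ratio} computed safely: exp() is only applied to values <= 0.
    accept_prob = math.exp(min(0.0, log_ratio))
    return rng.random() < accept_prob, accept_prob
```

A proposal that increases the target probability under a symmetric move pair is always accepted; a worse proposal is accepted with probability equal to the ratio, which lets the chain escape local optima.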
5.1 Metropolis-Hastings Moves
Moves introduced in this section do not modify the model complexity $k$ and can thus be evaluated by the classical MH algorithm (10).
Attribute modification. This move picks an element $i \sim U(\{1, \dots, k\})$ from a discrete uniform distribution and perturbs some of its attribute values randomly. Additionally, attribute samplers can be designed to exploit the image likelihood to increase the acceptance rate. In the window detection scenario, we have implemented three variants of this type of proposal:
- Drift: random variation of position $x'_i = x_i + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon)$, without changing the size,
- Resize: randomly pick one of the four window sides (left/right/top/bottom) and move it by $\epsilon$,
- Flip: fix one of the window sides and flip the window around it.
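The three proposal variants can be sketched as simple geometric operations on a window rectangle. The code below is an illustration under stated assumptions: the `Window` type, the center-based parametrization, the downward-growing vertical axis, the uniform choice among variants and the rule that a resize producing a non-positive size leaves the window unchanged are all hypothetical choices not given in the text.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Window:
    x: float  # center, horizontal
    y: float  # center, vertical (image rows grow downwards)
    w: float  # width
    h: float  # height


def drift(win, dx, dy):
    """Shift the window without changing its size."""
    return replace(win, x=win.x + dx, y=win.y + dy)


def resize(win, side, eps):
    """Move one side by eps (positive = right/down); the opposite side stays put."""
    if side in ("left", "right"):
        w = win.w - eps if side == "left" else win.w + eps
        return replace(win, x=win.x + eps / 2, w=w) if w > 0 else win
    h = win.h - eps if side == "top" else win.h + eps
    return replace(win, y=win.y + eps / 2, h=h) if h > 0 else win


def flip(win, side):
    """Mirror the window across one of its sides."""
    if side == "left":
        return replace(win, x=win.x - win.w)
    if side == "right":
        return replace(win, x=win.x + win.w)
    if side == "top":
        return replace(win, y=win.y - win.h)
    return replace(win, y=win.y + win.h)


def propose_attribute_move(win, sigma, rng=random):
    """Pick one of the three variants uniformly and apply it."""
    move = rng.choice(["drift", "resize", "flip"])
    side = rng.choice(["left", "right", "top", "bottom"])
    if move == "drift":
        return drift(win, rng.gauss(0, sigma), rng.gauss(0, sigma))
    if move == "resize":
        return resize(win, side, rng.gauss(0, sigma))
    return flip(win, side)
```

Each variant has an obvious inverse of the same kind (a drift by the opposite offset, a resize by the opposite amount, a second flip around the same side), which is what the reversibility requirement of the MH algorithm asks for.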
Element resampling. This move is a more radical variant of the previous one: we pick an element $i$ and change all of its attributes by sampling from the prior distribution, $a'_i, x'_i \sim q(a_i, x_i)$, or from $a'_i, x'_i \sim q(a_i, x_i|I)$ if possible.
Attribute constraint enforcement. This move proposes changes to the attributes according to the current neighborhood, $a'_i, x'_i \sim q(a_i, x_i|A, X, N)$. We pick a random edge $(u, v) \sim U(D(X))$ and a direction ($u \to v$ or $v \to u$) and transfer attribute values over the edge from one element to the other according to the specific constraints, i.e. $a'_u = a_v$. For facades, we transfer both position and size from one element to the other in the dimension given by the orientation of the connecting edge, i.e. height and vertical position for a horizontal edge.
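The transfer over an edge can be sketched as copying the attributes of one dimension from the source window to the destination window. The `Window` type and the function name are hypothetical; the vertical-edge case (width and horizontal position) is assumed by symmetry with the horizontal case stated in the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Window:
    x: float  # center, horizontal
    y: float  # center, vertical
    w: float  # width
    h: float  # height


def transfer_attributes(src, dst, orientation):
    """Propose new attributes for dst by copying from its neighbor src."""
    if orientation == "h":
        # Horizontal neighbors share a row: same vertical position and height.
        return replace(dst, y=src.y, h=src.h)
    if orientation == "v":
        # Vertical neighbors share a column: same horizontal position and width.
        return replace(dst, x=src.x, w=src.w)
    raise ValueError("orientation must be 'h' or 'v'")
```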
Structure modification. We include a move that allows changes to the neighborhood structure: it picks a random edge $(u, v)$ from $q_d(u, v)$ and changes its label to $l'_{uv} = 1 - l_{uv}$, effectively suppressing or recovering the edge.
Proposals for latent heights $h_c$ are performed similarly by choosing a component $c$ uniformly and then sampling $h_c \sim \mathcal{N}(\bar{h}_c, \sigma_h)$, where $\bar{h}_c = \frac{1}{|V_c|} \sum_{i \in V_c} h_i$ is the mean height in the component.
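5.2 Reversible Jump Moves
We also need to find the number of elements $k$, which controls the dimension of the parameters $A$, $X$. In order to compare models of different dimensions, we need to define dimension matching functions $q_\rightarrow$, $q_\leftarrow$ for both the direct and the reverse move. Then the acceptance probability can be calculated as $A = \min\{1, \rho\}$, where
\[ \rho = \frac{p(\theta', I)}{p(\theta, I)} \cdot \frac{q(m_\leftarrow|\theta')}{q(m_\rightarrow|\theta)} \cdot \frac{q_\leftarrow(u_\leftarrow|\theta')}{q_\rightarrow(u_\rightarrow|\theta)} \cdot |J_\rightarrow|, \tag{11} \]
where $\rightarrow$ refers to the direct move, $\leftarrow$ to the reverse move, $u$ are dimension matching variables and $J_\rightarrow = \frac{\partial f_\rightarrow(\theta, u)}{\partial(\theta, u)}$ is the Jacobian of the transformation, following the notation given in [13]. There are three moves: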
Birth. By inserting a new element into our model we propose an increase of dimension $k \to k' = k + 1$. We choose the communication variables to be $u_\rightarrow = [a', x']$, where we sample the attributes of the new element $a', x' \sim q(a, x)$ and obtain a new state where $A' = \{A, a'\}$ and $X' = \{X, x'\}$. The corresponding dimension matching function is $f_\rightarrow(A, X, u_\rightarrow) = f_\rightarrow(\{A, X\}, [a', x'])$, which inserts $a'$ into the set, and its Jacobian is $J_\rightarrow = 1$. We will use the following notation within this paper: terms in [ ] refer to communication variables and terms in { } to parameters. The reverse move is death, for which we have no communication variable, $u_\leftarrow = [\ ]$, and only choose an element $i$ to be removed from the set. To establish reversibility, we define the inverse matching function as $f_\leftarrow(A', X', u_\leftarrow) = f_\leftarrow(\{A', X'\}, [\ ])$, where $a_i$, $x_i$ are the removed attributes and $A = A' \setminus a_i$, $X = X' \setminus x_i$. The corresponding birth move acceptance is then
\[ \rho_{birth} = \frac{p(\theta', I)}{p(\theta, I)} \cdot \frac{q(m_\leftarrow|\theta')}{q(m_\rightarrow|\theta)} \cdot \frac{q(i|k')}{q(\cdot|k)} \cdot \frac{1}{q_\rightarrow(u_\rightarrow|A)} \cdot 1, \tag{12} \]
where $q_\rightarrow(u_\rightarrow|A) = p(a)$ is the prior probability of the new window, and $q(i|k') = \frac{1}{k'}$ and $q(\cdot|k) = \frac{1}{k}$ are the probabilities of selecting the windows $a'$, $a_i$.
Death. By removing an existing element from the set we propose a decrease of dimension $k \to k' = k - 1$, and choose a window $i \sim U(1, k)$ to be removed. With an appropriate change of labeling, the derivation of the death move is the same as for birth, except for the inversion of the ratios in (12).
Replicate. This is a special case of the birth jump that exploits the structure for predicting values for the new elements according to attribute constraints, which can be generally described as sampling from $a', x' \sim q(a, x|N)$. For facades, we uniformly sample an edge $(u, v) \sim U(D(X))$ and place the new window at the position $x' = x_u + \gamma (x_v - x_u)$, where we choose $\gamma \sim U\left\{\frac{1}{2}, \frac{1}{3}, \frac{2}{3}, 2, -1\right\}$, and calculate the new height as $h' = \frac{1}{2}(h_u + h_v)$ and the width $w'$ analogously.
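The replicate proposal is a linear interpolation or extrapolation along an existing edge. The following sketch assumes windows are stored as (x, y, w, h) tuples and reads the coefficient set as {1/2, 1/3, 2/3, 2, -1}, i.e. three positions between the two windows and one continuation beyond each end; the function names are illustrative.

```python
import random

# Assumed coefficient set: inside the edge (1/2, 1/3, 2/3) or beyond its ends (2, -1).
GAMMAS = (1 / 2, 1 / 3, 2 / 3, 2, -1)


def replicate(win_u, win_v, gamma):
    """Propose a new window (x, y, w, h) on the line through u and v."""
    x_u, y_u, w_u, h_u = win_u
    x_v, y_v, w_v, h_v = win_v
    x_new = x_u + gamma * (x_v - x_u)
    y_new = y_u + gamma * (y_v - y_u)
    # Size is the mean of the two windows spanning the edge.
    return (x_new, y_new, 0.5 * (w_u + w_v), 0.5 * (h_u + h_v))


def sample_replicate(edges, windows, rng=random):
    """Pick an edge and a coefficient uniformly, then build the proposal."""
    u, v = rng.choice(sorted(edges))
    return replicate(windows[u], windows[v], rng.choice(GAMMAS))
```

With gamma = 2 the new window continues the row or column past v at the same spacing, and with gamma = -1 it does so past u, which lets the chain grow a regular structure one element at a time.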
5.3 Convergence and Complexity
We have found that the typical necessary number of MCMC samples (classifier calls) is proportional to the image size in pixels $|I|$ (from 30% for easy instances to 200% for difficult ones). This is good news, as we expected that the number would grow exponentially with scene complexity. As a result, we fixed the number of samples in our current method to a pessimistic estimate, but our experiments suggest that a significantly shorter sampling time could be achieved with a suitably designed stopping condition.
6 Experimental Results
We have performed a number of experiments with the implementation of window detection in facades of various styles to demonstrate the universality of our approach. We have run the Markov chain for $5 \cdot 10^5$ iterations in our experiments, which roughly equals visiting all pixels in the analyzed images.
Because the first public dataset known to us with quantitative results appeared only very recently in [10], we are among the first to compare with them. The test part of the dataset consists of 10 rectified and annotated images of facades from a street in Paris, which share attributes of the Haussmannian style but differ in lighting conditions. Direct comparison is not possible, because they segment facade pixels into eight different classes of elements and our window detector defines only two (window/non-window). To deal with this issue, we have merged the columns of the confusion matrix given in [10] into two, and the results are given in Table 2. All parameters of our model were fixed for this experiment; specifically, the size prior was set such that the most probable relative window height is $h = 0.1$ and the aspect ratio is $r = 0.5$.
The numbers in Table 2 for the window and wall classes show that our weak structure model slightly outperforms the Procedural Segmentation (PS) framework [10]. This is clearly a success, because PS benefits from a randomized forest combining 8 classifiers, trained on 15 × 15 pixel patches in 20 images from the same street as the test data, and a grammar specifically designed for the Haussmannian style. In contrast, our method is guided by far weaker cues: color of individual pixels, rectangular shape matching with image edges and a size prior. In our case the dominant role is played by the weak structural model that emerges from the data: it is able to select among objects of interest proposed by local classifiers and, at the same time, support windows completing the structure even where the classifier response is low. This allows us to achieve good results even when illumination varies and partial occlusion of windows is present, as shown in Fig. 3. The poor results of Randomized Forest (RF) segmentation from [10] included in Table 2 give an idea of how entirely unstructured approaches perform on this data.
a) Monge No. 13 b) Monge No. 43 c) Monge No. 50
Fig. 3. Visualization of results on part of the Parisian dataset [10]: facade a) is occluded by plants, in facade b) a cast shadow is present. False positive windows in c) are also window-like regions: they have a good response from both classifiers and match with the neighbors. Detected windows are shown in red, neighborhood edges in green and image edges are emphasized in blue. Results on the complete test set are available as supplemental material.
For classes other than window and wall, the results cannot be directly compared with the other methods, but they allow us to analyze the behavior of our method in such classes. Balconies typically overlap windows in the Haussmannian style, but such overlaps are somewhat randomly annotated as window or balcony in the ground truth [10], even when the appearance is the same, introducing some amount of ambiguity in the results. The shop class areas are actually formed by shop windows and the wall around them, and the visualized results show that our detector follows this interpretation. The roof area was difficult for our approach, since the color classifier considers it window-like.
While the authors of [10] claim their segmentation framework generalizes to some mild variants of Haussmannian facades, we can say our framework is not limited to any particular style at all. To support this, we demonstrate results on modern buildings in Fig. 5 and Fig. 4 a).
Table 2. Quantitative results on the Haussmannian dataset [10], shown as the percentage of pixels from the class specified in a row. The second column displays the percentage of pixels of the given class in the whole test set. RF stands for Randomized Forest, PS for Procedural Segmentation. Our window detection rate of 83% is comparable to the 81% rate for PS.
ground truth [10]     RF [10]       PS [10]       proposed      mapping of our classes
class     area        hit   miss    hit   miss    hit   miss    window   non-window
window    11          30    70      81    19      83    17
wall      48          38    62      83    17      84    16
a) Modern facade b) Irregular facade c) Sparse structure
Fig. 4. Results on facade images from Prague.
Fig. 5. Interpreted facades of a modern building. Left: Simple shape template with $t = 1$ fails to detect light windows. Right: Change to $t = 0.33$ improves the result significantly as the response from the edge likelihood is stronger.
Finally, we have made experiments with the loosely regular facade of Frank Gehry's Dancing House shown in Fig. 4 b), where window alignment shows significant deviation from a grid structure. We were successful in correctly locating all windows lying on the major plane as well as their neighborhood. The ability to handle sparse regular structures is presented in Fig. 4 c).
7 Conclusion and Future Work
We have presented a recognition framework that uses a weak structure model to locate elements in images, and demonstrated its potential in the task of window detection in facades. Our experiments have demonstrated that structural regularity given by pair-wise attribute constraints can efficiently guide a stochastic process that estimates element locations and neighborhood at the same time. We have shown that the conjunction of a weak non-specific classifier and a weak structural model can lead to performance that would be hardly achievable by a well-trained specific classifier. Despite the seemingly complex description of the model, the ideas are simple and the implementation is straightforward.
In future work we would like to endow our recognition framework with more powerful classifiers and an ability to handle relations on multiple levels that would, e.g., allow two different structural components to overlap.
Acknowledgment. This work has been supported by Google Research Award,
by the Czech Ministry of Education under project MSM6840770012 and by Grant
Agency of the CTU Prague under project SGS10/278/OHK3/3T/13.
References
1. Micusik, B., Kosecka, J.: Piecewise planar city 3D modeling from street view panoramic sequences. In: Proc. CVPR. (2009)
2. Hohmann, B., Krispel, U., Havemann, S., Fellner, D.: CITYFIT: High-quality urban reconstructions by fitting shape grammars to images and derived textured point cloud. In: Proc. of the International Workshop 3D-ARCH. (2009)
3. Pauly, M., Mitra, N., Wallner, J., Pottmann, H., Guibas, L.: Discovering structural regularity in 3D geometry. Transactions on Graphics 27 (2008) 4343
4. Gips, J.: Shape grammars and their uses. Birkhäuser (1975)
5. Zhu, S., Mumford, D.: A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision 2 (2006) 362
6. Alegre, F., Dellaert, F.: A probabilistic approach to the semantic interpretation of building facades. In: International Workshop on Vision Techniques Applied to the Rehabilitation of City Centres. (2004)
7. Müller, P., Zeng, G., Wonka, P., Van Gool, L.: Image-based procedural modeling of facades. Transactions on Graphics 26 (2007) 85
8. Mayer, H., Reznik, S.: Building facade interpretation from uncalibrated wide-baseline image sequences. ISPRS Journal of Photogrammetry and Remote Sensing 61 (2007) 371-380
9. Ripperda, N., Brenner, C.: Data driven rule proposal for grammar based facade reconstruction. Photogrammetric Image Analysis 36 (2007) 16
10. Teboul, O., Simon, L., Koutsourakis, P., Paragios, N.: Segmentation of building facades using procedural shape prior. In: Proc. CVPR. (2010)
11. Toussaint, G.T.: The relative neighbourhood graph of a finite planar set. Pattern Recognition 12 (1980) 261-268
12. McLachlan, G.J.: Finite Mixture Models. Wiley (2000)
13. Green, P.J.: Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 (1995) 711-732