
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

A Variable-Resolution Probabilistic
Three-Dimensional Model for Change Detection
Daniel Crispell, Member, IEEE, Joseph Mundy, and Gabriel Taubin, Fellow, IEEE

Abstract—Given a set of high-resolution images of a scene, it is often desirable to predict the scene's appearance from viewpoints not present in the original data for purposes of change detection. When significant 3-D relief is present, a model of the scene geometry is necessary for accurate prediction to determine surface visibility relationships. In the absence of an a priori high-resolution model (such as those provided by LIDAR), scene geometry can be estimated from the imagery itself. These estimates, however, cannot, in general, be exact due to uncertainties and ambiguities present in image data. For this reason, probabilistic scene models and reconstruction algorithms are ideal due to their inherent ability to predict scene appearance while taking into account such uncertainties and ambiguities. Unfortunately, existing data structures used for probabilistic reconstruction do not scale well to large and complex scenes, primarily due to their dependence on large 3-D voxel arrays. The work presented in this paper generalizes previous probabilistic 3-D models in such a way that multiple orders of magnitude savings in storage are possible, making high-resolution change detection of large-scale scenes from aerial and satellite imagery possible. Specifically, the inherent dependence on a discrete array of uniformly sized voxels is removed through the derivation of a probabilistic model which represents uncertain geometry as a density field, allowing implementations to efficiently sample the volume in a nonuniform fashion.

Index Terms—Computer vision, data structures, remote sensing.

Manuscript received December 1, 2010; revised March 18, 2011; accepted April 22, 2011.
D. Crispell is with the National Geospatial-Intelligence Agency, Springfield, VA 22150 USA (e-mail: [email protected]).
J. Mundy and G. Taubin are with Brown University, Providence, RI 02912 USA (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/TGRS.2011.2158439

I. INTRODUCTION

HIGH-RESOLUTION aerial imagery is quickly becoming a ubiquitous source of information in both defense and civil application domains due to the advancing technology of remote sensing and the increasing availability of aerial platforms (including unmanned aerial vehicles). As the immense volume of produced imagery data grows over time, automated processing algorithms become ever more important.

A. Change Detection

One application in which the collected imagery is frequently used as input is change detection, where a new image is collected and must be compared with the "expected" appearance of the scene given the previously observed images. Some change detection algorithms operate at large scales, typically indicating changes in land-cover type (e.g., forests, urban, and farmland). Due to the increasing availability of high-resolution imagery, however, interest in higher resolution and intraclass change detection is growing. The precise definition of "change" is application dependent in general, and in many cases, it is easier to define what does not constitute valid change [1], [2]. Typically, changes in appearance due to illumination conditions, atmospheric effects, viewpoint, and sensor noise should not be reported. Various classes of methods for accomplishing this have been attempted, a survey of which was given by Radke et al. [1] in 2005 (including the joint histogram-based method [3] used for comparison by Pollard et al. [2]). One common assumption relied on by most of these methods, as well as by more recent approaches [4], [5], is an accurate registration of pixel locations in the collected image to corresponding pixel locations in the base imagery. When the scene being imaged is relatively flat or the 3-D structure is known a priori, "rubber sheeting" techniques can be used to accomplish this. When the scene contains significant 3-D relief viewed from disparate viewpoints, however, techniques based on this assumption fail due to their inability to predict occlusions and other viewpoint-dependent effects [2]. High-resolution imagery exacerbates this problem due to the increased visibility of small-scale 3-D structure (trees, small buildings, etc.), which is ubiquitous over much of the planet. In this case, a 3-D model of the scene is necessary for accurate change detection from arbitrary viewpoints. There have been some previous works using 3-D models for change detection. Huertas and Nevatia [6] matched image edges with projections of 3-D building models with some promise but relied on the manual step of model creation. Eden and Cooper's method [7], based on the automatic reconstruction of 3-D line segments, avoids the manual model creation step, although it is unable to detect change due to occlusion since only a "wireframe" (as opposed to surface) model is constructed. Pollard et al. [2] proposed a probabilistic reconstruction method for change detection, which is the basis for the work presented here.

B. Probabilistic Models

Computing exact 3-D structure based on 2-D images is, in general, an ill-posed problem. Bhotika et al. [8] characterized the sources of difficulty as belonging to one of two classes: scene ambiguity and scene uncertainty. Scene ambiguity exists due to the existence of multiple possible photo-consistent
scenes and is a problem even in the absence of any sensor noise or violations of assumptions built into the imaging and sensor model. In the absence of prior information regarding the scene structure, there is no reason to prefer one possible reconstruction over another. The term "scene uncertainty," on the other hand, is used to describe all other potential sources of error, including sensor noise, violations of certain simplifying assumptions (e.g., Lambertian appearance), and calibration errors. The presence of scene uncertainty typically makes reconstruction of a perfectly photo-consistent scene impossible. Probabilistic models allow the scene ambiguity and uncertainty to be explicitly represented, which, in turn, allows the assignment of confidence values to visibility calculations, expected images, and other data extracted from the model. A probabilistic model can also be used to determine which areas of the scene require further data collection due to low confidence.

C. Contributions

The work presented in this paper generalizes previous probabilistic reconstruction models in such a way that multiple orders of magnitude savings in storage are possible, making precise representation and change detection of large-scale outdoor scenes possible. Specifically, the inherent dependence on a discrete array of uniformly sized voxels is removed through the derivation of a new probabilistic representation based on a density field model. The representation allows for implementations which nonuniformly sample the volume, providing high-resolution detail where needed (e.g., near surfaces) and coarser resolution in areas containing little information (e.g., in empty space) (Fig. 1). Additionally, it represents the first probabilistic volumetric model to provide a principled way to take viewing ray/voxel intersection lengths into account, enabling higher accuracy modeling and rendering. The proposed model, combined with the reconstruction and associated algorithms, comprises a practical system capable of automatically detecting change and generating photo-realistic renderings of large and complex scenes from arbitrary viewpoints based on image data alone.

D. Outline

The remainder of this paper is laid out as follows. In Section II, a brief survey of related work in the field of 3-D reconstruction is given. Section III describes the theoretical foundations of the proposed model. Sections IV and V describe the implementation using an octree data structure and the reconstruction algorithms, respectively. The application of the reconstructed models to change detection is discussed in Section VI, followed by the paper's conclusion in Section VII.

Fig. 1. Variable-resolution models (b) such as octrees allow for high-resolution representation where needed (i.e., near surfaces) with far less data than required in (a) fixed grid models. Representing surface probabilities using the proposed method allows variable-resolution models to be used for probabilistic 3-D modeling approaches to change detection.

II. RELATED WORK

There is a large body of previous work in the computer vision community involving the automatic reconstruction of 3-D models from imagery, a brief overview of which is given here. The bulk of the representations used are not probabilistic in nature and are discussed in Section II-A. Existing probabilistic methods are discussed in Section II-B.

A. Deterministic Methods

Three-dimensional reconstruction from images is one of the fundamental problems in the fields of computer vision and photogrammetry, the basic principles of which are discussed in many texts, including [9]. Reconstruction methods vary both in the algorithms used and in the type of output produced.

Traditional stereo reconstruction methods take as input two (calibrated) images and produce a depth (or height) map as output. A comprehensive review of the stereo reconstruction literature as of 2002 is given by Scharstein and Szeliski [10]. While highly accurate results are possible with recent methods [11], [12], the reconstruction results are limited to functions of the form f(x, y) and cannot completely represent general 3-D scenes on their own.

Many multiview methods are capable of computing 3-D point locations as well as camera calibration information simultaneously using the constraints imposed by feature matches across multiple images (so-called "structure from motion"). One example of such a method is presented by Pollefeys et al. [13], who use tracked Harris corner [14] features to establish correspondences across frames of a video sequence. Brown and Lowe [15] and Snavely et al. [16] use scale-invariant feature transform features [17] for the same purpose with considerable success. Snavely et al. have shown their system capable of successfully calibrating data sets consisting of hundreds of images taken from the Internet. The output of feature-based matching methods (at least in an initial phase) is a discrete and sparse set of 3-D elements which are not directly useful for the purpose of appearance prediction, since some regions (e.g., those with homogeneous appearance) will be void of features and, thus, also void of reconstructed points. It is possible to estimate a full surface mesh based on the reconstructed features [18], [19], but doing so requires imposing regularizing constraints to fill in "holes" corresponding to featureless regions. Methods based on dense matching techniques avoid the hole-filling problem but are dependent on smoothness and ordering assumptions to perform the matching. The methods presented in this paper are not dependent on any assumptions regarding the 3-D scene geometry yet produce a dense model suitable for high-resolution change detection.

B. Probabilistic Methods

As discussed in Section I-B, probabilistic 3-D models have the desirable quality of allowing a measure of uncertainty and ambiguity in the reconstructed model to be explicitly represented. Probabilistic methods are also capable of producing a complete representation of the modeled surfaces while making no assumptions about scene topology or regularizing constraints.

There exist in the literature several distinct methods for reconstructing a probabilistic volumetric scene model based on image data, all based on discrete voxel grid models. Although the methods vary in their approach, the goal is the same: to produce a volumetric representation of the 3-D scene, where each voxel is assigned a probability based on the likelihood of it being contained in the scene. The algorithms grew out of earlier "voxel coloring" algorithms [20]–[23] in which voxels are removed from the scene based on photometric consistency and visual hull constraints. Voxel coloring methods are prone to errors due to scene uncertainty; specifically, violations of the color consistency constraint often manifest themselves as incorrectly carved "holes" in the model [24]. To combat these errors, probabilistic methods do not "carve" voxels but rather assign each a probability of existing as part of the model. Broadhurst et al. [25] assign a probability to each voxel based on the likelihood that the image samples originated from a distribution with small variance, rather than making a binary decision. Similarly, Bhotika et al. [8] carve each voxel with a probability based on the variance of the samples in each of a large number of runs. The final voxel probability is computed as the probability that the voxel exists in a given run.

In addition to uncertainty due to noise and other unmodeled phenomena, any reconstruction algorithm must also deal with scene ambiguity, the condition which exists when multiple photo-consistent reconstructions are possible given a set of collected images. If certain a priori information about the scene is available, the information may be used to choose the photo-consistent reconstruction which best agrees with the prior. The reconstruction algorithm presented in this paper is as general as possible and, thus, does not utilize any such prior. Another approach is to define a particular member of the set of photo-consistent reconstructions as "special" and aim to recover that member. This is the approach taken by Kutulakos and Seitz [21] and Bhotika et al. [8]. Kutulakos and Seitz define the photo hull as the tightest possible bound on the true scene geometry, i.e., the maximal photo-consistent reconstruction. They show that, under ideal conditions, the photo hull can be recovered exactly, while Bhotika et al. present a stochastic algorithm for probabilistic recovery of the photo hull in the presence of noise. The photo hull provides a maximal bound on the true scene geometry but does not contain any information about the distribution of possible scene surfaces within the hull. A third approach is to explicitly and probabilistically represent the full range of possible scene reconstructions. Broadhurst et al. [25] aim to reconstruct such a representation, as does Pollard et al. [2] for the purpose of change detection. Pollard et al. use an online Bayesian method to update voxel probabilities with each observation. Because a model which fully represents both scene ambiguities and uncertainties and is capable of change detection is desired, the model and algorithms presented in this paper are based on this approach.

Fig. 2. (Left) Three viewing rays pass through a voxel, which (right) is then subdivided to achieve higher resolution. Assuming that no further information about the voxel is obtained, the occlusion probabilities P(Q_A1), P(Q_B1), and P(Q_C1) should not depend on the level of subdivision and should be equal to P(Q_A2), P(Q_B2), and P(Q_C2), respectively.

One quality that current volumetric probabilistic reconstruction methods all share is that the voxel representation is inherently incapable of representing the true continuous nature of surface location uncertainty. Using standard models, occlusion can only occur at voxel boundaries, since each voxel is modeled as being either occupied or empty. A side effect of this fact is that there is no principled way to take into account the length of viewing ray/voxel intersections when computing occlusion probabilities, which limits the accuracy of the computations. These limitations are typically handled in practice by the use of high-resolution voxel grids, which minimize the discretization effects of the model. Unfortunately, high-resolution voxel grids are prohibitively expensive, requiring O(n^3) storage to represent scenes with linear resolution n.

1) Variable-Resolution Probabilistic Methods: The benefits of an adaptive variable-resolution representation are clear: in theory, a very high effective resolution can be achieved without the O(n^3) storage requirements imposed by a regular voxel grid. One hurdle to overcome is the development of a probabilistic representation which is invariant to the local level of discretization. A simple example is informative.

Consider a single voxel pierced by three distinct viewing rays, as shown in Fig. 2 (left). After passing through the single voxel, the viewing rays have been occluded with probabilities P(Q_A1), P(Q_B1), and P(Q_C1), respectively. Given that no further information about the volume is obtained, the occlusion probabilities should not change if the voxel is subdivided to provide finer resolution, as shown in Fig. 2 (right). In other words, the occlusion probabilities should not be inherently tied to the level of discretization.

Using traditional probabilistic methods, P(Q_A1) = P(Q_B1) = P(Q_C1) = P(X ∈ S), where P(X ∈ S) is the probability that the voxel belongs to the set S of occupied voxels. Upon subdivision of the voxel, eight new voxels are created, each of which must be assigned a surface probability P(X_child ∈ S). Whatever the probability chosen, it is assumed to be constant among the eight "child" voxels since there is no reason for favoring one over any of the others. Given that
rays A, B, and C pass through four child voxels, two child voxels, and one child voxel, respectively, the new occlusion probabilities are computed as the probability that any of the voxels passed through belongs to the set S of surface voxels. This is easily solved using De Morgan's laws by instead computing the complement of the probability that all voxels passed through are empty:

P(Q_A2) = 1 − (1 − P(X_child ∈ S))^4    (1)

P(Q_B2) = 1 − (1 − P(X_child ∈ S))^2    (2)

P(Q_C2) = P(X_child ∈ S).    (3)

Obviously, the three occlusion probabilities cannot all be equal to the original value P(X ∈ S), except in the trivial cases P(X ∈ S) = P(X_child ∈ S) = 0 or P(X ∈ S) = P(X_child ∈ S) = 1. This simple example demonstrates the general impossibility of resampling a traditional voxel-based probabilistic model while maintaining the semantic meaning of the original. This presents a major hurdle to generalizing standard probabilistic 3-D models to variable-resolution representations.

The methods proposed in this paper solve the problems associated with resolution dependence by modeling surface probability as a density field rather than a set of discrete voxel probabilities. The density field is still represented discretely in practice, but the individual voxels can be arbitrarily subdivided without affecting occlusion probabilities, since the density is a property of the points within the voxel and not of the voxel itself. Occlusion probabilities are computed by integrating the density field along viewing rays, providing a principled way to take voxel/viewing ray intersection lengths into account. The derivation of this density field is presented in Section III.

III. OCCLUSION DENSITY

In order to offset the prohibitively large storage costs and discretization problems of the regular voxel grid on which traditional probabilistic methods are based, a novel representation of surface probability is proposed in the form of a scalar function termed the occlusion density. The occlusion density at a point in space can be thought of as a measure of the likelihood that the point occludes points behind it along the line of sight of a viewer, given that the point itself is unoccluded. More precisely, the occlusion density value at a point is a measure of occlusion probability per unit length of a viewing ray passing through the point.

If the occlusion density is defined over a volume, probabilistic visibility reasoning can be performed for any pair of points within the volume. In the case where surface geometry exists and is known completely [e.g., scenes defined by a surface mesh or digital elevation model (DEM)], the occlusion density is defined as infinite at the surface locations and zero elsewhere.

Given a ray in space defined by its origin point q and a unit direction vector r, the probability of each point x along the ray being visible from q may be computed. It is assumed here that q is the position of a viewing camera and r represents a viewing ray of the camera, but the assumption is not necessary in general. Points along the line of sight may be parameterized by s, the distance from q

x(s) = q + sr,  s ≥ 0.    (4)

Given the point q and viewing ray r, a proposition V_s may be defined as follows:

V_s ≡ "The point along r at distance s is visible from q."    (5)

The probability P(V_s) is a (monotonically decreasing) function of s and is written using the notation vis(s)

vis(s) ≡ P(V_s).    (6)

Given a segment of r with arbitrary length ℓ beginning at the point with distance s from q, the segment occlusion probability P(Q_s^ℓ) is defined as the probability that the point at distance s + ℓ is not visible, given that the point at distance s is visible

P(Q_s^ℓ) = P(V̄_{s+ℓ} | V_s) = 1 − P(V_{s+ℓ} | V_s).    (7)

Using Bayes' theorem

P(Q_s^ℓ) = 1 − P(V_s | V_{s+ℓ}) P(V_{s+ℓ}) / P(V_s).    (8)

Substituting vis(s) for the visibility probability at distance s and recognizing that P(V_s | V_{s+ℓ}) = 1

P(Q_s^ℓ) = 1 − vis(s + ℓ)/vis(s) = (vis(s) − vis(s + ℓ))/vis(s).    (9)

If an infinitesimal segment length ℓ = ds is used, (9) can be written as

P(Q_s^ds) = −∂vis(s)/vis(s)    (10)

P(Q_s^ds)/ds = −vis′(s)/vis(s).    (11)

The left-hand side of (11) is a measure of occlusion probability per unit length and defines the occlusion density α at the point x(s); the estimation of occlusion density values from imagery is discussed in Section V

α(x(s)) ≡ −vis′(s)/vis(s).    (12)
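The definition (12) states that α is the negative logarithmic derivative of vis, so visibility decays exponentially with the density accumulated along the ray. This relationship can be checked numerically; the sketch below uses a synthetic occlusion density with two "surfacelike" peaks (illustrative values only, not from the paper):

```python
import numpy as np

# Synthetic occlusion density with two "surfacelike" peaks
# (illustrative values only; not from the paper).
def alpha(s):
    return (4.0 * np.exp(-0.5 * ((s - 2.0) / 0.1) ** 2)
            + 8.0 * np.exp(-0.5 * ((s - 5.0) / 0.2) ** 2))

s = np.linspace(0.0, 8.0, 4001)
ds = s[1] - s[0]

# Accumulate the density along the ray (trapezoidal rule); then
# vis(s) = exp(-integral_0^s alpha(t) dt).
accum = np.concatenate(([0.0],
                        np.cumsum(0.5 * (alpha(s)[1:] + alpha(s)[:-1]) * ds)))
vis = np.exp(-accum)

# vis(0) = 1 and vis is monotonically decreasing, as required.
assert vis[0] == 1.0
assert np.all(np.diff(vis) <= 0.0)

# Recovering alpha = -vis'(s)/vis(s) by finite differences
# reproduces the density we started from (eqs. 11-12).
alpha_rec = -np.gradient(vis, ds) / vis
assert np.allclose(alpha_rec[1:-1], alpha(s)[1:-1], atol=0.05)
```

Note that vis drops sharply wherever α peaks, matching the qualitative behavior described for the density field.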
2) Visibility Probability Calculation: The visibility probability of the point at distance s along a viewing ray can be derived in terms of the occlusion density function along the ray by integrating both sides of (12) with respect to a dummy variable of integration t

∫_0^s α(t) dt = ∫_0^s −∂vis(t)/vis(t)

−∫_0^s α(t) dt = [ln(vis(t)) + c]_0^s

−∫_0^s α(t) dt = ln(vis(s)) − ln(vis(0)).    (13)

Finally, by recognizing that vis(0) = 1 and placing both sides of (13) in an exponential

vis(s) = e^{−∫_0^s α(t) dt}.    (14)

Equation (14) gives a simple expression for the visibility probability in terms of the occlusion density values along the viewing ray.

An example of corresponding occlusion density and visibility functions is shown in Fig. 3, which depicts a camera ray piercing a theoretical volume for which the occlusion density is defined at each point within the volume. The value of the occlusion density α(s) as a function of distance along the camera ray is plotted, indicating two significant peaks in surface probability. The resulting visibility probability function is plotted directly below it.

Fig. 3. (a) Viewing ray originating at the camera pierces the volume of interest. (b) (Top) (Arbitrary) example plot of occlusion density along the viewing ray. (Bottom) Resulting visibility probability along the viewing ray.

3) Occlusion Probability: Substituting (14) back into (9), a simplified expression of a segment's occlusion probability is obtained

P(Q_s^ℓ) = 1 − e^{−∫_s^{s+ℓ} α(t) dt}.    (15)

4) Relationship With Discrete Voxel Probabilities: The key theoretical difference between the discrete voxel probabilities P(X) of existing methods and the preceding formulation of occlusion density is the interpretation of the probability values. Because existing methods effectively model the probability that a voxel boundary is occluding (whether they are defined as such or not), the path length of the viewing ray through the voxel is irrelevant. By contrast, the occlusion probabilities P(Q_s^ℓ) represent the probability that the viewing ray is occluded at any point on the interval [s, s + ℓ]. The path length ℓ becomes important when one moves away from high-resolution regular voxel grids to variable-resolution models because its value may vary greatly depending on the size of the voxel and the geometry of the ray–voxel intersection.

IV. IMPLEMENTATION: OCTREE REPRESENTATION

In order to make practical use of the probabilistic model described in Section III, a finite-sized representation which is able to associate both an occlusion density and an appearance model with each point in the working volume is needed. Details are presented in this section of an octree-based implementation which approximates the underlying occlusion density and appearance functions as piecewise constant.

Fig. 4. Camera ray parameterized by s cuts through cells in an octree. Both the occlusion density and the appearance model are approximated as being constant within each cell.

Most real-world scenes contain large, slowly varying regions of low occlusion probability in areas of "open space" and high, quickly varying occlusion probability near "surfacelike" objects. It therefore makes sense to sample α(x) in a nonuniform fashion. The proposed implementation approximates both α(x) and the appearance model as being piecewise constant, with each region of constant value represented by a cell of an adaptively refined octree. The implementation details of the underlying octree data structure itself are beyond the scope of this paper; the reader is referred to Samet's comprehensive treatment [26]. Fig. 4 shows a viewing ray passing through a volume which has been adaptively subdivided using an octree data structure and the finite-length ray segments that result from the intersections with the individual octree cells.

Each cell in the octree stores a single occlusion density value α and appearance distribution p_A(i). The appearance distribution represents the probability density function of the pixel intensity value resulting from the imaging of the cell. The occlusion density value and appearance distribution are assumed to be constant within the cell. Note that the piecewise-constant assumption can be made arbitrarily accurate since, in theory, any cell in which the approximation is not acceptable can always be subdivided into smaller cells. In practice, however, the amount of useful resolution in the model is limited by the resolution of the input data used to construct it.
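Under the piecewise-constant approximation, the integral in (14) reduces to a sum over the ray–cell intersection segments, and (15) gives each segment's occlusion probability in closed form. The following is a minimal sketch; the cell values are hypothetical, and a real implementation would obtain the (α, ℓ) pairs by traversing the octree along the ray as in Fig. 4:

```python
import math

def ray_visibility_profile(cells):
    """Given (alpha_i, length_i) pairs for the octree cells a viewing ray
    intersects, in near-to-far order, return:
      - vis: visibility probability at the start of each interval, and
      - occ: probability the ray is occluded within each interval,
        i.e., 1 - exp(-alpha_i * length_i) from (15) with constant alpha.
    """
    vis, occ = [], []
    v = 1.0  # vis(0) = 1: the ray starts unoccluded at the camera
    for a, length in cells:
        vis.append(v)
        p = 1.0 - math.exp(-a * length)
        occ.append(p)
        v *= 1.0 - p  # vis(s_{i+1}) = vis(s_i) * exp(-alpha_i * length_i)
    return vis, occ

# Hypothetical ray: mostly empty space, one high-density ("surfacelike") cell.
cells = [(0.02, 1.0), (0.02, 1.0), (5.0, 0.5), (0.01, 3.0)]
vis, occ = ray_visibility_profile(cells)
# Visibility drops sharply across the high-density cell.
```

Because the per-segment survival factors multiply, the running product is exactly the discretized exponential of (14); the long, nearly empty cells contribute almost nothing, which is what makes the nonuniform sampling cheap.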

A. Appearance

In addition to the occlusion density, it is necessary to define appearance information for points in space in order to perform modeling and rendering operations. Here, "appearance" describes the predicted pixel value of an imaging sensor, given that the point is visible from the sensor. In general, many factors contribute to this value: lighting conditions, viewing angle, particularities of the imaging sensor, and others. There is a large body of work in the computer graphics literature focused on modeling these factors' effect on an object's appearance, a comprehensive survey of which is given by Dorsey et al. [27]. For practical reasons, the appearance model used in this work is relatively simple; it is independent of viewing direction and is modeled using a mixture of Gaussian distributions. It is assumed that, for a given point x, a single distribution p_A(c, x) describes the probability density of the imaged value c of x. In the case of grayscale imagery, c is a scalar value and the distribution is 1-D. In the case of multiband imagery (e.g., RGB color), c is an n-dimensional vector and the distribution is defined over the n appearance dimensions. When dealing with data such as satellite imagery in which vastly different lighting conditions may occur, a separate appearance model is used for each predefined range of illumination conditions (i.e., sun positions). When performing change detection, only the relevant appearance model for the given sun position is considered. This is the approach taken by Pollard et al. [2].

V. RECONSTRUCTION FROM IMAGERY

Pollard et al. [2] estimate the probability that each voxel X of the model belongs to the set S of "surface voxels." This "surface probability" is denoted as P(X ∈ S), or simply P(X) for convenience. The voxel surface probabilities are initialized with a predetermined prior and updated with each new observation using an online Bayesian update equation (16). The update equation determines the posterior surface probabilities of each of a series of voxels along a camera ray, given their prior probabilities P(X) and an observed image D

P(X|D) = P(X) P(D|X)/P(D).    (16)

The marginal P(D) and conditional P(D|X) probabilities of observing D can be expanded as a sum of probabilities along the viewing ray. In practice, a single camera ray is traversed for each pixel in the image, and all pixel values are assumed to be independent.

Rather than a camera ray intersecting a series of voxels, the equation can be generalized to a series of N intervals along a ray parameterized by s, the distance from the camera center. The ith interval is the result of the intersection of the viewing ray with the ith octree cell along the ray, and it has length ℓ_i (see Fig. 4). The interval lengths resulting from the voxel–ray intersections are irrelevant in the formulation of Pollard et al. because the occlusion probabilities are fixed and equal to the discrete voxel surface probabilities P(X_i). The surface probability of the ith voxel is replaced by P(Q_{s_i}^{ℓ_i}), the occlusion probability (15) of the ith interval. The integral is replaced by a multiplication due to the assumption that the occlusion density has constant value α_i within the cell

P(Q_{s_i}^{ℓ_i}) = 1 − e^{−α_i ℓ_i}.    (17)

The probability that the start point of the ith interval is visible from the camera is denoted by the term vis(s_i) and can be simplified using the piecewise-constant model as follows:

vis(s_i) = exp(−Σ_{j=0}^{i−1} α_j ℓ_j).    (18)

The ith posterior occlusion probability P(Q_{s_i}^{ℓ_i} | D) is computed by closely following the formulation of Pollard et al. [2] and generalizing to a series of intervals with varying lengths

P(Q_{s_i}^{ℓ_i} | D) = P(Q_{s_i}^{ℓ_i}) P(D | Q_{s_i}^{ℓ_i}) / P(D).    (19)

In order to simplify computations, the term pre_i, which represents the probability of observing D taking into account segments 0 to i − 1 only, is defined as follows:

pre_i ≡ Σ_{j=0}^{i−1} P(Q_{s_j}^{ℓ_j}) vis(s_j) p_{A_j}(c_D).    (20)

The term p_{A_j}(c_D) is the probability density of the viewing ray's corresponding pixel intensity value c_D given by the appearance model of the jth octree cell along the ray. Equation (19) can then be generalized as

vis_∞ ≡ Π_{i=0}^{N−1} (1 − P(Q_{s_i}^{ℓ_i}))    (21)

P(Q_{s_i}^{ℓ_i} | D) = P(Q_{s_i}^{ℓ_i}) (pre_i + vis(s_i) p_{A_i}(c_D)) / (pre_∞ + vis_∞ p_{A_∞}(c_D))    (22)

where pre_∞ represents the total probability of the observation based on all voxels along the ray. The probability of the ray passing unoccluded through the model is represented by vis_∞ and is computed based on (21). The term p_{A_∞}(c_D) represents the probability density of the observed intensity given that the ray passes unoccluded through the volume and can be thought of as a "background" appearance model. In practice, portions of the scene not contained in the volume of interest may be visible in the image. In this case, the background appearance model represents the predicted intensity distribution of these points and is nominally set to a uniform distribution. Note that the denominator of (22) differs from the update equation of Pollard et al. [2] due to the consolidation of the "pre" and "post" terms into pre_∞ and the addition of the "background" term vis_∞ p_{A_∞}(c_D).

Equation (22) provides a method for computing the posterior occlusion probability of a viewing ray segment but must be related back to the cell's occlusion density value to be of use. This is

Fig. 5. Four representative frames from the “downtown” video sequence filmed over Providence, RI. Images courtesy of Brown University. Because of the short
duration of the sequence, moving objects such as vehicles dominate the changes from image to image.
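To make the evidence bookkeeping of (20)–(22) concrete, the following is a minimal NumPy sketch, not the authors' implementation: the per-segment prior occlusion probabilities, the appearance densities evaluated at the observed pixel value, and the background density are assumed to be supplied by the ray traversal, and all names are illustrative.

```python
import numpy as np

def posterior_segment_occlusion(p_occ, p_app, p_bg):
    """Posterior occlusion probabilities for the N segments of one viewing
    ray. p_occ[i] is the prior P(Q_i), p_app[i] is the appearance density
    p_Ai(c_D) of the observed pixel value, and p_bg is the "background"
    density p_Ainf(c_D) for rays exiting the volume unoccluded."""
    n = len(p_occ)
    vis = np.empty(n)   # vis(s_i): probability the ray reaches segment i
    pre = np.empty(n)   # pre_i: evidence contributed by segments 0..i-1
    v, acc = 1.0, 0.0
    for i in range(n):
        vis[i], pre[i] = v, acc
        acc += p_occ[i] * v * p_app[i]   # accumulate the sum in (20)
        v *= 1.0 - p_occ[i]              # ray survives segment i
    # v is now vis_inf of (21); acc is pre_inf, the full-ray evidence
    denom = acc + v * p_bg               # denominator of (22)
    return p_occ * (pre + vis * p_app) / denom
```

A single forward pass over the segments yields all of the pre_i and vis(s_i) terms, so the posteriors for every segment along a ray are obtained in O(N) time; segments whose appearance model explains the observed value better than the ray as a whole have their occlusion probability raised.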

accomplished by assuming that α_i = α(t) is constant within the segment (s_i ≤ t < s_i + ℓ_i) and by solving (17) for α

    α_i = −log[1 − P(Q_{s_i}^{ℓ_i} | D)] / ℓ_i.    (23)

In general, however, multiple viewing rays from an image will pass through a single octree cell. (This happens any time a cell projects to more than one pixel in the image.) In order to reconcile the multiple posterior probabilities, an occlusion density value is calculated by computing the probability P(Q̄) of occlusion by any of the K viewing ray segments within a cell and relating it to the total combined length ℓ̄ = Σ_k ℓ_k of all of the segments. This has the effect of giving more weight to observations with longer corresponding segment lengths

    P(Q̄) = 1 − Π_{k=0}^{K−1} [1 − P(Q_k)].    (24)

The posterior occlusion probability corresponding to the segment of the kth camera ray passing through the cell is denoted P(Q_k) for convenience

    ᾱ = −log[1 − P(Q̄)] / ℓ̄ = −Σ_{k=0}^{K−1} log[1 − P(Q_k)] / Σ_{k=0}^{K−1} ℓ_k.    (25)

The previous occlusion density value of each cell is replaced with the ᾱ value computed using all K viewing rays of the given image which pass through it.

A. Appearance Calculation

In addition to computing the occlusion density of a given octree cell, a reconstruction algorithm must also estimate the cell's appearance model. Pollard et al. update the Gaussian mixture model of each voxel with each image using an online approach based on Stauffer and Grimson's adaptive background modeling algorithm [28], and a similar approach is used here. Because the proposed multiresolution model allows for the possibility of large cells which may project to many pixels of the image, a weighted average of pixel values is computed for each cell prior to the update, with the weights proportional to the segment lengths of the corresponding viewing ray/octree cell intersections. The cell's Gaussian mixture model is then updated using the single weighted average pixel value.

B. Adaptive Refinement

Upon initialization, the model consists of a regular 3-D array of octrees, each containing a single (i.e., root) cell. At this stage, the model roughly corresponds to a regular voxel grid at a coarse resolution. As more information is incorporated into the model, however, the sampling of regions with high occlusion density may be refined. The proposed implementation subdivides a leaf cell into its eight children when its maximum occlusion probability P(Q_{max_i}) reaches a global threshold. The maximum occlusion probability of cell i is a function of the longest possible path ℓ_{max_i} through the cell (i.e., between opposite corners) and the cell's occlusion density

    P(Q_{max_i}) = 1 − exp(−ℓ_{max_i} α_i).    (26)

The occlusion densities and appearance models of the refined cells are initialized to the value of their parent cell. This process is executed after each update to the model.
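The fusion of (24) and (25) and the refinement test of (26) reduce to a few lines. The sketch below assumes cubic cells (so the opposite-corner path is √3 times the side length) and leaves the subdivision threshold as the global tunable parameter described above; the function names are illustrative, not the paper's.

```python
import math

def fused_occlusion_density(p_post, lengths):
    """Combine K posterior segment probabilities inside one cell into a
    single occlusion density: by (24)-(25), -log(1 - P(Qbar)) equals the
    sum of per-segment -log(1 - P(Q_k)) terms, divided by total length."""
    num = -sum(math.log(1.0 - p) for p in p_post)
    return num / sum(lengths)

def should_subdivide(alpha, cell_side, p_thresh):
    """Refinement test of (26): subdivide a leaf cell when the occlusion
    probability along its longest path (the diagonal) reaches p_thresh."""
    l_max = math.sqrt(3.0) * cell_side   # opposite-corner path length
    return 1.0 - math.exp(-l_max * alpha) >= p_thresh
```

Note that, by (24), −log[1 − P(Q̄)] is exactly the sum of the per-segment −log[1 − P(Q_k)] terms, so the fusion never needs to form P(Q̄) explicitly.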

Fig. 6. (Top) Ground truth segmentation used to evaluate the change detection
algorithms. Original image courtesy of Brown University. (Bottom) ROC
curves for change detection using the proposed 3-D model and a fixed grid
voxel model.

Fig. 8. Expected images generated from the viewpoint of the image used
to evaluate change detection. Because there is a low probability of a mov-
ing vehicle being at a particular position on the roads, they appear empty.
(Top) Image generated using the proposed 3-D model. (Bottom) Image gen-
erated using a fixed grid voxel model. The proposed variable-resolution model
allows for finer details to be represented, as is visible in the expanded crops.

Fig. 7. Changes detected by the proposed algorithm are marked in black. Note that the moving vehicles have low probability, as well as some false detects around building edges, presumably due to small errors in the camera model. Original image courtesy of Brown University.

VI. CHANGE DETECTION

Given a new image of a previously modeled scene, it is often useful to detect regions in the image which represent deviations from the expected intensity values. Pollard et al. [2] demonstrated their system, which was the first to use probabilistic 3-D information for the purposes of change detection, to be superior to previous 2-D approaches. In particular, the capability to model a 3-D structure allowed apparent changes due to viewpoint-based occlusions to be correctly ignored. A popular 2-D approach [3] consistently marked these regions as change, leading to high false positive rates. The proposed system utilizes the estimated probabilistic 3-D model in a similar fashion, but with the added advantage of higher resolution capability over broader areas that the multiresolution representation provides.

Given a model constructed using the methods proposed in Section V and a viewpoint defined by a camera model, a probability distribution p_A(c) for each pixel in the image can

Fig. 9. (Left) Training image of the Haifa Street region from 2002. (Middle) Ground truth changes marked in one of the test images from 2007. Most of the
changes are a result of moving vehicles on the roads. (Right) Changes detected using the proposed algorithm. Original image copyright Digital Globe.

Fig. 10. (Left) Training image of the region surrounding the construction of the U.S. embassy from 2002. (Middle) Ground truth changes marked in one of the test images from 2007. There is a variety of changes resulting from the placement of large storage containers and other construction apparatus. (Right) Changes detected using the proposed algorithm. Original image copyright Digital Globe.

be generated by integrating the appearance information of each voxel along the pixel's corresponding viewing ray R

    p_A(c) = Σ_{i∈R} vis(s_i) [1 − e^{−α_i ℓ_i}] p_{A_i}(c).    (27)

Pixel values with a low probability density are "unexpected." A binary "change mask" can therefore be created by thresholding the image of pixel probability densities. The receiver operating characteristic (ROC) curves in Figs. 6, 11, and 12 show the rate of ground-truth change pixels correctly marked as change versus the rate of pixels incorrectly identified as change. The plotted curves indicate how these values vary as the threshold τ is varied.

Two distinct collection types are investigated: full motion video collected from an aerial platform and satellite imagery. The full motion video was collected using a high-definition (1280 × 720 pixels) video camera from a helicopter flying over Providence, RI, U.S. A few representative frames are shown in Fig. 5. The model was updated using 175 frames of a sequence (in random order) in which the helicopter made a full 360° loop around a few blocks of the downtown area, and the change detection algorithm was then run on an image (not used in the training set) that contains some moving vehicles (Fig. 7).

The ROC curve in Fig. 6 demonstrates the advantage that higher resolution 3-D modeling provides to the change detection algorithm. In order to better visualize the higher resolution capability of the proposed model, a synthetic image can be generated by computing the expected intensity value E[p_A(c)] of each pixel based on the computed probability density distributions of each pixel. Fig. 8 shows expected images generated using the proposed model and the fixed grid model. Small features such as individual building windows and rooftop air-conditioning units are visible using the proposed model but blend into the background using the fixed grid model. The resolution of the fixed grid model can, in theory, always be increased to match the capabilities of the variable-resolution model, but doing so quickly becomes impractical due to the O(n³) storage requirements. The fixed grid model of the "downtown" sequence is nearly 50% larger than the variable-resolution model while providing half the effective resolution. A fixed grid model equaling the effective resolution of the variable-resolution model would require approximately 18 GB (1000% larger than the variable-resolution model).

The satellite imagery used for experimentation was collected by Digital Globe's Quickbird satellite over Baghdad, Iraq, from 2002 to 2007; these are the same data sets used by Pollard et al. [2] in their change detection experiments. Two areas of interest are focused on: a region with some high-rise buildings along Haifa Street and the region surrounding the construction site of the new U.S. embassy building. A manually determined translation is applied to bring the supplied camera model to within approximately one pixel of reprojection error. The images have a nominal resolution of approximately 0.7-m GSD. Although the 3-D structure is less pronounced than in the video sequence, it is still sufficient to pose a challenge to 2-D change detection algorithms, as shown by Pollard et al. The "haifa" and "embassy" models were updated with each of

Fig. 11. Change detection ROC curves for the "haifa" data set after (a) one pass and (b) five passes of the 28 training images. The additional passes allow the octrees in the variable-resolution model time to reach their optimal subdivision levels.

Fig. 12. Change detection ROC curves for the "embassy" data set after (a) one pass and (b) five passes of the 25 training images. After the additional passes, the variable-resolution model has again reached the effective resolution of the fixed grid model.
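The per-pixel prediction of (27) and the thresholding used to generate ROC curves like these can be sketched as follows. This is a schematic NumPy version, not the paper's code; the segment visibilities, occlusion densities, and intersection lengths are assumed to be precomputed by the ray traversal, and the names are illustrative.

```python
import numpy as np

def pixel_appearance_density(vis, alphas, lengths, seg_densities):
    """Predicted density of an observed pixel value c, per (27): each ray
    segment contributes its appearance density p_Ai(c), weighted by its
    visibility vis(s_i) and occlusion probability 1 - exp(-alpha_i*l_i)."""
    weights = vis * (1.0 - np.exp(-alphas * lengths))
    return float(np.sum(weights * seg_densities))

def change_mask(density_image, tau):
    """Binary change mask: flag pixels whose observed value has a low
    predicted probability density under the model."""
    return density_image < tau
```

Sweeping τ over the range of observed densities yields one (false positive rate, true positive rate) pair per value, tracing out an ROC curve.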

28 and 25 images, respectively, taken between 2002 and 2006.

TABLE I
COVERAGE AND MODEL SIZES FOR THE AERIAL VIDEO AND SATELLITE IMAGE TEST SETS. THE RESOLUTION LISTED FOR THE VARIABLE-RESOLUTION MODELS INDICATES THAT OF THE FINEST OCTREE SUBDIVISION LEVEL

Although this number of images appears to be sufficient to train the fixed-grid model, a larger number of images allows the cells of the variable-resolution model time to reach optimal subdivision levels. In order to accommodate this requirement, five passes of the images were used to update the models to simulate a larger number of collected images. The change detection algorithm was then run on previously unobserved images taken during 2007. Figs. 9 and 10 show a representative training image and the ground truth changes for one of the 2007 images for the two experiments. Figs. 11 and 12 show the change detection ROC curves after training the model using one pass of the images and after five passes. Because the variable-resolution models are initialized at a much coarser level than the fixed grid model, their performance suffers in the period before the proper subdivision at surface voxels occurs. After five passes, however, the effective resolution has reached that of the fixed grid models while needing a fraction of the memory and disk space. Table I lists the area modeled, effective resolution, and storage requirements of each of the models discussed. Fig. 13 provides a visualization of the storage

costs of the models, normalized by their respective effective resolutions.

Fig. 13. Storage costs of the models normalized by the number of voxels of effective resolution. The "downtown" model requires less memory per voxel in both cases because only a single appearance distribution is needed to model the constant illumination condition across video frames.

VII. CONCLUSION

The proposed novel probabilistic 3-D model allows for representations which sample the 3-D volume in a nonuniform fashion by providing a principled method for taking into account voxel/viewing ray intersection lengths. The model is a generalization of previous probabilistic approaches which makes feasible the exploitation of currently available high-resolution remote sensing imagery. Experiments have shown that this capability provides a distinct advantage for the application of change detection over previously existing models. Future work will focus on appearance models that can more efficiently model large changes in illumination conditions, as well as methods to extract surface information (e.g., DEMs) from the probabilistic models.

REFERENCES

[1] R. J. Radke, S. Andra, O. Al-Kofahi, and B. Roysam, "Image change detection algorithms: A systematic survey," IEEE Trans. Image Process., vol. 14, no. 3, pp. 294–307, Mar. 2005.
[2] T. Pollard, I. Eden, J. Mundy, and D. Cooper, "A volumetric approach to change detection in satellite images," Photogramm. Eng. Remote Sens., vol. 76, no. 7, pp. 817–831, Jul. 2010.
[3] M. Carlotto, "Detection and analysis of change in remotely sensed imagery with application to wide area surveillance," IEEE Trans. Image Process., vol. 6, no. 1, pp. 189–202, Jan. 1997.
[4] C. Benedek and T. Szirányi, "Change detection in optical aerial images by a multilayer conditional mixed Markov model," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 10, pp. 3416–3430, Oct. 2009.
[5] T. Celik and K.-K. Ma, "Unsupervised change detection for satellite images using dual-tree complex wavelet transform," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 3, pp. 1199–1210, Mar. 2010.
[6] A. Huertas and R. Nevatia, "Detecting changes in aerial views of man-made structures," in Proc. 6th ICCV, 1998, pp. 73–80.
[7] I. Eden and D. B. Cooper, "Using 3D line segments for robust and efficient change detection from multiple noisy images," in Proc. Comput. Vis.—ECCV, vol. 5305, Lecture Notes in Computer Science, D. Forsyth, P. Torr, and A. Zisserman, Eds. Berlin, Germany: Springer, 2008.
[8] R. Bhotika, D. J. Fleet, and K. N. Kutulakos, "A probabilistic theory of occupancy and emptiness," in Proc. Eur. Conf. Comput. Vis., 2002, pp. 112–132.
[9] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[10] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," Int. J. Comput. Vis., vol. 47, no. 1–3, pp. 7–42, Apr.–Jun. 2002.
[11] Z.-F. Wang and Z.-G. Zheng, "A region based stereo matching algorithm using cooperative optimization," in Proc. Comput. Vis. Pattern Recognit., 2008, pp. 1–8.
[12] Q. Yang, L. Wang, R. Yang, H. Stewénius, and D. Nistér, "Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 3, pp. 492–504, Mar. 2009.
[13] M. Pollefeys, L. V. Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch, "Visual modeling with a hand-held camera," Int. J. Comput. Vis., vol. 59, no. 3, pp. 207–232, Sep./Oct. 2004.
[14] C. Harris and M. Stephens, "A combined corner and edge detector," in Proc. 4th Alvey Vis. Conf., 1988, pp. 147–151.
[15] M. Brown and D. Lowe, "Unsupervised 3D object recognition and reconstruction in unordered datasets," in Proc. 5th Int. Conf. 3DIM, 2005, pp. 56–63.
[16] N. Snavely, S. M. Seitz, and R. Szeliski, "Modeling the world from internet photo collections," Int. J. Comput. Vis., vol. 80, no. 2, pp. 189–210, Nov. 2008.
[17] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
[18] M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S. M. Seitz, "Multi-view stereo for community photo collections," in Proc. ICCV, 2007, pp. 1–8.
[19] Y. Furukawa and J. Ponce, "Accurate, dense, and robust multi-view stereopsis," in Proc. Comput. Vis. Pattern Recognit., 2007, pp. 1–8.
[20] S. M. Seitz and C. R. Dyer, "Photorealistic scene reconstruction by voxel coloring," in Proc. CVPR, 1997, pp. 1067–1073.
[21] K. N. Kutulakos and S. M. Seitz, "A theory of shape by space carving," Int. J. Comput. Vis., vol. 38, no. 3, pp. 199–218, 2000.
[22] G. Slabaugh, B. Culbertson, T. Malzbender, and R. Schafer, "A survey of methods for volumetric scene reconstruction from photographs," in Proc. Int. Workshop Volume Graph., 2001, pp. 81–100.
[23] G. G. Slabaugh, W. B. Culbertson, T. Malzbender, M. R. Stevens, and R. W. Schafer, "Methods for volumetric reconstruction of visual scenes," Int. J. Comput. Vis., vol. 57, no. 3, pp. 179–199, May/Jun. 2004.
[24] J. S. De Bonet and P. Viola, "Roxels: Responsibility weighted 3D volume reconstruction," in Proc. Int. Conf. Comput. Vis., 1999, pp. 418–425.
[25] A. Broadhurst, T. Drummond, and R. Cipolla, "A probabilistic framework for space carving," in Proc. Int. Conf. Comput. Vis., 2001, pp. 388–393.
[26] H. Samet, Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS, M. A. Harrison, Ed. Reading, MA: Addison-Wesley, 1990.
[27] J. Dorsey, H. Rushmeier, and F. Sillion, Digital Modeling of Material Appearance. San Mateo, CA: Morgan Kaufmann, 2007.
[28] C. Stauffer and W. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. Comput. Vis. Pattern Recognit., 1999, pp. 246–252.

Daniel Crispell (M'11) received the B.S. degree in computer engineering from Northeastern University, Boston, MA, in 2003 and the M.S. and Ph.D. degrees in engineering from Brown University, Providence, RI, in 2005 and 2010, respectively.
While at Brown University, his research focused on camera-based devices for 3-D geometry capture, aerial video registration, and 3-D modeling and rendering from aerial and satellite imagery. He is currently a Visiting Scientist at the National Geospatial-Intelligence Agency, Springfield, VA.

Joseph Mundy received the B.S. and Ph.D. degrees in electrical engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1963 and 1969, respectively.
He joined General Electric Global Research in 1963. In his early career at GE, he carried out research in solid state physics and integrated circuit devices. In the early 1970s, he formed a research group on computer vision with emphasis on industrial inspection. His group developed a number of inspection systems for GE's manufacturing divisions, including a system for the inspection of lamp filaments that exploited syntactic methods in pattern recognition. During the 1980s, his group moved toward more basic research in object recognition and geometric reasoning. In 1988, he was named a Coolidge Fellow, which awarded him a sabbatical at Oxford University, Oxford, U.K. At Oxford, he collaborated on the development of theory and application of geometric invariants. In 2002, he retired from GE Global Research and joined the School of Engineering, Brown University, Providence, RI. At Brown University, his research is in the area of video analysis and probabilistic computing.

Gabriel Taubin (M'86–F'01) received the Licenciado en Ciencias Matemáticas degree from the University of Buenos Aires, Buenos Aires, Argentina, and the Ph.D. degree in electrical engineering from Brown University, Providence, RI.
In 1990, he joined IBM, where during a 13-year career in the Research Division, he held various positions, including Research Staff Member and Research Manager. In 2003, he joined the School of Engineering, Brown University, as an Associate Professor of Engineering and Computer Science. While on sabbatical from IBM during the 2000–2001 academic year, he was appointed Visiting Professor of Electrical Engineering at the California Institute of Technology, Pasadena. While on sabbatical from Brown during the spring semester of 2010, he was appointed Visiting Associate Professor of Media Arts and Sciences at Massachusetts Institute of Technology, Cambridge. He serves as a member of the editorial board of the Geometric Models journal. He has made significant theoretical and practical contributions to the field now called Digital Geometry Processing: to 3-D shape capturing and surface reconstruction and to geometric modeling, geometry compression, progressive transmission, signal processing, and display of discrete surfaces. The 3-D geometry compression technology that he developed with his group was incorporated into the MPEG-4 standard and became an integral part of IBM products.
Prof. Taubin is the current Editor-in-Chief of the IEEE COMPUTER GRAPHICS AND APPLICATIONS MAGAZINE and has served as Associate Editor of the IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS. He was named IEEE Fellow for his contributions to the development of 3-D geometry compression technology and multimedia standards, won the Eurographics 2002 Günter Enderle Best Paper Award, and was named IBM Master Inventor.
