Interactive Acquisition of Residential Floor Plans

Young Min Kim, Jennifer Dolson, Mike Sokolsky, Vladlen Koltun, Sebastian Thrun
Stanford University
{ymkim, jldolson, mvs, vladlen, thrun}@stanford.edu

Abstract— We present a hand-held system for real-time, interactive acquisition of residential floor plans. The system integrates a commodity range camera, a micro-projector, and a button interface for user input, and allows the user to move freely through a building to capture its important architectural elements. The system uses the Manhattan world assumption, which posits that wall layouts are rectilinear. This assumption allows generating floor plans in real time, enabling the operator to interactively guide the reconstruction process and to resolve structural ambiguities and errors during the acquisition. The interactive component aids users with no architectural training in acquiring wall layouts for their residences. We show a number of residential floor plans reconstructed with the system.

I. INTRODUCTION

Acquiring an accurate floor plan of a home is a challenging task, yet it is a requirement in many situations that involve remodeling or selling a property. Original blueprints are often hard to find, especially for older residences. In practice, contractors and interior designers use point-to-point laser measurement devices to acquire a set of distance measurements. Based on these measurements, an expert creates a floor plan that respects the measurements and represents the layout of the residence.

In this paper, we present a hand-held system for indoor architectural reconstruction. The system eliminates the manual post-processing necessary for reconstructing the layout of walls in a residence. Instead, an operator with no architectural expertise can interactively guide the reconstruction process by moving freely through an interior until all walls have been observed by the system.

Our system is composed of a laptop connected to a commodity range sensor, a lightweight optical projector, and an input button interface (Figure 1, left). The real-time depth sensor is the main input modality. We use the Microsoft Kinect, a lightweight commodity device that outputs VGA-resolution range and color images at video rates. The data is processed in real time to create the floor plan by focusing on flat surfaces and ignoring clutter. The generated floor plan can be used directly for remodeling or real-estate applications, or to produce a 3D model of the interior for applications in virtual environments. In Section V, we demonstrate a number of residential wall layouts reconstructed with our system.

The attached projector is initially calibrated to have an overlapping field of view with the same image center as the depth sensor, and projects the reconstruction status onto the surface being scanned. Under normal lighting, the projector does not provide sophisticated rendering. Rather, the projection allows the user to visualize the reconstruction process. The user can then detect reconstruction errors that arise due to deficiencies in the data capture path, and can complete missing data in response. The user can also note which walls have been included in the model and easily resolve ambiguities with a simple input device.

II. RELATED WORK

A number of approaches have been proposed for indoor reconstruction in computer graphics, computer vision, and robotics. Real-time indoor reconstruction has recently been explored either with a depth sensor [1] or an optical camera [2]. The key to real-time performance is the fast registration of successive frames. Similar to [1], we fuse both color and depth information to register frames. Furthermore, our approach extends real-time acquisition and reconstruction by allowing the operator to visualize the current reconstruction status without consulting a computer screen. By making the feedback loop immediate, the operator can resolve failures and ambiguities while the acquisition session is in progress.

Previous approaches are also limited to a dense 3-D reconstruction (registration of point cloud data) with no higher-level information, which is memory intensive. A few exceptions include [3], which detects high-level features (lines and planes) to reduce complexity and noise. These high-level structures, however, do not necessarily correspond to actual meaningful structure. In contrast, our system identifies and focuses on significant architectural elements using the Manhattan world assumption, which is based on the observation that many indoor scenes are largely rectilinear [4]. This assumption has been widely used for indoor scene reconstruction from images to overcome the inherent limitations of image data [5], [6]. The stereo method only reconstructs the locations of image feature points, and the Manhattan world assumption successfully fills the area between the sparse feature points during a post-processing step. Similarly, our system differentiates between architectural features and miscellaneous objects in the space, produces a clean architectural floor plan, and simplifies the representation of the environment. Even with the Manhattan world assumption, however, the system still cannot fully resolve ambiguities introduced by large furniture items and irregular features in the space without user input. This interactive capability relies on the system's ability to integrate new input into a global map of the space in real time.
Fig. 1. Our hand-held system is composed of a projector, a Microsoft Kinect sensor, and an input button (left). The system uses augmented reality
feedback (middle left) to project the status of the current model onto the environment and to enable real-time acquisition of residential wall layouts (middle
right). The floor plan (middle right) and visualization (right) were generated using data captured by our system.

Simplifying the representation also reduces the computational burden of processing the map. Registration of successive point clouds results in an accumulation of errors, especially for a large environment, and requires a global optimization step in order to build a consistent map. This is similar to reconstruction tasks encountered in robotic mapping and is usually solved by bundle adjustment, a costly off-line process [7], [8]. Employing the Manhattan world assumption simplifies the map construction to a one-dimensional, closed-form problem.

The augmented reality component of our system is inspired by the SixthSense project [9]. Instead of simply augmenting a user's view of the world, however, our projected output serves to guide an interactive reconstruction process. Directing the user in this way is similar to re-photography [10], where a user is guided to capture a photograph from the same viewpoint as in a previous photograph. By using a micro-projector as the output modality, our system allows the operator to focus on interacting with the environment.

III. SYSTEM OVERVIEW AND USAGE

The data acquisition process is initiated by pointing the sensor at a corner, where three mutually orthogonal planes meet. This defines the Manhattan-world coordinate system. The attached projector indicates successful initialization by overlaying blue-colored planes with white edges onto the scene (Figure 2 (a)). After the initialization, the user scans each room individually as he or she loops around it holding the device. If the movement is too fast or if there are not enough features, a red projection guides the user to recover the position of the device (Figure 2 (b)).

The system extracts flat surfaces that align with the Manhattan coordinate system and creates complete rectilinear polygons, even when the connectivity between planes is occluded. Sometimes, the user might not want some of the extracted planes (parts of furniture or open doors) to be included in the model even if they satisfy the Manhattan-world constraint. When the user clicks the input button (left click), the extracted wall toggles between inclusion (indicated in blue) and exclusion (indicated in grey) from the model (Figure 2 (c)). As the user finishes scanning a room, he or she walks toward another room. A new rectilinear polygon is initiated by a right click. Another rectilinear polygon is similarly created by including the selected planes, and the room is correctly positioned in the global coordinate system. The model is updated in real time and stored in either a CAD format or a 3-D mesh format that can be loaded into most 3-D modeling software.

IV. DATA ACQUISITION PROCESS

At each time step t, the sensor produces a new frame of data, F^t = {X^t, I^t, P^t, T^t}. The sensor output is composed of a range image X^t (a 2-D array of depth measurements) and a color image I^t. During the acquisition process, we represent the relationship between the planes in the global map M^t and the measurement in the current frame X^t as P^t, a 2-D array of plane labels for each pixel. T^t represents the transformation from the frame F^t, which is the measurement relative to the current sensor position, to the global coordinate system, in which the map M^t is defined.

Throughout the data capture session, the system maintains the global map M^t and the two most recent frames, F^{t-1} and F^t. Additionally, the frame with the last observed corner, F^c, is stored to recover the sensor position when tracking is lost. Instead of storing information from all frames, we keep the total computational and memory requirements minimal by incrementally updating the global map only with components that need to be added to the final model.

The map M^t is composed of loops of axis-parallel planes L_r^t; each room has its own loop of planes. Each plane has an axis label (x, y, or z) and an offset value (e.g. x = x_0), as well as links to its left and right planes if the connectivity has been observed. A plane can be selected or ignored based on user input. The selected planes are extracted from L_r^t as the loop of the room R_r^t, which can be converted into the floor plan as a 2-D rectilinear polygon. Both L_r^t and R_r^t are constrained to have alternating axis labels (x and y); for the z direction (vertical), we keep only the ceiling and the floor. We also keep the sequence of observation (S^x, S^y, and S^z) of offset values for each axis direction, and we store the measured distance and the uncertainty of the measurement between planes.

The overall reconstruction process is summarized in Figure 2. As mentioned in Sec. III, the process is initiated by extracting three mutually orthogonal planes when the user points the system at one of the corners.
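To make the representation above concrete, the following is a minimal sketch of the per-frame and map data structures, including the conversion of a loop of walls into a 2-D rectilinear polygon. All class and field names are our own illustration; the paper does not publish an implementation.

```python
# Sketch of the per-frame and map data structures of Sec. IV.
# All names are illustrative, not taken from the authors' code.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Frame:
    """F^t = {X^t, I^t, P^t, T^t} for one sensor frame."""
    X: np.ndarray        # range image: H x W depth measurements
    I: np.ndarray        # color image: H x W x 3
    P: np.ndarray        # plane labels per pixel: H x W (-1 = unlabeled)
    T: np.ndarray        # 4 x 4 rigid transform, frame -> global coords

@dataclass
class Plane:
    """An axis-parallel plane, e.g. axis='x', offset=x0 means x = x0."""
    axis: str            # 'x', 'y', or 'z'
    offset: float        # offset value along the axis
    selected: bool = True     # toggled by the user's left click
    left: 'Plane' = None      # neighboring wall, if connectivity observed
    right: 'Plane' = None

@dataclass
class Room:
    """A loop L_r of axis-parallel walls with alternating x/y labels."""
    loop: list = field(default_factory=list)   # walls, in loop order

    def to_polygon(self):
        """Convert the wall loop into a 2-D rectilinear polygon:
        consecutive walls alternate between x and y axes, so each
        corner is the intersection (x_i, y_j) of neighboring walls."""
        walls = [p for p in self.loop if p.axis in ('x', 'y')]
        corners = []
        for a, b in zip(walls, walls[1:] + walls[:1]):
            assert a.axis != b.axis, "loop must alternate x and y walls"
            x = a.offset if a.axis == 'x' else b.offset
            y = a.offset if a.axis == 'y' else b.offset
            corners.append((x, y))
        return corners
```

For a rectangular room with walls x = 0, y = 0, x = 5, y = 4, this yields the corners (0, 0), (5, 0), (5, 4), (0, 4), i.e. the floor-plan polygon described in the text.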
[Figure 2: pipeline flowchart. Fetch a new frame → initialization or pair-wise registration → plane extraction → global adjustment (existing planes) or map update (new planes) → user interaction: visual feedback, left click to select planes, right click to start a new room; on registration failure, the user adjusts the data capture path.]

Fig. 2. System overview and usage (Section III). When an acquisition session is initiated by observing a corner, the user is notified by a blue projection
(a). After the initialization, the system updates the camera pose by registering consecutive frames. If a registration failure occurs, the user is notified by
a red projection and is required to adjust the data capture path (b). Otherwise, the updated camera configuration is used to detect planes that satisfy the
Manhattan-world constraint in the environment and to integrate them into the global map. The user interacts with the system by selecting planes in the
space (c). When the acquisition session is completed, the acquired map is used to construct a floor plan consisting of user-selected planes.
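The control flow of Figure 2 can be condensed into a per-frame dispatch loop. The sketch below is hypothetical: every helper function and device object named here is assumed, not taken from the authors' code.

```python
# Hypothetical per-frame dispatch loop for the pipeline in Figure 2.
# All helpers named here are assumptions, not the paper's API.
def acquisition_loop(sensor, projector, button, global_map):
    prev = None
    while sensor.is_running():
        frame = sensor.fetch_frame()                 # new frame F^t
        if prev is None:
            if try_initialize(frame, global_map):    # corner observed
                projector.show_success(frame)        # blue planes
                prev = frame
            continue
        ok = pairwise_registration(prev, frame)      # Sec. IV-A
        if not ok:
            projector.show_failure()                 # red: adjust path
            prev = recover_from_corner(sensor, global_map)
            continue
        planes = extract_planes(frame, global_map)   # Sec. IV-B
        for plane in planes:
            if plane in global_map:                  # previously seen
                global_adjustment(global_map, frame, plane)  # Sec. IV-C
            else:
                update_map(global_map, plane)        # Sec. IV-D
        click = button.poll()
        if click == 'left':
            toggle_selected_plane(global_map, frame) # include/exclude
        elif click == 'right':
            start_new_room(global_map)               # new loop of planes
        projector.show_status(frame, global_map)     # project model state
        prev = frame
```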

To detect planes in the range data, we fit plane equations to groups of range points and their corresponding normals using the RANSAC algorithm [11]: we first randomly sample a few points, then fit a plane equation to them. We then test the detected plane by counting the number of points that can be explained by the plane equation. After convergence, the detected plane is classified as valid only if the detected points constitute a large, connected portion of the depth information within the frame. If three planes are detected and they are mutually orthogonal, we assign the x, y, and z axes to the normal directions of these three planes, which form the right-handed coordinate system for our Manhattan world. Now the map M^t has two planes (ignoring the floor or ceiling), and the transformation T^t between M^t and F^t is also found.
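A minimal version of the RANSAC plane fit described above might look as follows. The iteration count, inlier tolerance, and minimum-support fraction are our assumptions, and the connected-component test from the text is omitted for brevity.

```python
# Sketch of RANSAC plane detection on a depth point cloud (Sec. IV).
# Thresholds are illustrative; the paper does not specify exact values.
import numpy as np

def ransac_plane(points, iters=200, inlier_tol=0.02, min_frac=0.2):
    """points: N x 3 array. Returns (normal, offset, inlier mask) or None."""
    best_mask, best_plane = None, None
    n = len(points)
    rng = np.random.default_rng()
    for _ in range(iters):
        sample = points[rng.choice(n, 3, replace=False)]
        # plane normal from the three sampled points
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        normal /= norm
        offset = normal @ sample[0]          # plane: normal . p = offset
        dist = np.abs(points @ normal - offset)
        mask = dist < inlier_tol             # points explained by the plane
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_plane = mask, (normal, offset)
    if best_mask is None or best_mask.sum() < min_frac * n:
        return None                          # not a large enough portion
    return best_plane[0], best_plane[1], best_mask
```

In the full system this fit runs on the initial corner view; three mutually orthogonal RANSAC planes then fix the Manhattan axes as described above.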
A new measurement F^t is registered with the previous frame F^{t-1} by aligning depth and color features (Sec. IV-A). This registration is used to update T^{t-1} to a new transformation T^t. Then, we extract planes that satisfy the Manhattan world from T^t(F^t) (Sec. IV-B). If the extracted planes already exist in L_r^t, the current measurement is compared with the global map and the registration is refined (Sec. IV-C). If a new plane is extracted, or if there is user input specifying the map structure, the map is updated accordingly (Sec. IV-D).

A. Pair-wise registration

To propagate information from previous frames and to detect new planes in the scene, each incoming frame must be registered with respect to the global coordinate system. To start this process, we find the relative registration between the two most recent frames, F^{t-1} and F^t. By using both the depth point clouds (X^{t-1}, X^t) and the optical images (I^{t-1}, I^t), the frames can be registered efficiently in real time (about 20 fps).

Given two point clouds, X^{t-1} = {x_i^{t-1}}_{i=1}^N and X^t = {x_i^t}_{i=1}^N, and the transformation T^{t-1} for the previous point cloud, the correct rigid transformation T^t minimizes the error between correspondences in the two sets:

    \min_{y_i^t,\, T^t} \sum_i \big\| w_i \big( T^{t-1}(x_i^{t-1}) - T^t(y_i^t) \big) \big\|^2 \qquad (1)

where y_i^t ∈ X^t is the point corresponding to x_i^{t-1} ∈ X^{t-1}. Once the correspondence is known, minimizing Eq. (1) has a closed-form solution [12].
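For fixed correspondences and weights, Eq. (1) reduces to weighted rigid alignment, whose closed-form solution [12] uses the SVD of a weighted cross-covariance matrix. A sketch (not the authors' code):

```python
# Weighted closed-form rigid alignment: the minimizer of Eq. (1) for
# fixed correspondences, via the SVD construction of [12].
import numpy as np

def rigid_align(src, dst, w):
    """Find R, t minimizing sum_i w_i ||R @ src_i + t - dst_i||^2.
    src, dst: N x 3 corresponding points; w: N positive weights."""
    w = w / w.sum()
    mu_s = w @ src                      # weighted centroids
    mu_d = w @ dst
    # 3x3 weighted cross-covariance of the centered point sets
    S = (src - mu_s).T @ ((dst - mu_d) * w[:, None])
    U, _, Vt = np.linalg.svd(S)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # avoid reflection
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

In the paper's pipeline this solve would be alternated with the correspondence updates described next (projection, homography, and silhouette matching) until convergence.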
Traditionally, the correspondence was found by searching for the closest point, which is computationally expensive. Real-time registration methods reduce the cost by projecting the 3-D points onto a 2-D image plane and assigning correspondences to points that project onto the same pixel locations [13]. However, projection only reduces the distance in the ray direction; the offset parallel to the image plane cannot be adjusted. This phenomenon can result in the algorithm failing to compensate for translation parallel to the plane, and therefore shrinking the size of the room (Figure 3).

Fig. 3. (a) Flat wall features (depicted as the triangle and circle) are observed from two different locations. Diagram (b) shows both observations with respect to the camera coordinate system. Without features, using projection-based ICP can lead to registration errors in the image-plane direction (c), while utilizing the features provides better registration (d).

Our pair-wise registration is similar to [13], but it compensates for the displacement parallel to the image plane using image features and silhouette points. Intuitively, we use a homography to compensate for errors parallel to the plane where the structure can be approximated by a plane, and silhouette points to compensate for the remaining errors where the features are not planar. We first compute the optical flow between the color images I^t and I^{t-1} and find a homography (a transformation found from tracked image features between them), which represents the displacement parallel to the image plane. Then, we use the homography to compute dense correspondences between the two frames. From the second iteration on, the correspondence is found by projecting individual points onto the image plane.

Additionally, we modify the correspondence for silhouette points (points of depth discontinuity in the foreground, shown in Figure 4). For each silhouette point in X^{t-1}, we find the closest silhouette point in X^t within a small search window around the original corresponding location. If a matching silhouette point exists, the correspondence is weighted more heavily. (We used w_i = 100 for silhouette points and w_i = 1 for non-silhouette points.) Then, the registration between the two frames for the current iteration is given in closed form. The process iterates until convergence.

Fig. 4. Silhouette points. There are two different types of depth discontinuity: the boundaries of a shadow cast on the background by a foreground object (empty circles), and the boundaries of a foreground object (filled circles). The meaningful depth features are the foreground points, which are the silhouette points used in our registration pipeline.

1) Registration failure: The real-time registration is a crucial part of our algorithm for accurate reconstruction. Even with the hybrid approach using both color and depth features, the registration can fail, and it is important to detect the failure immediately and to recover the position of the sensor. A registration failure is detected either (1) if the pair-wise registration does not converge, or (2) if there are not enough color and depth features. The first case is easily detected as the algorithm runs. The second case is detected if the optical flow did not yield a homography (a lack of color features) and there were not enough matched silhouette points (a lack of depth features).

In cases of registration failure, the projected image turns red, indicating that the user should return the system's viewpoint to the most recently observed corner. This movement usually requires only a small amount of back-tracking, as the failure is detected within milliseconds of leaving the previously registered area. Similar to the initialization step, the system extracts planes from X^t using RANSAC and matches the planes with the desired corner. We show the process of overcoming a registration failure in Figure 2 (b). The user can then deliberately move the sensor along a path with richer features, or step back to obtain a wider field of view.

B. Plane extraction

Based on the transformation T^t, we extract axis-aligned planes and associated edges. The planes and detected features provide higher-level information that relates the raw point cloud X^t to the global map M^t. Because we consider only planes that satisfy the Manhattan-world coordinate system, we can simplify the plane detection procedure.

The planes that were visible in the previous frame can easily be found using the correspondence: from the pair-wise registration (Sec. IV-A), we have the point-wise correspondence between the previous frame and the current frame, and the plane labels P^{t-1} from the previous frame are simply copied over to the corresponding locations. Then, we refine P^t by alternating between fitting points and fitting parameters.

A new plane can be found by projecting the remaining points onto the x, y, and z axes. For each axis direction, we build a histogram with a bin size of 20 cm and test the plane equation for populated bins. Compared to the RANSAC procedure used for initialization, the Manhattan world assumption reduces the number of degrees of freedom from three to one, making plane extraction more efficient.

For extracted planes, the boundary edges are also extracted; we detect groups of boundary points that can be explained by an axis-parallel line segment. We also keep the relative positions (left/right) of extracted planes. As long as the sensor is not flipped upside-down, this provides an important cue for building a room with the correct topology, even when the connectivity between neighboring planes was not observed.

1) Data association: After the planes are extracted, the data association process links the global map M^t with the extracted planes to form P^t, the 2-D array of plane labels for each pixel. The plane labels that existed in the previous frame are found automatically during plane extraction, by copying over the plane labels using the correspondences.
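Returning to the new-plane search of Sec. IV-B above: under the Manhattan assumption, a candidate plane has a single free parameter per axis, so the search is a histogram vote. A sketch, with 20 cm bins as in the text; the vote count and distance tolerance are our assumptions:

```python
# Sketch of the 1-D histogram plane proposal (Sec. IV-B): with the
# Manhattan assumption, a candidate plane has one free parameter,
# its offset along one of the three axes.
import numpy as np

def propose_planes(points, bin_size=0.20, min_votes=500, tol=0.05):
    """points: N x 3 unlabeled points in global coordinates.
    Returns a list of (axis, offset) candidates to verify."""
    candidates = []
    for axis in range(3):                        # x, y, z directions
        coords = points[:, axis]
        lo, hi = coords.min(), coords.max()
        nbins = max(1, int(np.ceil((hi - lo) / bin_size)))
        hist, edges = np.histogram(coords, bins=nbins, range=(lo, hi))
        for b in np.nonzero(hist >= min_votes)[0]:   # populated bins only
            center = 0.5 * (edges[b] + edges[b + 1])
            # test the plane equation: count points near the offset
            support = np.abs(coords - center) < tol
            if support.sum() >= min_votes:
                candidates.append(('xyz'[axis], center))
    return candidates
```

Populated bins become candidate planes that are then verified against the connectivity and boundary-edge cues described above.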
The plane labels for a newly detected plane can be found by comparing T^t(F^t) and M^t. In addition to the plane equation, the relative position with respect to other observed planes is used to label it. If the plane was not observed before, a new plane is added into L_r^t based on the left-right information. Adjacent walls should have alternating axis directions (an x = x_i wall should be connected to a y = y_j wall); if two observed walls have the same axis direction, we add the unobserved wall between them on the boundary of the planes to form a complete loop.

After the data association step, we update the sequence of observation S. The planes that have been assigned as previously observed are used for global adjustment (Sec. IV-C). If a new plane was observed, the room R_r^t is updated accordingly (Sec. IV-D).

C. Global adjustment

Due to noise in the point cloud, frame-to-frame registration is not perfect, and error accumulates over time. This is a common problem in pose estimation. Large-scale localization approaches use bundle adjustment to contain the error accumulation [7], [8]. Enforcing this global constraint involves detecting landmarks, i.e. stationary objects observed at different times during a sequence of measurements. Usually this global adjustment becomes an optimization problem in many dimensions: the problem is formulated by constraining the landmarks to predefined global locations, and by solving an energy function that encodes the noise in the pose estimates of both the sensor and the landmark locations. The Manhattan world assumption allows us to reduce the error accumulation efficiently in real time by refining our registration estimate and by optimizing the global map.

1) Refining the registration: After data association, we perform a second round of registration with respect to the global map M^t to reduce the error accumulated in T^t by the incremental, pair-wise registration. The extracted planes in P^t, if already observed by the system, have been assigned to planes in M^t with associated plane equations. For example, suppose a point T^t(x_{u,v}) = (x, y, z) has a plane label P^t(u, v) = p_k (assigned to plane k). If plane k has a normal parallel to the x axis, its plane equation in the global map M^t can be written as x = x_0 (x_0 ∈ R). The registration should then be refined to minimize ||x − x_0||^2, which is achieved by defining the corresponding point for x_{u,v} as (x_0, y, z). Corresponding points are likewise assigned for every point with a plane assignment in P^t. Given the correspondences, we can refine the registration between the current frame F^t and the global map M^t. This second round of registration reduces the error in the axis direction: in our example, the refinement is active while the plane x = x_0 is visible, reduces the uncertainty in the x direction with respect to the global map, and prevents error in the x direction from accumulating during that interval.

2) Optimizing the map: As error accumulates, the reconstructed map M^t may also require global adjustment in each axis direction. The Manhattan world assumption simplifies this global optimization into two separate, one-dimensional problems (we ignore the z direction here, but the idea extends to the 3-D case).

Figure 5 shows a simple example in the x-axis direction. Consider an overhead view of a rectangular room: there should be two walls whose normals are parallel to the x-axis. The sensor detects the first wall (x = a), sweeps around the room, observes another wall (x = b), and returns to the previously observed wall. Because of error accumulation, parts of the same wall now have two different offset values (x = a and x = c), but by observing the left-right relationship between walls, the system infers that the two walls are indeed the same wall.

Fig. 5. As errors accumulate in T^t and in the measurements, the map M^t becomes inconsistent. By comparing previous and recent measurements, the system can correct the inconsistency and update the value of c such that c = a.

To optimize the offset values, we track the sequence of observations S^x = {a, b, c} and the variance at the moment of observation for each wall, as well as the constraints represented by pairs of offset values that should coincide, C^x = {(c_11, c_12) = (a, c)}. We introduce two random variables, Δ_1 and Δ_2, to constrain the global map optimization. Δ_1 is a random variable with mean m_1 = b − a and variance σ_1^2 that represents the error accumulated between the moment the sensor observed the x = a wall and the moment it observed the x = b wall. Likewise, the random variable Δ_2 represents the error with mean m_2 = c − b and variance σ_2^2.

Whenever a new constraint is added, or when the system observes a plane that was previously observed, the global adjustment routine is triggered. This usually happens when the user has finished scanning a room by looping around it and returning to the first wall measured. By confining each axis direction, the global adjustment becomes a one-dimensional quadratic program:

    \min_{S^x} \sum_i \frac{\| \Delta_i - m_i \|^2}{\sigma_i^2} \quad \text{s.t. } c_{j1} = c_{j2}, \;\; \forall (c_{j1}, c_{j2}) \in C^x. \qquad (2)
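Since each Δ_i is the difference of consecutive offsets in S^x, Eq. (2) is a small equality-constrained weighted least-squares problem. The sketch below solves it by merging constrained offsets into shared variables and pinning the first offset with a strong prior to fix the gauge; all names and the prior weight are our assumptions.

```python
# Sketch of the one-dimensional global adjustment of Eq. (2), Sec. IV-C.
# Offsets constrained to be the same wall (e.g. c = a in Figure 5) share
# one variable; each gap measurement m_i contributes a weighted residual.
import numpy as np

def adjust_offsets(n, gaps, sigmas, equal_pairs, anchor=0.0):
    """n: number of offsets in the observation sequence S^x.
    gaps[i], sigmas[i]: measured distance and std. dev. between offsets
    i and i+1. equal_pairs: (j, k) index pairs that are the same wall."""
    rep = list(range(n))                      # merge constrained offsets
    for j, k in equal_pairs:
        rep = [rep[j] if r == rep[k] else r for r in rep]
    col = {v: i for i, v in enumerate(sorted(set(rep)))}
    rows, rhs = [], []
    for i, (m, s) in enumerate(zip(gaps, sigmas)):
        r = np.zeros(len(col))
        r[col[rep[i + 1]]] += 1.0 / s         # (s_{i+1} - s_i - m_i)/sigma_i
        r[col[rep[i]]] -= 1.0 / s
        rows.append(r)
        rhs.append(m / s)
    r0 = np.zeros(len(col))                   # strong prior pins the first
    r0[col[rep[0]]] = 1e6                     # offset (gauge freedom)
    rows.append(r0)
    rhs.append(anchor * 1e6)
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return [sol[col[rep[i]]] for i in range(n)]

# Figure 5 example: S^x = {a, b, c} with c constrained to equal a.
# adjust_offsets(3, gaps=[4.0, -3.9], sigmas=[0.05, 0.05],
#                equal_pairs=[(0, 2)])  ->  a = 0, b = 3.95, c = a
```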
Fig. 6. Selection. In sequence (a), the user is observing two new planes in the scene (colored white) and one currently included plane (colored blue). The
user selects one of the new planes by pointing at it and clicking. Then, the second new plane is added. All planes are blue in the final frame, confirming
that all planes have been successfully selected. Sequence (b) shows a configuration where the user has decided not to include the large cabinet. Sequence
(c) shows successful selection of the ceiling and the wall despite clutter.

D. Map update

Our algorithm ignores most irrelevant features using the Manhattan-world constraint. However, the system cannot distinguish architectural components from other axis-aligned objects using the Manhattan world assumption alone. For example, furniture, open doors, parts of other rooms that happen to be visible, or reflections from mirrors may be detected as axis-aligned planes. We resolve these challenging cases by allowing the user to manually specify the planes that he or she would like to include in the final model. This manual specification consists of simply clicking the input button during scanning while pointing at a plane, as shown in Figure 6. If the user enters a new room, a right click of the button indicates that the user wishes to include this room and to optimize it individually. The system creates a new loop of planes, and any newly observed planes are added to the loop.

Whenever a new plane is added to L_r^t or there is user input specifying the room structure, the map update routine extracts a 2-D rectilinear polygon R_r^t from L_r^t with the help of the user input. We start by adding all selected planes into R_r^t, as well as whichever unselected planes in L_r^t are necessary to maintain alternating axis directions. Planes with observed boundary edges have priority to be added.

V. EVALUATION

The current practice in architecture and real estate is to use a point-to-point laser device to measure distances between pairs of parallel planes. Making such measurements requires a clear, level line of sight between two planes, which may be time-consuming to find due to furniture, windows, and other obstructions. After making all the distance measurements, a user is required to manually draw a floor plan that respects the measurements. Roughly 10-20 minutes were needed to take the distance measurements in each apartment. Using our system, the data acquisition process took approximately 2-5 minutes per home to initiate, run, and generate the full floor plan. Table I summarizes the timing data for each data set. The average frame rate is 7.5 frames per second running on an Intel 2.50 GHz Dual Core laptop.

In Figure 7, we visually compare the reconstructed floor plans. The floor plans in blue are reconstructed using point-to-point laser measurements, and the floor plans in red are reconstructed by our system. For each home, the topology of the reconstructed walls agrees with the manually constructed floor plan. In all cases, the detection and labeling of planar surfaces by our algorithm enabled the user to add or remove these surfaces from the model in real time, allowing the final model to be constructed using only the important architectural elements of the scene.

The overlaid floor plans in Figure 7(c) show that the relative placement of the rooms may be misaligned. This is because our global adjustment routine optimizes rooms individually, so error can accumulate in the transitions between rooms. The algorithm could be extended to enforce global constraints on the relative placement of rooms, such as maintaining a certain wall thickness and/or aligning the outer-most walls, but such global constraints may induce other errors.

Table I contains a quantitative comparison of the errors. The reported depth resolution of the sensor is 0.01 m at 2 m, and for each model we have an average of 0.075 m error per wall. The relative error stays in the range of 2-5%, which shows that the accumulation of small registration errors continues to grow as more frames are processed.

Fundamentally, the limitations of our method are those of the Kinect sensor, the processing power of the laptop, and the assumptions made in our approach. As the accuracy of depth data is worse than that of visual features, our approach exhibits larger errors compared to visual SLAM. Some of the uncertainty could be reduced by adapting approaches from the well-explored visual SLAM literature. Still, we are limited when we cannot detect meaningful features. The Kinect sensor's reported measurement range is between 1.2 and 3.5 m from an object; outside that range, data is noisy or unavailable. As a consequence, data in narrow hallways or large atriums is difficult to collect.

Another source of potential error is a user outpacing the operating rate of approximately 7.5 fps.
[Figure 7 panels: houses 1-6, columns (a)-(c).]
Fig. 7. (a) Manually constructed floor plans generated from point-to-point laser measurements, (b) floor plans acquired with our system, and (c) overlay. For house 4, some parts (pillars in a large open space, stairs, and an elevator) were ignored by the user. The system still uses the measurements from those parts and other objects to correctly infer the relative positions of the rooms.

This frame rate already allows for a reasonable data capture pace, but with more processing power, the pace of the system could be guaranteed to always exceed normal human motion.

data set   no. of frames   run time   fps    avg. error (m)   avg. error (%)
1          1465            2m 56s     8.32   0.115            4.14
2          1009            1m 57s     8.66   0.064            1.90
3          2830            5m 19s     8.88   0.053            2.40
4          1129            2m 39s     7.08   0.088            2.34
5          1533            3m 52s     6.59   0.178            3.52
6          2811            7m 4s      6.65   0.096            3.10
ave.       1795            3m 57s     7.54   0.075            2.86

TABLE I. Accuracy comparison between floor plans reconstructed by our system and manually constructed floor plans generated from point-to-point laser measurements.

VI. CONCLUSION AND FUTURE WORK

We have presented an interactive system that allows a user to capture accurate architectural information and to automatically generate a floor plan. Leveraging the Manhattan world assumption, we create a representation that is tractable in real time while ignoring clutter. The current status of the reconstruction is projected onto the scanned environment, enabling the user to provide high-level feedback to the system. This feedback helps overcome ambiguous situations and allows the user to interactively specify the important planes that should be included in the model.

More broadly, our interactive system can be extended to other applications in indoor environments. For example, a user could visualize modifications to the space as in Figure 8, where we show a user clicking and dragging a cursor across a plane to "add" a window. This example illustrates the range of possible uses of our system.

Fig. 8. The system, having detected the planes in the scene, also allows the user to interact directly with the physical world. Here the user adds a window to the room by dragging a cursor across the wall (left). This motion updates the internal model of the world (right).

REFERENCES
[1] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, "Rgb-d mapping: Using depth cameras for dense 3d modeling of indoor environments," in ISER, 2010.
[2] R. A. Newcombe and A. J. Davison, “Live dense reconstruction with
a single moving camera,” in CVPR, 2010.
[3] A. P. Gee, D. Chekhlov, A. Calway, and W. Mayol-Cuevas, “Dis-
covering higher level structure in visual slam,” IEEE Transactions on
Robotics, vol. 24, pp. 980–990, October 2008.
[4] J. M. Coughlan and A. L. Yuille, “Manhattan world: Compass direc-
tion from a single image by bayesian inference,” in ICCV, pp. 941–
947, 1999.
[5] Y. Furukawa, B. Curless, S. Seitz, and R. Szeliski, “Reconstructing
building interiors from images,” in ICCV, pp. 80–87, 2009.
[6] C. A. Vanegas, D. G. Aliaga, and B. Benes, “Building reconstruction
using manhattan-world grammars,” in CVPR, pp. 358–365, 2010.
[7] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon,
“Bundle adjustment - a modern synthesis,” in Proceedings of the
International Workshop on Vision Algorithms: Theory and Practice,
ICCV ’99, Springer-Verlag, 2000.
[8] S. Thrun, “Robotic mapping: A survey,” in Exploring Artificial In-
telligence in the New Millenium (G. Lakemeyer and B. Nebel, eds.),
Morgan Kaufmann, 2002.
[9] P. Mistry and P. Maes, “Sixthsense: a wearable gestural interface,” in
SIGGRAPH ASIA Art Gallery & Emerging Technologies, p. 85, 2009.
[10] S. Bae, A. Agarwala, and F. Durand, “Computational rephotography,”
ACM Trans. Graph., vol. 29, no. 5, 2010.
[11] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, pp. 381–395, June 1981.
[12] P. Besl and N. McKay, "A method for registration of 3-d shapes," IEEE Trans. PAMI, vol. 14, pp. 239–256, 1992.
[13] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm," in Proc. 3DIM, 2001.
