A Multicamera Active 3D Reconstruction Approach
1. Introduction
Manual object digitizing is a tedious task and, for small objects, it can be replaced by 3D
scanners or CNC machines fitted with a suitable probe. On the other hand, large areas can be
modeled automatically using photogrammetry. However, objects ranging from a few meters to
several meters are not modeled automatically. In this chapter we present a methodology for
three-dimensional modeling that can be applied to a wide range of objects. The approach is a
synthesis of techniques and algorithms used in stereo vision together with methods used in
active laser range scanners. Finally, an artificial neural network enhances the created model.
In stereoscopy, two different views of the object are processed and features extracted from
the images are matched against each other. Depth is estimated from the disparity of the
corresponding features. Several problems occur during feature extraction and matching,
resulting in many cases in an inaccurate and ill-defined object model. In active laser
scanning systems, on the other hand, depth is computed by triangulation, that is, by measuring
the disparity of the trace of the laser beam on the object. Laser triangulation requires very
precise machining of the mechanical structure which oscillates and moves around the object.
An additional problem is that shiny parts of the object reflect the beam, and the measured
position is then incorrect.
In the presented approach several cameras survey the object so that every side of it is
visible to at least two cameras. A laser beam then scans the object and measurements are
taken by processing the images of the appropriate cameras. We do not require any knowledge
of the position of the laser beam. All calculations are made by processing the images after
the cameras have been calibrated and any errors and distortions of the lens and the sensor
have been compensated. Processing involves tracing the laser beam and performing stereoscopic
matching based on the detected beam trace. Object reconstruction follows the processing step.
During reconstruction the object is also processed by an artificial neural network, which
reduces noise and determines areas that require a higher spatial sampling frequency.
2. Stereo Vision
3. Laser Triangulation
Laser triangulation is based on the following principle. The laser probe projects a dot of
light onto the object to be measured. The light scattered from the dot is focused on a
light-sensitive device. Depending on the distance to the object, the dot is focused at
different positions on the device. The triangle formed by the laser source, the illuminated
point on the object and the trace on the light-sensitive device is used to estimate the
distance to the object. The beam then oscillates in order to scan the object.
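As a minimal illustration of this single-point principle (not the system described in this chapter), the depth of the dot can be recovered from its image coordinate once the focal length, the camera-to-laser baseline and the beam angle are known. All names, the geometric convention (beam angle measured from the optical axis) and the numeric values below are assumptions introduced only for the example.

```python
import numpy as np

def laser_depth(x_img, f, baseline, alpha):
    """Depth of the laser dot along the optical axis from its image coordinate.

    x_img    : horizontal image coordinate of the dot (same units as f)
    f        : focal length of the camera
    baseline : distance between the camera centre and the laser source
    alpha    : angle of the laser beam, measured from the optical axis
    """
    # By similar triangles: Z = f * b / (x + f * tan(alpha))
    return f * baseline / (x_img + f * np.tan(alpha))

# Illustrative values: dot imaged 2.5 mm off-centre, f = 16 mm,
# baseline = 200 mm, beam tilted 30 degrees from the optical axis.
print(laser_depth(2.5, 16.0, 200.0, np.radians(30.0)))
```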
Laser light striping is a case of structured-light illumination. Laser light has the
advantage that it is coherent and, when focused properly, can be modeled more accurately
than any other light source as a line or a plane. It also provides good contrast and can be
detected even if the scene is also illuminated with ambient light. Moreover, if a camera is
fitted with the proper filter, a laser-based system can operate near strong light sources
such as the arc in arc welding. Three-dimensional coordinates are derived from laser striping
in the following way. A plane of light is projected onto the scene, causing a stripe of light
to appear on it. The plane of light has a known equation in 3D space and every image point
defines a line in 3D space, so their intersection gives the world coordinates of each point
on the stripe. This holds as long as the camera's focal point does not lie in the light
plane. Therefore any point that can be seen by both the camera and the laser stripe can be
measured. However, concavities are not always visible and therefore cause problems. Another
problem is that the stripes are not evenly spaced on the surfaces; their density depends on
the orientation of the surface relative to the light plane. This situation can be improved by
performing an additional set of measurements, striping at a different angle.
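The intersection of a line of sight with the known light plane can be written in a few lines. The sketch below assumes the line of sight has already been expressed in world coordinates through calibration; the function and variable names are illustrative and not taken from the chapter.

```python
import numpy as np

def stripe_point(cam_center, pixel_dir, plane_n, plane_d):
    """Intersect the line of sight of an illuminated pixel with the laser plane.

    The plane satisfies plane_n . X + plane_d = 0 and pixel_dir is the
    direction of the line of sight in world coordinates.
    """
    denom = plane_n @ pixel_dir
    if abs(denom) < 1e-12:          # line of sight parallel to the light plane
        return None
    t = -(plane_n @ cam_center + plane_d) / denom
    return cam_center + t * pixel_dir

C = np.array([0.0, 0.0, 0.0])             # camera focal point (not in the plane)
d = np.array([0.1, -0.05, 1.0])           # line of sight of an illuminated pixel
n, d0 = np.array([1.0, 0.0, 0.0]), -0.5   # light plane x = 0.5
print(stripe_point(C, d, n, d0))          # world coordinates of the stripe point
```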
4. 3D Reconstruction
4.1 CALIBRATION
A simple camera model is the pinhole model, in which the camera transformation from 3D world
coordinates to 2D image coordinates is considered a perspective projection. The center of
projection is the origin of the camera coordinate system. The center of the image is defined
as the intersection of the optical axis of the lens with the image sensor.
[Figure: Epipolar geometry of the stereo pair, showing the image points PL and PR, the focal points fL and fR, the epipolar line and the epipolar plane.]
The coordinates of the point P in the right and left camera coordinate systems are given by

$$P_R = R_R P + T_R, \qquad P_L = R_L P + T_L$$

where R and T represent the rotation and translation from the world to the respective camera
coordinate system. Eliminating the world point P gives

$$P_L = R_L R_R^{-1} P_R - R_L R_R^{-1} T_R + T_L = M P_R + B$$
The perspective projections of P onto the left and right image planes are

$$x_{1L} = f\,\frac{X_{1L}}{X_{3L}}, \qquad x_{2L} = f\,\frac{X_{2L}}{X_{3L}}, \qquad x_{1R} = f\,\frac{X_{1R}}{X_{3R}}, \qquad x_{2R} = f\,\frac{X_{2R}}{X_{3R}}$$

Substituting these into the relation between the two camera frames yields

$$X_{3L}\begin{pmatrix} x_{1L}/f \\ x_{2L}/f \\ 1 \end{pmatrix} = M\, X_{3R}\begin{pmatrix} x_{1R}/f \\ x_{2R}/f \\ 1 \end{pmatrix} + B$$
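The relation between the two camera frames can be checked numerically. The following sketch, with arbitrary poses and a test point chosen only for illustration, composes M and B from the two camera transformations and verifies that P_L = M P_R + B.

```python
import numpy as np

def stereo_relation(R_L, T_L, R_R, T_R):
    """Return M and B such that P_L = M @ P_R + B for any world point P."""
    M = R_L @ np.linalg.inv(R_R)
    B = T_L - M @ T_R
    return M, B

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Arbitrary example poses and an arbitrary world point.
R_L, T_L = rot_z(0.1), np.array([0.2, 0.0, 1.0])
R_R, T_R = rot_z(-0.3), np.array([-0.2, 0.1, 1.1])
P = np.array([0.5, -0.4, 3.0])
P_L, P_R = R_L @ P + T_L, R_R @ P + T_R

M, B = stereo_relation(R_L, T_L, R_R, T_R)
print(np.allclose(P_L, M @ P_R + B))   # True
```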
The real image is quite different from the image derived from the ideal model. In order to
model distortions, we add non-linear terms to the perspective projection of a 3D world
coordinate point. In the complete model, the 3D-to-2D transformation combines the camera
translation and rotation, the perspective projection, the distortion and the sampling [2,9].
The radial distortion produces the most important effect and is modeled as a polynomial in
the distance from the image center.
[Figure: Calibration cube.]
4.2 3D MEASUREMENTS
Once the cameras have been calibrated, the algorithm is applied again whenever the position
of a camera changes. In this case we either use five points with known world coordinates in
order to compute only the external parameters of the camera, or we use the intrinsic
parameters from the first calibration as a starting point.
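As one possible illustration of re-estimating only the external parameters while keeping the intrinsics fixed (the chapter does not prescribe a particular solver), OpenCV's solvePnP can recover the pose of a moved camera from a handful of reference points with known world coordinates. The reference points, intrinsics and pose below are synthetic, introduced only for the example.

```python
import numpy as np
import cv2

# Hypothetical reference points with known world coordinates (not from the chapter).
world_pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                      [1, 1, 0], [1, 0, 1]], dtype=np.float64)
K = np.array([[800.0, 0.0, 320.0],       # intrinsic parameters kept from the
              [0.0, 800.0, 240.0],       # first calibration
              [0.0, 0.0, 1.0]])
dist = np.zeros((5, 1))                  # assume distortion already compensated

# Simulate the measured image coordinates for some "true" pose of the moved camera.
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.3, -0.1, 4.0])
image_pts, _ = cv2.projectPoints(world_pts, rvec_true, tvec_true, K, dist)

# Recover only the external parameters, the intrinsics being fixed.
ok, rvec, tvec = cv2.solvePnP(world_pts, image_pts, K, dist)
print(ok, rvec.ravel(), tvec.ravel())    # should match rvec_true, tvec_true
```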
[Figure: Camera setup.]
The scene is uniformly sampled using vertical laser beams. Using inverse perspective, as
described in the previous section, the image coordinates of the center of the laser trace are
projected into the 3D world along the line of sight. The lines of sight from the different
cameras that see the trace are intersected, and the intersection gives the coordinates of an
object point. However, due to noise and physical constraints the lines do not intersect in
the geometrical sense, so we compute the point which is at minimum distance from the lines of
sight. The sampling frequency along the object surface depends on the orientation of the
surface relative to the laser beam. Therefore areas with a low sampling frequency along their
surface are detected and sampled again using beams perpendicular to the surface.
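The point at minimum distance from the lines of sight has a closed-form least-squares solution: for each line with origin p and unit direction d, accumulate the projector I - d d' and solve the resulting 3x3 linear system. The sketch below is a generic implementation of this idea with illustrative data; it is not taken from the chapter.

```python
import numpy as np

def nearest_point_to_lines(origins, directions):
    """Point minimising the sum of squared distances to a set of 3D lines.

    origins    : (N, 3) points on the lines of sight (e.g. camera centres)
    directions : (N, 3) directions of the lines of sight
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(np.asarray(origins, float), np.asarray(directions, float)):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector onto the plane normal to d
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

# Two lines of sight that nearly, but not exactly, intersect near (1, 1, 2).
origins = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
directions = np.array([[1.0, 1.0, 2.0], [-2.0, 1.01, 2.0]])
print(nearest_point_to_lines(origins, directions))
```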
A neural network is used in order to filter out noise. The parametric function represented by
a network with one hidden layer has the form $\hat{z} = s(ax + by + c)$, where $s(\cdot)$ is a
non-linear activation function. Networks with more than one hidden layer can model complex
surfaces using a variety of basis functions. The advantage is that the geometric
characteristics of the basis functions are learned from the data instead of being specified
by the user.
The network is trained by minimizing the robust criterion

$$\sum_{p=1}^{P} \phi_t\left(z_p - \hat{z}_p\right)$$
where the shape of $\phi_t$ resembles the tanh estimator [4], whose derivative is given by

$$\psi_t(r) = \begin{cases} r & |r| \le a_t \\ a_t \tanh\!\left(\beta\,(b_t - |r|)\right)\operatorname{sgn}(r) & a_t < |r| \le b_t \\ 0 & |r| > b_t \end{cases}$$
[Figure: The functions φt(r) and ψt(r).]
When only minor noise exists in the training data, the algorithm is similar to the
back-propagation algorithm. However, when the noise level increases, the influence of the
noise on the learning process is significantly reduced, because the amount of weight
adjustment at the output layer during learning is proportional to $\psi_t(r)$ instead of the
residual r. We then need to specify q, the percentage of bad data to be tolerated by the
algorithm. Using the bootstrap method, the confidence interval of the residuals that
separates the good data from the bad is computed.
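A small sketch of the influence function ψt and its use to down-weight residuals is given below. The thresholds a_t and b_t are fixed by hand here, whereas in the method described above they would follow from the bootstrap confidence interval; all names and values are illustrative.

```python
import numpy as np

def psi_t(r, a_t, b_t, beta=1.0):
    """Tanh-estimator influence function used in place of the raw residual r."""
    r = np.asarray(r, dtype=float)
    out = np.where(np.abs(r) <= a_t, r, 0.0)        # linear zone; rejection zone stays 0
    mid = (np.abs(r) > a_t) & (np.abs(r) <= b_t)    # descending zone
    out = np.where(mid, a_t * np.tanh(beta * (b_t - np.abs(r))) * np.sign(r), out)
    return out

# Residuals between measured and predicted depths; large residuals are damped or rejected.
residuals = np.array([0.02, -0.05, 0.8, -1.7, 0.01])
print(psi_t(residuals, a_t=0.1, b_t=1.0))
```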
5. Experiments
[Figure: Steel structures.]
6. Conclusions
References