
A MULTICAMERA ACTIVE 3D RECONSTRUCTION APPROACH

Theodore Lilas, Stefanos Kollias


Dept. of Electrical and Computer Engineering
National Technical University of Athens
Zografou, Athens, Greece
[email protected]

1. Introduction

Manual object digitizing is a tedious task; for small objects it can be replaced by 3D
scanners or CNC machines fitted with a suitable probe, while large areas can be modeled
automatically using photogrammetry. However, objects whose size ranges from a few
meters to several meters are not modeled automatically. In this chapter we present
a methodology for three-dimensional modeling that can be applied to a wide range of
objects. The approach is a synthesis of techniques and algorithms used in stereo vision
together with methods used in active laser range scanners. Finally, an artificial neural
network enhances the created model.

In stereoscopy two different views of the object are processed and features
extracted from the images are matched against each other. Depth is estimated from the
disparity of the corresponding features. Several problems occur during feature
extraction and matching, resulting in many cases in an inaccurate and ill-defined object
model. In active laser scanning systems, on the other hand, depth is computed by
triangulation, that is, by measuring the disparity of the trace of the laser beam on the
object. Laser triangulation requires very precise machining of the mechanical structure
which oscillates and moves around the object. An additional problem is that shiny parts
of the object reflect the beam, and then the measured position is incorrect.

In the presented approach several cameras survey the object so that every side
of it is visible by at least two cameras. A laser beam then scans the object and
measurements are taken by processing the images of the suitable cameras. We do not
require any knowledge of the position of the laser beam. All calculations are made by
processing the images after having calibrated the cameras and compensated for any errors
and distortions of the lens and the sensor. Processing involves tracing the laser beam
and performing stereoscopic matching based on the detected beam trace. Object
reconstruction follows the processing step. During reconstruction the object is also
processed by an artificial neural network, which reduces noise and determines areas that
require a higher spatial sampling frequency.

2. Stereo Vision

Perspective projection maps the three-dimensional world to a two-dimensional
image. It maps an entire line of sight onto a single point in the image and is therefore
not directly invertible. However, multiple images taken from different viewpoints can be
combined to derive depth information by using the inverse perspective transformation. The
inverse perspective transformation determines the equation of a line of sight in three-
dimensional space given the image coordinates and the camera model.

One approach is to identify features in the two images that correspond to the
same point, derive the line of sight for each feature point using inverse
perspective, and finally intersect the two lines to obtain the 3D coordinates. The most
difficult part is identifying the corresponding parts in the two images. One way is to
perform block matching; however, the matching areas are not identical and in some
cases a feature of one image is occluded in the other. Another approach is to work on
the edges. More recent techniques address the problems of 3D-structure and motion
estimation simultaneously [1,8]. The mutual relationship between stereo disparity and
motion estimation is utilized, improving the accuracy of both estimations.
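To make the inverse perspective step concrete, the sketch below (in Python with NumPy; the function name and the simple pinhole parameterization are our own illustrative assumptions, not code from the system described here) constructs the 3D line of sight corresponding to an image point, given the camera rotation R, translation T and focal length f.

```python
import numpy as np

def line_of_sight(x1, x2, f, R, T):
    """Return a point and unit direction of the viewing ray, in world coordinates.

    (x1, x2): image coordinates relative to the principal point
    f:        effective focal length (same units as x1, x2)
    R, T:     camera pose, mapping world points P to camera points P_c = R @ P + T
    """
    # Ray direction in the camera frame: through the pixel on the image plane z = f.
    d_cam = np.array([x1, x2, f], dtype=float)
    d_cam /= np.linalg.norm(d_cam)

    # Camera center in world coordinates (the ray origin): solves R @ C + T = 0.
    C = -R.T @ T
    # Rotate the direction back into the world frame.
    d_world = R.T @ d_cam
    return C, d_world
```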

3. Laser Triangulation

Laser triangulation is based on the following principle. The laser probe projects
a dot of light onto the object to be measured. The light scattered from the dot is
focused on a light-sensitive device. Depending on the distance to the object, the dot is
focused at different positions on the device. The triangle formed by the
laser source, the illuminated point on the object and the trace on the light-sensitive
device is used to estimate the distance to the object. The beam then oscillates in order to
scan the object.

Structured light is used in many cases to derive depth information. The
principle is to illuminate the scene with a geometric pattern, which helps extract the
geometric information of the object [7]. Structured light allows deriving depth and
surface orientation information even from dull areas where we cannot extract any
features.

Laser light striping is a case of structured light illumination. Laser light has the
advantage that it is coherent and, when focused properly, can be modeled more accurately
than any other light source as a line or a plane. It also provides good contrast and can be
detected even if the scene is illuminated with ambient light. Moreover, if a camera
is fitted with the proper filter, a laser-based system can operate near strong light sources
such as the arc of arc welding. Three-dimensional coordinates are derived from laser striping
in the following way. A plane of light is projected on the scene, which causes a
stripe of light to appear on it. The plane of light has a known equation in 3D
space and every image point defines a line in 3D space, therefore their intersection gives
the world coordinates of each point on the stripe. The above holds as long as the
camera's focal point is not in the light plane. Any point that can be seen by
both the camera and the laser stripe can therefore be measured. However, concavities are not
always visible and therefore cause trouble. Another problem is that stripes are not
evenly placed on the surfaces; the density depends on the surface orientation relative
to the light plane. This situation can be improved by performing an additional set of
measurements by striping at a different angle.
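A minimal sketch of the stripe triangulation described above (Python/NumPy; the plane parameterization and function name are illustrative assumptions): the world coordinates of a stripe point follow from intersecting the viewing ray of its image point with the known light plane.

```python
import numpy as np

def intersect_ray_with_light_plane(ray_origin, ray_dir, plane_normal, plane_d):
    """Intersect the line of sight X = ray_origin + t * ray_dir with the
    light plane plane_normal . X + plane_d = 0 and return the 3D point."""
    denom = plane_normal @ ray_dir
    if abs(denom) < 1e-9:
        # Ray (nearly) parallel to the light plane: the camera's focal point
        # lies in (or too close to) the plane, so no reliable intersection exists.
        raise ValueError("viewing ray is parallel to the light plane")
    t = -(plane_normal @ ray_origin + plane_d) / denom
    return ray_origin + t * ray_dir
```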

4. 3D Reconstruction

4.1 CALIBRATION

Depth perception can be derived by processing multiple images taken from
different viewpoints. In order to extract depth it is necessary to know the relative position
of the cameras. A precise camera position and orientation is not easily available from
outside measurements, because the optical center we are looking for is inside the
camera. Moreover, if we are using a zoom lens, the optical center changes according to
the zoom factor and does not correspond to a fixed point.

Off-the-shelf cameras, compared to metric cameras, require additional
information in order to model and compensate for distortions. This problem is
solved using camera calibration techniques. The algorithm used estimates the extrinsic
parameters of the camera, describing its position and orientation, and the intrinsic
parameters, describing its internal viewing characteristics. Calibration is based on
viewing points on a calibration device whose 3D coordinates are known with great
accuracy. Although it is possible to calibrate the camera using self-calibration methods,
a special calibration pattern is necessary in order to achieve high accuracy. In order to
describe the position and orientation of each camera we need to estimate the
transformation matrix which specifies the relation between the world coordinate system
and the camera coordinate system.

The camera coordinate system C is a right-handed reference system defined as
follows: the origin of the camera reference system is positioned at the center of
projection of the lens. The z-axis points away from the camera and corresponds to the
optical axis. The x-axis is parallel to the horizontal lines of the camera image, from left
to right. The y-axis is parallel to the vertical axis of the camera image, from top to
bottom.

A simple camera model is the pinhole model, where the camera
transformation from the 3D world coordinates to the 2D image coordinates is
considered a perspective projection. The center of projection is the origin of the camera
coordinate system. The center of the image is defined as the intersection between the
optical axis of the lens and the image sensor.

[Figure: Epipolar geometry, showing the image points pL and pR of a world point P, the focal points fL and fR, an epipolar line and the epipolar plane.]

Let $p_L = (x_{1L}, x_{2L})$ and $p_R = (x_{1R}, x_{2R})$ be the image coordinates of the point $P = (X_1, X_2, X_3)$.

The coordinates of the point P in the right-camera and left-camera coordinate systems are given by

$$P_R = R_R P + T_R, \qquad P_L = R_L P + T_L$$

where R and T represent the rotation and translation transformations from the world to the camera coordinate system.

Combining the equations we have:

$$P_L = R_L R_R^{-1} P_R - R_L R_R^{-1} T_R + T_L = M P_R + B$$

where M and B are known as the relative configuration parameters.


Assuming equal focal length on both cameras:

$$x_{1L} = f\,\frac{X_{1L}}{X_{3L}}, \quad x_{2L} = f\,\frac{X_{2L}}{X_{3L}}, \qquad x_{1R} = f\,\frac{X_{1R}}{X_{3R}}, \quad x_{2R} = f\,\frac{X_{2R}}{X_{3R}}$$

Combining the equations we have:

$$X_{3L}\begin{pmatrix} x_{1L}/f \\ x_{2L}/f \\ 1 \end{pmatrix} = M\, X_{3R}\begin{pmatrix} x_{1R}/f \\ x_{2R}/f \\ 1 \end{pmatrix} + B$$

The real image is quite different from the image derived by the ideal model. In
order to model distortions we add non-linear terms to the perspective projection of a 3D
world coordinate point. In the complete model, in order to describe the 3D-to-2D
transformation, we have to combine the camera translation and rotation, the perspective
projection, the distortion and the sampling [2,9].

The radial distortion produces the most important effect and is modeled as

$$x_{1d} = x_1 + a\,x_1\,(x_1^2 + x_2^2), \qquad x_{2d} = x_2 + a\,x_2\,(x_1^2 + x_2^2)$$

and the decentering of the lens as

$$x_{1d} = x_1 + b_1\,(3x_1^2 + x_2^2) + 2 b_2 x_1 x_2, \qquad x_{2d} = x_2 + b_2\,(x_1^2 + 3x_2^2) + 2 b_1 x_1 x_2$$

Based on the calibration setup, accurate estimation was achieved by modeling
and compensating for radial and decentering lens distortions. The intrinsic camera
parameters are the scale factor (s), the effective focal length (f), the principal point (Cx,
Cy), which is the intersection of the camera coordinate frame's z-axis with the sensor
plate, and the radial distortion coefficients.
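For illustration, a forward camera model of this kind could be sketched as follows (Python/NumPy; the function name, the first-order radial term only, and applying the scale factor to the horizontal axis are our own simplifying assumptions, with decentering distortion omitted for brevity):

```python
import numpy as np

def project_point(P, R, T, f, s, cx, cy, a):
    """Project a 3D world point to distorted pixel coordinates:
    rotation/translation, perspective projection, first-order radial
    distortion, then sampling.  R, T are the extrinsic parameters;
    f, s, (cx, cy), a are the intrinsic parameters."""
    Xc = R @ P + T                      # world -> camera coordinates
    x1 = f * Xc[0] / Xc[2]              # ideal perspective projection
    x2 = f * Xc[1] / Xc[2]
    r2 = x1**2 + x2**2
    x1d = x1 + a * x1 * r2              # radial distortion
    x2d = x2 + a * x2 * r2
    u = s * x1d + cx                    # sampling / shift to the principal point
    v = x2d + cy
    return np.array([u, v])
```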
The calibration device is a cube with a chessboard pattern on each side. The
image is processed and the features extracted are the corners of the squares forming the
pattern. The features are detected with 1/10th of a pixel accuracy [5,6], which leads to
an estimation accuracy of a few millimeters for the camera position with respect to the
calibration target.

[Figure: Calibration cube]

4.2 3D MEASUREMENTS

Once the cameras have been calibrated, the algorithm is applied again
whenever the position of a camera changes. In this case we either use five points with
known world coordinates in order to compute only the external parameters of the
camera, or use the intrinsic parameters from the first calibration as a starting point.

[Figure: Camera setup]

The next step is to take 3D measurements of the object. Stereoscopic analysis
has sometimes given poor results, depending on the lighting conditions and workpiece
texture. We were able to overcome these problems by scanning the workpiece with a
laser beam. Two approaches were examined, one utilizing a laser stripe and the other a
laser spot. Laser striping is advantageous over spot triangulation because it provides
information about the continuity of the surface along the stripe. On the other hand, a laser
spot is more easily detectable, especially on distant objects, and provides better accuracy
in measuring the spot. The laser beam is traced accurately by taking into
account the luminance gradient of the spot in the image; the accuracy is much better
than the size of the laser spot and the camera pixel.
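A minimal sketch of such sub-pixel spot localization (Python/NumPy; the intensity-weighted-centroid approach and the function name are illustrative assumptions rather than the exact estimator used): the spot center is estimated from the luminance distribution of the pixels around the brightest response.

```python
import numpy as np

def subpixel_spot_center(image, threshold):
    """Estimate the laser spot center with sub-pixel accuracy as the
    luminance-weighted centroid of the pixels above a brightness threshold."""
    rows, cols = np.nonzero(image > threshold)
    if rows.size == 0:
        return None  # no laser trace visible in this image
    weights = image[rows, cols].astype(float)
    u = (cols * weights).sum() / weights.sum()   # horizontal image coordinate
    v = (rows * weights).sum() / weights.sum()   # vertical image coordinate
    return u, v
```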

The scene is uniformly sampled using vertical laser beams. Using inverse
perspective, as described in the previous section, the image coordinates of the center of
the laser trace are projected into the 3D world along the line of sight. The lines of sight
from the different cameras that see the trace are intersected, and the intersection gives
the coordinates of an object point. However, due to noise and physical
constraints, the lines do not intersect in the geometrical sense, so we compute the point
that is at minimum distance from the lines of sight. The sampling frequency along
the object surface depends on the orientation of the surface relative to the laser beam.
Therefore areas with a low sampling frequency along their surface are detected and are
sampled again using beams perpendicular to the surface.
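The point at minimum distance from several lines of sight has a closed-form least-squares solution; the sketch below (Python/NumPy, our own formulation of this standard construction rather than the authors' code) accumulates, for each ray, the projector onto the plane orthogonal to its direction and solves the resulting normal equations.

```python
import numpy as np

def closest_point_to_rays(origins, directions):
    """Return the 3D point minimizing the sum of squared distances to a set of
    rays, each given by an origin and a direction vector."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray direction
        A += P
        b += P @ o
    # A is singular only if all rays are parallel, which cannot occur for
    # cameras viewing the trace from different positions.
    return np.linalg.solve(A, b)
```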

4.3 NOISE FILTERING

Visual reconstruction from noise-corrupted data is a fundamental problem in
computer vision. Most approaches to surface reconstruction are based on generalized
spline models on a bounded domain in the (x, y) image plane. The continuous spline
surface is represented as a single-valued function z(x, y) and the computational grid is
uniform. The nodes of the surface can move only in the vertical, or z, direction during the
iterative reconstruction procedure. More advanced techniques use nonuniform sampling
and reconstruct the input data using adaptive meshes [3,10].

A neural network is used in order to filter out noise. The parametric function
represented by a network with one hidden layer has the form $\hat{z} = s(ax + by + c)$,
where $s(\cdot)$ is a non-linear activation function. Networks with more than one hidden layer
can model complex surfaces using a variety of basis functions. The advantage is that the
geometric characteristics of the basis functions are learned from the data instead of being
specified by the user.
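For illustration, such a surface network might be set up as follows (Python/NumPy sketch under our own naming; a hidden layer of tanh units whose outputs are summed at a linear output node, which is one common realization of the model described, not necessarily the exact architecture used):

```python
import numpy as np

rng = np.random.default_rng(0)
H = 20  # number of hidden units

# Hidden-layer parameters (a_i, b_i), biases c_i and linear output weights w_i.
A = rng.normal(size=(H, 2))   # A[:, 0] multiplies x, A[:, 1] multiplies y
c = rng.normal(size=H)
w = rng.normal(size=H)

def z_hat(x, y):
    """Surface height predicted by the one-hidden-layer network:
    a weighted sum of tanh basis functions s(a_i*x + b_i*y + c_i)."""
    hidden = np.tanh(A @ np.array([x, y]) + c)
    return w @ hidden
```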

The training algorithm is based on the back-propagation algorithm, modified
according to Chen [4] so that it can handle errors in the training data set. Instead of
minimizing the sum of squared errors, it minimizes a new function, which is adjusted
with the progressively refined knowledge of the noise in the data. The algorithm minimizes


$$\sum_{p=1}^{P} \phi_t\!\left(z_p - \hat{z}_p\right)$$

where the shape of $\phi_t$ resembles the tanh estimator [4], whose derivative is given as

$$\psi_t(r) = \begin{cases} r & |r| \le a_t \\ a\,\tanh\!\left(\beta\,(b_t - |r|)\right)\operatorname{sgn}(r) & a_t < |r| \le b_t \\ 0 & |r| > b_t \end{cases}$$

[Figure: plots of the influence function ψt(r) and of φt(r)]

When only minor noise exists in the training data, the algorithm is similar to the back-
propagation algorithm. However, when the noise level increases, the influence of noise on
the learning process is significantly reduced, because the amount of weight adjustment
at the output layer during learning is proportional to $\psi_t(r)$ instead of the residual r. We
then need to specify q, the percentage of bad data to be tolerated by the
algorithm. Using the bootstrap method, the confidence interval of the residuals that
separates the good from the bad data is computed.
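A sketch of how such a robust influence function could enter the weight update (Python/NumPy; the cut-off values a_t, b_t, the gain beta and the simple gradient form are illustrative assumptions, not the exact training procedure of [4]):

```python
import numpy as np

def psi_t(r, a_t, b_t, beta=1.0, a=1.0):
    """tanh-estimator influence function: behaves like the raw residual for
    small |r|, tapers off between a_t and b_t, and ignores gross outliers."""
    r = np.asarray(r, dtype=float)
    out = np.where(np.abs(r) <= a_t, r,
                   a * np.tanh(beta * (b_t - np.abs(r))) * np.sign(r))
    return np.where(np.abs(r) > b_t, 0.0, out)

# In a back-propagation step the output-layer error term r = z_p - z_hat_p
# would be replaced by psi_t(r, a_t, b_t) before propagating the gradients.
```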

5. Experiments

Experiments have been conducted in a laboratory and in an industrial
environment. In the laboratory small objects were scanned. During calibration the
target was illuminated with ambient light in order to increase the image contrast. The
calibration pattern was placed one meter away from each camera, so that the
calibration data were distributed across the field of view and over the range of the object
model, in order to accurately estimate the radial lens distortion and image-center
parameters.

[Figures: Line scan and spot scan]


In the industrial environment 3D measurements were taken in order to verify
the accuracy of the system on large steel structures. The experiments also demonstrated
the robustness of the system under poor lighting conditions, on matte objects as well as
shiny ones. Stereoscopy is not accurate on dull areas, because features cannot be detected.
Laser scanning is not accurate on shiny surfaces, because the system may detect a secondary
reflection of the beam and erroneously assume that this point lies on the laser plane.

[Figure: Steel structures]

In order to scan objects at different angles, we have utilized a robotic system
which is capable of moving the laser in all directions in space. This has also been a way
of examining the accuracy of the measurements, which were accurate to 1 mm over the
range of a 500 mm object. The images were acquired from three CCD cameras, using a
frame grabber which supports four camera input channels. Depending on the
complexity of the object, more cameras can be used in order to completely cover the
object directly [11].

6. Conclusions

The proposed methodology has the following advantages:
- it can be applied to large objects,
- it provides high accuracy by fusing data from several sensors and by super-sampling,
- it does not require position measurements of moving parts,
- any defects and distortions are compensated accurately by software calibration,
- proper placement of the cameras provides full coverage of the object.

References

[1] A. Murat Tekalp, "Digital Video Processing", Prentice Hall, 1995.
[2] R. Y. Tsai, "A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses", IEEE Journal of Robotics and Automation, Vol. RA-3, No. 4, August 1987, pp. 323-344.
[3] D. Terzopoulos and M. Vasilescu, "Sampling and Reconstruction with Adaptive Meshes", CVPR'91, IEEE Computer Vision and Pattern Recognition, 1991, pp. 70-75.
[4] D. Chen, R. Jain and B. Schunk, "Surface Reconstruction Using Neural Networks", CVPR'92, IEEE Computer Vision and Pattern Recognition, 1992, pp. 815-817.
[5] R. J. Valkenburg, A. M. McIvor and P. W. Power, "An Evaluation of Subpixel Feature Localisation Methods for Precision Measurement", SPIE Vol. 2350, Videometrics III, 1994, pp. 229-238.
[6] M. R. Shortis, T. A. Clarke and T. Short, "A Comparison of Some Techniques for the Subpixel Location of Discrete Target Images", SPIE Vol. 2350, Videometrics III, 1994, pp. 239-250.
[7] Z. M. Yang and Y. F. Wang, "Error Analysis of 3D Shape Construction from Structured Lighting", Pattern Recognition, Vol. 29, No. 2, 1996, pp. 189-206.
[8] A. Delopoulos and Y. Xirouhakis, "Robust Estimation of Motion and Shape based on Orthographic Projections of Rigid Objects", Tenth IMDSP Workshop, July 1998.
[9] J. Heikkila and O. Silven, "Calibration Procedure for Short Focal Length Off-the-Shelf CCD Cameras", Proc. 13th International Conference on Pattern Recognition, Vienna, Austria, 1996, pp. 166-170.
[10] M. Maed, K. Kumarumaru, H. Zha, K. Inoue and S. Sawai, "3D Surface Recovery from Range Images by Using Multiresolution Wavelet Transform", IEEE Conference on Systems, Man, and Cybernetics, 1997, pp. 3654-3659.
[11] Ch. Schutz, T. Jost and H. Higli, "Semi-Automatic 3D Object Digitizing System Using Range Images", ACCV'98, Hong Kong, 1998.
