Cylindrical Rectification
Université de Montréal, Département d'informatique et de recherche opérationnelle, C.P. 6128, Succ. Centre-Ville, Montréal, Québec, H3C 3J7
Abstract

We propose a new rectification method for aligning epipolar lines of a pair of stereo images taken under any camera geometry. It effectively remaps both images onto the surface of a cylinder instead of a plane, which is used in common rectification methods. For a large set of camera motions, remapping to a plane has the drawback of creating rectified images that are potentially infinitely large and presents a loss of pixel information along epipolar lines. In contrast, cylindrical rectification guarantees that the rectified images are bounded for all possible camera motions and minimizes the loss of pixel information along epipolar lines. The processes (e.g. stereo matching) subsequently applied to the rectified images are thus more accurate and general, since they can accommodate any camera geometry.
Introduction
Figure 1: Rectification. Stereo images (I1, I2) of scene S, shown with planar rectification (P1, P2) and cylindrical rectification (C1, C2).
Rectification is a necessary step of stereoscopic analysis. The process extracts epipolar lines and realigns them horizontally into a new rectified image. This allows subsequent stereoscopic analysis algorithms to easily take advantage of the epipolar constraint and to reduce the search space to one dimension, along the horizontal rows of the rectified images. For different camera motions, the set of matching epipolar lines varies considerably, and extracting those lines for the purpose of depth estimation can be quite difficult. The difficulty does not reside in the equations themselves; for a given point, it is straightforward to locate the epipolar line containing that point. The problem is to find a set of epipolar lines that will cover the whole image and introduce a minimum of distortion, for arbitrary camera motions. Since subsequent stereo matching occurs along epipolar lines, it is important that no pixel information is lost along these lines in order to efficiently and accurately recover depth. Fig. 1 depicts the rectification process. A scene S is observed by two cameras to create images I1 and I2. In order to align the epipolar lines of this stereo pair, some image transformation must be applied. The most common such transformation, proposed by Ayache [1] and referred to as planar rectification, is a remapping of the original images onto a single plane that is parallel to the line joining the two cameras' optical centers (see Fig. 1, images P1 and P2). This is accomplished by applying a linear transformation in projective space to each image pixel. The new rectification method presented in this paper,
referred to as cylindrical rectification, proposes a transformation that remaps the images onto the surface of a cylinder whose principal axis goes through both cameras' optical centers (see Fig. 1, images C1 and C2). The actual images corresponding to Fig. 1 are shown in Fig. 2. The line joining the optical centers of the cameras (see Fig. 1) defines the focus of expansion (foe). All epipolar lines intersect the focus of expansion. The rectification process applied to an epipolar line always makes that line parallel to the foe. This allows the creation of a rectified image where the epipolar lines do not intersect and can be placed as separate rows. Obviously, both plane and cylinder remappings satisfy the alignment requirement with the foe. Planar rectification, while simple and efficient, suffers from a major drawback: it fails for some camera motions, as demonstrated in Sec. 2. As the forward motion component becomes more significant, the image distortion induced by the transformation becomes progressively worse, until the image is effectively unbounded. The image distortion induces a loss of pixel information that can only be partly compensated for by making the rectified image size larger¹. Consequently, this method is useful only for motions with a small forward component, thus lowering the risk of unbounded rectified images. One benefit of planar rectification
is that it preserves straight lines, which is an important consideration if stereo matching is to be performed on edges or lines. On the other hand, cylindrical rectification is guaranteed to provide a bounded rectified image and to significantly reduce pixel distortion, for all possible camera motions. This transformation also preserves epipolar line length: for example, an epipolar line 100 pixels long will always be rectified to a line 100 pixels long. This ensures a minimal loss of pixel information when resampling the epipolar lines from the original images. However, arbitrary straight lines are no longer preserved, though this may only be a concern for edge-based stereo. Planar rectification uses a single linear transformation matrix applied to the image, making it quite efficient. Cylindrical rectification uses one such linear transformation matrix for each epipolar line. In many cases, these matrices can be precomputed, so that a similar level of performance can be achieved. Although it is assumed throughout this paper that internal camera parameters are known, cylindrical rectification works as well with unknown internal parameters, as is the case when only the fundamental matrix (described in [2]) is available (see Sec. 3.5).

Many variants of the planar rectification scheme have been proposed [1, 3, 4]. A detailed description based on the essential matrix is given in [5]. In [6], a hardware implementation is proposed. In [7], the camera motion is restricted to a vergent stereo geometry to simplify computations. It also presents a faster way to compute the transformation by approximating it with a non-projective linear transformation. This eliminates the risk of unbounded images, at the expense of potentially severe distortion. In [8], a measure of image distortion is introduced to evaluate the performance of the rectification method. This strictly geometric measure, based on edge orientations, does not address the problem of pixel information loss induced by interpolation (see Sec. 3.6).

Sec. 2 describes planar rectification in more detail. The cylindrical rectification method is then presented in Sec. 3, which describes the transformation matrix whose three components are explicitly detailed in Sec. 3.1, 3.2 and 3.3. Sec. 3.4 discusses the practical aspect of finding the set of corresponding epipolar lines to rectify in both images. It is demonstrated in Sec. 3.5 that it is possible to use uncalibrated as well as calibrated cameras. A measure of image distortion is introduced in Sec. 3.6 and used to show how both rectification methods behave for different camera geometries. Examples of rectification for different camera geometries are presented in Sec. 4.

¹ See Sec. 3.6 for a detailed discussion.
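As a concrete illustration of the single linear transformation in projective space used by planar rectification, here is a minimal sketch of remapping one pixel through a 3 × 3 projective matrix. The matrix entries are purely illustrative, not taken from the paper:

```python
import numpy as np

# Hypothetical 3x3 projective transformation (homography). In planar
# rectification, one such matrix remaps every pixel of an image.
F = np.array([[1.0, 0.1, 5.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])

def remap(F, u):
    """Apply a projective transformation to a pixel u = (ux, uy)."""
    v = F @ np.array([u[0], u[1], 1.0])  # lift to homogeneous coordinates
    return v[:2] / v[2]                  # project back (v_h must be non-zero)

print(remap(F, (10.0, 20.0)))  # → [17. 22.]
```

The division by the homogeneous coordinate `v[2]` is exactly the step that can blow up when the bottom row of the matrix is not (0, 0, 1), which is the failure mode discussed in Sec. 2.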
Figure 2: Images from Fig. 1. Original images (I1, I2) are shown with cylindrical rectification (C1, C2) and planar rectification (P1, P2).

Planar rectification

In this section we show how rectification methods based on a single linear transformation in projective space [1, 3, 4] fail for some camera geometries. As stated earlier, the goal of rectification is to apply a transformation to an image in order to make the epipolar lines parallel to the focus of expansion. The result is a set of images where each row represents one epipolar line and can be used directly for the purpose of stereo matching (see Fig. 2). In projective space, an image point is expressed as p = (p_x, p_y, h)^T, where h is a scale factor. Thus we can assume these points are projected to p = (p_x, p_y, 1)^T. The linear projective transformation F is used to transform an image point u into a new point v with the relation

v = F u   (1)

where v = (v_x, v_y, v_h)^T, u = (u_x, u_y, u_h)^T and u_h ≠ 0. The fact that u_h ≠ 0 simply implies that the original image has a finite size. Enforcing that the reprojected point is not at infinity implies that v_h must be non-zero, that is

v_h = u_x F_6 + u_y F_7 + u_h F_8 ≠ 0   (2)

Since u_x and u_y are arbitrary, Eq. 2 has only one possible solution (F_6, F_7, F_8) = (0, 0, 1), since only u_h can guarantee v_h to be non-zero and F is homogeneous. Therefore, the transformation F must have the form

F = [ F_0 F_1 F_2 ]
    [ F_3 F_4 F_5 ]
    [ 0   0   1   ]

which corresponds to a camera displacement with no forward (or backward) component. In practice, the rectified image is unbounded only when the foe is inside the image. Therefore, any camera motion with a large forward component (making the foe visible) cannot be rectified with this method. Moreover, as soon as the forward component is large enough, the image points are
mapped so far apart that the rectification becomes unusable due to severe distortion. In the next section, we describe how cylindrical rectification can alleviate these problems by making a different use of linear transformations in projective space.

Cylindrical rectification

Figure 3: The basic steps of the cylindrical rectification method. First (R_foe), an epipolar line is rotated in the epipolar plane until it is parallel to the foe. Second (T_foe), a change of coordinate system is applied. Third (S_foe), a projection onto the surface of the unit cylinder is applied.

The goal of cylindrical rectification is to apply a transformation to an original image to remap it onto the surface of a carefully selected cylinder instead of a plane. By using the line joining the cameras' optical centers as the cylinder axis (Fig. 1), all straight lines on the cylinder surface are necessarily parallel to the cylinder axis and focus of expansion, making them suitable to be used as epipolar lines. The transformation from image to cylinder, illustrated in Fig. 3, is performed in three stages. First, a rotation is applied to a selected epipolar line (step R_foe). This rotation is in the epipolar plane and makes the epipolar line parallel to the foe. Then, a change of coordinate system is applied (step T_foe) to the rotated epipolar line, from the image system to the cylinder system (with the foe as principal axis). Finally (step S_foe), this line is normalized, or reprojected, onto the surface of a cylinder of unit radius. Since the line is already parallel to the cylinder axis, it is simply scaled along the direction perpendicular to the axis until it lies at unit distance from the axis. A particular epipolar line is referenced by its angle θ around the cylinder axis, while a particular pixel on the epipolar line is referenced by its angle and its position along the cylinder axis (see Fig. 3). Even if the surface of the cylinder is infinite, it can be shown that the image on that surface is always bounded. Since the transformation aligns an epipolar line with the axis of the cylinder, it is possible to remap a pixel to infinity only if its epipolar line is originally infinite. Since the original image is finite, all the visible parts of the epipolar lines are also of finite length, and therefore the rectified image cannot extend to infinity.

The rectification process transforms an image point p_xyz into a new point q_foe, expressed in the coordinate system foe of the cylinder. The transformation matrix L_foe is defined so that the epipolar line containing p_xyz becomes parallel to the cylinder axis, the foe. Since all possible epipolar lines become parallel to the foe, they are also parallel to one another and thus form the desired parallel aligned epipolar geometry. The linear rectification relations between q_foe and p_xyz are

q_foe = L_foe p_xyz = (S_foe T_foe R_foe) p_xyz   (3)

and inversely

p_xyz = L_foe^{-1} q_foe = (R_foe^T T_foe^T S_foe^{-1}) q_foe   (4)

where

S_foe = [ 1 0 0 ]
        [ 0 k 0 ]   (5)
        [ 0 0 k ]

and T_foe is the coordinate transformation developed in Sec. 3.1 (Eq. 6), while R_foe is the rotation developed in Sec. 3.2. These relations are completely invertible (except for the special case p_xyz = foe, which is quite easily handled). The matrix R_foe represents the rotation of the image point in projective space. The matrix T_foe represents the change from the camera coordinate system to the cylinder system. The matrix S_foe represents the projective scaling used to project the rectified point onto the surface of the unit cylinder. The next three subsections describe how to compute the coordinate transformation T_foe, the rotation R_foe, and the scaling S_foe.
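The decomposition q_foe = (S_foe T_foe R_foe) p_xyz and its inverse can be sketched numerically. The matrices below are placeholders (identity rotations and an arbitrary k) standing in for the actual T_foe, R_foe and S_foe derived in the following subsections:

```python
import numpy as np

def make_L(S, T, R):
    """Compose the rectification transform L_foe = S_foe @ T_foe @ R_foe."""
    return S @ T @ R

# Placeholder example: R and T are orthonormal (here identity), S scales
# the two directions perpendicular to the cylinder axis by k.
R = np.eye(3)
T = np.eye(3)
k = 0.5
S = np.diag([1.0, k, k])

L = make_L(S, T, R)
p_xyz = np.array([0.2, 0.3, 1.0])
q_foe = L @ p_xyz

# Inverse relation: R^T @ T^T @ S^{-1}, valid because R and T are orthonormal.
L_inv = R.T @ T.T @ np.diag([1.0, 1.0 / k, 1.0 / k])
assert np.allclose(L_inv @ q_foe, p_xyz)
```

In an actual implementation, R_foe (and hence k) differs per epipolar line, so one such 3 × 3 product is formed for each line, as the paper notes.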
3.1 Determining transformation T

The matrix T_foe is the coordinate transformation matrix from the system (x, y, z) to the system (foe, u, v), and is uniquely determined by the position and motion of the cameras (see Fig. 3). Any camera has a position pos and a rotation of φ degrees around the axis axis relative to the world coordinate system. A homogeneous world point p_w is expressed in the system of camera a (with pos_a, axis_a and φ_a) as

p_a = R_aw p_w

where R_aw is built from the rotation rot(axis_a, −φ_a) and the translation −pos_a, and rot(A, θ) is a 3 × 3 rotation matrix of angle θ around the axis A. The corresponding matrix R_bw for camera b (with pos_b, axis_b and φ_b) is defined in a similar way. The direct coordinate transformation matrices for cameras a and b, such that p_a = R_ab p_b and p_b = R_ba p_a, are

R_ab = R_aw R_bw^T    R_ba = R_bw R_aw^T

The foci of expansion, expressed in the coordinate system of each camera, are

foe_a = R_aw (pos_b − pos_a)    foe_b = R_bw (pos_a − pos_b)

from which we can derive the matrix T_foe;a for rectifying the image of camera a as

T_foe;a = [ n(foe_a)^T               ]
          [ n(z × foe_a)^T           ]   (6)
          [ n(foe_a × (z × foe_a))^T ]

where n(v) = v / ||v|| is a normalizing function. The corresponding matrix T_foe;b for rectifying the image of camera b can be derived similarly, or more simply by the relation

T_foe;b = T_foe;a R_ab   (7)

For the case where foe_a = z, the last two rows of T_foe;a can be any two orthonormal vectors perpendicular to z.

3.2 Determining rotation R

The epipolar line containing a point p_xyz will be rotated around the origin (the camera's optical center) and within the epipolar plane until it becomes parallel to the foe. The epipolar plane containing p_xyz also contains the foe (by definition) and the origin. The normal to that plane is

axis = n(foe × p_xyz)   (8)

and will be the axis of rotation (see Fig. 3), thus ensuring that p_xyz remains in the epipolar plane. In the case p_xyz = foe, the axis can be any vector normal to the foe vector. The angle of rotation is obtained from the fact that the normal z = (0, 0, 1)^T to the image plane has to be rotated until it is perpendicular to the foe, because the new epipolar line has to be parallel to the foe. The rotation angle is the angle between the normal z projected on the epipolar plane (perpendicular to the rotation axis) and the plane normal to the foe also containing the origin. By projecting the point p_xyz onto that plane, we can directly compute the angle. We have z̄, the normal z projected on the epipolar plane, defined as

z̄ = axis × (z × axis)

and p̄, the projection of p_xyz on the plane normal to the foe, defined as

p̄ = foe × (p_xyz × foe)   (9)

The rotation matrix R_foe rotates the vector z̄ onto the vector p̄ around the axis of Eq. 8, and is defined as

R_foe = rot(axis, θ),  where θ is the angle between z̄ and p̄   (10)

If the point q_foe is available instead of the point p_xyz (as would be the case for the inverse transformation of Eq. 4), we can still compute R_foe from Eq. 10 by substituting q_xyz for p_xyz in Eq. 8 and 9, where q_xyz = T_foe^T S_foe^{-1} q_foe is the point q_foe brought back into camera coordinates, with T_foe as previously defined in Eq. 6. Notice that because p_xyz and q_xyz are in the same epipolar plane, the rotation axis will be the same. The angle of rotation will also be the same, since their projections onto the plane normal to the foe are the same (modulo a scale factor).

3.3 Determining scaling S

The matrix S_foe is used to project the epipolar line from the unit image plane (i.e. located at z = 1) onto the cylinder of unit radius. To simplify notation in the following equations, we define

A = [ 1 0 0 ]    B = [ 0 0 0 ]
    [ 0 0 0 ]        [ 0 1 0 ]
    [ 0 0 0 ]        [ 0 0 1 ]

so that S_foe = A + k B. As shown in Eq. 3 and 4, S_foe has one scalar parameter k. This parameter can be computed for a known point p_xyz (Eq. 3) by enforcing unit radius and solving the resulting equation

k ||B T_foe R_foe p_xyz|| = 1   (11)

whose solution is k = 1 / ||B T_foe R_foe p_xyz||.
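The construction of T_foe;a from the foe (Eq. 6) is compact enough to sketch directly. The foe value below is arbitrary, chosen only to exercise the formula:

```python
import numpy as np

def n(v):
    """Normalizing function n(v) = v / ||v||."""
    return v / np.linalg.norm(v)

def T_foe(foe, z=np.array([0.0, 0.0, 1.0])):
    """Build the coordinate transform of Eq. 6: rows are the foe direction
    and two perpendicular unit vectors. Assumes foe is not parallel to z
    (that degenerate case needs the special handling mentioned in the text)."""
    r1 = n(foe)
    r2 = n(np.cross(z, foe))
    r3 = n(np.cross(foe, np.cross(z, foe)))
    return np.vstack([r1, r2, r3])

T = T_foe(np.array([1.0, 0.2, 0.1]))
assert np.allclose(T @ T.T, np.eye(3))  # rows form an orthonormal basis
```

Because the rows are orthonormal, the inverse used in Eq. 4 is simply the transpose, which is why the paper can write T_foe^T rather than an explicit matrix inversion.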
For the case of a known point q_foe (Eq. 4), enforcing that the epipolar lines all have their z coordinate equal to 1 gives the equation

(T_foe c_3) · (A q_foe) + (1/k) (T_foe c_3) · (B q_foe) = 1

where c_3 is the third column of the rotation matrix R_foe. The solution is then

1/k = (1 − (T_foe c_3) · (A q_foe)) / ((T_foe c_3) · (B q_foe))

It should be noted that the denominator can never be zero, because of Eq. 11 and the fact that T_foe c_3 can never be zero or orthogonal to B q_foe.

3.4 Common angle interval

In general, a rectified image does not span the whole cylinder. The common angle interval is the interval that yields all common epipolar lines between the two views. In order to control the number of epipolar lines extracted, it is important to determine this interval for each image. Notice that the rectification process implicitly guarantees that a pair of corresponding epipolar lines have the same angle on their respective cylinders, and therefore the same row in the rectified images. The concern here is to determine the angle interval of epipolar lines effectively present in both images. It can be shown that if a rectified image does not span the whole cylinder, then the extremum angles are given by two corners of the image. Based on this fact, it is sufficient to compute the angle of the four corners and of one point between each pair of adjacent corners. By observing the ordering of these angles and taking into account the periodicity of angle measurements, it is possible to determine the angle interval for one image. Given the angle intervals computed for each image separately, their intersection is the common angle interval sought. The subsequent stereo matching process has only to consider epipolar lines in that interval.

3.5 Using the fundamental matrix

Until now, it was assumed that the cameras were calibrated, i.e. that their internal parameters are known. These parameters are the principal point (optical axis), the focal lengths and the aspect ratio. More generally, all these parameters can be represented by a 3 × 3 upper triangular matrix. In this section, we assume that only the fundamental matrix is available. This matrix effectively hides the internal parameters and the camera motion (external parameters) in a single matrix. The fundamental matrix F defines the epipolar relation between points p_a and p_b of the two images as

p_b^T F p_a = 0   (12)

It is straightforward to extract the focus of expansion of each image by noticing that all the points of one image must satisfy Eq. 12 when the point selected in the other image is its foe. More precisely, the relations for foe_a and foe_b are

p_b^T F foe_a = 0   for all p_b
foe_b^T F p_a = 0   for all p_a

which yield the homogeneous linear equation systems

F foe_a = 0   (13)
F^T foe_b = 0   (14)

which are easily solved. At this point, it remains to show how to derive the constituents of the matrix L_foe of Eq. 3, namely the matrices S_foe, R_foe and T_foe, from the fundamental matrix F. The transformation T_foe;a can be directly obtained from Eq. 6, using foe_a obtained in Eq. 13. Symmetrically (using Eq. 14), we obtain

T_foe;b = [ n(foe_b)^T               ]
          [ n(z × foe_b)^T           ]
          [ n(foe_b × (z × foe_b))^T ]

The rotation matrix R_foe is computed from the foe (which is readily available from the fundamental matrix F) and the transform matrix T_foe, exactly as described in Sec. 3.2. Since the scaling matrix S_foe is directly computed from the rotation matrix R_foe and the transform T_foe, it is computed exactly as described in Sec. 3.3. The rectification method is therefore applicable regardless of the availability of the internal camera parameters. However, without these parameters it is impossible to determine the minimum and maximum disparity interval, which is of great utility in subsequent stereo matching. In this paper, all the results were obtained with known internal parameters.
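Solving the homogeneous systems F foe_a = 0 and F^T foe_b = 0 (Eq. 13 and 14) is typically done with a singular value decomposition, taking the singular vectors associated with the smallest singular value. A minimal sketch with a toy fundamental matrix (not one of the paper's examples):

```python
import numpy as np

def foci_of_expansion(F):
    """Solve F @ foe_a = 0 and F.T @ foe_b = 0 via SVD: the right and left
    singular vectors of the smallest singular value span the null spaces
    of F and F.T (F has rank 2, so each null space is one-dimensional)."""
    U, s, Vt = np.linalg.svd(F)
    foe_a = Vt[-1]    # right null vector of F
    foe_b = U[:, -1]  # left null vector of F
    return foe_a, foe_b

# Toy rank-2 fundamental matrix: pure translation along x gives F
# proportional to the skew-symmetric matrix of the epipole (1, 0, 0).
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])
foe_a, foe_b = foci_of_expansion(F)
assert np.allclose(F @ foe_a, 0) and np.allclose(F.T @ foe_b, 0)
```

The returned vectors are homogeneous, so they are only defined up to scale and sign; any downstream use (e.g. building T_foe;a) should normalize them first.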
3.6 Loss of pixel information

The distortion induced by the rectification process, in conjunction with the resampling of the original image, can create a loss of pixel information, i.e. pixels in the original image are not accounted for, and the information they carry is simply discarded during resampling. We measure this loss along epipolar lines, since it is along these lines that a subsequent stereo process will be carried out. To establish a measure of pixel information loss, we consider a rectified epipolar line segment of a length of one pixel and compute the length L of the original line segment that is remapped to it. For a given length L, we define the loss as

loss(L) = max(0, (L − 1) / L)
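As a sketch, assuming the loss is defined as max(0, (L − 1)/L), which is a reconstruction consistent with the statements that shrinking (L > 1) loses information while stretching (L < 1) merely reduces density:

```python
def pixel_loss(L):
    """Assumed loss measure: when L original pixels collapse into one
    rectified pixel (L > 1), a fraction (L - 1)/L of them is discarded;
    stretching (L < 1) loses nothing, it only reduces pixel density."""
    return max(0.0, (L - 1.0) / L)

assert pixel_loss(0.5) == 0.0  # stretching: no information lost
assert pixel_loss(2.0) == 0.5  # two source pixels per rectified pixel
```

Averaging this quantity over all one-pixel segments of all rectified epipolar lines gives the whole-image measure used in the comparison of Fig. 4.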
Figure 4: Pixel loss as a function of camera translation T = (1, 0, z). Rectified image width is 365, 730 and 1095 pixels, for an original width of 256 pixels.

A shrinking of original pixels (i.e. L > 1) creates pixel information loss, while a stretching (i.e. L < 1) simply reduces the density of the rectified image. For a whole image, the measure is the expected loss over all rectified epipolar lines, broken down into individual one-pixel segments. The fundamental property of cylindrical rectification is the conservation of the length of epipolar lines. Since pixels do not stretch or shrink on these lines, no pixel information is lost during resampling, except for the unavoidable loss introduced by the interpolation process itself. For planar rectification, the length of epipolar lines is not preserved. This implies that some pixel loss will occur if the rectified image size is not large enough. In Fig. 4, three different rectified image widths (365, 730 and 1095 pixels) were used with both methods, for a range of camera translations T = (1, 0, z) with the z component in the range z ∈ [0, 1]. Cylindrical rectification shows no loss for any camera motion and any rectified image width². However, planar rectification induces a pixel loss that depends on the camera geometry. To compensate for such a loss, the rectified images have to be enlarged, sometimes to the point where they become useless for subsequent stereo processing. For a z component equal to 1 (i.e. T = (1, 0, 1)), all pixels are lost, regardless of image size.

² The minimum image width that guarantees no pixel loss is equal to √(w² + h²) for an original image of size (w, h).

Results

Some examples of rectification applied to different camera geometries are illustrated in this section. Fig. 5 presents an image plane and the rectification cylinder with the reprojected image, for a horizontal camera motion. In this case, the epipolar lines are already aligned. The rows represent different angles around the cylinder, from 0° to 360°. The image always appears twice, since every cylinder point is projected across the cylinder axis. The number of rows determines the number of epipolar lines extracted from the image.

Figure 5: Image "cube" rectified. Horizontal camera motion (foe = (1, 0, 0)). A row represents an individual epipolar line.

Fig. 6 depicts a camera geometry with forward motion. The original and rectified images are shown in Fig. 7 (planar rectification cannot be used in this case). Notice how the rectified displacement of the sphere and cone is purely horizontal, as expected. Fig. 8 depicts a typical camera geometry, suitable for planar rectification, with rectified images shown in Fig. 9. While cylindrical rectification (images C1, C2 in Fig. 9) introduces little distortion, planar rectification (images P1, P2) significantly distorts the images, which are also larger to compensate for pixel information loss. Examples where the foe is inside the image are obtained when the forward component of the motion is large enough with respect to the focal length (as in Fig. 7). It is important to note that planar rectification always yields an unbounded image (i.e. of infinite size) in these cases, and thus cannot be applied. The execution time for both methods is very similar. For many camera geometries, the slight advantage of planar rectification relating to the number of matrix computations is overcome by the extra burden of resampling larger rectified images to reduce pixel loss.

Conclusion

We presented a new method, called cylindrical rectification, for rectifying stereoscopic images under arbitrary camera geometry. It effectively remaps the images onto the surface of a unit cylinder whose axis goes through both cameras' optical centers. It applies a transformation in projective space to each image point; a single linear transformation per epipolar line suffices to rectify. While it does not preserve arbitrary straight lines, it preserves epipolar line lengths, thus ensuring a minimal loss of pixel information. As a consequence of allowing arbitrary camera motions, the rectified images are always bounded, with a size independent of camera motion. The approach has been implemented and used successfully in the context of stereo matching [9], ego-motion estimation [10] and tridimensional reconstruction, and has proved to provide added flexibility and accuracy at no significant cost in performance.
Figure 6: Forward motion. A sphere and cone are observed from two cameras displaced along their optical axis. The original images I1, I2 are remapped onto the cylinder as C1, C2.

Figure 7: The images I1, I2 are shown with their cylindrical rectification C1, C2. The rectified image displacements are all horizontal.

Figure 8: Camera geometry suitable for planar rectification. I1, I2 are the original images.

References

[1] N. Ayache and C. Hansen. Rectification of images for binocular and trinocular stereovision. In Proc. of Int. Conf. on Pattern Recognition, pages 11-16, Washington, D.C., 1988.
[2] Q.-T. Luong and O. D. Faugeras. The fundamental matrix: Theory, algorithms, and stability analysis. Int. J. Computer Vision, 17:43-75, 1996.
[3] S. B. Kang, J. A. Webb, C. L. Zitnick, and T. Kanade. An active multibaseline stereo system with real-time image acquisition. Technical Report CMU-CS-94-167, School of Computer Science, Carnegie Mellon University, 1994.
[4] O. Faugeras. Three-dimensional computer vision. MIT Press, Cambridge, 1993.
[5] R. Hartley and R. Gupta. Computing matched-epipolar projections. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pages 549-555, New York, N.Y., 1993.
[6] P. Courtney, N. A. Thacker, and C. R. Brown. A hardware architecture for image rectification and ground plane obstacle detection. In Proc. of Int. Conf. on Pattern Recognition, pages 23-26, The Hague, Netherlands, 1992.
[7] D. V. Papadimitriou and T. J. Dennis. Epipolar line estimation and rectification for stereo image pairs. IEEE Trans. Image Processing, 5(4):672-676, 1996.
[8] L. Robert, M. Buffa, and M. Hébert. Weakly-calibrated stereo perception for rover navigation. In Proc. 5th Int. Conference on Computer Vision, pages 46-51, Cambridge, 1995.