Oriented Projective Geometry For Computer Vision
1 Introduction
Projective geometry is now established as the correct and most convenient way to describe the geometry of systems of cameras and the geometry of the scene they record.
The reason for this is that a pinhole camera, a very reasonable model for most cameras,
is really a projective (in the sense of projective geometry) engine projecting (in the usual
sense) the real world onto the retinal plane. Therefore we gain a lot in simplicity if we
represent the real world as a part of a projective 3-D space and the retina as a part of a
projective 2-D space.
But in using such a representation, we apparently lose information: we are used to thinking of the applications of computer vision as requiring a Euclidean space, and this notion is lacking in the projective space. We are thus led to explore two interesting avenues. The first is the understanding of the relationship between the projective structure
of, say, the environment and the usual affine and Euclidean structures, of what kinds of measurements are possible within each of these three contexts, and of how we can use image measurements and/or a priori information to move from one structure to the next.
This has been addressed in recent papers [7, 2]. The second is the exploration of the requirements of specific applications in terms of geometry. A typical question is: can this application be solved with projective information only, or does it require affine or Euclidean information? Answers to
some of these questions for specific examples in robotics, image synthesis, and scene
modelling are described in [11, 3, 4], respectively.
In this article we propose to add a significant feature to the projective framework, namely the possibility of taking into account the fact that for a pinhole camera, both sides
of the retinal plane are very different: one side corresponds to what is in front of the
camera, one side to what is behind! The idea of visible points, i.e. of points located in
front of the camera, is central in vision, and the problem of enforcing the visibility of reconstructed points in stereo, motion, or shape from X has not yet received a satisfactory answer. A very interesting step in the direction of a possible solution has been
taken by Hartley [6] with the idea of Cheirality invariants. We believe that our way of
extending the framework of projective geometry goes significantly further.
Thus the key idea developed in this article is that even though a pinhole camera is
indeed a projective engine, it is slightly more than that in the sense that we know for sure
that all 3-D points whose images are recorded by the camera are in front of the camera.
Hence the imaging process provides a way to tell apart both sides of the retinal plane.
Our observation is that the mathematical framework for elaborating this idea already exists: it is oriented projective geometry, which has recently been proposed by Stolfi in his book [13].
An $n$-dimensional projective space $P^n$ can be thought of as arising from an $(n+1)$-dimensional vector space in which we define the following relation between non-zero vectors. To help guide the reader's intuition, it is useful to think of a non-zero vector as defining a line through the origin. We say that two such vectors $\mathbf{x}$ and $\mathbf{y}$ are equivalent if and only if they define the same line. It is easily verified that this defines an equivalence relation on the vector space minus the zero vector. It is sometimes also useful to picture the projective space as the set of points of the unit sphere $S^n$ of $\mathbb{R}^{n+1}$ with antipodal points identified. A point in that space is called a projective point; it is an equivalence class of vectors and can therefore be represented by any vector in the class. If $\mathbf{x}$ is such a vector, then $\lambda\mathbf{x}$, $\lambda \neq 0$, is also in the class and represents the same projective point.
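To make the equivalence concrete, here is a minimal Python sketch (ours, not part of the original paper; the function name and tolerance are illustrative): two non-zero coordinate vectors represent the same projective point exactly when they are proportional, which can be tested through the vanishing of all 2x2 minors.

```python
import numpy as np

def same_projective_point(x, y, tol=1e-9):
    """True if the non-zero homogeneous vectors x and y represent the
    same point of P^n, i.e. y = lambda * x for some lambda != 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # x and y are proportional iff every 2x2 minor x_i*y_j - x_j*y_i vanishes.
    return np.allclose(np.outer(x, y) - np.outer(y, x), 0.0, atol=tol)

# (1, 2, 3) and (-2, -4, -6) are the same projective point of P^2.
print(same_projective_point([1, 2, 3], [-2, -4, -6]))   # True
print(same_projective_point([1, 2, 3], [1, 2, 4]))      # False
```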
In order to go from projective geometry to oriented projective geometry we only have to change the definition of the equivalence relation slightly:
\[
\mathbf{x} \sim \mathbf{y} \iff \exists\, \lambda > 0 \ \text{such that}\ \mathbf{y} = \lambda\,\mathbf{x}, \qquad (1)
\]
where we now impose that the scalar $\lambda$ be positive. The equivalence class of a vector now becomes the half-line defined by this vector. The set of equivalence classes is the oriented projective space $T^n$, which can also be thought of as $S^n$ but without the identification of antipodal points. A more useful representation, perhaps, is Stolfi's straight model [13], which describes $T^n$ as two copies of $\mathbb{R}^n$ plus a point at infinity for every direction of $\mathbb{R}^n$, i.e. a sphere of points at infinity, each copy of $\mathbb{R}^n$ being the central projection of half of $S^n$ onto the hyperplane of $\mathbb{R}^{n+1}$ of equation $x_1 = 1$. These two halves are referred to as the front range ($x_1 > 0$) and the back range ($x_1 < 0$), and we can think of the front half as the set of "real" points and the back half as the set of "phantom" points, or vice versa. Thus, given a point $x$ of $T^n$ with coordinate vector $\mathbf{x}$, the point represented by $-\mathbf{x}$ is different from $x$; it is called its antipode and denoted $\neg x$.
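The difference with the unoriented case can be made concrete with a small Python sketch (again ours, under the straight-model conventions just described): the same-point test now also requires the proportionality factor to be positive, the antipode is obtained by negating the coordinate vector, and the sign of the first coordinate $x_1$ classifies a point as front range, back range, or at infinity.

```python
import numpy as np

def same_oriented_point(x, y, tol=1e-9):
    """True if x and y represent the same point of T^n: y = lambda * x
    with lambda strictly positive."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if not np.allclose(np.outer(x, y) - np.outer(y, x), 0.0, atol=tol):
        return False                 # not even proportional
    i = np.argmax(np.abs(x))         # a coordinate where x is non-zero
    return x[i] * y[i] > 0           # the factor lambda is positive

def antipode(x):
    """The antipode of the point represented by x is represented by -x."""
    return -np.asarray(x, float)

def straight_model_range(x):
    """Front range, back range or sphere at infinity, from the sign of x1."""
    x1 = np.asarray(x, float)[0]
    return "front" if x1 > 0 else "back" if x1 < 0 else "infinity"

p = np.array([1.0, 2.0, 3.0])
print(same_oriented_point(p, 2 * p))       # True: lambda = 2 > 0
print(same_oriented_point(p, -p))          # False: -p is the antipode of p
print(straight_model_range(antipode(p)))   # 'back'
```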
The nice thing about $T^n$ is that, because it is homeomorphic to $S^n$ (as opposed to $P^n$, which is homeomorphic to $S^n$ with antipodal points identified), it is orientable. It is then possible to define a coherent orientation over the whole of $T^n$:
if we imagine moving a direct basis of the front range across the sphere at infinity into
the back range and then back to the starting point of the front range, the final basis will
have the same orientation as the initial one, which is the definition of orientability. Note that this is not possible for $P^n$ for even values of $n$.
A pinhole camera projects a scene point $M$ of $T^3$ to an image point $m$ of $T^2$ according to
\[
\mathbf{m} \simeq P\,\mathbf{M}, \qquad
P = \begin{pmatrix} \mathbf{l}_1^T \\ \mathbf{l}_2^T \\ \mathbf{l}_3^T \end{pmatrix},
\]
where the rows $\mathbf{l}_i^T$ of the $3 \times 4$ projection matrix $P$ represent planes, the third row $\mathbf{l}_3^T$ being the focal plane of the camera, and the optical center $C$ satisfies $P\,\mathbf{C} = \mathbf{0}$. The retina of such a camera, being a $T^2$, consists of front
and back ranges in the terminology of section 1, which are two affine planes, and a circle
of points at infinity. The front range "sees" the points in the front range of the camera,
the back range "sees" the points in the back range of the camera.
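As a numerical illustration of this correspondence, the following sketch (ours, under the conventions reconstructed above: the third row of the oriented projection matrix represents the focal plane, and the scene point is given with its "real" representative) projects an oriented point and reads the side of the focal plane off the sign of the third coordinate of its image.

```python
import numpy as np

def project_oriented(P, M):
    """Project the oriented scene point M (homogeneous 4-vector) with the
    oriented 3x4 matrix P.  The sign of m3 = l3 . M tells on which side of
    the focal plane (third row of P) the point lies."""
    m = P @ np.asarray(M, float)
    side = "front" if m[2] > 0 else "back" if m[2] < 0 else "on focal plane"
    return m, side

# Hypothetical camera P = [I | 0] looking down the positive Z axis.
P = np.hstack([np.eye(3), np.zeros((3, 1))])
print(project_oriented(P, [0.2, 0.1,  2.0, 1.0])[1])   # 'front'
print(project_oriented(P, [0.2, 0.1, -2.0, 1.0])[1])   # 'back'
```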
The sign of $P$ determines the orientation of the camera without ambiguity. A camera with the opposite orientation will look in the exact opposite direction, with the same projection characteristics. It is reassuring that these two different cameras are represented by two different mathematical objects. For clarity, in Figure 1, consider that we are working in a plane containing $C$ and $M$. The scene is then a $T^2$ that we represent as a sphere, whereas the focal plane appears as a great circle. We know that the scene point $M$ lies between $C$ and $\neg C$.
[Fig. 1: the scene $T^2$ drawn as a sphere, with the scene point $M$ and its antipode $\neg M$, the optical center $C$ and its antipode $\neg C$, the focal plane $f$, and the image point $m$.]
This ambiguity also goes away if we know the affine structure of the scene. The plane at infinity splits the sphere in two halves. We can choose its orientation so that $M$ is on the positive side. Because $C_1$ and $C_2$ are real points, they also lie on the positive side of the plane at infinity. This will discriminate between $C_2$ and $\neg C_2$.
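A minimal sketch of this idea (ours; the canonical plane at infinity $(0,0,0,1)^T$ and the function name are illustrative assumptions): once the plane at infinity is known and oriented, every real point can be given the representative lying on its positive side, which resolves the sign ambiguity for $C_2$.

```python
import numpy as np

def orient_against_infinity(X, pi_inf):
    """Return the representative of the oriented point X lying on the
    positive side of the oriented plane at infinity pi_inf."""
    X = np.asarray(X, float)
    return X if pi_inf @ X > 0 else -X

pi_inf = np.array([0.0, 0.0, 0.0, 1.0])          # canonical plane at infinity
C2_candidate = np.array([1.0, 2.0, 5.0, -1.0])   # some representative of C2
print(orient_against_infinity(C2_candidate, pi_inf))   # [-1. -2. -5.  1.]
```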
[Figure: the two cameras $C_1$ and $C_2$, their antipodes $\neg C_1$ and $\neg C_2$, the point $\neg M$, the epipolar lines, and the four sign zones $++$, $+-$, $-+$, $--$.]
4 Applications
In this section, we show some applications of oriented projective geometry to problems in computer vision involving weakly calibrated cameras.
[Figure: the focal planes $f_1$ and $f_2$ of the two cameras divide the scene into the four sign zones $++$, $+-$, $-+$, $--$.]
The first point of reference used to orient the cameras must be in front of them. We can now constrain every reconstructed point to lie in the front range of $T^3$. This enables us to choose which point of $T^3$ corresponds to a given point of $P^3$. The points appearing in the $--$ zone can be removed, their antipodal points being impossible reconstructions as well.
We are not implying that we can detect all of the false matches this way, but only that this inexpensive step can improve the results at very little additional cost. It should be used in conjunction with other outlier detection methods such as [14] and [15].
The method is simple. From our correspondences, we compute a fundamental matrix
as in [8] for example. From this fundamental matrix, we obtain two perspective projection matrices [1, 5], up to an unknown projective transformation. We then orient, perhaps arbitrarily, each of the two cameras. The reconstruction of the image points yields
a cloud of pairs of points which can lie in the four zones.
In general, one of the zones contains the majority of the points because it corresponds to the real scene. The points which are reconstructed in the other zones are then marked as incorrect, and the cameras can be properly oriented so that the scene lies in front of them.
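The bookkeeping can be sketched in a few lines of Python (ours, not the paper's code; it assumes that each reconstructed point is returned with a fixed representative): the zone of a point is the pair of signs with respect to the two focal planes, taken as the third rows of the oriented projection matrices, and points outside the majority zone are flagged as suspected false matches.

```python
import numpy as np
from collections import Counter

def zone_signs(P1, P2, X):
    """Zone of the reconstructed oriented point X: signs with respect to
    the two focal planes (third rows of the oriented 3x4 matrices)."""
    return (np.sign(P1[2] @ X), np.sign(P2[2] @ X))

def mark_false_matches(P1, P2, points):
    """Flag points lying outside the majority zone, the majority zone
    being assumed to correspond to the real scene."""
    zones = [zone_signs(P1, P2, X) for X in points]
    majority, _ = Counter(zones).most_common(1)[0]
    return [zone != majority for zone in zones]

# Toy example with two cameras looking down the positive Z axis.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
pts = [np.array([0.0, 0.0, z, 1.0]) for z in (3.0, 4.0, 5.0, -2.0)]
print(mark_false_matches(P1, P2, pts))   # [False, False, False, True]
```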
The pair of images in Figure 4 was taken with a conventional CCD camera. The
correspondences were computed using correlation, then relaxation. An outlier rejection
method was used to get rid of the matches which did not fulfill the epipolar constraints.
Most outliers were detected using the techniques described in [14] and [15]. Still, these
methods are unable to detect false matches which are consistent with the epipolar geometry. Using orientation, we discovered two other false matches which are marked as
points 41 and 251. This is not a great improvement because most outliers have already
been detected at previous steps, but these particular false matches could not have been
detected using any other method.
The problem is the following: given two points $M_a$ and $M_b$ in a scene which are both visible in image 1 but project to the same image point in image 2, we want to be able to decide, from the two images only and their epipolar geometry, which of the two scene points is actually visible in image 2 (see Figure 5). This problem is central in view transfer and image compression [3, 9].
It is not possible to identify the point using the epipolar geometry alone, because
both points belong to the same epipolar line. We must identify the closest 3-D point to
the optical center on the optical ray. It is of course the first object point when the ray is
followed from the optical center towards infinity in the front range of the camera. It is a
false assumption that it will necessarily be the closest point to the epipole in the image.
In fact, the situation changes whenever the optical center of one camera crosses the focal
plane of the other camera. This can be seen in the top part of Figure 5, where the closest point to the epipole switches from $m_{a1}$ to $m_{b1}$ when $C_2$ crosses $f_1$ and becomes $C_2'$, with the effect that $e_{12}$ becomes $e_{12}'$.
We can use oriented projective geometry in order to solve this problem in a simple
and elegant fashion. We have seen in the previous section that every point of the physical
space projects onto the images with a sign describing its position with respect to the
focal plane. We have also seen that the epipolar lines are oriented in a coherent fashion, namely from the epipole to the point. When the epipole is in the front range of the retinal plane, as for $e_{12}$ (bottom right part of Figure 5), starting from $e_{12}$ and following the orientation of the epipolar line, the first point we meet is $m_{a1}$, which is correct. When the epipole is in the back range of the retinal plane, as for $e_{12}'$ (bottom left part of Figure 5), starting from $e_{12}'$ and following the orientation of the epipolar line, we first go out to infinity and come back on the other side to meet $m_{a1}$, which is again correct!
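One way to turn this rule into a computation is sketched below (our own reformulation, not the paper's implementation): for every candidate pixel we recover its parameter $t$ along the optical ray of camera 2, using the oriented epipole $\mathbf{e} = P_1\mathbf{C}_2$ and the oriented image $\mathbf{d}$ of the ray direction, both assumed given; the candidate with the smallest parameter is the one met first from the epipole, and because $t$ lives on the ray itself, the detour through infinity in the image is handled automatically.

```python
import numpy as np

def ray_parameter(m, e, d):
    """Solve lambda * m = e + t * d for (t, lambda), where m = (u, v, 1) is
    a candidate pixel on the epipolar line, e the oriented epipole P1 C2 and
    d the oriented image of the ray direction of camera 2.  t >= 0 for points
    in front of camera 2, lambda > 0 for points in front of camera 1."""
    A = np.column_stack([np.asarray(m, float), -np.asarray(d, float)])
    lam, t = np.linalg.lstsq(A, np.asarray(e, float), rcond=None)[0]
    return t, lam

def visible_candidate(candidates, e, d):
    """Among candidate pixels (dict name -> 3-vector), return the name of
    the one actually visible in image 2: smallest ray parameter t wins."""
    ts = {name: ray_parameter(m, e, d)[0] for name, m in candidates.items()}
    return min(ts, key=ts.get)

# Illustrative data consistent with lambda * m = e + t * d.
e, d = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
print(visible_candidate({"ma1": [2.0, 0.0, 1.0], "mb1": [5.0, 0.0, 1.0]}, e, d))  # 'ma1'
```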
Hence we have a way of detecting occlusion even if we use only projective information. The choice of which representative of the projection matrix we use determines a possible orientation. But what happens when the chosen orientation is incorrect? In order to understand the problem better, we synthesized two views of the same object, using the same projection matrices but with two different orientations. This is shown in Figure 6. The erroneous left view appears as seen from the other side.
Fig. 5. Change of orientation when $C_2$ crosses the focal plane of the first camera.
The geometric interpretation is simple: the wrongly oriented camera looks in the opposite direction, but far enough
to go through the sphere at infinity and come back to the other side of the object. Please
note that without the use of oriented projective geometry, we would have a very large number of possible images, since there are two possible orientations for each pixel.
Fig. 6. Two synthesized images with cameras differing only in orientation. The image is incomplete because the source images did not cover the object entirely. The left image presents anomalies on the side because the left breast of the mannequin is seen from the back. Hence, we see the back part of the breast first, and there is a discontinuity line visible at the edge of the object in the source images. What we are seeing first are the last points on the ray. If the object were a head modelled completely, only the hair would be seen in one image, whereas the face would appear in the other. One image would appear seen from the back, and one from the front.
5 Conclusion
We have presented an extension of the usual projective geometric framework which nicely takes into account a piece of information that was previously not used, namely the fact that the pixels in an image correspond to points which lie in front of the camera. This framework, called oriented projective geometry, retains all the advantages of unoriented projective geometry, namely its simplicity for expressing the viewing geometry of a system of cameras, while extending its ability to model realistic situations.
References
1. Olivier Faugeras. What can be seen in three dimensions with an uncalibrated stereo rig? In
G. Sandini, editor, Proceedings of the 2nd European Conference on Computer Vision, volume
588 of Lecture Notes in Computer Science, pages 563–578, Santa Margherita Ligure, Italy,
May 1992. Springer-Verlag.
2. Olivier Faugeras. Stratification of three-dimensional vision: projective, affine, and metric representations.
Journal of the Optical Society of America A, 12(3):465–484, March 1995.
3. Olivier Faugeras and Stephane Laveau. Representing three-dimensional data as a collection
of images and fundamental matrices for image synthesis. In Proceedings of the International
Conference on Pattern Recognition, pages 689–691, Jerusalem, Israel, October 1994.