Basic 3D Geometry For One and Two Cameras
Introduction
This note discusses 3D geometry for a single camera in Sections 2-6, and then covers the basics of stereo in Sections 7-9. It combines most of the contents of two older documents, Notes on 3D Geometry and Enforcing a Ground Plane Bias by Image Shearing.
Projection Equations

The image coordinates (u, v) are measured relative to the image center:

    u = c - c0    (1)
    v = r - r0    (2)

where c is the column, r is the row, and (c0, r0) is the center column and row of the image. (This is a Point Grey convention, and it assumes the camera center is exactly in the center column and row of the image. In fact, proper camera calibration can be done to get a better estimate of the center.) The (X, Y, Z) coordinate system is shown in Fig. 1, and the (u, v) system in Fig. 2: Z points along the line of sight, away from the camera; X (and also u) points from left to right as you face the image plane (your eyes are pointing in the Z direction); similarly, Y (and also v) points from top to bottom as you face the image plane. (0, 0, 0) is the camera center (the pinhole in a pinhole camera). The projection equations are:

    u = f X/Z    (3)
    v = f Y/Z    (4)
where f is the camera focal length. (If u and v are expressed in pixels, then f should also be measured in pixels.)
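As a concrete illustration of equations (1)-(4), here is a minimal Python sketch. The function names are illustrative, not from the note, and the calibrated center (c0, r0) is passed in rather than assumed:

    def pixel_to_uv(c, r, c0, r0):
        # Centered image coordinates, equations (1)-(2): u = c - c0, v = r - r0,
        # where (c0, r0) is the (ideally calibrated) image center.
        return c - c0, r - r0

    def project(X, Y, Z, f):
        # Pinhole projection, equations (3)-(4): u = f*X/Z, v = f*Y/Z.
        # f is the focal length in pixels; Z > 0 means the point is in
        # front of the camera.
        return f * X / Z, f * Y / Z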
Vanishing Points
Any 3D direction gives rise to a vanishing point (VP) in the image. Given a 3D direction vector d, we can parameterize a line in that direction as all points p(t) = p0 + t d, where t is a real number. It is easy to show that the VP, defined as the value of (u, v) in the limit of t approaching -∞ or +∞, is given by (u, v) = (f dx/dz, f dy/dz). Notice that this doesn't depend on p0; in other words, all parallel lines sharing the same direction have the same VP.

Figure 1: XYZ camera-centered coordinate system, following the Point Grey convention. Note that the positive Z-axis is the camera line of sight, pointing away from the eye.
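To make the VP formula concrete, here is a short sketch (the function name is illustrative; directions parallel to the image plane, dz = 0, have their VP at infinity and are returned as None):

    def vanishing_point(d, f):
        # VP of 3D direction d = (dx, dy, dz): (u, v) = (f*dx/dz, f*dy/dz).
        # Directions with dz == 0 have no finite vanishing point.
        dx, dy, dz = d
        if dz == 0:
            return None
        return f * dx / dz, f * dy / dz

Note that the result is the same for every line with direction d, regardless of its p0, which is exactly the statement that parallel lines share a VP.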
Calculating the Horizon
A plane defines a 2D space of directions, and each direction has its own VP; you can show that the set of VPs forms a line in the image, called the horizon. Assume we are given a ground plane defined by r·n = k, where r = (X, Y, Z) and n = (nx, ny, nz) is the ground plane unit normal. Then you can show that the horizon is defined by all points (u, v) such that n·(u, v, f) = 0. Notice that this equation doesn't depend on the value of k. The way to prove this is to suppose that we are given a vector a that is perpendicular to n, i.e. a·n = 0. The VP of a is (u, v) = (f ax/az, f ay/az). Then n·(u, v, f) = (f/az)(nx ax + ny ay + nz az) = (f/az)(n·a) = 0.
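Since the horizon is the line nx u + ny v + nz f = 0 in image coordinates, it can be traced column by column. A sketch under the same conventions (illustrative name; it assumes the camera is not rolled 90 degrees, so ny != 0):

    def horizon_v(u, n, f):
        # Row coordinate v of the horizon at column coordinate u, obtained by
        # solving n . (u, v, f) = 0 for v: v = -(nx*u + nz*f) / ny.
        nx, ny, nz = n
        return -(nx * u + nz * f) / ny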
Ground Plane Coordinates

Suppose the ground plane is defined by r·n = k, where k < 0 and |k| is the height of the camera above the ground. (We assume that n is the unit-length upward-facing normal coming out of the ground, the opposite of the down direction.) Then any point in the image below the horizon (below in the image sense) maps to a 3D location on the ground plane, and we can describe this 3D location in 2D ground coordinates. First we find the mapping from (u, v) to the corresponding 3D location on the ground plane:
Figure 2: uv coordinate system that corresponds to projecting XYZ coordinates to the image plane, again following the Point Grey convention.

From u = f X/Z and v = f Y/Z we have that (uZ/f, vZ/f, Z)·n = k. Then Z = k/[n·(u/f, v/f, 1)], and we can also solve for X and Y using the u = f X/Z and v = f Y/Z equations. Next we describe the 3D location in 2D ground coordinates. We need to find a 2D basis in the ground plane, i.e. two orthogonal vectors a and b lying in the ground plane. To do this we will first project the X-axis onto the ground plane by calculating how much of it projects onto the ground normal, and subtracting that off: a = X̂ - (n·X̂)n = (1, 0, 0) - nx n, where X̂ is the unit vector in the X-direction. Next we normalize a to unit length (not shown here). Then we calculate b = n × a, where × is the 3D vector (cross) product. If the camera is roughly horizontal, then a will be approximately parallel to the X-direction and b will be in the forward direction along the ground. Finally, to project r = (X, Y, Z) into 2D ground coordinates, first define p0 = kn, which is the 3D location of the user's shoes (i.e. the spot on the ground directly below the camera). Then we get 2D ground coordinates (u', v'), where u' = (r - p0)·a and v' = (r - p0)·b. These coordinates will have units of meters if k is defined in meters.
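Putting the basis construction together, a sketch using numpy (function names are illustrative; it assumes n is unit length and the camera is not rolled 90 degrees, so the projected X-axis is nonzero):

    import numpy as np

    def ground_basis(n):
        # Orthonormal basis (a, b) in the ground plane with unit normal n:
        # a is the X-axis with its component along n removed, then normalized;
        # b = n x a points forward along the ground for a roughly level camera.
        n = np.asarray(n, dtype=float)
        a = np.array([1.0, 0.0, 0.0]) - n[0] * n
        a /= np.linalg.norm(a)
        b = np.cross(n, a)
        return a, b

    def ground_coords(r, n, k):
        # 2D ground coordinates (u', v') of a 3D point r on the plane r . n = k,
        # measured from p0 = k*n, the spot on the ground directly below the camera.
        a, b = ground_basis(n)
        p0 = k * np.asarray(n, dtype=float)
        d = np.asarray(r, dtype=float) - p0
        return float(d @ a), float(d @ b)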
Projecting a Pixel onto the Ground Plane

Given the ground plane defined as above, r·n = k, and a pixel with image coordinates (u, v), we would like to determine the (X, Y, Z) coordinates of this pixel assuming it corresponds to a point lying on the ground plane. The solution is to rewrite the projection equations as X = uZ/f and Y = vZ/f, and to rewrite the plane equation in terms of Z (but not X or Y): (uZ/f, vZ/f, Z)·n = k. We can then solve for Z as Z = f k/[(u, v, f)·n], and use the rewritten projection equations, X = uZ/f and Y = vZ/f, to determine X and Y.
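The same solution in code, as a sketch (illustrative name; pixels on or above the horizon make the denominator zero or flip its sign, so the result is only meaningful when the returned Z is positive):

    def pixel_to_ground(u, v, f, n, k):
        # Intersect the ray through pixel (u, v) with the plane r . n = k:
        # Z = f*k / ((u, v, f) . n), then X = u*Z/f, Y = v*Z/f.
        denom = u * n[0] + v * n[1] + f * n[2]
        Z = f * k / denom
        return u * Z / f, v * Z / f, Z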
Stereo Geometry

Now we discuss stereo for a pair of rectified cameras, for which a row in the left camera matches the same row in the right camera. We use the same (u, v) and (X, Y, Z) coordinate systems as before. The reference (right) camera's center is at (0, 0, 0) as before, but the center of the other camera is at (-B, 0, 0); it is shifted by a baseline distance B > 0 from the reference camera. Note that the image coordinates (u, v) refer to the reference image, not the other image. The projection equations relate (u, v), (X, Y, Z) and the disparity d:

    u = f X/Z    (5)
    v = f Y/Z    (6)
    Z = f B/d    (7)
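Equations (5)-(7) invert directly to recover 3D from a disparity measurement; a minimal sketch (illustrative name; assumes d > 0):

    def triangulate(u, v, d, f, B):
        # Depth from disparity, equation (7): Z = f*B/d, then invert (5)-(6)
        # to get X = u*Z/f and Y = v*Z/f in the reference camera frame.
        Z = f * B / d
        return u * Z / f, v * Z / f, Z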
In the Point Grey convention, the right image R(u, v) is the reference image, and the left image is L(u, v). This convention gives us the following interpretation for the disparity map d(u, v): (u, v) in the right image maps to (u + d(u, v), v) in the left image. In other words,

    L(u + d(u, v), v) ≈ R(u, v)    (8)

provided the camera gain is similar in the left and right images.
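One way to sanity-check a disparity map against equation (8) is to resample the left image at (u + d(u, v), v) and compare it with the right image. A numpy sketch with nearest-neighbor sampling (illustrative name; real code would interpolate and handle occlusions):

    import numpy as np

    def predict_right_from_left(L, disp):
        # L and disp are H x W arrays indexed [row, col]. The centered u and
        # the column index differ only by a constant, so the shift u + d(u, v)
        # is simply col + d. Out-of-bounds samples are left at zero.
        H, W = disp.shape
        cols = np.arange(W)[None, :] + np.rint(disp).astype(int)
        rows = np.broadcast_to(np.arange(H)[:, None], (H, W))
        valid = (cols >= 0) & (cols < W)
        pred = np.zeros_like(L)
        pred[valid] = L[rows[valid], cols[valid]]
        return pred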
Ground Plane Disparity

As before, the ground plane is defined by r·n = k, where r = (X, Y, Z) and n = (nx, ny, nz) is the ground plane unit normal. If we assume every point in the scene lies on the ground plane, then we can project any point onto this plane: since X = uZ/f and Y = vZ/f, we get Z = f k/(n·(u, v, f)). Finally, since in general Z = f B/d, we have the predicted disparity:

    dg(u, v) = f B/Z = B n·(u, v, f)/k    (9)

i.e. the predicted disparity (assuming every point in the scene lies on the ground plane) as a function of u and v. Note that this disparity is linear in the image row and column coordinates.
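Because equation (9) is linear in u and v, the full ground-plane disparity image can be produced in one vectorized expression. A sketch (illustrative name; u and v may be scalars or meshgrid arrays of centered image coordinates):

    def ground_disparity(u, v, f, B, n, k):
        # Equation (9): d_g(u, v) = B * ((u, v, f) . n) / k. With k < 0 and the
        # upward normal n, pixels below the horizon get positive disparity.
        return B * (u * n[0] + v * n[1] + f * n[2]) / k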
Elevation

It is sometimes useful to implement an MRF in terms of elevations instead of disparities. Given a unit normal direction vector n, define the elevation of any point r = (X, Y, Z) to be E(r) = n·r. (Sometimes we measure the elevation relative to a ground plane defined by n·r = k, in which case we subtract k from the above expression for E(r). In this section we will continue to use E(r), which is by definition equal to zero at the camera center.)

Converting disparity to elevation: E(u, v, d) = r·n = (Zu/f, Zv/f, Z)·n = (Z/f)(u, v, f)·n, where Z = f B/d.

Converting elevation to disparity: this is a bit more work. d(u, v, E) = f B/Z, so let's solve for Z. We know that E = (X, Y, Z)·n = (Zu/f, Zv/f, Z)·n = (Z/f)(u, v, f)·n, so Z = f E/[(u, v, f)·n]. Therefore

    d(u, v, E) = B (u, v, f)·n / E    (10)
Note that these equations hold whether or not the baseline is roughly horizontal (i.e. whether or not nx ≈ 0).
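Both conversions in code, as a sketch under the same conventions (illustrative names; the inverse direction requires E != 0, i.e. the point must not be at the camera's own elevation):

    def disparity_to_elevation(u, v, d, f, B, n):
        # E = r . n with Z = f*B/d, i.e. E = (Z/f) * ((u, v, f) . n).
        Z = f * B / d
        return (Z / f) * (u * n[0] + v * n[1] + f * n[2])

    def elevation_to_disparity(u, v, E, f, B, n):
        # Equation (10): d(u, v, E) = B * ((u, v, f) . n) / E.
        return B * (u * n[0] + v * n[1] + f * n[2]) / E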