
CVF07 Lec07-08

The document discusses camera modeling and the perspective projection process. It describes how a camera maps 3D points in the world to 2D points on the image plane using a pinhole camera model. The key steps are: 1) A perspective transform matrix relates the camera coordinate frame to the image coordinate frame based on camera intrinsics like focal length. 2) Additional transformations map a world point to the camera frame, accounting for camera extrinsics like position and orientation. 3) Combining the intrinsic and extrinsic transformations yields an overall mapping from 3D world points to 2D image points.


Ref: Light and Vision: LIFE Science Library

Camera Model

Modeling a Camera

Image by Dr Yaser Sheikh, CMU


Frames of Reference
• World Coordinate Frame, W
• Object Coordinate Frame, O (e.g. B, P)
• Camera Coordinate Frame, C
• Real Image Coordinate Frame, F
• Pixel Coordinate Frame, I

Aperture vs Shutter Speed

• If the shutter speed is doubled (halving the exposure time) and the aperture area is doubled, the same amount of light should enter the camera
• Therefore, to shoot an image, there are several valid combinations of aperture and shutter speed
• High shutter speed: for fast-moving objects
• Large aperture: shallow depth of field
Focus
• In general, any single point on the film can have light coming from different directions
• Therefore a single point in the world may be mapped to several locations in the image
• This generates blur
• To remove blur, all rays coming from a single world point must converge to a single image point

Example of Shallow Depth of Field
Pinhole Camera
• The lens is assumed to be a single point
• Infinitesimally small aperture
• Infinite depth of field, i.e. everything is in focus

Distant objects are smaller

Slide credit: Forsyth/Ponce, http://www.cs.berkeley.edu/~daf/bookpages/slides.html, and Khurram Shafique, ObjectVideo
Pinhole Camera
• Advantage: because of the small aperture, everything is in focus (infinite depth of field); simple construction
• Disadvantage: the small aperture requires a long exposure time, often too long for practical purposes

Image Formation – The Pinhole Camera
• Orient the camera along the z-axis
• World point at (X, Y, Z) [camera frame]
• Image point at (x, y) [real image frame]

Perspective Transform

What equation relates the world coordinates to the image coordinates? From the pinhole geometry, by similar triangles:

−y/Y = f/Z, so y = −fY/Z and x = −fX/Z

It is customary to use a negative sign to indicate that the image is always formed upside down.

Perspective Transform
• We can write this as a matrix using homogeneous coordinates:

$$\begin{bmatrix} hx \\ hy \\ hz \\ h \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{f} & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

so hx = X, hy = Y, hz = Z, and h = 1 − Z/f. Dividing by h recovers the image coordinates:

x = hx/h = fX/(f − Z) ≈ −fX/Z,  y = hy/h = fY/(f − Z) ≈ −fY/Z

(the approximation is exact in the limit |Z| ≫ f, which is the usual imaging situation).
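A minimal MATLAB sketch of this projection; the focal length and test point are assumed values:

% Perspective projection with homogeneous coordinates (hypothetical values)
f  = 0.035;                                   % focal length, assumed 35 mm
P  = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 -1/f 1]; % perspective transform matrix
Wc = [2; 1; 4; 1];                            % point in the camera frame, homogeneous
Ch = P * Wc;                                  % [hx; hy; hz; h]
x  = Ch(1)/Ch(4);                             % ~ -f*X/Z since |Z| >> f
y  = Ch(2)/Ch(4);                             % ~ -f*Y/Z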
Perspective Transform: Some Properties
• Lines map to lines
• Polygons map to polygons
• Parallel lines meet (at a vanishing point)

Perspective Transform
• This relates the camera frame to the real image frame
• Example: I take an image of a person (2 m tall) standing 4 m away from the camera, with a 35 mm camera, using the geometry shown previously. How high will the image be?
• Answer: y = −(35)(2000)/4000 = −17.5 mm, i.e. the image will be formed inverted, with a height of 17.5 mm
• How do we convert to the pixel frame (i.e. what will be the coordinates of the head of the person in the image)?
Perspective Transform
• Suppose I know that the size of the film is 8 cm × 6 cm, and that the resolution of the camera is 640 × 480 pixels
• This implies that the center of the image is 4 cm × 3 cm from the corner, at pixel location (240, 320) in (row, column) order
• The image will first be made right side up
• 17.5 mm out of 60 mm is 140 out of 480 pixels
• Hence the coordinates of the head will be (240 − 140, 320) = (100, 320), as computed in the sketch below
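The same conversion in MATLAB, using the numbers from this example (the (row, column) origin convention above is assumed):

% Film-to-pixel conversion for the worked example
f = 35; Y = 2000; Z = 4000;            % millimetres
y_img  = -f*Y/Z;                       % -17.5 mm on the film, inverted
film_h = 60; rows = 480;               % 6 cm of film height maps to 480 rows
dy     = abs(y_img)/film_h * rows;     % 140 pixels above the image center
head   = [240 - dy, 320];              % (row, col) = (100, 320), right side up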

Perspective Projection
• This is for the case when the camera's optical axis is aligned with the world z-axis
• In other words, it relates the camera frame to the real image frame
• What if that is not the case?
Camera Model
• If the camera is moved by T from the origin, we should move the world point by −T
• Then the perspective transform equation will be applicable
• The same holds for rotations

Camera Model
• Think of the camera as originally at the origin, looking down the Z axis
• It was then translated by (r1, r2, r3)^T, rotated by φ about X and θ about Z, and finally translated by (X0, Y0, Z0)^T
• This is the scenario in the figure on the right
Camera Model

The world-to-image mapping applies the inverse of each placement step, in reverse order, followed by the perspective transform:

$$C_h = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{f} & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -r_1 \\ 0 & 1 & 0 & -r_2 \\ 0 & 0 & 1 & -r_3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\varphi & \sin\varphi & 0 \\ 0 & -\sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta & 0 & 0 \\ -\sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -X_0 \\ 0 & 1 & 0 & -Y_0 \\ 0 & 0 & 1 & -Z_0 \\ 0 & 0 & 0 & 1 \end{bmatrix} W_h$$

In symbols:

$$C_h = P\,C\,R_{X,-\varphi}\,R_{Z,-\theta}\,G\,W_h$$

where P is the perspective transform, C translates by −(r1, r2, r3)^T, and G translates by −(X0, Y0, Z0)^T.
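A minimal MATLAB sketch of this composition; the focal length, angles, translations, and test point are assumed values:

% Compose the world-to-image mapping C_h = P*C*Rx(-phi)*Rz(-theta)*G*W_h
f = 0.035; phi = pi/6; theta = pi/4;               % assumed values
r  = [0.1; 0.2; 0];  t0 = [5; 2; 10];              % (r1,r2,r3) and (X0,Y0,Z0)
P  = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 -1/f 1];      % perspective transform
Rx = @(a)[1 0 0 0; 0 cos(a) -sin(a) 0; 0 sin(a) cos(a) 0; 0 0 0 1];
Rz = @(a)[cos(a) -sin(a) 0 0; sin(a) cos(a) 0 0; 0 0 1 0; 0 0 0 1];
T  = @(v)[eye(3) v; 0 0 0 1];                      % homogeneous translation
M  = P * T(-r) * Rx(-phi) * Rz(-theta) * T(-t0);   % full camera matrix
Wh = [1; 2; 3; 1];                                 % a world point, homogeneous
Ch = M * Wh;  xy = Ch(1:2)/Ch(4);                  % image coordinates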

Camera Model
• This camera model is applicable in many situations
• For example, this is the typical surveillance camera scenario

Examples

With f = 0.01, consider the world point W_h = (0, 0, 10, 1)^T.

Camera rotated by −45° about the Y axis:

$$C_h = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{0.01} & 1 \end{bmatrix} T \begin{bmatrix} 0 \\ 0 \\ 10 \\ 1 \end{bmatrix}, \qquad T = \begin{bmatrix} \cos(-45^{\circ}) & 0 & \sin(-45^{\circ}) & 0 \\ 0 & 1 & 0 & 0 \\ -\sin(-45^{\circ}) & 0 & \cos(-45^{\circ}) & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Camera translated by 10 units along the X axis:

$$C_h = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{0.01} & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -10 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 10 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -100 & 1 \end{bmatrix} \begin{bmatrix} -10 \\ 0 \\ 10 \\ 1 \end{bmatrix} = \begin{bmatrix} -10 \\ 0 \\ 10 \\ -999 \end{bmatrix}$$

Dividing by h = −999 gives the image point x = −10/−999 ≈ 0.01, y = 0.
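The translation example can be checked numerically in MATLAB:

% Verify the translation example
P  = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 -1/0.01 1];  % f = 0.01
T  = [1 0 0 -10; 0 1 0 0; 0 0 1 0; 0 0 0 1];      % camera moved 10 units along X
Wh = [0; 0; 10; 1];
Ch = P * T * Wh;                                  % -> [-10; 0; 10; -999]
xy = Ch(1:2)/Ch(4);                               % image point, approx (0.01, 0)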
Aircraft Example
OTTER system_id
TV sensor_type
0001 serial_number
9.400008152666640300e+08 image_time
3.813193746469612200e+01 vehicle_latitude
-7.734523185193877700e+01 vehicle_longitude
9.949658409987658800e+02 vehicle_height
9.995171174441039900e-01 vehicle_pitch
1.701626418113209000e+00 vehicle_roll
1.207010551753029400e+02 vehicle_heading
1.658968732990974800e-02 camera_focal_length
-5.361314389557259100e+01 camera_elevation
-7.232969433546705000e+00 camera_scan_angle
480 number_image_lines
640 number_image_samples

cameraMat = perspective_transform * gimbal_rotation_y * gimbal_rotation_z * gimbal_translation * vehicle_rotation_x * vehicle_rotation_y * vehicle_rotation_z * vehicle_translation;

$$\Pi_t = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{f} & 1 \end{bmatrix} \begin{bmatrix} \cos\omega & 0 & -\sin\omega & 0 \\ 0 & 1 & 0 & 0 \\ \sin\omega & 0 & \cos\omega & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\tau & \sin\tau & 0 & 0 \\ -\sin\tau & \cos\tau & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\phi & 0 & -\sin\phi & 0 \\ 0 & 1 & 0 & 0 \\ \sin\phi & 0 & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\beta & \sin\beta & 0 \\ 0 & -\sin\beta & \cos\beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\alpha & \sin\alpha & 0 & 0 \\ -\sin\alpha & \cos\alpha & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -\Delta T_x \\ 0 & 1 & 0 & -\Delta T_y \\ 0 & 0 & 1 & -\Delta T_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

% Entries of the aircraft camera matrix, expanded symbolically.
% c_scn = camera scan angle, c_elv = camera elevation,
% v_pch, v_rll, v_hdg = vehicle pitch, roll, heading,
% fl = focal length, (vx, vy, vz) = vehicle position.

c(1,1) = (cos(c_scn)*cos(v_rll)-sin(c_scn)*sin(v_pch)*sin(v_rll))*cos(v_hdg)-sin(c_scn)*cos(v_pch)*sin(v_hdg);
c(1,2) = -(cos(c_scn)*cos(v_rll)-sin(c_scn)*sin(v_pch)*sin(v_rll))*sin(v_hdg)-sin(c_scn)*cos(v_pch)*cos(v_hdg);
c(1,3) = -cos(c_scn)*sin(v_rll)-sin(c_scn)*sin(v_pch)*cos(v_rll);
c(1,4) = -((cos(c_scn)*cos(v_rll)-sin(c_scn)*sin(v_pch)*sin(v_rll))*cos(v_hdg)-sin(c_scn)*cos(v_pch)*sin(v_hdg))*vx-(-(cos(c_scn)*cos(v_rll)-sin(c_scn)*sin(v_pch)*sin(v_rll))*sin(v_hdg)-sin(c_scn)*cos(v_pch)*cos(v_hdg))*vy-(-cos(c_scn)*sin(v_rll)-sin(c_scn)*sin(v_pch)*cos(v_rll))*vz;

c(2,1) = (-sin(c_elv)*sin(c_scn)*cos(v_rll)+(-sin(c_elv)*cos(c_scn)*sin(v_pch)+cos(c_elv)*cos(v_pch))*sin(v_rll))*cos(v_hdg)+(-sin(c_elv)*cos(c_scn)*cos(v_pch)-cos(c_elv)*sin(v_pch))*sin(v_hdg);
c(2,2) = -(-sin(c_elv)*sin(c_scn)*cos(v_rll)+(-sin(c_elv)*cos(c_scn)*sin(v_pch)+cos(c_elv)*cos(v_pch))*sin(v_rll))*sin(v_hdg)+(-sin(c_elv)*cos(c_scn)*cos(v_pch)-cos(c_elv)*sin(v_pch))*cos(v_hdg);
c(2,3) = sin(c_elv)*sin(c_scn)*sin(v_rll)+(-sin(c_elv)*cos(c_scn)*sin(v_pch)+cos(c_elv)*cos(v_pch))*cos(v_rll);
c(2,4) = -((-sin(c_elv)*sin(c_scn)*cos(v_rll)+(-sin(c_elv)*cos(c_scn)*sin(v_pch)+cos(c_elv)*cos(v_pch))*sin(v_rll))*cos(v_hdg)+(-sin(c_elv)*cos(c_scn)*cos(v_pch)-cos(c_elv)*sin(v_pch))*sin(v_hdg))*vx-(-(-sin(c_elv)*sin(c_scn)*cos(v_rll)+(-sin(c_elv)*cos(c_scn)*sin(v_pch)+cos(c_elv)*cos(v_pch))*sin(v_rll))*sin(v_hdg)+(-sin(c_elv)*cos(c_scn)*cos(v_pch)-cos(c_elv)*sin(v_pch))*cos(v_hdg))*vy-(sin(c_elv)*sin(c_scn)*sin(v_rll)+(-sin(c_elv)*cos(c_scn)*sin(v_pch)+cos(c_elv)*cos(v_pch))*cos(v_rll))*vz;

c(3,1) = (cos(c_elv)*sin(c_scn)*cos(v_rll)+(cos(c_elv)*cos(c_scn)*sin(v_pch)+sin(c_elv)*cos(v_pch))*sin(v_rll))*cos(v_hdg)+(cos(c_elv)*cos(c_scn)*cos(v_pch)-sin(c_elv)*sin(v_pch))*sin(v_hdg);
c(3,2) = -(cos(c_elv)*sin(c_scn)*cos(v_rll)+(cos(c_elv)*cos(c_scn)*sin(v_pch)+sin(c_elv)*cos(v_pch))*sin(v_rll))*sin(v_hdg)+(cos(c_elv)*cos(c_scn)*cos(v_pch)-sin(c_elv)*sin(v_pch))*cos(v_hdg);
c(3,3) = -cos(c_elv)*sin(c_scn)*sin(v_rll)+(cos(c_elv)*cos(c_scn)*sin(v_pch)+sin(c_elv)*cos(v_pch))*cos(v_rll);
c(3,4) = -((cos(c_elv)*sin(c_scn)*cos(v_rll)+(cos(c_elv)*cos(c_scn)*sin(v_pch)+sin(c_elv)*cos(v_pch))*sin(v_rll))*cos(v_hdg)+(cos(c_elv)*cos(c_scn)*cos(v_pch)-sin(c_elv)*sin(v_pch))*sin(v_hdg))*vx-(-(cos(c_elv)*sin(c_scn)*cos(v_rll)+(cos(c_elv)*cos(c_scn)*sin(v_pch)+sin(c_elv)*cos(v_pch))*sin(v_rll))*sin(v_hdg)+(cos(c_elv)*cos(c_scn)*cos(v_pch)-sin(c_elv)*sin(v_pch))*cos(v_hdg))*vy-(-cos(c_elv)*sin(c_scn)*sin(v_rll)+(cos(c_elv)*cos(c_scn)*sin(v_pch)+sin(c_elv)*cos(v_pch))*cos(v_rll))*vz;

c(4,1) = (1/fl*cos(c_elv)*sin(c_scn)*cos(v_rll)+(1/fl*cos(c_elv)*cos(c_scn)*sin(v_pch)+1/fl*sin(c_elv)*cos(v_pch))*sin(v_rll))*cos(v_hdg)+(1/fl*cos(c_elv)*cos(c_scn)*cos(v_pch)-1/fl*sin(c_elv)*sin(v_pch))*sin(v_hdg);
c(4,2) = -(1/fl*cos(c_elv)*sin(c_scn)*cos(v_rll)+(1/fl*cos(c_elv)*cos(c_scn)*sin(v_pch)+1/fl*sin(c_elv)*cos(v_pch))*sin(v_rll))*sin(v_hdg)+(1/fl*cos(c_elv)*cos(c_scn)*cos(v_pch)-1/fl*sin(c_elv)*sin(v_pch))*cos(v_hdg);
c(4,3) = -1/fl*cos(c_elv)*sin(c_scn)*sin(v_rll)+(1/fl*cos(c_elv)*cos(c_scn)*sin(v_pch)+1/fl*sin(c_elv)*cos(v_pch))*cos(v_rll);
c(4,4) = -((1/fl*cos(c_elv)*sin(c_scn)*cos(v_rll)+(1/fl*cos(c_elv)*cos(c_scn)*sin(v_pch)+1/fl*sin(c_elv)*cos(v_pch))*sin(v_rll))*cos(v_hdg)+(1/fl*cos(c_elv)*cos(c_scn)*cos(v_pch)-1/fl*sin(c_elv)*sin(v_pch))*sin(v_hdg))*vx-(-(1/fl*cos(c_elv)*sin(c_scn)*cos(v_rll)+(1/fl*cos(c_elv)*cos(c_scn)*sin(v_pch)+1/fl*sin(c_elv)*cos(v_pch))*sin(v_rll))*sin(v_hdg)+(1/fl*cos(c_elv)*cos(c_scn)*cos(v_pch)-1/fl*sin(c_elv)*sin(v_pch))*cos(v_hdg))*vy-(-1/fl*cos(c_elv)*sin(c_scn)*sin(v_rll)+(1/fl*cos(c_elv)*cos(c_scn)*sin(v_pch)+1/fl*sin(c_elv)*cos(v_pch))*cos(v_rll))*vz+1;
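These expanded entries come from multiplying Π_t out symbolically. Below is a sketch of the equivalent numeric composition, mirroring the cameraMat line above; the pairing of telemetry angles with rotation axes, the zero gimbal offset, and the use of degrees converted to radians are assumptions for illustration:

% Hypothetical numeric composition of the aircraft camera matrix
fl = 0.0166;                                          % camera_focal_length
c_elv = deg2rad(-53.61); c_scn = deg2rad(-7.23);      % camera elevation, scan (assumed degrees)
v_pch = deg2rad(1.0); v_rll = deg2rad(1.7); v_hdg = deg2rad(120.7);
Ry = @(a)[cos(a) 0 -sin(a) 0; 0 1 0 0; sin(a) 0 cos(a) 0; 0 0 0 1];
Rz = @(a)[cos(a) sin(a) 0 0; -sin(a) cos(a) 0 0; 0 0 1 0; 0 0 0 1];
Rx = @(a)[1 0 0 0; 0 cos(a) sin(a) 0; 0 -sin(a) cos(a) 0; 0 0 0 1];
T  = @(v)[eye(3) -v; 0 0 0 1];                        % translation by -v
P  = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 -1/fl 1];
gimbal_offset = [0; 0; 0];  vehicle_pos = [0; 0; 994.97];
cameraMat = P * Ry(c_elv) * Rz(c_scn) * T(gimbal_offset) * ...
            Rx(v_rll) * Ry(v_pch) * Rz(v_hdg) * T(vehicle_pos);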
Weak Perspective Projection
• An approximation to perspective projection, valid when the distance to the camera is much greater than the depth variation of the object
• The perspective equations x = −fX/Z, y = −fY/Z are replaced by x = mX, y = mY, where m = −f/Z̄ is a constant magnification computed at the average depth Z̄
• Advantage: computationally simpler [why? see the sketch below]
• Disadvantage: not physically accurate
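A small MATLAB comparison on hypothetical points whose depth variation is small relative to their distance; it also answers the "why": weak perspective replaces a per-point division by Z with a single shared multiply:

% Perspective vs weak perspective (assumed values)
f = 0.05; Zbar = 10; m = -f/Zbar;                % constant magnification
pts = [1.0 1.2 0.9;                              % X coordinates
       0.5 0.4 0.6;                              % Y coordinates
       9.8 10.1 10.2];                           % Z coordinates, near Zbar
persp = -f * pts(1:2,:) ./ pts(3,:);             % exact: divide by each Z
weak  = m * pts(1:2,:);                          % approx: one shared scale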

Orthographic Projection
• Scaling of weak perspective projection: x = X, y = Y
• Parallel lines remain parallel
• Useful for engineering drawings, scrolls, where perspective foreshortening is not desired
• Computationally simpler
http://www2.arts.ubc.ca/TheatreDesign/crslib/drft_1/orthint.htm

Plane + Perspective Model
• Assumptions:
  • Planar world
  • Rigid motion of the world
  • Perspective camera
• Question: what is the relationship between two images taken under these conditions?
Plane + Perspective Model
• Approach: put the planarity constraint into the rigid-motion model, then solve for the image-to-image relation
• Assume that the camera is not moving but the world plane has moved in front of the camera

Step 1: Relate the two sets of 3D world points (before and after the transformation):

$$\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix}$$

Plane + Perspective Model

Step 2: Put in the planarity constraint. The equation of a plane in 3D is

$$aX + bY + cZ = 1 \quad\text{or}\quad \begin{bmatrix} a & b & c \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = 1$$

so the translation term can be multiplied by this quantity (which equals 1 for every point on the plane):

$$\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \begin{bmatrix} T_x \\ T_y \\ T_z \end{bmatrix} \begin{bmatrix} a & b & c \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

The outer product of the 3×1 translation and the 1×3 plane row is a 3×3 matrix.
Plane + Perspective Model

This can be simplified to

$$\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}, \qquad A = R + T \begin{bmatrix} a & b & c \end{bmatrix}$$

Plane + Perspective Model

Step 3: Take the perspective projection of the points and relate them in 2D. Assume f = 1 and ignore the negative sign. The image of point [X, Y, Z]^T (before the transformation) is formed at

x = X/Z,  y = Y/Z

and the image of point [X', Y', Z']^T (after the transformation) is formed at

x' = X'/Z',  y' = Y'/Z'

We need to relate the two.
Plane + Perspective Model

Given x' = X'/Z' and y' = Y'/Z', substitute the values of X', Y', Z':

$$x' = \frac{a_{11}X + a_{12}Y + a_{13}Z}{a_{31}X + a_{32}Y + a_{33}Z}, \qquad y' = \frac{a_{21}X + a_{22}Y + a_{23}Z}{a_{31}X + a_{32}Y + a_{33}Z}$$

Plane + Perspective Model

Divide the numerator and the denominator by a₃₃Z, writing a'_{ij} = a_{ij}/a_{33}:

$$x' = \frac{a'_{11}\frac{X}{Z} + a'_{12}\frac{Y}{Z} + a'_{13}}{a'_{31}\frac{X}{Z} + a'_{32}\frac{Y}{Z} + 1}, \qquad y' = \frac{a'_{21}\frac{X}{Z} + a'_{22}\frac{Y}{Z} + a'_{23}}{a'_{31}\frac{X}{Z} + a'_{32}\frac{Y}{Z} + 1}$$
Plane + Perspective Model

Substitute X/Z = x, Y/Z = y:

$$x' = \frac{a'_{11}x + a'_{12}y + a'_{13}}{a'_{31}x + a'_{32}y + 1}, \qquad y' = \frac{a'_{21}x + a'_{22}y + a'_{23}}{a'_{31}x + a'_{32}y + 1}$$

This equation relates the two images captured before and after the transformation.
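A minimal MATLAB sketch of the whole chain, with a hypothetical rotation, translation, and plane; it builds A = R + T[a b c] and maps an image point through the resulting projective transformation:

% Map a point between two views of a plane (hypothetical values)
th = pi/18;
R  = [cos(th) -sin(th) 0; sin(th) cos(th) 0; 0 0 1];  % rigid rotation
T  = [0.1; 0; 0.2];                                   % translation
n  = [0 0 0.25];                                      % plane aX+bY+cZ = 1, i.e. Z = 4
A  = R + T * n;                                       % 3x3 projective matrix
p  = [0.3; -0.2; 1];                                  % image point (x, y, 1)
q  = A * p;  q = q(1:2)/q(3);                         % corresponding point (x', y')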

Plane + Perspective Model
• Conclusion: a planar world and a perspective camera yield a projective relationship between the images
• Similarly, it can be shown that a planar world and an orthographic camera yield an affine relationship between the images
Examples of Projective Transformations

Camera Calibration
• To relate 3D world points to 2D camera points, we need to know a lot of things about the camera:
  • Camera location X, Y, Z
  • Camera orientation α, β
  • Gimbal vector (r1, r2, r3)^T
  • Focal length f
  • Size of the CCD array
  • Center of projection
• Intrinsic parameters are internal to the camera; they do not change when the camera is moved
• Extrinsic parameters are external to the camera; they change when the camera is moved
Camera Calibration
• In general, the camera model looks like C_h = A W_h, where the single matrix A combines all of the intrinsic and extrinsic transformations
• Calibration is the process of finding the parameters [a11 … a44]
• If W_h and C_h are known, then we can solve for the unknown parameters

Camera Calibration
• This equation has 12 unknowns
• Each correspondence yields two equations
• If 6 correspondences are known, we can solve for the unknowns

Camera Calibration
• Separating out the knowns and the unknowns

Camera Calibration
• This system CP = 0 is a homogeneous system
• C is rank deficient: rank(C) = 11
• It has multiple solutions (other than the trivial solution) and can be solved uniquely only up to a scale factor
• Solution?

Solving for P
• The null space of C represents the vectors P that solve the system CP = 0
• How to find the null space?
  1. null(C) in MATLAB
  2. Take the SVD of C, C = USV^T. The column of V corresponding to the zero singular value represents the solution (in practice, take the column for the smallest singular value)
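A sketch of the whole calibration solve in MATLAB. It uses the standard 3×4 form of the camera matrix (the same 12 unknowns as above); the synthetic camera and points are assumptions for illustration:

% Synthetic correspondences from a hypothetical camera
Ptrue = [100 0 320 0; 0 100 240 0; 0 0 1 0];   % 3x4 camera, assumed
Wpts  = 10*rand(6,3) + [0 0 5];                % 6 world points in front of the camera
proj  = (Ptrue * [Wpts, ones(6,1)]')';         % homogeneous image points
ipts  = proj(:,1:2) ./ proj(:,3);              % pixel coordinates

% Stack two equations per correspondence into C, then solve CP = 0 via SVD
C = zeros(12, 12);
for i = 1:6
    Xw = [Wpts(i,:) 1];                        % homogeneous world point
    x = ipts(i,1); y = ipts(i,2);
    C(2*i-1,:) = [Xw, zeros(1,4), -x*Xw];
    C(2*i,  :) = [zeros(1,4), Xw, -y*Xw];
end
[~, ~, V] = svd(C);
P = reshape(V(:,end), 4, 3)';                  % 3x4 camera matrix, up to scale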
