Camera
VISION
CAMERA & IMAGE FORMATION
• Camera models
– Pinhole camera model
– Perspective projection
• Camera calibration
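• Implementing calibration to determine the intrinsic and extrinsic matrices:
$$K = \begin{bmatrix} \alpha & 0 & c_x \\ 0 & \beta & c_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad [R \mid t] = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}$$
• Implementing a small augmented-reality application that embeds 3D objects in a real scene.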
• Thin Lens Model: This model incorporates a thin lens to simulate the effects of
lenses on image formation
• Fish-Eye Camera Model: Fish-eye lenses capture a very wide field of view.
• In the model, the image plane (the film or medium that captures the light rays) is placed in
front of the pinhole, although in a real camera it lies behind the pinhole. This
convention makes the projection easier to model because we don’t have to worry about
inverting the image.
• All the light rays from different points converge at the pinhole, which can also be called
the center of projection or the camera center.
• The idea is that the image of a point is the projection of that point on the image plane, or
where the line from the camera center to the point intersects the image plane.
• What we want is a one-to-one correspondence between the points in the world and the
pixels in the film.
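$$\frac{x}{f} = \frac{X}{Z} \Rightarrow x = f\frac{X}{Z}; \qquad \frac{y}{f} = \frac{Y}{Z} \Rightarrow y = f\frac{Y}{Z}$$
$$(x, y, 1)^T \sim (fX, fY, Z)^T$$
[Figure: the world point M is mapped along the projection line to the image point m.]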
• Change of unit: physical measurements to pixels
• Change of coordinate system
– Image plane coordinates have origin at image center
– Digital image coordinates have origin at top-left corner (0, 0)
$$x = \alpha\frac{X}{Z} + c_x, \qquad y = \beta\frac{Y}{Z} + c_y$$
[Figure: projection of M to m along the projection line; the point (256, 256) is marked in the digital image.]
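The two changes above (unit and origin) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `project_to_pixels` and the numeric values (an 800-pixel focal scale and a principal point at (256, 256)) are assumptions chosen for the example.

```python
import numpy as np

def project_to_pixels(point_cam, alpha, beta, cx, cy):
    """Map a 3D point in camera coordinates to digital image coordinates."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    x = alpha * X / Z + cx  # scale to pixels, then shift the origin
    y = beta * Y / Z + cy
    return np.array([x, y])

# Hypothetical intrinsics: principal point at the center of a 512x512 image.
pixel = project_to_pixels((0.5, -0.25, 2.0), alpha=800.0, beta=800.0,
                          cx=256.0, cy=256.0)
print(pixel)
```

A point on the optical axis (X = Y = 0) lands exactly on (c_x, c_y), which is a quick sanity check for any chosen parameters.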
Ngo Quoc Viet 10
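HOMOGENEOUS COORDINATES
$$(x, y) \to \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \quad (x, y, z) \to \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}; \qquad \begin{bmatrix} x \\ y \\ w \end{bmatrix} \to \left(\frac{x}{w}, \frac{y}{w}\right), \quad \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} \to \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}\right)$$
$$\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix} \times \begin{bmatrix} x_j \\ y_j \\ 1 \end{bmatrix} = \begin{bmatrix} y_i \cdot 1 - 1 \cdot y_j \\ 1 \cdot x_j - x_i \cdot 1 \\ x_i y_j - y_i x_j \end{bmatrix}$$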
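The conversions and the cross product above can be checked numerically. In the sketch below, the helper names `to_homogeneous` and `from_homogeneous` are illustrative assumptions. The cross product of two homogeneous 2D points gives the coefficients (a, b, c) of the line ax + by + c = 0 through them, which is why both points satisfy it.

```python
import numpy as np

def to_homogeneous(p):
    """Append w = 1 to a Cartesian point."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def from_homogeneous(ph):
    """Divide by the last coordinate w and drop it."""
    ph = np.asarray(ph, dtype=float)
    return ph[:-1] / ph[-1]

p_i = to_homogeneous([1.0, 2.0])
p_j = to_homogeneous([3.0, 4.0])

# Line through the two points: (y_i - y_j, x_j - x_i, x_i*y_j - y_i*x_j)
line = np.cross(p_i, p_j)
print(line)                             # line coefficients (a, b, c)
print(line @ p_i, line @ p_j)           # both zero: the points lie on the line
print(from_homogeneous([2.0, 4.0, 2.0]))  # same point as (1, 2)
```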
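• Using homogeneous coordinates, we can rewrite $x = \alpha\frac{X}{Z} + c_x$, $y = \beta\frac{Y}{Z} + c_y$ as a single matrix product. The Cartesian terms $\left(\frac{\alpha X}{Z}, \frac{\beta Y}{Z}\right)$ and $(c_x, c_y)$ correspond to the homogeneous vectors $[\alpha X, \beta Y, Z]^T$ and $[c_x, c_y, 1]^T$. Scaling the second vector by Z and adding gives
$$p = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \sim \begin{bmatrix} \alpha X + c_x Z \\ \beta Y + c_y Z \\ Z \end{bmatrix} = \begin{bmatrix} \alpha & 0 & c_x & 0 \\ 0 & \beta & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = \begin{bmatrix} \alpha & 0 & c_x & 0 \\ 0 & \beta & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} P = MP$$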
• We can represent the relationship between a point in 3D space and its image
coordinates by a matrix vector relationship.
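$$p = MP = \begin{bmatrix} \alpha & 0 & c_x \\ 0 & \beta & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} P = \begin{bmatrix} \alpha & 0 & c_x \\ 0 & \beta & c_y \\ 0 & 0 & 1 \end{bmatrix} [\, I \mid 0 \,]\, P = K [\, I \mid 0 \,]\, P$$
• The matrix K is often referred to as the camera matrix. It is also called the
intrinsic matrix.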
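The factorization p = K [I | 0] P can be written directly with NumPy. The intrinsic values below are hypothetical, picked only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical intrinsics (not taken from any real camera).
alpha, beta, cx, cy = 800.0, 800.0, 256.0, 256.0
K = np.array([[alpha, 0.0, cx],
              [0.0, beta, cy],
              [0.0, 0.0, 1.0]])

I0 = np.hstack([np.eye(3), np.zeros((3, 1))])  # the 3x4 block [I | 0]
M = K @ I0                                     # 3x4 projection matrix

P = np.array([0.5, -0.25, 2.0, 1.0])  # homogeneous 3D point (X, Y, Z, 1)
p = M @ P                             # (alpha*X + cx*Z, beta*Y + cy*Z, Z)
pixel = p[:2] / p[2]                  # divide by the third coordinate
print(p, pixel)
```

The division by the third coordinate is the only nonlinear step; everything before it is a single matrix product.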
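$$K = \begin{bmatrix} \alpha & \alpha\cot\theta & c_x \\ 0 & \frac{\beta}{\sin\theta} & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
[Figure: image axes x and y together with the skewed axis x̂.]
• Most cameras have zero skew (θ = 90°).
• The camera matrix K has 5 degrees of freedom: two for focal length (α, β),
two for the offset (c_x, c_y), and one for skew (θ).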
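• Translation
$$p = K [\, I \mid t \,]\, P = \begin{bmatrix} \alpha & 0 & c_x \\ 0 & \beta & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \end{bmatrix} P$$
• Rotation (here α, β, γ denote rotation angles about the x, y, and z axes, not the entries of K)
$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \quad R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad R_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
$$p = K [\, R \mid t \,]\, P = \begin{bmatrix} \alpha & 0 & c_x \\ 0 & \beta & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} P$$
[Figure: a point p rotated by the angle γ to p’.]
• The parameters R and t are known as the extrinsic parameters because they are
external to the camera and do not depend on it.
$$p = K [\, R \mid t \,]\, P = MP$$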
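A short sketch of how the elementary rotations and the translation combine into M = K [R | t]. The composition order R = R_z R_y R_x, the intrinsic values, and the pose are all assumptions made for this example; other conventions for composing rotations are equally common.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Assumed pose: 90 degree rotation about z, then a shift of 5 along z.
R = rot_z(np.pi / 2) @ rot_y(0.0) @ rot_x(0.0)
t = np.array([0.0, 0.0, 5.0])

# Hypothetical intrinsics.
K = np.array([[800.0, 0.0, 256.0], [0.0, 800.0, 256.0], [0.0, 0.0, 1.0]])

M = K @ np.hstack([R, t.reshape(3, 1)])  # full 3x4 projection matrix
P = np.array([1.0, 0.0, 0.0, 1.0])       # world point in homogeneous form
p = M @ P
pixel = p[:2] / p[2]
print(pixel)
```

Any valid rotation matrix satisfies R Rᵀ = I and det R = 1, which is a useful check after composing several rotations.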
• The full projection matrix M consists of the two types of parameters introduced
above: intrinsic and extrinsic parameters.
• All parameters contained in the camera matrix K are the intrinsic parameters,
which change as the type of camera changes.
• The extrinsic parameters include the rotation and translation, which do not
depend on the camera’s build.
• When the camera matrix K is equal to the identity matrix, the projection applies no
scaling and no principal-point offset: a camera-frame point (X, Y, Z) maps to the
normalized image coordinates (X/Z, Y/Z). This special case is the ideal pinhole camera
with unit focal length, where light passes through a single point (the pinhole) to form an
inverted image on the opposite side of the camera.
• Distortion coefficients are used to correct lens distortion, which can cause images
to appear warped or curved. Setting the distortion coefficients to zero means that
there is no distortion to correct.
• In some cases, it may be appropriate to use an identity camera matrix and zero
distortion coefficients. This is typically done when the image points have already been
undistorted and normalized using a previous calibration, when the distortion is minimal,
or when an ideal pinhole camera model is adequate for the task at hand.
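To make the role of the distortion coefficients concrete, here is a minimal sketch. It assumes a two-term radial polynomial model (coefficients `k1`, `k2`) applied to normalized image coordinates; the text above does not name a specific distortion model, so this choice and the function name `distort_normalized` are assumptions for illustration only.

```python
import numpy as np

def distort_normalized(xy, k1=0.0, k2=0.0):
    """Apply an assumed radial model: xy * (1 + k1*r^2 + k2*r^4)."""
    xy = np.asarray(xy, dtype=float)
    r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)

pts = np.array([[0.0, 0.0], [0.5, 0.0], [0.5, 0.5]])

unchanged = distort_normalized(pts)        # zero coefficients: identity map
barrel = distort_normalized(pts, k1=-0.2)  # negative k1 pulls points inward

# With K equal to the identity, pixel coordinates equal normalized coordinates.
K = np.eye(3)
p = K @ np.array([0.5, 0.5, 1.0])
print(unchanged, barrel, p[:2] / p[2])
```

The image center is never moved by a purely radial model, and the displacement grows with distance from the center.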
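• Orthographic: $p = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} P$
• Scaled orthography: $p = [\, s I_{2 \times 2} \mid 0 \,]\, P$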
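• Para-perspective
$$p = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -\frac{Z_{far}}{Z_{range}} & \frac{Z_{near} Z_{far}}{Z_{range}} \\ 0 & 0 & 1 & 0 \end{bmatrix} P, \qquad Z_{range} = Z_{far} - Z_{near}$$
where $Z_{near}$ and $Z_{far}$ are the near and far z clipping planes.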
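The matrix with a zeroed depth row and the scaled form [s I | 0] listed above can be compared on one point. The scale `s = 0.1` and the sample point are arbitrary values assumed for the example.

```python
import numpy as np

P = np.array([2.0, 1.0, 10.0, 1.0])  # homogeneous 3D point (X, Y, Z, 1)

# 4x4 matrix that zeroes the depth row and keeps X, Y, and w.
ortho = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
p_ortho = ortho @ P

# Scaled form [s*I_2x2 | 0] is a 2x4 matrix; s is an assumed global scale.
s = 0.1
scaled = np.hstack([s * np.eye(2), np.zeros((2, 2))])
p_scaled = scaled @ P

print(p_ortho, p_scaled)
```

Neither result depends on Z, which is the defining difference from perspective projection, where the image position is divided by depth.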
• Transform the world point $(X_w, Y_w, Z_w)$, expressed in the world frame with origin W, to a new coordinate system
with the camera’s optical center C as the (0, 0, 0) origin. This is done using a rigid body
transformation, consisting of a rotation (R) and a translation (t).
• Project the camera point $(X_c, Y_c, Z_c)$ onto the optical sensor, creating new coordinates
$(x, y, z)$ in the same coordinate system. This is achieved using the camera’s intrinsic
parameter matrix K. The optical sensor is often referred to as the “image plane” or “image
frame”.
• Normalize $(x, y, z)$ to pixel coordinates $(u, v)$ by dividing by z and adjusting the
origin of the image.
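The three steps above can be sketched as one small function operating on several points at once. The name `world_to_pixel` and all numeric values are assumptions for illustration. Here the origin shift is carried by the c_x, c_y entries of K, so it happens together with the division by z.

```python
import numpy as np

def world_to_pixel(points_w, K, R, t):
    """Map an (N, 3) array of world points to (N, 2) pixel coordinates."""
    points_w = np.asarray(points_w, dtype=float)
    points_c = points_w @ R.T + t    # step 1: world frame -> camera frame
    xyz = points_c @ K.T             # step 2: apply the intrinsic matrix K
    return xyz[:, :2] / xyz[:, 2:3]  # step 3: divide by z to get (u, v)

# Hypothetical camera: no rotation, world origin 4 units in front of it.
K = np.array([[800.0, 0.0, 256.0], [0.0, 800.0, 256.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 4.0])

uv = world_to_pixel([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]], K, R, t)
print(uv)
```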
• Python Code: Perspective Projection, Mapping coordinates from 3d to 2d in
https://colab.research.google.com/
• Inverted Image Formation: The limited rays of light passing through the
pinhole create an inverted image on the opposite side of the pinhole
• Image Projection: The inverted image is projected onto the photosensitive
surface (film or image sensor).
• Please refer to the Python code that simulates a pinhole camera.
• Robotics: Robots often use cameras for navigation and perception. Accurate
calibration is necessary for robot systems to interpret visual information correctly
and make informed decisions.
• Quality Control: In industrial applications, where cameras are used for quality
control purposes, calibration ensures that defects or features are accurately
detected and measured
• From the rig’s known pattern, we have known points in the world reference frame
$P_1, P_2, \cdots, P_n$. Finding these points in the image we take from the camera gives
corresponding points in the image $p_1, p_2, \cdots, p_n$.
• We set up a linear system of equations from the n correspondences. For each correspondence
$(P_i, p_i)$ with $p_i = (u_i, v_i)$ and camera matrix M whose rows are $m_1, m_2, m_3$,
the projection $p_i = M P_i$ gives $u_i = \frac{m_1 P_i}{m_3 P_i}$ and $v_i = \frac{m_2 P_i}{m_3 P_i}$.
• Given n of these corresponding points, the entire linear system of equations becomes
$$\begin{aligned} u_1 (m_3 P_1) - m_1 P_1 &= 0 \\ v_1 (m_3 P_1) - m_2 P_1 &= 0 \\ &\;\;\vdots \\ u_n (m_3 P_n) - m_1 P_n &= 0 \\ v_n (m_3 P_n) - m_2 P_n &= 0 \end{aligned}$$
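One common way to solve this homogeneous system is to stack the equations into a matrix A and take the right singular vector with the smallest singular value. The sketch below does that on synthetic, noise-free data. The function name, the ground-truth camera, and the random points are assumptions for the example, and practical calibration typically adds coordinate normalization and nonlinear refinement, which are omitted here.

```python
import numpy as np

def estimate_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 matrix M (up to scale) from n >= 6 correspondences."""
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    n = world_pts.shape[0]
    P_h = np.hstack([world_pts, np.ones((n, 1))])  # homogeneous world points
    zeros = np.zeros_like(P_h)
    u, v = image_pts[:, [0]], image_pts[:, [1]]
    # Unknown vector is (m1, m2, m3) stacked into 12 entries.
    rows_u = np.hstack([-P_h, zeros, u * P_h])     # u*(m3 P) - m1 P = 0
    rows_v = np.hstack([zeros, -P_h, v * P_h])     # v*(m3 P) - m2 P = 0
    A = np.vstack([rows_u, rows_v])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)                    # smallest singular vector

# Synthetic ground truth (hypothetical camera) to exercise the estimator.
rng = np.random.default_rng(0)
world = rng.uniform(-1.0, 1.0, size=(10, 3))
K = np.array([[800.0, 0.0, 256.0], [0.0, 800.0, 256.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t = np.array([0.1, -0.2, 5.0])
M_true = K @ np.hstack([R, t[:, None]])

proj = np.hstack([world, np.ones((10, 1))]) @ M_true.T
image = proj[:, :2] / proj[:, [2]]

M_est = estimate_projection_matrix(world, image)
M_est = M_est * (M_true[2, 3] / M_est[2, 3])  # fix the arbitrary scale
print(np.abs(M_est - M_true).max())
```

Because M is only defined up to scale, the estimate is rescaled before comparison. The world points must not all lie on one plane, otherwise the system does not determine M.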
• https://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html
• https://learnopencv.com/camera-calibration-using-opencv/
• https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_calib3d/py_calibration/py_calibration.html
[Figure: RGB color cube with channel images R (G=0, B=0), G (R=0, B=0), and B (R=0, G=0); corners labeled 1,0,0 and 0,0,1. Source: http://en.wikipedia.org/wiki/File:RGB_color_solid_cube.png]
Some drawbacks
• Strongly correlated channels
• Non-perceptual
COLOR SPACES: HSV
[Figure: HSV channel images H (S=1, V=1), S (H=1, V=1), and V (H=1, S=0).]
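As a small illustration of how RGB values map to hue, saturation, and value, the sketch below uses Python's standard `colorsys` module, which works on floats in the range 0 to 1. The sample colors are arbitrary.

```python
import colorsys

# Pure red: hue 0, full saturation, full value.
red_hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)

# Mid gray: zero saturation, so only V carries information.
gray_hsv = colorsys.rgb_to_hsv(0.5, 0.5, 0.5)

# The conversion is invertible for this color.
red_rgb = colorsys.hsv_to_rgb(*red_hsv)

print(red_hsv, gray_hsv, red_rgb)
```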
[Figure: YCbCr channel images Y (Cb=0.5, Cr=0.5), Cb (Y=0.5, Cr=0.5), and Cr (Y=0.5, Cb=0.5), with a Cb-Cr plane labeled Y=1.]
Fast to compute, good for compression, used by TV
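The "fast to compute" remark can be seen from how little arithmetic the conversion needs. The sketch below assumes the BT.601 luma weights (0.299, 0.587, 0.114) and a full-range form with Cb and Cr centered at 0.5. The text above does not say which YCbCr variant is meant, so these choices and the function name are assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert RGB in [0, 1] to YCbCr in [0, 1] (assumed BT.601 weights)."""
    kr, kb = 0.299, 0.114
    y = kr * r + (1.0 - kr - kb) * g + kb * b  # weighted sum of channels
    cb = 0.5 + 0.5 * (b - y) / (1.0 - kb)      # scaled blue difference
    cr = 0.5 + 0.5 * (r - y) / (1.0 - kr)      # scaled red difference
    return y, cb, cr

white = rgb_to_ycbcr(1.0, 1.0, 1.0)  # no color difference: Cb = Cr = 0.5
red = rgb_to_ycbcr(1.0, 0.0, 0.0)    # strongest red difference: Cr = 1
print(white, red)
```

Separating brightness (Y) from the two color-difference channels is what makes this representation convenient for compression, since the color channels can be stored at lower resolution.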
[Figure: L*a*b* channel images L (a=0, b=0), a (L=65, b=0), and b (L=65, a=0).]
• The pinhole camera model is the simplest mathematical model that can be applied
to many real photographic devices.