Chapter 27
Computer Vision
♦ Image Formation
♦ Classifying Images
♦ Detecting Objects
♦ The 3D World
Each light-sensitive element at the back of a pinhole camera receives light that passes
through the pinhole from a small range of directions. If the pinhole is small enough,
the result is a focused image behind the pinhole. Because of the geometry of projection,
large, distant objects look the same as smaller, nearby objects: the image point P′
could have come from a nearby toy tower at point P or from a distant real tower at point Q.
• The depth Z of all points on an object falls within the range Z0 ± ∆Z, with ∆Z ≪ Z0.
• The equations for projection from scene coordinates (X, Y, Z) to the image plane become
x = sX and y = sY, where s = f/Z0 is a constant scale factor (scaled orthographic
projection; a sketch of both projection models follows).
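The two projection models are easy to compare in code. The following is a minimal NumPy sketch, not taken from the chapter: the focal length f, the nominal depth Z0, and the example coordinates are assumed values for illustration.

import numpy as np

def perspective_project(points, f):
    """Pinhole (perspective) projection: x = f*X/Z, y = f*Y/Z.
    points is an (N, 3) array of scene coordinates (X, Y, Z) with Z > 0;
    returns an (N, 2) array of image-plane coordinates."""
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f * X / Z, f * Y / Z], axis=1)

def scaled_orthographic_project(points, f, Z0):
    """Scaled orthographic approximation: x = s*X, y = s*Y with s = f/Z0.
    Valid when all depths lie in Z0 ± ∆Z with ∆Z ≪ Z0."""
    s = f / Z0
    return s * points[:, :2]

f = 0.05  # focal length in metres (assumed value)
# A nearby "toy tower" point and a distant "real tower" point project to the
# same image location under perspective projection.
pts = np.array([[1.0, 1.0, 2.0], [10.0, 10.0, 20.0]])
print(perspective_project(pts, f))                    # two identical rows

# For points whose depths stay close to Z0, scaled orthographic projection
# is a good approximation to the perspective result.
near = np.array([[0.3, 0.1, 2.05], [0.2, -0.1, 1.95]])
print(perspective_project(near, f))
print(scaled_orthographic_project(near, f, Z0=2.0))   # nearly the same values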
Two surface patches are illuminated by a distant point source, whose rays are shown as
light arrows. Patch A is tilted away from the source (θ is close to 90°) and collects less
energy, because it cuts fewer light rays per unit surface area. Patch B, facing the source
(θ is close to 0°), collects more energy.
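This is Lambert's cosine law: the energy a patch collects scales with cos θ, the angle between the surface normal and the direction to the source. A minimal sketch, with assumed unit vectors standing in for the light direction and the two patch normals:

import numpy as np

def lambertian_intensity(normal, light_dir, albedo=1.0, source_intensity=1.0):
    """Lambert's cosine law: reflected intensity = albedo * I0 * max(0, cos θ),
    where θ is the angle between the surface normal and the light direction."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    return albedo * source_intensity * max(0.0, float(np.dot(n, l)))

light = np.array([0.0, 0.0, 1.0])                       # distant source along +z
patch_B = np.array([0.0, 0.0, 1.0])                     # faces the source, θ ≈ 0°
patch_A = np.array([0.0, np.sin(np.radians(85.0)),      # tilted away, θ ≈ 85°
                    np.cos(np.radians(85.0))])
print(lambertian_intensity(patch_B, light))             # ≈ 1.0
print(lambertian_intensity(patch_A, light))             # ≈ 0.09, far less energy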
• Principle of trichromacy
  • the visual appearance of any spectral energy density, however complex, can be
    matched by mixing appropriate amounts of just three primaries (a matching
    sketch follows this list)
• Three primaries
  • the primaries must be independent: no mixture of any two will match the third
• Color constancy
  • estimate the color the surface would have under white light
  • ignore the effects of differently colored illumination
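Trichromacy can be read as a linear-algebra fact: a three-channel sensor reduces any spectrum to a 3-vector of responses, so matching a complex spectrum only requires solving for three primary weights. The sketch below uses made-up Gaussian channel sensitivities and primaries rather than measured cone or CIE data; for some spectra the solved weights come out negative, which corresponds to the classical need for "negative primaries" in matching experiments.

import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 700, 31)          # nm, coarse sampling

def bump(center, width):
    """Hypothetical Gaussian sensitivity curve over the sampled wavelengths."""
    return np.exp(-((wavelengths - center) ** 2) / (2 * width ** 2))

# Three cone-like channels and three narrow-band primaries (all assumed shapes).
sensitivities = np.stack([bump(440, 30), bump(540, 40), bump(570, 45)])   # (3, 31)
primaries = np.stack([bump(450, 10), bump(530, 10), bump(610, 10)])       # (3, 31)

# An arbitrary, complex spectral energy density.
target_spectrum = rng.random(wavelengths.size)

# The eye only sees the 3-vector of channel responses, so matching the target
# reduces to a 3x3 linear system for the primary weights.
target_response = sensitivities @ target_spectrum          # (3,)
primary_responses = sensitivities @ primaries.T            # (3, 3)
weights = np.linalg.solve(primary_responses, target_response)
print(weights)                                             # amount of each primary
mixture = primaries.T @ weights                            # the matching mixture
print(np.allclose(sensitivities @ mixture, target_response))   # True: a metamer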
Top: Intensity profile I(x) along a one-dimensional section across a step edge. Middle: The
derivative of intensity, I′(x). Large values of this function correspond to edges, but the function
is noisy. Bottom: The derivative of a smoothed version of the intensity. The noisy candidate
edge at x = 75 has disappeared.
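Smoothing and differentiating can be combined into a single convolution with a derivative-of-Gaussian kernel. A minimal sketch on a synthetic 1-D profile; the step location, noise level, and σ are assumed for illustration:

import numpy as np

def smoothed_derivative(signal, sigma=2.0):
    """Convolve a 1-D intensity profile I(x) with a derivative-of-Gaussian kernel.
    Equivalent to smoothing I(x) and then differentiating, so isolated noisy
    candidate edges are suppressed while the true step edge survives."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    dg = -x / sigma ** 2 * g                     # derivative-of-Gaussian kernel
    return np.convolve(signal, dg, mode="same")

# Synthetic profile: a step edge at x = 50 plus noise (values are illustrative).
rng = np.random.default_rng(1)
I = np.concatenate([-np.ones(50), np.ones(50)]) + 0.2 * rng.standard_normal(100)
response = smoothed_derivative(I)
print(int(np.argmax(np.abs(response))))          # ≈ 50: the surviving edge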
The displacement (Dx, Dy) is found by minimizing the sum of squared differences,
SSD(Dx, Dy) = Σ (I(x, y, t) − I(x + Dx, y + Dy, t + Dt))², where (x, y) ranges over
pixels in the block centered at (x0, y0). The optical flow at (x0, y0) is then
(vx, vy) = (Dx/Dt, Dy/Dt).
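A brute-force version of this block matching is short to write down. The sketch below assumes grayscale frames and picks a block half-width and search range purely for illustration; practical implementations restrict or accelerate the search.

import numpy as np

def ssd_flow(I_t, I_t1, x0, y0, half=7, search=10, dt=1.0):
    """Estimate optical flow at (x0, y0) by minimizing the sum of squared
    differences between the block centered at (x0, y0) in frame I_t and
    displaced blocks in frame I_t1. Returns (vx, vy) = (Dx/dt, Dy/dt)."""
    block = I_t[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1].astype(float)
    best, best_d = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ys, xs = y0 + dy, x0 + dx
            cand = I_t1[ys - half:ys + half + 1, xs - half:xs + half + 1].astype(float)
            if cand.shape != block.shape:
                continue                         # displaced block leaves the image
            ssd = np.sum((block - cand) ** 2)
            if ssd < best:
                best, best_d = ssd, (dx, dy)
    return best_d[0] / dt, best_d[1] / dt

# Toy example: frame 2 is frame 1 shifted 3 pixels right and 1 pixel down.
rng = np.random.default_rng(2)
frame1 = rng.random((64, 64))
frame2 = np.roll(np.roll(frame1, 1, axis=0), 3, axis=1)
print(ssd_flow(frame1, frame2, x0=32, y0=32))    # approximately (3.0, 1.0)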
Classification problem
• Consider a possible boundary curve at pixel location (x, y) with orientation θ. An image
neighborhood centered at (x, y) looks roughly like a disk, cut into two halves by a
diameter oriented at θ.
• Compute the probability Pb(x, y, θ) that there is a boundary curve at that pixel along
that orientation by comparing features in the two halves (see the sketch after this list).
• Train a machine learning classifier using a data set of natural images in which humans
have marked the ground truth boundaries.
• The goal of the classifier is to mark exactly those boundaries marked by humans and
no others.
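As a simplified stand-in for the features such a classifier compares, the sketch below measures the chi-squared distance between intensity histograms of the two half-discs; the actual Pb detector combines many such features over multiple scales and channels and feeds them to a trained classifier. The radius, bin count, and test image here are assumptions.

import numpy as np

def half_disc_chi2(image, x, y, theta, radius=8, bins=16):
    """Compare the two halves of a disc centered at (x, y), split by a diameter
    oriented at theta. Returns the chi-squared distance between the intensity
    histograms of the two halves; large values suggest a boundary."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    disc = xs ** 2 + ys ** 2 <= radius ** 2
    # Which side of the dividing diameter each pixel falls on.
    side = (xs * np.sin(theta) - ys * np.cos(theta)) > 0
    patch = image[y - radius:y + radius + 1, x - radius:x + radius + 1].astype(float)
    h1, _ = np.histogram(patch[disc & side], bins=bins, range=(0, 1), density=True)
    h2, _ = np.histogram(patch[disc & ~side], bins=bins, range=(0, 1), density=True)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-9))

# A vertical boundary between a dark region and a bright region.
img = np.hstack([0.2 * np.ones((64, 32)), 0.8 * np.ones((64, 32))])
print(half_disc_chi2(img, x=32, y=32, theta=np.pi / 2))   # large: boundary present
print(half_disc_chi2(img, x=16, y=32, theta=np.pi / 2))   # ≈ 0: uniform region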
Details:
• Decide on a window shape
• Build a classifier for windows
• Decide which windows to look at
• Choose which windows to report
• Report precise locations of objects using these windows (a minimal sliding-window
sketch follows this list)
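A minimal sliding-window sketch tying these steps together, with a placeholder scoring function standing in for a trained window classifier and greedy non-maximum suppression used to choose which overlapping windows to report; the window size, stride, and thresholds are assumed values.

import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detect(image, score_window, win=32, stride=8, threshold=0.5, iou_max=0.3):
    """Slide a fixed-shape window over the image, score each window with a
    classifier, keep high-scoring windows, and apply greedy non-maximum
    suppression so overlapping windows report a single object location."""
    H, W = image.shape[:2]
    boxes = []
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            s = score_window(image[y:y + win, x:x + win])
            if s >= threshold:
                boxes.append(((x, y, x + win, y + win), s))
    boxes.sort(key=lambda bs: -bs[1])            # best scores first
    kept = []
    for box, s in boxes:
        if all(iou(box, k) < iou_max for k, _ in kept):
            kept.append((box, s))
    return kept

# Toy "classifier": mean brightness of the window (stands in for a trained model).
img = np.zeros((128, 128))
img[40:80, 60:100] = 1.0                         # one bright "object"
print(detect(img, score_window=lambda w: w.mean()))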
• Given two views of enough points, and knowing which point in the first view corresponds
to which point in the second view, the 3D positions of those points can be recovered.
The relation between disparity and depth in stereopsis. The centers of projection
of the two eyes are a distance b apart, and the optical axes intersect at the fixation
point P0. The point P in the scene projects to points PL and PR in the two eyes. In
angular terms, the disparity between these is δθ (the diagram shows two angles
of δθ/2).
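For rectified, parallel-axis cameras the disparity-to-depth relation is usually written Z = f·b/d; the fixating-eyes geometry in the figure gives the small-angle relation δθ ≈ b·δZ/Z² for a point displaced δZ from the fixation depth. A short sketch of both, with assumed example numbers:

def depth_from_disparity(d_pixels, baseline_m, focal_px):
    """Depth from disparity for rectified (parallel-axis) cameras: Z = f * b / d.
    d_pixels: horizontal disparity in pixels; baseline_m: camera separation b in
    metres; focal_px: focal length expressed in pixels."""
    if d_pixels <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / d_pixels

def angular_disparity(delta_z, z, baseline_m):
    """Small-angle relation from the fixating-eyes geometry: a point displaced
    delta_z from the fixation depth z gives disparity ≈ b * delta_z / z**2 (radians)."""
    return baseline_m * delta_z / z ** 2

# Assumed numbers: a human-like baseline of 6.4 cm, fixation at 1 m.
print(depth_from_disparity(d_pixels=20, baseline_m=0.064, focal_px=800))   # 2.56 m
print(angular_disparity(delta_z=0.1, z=1.0, baseline_m=0.064))             # 0.0064 rad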
For activity data, the relationship between training and test data is more
untrustworthy: people do so many things in so many contexts.
What you call an action depends on the time scale. The single frame
at the top is best described as opening the fridge (you don’t gaze at
the contents when you close a fridge). But if you look at a short clip
of video (indicated by the frames in the center row), the action is
best described as getting milk from the fridge. If you look at a long
clip (the frames in the bottom row), the action is best described as
fixing a snack.
Using Computer Vision