Chapter 3
A Simple Vision System
The goal of this chapter is to embrace the optimism of the 1960s and to hand-design an end-to-end visual system. During this process, we will cover some of the main concepts that will be developed in the rest of the book.
In 1966, Seymour Papert wrote a proposal for building a vision system as a summer project [Papert 1966]. The abstract of the proposal starts by stating a simple goal: “The summer vision project is an attempt to use our summer workers effectively in the construction of a significant part of a visual system.” The report then continues by dividing the tasks (most of which are also common parts of modern computer vision approaches) among a group of MIT students. This project reflected the optimism of the early days of computer vision. However, the task proved to be harder than anybody expected.
In this chapter we will discuss several of the main topics that we will cover in this book. We will do this in the framework of a real, although somewhat artificial, vision problem. Vision has many different goals (object recognition, scene interpretation, 3D interpretation, etc.), but in this chapter we focus only on the task of 3D interpretation.
Figure 2.1: A world of simple objects. Print, cut and build your own blocks world!
Figure 2.2: a) Close-up picture taken without zoom. Note that near edges appear larger than far edges, and parallel lines in 3D are not parallel in the image. b) Picture taken from far away but using zoom. This creates an image that can be approximately described by parallel projection.
One way of generating images that can be described by parallel projection is to use the camera zoom. If we increase the distance between the camera and the object while zooming in, we can keep approximately the same image size of the objects, but with reduced perspective effects (fig. 2.2.b). Note how, in fig. 2.2.b, parallel 3D lines in the world are almost parallel in the image (some weak perspective effects remain).
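As a rough sketch of why this works (using a focal length f and an object distance Z_0, symbols that are not used elsewhere in this chapter): under perspective projection a point projects to x = f X/Z. If the depth variation within the object is small compared with its distance Z_0 from the camera, then x ≈ (f/Z_0) X, which is a parallel projection scaled by the constant f/Z_0. Moving the camera away (increasing Z_0) while zooming in (increasing f) keeps the scale f/Z_0 roughly constant while making the parallel approximation more accurate.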
The first step is to characterize how a point in world coordinates (X, Y, Z) projects into the image plane. Figure 2.3.a shows our parameterization of the world and the camera. The camera center is inside the plane X = 0, and the horizontal axis of the camera (x) is parallel to the ground plane (Y = 0). The camera is tilted so that the line connecting the origin of the world coordinate system and the image center is perpendicular to the image plane. The angle θ is the angle between this line and the Z axis. The image is parametrized by coordinates (x, y). In this simple projection model, the origin of the world coordinates projects onto the origin of the image coordinates; therefore, the world point (0, 0, 0) projects into (0, 0). The resolution of the image (the number of pixels) will also affect the transformation from world coordinates to image coordinates via a constant factor α. For now we assume that pixels are square and that this constant is α = 1 (we will see a more general form in section 36). Taking into account all these assumptions, the transformation between world coordinates and image coordinates can be written as:
x = X (2.1)
y = cos(θ)Y − sin(θ)Z (2.2)
For this particular parametrization, the world coordinates Y and Z are mixed into the single image coordinate y.
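As a concrete illustration, here is a minimal sketch of this projection in Python (the function name and the use of NumPy are our own choices; they are not part of the chapter's code):

import numpy as np

def project_parallel(points_xyz, theta):
    """Project 3D world points (X, Y, Z) into image coordinates (x, y)
    using the parallel projection of equations (2.1) and (2.2),
    with alpha = 1 and the world origin mapping to the image origin."""
    X, Y, Z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    x = X                                        # equation (2.1)
    y = np.cos(theta) * Y - np.sin(theta) * Z    # equation (2.2)
    return np.stack([x, y], axis=1)

# Example: the world origin projects to the image origin, and a point displaced
# along +Z projects onto the image y axis (with a sign flip and scaling), just like
# a point displaced along the Y axis.
pts = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(project_parallel(pts, theta=np.pi / 6))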
Figure 2.3: A simple projection model. a) World axes and camera plane. b) Visualization of the world axes projected into the camera plane with parallel projection. Note that the Z axis does not look like it is pointing toward the observer. Instead, the projected Z axis is identical to the projected Y axis up to a sign change and a scaling. A point moving parallel to the Z axis will be indistinguishable from a point moving parallel to the Y axis.
Figure 2.4: Image as a surface. The vertical axis corresponds to image intensity. For clarity
here, I have reversed the vertical axis. Dark values are shown higher than lighter values.
Figure 2.5: Edges denote image regions where there are sharp changes of the image intensities. Those variations can be due to a multitude of scene factors (occlusion boundaries, changes in surface orientation, changes in surface albedo, shadows, etc.). The edge types labeled in the figure are: occlusion, horizontal 3D edge, change of surface orientation, vertical 3D edge, contact edge, and shadow boundary.
• Shadow edges: detecting these can be harder than it seems. In this simple world, shadows are soft, creating slow transitions between dark and light.
We will also consider two types of object boundaries:
• Contact edges: this is a boundary between two objects that are in physical contact. Therefore, there is no depth discontinuity across a contact edge.
• Occlusion boundaries: these happen when an object is partially in front of another. Occlusion boundaries generally produce depth discontinuities. In this simple world, we will position the objects in such a way that they do not occlude each other, but they will occlude the background.
Despite the apparent simplicity of this task, in most natural scenes this classification is very hard and requires interpreting the scene at different levels. In other chapters we will see how to build better edge classifiers (for instance, by propagating information along boundaries, analyzing junctions, inferring light sources, etc.).
In this simple world, we will assume that objects do not occlude each other (this can be relaxed) and that the only occlusion boundaries are the boundaries between the objects and the ground. However, as we describe next, not all boundaries between the objects and the ground correspond to depth discontinuities.
• Collinearity: a straight 3D line will project into a straight line in the image.
• Cotermination: if two or more 3D lines terminate at the same point, the corresponding
projections will also terminate at a common point.
Note that these invariances refer to the process of going from world coordinates to image coordinates. The converse is not necessarily true. For instance, a straight line in the image could correspond to a curved line in the 3D world that happens to be precisely aligned with respect to the viewer's point of view so as to appear as a straight line. Also, two lines that intersect in the image plane could be disjoint in 3D space.
However, some of these properties (not all), while not always true, can nonetheless be used to reliably infer something about the 3D world from a single 2D image. For instance, if two lines coterminate in the image, then one can conclude that it is very likely that they also touch each other in 3D. If the 3D lines did not touch each other, it would require a very specific alignment between the observer and the lines for them to appear to coterminate in the image. Therefore, one can safely conclude that the lines most likely also touch in 3D.
These properties are called non-accidental properties [Lowe 1985], as they will only be observed in the image if they also exist in the world, or else only as a result of accidental alignments between the observer and the scene structure. Under a generic view, non-accidental properties will be shared by the image and the 3D world [Freeman 1994].
Let's see how this idea applies to our simple world. In this simple world, all 3D edges are either vertical or horizontal. Under parallel projection, and with the camera's horizontal axis parallel to the ground, we know that vertical 3D lines will project into vertical 2D lines in the image. On the other hand, horizontal lines will, in general, project into oblique lines. Therefore, we will assume that any vertical line in the image is also a vertical line in the world. As shown in figure 2.7, in the case of the cube there is a particular viewpoint that will make a horizontal line project into a vertical line, but this requires an accidental alignment between the cube and the line of sight of the observer. Nevertheless, this is a weak property; accidental alignments such as this one can occur, and a more general algorithm will need to account for that. For the purposes of this chapter, however, we will consider only images with generic views.
Figure 2.7: Generic views of the cube allow 2D vertical edges to be classified as 3D vertical
edges, and collinear 2D edge fragments to be grouped into a common 3D edge. Those
assumptions break down only for the center image, where the cube happens to be precisely
aligned with the camera axis of projection.
In figure 2.6 we show the edges classified as vertical or horizontal using the edge angle: anything that deviates from 2D verticality by more than 15 degrees is labeled as 3D horizontal.
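A minimal sketch of this classification rule in Python, assuming the edges are given as a list of 2D orientation angles measured from the image vertical (the representation, function name, and angle convention are our own choices):

import numpy as np

def classify_edges(angles_deg, threshold_deg=15.0):
    """Label each 2D edge as a 3D 'vertical' or 'horizontal' edge.

    angles_deg: edge orientations in the image, in degrees, measured from the
    image vertical (y) axis. Any edge deviating from 2D verticality by more
    than the threshold is labeled 3D horizontal."""
    angles = np.mod(np.asarray(angles_deg), 180.0)
    # Deviation from vertical, accounting for the 180-degree ambiguity of edge orientation.
    deviation = np.minimum(angles, 180.0 - angles)
    return np.where(deviation <= threshold_deg, "vertical", "horizontal")

print(classify_edges([2.0, 40.0, 170.0, 90.0]))  # ['vertical' 'horizontal' 'vertical' 'horizontal']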
We can now translate the inferred 3D edge orientations into linear constraints on the global 3D structure. We will formulate these constraints in terms of Y(x, y). Once Y(x, y) is recovered, we can also recover Z(x, y) from equation 2.2.
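In particular, rearranging equation 2.2 gives the depth at each pixel directly from the recovered height and the image coordinate y:

Z(x, y) = (cos(θ) Y(x, y) − y) / sin(θ)

which is valid whenever sin(θ) ≠ 0.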
In a 3D vertical edge, using the projection equations, the derivative of Y along the edge will be:
∂Y/∂y = 1/cos(θ) (2.10)
In a 3D horizontal edge, the coordinate Y will not change. Therefore, the derivative along
the edge should be zero:
∂Y/∂t = 0 (2.11)
where the vector t denotes the direction tangent to the edge, t = (−n_y, n_x), and n = (n_x, n_y) is the normal to the edge. We can write this derivative as a function of derivatives along the x and y image coordinates:

∂Y/∂t = t · ∇Y = −n_y ∂Y/∂x + n_x ∂Y/∂y = 0 (2.12)

Inside the flat regions of the image (the planar faces of the objects and the ground), Y(x, y) should vary linearly with image position, so its second derivatives should be zero:

∂²Y/∂x² = 0 (2.13)
∂²Y/∂y² = 0 (2.14)
∂²Y/∂y∂x = 0 (2.15)
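In the discrete image, these derivative constraints can be approximated with finite differences over neighboring pixels; for instance, one possible discretization (our choice of stencil) of equation 2.13 at pixel (x, y) is

Y(x+1, y) − 2Y(x, y) + Y(x−1, y) = 0

and similarly for equations 2.14 and 2.15. Each constraint then becomes one linear equation in the unknown values of Y.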
If some constraints are more important than others, it is possible to also add a weight w_i:

J = Σ_i w_i (a_i Y − b_i)² (2.18)
Our formulation has resulted in a large system of linear constraints (there are more equations than there are pixels in the image). It is convenient to write the system of equations in matrix form:
AY = b (2.19)
where row i of the matrix A contains the constraint coefficients a_i. The system of equations is overdetermined (A has more rows than columns). We can use the pseudoinverse to compute the solution:

Y = (AᵀA)⁻¹Aᵀb (2.20)
This problem can be solved efficiently as the matrix A is very sparse (most of the elements
are zero).
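To illustrate the kind of computation involved, here is a minimal sketch in Python that assembles a small sparse system of weighted linear constraints and solves it in the least-squares sense. The constraint set is a toy one-dimensional analogue (a few point measurements plus smoothness constraints), not the full set of constraints described in this chapter:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

# Toy example: recover a 1D "height" signal from a few point measurements
# plus second-derivative (smoothness) constraints, analogous to equation (2.18).
n = 100                              # number of unknowns (think: pixels)
rows, cols, vals, b = [], [], [], []

def add_constraint(indices, coeffs, target, weight):
    """Append one linear constraint a_i Y = b_i with weight w_i.
    Rows are scaled by sqrt(w_i) so that least squares minimizes equation (2.18)."""
    r = len(b)
    s = np.sqrt(weight)
    for j, c in zip(indices, coeffs):
        rows.append(r); cols.append(j); vals.append(s * c)
    b.append(s * target)

# A few "measurement" constraints (playing the role of the edge constraints).
for idx, value in [(0, 0.0), (50, 5.0), (99, 0.0)]:
    add_constraint([idx], [1.0], value, weight=10.0)

# Smoothness constraints Y[i-1] - 2 Y[i] + Y[i+1] = 0 (a discrete analogue of equation 2.13).
for i in range(1, n - 1):
    add_constraint([i - 1, i, i + 1], [1.0, -2.0, 1.0], 0.0, weight=1.0)

# Assemble the sparse matrix A and solve A Y = b in the least-squares sense.
A = sp.csr_matrix((vals, (rows, cols)), shape=(len(b), n))
Y = lsqr(A, np.array(b))[0]
print(Y[0], Y[50], Y[99])            # approximately recovers the anchored values 0, 5, 0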
2.5.7 Results
Figure 2.8 shows the resulting world coordinates X(x, y), Y (x, y), Z(x, y) for each pixel.
The world coordinates are shown here as images with the gray level coding the value of each
coordinate (black represents 0).
A few things to reflect on:
• It works. At least it seems to work pretty well. Knowing how well it works will require
having some way of evaluating performance. This will be important.
• But it cannot possibly work all the time. We have made many assumptions that hold only in this simple world. The rest of the book will involve upgrading this approach to apply to more general input images.
Although this approach will not work on general images, many of its general ideas will carry over to more sophisticated solutions (e.g., gathering and propagating local evidence). One thing to think about is: if information about 3D orientation is only given at the edges, how does that information propagate to the flat areas of the image in order to produce the correct solution in those regions?
Evaluation of performance is a very important topic. Here, one simple way to visually verify that the solution is correct is to render the objects under new viewpoints. Figure 2.9 shows the reconstructed 3D scene rendered under different viewpoints.
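A minimal sketch of such a visual check, assuming the recovered coordinates are available as arrays X, Y, and Z of equal shape; here we only vary the camera tilt θ, whereas a fully general viewpoint change would apply a 3D rotation to the points before projecting (the function name and plotting choices are our own):

import numpy as np
import matplotlib.pyplot as plt

def render_new_view(X, Y, Z, theta_new):
    """Re-project the recovered per-pixel world coordinates with equations (2.1)-(2.2)
    under a new camera tilt theta_new, as a simple point-cloud rendering."""
    x_new = np.ravel(X)
    y_new = np.cos(theta_new) * np.ravel(Y) - np.sin(theta_new) * np.ravel(Z)
    plt.scatter(x_new, y_new, s=1)
    plt.gca().set_aspect("equal")
    plt.show()

# Hypothetical usage, with X, Y, Z the arrays of world coordinates recovered above:
# render_new_view(X, Y, Z, theta_new=np.pi / 4)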
Figure 2.8: The solution to our vision problem: for the input image of fig 1.7, the assumptions and inference scheme described above lead to these estimates of the 3D world coordinates at each image location. The panels show, from left to right, X, Y (height), and Z (depth); note that the world coordinates are shown as images.
Figure 2.9: To show that the algorithm for 3D interpretation gives reasonable results, we can re-render the inferred 3D structure from different viewpoints. The rendered images show that the 3D structure has been accurately captured.
2.6 Generalization
One desired property of any vision system is its ability to generalize outside of the domain for which it was designed to operate. Out-of-domain generalization refers to the ability of a system to operate outside the domain for which it was designed. In the approach presented in this chapter, the domain of images in which the algorithm is expected to work is defined by the assumptions described in section 2.4. Later in the book, when we describe learning-based approaches, the training dataset will specify the domain. What happens when the assumptions are violated, or when the images contain configurations that we did not consider while designing the algorithm?
Let's run the algorithm on shapes that do not satisfy the assumptions that we have made for the simple world. Figure 2.10 shows the result when the input image violates several assumptions we made (shadows are not soft and the green cube occludes the red one) and has one object on top of the other, a configuration that we did not consider while designing the algorithm. In this case the result happens to be good, but it could have gone wrong.
Figure 2.11 shows the impossible steps figure from [Adelson 2000]. On the left side, the shape looks rectangular, with the stripes appearing to be painted on the surface. On the right side, the shape looks as if it has steps, with the stripes corresponding to shading due to the surface orientation. In the middle, the shape is ambiguous. Figure 2.11 also shows the reconstructed 3D scene for this unlikely image. The system has tried to approximately satisfy the constraints, as for this shape it is not possible to satisfy all of them exactly.
Figure 2.10: Reconstruction of an out of domain example. The input image violates several
assumptions we made (shadows are not soft and the green cube occludes the red one) and
has one object on top of the other, a configuration that we did not consider while designing
the algorithm.
Figure 2.11: Reconstruction of an impossible figure (inspired by [Adelson 2000]): the algorithm does the best it can. Note how the reconstruction seems to agree with how we perceive the impossible image.
A different approach from the one discussed here is model-based scene interpretation, in which we would have a set of predefined models of the objects that can be present in the scene, and the system would try to decide whether each of them is present in the image and recover its parameters (pose, color, etc.).
Recognition allows indexing properties that are not directly available in the image. For
instance, we can infer the shape of the invisible surfaces. Recognizing each geometric figure
also implies extracting the parameters of each figure (pose, size, ...).
Given a detailed representation of the world, we could render the image back, or at least some aspects of it. We can check that we are understanding things correctly if we can make verifiable predictions, such as: what would you see if you looked behind the object? Closing the loop between interpretation and the input will be useful at some point.
In this chapter, besides introducing some useful vision concepts, we also made use of
mathematical tools from algebra and calculus that you should be familiar with.
Exercises
We provide a Python notebook with the code to be completed. You can run it locally or in Colab. To use Colab, upload the notebook to Google Drive and select 'Open in Colab', which will allow you to complete the problems without setting up your own environment. Once you have finished, copy the code sections that you have completed as screenshots into the report.
Consider the following generalization of the projection equations:

x = αX + x_0 (2.21)
y = α(cos(θ)Y − sin(θ)Z) + y_0 (2.22)

which relate the coordinates of a point in the 3D world to the image coordinates of an orthographic camera rotated by θ about the X-axis.
Show that the equations emerge naturally from a series of transformations applied to the
3D world coordinates (X, Y, Z), of the form:
(x, y)ᵀ = α · P · R_x(θ) · (X, Y, Z)ᵀ + (x_0, y_0)ᵀ (2.23)
Please make sure to also include your answers for vertical edges, horizontal edges, and your formulations for A_ij and b for the different constraints in the report.