
http://homepages.inf.ed.ac.uk/rbf/IAPR/researchers/D2PAGES/d2tut.htm
Image Acquisition
The first stage of any vision system is the image acquisition stage.
After the image has been obtained, various methods of processing can be applied to the
image to perform the many different vision tasks required today.
However, if the image has not been acquired satisfactorily then the intended tasks may
not be achievable, even with the aid of some form of image enhancement.

2D Image Input
The basic two-dimensional image is a monochrome (greyscale) image which has been
digitised.
We describe an image as a two-dimensional light intensity function f(x,y), where x and y are
spatial coordinates and the value of f at any point (x,y) is proportional to the brightness or
grey value of the image at that point.
A digitised image is one where:

spatial and greyscale values have been made discrete;
intensity is measured across a regularly spaced grid in the x and y directions;
intensities are sampled to 8 bits (256 values).

For computational purposes, we may think of a digital image as a two-dimensional array
f(x,y), where x and y index an image point. Each element in the array is called a pixel (picture
element). See Figs. 1 and 2.

Fig. 1 Greyscale image and highlighted region

Fig. 2 Pixel values in highlighted region
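
A minimal sketch of this array view of an image (Python with NumPy; the sizes and values here are arbitrary illustrations, not details from the text):

import numpy as np

# A digitised greyscale image: a 2D array of 8-bit intensity samples,
# one element (pixel) per grid point.
rows, cols = 480, 640
image = np.zeros((rows, cols), dtype=np.uint8)   # 256 possible grey values

image[10, 20] = 255                              # set one pixel to white
print(image.shape, image[10, 20])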

2D Input Devices
TV Camera or Vidicon Tube
A first choice for a two-dimensional image input device may be a television camera, whose output is a video signal:

The image is focused onto a photoconductive target.
The target is scanned line by line horizontally by an electron beam.
An electric current is produced as the beam passes over the target.
The current is proportional to the intensity of light at each point.
Tapping the current gives a video signal.

This form of device has several disadvantages:

Limited resolution -- a finite number of scan lines (about 625) and frame rate (30 or 60 frames per second).
Distortion -- unwanted persistence between one frame and the next.
Non-linear video output with respect to light intensity.
Non-flat target on the tube.
CCD Camera

By far the most popular two-dimensional imaging device is the charge-coupled device
(CCD) camera:

Single IC device.
Consists of an array of photosensitive cells, each of which produces an electric current dependent on the incident light falling on it.
Video signal output.
Less geometric distortion.
More linear video output.

Frame Stores
The video signal must be digitised.
A device known as a frame store or frame grabber usually performs this task. It:

Digitises the incoming video signal,
sampling the signal into discrete pixels at appropriate intervals, line by line,
and quantising each sample into an (8 bit) digital value.
Stores the sampled frame in its own memory.
The frame is then easily transferred to computer memory or a file.
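
As a rough illustration of the sampling and quantisation a frame grabber performs, the following sketch (Python with NumPy) maps one analogue scanline to 8-bit pixel values; the signal model and sample count are assumptions for illustration only:

import numpy as np

# Model one analogue scanline as a function of position (an assumption for
# illustration; a real frame grabber samples an incoming voltage).
t = np.linspace(0.0, 1.0, 512)                     # 512 samples along the line
analogue = 0.5 + 0.4 * np.sin(2 * np.pi * 5 * t)   # brightness in [0.1, 0.9]

# Quantise each sample to an 8-bit value (0..255), as the text describes.
pixels = np.clip(np.round(analogue * 255), 0, 255).astype(np.uint8)
print(pixels[:10])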

3D imaging

The 3D Image -- Depth Maps


The simplest and most convenient way of representing and storing the depth
measurements taken from a scene is a depth map.
A depth map is a two-dimensional array where the x and y distance information
corresponds to the rows and columns of the array as in an ordinary image, and the
corresponding depth readings (z values) are stored in the array's elements (pixels).
A depth map is like a greyscale image except that the z information (typically a 32-bit
floating point value) replaces the intensity information.
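
A minimal sketch of this representation (Python/NumPy; the resolution, units and values are arbitrary illustrations):

import numpy as np

rows, cols = 480, 640
# A depth map: same layout as a greyscale image, but each element holds a
# z reading (say, in millimetres) as a 32-bit float instead of an intensity.
depth_map = np.zeros((rows, cols), dtype=np.float32)
depth_map[100, 200] = 1250.5    # depth reading at row 100, column 200
print(depth_map.shape, depth_map.dtype)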

Fig. 3 Artificial depth maps

Fig. 4 Real depth maps

Why use 3D data?


A 3D image has many advantages over its 2D counterpart:

Explicit geometry
- 2D images give only limited information about the physical shape and size of an
object in a scene.
3D images express the geometry in terms of three-dimensional coordinates.
e.g. the size (and shape) of an object in a scene can be straightforwardly computed
from its three-dimensional coordinates.
Recent technological advances ( e.g. in camera optics, CCD cameras and laser
rangefinders) have made the production of reliable and accurate three-dimensional depth
data possible.
Consequently many three-dimensional data acquisition systems have been developed.

Introduction to Stereo Imaging -- Theory


Let us consider a simplified approach to the mathematics of the problem in order to aid
understanding of the tasks involved.

We will consider a set-up using two cameras in stereo -- other methods that involve
stereo are similar.
Let's consider a simplified optical set-up:

Fig. 5 A simplified stereo imaging system


Fig. 5 shows:

2 cameras with their optical axes parallel and separated by a distance d.
The line connecting the camera lens centres is called the baseline.
Let the baseline be perpendicular to the line of sight of the cameras.
Let the x axis of the three-dimensional world coordinate system be parallel to the
baseline, and let the origin O of this system be mid-way between the lens centres.

Consider a point (x,y,z), in three-dimensional world coordinates, on an object.

Let this point have image coordinates (x_l, y_l) and (x_r, y_r) in the left and right image planes
of the respective cameras.

Let f be the focal length of both cameras, the perpendicular distance between the lens
centre and the image plane. Then by similar triangles:

x_l / f = (x + d/2) / z
x_r / f = (x - d/2) / z
y_l / f = y_r / f = y / z

Solving for (x,y,z) gives:

x = d (x_l + x_r) / (2 (x_l - x_r))
y = d (y_l + y_r) / (2 (x_l - x_r))
z = d f / (x_l - x_r)

The quantity (x_l - x_r) which appears in each of the above equations is called the
disparity.
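
A small sketch of these equations in code (Python; the numeric values in the example call are made up for illustration):

def stereo_xyz(xl, yl, xr, yr, f, d):
    """Recover (x, y, z) from left/right image coordinates using the
    parallel-axis stereo equations above. Disparity must be non-zero."""
    disparity = xl - xr
    z = d * f / disparity
    x = d * (xl + xr) / (2 * disparity)
    y = d * (yl + yr) / (2 * disparity)
    return x, y, z

# Example: f and d in consistent units (say mm), image coordinates likewise.
print(stereo_xyz(xl=2.0, yl=1.0, xr=1.0, yr=1.0, f=8.0, d=100.0))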

There are several practical problems with this set-up:

Depth can be measured accurately for near objects but not for far away objects.
Normally, d and f are fixed, and distance is inversely proportional to disparity,
so depth resolution falls off with distance.
Disparity can only be measured in pixel differences.
Disparity is proportional to the camera separation d. This implies that if we have a
fixed error in determining the disparity then the accuracy of depth determination
will increase with d.

However, as the camera separation becomes large, difficulties arise in correlating the two
camera images.
In order to measure the depth of a point it must be visible to both cameras and we must
also be able to identify this point in both images.
As the camera separation increases so do the differences in the scene as recorded by each
camera.
Thus it becomes increasingly difficult to match corresponding points in the images.
This problem is known as the stereo correspondence problem.

Methods of Acquisition
Laser Ranging Systems
Laser ranging works on the principle that the surface of the object reflects laser light back
towards a receiver which then measures the time (or phase difference) between
transmission and reception in order to calculate the depth.
Most laser rangefinders:

Work only at long distances; consequently their depth resolution is inadequate for detailed vision tasks.
Shorter range systems exist but still have an inadequate depth resolution (1 cm at
best) for most practical industrial vision purposes.
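
To make the time-of-flight principle concrete, here is a minimal depth calculation (Python; the pulse timing value is invented for illustration):

# Speed of light in metres per second.
C = 299_792_458.0

def tof_depth(round_trip_seconds: float) -> float:
    """Depth from a time-of-flight measurement: the light travels to the
    surface and back, so the one-way distance is half the round trip."""
    return C * round_trip_seconds / 2.0

# A 10 nanosecond round trip corresponds to about 1.5 m.
print(tof_depth(10e-9))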

Structured Light Methods


Basic idea:

Project patterns of light (grids, stripes, elliptical patterns etc.) onto an object.
Surface shapes are then deduced from the distortions of the patterns produced on the object's surface.
Knowing the relevant camera and projector geometry, depth can be inferred by triangulation (a sketch of this step follows the list below).

Many methods have been developed using this approach.
Major advantage -- simple to use.
Low spatial resolution -- patterns become sparser with distance.
Some close range (4 cm) sensors exist with good depth resolution (around
0.05 mm) but have a very narrow field of view and close range of operation.
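
As a hedged illustration of the triangulation step, assume the simplest possible geometry: a projector and a camera separated by a baseline b, each seeing the illuminated point at a known angle from the baseline. The function below is a minimal sketch under that assumption (Python; the names and values are illustrative, not from the text):

import math

def triangulate_depth(b: float, theta_p: float, theta_c: float) -> float:
    """Perpendicular distance of the lit point from the baseline, given the
    projector angle theta_p and camera angle theta_c (radians, measured
    from the baseline on the same side)."""
    # From z/x = tan(theta_p) and z/(b - x) = tan(theta_c):
    return b / (1.0 / math.tan(theta_p) + 1.0 / math.tan(theta_c))

# Example: 0.5 m baseline, both rays at 60 degrees -> about 0.43 m.
print(triangulate_depth(0.5, math.radians(60), math.radians(60)))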

Moire Fringe Methods


The essence of the method is that a grating is projected onto an object and an image is
formed in the plane of some reference grating as shown in Fig. 6.
The image then interferes with the reference grating to form Moire fringe contour
patterns which appear as dark and light stripes, as demonstrated by Fig. 7. Analysis of the
patterns then gives accurate descriptions of changes in depth and hence shape.

NOTE: Ambiguities arise in interpreting the fringe patterns:

It is not possible to determine whether adjacent contours are higher or lower in depth.
This is resolved by moving one of the gratings and taking multiple Moire images.
The reference grating can also be omitted and its effect simulated in software.

Moire fringe methods are capable of producing very accurate depth data (resolution to
within about 10 microns) but the methods have certain drawbacks:

Methods are relatively computationally expensive.
Surfaces at a large angle are sometimes unmeasurable -- the fringe density becomes too high.

Shape from Shading Methods


Methods based on shape from shading employ photometric stereo techniques to produce
depth measurements.
Using a single camera, two or more images are taken of an object in a fixed position but
under different lighting conditions.
By studying the changes in brightness over a surface and employing constraints in the
orientation of surfaces, certain depth information may be calculated.
Methods based on these techniques are not suited for general three-dimensional depth
data acquisition:

Methods are sensitively dependent on the illumination and surface reflectance
properties of objects present in the scene.
Methods only work well on objects with uniform surface texture.
It is difficult to infer absolute depth; only surface orientation is easily inferred.
Methods are mostly used when it is desired to extract surface shape information
(a minimal sketch of the photometric stereo idea follows below).
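
To make the photometric stereo idea concrete, here is a minimal per-pixel sketch under a Lambertian reflectance assumption (Python/NumPy; the three light directions and intensities are invented for illustration, not taken from the text):

import numpy as np

# Rows: unit direction vectors of three known light sources (assumed values).
L = np.array([
    [0.0, 0.0, 1.0],
    [0.7, 0.0, 0.714],
    [0.0, 0.7, 0.714],
])

# Observed intensities of the same pixel in the three images (assumed values).
I = np.array([0.9, 0.8, 0.5])

# Lambertian model: I = L @ (albedo * n). Solve for g = albedo * n.
g = np.linalg.solve(L, I)
albedo = np.linalg.norm(g)
normal = g / albedo            # unit surface normal at this pixel
print(normal, albedo)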

Passive Stereoscopic Methods


Stereoscopy as a technique for measuring range by triangulation to selected locations in a
scene imaged by two cameras has already been introduced -- further details on general stereo configurations
can be found in the books.
The primary computational problem of stereoscopy is to find the correspondence of
various points in the two images.
This requires:

Reliable extraction of certain features (such as edges or points) from both images.
Matching of corresponding features between images.

Both of these tasks are non-trivial and computationally complex:

Passive stereo may not produce depth maps within a reasonable time.
The depth data produced is typically sparse since high level features, such as
edges, are used rather than points.

NOTE:

Finding and accurately locating features in each image can be hard.
Care is needed not to introduce errors.
Depth measurements are accurate to a few millimetres.
One such passive stereo vision system is TINA, developed at Sheffield University.
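
The correspondence search itself can take many forms; the text describes feature-based matching, but as a toy illustration of the general problem, here is a simple area-based (sum of absolute differences) search along one scanline (Python/NumPy; the window size and input data are arbitrary assumptions):

import numpy as np

def match_along_scanline(left_row, right_row, x, half_win=3):
    """Find the column in right_row whose window best matches the window
    centred at column x in left_row (minimum sum of absolute differences)."""
    target = left_row[x - half_win : x + half_win + 1].astype(np.int32)
    best_x, best_cost = None, np.inf
    for cx in range(half_win, len(right_row) - half_win):
        window = right_row[cx - half_win : cx + half_win + 1].astype(np.int32)
        cost = np.abs(window - target).sum()
        if cost < best_cost:
            best_x, best_cost = cx, cost
    return best_x    # the disparity is then x - best_x

# Toy data: the right row is the left row shifted left by 2 pixels.
left = np.arange(0, 64, dtype=np.uint8) % 17
right = np.roll(left, -2)
print(match_along_scanline(left, right, x=20))   # expect 18 (disparity 2)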

Active Stereoscopic Methods


This Section describes the active stereoscopic subsystem which provides the three-dimensional data to our system for automatically inspecting mechanical parts.
NOTE: Whilst this Section considers some specific active stereo problems, many of the
other issues discussed are not specific to any particular three-dimensional data acquisition
technique, and will be of general interest.
The main components of the Vision System are illustrated by the schematic diagram in
Fig. 8.

The vision system consists of:

a matched pair of high sensitivity CCD cameras, and
a laser scanner, all mounted on an optical bench to reduce vibration.

Initially the cameras of the system must be calibrated in order to determine:

the 3D position of the cameras relative to some world coordinates, and
the focal length and lens distortion of each camera (+ lens etc.).

Camera calibration is described in my book.

Depth maps are extracted from the scene by:

Moving the laser stripe across the scene to obtain a series of vertical columns of
pixels.
Triangulating the pixels to give the required dense depth map. The depth of a point is
measured as the distance from one of the cameras, chosen as the master camera.

Knowing the relevant geometry and optical properties of the cameras, the depth
map is constructed using the following method (Fig. 9):

Fig. 9 Measuring a depth value


1. For each vertical stripe of laser light, form an image of the stripe in the pair of
frames from each camera.
2. For each row in the master camera image, search until the stripe is found at a point
P(i,j), say.
3. Form a three-dimensional line l passing through the lens centre of the master
camera and P(i,j).
4. Construct the epipolar line, which is the projection of the line l into the image
formed by the other camera. Do this by projecting two arbitrary points on l
into the image and constructing a line between the two projected points.
5. Search along the epipolar line for the laser stripe. If it is found at a point Q, proceed to
Step 6.
6. Find the point P' on line l which corresponds to Q. Calculate the (x,y,z)
coordinates of P', and store the z value at position (i,j) corresponding to x and y in
the depth map.

The position of the point P' is easily found by projecting a line from the lens centre of the
secondary camera passing through Q. The intersection of the line l and this second line gives the
coordinates of P'; a sketch of this intersection step follows below.

The depth map is formed by using a world coordinate system fixed on the master camera,
with its origin at the master camera's lens centre.
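
A sketch of the final intersection step (Python/NumPy). Two rays in 3D rarely intersect exactly, so a common choice, an assumption here rather than the system's documented method, is to take the midpoint of the shortest segment between them:

import numpy as np

def closest_point_between_rays(c1, d1, c2, d2):
    """Rays: p = c1 + s*d1 and p = c2 + t*d2 (c = camera centre,
    d = direction). Returns the midpoint of the shortest segment joining
    the two rays, a robust stand-in for their 'intersection'."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    r = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ r, d2 @ r
    denom = a * c - b * b                  # zero iff the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = c1 + s * d1
    p2 = c2 + t * d2
    return (p1 + p2) / 2.0

# Two rays that meet at (0, 0, 5):
print(closest_point_between_rays(
    np.array([-1.0, 0.0, 0.0]), np.array([1.0, 0.0, 5.0]),
    np.array([ 1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 5.0])))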

Fig. 10 Depth Map/Image Overlay

Image processing
Image processing is in many cases concerned with taking one array of pixels as input and
producing another array of pixels as output which in some way represents an
improvement to the original array.
For example, this processing may:

remove noise,
improve the contrast of the image,
remove blurring caused by movement of the camera during image acquisition, or
correct for geometrical distortions caused by the lens.

We will not be considering every image processing technique in this section.

Many such techniques are dealt with in Professor Batchelor's companion course,
and many books, such as Gonzalez and Woods, are devoted to this subject.

Image processing methods may be broadly divided into:

Real space methods -- which work by directly processing the input pixel array.
Fourier space methods -- which work by first deriving a new representation of the
input data by performing a Fourier transform, processing this new representation,
and finally performing an inverse Fourier transform on the resulting data to give
the final output image.

Fourier Methods
Let's consider a 1D Fourier transform example.
Consider a complicated sound such as the noise of a car horn. We can describe this sound
in two related ways:

sample the amplitude of the sound many times a second, which gives an
approximation to the sound as a function of time;
analyse the sound in terms of the pitches of the notes, or frequencies, which make
the sound up, recording the amplitude of each frequency.

Similarly, brightness along a line can be recorded as a set of values measured at equally
spaced distances apart, or equivalently, as a set of spatial frequency values.
Each of these frequency values is referred to as a frequency component.
An image is a two-dimensional array of pixel measurements on a uniform grid.
This information can be described in terms of a two-dimensional grid of spatial frequencies.
A given frequency component now specifies what contribution is made by data which is
changing with specified x and y direction spatial frequencies.
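
A small sketch of this 2D frequency description (Python/NumPy; the test image is synthetic):

import numpy as np

# A synthetic 64x64 image: a sinusoidal grating varying in the x direction.
y, x = np.mgrid[0:64, 0:64]
image = np.sin(2 * np.pi * 4 * x / 64)    # 4 cycles across the image

# Its 2D Fourier transform: the energy concentrates at the grating's
# x frequency (4 cycles) and zero y frequency.
F = np.fft.fft2(image)
magnitude = np.abs(F)
peak = np.unravel_index(np.argmax(magnitude), magnitude.shape)
print(peak)    # (0, 4): zero y frequency, 4 cycles in x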

What do frequencies mean in an image?


If an image has large values at high frequency components then the data is changing
rapidly on a short distance scale, e.g. a page of text.
If the image has large low frequency components then the large scale features of the
picture are more important, e.g. a single fairly simple object which occupies most of the
image.

Smoothing Noise
The idea with noise smoothing is to reduce various spurious effects of a local nature in
the image, caused perhaps by:

noise in the image acquisition system, or
noise arising as a result of transmission of the image, for example from a space probe
utilising a low-power transmitter.

The smoothing can be done either by considering the real space image, or its Fourier
transform.

Real Space Smoothing Methods
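
The simplest real space method is neighbourhood averaging; here is a minimal 3x3 mean filter sketch (Python/NumPy, plain loops for clarity; leaving border pixels untouched is one of several reasonable conventions and an assumption here):

import numpy as np

def mean_filter_3x3(image: np.ndarray) -> np.ndarray:
    """Replace each interior pixel by the average of its 3x3 neighbourhood."""
    out = image.astype(np.float64).copy()
    for i in range(1, image.shape[0] - 1):
        for j in range(1, image.shape[1] - 1):
            out[i, j] = image[i-1:i+2, j-1:j+2].mean()
    return out.astype(image.dtype)

# Toy usage: smooth a noisy constant image.
noisy = (100 + 10 * np.random.randn(32, 32)).astype(np.float64)
smoothed = mean_filter_3x3(noisy)
print(noisy.std(), smoothed.std())   # the smoothed image varies less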

Extracting Edges from Images


Many edge extraction techniques can be broken up into two distinct phases:

Finding pixels in the image where edges are likely to occur, by looking for
discontinuities in gradients. Candidate points for edges in the image are usually
referred to as edge points, edge pixels, or edgels.
Linking these edge points in some way to produce descriptions of edges in terms
of lines, curves etc.

Each phase will be discussed in turn in the following Sections; a minimal sketch of the
first phase is given below.
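
As an illustration of the first phase, here is a minimal gradient-based edgel detector (Python/NumPy; the threshold is arbitrary and simple central differences stand in for more sophisticated operators such as Sobel):

import numpy as np

def find_edgels(image: np.ndarray, threshold: float = 10.0) -> np.ndarray:
    """Mark pixels whose gradient magnitude exceeds a threshold as edgels."""
    img = image.astype(np.float64)
    # Central-difference approximations to the x and y derivatives.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    gy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold     # boolean map of candidate edge points

# Toy usage: a dark/bright step edge down the middle of an image.
step = np.zeros((16, 16)); step[:, 8:] = 100.0
print(np.argwhere(find_edgels(step))[:4])   # edgels along columns 7 and 8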
