Module - 2 - Computer Vision For Robotics Systems
6.0 OBJECTIVES
• Where is it?
• Is it the one I am looking for?
• Is it defective or is it OK?
• How far away is it?
• Is it right side up?
• Is it being interfered with by another object of the same type or a different type?
• What is the angle of the object relative to my hand?
• What color is it?
• Imaging components
• Image representation
• Hardware considerations
• Picture coding
• Object recognition and categorization
• Software considerations
• Need for vision training and adaptations
• Review of existing systems
6.1 MOTIVATION
One measure of this additional time penalty is the UPH or "units per hour" factor. For two sequential processes, such as "assembly" and "vision," one can define the overall UPH as follows:

UPH = 1 / (1/assembly UPH + 1/vision UPH)
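To make the throughput relation concrete, here is a minimal sketch (not from the original text; the 2000 and 18,000 units-per-hour figures are assumed purely for illustration) that computes the combined UPH of sequential assembly and vision stages.

```python
def combined_uph(*stage_uph):
    """Combined units-per-hour of several sequential stages.

    Every unit must pass through each stage, so the per-unit times
    (the reciprocals of the UPH values) add.
    """
    return 1.0 / sum(1.0 / uph for uph in stage_uph)

# Assumed example figures: a 1.8 s assembly cycle (2000 UPH) and a
# 200 ms vision cycle (18,000 UPH).
print(combined_uph(2000.0, 18000.0))   # 1800.0 -- a 10% throughput loss
```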
6.2 IMAGING COMPONENTS
The imaging component, the "eye" or the sensor, is the first link in the vision
chain. Numerous sensors may be used to observe the world. All the vision com-
ponents have the property that they are "remote sensing" or "noncontact" measurement devices.
Vision sensors can be categorized in many different ways. For convenience,
we will categorize them according to their dimensionality, although they could also
be classified by their wavelength sensitivity (i.e., do they respond to shades of
black and white, or colors, or infrared, x-ray, ultraviolet, or the normal spectrum
of human vision?). Vision sensors may be conveniently divided into the following
dimensional categories or classes:
• Point sensors
• Line sensors
• Planar sensors
• Volume sensors
6.2.1 Point Sensors
The point sensors may be similar to "electric eyes," being either some type of
photomultiplier or, more commonly, a phototransistor (see Section 5.8.2.1). In
either case, the sensor is capable of measuring the light only at a single point in
space. For this reason they are referred to as "point sensors." These sensors
may be coupled with a light source (e.g., a light-emitting diode) and used as a
noncontact "feeler," as shown in Figure 6.2.1.
The "feeler" essentially monitors the light in a small "acceptance aperture."
If an object falls in this acceptance aperture, light will be reflected from the object's
surface and will be received by the sensor. If the acceptance aperture is clear, no
light will be reflected into the sensor and it will not "feel" anything.
The point sensor may be used to create a higher-dimensional set of vision
information by scanning across a field of view by employing some ancillary mech-
anism. For example, an orthogonal set of scanning mirrors (see Figure 6.2.2) or
an x-y table can be employed to execute the scanning of the scene.
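As a rough illustration of how a point sensor plus a scanning mechanism builds up an image, the sketch below raster-scans an x-y table and fills a two-dimensional array one sample at a time. The functions `read_point_sensor()` and `move_xy_table()` are hypothetical stand-ins for whatever hardware drivers are actually available.

```python
import numpy as np

def read_point_sensor() -> float:
    """Placeholder for the real sensor driver: returns reflected-light intensity."""
    return np.random.rand()          # stand-in value

def move_xy_table(x_mm: float, y_mm: float) -> None:
    """Placeholder for the real x-y table driver."""
    pass

def scan_scene(rows: int, cols: int, step_mm: float) -> np.ndarray:
    """Build a 2-D image by stepping the table and sampling the point sensor."""
    image = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            move_xy_table(c * step_mm, r * step_mm)
            image[r, c] = read_point_sensor()
    return image

img = scan_scene(rows=64, cols=64, step_mm=0.5)
print(img.shape)        # (64, 64)
```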
Figure 6.2.1. Noncontact feeler-point (i.e., proximity) sensor. The object is sensed only if its location falls between the near and far sensing limits; points located beyond the far limit are out of the sensing range of the device.
Figure 6.2.2. Image scanning using a point sensor and oscillating deflecting mirrors (an illuminator and point sensor viewing the object via horizontal and vertical scanning mirrors).
6.2.2 Line Sensors

Line sensors are one-dimensional devices and may be used to collect vision infor-
mation from a scene in the real world. The sensor most frequently used is a "line
array" of photodiodes or charge-coupled-device components. These devices are
similar in operation, both being the equivalent of "analog shift registers" that
produce a sequential, synchronized output of electrical signals corresponding to
the light intensity falling on an integrating light-collecting cell. See Figure 6.2.3
for a schematic representation. The light output from these arrays is available
sequentially (i.e., the individual cell outputs are not available in parallel or even
on demand). The consequence of this is that the light intensity from the scene is
available only in an ordered sequence and not at random on demand by the user.
This has some consequences with regard to the time required for accessing a desired
point intensity. The arrays may also be obtained in other than straight lines (e.g.,
circular arrays or crossed arrays are available; see Figure 6.2.4).
By proper scanning, line arrays may be used to image a scene. For example,
by fixing the position of a straight-line sensor and moving an object orthogonally
to the orientation of the array, one may scan the entire object of interest. Figure
6.2.5 is an example of such an application in a robot system.
Figure 6.2.3. Schematic representation of line scanning arrays. The analog output signal is a sequential representation of the intensity of the light collected by the integrating cells.
Figure 6.2.5. An automated robot sorting system using a line scan camera to view parts on a moving conveyor belt (line scan camera, illumination, video processing system, TV monitor, conveyor rate information, robot and robot control).
In some applications this time penalty is not severe, but in others it is almost intolerable. In most robotic applications
cycle times of 1 to 10 s are commonplace, and the additional time needed for robot
vision processing would probably be acceptable. Most currently available vision
systems provide about five vision cycles per second, so the extra 200 ms is not
significant. However, in a small-parts assembly where the manipulator portion of
the assembly cycle may be on the order of 1 s, the additional 200 ms for vision
processing begins to have a significant effect. In an application such as semicon-
ductor assembly, a 200-ms penalty would be prohibitive, since the additional time
would reduce the process yield to unprofitable levels.
Although a vidicon camera sensor is not inherently a raster-scan device, the
raster-scan format creates economies in manufacture (i.e., inexpensive cameras on
the order of $200) and of course provides a simple mechanism for viewing the
scene using ordinary television monitors, also
relatively inexpensive.
Random-access scanning photomultipliers (image dissectors) are also available, but the photoelectric device is more expensive to manufacture (approximately
$1000 for the tube itself) and the control circuitry is more complex because of the
random-access requirements. Since this tube does not rely on the conventional
raster scan, interesting variations such as spiral scans or radial scans may be im-
plemented. Viewing the output of such a device would require a more costly
monitor that is capable of accepting both raster and random-access inputs, with
some type of mode switching required.
In addition to vidicon transducers, several types of solid-state cameras are
available (e.g., photodiode and charge-coupled-device cameras). The solid-state
camera is manufactured in a fashion similar to large-scale integrated circuits. The
sensor elements themselves are very different from the photosensitive elements in
a vidicon camera, but the arrays are still accessed in a serial or raster fashion. For
this reason the solid-state cameras have no access-time advantage over the vidicon
tube cameras. The solid-state arrays are inherently less noisy than the vidicon
cameras, but are also considerably more expensive. This price/performance trade-
off between the camera types must be carefully considered before final selection
of a photo-optical transducer is made for a particular application. Many appli-
cations require the solid-state sensors because of weight and noise factors. This
would be particularly important if it were necessary to mount the camera near or
on the end effector of a robot.
Two-dimensional arrays (similar to line arrays) may be formed using either
CCD (charge coupled device) or CID (charge injected device) technology. Both
of these sensors are based on MOS (metal oxide semiconductor) transistor technology. Since they are discrete in nature, these devices will have a finite number
of cells in both the horizontal and vertical direction. The solid-state array sensor
is also an integrating detector and thus it is apparent that its sensitivity is propor-
tional to exposure time.
It is important to understand how video information from the two-dimensional
array is acquired. In the case of the CCD the most popular topology used is the
frame transfer (FT) structure. FT technology makes use of an imaging area which
is exposed to light and generates charges proportional to the integral of the light
intensity. There is also a storage area having the same number of cells as the
imaging area. During the "frame time" when the image area is exposed, charge
is accumulated at various cell locations in the array. After proper exposure, but
before the next frame, this charge is clocked up to the corresponding cell in the
storage area. (Note that the storage area is shielded from light.) A transfer
register that permits each row of data to be moved in a serial manner is utilized
in the operation. While the imaging area is being exposed for acquisition of the
next frame of data, the charge pattern of the previous frame can be read out from
the storage area by means of the readout register that operates in a manner similar
to the transfer register.
In contrast to the CCD device, the CID camera may be thought of as consisting of a matrix of photosensitive cells arranged in rows and columns. As opposed to the CCD technology, each of the cells can be addressed randomly.
The transfer characteristic of the photoconductive sensor can be written as

I/I_r = (E/E_r)^γ

where I denotes the signal current from the sensor, E the illumination on the photoconductive device, and I_r and E_r are, respectively, the values to which the signal and illumination are referenced.
For non-unity gamma, the contrast in the darker part of the picture is compressed while the lighter portion is exaggerated. Most television cameras are designed with γ = 0.45. This is the result of the natural characteristic of the CCTV monitors used to view the output. Normally, such monitors have a gamma of about 2.2. Thus if the camera is adjusted for the inverse (i.e., γ = 0.45), the picture on the monitor is pleasing to the eye. Unfortunately, while the image may appear pleasing to a human being, the signal provided to a computer will not contain the correct information. This is why it is extremely important to use a camera of unity gamma for imaging a scene for a vision application.
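The effect of a non-unity camera gamma, and the correction back to a linear signal, can be sketched as follows. This is only a numerical illustration of the relation quoted above; the normalized illumination values are chosen arbitrarily.

```python
import numpy as np

GAMMA_CAMERA = 0.45     # typical camera gamma quoted in the text

def camera_response(E_norm: np.ndarray, gamma: float = GAMMA_CAMERA) -> np.ndarray:
    """Signal produced by a camera with the given gamma, I/I_r = (E/E_r)^gamma."""
    return E_norm ** gamma

def linearize(signal: np.ndarray, gamma: float = GAMMA_CAMERA) -> np.ndarray:
    """Undo the camera gamma so a computer sees values proportional to illumination."""
    return signal ** (1.0 / gamma)

E = np.array([0.1, 0.25, 0.5, 1.0])          # normalized scene illumination
I = camera_response(E)                        # dark-end contrast is compressed
print(np.round(I, 3))                         # [0.355 0.536 0.732 1.   ]
print(np.round(linearize(I), 3))              # recovers [0.1 0.25 0.5 1.]
```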
6.2.3.2 Raster scan
Previously, it was mentioned that the electron beam in a vidicon is scanned across the photosensitive element in a raster fashion. Figure 6.2.6 illustrates this process.
Figure 6.2.6. (a) Raster scan process on a television monitor; (b) magnified view of an interlaced raster scan showing odd and even field lines; (c) two lines of analog video data, showing the horizontal and vertical synchronization pulses and the intensities of the first and last lines.
TABLE 6.2.2  UNITED STATES MONOCHROME TELEVISION STANDARD RS-170 SPECIFICATIONS FOR RASTER SCAN

Parameter                        Value
Aspect ratio (width to height)   4/3
Lines per frame                  525
Line frequency                   15.75 kHz
Line time                        63.5 µs
Horizontal retrace time          10 µs
Field frequency                  60 Hz
Vertical retrace                 20 lines per field
The image capture time extends from the instant when the object is motionless to the time when the RS-170 video signal representing the scene
contains both the odd and even fields (with valid data). To this must be added
the time that it takes to store the image in whatever type of vision system is being
used.
Image capture time is dependent on the type of sensor and lighting. Consider
a CCD camera fixed in space and looking at an area into which an object is moved
by means of some mechanical device (such as an x-y table or conveyor). The
camera, by its nature (integration of light levels) must be continually scanning.
Since there will be a settling time for all mechanical motions, the instant when the
motion of the object ceases will not be synchronized with the start of a field.
Additionally, and as described previously, a frame transfer CCD camera transfers
its image to a storage area which can be read out only when the next image area
is scanned. Referring to Figure 6.2.7, it can be seen that the worst-case time for
image capture with a CCD is 83.3 ms. It should also be apparent that if the object
is motionless for at least three field times (enough to read out any invalid video
data from the frame transfer system), then the image can be captured in two field times (or 33.3 ms). That is, the data coming from the camera will be valid after three field times, and the capture time is that needed to obtain both an odd and an even field.
The same type of analysis can be performed for a vidicon, where the lag time must be taken into account. The CID camera does not have a frame transfer architecture, and therefore once the image is stationary, the video is available immediately.

Figure 6.2.7. Image capture for a CCD camera at the time when an object stops moving. The worst-case time from the end of motion to the signal being received is 83.3 ms; each field lasts 16.7 ms. E_n denotes even field n, O_n denotes odd field n, and * marks invalid video. Note: 33.3 ms must be allowed for integration of each field.
6.2.4 Volume Sensors

Volume sensors, providing general three-dimensional information, are not yet available on the market as a standard item. There are mechanisms that may be used to measure three-dimensional shape and orientation properties of solid objects. Stereo imaging using multiple two-dimensional arrays to image the object may be used. Three-dimensional information from solid objects may also be obtained using structured-light techniques.
Figure 6.3.2. (a) Structured light imaging for the NBS robot vision system; (b) example objects and the line segment pattern formed by a plane of light for a box and an object with both a raised and a depressed surface. In all instances, a dark line represents reflected light and is the only image "seen" by the camera.
Figure 6.3.3. Digital picture representation and data readout from an array of light-integrating cells. The signal from the output register is shown for row 5 or 6, with the white and black levels indicated relative to the pixel clock.
situations. Since commercial television cameras are serial-access devices, and since most vision system processing will require random access to the picture information, one commonly encountered problem is the storage of pixels for future reference. This is a serious issue in
practical robotic applications. Even if one took 10 samples per line in order to match the instruction times of a typical microprocessor, we would still need to return to the same line 25 times, so it would require about 25/30 ≈ 0.83 s to acquire an image, which is still too long for most practical applications. The generally applied solution
to the time problem is to use a frame buffer that is capable of storing an entire
image. These frame buffers, which usually have a built-in digitizer to convert the
data to digital form, are available off the shelf in a variety of configurations that
match the bus specifications of almost any minicomputer or microprocessor.
Using
such a device then permits higher-level languages and general-purpose image anal
ysis algorithms to be applied to what is essentially a large array of data points.
In many robotic applications where a manipulator is to grasp an object for
placement elsewhere, the silhouette of the part is very often sufficient to permit
orientation of the manipulator. In these situations, the image required for use is
said to be binary; that is, the pixels describing the object need only one bit to
describe the presence or absence of light intensity with respect to the portion of the object being viewed at that instant. Since it is inefficient to use an 8-bit byte to store binary pixels, binary images are often stored in a packed format. Packing allows 8 pixels to be stored into each byte, therefore requiring only 8192 bytes of memory for each field. (Verification of this is left as an exercise for the reader.)

Figure 6.4.1. A/D subsystem for converting analog video signals to digital gray levels (video in, sample and hold, A/D converter, pixel clock, and digital gray-level data with timing sent to the computer).
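The 8192-byte figure can be checked with a short sketch: packing a 256 × 256 binary field at 8 pixels per byte. numpy's `packbits` is used here purely for illustration and is not something the text prescribes.

```python
import numpy as np

# A 256 x 256 binary field (values 0 or 1), e.g. from thresholding.
binary_field = (np.random.rand(256, 256) > 0.5).astype(np.uint8)

packed = np.packbits(binary_field)    # 8 pixels per byte
print(binary_field.size)              # 65536 pixels
print(packed.nbytes)                  # 8192 bytes, as stated above

# Unpacking restores the original bit image.
restored = np.unpackbits(packed).reshape(256, 256)
assert np.array_equal(restored, binary_field)
```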
For gray-scale images one may require 32 kilobytes to 64 kilobytes of storage for 4-bit (16 gray levels) or 8-bit (256 gray levels) images. Even with the use of frame storage buffers, image-processing algorithms for robot vision must still be kept very simple if software techniques are to be used to process the images, because so much data must be processed. Even moderately complex algorithms require considerable amounts of hardware to achieve sufficient preprocessing for general-purpose software to be applied. In some cases the hardware algorithm enhancements are so sophisticated that there actually is no need to store the image itself; rather, only a set of reduced picture parameters is accessed by the computer.
In addition to the considerations for image acquisition, the illumination of
the scene itself influences the hardware considerations. In the simple cases, the
lighting may be matched to the color or surface reflectivity of the objects under
consideration. It may be necessary to use either colored light or colored filters
to bring out certain features to match the color sensitivity of the camera. This
often permits certain simple processing to be accomplished at the speed of light
before the data ever reach the photosensitive target. One should remember that
the speed of light is finite, and that light has a speed of about 1 ft/ns. By using
oblique as well as normal perpendicular lighting, one may take advantage of shad-
ows and surface anomalies, or one may choose to eliminate these effects. It is
also common to use shutters and/or stroboscopic sources to "freeze" motion. Since
the vidicon target is scanned at 60 Hz, motion blurring of moving objects is common, and a strobed light source can often be of great use. The principle in use is related
to the fact that the vidicon will store the electrical equivalent of an image for a
short time while the scanning is being completed. The strobe light essentially
freezes the object's position while the camera stores its reflected light pattern. The
scanning circuitry then transters the electrical signal a short time later.
The appropriate use of transmitted and reflected illumination must be made to more easily satisfy the picture-processing requirements of the computing components, or there may be no feasible solution to the vision-processing task at hand in the limited time available. Again, the structured-light approach of the National Bureau of Standards is a good illustration of this principle in a robotics application (see Section 6.9.5.2).
6.5 PICTURE CODING

The representation of pictures has been briefly discussed previously. In this section we treat the topic more extensively and present the following coding concepts that represent the most commonly used methods in present practice:
• Gray-scale images
• Binary images
• Run-length coding
• Differential-delta modulation
6.5.1 Gray-Scale Images
Perhaps the best place to begin the discussion of picture coding is with the simplest distributional representation scheme, the gray-level histogram, which is a one-dimensional array containing the distribution of intensities from the image. Figure 6.5.1 shows a picture and its gray-level distribution. Assuming an 8-bit gray scale,
the number of gray levels will be 256. One can see that the gray-level histogram
destroys all geometrical information. This is illustrated simply by noting that if
the image is rotated by any arbitrary angle, the gray-level distribution will remain
the same. In a sense, the gray-level histogram is really an image transformation
of a very simple type, so it is often very useful in evaluating imagery because of
the enormous concomitant data reduction. For example, the original image may require 65 kilobytes of storage, while the histogram would require only 256 × 16 bits or 512 bytes, a saving of over 99% (512/65,536).
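A minimal sketch of the gray-level histogram and the data reduction it brings, assuming a 256 × 256, 8-bit image; the random image is only a stand-in for real data.

```python
import numpy as np

image = np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)  # stand-in 8-bit image

# 256-bin gray-level histogram: counts of each intensity, all geometry discarded.
hist = np.bincount(image.ravel(), minlength=256)

print(image.nbytes)            # 65536 bytes to store the image
print(hist.size * 2)           # 512 bytes if each of the 256 counts is held in 16 bits

# Rotating the image changes the pixel arrangement but not the histogram.
assert np.array_equal(hist, np.bincount(np.rot90(image).ravel(), minlength=256))
```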
The gray-level histogram is related to the probability of occurrence of gray-level information. Using this interpretation, there are many useful ideas that
evolve naturally. The average gray level, in conjunction with the minimum and
maximum gray levels, can be inspected to decide whether or not the picture has
an adequate contrast throughout the scene. This could, of course, be done au-
tomatically and the illumination increased or the sensitivity of the camera increased
by opening the aperture setting on the camera.
Other picture properties may be deduced by measuring properties of the
histogram, such as variance. The variance of a histogram is a measure of the
"spread" of the distribution of gray values. The gray-level histogram function
may also be used to determine a contrast enhancement function, so that the overall
image quality is improved. For example, a simple linear contrast enhancement
may be specified by amplifying the video signal and altering the offset. The
mathematical expression for linear contrast enhancement is given as follows:
new gray level = K (old gray level) + B

where

K = amplification or attenuation factor
B = bias or offset gray level
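A sketch of the linear enhancement above, applied to an 8-bit image. The values of K and B are arbitrary, and the results are clipped to the 0-255 range, a practical detail not spelled out in the text.

```python
import numpy as np

def linear_contrast(image: np.ndarray, K: float, B: float) -> np.ndarray:
    """new gray level = K * (old gray level) + B, clipped to the 8-bit range."""
    stretched = K * image.astype(np.float64) + B
    return np.clip(stretched, 0, 255).astype(np.uint8)

# Stand-in for a poor-contrast image: gray levels bunched between 100 and 150.
poor = np.random.randint(100, 151, size=(256, 256), dtype=np.uint8)

enhanced = linear_contrast(poor, K=5.0, B=-500.0)   # maps 100..150 onto 0..250
print(poor.min(), poor.max())          # ~100 150
print(enhanced.min(), enhanced.max())  # ~0 250
```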
The results of linear contrast enhancement are illustrated in Figures 6.5.1 and 6.5.2.
Figure 6.5.1(a) shows an original poor-contrast image with the corresponding histogram shown in Figure 6.5.1(b). The results of linear contrast enhancement and
the corresponding histogram are indicated in Figure 6.5.2(a) and (b), respectively.
If one inspects the equation for linear contrast enhancement, it is easy to see
Figure 6.5.1. (a) Original poor-contrast image; (b) its gray-level histogram.

Figure 6.5.2. (a) Image after linear contrast enhancement; (b) its gray-level histogram.
However, the memory storage requirements are greater than for binary images, and the algorithms for processing gray-scale images are generally much more complicated and more time consuming than those used for binary image processing. This will become clearer in the following sections.
6.5.2 Binary Images

Binary image coding requires that each pixel in the original image be coded into one bit; hence the term binary image. In its simplest form, a fixed threshold
may be applied over the entire scene. Figure 6.5.5 shows an example of such a
process. Although this binary image appears visually pleasing, and a human being
is easily able to recognize its content, in point of fact the image itself has been
poorly digitized since there is an uneven or disproportionate distribution of black-and-white regions. In this figure the original subject matter was illuminated from
one side, so that a significant shadow appears in the image and the selected binary
image threshold is inadequate. In the case of a more realistic scene for robotic
vision, this effect causes part of the object to extend past its actual limits, and part
of the image to be eroded.
In the case of the binary portrait, it is important to appreciate the fact that
no single binary threshold would be adequate under the lighting conditions used
to present the image to the camera. Given this illumination, one could attempt
to use an adaptive threshold that would adapt to the region locally surrounding
the pixel to be binarized. One such adaptation technique is to use the local average
intensity as the threshold. Figure 6.5.6 shows such a method applied to an image.
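A sketch of local-average thresholding in the spirit of Figure 6.5.6. The window size and the uniform-filter implementation are my own choices, not taken from the text.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_average_threshold(image: np.ndarray, window: int = 15) -> np.ndarray:
    """Binarize an image against the mean intensity of a local neighborhood.

    Each pixel is compared with the average of the window x window region
    around it, so no single global threshold is needed.
    """
    local_mean = uniform_filter(image.astype(np.float64), size=window)
    return (image > local_mean).astype(np.uint8)   # 1 = white, 0 = black

# Stand-in scene: a bright square on a background whose brightness ramps
# from left to right, defeating any single fixed threshold.
gradient = np.tile(np.linspace(40, 200, 256), (256, 1))
scene = gradient.copy()
scene[96:160, 96:160] += 60
binary = local_average_threshold(scene)
print(binary.shape, binary.dtype, binary.max())    # (256, 256) uint8 1
```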
Because of the reduced memory requirements for binary image coding, as
well as the reduced arithmetic requirements when dealing with a 1-bit pixel, many
of the commercial vision products now manufactured frequently use binary images
and have, to date, not generally used gray-scale imagery for object identification,
location, and so on. This implies that shape and geometry factors, rather than
gray-level textural parameters, are the most used in present-day robotic vision
systems.
6.5.3 Run-Length Coding

Gray-scale and binary coding of images are direct methods for image coding, in that both systems maintain a map of the (x, y) coordinates and the corresponding intensity information. In the simplest form, this might be an array of intensity
values, the array being as long as the number of pixels in the image. The term
"data structure" refers to a representation of data in a structured manner useful for implementation and management by a computer system. The data structure for gray-level and binary images will be a continuous array in memory, whose index value and contents are directly related to a pixel location and intensity value.
Figure 6.5.7 shows such a data structure for an 8-bit intensity-mapped image. For this data structure, the location of the pixel under consideration is used to compute the index in memory associated with that pixel. In this example, the entry index is computed by taking the row number of the pixel and adding 256 multiplied by the column number of the pixel. (The row index ranges from 0 to 255, while the column index ranges from 1 to 256.) The actual entry value in the array is the digitized CCTV electrical value, which will range from 0 to 255.
Figure 6.5.4. Computer-generated gray wedges (scales): (a) 1-bit gray-scale resolution; (b) 2-bit gray-scale resolution; (c) 4-bit gray-scale resolution; (d) 6-bit gray-scale resolution.

Figure 6.5.6. Binarized images using the local average as a binary threshold: (a) original gray-level image; (b) original image with a 3 × 3 pixel local average subtracted; (c) binarized image with gray values of 124 to 132 mapped to black (all other values are white).
For binary imagery, the data structure will require only 1 bit per pixel instead of 8 bits per pixel.
These direct coding and simple packing schemes are quite simple but do not take advantage of image structure. If we look at the data structures above, especially for binary images, it is clear that there will generally exist long strings of pixels of the same binary value.
Figure 6.5.7. Data structure for an 8-bit intensity-mapped image: each entry index holds an 8-bit entry value (the digitized gray level), with successive 256-entry blocks of the array holding the first row, the second row, and so on up to the 256th row.
In this case, a separate entry for each pixel is a waste of computer memory. If one simply stores the transition points and the string length, a (potentially) more efficient data structure can be applied to the images. Figure 6.5.8 illustrates such a "run-length coded" data structure.
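A sketch of run-length coding for one row of a binary image. The (value, length) pair format used here is one reasonable data structure; the book's Figure 6.5.8 may differ in detail.

```python
from typing import List, Tuple

def run_length_encode(row: List[int]) -> List[Tuple[int, int]]:
    """Encode a row of binary pixels as (pixel value, run length) pairs."""
    runs = []
    current, length = row[0], 1
    for pixel in row[1:]:
        if pixel == current:
            length += 1
        else:
            runs.append((current, length))
            current, length = pixel, 1
    runs.append((current, length))
    return runs

def run_length_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Rebuild the row of pixels from its run-length code."""
    return [value for value, length in runs for _ in range(length)]

row = [0] * 12 + [1] * 40 + [0] * 12
code = run_length_encode(row)
print(code)                                   # [(0, 12), (1, 40), (0, 12)]
assert run_length_decode(code) == row
```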
For certain types of images that have a lot of "blobs," the memory requirements may be considerably reduced by the use of run-length coding. However, images with numerous small features may require more memory storage than the direct method. In many robotic vision applications, the use of run-length coding does offer a considerable saving. Gray-scale images may also be run-length coded, using a data structure similar to that shown above. Ordinarily, however, the run lengths are not as long, and memory savings are usually not as great as for binary images.
6.5.4 Differential-Delta Coding
Differential-delta coding (DDC) may also be used to code gray-scale images more efficiently. This coding technique uses the difference between the intensity of a pixel and the previous pixel. In ordinary scenes, the difference between subsequent pixels is not very large (on the order of 25% of the range or less), so the number
of bits to encode the difference is two less than that required to encode the entire
number representing the intensity. The data structure for this encoding is given
in Figure 6.5.9. Figure 6.5.7 provides the data for Figure 6.5.9.
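A sketch of differential-delta coding along a row of gray levels. The sample values are arbitrary; the point is that small neighbor-to-neighbor differences need fewer bits than the full intensity.

```python
from typing import List

def ddc_encode(row: List[int]) -> List[int]:
    """Store the first pixel, then only the difference from the previous pixel."""
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]

def ddc_decode(code: List[int]) -> List[int]:
    """A running sum of the differences recovers the original gray levels."""
    row = [code[0]]
    for delta in code[1:]:
        row.append(row[-1] + delta)
    return row

row = [100, 102, 103, 106, 105, 101, 101, 99]      # slowly varying gray levels
code = ddc_encode(row)
print(code)                        # [100, 2, 1, 3, -1, -4, 0, -2] -- small deltas
assert ddc_decode(code) == row
```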
6.6 OBJECT RECOGNITION AND CATEGORIZATION

The preceding section dealt with the coding of images in a blind fashion, without
intelligently addressing the issues of analysis of imagery from robotic applications.
Although coding techniques are important, it is more important to understand the
need for partitioning or segmenting the imagery from the real world. As was
stated in the beginning of this chapter, several basic tasks are commonly encoun-
tered in robotic vision systems such as tracking, identification, and inspection. All
these tasks require that the particular portion of the scene to be operated on can
be extracted from extraneous picture information and can be analyzed efficiently.
This naturally leads us to the two topics that will make the realization of these
goals more likely:
• Dimensionality reduction
• Segmentation of images
6.6.1 Dimensionality Reduction
The direct coding of images using binary or gray-scale intensity coding results in fixed and somewhat large storage requirements (generally from 8 to 65 kilobytes and more). Run-length coding and DDC generally reduce those figures when used for appropriate imagery. These types of storage are still rather unintelligent in that they are coding the pixel values as they are found; there is no intelligence embedded into the numbers stored to represent the picture. In a sense, the images have been memorized, a photographic memory if you will. We all know that those with photographic memories may not possess the wisdom to use the information in an intelligent fashion. In a sense, the coding scheme makes up for the lack of intelligent coding by brute force. Once all the data of an image have been acquired, use must be made of the pixels to represent the content of the image rather than just the arrangement of the pixels. The description of the content of an image will generally yield a simpler description of the objects within the image field of view. For example, one may describe an egg's orientation on a table as: the egg is located at (x₀, y₀).
Figure 6.6.1. Local sub-image definition: a 3 × 3 neighborhood of pixels labeled A, B, C (top row), D, E, F (middle row), and G, H, I (bottom row), with E the central pixel.
Edge4 = |G − E| + |E − C|
Figure 6.6.2. Examples of one-dimensional edge operators on two images: (a) original egg; (b) original integrated circuit; (c) Edge2 horizontal difference; (d) Edge3 vertical difference.
Figure 6.6.3. Examples of Sobel and modified Sobel operators: (a) image of an egg; (b) Sobel difference operator applied to the image in part (a); (c) modified Sobel operator applied to the image in part (a); (d) difference between the images in parts (b) and (c).
Sobel = [(A + 2B + C − G − 2H − I)² + (A + 2D + G − C − 2F − I)²]^(1/2)

This operator computes a weighted-average intensity function along the borders of the subimage, and then forms two edge measurements perpendicular to each other. The two edge properties are then combined in a "quadrature" measurement. The Sobel operator generally enhances edges in an acceptable fashion. Note that it also has built-in smoothing of the local region. The squaring and square-root operations will be very time consuming, so one normally would modify this operator by replacing the squares with absolute values, and eliminating the square root in toto. This is referred to as the modified Sobel operator. Applications of the Sobel and modified Sobel operators are shown in Figure 6.6.3.
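A sketch of the Sobel and modified Sobel operators using the A through I labeling of Figure 6.6.1 (E is the central pixel). Plain Python loops are used for clarity rather than speed.

```python
import numpy as np

def sobel(image: np.ndarray, modified: bool = False) -> np.ndarray:
    """Edge magnitude at each interior pixel of a gray-level image.

    With modified=False: sqrt(gx**2 + gy**2) (quadrature combination).
    With modified=True:  |gx| + |gy| (squares and square root avoided).
    """
    img = image.astype(np.float64)
    rows, cols = img.shape
    out = np.zeros_like(img)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            A, B, C = img[r - 1, c - 1], img[r - 1, c], img[r - 1, c + 1]
            D, _, F = img[r, c - 1],     img[r, c],     img[r, c + 1]
            G, H, I = img[r + 1, c - 1], img[r + 1, c], img[r + 1, c + 1]
            gx = A + 2 * B + C - G - 2 * H - I        # horizontal edge measure
            gy = A + 2 * D + G - C - 2 * F - I        # vertical edge measure
            out[r, c] = abs(gx) + abs(gy) if modified else np.hypot(gx, gy)
    return out

# Stand-in image: a dark and a bright half produce a strong vertical edge.
test = np.zeros((16, 16))
test[:, 8:] = 200
print(sobel(test).max(), sobel(test, modified=True).max())   # 800.0 800.0
```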
This operator compares the intensity of the central pixel to the 8 pixels surrounding it. Here, only the difference in contrast between the central pixel and its neighbors
• If the current point is outside the region (i.e., both test pixels are outside), turn right until the region is entered.
• If the current point is indeterminate (i.e., one test pixel is inside and the other is outside the region), go straight.

This procedure is graphically illustrated below. (Note that "x" marks pixels interior to the region and "o" marks pixels exterior to the region. Also, the "x" and the "o" are the only two pixels under consideration at the present time.)
The chain code specifies, for each boundary point, one of eight directions that may be traversed in reaching the next point on the outline of the object. Such a coding scheme is illustrated in Figure 6.6.6 for a simple object. In this figure, each "*" represents a point on the outline.

Chain code = <e, e, e, e, e, e, e, se, s, s, w, w, w, w, w, w, w, nw, n, n>

where e = east, se = southeast, s = south, w = west, nw = northwest, and n = north.
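A sketch of decoding the chain code above back into boundary coordinates. The coordinate convention is assumed here (x increases to the east, y to the north); a closed outline returns to its starting point.

```python
# Displacement for each of the eight compass directions (dx, dy),
# with x increasing to the east and y increasing to the north.
STEP = {"e": (1, 0), "ne": (1, 1), "n": (0, 1), "nw": (-1, 1),
        "w": (-1, 0), "sw": (-1, -1), "s": (0, -1), "se": (1, -1)}

def decode_chain(start, chain):
    """Turn a chain code into the list of boundary points it traces out."""
    points = [start]
    x, y = start
    for direction in chain:
        dx, dy = STEP[direction]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

chain = ["e"] * 7 + ["se"] + ["s"] * 2 + ["w"] * 7 + ["nw"] + ["n"] * 2
outline = decode_chain((0, 0), chain)
print(outline[0], outline[-1])     # (0, 0) (0, 0)  -- the outline closes on itself
```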
• Size or area
• Range of object projected onto x and y axes
• Ratio of the extent of the object in the x and y directions
• Center of gravity of gray-scale or binary rendition of object
• Geometrical moment description
• Number of holes in the object
Figure 6.6.7 illustrates some shape and geometry features commonly used in
object description. These features clearly reduce the dimensionality required to
characterize the objects and will reduce the computation time for object identifi-
cation. Since the dimensionality of the descriptions is reduced, we should expect
to trade off something in return. In fact, we will find that these features do not uniquely characterize the object, so there exist an unlimited number of different objects that share the same feature values.
Figure 6.6.7. Commonly used shape and geometry features: area excluding holes, interior area (area of holes), longest dimension, Feret's diameter, breadth, perimeter, convex perimeter, projected length, maximum horizontal chord, and examples of derived size measurements such as an equivalent cylindrical volume. Redrawn with permission of Dynatech Laboratories, Inc., Imaging Products.
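A sketch of a few of the simpler features listed above, computed from a binary image: area, center of gravity, the projections onto the x and y axes, and their extent ratio. Hole counting would need a connected-components pass and is omitted here.

```python
import numpy as np

def simple_shape_features(binary: np.ndarray) -> dict:
    """Area, center of gravity, x/y extents and their ratio for a binary object."""
    ys, xs = np.nonzero(binary)                 # coordinates of object (white) pixels
    x_extent = xs.max() - xs.min() + 1          # range of the object projected on x
    y_extent = ys.max() - ys.min() + 1          # range of the object projected on y
    return {
        "area": int(binary.sum()),
        "centroid": (float(xs.mean()), float(ys.mean())),
        "x_extent": int(x_extent),
        "y_extent": int(y_extent),
        "extent_ratio": x_extent / y_extent,
    }

obj = np.zeros((64, 64), dtype=np.uint8)
obj[20:40, 10:50] = 1                            # a 20 x 40 rectangular blob
print(simple_shape_features(obj))
# {'area': 800, 'centroid': (29.5, 29.5), 'x_extent': 40, 'y_extent': 20, 'extent_ratio': 2.0}
```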
G(x₀, y₀) = Σ |T(x − x₀, y − y₀) − t(x, y)|

where

T(x, y) = reference sub-image
t(x, y) = test image

thereby representing the sum of the absolute differences of the reference and test image as a function of the spatial location of the reference (x₀, y₀) with respect to the test image. This technique is also known as a "nearest neighbor classifier," since the nearest neighbor to T(x, y) in t(x, y) will have the lowest score and will be the closest "relative." The summation is carried out over the image region in which the sub-image is coincident with a portion of the test image. In general, the sub-image is displaced spatially (in both x and y) so that it is placed over the entire test image.
Figure 6.6.8 shows an original binary image, a template, and the resultant image after the binary correlation process has been applied. As can be seen in part (c) of this figure, the locations where the best match occurs have the highest value. Various methods can be used to provide more image data in cases when the template extends beyond the image. One possibility is to add additional zeros. This was done in the example shown in Figure 6.6.8. Two columns of zeros were added (but not shown) to the right of the image and two rows of zeros were added at the bottom. Thus the top left edge of the (3 × 3) template could be placed on each pixel of the image.
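A sketch of binary correlation implemented as the summed absolute difference G defined above. Note that with this difference-based score a perfect match gives zero (the minimum), whereas the output shown in Figure 6.6.8 is arranged so that the best match has the highest value; the small arrays below are made up for illustration.

```python
import numpy as np

def binary_correlation(test: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Sum of absolute differences between the template and every placement in the test image.

    Output cell (r, c) scores the template with its top-left corner at (r, c);
    the lowest score marks the best match (0 for a perfect one).
    """
    th, tw = template.shape
    H, W = test.shape
    scores = np.zeros((H - th + 1, W - tw + 1), dtype=int)
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            window = test[r:r + th, c:c + tw]
            scores[r, c] = np.abs(window - template).sum()
    return scores

image = np.random.randint(0, 2, size=(15, 20))
template = np.array([[0, 1, 1],
                     [1, 1, 0],
                     [0, 1, 0]])
image[5:8, 9:12] = template                # plant an exact copy of the template
scores = binary_correlation(image, template)
print(scores[5, 9])                        # 0 -- perfect match at the planted location
print(scores.min() == 0)                   # True
```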
While binary template matching is quite simple to implement (in hardware as well as software), some problems exist. Recall that generally information about an image is lost when it is binarized, and pixels that constitute an edge must either be white or black. Thus accurate positional information may be lost. Additionally, since both the image and template are binary, it is important to have approximately the same threshold and lighting conditions when the template is defined and when it is used.
Figure 6.6.8. Example of template matching applied to a binary image: (a) binary image array (15 × 20); (b) binary template; (c) output of binary correlation (14 × 19). The dashed box in part (a) indicates that a perfect template match has been achieved.
edge. For example, if both templates shown in Figure 6.6.9 are applied to an image and the square root of the sum of the squares of both process outputs at each (image) pixel is taken, the edges of the resultant image are enhanced. These templates perform the same function as the Sobel operator described previously. In this case the reference location of the template would be the center pixel.
6.6.3.3 Correlation for gray-level images
The same concept used for binary template matching can be extended to a gray-level image and model (or reference sub-image). However, in this case it should be apparent that a high correlation could occur for features in the image that look nothing like the model, due to the fact that they may be brighter than the pixels that match the model. Thus the method of cross correlation may not be adequate to uniquely define the feature (the part of the image that best matches the reference) that is being sought.
If the Fisher statistical correlation coefficient is used, the best match of a gray-level model in a gray-level image can be found, independent of variations in brightness.
Figure 6.6.9. Templates (first and second operators) used to implement the Sobel operator (described in Section 6.6.2.2) for edge detection.
The correlation coefficient is given by

ρ(x₀, y₀) = [N Σ IM − (Σ I)(Σ M)] / {[N Σ I² − (Σ I)²][N Σ M² − (Σ M)²]}^(1/2)

where

I = a sub-image of the test image which is dimensionally the same as the model (or template)
M = the model or reference sub-image
N = the total number of pixels in the model
(x₀, y₀) = the spatial location of the model with respect to the test image

Note that the model can be any size (q × r) with the provision that it is smaller than the test image.
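A sketch of the correlation coefficient above, evaluated for a single placement of the model. Library routines for normalized correlation exist, but the formula is written out here so the terms stay visible; the numeric model is invented for illustration.

```python
import numpy as np

def correlation_coefficient(subimage: np.ndarray, model: np.ndarray) -> float:
    """Normalized correlation between a model and an equally sized sub-image.

    rho = [N*sum(I*M) - sum(I)*sum(M)] /
          sqrt([N*sum(I^2) - sum(I)^2] * [N*sum(M^2) - sum(M)^2])
    """
    I = subimage.astype(np.float64).ravel()
    M = model.astype(np.float64).ravel()
    N = I.size
    num = N * np.dot(I, M) - I.sum() * M.sum()
    den = np.sqrt((N * np.dot(I, I) - I.sum() ** 2) *
                  (N * np.dot(M, M) - M.sum() ** 2))
    return num / den

model = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])

print(correlation_coefficient(model, model))            # 1.0  (perfect match)
print(correlation_coefficient(3 * model + 25, model))   # 1.0  (gain/offset has no effect)
print(correlation_coefficient(255 - model, model))      # -1.0 (inverted image)
```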
The template or model is moved across the image as described previously. However, in this case, besides multiplying the corresponding gray levels of the template and model, both the average value of the sub-image and the model (and the variance of each) are needed. Note that the average value of the model and its variance are constant. Variations on the equation above can simplify computation. For example, the correlation coefficient can be squared to remove the burden of taking the square root in the denominator.
A few interesting conclusions can be inferred from the use of the correlation coefficient. First, a value of 1 indicates a perfect match, and if values less than zero are ignored, the maximum value obtained over the image defines the location of the best match. Of course, the higher the correlation coefficient, the better the match. Second, if a feature in the image is partially degraded, the correlation coefficient will still generally identify it and its location. Third, all the locations in an image that matched the model could be found by simply looking for all the correlation coefficients above some specified threshold. Finally, and as stated previously, the correlation coefficient is independent of linear changes in brightness. That is, assuming all the pixels of the image (or the model for that matter) are modified by the function

I(x, y)_new = a I(x, y) + b

for a > 0 and any b, the correlation coefficient is unchanged. The parameters a and b can be considered as gain and offset for either the image or the acquisition device.
If one chooses to make a template that defines some specific feature of an object, such as a corner or edge, it is possible to use the gray-scale correlation technique just described to find these specific features. In this case, imagine a (q × r) model with the left half, q × r/2, entirely white and the right half black. Using this template (or model) with gray-level correlation enables one to find all the areas of the image that resemble an edge having a light-to-dark transition.
the logical OR function for the pixel mapping rule. When it is applied to a binary image, the size of a white area is increased. Figure 6.6.10(a) shows an unprocessed binary image. Figure 6.6.10(b) shows the results of the white dilation operation performed on Figure 6.6.10(a), while Figure 6.6.10(c) shows the results of a white erosion (or black dilation) performed on the image of Figure 6.6.10(b).
If one performs erosion operations followed by dilations, small bridges between white objects will be broken. This cascaded operation is called an opening.
Figure 6.6.10. Example of morphological image processing: (a) original binary image; (b) result of white dilation of the image in part (a); (c) result of white erosion of the image in part (a).
The operation of dilation followed by erosion has the opposite effect and closes
up the spaces between adjacent regions. This cascaded operation is called a closing.
Many other operators exist which allow filling in partially missing edges to
restore the original shape of the image, measuring the extent of an image, and
actually isolating objects and reducing their dimensionality to count specifically shaped and sized objects in a scene.
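A sketch of white dilation, erosion, and the cascaded opening and closing described above. scipy's binary morphology routines are used, and the 3 × 3 structuring element is an assumption, not something specified in the text.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

SE = np.ones((3, 3), dtype=bool)          # 3 x 3 structuring element (assumed)

def opening(img):
    """Erosion followed by dilation: breaks thin bridges between white regions."""
    return binary_dilation(binary_erosion(img, SE), SE)

def closing(img):
    """Dilation followed by erosion: closes small gaps between adjacent regions."""
    return binary_erosion(binary_dilation(img, SE), SE)

# Two white blobs joined by a one-pixel-wide bridge.
scene = np.zeros((20, 40), dtype=bool)
scene[5:15, 3:13] = True
scene[5:15, 27:37] = True
scene[9, 13:27] = True                    # the thin bridge

opened = opening(scene)
print(scene[9, 20], opened[9, 20])        # True False -- the bridge is removed by the opening
```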
6.7 SOFTWARE CONSIDERATIONS

The techniques used for vision in robotic applications may rarely, if ever, be implemented totally in software. Virtually all vision systems implement some algorithms in hardware. The software considerations lie mainly in the ease with which the vision systems may be used. Most of the present vision systems are, in
essence, peripheral devices to the main robot controller and are invoked through
a command/data structure that is rather simple. Basically, the peripheral (vision)
device is given a string or stack of commands, and the peripheral returns data to
the main processor. Some vision systems have their own user languages that range
from cumbersome to friendly. For the most part, then, language and software
considerations lie mainly in control of the vision peripheral, not in the actual
implementation of the algorithms.
6.8 NEED FOR VISION TRAINING AND ADAPTATIONS

Although one might initially have believed that the definition of a prototypical
imaged part is trivial, by now the reader should be aware that the amount of data
required to define an object may indeed be huge in comparison to other digital
data-processing applications. When considering the variety of degrees of freedom
required to describe an object fully, it is evident that vision system training will be
needed so that a reasonable amount of data may be retained by the vision system.
It is for these reasons that dimensionality reduction, as mentioned earlier, is
so important. Specifically, it allows for the efficient representation of the visual
data, usually in an independent manner. Another important consideration has to
do with the potential for dealing with objects and parts that may be changing over
time. For example, a conveyor belt carrying a part may not operate at a carefully
controlled constant speed. As a consequence, parts will not arrive at known or
precise intervals of time. In such an instance, tracking of the "trend" regarding
parts arrival may be very useful in efficiently acquiring images and in processing
the data.
Another case where this is true is in semiconductor assembly, where a die
may be attached to a substrate by a die attach machine. If this machine has a
slight but consistent drift, the imaging of the die for later bonding of wires to the
lead frame and chip may be subject to placement errors, due to variable placement
of the die. This type of adaptive updating of object positions is often necessary
for efficient part handling.
6.9 REVIEW OF EXISTING SYSTEMS

In this section we review the major types of commercial vision systems currently
available. The discussion of techniques used in these systems will be restricted
Somewhat, since robotic vision requirements may be very diverse. The major
systems can be classified into the following general categories:
• Binary vision systems (e.g., those based on the SRI algorithms)
• Gray-level vision systems
• Structured light systems
• Character recognition systems
• Ad hoc special-purpose systems
6.9.1 Binary Vision Systems
Binary vision systems are those that use only two levels of image information
They are so-called silhouette systems, since very controlled lighting must be used
to image objects
reliably. Backlighting of parts is frequently selected so that the
objects to be inspected stand apart from the background. The binary vision systems
are used primarily for:
• Parts recognition
• Parts location
• Parts inspection
From a visual perspective, a binary vision device may be thought of as being
able to operate on a part as if an inspector had picked up the part and held it up
to a light source for backlighted inspection. One can see that there is a limited
but useful class of information that one can glean from this procedure.
The SRI collection of algorithms is an example of a binary vision system. As
typically implemented, it will permit arbitrary angular alignment of the part. A
run-length-coded image is often produced because of the speed enhancements
possible (see Section 6.9.5.3).
For objects where angular alignment is not an issue because of some prior
orientation stage, but where the translational position is unknown, binary corre
lation techniques are often used. Binary correlation permits the object to be
located with the value of correlation at the best match point often used as a measure
of part quality. For arbitrary parts orientation this technique is not practical yet,
because of the need to perform correlation in three dimensions (two translational,
one rotational). Other binary vision systems frequently use so-called "pixel count
of
ing" for inspection parts. These systems usually require spatial windowing or
data and then counting pixels in those windows. This scheme typically requires
a
significant application effort to choose the correct
then requires
windows, and
large degree of ad hoe adjustment to determine the significance of the pixel count
6.9.2 Gray-Level Vision Systems

Gray-level systems generally capture 4-, 6-, or 8-bit images, and then apply very tailored algorithms designed for a specific application. Gray-level template matching techniques, for example, may be used to locate parts in nonsilhouetted environments. In many instances, highly controlled lighting may not be permitted, or the surface of the object has variable reflectivities that are useful for inspecting the object. Gray-level template comparisons may be used to locate objects that are angularly aligned, with the amount of template difference serving as a measure of part quality.
It is often desirable to read labels or characters from parts, packages, and so on.
Where bar codes may be placed on the parts to be identified, identification may be accomplished by simple bar-code readers. Alphanumeric codes are a different matter entirely, since recognition of arbitrary character sets has until recently been
a very difficult image-processing task. Several systems available today are able to
read a wide variety of character sets (after initial training) at high speeds (15 to
30 characters/s).
A number of vision systems were developed specifically for use with robots prior
to 1980. These include the GM Consight System, the one developed by the
National Bureau of Standards, and also a system developed by SRI. We consider
each of these in turn.
Figure 6.9.1. Layout of the Consight system: a solid-state line camera and two light sources view the conveyor belt; belt position/speed measurements and the camera data go to a PDP 11/45 computer, which communicates with the Stanford robot arm through a robot interface.
the robot's work station. In addition, information about the conveyor's speed is
sent to the computer. In the time it takes a part to move from the vision system's
location to that of the robot, the computer utilizes the visual and speed data to
determine the location, orientation, and type of part and sends this information
to the robot controller via an interface. With this knowledge, the robot is able
to successfully approach and pick up the part while the latter is still moving on the
belt. It is important to note from Figure 6.9.1 that the vision system is not mounted
on the robot itself and, therefore, does not reduce the robot's useful
payload.
It is necessary to monitor the conveyor's velocity continuously for several
reasons. First, typical moving belts that are found in factories generally do not
have velocity servos controlling their speed. Thus it is
expected that the speed
will fluctuate due to a variety of causes including load changes, line voltage variations, and wear of rotating parts (i.e., increased friction). Since the robot must accurately know when the part arrives at its work station, the instantaneous belt speed must be available to the computer. Second, keeping track of any belt speed
variations has to do with the method used to acquire the two-dimensional visual
scene. This is discussed next.
As can be seen in Figure 6.9.2, the linear array camera scans the belt in a one-dimensional manner (e.g., in the y direction), and this is perpendicular to the conveyor's motion (e.g., the x direction). Note that the camera will record 128 equally spaced points across the width of the belt. The two-dimensional image is formed by instructing the camera to wait until the part has moved a specified distance down the conveyor before recording the next line of the image. It should be clear that fluctuations in belt speed can produce distortion in the recorded image. This undesirable phenomenon is avoided by speed monitoring, which permits the
time interval between acquiring two successive line images to be varied in order to compensate for any non-uniformity in the belt speed.

Figure 6.9.2. Camera and light source configuration in the Consight I system. The basic lighting principle is illustrated.
How does the Consight System avoid the problem of poor lighting conditions
without resorting to contrast enhancing techniques? How does the system *know"
if a part is present or not? The answer to both of these questions is through the
use of structured light. With respect to Figures 6.9.3 and 6.9.4, it is observed that
a light source, consisting ofa long slender tungsten filament bulb and a cylindrical
lens, which projects a linear (and fairly intense) beam across the belt's width is
positioned downstream of the camera which is placed in a position so that it can
sense this line of light. When no object is within the field of the camera. an
unbroken line of light results. See Figure 6.9.4(a). However, when a part is
present, the three-dimensional nature of the objJect causes a portion of the light
beam to be intercepted before it reaches the camera position. When viewed from
above by the camera, this part of the line of light that is deflected by the part appears to be displaced (downstream), as shown in Figure 6.9.4(b). Thus the camera will sense a black image wherever there is an object and will record a light region where there is no part. As the part moves down the conveyor, the region of black will change in length (i.e., in y). The two-dimensional binary image recorded by the camera will, therefore, consist of regions of black (wherever there is a part) and white where there is none.

Figure 6.9.3. The Consight light source: a cylindrical lens focuses the light into a line across the belt.
One potential problem with this procedure is shadowing, as
illustrated by the
dotted lines in Figure 6.9.5. Here, it is observed that the system will detect the
leading edge of the part before it actually arrives at the camera position.
This
problem can be solved through the use of two or more linear light sources focused
at the camera location on the belt as shown in
Figure 6.9.5. The reader will
observe that the second light beam prevents this
position on the conveyor from
becoming dark until the part is actually at the location.
The Consight System uses a run-length coding scheme for storing the two-dimensional binary image. Since the camera has 128 elements, only 7 bits are needed for this purpose. The remaining bit (usually the most significant) in any run-length "word" is used to indicate whether the transition is light to dark or vice versa.
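A sketch of the 8-bit run-length "word" described above: 7 bits of run length plus a transition-direction flag in the most significant bit. The exact bit assignment and flag meaning in the original Consight implementation are assumptions here, not documented in the text.

```python
LIGHT_TO_DARK = 1          # assumed meaning of the flag bit
DARK_TO_LIGHT = 0

def pack_run_word(run_length: int, transition: int) -> int:
    """Pack a run length (0-127 fits in 7 bits) and a transition flag into one byte."""
    if not 0 <= run_length < 128:
        raise ValueError("run length must fit in 7 bits for a 128-element camera")
    return (transition & 1) << 7 | run_length

def unpack_run_word(word: int) -> tuple[int, int]:
    """Recover (run_length, transition flag) from a packed byte."""
    return word & 0x7F, word >> 7

word = pack_run_word(53, LIGHT_TO_DARK)
print(word, bin(word))             # 181 0b10110101
print(unpack_run_word(word))       # (53, 1)
```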
Figure 6.9.5. A second light source, with both the first and second sources focused on the line viewed by the camera, eliminates the shadowing effect.
For a given object, comparing these and other simply computed features with those
stored in the computer (for the entire "world" of permitted objects) allows the
part to be recognized.
Orientation, a descriptor that is usually part specific, can be found in a number
of ways including selecting the moment axis direction that points nearest to the
maximum radius point measured from the centroid to the boundary. This de-
scriptor and the belt speed are then used to inform the robot when the object is
within the workspace and where and how to grasp the part. In some instances,
it may be necessary to stop the conveyor for a period of time to permit the ma-
nipulator to acquire the object.
A major problem with the Consight system is that it cannot handle parts that
are touching one another. If such a situation occurs, the number of scan lines will
usually be greater than for any single part. Alternatively, no match between all
the features of the two touching parts and those stored in the vision system's memory
will occur (e.g., the overall area will generally be larger for the "compound object"). In either instance, the objects that cannot be "identified" are permitted to run off the end of the conveyor and into a reject bin where they can be recycled.
It is important to understand that any object that can assume more than one stable
position on the conveyor will require a separate set of features to be stored for each
one.
Although the GM Consight was developed over ten years ago, it is still used commercially in a somewhat modified form today by the GMF Corporation and
also by the Adept Corporation under a licensing agreement.
Having considered the various features of a vision system that was developed
by a large private company, we next consider a robotic vision system developed
with Federal money at the National Bureau of Standards.
6.9.5.2 National Bureau of Standards vision system [19]
In the 1970s, the United States Congress charged the National Bureau of Standards (NBS) with developing a fully automated machine shop by the latter part of the 1980s. As part of their effort to achieve coordinated control over robots and other less sophisticated machine tools (e.g., lathes, punch presses, milling machines, etc.), a need was perceived for a vision system that could be used in such an environment and could be interfaced with robots.
The research effort was undertaken by Dr. James Albus and his colleagues
and produced a workable system in the late 1970s. The NBS Vision System, as
it was first introduced, was able to process picture information in less than 100 ms
and was estimated to cost about $8000. Since then, the system response time has
been improved and some of the hardware has been modified. However, the basic
operation technique has not changed appreciably as described next.
The major hardware elements of the NBS Vision System consist of three
components: (1) a solid state camera capable of producing 16K pixels (128 x 128);
(2) an electronic stroboscopic light that emits a plane of light and whose flash
intensity can be modified digitally; and (3) a "picture processing" unit. To understand the operation of the system, consider Figure 6.9.6. [The reader should
understand that the robot manipulator (in particular, its wrist) is not shown here.
In actual operation, the camera and (structured) light source would be mounted
on the robot's wrist whereas the processing unit would be in or near the robot's
Figure 6.9.6. Structured light imaging for the NBS robot vision system. The camera and the flash (strobe) unit are mounted on opposite sides of the robot's wrist (not shown) such that the plane of light is parallel to the fingers of the gripper. The presence of an object causes one or more line segments to be seen by the camera. With permission of the National Institute of Standards and Technology (formerly National Bureau of Standards).
Figure 6.9.7. Example objects (such as a box) and the line segment patterns formed by the plane of light as seen by the camera for the NBS vision system. With permission of the National Institute of Standards and Technology (formerly National Bureau of Standards).
The strobe unit produces a plane of light that is projected parallel to the wrist plane (determined by the approach vector and y). The camera is mounted above the light source and is tilted down (i.e., so as to intersect the light plane). Its 36-degree field of view covers the region extending from inside the fingers of the gripper out to a distance of one meter. If the projected light strikes an object in this region, a pattern of line segments is formed on the object. See Figure 6.9.7.
Figure 6.9.7. As the robot gripper moves closer to the object, these line segments
will grow in size and will move down in the camera's field. Qualitatively, the
reader should be able to conclude that the nearer the bottom of the image, the
Closer the object being scanned will be to the robot's end effector. However, how
does this system provide quantitative information that will permit the robot to
acquire the object?
To answer this question, consider the calibration chart (derived from simple geometric considerations) shown in Figure 6.9.8. Observe that the top and right axes are calibrated in pixels whereas the bottom and left axes are calibrated in centimeters, representing the x and y distances between the camera and object, respectively. For example, if the camera viewed the horizontal line shown in Figure 6.9.9, extending from pixel element (32, 64) to element (96, 64), the object producing this line would be located about 13 centimeters from the gripper and be about 10 centimeters in width.
The information contained in Figure 6.9.8 is actually stored in the vision system's memory.
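As a rough illustration of how such stored calibration data might be used, the following Python sketch interpolates a small lookup table that maps an image row to the x distance (range) in centimeters and to a per-row scale factor that converts a pixel span into a width. The table values and helper names are invented for this example rather than taken from the actual NBS calibration, although they are chosen so that the sample segment from (32, 64) to (96, 64) yields roughly the 13 cm range and 10 cm width quoted above.

# Hedged sketch: converting a detected line segment (in pixel coordinates)
# into approximate metric information using a stored calibration table.
# The table entries below are invented for illustration only.
import bisect

# (image row) -> (x distance to object in cm, cm per pixel column at that row)
CALIBRATION = [
    (16, 45.0, 0.55),
    (32, 25.0, 0.30),
    (64, 13.0, 0.156),
    (96, 7.0, 0.09),
    (128, 4.0, 0.05),
]
ROWS = [r for r, _, _ in CALIBRATION]

def interpolate(row):
    """Linearly interpolate range and scale for an arbitrary image row."""
    i = bisect.bisect_left(ROWS, row)
    i = max(1, min(i, len(ROWS) - 1))
    (r0, d0, s0), (r1, d1, s1) = CALIBRATION[i - 1], CALIBRATION[i]
    t = (row - r0) / (r1 - r0)
    return d0 + t * (d1 - d0), s0 + t * (s1 - s0)

def describe_segment(col_start, col_end, row):
    distance_cm, cm_per_px = interpolate(row)
    width_cm = abs(col_end - col_start) * cm_per_px
    return distance_cm, width_cm

# The horizontal segment from (32, 64) to (96, 64) in the worked example:
print(describe_segment(32, 96, 64))   # roughly (13.0, 10.0)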
Figure 6.9.8. The calibration chart for the NBS vision system. The x and y distances are measured in the coordinate system of the fingers. The x axis passes through the two finger tips and the y axis is parallel to the wrist axis. The slight tilt in the figure is due to a misalignment of the chip in the camera. With permission of the National Institute of Standards and Technology (formerly National Bureau of Standards).
Figure 6.9.9. Calibration chart with example line segment (image) from (32, 64) to (96, 64) shown. With permission of the National Institute of Standards and Technology (formerly National Bureau of Standards).
1. The robot's arm is initially positioned near one of the corners of the workspace. The table is then scanned (by firing the strobe unit) in a plane that is approximately parallel to its surface. The illuminated object appears in the image as a series of line segments. See, for example, Figure 6.9.7. Generally, this step in the process yields coarse range information.
2. The estimate obtained from step 1 is then used to move the robot's arm closer
to the object. The flash unit is again fired and more accurate range infor-
mation is obtained (recall that the resolution improves as the camera comes
closer to the object because we are now operating in a higher resolution part
of the calibration chart).
3. Based on the better range estimate obtained in step 2, the arm is moved
above (or in front of) the object. The strobe is triggered a third time and
the system makes fine positional and orientational corrections. The robot
can then be commanded to grasp the object.
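The three-step procedure amounts to a simple coarse-to-fine control loop. The Python sketch below shows only that control flow; the callables it accepts (capture, estimate_pose, move_toward, grasp) are hypothetical stand-ins for the camera, the calibration lookup, and the robot controller rather than any part of the NBS software.

# Hedged sketch of the coarse-to-fine acquisition sequence (three strobe firings).
def acquire_object(capture, estimate_pose, move_toward, grasp):
    """Fire the strobe three times, refining the pose estimate each time,
    then command a grasp. All callables are hypothetical driver hooks."""
    for standoff_cm in (50.0, 20.0, 5.0):   # progressively closer standoffs
        segments = capture()                # fire the strobe, grab one frame
        pose = estimate_pose(segments)      # range/orientation from line segments
        move_toward(pose, standoff_cm)      # resolution improves as we close in
    grasp(pose)

if __name__ == "__main__":
    # Trivial stubs so the sketch runs as written.
    acquire_object(
        capture=lambda: "segments",
        estimate_pose=lambda s: {"x_cm": 13.0, "angle_deg": 0.0},
        move_toward=lambda pose, d: print("move to standoff", d, "cm; pose", pose),
        grasp=lambda pose: print("grasp at", pose),
    )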
In the earlier versions of the system, there was a perceptible pause at each of the illumination points. A later version eliminated this delay, thereby producing an extremely smooth motion. In addition, the processing speed was now so rapid that it was actually possible for the system to track a moving object.
It should be noted that the NBS system is fundamentally quite different from other machine vision devices because it is not "looking" all the time. Consequently, it is much more time efficient, since only three picture scenes need to be processed (i.e., for the line segment information) during an object acquisition sequence.
Besides ranging and orientation data, it is possible to extract information on the structure of the object. This is also illustrated in Figure 6.9.7. For example, it is observed that an object with a raised portion of its surface will produce three disconnected horizontal line segments. This happens because the segment associated with the elevated section will be lower than the other two. In a similar manner, objects with depressed surfaces can be detected by the nature and number of disconnected line segments. The figure also indicates that obliquely viewed objects produce line segments that are connected but have different slopes (i.e., there is a cusp at their intersection). In addition, it can be shown that objects with cylindrical surfaces will produce curved line segments when illuminated by a plane of light.
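A very rough way to automate this qualitative reasoning is sketched below in Python. Each detected stripe is represented as a list of (column, row) points, and the code simply counts disconnected pieces and checks for an abrupt slope change (a cusp). The representation and the slope threshold are assumptions made for illustration and do not reproduce the NBS implementation.

# Hedged sketch: crude interpretation of the light-stripe pattern.
# Each segment is a list of (column, row) pixel coordinates, left to right.
def slope(seg):
    (c0, r0), (c1, r1) = seg[0], seg[-1]
    return (r1 - r0) / (c1 - c0) if c1 != c0 else float("inf")

def interpret(segments, slope_jump=0.3):
    """Return a coarse description of the illuminated surface."""
    if len(segments) >= 3:
        return "raised or depressed surface (disconnected stripe pieces)"
    if len(segments) == 2:
        if abs(slope(segments[0]) - slope(segments[1])) > slope_jump:
            return "obliquely viewed surfaces meeting at a cusp"
        return "two separate surfaces"
    return "single flat surface"

# Example: two adjoining pieces with clearly different slopes form a cusp.
left  = [(c, 60 + c // 4) for c in range(0, 40)]
right = [(c, 70 - (c - 40) // 4) for c in range(40, 80)]
print(interpret([left, right]))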
Besides speed, another advantage of the NBS system over others is that the contrast problem is eased, or even eliminated, by the use of stroboscopic illumination. Even if the surface of the object is rather dull, it is possible to compensate for the "threshold" problem by increasing the flash duration under computer control. With the system used by NBS, the strobe time can be as short as 6.4 μs and as long as 1.6 ms. (This range is divided into 256 values.) The ambient light problem is solved by frame-to-frame differencing, whereby data in the flash frame is compared with that from a nonflash frame at the same location. This does, however, require a "frame buffer" and hence additional memory.
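The flash/no-flash differencing step can be illustrated with a few lines of NumPy. The sketch below subtracts an ambient (non-flash) frame from the flash frame and thresholds the result, and it lengthens the strobe time for the next firing when too few stripe pixels survive. The threshold, pixel count, and function names are assumptions chosen to mirror the description above, not the actual NBS firmware.

# Hedged sketch: suppressing ambient light by flash/non-flash differencing.
import numpy as np

def stripe_mask(flash_frame, ambient_frame, threshold=30):
    """Return a binary mask of pixels illuminated by the strobe only."""
    diff = flash_frame.astype(np.int16) - ambient_frame.astype(np.int16)
    return diff > threshold          # True where the plane of light landed

def choose_longer_exposure(current_us, mask, min_pixels=50, max_us=1600):
    """If too little of the stripe was detected (a dull surface), lengthen
    the strobe time for the next firing, up to the maximum duration."""
    if mask.sum() < min_pixels:
        return min(current_us * 2, max_us)
    return current_us

# Tiny demonstration with synthetic 128 x 128 frames.
ambient = np.full((128, 128), 40, dtype=np.uint8)
flash = ambient.copy()
flash[80, 20:90] += 100              # a bright horizontal stripe
mask = stripe_mask(flash, ambient)
print(mask.sum(), choose_longer_exposure(6.4, mask))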
At each step in the illumination process, a one-pass line-following algorithm that looks for corners and gaps is utilized. For each image scan line produced, the system calculates (in hardware) a run length of 8 bits and an intensity of 8 bits. (Note that gray-level information is utilized here, unlike the GM system described in the previous section.) Based on this information, the possible run length of the next scan line is predicted (from the slope and curvature information of the "working" line). If the actual run length (RL) is within a specified ε of the prediction, the point is added to the working line and the next run length is predicted (a short sketch of this prediction step appears after the case list below). If, however, the difference exceeds ε, there are three cases:
1. RL = 128, which implies that a gap exists. The system will then begin to
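The sketch below illustrates the prediction-and-match step in Python. The next run length is extrapolated from the last two accepted values (a crude stand-in for the slope and curvature of the working line), and each new scan-line reading is either appended to the working line, flagged as a gap (the full-width value of 128), or rejected. The tolerance and data layout are assumptions for illustration; the remaining cases of the hardware logic are not reproduced here.

# Hedged sketch: one-pass line following with predicted run lengths.
FULL_WIDTH = 128          # run length reported when no transition is found (a gap)

def predict(working):
    """Extrapolate the next run length from the last two accepted values."""
    if len(working) < 2:
        return working[-1]
    return working[-1] + (working[-1] - working[-2])   # constant-slope prediction

def follow_line(run_lengths, eps=4):
    working, events = [run_lengths[0]], []
    for rl in run_lengths[1:]:
        if rl == FULL_WIDTH:
            events.append("gap")                 # no stripe on this scan line
        elif abs(rl - predict(working)) <= eps:
            working.append(rl)                   # point accepted onto the working line
        else:
            events.append("corner or new line")  # prediction badly missed
    return working, events

print(follow_line([20, 22, 24, 27, 128, 60]))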
Classification of objects
Materials and/or parts handling by a robot
Visual inspection of parts
In addition, these objectives were to be done for parts moving on a conveyor belt.
In the following sections, we describe four different aspects of the system
that permit these objectives to be realized. They are:
Color Filters. Another technique that can be used to enhance the image is to mount a color filter in front of the camera lens. Clearly, this is only useful when the colors of either the object, the background, or both are known. In particular, when such a technique is combined with the fluorescent system mentioned above, a red filter (matched to the spectral response of the fluorescent belt) further enhances the image.
Special Lighting Arrangement. By illuminating an object in different ways,
it is possible to control shadows and highlights. For example, directional lighting
can be used to enhance shadows whereas multidirectional lighting can be employed
to reduce them. Highlights (or reflections from shiny surfaces) can be enhanced
by placing the illuminating source near the camera. Conversely, oblique lighting
tends to reduce such highlights. Such techniques are generally referred to as
structured lighting and are an important component of the SRI System.
Imaging Hardware
The reader should understand that with memory so inexpensive now, this would not pose much of a problem. Also, it is important to note that storing the entire image has the advantage of having it available for processing at any time. Thus, features can be extracted anywhere in the process, permitting considerable flexibility in the image recognition algorithms.
A camera consisting of 128 light-sensitive diodes was used instead. With this device, one line of data, consisting of 128 one-bit samples, was obtained each time the array was scanned.
To determine or recognize the part on the conveyor that is being scanned by the diode array camera, it is necessary to extract (compute) a set of features that permit the recognition to be performed. Before this can be done, however, it is necessary to obtain the outline of the object. In the SRI system, a connectivity analyzer is used to examine overlaps between lines in successive rows of the image, which, in turn, permits connected components to be ascertained. In addition, holes located within the boundaries of the object are also found in this manner. In both cases, standard edge detection algorithms are utilized.
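The row-overlap idea can be written compactly in Python. In the sketch below, each binary image row is reduced to runs of foreground pixels, and runs in successive rows whose column ranges overlap are merged with a small union-find structure, which yields the connected components (applying the same logic to background runs would locate interior holes). This is a generic run-based routine written to illustrate the description above; it is not SRI's code.

# Hedged sketch: connected components from run overlaps between successive rows.
def runs(row):
    """Return (start, end) column pairs of consecutive 1s in a binary row."""
    out, start = [], None
    for c, v in enumerate(row + [0]):          # sentinel closes a trailing run
        if v and start is None:
            start = c
        elif not v and start is not None:
            out.append((start, c - 1))
            start = None
    return out

def connected_components(image):
    parent = {}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    def union(a, b):
        parent[find(a)] = find(b)

    labeled_rows, previous = [], []
    for r, row in enumerate(image):
        current = []
        for (s, e) in runs(row):
            label = (r, s)
            parent[label] = label
            for (ps, pe, plabel) in previous:   # does it overlap the row above?
                if s <= pe and ps <= e:
                    union(label, plabel)
            current.append((s, e, label))
        labeled_rows.append(current)
        previous = current
    # Collapse labels so each run reports its component's representative.
    return {find(lab) for row in labeled_rows for (_, _, lab) in row}

image = [[0, 1, 1, 0, 0, 1],
         [0, 1, 0, 0, 0, 1],
         [0, 1, 1, 1, 0, 0]]
print(len(connected_components(image)))   # two connected components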
Once the part outline is determined, the initial or simple features are obtained. In the SRI system, these include:
With the above features now available, other features can then be determined. For example, the SRI system computes a radius function. That is, for each point on the perimeter of the object, the square of the distance from that point to the center of gravity (centroid) is found. Then, the maximum, minimum, and average radii become the features of interest.
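A minimal version of this radius function is shown below in Python. It assumes the perimeter has already been extracted as a list of (x, y) points and that the centroid is known; the maximum, minimum, and mean of the squared distances then become three scalar features. The sample outline is illustrative.

# Hedged sketch: the squared-radius function and its summary features.
def radius_features(perimeter, centroid):
    cx, cy = centroid
    squared = [(x - cx) ** 2 + (y - cy) ** 2 for (x, y) in perimeter]
    return max(squared), min(squared), sum(squared) / len(squared)

# Example: a coarse outline of a 10 x 4 rectangle centred at the origin.
outline = [(x, y) for x in range(-5, 6) for y in (-2, 2)] + \
          [(x, y) for x in (-5, 5) for y in range(-2, 3)]
print(radius_features(outline, (0.0, 0.0)))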
Localizing features can also be used to determine the angular orientation of the object being viewed. They also permit attention to be directed to a specific portion of the part's outline. For this purpose, one can utilize OA, the longest radius vector, and OB, the vector formed from the CG to the nearest hole B (see Figure 6.9.10).
Figure 6.9.10. Binary image of a water
pump showing the hole B and the long-
est radius OA.
For the water pump shown, utilizing these two vectors permits measurement of the position of the movable pump handle.
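The use of these two vectors can be illustrated with a short Python function. Given the centroid (CG), the boundary points, and the hole centers, it returns the directions of OA (toward the farthest boundary point) and OB (toward the nearest hole) and the angle between them, which for the water pump would track the handle position. The helper and the sample data are assumptions made for the example.

# Hedged sketch: orientation from the longest radius vector OA and the
# vector OB from the centroid to the nearest hole.
import math

def oa_ob_angles(centroid, boundary, holes):
    cx, cy = centroid
    far = max(boundary, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    near_hole = min(holes, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    oa = math.atan2(far[1] - cy, far[0] - cx)
    ob = math.atan2(near_hole[1] - cy, near_hole[0] - cx)
    relative = (ob - oa) % (2 * math.pi)
    return math.degrees(oa), math.degrees(ob), math.degrees(relative)

# Illustrative data: boundary of a blob, one hole offset from the centroid.
boundary = [(12, 0), (0, 5), (-6, 0), (0, -5)]
holes = [(3, 3)]
print(oa_ob_angles((0.0, 0.0), boundary, holes))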
In the actual experiments conducted by SRI, 200 lines were required to delineate the pump, and with a PDP-11/40, image processing took about one second. It should be clear that with the faster computers now available, this time could probably be reduced by quite a bit.
Object rotation
Object translation
Lighting variations
Camera noise
Quantization errors
*One refers to a system that can reliably operate under a wide range of conditions as robust.
Once the operations above are performed, the decision tree can be constructed.
[Figure: Example parts used in constructing the decision tree (Classes C1 through C5): a connecting rod (conrod); a cylinder head lying on its side and a cylinder head standing upright (Head 1 and Head 2); and a brake caliper lying flat and lying on its side (Caliper 1 and Caliper 2).]
As far as the system is concerned, there are seven possible objects to identify. Moreover, for this example, seven different features are used to construct the decision tree.
[Figure: The decision tree used to classify the example parts. At each node a feature measure X is compared against a threshold (for example, X > 1.66, X > 1.80, X > 1.34, X > 5.20, X > 18.45, X > 3.78); depending on the yes/no outcome, either another measurement is made or the part is classified (e.g., as a conrod, head, caliper, or sleeve).]
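To show how such a decision tree might be executed in software, the Python sketch below encodes a small tree of feature/threshold tests as nested tuples and walks it for a given feature vector. The tree shape is only loosely suggested by the thresholds visible in the figure and should be treated as illustrative rather than as the trained SRI classifier.

# Hedged sketch: evaluating a feature-threshold decision tree.
# A node is (feature_name, threshold, subtree_if_yes, subtree_if_no);
# a leaf is simply a class name string. The tree below is illustrative.
TREE = ("x1", 1.66,
            ("x2", 1.80, "head", "conrod"),
            ("x3", 1.34,
                ("x4", 5.20, "caliper", "sleeve"),
                "head"))

def classify(tree, features):
    while not isinstance(tree, str):
        name, threshold, yes_branch, no_branch = tree
        tree = yes_branch if features[name] > threshold else no_branch
    return tree

print(classify(TREE, {"x1": 2.0, "x2": 1.5, "x3": 0.9, "x4": 0.0}))   # conrod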
6.10 SUMMARY
This chapter illustrated many of the major aspects of machine vision, especially as
applied to robots. The concepts presented permit the general understanding of
the components, hardware, software, and algorithms that are often required in a
vision or remote sensing application.
The reader should understand that the techniques presented here, while use-
ful, will very often need to be modified before practical implementation is achieved.
These modifications may be in the nature of using selected regions of interest,
using computational approximations that provide for efficient implementation, or
other similar techniques that permit the transfer of the theoretical or academic
techniques into the real world of engineering and manufacturing implementation.
Furthermore, the techniques presented in this chapter should be used as "vectors"
that will point the implementer toward a good direction but will not give all the
details required. For example, one may choose a horizontal edge operator for
enhancing edges, but one also has to select the coarseness of the operator.
The algorithm itself may be obtained from the literature, but the specific imple-
mentation parameters must be selected by the user.
Another issue of interest is the fact that the same problem may be solved in
more than one way, and the method for selecting the most appropriate approach
will not be found in any textbook. The authors have sought to present many of
the important techniques, and must regrettably leave the selection of the appropriate technique in any given environment to the user. The example in Chapter
Nine serves to illustrate some of the problems associated with vision systems in
robotics.
For example, Automatix has developed and incorporated a special-purpose language called "RAIL," which permits the vision system to be more readily interfaced to the robot. In addition,
modern 32-bit microprocessors have been utilized, which allow both binary and gray-level data to be
processed.