Basic Image Processing For Robotics Manual
Table of contents
Foreword
Chapter 1: Introduction
  1.1 Matlab
    1.1.1 Matlab Introduction
    1.1.2 Using Matlab Editor to Create M-file
    1.1.3 Getting Help
    1.1.4 Starting with Image Processing Toolbox
  1.2 DIPimage
    1.2.1 DIPimage Introduction
    1.2.2 Getting Started
Chapter 2: Image representation and manipulation
  2.1 Digital Image Representation
    2.1.1 Image Sampling and Quantization
    2.1.2 Data Types and Image Types
  2.2 Graphic Geometric Transformation
  2.3 Basic Operations on Image Array
    2.3.2 Array Operations
Chapter 3: Intensity Transformation
  3.1 Contrast Manipulation
  3.2 Histogram Processing
    3.2.1 Histogram calculation
    3.2.2 Histogram equalization
  3.3 Threshold
Chapter 4: Spatial Filtering
  4.1 Basic Ideas
  4.2 Smoothing (Blurring) Filters
  4.3 Sharpening Filter
  4.3 Edge Filter
  4.4 Nonlinear filter
Chapter 5: Frequency based Processing
  5.1 The Basics of Fourier Transform
  5.2 Computing 2-D DFT
  5.3 Filtering in the Frequency Domain
    5.3.1 Fundamentals of Filtering in Frequency Domain
    5.3.2 Lowpass Frequency Domain Filters
    5.3.3 Highpass Frequency Domain Filters
Foreword:
This manual is part of the Robot Practicals lab course. In the Robotics Practicals, the aim
is to familiarize yourself with the software tools that are necessary to program the robots
at the Delft Biorobotics Lab (DBL). The lab courses are very much hands-on; you will be
able to practice everything you learn immediately by doing the exercises that are
contained in the text. Because the aim is to understand the tooling instead of a particular application, we especially encourage you to explore and try to combine the things you've learned in new and hopefully exciting ways!
In this practicum we aim to explain the basics of image processing required for solving simple problems in robotic vision and to give you the basic knowledge needed for reading state-of-the-art vision books and papers. For more advanced explanations please refer to the given literature. All exercises are given in Matlab, so a basic understanding of the Matlab environment is a prerequisite for this practical.
There is no exam at the end of the course, but students are required to solve all the assignments within the practical and to come to the instructors to discuss their solutions. Note that some parts of the Practicum are marked as advanced. We advise you to read these parts and to solve their assignments only if you would like a deeper understanding of the topic.
If you have any problems understanding parts of the Practicum after checking the given literature, please contact the authors.
We wish you pleasant work!
Authors:
Maja Rudinac [email protected]
Xin Wang [email protected]
Delft, 2010
Chapter 1: Introduction
Image processing is an area that requires a lot of experimental work to arrive at a working solution for a given problem. The main goal of this Practicum is to get hands-on experience with image processing. To do so, you will have to learn the image processing environment: Matlab and the DIPimage toolbox for scientific image processing. To facilitate a quick start, there will not be an in-depth explanation of all the features in this software. We will briefly present things as you need them.
In this Practicum, the details of image processing theory are left out. For these, please refer to the given literature.
1.1 Matlab
This subsection serves to get you acquainted with Matlab, so if you are already familiar with Matlab, skip this section. If you have never worked with Matlab before, it is necessary to check this section and the related materials.
Matlab is an abbreviation of Matrix Laboratory. It is a numerical computing environment
and fourth-generation programming language that allows matrix manipulations, plotting
of functions and data, implementation of algorithms, creation of user interfaces, and
interfacing with programs written in other languages, including C, C++, and Fortran.
1.1.1 Matlab Introduction
First, let us have a look at the Matlab desktop shown in Fig. 1.1. The desktop is the main Matlab application window and it contains five sub-windows: the Command Window, the Workspace Browser, the Current Directory Window, the Command History Window, and one or more Figure Windows, which are used to display graphics. Their functions are as follows:
The Command Window is the place where the user can type Matlab commands at the prompt (>>) and where the outputs of those commands are displayed.
The Workspace stores the set of variables that the user creates during a work session.
The Command History Window contains a record of the commands a user has entered in the Command Window. We can re-execute history commands by right-clicking on them in the Command History Window.
The Current Directory Window shows the contents of the current directory.
Figure 1.1: The Matlab desktop, with the Current Folder, Command Window, Workspace and Command History windows
Search tab, selecting Function Name as the search type, and then typing the function name in the search field. Another way to obtain help for a specific function is by typing doc followed by the function name at the command prompt.
M-functions have two types of information that can be displayed to the user. One is the H1 line, which contains the function name and a one-line description; the other is the Help text block. Typing help at the prompt followed by a function name displays both the H1 line and the help text for that function. If we type lookfor followed by a keyword at the prompt, we get all the H1 lines that contain that keyword. This is especially useful when looking for a particular topic without knowing the name of the function.
1.1.4 Starting with Image Processing Toolbox
The Image Processing Toolbox software is a collection of functions that extend the
capability of the MATLAB numeric computing environment for the purpose of image
processing.
Before we start, it is good practice to clear the variables and the Command Window before new commands are executed. This avoids undetected errors.
>>clear all
>>clc
Images are read into the Matlab workspace with the imread function:
f = imread(filename)
Here filename is a string containing the complete name of the image file (including any applicable extension). Table 1.1 lists some of the image/graphics formats supported by imread in Matlab.
Table 1.1: Some image/graphics formats supported by imread: TIFF, JPEG, GIF, BMP, PNG, XWD
Chapter 1 Introduction
Images are displayed with the imshow function:
imshow(I, p)
I is an image array, and p is the number of intensity levels used to display it. If p is omitted, it defaults to 256 levels.
>>I=imread('football.png');
>>imshow(I);
>>size(I)
ans =
   280   280     3
>>whos I
  Name      Size          Bytes     Class    Attributes
  I         280x280x3     235200    uint8
As shown in the Command Window, the size function gives the row and column dimensions as well as the depth of an image (1 for grayscale images, 3 for color images), and whos displays more information about the image array, such as the number of bytes and the data type.
1.2 DIPimage
1.2.1 DIPimage Introduction
DIPimage is an additional toolbox that is used under MATLAB for scientific image processing. At this time, we will review only the most relevant features. You will get to know this environment better throughout the course.
The toolbox can be downloaded from: http://www.diplib.org/download
And the DIPimage user guide from: http://www.diplib.org/documentation
By typing the command dipimage in the Command Window of Matlab, we enter the DIPimage environment.
In the top-left corner of Matlab the DIPimage GUI (Graphical User Interface) will appear, the same as the one shown in Fig. 1.3. The other 6 windows are used to display images.
The GUI contains a menu bar. Spend some time exploring the menus. When you choose
one of the options, the area beneath the menu bar changes into a dialog box that allows
you to enter the parameters for the function you have chosen. Most of these functions
correspond to image processing tasks, as we will see later.
This is to show you that exactly the same would have happened if you had typed that command directly in the command window.
Note that we omitted the extension from the filename; readim can still find the file without the file type being specified. We also didn't specify the second argument to the readim function. Finally, by not specifying a full path to the file, we asked the function to look for it either in the current directory or in the default image directory.
>>image_out = readim('body1');
Note that the contents of variable image_out changes, but the display is not updated.
To update the display, simply type:
>>image_out
You will have noticed that the image in the variable image_out is always displayed in the top-left window. This window is linked to that variable. Images in all the other variables are displayed in the sixth window. This can be changed; see the DIPimage manual for instructions.
A grey-value image is displayed by mapping each pixel's value in some way to one of the 256 grey-levels supported by the display (ranging from black (0) to white (255)). By default, pixel values are rounded, with negative values being mapped to 0 and values larger than 255 to 255. This behavior can be changed through the Mappings menu on each
figure window. Note that these mappings only change the way an image is displayed, not
the image data itself.
Another menu on the figure window, Actions, contains some tools for interactive
exploration of an image. The two tools we will be using are Pixel testing and Zoom.
Pixel testing allows you to click on a pixel in the image (hold down the button) to
examine the pixel value and coordinates. Note that you can drag the mouse over the
image to examine other pixels as well. The right mouse button does the same thing as the
left, but changes the origin of the coordinate system and also shows the distance to the
selected origin (right-click and drag to see this).
The Zoom tool is to enlarge a portion of the image to see it more clearly.
Assignment 1.1:
Create an M-file with a function called ImBasic, which performs read, display, save and write operations. Read the image boy.bmp into the Matlab environment, display it on the screen, check its additional parameters and finally save it in your assignment directory under the name boy_assignment with the jpg extension.
F(i, j) = [ f(0,0)      f(0,1)      ...   f(0,N-1)
            f(1,0)      f(1,1)      ...   f(1,N-1)
            ...         ...               ...
            f(M-1,0)    f(M-1,1)    ...   f(M-1,N-1) ]
Each element of this array is called an image element, picture element or pixel. A digital image can be represented as a Matlab matrix:
f(i, j) = [ f(1,1)   f(1,2)   ...   f(1,N)
            f(2,1)   f(2,2)   ...   f(2,N)
            ...      ...            ...
            f(M,1)   f(M,2)   ...   f(M,N) ]
Note that Matlab differs from C and C++ in that its first index is (1, 1) instead of (0, 0). Keep this in mind when programming.
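As a small illustration of the indexing difference (the image name is simply the one used in Example 2.1 below):
>>f = imread('football.png');
>>f(1, 1, :)        % the top-left pixel; in C/C++ this element would have index (0, 0)
>>f(end, end, :)    % the bottom-right pixel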
Example 2.1 Downsample an image by factor 2 then upsample by factor 2 (Matlab)
>>img=imread('football.png');
>>figure
>>imshow(uint8(img));
>>title('original image');
>>subimg = imresize(img, 0.5);
>>figure
>>imshow(uint8(subimg))
>>title('first subsampling');
>>subsubimg = imresize(subimg, 0.5);
>>figure
>>imshow(uint8(subsubimg))
>>title('second sampling');
>>upimg=imresize(subsubimg,2);
>>figure
>>imshow(uint8(upimg));
>>title('first upsampling');
>>upupimg=imresize(upimg,2);
>>figure
>>imshow(uint8(upupimg))
>>title('second upsampling');
The resulting images are shown in Figure 2.3 and Figure 2.4. We can see that if we first downsample an image and then upsample the same image to its original size, the image will lose some information. This shows the importance of spatial resolution.
The results of reducing the grey level can be seen in Figure 2.5.
Figure 2.5: Reducing amplitude resolution (256, 32 and 16 grey levels)
Assignment 2.1:
Write your own m-file with a downsample function, without using any functions provided by Matlab. Read the image kickoff.jpg into the Matlab environment, use for or while loops to downsample this image and then use imshow to display the final result.
Note that the rounding behavior of im2uint8 is different from the data class conversion function uint8, which simply discards fractional parts. im2uint8 sets all values in the input array that are less than 0 to 0 and all values greater than 1 to 1, and multiplies all the other values by 255. Rounding the results of the multiplication to the nearest integer completes the conversion.
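A small sketch of this behavior (the input values are arbitrary):
>>v = [-0.5 0.25 0.75 1.5];
>>im2uint8(v)       % clips to [0, 1], multiplies by 255 and rounds: 0 64 191 255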
Image Operation
Figure: video image 1, video image 2, and their subtraction
Assignment 2.3:
Use basic array operations to add image noise to an image region. Read the image
drop.tif, and use array indexing, imadd and randn function to add noise only to the
drop region.
g(x, y) = T[f(x, y)]
where f(x, y) is the input image, g(x, y) is the output image, and T is an operator on f, defined over some neighborhood of the point (x, y), as shown in Figure 3.1.
f = imread('mary.bmp');
figure
imshow(f)
g1 = imadjust(f,[],[],0.5);
figure
imshow(g1)
g2 = imadjust(f,[],[],2);
figure
imshow(g2)
g3 = imadjust(f,[0 1],[1 0]);
figure
imshow(g3)
Assignment 4.1:
Use logarithmic transformation to enhance the dark image qdna.tif. The
transformation function is:
g = c log(1 + f )
Assignment 4.2:
Use the power-law transformation to enhance the light image sca.tif. The transformation function is:
g = c f^γ
Resulting histograms of images from Figure 3.2 are shown in Figure 3.3
In Matlab, histogram equalization is implemented by the histeq function:
g = histeq(f, nlev)
Parameter f is the input image and nlev is the number of intensity levels specified for the output image. If nlev is equal to L (the total number of possible levels in the input image), histeq directly implements the transformation function. If nlev is less than L, histeq attempts to distribute the levels so that they approximate a flat histogram.
Histogram equalization produces a transformation function that is adaptive, in the sense that it is based on the histogram of a given image. However, once the transformation function of an image has been computed, it does not change unless the histogram of the image changes. Histogram equalization, which spreads the levels of the input image over a wider range of the intensity scale, cannot always enhance an image. In particular, it is useful in some applications to be able to specify the shape of the histogram that we wish the processed image to have. The method used to generate an image that has a specified histogram is called histogram matching or histogram specification. In Matlab, we also use histeq to implement histogram matching, which has the syntax:
g = histeq(f, hspec)
Parameter f is the input image, hspec is the specified histogram, and g is the output image
whose histogram approximates the specified histogram.
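As a minimal sketch of both uses of histeq (the image name is borrowed from the examples in Chapter 4 and assumed to load as a grayscale array; the target histogram is an arbitrary bell shape):
>>f = imread('kickoff_grayscale.jpg');
>>g = histeq(f, 256);                              % histogram equalization
>>figure, imhist(f), figure, imhist(g)             % compare histograms before and after
>>hspec = exp(-((0:255) - 128).^2 / (2*40^2));     % arbitrary target histogram
>>g2 = histeq(f, hspec);                           % histogram matching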
3.3 Threshold
Thresholding is widely used in image segmentation to segment images based on intensity levels.
Let us now have a look at the histogram in Figure 3.4, which corresponds to the image named coins, composed of light objects on a dark background, in such a way that object and background pixels have intensity levels grouped into two dominant modes.
Figure 3.4: Histogram of the image coins
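A minimal thresholding sketch in Matlab (the file name coins.png and the use of graythresh, which picks a global threshold automatically, are assumptions; any bimodal grayscale image works the same way):
>>f = imread('coins.png');            % light objects on a dark background
>>level = graythresh(f);              % automatically chosen global threshold
>>bw = im2bw(f, level);               % binary image: objects = 1, background = 0
>>figure, imshow(bw)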
The table below lists common 3x3 spatial filter kernels and their effects (the example kernels use symbolic entries a, b and c):

Filter type       Filter        Description
Low-pass          Averaging     Smoothing, noise reduction and blurring filter (focal mean).
Low-pass          Gaussian      Smoothing, noise reduction and blurring filter (focal weighted mean).
High-pass         Sharpening    Mean-effect removal / sharpening filter (focal sum). Provides limited edge detection. Typically the entries sum to 1 but may be greater; 3x3 Laplacian kernels typically add to 1, while larger Laplacian kernels (e.g. 7x7) may be more complex and sum to more than 1.
Edge detection    Gradient      Applied singly or as a two-pass process. These kernels highlight vertical and horizontal edges. When used in combination they are known as Gradient or Order 1 derivative filters. Typically a = 1 and b = 1 or 2, and the entries sum to 0.
Edge detection    Embossing     Edge-detecting filters that enhance edges in a selected compass direction to provide an embossed effect (for example a north-east kernel).
Edge detection    Directional   Simple computation of the gradient in one of 8 compass directions (e.g. the east and north directional derivatives).
MATLAB:
Smoothing filters are used to smooth (blur) edges and noise in images. In Matlab, it is best to use imfilter to generate smoothed images. First the smoothing mask is defined and afterwards it is applied to the image to smooth it.
Example 4.1 Apply average filter to an image (Matlab)
>>w =[1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9];
>>car =imread('car.bmp');
>>figure
>>imshow(car)
>>g =imfilter(car, w);
>>figure
>>imshow(g)
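Instead of writing the mask out by hand, fspecial can generate common kernels; a small sketch for the Gaussian filter mentioned in the table above (the kernel size and sigma are arbitrary choices, and the car image from Example 4.1 is reused):
>>w = fspecial('gaussian', [5 5], 1.0);    % 5x5 Gaussian kernel with sigma = 1
>>g = imfilter(car, w, 'replicate');
>>figure, imshow(g)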
Sharpening filters are used to enhance the edges of objects and to adjust the contrast and the shade characteristics. In combination with thresholding they can be used as edge detectors. Sharpening or high-pass filters let high frequencies pass and reduce the lower frequencies; they are extremely sensitive to shot noise. To construct a high-pass filter, the kernel coefficients should be set positive near the center of the kernel and negative in the outer periphery.
MATLAB:
We will now use another Matlab function, fspecial, to generate a sharpening filter. We take the Laplace filter as an example, which can enhance discontinuities. First we apply the negative Laplace operator to the image, and afterwards we subtract the negative Laplace image from the original to get the sharpened image. The result is shown in Figure 4.3.
Example 4.3 Sharpening image (Matlab)
>>f = imread('kickoff_grayscale.jpg');
>>w = fspecial('laplacian',0);
>>g =imfilter(f, w, 'replicate');
>>g = f - g;
>>imshow(g)
Figure 4.3: Sharpening an image: a) original image, b) Laplace image, c) sharpened image
DIPimage
The menu item Differential Filters in filters menu contains a general derivative filter, a
complete set of first and second order derivatives, and some combinations like gradient
magnitude and Laplace filters. Let us try to use Laplace filter in DIPimage.
Example 4.4 Apply Laplace filter to an image (DIPimage)
>>image_out = readim('schema.tif','')
>>g = laplace(image_out, 1)
Edge detection significantly reduces the amount of useless information, while preserving the important structural properties of an image. The majority of methods are grouped into two categories, gradient and Laplacian. The gradient method detects the edges by looking for the maximum and minimum in the first derivative of the image. The Laplacian method searches for zero crossings in the second derivative of the image to find edges. More will be explained in Chapter 10, in the edge segmentation section.
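Although edge detection is treated in detail later, Matlab's edge function already implements both families mentioned above; a minimal sketch (assuming kickoff_grayscale.jpg loads as a 2-D grayscale array, as in the other examples):
>>f = imread('kickoff_grayscale.jpg');
>>g1 = edge(f, 'sobel');     % gradient based: extrema of the first derivative
>>g2 = edge(f, 'log');       % Laplacian of Gaussian: zero crossings of the second derivative
>>figure, imshow(g1), figure, imshow(g2)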
4.4 Nonlinear filter
A nonlinear filter is a filter whose output is a nonlinear function of the input. By definition, any filter that is not a linear filter is a nonlinear filter. One practical reason to use nonlinear filters instead of linear filters is that linear filters may be too sensitive to a small fraction of anomalously large observations at the input. One of the most commonly used nonlinear filters is the median filter, which is very effective at reducing salt-and-pepper noise in an image.
MATLAB:
There are two different ways to apply a median filter: one uses the ordfilt2 function, the other uses the medfilt2 function directly. The result is shown in Figure 4.4.
Example 4.5 Apply median filter to an image (Matlab)
>>f = imread('kickoff_grayscale.jpg');
>>fnoise = imnoise(f, 'salt & pepper', 0.2);
>>figure
>>imshow(fnoise)
>>g = medfilt2(fnoise);
>>figure
>>imshow(g);
1. Load the image shading which contains some text on a shaded background. To
remove this shading, we need to estimate the background. Once this is done, we can
correct the original image. This is a common procedure, often required to correct for
uneven illumination or dirt on a lens. There are several background shading estimation
methods:
The most used one is the low-pass filter (gaussf). Try finding a relevant parameter
for this filter to obtain an estimate of the background, and then correct the original
image.
Another method uses maximum and minimum filtering. Since the text is black, a
maximum filter with a suitable size will remove all text from the image, bringing
those pixels to the level of the background. The problem is that each background
pixel now also was assigned the value of the maximum in its neighborhood. To
correct this, we need to apply a minimum filter with the same size parameter. This
will bring each background pixel to its former value, but the foreground pixels
won't come back!
Use this estimate of the background to correct the original image. Use only Matlab.
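One possible Matlab-only sketch of the maximum/minimum approach described above (the file name and extension, the structuring-element size, and the use of imdilate/imerode with a flat structuring element as grayscale maximum/minimum filters are all assumptions):
>>f  = im2double(imread('shading.tif'));    % file name and extension are assumptions
>>se = strel('square', 15);                 % must be larger than the text strokes
>>bg = imerode(imdilate(f, se), se);        % maximum filter followed by minimum filter
>>g  = f - bg;                              % corrected image
>>figure, imshow(g, [])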
f(x, y) = (1/MN) Σ_{u=0}^{M-1} Σ_{v=0}^{N-1} F(u, v) e^{j2π(ux/M + vy/N)}
Note that even if f(x, y) is real, its transform is in general complex. The principal method of visually analyzing a transform is to compute its spectrum. The Fourier spectrum is defined as:
|F(u, v)| = [R^2(u, v) + I^2(u, v)]^(1/2)
and the phase angle as
φ(u, v) = tan^-1[ I(u, v) / R(u, v) ]
where R(u, v) and I(u, v) are the real and imaginary parts of F(u, v).
MATLAB
The FFT of an M x N image array f is obtained in the toolbox with the function fft2, which has the syntax:
F = fft2(f)
This function returns a Fourier transform that is also of size M x N. Here we give an example of computing the 2-D DFT using Matlab.
Example 5.1 Compute 2-D DFT (Matlab)
>>img=imread('girl.bmp');
>>subplot(2,2,1);
>>imshow(img);
>>title('original image');
>>F=fft2(img);
>>S=abs(F);
>>subplot(2,2,2);
>>imshow(S,[ ]);
>>title('Fourier spectrum');
>>Fc=fftshift(F);
>>subplot(2,2,3);
>>imshow(abs(Fc),[ ]);
>>title('Centered spectrum');
>>S2=log(1+abs(Fc));
>>subplot(2,2,4);
>>imshow(S2,[ ]);
>>title('log transformation enhanced spectrum');
DIPimage
In DIPimage, choose the Transforms menu and the menu item Fourier transform (ft).
Example 5.2 Compute 2-D DFT (DIPimage)
>>image_out = readim('acros.tif','')
>>result = ft(image_out)
The foundation for linear filtering in both the spatial and frequency domains is the convolution theorem, which may be written as:
f(x, y) * h(x, y) ⇔ F(u, v) H(u, v)
and, conversely,
f(x, y) h(x, y) ⇔ F(u, v) * H(u, v)
Here the symbol * indicates convolution of the two functions. Thus we know, from the two equations, that filtering in the spatial domain consists of convolving an image f(x, y) with a filter mask h(x, y). Now, we give the basic steps in DFT filtering:
1. Obtain the padding parameters using the function paddedsize: PQ = paddedsize(size(f));
2. Obtain the Fourier transform with padding: F = fft2(f, PQ(1), PQ(2));
3. Generate a filter function, H, of size PQ(1) x PQ(2);
4. Multiply the transform by the filter: G = H .* F;
5. Obtain the real part of the inverse FFT of G: g = real(ifft2(G));
6. Crop the top, left rectangle to the original size: g = g(1:size(f,1), 1:size(f,2)).
Why padding?
Images and their transforms are automatically considered periodic if we select to work
with DFTs to implement filtering. It is not difficult to visualize that convolving periodic
functions can cause interference between adjacent periods if the periods are close with
respect to the duration of the nonzero parts of the functions. This interference, called
wraparound error, can be avoided by padding the functions with zeros.
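To make these steps concrete, here is a minimal sketch of lowpass filtering in the frequency domain (the image name, the Gaussian transfer function with cutoff D0 = 30, and the use of simple 2x padding instead of the paddedsize function are assumptions):
>>f = im2double(imread('kickoff_grayscale.jpg'));   % assumed to load as a 2-D grayscale array
>>PQ = 2*size(f);                                   % pad to twice the image size
>>F = fft2(f, PQ(1), PQ(2));                        % step 2: padded transform
>>[U, V] = meshgrid(0:PQ(2)-1, 0:PQ(1)-1);
>>D = hypot(U - PQ(2)/2, V - PQ(1)/2);              % distance from the centre of the padded array
>>H = ifftshift(exp(-(D.^2)/(2*30^2)));             % step 3: Gaussian lowpass, un-centred to match F
>>G = H .* F;                                       % step 4: multiply transform by filter
>>g = real(ifft2(G));                               % step 5: real part of the inverse FFT
>>g = g(1:size(f,1), 1:size(f,2));                  % step 6: crop to the original size
>>figure, imshow(g)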
5.3.2 Lowpass Frequency Domain Filters
5.3.3 Highpass Frequency Domain Filters
Given the transfer function H_lp(u, v) of a lowpass filter, we obtain the transfer function of the corresponding highpass filter by using the simple relation:
H_hp(u, v) = 1 - H_lp(u, v)
Assignment 5.1:
Design a Butterworth highpass filter and apply it to the image kickoff_grayscale.jpg. Look up the equation of the Butterworth lowpass filter, then use it to derive the highpass filter and apply it to this image. (Use Matlab.)
Figure 6.1: Color image, grayscale image and binary image
Table 6.1: Set operations and their MATLAB expressions

Set operation            MATLAB expression
A                        A
A^c (complement)         ~A
A ∪ B (union)            A | B
A ∩ B (intersection)     A & B
A - B (difference)       A & ~B
All morphological operations can be described using this notation. Main morphological
operations that we will cover in this practicum are dilation, erosion, opening and closing.
6.2.1 Structuring element
In Matlab, structuring elements are created with the strel function:
SE = strel(shape, parameters)
This function creates a structuring element, SE, where shape is a string specifying the desired shape. Depending on shape, strel can take additional parameters of the type specified by shape. The table below lists all the supported shapes.
Table 6.2: Flat structuring element shapes supported by strel: 'arbitrary', 'diamond', 'disk', 'line', 'octagon', 'pair', 'periodicline', 'rectangle', 'square'
Examples of structuring elements (the origin is at the centre element):

SE = strel('rectangle', [3 3])                        1 1 1
                                                      1 1 1
                                                      1 1 1

SE = strel('diamond', 1)                              0 1 0
                                                      1 1 1
                                                      0 1 0

SE = strel('arbitrary', [1 0 0; 1 0 0; 1 0 1])        1 0 0
                                                      1 0 0
                                                      1 0 1
Figure 1 shows a number of different structuring elements of various sizes. In each case the origin is marked in red. The origin does not have to be in the center of the structuring element, but often it is. As suggested by the figure, structuring elements that fit into a 3x3 grid with the origin at the center are the most commonly seen type.
DipImage:
In DipImage structuring element is defined using parameters filterShape ('rectangular',
'elliptic', 'diamond', 'parabolic') and filterSize within morphological functions.
6.2.2 Erosion
The basic effect of erosion on a binary image is to gradually erode away the boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus areas of foreground pixels shrink in size while holes within those regions become larger. The erosion operator takes two pieces of data as inputs. The first is the image which is to be eroded; the second is a structuring element. It is the structuring element that determines the precise effect of the erosion on the input image.
The following functions are used:
MATLAB: image_out = imerode (binary_image, structuring_element)
Dip_image: image_out = erosion (image_in, filterSize, filterShape)
6.2.3 Dilation
Dilation is the dual operation of erosion. The basic effect of dilation on a binary image is to gradually enlarge the boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus areas of foreground pixels grow in size while holes within those regions become smaller. The dilation operator takes two pieces of data as inputs. The first is the image which is to be dilated; the second is a structuring element. It is the structuring element that determines the precise effect of the dilation on the input image.
The following functions are used:
MATLAB: image_out = imdilate (binary_image, structuring_element)
Dip_image: image_out = dilation (image_in, filterSize, filterShape)
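A minimal sketch combining the calls above (the image name is borrowed from Assignment 6.1 and assumed to be an RGB image; the structuring-element shape and size are arbitrary choices):
>>f  = rgb2gray(imread('sky.jpg'));     % convert to grey values (skip if already grayscale)
>>bw = im2bw(f, graythresh(f));         % binary image
>>se = strel('square', 5);
>>figure, imshow(imerode(bw, se))       % foreground shrinks, holes grow
>>figure, imshow(imdilate(bw, se))      % foreground grows, holes shrink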
Assignment 6.1
Load the image sky.jpg and create a binary image from it. Choose a structuring element and perform erosion. Try several different structuring elements, vary their size, and generate images similar to Figure 6.4. What can you observe? Now perform dilation. Can you observe the difference in the images? The results are shown in Figure 6.3.
Figure 6.3: Original image, binary image, erosion and dilation
Figure 6.4: Effects of the size of the structuring element on erosion (binary image; erosion with SE of size 3, 5 and 7)
Assignment 6.2
Load the image sky and perform the opening operation. Compare the result with the result from Figure 6.3. Is there a difference with the result from Figure 6.3?
6.2.5 Finding boundary pixels
The boundary pixels can be found by first dilating the object and subtracting the original from it, or by first eroding the image and subtracting the result from the original:
1. Boundary_image = imdilate(Original_image, SE) - Original_image
2. Boundary_image = Original_image - imerode(Original_image, SE)
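A small sketch of both variants (it reuses the bw and se variables from the erosion/dilation sketch earlier, which are assumptions; the subtraction is written as a logical set difference):
>>boundary1 = imdilate(bw, se) & ~bw;   % dilation minus the original
>>boundary2 = bw & ~imerode(bw, se);    % original minus its erosion
>>figure, imshow(boundary2)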
DipImage:
For reconstruction:
image_out=bpropagation(image_seed,image_mask,iterations,connectivity,
edgeCondition)
Iterations: the number of steps taken defines the size of the structuring element (0 is the
same as Inf, meaning repeat until no further changes occur).
EdgeCondition: Make sure to set the edge condition to 0. This is the value of the pixels
just outside the boundary of the image. If you set it to 1, all objects touching the border
will also be reconstructed.
For clearing border objects:
image_out = brmedgeobjs(image_in,connectivity)
Assignment 6.4:
Load the image text.jpg and, using different morphological operations, obtain the results shown in Figure 6.7. Explain the difference between the different images.
Now devise some operations using not (or ~), dilation, erosion, imreconstruct (or
bpropagation) and/or imclearborder (brmedgeobjs) to detect either the good or bad bulbs
(either make a program that rejects bad bulbs or accepts good bulbs).
The colored images were generated with the command overlay. It overlays a grey-value or color image with a binary image. A third (optional) parameter determines the color for the binary objects. It is possible to apply this function several times, each with a different binary image, which can thus be used to mark the image using several colors.
Figure 6.8: Selecting different objects (image lamps; exercise goal; alternate goal)
Assignment 6.6: Distinguishing nuts from bolts (only DipImage)
Now load the image nuts_bolts and threshold it to get a binary image. Note that the threshold operation chooses the background as the object (because it is lighter). You will need to invert the image before or after the thresholding. Use the bskeleton function to create a skeleton of the objects. What is the influence of the Edge Condition? What does the End-Pixel Condition control?
With looseendsaway we can transform the nuts seen from the top (with the hole in them)
into a form that is distinguishable from the other objects in the image. Now use the
function getsinglepixel to extract the objects without a hole in them. This new image can
be used as a seed image in bpropagation. The mask image is the original binary image.
The objects with holes are retrieved with b & ~c (b and not c) if the output image of bpropagation was c. Try extracting the last nut using the bskeleton and getbranchpixel functions. Now solve the assignment in Matlab without using the DipImage toolbox. Find the equivalent functions in Matlab to the ones used in DipImage.
Figure 6.9: Selecting different objects (image nuts_bolts; exercise goal)
All morphological operations that have already been explained can also be applied to grayscale images. Morphological filtering is very useful and is often applied to grayscale images, since it can severely reduce noise while preserving the edges in the image, in contrast to linear filters. It is able to distinguish structures based on size, shape or contrast
(whether the object is lighter or darker than the background). They can be employed to
remove some structures, leaving the rest of the image unchanged. In this sense,
morphology is one step ahead of other image processing tools towards image
interpretation.
6.4.1 Morphological smoothing
Because opening suppresses bright details smaller than the structuring element and
closing suppresses dark details smaller than the structuring element they are often used in
combination for image smoothing and noise removal. Note that the size of the structuring
element is an important parameter. A morphological smoothing with a small structuring element
is an ideal tool to reduce noise in an image.
MATLAB:
Image_close_open = imopen(imclose(Image, SE), SE);    % close-open filtering
Image_open_close = imclose(imopen(Image, SE), SE);    % open-close filtering
DipImage:
Image_close_open = bopening (bclosing (image_in, filterSize, filterShape), filterSize, filterShape)
Image_open_close = bclosing (bopening (image_in, filterSize, filterShape), filterSize, filterShape)
Figure 6.10: Morphological smoothing (original image; open-close filtering; close-open filtering)
Assignment 6.7
Load the image erika and construct a smoothing filter that removes most of the hair, but leaves the girl's face recognizably human. In the process use both the open-close and the close-open filtering and explain the difference between them.
As we already saw in Chapter 4, for sharpening we should use edge detectors. These are the morphological gradient magnitudes:
Edge1 = dilation(A) - A
Edge2 = A - erosion(A)
In a similar way, we can construct a morphological second derivative (the morphological Laplace):
Laplace_morph = dilation(A) + erosion(A) - 2A
Figure 6.10: Morphological sharpening (original image; linear Laplace; morphological Laplace; image sharpened with the morphological Laplace)
Openings can be used to compensate for non-uniform background illumination, which occurs very often in real-world scenes. Figure 6.11 (a) shows an image f in which the background is darker towards the bottom than in the upper portion of the image. The uneven illumination makes image thresholding difficult. Figure 6.11 (b) is a thresholded version in which the grains at the top of the image are well separated from the background, but the grains in the middle are improperly extracted from the background. Opening the image can produce an estimate of the background across the image, if the structuring element is larger than the rice grains. The estimated background is shown in Figure 6.11 (c).
By subtracting the background estimate from the original image, we can produce an image with a reasonably even background. Figure 6.11 (d) shows the result, while Figure 6.11 (e) shows the new thresholded image. The improvement is apparent.
Subtracting an opened image from the original is called the top-hat transformation.
>> fo = imopen(f, se);      % background estimated by opening
>> f2 = imtophat(f, se);    % image with even background (equivalent to f2 = imsubtract(f, fo))
A related function, imbothat(f, se), performs the bottom-hat transformation, defined as the closing of the image minus the image itself. These two functions can be used for contrast enhancement using commands such as:
>> se = strel('disk', 3);
>> g = imsubtract(imadd(f, imtophat(f, se)), imbothat(f, se));
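A minimal sketch of the whole background-correction procedure described above (the image name rice.png and the structuring-element size are assumptions):
>>f  = imread('rice.png');
>>se = strel('disk', 10);               % larger than the rice grains
>>fo = imopen(f, se);                   % estimate of the background
>>f2 = imsubtract(f, fo);               % image with even background (top-hat)
>>bw = im2bw(f2, graythresh(f2));       % thresholding now separates the grains everywhere
>>figure, imshow(bw)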
Figure 6.11: Compensating for non-uniform illumination: a) original image, b) thresholded image, c) estimated background, d) image with the background subtracted, e) thresholded result
Chapter 7: Measurements
In this chapter we describe various ways to measure objects in the image. In previous
chapters we saw different ways for image processing but in order to progress to image
analysis we first need to learn how to select objects in images and measure their basic
properties.
7.1 Selecting objects
In Matlab, connected objects are selected by labeling a binary image with the bwlabel function:
[L, num] = bwlabel(binary_image, n)
It returns a matrix L, of the same size as binary_image, containing labels for the connected objects in binary_image. The variable n can have a value of either 4 or 8, where 4 specifies 4-connected objects and 8 specifies 8-connected objects. If the argument is omitted, it defaults to 8. The number of connected objects found in binary_image is returned in num.
You can use the MATLAB find function in conjunction with bwlabel to return vectors of indices for the pixels that make up a specific object. For example, to return the coordinates of the pixels in object 2, enter the following:
>>[r, c] = find(bwlabel(binary_image)==2)
In order to show the labeled components in different colors, as in Figure 7.1, we can use the following function:
>>RGB = label2rgb(L);    % convert the label matrix into an RGB image
Labeled objects and the selected one are shown in Figure 7.1.
DipImage:
The equivalent function is label, where connectivity is the same as in the bwlabel function, and minSize and maxSize represent the minimum and maximum size of the objects to be labeled:
image_out = label(image_in, connectivity, minSize, maxSize)
Figure 7.1: Selecting and measuring objects: a) labeled components, b) selected object, c) perimeter of the selected object, d) centroids of all components
Assignment 7.1:
Load the image cerment, label all the objects and show the different components in different colors. Now extract only the largest object. The area of an object can easily be obtained by summing all the pixels belonging to that object.
7.2 Measuring in binary images
For basic measurements in Matlab we can use the function regionprops. It measures a set of properties for each connected component (object) in a binary image. All results are obtained in image pixels and represent only relative measurements. Linking these measurements to real-world values is explained in Section 7.5.
STATS = regionprops(binary_image, properties)
PARAMETERS:
object_in: binary or labelled image holding the objects.
gray_in: (original) gray value image of object_in. It is needed for several types of
measurements. Otherwise you can use [].
measurementIDs: measurements to be performed, either a single string or a cell array of strings (e.g. {'Size', 'Perimeter', 'Mean', 'StdDev'}). See measurehelp.
objectIDs: labels of the objects to be measured. Use [] to measure all objects.
minSize, maxSize: minimum and maximum size of objects to be measured. By default use 0.
To extract all measurements for one object, index the returned measurement object using the label ID of the object you are interested in. The next example illustrates the four types of indexing:
>>data(2)         % properties of the object with label ID 2
>>data.size       % size of all objects
>>data(2).size    % size of the object with label ID 2
>>data.size(2)    % size of the 2nd element in the list (not necessarily label 2)
The most important characteristic of an object is its area, which is simply the number of pixels it contains. It is also usually the first measurement of interest. Whatever the "real" (or physical) object is, its area computed this way will be as close as we like to its "true" area as the resolution increases.
Solution 1:
In Matlab, we can use the command bwarea to estimate the area of the objects in a binary image. The result is an estimate of the total number of pixels with value 1 in the image:
Area = bwarea(binary_image);
If we want to estimate the area of the object in Figure 7.1 b), we can calculate it using:
>>[L, num] = bwlabel(binary_image, n)
>>[r, c] = find(bwlabel(binary_image)==2);
>>mask = zeros(size(L));
>>mask(sub2ind(size(L), r, c)) = 1;               % set only the pixels belonging to object 2
>>Area_selected_object = bwarea(L.*mask)          % 189.38 pixels
>>Area_all_labeled_components = bwarea(L)         % 17788 pixels
Solution 2:
Another way to do it is by using the regionprops function.
>>Rice_Area = regionprops(binary_image, 'Area')
Rice_Area = 99x1 struct array with fields: Area
Solution 3:
In DipImage, just specify Area as the argument in the measure function.
7.2.2 Perimeter
The perimeter of an object is the number of edges around the object. The perimeter is
easy to compute but the result depends on the orientation of the object with respect to the
grid. Therefore it is not a good way to measure objects in digital images.
Solution 1:
In Matlab there is a function bwperim that returns a binary image containing only the perimeter pixels of the objects in the input image:
Perimeter = bwperim(binary_image);
Let's calculate the perimeter of the selected object in Figure 7.1(b). The length of the perimeter can be calculated using the previously mentioned function bwarea. The result is shown in Figure 7.1(c).
>>Perimeter_selected_object = bwperim(L.*mask);
>>Perimeter_length = bwarea(Perimeter_selected_object)    % 64.12 pixels
Solution 3:
In DipImage, just specify Perimeter as the argument in the measure function.
Assignment 7.2
Load the image rice and create a binary image from it. Calculate the perimeter and the length of the perimeter for all objects in the image as well as for object 5. Display the result and verify that it is the same as with Solution 1. Use the regionprops function.
7.2.3 Centroid (Centre of mass)
An object in a digital image is a collection of pixels. The coordinates of the centre of mass are then the average of the coordinates of all the pixels of the object.
Solution 1:
In Matlab, it can be computed using the regionprops function:
>>Rice_Centroid = regionprops(binary_image, 'centroid');
Displaying the centroids can be done in the following way; the result is shown in Figure 7.1 d). The centroid of the selected object, for example, is (131.1497, 61.1818).
>>centroids = cat(1, Rice_Centroid.Centroid);
>>imshow(binary_image)
>>hold on
>>plot(centroids(:,1), centroids(:,2), 'b*')
>>hold off
Solution 2:
In DipImage, just specify Gravity as the argument in the measure function.
7.2.4 Euler number
The Euler number is a measure of the topology of an image. It is defined as the total
number of objects in the image minus the number of holes in those objects.
In Matlab, it can be computed using bweuler function.
>>Rice_Euler = bweuler(binary_image,8)
>>Rice_Euler = 98
Since there are no holes in the image rice, we can conclude that there are 98 objects in the image.
Assignment 7.3
Load the image rice and create binary image from it. Calculate the total number of
objects in image using regionprops function.
Assignment 7.4
Load the image nuts_bolts and separate the nuts from the bolts. Display their perimeters in two different images. How many components are there in total in each image? Select one bolt and estimate its centre of mass. In your opinion, are the values realistic?
7.3 Errors Introduced by Binarization
Note that the area, perimeter, etc. you measured earlier are not the exact measurements
you could have done on the objects in the real world. Because of the binarization, the
object boundary was discretized, introducing an uncertainty in its location. The true
boundary is somewhere between the last on-pixel and the first off-pixel. The pixel pitch
(distance between pixels) determines the accuracy of any measurement.
Assignment 7.5: Thought experiment
Imagine you drive a car with an odometer that indicates the distance travelled in 100
meter units. You plan to use this car to measure the length of a bridge. When you cross
the bridge, sometimes the odometer advances one unit, sometimes two. Can you use this
set-up to measure the length of the bridge accurately? How can you determine the
accuracy? What special measures do you need to take to make sure your measurement is
not biased?
Assignment 7.6: Errors in area measurement
The object area is computed by counting the number of pixels that comprise the object. The error made depends on the length of the contour. Quantify this error for round objects of
various sizes. What happens with the accuracy as a function of the radius? Why?
Hint: make sure you generate the objects with a random offset to avoid a biased result.
To do so, use the function rand:
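A minimal sketch of such a test (the image size, the radius and the use of a synthetic disk are arbitrary choices):
>>r = 20;
>>[x, y] = meshgrid(1:128, 1:128);
>>cx = 64 + rand;  cy = 64 + rand;              % random sub-pixel offset to avoid a biased result
>>disk = (x - cx).^2 + (y - cy).^2 <= r^2;      % binary disk
>>measured_area = sum(disk(:))                  % compare with the true area pi*r^2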
In the previous sections we threw out a lot of information by binarizing the images before
measuring the objects. The original grey-value images contain a lot more information
(assuming correct sampling) than the binary images derived from them (which are not
correctly sampled). As long as we only apply operations that are sampling-error free, we
can perform measurements on grey-value images as if we were applying them directly to
analog images. In this case, measurement error is only limited by the resolution of the
image, noise, and imperfections caused by the acquisition system.
7.4.1 Thresholding
However, there are some limitations. If the continuous image is binary, the pixel sum directly yields its size. To all other images we need to apply a form of nonlinear scaling to accomplish this, without violating the sampling constraint. Reshaping all edges into a normalized error function is an example of soft clipping that fulfils the requirements. DIPimage has the function erfclip, which performs soft clipping using the grey-value error function.
Using this function we will make edges more notable and suppress intensity fluctuations
in the foreground and background.
image_out = erfclip(image_in,threshold,range)
The basic measurements in gray-value images can be calculated in the following way. If we let 0 ≤ B(x, y) ≤ 1 be a continuous function of image position (x, y), we can define the Area, Perimeter and Euler number as:

Area:          A = ∫∫ B(x, y) dx dy
Perimeter:     P = ∫∫ sqrt(Bx^2 + By^2) dx dy
Euler number:  E = (1/2π) ∫∫ κ(x, y) sqrt(Bx^2 + By^2) dx dy

where κ is the curvature of the isophotes. Here (Bx, By) is the brightness gradient, and Bxx, Bxy and Byy are the second partial derivatives of the brightness, which enter through κ (check Section 10.1.3 for an explanation of the gradient and second derivative).
Correspondingly, in the continuous gray-level image domain:
(1) Area can be computed using the image values directly;
(2) Perimeter requires first partial derivatives (gradient);
(3) The Euler number requires second order partial derivatives.
In order to compute the different properties, let's first generate a gray-value image of a simple disk with height 255 and radius 64. It is shown in Figure 7.2 a).
>> disk = testobject(a, 'ellipsoid', 255, 64);
To compute the perimeter displayed in Figure 7.2 b), we need the first partial derivatives of the image:
>> Perimeter = sum(gradmag(disk, 2))/255     % approximately 402
Assignment 7.7
Create a binary image of the disk image, and calculate its area and perimeter. Compare the results with the ones obtained by measuring in the gray-level image. Comment on the error introduced by binarization.
Assignment 7.8
Compute the Euler number of the gray level image of simple disk. Verify that there is
only one object on the image.
Figure 7.2: a) gray-value image of a disk, b) its perimeter
All the measurements that we have performed so far had their resulting values in image pixels. If we need to obtain these values in real-world coordinates (meters), there are several parameters that we need to know in advance about the camera and sensor with which the image was captured.
7.5.1 Pinhole camera model
Images are captured using cameras. Figure 7.3 shows a simple camera model.
Figure 7.3: Pinhole camera model: a real-world point P = (X, Y, Z) is projected through the centre of projection O onto the image plane (at distance f) at P' = (X', Y', Z')
Note that O represents the centre of projection and f the focal length of the camera. The image point P' = (X', Y', Z') lies on the ray OP, so

OP' = λ · OP,   that is   X' = λX,  Y' = λY,  Z' = λZ,   with   λ = X'/X = Y'/Y = Z'/Z

The image plane lies at Z' = f, so λ = f / Z and

X' = f X / Z,    Y' = f Y / Z

With image coordinates x = X' and y = Y', the perspective projection is

(X, Y, Z) → (x, y, 1) = (f X / Z,  f Y / Z,  1)
In reality, one must use lenses to focus an image onto the camera's focal plane. The
limitation with lenses is that they can only bring into focus those objects that lie on one
particular plane that is parallel to the image plane. Assuming the lens is relatively thin
and that its optical axis is perpendicular to the image plane, it operates according to the
following lens law:
1/OA' + 1/OA = 1/f
OA is the distance of an object point from the plane of the lens, OA' is the distance of the focused image from this plane, and f is the focal length of the lens. These distances can be observed in Figure 7.4.
Figure 7.4: Thin-lens geometry: object point A, lens centre O, focused image point A', focal length f
7.5.3 Calculating real size of the object
AB / A'B' = OA / OA'

AB = (OA / OA') · A'B' = (OA / f − 1) · A'B'
So if we know the size of the object in the image (in pixels, together with the pixel size), the focal length of the camera that captured the image, and the precise distance from which the image was taken, we can estimate the real size of the object.
If we want to calculate the size of the robot in Figure 7.4, we need the following parameters:
f = 4.503 mm; focal length of the lens (check the lens specifications)
Pixel size: 3.17 µm (check the sensor specifications)
|A'B'| = 343 pixels; measured from the image
|OA| = 5 m; distance from which the image was taken
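Plugging these numbers into the relation AB = (OA/f − 1) · A'B' gives, as a small worked sketch (only as accurate as the listed inputs):
>>f_mm   = 4.503;                  % focal length in mm
>>pix_mm = 3.17e-3;                % pixel size in mm
>>ABp    = 343 * pix_mm;           % size of the object on the sensor, |A'B'|, in mm
>>AB     = (5000/f_mm - 1) * ABp   % real size of the robot: approximately 1206 mm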
Assignment 7.9
Using your own camera, fix the distance to 1 m, capture an image of a ruler and measure how many millimetres of the ruler correspond to one pixel.
Assignment 7.10
Place a uniformly colored object on a uniform background and capture an image of the object with your own camera. Fix the distance from which the image is taken to 1 m. Create a binary image and compute the perimeter of the object in the image. If you know the object's perimeter in pixels, how large is it in cm? Measure the perimeter of the real object with a ruler and compare it with the estimated value. How large is the measurement error?
The RGB color model is based on a Cartesian coordinate system and each color appears in its primary spectral components of red, green and blue. An RGB image may be viewed as a stack of three gray-scale images that, when fed into the red, green and blue inputs of a color monitor, produce a color image on the screen. By convention, the three images forming an RGB color image are referred to as the red, green and blue component images. The representation of the RGB model using the color cube is shown in Figure 8.1.
used. The 16 bit representation (so called Highcolor) is used only for displaying the
image on the screen.
Table 8.1: Influence of data type on color representation

Data class            Range of values   Bit depth    Number of colors (2^b)^3
double (b = 2 bits)   [0, 1]            3*2 = 6      64
uint8 (b = 8 bits)    [0, 255]          3*8 = 24     16.777 * 10^6
uint16 (b = 16 bits)  [0, 65535]        3*16 = 48    2.814 * 10^14
Let's load the image ball and observe the red, green and blue components of this image.
>>I = imread('ball.jpg');
>>I_red = I(:,:,1);
>>I_green = I(:,:,2);
>>I_blue = I(:,:,3);
>>subplot(1,3,1); imshow(I_red)
>>subplot(1,3,2); imshow(I_green)
>>subplot(1,3,3); imshow(I_blue)
What are the differences between these images? In your opinion, how difficult would it be to segment the ball from the field based on these three images?
RGB is a linear representation, since it directly maps different colors to light intensities of the various frequencies. However, human vision is logarithmic, in the sense that the perceived contrast is based on the ratio of two intensities, not the difference. Thus, RGB is a (perceptually) non-uniform color space. This can create problems when observing images with illumination changes.
Assignment 8.1:
Let's assume that a robot observes a ball on an outdoor football field. The ball has a specific orange color and can be described using the RGB color model. However, since the illumination constantly changes, the appearance of the color of the ball changes as well. It appears much lighter when it is sunny and very dark orange when clouds appear. Observe the different appearances of the ball in Figure 8.2. If we describe this ball only using RGB values, they will differ significantly depending on the illumination and no connection can be made between two different appearances of the same ball. For that reason an illumination-independent model of the color must be used for the ball representation. One such model is the HSV model described below.
Figure 8.2: Influence of an illumination change on the ball appearance (parts of the R component under three illuminations, with values around 131-134, 210-215 and 70-72)
Another way to represent colors is using their Hue, Saturation and Value (Intensity) attributes, as shown in Figure 8.3. The HSV model is based on a polar coordinate system. The hue is a color attribute that describes the pure color (e.g. pure yellow, orange or red), whereas the saturation gives a measure of the intensity of a specific hue value. In other words, saturation gives a measure of the degree to which a pure color is diluted by white light. The value represents the intensity of the light in the image. Observe the difference between these 3 components in Figure 8.4 and how their levels change. The main advantage of the HSV space over RGB is the fact that the intensity information (V or I) is decoupled from the color information in the image, which allows an illumination-invariant representation of the image. Second, the hue and saturation components are closely related to the way that humans perceive color. For these reasons HSV is a very commonly used model in image processing.
Figure 8.4: Influence of changes in the hue, saturation and value attributes on the image (high versus low saturation and value)
The mentioned color models are not the only ones that are used. Table 8.2 lists more color spaces supported by Matlab.

Table 8.2: Other color spaces supported by Matlab (encoded as uint8, uint16 or double): XYZ, xyY, uvL, L*a*b*, L*u*v*, L*C*H*, Yxy, CMY, CMYK, HCV, HSV

In Matlab, the following commands are used to convert between different color spaces:
I_rgb = hsv2rgb(I_hsv)  converts an HSV color map to an RGB color map.
I_hsv = rgb2hsv(I_rgb)  converts an RGB color map to an HSV color map.
In order to convert to other color spaces, besides HSV and RGB, we use the following:
C = makecform(type)  creates the color transformation structure C that defines the color space conversion specified by type. Some values that type can take are 'srgb2lab', 'srgb2xyz', ...
Lab = applycform(A, C)  applies the color transformation C to the image A.
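A small sketch of such a conversion (the image name ball.jpg is reused from the earlier examples):
>>RGB = imread('ball.jpg');
>>C = makecform('srgb2lab');             % sRGB to L*a*b* transformation structure
>>Lab = applycform(im2double(RGB), C);   % apply the transformation to the image
>>imshow(Lab(:,:,2), [])                 % display the a* (red-green) channel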
MATLAB
>>RGB = imread('ball.jpg');
>>HSV = rgb2hsv(RGB);
>>H = HSV(:,:,1);
>>S = HSV(:,:,2);
>>V = HSV(:,:,3);
>>subplot(2,2,1), imshow(H); title('Hue')
>>subplot(2,2,2), imshow(S); title('Saturation')
>>subplot(2,2,3), imshow(V); title('Intensity')
>>subplot(2,2,4), imshow(RGB); title('RGB')
DIP Image
>>RGB = readim('ball.jpg');
>>HSV = colorspace(RGB, 'hsv');
>>H = HSV{1};
>>S = HSV{2};
>>V = HSV{3};
DIP Image
In DIPimage the color images are not supported directly by the filters. To apply the same filter to each of the color channels, use the function iterate:
>> color_image = readim('flower.jpg');
>> color_image_blurred = iterate('gaussf', color_image, 10);
Sharpening the image follows the same steps as the smoothing, with the difference that we use the Laplacian filter.
Let's now check functions that perform color sharpening.
MATLAB
>>lapmask = [1 1 1; 1 -8 1; 1 1 1];    % Laplacian filter mask
>>image_sharpen = color_image_blurred - imfilter(color_image_blurred, lapmask, 'replicate');
>>imshow(image_sharpen)
Figure 8.6: Effect of smoothing and sharpening on the image (original image; smoothed image; sharpened image)
8.4.3 Motion blur in color images
One problem that is constantly present in robot vision is motion blur due to camera motion. It is very important to be able to approximate the effect of camera motion on an image and to learn how to correct such an image.
MATLAB
Create a filter, h, that can be used to approximate linear camera motion:
h = fspecial('motion', 50, 45);
Apply the filter to the image originalRGB to create a new image, filteredRGB, shown in Figure 8.7:
>>originalRGB = imread('room.jpg');
>>imshow(originalRGB)
>>filteredRGB = imfilter(originalRGB, h);
>>figure; imshow(filteredRGB)
For edge detection in monochromatic images we use gradients. However, computing the gradient of a color image by combining the gradients of its individual color planes gives a different result from computing the gradient directly in the RGB color space.
MATLAB
Computation of the gradient directly in the RGB color space is performed with the following function, which is supplied with the practicum:
[VG, A, PPG]= colorgrad (image_RGB);
VG is the RGB vector gradient; A is the angle image in radians; PPG is a gradient image
formed by summing the 2-D gradient images of the individual color planes.
DIP Image
a = readim('robot_field.jpg')
b = norm(iterate('gradmag', a))
For non-linear filters, it is often not as clear how they should be applied to color
images. For example, the morphological operations, which should never introduce new
colors (the values of the output are selected from the input by maximum or minimum
operations), are particularly difficult to implement.
Segmentation is a process that partitions the image into regions. Some images are easily
segmented when the correct color space has been chosen. This is very often L*a*b* or
HSV, and this exercise and the next will show why.
Assignment 8.8 (use either Matlab or DIP Image)
Read in the image robosoccer_in. This is an image recorded by a soccer-playing robot.
You'll see the dark green floor, greyish walls, a yellow goal, a black robot (the goal
keeper), and two orange balls (of different shades of orange). You should write an
algorithm to find these balls.
Look at the R, G and B components. You'll notice that it is not easy to segment the balls
using any one of these three images. One problem is that the bottom side of the balls is
darker than the top part. We need to separate color from luminance, as the L*a*b*
color space does.
Convert the image into L*a*b*. The a* channel (red-green) makes the segmentation very
easy (by chance: we are looking for objects with lots of red, and the balls are the only
such objects in the image). A triangle threshold will extract the balls. Use the function
[out, th_value] = threshold(in, type, parameter) for DIPimage, or simple thresholding of
the values for Matlab.
Note that the thin lines along strong edges are caused by incorrect sampling in the
camera. This is a common problem with single-chip CCD cameras, where the three
colors of a single pixel are actually recorded at different spatial locations. If you zoom in
on such a strong edge in the input image, you'll notice the color changes. These thin lines
in our thresholded image are easy to filter out using some simple binary filtering.
The image robosoccer_in_dark contains the same scene recorded with a smaller
diaphragm (less light reaches the detector). Examine whether the algorithm still works in
these worse lighting conditions.
Figure 8.9: Results of the ball segmentation from Assignment 8.8
Assignment 8.9 (use Matlab)
The a* channel provides a good solution to our problem. However, if there were a red or
purple object in the scene (like a robot adversary), this technique wouldn't work. We
want to be able to differentiate orange not only from yellow, but also from red and
purple. The hue should provide us with a nice tool for this purpose.
Compute the hue image from robosoccer_in and use it to segment the balls. Try your
program on the other images in the series.
The multi-scale nature of objects is quite common in nature. Most objects in our world
exist and can be processed only over a limited range of scales. A simple example is the
concept of a branch of a tree, which makes sense only at scales from a few centimeters to
a few meters (take a look at Figure 9.1). It is meaningless to discuss the tree concept at the
nanometer or kilometer level. At those scales it is more relevant to talk about the
molecules that form the leaves of the tree or the forest in which the tree grows. Once we
want to process the image of an object, we need to know what the relevant scales of that
object are. Alternatively, we can create a representation of the object that is independent
of scale. Scale-space theory provides us with answers to these questions.
Such a multiscale representation of an image is called an image pyramid and is in fact a
collection of images at different resolutions. At low resolutions there are no details in the
image and it contains only very low frequencies (it is very blurred). At high resolution we
can observe both high and low frequencies and process even the smallest details in the
image.
We can reduce the resolution of an image by limiting its frequencies, which is the same as
applying a low-pass filter. The most common filter to use is the Gaussian filter, since
Gaussian derivatives can easily be scaled thanks to the explicit scale parameter sigma, σ.
Observe the formula of the Gaussian filter (kernel) below.
G(x, y, σ) = 1/(2πσ²) · e^(-(x² + y²)/(2σ²))
Now, blurring the image is done by convolving the image with the Gaussian kernel, as we
saw in Chapter 4:
L(x, y, σ) = G(x, y, σ) * I(x, y)
L represents the blurred image and the parameter σ determines the amount of blur.
Higher values of σ yield a higher amount of blur in the image and lower the resolution
of the image. The scale space of the image lena is shown in Figure 9.2.
In order to create the scale space of an image we need to calculate a sequence of
images, where each image is filtered with a Gaussian filter with a progressively higher
value of sigma.
In DIPimage we can use the function gaussf to filter the image with a Gaussian filter:
image_out = gaussf(image_in, sigma, method)
For the calculation of the Gaussian scale space of an image, a separate function in the
Analysis menu is implemented:
[sp, bp, pp] = scalespace(image_in, nscales, base)
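In plain Matlab a comparable scale space can be sketched by filtering with Gaussian kernels of increasing sigma (the image name and the number of scales below are only an example):
>> I = im2double(imread('lena.jpg'));
>> if size(I,3) == 3, I = rgb2gray(I); end
>> nscales = 5;
>> scale_stack = zeros([size(I) nscales]);
>> for s = 1:nscales
       sigma = 2^(s-1);                               % sigma doubles at every scale
       G = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma);
       scale_stack(:,:,s) = imfilter(I, G, 'replicate');
   end
>> figure; imshow(scale_stack(:,:,nscales))           % the coarsest scale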
Scale spaces are often used to create a scale-invariant object representation. One way to
do this is to combine the scale space with image resizing and to calculate object features
at each image. This method is used as the first step of the Scale Invariant Feature
Transform (SIFT), currently one of the best image descriptors.
The process goes as follows:
1. First we take the original image and generate progressively more blurred images
(the scale space of that image).
2. Then we resize the original image to half size and generate the blurred images
again.
3. Resize the original image to one quarter and generate the blurred images again.
4. Keep repeating until a defined limit is reached (in the case of SIFT we resize and
construct the scale space 4 times).
The scale space of one image in this process is called an octave, and in SIFT the author
proposed to use 4 octaves in total. The number of scales in the scale space (the number of
blurred images) is set to 5. These values were set manually based on experimental results.
More or fewer octaves can be generated as well, depending on the problem. Take a look at
the constructed SIFT scale space of the image cat in Figure 9.3.
Assignment 9.2: Design the SIFT scale space
Load the image lena and create the SIFT scale space of that image. In the implementation,
use 5 scales and 4 octaves. Use Matlab for the implementation. Why, in your opinion, is
such a representation good for scale-invariant recognition?
9.2 Hough Transform
A straight line can be parameterized as p = x·cos(θ) + y·sin(θ), where p is the distance of
the line from the origin and θ is the angle of its normal. The following code plots the line
with p0 = 20 and θ0 = π/3:
x = 0:30;
p0 = 20; theta0 = pi/3;
y = (p0 - x*cos(theta0))/sin(theta0);
plot(x, y)
There are a lot of lines that go through a point (x1, y1). However, there is only one line
that goes through all points (xi, yi). At each point we determine all lines (combinations of
p and θ) that go through that point:
>> theta = 0:pi/256:2*pi;
>> p = x(1) * cos(theta) + y(1) * sin(theta);
>> plot(theta, p)
This results in a parameter space as shown in Figure 9.5. The axes of this space are the
parameters you are looking for (in this case p and θ).
Figure 9.5: The parameter space
1. Make a binary input image of size 32x32 containing one or more lines.
2. Determine the necessary size of the parameter space if you want to measure θ
from 0 to 2π with an accuracy of π/128, and p from 0 to 32√2 with an accuracy of 1.
3. Make an empty parameter space image of the determined size.
4. Fill the parameter space:
- For each object point in the image determine all possible combinations of p and θ.
- For each combination of p and θ determine the corresponding pixel in the parameter
space image and increment the value of that pixel by one.
5. Find the maximum in the parameter space. A sketch of these steps is given below.
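A minimal Matlab sketch of steps 1-5, using the parameterization p = x·cos(θ) + y·sin(θ) (the test image and the bin sizes are only an example):
>> b = false(32); b(5,:) = true; b(:,20) = true;     % step 1: two lines in a 32x32 binary image
>> theta = 0:pi/128:2*pi - pi/128;                   % step 2: theta axis with accuracy pi/128
>> pmax = ceil(32*sqrt(2));                          %         p from 0 to 32*sqrt(2), accuracy 1
>> H = zeros(pmax + 1, numel(theta));                % step 3: empty parameter space
>> [r, c] = find(b);                                 % object points (row = y, column = x)
>> for k = 1:numel(r)                                % step 4: fill the parameter space
       p = round(c(k)*cos(theta) + r(k)*sin(theta));
       valid = p >= 0 & p <= pmax;
       idx = sub2ind(size(H), p(valid) + 1, find(valid));
       H(idx) = H(idx) + 1;
   end
>> [votes, loc] = max(H(:));                         % step 5: maximum in the parameter space
>> [p_best, t_best] = ind2sub(size(H), loc);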
Use this to speed up your Hough transform. The variable I in the second part is an array
containing linear indices into the image b. Note how it is computed: the column number
multiplied by the height of each column, plus the row number. MATLAB arrays (and thus
also images) are stored column-wise.
The Hough transform can also be used for detecting circles, ellipses, etc. For example, the
equation of a circle is (x - x0)² + (y - y0)² = r². In this case there are three parameters,
(x0, y0) and r, and the transformation that parameterizes this equation is called the
Circular Hough Transform. In general, we can use the Hough transform to detect any
curve that can be described analytically by an equation of the form g(v, C) = 0 (v: vector
of coordinates, C: parameters). Detecting arbitrary shapes with no analytical description
is also possible, and that transformation is called the Generalized Hough Transform.
We will further elaborate on the use of the Hough transform for line detection in Chapter 10.
9.3 Mean Shift
In order to perform image segmentation, we first need to group extracted points (features
in feature space) into clusters. One of the techniques used for that is Mean Shift.
Mean Shift considers the feature space as an empirical probability density function. If the
input is a set of points, then Mean Shift considers them as sampled from an underlying
probability density function. If dense regions (or clusters) are present in the feature space,
then they correspond to the modes (local maxima) of the probability density function.
Let's now look at the example in Figure 9.7. The detected points (in yellow) correspond to
objects in the image, and we want to cluster close points together and assign them to
objects. Notice that a large number of points close to each other yields a large peak in the
underlying probability density function. Observe the large peaks in the case of regions 1
and 2, and no peak at all in the case of the single point in region 3. All the points lying
underneath peak 1 will correspond to object 1. We will use Mean Shift to detect which
points lie underneath which peak.
Figure 9.7 Points detected in the image and the peaks of the underlying probability density function (peak 1, regions 1-4)
The density underlying the points is estimated with a kernel density estimator:
f(x) = 1/(n·h^d) · Σ_{i=1..n} K((x - x_i)/h)
where K is the kernel and h is its bandwidth (window radius).
Mean Shift can be considered to be based on gradient ascent on the probability contour.
To find a local maximum of a function using gradient ascent, one takes steps proportional
to the gradient (or an approximation of the gradient) of the function at the current point.
The generic formula for gradient ascent is
x_1 = x_0 + η·f'(x_0)
where η is the step size. Applying this to the density estimate above, its gradient is
f'(x) = 1/(n·h^d) · Σ_{i=1..n} K'((x - x_i)/h)
Setting it to 0 we get
Σ_{i=1..n} K'((x - x_i)/h) · x = Σ_{i=1..n} K'((x - x_i)/h) · x_i
The stationary points obtained via gradient ascent represent the modes of the density
function. All points associated with the same stationary point belong to the same cluster.
Now let's apply these formulas to calculate the mean shift, m(x):
Assuming g(x) = -K'(x), it follows that
m(x) = [ Σ_{i=1..n} g((x - x_i)/h) · x_i ] / [ Σ_{i=1..n} g((x - x_i)/h) ] - x
Mean Shift has many practical applications, especially in the computer vision field. It is
used to perform clustering, for segmentation of images (Chapter 10), for object tracking in
videos, etc. The most important application is using Mean Shift for clustering. The fact
that Mean Shift does not make assumptions about the number of clusters or the shape of
the clusters makes it ideal for handling clusters of arbitrary shape and number. A small
sketch of the core iteration is given below.
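A minimal sketch of the mean-shift iteration with a flat kernel of radius h (this helper, meanshift_sketch, is ours and not the practicum code from Assignment 9.7):
% meanshift_sketch.m  (illustrative helper)
function modes = meanshift_sketch(X, h)
% X: (n x d) matrix of points, h: kernel radius (bandwidth)
modes = X;
for iter = 1:50                                            % fixed number of iterations for simplicity
    for i = 1:size(modes, 1)
        d = sqrt(sum(bsxfun(@minus, X, modes(i,:)).^2, 2));  % distances to all points
        inwin = d <= h;                                      % flat kernel: points inside the window
        modes(i,:) = mean(X(inwin, :), 1);                   % shift to the window mean
    end
end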
Now that you understand how the algorithm works, let's try to apply it to solve the
problem of point clustering and object localization in a given image.
Assignment 9.8: Applying the Mean Shift algorithm to a clustering problem
Load the image table.jpg that shows 3 objects on a uniform background, from Figure
2.6 a. We detected interesting points on that image using the SIFT technique explained in
Chapter 11, and calculated the probability function of these points. You will be given a
matrix with the set of points and the probability values that correspond to these points. To
obtain these values load the file Assignment_9.8 and check the matrix prob_values.
Group the close points together using the Mean Shift code from Assignment 9.7. Did you
manage to group points belonging to different objects? Vary the parameter h (radius of
the kernel) and comment on how the result is affected. Try to establish an optimal value
for the parameter h. Now calculate the rectangular box around the clustered points,
similar to Figure 9.8.
Figure 9.8 Clustered points using Mean Shift (result of Assignment 9.8)
The following chapter (Chapter 10) deals with image segmentation and contains a special
section devoted to clustering, where we will further explain this topic.
according to a set of predefined criteria. For the explanation of all the methods in this
chapter we will only use the Image Processing Toolbox of Matlab.
10.1 Point, line and edge detection
10.1.1 Detection of isolated points
It is very easy to detect isolated points located in areas of constant or nearly constant
intensity. The most common way to look for discontinuities is to apply a mask to the
image, as described in Chapter 4. The response of a 3x3 mask at any point in the image
is shown below, where zi is the intensity of the pixel associated with mask coefficient wi.
The response of the mask is defined at its center.
R = w1·z1 + w2·z2 + ... + w9·z9 = Σ_{i=1..9} wi·zi
Now we can say that an isolated point has been detected at the location on which the
mask is centered if |R| ≥ T, where T is a nonnegative threshold. If T is given, the points
can be detected using the following command:
image_out = abs(imfilter(tofloat(image_in), w)) >= T;
w is the filter mask and image_out is the binary image containing the detected points.
If T is not given, its value is often chosen based on the filtered result, as in the following
example:
>> w = [-1 -1 -1; -1 8 -1; -1 -1 -1];            % standard point-detection (Laplacian-like) mask
>> g = abs(imfilter(tofloat(image_in), w));
>> T = max(g(:));                                % choose T as the maximum response
>> image_out = g >= T;
>> imshow(image_out)
The assumption that we used in this example is that the isolated points are located on a
constant or nearly constant background. Since we choose T to be the maximum value of
the filtered result, there can be no points in image_out with values greater than T.
Another approach to point detection is to find points in all neighbourhoods of size m x n
for which the difference between the maximum and minimum pixel values exceeds a
specified value T. For this approach the function ordfilt2 is used.
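A possible sketch with a 3x3 neighbourhood (m = n = 3 is our choice here; image_in is the input image):
>> g = ordfilt2(image_in, 9, ones(3,3)) - ordfilt2(image_in, 1, ones(3,3));  % max minus min in each window
>> T = max(g(:));
>> points = g >= T;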
Assignment 10.1
Create an image that has an isolated point and, using the ordfilt2 function, detect that
isolated point in the image. Set T in the same way as in the previous example.
In the literature we often find terms such as interest point detection and salient point
detection; these will be treated later in the text.
The next level of complexity is line detection. Lines can be detected in two different
ways: using masks and using the Hough transform.
The first way is to use different masks. Figure 10.2 shows the most common masks
used to detect horizontal, vertical and 45-degree oriented lines.
R1 (horizontal):   [-1 -1 -1;  2  2  2; -1 -1 -1]
R2 (+45 degrees):  [-1 -1  2; -1  2 -1;  2 -1 -1]
R3 (vertical):     [-1  2 -1; -1  2 -1; -1  2 -1]
R4 (-45 degrees):  [ 2 -1 -1; -1  2 -1; -1 -1  2]
Figure 10.2: Masks for line detection
To automatically detect all the lines in the image we need to run all 4 masks
independently over the image and threshold the absolute value of the results, where Ri is
the result for mask i. If at a certain point in the image |Ri| > |Rj| for all j ≠ i, that point
is considered more likely to be associated with the direction favored by mask i. The final
response is equal to:
R(x, y) = max(|R1(x, y)|, |R2(x, y)|, |R3(x, y)|, |R4(x, y)|)
If R(x, y) > T, then a discontinuity is detected.
Let's now apply this approach and show that the image in Figure 10.3 is in fact just an
illusion. We want to prove that all lines are in fact straight.
>> I = imread('optical_illusions.png');
>> w1 = [-1 -1 -1; 2 2 2; -1 -1 -1];     % horizontal line mask
>> w3 = [-1 2 -1; -1 2 -1; -1 2 -1];     % vertical line mask
>> g1 = imfilter(I, w1); imshow(g1);
>> g2 = imfilter(I, w3); imshow(g2)
Figure 10.3: Line detection examples (original image, detected horizontal lines, detected vertical lines)
Assignment 10.2
Load the image Zollner_illusions and show that the diagonal lines are in fact parallel.
Try to detect the other lines in the image as well.
In practice, results obtained by edge detection or simple line detection often have many
discontinuities because of noise, breaks in edges due to non-uniform lighting, etc.
For that reason, edge detection algorithms are followed by a linking procedure to assemble
edge pixels into meaningful edges. For that we use the Hough transform. Before we
continue, please read carefully the section on the Hough transform in Chapter 9.
In Chapter 9 we designed our own Hough transform function. Here we will see how to
use the functions already provided by the Matlab Image Processing Toolbox.
To compute the Hough transform we can use the function hough:
[H, theta, rho] = hough(image_in)
[H, theta, rho] = hough(image_in, 'ThetaRes', val1, 'RhoRes', val2)
H is the Hough transformation matrix, and theta (in degrees) and rho are the vectors from
Figure 9.4. The input image_in is a binary image.
Let's now compute the Hough transform of the image from Figure 10.4 a). The result is
shown in Figure 10.4 b).
>> [H, theta, rho] = hough(image_in, 'ThetaResolution', 0.2);
>> imshow(H, [], 'XData', theta, 'YData', rho, 'InitialMagnification', 'fit');
>> axis on, axis normal
>> xlabel('\theta'); ylabel('\rho');
The following step in line detection is finding the high peaks in the Hough transform. The
function houghpeaks locates peaks in the Hough transform matrix H. Numpeaks is a
scalar value that specifies the maximum number of peaks to identify. If you omit
numpeaks, it defaults to 1.
peaks = houghpeaks(H, numpeaks)
peaks = houghpeaks(H, numpeaks, 'Threshold', val1, 'NHoodSize', val2)
Threshold is the value at which values of H start to be considered peaks. It can vary
from 0 to Inf and has the default value 0.5*max(H(:)).
Let's now compute the peaks in the Hough transform of the image from Figure 10.4 a).
The result is shown in Figure 10.4 c).
>> peaks = houghpeaks(H, 5);
>> hold on
>> plot(theta(peaks(:,2)), rho(peaks(:,1)))
The final step is to determine whether there are meaningful line segments associated with
the detected peaks, as well as where the lines start and end. For that we use the function:
lines = houghlines(image_in, theta, rho, peaks)
Now let's find and link the line segments in the image from Figure 10.4 a). The final result
is shown in Figure 10.4 d). Detected lines are superimposed as thick gray lines.
>> lines = houghlines(image_in, theta, rho, peaks);
>> figure, imshow(image_in), hold on
>> for k = 1:length(lines)
       xy = [lines(k).point1; lines(k).point2];
       plot(xy(:,1), xy(:,2), 'LineWidth', 4)
   end
Assignment 10.3:
Load the image soccer field, create a binary image and, using the functions for the Hough
transform explained above, detect the lines on the field.
The most common approach to detecting sudden changes (discontinuities) in an image is
edge detection. Most semantic and shape information is preserved in the edges. Take an
artist's sketch as an example: we can imagine and understand the outline of an entire
picture just by looking at its sketch. In practice edges are caused by a variety of different
factors; observe Figure 10.5 for an explanation.
Figure 10.5 Factors that cause edges (surface normal discontinuity, depth discontinuity, illumination discontinuity)
Discontinuities can be detected using first- and second-order derivatives. The first-order
derivative is the gradient, while the second-order derivative is the Laplacian.
Gradient:
If we use the gradient for edge detection, we need to find places in the image where the
first derivative of the intensity is greater in magnitude than a specified threshold.
We can approximate the gradient using convolution with image masks. In the Matlab
Image Processing Toolbox there is a function edge:
[g, t] = edge(f, method, parameters)
For the calculation of the first derivative there are 3 possible methods that we can use:
Sobel, Prewitt and Roberts. Sobel is the most common one; it approximates the gradient
using differencing, where z represents the image neighborhood:
∇f = (gx² + gy²)^0.5 = { [(z7 + 2·z8 + z9) - (z1 + 2·z2 + z3)]² + [(z3 + 2·z6 + z9) - (z1 + 2·z4 + z7)]² }^0.5
T is a specified threshold that determines the edge pixels: if ∇f ≥ T the pixel is an edge
pixel. Dir specifies the preferred direction of the edges: horizontal, vertical or both. The
resulting image g is a binary image. An example of the image room with edges detected
using Sobel is shown in Figure 10.6 b).
Laplace:
Second-order derivatives are seldom used directly for edge detection because of their
sensitivity to noise and inability to detect edge direction. They are used in combination
with first derivatives to find places where the second derivative of the intensity crosses
zero. If we consider the Gaussian function used for smoothing, G(x,y), we can define the
Laplacian of Gaussian (LoG) that is often used as an edge detector:
G(x, y) = e^(-(x² + y²)/(2σ²))
∇²G(x, y) = ∂²G(x, y)/∂x² + ∂²G(x, y)/∂y² = [ (x² + y² - 2σ²)/σ⁴ ] · e^(-(x² + y²)/(2σ²))
Sigma is the standard deviation and its default value is 2. An example of the image with
edges detected using the LoG detector is shown in Figure 10.6 c).
Finally, the best result can be obtained by combining the two different methods, which is
the case in the Canny edge detector. The syntax is the following:
[g, t] = edge(f, 'canny', T, sigma)
An example of the image with edges detected using Canny is shown in Figure 10.6 d).
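Putting the three detectors side by side might look as follows (the image name and parameter values are illustrative; passing [] lets edge choose the threshold automatically):
>> f = imread('room.jpg');
>> if size(f,3) == 3, f = rgb2gray(f); end
>> g_sobel = edge(f, 'sobel');              % first-derivative detector
>> g_log   = edge(f, 'log', [], 2);         % Laplacian of Gaussian, sigma = 2
>> g_canny = edge(f, 'canny', [], 1);       % Canny, sigma = 1
>> subplot(1,3,1), imshow(g_sobel), title('Sobel')
>> subplot(1,3,2), imshow(g_log), title('LoG')
>> subplot(1,3,3), imshow(g_canny), title('Canny')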
When image segmentation and object recognition have to be performed on real-world
scenes, using simple lines or edges is no longer sufficient. The algorithm must cope with
occlusions and heavy background clutter, so we need to process the image locally. For
that, most state-of-the-art algorithms use the detection of interest points (also called
keypoints). Interest points are points that exhibit some kind of salient (distinctive)
characteristic, like a corner. Subsequently, for each interest point a feature vector called a
region descriptor is calculated. Each region descriptor characterizes the image
information available in the local neighborhood of the point. Region descriptors and object
representation are treated in Chapter 11.
All keypoint detectors can be divided into two major groups:
Corner-based detectors: used for the detection of structured regions; they rely on the
presence of sufficient gradient information (sufficient change in intensity).
Region-based detectors (blobs): used for the detection of uniform regions and regions with
smooth brightness transitions.
10.2.1 Corners
We can detect a corner by exploiting its main property: in the region around a corner, the
image gradient has two or more dominant directions. Observe Figure 10.7, where this
property is shown:
M = Σ_{x,y} w(x,y) · [ Ix², Ix·Iy ; Ix·Iy, Iy² ]
I(x+u, y+v) is the shifted intensity, I(x,y) the intensity at point (x,y), while w(x,y)
represents the window function (or mask, as we called it). The change of intensity for a
shift (u,v) is E(u,v) = Σ_{x,y} w(x,y)·[I(x+u, y+v) - I(x,y)]². As we saw in 10.1.3 we can
use the gradient to measure abrupt changes, so the approximation using the matrix M can
be used. Ix and Iy are the first-order image derivatives (gradients of the image).
For very distinctive patches the change in intensity will be large, hence E(u,v) will be
large. We can now check whether the corner response at each pixel is large:
R = det(M) - k·(trace M)²
det is the determinant of the matrix M, while trace represents the sum of the eigenvalues of
M. k is an empirically defined constant; the smaller the value of k, the more likely the
algorithm is to detect sharp corners. The value of R determines whether a point is a corner:
a large positive R indicates a corner, a large negative R an edge, and a small |R| a flat
region.
Let's now calculate the corners in the image corners. The results are shown in Figure 10.8.
>>I = imread('corners.png');
>>imshow(I);
>>C = cornermetric(I, 'Harris', 'SensitivityFactor', 0.04);
% Displaying corner metric
>>C_adjusted = imadjust(C);
>>figure;
>>imshow(C_adjusted);
% Detecting corner peaks
>>corner_peaks = imregionalmax(C);
% Displaying corners in the image
>>corner_idx = find(corner_peaks == true);
>> [r g b] = deal(I);
>>r(corner_idx) = 255;
>>g(corner_idx) = 255;
>>b(corner_idx) = 0;
>>RGB = cat(3,r,g,b);
>>figure
>>imshow(RGB);
>>title('Corner Points');
Figure 10.8 Corner detection (original image, corner metric, detected corners)
Assignment 10.6:
Calculate the corners of the image room. Vary the sensitivity factor. What can you observe?
10.2.2 Blobs
Blob detection algorithms detect points that are either brighter or darker than their
surroundings. They were developed to provide additional information to edge and corner
detectors. They are used efficiently in image segmentation, region detection for tracking
and object recognition, and they also serve as interest points for stereo matching.
One of the most used blob detectors is based on the Laplacian of Gaussian (LoG) edge
detector explained in 10.1.3. The magnitude of the Laplacian response achieves a
maximum at the center of the blob. The Laplacian operator usually results in strong
positive responses for dark blobs of a certain extent and strong negative responses for
bright blobs of similar size.
∇²G(x, y) = ∂²G(x, y)/∂x² + ∂²G(x, y)/∂y² = [ (x² + y² - 2σ²)/σ⁴ ] · e^(-(x² + y²)/(2σ²))
A main problem when applying this operator at a single scale, however, is that the
operator response depends strongly on the relationship between the size of the blob
structures and the size of the Gaussian kernel used for pre-smoothing. In order to
overcome this and to make the detector automatic, a multi-scale approach similar to the
one from Section 9.1 must be used. This approach is further developed in Chapter 11.
Another approach that uses multiple scales and gives a scale-invariant detector is
Maximally Stable Extremal Regions (MSER). Its principle is based on thresholding the
image with a variable brightness threshold t. All pixels with a gray value below t are set to
0/dark, and all pixels with a gray value equal to or above t are set to 1/bright. The threshold
is increased successively; in the beginning the thresholded image is completely bright.
As t increases, black areas appear in the binarized image and they grow and merge
together. Black areas that stay stable over a long range of thresholds are the MSER
regions. They reveal the position (center point) as well as the characteristic scale, derived
from the region size, as input data for the region descriptor calculation. Altogether, all
regions of the scene image that are significantly darker than their surroundings are
detected. Inverting the image and repeating the same procedure with the inverted image
reveals the characteristic bright regions.
There is no function in Matlab or DIPimage that can calculate these features, so we will
install and use a new toolbox, VLFeat, that is very useful for segmentation and description.
Assignment 10.7:
Download VLFeat from the address below and install it in Matlab:
http://www.vlfeat.org/download.html
r contains the seeds (centres) of the regions, while f contains elliptical frames fitted around
them. Let's now compute the MSER regions of the image lena. The result is shown in
Figure 10.9.
>> I = uint8(rgb2gray(imread('lena.jpg')));
>> [r, f] = vl_mser(I);
>> imshow(I); hold on
>> f = vl_ertr(f);
>> vl_plotframe(f);
Assignment 10.8:
Another way to use the function vl_mser is:
[r,f] = vl_mser(I, 'MinDiversity', val1, 'MaxVariation', val2, 'Delta', val3)
One of the easiest methods to perform image segmentation is to start from the image pixels
and to group these pixels into meaningful regions (this is the so-called bottom-up
approach).
Clusters can be formed based on pixel intensity, color, texture, location, or some
combination of these. It is very important to choose the number of clusters correctly, since
the solution depends strongly on that choice.
The K-means algorithm is an iterative technique that is used to partition an image into K
clusters. The basic algorithm is:
1. Pick K cluster centers, either randomly or based on some heuristic.
2. Assign each pixel in the image to the cluster that minimizes the distance between
the pixel and the cluster center. The Euclidean distance is often used as the distance
function.
3. Re-compute the cluster centers by averaging all of the pixels in the cluster.
4. Repeat steps 2 and 3 until convergence is attained (e.g. no pixels change clusters).
In Matlab the function kmeans is already implemented (in the Statistics Toolbox):
IDX = kmeans(X, k)
[IDX, C, sumd, D] = kmeans(X, k, 'distance', val1)
IDX is an (n,1) vector containing the cluster index of each point, and k is the predefined
number of clusters. By default, kmeans uses squared Euclidean distances.
D is an (n,k) matrix that returns the distances from each point to every centroid;
sumd is a (1,k) vector that returns the within-cluster sums of point-to-centroid distances;
val1 chooses the distance function: 'sqEuclidean', 'cityblock' or 'Hamming'.
Let's now segment the color image from Figure 10.10 a). The algorithm that we will use is
the following:
Step 2: Convert the image from the RGB color space to the L*a*b* color space.
Note: Please refer to Chapter 8 for an explanation of the L*a*b* color space.
>> cform = makecform('srgb2lab');
>> lab_he = applycform(I, cform);
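The intermediate clustering step that produces cluster_idx, used in Step 4 below, could be sketched as follows (we assume the a* and b* channels are used as features and pick k = 3 clusters only for illustration):
>> ab = double(lab_he(:,:,2:3));                 % keep only the color channels a* and b*
>> nrows = size(ab,1); ncols = size(ab,2);
>> ab = reshape(ab, nrows*ncols, 2);             % one row per pixel, two features
>> nColors = 3;
>> cluster_idx = kmeans(ab, nColors, 'Replicates', 3);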
Step 4: Label every pixel in the image using the results from kmeans
>>pixel_labels = reshape(cluster_idx,nrows,ncols);
>>figure(2)
>>imshow(pixel_labels,[]), title('image labeled by cluster index');
As we can see from the previous example, the K-means algorithm is fairly simple to
implement and the results for image segmentation are very good for simple images. As
can be seen from the results in Figure 10.10, the number of partitions used in the
segmentation has a very large effect on the output. By using more partitions in the RGB
setup, more possible colors are available to show up in the output and the result is much
better. Manual selection of the number of clusters is the main drawback of this method.
Assignment 10.9:
As we can see from Figure 10.10, we have detected that the glove is found in cluster 10.
Extract only the pixels belonging to the glove and perform step 6 of the algorithm.
Assignment 10.10: Design a system for skin detection
Segment only the face in the image girl. Use the previous example as a reference. Once
you have labeled all the regions, verify each region against the criteria for identifying skin
pixels, and separate the skin-pixel objects from the other color objects. As criteria you can
use the following values:
An (R,G,B) value is classified as skin if it satisfies:
(R > 95) & (G > 40) & (B > 20)
& ((max{R,G,B} - min{R,G,B}) > 15)
& (|R-G| > 15) & (R > G) & (R > B)
Now assign 0 to the regions that are not skin. Display the segmented image, which
contains the skin pixels, in RGB format. A sketch of the skin rule itself is given below.
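Applied directly to an RGB image, the rule could look like this (image and variable names are only illustrative; in the assignment you should apply it to the labeled regions):
>> I = imread('girl.jpg');
>> R = double(I(:,:,1)); G = double(I(:,:,2)); B = double(I(:,:,3));
>> skin = (R > 95) & (G > 40) & (B > 20) ...
        & ((max(max(R,G),B) - min(min(R,G),B)) > 15) ...
        & (abs(R-G) > 15) & (R > G) & (R > B);
>> segmented = I;
>> segmented(repmat(~skin, [1 1 3])) = 0;     % set non-skin pixels to 0
>> imshow(segmented)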
Figure 10.10 Color segmentation using kmeans clustering (original image, pixel labels, and clusters 1, 2, 4, 5, 8 and 10)
Offline steps:
1.-5. Build the database: compute the chosen descriptor for every image in the database
and store the descriptors together with the images.
Online steps:
6. Calculate the descriptor for the unknown image (query image).
7. Compare the descriptor of the query image with all other descriptors from the
database using the chosen similarity measure.
8. The image(s) with the smallest distance(s) are the most similar to our query image.
From that result we can conclude to which category the query image belongs or
which objects the query image contains.
11.1 Global descriptors
Global descriptors are used when we want to describe some global property of an image.
We can divide them into color, texture and shape descriptors.
11.1.1 Color descriptors
There are many color descriptors in use in state-of-the-art methods. Here we will focus on
one simple and fast but still very efficient color descriptor based on the statistics of an
image: color moments.
Color moments:
The basis of color moments lies in the assumption that the distribution of color in an
image can be interpreted as a probability distribution. It therefore follows that if the color
in an image follows a certain probability distribution, the moments of that distribution
can be used as descriptors to identify the image based on its color. For a color
distribution a perceptual color system such as HSV should be used. The moments are then
calculated for each of the channels H, S and V. In total there are 3 moments (mean,
standard deviation and skewness), and an image is therefore characterized by 9 values,
3 moments for each of the 3 color channels. We define the i-th color channel at the j-th
image pixel as pij (in total there are N pixels). The three color moments can then be
defined as:
Moment 1 (Mean):
Ei = (1/N) · Σ_{j=1..N} pij
The mean can be understood as the average color value in the image.
Moment 2 (Standard deviation):
σi = sqrt( (1/N) · Σ_{j=1..N} (pij - Ei)² )
The standard deviation is the square root of the variance of the distribution.
Moment 3 (Skewness):
si = ( (1/N) · Σ_{j=1..N} (pij - Ei)³ )^(1/3)
FV = [E1, σ1, s1, E2, σ2, s2, E3, σ3, s3]
Now we should match our image (query image) with the other images from the database
and calculate the similarity between them. Different similarity functions can be used, e.g.
the Euclidean distance or the cosine distance. For color moments it is best to use the
following distance:
d(FV_1, FV_2) = Σ_{i=1..3} ( wi1·|Ei1 - Ei2| + wi2·|σi1 - σi2| + wi3·|si1 - si2| )
Here we calculate the distance between two feature vectors, FV_1 and FV_2. Ei1
represents the mean of the i-th channel of FV_1, Ei2 the mean of the i-th channel of FV_2,
and so on.
wi1, wi2 and wi3 are weights that are used to emphasize the more important components,
for example to reduce the influence of the Value component (brightness) and increase the
influence of the Hue component (color type). These weights are usually pre-calculated
from the entire database of images and suited to the images in that database.
Once we have calculated all the distances, the image with the smallest distance is the most
similar to our query image.
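A possible implementation of the nine color moments described above (the function name color_moments is ours, not part of the practicum):
% color_moments.m  (illustrative implementation)
function FV = color_moments(I_rgb)
hsv = rgb2hsv(im2double(I_rgb));
FV = zeros(1, 9);
for i = 1:3                                   % H, S and V channels
    p = hsv(:,:,i); p = p(:); N = numel(p);
    E  = sum(p)/N;                            % moment 1: mean
    sd = sqrt(sum((p - E).^2)/N);             % moment 2: standard deviation
    sk = nthroot(sum((p - E).^3)/N, 3);       % moment 3: skewness
    FV(3*i-2:3*i) = [E sd sk];
end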
Assignment 11.1:
Design a function that computes the color moments of an image. Load the images beach1,
beach2 and rainforest. Calculate the color moments for all three images. Now compare
the image beach1 with the other two. Is the descriptor successful?
11.1.2 Texture descriptors
Texture descriptors are very useful for grouping similar textures or patterns in images.
Figure 11.3 shows examples of different textures.
Figure 11.3 Examples of textures (wall, bush, hair, electrical chip)
When computing texture it is very important to consider the distribution of gray-level
intensities as well as the relative positions of pixels in the image. One such method is the
Gray Level Co-occurrence Matrix (GLCM).
The GLCM is a matrix of frequencies, where each element (i, j) is the number of times
that a pixel with intensity i occurred at a certain distance and angle from a pixel with
intensity j. The number of rows and columns of the GLCM is determined by the number of
grayscale intensity values in the image. Then, 4 statistical features are calculated from the
GLCM:
1. Contrast - a measure of the intensity contrast between a pixel and its neighbor at the
given relative location:
Contrast = Σ_{i,j} (i - j)² · P(i, j)
2. Homogeneity - a measure of the closeness of the distribution of elements to the GLCM
diagonal:
Homogeneity = Σ_{i,j} P(i, j) / (1 + |i - j|)
3. Energy - the sum of squared elements of the GLCM:
Energy = Σ_{i,j} P(i, j)²
4. Correlation - a measure of how correlated a pixel is with its neighbor:
Correlation = Σ_{i,j} (i - μi)(j - μj) · P(i, j) / (σi·σj)
where μi = Σ_{i,j} i · P(i, j), μj = Σ_{i,j} j · P(i, j), σi² = Σ_{i,j} (i - μi)² · P(i, j) and
σj² = Σ_{i,j} (j - μj)² · P(i, j).
Combining these features, one feature vector is formed and used as the texture
representation of the image.
In the Matlab Image Processing Toolbox there is a function, graycomatrix, that computes
the GLCM:
glcm = graycomatrix(I, 'NumLevels', val1, 'Offset', val2)
NumLevels is the number of gray levels to use. By default, if I is a binary image, the
function scales the image to two gray levels; if I is an intensity image, it scales the image
to eight gray levels.
Offset is a p-by-2 array of integers specifying the distance between the pixel of interest
and its neighbor. Each row in the array is a two-element vector, [row_offset, col_offset],
that specifies the relationship, or offset, of a pair of pixels.
Now, to calculate the descriptors from the GLCM, we use the following function:
stats = graycoprops(glcm, properties)
As properties we can use 'all' for all descriptors, or only some of them by stating
{'Contrast', 'Homogeneity', 'Energy', 'Correlation'}.
Let's now calculate the descriptors for the image wall.
>> I = imread('wall.jpg');
>> GLCM = graycomatrix(I, 'NumLevels', 256);
>> GLCM_n = GLCM/sum(GLCM(:));      % normalized matrix
>> stats = graycoprops(GLCM, 'all');
>> contrast = stats.Contrast;
>> corr = stats.Correlation;
>> energy = stats.Energy;
>> hom = stats.Homogeneity;
These values can be joined into one feature vector for the image and used for similarity
comparison with other images. As the distance function it is best to use the Euclidean
distance. To calculate the Euclidean distance between two vectors we can use the Matlab
function norm(x-y). A short example is given below.
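For instance, comparing two textures could look like this (image names follow Assignment 11.2; the rgb2gray guard is only needed if the images are stored in color):
>> I1 = imread('wall.jpg');        if size(I1,3) == 3, I1 = rgb2gray(I1); end
>> I2 = imread('brown-hair.jpg');  if size(I2,3) == 3, I2 = rgb2gray(I2); end
>> s1 = graycoprops(graycomatrix(I1), 'all');
>> s2 = graycoprops(graycomatrix(I2), 'all');
>> fv = @(s) [s.Contrast s.Correlation s.Energy s.Homogeneity];
>> d = norm(fv(s1) - fv(s2));      % smaller distance = more similar texture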
Assignment 11.2:
Load the images brown-hair and blonde-hair and extract their texture descriptors. Observe
the obtained values and compare them with the values obtained from the image wall. What
can you conclude? Measure the distance between the different descriptors using the
Euclidean distance. Is the descriptor effective?
11.1.3 Shape descriptors
There are two main approaches to describing shapes in images: description of the shape
region and description of the boundary (contour) around the shape. Most of the shape
descriptors that work in 2D are computed on a binary image. Please read again
section 7.2 in the chapter on Measurements, Measuring in binary images. There we already
explained different properties of shapes that can be used as shape descriptors, like
Perimeter, Area, Curvature, etc. We also introduced the regionprops function that
describes different shape properties of regions in an image.
For the description of the contour of a shape we would like to introduce a very efficient
method called Fourier descriptors. They have several properties that make them valuable
for shape description: they are insensitive to translation, rotation and scale changes.
The boundary is represented as a sequence of K coordinate pairs written as complex
numbers, s(k) = x(k) + j·y(k). Now we can calculate the 1D Fourier transform of the
contour s(k), as in Chapter 4:
a(u) = Σ_{k=0..K-1} s(k) · e^(-j2πuk/K)
The complex coefficients a(u) are called the Fourier descriptors of the boundary. To
restore s(k) we need to calculate the inverse Fourier transform.
s(k) = (1/K) · Σ_{u=0..K-1} a(u) · e^(j2πuk/K)
Using the inverse Fourier transform, we can also restore s(k) using far fewer points than
were originally used. We can limit ourselves to using only the first P coefficients in the
reconstruction. In that case we use the following equation for the reconstruction:
ŝ(k) = (1/P) · Σ_{u=0..P-1} a(u) · e^(j2πuk/K)
Notice that we are still using the same number of boundary points (K), but only P terms
for the reconstruction of each point. By doing this we are limiting the high frequencies that
are responsible for detail, while the low ones that determine the global shape remain.
To calculate the Fourier descriptors in Matlab, we can use the functions frdescp and
ifrdescp provided with this practicum.
To compute the Fourier descriptors of a boundary s we use frdescp, and to reconstruct the
boundary from them we use ifrdescp:
z = frdescp(s)
s = ifrdescp(z, nd)
z are the Fourier descriptors computed by frdescp;
nd is the number of descriptors used to compute the inverse; nd must be an even
integer no greater than length(z).
Let's now describe the shape of the object from Figure 11.5 a). The first thing we need to
do is to extract a contour from the object. We can do it in the following way:
>> b = bwboundaries(I, 'noholes');              % extract the contours
>> b = b{1};                                    % there is only one boundary
>> bim = bound2im(b, size(I,1), size(I,2));     % create an image from that contour
Once we have extracted the boundary, Figure 11.5 b), we can see that it has 840 points
(K = 840). We can now calculate the Fourier descriptors from that contour:
>> z = frdescp(b);
Now, to check whether the computed values are correct, let's calculate the inverse
transformation using only half of the descriptors (P = 420):
>> half_points = 420;
>> s_half_points = ifrdescp(z, half_points);
>> I_half_points = bound2im(s_half_points, size(I,1), size(I,2));
If we now vary the number of Fourier descriptors used for the reconstruction, we obtain
the results shown in Figure 11.5 c)-f). We can conclude that even a much reduced number
of Fourier descriptors manages to describe the shape accurately.
Figure 11.5 Shape contour reconstruction using Fourier descriptors: a) original shape,
b) extracted boundary (840 points), c) reconstructed contour using 420 Fourier descriptors,
d) using 84, e) using 28, f) using 8 Fourier descriptors
The main idea of the local approach is to divide the image into small regions and process
each of these regions separately. In that way we can handle problems like occlusion of
objects and background clutter in images. For the calculation of similarity, each region
votes for similar images, and the images that get the most votes are considered most
similar to the query image. The descriptors applied to each region can also be the color,
texture or shape descriptors explained before, but the most common technique is keypoint
description. As keypoints we consider the corners and blobs explained in Section 10.2.
Currently the best-known keypoint descriptor is the Scale Invariant Feature Transform
(SIFT), already mentioned in Chapter 9. Please read Section 9.1 carefully before
continuing.
11.2.1 SIFT Detector
space. This technique is used in the Harris-Laplacian detector, so we suggest the reader
check the difference between that method and SIFT.
In order to make the SIFT detector rotation invariant, a dominant orientation is calculated
for every minimum and maximum (keypoint). To do this we need to measure the
orientation θ(x, y) and magnitude m(x, y) in the Gaussian image L(x, y) at the scale closest
to the keypoint's scale.
m(x, y) = sqrt( (L(x+1, y) - L(x-1, y))² + (L(x, y+1) - L(x, y-1))² )
θ(x, y) = tan⁻¹( (L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y)) )
There is no command for SIFT calculation in Matlab, so we need to use the external
toolbox VLFeat mentioned in Chapter 10. We can calculate it using:
[f, d] = vl_sift(I)
I must be a single-precision grayscale image, while f is the matrix of interest points. It
contains one column (frame) per interest point, where a frame is a disk with center f(1:2),
scale f(3) and orientation f(4). The matrix d contains the descriptors; it has one
128-element column per interest point.
Let's now calculate and plot the SIFT features of an image. The keypoints are shown in
Figure 11.8 a) and the descriptors in Figure 11.8 b).
>> I1 = rgb2gray(imread('scene.JPG'));
>> P1 = single(I1);
>> [f1, d1] = vl_sift(P1);
% To show only 150 keypoints out of the ~1200 calculated
>> perm1 = randperm(size(f1,2));
>> sel1 = perm1(1:150);
>> imshow(I1); hold on
>> h11 = vl_plotframe(f1(:,sel1));
>> h21 = vl_plotframe(f1(:,sel1));
>> set(h11, 'color', 'k', 'linewidth', 3);
>> set(h21, 'color', 'y', 'linewidth', 2);
% To overlay the descriptors
>> h31 = vl_plotsiftdescriptor(d1(:,sel1), f1(:,sel1));
>> set(h31, 'color', 'g')
In order to calculate the similarity between two images we need to match a large number
of 128-dimensional descriptors. For that the regular Euclidean distance is used, where d1
and d2 are descriptors from the two different images:
dist(i) = sqrt( Σ_{k=1..128} (d1(k) - d2(k, i))² )
For each descriptor in image I1 the distance to all descriptors in image I2 is calculated
and the following condition is checked:
(distance to the second closest) / (distance to the closest) ≥ 1.5
If the condition is true, the descriptor from I1 is matched to the closest one from I2;
otherwise the descriptor in I1 is not matched at all. This criterion avoids having too
many false matches for points in image I1 which are not present in image I2.
In VLFeat this matching is done by the function
[matches, scores] = vl_ubcmatch(d1, d2)
For each descriptor in d1, the function finds the closest descriptor in d2 (using the
Euclidean distance between them). The index of the original descriptor and of its closest
match is stored in each column of matches, and the distance between the pair is stored in
scores. The matches can be plotted, as in Figure 11.9, using the function:
plotmatches(I1, I2, d1, d2, matches)
Now let's select one of the objects from the image scene and match a model of that object
with the original image:
>> I2 = rgb2gray(imread('model.jpg'));
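The remaining steps might look as follows (a sketch; the ratio threshold 1.5 follows the criterion above):
>> P2 = single(I2);
>> [f2, d2] = vl_sift(P2);
>> [matches, scores] = vl_ubcmatch(d1, d2, 1.5);   % distance-ratio test
>> fprintf('%d tentative matches found\n', size(matches,2));
>> plotmatches(I1, I2, d1, d2, matches);           % visualize, as in Figure 11.9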
The result is presented in Figure 11.9 and shows that many keypoints are matched
correctly. However, we can also see many false matches: keypoints from the model that are
matched elsewhere in the scene. For that reason, further refinement of the results, also
called outlier rejection, is needed. Also, the keypoints belonging to one object need to be
clustered, and for that the Hough transform is used. Finally, the exact position of the object
can be obtained.
Assignment 11.5:
Load the image scene2 and extract its SIFT features. Match that image with the image
model, which shows one of the objects in the scene. Show the matched points, group them
together using the Hough transform and try to estimate the position of the object model in
the scene.