
2011 24th SIBGRAPI Conference on Graphics, Patterns, and Images Tutorials

High Level Computer Vision using OpenCV


Maurício Marengoni and Denise Stringhini
Faculdade de Computação e Informática
Universidade Presbiteriana Mackenzie
São Paulo, Brazil
dstring,[email protected]

Abstract—This paper presents some more advanced topics in image processing and computer vision, such as Principal Components Analysis, Matching Techniques, Machine Learning Techniques, Tracking and Optical Flow, and Parallel Computer Vision using CUDA. These concepts will be presented using the openCV library, which is a free computer vision library for C/C++ programmers available for the Windows, Linux, MacOS and Android platforms. These topics will be covered considering not only theoretical aspects; practical examples will also be presented in order to show how and when to use each of them.

Keywords-openCV; parallel computer vision; pattern recognition; computer vision;
I. INTRODUCTION

The contents of this article follow from a previous tutorial presented at the 2008 edition of SIBGRAPI as an introductory course in computer vision using openCV. The idea in this article is to study more advanced topics related to pattern recognition, computer vision, and parallel computer vision. The material presented here covers topics beyond basic image processing and low-level vision techniques, so that students who have already taken a first course in computer vision can advance their knowledge and first-year graduate students can see more specific computer vision applications. Students will also learn how to design these applications using a free computer vision library (openCV) and how to create an application using the processors available in the GPU, which comes in many desktop and notebook computers today.

We start this tutorial by presenting a short review of some important functions in low-level vision, mainly to present students with the openCV functions for each of them.

II. LOW LEVEL VISION - SPATIAL DOMAIN

This section presents a summary of some basic functions used in low-level vision processes. These functions are presented without details, so the reader is advised to check more background and details in [1], [2], [7].

A. Threshold

One of the basic functions in low-level vision is the threshold function. This function takes an image and a value K for the threshold and computes an output image based on the threshold type being used. The openCV function is:

double cvThreshold(src, res, K, max, type);
In this case src is the source image, res is the resulting image, K is the threshold value, max is the maximum value expected in the source image and type defines the way the output image is computed. If the intensity in the source image is greater than K then:
• CV_THRESH_BINARY: res = max, otherwise 0.
• CV_THRESH_BINARY_INV: res = 0, otherwise max.
• CV_THRESH_TRUNC: res = K, otherwise src.
• CV_THRESH_TOZERO_INV: res = 0, otherwise src.
• CV_THRESH_TOZERO: res = src, otherwise 0.
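As a minimal sketch (assuming an 8-bit grayscale image in a file named input.jpg, a hypothetical name), a binary threshold at K = 128 could be written as:

IplImage *src = cvLoadImage("input.jpg", CV_LOAD_IMAGE_GRAYSCALE);
IplImage *res = cvCreateImage(cvGetSize(src), IPL_DEPTH_8U, 1);
/* pixels above 128 become 255, all others become 0 */
cvThreshold(src, res, 128, 255, CV_THRESH_BINARY);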

B. Histograms

A histogram is a function that computes frequencies. When considering image histograms, the frequency of pixels at each intensity level or intensity range is computed. It can be used for operations such as comparisons, segmentation, compression, etc. The openCV call for computing a histogram is:

void cvCalcHist(src, his, add, mask);

Here, src is a single-channel input image, his is the histogram computed for the given image, and add and mask are optional. add is 0 by default, so his is erased before the histogram is computed; if add is set to 1 the histogram accumulates the values over more than one image. mask is a Boolean matrix used to select the part of the input image where the histogram should be computed.
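A minimal sketch of a 256-bin grayscale histogram, assuming a single-channel image already loaded into src (the histogram object itself must be created with cvCreateHist before cvCalcHist is called):

int bins = 256;
float range[] = {0, 256};
float *ranges[] = {range};
CvHistogram *his = cvCreateHist(1, &bins, CV_HIST_ARRAY, ranges, 1);
/* cvCalcHist expects an array of image pointers, one per histogram dimension */
IplImage *planes[] = {src};
cvCalcHist(planes, his, 0, NULL);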
C. Filtering

Filtering in the spatial domain has several applications and can be used to smooth images and remove noise, or even to enhance transitions and help find edges. OpenCV has two basic functions that can be used for these tasks:

void cvSmooth(src, res, type, p1, p2, p3, p4);
void cvFilter2D(src, res, kernel, center);

In both functions src and res are the input and output images. type defines the filter type being used, which can be: CV_BLUR (mean filter), CV_BLUR_NO_SCALE (summation), CV_MEDIAN (median value), CV_GAUSSIAN (gaussian filter) and CV_BILATERAL (bilateral 3x3). The parameters p1 to p4 are related to the filter type and should be checked in the reference manual [7]. The kernel is the filter's mask and center is the point used as the mask's center, by default cvPoint(-1,-1). The kernel can be defined as:

CvMat *filt;
int side = 5;
int total = 256;
double kernel[] = { 1,  4,  6,  4, 1,
                    4, 16, 24, 16, 4,
                    6, 24, 36, 24, 6,
                    4, 16, 24, 16, 4,
                    1,  4,  6,  4, 1 };
...
/* normalize the 5x5 Gaussian mask so its entries sum to 1 */
for(int i = 0; i < side*side; i++){
    kernel[i] = (1./total)*kernel[i];
}
filt = cvCreateMatHeader(side, side, CV_64FC1);
cvSetData(filt, kernel, side*8);
...
cvFilter2D(src, res, filt, cvPoint(-1,-1));

D. Fourier Transform

Another way to filter an image is to perform the convolution operation in the frequency domain. The first step is to convert the image to the frequency domain using the Fourier Transform. The Fourier transform itself requires a set of other operations: the source image has to be embedded in a larger image and padded with zeros in order to avoid boundary effects. The call in openCV for the discrete Fourier transform is:

cvDFT(src, res, CV_DXT_FORWARD, complexInput->height);

Src is the input image already embedded and padded with zeros and res is the image converted to the frequency domain. The whole code sequence is presented below:

int dft_Y, dft_X;
CvMat *fft, tmp;
IplImage *im_Real, *im_Imagi, *complexInput;
...
dft_Y = cvGetOptimalDFTSize(src->height-1);
dft_X = cvGetOptimalDFTSize(src->width-1);
fft = cvCreateMat(dft_Y, dft_X, CV_64FC2);
im_Real = cvCreateImage(cvSize(dft_X, dft_Y),
                        IPL_DEPTH_64F, 1);
im_Imagi = cvCreateImage(cvSize(dft_X, dft_Y),
                         IPL_DEPTH_64F, 1);
complexInput = cvCreateImage(cvGetSize(src),
                             IPL_DEPTH_64F, 2);
cvGetSubRect(fft, &tmp, cvRect(0, 0,
             src->width, src->height));
cvCopy(complexInput, &tmp, NULL);
if(fft->cols > src->width){
    cvGetSubRect(fft, &tmp, cvRect(src->width,
                 0, fft->cols - src->width, src->height));
    cvZero(&tmp);
}
cvDFT(fft, fft, CV_DXT_FORWARD, complexInput->height);

E. Finding Edges

A typical way to compute edges in an image is to find local variations in intensity levels (gradients). Some of the openCV functions that compute these derivatives and return an image with the possible boundaries are:

cvSobel(src, res, xorder, yorder, mask);
cvLaplace(src, res, mask);
cvCanny(src, res, low, high, mask);

The cvSobel function computes the derivatives in the X and/or Y directions; the xorder and yorder values determine the derivative's order (at least one of them should be more than 0 and at most 2), src is the input image (8-bit), res is the output image (16-bit) and mask is the filter's size (supported values are 1, 3, 5 and 7). A special value for the mask size is given by CV_SCHARR, which computes the Scharr filter for a 3x3 mask. The cvLaplace function computes the second derivatives of an image. The cvCanny function implements the Canny edge detector, a well known technique to find edges. In this method, if a pixel has a value above the high threshold then it belongs to an edge, if the value is below the low threshold it does not belong to an edge, and if it is in between it belongs to an edge only if it is connected to a pixel with a value above the high threshold. The mask here is similar to the one in cvSobel.
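As a small illustrative sketch (assuming a grayscale IplImage already loaded into src; the thresholds are arbitrary choices), the Canny detector could be called as:

IplImage *edges = cvCreateImage(cvGetSize(src), IPL_DEPTH_8U, 1);
/* low threshold 50, high threshold 150, 3x3 Sobel aperture */
cvCanny(src, edges, 50, 150, 3);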
F. Basic Segmentation

Segmentation is an important operation in image processing and computer vision because it groups pixels into more meaningful regions, which can be used for other tasks such as recognition. There are several ways to segment an image; three openCV methods for segmentation will be presented.

Region Growing

The region growing method is based on similarity among pixels. The idea is that, given a seed point (a pixel in the image), the method checks the neighbors of this point. If a neighbor has a similar intensity value the method marks the neighbor as being from the same region and uses the marked pixels as new seeds. The region growing method is implemented in openCV using the cvFloodFill function:

cvFloodFill(src, seed, value, low, high, comp, flags, mask);

As usual, src is the source image, seed is the point where the method starts the region growing process, value is the value used to mark the region, low and high are the limits within which a neighbor can be accepted as belonging to the region, comp is a connected-components structure that holds the region's statistics, flags defines a set of parameters for the method (the connectivity, 4 or 8, the relative value for filling the region, etc), and mask can be used as an output image if provided.
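A minimal region-growing sketch (the seed coordinates, the fill value of 255 and the tolerances of ±10 gray levels are arbitrary illustration values):

CvConnectedComp comp;
/* grow from pixel (100,100); accept neighbors within +/-10 gray levels and paint the region white */
cvFloodFill(src, cvPoint(100, 100), cvScalarAll(255),
            cvScalarAll(10), cvScalarAll(10), &comp, 4, NULL);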
Background Subtraction

When one needs to segment an image to check what has changed over time, a simple operation called background subtraction can be used. There are several ways to perform this task; the simplest is just to compute an image difference. In this case all background pixels will be marked with 0 or a low value, which can be set to zero using the threshold function. OpenCV performs the subtraction using the cvAbsDiff function:

cvAbsDiff(src1, src2, res);

In this case src1 and src2 are the source images and res is given by res = |src1 − src2|.
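A short sketch combining the two steps described above (background and frame are assumed to be two grayscale frames of the same size; the threshold of 30 is an arbitrary choice):

cvAbsDiff(background, frame, res);
/* keep only differences larger than 30 gray levels */
cvThreshold(res, res, 30, 255, CV_THRESH_BINARY);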
Watershed Segmentation

The watershed algorithm is a split-and-merge region method: it first uses intensity levels to find groups of small regions and after that uses a set of markers in a mask to group the small regions into areas with similar properties, which are computed from each marker. The watershed method is computed in openCV using the function:

cvWatershed(src, markers);

Again, src is the input image and markers is an image with the same size as src where the regions are marked, so the method can group smaller regions and segment the image properly.

Fig. 1. An ideal contour around an object of interest.

III. IMAGE MATCHING

Pattern recognition is an area of computer vision where one tries to find out whether there is a known pattern in a given image. The process that checks the image for a pattern's possible locations is called image matching. Matching techniques are the simplest way to do pattern recognition. This section presents two different approaches for matching: contour matching and template matching.

A. Contour Matching

After segmentation an image is composed of a set of groups of pixels, where each group represents a region. This segmented data can be transformed into a compact form that facilitates the region's description and helps to compare and match it with a given pattern [8]. When thinking about a contour one considers an object of interest and a line surrounding it as an ideal contour (although not necessarily a closed line), as shown in Figure 1. A contour can be represented in different ways; two of the most common ways to represent a contour are polygons and Freeman chain codes [1].

A boundary can be represented by a connected sequence of straight line segments, each with a specified length and direction. This type of representation is called a Freeman chain code; it uses 4- or 8-connectivity and the segment's direction is coded following a numbering code. Figure 2 presents these two methods: on top the Freeman chain code, with the green marker setting the starting point and the numbering code at the right; on the bottom the polygon given by the set of points at the corners of the polygon.

Fig. 2. Contour representation: on top the Freeman chain code and on bottom a set of points for a polygon.

In openCV contours are usually computed from a binary image, where it is easier to define contrast. The function used to compute the contours is:

int cvFindContours(src, sto, first, header, mode, method);

The input image, src, is a binary image, sto is a memory location where the contours will be written, first is the first contour in a sequence, header gives the size of the object being retrieved and can be either sizeof(CvContour) or sizeof(CvChain), depending on the method, and mode defines the data structure used to store the contours and can be:

• CV_RETR_EXTERNAL: retrieves only the most external contour.
• CV_RETR_LIST: retrieves all contours as a list where the first element is the most internal contour.
• CV_RETR_CCOMP: retrieves all contours and organizes them as a two-level hierarchy, one level with external contours and another with internal contours.
• CV_RETR_TREE: retrieves all contours and organizes them as a tree structure with the outer contour as the root.

The method parameter is related to the way openCV stores the contour in memory. There are five ways to do this task:

• CV_CHAIN_CODE: uses the Freeman chain code.
• CV_CHAIN_APPROX_NONE: uses the points determined by the Freeman chain code.
• CV_CHAIN_APPROX_SIMPLE: compresses the Freeman chain code and returns the ending points.
• CV_CHAIN_APPROX_TC89_L1 or CV_CHAIN_APPROX_TC89_KCOS: uses a special chain approximation algorithm.
• CV_LINK_RUNS: can only be used with CV_RETR_LIST and represents contours as links of horizontal segments.

The value returned by the function is the total number of contours found. One important step before calling cvFindContours is the binarization of the input image. This process usually requires some filtering afterwards to clean the image, otherwise too many contours might be found. Figure 3 shows a simple and clean binary image (on the left side) and the contours found (on the right); there are a total of 7 contours in the right image. Figure 4 shows the same process applied to a complex (meaning real) image; the contour structure, in this case, has 3790 contours, since each small point or region in a complex image will have a contour around it.

Fig. 3. A simple binary image (left) and the contours found using cvFindContours (right). The function found 7 contours.

Fig. 4. A complex binary image (left) and the contours found using cvFindContours (right). The function found 3790 contours.

Once a set of contours is found it is possible to go through each contour using the data structure defined by CvSeq (see [1] for details) and select the desired contour for contour matching. There are several ways to compare two contours using openCV; the choice depends on the pattern contour and on the contour found in the image. For instance, Freeman chain codes are translation invariant; if the length used to compute the Freeman chain is scaled up or down, the chain itself can also be scale invariant, and if the number coding is rotated accordingly, then it can also be rotation invariant. Other ways to compare contours are:

• Compute the contour's perimeter and compare the results: cvContourPerimeter(contour).
• Compute the contour's area and compare the results: cvContourArea(contour, slice).
• Compute the contour's moments and compare the results: cvContourMoments(contour, moments).

In these functions, contour is a contour structure computed by cvFindContours; the slice in cvContourArea is a parameter that allows computing the area of only part of the contour, otherwise slice should be specified as CV_WHOLE_SEQ. The moments argument in cvContourMoments is a data structure of type CvMoments and should be previously allocated.
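A minimal sketch that ties these calls together (binary is assumed to be an already thresholded single-channel image; the area filter of 100 pixels is an arbitrary illustration value). Note that cvFindContours modifies its input image:

CvMemStorage *sto = cvCreateMemStorage(0);
CvSeq *first = NULL;
int n = cvFindContours(binary, sto, &first, sizeof(CvContour),
                       CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE, cvPoint(0,0));
/* walk the contour list and keep only reasonably large regions */
for(CvSeq *c = first; c != NULL; c = c->h_next){
    double area  = fabs(cvContourArea(c, CV_WHOLE_SEQ));
    double perim = cvContourPerimeter(c);
    if(area > 100)
        printf("area = %.1f, perimeter = %.1f\n", area, perim);
}
cvReleaseMemStorage(&sto);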
A region moment representation is an interpretation of a binary image as a probability density function of a 2D random variable. This random variable has properties that can be described using statistical characteristics, which are called moments [9]. A moment of order (p, q) depends on scaling, translation and rotation; in digitized images it can be computed as shown in Equation 1:

m_{pq} = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} i^p j^q f(i,j)   (1)

where i and j are the coordinates of the points inside the region and f(i,j) is the intensity level of the binary image at position i and j. A moment can be made translation invariant if it is computed with respect to orthogonal axes passing through the region's center of mass; these moments are called central moments and they are given by Equation 2:

\mu_{pq} = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} (i - x_c)^p (j - y_c)^q f(i,j)   (2)

where x_c and y_c are the coordinates of the region's center of mass.

B. Template Matching

A simple way to do pattern recognition is just to verify whether a pattern shows up in the source image. The process makes an exhaustive search for the template in the source image and marks each position where the pattern is found. The search might be slow for large source images, but the process is simple and gives good results. The openCV function that performs template matching is:

cvMatchTemplate(src, template, result, method);

In this function src is the source image, template is the pattern to be found, result is an image showing the results of the matching and method describes the way the template matching is performed.

There are six different methods to perform template matching in openCV:

• CV_TM_SQDIFF: this method computes the squared difference between the template and the source image; a best match in this case has a value of 0. Equation 3 is used to compute the matching.

R_{sqdiff}(x,y) = \sum_{i,j} [t(i,j) - f(x+i, y+j)]^2   (3)

• CV_TM_CCORR: this method computes a correlation between the template and the source image at each position; a best match in this case has a large value (not necessarily the maximum, depending on noise). Equation 4 is used to compute the correlation.

R_{ccorr}(x,y) = \sum_{i,j} [t(i,j) \cdot f(x+i, y+j)]^2   (4)

• CV_TM_CCOEFF: this method is called correlation coefficient matching; it computes the correlation between the template subtracted from its mean and the source image subtracted from its mean at each position, considering only the template's size. Notice that the correlation might give poor results when the image energy, \sum f^2(x,y), varies with position. Equation 5 is used to compute the correlation coefficient; a good match in this case also has a large value.

R_{ccoeff}(x,y) = \sum_{i,j} [(t(i,j) - \bar{t}) \cdot (f(x+i, y+j) - \bar{f})]^2   (5)

The other three methods available for cvMatchTemplate are normalized methods, selected by CV_TM_SQDIFF_NORMED, CV_TM_CCORR_NORMED and CV_TM_CCOEFF_NORMED. These methods work better when there are lighting differences between the template and the source image [10]. In all cases the normalization is performed by dividing the method by the normalization factor presented in Equation 6.

NORM(x,y) = \sqrt{\sum_{i,j} t^2(i,j) \cdot \sum_{i,j} f^2(x+i, y+j)}   (6)

Figure 5 shows an application of cvMatchTemplate; the template used is shown at the image's top right and the colored squares show where each method found the template. Notice the problem presented by the correlation method.

Fig. 5. The result of cvMatchTemplate using the template at the top right. Notice the problem with the correlation method (the green square).

Figure 6 shows the images obtained by the six methods used by cvMatchTemplate. At the top, from left to right, are the squared difference, correlation and correlation coefficient results, and at the bottom their corresponding normalized results.

Fig. 6. The output of cvMatchTemplate for each available method. The top row presents squared difference, correlation and correlation coefficient respectively. The bottom row presents the corresponding normalized methods.
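A small sketch of a complete matching step (src and templ are assumed to be already loaded grayscale images; with a normalized squared-difference method the best match is the minimum of the result image):

int rw = src->width  - templ->width  + 1;
int rh = src->height - templ->height + 1;
IplImage *result = cvCreateImage(cvSize(rw, rh), IPL_DEPTH_32F, 1);
cvMatchTemplate(src, templ, result, CV_TM_SQDIFF_NORMED);
/* the location of the minimum value is the best match for SQDIFF methods */
double minv, maxv;
CvPoint minloc, maxloc;
cvMinMaxLoc(result, &minv, &maxv, &minloc, &maxloc, NULL);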
IV. PATTERN RECOGNITION USING MACHINE LEARNING TECHNIQUES

Learning is related to changes in an adaptive system, such that the system can perform similar tasks more efficiently over time. When thinking about pattern recognition, the adaptive system has to be capable of deciding which class a new image belongs to among the patterns it has seen so far. Figure 7 shows the idea of learning in pattern recognition: if a system receives a set of images containing cars, faces and houses, learning means that the system is capable of defining regions separating these images and, if a new image is presented, the system can classify it into one of the possible classes.

Fig. 7. The idea of machine learning applied to pattern recognition.

Before going through a specific method, let's first present some important concepts related to the data and some common functions used by the majority of the learning methods available in openCV.

Learning methods in computer vision can either work with the images themselves or with a vector of features extracted from these images. For instance, Figure 8 shows the two representations for some digits extracted from license plates. On the top part are the images themselves and the coding used for feature extraction, and on the bottom part the features as a vector representation for each image.

Fig. 8. The two representations for learning: the images themselves and a feature vector extracted from the images.

The data presented to the learning methods in openCV always has to be in matrix form. If the images themselves are being used to represent the data, all images must have the same dimensions and all images must be converted into vectors; so, if there are K images available and each image has size M by N, the data matrix in openCV will have size K × MN. If the images are represented by feature vectors with P features, then the data matrix will have size K × P.

Thus the data preparation can be decomposed as:
1) Clean the images: if necessary, filter the images to remove or reduce noise.
2) Adjust contrast: if necessary, stretch the contrast of every image over the whole intensity range, so as to avoid dark or light images.
3) Resize the images: all images must have the same dimensions if they will be used as training data.
4) Convert format: images should be converted into vectors.
5) Mix the data: in order to avoid any kind of bias, data from different classes should be mixed in the data matrix.
These steps should also be followed even if the data is a feature vector, eliminating the steps that are not required, such as resizing.
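A minimal sketch of steps 3 and 4 for image data (the 20x20 training size and the variable names are illustrative assumptions; each resized image becomes one row of a K x 400 float matrix):

int K = 100;                 /* number of training images, assumed */
int M = 20, N = 20;          /* common training size, assumed      */
CvMat *data = cvCreateMat(K, M*N, CV_32FC1);
IplImage *small = cvCreateImage(cvSize(N, M), IPL_DEPTH_8U, 1);
for(int k = 0; k < K; k++){
    IplImage *img = images[k];              /* hypothetical array of loaded images */
    cvResize(img, small, CV_INTER_LINEAR);  /* step 3: resize                      */
    for(int r = 0; r < M; r++)              /* step 4: flatten into one row        */
        for(int c = 0; c < N; c++)
            cvmSet(data, k, r*N + c, cvGetReal2D(small, r, c));
}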
The learning methods in openCV have specific calls for each phase of learning and for using the system:
• method→train: trains the system using the method.
• method→save: saves the learned system into a file, in xml or yml format.
• method→load: loads a learned method from the file passed.
• method→predict: makes a prediction for new data.
OpenCV has several machine learning techniques that can be used for pattern recognition; a sample of them will be presented here.
A. K-means

K-means is not exactly a pattern recognition method but a clustering method that tries to find clusters in the data. One way to perform pattern recognition using K-means is to first find the clusters and then look for the pattern in each cluster.

If the number of clusters (K) in the data is not known, the user has to make a guess and adjust this number later, if necessary. The method iteratively searches for K centers of mass inside the data. This is one of the most used techniques for grouping [1]. The method itself works as follows:
1) Input: the data and the number of groups (K).
2) Randomly define K centers of mass in the data.
3) Associate each data entry with a center of mass.
4) Compute the new centers of mass.
5) If the error between the previous centers of mass and the new ones is below a certain limit, accept and terminate; otherwise return to step 3.
The K-means algorithm has the following problems:
• There is no guarantee that it will find the best centers of mass for each group, but it will always converge.
• The user has to provide the number of clusters; the method will not give the best number of clusters in the data.
• The method assumes that the covariance in the grouping space is not important or that it was normalized.

Figure 9 shows at left an image with random points and at right the seven clusters found by the K-means method.

Fig. 9. At left an image with random points and at right the seven clusters found.

The openCV function for the K-means algorithm is:

cvKMeans2(data, clusters, result, criteria);

In this case data is a multidimensional input matrix with the samples to cluster, clusters is the number of clusters given by the user, result is a matrix where each sample's cluster is marked and criteria is a termination criterion passed to the K-means algorithm.
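A minimal clustering sketch (assuming points is an N x 2 matrix of 32-bit float samples already filled in; the 7 clusters and the stopping criteria are illustrative choices):

int N = 500, K = 7;
CvMat *points = cvCreateMat(N, 2, CV_32FC1);   /* one 2D sample per row, assumed filled */
CvMat *labels = cvCreateMat(N, 1, CV_32SC1);   /* cluster index assigned to each sample */
cvKMeans2(points, K, labels,
          cvTermCriteria(CV_TERMCRIT_EPS + CV_TERMCRIT_ITER, 10, 1.0));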

B. Decision Trees

Another machine learning technique is the decision tree. In a decision tree each node represents a random variable and the arcs leaving the node represent the possible outcomes for that random variable. The simplest decision tree is the binary decision tree, where the outcomes for the random variables are true or false. A binary decision tree can be used either for classification or for regression [11].

Usually a binary decision tree uses as data a vector of features extracted from images. Figure 10 shows an idea for digit classification based on features extracted from digit images. The top node checks whether the feature "top bar" exists in the image being classified. If it exists then the decision tree checks for the bottom bar, otherwise it checks for the middle bar, and so on. The learning process for decision trees tries to find the most discriminating features in order to create the simplest tree possible.

Fig. 10. A model for a decision tree on image classification.

The features are evaluated using three possible measures:
• Entropy:

E(feature) = -p \log_2(p) - (1-p) \log_2(1-p)   (7)

• Gini index:

G(feature) = 1 - \sum_{i=1}^{m} f_i^2   (8)

• Misclassification:

M(feature) = 1 - \max(P(w_j))   (9)

where p represents the fraction of the data that has the feature. In the Gini index, f_i is the fraction of items labelled with value i using the feature. In the misclassification measure, P(w_j) represents the fraction of patterns in class w_j using the feature.

The interpretation of the results of a decision tree is straightforward, which makes it a widely used method. Most implementations of this method allow missing information, which is almost always the case when working with feature vectors. The method in openCV for decision trees is called as:

decTrees.train(data, type, truevalues, features, points,
               vartypes, missing, parameters);

In this method data is the training data, type indicates a row or column matrix, and truevalues is a vector indicating the expected (true) classification of the data. The remaining parameters are optional: features, if not 0, is a mask indicating the features that must be used in the training process; points, if not 0, indicates the points that must be considered in the training process; vartypes indicates whether the variables are categorical or ordered; and finally parameters is used to set tree parameters such as depth, missing data, etc (see more details in openCV's sample directory, in the mushroom.cpp file).
C. K Nearest Neighbor

The K Nearest Neighbor (KNN) is a comparative method for pattern recognition. There is actually no learning in this method; the idea, as shown in Figure 11, is to compare any new image with all images available in the data, take the K nearest images (for any small and possibly odd value of K) and label the new image with the label of the majority among those K. The distance can be computed using any measurement method, such as the Euclidean or Mahalanobis distance.

Fig. 11. An example of K nearest neighbor with K=5. The new image (red point) is compared with all points and the 5 closest are kept (three blue, one green and one yellow). The new point is classified as blue, which is a 7.

The K nearest neighbor method requires only three things: the value of K, the labeled images (data) and the metric to measure closeness. Notice that the KNN method does not process data until a new unlabelled image is presented to the system, and once the new image is labelled the computation is discarded. The code in openCV for the KNN method is:

CvKNearest knn(traindata, trainclasses, sample, regression, K);
response = knn.find_nearest(samples, results,
                            neighbors, n_responses, distances);

In this case traindata is the set of images, trainclasses are the labels for each image, sample is a vector with the samples in the data that have to be considered (it can be set to 0), regression is a boolean variable that indicates whether the method is being used for regression or not, and K is the number of neighbors to consider. In the find_nearest method samples are the new images to be classified, results are the answers for the samples (the labels), neighbors is a vector indicating the neighbors for each image in samples, n_responses is the class of each neighbor found and distances is a vector with the distance to each neighbor found.

D. Boosting

The idea of Boosting is to combine simple decision rules into an accurate decision system. In openCV the method implemented for Boosting is called AdaBoost [1], which uses a simple (weak) classifier (a binary decision tree) to implement the Boosting method. A weak classifier is a decision method which is capable of classifying things with a probability just above chance (50%). The combination of several weak classifiers leads to a decision system that typically performs at or near the state of the art [1]; performance can be improved only with more specifically designed systems. OpenCV implements four types of boosting for its AdaBoost algorithm:
• DISCRETE: for discrete data.
• REAL: uses confidence predictions and works well with categorical data.
• LOGIT: works well with regression.
• GENTLE: works better on regression data because it puts less weight on outliers.

The idea in AdaBoost is to find a set of simple binary decision trees where each tree has a weight and the classifications of these trees can be combined into a single, strong classification system, as presented in Figure 12. The method in openCV for AdaBoost is called as:

boost.train(data, type, trvalues, features, points,
            vartypes, missing, parameters, update);

The parameters, as expected, are similar to the parameters used by the binary decision trees. AdaBoost has an extra parameter, update, which is optional and set to 0 by default; in this case it trains a new set of weak classifiers from scratch. The parameters passed to boost.train (through the parameters structure) are the type of boosting used, the number of weak classifiers used, the limit of percentage that a point should have to be considered, the depth of the tree, etc. Check the example presented in the file letter_recog.cpp in the samples/c directory of the openCV folder, and the openCV book [1] or the openCV manual [7].
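A minimal sketch using the CvBoost wrapper class, reusing the data, responses and var_type matrices from the decision tree sketch above (labels are assumed to be binary, since this boosting implementation handles two-class problems; the 100 trees of depth 5 are illustrative parameters):

CvBoost boost;
CvBoostParams params(CvBoost::REAL, 100, 0.95, 5, false, 0);
boost.train(data, CV_ROW_SAMPLE, responses, 0, 0, var_type, 0, params, false);
float label = boost.predict(sample);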
Fig. 12. The idea of AdaBoost and boosting in general: combine simple classifiers using weights to get a strong classifier.

E. Principal Component Analysis

Principal Component Analysis (PCA) is a technique based on linear algebra which is used to find patterns in high-dimensional data. PCA is applied in several areas, such as neuroscience, data compression and computer vision. PCA can also be used to reduce the number of dimensions in the data with no (or little) loss of information. The dimensions in this reduced space are called the principal components and they can be used for object recognition, for example of faces [13] in images.

The analysis using principal components is similar to the K Nearest Neighbors approach. A new image is compared to all images used in the training, but instead of using the whole image the comparison is performed using only the reduced dimensions (principal components) of the images. Again, a distance measure is required to make the comparisons. The algorithm for PCA follows these steps:
1) Get the data.
2) Subtract the mean.
3) Compute the covariance matrix.
4) Compute the eigenvectors and eigenvalues of the covariance matrix.
5) Select the principal components, the ones with the highest eigenvalues.
The PCA methods in openCV are listed below:

cvCalcEigenObjects(nObjs, input, output, ioFlags,
                   ioBufSize, userData, limit, avg, eigVals);
cvEigenDecomposite(obj, eig_count, input, ioFlags,
                   userData, avg, coeffs);
cvEigenProjection(input_vecs, eig_count, ioFlags,
                  userData, coeffs, avg, proj);

In cvCalcEigenObjects, nObjs is the number of objects in the input data, input is the input data, output is the vector with the eigen objects, ioFlags is the input/output flag which indicates whether the computation uses callbacks or not, ioBufSize gives the size in bytes for the i/o buffer (use 0 if unknown), userData is a pointer to the structure used in the callback mode (0 otherwise), limit is the termination criteria used by the method, avg is the averaged object computed by the method and finally eigVals returns the eigenvalues computed by the method. The cvEigenDecomposite function is used to compute the decomposition coefficients for an input object obj; eig_count gives the number of eigen objects, input is the structure with the eigen objects and coeffs are the coefficients for the obj entered; the other parameters are the same as in cvCalcEigenObjects. In cvEigenProjection the object's projection over the eigen subspace is computed; input_vecs holds the input objects, coeffs are the coefficients computed by cvEigenDecomposite and proj is the output computed by cvEigenProjection. More details about Principal Component Analysis can be found in [12].
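A minimal sketch of the first call, without callbacks (the number of images K is an illustrative assumption; input[] is assumed to be filled with K same-sized 8-bit grayscale training images):

enum { K = 10 };                          /* number of training images, assumed */
IplImage *input[K], *output[K], *avg;
float eigVals[K];
CvSize sz = cvGetSize(input[0]);
avg = cvCreateImage(sz, IPL_DEPTH_32F, 1);
for(int i = 0; i < K; i++)
    output[i] = cvCreateImage(sz, IPL_DEPTH_32F, 1);
CvTermCriteria limit = cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, K, 0.01);
cvCalcEigenObjects(K, (void*)input, (void*)output, CV_EIGOBJ_NO_CALLBACK,
                   0, NULL, &limit, avg, eigVals);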

F. Neural Networks

An artificial neural network provides a general method for learning different types of functions, such as real- or discrete-valued functions, from examples [14]. Neural networks are among the most effective classifiers available, although the learning phase might take some time when using the gradient descent learning method [1].

OpenCV has two methods to implement neural networks: Support Vector Machines (SVM) and the Multilayer Perceptron (MLP). Both methods are implemented similarly, so we will go through the Multilayer Perceptron and show how to use it; the reader should check the reference manual [7] for the calls of the SVM methods.

The MLP structure is defined by the number of hidden (intermediate) layers, the number of nodes in each layer and the transition function. Each node in the network (except in the input layer) works by adding up the values that arrive at the node and feeding the transition function that, when activated, computes the node's output. These two structures are presented in Figure 13: the top part shows the neural network structure and the bottom part shows the node's structure.

Fig. 13. The Multilayer Perceptron structure is presented on top. The bottom part shows the node's structure.

For the MLP, openCV implements the backpropagation algorithm using gradient descent for the weight updates. Equation 10 shows the error computed at the output layer, which compares the expected value with the computed value, and Equation 11 computes the weight update for each link in the network.

E = \frac{1}{2} \sum_{k \in Output} (O_k - C_k)^2   (10)

\Delta w_{ij} = \gamma \frac{\partial E}{\partial w_{ij}}   (11)

In Equation 10, O_k is the expected output for node k and C_k is the computed output for node k. In Equation 11, γ is the learning rate (γ ≤ 1), w_{ij} is the weight of the arc connecting node i to node j and Δw_{ij} is the update value for weight ij. The methods in openCV for the MLP are described below:

mlp.create(layer_sizes, function, param1, param2);
mlp.train(input, output, weights, idx, param, flag);
mlp.predict(sample, mlp_response);

In mlp.create, layer_sizes is an array with the number of nodes in each layer; function is the activation function type, which can be IDENTITY, SIGMOID or GAUSSIAN; param1 and param2 are related to the activation function selected, see the reference manual [7] for these parameters. In mlp.train, input is the input data and output is the expected output for each input entry; since some entries might be more significant for training than others, weights is an optional floating-point vector of weights for each input entry; idx is an optional integer vector that indicates whether some input data should not be considered; param sets parameters for the training termination criteria, check the reference manual [7] for this parameter; finally flag also controls the training algorithm and can be any combination of UPDATE_WEIGHTS, NO_INPUT_SCALE and NO_OUTPUT_SCALE, see the reference manual [7] for details. Finally, the parameters for mlp.predict are very simple: sample is the new input data and mlp_response is a vector with the output computed for the new data.
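A minimal sketch using the CvANN_MLP wrapper class (the layer sizes are illustrative: 400 inputs for a 20x20 image, 10 hidden nodes and 3 output classes; input, output and sample are assumed to be CvMat matrices built as described earlier):

int sizes[] = { 400, 10, 3 };
CvMat layer_sizes = cvMat(1, 3, CV_32SC1, sizes);
CvANN_MLP mlp;
mlp.create(&layer_sizes, CvANN_MLP::SIGMOID_SYM, 1, 1);
/* input is K x 400, output is K x 3 with the desired response for each sample */
mlp.train(input, output, NULL, NULL, CvANN_MLP_TrainParams(), 0);
CvMat *mlp_response = cvCreateMat(1, 3, CV_32FC1);
mlp.predict(sample, mlp_response);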
V. TRACKING AND MOTION

Sometimes, when we are interested in computer vision applications, we are not just looking for a pattern in a static image: we want to follow this pattern in an image sequence and learn how it behaves. One option in openCV for tracking objects is the Camshift/Meanshift pair of algorithms. Motion is also an important topic and there is a set of interesting information that can be extracted from an image sequence. Optical Flow is a technique that helps to extract information from an image sequence, and openCV has a set of methods to work with optical flow in order to compute image segmentation and find some other 3D information.

A. Tracking

Tracking is the process of following a pattern in an image sequence, but not necessarily a stream. It would be possible to do pattern recognition in each image (frame), but the process could be slow if information acquired in one image (frame) were not used in the following images (frames). The tracking process attaches knowledge about the object being followed, so the search in the following images (frames) is minimized.

Tracking can be applied in several areas, from security systems to human-computer interfaces. There are methods to forecast the object's position frame by frame, from Kalman filters [15] (available in openCV) to particle filtering processes [16]. The first step required is to find the object parts that are best to track; these are usually corners.
Corner Finding

This technique assumes a search for an object in an image sequence. The method searches for feature points in an image that might be easier to find again in another image. OpenCV has a function based on the Harris and on the Shi and Tomasi corner definitions [1], which are computed using second derivatives, correlation and eigenvalues.

cvGoodFeaturesToTrack is a method which returns the pixel locations that might be easier to find in another image. This technique can also be used in stereo vision [1]:

cvGoodFeaturesToTrack(src, eigImage, temp, corners,
                      c_count, qual, min_d, mask, bl_sz, harris, k);

where src is the input image, eigImage and temp are two scratch images, corners are the corners computed and c_count gives the number of corners found, qual defines the level for the acceptable eigenvalues (always ≤ 1), min_d defines the minimum distance between two corners, mask has its usual meaning, bl_sz defines the region for the correlation, harris selects the Harris corner definition (otherwise Shi and Tomasi is used) and k is the weight for the Harris correlation matrix.
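A minimal sketch of the call (src is assumed to be a grayscale frame; the limit of 200 corners, the quality level 0.01 and the minimum distance of 10 pixels are illustrative values):

#define MAX_CORNERS 200
IplImage *eig  = cvCreateImage(cvGetSize(src), IPL_DEPTH_32F, 1);
IplImage *temp = cvCreateImage(cvGetSize(src), IPL_DEPTH_32F, 1);
CvPoint2D32f corners[MAX_CORNERS];
int c_count = MAX_CORNERS;
/* Shi-Tomasi corners (use_harris = 0), 3x3 block size */
cvGoodFeaturesToTrack(src, eig, temp, corners, &c_count,
                      0.01, 10, NULL, 3, 0, 0.04);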

Mean-Shift and Camshift

MeanShift is a robust method for finding local extrema in an image treated as a density distribution function. This is a process where the mean-shift kernel is convolved with the data and a hill-climbing algorithm is applied to find the maximum [1]. The mean-shift algorithm is given by:
1) Select the window with the object of interest:
   • initial location
   • type (uniform, polynomial, exponential or gaussian)
   • shape (symmetric, skewed, rotated, rounded, rectangular)
   • size
2) Find the window's center of mass.
3) Place the window at its center of mass.
4) Return to step 2 until it converges.
The Camshift (Continuously Adaptive Mean-SHIFT) is based on the Mean-shift algorithm, but the window's size is self-adjusted depending on the object's proximity to the camera. The Mean-shift and Camshift algorithms are capable of tracking any image that can be viewed as a distribution of features; usually color is used as the feature.

When applied, for instance, to face tracking, in each frame the raw image is converted into a color probability distribution using a skin color histogram. The face size and the face center are computed by the Camshift and this information is used to define the search window for the next frame. Figure 14 shows the camshiftdemo.c program working.

Fig. 14. camshiftdemo.c working (histogram, user interface and image captured and followed).
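A minimal per-frame sketch of that loop (prob_img is assumed to be the back-projected skin-color probability image and window the rectangle found in the previous frame):

CvConnectedComp comp;
CvBox2D box;
cvCamShift(prob_img, window,
           cvTermCriteria(CV_TERMCRIT_EPS | CV_TERMCRIT_ITER, 10, 1.0),
           &comp, &box);
/* the converged region becomes the search window for the next frame */
window = comp.rect;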

B. Optical Flow

Optical Flow is a technique used for movement identification in an image sequence without prior knowledge about the image contents. In this technique, typically, the movement itself indicates that something is happening in the image. OpenCV has sparse and dense methods to find movement. Sparse methods require previous knowledge about the points to be tracked, such as the corners described previously. Dense methods associate a speed vector or a pixel displacement with every pixel in the image, so they don't need to know any specific point in the image. In practical applications, however, the dense techniques have a high processing cost, so, unless they are really required, the user should prefer a sparse method. Table I summarizes the optical flow methods available in openCV; for more details on each method check the reference manual [7] or [1].

TABLE I
OPTICAL FLOW METHODS IN OPENCV.

Method                  Type     Command
Lucas-Kanade            Sparse   cvCalcOpticalFlowLK
Lucas-Kanade Pyramidal  Sparse   cvCalcOpticalFlowPyrLK
Horn-Schunck            Dense    cvCalcOpticalFlowHS
Block-Matching          Dense    cvCalcOpticalFlowBM

VI. PARALLEL COMPUTER VISION USING CUDA AND OPENCV

The majority of the computers on the market today have a GPU, but most computer vision and image processing applications are still sequential. The idea behind this topic is to present to the students the possibility of using the OpenCV GPU module in order to launch commonly used OpenCV functions on a GPU and improve overall performance.

This topic covers an introduction to CUDA, the GPU and CUDA architectures and the CUDA programming model; it then presents the basic OpenCV GPU module features, as well as a simple example of how to use the main OpenCV GPU data structure and some functions, and how to calculate the performance speedup.

A. Introducing the CUDA architecture

CUDA (Compute Unified Device Architecture) unifies the programming interface for NVIDIA GPUs. It defines a scalable programming model that can be used to program dozens of different CUDA-enabled NVIDIA GPUs.

First, it is important to understand the differences between the CPU and GPU architectures. While the CPU cores dedicate their millions of transistors to a few sophisticated execution units, the GPU provides up to hundreds of simple execution units (Figure 15). While exploiting parallelism in a multicore CPU requires a small number of independent threads, the CUDA programming model launches hundreds of threads that execute the same code over the GPU cores. This programming model is called the SIMT (Single Instruction, Multiple Threads) model and it is derived from the classical SIMD (Single Instruction, Multiple Data) model.

Fig. 15. CPU x GPU (source: NVIDIA).

Despite their differences, both architectures must coexist in a heterogeneous system, where GPUs are co-processors to the CPUs. The GPU is also known as a type of accelerator resource.

In the source code a special function must be developed that executes on the GPU and is launched from the CPU code. In the CUDA programming model [4], the GPU is called the device while the CPU is called the host. The special GPU function in CUDA is called a kernel and it is compiled separately from the host code by the NVIDIA C compiler (nvcc). Figure 16 shows this heterogeneous programming style.

Fig. 16. Heterogeneous computing (source: NVIDIA).

Figure 16 also depicts an example of a kernel configuration. Each kernel is composed of a grid of thread blocks. A grid is an organized group of blocks that can have up to two dimensions, while a block is a group of threads that can have up to three dimensions. These configurations are defined in the source code during the kernel launch.

This organization allows each thread to know its exact, unique position in the grid, which is usually important to define the portion of data each thread will work on. This programming model is known as SPMD (Single Program Multiple Data), where all threads execute the exact same code on different portions of data. Thus, while launching a kernel the programmer determines how many threads will execute that code and, more than this, how they are organized.

The memory hierarchy is another important feature to deal with while developing a CUDA application. As a co-processor to the CPU, the GPU is connected to the main memory through a PCI Express bus. The data to be processed must be allocated and then transferred between the main and the GPU memory.

The GPU has a memory hierarchy that must be used in order to hide the memory latency while the threads are accessing the data. Figure 17 presents the GPU memory hierarchy. The main memory components are:

Fig. 17. GPU memory hierarchy (source: NVIDIA).

• Registers: the compiler allocates for each thread its own set of registers (with local behavior). They are a limited resource, so an exaggerated use of registers can reduce performance. Usually, the compiler allocates kernel local variables to registers in order to increase performance.
• Shared memory: all the threads in the same block share this memory. It is explicitly allocated by the programmer through the __shared__ directive in the kernel code. In the last generation of GPUs, named Fermi, the shared memory can also be configured as an L1 cache memory (in fact they both share a configurable memory space). Both memories help to reduce the latency while accessing data from the global memory. Like the registers, it is also a limited resource, where the allocation is performed basically by dividing the total amount of shared memory available by the total number of threads.
• Global memory: it is shared by all threads in the same grid. Its allocation is performed in the host code, as are the data copies to and from the host. The Fermi architecture has an L2 cache populated with global memory data that helps in reducing the access latency.
• Constant memory: it is also shared by all the threads in the same grid, but it is a read-only memory. The advantage of using this memory is its reduced memory latency in relation to the global memory.

B. A simple CUDA example

The following example introduces the basic CUDA programming model and its API functions. The example shows a basic matrix add-value operation: a given value is added to each element of a matrix.

int d = 64;              //square matrix dim
int n = d*d;             //number of elements
float x_h[d][d];         //host matrix
float *x_d;              //pointer to the device matrix
size_t size = n * sizeof(float);
cudaMalloc(&x_d, size);
cudaMemcpy(x_d, x_h, size,
           cudaMemcpyHostToDevice);

1) Device memory allocation and data transfer: The CUDA API provides functions to allocate memory on the device and to transfer data to it. The example above presents the two main functions used to accomplish this task: cudaMalloc() and cudaMemcpy(). The first one allocates size bytes in the device memory and returns a pointer to it in its first argument (x_d). The second one makes a copy of the data (size bytes) from the host memory (x_h) to the device memory (x_d). The direction of the copy is defined by the fourth argument: cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost (used later to recover the results from the GPU).

2) Configure and launch the kernel: As mentioned, the threads are organized in a grid of thread blocks. There is a CUDA type, dim3, that can be used to hold the values of each dimension. The dim3 variables are used during the kernel launch to define the grid and block configuration for the kernel execution. In the following example there will be a 4 x 4 grid of 16 x 16 threads each. Thus, the original 64 x 64 matrix will be partitioned into 16 blocks of 256 threads each, totaling 4096 threads, one per matrix element.

int gd = 4;   //square grid dim
int bd = 16;  //square block dim
dim3 gridDim(gd, gd, 1);
dim3 blockDim(bd, bd, 1);
add_value <<<gridDim, blockDim>>>
          (x_d, value, d);

3) The kernel function: The example below shows the add_value function, which executes an add instruction over hundreds of threads in parallel. The modifier __global__ indicates that the function must be executed on the device and launched by the host code. Note that there will be 4096 threads executing the kernel at the same time in the GPU.

__global__ void add_value
(float* a, float value, int d){
    int x = threadIdx.x + blockIdx.x*blockDim.x;
    int y = threadIdx.y + blockIdx.y*blockDim.y;
    int offset = x + y*blockDim.x*gridDim.x;
    if ( x < d && y < d ) a[offset] += value;
}

The cudaMemcpy() copies a block of data that resides contiguously in the global memory. Hence, the first thing to do is to calculate the exact position in the linearized matrix that the thread will work on. This is done using the pre-built variables threadIdx, blockIdx, blockDim and gridDim. In the example, the thread configuration is defined by a bidimensional grid and bidimensional blocks, so we can use the values of the x and y components of these pre-built variables. The goal is to find an offset that indicates how many matrix elements exist before the current thread's matrix element in the linear memory. The formulas in the previous example use the position of the thread in the block, the position of the block in the grid and the sizes in each dimension to compute the final offset. A more detailed example can be found in [6].

After the kernel execution, cudaMemcpy() must be called again to copy the resulting matrix, which is in the device global memory, back to the host memory.
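Continuing the example above, a minimal sketch of this final step would be:

/* copy the updated matrix back to the host and release the device buffer */
cudaMemcpy(x_h, x_d, size, cudaMemcpyDeviceToHost);
cudaFree(x_d);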
C. Building the OpenCV GPU module

The OpenCV GPU module contains a set of OpenCV C++ classes and functions already adapted to run over any CUDA-enabled NVIDIA GPU. Hence it is possible to start using the GPU's performance power even with little experience with CUDA or the GPU architecture. On the other hand, using it efficiently requires some basic knowledge. This and the next sections highlight the main OpenCV GPU module classes and functions along with some CUDA explanations.

The first thing to do in order to use the OpenCV GPU module is to build it. It is necessary to have a CUDA-enabled NVIDIA GPU (only the very old ones are not enabled) and the latest CUDA and OpenCV versions. At the time of this writing the latest versions are OpenCV 2.2 and CUDA 4.0.

There is also a tricky procedure when building with these new versions concerning another prerequisite: the NVIDIA Performance Primitives (NPP) library. This library contains some vision functions used by the OpenCV GPU module. Before the CUDA 4.0 release the NPP library was separate from the CUDA toolkit, so the building procedure was different. The last CUDA release (4.0) includes the NPP library as part of the toolkit. Thus, it is important to get the trunk version of OpenCV that already takes this change in CUDA 4.0 into account. Building procedure details can be found at [5].

Before building, it is useful to know the compute capability of the installed GPU. As the GPU architecture evolves, the CUDA software reflects these changes. The CUDA compute capability, or just capability, identifies the set of features that were added generation after generation of GPU architectures. Right now, the newest compute capability is 2.1. The list of features for each capability can be found in [4].

The OpenCV GPU module comes with binaries for capabilities 1.3 and 2.0. For capabilities 1.1 and 1.2 it works with CUDA intermediary code (PTX). Using PTX code means that the JIT (Just In Time) compiler will be used in the first execution, which means that the first execution will be slower than the subsequent ones. The OpenCV GPU module does not work if the GPU has capability 1.0 ([7]).
D. The OpenCV GPU module

This section provides basic information on how to use the OpenCV GPU module. The complete set of ported functions is well documented in [7].

1) Device information and management: The functions in this group provide several pieces of device information and allow setting the current device when there is more than one installed in the same machine. Part of these functions is presented in Table II; some of them are methods that belong to the C++ class DeviceInfo.

TABLE II
DEVICE INFORMATION AND MANAGEMENT SELECTED FUNCTIONS.

Function                              Functionality
int getCudaEnabledDeviceCount();      Returns the number of CUDA-enabled devices installed.
void setDevice(int device);           Sets a device and initializes it for the current thread.
int getDevice();                      Returns the current device index.
string DeviceInfo::name();            Returns the device name.
int DeviceInfo::majorVersion();       Returns the major compute capability version.
int DeviceInfo::minorVersion();       Returns the minor compute capability version.
size_t DeviceInfo::freeMemory();      Returns the amount of free memory in bytes.
size_t DeviceInfo::totalMemory();     Returns the amount of total memory in bytes.
bool DeviceInfo::isCompatible();      Returns true if the GPU module can be run on the device.
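A short sketch that uses these calls to check the installed hardware before running any GPU code (printing the name and capability of device 0 is just an illustration):

int n = cv::gpu::getCudaEnabledDeviceCount();
if (n > 0) {
    cv::gpu::setDevice(0);
    cv::gpu::DeviceInfo info(0);
    std::cout << info.name() << " (capability "
              << info.majorVersion() << "." << info.minorVersion() << ")"
              << ", compatible: " << info.isCompatible() << std::endl;
}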
2) Data structures: The basic data structure for OpenCV GPU module users is the GpuMat, which is very similar to the OpenCV Mat. There are some limitations, like no support for arbitrary dimensions (only 2D), no functions that return references to its data (because references on the GPU are not valid for the CPU) and no support for the expression templates technique [7]. The example below presents a simple piece of code to illustrate GpuMat initialization.

cv::gpu::GpuMat dst;
cv::gpu::GpuMat src(
    cv::imread("ressaca-no-arpoador.jpg",
               CV_LOAD_IMAGE_GRAYSCALE));

First, it initializes two GpuMat structures that will be used in other examples: dst and src. As it demonstrates, the OpenCV function imread(), which returns a Mat structure, can be used in the constructor of the GpuMat to initialize it with an image's data. This is an implicit conversion that can be used to create GpuMat structures to run on the GPU.

Note that there are two namespaces used in the code. The cv::gpu namespace is used to distinguish these members from the upper-level cv namespace, which contains all OpenCV components. There are other lower-level data structures that can be used to write new CUDA kernels; the related classes are described in [7].

3) Using GPU module functions: The code example below shows a call to the threshold() function that will run on the GPU. Note that the namespace indicates that the call refers to the GPU version of threshold(). Also, the first two arguments are of the GpuMat type declared and initialized previously. The example shows that a familiarized OpenCV user will not have any difficulty using the OpenCV GPU module. The example also shows how to use an explicit conversion from GpuMat to Mat: the function imwrite() requires a Mat as its second argument, so the GpuMat structure (dst) is converted in order to be used by the function.

cv::gpu::threshold(src, dst,
                   128.0, 255.0, CV_THRESH_BINARY);
cv::imwrite("result.jpg", Mat(dst));
E. Performance considerations

As presented previously, porting an application to run on a GPU using the OpenCV GPU module is straightforward, but it still requires some work. The OpenCV GPU samples can be used to obtain some indicators about the performance of the user's specific card.

Here we give an example of how to compute the speedup for a given application or portion of code. Speedup is a metric that indicates how much faster an application runs when executed on better hardware (usually parallel hardware). It is defined by the formula:

Sp = Ts / Tp    (12)

where Ts is the sequential execution time and Tp is the parallel execution time.
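As a small worked illustration (not in the original text), the formula can be wrapped in a helper; with the average times reported later in this section (about 2625 microseconds on the CPU and 264 microseconds on the GPU) it yields a speedup of roughly 9.9.

// Hypothetical helper: speedup from sequential (Ts) and parallel (Tp) wall times.
double speedup(long ts_microseconds, long tp_microseconds) {
    return static_cast<double>(ts_microseconds) / tp_microseconds;
}
// speedup(2625, 264) is approximately 9.94, i.e. the ~10x reported below.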
To calculate the speedup, it is first necessary to get the elapsed execution time (or wall time) of both versions. The elapsed time is obtained by getting a start time before the portion of code to be timed and subtracting it from the end time obtained after its execution.
Operating systems usually provide several timers that can be used to accomplish this task. The example below uses the Linux gettimeofday timer to calculate the elapsed time in microseconds. In the example, the threshold() function was timed both for the CPU (Ts) and for the GPU (Tp).

struct timeval start, end;
long mtime, seconds, useconds;
...
gettimeofday(&start, NULL);
cv::gpu::threshold(src, dst,
    128.0, 255.0, CV_THRESH_BINARY);
gettimeofday(&end, NULL);
seconds = end.tv_sec - start.tv_sec;
useconds = end.tv_usec - start.tv_usec;
mtime = seconds * 1000000 + useconds;
cout << "Elapsed time: "
     << mtime << " microseconds" << endl;

The previous example shows the code used to measure the GPU version of the threshold() function. The CPU version is similar: remove the gpu:: namespace and use the Mat structure, as in the sketch below.
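The following is a minimal sketch of that CPU measurement, assembled here from the authors' description (it is not printed in the original text); it reuses the same timer variables.

// CPU version of the same measurement (gives Ts): same timer, no gpu:: namespace.
cv::Mat src_cpu = cv::imread("ressaca-no-arpoador.jpg",
                             CV_LOAD_IMAGE_GRAYSCALE);
cv::Mat dst_cpu;
gettimeofday(&start, NULL);
cv::threshold(src_cpu, dst_cpu, 128.0, 255.0, CV_THRESH_BINARY);
gettimeofday(&end, NULL);
seconds = end.tv_sec - start.tv_sec;
useconds = end.tv_usec - start.tv_usec;
mtime = seconds * 1000000 + useconds;
cout << "Elapsed time (CPU): " << mtime << " microseconds" << endl;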
The performance results were obtained on an Intel Core i7 950 3.06 GHz quad-core processor and an NVIDIA GeForce GTX 580 card with 1.5 GB of GDDR5 memory, running the Linux Ubuntu 11.04 operating system; the compiler was g++ 4.5.2, with OpenCV 2.2 and CUDA 4.0. Each version was run ten times and the averages were used to calculate the speedup.

The average CPU time was about 2625 microseconds and the average GPU time was about 264 microseconds. After eliminating the highest and the lowest values, the standard deviation was 9.8 for the GPU executions and 12.4 for the CPU executions. This gives a speedup of almost 10x for the GPU execution.

The image used has 2835 x 1225 pixels, and the resulting image has the same size. Figure 18 presents the original image and the resulting image.

Fig. 18. Original and resulting images for the threshold() performance test (original extracted from www.ipanema.blog.br).
F. Porting an existing application

This section shows how to port an existing C++ application to run with the OpenCV GPU module. We use the dft sample that comes with the OpenCV package (the original code is in opencv/samples/cpp/dft.cpp). The next subsections explain the main procedures used to convert the application.

1) Header files: The first step is to add the header file gpu.hpp, as in the example below (the original header files remain the same as in the original dft.cpp code). The example also shows the cv and std namespaces being declared.

#include "opencv2/gpu/gpu.hpp"
using namespace cv;
using namespace std;

2) GpuMat initialization: As illustrated before, the GpuMat is the main data structure and it is used as the main argument in all of the OpenCV GPU functions. In the code example below, it is also initialized with the imread() function.

int main(int argc, char** argv) {
    const char* filename =
        argc >= 2 ? argv[1] : "imagens/lena.jpg";
    gpu::GpuMat img(
        imread(filename, CV_LOAD_IMAGE_GRAYSCALE));
3) Using constructors for explicit conversions: In the code sequence that follows we use the GPU versions of the copyMakeBorder() and merge() functions. Note that during the creation of the planes[] array the GpuMat constructor was used to convert the returned Mat template initialization. Also, the Mat structure returned by the zeros() function call was converted through the GpuMat constructor (there are no GPU versions of these two OpenCV components).

int M = getOptimalDFTSize(img.rows);
int N = getOptimalDFTSize(img.cols);
gpu::GpuMat padded;
gpu::copyMakeBorder(img, padded, 0, M - img.rows,
    0, N - img.cols, Scalar::all(0));
gpu::GpuMat planes[] =
    {gpu::GpuMat(Mat_<float>(padded)),
     gpu::GpuMat(Mat::zeros(padded.size(), CV_32F))};
gpu::GpuMat complexImg;
gpu::merge(planes, 2, complexImg);
4) The DFT GPU version: The code below shows that there is a difference in the argument list of this function in relation to the standard dft() function. The GPU version requires the size for the DFT, and there is also a default flag parameter that is not redefined in the example.

Size dft_size(N, M);
gpu::dft(complexImg, complexImg, dft_size);
5) More GPU module functions: In the code sequence below we continue to use GPU module functions simply by using the gpu:: namespace.

gpu::split(complexImg, planes);
gpu::magnitude(planes[0], planes[1],
    planes[0]);
gpu::GpuMat mag = planes[0];
mag += Scalar::all(1);
gpu::log(mag, mag);
The code for the rearrangement of the quadrants of the Fourier image is omitted here because it uses the same porting techniques discussed so far: basically, we use the GpuMat structure instead of the original Mat structure to declare the subimages for the quadrants, as in the sketch below.
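Since the original listing is omitted, the following is only a rough sketch of how that swap might look; it assumes that GpuMat supports Rect-based sub-matrices and copyTo(), mirroring the Mat version in the dft.cpp sample, and that mag has an even number of rows and columns.

// Rough sketch (not the authors' code): move the zero-frequency component
// to the center of the spectrum by swapping the four quadrants on the GPU.
int cx = mag.cols / 2;
int cy = mag.rows / 2;

gpu::GpuMat q0(mag, Rect(0,  0,  cx, cy));   // top-left
gpu::GpuMat q1(mag, Rect(cx, 0,  cx, cy));   // top-right
gpu::GpuMat q2(mag, Rect(0,  cy, cx, cy));   // bottom-left
gpu::GpuMat q3(mag, Rect(cx, cy, cx, cy));   // bottom-right

gpu::GpuMat tmp;
q0.copyTo(tmp); q3.copyTo(q0); tmp.copyTo(q3);   // swap top-left with bottom-right
q1.copyTo(tmp); q2.copyTo(q1); tmp.copyTo(q2);   // swap top-right with bottom-left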
6) Returning to use the Mat structure: The last step is the normalize() function call, which is not in the list of ported functions at the time of this writing. Thus, we construct a Mat from the GpuMat in order to pass it to normalize(). As a consequence, this function will run on the CPU even in the GPU version. The code below illustrates the use of Mat with normalize() and also with the imshow() function.

Mat magc(mag); // mag is of GpuMat type
normalize(magc, magc, 0, 1, CV_MINMAX);
imshow("spectrum magnitude", magc);
7) Performance considerations: The same methodology described in section VI-E is used to measure the elapsed time for the entire application (except for the imshow() call). We obtained a 5x speedup for this simple port. A considerable difference was observed in the generated images, which is probably a matter of function implementation. Figure 19 presents the spectrum magnitude images generated from OpenCV's lena.jpg image (512 x 512), which was used for the performance tests.

Fig. 19. DFT resulting images.

REFERENCES
[1] G. Bradski and A. Kaehler, Learning OpenCV, O'Reilly, 2008.
[2] D. Stringhini, I.A. Souza, L.A. da Silva and M. Marengoni, Visão Computacional Usando OpenCV, in Fundamentos de Visão Computacional, editors: M.A. Piteri and J.C. Rodrigues, UNESP, 2011.
[3] D.B. Kirk and W.W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann, 2010.
[4] NVIDIA Corporation, NVIDIA CUDA C Programming Guide - 4.0, 2011.
[5] OpenCV Homepage - available at http://opencv.willowgarage.com/ (accessed in June, 2011).
[6] J. Sanders and E. Kandrot, CUDA by Example - An Introduction to General-Purpose GPU Programming, Addison-Wesley, 2011.
[7] OpenCV Reference Manual, v2.2, December, 2010.
[8] R.C. Gonzalez and R.E. Woods, Digital Image Processing, Third edition, Prentice Hall, 2008.
[9] M. Sonka, V. Hlavac and R. Boyle, Image Processing, Analysis, and Machine Vision, 3rd edition, Thomson, 2008.
[10] J.P. Lewis, Fast Normalized Cross-Correlation, http://www.idiom.com/~zilla/Papers/nvisionInterface/nip.html
[11] L. Breiman, J.H. Friedman, R.A. Olshen and C.J. Stone, Classification and Regression Trees, Wadsworth International, Belmont, CA, 1984.
[12] L.I. Smith, A tutorial on Principal Components Analysis, http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
[13] M. Turk and A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, Vol. 3, No. 1, 1991, pp. 71-86.
[14] T.M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[15] R.E. Kalman, A new approach to linear filtering and prediction problems, Transactions of the ASME: Journal of Basic Engineering, Vol. 82, 1960, pp. 35-45.
[16] M. Isard and A. Blake, Condensation - conditional density propagation for visual tracking, International Journal of Computer Vision, 29(1), 1998, pp. 5-28.