High Level Computer Vision Using OpenCV

Abstract—This paper presents some more advanced topics in image processing and computer vision, such as Principal Components Analysis, Matching Techniques, Machine Learning Techniques, Tracking and Optical Flow, and Parallel Computer Vision using CUDA. These concepts will be presented using the openCV library, which is a free computer vision library for C/C++ programmers available for the Windows, Linux, MacOS and Android platforms. These topics will be covered considering not only theoretical aspects; practical examples will also be presented in order to understand how and when to use each of them.

expected in the source image and type defines the way the output image is computed. If the intensity in the source image is greater than K then:

• CV_THRESH_BINARY: res = max, otherwise 0.
• CV_THRESH_BINARY_INV: res = 0, otherwise max.
• CV_THRESH_TRUNC: res = K, otherwise src.
• CV_THRESH_TOZERO_INV: res = 0, otherwise src.
• CV_THRESH_TOZERO: res = src, otherwise 0.
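As a brief illustration of these flags, the sketch below is our own minimal example (the file name and the values K = 128 and max = 255 are assumptions, not taken from the text):

#include <opencv2/imgproc/imgproc_c.h>
#include <opencv2/highgui/highgui_c.h>

int main(void) {
    /* The file name and the values 128 (K) and 255 (max) are
       illustrative assumptions. */
    IplImage *src = cvLoadImage("input.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    IplImage *dst = cvCreateImage(cvGetSize(src), IPL_DEPTH_8U, 1);

    /* CV_THRESH_BINARY: pixels above 128 become 255, the rest become 0. */
    cvThreshold(src, dst, 128, 255, CV_THRESH_BINARY);

    cvSaveImage("binary.jpg", dst, 0);
    return 0;
}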
Fig. 1. An ideal contour around an object of interest.
after that uses a set of markers in a mask to group the small regions into areas with similar properties, which are computed from each marker. The watershed method is computed in openCV using the function:

cvWatershed(src, markers);

Again, src is the input image and markers is an image with the same size as src where the regions are marked, so the method can group smaller regions and segment the image properly.
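A hedged seeding sketch (ours; the image name, the marker positions and the two labels are assumptions; cvWatershed expects an 8-bit color input and a 32-bit integer marker matrix):

#include <opencv2/imgproc/imgproc_c.h>
#include <opencv2/highgui/highgui_c.h>

int main(void) {
    IplImage *src = cvLoadImage("scene.jpg", CV_LOAD_IMAGE_COLOR);
    CvMat *markers = cvCreateMat(src->height, src->width, CV_32SC1);
    cvZero(markers);

    /* Seed two regions with the labels 1 and 2 (positions are arbitrary). */
    CV_MAT_ELEM(*markers, int, 10, 10) = 1;
    CV_MAT_ELEM(*markers, int, 200, 200) = 2;

    cvWatershed(src, markers);  /* grows the labelled regions over the image */
    return 0;
}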
III. IMAGE MATCHING

Pattern recognition is an area of computer vision where one tries to find whether a known pattern is present in a given image. The process that checks the image for a pattern's possible location is called image matching. Matching techniques are the simplest way to do pattern recognition. This section presents two different approaches for matching: contour matching and template matching.

A. Contour Matching

After segmentation an image is composed of a set of groups of pixels where each group represents a region. This segmented data can be transformed into a compact form that facilitates the region's description and helps to compare and match it against a given pattern [8]. When thinking about a contour one considers an object of interest and a line surrounding it as an ideal contour (although not necessarily a closed line), as shown in Figure 1. A contour can be represented in different ways; two of the most common are polygons and Freeman chain codes [1].

A boundary can be represented by a connected sequence of straight line segments, each with a specified length and direction. This type of representation is called a Freeman chain code; it uses 4- or 8-connectivity and the segment's direction is coded following a numbering code. Figure 2 presents these two methods: on top, the Freeman chain code with the green marker setting the starting point and the numbering code at the right; on the bottom, the polygon given by the set of points at the corners of the polygon.

Fig. 2. Contour representation: on top the Freeman chain code and on bottom a set of points for a polygon.

In openCV contours are usually computed from a binary image, where it is easier to define contrast. The function used to compute the contours is:

int cvFindContours(src,sto,first,header,mode,
                   method);

The input image, src, is a binary image; sto is a memory location where the contours will be written; first is the first contour in a sequence; header gives the size of the object being retrieved and can be either sizeof(CvContour) or sizeof(CvChain), depending on the method; mode defines the data structure used to store the contours and can be:

• CV_RETR_EXTERNAL: retrieves only the most external contour.
• CV_RETR_LIST: retrieves all contours as a list where the first element is the most internal contour.
• CV_RETR_CCOMP: retrieves all contours and organizes them as a two-level hierarchy, one level with external contours and another with internal contours.
• CV_RETR_TREE: retrieves all contours and organizes them as a tree structure with the outer contour as the root.

The method parameter is related to the way openCV stores the contour in memory. There are five ways to do this task:

• CV_CHAIN_CODE: uses the Freeman chain code.
• CV_CHAIN_APPROX_NONE: uses the points determined by the Freeman chain code.
• CV_CHAIN_APPROX_SIMPLE: compresses the Freeman chain code and returns the ending points.
• CV_CHAIN_APPROX_TC89_L1 or CV_CHAIN_APPROX_TC89_KCOS: uses a special chain approximation algorithm.
• CV_LINK_RUNS: can only be used with CV_RETR_LIST and represents contours as linked horizontal segments.

The return value of the function is the total number of contours found. One important step before calling cvFindContours is the input image binarization. This process requires some filtering afterwards to clean the image, otherwise too many contours might be found. Figure 3 shows a simple and clean binary image (on the left side) and the contours found (on the right); there are a total of 7 contours in the right image. Figure 4 shows the same process applied to a complex (meaning real) image; the contour structure, in this case, has
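A usage sketch of the binarize-then-find sequence (our own example; the file name and threshold value are assumptions and error checking is omitted):

#include <stdio.h>
#include <opencv2/imgproc/imgproc_c.h>
#include <opencv2/highgui/highgui_c.h>

int main(void) {
    IplImage *src = cvLoadImage("shapes.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    cvThreshold(src, src, 128, 255, CV_THRESH_BINARY); /* binarize first */

    CvMemStorage *sto = cvCreateMemStorage(0); /* storage for the contours */
    CvSeq *first = NULL;
    int total = cvFindContours(src, sto, &first, sizeof(CvContour),
                               CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE,
                               cvPoint(0, 0));
    printf("%d contours found\n", total);

    cvReleaseMemStorage(&sto);
    return 0;
}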
Fig. 3. A simple binary image (left) and the contours found using cvFindContours (right). The function found 7 contours.

Fig. 4. A complex binary image (left) and the contours found using cvFindContours (right). The function found 3790 contours.

3790 contours; each small point or region in a complex image will have a contour around it.

Once a set of contours is found it is possible to go through each contour using the data structure defined by CvSeq (see [1] for details) and select the desired contour for a contour matching. There are several ways to compare two contours using openCV; it depends on the pattern contour and the contour found in an image. For instance, Freeman chain codes are translation invariant; if the length used to compute the Freeman chain is scaled up or down, the chain itself can also be scale invariant; and if the number coding is rotated accordingly, then it can also be rotation invariant. Other ways to compare contours are:

• Compute the contour's perimeter and compare the results: cvContourPerimeter(contour).
• Compute the contour's area and compare the results: cvContourArea(contour, slice).
• Compute the contour's moments and compare the results: cvContourMoments(contour, moments).

In these functions, contour is a contour structure computed by cvFindContours; slice in cvContourArea is a parameter that allows computing the area of only part of the contour, otherwise specify slice as CV_WHOLE_SEQ. The moments argument in cvContourMoments is a data structure of type CvMoments and should be previously allocated.

A region moment representation is an interpretation of a binary image as a probability density function of a 2D random variable. This random variable has properties that can be described using statistical characteristics which are called moments [9]. A moment of order (p, q) depends on scaling, translation and rotation; in digitized images it can be computed as shown in Equation 1:

m_{pq} = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} i^p j^q f(i,j)   (1)

where i and j are the coordinates of the points inside the region and f(i,j) is the intensity level of the binary image at position i and j. A moment can be translation invariant if it is computed based on orthogonal axes passing through the region's center of mass; these moments are called central moments and they are given by Equation 2:

\mu_{pq} = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} (i - x_c)^p (j - y_c)^q f(i,j)   (2)

where x_c and y_c are the region's center of mass coordinates.

B. Template Matching

A simple way to do pattern recognition is just to verify if a pattern shows up in the source image. The process makes an exhaustive search for the template in the source image and marks each position where the pattern is found. The search might be slow for large source images, but the process is simple and gives good results. The openCV function that performs template matching is:

cvMatchTemplate(src,template,result,method);

In this function src is the source image, template is the pattern to be found, result is an image showing the results of the matching and method describes the way the template matching is performed.

There are six different methods to perform template matching in openCV:

• CV_TM_SQDIFF: this method computes the square difference between the template and the source image; a best match in this case has a 0 value. Equation 3 is used to compute the matching.

R_{sqdiff}(x,y) = \sum_{i,j} [t(i,j) - f(x+i, y+j)]^2   (3)

• CV_TM_CCORR: this method makes a correlation between the template and the source image at each position; a best match in this case has a large value (not necessarily the maximum, depending on noise). Equation 4 is used to compute the correlation.

R_{ccorr}(x,y) = \sum_{i,j} [t(i,j) \cdot f(x+i, y+j)]^2   (4)

• CV_TM_CCOEFF: this method is called correlation coefficient matching and it makes the correlation between the template subtracted from its mean and the source image subtracted from its mean at each position, considering only the template's size. Notice that the correlation might give poor results when the image energy, f^2(x,y), varies with position. Equation 5 is used to compute the correlation coefficient; a good match in this case also has a large value.

R_{ccoeff}(x,y) = \sum_{i,j} [(t(i,j) - \bar{t}) \cdot (f(x+i, y+j) - \bar{f})]^2   (5)

The other three methods available for cvMatchTemplate are normalized methods, selected by CV_TM_SQDIFF_NORMED, CV_TM_CCORR_NORMED and CV_TM_CCOEFF_NORMED. These methods work better when there are lighting differences between the template and the source image [10]. In all cases the normalization is performed by dividing the method by the normalization factor presented in Equation 6.

NORM(x,y) = \sqrt{\sum_{i,j} t^2(i,j) \cdot \sum_{i,j} f^2(x+i, y+j)}   (6)
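A minimal usage sketch (ours; the file names are assumptions, and locating the best match with cvMinMaxLoc is common practice rather than something prescribed by the text):

#include <opencv2/imgproc/imgproc_c.h>
#include <opencv2/highgui/highgui_c.h>

int main(void) {
    IplImage *src = cvLoadImage("scene.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    IplImage *tpl = cvLoadImage("pattern.jpg", CV_LOAD_IMAGE_GRAYSCALE);

    /* The result image has size (W - w + 1) x (H - h + 1). */
    CvSize rsize = cvSize(src->width - tpl->width + 1,
                          src->height - tpl->height + 1);
    IplImage *result = cvCreateImage(rsize, IPL_DEPTH_32F, 1);

    cvMatchTemplate(src, tpl, result, CV_TM_SQDIFF);

    /* For CV_TM_SQDIFF the best match is the minimum of the result. */
    double minval, maxval;
    CvPoint minloc, maxloc;
    cvMinMaxLoc(result, &minval, &maxval, &minloc, &maxloc, NULL);
    return 0;
}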
IV. PATTERN RECOGNITION USING MACHINE LEARNING TECHNIQUES

Learning is related to changes in an adaptive system, such that the system can perform similar tasks more efficiently over time. When thinking about pattern recognition, the adaptive system has to be capable of deciding which class a new image belongs to among the patterns it has seen so far. Figure 7 shows the idea of learning in pattern recognition: if a system receives a set of images having cars, faces and houses, learning means that the system is capable of defining regions separating these images and, if a new image is presented, the system can classify it into one of the possible classes.

The data presented to the learning methods in openCV always have to be in matrix form. If images are being used to represent the data, all images must have the same dimensions and all images must be converted into vectors; so, if there are K images available and each image has size M by N, the data matrix in openCV will have size K × MN. If the images are represented by feature vectors with P features, then the
data matrix will have size K × P. Thus the data preparation can be decomposed as:

1) Clean the images: if necessary filter the images to remove or reduce noise.
2) Adjust contrast: if necessary have all images with a contrast throughout the whole range, so as to avoid dark or light images.
3) Resize the images: all images must have the same dimensions if they will be used as training data.
4) Convert format: images should be converted into vectors.
5) Mix the data: in order to avoid any kind of bias, data from different classes should be mixed in the data matrix.

These steps should also be followed even if the data is a feature vector, eliminating the steps that are not required, such as resizing.

The learning methods in openCV have specific calls for each phase of learning and using the system:

• method→train: to train the system using the method.
• method→save: to save the learned system into a file in the xml or yml file format.
• method→load: to load the learned method from the file passed.
• method→predict: to make a prediction on new data.

OpenCV has several machine learning techniques used for pattern recognition; a sample will be presented here.

A. K-means

The K-means is not exactly a pattern recognition method but a clustering method that tries to find clusters in the data. One way to perform pattern recognition using K-means is to first find the clusters and then look for the pattern in each cluster.

If the number of clusters (K) in the data is not known, the user has to make a guess and adjust this number later, if necessary. The method iteratively searches for K centers of mass inside the data. This is one of the most used techniques for grouping [1]. The method itself works as follows:

1) Input: the data and the number of groups (K).
2) Randomly define K centers of mass in the data.
3) Associate each entry with a center of mass.
4) Compute the new centers of mass.
5) If the error between the previous centers of mass and the new ones is below a certain limit, accept and terminate; otherwise return to step 3.

The K-means algorithm has the following problems:

• There is no guarantee it will find the best centers of mass for each group, but it will always converge.
• The user has to provide the number of clusters; the method will not give the best number of clusters in the data.
• The method assumes that the covariance in the grouping space is not important or that it was normalized.

Figure 9 shows at left an image with random points and at right the seven clusters found by the K-means method.

Fig. 9. At left an image with random points and at right the seven clusters found.

The openCV function for the K-means algorithm is given by the method:

cvKMeans2(data,clusters,result,criteria);

In this case data is a multidimensional input matrix used for training, clusters is the number of clusters given by the user, result is a matrix where each cluster is marked and criteria is a termination criterion passed to the K-means algorithm.
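A usage sketch (ours; the 100 random 2D points, K = 7 and the termination values are illustrative assumptions; the trailing arguments are the C API defaults):

#include <opencv2/core/core_c.h>

int main(void) {
    /* 100 random 2D points to be grouped into 7 clusters. */
    CvMat *data   = cvCreateMat(100, 2, CV_32FC1);
    CvMat *result = cvCreateMat(100, 1, CV_32SC1);

    CvRNG rng = cvRNG(0);
    cvRandArr(&rng, data, CV_RAND_UNI,
              cvScalarAll(0), cvScalarAll(255)); /* fill with random points */

    /* Stop after 10 iterations or when the centers move less than 1.0. */
    cvKMeans2(data, 7, result,
              cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 10, 1.0),
              1, 0, 0, 0, 0); /* attempts, rng, flags, centers, compactness */
    return 0;
}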
B. Decision Trees

Another machine learning technique is called the decision tree. In a decision tree each node represents a random variable and the arcs leaving the node represent the possible outcomes for the random variable. The simplest decision tree is called a binary decision tree, where the outcomes for the random variables are true or false. A binary decision tree can be used either for classification or for regression [11].

Usually a binary decision tree uses as data a vector of features extracted from images. Figure 10 shows an idea for digit classification based on features extracted from digit images. The top node checks if the feature top bar exists in the image being classified. If it exists then the decision tree checks for the bottom bar, otherwise it checks for the middle bar, and so on. The learning process for decision trees tries to find the most discriminating features in order to create the simplest tree possible.

Fig. 10. A model for a decision tree on image classification.

The features are evaluated using three possible measures:

• Entropy:

E(\text{feature}) = -p \log_2(p) - (1 - p) \log_2(1 - p)   (7)
• Gini index:

G(\text{feature}) = 1 - \sum_{i=1}^{m} f_i^2   (8)

• Misclassification:

M(\text{feature}) = 1 - \max(P(w_j))   (9)

where p represents the fraction of the data that has the feature. In the Gini index f_i is the fraction of items labelled with value i using the feature. In the misclassification measure P(w_j) represents the fraction of patterns in class w_j using the feature.

The interpretation of results in a decision tree is straightforward, which makes it a widely used method. Most implementations of this method allow missing information, which is almost always the case when working with feature vectors. The method in openCV for decision trees is called as:

decTrees.train(data,type,truevalues,features,
               points,vartypes,missing,
               parameters);

In this method data is the training data and type indicates a row or column matrix; truevalues is a vector indicating the expected (true) classification of the data. The remaining parameters are optional: features, if not 0, is a mask indicating the features that must be used in the training process; points, if not 0, indicates the points that must be considered in the training process; vartypes indicates if the variables are categorical or ordered; and finally parameters is used to set tree parameters like depth, missing data, etc. (see more details in openCV's sample directory in the mushroom.cpp file).
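The decTrees object above is an instance of OpenCV's CvDTree class. A hedged sketch of the call (ours; the matrix sizes, the zeroed optional masks and the default CvDTreeParams are assumptions):

#include <opencv2/core/core_c.h>
#include <opencv2/ml/ml.hpp>

int main() {
    // 100 samples with 16 features each, one sample per row.
    // (Sizes are illustrative; fill the matrices with real data.)
    CvMat *data       = cvCreateMat(100, 16, CV_32FC1);
    CvMat *truevalues = cvCreateMat(100, 1, CV_32SC1);

    CvDTree decTrees;
    // The optional masks (features, points, vartypes, missing) are passed
    // as 0; CvDTreeParams() keeps the default depth and pruning settings.
    decTrees.train(data, CV_ROW_SAMPLE, truevalues,
                   0, 0, 0, 0, CvDTreeParams());

    // Classify one new feature vector.
    CvMat *sample = cvCreateMat(1, 16, CV_32FC1);
    CvDTreeNode *node = decTrees.predict(sample);
    double label = node->value; // predicted class
    (void)label;
    return 0;
}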
C. K Nearest Neighbor

The K Nearest Neighbor (KNN) is a comparative method for pattern recognition. There is actually no learning in this method; the idea, as shown in Figure 11, is to compare any new image with all images available in the data, take the K nearest images (for any small and possibly odd value of K) and label the new image with the label of the majority among the K. The nearest can be computed using any measurement method, such as the Euclidean or Mahalanobis distance. No computation is performed until a new unlabelled image is presented to the system, and once the new image is labelled the computation is discarded. The code in openCV for the KNN method is:

CvKNearest knn(traindata,trainclasses,sample,
               regression,K);
response=knn.find_nearest(samples,results,
                          neighbors,n_responses,distances);

In this case traindata is the set of images and trainclasses are the labels for each image; sample is a vector with the samples in the data that have to be considered and can be set to 0; regression is a boolean variable that indicates if the method is being used for regression or not; and K is the number of neighbors to consider. In the find_nearest method, samples are the new images to be classified, results are the answers for the samples (the labels), neighbors is a vector indicating the neighbors for each image in samples, n_responses is the class for each neighbor found and distances is a vector with the distance to each neighbor found.

D. Boosting

The idea of Boosting is to combine simple decision rules into an accurate decision system. In openCV the method implemented for Boosting is called AdaBoost [1], which uses a simple (weak) classifier (a binary decision tree) to implement the Boosting method. A weak classifier is a decision method which is capable of classifying things with a probability just above chance (50%). The combination of several weak classifiers leads to a decision system that typically performs at or near the state of the art [1]; performance can be improved only in more specifically designed systems. OpenCV implements four types of boosting for its AdaBoost algorithm:

• DISCRETE: for discrete data.
• REAL: uses confidence predictions and works well with categorical data.
• LOGIT: works well with regression.
• GENTLE: works better on regression data because it puts less weight on outliers.

The idea in AdaBoost is to find a set of simple binary decision trees where each tree has a weight and the classification of each one of these trees can be combined into a single and strong classification system, as presented in Figure 12.

Fig. 12. The idea of AdaBoost and boosting in general. Combine simple classifiers using weights to get a strong classifier.

The method in openCV for AdaBoost is called as:

boost.train(data,type,trvalues,features,points,
            vartypes,missing,parameters,update);
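The four boosting types above map to constants of the CvBoost class. A hedged usage sketch (ours; the sizes, the GENTLE choice and the parameter values are illustrative assumptions):

#include <opencv2/core/core_c.h>
#include <opencv2/ml/ml.hpp>

int main() {
    CvMat *data     = cvCreateMat(100, 16, CV_32FC1); // one sample per row
    CvMat *trvalues = cvCreateMat(100, 1, CV_32SC1);  // expected labels

    CvBoost boost;
    // CvBoost::GENTLE selects the GENTLE variant; 50 weak trees of depth 1.
    boost.train(data, CV_ROW_SAMPLE, trvalues, 0, 0, 0, 0,
                CvBoostParams(CvBoost::GENTLE, 50, 0.95, 1, false, 0),
                false /* update: train from scratch */);

    CvMat *sample = cvCreateMat(1, 16, CV_32FC1);
    float label = boost.predict(sample); // combined strong classification
    (void)label;
    return 0;
}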
E. Principal Component Analysis

Principal Component Analysis (PCA) is a technique based on linear algebra which is used to find patterns in high dimensional data. PCA is applied in several areas, such as neuroscience, data compression and computer vision. PCA can also be used to reduce the number of dimensions in data with no (or little) loss of information. The dimensions in this reduced space are called the principal components and they can be used for object recognition, like faces [13] in images.

The analysis using principal components is similar to the K Nearest Neighbors. A new image is compared to all images used in the training, but instead of using the whole image the comparison is performed using only the reduced dimensions (principal components) of the images. Again, a measurement method is required to make the comparisons. The algorithm for PCA follows these steps:

1) Get the data.
2) Subtract the mean.
3) Compute the covariance matrix.
4) Compute the eigenvectors and eigenvalues of the covariance matrix.
5) Select the principal components, the ones with the highest eigenvalues.

The PCA methods in openCV are listed below:

cvCalcEigenObjects(nObjs,input,output,ioFlags,
                   ioBufSize,userData,limit,avg,eigVals);
cvEigenDecomposite(obj,eig_count,input,ioFlags,
                   userData,avg,coeffs);
cvEigenProjection(input_vecs,eig_count,ioFlags,
                  userData,coeffs,avg,proj);

cvEigenDecomposite computes the coefficients for an input object obj; eig_count gives the number of eigen objects, input is the structure with the eigen objects and coeffs are the coefficients for the obj entered; the other parameters are the same as in cvCalcEigenObjects. In cvEigenProjection the object projection over the eigen subspace is computed: input_vecs is the input objects, coeffs are the coefficients computed by cvEigenDecomposite and proj is the output computed by cvEigenProjection. More details about Principal Component Analysis can be found in [12].

F. Neural Networks

An artificial neural network provides a general method for learning different types of functions, such as real or discrete valued functions, from examples [14]. Neural networks are among the most effective classifiers available, although the learning phase might take some time when using the gradient descent learning method [1].

OpenCV has two methods to implement neural networks, Support Vector Machines (SVM) and the Multilayer Perceptron (MLP). Both methods are implemented similarly, so we will go through the Multilayer Perceptron and show how to use it; the reader should check the reference manual [7] for the calls for the SVM methods.

The MLP structure is defined by the number of hidden (intermediate) layers, the number of nodes in each layer and the transition function. Each node in the network (except in the input layer) works by adding up the values that arrive to the node and feeding the transition function that, when activated, computes the node's output. These two structures are presented in Figure 13: the top part shows the neural network structure and the bottom part shows the node's structure.
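The MLP calls follow the same train/predict pattern listed at the beginning of this section. A hedged sketch (ours; the layer sizes and data shapes are illustrative assumptions) using OpenCV's CvANN_MLP class:

#include <opencv2/core/core_c.h>
#include <opencv2/ml/ml.hpp>

int main() {
    // Network with 16 inputs, one hidden layer of 10 nodes and 3 outputs.
    int sizes[] = { 16, 10, 3 };
    CvMat layers = cvMat(1, 3, CV_32SC1, sizes);

    CvANN_MLP mlp;
    mlp.create(&layers, CvANN_MLP::SIGMOID_SYM); // transition function

    // One training sample per row; outputs encode the expected class.
    CvMat *inputs  = cvCreateMat(100, 16, CV_32FC1);
    CvMat *outputs = cvCreateMat(100, 3, CV_32FC1);
    mlp.train(inputs, outputs, 0, 0, CvANN_MLP_TrainParams());

    // Predict the outputs for one new sample.
    CvMat *sample = cvCreateMat(1, 16, CV_32FC1);
    CvMat *result = cvCreateMat(1, 3, CV_32FC1);
    mlp.predict(sample, result);
    return 0;
}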
The training error, shown in Equation 10, compares the expected value with the computed value. Equation 11 computes the weight's update for each link in the network.

E = \frac{1}{2} \sum_{k \in \text{Output}} (O_k - C_k)^2   (10)

In tracking, an object of interest is followed so the search in the following images (frames) is minimized. Tracking can be applied in several areas, from security systems to human computer interfaces. There are methods to forecast the object's position frame by frame, from Kalman filters to the camshift algorithm, whose result in one frame is used to define the search window for the next frame. Figure 14 shows the camshiftdemo.c program working.
B. Optical Flow
Optical Flow is a technique used for movement identification in an image sequence without prior knowledge about the image contents. In this technique, typically the movement itself indicates that something is happening in the image. OpenCV has sparse and dense methods to find movement. Sparse methods require previous knowledge about the points to be tracked, such as the corners described previously. Dense methods associate a speed vector or a pixel displacement with each point in the image, so they don't need to know any specific point in the image. In practical applications, however, the dense techniques have a high processing cost, so, unless they are really required, the user should use a sparse method. Table I summarizes the optical flow methods available in openCV; for more details on each method check the reference manual [7] or [1].

TABLE I
OPTICAL FLOW METHODS IN OPENCV.
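A sparse sketch (our own example, not from the text; the corner count, window size, file names and OpenCV 2.2 header layout are assumptions) that detects corners and tracks them with the pyramidal Lucas-Kanade method:

#include <opencv2/imgproc/imgproc_c.h>
#include <opencv2/video/tracking.hpp>
#include <opencv2/highgui/highgui_c.h>

int main() {
    IplImage *prev = cvLoadImage("frame0.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    IplImage *curr = cvLoadImage("frame1.jpg", CV_LOAD_IMAGE_GRAYSCALE);

    // Pick up to 100 corners in the first frame (input of a sparse method).
    IplImage *eig = cvCreateImage(cvGetSize(prev), IPL_DEPTH_32F, 1);
    IplImage *tmp = cvCreateImage(cvGetSize(prev), IPL_DEPTH_32F, 1);
    CvPoint2D32f prevPts[100], currPts[100];
    int count = 100;
    cvGoodFeaturesToTrack(prev, eig, tmp, prevPts, &count, 0.01, 10);

    // Track the corners into the second frame.
    char status[100];
    cvCalcOpticalFlowPyrLK(prev, curr, NULL, NULL, prevPts, currPts,
                           count, cvSize(21, 21), 3, status, NULL,
                           cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS,
                                          30, 0.01),
                           0);
    return 0;
}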
VI. PARALLEL COMPUTER VISION USING CUDA AND OPENCV

The majority of the computers in the market today have a GPU, but most computer vision and image processing applications are still sequential. The idea behind this topic is to present to the students the possibility of using the OpenCV GPU module in order to launch commonly used OpenCV functions on a GPU and improve overall performance.

This topic will cover an introduction to CUDA, the GPU and CUDA architectures and the CUDA programming model; then it will present the basic OpenCV GPU module features, as well as a simple example of how to use the OpenCV GPU main data structure and some functions, and how to calculate the performance speedup.

A. Introducing CUDA architecture

CUDA (Compute Unified Device Architecture) unifies the programming interface for NVIDIA GPUs. It defines a scalable programming model that can be used to program dozens of different CUDA-enabled NVIDIA GPUs.

First, it is essential to understand the differences between the CPU and GPU architectures. While the CPU cores dedicate their millions of transistors to a few sophisticated execution units, the GPU provides up to hundreds of simple execution units (Figure 15). While exploiting parallelism in a multicore CPU requires a small number of independent threads, the CUDA programming model launches hundreds of threads that execute the same code over the GPU cores. This programming model is called the SIMT (Single Instruction, Multiple Threads) model and it is derived from the classical SIMD (Single Instruction, Multiple Data) model.

Despite their differences, both architectures must coexist in a heterogeneous system, where GPUs are co-processors to CPUs. The GPU is also known as a type of accelerator resource.

In the source code a special function must be developed that executes on the GPU and is launched from the CPU code. In the CUDA programming model [4], the GPU is called the device while the CPU is called the host. The special GPU function in CUDA is called a kernel and it is compiled separately from the host code.
also a limited resource where the allocation is performed basically by dividing the total amount of shared memory available by the total number of threads.

• Global memory: it is shared by all threads in the same grid. Its allocation is performed in the host code, as well as the data copies to and from the host. The Fermi architecture has an L2 cache populated with global memory data that helps in reducing the access latency.

• Constant memory: it is also shared by all the threads in the same grid, but it is a read-only memory. The advantage of using this memory is its reduced memory latency in relation to the global memory.
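A minimal host-side sketch of these two memory types (ours; the names and sizes are illustrative):

#include <cuda_runtime.h>

// Constant memory: read-only for kernels, declared at file scope.
__constant__ float coeffs[16];

int main() {
    // Global memory: allocated from the host code.
    float *d_data;
    cudaMalloc((void**)&d_data, 4096 * sizeof(float));

    float h_data[4096] = {0};
    // Copies between the host and global memory are issued by the host.
    cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);

    float h_coeffs[16] = {0};
    // Constant memory is filled with a dedicated copy call.
    cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));

    cudaFree(d_data);
    return 0;
}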
launched by the host code. Note that there will be 4096 threads executing the kernel at the same time in the GPU.

__global__ void add_value(float* a, float value, int d) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;
    if (x < d && y < d) a[offset] += value;
}
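The host-side launch code is not shown here; a hedged reconstruction (ours) that produces the 4096 threads mentioned above, using a 64 x 64 matrix split into 16 x 16 blocks:

#include <cuda_runtime.h>

// Uses the add_value kernel defined above.
int main() {
    const int d = 64;                         // 64 x 64 = 4096 elements
    float *d_a;
    cudaMalloc((void**)&d_a, d * d * sizeof(float));

    dim3 block(16, 16);                       // bidimensional block
    dim3 grid(d / 16, d / 16);                // bidimensional grid
    add_value<<<grid, block>>>(d_a, 1.0f, d); // one thread per element

    cudaFree(d_a);
    return 0;
}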
The cudaMemcpy() copies a block of data that resides contiguously in the global memory. Hence, the first thing to do is to calculate the exact position in the linearized matrix the thread will work on. This is done using the pre-built variables threadIdx, blockIdx, blockDim and gridDim. In the example, the thread configuration is defined by a bidimensional grid and bidimensional blocks, so we can use the values of the x and y components of these pre-built variables. The goal is to find an offset that indicates how many matrix elements exist before the current thread's matrix element in the linear memory. The formulas in the previous example use the position of the thread in the block, the position of the block in the grid and the values in each dimension to compute the final offset. A more detailed example can be found in [6].

After the kernel execution, cudaMemcpy() must be called again to copy the resulting matrix from the device global memory to the host memory.
C. Building the OpenCV GPU module
The OpenCV GPU module contains a set of OpenCV C++ classes and functions already adapted to run over any CUDA-enabled NVIDIA GPU. Hence it is possible to start using the GPU's performance power even with little experience with CUDA or the GPU architecture. On the other hand, using it efficiently requires some basic knowledge. This and the next sections highlight the main OpenCV GPU module classes and functions along with some CUDA explanations.

The first thing to do in order to use the OpenCV GPU module is to build it. It is necessary to have a CUDA-enabled NVIDIA GPU (only the very old ones are not enabled) and the latest CUDA and OpenCV versions. At the time of this writing the latest versions are OpenCV 2.2 and CUDA 4.0.

There is also a tricky procedure while building with these new versions concerning another prerequisite: the NVIDIA Performance Primitives (NPP) library. This library contains some vision functions used by the OpenCV GPU module. Before the CUDA 4.0 release the NPP library was separate from the CUDA toolkit, thus the building procedure was different. The latest CUDA release (4.0) includes the NPP library as part of the toolkit. Thus, it is important to get the trunk version of OpenCV that already considers this change in CUDA 4.0. Building procedure details can be found at [5].

Before building it is useful to know the compute capability of the installed GPU. As the GPU architecture evolves the CUDA software reflects these changes. The CUDA compute capability, or just capability, identifies the set of features that were added generation after generation of GPU architectures. Right now, the newest compute capability is 2.1. The list of features for each capability can be found in [4].

The OpenCV GPU module comes with binaries for capabilities 1.3 and 2.0. For capabilities 1.1 and 1.2 it works with CUDA intermediary code (PTX). Using PTX code means that the JIT (Just In Time) compiler will be used in the first execution. This means that the first execution will be slower than the subsequent executions. The OpenCV GPU module does not work if the GPU has capability 1.0 ([7]).

D. The OpenCV GPU module

This section provides basic information on how to use the OpenCV GPU module. The complete set of ported functions is well documented in [7].

1) Device information and management: The functions in this group provide several pieces of device information and allow setting the current device if there is more than one installed in the same machine. Part of these functions are presented in Table II; some of them are methods that belong to the C++ class DeviceInfo.

TABLE II
DEVICE INFORMATION AND MANAGEMENT SELECTED FUNCTIONS.

int getCudaEnabledDeviceCount();   Returns the number of CUDA-enabled devices installed.
void setDevice(int device);        Sets the device and initializes it for the current thread.
int getDevice();                   Returns the current device index.
string DeviceInfo::name();         Returns the device name.
int DeviceInfo::majorVersion();    Returns the major compute capability version.
int DeviceInfo::minorVersion();    Returns the minor compute capability version.
size_t DeviceInfo::freeMemory();   Returns the amount of free memory in bytes.
size_t DeviceInfo::totalMemory();  Returns the amount of total memory in bytes.
bool DeviceInfo::isCompatible();   Returns true if the GPU module can be run on the device.

2) Data structures: The basic data structure for OpenCV GPU module users is the GpuMat, which is very similar to the OpenCV Mat. There are some limitations, like no arbitrary dimensions support (only 2D), no functions that return references to its data (because references on the GPU are not valid for the CPU) and no expression templates technique support [7]. The example below presents a simple code to illustrate GpuMat initialization.

cv::gpu::GpuMat dst;
cv::gpu::GpuMat src(
    cv::imread("ressaca-no-arpoador.jpg",
               CV_LOAD_IMAGE_GRAYSCALE));

First it initializes two GpuMat structures that will be used in other examples: dst and src. As it demonstrates, the OpenCV function imread(), which returns a Mat structure, can be used in the constructor of the GpuMat to initialize it with an image's data. This is an implicit conversion that can be used to create GpuMat structures to run on the GPU.
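Besides the implicit conversion through the constructor, data can also be moved explicitly; the sketch below (ours; file name is an assumption) uses the upload() and download() members of GpuMat:

#include <opencv2/core/core.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <opencv2/highgui/highgui.hpp>

int main() {
    cv::Mat host = cv::imread("input.jpg", CV_LOAD_IMAGE_GRAYSCALE);

    cv::gpu::GpuMat dev;
    dev.upload(host);       // explicit copy: host memory -> GPU memory

    cv::Mat back;
    dev.download(back);     // explicit copy: GPU memory -> host memory
    return 0;
}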
Note that there are two namespaces used in the code. The cv::gpu namespace is used to distinguish the members from the upper level cv namespace, which contains all OpenCV components. There are other lower level data structures that could be used to write new CUDA kernels; the related classes are described in [7].
3) Using GPU module functions: The code example below shows a call of the threshold() function that will run on the GPU. Note that the namespace indicates that the call refers to the GPU version of threshold(). Also, the first two arguments are of the GpuMat type declared and initialized previously. The example shows that a user familiar with OpenCV will not have any difficulty using the OpenCV GPU module. The example also shows how to use an explicit conversion from GpuMat to Mat: the function imwrite() requires a Mat as its second argument, so the GpuMat structure (dst) is typecast in order to be used by the function.

cv::gpu::threshold(src, dst,
                   128.0, 255.0, CV_THRESH_BINARY);
cv::imwrite("result.jpg", Mat(dst));
E. Performance considerations The performance results for an Intel Core I7 950 3.06GHz
As presented previously, porting an application to run on a GPU using the OpenCV GPU module is straightforward, but it still requires some work. The OpenCV GPU samples can be used to obtain some indicators about the performance of the user's specific card.

Here we give an example of how to compute the speedup for a given application or portion of code. The speedup is basically a metric that indicates how much faster an application runs when executed on better hardware (usually parallel hardware). Speedup is defined by the formula:

S_p = \frac{T_s}{T_p}   (12)

where T_s is the sequential time and T_p is the parallel execution time.

To calculate the speedup, first it is necessary to get the execution elapsed time (or wall time) for both versions. The elapsed time is obtained by getting a start time before the portion of code to be timed and subtracting it from the end time obtained after its execution.

The operating systems usually provide several timers that can be used to accomplish this task. The example below uses the Linux gettimeofday timer to calculate the elapsed time in microseconds. In the example, the threshold() function was timed both for CPU (T_s) and GPU (T_p).
#include "opencv2/gpu/gpu.hpp"
struct timeval start, end; using namespace cv;
long mtime, seconds, useconds; using namespace std;
...
gettimeofday(&start, NULL); 2) GpuMat initialization: As illustrated before, the GpuMat
cv::gpu::threshold(src, dst, is the main data structure an it is used as the main argument
128.0, 255.0, CV_THRESH_BINARY); in all of the OpenCV GPU functions. In the code example
gettimeofday(&end, NULL); bellow, it is also initialized with the imread() function.
seconds = end.tv_sec - start.tv_sec; int main(int argc, char ** argv) {
useconds = end.tv_usec - start.tv_usec; const char* filename =
mtime = seconds * 1000000 + useconds; argc >=2 ? argv[1] : "imagens/lena.jpg";
23
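Following that instruction, the CPU version of the timed section would look like this (our reconstruction, not code from the paper; srcMat and dstMat are hypothetical Mat variables):

gettimeofday(&start, NULL);
// Same call without the gpu:: namespace, operating on Mat structures.
cv::threshold(srcMat, dstMat, 128.0, 255.0, CV_THRESH_BINARY);
gettimeofday(&end, NULL);
seconds = end.tv_sec - start.tv_sec;
useconds = end.tv_usec - start.tv_usec;
mtime = seconds * 1000000 + useconds;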
The performance results for an Intel Core i7 950 3.06GHz Quad-Core processor and an NVIDIA GeForce GTX 580 1.5GB GDDR5 card are presented using the Linux Ubuntu 11.04 operating system; the compiler was g++ 4.5.2, with OpenCV 2.2 and CUDA 4.0. Each version was run ten times and the averages were used to calculate the speedup.

The average CPU time was about 2625 microseconds and the average GPU time was about 264 microseconds. After eliminating the upper and the lower values the standard deviation was 9.8 for the GPU executions and 12.4 for the CPU executions. This gives us a speedup of almost 10x for the GPU execution.

The image used has 2835 x 1225 pixels. The resulting image has the same size in pixels. Figure 18 presents the original image and the resulting image.

Fig. 18. Original and resulting images for the threshold() performance test (original extracted from www.ipanema.blog.br).

F. Porting an existing application

This section shows how to port an existing C++ application to run with the OpenCV GPU module. We use the dft sample that comes with the OpenCV package (the original code is in opencv/samples/cpp/dft.cpp). The next subsections explain the main procedures to convert the application.

1) Header files: The first step is to add the header file gpu.hpp as in the example below (the original header files remain the same as in the original dft.cpp code). The example also shows the cv and std namespaces declared.

#include "opencv2/gpu/gpu.hpp"
using namespace cv;
using namespace std;

2) GpuMat initialization: As illustrated before, the GpuMat is the main data structure and it is used as the main argument in all of the OpenCV GPU functions. In the code example below, it is also initialized with the imread() function.

int main(int argc, char ** argv) {
    const char* filename =
        argc >= 2 ? argv[1] : "imagens/lena.jpg";
    gpu::GpuMat img(
        imread(filename, CV_LOAD_IMAGE_GRAYSCALE));