Introduction To Programming With OpenCV
Gady Agam Department of Computer Science Illinois Institute of Technology January 27, 2006
Abstract:
The purpose of this document is to get you started quickly with OpenCV without having to go through lengthy reference manuals. Once you understand these basics, you will be able to consult the OpenCV manuals as needed.
Contents
Introduction
  o Description of OpenCV
  o Resources
  o OpenCV naming conventions
  o Compilation instructions
  o Example C Program
GUI commands
  o Window management
  o Input handling
Basic OpenCV data structures
  o Image data structure
  o Matrices and vectors
  o Other data structures
Working with images
  o Allocating and releasing images
  o Reading and writing images
  o Accessing image elements
  o Image conversion
  o Drawing commands
Working with matrices
  o Allocating and releasing matrices
  o Accessing matrix elements
  o Matrix/vector operations
Working with video sequences
  o Capturing a frame from a video sequence
  o Getting/setting frame information
  o Saving a video file
Introduction
Description of OpenCV
General description:
  o Open source computer vision library in C/C++.
  o Optimized and intended for real-time applications.
  o OS/hardware/window-manager independent.
  o Generic image/video loading, saving, and acquisition.
  o Both low and high level API.
  o Provides interface to Intel's Integrated Performance Primitives (IPP) with processor specific optimization (Intel processors).
Features:
  o Image data manipulation (allocation, release, copying, setting, conversion).
  o Image and video I/O (file and camera based input, image/video file output).
  o Matrix and vector manipulation and linear algebra routines (products, solvers, eigenvalues, SVD).
  o Various dynamic data structures (lists, queues, sets, trees, graphs).
  o Basic image processing (filtering, edge detection, corner detection, sampling and interpolation, color conversion, morphological operations, histograms, image pyramids).
  o Structural analysis (connected components, contour processing, distance transform, various moments, template matching, Hough transform, polygonal approximation, line fitting, ellipse fitting, Delaunay triangulation).
  o Camera calibration (finding and tracking calibration patterns, calibration, fundamental matrix estimation, homography estimation, stereo correspondence).
  o Motion analysis (optical flow, motion segmentation, tracking).
  o Object recognition (eigen-methods, HMM).
  o Basic GUI (display image/video, keyboard and mouse handling, scroll-bars).
  o Image labeling (line, conic, polygon, text drawing).
OpenCV modules:
  o cv - Main OpenCV functions.
  o cvaux - Auxiliary (experimental) OpenCV functions.
  o cxcore - Data structures and linear algebra support.
  o highgui - GUI functions.
Resources
Reference manuals:
o <opencv-root>/docs/index.htm
Web resources:
  o Official webpage: http://www.intel.com/technology/computing/opencv/
  o Software download: http://sourceforge.net/projects/opencvlibrary/
Books:
  o Open Source Computer Vision Library by Gary R. Bradski, Vadim Pisarevsky, and Jean-Yves Bouguet, Springer, 1st ed. (June, 2006).
Sample programs for video processing (in <opencv-root>/samples/c/):
  o color tracking: camshiftdemo
  o point tracking: lkdemo
  o motion segmentation: motempl
  o edge detection: laplace
Sample programs for image processing (in <opencv-root>/samples/c/):
  o edge detection: edge
  o segmentation: pyramid_segmentation
  o morphology: morphology
  o histogram: demhist
  o distance transform: distrans
  o ellipse fitting: fitellipse
Header files:
#include <cv.h>
#include <cvaux.h>
#include <highgui.h>
#include <cxcore.h>
Compilation instructions
Linux:
g++ hello-world.cpp -o hello-world \
    -I /usr/local/include/opencv -L /usr/local/lib \
    -lm -lcv -lhighgui -lcvaux
Windows:
In the project preferences set the path to the OpenCV header files and the path to the OpenCV library files.
Example C Program
////////////////////////////////////////////////////////////////////////
//
// hello-world.cpp
//
// This is a simple, introductory OpenCV program. The program reads an
// image from a file, inverts it, and displays the result.
//
////////////////////////////////////////////////////////////////////////
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <cv.h>
#include <highgui.h>

int main(int argc, char *argv[])
{
  IplImage* img = 0;
  int height,width,step,channels;
  uchar *data;
  int i,j,k;

  if(argc<2){
    printf("Usage: main <image-file-name>\n\7");
    exit(0);
  }

  // load an image
  img=cvLoadImage(argv[1]);
  if(!img){
    printf("Could not load image file: %s\n",argv[1]);
    exit(0);
  }

  // get the image data
  height    = img->height;
  width     = img->width;
  step      = img->widthStep;
  channels  = img->nChannels;
  data      = (uchar *)img->imageData;
  printf("Processing a %dx%d image with %d channels\n",height,width,channels);

  // create a window
  cvNamedWindow("mainWin", CV_WINDOW_AUTOSIZE);
  cvMoveWindow("mainWin", 100, 100);

  // invert the image
  for(i=0;i<height;i++) for(j=0;j<width;j++) for(k=0;k<channels;k++)
    data[i*step+j*channels+k]=255-data[i*step+j*channels+k];

  // show the image
  cvShowImage("mainWin", img );

  // wait for a key
  cvWaitKey(0);

  // release the image
  cvReleaseImage(&img );
  return 0;
}
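The inversion loop in the program above indexes pixels through the row step rather than the image width, because each row may carry padding bytes. The following is a minimal sketch of the same loop on a plain byte buffer, with no OpenCV dependency. The function name invert_bytes is hypothetical and is not part of OpenCV.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCV function).
 * Invert every channel value of an 8-bit image stored row by row.
 * step is the row length in bytes and may exceed width*channels
 * when rows are padded; padding bytes are left untouched. */
void invert_bytes(unsigned char *data, int height, int width,
                  int step, int channels)
{
    assert(data != NULL);
    assert(step >= width * channels);
    for (int i = 0; i < height; i++) {
        /* start of row i, located with the step, not the width */
        unsigned char *row = data + (size_t)i * (size_t)step;
        for (int j = 0; j < width * channels; j++)
            row[j] = (unsigned char)(255 - row[j]);
    }
}
```

For an IplImage, the corresponding arguments would be img->imageData (cast to unsigned char*), img->height, img->width, img->widthStep, and img->nChannels.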
GUI commands
Window management
Load an image:
IplImage* img=0;
img=cvLoadImage(fileName);
if(!img) printf("Could not load image file: %s\n",fileName);
Display an image:
cvShowImage("win1",img);
Can display a color or grayscale byte/float image. A byte image is assumed to have values in the range [0, 255]. A float image is assumed to have values in the range [0, 1]. A color image is assumed to have data in BGR order.
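To make the two display ranges concrete, here is a small sketch of how a float intensity could be mapped to a byte for display, assuming float images lie in [0, 1] and byte images in [0, 255]. The function float_to_byte is hypothetical; it illustrates the relationship between the ranges and is not OpenCV's actual display code.

```c
/* Hypothetical helper (not an OpenCV function).
 * Map a float intensity, assumed to lie in [0,1], to a byte in [0,255].
 * Out-of-range inputs are clamped; the +0.5f rounds to nearest. */
unsigned char float_to_byte(float v)
{
    if (v <= 0.0f) return 0;
    if (v >= 1.0f) return 255;
    return (unsigned char)(v * 255.0f + 0.5f);
}
```

A float image whose values fall outside [0, 1] would therefore appear saturated under this mapping, which is one reason to rescale float data before displaying it.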
Close a window:
cvDestroyWindow("win1");
Resize a window:
cvResizeWindow("win1",100,100); // new width/height in pixels
Input handling
event: CV_EVENT_LBUTTONDOWN, CV_EVENT_RBUTTONDOWN, CV_EVENT_MBUTTONDOWN,
       CV_EVENT_LBUTTONUP, CV_EVENT_RBUTTONUP, CV_EVENT_MBUTTONUP,
       CV_EVENT_LBUTTONDBLCLK, CV_EVENT_RBUTTONDBLCLK, CV_EVENT_MBUTTONDBLCLK,
       CV_EVENT_MOUSEMOVE
flags: CV_EVENT_FLAG_CTRLKEY, CV_EVENT_FLAG_SHIFTKEY, CV_EVENT_FLAG_ALTKEY,
       CV_EVENT_FLAG_LBUTTON, CV_EVENT_FLAG_RBUTTON, CV_EVENT_FLAG_MBUTTON
The keyboard does not have an event handler. Get keyboard input without blocking:
int key;
key=cvWaitKey(10); // wait 10ms for input
IPL image:
IplImage
  |-- int  nChannels;     // Number of color channels (1,2,3,4)
  |-- int  depth;         // Pixel depth in bits:
  |                       //   IPL_DEPTH_8U, IPL_DEPTH_8S,
  |                       //   IPL_DEPTH_16U,IPL_DEPTH_16S,
  |                       //   IPL_DEPTH_32S,IPL_DEPTH_32F,
  |                       //   IPL_DEPTH_64F
  |-- int  width;         // image width in pixels
  |-- int  height;        // image height in pixels
  |-- char* imageData;    // pointer to aligned image data
  |                       // Note that color images are stored in BGR order
  |-- int  dataOrder;     // 0 - interleaved color channels,
  |                       // 1 - separate color channels
  |                       // cvCreateImage can only create interleaved images
  |-- int  origin;        // 0 - top-left origin,
  |                       // 1 - bottom-left origin (Windows bitmaps style)
  |-- int  widthStep;     // size of aligned image row in bytes
  |-- int  imageSize;     // image data size in bytes = height*widthStep
  |-- struct _IplROI *roi;// image ROI. when not NULL specifies image
  |                       // region to be processed.
  |-- char *imageDataOrigin; // pointer to the unaligned origin of image data
  |                          // (needed for correct image deallocation)
  |
  |-- int  align;         // Alignment of image rows: 4 or 8 byte alignment
  |                       // OpenCV ignores this and uses widthStep instead
  |-- char colorModel[4]; // Color model - ignored by OpenCV
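The widthStep and imageSize fields are related to the image dimensions through row padding. The sketch below shows one way such values could be computed, assuming each row is padded up to a multiple of the alignment (4 or 8 bytes). The names aligned_row_bytes and image_bytes are hypothetical helpers, not OpenCV functions, and real images should always be read through img->widthStep rather than recomputed.

```c
#include <assert.h>

/* Hypothetical helper (not an OpenCV function).
 * Bytes per image row after padding to a multiple of align bytes.
 * bytes_per_channel is 1 for 8-bit data, 4 for 32-bit float data, etc. */
int aligned_row_bytes(int width, int channels, int bytes_per_channel, int align)
{
    assert(width > 0 && channels > 0 && bytes_per_channel > 0 && align > 0);
    int raw = width * channels * bytes_per_channel;
    return ((raw + align - 1) / align) * align;  /* round up to a multiple */
}

/* Total size of the pixel buffer: height rows of padded length,
 * matching imageSize = height*widthStep in the structure above. */
int image_bytes(int height, int row_bytes)
{
    return height * row_bytes;
}
```

For example, a 3-pixel-wide, 3-channel byte image has 9 bytes of pixel data per row, which this sketch pads to 12 bytes under 4-byte alignment.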
Matrices:
CvMat                      // 2D array
  |-- int   type;          // elements type (uchar,short,int,float,double) and flags
  |-- int   step;          // full row length in bytes
  |-- int   rows, cols;    // dimensions
  |-- int   height, width; // alternative dimensions reference
  |-- union data;
      |-- uchar*  ptr;     // data pointer for an unsigned char matrix
      |-- short*  s;       // data pointer for a short matrix
      |-- int*    i;       // data pointer for an integer matrix
      |-- float*  fl;      // data pointer for a float matrix
      |-- double* db;      // data pointer for a double matrix

CvMatND                    // N-dimensional array
  |-- int   type;          // elements type (uchar,short,int,float,double) and flags
  |-- int   dims;          // number of array dimensions
  |-- union data;
  |   |-- uchar*  ptr;     // data pointer for an unsigned char matrix
  |   |-- short*  s;       // data pointer for a short matrix
  |   |-- int*    i;       // data pointer for an integer matrix
  |   |-- float*  fl;      // data pointer for a float matrix
  |   |-- double* db;      // data pointer for a double matrix
  |
  |-- struct dim[];        // information for each dimension
      |-- size;            // number of elements in a given dimension
      |-- step;            // distance between elements in a given dimension
Generic arrays:
CvArr*  // Used only as a function parameter to specify that the
        // function accepts arrays of more than a single type, such
        // as: IplImage*, CvMat* or even CvSeq*. The particular array
        // type is determined at runtime by analyzing the first 4
        // bytes of the header of the actual array.
Scalars:
CvScalar |-- double val[4]; //4D vector
Initializer function:
CvScalar s = cvScalar(double val0, double val1=0, double val2=0, double val3=0);
Example:
CvScalar s = cvScalar(20.0);
s.val[0]=10.0;
Note that the initializer function has the same name as the data structure only starting with a lower case character. It is not a C++ constructor.
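The initializer pattern can be illustrated without OpenCV. The sketch below defines a stand-in 4-element scalar and an ordinary function that returns it by value. The names Scalar4 and scalar4 are hypothetical, and since plain C has no default arguments, all four values are passed explicitly here.

```c
/* Hypothetical stand-in for a 4-element scalar such as CvScalar. */
typedef struct {
    double val[4];
} Scalar4;

/* Initializer function: an ordinary function that fills the struct
 * and returns it by value. It is not a constructor. */
Scalar4 scalar4(double v0, double v1, double v2, double v3)
{
    Scalar4 s;
    s.val[0] = v0;
    s.val[1] = v1;
    s.val[2] = v2;
    s.val[3] = v3;
    return s;
}
```

After Scalar4 s = scalar4(20.0, 0, 0, 0); the fields remain directly writable, for example s.val[0] = 10.0;, just as in the CvScalar example.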
Points:
CvPoint      p = cvPoint(int x, int y);
CvPoint2D32f p = cvPoint2D32f(float x, float y);
CvPoint3D32f p = cvPoint3D32f(float x, float y, float z);
E.g.:
p.x=5.0;
p.y=5.0;
Rectangular dimensions:
CvSize       r = cvSize(int width, int height);
CvSize2D32f  r = cvSize2D32f(float width, float height);
Allocate an image:
IplImage* cvCreateImage(CvSize size, int depth, int channels);

  size:  cvSize(width,height);

  depth: pixel depth in bits: IPL_DEPTH_8U, IPL_DEPTH_8S, IPL_DEPTH_16U,
    IPL_DEPTH_16S, IPL_DEPTH_32S, IPL_DEPTH_32F, IPL_DEPTH_64F

  channels: Number of channels per pixel. Can be 1, 2, 3 or 4. The channels
    are interleaved. The usual data layout of a color image is
    b0 g0 r0 b1 g1 r1 ...
Examples:
// Allocate a 1-channel byte image
IplImage* img1=cvCreateImage(cvSize(640,480),IPL_DEPTH_8U,1);

// Allocate a 3-channel float image
IplImage* img2=cvCreateImage(cvSize(640,480),IPL_DEPTH_32F,3);
Release an image:
IplImage* img=cvCreateImage(cvSize(640,480),IPL_DEPTH_8U,1);
cvReleaseImage(&img);
Clone an image:
IplImage* img1=cvCreateImage(cvSize(640,480),IPL_DEPTH_8U,1);
IplImage* img2;
img2=cvCloneImage(img1);
By default, the loaded image is forced to be a 3-channel color image. This default can be modified by using:
img=cvLoadImage(fileName,flag);

flag: >0 the loaded image is forced to be a 3-channel color image
      =0 the loaded image is forced to be a 1-channel grayscale image
      <0 the loaded image is loaded as is (with number of channels in the file).
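The flag rule above can be restated as a small decision function. This is only a model of the rule as listed, written in plain C. The name loaded_channels is hypothetical and is not an OpenCV function.

```c
/* Hypothetical helper (not an OpenCV function).
 * Number of channels in the loaded image for a given flag, following
 * the rule listed above. file_channels is the channel count stored
 * in the file itself. */
int loaded_channels(int flag, int file_channels)
{
    if (flag > 0)  return 3;  /* forced 3-channel color */
    if (flag == 0) return 1;  /* forced 1-channel grayscale */
    return file_channels;     /* loaded as is */
}
```

Under this rule a grayscale file loaded with the default (positive) flag still yields a 3-channel image, so code that expects one channel should pass a flag of 0.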
The output file format is determined based on the file name extension.
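Assume that you need to access the k-th channel of the pixel at the i-th row and j-th column. The row index i is in the range [0, height-1]. The column index j is in the range [0, width-1]. The channel index k is in the range [0, nChannels-1].
Indirect access: (General, but inefficient, access to any type of image)
  o For a single-channel byte image: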
IplImage* img=cvCreateImage(cvSize(640,480),IPL_DEPTH_8U,1);
CvScalar s;
s=cvGet2D(img,i,j); // get the (i,j) pixel value
printf("intensity=%f\n",s.val[0]);
s.val[0]=111;
cvSet2D(img,i,j,s); // set the (i,j) pixel value
  o For a multi-channel float image:
IplImage* img=cvCreateImage(cvSize(640,480),IPL_DEPTH_32F,3);
CvScalar s;
s=cvGet2D(img,i,j); // get the (i,j) pixel value
printf("B=%f, G=%f, R=%f\n",s.val[0],s.val[1],s.val[2]);
s.val[0]=111;
s.val[1]=111;
s.val[2]=111;
cvSet2D(img,i,j,s); // set the (i,j) pixel value
Direct access: (Efficient access, but error prone) o For a single-channel byte image:
IplImage* img=cvCreateImage(cvSize(640,480),IPL_DEPTH_8U,1);
((uchar *)(img->imageData + i*img->widthStep))[j]=111;
Direct access using a pointer: (Simplified and efficient access under limiting assumptions) o For a single-channel byte image:
IplImage* img  = cvCreateImage(cvSize(640,480),IPL_DEPTH_8U,1);
int height     = img->height;
int width      = img->width;
int step       = img->widthStep/sizeof(uchar);
uchar* data    = (uchar *)img->imageData;
data[i*step+j] = 111;
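The direct-access expressions all reduce to one offset calculation into the image buffer. The sketch below computes that byte offset for an interleaved image in plain C. The function pixel_offset is a hypothetical helper, not an OpenCV function, and it assumes the row step is given in bytes, as widthStep is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCV function).
 * Byte offset of channel k of the pixel at row i, column j, for an
 * interleaved image whose rows are step bytes long. elem_size is the
 * size of one channel value in bytes (1 for byte, 4 for float). */
size_t pixel_offset(int i, int j, int k, int step, int channels, int elem_size)
{
    assert(i >= 0 && j >= 0 && k >= 0 && k < channels);
    return (size_t)i * (size_t)step
         + ((size_t)j * (size_t)channels + (size_t)k) * (size_t)elem_size;
}
```

For a single-channel byte image this reduces to i*step+j, which is the index used in the pointer example.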
Direct access using a C++ wrapper: (Simple and efficient access)
  o Define a C++ wrapper for single-channel byte images, multi-channel byte images, and multi-channel float images:
template<class T> class Image
{
  private:
  IplImage* imgp;
  public:
  Image(IplImage* img=0) {imgp=img;}
  ~Image(){imgp=0;}
  void operator=(IplImage* img) {imgp=img;}
  inline T* operator[](const int rowIndx) {
    return ((T *)(imgp->imageData + rowIndx*imgp->widthStep));}
};

typedef struct{
  unsigned char b,g,r;
} RgbPixel;

typedef struct{
  float b,g,r;
} RgbPixelFloat;

typedef Image<RgbPixel>       RgbImage;
typedef Image<RgbPixelFloat>  RgbImageFloat;
typedef Image<unsigned char>  BwImage;
typedef Image<float>          BwImageFloat;
  o For a multi-channel float image:
IplImage* img=cvCreateImage(cvSize(640,480),IPL_DEPTH_32F,3);
RgbImageFloat imgA(img);
imgA[i][j].b = 111;
imgA[i][j].g = 111;
imgA[i][j].r = 111;
Image conversion
Drawing commands
Draw a box:
// draw a box with red lines of width 1 between (100,100) and (200,200)
// (colors are given in BGR order, so red is cvScalar(0,0,255))
cvRectangle(img, cvPoint(100,100), cvPoint(200,200), cvScalar(0,0,255), 1);
Draw a circle:
// draw a circle at (100,100) with a radius of 20. Use green lines of width 1
cvCircle(img, cvPoint(100,100), 20, cvScalar(0,255,0), 1);
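Because color images are stored in BGR order, the first value of a drawing color is the blue component and the third is the red component. The sketch below shows a small helper that takes the more familiar red, green, blue order and stores the values in BGR order. Color4 and bgr_from_rgb are hypothetical names used only for this illustration.

```c
/* Hypothetical stand-in for a 4-element color value. */
typedef struct {
    double val[4];
} Color4;

/* Hypothetical helper (not an OpenCV function).
 * Build a BGR-ordered color from red, green, and blue components. */
Color4 bgr_from_rgb(double r, double g, double b)
{
    Color4 c;
    c.val[0] = b;    /* blue comes first in BGR order */
    c.val[1] = g;
    c.val[2] = r;    /* red comes last */
    c.val[3] = 0.0;  /* unused fourth component */
    return c;
}
```

With this ordering, pure red is stored as (0, 0, 255) and pure green as (0, 255, 0).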
cvPolyLine(img,curveArr,nCurvePts,nCurves,isCurveClosed,cvScalar(0,255,255),lineWidth);
Add text:
CvFont font;
double hScale=1.0;
double vScale=1.0;
int    lineWidth=1;
cvInitFont(&font,CV_FONT_HERSHEY_SIMPLEX|CV_FONT_ITALIC, hScale,vScale,0,lineWidth);

cvPutText (img,"My comment",cvPoint(200,400), &font, cvScalar(255,255,0));
General:
  o OpenCV has a C interface to matrix operations. There are many alternatives that have a C++ interface (which is more convenient) and are as efficient as OpenCV.
  o Vectors are obtained in OpenCV as matrices having one of their dimensions as 1.
  o Matrices are stored row by row where each row has a 4 byte alignment.
Allocate a matrix:
CvMat* cvCreateMat(int rows, int cols, int type);

  type: Type of the matrix elements. Specified in the form
    CV_<bit_depth>(S|U|F)C<number_of_channels>. E.g.: CV_8UC1 means an
    8-bit unsigned single-channel matrix, CV_32SC2 means a 32-bit signed
    matrix with two channels.

  Example:
    CvMat* M = cvCreateMat(4,4,CV_32FC1);
Release a matrix:
CvMat* M = cvCreateMat(4,4,CV_32FC1);
cvReleaseMat(&M);
Clone a matrix:
CvMat* M1 = cvCreateMat(4,4,CV_32FC1);
CvMat* M2;
M2=cvCloneMat(M1);
Initialize a matrix:
double a[] = { 1,  2,  3,  4,
               5,  6,  7,  8,
               9, 10, 11, 12 };

CvMat Ma=cvMat(3, 4, CV_64FC1, a);
Alternatively:
CvMat Ma;
cvInitMatHeader(&Ma, 3, 4, CV_64FC1, a);
Assume that you need to access the (i,j) element of a 2D floating-point matrix M.
Indirect matrix element access:
cvmSet(M,i,j,2.0); // Set M(i,j)
t = cvmGet(M,i,j); // Get M(i,j)
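Underneath such accessors, an element is located from the data pointer and the row step in bytes. The following is a minimal sketch of that lookup for a row-major float matrix, using a plain float buffer rather than a CvMat. The function mat_elem is a hypothetical helper, and it assumes the step is a whole number of float elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCV function).
 * Address of element (i,j) in a row-major float matrix whose rows
 * are step bytes apart (step may include padding). */
float *mat_elem(float *data, int step, int i, int j)
{
    assert(data != NULL && i >= 0 && j >= 0);
    assert(step % (int)sizeof(float) == 0);
    /* convert the byte step to a count of floats per row */
    return data + (size_t)i * ((size_t)step / sizeof(float)) + (size_t)j;
}
```

Writing *mat_elem(data, step, i, j) = 2.0f; then plays the role of the set call, and reading the same expression plays the role of the get call.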
Matrix/vector operations
Matrix-matrix operations:
CvMat *Ma, *Mb, *Mc;
cvAdd(Ma, Mb, Mc);      // Ma+Mb -> Mc
cvSub(Ma, Mb, Mc);      // Ma-Mb -> Mc
cvMatMul(Ma, Mb, Mc);   // Ma*Mb -> Mc
Vector products:
double va[] = {1, 2, 3};
double vb[] = {0, 0, 1};
double vc[3];

CvMat Va=cvMat(3, 1, CV_64FC1, va);
CvMat Vb=cvMat(3, 1, CV_64FC1, vb);
CvMat Vc=cvMat(3, 1, CV_64FC1, vc);

double res=cvDotProduct(&Va,&Vb); // dot product:   Va . Vb -> res
cvCrossProduct(&Va, &Vb, &Vc);    // cross product: Va x Vb -> Vc
Note that Va, Vb, Vc, must be 3 element vectors in a cross product.
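As a check on what the two products compute, here is a plain C sketch of the dot and cross products for 3-element vectors, independent of OpenCV. The names dot3 and cross3 are hypothetical. With the vectors from the example, va = (1, 2, 3) and vb = (0, 0, 1), the dot product is 3 and the cross product is (2, -1, 0).

```c
/* Hypothetical helper (not an OpenCV function).
 * Dot product of two 3-element vectors. */
double dot3(const double a[3], const double b[3])
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

/* Hypothetical helper (not an OpenCV function).
 * Cross product c = a x b; c must not alias a or b. */
void cross3(const double a[3], const double b[3], double c[3])
{
    c[0] = a[1]*b[2] - a[2]*b[1];
    c[1] = a[2]*b[0] - a[0]*b[2];
    c[2] = a[0]*b[1] - a[1]*b[0];
}
```

The cross product formula only exists for 3-element vectors, which is why the restriction applies.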
CvMat *Ma, *Mb;
cvTranspose(Ma, Mb);      // transpose(Ma) -> Mb (cannot transpose onto self)
CvScalar t = cvTrace(Ma); // trace(Ma) -> t.val[0]
double d = cvDet(Ma);     // det(Ma) -> d
cvInvert(Ma, Mb);         // inv(Ma) -> Mb
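For reference, the transpose, trace, and determinant listed above can be written out for a 3x3 matrix in plain C. This sketch stores the matrix row by row in a 9-element array and uses a separate destination for the transpose, in line with the note that a matrix is not transposed onto itself. The names transpose3, trace3, and det3 are hypothetical.

```c
/* Hypothetical helpers (not OpenCV functions).
 * 3x3 matrices are stored row by row in 9-element arrays. */

/* Out-of-place transpose: dst must be a different array from src. */
void transpose3(const double src[9], double dst[9])
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            dst[j*3 + i] = src[i*3 + j];
}

/* Trace: sum of the diagonal elements. */
double trace3(const double m[9])
{
    return m[0] + m[4] + m[8];
}

/* Determinant by cofactor expansion along the first row. */
double det3(const double m[9])
{
    return m[0]*(m[4]*m[8] - m[5]*m[7])
         - m[1]*(m[3]*m[8] - m[5]*m[6])
         + m[2]*(m[3]*m[7] - m[4]*m[6]);
}
```

A matrix and its transpose have the same trace and determinant, which makes a convenient sanity check when experimenting with these operations.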
The flags cause U and V to be returned transposed (does not work well without the transpose flags).
OpenCV supports capturing images from a camera or a video file (AVI). Initializing capture from a camera:
CvCapture* capture = cvCaptureFromCAM(0); // capture from video device #0
Capturing a frame:
IplImage* img = 0;
if(!cvGrabFrame(capture)){              // capture a frame
  printf("Could not grab a frame\n\7");
  exit(0);
}
img=cvRetrieveFrame(capture);
To obtain images from several cameras simultaneously, first grab an image from each camera. Retrieve the captured images after the grabbing is complete.
Note that the image captured by the device is allocated/released by the capture function. There is no need to release it explicitly.
The total frame count is relevant for video files only. It does not seem to be working properly.
Get the position of the captured frame in [msec] with respect to the first frame, or get its index where the first frame starts with an index of 0. The relative position (ratio) is 0 in the first frame and 1 in the last frame. This ratio is valid only for capturing images from a file.
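The three position measures are related in a simple way. The sketch below computes the relative position and the time offset from a frame index in plain C. Both functions are hypothetical helpers, not OpenCV calls. The ratio formula follows from the statement that the ratio is 0 in the first frame and 1 in the last, and the millisecond formula additionally assumes a constant frame rate.

```c
#include <assert.h>

/* Hypothetical helper (not an OpenCV function).
 * Relative position of a frame: 0 for the first frame (index 0),
 * 1 for the last frame (index frame_count-1). */
double frame_ratio(int index, int frame_count)
{
    assert(frame_count > 1 && index >= 0 && index < frame_count);
    return (double)index / (double)(frame_count - 1);
}

/* Hypothetical helper (not an OpenCV function).
 * Position in milliseconds relative to the first frame,
 * assuming a constant frame rate of fps frames per second. */
double frame_msec(int index, double fps)
{
    assert(fps > 0.0 && index >= 0);
    return 1000.0 * (double)index / fps;
}
```

The ratio needs the total frame count, which is why it is only meaningful when capturing from a file.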
This only applies for capturing from a file. It does not seem to be working properly.
int frameW = 640; // 744 for firewire cameras
int frameH = 480; // 480 for firewire cameras
writer=cvCreateVideoWriter("out.avi",CV_FOURCC('P','I','M','1'),
                           fps,cvSize(frameW,frameH),isColor);
cvGrabFrame(capture);          // capture a frame
img=cvRetrieveFrame(capture);  // retrieve the captured frame
cvWriteFrame(writer,img);      // add the frame to the file
To view the captured frames during capture, add the following in the loop:
cvShowImage("mainWin", img);
key=cvWaitKey(20); // wait 20 ms
Note that without the 20[msec] delay the captured sequence is not displayed properly.