Writeup PDF
Index:
I. Abstract
II. Overview
III. Design
IV. Constraints
VIII. Conclusion
X. Program Description
XIV. Platform
I. Abstract:
This project deals with the detection and recognition of hand gestures. Images of the hand gestures are taken using a Nokia N900 cell phone and matched against the images in the database, and the best match is returned. Gesture recognition is one of
the essential techniques to build user-friendly interfaces. For example, a robot that
can recognize hand gestures can take commands from humans, and for those who
are unable to speak or hear, having a robot that can recognize sign language would
allow them to communicate with it. Hand gesture recognition could help in video
gaming by allowing players to interact with the game using gestures instead of using
a controller. However, such an algorithm needs to be more robust to account for the
myriad of possible hand positions in three-dimensional space. It also needs to work
with video rather than static images. That is beyond the scope of our project.
II. Overview:
Distance Transform: Each pixel p of the object is labeled with its distance to the closest point q in the background.
Moments: Image moments are useful for describing objects after segmentation. Simple properties of the image found via image moments include its area (or total intensity), its centroid, and information about its orientation.
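As a hedged illustration (not code from our program; the image and function names here are ours), these properties can be read off with OpenCV's cvMoments:

#include <opencv/cv.h>
#include <stdio.h>
#include <math.h>

/* Sketch: area, centroid, and orientation of a segmented object from
   image moments. "binary_img" is assumed to be an 8-bit, single-channel
   binary image produced by thresholding. */
void describe_object(IplImage* binary_img)
{
    CvMoments m;
    cvMoments(binary_img, &m, 1);                        /* treat pixels as binary   */

    double area = cvGetSpatialMoment(&m, 0, 0);          /* area (total intensity)   */
    if (area == 0)
        return;                                          /* empty image, nothing to do */
    double cx   = cvGetSpatialMoment(&m, 1, 0) / area;   /* centroid x               */
    double cy   = cvGetSpatialMoment(&m, 0, 1) / area;   /* centroid y               */

    /* orientation from the second-order central moments */
    double mu20 = cvGetCentralMoment(&m, 2, 0);
    double mu02 = cvGetCentralMoment(&m, 0, 2);
    double mu11 = cvGetCentralMoment(&m, 1, 1);
    double theta = 0.5 * atan2(2.0 * mu11, mu20 - mu02);

    printf("area = %f, centroid = (%f, %f), orientation = %f rad\n",
           area, cx, cy, theta);
}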
Match ratio of two distance-transformed images of the same size
    = (number of pixels whose difference is zero or below a certain threshold)
      / (total number of pixels in the distance-transformed image)
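A minimal sketch of this ratio computation, assuming dt1 and dt2 are two 8-bit distance-transformed images already scaled to the same size (the function and variable names are ours, not the program's):

#include <opencv/cv.h>

/* Sketch: fraction of pixel positions whose distance-transform values
   differ by at most "threshold". dt1 and dt2 are assumed to be 8-bit,
   single-channel images of identical size. */
double match_ratio(IplImage* dt1, IplImage* dt2, double threshold)
{
    IplImage* diff = cvCreateImage(cvGetSize(dt1), IPL_DEPTH_8U, 1);
    IplImage* mask = cvCreateImage(cvGetSize(dt1), IPL_DEPTH_8U, 1);

    cvAbsDiff(dt1, dt2, diff);                 /* per-pixel |dt1 - dt2|            */
    cvCmpS(diff, threshold, mask, CV_CMP_LE);  /* 255 where difference <= threshold */

    int matching = cvCountNonZero(mask);
    int total    = dt1->width * dt1->height;

    cvReleaseImage(&diff);
    cvReleaseImage(&mask);
    return (double)matching / (double)total;
}

The program itself counts the matching pixels with explicit loops (see the code in the appendix); this sketch uses OpenCV's array operations only for brevity.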
III. Design:
Step 1: User takes a picture of the hand to be tested either through the cell phone
camera or from the Internet.
Step 2: The image is converted into gray scale and smoothed using a Gaussian
kernel.
Step 3: Convert the gray scale image into a binary image. Set a threshold so that the
pixels that are above a certain intensity are set to white and those below are set to
black.
Step 4: Find contours, then remove noise and smooth the edges, smoothing the large contours and merging the numerous small ones.
Step 5: Compute the minimum-area bounding box around the hand contour.
Step 6: The angle of inclination of the contour and the location of its center with respect to the center of the image are obtained from the bounding box information around the contour (see the code sketch after Step 9).
Step 7: The hand contours inside the bounding boxes are extracted and rotated in
such a way that the bounding boxes are made upright (inclination angle is 0) so that
matching becomes easy.
Step 8: Both the images are scaled so that their widths are set to the greater of the
two widths and their heights are set to the greater of the two heights. This is done
so that the images are the same size.
Step 9: The distance transform of both the query image and the candidate images
are computed and the best match is returned.
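The following is a rough sketch of Steps 2-6 and Step 8 using OpenCV's C API (the API the rest of this report uses). The image names, the threshold value of 100, and the helper functions preprocess() and scale_to_common_size() are our own illustrations, not the project's actual code.

#include <opencv/cv.h>

/* Sketch of Steps 2-6: gray scale conversion, Gaussian smoothing,
   thresholding, contour extraction, and the minimum-area bounding box. */
CvBox2D preprocess(IplImage* color_img, CvMemStorage* storage, CvSeq** out_contour)
{
    IplImage* gray = cvCreateImage(cvGetSize(color_img), IPL_DEPTH_8U, 1);
    cvCvtColor(color_img, gray, CV_BGR2GRAY);             /* Step 2: gray scale      */
    cvSmooth(gray, gray, CV_GAUSSIAN, 5, 5, 0, 0);        /* Step 2: Gaussian kernel */

    cvThreshold(gray, gray, 100, 255, CV_THRESH_BINARY);  /* Step 3: binary image    */

    CvSeq* contour = NULL;
    cvFindContours(gray, storage, &contour,               /* Step 4: find contours   */
                   sizeof(CvContour), CV_RETR_LIST);

    /* Steps 5-6: the minimum-area box gives the center, size, and angle of
       inclination (assumes at least one contour was found). */
    CvBox2D box = cvMinAreaRect2(contour, 0);

    *out_contour = contour;
    cvReleaseImage(&gray);
    return box;
}

/* Sketch of Step 8: resize both images to the larger of the two widths
   and the larger of the two heights so they can be compared pixel-wise. */
void scale_to_common_size(IplImage* a, IplImage* b, IplImage** out_a, IplImage** out_b)
{
    CvSize common = cvSize(a->width  > b->width  ? a->width  : b->width,
                           a->height > b->height ? a->height : b->height);
    *out_a = cvCreateImage(common, a->depth, a->nChannels);
    *out_b = cvCreateImage(common, b->depth, b->nChannels);
    cvResize(a, *out_a, CV_INTER_LINEAR);
    cvResize(b, *out_b, CV_INTER_LINEAR);
}

The actual program also rotates the extracted contour regions upright (Step 7) before the resize; that rotation is sketched later, in the section on our own functions.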
IV. Constraints:
3. We must have each gesture in at least four orientations, 90 degrees apart, to return the best match.
V. Sample Images contained in the Database:
Images with different orientations are present within the database. In order to
maintain performance, we have a limited number of gesture images in our database.
New gesture images can be added to the database without any pre-processing.
Case 1:
(A)
In the above case, we can see the query image on the left hand side and the matched
candidate image from the database on the right.
Both the images are converted to binary images and the contours are computed.
Using the bounding box information obtained from the contours, we get the angle of inclination of the contours, as well as the center, height, and width of the bounding box. The results obtained after rotating and scaling the query and candidate images
are:
(a) Rotated and scaled query image    (b) Rotated and scaled candidate image
Now, the distance transforms of both the query image and the candidate images are
computed as follows:
Now, the difference of the two images is computed, and the match ratio is found by dividing the number of pixel positions whose difference is zero or below a certain threshold by the total number of pixels in one of the images. If the ratio is above 65%, the candidate image is declared a match and returned.
(B)
Case 2:
In the above case, the hand gesture on the left-hand side is slightly tilted, and the correct gesture from the database is still returned.
(a) The rotated and scaled query image    (b) The rotated and scaled candidate image
(a) Distance transform image of the query image    (b) Distance transform image of the candidate image
Case 4:
In the above case, the program returns the correct gesture image even though the database doesn't contain a left-hand gesture like the one in the query image, because the candidate image passes the ratio test.
Case 5:
If a gesture similar to the query image is not present in the database, then a no-match is returned.
Case 6:
Cases of false positives: a matching gesture is returned, but it is not the logically correct one.
VII. Results on the Cell Phone:
We got similar results on the cell phone as well; its performance was close to that on the desktop.
Based on our observation, we can conclude that the results mainly depend on:
1. The threshold used when converting the gray image to a binary image and finding contours. For example, we found that uneven lighting across the picture of the hand caused the algorithm to draw contours around the darkened areas in addition to the contour around the hand. Changing the threshold prevented that from happening.
2. The threshold for the ratio test while matching the distance-transformed images. The ratio threshold we used was 65%.
FeaturesUI.cpp: has been customized to produce the GUI shown in the images above.
X. Future Work:
2. More gesture images can be added to the database for the program to recognize.
void cvDistTransform( . . . )
int cvFindContours( . . . )
2. To compute the moments
double cvMatchShapes ( . . . )
cvMinAreaRect2( . . .)
4. To find the area of the contour
cvContourArea (…)
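As a hedged illustration of how two of these calls fit together (this helper is ours, not project code; min_area is a hypothetical parameter, and the 0.0 to 0.22 band echoes the moment threshold described in the matching code below):

#include <opencv/cv.h>
#include <math.h>

/* Sketch: reject tiny contours by area, then compare the remaining two
   contours with Hu-moment matching (a smaller value means more similar). */
bool contours_similar(CvSeq* c1, CvSeq* c2, double min_area)
{
    if (fabs(cvContourArea(c1)) < min_area || fabs(cvContourArea(c2)) < min_area)
        return false;

    double val = cvMatchShapes(c1, c2, CV_CONTOURS_MATCH_I1, 0);
    return val >= 0.0 && val <= 0.22;
}

In the program, the cvMatchShapes value is combined with the distance-transform ratio before a candidate image is accepted as a match.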
1. rotate_inverse_warp()
This function extracts the image contained within the bounding box, rotates the image
in the negative direction (the angle is obtained from the bounding box) and creates a
new image.
2. distance transform
This function uses the algorithm described in “Euclidean Distance Transform,” which
replaces binary image pixel intensities with values equal to their Euclidean distance from
the nearest edge. While we implemented this function, we were unable to get it to
work completely, so we ended up using OpenCV’s cvDistTransform().
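The exact bodies of these two functions are not reproduced in this report, but as hedged sketches (only the OpenCV calls and the CvBox2D fields are real; the function names are ours, and the crop to the bounding box done by rotate_inverse_warp() is omitted), the two steps might look roughly like this:

#include <opencv/cv.h>

/* Sketch of the rotation step: rotate the image about the bounding-box
   center by the negative of the box angle so the box becomes upright. */
IplImage* rotate_upright(IplImage* src, CvBox2D box)
{
    IplImage* dst = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);

    float m[6];
    CvMat map_matrix = cvMat(2, 3, CV_32FC1, m);
    cv2DRotationMatrix(box.center, -box.angle, 1.0, &map_matrix);

    cvWarpAffine(src, dst, &map_matrix,
                 CV_INTER_LINEAR + CV_WARP_FILL_OUTLIERS, cvScalarAll(0));
    return dst;
}

/* Sketch of the fallback distance transform: compute the Euclidean
   distance transform with cvDistTransform and rescale it to 8 bits so
   it can be saved and compared pixel-wise. */
IplImage* distance_transform_8u(IplImage* binary_img)
{
    IplImage* dist32f = cvCreateImage(cvGetSize(binary_img), IPL_DEPTH_32F, 1);
    IplImage* dist8u  = cvCreateImage(cvGetSize(binary_img), IPL_DEPTH_8U, 1);

    /* non-zero pixels receive their distance to the nearest zero pixel */
    cvDistTransform(binary_img, dist32f, CV_DIST_L2, CV_DIST_MASK_5, NULL, NULL);

    /* map the float distances into 0..255 */
    cvNormalize(dist32f, dist32f, 0.0, 255.0, CV_MINMAX, NULL);
    cvConvert(dist32f, dist8u);

    cvReleaseImage(&dist32f);
    return dist8u;
}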
Wrote functions for rotation, scaling, and distance transform. Debugged code, worked
on the report, the presentation slides, and presented the slides during the final
presentation.
Wrote the main pipeline for the project and also, got the program working on the cell
phone. Debugged code, worked on the report, the presentation slides, and presented
the slides during the final presentation.
XIV. Platform:
Acknowledgements
We’d like to thank Professor Noah Snavely and Kevin Matzen for guiding and assisting us
during this project and the course.
All the openCV functions that are used are marked in blue.
/* metric to compute the distance transform */
int found = 0;
int dist_type = CV_DIST_L2;
int mask_size = CV_DIST_MASK_5;
/* This is the main function that computes the matches. The query image
//and the database image are pre-processed. Both the images are
//converted to binary images and the contours are computed. The openCV
//function is used to compute the contours and it also returns the
//bounding box information like the height and the width of the
//bounding box and also the center of the bounding box with respect to
//the image. In addition it also returns the angle of inclination of
//the bounding box. This information is used to rotate the binary image
//in order to compute the distance transform of the images. The pixel
//values of the distance-transformed images are subtracted to compute
//the match. If the pixel difference is zero or less than a certain
//threshold, then it is determined as a match. The number of such
//pixels that satisfy this criterion is counted and the ratio of this
//count to the total number of pixels in the image gives us the ratio.
//In our program we have set the threshold to be at least 65%. If the
//ratio is greater than 65% then the candidate image in consideration
//is selected as a match. In addition, the moment value is also
//computed and set to between 0.0 and 0.22 to find the best match. */
/*****************/
/*****************/
/********************/
IplImage* can_img2=0;
can_img2 = cvCreateImage(cvSize(cmp_img->width, cmp_img->height), IPL_DEPTH_8U, 1);
cvCanny( gray_ath_img2, can_img2, 50, 100, 3 );
//cvShowImage("Canny Image", can_img2);
/******************/
// We are finding the contours here
// find the contours of the query image
CvMemStorage* g_storage = NULL;
//IplImage* contour_img = 0;
if( g_storage == NULL ) {
    //contour_img = cvCreateImage(cvSize(src_img->width, src_img->height), IPL_DEPTH_8U, 1);
    g_storage = cvCreateMemStorage(0);
}
else
{
    cvClearMemStorage( g_storage );
}
CvSeq* contour = 0;
CvSeq* first_contour = NULL;
int nc = 0; // total number of contours
nc = cvFindContours( gray_ath_img, g_storage, &first_contour,
                     sizeof(CvContour), CV_RETR_LIST );
contour = first_contour;   // cvFindContours fills first_contour; use it for drawing
cvZero(gray_ath_img);
if( contour )
{
    cvDrawContours( gray_ath_img, contour, cvScalarAll(255),
                    cvScalarAll(255), 100 );
}
CvSeq* contour2 = 0;
CvSeq* first_contour2 = NULL;
int nc2 = 0; // total number of contours
nc2 = cvFindContours( gray_ath_img2, g_storage2, &first_contour2,
                      sizeof(CvContour), CV_RETR_LIST );
contour2 = first_contour2;   // same fix for the candidate image
//printf( "Total Contours Detected in the Candidate Image: %d\n", nc2 );
cvZero(gray_ath_img2);
if( contour2 )
{
    cvDrawContours( gray_ath_img2, contour2, cvScalarAll(255),
                    cvScalarAll(255), 100 );
}
/**************************/
CvBox2D box;
box = cvMinAreaRect2(contour, 0);
float ang;
ang = box.angle;
// for size
CvSize2D32f siz = box.size;
double wid = siz.width;
double hei = siz.height;
/*
printf("Width and Height of Query_Image Box\n");
printf("Width : %f Height : %f Angle : %f\n", wid, hei, ang);
*/
// for size
CvSize2D32f siz2 = box2.size;
double wid2 = siz2.width;
double hei2 = siz2.height;
/*
printf("Width and Height of Candidate_Image Box\n");
printf("Width : %f Height : %f Angle : %f\n", wid2, hei2, ang2);
*/
/********************/
IplImage* res = convert3channel(gray_ath_img2);   /* 3-channel copy of the candidate contour image */
cvSaveImage( "s1.jpg",scale_img1);
cvSaveImage( "s2.jpg",scale_img2);
cvSaveImage("rot1.jpg", rotsc1);
cvSaveImage("rot2.jpg", rotsc2);
cvSaveImage("dt1.jpg", dt1);
cvSaveImage("dt22.jpg", dt2);
//cvShowImage("dt1", dt1);
//cvShowImage("dt2", dt2);
int count = 0;
int maxcount = dt1->width * dt1->height;
}
}
if ( (ratio >= 0.5 && ratio <= 1.0) && (val >= 0.0 && val <= 0.4) )   //0.2
{
std::cout << "\nMatch Found with DT";
found = 1;
return res;
}
else
{
    std::cout << "No Match Found";
    res = cvLoadImage("no_match.jpg");
    //cvShowImage("no match", res);
    return res;
}
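/* Fragment of the distance-transform helper: depending on "flag", the
   rotated/scaled query image or candidate image is loaded; the code that
   fills the dist* images is elided here, and an 8-bit result is returned. */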
IplImage* dist = 0;
IplImage* dist8u1 = 0;
IplImage* dist8u2 = 0;
IplImage* dist8u = 0;
IplImage* dist32s = 0;
IplImage* edge = 0;
IplImage* gray = 0;
if(flag == 1)
{
gray = cvLoadImage("rot1.jpg", 0);
edge = cvLoadImage("s1.jpg", 0);
}
else
{
gray = cvLoadImage("rot2.jpg", 0);
edge = cvLoadImage("s2.jpg", 0);
}
return dist8u1;
/* This function uses the Bounding Box information obtained from the
//contours. The Bounding Box data structure gives us information about
//center of the bounding box, the angle of inclination and also the
//height and width of the bounding box.
//
// A new image is created which is nothing but the image contained
//within the bounding box.
*/
cvSet2D(after, b, a, MOPSpixel);
}
}
// apply threshold: pixels with intensities > 0 are set to 255
for(int a = 0; a < after->height; a++)
{
for(int b = 0; b < after->width; b++)
{
CvScalar Afterpixel = cvGet2D(after, a, b);
Afterpixel.val[0] = (Afterpixel.val[0] > 0 ? 255 : 0);
cvSet2D(after, a, b, Afterpixel);
}
}
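/* Fragment (likely from convert3channel): every non-zero pixel of the
   single-channel image "toch" is written as white into all three channels
   of "rot3ch1"; zero pixels become black. */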
CvScalar s, s1;
s = cvGet2D(toch, i, j);
if(s.val[0] > 0)
{
s1.val[0] = 255;
s1.val[1] = 255;
s1.val[2] = 255;
}
else
{
s1.val[0] = 0;
s1.val[1] = 0;
s1.val[2] = 0;
}
cvSet2D(rot3ch1,i,j,s1);
}
//std::cout << std::endl;
}
return rot3ch1;
}