Project Report
TOPIC:
Counting the number denoted by a hand gesture. This model serves
as a smaller version of gesture recognition which can be used for
sign language.
Course Instructor: Dr. (Prof.) Yu Hen Hu
Author: Sudhanya Chatterjee
Campus id: 9068386912
Contents
Topics
Problem Statement
Motivation
Approach
Results
Discussion
References
Problem Statement
The aim of this project is to build a model for gesture recognition. Specifically, the goal
is to count the number of fingers shown in a hand gesture using image processing
techniques only.
The first challenge in this project is to extract the hand from a colour image and convert
it into a format on which a finger counting algorithm can be applied. This part of the
problem is termed image pre-processing. It uses several image processing techniques,
which are discussed in detail in the Approach section of the report.
The second challenge is to devise an algorithm to count the number of fingers shown in
the gesture. For this part, an edge detection algorithm devised specifically for this
problem is used. This too is discussed in detail in the Approach section.
Motivation
Gesture recognition has many potential applications in education for differently abled
people. For example, a system that converts gestures into audio signals in real time
would allow a common classroom approach to be followed for students with impaired
vision and students with impaired hearing. The main challenge is recognizing the gesture;
once it is recognized, mapping it to a particular sound signal can be done
instantaneously. This project thus provides a feature extraction method for gesture
recognition of a particular type.
The same approach can also be applied to driving an automatic system using gestures
only, for example an auto-drive option in a car, low-level security systems, or commands
to a mobile phone.
Approach
A simple block diagram of the approach followed in this project is given below.
Figure 1: Block Diagram of the approach
As shown in the block diagram, the approach has two main steps:
I. Pre-Processing of the image
II. Finger Counting Logic
Pre-Processing of Image:
The image is read in RGB format. The first objective is to extract the boundary of the hand
gesture from the image. To do this, the image is converted into a binary image using
gray-level thresholding. Boundary extraction is then performed using erosion. The erosion
is intentionally performed with a very large matrix of ones as the structuring element, so
that there is a sizeable difference between the binary image and the eroded image. The
eroded image is then subtracted from the original binary image to obtain the boundary of
the objects in the image.
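The thresholding, erosion and subtraction steps above can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB program used in the project. It assumes the image has already been reduced to grayscale values in the range 0-255 and stored as a list of rows; the function names, the threshold of 128 and the default window size k are illustrative assumptions, not values from the report.

```python
def to_binary(gray, threshold=128):
    """Threshold a grayscale image (list of rows, values 0-255) to 0/255.

    The threshold value is an assumption made for this sketch.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def erode(binary, k=3):
    """Erode with a k x k matrix of ones (k odd).

    A pixel stays 255 only if every pixel in its k x k window is 255.
    Windows that would run off the image are treated as background.
    """
    rows, cols = len(binary), len(binary[0])
    r = k // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(r, rows - r):
        for x in range(r, cols - r):
            if all(binary[y + dy][x + dx] == 255
                   for dy in range(-r, r + 1)
                   for dx in range(-r, r + 1)):
                out[y][x] = 255
    return out


def boundary(binary, k=3):
    """Boundary = original binary image minus its erosion."""
    eroded = erode(binary, k)
    return [[b - e for b, e in zip(brow, erow)]
            for brow, erow in zip(binary, eroded)]
```

A larger k gives a thicker boundary, which corresponds to the report's choice of a large matrix of ones.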
The next objective is to select the object in the image with the longest connected boundary.
This is done by finding the connected chains of pixels with value 255 using
8-neighbourhood connectivity. Once all the chains are found, the longest one is selected
and separated from the rest of the image. Seen from another point of view, this is taken to
be the shape in the image which encloses the largest area.
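One way to find the longest 8-connected chain is a breadth-first search over the boundary image, sketched below in plain Python. The function name largest_chain and the list-of-rows image layout are assumptions made for illustration; the report does not say how the chains were traced in the MATLAB program.

```python
from collections import deque


def largest_chain(boundary_img):
    """Return the (row, col) pixels of the largest 8-connected chain of 255s."""
    rows, cols = len(boundary_img), len(boundary_img[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for y in range(rows):
        for x in range(cols):
            if boundary_img[y][x] != 255 or seen[y][x]:
                continue
            # Breadth-first search collects one connected chain.
            chain = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                chain.append((cy, cx))
                # Visit all 8 neighbours (the centre is already marked seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and boundary_img[ny][nx] == 255):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(chain) > len(best):
                best = chain
    return best
```

The returned pixel list can be painted onto an all-zero image to keep only the selected boundary.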
The image containing this largest shape, now separated from the main image, is then
inverted, so that the area enclosed by the longest boundary has the value 255 and all other
pixels in the image are set to 0. This achieves the goal of separating the hand region of
the original RGB image into a binary image.
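The report does not spell out how the enclosed area is set to 255. One common way to get the same result, shown here only as an assumed stand-in, is to flood-fill the background from the image border and mark every pixel that cannot be reached as part of the hand. The function name fill_enclosed is hypothetical.

```python
from collections import deque


def fill_enclosed(boundary_img):
    """Set the boundary and everything it encloses to 255, the rest to 0.

    Assumed technique: flood-fill the 0-valued background from the image
    border using 4-connectivity, so the fill cannot leak through diagonal
    steps of an 8-connected boundary.
    """
    rows, cols = len(boundary_img), len(boundary_img[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the fill with every background pixel on the image border.
    for y in range(rows):
        for x in range(cols):
            on_border = y in (0, rows - 1) or x in (0, cols - 1)
            if on_border and boundary_img[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        cy, cx = queue.popleft()
        for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
            if (0 <= ny < rows and 0 <= nx < cols
                    and not outside[ny][nx]
                    and boundary_img[ny][nx] == 0):
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[0 if outside[y][x] else 255 for x in range(cols)]
            for y in range(rows)]
```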
The next step is to crop out the unwanted portion of the image for final processing. A
horizontal scan line scans the binary image, starting from the top. The first row in which
a pixel value changes from 0 to 255 is stored as the top of the gesture. The image is then
cropped so that it extends from the top of the gesture to the bottom only. The image is
not cropped in width, because the finger counting algorithm does not depend on it. This
is the final stage of image pre-processing.
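The cropping step can be sketched in a few lines of plain Python. This assumes that "to bottom" means the bottom row of the image, which the report does not state explicitly; the function name crop_from_top is illustrative.

```python
def crop_from_top(binary):
    """Drop the empty rows above the gesture.

    Scans rows from the top and returns the image starting at the first
    row that contains a 255 pixel. Width is left unchanged. Returns an
    empty list if the image contains no 255 pixel at all.
    """
    for y, row in enumerate(binary):
        if 255 in row:
            return binary[y:]
    return []
```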
Finger Counting Logic:
After the image has been pre-processed, the next stage is counting the number of fingers in
the gesture. This is done in two steps. In the first, the fingers other than the thumb are
counted. In the second, the thumb is detected. This approach is adopted because, in
almost all gestures presented for recognition, the thumb is expected to lie in the lower
half of the image and, by similar reasoning, the other fingers lie in the top half of the
image.
Detection is therefore done by first analyzing the top half of the image and then the
bottom half. A horizontal scanning line is used in both cases. The detection of non-thumb
fingers is discussed first. For the top half of the image, the scan runs from the top row
to the row at half the row count. On each row, the scanning line counts the number of
times the pixel value changes from 0 to 255 and back again. Each such pair of changes
corresponds to a finger crossing the scan line, so a change from 0 to 255 together with
the change back to 0 is taken as a single count. When the scan is finished, the maximum
count reached on any row is stored as the count of fingers other than the thumb.
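The top-half scan can be sketched as below in plain Python. It reads a paired 0-to-255 and 255-to-0 change as one run of 255 pixels on a row; a run that touches the right edge of the image is still counted as one. The function names are illustrative and not taken from the MATLAB program.

```python
def count_runs(row):
    """Number of runs of 255 in one row (one run = one 0->255->0 crossing)."""
    runs = 0
    prev = 0
    for px in row:
        if px == 255 and prev == 0:
            runs += 1  # a new run starts at every 0 -> 255 change
        prev = px
    return runs


def count_fingers_top(binary):
    """Maximum run count over the rows in the top half of the image."""
    half = len(binary) // 2
    return max((count_runs(row) for row in binary[:half]), default=0)
```

Taking the maximum over rows, rather than the count on a single fixed row, allows for fingers of different lengths: the row where the most fingers are crossed at once decides the count.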
In the next part, the horizontal scanning line runs from the bottom row up to the row at
half the row count. If the number of changes from 0 to 255 and back on a row is found to
be greater than 1, the thumb count is set to one.
The final count of fingers is simply the sum of the two counts obtained from the two
horizontal scans.
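The thumb check and the final sum can be sketched as follows in plain Python. The sketch assumes that "greater than 1" refers to the number of separate runs of 255 on a single row of the bottom half (the palm plus the thumb); the function names are hypothetical.

```python
def runs_of_255(row):
    """Count the 0 -> 255 changes in a row, treating the left edge as 0."""
    padded = [0] + list(row)
    return sum(1 for a, b in zip(padded, padded[1:]) if a == 0 and b == 255)


def detect_thumb(binary):
    """Return 1 if any bottom-half row has more than one run of 255, else 0."""
    half = len(binary) // 2
    # Scan from the bottom row up to the middle of the image.
    for row in reversed(binary[half:]):
        if runs_of_255(row) > 1:
            return 1
    return 0


def total_count(finger_count, binary):
    """Final count = non-thumb fingers from the top scan + thumb (0 or 1)."""
    return finger_count + detect_thumb(binary)
```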
The program was written in MATLAB. Once an image is fed into the program, all the
procedures described in the Approach section are automated.
The algorithm was tested on a small set of images, and the results are shown in the
Results section.
Results
Figure 2: Correctly detected gesture for digit 1
Figure 3: Correctly detected gesture for digit 5. The thumb is properly detected too
Figure 4: Correctly detected gesture for digit 4
Figure 5: Incorrectly detected gesture. Displayed count is 3; correct count is 4
Discussion
As the results show, the detection of the gesture is fairly successful. The algorithm was
run on a total of 25 images, of which 21 were correctly detected. Hence the success rate
is 21/25 = 84.0%.
The incorrect detections were caused by the subject's fingers being held close together.
Gestures in which the fingers are clearly separated are therefore detected well by this
algorithm.
As far as run time is concerned, the program is fast even in a very high-level language
like MATLAB. Care was taken to minimize the use of for loops in the MATLAB program,
and most calculations were done at the matrix level to minimize run time.
References
[1] Zhong Yang; Yi Li; Weidong Chen; Yang Zheng, Dynamic Hand Gesture
Recognition Using Hidden Markov Models, 7th International Conference on
Computer Science and Education, 2012, pp. 360-365, IEEE Conference Publication.
[2] Rautaray, S.S.; Agrawal, A., Design of Gesture Recognition System for Dynamic
User Interface, 2012 International Conference on Technology Enhanced Education
(ICTEE), pp. 1-6, IEEE Conference Publication.
[3] Panwar, M., Hand Gesture Recognition based on Shape Parameters, 2012
International Conference on Computing, Communication and Applications (ICCCA),
pp. 1-6, IEEE Conference Publication.
[4] Gonzalez, Rafael C.; Woods, R. E., Digital Image Processing, 3rd ed., 2008, Prentice
Hall, ISBN: 9780131687288.