
International Journal of Scientific & Engineering Research, Volume 6, Issue 9, September 2015
ISSN 2229-5518

Hand Gesture Recognition based on Digital Image Processing using MATLAB
By Tahir Khan under supervision of Dr. Amir Hassan Pathan
Faculty of Engineering, Sciences and Technology, IQRA University
Karachi, Pakistan
Email: [email protected]

Abstract - This research work presents a prototype system that helps hearing people recognize hand gestures so that they can communicate more effectively with sign language users. The work focuses on real-time recognition of the sign language gestures used by the deaf community. The problem is addressed with Digital Image Processing techniques: Color Segmentation, Skin Detection, Image Segmentation, Image Filtering, and Template Matching. The system recognizes gestures of ASL (American Sign Language), including the alphabet and a subset of its words.

Index Terms— Hand Gesture Recognition, Digital Image Processing, Skin Detection, Image Segmentation,
Image Filtering, Template Matching technique.

——————————  ——————————

1. Introduction
Communication is derived from the Latin word SCIO, meaning to share. To communicate is to share thoughts, messages, knowledge or any other information. Through the ages, communication has been the tool for exchanging information through speech, writing, visual signs or behaviour. The communication cycle is considered complete once the message is received by a receiver who recognizes the sender's message. Ordinary people communicate their thoughts to others through speech, whereas for the hearing-impaired community the means of communication is sign language.

Around 500,000 to 2,000,000 speech- and hearing-impaired people express their thoughts through sign language in their daily communication [1]. These numbers diverge across sources, but it is commonly stated that ASL is the third most-used sign language in the world.

2. Objective
This research work focuses on the problem of real-time recognition of the sign language gestures used by the deaf community. The research problem is addressed with Digital Image Processing techniques: Color Segmentation, Skin Detection, Image Segmentation, Image Filtering, and Template Matching. The system recognizes gestures of ASL, including the alphabet and a subset of its words.

Gesture recognition methods fall into two major categories: a) vision-based methods and b) glove-based methods. Glove-based systems use data gloves to obtain accurate positions of the hand sign; however, vision-based methods have become the preferable approach, since the user keeps the flexibility of moving the hand around freely without wearing any device.

Many vision-based methods are available. Notably, Byong K. Ko and H. S. Yang developed a finger-mouse system that enables a signer to issue commands with the fingers, as in [2]. Apart from that, other methods are available, such as colored hand-glove based methods, Neural Networks and PCA, as in [3] to [5].

Although a Neural Network is simple to implement, it tends to be over-trained on such a limited training sample, and partially occluded gesture signs can also cause problems. In these circumstances it is very difficult to predict the response of a neural network, and it can produce erroneous results under environmental variation. PCA, on the other hand, suffers from the same over-specification of the gesture sign due to the very limited training set, and also involves lowering the dimensionality of the image.

The main goal of this research paper is to demonstrate how good performance can be achieved without any special hardware equipment, so that such a system can be implemented and easily used in real life.

————————————————
Author Tahir Khan has accomplished his Master of Philosophy program in Computer Science from Iqra University, Karachi, Pakistan.
E-mail: [email protected]


The contribution of this research paper can be summarized as follows: the paper adopts the Template Matching technique as the primary hand gesture detection method, owing to its conceptual simplicity and my confidence in it. Apart from that, this method can combine feature detection with gesture detection very easily, simply by creating more templates.

3. What is ASL?
ASL (American Sign Language) is a language for the hearing-impaired and the deaf, in which manual communication with the help of hands, facial expressions and body language is used to convey thoughts to others without using sound. Since ASL uses an entirely different grammar and vocabulary (it handles tense and articles differently and does not use "the"), it is considered not related to English. ASL is generally the preferred communication tool for deaf and mute people.

4. The Method
The idea behind this method is that the software runs on a mobile handset with a frontal camera while a disabled person (who is in front of the mobile handset) makes the signs. The software recognizes these ASL gestures, including letters and a subset of ASL words, and produces a text message for the corresponding word or letter so that a hearing person can understand.

Figure 1: Sign Language Interpreter

In this sign language interpreter system, the mobile frontal camera is the input device that observes the hands and fingers of the user; these inputs are then presented to the system to produce the text message for a hearing person to understand the gestures.

The development of such a visual gesture recognition system is not an easy task. Numerous real-world environmental concerns and issues are associated with this Sign Language Interpreter System. Visibility is the key issue for the performance of such a system, since it determines the quality of the input images and hence affects the performance.

5. Concerns and Issues
Visibility issues may arise for several reasons: for instance, the position in which the user has to stay relative to the camera, and various environmental conditions such as lighting sensitivity, background color and condition, electric or magnetic fields, or any other disturbance may affect the performance.

• Occlusion can occur due to occluded fingers while signing [6].
• The boundaries of a gesture have to be detected automatically. For example, the start sign and the end sign for alphabets, especially "J" and "Z", have to be detected automatically.
• The sitting or standing position of the signer may vary in front of the camera. Movements of the signer, such as rotating around the body, must be taken into account.
• Delays in processing can occur due to the large amount or high resolution of image data, which makes real-time recognition difficult.

6. Image Acquisition
The most common method of image acquisition is digital photography, usually with a digital camera, although other methods are also considered. Image acquisition includes compression, processing, and display of images.

Figure 2: Image Capturing Process

The images/frames of the person conveying the message through hand gestures can be obtained using the frontal camera of a mobile phone. The reason for choosing a mobile camera phone instead of a traditional camera for capturing the image is that it is the easiest way to transfer the resulting text or voice message to the other (hearing) person's mobile device over a mobile network.

Figure 2 shows the image acquisition process. In this research work, I have assumed that the mobile camera faces towards the signer to capture the image of the signer's hand gestures.
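To make the acquisition step concrete, here is a minimal MATLAB sketch of grabbing a few frames from a connected camera. It assumes the MATLAB Support Package for USB Webcams is installed; the camera index, frame count and file names are illustrative placeholders, not values taken from the paper.

% Minimal frame-grabbing sketch (assumes the USB Webcam support package).
cam = webcam(1);                                   % open the first available camera
for k = 1:10
    frame = snapshot(cam);                         % capture one RGB frame (H x W x 3 uint8)
    imwrite(frame, sprintf('frame_%02d.png', k));  % store it for the later processing steps
    pause(0.2);                                    % rough pacing between captures
end
clear cam                                          % release the camera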


7. Image Processing Steps
To reduce the computational effort needed for processing, pre-processing of the image taken from the camera is highly important. Apart from that, numerous factors such as lighting, environment, background of the image, hand and body position and orientation of the signer, and the parameters and focus of the camera impact the result dramatically.

8. Color Segmentation
Color in an image is perceived by the human eye as a combination of R (red), G (green) and B (blue); these three colors are known as the three primary colors. Other kinds of color components can be derived from the R, G, B representation by either linear or nonlinear transformations.

The RGB color components represent the incoming light, that is, the brightness values of the image that can be obtained through red, green and blue filters, based on the following equations:

R = ∫ E(λ) S_R(λ) dλ,   G = ∫ E(λ) S_G(λ) dλ,   B = ∫ E(λ) S_B(λ) dλ

where S_R(λ), S_G(λ) and S_B(λ) represent the color filters applied to the incoming light, E(λ) is the radiance and λ is the wavelength.

Figure 3: RGB color model

It is often noted that the human eye can distinguish only about two dozen intensity levels, but thousands of color shades. It is therefore quite often difficult to extract an object or recognize a pattern from a gray-scale image when the object can only be separated using color information. Since color provides additional information on top of intensity, color information is extremely useful for pattern recognition.

There is, however, no common theory available for color image segmentation up to now; the color image segmentation methods we have are by nature ad hoc. Color segmentation approaches are application dependent, and no single algorithm is considered the best for color image segmentation. Color image segmentation is a psychophysical perception problem, so prior knowledge about the image information is essential rather than a purely mathematical solution.

The main purpose of color segmentation is to find particular objects, for example lines, curves, etc., in images. In this process every pixel in the image is assigned a label in such a way that pixels with the same label share certain visual characteristics.

The goal of color segmentation in this research work is to simplify and increase the separability between skin and non-skin pixels, and at the same time decrease the separability among different skin tones.

9. Skin Detection
Several techniques are used for color space transformation in skin detection. Some potential color spaces for the skin detection process are:
• CIEXYZ
• YCbCr
• YIQ
• YUV

One performance metric used to compare color spaces is the scatter matrices computed for the skin and non-skin classes; another is the comparison of the histograms of skin and non-skin pixels after the color space transformation.

The YCbCr color space performs very well in 3 out of the 4 performance metrics used [7]. Thus, it was decided to use the YCbCr color space in the skin detection algorithm.

In this research work, the skin detection process classifies each pixel of the image as human skin or not. The Gray-world algorithm is first applied for illumination compensation, and the pixels are then categorized based on an explicit relationship between the YCbCr color components. In the YCbCr color space, the single component "Y" carries the luminance information, while Cb and Cr carry the color information as two color-difference components: Cb is the difference between the blue component and a reference value, whereas Cr is the difference between the red component and a reference value [8].

Figure 4: The CbCr plane at constant luma Y

Thus, a pixel is classified as human skin if its (Cb, Cr) pair falls within a thresholded region:

S = {(Cb, Cr) : 77 ≤ Cb ≤ 127 and 133 ≤ Cr ≤ 173}

where S is the set of (Cb, Cr) tuples that are considered skin.

Figure 5: Original Image of the signer

The detected skin pixels are overlaid onto the image and marked in blue, so that the gesture can easily be identified.

Figure 6: Image with skin pixels marked in Blue color

After skin detection, the image marked with blue is converted into a binary image with skin pixels set to '1' and the rest set to '0', so that the correlation of the image can be matched against the template.

Figure 7: Binary Image of the signer after skin detection

The skin detection algorithm implements the following steps:
• Read the image (RGB color image) and capture its dimensions (height and width)
• Initialize the output images
• Apply the Gray-world algorithm for illumination compensation
• Convert the image from RGB to YCbCr
• Detect the skin pixels
• Mark the skin pixels in blue

10. Image Segmentation
To reduce the computational time needed for processing the image, it is important to reduce the size of the image so that only the region containing the sign gesture has to be processed. After conversion of the image into binary, the hand gesture region is cropped as a rectangle of (x, y) coordinates, assuming that the hand gesture covers the upper-left part of the image:

height  = size(inputImage,1)/1.5;
width   = size(inputImage,2)/2;
cropped = imcrop(inputImage, [0 0 width height]);

Figure 8: Image after cropping, ready to match with template

11. Image Filtering
In image filtering, the value of each pixel of the output image is determined by applying an algorithm to the values of the pixels in its neighborhood. The filter is also referred to as a sub-image, mask, kernel, template or window.
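To illustrate this neighborhood operation, here is a minimal sketch that smooths the binary skin mask with a 3x3 averaging mask before matching; skinMask is assumed to be the binary image produced by the skin detection step above, and the 0.5 threshold is an illustrative choice.

% Smooth the binary skin mask with a 3x3 averaging mask to suppress isolated noise pixels.
h = fspecial('average', [3 3]);                          % 3x3 mean-filter kernel
smoothed  = imfilter(double(skinMask), h, 'replicate');  % apply the mask, replicating the border
cleanMask = smoothed > 0.5;                              % re-threshold back to a binary mask
figure; imshow(cleanMask);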


There are mainly two types of image filtering:

1. Spatial Filtering
2. Linear Filtering

I have considered Linear Filtering for the primary hand gesture detection stage for the following reasons:

1. Spatial filtering has some limitations: it restricts the center of the mask to lie at a distance of no less than (n-1)/2 pixels from the border, which makes the output image smaller than the original; otherwise one or more columns of the mask would lie outside the image plane.
2. Linear filtering offers conceptual simplicity, and I have confidence in it.

12. Implementation
Template matching by cross-correlation simply multiplies together corresponding pixels of the signer image (here called the target image) and the template, and then sums the result. Template matching is implemented by the following method:

• First, take the image in which the template is searched for: the search image S(x, y), where S represents the search image and x and y are the coordinates of each pixel in the search image.
• Define the template T(xt, yt), where T represents the template and xt and yt are the coordinates of each pixel in the template.
• The center of the template T(xt, yt) is then moved over each (x, y) point in the search image, and the products between the coefficients of the search image S(x, y) and the template T(xt, yt) are summed over the area covered by the template.
• Every possible position of the template within the search image is considered.
• The position with the largest value is considered the best position of the object.

Figure 9: Search Image S(x, y)

Figure 10: Template Image T(xt, yt)

Figure 11: Result: the output of the Convolution

Cross-correlation compares the intensities of the pixels, and template matching based on it handles the translation of the hand within the signer image.

For our hand gesture recognition application, in which the brightness of the input image of the signer can vary due to environmental conditions such as lighting sensitivity, background color and condition, electric or magnetic fields, other disturbances, and the exposure conditions of the signer, the images first have to be normalized. The normalization is done at every step by subtracting the mean and dividing by the standard deviation. The resulting quantity is the normalized cross-correlation of the template and the image, expressed as follows:

NCC = (1/n) Σ_{x,y} [ (f(x,y) - μ_f)(t(x,y) - μ_t) ] / (σ_f σ_t)

where n is the number of pixels covered by the template t(xt, yt) and the image patch f(x, y), μ_f is the average of f, μ_t is the average of t, and σ_f and σ_t are their standard deviations.
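For reference, the Image Processing Toolbox function normxcorr2 computes this normalized cross-correlation directly; the short sketch below locates the best match of a template in a target frame. The file names are placeholders; the target frame is assumed to be RGB and the template grayscale.

% Locate a template in a target frame with normalized cross-correlation.
target   = im2double(rgb2gray(imread('signer_frame.png')));  % placeholder RGB target frame
template = im2double(imread('template_A.png'));              % placeholder grayscale template
c = normxcorr2(template, target);                            % correlation surface
[score, idx]   = max(c(:));                                  % best match score
[yPeak, xPeak] = ind2sub(size(c), idx);
yTop = yPeak - size(template,1) + 1;                         % top-left corner of the match in the target
xTop = xPeak - size(template,2) + 1;
fprintf('Best match score %.3f at (%d, %d)\n', score, xTop, yTop);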


13. Speed up the Process
Template matching, as described, is a two-dimensional cross-correlation of a grayscale image, and several other factors have a major influence on estimating the level of similarity between the template and the target image. A direct two-dimensional cross-correlation of a large image is not only very time-consuming, it also makes it difficult to estimate the overall performance of the system. To overcome this, I have computed the correlation in the frequency domain by multiplying the two-dimensional Fourier transforms (FFTs) and then taking the inverse FFT to obtain the output image. This reduces the processing time significantly.

Due to the computational cost of the process, image filtering was normally done in dedicated hardware systems in the past. The matching process can also be sped up through the use of an image pyramid. An image pyramid is a series of images at different scales, created by repeatedly filtering and sub-sampling the input image (here, the signer's image) to generate a sequence of reduced-resolution images. These lower-resolution images are first searched with a template of correspondingly reduced resolution, to narrow down the possible start positions for searching the image at the larger scales. The larger images are then searched only in a small window around those start positions to find the best template position.

Apart from the image pyramid, another way of speeding up template matching is to filter the image in the frequency domain, also called frequency-domain filtering, which relies on the convolution theorem. I have used the convolution theorem because, under suitable conditions, the Fourier transform of a convolution is the point-wise product of the Fourier transforms; in other words, convolution in the spatial domain corresponds to point-wise multiplication in the frequency domain:

F{f * g} = F{f} · F{g}

where f and g are two functions with convolution f * g (the asterisk * denotes convolution, not multiplication), F denotes the Fourier transform operator, F{f} and F{g} are the Fourier transforms of f and g respectively, and · denotes point-wise multiplication.

14. Improving the Accuracy
In order to improve the accuracy of the template matching method, I have decided to use a secondary template for each hand gesture (a sub-image or mask). This secondary template shows a slightly different gesture sign at a different angle, and an overall hit is counted only if a proper and correct gesture sign is supplied. An additional advantage is that the secondary template allows the system to relax the individual thresholds while still capturing all the possible correct hand gesture signs. Apart from that, it also helps to determine the start and end boundaries of sign alphabets that involve motion, such as "J" and "Z".

Figure 12: Sign alphabets that involve motion

Figure 13: Primary and secondary templates at different angles for the sign "A"

15. Result and Analysis
The purpose of this application is to recognize hand gestures. The design is very simple and the signer does not need to wear any type of hand glove. Although this sign language recognition application can run on an ordinary computer with a web camera, ideally it requires an Android smartphone with a frontal camera, at least a 1 GHz processor and at least 512 MB of RAM. The template set consists of all alphabets from A to Z. The letters J and Z involve motion and hence require secondary templates to determine the start and end boundaries of these alphabets. Table 1 presents the result of the algorithm implemented with the MATLAB code given below. The algorithm can detect all the alphabets from A to Z with a 100% recognition rate if the signer supplies the correct sign.

Table 1: Analysis and Result

The system can recognize a set of 24 letters of the ASL alphabet: A, B, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, X and Y. The only issue found concerns the alphabets that involve motion, J and Z. The performance of the system using a single template is summarized below:

Recognized alphabets                                                      Recognition accuracy
A, B, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y   100%
J and Z                                                                   0%

Overall performance of the system: 92.30%

MATLAB code for skin detection

Filename: generate_skintone.m

function [out, bin] = generate_skintone(inputimage)
%GENERATE_SKINTONE Produce a skin map of an input image.
% Highlights patches of skin-like pixels; can be used in gesture recognition.
if nargin ~= 1
    error('Usage: generate_skintone(inputimage)');
end

% Read the input image
img_input  = imread(inputimage);
img_height = size(img_input,1);
img_width  = size(img_input,2);

% Initialize the output images
out = img_input;
bin = zeros(img_height,img_width);

% Apply the Gray-world algorithm for illumination compensation
img_gray = grayworld(img_input);

% Convert from RGB to YCbCr
imgycbcr = rgb2ycbcr(img_gray);
YCb = imgycbcr(:,:,2);
YCr = imgycbcr(:,:,3);

% Detect human skin pixels
[r,c] = find(YCb>=77 & YCb<=127 & YCr>=133 & YCr<=173);
numind = size(r,1);

% Mark human skin pixels in blue
for i = 1:numind
    out(r(i),c(i),:) = [0 0 255];
    bin(r(i),c(i))   = 1;
end

imshow(img_input);
figure; imshow(out);
figure; imshow(bin);
end

---------------------------------------------------------------------

Filename: grayworld.m

function result = grayworld(input_image)
%GRAYWORLD Color balancing using the gray-world assumption.
% input_image - 24-bit RGB image
% result      - color-balanced 24-bit RGB image

result = uint8(zeros(size(input_image,1), size(input_image,2), size(input_image,3)));

% R, G, B components
R = input_image(:,:,1);
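For completeness, here is a short driver sketch showing how the listings in this paper could be chained together: skin detection, cropping as in Section 10, and then template matching with the tmc function listed below. The file names are placeholders, and plotbox is the helper referred to (but not listed) in tmcmain.m.

% Illustrative end-to-end driver (placeholder file names).
[overlay, skinMask] = generate_skintone('signer.jpg');   % skin detection listing above
h = size(skinMask,1)/1.5;                                % crop window as in Section 10
w = size(skinMask,2)/2;
handRegion = imcrop(skinMask, [0 0 w h]);
tpl = im2double(imread('template/template_bw.jpg'));     % binary template from the template set
result = tmc(im2double(handRegion), tpl);                % tmc returns the best-match box via plotbox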


G = input_image(:,:,2);
B = input_image(:,:,3);

% Inverse of the average value of each channel
mR = 1/(mean(mean(R)));
mG = 1/(mean(mean(G)));
mB = 1/(mean(mean(B)));

% The largest inverse corresponds to the smallest channel average
max_RGB = max(max(mR, mG), mB);

% Compute the scaling factors
mR = mR/max_RGB;
mG = mG/max_RGB;
mB = mB/max_RGB;

% Scale each channel
result(:,:,1) = R*mR;
result(:,:,2) = G*mG;
result(:,:,3) = B*mB;
end

MATLAB code for Template Matching

Filename: template.m

close all
clear all

% Read the template image
img1 = imread('template/template_bw.jpg');

% Read the target image
img2 = imread('target/v_crop.jpg');

% Apply the template matching algorithm using DC components
result2 = tmc(img1,img2);

figure, imshow(img2); title('Target');
figure, imshow(result2); title('Matching Result using tmc');

---------------------------------------------------------------------

Filename: tmcmain.m

function result1 = tmc(img1,img2)

% Convert both images to grayscale if necessary
if size(img1,3)==3
    img1 = rgb2gray(img1);
end
if size(img2,3)==3
    img2 = rgb2gray(img2);
end

% Recognize which one is the target and which one is the template (the larger image is the target)
if size(img1,1)*size(img1,2) > size(img2,1)*size(img2,2)
    Target_image   = img1;
    Template_image = img2;
else
    Target_image   = img2;
    Template_image = img1;
end

% Calculate the size of both images
[r1,c1] = size(Target_image);
[r2,c2] = size(Template_image);

% Subtract the mean of the template
image22 = double(Template_image) - mean(mean(double(Template_image)));

% Correlate the target and template images
corrMat = [];
for i = 1:(r1-r2+1)
    for j = 1:(c1-c2+1)
        N_image = double(Target_image(i:i+r2-1, j:j+c2-1));
        N_image = N_image - mean(mean(N_image));
        corr    = sum(sum(N_image.*image22));
        corrMat(i,j) = corr;
    end
end

% Plot the box on the target image at the best match
result1 = plotbox(Target_image, Template_image, corrMat);
end

16. Conclusion
From the statistics of the implementation results, it is concluded that the combination of template matching and color segmentation used here works with high accuracy for hand gesture recognition. The results obtained are applicable and can be implemented on a mobile device, i.e. a smartphone with a frontal camera. The only issue found concerns the alphabets that involve motion, J and Z, which are recommended to be handled through multiple secondary templates.

17. References
[1] Paulraj M. P., Sazali Yaacob, Mohd Shuhanaz bin Zanar Azalan, Rajkumar Palaniappan, "A Phoneme based sign language recognition system using skin color segmentation", Signal Processing and its Applications (CSPA), pp. 1-5, 2010.
[2] Byong K. Ko and H. S. Yang, "Finger mouse and gesture recognition system as a new human computer interface", pp. 555-561, 1997.
[3] Manar Maraqa, Raed Abu Zaiter, "Recognition of Arabic Sign Language using recurrent neural networks", Applications of


Digital Information and Web Technologies, pp. 478-481, 2008.
[4] Yang Quan, "Chinese Sign Language Recognition Based on Video Sequence Appearance Modeling", ICIEA, the 5th IEEE Conference, pp. 1537-1542, 2010.
[5] K. Kawahigasi, Y. Shirai, J. Miura, N. Shimada, "Automatic Synthesis of Training Data for Sign Language Recognition using HMM", pp. 623-626, 2006.
[6] P. Mekala, R. Salmeron, Jeffery Fan, A. Davari, J. Tan, "Occlusion Detection Using Motion-Position Analysis", IEEE 42nd Southeastern Symposium on System Theory (SSST'10), pp. 197-201, 2010.
[7] Jae Y. Lee and Suk I. Yoo, "An Elliptical Boundary Model for Skin Color Detection", pp. 2-5, 2002.
[8] Digital Image Processing Using MATLAB by Gonzalez, p. 205, 2009.
