
ASSISTIVE SIGN LANGUAGE CONVERTER USING

CONVOLUTIONAL NEURAL NETWORKS


S. Surya*1, S. Abinaya*2, Dr. A. Vishnu Kumar*3
*1,2,3Department of IT, Vel Tech High Tech Dr. Rangarajan & Dr. Sakunthala
Engineering College, Avadi, Tamilnadu, India.

Abstract— Sign language is becoming an important mode of communication for deaf and mute individuals. The problem, however, is that most people are unfamiliar with sign language, which makes interpretation difficult. The proposed method is a real-time method for fingerspelling-based American Sign Language recognition built using neural networks. The hand image is first passed through a filter and then through a classifier that predicts the class of the hand gesture. Most analysis so far has treated sign recognition as naive gesture recognition. To produce readable outputs, spell-checking systems are essential, so a package known as Hunspell is used to reduce spelling errors. The Adam optimizer is then used in response to the output of the loss function. This technique achieves 99.25% accuracy for the 26 letters of the alphabet.

Keywords— Fingerspelling system, Convolutional Neural Networks, Gaussian blur, American Sign Language, Human computer interaction.

I. INTRODUCTION

Sign language makes extensive use of movements, resembling a movement language composed of sequences of hand and arm motions. Sign language gestures are divided into two categories: static gestures and dynamic gestures. Static gestures are used to represent alphabets, numbers or specific words, while dynamic gestures are used to represent complex concepts, which can include words, sentences and many other items.

Static gestures are hand positions [17], while dynamic gestures involve motion of the hands, head, or both [28]. Certain unfamiliar words cannot be interpreted as a whole, so they are translated by making a movement for each letter in the word. This is known as the fingerspelling system, which comes under the static type.

The proposed work is a fingerspelling system for American Sign Language based on the static gesture method. Though different countries have different sign languages, American Sign Language is the most widely used sign language [16]. Fingerspelling is used in circumstances where new words, such as names of people, locations, or other words with no established signs, are spelled in the air by hand movements [24]. This type of system could be the most helpful for real-life situations.

Figure 1 : ASL Alphabet

The aim is to create a computer application and train a model that, when shown a real-time video of American Sign Language hand gestures, displays the sign's output in text format on the screen. As a result, a user-friendly Human Computer Interface (HCI) will be developed, in which the computer recognises human sign language.

II. EXISTING SYSTEM

Most of the research in this area has been conducted with a glove-based method. In the glove-based device, sensors such as potentiometers and accelerometers are connected to each finger [4], and the corresponding alphabet is displayed based on their readings. The key issue with this glove-based device is that it has to be re-calibrated every time the user's hand changes. Some systems require colour bands to be worn on the fingertips so that the image processing unit can identify the fingertips [6]; such setups are also quite expensive. The issue with vision-based recognition systems is that depth images and higher resolutions create significant delays in the acquisition process and long processing times [14]. When using the skin segmentation process, the user must wear full sleeves at all times, which is not always convenient [7]. Using depth cameras to produce a depth map does not give the signer the freedom to change the camera pose; the person has to stand in front of the camera in a particular place [12].
Literature survey (Ref. No.; Author & Year; Gesture Type; Acquisition Device; Inference):

[1] Walaa Aly, Saleh Ali, 2019. Gesture type: Static. Device: Microsoft Kinect depth sensor. Inference: The PCANet deep learning architecture is used to learn hand shape features. The precise hand area is cut out by locating the wrist line and removing the forearm region of the hand. Determining orientation and locating the horizontal parallel lines that scan for different hand poses is a difficult problem in wrist detection.

[2] Sunmok Kim, Yangho Ji, 2018. Gesture type: Dynamic. Device: Camera. Inference: The hand detection method uses a well-known object detection network, You Only Look Once (YOLO). The major flaw in this system is that, since the hand's surface area is so limited, large training data is needed.

[3] Muthu Mariappan, Dr Gomathi V, 2019. Gesture type: Dynamic. Device: Camera. Inference: This FCM (fuzzy c-means prediction algorithm) based real-time sign language recognition system, for perceiving the expressions of Indian Sign Language, has given 75 percent accuracy, which is too poor.

[4] Paul D Rosero Montalvo, et al., 2018. Gesture type: Static. Device: Glove with flex sensors. Inference: A series of electronic gloves that can detect a number of signs in sign language. The device is contained inside a glove with flex sensors on each finger, which are used to capture and analyse data. The disadvantage of this method is that it is a bulky device with many cables.

[5] Lionel Pigou, Dieleman S., Kindermans PJ., Schrauwen B., 2015. Gesture type: Dynamic. Device: Kinect and GPU. Inference: The Microsoft Kinect, convolutional neural networks (CNNs), and GPU acceleration are used to build a recognition scheme. Hand gesture recognition over a distance of two metres is not possible with this method.

[6] Ravkiran J, Kavi M, Suhas M, Dheeraj R, Sudheeder S, Nitin V.P, 2009. Gesture type: Static. Device: Camera. Inference: The finger detection algorithm for sign language recognition is a simple, quick, precise and reliable method for locating fingertips and identifying a set of hand gestures. Fingertip identification is done through the idea of boundary tracing and fingertip detection.

[7] Paulraj M P, Sazali Yaacob, Hazry Desa, C.R. Hema, 2016. Gesture type: Dynamic. Device: USB web camera. Inference: This paper focuses on identifying gestures via skin segmentation. The restriction of the system is that the user must wear a dark-coloured long-sleeve shirt or coat.

[8] Kshitij Bantupalli, Ying Xie, 2016. Gesture type: Dynamic. Device: Camera. Inference: The model had issues with facial features and skin tones. During testing with different skin tones, the model's precision dropped if it had not been trained on and predicted on a specific skin tone.

[9] Huang, J., Zhou, W., Li, H. and Li, W., 2015. Gesture type: Dynamic. Device: Microsoft Kinect. Inference: To combine colour and depth information, multiple channels of video sources, including colour information, depth clues, and body joint locations, are used as input to a 3D CNN. The only drawback is that this system has higher computation and memory requirements.

[10] S. Wu and H. Nagahashi, 2013. Gesture type: Dynamic. Device: Camera. Inference: This framework uses a technique based on the AdaBoost classifier trained on the Haar features of the image. Since faces and hands have identical skin tones, the AdaBoost face detector is used to distinguish between them. This has faced false-positive errors.

[11] Rini Akmeliawati, Melanie PoLeen Ooi, Ye Chow Kuang, 2015. Gesture type: Static. Device: Colour-coded gloves. Inference: The signer wears colour-coded gloves that help with data extraction from sign images using colour segmentation. This sign language interpreter understands both static and motion sign gestures. The only disadvantage is that gloves are an extra cost and are cumbersome and inconvenient to use.

[12] Byeongkeun Kang, Subarna Tripathi, Nguyen, 2015. Gesture type: Static. Device: Depth sensor. Inference: Using depth cameras to produce a depth map does not give the signer the freedom to change the camera pose; the person has to stand in front of the camera in a particular place. Although it facilitates hand tracking and directly generates depth images, it does not aid the identification of hand shapes.

III. PROPOSED SYSTEM

The proposed system consists of four main phases, namely Image Acquisition, Image Pre-processing, Hand Gesture Recognition and Spell Correction.

Figure 2 : Architecture

A. IMAGE ACQUISITION

This system is based on the vision-based gesture recognition approach, which uses a normal camera to capture the image. First, each frame shown by the webcam of the machine is captured. In each frame a region of interest (ROI) is defined, denoted by a blue bounded square; this process is called hand segmentation. The segmented image, which contains only the hand area, is then given to the pre-processor.

B. IMAGE PREPROCESSING

Preprocessing involves cutting out unwanted noise and upgrading picture quality for further activities. The coloured images are converted into grayscale images, and then a Gaussian blur filter is applied, which helps in extracting various features of the image. After feature extraction, the processed image is obtained by thresholding the captured frames in OpenCV. In the same way, the dataset collected from the internet, which contains around 16000 images for training and 11000 images for testing, is also preprocessed. The convolutional neural network model is trained on this data alone.

C. HAND GESTURE RECOGNITION

The processed image is forwarded to the CNN network for prediction. A static gesture image is fed into the convolutional neural network model, which outputs the probability of each gesture class to which the gesture may belong. If a letter is detected for more than 50 frames, it is printed and used in the formation of words and then sentences. Spaces between words are indicated by showing a blank image, which is also treated as a symbol.

D. SPELL CORRECTION

In order to produce readable outputs, spell-checking systems are needed. A Python library, Hunspell_suggest, is used to suggest correct alternatives for each incorrect word; a set of words matching the current word is shown, from which the user can choose a word to append to the current sentence. This helps lower spelling mistakes and assists in predicting complex words.

IV. CNN ALGORITHM STEPS

Input: Preprocessed image
Variables: Weights and Biases
Initialization: Feature detector matrix (Filter or Kernel)
Output: Predicted class value
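As an illustration of the preprocessing stage that produces this input (the paper uses OpenCV's grayscale conversion, Gaussian blur and thresholding; the pure-Python loops below are a didactic stand-in for cv2.cvtColor, cv2.GaussianBlur and cv2.threshold, not the paper's actual code):

```python
def to_grayscale(image):
    """image: 2D list of (B, G, R) tuples -> 2D list of intensities."""
    return [[round(0.114 * b + 0.587 * g + 0.299 * r) for (b, g, r) in row]
            for row in image]

def gaussian_blur_3x3(gray):
    """Apply a 3x3 Gaussian kernel (1 2 1 / 2 4 2 / 1 2 1) / 16."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # clamp at the border (replicate edge pixels)
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += gray[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc // 16
    return out

def threshold(gray, t=127):
    """Binary threshold: pixel -> 255 if above t, else 0."""
    return [[255 if px > t else 0 for px in row] for row in gray]
```

The blurred-then-thresholded ROI is what the classifier below receives; the kernel size and threshold value here are illustrative defaults.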
Step:1 - The process starts with an input image to which filters (3x3 pixels) are applied, resulting in a convolutional layer.

Step:2 - The linearity of that image is then broken up using the rectifier function ReLU.

Step:3 - The images are down-sampled using 2x2 maximum pooling. The motive is to decrease the dimensions of the activation matrix and ultimately reduce the number of learnable parameters.

Step:4 - The pooled feature map is flattened by reshaping it into an array of values and sent to the fully connected layer.

Step:5 - The first densely connected layer contains 96 nodes and the second densely connected layer has 64 nodes. The final output layer has 27 nodes (alphabets + blank symbol), which produces a probability distribution output.

Step:6 - The most probable class among all classes is selected as the final output prediction; it is activated by the Softmax function.

Figure 4 : Model Accuracy Graph
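To make Steps 1-6 concrete, here is a toy forward pass in pure Python. This is a sketch only: the actual model would be built with a deep learning framework, and apart from the layer roles named in the steps above, all sizes and values are illustrative.

```python
import math

def conv2d(image, kernel):
    """Step 1: valid 2D convolution of a 2D image with a small filter."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]

def relu(feature_map):
    """Step 2: break linearity with the rectifier function."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """Step 3: down-sample with 2x2 maximum pooling."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[y][x], feature_map[y][x + 1],
                 feature_map[y + 1][x], feature_map[y + 1][x + 1])
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]

def flatten(feature_map):
    """Step 4: reshape the pooled map into a flat array."""
    return [v for row in feature_map for v in row]

def dense(inputs, weights, biases):
    """Steps 4-5: fully connected layer (weights: out x in matrix)."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Step 6: probability distribution over the output classes."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

In the real model these stages are chained (conv -> ReLU -> pool -> flatten -> dense 96 -> dense 64 -> dense 27 -> softmax) and the predicted class is the index of the largest softmax probability.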

Figure 5 : Model Loss Graph

Figure 6 : Final output

Figure 3 : CNN Classifier Model

Finally, optimizers are used to adjust the properties of the neural network, such as weights and learning rate, in order to minimize errors and obtain results faster. In this proposed work, the Adam optimizer is used to minimize the loss function, called cross entropy.

This research results in the development of a CNN classifier capable of recognising static sign language gestures. In this system, a simple GUI application is built to evaluate the classifier. The application permits users to show sign gestures as input, and it predicts the words or sentences corresponding to the signs.
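For reference, the cross-entropy loss that the Adam optimizer minimizes takes the standard categorical form (the paper names it but does not write it out); with softmax output probabilities p_c and one-hot label y_c over the 27 classes:

```latex
\mathcal{L}(y, p) = -\sum_{c=1}^{27} y_c \log p_c
```

For a single correct class k this reduces to -log p_k, so the loss is near zero when the model assigns the true class a probability close to 1.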

V. EXPERIMENTAL RESULTS

The trained model achieves a correct classification rate of 99.25%, as shown in Fig.4; the CNN loss computed after every epoch is also cited. Fig.5 shows the overall loss of the model, which is observed to be 0.0267.

The model, which is a multi-class classifier, yields the most likely sign it perceives. When the predicted sign stays the same for 50 consecutive frames, the sign is saved in memory. Fig.6 depicts the system's recognition of the sentences "ADD SOME SUGAR" and "I LOVE DOGS" in response to the user's signs.
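The 50-frame stability rule described above can be sketched as follows (the class and variable names are assumptions for illustration, not the paper's code):

```python
# A letter is committed to the sentence only after the classifier
# predicts the same sign for 50 consecutive frames; the blank symbol
# inserts a space between words, as described in Section III-C.

STABLE_FRAMES = 50
BLANK = "blank"

class SentenceBuilder:
    def __init__(self):
        self.sentence = ""
        self._last = None   # last predicted symbol
        self._count = 0     # consecutive frames with that symbol

    def feed(self, symbol):
        """Feed one per-frame prediction; return the sentence so far."""
        if symbol == self._last:
            self._count += 1
        else:
            self._last, self._count = symbol, 1
        # commit exactly once, on the 50th consecutive identical frame
        if self._count == STABLE_FRAMES:
            self.sentence += " " if symbol == BLANK else symbol
        return self.sentence
```

Because the commit fires only when the counter reaches exactly 50, holding a sign longer than 50 frames does not duplicate the letter.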
Figure 7: Comparison of proposed method with existing methods

VI. CONCLUSION

The proposed framework translates sign gestures to text using a CNN. The accuracy of the model is 99.25%. The algorithm used is simple and easy to understand; moreover, the precision obtained from the analysis is exceptionally high.

Fingerspelling is the most critical aspect of hearing-to-deaf communication in real-life circumstances, because words with no established signs can be spelled only by hand movements, leaving no other easy way to convey them. Hence, through this fingerspelling system, any word can be spelled out using the hand gestures that correspond to the letters of the word.

There are two flaws in this system's design. One is that a plain background is needed for the model to detect with accuracy. The other is that it performs well only under good lighting conditions.

This work can be extended to make the framework function even in the case of complex backgrounds by evaluating different background subtraction algorithms. The proposed framework can also be built and implemented on a Raspberry Pi in the future.

REFERENCES

[1] Walaa Aly, Saleh Ali, "User-Independent American Sign Language Alphabet Recognition Based on Depth Image and PCANet Features", IEEE International Journal on Engineering and Science, vol. 7, 2019.

[2] Sunmok Kim, Yangho Ji, "An Effective Sign Language Learning with Object Detection Based ROI Segmentation", IEEE International Conference on Robotic Computing, pp. 330-333, 2018.

[3] Muthu Mariappan, Gomathi V, "Real-Time Recognition of Indian Sign Language", Second International Conference on Computational Intelligence in Data Science (ICCIDS), IEEE, pp. 1-6, 2019.

[4] Paul D Rosero Montalvo, et al., "Sign Language Recognition Based on Intelligent Glove Using Machine Learning Techniques", IEEE Ecuador Technical Chapters Meeting (ETCM), pp. 1-5, 2018.

[5] L. Pigou, S. Dieleman, P.J. Kindermans, B. Schrauwen, "Sign Language Recognition Using Convolutional Neural Networks", 2015.

[6] J. Ravkiran, M. Kavi, M. Suhas, R. Dheeraj, S. Sudheeder, P. V. Nitin, "Finger Detection for Sign Language Recognition", Proceedings of the International MultiConference of Engineers and Computer Scientists (IMECS), Vol. 1, Hong Kong, March 18-20, 2009.

[7] Paulraj M P, Sazali Yaacob, Hazry Desa, C.R. Hema, "Extraction of Head & Hand Gesture Feature for Recognition of Sign Language", International Conference on Electronic Design, Penang, Malaysia, December 1-3, 2016.

[8] Kshitij Bantupalli, Ying Xie, "American Sign Language Recognition Using Deep Learning and Computer Vision", IEEE International Conference on Big Data (Big Data), 2018.

[9] Huang, J., Zhou, W., Li, H. and Li, W., "Sign Language Recognition Using 3D Convolutional Neural Networks", IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, June 2015.

[10] S. Wu and H. Nagahashi, "Real-Time 2D Hands Detection and Tracking for Sign Language Recognition", 8th International Conference on System of Systems Engineering, 2013.

[11] Rini Akmeliawati, Melanie Po-Leen Ooi and Ye Chow Kuang, "Real-Time Malaysian Sign Language Translation Using Colour Segmentation and Neural Network", IMTC 2007 - Instrumentation and Measurement Technology Conference, Warsaw, Poland, May 1-3, 2007.

[12] Byeongkeun Kang, Subarna Tripathi, Truong Q. Nguyen, "Real-Time Sign Language Fingerspelling Recognition Using Convolutional Neural Networks from Depth Map", 3rd IAPR Asian Conference on Pattern Recognition (ACPR), 2015.

[13] Pramada, S., Saylee, D., Pranita, N., Samiksha, N. and Vaidya, M.S., "Intelligent Sign Language Recognition Using Image Processing", IOSR Journal of Engineering (IOSRJEN), 3(2), pp. 45-51, 2013.

[14] Zaki, M.M., Shaheen, S.I., "Sign Language Recognition Using a Combination of New Vision Based Features".

[15] K.Y. Lian, C.C. Chiu, Y.J. Hong, W.T. Sung, "Wearable Armband for Real Time Hand Gesture Recognition", IEEE International Conference on Systems, Man and Cybernetics, 2017.
[16] W. C. Stokoe, D. Casterline and C. Croneberg, A Dictionary of American Sign Language on Linguistic Principles, Linstok Press, 1965.

[17] R. Pinto, C. Borges, A. Almeida, and I. Paula, "Static Hand Gesture Recognition Based on Convolutional Neural Networks", J. Electr. Comput. Eng., pp. 1-12, 2019.

[18] Syed Muhammad, Husnain Abbas, "Shape Based Pakistan Sign Language Categorization Using Statistical Features", IEEE, vol. 6, 2018.

[19] Surejya Suresh, Mithun Haridas T.P, Supriya M.H, "Sign Language Recognition System Using Deep Neural Network", IEEE 5th International Conference on Advanced Computing Communication Systems (ICACCS), 2019.

[20] Helene Brashear, Thad Starner, Paul Lukowicz and Holger Junker, "Using Multiple Sensors for Mobile Sign Language Recognition", Proceedings of the 7th IEEE International Symposium on Wearable Computers, IEEE Computer Society, Washington, DC, USA, pp. 45-52, 2003.

[21] Kalpattu S Abhishek, Lee Chun Fai Qubeley and Derek Ho, "Glove Based Hand Gesture Recognition Sign Language Translation Using Capacitive Touch Sensor", IEEE International Conference on Electronic Devices and Solid State Circuits (EDSSC), pp. 334-337, 2016.

[22] R. Su, X. Chen, S. Cao, and X. Zhang, "Random Forest-Based Recognition of Isolated Sign Language Subwords Using Data from Accelerometers and Surface Electromyographic Sensors", Sensors (Switzerland), vol. 16, no. 1, pp. 9-11, 2016.

[23] Shirin Sulthan Shanta, Saif Taifur Anwar, Md Raganul Kabir, "Bangla Sign Language Detection Using SIFT and CNN", IEEE 9th ICCCNT, 2018.

[24] N. Mukai, N. Harada and Y. Chang, "Japanese Fingerspelling Recognition Based on Classification Tree and Machine Learning", Nicograph International (NicoInt), 2017.

[25] Lin, H.I., Hsu, M.H., Chen, W.K., "Human Hand Gesture Recognition Using a Convolution Neural Network", Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE), Taipei, Taiwan, August 18-22, 2014, pp. 1038-1043.

[26] Pansare J.R., Gawande S.H., Ingle M., "Real-Time Static Hand Gesture Recognition for American Sign Language (ASL) in Complex Background", J Signal Inf Process, 2012.

[27] Qiu-yu Z., Jun-chi L., Mo-yi Z., Hong-xiang D., Lu L., "Hand Gesture Segmentation Method Based on YCbCr Color Space and K-Means Clustering", Interaction 8:106-16, 2015.

[28] Bao J., Song A., Guo Y., Tang H., "Dynamic Hand Gesture Recognition Based on SURF Tracking", International Conference on Electric Information and Control Engineering (ICEICE), IEEE, pp. 338-341, 2011.

[29] K. Hara, K. Nakayama, "Comparison of Activation Functions in Multilayer Neural Network for Pattern Classification", IEEE International Conference on Neural Networks (ICNN'94), 1994.

[30] S. Joudaki, D. bin Mohamad, T. Saba, A. Rehman, M. AlRodhaan, and A. Al-Dhelaan, "Vision-Based Sign Language Classification: A Directional Review", IETE Technical Review (Institution of Electronics and Telecommunication Engineers, India), pp. 383-391, 2014.

[31] M. Ebrahim Al-Ahdal and M. T. Nooritawati, "Review in Sign Language Recognition Systems", 2012 IEEE Symposium on Computers and Informatics (ISCI 2012), pp. 52-57.

[32] S. Kausar and M. Y. Javed, "A Survey on Sign Language Recognition", Proceedings of the 9th International Conference on Frontiers of Information Technology (FIT 2011), pp. 95-98, 2011.

[33] S. C. Agrawal, A. S. Jalal, and R. K. Tripathi, "A Survey on Manual and Non-Manual Sign Language Recognition for Isolated and Continuous Sign", Int. J. Appl. Pattern Recognit., vol. 3, no. 2, p. 99, 2016.

[34] A. Er-Rady, R. Faizi, R. O. H. Thami, and H. Housni, "Automatic Sign Language Recognition: A Survey", Proceedings of the 3rd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), 2017.

[35] Abdelnasser H., Youssef M., Harras K.A., "WiGest: A Ubiquitous WiFi-Based Gesture Recognition System", 2015 IEEE Conference on Computer Communications (INFOCOM), IEEE, pp. 1472-1480, 2015.

[36] Rahul D Raj, "British Sign Language Recognition Using HOG", IEEE International Students' Conference on Electrical, Electronics and Computer Science, pp. 1-4, 2018.

[37] W. Jiangqin and G. Wen, "The Recognition of Finger-Spelling for Chinese Sign Language", pp. 96-100, 2001.

[38] Bay H., Tuytelaars T., Van Gool L., "SURF: Speeded Up Robust Features", European Conference on Computer Vision, Springer, pp. 404-417, 2006.

[39] Kurdyumov R., Ho P., Ng J., "Sign Language Classification Using Webcam Images", 2011.

[40] M. Elmezain, A. Al-Hamadi, J. Appenrodt, and B. Michaelis, "A Hidden Markov Model-Based Continuous Gesture Recognition System for Hand Motion Trajectory", 19th International Conference on Pattern Recognition, 2009.
