Abstract— Sign language is becoming an important mode of communication for deaf and mute individuals. The problem, however, is that most people are unaware of this sign language, which makes interpretation difficult. The proposed method is a real-time method for fingerspelling-based American Sign Language recognition built using neural networks. The hand is first passed through a filter and, after filtering, it is passed through a classifier which predicts the class of the hand gesture. Most analysis so far has treated sign recognition as naive gesture recognition. For producing readable outputs, spell-checking systems are essential, so a package known as Hunspell is used to help reduce spelling errors. The Adam optimizer is then used in response to the output of the loss function. This technique gives 99.25% accuracy for the 26 letters of the alphabet.

Keywords— Fingerspelling system, Convolutional Neural Networks, Gaussian blur, American Sign Language, Human computer interaction.

I. INTRODUCTION

Sign language makes extensive use of movements, resembling a movement language composed of sequences of hand and arm motions. Sign language gestures are divided into two categories: static gestures and dynamic gestures. Static gestures are used to represent alphabets, numbers, or some specific words, while dynamic gestures are used to represent complex concepts, which can include words, sentences, and many other items.

Static gestures are hand positions [17], while dynamic gestures involve motion of the hands, head, or both [28]. Certain unfamiliar words cannot be interpreted as a whole, so they are simply translated by making the movement for each letter of the word. This is known as the fingerspelling system, which comes under the static type.

The proposed work is a fingerspelling system for American Sign Language based on a static gesture method. Though different countries have different sign languages, American Sign Language is the most widely used sign language [16]. Fingerspelling is utilized in circumstances where new words, such as names of people, locations, or words with no established signs, are spelled in the air by hand movements [24]. This type of system could be the most helpful for real-life situations.

Figure 1: ASL Alphabet

The aim is to create a computer application and train a model that, when shown a real-time video of American Sign Language hand gestures, displays the sign's output in text format on the screen. As a result, a user-friendly Human Computer Interface (HCI) will be developed, in which the computer recognises human sign language.

II. EXISTING SYSTEM

Most of the research in this area has been conducted with a glove-based method. In the glove-based device, sensors including potentiometers and accelerometers are connected to each finger [4], and the corresponding alphabet is displayed based on their readings. The key issue with this glove-based device is that it has to be re-calibrated every time the user's hand changes. Some systems had colour bands worn on the fingertips so that the image processing unit could identify the fingertips [6]; these are also quite expensive. The issue with vision-based recognition systems is that depth images and higher resolutions create significant delays in the acquisition process and long processing times [14]. When using the skin segmentation process, the user must wear full sleeves all the time, which is not always convenient [7]. Utilizing depth cameras to produce a depth map does not give the signer the advantage of keeping the camera pose unchanged; the person has to stand in front of the camera in a particular place [12].
The surveyed systems are summarised below, giving for each the reference number, author and year, gesture type, acquisition device, and inference.

[1] Walaa Aly, Saleh Ali, 2019. Gesture type: Static. Acquisition device: Microsoft Kinect depth sensor. Inference: The PCANet deep learning architecture is used to learn hand-shape features. The precise hand area is cropped by locating the wrist line and removing the forearm region of the hand. Determining orientation and locating the horizontal parallel lines that scan for different hand poses is a difficult problem in wrist detection.

[2] Sunmok Kim, Yangho Ji, 2018. Gesture type: Dynamic. Acquisition device: Camera. Inference: The hand detection method uses a well-known object detection network, You Only Look Once (YOLO). The major flaw of this system is that, since the hand's surface area is so limited, a large amount of training data is needed.

[3] Muthu Mariappan, Dr Gomathi V, 2019. Gesture type: Dynamic. Acquisition device: Camera. Inference: This real-time sign language recognition system, based on the fuzzy c-means (FCM) algorithm, perceives the expressions of Indian Sign Language and has given 75 percent accuracy, which is too poor.

[4] Paul D Rosero Montalvo, et al., 2018. Gesture type: Static. Acquisition device: Glove with flex sensors. Inference: A series of electronic gloves that can detect a number of signs in sign language. The device is contained inside a glove with flex sensors on each finger, which are used to capture and analyse data. The disadvantage of this method is that it is a bulky device with many cables.

[5] Lionel Pigou, Dieleman S., Kindermans PJ., Schrauwen B., 2015. Gesture type: Dynamic. Acquisition device: Kinect and GPU. Inference: The Microsoft Kinect, convolutional neural networks (CNNs), and GPU acceleration are used to build a recognition scheme. Hand gesture recognition beyond a distance of two metres is not possible with this method.

[6] Ravkiran J, Kavi M, Suhas M, Dheeraj R, Sudheeder S, Nitin V.P, 2009. Gesture type: Static. Acquisition device: Camera. Inference: The finger detection algorithm for sign language recognition is a simple, quick, precise and reliable method for locating fingertips and identifying a set of hand gestures. Fingertip identification is carried out through boundary tracing and fingertip detection.

[7] Paulraj M P, Sazali Yaacob, Hazry Desa, C.R. Hema, 2016. Gesture type: Dynamic. Acquisition device: USB web camera. Inference: This paper focuses on the identification of gestures through skin segmentation. The restriction of the system is that the user must wear a dark-coloured long-sleeved shirt or coat.

[8] Kshitij Bantupalli, Ying Xie, 2016. Gesture type: Dynamic. Acquisition device: Camera. Inference: The model had issues with facial features and skin tones. During testing with different skin tones, the model's precision dropped when it predicted on a skin tone it had not been trained on.

[9] Huang, J., Zhou, W., Li, H. and Li, W., 2015. Gesture type: Dynamic. Acquisition device: Microsoft Kinect. Inference: In order to combine colour and depth information, multiple channels of video sources, including colour information, depth cues, and body joint locations, are used as input to a 3D CNN. The only drawback is that this system has higher computation and memory requirements.

[10] S. Wu and H. Nagahashi, 2013. Gesture type: Dynamic. Acquisition device: Camera. Inference: This framework uses a technique based on the AdaBoost classifier, trained on the Haar features of the image. Since both faces and hands have similar skin tones, the AdaBoost face detector is used to distinguish between them. This has faced false-positive errors.

[11] Rini Akmeliawati, Melanie PoLeen Ooi, Ye Chow Kuang, 2015. Gesture type: Static. Acquisition device: Colour-coded gloves. Inference: The signer wears colour-coded gloves that help with data extraction from sign images using colour segmentation. This sign language interpreter understands both static and motion sign gestures. The only disadvantage is that the gloves are an extra cost and are also cumbersome and inconvenient to use.

[12] Byeongkeun Kang, Subarna Tripathi, Nguyen, 2015. Gesture type: Static. Acquisition device: Depth sensor. Inference: Utilizing depth cameras to produce a depth map does not give the signer the advantage of keeping the camera pose unchanged; the person has to stand in front of the camera in a particular place. Although this facilitates hand tracking and directly generates depth images, it does not support the identification of hand shapes.
III. PROPOSED SYSTEM

The proposed system consists of four main phases: image acquisition, image pre-processing, hand gesture recognition, and spell correction.

Figure 2: Architecture

A. IMAGE ACQUISITION

This system is based purely on the vision-based gesture recognition approach, which uses an ordinary camera to capture the image. First, each frame shown by the webcam of our machine is captured. In each frame a region of interest (ROI) is defined, denoted by a blue bounded square; this process is called hand segmentation. The segmented image, which contains only the hand area, is then given to the pre-processor.
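To make this step concrete, a minimal OpenCV sketch of frame capture and ROI extraction is given below; the ROI coordinates, window name, and quit key are illustrative assumptions, not the paper's exact values.

    # Minimal sketch of the acquisition step (ROI position, size, and keys assumed).
    import cv2

    cap = cv2.VideoCapture(0)                        # open the default webcam
    while True:
        ok, frame = cap.read()                       # grab one frame
        if not ok:
            break
        x, y, w, h = 100, 100, 300, 300              # assumed ROI placement
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)  # blue square
        roi = frame[y:y + h, x:x + w]                # hand-segmented sub-image
        cv2.imshow("Sign input", frame)              # show the live feed
        if cv2.waitKey(1) & 0xFF == ord('q'):        # 'q' quits the loop
            break
    cap.release()
    cv2.destroyAllWindows()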
B. IMAGE PRE-PROCESSING

The coloured images are converted into grayscale images, and then a Gaussian blur filter is applied, which helps in extracting various features of the image. After feature extraction, the processed image is obtained by thresholding the captured frames in OpenCV. In the same way, the dataset collected from the internet, containing around 16,000 images for training and 11,000 images for testing, is also pre-processed; the convolutional neural network model is trained on this data alone.
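A minimal sketch of this pre-processing chain follows, assuming the ROI from the acquisition step; the blur kernel size and the use of Otsu thresholding are assumptions, since the exact parameters are not listed here.

    # Sketch of the pre-processing chain (kernel size and Otsu thresholding assumed).
    import cv2

    def preprocess(roi):
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)     # colour -> grayscale
        blur = cv2.GaussianBlur(gray, (5, 5), 2)         # Gaussian blur filter
        # Binarise the blurred image; Otsu picks the threshold automatically.
        _, thresh = cv2.threshold(blur, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        return thresh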
D. SPELL CORRECTION

In order to produce readable outputs, spell-checking systems are needed. The Python library Hunspell_suggest is therefore used to suggest correct alternatives for each incorrect word: a set of words matching the current word is displayed, and the user can choose one to append to the current sentence. This helps lower spelling mistakes and assists in predicting complex words.
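The paper names the Hunspell_suggest library; as a rough illustration of the same idea, the sketch below uses the common pyhunspell binding, and the dictionary paths are assumptions for a typical Linux installation.

    # Sketch using the pyhunspell binding; dictionary paths are assumptions
    # for a typical Linux install of the en_US Hunspell dictionaries.
    import hunspell

    checker = hunspell.HunSpell('/usr/share/hunspell/en_US.dic',
                                '/usr/share/hunspell/en_US.aff')

    def candidates(word):
        """Return the word itself if valid, else Hunspell's suggestions."""
        if checker.spell(word):
            return [word]
        return checker.suggest(word)

    # The user picks one candidate to append to the current sentence.
    print(candidates("sugr"))    # e.g. suggestions including 'sugar'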
IV. CNN ALGORITHM STEPS
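The exact layer configuration is not reproduced in this excerpt; as a minimal sketch consistent with the description elsewhere in the paper (thresholded grayscale input, 26 output classes, Adam optimizer acting on the loss), a Keras model might look as follows, with the layer sizes and input resolution being assumptions.

    # Minimal Keras sketch matching the paper's description; layer sizes and the
    # 128x128 grayscale input resolution are assumptions, not the exact model.
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(128, 128, 1)),            # thresholded grayscale ROI
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(26, activation='softmax'),       # one class per letter A-Z
    ])
    # Adam updates the weights in response to the loss, as stated in the abstract.
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])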
V. EXPERIMENTAL RESULTS

The trained model achieves a correct classification rate of 99.25%, as shown in Fig. 4. The CNN loss, computed after every epoch, is also cited: Fig. 5 shows the overall loss of the model, which is observed to be 0.0267.

The model, which is a multi-class classifier, yields the most likely sign it perceives. When the predicted sign stays the same for 50 consecutive frames, the sign is saved to memory. Fig. 6 depicts the system's recognition of the sentences "ADD SOME SUGAR" and "I LOVE DOGS" in response to the user's signs.
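A minimal sketch of this 50-frame stability rule follows; the update function and its state layout are an illustrative assumption of how such a rule can be implemented, not the paper's code.

    # Sketch of the 50-frame stability rule: commit a letter only after the
    # classifier has produced the same prediction for 50 consecutive frames.
    STABLE_FRAMES = 50

    def update(sentence, last_sign, count, sign):
        """Feed one per-frame prediction; append it once it becomes stable."""
        count = count + 1 if sign == last_sign else 1
        if count == STABLE_FRAMES:                 # stable for 50 frames
            sentence.append(sign)                  # save the sign to memory
        return sentence, sign, count

    # Example usage (per_frame_predictions is a hypothetical stream of labels):
    # sentence, last, n = [], None, 0
    # for sign in per_frame_predictions:
    #     sentence, last, n = update(sentence, last, n, sign)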
REFERENCES

[3] Muthu Mariappan, Dr Gomathi V, "Real-Time Recognition of Indian Sign Language," Second International Conference on Computational Intelligence in Data Science (ICCIDS), IEEE, 2019, pp. 1-6.

[4] Paul D Rosero Montalvo, et al., "Sign Language Recognition Based on Intelligent Glove Using Machine Learning Techniques," IEEE Ecuador Technical Chapters Meeting (ETCM), pp. 1-5, 2018.

[15] K. Y. Lian, C. C. Chiu, Y. J. Hong and W. T. Sung, "Wearable armband for real time hand gesture recognition," IEEE International Conference on Systems, Man, and Cybernetics, 2017.

[16] W. C. Stokoe, D. Casterline and C. Croneberg, A Dictionary of American Sign Language on Linguistic Principles, Linstok Press, 1965.

[17] R. Pinto, C. Borges, A. Almeida and I. Paula, "Static Hand Gesture Recognition Based on Convolutional Neural Networks," J. Electr. Comput. Eng., pp. 1-12, 2019.

[18] Syed Muhammad and Husnain Abbas, "Shape based Pakistan sign language categorization using statistical features," IEEE, vol. 6, 2018.

[19] Surejya Suresh, Mithun Haridas T.P and Supriya M.H, "Sign Language Recognition System Using Deep Neural Network," IEEE 5th International Conference on Advanced Computing and Communication Systems (ICACCS), 2019.

[20] Helene Brashear, Thad Starner, Paul Lukowicz and Holger Junker, "Using Multiple Sensors for Mobile Sign Language Recognition," Proceedings of the 7th IEEE International Symposium on Wearable Computers, IEEE Computer Society, Washington, DC, USA, pp. 45-52, 2003.

[21] Kalpattu S Abhishek, Lee Chun Fai Qubeley and Derek Ho, "Glove Based Hand Gesture Recognition Sign Language Translation Using Capacitive Touch Sensor," IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC), pp. 334-337, 2016.

[22] R. Su, X. Chen, S. Cao and X. Zhang, "Random forest-based recognition of isolated sign language subwords using data from accelerometers and surface electromyographic sensors," Sensors (Switzerland), vol. 16, no. 1, pp. 9-11, 2016.

[23] Shirin Sulthan Shanta, Saif Taifur Anwar and Md Raganul Kabir, "Bangla Sign Language Detection Using SIFT and CNN," IEEE 9th ICCCNT, 2018.

[24] N. Mukai, N. Harada and Y. Chang, "Japanese Fingerspelling Recognition Based on Classification Tree and Machine Learning," Nicograph International (NicoInt), 2017.

[25] H. I. Lin, M. H. Hsu and W. K. Chen, "Human hand gesture recognition using a convolution neural network," Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE), Taipei, Taiwan, 18-22 August 2014, pp. 1038-1043.

[28] J. Bao, A. Song, Y. Guo and H. Tang, "Dynamic hand gesture recognition based on SURF tracking," International Conference on Electric Information and Control Engineering (ICEICE), IEEE, 2011, pp. 338-341.

[29] K. Hara and K. Nakayama, "Comparison of activation functions in multilayer neural network for pattern classification," IEEE International Conference on Neural Networks (ICNN'94), 1994.

[30] S. Joudaki, D. bin Mohamad, T. Saba, A. Rehman, M. AlRodhaan and A. Al-Dhelaan, "Vision-based sign language classification: A directional review," IETE Technical Review (Institution of Electronics and Telecommunication Engineers, India), pp. 383-391, 2014.

[31] M. Ebrahim Al-Ahdal and M. T. Nooritawati, "Review in sign language recognition systems," 2012 IEEE Symposium on Computers and Informatics (ISCI), 2012, pp. 52-57.

[32] S. Kausar and M. Y. Javed, "A survey on Sign Language recognition," Proceedings of the 2011 9th International Conference on Frontiers of Information Technology (FIT), 2011, pp. 95-98.

[33] S. C. Agrawal, A. S. Jalal and R. K. Tripathi, "A survey on manual and non-manual sign language recognition for isolated and continuous sign," Int. J. Appl. Pattern Recognit., vol. 3, no. 2, p. 99, 2016.

[34] A. Er-Rady, R. Faizi, R. O. H. Thami and H. Housni, "Automatic sign language recognition: A survey," Proceedings of the 3rd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), 2017.

[35] H. Abdelnasser, M. Youssef and K. A. Harras, "WiGest: A ubiquitous WiFi-based gesture recognition system," 2015 IEEE Conference on Computer Communications (INFOCOM), IEEE, 2015, pp. 1472-1480.

[36] Rahul D Raj, "British sign language recognition using HOG," IEEE International Students' Conference on Electrical, Electronics and Computer Science, pp. 1-4, 2018.

[37] W. Jiangqin and G. Wen, "The Recognition of Finger-Spelling for Chinese Sign Language," 2001, pp. 96-100.

[38] H. Bay, T. Tuytelaars and L. Van Gool, "SURF: Speeded Up Robust Features," European Conference on Computer Vision, Springer, 2006, pp. 404-417.