Project Report
INTRODUCTION
To enable communication with paralyzed people, we use an eye and face detection
algorithm. The face detection algorithm processes captured video frames and
outputs the face inside a rectangular bounding box. This output is then processed
by an AdaBoost classifier to detect the eye region within the face. The detected
eyes are checked for eyeball movement; if movement is present, it is tracked to
determine the combination the patient is using to express a dialogue. If not, the
blink pattern is processed to produce both voice output and text input for the
corresponding dialogue. Many methods have been introduced for motor neuron
disease patients to communicate with the outside world, such as brain-wave
techniques and electro-oculography. Loss of speech can be hard to adjust to. It is
difficult for patients to make their caretakers understand what they need,
especially in hospitals; they find it hard to express their feelings and cannot
take part in conversations. The proposed system detects the voluntary blinks of
the patient, sends a message about the requirement to the caretaker accordingly,
and also gives voice output via a call to the caretaker. The system uses an
inbuilt camera to capture video of the patient and, with the help of a facial
landmark algorithm, identifies the patient's face and eyes. The system then
slides a series of images one after another on the screen, and the patient can
choose to blink over the image that conveys his desire. The system identifies the
blink with the help of the eye aspect ratio, then sends a message to the
caretaker describing what the patient wants and also initiates a call to the
caretaker in which a voice announces the patient's request.
Blink To Speak offers a form of independence to paralyzed people. The
software platform converts eye blinks to speech. Every feature of the software
can be controlled by eye movement. Thus, the software can be independently operated
by paralyzed people. Using the software, patients can record messages, recite those
messages aloud, and send the messages to others.
Fast: Few algorithms have been developed for video-oculography communication
systems. The main objective of this project is to develop an algorithm that is
extremely fast compared to the existing ones.
CHAPTER 2
LITERATURE SURVEY
A brain-computer interface (BCI) uses electrical brain activity: special
hardware measures the signal and interprets it, which helps to control computer
applications. However, the main drawbacks of BCI are its intrusiveness and its
need for EEG recording hardware. One invasive method makes use of a contact-lens
based tracking system: small silicon wired coils, called Scleral Search Coils,
are embedded into a modified contact lens.
A communication interface is often intrusive, requiring invasive devices or
depending on active infrared sensors. A non-intrusive communication interface
was developed that runs on a consumer-grade computer and takes input in the form
of video frames from an inexpensive webcam, without special lighting conditions.
The interface detects voluntary eye blinks and pupil motion, then interprets
them as control commands. The detected eye direction can be useful in
applications such as medical assistance, S.O.S., and basic utilities. The video
frames are processed by the OpenCV library, which is open-source software.
In such patients, all body muscles are paralyzed except those that move the
eyes. With improvements in information technology, object detection and
recognition have found wide usage in applications.
CHAPTER 3
MATERIALS AND METHODOLOGY
3.1 METHODOLOGY
3.1.1 VISUAL COMMUNICATION
Visual communication is the practice of using visual elements to convey a
message, inspire change, or evoke emotion. It is one part communication design
(crafting a message that educates, motivates, and engages) and one part graphic
design (using design principles to communicate that message so that it is clear
and eye-catching). Effective visual communication should be equally appealing
and informative.
3.1.3 FACE DETECTION
Face detection has progressed from rudimentary computer vision techniques
to advances in machine learning (ML) to increasingly sophisticated artificial neural
networks (ANN) and related technologies; the result has been continuous
performance improvements. It now plays an important role as the first step in many
key applications, including face tracking, face analysis, and facial recognition.
Face detection has a significant effect on how sequential operations will perform in
the application.
An infrared light source (and thus detection method) is necessary as the
accuracy of gaze direction measurement is dependent on a clear demarcation (and
detection) of the pupil as well as the detection of corneal reflection. Normal light
sources (with ordinary cameras) aren’t able to provide as much contrast, meaning
that an appropriate amount of accuracy is much harder to achieve without infrared
light.
Light from the visible spectrum is likely to generate uncontrolled Specular
reflection, while infrared light allows for a precise differentiation between the
pupil and the iris – while the light directly enters the pupil, it just “bounces off” the
iris. Additionally, as infrared light is not visible to humans it doesn’t cause any
distraction while the eyes are being tracked.
3.1.6 BLINK DETECTION
Blink detection is the process of using computer vision to first detect a face
with eyes, and then using a video stream (or even a series of rapidly taken
still photos) to determine whether those eyes have blinked within a certain
timeframe.
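Since the proposed system identifies blinks via the eye aspect ratio, a minimal sketch of that measure may help. The landmark ordering and the example coordinates below are illustrative assumptions, not values taken from the actual system:

```python
# A minimal sketch of the eye aspect ratio (EAR) measure used for blink
# detection. The six (x, y) points follow the usual dlib eye ordering:
# p1..p6 around the eye, with p1/p4 the horizontal corners.
from math import dist

def eye_aspect_ratio(eye):
    """eye: list of six (x, y) landmark tuples."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)   # two vertical distances
    horizontal = dist(p1, p4)                # one horizontal distance
    return vertical / (2.0 * horizontal)

# An open eye keeps a roughly constant EAR; it collapses toward zero on a blink.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
print(eye_aspect_ratio(open_eye))    # larger value: eye open
print(eye_aspect_ratio(closed_eye))  # much smaller value: eye closed
```

Thresholding this ratio per frame is what allows an intentional blink to be distinguished from the eye's normal open state.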
The following are the libraries used in our project:
OpenCV
dlib
enum
time
subprocess
tkinter
gTTS
PIL (Pillow)
Twilio & tempfile
3.2.2.1 OPENCV
OpenCV is a huge open-source library for computer vision, machine learning, and
image processing, and it now plays a major role in the real-time operation that
is so important in today's systems. Using it, one can process images and videos
to identify objects, faces, or even human handwriting. When integrated with
libraries such as NumPy, Python is capable of processing the OpenCV array
structure for analysis. To identify image patterns and their features, we use
vector spaces and perform mathematical operations on these features.
3.2.2.2 DLIB
Dlib is a general-purpose cross-platform software library written in the C++
programming language. Its design is heavily influenced by ideas from design by
contract and component-based software engineering. Thus it is, first and
foremost, a set of independent software components.
It is open-source software released under the Boost Software License. Since
development began in 2002, Dlib has grown to include a wide variety of tools. As
of 2016, it contains software components for dealing with networking, threads,
graphical user interfaces, data structures, linear algebra, machine learning, image
processing, data mining, XML and text parsing, numerical optimization, Bayesian
networks, and many other tasks.
3.2.2.3 ENUM
Enumerations in Python are implemented by using the module named
“enum”. Enumerations are created using classes. Enums have names and values
associated with them.
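A brief sketch of how the project might use enum, with hypothetical state names chosen purely for illustration:

```python
from enum import Enum

# A hypothetical enumeration of the eye states a detector could report;
# the names and values here are illustrative, not the project's actual code.
class EyeState(Enum):
    OPEN = 1
    CLOSED = 2
    LEFT = 3
    RIGHT = 4

state = EyeState.CLOSED
print(state.name)    # "CLOSED"
print(state.value)   # 2
```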
3.2.2.4 TIME
As the name suggests, the Python time module allows one to work with time in
Python. It provides functionality such as getting the current time and pausing
program execution. Before using this module, we need to import it.
3.2.2.5 TKINTER
Python offers multiple options for developing a GUI (Graphical User Interface).
Of these, tkinter is the most commonly used: it is the standard Python interface
to the Tk GUI toolkit shipped with Python. Python with tkinter is the fastest
and easiest way to create GUI applications.
3.2.2.6 SUBPROCESS
Subprocess is a standard Python module that allows the user to start new
processes from within a Python script. It is useful for running multiple
processes in parallel or calling an external program or command from inside
Python code. Subprocess allows the user to manage inputs, outputs, and errors
raised by the child process from Python code.
The parent-child relationship of processes is where the "sub" in the
subprocess name comes from. Subprocess is used to launch processes that are
completely separate from the user's program, while multiprocessing is designed
for processes that communicate with each other.
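A minimal, self-contained sketch of this usage; the child command here is just a stand-in for whatever external program a script might launch:

```python
import subprocess
import sys

# Launch a child Python process and capture its output: the same pattern a
# script can use to hand off work (e.g. playing an audio file) to an external
# program. sys.executable avoids assuming a "python" command on PATH.
result = subprocess.run(
    [sys.executable, "-c", "print('message sent')"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())   # "message sent"
print(result.returncode)       # 0: the child exited successfully
```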
3.2.2.7 GTTS
There are several APIs available to convert text to speech in Python; one of
them is the Google Text-to-Speech API, commonly known as the gTTS API. gTTS is
a very easy-to-use tool that converts entered text into audio, which can be
saved as an MP3 file. The gTTS API supports several languages including English,
Hindi, Tamil, French, German, and many more.
The speech can be delivered at either of two available audio speeds, fast or
slow. However, as of the latest update, it is not possible to change the voice
of the generated audio.
3.2.2.8 PIL(Pillow)
PIL stands for Python Imaging Library, the original library that enabled Python
to deal with images. PIL was discontinued in 2011 and only supports Python 2.
To use its developers' own description, Pillow is the "friendly PIL fork" that
kept the library alive and includes support for Python 3.
PIL provides the Python interpreter with image-editing capabilities. The Image
module provides a class of the same name, used to represent a PIL image, as
well as a number of factory functions, including functions to load images from
files and to create new images.
In this project, a problem arose in that many output files were created, which
cluttered the file system with unwanted files that required deleting every time
the program ran.
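A hedged sketch of how the tempfile module (listed among our libraries) avoids exactly this kind of clutter; the file contents below are placeholder bytes, not real audio:

```python
import os
import tempfile

# Temporary output files live in the system temp directory and are removed
# once playback is done, so they never clutter the working directory.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
    tmp.write(b"fake audio bytes")   # a gTTS save() call could target tmp.name
    path = tmp.name

print(os.path.exists(path))   # True while the file is still needed
os.remove(path)               # explicit cleanup after use
print(os.path.exists(path))   # False: nothing left behind
```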
3.3 SYSTEM ARCHITECTURE
3.4 WORK FLOW DIAGRAM
EYE RECOGNITION DIAGRAM
CASE DIAGRAM
STEPS FOR IMPLEMENTATION:
Step 1: Capturing a video.
Step 2: Capture images from video.
Step 3: Converting images into grayscale.
Step 4: Fix landmarks on the images.
Step 5: Detect Blinks.
Step 6: Detecting Eye-ball movements.
Step 7: Converting to text.
Step 8: Sending the text (or) if emergency situation means doing calls.
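Steps 5 and 7 can be sketched as a simple blink counter over a per-frame eye-aspect-ratio stream; the threshold and frame counts below are illustrative assumptions, not tuned values from the system:

```python
# Turn a per-frame eye-aspect-ratio (EAR) stream into a blink count.
EAR_THRESHOLD = 0.21      # below this, the eye is treated as closed
MIN_CLOSED_FRAMES = 2     # frames the eye must stay closed to count a blink

def count_blinks(ear_values):
    blinks, closed_run = 0, 0
    for ear in ear_values:
        if ear < EAR_THRESHOLD:
            closed_run += 1          # eye still closed: extend the run
        else:
            if closed_run >= MIN_CLOSED_FRAMES:
                blinks += 1          # eye reopened after a long enough closure
            closed_run = 0
    return blinks

# Synthetic stream: open eyes, a two-frame blink, open, then a three-frame blink.
stream = [0.30, 0.31, 0.10, 0.09, 0.30, 0.32, 0.08, 0.07, 0.09, 0.31]
print(count_blinks(stream))   # 2
```

Requiring several consecutive closed frames is what lets the system ignore fleeting detector noise while still catching deliberate blinks.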
3.5 MODULES
Images From Camera
Converting images into Grayscale
Preprocessing
Face Detection
Eye Ball Movement Recognition using Dlib
Sending Messages
The purposes of converting images to grayscale are:
Simplicity
Data Reduction
We convert to grayscale mainly for the likely reduction in processing time.
However, this comes at the cost of throwing away colour data that may be very
helpful or even required for many image-processing applications.
3.5.1.3 PREPROCESSING:
Pre-processing is the first step of the image-processing pipeline. It involves
data validation and imputation to assess whether the data is complete and
accurate, and to correct errors and fill in missing values. We use the
pre-processing step to improve the quality of the captured images.
In image processing, preprocessing improves image quality by removing noise and
unwanted data, and by eliminating variations that arise during acquisition.
3.5.1.5 EYE BALL MOVEMENT RECOGNITION USING DLIB
Research on eye tracking is increasing owing to its ability to facilitate many
different tasks, particularly for the elderly or users with special needs. Eye tracking
is the process of measuring where one is looking (point of gaze) or the motion of
an eye relative to the head. Researchers have developed different algorithms and
techniques to automatically track the gaze position and direction, which are helpful
to find the emotions of the paralyzed person. We explore and review eye tracking
concepts, methods, and techniques by further elaborating on efficient and effective
modern approaches such as machine learning (ML).
Deep Learning is a subfield of Machine Learning that involves the use of
neural networks to model and solve complex problems. Neural networks
are modeled after the structure and function of the human brain and
consist of layers of interconnected nodes that process and transform data.
The key characteristic of Deep Learning is the use of deep neural
networks, which have multiple layers of interconnected nodes. These
networks can learn complex representations of data by discovering
hierarchical patterns and features in the data. Deep Learning algorithms
can automatically learn and improve from data without the need for
manual feature engineering.
Deep Learning has achieved significant success in various fields,
including image recognition, natural language processing, speech
recognition, and recommendation systems. Some of the popular Deep
Learning architectures include Convolutional Neural Networks (CNNs),
Recurrent Neural Networks (RNNs), and Deep Belief Networks (DBNs).
Training deep neural networks typically requires a large amount of data
and computational resources. However, the availability of cloud
computing and the development of specialized hardware, such as Graphics
Processing Units (GPUs), has made it easier to train deep neural networks.
In summary, Deep Learning is a subfield of Machine Learning that
involves the use of deep neural networks to model and solve complex
problems. Deep Learning has achieved significant success in various
fields, and its use is expected to continue to grow as more data becomes
available, and more powerful computing resources become available.
Deep learning is the branch of machine learning which is based on
artificial neural network architecture. An artificial neural network or ANN uses
layers of interconnected nodes called neurons that work together to process and
learn from the input data.
In a fully connected Deep neural network, there is an input layer and
one or more hidden layers connected one after the other. Each neuron receives
input from the previous layer neurons or the input layer. The output of one neuron
becomes the input to other neurons in the next layer of the network, and this
process continues until the final layer produces the output of the network. The
layers of the neural network transform the input data through a series of nonlinear
transformations, allowing the network to learn complex representations of the input
data.
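The layer-by-layer flow described above can be written out in a few lines of NumPy; the layer sizes and random weights here are purely illustrative:

```python
import numpy as np

# A minimal forward pass through a fully connected network with one hidden
# layer: each layer applies a linear map (weights) followed by a nonlinear
# transformation (here, ReLU), exactly as described above.
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

x = rng.normal(size=(4,))        # input layer: 4 features
W1 = rng.normal(size=(4, 8))     # input -> hidden layer of 8 neurons
W2 = rng.normal(size=(8, 2))     # hidden -> output layer of 2 values

hidden = relu(x @ W1)            # output of layer 1 becomes input to layer 2
output = hidden @ W2             # the final layer produces the network output
print(output.shape)              # (2,)
```

Training would additionally adjust `W1` and `W2` via backpropagation; this sketch shows only the inference-time data flow.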
UNSUPERVISED MACHINE LEARNING
Unsupervised machine learning is the machine learning technique in which the
model learns to discover patterns or to cluster a dataset based on unlabeled
data. There are no target variables; the machine has to determine the hidden
patterns or relationships within the dataset on its own. Deep learning
algorithms such as autoencoders and generative models are used for unsupervised
tasks like clustering, dimensionality reduction, and anomaly detection.
Each connection between neurons carries an associated weight. These weights are
then adjusted during the training process to enhance the performance of the
model.
ALGORITHM USED
CNN (CONVOLUTIONAL NEURAL NETWORKS)
Deep Learning has facilitated multiple approaches to computer vision,
cognitive computation and refined processing of visual data. One such instance is
the use of CNN or Convolutional Neural Networks for object or image
classification. CNN algorithms provide a massive advantage in visual-based
classification by enabling machines to perceive the world around them (in the form
of pixels) as humans do.
CNN is fundamentally a recognition algorithm that allows machines to
become trained enough to process, classify or identify a multitude of parameters
from visual data through layers. CNN-based systems learn from image-based
training data and can classify future input images or visual data on the basis of its
training model. As long as the dataset that is used for training contains a range of
useful visual cues (spatial data), the image or object classifier will be highly
accurate.
This promotes advanced object identification and image classification by
enabling machines or software to accurately identify the required objects from
input data. CNN models rely on classification, segmentation, and localisation,
and then build predictions. This allows such systems to react almost as a human
brain would in a given situation, and sometimes even more effectively.
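The heart of a CNN layer, the convolution itself, can be shown in a short sketch. This is a plain "valid" cross-correlation written out explicitly in NumPy for clarity, not the project's actual model:

```python
import numpy as np

# Slide a small filter over the image and take a weighted sum of each pixel
# neighbourhood: the core operation of a convolutional layer.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_kernel = np.array([[1.0, -1.0]])   # responds to horizontal contrast
result = conv2d(image, edge_kernel)
print(result.shape)    # (4, 3): a "valid" convolution shrinks the output
```

Real CNN layers stack many such filters and learn their weights from the training data.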
Typical applications of image processing include:
Sharpening and Restoration
Pattern Recognition
Retrieval
VISUALIZATION
Data visualization is the graphical representation of information and data. By
using visual elements like charts, graphs, and maps, data visualization tools
provide an accessible way to see and understand trends, outliers, and patterns in
data. Additionally, it provides an excellent way for employees or business owners
to present data to non-technical audiences without confusion. In the world of Big
Data, data visualization tools and technologies are essential to analyze massive
amounts of information and make data-driven decisions.
ADVANTAGES OF VISUALIZATION:
Easily sharing information.
Interactively explore opportunities.
Visualize patterns and relationships.
RECOGNITION
Facial recognition is a way of identifying or confirming an individual’s
identity using their face. Facial recognition systems can be used to identify people
in photos, videos, or in real-time.
Facial recognition is a category of biometric security. Other forms of biometric
software include voice recognition, fingerprint recognition, and eye retina or iris
recognition. The technology is mostly used for security and law enforcement,
though there is increasing interest in other areas of use.
Many people are familiar with face recognition technology through the Face ID
used to unlock iPhones (however, this is only one application of face
recognition). Typically, facial recognition does not rely on a massive database
of photos to determine an individual's identity; it simply identifies and
recognizes one person as the sole owner of the device, while limiting access to
others.
Beyond unlocking phones, facial recognition works by matching the faces
of people walking past special cameras, to images of people on a watch list. The
watch lists can contain pictures of anyone, including people who are not suspected
of any wrongdoing, and the images can come from anywhere — even from our
social media accounts. Facial technology systems can vary, but in general, they
tend to operate as follows:
STEP 3: CONVERTING THE IMAGE TO DATA
The face capture process transforms analog information (a face) into a set
of digital information (data) based on the person's facial features. Your face's
analysis is essentially turned into a mathematical formula. The numerical code is
called a faceprint. In the same way that thumbprints are unique, each person has
their own faceprint.
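As a toy illustration only (real faceprints come from learned embeddings, not raw landmarks), the idea of reducing a face to a comparable numeric code can be sketched as:

```python
import numpy as np

# Flatten normalised landmark coordinates into one numeric vector and compare
# faces by distance: a toy stand-in for a real faceprint.
def faceprint(landmarks):
    pts = np.asarray(landmarks, dtype=float)
    pts -= pts.mean(axis=0)           # centre the points: translation invariance
    scale = np.linalg.norm(pts)
    return (pts / scale).ravel()      # one flat numeric code per face

face_a = [(0, 0), (4, 0), (2, 3)]     # some face geometry
face_b = [(10, 10), (14, 10), (12, 13)]  # the same geometry, shifted
face_c = [(0, 0), (6, 0), (1, 1)]     # a different geometry

same = np.linalg.norm(faceprint(face_a) - faceprint(face_b))
diff = np.linalg.norm(faceprint(face_a) - faceprint(face_c))
print(same < diff)   # True: matching geometry gives the smaller distance
```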
Acutance is a subjective measure of the contrast at an edge. There is no unit
for acutance; you either think an edge has contrast or you think it doesn't.
Edges that have more contrast appear more defined to the human visual system.
PATTERN RECOGNITION
Pattern recognition is a technique to classify input data into classes or
objects by recognizing patterns or feature similarities. Unlike pattern matching
which searches for exact matches, pattern recognition looks for a “most likely”
pattern to classify all information provided. This can be done in a supervised
(labelled data) learning model or unsupervised (unlabelled data) to discover new,
hidden patterns.
INFORMATION RETRIEVAL:
Information Retrieval (IR) deals with the organization, storage, retrieval, and
evaluation of information from document repositories, particularly textual
information. It is the activity of obtaining material, usually unstructured
text that satisfies an information need, from within large collections stored
on computers. For example, information retrieval takes place when a user enters
a query into the system.
Face detection has benefited from machine learning (ML) and artificial neural
network (ANN) technology, and plays an important role in face tracking, face
analysis, and facial recognition. In face analysis, face detection uses facial
expressions to identify which parts of an image or video should be focused on
to determine age, gender, and emotions. In a facial recognition system, face
detection data is required to generate a faceprint and match it against other
stored faceprints.
There are a number of uses for blink detection. Probably the most common use,
as far as consumers are concerned, has been in cameras and smartphones, where
the aim is to help photographers improve their photographs by telling them when
their subjects have blinked.
Blink detection technology focuses on the eyes of the people in the photograph
(it can often work with up to twenty faces). Whenever a pair of eyes is
occluded, either a message is displayed on the LCD screen telling the
photographer to delay taking the photograph, or, on more advanced cameras, the
photo is simply taken at a moment when all eyes are open.
CHAPTER 4
RESULTS
FIG 4.3 PATIENTS NEED SELECTION
FIG 4.5 MOBILITY ASSISTANCE NEED
FUTURE ENHANCEMENT
In our research, we have demonstrated the system on a laptop, and it can be
packaged in a compact form that lets ordinary users operate it. The system
should work as required without any human intervention. The current analysis
does not work in the dark, so the system can be enhanced in this respect; the
audio and message outputs in our study can also be further automated. The main
objective is to design a real-time interactive system that can assist paralysis
patients in controlling appliances such as lights and fans. In addition, it can
play pre-recorded audio messages through a predefined number of eye blinks, and
it helps alert the doctor or concerned person by sending an SMS in case of
emergency using eye-blink detection. The system is able to distinguish an
intentional blink from a normal blink, which is useful for paralysis patients,
especially tetraplegic patients, allowing them to regulate their home devices
easily without any help.
CONCLUSION
Although blink detection systems exist for other purposes, an
implementation of a blink detection system with the end use of controlling
appliances has not been previously accomplished. While the system is intended to
assist the paralyzed and physically challenged, it can definitely be used by all
types of individuals. The main challenge involved in the implementation of the
system is the development of a real time robust blink detection algorithm. Many
algorithms have been developed to serve the purpose, with some being more
accurate than the others. This paper presented a blink detection system based on
Online template matching. The first phase involved the blink detection phase; the
second phase involved the counting of blinks and subsequent control of
appliances through a micro controller. By enabling the paralyzed to gain control
of albeit a small part of their lives, the system can offer some level of
independence to them. The helpers who are assigned the task of tending to
paralyzed persons through the day can then be afforded a break. For practical
use with continuous video input, laptops with built-in webcams or USB cameras
will suffice. The system is limited by the efficiency of the blink detection
algorithm, and efficiency falls further under limited lighting conditions. Since the
initialization phase of the algorithm is based on differencing between consecutive
frames, background movement in the frame may lead to inaccurate operation.
Typically, background movement causes non eye pairs to be detected as eye pairs.
This is overcome to some extent by limiting the search region to the face of an
individual, by implementing a face tracking algorithm prior to blink detection.
However, this in turn can lead to reduced efficiency in blink detection. By giving
an option to the user to choose between the system with and without face
tracking, a level of flexibility can be reached. The application of the blink
detection system is not limited to the control of appliances but can also be
used for a variety of other functions. Playback of audio distress messages over
an intercom
system is one of the other applications of the system. Future applications of the
system may include playback of video or audio files by eye blinks and making a
VOIP call to play a distress message.
REFERENCES
1. Chinnawat Devahasdin Na Ayudhya, Thitiwan Srinark. A Method for Real-Time
Eye Blink Detection and Its Application.
2. Michael Chau and Margrit Betke, 2005. Real Time Eye Tracking and Blink
Detection with USB Cameras. Boston University Computer Science Technical
Report No. 2005-12. Boston, USA.
3. Liting Wang, Xiaoqing Ding, Chi Fang, Changsong Liu, Kongqiao Wang, 2009.
Eye Blink Detection Based on Eye Contour Extraction. Proceedings of
SPIE-IS&T Electronic Imaging. San Jose, CA, USA.
4. Abdul-Kader, S. A., & Woods, J. (2015). Survey on Chatbot Design Techniques
in Speech Conversation Systems. International Journal of Advanced Computer
Science and Applications; Hima T. Eye Controlled Home-Automation for the
Disabled, pp. 6-7 (ERTEEI'17).
5. AbuShawar B., Atwell E. ALICE Chatbot: Trials and Outputs. Computación y
Sistemas, 19 (2015). doi: 10.13053/cys-19-4-2326.
6. Kohei Arai and Ronny Mardiyanto, 2011. Comparative Study on Blink
Detection and Gaze Estimation Methods for HCI, in Particular, Gabor Filter
Utilized Blink Detection Method. Proceedings of Eighth International
Conference on Information Technology: New Generations. Las Vegas, USA,
pp. 441-446.
7. Taner Danisman, Ian Marius Bilasco, Chabane Djeraba, Nacim Ihaddadene.
“Drowsy Driver Detection System Using Eye Blink Patterns.” 2010 International
Conference on Machine and Web Intelligence, 29 November 2010.
8. Atish Udayashankar, Amit R. Kowshik, S. Chandramouli, H.S. Prashanth.
“Assistance for Paralyzed Using Eye Blink Detection.” 2012 Fourth International
Conference on Digital Home, 11 December 2012.
9. Deep Bose. Home Automation with Eye Blink for Paralyzed Patients. BRAC
University, Bangladesh, 2017.
10. F.M. Sukno, S.K. Pavani, C. Butakoff and A.F. Frangi, “Automatic
Assessment of Eye Blinking Patterns through Statistical Shape Models,” ICVS
2009, LNCS 5815, Springer-Verlag Berlin Heidelberg, pp. 33-42, 2009.
11. M. Divjak and H. Bischof, “Eye blink based fatigue detection for prevention of
Computer Vision Syndrome,” MVA2009 IAPR Conference on Machine Vision
Applications, Yokohama, Japan, May 2009.
12. L. Wang, X. Ding, C. Fang and C. Liu, “Eye blink detection based on eye
contour extraction,” Proceedings of SPIE, vol. 7245,72450R, 2009.
13. Michelle Alva, Neil Castellino. “An Image Based Eye Controlled Assistive
System for Paralytic Patients.” 2nd International Conference on Communication
Systems, IEEE, 2017.
14. Milan Pandey, Anoop Shinde, Kushal Chaudhari, Divyanshu Totla, Rajnish
Kumar, Prof. N.D. Mali. “Assistance for Paralyzed Patient Using Eye Motion
Detection.” IEEE, 2018.