SAAVIP-SMART AI: Assistant for Visually Impaired People

The document outlines the SAAVIP-SMART AI project, a mobile application designed to assist visually impaired individuals by providing real-time environmental descriptions through a voice assistant. It highlights the limitations of existing systems and proposes a comprehensive solution that includes features like face detection, object recognition, and audio feedback to enhance user independence. The project aims to improve navigation and interaction with surroundings, promoting accessibility and confidence for visually impaired users.


02-11-2024

SAAVIP-SMART AI: ASSISTANT FOR VISUALLY IMPAIRED PEOPLE
COLLEGE OF ENGINEERING PATHANAPURAM

Project Guide: Dr. Praveen K. Wilson
Project Coordinators: Ms. Prameela S, Dr. Praveen K. Wilson

Team Members:
- Abhijith Babu (PEC21CS002)
- Abhinav R Narayan (PEC21CS004)
- Adhira R (PEC21CS008)
- Anjaly C A (PEC21CS014)

Contents

1. Introduction
2. Existing System
3. Proposed System
4. Problem Statement
5. Objective
6. Scope
7. Literature Review

8. System Design
9. Conclusion
10. References

Introduction

- A mobile application specifically designed for visually impaired individuals.
- Utilizes a live voice assistant to describe surroundings in real time.
- Leverages the smartphone's camera and on-device machine learning models.
- Capable of recognizing objects, detecting faces, and interpreting scenes.
- Provides users with detailed audio feedback to enhance their awareness of and interaction with their environment.
Existing System

Current systems like Seeing AI and Be My Eyes provide object detection, text reading, and navigation assistance for the visually impaired.

Limitations:
- Fragmented features, focusing on individual tasks.
- Limited real-time performance and processing.
- Reliance on internet connectivity.
- Lack of personalization.
- Inability to interpret emotional context.
Proposed System

- User Interface for Interaction
- Face Detection and Emotion Recognition
- Real-Time Scene and Object Detection
- Text-to-Speech for Audio Feedback
- Data Storage and Management
Problem Statement

Visually impaired individuals face challenges in navigation, recognizing people, and understanding surroundings due to the lack of integrated, real-time assistive solutions. An accessible system with face recognition, object detection, and audio feedback is needed to enhance independence and confidence.

Objectives

- Develop an AI-powered Voice Assistant
- Integrate Object Detection and Scene Description
- Implement Face Recognition
- Create a Navigation System
- Design an Accessible User Interface
- Promote Technological Accessibility

Scope

- Real-Time Environment Description
- Face Recognition for Social Interaction
- Accessible Interface Design
- Safety and Independence
- Broad Applicability

Literature Review
"Real-time Object Detection for Visually Challenged People" by Sunit Vaidya, 2020
Methodology: Employed a Convolutional Neural Network (CNN) architecture for real-time object detection. Utilized a dataset of common objects and trained the model to accurately identify and locate them in real-world scenarios.
Advantages: Provides real-time feedback, enabling immediate situational awareness. Can potentially enhance independent navigation and daily living activities.
Disadvantages: Accuracy can be affected by lighting conditions, object occlusion, and complex backgrounds. Requires a continuous power supply for the device.
Idea inherited: To empower visually impaired individuals with increased autonomy and independence by leveraging AI-powered object detection for environmental understanding.

"Text to Voice Conversion for Visually Impaired Person by using Camera" by Sumit Chafale, Priyanka Dighore, Dipika Panditpawar, Khushal Bhagawatkar, Shrikant Sakhare, March 2021
Methodology: Image capture converts the image to grayscale, removes noise, detects edges, and focuses on text. Data processing uses OCR to turn images into text. Audio output converts the text to audio via text-to-speech.
Advantages: Promotes independence for visually impaired users. Easy to use, portable, and helps with pronunciation.
Disadvantages: Limited with complex backgrounds. Needs a clear, dark surface for accurate text detection.
Idea inherited: Integrates OCR and text-to-speech for accessibility. Aims to aid the visually impaired in daily activities.

"Improved Disabled Mobile Aid Application for Android: Health and Fitness Helper for Disabled People" by Dhafer Sabah Yaseen, Shamala A/P Batumalai, Falah Y H Ahmed, March 2022
Methodology: Location: GPS-based navigation for clinics, hospitals, and organizations. Health: medical reminders and exercise suggestions. Accessibility: stores preferred locations and lists paralympic sports.
Advantages: Enhances access to health facilities and fitness. Supports navigation and health reminders.
Disadvantages: Requires GPS and internet. May be complex for some users.
Idea inherited: Combines location services and health support for accessibility.

"IoT Enabled Automated Object Recognition for the Visually Impaired" by Md. Atikur Rahman, Muhammad Sheikh Sadi, April 2023
Methodology: Detection: SSD with MobileNet for objects; SIFT for currency. Sensors: laser sensors for multi-directional detection. IoT: real-time data sent to a remote server. Fall detection: accelerometer alerts for falls.
Advantages: High accuracy (99.31% detection, 98.43% recognition). Real-time alerts with audio feedback for users.
Disadvantages: Limited classes (only five object types). Bulky, as wired components make it heavy. Environmental constraints, with slightly lower accuracy outdoors.
Idea inherited: Assistive technology for visually impaired mobility. Real-time, audio-based guidance. Remote data storage for analysis.

"A Deep Learning Approach for Object Recognition System for the Visually Impaired Using Arabic Annotation" by Mohammad Hussan et al., 2023
Methodology: Developed a deep learning model using a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to recognize objects in real time. Trained the model on a dataset of Arabic-annotated images.
Advantages: Can potentially improve accessibility for visually impaired individuals in Arabic-speaking regions. Combines the strengths of CNNs and RNNs for enhanced feature extraction and classification.
Disadvantages: Requires a large and diverse dataset with accurate Arabic annotations for effective training. May face challenges in generalizing to unseen objects or environments.
Idea inherited: To bridge the gap in assistive technology for visually impaired individuals in regions with different linguistic contexts.

"A deep learning-based integrated voice assistance system for partially disabled people" by Harshit Garg, Srishti Jhunthra, Madhav Kindra, Vikrant Dixit, Vedika Gupta, April 2016
Methodology: Develops a deep learning-based system that uses a camera and sensors to capture real-time environmental data, which is then processed for object detection and text recognition. The system utilizes a text-to-speech engine to convert visual data into audio feedback.
Advantages: Enhanced independence. Health monitoring. Cost-effective design. Enhances users' ability to interact with their surroundings.
Disadvantages: Limited precision in complex environments. Latency issues. User training required.
Idea inherited: Integrating deep learning and assistive technology to create a voice assistance system specifically designed for individuals with partial disabilities.

"Real-Time Object Detection and Recognition for Visually Impaired Persons Using Smartphone" by Hiren Kumar Thakkar et al., 2020
Methodology: Utilized a smartphone-based system with a camera and integrated object detection algorithms. Developed a user-friendly interface for real-time object identification and audio feedback.
Advantages: Leverages the widespread availability and computational power of smartphones. Provides a portable and convenient solution for everyday use.
Disadvantages: Accuracy can be affected by smartphone camera limitations and varying lighting conditions. Battery life of the smartphone can impact usage duration.
Idea inherited: Combines a voice recognition module with an obstacle detection system to improve safety and autonomy for users.

"A facial expression controlled wheelchair for people with disabilities" by Yassine Rabhi, Makrem Mrabet, Farhat Fnaiech, February 2019
Methodology: Image processing and machine learning techniques to recognize specific facial expressions, which are then mapped to wheelchair control commands (e.g., forward, backward, left, right).
Advantages: Increased accessibility. Intuitive interface. Real-time response.
Disadvantages: Reliance on facial movements. Environmental limitations. High computational demand.
Idea inherited: To harness the power of readily available technology to enhance the quality of life for visually impaired individuals.
"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" by Shaoqing Ren, Kaiming He, Ross B. Girshick, Jian Sun, 2015
Methodology: Uses RPNs to generate region proposals, sharing convolutional features with the detection network for unified training.
Advantages: Faster and more accurate region proposals. Enables joint training of the RPN and detection networks.
Disadvantages: Computationally intensive for real-time applications. Struggles with small object detection.
Idea inherited: Combines Region Proposal Networks (RPNs) and Fast R-CNN into an end-to-end trainable object detection system.

"You Only Look Once (YOLO): Unified, Real-Time Object Detection" by Joseph Redmon, Santosh Divvala, Ross B. Girshick, Ali Farhadi, 2016
Methodology: Divides the image into grid cells and directly predicts bounding boxes and class probabilities in a single pass.
Advantages: Extremely fast, achieving real-time detection. Simple, unified architecture.
Disadvantages: Lower accuracy in complex scenes. Struggles with small or closely spaced objects.
Idea inherited: Simplifies object detection as a single neural network regression task for bounding boxes and class probabilities.

"SSD: Single Shot MultiBox Detector" by Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, 2016
Methodology: Employs feature maps at different scales for efficient multi-scale object detection.
Advantages: Fast and efficient with a good speed-accuracy balance. Supports detection of varying object sizes using multi-scale features.
Disadvantages: Struggles with small objects. Requires careful tuning of hyperparameters.
Idea inherited: Builds upon one-stage detection by using multi-scale feature maps for detecting objects of varying sizes in a single forward pass.

"Focal Loss for Dense Object Detection" by Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, Piotr Dollár, 2017
Methodology: Introduces a loss function that down-weights easy examples, emphasizing hard-to-classify ones.
Advantages: Boosts accuracy for one-stage detectors, especially for rare classes. Works with various one-stage architectures.
Disadvantages: Adds a tunable hyperparameter (γ). Slightly increases training time.
Idea inherited: Proposes Focal Loss to address class imbalance in one-stage detectors by focusing training on hard examples.
"Improved Disability Assistant Android Mobile Application" by Mulla Amina Mustaq, Sapkal Kinjal Baliram, Chaudhary Chinmaya Pravin, Prof. K.S Charumathi, April 2023
Methodology: Creates an Android app to support disabled users with health guidance, communication tools, GPS-based hospital location, reminders for appointments, and sign language resources.
Advantages: Provides accessible health and fitness resources. Enhances communication with text-to-speech and speech-to-text. Includes reminders for medical appointments.
Disadvantages: Lacks features for visually impaired users. Limited to basic sign language resources. Reminder notifications could be more customizable.
Idea inherited: Health guidance through exercises and diet plans. Communication aid with text and speech conversion. GPS and reminders for easy healthcare access.

"A reconfigurable technical assistance for disabled people" by A. Belabbas, P. Berruet, A. Rossi, J-L. Philippe, LESTER, 2022
Methodology: Develops a reconfigurable assistance system for disabled users, using intelligent wheelchairs and domotic services, modeled with Petri nets to adapt to failures in the environment.
Advantages: Enhances mobility and independence for disabled individuals. Offers customizable assistance through adaptive technology. Maintains service availability despite system breakdowns.
Disadvantages: Complexity in managing multiple users in shared spaces. Limited to areas with domotic infrastructure. High reliance on real-time adjustments and reconfiguration.
Idea inherited: Flexible and fault-tolerant assistance in daily activities. Intelligent navigation and interaction with the environment. Ensures continuous service availability through reconfiguration.

"Integrated Speaker and Speech Recognition for Wheel Chair Movement using Artificial Intelligence" by Gurpreet Kaur, I.K. Gujral Punjab Technical University, Kapurthala-144603, India, November 2017
Methodology: Integrates speaker and speech recognition to control a wheelchair, using MFCC for feature extraction, the ABC algorithm for feature optimization, and FFBPN for classification.
Advantages: Enhances control accuracy through optimized speech recognition. Reduces misinterpretation of commands by verifying the speaker. Offers reliable wheelchair control in various environments.
Disadvantages: Performance may vary with background noise. High computational complexity due to feature optimization. Requires speaker-specific data, limiting use by multiple users.
Idea inherited: Voice-controlled wheelchair operation. Improved accuracy using speaker recognition. Optimized feature extraction for robust performance.

"Real-Time Object Detection And Identification For Visually Challenged People Using Mobile Platform" by Neeraj Joshi, Shubham Maurya, Sarika Jain, February 2021
Methodology: Conducts a comparative analysis of existing object detection and identification algorithms, focusing on their performance on low-computation devices, and identifies research gaps to propose a feasible model for visually impaired individuals using a smartphone-based system.
Advantages: Smartphone-based accessibility: object detection for the visually impaired without extra hardware. Optimized algorithms: compares YOLO and SSD for real-time use. Low-compute solutions: proposes lightweight methods for accessible use.
Disadvantages: Smartphone limitations: performance depends on device power, affecting older models. Detection range: effective within 2-5 meters, limiting some scenarios. Speed vs. accuracy: prioritizing speed may reduce accuracy in complex settings.
Idea inherited: Uses real-time object detection to assist visually challenged individuals by adapting algorithms to work on mobile platforms, leveraging the speed of regression-based algorithms like YOLO and SSD to create an accessible, low-computation solution without extra hardware.
System Design

MODULE DESCRIPTION

1. User Interface (UI):
Provides a user-friendly and accessible interface for visually impaired users, enabling interaction with the app and delivery of navigation prompts.

2. Face Detection Module:
Uses a dataset and a YOLO (You Only Look Once) model to recognize known faces. Integrates a Facial Emotion Recognition (FER) component, which can analyze the emotional state of recognized faces.

3. Scene and Object Detection Module:
Utilizes a YOLO model trained on a specific dataset to identify objects and scenes in real time, helping users understand their surroundings.

4. Navigation Module (GPS):
Integrates GPS data to provide navigation support, offering real-time directions and guiding users to specified locations. Processes input data from GPS services for accurate positioning and routing.
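As a concrete illustration of the positioning step, the distance check behind a "you have arrived" announcement can be sketched with the haversine formula. The function names and the 20 m threshold below are illustrative assumptions, not taken from the project:

```typescript
// Hypothetical helper: great-circle (haversine) distance between two GPS fixes,
// which a navigation module like this could use to decide when to announce
// that the user is near a destination.
interface GeoPoint {
  lat: number; // latitude in degrees
  lon: number; // longitude in degrees
}

function haversineMeters(a: GeoPoint, b: GeoPoint): number {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Announce when the user is within a configurable radius of the target.
function isNearDestination(current: GeoPoint, target: GeoPoint, radiusMeters = 20): boolean {
  return haversineMeters(current, target) <= radiusMeters;
}
```

A routing service such as the Google Maps API would supply the target coordinates; this check only covers the final "arrival" decision.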
5. Text-to-Speech and Audio Output Module:
Converts text data into speech, describing recognized objects, scenes, faces, and navigation directions to the user. Provides voice output for all relevant information, ensuring the user receives continuous auditory feedback.

6. Database:
Stores datasets, trained models, and user-specific data (such as recognized faces and navigation preferences). Enables efficient data retrieval for face recognition, object detection, and navigation processing.
HARDWARE REQUIREMENTS

1. Smartphone: A modern smartphone with:
- High-quality camera for real-time object and face detection.
- GPS for navigation and location tracking.
- Sufficient processing power (preferably with an AI or neural processing unit) for on-device machine learning.

2. Server (optional):
Used if cloud-based processing is required, with capabilities for high-performance storage, computational power, and secure data handling.

3. Headphones or Earbuds (optional):
For private and clear audio feedback without disturbing others, especially in noisy environments.
SOFTWARE REQUIREMENTS

1. Operating System and SDK:
Android or iOS, with the corresponding development environment (Android Studio for Android, Xcode for iOS).

2. Programming Languages:
Java/Kotlin for Android or Swift for iOS development; Python for developing machine learning models.

3. Machine Learning Libraries:
YOLO model for object and face detection; OpenCV for image processing; TensorFlow Lite or Core ML for on-device model deployment.

4. APIs and Services:
Google Maps API or Mapbox for GPS-based navigation; a Text-to-Speech API (e.g., Google Text-to-Speech or iOS TTS) for audio output.

5. Database:
Firebase or a cloud database for storing user-specific data, trained models, and preferences.
PROJECT STRUCTURE

Configuration files:
- eslint.config.js: ESLint configuration
- postcss.config.js: PostCSS configuration
- vite.config.ts: Vite configuration
- tsconfig.json: TypeScript configuration
- tailwind.config.js: Tailwind CSS configuration
- package.json: project dependencies and scripts
- public: static assets

src:
- index.css: global styles
- vite-env.d.ts: TypeScript declarations
- main.tsx: application entry point
- App.tsx: main application component (state management, refs, effects, UI components)

App.tsx internals:
- State: camera state, detection state, detected-objects state
- UI components: header, detection results, control buttons, video/canvas container
OBJECT DETECTION MODULE

TECHNOLOGIES USED

REACT JS
TENSORFLOW JS
COCO-DATASET

Pipeline: camera feed (MediaDevices API) → COCO-SSD analysis → label, bounding box, confidence → bounding-box overlay (Canvas API).

MediaDevices API - used to access the user's camera feed. This allows you to get a live video stream from the
camera so the app can analyze it in real-time.
COCO-SSD - pre-trained model available from TensorFlow.js. It can detect objects from a list of 90 categories,
such as people, animals, vehicles, and more.

COCO-SSD model processes each frame and returns a list of detected objects, each with:
Label - The name of the detected object (e.g., "person", "dog").
Bounding Box - The location of the detected object in the video frame (given as coordinates: top, left, width,
height).
Confidence - A measure of how confident the model is in its detection.
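A minimal sketch of how these fields could be turned into strings for display or speech, assuming the `{class, bbox, score}` shape that the TensorFlow.js COCO-SSD model's `detect()` returns; the helper name and confidence threshold are illustrative, not from the project source:

```typescript
// Shape of one COCO-SSD prediction: class name, [x, y, width, height] box, confidence.
interface Prediction {
  class: string;
  bbox: [number, number, number, number];
  score: number;
}

// Illustrative helper: keep confident detections and format them,
// e.g. "person (87%)". The 0.5 cutoff is an assumed default.
function describePredictions(preds: Prediction[], minScore = 0.5): string[] {
  return preds
    .filter((p) => p.score >= minScore)
    .map((p) => `${p.class} (${Math.round(p.score * 100)}%)`);
}
```

For example, a frame containing a 0.87-confidence person and a 0.3-confidence chair would yield only `["person (87%)"]`.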

Canvas API - to draw bounding boxes around the detected objects in the video feed. This provides a visual
indication of where each object is in the frame.
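The overlay step can be sketched as follows. The function name and colors are assumptions; in the browser `ctx` would be the canvas's 2D rendering context, typed loosely here so the drawing logic stands on its own:

```typescript
interface Detection {
  class: string;
  bbox: [number, number, number, number]; // x, y, width, height in canvas pixels
  score: number;
}

// Minimal sketch of the overlay: clear the canvas, then draw one labeled
// rectangle per detection. In the app this would run once per video frame.
function drawDetections(
  ctx: {
    clearRect(x: number, y: number, w: number, h: number): void;
    strokeRect(x: number, y: number, w: number, h: number): void;
    fillText(text: string, x: number, y: number): void;
    strokeStyle: string;
    fillStyle: string;
    font: string;
  },
  width: number,
  height: number,
  detections: Detection[]
): void {
  ctx.clearRect(0, 0, width, height); // erase boxes from the previous frame
  ctx.strokeStyle = "#00FF00";
  ctx.fillStyle = "#00FF00";
  ctx.font = "16px sans-serif";
  for (const d of detections) {
    const [x, y, w, h] = d.bbox;
    ctx.strokeRect(x, y, w, h); // the bounding box itself
    // Label just above the box, or inside it when the box touches the top edge.
    ctx.fillText(`${d.class} ${Math.round(d.score * 100)}%`, x, y > 16 ? y - 4 : y + 16);
  }
}
```

The canvas is typically sized to match the video element so the box coordinates from COCO-SSD line up with the frame.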

OUTPUT

VOICE OUTPUT MODULE

FEATURES

- Announces objects as "DETECTED [Object Name]" using a clear voice.
- Works seamlessly across modern browsers using the built-in Web Speech API.
- Stops announcements when object detection is disabled.
PROGRESS

SPEAK OBJECT FUNCTION
- Added a function that uses the browser's Web Speech API to announce detected objects via text-to-speech.

TRACKING PREVIOUSLY ANNOUNCED OBJECTS
- Implemented a previousObjectsRef to keep track of objects already announced and prevent repetitive announcements.

INTEGRATION INTO DETECTION LOOP
- Incorporated TTS announcements into the object detection loop.

STOP ONGOING SPEECH
- Added cleanup functionality to halt any ongoing speech when detection is stopped.
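The steps above can be sketched as one unit. The names `speakObject` and `previousObjectsRef` mirror the slide text, but the bodies are an illustrative reconstruction under assumed behavior, not the project's actual source:

```typescript
// Mimics a React useRef holding the labels announced on the previous frame.
const previousObjectsRef: { current: Set<string> } = { current: new Set() };

// Pure part: which labels are new this frame and therefore need announcing?
function newLabels(current: string[], previous: Set<string>): string[] {
  return current.filter((label) => !previous.has(label));
}

// Speak one object via the Web Speech API; guarded so the logic above
// stays testable outside a browser.
function speakObject(label: string): void {
  if (typeof speechSynthesis === "undefined") return;
  speechSynthesis.speak(new SpeechSynthesisUtterance(`DETECTED ${label}`));
}

// Called once per detection-loop iteration with the current frame's labels.
function announceDetections(labels: string[]): void {
  for (const label of newLabels(labels, previousObjectsRef.current)) {
    speakObject(label);
  }
  previousObjectsRef.current = new Set(labels);
}

// Cleanup when detection stops: halt any ongoing speech and reset tracking.
function stopAnnouncements(): void {
  if (typeof speechSynthesis !== "undefined") speechSynthesis.cancel();
  previousObjectsRef.current = new Set();
}
```

Tracking a set of labels (rather than announcing every frame) is what prevents the assistant from repeating "DETECTED person" thirty times per second while the same person stays in view.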


When an object is detected, the app formats a string describing it:

CAMERA INPUT → COCO-SSD model (from TensorFlow.js) → DESCRIPTION → Web Speech API

SpeechSynthesis API: the SpeechSynthesis interface makes the browser speak the formatted text, converting the description into speech.
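A minimal sketch of that final step, assuming the standard browser SpeechSynthesis interface; the rate and pitch values are illustrative defaults, not settings taken from the project:

```typescript
// Hand the formatted description string to the browser's SpeechSynthesis API.
function speakDescription(text: string): void {
  if (typeof speechSynthesis === "undefined") return; // no-op outside a browser
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 1.0;  // normal speaking rate
  utterance.pitch = 1.0; // default pitch
  speechSynthesis.cancel(); // drop any queued description before speaking
  speechSynthesis.speak(utterance);
}
```

Calling `cancel()` before `speak()` keeps the audio current: without it, descriptions queue up and the user hears stale announcements lagging behind the live video.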
Conclusion

- The mobile application empowers visually impaired individuals by enhancing their independence and awareness.
- Real-time audio feedback enables users to interact confidently with their surroundings.
- The integration of smartphone cameras and on-device machine learning ensures high accuracy and responsiveness.
- Features like object recognition, face detection, and scene interpretation cater to diverse user needs.
- The application sets a benchmark for accessibility-focused innovations, improving quality of life for its users.
References

1. Improved Disability Assistant Android Mobile Application, by Mulla Amina Mustaq, Sapkal Kinjal Baliram, Chaudhary Chinmaya Pravin, Prof. K.S Charumathi, April 2023.
2. Intelligent Voice Controlled Wheel Chair for Disabled People, by M. Joly, Arun Pradeep, Kavitha S, 25-02-2023.
3. Voice Control Intelligent Wheelchair Movement Using CNNs, by Mohammad Shahrul Izham Sharifuddin, Sharifalillah Nordin, Azliza Mohd Ali, March 2021.
4. Integrated Speaker and Speech Recognition for Wheel Chair Movement using Artificial Intelligence, by Gurpreet Kaur, I.K. Gujral Punjab Technical University, Kapurthala-144603, India, November 2017.
5. Text to Voice Conversion for Visually Impaired Person by using Camera, by Sumit Chafale, Priyanka Dighore, Dipika Panditpawar, Khushal Bhagawatkar, Shrikant Sakhare, March 2021.
THANK YOU