Vision AI: An Eye for the Blind
I. ABSTRACT

This research introduces an AI-powered assistive system designed to enhance autonomy and safety for visually impaired individuals. The proposed Vision AI solution integrates real-time object recognition, text extraction, spatial awareness, and auditory feedback to assist users in navigating their environments independently. Leveraging deep learning models, OpenCV, OCR, and low-cost embedded hardware such as Raspberry Pi, the system captures visual data, identifies relevant information, and conveys it through audio output. An SOS alert function is incorporated to ensure user safety during emergencies. Designed for affordability, portability, and user-friendliness, this modular system represents a significant step toward inclusive assistive technology, fostering independence and accessibility for the visually impaired.

In addition to core navigation features, the system is capable of recognizing street signs, reading printed documents aloud, detecting obstacles at various heights, and guiding users through dynamic indoor and outdoor settings. Customizable voice feedback and multi-language support enhance user interaction, making the device adaptable to diverse populations. The architecture supports future integration with GPS for route guidance, voice command input for hands-free control, and wireless connectivity for remote updates and caregiver alerts. Field tests indicate improved confidence and mobility among users, demonstrating the system's potential to bridge critical accessibility gaps. Continued development aims to refine recognition accuracy, minimize latency, and explore wearable form factors such as smart glasses or belts, positioning this innovation as a cornerstone in the evolution of smart assistive devices.

II. INTRODUCTION

Visually impaired individuals encounter significant barriers when navigating independently in complex and dynamic environments. Traditional aids such as white canes and guide dogs, while essential, offer limited contextual awareness and often fall short in unfamiliar or crowded areas. These methods primarily provide tactile or physical guidance, lacking the ability to interpret surrounding information such as signs, moving obstacles, or written content. As a result, users often face challenges in unfamiliar settings like urban intersections, transportation hubs, and public buildings.

Recent advancements in Artificial Intelligence (AI), particularly in computer vision and deep learning, have opened new possibilities in developing intelligent assistive systems. These technologies enable machines to understand and interpret visual data in real time, allowing for enhanced sensory substitution and context-aware feedback mechanisms. The emergence of low-cost embedded systems further facilitates the deployment of such solutions in portable, everyday devices.

Vision AI aims to bridge the gap between sensory perception and contextual understanding by providing real-time visual interpretation and audio feedback. By utilizing AI technologies such as object detection, optical character recognition (OCR), and distance measurement via ultrasonic sensors, users can receive timely, relevant information about their surroundings. This system addresses challenges such as obstacle avoidance, sign reading, and emergency communication.

This project, titled "Vision AI: An Eye for the Blind," proposes an affordable and compact device using a Raspberry Pi, camera module, ultrasonic sensor, and earphones to facilitate real-time scene analysis and voice-guided navigation. The device leverages deep learning models for accurate object classification and text extraction, while ultrasonic sensors provide depth cues for detecting nearby obstacles. The processed information is converted into intuitive audio instructions, allowing users to make informed decisions with minimal cognitive load.

In addition to its core functionalities, the system features a modular and scalable architecture, enabling future enhancements such as GPS-based route guidance, facial recognition for social interaction, and wireless connectivity for remote monitoring or updates. Designed with affordability, usability, and adaptability in mind, this solution empowers visually impaired individuals with a greater sense of independence and confidence.

Ultimately, Vision AI represents a significant stride toward inclusive technology, where AI is not only innovative but also transformative for those who need it most.
III. RELATED WORK

Previous research has explored a range of assistive technologies for the visually impaired. Traditional systems include ultrasonic canes and wearable navigation aids. These devices offer basic support but often lack contextual analysis and flexibility in diverse environments. While useful for detecting nearby physical obstacles, they provide minimal information about object identity, signage, or text, limiting the user's ability to fully understand and interact with their surroundings.

Recent AI-powered systems have utilized YOLO-based object detection, OCR tools like Tesseract, and voice feedback mechanisms. For instance, smart glasses integrated with AI and Raspberry Pi platforms have enabled real-time object and text recognition, translating visual data into audio cues for the user. Other projects have included smartphone-based applications that provide live feedback using cloud AI services such as Google Cloud Vision or Microsoft Azure. These solutions demonstrate the potential of AI in enhancing spatial awareness and information access for visually impaired individuals.

In addition, research has explored the fusion of computer vision with wearable hardware to create head-mounted or belt-mounted devices that aid in obstacle detection and navigation. Efforts have also been made to incorporate facial recognition and scene description functionalities to improve user interaction and social engagement. However, despite these advancements, several challenges persist. Processing latency and power consumption remain critical issues, especially for real-time applications. Many systems perform poorly in low-light or high-glare conditions, which significantly impacts usability. High hardware costs and complex user interfaces also act as barriers to widespread adoption.

Furthermore, the dependency on cloud services in some systems raises concerns about data privacy, latency, and reliability, especially in offline or low-connectivity environments. These limitations restrict the practicality of such systems in rural or remote areas, where access to stable internet may not be guaranteed.

Our proposed system addresses these issues through a compact design, offline processing, low-latency performance, and integration of an emergency alert mechanism. By leveraging efficient deep learning models optimized for edge devices, it ensures consistent performance even without internet connectivity. The incorporation of a local SOS feature enhances user safety, while the modular architecture allows for easy customization and future upgrades. This approach ensures greater reliability, affordability, and accessibility, making the technology suitable for a wide range of users and environments.

IV. METHODOLOGY

The system design follows a modular approach with the following components (minimal code sketches illustrating the main components appear after this list):

• Camera Module: Captures a live video feed of the user's surroundings. It forms the foundation of the system's visual input and is essential for object and text detection.

• Object Detection: Utilizes YOLOv5/YOLOv8 with pre-trained weights on the COCO dataset to detect and identify common objects in real time. The YOLO model is optimized for speed and accuracy, ensuring smooth performance on the Raspberry Pi.

• Text Recognition: Implements Tesseract OCR to extract printed text from captured frames, such as signs, labels, and notices. This enables visually impaired users to interpret written information in their environment.

• Ultrasonic Sensor: Measures the distance to nearby obstacles and enhances spatial awareness. The HC-SR04 sensor provides non-contact distance measurement to prevent collisions.

• Audio Output: Converts detected information into speech using a text-to-speech (TTS) engine and relays it through earphones. Pyttsx3, a lightweight offline TTS library, ensures instant audio feedback.

• SOS Alert: A button trigger sends a message to predefined contacts via a Telegram bot for emergency response. This feature enhances safety and enables immediate help in critical situations.
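As a concrete illustration of how the camera, detection, and audio components fit together, the following minimal Python sketch runs a COCO-pretrained YOLO model on camera frames and announces newly detected objects through pyttsx3. It is an illustrative sketch rather than the project's exact code: the ultralytics package, the yolov8n.pt weights file, the camera index 0, and the 0.5 confidence threshold are all assumptions.

import cv2
import pyttsx3
from ultralytics import YOLO

model = YOLO("yolov8n.pt")       # assumed: small COCO-pretrained model for edge use
engine = pyttsx3.init()          # offline text-to-speech engine
camera = cv2.VideoCapture(0)     # assumed: USB camera at index 0

last_spoken = set()
while True:
    ok, frame = camera.read()
    if not ok:
        break                                        # camera disconnected or stream ended
    results = model(frame, conf=0.5, verbose=False)  # detect objects in the frame
    labels = {model.names[int(box.cls)] for box in results[0].boxes}
    new_objects = labels - last_spoken               # announce only newly seen objects
    if new_objects:
        engine.say("Ahead: " + ", ".join(sorted(new_objects)))
        engine.runAndWait()                          # blocks briefly while speaking
    last_spoken = labels

camera.release()

On a Raspberry Pi, a common latency-reduction tactic is to run the detector on every nth frame rather than on all of them.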
Processing is handled on a Raspberry Pi running Python, OpenCV, and relevant AI libraries. Real-time performance is optimized through lightweight models and edge computing, minimizing delay and ensuring independence from cloud resources.
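The Text Recognition component described above can be sketched in a similar way: a captured frame is converted to grayscale, binarised with OpenCV to boost contrast, and passed to Tesseract through pytesseract, with the recognised text read aloud. This is an illustrative sketch, not the authors' exact pipeline; the file sign.jpg stands in for a live camera frame, and Otsu thresholding is one plausible preprocessing choice.

import cv2
import pytesseract
import pyttsx3

frame = cv2.imread("sign.jpg")   # placeholder for a frame grabbed from the camera
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Otsu binarisation increases contrast for printed text on signs and labels
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

text = pytesseract.image_to_string(binary).strip()
if text:
    engine = pyttsx3.init()
    engine.say(text)             # read the recognised text aloud
    engine.runAndWait()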
V. SYSTEM ARCHITECTURE
Figure 1: System Architecture (diagram to be inserted)

Hardware:
• Raspberry Pi 4
• USB Camera
• HC-SR04 Ultrasonic Sensor
• Earphones
• Power Bank

Software:
• Python
• OpenCV
• Tesseract OCR
• Telegram API
• Pyttsx3 (TTS)
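The HC-SR04 sensor listed above is read directly through the Raspberry Pi's GPIO pins: a 10 µs pulse on the trigger pin starts a measurement, and the time the echo pin stays high is proportional to the distance. The sketch below is a minimal, assumed implementation; the BCM pin numbers (23/24) and the 1-meter warning threshold are illustrative choices, not values from the paper.

import time
import RPi.GPIO as GPIO

TRIG_PIN, ECHO_PIN = 23, 24      # assumed BCM pin numbers; match actual wiring

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG_PIN, GPIO.OUT)
GPIO.setup(ECHO_PIN, GPIO.IN)

def distance_cm():
    GPIO.output(TRIG_PIN, True)
    time.sleep(0.00001)          # 10 microsecond trigger pulse
    GPIO.output(TRIG_PIN, False)
    start = end = time.time()
    while GPIO.input(ECHO_PIN) == 0:   # wait for the echo pulse to begin
        start = time.time()
    while GPIO.input(ECHO_PIN) == 1:   # time how long the echo pin stays high
        end = time.time()
    # sound travels ~34300 cm/s; halve for the round trip
    return (end - start) * 17150

try:
    while True:
        d = distance_cm()
        if d < 100:              # warn when an obstacle is within 1 meter
            print(f"Obstacle at {d:.0f} cm")
        time.sleep(0.2)
finally:
    GPIO.cleanup()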
VI. RESULTS AND DISCUSSION

The system has been evaluated under various lighting and environmental conditions to ensure robust performance across real-world scenarios. Key observations from the testing phase include:

• Object detection achieves over 85% accuracy in well-lit environments when utilizing YOLO (You Only Look Once) models. The detection rate remains consistent for common obstacles such as poles, benches, and moving entities like humans and vehicles.

• OCR (Optical Character Recognition) effectively reads high-contrast printed text, particularly when the text is in standard fonts and sizes. This proves useful for identifying signage and labels in structured environments.

• The ultrasonic sensor demonstrates reliable performance by accurately detecting obstacles within a 4-meter range, facilitating timely audio alerts to the user. The real-time response ensures that users can navigate their surroundings with enhanced safety.

• Audio feedback is delivered with negligible latency (less than 1 second), providing a seamless and intuitive interaction experience. The system communicates directional instructions and alerts without causing confusion or delay.

• SOS alert functionality via the Telegram bot is successfully triggered within 5 seconds of the button press (a sketch of this trigger follows this list). This rapid response time is critical in emergency scenarios, enabling immediate communication with predefined contacts.
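To make the SOS path concrete, the sketch below wires a push button to the standard Telegram Bot API sendMessage endpoint, which is what a Telegram bot ultimately calls. It is an assumed implementation rather than the authors' code: the bot token, chat ID, button pin, and debounce interval are placeholders.

import time
import requests
import RPi.GPIO as GPIO

BOT_TOKEN = "<bot-token>"        # placeholder: token issued by Telegram's BotFather
CHAT_ID = "<chat-id>"            # placeholder: predefined emergency contact chat
BUTTON_PIN = 17                  # assumed BCM pin for the SOS push button

GPIO.setmode(GPIO.BCM)
GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)

def send_sos(channel):
    # Telegram Bot API: POST /bot<token>/sendMessage with chat_id and text
    requests.post(
        f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
        data={"chat_id": CHAT_ID, "text": "SOS: the user needs assistance."},
        timeout=10,
    )

# trigger on the falling edge (button pulls the pin low); debounce 2 s
GPIO.add_event_detect(BUTTON_PIN, GPIO.FALLING, callback=send_sos, bouncetime=2000)

try:
    while True:
        time.sleep(1)            # keep the process alive waiting for presses
finally:
    GPIO.cleanup()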
Despite these promising results, the system does have some limitations. Accuracy in object detection declines in dimly lit or low-contrast environments, which can impact performance during nighttime or indoor usage with inadequate lighting. OCR reliability decreases when confronted with stylized, cursive, or handwritten text, reducing the system's effectiveness in certain contexts. Additionally, prolonged usage leads to increased power consumption, potentially limiting operational duration without regular charging.

These limitations highlight the need for further refinement. Future iterations of the system aim to incorporate improved low-light camera sensors, adaptive thresholding techniques for OCR, and energy-efficient processing modules. Additionally, integrating AI model fine-tuning with diverse datasets can enhance the robustness of detection and recognition tasks across varying conditions. Overall, the system demonstrates a strong foundation for assistive technology, combining multiple sensing and communication modules to support user safety and independence.

VII. COMPARATIVE STUDY
Table 1: Comparative Analysis of Related Works

Title | Authors | Datasets | Technologies Used | Performance / Accuracy
The Use of AI in Education of People Visually Impaired | Aikaterini Tsouktakou et al. | Not specified | AI Assistive Tech | Highlights benefits; mentions challenges like cost
Smart Glasses for Blind People | Hawra Al Said et al. | Not specified | AI, OCR, Navigation | Limited to English, bulky design
Accessibility Datasets | Rie Kamikubo et al. | VizWiz, custom | Data Ethics, AI | Privacy and data sharing concerns
WaveNet: Raw Audio Model | Aaron van den Oord et al. | VCTK, LJ Speech | DNN, TTS | Long-range dependency handling
AI Navigation for Blind | Vikram Shirol et al. | Not specified | Raspberry Pi, IR, Camera | Overheating, sensor accuracy
Blind Assist System Using AI | Nagaswathi S et al. | Not specified | AI, Image Processing, Ultrasonic Sensors | Accuracy in dynamic lighting
AI Support with Reading Assistant | Rijwan Khan et al. | Not specified | ETA Prototype, TTS | Sensor accuracy and lighting challenges
VizWiz: Real-time Answers | Jeffrey P. Bigham et al. | VizWiz Dataset | Mobile App | Works in low light but struggles indoors
Smart Glasses with Raspberry Pi | Rotimi Abayomi | Not specified | Stereo Vision, Raspberry Pi | Limited camera slots
."