STEREOPILOT: A WEARABLE TARGET LOCATION SYSTEM
A SEMINAR REPORT
submitted by
HARIPRIYA K S
(KME20CS029)
to
the APJ Abdul Kalam Technological University
in partial fulfillment of the requirements for the award of the Degree
of
Bachelor of Technology
in
Computer Science & Engineering
Place : ..........................
Date : ..........................
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
KMEA Engineering College Edathala, Aluva
683 561
CERTIFICATE
Head of Department
Name : Dr. Rekha Lakshmanan
Signature : .......................
ACKNOWLEDGMENT
First and foremost, I would like to express my thanks to the Almighty for the
divine grace bestowed on me to complete this seminar successfully on time.
I would like to thank our respected Principal Dr. Amar Nishad T. M, the leading
light of our institution and Dr. Rekha Lakshmanan, Vice Principal and Head of
Department of Computer Science and Engineering for her suggestions and support
throughout my seminar. I also take this opportunity to express my profound grat-
itude and deep regards to my Seminar Coordinator Ms. Vidya Hari, for all her
effort, time and patience in helping me to complete the seminar successfully with
all her suggestions and ideas. I also extend a big thanks to my Seminar Guide Ms. Abeera
V P of the Department of Computer Science & Engineering for guiding me to the
successful completion of the seminar. I also express my gratitude to all teachers for
their cooperation. I gratefully thank the lab staff of the Department of Computer
Science and Engineering for their kind cooperation. Once again I convey my grati-
tude to all those who had direct or indirect influence on my seminar.
HARIPRIYA K S
B. Tech. (Computer Science & Engineering)
Department of Computer Science & Engineering
KMEA Engineering College Edathala, Aluva
ABSTRACT
StereoPilot is a wearable target location system designed for blind and visually
impaired individuals. It uses a head-mounted RGB-D camera to capture 3D spatial
information about the user's surroundings and translates that information, through
spatial audio rendering, into a non-visual auditory format. By leveraging auditory
feedback, StereoPilot empowers individuals with
visual impairments to perceive and interpret their surroundings with enhanced ac-
curacy and efficiency, thereby facilitating independent navigation and interaction
within various environments.
The paper extensively discusses the technical underpinnings of StereoPilot,
emphasizing its utilization of spatial audio rendering and computer vision-based
spatial perception. It delves into the comparative analysis of different feedback
methods employed by the system and evaluates their impact on information trans-
fer rate, positioning accuracy, and overall usability. Through rigorous experimenta-
tion, the paper substantiates the system’s efficacy in improving information transfer
efficiency when compared to alternative feedback methods, thereby highlighting
its potential to serve as a valuable tool for individuals with visual impairments in
spatial tasks and navigation.
CONTENTS
ACKNOWLEDGMENT i
ABSTRACT ii
LIST OF FIGURES iv
ABBREVIATIONS vi
Chapter 1. INTRODUCTION 1
Chapter 2. LITERATURE SURVEY 3
Chapter 3. METHODOLOGY 15
3.1 System Framework . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Feasibility and Positioning Accuracy Testing . . . . . . . . 18
3.3 Comparison of Feedback Methods . . . . . . . . . . . . . . 19
3.4 Desktop Manipulation Experiment . . . . . . . . . . . . . . 21
3.5 ITR Evaluation Experiment . . . . . . . . . . . . . . . . . . 24
Chapter 4. ADVANTAGES 26
Chapter 5. CHALLENGES 27
Chapter 7. CONCLUSION 31
LIST OF FIGURES
ABBREVIATIONS
Abbreviation Expansion
GANs Generative Adversarial Networks
CNN Convolutional Neural Network
MSE Mean-Square Error
PSNR Peak Signal-to-Noise Ratio
SAPLC Spatial Aggregation of Pixel-level Local Classifiers
FCN Fully Convolutional Network
Chapter 1
INTRODUCTION
Chapter 2
LITERATURE SURVEY
Shabnam Mohamed Aslam et al. (2020): This paper describes a research ini-
tiative aimed at addressing the challenges visually impaired individuals face when
interacting with touch screen devices. Despite various innovative interaction meth-
ods like virtual keyboards, three-dimensional gestures, and RFID sensing, visually
impaired individuals encounter navigation difficulties on touch screens. The pri-
mary objective here is to develop a Braille-based interface for touch screen smart-
phones to facilitate easier access for visually impaired users. This initiative utilizes
Braille codes as the foundation for communication, enabling individuals with visual
impairments to comfortably interact with touch screens. The process involves opti-
mizing hand finger motions as input parameters, such as coordinate values on x and
y axes, swipe speed and distance, pixel rate, and X and Y axis speeds. To enhance
system performance, the researchers employ a technique that involves varying hid-
den layers and neurons using the Crow Search Algorithm (CSO) in Artificial Neu-
ral Networks (ANN). This approach aims to determine the Optimal Hidden Layer
and Neuron (OHLN) configuration for accurately predicting the intended gestures,
providing a solution for visually impaired individuals to effectively communicate
through hand signals with others. The proposed model is anticipated to offer high
precision and optimal performance metrics compared to existing models. It ad-
dresses the limitations faced by visually impaired individuals in using information
devices such as keyboards, smartphones, and other tech gadgets. The advancement
in information technology has notably improved Braille reading and writing, mak-
ing it more accessible for visually impaired individuals to interpret various materials
like bank statements, transport tickets, maps, and music notes. Mobile phones play
a pivotal role not only in the lives of the general population but also in the lives
of differently-abled individuals. However, communication remains a challenge for
the visually impaired. Consequently, this research aims to bridge this gap by estab-
lishing an interaction framework between visually impaired individuals and mobile
devices, freeing users from the challenges of accessing smartphones for various ac-
tivities. The research focuses on developing a Braille-based system for touch screen
mobiles, predicting different gestures through instructions. The primary contribu-
tion lies in optimizing the interaction framework using an Artificial Neural Network
(ANN) by adjusting hidden layers and neurons. The subsequent sections of the pa-
per outline a literature review, methodology, simulation results, and conclude by
discussing future scopes of this innovation. Overall, the research aims to signifi-
cantly enhance accessibility and communication capabilities for visually impaired
individuals using touch screen technology.
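To make the idea of tuning hidden layers and neurons concrete, the short Python sketch below scores a few candidate configurations of a feed-forward network and keeps the best one. It is only an illustrative stand-in: it uses scikit-learn's MLPClassifier with a plain exhaustive search rather than the Crow Search optimization used in the paper, and the gesture features (x/y coordinates, swipe speed and distance) are assumed to be already extracted into a numeric matrix.

# Minimal sketch: choosing an "optimal hidden layer and neuron" (OHLN) setup
# by exhaustive search. The actual paper uses Crow Search optimization; this
# stand-in only illustrates scoring candidate ANN configurations.
from itertools import product

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def find_ohln(X, y, layer_counts=(1, 2, 3), neuron_counts=(8, 16, 32)):
    """Return the (layers, neurons) pair with the best cross-validated accuracy."""
    best_cfg, best_score = None, -np.inf
    for n_layers, n_neurons in product(layer_counts, neuron_counts):
        model = MLPClassifier(hidden_layer_sizes=(n_neurons,) * n_layers,
                              max_iter=500, random_state=0)
        score = cross_val_score(model, X, y, cv=3).mean()
        if score > best_score:
            best_cfg, best_score = (n_layers, n_neurons), score
    return best_cfg, best_score

# Hypothetical gesture features: [x, y, swipe speed, swipe distance] per sample.
X = np.random.rand(120, 4)
y = np.random.randint(0, 6, size=120)   # six assumed Braille gesture classes
print(find_ohln(X, y))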
Mandhatya Singh et al. (2022): This paper delineates a comprehensive re-
search endeavor addressing the longstanding issue faced by blind and visually im-
paired individuals (BVIP) in recognizing Indian paper currency denominations. In
countries like India, where currency notes exhibit minimal size variations and lack
distinct tactile attributes, visually impaired individuals encounter challenges in dis-
tinguishing between different denominations. To mitigate this issue, this paper in-
troduces an innovative framework named IPCRNet—a lightweight neural network
designed for low/medium-level smartphones. IPCRNet employs Dense connection,
Multi-Dilation, and Depth-wise separable convolution layers to enhance recogni-
tion accuracy, aiming to assist BVIP in identifying Indian currency notes accurately.
The research team curated an extensive dataset, IPCD, comprising over 50,000 im-
ages representing various real-life scenarios of Indian paper currency. Addition-
ally, they developed an Android application called ’Roshni-Currency recognizer’
tailored specifically for BVIP, providing voice-based guidance and denomination
information, enabling hassle-free currency recognition. Recognizing the limita-
tions of existing models in resource-constrained environments, the research focuses
on IPCRNet’s lightweight design—less than four million parameters—making it
highly deployable on mobile devices. This model integrates MobileNet as the front-
end and employs a Contextual Block in the backend to optimize computations while
maintaining accuracy. The innovative multi-dilation scheme expands the network’s
receptive field without inflating the parameters, effectively integrating global and
semantic features for improved accuracy. To facilitate effective training and evalu-
ation of IPCRNet, the researchers conducted comprehensive quantitative and qual-
itative analyses using multiple publicly available datasets. Furthermore, they em-
phasized the importance of their BVIP-friendly android app, ’Roshni,’ which offers
a user-friendly interface and aids in real-time recognition of currency denomina-
tions. The distinctive contributions of this research lie in its novel lightweight CNN
model, the vast and diverse dataset of Indian currency images, thorough quanti-
tative and qualitative analyses, and the publicly available BVIP-oriented android
application, ’Roshni.’ These elements collectively form the proposed end-to-end
Indian paper currency recognition framework (IPCRF), offering a solution to ad-
dress the challenges faced by BVIP in recognizing currency notes. The paper’s
structure includes sections detailing the literature review on currency recognition,
the creation and characteristics of IPCD, the architecture and implementation de-
tails of IPCRNet, experimental setups and results, the development of the ’Roshni’
android application, discussions on accuracy and reliability, and concludes by out-
lining future research directions. Overall, the research provides a comprehensive
solution that amalgamates advanced technology with user-friendly applications to
aid visually impaired individuals in recognizing Indian paper currency.
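The following Python (PyTorch) sketch illustrates two of the ingredients mentioned above, depth-wise separable convolutions and a multi-dilation scheme, in a single toy block. It is not the actual IPCRNet; the channel sizes and dilation rates are arbitrary choices made only for the example.

# Toy block combining depth-wise separable convolutions with parallel dilation
# rates to widen the receptive field at low parameter cost. Not IPCRNet itself.
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return torch.relu(self.bn(self.pointwise(self.depthwise(x))))

class MultiDilationBlock(nn.Module):
    """Run the input through several dilation rates and fuse the results."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            DepthwiseSeparable(in_ch, out_ch, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

block = MultiDilationBlock(32, 64)
print(block(torch.randn(1, 32, 56, 56)).shape)   # -> torch.Size([1, 64, 56, 56])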
Salvador Martinez-Cruz et al.: The navigation challenges faced by visually
impaired and blind people (VIBP) in locating public transport and bus stops due to
their vision limitations have prompted the development of various assistance sys-
tems over the past decade. However, most existing solutions rely on the global
positioning system (GPS), which encounters issues with satellite coverage, partic-
ularly in indoor environments. Moreover, some prototypes designed to aid VIBP
in navigation tend to be cumbersome for the user, affecting their mobility and in-
dependence. Addressing these challenges, a novel assistance system for VIBP uti-
lizing Bluetooth Low Energy (BLE) technology has been introduced in this paper
to facilitate the use of public transportation. This innovative system integrates BLE
beacons installed on buses and bus stops, coupled with a mobile application for
seamless user interaction. The BLE beacons serve as location markers, tracked in
real-time by the mobile app, which subsequently provides pertinent information
to users through verbal instructions. Crucially, this includes details such as trans-
portation line, destination, next stop name, and current location, empowering users
to proactively select the desired bus in advance and alight at the correct destina-
tion stop. The effectiveness of this system has been rigorously tested in controlled
settings and real-world environments, demonstrating an impressive 97.6% success
rate for VIBP traveling independently between points. Participants reported en-
hanced confidence and independence compared to GPS-based systems, citing sev-
eral key advantages. Firstly, the system operates seamlessly with or without an
internet connection, addressing a critical limitation of GPS-based solutions. Sec-
ondly, it offers real-time information without the encumbrance of wearable devices,
alleviating concerns about impeding natural movements. Notably, the BLE-based
system does not encounter satellite coverage issues indoors, a significant advantage
over GPS systems, ensuring reliable functionality regardless of the environment. In
the broader context of public transportation management systems (PTMS), which
commonly provide data on arrival/departure times through digital screens at bus
stops—information inaccessible to VIBP—the introduction of this BLE-based sys-
tem represents a significant step towards inclusivity. By leveraging technology that
bypasses the limitations of GPS and addresses indoor coverage challenges, this
innovative system empowers visually impaired individuals to navigate public trans-
port confidently and independently. The positive feedback from participants under-
scores the system’s efficacy in enhancing user experience, bolstering their sense of
security and comfort. Ultimately, this BLE-based assistance system not only fills
a critical gap in accessibility for VIBP but also sets a benchmark for inclusive and
user-friendly solutions in the realm of public transportation navigation.
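The decision logic behind such a system can be pictured with the small, library-free Python sketch below: given the beacon identifiers and signal strengths currently visible to the phone, pick the strongest known beacon and compose the verbal message. The beacon IDs, routes, and RSSI values are invented for the example, and the actual BLE scanning and text-to-speech layers of the published system are not shown.

# Illustrative-only logic for a BLE-beacon transit helper: map the strongest
# visible beacon to its stop/bus record and build a spoken announcement.
# Beacon IDs, routes, and RSSI readings below are made up for the example.
BEACON_DB = {
    "beacon-17": {"kind": "bus",  "line": "Route 12", "destination": "Central Station"},
    "beacon-42": {"kind": "stop", "name": "Market Street", "next_stop": "City Hall"},
}

def announce(visible_beacons):
    """visible_beacons maps beacon id -> RSSI in dBm (closer to 0 = nearer)."""
    known = {bid: rssi for bid, rssi in visible_beacons.items() if bid in BEACON_DB}
    if not known:
        return "No transit beacons nearby."
    nearest = max(known, key=known.get)          # strongest signal wins
    info = BEACON_DB[nearest]
    if info["kind"] == "bus":
        return f"Bus on {info['line']} toward {info['destination']} is arriving."
    return f"You are at {info['name']}. The next stop is {info['next_stop']}."

print(announce({"beacon-42": -61, "beacon-17": -78, "unknown": -50}))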
Wafa M. et al. (2018): The paper presents a comprehensive overview of the
challenges faced by visually impaired (VI) individuals and the limitations of exist-
ing systems designed to aid their mobility. It introduces an intelligent framework
aimed at significantly improving the quality of life for the VI population by offering
a novel solution that integrates sensor-based and computer vision-based technolo-
gies. The objective is to create a cost-effective and accurate system that enhances
navigation and obstacle avoidance for VI individuals, particularly considering the
high prevalence of VI individuals in developing countries. The statistics highlighted
from the World Health Organization (WHO) underscore the magnitude of visual
impairment globally, emphasizing the urgency to address this issue. The challenges
faced by VI individuals in navigating their surroundings, detecting obstacles (both
static and dynamic), and ensuring safe mobility are discussed. Traditional aids
like white canes and guide dogs are acknowledged but deemed limited in provid-
ing comprehensive real-time information about the environment, especially con-
cerning head-level barriers, and their availability and affordability pose additional
challenges. The limitations of existing electronic devices aimed at aiding VI indi-
viduals, such as ultrasonic obstacle detection glasses, laser canes, and smartphone
applications, are outlined. These systems are noted for their high cost and restricted
functionalities, often falling short in providing a complete solution for VI indi-
viduals, particularly those from low-income backgrounds. The paper proposes an
innovative framework that integrates computer vision technology and sensor-based
solutions to address these limitations. It emphasizes the novel approach of using
image depth for proximity measurement, enhancing the system’s ability to detect
and avoid obstacles while providing real-time navigational guidance. The integra-
tion of multiple sensor data through a data fusion algorithm aims to improve the
system’s accuracy and performance. Real-time scenario testing has demonstrated
the system’s effectiveness, achieving high accuracy rates in obstacle detection and
avoidance while providing auditory warnings to users. This system is intended to
assist VI individuals in their daily navigation, offering more comprehensive support
than traditional aids and existing electronic devices. Overall, the paper empha-
sizes the need for an efficient and inclusive navigation assistant for VI individuals,
discussing the limitations of current solutions and proposing a novel framework that
integrates sensor technologies and computer vision to provide enhanced real-time
assistance and navigation for the visually impaired.
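A minimal Python sketch of the depth-based proximity idea is shown below: inspect the central region of a depth map (in metres) and issue a warning when anything comes closer than a threshold. The threshold and region size are arbitrary assumptions, and the multi-sensor data fusion described in the paper is not reproduced.

# Minimal sketch of depth-based obstacle warning: look at the central region of
# a depth map (metres) and warn if anything is closer than a threshold.
from typing import Optional

import numpy as np

def obstacle_warning(depth_m: np.ndarray, threshold_m: float = 1.0) -> Optional[str]:
    h, w = depth_m.shape
    center = depth_m[h // 3: 2 * h // 3, w // 3: 2 * w // 3]   # middle of the view
    valid = center[center > 0]                                  # drop missing pixels
    if valid.size and valid.min() < threshold_m:
        return f"Obstacle ahead at about {valid.min():.1f} metres."
    return None

depth = np.full((240, 320), 3.0)
depth[100:140, 150:200] = 0.8          # a simulated close object
print(obstacle_warning(depth))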
Sreenu Ponnada et al. (2018): This paper outlines a comprehensive prototype
designed to aid visually impaired individuals in recognizing and navigating obsta-
cles like staircases and manholes. Understanding an object is essential for individu-
als to categorize it correctly. However, this becomes challenging for blind individu-
als. Therefore, this prototype utilizes a combination of feature vector identification
and sensor-computed Arduino chips to empower visually impaired individuals with
more independence while traversing roads. The primary objective of this prototype
is to enhance the autonomy of the visually impaired by helping them recognize and
navigate obstacles through a lightweight stick integrated with technology. To de-
tect manholes and staircases, the chip embedded in the stick is programmed using
specific algorithms. For manhole detection, a code is embedded in the stick’s chip,
utilizing a bivariate Gaussian mixture model. Meanwhile, for staircase detection,
the system employs the speeded up robust features (SURF) algorithm for feature
extraction. Navigation in unfamiliar surroundings poses a significant challenge for
visually impaired individuals due to their visual impairment. In India alone, about
1.5 million people face these challenges, and globally, around 170 million indi-
viduals are visually impaired, with this number increasing by approximately 10%
annually. Staircases are a major concern in navigation for the visually impaired.
Various sensors, such as monocular and stereo cameras, depth sensors, and laser
scanning devices like LiDAR, have been used to detect staircases. Image-based
methods often identify staircases by recognizing non-ground plane regions and the
concurrent line patterns resembling staircases within those regions. Moreover, the
detection of open manholes, a critical risk in the Indian context, has been addressed
using ultrasonic sensors. Several systems, such as the Smart Cane and the Ultra-
Cane/Batcane, have relied on a white cane integrated with a single sonar sensor to
detect above-knee obstacles. This paper proposes a hybrid approach utilizing both
sensor and image-based algorithms to detect and classify upward and downward
staircases, employing an array of sonar sensors mounted on a white cane managed
by an Arduino processor. The system also utilizes median-based thresholds for pre-
cise manhole identification and vibro-feedback on the cane to alert the user about
obstacles. Importantly, this entire processing occurs on a smartphone without the
need for heavy computation devices or high-speed internet connectivity for cloud
computation, making it lightweight and cost-effective. The subsequent sections of
the paper provide an overview of ultrasonic sensors, vibrator mechanisms, and Ar-
duino processors. They elaborate on the methodology for identifying manholes and
staircases, feature extraction methods, experimental results, and a summary along
with suggestions for future directions.
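The bivariate Gaussian-mixture step used for manhole detection can be illustrated with the Python sketch below: fit a two-component mixture on two-dimensional feature vectors from normal road readings and flag readings whose likelihood falls below a threshold. The features and the threshold are synthetic; the paper's actual feature extraction and model parameters differ.

# Illustration of the bivariate Gaussian-mixture idea: model "normal road"
# readings and flag low-likelihood readings as possible manholes. Features and
# the threshold are synthetic stand-ins.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
road_features = rng.normal(loc=[0.5, 1.2], scale=0.1, size=(300, 2))  # normal readings

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(road_features)
threshold = np.percentile(gmm.score_samples(road_features), 1)   # 1st percentile

def looks_like_manhole(reading: np.ndarray) -> bool:
    """Flag a 2-D reading whose log-likelihood under the road model is too low."""
    return gmm.score_samples(reading.reshape(1, -1))[0] < threshold

print(looks_like_manhole(np.array([0.5, 1.2])))   # typical road -> False
print(looks_like_manhole(np.array([2.0, 0.1])))   # outlier      -> True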
Yunjia Lei et al. (2022): This paper focuses on a critical
aspect of assistive navigation for visually impaired individuals—pedestrian lane
detection. This task is crucial for helping visually impaired people navigate safely
through environments by providing information about walkable areas, aiding in
staying within pedestrian lanes, and assisting in obstacle detection. However, de-
spite its significance, there has been limited attention given to pedestrian lane de-
tection in unstructured scenes within the research community. The goal of this
paper is to address this gap by conducting a comprehensive review and experimen-
tal evaluation of methods applicable to pedestrian lane detection, intending to pave
the way for future research in this area. The World Health Organization (WHO)
reports that there are approximately 253 million visually impaired individuals glob-
ally, with 217 million experiencing moderate to severe impairments and 36 million
being blind. Visual impairment significantly reduces mobility and increases the risk
of accidents like falls or collisions, making navigation in unfamiliar environments
extremely challenging for the visually impaired. Presently, traditional walking aids
like white canes or guide dogs assist visually impaired individuals, but they have
limitations. White canes have short detection ranges, while guide dogs require train-
ing and are effective primarily in familiar environments. Hence, there’s a growing
need to develop advanced assistive navigation systems. Pedestrian lane detection
plays a crucial role in these systems as it allows visually impaired users to navigate
within lanes, aiding their balance and mobility. An accurate, reliable, and real-time
pedestrian lane detection algorithm can immensely enhance the safety and mobility
of visually impaired individuals. Despite the significance of pedestrian lane detec-
tion in assistive navigation, research in this domain has been lacking. This survey
paper aims to lay the groundwork for assistive navigation research by reviewing
and assessing various methods, including those used for general road detection and
semantic segmentation. The methods’ design principles and performances on a spe-
cialized pedestrian lane detection dataset serve as valuable resources for developing
new methods. The paper highlights that methods designed for vehicle road detec-
tion aren’t optimized for pedestrian lane detection due to differences in structure
and environmental considerations. Pedestrian lanes have diverse shapes and surface
textures (e.g., bricks, concrete, grass), unlike vehicle roads with clearer boundaries
and asphalt surfaces. Moreover, pedestrian lane detection encompasses both indoor
and outdoor scenes, whereas road detection primarily deals with outdoor scenarios.
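Methods of this kind are usually compared with pixel-wise segmentation metrics; the Python sketch below computes a per-class intersection-over-union (IoU) and its mean for a predicted lane mask against a ground-truth mask. The exact evaluation protocol of the survey may differ.

# Sketch of a per-class IoU computation for lane segmentation masks
# (integer label images: 0 = background, 1 = lane).
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 2) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union:                       # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.zeros((4, 4), dtype=int); pred[1:3, :] = 1
target = np.zeros((4, 4), dtype=int); target[2:4, :] = 1
print(round(mean_iou(pred, target), 3))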
Paul Mejia et al. (2021): The challenges faced by visually impaired peo-
ple (VIPs) in accessing mathematical resources pose significant obstacles, particu-
larly in pursuing degrees in science-related fields. Traditional computational tools
like Computer Algebra Systems (CAS) are not designed to be user-friendly for the
visually impaired, making even simple mathematical problem-solving a daunting
task. To address this issue, a new system called Casvi has been developed. Casvi
functions as a specialized CAS tailored for individuals with visual disabilities, en-
abling them to perform basic and advanced numerical calculations using the Max-
ima mathematical engine. The system underwent testing by 25 VIPs to evaluate
its functionality and user-friendliness. Impressively, these individuals achieved a
92% accuracy rate in executing mathematical operations using Casvi. Addition-
ally, Casvi proved to be more efficient than the LAMBDA system in terms of the
time required for VIPs to perform mathematical operations accurately. Globally,
approximately 2.2 billion people grapple with visual impairment or blindness,
a statistic that highlights the magnitude of this challenge [1]. In the United States,
the dropout rate among high school students with disabilities hovers around 40%
[2]. Moreover, only 13.7% of students with visual disabilities pursuing higher ed-
ucation manage to obtain a degree [3]. In the context of Ecuador, where a portion
of this research was conducted, the population exceeds 17 million people [4], with
481,392 individuals registered as having some form of disability, equating to an an-
nual prevalence of 2.74%. Within this group, 11.60% (55,843 people) suffer from
visual disabilities. Specifically, 2,906 students with visual impairments are studying
in primary, middle, or high school, and 1,188 are enrolled in universities or poly-
technic schools. Additionally, 147 individuals with visual disabilities are registered
in Technical and Technological Institutes. For VIPs pursuing Bachelor of Science
majors, such as engineering, the lack of accessibility in essential resources like spe-
cialized software and math textbooks severely restricts their academic and career
options. Computer Algebra Systems (CAS) like MATLAB, Wolfram Math-
ematica, and Maxima, which are crucial tools in engineering and related fields, are
inaccessible to the visually impaired. This inaccessibility renders even basic math-
ematical operations challenging for VIPs, despite the assistance of screen readers.
Moreover, the technical complexity of documents exacerbates difficulties for vi-
sually impaired individuals, reducing their access to crucial mathematical content.
The primary barrier for visually impaired individuals in grasping mathematical se-
mantics isn’t blindness itself but rather the lack of access to mathematical content.
Bridging this gap between existing CAS tools and VIPs becomes crucial, allowing
for the writing, editing, evaluation, and solving of mathematical expressions. Ad-
ditionally, as visually impaired students increasingly integrate into regular schools,
these tools must also be accessible to teachers who may not be proficient in braille
[5]. To address these challenges, the Casvi computational algebraic system emerges
as a promising solution, providing crucial support for individuals with varying de-
grees of visual impairment in their academic journey within engineering and exact
sciences.
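The interaction pattern behind such a system, taking a typed expression, evaluating it, and returning a sentence suitable for a screen reader, can be sketched in a few lines of Python. Casvi itself delegates computation to the Maxima engine; SymPy is substituted here only to keep the example self-contained.

# Illustrative pattern only: evaluate a typed expression and phrase the result
# as a sentence a screen reader could speak. Casvi uses Maxima; SymPy is a
# stand-in for this example.
import sympy as sp

def solve_and_phrase(expression: str) -> str:
    x = sp.symbols("x")
    expr = sp.sympify(expression)
    result = sp.integrate(expr, x)            # example task: indefinite integral
    return f"The integral of {expression} with respect to x is {result}."

print(solve_and_phrase("x**2 + 3*x"))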
Amit Kumar Jaiswal (2021): The field of healthcare has witnessed a surge in
interest due to the integration of Deep Learning and IoT, particularly in addressing
real-time health concerns. Among these, Diabetic Eye Disease stands as a leading
cause of blindness among the working-age population, notably affecting populous
Asian countries like India and China, where the prevalence of diabetes is burgeon-
ing. The escalating number of diabetic patients presents a formidable challenge for
healthcare professionals to conduct timely medical screenings and diagnoses. The
objective at hand is to harness deep learning methodologies to automate the identi-
fication of blind spots in the eye and assess the severity of this condition. The pro-
posed solution in this paper introduces an optimized technique built upon the foun-
dation of recently introduced pre-trained EfficientNet models. This approach aims
to detect blindness indicators in retinal images, culminating in a comparative analy-
sis among various neural network models. Notably, the fine-tuned EfficientNet-B5
based model, evaluated using benchmark datasets comprising retina images cap-
tured through fundus photography across diverse imaging stages, demonstrates su-
perior performance compared to CNN and ResNet50 models. The convergence of
AI and IoT in smart healthcare systems has garnered attention, offering more effi-
cient detection and management of various health conditions. Diabetes, a prevalent
chronic ailment globally, arises due to insufficient insulin production or ineffective
utilization by the body. The World Health Organization (WHO) recorded over 1.6
million deaths attributable to diabetes in 2016, emphasizing its critical impact. Di-
abetic Retinopathy (DR) emerges as a severe complication of diabetes, potentially
leading to complete blindness, affecting a substantial proportion of diabetic indi-
viduals worldwide. Approximately 25% of diabetic patients suffer from DR exclu-
sively, highlighting its complexity and impact within this demographic. Long-term
diabetes poses a significant risk of DR, a progressive disease capable of causing
partial or permanent vision impairment. Notably, the majority of those affected by
DR belong to the working-age group, a crucial segment of any country’s workforce.
India, in particular, houses a considerable diabetic population, and this number is
rapidly escalating each year. Detecting DR at its early stages remains challeng-
ing, as initial symptoms are often subtle and may go unnoticed until irreversible
retinal damage occurs or is diagnosed via medical testing. However, the identifi-
cation of DR necessitates highly skilled professionals capable of evaluating digital
color fundus photographs of the retina. Fundus images, capturing the rear part of
the human eye, undergo assessment to pinpoint lesions linked to vascular abnor-
malities caused by diabetes. Deep learning methodologies, notably Convolutional
Neural Networks (ConvNets), have emerged as a prominent approach for exten-
sive medical image processing across various healthcare applications. EfficientNet
architecture, specifically utilized in this study, showcases its efficacy in analyzing
retina images to detect DR. The scalability of ConvNets’ parameters enhances their
accuracy, especially in domains prioritizing precision, such as the medical field.
Thus, this research employs EfficientNet architecture to scrutinize retina images
and identify indicators of DR, signifying a potential breakthrough in early detec-
tion and intervention for this debilitating condition.
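The transfer-learning setup described above can be sketched as follows with torchvision: load a pretrained EfficientNet-B5, replace the classification head with one sized for the five diabetic-retinopathy severity grades, and fine-tune. The class count, input size, and hyperparameters are assumptions for illustration, not the paper's exact configuration.

# Sketch of fine-tuning a pretrained EfficientNet-B5 for 5-grade DR severity.
# Hyperparameters and class count are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b5, EfficientNet_B5_Weights

model = efficientnet_b5(weights=EfficientNet_B5_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 5)  # 5 DR grades

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimisation step on a batch of fundus images (N, 3, 456, 456)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch just to show the call; real training iterates over a fundus dataset.
print(train_step(torch.randn(2, 3, 456, 456), torch.tensor([0, 3])))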
Chapter 3
METHODOLOGY
3.1 SYSTEM FRAMEWORK
The StereoPilot system framework leverages machine learning solutions and deep neural networks to iden-
tify and track objects and hands in the environment. The system is designed to han-
dle close-distance environmental perception, making it suitable for various daily
life tasks and practical life skills.
In summary, the system framework of StereoPilot integrates wearable visual
perception with spatial audio rendering to provide individuals with visual impair-
ments the ability to perceive and interact with their environment in a non-visual
format.
3.2 FEASIBILITY AND POSITIONING ACCURACY TESTING
The feasibility and positioning accuracy testing of the wearable target lo-
cation system, StereoPilot, for blind and visually impaired individuals involved
evaluating the system’s ability to increase information transfer rate (ITR) and re-
duce positioning error during spatial navigation tasks. The system utilizes a head-
mounted RGB-D camera to capture 3D spatial information, which is then translated
into auditory cues through spatial audio rendering to assist users in perceiving and
interacting with their surroundings.
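As a rough illustration of how a 3D target position from the head-mounted camera could be turned into auditory cues, the Python sketch below maps camera-frame coordinates to simple stereo cues: azimuth to a left/right level difference, distance to loudness, and elevation to pitch. StereoPilot uses proper spatial audio rendering, so this toy mapping only conveys the geometry-to-sound idea.

# Toy mapping from a target position in camera coordinates (x right, y up,
# z forward, metres) to simple stereo cues. Not the StereoPilot rendering
# pipeline; only illustrative.
import math

def audio_cues(x: float, y: float, z: float) -> dict:
    azimuth = math.atan2(x, z)                      # + means target to the right
    elevation = math.atan2(y, math.hypot(x, z))
    distance = math.sqrt(x * x + y * y + z * z)
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))   # -1 full left .. +1 full right
    gain = 1.0 / max(distance, 0.3)                       # nearer targets are louder
    pitch_hz = 440.0 * 2 ** (elevation / (math.pi / 4))   # higher targets sound higher
    return {"left": gain * (1 - pan) / 2, "right": gain * (1 + pan) / 2,
            "pitch_hz": round(pitch_hz, 1)}

# A target 0.3 m to the right, slightly above, 0.8 m in front of the camera.
print(audio_cues(0.3, 0.1, 0.8))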
The testing process aimed to assess the effectiveness of the system in pro-
viding accurate target location information to assist individuals with visual impair-
ments in navigating their environment. The experimental results demonstrated that
the system significantly improved information transfer efficiency and reduced po-
sitioning error compared to other feedback methods. This indicates that the system
has the potential to assist visually impaired individuals in spatial tasks, thereby en-
hancing their spatial cognition and navigation experience.
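The positioning error reported in such tests can be understood as the distance between the cued target position and the position the subject actually reached, averaged over trials; the short sketch below computes it for synthetic 3D coordinates in metres and is not the paper's exact protocol.

# Sketch of the positioning-error metric: mean Euclidean distance between each
# cued target position and the position actually reached, over all trials.
import numpy as np

def mean_positioning_error(targets: np.ndarray, reached: np.ndarray) -> float:
    """targets, reached: arrays of shape (n_trials, 3), in metres."""
    return float(np.linalg.norm(targets - reached, axis=1).mean())

targets = np.array([[0.30, 0.00, 0.60], [0.10, 0.05, 0.50]])
reached = np.array([[0.28, 0.01, 0.63], [0.14, 0.05, 0.49]])
print(round(mean_positioning_error(targets, reached), 3))   # metres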
The feasibility and positioning accuracy testing involved comparing the per-
formance of the wearable target location system with three other baseline feedback
strategies based on auditory and haptic display methods. The evaluation focused
on the information transfer rate (ITR) for the spatial audio rendering (SAR) on
blind and visually impaired individuals. The results of the testing indicated that
the system’s spatial audio rendering approach outperformed the alternative feed-
back methods, highlighting its efficacy in providing essential navigational cues and
location assistance to individuals with visual impairments.
The testing process also involved assessing the system’s ability to accurately
convey spatial information and provide precise navigation cues to users. By lever-
aging spatial audio rendering, the system empowered individuals with visual im-
pairments to perceive and interpret their surroundings with enhanced accuracy and
efficiency. This not only facilitated independent navigation and interaction within
various environments but also contributed to improving the overall mobility and
independence of individuals with visual impairments in real-world scenarios.
Overall, the feasibility and positioning accuracy testing of the wearable target lo-
cation system, StereoPilot, demonstrated its effectiveness in enhancing information
transfer rate and reducing positioning error during spatial navigation tasks for blind
and visually impaired individuals. The system’s ability to provide accurate target
location information and assist individuals with visual impairments in navigating
their environment underscores its potential as an innovative and valuable tool in the
field of assistive technology for individuals with visual impairments.
3.3 COMPARISON OF FEEDBACK METHODS
The paper discusses the development and evaluation of a wearable target lo-
cation system, StereoPilot, designed for individuals with blindness or visual impair-
ment. The system utilizes spatial perception based on computer vision and target
location based on spatial audio rendering to provide essential navigational cues and
location assistance. One of the key aspects of the evaluation involved comparing
different feedback methods and assessing their impact on the information transfer
rate for individuals with visual impairments.
The comparison of feedback methods aimed to evaluate the effectiveness of
the spatial audio rendering (SAR) approach employed by the system in providing
accurate target location information to assist individuals with visual impairments in
navigating their environment. The evaluation involved testing the system’s perfor-
mance against three representative auditory and haptic display methods, including
voice instruction feedback, vibrotactile feedback, and non-speech sonification feed-
back.
The results of the evaluation indicated that the system’s spatial audio render-
ing approach outperformed the alternative feedback methods in terms of informa-
tion transfer efficiency. This finding underscores the efficacy of the spatial audio
rendering approach in providing essential navigational cues and location assistance
to individuals with visual impairments. By leveraging spatial audio rendering, the
system empowered individuals with visual impairments to perceive and interpret
their surroundings with enhanced accuracy and efficiency, thereby facilitating inde-
pendent navigation and interaction within various environments.
Figure 3.5: Scatter plot based on MT and ID and the linear regression curve. For
simplicity, only a portion of the sample points are shown
The comparison of feedback methods also highlighted the potential of the
spatial audio rendering approach to significantly improve the mobility and indepen-
dence of individuals with visual impairments in real-world scenarios. The system’s
ability to convey spatial information and provide precise navigation cues to users
through spatial audio rendering demonstrated its potential as an innovative and valu-
able tool in the field of assistive technology for individuals with visual impairments.
The comparison of feedback methods in the evaluation of the wearable tar-
get location system emphasized the superiority of the spatial audio rendering
approach in enhancing information transfer efficiency and providing accurate tar-
get location information to assist individuals with visual impairments in navigating
their environment. This underscores the potential of the system to serve as a valu-
able tool for individuals with visual impairments in spatial tasks and navigation.
3.4 DESKTOP MANIPULATION EXPERIMENT
In the desktop manipulation experiment, the subjects completed 30 trials for each spatial information feedback method.
SAR, voice instruction, and vibrotactile feedback were able to assist the subjects in completing tasks accurately, with very few failures. In contrast, the non-speech sonification
feedback method showed significant shortcomings compared to the other methods,
leading to its exclusion from further experiments.
Figure 3.7: (a) Success rate and (b) completion time in desktop manipulation ex-
periment
The results demonstrated the system's effectiveness in providing accurate spatial information and enhancing the navigation experience for
individuals with visual impairments.
The desktop manipulation experiment verified the feasibility of the wearable
target location system, StereoPilot, for accurate prehension tasks in a real environ-
ment. The results highlighted the effectiveness of the spatial audio rendering ap-
proach and its potential to enhance the spatial perception and navigation experience
for individuals with blindness or visual impairments.
3.5 ITR EVALUATION EXPERIMENT
The ITR evaluation experiments compared how efficiently each feedback method conveyed target location information to individuals with visual impairments. The spatial audio rendering approach was
found to enhance information transfer efficiency, thereby empowering individuals
with visual impairments to perceive and interpret their surroundings with enhanced
accuracy and efficiency.
The experiments also involved comparing the completion time, positioning
errors, and success rates of the different feedback methods. The results indicated
that SAR greatly shortened the completion time, contributing to a smooth user ex-
perience. Additionally, SAR, along with the other auditory and haptic feedback
methods, was able to assist the subjects in completing tasks accurately, with very
few failures. In contrast, the non-speech sonification feedback method showed sig-
nificant shortcomings compared to the other methods, leading to its exclusion from
further experiments.
Overall, the ITR evaluation experiments provided valuable insights into the
performance of different feedback methods, with SAR demonstrating superior in-
formation transfer efficiency and effectiveness in providing accurate target location
information to assist individuals with visual impairments in navigating their en-
vironment. The results underscored the potential of the spatial audio rendering
approach as an innovative and valuable tool in the field of assistive technology for
individuals with visual impairments. These experiments highlighted the effectiveness
of the spatial audio rendering approach employed by StereoPilot, emphasizing its
potential to enhance the spatial perception and navigation experience for individuals
with blindness or visual impairment.
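One common way to express an information transfer rate in this kind of target-selection task is bits conveyed per selection, taken from the Fitts index of difficulty, divided by the movement time; the Python sketch below illustrates that computation with made-up numbers and may differ from the paper's exact ITR definition.

# Sketch of an information-transfer-rate style metric for target selection:
# bits per selection (Shannon form of Fitts' index of difficulty) divided by
# the movement time. The paper's precise ITR definition may differ.
import math

def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation: ID = log2(D / W + 1), in bits."""
    return math.log2(distance / width + 1)

def itr_bits_per_second(distance: float, width: float, movement_time_s: float) -> float:
    return index_of_difficulty(distance, width) / movement_time_s

# Example: a 0.40 m reach to a 0.05 m target completed in 1.8 s.
print(round(itr_bits_per_second(0.40, 0.05, 1.8), 2), "bit/s")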
Chapter 4
ADVANTAGES
Chapter 5
CHALLENGES
Chapter 6
RESULTS AND DISCUSSIONS
The paper presents the results and discussions of the StereoPilot wearable
target location system, focusing on the evaluation of the system’s performance and
its implications for individuals with blindness or visual impairment. The study
compared the Spatial Audio Rendering (SAR) feedback method with three other
baseline feedback strategies based on auditory and haptic display methods, namely
voice instruction feedback (VI), vibrotactile feedback (VB), and non-speech sonifi-
cation feedback (NS).
The results of the study demonstrated that SAR significantly improved the
Information Transfer Rate (ITR) for individuals with blindness or visual impair-
ment (BVI) compared to the other baseline feedback methods. The experimental
evaluation based on the Fitts' law test showed that SAR greatly shortened the com-
pletion time, contributing to a smooth user experience. The study also involved
in-depth research on the wearable design of the assistance device and extensive
comparative experiments on target populations, demonstrating the feasibility and
positioning accuracy of the wearable visual perception module.
Furthermore, the study conducted desktop manipulation experiments to ver-
ify the feasibility of StereoPilot for accurate prehension tasks in real environments.
The results indicated that SAR, VI, and VB were able to assist the subjects in com-
pleting all tasks accurately, with very few failures, while NS had significant short-
comings compared to the other three information feedback methods. The study
also highlighted the impact of the physical properties of the target and adjacent
interfering objects on the user’s grasping success rate, emphasizing the need for
effective technical support of computer vision and spatial information feedback for
individuals with visual impairments.
The evaluation metrics used in the study included positioning errors, com-
pletion time, ITR, Pearson correlation coefficient between the Index of Difficulty
(ID) and Movement Time (MT), the root mean square error of the linear regression
curve, and the success rate. The results of the Fitts’ law test and the desktop ma-
nipulation experiments provided valuable insights into the performance of the SAR
feedback method and its potential impact on individuals with blindness or visual
impairment.
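The ID-MT analysis referred to above can be sketched as a simple linear fit MT = a + b * ID, together with the Pearson correlation between ID and MT and the root-mean-square error of the fit; the sample values in the Python sketch below are synthetic stand-ins for the experimental data.

# Sketch of the Fitts' law analysis: linear fit MT = a + b*ID, Pearson
# correlation between ID and MT, and RMSE of the fit. Values are synthetic.
import numpy as np

ids = np.array([1.0, 1.6, 2.3, 2.8, 3.3, 3.8])          # index of difficulty (bits)
mts = np.array([0.9, 1.2, 1.6, 1.8, 2.2, 2.4])          # movement time (s)

b, a = np.polyfit(ids, mts, deg=1)                       # MT = a + b * ID
pred = a + b * ids
pearson_r = np.corrcoef(ids, mts)[0, 1]
rmse = float(np.sqrt(np.mean((mts - pred) ** 2)))

print(f"MT = {a:.2f} + {b:.2f} * ID, r = {pearson_r:.3f}, RMSE = {rmse:.3f} s")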
The discussions in the paper emphasized the significance of the experimental
results, highlighting the potential of SAR to improve the spatial perception and nav-
igation experience for individuals with blindness or visual impairment. The study
also identified challenges related to deviations in rendered spatial audio, machine
vision recognition errors, limitations of existing auditory display technologies, and
performance optimization for mobile devices, underscoring the need for continued
research and innovation in the field of assistive technology.
Chapter 7
CONCLUSION
REFERENCES
[2] W. Sun, Y. Song, C. Chen, J. Huang, and A. C. Kot, “Face spoofing detec-
tion based on local ternary label supervision in fully convolutional networks,”
IEEE Trans. Inf. Forensics Security, vol. 15, pp. 3181–3196, 2020.
[3] Y. Sun, R. Ni, and Y. Zhao, “MFAN: Multi-level features attention network
for fake certificate image detection,” Entropy, vol. 24, no. 1, p. 118, Jan. 2022.
[5] P. Zhuang, H. Li, S. Tan, B. Li, and J. Huang, “Image tampering localiza-
tion using a dense fully convolutional network,” IEEE Trans. Inf. Forensics
Security, vol. 16, pp. 2986–2999, 2021.
[6] Y. Gao, F. Wei, J. Bao, S. Gu, D. Chen, F. Wen, and Z. Lian, “High-
fidelity and arbitrary face editing,” in Proc. IEEE/CVF Conf. Comput. Vis.
Pattern Recognit. (CVPR), Jun. 2021, pp. 16115–16124.
[7] R. Chen, X. Chen, B. Ni, and Y. Ge, “SimSwap: An efficient framework for
high fidelity face swapping,” in Proc. 28th ACM Int. Conf. Multimedia, Oct.
2020, pp. 2003–2011.
[8] Y. Nirkin, I. Masi, A. Tran Tuan, T. Hassner, and G. Medioni, “On face
segmentation, face swapping, and face perception,” in Proc. 13th IEEE Int.
Conf. Autom. Face Gesture Recognit. (FG), May 2018, pp. 98–105.