Object Detection and Recognition Using TensorFlow For Blind People

Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-8 | Issue-5, October 2024, URL: https://www.ijtsrd.com/papers/ijtsrd69446.pdf Paper URL: https://www.ijtsrd.com/computer-science/other/69446/object-detection-and-recognition-using-tensorflow-for-blind-people/darshan-a-mirapure


International Journal of Trend in Scientific Research and Development (IJTSRD)

Volume 8 Issue 5, Sep-Oct 2024 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470

Object Detection and Recognition Using TensorFlow for Blind People

Darshan A. Mirapure¹, Dipam Pakhamode², Prof. Rina Shirpurkar³
¹,²School of Science, G H Raisoni University, Amravati, Maharashtra, India
³Assistant Professor, G H Raisoni University, Amravati, Maharashtra, India

ABSTRACT

Vision impairment or blindness is one of the top ten disabilities in humans, and unfortunately, India has the world's largest visually impaired population. To address this, we are creating a framework that guides the visually impaired through object detection and recognition, so that they can navigate without others' support and be safe within their surroundings. In this system, an image is captured with a camera and sent as input. The SSD architecture, which is based on deep neural networks, is used here to detect objects precisely. The input is given to the software and processed against the COCO dataset, which is predefined in the TensorFlow library and used as the training dataset for the system. In general, this dataset consists of features for ninety percent of real-world objects. Distance is calculated by depth estimation, and by using voice assistance packages the software produces its output as audio. The system is implemented entirely in the Python programming language, since Python offers many built-in packages and libraries that reduce what would otherwise be many lines of complicated code to a few simple lines.

KEYWORDS: Object Detection, TensorFlow, COCO Datasets

How to cite this paper: Darshan A. Mirapure | Dipam Pakhamode | Prof. Rina Shirpurkar "Object Detection and Recognition Using TensorFlow for Blind People" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-8 | Issue-5, October 2024, pp.747-752, URL: www.ijtsrd.com/papers/ijtsrd69446.pdf (Paper ID: IJTSRD69446)

Copyright © 2024 by author (s) and International Journal of Trend in Scientific Research and Development Journal. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)

I. INTRODUCTION
The rapid progress of information and network technology has moved from the Internet and automation systems, which were initially used for administrative offices and for industrial and economic applications, to the application of these innovations in everyone's life. When you think of a technology like augmented reality, one of the key components to consider is object recognition technology, also known as object detection. This term refers to the capacity to distinguish the form and shape of different objects and their position in space as captured by the camera. It is a known fact that the number of visually disabled individuals in the world is more than 280 million, roughly equal to 25% of the Indian population. They face both routine and difficult challenges in regular activities, particularly when they are on their own. They are generally dependent on somebody to get through their day-to-day work.

So, it is a very challenging situation, and a non-physical arrangement for them is of the utmost importance and much needed. One such solution from our side is a machine learning framework that permits blind people to distinguish and classify general day-to-day objects in real time, produces voice outputs, and calculates distance using mathematical calculations, producing alerts on whether the user is very near to or far away from the object. The same framework can be used as an obstacle detection instrument. The primary purpose of object detection is to find different things, draw rectangular bounding boxes around them, and decide the class of each item found.

Applications of object detection arise in a large number of diverse domains, including recognizing pedestrians for self-driving vehicles, monitoring crops, and even real-time ball tracking in basketball.

TensorFlow, an open-source machine learning framework developed by Google, provides robust tools and libraries specifically designed for building and deploying object detection models. With its TensorFlow Object Detection API, users can access a

@ IJTSRD | Unique Paper ID – IJTSRD69446 | Volume – 8 | Issue – 5 | Sep-Oct 2024 Page 747
International Journal of Trend in Scientific Research and Development @ www.ijtsrd.com eISSN: 2456-6470
variety of pre-trained models, facilitating rapid development and fine-tuning for specific tasks. This framework simplifies the entire workflow, from data preprocessing and model training to inference. TensorFlow's integration with Keras also allows for easy model customization and experimentation, making it an ideal choice for both beginners and experienced practitioners.

II. OBJECTIVE
The project goal is to incorporate state-of-the-art techniques for object detection to achieve high accuracy with real-time detection performance. In this project, we use the Python programming language with a TensorFlow-based solution to solve the problem of object detection in an end-to-end fashion, so that the proposed system will be fast and efficient. A TensorFlow-based application approach for a mobile device, using its built-in camera, is used for detecting objects. More specifically, the framework is prepared in such a way that a mobile application (assuming you're using it on an Android/iOS device) captures real-time frames and sends them to the backend of the application, where all the predefined computations take place.

Fig 1: Objects for object recognition, which consist of a dog and a duck on the beach

• Along with object detection, we have used an alert framework in which distance is calculated. If the blind person is very close to the object or is far away at a safe place, it produces voice-based outputs along with distance units.
• The backend of the application is where the video clip is sent and taken as input, which goes through the COCO dataset object detection model, one of the predefined datasets, which tests and detects with accuracy metrics.
• After testing, the output of the application is sent to the voice modules, and the class of the object is converted into default voice notes, which can then be sent to the users for their needs.

III. LITERATURE SURVEY
OBJECT DETECTION USING CONVOLUTIONAL NEURAL NETWORK
In 2019, “Object Detection using Convolutional Neural Networks” observed that vision systems are essential in building a mobile robot that will complete a certain task such as navigation, surveillance, or explosive ordnance disposal (EOD). A project was proposed based on a CNN, which is used to detect objects in the environment. Methodology used: two state-of-the-art models are compared for object detection, the Single Shot Multi-Box Detector (SSD) with MobileNetV1 and the Faster Region-based Convolutional Neural Network (Faster-RCNN) with InceptionV2.

IMAGE BASED REAL TIME OBJECT DETECTION ALONG WITH RECOGNITION IN IMAGE PROCESSING
In 2019, “Image Based Real Time Object Detection and Recognition in Image Processing” noted that object detection and tracking, mainly for humans and vehicles, is presently a most active research topic. It is used in applications such as surveillance and image retrieval. A solution was proposed which reviewed recent technologies for each phase of object detection. Object detection is a computer technology, related to computer vision and image processing, that deals with detecting instances of semantic objects of a certain class in digital images and videos. The methodology used here covers four different methods for object detection: feature-based detection, region-based detection, outline-based detection, and model-based detection.

A solution was proposed which introduced a new fast method for salient object detection within images. The main aim was the detection of objects in complex images. The methodology used has four steps: regional feature extraction, segment clustering, saliency score computation, and post-processing.

REAL-TIME OBJECT DETECTION USING DEEP LEARNING
In 2020, “Real-Time Object Detection Using Deep Learning” noted that object detection and recognition in images and videos is a major topic today. For this, a solution was proposed using deep learning. The methodology used here includes feature extraction with the help of Darknet-53, along with feature map upsampling and concatenation. The model includes various changes in object detection techniques.

ASSISTIVE OBJECT FINDING SYSTEM FOR VISUALLY IMPAIRED PEOPLE
In 2020, “Assistive Object Recognition/Finding System for Visually Impaired” observed that the issue of visually impaired or blind people is faced worldwide. For this, a solution was proposed in which two cameras placed on the blind person's glasses, a free GPS service, and ultrasonic sensors are employed to give information about the surroundings.

The methodology used here is that the system takes real-time images as input; the images are then pre-processed based on the task, their background and foreground are separated, and then the DNN module with a pre-trained YOLO model is applied for feature extraction.

IV. PROBLEM FINDING
The number of visually impaired people in the world is more than 290 million. Of these, 42% are blind and 58% have no vision. They are an important part of our society. It is very difficult for them to live in the outside world independently. In today's fast-moving society, visually impaired people require supportive instruments in their day-to-day life. Our idea is primarily centered on developing an assistive framework that helps visually impaired people detect objects effectively, which can help them in daily living.

PROBLEM DESCRIPTION
The system is designed in such a way that a mobile application captures real-time frames and sends them to a laptop-based networked server, where all the important computations take place. Using a pre-trained SSD detection model trained on the COCO dataset, the objects are detected and recognized by the system. After that, distance is calculated and the output is given in audio form, where the system gives warnings with the calculated distance.

V. EXISTING SYSTEM
Many computer vision systems exist nowadays to help people who are visually impaired in their daily life. These include Augmented Reality-based wearable goggles, video calling applications for the visually impaired to ask for assistance, AI- and GPS-based navigation systems, etc. These systems are developed to work in specific cases or conditions and cannot be used broadly. There are cases wherein people with visual impairment have to know about their surroundings, which is not possible with the existing systems.

DISADVANTAGES OF EXISTING SYSTEM
• They are expensive. Most visually impaired people (considering a single person) cannot afford such highly priced products.
• These systems may be complex in functionality, making them difficult for blind people to use.
• Some systems are not real-time.
• Developing and training object detection models can be complex and resource-intensive, requiring substantial expertise in machine learning and computer vision.
• High-quality, annotated datasets are essential for training effective models. Collecting and labeling this data can be time-consuming and costly.
• Object detection systems can struggle with accuracy in challenging conditions, such as poor lighting, occlusions, or variations in object appearance.
• Achieving real-time detection can be computationally demanding, requiring powerful hardware, which may not always be feasible for all applications.
• Models can inherit biases from the training data, leading to inaccuracies in detecting certain objects or demographics, which raises ethical concerns.
• The initial investment in technology and ongoing maintenance can be high, especially for small businesses or startups.
• In applications like surveillance, object detection can raise privacy issues, as it often involves monitoring individuals without their consent.

VI. PROPOSED SYSTEM
In this proposed system, we use Python with a TensorFlow-based approach to solve the problem of object detection in an end-to-end fashion. We used the SSD detection model, which is based on deep neural networks, to detect items effectively, and the OpenCV library for real-time image capture. Among the ImageNet, Google Open Images, and COCO datasets, we use COCO since it provides classified features for more than 90% of real-world objects. The image is sent as input to the model, and meanwhile distance is calculated using depth estimation. With the help of voice modules available in Python, the name of the detected object is converted into default voice notes, which are sent to blind people to help them, along with the calculated distance.

The user starts the system, after which the system activates the camera and captures real-time images, which are considered as input. After capturing, the images are stored and sent on to the model trained on the dataset, where the internal computations take place using the SSD architecture. After the computations, the model detects the object and recognition is done.
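
The steps after detection (looking up the object name, estimating distance, and building a voice alert) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes detections arrive as normalized boxes `[ymin, xmin, ymax, xmax]` with class IDs and scores, and the names `CATEGORY_INDEX`, `approx_distance`, and `describe_detections`, the 0.5 score threshold, the near-object limit, and the width-based distance heuristic are all hypothetical choices made for illustration. The paper itself only states that distance comes from depth estimation and is rounded to a 0-10 scale.

```python
# Illustrative subset of a COCO category index (id -> name); the full
# label map used by the authors is not given in the paper.
CATEGORY_INDEX = {1: "person", 18: "dog", 62: "chair"}

SCORE_THRESHOLD = 0.5   # assumed cut-off for keeping a detection
NEAR_LIMIT = 3.0        # assumed: at or below this scale value, warn the user


def approx_distance(box):
    """Map a normalized box [ymin, xmin, ymax, xmax] to a 0-10 scale.

    Assumed heuristic: the wider the box appears in the frame, the
    closer the object is. This stands in for the paper's depth
    estimation, whose exact formula is not stated.
    """
    ymin, xmin, ymax, xmax = box
    width = max(0.0, min(1.0, xmax - xmin))
    return round((1.0 - width) * 10, 1)


def describe_detections(boxes, class_ids, scores):
    """Turn raw detections into alert strings for the audio stage."""
    alerts = []
    for box, class_id, score in zip(boxes, class_ids, scores):
        if score < SCORE_THRESHOLD:
            continue  # skip low-confidence detections
        label = CATEGORY_INDEX.get(int(class_id), "unknown object")
        distance = approx_distance(box)
        if distance <= NEAR_LIMIT:
            alerts.append(f"Warning: {label} very close, distance {distance} units")
        else:
            alerts.append(f"{label} ahead, distance {distance} units")
    return alerts
```

Keeping this logic free of TensorFlow and audio dependencies makes it easy to check on its own; the resulting strings are what a text-to-speech stage would read out.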

Fig 2: System Design
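
The voice-note stage of the proposed system could be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes the `pyttsx3` text-to-speech package is installed, and the helper `speak` and the stand-in class `RecordingEngine` are hypothetical names introduced here so the flow can be tried on a machine without audio.

```python
def speak(messages, engine=None):
    """Speak each message in order and return the list that was queued.

    `engine` can be any object with say() and runAndWait(); when it is
    omitted, a pyttsx3 engine is created (requires `pip install pyttsx3`).
    """
    if engine is None:
        import pyttsx3  # imported lazily so the sketch loads without audio
        engine = pyttsx3.init()
    queued = []
    for message in messages:
        engine.say(message)       # queue the text for speech
        queued.append(message)
    engine.runAndWait()           # block until the queued speech is played
    return queued


class RecordingEngine:
    """Silent stand-in for a speech engine (hypothetical test helper)."""

    def __init__(self):
        self.pending = []
        self.played = []

    def say(self, text):
        self.pending.append(text)

    def runAndWait(self):
        # Move everything that was queued into the played list.
        self.played.extend(self.pending)
        self.pending = []
```

Passing the engine in as a parameter is a design choice of this sketch: it lets the same function drive a real speaker or a silent recorder.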

After an object is detected, it is displayed on the monitor, where the frames are shown with the detected object along with its label. Next, distance is calculated using depth estimation by finding the mid ranges of the frames. Then, using speakers driven by the voice module packages, the labels of the detected objects are read out as text, and this is the output.

ADVANTAGES
• Easy to use.
• Provides real-time results, and the result is in voice format with distance.
• Depending on the video quality, various objects such as a chair and a table can be easily differentiated.
• Because the COCO dataset is used, it handles about 90% of real-world objects efficiently.
• It enables automated systems to identify and analyze objects in real time, streamlining processes in industries like manufacturing, security, and logistics.
• It allows for efficient analysis of large datasets, extracting valuable insights from images or videos for applications in surveillance, retail, and agriculture.
• By automating tasks like inventory management or quality control, object detection can reduce labor costs and minimize errors.
• It can assist visually impaired individuals by identifying objects in their environment through audio feedback.
• In marketing, object detection can be used to tailor advertisements and content based on user behavior and interactions with specific products.
• In applications like augmented reality and mobile apps, object detection enhances user interactions by recognizing and responding to real-world objects.

The following limitations also apply:
• If a model is too closely trained on specific data, it may not generalize well to new, unseen environments or variations of the objects.
• The initial investment in technology and ongoing maintenance can be high, especially for small businesses or startups.
• In applications like surveillance, object detection can raise privacy issues, as it often involves monitoring individuals without their consent.

VII. MODULES
Video Capturing Module:
When the system is turned on, it captures images using the camera. This input is connected to the model trained on the COCO dataset, where classification of pixels and features takes place. The captured frames can be seen on the monitor with drawn boundaries and labels. The OpenCV method cv2.VideoCapture() is used to start the camera and capture the video.
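
A capture loop of the kind this module describes can be sketched as below. It is a minimal sketch, assuming the `opencv-python` package is installed and the camera is at index 0; `read_frames` and `open_camera` are hypothetical helper names, not functions from the paper. The only OpenCV calls relied on are `cv2.VideoCapture`, `isOpened()`, and `read()`.

```python
def read_frames(capture, max_frames=None):
    """Yield frames from a capture object until it runs dry.

    `capture` needs a read() method returning (ok, frame), which is the
    contract of cv2.VideoCapture.
    """
    count = 0
    while max_frames is None or count < max_frames:
        ok, frame = capture.read()
        if not ok:
            break  # camera unplugged or stream ended
        yield frame
        count += 1


def open_camera(index=0):
    """Open a camera with OpenCV (requires opencv-python)."""
    import cv2  # imported lazily so read_frames has no dependencies
    capture = cv2.VideoCapture(index)
    if not capture.isOpened():
        raise RuntimeError(f"could not open camera {index}")
    return capture
```

A caller would typically iterate over `read_frames(open_camera())` inside a `try`/`finally` that calls `capture.release()`, so the camera is freed even if detection fails partway through.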
Image Processing Module:
OpenCV (Open Source Computer Vision) is a library, available in Python, whose functions are mainly aimed at real-time computer vision. It is mainly used to do all the computational operations related to images. The cv2 module is used to perform image processing and provides the methods that are used to detect and capture the frames and to annotate them with names. This module runs after the input is taken from the camera.
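
Annotating a frame with a boundary and a name, as described above, might look like the following. This is a sketch under assumptions: boxes are taken to be normalized `[ymin, xmin, ymax, xmax]` values, the frame is a BGR NumPy array as returned by OpenCV, and `to_pixel_box` and `draw_label` are hypothetical helper names. The drawing itself uses the existing OpenCV functions `cv2.rectangle` and `cv2.putText`.

```python
def to_pixel_box(box, frame_width, frame_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel corners."""
    ymin, xmin, ymax, xmax = box
    left = int(round(xmin * frame_width))
    top = int(round(ymin * frame_height))
    right = int(round(xmax * frame_width))
    bottom = int(round(ymax * frame_height))
    return (left, top), (right, bottom)


def draw_label(frame, box, label, color=(0, 255, 0)):
    """Draw a rectangle and its label on a BGR frame, modifying it in place."""
    import cv2  # lazy import: only needed when actually drawing
    height, width = frame.shape[:2]
    top_left, bottom_right = to_pixel_box(box, width, height)
    cv2.rectangle(frame, top_left, bottom_right, color, 2)
    # Place the text just above the box, but keep it inside the frame.
    text_origin = (top_left[0], max(top_left[1] - 5, 15))
    cv2.putText(frame, label, text_origin,
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return frame
```

Separating the coordinate conversion from the drawing call keeps the arithmetic checkable without a camera or display.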
Fig 3: Object Detection
Object Detecting Module:
The algorithm takes the image as input, and all the computations take place, such as dividing the image into pixels, which serve as the inputs to the network, and classifying features, which is done by the neural network. The image is read as a string for the next computation and is compared against the trained dataset. This is achieved here by using a category index in which 90 objects are trained separately. Here we used the SSD architecture, which is available through the TensorFlow Object Detection API.

Distance calculation Module:
To find the distance of the object, NumPy is used, which is a pip package for mathematical calculation. Distance can be approximated using depth estimation: using the detected objects visible in the monitor frames, the depth estimation takes place by finding mid ranges and rounding the estimate to a scale of 0-10.

TensorFlow Object Detection API:
This is a powerful framework specifically designed for object detection tasks. It provides pre-trained models, utilities for training custom models, and tools for evaluation and visualization.

TensorFlow Core:
Basic TensorFlow libraries and functions are used for building, training, and evaluating neural networks. This includes layers, optimizers, and metrics.

TFRecord:
A data format optimized for TensorFlow that is commonly used for storing training data, including images and their corresponding annotations.

Audio Output Module:
After detecting the object and calculating the distance, our aim is to give the output as audio using voice notes. In the output we specify the distance along with units and the warning messages to alert the user. For audio output, pyttsx3 is used, a pip-installable Python package for converting text to speech.

VIII. CONCLUSION
Previous studies have proposed a number of methods to detect objects. In the literature survey, different techniques were found for the detection and recognition of objects, and they use different types of data as input for their methodology. After surveying the different types of methods, it was found that using an SSD architecture model trained on the COCO dataset is the easiest method, which can be readily applied and is appropriate in all conditions. We decided to explore this method of computer vision and proposed a novel method for detecting and recognizing objects based on TensorFlow, finding distance, and sending output through voice assistance such as a speaker. With this, a blind person can detect and recognize objects in day-to-day life without depending on others and will be alerted by voice outputs. As future work, we intend to make an application for iOS devices.