Project-II B.Tech Format&guidelines
Affiliated to
Maulana Abul Kalam Azad University of Technology (Formerly WBUT), 2019
Name : Arkadev Kundu        12616003188    161260120057
       Sanjib Basak         12616003209    161260120078
       Moumita Majumder     12616003204    161260120073
       Soumyadip Debnath    12616003216    161260120085
Heritage Institute of Technology
(An Autonomous Institute)
Department of
Electronics and Communication Engineering
Bachelor of Technology
In
Electronics and Communication Engineering
Maulana Abul Kalam Azad University of Technology
(Formerly WBUT), 2019
Dept. of Electronics and Communication Engineering
Heritage Institute of Technology, Kolkata-700107.
Certificate of Recommendation
This is to certify that the thesis entitled “Wearable Object Detection System For Visually
Challenged People” & “Blind Stick”, submitted by Arkadev Kundu, Sanjib Basak, Moumita
Majumder and Soumyadip Debnath under the supervision of Prof. Chandrima Roy (Assistant
Professor, Dept. of ECE, HITK), has been prepared according to the regulations of the B.Tech.
degree in the Electronics and Communication Engineering Department, awarded by Maulana
Abul Kalam Azad University of Technology (Formerly WBUT), that they have fulfilled the
requirements for submission of the thesis report, and that this thesis report has not been
submitted for any degree/diploma or any other academic award anywhere before.
…………………………………………………….………
Prof. Chandrima Roy
(Assistant Professor, Dept of ECE, HITK)
Project Supervisor
…………………………………………………….………
Prof. Prabir Banerjee
(HOD, Dept. of ECE, HITK)

…………………………………………………….………
External Examiner
Heritage Institute of Technology
(An Autonomous Institute)
Affiliated to
Maulana Abul Kalam Azad University of
Technology
(Formerly WBUT)
Certificate of Approval*
The foregoing thesis report is hereby approved as a creditable study of an engineering
subject, carried out and presented in a manner satisfactory to warrant its acceptance as a
prerequisite to the degree for which it has been submitted. It is understood that by this
approval the undersigned do not necessarily endorse or approve any statement made,
opinion expressed or conclusion drawn therein, but approve the project report only for the
purpose for which it is submitted.
Signature of the Examiners:
1…………………………………………….
2…………………………………………….
3…………………………………………….
Contents
1. Introduction
1.1. Objective
1.2. Plan Of Action
2. Literature Study
3. Development Of Wearable Object Detection System & Associated
Algorithms
3.1. Bounding Box
3.2. Classification + Regression
3.3. Two-stage Method
3.4. Unified Method
3.4.1. SSD Architecture
3.5. Block Diagram
3.6. Working Principle
4. Circuit Equipment
5. Results & Discussion
6. Application
7. Challenges & Future Scope
8. Development Of Blind Stick For Visually Challenged People
8.1. Working Principle
8.2. Block Diagram
9. Circuit Equipment
10. Results & Discussion
11. Application
12. Challenges & Future Scope
13. Conclusion
14. References
Abstract
For blind people the world is full of darkness, so if we try to bring some ray of hope into their
lives, the world could be a better place for them. We, as students of engineering, found it a
humble duty to develop something that could be of help to them. A wearable object detection
system is one of the most needed amenities that could facilitate their daily life. Efficient and
accurate object detection is therefore an important topic. With the advent of deep learning
techniques, the accuracy of object detection has increased drastically. The project aims to
incorporate state-of-the-art techniques for object detection with the goal of achieving high
accuracy with real-time performance. A major challenge in many object detection systems is the
dependency on other computer vision techniques to support the deep learning based approach,
which leads to slow and non-optimal performance. In this project, we use a completely deep
learning based approach to solve the problem of object detection in an end-to-end fashion. The
network is trained on a challenging publicly available dataset, on which an object detection
challenge is conducted annually. The resulting system is fast and accurate, thus aiding
applications which require object detection. Along with the wearable object detection device, a
blind stick is also important for a visually challenged person. If the blind stick is constructed
with utmost accuracy, blind people will be able to move from one place to another without
others' help, which increases their autonomy.
1 Introduction
Walking sticks are used to guide blind people through their difficulties. The world is changing,
and the changing world needs to contribute to aids for the visually impaired in line with
technological advancements. That is where object detection and the walking stick come into
play. When we are shown an image, our brain instantly recognizes the objects contained in it.
On the other hand, it takes a lot of time and training data for a machine to identify these objects.
But with the recent advances in hardware and deep learning, this field of computer vision has
become a whole lot easier and more intuitive. Object detection technology has seen a rapid
adoption rate in various and diverse industries. It helps self-driving cars navigate safely through
traffic, spots violent behavior in crowded places, helps sports teams analyze games and build
scouting reports, and ensures proper quality control of parts in manufacturing, among many
other things. And these are just scratching the surface of what object detection technology can do!
1.1 Objective
The main objective of this project is to develop an application for blind people that detects
objects in various directions, including objects on the ground, so that they can walk freely.
Object detection using image processing can be used in multiple industrial as well as social
applications. This project proposes to use object detection for blind people and give them
audio/vocal information about it. We detect an object using the camera and give voice
instructions about the direction of the object. The user must first train the system with the object
information. We then perform feature extraction to search for objects in the camera view. We
use the angle at which the object is placed to give a direction for the object (a small sketch of
this idea follows below). In addition, the blind stick detects the obstacles that are in the way of
the user.
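To make the idea of giving a direction from the detected object's position concrete, the following sketch (a hypothetical helper, not the project's exact code) maps the horizontal position of a detected bounding box to an approximate bearing and a spoken left/ahead/right cue, assuming a camera with a roughly 62-degree horizontal field of view:

# Hypothetical sketch: turn a detected box into a direction cue.
# Assumes the camera's horizontal field of view is about 62 degrees.
def direction_from_box(box, image_width, h_fov_deg=62.0):
    """box = (xmin, ymin, xmax, ymax) in pixels."""
    xmin, _, xmax, _ = box
    center_x = (xmin + xmax) / 2.0
    offset = center_x / image_width - 0.5      # -0.5 (far left) .. +0.5 (far right)
    angle = offset * h_fov_deg                 # approximate bearing in degrees
    if angle < -10:
        side = "to your left"
    elif angle > 10:
        side = "to your right"
    else:
        side = "ahead of you"
    return angle, side

# Example: a bottle detected in the right half of a 640-pixel-wide frame.
angle, side = direction_from_box((400, 120, 560, 300), image_width=640)
print("Object roughly %d degrees, %s" % (round(angle), side))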
1.2 Plan of action
1.3 Challenges
The major challenge in this problem is the variable dimension of the output, which is caused by
the variable number of objects that can be present in any given input image. Any general
machine learning task requires a fixed dimension of input and output for the model to be
trained. Another important obstacle to the widespread adoption of object detection systems is
the requirement of real-time performance (>30 fps) while remaining accurate in detection. The
more complex the model is, the more time it requires for inference; the less complex the model
is, the lower the accuracy. This trade-off between accuracy and performance needs to be chosen
as per the application. The problem involves classification as well as regression, which must be
learnt by the model simultaneously. This adds to the complexity of the problem.
2 Related Work
There has been a lot of work in object detection using traditional computer vision techniques
(sliding windows, deformable part models). However, they lack the accuracy of deep learning
based techniques. Among the deep learning based techniques, two broad classes of methods are
prevalent: two-stage detection (RCNN [1], Fast RCNN [2], Faster RCNN [3]) and unified
detection (Yolo [4], SSD [5]). The major concepts involved in these techniques are explained
below.
2.1 Bounding Box
The bounding box is a rectangle drawn on the image which tightly fits the object in the image. A
bounding box exists for every instance of every object in the image. For each box, four numbers
(center x, center y, width, height) are predicted. This can be trained using a distance measure
between the predicted and ground truth bounding boxes. The distance measure is the Jaccard
distance, which computes the intersection over union (IoU) between the predicted and ground
truth boxes, as shown in Fig. 3.
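Since the Jaccard distance (IoU) drives both training and anchor matching, a minimal sketch of the computation for axis-aligned boxes given as (xmin, ymin, xmax, ymax) is shown below (an illustrative helper, not the project's exact code):

def iou(box_a, box_b):
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes shifted by half their width overlap with IoU = 1/3.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))   # 0.333...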
2.3 Two-stage Method
In this case, the proposals are extracted using some other computer vision technique and then
resized to a fixed input size for the classification network, which acts as a feature extractor. Then
an SVM is trained to classify between object and background (one SVM for each class). Also, a
bounding box regressor is trained that outputs corrections (offsets) for the proposal boxes. The
overall idea is shown in Fig. 5. These methods are very accurate but computationally intensive
(low fps).
Figure 5: Two-stage method ((a) Stage 1, (b) Stage 2)
2.4 Unified Method
The major techniques that follow this unified strategy are SSD (which uses different activation
maps at multiple scales for predicting classes and bounding boxes) and Yolo (which uses a single
activation map for predicting classes and bounding boxes). Using multiple scales helps to achieve
a higher mAP (mean average precision), since objects of different sizes in the image are detected
better. Thus the technique used in this project is SSD.
3. Approach
The network used in this project is based on Single shot detection (SSD) [5]. The architecture is
shown in Fig. 7.
The SSD normally starts with a VGG [6] model, which is converted to a fully convolutional
network. Then we attach some extra convolutional layers that help to handle bigger objects. The
output of the VGG network is a 38x38 feature map (conv4_3). The added layers produce 19x19,
10x10, 5x5, 3x3 and 1x1 feature maps. All these feature maps are used for predicting bounding
boxes at various scales (later layers are responsible for larger objects). The overall idea of SSD
is shown in Fig. 8. Some of the activations are passed to a sub-network that acts as a classifier
and a localizer.
Figure 8: Overall idea of SSD
Anchors (collections of boxes overlaid on the image at different spatial locations, scales and
aspect ratios) act as reference points on the ground truth images, as shown in Fig. 9. A model is
trained to make two predictions for each anchor (a small sketch of anchor generation is given
below):
• A discrete class
• A continuous offset by which the anchor needs to be shifted to fit the ground-truth bounding
box
Figure 9: Anchors
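To make the notion of anchors concrete, the sketch below generates the reference boxes for a single feature-map cell at a few scales and aspect ratios (the particular scales and ratios are illustrative assumptions, not the values used by the SSD paper):

import itertools

def anchors_for_cell(cx, cy, scales=(0.1, 0.2), aspect_ratios=(1.0, 2.0, 0.5)):
    # cx, cy: cell centre in normalised image coordinates (0..1).
    # Returns (cx, cy, w, h) boxes; wider boxes for ratio > 1, taller for ratio < 1.
    boxes = []
    for s, ar in itertools.product(scales, aspect_ratios):
        w = s * (ar ** 0.5)
        h = s / (ar ** 0.5)
        boxes.append((cx, cy, w, h))
    return boxes

# Anchors for the centre cell of a 19x19 feature map.
centre = (9 + 0.5) / 19.0
for box in anchors_for_cell(centre, centre):
    print(["%.3f" % v for v in box])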
During training, SSD matches ground truth annotations with anchors. Each element of the
feature map (cell) has a number of anchors associated with it. Any anchor with an IoU (Jaccard
distance) greater than 0.5 is considered a match. Consider the case shown in Fig. 10, where the
cat has two anchors matched and the dog has one anchor matched. Note that they have been
matched on different feature maps.
The loss function used is the multi-box classification and regression loss. The classification loss
used is the softmax cross entropy and, for regression, the smooth L1 loss is used. During
prediction, non-maxima suppression is used to filter the multiple boxes per object that may be
matched, as shown in Fig. 11.
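The non-maxima suppression step mentioned above can be sketched as a greedy, score-ordered filter; this illustrative version reuses the iou helper sketched earlier and is not the project's exact code:

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    # Sort box indices by confidence, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes that overlap the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))   # [0, 2]: the near-duplicate box is removed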
4 Experimental Results
4.1 Dataset
For the purpose of this project, a publicly available dataset hosted on GitHub is used. It
consists of 90 annotated images, downloaded from Google.
Software Packages
TensorFlow - TensorFlow is a free and open-source software library for dataflow and
differentiable programming across a range of tasks. It is a symbolic math library, and is also used
for machine learning applications such as neural networks. It is used for both research and
production at Google. TensorFlow was developed by the Google Brain team for internal Google
use. It was released under the Apache 2.0 open-source license on November 9, 2015.
eSpeak NG - eSpeak NG is a compact, open-source software speech synthesizer for Linux,
Windows, and other platforms. It uses a formant synthesis method, providing many languages in
a small size. Much of the programming for eSpeak NG's language support is done using rule files
with feedback from native speakers.
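On the Raspberry Pi, the detected class and direction can be spoken by calling the espeak-ng command line tool from Python; the sketch below is one simple way to do it, assuming espeak-ng is installed:

import subprocess

def speak(text, words_per_minute=150):
    # -s sets the speaking rate; the call blocks until the phrase has been spoken.
    subprocess.run(["espeak-ng", "-s", str(words_per_minute), text], check=True)

# Example: announce a detection together with its direction cue.
speak("Bottle detected, roughly 16 degrees to your right")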
Hardware
Raspberry Pi 3B+ - The specifications of the system (Raspberry Pi 3B+) on which the model is
trained and evaluated are as follows: CPU - ARM Cortex-A53 1.4 GHz, RAM - 1 GB. The
Raspberry Pi is a series of small single-board computers developed in the United Kingdom by
the Raspberry Pi Foundation to promote teaching of basic computer science in schools and in
developing countries. The original model became far more popular than anticipated, selling
outside its target market for uses such as robotics. It does not include peripherals (such as
keyboards and mice) and cases. However, some accessories have been included in several
official and unofficial bundles.
Pi Camera – It is a complete Raspberry Pi camera module with a 5 MP camera sensor.
Connection – The circuit connection is very simple, as shown in the image below. The
Raspberry Pi is connected to a mobile phone or laptop through a portable hotspot connection;
from that laptop or phone, the system can be switched on or off.
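Frames from the Pi Camera module can be captured with the picamera Python library; a minimal capture sketch (the resolution and filename are illustrative choices) is:

from time import sleep
from picamera import PiCamera

camera = PiCamera()
camera.resolution = (640, 480)   # small frames keep detection fast on the Pi
camera.start_preview()
sleep(2)                         # give the sensor time to adjust exposure
camera.capture("frame.jpg")      # this image is then passed to the detector
camera.stop_preview()
camera.close()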
4.2.1 Pre-processing
The annotated data is provided in XML format, which is read and stored into a pickle file along
with the images so that subsequent reading is faster. The images are also resized to a fixed size.
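A minimal sketch of this pre-processing step is shown below; the Pascal VOC-style XML tags, the directory layout and the output filename are assumptions made for illustration:

import glob
import os
import pickle
import xml.etree.ElementTree as ET

import numpy as np
from PIL import Image

TARGET_SIZE = (300, 300)   # fixed input size assumed for the network

def load_annotation(xml_path):
    # Parse one annotation file into a list of (label, (xmin, ymin, xmax, ymax)).
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.findall("object"):
        label = obj.find("name").text
        bb = obj.find("bndbox")
        box = tuple(int(bb.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((label, box))
    return objects

dataset = []
for xml_path in glob.glob("annotations/*.xml"):
    name = os.path.splitext(os.path.basename(xml_path))[0]
    image = Image.open(os.path.join("images", name + ".jpg")).resize(TARGET_SIZE)
    # (In a full version the box coordinates would also be rescaled to TARGET_SIZE.)
    dataset.append({"image": np.asarray(image), "objects": load_annotation(xml_path)})

# One pickle file avoids re-parsing the XML on every training run.
with open("dataset.pkl", "wb") as f:
    pickle.dump(dataset, f)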
4.2.2 Network
The model consists of the base network derived from VGG net, followed by the modified
convolutional layers for fine-tuning, and then the classifier and localizer networks. This creates
a deep network which is trained end-to-end on the dataset.
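In tf.keras terms this structure can be sketched as a VGG16 backbone without its dense head, a couple of extra convolutional layers, and small classifier/localizer heads. Only one of the several SSD feature maps is shown, and the class count and anchors per cell are assumptions, so this is a structural illustration rather than the trained model:

import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 6        # assumed number of categories (including background)
ANCHORS_PER_CELL = 4   # assumed anchors per feature-map cell

# VGG16 backbone without its fully connected head (fully convolutional).
base = tf.keras.applications.VGG16(weights=None, include_top=False,
                                   input_shape=(300, 300, 3))
x = base.output

# Extra convolutional layers that shrink the feature map for larger objects.
x = layers.Conv2D(256, 1, activation="relu")(x)
x = layers.Conv2D(512, 3, strides=2, padding="same", activation="relu")(x)

# Heads: per-anchor class scores and four box offsets at every cell.
cls_head = layers.Conv2D(ANCHORS_PER_CELL * NUM_CLASSES, 3, padding="same")(x)
loc_head = layers.Conv2D(ANCHORS_PER_CELL * 4, 3, padding="same")(x)

model = tf.keras.Model(inputs=base.input, outputs=[cls_head, loc_head])
model.summary()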
4.3 Qualitative Analysis
The results on the custom dataset are shown in the table below, which is divided into model,
ground truth and prediction columns.
[Qualitative results: Model | Ground Truth | Prediction (example images)]
The average precision for each object category is reported in the table below.
Class      Average Precision
Dog        0.891
Clock      0.719
Bottle     0.786
Laptop     0.864
Keyboard   0.728
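For reference, the average precision of a single class can be computed from the confidence-ranked detections once each has been marked as a true or false positive by IoU matching; a minimal sketch (area under the precision-recall curve by the rectangle rule) is:

def average_precision(is_true_positive, num_ground_truth):
    # is_true_positive: 1/0 flags sorted by detection confidence, highest first.
    tp = fp = 0
    ap = prev_recall = 0.0
    for flag in is_true_positive:
        tp += flag
        fp += 1 - flag
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        ap += precision * (recall - prev_recall)   # add one rectangle of the PR curve
        prev_recall = recall
    return ap

# Three detections (TP, FP, TP) against two ground-truth boxes -> AP ~ 0.83.
print(average_precision([1, 0, 1], num_ground_truth=2))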
Part B
1. Introduction - Ever heard of Hugh Herr? He is a famous American rock climber who has
shattered the limitations of his disabilities; he is a strong believer that technology can help
disabled persons live a normal life. In one of his TED talks, Herr said, “Humans are not
disabled. A person can never be broken. Our built environment, our technologies, are
broken and disabled. We the people need not accept our limitations, but can transcend
disability through technological innovation.” These were not just words; he lived his life by
them, and today he uses prosthetic legs and lives a normal life. So yes, technology can indeed
neutralize human disability; with this in mind, let us use the power of Arduino and simple
sensors to build a blind man's stick that can do more than an ordinary stick for visually
impaired persons.
2. Methodology-
2.1. System Architecture - The proposed system design of the smart stick, as shown in
Fig. 1, is composed of the following units:
2.3. Ultrasonic Sensor HC-SR04 - Ultrasonic refers to the production of sound waves above
the frequency of human hearing; it can be used in a variety of applications such as
sonic rulers, proximity detectors, movement detectors and liquid-level measurement.
Ultrasonic Ranging Module HC-SR04
Features: The ultrasonic ranging module HC-SR04 provides 2 cm - 400 cm non-contact
measurement, and the ranging accuracy can reach 3 mm. The module includes an ultrasonic
transmitter, a receiver and a control circuit. The basic principle of operation (illustrated in the
sketch after this list) is:
1. Apply a high-level trigger signal of at least 10 us to the IO trigger pin.
2. The module automatically sends eight 40 kHz pulses and detects whether a pulse signal
comes back.
3. If a signal comes back, the echo output goes high, and the duration of the high output is the
time from sending the ultrasonic burst to its return.
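Although the stick itself drives the HC-SR04 from an Arduino (see the program in Section 2.7), the same trigger/echo timing can be illustrated in Python on any GPIO-capable board such as a Raspberry Pi; the pin numbers below are illustrative assumptions:

import time
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24              # illustrative BCM pin numbers

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def read_distance_cm():
    # 1. A 10 microsecond high pulse on TRIG starts one measurement cycle.
    GPIO.output(TRIG, True)
    time.sleep(10e-6)
    GPIO.output(TRIG, False)
    # 2. The module emits eight 40 kHz pulses; ECHO stays high until the echo returns.
    pulse_start = pulse_end = time.time()
    while GPIO.input(ECHO) == 0:
        pulse_start = time.time()
    while GPIO.input(ECHO) == 1:
        pulse_end = time.time()
    # 3. Sound travels about 34300 cm/s; halve because the burst goes out and back.
    return (pulse_end - pulse_start) * 34300 / 2

print("Distance: %.1f cm" % read_distance_cm())
GPIO.cleanup()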
2.4. Buzzer
A buzzer is a transducer (converting electrical energy into mechanical energy) that typically
operates in the lower portion of the audible frequency range of 20 Hz to 20 kHz. This is
accomplished by converting an electric, oscillating signal in the audible range into mechanical
energy in the form of audible waves. The buzzer is used in this project to warn the blind person
of an obstacle by generating sound proportional to the distance from the obstacle.
2.5. Remote Control Unit
If the stick is lost, the remote unit is used for finding it. We use a 433 MHz RF transmitter and
receiver module in the remote unit.
An RF module (radio frequency module) is a (usually) small electronic device used to transmit
and/or receive radio signals between two devices. In an embedded system it is often desirable to
communicate with another device wirelessly. This wireless communication may be
accomplished through optical communication or through radio frequency (RF) communication.
For many applications the medium of choice is RF since it does not require line of sight. RF
communications incorporate a transmitter and a receiver. They are of various types and ranges.
Some can transmit up to 500 feet. RF modules are widely used in electronic design owing to the
difficulty of designing radio circuitry.
2.6. Circuit Diagram
Remote Control Unit Circuit
2.7. Program
// Blind stick obstacle alarm: beep faster as the obstacle gets closer.
#include <Ultrasonic.h>   // Ultrasonic library providing Ranging(CM)

int buzzer = 9;                 // buzzer on digital pin 9
Ultrasonic ultrasonic(12, 11);  // HC-SR04: trigger on pin 12, echo on pin 11

void setup() {
  Serial.begin(9600);
  pinMode(buzzer, OUTPUT);
}

void loop() {
  int distance = ultrasonic.Ranging(CM);  // measured distance in centimetres
  if (distance < 50) {                    // obstacle closer than 50 cm
    int dil = 2 * distance;               // beep interval shrinks as it gets closer
    digitalWrite(buzzer, HIGH);
    delay(dil);
    digitalWrite(buzzer, LOW);
    delay(dil);
  }
}
CONCLUSION
An accurate and efficient object detection system has been developed which achieves metrics
comparable with existing state-of-the-art systems. This project uses recent techniques in the
fields of computer vision and deep learning. It can be used in real-time applications which
require object detection as a pre-processing step in their pipeline. An important future scope
would be to train the system on video sequences for use in tracking applications; the addition of
a temporally consistent network would enable smoother and more optimal detection than
per-frame detection. On the other hand, with the proposed architecture of the blind stick and
object detection system, if constructed with utmost accuracy, blind people will be able to move
from one place to another without others' help, which increases their autonomy. The developed
smart stick also helps by alarming the person if any sign of danger or inconvenience is detected.
REFERENCES
[1] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for
accurate object detection and semantic segmentation. In The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), 2014.
[2] Ross Girshick. Fast R-CNN. In International Conference on Computer Vision (ICCV), 2015.
[3] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time
object detection with region proposal networks. In Advances in Neural Information Processing
Systems (NIPS), 2015.
[4] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once:
Unified, real-time object detection. In The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2016.
[5] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang
Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In ECCV, 2016.
[6] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556, 2014.