I-ASSIST INTERIM REPORT
Submitted by:
Aditya Agarwal, Ashmit Modi and Anshu Gupta
Department of Computer Science & Engineering
DECLARATION
We hereby declare that the work reported in the 5th semester Minor Project entitled “I-ASSIST”, submitted in partial fulfilment of the requirements for the award of the degree of B.Tech (CSE) at Jaypee University of Engineering and Technology, Guna, is, to the best of our knowledge and belief, free of any infringement of intellectual property rights or copyright. In case of any violation, we will be solely responsible.
Date: 20/1/2022
CERTIFICATE
This is to certify that the project titled “I-ASSIST” is the bona fide work carried out by Aditya Agarwal, Ashmit Modi and Anshu Gupta, students of B.Tech (CSE) at Jaypee University of Engineering and Technology, Guna (M.P.), during the academic year 2020-21, in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology (Computer Science and Engineering), and that the project has not previously formed the basis for the award of any other degree, diploma, fellowship or similar title.
Date: 20/1/2022
ABSTRACT
This project was an attempt at developing an object detection and tracking system using modern computer vision technology. The project delivers an implemented tracking system that combines optical and modern infra-red technology and is applicable to areas such as unsupervised surveillance and semi-autonomous control. It is stable and can be used as a stand-alone system or be embedded into a larger system. The project was implemented in five months and involved research into computer vision and robotic automation, as well as the use of cutting-edge hardware and software. The results of the project are presented in this report and amount to the application of computer vision techniques to tracking animate objects in both 2-dimensional and 3-dimensional scenes.
ACKNOWLEDGEMENT
We would like to express our gratitude and appreciation to all those who gave us the opportunity to complete this project. Special thanks are due to our supervisor, Mr. Navaljeet Singh Arora, whose help, stimulating suggestions and encouragement supported us throughout the development process and the writing of this report. We also sincerely thank him for the time spent proofreading and correcting our many mistakes. We would also like to thank our parents and friends, who helped us a lot in finalizing this project within the limited period. Last but not least, we are grateful to all the team members of I-ASSIST.
Thanking you
LIST OF FIGURES
Table of Contents
Title page i
Declaration of the Student ii
Certificate of the guide iii
Abstract iv
Acknowledgement v
List of Figures
Chapter-1 INTRODUCTION
1.1 Problem Definition
1.2 Project Overview
1.3 Hardware Specification
1.4 Software Specification
Chapter-2 LITERATURE SURVEY
Chapter-3 SYSTEM ANALYSIS & DESIGN
3.2 Flowchart
3.3 Sequence Diagram
Chapter-4 RESULTS/OUTPUTS
Chapter-5 CONCLUSIONS/RECOMMENDATIONS
Chapter-6 REFERENCES
Chapter-7 APPENDICES
7.1 Details of software used
7.2 Code
CHAPTER-1
INTRODUCTION
According to a study conducted by MIT, about 80% of the information collected from the environment is transferred to the brain through the eyes.
This suggests that among the five sense organs (eyes, ears, nose, tongue and skin), the eyes play the most important role.
To help blind people walk without hitting obstacles, identify obstacles in their path, recognize known family members and carry out many more such tasks, we propose the solution I-ASSIST.
The project “I-Assist” mainly focuses on providing vision assistance to blind people by:
● helping them detect objects in front of them so that they can walk without hitting any obstacle;
● identifying the objects in front of the person;
● recognizing known faces (family members, friends and relatives);
● helping them cross roads by identifying traffic light colours;
● sending an SOS message to their contacts in case of an emergency.
Blind people face difficulty in walking, as they might hit an obstacle in their path, and they face similar problems with tasks such as crossing at a traffic light or recognizing faces, because they cannot see.
The goal of “I-ASSIST” is to make the world more accessible to people who are blind or have low vision. The main goal of the project is to help these people and make their lives easier by providing assistance with different tasks. At the end of this project, we expect our application to be fully stable and capable of giving the desired output to its users.
1.3 Hardware Specification
● CPU: 3.0 GHz or faster 64-bit dual-core processor, e.g. Intel Core 2 Duo
● Memory: 4 GB (DDR4/DDR2) RAM or more
● Camera for capturing the image
● Speaker (1 mW), e.g. an in-ear earphone
1.4 Software Specification
● Python interpreter
● Operating system: Linux (Ubuntu 16.04 to 17.10)
CHAPTER-2
LITERATURE SURVEY
The project will enable users to identify objects and obstacles in their path, which will help them walk freely without hitting anything, to identify people they know so that they can easily recognize them, and to easily contact them in case of an emergency.
The software will be provided at a certain cost, which covers the cost of the hardware and software used to build the application; the system is easy to operate and can be used anywhere.
● Financial feasibility: The price of the equipment will not be too high, and only a minimal cost will be charged to the user, covering the cost of the hardware and the software.
● Technical feasibility: All of the hardware and software used is freely available in the market, and the technologies used are open source, which means anyone can contribute to them. The data collected from the user will be stored on the user's local system and will be used to improve the accuracy and functioning of the application.
CHAPTER-3
SYSTEM ANALYSIS & DESIGN
PYTHON
Python is a high-level, interpreted, general-purpose programming language with an extensive ecosystem of scientific computing and machine learning libraries. It is the primary language used to implement this project.
GOOGLE COLAB
Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing free access to computing resources, including GPUs.
YOLO V5
All previous object detection algorithms used regions to localize the object within the image: the network does not look at the complete image, but only at the parts of the image that have a high probability of containing the object. YOLO (You Only Look Once) is an object detection algorithm that differs from the region-based algorithms described above. In YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for these boxes.
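As an illustration of this single-pass approach, the following minimal sketch loads a pre-trained YOLOv5 model through PyTorch Hub and runs it on a single image. The image path is a placeholder, and this is not the exact configuration used in the project.

import torch

# Load a small pre-trained YOLOv5 model from PyTorch Hub
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Run inference on a single image (placeholder path)
results = model("street.jpg")
results.print()                   # summary of detected classes
boxes = results.pandas().xyxy[0]  # one row per detection: xmin, ymin, xmax, ymax, confidence, class, name
print(boxes)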
OpenCV
OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, which include a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high-resolution image of an entire scene, find similar images in an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, and more. OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 18 million. The library is used extensively by companies, research groups and governmental bodies.
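As a small, illustrative example of the library (not the project's exact pipeline), the sketch below uses one of OpenCV's bundled Haar-cascade models to detect faces in an image; the file name sample.jpg is a placeholder.

import cv2

# Load one of the Haar-cascade face models that ships with OpenCV
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# Read a placeholder image and convert it to grayscale for detection
img = cv2.imread("sample.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces and draw a rectangle around each one
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("sample_faces.jpg", img)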
TENSORFLOW
TensorFlow is an open-source machine learning framework developed by Google. It provides tools for building, training and deploying neural networks, and is widely used for tasks such as image classification and object detection.
NUMPY
NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.
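A small sketch of the kind of array operations NumPy provides (the values are arbitrary):

import numpy as np

# Create a 2 x 3 array and apply vectorized operations to it
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(a.shape)         # (2, 3)
print(a.mean(axis=0))  # column means: [2.5 3.5 4.5]
print(a @ a.T)         # 2 x 2 matrix product of a with its transpose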
3.2 Flowchart
Fig 3.1
Explanation:
Face recognition extends beyond detecting the presence of a human face to determining whose face it is. The process uses a computer application that captures a digital image of an individual's face -- sometimes taken from a video frame -- and compares it to the images in a database of stored records.
Face detection helps identify which parts of an image or video should be focused on
to determine age, gender and emotions using facial expressions. In a facial
recognition system -- which maps an individual's facial features mathematically and
stores the data as a faceprint -- face detection data is required for the algorithms that
discern which parts of an image or video are needed to generate a faceprint. Once
identified, the new faceprint can be compared with stored faceprints to determine if
there is a match.
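The sketch below illustrates this encode-and-compare step with the face_recognition library used later in the appendix code; the image file names are placeholders.

import face_recognition

# Build a "faceprint" (128-dimensional encoding) for a known person
known_image = face_recognition.load_image_file("known_person.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Encode the face found in a new image and compare it with the stored faceprint
query_image = face_recognition.load_image_file("query.jpg")
query_encoding = face_recognition.face_encodings(query_image)[0]

match = face_recognition.compare_faces([known_encoding], query_encoding)[0]
distance = face_recognition.face_distance([known_encoding], query_encoding)[0]
print("Match:", match, "distance:", distance)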
Fig 3.2
Explanation:
Object detection is a computer vision technique that works to identify and locate objects within an image or video. Specifically, object detection draws bounding boxes around these detected objects, which allows us to locate where said objects are in (or how they move through) a given scene.
An encoder takes an image as input and runs it through a series of blocks and layers
that learn to extract statistical features used to locate and label objects. Outputs from
the encoder are then passed to a decoder, which predicts bounding boxes and labels
for each object.
The bounding boxes are then shown as the model's output for each object identified in the picture, and each predicted bounding box is reported together with a confidence score.
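A minimal sketch of this detect-and-label flow, using the same OpenCV DNN detection API as the code in the appendix; the model files and image path are placeholders that must exist locally.

import cv2

# Pre-trained SSD MobileNet model files (placeholders; they must be downloaded separately)
net = cv2.dnn_DetectionModel("frozen_inference_graph.pb",
                             "ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt")
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

img = cv2.imread("scene.jpg")  # placeholder image path
class_ids, confidences, boxes = net.detect(img, confThreshold=0.5)

# Draw every detection with its confidence score
if len(class_ids) != 0:
    for class_id, conf, box in zip(class_ids.flatten(), confidences.flatten(), boxes):
        x, y, w, h = box
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(img, "class %d: %.2f" % (class_id, conf), (x, y - 5),
                    cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0), 2)
cv2.imwrite("scene_detections.jpg", img)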
3.3 Sequence Diagram
Fig 3.3
Explanation:
1. The user opens the application, and the webcam is accessed to capture a live camera feed from the environment.
2. The image captured from the feed is then passed to the face detection and object detection models, which look for familiar faces and objects in the camera feed and announce the output in the form of voice (a sketch of this voice-output step follows the list).
3. The model also offers a traffic light detection feature, which helps the user identify traffic lights.
4. The user can also send an SOS message in case of an emergency with the help of the application.
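The voice output step can be implemented with the gTTS and playsound packages imported in the appendix code, where a speak_Text helper is referenced in comments. The sketch below is one illustrative way such a helper could be written; the helper name and temporary file name are assumptions, not taken from the report.

import os
from gtts import gTTS
from playsound import playsound

def speak_text(text, filename="speech.mp3"):
    """Convert a short text message to speech and play it aloud."""
    tts = gTTS(text=text, lang="en")
    tts.save(filename)
    playsound(filename)
    os.remove(filename)  # clean up the temporary audio file

speak_text("Person detected ahead")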
Fig 3.4
Explanation:
The user starts the system, and the live feed from the camera is taken as input for the object detection and face detection programs.
The face detection model scans the image feed and checks for the known faces available in the database, which are then matched against the input data.
The result is then output to the screen with bounding boxes, and the name of the person, if identified, is spoken by the program.
The obstacle detection program scans the image feed, detects any object present in it and announces the output, with its confidence, in the form of speech.
The system also provides the facility to send an SOS signal to the user's contact list in case of an emergency.
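The report does not specify how the SOS message is delivered. Purely as an illustration, the sketch below assumes an SMS gateway such as Twilio; the account credentials and phone numbers are placeholders.

from twilio.rest import Client

# Placeholder credentials and numbers -- the report does not specify an SMS provider
ACCOUNT_SID = "your_account_sid"
AUTH_TOKEN = "your_auth_token"

def send_sos(to_number, from_number, location_hint="unknown location"):
    """Send a short SOS text message to an emergency contact."""
    client = Client(ACCOUNT_SID, AUTH_TOKEN)
    message = client.messages.create(
        body="SOS! I need help. Last known location: " + location_hint,
        from_=from_number,
        to=to_number,
    )
    return message.sid

# Example usage with placeholder numbers
send_sos(to_number="+10000000000", from_number="+10000000001")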
CHAPTER-4
RESULTS/OUTPUTS
With the help of this project, people are assisted in identifying the objects in front of them, recognizing known faces (family members, friends and relatives) and crossing roads by identifying traffic light colours.
Based on the experimental results, we are able to detect objects more precisely and identify them individually, with the exact x, y location of each object in the picture. This report also provides experimental results for different object detection and identification methods and compares them.
SCREENSHOTS
Fig 3.5
Fig 3.6
CHAPTER-5
CONCLUSIONS/RECOMMENDATIONS
● The main goal of the project is to help people and make their lives easier by providing assistance with different tasks.
● At the end of this project, we expect our application to be fully stable in its current version.
Based on the experimental results, we are able to detect objects more precisely and identify them individually, with the exact x, y location of each object in the picture. This report also provides experimental results for different object detection and identification methods and compares them.
CHAPTER-6
REFERENCES
● https://sourceforge.net/
● Wikipedia
● https://medium.com/
● https://github.com/ultralytics/yolov5
CHAPTER-7
APPENDICES
7.1 Details of software used
Text Editor
Visual Studio Code
Google Colab
TensorFlow
7.1.1 TEXT EDITOR
A text editor is a type of computer program that edits plain text. Text editors are provided with operating systems and software development packages, and can be used to change files such as configuration files, documentation files and programming language source code.
7.1.2 VISUAL STUDIO CODE
Visual Studio Code is a streamlined code editor with support for development operations like debugging, task running and version control. It aims to provide just the tools a developer needs for a quick code-build-debug cycle, and leaves more complex workflows to fuller-featured IDEs, such as the Visual Studio IDE.
7.1.3 GOOGLE COLAB
Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing free access to computing resources, including GPUs.
7.2 Code
import os

import cv2
import face_recognition
import numpy as np
from gtts import gTTS
from playsound import playsound

# Load the reference images of known people and compute their face encodings
chris = face_recognition.load_image_file('chris.png')
chris_encodings = face_recognition.face_encodings(chris)[0]
robert = face_recognition.load_image_file('robert.png')
robert_encodings = face_recognition.face_encodings(robert)[0]

known_face_encodings = [chris_encodings, robert_encodings]
known_face_names = ["Chris", "Robert"]

# Class labels for the object detection model (COCO dataset)
classNames = []
with open('coco.names', 'r') as f:
    classNames = f.read().splitlines()
print(classNames)

font = cv2.FONT_HERSHEY_PLAIN
Colors = np.random.uniform(0, 255, size=(len(classNames), 3))

# Pre-trained SSD MobileNet model for object detection
weightsPath = "frozen_inference_graph.pb"
configPath = "ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt"
net = cv2.dnn_DetectionModel(weightsPath, configPath)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

# Detection and non-maximum-suppression thresholds, and the webcam feed
thres = 0.5
nms_threshold = 0.2
cap = cv2.VideoCapture(0)

while True:
    flag, frame = cap.read()
    if not flag:
        print("Could not access the camera")
        break

    # Work on a quarter-size copy of the frame to speed up face recognition
    small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
    rgb_small_frame = cv2.cvtColor(small_frame, cv2.COLOR_BGR2RGB)

    # Locate and encode all faces in the current frame
    face_locations = face_recognition.face_locations(rgb_small_frame)
    face_encodings = face_recognition.face_encodings(
        rgb_small_frame, face_locations)

    face_names = []
    for face_encoding in face_encodings:
        matches = face_recognition.compare_faces(
            known_face_encodings, face_encoding)
        name = "UNKNOWN"
        face_distances = face_recognition.face_distance(
            known_face_encodings, face_encoding)
        best_match_index = np.argmin(face_distances)
        if matches[best_match_index]:
            name = known_face_names[best_match_index]
        face_names.append(name)
    print(face_names)

    # Draw a box and name label around every recognised face; the coordinates
    # are scaled back up by 4 because the frame was shrunk to a quarter size
    for (top, right, bottom, left), name in zip(face_locations, face_names):
        top *= 4
        right *= 4
        bottom *= 4
        left *= 4
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)
        cv2.putText(frame, name, (left + 6, bottom - 6),
                    cv2.FONT_HERSHEY_DUPLEX, 1.0, (255, 255, 255), 1)
    cv2.imshow("Frame", frame)

    # Object detection on a fresh frame from the camera
    success, img = cap.read()
    classIds, confs, bbox = net.detect(img, confThreshold=thres)
    bbox = list(bbox)
    confs = list(np.array(confs).reshape(1, -1)[0])
    confs = list(map(float, confs))

    # Suppress overlapping boxes and draw the remaining detections
    indices = cv2.dnn.NMSBoxes(bbox, confs, thres, nms_threshold)
    if len(classIds) != 0:
        for i in np.array(indices).flatten():
            box = bbox[i]
            confidence = str(round(confs[i], 2))
            classId = int(np.array(classIds).flatten()[i])
            color = Colors[classId - 1]
            x, y, w, h = box[0], box[1], box[2], box[3]
            cv2.rectangle(img, (x, y), (x + w, y + h), color, thickness=2)
            cv2.putText(img, classNames[classId - 1] + " " + confidence,
                        (x + 10, y + 20), font, 1, color, 2)
            # speak_Text(classNames[classId - 1])
            print(classNames[classId - 1])
    cv2.imshow("Output", img)

    k = cv2.waitKey(1)
    if k == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()