I-ASSIST INTERIM REPORT
Submitted by:
Aditya Agarwal, Ashmit Modi and Anshu Gupta
Department of Computer Science & Engineering
DECLARATION
We hereby declare that the work reported in the 5th semester Minor Project entitled “I-ASSIST”, submitted in partial fulfilment of the requirements for the award of the degree of B.Tech (CSE) at Jaypee University of Engineering and Technology, Guna, is, to the best of our knowledge and belief, free of any infringement of intellectual property rights or copyright. In case of any violation, we will be solely responsible.
Date: 20/1/2022
CERTIFICATE
This is to certify that the project titled “I-ASSIST” is the bona fide work carried out by Aditya Agarwal, Ashmit Modi and Anshu Gupta, students of B.Tech (CSE) at Jaypee University of Engineering and Technology, Guna (M.P.), during the academic year 2020-21, in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology (Computer Science and Engineering), and that the project has not previously formed the basis for the award of any other degree, diploma, fellowship or similar title.
Date: 20/1/2022
ABSTRACT
This project was an attempt at developing an object detection and tracking system using modern computer vision technology. The project delivers an implemented tracking system that combines optical and modern infra-red technology and is applicable to areas such as unsupervised surveillance and semi-autonomous control. It is stable and can be used as a stand-alone system or be embedded into a larger system. The project was implemented in five months and involved research into computer vision and robotic automation, as well as the use of cutting-edge hardware and software. The results of the project are presented in this report and amount to the application of computer vision techniques to tracking animate objects in both 2-dimensional and 3-dimensional scenes.
ACKNOWLEDGEMENT
We would like to express our gratitude and appreciation to all those who gave us the opportunity to complete this project. Special thanks are due to our supervisor, Mr. Navaljeet Singh Arora, whose help, stimulating suggestions and encouragement supported us throughout the development process and the writing of this report. We also sincerely thank him for the time spent proofreading and correcting our many mistakes. We would also like to thank our parents and friends, who helped us a lot in finalizing this project within the limited period. Last but not least, we are grateful to all the team members of I-ASSIST.
Thanking you
LIST OF FIGURES
Table of Contents
Title page i
Declaration of the Student ii
Certificate of the guide iii
Abstract iv
Acknowledgement v
List of Figures
Chapter-1 INTRODUCTION
1.1 Problem Definition
1.2 Project Overview
1.3 Hardware Specification
1.4 Software Specification
Chapter-2 LITERATURE SURVEY
Chapter-3 SYSTEM ANALYSIS & DESIGN
3.2 Flowchart
3.3 Sequence Diagram
Chapter-4 RESULTS/OUTPUTS
Chapter-5 CONCLUSIONS/RECOMMENDATIONS
Chapter-6 REFERENCES
Chapter-7 APPENDICES
7.1 Details of software used
7.2 Code
CHAPTER-1
INTRODUCTION
According to a study conducted by MIT, about 80% of the information collected from the environment is transferred to the brain through the eyes.
This suggests that among the five sense organs (eyes, ears, nose, tongue and skin), the eyes play the most important role.
To help blind people walk without hitting obstacles, identify obstacles in their path, recognize known family members and carry out many more such tasks, we propose the solution I-ASSIST.
The project “I-Assist” mainly focuses on providing vision assistance to blind people by:
● helping them detect objects in front of them so that they can walk without hitting any obstacle;
● identifying the objects in front of the person;
● recognizing known faces (family members, friends and relatives);
● helping them cross roads by identifying traffic light colours;
● sending an SOS message to their contacts in case of an emergency.
Blind people face difficulty in walking, as they might hit an obstacle in their path, and they face similar problems with tasks such as crossing at a traffic light or recognizing faces, because they cannot see.
The goal of “I-ASSIST” is to make the world more accessible to people who are blind or have low vision. The main goal of the project is to help these people and make their lives easier by providing assistance with different tasks. At the end of this project, we expect our application to be fully stable and capable of giving the desired output to its users.
1.3 Hardware Specification
● CPU: 3.0 GHz or faster 64-bit dual-core processor, e.g. Intel Core 2 Duo
● Memory: 4 GB (DDR4/DDR2) RAM or more
● Camera for capturing the image
● Speaker (1 mW), e.g. an in-ear earphone
1.4 Software Specification
● Python interpreter
● Operating system: Linux (Ubuntu 16.04 to 17.10)
CHAPTER-2
LITERATURE SURVEY
The project will enable users to identify objects and obstacles in their path, which will help them walk freely without hitting anything, to identify people they know so that they can easily recognize them, and to easily contact them in case of an emergency.
The software will be provided at a certain cost, which covers the cost of the hardware and software used to build the application; the system is easy to operate and can be used anywhere.
● Financial feasibility: The price of the equipment will not be too high, and only a minimal cost will be charged to the user, covering the cost of the hardware and the software.
● Technical feasibility: All of the hardware and software used is freely available in the market, and the technologies used are open source, which means anyone can contribute to them. The data collected from the user will be stored on the user's local system and will be used to improve the accuracy and functioning of the application.
CHAPTER-3
SYSTEM ANALYSIS & DESIGN
PYTHON
Python is a high-level, interpreted, general-purpose programming language with an extensive ecosystem of scientific computing and machine learning libraries. It is the primary language used to implement this project.
GOOGLE COLAB
Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing free access to computing resources, including GPUs.
YOLO V5
All previous object detection algorithms used regions to localize the object within the image: the network does not look at the complete image, but only at the parts of the image that have a high probability of containing the object. YOLO (You Only Look Once) is an object detection algorithm that differs from the region-based algorithms described above. In YOLO, a single convolutional network predicts the bounding boxes and the class probabilities for these boxes.
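As an illustration of this single-pass approach, the following minimal sketch loads a pre-trained YOLOv5 model through PyTorch Hub and runs it on a single image. The image path is a placeholder, and this is not the exact configuration used in the project.

import torch

# Load a small pre-trained YOLOv5 model from PyTorch Hub
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# Run inference on a single image (placeholder path)
results = model("street.jpg")
results.print()                   # summary of detected classes
boxes = results.pandas().xyxy[0]  # one row per detection: xmin, ymin, xmax, ymax, confidence, class, name
print(boxes)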
OpenCV
OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, which include a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high-resolution image of an entire scene, find similar images in an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, and more. OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 18 million. The library is used extensively by companies, research groups and governmental bodies.
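As a small, illustrative example of the library (not the project's exact pipeline), the sketch below uses one of OpenCV's bundled Haar-cascade models to detect faces in an image; the file name sample.jpg is a placeholder.

import cv2

# Load one of the Haar-cascade face models that ships with OpenCV
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# Read a placeholder image and convert it to grayscale for detection
img = cv2.imread("sample.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces and draw a rectangle around each one
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("sample_faces.jpg", img)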
TENSORFLOW
TensorFlow is an open-source machine learning framework developed by Google. It provides tools for building, training and deploying neural networks, and is widely used for tasks such as image classification and object detection.
NUMPY
NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.
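A small sketch of the kind of array operations NumPy provides (the values are arbitrary):

import numpy as np

# Create a 2 x 3 array and apply vectorized operations to it
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(a.shape)         # (2, 3)
print(a.mean(axis=0))  # column means: [2.5 3.5 4.5]
print(a @ a.T)         # 2 x 2 matrix product of a with its transpose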
3.2 Flowchart
Fig 3.1
Explanation:
Face recognition extends beyond detecting the presence of a human face to determining whose face it is. The process uses a computer application that captures a digital image of an individual's face -- sometimes taken from a video frame -- and compares it to the images in a database of stored records.
Face detection helps identify which parts of an image or video should be focused on
to determine age, gender and emotions using facial expressions. In a facial
recognition system -- which maps an individual's facial features mathematically and
stores the data as a faceprint -- face detection data is required for the algorithms that
discern which parts of an image or video are needed to generate a faceprint. Once
identified, the new faceprint can be compared with stored faceprints to determine if
there is a match.
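The sketch below illustrates this encode-and-compare step with the face_recognition library used later in the appendix code; the image file names are placeholders.

import face_recognition

# Build a "faceprint" (128-dimensional encoding) for a known person
known_image = face_recognition.load_image_file("known_person.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Encode the face found in a new image and compare it with the stored faceprint
query_image = face_recognition.load_image_file("query.jpg")
query_encoding = face_recognition.face_encodings(query_image)[0]

match = face_recognition.compare_faces([known_encoding], query_encoding)[0]
distance = face_recognition.face_distance([known_encoding], query_encoding)[0]
print("Match:", match, "distance:", distance)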
Fig 3.2
Explanation:
Object detection is a computer vision technique that works to identify and locate objects within an image or video. Specifically, object detection draws bounding boxes around these detected objects, which allows us to locate where said objects are in (or how they move through) a given scene.
An encoder takes an image as input and runs it through a series of blocks and layers
that learn to extract statistical features used to locate and label objects. Outputs from
the encoder are then passed to a decoder, which predicts bounding boxes and labels
for each object.
The bounding boxes are then shown as the model's output for each object identified in the picture, and each predicted bounding box is reported together with a confidence score.
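A minimal sketch of this detect-and-label flow, using the same OpenCV DNN detection API as the code in the appendix; the model files and image path are placeholders that must exist locally.

import cv2

# Pre-trained SSD MobileNet model files (placeholders; they must be downloaded separately)
net = cv2.dnn_DetectionModel("frozen_inference_graph.pb",
                             "ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt")
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

img = cv2.imread("scene.jpg")  # placeholder image path
class_ids, confidences, boxes = net.detect(img, confThreshold=0.5)

# Draw every detection with its confidence score
if len(class_ids) != 0:
    for class_id, conf, box in zip(class_ids.flatten(), confidences.flatten(), boxes):
        x, y, w, h = box
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(img, "class %d: %.2f" % (class_id, conf), (x, y - 5),
                    cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0), 2)
cv2.imwrite("scene_detections.jpg", img)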
3.3 Sequence Diagram
Fig 3.3
Explanation:
1. The user opens the application, and the webcam is accessed to capture a live camera feed from the environment.
2. The image captured from the feed is then passed to the face detection and object detection models, which look for familiar faces and objects in the camera feed and announce the output in the form of voice (a sketch of this voice-output step follows the list).
3. The model also offers a traffic light detection feature, which helps the user identify traffic lights.
4. The user can also send an SOS message in case of an emergency with the help of the application.
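The voice output step can be implemented with the gTTS and playsound packages imported in the appendix code, where a speak_Text helper is referenced in comments. The sketch below is one illustrative way such a helper could be written; the helper name and temporary file name are assumptions, not taken from the report.

import os
from gtts import gTTS
from playsound import playsound

def speak_text(text, filename="speech.mp3"):
    """Convert a short text message to speech and play it aloud."""
    tts = gTTS(text=text, lang="en")
    tts.save(filename)
    playsound(filename)
    os.remove(filename)  # clean up the temporary audio file

speak_text("Person detected ahead")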
Fig 3.4
Explanation:
The user starts the system, and the live feed from the camera is taken as input for the object detection and face detection programs.
The face detection model scans the image feed and checks for the known faces available in the database, which are then matched against the input data.
The result is then output to the screen with bounding boxes, and the name of the person, if identified, is spoken by the program.
The obstacle detection program scans the image feed, detects any object present in it and announces the output, with its confidence, in the form of speech.
The system also provides the facility to send an SOS signal to the user's contact list in case of an emergency.
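The report does not specify how the SOS message is delivered. Purely as an illustration, the sketch below assumes an SMS gateway such as Twilio; the account credentials and phone numbers are placeholders.

from twilio.rest import Client

# Placeholder credentials and numbers -- the report does not specify an SMS provider
ACCOUNT_SID = "your_account_sid"
AUTH_TOKEN = "your_auth_token"

def send_sos(to_number, from_number, location_hint="unknown location"):
    """Send a short SOS text message to an emergency contact."""
    client = Client(ACCOUNT_SID, AUTH_TOKEN)
    message = client.messages.create(
        body="SOS! I need help. Last known location: " + location_hint,
        from_=from_number,
        to=to_number,
    )
    return message.sid

# Example usage with placeholder numbers
send_sos(to_number="+10000000000", from_number="+10000000001")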
CHAPTER-4
RESULTS/OUTPUTS
With the help of this project, people are assisted in identifying the objects in front of them, recognizing known faces (family members, friends and relatives) and crossing roads by identifying traffic light colours.
Based on the experimental results, we are able to detect objects more precisely and identify them individually, with the exact x, y location of each object in the picture. This report also provides experimental results for different object detection and identification methods and compares them.
SCREENSHOTS
Fig 3.5
Fig 3.6
CHAPTER-5
CONCLUSIONS/RECOMMENDATIONS
● The main goal of the project is to help people and make their lives easier by providing assistance with different tasks.
● At the end of this project, we expect our application to be fully stable in its current version.
Based on the experimental results, we are able to detect objects more precisely and identify them individually, with the exact x, y location of each object in the picture. This report also provides experimental results for different object detection and identification methods and compares them.
CHAPTER-6
REFERENCES
● https://sourceforge.net/
● Wikipedia
● https://medium.com/
● https://github.com/ultralytics/yolov5
CHAPTER-7
APPENDICES
7.1 Details of software used
Text Editor
Visual Studio Code
Google Colab
TensorFlow
7.1.1 TEXT EDITOR
A text editor is a type of computer program that edits plain text. Text editors are provided with operating systems and software development packages, and can be used to change files such as configuration files, documentation files and programming language source code.
7.1.2 VISUAL STUDIO CODE
Visual Studio Code is a streamlined code editor with support for development operations like debugging, task running and version control. It aims to provide just the tools a developer needs for a quick code-build-debug cycle, and leaves more complex workflows to fuller-featured IDEs, such as the Visual Studio IDE.
7.1.3 GOOGLE COLAB
Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing free access to computing resources, including GPUs.
7.2 Code
import os

import cv2
import face_recognition
import numpy as np
from gtts import gTTS
from playsound import playsound

# Load the reference images of known people and compute their face encodings
chris = face_recognition.load_image_file('chris.png')
chris_encodings = face_recognition.face_encodings(chris)[0]
robert = face_recognition.load_image_file('robert.png')
robert_encodings = face_recognition.face_encodings(robert)[0]

known_face_encodings = [chris_encodings, robert_encodings]
known_face_names = ["Chris", "Robert"]

# Class labels for the object detection model (COCO dataset)
classNames = []
with open('coco.names', 'r') as f:
    classNames = f.read().splitlines()
print(classNames)

font = cv2.FONT_HERSHEY_PLAIN
Colors = np.random.uniform(0, 255, size=(len(classNames), 3))

# Pre-trained SSD MobileNet model for object detection
weightsPath = "frozen_inference_graph.pb"
configPath = "ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt"
net = cv2.dnn_DetectionModel(weightsPath, configPath)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

# Detection and non-maximum-suppression thresholds, and the webcam feed
thres = 0.5
nms_threshold = 0.2
cap = cv2.VideoCapture(0)

while True:
    flag, frame = cap.read()
    if not flag:
        print("Could not access the camera")
        break

    # Work on a quarter-size copy of the frame to speed up face recognition
    small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
    rgb_small_frame = cv2.cvtColor(small_frame, cv2.COLOR_BGR2RGB)

    # Locate and encode all faces in the current frame
    face_locations = face_recognition.face_locations(rgb_small_frame)
    face_encodings = face_recognition.face_encodings(
        rgb_small_frame, face_locations)

    face_names = []
    for face_encoding in face_encodings:
        matches = face_recognition.compare_faces(
            known_face_encodings, face_encoding)
        name = "UNKNOWN"
        face_distances = face_recognition.face_distance(
            known_face_encodings, face_encoding)
        best_match_index = np.argmin(face_distances)
        if matches[best_match_index]:
            name = known_face_names[best_match_index]
        face_names.append(name)
    print(face_names)

    # Draw a box and name label around every recognised face; the coordinates
    # are scaled back up by 4 because the frame was shrunk to a quarter size
    for (top, right, bottom, left), name in zip(face_locations, face_names):
        top *= 4
        right *= 4
        bottom *= 4
        left *= 4
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)
        cv2.putText(frame, name, (left + 6, bottom - 6),
                    cv2.FONT_HERSHEY_DUPLEX, 1.0, (255, 255, 255), 1)
    cv2.imshow("Frame", frame)

    # Object detection on a fresh frame from the camera
    success, img = cap.read()
    classIds, confs, bbox = net.detect(img, confThreshold=thres)
    bbox = list(bbox)
    confs = list(np.array(confs).reshape(1, -1)[0])
    confs = list(map(float, confs))

    # Suppress overlapping boxes and draw the remaining detections
    indices = cv2.dnn.NMSBoxes(bbox, confs, thres, nms_threshold)
    if len(classIds) != 0:
        for i in np.array(indices).flatten():
            box = bbox[i]
            confidence = str(round(confs[i], 2))
            classId = int(np.array(classIds).flatten()[i])
            color = Colors[classId - 1]
            x, y, w, h = box[0], box[1], box[2], box[3]
            cv2.rectangle(img, (x, y), (x + w, y + h), color, thickness=2)
            cv2.putText(img, classNames[classId - 1] + " " + confidence,
                        (x + 10, y + 20), font, 1, color, 2)
            # speak_Text(classNames[classId - 1])
            print(classNames[classId - 1])
    cv2.imshow("Output", img)

    k = cv2.waitKey(1)
    if k == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()