Major Project Report - (Revised4)
VOICE BASED CONTROL SYSTEM AND ASSISTANT
A Project Report
Submitted By:
Kushagra Bajpai.
Kunal Gupta.
Place: Lucknow
CERTIFICATE
This is to certify that the project titled “VOICE BASED CONTROL SYSTEM
AND ASSISTANT” is the bona fide work carried out by Kushagra Bajpai, Kumari
Saniya Ansari, Ayush Kumar Singh, Mohd. Saif Khan and Kunal Gupta, the
students of Bachelor of Technology (in Computer Science and Engineering) of Babu
Banarasi Das University, Lucknow, Uttar Pradesh, during the academic year 2020-21,
in partial fulfilment of the requirements for the award of the degree of Bachelor of
Technology (Computer Science and Engineering) and that the project has not formed
the basis for the award previously of any other degree, diploma, fellowship or any
other similar title.
Place: Lucknow.
ACKNOWLEDGEMENT
The completion of this project gives us great pleasure. We wish to express our
heartfelt gratitude to all the people who played a crucial role in the research
for this project. Without their active cooperation, this project could not have
been completed within the specified time limit.
We would like to thank our project guide and respected lecturer, Ms. Sarita Soni,
for her guidance, support, cooperation and patience throughout this project.
We would also like to extend our deepest gratitude to all those who have
directly and indirectly guided us in completing this project and project report.
Many people, especially our respected professors and friends, made valuable
comments and suggestions during the project, which inspired us to improve it.
We thank them all for their direct and indirect help in completing this project
report.
ABSTRACT
The project aims to develop a voice-based control system and personal assistant for
Windows-based systems.
It can perform basic tasks on a desktop machine, such as launching
applications, playing or switching music and videos, setting and playing reminders,
telling the date and time, taking screenshots and sending email. More complex tasks
include using the system camera (if available) to capture images and record video,
detecting faces in still images and in real time, and running interactive games.
TABLE OF CONTENTS
I. Declaration
II. Certificate
III. Acknowledgement
IV. Abstract
1. Introduction
1.1. Background
1.3.1. Purpose
1.3.2. Objectives
1.3.3. Scope
1.3.4. Applicability
2. Literature Survey
2.6. Technologies to be used
5. Results
6. Conclusion And Recommendations
7. References
8. Appendices
(1) INTRODUCTION
Voice search is overtaking text search. Web searches conducted via mobile devices
have only just overtaken those carried out on a computer, and analysts are already
predicting that 50% of searches will be made by voice by 2022. Digital assistants
are also becoming smarter: they can detect intent, pick out important information,
automate processes such as handling email, and deliver personalized responses.
This project was started on the premise that there is a sufficient amount of openly
available data and information on the web that can be used to build a digital
assistant capable of making intelligent decisions for routine user activities.
(1.1) BACKGROUND
There are a variety of terms that refer to agents that can perform tasks or
services for an individual, and they are almost interchangeable — but not quite. They
differ mainly based on how we interact with the technology, the app, or a combination
of both. Here are some basic definitions, similarities, and differences:
1. Intelligent Personal Assistant (IPA): This type of software can assist users
with basic tasks, usually through natural language. An intelligent personal
assistant can also go online and search for an answer to a user’s question.
Either text or voice can trigger an action.
4. Chatbot: As the name suggests, a chatbot uses text as its medium to
communicate, provide information and perform tasks for the user. Chatbots can
imitate a conversation with a human user.
5. Voice Assistant: The key input here is the user’s voice. It is a digital assistant
that uses voice recognition, speech synthesis, natural language processing
(NLP) and AI to provide a service through an application, for example
Siri, Ok Google, Cortana, etc.
(1.2) PROBLEM DEFINITION
Several software tools can automate a PC, but they are generally limited to
their own specific functionality, such as automating only the aspects of a
single application on the system.
Research into automation programs is on the rise. Various technology companies
are either developing automation algorithms or optimizing and enhancing those
they have already developed, integrating automation with other technologies and
new features in order to achieve higher productivity and efficiency.
Popular, well-optimized voice assistants have been developed by large technology
companies, such as Google’s Google Assistant, Apple’s Siri and Amazon’s Alexa,
and many other leading technology companies have developed their own voice
assistants.
(1.3) PROJECT OVERVIEW
(1.3.1) PURPOSE :
(1.3.2) OBJECTIVES :
Digital assistants can save a great deal of time. We spend hours on online
research and then on writing up a report in our own words. Our voice-based
control system can do that for you: provide a topic for research and continue with
your tasks while the application does the research.
Another difficult task is remembering scheduled event dates, birthdays or
anniversaries. It comes as a surprise when you enter the class and realize there is
a class test today. Just tell the digital assistant about your tests and it
reminds you well in advance so you can prepare.
One of the main advantages of voice search is its speed. Voice is reputed to be
about four times faster than typing: whereas we can write about 40 words per
minute, we can speak around 150 in the same period (150 / 40 ≈ 3.75). For this
speed to matter, personal assistants must recognize spoken words accurately,
which is a prerequisite for their adoption by consumers.
Objective motto:
(1.3.3) SCOPE :
This turned out to be a wish list for the agent and so specific boundaries were
defined based on the availability of data sources, technologies and concepts that could
be validated for these use cases. The initial list of use cases was then categorized
based on user-agent interactions, and based on type of inputs and outputs.
(1.3.4) APPLICABILITY :
(1.4) HARDWARE SPECIFICATIONS
(1.5) SOFTWARE SPECIFICATIONS
(2) LITERATURE SURVEY
The field of digital assistants with speech recognition has seen major
advancements and innovations. This is mainly because of demand for them in
devices such as smartwatches, fitness bands, speakers, Bluetooth earphones, mobile
phones, laptops, desktops and televisions. Almost all digital devices released
nowadays come with voice assistants that let the user control the device through
speech alone. New techniques are constantly being developed to improve the
performance of voice-automated search.
SIRI is personal assistant software that interacts with the user through a voice
interface, recognizes commands and acts on them. It adapts to the user’s speech
and thus improves voice recognition over time. It also tries to converse with the user
when it does not understand the user’s request.
It integrates with the calendar, contacts and music library applications on the device,
as well as with the device’s GPS and camera. It uses location, temporal,
social and task-based contexts to personalize the agent’s behavior for the
user at a given point in time.
Supported Tasks:
• Send a text message to someone
• Set an alarm
Drawback:
• SIRI does not maintain a knowledge database of its own and its
understanding comes from the information captured in domain models and data
models.
Supported Tasks:
• Reminders
• Outlook
• Evernote
• Facebook, LinkedIn
• News Feeds
Drawbacks:
• Takes some time to read certain sensor data needed for some automation
functions, which causes a noticeable delay.
Google Assistant is virtual assistant software that interacts with the user
through a voice interface, recognizes commands through Google’s API and acts on
them. It adapts to the user’s speech and thus improves voice recognition over
time. It also tries to converse with the user when it does not understand the user’s
request.
It integrates best with Google applications, and also with the calendar, contacts and
music library applications on the device, as well as the device’s GPS and camera.
It uses location, temporal, social and task-based contexts to personalize its
behavior for the user at a given point in time.
Supported Tasks:
• Launch an application
• Play a specific song or video online.
Drawback:
If the user makes a negative statement, the program shall reply with an
apt answer telling the user that the statement is improper.
Conditions are then added so that the program can tell the two cases apart:
it provides assistance if the user is requesting assistance, and performs a
service if the user is instructing it to carry out a task.
In such a case the program will tell the user that it could not find the
path, or that the required application is absent from the system.
If the user does not respond with parameters, the program shall
use default parameters to launch the requested application.
To play audio and video, the user needs to place the files in D:\Music or
D:\Videos, because for now this functionality is restricted to pre-decided
folders. Later it could be upgraded to play from a user-chosen directory.
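The restriction to a pre-decided folder can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the set of audio extensions and the "first file as default" rule are assumptions made for illustration, and only the D:\Music location comes from the text above.

```python
from pathlib import Path
from typing import List, Optional, Set

# Folder named in the report; the extension set below is an assumption.
MUSIC_DIR = Path(r"D:\Music")
AUDIO_EXTENSIONS: Set[str] = {".mp3", ".wav"}


def list_media(folder: Path, extensions: Set[str]) -> List[Path]:
    """Return the media files in `folder`, or an empty list if it is missing."""
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


def pick_track(folder: Path, extensions: Set[str],
               wanted: Optional[str] = None) -> Optional[Path]:
    """Pick the file whose name contains `wanted`, else fall back to the first file."""
    files = list_media(folder, extensions)
    if not files:
        return None  # caller should tell the user that nothing was found
    if wanted:
        for path in files:
            if wanted.lower() in path.stem.lower():
                return path
    return files[0]  # default choice when no (matching) name was given
```

A call such as pick_track(MUSIC_DIR, AUDIO_EXTENSIONS, "lullaby") returns None when the folder is missing or empty, which is the situation in which the program should report its inability to find the file.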
(2.3) FEASIBILITY STUDY :
A feasibility study helps determine whether or not to proceed with the
project. It is essential to evaluate the costs and benefits of the proposed
system.
1. Technical feasibility:
This covers the technologies needed for the project, both hardware and
software. For a digital assistant, the user must have a microphone to give
commands and a speaker to hear the system’s replies. Such equipment is
affordable nowadays and most users already have it. In addition, the
system needs a steady internet connection. This is rarely an issue in an era
where almost every home or office has Wi-Fi and, with digitalization campaigns
across many countries, internet access has never been so accessible and
affordable.
2. Operational feasibility:
3. Economic feasibility:
Here, we find the total cost and benefit of the proposed system compared with the
current system. For this project, the main cost is the documentation cost. The user
would also have to pay for a microphone and speakers, which are cheap and
widely available.
4. Organizational feasibility:
The management tasks are all carried out by a single person. This
avoids management issues and increases the feasibility of the
project. Hence, excellent organizational feasibility is achieved.
5. Cultural feasibility:
Conclusion of feasibility study:
The overall feasibility study of the project shows that the goals of the
proposed system are achievable.
(2.4) SYSTEM DEVELOPMENT :
Figure 1
MODULE 1:
MODULE 2:
o Analyzing input.
or not.
MODULE 3:
MODULE 4:
o Simple playable games (made purely in python), etc.
MODULE 5:
o GUI creation.
MODULE 6:
(2.5) SURVEY OF TECHNOLOGY :
Python:
Python was created in the late 1980s, and first released in 1991, by Guido
van Rossum as a successor to the ABC programming language.
(2.6) TECHNOLOGIES TO BE USED :
For any new technology that we integrate with our software, we
will use only the functionality that is required.
Packages required:
5. datetime – A built-in Python module for working with dates and times.
8. webbrowser – A built-in Python module that provides functions to open URLs
in the system’s default web browser.
10. json – The json module is used for storing and exchanging data.
11. requests – The requests module is used to send all types of HTTP requests. It
accepts a URL as a parameter and returns the response from that URL.
12. wolframalpha – Wolfram Alpha is an API which can compute expert-level
answers using Wolfram’s algorithms, knowledge base and AI technology. It
is made possible by the Wolfram Language.
14. Psutil – psutil (process and system utilities) is a cross-platform library for
retrieving information on running processes and system utilization (CPU,
memory, disks, network, sensors) in Python.
15. Smtplib – This module defines an SMTP client session object that can be
used to send mail to any Internet machine with an SMTP or ESMTP listener
daemon.
to even control them.
17. random – The random module is a built-in module for generating
pseudo-random numbers. It can be used to perform actions randomly, such as
getting a random number, selecting a random element from a list or shuffling
elements.
20. Tkinter – It is the standard GUI library in Python. Python when combined
21. OpenCV – by using it one can process images and videos to identify
22. Turtle – turtle is a Python library used to create graphics, pictures
and games. It is based on turtle graphics, which was part of the original Logo
programming language developed by Wally Feurzeig, Seymour Papert and Cynthia
Solomon in 1967.
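As an illustration of how two of the built-in modules listed above might be combined for the "tell date and time" feature, here is a minimal sketch. The function name, the opening phrases and the output format are assumptions for illustration only and are not taken from the project's code.

```python
import datetime
import random
from typing import Optional

# Hypothetical opening phrases; the real assistant may word its replies differently.
OPENERS = ["Sure.", "Here you go.", "Of course."]


def describe_now(now: Optional[datetime.datetime] = None,
                 rng: Optional[random.Random] = None) -> str:
    """Build a spoken-style sentence with the current time and date."""
    if now is None:
        now = datetime.datetime.now()
    if rng is None:
        rng = random.Random()
    opener = rng.choice(OPENERS)  # vary the reply so it sounds less mechanical
    return f"{opener} It is {now.strftime('%H:%M')} on {now.strftime('%d %B %Y')}."
```

Passing the time and the random generator as parameters keeps the function easy to test, since a fixed datetime and a seeded random.Random can be supplied.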
(3) SYSTEM ANALYSIS AND DESIGN
Figure 2
In this project there is only one user. The user issues a command to the system.
The system then interprets it and fetches an answer. The response is sent back to the user.
(3.2) CLASS DIAGRAM :
Figure 3
The class User has two attributes: the command that it sends as audio and the
response that it receives, which is also audio. The system listens to the user’s
command, interprets it and then sends back a response accordingly.
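The listen, interpret and reply cycle described for the class diagram can be sketched as follows. This is an illustrative outline rather than the project's class: the names Exchange, Assistant and handle_once are hypothetical, and audio input and output are replaced by plain callables so the sketch runs without a microphone.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Exchange:
    """One command/response pair, mirroring the two attributes in the diagram."""
    command: str = ""
    response: str = ""


class Assistant:
    """Hypothetical assistant that listens, interprets and replies once per call."""

    def __init__(self, listen: Callable[[], str], speak: Callable[[str], None]):
        self._listen = listen  # stands in for speech-to-text
        self._speak = speak    # stands in for text-to-speech

    def interpret(self, command: str) -> str:
        text = command.strip().lower()
        if not text:
            return "Sorry, I did not catch that."
        return f"You said: {text}"

    def handle_once(self) -> Exchange:
        command = self._listen()
        response = self.interpret(command)
        self._speak(response)
        return Exchange(command, response)
```

For example, Assistant(input, print).handle_once() reads one typed command and prints the reply.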
(3.3) E-R DIAGRAM :
Figure 4
The above diagram shows the entities and their relationships for a digital assistant
system.
The system has a user, who can have associated keys and values. These can be used
to store any information about the user. A single user can ask multiple questions. Each
question is recognized as a query, for which an answer is fetched. A user can also
have any number of tasks.
(3.4) COMPONENT DIAGRAM :
Figure 5
The main component here is the Virtual Assistant. It provides two specific
services: executing a task or answering a question.
(3.5) SEQUENCE DIAGRAM :
Figure 6
The above sequence diagram shows how the answer to a question asked by the user
is fetched from the internet. The audio query is captured and sent to the Google API.
The API parses the speech and sends back the text. The query is then processed in the
main function and the appropriate action is taken.
(3.4.2) Sequence diagram for task execution
Figure 7
The user sends a command to the virtual assistant in audio form. The command is
passed to the interpreter, which identifies what the user has asked and directs it to the
task executor. If the task is missing some information, the virtual assistant asks the
user for it. The received information is passed to the task, which is then carried out.
After execution, feedback is sent back to the user.
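The flow in this sequence diagram, including the step where the assistant asks the user for missing information, can be sketched as below. The task table, the function name run_task and the reply wording are hypothetical; the fallback to a default value follows the behaviour the report describes for missing parameters.

```python
from typing import Callable, Dict, Optional

# Hypothetical task table: task name -> the parameter it needs (None if none).
TASKS: Dict[str, Optional[str]] = {
    "open chrome": "search term",
    "tell time": None,
}


def run_task(command: str, ask_user: Callable[[str], str]) -> str:
    """Interpret a command, ask for a missing parameter, and return feedback."""
    task = command.strip().lower()
    if task not in TASKS:
        return f"Sorry, I cannot do '{task}' yet."
    needed = TASKS[task]
    if needed is None:
        return f"Done: {task}."
    # The task needs more information, so the assistant asks the user for it.
    value = ask_user(f"Please tell me the {needed}.").strip()
    if not value:
        value = "default"  # fall back to a default when the user gives nothing
    return f"Done: {task} with {needed} '{value}'."
```

Here ask_user stands in for a spoken prompt followed by speech recognition; in a console test it can simply be the built-in input function.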
(3.6) DATA FLOW DIAGRAM :
Figure 8
Figure 9
(3.5.3) DFD Level 2
Figure 10
(3.7) DEPLOYMENT DIAGRAM :
Figure 11
The user interacts with the Google API over a normal high-speed internet
connection.
(3.8) ACTIVITY DIAGRAM :
Figure 12
(4) TEST CASE DESIGN
TEST CASE 1:
Test ID: T1
Test Objective: To make sure that the system’s response time is acceptable.
Description: Time is critical in a voice-based system. Because inputs are spoken
rather than typed, the system must also reply within a moment. The user must get
an immediate response to the query made.
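One way such a response-time check could be automated is sketched below. The helper names and the 2-second budget are assumptions chosen for illustration; the report does not state a numeric limit.

```python
import time
from typing import Callable, Tuple


def timed_call(handler: Callable[[str], str], query: str) -> Tuple[str, float]:
    """Run `handler` on `query` and return its response with the elapsed seconds."""
    start = time.perf_counter()
    response = handler(query)
    elapsed = time.perf_counter() - start
    return response, elapsed


def within_budget(elapsed_seconds: float, budget_seconds: float = 2.0) -> bool:
    """Check the elapsed time against an assumed response-time budget."""
    return elapsed_seconds <= budget_seconds
```

In a real test run, handler would be the assistant's own query function, and the measured time would include speech recognition and any network round trip.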
TEST CASE 2:
Test ID: T2
Test Objective: To ensure that the answers retrieved by the system are accurate with
respect to the gathered data.
Description: A virtual assistant system is mainly used to get precise answers to the
questions asked. Getting an answer in a moment is of no use if the answer is not correct.
Accuracy is of utmost importance in a virtual assistant system.
TEST CASE 3:
Test ID: T3
Note: A few more test cases may be added, and these test cases are also subject
to change with the final software development.
(5) RESULTS
TEST CASE 1
Test Objective: To make sure that the system’s response time is acceptable for a
simple non-parametric query.
Test Steps:
1. Make query.
2. Record response feedback.
Test Data:
1. As expected.
2. As expected.
Test Title: Response Time
Test Objective: To make sure that the system’s response time is acceptable for a
simple parametric query.
Test Steps:
1. Make query.
2. Give further input.
3. Record response feedback.
Test Data:
1. Open chrome.
1.1. bbd university
1. As expected.
TEST CASE 2
Test ID: T2
Test Objective: To ensure that the answers retrieved by the system are accurate with
respect to the gathered data.
Test Steps:
Test Data:
TEST CASE 3
Test ID: T3
Test Steps:
Test Data:
1. The answer should contain an approximate value of pi, i.e. not more than 3 significant
digits.
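A possible automated form of this check is sketched below. It reads the criterion as "some number in the answer agrees with pi to about three significant digits", which is an interpretation rather than something the report specifies; the function name and the tolerance are likewise assumptions.

```python
import math
import re


def mentions_pi(answer: str, tolerance: float = 0.005) -> bool:
    """Return True if any number in `answer` is within `tolerance` of pi."""
    numbers = re.findall(r"\d+(?:\.\d+)?", answer)
    return any(abs(float(n) - math.pi) < tolerance for n in numbers)
```

With the default tolerance, an answer containing 3.14 or 3.14159 passes, while 3.2 or an answer without any number does not.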
(6) CONCLUSION AND RECOMMENDATIONS
We aim to make this project a complete desktop assistant, smart
enough to be more powerful than a keyboard. Future plans include integrating our
application with IoT devices and with other software applications for seamless
automation and performance.
The digital voice assistant presented in this project is a fundamental
system with few features; additional and advanced features may be
introduced as future work. This project covers the design and
implementation of a voice-based control system and digital assistant. It is built
using available open-source software modules, with Python 3.x and its libraries
as the backing, and can accommodate future updates.
The modular approach used in this project makes it flexible and makes it easy to
integrate additional modules and features without disturbing the current system’s
functionality. The system is designed not only to act on human commands but also to
give responses to the user based on the query asked or the words spoken, such as
requests to open applications and perform operations. Like Siri, Google Assistant,
Cortana and other popular personal voice assistants, this software application has
considerable scope for future growth. In the near future, the project could be
integrated with devices for a connected home using the Internet of Things, a voice
command system and computer vision.
(8) APPENDICES
(7.2) ABOUT TECHNOLOGIES USED:
Google provides a Speech Recognition API that converts speech captured from a
microphone into written text (Python strings), in short, speech to text. You can
simply speak into a microphone and the Google API will transcribe it into written text.
The API gives excellent results for the English language. Google has also created the
JavaScript Web Speech API, so speech can also be recognized in JavaScript.
Recognizer Class :
import speech_recognition as sr
recognizer = sr.Recognizer()
Now, let’s set the energy threshold to 300. You can think of the energy
threshold as a loudness level for the audio. Values below the threshold are
considered silence, and values above the threshold are considered speech. Setting it
appropriately improves speech recognition when working with audio.
recognizer.energy_threshold = 300
SpeechRecognition’s documentation recommends 300 as a threshold value,
which works well for most audio. Also, keep in mind that, by default, the energy
threshold is adjusted automatically as the recognizer listens to audio.
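To make the idea of an energy threshold concrete, the sketch below classifies a chunk of audio samples as speech or silence. It does not call the SpeechRecognition library; it assumes that "energy" means the root-mean-square amplitude of signed integer samples, and the function names are hypothetical.

```python
import math
from typing import Sequence


def rms_energy(samples: Sequence[int]) -> float:
    """Root-mean-square amplitude of a chunk of signed integer samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples: Sequence[int], energy_threshold: float = 300.0) -> bool:
    """Treat a chunk as speech when its energy exceeds the threshold."""
    return rms_energy(samples) > energy_threshold
```

With the threshold of 300 used above, a quiet chunk such as [10, -12, 8, -9] counts as silence, while a loud chunk such as [1000, -900, 1100, -1000] counts as speech.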
SpeechRecognition has built-in methods for working with many of the
APIs out there:
recognize_bing()
recognize_google()
recognize_google_cloud()
recognize_wit()
The smtplib module defines an SMTP client session object that can be used to
send mail to any Internet machine with an SMTP or ESMTP listener daemon.
If the connect() call returns anything other than a success code,
an SMTPConnectError is raised.
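A minimal sketch of sending mail with smtplib is given below. The helper names build_email and send_email are hypothetical, and the host, port and credentials are placeholders the caller must supply; the use of STARTTLS is an assumption about the mail server, not something stated in this report.

```python
import smtplib
from email.message import EmailMessage


def build_email(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_email(message: EmailMessage, host: str, port: int,
               user: str, password: str) -> None:
    """Send `message` through an SMTP server that supports STARTTLS.

    May raise smtplib.SMTPException (for example SMTPConnectError) or OSError
    if the server cannot be reached or rejects the login.
    """
    with smtplib.SMTP(host, port, timeout=10) as server:
        server.starttls()            # upgrade the connection to TLS
        server.login(user, password)
        server.send_message(message)
```

Keeping message construction separate from sending makes it possible to check the headers and body without contacting a mail server.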
The Wolfram Alpha API is free for non-commercial use, but we still need to
obtain an API key (AppID) to run queries against the API endpoints.
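The role of the AppID can be illustrated by building a query URL by hand. This sketch does not use the wolframalpha Python package mentioned earlier; the endpoint shown is assumed to be that of the Wolfram|Alpha Short Answers API and should be checked against the official documentation, and "DEMO" is only a placeholder AppID.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Wolfram|Alpha Short Answers API; verify before use.
SHORT_ANSWERS_URL = "https://api.wolframalpha.com/v1/result"


def build_query_url(app_id: str, question: str) -> str:
    """Return a query URL carrying the AppID and the URL-encoded question."""
    params = urlencode({"appid": app_id, "i": question})
    return f"{SHORT_ANSWERS_URL}?{params}"
```

The resulting URL could then be fetched with an HTTP client such as requests; no network call is made in this sketch.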
OpenCV was started at Intel in 1999 by Gary Bradski, and the first release
came out in 2000. Vadim Pisarevsky joined Gary Bradski to manage Intel’s Russian
software OpenCV team. Its active development continued under the support of
Willow Garage, with Gary Bradski and Vadim Pisarevsky leading the project.
OpenCV now supports a multitude of algorithms related to computer vision and
machine learning and continues to expand.
OpenCV-Python is the Python API for OpenCV, combining the best qualities
of the OpenCV C++ API and the Python language.
turtle is a Python library used to create graphics, pictures and games.
It is based on turtle graphics, which was part of the original Logo programming
language developed by Wally Feurzeig, Seymour Papert and Cynthia Solomon in 1967.