Mini Project Report
Acknowledgments
Our first and foremost words of recognition go to our highly esteemed
Guide for her constructive academic advice and guidance, constant
encouragement, valuable suggestions, and all other support and kindness
shown to us. Her supervision and guidance proved most valuable in
overcoming all the hurdles in the completion of this report.
Finally, we would like to thank all those whose direct and indirect
support helped us complete this report in time.
Abstract
We are living in the era of computers, and many of us have wondered how
convenient it would be to have our own virtual A.I. assistant: how much
easier and more effortless it would be to send emails without typing a
single word, to search Wikipedia without actually opening a web browser,
and to perform many other daily tasks with a single voice command. Today,
voice assistants are everywhere, and voice-based artificial intelligence
plays an important role in our daily lives. Examples include Siri on
Apple devices, Cortana on Microsoft devices, and Google Assistant on
Android devices. There are also devices dedicated to providing virtual
assistance. Virtual assistants are typically cloud-based programs that
require internet-connected devices and/or applications to work. They
typically perform simple jobs for end users, such as adding tasks to a
calendar or providing information that would normally be searched for in
a web browser, as well as slightly more complex tasks such as checking
the status of smart home devices. The work is initialized by analyzing
the audio commands given by the user via a microphone. The speech engine
is set up so that it can convert text to speech using built-in libraries.
Speech recognition is used to convert the speech input to text. This text
is then fed to the model, which determines the nature of the command and
calls the relevant script for execution. This is, in essence, what happens
when the assistant receives a command from the user.
Contents
1 Introduction
1.1 Background
1.2 Objectives
1.3 Purpose, Scope and Applicability
1.4 How it Works
2 Introduction to AI
2.1 What is Artificial Intelligence?
2.2 Types of Artificial Intelligence
3 System Requirements
4 Implementation
5 System Design
5.1 ER Diagram
5.2 Activity Diagram
5.3 Class Diagram
5.4 Use Case Diagram
5.5 Sequence Diagram
6 Result
6.1 User Interface
6.2 Output
7 Bibliography
Chapter 1
Introduction
This project was started on the premise that there is a sufficient amount
of openly available data and information on the web that can be utilized
to build a virtual assistant capable of making intelligent decisions for
routine user activities.
1.1 Background
Supported Tasks
• Launch an application on my iPhone
Drawback
Siri does not maintain a knowledge database of its own; its understanding
comes from the information captured in domain models and data models.
1.2 Objectives
One of the main advantages of voice search is its rapidity. In fact,
voice is reputed to be roughly four times faster than a written search:
whereas we can write about 40 words per minute, we are capable of speaking
around 150 in the same period of time (150/40 ≈ 3.75) [15]. In this
respect, the ability of personal assistants to accurately recognize spoken
words is a prerequisite for them to be adopted by consumers.
1.3 Purpose, Scope and Applicability
Purpose
Scope
Voice assistants will continue to offer more individualized experiences
as they get better at differentiating between voices. However, it is not
just developers who need to address the complexity of developing for
voice: brands also need to understand the capabilities of each device and
integration, and whether it makes sense for their specific brand. They
will also need to focus on maintaining a consistent user experience in
the coming years as complexity becomes more of a concern, because voice
assistants lack a visual interface: users simply cannot see or touch a
voice interface.
Applicability
The mass adoption of artificial intelligence in users' everyday lives is
also fueling the shift towards voice. The growing number of IoT devices,
such as smart thermostats and speakers, gives voice assistants more
utility in a connected user's life. Smart speakers are the number one way
we are seeing voice being used, and many industry experts even predict
that nearly every application will integrate voice technology in some way
in the next five years.
The use of virtual assistants can also enhance the Internet of Things
(IoT) ecosystem. Twenty years from now, Microsoft and its competitors
will be offering personal digital assistants that offer the services of a
full-time employee, something usually reserved for the rich and famous.
1.4 How it Works
• The natural language audio signal is converted into digital data that
can be analyzed by the software (see the sketch below).
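As a minimal illustration of this step, the sketch below captures microphone audio and converts it to text. It assumes the third-party speech_recognition package (with PyAudio) and Google's free recognizer, neither of which the report names explicitly:

```python
# Minimal sketch: capture spoken audio and convert it to digital text.
# Assumes: pip install SpeechRecognition pyaudio
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    audio = recognizer.listen(source)            # analog speech -> digital audio data

try:
    text = recognizer.recognize_google(audio)    # digital audio -> text
    print("You said:", text)
except sr.UnknownValueError:
    print("Sorry, I could not understand that.")
```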
Chapter 2
Introduction to AI
2.2 Types of Artificial Intelligence
Based on Capabilities
• Weak AI or Narrow AI
• General AI
• Super AI
Based on Functionality
• Reactive Machines
• Limited Memory
• Theory of Mind
• Self-Awareness
Chapter 3
System Requirements
The system is built keeping in mind generally available hardware and
software compatibility. It doesn't require any expensive hardware devices.
The minimum hardware and software requirements for the system are listed
below.
• Pyttsx3 - pyttsx3 is a text-to-speech conversion library in Python that
works offline and is compatible with both Python 2 and 3. An application
invokes the pyttsx3.init() factory function to get a reference to a
pyttsx3.Engine instance. It is a very easy-to-use tool that converts the
entered text into speech. The pyttsx3 module supports two voices, the
first female and the second male, provided by "sapi5" on Windows. It
supports three TTS engines: sapi5 (SAPI5 on Windows), nsss
(NSSpeechSynthesizer on Mac OS X), and espeak (eSpeak on every other
platform).
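A minimal usage sketch (the spoken sentence and rate value are illustrative):

```python
# Minimal text-to-speech sketch with pyttsx3 (works offline).
import pyttsx3

engine = pyttsx3.init()                    # loads the platform driver, e.g. sapi5
voices = engine.getProperty('voices')      # list of available voices
engine.setProperty('voice', voices[0].id)  # select the first voice
engine.setProperty('rate', 150)            # speaking speed in words per minute
engine.say("Hello, I am your virtual assistant.")
engine.runAndWait()                        # block until speaking finishes
```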
• OS - This module provides a portable way of using operating
system-dependent functionality. The os and os.path modules include many
functions to interact with the file system.
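For instance (the file and application names here are purely illustrative):

```python
# A few os/os.path calls an assistant can use for system tasks.
import os

print(os.getcwd())                  # current working directory
print(os.listdir("."))              # entries in the current directory
print(os.path.exists("notes.txt"))  # check for a (hypothetical) file
os.system("notepad")                # launch an application (Windows example)
```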
• Datetime - In Python, date and time are not data types of their own,
but a module named datetime can be imported to work with the date as well
as the time. The datetime module comes built into Python, so there is no
need to install it externally. It supplies classes to work with date and
time, and these classes provide a number of functions to deal with dates,
times, and time intervals. Date and datetime are objects in Python, so
when you manipulate them, you are actually manipulating objects, not
strings or timestamps.
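A minimal sketch of answering a "what time is it?" query with datetime:

```python
# The built-in datetime module: format the current date and time.
from datetime import datetime

now = datetime.now()                 # a datetime object, not a string
print(now.strftime("%H:%M:%S"))      # e.g. "14:32:07"
print(now.strftime("%A, %d %B %Y"))  # e.g. "Monday, 01 January 2024"
```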
Chapter 4
Implementation
Step 3: Neural network for the assistant
Neural networks are composed of layers/modules that perform operations on
data. The torch.nn namespace provides all the building blocks needed to
define and build our own neural network. We then create the neurons
through which data and computations flow; the input comes from the raw
dataset. We use NumPy to build a single neuron, and NLTK, a toolkit built
for working with NLP in Python, provides various text processing libraries
along with many test datasets. A neural network is a series of algorithms
that endeavours to recognize underlying relationships in a set of data
through a process that mimics the way the human brain operates; it
develops its output without explicitly programmed rules.
Step 4: Dataset
Here we create a .json file containing tags, patterns, and responses,
which are supplied to the neural network to train the model. All the
trained data is then stored in a .pth file, a data file format for
machine learning in PyTorch. We used a JSON file because it is a
data-interchange format that uses human-readable text to store and
transmit data objects consisting of attribute-value pairs and arrays. It
has two basic data structures: objects and arrays. An object stores a set
of name-value pairs, and an array is a list of values. The dataset has
been created by us depending on the tasks that have to be carried out.
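A minimal example of what such an intents file might look like (the tags, patterns, and responses here are illustrative placeholders, not the report's actual dataset):

```json
{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Hello", "Hi there", "Hey"],
      "responses": ["Hello! How can I help you?"]
    },
    {
      "tag": "time",
      "patterns": ["What time is it", "Tell me the time"],
      "responses": ["Let me check the time for you."]
    }
  ]
}
```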
Step 5: Categorizing
We generate a probability on which the model is trained for what to
respond: when the user communicates with the assistant, the assistant
should be able to categorize the conversation under the specific tag
using that probability.
Step 6: Tasks
There are two types of functions: input functions and non-input functions.
Examples of non-input functions are time and date, while examples of
input functions are Google search and Wikipedia. Such tasks can be
implemented using various modules provided by Python, such as datetime,
wikipedia, and pywhatkit. We can also provide tasks for interacting with
the operating system using the OS library. Based on the tasks we add, we
also need to add various tags related to each task, which can make the
conversation better after training the model.
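As a small illustration, the sketch below wires one task of each type to a plain function (the function names are ours, and it assumes the third-party wikipedia package named above):

```python
# Dispatch sketch: a non-input task and an input task.
import datetime
import wikipedia   # third-party module named in the report

def tell_time():                      # non-input function: needs no argument
    return datetime.datetime.now().strftime("%H:%M")

def search_wikipedia(query):          # input function: takes the user's query
    return wikipedia.summary(query, sentences=2)

print(tell_time())
print(search_wikipedia("Artificial intelligence"))
```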
Chapter 5
System Design
5.1 ER Diagram
The above diagram shows the entities and their relationships for a
virtual assistant system. We have a user of the system who can have their
own keys and values, which can be used to store any information about the
user: say, for the key "name" the value can be "Jim". The user might want
to keep some keys secure; for these, a lock can be enabled and a password
(a voice clip) set. A single user can ask multiple questions, and each
question is given an ID by which it is recognized, along with the query
and its corresponding answer. A user can also have any number of tasks.
Each task has its own unique ID and a status, i.e. its current state. A
task also has a priority value and a category indicating whether it is a
parent task or a child task of an older task.
5.2 Activity Diagram
5.3 Class Diagram
The User class has two attributes: the command it sends as audio and the
response it receives, which is also audio. It performs the functions of
listening to the user's command, interpreting it, and then replying or
sending back a response accordingly. The Question class holds the command
in string form, as interpreted by the Interpret class. The Task class
also holds the interpreted command in string format and has various
functions such as reminder, note, mimic, research, and reader.
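A minimal Python sketch of these classes as described (method bodies are stubs; only the names from the diagram are used):

```python
# Skeletons mirroring the class diagram; bodies are intentionally stubs.
class User:
    """Sends a command as audio and receives an audio response."""
    def __init__(self):
        self.command_audio = None
        self.response_audio = None

    def listen(self):
        """Capture the user's spoken command."""

class Question:
    """Holds a command in string form after interpretation."""
    def __init__(self, command_text):
        self.command_text = command_text

class Task:
    """Holds the interpreted command and the functions that act on it."""
    def __init__(self, command_text):
        self.command_text = command_text

    def reminder(self): ...
    def note(self): ...
    def mimic(self): ...
    def research(self): ...
    def reader(self): ...
```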
5.4 Use Case Diagram
In this project there is only one user. The user issues a command to the
system; the system then interprets it and fetches an answer. The response
is sent back to the user.
5.5 Sequence Diagram
The above sequence diagram shows how the answer to a question asked by
the user is fetched from the internet. The audio query is interpreted and
sent to the web scraper, which searches for and finds the answer. The
answer is then sent back to the speaker, which speaks it to the user.
Chapter 6
Result
6.2 Output
Chapter 7
Bibliography
Websites referred
• www.stackoverflow.com
• www.pythonprogramming.net
• www.codecademy.com
• www.tutorialspoint.com
• mySirG