
Voice Based System Assistant using NLP and deep learning

A Project report submitted in partial fulfillment of the requirements for
the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE ENGINEERING
Submitted by
K. Subba Reddy (318126510028)
K. Sesha Shai Datta (319126510L06)
A. Tarun (318126510004)
S. Ajay Varma (318126510048)

Under the guidance of


Mrs. K. Amaravathi
ASSISTANT PROFESSOR, DEPT. OF CSE, ANITS

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


ANIL NEERUKONDA INSTITUTE OF TECHNOLOGY AND SCIENCES
(UGC AUTONOMOUS)

(Permanently Affiliated to AU, Approved by AICTE and Accredited by NBA & NAAC with ‘A’ Grade)
Sangivalasa, Bheemili Mandal, Visakhapatnam District (A.P.)
2021-2022

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
ANIL NEERUKONDA INSTITUTE OF TECHNOLOGY AND SCIENCES
(UGC AUTONOMOUS)
(Affiliated to AU, Approved by AICTE and Accredited by NBA & NAAC with ‘A’ Grade)
Sangivalasa, Visakhapatnam District (A.P.)

BONAFIDE CERTIFICATE

This is to certify that the project report entitled “Voice Based System Assistant using NLP and
deep learning” submitted by K. Subba Reddy (318126510028), K. Sesha Shai Datta (319126510L06),
A. Tarun (318126510004), and S. Ajay Varma (318126510048) in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology in Computer Science
Engineering of Anil Neerukonda Institute of Technology and Sciences (A), Visakhapatnam is a
record of bonafide work carried out under my guidance and supervision.

PROJECT GUIDE: Mrs. K. Amaravathi (Assistant Professor), Dept. of CSE, ANITS

HEAD OF THE DEPARTMENT: Dr. R. Sivaranjani (Professor), Dept. of CSE, ANITS

DECLARATION

We, K. Subba Reddy (318126510028), K. Sesha Shai Datta (319126510L06), A. Tarun (318126510004),
and S. Ajay Varma (318126510048), of final semester B.Tech. in the Department of Computer Science
and Engineering, ANITS, Visakhapatnam, hereby declare that the project work entitled “Voice Based
System Assistant using NLP and deep learning” is carried out by us and submitted in partial
fulfillment of the requirements for the award of Bachelor of Technology in Computer Science
Engineering, under Anil Neerukonda Institute of Technology & Sciences (A), during the academic
year 2018-2022, and has not been submitted to any other university for the award of any kind of degree.

Submitted By Team 3A

K. Subba Reddy (318126510028)

K. Sesha Shai Datta (319126510L06)

A. Tarun (318126510004)

S. Ajay Varma (318126510048)

Mrs. K. Amaravathi
(Assistant Professor)
Dept. of CSE, ANITS

ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of a task would be incomplete
without mentioning the people who made it possible, whose constant guidance and encouragement always
boosted our morale. We take great pleasure in presenting this project, which is the result of a
studied blend of both research and knowledge.

We first take the privilege to thank our Head of Department, Dr. R. Sivaranjani, for permitting us
to lay the first stone of success and for providing the lab facilities. We would also like to thank
the other staff in our department and the lab assistants who directly or indirectly helped us in the
successful completion of the project.

We feel great to thank Mrs. K. Amaravathi, our project guide, who shared her valuable knowledge with
us, made us understand the real essence of the topic, and created in us an interest to work on the project.

PROJECT STUDENTS

K. Subba Reddy (318126510028)

K. Sesha Shai Datta (319126510L06)

A. Tarun (318126510004)

S. Ajay Varma (318126510048)

CONTENTS

LIST OF FIGURES
ABSTRACT
1. INTRODUCTION
   1.1 Problem Statement
   1.2 Deep Learning
   1.3 Natural Language Processing
   1.4 Sequential Model
   1.5 Natural Language Toolkit
   1.6 Lemmatizer
   1.7 NumPy
   1.8 TensorFlow
   1.9 Keras
2. LITERATURE SURVEY
3. METHODOLOGY
   3.1 Proposed System
   3.2 Architecture
4. SYSTEM DESIGN
   4.1 State Chart Diagram
   4.2 Use Case Diagram
   4.3 Sequence Diagram
5. EXPERIMENTAL ANALYSIS AND RESULTS
   5.1 System Configuration
       5.1.1 Software Requirements
       5.1.2 Hardware Requirements
   5.2 Sample Code
   5.3 Screenshots/Results
6. PLATFORM USED: PYCHARM
   6.1 Intelligent Coding Assistance
   6.2 Built-in Developer Tools
   6.3 Customizable and Cross-platform IDE
7. TESTING
8. CONCLUSION AND FUTURE SCOPE
   8.1 Conclusion
   8.2 Future Scope
9. REFERENCES
10. PAPER ACCEPTANCE LETTER

List of FIGURES:

Fig 1.2: Deep Learning
Fig 1.3: Natural Language Processing
Fig 1.4(a): Method one for Sequential model layers
Fig 1.4(b): Method two for Sequential model layers
Fig 1.4(c): Creating a Sequential model
Fig 1.4(d): Removing layers in a Sequential model
Fig 1.4(e): Specifying the input shape in advance
Fig 1.5(a): Tokenize and tagging text
Fig 1.5(b): Tokenization
Fig 1.5(c): Lower case conversion
Fig 1.6: Lemmatization code
Fig 1.7: Creating an array using NumPy
Fig 2: Data flow diagram
Fig 3.2: Voice Based System Assistant architecture
Fig 4.1: State chart diagram
Fig 4.2: Use case diagram
Fig 4.3: Sequence diagram
Fig 5.3(a): Graphical user interface
Fig 5.3(b): Google search for time
Fig 5.3(c): Browsing web information
Fig 5.3(d): YouTube search
Fig 5.3(e): Writing text in Notepad
Fig 5.3(f): Opening system app (Control Panel)

Abstract

Desktop assistants have evolved over time along with humans. Nowadays people are very used to
them, and they are part of day-to-day life. We all want to make the use of computers more
comfortable, and one way to do so is with an assistant. The typical way to give a command to the
computer is via the keyboard, but there is a more convenient way: voice instruction. Such systems
are used in many human-computer interaction applications and play an important role in some
people's lives, such as the physically disabled. This includes the development of text-to-Braille
systems, screen magnifiers, and screen readers. Recently, attempts have been made to develop tools
and technologies to help blind people access internet technologies. Some people may find it hard
to use the system, so to overcome those kinds of issues we use a virtual assistant that helps in
their daily life. This paper builds a general-purpose assistant that enables conversations between
the user and the computer.

This is accomplished by performing internal operations such as speech-to-text conversion, done
using Python's speech recognition modules, and then using the converted text to identify and
categorize the words. Categorization uses a lemmatizer, a component of NLP (natural language
processing), which divides the words into different groups under labels such as greetings, Google
search, applications, and so on. After the text has been identified, it is sent to the trained
model, which determines which intent or tag corresponds to which command or instruction. The model
then communicates with the user via voice response, asking for instructions to perform actions in
the system such as opening an application or searching for information. It clarifies any questions
about the instructions and then acts depending on the intent, such as an application, a YouTube
search, or a Google search, carrying out the task specified in the intent. If the intent is an
application, it will open or close the application. If it is a YouTube search, it will go to
YouTube and look for the relevant text there. If it is a Google search, the browser will open and
search for the relevant text.

1. Introduction

Machines had to learn how to hear, recognize, and analyze human speech long before the
voice assistants integrated into our smart speakers could understand our requests to play music,
switch off lights, and read the weather report. The technology we use today has been under
development for more than a century, and it has come a long way from the first listening and
recording equipment, from the phonograph to today's virtual assistants. The assistant is described
as one of the most advanced and promising interactions between humans and machines. An assistant
is computer software that can carry out a conversation with a user in natural language through
applications, mobile apps, or desktop applications. Different companies like Google and Apple use
different APIs for this purpose. It is truly a feat that today one can schedule meetings or send
email merely through spoken commands. Whenever we want to perform an action on the system, we
normally have to communicate with it using our hands. If a person has difficulty communicating
with the system physically, then it will not be proper communication. To overcome these kinds of
problems, we took the initiative to develop this application. With the rapid development of deep
learning techniques, it is now possible to solve these types of complicated problems using neural
networks, which overcome the constraints of traditional machine learning methodology. We can
extract high-level features from the provided data using deep learning and neural networks. In
this way, the limitations of machine learning are being overcome by using deep learning techniques.

1.1 Problem Statement

The aim is to improve customer service and increase the delivery of services through advances in
technology, to gain a competitive edge over the benefits of specific search engines, and to
overcome user satisfaction issues associated with online services. This Voice Assistant provides
personal and efficient communication between users and their needs in order to meet the users'
desires. The Voice Assistant allows users to feel confident and comfortable when using the
service, regardless of their computer literacy, because of the natural language used in messages.
It also provides a very accessible and efficient service, as all interactions take place within
one chat conversation, negating the need for the user to navigate through a site. Generally, a
Voice Assistant is software that can communicate or interact with a human, but the main motivation
of our project is that one should be able to use the Voice Assistant comfortably and in the most
efficient way. Not only a normal person but also a blind person can access the personal computer
just by giving a command by speech or voice; even an illiterate person can access the computer by
entering a command or giving it vocally. Computer programs which can have real conversations are
known as voice assistants. Voice assistants can be used with almost all popular apps, and these
bots can be given distinct personalities as well. A voice assistant can understand written and
spoken text and interpret its meaning; it can then look up relevant information and deliver it to
the user. Most websites rely on such assistants to provide customers with a quick response. The
motivating reasons for developing the project are that it is accessible anytime, cost effective,
flexible, has high handling capacity, and improves customer satisfaction.

1.2 Deep Learning

Deep learning is a subset of machine learning; it is essentially a neural network with three
or more layers. These neural networks attempt to simulate the behavior of the human brain, albeit
far from matching its ability, allowing the network to “learn” from large amounts of data. While a
neural network with a single layer can still make approximate predictions, additional hidden
layers can help to optimize and refine for accuracy.
Deep learning drives many Artificial Intelligence (AI) applications and services that improve
automation, performing analytical and physical tasks without human intervention. Deep learning
technology lies behind everyday products and services (such as digital assistants, voice-enabled
TV remotes, and credit card fraud detection) as well as emerging technologies (such as
self-driving cars). Deep learning is an increasingly popular subset of machine learning. Deep learning
models are built using neural networks. A neural network takes in inputs, which are then
processed in hidden layers using weights that are adjusted during training. Then the model spits
out a prediction. The weights are adjusted to find patterns in order to make better predictions.
The user does not need to specify what patterns to look for — the neural network learns on its
own.

Keras is a user-friendly neural network library written in Python. It supports both regression and
classification models; for example, a regression model could predict an employee's wage per hour,
and a classification model could predict whether or not a patient has diabetes.

Fig 1.2 Deep Learning
If deep learning is a subset of machine learning, how do they differ? Deep learning distinguishes
itself from classical machine learning by the type of data that it works with and the methods in
which it learns.

Machine learning algorithms leverage structured, labeled data to make predictions, meaning
that specific features are defined from the input data for the model and organized into tables.
This doesn't necessarily mean that it doesn't use unstructured data; it just means that if it does,
it generally goes through some pre-processing to organize it into a structured format.

Deep learning eliminates some of the data pre-processing that is typically involved with machine
learning. These algorithms can ingest and process unstructured data, like text and images, and they
automate feature extraction, removing some of the dependency on human experts. For example,
let’s say that we had a set of photos of different pets, and we wanted to categorize by “cat”,
“dog”, “hamster”, et cetera. Deep learning algorithms can determine which features (e.g. ears)
are most important to distinguish each animal from another. In machine learning, this hierarchy
of features is established manually by a human expert.

Then, through the processes of gradient descent and backpropagation, the deep learning
algorithm adjusts and fits itself for accuracy, allowing it to make predictions about a new photo
of an animal with increased precision.

Machine learning and deep learning models are capable of different types of learning as well,
which are usually categorized as supervised learning, unsupervised learning, and reinforcement
learning. Supervised learning utilizes labeled datasets to categorize or make predictions; this
requires some kind of human intervention to label input data correctly. In contrast, unsupervised
learning doesn’t require labeled datasets, and instead, it detects patterns in the data, clustering
them by any distinguishing characteristics. Reinforcement learning is a process in which a model
learns to become more accurate for performing an action in an environment based on feedback in
order to maximize the reward.

1.3 Natural language processing

Natural language processing (NLP) refers to the branch of computer science—and more
specifically, the branch of artificial intelligence or AI—concerned with giving computers the
ability to understand text and spoken words in much the same way human beings can. NLP
combines computational linguistics—rule-based modeling of human language—with statistical,
machine learning, and deep learning models. Together, these technologies enable computers to
process human language in the form of text or voice data and to ‘understand’ its full meaning,
complete with the speaker or writer’s intent and sentiment.
NLP drives computer programs that translate text from one language to another, respond to
spoken commands, and summarize large volumes of text rapidly—even in real time. There’s a
good chance you’ve interacted with NLP in the form of voice-operated GPS systems, digital
assistants, speech-to-text dictation software, customer service chatbots, and other consumer
conveniences. But NLP also plays a growing role in enterprise solutions that help streamline
business operations, increase employee productivity, and simplify mission-critical business
processes. Natural language processing, or NLP, is a branch of computer science that involves
the analysis of human language in speech and text. A specific subset of AI and machine learning
(ML), NLP is already widely used in many applications today. NLP is how voice assistants, such
as Siri and Alexa, can understand and respond to human speech and perform tasks based on
voice commands.

NLP is the driving technology that allows machines to understand and interact with human
speech, but is not limited to voice interactions. Natural language processing is also the
technology behind apps such as customer service chatbots. In addition, NLP enables email and
SMS apps to automatically suggest replies or complete a message as it is typed. These
applications, just like voice assistants, depend on NLP, since machines cannot intuitively
understand human (or “natural”) language.

Fig 1.3 Natural Language Processing

Everything we express (either verbally or in writing) carries huge amounts of information. The
topic we choose, our tone, our selection of words: everything adds some type of information that
can be interpreted, and value can be extracted from it. In theory, we can understand and even
predict human behaviour using that information.

But there is a problem: one person may generate hundreds or thousands of words in a
declaration, each sentence with its corresponding complexity. If you want to scale and analyze
several hundreds, thousands or millions of people or declarations in a given geography, then the
situation is unmanageable.

Data generated from conversations, declarations or even tweets are examples of unstructured
data. Unstructured data doesn't fit neatly into the traditional row and column structure of
relational databases, and it represents the vast majority of data available in the real world. It is
messy and hard to manipulate. Nevertheless, thanks to the advances in disciplines like machine
learning a big revolution is going on regarding this topic. Nowadays it is no longer about trying
to interpret a text or speech based on its keywords (the old fashioned mechanical way), but about
understanding the meaning behind those words (the cognitive way). This way it is possible to
detect figures of speech like irony, or even perform sentiment analysis.

It is a discipline that focuses on the interaction between data science and human language, and is
scaling to lots of industries. Today NLP is booming thanks to the huge improvements in the
access to data and the increase in computational power, which are allowing practitioners to
achieve meaningful results in areas like healthcare, media, finance and human resources, among
others.

1.4 Sequential Model:

When we are using NLP to deal with textual data, one key point we must understand is that the
data is always in the form of sequences and the order of the data matters. For any given sentence,
if the order of words is changed, the meaning of the sentence doesn’t stay the same, hence we
can say that the sentence information is stored in both the words as well as the order of the words
in that particular sentence. In any type of data, if the sequential order matters, we call it
sequential data.

Traditional neural networks typically cannot handle sequential data. This is because when we
build a neural network for a particular task, we need to set a fixed input size at the beginning, but
in sequential data, the size of the data can vary. A sentence can contain 5 words, or 20 words,
hence we cannot configure a neural network to effectively deal with this kind of data. Even if we
were dealing with sentences with the same number of words, which is an ideal scenario, when
we input the processed words into a neural network of some fixed input size, a neural network is
not designed to pay attention to the sequence of the words. The model will effectively learn from
the semantic information of the individual words in the sentence, but it will fail to learn from the
order of the words in the sentence.

To convert textual data into numerical format so that we can input it into neural networks, we
must convert the text into vectors. These can be either one-hot encoded vectors or word vectors.
Our textual data thus turns into a sequence of vectors, which is exactly the format we need.
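As a minimal illustration (not code from this project), a toy vocabulary can be one-hot encoded as
follows; the vocabulary and sentence here are invented purely for the example:

# Sketch: one-hot encoding a tokenized sentence as a sequence of vectors.
vocab = ["open", "close", "search", "youtube", "google"]   # toy vocabulary
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

sentence = ["open", "youtube"]
sequence = [one_hot(w) for w in sentence]   # the sequence of vectors fed to the model
print(sequence)   # [[1, 0, 0, 0, 0], [0, 0, 0, 1, 0]]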
A Sequential model is appropriate for a plain stack of layers where each layer has exactly one
input tensor and one output tensor.

Schematically, the following Sequential model:

Fig 1.4(a): Method one for Sequential model layers

is equivalent to this function:

Fig 1.4(b): Method two for Sequential model layers

A Sequential model is not appropriate when:

- Your model has multiple inputs or multiple outputs
- Any of your layers has multiple inputs or multiple outputs
- You need to do layer sharing
- You want non-linear topology (e.g. a residual connection, a multi-branch model)

Creating a Sequential model


You can create a Sequential model by passing a list of layers to the Sequential constructor:

Fig 1.4(c): Creating a Sequential model

Fig 1.4(d): Removing layers in Sequential Model
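The figures referenced above are code screenshots in the original report; a minimal Keras sketch of
the same two ideas, creating a Sequential model from a list of layers and then removing a layer
with pop(), could look like this:

from tensorflow import keras
from tensorflow.keras import layers

# Create a Sequential model by passing a list of layers to the constructor.
model = keras.Sequential([
    layers.Dense(2, activation="relu"),
    layers.Dense(3, activation="relu"),
    layers.Dense(4),
])

# Layers can also be added incrementally with add(), and removed with pop().
model.add(layers.Dense(5))
model.pop()                 # removes the layer that was just added
print(len(model.layers))    # 3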

Specifying the input shape in advance:


Generally, all layers in Keras need to know the shape of their inputs in order to be able to create
their weights. So when you create a layer like this, initially, it has no weights:

Fig 1.4(e): Specifying the input shape in advance
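As a hedged sketch of the idea in Fig 1.4(e): a layer built in isolation has no weights until it
knows its input shape, which can be supplied up front with an Input object:

from tensorflow import keras
from tensorflow.keras import layers

# A layer created in isolation has no weights yet:
layer = layers.Dense(3)
print(layer.weights)   # []

# Passing an Input object lets the model build its weights immediately.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(3),
])
print(len(model.weights))   # 2: the kernel and bias of the Dense layer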

1.5 Natural Language Toolkit:

NLTK is a leading platform for building Python programs to work with human language data. It
provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along

with a suite of text processing libraries for classification, tokenization, stemming, tagging,
parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active
discussion forum.

Thanks to a hands-on guide introducing programming fundamentals alongside topics in
computational linguistics, plus comprehensive API documentation, NLTK is suitable for
linguists, engineers, students, educators, researchers, and industry users alike. NLTK is available
for Windows, Mac OS X, and Linux. Best of all, NLTK is a free, open source, community-
driven project.

NLTK has been called “a wonderful tool for teaching, and working in, computational linguistics
using Python,” and “an amazing library to play with natural language.”

Natural Language Processing with Python provides a practical introduction to programming for
language processing. Written by the creators of NLTK, it guides the reader through the
fundamentals of writing Python programs, working with corpora, categorizing text, analyzing
linguistic structure, and more. The online version of the book has been updated for Python
3 and NLTK 3.
NLTK is a toolkit built for working with NLP in Python. It provides various text processing
libraries with a lot of test datasets. A variety of tasks can be performed using NLTK, such as
tokenizing, parse tree visualization, etc. In this section, we will go through how we can set up
NLTK in our system and use it for performing various NLP tasks during the text processing step.

Fig 1.5(a): Tokenize and tagging text

After running the NLTK downloader, you will see a screen from which you can install the required
corpora. Once you have completed this step, let's dive deep into the different operations using NLTK.

Tokenization:
Breaking text down into smaller units is called tokenization, and the resulting units are called
tokens; a token is a small part of the text. If we have a sentence, the idea is to separate each
word and build a vocabulary such that we can represent all words uniquely in a list. Numbers,
words, etc. all fall under tokens.
The security and risk reduction benefits of tokenization require that the tokenization system is
logically isolated and segmented from data processing systems and applications that previously
processed or stored sensitive data replaced by tokens. Only the tokenization system can tokenize
data to create tokens, or detokenize back to redeem sensitive data under strict security controls.
The token generation method must be proven to have the property that there is no feasible means
through direct attack, cryptanalysis, side channel analysis, token mapping table exposure or brute
force techniques to reverse tokens back to live data.

Replacing live data with tokens in systems is intended to minimize exposure of sensitive data to
those applications, stores, people and processes, reducing risk of compromise or accidental
exposure and unauthorized access to sensitive data. Applications can operate using tokens
instead of live data, with the exception of a small number of trusted applications explicitly
permitted to detokenize when strictly necessary for an approved business purpose. Tokenization
systems may be operated in-house within a secure isolated segment of the data center, or as a
service from a secure service provider.

Tokenization may be used to safeguard sensitive data involving, for example, bank accounts,
financial statements, medical records, criminal records, driver's licenses, loan applications, stock
trades, voter registrations, and other types of personally identifiable information (PII).
Tokenization is often used in credit card processing. The PCI Council defines tokenization as "a
process by which the primary account number (PAN) is replaced with a surrogate value called a
token. De-tokenization[5] is the reverse process of redeeming a token for its associated PAN
value. The security of an individual token relies predominantly on the infeasibility of
determining the original PAN knowing only the surrogate value".[6] The choice of tokenization
as an alternative to other techniques such as encryption will depend on varying regulatory
requirements, interpretation, and acceptance by respective auditing or assessment entities. This is
in addition to any technical, architectural or operational constraint that tokenization imposes in
practical use.

Fig 1.5(b): Tokenization
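As a short sketch of the NLTK operations shown in Figs 1.5(a) and 1.5(b) (the report's own figures
are screenshots), tokenization and part-of-speech tagging look like this; the example sentence is
invented:

import nltk
from nltk.tokenize import word_tokenize

# The tokenizer and tagger need their data packages downloaded once.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

text = "Open YouTube and play some music."
tokens = word_tokenize(text)    # break the text into word tokens
print(tokens)   # ['Open', 'YouTube', 'and', 'play', 'some', 'music', '.']

tags = nltk.pos_tag(tokens)     # tag each token with its part of speech
print(tags)     # e.g. [('Open', 'VB'), ('YouTube', 'NNP'), ...]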

Lower case conversion:


We do not want our model to get confused by seeing the same word with different cases (one
occurrence capitalized, another not) and interpret the two differently. So we convert all words
into lower case to avoid redundancy in the token list.

Fig 1.5(c): Lower case conversion
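A one-line sketch of the conversion in Fig 1.5(c), applied to an invented token list:

tokens = ["Open", "YouTube", "And", "Play"]
tokens = [t.lower() for t in tokens]   # normalize case so "Open" and "open" match
print(tokens)   # ['open', 'youtube', 'and', 'play']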

1.6 Lemmatizer
Lemmatization in NLTK is the algorithmic process of finding the lemma of a word
depending on its meaning and context. Lemmatization usually refers to the morphological
analysis of words, which aims to remove inflectional endings. It helps in returning the base or
dictionary form of a word known as the lemma.

The NLTK lemmatization method is based on WordNet's built-in morph function. Text
preprocessing includes both stemming and lemmatization. Many people find the two terms confusing;
some treat them as the same, but there is a difference between stemming and lemmatization.
Lemmatization is preferred over the former for the reasons below.

A stemming algorithm works by cutting the suffix from the word; in a broader sense, it cuts either
the beginning or the end of the word.

On the contrary, lemmatization is a more powerful operation that takes into consideration the
morphological analysis of the words. It returns the lemma, which is the base form of all its
inflectional forms. In-depth linguistic knowledge is required to create the dictionaries and look
up the proper form of the word. Stemming is a general operation, while lemmatization is an
intelligent operation in which the proper form is looked up in the dictionary. Hence,
lemmatization helps in forming better machine learning features.

A lemmatizer minimizes text ambiguity. For example, words like "bicycle" or "bicycles" are
converted to the base word "bicycle". Basically, it converts all words having the same meaning but
different representations to their base form. It reduces the word density in the given text and
helps in preparing accurate features for training the machine. The cleaner the data, the more
intelligent and accurate your machine learning model will be. Lemmatization also saves memory as
well as computational cost.

In many languages, words appear in several inflected forms. For example, in English, the verb 'to
walk' may appear as 'walk', 'walked', 'walks' or 'walking'. The base form, 'walk', that one might
look up in a dictionary, is called the lemma for the word. The association of the base form with a
part of speech is often called a lexeme of the word.

Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a
single word without knowledge of the context, and therefore cannot discriminate between words
which have different meanings depending on part of speech. However, stemmers are typically
easier to implement and run faster. The reduced "accuracy" may not matter for some
applications. In fact, when used within information retrieval systems, stemming improves query
recall accuracy, or true positive rate, when compared to lemmatisation. Nonetheless, stemming
reduces precision, or the proportion of positively-labeled instances that are actually positive, for
such systems.
For instance:

The word "better" has "good" as its lemma. This link is missed by stemming, as it requires a
dictionary look-up.

The word "walk" is the base form for the word "walking", and hence this is matched in both
stemming and lemmatisation.

The word "meeting" can be either the base form of a noun or a form of a verb ("to meet")
depending on the context; e.g., "in our last meeting" or "We are meeting again tomorrow".
Unlike stemming, lemmatisation attempts to select the correct lemma depending on the context.

Fig 1.6: Lemmatization code

NLTK Lemmatization is the process of grouping the inflected forms of a word in order to
analyze them as a single word in linguistics. NLTK has different lemmatization algorithms and
functions for using different lemma determinations. Lemmatization is more useful to see a
word’s context within a document when compared to stemming. Unlike stemming,
lemmatization uses the part of speech tags and the meaning of the word in the sentence to see the
main context of the document. Thus, NLTK Lemmatization is important for understanding a text
and using it for Natural Language Processing, and Natural Language Understanding practices.
NLTK lemmatization amounts to morphological analysis of the words via NLTK. For text analysis,
both stemming and lemmatization can be used within NLTK. The main use cases of NLTK lemmatization
are below.
1. Information Retrieval Systems
2. Indexing Documents within Word Lists
3. Text Understanding
4. Text Clustering
5. Word Tokenization and Visualization

The NLTK lemmatization example above contains word tokenization and a specific lemmatization
function that returns the words' original forms within the sentence and their lemmas within a
dictionary. The code block (shown as a screenshot in Fig 1.6) can be explained as follows; a
reconstructed sketch follows the list.

1. Import the WordNetLemmatizer from nltk.stem.
2. Import word_tokenize from nltk.tokenize.
3. Create a variable for the WordNetLemmatizer() method representation.
4. Define a custom function for NLTK lemmatization with an argument that will include
the text for lemmatization.
5. Use a list and a for loop for tokenization and lemmatization.
6. Append the tokenized and lemmatized words into a dictionary to compare their lemma
and original forms to each other.
7. Call the function with an example text for lemmatization.
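Since Fig 1.6 is a screenshot, here is a hedged reconstruction of those steps; the example sentence
is invented, and the "punkt" and "wordnet" corpora must be downloaded once via nltk.download():

from nltk.stem import WordNetLemmatizer    # step 1
from nltk.tokenize import word_tokenize    # step 2

lemmatizer = WordNetLemmatizer()           # step 3

def lemmatize_text(text):                  # step 4
    lemmas = {}
    for token in word_tokenize(text):      # step 5
        lemmas[token] = lemmatizer.lemmatize(token)   # step 6
    return lemmas

print(lemmatize_text("the cats are meeting near the bicycles"))   # step 7
# e.g. {'the': 'the', 'cats': 'cat', 'are': 'are', 'meeting': 'meeting', ...}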

1.7 NumPy:

NumPy is the most widely used array-processing library in Python. NumPy is a general-purpose
array-processing package: it provides a high-performance multidimensional array object and tools
for working with these arrays. It is the fundamental package for scientific computing with Python,
and it is open-source software. It contains various features, including these important ones:

- A powerful N-dimensional array object
- Sophisticated (broadcasting) functions
- Tools for integrating C/C++ and Fortran code
- Useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data types can be defined, which allows NumPy to seamlessly
and speedily integrate with a wide variety of databases.

NumPy, which stands for Numerical Python, is a library consisting of multidimensional array
objects and a collection of routines for processing those arrays. Using NumPy, mathematical and
logical operations on arrays can be performed. This section covers the basics of NumPy, such as
its architecture and environment, and touches on the various array functions and types of
indexing, with examples for better understanding. A basic understanding of Python and general
programming terminology is assumed; the material is specifically useful for algorithm developers.

Fig 1.7 Creating Array using Numpy

For example, you can create an array from a regular Python list or tuple using the array function.
The type of the resulting array is deduced from the type of the elements in the sequences.

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy
offers several functions to create arrays with initial placeholder content. These minimize the
necessity of growing arrays, an expensive operation. For example: np.zeros, np.ones, np.full,
np.empty, etc.

To create sequences of numbers, NumPy provides functions analogous to range that return
arrays instead of lists.
arange: returns evenly spaced values within a given interval; the step size is specified.
linspace: returns evenly spaced values within a given interval; the number of elements (num) is
specified.
Reshaping an array: We can use the reshape method to reshape an array. Consider an array with
shape (a1, a2, a3, ..., aN). We can reshape and convert it into another array with shape
(b1, b2, b3, ..., bM). The only required condition is a1 x a2 x a3 x ... x aN = b1 x b2 x b3 x ... x bM
(i.e., the original size of the array remains unchanged).
Flattening an array: We can use the flatten method to get a copy of the array collapsed into one
dimension. It accepts an order argument; the default value is 'C' (for row-major order). Use 'F'
for column-major order.
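A short sketch pulling these operations together (the values are invented for illustration):

import numpy as np

a = np.array([1, 2, 3])             # array created from a Python list
z = np.zeros((2, 3))                # 2x3 placeholder array of zeros

r = np.arange(0, 10, 2)             # [0 2 4 6 8]          (step size specified)
lin = np.linspace(0, 1, 5)          # [0. 0.25 0.5 0.75 1.] (num specified)

b = np.arange(12).reshape(3, 4)     # valid because 3 x 4 == 12
f = b.flatten()                     # copy collapsed into one dimension
print(b.shape, f.shape)             # (3, 4) (12,)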
Python is a great general-purpose programming language on its own, but with the help of a few
popular libraries (NumPy, SciPy, Matplotlib) it becomes a powerful environment for scientific
computing. For readers with little exposure to Python and NumPy, this section serves as a quick
crash course on both the Python programming language and its use for scientific computing.

1.8 TensorFlow

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive,
flexible ecosystem of tools, libraries, and community resources that lets researchers push the
state-of-the-art in ML, and gives developers the ability to easily build and deploy ML-powered
applications.

TensorFlow provides a collection of workflows with intuitive, high-level APIs for both
beginners and experts to create machine learning models in numerous languages. Developers
have the option to deploy models on a number of platforms such as on servers, in the cloud, on
mobile and edge devices, in browsers, and on many other JavaScript platforms. This enables
developers to go from model building and training to deployment much more easily.

TensorFlow is an open source software library for numerical computation using data-flow
graphs. It was originally developed by the Google Brain Team within Google's Machine
Intelligence research organization for machine learning and deep neural networks research, but
the system is general enough to be applicable in a wide variety of other domains as well. It
reached version 1.0 in February 2017, and has continued rapid development, with 21,000+
commits thus far, many from outside contributors. This article introduces TensorFlow, its open
source community and ecosystem, and highlights some interesting TensorFlow open sourced
models.

TensorFlow is cross-platform. It runs on nearly everything: GPUs and CPUs, including mobile
and embedded platforms, and even tensor processing units (TPUs), which are specialized hardware
for doing tensor math; TPUs were not yet widely available at the time of writing.

Keras is a compact, easy-to-learn, high-level Python library that runs on top of the TensorFlow
framework. It is made with a focus on understanding deep learning techniques, such as creating
layers for neural networks while maintaining the concepts of shapes and mathematical details. A
model can be created in one of the following two ways:

i. Sequential API
ii. Functional API

Consider the following eight steps to create a deep learning model in Keras (a minimal sketch
follows the list):

i. Load the data
ii. Preprocess the loaded data
iii. Define the model
iv. Compile the model
v. Fit the model
vi. Evaluate the model
vii. Make the required predictions
viii. Save the model
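As promised above, a minimal sketch of the eight steps; the toy data, shapes, and filename are
invented purely for the example:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# i-ii. Load and preprocess the data (random toy data, purely illustrative).
x = np.random.rand(100, 8).astype("float32")
y = (x.sum(axis=1) > 4).astype("float32")

# iii. Define the model.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# iv. Compile the model.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# v. Fit the model.
model.fit(x, y, epochs=5, batch_size=16, verbose=0)

# vi. Evaluate it.
loss, acc = model.evaluate(x, y, verbose=0)

# vii. Make the required predictions.
preds = model.predict(x[:3], verbose=0)

# viii. Save the model.
model.save("model.h5")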

1.9 Keras

Keras is an open-source, high-level neural network library written in Python that is capable of
running on top of Theano, TensorFlow, or CNTK. It was developed by Francois Chollet, a Google
engineer. It is user-friendly, extensible, and modular, facilitating faster experimentation with
deep neural networks. It supports not only Convolutional Networks and Recurrent Networks
individually but also their combination.

Keras does not handle low-level computations itself; it makes use of a backend library to do so.
The backend library acts as a high-level API wrapper for the low-level API, which lets Keras run
on TensorFlow, CNTK, or Theano.

Initially, it had over 4,800 contributors at launch, and the community has since grown to around
250,000 developers, roughly doubling every year. Big companies like Microsoft, Google, NVIDIA, and
Amazon have actively contributed to the development of Keras. It has amazing industry adoption and
is used in development at popular firms like Netflix, Uber, Google, Expedia, etc.

Keras user experience

1. Keras is an API designed for humans.
Keras follows best practices to decrease cognitive load, ensures that models are consistent, and
keeps the corresponding APIs simple.
2. Not designed for machines.
Keras provides clear feedback upon the occurrence of any error, which minimizes the number of user
actions for the majority of common use cases.
3. Easy to learn and use.
4. Highly flexible.
Keras provides high flexibility to all of its developers by integrating with low-level deep
learning languages such as TensorFlow or Theano, which ensures that anything written in the base
language can be implemented in Keras.

Keras can be developed in R as well as Python, such that the code can be run with TensorFlow,
Theano, CNTK, or MXNet as per the requirement. Keras can run on CPUs, NVIDIA GPUs, AMD GPUs, TPUs,
etc. Producing models with Keras is really simple, as it fully supports running with TensorFlow
Serving, GPU acceleration (WebKeras, Keras.js), Android (TF, TF Lite), iOS (native CoreML) and
Raspberry Pi.

Keras is a high-level neural networks API developed with a focus on enabling fast
experimentation. Being able to go from idea to result with the least possible delay is key to doing
good research. Keras has the following key features:

- Allows the same code to run on CPU or on GPU, seamlessly.

- User-friendly API which makes it easy to quickly prototype deep learning models.

- Built-in support for convolutional networks (for computer vision), recurrent networks
(for sequence processing), and any combination of both.

- Supports arbitrary network architectures: multi-input or multi-output models, layer
sharing, model sharing, etc. This means that Keras is appropriate for building essentially
any deep learning model, from a memory network to a neural Turing machine.

Keras empowers engineers and researchers to take full advantage of the scalability and cross-
platform capabilities of TensorFlow 2: you can run Keras on TPU or on large clusters of GPUs,
and you can export your Keras models to run in the browser or on a mobile device.

2. LITERATURE SURVEY

Desktop Voice Assistant for Visually Impaired

The usage of virtual assistants has been expanding rapidly since 2017, and more and more products
are coming onto the market. Due to advances in technology, many different features are being added
to mobile phones and desktops. To use them in a more convenient and fun way, we require a means of
input which is fast and reliable at the same time. In our project, voice commands are used to
input data into the system; for that, the microphone is used, which converts acoustic energy into
electrical energy. After taking the input, there is a requirement to understand the audio signal,
for which the Google API is used. Different companies like Google and Apple use different APIs for
this purpose. It is truly a feat that today one can schedule meetings or send email merely through
spoken commands.

1. Speech recognition: The proposed system uses the Google API to convert input speech into text.
The speech is given as input to Google Cloud for processing; as output, the system receives the
resulting text.
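A hedged sketch of this step, assuming the third-party SpeechRecognition package (which wraps the
Google Web Speech API; not necessarily the exact library used in the surveyed paper):

import speech_recognition as sr   # assumes the SpeechRecognition package

recognizer = sr.Recognizer()
with sr.Microphone() as source:               # microphone turns sound into a signal
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)         # capture the spoken command

try:
    # Send the audio to Google's speech service and get back the text.
    command = recognizer.recognize_google(audio)
    print("You said:", command)
except sr.UnknownValueError:
    print("Could not understand the audio")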

2. Backend work: At the backend, Python gets the output from speech recognition and then
identifies whether the command is a system command or a browser command. The output is sent back
to the Python backend to give the desired output to the user.

3. Text to speech: Text to speech, or TTS, is a technique for transforming text into audible
speech. It is not to be confused with voice response systems that instead generate speech by
joining strings gathered in an exhaustive database of preinstalled text, forming full-fledged
sentences, clauses or meaningful phrases through a dialect's graphemes and phonemes. Such systems
have their limits, as they can only produce output on the basis of pre-determined text in their
databases. TTS systems, on the other hand, can practically "read" arbitrary strings of characters
and dole out the resulting sentences, clauses and phrases.
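A minimal sketch of a TTS call, assuming the pyttsx3 offline library (a common choice in Python
assistants; the surveyed paper does not name its TTS engine):

import pyttsx3   # assumes the pyttsx3 package

engine = pyttsx3.init()                  # initialize the TTS engine
engine.setProperty("rate", 150)          # speaking speed, in words per minute

engine.say("Opening the control panel")  # queue a phrase to speak
engine.runAndWait()                      # speak and block until finished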

ii) Proposed architecture: The system design consists of:

1. Taking the input as speech patterns through the microphone.
2. Recognizing the audio data and converting it into text.
3. Comparing the input with predefined commands.
4. Giving the desired output.

The initial phase includes the data being taken in as speech patterns from the microphone. In the
second phase, the collected data is worked over and transformed into textual data using NLP. In
the next step, this resulting stringified data is manipulated through a Python script to finalize
the required output process. In the last phase, the produced output is presented either in the
form of text or converted from text to speech using TTS.

Fig 2: Data flow diagram

An Intelligent Behaviour Shown by Chatbot System:

A chatterbot is a computer program which conducts a conversation via auditory or textual methods.
Such programs are often created to convincingly simulate how a human would behave as a
conversational partner, thereby passing the Turing test. Chatbots are mainly used in dialog
systems for various practical purposes including customer services or information acquisition.
There are two main types of chatbots available: one whose functions are based on a set of rules,
and the other, a more advanced version, which uses artificial intelligence. The former tends to be
limited, and its smartness depends upon the complexity of the program: the more complex the
program, the smarter the bot. The one that uses artificial intelligence understands language, not
just commands, and continuously gets smarter as it learns from conversations with people. A
chatbot can also perform some basic functions like calculations, setting up reminders, alarms,
etc. A popular example is ALICE Bot (Artificial Linguistic Internet Computer Entity), which uses
AIML (Artificial Intelligence Mark-Up Language) pattern matching techniques. The Turing test is
one of the most popular measures of intelligence of such systems. This test was proposed by
British mathematician Alan Turing in his 1950 paper titled "Computing Machinery and Intelligence",
published in Mind. According to this test, when a panel of human beings conversing with an unknown
entity believes that the entity is human while it was in fact a computer, the computer is said to
have passed the Turing test. Natural language processing (NLP) gives computers the capability to
communicate user-to-computer (human-to-machine) or computer-to-computer (machine-to-machine) using
human natural language. (The surveyed paper is by Vibhor Sharma, Monika Goyal and Drishti Malik,
Department of Information Technology, Maharaja Agrasen Institute of Technology, GGSIPU, New Delhi,
India.)

Intelligent Android Voice Assistant - A Future Requisite:

The smart phone market is one of the most competitive markets in the world today with various
competitors such as Samsung, Google, Sony, HTC etc. As the users increase day by day,
facilities are also increasing. In recent years, smart phones have placed a rising emphasis on
bringing speech technologies into mainstream usage. The purpose of voice assistant systems is
the exchange of information in a more interactive approach using speech to commune. It is
estimated that smart phones captured 44% of all mobile phone sales in the December 2012
quarter with Android smart phones taking 31% of all mobile phone shipments and iOS in second
place at 9%. Android smart phones grew 88% year over year, with iOS at 23%. So it is preferable
that the application be made on the Android platform, as more people can be served by personal
assistants. [1] According to a survey, 54% of users agreed that
AI personal assistants make their lives easier. 31% said that AI assistants are part of their
everyday lives. 65% agreed that they have many different uses. Looking forward, the result of
the survey also unfold that 65% of users said that they regularly ask general questions to their AI
personal assistants. 40% use them to get directions while they drive. 25% use them to make calls.
23% dictate texts or emails through those assistants. 17% use them to receive updates. And 9%
use them in other ways, like for weather alerts and appointment reminders.[2] This paper
presents an Intelligent Android Voice Assistant system using speech recognition technologies of
mobile devices for every common user who is interested in AI personal assistant.

Voice assistant privacy concerns:


A user may have privacy concerns, as personal assistants require a huge amount of data and are
always listening to take commands. This passive data is then retained and sifted through by humans
employed by almost all of the major companies (Amazon, Apple, etc.). In addition to the discovery
that the AI is able to record our audio interactions, there have been concerns over the type of
data such employees and contractors were hearing. So, in a cloud-based voice assistant, a privacy
policy must be in place to protect user information.
As voice assistants and voice-activated Internet of Things devices work their way into people's
homes, there are plenty of new opportunities for developers and businesses. Creating an app that
obeys voice commands and interacts with a third-party voice assistant presents the opportunity to
reach a rapidly-growing market of tech-savvy consumers.
But you must proceed with caution. There are significant privacy issues associated with this type
of technology. There's a real danger of scaring people away if you aren't totally transparent.
Voice assistants and privacy is a topic that's often in the news. While most of the attention is on
device manufacturers, anyone who offers tools and services that use the technology also needs to
think carefully about the privacy implications.

Whether it's a mobile app that uses voice recognition, or a dedicated service for a voice assistant
gadget, you need to comply with both the rules of the device manufacturer and a host of national
and international laws.
Everyone is familiar with voice assistants: in many households, Alexa is practically a member of
the family, and whenever kids nowadays want to hear their favorite tunes, all they need to do is
ask, “Alexa, play … “. Every iPhone user knows Siri, every Android phone user might know the
Google Assistant. And, of course, the list goes on: there is Samsung’s Bixby, Microsoft’s
Cortana, Huawei’s HiVoice, etc. However, to be fair, 97% of all voice assistant users choose one
of the big three: Alexa, Siri, or Google Assistant.

Voice assistants are integrated into smart devices, like Alexa in Amazon Echo and Amazon Echo
Display, Apple’s Siri in HomePod or Google Assistant in Google Home. They are also usually
compatible with other digital equipment, such as smart TVs, mobile phones etc. Moreover,
Alexa’s functionality, for instance, can be extended not only by Amazon itself, (functions called
“Alexa skills”), but also through skills provided by third party users (e.g., radio station x
provides the skill “Alexa, play [radio station x]” in the Amazon’s skill store).

And, last but certainly not least, there is the automotive industry. Nearly every car company
either provides their own voice assistant, an interface for Google Assistant and/or Siri, or
integrates Alexa directly. Every driver of a modern Audi, Mercedes or BMW knows their “Hey
[Audi | Mercedes | BMW]” activating commands.

3. METHODOLOGY

3.1 Proposed System

We developed a program that serves the needs of the user, but the user cannot always run the code
directly whenever it is needed. So we developed a GUI (graphical user interface) using QWidgets.
It is a good tool for developers to design an efficient interface, where the data and the features
of the system are represented graphically, so the user can interact with the interface directly
and conveniently. A JSON file is used as the dataset in our project. It includes various kinds of
data, categorized by a tag or label that tells which kind of data is stored under the name of that
tag. The model we developed has to be trained: with more and more training, the results become
more accurate and the loss function (the gap between expected output and actual output) decreases.
We used the Sequential model, which is part of Keras. It is a linear stack of layers, and each
layer has a particular number of neurons which process the data to produce the desired results.
Keras is an open-source platform which acts as an interface between Python and ANNs (artificial
neural networks). Using it, we trained and tested the data stored in the JSON dataset files. Once
the model is trained and tested, the assistant is ready to answer user queries by performing
internal operations. Speech-to-text conversion is done using Python's speech recognition modules;
the converted text is then processed with a lemmatizer, part of NLP (natural language processing),
to identify and categorize the words into groups under labels like greetings, Google search,
applications, etc. After the text is identified, it is sent to the trained model, which determines
the intent or tag behind the command or instruction. The model communicates with the user through
voice responses, asking for instructions to perform actions in the system, like opening an
application or searching for information, and asking for clarification when an instruction is
unclear. Depending on the intent (application, YouTube search, Google search, etc.), it performs
the corresponding task: if it is an application, it opens or closes the application; if it is a
YouTube search, it goes to YouTube and searches for the required text; if it is a Google search,
it goes to the browser and searches for the required text in Chrome. Every time the model is used,
it is trained further, and over time it produces more accurate results.
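A condensed, hedged sketch of the training pipeline described above; the intents.json filename,
its tag/patterns structure, and the layer sizes are illustrative assumptions, not the project's
exact code (the NLTK "punkt" and "wordnet" corpora must be downloaded first):

import json
import numpy as np
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from tensorflow import keras
from tensorflow.keras import layers

lemmatizer = WordNetLemmatizer()
intents = json.load(open("intents.json"))   # hypothetical dataset of tagged patterns

# Build a vocabulary of lemmatized words and (tokens, tag) training pairs.
words, tags, docs = [], [], []
for intent in intents["intents"]:
    for pattern in intent["patterns"]:
        tokens = [lemmatizer.lemmatize(t.lower()) for t in word_tokenize(pattern)]
        words.extend(tokens)
        docs.append((tokens, intent["tag"]))
    if intent["tag"] not in tags:
        tags.append(intent["tag"])
words = sorted(set(words))

# Bag-of-words feature vectors and integer intent labels.
x = np.array([[1 if w in tokens else 0 for w in words] for tokens, _ in docs])
y = np.array([tags.index(tag) for _, tag in docs])

# Sequential model: a linear stack of layers classifying the intent tag.
model = keras.Sequential([
    keras.Input(shape=(len(words),)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(len(tags), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=200, verbose=0)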

3.2 Architecture:

Fig 3.2: Voice Based System Assistant Architecture

The architecture follows the pipeline described in Section 3.1. Speech is first converted to text
using Python's speech recognition modules. The converted text is processed with a lemmatizer (part
of NLP, natural language processing) to identify and categorize the words into groups under labels
like greetings, Google search, applications, etc. The identified text is sent to the trained
model, which determines the intent or tag behind the command. The model then communicates with the
user through voice responses, asking for instructions to perform actions in the system, such as
opening an application or searching for information, and asking for clarification when needed.
Depending on the intent (application, YouTube search, Google search, etc.), it performs the
corresponding task; for an application intent, it opens or closes the application.

4. System Design

UML is an acronym that stands for Unified Modeling Language. Simply put, UML is a modern
approach to modeling and documenting software, based on diagrammatic representations of
software components. As the old proverb says, “a picture is worth a thousand words”: visual
representations help us better understand possible flaws or errors in software or business
processes. UML was created in response to the chaos surrounding software development and
documentation in the 1990s, when there were several competing ways to represent and
document software systems. The need arose for a more unified way to visually represent those
systems, and as a result UML was developed in 1994-1996 by three software engineers working
at Rational Software. It was adopted as a standard in 1997 and has remained the standard ever
since, receiving only a few updates.

UML is mainly used as a general-purpose modeling language in the field of software
engineering, but it has also found its way into the documentation of business processes and
workflows. For example, activity diagrams, a type of UML diagram, can be used as a
replacement for flowcharts; they provide both a more standardized way of modeling workflows
and a wider range of features to improve readability and efficacy. There are several types of
UML diagrams, each serving a different purpose regardless of whether it is designed before
implementation or afterwards (as part of documentation). The two broad categories that
encompass all other types are behavioral UML diagrams and structural UML diagrams: as the
names suggest, structural diagrams analyze and depict the structure of a system or process,
whereas behavioral diagrams describe the behavior of the system, its actors, and its building
components.

4.1 State Chart Diagram:

The statechart diagram is one of the five UML diagrams used to model the dynamic nature of a
system. It defines the different states of an object during its lifetime, and these states are changed
by events. Statechart diagrams are therefore useful for modeling reactive systems, i.e. systems
that respond to external or internal events. A statechart diagram describes the flow of control
from one state to another: a state is defined as a condition in which an object exists, and it
changes when some event is triggered. The most important purpose of a statechart diagram is to
model the lifetime of an object from creation to termination.

Fig 4.1 State Chart diagram

4.2 Use Case Diagram:

A UML use case diagram is the primary form of system/software requirements for a new
software program under development. Use cases specify the expected behavior (what), not the
exact method of making it happen (how). Once specified, use cases can be denoted in both
textual and visual representation (i.e. a use case diagram). A key concept of use case modeling is
that it helps us design a system from the end user's perspective: it is an effective technique for
communicating system behavior in the user's terms by specifying all externally visible system
behavior.

Fig 4.2 Use Case Diagram

4.3 Sequence UML Diagram

Unified Modeling Language (UML) is a modeling language in the field of software engineering
which aims to set standard ways to visualize the design of a system. UML guides the creation of
multiple types of diagrams, such as interaction, structure and behaviour diagrams. An interaction
diagram is used to show the interactive behavior of a system; since visualizing the interactions
in a system can be a cumbersome task, we use different types of interaction diagrams to capture
various features and aspects of interaction. The sequence diagram is the most commonly used
interaction diagram: it simply depicts interactions between objects in sequential order, i.e. the
order in which these interactions take place. The terms event diagram and event scenario are
also used to refer to a sequence diagram.

Fig 4.3 Sequence UML Diagram

5. EXPERIMENTAL ANALYSIS AND RESULTS

5.1 System Configuration

5.1.1 Software Requirements
Operating System: Windows XP and above.
Programming Language: Python.
Technology: Deep Learning.

5.1.2 Hardware Requirements


Processor: Intel Core i5 or above.
RAM: 4GB and above.
Graphic card: NVIDIA GTX 1060.
Hard Disk: 1 TB and above.

5.2 Sample Code

Main file

from googletrans import Translator

import config
import model
import utils
from intents import mail, note
from intents.application import Applications
from intents.google_search import GoogleSearch
from intents.youtube_search import YoutubeSearch
from model.model_training import TrainingModel
from PyQt5 import QtWidgets, QtGui, QtCore
from PyQt5.QtGui import QMovie
import sys
from PyQt5.QtWidgets import *
from PyQt5.QtCore import *
from PyQt5.QtGui import *
from PyQt5.uic import loadUiType
import speech_recognition as sr
import os
import time
import datetime

import pyautogui

flags = QtCore.Qt.WindowFlags(QtCore.Qt.FramelessWindowHint)

def wish():
    # Greet the user according to the current time of day.
    hour = int(datetime.datetime.now().hour)
    if hour >= 0 and hour < 12:
        utils.speak("Good morning")
    elif hour >= 12 and hour < 18:
        utils.speak("Good Afternoon")
    else:
        utils.speak("Good night")

class mainT(QThread):
    def __init__(self):
        super(mainT, self).__init__()

    def run(self):
        self.Assistant()

    def STT(self):
        # Speech-to-text: record from the microphone and send the audio
        # to the Google speech recognition service.
        R = sr.Recognizer()
        with sr.Microphone() as source:
            print("Listening...........")
            audio = R.listen(source)
        try:
            print("Recognizing......")
            text = R.recognize_google(audio, language='en-in')
            print(">> ", text)
        except Exception:
            utils.speak("Sorry Speak Again")
            return "None"
        text = text.lower()
        return text

    def Assistant(self):
        wish()
        words = model.words
        classes = model.classes
        data_x = model.data_x
        data_y = model.data_y
        training_model = TrainingModel(words, classes, data_x, data_y)
        trained_model = training_model.train()

        while True:
            command = self.STT()
            # Skip empty or unrecognized input (the original condition
            # chained comparisons with 'or', which is always true).
            if command and command != "None":
                intent = training_model.get_intent(trained_model, command)
                response = TrainingModel.get_response(intent, config.DATA)
                print(intent, ' : ', response)

                if intent == 'greeting':
                    utils.speak(response=response)
                elif intent == 'youtube_search':
                    utils.speak(response=response)
                    YoutubeSearch.launch(command)
                elif intent == 'google_search':
                    utils.speak(response=response)
                    GoogleSearch.launch(command)
                elif intent == 'applications':
                    Applications(response).launch(command)
                elif intent == "note":
                    utils.speak("What would you like me to write down?")
                    note_text = self.STT()
                    note.take_note(note_text)
                    utils.speak("I've made a note of that")
                elif intent == "close_note":
                    utils.speak("Okay sir, closing notepad")
                    os.system("taskkill /f /im notepad.exe")
                elif intent == "screen_shot":
                    utils.speak("Alright sir, taking the screenshot")
                    img = pyautogui.screenshot()
                    img.save(r"C:\Users\seshu\Desktop\screenshot.png")
                    utils.speak(response=response)

FROM_MAIN, _ = loadUiType(os.path.join(os.path.dirname(__file__), "./scifi.ui"))

class Main(QMainWindow, FROM_MAIN):
    def __init__(self, parent=None):
        super(Main, self).__init__(parent)
        self.setupUi(self)
        self.setFixedSize(1920, 1080)
        self.exitB.setStyleSheet("background-image:url(./lib/exit - Copy.png);\n"
                                 "border:none;")
        self.exitB.clicked.connect(self.close)
        self.setWindowFlags(flags)
        # Keep a reference on self so the worker thread is not garbage-collected.
        self.Dspeak = mainT()
        self.label_7 = QMovie("./lib/gifloader.gif", QByteArray(), self)
        self.label_7.setCacheMode(QMovie.CacheAll)
        self.label_4.setMovie(self.label_7)
        self.label_7.start()

        self.ts = time.strftime("%A, %d %B")

        self.Dspeak.start()
        self.label.setPixmap(QPixmap("./lib/tuse.png"))
        self.label_5.setText("<font size=8 color='white'>" + self.ts + "</font>")
        self.label_5.setFont(QFont(QFont('Acens', 8)))

app = QtWidgets.QApplication(sys.argv)
main = Main()
main.show()
exit(app.exec_())

Model Training

import random
import string

import nltk
import numpy as np
import tensorflow as tf
from nltk.stem import WordNetLemmatizer
from keras import Sequential
from keras.layers import Dense, Dropout

class TrainingModel:
    def __init__(self, words, classes, data_x, data_y):
        self.words = words
        self.classes = classes
        self.data_x = data_x
        self.data_y = data_y
        self.lemmatizer = WordNetLemmatizer()

    def train(self):
        # Lemmatize the vocabulary, dropping punctuation tokens.
        words = [self.lemmatizer.lemmatize(word.lower()) for word in self.words
                 if word not in string.punctuation]

        training = []
        out_empty = [0] * len(self.classes)

        # Encode every utterance as a bag-of-words vector paired with a
        # one-hot vector for its intent class.
        for idx, doc in enumerate(self.data_x):
            text = self.lemmatizer.lemmatize(doc.lower())
            bow = [1 if word in text else 0 for word in words]
            output_row = list(out_empty)
            output_row[self.classes.index(self.data_y[idx])] = 1
            training.append([bow, output_row])
        random.shuffle(training)
        training = np.array(training, dtype=object)
        train_x = np.array(list(training[:, 0]))
        train_y = np.array(list(training[:, 1]))

        input_shape = (len(train_x[0]),)
        output_shape = len(train_y[0])

        # Two hidden layers with dropout; softmax outputs one probability
        # per intent class.
        model = Sequential()
        model.add(Dense(128, input_shape=input_shape, activation="relu"))
        model.add(Dropout(0.5))
        model.add(Dense(64, activation="relu"))
        model.add(Dropout(0.3))
        model.add(Dense(output_shape, activation="softmax"))
        adam = tf.keras.optimizers.Adam(learning_rate=0.01, decay=1e-6)
        model.compile(loss='categorical_crossentropy',
                      optimizer=adam,
                      metrics=["accuracy"])

        model.fit(x=train_x, y=train_y, epochs=200, verbose=1)

        return model

    def get_intent(self, model, command):
        # Encode the spoken command with the same bag-of-words scheme and
        # pick the highest-scoring class above the confidence threshold.
        tokens = nltk.word_tokenize(command)
        tokens = [self.lemmatizer.lemmatize(word.lower()) for word in tokens]
        bow = [0] * len(self.words)
        for token in tokens:
            for idx, word in enumerate(self.words):
                if word == token:
                    bow[idx] = 1
        bow = np.array(bow)
        result = model.predict(np.array([bow]))[0]
        thresh = 0.2

        y_pred = [[idx, res] for idx, res in enumerate(result) if res > thresh]
        y_pred.sort(key=lambda x: x[1], reverse=True)

        intent = self.classes[y_pred[0][0]]
        return intent

    @staticmethod
    def get_response(tag, data):
        # Return a random canned response for the matched tag.
        list_of_intents = data['intents']
        for intent in list_of_intents:
            if intent['tag'] == tag:
                if len(intent['response']) > 0:
                    return random.choice(intent['response'])
                else:
                    return None

Application.py

import os
import re
import subprocess

import config
import utils
import intents.windows as iw

class Applications:
    INTENT_NAME = 'applications'
    APP_INSTALLATION_DIRECTORIES = ['/System/Applications', '/Applications',
                                    '/System/Applications/Utilities']

    def __init__(self, response, logger=None):
        self.logger = logger
        self.response = response
        self.os_name = config.OS_NAME

    def get_name(self, command):
        # Strip the intent's trigger words from the command, leaving only
        # the application name; escape spaces for use in a shell command.
        app = utils.get_search_value(command,
                                     intent_name=Applications.INTENT_NAME)
        is_space = bool(re.search(r"\s", app))
        if is_space:
            return app.replace(' ', "\\ ")
        else:
            return app

    def launch(self, command):
        app = self.get_name(command)

        if self.os_name == 'Darwin':
            # macOS: look the app up in the cached JSON file, else search
            # the standard installation directories with `find`.
            path = utils.get_path_from_file(app)
            if path is None:
                patterns = [f'*{app}.app', f'{app}*.app', f'*{app}.app',
                            f'*{app}*.app']
                for directory in Applications.APP_INSTALLATION_DIRECTORIES:
                    if path:
                        break
                    for pattern in patterns:
                        path = os.popen(f"find {directory} -iname '{pattern}'").read() \
                            .split('\n')[0].replace(" ", "\\ ")
                        if path:
                            break
            cmd = f'open {path}'
            self.execute_command(cmd)
            utils.add_to_json({app: {'path': path}})
        else:
            # Windows: same cache-then-search strategy, using the executable
            # extension and directories configured in intents/windows.py.
            path = utils.get_path_from_file(app)
            if path is None:
                path = utils.get_path(app, iw.EXECUTABLE_EXT,
                                      iw.APP_INSTALLATION_DIRECTORIES)
                utils.add_to_json({app: {'path': path}})
            if path:
                cmd = f'explorer "{path}"'
                print('Application : ', cmd)
                self.execute_command(cmd)

    def execute_command(self, cmd):
        utils.speak(response=self.response)
        output = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE).communicate()

        if str(output[1], 'utf-8') != '':
            utils.speak('I am sorry sir, The app which you are looking for is '
                        'not installed in my database.')
Google Search

import webbrowser

import utils

class GoogleSearch:
    @staticmethod
    def launch(query):
        utils.speak("ok sir This is what I found for your search!")
        # str.replace returns a new string, so the result has to be
        # reassigned (the original calls discarded their return values).
        query = query.replace("assistant", "")
        query = query.replace("google search", "")
        query = query.replace("search", "")
        query = query.replace("google", "")
        web = "https://www.google.com/search?q=" + query
        webbrowser.open(web)
        utils.speak("Done sir")

Mail.py

import smtplib

def mail(sender_email, sender_password, receiver_email, msg):
    # Send a message through Gmail's SMTP server over TLS.
    try:
        mail = smtplib.SMTP('smtp.gmail.com', 587)
        mail.ehlo()
        mail.starttls()
        mail.login(sender_email, sender_password)
        mail.sendmail(sender_email, receiver_email, msg)
        mail.close()
        return True
    except Exception as e:
        print(e)
        return False
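A minimal usage sketch for the helper above; the addresses and password are placeholders, and with Gmail an app-specific password is typically required rather than the normal account password.

# Hypothetical values purely for illustration.
sent = mail("sender@gmail.com", "app-password", "receiver@example.com",
            "Subject: Test\n\nHello from the voice assistant.")
print("Mail sent" if sent else "Mail failed")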

Note.py

import subprocess
import datetime

def take_note(text):
    # Save the dictated text to a timestamped file and open it in Notepad.
    date = datetime.datetime.now()
    file_name = str(date).replace(":", "-") + "-note.txt"
    with open(file_name, "w") as f:
        f.write(text)
    notepad = "C:\\Windows\\System32\\notepad.exe"
    subprocess.Popen([notepad, file_name])

YouTube

import webbrowser

import utils

class YoutubeSearch:
    @staticmethod
    def launch(query):
        utils.speak("ok sir This is what I found for your search!")
        # As in GoogleSearch, str.replace must be reassigned to take effect.
        query = query.replace("assistant", "")
        query = query.replace("youtube search", "")
        query = query.replace("search", "")
        query = query.replace("youtube", "")
        web = "https://www.youtube.com/results?search_query=" + query
        webbrowser.open(web)
        utils.speak("Done sir")

Config.json

{
  "intents": [
    {
      "tag": "greeting",
      "utterances": [
        "wake up assistant",
        "hello assistant",
        "hi",
        "hello",
        "assistant",
        "hai"
      ],
      "response": [
        "Hello Sir, How may i help you?",
        "Hello Sir, always at your service."
      ]
    },
    {
      "tag": "google_search",
      "utterances": [
        "google search",
        "search",
        "who is ",
        "who are ",
        "what is ",
        "when",
        "why",
        "google"
      ],
      "response": [
        "Got it Sir.",
        "I got the things for which you are looking for.",
        "Give me a sec sir.",
        "Just a moment sir"
      ]
    },
    {
      "tag": "youtube_search",
      "utterances": [
        "play video on youtube",
        "search video on youtube",
        "play",
        "youtube"
      ],
      "response": [
        "Got it Sir.",
        "Give me a second sir."
      ]
    },
    {
      "tag": "applications",
      "utterances": [
        "open",
        "launch",
        "app",
        "application"
      ],
      "response": [
        "Ok sir.",
        "sure sir."
      ]
    },
    {
      "tag": "send_email",
      "utterances": [
        "email",
        "send",
        "send mail",
        "send email",
        "mail"
      ],
      "response": [
        "sending mail sir.",
        "sure sir"
      ]
    },
    {
      "tag": "note",
      "utterances": [
        "note",
        "make a note",
        "write this down",
        "take a note"
      ],
      "response": [
        "sure sir",
        "writing in the notepad sir"
      ]
    },
    {
      "tag": "close_note",
      "utterances": [
        "close note",
        "close notepad",
        "close it",
        "close"
      ],
      "response": [
        "sure sir",
        "closing the notepad sir"
      ]
    }
  ]
}

UTILS

import fnmatch
import json
import os
import random
import re
import webbrowser

import pyttsx3
import speech_recognition as sr

import config
from model.voice_ana import VoiceAnalyzer

def choose_random(response):
    return random.choice(response)

def speak(response):
    # Text-to-speech using the Windows SAPI5 voices.
    engine = pyttsx3.init('sapi5')
    voices = engine.getProperty('voices')
    engine.setProperty('voice', voices[0].id)
    engine.setProperty('rate', 180)
    engine.say(response)
    engine.runAndWait()

def open_url(url):
    webbrowser.open(url)

def find_file(pattern, path):
    # Walk the directory tree and collect every file matching the pattern.
    paths = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                paths.append(os.path.join(root, name))
    if paths:
        return paths

def get_search_value(command, intent_name, match_flag='word'):
    # Remove the intent's trigger words (or whole utterances) from the
    # command, leaving only the payload, e.g. the search terms.
    intents = config.DATA['intents']
    utterances = [intent['utterances'] for intent in intents
                  if intent['tag'] == intent_name][0]
    if match_flag == 'word':
        words = ['\\b' + word + '\\b' for utterance in utterances
                 for word in utterance.split(' ')]
        words = '|'.join(words)
    elif match_flag == 'sentence':
        words = '|'.join(utterances)

    return re.sub(words, '', command, flags=re.IGNORECASE).strip()

def get_path_from_file(app):
    # Look the application up in the cached JSON path file.
    with open(config.APP_DETAILS_FILE) as file:
        app_details = json.load(file)

    app = app_details.get(app)
    if app:
        return app.get('path')

def get_path(app, ext, directories):
    patterns = [f'{app}{ext}', f'{app}*.{ext}', f'*{app}.{ext}', f'*{app}*.{ext}']
    for directory in directories:
        for pattern in patterns:
            result = find_file(pattern, directory)
            if result:
                # find_file returns a list; ask the user to choose when
                # there is more than one match (the original tested
                # len(result), which is true for any non-empty list).
                if len(result) > 1:
                    return get_multiple_paths(result, ext)
                else:
                    return result[0]

def get_multiple_paths(paths, ext):
    speak('I got multiple applications. Which one would you like to open?')
    for path in paths:
        exe_name = os.path.basename(path).replace(ext, '')
        speak(exe_name)
        if path:
            return path

def add_to_json(app_details):
    # Merge the new application path into the cache file.
    with open(config.APP_DETAILS_FILE, 'r+') as file:
        data = json.load(file)
        data.update(app_details)
        file.seek(0)
        json.dump(data, file)

def read_voice_cmd():
    recognizer = sr.Recognizer()
    voice_input = ''
    try:
        with sr.Microphone() as source:
            print('Listening...')
            audio = recognizer.listen(source=source, timeout=5, phrase_time_limit=5)
            voice_input = recognizer.recognize_google(audio)
            print('Input : {}'.format(voice_input))
    except sr.UnknownValueError:
        pass
    except sr.RequestError:
        print('Network error.')
    except sr.WaitTimeoutError:
        pass
    except TimeoutError:
        pass

    return voice_input.lower()

Config

import json
import os
import platform

OS_NAME = platform.uname().system
APP_DETAILS_FILE = ('C:\\Users\\seshu\\PycharmProjects\\VOICE_ASSISTANT'
                    '\\config\\applications.json')

with open('C:\\Users\\seshu\\PycharmProjects\\VOICE_ASSISTANT'
          '\\config\\config.json') as file:
    DATA = json.load(file)

# Create an empty application-path cache on first run.
if os.path.exists(APP_DETAILS_FILE) is False:
    with open(APP_DETAILS_FILE, 'w') as file:
        file.write('{}')

Voice analyzer (voice_ana.py)

import speech_recognition as sr
from nltk.sentiment.vader import SentimentIntensityAnalyzer

import utils

class VoiceAnalyzer:
    # Note: the original defined _init_ (single underscores), which Python
    # never calls; the constructor must be named __init__.
    def __init__(self):
        self.sid = SentimentIntensityAnalyzer()

    def get_polarity_scores(self):
        # Read a voice command and score its sentiment with VADER.
        try:
            voice_input = utils.read_voice_cmd()
            return self.sid.polarity_scores(voice_input)
        except sr.UnknownValueError as e:
            print(e)
        except sr.RequestError:
            print('Network error.')
        except sr.WaitTimeoutError:
            pass
        except TimeoutError:
            pass

Model init

import nltk
import ssl

# Downloading the nltk data (run once, then keep commented out)
# try:
#     _create_unverified_https_context = ssl._create_unverified_context
# except AttributeError:
#     pass
# else:
#     ssl._create_default_https_context = _create_unverified_https_context
#
# nltk.download('punkt')
# nltk.download('wordnet')
# nltk.download('vader_lexicon')
import config

words = []
classes = []
data_x = []
data_y = []

# Flatten every intent's utterances into a vocabulary (words) and parallel
# pattern/label lists (data_x / data_y); classes holds the distinct tags.
for intent in config.DATA['intents']:
    for utterance in intent['utterances']:
        tokens = nltk.word_tokenize(utterance)
        words.extend(tokens)
        data_x.append(utterance)
        data_y.append(intent['tag'])

    if intent['tag'] not in classes:
        classes.append(intent['tag'])

words = sorted(set(words))
classes = sorted(set(classes))

run.py

from PyQt5 import QtWidgets, QtGui, QtCore
from PyQt5.QtGui import QMovie
import sys
from PyQt5.QtWidgets import *
from PyQt5.QtCore import *
from PyQt5.QtGui import *
from PyQt5.uic import loadUiType
import pyttsx3
import speech_recognition as sr
import os
import time
import webbrowser
import datetime

flags = QtCore.Qt.WindowFlags(QtCore.Qt.FramelessWindowHint)

engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)
engine.setProperty('rate', 180)

def speak(audio):
    engine.say(audio)
    engine.runAndWait()

def wish():
    hour = int(datetime.datetime.now().hour)
    if hour >= 0 and hour < 12:
        speak("Good morning")
    elif hour >= 12 and hour < 18:
        speak("Good Afternoon")
    else:
        speak("Good night")

class mainT(QThread):
    def __init__(self):
        super(mainT, self).__init__()

    def run(self):
        self.JARVIS()

    def STT(self):
        R = sr.Recognizer()
        with sr.Microphone() as source:
            print("Listening...........")
            audio = R.listen(source)
        try:
            print("Recognizing......")
            text = R.recognize_google(audio, language='en-in')
            print(">> ", text)
        except Exception:
            speak("Sorry Speak Again")
            return "None"
        text = text.lower()
        return text

    def JARVIS(self):
        # A simpler keyword-driven loop used by this standalone runner.
        wish()
        while True:
            self.query = self.STT()
            if 'good bye' in self.query:
                sys.exit()
            elif 'open google' in self.query:
                webbrowser.open('www.google.co.in')
                speak("opening google")
            elif 'open youtube' in self.query:
                webbrowser.open("www.youtube.com")
            elif 'play music' in self.query:
                speak("playing music from pc")
                self.music_dir = "./music"
                self.musics = os.listdir(self.music_dir)
                os.startfile(os.path.join(self.music_dir, self.musics[0]))

FROM_MAIN, _ = loadUiType(os.path.join(os.path.dirname(__file__), "./scifi.ui"))

class Main(QMainWindow, FROM_MAIN):
    def __init__(self, parent=None):
        super(Main, self).__init__(parent)
        self.setupUi(self)
        self.setFixedSize(1920, 1080)
        self.exitB.setStyleSheet("background-image:url(./lib/exit - Copy.png);\n"
                                 "border:none;")
        self.exitB.clicked.connect(self.close)
        self.setWindowFlags(flags)
        # Keep a reference on self so the worker thread is not garbage-collected.
        self.Dspeak = mainT()
        self.label_7 = QMovie("./lib/gifloader.gif", QByteArray(), self)
        self.label_7.setCacheMode(QMovie.CacheAll)
        self.label_4.setMovie(self.label_7)
        self.label_7.start()

        self.ts = time.strftime("%A, %d %B")

        self.Dspeak.start()
        self.label.setPixmap(QPixmap("./lib/tuse.png"))
        self.label_5.setText("<font size=8 color='white'>" + self.ts + "</font>")
        self.label_5.setFont(QFont(QFont('Acens', 8)))

app = QtWidgets.QApplication(sys.argv)
main = Main()
main.show()
exit(app.exec_())

5.3 Screenshots / Results

Interface

Fig 5.3(a): Graphical User Interface


Input: What is the current time

Fig 5.3(b): Google Search for time
Input: Who is the president of India.

Fig 5.3(c): Browsing web Information


Input: Play Songs

Fig 5.3(d): YouTube Search
Input: Take a note

Fig 5.3(e): Writing Text in Notepad


Input: Open Control Panel

Fig 5.3(f): Opening the system app Control Panel

6. Platform Used: PyCharm

As a programmer, you should be focused on the business logic and on creating useful
applications for your users. PyCharm by JetBrains saves you a lot of time by taking care of
routine tasks and by making a number of others, such as debugging and visualization, easy.

6.1 Intelligent Coding Assistance

PyCharm provides smart code completion, code inspections, on-the-fly error highlighting and
quick-fixes, along with automated code refactorings and rich navigation capabilities.

Intelligent Code Editor:


PyCharm’s smart code editor provides first-class support for Python, JavaScript, CoffeeScript,
TypeScript, CSS, popular template languages and more. Take advantage of language-aware code
completion, error detection, and on-the-fly code fixes!

Smart Code Navigation:


Use smart search to jump to any class, file or symbol, or even any IDE action or tool window. It
only takes one click to switch to the declaration, super method, test, usages, implementation, and
more.

Fast and Safe Refactorings:


Refactor your code the intelligent way, with safe Rename and Delete, Extract Method, Introduce
Variable, Inline Variable or Method, and other refactorings. Language and framework-specific
refactorings help you perform project-wide changes.

6.2 Built-in Developer Tools:

PyCharm’s huge collection of tools out of the box includes an integrated debugger and test
runner; Python profiler; a built-in terminal; integration with major VCS and built-in database
tools; remote development capabilities with remote interpreters; an integrated ssh terminal; and
integration with Docker and Vagrant.

Debugging, Testing and Profiling:


Use the powerful debugger with a graphical UI for Python and JavaScript. Create and run your
tests with coding assistance and a GUI-based test runner. Take full control of your code with
Python Profiler integration.
VCS, Deployment and Remote Development:

Save time with a unified UI for working with Git, SVN, Mercurial or other version control
systems. Run and debug your application on remote machines. Easily configure automatic
deployment to a remote host or VM and manage your infrastructure with Vagrant and Docker.

Database tools:
Access Oracle, SQL Server, PostgreSQL, MySQL and other databases right from the IDE. Rely
on PyCharm’s help when editing SQL code, running queries, browsing data, and altering
schemas.

6.3 Customizable and Cross-platform IDE


Use PyCharm on Windows, macOS and Linux with a single license key. Enjoy a fine-tuned
workspace with customizable color schemes and key-bindings, with VIM emulation available.

Customizable UI
Are there any software developers who don't like to tweak their tools? We have yet to meet one,
so we've made PyCharm UI customization a breeze. Enjoy a fine-tuned workspace with
customizable color schemes and key-bindings.

Plugins
More than 10 years of IntelliJ platform development gives PyCharm 50+ IDE plugins of
different nature, including support for additional VCS, integrations with different tools and
frameworks, and editor enhancements such as Vim emulation.

Cross-platform IDE
PyCharm works on Windows, macOS or Linux. You can install and run PyCharm on as many
machines as you have, and use the same environment and functionality across all your machines.

7. Testing

Testing is the process of executing a program with the aim of finding errors. For software to
perform well it should be error free, and if testing is done successfully it will remove the errors
from the software. Manual testing is the process of testing software by hand to learn more about
it and to find out what is and isn't working. This usually includes verifying all the features
specified in the requirements documents, but often also includes the testers trying the software
from the perspective of their end users. Manual test plans vary from fully scripted test cases,
giving testers detailed steps and expected results, through to high-level guides that steer
exploratory testing sessions. There are many sophisticated tools on the market to help with
manual testing, but for a simple and flexible starting point a tool such as Testpad can be used.
Automation testing is the process of testing the software using an automation tool to find
defects: testers execute test scripts and the test results are generated automatically by the tools.
Some well-known automation testing tools for functional testing are QTP/UFT and Selenium.

Importance of Testing

The purpose of testing is to discover errors: testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality of
components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising
software with the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. Software testing is an important
element of software quality assurance and represents the ultimate review of specification,
design and coding. The increasing visibility of software as a system element and the costs
associated with software failure are motivating forces for well-planned, thorough testing.
Testing is an important phase in the development life cycle of the product; it is the phase where
the errors remaining from all the earlier phases are identified. Hence, system functional testing
carries a very significant responsibility for quality assurance and for ensuring the reliability of
the software. During testing, the program was executed with a set of defined test scenarios, and
the output for each test case was evaluated to determine whether the program was performing
as expected. Whenever an error was found, the irregularity was detected and corrected for
optimal functioning, and the correction was recorded for future reference. Thus, a series of tests
was performed on the system before it was ready for implementation.

Test Case 1: Good Morning Assistant
On giving the assistant the instruction "Good Morning Assistant", the sequential model matches
the intent of the input against the predefined data stored in the JSON file. Based on this, the
system generates a response after categorizing the input according to the label or tag it belongs
to.
Here the intent belongs to the "greeting" tag.

Test Case 2: Who is the president of India (or) What is the current time
The sequential model matches the intent of the input against the predefined data saved in the
JSON file when you tell the assistant "Who is the president of India". The system generates a
response after categorizing the input and determining which label or tag it belongs to.

The intent in this case is associated with the "google_search" tag.

Test Case 3: Play Songs

On giving the assistant the instruction "Play Songs", the sequential model matches the intent of
the input against the predefined data stored in the JSON file. Based on this, the system generates
a response after categorizing the input according to the label or tag it belongs to.
Here the intent belongs to the "youtube_search" tag.

Test Case 4: Take a note

The sequential model matches the intent of the input against the predefined data saved in the
JSON file when you tell the assistant "Take a Note". The system generates a response after
categorizing the input and determining which label or tag it belongs to.

The intent in this case is associated with the "note" tag.

Test Case 5: Open control panel

On giving the assistant the instruction "Open Control Panel", the sequential model matches the
intent of the input against the predefined data stored in the JSON file. Based on this, the system
generates a response after categorizing the input according to the label or tag it belongs to.
Here the intent belongs to the "applications" tag.
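The manual test cases above can also be automated. The sketch below uses Python's built-in unittest module and assumes the TrainingModel class and the model module from Section 5.2 are importable; it trains the model once and then checks that representative commands map to the expected tags, mirroring Test Cases 1-5.

import unittest

import model
from model.model_training import TrainingModel

class IntentClassificationTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Train once for the whole test class; training is the slow step.
        cls.tm = TrainingModel(model.words, model.classes,
                               model.data_x, model.data_y)
        cls.trained = cls.tm.train()

    def check(self, command, expected_tag):
        self.assertEqual(self.tm.get_intent(self.trained, command), expected_tag)

    def test_intents(self):
        self.check("hello assistant", "greeting")
        self.check("who is the president of india", "google_search")
        self.check("play songs", "youtube_search")
        self.check("take a note", "note")
        self.check("open control panel", "applications")

if __name__ == "__main__":
    unittest.main()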

8. CONCLUSION AND FUTURE SCOPE 

8.1 Conclusion

The aim of the project was to build an assistant that is reliable, cost-effective and provides many
services to the user. Assistants are used in many applications: scientific, educational and
commercial. The main agenda of this project is to serve people with physical disabilities: if a
person faces difficulty in communicating with the system, it is not proper communication, and
to overcome this kind of issue we developed an assistant which helps them communicate with
the system efficiently. The assistant responds from a finite set of predefined responses, drawing
on pre-existing information and its training. Assistants are a big step forward in enhancing
human-computer interaction. They can be used as intelligent personal assistants on mobile
devices, as artificial tutors in the educational area, offering rapid and individualised feedback to
students, and as personalised marketing to clients in the social networking arena. Finally, we
were able to give the assistant any instruction from its supported set, and it performed the
function correctly. We are mostly interested in speech-type messages that can be directly typed
into any document file (Notepad, Word, PDF, etc.).

8.2 Future Scope

Getting the design right and building assistants that give users a fantastic experience takes
practice and a deeper understanding of the underlying principles.
There are some limitations to our project:
At present the assistant supports only one language; in the near future this barrier can be broken
by developing a more sophisticated assistant.
To use the assistant efficiently, a strong and stable internet connection is needed, without which
responses may be late: the Python modules we used to process the voice data and convert it into
text require an internet connection. In the future this drawback should be overcome with more
efficient solutions.

9. REFERENCES
[1] https://www.ijrte.org/wp-content/uploads/papers/v9i2/A2753059120.pdf
[2] https://media.neliti.com/media/publications/263312-an-intelligent-behaviour-shown-by-chatbo-7020467d.pdf
[3] Apte, T. V., Ghosalkar, S., Pandey, S., & Padhra, S. (2014). Android app for blind using speech technology. International Journal of Research in Computer and Communication Technology (IJRCCT), #(3), 391-394.
[4] https://opensource.adobe.com/Spry/samples/data_region/JSONDataSetSample.html
[5] https://support.etlworks.com/hc/en-us/articles/360014078293-JSON-dataset-Format
[6] Anwani, R., Santuramani, U., Raina, D., & RL, P. Vmail: Voice Based Email Application. International Journal of Computer Science and Information Technologies, Vol. 6(3), 2015.
[7] https://keras.io/guides/sequential_model/
[8] https://keras.io/guides/making_new_layers_and_models_via_subclassing/
[9] https://keras.io/guides/training_with_built_in_methods/
[10] https://analyticsindiamag.com/a-tutorial-on-sequential-machine-learning/
[11] https://doc.qt.io/qtforpython-5/PySide2/QtWidgets/QWidget.html

10. Paper Acceptance Letter

