
Contents

1 Introduction
2 History
  2.1 Early decades: 1910 - 1980
  2.2 Birth of smart virtual assistants: 1990s - Present
3 Working of Virtual Personal Assistant
  3.1 Smart chatbots
  3.2 Voice based IVA
4 Future Advancements
  4.1 Multi-Modal Intelligent Virtual Assistants
    4.1.1 Structure of General Dialogue System
    4.1.2 The Proposed IVAs Systems
  4.2 Telepathic Virtual Assistants
    4.2.1 Related Works
    4.2.2 Proposed work
    4.2.3 Working of the proposed model
    4.2.4 Module I - Brain Computer Interface
    4.2.5 Module II - Request Processing
5 Security & Privacy with IVAs
  5.1 Possible Security & Privacy Threats
6 IVAs Know Your Life
  6.1 Measurement Methodology
7 Conclusion

List of Tables

1 Types of Alexa cloud-native data

List of Figures

1 The Structure of General Dialogue System
2 Structure of next generation virtual assistant
3 The graph model
4 The gesture model
5 The ASR Model
6 The next generation Virtual Assistant
7 Architecture of the proposed system
8 The Brain Computer Interface
9 The Request Processing

Nomenclature
AI Artificial intelligence

AIML Artificial Intelligence Markup Language

ASR Automatic Speech Recognition

BCI Brain Computer Interface

CAGR Compound Annual Growth Rate

DARPA Defense Advanced Research Projects Agency

DM Dialogue Manager

EEG Electroencephalogram

IoT Internet of Things

IVA Intelligent Virtual Assistant

NLG Natural Language Generation

NLP Natural Language Processing

NLTK Natural Language Tool Kit

NLU Natural Language Understanding

SLU Spoken Language Understanding

TTS Text to Speech Synthesis

Intelligent Virtual Personal Assistant
Prince Nandha

Abstract
Artificial Intelligence (AI) has made great progress and continues to expand its
potential. Natural Language Processing (NLP) is one of AI's major applications. Voice
assistants combine AI with cloud computing and can converse with users in natural
language. Because voice assistants are simple to use, millions of devices equipped with
them are found in homes today. The goal of this work is to investigate the facets of
Intelligent Virtual Assistant (IVA) technology. This article surveys the beginnings of
IVAs and their future progress. We discuss the next generation of IVAs, multi-modal IVAs,
which can be accessed through several types of input. The study also presents a further
advancement, the telepathic IVA, which is operated by thought. Security and privacy are
two of the most pressing concerns as the technology advances, so this paper also examines
the chances of a security breach and offers some potential measures to make IVAs more
secure for users.

1 Introduction
Human lifestyles have been transformed by the rapid growth of technology. Enlightenment
requires knowledge, and with the passage of time the methods of obtaining knowledge have
evolved. The library, formerly regarded as the centre of information, is now used mainly by
scholars, because of the vast amount of information presently available through search
engines. Even computers are being phased out in favour of smartphones, and virtual
assistants are taking over the labour of manual search. The use of a virtual assistant has
grown in popularity because it makes the job easier than operating a device manually.
Personal assistants are mostly used for searching, navigating between programmes, and
completing activities that the user has requested. A speech- or text-based bot can be used
as a more generic sort of personal assistant[8].
An AI-based assistant is a kind of software whose job is to fulfil tasks based on given
orders or inquiries. IVAs built on chat-based systems are also known as chatbots. Online
chat services are occasionally used solely for entertainment. By means of artificial
speech, some IVAs can communicate with humans in an understandable manner. Thanks to this
functionality, IVAs can fulfil tasks such as controlling home automation devices and media,
answering questions, and handling basic chores like reading emails and to-do lists by means
of vocal commands. The dialogue system is based on a similar idea, but with a few
modifications[9].

According to a report published in 2017, IVA capabilities and everyday usage are growing
rapidly thanks to new gadgets and effective voice-based interfaces with humans. Tech giants
such as Apple and Google have introduced IVAs in smartphones, Amazon has a strong base of
users with its smart speakers, and Microsoft has a large base of personal computers all over
the globe. Conversica's intelligent virtual assistants for business have received over 100
million email and SMS engagements[4].
Nowadays, spoken conversation systems are in demand because of their ability to communicate
with humans in more natural ways and to help the user perform tasks quickly. These systems
are now part of smartphones, televisions, personal computers, and both automated and
ordinary vehicles. Microsoft's Cortana, Apple's Siri, Google Assistant, and Amazon Alexa
are a few of the most popular spoken conversation systems used to make devices more
controllable through vocal commands[1].
Amazon’s powerful deep learning capabilities, such as ASR for converting speech to
text and NLU for determining text intent, enable developers to create apps with extremely
engaging user interfaces and lifelike conversational interaction.[1].
Virtual assistants communicate with their users in several ways. One is text, especially
online chatbots, SMS, email, and other text-based communication channels; an example is
Conversica's IVA for business. Another, more advanced, way of communicating with an IVA is
voice, which the tech giants have instrumented; a few examples are Apple Siri, Amazon Alexa,
and Google Assistant. Some of these can be used through different types of input; for
example, Google Assistant can be accessed through pictures, text, and vocal commands[9].
NLP is used by virtual assistants to match user text or voice input to executable
commands. Many of them continue to learn through artificial intelligence approaches such as
machine learning. Some of these assistants, such as Google Assistant and Samsung Bixby, can
also perform image processing to recognise items in photographs, allowing users to obtain
better results from the images they have captured.
A wake word is a vocal command that may be used to activate a virtual assistant. Examples
include "Hey Siri," "OK Google" or "Hey Google," "Alexa," and "Hey Microsoft." As virtual
assistants grow more common, the legal risks they pose are becoming more apparent[8].
The IoT is growing rapidly. The IoT market is expected to reach about $2 trillion, a CAGR
of roughly 20%, and experts claim that within two years 40% of homes will use one or more
IVAs. In the IoT industry, an IVA is a popular service for connecting with people using
voice commands. Smart speakers, smart refrigerators, connected vehicles, and other gadgets
can all contain an IVA.
Famous IVAs like Amazon Alexa and Google Assistant rely on cloud computing for maximum
performance and effective data management. A huge number of behavioural traces, including a
user's voice activity history with extensive descriptions, can be saved on remote cloud
servers inside an IVA ecosystem during this process. If those data are stolen or exposed as
a result of a cyberattack, such as a data breach, a malicious individual may be able not
only to capture extensive IVA service usage history, but also to divulge other user-related
information using various data analysis techniques[7]. This document illustrates and
categorises the many sorts of user-generated Alexa data. We examine a multi-month
experimental dataset gathered with Alexa-enabled devices such as the Echo Dot, Dash Wand,
Fire HD, and Fire TV. This study demonstrates how to gain fresh insights into personal
information such as likes and life patterns using a number of data analysis approaches.
Furthermore, the findings of this study have significant ramifications for both IVA vendors
and end users in terms of privacy risks[7].
Chatbots, which do pre-determined tasks, are a variation on the personal assistant concept.
Multilingual personal assistants are available on the market to aid various groups of
people. Personal assistants that can complete tasks by reading the user's mind are still a
pipe dream: telepathy is regarded as folklore and is not currently in practice. However,
the phenomenon is becoming feasible because of the advancement of technology. Virtual
assistants are difficult to use for those with hearing and speech impairments, and because
of the risk of being misunderstood, the use of a virtual assistant causes issues in vital
and tense situations. We can prevent these unfavourable situations by employing telepathic
personal assistants[2].
Recent news has raised the question of the reliability of IVAs like Alexa, Google Home,
Apple Siri, and many others. Reports show that IVAs are not always reliable. As supporting
evidence, consider one news item. A child once shared her passion for dollhouses and
cookies with her family's new Echo Dot. This prompted Alexa to order a $160 dollhouse and
some cookies, much to the parents' astonishment. When the story was reported on TV, viewers
reported that their own Echo Dots also tried to order dollhouses after hearing the
reporter's voice. We will look at some security problems that come with these technological
advancements.

2 History
2.1 Early decades: 1910 - 1980
Radio Rex, invented in the 1920s, was the first toy with a voice-based activation system: a
wooden dog came out of its house upon being called by name.
The Automatic Digit Recognition system, dubbed ”Audrey” by Bell Labs, was unveiled in
1952. It took up a six-foot-high relay rack, drank a lot of power, had a lot of cables, and had
all the problems that come with intricate vacuum-tube electronics in terms of maintenance.
It was able to distinguish between phonemes, which are the basic units of speech. It was
limited to a strict digit recognizer with few designated users only. It could thus be used
for voice dialling, but in most circumstances, instead of reciting the successive digits, push-
button dialling was cheaper and faster. The IBM Shoebox voice-activated calculator, which
was introduced to the general public during the 1962 Seattle World’s Fair after its original
market introduction in 1961, was another early gadget that could perform digital speech
recognition. This early computer, which was developed nearly 20 years before the first IBM
Personal Computer was introduced in 1981, could detect 16 spoken phrases and the numerals 0 to 9[4].
MIT scientist Joseph Weizenbaum created the first NLP-based chatbot, called ELIZA, in the
late 1960s. It was created to show that the communication between humans and machines is
superficial. ELIZA employed pattern matching and substitution methods in scripted responses
to simulate conversation, giving the impression that the software understood what was being
said[4].
Weizenbaum's assistant allegedly asked him to leave the room so she and ELIZA could have a
proper talk. Weizenbaum was taken aback, noting subsequently, "I had not imagined... that
extremely brief exposures to a pretty simple computer programme might cause significant
delusional thinking in fairly normal persons."[4]
The ELIZA effect, also known as anthropomorphisation, is a phenomenon that occurs in human
interactions with virtual assistants: the tendency to automatically assume that computer
activities are equivalent to human behaviours.
The next milestone was achieved in the 1970s at Carnegie Mellon University in Pittsburgh,
Pennsylvania, where the US-based DARPA agency started a half-decade of research on speech
understanding, with the goal of reaching a vocabulary of 1,000 words. IBM, Carnegie Mellon
University (CMU), and Stanford Research Institute were among the companies and universities
that participated in the programme[4].
The result of the research was called "Harpy"; it had a vocabulary of 1,000 words and was
able to understand some phrases. Harpy could analyse speech using preprogrammed vocabulary,
speech, and grammar to identify word sequences that made sense together, minimising speech
recognition errors[4].
Another example of such an IVA is "TANGORA", a voice-recognition typewriter launched in
1986 as an advanced form of the "SHOEBOX". It had a vocabulary of about 20,000 words and
predicted the likely outcome of a query based on previous inputs. One interesting fact
about "Tangora" is that it was named after the fastest typist of that time. IBM used a
Markov model to deal with the statistics of digital signal processing; the approach allows
the system to forecast which phonemes are most likely to follow a given phoneme. Each user
had to train "TANGORA" to recognise his or her voice and phrasing[4].

2.2 Birth of smart virtual assistants: 1990s—Present


In the 1990s, big players like IBM, Philips, and L&H, competing among themselves for a
large share of the market, made speech recognition technology a key part of personal
computers. IBM launched the first smartphone in 1994 and laid the base for the emerging
market of IVAs in smartphones[4].
In the late 1990s, Dragon's NaturallySpeaking software was introduced, which could
recognise and transcribe natural human speech into a written document at a good pace,
without requiring a pause between each word. NaturallySpeaking is still available for
download, and many doctors in the United States and the United Kingdom use it to document
their medical records[4].

In 2001, Colloquis publicly launched its IVA "SmarterChild" on messenger platforms. It was
able to play games, look up facts, and check the weather, although it was only a text-based
IVA.
Apple made its debut in the IVA market with Siri as part of the iPhone 4S in 2011; it was
the first modern IVA to ship as part of a smartphone. Siri came to Apple through the
acquisition of Siri Inc., whose research had benefited from funding by the US Department of
Defense's research agency, DARPA. The aim of Siri was to send text replies, make phone
calls, report the weather, set alarms, and perform internet searches, and it was later
extended to give suggestions as well[4].
Amazon debuted Alexa alongside the Echo in November 2014. In April 2017, Amazon launched a
service that allows users to create conversational interfaces for any sort of virtual
assistant or interface[4].

3 Working of Virtual Personal Assistant


3.1 Smart chatbots:
Intelligent chatbots are programs that perform certain tasks automatically depending on
their preset algorithms and triggers, and they can also mimic human interaction. Chatbots
and users communicate through chat boxes, with text or voice as the medium, much like
human-to-human interaction. Earlier chatbots were based on rigid pre-programming and gave
responses only from the data already fed to them.
The first approach is to develop a bot that matches patterns. Bots that match patterns in
text categorise it and respond to the terms they encounter. AIML is a standard format for
these patterns. When pattern matching is used, the chatbot can only answer queries that are
already in its models; the bot can only use the patterns that were previously programmed
into its system[5].
Algorithms have improved dramatically over time and have emerged as the alternative
underlying today's chatbots. A unique pattern for each type of query must be provided in a
database for the bot to give the proper response, and various combinations of patterns can
be used to create a hierarchical structure. To reduce the number of classifiers and make
the structure clearer, developers use classification algorithms. A well-known NLP and text
classification algorithm is Multinomial Naive Bayes[5].
The third major approach for chatbots is artificial neural networks (ANN). Bots can use
these technologies to determine the response to a question based on weighted connections
and data context. Each sentence given to a bot is broken down into several words, with
each word serving as an input to an artificial neural network. Over time, the neural network
develops and becomes stronger, allowing the bot to generate a more accurate set of replies
to common queries.[5].
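
To make the classifier-based approach described above concrete, the following minimal
sketch (in Python with scikit-learn, using a tiny made-up intent dataset rather than data
from any real chatbot) shows how a Multinomial Naive Bayes model can map user messages to
response intents:

# Minimal sketch of an intent classifier for a chatbot; the training
# utterances, intents, and canned responses below are invented toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

training_utterances = [
    "what is the weather today", "will it rain tomorrow",
    "set an alarm for 7 am", "wake me up at six",
    "play some music", "put on my favourite playlist",
]
training_intents = ["weather", "weather", "alarm", "alarm", "music", "music"]

# Bag-of-words features feeding a Multinomial Naive Bayes classifier.
intent_model = make_pipeline(CountVectorizer(), MultinomialNB())
intent_model.fit(training_utterances, training_intents)

# Canned responses per intent, as in a simple pattern/response chatbot.
responses = {
    "weather": "Here is today's forecast.",
    "alarm": "Alarm set.",
    "music": "Playing your music.",
}

query = "please wake me at seven"
predicted_intent = intent_model.predict([query])[0]
print(predicted_intent, "->", responses[predicted_intent])

A real chatbot would use far more training utterances per intent and would fall back to a
default response for queries the classifier cannot place confidently.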

3.2 Voice based IVA:
A virtual assistant is a programme that recognises voice instructions and does tasks on the
user’s behalf. Virtual assistants can be found on most smartphones and tablets, as well as
desktop computers and standalone devices such as the Amazon Echo and Google Home.
They use a combination of specialised computer chips, microphones, and software to
listen for precise spoken commands from you and respond in the voice you choose.
The programme converts the spoken words to text and then feeds them to cloud-based
systems, as shown below:

Figure 1: The Structure of General Dialogue System
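
As a rough illustration of this capture-and-transcribe step, here is a minimal sketch in
Python. It assumes the third-party SpeechRecognition package, a working microphone with the
PyAudio backend, and Google's free web recogniser; it is not the code of any particular
commercial IVA:

# Minimal sketch: record one utterance and convert it to text via a
# cloud-based recogniser, as in the dialogue pipeline of Fig. 1.
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)   # calibrate for background noise
    print("Listening for a command...")
    audio = recognizer.listen(source)             # capture a single utterance

try:
    # The audio is sent to a cloud speech service and comes back as text;
    # a full IVA would then hand this text to its dialogue manager.
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Sorry, the speech could not be understood.")
except sr.RequestError as err:
    print("Speech service unavailable:", err)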

4 Future Advancements
4.1 Multi-Modal Intelligent Virtual Assistants:
With time, text-only IVAs could no longer satisfy human needs, and new research has
therefore been carried out to build IVAs that can accept input by means of speech, touch,
pen, gestures, and the movement of body parts. Such IVAs are referred to as multi-modal
IVAs, since there are a number of input modes. This technology, which includes a touch
screen and a speech recogniser, is used to handle numerous non-critical automotive
functions, such as weather and navigation queries and phone calls, in the Ford Model U
Concept Vehicle, for example. With this improvement, newer speech and command-and-control
interfaces were introduced in some car systems, which dominated the car market with this
new technology. The prototype provides a human-language dialogue interface along with an
attractive graphical interface for the user[1].
"Semio is building a cloud-based platform to allow humans to use robots through natural
communication: speech and body language," according to a statement from the University of
Southern California. They propose a method for designing the next generation of Virtual
Personal Assistants based on a multi-modal dialogue system, which employs techniques such
as input by means of gestures, images, videos, or speech, a vast dialogue and conversational
knowledge base, and a general knowledge base to increase user-computer interaction.
Furthermore, their method can be applied to a variety of jobs, including educational
support, medical support, robots and vehicles, disability systems, home automation, and
security access controls[10].
The method includes some novel features that distinguish this device, such as using the TV
by displaying data on a screen or connecting to one, watching shows with language
translation, text-based conversation with others in any language, understanding body
language and movements, and playing games with speech and gesture recognition; it can also
be used to read facial and speech expressions[10].
ASR Model, Gesture Model, Graph Model, Interaction Model, User Model, Input Model,
Output Model, Inference Engine, Cloud Servers, and Knowledge Base are added to the
original structure of general dialogue systems to change the general model to Multi-modal
dialogue systems, allowing the Next-Generation of Virtual Personal Assistants to be designed
with high accuracy.

4.1.1 Structure of General Dialogue System


The dialogue system is one of the most active areas in which many businesses are design-
ing and improving new technologies. Millions of people will use ”speech” to communicate
with machines before 2030, according to CHM Research, and voice-driven services will be
integrated into smartphones, smart eyewear, home hubs, kitchen appliances, TVs, games
consoles, thermostats, in-car systems, and clothes.
A dialogue system can be classified into three groups based on the method used to control
discourse: finite-state (or graph) based systems, frame-based systems, and agent-based
systems.
A conversation system's core components are the input decoder, natural language
understanding, dialogue manager, domain-specific component, response generator, and output
renderer. The six main components of generic dialogue systems are ASR, SLU, DM, NLG, TTS,
and the knowledge base. For a generic conversation system, see Fig. 1.
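
To make the flow between these six components concrete, the following minimal sketch (in
Python; every function body is a placeholder stub invented for illustration, not part of
any real dialogue system) shows how ASR, SLU, DM, NLG, and TTS hand data to one another
around a knowledge base, as in Fig. 1:

# Minimal sketch of the generic dialogue-system pipeline; all stages are stubs.
def asr(audio_bytes):
    """Automatic Speech Recognition: audio in, text out (stub)."""
    return "what is the weather in delhi"

def slu(text):
    """Spoken Language Understanding: text in, intent and slots out (stub)."""
    return {"intent": "get_weather", "slots": {"city": "delhi"}}

def dialogue_manager(frame, knowledge_base):
    """Dialogue Manager: decide the system action, consulting the knowledge base (stub)."""
    forecast = knowledge_base.get(frame["slots"]["city"], "unknown")
    return {"act": "inform", "forecast": forecast}

def nlg(action):
    """Natural Language Generation: system action in, sentence out (stub)."""
    return "The forecast is {}.".format(action["forecast"])

def tts(sentence):
    """Text-to-Speech synthesis: sentence in, audio out (stub)."""
    return sentence.encode("utf-8")

# One pass through the pipeline with a toy knowledge base.
kb = {"delhi": "sunny, 31 degrees"}
reply_audio = tts(nlg(dialogue_manager(slu(asr(b"...")), kb)))
print(reply_audio.decode("utf-8"))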

4.1.2 The Proposed IVAs Systems


Multi-modal conversation systems that handle two or more integrated user input modes,
such as speech, image, video, touch, manual gestures, gaze, and head and body movement,
were utilised to create the Next-Generation of VPAs model. Modifications and additions
to the basic architecture of generic conversation systems include the ASR Model, Gesture
Model, Graph Model, Interaction Model, User Model, Input Model, Output Model, Inference
Engine, Cloud Servers, and Knowledge Base.[1].

The following is the structure of the Next-Generation of Virtual Personal Assistants:

Figure 2: Structure of next generation virtual assistant

A) Knowledge Base
Generally, there are two types of knowledge base: an online knowledge base and a local
knowledge base. Both contain all the data and facts pertaining to each model, for example
facial and gesture data for the gesture model, vocabulary and spoken phrases for the ASR
model, and graphical data for the graph model; other useful information and the system
configuration are also stored in this base.

B) Graph Model :
The graph model uses graphical input such as images and videos in real time. It extracts
frames from the videos collected by the camera and the connected Input Model, then passes
those frames and images to graph models and applications running on cloud servers for data
analysis, and the applications return the results. Refer to Fig. 3.

C) Gesture model
This model deals with reading human body movement, facial expressions, and gestures using
the camera and sensors in the Input Model; the data are then passed to the gesture model
and applications on cloud servers for analysis, which deliver the result. Refer to Fig. 4.

D) ASR Model

This model deals with reading and recognising speech input in real time, with the help of a
microphone in the connected Input Model and the ASR model on cloud servers for vocal
recognition. The spoken input is converted into text, the text is then passed to
applications on a cloud server for analysis, and the result is received. Refer to Fig. 5.

E) Interaction Model
This is the main model. It establishes the interaction between the system and the other
models by evaluating the data coming from the Input Model, determining which model that
data should be forwarded to depending on the task, and collecting the intermediate results
that are used to arrive at the final result, as sketched below.
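
A minimal sketch of this routing idea follows (in Python; the handler functions and
modality tags are invented for illustration only and do not appear in the cited design):

# Minimal sketch of the Interaction Model dispatching inputs to the
# specialised models; all handlers are illustrative stubs.
def graph_model(payload):
    return {"source": "graph", "result": "objects recognised"}

def gesture_model(payload):
    return {"source": "gesture", "result": "wave detected"}

def asr_model(payload):
    return {"source": "asr", "result": "speech transcribed"}

# The Input Model tags each piece of data with its modality; the Interaction
# Model uses that tag to decide which specialised model receives the data.
ROUTES = {
    "image": graph_model,
    "video": graph_model,
    "gesture": gesture_model,
    "speech": asr_model,
}

def interaction_model(modality, payload):
    handler = ROUTES.get(modality)
    if handler is None:
        return {"source": "interaction", "result": "unsupported input mode"}
    return handler(payload)

print(interaction_model("speech", b"raw audio"))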

F) Inference Engine
In the chain of conditions and derivations, the inference engine collaborates with the
Interaction Model to derive the conclusion. They examine all of the facts and rules, sort
them, and then come up with a solution.

G) User Model
This model contains all of the information on the system’s users. Personal information
such as users’ names and ages, hobbies, skills and expertise, objectives and ambitions, pref-
erences and dislikes, and statistics about their behaviour and interactions with the system
can all be included.

H) Input Model
This model will coordinate the operation of all input devices used by the system to gather
data from the microphone, camera, and Kinect. In addition, before delivering the data to
the Interaction Model, this model contains intelligence algorithms to arrange the input data.

I) Output Model
This model receives the final decision of the Interaction Model along with the explanation
of the results; its duty is then to choose the appropriate output device, for example a
screen or speaker, to present the data to the user.

Testing the system
The entire system analyses user inputs and builds queries to cloud servers and knowledge
sources in order to complete tasks and obtain data for the response-generating models to
output. In this way, the multi-modal Intelligent Virtual Assistant can be developed[1].

4.2 Telepathic Virtual Assistants:


Multilingual personal assistants are available on the market to serve various groups of
individuals. Personal assistants capable of completing chores by reading the user's mind
are still a fiction; telepathy is regarded as mythology and is not used in everyday life.
However, the phenomenon is becoming feasible thanks to the development of new technology.
Virtual assistants are difficult to use for persons with hearing and speech impairments,
and because of the risk of being misunderstood, the use of a virtual assistant causes
problems in important and tense situations.

Figure 3: The graph model

Figure 4: The gesture model
We may prevent these unfavourable situations by employing telepathic personal assistants.
The BCI is used to decode the user's thoughts without the need for physical or vocal
inputs. We connect the BCI with the IVA to increase its effectiveness in receiving inputs
from users, and we employ Bone Conduction Technology to allow persons with auditory
difficulties to enjoy the virtual assistant environment. The emotional classifier aids in
determining the user's emotional state, allowing the virtual assistant to respond
appropriately[2].

4.2.1 Related Works


1. Brain to text: People can communicate via a BCI, which analyses their brain's EEG
data. The ability to type via direct brain control is a well-debated use of EEG[2].

Figure 5: The ASR Model

2. Bone conduction: For decades, bone conduction technology has been routinely employed.
The gadget turns sound into vibration, which is the eardrum's fundamental function, and
the vibrations reach the cochlea, which is attached to the auditory nerve, which then
communicates the sounds to our brain. As a result, those with hearing difficulties, as
well as those seeking a unique listening experience, use this technology[2].

3. Sentiment Classifier: This is the act of detecting and classifying material as
positive, negative, or neutral in order to ascertain the user's current feeling. It is
commonly used by online retailers and data analysts to categorise items based on customer
feedback[2].

4. Voice Assistant: Samsung's Bixby Voice is a popular IVA. Much like any other voice
assistant, it carries out the activities an IVA could undertake for a client, using the
customer's selected voice[2].

5. EEG Headset: It aids in the monitoring and reading of brain impulses. It is made up of
sensors that monitor brain activity and communicate the data to computers. One such EEG
headset is the NeuroSky MindWave Mobile[2].

4.2.2 Proposed work


The goal of this project is to take the notion of a virtual assistant to the next level.
The use of a BCI in conjunction with an IVA might lead to a technological advancement in
the field of IVAs. People with hearing difficulties may now explore the world of virtual
assistants thanks to the use of Bone Conduction Technology.

4.2.3 Working of the proposed model:


The user’s thoughts are translated into text using a BCI, which uses electroencephalography
(EEG) to transform the signal to text. The EEG headset signals are based on the user's
attention, such as focusing on the notion of pressing a particular letter on the keyboard.
As a result, the user's attention is crucial in producing complicated dialogues. The text
generated by translating the signals is then given to the sentiment classifier, which
analyses the user's emotions using the training data set. This sentiment analysis aids a
more effective understanding of the user by elucidating the user's motivations and feelings
in relation to the text. Bone conduction technology takes on the responsibility of
communicating in an unobstructed way and extends the technology so that it can be helpful
for people with auditory problems who are unable to use normal IVAs.

Figure 6: The next generation Virtual Assistant

4.2.4 Module I- Brain Computer Interface:


EEG is used by BCI to evaluate brain activity by detecting the voltage variations of ionic
current inside brain neurons. In this system, a brain typing system is suggested to translate
the user’s ideas into commands in real time[2].
The user's brain activity can be extracted in one of two ways. The first method involves
implanting electrodes into the human brain; though the outcomes are remarkable, electrode
placement necessitates surgery, which is a time-consuming process. This opens the door for
the second method, which involves employing external detectors to read and examine brain
activity. Though this approach is non-invasive, the recorded brain activity is not as good
as with the previous method, since the brain's electrical activity is attenuated by the
skull. The EEG signals of the brain can be recorded using a variety of equipment. The
device acquires the signal, which is then transmitted for signal processing.

Figure 7: Architecture of the proposed system

An electroencephalograph's signal acquisition chain consists of four key components: the
data collector, an amplifier with various filters, an analogue-to-digital converter, and
storage with display screens. The electrodes, with some conductive medium, are the most
essential element of the data collector. To ensure that the electrodes are correctly placed
on the subject's head, they are generally fitted in a textile cap. Modern EEG equipment
uses alternative, more common substances such as gel. The electrodes are attached to a
cable so that the signals can be amplified. Because the brain's electrical signal is
diminished by the skull, skin, and other layers of biological tissue, it must be amplified;
the amplifier also helps with signal separation and with removing interference. Feature
extraction is used not only to reduce dimensionality but also to extract relevant
information from the signals by minimising duplicated data. After extracting the features,
classification is performed using the features that have been chosen. To see the brain's
response to stimulation, we run several trials and average the results, so that random
brain activity averages out and the essential waveform is preserved. To learn the potential
features for classification, we can use an SVM trained with a gradient descent algorithm.
External factors, such as a flash of light, cause EEG wave responses; the measurable brain
response that is the direct result of a specific sensory or cognitive event is known as an
event-related potential (ERP).
The user feeds the input by visualising the letters on the screen, which are organised in a
keyboard-like matrix. After the EEG headset has been placed and the classifier has been
trained, the user focuses on a certain symbol by simply paying attention to it for several
seconds. All of the symbols flash in a fast, random order. Based on the data recorded by
the EEG during flashing, the computer is able to differentiate the desired symbol from all
the others after it has flashed many times. The EEG headset would be connected to OpenBCI,
and the data are processed and sent via a Bluetooth link to the programme. EEGLAB is the
software we use to analyse and process the recordings.

Figure 8: The Brain Computer Interface
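
A minimal sketch of the classification step described above follows (in Python with NumPy
and scikit-learn). The data are synthetic stand-ins rather than real EEG recordings; the
sketch averages repeated trials and trains a linear SVM by stochastic gradient descent
(hinge loss):

# Minimal sketch: trial averaging plus a gradient-descent-trained linear SVM.
# The "EEG" here is random synthetic data with a small injected target effect.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

n_trials, n_repeats, n_features = 200, 10, 64    # assumed sizes, for illustration
raw = rng.normal(size=(n_trials, n_repeats, n_features))
labels = rng.integers(0, 2, size=n_trials)       # 1 = attended symbol flashed
raw[labels == 1] += 0.3                          # toy "ERP" added to target trials

# Averaging the repeats suppresses random activity and keeps the ERP waveform.
averaged = raw.mean(axis=1)

# Linear SVM (hinge loss) fitted with stochastic gradient descent.
clf = SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0)
clf.fit(averaged[:150], labels[:150])
print("held-out accuracy:", clf.score(averaged[150:], labels[150:]))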

4.2.5 Module II – Request Processing:


The user's thoughts are translated into text using the BCI's processed output. The text is
forwarded to the virtual assistant, which interprets the user's request using natural
language processing. The NLTK module is a large toolkit that aids the NLP process. NLTK
assists with a variety of tasks, such as separating sentences from a given paragraph,
breaking sentences into words, recognising the part of speech of those words, calling
attention to the major subjects, and even helping the computer comprehend the text. The
Virtual Assistant analyses and crawls through the request in order to extract appropriate
information from data sources.
In addition, sentiment analysis is used to provide more advanced comprehension capabilities
for the virtual assistant. The sentiment classifier aids in a more efficient interpretation
of the user by recognising the user's motives and feelings in relation to the content. The
user's input is mapped to the training data set, the user's emotion is determined, and the
appropriate answer is supplied. The user receives the result of the request via bone
conduction: the bone conduction approach causes vibrations in the cheek bone, which help
the user make out the IVA's answer[2].

Figure 9: The Request Processing
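
As a rough illustration of the NLTK tasks and the sentiment step mentioned in Section
4.2.5, here is a minimal sketch in Python. It assumes the NLTK punkt,
averaged_perceptron_tagger, and vader_lexicon resources have already been downloaded with
nltk.download(); the sample sentence is invented:

# Minimal sketch: sentence splitting, word tokenisation, part-of-speech
# tagging, and a lexicon-based sentiment score for a decoded request.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

text = "Please order my favourite cookies. I am really excited about them!"

sentences = nltk.sent_tokenize(text)        # separate sentences from the paragraph
words = nltk.word_tokenize(sentences[0])    # break a sentence into words
pos_tags = nltk.pos_tag(words)              # part of speech for each word

# Estimate the user's feeling (positive / negative / neutral).
sentiment = SentimentIntensityAnalyzer().polarity_scores(text)

print(sentences)
print(pos_tags)
print(sentiment)   # e.g. {'neg': 0.0, 'neu': ..., 'pos': ..., 'compound': ...}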

5 Security & Privacy with IVAs


We performed data analysis, application analysis, packet analysis, voice-command testing,
and firmware analysis for a better understanding of IVA ecosystems and their potential
security and privacy concerns. The cloud-based IVA software reads text and voice commands
and performs the necessary procedures. The two user-side components are IVA-enabled
devices, such as an Alexa or a Cortana device, and companion programmes installed on the
device that connect with the IVA.
The responses to requests made to an IVA, whether in text or speech format, are nowadays
stored in cloud storage, and a companion app is typically required to access the
conversation between the user and the IVA system. Obviously, the content of such exchanges
could contain revealing details, such as health-related questions. Moreover, because user
voice recordings contain personally identifiable information, they pose a privacy risk.
Unauthorised entities could use this data to identify the user, gain malicious access to
systems that implement voice recognition, or simply process the data and create voice
artefacts that could be used to impersonate the user.
Numerous IVAs allow untrusted third-party vendors to link their products and services to
the IVA, which enhances the IVA's skills. Alexa, for example, is compatible with ecobee,
Philips Hue, Nest, Ring, and Leviton smart-home devices, among others. It also works with a
variety of apps for things like ordering meals, streaming music and video, calling a cab,
and monitoring account balances. Cloud-based IVA software, IVA-enabled devices, and related
apps make up an IVA ecosystem. Over 10,000 voice-activated apps are currently available in
the Alexa Skills Store[3].

5.1 Possible Security & Privacy Threats
1. Wiretapping an IVA ecosystem:
Sniffing the traffic between companion apps and the IVA can reveal the ecosystem’s
communication mechanisms, even if the apps use encrypted network connections.
Not all network traffic between IVA-enabled devices and cloud-hosted services is sent
through a secure protocol, according to our analysis. Many devices don’t use en-
crypted communications to check network connectivity, making IVA devices detectable
in a home network. Firmware image data could be delivered in unencrypted packets,
making the system vulnerable to man-in-the-middle attacks and malicious image al-
teration. Even if firmware images aren’t modified, having access to them creates a
security risk because it allows unauthorised people to view an IVA-enabled device’s
internal functionality.[3].

2. Mischievous voice commands:


A hacker using the service can issue mischievous voice commands to gain unauthorised
access to any IoT device connected to the IVA, giving them entrance to a home or garage,
for example by unlocking a smart door, or letting them place an order without the user's
knowledge. Despite the fact that certain IVAs feature a voice-training mechanism to
prevent mimicking, the system may have trouble distinguishing between similar sounds. As
a result, a malicious person with access to an IVA-enabled device may be able to fool the
system into thinking he or she is the genuine owner and engage in criminal or mischievous
activities[3].

3. Unintentional voice recording:


Voices within range of an IVA-enabled device can be erroneously recorded and trans-
ferred to the cloud, allowing other parties, such as commercial organisations with lawful
access to the data and hackers who penetrate the database, to listen in on private con-
versations. Due to the risk of inadvertent recording, users may not have total control
over their speech data.[3].

6 IVAs Know Your Life

As we know, IVA data are stored in cloud storage to enhance the accuracy of present IVAs
and make them more functional and accurate for their tasks. One disadvantage of storing
this data is that the system can then predict the user's lifestyle and emotions based on
the previously stored data. This can be done in the following way:

6.1 Measurement Methodology


We gathered different types of data from the Amazon Alexa cloud to gain insights about user
activities through the IVA. In this section we detail how we gathered Amazon Alexa data
from its cloud and then show various statistics on the data.

1. Amazon Alexa Ecosystem
We focused our efforts on Amazon Alexa and its ecosystem, as previously stated.
Various Alexa-enabled devices are required to engage with the Alexa cloud service.
The Alexa IVA may be used for a number of tasks, including managing to-do lists, playing
playlists, setting morning alarms, placing shopping orders, searching for information, and
checking traffic updates. The Alexa cloud creates and retains several forms of digital
traces connected to a user's behaviour during this process[7].

2. Data Collection Methodology


Predefined APIs are used to retrieve data from the Alexa cloud. In a prior study, we
discovered some central APIs that can be utilised to get data stored in the Alexa cloud
storage. Using these APIs, we automatically collected users' usage history logs maintained
in the cloud storage to support this study[7].

3. Data Description
We gathered data from a participant’s daily life over the course of three months, using
a number of Alexa-connected devices, including Echo Dots, Dash Wand, Fire Tablet,
and Fire TV[7].

Table 1: Types of Alexa cloud-native data

Similarly, other findings can be derived in the same manner from the previously stored
data; some of these findings are the timing of alarms, the user's interests, the daily
schedule, and driving routines. The privacy of users is one of the most serious threats
they face: if IVA data can reveal all of this information about a person, privacy becomes
the primary concern. Some of the proposals include having the IVA provider permanently
remove the user's history so that the data cannot be used to predict lifestyle[7].

7 Conclusion
This paper concludes that, as technology advances, Intelligent Virtual Assistants are
becoming an increasingly important part of everyone's life. Although security is a worry,
privacy and security can be improved gradually. With the help of emerging technologies like
gesture recognition, image recognition, video recognition, and speech recognition, IVAs are
continuously evolving to smooth human-machine interaction. These systems can also be
utilised for a variety of functions, including education, the healthcare sector, medical
aid, automated vehicles, and home automation. They can also be a potential solution for
customer service, training and education, facilitating transactions, travel information,
counselling, online ticket booking, reservation bookings, remote banking, information
inquiries, and many more services. With the use of telepathic IVAs we can provide virtual
support for disabled people, so they can communicate with IVAs using their thoughts. As
technology grows, cybersecurity methods are also improving to keep these systems ever safer
for the user.

References
[1] V. Këpuska and G. Bohouta, "Next-generation of virtual personal assistants (Microsoft
Cortana, Apple Siri, Amazon Alexa and Google Home)," 2018 IEEE 8th Annual Computing and
Communication Workshop and Conference (CCWC), 2018, pp. 99-103, doi:
10.1109/CCWC.2018.8301638.

[2] V. Narmadha, J. U. Ajay Krishnan, R. P. Kumar and R. R. Kumar, "Telepathic Virtual
Assistant," 2019 3rd International Conference on Computing and Communications Technologies
(ICCCT), 2019, pp. 321-325, doi: 10.1109/ICCCT2.2019.8824886.

[3] H. Chung, M. Iorga, J. Voas and S. Lee, "Alexa, Can I Trust You?," in Computer, vol.
50, no. 9, pp. 100-104, 2017, doi: 10.1109/MC.2017.3571053.

[4] Virtual assistant. (2021, October 4). In Wikipedia.
https://en.wikipedia.org/wiki/Virtual_assistant

[5] CX Today. (2021, May 24). How do bots and chatbots work? Retrieved October 17, 2021,
from https://www.cxtoday.com/contact-centre/how-do-bots-and-chatbots-work/

[6] Hoy, Matthew B. (2018). "Alexa, Siri, Cortana, and More: An Introduction to Voice
Assistants". Medical Reference Services Quarterly. 37 (1): 81-88. doi:
10.1080/02763869.2018.1404391. PMID 29327988. S2CID 30809087.

[7] Chung, H. (2018, February 28). Intelligent virtual assistant knows your life. arXiv.
Retrieved October 11, 2021, from https://arxiv.org/abs/1803.00466

[8] Augustian Isaac, R., & Narayanan, A. (2018, October). Virtual personal assistant.
Journal of Network Communications and Emerging Technologies (JNCET).
http://www.jncet.org/Manuscripts/Volume-8/Issue-10/Vol-8-issue-10-M-09.pdf

[9] Botelho, B. (2017, October 31). Virtual assistant (AI assistant).
SearchCustomerExperience. Retrieved October 13, 2021, from
https://searchcustomerexperience.techtarget.com/definition/virtual-assistant-AI-assistant

[10] Sudhakar Reddy, A., Vyshnavi, M., & Raju Kumar, C. (2020, March). Virtual assistant
using artificial intelligence. https://www.jetir.org/papers/JETIR2003165.pdf

Acknowledgment
I would like to take this opportunity to convey to Sir Sunny Bodiwala my gratitude and deep
appreciation for his continuous encouragement, support, and motivation during this work.
The work might not have been completed in time without his encouragement and valuable
suggestions. The methodology for scientific research analysis that he taught me has been an
important experience.

It gives me pleasure to express my deep sense of gratitude to the Head of the Computer
Science and Engineering Department, Dr. Mukesh A. Zaveri, for providing us an opportunity
to present our work. I also thank all the faculty members of the Computer Science and
Engineering Department and my colleagues, who spent their valuable time guiding me during
the work.

Prince Nandha
U19CS045
