Report Mini Edited
On
AI Voice Assistant (Using Python)
Submitted in partial fulfillment for the award of
BACHELOR OF TECHNOLOGY
Degree
In
Computer Science and Engineering
2023-24
Under the Guidance of:
Mr. Vineet Srivastava
(Assistant Professor)

Submitted By:
Divyansh Singh (2100330100089)
Chetan Kansal (2100330100074)
Krishnakant Tiwari (2100330100128)
TABLE OF CONTENTS
CHAPTER NO. TITLE
1 INTRODUCTION
5 PROJECT SNAPSHOTS
6 LIMITATIONS
6.1. Chatbots Don’t Understand Human Context
6.2. They Don’t Do Customer Retention
6.3. They Can’t Make Decisions
6.4. Exorbitant Installation
6.5. Chatbots Have the Same Answer For a Query
6.6. They Have Zero Research Skills
6.7. Voice Assistants Have No Emotions
7 FUTURE SCOPE
CONCLUSION
REFERENCES
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
In this modern era, day-to-day life has become smarter and interlinked with technology. We already know voice assistants such as Google Assistant and Siri. Our voice assistant system can act as a basic medical prescriber, daily schedule reminder, note writer, calculator and search tool. The project works on voice input, gives output through voice and displays the text on the screen. The main agenda of our voice assistant is to make people smarter and give instant, computed results. The voice assistant takes voice input through the microphone (Bluetooth or wired), converts the voice into a computer-understandable form and gives the required solutions and answers asked by the user. The assistant connects with the World Wide Web to provide the results the user has asked about. Natural Language Processing algorithms help computers engage in communication using natural human language in many forms.
Today, the development of artificial intelligence (AI) systems that can support natural human-machine interaction (through voice, communication, gestures, facial expressions, etc.) is gaining popularity. One of the most studied and popular directions of interaction is based on the machine’s understanding of natural human language. It is no longer the human who learns to communicate with the machine; rather, the machine learns to communicate with the human, exploring the user’s actions, habits and behavior and trying to become a personalized assistant.
Virtual assistants are software programs that help ease your day-to-day tasks, such as showing weather reports, creating reminders, making shopping lists, etc. They can take commands via text (online chatbots) or by voice. Voice-based intelligent assistants need an invoking word, or wake word, to activate the listener, followed by the command. There are many virtual assistants, such as Apple’s Siri, Amazon’s Alexa and Microsoft’s Cortana.
This system is designed to be used efficiently on desktops. Personal assistant software improves user productivity by managing the user’s routine tasks and by providing information from online sources to the user.
This project was started on the premise that there is a sufficient amount of openly available data and information on the web that can be utilized to build a virtual assistant that can make intelligent decisions for routine user activities.
Keywords: Virtual Assistant Using Python, AI, Digital assistance, Virtual Assistance, Python
Each company that develops an intelligent assistant applies its own specific methods and approaches to development, which in turn affects the final product. One assistant can synthesize speech with higher quality, another can perform tasks more accurately and without additional explanations and corrections, and others can perform a narrower range of tasks, but most accurately and just as the user wants.
Obviously, there is no universal assistant that performs all tasks equally well. The set of characteristics an assistant has depends entirely on which area the developer has paid more attention to. Since all these systems are based on machine learning methods and are built using huge amounts of data collected from various sources and then trained on them, an important role is played by the source of this data, be it search systems, various information sources or social networks. The amount of information from different sources determines the character the resulting assistant ends up with. Despite the different approaches to learning and the different algorithms and techniques, the principle of building such systems remains approximately the same. Figure 1 shows the technologies used to create intelligent systems that interact with a human in natural language. The main technologies are voice activation, automatic speech recognition, text-to-speech, voice biometrics, dialogue management, natural language understanding and named entity recognition.
1.2. Proposed Plan of Work
The work started with analyzing the audio commands given by the user through the microphone. These can be anything, like retrieving information or operating a computer’s internal files. This is an empirical qualitative study, based on reading the above-mentioned literature and testing its examples. Tests were made by programming according to books and online resources, with the explicit goal of finding best practices and developing a more advanced understanding of voice assistants.
Fig. 2 shows the workflow of the basic process of the voice assistant. Speech recognition is used to convert the speech input to text. This text is then fed to the central processor, which determines the nature of the command and calls the relevant script for execution.
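This central-processor step can be sketched as a small keyword dispatcher. The keywords, handlers and replies below are illustrative assumptions, not the project’s actual scripts:

```python
# A minimal sketch of the Fig. 2 workflow: recognized text is handed to a
# central dispatcher that determines the nature of the command and calls
# the relevant handler. Keywords and handlers are illustrative only.
from datetime import datetime

def handle_time(query: str) -> str:
    return "The time is " + datetime.now().strftime("%H:%M")

def handle_search(query: str) -> str:
    term = query.split("search for", 1)[1].strip()
    return f"Searching the web for: {term}"

HANDLERS = [
    ("time", handle_time),
    ("search for", handle_search),
]

def dispatch(text: str) -> str:
    """Central processor: pick the first handler whose keyword matches."""
    text = text.lower()
    for keyword, handler in HANDLERS:
        if keyword in text:
            return handler(text)
    return "Sorry, I did not understand that."

print(dispatch("please search for python tutorials"))
# Searching the web for: python tutorials
```

In a real assistant, the `dispatch` input would come from the speech recognition module and the returned string would be passed to the text-to-speech engine.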
But the complexities don’t stop there. Even with hundreds of hours of input, other factors can play a huge role in whether or not the software can understand you. Background noise can easily throw a speech recognition device off track, because the device does not inherently have the ability to distinguish ambient sounds it “hears”, such as a dog barking or a helicopter flying overhead, from your voice. Engineers have to program that ability into the device; they collect data on these ambient sounds and “tell” the device to filter them out. Another factor is the way humans naturally shift the pitch of their voice to accommodate noisy environments; speech recognition systems can be sensitive to these pitch changes.
1.3. Methodology of Virtual Assistant Using Python
The system uses Google’s online speech recognition service to convert speech input to text. The speech input from the microphone is temporarily stored in the system and then sent to the Google cloud for speech recognition. The equivalent text is then received and fed to the central processor.
The Python backend gets the output from the speech recognition module and then identifies whether the command requires an API call or context extraction. The output is then sent back to the Python backend to give the required output to the user.
API stands for Application Programming Interface. An API is a software intermediary that
allows two applications to talk to each other. In other words, an API is a messenger that delivers
your request to the provider that you’re requesting it from and then delivers the response back to
you.
1.3.4. Context Extraction
Context extraction (CE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases, this activity concerns processing human language texts using natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction out of images/audio/video, could also be seen as context extraction.
Text-to-Speech (TTS) refers to the ability of computers to read text aloud. A TTS Engine
converts written text to a phonemic representation, then converts the phonemic representation to
waveforms that can be output as sound. TTS engines with different languages, dialects and
specialized vocabularies are available through third-party publishers.
CHAPTER 2
● Python
● QT Designer
● Web Browser: Google Chrome or later
● Canva
● Operating System: Windows XP / Windows 7 / Windows Vista
CHAPTER 3
Figure 3.2. Working of algorithm
CHAPTER 4
As we know, Python is a suitable language for script writers and developers. Let us write a script for a voice assistant using Python. The queries the assistant handles can be adapted as per the user’s need. Speech recognition is the process of converting audio into text, and it is commonly used in voice assistants like Alexa and Siri. Python provides a library called SpeechRecognition that allows us to convert audio into text for further processing. In this chapter, we will look at converting large or long audio files into text using the SpeechRecognition library in Python.
4.1. Subprocess: This module is used to work with system processes and to run system commands such as shutdown and sleep. This module comes built-in with Python.
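As an illustration, here is a sketch of how the subprocess module might be used; the shutdown command lines are platform-specific examples and are only constructed here, not executed:

```python
# A minimal sketch of using the built-in subprocess module for system
# commands. The shutdown argument lists are illustrative and platform-
# specific; they are built but deliberately never run.
import subprocess
import sys

def shutdown_command():
    """Return the platform-appropriate shutdown command as an argument list."""
    if sys.platform.startswith("win"):
        return ["shutdown", "/s", "/t", "0"]
    return ["shutdown", "-h", "now"]

# Harmless demonstration: run the Python interpreter as a subprocess.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a subprocess')"],
    capture_output=True, text=True,
)
print(result.stdout.strip())  # hello from a subprocess
```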
4.3. Speech Recognition: Since we are building a voice assistant application, one of the most important things is that the assistant recognizes your voice (that is, what you want to say or ask). To install this module, type the command: pip install SpeechRecognition. This technology leverages machine learning algorithms, often employing neural networks, to discern and transcribe spoken words with remarkable accuracy. AI chatbots equipped with speech recognition capabilities enable users to engage in natural, conversational interactions, facilitating a more seamless and user-friendly experience. By understanding and processing spoken language, these chatbots can swiftly address user queries, provide information and execute commands, fostering a more intuitive and efficient communication channel.
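A minimal sketch of the recognition step, assuming the SpeechRecognition package (pip install SpeechRecognition), a working microphone and an internet connection for Google’s recognizer; the normalize() helper is our own illustrative addition:

```python
# Sketch of the speech-to-text step. listen_once() needs a microphone and
# network access, so the SpeechRecognition import is kept inside it.

def normalize(text: str) -> str:
    """Lower-case and trim recognized text before it is dispatched."""
    return text.lower().strip()

def listen_once() -> str:
    """Capture one utterance and return the recognized text ('' on failure)."""
    import speech_recognition as sr  # pip install SpeechRecognition
    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source)  # compensate for background noise
        audio = r.listen(source)            # temporary storage of the input
    try:
        return normalize(r.recognize_google(audio))  # Google cloud recognizer
    except sr.UnknownValueError:   # speech was unintelligible
        return ""
    except sr.RequestError:        # network or API problem
        return ""
```

Calling `listen_once()` would block until an utterance is heard and then return the cleaned-up transcript for the central processor.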
4.4. Pyttsx3: This module is used for the conversion of text to speech and works offline. To install this module, type the below command: pip install pyttsx3.
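A hedged sketch of offline speech output with pyttsx3; prepare() is an illustrative helper of our own for trimming long responses, not part of the library:

```python
# Offline text-to-speech sketch using pyttsx3 (pip install pyttsx3).
# speak() needs an audio device, so the import is kept inside it.

def prepare(text: str, limit: int = 200) -> str:
    """Trim very long responses so the engine does not speak forever."""
    text = " ".join(text.split())  # collapse stray whitespace
    if len(text) <= limit:
        return text
    return text[:limit].rsplit(" ", 1)[0] + "..."

def speak(text: str) -> None:
    import pyttsx3                 # imported lazily
    engine = pyttsx3.init()        # works offline
    engine.say(prepare(text))
    engine.runAndWait()            # block until speech finishes
```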
4.5. Web browser: To perform Web Search. This module comes built-in with Python. As
gateways to the vast expanse of the internet, browsers provide users with the means to
access and engage with chatbot services effortlessly. The user interface of a web browser
serves as the canvas upon which AI chatbots display their conversational prowess,
enabling intuitive and dynamic exchanges. Modern browsers, equipped with advanced
features and compatibility, ensure that users can harness the full potential of AI chatbots
across diverse platforms. Whether through traditional text-based interfaces or more
immersive voice and video interactions, web browsers serve as the conduit for users to
connect with AI chatbots, fostering a user-friendly and accessible environment for the
integration of artificial intelligence into the fabric of the online experience.
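A sketch of the web-search step using the built-in webbrowser module; the Google search URL is an illustrative choice, not the only option:

```python
# Performing a web search for a spoken query with the standard library.
import urllib.parse
import webbrowser

def search_url(query: str) -> str:
    """Build a search-engine URL for the spoken query."""
    return "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)

def web_search(query: str) -> None:
    webbrowser.open(search_url(query))  # opens the user's default browser
```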
4.6. Datetime: This module is used to show the date and time. This module comes built-in with Python.
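For example, the assistant might phrase the current date and time for speaking like this (the wording is an illustrative choice):

```python
# Phrasing the current date and time the way the assistant would speak it.
from datetime import datetime

def tell_datetime(now=None) -> str:
    now = now or datetime.now()
    return now.strftime("Today is %A, %d %B %Y and the time is %I:%M %p")

print(tell_datetime())
```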
4.7. Requests: This module is used for making GET and POST requests. To install this module, type the below command: pip install requests. Users submit these requests to the chatbot,
seeking information, assistance, or specific actions. These requests can vary widely, ranging
from simple queries about weather updates to more complex tasks such as setting reminders
or conducting online transactions. The effectiveness of an AI chatbot hinges on its ability to
accurately interpret and fulfill these requests. Advanced natural language processing (NLP)
algorithms enable chatbots to comprehend user input, discern intent, and generate
contextually relevant responses. Moreover, the continuous learning capabilities of AI chatbots
empower them to adapt and improve over time, enhancing their proficiency in understanding
and addressing diverse user requests. The dynamic nature of user interactions with AI
chatbots underscores the importance of refining algorithms and expanding the model's
knowledge base to better cater to evolving user needs.
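A hedged sketch of a GET request with the requests package; the weather fields in describe_weather() are hypothetical, and fetch_json() works with any JSON API:

```python
# Fetching data for a user request with the requests package
# (pip install requests). The endpoint and JSON fields are illustrative.

def fetch_json(url: str, params: dict) -> dict:
    import requests                      # imported lazily
    resp = requests.get(url, params=params, timeout=10)
    resp.raise_for_status()              # surface HTTP errors early
    return resp.json()

def describe_weather(payload: dict) -> str:
    """Turn a (hypothetical) weather payload into a spoken sentence."""
    return f"It is {payload['temp']} degrees with {payload['condition']}."
```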
CHAPTER 5
PROJECT SNAPSHOTS
Figure 5.2. Calling Initial Functions
Figure 5.3. Calling py.jokes
Figure 5.4. Input of Responses
Figure 5.5. Other Function Calls
Figure 5.6. Calling Screenshot Function
Figure 5.7. Making a Speed Function
Figure 5.8. Other Function Calls
Figure 5.9. Start/Stop Functions
Figure 5.10. User Interface
CHAPTER 6
LIMITATIONS
Voice assistants are directly linked with businesses, so understanding their weaknesses is crucial. There are a plethora of limitations that users and business owners have complained about. These limitations of voice assistants have also stopped various organizations from deploying chatbots on their applications and websites.
6.1. Chatbots Don’t Understand Human Context
This is one of the most significant limitations of chatbots. Chatbots are programmed in a way that they only know what they are taught. They cannot understand human context, and this is a massive gap that can even lead to an angry customer.
AI-powered smart bots can understand the general context, but around 40 out of 100 cases are not related to the broad context.
6.2. They Don’t Do Customer Retention
Retaining a customer is a vital part of every organization; it holds more importance than getting new customers. A chatbot is significantly less capable of retaining customers, as it only tries up to the level for which it is programmed.
It is seen that human executives are better at customer retention because they can relate to the customers’ feelings, which is not the case with chatbots.
6.3. They Can’t Make Decisions
Another limitation of chatbots is that they lack decision-making. They don’t have the right know-how to differentiate between the good and the bad. On March 23, 2016, the tech biggie Microsoft attracted many controversies due to its voice assistant Tay. The assistant posted offensive tweets and landed Microsoft in huge trouble, so the chatbot had to be shut down temporarily.
Similarly, voice assistants have done a lot of damage to multiple brands due to their poor decision-making capability.
6.4. Exorbitant Installation
Yes, voice assistants save you a lot of money in the long run, but their installation cost can break the bank. You need to hire professionals who can program the chatbot to rightly match the needs of your business.
And installing a chatbot service means your business should be ready for a substantial investment in Artificial Intelligence and Machine Learning.
6.5. Chatbots Have the Same Answer For a Query
Most customers don’t proceed with the chat when they know they are chatting with voice assistants. Voice assistants are easily identifiable because they have the same answer for multiple queries. Suppose you ask a bot something that is not available in its data; you will simply get an apology.
The same is the case with other queries; no matter how many different questions you ask, it will deliver the same apology, which is quite irritating.
6.6. They Have Zero Research Skills
The harsh reality of chatbots is that they have zero research skills. These bots only have answers to the available queries; they cannot research new topics on the web. Also, the memorizing power of a chatbot is significantly limited; they cannot memorize anything unless they are fed with new samples and continual training, which is expensive and time-consuming.
However, advancements in AI research are ongoing, and there are efforts to integrate chatbots
with external databases or APIs to simulate a limited form of research. These enhancements
aim to enable chatbots to access real-time information or retrieve data from specific sources,
expanding their utility and making them more adept at addressing a broader range of user
queries. Nevertheless, as of now, the autonomous and comprehensive research capabilities
commonly associated with human intellect remain a challenge for AI chatbots.
6.7. Voice assistants Have No Emotions
Lastly, voice assistants have no emotions, and they cannot relate to a customer’s low moments. Having no emotions means a chatbot can never establish a connection with the customer, which is crucial for any business’s growth.
Voice assistants without sentiment-analysis capability will deal with customers in one particular way irrespective of the chat flow. As a result, some customers prefer to close the chat!
CHAPTER 7
FUTURE SCOPE
Voice Assistants are hot software in the enterprise, but to maintain longevity and relevance,
developers need to take a look at the barriers to entry, interface options and NLP issues.
From gauging purchase intent to answering questions about IT issues, chatbots are on track to
play a major role in the contemporary enterprise. Voice assistants are fully functioning, semi-
autonomous systems that can assist customer service experiences and response time.
But that doesn't mean their future in the enterprise is secure. For voice assistants to withstand the
rapidly increasing technological shifts and become mainstays in the enterprise, developers need
to examine the issues that have popped up with increased implementation.
The future scope of voice assistants could include many benefits for enterprises, but experts say
they will need to be gently nudged in the right direction for businesses to reap these benefits.
Over the past few years, the adoption of voice search has grown significantly as well. 65% of 25-
49-year-olds speak to their voice-enabled devices at least once per day whether it’s to get
answers to their questions, find local businesses, or make purchases. And since not only
customers but also businesses are looking at voice search and voice assistants with growing
interest, we can be sure that these devices are here to stay.
71% of consumers already prefer voice search to manual typing since it’s much faster and also
allows them to multitask. But as voice assistants become more powerful, easier to use, and able
to understand context far better, more people will turn to voice search and virtual assistants for
help with their everyday tasks.
In the near future, voice assistants are also expected to take a more proactive role. Rather than
just waiting for user commands, assistants will collect context-specific information and then take
the initiative by making helpful suggestions to the user. For example, people can interact with
their in-car voice assistants to get information about fuel levels, diagnostics, and service needs or
system settings that may need adjustment. So when fuel levels are low, the voice assistant may
suggest going to the nearest gas station (with GPS directions if needed).
What’s more, an in-car voice assistant could be connected to intelligent home systems by
integrating them with IoT devices or home automation systems. This would enable car owners to
turn off the lights and set the alarm after they leave home or turn on the heating before they
return.
Soon, voice assistants will also be able to authenticate purchases by recognizing a voice and matching it to a set credit card or bank account. Users will be able to pay for their orders simply by using voice commands; the voice assistant would only ask them to confirm the payment.
The option of paying through voice command is quickly growing in popularity. While only
around 8% of the US adult population used voice payments in 2017, that number rose to 24% in
2021. Statista also predicts that over 30% of Americans will use voice payments by 2022 as a
result of people increasingly looking for instant and contactless methods of payment.
Some companies are still hesitant to offer this payment method, fearing that it will open up new
opportunities for fraudsters. However, using voice biometrics may prove to be a solution here.
As each voiceprint is unique and nearly impossible to forge, voice assistants armed with voice
biometrics technology shouldn’t have any problems differentiating real bank accounts or credit
card owners from fraudsters. What’s more, it can work just as well at preventing any accidental
purchases (made by children, for example) from going through by simply rejecting all payment
orders that fail voice verification.
CONCLUSION
There can be no doubt that voice assistants are, and will continue to become, a great feat of
human ingenuity and they are already creeping into our lives in some shape or form. With the
eventual roll-out of 5G and the improvement in machine learning, voice assistants may be setting
themselves up to be a tool we cannot live without.
However, before we get to that stage, there are hurdles to cross which include heavy investment,
improvement in the technology and confidence from consumers that this device that is in their
lives does not pose a risk to their privacy.
The future of voice search and assistants is looking bright, with many people already seeing how convenient these tools can be and a growing number of devices using voice recognition. It’s clear that the technology will soon be everywhere, and with 5G and improvements in machine learning, voice assistants might at some point become tools we can’t live without.
REFERENCES
[2] http://yudian.voicecloud.cn/
[4] http://en.wikipedia.org/wiki/Extreme_programming
[5] https://www.researchgate.net/publication/342880348_Introduction_to_AI_Chatbots
[6] Grandview research. Chatbot Market Growth & Trends. 2022 [cited 2022 14 June]; Available
from: https://www.grandviewresearch.com/press-release/global-chatbot-market
[7] Dale, R., The return of the chatbots. Natural Language Engineering, 2016. 22(5): p. 811-817.
[8] McTear, M.F., The rise of the conversational interface: A new kid on the block? in International Workshop on Future and Emerging Trends in Language Technology. 2016. Springer.
[9] Radziwill, N.M. and M.C. Benton, Evaluating quality of chatbots and intelligent
conversational agents. arXiv preprint arXiv:1704.04579, 2017.
[10] Seeger, A.-M., J. Pfeiffer, and A. Heinzl. When do we need a human? Anthropomorphic
design and trustworthiness of conversational agents. in Proceedings of the Sixteenth Annual
Pre-ICIS Workshop on HCI Research in MIS, AISeL, Seoul, Korea. 2017.
[11] Gkinko, L. and A. Elbanna, Chatbots at Work: A Taxonomy of the Use of Chatbots in the Workplace, in Responsible AI and Analytics for an Ethical and Inclusive Digitized Society: 20th IFIP WG 6.11 Conference on e-Business, e-Services and e-Society, I3E 2021. 2021: Galway, Ireland, September 1-3, 2021.
[12] von Wolff, R.M., et al., Chatbots at Digital Workplaces - A Grounded-Theory Approach for
Surveying Application Areas and Objectives. Pacific Asia Journal of the Association for
Information Systems, 2020. 12(2): p. 64-102.