
Mini Project Report

on

AI-Based Desktop Assistant using Python


by
Shubham Shidheshwar Aute (2030331246044)
Rohit Dinkar Gaikwad (2030331246036)
Suyog Yogesh Deore (2030331246047)
Darvesh Singh Dogra (2030331246060)

Department of Information Technology


Dr. Babasaheb Ambedkar Technological University,
Lonere-402103, Dist. Raigad, (MS) INDIA.
Certificate

Mini Project report on AI-Based Desktop Assistant using Python


submitted by Shubham Aute, Rohit Gaikwad, Suyog Deore, Darvesh
Dogra, is approved for the partial fulfillment of the requirements for
the degree of B.Tech. in Information Technology of Dr. Babasaheb
Ambedkar Technological University, Lonere - 402 103, Raigad (MS).

Examiner(s)

(1) —————————————————————— Sign.: ——————–

(2) —————————————————————— Sign.: ——————–

Ms. Sonali V. Bharad
Guide

Dr. Sanjay R. Sutar
Head of Department

Place: Dr. Babasaheb Ambedkar Technological University, Lonere - 402 103.

Acknowledgments
Our first and foremost words of recognition go to our highly esteemed
guide for her constructive academic advice and guidance, constant
encouragement, valuable suggestions, and all other support and kindness.
Her supervision and guidance proved most valuable in overcoming all the
hurdles in the completion of this report.

We are also thankful to the Head of the Department, Dr. Sanjay R.
Sutar, for his guidance and valuable suggestions. We would also like to
thank the departmental staff and library staff for their timely help.

Finally, we would like to thank all whose direct and indirect support
helped us complete this report in time.

Shubham Shidheshwar Aute (2030331246044)


Rohit Dinkar Gaikwad (2030331246036)
Suyog Yogesh Deore (2030331246047)
Darvesh Singh Dogra (2030331246060)

Abstract
As we all know, we are living in the era of computers. We have all
wondered how convenient it would be to have our own virtual A.I.
assistant: imagine how much easier and more effortless it would be to
send emails without typing a single word, to search Wikipedia without
actually opening a web browser, and to perform many other daily tasks
with a single voice command. Today, voice assistants are everywhere,
and voice-based artificial intelligence plays an important role in our
daily lives. Examples include Siri on Apple devices, Cortana on
Microsoft devices, and Google Assistant on Android devices. There are
also devices dedicated to providing virtual assistance. Virtual
assistants are typically cloud-based programs that require
internet-connected devices and/or applications to work. They typically
perform simple jobs for end users, such as adding tasks to a calendar
or providing information that would normally be searched for in a web
browser, as well as slightly more complex tasks like checking the
status of smart home devices. The work is initialized by analyzing the
audio commands given by the user via a microphone. The speech engine is
set up to convert text to speech using built-in libraries, while speech
recognition is used to convert the speech input to text. This text is
then fed to the model, which determines the nature of the command and
calls the relevant script for execution. This is, in essence, what
happens when the assistant receives a command from the user.

Contents

1 Introduction
1.1 Background
1.2 Objectives
1.3 Purpose, Scope and Applicability
1.4 How it Works

2 Introduction to AI
2.1 What is Artificial Intelligence?
2.2 Types of Artificial Intelligence

3 Software Requirements Specification
3.1 Hardware Requirement
3.2 Software Requirement

4 Implementation

5 System Design
5.1 ER Diagram
5.2 Activity Diagram
5.3 Class Diagram
5.4 Use Case Diagram
5.5 Sequence Diagram

6 Result
6.1 User Interface
6.2 Output

7 Bibliography

Chapter 1

Introduction

In today’s era almost all tasks are digitalized. We have a smartphone
in hand, which is nothing less than having the world at our fingertips.
These days we aren’t even using our fingers: we just speak the task and
it is done. There exist systems where we can say, Text Dad, “I’ll be
late today,” and the text is sent. That is the job of a virtual
assistant.
Virtual assistants are software programs that help you ease your
day-to-day tasks, such as showing weather reports, creating reminders,
and making shopping lists. They can take commands via text (online
chatbots) or by voice. Voice-based intelligent assistants need an
invoking word, or wake word, to activate the listener, followed by the
command.
This system is designed to be used efficiently on desktops. Personal
assistant software improves user productivity by managing the user’s
routine tasks and by providing information from online sources.
“Helper” is effortless to use: allow your intelligent assistant to make
email work for you, detect intent, pick out important information,
automate processes, and deliver personalized responses.

This project was started on the premise that there is a sufficient
amount of openly available data and information on the web that can be
utilized to build a virtual assistant capable of making intelligent
decisions for routine user activities.

1.1 Background

There already exist a number of desktop virtual assistants. A few
examples of current virtual assistants available in the market are
discussed in this section, along with the tasks they support and their
drawbacks.

SIRI from Apple


SIRI is personal assistant software that interfaces with the user
through a voice interface, recognizes commands and acts on them. It
learns to adapt to the user’s speech and thus improves voice
recognition over time. It also tries to converse with the user when it
does not identify the user’s request. It integrates with the calendar,
contacts and music library applications on the device, and also with
the device’s GPS and camera. It uses location, temporal, social and
task-based contexts to personalize the agent’s behavior to the user at
a given point in time.

Supported Tasks

• Call someone from my contacts list

• Launch an application on my iPhone

• Send a text message to someone

• Set up a meeting on my calendar for 9am tomorrow

• Set an alarm for 5am tomorrow morning

• Play a specific song in my iTunes library

• Enter a new note

Drawback
SIRI does not maintain a knowledge database of its own; its
understanding comes from the information captured in domain models and
data models.

1.2 Objectives

The main objective of building personal assistant software (a virtual
assistant) is to use semantic data sources available on the web and
user-generated content, and to provide knowledge from knowledge
databases. The main purpose of an intelligent virtual assistant is to
answer questions that users may have. This may be done in a business
environment, for example on a business website, with a chat interface.
On the mobile platform, the intelligent virtual assistant is available
as a call-button-operated service where a voice asks the user “What can
I do for you?” and then responds to verbal input.

One of the main advantages of voice search is its rapidity. In fact,
voice is reputed to be four times faster than a written search: whereas
we can write about 40 words per minute, we are capable of speaking
around 150 in the same period of time. In this respect, the ability of
personal assistants to accurately recognize spoken words is a
prerequisite for them to be adopted by consumers.

1.3 Purpose, Scope and Applicability

Purpose

The purpose of a virtual assistant is to be capable of voice
interaction, music playback, making to-do lists, setting alarms,
streaming podcasts, playing audiobooks, and providing weather, traffic,
sports, and other real-time information such as news. Virtual
assistants enable users to speak natural-language voice commands in
order to operate the device and its apps.
There is an increased overall awareness and a higher level of comfort
demonstrated specifically by millennial consumers. In this ever-evolving
digital world where speed, efficiency, and convenience are constantly be-
ing optimized, it’s clear that we are moving towards less screen interaction.

Scope
Voice assistants will continue to offer more individualized experiences
as they get better at differentiating between voices. However, it is
not just developers who need to address the complexity of developing
for voice: brands also need to understand the capabilities of each
device and integration, and whether it makes sense for their specific
brand. They will also need to focus on maintaining a consistent user
experience in the coming years, as complexity becomes more of a
concern. This is because the visual interface is missing with voice
assistants: users simply cannot see or touch a voice interface.

Applicability
The mass adoption of artificial intelligence in users’ everyday lives
is also fueling the shift towards voice. The number of IoT devices,
such as smart thermostats and speakers, is giving voice assistants more
utility in a connected user’s life. Smart speakers are the number one
way we are seeing voice being used, and many industry experts even
predict that nearly every application will integrate voice technology
in some way in the next five years.
The use of virtual assistants can also enhance the Internet of Things
(IoT) ecosystem. Twenty years from now, Microsoft and its competitors
will be offering personal digital assistants that offer the services of
a full-time employee, something usually reserved for the rich and
famous.

1.4 How it Works

• The user asks the assistant to perform a task.

• The natural-language audio signal is converted into digital data that
can be analyzed by the software.

• The data is compared against the software’s database, using an
algorithm to find a suitable answer.

• This database is located on distributed servers in cloud networks.

• For this reason, the assistant must have a reliable Internet
connection.

Chapter 2

Introduction to AI

2.1 What is Artificial Intelligence?

In the simplest terms, AI, which stands for artificial intelligence,
refers to systems or machines that mimic human intelligence to perform
tasks and can iteratively improve themselves based on the information
they collect. AI manifests in a number of forms. A few examples are:

• Chatbots use AI to understand customer problems faster and provide
more efficient answers

• Intelligent assistants use AI to parse critical information from
large free-text datasets to improve scheduling

• Recommendation engines can provide automated recommendations for TV
shows based on users’ viewing habits

2.2 Types of Artificial Intelligence

Based on Capabilities

• Weak AI or Narrow AI

• General AI

• Super AI

Based on Functionality

• Reactive Machines

• Limited Memory

• Theory of Mind

• Self-Awareness

Chapter 3

Software Requirements Specification

The system is built keeping in mind generally available hardware and
software compatibility. It doesn’t require any expensive hardware
devices. The minimum hardware and software requirements for the system
are listed below.

3.1 Hardware Requirement:

• Processor - minimum Intel(R) Core(TM) i5-11500U CPU @ 2.20 GHz
(up to 2.60 GHz)

• Processor speed - minimum 2 GHz

• RAM - minimum 8.00 GB

3.2 Software Requirement:

• pyttsx3 - a text-to-speech conversion library in Python. Unlike
alternative libraries, it works offline, and is compatible with both
Python 2 and 3. An application invokes the pyttsx3.init() factory
function to get a reference to a pyttsx3.Engine instance. It is a very
easy-to-use tool that converts the entered text into speech. The
pyttsx3 module supports two voices, the first female and the second
male, provided by “sapi5” on Windows. It supports three TTS engines:
sapi5 (SAPI5 on Windows), nsss (NSSpeechSynthesizer on Mac OS X), and
espeak (eSpeak on every other platform).

• Speech Recognition - allows us to convert audio into text for further
processing. The speech recognition module is used to convert the audio
into text. We also use the Google Web Speech API, turning the spoken
language into text using recognize_google().

• Visual Studio Code - a streamlined code editor with support for
development operations like debugging, task running, and version
control. It aims to provide just the tools a developer needs for a
quick code-build-debug cycle and leaves more complex workflows to
fuller-featured IDEs, such as Visual Studio IDE. It is a source code
editor made by Microsoft for Windows, Linux and macOS. Features include
support for debugging, syntax highlighting, intelligent code
completion, snippets, code refactoring, and embedded Git. Users can
change the theme, keyboard shortcuts, and preferences, and install
extensions that add additional functionality.

• OS Module - the OS module in Python provides functions for
interacting with the operating system. OS comes under Python’s standard
utility modules. This module provides a portable way of using operating
system-dependent functionality. The os and os.path modules include many
functions to interact with the file system.
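As a minimal sketch of how the os module can be used here, the snippet below builds a portable path and enumerates files; the directory and file names are illustrative assumptions, not paths from the project itself.

```python
import os

def python_files(directory):
    # os.listdir enumerates directory entries; os.path.join keeps the
    # resulting paths portable across Windows, Linux and macOS
    return [os.path.join(directory, name)
            for name in os.listdir(directory)
            if name.endswith(".py")]

# os.path.exists lets us check a path before trying to use it
config_path = os.path.join("data", "intents.json")
print(config_path, os.path.exists(config_path))
```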

• Wikipedia - Wikipedia is a multilingual online encyclopedia created
and maintained as an open collaboration project by a community of
volunteer editors using a wiki-based editing system. In this project,
Python’s wikipedia module is used to fetch a variety of information
from the Wikipedia website.
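A usage sketch, assuming the third-party `wikipedia` package is installed; the import is deferred into the function so the file still loads when the package (or a network connection) is absent.

```python
def wiki_summary(topic, sentences=2):
    """Return a short summary of `topic` fetched from Wikipedia.

    Sketch assuming the third-party `wikipedia` package
    (pip install wikipedia); requires network access at call time.
    """
    import wikipedia  # deferred import: optional dependency
    return wikipedia.summary(topic, sentences=sentences)
```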

• Webbrowser Module - the webbrowser module provides a basic interface
to the system’s standard web browser. It provides an open() function,
which takes a filename or a URL and displays it in the browser. If you
call open() again, it attempts to display the new page in the same
browser window.
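A sketch of how a voice query could be turned into a browser search. The Google search URL pattern is an assumption; `open_search` would launch the actual browser, so it is defined but not called here.

```python
import urllib.parse
import webbrowser

def search_url(query):
    # quote_plus escapes spaces and special characters for use in a URL
    return "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)

def open_search(query):
    # webbrowser.open displays the URL in the system's default browser
    webbrowser.open(search_url(query))

print(search_url("python virtual assistant"))
```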

• smtplib Module - the smtplib module defines an SMTP client session
object that can be used to send mail to any internet machine with an
SMTP or ESMTP listener daemon. For details of SMTP and ESMTP operation,
consult RFC 821 (Simple Mail Transfer Protocol) and RFC 1869 (SMTP
Service Extensions).
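A minimal sketch of sending mail with smtplib. The host, port and credentials are placeholder assumptions; `send_message` is shown but not executed, since it needs a live SMTP server and valid login.

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    # EmailMessage produces correctly formatted headers and body
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_message(msg, host="smtp.example.com", port=587, password=""):
    # Connect, upgrade the connection to TLS, log in, and send
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(msg["From"], password)
        server.send_message(msg)
```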

• Wolframalpha - The Wolfram|Alpha Webservice API provides a web-based
API allowing the computational and presentation capabilities of
Wolfram|Alpha to be integrated into web, mobile, desktop, and
enterprise applications. Wolfram|Alpha can compute expert-level answers
using Wolfram’s algorithms, knowledge base and AI technology. It is
made possible by the Wolfram Language.

• Beautiful Soup - Beautiful Soup is a Python library for pulling data
out of HTML and XML files. It works with your favorite parser to
provide idiomatic ways of navigating, searching, and modifying the
parse tree. It commonly saves programmers hours or days of work.

• Datetime - in Python, date and time are not data types of their own,
but a module named datetime can be imported to work with dates as well
as times. The datetime module comes built into Python, so there is no
need to install it externally. It supplies classes to work with date
and time; these classes provide a number of functions to deal with
dates, times and time intervals. Date and datetime are objects in
Python, so when you manipulate them, you are actually manipulating
objects, not strings or timestamps.
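A sketch of the assistant’s time and date replies using datetime. The spoken phrasings are illustrative; `now` is injectable so the functions can be tested with a fixed moment.

```python
import datetime

def tell_time(now=None):
    # Default to the current moment; pass a datetime to make tests deterministic
    now = now or datetime.datetime.now()
    return now.strftime("The time is %I:%M %p")

def tell_date(now=None):
    now = now or datetime.datetime.now()
    return now.strftime("Today is %A, %d %B %Y")

print(tell_time())
print(tell_date())
```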

Chapter 4

Implementation

Step 1: Setting up the speech engine

The pyttsx3 module is stored in a variable named engine. pyttsx3 is a
text-to-speech conversion library in Python. Unlike alternative
libraries, it works offline and is compatible with both Python 2 and 3.
It is a very easy-to-use tool that converts the entered text into
speech. The pyttsx3 module supports two voices, the first female and
the second male.
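The speech-engine setup can be sketched as below. This assumes the third-party pyttsx3 package is installed; the import is deferred so the file loads even without it, and the voice index and speech rate are illustrative choices, not the project’s exact settings.

```python
def speak(text):
    """Speak `text` aloud through the system's TTS engine.

    Sketch assuming pyttsx3 is installed (pip install pyttsx3).
    """
    import pyttsx3  # deferred import: optional dependency
    engine = pyttsx3.init()                # selects sapi5/nsss/espeak per OS
    voices = engine.getProperty("voices")  # available system voices
    engine.setProperty("voice", voices[0].id)  # assumed index for a voice
    engine.setProperty("rate", 175)        # words per minute (assumed)
    engine.say(text)
    engine.runAndWait()                    # block until speech finishes
```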

Step 2: Speech Recognition

Speech recognition is a machine’s ability to listen to spoken words and
identify them. It allows us to convert audio into text for further
processing: the speech recognition module converts the audio into text,
which can then be used to make a query or give a reply. We create an
instance of the Recognizer class and use the recognize_google() method
on it to access the Google Web Speech API and turn spoken language into
text.
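The listening step can be sketched as follows, assuming the third-party SpeechRecognition package (and PyAudio for microphone access) are installed; the imports are deferred so the file loads without them.

```python
def listen():
    """Capture one utterance from the microphone and return it as text.

    Sketch assuming SpeechRecognition and PyAudio are installed.
    """
    import speech_recognition as sr  # deferred import: optional dependency
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate threshold
        audio = recognizer.listen(source)
    try:
        # recognize_google sends the audio to the Google Web Speech API
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # speech was unintelligible
```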

Step 3: Neural network for the assistant
Neural networks are composed of layers/modules that perform operations
on data. The torch.nn namespace provides all the building blocks needed
to define our neural network; we then create the neurons through which
data and computations flow, with the input coming from the raw dataset.
We use NumPy to build a single neuron. NLTK is a toolkit built for
working with NLP in Python; it provides various text-processing
libraries along with many test datasets. A neural network is a series
of algorithms that endeavours to recognize underlying relationships in
a set of data through a process that mimics the way the human brain
operates, developing its output without explicitly programmed rules.
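An intent classifier of this kind might be sketched with torch.nn as below. This assumes PyTorch is installed (the import is deferred), and the layer sizes are illustrative, not the project’s actual hyperparameters.

```python
def build_model(input_size, hidden_size, num_classes):
    """Build a small feed-forward classifier for intent tagging.

    Sketch assuming PyTorch is installed (pip install torch).
    """
    import torch.nn as nn  # deferred import: optional dependency
    return nn.Sequential(
        nn.Linear(input_size, hidden_size),   # bag-of-words -> hidden layer
        nn.ReLU(),
        nn.Linear(hidden_size, hidden_size),  # second hidden layer
        nn.ReLU(),
        nn.Linear(hidden_size, num_classes),  # one logit per intent tag
    )
```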

Step 4: Dataset
Here we create a .json file containing tags, patterns and responses,
which are supplied to the neural network to train the model. The
trained data is then stored in a .pth file; .pth is a data file for
machine learning in PyTorch. The reason we used a JSON file is that it
is a data-interchange format that uses human-readable text to store and
transmit data objects consisting of attribute-value pairs and arrays.
It basically has two data structures: objects and arrays. An object
stores a set of name-value pairs, and an array is a list of values. The
dataset has been created by us depending on the tasks that have to be
carried out.
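The intents file described above might take a shape like the following sketch; the tags, patterns and responses shown are illustrative assumptions, not the project’s actual dataset.

```json
{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Hi", "Hello", "Hey there"],
      "responses": ["Hello! How can I help you?"]
    },
    {
      "tag": "time",
      "patterns": ["What time is it", "Tell me the time"],
      "responses": ["TIME_PLACEHOLDER"]
    }
  ]
}
```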

Step 5: Categorizing
We generate probabilities when training the model on how to respond:
when we communicate with the assistant, it should be able to categorize
the conversation under a specific tag using those probabilities.
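The categorization step can be sketched in plain Python: turn the model’s raw scores into probabilities with a softmax and accept the best tag only when it is confident enough. The 0.75 threshold is an assumed value for illustration.

```python
import math

def softmax(logits):
    # Convert raw scores into probabilities that sum to 1;
    # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorize(logits, tags, threshold=0.75):
    # Pick the most probable tag, but refuse low-confidence guesses
    probs = softmax(logits)
    best = max(range(len(tags)), key=lambda i: probs[i])
    return tags[best] if probs[best] >= threshold else None

# A confident "time" prediction passes the threshold
print(categorize([0.1, 4.0, 0.2], ["greeting", "time", "weather"]))
```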

Step 6: Tasks
There are two types of functions: input functions and non-input
functions. Examples of non-input functions are time, date, etc.;
examples of input functions are Google search, Wikipedia, etc. Such
tasks can be implemented using various modules provided by Python, like
datetime, wikipedia, pywhatkit, etc. We can also provide tasks for
interacting with the operating system using the OS library. Based on
the tasks we add, we also need to add various tags related to each
task, which makes the conversation better after training the model.
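The split between input and non-input functions can be sketched as a dispatch table mapping a predicted tag to its handler. The tag names and spoken phrasings are assumptions; `google_task` opens a real browser window when called, so it is defined but not exercised here.

```python
import datetime
import webbrowser

# Non-input functions ignore the query text; input functions use it.
def time_task(_query=""):
    return datetime.datetime.now().strftime("It is %I:%M %p")

def date_task(_query=""):
    return datetime.datetime.now().strftime("Today is %d %B %Y")

def google_task(query):
    webbrowser.open("https://www.google.com/search?q=" + query.replace(" ", "+"))
    return "Searching Google for " + query

# Map each intent tag to its handler function
TASKS = {"time": time_task, "date": date_task, "google_search": google_task}

def run_task(tag, query=""):
    handler = TASKS.get(tag)
    return handler(query) if handler else "Sorry, I can't do that yet."
```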

Chapter 5

System Design

5.1 ER Diagram

The above diagram shows the entities and their relationships for a
virtual assistant system. We have a user of the system, who can have
keys and values; these can be used to store any information about the
user. Say, for the key “name” the value can be “Jim”. Some keys the
user might like to keep secure; for these, the user can enable a lock
and set a password (voice clip). A single user can ask multiple
questions. Each question is given an ID so that it can be recognized
along with the query and its corresponding answer. A user can also have
any number of tasks. These have their own unique ID and a status, i.e.
their current state. A task also has a priority value and a category
indicating whether it is a parent task or a child task of an older
task.

5.2 Activity Diagram

Initially, the system is in idle mode. As it receives a wake-up call it
begins execution. The received command is identified as either a
question or a task to be performed, and the specific action is taken
accordingly. After the question is answered or the task is performed,
the system waits for another command. This loop continues until it
receives the quit command, at which moment it goes back to sleep.

5.3 Class Diagram

The User class has two attributes: the command it sends as audio and
the response it receives, which is also audio. It performs a function
to listen to the user’s command, interpret it, and then reply or send
back a response accordingly. The Question class holds the command in
string form, as interpreted by the Interpret class. The Task class also
holds the interpreted command in string format; it has various
functions like reminder, note, mimic, research and reader.

5.4 Use Case Diagram

In this project there is only one user. The user issues a command to
the system; the system then interprets it and fetches the answer. The
response is sent back to the user.

5.5 Sequence Diagram

The above sequence diagram shows how an answer asked for by the user is
fetched from the internet. The audio query is interpreted and sent to
the web scraper, which searches for and finds the answer. It is then
sent back to the speaker, which speaks the answer to the user.

Chapter 6

Result

6.1 User Interface

Shows the interface for starting or stopping the program.

6.2 Output

Shows the output when asking to play music.

Shows the output when asking to open YouTube.

Chapter 7

Bibliography

Websites referred

• www.stackoverflow.com

• www.pythonprogramming.net

• www.codecademy.com

• www.tutorialspoint.com

YouTube Channels referred

• code with harry

• mySirG

