
VOICE ASSISTANT

POWERED BY YOUR VOICE. DRIVEN BY INTELLIGENCE

COURSE: FUNDAMENTALS IN AI/ML


COURSE CODE: CSA2001
SLOT: A24+C21+F22
OUR GROUP
1. Ruddransh Bhardwaj 24BCE10011
2. Himanshu Singh 24BCE10123
3. Tanishka Chauhan 24BCE10353
4. Rashi Tiwari 24BSA10132
5. Daksh Patodi 24BCE10304
6. Saransh Singh 23BAS10099
About
Speech recognition, also known as automatic speech recognition (ASR)
or voice recognition, is a technology that converts spoken language
into written text.

The primary goal of speech recognition systems is to accurately and
efficiently transcribe spoken words into a format that can be processed,
stored, or used for various applications.

This technology relies on sophisticated algorithms and machine
learning techniques to interpret and understand human speech
patterns.
OBJECTIVES AND GOALS
To design and develop a smart voice assistant capable of understanding and executing user
commands through natural language interaction, offering seamless integration with various
devices and services for improved user convenience and efficiency.

1. ACCURATE SPEECH RECOGNITION
2. REAL-TIME AND NATURAL INTERACTION
3. TASK EXECUTION AND AUTOMATION
FUNCTIONALITIES
Listening: The assistant uses a microphone to capture the user's
voice and processes it.
Speech Recognition: Converts the user's spoken input into text for
processing.
Command Processing: Interprets the text to understand the user's
intent.
Performing Actions: Executes the required task, like fetching
information, opening applications, or responding verbally.
Responding: Converts the response text back into speech for output.
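The five steps above can be sketched as a small control loop. This is a minimal illustration in plain Python, not the project's actual code: the microphone capture and speech output are stubbed out with plain functions, and the command names are hypothetical.

```python
# Hypothetical sketch of the listen -> recognize -> process -> respond loop.
# Real microphone capture (speech_recognition / pyaudio) and speech output
# (pyttsx3) are replaced by stubs so the control flow is visible.
from datetime import datetime

def process_command(text):
    """Map recognized text to an action and return the spoken response."""
    text = text.lower()
    if "time" in text:
        return "The time is " + datetime.now().strftime("%H:%M")
    elif "hello" in text:
        return "Hello! How can I help you?"
    elif text in ("exit", "quit"):
        return None  # signal the main loop to stop
    else:
        return "Sorry, I did not understand that."

def run_assistant(listen, speak):
    """Main loop: listen, process, respond, until the user says exit/quit."""
    while True:
        heard = listen()          # stands in for speech-to-text
        reply = process_command(heard)
        if reply is None:
            break
        speak(reply)              # stands in for text-to-speech

# Canned input standing in for the microphone:
script = iter(["hello", "what time is it", "exit"])
spoken = []
run_assistant(listen=lambda: next(script), speak=spoken.append)
```

In the real assistant, `listen` would wrap `speech_recognition` and `speak` would wrap `pyttsx3`; the loop structure stays the same.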
INTRODUCTION
The assistant can perform simple tasks like opening apps, fetching
information, or giving time and date updates.

It employs technologies like Speech Recognition to convert spoken words
into text, Natural Language Processing to understand user intent, and
Text-to-Speech (TTS) to provide audio responses. Libraries such as
speech_recognition, pyttsx3, and pyaudio are commonly used.

While simple, it demonstrates the fundamentals of voice-based interaction
and can be enhanced with additional features for more advanced
functionality.
LITERATURE SURVEY
The literature survey highlights the growing relevance and integration of voice assistants across various
smart devices such as phones, smartwatches, TVs, and home automation systems. To handle vast data
and provide better responses, voice assistants increasingly rely on Machine Learning (ML), Natural
Language Processing (NLP), Big Data, and the Internet of Things (IoT).
Key referenced works include:
Patrick Nguyen et al. introduced the Flat Direct Model (FDM) for speech recognition, bypassing
traditional Markov models and improving sentence error rates.
Nil Goksel et al. explored the role of Intelligent Personal Assistants (IPAs) in education and learning
through AI and NLP.
Keerthana S et al. demonstrated smart home control using voice assistants and Wi-Fi-enabled
microcontrollers for automation.
Sutar Shekhar et al. discussed how voice assistants on Android platforms enhance user convenience
with features like predictive recommendations.
Rishabh Shah et al. emphasized the importance of NLP in enabling assistants to understand native
languages, making the technology inclusive.
GTTS-EHU systems were studied for Spoken Term Detection, using synthesized speech and Stacked
Bottleneck Features (sBNF).
Tanvee Gawand et al. combined gTTS, AIML, and Python to develop offline-capable, flexible voice
assistants like "JARVIS".
The survey underlines that the integration of speech recognition, context extraction, and intelligent
prediction systems is critical for building smarter, more accessible voice assistants.
METHODOLOGY
USE OF CONDITIONAL STATEMENTS
Conditional statements, also known as "if statements", allow programs to make decisions based on
specific conditions. They are fundamental in programming for controlling the flow of a program.

Basic structure of conditional statements:

•If statement: Executes a block of code if a condition is true.
•If-else statement: Provides an alternative block of code if the condition is false.
•Elif statement: Used to check multiple conditions; it is short for "else if." When you need to test
more than one condition in sequence, you use elif between if and else.

USE OF LOOPS
Loops in Python are used to execute a block of code repeatedly, either for a fixed
number of times or until a condition is met. They are fundamental for automating repetitive tasks,
handling large datasets, and iterating over elements in collections like lists or strings. In this program
we have used the while loop in the main function to execute the block of code.
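The if/elif/else dispatch and the while loop described above can be combined as in this small sketch. The command names are illustrative, not taken from the project's code; canned input stands in for the user's voice.

```python
# Illustrative if/elif/else dispatch inside a while loop, mirroring the
# structure described above. Commands and responses are hypothetical.
def dispatch(command):
    if command == "time":
        return "telling the time"
    elif command == "open website":
        return "opening the website"
    else:
        return "command not recognized"

commands = ["time", "open website", "dance", "exit"]  # canned user input
log = []
i = 0
while True:                       # main loop runs until the exit condition
    command = commands[i]
    i += 1
    if command in ("exit", "quit"):
        break                     # exit condition: user says "exit" or "quit"
    log.append(dispatch(command))
```

The elif chain tests conditions in order and falls through to the else branch for anything unrecognized, which is also where error handling for unknown input naturally lives.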
USE OF FUNCTIONS IN PYTHON PROGRAM
A function in Python is a block of reusable code designed to perform a specific task. Functions help
organize and structure code, make it easier to debug, and improve reusability.
In this program we have used various functions:

•Listen to speech or voice
•Process commands
•Open website
EXIT CONDITION:
The program stops running when the user says "exit" or "quit."
LIBRARIES USED:
1. Wave
2. NumPy
3. datetime
4. requests
5. Thread
6. webbrowser
7. speech_recognition
8. pyttsx3
FEATURES

ERROR HANDLING: Handled unknown input and failed API calls.
REDUCED LATENCY: Shortened timeout for speech recognition, faster text-to-speech output.
THREADED OPERATIONS: Used threads to open websites without delaying other tasks.
VOICE OPTIMIZATION: Fast speech synthesis with adjustable rate and volume, female voice prioritization.
CORE FUNCTIONS: Provide word meanings, fetch weather details, tell time and date, open websites.
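The threaded-operations feature can be sketched as follows: a slow task runs in a background thread so the main flow continues immediately. In the real assistant the thread would call `webbrowser.open()`; here a stub with an artificial delay stands in for it so the behavior is observable.

```python
# Sketch of running a slow task (opening a website) in a background thread
# so the main flow is not blocked. The browser call is stubbed with a delay.
import threading
import time

opened = []

def open_website(url):
    time.sleep(0.2)        # simulate network / browser startup delay
    opened.append(url)

t = threading.Thread(target=open_website, args=("https://example.com",))
t.start()                  # returns immediately; main flow continues
main_flow_ran_first = (len(opened) == 0)   # background task not done yet
t.join()                   # wait only when the result is actually needed
```

Because `start()` does not wait for the target function to finish, the assistant can keep listening for the next command while the browser launches.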
How does speech to text work?!
Steps in speech-to-text conversion:

Step 1. Acoustic Signal Processing:
The input to a speech recognition system is an acoustic
signal, the analogue waveform of the spoken words. This
signal is captured by a microphone and converted into a
digital format. A signal-processing algorithm known as the
Fast Fourier Transform (FFT) is then used to convert the
waveform into a spectrogram.
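The FFT-to-spectrogram step can be sketched with NumPy: slice the digitized waveform into short frames and take the magnitude of the FFT of each frame. The frame and hop sizes here are illustrative choices, not values from the project.

```python
# Minimal spectrogram sketch: frame the signal, FFT each frame, stack the
# magnitudes. Each row is one time slice; each column one frequency bin.
import numpy as np

fs = 8000                                  # sampling rate in Hz
t = np.arange(fs) / fs                     # one second of audio
signal = np.sin(2 * np.pi * 440 * t)       # a 440 Hz test tone

frame_len, hop = 256, 128                  # illustrative frame/hop sizes
frames = [signal[i:i + frame_len]
          for i in range(0, len(signal) - frame_len, hop)]
spectrogram = np.abs(np.fft.rfft(frames, axis=1))

# The peak bin of a frame maps back to frequency (bin spacing = fs / frame_len):
peak_bin = spectrogram[0].argmax()
peak_hz = peak_bin * fs / frame_len        # close to 440 Hz for this tone
```

Production code would also apply a window function (e.g. Hann) to each frame to reduce spectral leakage; that detail is omitted here for brevity.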

Step 2. Feature Extraction

Spectral Analysis: The digital signal undergoes spectral analysis to extract relevant features.
This involves breaking down the signal into frequency components, revealing patterns
representing speech sound characteristics.
Pitch and Intensity Analysis: Additional features, such as pitch (frequency of the speech) and
intensity (loudness), are extracted to capture more nuances of the spoken language.
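The pitch and intensity features mentioned above can be computed crudely as follows: intensity as the RMS of a frame, and a rough pitch estimate from the zero-crossing rate (a pure tone of frequency f crosses zero about 2f times per second). This is a toy illustration, far simpler than real pitch trackers.

```python
# Toy intensity (RMS) and pitch (zero-crossing rate) estimates for one frame.
import numpy as np

fs = 8000
t = np.arange(fs) / fs
frame = 0.5 * np.sin(2 * np.pi * 200 * t)       # 200 Hz tone, amplitude 0.5

intensity = np.sqrt(np.mean(frame ** 2))         # RMS loudness
crossings = np.sum(np.abs(np.diff(np.sign(frame))) > 0)
pitch_hz = crossings * fs / (2 * len(frame))     # ~200 Hz for this tone
```

Zero-crossing pitch estimation only works for clean periodic signals; real systems use autocorrelation or cepstral methods, but the idea of reducing a frame to a few numeric features is the same.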
Standardization prepares the data; CNN extracts the features

STANDARDIZATION

Standardization is a data preprocessing step that transforms input features so they have:
Mean = 0
Standard deviation = 1

Formula:
z = (x − μ) / σ
Where:
x = input value
μ = mean of the feature
σ = standard deviation of the feature

✅ Why it's used:
To normalize the range of values
Helps neural networks converge faster
Prevents features with large scales from dominating learning

CNN (Convolutional Neural Network)

CNN is a type of neural network that learns to extract features from data automatically — especially spatial or temporal patterns.

🔍 What it does:
Applies filters (or kernels) that slide over input data
Detects local features, like edges, shapes (in images), or phonetic patterns (in spectrograms)
Each convolution layer learns increasingly abstract representations

✅ CNN learns what features are important — it doesn't just scale or normalize them.
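The standardization formula above applied per feature column, with made-up data for illustration:

```python
# z = (x - mu) / sigma, applied column-wise so each feature ends up with
# mean 0 and standard deviation 1 regardless of its original scale.
import numpy as np

X = np.array([[1.0, 200.0],     # two features on very different scales
              [2.0, 400.0],
              [3.0, 600.0]])

mu = X.mean(axis=0)             # per-feature mean
sigma = X.std(axis=0)           # per-feature standard deviation
Z = (X - mu) / sigma            # standardized features
```

Note that both columns end up on the same scale, which is exactly why the large-valued second feature no longer dominates learning.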
1. The spectrogram, which is a frequency vs. time
representation, is generated from the amplitude-time
graph.

2. It is then divided into various slices, each corresponding
to the sound it makes.
Step 3. Acoustic Modelling
Acoustic models can be of various types and with different loss functions but the most used in literature
and production are Connectionist Temporal Classification (CTC) based model that considers
spectrogram (X) as input and produces the log probability scores (P) of all different vocabulary tokens for
each time step.

The Problem CTC Solves
In speech recognition:
The input is a long sequence of audio frames (e.g., 1,000 time steps).
The output is a much shorter sequence of text (e.g., 10 characters).
We don't know exactly which part of the audio corresponds to which letter/word (no alignment).

How CTC Works (Conceptually):

Allows the model to predict at every time step a character or a special blank token (_).
During training, it considers all possible alignments of the output text within the input length. It computes
the total probability of all valid alignments using dynamic programming.

CTC solves the alignment problem in speech recognition.
It lets models map long input sequences to shorter outputs without frame-level labels.
It uses dynamic programming to sum over all possible alignments.
It powers many speech models — especially before attention-based methods became mainstream.
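The decoding rule that pairs with CTC can be shown in a few lines of pure Python: collapse runs of repeated tokens, then drop the blank. This is greedy best-path decoding, the simplest way to turn per-frame predictions into text.

```python
# CTC greedy decoding rule: collapse repeated tokens, then remove blanks.
def ctc_collapse(tokens, blank="_"):
    out = []
    prev = None
    for tok in tokens:
        if tok != prev:          # collapse runs of the same token
            if tok != blank:     # then drop blank tokens
                out.append(tok)
        prev = tok
    return "".join(out)

# Nine frames of per-frame predictions decode to a five-letter word;
# the blank between the two "l" runs is what preserves the double letter:
frames = ["h", "h", "_", "e", "l", "l", "_", "l", "o"]
decoded = ctc_collapse(frames)   # "hello"
```

This also shows why the blank token exists: without it, "ll" in "hello" would collapse into a single "l", since adjacent repeats are always merged.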
Step 4. Decoding

Matching Patterns: The acoustic and language models work
together in decoding. The system matches the observed acoustic
patterns with the learned models to identify the most probable
sequence of phonemes and words.
3. These divided spectrogram slices are then looked up
in vast datasets, which can give similar phonetic
identifications, as seen in picture 1.

These phonemes are then looked up in the datasets to
form words; the words are then analyzed according to:
1. the probability of phonetics occurring adjacent to each
other, as seen in picture 2
2. words that make sense next to each other
3. as well as the sentence structure that derives
meaning, as seen in picture 3
RESEARCH PAPER
By:
Ms. Preethi G
Mr. Abhishek K
Mr. Thiruppugal S
Mr. Vishwaa D A
CODE
TARGET AUDIENCE
TECH-SAVVY INDIVIDUALS INCLUDING
STUDENTS, RESEARCHERS, DEVELOPERS,
AND INDUSTRY PROFESSIONALS INTERESTED
IN ARTIFICIAL INTELLIGENCE, NATURAL
LANGUAGE PROCESSING, HUMAN-COMPUTER
INTERACTION, AND THE FUTURE
DEVELOPMENT OF SMART VOICE ASSISTANT
TECHNOLOGY.
HOW IT WORKS?!

INITIALIZATION: Set up text-to-speech, adjust voice, speed and volume.
LISTENING: Wait for user input with a microphone, recognize speech using the Google Speech API.
PROCESSING: Match the input with predefined commands and execute the corresponding functions.
OUTPUT: Provide responses through text-to-speech or web browsing.
TIMELINE

NOVEMBER 2016: Google Home was introduced in the United States.
APRIL 2017: A software update brought back multi-user functionality.
OCTOBER 2017: Google announced two new products: the Google Home Mini and the Google Home Max.
MAY 2019: Google announced that virtual home devices, including the Nest Hub Max, would be rebranded under the Google Nest standard.
THANK YOU
