Voice Assistant
DRIVEN BY INTELLIGENCE
1. ACCURATE SPEECH RECOGNITION
2. REAL-TIME AND NATURAL INTERACTION
3. TASK EXECUTION AND AUTOMATION
FUNCTIONALITIES
Listening: The assistant uses a microphone to capture the user's
voice and processes it.
Speech Recognition: Converts the user's spoken input into text for
processing.
Command Processing: Interprets the text to understand the user's
intent.
Performing Actions: Executes the required task, like fetching
information, opening applications, or responding verbally.
Responding: Converts the response text back into speech for output.
INTRODUCTION
The assistant can perform simple tasks like opening apps, fetching
information, or giving time and date updates.
Use of LOOPS: Loops in Python are used to execute a block of code repeatedly, either a fixed
number of times or until a condition is met. They are fundamental for automating repetitive tasks,
handling large datasets, and iterating over elements in collections such as lists or strings. In this
program, a while loop in the main function repeats the listen, process, and respond cycle until the
user gives the exit command.
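The loop described above can be sketched as follows. This is a minimal illustration, not the program's actual code: `run_assistant`, `listen`, `process_command`, and `speak` are hypothetical names, and the three functions are passed in as parameters so the loop can be tried without a microphone. The exit check follows the exit condition used by the program (the words "exit" or "quit").

```python
from typing import Callable, Optional

EXIT_WORDS = {"exit", "quit"}


def run_assistant(listen: Callable[[], Optional[str]],
                  process_command: Callable[[str], str],
                  speak: Callable[[str], None]) -> None:
    """Repeat listen -> process -> respond until the user says exit or quit."""
    while True:
        text = listen()
        if not text:
            # Nothing was recognized this round; listen again
            continue
        words = set(text.lower().split())
        if words & EXIT_WORDS:
            speak("Goodbye!")
            break
        speak(process_command(text))


# Usage with scripted input instead of a microphone
spoken = []
script = iter(["hello there", None, "quit"])
run_assistant(lambda: next(script), lambda t: f"You said: {t}", spoken.append)
print(spoken)  # ['You said: hello there', 'Goodbye!']
```

Passing the functions in keeps the loop itself independent of any particular speech library.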
USE OF FUNCTIONS IN PYTHON PROGRAM
A function in Python is a block of reusable code designed to perform a specific task. Functions help
organize and structure code, make it easier to debug, and improve reusability.
In this program we have used various functions:
1. Listen to speech or voice
2. Process commands
3. Open website
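A command-processing function along these lines could cover the tasks named in the introduction (time and date updates, opening a website). This is a sketch under assumptions: the `WEBSITES` table, the matched phrases, and the reply wording are hypothetical. `open_url` defaults to the standard library's `webbrowser.open` and `now` to `datetime.datetime.now`; both can be swapped out for testing.

```python
import datetime
import webbrowser

# Hypothetical table of spoken website names and their URLs
WEBSITES = {
    "google": "https://www.google.com",
    "youtube": "https://www.youtube.com",
}


def process_command(text, *, open_url=webbrowser.open, now=datetime.datetime.now):
    """Interpret recognized text and return the reply the assistant should speak."""
    text = text.lower().strip()
    if "time" in text:
        return now().strftime("The time is %H:%M.")
    if "date" in text:
        return now().strftime("Today is %d %B %Y.")
    if text.startswith("open "):
        site = text[len("open "):].strip()
        url = WEBSITES.get(site)
        if url is None:
            return f"Sorry, I don't know the website {site}."
        open_url(url)  # by default, opens the page in the system browser
        return f"Opening {site}."
    # Unknown input gets a spoken fallback instead of an error
    return "Sorry, I didn't understand that."


# Usage with a fixed clock and a fake opener, so no browser is launched
def fixed_now():
    return datetime.datetime(2024, 1, 15, 9, 5)


opened = []
print(process_command("What time is it?", now=fixed_now))       # The time is 09:05.
print(process_command("open YouTube", open_url=opened.append))  # Opening youtube.
print(opened)                                                   # ['https://www.youtube.com']
```

Simple substring matching is easy to follow but crude; a word such as "update" would also match "date".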
EXIT CONDITION:
The program stops running when the user says "exit" or "quit."
LIBRARIES USED
1. wave
2. NumPy
3. datetime
4. requests
5. threading (Thread)
6. webbrowser
7. speech_recognition
8. pyttsx3
FEATURES
ERROR HANDLING: Handled unknown input and failed API calls.
REDUCED LATENCY: Shortened timeout for speech recognition, faster text-to-speech output.
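The handling of failed API calls can be sketched with a small wrapper that returns a spoken fallback instead of letting an exception end the program. Assumptions: the slides list `requests`, but this sketch swaps in the standard library's `urllib.request` so it runs without extra packages; `fetch_json`, `safe_fetch`, and the fallback message are hypothetical. The `timeout` argument bounds how long a slow service can stall the assistant. In the `speech_recognition` library, unintelligible audio raises `sr.UnknownValueError` and a failed recognition request raises `sr.RequestError`; those can be caught in the same way.

```python
import json
import urllib.request

FALLBACK = "Sorry, I couldn't get that information right now."


def fetch_json(url, timeout=5.0):
    """Download and parse JSON, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def safe_fetch(fetch, *args, fallback=FALLBACK, **kwargs):
    """Run a network call and return `fallback` instead of raising on failure."""
    try:
        return fetch(*args, **kwargs)
    except (OSError, ValueError):
        # OSError covers connection errors and timeouts (urllib's URLError is a
        # subclass); ValueError covers malformed URLs and invalid JSON
        return fallback


# A malformed URL fails before any network access, so the fallback is returned
print(safe_fetch(fetch_json, "not-a-url"))
```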
Step 1. Acoustic Signal Processing:
The input to a speech recognition system is an acoustic
signal, the analogue waveform of the spoken words. This
signal is captured by a microphone and converted into a
digital format. The Fast Fourier Transform (FFT), a
signal-processing algorithm, is then applied to short,
overlapping windows of the digitized signal to convert the
amplitude-versus-time graph into a spectrogram.
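The step from waveform to spectrogram can be sketched with the `wave` and NumPy libraries listed earlier. This is an illustration under assumptions, not the program's code: it expects 16-bit mono WAV audio, and the frame length (400 samples, 25 ms at 16 kHz), hop (160 samples), and Hann window are common choices rather than values from the slides. Each row of the result is one time slice; each column is one frequency bin.

```python
import io
import wave

import numpy as np


def read_wav_mono16(f):
    """Read 16-bit mono WAV audio (a path or file object) as floats in [-1, 1)."""
    with wave.open(f, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    # WAV stores 16-bit samples as little-endian signed integers
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    return samples, rate


def spectrogram(samples, frame_len=400, hop=160):
    """Magnitude spectrogram: one row per time slice, one column per frequency bin."""
    if len(samples) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([samples[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # The FFT of each windowed slice gives frame_len // 2 + 1 frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))


# Usage: a one-second 1 kHz tone written to an in-memory WAV file, then analysed
rate = 16000
t = np.arange(rate) / rate
tone = (0.5 * 32767 * np.sin(2 * np.pi * 1000 * t)).astype("<i2")
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(rate)
    w.writeframes(tone.tobytes())
buf.seek(0)

samples, sr = read_wav_mono16(buf)
spec = spectrogram(samples)
peak_bin = int(spec.mean(axis=0).argmax())
print(spec.shape)           # (98, 201)
print(peak_bin * sr / 400)  # 1000.0: bin spacing is sr / frame_len = 40 Hz
```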
Standardization is a data preprocessing step that transforms input features so they have:
Mean = 0
Standard deviation = 1
Formula:
z = (x − μ) / σ
Where:
x = input value
μ = mean of the feature
σ = standard deviation
✅ Why it's used:
To normalize the range of values
Helps neural networks converge faster
Prevents features with large scales from dominating learning
A convolutional neural network (CNN) is a type of neural network that learns to extract features from data automatically, especially spatial or temporal patterns.
🔍 What it does:
Applies filters (or kernels) that slide over input data
Detects local features, like edges, shapes (in images), or phonetic patterns (in spectrograms)
Each convolution layer learns increasingly abstract representations
✅ A CNN learns which features are important; it doesn't just scale or normalize them.
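The two ideas above can be contrasted in a few lines of NumPy. This is a sketch for illustration: `standardize` applies z = (x − μ) / σ to each feature column, and `conv2d_valid` slides a single kernel over a 2-D input the way a convolution layer does (strictly, CNN layers compute cross-correlation, with no kernel flip). The onset-detecting kernel is hand-picked; in a real CNN the kernel values are learned during training. The small `eps` term is an added assumption that avoids division by zero for constant features.

```python
import numpy as np


def standardize(features, eps=1e-8):
    """Apply z = (x - mu) / sigma to each feature column."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)  # eps guards against constant columns


def conv2d_valid(x, kernel):
    """Slide one kernel over a 2-D input without padding, as a CNN layer does."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the patch currently under the kernel
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out


# Standardization: the second feature is 100x larger but ends up on the same scale
features = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
z = standardize(features)
print(z.mean(axis=0), z.std(axis=0))  # means close to 0, standard deviations close to 1

# Convolution: rows are time frames, columns are frequency bins;
# energy switches on between the second and third frame
patch = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
onset_kernel = np.array([[-1.0], [1.0]])  # hand-picked; a CNN would learn this
print(conv2d_valid(patch, onset_kernel))  # largest response at the onset
```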
1. The spectrogram, a frequency-versus-time representation, is generated from the amplitude-versus-time graph.
2. It is then divided into slices (time frames), each corresponding to the sound being made during that short interval.
Step 3. Acoustic Modelling
Acoustic models come in various types and are trained with different loss functions, but the most widely
used in the literature and in production are Connectionist Temporal Classification (CTC) based models. A CTC
model takes a spectrogram (X) as input and produces log-probability scores (P) for every vocabulary token at
each time step.
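Turning the per-time-step scores P into text is the decoding step. The slides do not say which decoder is used; the sketch below shows the simplest option, greedy (best-path) CTC decoding: take the most likely token at each time step, merge consecutive repeats, then drop the CTC blank token. The vocabulary and the choice of index 0 as the blank are assumptions for illustration. Production systems often use beam search, optionally combined with a language model, instead.

```python
import numpy as np

BLANK = 0  # assumption: index 0 of the vocabulary is the CTC blank token


def ctc_greedy_decode(log_probs, vocab, blank=BLANK):
    """Best-path CTC decoding.

    log_probs: array of shape (time_steps, vocab_size) holding log-probabilities.
    vocab: list mapping each token index to a symbol.
    """
    best_path = np.argmax(log_probs, axis=1)  # most likely token per time step
    decoded = []
    prev = None
    for idx in best_path:
        # Keep a token only if it differs from the previous step and is not blank;
        # a blank between two identical tokens lets a real repeat ("ll") survive
        if idx != prev and idx != blank:
            decoded.append(vocab[idx])
        prev = idx
    return "".join(decoded)


# Usage with made-up scores for 7 time steps
vocab = ["_", "h", "e", "l", "o"]
path = [1, 2, 3, 3, 0, 3, 4]  # h e l l _ l o
probs = np.full((len(path), len(vocab)), 0.05)
probs[np.arange(len(path)), path] = 0.8
print(ctc_greedy_decode(np.log(probs), vocab))  # hello
```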
HOW IT WORKS?!
TIMELINE
Google Home was introduced in the United States.
Google announced two new products: the Google Home Mini and the Google Home Max.
A software update that brought back multi-user functionality.
Google announced that virtual home devices, including the Nest Hub Max, would be rebranded under the Google Nest standard.
THANK YOU