14-Speech Recognition

Uploaded by

thatsarra

CS463 – Natural Language Processing

Speech Recognition
• Speech
• Automatic Speech Recognition (ASR)
– History
– Challenges
– Performance Evaluation
• ASR Approaches
– Template-based ASR
– Statistical ASR
Speech
• Speech is the most common analog signal
produced by humans.
• It conveys human thoughts and intentions.
• Speech production involves physiology, cognition,
and acoustics to produce meaningful sounds for
communication.
• Speech sounds are broadly divided into:
– Consonants
– Vowels
• Speech signal can be decomposed into source and filter.
– The source is the vocal folds in voiced speech.
– The filter is the vocal tract and articulators.
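
The source-filter decomposition above can be sketched in a few lines of Python. This is only a rough illustration under simplifying assumptions: the source is an idealized impulse train (one pulse per pitch period) and the filter is a single two-pole resonator standing in for one formant. The function names and the values 100 Hz, 700 Hz, and 8000 Hz are hypothetical choices, not taken from the slides.

```python
import math

def glottal_source(f0_hz, sample_rate, n_samples):
    """Impulse train with one pulse per pitch period (idealized voiced source)."""
    period = round(sample_rate / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: a crude stand-in for one vocal tract formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius (< 1, stable)
    theta = 2 * math.pi * center_hz / sample_rate         # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2   # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out

sample_rate = 8000                                   # assumed sampling rate in Hz
source = glottal_source(100, sample_rate, 800)       # 100 Hz voicing, 0.1 s
speech_like = resonator(source, 700, 100, sample_rate)  # one formant near 700 Hz
```

Changing the source frequency alters the pitch of the output, while changing the resonator's center frequency alters its vowel-like quality, which is the point of treating the two separately.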

• Speech visualization combines the science of speech
production with the art of visual representation
– It aims to make the invisible sounds of speech visible, offering
insights into the temporal and spectral dynamics of spoken
language.

Automatic Speech Recognition (ASR)
• Automatic Speech Recognition (ASR) is the
process of converting spoken language into text.
• How does it work?
– Analyzes the acoustic signal (sound waves) of spoken
language.
– Extracts features like pitch, formants, and spectral energy.
– Uses these features to identify individual sounds
(phonemes) and then combine them into words.
– Employs language models to predict the most likely
sequence of words based on the context and grammar.
– Outputs the recognized text.
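
As one illustration of the feature-extraction step listed above, the sketch below estimates pitch from a short frame by autocorrelation: the lag at which the signal best matches a shifted copy of itself is taken as the pitch period. This is a minimal sketch, not how any particular ASR system does it. The function name, the 50 to 400 Hz search range, and the synthetic 200 Hz test tone are assumptions made for the example.

```python
import math

def estimate_pitch(samples, sample_rate, min_hz=50, max_hz=400):
    """Return the pitch (Hz) whose period maximizes the autocorrelation."""
    min_lag = int(sample_rate / max_hz)   # shortest period considered
    max_lag = int(sample_rate / min_hz)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself delayed by `lag` samples.
        score = sum(samples[n] * samples[n - lag] for n in range(lag, len(samples)))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

sample_rate = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
pitch = estimate_pitch(tone, sample_rate)
```

On real speech the estimate is noisier than on a pure tone, so practical systems typically add windowing, normalization, and a voiced/unvoiced decision.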

[Figure: ASR example]

ASR Challenges
• Converting spoken language into text computationally is
difficult, with challenges arising at the input, processing,
and output stages.
• Input factors:
– Acoustic signal - captured through microphones, contains the
speaker's voice but can also be mixed with background noise and
other environmental factors.
• Microphone: close-mic, throat-mic, microphone array
• Sources: band-limited, background noise
• Speaker: speaker dependent, speaker independent
– Language - the specific spoken language, with its vocabulary,
grammar, and pronunciation rules, guides the interpretation of the
acoustic signal.

• Processing factors:
– Feature extraction - Identifying the relevant aspects of the acoustic
signal that represent the spoken words, like pitch, formants, and
spectral energy.
• Pitch: the perceived "highness" or "lowness" of a voice, determined
by the fundamental frequency at which the vocal folds vibrate.
• Formants: resonant frequencies of the vocal tract, which shape
the spectrum of vowels and certain consonants.
• Spectral energy: how the signal's energy is distributed across
frequencies, which reflects pitch, formants, noise, breathiness,
speaker variation, and emotion.
– Acoustic modeling - This stage decodes the sequence of sounds
(phonemes) from the extracted features, essentially recognizing the
building blocks of speech.
– Language modeling - Based on acoustic modeling, the system
then applies its understanding of language rules and context to
predict the most likely sequence of words.
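
The language-modeling step can be illustrated with a toy bigram model that chooses between two acoustically similar candidate transcripts. This is a sketch under stated assumptions: the counts are invented purely for illustration, add-one smoothing is one simple choice among many, and a real system would combine this score with the acoustic model's score rather than use it alone.

```python
import math

# Toy counts, invented for illustration only (not from any real corpus).
BIGRAM_COUNTS = {
    ("<s>", "recognize"): 3, ("recognize", "speech"): 4,
    ("<s>", "wreck"): 1, ("wreck", "a"): 1,
    ("a", "nice"): 2, ("nice", "beach"): 1,
}
UNIGRAM_COUNTS = {"<s>": 10, "recognize": 5, "speech": 5,
                  "wreck": 1, "a": 8, "nice": 3, "beach": 2}
VOCAB_SIZE = len(UNIGRAM_COUNTS)

def log_prob(words):
    """Log-probability of a word sequence under an add-one-smoothed bigram model."""
    total, prev = 0.0, "<s>"
    for word in words:
        count = BIGRAM_COUNTS.get((prev, word), 0)
        context = UNIGRAM_COUNTS.get(prev, 0)
        total += math.log((count + 1) / (context + VOCAB_SIZE))
        prev = word
    return total

# Two hypotheses that sound alike; the language model prefers the likelier one.
candidates = [["recognize", "speech"], ["wreck", "a", "nice", "beach"]]
best = max(candidates, key=log_prob)
```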
• Output factors:
– Text transcript - The final product, the text version of the spoken
language, is the result of the previous stages.
• Accuracy and fluency of the sentences are crucial, but keywords,
punctuation, and speaker identification can also be part of the
output.
• What is hard about that?
– Digitization – Converting analogue signal to digital representation
– Signal processing – Separating speech from background noise
– Phonetics – Variability in human speech
– Phonology – Recognizing individual sound distinctions (similar
phonemes)
– Lexicology and syntax – Disambiguating homophones and
features of continuous speech
– Pragmatics – Filtering of performance errors (disfluencies)
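
The digitization item in the list above amounts to two steps: sampling the analog signal at regular intervals and quantizing each sample to a fixed number of levels. The sketch below shows both under simple assumptions: the input is a function of time bounded in [-1, 1], quantization is uniform, and the 8000 Hz rate, 8-bit depth, and 440 Hz tone are hypothetical values chosen for the example.

```python
import math

def digitize(analog, n_samples, sample_rate, bits):
    """Sample `analog(t)` and quantize each value to an integer code."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        x = analog(n / sample_rate)        # sampling: read the signal at t = n / rate
        x = max(-1.0, min(1.0, x))         # clip to the assumed input range [-1, 1]
        codes.append(round((x + 1.0) / 2.0 * (levels - 1)))  # uniform quantization
    return codes

# 10 ms of a 440 Hz tone, sampled at 8000 Hz with 8 bits per sample.
codes = digitize(lambda t: math.sin(2 * math.pi * 440 * t), 80, 8000, 8)
```

A higher sampling rate preserves higher frequencies and more bits reduce quantization error, at the cost of more data to process.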
ASR Performance Evaluation
• Evaluating the performance of an ASR system is
crucial for:
– Understanding its strengths and weaknesses
– Identifying areas for improvement
– Comparing different approaches
• Feature selection is a key step in ASR systems because
it strongly affects the accuracy and efficiency of the
recognition process
– Choosing the right features from the extracted acoustic
signal can significantly improve performance
– Selecting irrelevant or redundant features can lead to errors
and wasted computational resources
• Key metrics to evaluate the performance of an ASR
system:
– Accuracy – Percentage of tokens correctly recognized
– Error Rate – Percentage of tokens recognized incorrectly
(the complement of accuracy when both are measured over
the same tokens)
– Speed and latency – Time taken to process the speech and
generate the output text (transcript)
– Resource consumption - The amount of memory,
processing power, and other resources required by the
system.
– User experience - Subjective factors like ease of use,
clarity of transcripts, and overall satisfaction with the
system.
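
In practice, error rate for ASR is often reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the system output, divided by the number of reference words. The sketch below computes it with the standard edit-distance recurrence. The function name and the example sentences are made up for illustration, and the reference is assumed to be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,                  # deletion
                             dist[i][j - 1] + 1,                  # insertion
                             dist[i - 1][j - 1] + substitution)   # match or substitution
    return dist[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions are counted, WER can exceed 100% when the output contains many extra words.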
ASR Approaches – Template-Based ASR
• Originally only worked for isolated words, one user.
• Performs best when training and testing conditions
are similar.
• For each word we want to recognize, we store a
template or example based on actual data.
• Each test utterance is checked against the templates
to find the best match.
• Uses the Dynamic Time Warping (DTW) algorithm to align each
test utterance with a template despite differences in speaking rate
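
The template-matching idea above can be sketched as follows: DTW computes the cheapest alignment between a test utterance and each stored template, and the word with the lowest alignment cost wins. This is a minimal sketch, assuming each utterance is a sequence of single numbers. Real systems compare a feature vector per frame, and the template values and the names `dtw_distance` and `recognize` are hypothetical.

```python
def dtw_distance(a, b):
    """Cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b.
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[len(a)][len(b)]

# One stored template per word (invented 1-D feature tracks).
templates = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 2.0, 4.0],
}

def recognize(utterance):
    """Return the word whose template aligns to the utterance at lowest cost."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# A slower rendition of "yes": some frames are repeated.
word = recognize([1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0])
```

The stretched utterance still matches its template exactly because DTW lets one frame align with several, which is why the approach tolerates differences in speaking rate.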
