Lecture1 PDF
Session 2003
Introduction to Automatic Speech Recognition
[Figure: Spoken communication between human and computer. Input: Speech -> Recognition -> Text -> Understanding -> Meaning. Output: Meaning -> Generation -> Text -> Synthesis -> Speech.]
Speech interfaces are ideal for information access and management when:
  The information space is broad and complex,
  The users are technically naive, or
  Only telephones are available.
[Figure: Speech Signal -> ASR System -> Recognized Words]
Co-articulation
Speaker independence
Dialect variations
Non-native speakers
Spontaneous speech
Disfluencies
Out-of-vocabulary words
Language modelling
Noise robustness
[Figure: time-frequency display; axes: Time and Frequency]
Parameters         Range
Speaking Mode:     Isolated word to continuous speech
Speaking Style:    Read speech to spontaneous speech
Enrollment:        Speaker-dependent to speaker-independent
Vocabulary:        Small (<20 words) to large (>50,000 words)
Language Model:    Finite-state to context-sensitive
Perplexity:        Small (<10) to large (>200)
SNR:               High (>30 dB) to low (<10 dB)
Transducer:        Noise-cancelling microphone to cell phone
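Two of the parameters above, perplexity and SNR, are numeric, and a small worked example may make their ranges easier to read. The sketch below is illustrative only and is not taken from the lecture: it assumes the usual definitions (perplexity as 2 raised to the average negative log2 word probability, SNR as 10 log10 of the signal-to-noise power ratio), and the function names `perplexity` and `snr_db` are hypothetical.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the model probability of each word.

    Assumes the common definition: 2 ** (average negative log2 probability).
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    # Cross-entropy in bits per word.
    cross_entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** cross_entropy


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from average signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # A uniform choice among 10 words at every position gives a perplexity
    # of about 10, the boundary of the "small" range quoted above.
    print(perplexity([0.1] * 5))
    # Signal power 1000 times the noise power is about 30 dB ("high" SNR);
    # a factor of 10 is about 10 dB ("low" SNR).
    print(snr_db(1000.0, 1.0), snr_db(10.0, 1.0))
```

Read this way, a perplexity of 200 means the recognizer faces roughly as much uncertainty as a uniform choice among 200 words at each position.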
Recognition Units:
  before mid 70s:      whole-word and sub-word units
  mid 70s to mid 80s:  sub-word units
  after mid 80s:       sub-word units
Modeling Approaches:
  before mid 70s:      heuristic and ad hoc; rule-based and declarative
  mid 70s to mid 80s:  template matching; deterministic and data-driven
  after mid 80s:       mathematical and formal; probabilistic and data-driven
Knowledge Representation:
  before mid 70s:      heterogeneous and complex
  mid 70s to mid 80s:  homogeneous and simple
  after mid 80s:       homogeneous and simple
Knowledge Acquisition:
  before mid 70s:      intense knowledge engineering
  mid 70s to mid 80s:  embedded in simple structure
  after mid 80s:       automatic learning
[Figure: plot with x-axis Year, 1989 to 2001]
6.345 Automatic Speech Recognition Introduction 14
Important Lessons Learned
Training Data
Applying Constraints
[Figure: Speech Signal -> Representation -> Search -> Recognized Words]
[Figure: SPEECH RECOGNITION -> Words -> LANGUAGE UNDERSTANDING -> Meaning]
Jupiter
A conversational interface for on-line
weather information over the phone.
1-888-573-8255
(outside the USA: 1-617-258-0300)
http://www.sls.lcs.mit.edu/jupiter
Spoken Language Systems Group,
MIT Laboratory for Computer Science
[Figure: plot over a monthly time axis spanning 1997 to 1999 (Apr 97 to May 99); vertical axis from 0 to 30]
Corpus                  Speech Type    Lexicon Size   Word Error Rate (%)   Human Error Rate (%)
Digit Strings (phone)   spontaneous    10             0.3                   0.009
Resource Management     read           1000           3.6                   0.1
ATIS                    spontaneous    2000           2                     ----
Wall Street Journal     read           64000          6.6                   1
Radio News              mixed          64000          13.5                  ----
Switchboard (phone)     conversation   10000          19.3                  4
Call Home (phone)       conversation   10000          30                    ----
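The word error rates above are not defined on the slide. A common convention, assumed here rather than taken from the lecture, is the minimum number of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. The sketch below computes this with a standard edit-distance dynamic program; the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate in percent between two whitespace-tokenized strings.

    Assumes WER = (substitutions + deletions + insertions) / reference length,
    with the error count taken from a minimum-edit-distance alignment.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j].
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                      # deletion
                dist[i][j - 1] + 1,                      # insertion
                dist[i - 1][j - 1] + substitution_cost,  # match or substitution
            )

    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word ("the") and one substituted word ("boston" -> "austin")
    # against a six-word reference: 2 errors out of 6 words.
    print(word_error_rate("what is the weather in boston",
                          "what is weather in austin"))
```

Because insertions count as errors, a rate computed this way can exceed 100% when the hypothesis is much longer than the reference.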
[Figure: course topics arranged around the recognizer pipeline Speech Signal -> Representation -> Search -> Recognized Words. Topic boxes: Acoustic Theory of Speech Production; Properties of Speech Sounds; Signal Representation; Vector Quantization & Clustering; Hidden Markov Modeling; Graphical Models; Segmental Models; Acoustic Models; Lexical Models; Language Models; Search Algorithms; Robust ASR; Adaptation.]
Grading
9 Assignments 45%
2 Quizzes 30%
Term Project (about 4 weeks) 25%