Speech Recognition Final Presentation
What is speech recognition?
Speech recognition technology has recently reached a higher level of performance and
robustness, allowing users to communicate with a computer by talking.
Language Model
Step 2: Digitization
Digitize the analog acoustic signal.
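As a rough illustration of what digitization involves, the sketch below samples a signal at fixed time steps and quantizes each sample to a 16-bit integer. The 16 kHz sample rate, the 16-bit depth, and the 440 Hz test tone are illustrative assumptions, not values taken from this presentation.

```python
import math

def digitize(signal, duration_s, sample_rate_hz=16_000, bits=16):
    """Sample a continuous-time signal and quantize it to signed PCM integers."""
    n_samples = round(duration_s * sample_rate_hz)
    max_level = 2 ** (bits - 1) - 1              # 32767 for 16-bit audio
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # sampling: discrete time instants
        x = max(-1.0, min(1.0, signal(t)))       # clip to the converter's input range
        samples.append(round(x * max_level))     # quantization: nearest integer level
    return samples

def tone(t):
    """Hypothetical stand-in for the analog microphone signal: a 440 Hz tone."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, duration_s=0.01)  # 10 ms of audio
print(len(pcm), pcm[:4])
```

Real front ends do this in hardware (an analog-to-digital converter); the loop only shows the two operations involved, sampling in time and quantizing in amplitude.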
Step 5: Matching
Using the grammar, the phonetic representation and the dictionary, the system returns an n-best list
(i.e., a ranked list of entries, each a word plus a confidence score).
Grammar: the union of words or phrases that constrains
the range of input or output in the voice application.
Dictionary: the mapping table between phonetic
representations and words (e.g., thu, thee → the).
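A toy version of the matching step might look like the following. The phonetic spellings, the three-word grammar, and the use of `difflib.SequenceMatcher` string similarity as a confidence score are all hypothetical simplifications; a real recognizer scores acoustic features, not spelled-out phones.

```python
from difflib import SequenceMatcher

# Hypothetical dictionary: phonetic spellings mapped to words (cf. "thu, thee -> the").
DICTIONARY = {
    "thu": "the", "thee": "the",
    "kawl": "call", "hohm": "home", "ofis": "office",
}
# Hypothetical grammar: the only words this voice application accepts.
GRAMMAR = {"call", "home", "office"}

def n_best(phones, n=3):
    """Return up to n (word, confidence) pairs, most confident first."""
    scores = {}
    for spelling, word in DICTIONARY.items():
        if word not in GRAMMAR:        # the grammar constrains the allowed input
            continue
        confidence = SequenceMatcher(None, phones, spelling).ratio()  # 0.0 to 1.0
        scores[word] = max(confidence, scores.get(word, 0.0))  # best spelling per word
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

print(n_best("hom"))
```

Here "the" never appears in the output even though it is in the dictionary, because the grammar excludes it.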
Approaches to ASR
• Template based
• Statistics based
Store examples of units (words, phonemes), then find the example that
most closely fits the input.
Extract features from the speech signal; then it’s “just” a complex
similarity-matching problem, using solutions developed for all
sorts of applications.
OK for discrete utterances and a single user.
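The slides do not name a specific matching technique for template-based recognition; dynamic time warping (DTW) is one classic choice, because it tolerates the same word being spoken faster or slower than the stored template. A minimal sketch, assuming each frame has already been reduced to a single number (real systems use feature vectors) and using hypothetical templates:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # input advances, template frame reused
                                 cost[i][j - 1],      # template advances, input frame reused
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize(features, templates):
    """Return the label of the stored template closest to the input."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))

# Hypothetical one-number-per-frame "features" for two stored words.
templates = {"yes": [1, 3, 4, 3, 1], "no": [2, 2, 0, 0, 2]}
print(recognize([1, 1, 3, 4, 4, 3, 1], templates))  # a slower "yes"
```

The slowed-down input aligns with the "yes" template at zero cost, which illustrates why this works well for a single user saying discrete words, and why it degrades once the input differs from every stored template.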
Hard to distinguish very similar templates
Quickly degrades when the input differs
from the templates
Therefore needs techniques to mitigate
this degradation:
• More subtle matching techniques
• Multiple templates that are aggregated
Taken together, these suggested …
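One simple way to aggregate multiple templates is to average the input's distance to every stored example of each word and pick the word with the smallest average. To stay short, the sketch assumes equal-length feature vectors and plain Euclidean distance; the templates are hypothetical.

```python
import math
from statistics import mean

def recognize_by_average(features, templates):
    """Pick the word whose templates are, on average, closest to the input."""
    scores = {
        word: mean(math.dist(features, t) for t in examples)  # Euclidean distance
        for word, examples in templates.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical: several recordings (templates) per word, e.g. from different speakers.
templates = {
    "yes": [[1.0, 3.0, 1.0], [1.2, 2.8, 0.9], [0.8, 3.3, 1.1]],
    "no":  [[2.0, 0.0, 2.0], [2.1, 0.2, 1.8]],
}
word, scores = recognize_by_average([1.1, 2.9, 1.0], templates)
print(word)
```

Averaging makes the decision less sensitive to any one unusual recording; taking the minimum over the k nearest templates is a common alternative.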
Collect a large corpus of transcribed
speech recordings
Train the computer to learn the
correspondences (“machine learning”)
At run time, apply statistical processes to
search through the space of all possible
solutions, and pick the statistically most
likely one
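Picking "the statistically most likely one" is usually written as a Bayes decision rule. In the formula below, O stands for the observed acoustic features and W for a candidate word sequence; this notation is introduced here and is not taken from the slides:

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\,P(W)
```

P(O) does not depend on W, so it can be dropped from the maximization; P(O | W) is supplied by the acoustic and lexical models and P(W) by the language model.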
Acoustic and Lexical Models
• Analyse training data in terms of relevant features
• Learn from a large amount of data the different possibilities:
  o different phone sequences for a given word
  o different combinations of elements of the speech signal
    for a given phone/phoneme
• Combine these into a Hidden Markov Model
  expressing the probabilities
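Once an HMM has been trained, finding the most likely sequence of hidden states for an input is typically done with the Viterbi algorithm. The three-state model of the word "cat" below, with its made-up probabilities and coarse observation labels ("burst", "voiced", "silence"), is purely hypothetical:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) for the most likely hidden-state sequence."""
    # best[s] holds the probability and path of the best sequence ending in state s
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # choose the predecessor that maximizes the path probability into s
            prev = max(states, key=lambda p: best[p][0] * trans_p[p][s])
            prob = best[prev][0] * trans_p[prev][s] * emit_p[s][obs]
            new_best[s] = (prob, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])

# Hypothetical left-to-right phone model for "cat": k -> ae -> t.
states = ("k", "ae", "t")
start_p = {"k": 1.0, "ae": 0.0, "t": 0.0}
trans_p = {"k":  {"k": 0.5, "ae": 0.5, "t": 0.0},
           "ae": {"k": 0.0, "ae": 0.6, "t": 0.4},
           "t":  {"k": 0.0, "ae": 0.0, "t": 1.0}}
emit_p = {"k":  {"burst": 0.8, "voiced": 0.1, "silence": 0.1},
          "ae": {"burst": 0.1, "voiced": 0.8, "silence": 0.1},
          "t":  {"burst": 0.5, "voiced": 0.1, "silence": 0.4}}
observations = ["burst", "voiced", "voiced", "burst"]

prob, path = viterbi(observations, states, start_p, trans_p, emit_p)
print(path, prob)
```

The self-loops (k to k, ae to ae) are what let one phone span several observation frames, which is how an HMM absorbs variation in speaking rate.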
The real world has structures and processes that have (or
produce) observable outputs:
• Advantages:
  o Effective
  o Can handle variations in record structure
    Optional fields
    Varying field ordering
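To make "hidden processes with observable outputs" concrete, the forward algorithm below computes how likely an HMM is to produce a given observation sequence, summing over every possible hidden path. The two hidden states (silence, speech), the two observation labels, and all probabilities are hypothetical:

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Probability that the HMM emits the observations, summed over all hidden paths."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states) * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())

states = ("silence", "speech")
start_p = {"silence": 0.6, "speech": 0.4}
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},
           "speech":  {"silence": 0.2, "speech": 0.8}}
emit_p = {"silence": {"quiet": 0.9, "loud": 0.1},
          "speech":  {"quiet": 0.3, "loud": 0.7}}

print(forward(["quiet", "loud"], states, start_p, trans_p, emit_p))
```

Only the "quiet"/"loud" outputs are observed; which state produced each one stays hidden, and the algorithm accounts for all possibilities at once instead of enumerating paths.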
Digitization
• Converting analogue signal into digital representation.
Signal processing
• Separating speech from background noise.
Phonetics
• Variability in human speech.
Phonology
• Recognizing individual sound distinctions (similar phonemes).
Lexicology and syntax
• Disambiguating homophones.
• Features of continuous speech.
Syntax and pragmatics
• Interpreting features.
• Filtering of performance errors (disfluencies).
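Separating speech from background noise can be as simple as energy-based voice activity detection: frames much louder than the noise floor are treated as speech. This is one basic technique, not the method of this presentation, and it assumes stationary noise that is quieter than the speech plus at least one noise-only frame in the recording. The test signal is hypothetical:

```python
import math
import random

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_speech(samples, frame_len=160, threshold_ratio=4.0):
    """Flag frames whose energy is well above the estimated noise floor."""
    energies = frame_energies(samples, frame_len)
    # Assumption: the quietest frame contains background noise only.
    noise_floor = max(min(energies), 1e-12)
    return [e > threshold_ratio * noise_floor for e in energies]

# Hypothetical 16 kHz signal: noise, then a 200 Hz tone plus noise, then noise.
rng = random.Random(0)

def noise(n):
    return [rng.uniform(-0.01, 0.01) for _ in range(n)]

tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16_000) for n in range(320)]
signal = noise(320) + [t + e for t, e in zip(tone, noise(320))] + noise(320)
print(detect_speech(signal))  # one flag per 10 ms frame
```

A fixed threshold like this breaks down with loud or changing background noise, which is part of why signal processing remains a challenge.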
Speech recognition is still a very difficult problem.
The main problems are the following:
Speaker Variability
Two speakers or even the same speaker will
pronounce the same word differently
Channel Variability
The quality and position of the microphone and the
background environment affect the output
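A common mitigation for channel variability, not named in the slides, is cepstral mean normalization. Its core idea fits in a few lines: if a microphone acts as a fixed gain per frequency band, that gain becomes an additive constant in the log domain, and subtracting the per-utterance mean removes it. The sketch uses log band energies as a simplified stand-in for real cepstral features, and the energies and gains are hypothetical:

```python
import math

def log_features(frames):
    """Log band energies per frame (a simplified stand-in for cepstral features)."""
    return [[math.log(e) for e in frame] for frame in frames]

def mean_normalize(features):
    """Subtract each feature dimension's mean over the whole utterance."""
    n = len(features)
    means = [sum(f[d] for f in features) / n for d in range(len(features[0]))]
    return [[f[d] - means[d] for d in range(len(f))] for f in features]

# Hypothetical band energies (3 frames x 2 bands) of the same utterance.
clean = [[1.0, 4.0], [2.0, 1.0], [8.0, 2.0]]
mic_gain = [0.5, 3.0]  # per-band gain of a different microphone/channel
other_mic = [[e * g for e, g in zip(frame, mic_gain)] for frame in clean]

a = mean_normalize(log_features(clean))
b = mean_normalize(log_features(other_mic))
same = all(math.isclose(x, y, abs_tol=1e-9) for fa, fb in zip(a, b) for x, y in zip(fa, fb))
print(same)
```

This only removes fixed, linear channel effects; additive background noise and speaker variability need other techniques.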
Speech recognition applications include:
• Voice dialling (e.g., "Call home")
• Call routing (e.g., "I would like to make a collect call")
• Simple data entry (e.g., entering a credit card number)
• Preparation of structured documents (e.g., a radiology report)
• Speech-to-text processing (e.g., word processors or emails)
• Use in aircraft cockpits (usually termed Direct Voice Input)
Medical Transcription
Military
Telephony and other domains
Serving the disabled
Further Applications
• Home automation
• Automobile audio systems
• Telematics
Faster than handwriting.
Hands-free capability.
No program is 100% perfect.