CCS369 - TSS-Unit 5
UNIT 5
By
C.Jerin Mahibha
Assoc.Prof / CSE
UNIT V AUTOMATIC SPEECH RECOGNITION
COURSE OBJECTIVES:
Develop a speech recognition system
COURSE OUTCOME:
CO5: Apply deep learning models for building speech recognition systems
Text Book: “Speech and Language Processing: An Introduction to Natural Language
Processing, Computational Linguistics, and Speech Recognition” by Daniel Jurafsky
and James H. Martin, Chapter 26
Reference :
https://maelfabien.github.io/machinelearning/speech_reco/#acoustic-model
Speech Recognition - Automatic Speech Recognition (ASR)
• Speech - natural interface for communication
• Task - map any waveform to the appropriate string of words
• Application
• Smart home appliances
• Personal assistants
• Cellphones
• Telephony - call-routing
• Sophisticated dialogue applications
• Transcription - automatically generating captions for audio or video
• Field of law - dictation plays an important role
• Augmentative communication - for users with disabilities that cause difficulties or inabilities in typing
• Acoustic model - models the relationship between the audio signal and the phonetic units in
the language
• Isolated word/pattern recognition - acoustic features (Y) are used as the input
to a classifier, whose output is the recognized word
• Continuous speech recognition - both the input and the output are sequences
• The acoustic model therefore goes further than a simple classifier
• Its output is a sequence of phonemes
• Hidden Markov Models (HMMs)
➢ are natural candidates for acoustic models
➢ are well suited to modeling sequences
➢ have states s_i, and at each state an observation o_i is generated
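To make "states generate observations" concrete, the sketch below runs the forward algorithm, which computes the likelihood P(O | model) of an observation sequence by summing over all hidden state paths. It is a minimal sketch under stated assumptions: the two-state left-to-right model, its probabilities, and the discrete observation symbols "a" and "b" are hypothetical toy values; a real acoustic model emits continuous feature vectors (for example via Gaussian mixtures or a neural network) rather than symbols.

```python
from typing import Dict, List, Sequence

Probs = Dict[str, float]


def forward_likelihood(
    obs: Sequence[str],
    states: List[str],
    start_p: Probs,
    trans_p: Dict[str, Probs],
    emit_p: Dict[str, Probs],
) -> float:
    """Forward algorithm: P(o_1..o_T) summed over all hidden state paths."""
    # alpha[s] = P(o_1..o_t, state at time t = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        # Sum over every previous state r that can move into s, then emit o from s
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: a 2-state left-to-right HMM, as often used per phone
TOY_STATES = ["s1", "s2"]
TOY_START = {"s1": 1.0, "s2": 0.0}
TOY_TRANS = {"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.0, "s2": 1.0}}
TOY_EMIT = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.2, "b": 0.8}}

if __name__ == "__main__":
    p = forward_likelihood(["a", "b"], TOY_STATES, TOY_START, TOY_TRANS, TOY_EMIT)
    print(f"P(a, b | model) = {p:.3f}")  # 0.7*0.6*0.3 + 0.7*0.4*0.8 = 0.350
```

The loop costs O(T·N²) for T observations and N states, instead of enumerating all N^T paths, which is what makes HMMs practical for long speech sequences.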
Feature Extraction
• A commonly used algorithm for computing the DFT is the Fast Fourier
Transform (FFT)
• The common (radix-2) FFT implementation of the DFT is very efficient but works only for values
of N that are powers of 2
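The power-of-2 restriction comes from the radix-2 Cooley-Tukey scheme, which repeatedly splits an N-point DFT into two N/2-point DFTs over the even- and odd-indexed samples. The sketch below is a minimal illustration, assuming plain Python lists as input; the function names `dft` and `fft_radix2` are hypothetical, and production code would use an optimized library FFT instead.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def fft_radix2(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT, O(N log N); N must be a power of 2."""
    n_pts = len(x)
    # n & (n - 1) == 0 holds only for powers of 2 (and 0, rejected separately)
    if n_pts == 0 or n_pts & (n_pts - 1):
        raise ValueError("radix-2 FFT needs N to be a power of 2")
    if n_pts == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n_pts
    for k in range(n_pts // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n_pts) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n_pts // 2] = even[k] - twiddle
    return out


if __name__ == "__main__":
    frame = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 0.5, 0.25]
    spectrum = fft_radix2(frame)
    # Energy per frequency bin, the quantity passed on to the mel filter bank
    print([round(abs(c) ** 2, 3) for c in spectrum])
```

Both functions return the same values; only the cost differs. A 6-sample frame raises `ValueError` in `fft_radix2`, which is why frames are commonly zero-padded to the next power of 2 (for example 400 samples padded to 512).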
Mel Filter Bank and Log
• The FFT output represents the energy at each frequency band
• Human hearing is not equally sensitive at all frequency bands; it is less sensitive at
higher frequencies
• Modeling this human perceptual property during feature extraction improves speech recognition
performance
• This is implemented by collecting energies not equally at each frequency band,
but according to the mel scale, an auditory frequency scale
• A mel is a unit of pitch
• Pairs of sounds that are perceptually equidistant in pitch are separated by an
equal number of mels
• The mel frequency m is computed from the raw acoustic frequency f by a log
transformation: mel(f) = 1127 ln(1 + f/700)
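A mel filter bank can be sketched in Python using the mel mapping from Jurafsky and Martin, mel(f) = 1127 ln(1 + f/700), which places 1000 Hz at roughly 1000 mels. The sketch is a minimal illustration under assumptions: the triangular filters spaced evenly in mels (hence progressively wider in Hz), the helper names, and the parameter values (10 filters, 512-point FFT, 16 kHz audio) are hypothetical choices rather than the textbook's exact configuration.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    """mel(f) = 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters evenly spaced in mels from 0 Hz to Nyquist.

    Each row holds one weight per FFT bin (n_fft // 2 + 1 bins).
    """
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge points, equally spaced on the mel scale
    step = (high - low) / (n_filters + 1)
    hz_edges = [mel_to_hz(low + i * step) for i in range(n_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = hz_edges[i - 1], hz_edges[i], hz_edges[i + 1]
        row = []
        for f in bin_freqs:
            if left < f <= center:    # rising edge of the triangle
                row.append((f - left) / (center - left))
            elif center < f < right:  # falling edge of the triangle
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_energies(power: Sequence[float], bank: List[List[float]],
                     floor: float = 1e-10) -> List[float]:
    """Weighted energy per filter, then log (floored to avoid log(0))."""
    return [math.log(max(sum(w * p for w, p in zip(row, power)), floor))
            for row in bank]


if __name__ == "__main__":
    print(round(hz_to_mel(1000.0), 1))  # close to 1000 mels
    bank = mel_filterbank(n_filters=10, n_fft=512, sample_rate=16000)
    widths = [sum(1 for w in row if w > 0) for row in bank]
    print(widths)  # number of FFT bins per filter grows with frequency
```

Spacing the filters evenly in mels rather than in Hz is what makes the bank less detailed at high frequencies, mirroring the reduced sensitivity of human hearing there; taking the log afterwards compresses the dynamic range of the energies.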