Speech Recognition
and Dynamic
Programming
Nabin Kumar Bamma
Lakshmi Prasanna Nakka
Laura Brown
Motivation
Matching spoken words to text
Issues
Time
Spoken words vary in length and
pronunciation.
Noise or accents may distort the signal.
Different sequences of words can sound
similar.
DYNAMIC PROGRAMMING
Selects the most probable path using dynamic
programming
Dynamic Programming
...in one form or another, the ideas of dynamic
programming have pervaded the methodology of
almost every aspect of speech signal processing-
particularly the areas of speech and speaker
recognition
Chin-Hui Lee
Key Terms
Phoneme - smallest unit of
speech sound; the signal is
analyzed in frames 20-40
ms in length
Allophones - variations
of the same phoneme
Hidden Markov Model
Viterbi Algorithm
Finds the most probable sequence of states (speech
patterns)
Explanation of its role in HMM
Step-by-step breakdown with an example
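As a rough illustration of the Viterbi idea, here is a minimal sketch on a three-state model for "cat". The state names, the probabilities in pi, A and B, and the quantized observation symbols are all invented for this example; they are not taken from the slides.

```python
# Toy left-to-right HMM for "cat"; every number here is made up for illustration.
states = ["k", "ae", "t"]   # hidden states (phonemes)
pi = [0.6, 0.3, 0.1]        # pi[i] = P(first state is i)
A = [[0.6, 0.4, 0.0],       # A[i][j] = P(next state j | current state i)
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
B = [[0.7, 0.2, 0.1],       # B[i][k] = P(observing symbol k | state i)
     [0.1, 0.8, 0.1],
     [0.1, 0.2, 0.7]]
obs = [0, 1, 2]             # quantized acoustic frames o1, o2, o3


def viterbi(pi, A, B, obs):
    """Return the most probable state path and its probability."""
    n = len(pi)
    # delta[i]: probability of the best path that ends in state i at the current frame
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []  # one list of best predecessors per frame after the first
    for o in obs[1:]:
        prev = delta
        step_ptr, delta = [], []
        for j in range(n):
            # Keep only the best predecessor (max), not the sum over predecessors
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            step_ptr.append(best_i)
            delta.append(prev[best_i] * A[best_i][j] * B[j][o])
        backptr.append(step_ptr)
    # Backtrack from the best final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step_ptr in reversed(backptr):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, delta[last]


best_path, best_prob = viterbi(pi, A, B, obs)
print([states[i] for i in best_path], best_prob)
```

With these made-up numbers the best path visits k, ae, t in order. Storing only the best predecessor per state is what keeps the search from enumerating every path.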
Example
Recognizing “cat”
Input
Observation sequence (acoustic frames)
O = {o₁, o₂, o₃}
Forward Algorithm
Initialization (t = 1)
For each qᵢ :
α₁(i) = π[i] ⋅ B[i][o₁]
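The initialization formula can be tried out on a small example. This is only a sketch: the three states and all values in pi and B are hypothetical numbers chosen for illustration, and o1 is assumed to be a quantized frame index.

```python
# Hypothetical 3-state model; the numbers are invented for illustration.
pi = [0.6, 0.3, 0.1]        # pi[i] = P(first state is q_i)
B = [[0.7, 0.2, 0.1],       # B[i][k] = P(observing symbol k | state q_i)
     [0.1, 0.8, 0.1],
     [0.1, 0.2, 0.7]]
o1 = 0                      # first observation, as a symbol index

# alpha_1(i) = pi[i] * B[i][o1] for every state q_i
alpha_1 = [pi[i] * B[i][o1] for i in range(len(pi))]
print(alpha_1)  # roughly [0.42, 0.03, 0.01], up to floating-point rounding
```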
Forward Algorithm
Induction (t = 2)
For each qⱼ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]
Forward Algorithm
Induction (t = 3)
For each qⱼ :
α₃(j) = (∑ᵢ α₂(i)⋅A[i][j]) ⋅ B[j][o₃]
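The induction step has the same shape at every t, so it can be sketched as one small function applied repeatedly. The matrices A and B, the starting values in alpha_1, and the symbol indices used for o₂ and o₃ are hypothetical numbers picked for this illustration.

```python
# Hypothetical 3-state model; the numbers are invented for illustration.
A = [[0.6, 0.4, 0.0],       # A[i][j] = P(next state q_j | current state q_i)
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
B = [[0.7, 0.2, 0.1],       # B[j][k] = P(observing symbol k | state q_j)
     [0.1, 0.8, 0.1],
     [0.1, 0.2, 0.7]]


def induction_step(alpha_prev, A, B, o):
    """alpha_t(j) = (sum over i of alpha_{t-1}(i) * A[i][j]) * B[j][o]."""
    n = len(alpha_prev)
    return [
        sum(alpha_prev[i] * A[i][j] for i in range(n)) * B[j][o]
        for j in range(n)
    ]


alpha_1 = [0.42, 0.03, 0.01]                 # assumed result of initialization
alpha_2 = induction_step(alpha_1, A, B, 1)   # t = 2, o2 is symbol 1
alpha_3 = induction_step(alpha_2, A, B, 2)   # t = 3, o3 is symbol 2
print(alpha_2)
print(alpha_3)
```

Note that the sum runs over the previous state i, while the result is indexed by the current state j.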
Algorithm Overview
The Forward Algorithm
Goal: Compute how likely the acoustic signals
(observations) are under a word model, summing
over every possible sequence of hidden states.
Input
Model λ = (A, B, π) and observation sequence
O = {o₁, o₂, ..., o_T}
Method
Steps:
1. Initialization
2. Recursion (induction)
3. Termination
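The three steps above can be put together in a few lines. This is a minimal sketch, not a production implementation: the function name forward and the toy model values are assumptions made for illustration, and real systems usually work with scaled or log probabilities to avoid underflow on long utterances.

```python
def forward(pi, A, B, obs):
    """Return the alpha table (one row per frame) and P(O | model)."""
    n = len(pi)
    # 1. Initialization: alpha_1(i) = pi[i] * B[i][o_1]
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    # 2. Recursion: alpha_t(j) = (sum_i alpha_{t-1}(i) * A[i][j]) * B[j][o_t]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ])
    # 3. Termination: P(O | model) = sum_i alpha_T(i)
    return alpha, sum(alpha[-1])


# Hypothetical 3-state model and observation sequence (invented numbers).
pi = [0.6, 0.3, 0.1]
A = [[0.6, 0.4, 0.0],
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
B = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.2, 0.7]]
obs = [0, 1, 2]

alpha, likelihood = forward(pi, A, B, obs)
print(likelihood)  # about 0.0587 for these made-up numbers
```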
Output
P(O∣λ): how well the model λ explains what was
heard.
Time complexity: O(T ⋅ N²) for T observations and N states
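To see why the O(T ⋅ N²) cost matters, recall that P(O∣λ) is by definition a sum over every possible hidden state path, and there are N^T of them. The sketch below computes that sum by brute force on a hypothetical three-state model (all numbers invented for illustration). It is meant only as a sanity check on tiny inputs.

```python
from itertools import product


def brute_force_likelihood(pi, A, B, obs):
    """Sum the joint probability of obs over all N**T hidden state paths."""
    n = len(pi)
    total = 0.0
    n_paths = 0
    for path in product(range(n), repeat=len(obs)):
        # Joint probability of this particular state path and the observations
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
        n_paths += 1
    return total, n_paths


# Hypothetical 3-state model and observation sequence (invented numbers).
pi = [0.6, 0.3, 0.1]
A = [[0.6, 0.4, 0.0],
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
B = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.2, 0.7]]
obs = [0, 1, 2]

total, n_paths = brute_force_likelihood(pi, A, B, obs)
print(total, n_paths)
```

For this toy input the enumeration visits 27 paths and reaches the same likelihood (about 0.0587) that the forward recursion gives. The path count grows exponentially with T, while the forward recursion does roughly N² multiplications per frame.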