
CSCI-6632-01

Speech Recognition and Dynamic Programming
Nabin Kumar Bamma
Lakshmi Prasanna Nakka
Laura Brown

1
Motivation
Matching spoken words to text

Audio input → Preprocessing → Feature Extraction → Recognition (with DP)

2
Issues
Time: spoken words vary in length and pronunciation.
Noise or accents may distort the signal.
Different sequences of words can sound similar.

Goal: Find the most likely sequence of words given an audio signal.

3
Process
DYNAMIC TIME WARPING
Helps align the input sequence to possible paths by minimizing distortion.

HIDDEN MARKOV MODELS
Statistically define probable paths based on observed features.

DYNAMIC PROGRAMMING
Selects the most probable path.
4
Dynamic Programming
"...in one form or another, the ideas of dynamic programming have pervaded the methodology of almost every aspect of speech signal processing, particularly the areas of speech and speaker recognition."

Chin-Hui Lee

5
Key Terms
Phoneme - the smallest unit of speech, typically 20-40 ms in length.
Allophone - a variant pronunciation of the same phoneme.
6
Hidden Markov Model

A "generative model" describing the joint probability P(Q, O|λ)
Used to model sequences of events that occur one after another
Each state has a probability distribution over the possible outputs
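For a state sequence Q = {q₁, …, q_T} and observation sequence O = {o₁, …, o_T}, this joint probability factors (in the notation used on the later slides) as:

P(Q, O∣λ) = π[q₁]⋅B[q₁][o₁] ⋅ ∏ₜ₌₂..T A[qₜ₋₁][qₜ]⋅B[qₜ][oₜ]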

7
Viterbi Algorithm
Finds the most probable sequence of states (speech patterns)
Explanation of its role in HMMs
Step-by-step breakdown with an example

Dynamic Time Warping (DTW)
Matches speech patterns under time shifts.
Explanation of DTW's use in aligning speech samples
Example visualization: aligning "hello" spoken at different speeds
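A minimal DTW sketch in Python; the local cost function and the toy "fast"/"slow" sequences below are illustrative assumptions, not taken from the slides:

import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)   # cumulative-cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])        # local frame distance
            D[i, j] = cost + min(D[i - 1, j],      # stretch x
                                 D[i, j - 1],      # stretch y
                                 D[i - 1, j - 1])  # match
    return D[n, m]

# Toy stand-in for "hello" spoken fast vs. slow (each frame held twice as long):
fast = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
slow = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0])
print(dtw_distance(fast, slow))   # 0.0 -- same content despite different lengths

Because the warping path may stretch either sequence, the two renditions align with zero cost even though their lengths differ.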
8
HMM Components
States (Q): The hidden conditions of the system.
Observations (O): Visible outputs generated by states.
Transition Probabilities (A): Likelihoods of moving from one state to another (i.e., what comes next).
Emission Probabilities (B): Probabilities of observing a particular output from a state (i.e., what was heard).
Initial State Distribution (π): Probabilities of starting in each state.
9
1. Start in state q₁ = i, for some 1 ≤ i ≤ N.
2. Generate an observation, o, with pdf bᵢ(o).
3. Transition to a new state, qₜ₊₁ = j, according to pmf aᵢⱼ.
4. Repeat steps 2 and 3, T times each.

10
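A small Python sketch of these four steps, using a made-up 2-state, 2-symbol HMM purely for illustration (the probabilities below are assumptions, not part of the "cat" example that follows):

import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])            # initial state distribution
A  = np.array([[0.7, 0.3],           # transition probabilities a_ij
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],           # emission probabilities b_i(o)
               [0.2, 0.8]])

def generate(T):
    """Follow steps 1-4: pick a start state, then emit and transition T times."""
    q = rng.choice(len(pi), p=pi)                             # step 1
    observations = []
    for _ in range(T):
        observations.append(rng.choice(B.shape[1], p=B[q]))  # step 2
        q = rng.choice(len(A), p=A[q])                        # step 3
    return observations                                       # step 4: repeated T times

print(generate(5))   # e.g. a sequence of 5 observation symbols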
Dynamic Programming:
The Forward Algorithm

11
Example

Recognizing “cat”

12
Input
Observation sequence (acoustic frames)
O = {o₁, o₂, o₃}

13

*See image citation on final slide


Input
HMM Parameters λ=(A,B,π)
States (Q): The phonemes in the word "cat":
Q={q₁:/k/, q₂:/æ/, q₃:/t/}
Transition Matrix (A)

Emission Probabilities (B)

Initial State Distribution (π): π = [1.0, 0.0, 0.0]
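The A and B matrices appear only as figures on the original slide. The numpy arrays below are a reconstruction inferred from the arithmetic worked out on slides 15-21, so treat them as a sketch of what the figures contain rather than the figures themselves:

import numpy as np

# Rows/columns follow q1 = /k/, q2 = /ae/, q3 = /t/; columns of B follow o1, o2, o3.
A = np.array([[0.7, 0.3, 0.0],    # transition probabilities A[i][j]
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])

B = np.array([[0.7, 0.2, 0.1],    # emission probabilities B[i][o_t]
              [0.1, 0.6, 0.3],
              [0.1, 0.2, 0.6]])

pi = np.array([1.0, 0.0, 0.0])    # always start in /k/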


14
Forward Algorithm
Initialization (t = 1)

For each qᵢ :
α₁​(i)=π[i]⋅B[i][o₁​]

q₁ → α₁(1) = π[1]⋅B[1][o₁] = 1.0⋅0.7 = 0.7
q₂ → α₁(2) = π[2]⋅B[2][o₁] = 0.0⋅0.1 = 0.0
q₃ → α₁(3) = π[3]⋅B[3][o₁] = 0.0⋅0.1 = 0.0

Result: α₁ = [0.7, 0.0, 0.0]

15
Forward Algorithm
Induction (t = 2)

For each qᵢ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]

16
Forward Algorithm
Induction (t = 2)

For each qᵢ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]

q₁ → α₂(1) = (α₁(1)⋅A[1][1] + α₁(2)⋅A[2][1] + α₁(3)⋅A[3][1]) ⋅ B[1][o₂]

17
Forward Algorithm
Induction (t = 2)

For each qᵢ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]

q₁ → α₂(1) = (α₁(1)⋅A[1][1] + α₁(2)⋅A[2][1] + α₁(3)⋅A[3][1]) ⋅ B[1][o₂]
           = (0.7⋅0.7 + 0⋅0 + 0⋅0)⋅0.2 = 0.098

18
Forward Algorithm
Induction (t = 2)

For each qᵢ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]

q₁ → α₂(1) = (α₁(1)⋅A[1][1] + α₁(2)⋅A[2][1] + α₁(3)⋅A[3][1]) ⋅ B[1][o₂]
           = (0.7⋅0.7 + 0⋅0 + 0⋅0)⋅0.2 = 0.098
q₂ → α₂(2) = (0.7⋅0.3 + 0⋅0.6 + 0⋅0)⋅0.6 = 0.126

19
Forward Algorithm
Induction (t = 2)

For each qᵢ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]

q₁ → α₂(1) = (α₁(1)⋅A[1][1] + α₁(2)⋅A[2][1] + α₁(3)⋅A[3][1]) ⋅ B[1][o₂]
           = (0.7⋅0.7 + 0⋅0 + 0⋅0)⋅0.2 = 0.098
q₂ → α₂(2) = (0.7⋅0.3 + 0⋅0.6 + 0⋅0)⋅0.6 = 0.126
q₃ → α₂(3) = (0.7⋅0.0 + 0⋅0.4 + 0⋅1.0)⋅0.2 = 0.0

Result: α₂ = [0.098, 0.126, 0.0]

20
Forward Algorithm
Induction (t = 3)

For each qᵢ :
α₃(j) = (∑ᵢ α₂(i)⋅A[i][j]) ⋅ B[j][o₃]

q₁ → α₃(1) = (0.098⋅0.7 + 0.126⋅0 + 0⋅0)⋅0.1 = 0.00686
q₂ → α₃(2) = (0.098⋅0.3 + 0.126⋅0.6 + 0⋅0)⋅0.3 = 0.0315
q₃ → α₃(3) = (0.098⋅0.0 + 0.126⋅0.4 + 0⋅1.0)⋅0.6 = 0.03024

Result: α₃ = [0.00686, 0.0315, 0.03024]


21
The Forward Algorithm
Termination
P(O∣λ) = ∑ⱼ α₃(j)

P(O∣λ) = 0.00686 + 0.0315 + 0.03024 = 0.0686

Best interpreted in relation to other HMMs. For example, if

P(O∣λ꜀ₐₜ) = 0.0686
P(O∣λₕₐₜ) = 0.01234
P(O∣λₘₐₜ) = 0.00345

then the "cat" model best explains the observed audio.
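A compact numpy sketch of the full forward pass for this example; the parameters are the reconstruction noted on slide 14, and the observation indices 0, 1, 2 are assumed stand-ins for o₁, o₂, o₃:

import numpy as np

A = np.array([[0.7, 0.3, 0.0],
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
pi = np.array([1.0, 0.0, 0.0])
obs = [0, 1, 2]                        # o1, o2, o3

alpha = pi * B[:, obs[0]]              # initialization: alpha_1(i) = pi[i] * B[i][o1]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]      # induction: alpha_{t+1}(j) = (sum_i alpha_t(i)*A[i][j]) * B[j][o]

print(alpha)                           # ~[0.00686, 0.0315, 0.03024], matching slide 21
print(alpha.sum())                     # termination: P(O|lambda) ~ 0.0686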

22
Algorithm Overview

23
The Forward Algorithm
Goal: Identify the most likely sequence of words (hidden states) given acoustic signals (observations).

The HMM models the joint probability P(Q, O∣λ)

The Forward Algorithm computes P(O∣λ) using dynamic programming
Without it, summing over all possible state sequences would take exponential time

24
Input

O: Sequence of acoustic frames (length T)

λ: HMM parameters
A, B: HMM probability matrices
π: Initial probabilities
M: the number of distinct observation symbols

25
Method
Steps:
1. Initialization

2. Recursion:

3. Termination
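In the notation of the worked "cat" example (N states, T observations), these steps are:

1. Initialization: α₁(i) = π[i]⋅B[i][o₁], for 1 ≤ i ≤ N
2. Recursion: αₜ₊₁(j) = (∑ᵢ αₜ(i)⋅A[i][j]) ⋅ B[j][oₜ₊₁], for 1 ≤ t < T
3. Termination: P(O∣λ) = ∑ⱼ α_T(j)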

26
Output
P(O∣λ): How well the model explains what was heard.

Applications of the Forward Algorithm
Speech recognition.
Natural language processing.
Bioinformatics (e.g., gene sequence alignment).
Financial modeling.
27
Time Complexity

O(T⋅N²)

Without dynamic programming we would have exponential computation time, O(Nᵀ).

28
Image Citation:

Slide 12: Spectrogram
Dumpala, Sri Harsha & Alluri, K. N. R. K. (2017). An Algorithm for Detection of Breath Sounds in Spontaneous Speech with Application to Speaker Recognition. 98-108. 10.1007/978-3-319-66429-3_9.
Link: https://www.researchgate.net/publication/319081627_An_Algorithm_for_Detection_of_Breath_Sounds_in_Spontaneous_Speech_with_Application_to_Speaker_Recognition

29
