
CSCI-6632-01

Speech Recognition and Dynamic Programming
Nabin Kumar Bamma
Lakshmi Prasanna Nakka
Laura Brown

1
Motivation
Matching spoken words to text

Audio input → Preprocessing → Feature Extraction → Recognition (with DP)

2
Issues
Time: spoken words vary in length and pronunciation.
Noise or accents may distort the signal.
Different sequences of words can sound similar.

Goal: Find the most likely sequence of words given an audio signal.

3
Process
DYNAMIC TIME WARPING
Helps align the input sequence to possible paths by minimizing distortion.

HIDDEN MARKOV MODELS
Statistically define probable paths based on observed features.

DYNAMIC PROGRAMMING
Selects the most probable path.
4
Dynamic Programming
"...in one form or another, the ideas of dynamic programming have pervaded the methodology of almost every aspect of speech signal processing, particularly the areas of speech and speaker recognition."

Chin-Hui Lee

5
Key Terms
Phoneme - the smallest unit of speech, typically 20-40 ms in length.
Allophone - a variant pronunciation of the same phoneme.
6
Hidden Markov Model

A "generative model" describing the joint probability P(Q, O|λ)
Used to model sequences of events that occur one after another
Each state has a probability distribution over the possible outputs
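For a state sequence Q = {q₁, …, q_T} and observation sequence O = {o₁, …, o_T}, this joint probability factors (in the notation used on the later slides) as:

P(Q, O∣λ) = π[q₁]⋅B[q₁][o₁] ⋅ ∏ₜ₌₂..T A[qₜ₋₁][qₜ]⋅B[qₜ][oₜ]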

7
Viterbi Algorithm
Finds the most probable sequence of states (speech patterns)
Explanation of its role in HMMs
Step-by-step breakdown with an example

Dynamic Time Warping (DTW)
Matches speech patterns under time shifts.
Explanation of DTW's use in aligning speech samples
Example visualization: aligning "hello" spoken at different speeds
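A minimal DTW sketch in Python; the local cost function and the toy "fast"/"slow" sequences below are illustrative assumptions, not taken from the slides:

import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)   # cumulative-cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])        # local frame distance
            D[i, j] = cost + min(D[i - 1, j],      # stretch x
                                 D[i, j - 1],      # stretch y
                                 D[i - 1, j - 1])  # match
    return D[n, m]

# Toy stand-in for "hello" spoken fast vs. slow (each frame held twice as long):
fast = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
slow = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0])
print(dtw_distance(fast, slow))   # 0.0 -- same content despite different lengths

Because the warping path may stretch either sequence, the two renditions align with zero cost even though their lengths differ.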
8
HMM Components
States (Q): The hidden conditions of the system.
Observations (O): Visible outputs generated by states.
Transition Probabilities (A): Likelihoods of moving from one state to another (i.e., what comes next).
Emission Probabilities (B): Probabilities of observing a particular output from a state (i.e., what was heard).
Initial State Distribution (π): Probabilities of starting in each state.
9
1. Start in state q₁ = i, for some 1 ≤ i ≤ N.
2. Generate an observation, o, with pdf bᵢ(o).
3. Transition to a new state, qₜ₊₁ = j, according to pmf aᵢⱼ.
4. Repeat steps 2 and 3, T times each.

10
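A small Python sketch of these four steps, using a made-up 2-state, 2-symbol HMM purely for illustration (the probabilities below are assumptions, not part of the "cat" example that follows):

import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.6, 0.4])            # initial state distribution
A  = np.array([[0.7, 0.3],           # transition probabilities a_ij
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],           # emission probabilities b_i(o)
               [0.2, 0.8]])

def generate(T):
    """Follow steps 1-4: pick a start state, then emit and transition T times."""
    q = rng.choice(len(pi), p=pi)                             # step 1
    observations = []
    for _ in range(T):
        observations.append(rng.choice(B.shape[1], p=B[q]))  # step 2
        q = rng.choice(len(A), p=A[q])                        # step 3
    return observations                                       # step 4: repeated T times

print(generate(5))   # e.g. a sequence of 5 observation symbols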
Dynamic Programming:
The Forward Algorithm

11
Example

Recognizing “cat”

12
Input
Observation sequence (acoustic frames)
O = {o₁, o₂, o₃}

13

*See image citation on final slide


Input
HMM Parameters λ=(A,B,π)
States (Q): The phonemes in the word "cat":
Q={q₁:/k/, q₂:/æ/, q₃:/t/}
Transition Matrix (A)

Emission Probabilities (B)

Initial State Distribution (π): π = [1.0, 0.0, 0.0]
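The A and B matrices appear only as figures on the original slide. The numpy arrays below are a reconstruction inferred from the arithmetic worked out on slides 15-21, so treat them as a sketch of what the figures contain rather than the figures themselves:

import numpy as np

# Rows/columns follow q1 = /k/, q2 = /ae/, q3 = /t/; columns of B follow o1, o2, o3.
A = np.array([[0.7, 0.3, 0.0],    # transition probabilities A[i][j]
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])

B = np.array([[0.7, 0.2, 0.1],    # emission probabilities B[i][o_t]
              [0.1, 0.6, 0.3],
              [0.1, 0.2, 0.6]])

pi = np.array([1.0, 0.0, 0.0])    # always start in /k/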


14
Forward Algorithm
Initialization (t = 1)

For each qᵢ :
α₁​(i)=π[i]⋅B[i][o₁​]

q₁ → α₁(1) = π[1]⋅B[1][o₁] = 1.0⋅0.7 = 0.7
q₂ → α₁(2) = π[2]⋅B[2][o₁] = 0.0⋅0.1 = 0.0
q₃ → α₁(3) = π[3]⋅B[3][o₁] = 0.0⋅0.1 = 0.0

Result: α₁ = [0.7, 0.0, 0.0]

15
Forward Algorithm
Induction (t = 2)

For each qᵢ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]

16
Forward Algorithm
Induction (t = 2)

For each qᵢ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]

q₁ → α₂(1) = (α₁(1)⋅A[1][1] + α₁(2)⋅A[2][1] + α₁(3)⋅A[3][1]) ⋅ B[1][o₂]

17
Forward Algorithm
Induction (t = 2)

For each qᵢ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]

q₁ → α₂(1) = (α₁(1)⋅A[1][1] + α₁(2)⋅A[2][1] + α₁(3)⋅A[3][1]) ⋅ B[1][o₂]
           = (0.7⋅0.7 + 0⋅0 + 0⋅0)⋅0.2 = 0.098

18
Forward Algorithm
Induction (t = 2)

For each qᵢ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]

q₁ → α₂(1) = (α₁(1)⋅A[1][1] + α₁(2)⋅A[2][1] + α₁(3)⋅A[3][1]) ⋅ B[1][o₂]
           = (0.7⋅0.7 + 0⋅0 + 0⋅0)⋅0.2 = 0.098
q₂ → α₂(2) = (0.7⋅0.3 + 0⋅0.6 + 0⋅0)⋅0.6 = 0.126

19
Forward Algorithm
Induction (t = 2)

For each qᵢ :
α₂(j) = (∑ᵢ α₁(i)⋅A[i][j]) ⋅ B[j][o₂]

q₁ → α₂(1) = (α₁(1)⋅A[1][1] + α₁(2)⋅A[2][1] + α₁(3)⋅A[3][1]) ⋅ B[1][o₂]
           = (0.7⋅0.7 + 0⋅0 + 0⋅0)⋅0.2 = 0.098
q₂ → α₂(2) = (0.7⋅0.3 + 0⋅0.6 + 0⋅0)⋅0.6 = 0.126
q₃ → α₂(3) = (0.7⋅0.0 + 0⋅0.4 + 0⋅1.0)⋅0.2 = 0.0

Result: α₂ = [0.098, 0.126, 0.0]

20
Forward Algorithm
Induction (t = 3)

For each qᵢ :
α₃(j) = (∑ᵢ α₂(i)⋅A[i][j]) ⋅ B[j][o₃]

q₁ → α₃(1) = (0.098⋅0.7 + 0.126⋅0 + 0⋅0)⋅0.1 = 0.00686
q₂ → α₃(2) = (0.098⋅0.3 + 0.126⋅0.6 + 0⋅0)⋅0.3 = 0.0315
q₃ → α₃(3) = (0.098⋅0.0 + 0.126⋅0.4 + 0⋅1.0)⋅0.6 = 0.03024

Result: α₃ = [0.00686, 0.0315, 0.03024]


21
The Forward Algorithm
Termination
P(O∣λ) = ∑ⱼ α₃(j)

P(O∣λ) = 0.00686 + 0.0315 + 0.03024 = 0.0686

Best interpreted in relation to other HMMs. For example, if

P(O∣λ꜀ₐₜ) = 0.0686
P(O∣λₕₐₜ) = 0.01234
P(O∣λₘₐₜ) = 0.00345

then the "cat" model best explains the observed audio.
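A compact numpy sketch of the full forward pass for this example; the parameters are the reconstruction noted on slide 14, and the observation indices 0, 1, 2 are assumed stand-ins for o₁, o₂, o₃:

import numpy as np

A = np.array([[0.7, 0.3, 0.0],
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
pi = np.array([1.0, 0.0, 0.0])
obs = [0, 1, 2]                        # o1, o2, o3

alpha = pi * B[:, obs[0]]              # initialization: alpha_1(i) = pi[i] * B[i][o1]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]      # induction: alpha_{t+1}(j) = (sum_i alpha_t(i)*A[i][j]) * B[j][o]

print(alpha)                           # ~[0.00686, 0.0315, 0.03024], matching slide 21
print(alpha.sum())                     # termination: P(O|lambda) ~ 0.0686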

22
Algorithm Overview

23
The Forward Algorithm
Goal: Identify the most likely sequence of words (hidden states) given acoustic signals (observations).

The HMM models the joint probability P(Q, O∣λ)

The Forward Algorithm computes P(O∣λ) using dynamic programming
Without it, summing over all possible state sequences would take exponential time

24
Input

O: Sequence of acoustic frames (length T)

λ: HMM parameters
A, B: HMM probability matrices
π: Initial probabilities
M: the number of distinct observation symbols

25
Method
Steps:
1. Initialization

2. Recursion:

3. Termination
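In the notation of the worked "cat" example (N states, T observations), these steps are:

1. Initialization: α₁(i) = π[i]⋅B[i][o₁], for 1 ≤ i ≤ N
2. Recursion: αₜ₊₁(j) = (∑ᵢ αₜ(i)⋅A[i][j]) ⋅ B[j][oₜ₊₁], for 1 ≤ t < T
3. Termination: P(O∣λ) = ∑ⱼ α_T(j)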

26
Output
P(O∣λ): How well the model explains what was heard.

Applications of the Forward Algorithm
Speech recognition.
Natural language processing.
Bioinformatics (e.g., gene sequence alignment).
Financial modeling.
27
Time Complexity

O(T⋅N²)

Without dynamic programming we would have exponential computation time, O(Nᵀ).

28
Image Citation:

Slide 12: Spectrogram
Dumpala, Sri Harsha & Alluri, K. N. R. K. (2017). An Algorithm for Detection of Breath Sounds in Spontaneous Speech with Application to Speaker Recognition. 98-108. 10.1007/978-3-319-66429-3_9.
Link: https://www.researchgate.net/publication/319081627_An_Algorithm_for_Detection_of_Breath_Sounds_in_Spontaneous_Speech_with_Application_to_Speaker_Recognition

29
