Speech Recognition: Lecture 11: Advanced Topics

This lecture discusses advanced topics in speech recognition including evaluating performance, generating N-best strings from lattices, and discriminative training. It covers measuring accuracy using edit distance between the transcription and reference, computing N-best paths efficiently in weighted graphs, and generating unique top N strings from lattices rather than just paths. Discriminative training is also mentioned.


Speech Recognition

Lecture 11: Advanced Topics.

Mehryar Mohri
Courant Institute of Mathematical Sciences
[email protected]
This Lecture
Speech recognition evaluation
N-best strings algorithms
Lattice generation
Discriminative training

Mehryar Mohri - Speech Recognition page 2 Courant Institute, NYU


Performance Measure
Accuracy: based on the edit-distance between the speech recognition transcription and the reference transcription.

• word or phone accuracy.

• lattice oracle accuracy: edit-distance between the lattice and the reference transcription.

Note: the performance measure does not match the quantity optimized to learn the models.

• word-error rate, lattices.
Word Error Rates

[Table: word error rates for several benchmark tasks; based on the 1998 evaluation.]


Edit-Distance
Definition: minimal cost of a sequence of edit operations transforming one string into another.
Edit operations and costs:

• standard edit-distance definition: insertions, deletions, substitutions, all with the same cost of one.

• general case: more general operations, arbitrary non-negative costs.

Application: measuring word error rate in speech recognition and other string-processing tasks.
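The standard unit-cost case can be sketched directly as the textbook dynamic program (a minimal Python illustration, not the transducer-based method of the later slides; the function names are ours):

```python
def edit_distance(ref, hyp):
    # d[i][j] = edit distance between ref[:i] and hyp[:j]; O(|ref||hyp|) time.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # i deletions
    for j in range(n + 1):
        d[0][j] = j          # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[m][n]

def word_error_rate(ref_words, hyp_words):
    # WER: word-level edit distance normalized by the reference length.
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Applied to word sequences rather than characters, this gives the word error rate used above.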
Local Edits
Edit operations: insertion: ε → a, deletion: a → ε, substitution: a → b (a ≠ b).
Example: 2 insertions, 3 deletions, 1 substitution:

c t t g ε ε a c
ε t a ε g t ε c

This is called an alignment.


Edit-Distance Computation
Standard case: textbook recursive algorithm (Cormen, Leiserson, Rivest, 1992), quadratic complexity, O(|x||y|) for two strings x and y.
General case: (MM, Pereira, and Riley, 2000; MM, 2003)

• construct a tropical-semiring edit-distance transducer T_e with arbitrary edit costs.

• represent x and y by automata X and Y.

• compute the best path of X ∘ T_e ∘ Y.

• complexity quadratic: O(|T_e||X||Y|).
Global Alignment - Example
Example: c(A, G) = 1, c(A, T) = c(G, C) = 0.5, no cost for matching symbols.

Representation: [Figure: edit-distance transducer T_e with arcs A:G/1.0, G:A/1.0, A:T/0.5, T:A/0.5, C:G/0.5, G:C/0.5 and identity arcs A:A/0.0, C:C/0.0, G:G/0.0, T:T/0.0; linear automata X and Y for the strings A G C T and A C C T G.]

echo "A G C T" | farcompilestrings >X.fsm
Global Alignment - Example
Program: fsmcompose X.fsm Te.fsm Y.fsm | fsmbestpath -n 1 >A.fsm

Graphical representation: [Figure: best-path automaton A.fsm, e.g. the alignment A:A/0 G:C/0.5 C:C/0 T:T/0 ε:G/2, with alternatives such as A:A/0 G:ε/2 ε:C/2 C:C/0 T:T/0 ε:G/2.]


Edit-Distance of Automata
Definition: the edit-distance of two automata A and B is the minimum edit-distance of a string accepted by A and a string accepted by B.
Computation:

• best path of A ∘ T_e ∘ B.

• complexity for acyclic automata: O(|T_e||A||B|).

Generality: any weighted transducer over the tropical semiring defines an edit-distance. The edit-distance transducer can be learned using the EM algorithm.
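The lattice oracle accuracy mentioned earlier is a special case: the edit-distance between an acyclic lattice and the reference string. A direct dynamic-programming sketch for the unit-cost case, equivalent to taking the best path of X ∘ T_e ∘ Y (assumes topologically numbered states; function and argument names are ours):

```python
from collections import defaultdict

INF = float("inf")

def lattice_oracle_distance(arcs, start, finals, ref, num_states):
    """Minimum edit distance between any string accepted by an acyclic
    lattice and the reference: the lattice's 'oracle' error.
    arcs: list of (src, dst, label) with src < dst (topological order)."""
    n = len(ref)
    # D[q][j]: min edits to reach state q having consumed ref[:j].
    D = [[INF] * (n + 1) for _ in range(num_states)]
    D[start][0] = 0
    out = defaultdict(list)
    for s, t, a in arcs:
        out[s].append((t, a))
    for q in range(num_states):
        for j in range(n):                      # delete a reference word at q
            D[q][j + 1] = min(D[q][j + 1], D[q][j] + 1)
        for t, a in out[q]:
            for j in range(n + 1):
                if D[q][j] == INF:
                    continue
                # insertion: consume lattice label a, no reference word
                D[t][j] = min(D[t][j], D[q][j] + 1)
                if j < n:                       # match or substitution
                    c = 0 if a == ref[j] else 1
                    D[t][j + 1] = min(D[t][j + 1], D[q][j] + c)
    return min(D[f][n] for f in finals)
```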
This Lecture
Speech recognition evaluation
N-best strings algorithms
Lattice generation
Discriminative training



N-Best Sequences
Motivation: rescoring.

• first pass using simple acoustic and grammar models produces a lattice or N-best list.

• re-evaluate the alternatives with a more sophisticated model or using new information.

General problem:

• speech recognition, handwriting recognition.

• information extraction, image processing.
N-Shortest-Paths Problem
Problem: given a weighted directed graph G, a source state s, and a set of destination or final states F, find the N shortest paths in G from s to F.
Algorithms:

• (Dreyfus, 1969): O(|E| + N log(|E|/|Q|)).

• (MM, 2002): shortest-distance algorithm over the N-tropical semiring.

• (Eppstein, 1998): O(|E| + |Q| log |Q| + N), plus explicit representation of the N best paths: O(|Q|N²).


N-Shortest Strings ≠ N-Shortest Paths
Problem: given a weighted directed graph G, a source state s, and a set of destination or final states F, find the N shortest strings in G from s to F.
Example: NAB Eval 95.

Thresh   Non-Unique   Unique
1.5      8            2
2.0      24           4
2.5      54           4
3.0      1536         48


N-Shortest Paths
Program: fsmprune -c1.5 lat.fsm | farprintstrings -c -iNAB.wordlist

in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required to run -2038.46
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required around -2037.8
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required to run -2037.51
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required to run -2037.42
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required around -2036.85
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required around -2036.76
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required to run -2036.47
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required around -2035.81


N-Shortest Strings
Program: fsmprune -c1.5 lat.fsm | farprintstrings -c -u -iNAB.wordlist

in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required to run -2038.46
in addition the launch of Microsoft corporation's windows ninety five software will mean more memory will be required around -2037.8


Algorithms Based on N-Best Paths
(Chow and Schwartz, 1990; Soong and Huang, 1991)
Idea: use a K-best paths algorithm to generate K ≥ N distinct paths.
Problems:

• K is not known in advance.

• in practice, K may sometimes be quite large, e.g. K ∼ 2N, which affects both time and space complexity.
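The idea above, and its drawback, can be seen in a toy sketch: paths of an acyclic automaton are popped in order of increasing cost and duplicate label sequences are discarded, so the number of paths examined, K, is only known a posteriori (names are ours, not from the cited papers):

```python
import heapq

def n_best_unique_strings(arcs, start, finals, n_best):
    """Enumerate paths of an acyclic automaton in order of increasing cost
    and keep the first n_best distinct label sequences.
    arcs: dict state -> list of (weight, label, next_state).
    Returns (results, k) where k is the number of paths examined."""
    heap = [(0.0, (), start)]       # (cost so far, labels so far, state)
    seen, results, k = set(), [], 0
    while heap and len(results) < n_best:
        c, labels, p = heapq.heappop(heap)
        if p in finals:
            k += 1                  # one complete path examined
            if labels not in seen:  # discard duplicate strings
                seen.add(labels)
                results.append((c, labels))
        for w, a, q in arcs.get(p, []):
            heapq.heappush(heap, (c + w, labels + (a,), q))
    return results, k
```

In the test below, three paths must be examined to find the two best distinct strings, illustrating K > N.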


N-Best String Algorithm
(MM and Riley, 2002)
Idea: apply an N-best paths algorithm to the on-the-fly determinization of the input automaton. But N-best paths algorithms require the shortest distances to F'.
Weighted determinization (partial):

• eliminates redundancy; no determinizability issue.

• on-demand computation: only the part needed is computed.

• on-the-fly computation of the needed shortest distances to final states.
Shortest-Distances to Final States
Definition: let d(q, F) denote the shortest distance from q to the set of final states F in the input (non-deterministic) automaton A, and let d'(q', F') be defined in the same way in the resulting (deterministic) automaton B.
Theorem: for any state q' = {(q_1, w_1), ..., (q_n, w_n)} in B, the following holds:

\[ d'(q', F') = \min_{i=1,\dots,n} \{ w_i + d(q_i, F) \}. \]
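Read as code, the theorem is a one-liner (a sketch with names of our choosing: `subset` holds the (state, residual weight) pairs forming q', and `d` maps each original state to d(q, F)):

```python
def det_shortest_distance(subset, d):
    # d'(q', F') = min_i { w_i + d(q_i, F) } over the pairs (q_i, w_i) in q'.
    return min(w + d[q] for q, w in subset)
```

This is what lets the N-best paths search run on the partially determinized automaton: each new subset state's shortest distance to the final states is available immediately from the precomputed d(q, F).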


Simple N-Shortest-Paths Algorithm
1  for p ← 1 to |Q'| do r[p] ← 0
2  π[(i', 0)] ← NIL
3  S ← {(i', 0)}
4  while S ≠ ∅
5    do (p, c) ← head(S); Dequeue(S)
6       r[p] ← r[p] + 1
7       if (r[p] = N and p ∈ F') then exit
8       if r[p] ≤ N
9         then for each e ∈ E[p]
10          do c' ← c + w[e]
11             π[(n[e], c')] ← (p, c)
12             Enqueue(S, (n[e], c'))
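A runnable version of the pseudocode above, returning the path costs (a Python sketch: the queue is ordered by plain path cost, which is valid for non-negative weights; the algorithm above orders pairs by c + d(p, F') using the shortest distances of the previous slide):

```python
import heapq

def n_shortest_path_costs(arcs, start, finals, n_best):
    """Costs of the N shortest paths from start to a final state.
    arcs: dict state -> list of (weight, next_state), weights >= 0."""
    r = {}                          # r[p]: times state p has been dequeued
    heap = [(0.0, start)]           # queue S of (cost, state) pairs
    results = []
    while heap:
        c, p = heapq.heappop(heap)
        r[p] = r.get(p, 0) + 1
        if p in finals:
            results.append(c)
            if len(results) == n_best:   # r[p] = N and p in F': exit
                break
        if r[p] <= n_best:               # expand each state at most N times
            for w, q in arcs.get(p, []):
                heapq.heappush(heap, (c + w, q))
    return results
```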


N-Best String Alg. - Experiments
NAB 40K Bigram

[Figure: word accuracy vs. real-time factor; curves for 1. 1-Best, 2. 10-Best, 3. 100-Best, 4. 1000-Best, 5. Lattice.]

The additional time to pay for N-best is very small, even for large N.
N-Best String Alg. - Properties
Simplicity and efficiency:

• easy to implement: combines two general algorithms.

• works with any N-best paths algorithm.

• empirically efficient.

Generality:

• arbitrary input automaton (not necessarily acyclic).

• incorporated in the FSM library (fsmbestpath).


This Lecture
Speech recognition evaluation
N-best strings algorithms
Lattice generation
Discriminative training



Speech Recognition Lattices
Definition: weighted automaton representing the speech recognizer's alternative hypotheses.

[Figure: example word lattice with arcs labeled word/weight, e.g. this/16.8, is/22.36, my/63.09, number/34.56, card/20.1.]


Lattice Generation
(Odell, 1995; Ljolje et al., 1999)
Procedure: given a transition e in N, keep in the lattice the transition ((p[e], t'), i[e], o[e], (n[e], t)) with the best start time (p[e], t') during Viterbi decoding.

[Figure: competing transitions from (r, t') and (r, t'') into (q, t); only the one with the best start time is kept.]


Lattice Generation
Computation time: little extra computation over one-best.
Optimization:

• projection on output (words or phonemes).

• epsilon-removal.

• pruning: keeps only transitions and states lying on paths whose total weight is within a threshold of the best path.

• garbage collection (using the same pruning).
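The pruning step can be sketched for an acyclic lattice in the tropical semiring: compute forward and backward shortest distances, then keep only the arcs lying on a path within the threshold of the best path (names are ours; assumes states are topologically numbered so that src < dst):

```python
INF = float("inf")

def prune_lattice(arcs, start, finals, num_states, threshold):
    """Keep arcs lying on a path whose total weight is within `threshold`
    of the best path. arcs: list of (src, dst, label, weight)."""
    alpha = [INF] * num_states       # shortest distance from start
    beta = [INF] * num_states        # shortest distance to a final state
    alpha[start] = 0.0
    for s, t, _, w in sorted(arcs):                  # forward pass
        alpha[t] = min(alpha[t], alpha[s] + w)
    for f in finals:
        beta[f] = 0.0
    for s, t, _, w in sorted(arcs, reverse=True):    # backward pass
        beta[s] = min(beta[s], w + beta[t])
    best = min(alpha[f] for f in finals)
    # an arc survives iff some path through it is within the threshold
    return [(s, t, a, w) for s, t, a, w in arcs
            if alpha[s] + w + beta[t] <= best + threshold]
```

States left without surviving arcs would then be garbage-collected, as noted above.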
Notes
Heuristics: not all paths within the beam are kept in the lattice.
Lattice quality: oracle accuracy, that is, the best accuracy achieved by any path in the lattice.
Optimizations: weighted determinization and minimization.

• in general, dramatic reduction of redundancy and size.

• can behave badly for some lattices, typically uncertain cases.
Speech Recognition Lattice

[Figure: full recognition lattice for an utterance ("which flights leave detroit and arrive at saint petersburg around nine a.m."), with a large number of redundant paths and word/weight arcs.]


Lattice after Determinization
(MM, 1997)

[Figure: the same lattice after weighted determinization; redundant paths are merged and the lattice is dramatically smaller.]


Lattice after Minimization
(MM, 1997)

[Figure: the lattice after weighted minimization; a further reduction in the number of states and transitions.]


This Lecture
Speech recognition evaluation
N-best strings algorithms
Lattice generation
Discriminative training



Discriminative Techniques
Maximum likelihood: parameters are adjusted to increase the joint likelihood of the acoustic observations and CD phone or word sequences, irrespective of the probability of other word hypotheses.
Discriminative techniques: take into account competing word hypotheses and attempt to reduce the probability of the incorrect ones.

• Main problems: computationally expensive; generalization.


Objective Functions
Maximum likelihood (joint):

\[ F = \operatorname*{argmax}_{\theta} \sum_{i=1}^{m} \log p_\theta(o_i, w_i). \]

Conditional maximum likelihood (CML):

\[ F = \operatorname*{argmax}_{\theta} \sum_{i=1}^{m} \log p_\theta(w_i \mid o_i) = \operatorname*{argmax}_{\theta} \sum_{i=1}^{m} \log \frac{p_\theta(o_i, w_i)}{p_\theta(o_i)}. \]

Maximum mutual information (MMI/MMIE):

\[ F = \operatorname*{argmax}_{\theta} \sum_{i=1}^{m} \log \frac{p_\theta(o_i, w_i)}{p_\theta(o_i)\, p_\theta(w_i)}. \]

Equivalent to CML when p_\theta(w_i) is independent of \theta.
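The equivalence can be made explicit with a one-step derivation (not from the slides):

\[
\log \frac{p_\theta(o_i, w_i)}{p_\theta(o_i)\, p_\theta(w_i)}
= \log p_\theta(w_i \mid o_i) - \log p_\theta(w_i),
\]

so when the language model \(p_\theta(w_i)\) is fixed, i.e. independent of \(\theta\), the second term is a constant and the MMI and CML objectives have the same maximizers.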
References
• Y. Chow and R. Schwartz. The N-Best Algorithm: An Efficient Procedure for Finding Top N Sentence Hypotheses. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '90), Albuquerque, New Mexico, April 1990, pp. 81-84.
• S. E. Dreyfus. An Appraisal of Some Shortest-Path Algorithms. Operations Research, 17:395-412, 1969.
• David Eppstein. Finding the k Shortest Paths. SIAM Journal on Computing, 28(2):652-673, 1998.
• Andrej Ljolje, Fernando Pereira, and Michael Riley. Efficient General Lattice Generation and Rescoring. In Proceedings of the European Conference on Speech Communication and Technology (Eurospeech '99), Budapest, Hungary, 1999.
• Mehryar Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23:2, 1997.
• Mehryar Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press, 2005.


References
• Mehryar Mohri. Edit-Distance of Weighted Automata: General Definitions and Algorithms. International Journal of Foundations of Computer Science, 14(6):957-982, 2003.
• Mehryar Mohri and Michael Riley. An Efficient Algorithm for the N-Best-Strings Problem. In Proceedings of the International Conference on Spoken Language Processing (ICSLP '02), Denver, Colorado, September 2002.
• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. The Design Principles of a Weighted Finite-State Transducer Library. Theoretical Computer Science, 231:17-32, January 2000.
• Julian Odell. The Use of Context in Large Vocabulary Speech Recognition. Ph.D. thesis, Cambridge University, UK, 1995.
• Frank Soong and Eng-Fong Huang. A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '91), Toronto, Canada, November 1991, pp. 705-708.