NLP Unit-3

The document discusses phonetics and phonology, focusing on the study of speech sounds and their transcription. It explains the differences between phonetics, which deals with the physical properties of sounds, and phonology, which examines the abstract mental representations and rules governing sound patterns in language. Additionally, it covers various aspects of speech processing, including speech synthesis, recognition, and the application of phonological rules.

Speech sounds and phonetic

transcription
 Phonetics
 The study of speech sounds used in the languages of the world
 Phone
 A speech sound
 Represented with phonetic symbols
 Two types:
 Consonants
 Vowels

Phonology
Phonology is the area of linguistics that describes the systematic way that sounds are realized differently in different environments, and how this system of sounds is related to the rest of the grammar.

Speech sounds and phonetic
transcription
 Phonetic alphabets
 IPA
 Standard developed by International Phonetic Association
 Alphabet + principles of transcription
 ARPAbet
 Designed for American English in ASCII symbols
 Computer-friendly

Speech sounds and phonetic
transcription

(Figure: example phonetic symbols, e.g. IPA [ɹ].)

Articulatory phonetics
 Definition:
 The study of how phones are produced
 The vocal organ

Articulatory phonetics
 Consonants are defined by
 Place of articulation
 The point of maximum constriction
 Manner of articulation
 How the restriction of airflow is made
 Voicing
 State of the glottis

Articulatory phonetics
 Sounds are formed by the motion of air through the
mouth
 Consonants:
 Made by restricting or blocking the airflow in some way
 May be voiced (sounds made with the vocal folds together and
vibrating) or voiceless
 Vowels:
 Made with less obstruction
 Usually voiced
 Generally louder and longer than consonants

Articulatory phonetics
 Place of articulation

(Diagram: places of articulation, e.g. coronal.)

SPEECH PROCESSING
Why Speech?
 No visual contact required
 No special equipment required
 Can be done while doing other things
Speech Processing
 Speech Coding
 Speech Synthesis
 Speech Recognition
 Speaker Recognition/Verification
Speech Synthesis
 Construct Speech waveform from words
 Speaker Quality and Accent
Speech Recognition
 Convert a sound waveform to words
 The most relevant and important task in the industry
 Roughly 90% accuracy in lab conditions, much lower in factory conditions
Speaker Recognition
 Concerned with Biometrics
 Acceptable as a verification technique
 How would this be different from Speech recognition?
 Speaker Quality
 Pitch, Accent etc.
Automatic Speech Recognition
 Most Important Task
 Hardest Task
 Co-articulation: neighboring sounds overlap and influence one another's articulation
 Speaker Variation
 Spontaneity
 Language Modeling
 Noise Robustness
ASR: Problems
ASR: Method
ASR: Application
WHAT IS PHONOLOGY
Despite the infinite variations which occur when we speak, all speakers of a language agree that certain utterances are the "same" and others are "different". Phonology tells us why this is the case.

Note that everyone who knows a language knows its phonology unconsciously.
PHONETICS vs. PHONOLOGY

PHONETICS asks: how are speech sounds made?
PHONOLOGY asks: how are speech sounds classified?

PHONETICS is concerned with the physical properties of sounds; PHONOLOGY is concerned with their abstract, mental representation.

PHONETICS is the study of sounds in isolation; PHONOLOGY is the study of sounds in context.
Phonology deals with the
following questions:
1. Why do related forms differ?

2. What is stored in the mind?

3. What sounds go together?

4. How are sounds organized into syllables?

5. What are the differences between languages?


1. Why do related forms differ?

sane/sanity, electric/electricity, atom/atomic

Phonology finds the systematic ways in which the forms differ and explains them.
2. What is stored in the mind?

Phonology studies abstract mental entities, such as structures and


processes.

This contrasts with phonetics, which deals with the actual production
and acoustics of the sounds of language.
3. What sounds go together?

Looks at what sounds/sound combinations are accepted and why.


4. How are sounds organized
into syllables?

With the use of phonological trees, syllables are broken up more easily.

Syllables are made up of an onset (any consonants before the rhyme) and a rhyme.

The rhyme is made up of a nucleus (the vowel sound(s) in the syllable, the key component of all syllables) and a coda (any consonants following the nucleus). For example, in cat [kæt] the onset is [k] and the rhyme [æt] consists of the nucleus [æ] plus the coda [t].
5. What are the differences
between languages?

For example, different languages can use different phonemes, or


different syllable structures (what sounds can go together to make
sequences or words) and phonology identifies these differences.
2. BASIC UNITS OF PHONOLOGY
1. PHONE

2. PHONEME

3. ALLOPHONE
Phone is a sound.
It is the smallest unit of sound.
The speech sounds we hear and produce are all phones. A phone is represented with phonetic symbols,

e.g. [t]
PHONEME is a speech sound that signals a difference in meaning,

e.g. dime vs. dine

They sound almost alike, but their meanings are different. It is the /m/ and /n/ that make the difference in meaning.
Allophones are variations of the same phoneme.

For example, the phoneme /p/ has the allophones [p] (as in spit) and [pʰ] (as in pit).
Phonemes: The Phonological
Units of Language
• Phonemes are the basic units of sound and are sensed in your mind rather than spoken or heard

• Each phoneme has one or more sounds


called allophones associated with it,
which represent the actual sound being
produced in various environments
Vowel phonemes in English
• Substituting one vowel for another creates minimal pairs, such as beat/bit/bait/bet.

• Such a list demonstrates that this dialect of English has fourteen different vowel phonemes: /i ɪ e ɛ æ u ʊ o ɔ a ʌ/, /aɪ/, /aʊ/, and /ɔɪ/. All of these phonemes have at least two allophones.
• Consonants also have allophones:

tick [tʰɪk]   stick [stɪk]   hits [hɪts]   bitter [bɪɾər]

• /t/ is pronounced [tʰ] before a stressed vowel
• /t/ is pronounced [t] directly before or after [s]
• /t/ is pronounced [ɾ] between a stressed and an unstressed vowel

• If we pronounce tick as [tɪk] or [ɾɪk] instead of [tʰɪk], we are still speaking the same word, even if it sounds strange, because these allophones of /t/ do not contrast

• However, if we tried to pronounce tick as [sɪk], we would be saying sick,


which has a different meaning
• The meaning changes because /t/ and /s/ are separate phonemes and do
contrast
Distinctive Features of Phonemes
• For two phones, or sounds, to contrast meaning
there must be some difference between them

– For example, the phonetic feature of voicing distinguishes


[s] from [z]

• When a feature distinguishes one phoneme from


another, it is a distinctive feature or a
phonemic feature
Feature Values
• Features have two values: [+ feature] and
[-feature] to indicate the presence or
absence of that particular feature
– For example, [b] is [+voiced] and [p] is [-voiced]

• At least one feature difference must distinguish each phoneme of a


language
Phonemic Patterns May Vary
Across Languages
• The same phones may occur in two languages
but pattern differently because the phonologies
of the languages are different

• While aspiration is not distinctive in English, it is


distinctive in Thai:
Phonological rules
• Phonological rules describe how phonemes
are realized as their allophones in the given
environment
• Environment in phonology typically refers to
the neighboring phonemes
• Example: In English, the phoneme /t/ is
realized as its allophone [ɾ] between a
stressed vowel and an unstressed vowel
Formal notation
• Formally a phonological rule is written as
X -> Y / W _ Z
• The variables (X, Y, etc.) can be phonemes,
allophones, or features
e.g. /t/ -> [ɾ] / V́ _ V
e.g. [-voice] -> [+voice] / V _ V
• Some symbols have special meanings
– # means word-boundary
– $ means syllable-boundary
– + means morpheme boundary
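To make the notation concrete, here is a minimal sketch in Python that applies the flapping rule /t/ -> [ɾ] / V́ _ V to a sequence of phones. The toy vowel inventory, the ARPAbet-style stress digits, and the helper names are illustrative assumptions, not part of the formal notation.

VOWELS = {"aa", "ae", "ah", "eh", "er", "ih", "iy", "uw"}  # toy ARPAbet-style subset

def is_vowel(phone):
    return phone.rstrip("012") in VOWELS

def is_stressed(phone):
    return phone.endswith("1")

def apply_flapping(phones):
    """Rewrite /t/ as the flap (ARPAbet 'dx') between a stressed
    and an unstressed vowel, i.e. /t/ -> [dx] / V'_V."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] == "t"
                and is_vowel(phones[i - 1]) and is_stressed(phones[i - 1])
                and is_vowel(phones[i + 1]) and not is_stressed(phones[i + 1])):
            out[i] = "dx"
    return out

# "butter" ~ [b ah1 t er0] -> [b ah1 dx er0]
print(apply_flapping(["b", "ah1", "t", "er0"]))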
Syllable Structure
• Words are composed of one or more syllables, which
are phonological units composed of one or more
phonemes

– Every syllable has a nucleus, and the nucleus may be


preceded and/or followed by one or more phonemes called the
onset and the coda

– The rime is the nucleus + the coda


Assimilation Rules
• An assimilation rule is a rule that makes
neighboring segments more similar by
duplicating a phonetic property
– For example, the English vowel nasalization
rule states that vowels become nasalized
before a nasal consonant within the same
syllable
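In the formal notation introduced above, this rule can be stated schematically as V -> [+nasal] / _ [+nasal] $ (with $ marking the syllable boundary, as defined earlier).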
Assimilation Rules
• Assimilation rules reflect coarticulation

– Coarticulation is the spreading of phonetic features either in


anticipation or in the preservation of articulatory processes

• For example, it is easier to begin lowering the velum while a vowel is being produced before a nasal stop than to wait for the vowel to finish and then lower the velum much more quickly

• There are many assimilation rules in English and other


languages
– English plural and past tense morphemes
– Akan negative morphemes
Dissimilation Rules
• Languages also have dissimilation rules, in
which a segment becomes less like another
segment
– It is sometimes easier to articulate dissimilar sounds

• The Latin suffix -alis, used to form adjectives, dissimilates to -aris when an l is already in the noun, and the dissimilation can be seen in words borrowed into English (e.g. annual vs. lunar)
Segment Insertion and Deletion
Rules
• Phonological rules may also add or delete
entire segments
– Adding a segment is known as epenthesis
• The rules for forming plurals, possessives, and third
person singular verb agreement in English all involve an
epenthesis rule:

Insert a [ə] before the plural morpheme /z/ when a regular noun
ends in a sibilant, giving [əz]
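A rough procedural sketch of this epenthesis rule, together with the voicing assimilation that yields [s] after voiceless sounds (a simplification in Python; real rules operate on phonological features, and the sound classes below are illustrative assumptions):

SIBILANTS = {"s", "z", "sh", "ch", "zh", "jh"}   # toy classes, keyed by final sound
VOICELESS = {"p", "t", "k", "f", "th"}

def plural_suffix(final_sound):
    """English regular plural: underlying /z/, with epenthesis of [ə]
    after sibilants and devoicing to [s] after voiceless sounds."""
    if final_sound in SIBILANTS:
        return "əz"   # epenthesis: bus -> bus[əz]
    if final_sound in VOICELESS:
        return "s"    # assimilation: cat -> cat[s]
    return "z"        # elsewhere: dog -> dog[z]

for sound in ["s", "t", "g"]:
    print(sound, "->", plural_suffix(sound))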
Segment Insertion and
Deletion Rules
• Segment deletion is more common than
insertion
– The word memory is often pronounced as if it
were spelled memry
– The deletion of [g]: as in sign [saɪn] vs. signature [sɪgnətʃər]
The Function of Phonological Rules
• Phonological rules provide the phonetic
information necessary for the pronunciation of
utterances
– Derivation: the way the phonological rules apply to the
underlying phonemic representation to create the
phonetic representation:
Probabilistic Approaches to
Pronunciation and Spelling
Spoken and Written Word (Lexical)
Errors
 Variation vs. error

 Word formation errors:


 I go to Columbia Universary.

 Lexical access problems:


 Turn to the right (left)
 “I called my mother on the television and did not understand
the door. It was too breakfast, but they came from far to near.
My mother is not too old for me to be young."
 Can humans understand ‘what is meant’ as opposed to ‘what
is said/written’?
 Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't
mttaer in waht oredr the ltteers in a wrod are, the olny
iprmoetnt tihng is taht the frist and lsat ltteers are at the rghit
pclae. The rset can be a toatl mses and you can sitll raed it
wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter
by itslef but the wrod as a wlohe.
 How do we do this?
Detecting and Correcting Spelling
Errors
 Applications:
 Spell checking in MS word
 OCR scanning errors
 Hand-writing recognition of zip codes, signatures, Graffiti
 Issues:
 Correct non-words (dg for dog, but frplc?)
 Correct “wrong” words in context (their for there, words of
rule formation)
Patterns of Error
 Human typists make different types of errors from OCR
systems -- why?
 Error classification I: performance-based:
 Insertion: catt
 Deletion: ct
 Substitution: car
 Transposition: cta
 Error classification II: cognitive
 People don’t know how to spell (nucular/nuclear)
 Homonymous errors (their/there)
How do we decide if a (legal) word
is an error?
 How likely is a word to occur?
 They met there friends in Mumbai.
 The Noisy Channel Model

   Source → Noisy Channel → Decoder

 Input to channel: the true (typed or spoken) word w
 Output from channel: an observation O
 Decoding task: find ŵ = argmax_{w ∈ V} P(w|O)
Probability
 Experiment: a procedure involving chance that leads to
different results
 Outcome: the result of a single trial of an experiment;
 Event: one or more outcomes of an experiment;
 Probability: the measure of how likely an event is;
Basic Approach
Bayes Rule:

  P(h|D) = P(D|h) P(h) / P(D)

 P(h) = prior probability of hypothesis h
 P(D) = prior probability of training data D
 P(h|D) = probability of h given D (posterior probability)
 P(D|h) = probability of D given h (likelihood of D given h)

The Goal of Bayesian Learning: the most probable hypothesis given the training data (Maximum A Posteriori hypothesis h_MAP)

  h_MAP = argmax_{h ∈ H} P(h|D)
        = argmax_{h ∈ H} P(D|h) P(h) / P(D)
        = argmax_{h ∈ H} P(D|h) P(h)
An Example
Does the patient have cancer or not?
A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, .008 of the entire population have this cancer.

P(cancer) = .008          P(¬cancer) = .992
P(+ | cancer) = .98       P(− | cancer) = .02
P(+ | ¬cancer) = .03      P(− | ¬cancer) = .97

P(cancer | +) = P(+ | cancer) P(cancer) / P(+) = 0.0078 / (0.0078 + 0.0298) = 0.20745
P(¬cancer | +) = P(+ | ¬cancer) P(¬cancer) / P(+) = 0.0298 / (0.0078 + 0.0298) = 0.79255
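The same arithmetic as a short computation (values as given above; note the slide rounds the intermediate products to 0.0078 and 0.0298):

p_cancer, p_no_cancer = 0.008, 0.992
p_pos_given_cancer = 0.98
p_pos_given_no_cancer = 0.03

# Unnormalized scores P(+|h) P(h) for each hypothesis
score_cancer = p_pos_given_cancer * p_cancer            # 0.00784
score_no_cancer = p_pos_given_no_cancer * p_no_cancer   # 0.02976

p_pos = score_cancer + score_no_cancer                  # P(+) = 0.0376
print(score_cancer / p_pos)     # P(cancer | +)  ~ 0.21
print(score_no_cancer / p_pos)  # P(~cancer | +) ~ 0.79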
Bayesian Inference
 Population: 10 Columbia students
 4 vegetarians, 3 CS majors

 Probability that a randomly chosen student (rcs) is a vegetarian? p(v) = .4
 That a rcs who is a vegetarian is also a CS major? p(c|v) = .5 (2 of the 4 vegetarians)
 That a rcs is a vegetarian (and) CS major? p(c,v) = p(c|v) p(v) = .2
Returning to Spelling...

   Source → Noisy Channel → Decoder

 Channel input: w; output: O
 Decoding: hypothesis ŵ = argmax_{w ∈ V} P(w|O)
 or, by Bayes Rule:
     ŵ = argmax_{w ∈ V} P(O|w) P(w) / P(O)
 and, since P(O) doesn't change for any of the entries in our lexicon we are going to consider, we can ignore it as a constant, so:
     ŵ = argmax_{w ∈ V} P(O|w) P(w)   (given that w was intended, how likely are we to see O)
Applications for spelling
correction
 Word processing
 Phones
 Web search
Spelling Tasks
 Spelling Error Detection
 Spelling Error Correction:
 Autocorrect
 hte→the
 Suggest a correction
 Suggestion lists

Types of spelling errors
 Non-word Errors
 graffe →giraffe
 Real-word Errors
 Typographical errors
 three →there
 Cognitive Errors (homophones)
 piece→peace,
 too → two
 your →you’re

 Non-word correction was historically mainly context-insensitive

 Real-word correction almost always needs to be context-sensitive
Rates of spelling errors

Depending on the application, error rates range from ~1% to ~26%:

26%:  Web queries (Wang et al. 2003)
13%:  Retyping, no backspace (Whitelaw et al., English & German)
7%:   Words corrected while retyping on a phone-sized organizer
2%:   Words left uncorrected on the organizer (Soukoreff & MacKenzie 2003)
1-2%: Retyping (Kane and Wobbrock 2007; Grudin 1983)
Non-word spelling errors
 Non-word spelling error detection:
 Any word not in a dictionary is an error
 The larger the dictionary the better … up to a point
 (The Web is full of mis-spellings, so the Web isn’t necessarily a
great dictionary …)
 Non-word spelling error correction:
 Generate candidates: real words that are similar to error
 Choose the one which is best:
 Shortest weighted edit distance
 Highest noisy channel probability

How could we use the Bayes model to correct spelling errors?
 Simplifying assumptions
 We only have to correct non-word errors
 Each non-word (O) differs from its correct word (w) by one step (insertion, deletion, substitution, transposition)
 From O, generate a list of candidates differing by one step and appearing in the lexicon, e.g.

Error   Corr    Corr letter   Error letter   Pos   Type
caat    cat     -             a              2     ins
caat    carat   r             -              3     del
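One standard way to enumerate the one-step candidates is to generate every string one edit away and intersect with the lexicon (a sketch in the style of Norvig's well-known spell corrector; the three-word lexicon is a stand-in):

import string

def edits1(word):
    """All strings one insertion, deletion, substitution,
    or transposition away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    substitutions = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + substitutions + inserts)

lexicon = {"cat", "carat", "coat"}   # stand-in lexicon
print(edits1("caat") & lexicon)      # {'cat', 'carat', 'coat'}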
How do we decide which correction
is most likely?
 We want to find the lexicon entry w that maximizes
P(typo|w) P(w)
 How do we estimate the likelihood P(typo|w) and the prior
P(w)?
 First, find some corpora
 Different corpora needed for different purposes
 Some need to be labeled -- others do not
 For spelling correction, what do we need?
 Word occurrence information (unlabeled)
 A corpus of labeled spelling errors
Cat vs Carat
 Suppose we look at the occurrence of cat and carat in a large
(50M word) AP news corpus
 cat occurs 6500 times, so p(cat) = .00013
 carat occurs 3000 times, so p(carat) = .00006
 Now we need to find out if inserting an ‘a’ after an ‘a’ is
more likely than deleting an ‘r’ after an ‘a’ in a corrections
corpus of 50K corrections ( p(typo|word))
 suppose 'a' insertion after 'a' occurs 5000 times (p(+a) = .1)
and 'r' deletion after 'a' occurs 7500 times (p(-r) = .15)
 Then p(word|typo) ∝ p(typo|word) * p(word)
 p(cat|caat) ∝ p(+a) * p(cat) = .1 * .00013 = .000013
 p(carat|caat) ∝ p(-r) * p(carat) = .15 * .00006 = .000009
 Issues:
 What if there are no instances of carat in corpus?
 Smoothing algorithms
 Estimate of P(typo|word) may not be accurate
 Training probabilities on typo/word pairs
 What if there is more than one error per word?
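The cat vs. carat comparison above, spelled out as arithmetic (counts as quoted from the two corpora):

# Noisy-channel scores for caat -> cat vs. caat -> carat
p_cat = 6500 / 50_000_000     # 0.00013
p_carat = 3000 / 50_000_000   # 0.00006
p_ins_a = 5000 / 50_000       # p(insert 'a' after 'a') = 0.1
p_del_r = 7500 / 50_000       # p(delete 'r' after 'a') = 0.15

print(p_ins_a * p_cat)        # ~0.000013  -> cat wins
print(p_del_r * p_carat)      # ~0.000009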
A More General Approach: Minimum Edit Distance
 How can we measure how different one word is from
another word?
 How many operations will it take to transform one word into
another?
caat --> cat, fplc --> fireplace (*treat abbreviations as typos??)
 Levenshtein distance: smallest number of insertion, deletion, or
substitution operations that transform one string into another
(ins=del=subst=1)
 Alternative: weight each operation by training on a corpus of
spelling errors to see which are most frequent
Dynamic Programming
 Dynamic Programming is an algorithm design technique for
optimization problems: often minimizing or maximizing.
 Like divide and conquer, DP solves problems by combining
solutions to subproblems.
 Unlike divide and conquer, subproblems are not independent.
 Subproblems may share subsubproblems,
 However, solution to one subproblem may not affect the solutions to other
subproblems of the same problem. (More on this later.)
 DP reduces computation by
 Solving subproblems in a bottom-up fashion.
 Storing solution to a subproblem the first time it is solved.
 Looking up the solution when subproblem is encountered again.
 Key: determine structure of optimal solutions

Edit Distance
 One measure of similarity between two strings is their edit
distance.
 This is a measure of the number of operations required to
transform the first string into the other.
 Single character operations:
 Deletion of a character in the first string
 Insertion of a character into the first string
 Substitution of a character in the first string with a character from the second string
 Matching a character in the first string with a character of the second.
Edit Distance
Example from textbook: transform vintner to writers
vintner replace v with w → wintner
wintner insert r after w → wrintner
wrintner match i → wrintner
wrintner delete n → writner
writner match t → writner
writner delete n → writer
writer match e → writer
writer match r → writer
writer insert s → writers
Edit Distance
Let Σ = {I, D, R, M} be the edit alphabet.
Defn. An edit transcript of two strings is a string over Σ describing a transformation of one string into another.
Defn. The edit distance between two strings is defined as the
minimum number of edit operations needed to transform
the first into the second. Matches are not included in the count.
Edit distance is also called Levenshtein distance.
Edit Distance
Defn. An optimal transcript is an edit transcript with the
minimal number of edit operations for transforming one
string into another.
Note: optimal transcripts may not be unique.
Defn. The edit distance problem entails computing the edit
distance between two strings along with an optimal
transcript.
Dynamic Programming

 Observation 1: There are many possible ways to transform one


string into another.
 Observation 2: This is like the knapsack problem
 Recall: dynamic programming is used to solve knapsack-like
problems.
 Defn. Let D(i,j) denote the edit distance of S1[1..i] and S2[1..j].
 That is, D(i,j) is the minimum number of edit ops needed to transform
the first i characters of S1 into the first j characters of S2.
Dynamic Programming
 Notice that we can solve D(i,j) for all combination of lengths of
prefixes of S1 and S2.
 Examples: D(0,0),.., D(0,j), D(1,0),..,D(1,j), … D(i,j)
 Like divide and conquer, dynamic programming combines solutions to subproblems (but the subproblems overlap).
 The three parts to dynamic programming are:
 The recurrence relation
 Tabular computation
 Traceback
Dynamic Programming
 The recurrence relation expresses the recursive relation
between a problem and smaller instances of the problem.
 For any recursive relation, the base condition(s) must be
specified.
 Base conditions for D(i,j) are:
 D(i,0) = i
Q: Why is this true? What does it mean in terms of edit ops?
(Transforming the first i characters of S1 into the empty string requires i deletions.)
 D(0,j) = j
Q: Why is this true? What does it mean in terms of edit ops?
(Building the first j characters of S2 from the empty string requires j insertions.)
Dynamic Programming
 The general recurrence is given by:
D(i,j) = min[ D(i-1, j) + 1, D(i, j-1) + 1, D(i-1, j-1) + t(i,j) ]
Here t(i,j) = 1 if S1(i) ≠ S2(j), o/w t(i,j) = 0.
 Basic argument: D(i,j) must be one of:
1. D(i-1, j) + 1
2. D(i, j-1) + 1
3. D(i-1, j-1) + t(i,j)
There are NO other ways of creating S2[1..j] from S1[1..i].
Dynamic Programming
 Note: In calculating D(n,m), there are only (n + 1) × (m + 1)
unique combinations of i and j.
 Clearly an exponential number of computations is NOT
required.
 Soln: instead of going top-down with recursion, go bottom-
up. Compute each combination only once.
 Decide on a data structure to hold intermediate results.
 Start from base conditions. These are the smallest D(i,j) values and are
already defined.
 Compute D(i,j) for larger values of i and j.
Dynamic Programming
 Q: What kind of data structure should we use for edit
distance?
1. Has to be a random access data structure.
2. Has to support the dimensionality of the problem.
 D(i,j) is two-dimensional: S1 and S2.
 We will use a two-dimensional array, i.e., a table.
Dynamic Programming
Example: edit distance from vintner to writers.
Fill in the base condition values.

D(i,j)       w  r  i  t  e  r  s
          0  1  2  3  4  5  6  7
     0    0  1  2  3  4  5  6  7
  v  1    1
  i  2    2
  n  3    3
  t  4    4
  n  5    5
  e  6    6
  r  7    7
Dynamic Programming
 Q: How do we fill in the other values?
 A: use the recurrence:
D(i,j) = min[ D(i-1, j) + 1, D(i, j-1) + 1, D(i-1, j-1) + t(i,j) ]
where t(i,j) = 1 if S1(i) ≠ S2(j), o/w t(i,j) = 0.
 We can first compute D(1,1) because we have D(0,0), D(0,1),
and D(1,0)
 D(1,1) = min[ 1+1, 1+1, 0+1] = 1
 Then we have all the values needed to compute in turn D(1,2),
D(1,3),..,D(1,m)
Dynamic Programming
First compute D(1,1) because we have D(0,0), D(0,1), and D(1,0).
Then compute in turn D(1,2), D(1,3), ..., D(1,m).

D(i,j)       w  r  i  t  e  r  s
          0  1  2  3  4  5  6  7
     0    0  1  2  3  4  5  6  7
  v  1    1  1  2  3  4  5  6  7
  i  2    2
  n  3    3
  t  4    4
  n  5    5
  e  6    6
  r  7    7
Dynamic Programming
Fill in subsequent values, row by row, from left to right.

D(i,j)       w  r  i  t  e  r  s
          0  1  2  3  4  5  6  7
     0    0  1  2  3  4  5  6  7
  v  1    1  1  2  3  4  5  6  7
  i  2    2  2  2  2  3  4  5  6
  n  3    3  3  3  3  3  4  5  6
  t  4    4  4  4  4  3  4  5  6
  n  5    5  5  5  5  4  4  5  6
  e  6    6  6  6  6  5  4  5  6
  r  7    7  7  6  7  6  5  4  5
Dynamic Programming
Alternatively, first compute D(1,1) from D(0,0), D(0,1), and D(1,0).
Then compute in turn D(2,1), D(3,1), ..., D(n,1).

D(i,j)       w  r  i  t  e  r  s
          0  1  2  3  4  5  6  7
     0    0  1  2  3  4  5  6  7
  v  1    1  1  2  3  4  5  6  7
  i  2    2  2
  n  3    3  3
  t  4    4  4
  n  5    5  5
  e  6    6  6
  r  7    7  7
Dynamic Programming
Fill in subsequent values, column by column, from top to bottom.

D(i,j)       w  r  i  t  e  r  s
          0  1  2  3  4  5  6  7
     0    0  1  2  3  4  5  6  7
  v  1    1  1  2  3  4  5  6  7
  i  2    2  2  2  2  3  4  5  6
  n  3    3  3  3  3  3  4  5  6
  t  4    4  4  4  4  3  4  5  6
  n  5    5  5  5  5  4  4  5  6
  e  6    6  6  6  6  5  4  5  6
  r  7    7  7  6  7  6  5  4  5
Dynamic Programming
 Filling each cell entails a constant number of operations.
 Cell (i,j) depends only on characters S1(i) and S2(j) and cells (i -
1, j - 1), (i, j - 1), and (i - 1, j).
 There are O(nm) cells in the table
 Consequently, we can compute the edit distance D(n, m) in
O(nm) time by filling in the table.
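Putting the base conditions, recurrence, and bottom-up tabular computation together, a direct sketch of the algorithm described above:

def min_edit_distance(s1, s2):
    """Levenshtein distance via the DP recurrence above
    (insert = delete = substitute = 1)."""
    n, m = len(s1), len(s2)
    # (n+1) x (m+1) table; base conditions D(i,0) = i, D(0,j) = j
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t = 0 if s1[i - 1] == s2[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,      # delete s1[i]
                          D[i][j - 1] + 1,      # insert s2[j]
                          D[i - 1][j - 1] + t)  # match or substitute
    return D[n][m]

print(min_edit_distance("vintner", "writers"))  # 5

An optimal transcript can then be recovered by tracing back from cell (n, m) to (0, 0), moving at each step to whichever neighboring cell produced the minimum.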
Summary
 We can apply probabilistic modeling to NL problems like
spell-checking
 Noisy channel model, Bayesian method
 Training priors and likelihoods on a corpus
 Dynamic programming approaches allow us to solve large
problems that can be decomposed into subproblems
 e.g. MED algorithm
Apply similar methods to modeling pronunciation variation
