NLP Unit-3
Phonetics
The study of speech sounds used in the languages of the world
Phone
A speech sound
Represented with phonetic symbols
Two types:
Consonants
Vowels
Phonology
Phonology is the area of linguistics that describes the systematic
way that sounds are realized differently in different
environments, and how this system of sounds is related to the
rest of the grammar.
Speech sounds and phonetic transcription
Phonetic alphabets
IPA
Standard developed by International Phonetic Association
Alphabet + principles of transcription
ARPAbet
Designed for American English in ASCII symbols
Computer-friendly
Speech sounds and phonetic transcription
Example IPA symbol: [ɹ], the "r" sound of American English (as in red)
Articulatory phonetics
Definition:
The study of how phones are produced
The vocal organs
Articulatory phonetics
Consonants are defined by
Place of articulation
The point of maximum constriction
Manner of articulation
How the restriction of airflow is made
Voicing
State of the glottis
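These three dimensions can be encoded as a small feature table; a minimal Python sketch (the feature values for these four consonants are standard, but the table itself is only illustrative):

# Each consonant is characterized by place, manner, and voicing.
CONSONANTS = {
    "p": ("bilabial", "stop",      False),  # voiceless bilabial stop
    "b": ("bilabial", "stop",      True),   # voiced bilabial stop
    "s": ("alveolar", "fricative", False),  # voiceless alveolar fricative
    "z": ("alveolar", "fricative", True),   # voiced alveolar fricative
}

place, manner, voiced = CONSONANTS["b"]
print(place, manner, "voiced" if voiced else "voiceless")  # bilabial stop voiced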
Articulatory phonetics
Sounds are formed by the motion of air through the
mouth
Consonants:
Made by restricting or blocking the airflow in some way
May be voiced (sounds made with the vocal folds held together and
vibrating) or voiceless
Vowels:
Made with less obstruction
Usually voiced
Generally louder and longer than consonants
Articulatory phonetics
Place of articulation
[Diagram: places of articulation along the vocal tract, e.g. coronal (sounds made with the tongue tip or blade)]
SPEECH PROCESSING
Why Speech?
No visual contact required
No special equipment required
Can be done while doing other things
Speech Processing
Speech Coding
Speech Synthesis
Speech Recognition
Speaker Recognition/Verification
Speech Synthesis
Construct a speech waveform from words
Speaker quality and accent matter
Speech Recognition
Convert a sound waveform to words
The most relevant and important task in the industry
Accuracy is around 90% in lab conditions, and much lower in noisy field (e.g., factory) conditions
Speaker Recognition
Concerned with Biometrics
Acceptable as a verification technique
How would this be different from Speech recognition?
Speaker Quality
Pitch, accent, etc.
Automatic Speech Recognition
Most Important Task
Hardest Task
Co-articulation: adjacent sounds overlap in time and influence one another's articulation
Speaker Variation
Spontaneity
Language Modeling
Noise Robustness
ASR: Problems
ASR: Method
ASR: Application
WHAT IS PHONOLOGY
Despite the infinite variations which occur when we speak, all speakers of
a language agree that certain utterances are the "same" and others are
"different". Phonology tells us why this is the case: it finds the
systematic ways in which the forms differ and explains them.
2. What is stored in the mind?
This contrasts with phonetics, which deals with the actual production
and acoustics of the sounds of language.
3. What sounds go together?
With the use of phonological trees, syllables are broken up more easily.
Syllables are made up of an onset (any consonants before the rhyme) and a
rhyme. The rhyme is made up of a nucleus (the vowel sound(s) in the
syllable, the key component of all syllables) and a coda (any consonants
following the nucleus).
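This onset/rhyme/nucleus/coda structure maps naturally onto a small data type; a minimal Python sketch (the parse of strengths is standard, the class itself is illustrative):

from dataclasses import dataclass

@dataclass
class Syllable:
    onset: str    # any consonants before the rhyme
    nucleus: str  # the vowel sound(s), the key component
    coda: str     # any consonants following the nucleus

    @property
    def rhyme(self) -> str:
        return self.nucleus + self.coda

# "strengths" is a single syllable: onset [str], nucleus [ɛ], coda [ŋθs]
s = Syllable(onset="str", nucleus="ɛ", coda="ŋθs")
print(s.rhyme)  # ɛŋθs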
5. What are the differences between languages?
1. PHONE
A phone is a sound: the smallest unit of sound. The speech sounds we
hear and produce are all phones. Phones are represented with phonetic
symbols, e.g. [t].
2. PHONEME
A phoneme is a speech sound that signals a difference in meaning.
3. ALLOPHONE
Allophones are the variant realizations of a phoneme. For example, the
phoneme /p/ has the allophones [p] (as in spit) and [pʰ] (as in pit).
Phonemes: The Phonological Units of Language
• Phonemes are the basic units of sound and are sensed in your mind
rather than spoken or heard
• If we pronounce tick as [tɪk] or [ɾɪk] instead of [tʰɪk], we are still
speaking the same word, even if it sounds strange, because these
allophones of /t/ do not contrast
Insert a [ə] before the plural morpheme /z/ when a regular noun
ends in a sibilant, giving [əz]
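A minimal Python sketch of this rule (the sibilant set is standard for English; operating on single final phones is a simplification, since real rules apply to full phonetic transcriptions):

SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}  # the sibilants of English

def plural_allomorph(final_phone: str, voiced: bool) -> str:
    """Pick the surface form of the plural morpheme /z/."""
    if final_phone in SIBILANTS:
        return "əz"                    # schwa insertion: bus -> buses [əz]
    return "z" if voiced else "s"      # voicing assimilation otherwise

print(plural_allomorph("s", voiced=False))  # əz  (buses)
print(plural_allomorph("t", voiced=False))  # s   (cats)
print(plural_allomorph("g", voiced=True))   # z   (dogs)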
Segment Insertion and Deletion Rules
• Segment deletion is more common than insertion
– The word memory is often pronounced as if it were spelled memry
– The deletion of [g]: [g] is deleted before a word-final nasal, so sign
is pronounced [saɪn] even though the [g] surfaces in signature
The Function of Phonological Rules
• Phonological rules provide the phonetic
information necessary for the pronunciation of
utterances
– Derivation: the way the phonological rules apply to the
underlying phonemic representation to create the
phonetic representation:
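For example (a standard illustration): starting from the underlying phonemic representation /tɪk/, the aspiration rule (voiceless stops are aspirated at the start of a stressed syllable) applies and derives the phonetic representation [tʰɪk], which is what is actually pronounced.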
Probabilistic Approaches to Pronunciation and Spelling
Spoken and Written Word (Lexical) Errors
Variation vs. error
The noisy channel model: Source → Noisy Channel → Decoder
Input to the channel: the true (typed or spoken) word w
Output from the channel: an observation O
Decoding task: find ŵ = argmax_{w ∈ V} P(w|O)
Probability
Experiment: a procedure involving chance that leads to
different results
Outcome: the result of a single trial of an experiment;
Event: one or more outcomes of an experiment;
Probability: the measure of how likely an event is;
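A quick worked example: rolling a fair die is an experiment; a single roll showing 4 is an outcome; "the roll is even" is an event covering the outcomes {2, 4, 6}; its probability is 3/6 = 0.5.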
Basic Approach
Bayes Rule: P(h|D) = P(D|h) P(h) / P(D)
P(h) = prior probability of hypothesis h
P(D) = prior probability of training data D
P(h|D) = probability of h given D (the posterior probability)
P(D|h) = probability of D given h (the likelihood of D given h)
The Goal of Bayesian Learning: find the most probable hypothesis given the
training data (the Maximum A Posteriori hypothesis, h_MAP)
h_MAP = argmax_{h ∈ H} P(h|D)
      = argmax_{h ∈ H} P(D|h) P(h) / P(D)
      = argmax_{h ∈ H} P(D|h) P(h)
An Example
Does patient have cancer or not?
A patient takes a lab test and the result comes back positive. The test
returns a correct positive result in only 98% of the cases in which the
disease is actually present, and a correct negative result in only 97% of
the cases in which the disease is not present. Furthermore, .008 of the
entire population have this cancer.
P(cancer) = .008,        P(¬cancer) = .992
P(+ | cancer) = .98,     P(− | cancer) = .02
P(+ | ¬cancer) = .03,    P(− | ¬cancer) = .97

P(cancer | +) = P(+ | cancer) P(cancer) / P(+)
              = .0078 / (.0078 + .0298) = 0.20745
P(¬cancer | +) = P(+ | ¬cancer) P(¬cancer) / P(+)
               = .0298 / (.0078 + .0298) = 0.79255
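A minimal Python check of this computation:

# Priors and likelihoods from the problem statement
p_cancer, p_no_cancer = 0.008, 0.992
p_pos_given_cancer, p_pos_given_no_cancer = 0.98, 0.03

# Numerators of Bayes rule (unnormalized posteriors)
num_cancer = p_pos_given_cancer * p_cancer            # 0.00784
num_no_cancer = p_pos_given_no_cancer * p_no_cancer   # 0.02976

p_pos = num_cancer + num_no_cancer                    # P(+), total probability
print(num_cancer / p_pos)     # ≈ 0.21 -> P(cancer | +)
print(num_no_cancer / p_pos)  # ≈ 0.79 -> P(¬cancer | +)

Note that even with a positive test, the patient more probably does not have cancer, because the prior P(cancer) is so small.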
Bayesian Inference
Population: 10 Columbia students
4 vegetarians, 3 CS majors
Web search
Spelling Tasks
Spelling Error Detection
Spelling Error Correction:
Autocorrect
hte→the
Suggest a correction
Suggestion lists
Types of spelling errors
Non-word Errors
graffe → giraffe
Real-word Errors
Typographical errors
three → there
Cognitive Errors (homophones)
piece → peace
too → two
your → you're
Non-word spelling errors
Non-word spelling error detection:
Any word not in a dictionary is an error
The larger the dictionary the better … up to a point
(The Web is full of misspellings, so the Web isn't necessarily a
great dictionary …)
Non-word spelling error correction:
Generate candidates: real words that are similar to error
Choose the one which is best:
Shortest weighted edit distance
Highest noisy channel probability
How could we use the Bayes model to correct spelling errors?
Simplifying assumptions
We only have to correct non-word errors
Each non-word (O) differs from its correct word (w) by one
step (insertion, deletion, substitution, transposition)
From O, generate a list of candidates differing by one step
and appearing in the lexicon, e.g.
Error   Corr    Corr letter   Error letter   Pos   Type
caat    cat     -             a              2     ins
caat    carat   r             -              3     del
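A sketch of one-step candidate generation in Python (Norvig-style; the tiny LEXICON is a stand-in for a real dictionary):

import string

LEXICON = {"cat", "carat", "cart", "coat"}  # toy stand-in for a real dictionary

def one_step_candidates(word: str) -> set:
    """All strings one edit (deletion, transposition, substitution,
    insertion) away from word that appear in the lexicon."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    substitutes = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + substitutes + inserts) & LEXICON

print(one_step_candidates("caat"))  # {'cat', 'carat', 'cart', 'coat'}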
How do we decide which correction is most likely?
We want to find the lexicon entry w that maximizes
P(typo|w) P(w)
How do we estimate the likelihood P(typo|w) and the prior
P(w)?
First, find some corpora
Different corpora needed for different purposes
Some need to be labeled -- others do not
For spelling correction, what do we need?
Word occurrence information (unlabeled)
A corpus of labeled spelling errors
Cat vs Carat
Suppose we look at the occurrence of cat and carat in a large
(50M word) AP news corpus
cat occurs 6500 times, so p(cat) = .00013
carat occurs 3000 times, so p(carat) = .00006
Now we need to find out if inserting an 'a' after an 'a' is
more likely than deleting an 'r' after an 'a', using a corrections
corpus of 50K corrections (this gives p(typo|word))
suppose 'a' insertion after 'a' occurs 5000 times (p(+a) = .1)
and 'r' deletion after 'a' occurs 7500 times (p(-r) = .15)
Then p(word|typo) ∝ p(typo|word) * p(word)
p(cat|caat) ∝ p(+a) * p(cat) = .1 * .00013 = .000013
p(carat|caat) ∝ p(-r) * p(carat) = .15 * .00006 = .000009
So cat is the more likely correction.
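The same computation as a minimal Python sketch (the counts are the ones quoted above):

# Unigram priors from the 50M-word corpus
p_word = {"cat": 6500 / 50_000_000, "carat": 3000 / 50_000_000}

# Channel model from the 50K-correction corpus: P(typo | word)
p_typo_given_word = {
    ("caat", "cat"):   5000 / 50_000,   # 'a' inserted after 'a'
    ("caat", "carat"): 7500 / 50_000,   # 'r' deleted after 'a'
}

def score(typo: str, word: str) -> float:
    """Unnormalized noisy channel score: P(typo|word) * P(word)."""
    return p_typo_given_word[(typo, word)] * p_word[word]

print(score("caat", "cat"))    # ≈ 1.3e-05 -> cat wins
print(score("caat", "carat"))  # ≈ 9.0e-06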
Issues:
What if there are no instances of carat in the corpus?
Smoothing algorithms
Estimate of P(typo|word) may not be accurate
Training probabilities on typo/word pairs
What if there is more than one error per word?
A More General Approach: Minimum Edit Distance
How can we measure how different one word is from
another word?
How many operations will it take to transform one word into
another?
caat --> cat, fplc --> fireplace (*treat abbreviations as typos??)
Levenshtein distance: smallest number of insertion, deletion, or
substitution operations that transform one string into another
(ins=del=subst=1)
Alternative: weight each operation by training on a corpus of
spelling errors to see which operations are most frequent
Dynamic Programming
Dynamic Programming is an algorithm design technique for
optimization problems: often minimizing or maximizing.
Like divide and conquer, DP solves problems by combining
solutions to subproblems.
Unlike divide and conquer, the subproblems are not independent:
subproblems may share subsubproblems.
However, the solution to one subproblem does not affect the solutions to
other subproblems of the same problem. (More on this later.)
DP reduces computation by
Solving subproblems in a bottom-up fashion.
Storing the solution to a subproblem the first time it is solved.
Looking up the stored solution when the subproblem is encountered again.
Key: determine structure of optimal solutions
Edit Distance
One measure of similarity between two strings is their edit
distance.
This is a measure of the number of operations required to
transform the first string into the other.
Single character operations:
Deletion of a character in the first string
Insertion of a character in the first string
Substitution of a character in the first string with a character from the second string
Match of a character in the first string with a character of the second.
Edit Distance
Example from textbook: transform vintner to writers
vintner replace v with w → wintner
wintner insert r after w → wrintner
wrintner match i → wrintner
wrintner delete n → writner
writner match t → writner
writner delete n → writer
writer match e → writer
writer match r → writer
writer insert s → writers
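A standard bottom-up dynamic programming implementation of this computation (unit costs, ins = del = subst = 1, matches free; this sketch returns only the distance, not the transcript):

def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to transform s into t (matches are not counted)."""
    m, n = len(s), len(t)
    # d[i][j] = edit distance between s[:i] and t[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                          # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j                          # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i-1] == t[j-1] else 1     # match or substitute
            d[i][j] = min(d[i-1][j] + 1,           # deletion
                          d[i][j-1] + 1,           # insertion
                          d[i-1][j-1] + sub)       # substitution/match
    return d[m][n]

print(levenshtein("vintner", "writers"))  # 5, matching the transcript above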
Edit Distance
Let Σ = {I, D, R, M} be the edit alphabet (Insert, Delete, Replace, Match)
Defn. An edit transcript of two strings is a string over Σ
describing a transformation of one string into another.
Defn. The edit distance between two strings is defined as the
minimum number of edit operations needed to transform
the first into the second. Matches are not included in the count.
Edit distance is also called Levenshtein distance.
Edit Distance
Defn. An optimal transcript is an edit transcript with the
minimal number of edit operations for transforming one
string into another.
Note: optimal transcripts may not be unique.
Defn. The edit distance problem entails computing the edit
distance between two strings along with an optimal
transcript.
Dynamic Programming