Speech Processing: Review # (Or) Seminar #
Speech Processing
By
Name of the Student
Regd Number: 11111111111
Course: M.Tech (CSE)
Department of Computer Science & Engineering.
Under the Guidance of
Sri Faculty Name
Designation of the Faculty
Raghu Institute of Technology
Approved by AICTE (New Delhi), Affiliated to JNTU-K,
Dakamarri(V), Bheemunipatnam (M),
Visakhapatnam District, Andhra Pradesh, India.
01/30/15
Objective
Fundamental definitions
What is speech?
Phonetics and Phonology
Speech Recognition
Speech Synthesis
Research areas in speech
Fundamental Definitions
Sound waves
A sound is simply a disturbance of air molecules,
which radiates outward from its source, in waves
of fluctuating air pressure, like ripples from a
stone dropped in a pool.
The structure of these sound waves distinguishes
one sound from another.
When sound waves hit our eardrums, nerve cells
in the inner ear detect the structure of the
vibrations, and they pass this information on to the
brain.
A higher amplitude, lower frequency wave:
1 cycle of the wave (trough to trough, or peak to peak)
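The relationship between amplitude, frequency and the length of one cycle can be sketched numerically. This is a minimal illustration that assumes a pure sine tone; the function names `sine_wave` and `period_s` and the 8000 Hz sample rate are choices made for this example only.

```python
import math

def sine_wave(amplitude, frequency_hz, duration_s, sample_rate=8000):
    """Return samples of a pure tone: amplitude * sin(2*pi*f*t)."""
    n_samples = round(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(n_samples)]

def period_s(frequency_hz):
    """Duration of one cycle (peak to peak) in seconds."""
    return 1.0 / frequency_hz

# A higher-amplitude, lower-frequency wave and a lower-amplitude, higher-frequency one.
loud_low = sine_wave(amplitude=1.0, frequency_hz=100, duration_s=0.02)
soft_high = sine_wave(amplitude=0.3, frequency_hz=400, duration_s=0.02)

print(max(loud_low), max(soft_high))   # peak air-pressure deviation of each wave
print(period_s(100), period_s(400))    # the lower frequency has the longer cycle
```

The lower-frequency wave completes fewer cycles in the same 20 ms, so each of its cycles lasts longer.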
Spectrograms
Display of time on the x-axis, frequency on the y-axis, and the
higher-amplitude frequency regions shown as darker areas
Spectrogram of 'Are you working late, Nanny?'
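A spectrogram of this kind can be computed by cutting the signal into short frames and measuring the energy at each frequency within every frame. The sketch below uses a naive DFT in pure Python on a synthetic two-tone signal; the frame length, hop size, Hann window and all function names are assumptions made for illustration, not the method used to produce the figure.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of each non-negative frequency bin of a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def spectrogram(samples, frame_len=64, hop=32):
    """One row per time frame (x-axis), one column per frequency bin (y-axis)."""
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Hann window reduces leakage between neighbouring frequency bins.
        windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)))
                    for i, s in enumerate(frame)]
        rows.append(dft_magnitudes(windowed))
    return rows

sample_rate = 8000
# Synthetic test signal: 1000 Hz for the first half, 2000 Hz for the second half.
signal = [math.sin(2 * math.pi * (1000 if n < 512 else 2000) * n / sample_rate)
          for n in range(1024)]
spec = spectrogram(signal)

def peak_hz(row, frame_len=64):
    """Frequency of the strongest bin, i.e. the darkest region in that time slice."""
    k = max(range(len(row)), key=row.__getitem__)
    return k * sample_rate / frame_len

print(peak_hz(spec[0]), peak_hz(spec[-1]))
```

Plotting each row as a column of grey levels, darker for larger magnitudes, gives the time-frequency display described above.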
SPEECH
What is it?
Acoustics
Linguistics
Physiology
Linguistic level → Physiological level → Acoustic level → Listener: Physiological level → Linguistic level
Linguistics
Units of language. What are they?
Words?
Syllables?
Sounds?
Phonemes.
Physiology
This relates to how the sounds are produced through neural
and muscular activity.
We set air coming up from the lungs in motion using our
vocal cords and then we can channel this air through the
vocal tract using our tongue, lips, etc.
We can classify the different sounds we make according to how
we set the air in motion and how we channel the airstreams
through the vocal tract.
Acoustics
This describes the generation and transmission of the sounds.
How air is set in motion.
We generate sound waves. What do they look like?
ALLOPHONE
A contextual variant of a single phoneme in a particular phonetic
environment. Allophones do not involve a semantic contrast, and
their distribution is mutually exclusive: an allophone cannot
occur where another can. Their occurrence is predicted (governed) by phonological rules.
For example, the p sounds in the English words pin and spin
are acoustically different. The [p] in pin is produced with a breath of
air following it (aspirated), whereas the [p] in spin is not.
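The pin/spin contrast can be written as a small phonological rule. The sketch below is a deliberate simplification: it uses spelling as a stand-in for phonemes, covers only the two environments named above, and the function name `realize_p` is hypothetical. The symbol [pʰ] is the usual IPA notation for the aspirated variant.

```python
def realize_p(word):
    """Return the allophone of /p/ for the first 'p' in `word`, or None.

    Simplified rule (an assumption for illustration): /p/ is aspirated at
    the start of a word and unaspirated directly after /s/.
    """
    i = word.find("p")
    if i == -1:
        return None          # no /p/ in this word
    if i == 0:
        return "[pʰ]"        # aspirated, as in "pin"
    if word[i - 1] == "s":
        return "[p]"         # unaspirated, as in "spin"
    return "not modelled"    # other environments are outside this sketch

for w in ("pin", "spin"):
    print(w, realize_p(w))
```

Because the two variants never occur in the same environment, the rule can always choose between them from context alone.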
Vowels
Sounds produced with no obstruction to the airstream as it passes
through the vocal tract.
There are three main organs of speech involved in changing the
size of the air chamber. These are
the lips - rounding, spreading
the lower jaw - lowered, raised
the tongue - raised, flattened, brought forward, etc.
Consonants
Places of articulation
Bilabial
Bilabial sounds are those sounds made by the articulation of the lips against
each other. Examples of such sounds in English are [b], [p], [m].
Dental
Dental sounds are those sounds made by the articulation of the tip of the tongue
towards the back of the teeth. Such sounds are not present in Standard American
English, but in some Chicano English dialects and certain Brooklyn dialects, the
sounds [t] and [d] are pronounced with a dental articulation.
Alveolar
Alveolar sounds are those sounds made by the articulation of the tip of the tongue
towards the alveolar ridge, the bony ridge behind the upper teeth. Examples of such
sounds in English are [n], [l].
Manner of articulation
Plosive/Stop
Plosive sounds are made by forming a complete obstruction to the flow of air through
the mouth and nose.
When the obstruction is released, the explosion of air causes a sharp noise.
Voiceless - p, t, k
Voiced - b, d, g
pit / bit
tip / dip
cot / got
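These word pairs differ only in the voicing of one plosive. A minimal sketch of that check follows; the phoneme lists are rough transcriptions written for this example (note that the c in cot is the phoneme /k/), and the function name is hypothetical.

```python
# Voiceless plosives and their voiced counterparts, as listed above.
VOICED_COUNTERPART = {"p": "b", "t": "d", "k": "g"}

def differs_only_in_voicing(a, b):
    """True if two phoneme sequences differ in exactly one plosive's voicing."""
    if len(a) != len(b):
        return False
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if len(diffs) != 1:
        return False
    x, y = diffs[0]
    return VOICED_COUNTERPART.get(x) == y or VOICED_COUNTERPART.get(y) == x

# Rough phoneme transcriptions (assumed for illustration).
pit, bit = ["p", "i", "t"], ["b", "i", "t"]
tip, dip = ["t", "i", "p"], ["d", "i", "p"]
cot, got = ["k", "o", "t"], ["g", "o", "t"]

print(differs_only_in_voicing(pit, bit))
print(differs_only_in_voicing(cot, got))
print(differs_only_in_voicing(pit, tip))
```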
Fricative
A fricative is a type of consonant that is formed by forcing air through a narrow gap
so that a hissing sound is created.
Air is forced between the tongue and the place of articulation for the particular sound.
f (as in far)
sh (as in shut)
Syllable
A syllable is a structural unit of sound that consists of a sequence of consonants
and vowels. It is hierarchically composed of three parts:
Onset: initial consonant or consonant cluster
Nucleus: the vowel
Coda: final consonant or consonant cluster
Syllable
  Onset: str
  Rime
    Nucleus: eh
    Coda: nx ths
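The onset, nucleus and coda decomposition can be sketched as a small function. The vowel set below is an illustrative subset, the split of the final cluster into `th` and `s` is an assumption about the slide's transcription, and `split_syllable` is a hypothetical name.

```python
# Illustrative subset of vowel symbols; a real system would list the full inventory.
VOWELS = {"aa", "ae", "ah", "eh", "ih", "iy", "ow", "uw"}

def split_syllable(phones):
    """Split one syllable into onset and rime (nucleus + coda)."""
    vowel_positions = [i for i, p in enumerate(phones) if p in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel (nucleus) per syllable")
    i = vowel_positions[0]
    return {
        "onset": phones[:i],
        "rime": {"nucleus": phones[i], "coda": phones[i + 1:]},
    }

# The example from the diagram: str + eh + nx ths.
parts = split_syllable(["s", "t", "r", "eh", "nx", "th", "s"])
print(parts)
```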
Existing SR systems
Dragon NaturallySpeaking
IBM ViaVoice
Philips FreeSpeech 2000
L & H (Lernout & Hauspie) Voice Xpress
Fundamental Components
TTS System
words
Text Pre-processing → Prosody → Concatenation
Text Pre-Processing
Input: string of characters (sentence)
Output: string of diphone symbols
Objective: perform sentence-level analysis
Punctuation marks
Pauses between words
Acronym Converter
Word Segmenter
Word to Diphone Translator (Phonetization)
Diphone Dictionary
MLDS
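The word-to-diphone step can be sketched as follows. A diphone label here pairs each phone with its neighbour, with a silence marker at the word edges. The pronunciation table, the `#` silence symbol and the function names are all hypothetical and stand in for the diphone dictionary, whose actual format is not shown in these slides.

```python
# Hypothetical pronunciation table standing in for the diphone dictionary.
PRONUNCIATIONS = {
    "cat": ["k", "ae", "t"],
    "see": ["s", "iy"],
}

def to_diphones(phones, silence="#"):
    """Pair every phone with its right-hand neighbour, padding with silence."""
    padded = [silence] + list(phones) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]

def word_to_diphones(word):
    """Look the word up and translate its phones into diphone symbols."""
    return to_diphones(PRONUNCIATIONS[word.lower()])

print(word_to_diphones("cat"))
```

A word of n phones yields n + 1 diphone symbols under this padding scheme.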
Number Converter
Replace
Handle
Acronym Converter
Replace
Change
format
Mr. → Mister
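The replacement of Mr. by Mister can be sketched as a table lookup over whitespace-separated tokens. Only the Mr. entry comes from the slide; the Dr. entry and the function name `expand_abbreviations` are assumptions added for illustration.

```python
# "Mr." is the example from the slide; "Dr." is an assumed extra entry.
ABBREVIATIONS = {"Mr.": "Mister", "Dr.": "Doctor"}

def expand_abbreviations(sentence):
    """Replace each known abbreviation with its spoken form."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in sentence.split())

print(expand_abbreviations("Mr. Smith is here."))
```

Tokens that are not in the table, including the sentence-final "here.", pass through unchanged.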
Word Segmenter
Divide
can be:
A single word
An acronym
A numeral
Identify punctuation marks
Allows for modularization
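A segmenter of this kind can be sketched with a regular expression that separates words, acronyms, numerals and punctuation marks. The rule that a run of two or more capital letters counts as an acronym is an assumption of this sketch, as are the names `segment` and `classify`.

```python
import re

# Numerals, alphabetic runs, or single punctuation characters.
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+|[^\w\s]")

def classify(token):
    """Label a token as numeral, acronym, word or punctuation."""
    if token.isdigit():
        return "numeral"
    if token.isalpha():
        # Assumption: two or more capitals in a row are treated as an acronym.
        return "acronym" if token.isupper() and len(token) > 1 else "word"
    return "punctuation"

def segment(sentence):
    """Divide a sentence into (token, kind) pairs."""
    return [(tok, classify(tok)) for tok in TOKEN_RE.findall(sentence)]

print(segment("The TTS demo has 3 parts."))
```

Tagging each segment with its kind lets later modules, such as the number and acronym converters, handle only the tokens meant for them.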
Prosody
MLDS
Diphone Retrieval
Diphone Database
done
Acoustic Manipulation
no
yes
Concatenation
Diphone Retrieval
Database of recorded diphones
Each diphone is matched with a text file
Diphones are distinguished by type (CC, CV, VC, VV)
The text file holds references to specific components within the waveform
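The database organisation described here can be sketched as a lookup table keyed by diphone label. The vowel subset, the file-naming scheme and the names `DiphoneEntry`, `diphone_type` and `build_database` are hypothetical; the slides do not specify the actual file layout.

```python
from dataclasses import dataclass

VOWELS = {"aa", "ae", "eh", "ih", "iy", "uw"}  # illustrative subset only

def diphone_type(diphone):
    """Classify a 'left-right' diphone label as CC, CV, VC or VV."""
    left, right = diphone.split("-")
    return "".join("V" if phone in VOWELS else "C" for phone in (left, right))

@dataclass
class DiphoneEntry:
    label: str       # e.g. "k-ae"
    kind: str        # CC, CV, VC or VV
    wav_file: str    # hypothetical name of the recorded waveform
    marks_file: str  # hypothetical text file with positions inside the waveform

def build_database(labels):
    """Map each diphone label to its entry; file names follow a made-up scheme."""
    return {
        label: DiphoneEntry(label, diphone_type(label), f"{label}.wav", f"{label}.txt")
        for label in labels
    }

database = build_database(["k-ae", "ae-t", "s-t", "iy-ae"])
entry = database["k-ae"]
print(entry.kind, entry.wav_file, entry.marks_file)
```

Retrieval is then a dictionary lookup per diphone symbol, and the type field is available to any later stage that treats consonant and vowel transitions differently.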
Conclusion
Diphones
Words
Sentence
Straight
THANK YOU