
Speech Processing: Review # (Or) Seminar #

Sample PPT format for preparing M.Tech/B.Tech seminars

Uploaded by prasad9440024661
Copyright © All Rights Reserved

Review # (or) Seminar #

Speech Processing
By
Name of the Student
Regd Number: 11111111111
Course: M.Tech (CSE)
Department of Computer Science & Engineering
Under the Guidance of
Sri Faculty Name
Designation of the Faculty
Raghu Institute of Technology
Approved by AICTE (New Delhi), Affiliated to JNTU-K,
Dakamarri(V), Bheemunipatnam (M),
Visakhapatnam District, Andhra Pradesh, India.
01/30/15

Objective

Fundamental definitions
What is speech?
Phonetics and Phonology
Speech Recognition
Speech Synthesis
Research areas in speech

Fundamental Definitions

Sound waves
A sound is simply a disturbance of air molecules,
which radiates outward from its source, in waves
of fluctuating air pressure, like ripples from a
stone dropped in a pool.
The structure of these sound waves distinguishes
one sound from another.
When sound waves hit our eardrums, nerve cells
in the inner ear detect the structure of the
vibrations, and they pass this information on to the
brain.

Frequency and amplitude of a wave
[Figure: a lower-amplitude, higher-frequency wave and a higher-amplitude, lower-frequency wave; 1 cycle of the wave runs trough to trough, or peak to peak]

Pitch and loudness
The frequency of a wave is heard as its pitch.
The amplitude of a wave is heard as its loudness.
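To make the two properties concrete, here is a minimal sketch that samples two pure tones mirroring the figure (one loud and low, one soft and high) and then recovers frequency (heard as pitch) from zero crossings and amplitude (heard as loudness) from the peak value. The function names, the 8000 Hz sample rate and the zero-crossing estimator are illustrative assumptions, not part of the slides.

```python
import math

def sine_wave(freq_hz, amplitude, duration_s=1.0, sample_rate=8000):
    """Sample a pure tone: amplitude * sin(2*pi*f*t)."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def estimate_frequency(samples, sample_rate=8000):
    """Estimate frequency (perceived as pitch) from upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)

def peak_amplitude(samples):
    """Peak absolute value of the wave (related to perceived loudness)."""
    return max(abs(x) for x in samples)

low_loud = sine_wave(100, 1.0)    # higher amplitude, lower frequency
high_soft = sine_wave(400, 0.25)  # lower amplitude, higher frequency
print(estimate_frequency(low_loud), peak_amplitude(low_loud))
print(estimate_frequency(high_soft), peak_amplitude(high_soft))
```

Counting upward crossings gives roughly one count per cycle, so the estimate lands within a cycle or so of the true frequency over a one-second window.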

Spectrograms
A spectrogram displays time on the x-axis and frequency on the y-axis, with higher-amplitude frequency regions shown as darker areas.

[Figure: Spectrogram of 'Are you working late, Nanny?']
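A spectrogram is typically computed with a short-time Fourier transform: the signal is cut into overlapping frames, each frame is tapered by a window, and the magnitude of its spectrum becomes one column of the picture (darker = larger magnitude). The sketch below assumes a 128-sample frame, a 64-sample hop and a Hann window, and uses a naive DFT only so that it needs nothing beyond the standard library; real systems use an FFT.

```python
import cmath
import math

def hann(length):
    """Symmetric Hann window used to taper each frame."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

def spectrogram(samples, frame_len=128, hop=64):
    """Magnitude spectrogram: one row per frame (time), one column per frequency bin."""
    window = hann(frame_len)
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        # naive DFT over the non-negative frequency bins 0 .. frame_len/2
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        rows.append(row)
    return rows

def bin_to_hz(k, sample_rate, frame_len=128):
    """Centre frequency of DFT bin k."""
    return k * sample_rate / frame_len

# Usage: a 1000 Hz tone sampled at 8000 Hz peaks in bin 16 of every frame.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(800)]
spec = spectrogram(tone)
peak = max(range(len(spec[0])), key=spec[0].__getitem__)
print(peak, bin_to_hz(peak, rate))  # 16 1000.0
```

For speech, the dark horizontal bands that appear in each frame's row are the formants that distinguish one vowel from another.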

SPEECH
What is it?
Acoustics

Linguistics
Physiology

The Speech Chain (Denes & Pinson)
Speaker (linguistic level -> physiological level) -> acoustic level -> Listener (physiological level -> linguistic level)

Linguistics
Units of language. What are they?
Words?

Syllables?

Sounds?

What are the individual sounds in language?

Phonemes.

How are they defined?

Physiology
This relates to how the sounds are produced through neural
and muscular activity.
We set air coming up from the lungs in motion using our vocal cords, and then we channel this air through the vocal tract using our tongue, lips, etc.
We can classify the different sounds we make according to how
we set the air in motion and how we channel the airstreams
through the vocal tract.

Acoustics
This describes the generation and transmission of the sounds.
How air is set in motion.
We generate sound waves. What do they look like?

PHONETICS AND PHONOLOGY


Phonetics concerns itself with:
The study of the acoustic detail of speech sounds and how
they are articulated

Phonology concerns itself with:
How these speech sounds are used within languages
The mechanisms / rules / processes that underlie / govern these units of speech

ALLOPHONE
A contextual variant of a single phoneme in a particular phonetic environment. Allophones do not involve a semantic contrast: their distribution is mutually exclusive, so one allophone cannot occur where another can. Their distribution is predicted/governed by phonological rules.
For example, the p sounds in the English words pin and spin are acoustically different. The [p] in pin is produced with a breath of air following it (aspirated), whereas the [p] in spin is not.
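Because allophones are predicted by rules, a phonological rule can be written as a small function from phonemes to allophones. The sketch below encodes a deliberately simplified aspiration rule (an assumption, not the full English rule, which depends on stress): a voiceless stop /p t k/ is aspirated when it is word-initial, as in pin, and stays plain elsewhere, as after /s/ in spin.

```python
VOICELESS_STOPS = {"p", "t", "k"}

def realize(phonemes):
    """Map a word's phonemes to allophones with a simplified aspiration rule.

    Assumption: a voiceless stop is aspirated only when it is word-initial;
    everywhere else (e.g. after /s/) it remains unaspirated.
    """
    out = []
    for i, ph in enumerate(phonemes):
        if ph in VOICELESS_STOPS and i == 0:
            out.append(ph + "ʰ")  # aspirated allophone, e.g. [pʰ] in "pin"
        else:
            out.append(ph)        # plain allophone, e.g. [p] in "spin"
    return out

print(realize(["p", "ɪ", "n"]))       # ['pʰ', 'ɪ', 'n']
print(realize(["s", "p", "ɪ", "n"]))  # ['s', 'p', 'ɪ', 'n']
```

The two outputs never swap contexts, which is what "mutually exclusive distribution" means in practice.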

Vowels
Sounds produced with no significant obstruction of the airstream as it passes through the vocal tract.
There are three main organs of speech involved in changing the
size of the air chamber. These are
the lips - rounding, spreading
the lower jaw - lowered, raised
the tongue - raised, flattened, brought forward, etc.

Consonants

Consonants are articulated by restricting the airflow at some part of the vocal tract.
The consonant that is produced is determined by three factors: place, manner and voice.
Characterized by three features:
1) Place of articulation: Bilabial, Dental, Alveolar, Palatal, Velar
2) Manner of articulation: Stop (Plosive), Nasal, Trill, Tap (Flap), Fricative, Affricate, Lateral approximant
3) Voicing: voiced/voiceless
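Since every consonant is described by the same three features, a consonant inventory fits naturally in a lookup table keyed by symbol. The sketch below fills in only a few English consonants discussed in these slides (the table contents are standard classifications, but the selection and the helper name are illustrative) and finds pairs that differ in voicing alone, such as p/b.

```python
# (place, manner, voicing) for a few English consonants discussed in these slides
FEATURES = {
    "p": ("bilabial", "stop", "voiceless"),
    "b": ("bilabial", "stop", "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "d": ("alveolar", "stop", "voiced"),
    "n": ("alveolar", "nasal", "voiced"),
    "l": ("alveolar", "lateral approximant", "voiced"),
    "k": ("velar", "stop", "voiceless"),
    "g": ("velar", "stop", "voiced"),
}

def voicing_pairs(features):
    """Return (voiceless, voiced) pairs that share place and manner."""
    pairs = []
    for a, (place_a, manner_a, voice_a) in features.items():
        for b, (place_b, manner_b, voice_b) in features.items():
            if (place_a == place_b and manner_a == manner_b
                    and voice_a == "voiceless" and voice_b == "voiced"):
                pairs.append((a, b))
    return pairs

print(voicing_pairs(FEATURES))  # [('p', 'b'), ('t', 'd'), ('k', 'g')]
```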

Places of articulation
Bilabial
Bilabial sounds are those sounds made by the articulation of the lips against each other. Examples of such sounds in English are [p], [b] and [m].

Dental
Dental sounds are those sounds made by the articulation of the tip of the tongue towards the back of the upper teeth. Dental stops are not present in Standard American English, where [t] and [d] are alveolar, but in some Chicano English dialects and certain Brooklyn dialects, [t] and [d] are pronounced with a dental articulation.

Alveolar
Alveolar sounds are those sounds made by the articulation of the tip of the tongue towards the alveolar ridge, the bony ridge just behind the upper teeth. Examples of such sounds in English are [t], [d], [n] and [l].

Manner of articulation
Plosive/Stop
Plosive sounds are made by forming a complete obstruction to the flow of air through the mouth and nose. When the obstruction is released, the explosion of air causes a sharp noise.

Voiceless - p, t, k
Voiced - b, d, g

pit / bit
tip / dip
cot / got

Fricative
A fricative is a type of consonant that is formed by forcing air through a narrow gap so that a hissing sound is created.
The constriction is at the place of articulation for the particular sound:
f (as in far): between the lower lip and the upper teeth
sh (as in shut): between the tongue and the area just behind the alveolar ridge

Syllable
A syllable is a structural unit of sound consisting of a sequence of consonants and vowels. It is hierarchically composed of three parts:
Onset: initial consonant or consonant cluster
Nucleus: the vowel
Coda: final consonant or consonant cluster

Syllable structure: syllable -> onset + rime; rime -> nucleus + coda
Example (the word "strengths"): onset str, nucleus eh, coda nx ths
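For a one-syllable word, the hierarchy above can be recovered mechanically: the single vowel is the nucleus, everything before it is the onset, and everything after it is the coda. The sketch below assumes ARPAbet-style phone symbols like those in the example and a small, assumed vowel set; it does not attempt multi-syllable words.

```python
VOWELS = {"aa", "ae", "ah", "ao", "eh", "ih", "iy", "uh", "uw"}  # assumed subset

def split_syllable(phones):
    """Split one syllable into (onset, nucleus, coda); the rime is nucleus + coda.

    Assumes the input contains exactly one vowel, which forms the nucleus.
    """
    vowel_positions = [i for i, p in enumerate(phones) if p in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel (a single syllable)")
    v = vowel_positions[0]
    return phones[:v], phones[v], phones[v + 1:]

onset, nucleus, coda = split_syllable(["s", "t", "r", "eh", "nx", "th", "s"])
print(onset, nucleus, coda)  # ['s', 't', 'r'] eh ['nx', 'th', 's']
```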

Existing SR (speech recognition) systems
Dragon NaturallySpeaking
IBM ViaVoice
Philips FreeSpeech 2000
L&H (Lernout & Hauspie)
Voice Xpress

A Text-to-Speech Synthesis System

Fundamental Components of the TTS System
Text -> Text Pre-processing -> Prosody -> Concatenation

Text Pre-Processing

Input: string of characters (sentence)
Output: string of diphone symbols
Objective:
Perform sentence-level analysis (punctuation marks, pauses between words)
Convert all input to corresponding diphones

Text Pre-Processing (Block Diagram)


Number Converter -> Acronym Converter -> Word Segmenter -> Word-to-Diphone Translator (Phonetization) -> MLDS
Diphone Dictionary -> Word-to-Diphone Translator

Number Converter
Replace numerals with their textual versions
100 -> one hundred
Handle fractional and decimal numbers
0.25 -> point two five
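A number converter of this kind can be sketched as a small recursive speller for the integer part plus digit-by-digit reading after the decimal point. The function names and the 0 to 999 range limit are assumptions of this sketch; it follows the slide's convention of dropping a leading zero, so 0.25 is read as "point two five".

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def int_to_words(n):
    """Spell out an integer from 0 to 999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        return TENS[n // 10] + ("" if n % 10 == 0 else " " + ONES[n % 10])
    rest = n % 100
    return ONES[n // 100] + " hundred" + ("" if rest == 0 else " " + int_to_words(rest))

def number_to_words(token):
    """Convert a numeral string such as '100' or '0.25' to words."""
    whole, _, frac = token.partition(".")
    words = []
    # drop a leading zero before a decimal point ("0.25" -> "point two five")
    if whole and not (frac and int(whole) == 0):
        words.append(int_to_words(int(whole)))
    if frac:
        words.append("point")
        words.extend(ONES[int(d)] for d in frac)  # digits read one by one
    return " ".join(words)

print(number_to_words("100"))   # one hundred
print(number_to_words("0.25"))  # point two five
```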

Acronym Converter
Replace acronyms with single-letter components
A.B.C. -> ABC
Change abbreviations to full textual format
Mr. -> Mister
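Both acronym rules can be approximated with a lookup table for abbreviations and a regular expression for dotted acronyms. The abbreviation table below is hypothetical and tiny; a real converter would need a much larger dictionary and context to resolve ambiguous cases.

```python
import re

# Hypothetical abbreviation table; a real system would use a larger dictionary.
ABBREVIATIONS = {"Mr.": "Mister", "Mrs.": "Missus"}

# Dotted acronyms: two or more capital letters, each followed by a dot (A.B.C.)
DOTTED_ACRONYM = re.compile(r"\b(?:[A-Z]\.){2,}")

def expand(text):
    """Expand known abbreviations, then collapse dotted acronyms (A.B.C. -> ABC)."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return DOTTED_ACRONYM.sub(lambda m: m.group(0).replace(".", ""), text)

print(expand("Mr. Rao works at A.B.C."))  # Mister Rao works at ABC
```

One limitation of this sketch: when a dotted acronym ends a sentence, the sentence-final full stop is absorbed into the acronym and lost, so a real system has to decide separately whether that dot also marks the end of the sentence.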

Word Segmenter
Divide the sentence into word segments
Use a special delimiter to separate segments (i.e. ||)
Segments can be:
A single word
An acronym
A numeral
Identify punctuation marks
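A minimal word segmenter can split on whitespace, join the segments with the || delimiter, and record where each punctuation mark occurred so a later stage can insert pauses. This sketch assumes the number and acronym converters have already run, as in the block diagram, so any trailing punctuation on a token is treated as a real punctuation mark; the function name and return format are illustrative.

```python
PUNCTUATION = ".,;:!?"

def segment(sentence):
    """Split a sentence into '||'-delimited segments and record punctuation.

    Assumes numbers and acronyms were already converted, so trailing
    punctuation on a token is a genuine punctuation mark.
    """
    segments, marks = [], []
    for token in sentence.split():
        word = token.rstrip(PUNCTUATION)
        if word:
            segments.append(word)
        # remember which segment each punctuation mark followed (for pauses)
        for ch in token[len(word):]:
            marks.append((len(segments) - 1, ch))
    return "||".join(segments), marks

print(segment("Hello, Mister Rao."))
# ('Hello||Mister||Rao', [(0, ','), (2, '.')])
```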

Word-to-Diphone Converter (Phonetization)
Purpose: translate words to their diphone representations
Resource: dictionary of words and their diphones
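A diphone runs from the middle of one phone to the middle of the next, so a word's diphones are just its adjacent phone pairs, padded with silence at the word edges. The slides describe a dictionary that stores diphones directly; as a variation, this sketch assumes a hypothetical dictionary of ARPAbet-style phones and derives the diphones from it. The "_" silence symbol and "a-b" naming are also assumptions.

```python
# Hypothetical pronunciation dictionary (ARPAbet-style phones); not from the slides.
PRONUNCIATIONS = {
    "hello": ["hh", "ah", "l", "ow"],
    "world": ["w", "er", "l", "d"],
}

def word_to_diphones(word, lexicon=PRONUNCIATIONS, silence="_"):
    """Return a word's diphones: every adjacent phone pair, padded with silence."""
    phones = [silence] + lexicon[word.lower()] + [silence]
    return [a + "-" + b for a, b in zip(phones, phones[1:])]

print(word_to_diphones("hello"))
# ['_-hh', 'hh-ah', 'ah-l', 'l-ow', 'ow-_']
```

A word of n phones always yields n + 1 diphones, which is why diphone inventories are larger than phone inventories but capture the transitions between sounds.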

The Multi-Level Data Structure


Contains all necessary data for the next sub-system:
Word
Diphone representation
Prosodic parameters for each diphone
This reflects both word-level and sentence-level prosody
Allows for modularization
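One way to picture the MLDS is as nested records: a sentence holds words, each word holds its diphones, and each diphone carries its own prosodic parameters, while word-level fields carry sentence-level prosody such as pauses. All field names and default values below are hypothetical; the slides only list what the structure contains, not its layout.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DiphoneEntry:
    symbol: str                # diphone symbol, e.g. "hh-ah"
    pitch_hz: float = 120.0    # hypothetical per-diphone prosodic parameters
    duration_ms: float = 80.0

@dataclass
class WordEntry:
    text: str
    diphones: List[DiphoneEntry] = field(default_factory=list)
    pause_after_ms: float = 0.0  # sentence-level prosody, e.g. from punctuation

@dataclass
class MLDS:
    words: List[WordEntry] = field(default_factory=list)

    def diphone_symbols(self):
        """Flatten to the diphone sequence the concatenation stage will request."""
        return [d.symbol for w in self.words for d in w.diphones]

    def total_duration_ms(self):
        """Sum of diphone durations plus inter-word pauses."""
        speech = sum(d.duration_ms for w in self.words for d in w.diphones)
        pauses = sum(w.pause_after_ms for w in self.words)
        return speech + pauses
```

Keeping this data in one structure is what allows the modularization mentioned above: the pre-processing stage fills it in, and the prosody and concatenation stages only read and adjust it.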

Prosody
[Flowchart: MLDS -> Diphone Retrieval (Diphone Database) -> Acoustic Manipulation -> done? (no: next diphone; yes: Concatenation)]

Diphone Retrieval
Database of recorded diphones
Every diphone is matched with a txt file
Diphones are distinguished by type (CC, CV, VC, VV; C = consonant, V = vowel)
References to specific components within the waveform
Store the diphone waveform and prosodic parameters in variables
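The CC/CV/VC/VV type of a diphone follows directly from whether each of its two phones is a consonant or a vowel, which makes it a convenient key for organizing the database. This sketch assumes the "a-b" diphone naming and an ARPAbet-style vowel set, and treats silence as a consonant; all three are assumptions, since the slides only name the four types.

```python
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}  # assumed ARPAbet-style vowel set

def diphone_type(diphone, vowels=VOWELS):
    """Classify an 'a-b' diphone as CC, CV, VC or VV (silence counts as C here)."""
    a, b = diphone.split("-")
    return ("V" if a in vowels else "C") + ("V" if b in vowels else "C")

def group_by_type(diphones):
    """Bucket diphones by type, e.g. to locate them in the recorded database."""
    groups = {"CC": [], "CV": [], "VC": [], "VV": []}
    for d in diphones:
        groups[diphone_type(d)].append(d)
    return groups

print(group_by_type(["_-hh", "hh-ah", "ah-l", "l-ow", "ow-_"]))
```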

Conclusion
Diphones -> Words
Using PSOLA (Pitch-Synchronous Overlap-Add) at the joining ends ensures a smooth transition
Words -> Sentence
Straight joining at the end points, due to the presence of pauses
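The two joining strategies can be contrasted in a short sketch. PSOLA itself aligns the overlap to pitch periods and is more involved, so this sketch swaps in a plain linear crossfade as a simpler stand-in for smoothing diphone joints, and uses straight concatenation with inserted silence for joining words. The function names, the overlap length and the pause length are assumptions.

```python
def crossfade_join(a, b, overlap):
    """Join two sample lists with a linear crossfade over `overlap` samples.

    A simple stand-in for smoothing diphone joints; it is NOT PSOLA, which
    also aligns the overlapping region to pitch periods.
    """
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head, tail = a[:len(a) - overlap], b[overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight moves gradually from a to b
        mixed.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + mixed + tail

def join_words(words, pause_len):
    """Straight concatenation of word waveforms separated by silence (a pause)."""
    out = []
    for i, w in enumerate(words):
        if i:
            out.extend([0.0] * pause_len)
        out.extend(w)
    return out

print(crossfade_join([1.0] * 4, [0.0] * 4, 2))
print(join_words([[1.0], [2.0]], 2))  # [1.0, 0.0, 0.0, 2.0]
```

Straight joining works at word boundaries because the pause hides any discontinuity, whereas diphones inside a word meet mid-phone and need the smoothing.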

THANK YOU
