Module-2 Basics of Sound

AYAZ AHMAD KHAN

M.A., Audiology and Speech Pathology
Auditory Verbal Therapist, AG Bell Academy USA (certification in progress)
Founder, Auditory Verbal Pakistan
Speech Therapist at HHRD
How do we hear?
Speech Acoustics

Speech acoustics reflect how a speaker is speaking on a particular occasion. Like any other speech parameter, they are not invariant and show both within-speaker and between-speaker variation.
Why Do We Need to Understand Speech Acoustics?
Audition is the only sense capable of appreciating all aspects of speech. As professionals focused on developing listening and spoken language in children with hearing loss, we must understand that auditory access is key. Speech acoustics is the framework for understanding the acoustic properties of speech phonemes and their relationship to an audiogram. By understanding speech acoustics, we can identify which sounds of speech children are hearing, better understand their perception of those sounds, and use that information to plan therapy.
Basics of Sound
 The perception of sound is based on three dimensions that are relevant and have applications to our work: duration, intensity, and pitch.
 Duration is the perceived length of the acoustic event. Loudness is the perception of intensity. Frequency is perceived as pitch.
 Loudness is the perception of how strong a sound is; it can be measured objectively in units of decibels (dB) and represented on an audiogram.
Why are there adult-infant differences?

 Ear canal

 Ossicles

 Tympanic membrane
External auditory canal

 Continues to grow until 7 years of age
 At birth, length <15 mm and average volume <0.5 cc (adult: 28-40 mm, 2 cc)
 Average diameter of the ear canal at 1 month is 4.4 mm (adult: 10.4 mm)
 Inner 1/3 of the canal becomes osseous around 1 to 4 years of age


Tonotopic organization of cochlea
Audiogram

 A detection audiogram is a line graph of the dimensions of sound that indicates what a person hears.
 It represents the intensity level at which a threshold is measured for a designated frequency.
 Sounds are plotted against the graph's horizontal (frequency) and vertical (intensity) axes. Sounds below the threshold line (i.e., louder) can be heard, while those above it cannot.
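The threshold logic described above can be sketched in a few lines of code. This is a minimal illustration; the threshold values are invented for the example and do not come from any real audiogram.

```python
# Illustrative thresholds (dB HL) at audiogram frequencies (Hz).
# These values are hypothetical, chosen only to demonstrate the logic.
thresholds_db_hl = {250: 30, 500: 40, 1000: 50, 2000: 60, 4000: 70}

def is_audible(freq_hz: int, level_db_hl: float) -> bool:
    """A sound is audible when it is at or louder than the threshold,
    i.e., it falls below the plotted line on the audiogram."""
    return level_db_hl >= thresholds_db_hl[freq_hz]

print(is_audible(1000, 65))  # louder than the 50 dB HL threshold -> True
print(is_audible(4000, 45))  # softer than the 70 dB HL threshold -> False
```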


A Sound Basis: The Applications

 Hearing Loss and Detection
 The analogy of a submarine is effective when considering an audiogram and hearing loss (Rotfleisch, 2000). The analogy compares the audiogram, hearing levels, and speech to the water level and a submarine: the audiogram, i.e., the threshold levels of hearing, can be represented by the water level (Ling, 2002).


Modifying the Signal
 The first way to improve the signal for a child with hearing loss is through the consistent use of technology.
 Some children with hearing loss will derive little or no benefit from hearing aids and conventional forms of hearing technology.
Ear Shot/Speech Bubble

 The distance range within which we can hear is referred to as earshot (Ling, 1980) or the speech bubble (Anderson, 2002).
 Earshot is impacted by the child's hearing loss, the speaker's clarity and intensity of speech, and ambient noise conditions.
 As professionals, we must determine the child's speech bubble during therapy sessions and with input from parents.
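A rough sketch of why earshot shrinks with distance: in free field, speech level drops about 6 dB for every doubling of distance (inverse-square law). The 60 dB SPL conversational level at 1 m and the "louder than the noise" audibility criterion are illustrative assumptions, not clinical values.

```python
import math

def speech_level_at(distance_m: float, level_at_1m_db: float = 60.0) -> float:
    """Free-field estimate: level falls 20*log10(d) dB relative to 1 m."""
    return level_at_1m_db - 20 * math.log10(distance_m)

def within_earshot(distance_m: float, noise_floor_db: float) -> bool:
    # Simplification: count speech as usable only while it exceeds the
    # ambient noise level.
    return speech_level_at(distance_m) > noise_floor_db

print(round(speech_level_at(2.0), 1))  # about 54.0 dB: 6 dB down at 2 m
print(within_earshot(4.0, 50.0))       # ~48 dB speech in 50 dB noise -> False
```

A child's actual speech bubble also depends on the hearing loss and the technology in use, so a real estimate would fold in the audiogram as well.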
Background Noise and Noise Clutter

 A sound measured at 10 dB greater has 10 times the acoustic intensity.
 An 80-dB noise or sound such as a piano or vacuum cleaner is thus 10 times more intense than 70-dB sounds such as a dog barking or a phone ringing, and 100 times more intense than a 60-dB normal conversational level (John Tracy Audiogram of Familiar Sounds, 2012; Ling, 1989).
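The decibel arithmetic above follows directly from the logarithmic definition of the scale: every 10 dB corresponds to a tenfold change in intensity. A one-line sketch:

```python
def intensity_ratio(db_a: float, db_b: float) -> float:
    """How many times more intense sound A is than sound B,
    given their levels in dB (10 dB difference = 10x intensity)."""
    return 10 ** ((db_a - db_b) / 10)

print(intensity_ratio(80, 70))  # 10.0  (piano vs. dog barking)
print(intensity_ratio(80, 60))  # 100.0 (piano vs. conversation)
```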


Audible Versus Intelligible
 An audiogram is a graph of detection of the sounds that can be heard by a person; thus, an audiogram indicates what is audible. However, merely hearing a sound is no indication of the quality of the sound, or of whether it can be discriminated and identified.
 The sound might not be loud enough to provide the perception needed for intelligibility of the word. Therefore, hearing (that is, detecting) words and discriminating them as different by number of syllables (e.g., sneak versus sneaking) might be possible; that would be an indication of audibility, not intelligibility.
Sounds of Speech
 Speech sounds, or phonemes, are created by the complex speech mechanism, which includes four systems in our bodies: the respiratory, phonatory, resonating, and articulatory systems. We create sounds by modifying the airflow through these systems (Pickett, 1999; Ladefoged & Johnson, 2015).
 Speech sounds have frequency and intensity elements that are important to understand. Phonemes can vary in frequency from 250 to 6000 Hz and above. The lowest-frequency speech sounds are phonemes such as /m/ and the low components of /u/. The highest-frequency sounds are phonemes such as /s/ and /f/.
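The frequency range above can be turned into a simple audibility check: given the upper frequency limit of a child's hearing, which phonemes fall within reach? The center frequencies below are rough illustrations consistent with the slides (/m/ and /u/ low, /s/ and /f/ high), not measured values.

```python
# Hypothetical main-energy regions (Hz) for a few phonemes, for illustration.
phoneme_region_hz = {"m": 300, "u": 300, "s": 5000, "f": 4500}

def phonemes_below(cutoff_hz: int) -> list[str]:
    """Phonemes whose main energy falls below a given frequency cutoff."""
    return [p for p, f in phoneme_region_hz.items() if f < cutoff_hz]

print(phonemes_below(1000))  # ['m', 'u']: the high-frequency /s/ and /f/ are missed
```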
Applications: Basic Principles

 When we consider access related to frequency areas on the audiogram, low-frequency and high-frequency information have some very different properties. Low-frequency information is typically easier to hear, more accessible, and carries more acoustic energy. High-frequency information is typically more difficult to hear, less accessible, and carries less acoustic energy.


The 90/10 Dilemma
 An estimated 90% of the acoustic energy of speech is in the low frequencies but carries only about 10% of the information related to spoken language (Mueller & Killion, 1990).
 The information in those low frequencies is less important to the perception of speech sounds and the meaning of the message. The remaining 10% of the acoustic energy is in the high frequencies and carries the 90% of information that is critical to the perception of speech sounds and the meaning of the message (Mueller & Killion, 1990).
Continued…
 As an illustration, this means that vowels are significantly louder than consonants such as /s/, /f/, /t/, and /k/. It is easier to discriminate morphemes that have lower-frequency components such as a vowel. A vowel adds a syllable, as in the difference between run and running, runner, or runny, and takes advantage of the 90% of acoustic energy. When we add the morpheme /s/ in runs, or change the word to runt by adding /t/, the final consonant carries acoustic information from the 10%.


Speech Features and Acoustic Correlates
 Speech patterns are composed of, and classified into, two main categories: supra-segmentals (also known as non-segmentals) and segmentals. The supra-segmental aspect has to do with the prosody of the production, as perceived through rate, rhythm, and intonation. These features are described using the dimensions of sound: duration, intensity, and pitch.


Speech Features

 Suprasegmentals (nonsegmentals): These features of speech production are superimposed on all of our vocalizations. The supra-segmentals allow us to control and modify rate, rhythm, stress, and intonation (Ladefoged & Johnson, 2015; Ling, 1988, 2002).

 Duration: long, normal, short, interrupted
 Intensity: loud, normal, quiet, whispered
 Pitch: high, normal, low, variable
Segmental
 Vowels and Diphthongs: front/back, tense/lax
 Consonants. Manners of Production: the way the sound is produced

Nasals: resonating of the air in the nasal cavity, with the mouth typically closed and the velopharyngeal port opening the nasal passage

Plosives: exploding sound created with a buildup of pressure and a burst when released; can also be produced with the pressure not released

Fricatives: blowing sounds with friction between articulators producing an oral breath stream

Affricates: a combination of stop plosives and fricatives

Semivowels: the sound made by the movement between vowels

Liquids (laterals): two articulators in close approximation preventing full release of the breath stream
Continued…

 Place of Production: the anatomical place where the sound is produced
• Bilabial: both lips
• Labiodental: lower lip and upper front teeth
• Lingua-dental: tongue tip and teeth
• Alveolar: tongue tip and ridge behind upper front teeth
• Palatal: front of tongue and hard palate
• Velar: back of tongue and soft palate
• Glottal: originating at the vocal cords

 Voicing
• Voiced: vocal folds vibrating
• Unvoiced: vocal folds not vibrating
The Applications Related to Speech Features

 Supra-segmentals and vowels are foundational to both the ability to produce speech that others can easily understand and the nuances of the linguistic message.
 Supra-segmentals are dependent on vowels and diphthongs. Vowels and diphthongs provide the opportunity for modulating duration, intensity, and pitch (DIP), which carry the melody of speech and much underlying meaning.

Acoustic Cues for Speech Features
 Vocalization
 125 Hz: male fundamental frequency (F0)
 250 Hz: low harmonics of adult male voices; fundamental frequency (F0) of female and child voices
 750 Hz: minimum level to hear all fundamental frequencies
 1000 Hz: harmonics of most voices
 2000 Hz: harmonics of most voices
 4000 Hz: upper range of harmonics of most voices


Supra-segmentals, Vowels, and Diphthongs

 250 Hz: F1 of high back and front vowels
 500 Hz: F1 of most vowels
 1000 Hz: to detect all vowels; F2 and T1 of back and central vowels
 2000 Hz: F2 and T2 of front vowels
 3000 Hz: to discriminate among vowels
 4000 Hz: F3 and T3 of most vowels


Consonants: Manners of Production
 300 Hz: nasal murmur; F1 for /m/, /n/, /ŋ/
 500 Hz: primary cues for most consonants; T1 of the semivowels
 750 Hz: allows for the ability to discriminate nasals from plosives
 1000 Hz: noise burst of most plosives; F2 of nasal consonants; T2 of the semivowels; additional cues for most consonants
 1500 Hz: F1 of lateral liquids /r/, /l/
 2000 Hz: additional cues; noise burst of most plosives and affricates; unvoiced plosives /p/, /k/, /t/; turbulent noise of fricatives /ʃ/, /f/, /θ/ (unvoiced); T2 and T3 of /l/ and /r/
Consonants: Place of Production

 1500+ Hz: F2, T2, and burst frequency
 1500 Hz: primary cues
 4000 Hz: secondary cues

Consonants: Voicing

 250 Hz: voicing cues
 500 Hz and below: duration and intensity differences
 750 Hz: vocal cord vibration; voiced sounds have at least one formant present at or below this frequency
 3000 Hz: cues for unvoiced sounds, which have no energy below 1000 Hz
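The frequency tables above lend themselves to a simple lookup: given the highest frequency a child has aided access to, list which voicing cues are within reach. The cue text follows the slides; the hard-cutoff logic is an illustrative simplification, since real audibility is graded, not all-or-nothing.

```python
# Voicing cues by audiogram frequency band (Hz), taken from the table above.
voicing_cues = {
    250: "voicing cues",
    500: "duration and intensity differences",
    750: "vocal cord vibration",
}

def accessible_cues(upper_limit_hz: int) -> list[str]:
    """Cues whose frequency band lies within the child's audible range."""
    return [cue for freq, cue in sorted(voicing_cues.items())
            if freq <= upper_limit_hz]

print(accessible_cues(600))
# ['voicing cues', 'duration and intensity differences']
```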


Supra-segmentals: Important Factors and Applications
 Identified through duration, intensity, and pitch (DIP)
 Carry the melody of speech
 Carry pragmatic meaning/coding, e.g., question, sarcasm, mood of speaker
 Are the foundation of child- and infant-directed speech
 This information is not available through vision and must be accessed through audition
 Can be accessed if the child has audition up to 1000 Hz

Continued…
 Babies begin to discriminate the different features of DIP with auditory access
 Babies control their early vocal productions using these elements
 Babies will learn to express different vocal emotions by controlling these elements
 Control of supra-segmentals is a preliminary skill developed during the stages of babbling, where control is learned and practiced
 Control and correct production of supra-segmentals are critical foundational skills for the production of intelligible speech
Vowels: Important Factors and Applications

 Vowels allow for modulation of the supra-segmental aspects of speech
 Auditory information is critical to detect and discriminate vowels
 The first two formants are critical to making vowels distinct and identifiable
 It is possible to detect but not discriminate or identify vowels
 Discrimination and identification of specific vowels require access to both F1 and F2
 Access to F1 permits detection of a vowel
 All vowel F1 values are below 1000 Hz
Continued…
 Vowel F2 values range from low to high frequency
 The sentence "Who would know more of art must again learn and then take his ease" (Ling & Ling, 1978) lists vowels such that F2 is in ascending order
 A vowel's F2 carries information about the adjacent consonants, found in the 750 to 2500 Hz range
 Acoustically similar vowels have similar F1 and differ by F2, and therefore can be confused (e.g., /u/ versus /i/)
 All vowels are voiced
 Vowels cannot be identified through speechreading, since they are dependent on tongue position
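The /u/-versus-/i/ confusion above can be sketched numerically: these vowels have similar first formants and are separated mainly by F2, so a listener without audibility in the F2 region cannot tell them apart. The formant values are textbook approximations for adult male speech, and the "differs by more than 200 Hz" rule is an invented simplification for illustration.

```python
# Approximate (F1, F2) in Hz for three vowels; illustrative textbook values.
vowel_formants = {"i": (270, 2290), "u": (300, 870), "a": (730, 1090)}

def distinguishable(v1: str, v2: str, upper_limit_hz: int) -> bool:
    """Crude rule: two vowels are distinguishable only if some formant pair
    that separates them (differs by > 200 Hz) is fully within the audible
    range (both formants at or below the upper limit)."""
    pairs = zip(vowel_formants[v1], vowel_formants[v2])
    return any(abs(f1 - f2) > 200 and max(f1, f2) <= upper_limit_hz
               for f1, f2 in pairs)

print(distinguishable("i", "u", 1000))  # F2s differ, but lie above 1 kHz -> False
print(distinguishable("i", "u", 4000))  # F2 difference now audible -> True
```

Access to F1 alone (all below 1000 Hz) would still permit detection of either vowel; it is identification that fails, matching the audible-versus-intelligible distinction earlier in the module.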
Consonants: Important Factors and Applications

There are three defining features for consonants: manner, place, and voicing.

• These acoustic features are in low-, mid-, and high-frequency areas
• Many acoustic features overlap with features of other supra-segmentals, vowels, diphthongs, and other consonant features
• Consonants carry only 10% of the acoustic energy but 90% of the critical speech perception information
• Consonants are critical to understanding the specifics of words and language
• Nasals, liquids, and semivowels are always voiced
Continued…
Place cue information is carried primarily by the second formant transition (T2) and burst frequency.

• The voicing feature is noted by the presence or absence of vibration of the vocal folds
• Features have specific acoustic correlates that can be related to the different frequency bands on the audiogram
• When morphemes such as /s/ or the past tense -ed are added to a word, they take on the voicing characteristic of the previous phoneme
• A word-final stop plosive will be perceived as voiceless when the preceding vowel is short in duration, and as voiced when the vowel is longer in duration (Lisker, 1964; Ling, 2002)
• Acoustic highlighting eliminates the low-frequency energy of voiced phonemes to help access unvoiced phonemes
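The final-stop voicing cue cited above (Lisker, 1964) can be sketched as a toy classifier: the preceding vowel tends to be longer before a voiced stop (as in "bad") than before a voiceless one (as in "bat"). The 150 ms boundary is an illustrative threshold, not a value from the slides or from Lisker's data.

```python
def perceived_final_stop(vowel_duration_ms: float) -> str:
    """Toy perceptual rule: a longer preceding vowel biases the listener
    toward hearing the word-final stop as voiced."""
    return "voiced" if vowel_duration_ms > 150 else "voiceless"

print(perceived_final_stop(200))  # longer vowel -> "voiced" (e.g., "bad")
print(perceived_final_stop(100))  # shorter vowel -> "voiceless" (e.g., "bat")
```

For a child whose audibility is limited in the high frequencies, this durational cue in the low-frequency vowel can substitute for the inaudible burst of the stop itself.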
