Chapter 1: Speech Mechanism: Introduction To Phonology
Introduction to Phonology
Phonology is the study of the sounds of a particular language (e.g., English). In phonology, it
matters whether sounds are contrastive or not, that is, whether substituting one sound for another
gives a different, or "contrastive," meaning. For example, in English, [r] and [l] are two different
sounds, and the words "road" and "load" differ according to which of these sounds is used.
Similarly, phonologists describe the contrastive consonants and vowels in a sound system
(language). They are also interested in syllables, phrases, rhythm, tone, and intonation of a
specific language.
Introduction to Phonetics
Phonetics, as a discipline, is the study of human speech sounds. It includes the understanding of
how sounds are articulated using the mouth, nose, teeth and tongue, and how the ears hear those
sounds and can tell them apart. In phonetics, the physical properties (such as the waveform of each
sound) can also be analyzed with the help of computer programs (e.g., Praat). There are three
major branches of phonetics: articulatory, acoustic and auditory phonetics.
Phonetics and phonology are both important subfields of linguistics dealing with speech sounds,
and they overlap each other. The key difference is that phonology is the study of how sounds are
organized into patterns (e.g., phonological rules within a specific language). The key words for
describing phonology are ‘distribution’ and ‘patterning’ related to speech. Phonologists may look into
questions such as why there is a difference in the plurals of cat and dog: the former ends with an
‘s’ sound, whereas the latter ends with a ‘z’ sound. Phonetics, on the other hand, is the study of
the actual process of sound making. The word phonetics is derived from the Greek word ‘phone’,
meaning sound or voice. It covers the domain of speech production and its transmission and
reception. The sounds we make when we talk are studied through different branches of
phonetics.
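The plural pattern mentioned above (cat vs. dog) can be written as a small rule. The following is only a sketch under stated assumptions: words are given as simplified lists of phoneme symbols, the function name `plural_suffix` is hypothetical, and the third case (the ‘ɪz’ ending after sibilants, as in ‘buses’) is not discussed in the text but is included because it belongs to the same well-known English pattern.

```python
# Hypothetical helper: choose the English plural ending from the final sound.
VOICELESS = {"p", "t", "k", "f", "θ"}          # voiceless, non-sibilant consonants
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}   # assumption: these take the 'ɪz' ending


def plural_suffix(phonemes):
    """Return the plural ending for a word given as a list of phoneme symbols."""
    final = phonemes[-1]
    if final in SIBILANTS:
        return "ɪz"
    if final in VOICELESS:
        return "s"
    return "z"  # every other (voiced) final sound, including vowels


print("cat ->", plural_suffix(["k", "æ", "t"]))
print("dog ->", plural_suffix(["d", "ɒ", "g"]))
```

The choice of ending depends only on the last sound of the word, which is the kind of distributional regularity phonologists describe.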
There are various terms which are frequently used in phonetics and phonology. They mainly
include phone, phoneme and allophone. For better understanding, we need to distinguish among
them.
A phone is a sound (or a segment) which has some physical feature and the term is mostly used
in a non-technical sense.
A phoneme is the smallest unit of sound that can distinguish meaning (and therefore the smallest
unit in phonology) in a language: replacing one phoneme with another changes one word into another
word. For example, the difference between ‘white’ and ‘right’ (ignore spellings here, focus on
sounds) is a difference of sounds (w – r) which are phonemes, and they have the ability to
change meaning. Similarly, take another example of ‘cat’ vs. ‘bat’ (k – b). Linguists have also
defined the phoneme as a group or class of sound events having common patterns of articulation. If
the phoneme is a group, then allophones are the group members. Let us now discuss the allophone.
‘s’ sound in words like sill, still and spill or in words like seed, steed and speed
‘k’ sound in words like, key and car
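The idea that swapping one phoneme changes one word into another (‘white’ vs. ‘right’, ‘cat’ vs. ‘bat’) can be checked mechanically. This is a minimal sketch: the transcriptions are simplified and assumed for illustration, and `is_minimal_pair` is a hypothetical helper name.

```python
def is_minimal_pair(a, b):
    """True if two transcriptions have equal length and differ in exactly one segment."""
    if len(a) != len(b):
        return False
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


# Simplified phonemic transcriptions (assumed for illustration)
cat = ["k", "æ", "t"]
bat = ["b", "æ", "t"]
white = ["w", "aɪ", "t"]
right = ["r", "aɪ", "t"]

print(is_minimal_pair(cat, bat))      # k vs. b
print(is_minimal_pair(white, right))  # w vs. r
print(is_minimal_pair(cat, right))    # differs in more than one segment
```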
Introduction to Vowels
There are 44 sounds in the English RP (BBC) accent. Of these, 20 are vowels, which are
further divided into pure vowels and diphthongs. There are 12 pure vowels (monophthongs), of
which 5 are long and 7 are short. Examples of these vowel sounds are given here:
Short vowels
• ɪ pit
• e pet
• æ pat
• ʌ putt
• ɒ pot
• ʊ put
• ǝ another
Long vowels
• iː bean
• ɑː barn
• ɔː born
• uː boon
• ɜː burn
English diphthongs are divided into two categories: centering (which end with the ‘ǝ’ sound)
and closing (which end with either the ‘ɪ’ or the ‘ʊ’ sound). Examples for these
diphthongs are given here:
Diphthongs
• ɪǝ peer
• eǝ pair
• ʊǝ poor
• eɪ bay
• w wet
• j yet
Vowels:
• Long vowels iː ɑː ɔː uː ɜː
• Short vowels ɪ e æ ʌ ɒ ʊ ǝ
• Diphthongs eɪ aɪ ɔɪ ǝʊ aʊ ɪǝ eǝ ʊǝ
Consonants:
• Plosives p b t d k g
• Nasals m n ŋ
• Fricatives f v θ ð s z ʃ ʒ h
• Affricates ʧ ʤ
• Approximants l r w j
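The inventory above can be tallied to confirm that it adds up to the 44 sounds (20 vowels plus 24 consonants) stated for the RP (BBC) accent. The sketch below simply copies the symbols from the lists; the dictionary layout and the `count` helper are assumptions made for illustration.

```python
# Symbols copied from the summary lists; grouping names follow the text.
VOWELS = {
    "long": "iː ɑː ɔː uː ɜː".split(),
    "short": "ɪ e æ ʌ ɒ ʊ ǝ".split(),
    "diphthongs": "eɪ aɪ ɔɪ ǝʊ aʊ ɪǝ eǝ ʊǝ".split(),
}
CONSONANTS = {
    "plosives": "p b t d k g".split(),
    "nasals": "m n ŋ".split(),
    "fricatives": "f v θ ð s z ʃ ʒ h".split(),
    "affricates": "ʧ ʤ".split(),
    "approximants": "l r w j".split(),
}


def count(groups):
    """Total number of symbols across all groups of an inventory."""
    return sum(len(symbols) for symbols in groups.values())


print("vowels:", count(VOWELS))
print("consonants:", count(CONSONANTS))
print("total:", count(VOWELS) + count(CONSONANTS))
```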
Types of phonetics:
Phonetics is the scientific study of speech sounds. It has three major branches:
articulatory phonetics, acoustic phonetics and auditory phonetics. Phonetics as a field of study
has a long history, going back well over two thousand years. The central
concerns in phonetics are the discovery of how speech sounds are produced; how they are used
in spoken language; how we can record speech sounds with written symbols and how we hear
and recognize different sounds. In the first of these areas, when we study the production of
speech sounds we can observe what speakers do (articulatory observation) and we can try to feel
what is going on inside our vocal tract (kinesthetic observation). The second area is where it
overlaps with phonology: usually in phonetics we are only interested in sounds that are used in
meaningful speech, and phoneticians are interested in discovering the range and variety of
sounds used this way in all the known languages of the world. This is sometimes known as
linguistic phonetics. Thirdly, there has always been a need for agreed conventions for using
phonetic symbols that represent speech sounds; the International Phonetic Association has
played a very important role in this regard. Finally, the auditory aspect of speech is very
important: the ear is capable of making fine discrimination between different sounds, so much so
that sometimes it is not possible to define in articulatory terms precisely what the difference is.
There are various other fields which are newly emerging and taking phonetics into account for a
detailed analysis such as ‘instrumental phonetics’, ‘applied research in speech technology’ and
Articulatory Phonetics
Articulatory phonetics is the branch of phonetics which deals with the making of single sounds by the
vocal tract. In this branch of phonetics we study the way in which speech sounds are made
(‘articulated’) by the vocal organs. It derives much of its descriptive terminology from the fields
of anatomy and physiology, and is sometimes referred to as physiological phonetics. This area
has traditionally held a central place in the training of phoneticians, the movements involved
being reasonably accessible to observation and, in principle, under the control of the
investigator. The classification of sounds used in the International Phonetic Alphabet (IPA), for
example, is based on articulatory variables. In recent years, there has been much progress in
the development of instrumental techniques for observing and measuring such factors as tongue,
lip, palate and vocal fold movement. Important discussions included in this field are: airstream
mechanisms, phonation (voicing), and other processes such as the oro-nasal process and the
description of vowel production.
Acoustic Phonetics
Acoustic phonetics is related to the study of physical attributes of sounds produced by the vocal
tract. It is the branch of phonetics which studies the physical properties of speech sound as
transmitted between mouth and ear according to the principles of acoustics (the branch of
physics devoted to the study of sound). It is primarily dependent on the use of instrumental
techniques. Its importance to the phonetician is that acoustic analysis can provide a clear, objective datum for
Auditory Phonetics
Auditory phonetics deals with understanding how the human ear perceives sound and how the brain
recognizes different speech units. This branch of phonetics studies the perceptual response to
speech sounds as mediated by ear, auditory nerve and brain. It is a less well-studied area of
phonetics, mainly because of the difficulties encountered as soon as one attempts to identify and
measure psychological and neurological responses to speech sounds. On the other hand,
anatomical and physiological studies of the ear are well advanced, as are techniques for the
measurement of hearing, and the clinical use of such studies is now established under the
headings of audiology and audiometry. But relatively little pure research has been done into the
attributes of speech-sound sensation, seen as a phonetic system, and the relationship between
such phonetic analyses and phonological studies remains obscure. The subject is closely related
Articulatory Phonetics
It is the branch of phonetics that studies articulators and their actions related to human speech
production. We can only produce speech sounds by moving parts of our body, and this is done by
the contraction of muscles. Most of the movements relevant to speech take place in the mouth and
throat area, and in the chest for breath control. The parts of the mouth and throat area that we
move when speaking are called articulators. In this branch of
phonetics, we study the principal articulators (such as tongue, lips, lower jaw and the teeth,
velum or soft palate, uvula and larynx) and other processes related to speech production. This
includes the features of various sounds such as vowels and consonants and their specific
Speech Mechanism
The process of speech production mainly includes respiration, phonation, articulation and
resonance. This simply means that in order to produce speech, we need the air stream
mechanism (so that the process of speech is activated), the exploitation of the air stream at
the larynx (this process is called phonation or voicing), the modification of the air passage with the
help of articulators in the cavity (either oral or nasal) and finally the transfer of energy. In
phonetics, speech production is a term used for the activity of the respiratory, phonatory and
articulatory systems during speech, along with the associated processes required for their co-
ordination and use. A contrast is usually drawn with the receptive aspects of spoken
For the anatomy of speech, some experts (such as Ladefoged) highlight the following four main
components: the airstream process, the phonation process, the oro-nasal process, and the
articulatory process. The airstream process includes all the ways of pushing air out that provide
energy for speech. The phonation process is the name given to the actions of the vocal folds. The
oro-nasal process is the possibility of the airstream going out through the mouth, as in [v] or [z], or the
nose, as in [m] and [n]. Finally, the movements of the tongue and lips interacting with the
roof of the mouth and the pharynx are part of the articulatory process.
Places of Articulation
Sound Waves
A sound wave is the pattern of disturbance caused by the movement of energy traveling through
air (sound always travels in the shape of waves in the air). Sound basically consists of small
variations in air pressure that occur very rapidly one after another. These variations are caused
by actions of the speaker’s vocal organs that are (for the most part) superimposed on the
outgoing flow of lung air. Thus, in the case of voiced sounds, the vibrating vocal folds chop up
the stream of lung air so that pulses of relatively high pressure alternate with moments of lower
pressure. Variations in air pressure in the form of sound waves move through the air somewhat
like the ripples on a pond. When they reach the ear of a listener, they cause the eardrum to
vibrate.
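The rapid, regular alternation of higher and lower pressure described above can be imitated with a deliberately simplified model. The sketch below samples a pure sine tone as a stand-in for a pressure wave; real speech is far more complex, and the function names, the 8000 Hz sample rate and the 100 Hz example frequency are all assumptions chosen for illustration.

```python
import math


def sine_wave(frequency_hz, duration_s, sample_rate=8000):
    """Sample a pure tone: pressure deviation from the resting level over time."""
    n_samples = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * frequency_hz * i / sample_rate)
            for i in range(n_samples)]


def count_cycles(samples):
    """Count upward zero crossings, i.e. swings from lower to higher pressure."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur)


wave = sine_wave(frequency_hz=100, duration_s=0.1)
print(len(wave), "samples,", count_cycles(wave), "upward zero crossings")
```

Each upward crossing marks the start of a new high-pressure pulse, so a higher frequency gives more pulses in the same stretch of time.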
The possibility of the airstream going out through the mouth, as in [v] or [z], or the nose, as in
[m] and [n], is determined by the oro-nasal process. Consider the consonants at the end of rang,
ran, ram (ŋ, n, m), which are all nasal sounds. When you say these consonants by themselves, note
that the air is coming out through the nose. In the formation of these sounds in a sequence, the
point of articulatory closure moves forward, from velar in ‘rang’, through alveolar in ‘ran’, to
bilabial in ‘ram’. In each case, air is prevented from going out through the mouth but is able to
go out through the nose because the soft palate, or velum, is lowered. In most speech, the soft
palate is raised so that there is a velic closure. When it is lowered and there is an obstruction in
the mouth, we say that there is a nasal consonant. Raising or lowering the velum controls the
oro-nasal process, the distinguishing factor between oral and nasal sounds.
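The conditions for a nasal consonant given above (velum lowered, plus an obstruction in the mouth) and the forward movement of the closure in rang, ran, ram can be summarized in a few lines. This is only an illustrative sketch; the dictionary and function names are hypothetical.

```python
# Final nasal of each example word and the place of its oral closure (from the text).
WORD_FINAL_NASAL = {"rang": "ŋ", "ran": "n", "ram": "m"}
NASAL_PLACE = {"ŋ": "velar", "n": "alveolar", "m": "bilabial"}


def is_nasal_consonant(velum_lowered, oral_closure):
    """A nasal consonant needs a lowered velum and an obstruction in the mouth."""
    return velum_lowered and oral_closure


for word in ("rang", "ran", "ram"):
    nasal = WORD_FINAL_NASAL[word]
    print(word, "ends in", nasal, "with a", NASAL_PLACE[nasal], "closure")
```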
In order to fully describe a sound, we need to know various actions made by articulators during
the process of articulation. The articulators make gestures required for speech by moving toward
other articulators to produce speech sounds. This movement is called articulatory gesture.
Bearing all these terms in mind, let us go through the major articulatory gestures used in the
production of English sounds:
Bilabial: This sound is made with two lips (for example, /p/ and /b/). The lips come together for
these sounds.
Labiodental: This sound is made when the lower lip is raised to touch the upper front teeth (for
example, /f/ and /v/).
Dental: This sound is made with the tongue tip or blade and upper front teeth. For example, say
the words thigh, thy and you will find the first sound in each of these words to be dental.
Alveolar: This sound is made with the tongue tip or blade and the alveolar ridge. You may
pronounce words such as tie, die, nigh, sigh, zeal, lie using the tip of the tongue or the blade of
the tongue for the first sound in each of these words (which are alveolar sounds).
Retroflex: This sound is produced when the tongue tip curls against the back of the alveolar
ridge. Many speakers of English do not use retroflex sounds at all but it is a common sound in
Palato-alveolar: This sound is produced with the tongue blade and the back of the alveolar
ridge (for example, the first sound in each of the words shy, she, show).
Palatal: This sound is produced with the front of the tongue and the hard palate (such as the first
sound in ‘yes’).
Velar: This sound is produced with the back of the tongue and the soft palate (such as /k/ and /g/).
Manner of Articulation
In order to classify a speech sound, one of the most important things that we need to know is
what sort of obstruction it makes to the flow of air: a vowel makes very little obstruction, while
a plosive consonant makes complete obstruction. The kind of obstruction is known as the
manner of articulation. There are several basic ways to pronounce a consonant sound which are
based on the configuration and interaction of the articulators involved. For example, a stop
sound [p] is pronounced by blocking the air passage completely in the oral cavity. Similarly,
there are certain parameters for determining the manners of articulation such as stricture,
laterality and nasality. Consonantal sounds are divided, in terms of their manner of articulation,
into two major types: obstruents (such as stops, fricatives and affricates) and sonorants (such as
nasals, liquids and glides). The possible manners of articulation are described in detail in the
next sections. The International Phonetic Association classifies consonants according to their
Stop
Stop refers to any sound which is produced by a complete closure in the vocal tract, and thus
traditionally includes the class of plosives. Both nasal and oral sounds can be classified as stops,
though the term is usually reserved for the latter. The term ‘stop’ is used in the phonetic
classification of consonant sounds on the basis of their manner of articulation (it refers to a
sound made when a complete closure in the vocal tract is suddenly released and the pressure
of air which had built up behind the closure rushes out with an explosive sound). Thus a stop
involves two phases: the closure of the air passage (stop) and the burst (release). Examples are [p,
b, t, d, k, g]. Plosion is the term used to refer to the outward movement of air upon release.
Plosive consonants are one type of stop consonant. Nasal stops include [m, n, ŋ]. It is also
possible (using a different airstream mechanism from the one which produces an outward flow of
lung air) to produce plosives (implosives) where the air upon release moves inward.
Fricative
A fricative consonant is made by forcing air through a narrow gap so that a hissing noise is
generated. This may be accompanied by voicing (in which case the sound is a voiced fricative,
such as [z]), or it may be voiceless, as in [s]. The quality of fricative sounds varies highly but all are
acoustically composed of energy at relatively high frequency. There are several fricative sounds
in English, both voiced and voiceless, as in fin [f], van [v], thin [θ], this [ð], sin [s], zoo [z], ship
[ʃ], measure [ʒ] and hoop [h]. A distinction is sometimes made between sibilant (or strident)
fricatives such as [s, ʃ], which are strong and clearly audible, and other fricatives such as
[θ, f], which are weak and less audible. BBC pronunciation has nine fricative phonemes: f, θ, s,
ʃ, h (voiceless) and v, ð, z, ʒ (voiced).
Approximants
Approximant is a phonetic term used to denote a consonant which makes very little
obstruction to the airflow. Traditionally approximants have been divided into two groups: (1)
“semivowels” such as [w] in English ‘wet’ and [j] in English ‘yet’, which are very similar to close
vowels such as [u] and [i] but are produced as a rapid glide; and (2) “liquids”, sounds which have
an identifiable constriction of the airflow but not one that is sufficiently obstructive to
produce fricative noise. This category includes laterals such as English [l] in ‘lead’ and
non-fricative [r] (phonetically ɹ) as in ‘read’. BBC English has four approximant sounds:
[l] as in light, [r] as in right, [w] as in wet and [j] as in yet.
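The place and manner labels introduced in these sections, together with voicing, are enough to describe a consonant as a small record. The sketch below covers only a handful of English consonants; the `Consonant` class, the `find` helper and the selection of symbols are assumptions made for illustration, not a complete or official classification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    symbol: str
    place: str
    manner: str
    voiced: bool


# A partial, illustrative table (not the full English inventory).
CONSONANTS = [
    Consonant("p", "bilabial", "plosive", False),
    Consonant("b", "bilabial", "plosive", True),
    Consonant("f", "labiodental", "fricative", False),
    Consonant("v", "labiodental", "fricative", True),
    Consonant("θ", "dental", "fricative", False),
    Consonant("s", "alveolar", "fricative", False),
    Consonant("z", "alveolar", "fricative", True),
    Consonant("ʃ", "palato-alveolar", "fricative", False),
    Consonant("k", "velar", "plosive", False),
    Consonant("m", "bilabial", "nasal", True),
    Consonant("ŋ", "velar", "nasal", True),
    Consonant("j", "palatal", "approximant", True),
    Consonant("l", "alveolar", "approximant", True),
]

OBSTRUENT_MANNERS = {"plosive", "fricative", "affricate"}


def is_obstruent(consonant):
    """Obstruents are stops, fricatives and affricates; the rest are sonorants."""
    return consonant.manner in OBSTRUENT_MANNERS


def find(place=None, manner=None):
    """Return the symbols matching the given place and/or manner."""
    return [c.symbol for c in CONSONANTS
            if (place is None or c.place == place)
            and (manner is None or c.manner == manner)]


print("bilabial:", find(place="bilabial"))
print("alveolar fricatives:", find(place="alveolar", manner="fricative"))
print("sonorants:", [c.symbol for c in CONSONANTS if not is_obstruent(c)])
```

Keeping place, manner and voicing as separate fields mirrors the way the three-part labels (for example, "voiceless bilabial plosive") are built up in the text.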
There are some additional consonantal gestures which may be useful to discuss at this stage. One
such gesture not yet discussed is the ‘affricate’. It is a type of consonant consisting of a plosive
followed by a fricative with the same place of articulation (e.g., the [tʃ] and [dʒ] sounds at the
beginning and end of the English words ‘church’ and ‘judge’). It is often difficult to decide
whether any particular combination of a plosive plus a fricative should be classed as a single
affricate sound or as two separate sounds, and the question depends on whether these are to be
regarded as separate phonemes or not. It is usual to regard [tʃ], [dʒ] as affricate phonemes in
English.
It might be useful to know the terms trill (sometimes called roll), tap and flap and distinguish
among them. These are also called central approximants. In the case of tap and flap, there is only
one rapid contact while in the case of trill [r] the tongue is striking continuously (rrrrr) as the
Tap: A tap is an up and down movement of the tip of the tongue. An example is the
middle sound in the word ‘pity’ pronounced with a typical American accent [ɾ]. It is very brief and is
produced by a sharp upward throw of the tongue blade. In this sound, the tongue makes a single tap against
Flap: A flap is a front and back movement of the tongue tip at the underside of the tongue, with curling
behind. It is found in abundance in Indo-Aryan (IA) languages [ɽ]. Flap sounds found in
IA languages are typically retroflex, as in [ɽ]; other retroflex sounds in these languages include [ɖ] and [ɳ].
Trill: In the production of trill the articulator is set in motion by the current of air [r]. It is a
Silent r
The most obvious difference between standard American (GA) and standard British (GB) is the
omission of ‘r’ in GB: you only pronounce a written < r > if there is a vowel sound after it, so we
don’t say it in PARK /pɑːk/, HORSE /hɔːs/ or FURTHER /ˈfɜːðə/. In American, though, we
Many of the 19 vowel sounds are very similar in American and British, however, there are
/ɒ/ to /ɑ/
In British (GB) we use back rounded open sound /ɒ/ for words like SHOP /ʃɒp/, LOST /lɒst/ and
WANT /wɒnt/. In American (GA) we don’t round the lips, so it’s: /ʃɑp/, /lɑst/ & /wɑnt/.
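Because this difference is a one-to-one vowel substitution, it can be sketched as a simple string replacement on the transcriptions given above. The function name `gb_to_ga_lot` is hypothetical, and the sketch handles only this single vowel, not the other GB and GA differences.

```python
def gb_to_ga_lot(transcription):
    """Replace the GB rounded vowel /ɒ/ with the GA unrounded /ɑ/."""
    return transcription.replace("ɒ", "ɑ")


for word, gb in [("SHOP", "ʃɒp"), ("LOST", "lɒst"), ("WANT", "wɒnt")]:
    print(word, "GB /" + gb + "/", "GA /" + gb_to_ga_lot(gb) + "/")
```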
/æ/ to /e/
The British thinking sound /ɜː/, found in words like HEARD /hɜːd/, FIRST /fɜːst/ and WORST
/wɜːst/, is pronounced differently – with the tongue raised and a /r/ quality in
American, /hɜrd/, /fɜrst/ & /wɜrst/. This sound nearly always has an ‘r’ in its spelling, but
even when it doesn’t, American speakers say one, like in the word COLONEL /ˈkɜrnəl/,
Long back rounded /ɔː/ as in SWORD /sɔːd/, FORCE /fɔːs/, THOUGHT /θɔːt/ & LAW /lɔː/ is
pronounced in 2 ways in American: /ɔr/ for words with ‘r’, so SWORD /sɔrd/ & FORCE
/fɔrs/, and /ɑ/ for words without ‘r’, so THOUGHT /θɑt/ & LAW /lɑ/. This means that for
many American speakers, COT /kɑt/ and CAUGHT /kɑt/ are the same, though COURT
/kɔrt/ would be different. In British English CAUGHT /kɔːt/ and COURT would be the same,
Long back unrounded /ɑː/ like in CAR /kɑː/, START /stɑːt/, AFTER /ɑːftə/ & HALF /hɑːf/ is
pronounced /ɑr/ in American if there’s an ‘r’ in the spelling, so CAR /kɑr/ & START
/stɑrt/. Most of those words that don’t have an ‘r’ in GB are pronounced /æ/ in American
Vowel Length
There is a greater difference in British English between the length of vowel sounds, with some
being pronounced significantly longer than their American counterparts. Some of this is owing to
the additional pronunciation of ‘r’ in many American vowel sounds as seen above. Most
phonemic charts reflect this by showing five or six English vowel sounds with two triangular
Consonant Sounds
Consonant sounds are largely similar in American and British with just a few key
differences:
/t/
When /t/ appears after a stressed vowel and before a weak vowel, American speakers often
make a voiced flap – a bit like a very fast /d/: WATER, FIGHTER, GOT IT. In Standard
British this would be pronounced as a normal /t/ WATER, FIGHTER, GOT IT, though in
regional British accents, most famously cockney, this would be a glottal stop: WATER,
/r/
Apart from the higher number of /r/ sounds in American English, there is also a small but
significant difference in the way they are pronounced. In American, the tongue curls back
further, giving it a slightly muffled quality – RIGHT, ARROW. Whereas in British the
In British English, where /j/ appears after /t, d, n, l, s, z/ (the alveolar consonants), it is omitted in
American: /t/ TUNE /tjuːn, tun/, /d/ DUTY /ˈdjuːti, ˈduti/, /n/ NEW /njuː, nu/, /l/ LEWD
/ljuːd, lud/, /s/ SUIT /sjuːt, sut/, /z/ EXUDE /ɪgˈzjuːd, ɪgˈzud/. This is often referred to as ‘yod
dropping’.
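The yod-dropping pattern in these examples can be expressed as a small rewrite rule. The sketch below assumes, based only on the word pairs listed above, that /j/ is removed after an alveolar consonant when /uː/ follows, and that the GA forms are written without the length mark; `drop_yod` is a hypothetical helper, not a full GB-to-GA converter.

```python
import re

# /j/ directly after t, d, n, l, s or z and directly before /uː/ (pattern from the examples).
YOD_AFTER_ALVEOLAR = re.compile(r"(?<=[tdnlsz])j(?=uː)")


def drop_yod(gb_transcription):
    """Turn a GB transcription into the GA form used in the examples above."""
    without_yod = YOD_AFTER_ALVEOLAR.sub("", gb_transcription)
    # The GA transcriptions in the text omit the length mark on /u/.
    return without_yod.replace("uː", "u")


for word, gb in [("TUNE", "tjuːn"), ("DUTY", "ˈdjuːti"), ("NEW", "njuː"),
                 ("LEWD", "ljuːd"), ("SUIT", "sjuːt"), ("EXUDE", "ɪgˈzjuːd")]:
    print(word, "GB /" + gb + "/", "GA /" + drop_yod(gb) + "/")
```

The lookbehind keeps /j/ after non-alveolar consonants, so a word such as FEW /fjuː/ retains its yod.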
Word Stress
Some words are stressed differently in American English, particularly those of French origin
where American keeps the last syllable stress and British goes for first syllable (audio is British
then American): GARAGE, GOURMET, BALLET, BROCHURE, though this is reversed in the
Intonation
The melody of British and American is quite different, though the structure of speech is very
similar. The most obvious difference is the British tendency to use high falling intonation, hitting
the main stress high and dropping down. Whereas in American rising tones are more common,
so you go up from the main stress. This use of rising intonation on statements is sometimes
referred to as ‘Upspeak’.