Johan Sundberg. Intonation in Singing.
Keywords: vibrato, solo singing, ensemble singing, common partials, accuracy, expressivity, reaction time, equally
tempered tuning
Background
In music composition as well as in music performance the musical scale plays a
fundamental role. From a physical point of view, a scale corresponds to a division of the
frequency continuum into discrete steps, the scale tones. In our Western music culture,
there are mostly seven of them in the octave, and the frequencies of the scale tones in
other octaves are obtained by frequency doubling and halving.
In the development of these scales the widespread use of simultaneous playing of several
instruments that produce harmonic spectra seems to have played an important role.
Under these conditions some of the lower partials of two simultaneously sounding tones
Page 1 of 15
PRINTED FROM OXFORD HANDBOOKS ONLINE (www.oxfordhandbooks.com). © Oxford University Press, 2018. All Rights
Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in
Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).
will coincide. This happens when their frequencies can be expressed as a ratio between
small integers.
Dissonance is the other extreme. Maximal dissonance occurs when two partials are
separated by an interval that equals a quarter of a critical band of hearing. These bands
represent a type of analysis bandwidth of the ear (Plomp and Levelt 1965). For
frequencies above about 500 Hz the critical bands are about a minor third wide. For
lower frequencies they are around 100 Hz wide.
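The rule of thumb above can be written as a small function. This is only a rough sketch: the hard switch at 500 Hz, the use of a just minor third (6:5) measured upward from the given frequency, and the function names are assumptions made for illustration, not taken from Plomp and Levelt.

```python
def critical_bandwidth_hz(frequency_hz):
    """Approximate critical bandwidth following the rule of thumb in the text."""
    if frequency_hz < 500.0:
        return 100.0  # roughly constant width at low frequencies
    # Assumption: "a minor third wide" is taken as the 6:5 ratio above frequency_hz.
    return frequency_hz * (6 / 5 - 1)


def max_dissonance_separation_hz(frequency_hz):
    """Separation of two partials at a quarter of the critical band."""
    return critical_bandwidth_hz(frequency_hz) / 4


for f in (200.0, 500.0, 1000.0):
    print(f, critical_bandwidth_hz(f), max_dissonance_separation_hz(f))
```

With these assumptions the two branches meet at 500 Hz, where a 6:5 minor third is 100 Hz wide.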
In this sense, our Western diatonic scale offers possibilities to play both very consonant
and very dissonant intervals. It is tempting to speculate how music would have sounded if
it were always played on instruments that generated inharmonic partials.
Consonance of intervals with coinciding partials happens when the intervals are tuned so
that their frequency ratios can be written as ratios between small integers. This tuning is
generally referred to as either just or pure. If the same intervals are played with almost,
but not exactly the same, frequency ratios, beats appear. The reason is that the overtones
do not coincide exactly. Thus, just tuning is the only one that does not give rise to beats in
consonant intervals. On the other hand, in playing solo music, just tuning is often not
applied. The greatest differences occur for the intervals that exist in both major and
minor versions: the second, third, sixth, and seventh. Compared with just tuning, the
major versions are widened and the minor versions are narrowed. This tuning is referred
to as Pythagorean tuning.
The frequencies of the scale tones in Pythagorean tuning can be obtained by piling pure
fifth intervals on top of each other and then reducing the frequencies by halving, such
that they all arrive in the same octave. This tuning enhances the difference between
major and minor versions of intervals.
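The construction described above can be sketched in a few lines. The function names, and the choice of one fifth below and five fifths above the tonic to obtain a seven-tone major scale, are assumptions made for illustration.

```python
from fractions import Fraction
import math


def reduce_to_octave(ratio):
    """Halve or double a frequency ratio until it lies in [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def pythagorean_scale(fifths_down=1, fifths_up=5):
    """Ratios to the tonic obtained by stacking pure fifths (3:2)."""
    fifth = Fraction(3, 2)
    ratios = {reduce_to_octave(fifth ** k)
              for k in range(-fifths_down, fifths_up + 1)}
    return sorted(ratios)


def cents(ratio):
    """Interval size in cents (1200 cents per octave)."""
    return 1200 * math.log2(float(ratio))


scale = pythagorean_scale()
for r in scale:
    print(f"{str(r):>8}  {cents(r):7.1f} cents")
```

In this sketch the major third comes out as 81:64, about 408 cents, compared with about 386 cents for the just major third 5:4, which illustrates the widening of the major intervals.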
The equally tempered tuning (ETT) can be regarded as a brutal mathematical method for
obtaining the frequencies of the scale tones that lie between those of the pure and the
Pythagorean tunings. In the ETT, the octave interval is simply chopped into twelve
intervals, all of exactly identical width. Here, all consonant intervals except the octave
have only nearly, but not exactly, coinciding partials. Hence they generate beats when the
tones sound simultaneously.
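The size of these deviations, and the beat rates they imply for perfectly steady tones, can be estimated as follows. This is a minimal sketch: the 220 Hz lower tone and the helper names are assumptions, and only the lowest pair of nearly coinciding partials is considered.

```python
import math


def cents(ratio):
    """Interval size in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


def ett_ratio(semitones):
    """Frequency ratio of an interval of the given number of ETT semitones."""
    return 2 ** (semitones / 12)


def beat_rate(f_low, semitones, p, q):
    """Beat frequency between partial p of the lower tone and partial q of
    the upper tone, which would coincide exactly for the just ratio p:q."""
    f_high = f_low * ett_ratio(semitones)
    return abs(p * f_low - q * f_high)


# name: (ETT semitones, p, q) with p:q the just ratio of the interval
INTERVALS = {
    "major third": (4, 5, 4),
    "fourth": (5, 4, 3),
    "fifth": (7, 3, 2),
    "octave": (12, 2, 1),
}

F_LOW = 220.0  # assumed lower tone, in Hz
for name, (semitones, p, q) in INTERVALS.items():
    deviation = cents(ett_ratio(semitones)) - cents(p / q)
    print(f"{name:12s} ETT minus just: {deviation:+6.2f} cents, "
          f"beats: {beat_rate(F_LOW, semitones, p, q):5.2f} Hz")
```

Under these assumptions the ETT fifth is about 2 cents narrower than the pure fifth and the ETT major third about 14 cents wider than the pure major third, while the octave shows no deviation.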
It is tempting to assume that just intonation is preferred in ensemble singing such that
beats are avoided. On the other hand, beats may not be a really severe disadvantage,
since they are produced only when the F0 is perfectly constant, a condition only met in
electronically generated tones. Further, as mentioned, just tuning has the disadvantage of
being associated with intervals between successive tones that are far from the
Pythagorean tuning, which seems preferred in performances of solo parts and melodies.
In the same study, Ternström and Sundberg also studied the relevance to intonation of
three factors: (1) vibrato, (2) amplitude of common partials, and (3) amplitudes of partials
above the first common partial. They presented synthesized tones with all combinations
of these three properties to each of eighteen male choir singers who were asked to sing a
major third and a fifth above the reference tones, which were presented over a
loudspeaker. The standard deviation of the F0 of the tones produced was measured.
Jers and Ternström (2005) analyzed intonation in a choir of sixteen singers. The ensemble
performed in unison a music example containing half notes and quarter notes, and the
males sang one octave below the females. The melody was performed in a fast and a slow
tempo, with average quarter note durations of about 500 ms and 750 ms, respectively. The singers’ F0 was
measured from accelerometer microphones glued to the singers’ noses and recorded on
separate tracks of a tape recorder. F0 was measured as the average across the entire
tone. The result showed that the standard deviation of F0 of the ensemble decreased from
around 25 cents to 16 cents in the slower tempo.
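A measure of this kind can be sketched as follows. The F0 values are hypothetical, and both the octave folding (so that voices singing an octave apart can be compared) and the use of the sample standard deviation are assumptions, not details reported for the study.

```python
import math
import statistics


def hz_to_cents(f_hz, ref_hz):
    """Distance from ref_hz to f_hz in cents."""
    return 1200 * math.log2(f_hz / ref_hz)


def fold_octave(cents_value):
    """Map a deviation into the range [-600, 600) cents, ignoring octaves."""
    return (cents_value + 600) % 1200 - 600


def ensemble_spread_cents(mean_f0s_hz, target_hz):
    """Standard deviation across singers of the tone-average F0, in cents."""
    deviations = [fold_octave(hz_to_cents(f, target_hz)) for f in mean_f0s_hz]
    return statistics.stdev(deviations)


# Hypothetical tone averages for six singers, two of them an octave lower.
tone_means_hz = [440.0, 443.0, 437.5, 220.8, 219.0, 441.5]
print(f"{ensemble_spread_cents(tone_means_hz, 440.0):.1f} cents")
```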
presented over loudspeakers. Suddenly, the tones were shifted by 50 or 100 cents upward
or downward and the task was to adapt to this shift in reference.
Great inter-individual variation was observed, but the most common reaction time was
close to 150 ms, as can be seen in Figure 4. Thus, it took about 150 ms for a singer to
adjust tuning to a heard reference, for example the tones from the fellow singers in a
choir. This seems to explain the findings regarding the effect of tempo in the Jers and
Ternström study. In that experiment the fine-tuning of the tones was averaged across the
entire tone, so the mean included the fine correction of tuning that may have happened
during the first 150 ms.
Barbershop
Vibrato
Singing in the classical Western operatic tradition includes vibrato as an important
property. It corresponds to a quasi-sinusoidal modulation of F0. As a consequence, the
frequencies of the harmonic partials also vary in phase with F0. Because the partials
undulate in frequency, they prevent the production of beats in mistuned, i.e.,
non-pure, intervals. This makes intonation in singing in the classical Western operatic
tradition a particularly interesting area.
A question of basic relevance is what pitch is perceived when a tone has such an
undulating F0. Experiments where musically trained subjects have matched the pitch of a
synthetic sung vibrato tone with the same tone void of vibrato have indicated that the
pitch corresponds to the mean F0, averaged over a complete vibrato period (Horan and
Shonle 1980; Sundberg 1981). This is true for vibrato rates in the range of about 5 to 7
Hz, which is typical in professional Western opera singing. Thus, the perceptual system
A striking application of this averaging is provided by the F0 pattern observed in legato
performances of coloratura passages, i.e., sequences of short tones which are not
separated by pauses. Figure 5 shows a typical example. Each scale tone corresponds to
an F0 pattern encircling the target F0. A similar pattern can be obtained if a
vibrato-like sinusoidal modulation is superimposed on a glissando, as illustrated in
Figure 6. If this pattern is processed with a running average function with a window of
about 200 ms width, a quasi-stepwise changing F0 is obtained, as illustrated in the same
figure.
Figure 6 Green curve: fundamental frequency pattern resulting from superimposing a
sinusoidal vibrato on a glissando represented by the black dashed curve. The black curve
shows a running average of the green curve.
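The processing described above can be tried out with a few lines of code. This is only a sketch: F0 is expressed in cents, and the glissando slope, vibrato rate, vibrato extent, and sampling rate are placeholder values, not those underlying the figure. How stepwise the averaged curve appears depends on these choices.

```python
import math


def vibrato_on_glissando(duration_s=2.0, sample_rate_hz=1000,
                         slope_cents_per_s=600.0,
                         vibrato_rate_hz=5.5, vibrato_extent_cents=100.0):
    """Return (times, glissando, f0): a linear glissando and the same
    glissando with a sinusoidal vibrato added, both in cents."""
    n = int(duration_s * sample_rate_hz)
    times = [i / sample_rate_hz for i in range(n)]
    glissando = [slope_cents_per_s * t for t in times]
    f0 = [g + vibrato_extent_cents * math.sin(2 * math.pi * vibrato_rate_hz * t)
          for g, t in zip(glissando, times)]
    return times, glissando, f0


def running_average(values, window_samples):
    """Centered moving average; near the edges only available samples are used."""
    half = window_samples // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


times, glissando, f0 = vibrato_on_glissando()
smoothed = running_average(f0, window_samples=200)  # about 200 ms at 1000 samples/s

# Compare the undulation around the glissando before and after averaging,
# skipping the edges where the window is incomplete.
interior = range(100, len(f0) - 100)
raw_ripple = max(abs(f0[i] - glissando[i]) for i in interior)
residual_ripple = max(abs(smoothed[i] - glissando[i]) for i in interior)
print(f"raw: {raw_ripple:.1f} cents, after averaging: {residual_ripple:.1f} cents")
```

With these placeholder values the averaging removes most, but not all, of the undulation around the glissando.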
Provided that the rate lies between about 5 and 7 Hz, vibrato generally has no influence on
the accuracy of the pitch perceived. Thus, in an experiment, musically educated listeners
adjusted the F0 of a vibrato-free synthetic vowel such that it appeared to have the same
pitch as the same vowel with vibrato. The accuracy was found to be the same as when the
same subjects repeated the experiment with vowels that lacked vibrato. On the other
hand, van Besouw and associates (2008) reported that the tolerance of what was
considered to be in tune was somewhat more generous for vibrato tones than for
vibrato-free tones. Absence of beats between vibrato tones may contribute to this effect.
Accuracy
Listeners’ accuracy of pitch perception is obviously crucial to the accuracy required in
singing. This accuracy has been analyzed by Vurma and Ross (2006). They analyzed
professional singers’ performances of ascending and descending versions of three
intervals: minor second, tritone, and fifth. The results showed that, on average, the
singers’ intonation was very close to the equally tempered tuning; the F0 averages had a
standard deviation in the vicinity of 20 cents. Thus, the singers’ mean F0 values were less
than 20 cents from equally tempered tuning. Yet there was a systematic tendency to
expand the wider intervals fifth and tritone and to compress the narrow minor second
interval slightly.
While Vurma and Ross investigated singers’ intonation of isolated intervals, deviations
from ETT seem also to depend on the musical context. Sundberg (2011) analyzed the
A relevant question is to what extent these deviations were consistent. Nine of the
singers had recorded both the first and the second verse of the song, thus allowing a
comparison of intonation of the same tone. The comparison was carried out for tones 1, 2,
3, 23, 24, and 25, i.e., the first and last three long tones of the song. The result is
shown in Figure 9, where the dashed lines represent a ±10 cents difference. In
forty-eight tones out of the total of (6 x 9) fifty-four tones, the difference exceeded
±15 cents. One singer showed a clear tendency to sing the tones in the second verse
flatter than those in the first verse (open triangles in the graph). Thus, in most cases
the deviations from the ETT were reasonably consistent.
Figure 9 Examples of the consistency of intonation observed in two singers’ performance
of the three long tones of the first phrase in verse 1 and verse 2 (dotted and solid
curves) of Franz Schubert’s Ave Maria. From Sundberg et al. (1996).
A listening test was carried out with six highly experienced professional music listeners,
representing different professions: teacher of singing, phonogram producer, choral
conductor, and piano accompanist. They were provided with one score for each of the ten
singers and were asked to circle all tones that they perceived as out of tune.
Notes number 6, 7, 18, and 21 showed exceptionally great variation of intonation of tones
that none of the expert listeners perceived as out of tune, varying no less than between
35 cents sharp and 20 cents flat for tone 7. The reason for this is an open question, but it
might be relevant that this tone is a suspension note, presenting the fourth that moves to
the major third in the next beat. Moreover, the note initiates a chromatically falling
sequence of the following melody notes that appear on the stressed position of the beats.
A marked intonation difference was revealed between the excited and the peaceful
excerpts. In the concert versions of the excited examples, the phrase-peak tones were
sung about 25 cents sharp, on average. In the peaceful examples the sharpening was close
to zero in both versions. Moreover, in the excited examples, the sharpening was greater
in the concert versions than in the neutral versions, as illustrated in Figure 12. This
supported the assumption that the sharpening was used for expressive purposes in the
excited excerpts.
Figure 12 Deviations from ETT for the phrase-peak tones in the singer’s neutral and
concert versions of the excerpts. Triangles and circles refer to excited and peaceful
excerpts, respectively. From Sundberg et al. (2013).
To test this assumption the sharpening of the phrase-peak tones in the excited examples
was eliminated using the Melodyne software. Thus, the intonation of these tones was
flattened such that the mean F0 agreed with ETT. The original version and
the manipulated version of the excited examples were then presented pair-wise to
musicians, who were asked to decide which version in the pair sounded more expressive.
The result showed that the original versions, having sharpened phrase-peak tones, were
perceived as significantly more expressive than the manipulated versions.
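The arithmetic behind such a correction can be illustrated on F0 values alone. This sketch does not reproduce how the Melodyne software works on audio; the A4 = 440 Hz reference, the example tone, and the function names are assumptions.

```python
import math

A4_HZ = 440.0  # assumed tuning reference


def ett_deviation_cents(f0_hz, a4_hz=A4_HZ):
    """Deviation of f0_hz from the nearest equally tempered tone, in cents."""
    semitones = 12 * math.log2(f0_hz / a4_hz)
    return 100 * (semitones - round(semitones))


def shift_by_cents(f0_hz, cents):
    """Frequency obtained by shifting f0_hz by the given number of cents."""
    return f0_hz * 2 ** (cents / 1200)


# Hypothetical phrase-peak tone: the ETT tone 7 semitones above A4, sung 25 cents sharp.
ett_tone_hz = A4_HZ * 2 ** (7 / 12)
sharp_peak_hz = shift_by_cents(ett_tone_hz, 25.0)

deviation = ett_deviation_cents(sharp_peak_hz)
flattened_hz = shift_by_cents(sharp_peak_hz, -deviation)
print(f"deviation: {deviation:+.1f} cents, "
      f"flattened: {flattened_hz:.2f} Hz, ETT tone: {ett_tone_hz:.2f} Hz")
```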
The results of this experiment were in accordance with the assumption that tuning can be
used as an expressive tool and that sharpening phrase-peak tones can be used for this
purpose in excited contexts. It can also be noted that the intonation of an interval may
carry some emotional information.
In view of these investigations, a relevant question is how small the intonation differences
are that music listeners can notice. The answer appears to depend on the musical context
and on the listening conditions.
The just noticeable difference in pitch between two tones presented in sequence
approaches 10 cents. In a listening experiment Friberg and Sundberg (1994) presented
three versions of short musical excerpts to musically experienced students. In one of the
versions the tuning of the scale tones deviated from ETT, and the subjects’ task was to tell
which of the three presentations was different from the other two. When an incorrect
answer was obtained the deviations were made twice as large in the subsequent
presentation. The results suggested that the just noticeable difference in tuning for these
listeners was about 42 cents, i.e., about four times larger than what has been found in
comparison of single tones presented in succession. Thus, it seems that even musically
experienced listeners may completely lack sensitivity to fine-tuning, their pitch
discrimination being limited to nothing but semitones. By contrast,
professional music listeners, such as teachers of singing and phonogram producers, seem
to have a tolerance zone of ±10 cents, as suggested by the results shown in Figure 10.
Thus, experience and education appear to sharpen the auditory system with regard to
fine-tuning as well. An improved ability to detect tuning differences may also increase a
listener’s possibilities to perceive expressive components of a musical performance.
The study of phrase-peak tone intonation in sung performance of excited music excerpts
suggested that sharpening such tones increased the expressiveness (Sundberg et al.
2013). Some experimental findings indeed support the hypothesis that intonation may
add an emotional coloring even to an octave interval (Makeig and Balzano 1982). In any
event, it seems fair to assume that an increased accuracy of pitch perception paves the
way for a more complete experience of emotional colors embedded in music
performances.
It is tempting to make a final remark regarding singing in tune and singing out of tune. In
today’s recording studios software packages are available that allow manipulation of
vocal artists’ pitch. If such corrections are made on the assumption that deviations from
ETT equal singing out of tune, they may in fact have the effect of reducing the
expressivity of the singer’s performance. It seems important to keep in mind that ETT
cannot be accepted as the gold standard for tuning.
References
d’Alessandro, C. and Castellengo, M. (1994). The pitch of short-duration vibrato tones.
Journal of the Acoustical Society of America 95: 1617–1630.
van Besouw, R.M., Brereton, J.S., and Howard, D.M. (2008). Range of tuning for tones with
and without vibrato. Music Perception 26: 145–155.
Friberg, A. (1991). Generative rules for music performance: a formal description of a rule
system. Computer Music Journal 15(2): 56–71.
Friberg, A. and Sundberg, J. (1994). Just noticeable difference in duration, pitch and sound
level in a musical context. In: Proceedings of the 3rd International Conference for Music
Perception and Cognition, pp. 339–340, Liège, July 23–27.
Grell, A., Sundberg, J., Ternström, S., Ptok, M., and Altenmüller, E. (2009). Rapid pitch
correction in choir singers. Journal of the Acoustical Society of America 126(1): 407–413.
Horan, J.I. and Shonle, K.E. (1980). The pitch of vibrato tones. Journal of the Acoustical
Society of America 67: 246–252.
Plomp, R. and Levelt, A. (1965). Tonal consonance and critical bandwidth. Journal of the
Acoustical Society of America 38: 548ff.
Sundberg, J. (1981). Effects of the vibrato and the ‘singing formant’ on pitch. Journal of
Research in Singing 5(2): 5–17.
Sundberg, J., Prame, E. and Iwarsson, J. (1996). Replicability and accuracy of pitch
patterns in professional singers. In: P. Davis and N. Fletcher (eds), Vocal Fold Physiology:
Controlling Complexity and Chaos, pp. 291–306. San Diego: Singular.
Sundberg, J., Lã, F.M., and Himonides, E. (2013). Intonation and expressivity: A single
case study of classical western singing. Journal of Voice 27: 391e–397e.
Ternström, S. and Sundberg, J. (1988). Intonation precision of choir singers. Journal of the
Acoustical Society of America 84: 59–69.
Vurma, A. and Ross, J. (2006). Production and perception of musical intervals. Music
Perception 23: 331–344.
Johan Sundberg
Johan Sundberg had a personal chair in Music Acoustics at the department from
1979 until his retirement in 2001. Since 2002 he has been Visiting Professor at the
University of London, UK.