According to common practice and oral tradition, learning verbal materials through song should facilitate
word recall. In the present study, we provide evidence against this belief. In Experiment 1, 36 university students,
half of them musicians, learned an unfamiliar song in three conditions. In the sung–sung condition, the song to
be learned was sung, and the response was sung too. In the sung–spoken condition, the response was spoken. In
the divided–spoken condition, the presented lyrics (accompanied by music) and the response were both spoken.
Superior word recall in the sung–sung condition was predicted. However, fewer words were recalled when singing
than when speaking. Furthermore, the mode of presentation, whether sung or spoken, had no influence on
lyric recall, in either short- or long-term recall. In Experiment 2, singing was assessed with and without words.
Altogether, the results indicate that the text and the melody of a song have separate representations in memory,
making singing a dual task to perform, at least in the first steps of learning. Interestingly, musical training had
little impact on performance, suggesting that vocal learning is a basic and widespread skill.
The notion that music may serve as a mnemonic technique for learning verbal material has a long history.
Minstrels transmitted stories through songs (Calvert & Tart, 1993; Rubin, 1995), and this practice is still
influential today. Among the most familiar experiences of musical learning are jingles for brand names and
the alphabet song children learn. Other examples that have been described consist of learning the laws of
physics through karaoke (Dickson & Grant, 2003) and learning English as a second language via songs
(Medina, 1993). The goal of the present study was to contribute to the understanding of this phenomenon
from both empirical and theoretical perspectives.

Indeed, it is not obvious why music should facilitate word recall, since there is more to learn in a song
than in a text. To our surprise, this simple notion has not been properly assessed. Song learning is
typically assessed through written recall (Kilgour, Jakobson, & Cuddy, 2000; McElhinney & Annett, 1996;
Wallace, 1994). This change in format between perception and performance introduces a bias in word recall
in favor of the spoken version, because extracting words from the sung version requires filtering out the
music component. Moreover, written recall requires participants to perform a task that is not familiar to
them. Lyrics are typically learned to be sung, not to be written. Thus, a putative advantage of singing over
reciting words should be assessed with a vocal response. To our knowledge, this procedure has been used only
once (Jellison & Miller, 1982), and the results were negative: Music was found to interfere with digit
recall and had no effect on word recall. However, in this experiment the words were unrelated and probably
were not optimally aligned to the music, hence introducing an additional difficulty. Thus, to properly test
the idea that music may serve as a mnemotechnique for recalling words, one must not only examine oral
responses, but also select material in which the words are appropriately set to the music (Gingold &
Abravanel, 1987); in short, one must use real songs. This was done in the present study.

Although an adequate test of the idea that music facilitates text recall requires consideration of both
input and output factors, the influence of music on word recall starts at the encoding stage. Thus, all
prior studies that used written recall but looked at input factors may shed light on the idea that sung
words are easier to encode than spoken words. Support for this notion is mixed. In several studies,
participants recalled as many sung as spoken words (Gingold & Abravanel, 1987; Wolfe & Hom, 1993) or even
did worse on sung material (Calvert & Billingsley, 1998). Yet, in many other studies, an advantage of sung
over spoken presentation has been shown (Calvert & Tart, 1993; Chazin & Neuschatz, 1990; Kilgour et al.,
2000; McElhinney & Annett, 1996; Rainey & Larsen, 2002; Wallace, 1994; Wolfe & Hom, 1993).

This advantage of sung over spoken text at encoding has been attributed to speed (Kilgour et al., 2000) and
to melody simplicity (Wallace, 1994). In effect, words are pronounced more slowly when singing than when
speaking. When the sung version of a text is compressed to match its spoken duration, there is no longer a
difference in recall, suggesting that the slower rate of singing in comparison with speaking is a key
variable in song learnability (Kilgour et al., 2000). Similarly, in order for a sung text to be recalled
better than one that is recited, it has to be presented on a simple and repeated melody, as typically found
in songs. Lyrics that are sung to a complex and
changing melody can be more difficult to remember than their spoken version (Wallace, 1994).

Songs also possess structural characteristics that may assist text recall. For instance, the metrical
structure of music and the number of musical notes in a line can cue word recall. Similarly, song lyrics are
usually constrained by both semantics (a story underlies the words, generally through a schema or a script)
and sound patterns (e.g., rhymes, alliteration), which may again limit the possibilities. Indeed, when
errors occur in song recall, the changes usually preserve the rhyme (Rubin, 1995) and the number of
syllables in the line (Wallace, 1994).

Nevertheless, as mentioned previously, texts of real songs are not systematically memorized better when sung
than when recited (see, e.g., Gingold & Abravanel, 1987, a study with children; Wallace, 1994, Experiment 3;
Wolfe & Hom, 1993). This lack of consistency might be related to the mode of response, as has been pointed
out previously. Writing down or reciting the words requires filtering them out from the music. This
filtering process might be difficult, especially when words are sung at high pitches (Scotto Di Carlo &
Germain, 1985). One way to control for this perceptual disparity between sung and spoken presentations is to
present the spoken lyrics accompanied by music. We refer to this situation as the “divided song” condition.
By adding the musical background, this condition also maintains the presence of the melody at encoding. None
of the prior studies that aimed at testing the effect of music on word recall have included such a control
condition. Finally, in order to promote the use of musical cues as a structural aid in the retrieval
process, one needs to assess sung recall.

Consideration of all these factors is not solely motivated by experimental elegance, since the contribution
of music to verbal memory is a theoretically important question. As alluded to previously, text and melody
are aligned in songs in such a way that they promote binding of speech and musical sounds at multiple levels
of processing. These tight relations may enhance memory for relatively distinct representations of text and
melody in songs by linking elements of words and tones in rich, multiple-linked representations (Peretz,
Radeau, & Arguin, 2004). Alternatively, the text and melody of songs might be integrated in a unitary
representation, especially when singing is required. Central to the distinction between these two positions
is a difference in the way recall is assumed to operate. If integrated, a part of the song’s representation
will reinstate the whole; namely, singing the melody will reinstate the text. If separate, a part (the
melody) may or may not connect with the other part (the text), depending on the strength of the links. Thus,
the integrated view of song memory would predict superior text recall in singing over speaking, whereas a
separate-memory view of song components would not.

The idea that melody and text might be represented in a unitary memory trace has been relatively neglected
in performance, but it has been studied in perception and memory. The prevailing paradigm in the field
involves the recognition of unrelated song lines (Crowder, Serafine, & Repp, 1990; Morrongiello & Roes, 1990;
Peretz, Radeau, & Arguin, 2004; Samson & Zatorre, 1991; Serafine, Crowder, & Repp, 1984; Serafine, Davidson,
Crowder, & Repp, 1986). In the recognition of song lines, melody and text appear to be highly associated,
even after a single hearing, suggesting that lyrics and melody representations are integrated in memory for
songs (Serafine et al., 1984; Serafine et al., 1986). However, there is increasing evidence that the music
and language components of songs maintain autonomy in both perception (Besson, Faïta, Peretz, Bonnel, &
Requin, 1998; Bonnel, Faïta, Peretz, & Besson, 2001) and memory (Crowder et al., 1990, Experiment 3; Peretz,
1996). Very recently, we extended these conclusions to singing by studying brain-damaged patients who
suffered from a severe speech disorder without a concomitant musical disorder (Hébert, Racette, Gagnon, &
Peretz, 2003; Peretz, Gagnon, Hébert, & Macoir, 2004; Racette, Bard, & Peretz, 2006). The results indicate
that verbal production, be it sung or spoken, is mediated by the same (impaired) language output system and
that this speech route is distinct from the (spared) melodic route. These neuropsychological findings
strongly suggest that singing taps into distinct codes for melody and text. Thus, the present study should
help us to shed further light on this issue by testing singing in the normal population.

The general population is musically untrained. However, we also considered a group of professional musicians
because these individuals might exploit musical cues more effectively than nonmusicians, and therefore might
benefit more from the presence of music on text recall. Moreover, musicians seem to have better verbal
memory than nonmusicians (Chan, Ho, & Cheung, 1998; Jellison & Miller, 1982; Kilgour et al., 2000),
apparently from childhood (Ho, Cheung, & Chan, 2003). Thus, it is possible that musical training strengthens
auditory temporal processing, which would mediate verbal recall (Jakobson, Cuddy, & Kilgour, 2003; Jellison
& Miller, 1982). These results in turn suggest that music may assist in text recall, but only in those
individuals who regularly use the two codes.

Therefore, in the present study 36 students, half with musical expertise, had to learn novel songs in three
different conditions. As illustrated in Table 1, the text to be learned was either sung or spoken. When
spoken, its corresponding melody was sung on /la/ in the background. Recall of text was either sung (on the
melody) or spoken (lyrics alone). We predicted that word recall would be superior in the sung–sung
condition, especially for musicians, simply because singing is slowed down relative to normal speech. The
sung–spoken condition was expected to be the most difficult, because in this condition the text needs to be
extracted from the song. In the divided–spoken condition, there would be no cost of extracting the words,

Table 1
Modes of Presentation and Recall in the Three Conditions of Experiment 1

Presentation of the Song    Recall of the Lyrics
Sung                        Sung
Sung                        Spoken
Spoken (divided)            Spoken
244 Racette and Peretz
Table 2
Illustration of the Adaptive Learning Procedure
Lyrics Presented Lyrics Repeated Lyrics to Be Recalled
1 Dans cette petite boîte vide 1 Dans cette petite boîte vide 1 Dans cette petite boîte vide
2 Avec un ruban de velours
1 Dans cette petite boîte vide 1 Dans cette petite boîte vide
3 Il y a tout mon cœur et mes rides
2 Avec un ruban de velours 2 Avec un ruban de velours
4 Mon sourire et tout mon amour
3 Il y a tout mon cœur et mes rides 3 Il y a tout mon cœur et mes rides
4 Mon sourire et tout mon amour 4 Mon sourire et tout mon amour
If less than 80% of words recalled, stop.
If 80% or more words recalled, continue.
5 Il n’y a pas d’argent qui remplace 5 Il n’y a pas d’argent qui remplace 1 Dans cette petite boîte vide
6 Tout le temps que l’on peut donner 6 Tout le temps que l’on peut donner 2 Avec un ruban de velours
3 Il y a tout mon cœur et mes rides
4 Mon sourire et tout mon amour
5 Il n’y a pas d’argent qui remplace
6 Tout le temps que l’on peut donner
7 À tous ceux que l’on aime hélas 7 À tous ceux que l’on aime hélas 1 Dans cette petite boîte vide
8 Trop souvent qu’on oublie d’aimer 8 Trop souvent qu’on oublie d’aimer 2 Avec un ruban de velours
3 Il y a tout mon cœur et mes rides
4 Mon sourire et tout mon amour
5 Il n’y a pas d’argent qui remplace
6 Tout le temps que l’on peut donner
7 À tous ceux que l’on aime hélas
8 Trop souvent qu’on oublie d’aimer
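
The stopping rule in Table 2 can be sketched in Python. This is only an illustration under stated assumptions: the table shows the 80% criterion after the first block of lines, and the sketch assumes the same criterion is applied after each cumulative stage (4, 6, then 8 lines). The names `lines_attempted` and `recall_score` are hypothetical and do not come from the study.

```python
from typing import Callable, Sequence

STAGES = (4, 6, 8)   # cumulative number of lines to be recalled at each stage
THRESHOLD = 0.80     # proportion of words required to continue


def lines_attempted(recall_score: Callable[[int], float],
                    stages: Sequence[int] = STAGES,
                    threshold: float = THRESHOLD) -> int:
    """Return the number of lines a participant ends up attempting.

    recall_score(n) is a hypothetical callback returning the proportion of
    words correctly recalled when the first n lines are tested.
    Assumption: the criterion is checked after every stage, not only the first.
    """
    attempted = stages[0]
    for n in stages:
        attempted = n
        if recall_score(n) < threshold:
            break  # fewer than 80% of the words recalled: stop here
    return attempted
```

For example, a participant who recalls 90% of the words after four lines but only 70% after six lines would end with six attempted lines, which matches the "lines attempted (4, 6, or 8)" measure used in the analyses.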
The practice song was learned before each condition in the corresponding version. Participants were asked to
do their best to recall the exact words and, if they did not remember a part, to report whatever came to
mind. The participants listened to digital recordings through speakers, and their performance was recorded
on a Sony DAT.

In order to assess verbal memory independently from song memory, the Rey Auditory Verbal Learning Test
(RAVLT; Rey, 1964) was administered after participants had learned the three songs. In this task, they had
five trials to recall a list of 15 unrelated words. The RAVLT also served as a distraction task. Afterward,
participants were asked to make a written recall¹ of the three songs they had previously learned (excluding
the practice song). The time elapsed between recalls was approximately 20 min. This delayed recall came as a
surprise test, because participants were not warned in advance that their memory would be assessed one more
time. Since music may help long-term memory (Rainey & Larsen, 2002), 25 of the 36 participants were
contacted 7 months later (2–10 months after the first administration) and asked again for vocal song recall,
which was recorded on tape.

Data Scoring
For text recall, words were considered correct or incorrect, irrespective of their pitch and duration when
sung. Words were chosen over syllables as the criterion because number of syllables sometimes differed
across conditions; mute vowels are often sung but not pronounced. The words had to be produced in the correct
order to obtain a point. Omissions and substitutions received no points. A point was lost when words were
added, and half-points were subtracted when words were mispronounced but recognizable. In cases in which
participants made an error because they misperceived a word and did not repeat it correctly when first
heard, the repeated version of the word was considered correct in recall. Finally, a point was lost if the
correct word was spoken instead of sung, and vice versa. The total numbers of words correctly reproduced in
the last immediate recall and in the delayed recall were then divided by the number of words contained in
the lines to be recalled by a given participant, thus taking into consideration the differing number of
lines that participants were recalling. This ratio was multiplied by 100 to obtain a percentage.

The raw number of words recalled and the number of lines (4, 6, or 8) attempted in immediate recall were
also taken into account. Number of hesitations, defined as a marked pause or a corrected attempt (the
participant tried something and then changed her/his answer), was also noted. Finally, the locations of
breaths were recorded.

In the sung mode of recall, the musical notes in each final recall were transcribed by two independent
musicians. The agreement between the judges was very low for rhythm, and therefore rhythm was not considered
in the present study. Instead, pitch intervals and directions were analyzed. The number of correct notes was
defined as the number of notes both judges gave a point to. When there was a disagreement (in 15% of the
cases, i.e., for 228 out of 1,559 notes produced), the note was discarded. Thus, the score corresponded to
the number of correct pitches divided by the total number of possible notes minus the notes both raters
disagreed upon, multiplied by 100.

Results
Performance in the immediate recall of the song was first examined by considering the percentages of words
that were correctly sung and spoken after presentation of sung and recited songs. The number of lines
completed, the total number of words recalled, the position of the forgotten lines in the song, the types of
errors made by the participants, and pitch accuracy were also analyzed. Word recall was also examined as a
function of learning condition both after a delay of 20 min and after several months. Finally, performance
in lyrics learning was compared with performance in the Rey Auditory Verbal Learning Test.

Immediate Recall
Correct words. An initial repeated measures ANOVA, with both group (musician, nonmusician) and order of
Table 3
Means (and Standard Errors) Obtained in Each Condition on Immediate Recall in Experiment 1

                                     Sung–Sung      Sung–Spoken    Divided–Spoken
Group          Dependent Variable    M      SE      M      SE      M      SE
Nonmusicians   %                     63.9   4.8     73.0   3.5     74.0   3.2
               Words                 20.4   2.9     22.7   2.3     27.9   2.5
               Lines                  5.0   0.4      5.0   0.4      6.0   0.4
Musicians      %                     56.1   4.3     68.9   3.0     73.9   3.6
               Words                 18.7   2.3     23.7   2.3     27.1   3.0
               Lines                  5.2   0.4      5.6   0.4      5.8   0.4
Mean           %                     60.0   3.2     70.9   2.3     74.0   2.4
               Words                 19.6   1.8     23.2   1.6     27.5   1.9
               Lines                  5.1   0.4      5.3   0.4      5.9   0.4
presentation (1, 2, 3) as between-subjects variables and condition (sung–sung, sung–spoken, divided–spoken)
as a within-subjects variable, was performed on the percentage of words recalled. Since there was neither an
effect of order [F(2,30) = 1.19, MSe = 175.84, p > .05] nor an interaction between order and the other
factors, order was not considered in the following analyses.

In Table 3, performance is expressed in percentage of words recalled, as well as in terms of the total
number of words recalled and the number of lines attempted. As can be seen, recall appears more difficult
when participants have to sing, regardless of their musical background. This was supported by an ANOVA
performed on percentage of words recalled, with condition (sung–sung, sung–spoken, divided–spoken) as the
within-subjects variable and group (musician, nonmusician) as the between-subjects variable. The ANOVA
revealed a main effect of condition [F(2,68) = 11.78, MSe = 165.78, p < .001] and no group effect (F < 1) or
interaction between condition and group (F < 1). Post hoc Tukey tests revealed that recall did not differ in
the two spoken-recall conditions (p > .05) and was significantly better than sung recall (p < .01).

The superiority of spoken recall was also apparent when the other measures were considered. When the raw
number of correct words was considered as the dependent variable, the main effect of condition [F(2,68) =
7.34, MSe = 78.14, p < .01] also reached significance. There was no group effect nor an interaction with
condition (Fs < 1). When considering the number of attempted lines (4, 6, or 8 lines), a similar trend was
observed, since in this respect performance in the spoken conditions was also superior to that in the sung
condition [F(2,68) = 2.83, MSe = 2.14, p = .07]. Moreover, as shown in Figure 1, nonmusicians could learn as
much of the songs as musicians when [...]

An aspect that is worth examining is serial recall. The beginning of a song acts as an anchor point for the
whole song. This refers to the fact that the beginning of a sequence is a determinant for the recall of the
sequence in question (Peretz, Radeau, & Arguin, 2004). Therefore, recall of the first song lines should be
best, and recall should decline as the song progresses. Because the first line was presented twice to
participants, whereas the subsequent lines were presented only once, we considered the recall of the second
line, which was forgotten by only 17% of the participants, and compared it with recall of the last line. The
last line was not recalled by 42% of the participants while singing, but by only 18% while reciting. This
difference was significant [Q(2) = 8.78, p < .05, using Cochran’s test]. Thus, serial position of the line
appears to be more important in singing than in speaking. Moreover, forgetting an entire line was more
frequent in sung recall (24% of the lines) than in the spoken recalls (11%) [F(2,70) = 8.25, MSe = 0.02,
p < .01]. When a line was omitted in singing, the next line was omitted in 71% of the cases. In contrast,
when a line was omitted in reciting, only 55% of the following lines were missed. This suggests that text
recall in singing is more strictly sequential, because it appears to be more dependent on the serial order
of information than is reciting.

Word errors. Types of errors are useful in determining the nature of the memory code used by participants.
For example, one word can be substituted for another in order to preserve the song line structure, and this
type of error would be expected to occur more often while singing than while speaking. Indeed, words were
often replaced by a word with the same number of syllables (e.g., “Je t’écris cet’ lettr’ par amitié” for
“Je t’écris ces mots par amitié”).

[Figure 1: number of participants (y-axis) by attempted lines (4, 6, or 8 lines; x-axis); legend:
professional singers, singing as minor, no singing training.]
Figure 1. Number of nonmusicians and musicians reaching each level of song line recall in Experiment 1, as a
function of their singing experience. Nonmusicians are represented in white and musicians in gray shades.
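
The percentage-of-words measure analyzed above follows the Data Scoring rules: one point per word produced in the correct order, one point lost per added word and per word produced in the wrong mode (spoken instead of sung, or vice versa), half a point lost per mispronounced but recognizable word, and the total divided by the number of words in the lines the participant was recalling. The Python sketch below is one possible reading of those rules, not the authors' scoring script. It assumes that mispronounced and wrong-mode words are first counted among the in-order words and then penalized, and the names `RecallTally` and `percent_words_recalled` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecallTally:
    in_order: int           # target words produced in the correct order (+1 each)
    added: int = 0          # words that are not in the lyrics (-1 each)
    mispronounced: int = 0  # recognizable but mispronounced words (-0.5 each)
    wrong_mode: int = 0     # spoken instead of sung, or vice versa (-1 each)


def percent_words_recalled(tally: RecallTally, words_in_attempted_lines: int) -> float:
    """Percentage score relative to the lines the participant attempted."""
    if words_in_attempted_lines <= 0:
        raise ValueError("words_in_attempted_lines must be positive")
    points = (tally.in_order
              - tally.added
              - 0.5 * tally.mispronounced
              - tally.wrong_mode)
    return 100.0 * points / words_in_attempted_lines
```

Dividing by the words in the attempted lines, rather than by the words in the whole song, is what lets participants who stopped at four, six, or eight lines be compared on the same scale.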
Learning Song Lyrics 247
Table 4
Mean Percentages of Errors (and Standard Errors) in Each Condition As a Function of Line Structure

                                  Sung–Sung    Sung–Spoken    Divided–Spoken
Type            Line Structure    M     SE     M     SE       M     SE
Omissions       Preserved         13    4      11    5         9    4
                Altered           10    3      21    4        24    5
Substitutions   Preserved         20    4      27    4        22    4
                Altered           10    3      12    3        20    4
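
Table 4 cross-classifies each omission or substitution by whether the produced line keeps the syllable count of the target line (preserved) or not (altered). A minimal Python sketch of that tally follows. It assumes the syllable counts have already been obtained by hand from the transcriptions, and the function names and the tuple layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def line_structure(target_syllables: int, produced_syllables: int) -> str:
    """A line is 'preserved' when the syllable counts match, else 'altered'."""
    return "preserved" if produced_syllables == target_syllables else "altered"


def tally_errors(errors: Iterable[Tuple[str, int, int]]) -> Counter:
    """Count errors by (error type, line structure).

    Each item is (error_type, target_syllables, produced_syllables), where
    error_type is "omission" or "substitution".
    """
    counts: Counter = Counter()
    for error_type, target, produced in errors:
        counts[(error_type, line_structure(target, produced))] += 1
    return counts
```

Under this reading, an omitted word that is filled with a meaningless syllable such as /na/ would count as an omission with preserved line structure, because the syllable count of the line is unchanged.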
Similarly, in singing, when a word was omitted, participants could replace it by a meaningless syllable
(/na/) in order to preserve line structure. These omissions and substitution errors were assessed with
respect to the number of syllables in the line. If a match was found, the line structure was considered
preserved. When the number of syllables did not match, the line structure was considered altered. The
results of this analysis are presented in Table 4. Other types of errors, such as the addition of words (2%
of total errors) or pronunciation errors (0.2%) were too rare to be examined.

As can be seen in Table 4, errors tended to preserve the line structure, especially in singing. An ANOVA
with condition (sung–sung, sung–spoken, divided–spoken), type of errors (omission, substitution), and line
structure (preserved, altered) as within-subjects variables and group (musician, nonmusician) as a
between-subjects variable yielded an interaction between type of error and line structure [F(1,34) = 8.96,
MSe = 0.08, p < .01]. As expected, the omission of words tends to alter the line structure [18% vs. 11%;
t(35) = 1.90, SE = 0.04, p = .07], whereas substitutions more often preserved it [23% vs. 14%; t(35) = 2.95,
SE = 0.03, p < .01]. This pattern was not significantly affected by singing, since the interaction with
condition was not significant (F < 1). There was no group effect, nor any interaction between group and any
other variables. In addition, the substituted words were semantically related to the target (67% of the
words), thus keeping the gist of the line (e.g., “si jamais vous trouvez cet homme”/“if you ever find this
man” instead of “si jamais vous tenez cet âme”/“if you ever hold this soul”). Thus, participants tried to
respect both the number of syllables and the meaning of words in their recall of lyrics, regardless of the
mode of vocal reproduction.

Another factor that is known to enhance memory of lyrics is the presence of rhymes at the end of lines. In
order to assess the contribution of rhyme, we examined word errors as a function of their serial position in
the line. The final words of each line (that is, those bearing the rhyme) were incorrectly reproduced in
only 15% of the lines (with 19% and 12.5% in singing and reciting, respectively). This error rate was
smaller than the one observed for any prior position in the line (e.g., the error rate was 20% for the
initial word of the line; t(595) = 2.92, SE = 0.02, p < .01). Moreover, when the last word was replaced by
another word, it respected the rhyme in 39% of the cases (e.g., tour for jour).

In order to assess fluency, the number of hesitations per line was examined in each condition (see Table 5).
As can be seen, the amount of hesitations was equal for musicians in the singing and speaking conditions,
but nonmusicians clearly made fewer hesitations when singing. The interaction between condition and group
was close to significance [F(2,68) = 2.88, MSe = 0.03, p = .06].

Table 5
Mean Percentages of Hesitations per Line (and Standard Errors) in Each Condition for Each Group

                Sung–Sung    Sung–Spoken    Divided–Spoken
Group           M     SE     M     SE       M     SE
Nonmusicians     6    3      26    5        21    4
Musicians       14    3      17    3        15    5
Mean             9    3      21    4        18    5

Finally, participants generally took a breath between lines (75%) instead of during a line. While singing,
47% of them took a breath after each line. While reciting, breaths were often taken after two or three
lines. Indeed, more spoken than sung words can be produced in a single breath.

Notes. In the sung–sung condition, nonmusicians correctly sang 36% of the notes (SE = 7.8) and 65% of the
words (SE = 4.8), whereas musicians correctly sang 48% of the notes (SE = 7.2) and 56% of the words (SE =
4.3). An ANOVA with material (word, note) as a within-subjects variable and group (musician, nonmusician) as
a between-subjects variable revealed an interaction between material and group [F(1,34) = 4.61, MSe =
413.43, p < .05]. Whereas nonmusicians recalled more words than notes (p < .01 using a post hoc Tukey test),
musicians did not. Interestingly, musicians also did not reproduce more correct pitches than did
nonmusicians (n.s.). When the total numbers of correct notes (M = 13.1, SE = 2.83, for nonmusicians and M =
19.6, SE = 4.17, for musicians) and words were examined instead of the proportions of correct notes and
words (see Table 3), there was no effect of material [F(1,34) = 1.85, MSe = 97.63, p > .05] nor any group
effect (F < 1), but the interaction was again close to significance [F(1,34) = 3.03, p = .09].

Furthermore, there was no significant correlation between note and word recall, either in nonmusicians
[r(16) = .38, n.s.] or in musicians [r(16) = .27, n.s.].²

Delayed recall and long-term retention. Recall after a 20-min delay is presented in Table 6. As can be seen,
performance dropped by half. Moreover, word recall appeared to persist longer after a divided–spoken
presentation. However, this trend was not significant, as revealed by an ANOVA with condition (sung–sung,
sung–spoken,
[Figure 2: mean number of recalled words (y-axis) on Trials 1–5 (x-axis) for nonmusicians (n = 18) and
musicians (n = 18).]
Figure 2. Mean number of recalled words (and standard errors) on each trial of the Rey Auditory Verbal
Learning Test in nonmusicians and musicians.

Method
Six musicians (3 women, 3 men) and 6 nonmusicians (5 women, 1 man) who had participated in Experiment 1, for
a total of 12 participants (mean age, 23.3; range, 20–26), came back for an additional session 11 months
later (range, 5–13 months). This subgroup was selected on the basis of their availability. No professional
singers participated in this second experiment.

The participants were presented with the sung–sung and divided–spoken versions of the same six songs that
were used in Experiment 1. However, care was taken to present each participant with the three songs that the
participant had not learned in Experiment 1. Recall was tested with an adaptive procedure, as in
Experiment 1. Each participant once again learned each of the three songs in a
does not seem related to a trade-off between the two components. Accuracy in singing the melody was similar
whether it carried lyrics or not. Furthermore, there was no correlation between words and notes recalled,
suggesting that these two components are supported by separate memory representations.

GENERAL DISCUSSION

The present findings suggest that the best strategy for learning song lyrics is to ignore the melody. The
melody seems to interfere rather than facilitate word recall in songs in both musically trained and
untrained learners. Music was found to be of little help for text recall in either encoding or response.
Hearing the lyrics embedded in the melody (i.e., sung) or spoken with the melody in the background did not
affect word recall, even after a time delay (Experiment 1) and after task familiarization (Experiment 2).
The same conclusion applies to the mode of expression: Performance at reproducing both the lyrics and the
melody while singing was either impaired (Experiment 1) or slightly inferior (Experiment 2) to the recall of
the text alone. Melody recall was generally less precise than word recall, whether it was sung with the
lyrics (Experiments 1 and 2) or on /la/ (Experiment 2), in both musicians and nonmusicians. Thus, the
results suggest that in the first steps of learning a new song, melody and lyrics are remembered separately,
making singing a dual task.

The cost of singing was reflected by a 14% word loss (Experiment 1), but it was associated with an 8%
increase in the recall of notes (Experiment 2); the cost was reliable, but the benefit was not. This
cost–benefit analysis is more compatible with the view that the melody and lyrics of songs are processed
independently (Besson et al., 1998; Bonnel et al., 2001; Hébert & Peretz, 2001; Peretz, 1996) rather than
treated as an integrated unit (see, e.g., Serafine et al., 1984). Thus, the present results extend to
singing what has been found in the normal functioning of perception and memory (see Peretz, Radeau, &
Arguin, 2004, for a recent review and discussion).

However, separate production of melody and lyrics does not entail interference, unless attention to one
component adversely affects the other. In the present case, it seems that lyric recall was either
prioritized or much easier than note recall. Such a discrepancy between the processing of words and notes
has repeatedly been found in the literature pertaining to perception of songs, with words always being more
salient than musical notes (Hébert & Peretz, 2001; Peretz, Radeau, & Arguin, 2004). There are several
factors that can account for this advantage of lyrics over melody. First, the lyrics were organized like a
poem, and hence their memorability benefited from the use of several language constraints that are known to
help remembering (Rubin, 1995). Semantics, rhymes, and line structure were all found to affect recall,
whether recited or sung. In contrast, the melody had no semantics or rhymes, but has rhythm, line structure,
and pitch accents. These musical characteristics were instrumental in decreasing hesitations, making singing
more fluent, but were not sufficient to give additional assistance to lyric recall. On the contrary, it was
observed that when a line was forgotten, participants were usually unable to continue singing, but that they
could continue reciting after a break. This might be a drawback of the strictly sequential nature of
singing, in which melodic lines are represented in connected strings with front anchoring.

Nevertheless, one important cue for auditory–vocal remembering that is common to both music and poems is
rhythm. The regular organization of stresses, mostly alternating between strong and weak beats/syllables, is
supposed to limit the words that are compatible with it, and thereby constrains word selection. At least in
English, the rhythmic similarity between the prosodic accent structure of spoken words and the metric
structure of the melody is striking and has long been noted by linguists (see, e.g., Hayes & Kaun, 1996) and
music theorists (Lerdahl & Jackendoff, 1983). Moreover, Palmer and Kelly (1992) have shown that linguistic
accent structure and musical meter are generally aligned in Western songs. Hence, rhythmic structure, as
determined by the number of syllables (or notes) and the location of primary stress, may serve as a
compatible format for setting words to tones. By this account, recalling a particular stress pattern in a
melody (or spoken text) activates a metrical grid that constrains the type of text (melody) that is
compatible with it. A common metrical grid is typically used throughout a song. Therefore, metric structure
provides a means by which lines of an entire song are organized in a common hierarchical structure, thereby
relating nonadjacent song components and helping memory.

A limitation of the present study is that we were unable to assess the specific contribution of rhythm to
memory. First, the raters failed to provide consistent judgments for the rhythmic aspect of the productions.
Second, French is not a stress-based language, so it is possible that musical meter (and rhythm in general)
is not as efficient a memory aid for French lyrics as it is for English lyrics. Yet, as mentioned in the
introduction, support for the contribution of music to lyric recall in English is scant (Kilgour et al.,
2000, Experiment 1 but not 2; Wallace, 1994, Experiments 1 and 2 but not 3). There are also many negative
reports of this contribution, even in English (Calvert & Billingsley, 1998; Jellison & Miller, 1982).
Therefore, and even though the contribution of rhythm to lyric recall has not been established yet in
French, musical constraints appear to be of limited help for lyric recall in general.

This conclusion raises the question of why music is believed to be so important for verbal memory, not only
in oral tradition but also in everyday life. We believe this is due to a misunderstanding of the utility of
music. Music is not at the service of language. In songs, music contributes to the creation of a general
mood that is shared with others (Bowra, 1962; see also Thompson & Russo, 2004, for empirical support). As
Booth (1981) writes, a singer tells people “nothing they need to decode or learn. He evokes in them ways of
seeing life that they already have” (p. 28). In fact, oral transmission of text is rarely word for word
(verbatim) in singing. Although singers believe that they sing the text exactly as heard, they never do so
(see Rubin, 1995, for a review). This applies to music recall as well.
Singers, with and without musical training, never recall note for note what they have been presented
(Sloboda & Parker, 1985). Rather, singers memorize a schema in which the surface detail is not retained.
Recall involves processes akin to improvisation that fill in structurally important events according to
general constraints. Learning a new song for faithful reproduction is thus a laborious task that requires
hours of practice.

Engineering Research Council of Canada to I.P. We thank Claude Gauthier for giving us access to his song
material, and Sylvie Hébert, Chantal Bergeron, and Bernard Bouchard for assistance in creating the stimuli,
data transcription, and melody transcription, respectively. Correspondence relating to this article may be
sent to I. Peretz, Département de psychologie, Université de Montréal, C.P. 6118, succ. Centre-ville,
Montréal, Québec, H3C 3J7 Canada (e-mail: [email protected]).
Copyright 1993 by Éditions du Jour de l’An. Transcribed and reproduced with permission.