Benefits From Retrieval Practice Are Greater For

This document summarizes a study that examined how the benefits of retrieval practice (i.e. testing effect) vary based on an individual's working memory capacity. The study tested college students on general knowledge facts under different conditions of lag between study and test, presence of feedback, and retention interval. It found that testing led to better long-term retention than restudying, and benefits from testing with feedback were significantly greater for students with lower working memory capacity on a delayed final test administered 2 days later. The findings suggest that retrieval practice may be an especially effective learning strategy to enhance long-term retention for students with lower ability levels.


Memory

ISSN: 0965-8211 (Print) 1464-0686 (Online) Journal homepage: http://www.tandfonline.com/loi/pmem20

Benefits from retrieval practice are greater for students with lower working memory capacity

Pooja K. Agarwal, Jason R. Finley, Nathan S. Rose & Henry L. Roediger III

To cite this article: Pooja K. Agarwal, Jason R. Finley, Nathan S. Rose & Henry L. Roediger III
(2016): Benefits from retrieval practice are greater for students with lower working memory
capacity, Memory, DOI: 10.1080/09658211.2016.1220579

To link to this article: http://dx.doi.org/10.1080/09658211.2016.1220579

Published online: 17 Aug 2016.



Pooja K. Agarwal (a), Jason R. Finley (b), Nathan S. Rose (c) and Henry L. Roediger III (a)
(a) Department of Psychology, Washington University in St. Louis, St. Louis, MO, USA; (b) Department of Psychology, Fontbonne University, Clayton, MO, USA; (c) Department of Psychology, University of Notre Dame, Notre Dame, IN, USA

ABSTRACT

We examined the effects of retrieval practice for students who varied in working memory capacity as a function of the lag between study of material and its initial test, whether or not feedback was given after the test, and the retention interval of the final test. We sought to determine whether a blend of these conditions exists that maximises benefits from retrieval practice for lower and higher working memory capacity students. College students learned general knowledge facts and then restudied the facts or were tested on them (with or without feedback) at lags of 0–9 intervening items. Final cued recall performance was better for tested items than for restudied items after both 10 minutes and 2 days, particularly for longer study–test lags. Furthermore, on the 2-day delayed test the benefits from retrieval practice with feedback were significantly greater for students with lower working memory capacity than for students with higher working memory capacity (r = −.42). Retrieval practice may be an especially effective learning strategy for lower ability students.

ARTICLE HISTORY
Received 2 April 2016
Accepted 29 July 2016

KEYWORDS
Testing effect; retrieval practice; working memory; feedback; lag

CONTACT Pooja K. Agarwal [email protected], www.poojaagarwal.com Washington University in St. Louis, St. Louis, MO 63130, USA
© 2016 Informa UK Limited, trading as Taylor & Francis Group

Testing is a powerful technique to enhance learning, because the act of retrieving information from memory promotes the ability to recall material again in the future (Carpenter & DeLosh, 2005; Carrier & Pashler, 1992; see Roediger & Karpicke, 2006a, for a review). The use of retrieval practice as a learning strategy, by teachers and students, has been shown to increase students’ long-term retention and transfer of knowledge to new situations (Agarwal, Bain, & Chamberlain, 2012; Butler, 2010).

In laboratory and classroom settings, several factors modulate benefits from retrieval practice, also referred to as the “testing effect” (for a review, see Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013). These factors include the time elapsed or the number of items between initial study and retrieval attempts (i.e., lag), the delay between initial retrieval practice and the final test (i.e., retention interval), and the presence or absence of feedback during initial retrieval. First, regarding lag, in general longer intervals between study of material and a test lead to better long-term retention, though the precise benefit from various schedules is complex and under debate (e.g., Balota, Duchek, & Logan, 2007; Karpicke & Roediger, 2007; Pyc & Rawson, 2007; Roediger & Karpicke, 2011). Second, regarding retention interval, a tradeoff is often found such that restudying improves retention in the short-term, but retrieval practice benefits learning in the long-term (e.g., Roediger & Karpicke, 2006b). In addition, shorter lags between study and retrieval trials often produce superior performance at short retention intervals, but longer lags produce superior performance at long retention intervals (Karpicke & Roediger, 2007; Whitten & Bjork, 1977). Third, benefits from retrieval practice substantially increase when feedback is provided, compared to retrieval without feedback; however, the timing of feedback following retrieval (immediate vs. delayed) and the length of the retention interval (e.g., one day vs. one week) influence its potency (Butler, Karpicke, & Roediger, 2007). In summary, lag, retention interval, and feedback all modulate benefits from retrieval practice, and various combinations of these factors produce varying degrees of enhanced learning.

Individual differences may also influence retrieval-enhanced learning (Unsworth & Engle, 2007). For instance, recent examinations reveal relationships between individual differences and retrieval difficulty (Bui, Maddox, & Balota, 2013), accessibility of retrieval cues (Unsworth, Spillers, & Brewer, 2012), and presentation duration (Unsworth, 2016). Regarding the testing effect, Wiklund-Hörnqvist, Jonsson, and Nyberg (2014) concluded that retrieval practice benefits did not differ as a function of working memory; however, their design manipulated trial type (study–study vs. study–test) between-subjects, so it cannot be determined the extent to which individual subjects exhibited the retrieval practice effect, making the null result difficult to interpret.

In a paired associate paradigm, Brewer and Unsworth (2012) found a small benefit of retrieval practice, which was significantly correlated with some individual difference measures (e.g., episodic memory) but not others (e.g., working memory). As Brewer and Unsworth noted, the relatively small testing effect they found is inconsistent with those of larger magnitude typically seen in the literature, leaving open the question whether there are “aptitude × treatment interactions” (pp. 414–415). In other words, individual benefits from retrieval may vary depending on factors known to modulate the testing effect, including lag, retention interval, and feedback.

In a follow-up study, Pan, Pashler, Potter, and Rickard (2015) conducted a replication attempt using Brewer and Unsworth’s materials and general procedures. Across two experiments, one online and one in the laboratory, Pan et al. found substantial testing effects (larger than in Brewer and Unsworth), but no significant correlations between an individual difference measure (episodic memory) and benefits from testing. Pan et al. speculated that subtle procedural distinctions might have contributed to the discrepancy between the two studies. Namely, the differences in counterbalancing and the blocking or mixing of presentations may account for the increased testing effect and/or the lack of a correlation with the individual difference measure in the Pan et al. study.

Lastly, in a foreign language vocabulary paradigm, Tse and Pu (2012) found a small benefit of retrieval practice, albeit significantly correlated with a combined working memory and test anxiety measure. Echoing the concluding remarks by others, Tse and Pu acknowledged that the unexpected small testing effect might be a result of using a short lag between items, even when employing a 7-day retention interval. In other words, ascertaining a strong relationship between the testing effect and individual differences can be challenging when using shorter lags, which are known to be less potent for learning (e.g., Dunlosky et al., 2013).

To summarise, across recent studies examining individual differences, factors known to improve test-enhanced learning (lag, retention interval, and feedback) were held constant. As a result, prior studies with small testing effects and/or small correlations with individual difference measures provide an initial glimpse into the precise relationship between retrieval practice and individual differences. Our aim was to explore both the relationship between the testing effect and individual differences, as well as the relationship between individuals and optimal retrieval conditions. We examined individual differences across various levels of lag, retention interval, and feedback, variables that are known to modulate the benefits of retrieval practice. Based on the current literature, we expected to find large benefits from retrieval when testing at longer lags, with feedback, at a delayed retention interval. We also measured working memory capacity to determine whether individuals might differ in the factors needed to provide the greatest benefit from retrieval. An ideal combination of factors may not exist for all students; rather, different combinations may prove effective for different students. This research contributes to our practical understanding about the conditions that lead to the greatest benefits of test-enhanced learning, and how these conditions might be tailored to enhance learning.

Methods

Subjects

One hundred sixty-six subjects (M age = 20.0 years, 103 female) were recruited from the Washington University in St. Louis Department of Psychology human subject pool. Subjects received either credit towards completion of a research participation requirement or cash payment ($10/hour). Data from 10 subjects were excluded from analyses because they did not follow instructions or they did not return for the second session. Thus, data are reported from 156 subjects.

We note that the 156 subjects were tested at two different time periods. The initial experiment was conducted in 2008 with 104 subjects. In 2011, we added 52 more subjects from the same pool for greater power. The design and procedures used at the two time periods were identical, and analyses reported in the results section confirmed a replication of findings between the two cohorts of subjects. Accordingly, we have collapsed the remainder of the methods and results sections across the two cohorts for maximal power and variability across individuals, unless otherwise noted.

Design

We used a 2 (Trial type: study–study, study–test) × 6 (Lag: 0, 1, 3, 5, 7, 9) × 2 (Feedback for study–test trials: present, absent) × 2 (Retention interval: 10 minutes, 2 days) mixed design. Trial type and lag were manipulated within subjects, whereas feedback and retention interval were manipulated between-subjects (39 subjects per cell). A non-studied baseline condition was included such that all subjects were tested on some items only during the final test (without initially studying these items) to assess how much learning had taken place during the experimental session.

Materials

One hundred ten general knowledge questions drawn from the Nelson and Narens (1980) norms were used for this experiment. An example general knowledge question used was, “What is the city in which the Baseball Hall of Fame is located?” Based on the norms, items had a 10% average recall in college students, ranging from 0.4% to 22% recall. As noted in our results section, the average baseline (non-studied) recall for the general knowledge questions found in our study was 12%, in accordance with the Nelson and Narens norms.
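As an illustration only, the factorial structure given in the Design section can be enumerated in a few lines of Python. The variable names below are hypothetical and not taken from the study materials; the factor levels and the figure of 39 subjects per cell come from the text. The sketch simply checks that four between-subjects cells of 39 subjects account for the 156 subjects analysed.

```python
from itertools import product

# Factor levels as stated in the Design section (names are illustrative).
TRIAL_TYPES = ("study-study", "study-test")   # manipulated within subjects
LAGS = (0, 1, 3, 5, 7, 9)                     # manipulated within subjects
FEEDBACK = ("present", "absent")              # manipulated between subjects
RETENTION = ("10 minutes", "2 days")          # manipulated between subjects

SUBJECTS_PER_CELL = 39  # reported in the Design section

# Each subject belongs to exactly one feedback x retention-interval cell.
between_cells = list(product(FEEDBACK, RETENTION))

# Each subject experiences every trial type x lag combination.
within_conditions = list(product(TRIAL_TYPES, LAGS))

total_subjects = len(between_cells) * SUBJECTS_PER_CELL

print(f"between-subjects cells: {len(between_cells)}")
print(f"within-subjects conditions (excluding baseline): {len(within_conditions)}")
print(f"total subjects: {total_subjects}")
```

The twelve within-subjects conditions exclude the non-studied baseline, which the article treats as an additional condition.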

Of the 110 general knowledge items, 78 were used as experimental items and 32 were used as fillers to create the list structure. Thirteen sets of six facts each, equated for probability of recall, were counterbalanced across the 13 within-subject conditions (6 study–study lags, 6 study–test lags, and a non-studied baseline condition). Of the 78 critical items, subjects were presented 36 items in the study–study condition and 36 items in the study–test condition, whereas 6 items were queried only during the final test (the non-studied baseline condition). For each study–study or study–test lag (0, 1, 3, 5, 7, and 9), subjects were presented with six items and average list position was equated across trial type and lag condition.

Figure 1. Mean proportion correct on initial (Panel a) and final recall tests after 10 minutes (Panel b) or 2 days (Panel c) as a function of lag and trial type, collapsed over feedback conditions. Error bars represent standard errors of the mean per lag (Panel a) or standard errors of the mean difference score per lag (Panels b and c).

Procedure

Subjects were tested individually or in small groups. They were seated at a computer and completed all learning and test phases using E-Prime 1.0 software (Schneider, Eschman, & Zuccolotto, 2002), which also provided instructions and recorded time spent on each phase of the experiment.

In the learning phase, subjects viewed 110 general knowledge questions during study and test trials. Subjects were given the following instructions:

During study trials, you will see a trivia question with its one-word answer below it on the computer screen. Please study this pair so you can remember it later on. During test trials, you will see a trivia question with a cursor below it. Please type in the correct answer for the trivia question.

Following these instructions, subjects received one practice study–test trial (including feedback), and then moved on to the remainder of the learning phase.

For the first presentation of an item, subjects studied an intact question–answer pair for 8 seconds (e.g., What is the city in which the Baseball Hall of Fame is located? Cooperstown). For the second presentation of an item, which followed a lag of 0–9 intervening items, subjects completed either a study–study trial (for half of the items), or a study–test trial (for the other half). Study–study trials consisted of re-presentation of the intact question–answer pair for 11 seconds. Study–test trials differed by feedback condition. For the no feedback condition, subjects were shown the question and had 11 seconds to recall and type in the answer. For the feedback condition, subjects were shown the question, given 8 seconds to recall and type in the answer, and then they were shown the correct answer for three seconds. Note that total time for the second presentation of an item was equated at 11 seconds in all conditions (study–study, study–test–no feedback, and study–test–feedback).

After the learning phase, all subjects completed a working memory task on the computer for approximately 10 minutes. Specifically, subjects completed an automated operation span by Unsworth, Heitz, Schrock, and Engle (2005). Subjects were presented with a set of letters to remember, followed by a math operation to solve, followed by a recall phase in which subjects selected letters on a computer screen in the order in which the letters were presented. The span task included three sets of letters for each set size, which ranged from three to seven letters. In total, the task included 75 letters and 75 math problems. The order of set sizes was random for each participant. Unsworth et al. reported a reliability (Cronbach’s α) of .78.

Subjects then received a final test either immediately following the working memory task (a 10-minute retention interval) or 2 days after the learning phase. The instructions for the final test were: “This test will look similar to the test trials earlier. You will see a trivia question at the top of the computer screen with a cursor below it. Please type in the correct one-word answer for each trivia question.” During the final test phase, subjects were presented with the 78 critical items in random order and were provided 14 seconds to type in their answer for each question.

The total time required for this procedure was approximately 90 minutes (60 min for the learning phase and working memory task, 30 min for the final test phase). Upon completion of the experiment, subjects were debriefed and thanked for their time.

Results

An alpha level of .05 was used for all tests of statistical significance except where otherwise noted. Where Mauchly’s test indicated that the assumption of sphericity was violated for a within-subjects factor in an analysis of variance (ANOVA), the Greenhouse–Geisser correction was applied to the degrees of freedom. Effect sizes for comparisons of means are reported as Cohen’s d calculated using the pooled standard deviation of the groups being compared. Effect sizes for ANOVAs are reported as ω̂² (one way) or ω̂p² calculated using the formulae provided by Maxwell and Delaney (2004, p. 598). Standard deviations reported are uncorrected for bias (i.e., calculated using N, not N – 1).

For initial learning performance, a three-way ANOVA (cohort, lag, and feedback) showed that cohort had no significant effect and was not involved in any significant interactions (ps ≥ .245). For final test performance, a five-way ANOVA (cohort, trial type, lag, feedback, and retention interval) showed that cohort had no significant effect and was not involved in any significant interactions (ps ≥ .136). Furthermore, working memory capacity did not significantly differ between the two cohorts, t(154) = 0.12, p = .903. Thus, we combined the data from the two cohorts for all analyses, except where otherwise noted.

Initial learning performance

Initial learning performance is shown in Figure 1a. Reliability (Cronbach’s α) was .855 for initial learning performance. Initial recall of answers to general knowledge questions declined as the lag between study and test increased. This was confirmed by a one-way ANOVA across lags, F(5, 775) = 12.22, MSE = 0.261, p < .001, ω̂² = .030. Follow-up t-tests of all 15 pairwise comparisons confirmed that lag 0 led to greater initial recall than the other lags, ts > 5.11, ps < .001, ds > 0.41, though differences between lags greater than 0 were not significant at the Bonferroni adjusted alpha level of .0033. We also performed an alternative analysis using regression to test the apparent decreasing pattern. For each subject, we obtained a slope using simple linear regression predicting mean initial learning performance as a function of lag. The mean slope was −.01 (SD = .02), which was significantly different from zero, t(155) = 5.71, p < .001, d = 0.46.

Because subjects did not receive feedback until after initial test trials and there was only one test per item, no effect of feedback was expected on initial learning. Accordingly, a 2 × 6 mixed ANOVA confirmed that there was neither a main effect of feedback group, F(1, 154) = .153, MSE = .142, p = .697, ω̂p² < .001, nor an interaction between feedback and lag, F(1, 154) = 1.49, MSE = .021, p = .191, ω̂p² = .001. Thus, the data in Figure 1a are collapsed over feedback conditions.

Final test performance

Final test performance is shown in Figure 1b (10-min retention interval) and 1c (2-day interval) as a function of whether repetitions across lags were in the study–study or study–test condition. Reliability (Cronbach’s α) was .953 for final test performance. We first conducted an overall 2 × 6 × 2 × 2 mixed ANOVA (trial type × lag × feedback × retention interval) and determined that feedback (present or absent) showed no significant main effects and was not involved in any significant interactions. Thus, the data in Figure 1b and 1c and further analyses in this section were collapsed across feedback groups. Feedback may not have had an effect because performance in the tested conditions was reasonably high at the lags we used (see Figure 1a).

Second, we examined final test performance as a function of retention interval to determine if there were significant retrieval practice effects after 10 minutes and after 2 days. A 2 × 2 mixed ANOVA (trial type × retention interval) confirmed a main effect of trial type: overall final test performance was better for study–test items (M = 62%, SD = 23%) than for study–study items (M = 54%, SD = 23%), F(1, 154) = 73.18, MSE = .007, p < .001, ω̂p² = .036. Forgetting occurred between 10 minutes (M = 69%, SD = 19%) and 2 days (M = 46%, SD = 20%), F(1, 154) = 55.86, MSE = .074, p < .001, ω̂p² = .260. The interaction between trial type and retention interval did not reach statistical significance, F(1, 154) = 2.67, MSE = .007, p = .104, ω̂p² < .001, indicating that regardless of retention interval, final performance was always greater for study–test items (10 minutes: M = 73%, SD = 19%; 2-day: M = 51%, SD = 21%) than for study–study items (10 minutes: M = 66%, SD = 20%; 2-day: M = 42%, SD = 19%). In addition, final performance for non-studied baseline items (M = 12%, SD = 14%) was significantly worse compared to study–study items, t(155) = 22.89, p < .001, d = 1.48, and study–test items, t(155) = 27.46, p < .001, d = 1.78, confirming that subjects were indeed learning the obscure facts and did not know most of them ahead of time.

Next, we examined final performance as a function of lag in order to determine whether there was an optimal lag for learning and whether this lag differed for the study–study and study–test conditions. Parallel analyses were conducted for the 10-min and 2-day retention interval. In both cases, the pattern in Figure 1a for initial learning was reversed at final test – whereas greater lags between initial study and restudy/test impaired performance during initial learning, they enhanced performance on the final test at both retention intervals, illustrating the pattern Bjork (1994) described as a “desirable difficulty.” The conditions leading to best initial performance led to poorest long-term retention (and vice versa).

Two separate 2 × 6 repeated measures ANOVAs (trial type × lag), one for each retention interval, confirmed main effects of lag for the 10-min retention interval, F(5, 385) = 8.26, MSE = .029, p < .001, ω̂p² = .024, and for the 2-day retention interval F(5, 385) = 9.94, MSE = .036, p < .001, ω̂p² = .031. Benefits from retrieval practice appeared to increase as a function of lag at both retention intervals (see Figure 1b and 1c), although the interaction between trial type and lag did not reach statistical significance at the 10-min retention interval, F(4.5, 346.3) = 1.15, MSE = .032, p = .335, ω̂p² = .001, nor after 2 days, F(4.4, 335.3) = 2.11, MSE = .035, p = .074, ω̂p² = .003.

Next, we performed an alternative analysis using regression to test the apparent increasing pattern. For each subject, we obtained a slope using simple linear regression predicting the mean difference score between study–study and study–test trials as a function of lag. At the 10-min retention interval, the slopes did not significantly differ from zero, M = .006, SD = .03, t(77) = 1.65, p = .102, d = 0.21. At the 2-day retention interval, however, the slopes were significantly positive, M = .010, SD = .03, t(77) = 3.09, p = .003, d = 0.35, indicating that retrieval practice benefits indeed increased as lag increased after a 2-day delay. This outcome is consistent with prior findings that testing effects often emerge on delayed tests more than on immediate tests (Roediger & Karpicke, 2006a, 2006b), and that more difficult retrieval yields greater benefits (Bjork, 1994; Finley, Benjamin, Hays, Bjork, & Kornell, 2011; Pyc & Rawson, 2009). In summary, retrieval practice improved final performance compared to restudying, both immediately (after 10 minutes) and after a delay (at 2 days); further, the benefit after a 2-day delay increased as the lag, or number of intervening items between study and retrieval trials, increased.

Associations with working memory capacity

Is there a relationship between working memory capacity and the potency of retrieval practice? To address this issue, we first examined correlations between initial and final test performance and individual differences in working memory capacity, as measured by the automatic operation span task (Unsworth et al., 2005). In keeping with Unsworth et al., we used subjects’ total number of letters recalled in the correct serial position (for trials in which all letters in the sequence were correctly recalled) in the span task for all analyses. Subjects’ performance on the working memory task ranged from 10 to 75 (M = 60.3, Mdn = 65.0, SD = 14.3). The maximum score for the working memory task is 75; thus, subjects in our sample demonstrated working memory capacities toward the higher end of the scale. As such, “lower” working memory in our study refers to lower task performance compared to other subjects (not low on the range of possible scores on the working memory task).

Working memory was significantly correlated with initial recall success for study–test items, r = .31, t = 4.09, p < .001. Next, we computed correlations between working memory scores and the difference between final performance on study–test items vs. study–study items, and did so separately for all the between-subjects conditions. These data are shown as scatterplots in Figure 2. At the 10-min retention interval (Figure 2, top panels), there was no significant correlation between working memory capacity and retrieval practice effects in the no feedback condition, r = .18, t(37) = 1.09, p = .282, and none in the feedback condition, r = .11, t(37) = 0.69, p = .494. Note that although the trend in both 10-min conditions was positive, it was not statistically significant; thus, students with differing working memory capacity benefitted equivalently from retrieval practice, either with or without feedback.

At the 2-day retention interval (Figure 2, bottom panels), there was no significant correlation in the no feedback condition, r = −.02, t(37) = 0.09, p = .926; however, there was a significant negative correlation in the feedback condition, r = −.42, t(37) = 2.79, p = .008. Note that this result replicated across our first sample (n = 26, r = −.45) and our second sample (n = 13, r = −.40), increasing our confidence in the result. Thus, for a 2-day retention interval, the lower a student’s working memory capacity, the more s/he benefited from retrieval practice with feedback. We note that these specific conditions (retrieval with feedback after a 2-day delay) may be of particular relevance in applied settings, where the provision of feedback and a delay before the final test are practical and ideal for enhancing learning.

Finally, we conducted an analysis to determine whether the relationship between trial type and lag varied as a function of working memory capacity. We restrict this analysis to the 2-day retention interval group in which feedback was given during learning (Figure 2, bottom-right panel), as this is the group in which a significant correlation was observed between working memory capacity and the effect of retrieval practice. A 2 × 6 ANCOVA (trial type × lag), using working memory span as a covariate and difference scores (study–test vs. study–study) as the dependent variable, revealed no significant interactions between lag and working memory capacity, F(5, 185) = 0.66, MSE = .036, p = .655, ω̂p² < .001, or between trial type, lag, and working memory, F(5, 185) = 1.48, MSE = .031, p = .197, ω̂p² < .001. Follow-up t-tests at each lag showed that difference scores were greater for the lower capacity group than the higher capacity group at lags 0 and 9, t(37) = 3.28, p = .002, d = 1.05 and t(37) = 3.84, p < .001, d = 1.23, but did not significantly differ at any of the other lags (Bonferroni adjusted alpha level of .0083). Thus, although all subjects benefitted from retrieval practice, there was no obvious pattern of optimal lag between study and retrieval trials as a function of working memory capacity.
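For readers who want to check the arithmetic behind the correlations and adjusted alpha levels reported above, the following Python sketch may help. It is an illustration, not the authors' analysis code. It assumes the standard conversion from a Pearson correlation to a t statistic with n − 2 degrees of freedom, and it assumes n = 39 per cell, as implied by the reported t(37). Because the published r values are rounded to two decimals, the recomputed t values should land close to, but not exactly on, the reported ones.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-comparison alpha level under a Bonferroni correction."""
    return alpha / n_tests


# (label, rounded r from the text, assumed n per cell, |t| reported in the text)
reported = [
    ("10 min, no feedback", 0.18, 39, 1.09),
    ("10 min, feedback", 0.11, 39, 0.69),
    ("2 days, no feedback", -0.02, 39, 0.09),
    ("2 days, feedback", -0.42, 39, 2.79),
]

for label, r, n, t_reported in reported:
    t = abs(t_from_r(r, n))
    print(f"{label}: recomputed |t| = {t:.2f}, reported |t| = {t_reported:.2f}")

# 15 pairwise lag comparisons (initial learning) and 6 per-lag comparisons.
print(f"alpha for 15 tests: {bonferroni_alpha(0.05, 15):.4f}")
print(f"alpha for 6 tests:  {bonferroni_alpha(0.05, 6):.4f}")
```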

Figure 2. Difference in final test performance (study–test items minus study–study items) as a function of working memory span score, retention interval (10
minutes vs. 2 days), and feedback condition. Black lines represent the least squares linear regression.
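The quantities plotted in Figure 2 (a per-subject difference score, a least squares line, and the associated correlation) can be sketched in Python as follows. The data in this example are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the study, and the function names are illustrative.

```python
import math


def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL subjects: span score and final-test proportion correct.
span = [30, 45, 60, 75]
study_test = [0.60, 0.55, 0.50, 0.45]
study_study = [0.40, 0.40, 0.40, 0.40]

# Difference score as defined in the caption: study-test minus study-study.
difference = [st - ss for st, ss in zip(study_test, study_study)]

slope, intercept = least_squares_line(span, difference)
r = pearson_r(span, difference)
print(f"slope = {slope:.4f}, intercept = {intercept:.2f}, r = {r:.2f}")
```

A negative slope in such a plot corresponds to larger retrieval practice benefits for subjects with lower span scores.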

Discussion

The primary findings from this study were: (a) retrieval practice improved performance across the board, regardless of feedback, with longer lags between study and initial test trials yielding greater benefits at the 2-day retention interval; and (b) retrieval practice with feedback yielded a greater benefit for students with lower working memory capacity at the 2-day retention interval.

We replicated the typical finding that retrieval enhances delayed performance relative to restudying (Roediger & Karpicke, 2006a) and we also confirmed previous findings that longer lags during learning enhance performance relative to shorter lags (e.g., Karpicke & Roediger, 2007; Whitten & Bjork, 1977). Even so, we were unable to determine an optimal blend of lag, retention interval, and feedback to maximise retrieval practice benefits in this paradigm.

According to the mediator effectiveness hypothesis, benefits from testing are greater when the initial test is challenging because these opportunities strengthen the link between a cue and a target (“mediating information;” Carpenter, 2011; Pyc & Rawson, 2010). While it may seem counterintuitive that various combinations of “desirable difficulties” did not yield peak performance in the present study, we consider the possibility that these difficulties (retrieval, increased lag, and delayed retention interval) may have proven too challenging for students with lower working memory. In other words, at what point are difficulties for students no longer desirable? For students with lower working memory, for instance, spontaneous activation of semantic mediators, productive mediators, and/or durable mediators may fluctuate depending on the desirable difficulties present, the materials during testing, or possibly within a testing session for an individual subject. This interpretation is, of course, post hoc and needs to be examined in future research.

Surprisingly, the provision of feedback did not provide an overall additional benefit above and beyond retrieval, regardless of retention interval, possibly because performance was reasonably high on the initial test. Although we found retrieval practice with feedback improved performance at the 2-day retention interval disproportionately for lower working memory capacity students (r = −.42), Tse and Pu (2012) found a small benefit of testing for students with lower working memory capacity when corrective feedback was not provided during initial learning.

One possible explanation for this inconsistent feedback pattern relates to the bifurcation model by Kornell, Bjork, and Garcia (2011). In this framework, items that are successfully retrieved are boosted in terms of memory strength, whereas items that are not successfully retrieved nor provided feedback remain below threshold. When items are followed by feedback, however, non-retrieved items are boosted to a similar amount of memory strength as successfully retrieved items. Alternatively, this
discrepancy may be due to test-enhanced processing of feedback (e.g., Arnold & McDermott, 2013a, 2013b; Izawa, 1970; Kornell, Hays, & Bjork, 2009). Feedback allows one to identify recall errors and, thus, provides an opportunity to engage in elaborative (re)encoding of question–answer pairs in order to correct these errors on subsequent tests. Thus, it is important to bear in mind that for students with lower working memory capacity, the relationship between tests with feedback, tests without feedback, and the test–delay interaction may prove unique from students with higher working memory capacity.

We note that an appropriate examination of benefits from an intervention as a function of individual differences requires attention to several methodological issues, such as sample size and replication. While our sample size (N = 156) was similar to or greater than those in prior studies on retrieval practice and working memory (e.g., Brewer & Unsworth, 2012, N = 107; Pan et al., 2015, N = 120, 122; Tse & Pu, 2012, N = 160), future research should aim to obtain larger sample sizes. In addition, while our sample included data from two cohorts of subjects (see the “Methods” section) and we found a significant negative correlation between retrieval practice and working memory for both cohorts, additional replication is necessary to ascertain the optimal combination of lag, feedback, and retention interval for learning.

The takeaway message is that delayed benefits from testing with feedback during learning were significantly greater for students with lower working memory than for students with higher working memory capacity. This finding suggests that retrieval practice during learning, when accompanied by feedback, may serve to level the playing field for lower capacity students. Results from the present study suggest important educational implications for enhancing learning conditions for lower ability students, and further work in applied settings is necessary to sustain this conclusion.

Acknowledgements

We thank Bridgid Finn for comments on a draft of this manuscript; Andrew Butler, Jeff Karpicke, and Geoffrey Maddox for valuable discussions; and Jane McConnell for her help throughout this project.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This research was supported by the National Science Foundation Graduate Research Fellowship Program and the Harry S. Truman Scholarship Foundation (awarded to the first author), and the James S. McDonnell Foundation twenty-first Century Science Initiative grant, Applying Cognitive Psychology to Enhance Educational Practice: Bridging Brain, Mind, and Behavior Collaborative Award (awarded to the fourth author).

References

Agarwal, P. K., Bain, P. M., & Chamberlain, R. W. (2012). The value of applied research: Retrieval practice improves classroom learning and recommendations from a teacher, a principal, and a scientist. Educational Psychology Review, 24, 437–448.
Arnold, K. M., & McDermott, K. B. (2013a). Free recall enhances subsequent learning. Psychonomic Bulletin & Review, 20, 507–513.
Arnold, K. M., & McDermott, K. B. (2013b). Test-potentiated learning: Distinguishing between direct and indirect effects of tests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 940–945.
Balota, D. A., Duchek, J. M., & Logan, J. M. (2007). Is expanded retrieval practice a superior form of spaced retrieval? A critical review of the extant literature. In J. S. Nairne (Ed.), The foundations of remembering: Essays in honor of Henry L. Roediger, III (pp. 83–105). New York, NY: Psychology Press.
Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press.
Brewer, G. A., & Unsworth, N. (2012). Individual differences in the effects of retrieval from long-term memory. Journal of Memory and Language, 66, 407–415.
Bui, D. C., Maddox, G. B., & Balota, D. A. (2013). The roles of working memory and intervening task difficulty in determining the benefits of repetition. Psychonomic Bulletin & Review, 20, 341–347.
Butler, A. C. (2010). Repeated testing produces superior transfer of learning relative to repeated studying. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 1118–1133.
Butler, A. C., Karpicke, J. D., & Roediger, H. L. (2007). The effect of type and timing of feedback on learning from multiple-choice tests. Journal of Experimental Psychology: Applied, 13, 273–281.
Carpenter, S. K. (2011). Semantic information activated during retrieval contributes to later retention: Support for the mediator effectiveness hypothesis of the testing effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 1547–1552.
Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and spacing effects to name learning. Applied Cognitive Psychology, 19, 619–636.
Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention. Memory & Cognition, 20, 633–642.
Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14, 4–58.
Finley, J. R., Benjamin, A. S., Hays, M. J., Bjork, R. A., & Kornell, N. (2011). Benefits of accumulating versus diminishing cues in recall. Journal of Memory and Language, 64, 289–298.
Izawa, C. (1970). Optimal potentiating effects and forgetting-prevention effects of tests in paired-associate learning. Journal of Experimental Psychology, 83, 340–344.
Karpicke, J. D., & Roediger, H. L. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719.
Kornell, N., Bjork, R. A., & Garcia, M. A. (2011). Why tests appear to prevent forgetting: A distribution-based bifurcation model. Journal of Memory and Language, 65, 85–97.
Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 989–998.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Nelson, T. O., & Narens, L. (1980). Norms of 300 general-information questions: Accuracy of recall, latency of recall, and feeling-of-knowing ratings. Journal of Verbal Learning and Verbal Behavior, 19, 338–368.
Pan, S. C., Pashler, H., Potter, Z. E., & Rickard, T. C. (2015). Testing enhances learning across a range of episodic memory abilities. Journal of Memory and Language, 83, 53–61.
Pyc, M. A., & Rawson, K. A. (2007). Examining the efficiency of schedules of distributed retrieval practice. Memory & Cognition, 35, 1917–1927.
Pyc, M. A., & Rawson, K. A. (2009). Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory? Journal of Memory and Language, 60, 437–447.
Pyc, M. A., & Rawson, K. A. (2010). Why testing improves memory: Mediator effectiveness hypothesis. Science, 330, 335.
Roediger, H. L., & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.
Roediger, H. L., & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improve long-term retention. Psychological Science, 17, 249–255.
Roediger, H. L., & Karpicke, J. D. (2011). Intricacies of spaced retrieval: A resolution. In A. S. Benjamin (Ed.), Successful remembering and successful forgetting: Essays in honor of Robert A. Bjork (pp. 23–48). New York, NY: Psychology Press.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-prime user’s guide. Pittsburgh, PA: Psychology Software Tools.
Tse, C.-S., & Pu, X. (2012). The effectiveness of test-enhanced learning depends on trait test anxiety and working-memory capacity. Journal of Experimental Psychology: Applied, 18, 253–264.
Unsworth, J. (2016). Working memory capacity and recall from long-term memory: Examining the influence of encoding strategies, study time allocation, search efficiency, and monitoring abilities. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42, 50–61.
Unsworth, N., & Engle, R. W. (2007). The nature of individual differences in working memory capacity: Active maintenance in primary memory and controlled search from secondary memory. Psychological Review, 114, 104–132.
Unsworth, N., Heitz, R. P., Schrock, J. C., & Engle, R. W. (2005). An automated version of the operation span task. Behavior Research Methods, 37, 498–505.
Unsworth, N., Spillers, G. J., & Brewer, G. A. (2012). Working memory capacity and retrieval limitations from long-term memory: An examination of differences in accessibility. Quarterly Journal of Experimental Psychology, 65, 2397–2410.
Whitten, W. B., & Bjork, R. A. (1977). Learning from tests: The effects of spacing. Journal of Verbal Learning and Verbal Behavior, 16, 465–478.
Wiklund-Hörnqvist, C., Jonsson, B., & Nyberg, L. (2014). Strengthening concept learning by repeated testing. Scandinavian Journal of Psychology, 55, 10–16.
