Training Spatial Cognition Enhances Mathematical
https://doi.org/10.1038/s41562-021-01118-4
Spatial and mathematical abilities are strongly associated. Here, we analysed data from 17,648 children, aged 6–8 years, who
performed 7 weeks of mathematical training together with randomly assigned spatial cognitive training with tasks demanding
more spatial manipulation (mental rotation or tangram), maintenance of spatial information (a visuospatial working memory
task) or spatial, non-verbal reasoning. We found that the type of cognitive training children performed had a significant impact
on mathematical learning, with training of visuospatial working memory and reasoning being the most effective. This large,
community-based study shows that spatial cognitive training can result in transfer to academic abilities, and that reasoning
ability and maintenance of spatial information are relevant for mathematics learning in young children.
Spatial ability is closely associated with performance in science, technology, engineering and mathematics1. For example, the ability to mentally rotate a figure correlates with current performance in mathematics and predicts later learning in children1–3. Similar associations exist for other spatial tasks, including visuospatial working memory (VSWM)4,5 and spatial, non-verbal reasoning (NVR)6.
Consequently, it has been suggested that improving spatial abilities might be a way to enhance mathematical learning7–9, with some teacher organizations going as far as to place equal emphasis on spatial training and numerosity10. However, the mechanisms of spatial training are still unclear, and there is a lack of large, randomized studies assessing its effect on mathematical performance9.
Spatial tasks involve at least three aspects: creating an internal spatial representation; maintaining it; and manipulating it11. A mental rotation task involves representation and manipulation, but minimal maintenance. In contrast, a VSWM task puts a higher demand on maintenance, but less on manipulation.
This distinction is at the heart of two hypotheses on why training spatial abilities could transfer to improvements in mathematics. The first hypothesis is that manipulation is the critical aspect, and that rotation training would therefore be more effective than training on VSWM2,8. The second hypothesis is that maintenance of spatial information is the critical aspect, and that training on VSWM should therefore be superior11,12. It is also possible, of course, that both hypotheses are wrong and spatial training does not transfer to mathematics at all.
Previous literature gives mixed support for both hypotheses of transfer. Several studies have shown a positive impact of rotation training on mathematics13–15 while others found no effect16–18. Similarly, training of VSWM improved mathematical outcomes in children in some studies19–21, although there have also been negative findings22,23. Given these mixed results, some suggest that cognitive training, in general, does not transfer at all to academic abilities24.
This broad lack of consensus in the cognitive training literature could be due to inadequate statistical power25 along with meta-analyses combining different training methods, populations and outcome measures26. Perhaps most importantly, there exists a lack of large, randomized studies.
Here, we report data from over 17,000 children who engaged in various forms of mathematical and cognitive training. We asked two questions that are of both theoretical and practical relevance: (1) does spatial cognitive training impact mathematical learning, and if so is it more effective to train mental rotation or VSWM; and (2) are there inter-individual characteristics that predict the optimal type of cognitive training to enhance mathematics?
Although our main hypothesis relates to the contrast between mental rotation and VSWM, we also included training of spatial NVR. NVR tasks rely heavily on one’s ability to find visuospatial patterns and determine how these patterns interrelate spatially27. They were added based on the high correlation between NVR and mathematical ability6 and some evidence that such training can enhance problem-solving ability28–30.
We conducted our study via modifications on a freely available app (Vektor), using number line tasks to train mathematics. The app was used as an extra, voluntary activity in school, organized by teachers. When using the app, children spent half of their time with number line tasks, as the efficacy of this training was established for this app31 as well as for similar training using the number line32,33. The remaining time was allotted, by randomization, to the training of rotation tasks (two-dimensional (2D) mental rotation and tangram), VSWM or NVR (Fig. 1 and Supplementary Fig. 1). In the first, fifth and seventh week, children performed self-administered tests of mathematics (addition, subtraction and number comparison). For feasibility reasons, mathematical tests had to be self-administered, but this enabled us to perform a large, comprehensive (12–20 h of training) and ecologically valid training study.
Results
Over the course of 7 weeks, children completed cognitive training for either 20 or 33 min per day. Our final sample consisted of 17,648 children between the ages of 6 and 8 years, who on average completed 5,077 trials (s.d. = 1,710).
To measure the baseline performance within each training domain, we designed the app to give every child an identical first week. Performance was moderately correlated between tasks, as well as with the mathematical transfer tests, consistent with the well-documented association between spatial cognition and mathematics (Fig. 2b).
Following this first week, individuals were randomly split into one of five training plans. Each plan involved the same amount of mathematical training (~50% of the training time), yet differed in the
[Fig. 1 graphic: a, images of the training tasks; b, stacked bar chart of the percentage of training time (0–100) given to the tangram, 2D mental rotation, NVR, VSWM and number line tasks, by training plan.]
Fig. 1 | Overview of the four training tasks and the percentages of time allocated to each in the five training plans. a, Images of the four training tasks:
VSWM (a visuospatial working memory task in which participants had to remember a sequence of dots on a 4 × 4 grid and, after a delay, respond to them
in order); NVR (a spatial task in which participants had to find spatial patterns among figures); 2D mental rotation (a task in which children had to choose
the shape that, when rotated, would fit into an empty silhouette); and the tangram task (a second rotation task in which children had to manipulate and
rotate several pieces to fit into an empty silhouette). b, Percentages of time allocated to each cognitive training task in each of the five training plans. Both
2D mental rotation (that is, 2D rotation) and the tangram task are classified as rotation tasks. Images adapted from Cognition Matters.
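As a rough check on the time budget behind these allocations, the sketch below converts the daily session length (20 or 33 min), the 36 training days required for inclusion and the approximately 50% number line share into absolute time. The function names are illustrative and are not part of the Vektor app, and the even 50% split is an approximation taken from the text.

```python
# Illustrative arithmetic only; function names are hypothetical.
TRAINING_DAYS = 36          # days of training required for inclusion
NUMBER_LINE_SHARE = 0.5     # ~50% of each session (approximate)


def total_hours(minutes_per_day: float, days: int = TRAINING_DAYS) -> float:
    """Total training time in hours for a fixed daily session length."""
    return minutes_per_day * days / 60


def daily_split(minutes_per_day: float, share: float = NUMBER_LINE_SHARE):
    """Return (number line minutes, spatial-task minutes) per day."""
    number_line = minutes_per_day * share
    return number_line, minutes_per_day - number_line


for session in (20, 33):
    maths, spatial = daily_split(session)
    print(f"{session} min per day: {total_hours(session):.1f} h in total, "
          f"{maths:.1f} min number line and {spatial:.1f} min spatial tasks daily")
```

The two session lengths give 12 h and 19.8 h of training over 36 days, in line with the 12–20 h range quoted for the study.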
[Fig. 2 graphic: a, density of mathematics factor scores in the first, fifth and seventh week; b, Pearson correlation matrix for baseline mathematics, number line, VSWM, 2D rotation and NVR; c, mean level by training day (0–30) for VSWM, number line, 2D rotation and NVR.]
Fig. 2 | Predicted factor scores by test week, correlations with baseline mathematics and training curves of the density of children for each difficulty
level and day of training. a, Density plot (n = 17,648) of predicted factor scores from three tests of mathematics by test week (Supplementary Fig. 3
shows individual tests). b, Correlations (n = 16,484) of the mean correct difficulty level for the training tasks in the first week with baseline mathematics
(a participant-specific intercept of the mathematics factor). All correlations were significant (P < 0.001). c, Training curves (maximum n = 17,647) showing
the colour-coded density of children for each difficulty level and day of training. The light blue diamonds denote the mean correct level per day of each of
the corresponding tasks.
[Fig. 4 graphic: a,b, predicted mathematical improvement (s.d.) against baseline performance (s.d.; −1.5 to 1.5), with separate lines for minutes of VSWM training (4.3, 7.0, 9.7, 11.5 and 16.0) in a and minutes of NVR training (0, 2.0, 2.7, 3.3 and 4.5) in b.]
Fig. 4 | Predicted performance following differing amounts of VSWM and NVR training for different levels of baseline cognitive performance.
a,b, Predicted mean improvement in mathematical performance following VSWM (a) and NVR training (b), as a function of baseline performance
and time spent training (n = 16,484). Baseline performance was measured from the first week of training, before randomization. The improvement in
mathematics was estimated from the slopes of improvement over the three tests, extracted from the mixed-effects model and corrected for baseline
mathematical performance. The error bars indicate 95% confidence intervals. Full model results are provided in the main text.
on average, but it might be that rotation training is more effective for some children. Children were characterized based on cognitive and mathematical performance in the first week (Fig. 2b). Both accuracy and average difficulty level were included, resulting in eight predictor variables from the cognitive tasks (Supplementary Table 5).
First, we evaluated predictors of mathematical improvement independent of the training plans. As shown by earlier analyses, baseline performance on the mathematics tests (that is, intercept) had a positive impact on improvement (that is, the Matthew effect). However, after correcting for baseline mathematics, children with a lower than average level in the number line task improved more in mathematics over the course of training (b = −0.12; 95% CI = (−0.15 to −0.10); P after false discovery rate (FDR) correction (pFDR) < 0.001). Children with a higher level of NVR (b = 0.03; 95% CI = (0.01 to 0.05); pFDR < 0.001) and higher accuracy on VSWM (b = 0.05; 95% CI = (0.03 to 0.06); pFDR < 0.001) also improved more.
Second, we tested whether differing amounts of cognitive training (VSWM, rotation or NVR) interacted with any of these baseline characteristics to predict mathematical improvement (Fig. 4). This analysis showed that children with initially higher VSWM accuracy (b = 0.006; 95% CI = (0.002 to 0.011); pFDR < 0.001) or lower NVR performance (b = −0.007; 95% CI = (−0.012 to −0.002); pFDR < 0.001) benefitted more from VSWM training. Moreover, children with lower initial NVR levels benefitted more from NVR training (b = −0.014; 95% CI = (−0.024 to −0.004); pFDR < 0.001). There were no significant interactions between any of the baseline performance measures and the amount of time spent on rotation training.
Discussion
Here, we report strong evidence that spatial cognitive training impacts mathematical learning in children. Taking the type of cognitive activity into account resulted in a model that was around 20 times more likely to predict mathematical improvement than a model that did not account for it (∆AIC = 6).
Specifically, we found that training on VSWM was more effective than both types of rotation training (2D mental rotation and the tangram task). This suggests that, when it comes to transfer to mathematics, the crucial aspect of spatial training is maintaining a spatial representation, rather than manipulating it. This is in line with suggestions that a bottleneck for spatial cognition is the ability to maintain the spatial representation11,12 and that individuals with problems relating to mental rotation lose the image they attempt to keep in mind.
A priori, we expected rotation training to have a positive impact on mathematics learning. However, rotation turned out to have the smallest effect. Being the worst-performing condition, rotation training can be considered an active and very strict control group for the other conditions. Our analysis showed that, compared with rotation training, both VSWM and NVR training had a significant and meaningful impact on mathematics learning. In this large, randomized study, we show that cognitive training results in transfer to academic abilities, adding evidence in favour of the view that cognitive abilities are malleable34–36. We also show the relevance of such training for academic performance in an ecologically valid setting.
Our finding of transfer to mathematical learning is consistent with some previous studies19–21 but is at odds with others22,23. The largest negative study was performed by Roberts and colleagues22. One reason for the discrepancies could be that the current study included a general population while the study by Roberts et al. included only children with low working memory capacity (the lowest 15th percentile). As shown by the interaction analysis (Fig. 4a), and consistent with previous findings, the impact of VSWM training depends heavily on baseline performance31. Such dependence on baseline characteristics needs to be taken into account when interpreting the existing literature as well as planning future studies.
An interesting finding was the positive effect of NVR training. Although such training is much less researched, our finding is consistent with previous research showing NVR training to improve
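The link between an AIC difference and a statement such as "around 20 times more likely", as made in the model comparison at the start of the Discussion, can be made explicit with the standard evidence ratio, exp(∆AIC/2). The sketch below is a generic illustration of that convention rather than the authors' analysis code, and the function names are hypothetical.

```python
import math


def evidence_ratio(delta_aic: float) -> float:
    """Relative likelihood of the better model given an AIC difference."""
    return math.exp(delta_aic / 2)


def delta_aic_for_ratio(ratio: float) -> float:
    """AIC difference that corresponds to a given evidence ratio."""
    return 2 * math.log(ratio)


print(round(evidence_ratio(6), 1))        # 20.1
print(round(delta_aic_for_ratio(20), 2))  # 5.99
```

Under this convention, an evidence ratio of about 20 corresponds to an AIC difference of about 6.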
Methods
Implementation and inclusion criteria. This research project was approved by the Swedish Ethical Review Authority under application number 2016/136-31/1. As specified in our ethics application and in accordance with Swedish law (2003:460), informed consent was not sought from children or their guardians, as no personally identifiable information was collected or stored and the study involved no risk of harm to the participating children, all of whom stood to benefit from the training. Data were collected in collaboration with the non-profit foundation Cognition Matters (https://cognitionmatters.org/), to implement our training plans using the freely available app Vektor. Vektor is an adaptive cognitive training app primarily aimed at improving school performance in mathematics. Previous research has shown the efficacy of Vektor to improve mathematics in children31.
There was no active recruitment or advertising for this study. Educators signed up their classes entirely of their own volition, including agreeing on data storage, and children’s data were automatically anonymized. Children and educators could withdraw from the study at any point. There was no compensation for participation. Educators chose the amount of training per day (either 20 or 33 min) and the app automatically logged out after the prescribed amount of time.
Children were only included if they completed 36 d of training (corresponding to the mathematical tests in week 7) and if they were between the ages of 6 and 8 years (see Supplementary Fig. 7 for attrition). The app automatically randomized each account to one of the training plans, ensuring that schools with differing demographics would have equal percentages of students in each training plan.
Testing tasks. Testing was self-administered through the app, and appeared on specific days. The first testing session was delivered on the third and fourth days of training, with the second on the 25th and 26th days of training and the final session on the 35th and 36th days of training. Addition and subtraction tests appeared on the first day of each testing session, while a number comparison task appeared on the following day. For the addition and subtraction tasks, children responded with on-screen buttons (0–9). The test finished if a trial was not completed within 60 s or if three consecutive errors were made. The sum of correct responses was used as the outcome measure for both tests. The number comparison task presented two single-digit Arabic numerals on either side of the screen and the children had to respond by tapping on the larger number. The outcome measure was the mean response time of correct trials.
As expected in this age range, the subtraction and addition tests were not normally distributed, due to a large number of zero responses in the first test (Supplementary Fig. 3a). However, the third test (the mean reaction time of correct number comparison trials) was normally distributed. Around 2% of the number comparison test data were missing and therefore imputed. This represented under 1% of total test data.
Validity of online mathematical testing. To test the validity of the online tasks, we used data from two schools gathered over the course of a year. The first study consisted of 46 preschool-age children, while the second consisted of 60 children (36 of whom were in first grade). Ethical approval was granted by a regional ethics committee. A trained experimenter individually administered an addition and
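The stopping rule for the addition and subtraction tests described under Testing tasks can be sketched as follows. This is a minimal illustration, assuming each trial is recorded as a (correct, response time in seconds) pair and that a timed-out trial is not counted as correct; the data layout and the function name are assumptions, not the app's actual implementation.

```python
from typing import Iterable, Tuple

TIME_LIMIT_S = 60        # a trial not completed within 60 s ends the test
MAX_ERROR_STREAK = 3     # three consecutive errors end the test


def score_test(trials: Iterable[Tuple[bool, float]]) -> int:
    """Sum of correct responses up to the point where the test would stop."""
    correct_total = 0
    error_streak = 0
    for is_correct, response_time_s in trials:
        if response_time_s > TIME_LIMIT_S:
            break  # assumption: the timed-out trial itself is not scored
        if is_correct:
            correct_total += 1
            error_streak = 0   # a correct answer resets the error streak
        else:
            error_streak += 1
            if error_streak == MAX_ERROR_STREAK:
                break
    return correct_total


# One correct answer, then three errors in a row: the last trial is never reached.
print(score_test([(True, 5), (False, 4), (False, 3), (False, 2), (True, 1)]))  # 1
```

Resetting the streak after every correct answer is what makes the errors "consecutive"; isolated errors do not end the test.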
Reporting Summary
Data analysis. We used only open-source software for the analysis. R version 3.6.0 (2019-04-26) was used and the relevant packages are mentioned in the Methods.
Data
Data and code for the main analysis (i.e., mixed-effect model) will be made available upon publication. Any other data will be made available upon request.
Research sample. The sample is 17,648 children aged 6–8 years who completed mathematical training for at least 36 days; this age range was chosen as it is the target demographic of the training. We included all children from Aug 2017 – Jan 2020.
Sampling strategy. No sample size was predetermined. As this is the first study of its kind comparing different active types of training (WM vs Rotation vs NVR), it is quite difficult to accurately estimate expected effect sizes. Much of the previous literature suffers from a variety of issues (which we discuss in the manuscript), making it even harder to calculate an expected effect size for each. Lastly, we modified only a small proportion of the training (unrelated to mathematics) and therefore assumed we would need quite a large sample.
Data collection. All data were gathered remotely and anonymized, making the experimenter naive to the condition until analysis.
Data exclusions. 184 subjects were excluded as they partook in an experimental training task, making them "unique"; a further single subject was excluded for missing cohort information. These exclusions were decided prior to analysis.
Non-participation. We accessed the database and included only subjects with 36 days of training; we have therefore not counted subjects who completed only a few days of training.
Randomization. An algorithm randomly allocated children into training programs; this was done at the level of the child, not the classroom.
Recruitment. There was no advertisement or active recruitment. Teachers independently decided to implement the training program themselves. There could be classroom-level self-selection bias, yet this should not impact the results, as the different training plans were randomized within classroom.
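Child-level allocation of the kind described under Randomization can be sketched as below. The plan labels, the seed, the toy classroom sizes and the use of simple (unstratified) random assignment are all assumptions made for illustration; the source does not describe the actual algorithm.

```python
import random
from collections import Counter

# Hypothetical labels; the study used five training plans.
PLANS = ["plan_1", "plan_2", "plan_3", "plan_4", "plan_5"]


def assign_plans(child_ids, seed=0):
    """Assign each child independently to one of the plans."""
    rng = random.Random(seed)
    return {child_id: rng.choice(PLANS) for child_id in child_ids}


# Toy example: two classrooms of different sizes.
classroom_of = {f"a{i}": "A" for i in range(3000)}
classroom_of.update({f"b{i}": "B" for i in range(1000)})

allocation = assign_plans(classroom_of)
share = Counter((classroom_of[c], plan) for c, plan in allocation.items())
for room, size in (("A", 3000), ("B", 1000)):
    # Fraction of each classroom's children in each plan.
    print(room, [round(share[(room, p)] / size, 2) for p in PLANS])
```

Because each child is drawn independently of classroom, the plan shares are similar within every classroom in expectation, which is the property the Recruitment entry relies on when discussing self-selection.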