Interpreting Mathematics Performance in PISA: Taking Account of Reading Performance
ABSTRACT
This study examines the importance of reading performance in explaining mathematics performance in the Programme for
International Student Assessment (PISA), and analyses how the relationship is present for different reading subareas. Data of
Fangshan District of Beijing in PISA 2009 China Trial were used. Multilevel modelling analyses reveal that: (1) reading performance
can explain a considerable proportion of the variance in mathematics performance, and moderates the gender gap favouring males in
mathematics performance; (2) specific reading subareas are significantly associated with mathematics performance. These findings
suggest that taking into consideration students’ performance in reading, especially some specific reading subareas, is
important when interpreting mathematics performance. Implications for formulating policy
based on PISA outcomes are made.
Keywords: PISA; Score interpretation; Mathematics; Reading; Reading subareas; Word problems
1. Introduction
The relationships between reading and mathematics performance have been a topic discussed in a number of studies. The positive
association between performance in these two domains is widely documented in a range of contexts. For example, Walker, Zhang,
and Surber (2008) found that on some mathematics problems, students with low reading ability are more likely to give incorrect
answers even if they have a similar level of mathematical attainment. Similarly, by employing the Trends in International
Mathematics and Science Study (TIMSS) and Progress in International Reading Literacy Study (PIRLS) 2011 data of Italian 4th grade
students, Caponera, Sestito, and Russo (2016) suggest that good readers, regardless of their mathematics ability, are advantaged in
solving mathematics problems. Chang and Ko (2012) found that mathematics achievement is best explained by reading comprehension
ability for students with low mathematics achievement, compared with those with medium or high mathematics
achievement. The predictive role of reading ability in the progress of mathematics achievement has been evidenced as well.
Grimm's (2008) longitudinal study found that students who have better early reading ability are more likely to achieve greater
improvement in mathematics.
By contrast, a relatively small body of literature shows that early mathematics ability, rather than early reading ability, is a stronger
predictor of later achievements including that in reading (e.g. Claessens, Duncan, & Engel, 2009; Duncan et al., 2007). However, it is
considered that, in these studies, the mathematics tests may capture language skills with the use of applied mathematics problems,
while the early reading test may be constructed inconsistently with the later reading test since it only involves letter sounds,
word recognition and vocabulary (Claessens et al., 2009; Purpura, Logan, Hassinger-Das, & Napoli, 2017).
⁎ Corresponding author.
E-mail address: [email protected] (H. Ding).
https://doi.org/10.1016/j.ijer.2020.101566
Received 18 December 2019; Received in revised form 23 March 2020; Accepted 27 March 2020
Available online 16 May 2020
0883-0355/ © 2020 Elsevier Ltd. All rights reserved.
H. Ding and M. Homer International Journal of Educational Research 102 (2020) 101566
Therefore, item types or measures of the specified mathematics construct employed in the mathematics test may be one of the
possible reasons explaining the relationship between reading and mathematics performance. Indeed, it has been consistently reported
that the relationship is stronger for mathematics word problems which have relatively high reading demands compared to pure
computation problems (Doerr & Temple, 2016; Helwig, Rozek-tedesco, Tindal, Heath, & Almond, 1999). For mathematics word
problem solving, problem comprehension is theoretically and empirically suggested as a crucial component of the cognitive processes
(Boonen, van der Schoot, van Wesel, de Vries, & Jolles, 2013; Boonen, de Koning, Jolles, & van der Schoot, 2016; Fuchs et al., 2016;
Mayer, 1986). Researchers argue that it is often in the phase of problem comprehension that students make errors on many mathematical
problems (Leiss, Plath, & Schwippert, 2019; Lewis & Mayer, 1987; Schumacher & Fuchs, 2012). Relatively long text in mathematics
word problems has been found to put poor mathematics achievers at a greater disadvantage in solving problems
(Mullis, Martin, & Foy, 2013; Walkington, Clinton, & Shivraj, 2017). It is even assumed that more ‘wordy’ mathematics problems may
contribute to the narrowing gender differences favouring males in mathematics performance which are observed from large-scale
assessments (Marks, 2008).
Besides text length, linguistic-semantic factors may also hamper students’ understanding of problems (LeFevre et al., 2010; van
der Schoot, Bakker Arkema, Horsley, & van Lieshout, 2009). From an experiment examining 1st grade children’s understanding in
solving arithmetic word problems, Cummins, Kintsch, Reusser, and Weimer (1988) found that children are more likely to
miscomprehend problems which include abstract or ambiguous language. The experiment conducted by Lewis and Mayer (1987)
found that for problems including relational statements, the way the relational term is presented (e.g. ‘more than’ / ‘less than’) in
the sentence is linked to students’ comprehension of the problem. Reading ability is assumed to be helpful for handling linguistic-semantic
characteristics and text complexity during the process of problem comprehension (Boonen et al., 2016).
Though a strong correlation between reading and mathematics performance has been widely evidenced, it does not necessarily
imply that one causes the other. It is suggested that there might be shared cognitive processes (e.g. working memory) or a general
ability (e.g. general intelligence) between or above reading and mathematics, contributing to the mathematics performance
(Ashkenazi, Rubinsten, & De Smedt, 2017). Reading ability may just act as a proxy for these unknown constructs if the relations
between its multiple subareas and mathematics ability are consistent (Grimm, 2008); yet, when the relation is only observed for
specific reading subareas, shared commonalities between reading and mathematics abilities would be suggested (Grimm, 2008;
Purpura et al., 2017).
PISA, a triennial programme launched by the Organisation for Economic Co-operation and Development (OECD) in 1997, assesses
how well students approaching the end of compulsory education are prepared and equipped to meet the challenges in their adult life
by measuring students’ achievement in reading, mathematics and science literacy (OECD, 1999). Its mathematics tasks typically
employ context-embedded word problems in which mathematical objects and symbols are not explicitly presented to students (OECD,
2010a). The use of word problems and the need to describe real-world contexts intrinsically bring relatively high reading demands to
the PISA mathematics test (Eivers, 2010). Although the OECD (2010a, 2013a) claims that consideration of the appropriate level of
reading required in mathematics problems is taken, it seems that they still have relatively high reading demands, at least in terms of
word counts (Wu, 2010). The employment of non-continuous texts such as maps and graphs makes PISA mathematics
problems even more complex to read (OECD, 2010a), since more types of transformations among different representations are needed in
problem comprehension (Duval, 2006). The Appendix displays mathematics item examples which were used in PISA 2012 and were
released by the OECD afterwards.
By regressing country means of PISA mathematics on country means of PISA reading, Wu (2010) found that country mean score in
reading is a good predictor of country mean score in mathematics, since they have a very high correlation (r = 0.95), and variance in
reading scores accounts for 91 % of the variance of mathematics scores. She argues that reading demands in items are one of the
factors explaining the differential performance between PISA and TIMSS across countries, as many TIMSS items are context-free and
have fewer words. By classifying PISA 2012 mathematics problems as ‘low reading demand’ and ‘high reading demand’, Ajello,
Caponera, and Palmerio (2018) found that Italian male students achieved higher on low-reading-demand problems, while females
performed better on high-reading-demand problems. In the context of China, a high correlation between mathematics performance and
reading subareas in terms of text formats has been evidenced with PISA 2009 Shanghai data (Shen & Lu, 2013). However, research
examining the relationships between mathematics performance and reading cognitive processes is rare. Moreover, it seems that
the extent to which performance in reading and in specific reading subareas accounts for mathematics performance is still
under-researched.
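The two figures quoted from Wu (2010) can be linked by a short worked check. In a regression with a single predictor, the proportion of variance explained equals the squared correlation. The arithmetic below assumes that the reported r is a rounded value; the unrounded figure is not given in the text.

```latex
R^2 = r^2, \qquad 0.95^2 = 0.9025 \approx 0.90, \qquad \sqrt{0.91} \approx 0.954
```

Under that assumption, the reported 91 % is consistent with a correlation of about 0.954, which rounds to 0.95.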
Due to the strong overall relationship between mathematics and reading, researchers argue that construct validity in mathematics
assessment in PISA can be obscured by reading differences. Rindermann and Baumeister (2015) examined the validity of PISA by
rating its tasks on various scales (e.g. reading competence, math competence, problem solving, general knowledge). They suggest that
the validity of literacies (e.g. reading, mathematics, science) measured in PISA is questionable, and also that understanding reading
literacy is crucial for interpreting performance in PISA tasks (Rindermann & Baumeister, 2015).
Since students’ performance in PISA has been playing a critical role in influencing educational policy or practice in participating
jurisdictions (Breakspear, 2012; Ertl, 2006; Niemann, Martens, & Teltemann, 2017; Nortvedt, 2018), appropriate interpretation of
students’ performance is important for informing good policymaking. However, as suggested by Pons (2012), “PISA knowledge for
learning”, that is, rigorous analysis of PISA data for understanding education quality and for informing policymaking is usually
missing. Hence, it is argued that the reception and interpretation of PISA results in the policy field is usually superficial, without the
awareness of the complexity underpinning the results (Gruber, 2006; Mangez & Hilgers, 2012). Although the high correlation be-
tween reading and mathematics performance is officially reported in PISA outputs (e.g. OECD, 2012), it is only briefly displayed in
PISA technical reports, rather than its results reports which are usually used by national policymakers for informing policy making,
and the extent to which reading performance can explain the variance in mathematics performance is left unclear. Considering the
relatively high reading demands of PISA mathematics problems (Wu, 2010), taking students’ reading ability into consideration in the
interpretation of their mathematics performance is clearly important.
In addition, it is also worthwhile to analyse whether different reading subareas are differentially associated with mathematics
ability, from which one can obtain evidence suggesting whether reading ability is only a proxy of some other constructs that actually
influence mathematics performance or reading ability itself is directly connected with mathematics ability through common components
between these two domains. Previous studies examining the relationship between reading and mathematics mostly employ a
single measure for reading ability (Harlaar, Kovas, Dale, Petrill, & Plomin, 2012), while identifying reading subareas associated with
mathematics performance is suggested but not yet commonly conducted (Grimm, 2008; van der Schoot et al., 2009). Clarifying these
reading subareas could provide further insights into the interpretation of mathematics performance in terms of describing how the
relationship between reading and mathematics performance is present.
The current study specifically investigates what differences in reading performance imply for the interpretation of mathematics
performance, and whether some cognitive aspects of reading ability or specific reading text formats have stronger association with
mathematics performance than others. Possible effects of gender and family social and economic status background are taken into
consideration. Hence, this study focusses on addressing the following two research questions:
(1) To what extent does students’ overall reading performance explain their mathematics performance after controlling for student
background factors?
(2) Are there differences in the strengths of the relationships between students’ performance in reading subareas and in mathematics?
This paper aims to raise the awareness of policy-makers and other consumers of PISA outcomes with regard to the potential
importance of reading in interpreting PISA mathematics performance; and to begin the process of developing a stronger evidence
base on the proper interpretation of mathematical outcomes from international large scale assessments.
2. Methods
This section contains two parts. Firstly, data involved in this study are described in terms of the variables used, weights, the
approach to addressing missing values, and interactions between variables. Secondly, data analysis methods and procedures are
introduced.
2.1. Data
This study uses data of PISA 2009, in which reading was the major domain, since data of students’ performance in reading
subareas are available in this cycle in addition to performance in overall reading literacy and mathematics literacy.
Data of Fangshan District of Beijing in PISA 2009 China Trial were employed. China conducted three cycles of PISA China Trial
in 2006, 2009, and 2012 respectively, with the aim of informing the reforms of domestic educational assessment (Wang, 2007, 2009).
PISA China Trials were administered in alignment with PISA technical standards, although the data are not released into the public
domain (Wang & Jing, 2013). Fangshan District of Beijing has been involved in PISA since its participation in PISA 2009 China Trial
(Wang, Jing, & Tong, 2017). Not only has Fangshan published its results, but local policymakers have also explicitly claimed that its local
PISA scores and PISA assessment ideas have been actively used in motivating a number of initiatives for improving teaching and
learning in mathematics, science and reading in its local context (Wu, 2015). Interpretation and utilisation of Fangshan results in
PISA 2009 China Trial was also part of the local-level teacher education content for teachers across the whole local area (Guo, Wu, &
Zhang, 2015). In this case, appropriate interpretation of students’ performance in PISA seems especially important considering the
high engagement with PISA outcomes in Fangshan local educational practices. Data of PISA China Trials are managed by the National
Education Examinations Authority (NEEA) of China and are not yet released into public domain. Approval of using Fangshan data
was obtained from the NEEA through a written application.
In each cycle of PISA, a two-stage sample design is typically used, in which schools having 15-year-old students who are enrolled
at 7th grade or higher are selected, and then 15-year-olds are selected within the sampled schools (OECD, 2012). Jurisdictions also
have the option to use a three-stage design in which regions are first selected before sampling schools to obtain accurate estimates of
regional results (OECD, 2012). PISA China Trials used a three-stage design; however, they only targeted academic school students (i.e.
students who take the academic track) and did not include vocational schools (Wang & Jing, 2013) – this limits the extent to which it
makes sense to compare results from these assessments with those from the main PISA outcomes which are intended to include all
types of schools having PISA eligible students. According to PISA 2009 sampling technical standards, 35 eligible students were
selected from each sampled school, and all eligible students were selected in the case that schools had fewer than 35 students (OECD,
2012). During the academic year 2008–2009, there were 52 secondary academic schools in Fangshan (Fangshan Bureau of Statistics,
2009), 25 of which were sampled in PISA 2009 China Trial. A total of 610 students from these 25 schools, representing the student
population in secondary academic schools in this local area, participated in this assessment. Their data are employed in this study.
The summary statistics of the sample are shown in Table 1 below.
Table 1
Summary statistics of the sample.
Total sample (N = 610)
                        N       %
Gender
  Female                292     47.9
  Male                  318     52.1
Educational level
  Lower-secondary       231     37.9
  Upper-secondary       379     62.1
                        Mean    SD
Age in years            15.73   0.30
2.1.1. Variables
Students’ performance in mathematics, reading, reading subareas, and students’ gender as well as their family economic, social
and cultural status were variables for analyses. We discuss each of these in turn.
Performance in mathematics. The definitions of the typical domains (i.e. mathematical literacy, reading literacy, and scientific
literacy) assessed in PISA are usually developed over cycles to echo the changes in the wider field (OECD, 2013a). Since PISA 2009
data were used in this current study, definitions employed in this cycle are used throughout this work. According to the PISA 2009
framework (OECD, 2010a, p. 14), mathematical literacy is defined as follows:
An individual’s capacity to identify and understand the role that mathematics plays in the world, to make well-founded judge-
ments and to use and engage with mathematics in ways that meet the needs of that individual’s life as a constructive, concerned and
reflective citizen.
Specifically, PISA assesses students’ ability “to analyse, reason and communicate mathematical ideas effectively as they pose,
formulate, solve and interpret mathematical problems in a variety of situations” (OECD, 2010a, p. 105). This is further classified as
formulating, employing, and interpreting mathematics in PISA 2012 (OECD, 2013a).
In PISA 2009, each student’s performance in mathematics is indicated by five plausible values (PVs) (Mislevy, 1991) which are
generated by Item Response Theory (IRT) modelling (OECD, 2012). The mean and standard deviation (SD) for the average mathematical
performance of OECD countries were set at 500 and 100 respectively in PISA 2003, in which mathematics was the major
domain for the first time (OECD, 2012). On this previously established scale, mathematics performance is reported in follow-up PISA
cycles (OECD, 2012). The set of five PVs in mathematics were employed as the dependent variable in this study and were dealt with
accordingly (see Analyses section).
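Analyses with plausible values are typically run once per PV and then pooled. The following is a minimal sketch of that pooling step using Rubin's rules, not the authors' Stata code. The function name and the five coefficient values are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean, variance


def pool_plausible_values(estimates, sampling_variances):
    """Pool one estimate per plausible value with Rubin's rules.

    estimates: the statistic computed separately on each PV.
    sampling_variances: the squared standard error from each run.
    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)                 # average of the per-PV estimates
    within = mean(sampling_variances)        # average sampling variance
    between = variance(estimates)            # variance across PVs (divisor m - 1)
    total = within + (1 + 1 / m) * between   # total error variance
    return pooled, total ** 0.5


# Hypothetical regression coefficients and squared standard errors, one per PV.
estimates = [0.82, 0.84, 0.83, 0.85, 0.81]
sampling_variances = [0.0016, 0.0015, 0.0017, 0.0016, 0.0016]
pooled, se = pool_plausible_values(estimates, sampling_variances)
print(round(pooled, 3), round(se, 4))
```

The between-PV term carries the measurement uncertainty that a single score per student would hide, which is why all five PVs are analysed rather than their average.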
Overall performance in reading and performance in reading subareas. As the major domain in PISA 2009, reading literacy referred to
“an individual’s capacity to: understand, use, reflect on and engage with written texts, in order to achieve one’s goals, to develop
one’s knowledge and potential, and to participate in society” (OECD, 2010a, p. 14).
Reading literacy has a broader meaning than decoding (i.e. word recognition), as it also contains other cognitive components such
as linguistic knowledge (e.g. words, grammar, textual structures and features) and knowledge about the world (OECD, 2010a, p. 23).
Reading literacy was classified into three ‘aspects’ (i.e. cognitive processes): Access and Retrieve, Integrate and Interpret, and Reflect
and Evaluate (OECD, 2010a). In the tasks of Access and Retrieve, students are required to access and locate details from the information
explicitly specified in questions, while Integrate and Interpret requires students to make sense of meaning from information that is not
explicitly stated (OECD, 2010a). In the tasks of Integrate and Interpret, students have to identify assumptions and implications in a text,
and understand the text as coherent by recognising the relationships between pieces of information (OECD, 2010a). Reflect and
Evaluate requires students to connect the information within a text with their own experience and knowledge of the world by
providing evidence or arguments, drawing comparisons, or assessing the claims (OECD, 2010a).
In addition to the classification based on cognitive processes, this construct was also classified separately as reading Continuous
texts (e.g. sentences and paragraphs, see item example 1 in Appendix), and reading Non-continuous texts (e.g. forms, graphs, figures,
etc., see item example 2 in Appendix) based on text formats (OECD, 2010a).
PVs in reading and PVs in reading subareas were employed as explanatory variables in this study. As with mathematics, PVs in
reading and reading subareas have mean = 500 and SD = 100 for OECD average on PISA international reading scale which was
established in PISA 2000 in which reading was the major domain for the first time (OECD, 2012).
Gender. Gender was included in this study to examine the relationship between mathematics and reading after adjusting for
possible gender differences in performance in these domains. Traditionally there is a stereotype that females usually outperform
males in reading (Ehrtmann & Wolter, 2018; Nowicki & Lopata, 2017), and in PISA 2009 females achieved higher than males in all
participating jurisdictions (OECD, 2010b). It is reasonable to assume that gender differences in reading performance may moderate
gender differences in mathematics, if reading performance does have a significant effect on mathematics performance. In the PISA 2009
student questionnaire database, variable ST04Q01 indicates student gender, with values 1 and 2 referring to female and male respectively.
To more conveniently interpret the gender effect, the dummy variable ‘male’ was created based on this variable, with 0
representing female and 1 representing male.
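The recoding described above can be sketched in a few lines. The helper name is hypothetical, and mapping any other code to None (missing) is an assumption made here; the text does not state how missing gender codes were handled.

```python
def male_dummy(st04q01):
    """Recode ST04Q01 (1 = female, 2 = male) to the dummy 'male' (0/1).

    Any other value is returned as None, i.e. treated as missing
    (an assumption of this sketch).
    """
    return {1: 0, 2: 1}.get(st04q01)


codes = [1, 2, 2, 1, None]
male = [male_dummy(code) for code in codes]
print(male)
```

With this coding, the regression coefficient of ‘male’ is read directly as the male-minus-female difference with the other variables held constant.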
ESCS. Students’ family socioeconomic status and its relationships with their academic achievement have been widely discussed
(e.g. Berkowitz, Moore, Astor, & Benbenishty, 2017; Marks, 2006; Sirin, 2005; White, 1982). The index ESCS (PISA index of eco-
nomic, social and cultural status) is included in PISA. It is based on three other indices which reflect students’ background in terms of
home possessions, parents’ occupations and parents’ educational levels (OECD, 2012). ESCS is traditionally used in PISA reports for
adjusting for the socioeconomic status of students as well as schools (OECD, 2012). It has a mean = 0 and SD = 1 for OECD countries’
average on PISA international scale (OECD, 2012).
In this current study, ESCS values were centred on the mean of the student sample involved in this study, so that ESCS = 0
represents the average ESCS background of the involved student sample.
Beyond individual student ESCS, it is known that the average ESCS within schools has a significant effect on students’ performance,
and its effect is even stronger than that of individual student ESCS (OECD, 2013b; Sirin, 2005; White, 1982). School-level ESCS is
considered an indicator of socioeconomic segregation among schools (Perry & McConney, 2010). The variable SCH_ESCS, representing
school ESCS, was therefore created and also included in analyses.
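A minimal sketch of how the two ESCS variables could be derived is given below. It assumes unweighted means and takes SCH_ESCS as the school mean of raw student ESCS; the text does not specify either choice. The student records and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student records: school identifier and ESCS index.
students = [
    {"school": "A", "escs": -0.5},
    {"school": "A", "escs": 0.1},
    {"school": "B", "escs": -1.2},
    {"school": "B", "escs": -0.4},
]

# Centre student ESCS on the sample mean, so 0 is the average student.
sample_mean = mean(s["escs"] for s in students)
by_school = defaultdict(list)
for s in students:
    s["escs_c"] = s["escs"] - sample_mean
    by_school[s["school"]].append(s["escs"])

# School-level ESCS: mean ESCS of the sampled students in each school.
school_mean = {school: mean(values) for school, values in by_school.items()}
for s in students:
    s["sch_escs"] = school_mean[s["school"]]

print(round(sample_mean, 2), {k: round(v, 2) for k, v in school_mean.items()})
```

Centring leaves the ESCS slope unchanged and only shifts the intercept, so the intercept then refers to a student of average ESCS in the sample.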
2.1.2. Weights
PISA uses sampling weights to address sampling error and allow for making inferences of the population (OECD, 2012). To
address the measurement error brought by the generation of PVs, PISA adopts the approach of replicating estimation of parameters
with replicate weights (OECD, 2012). In this study, student final weights (i.e. sampling weights) and all replicate weights were
employed when applicable (see Analyses section).
2.2. Analyses
First, descriptive analysis for the above mentioned variables was carried out. Then correlation analysis was conducted to look at
the bivariate relationships between mathematics performance and student background measures as well as reading performance, in
addition to the inter-correlations between reading subareas. Following that, two-level multilevel modelling (MLM) analyses were
conducted with the consideration that students were nested in schools. With the ‘bottom-up’ modelling strategy (Hox, 2010), MLM
analyses start from the null model (M0), which does not include any explanatory variables, to investigate the between-school variance
in students’ mathematics performance. The intraclass correlation coefficient (ICC), indicating the variance explained by schools (Hox,
2010), was also calculated. After that, Model 1 (M1) employing student level (i.e. level 1) background variables male, ESCS, and
school level (i.e. level 2) variable SCH_ESCS explores to what extent these variables account for students’ mathematics performance.
With the background variables controlled, in M2, one of the variables of main interest in this study, that is, PVs in reading, was
added to investigate how much of the variance in mathematics performance is explained by reading performance.
In M3 and M4, PVs in three reading aspects and PVs in two reading text formats were added respectively to identify whether some
reading subareas had a stronger effect compared with other subareas for interpreting mathematics performance. Table 2 outlines all
the two-level models.
Table 2
Outline of models.
Explanatory variable                      M0    M1    M2    M3    M4
Level 1
  Male                                          √     √     √     √
  ESCS                                          √     √     √     √
  Reading                                             √
  Reading subareas (aspects)
    Access and Retrieve                                     √
    Integrate and Interpret                                 √
    Reflect and Evaluate                                    √
  Reading subareas (text formats)
    Continuous Texts                                              √
    Non-continuous Texts                                          √
Level 2
  SCH_ESCS                                      √     √     √     √
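The two-level structure outlined in Table 2 can be written out explicitly. The notation below follows common multilevel conventions and is an assumption of this sketch, as is the random-intercept-only specification; the text does not state whether any slopes were allowed to vary across schools. Student i is nested in school j.

```latex
% Null model (M0) and intraclass correlation
y_{ij} = \gamma_{00} + u_{0j} + e_{ij}, \qquad
u_{0j} \sim N(0, \tau_0^2), \quad e_{ij} \sim N(0, \sigma^2)

\mathrm{ICC} = \frac{\tau_0^2}{\tau_0^2 + \sigma^2}

% M2: background variables plus overall reading
y_{ij} = \gamma_{00} + \gamma_{10}\,\mathrm{male}_{ij} + \gamma_{20}\,\mathrm{ESCS}_{ij}
       + \gamma_{30}\,\mathrm{Reading}_{ij} + \gamma_{01}\,\mathrm{SCH\_ESCS}_{j}
       + u_{0j} + e_{ij}
```

M3 and M4 would replace the single reading term with the three aspect scores or the two text-format scores respectively.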
The percentage of between-school variance explained and the percentage of within-school variance explained were calculated
respectively for M1, M2, M3, and M4 to investigate the contribution of adding explanatory variables into the modelling.
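The percentage of variance explained at each level is commonly computed as the proportional reduction in the corresponding variance component relative to the null model. The sketch below assumes that approach. The variance components are hypothetical values chosen for illustration, not estimates from this study.

```python
def variance_explained(null_component, model_component):
    """Proportional reduction in a variance component relative to the null model."""
    return (null_component - model_component) / null_component


# Hypothetical variance components (between-school, within-school).
null_between, null_within = 2000.0, 6000.0
model_between, model_within = 1200.0, 5400.0

between_explained = variance_explained(null_between, model_between)
within_explained = variance_explained(null_within, model_within)
print(f"{between_explained:.0%} between-school, {within_explained:.0%} within-school")
```

Computing the two levels separately shows whether an added variable mainly accounts for differences between schools or between students within the same school.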
Although reading subarea scores were all reported on the international scale, their SDs for the sample involved in this
present study were not the same (see Table 3 in Results section). Therefore, it is problematic to discuss their effect sizes by comparing
their coefficients directly (Lorah, 2018). Hence, the regression coefficients of reading subareas were standardised with the
method suggested by Snijders and Bosker (2012) to allow comparison among them.
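A common way to standardise a regression coefficient is to multiply it by the SD of the explanatory variable and divide by the SD of the outcome. The sketch below assumes this is the procedure intended here; the exact computation is not shown in the text. The coefficient and SD values are hypothetical.

```python
def standardise(coefficient, sd_x, sd_y):
    """Standardised coefficient: change in y (in SDs) per one-SD change in x."""
    return coefficient * sd_x / sd_y


# Hypothetical unstandardised coefficient and sample SDs.
b_subarea = 0.30   # score points in mathematics per score point in the subarea
sd_subarea = 90.0  # sample SD of the reading subarea score
sd_maths = 80.0    # sample SD of the mathematics score

beta_std = standardise(b_subarea, sd_subarea, sd_maths)
print(round(beta_std, 4))
```

Two subareas with equal unstandardised coefficients but different sample SDs would therefore receive different standardised coefficients, which is why direct comparison of the raw values can mislead.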
In conjunction with SPSS 23, IEA IDB Analyzer 4.0 (IEA, 2018) which allows for incorporating PVs, sampling weights and
replicate weights was used for data descriptive analysis and correlation analysis. The MIXED procedure of Stata 13 (StataCorp,
2013a) was used for multilevel modelling (MLM) analyses. In MLM analyses, the full set of five PVs in mathematics, reading and
reading subareas were also employed. The normalised student final weights were adopted (OECD, 2009). The MI procedure of Stata
13 was used to pool the results of the five datasets as per the standard methodological guidance (StataCorp, 2013b).
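The text does not spell out how the student final weights were normalised. One common choice, assumed in this sketch, rescales the weights so that they sum to the number of sampled students while preserving their relative sizes. The weight values are hypothetical.

```python
def normalise_weights(weights):
    """Rescale weights so they sum to the sample size (relative sizes unchanged)."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]


# Hypothetical final student weights.
final_weights = [12.0, 8.0, 20.0, 10.0]
normalised = normalise_weights(final_weights)
print([round(w, 2) for w in normalised])
```

Rescaling in this way keeps the effective sample size used by the software in line with the number of students actually assessed rather than the size of the population they represent.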
3. Results
In this section, first, the descriptive statistics of the variables of interest followed by the correlation between mathematics per-
formance and explanatory variables are presented. Then, multilevel modelling results are presented for the null model in addition to
the models involving explanatory variables.
Descriptive statistics of mathematics achievement as well as all explanatory variables are shown in Table 3.
Using the OECD average (500) on the international scale as the benchmark¹, Table 3 shows that students’ mean performance in
mathematics was above the OECD average, while the performance in reading and all the reading subareas lagged behind. Regarding
gender difference, though on average males scored 7.38 points higher than females in mathematics achievement, this difference was
not statistically significant (p > 0.05). However, significant gender differences in favour of girls were observed in the overall reading
performance as well as in reading subareas performance. Fig. 1 further presents the distribution of males’ and females’ performance in
mathematics and reading.
As shown in Fig. 1, for mathematics, males’ and females’ median performance was similar; however, low male mathematics
achievers (i.e. the 5th percentile) scored about 27 points higher than low female mathematics achievers. Differential distributions
between males and females in reading performance are also evidenced in Fig. 1. Males’ median performance was about 30 points
lower than females’, and high male reading achievers (i.e. the 95th percentile) scored about 32 points lower than high female reading
achievers.
The bivariate correlation analysis was conducted to identify the relationships between mathematics and explanatory variables of
interest. Results are shown in Table 4.
As Table 4 shows, all the explanatory variables of interest except ‘male’ had significant positive relationships with
students’ mathematics performance (p < 0.05). With regard to the background variables, compared with SCH_ESCS (r = 0.35),
ESCS had a relatively weak relationship (r = 0.14) with mathematics performance. The correlation between reading and mathematics
performance was strong (r = 0.77), while the correlation coefficients of reading subareas ranged from 0.61 to 0.68. The
inter-correlations between reading subareas were high, with the coefficients ranging from 0.73 to 0.84, as shown in Table 5 below.
The correlation analysis results confirm the importance of employing these explanatory variables in MLM analyses. Though
correlation analysis did not demonstrate a significant relationship between gender and mathematics performance, male was still
employed in the following MLM analyses considering the differential distributions of mathematics performance between males and
females, and gender difference in reading performance.
Firstly, a null model, M0, was run to identify the between-school variance and within-school variance of mathematics perfor-
mance.
As shown in Table 6, p < 0.001, suggesting that mathematics performance varied significantly among schools. The value of ICC
indicates that 31 % of variance of Fangshan students’ mathematics performance in PISA 2009 China Trial lay between schools. The
result of the null model supports the need to take into consideration the multilevel data structure in analyses. The estimates of
variance components in the null model were then used as the basis for calculating the variance reduced by the more complex models
presented in Table 7.
¹ Theoretically, the performance of Fangshan students in PISA 2009 China Trial could not be compared with the OECD average directly, since the
target population in PISA 2009 China Trial did not include the students enrolled in vocational education. Here, the OECD average was only used as
an indicative reference score.
Table 3
Descriptive statistics of variables.
Variable    Description    Mean (SD)    Mean (SD), Female    Mean (SD), Male    Gender difference (Male-Female)
Table 4
Bivariate correlation between students’ mathematics performance, background and reading performance.
              ESCS    SCH_ESCS   male    Reading   Reading1   Reading2   Reading3   Reading4   Reading5
Mathematics   0.14*   0.35*      0.04    0.77*     0.61*      0.66*      0.63*      0.67*      0.68*
Reading1 = Access and Retrieve; Reading2 = Integrate and Interpret; Reading3 = Reflect and Evaluate; Reading4 = Continuous texts;
Reading5 = Non-continuous texts.
In M1, which involved only level 1 (student) background variables ESCS and male, and level 2 (school) background variable
SCH_ESCS, student ESCS had no significant association with mathematics performance (p > 0.05), while both SCH_ESCS and male
did (p < 0.001 and p < 0.05 respectively). For females (male = 0), an increase of 92 score points (0.92 SD of PISA international
scale) in their mathematics performance was associated with a one-unit increase on SCH_ESCS. The significant effect of gender on
mathematics performance indicates that for students from the average ESCS background who were enrolled in the schools with the
same school average ESCS, males would perform 17 score points (0.17 SD of PISA international scale) higher than females. M1
explained 40 % of between-school variance and 1% of within-school variance.
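The percentages of variance explained are described as reductions relative to the null model. A minimal sketch of that calculation, assuming the usual proportional-reduction formula, is given below. The function name and the numbers are hypothetical round values for illustration, not the study's estimates.

```python
def proportion_variance_explained(null_var: float, model_var: float) -> float:
    """Proportional reduction in a variance component relative to the null model."""
    if null_var <= 0:
        raise ValueError("null-model variance must be positive")
    return (null_var - model_var) / null_var


# Hypothetical values: a between-school variance of 2000 in the null model
# that drops to 1200 once explanatory variables are added.
explained = proportion_variance_explained(null_var=2000.0, model_var=1200.0)
print(f"{explained:.0%} of between-school variance explained")
```

The same function applies to the within-school component; only the two variance estimates passed in change.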
On the basis of M1, students’ reading performance was added in M2. As shown in Table 7, with background variables controlled, reading performance had a significant effect on mathematics performance (βreading = 0.833, p < 0.001). Among students from the same socio-economic background, those who achieved 100 score points higher in reading than their peers would achieve 83 score points higher in mathematics. Conversely, students who had relatively low reading literacy tended to have relatively poor mathematics performance in PISA 2009 China Trial. Another finding was that the effect sizes of gender and school average ESCS changed after reading performance was added as an explanatory variable. The coefficient of the variable male, βmale, changed from 17 to 34, and it was significant at the 0.001 level rather than the 0.05 level. With ESCS and SCH_ESCS controlled for, males achieved on average 34 score points higher than females with the same reading performance. By contrast, the effect size of SCH_ESCS was weakened, with βSCH_ESCS falling from 92 to 42, though it was still significant (p < 0.001). As Table 7 indicates, compared with M1, M2 (which took reading performance into consideration) explained 46 percentage points more of between-school variance and 51 percentage points more of within-school variance.
Table 5
Inter-correlations between students’ performance in reading subareas.
                                       Reading1    Reading2    Reading3
Reading1 (Access and Retrieve)         1.00
Reading2 (Integrate and Interpret)     0.84        1.00
Reading3 (Reflect and Evaluate)        0.73        0.82        1.00

                                       Reading4    Reading5
Reading4 (Continuous texts)            1.00
Reading5 (Non-continuous texts)        0.84        1.00
Table 6
Analysis of null model.
M0
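The interpretation of the M2 coefficients can be illustrated with a short sketch that evaluates the fixed part of the model for hypothetical students. The coefficients are the M2 estimates reported in Table 7; the function name, the student profiles and the decision to ignore the random school effect are assumptions made for illustration only. Because only differences between predictions are inspected, the result does not depend on how the predictors were centred.

```python
# Fixed-effect estimates for M2 as reported in Table 7.
M2_COEFFICIENTS = {
    "intercept": 126.176,
    "ESCS": -5.085,
    "male": 33.564,
    "reading": 0.833,
    "SCH_ESCS": 41.755,
}


def predict_mathematics(escs: float, male: int, reading: float, sch_escs: float) -> float:
    """Fixed-part prediction from M2; the random school effect is ignored."""
    b = M2_COEFFICIENTS
    return (
        b["intercept"]
        + b["ESCS"] * escs
        + b["male"] * male
        + b["reading"] * reading
        + b["SCH_ESCS"] * sch_escs
    )


# Hypothetical students who differ only in reading score (500 vs 600).
baseline = predict_mathematics(escs=0.0, male=0, reading=500.0, sch_escs=0.0)
better_reader = predict_mathematics(escs=0.0, male=0, reading=600.0, sch_escs=0.0)
reading_gap = better_reader - baseline

# Hypothetical students who differ only in gender.
male_student = predict_mathematics(escs=0.0, male=1, reading=500.0, sch_escs=0.0)
gender_gap = male_student - baseline

print(round(reading_gap, 1), round(gender_gap, 1))
```

The two gaps correspond to the 83-point and 34-point differences discussed in the text, each holding the other predictors fixed.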
In M3, which involved the three cognitive aspects of PISA reading, reading 1 (Access and Retrieve) had no significant effect on mathematics (p > 0.05). However, the other two aspects, reading 2 (Integrate and Interpret) and reading 3 (Reflect and Evaluate), both had significant relationships with mathematics performance (p < 0.01 for both). In M4, which involved reading 4 (Continuous texts) and reading 5 (Non-continuous texts), both variables were significantly associated with mathematics performance (p < 0.001).
The standardised regression coefficients of reading subareas are shown in Table 8.
Reading 2, representing the cognitive aspect Integrate and Interpret, had a marginally larger effect than reading 3 on mathematics
performance. In terms of the ability of reading different formats of texts, reading 5 denoting non-continuous texts had a slightly larger
effect size than reading 4 on mathematics performance.
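For readers unfamiliar with standardised coefficients, one common convention rescales an unstandardised coefficient by the ratio of the predictor's standard deviation to the outcome's standard deviation. The sketch below shows that convention only; the source does not state which procedure produced Table 8, and the function name and all numbers here are hypothetical.

```python
def standardised_coefficient(b: float, sd_predictor: float, sd_outcome: float) -> float:
    """Rescale an unstandardised coefficient to standard-deviation units."""
    if sd_outcome <= 0:
        raise ValueError("outcome standard deviation must be positive")
    return b * sd_predictor / sd_outcome


# Hypothetical inputs (NOT taken from the study): a raw coefficient of 0.5,
# a predictor SD of 80 score points and an outcome SD of 100 score points.
beta = standardised_coefficient(b=0.5, sd_predictor=80.0, sd_outcome=100.0)
print(round(beta, 2))
```

Standardising in this way puts predictors with different spreads on a common scale, which is what makes a comparison of effect sizes across reading subareas meaningful.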
This study investigated and estimated the importance of taking reading performance into account in interpreting mathematics performance, and examined whether particular reading subareas have a stronger association with mathematics performance. In the sections below, key findings and policy implications are discussed, followed by a discussion of limitations and future research.
In line with the findings of previous studies (e.g. Wu, 2010), the results of the current study indicate that reading performance had a significant effect on mathematics performance. We have found that students with high reading performance were more likely to also achieve highly in mathematics (βreading = 0.833, see M2 in Table 7). As suggested by Helwig et al. (1999) and Doerr and Temple (2016), the relationship between reading and mathematics achievement is relatively strong for mathematics word problems. This study has confirmed this finding with data from PISA, which employs mathematics word problems with high reading demands. The significant effect of reading performance and the magnitude of the variance explained by including reading as an explanatory variable support the view that it is crucial to take reading performance into account in interpreting students’ performance (including mathematics performance) in PISA (Rindermann & Baumeister, 2015).
Table 7
Analysis of multilevel models with explanatory variables.

Fixed effects: Coefficient (SE), p (– = variable not included in the model)
                  M1                           M2                           M3                           M4
Intercept         531.361 (8.876), < 0.001     126.176 (20.819), < 0.001    177.775 (29.125), < 0.001    164.177 (21.944), < 0.001
Level 1
ESCS              −0.947 (3.903), 0.809        −5.085 (3.104), 0.103        −4.975 (3.535), 0.165        −4.006 (3.818), 0.300
male              16.739 (6.486), < 0.05       33.564 (4.882), < 0.001      31.247 (5.518), < 0.001      31.708 (7.483), < 0.01
reading           –                            0.833 (0.038), < 0.001       –                            –
Reading1          –                            –                            0.074 (0.061), 0.253         –
Reading2          –                            –                            0.342 (0.104), < 0.01        –
Reading3          –                            –                            0.307 (0.084), < 0.01        –
Reading4          –                            –                            –                            0.323 (0.065), < 0.001
Reading5          –                            –                            –                            0.436 (0.071), < 0.001
Level 2
SCH_ESCS          91.684 (22.473), < 0.001     41.755 (11.931), < 0.001     48.235 (15.858), < 0.01      43.523 (12.555), < 0.01

Variance components: Estimate (SE) [95 % Conf. Interval]
Between-school variance
  M1    1404.934 (392.897) [809.804, 2437.427]
  M2    320.191 (177.349) [97.535, 1051.137]
  M3    443.702 (232.222) [144.7462, 1360.117]
  M4    399.645 (220.561) [117.481, 1359.5]
Within-school variance
  M1    5052.545 (414.988) [4297.056, 5940.862]
  M2    2461.436 (242.732) [1992.056, 3041.414]
  M3    3177.129 (300.965) [2605.818, 3873.697]
  M4    3064.199 (244.508) [2619.659, 3584.174]

                                           M1    M2    M3    M4
% of between-school variance explained     40    86    81    83
% of within-school variance explained      1     52    38    40
Table 8
Standardised regression coefficients of reading subareas.
M3 M4
With regard to reading subareas, performance in all reading subareas except Access and Retrieve significantly explained mathematics performance (see M3 and M4 in Table 7). This supports the view that specific components of reading literacy significantly account for mathematics performance. Adding to Grimm's (2008) finding that reading comprehension has differential impacts on different aspects of mathematics, we argue that different aspects of reading could also have varying effects on interpreting mathematics performance. To our knowledge, no previous studies have discussed the differences in the strengths of the effect sizes of performance in reading subareas for interpreting mathematics performance in PISA (Table 8). Regarding the nonsignificant effect of Access and Retrieve (M3, Table 7), this may be because, for this subarea, students need to locate details from explicitly specified information in the problem (OECD, 2010a). This process is not generally required in PISA mathematics problems, in which the underlying mathematics is within the texts (OECD, 2010a). The stronger effect size of non-continuous texts compared with that of continuous texts (Table 8) suggests that a higher ability to read texts such as graphs and tables would benefit students more in terms of their performance in mathematics.
Taking into account reading performance in interpreting mathematics performance also provides insights into understanding the effects of gender and socioeconomic status on mathematics performance. As shown in Table 7, the effect of gender became larger after controlling for students’ reading performance, suggesting that to some extent reading performance moderated the gender difference in mathematics performance. Regarding socioeconomic status, the considerably reduced effect size of the average ESCS within schools after reading was added as an explanatory variable suggests that socioeconomic segregation among schools may influence students’ mathematics performance in part through the differences in reading performance.
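The shift in the gender and school ESCS effects between M1 and M2 can be quantified with a simple relative-change calculation. The sketch below uses the coefficients reported in Table 7; the function and variable names are hypothetical, and the calculation describes the size of the change only. It is not a formal test of mediation or suppression.

```python
def relative_change(before: float, after: float) -> float:
    """Change in a coefficient as a fraction of its earlier value."""
    if before == 0:
        raise ValueError("earlier coefficient must be non-zero")
    return (after - before) / before


# Coefficients from Table 7: M1 (without reading) and M2 (with reading).
male_change = relative_change(before=16.739, after=33.564)
sch_escs_change = relative_change(before=91.684, after=41.755)

print(f"male: {male_change:+.1%}, SCH_ESCS: {sch_escs_change:+.1%}")
```

On these figures the coefficient for male roughly doubles while the coefficient for SCH_ESCS falls by a little over half once reading performance enters the model.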
4.2. Implications
This study contributes to a more nuanced understanding of mathematics performance in PISA from the perspective of students' reading performance. This perspective could shed light on future studies on the interpretation of mathematics performance in PISA and other assessments in which mathematics items have high reading demands. Specifically, this study extends the existing literature on the relationship between reading and mathematics performance by identifying the significance of reading performance for interpreting mathematics performance in terms of the magnitude of its effect size and the variance in PISA mathematics performance it can explain. Moreover, by identifying that a strong association with mathematics performance is present only for specific reading subareas, this study provides initial evidence supporting the view that, rather than just acting as a proxy for other unknown constructs, reading literacy (as defined in PISA) itself may have at least some influence on mathematics performance in PISA. The specific subareas significantly associated with mathematics performance may point to commonalities between reading literacy and mathematics literacy which have an impact on mathematics performance in PISA.
Hence, rather than adding to the debate on the construct validity of PISA (Rindermann & Baumeister, 2015), we would argue that PISA reading literacy and mathematics literacy may be constructed with some overlap between them, which to some extent is implicitly suggested in their definitions (see Variables section). We suggest that those who use PISA scores to inform educational policymaking or other educational practices need, when interpreting students’ mathematics performance, to take into consideration what PISA means by literacies, key item/problem characteristics in terms of reading demands, and associated reading performance. It is also necessary to identify whether students’ weakness lies in those overlaps or in the abilities unique to mathematics literacy before taking policy initiatives to improve students’ mathematics performance.
Taking into account reading performance also allows for a deeper understanding of subgroups’ differences in mathematics performance. The analyses of gender differences in mathematics imply that, when comparing subgroups (for example, gender differences in mathematics performance), using average scores in the domain of interest (e.g. mathematics) as the sole measure may not reveal the full relationships in the data, and caution is required when assessment results are simply interpreted in this way (see Table 3). When the mean mathematics performance of males and females was simply compared, the gender difference was not significant (see Table 3). This is consistent with a previous study which employed Shanghai data in PISA 2009 and PISA 2012, and China (B-S-J-G) data in PISA 2015 (Guan, 2017). According to Guan (2017), males generally made up a higher proportion of high achievers than did females. The current study also finds ‘gender distributional imbalance’ (Zhou, Fan, Wei, & Tai, 2017) (see Fig. 1). Multilevel regression modelling offers evidence that gender effects favouring males still exist (Zhu, Kaiser, & Cai, 2018). Adding to Zhu et al. (2018), the current study found that the gender difference is more notable after reading performance is added to the model (see M1 and M2 in Table 7). Within the context of this study, our findings suggest that to fully develop students’ potential in mathematics and reduce the gender difference, the reading ability of male students and the mathematics ability of female students merit further attention. In recent years, a shrinking gender difference in mathematics has been shown in PISA (OECD, 2010b, 2014, 2016) and other assessments such as TIMSS (Martin & Mullis, 2013). Therefore, researchers have begun to argue for gender similarity (Hyde, 2005). However, based on the findings of this study, we would argue that further investigation of the mediation effect of reading ability in the gender difference in mathematics performance may be needed before making such a claim.
The importance of taking into account reading performance when interpreting mathematics performance within and between jurisdictions also underlines the innate complexity of PISA results, and supports the argument that crudely borrowing policies from high-ranking jurisdictions in PISA is highly problematic (Oates, 2011). As one of the most influential international large-scale assessments, PISA has brought about the globalized phenomenon of ‘policy borrowing’, in which jurisdictions seek ‘best practices’ from education systems elsewhere (Kamens, 2013). Researchers argue that context matters in translating others’ policies (Auld & Morris, 2016; Oates, 2011). Adding to the literature which discusses policy borrowing from a contextual view, this study suggests that the rich data available in PISA itself, for example reading performance, should also not be neglected when seeking to better understand the education system concerned and when considering ‘borrowing’ educational features/policies from high-performing jurisdictions.
The design of the current study does not allow for claims of causal inference (Gustafsson, 2013) to be made with regard to mathematics and reading performance, although our findings are highly suggestive of a causal link between specific reading subareas and mathematics in the context of PISA. To better support causal inference, future research that investigates mathematics teachers’ perceptions of the influence of students’ reading ability in these subareas on their performance on mathematics word problems, or that conducts interventions in mathematics teaching which highlight mathematical reading in these subareas, may provide additional evidence.
It should be noted that the current study focused only on a specific context, Fangshan District of Beijing, and that only secondary academic school students were involved. Hence, the findings of this study may not necessarily hold in other contexts and age groups. However, the wider literature (Shen & Lu, 2013; Wu, 2010) suggests that the general findings of this study are likely to replicate, to some degree, in other educational settings.
Understanding mathematics performance by taking reading performance into consideration is still far from telling the whole story of mathematics performance in PISA. As the extent of variance not explained by M2 (shown in Table 7) suggests, other factors might also explain mathematics performance in PISA. These might include, for example, affective measures such as mathematics self-efficacy and anxiety (Lee, 2009), and the ability to communicate mathematical solutions or thoughts through writing (Adu-Gyamfi, Bossé, & Faulconer, 2010). Further studies may investigate these by making use of additional PISA data and/or sources beyond PISA.
Declarations of interest
None.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Acknowledgements
We would like to thank Innocent Tasara for his comments on an early version of the manuscript. We would also like to thank the
NEEA for providing the access to the data used in this study.
Supplementary material related to this article can be found, in the online version, at doi:https://doi.org/10.1016/j.ijer.2020.101566.
References
Adu-Gyamfi, K., Bossé, M. J., & Faulconer, J. (2010). Assessing understanding through reading and writing in mathematics. International Journal for Mathematics
Teaching and Learning, 11(5), 1–22.
Ajello, A. M., Caponera, E., & Palmerio, L. (2018). Italian students’ results in the PISA mathematics test: does reading competence matter? European Journal of
Psychology of Education, 33(3), 505–520. https://doi.org/10.1007/s10212-018-0385-x.
Ashkenazi, S., Rubinsten, O., & De Smedt, B. (2017). Editorial: Associations between reading and mathematics: Genetic, brain imaging, cognitive and educational
perspectives. Frontiers in Psychology, 8. https://doi.org/10.3389/fpsyg.2017.00600.
Auld, E., & Morris, P. (2016). PISA, policy and persuasion: Translating complex conditions into education ‘best practice’. Comparative Education, 52(2), 202–229.
https://doi.org/10.1080/03050068.2016.1143278.
Berkowitz, R., Moore, H., Astor, R. A., & Benbenishty, R. (2017). A research synthesis of the associations between socioeconomic background, inequality, school
climate, and academic achievement. Review of Educational Research, 87(2), 425–469. https://doi.org/10.3102/0034654316669821.
Boonen, A. J. H., de Koning, B. B., Jolles, J., & van der Schoot, M. (2016). Word problem solving in contemporary math education: A plea for reading comprehension