
Thinking Skills and Creativity 20 (2016) 40–49


Designing and implementing a test for measuring critical thinking in primary school

Damián Gelerstein a,∗, Rodrigo del Río b, Miguel Nussbaum b, Pablo Chiuminatto c, Ximena López d

a Computer Science Department, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, Santiago, Chile
b Computer Science Department, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, Santiago, Chile
c Department of Literature and Linguistics, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, Santiago, Chile
d Department of Educational Sciences, Università Roma III, Via Ostiense 169, 00154 Rome, Italy

Article history:
Received 27 May 2015
Received in revised form 18 December 2015
Accepted 5 February 2016
Available online 10 February 2016

Keywords:
Critical thinking
Assessment models
21st Century skills
Culture
Graphic and textual narrative

Abstract

The importance of critical thinking in education is underpinned by decades of theoretical and practical work. Differences have been demonstrated between students who receive education in this area from an early age and those who do not. However, not enough work has been done to measure these skills in a classroom setting. Given that the best time to teach critical thinking is during the first years of primary education, we designed a test to determine the level of critical thinking among 3rd and 4th grade students in Language Arts using a graphic novel. We showed the legitimacy of the instrument through Construct Validity, Content Validity, Pearson product-moment correlation, Reliability and Item Analysis. Using the instrument, we studied how critical thinking skills differ among 3rd grade students, according to their socioeconomic status (SES), studying schools with low, middle and high SES. We found significant differences between the schools, which suggest that there may be a relationship between socioeconomic status and the development of critical thinking skills. The test presented in this study is an improvement on existing assessment tools as it replaces unidimensional models with models that provide a more detailed and multidimensional picture of student learning.

© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Teaching critical thinking in schools is one of the main topics in the discussion regarding so-called 21st Century skills
(Greenhill, 2009). Critical thinking has been regarded as an essential requirement for responsible human activity (Marques,
2012). It is also considered fundamental if citizens are to perform their social, professional and ethical duties (Griffin, McGaw
& Care, 2012; Greenhill, 2009). Critical thinking skills allow individuals to make autonomous decisions and to question beliefs
when these are not based on solid evidence (Halpern, 2003; Mulnix, 2012).
There is a wide range of definitions of critical thinking. Some of these definitions are more philosophical, relating to
classical thought and human development (Nussbaum, 2011); others are more focused on how critical thinking is received
and developed in education (Facione, 1990; Halpern, 2003; Bailin, 2002). Despite these divergent approaches, one essential
element of critical thinking is that it is a metacognitive process. Critical thinking allows us to think not just about the world

∗ Corresponding author.
E-mail address: [email protected] (D. Gelerstein).

http://dx.doi.org/10.1016/j.tsc.2016.02.002

around us (first-order skills), but also about the thought process itself (second-order skills) (Kuhn, 1999; Halpern, 2003). This
element of metacognition has consequences in teaching, relating directly to how the context is perceived and understood
(Halpern, 2003). For example, it has been established that metacognitive processes linked to critical thinking are fundamental
to the transfer of acquired knowledge (Nickerson, 1988). This study will use a definition of critical thinking that explicitly
states its link to metacognition. It is therefore defined as “a metacognitive process, consisting of a number of sub-skills (e.g.
analysis, evaluation and inference) that, when used appropriately, increases the chances of producing a logical conclusion
to an argument or solution to a problem” (Dwyer, Hogan & Stewart, 2014, p.43).
The importance of critical thinking in education is underpinned by decades of theoretical and practical work (Lai, 2011).
The inclusion of critical thinking in school curriculums has been widely reported since at least the first half of the 20th
Century, influenced by John Dewey (Bean, 2011). Facione (1990) suggests that developing critical thinking skills should
be an objective for every grade level in the K-12 curriculum. In this sense, teaching critical thinking in schools should
be integrated into regular classroom activities (Bailin, Case, Coombs, & Daniels, 1999). It is also important to teach critical
thinking from early childhood. This is because significant differences have been demonstrated between students who receive
education in this area from an early age and those who do not (Osakwe, 2009).
However, although the importance of critical thinking to global education has been defended repeatedly, not enough
work has been done to measure these skills in a classroom setting. According to UNESCO (2000), measuring improvements in
critical thinking skills is essential for improving the quality of education. If tests are understood to shape both the curriculum
and teaching, then an efficient way to improve the quality of education in critical thinking is to develop better tests (Yeh,
2001).
The literature reports that specific knowledge of the subject in which critical thinking skills are being taught is needed
if these skills are to be measured properly (Facione, 1990; Ennis, 1989; McPeck, 1981; Bailin, 2002; Willingham, 2008). In
order to assess children’s critical thinking skills in the classroom, the assessment must therefore be situated and take into
account the specific subject in which these skills are being taught (Merrell, 2003). This is because of the influence that
different subjects can have on the findings or conclusions of a study, depending on the objectives of the task (Bailin et al.,
1999). Therefore, although a certain level of generality can be assigned to critical thinking skills, their use depends on their
connection to a specific subject (McPeck, 1981).
Ennis (1989) suggests that the best time to teach critical thinking is during the first years of primary education. This
suggestion is coherent with other studies, which conclude that young children benefit from being taught and assessed on
critical thinking (Kennedy, Fisher, & Ennis, 1991). However, the critical thinking tests that are available are not designed for
young children and do not focus on a specific subject. Instead, they only measure critical thinking as a general skill (Ennis
& Millman, 1985; Watson & Glaser, 1980; Facione, 1990; Ennis & Weir, 1985; Halpern, 2003). Faced with this problem, our
first research question asks: “How can we design and validate a test that determines the level of critical thinking among
primary school children in a specific subject?”
The relationship between socioeconomic status and the acquisition of skills is best expressed by Bourdieu (2011). In
his study, the author shows that the accumulation of economic capital can be transferred to society as cultural capital, in
the form of educational skills and abilities (Bourdieu, 2011). It has been revealed that a student’s socioeconomic status
substantially influences their achievements in education (Breen & Goldthorpe, 1997). It has also been shown that levels of
poverty and reading skills among third grade students have an impact on their performance in higher levels of education
(Hernandez, 2011). Some authors suggest that a student’s socioeconomic status could also plausibly influence their level
of critical thinking (Cheung, Rudowicz, Lang, Yue, & Kwan, 2001). Our second research question therefore asks: “How do
critical thinking skills differ among 3rd grade students, according to their socioeconomic status?”

2. Methodology

2.1. Designing the test

A literature review was conducted during the first stage of this study in order to define a model of critical thinking. The
aim of this model is to provide a suitable sequence of the skills required by critical thinking. The proposed model was
based on the Delphi Report (Facione, 1990), in which a group of experts defined the main skills of a critical thinker. These
skills include interpretation, analysis, evaluation, inference, explanation, and self-regulation. Furthermore, the proposed
model also incorporates the concepts of transferability and metacognition (Halpern, 2003). Therefore, through sequences
of questions and images, it is possible to assess the student’s level of ability for each of the skills included in the model.
During this sequence, the students address how their thoughts are formulated (interpretation), evaluate solutions (eval-
uation), explore and clarify inconsistencies or missing information (analysis and inference), and, finally, explain the results
of their mental process (explanation). The sequence successively infuses higher-order thinking skills (analysis, interpreta-
tion and inference) with the metacognitive process (evaluation, explanation and self-regulation). This is done by explicitly
forming arguments in response to a given narrative. Special weighting is given to the explanation as the cognitive strategies
used by students when constructing an explanation are heavily influenced by metacognition (McNamara & Magliano, 2009).
Questions requiring reasoning were included in the model so as to track the explanatory process. According to Berland
& McNeill (2012), the explanatory and reasoning processes occur simultaneously. Using reasoning to put critical thinking
into operation makes measuring this skill much simpler (Yeh, 2001). Reasoning is particularly important when it comes to
measuring critical thinking as it is one of the most effective tools for discriminating between ideas based on evidence and those that are merely opinions (Jiménez-Aleixandre & Puig, 2012).

Table 1
Participants in the study by socioeconomic status and gender.

3rd Grade
Gender/Socioeconomic status    High    Middle    Low
Male                             55        13     17
Female                           47        17     14
Total                           102        30     31

4th Grade
Gender/Socioeconomic status    High    Middle    Low
Male                             40        14     18
Female                           42        14     10
Total                            82        28     28
In a second phase of the design process, the subject matter for the test was defined.
The potential for integrating critical thinking skills from an early age is demonstrated by studies which show that children
can carry out activities involving such skills as successfully as adults (Gelman & Markman, 1987; Willingham, 2008). In this
sense, it was estimated that the earliest possible age for the test to be effective is 3rd grade. This is because between 3rd and
5th grade the majority of students are capable of decoding and using most words (Swaggerty, 2015).
Critical thinking has a significant impact on the learning skills that are associated with language. For example, Nippold
et al. (2015) show that tasks based on critical thinking require sentences with more complex syntax than tasks based
on conversations. Chapman (2014) suggests that learning critical thinking skills improves student performance on the
TOEIC (Test of English for International Communication). Furthermore, the literature reports that the use of critical thinking
strategies is one of the most important factors in the development of reading skills (Hong-Nam, Leavell, & Maher, 2014).
Additionally, O’Flahavan & Tierney (2013) suggest that reading and writing are both highly productive tools for promoting
critical thinking. Given the above, Language Arts was chosen as the subject matter to be used in this study.
To deliver the test in this subject, an adaptation of the plot of a well-known story was developed. The selection criteria for choosing the
story took into account that, between the ages of 8 and 10, students’ thinking becomes more logical, their memory and
intellectual skills improve, and they appreciate humour (Das, 2013). In this sense, the recommendation by Moore and Zigo
(2005) to use science fiction was followed. This is because, as a genre, science fiction encourages the use of critical and
creative reading. For this reason, the classic novel by Jules Verne, Around the World in 80 Days, was selected and adapted
for children from 3rd grade. This adapted novel was used as a means to introduce the questions from the critical thinking
test.
In a third phase of the design process, it was felt that a graphic novel, i.e., a narrative based on both text and images,
was needed in order to help deliver the test. This decision was based on the privileged position enjoyed by visual media
in our culture (Emmison & Smith, 2000; Sturken, Cartwright, & Sturken, 2001). The consequence of this is that students learn primarily from visual resources, especially during their first years of school (Downey, 2009). Frey and Fisher (2004)
acknowledge that the use of graphic novels can stimulate interest in reading among students who are unmotivated by more
traditional formats. Furthermore, this type of narrative is considered to boost traditional elements of the Language Arts
curriculum, such as literacy (Schwarz, 2006). Another argument in favour of choosing this format is that it allows for the
inclusion of elements of visual literacy that are associated with critical thinking assessment standards (Facione, 1990; Lazo
& Smith, 2014). Hassett and Schleble (2007) suggest that when reading graphic novels students do not only improve their
basic reading skills; they also discover new ways to think about the interaction between the image and text.
The questions were developed in such a way that they would appear throughout the story. In total there were 29
questions, composed of 42 items with achievement indicators. There were also 94 images included as graphics in the
test.
Following this, a series of performance criteria were defined for each of the skills included in the theoretical framework
(Table 4 in Annex) (Facione, 1990). The assessment of these criteria, using reading-writing activities, allowed the level
of critical thinking to be determined. The narrative therefore had to be structured around these criteria. Questions were
developed for each of the performance criteria so as to cover all of the skills (interpretation, analysis, inference, evaluation
and explanation). There was a higher percentage of questions related to the skill of constructing explanations. This decision is
justified by the fact that the cognitive strategies used by students when constructing explanations are heavily influenced by
metacognition (McNamara & Magliano, 2009). This link is particularly important when developing critical thinking questions.
When constructing an explanation, the student’s answer is not only based on the contents of the question; their explanation
may also come from metacognitive considerations, such as analysing the type of task at hand and different strategies for
completing the task (Flavell, 1979). Such emphasis is placed on metacognition because it is this process that articulates
critical thinking (Dwyer et al., 2014). Given the above, it was decided that using short written activities based on questions would be most beneficial. This is a particularly effective teaching strategy for assessing thinking as it provides an idea of
the student’s mental process when developing an argument (Bean, 2011). It was also decided to have the questions appear
throughout the story, as this has been proven to be a more effective strategy than asking the questions at the end of the story
(Van Den Broek, Kendeou, Lousberg, & Visser, 2011). An example of this can be seen in Fig. 1 (Annex), which shows one of the
scenes from the graphic novel, together with a question for the student. In this scene one of the characters (the Detective)
tries to fool another character (Martin) by using a disguise consisting of a fake moustache. The interaction between the act
of wearing a disguise and the visual display of the moustache allows a series of factors to be measured. The first of these is
interpretation, by determining the role of visual images in a narrative. Here, the student has to be capable of understanding
the role of the object as part of the disguise and not as an innate characteristic of the character. The second factor that is
measured is evaluation, by weighing up the pros and cons of a given alternative. In this case, the student assesses whether the
moustache is a good or bad disguise with which to fool the other character. The third factor is explanation, as the students not
only have to decide whether or not it is a good disguise, they also have to give their reasons (Facione, 1990). The criteria used
to evaluate their written responses are also made explicit. The markers must therefore decide whether or not the student’s
response meets these criteria. If it does, they are awarded 1 point; if it does not they are awarded 0 points. The marking
guidelines were developed with the guidance of two experts in education and Language Arts, with the aim of ensuring that
the exercises were in line with the skill that was supposedly being measured.

Fig. 1. Example question.

2.2. Criteria used to validate the test

Construct validity, content validity and criterion validity were all analysed in order to validate the test (Brown, 2000;
Wolming, 1998). Cronbach’s alpha was also used for this purpose (Cronbach, 1971). These analyses were supplemented by
an item analysis. The aim of this analysis was to gather data in order to highlight the need to review, select and remove any
questions that were very difficult, very easy, or that failed to discriminate (Nitko, 1996).
An ANOVA was conducted to provide evidence of the construct validity. The hypothesis tested was that there would be significant differences in critical thinking skills between students in 3rd and 4th grade (Brown, 2000). The presence of
this difference validates the construct as it is coherent with the literature, which shows that the level of critical thinking
linked to metacognition increases with age (Veenman, Wilhelm, & Beishuizen, 2004). This is explained by the fact that in
3rd grade students can give causal explanations, make simple predictions (Hapgood, Magnusson, & Sullivan-Palincsar, 2004)
and partially correct analogies (May, Hammer, & Roy, 2006). From 4th grade on, they are capable of developing problem
solving strategies, defining tasks and monitoring their thoughts (Hudgins et al., 1989).
A panel of experts was used to demonstrate content validity (Rubio, Berg-Weger, Tebb, Lee, & Rauch, 2003). The panel
included an expert in education, as well as an expert in philosophy and language arts. The panel’s opinions were then
compared with the test developers’ opinions in order to judge the level of consistency between the two (Dillashaw & Okey,
1980; Nitko, 1996). Four think aloud sessions were also included in the study to analyse content validity (Ericsson & Simon,
1993; Flaherty, 1974). These sessions were conducted using students of a similar age to those used in the study, as suggested by Facione (1990).

Table 2
Summary of difficulty and discrimination levels per question.

Discrimination/Difficulty    Difficult (<0.2)    0.2–0.4    Suitable (0.4–0.6)    0.6–0.8     Easy (>0.8)
Bad (<0.3)                   2 items             2 items    0 items               0 items     2 items
Good (≥0.3)                  1 item              5 items    10 items              18 items    2 items

Table 3
Average scores by school and grade.

School SES    Grade    Average score    Standard deviation
High          4        22.15            3.27
High          3        19.30            4.55
Middle        4        14.75            6.95
Middle        3        10.73            6.97
Low           4        10.61            6.20
Low           3         6.97            6.06
Correlations have been detected between critical thinking test scores and school grades among Chilean high school
students (Santelices, Ugarte, Flotts, Radovic, & Kyllonen, 2011). Pearson’s product moment correlation was therefore used to
detect correlations between the test scores and Language Arts grades so as to analyse criterion validity. This sampling strategy was used as it was not possible to access the full report cards for all of the students (Ary, Jacobs, Sorensen, & Walker, 2013).
Furthermore, Cronbach’s alpha was also used to analyse the test’s reliability. This index is used to measure the internal
consistency of the test (Cronbach, 1971). An item analysis was then conducted using the test results. The aim of this was
to use objective data, i.e. difficulty and discrimination indices for each question, to identify the need to review, remove or
modify any of the questions (Nitko, 1996). These indices were classified according to the area of critical thinking measured
by each question so as then to observe which areas the students found the most and least difficult.

2.3. Experimental work

To study the relationship between critical thinking and socioeconomic status, a stratified sample was selected based
on the three types of schools that are found in the Chilean education system (state, state-subsidized and private). Each
type of establishment corresponds to a different socioeconomic status: low, middle and high, respectively (Gallego, 2002).
Furthermore, to analyse construct validity, the study was conducted with students from two grade levels: 3rd and 4th.
The participants in the sample (Table 1 in Annex) included 301 students from 3rd and 4th grade from the three types of schools. An ANOVA was used to test for the existence of significant differences between the different groups.

3. Results

3.1. Construct validity

The results of the Shapiro–Wilk test suggest that the scores of the students from 3rd and 4th grade were not normally distributed (p = 3.246e-06 for 3rd grade and p = 1.613e-07 for 4th grade). Furthermore, the Bartlett test shows that variances within the study groups were homogeneous (p = 0.3007). Although the normality assumption was not met, Sheng (2008) suggests that ANOVA is robust to such violations, so the analysis could still be used. Nevertheless, the Kruskal–Wallis test was also applied as a non-parametric alternative to address the violation of the normality assumption (Osborne, 2008).

The test used to analyse construct validity suggests that there are significant differences between 3rd and 4th grade, at p < 0.001.
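For readers who wish to reproduce this kind of check, the sketch below shows how the three tests reported here can be run with SciPy. It is a minimal, illustrative sketch only: the two score arrays are randomly generated placeholders standing in for the per-grade test scores (the paper's raw data are not available), and the calls are standard scipy.stats functions, not the authors' code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_3rd = rng.integers(0, 30, size=163).astype(float)  # placeholder 3rd grade scores
scores_4th = rng.integers(0, 30, size=138).astype(float)  # placeholder 4th grade scores

# Shapiro-Wilk: a small p-value indicates a departure from normality.
_, p_norm_3rd = stats.shapiro(scores_3rd)
_, p_norm_4th = stats.shapiro(scores_4th)

# Bartlett: a large p-value is consistent with homogeneous variances.
_, p_var = stats.bartlett(scores_3rd, scores_4th)

# Kruskal-Wallis: the non-parametric alternative to a one-way ANOVA.
h_stat, p_kw = stats.kruskal(scores_3rd, scores_4th)

print(f"Shapiro p: {p_norm_3rd:.3g} / {p_norm_4th:.3g}, "
      f"Bartlett p: {p_var:.3g}, Kruskal-Wallis p: {p_kw:.3g}")
```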

3.2. Content validity

A panel of experts examined the relationship between the questions on the test and the objectives that were established
for measuring critical thinking. This panel was more than 70% in agreement with the test that had been developed, which is
an acceptable level. The think aloud sessions also allowed questions that were misunderstood or unclear to be corrected in
the final version of the test.

3.3. Pearson product-moment correlation

The Pearson product-moment correlation between the test scores and the students' grades was 0.805 for 3rd grade and 0.723 for 4th grade. Dancey and Reidy (2004) classify a correlation greater than 0.7 as strong. The results of this analysis therefore provide strong evidence of criterion validity.
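A minimal sketch of this criterion-validity check follows; both arrays are hypothetical placeholders rather than the students' actual test scores and report-card grades.

```python
from scipy.stats import pearsonr

test_scores   = [22, 18, 25, 9, 14, 20, 11, 27]           # placeholder test scores
course_grades = [6.2, 5.5, 6.8, 4.0, 4.9, 6.0, 4.5, 6.9]  # placeholder course grades

r, p_value = pearsonr(test_scores, course_grades)
print(f"r = {r:.3f}, p = {p_value:.3g}")  # r > 0.7 counts as strong (Dancey & Reidy, 2004)
```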

3.4. Reliability

Before the item analysis, Cronbach's alpha for the test was 0.917. Following the item analysis, Cronbach's alpha was 0.909, which shows that the test is reliable (George & Mallery, 2003). In addition, we calculated the mean inter-item correlation, which is a direct measure of internal consistency independent of the scale length (Clark & Watson, 1995). Clark and Watson (1995) recommend the inter-item correlation to be in the range 0.15–0.50, and specifically suggest a correlation in the region of 0.20 for broad, higher-order constructs (such as critical thinking). For this test, the inter-item correlation is ρ = 0.25, indicating a desirable, moderate correlation. Both the alpha and the mean inter-item correlation were obtained after analysing the items according to their level of difficulty and discrimination, as detailed in the following section.

3.5. Item analysis

The results from the item analysis were organized according to the area of critical thinking being measured, recording the level of difficulty and discrimination of each item (Table 2, in Annex). A difficulty range of between 0.4 and 0.6 was chosen for this study as Womer (1968) suggests that this range can determine whether or not a question should be included on a test. Nitko (1996) also suggests that items with a difficulty index of less than 0.2 or greater than 0.8 should be rejected as they are too difficult or too easy, respectively. This recommendation was taken into account for the test used in this study. Finally, it was decided that questions with a difficulty index between 0.2 and 0.4 or between 0.6 and 0.8 would also be included, so long as they added to the coherence of the narrative and provided further information. Furthermore, a discrimination index greater than 0.3 was considered acceptable for any item on the test. Therefore, items with an index greater than 0.3 were considered good and items with an index less than 0.3 were considered bad (Adkins, 1974; Hinkle, Wiersma, & Jurs, 2003; Nitko, 1996). A summary of the distribution for all of the questions according to their level of difficulty and discrimination can be found in Table 4 in the Annex. This Table also shows the skill and sub-skill to which each question belongs.
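The paper does not spell out the formulas behind its difficulty and discrimination indices. The sketch below shows one conventional reading: difficulty as the proportion of students answering an item correctly, and discrimination as the difference in that proportion between the top and bottom 27% of students by total score. Treat it as an illustration of the acceptance bands described above, on placeholder data, not as the authors' procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
items = rng.integers(0, 2, size=(301, 42)).astype(float)  # placeholder: students x items, scored 0/1

totals = items.sum(axis=1)
order = np.argsort(totals)
n_tail = int(round(0.27 * items.shape[0]))   # size of each extreme group
lower, upper = order[:n_tail], order[-n_tail:]

difficulty = items.mean(axis=0)                                        # proportion correct
discrimination = items[upper].mean(axis=0) - items[lower].mean(axis=0) # upper-lower D index

# Nitko's permissive band (0.2-0.8) combined with the discrimination cut-off of 0.3.
keep = (difficulty > 0.2) & (difficulty < 0.8) & (discrimination >= 0.3)
print(f"{keep.sum()} of {items.shape[1]} items fall inside the acceptance bands")
```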
Initially, 42 items were measured through 29 questions; following the item analysis, only 29 items were included in 24 questions. As some questions had more than one achievement indicator, there are more items than there are questions. As shown in Table 5, the results reveal that Analysis was the area with the highest level of difficulty (0.36) and the lowest level of discrimination (0.38). The area with the lowest level of difficulty was Interpretation, with an average of 0.62.
Following Womer's (1968) criteria regarding difficulty and discrimination, a selection was made of the items that would initially be included on the test (items 2a, 3, 4, 6a, 10, 14b, 22, 23b and 27a). Nitko's (1996) criteria were then used to select additional items for the test (1a, 2b, 7, 8, 11, 12, 13, 14a, 16, 17a, 17b, 18, 19a, 19b, 21, 26, 28a, 28b, 29a and 29b). The aim of this was to obtain more information, while preserving the overall coherence of the test. Finally, one item (6b) was removed from the test as it had a correlation of 1 with another item and both items targeted the same performance indicator. As a result, the difficulty and discrimination levels for all of the items fell within the aforementioned ranges suggested by Womer (1968) and Nitko (1996) (Table 4, Annex).

3.6. Differences between schools

A comparison was made between the groups from the different types of schools (state, state-subsidized and private). The distributions of the high and low SES groups did not meet the normality assumption according to the Shapiro–Wilk test (high: p = 3.208e-05, middle: p = 0.07288, low: p = 0.0207). The Bartlett test also revealed that the variances within the groups were not homogeneous (p = 1.063e-07). However, this comparison was still possible as ANOVA is a robust test and tolerates violations of its assumptions (Sheng, 2008). The Kruskal–Wallis test was nevertheless used as a non-parametric alternative to the ANOVA test (Osborne, 2008). Significant differences were identified between socioeconomic statuses (SES), at p < 0.001. Tukey's test then revealed that the differences between the groups were significant at p < 0.05.
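The omnibus and post-hoc comparisons can be sketched as follows. The three score arrays are placeholders with roughly the group sizes of Table 1 and the means and spreads of Table 3, and pairwise_tukeyhsd from statsmodels stands in for whatever implementation of Tukey's test the authors actually used.

```python
import numpy as np
from scipy.stats import kruskal
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)
high   = rng.normal(20, 4, 102)  # placeholder scores per SES group
middle = rng.normal(12, 7, 30)
low    = rng.normal(8, 6, 31)

# Omnibus test across the three SES groups.
h_stat, p_kw = kruskal(high, middle, low)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.3g}")

# Pairwise post-hoc comparison (Tukey's HSD) at alpha = 0.05.
scores = np.concatenate([high, middle, low])
groups = ["high"] * len(high) + ["middle"] * len(middle) + ["low"] * len(low)
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```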
To analyse these differences in more detail, Table 3 in Annex shows the average test score for each school, as well as the
standard deviation. The results clearly show that test scores in high SES schools (22.15 for 4th grade and 19.3 for 3rd grade)
are considerably higher than in middle SES schools (14.75 for 4th grade and 10.73 for 3rd grade) and low SES schools (10.61
for 4th grade and 6.97 for 3rd grade). It is also important to highlight that the standard deviation between test scores in
the high SES school decreases considerably (28%) from 3rd grade to 4th grade (from 4.55 to 3.27 points). In the middle SES
school, the standard deviation remains practically the same (6.95 in 4th grade and 6.97 in 3rd grade), while in the low SES
school there is an increase in the standard deviation (from 6.06 to 6.2). This suggests that differences in student performance
are reduced over time in the high SES school, which is not the case in the other schools.
Table 4
Summary of scores and skills.

Interpretation: Understanding and expressing the meaning or importance of a wide variety of experiences, situations, data, events, judgements, conventions, beliefs, rules, procedures or criteria.1

Sub-skill              Performance indicator                                                            Difficulty    Discrimination
Categorize             Classifying elements according to certain rules, criteria or procedures.        0.70          0.48
Decode significance    Identifying which elements are important for solving a problem.                 0.65          0.34
                       Determining the role of images in a narrative.b                                 0.80          0.35
Clarify meaning        Representing a result or operation in different ways.                           0.48          0.49
                       Clarifying the use of a convention.                                             0.64          0.38
                       Clarifying an idea by giving an example.                                        0.39          0.37

Analysis: Identifying the intended and actual inferential relationships among statements, questions, concepts, descriptions, or other forms of representation intended to express belief, judgment, experiences, reasons, information, or opinions.1

Sub-skill              Performance indicator                                                            Difficulty    Discrimination
Examine ideas          Comparing concepts or statements by determining similarities and differences.   0.72          0.51
Identify arguments     Identifying whether an argument is in favour of or against a given statement.   0.21          0.33
Analyse arguments      Identifying the main point of an argument.b                                     0.35          0.42
                       Identifying the reasons that support an argument.b                              0.35          0.42

Evaluation: Assessing the credibility of statements or other representations that are accounts or descriptions of a person's perception, experience, situation, judgment, belief, or opinion; also assessing the logical strength of the actual or intended inferential relationships among statements, descriptions, questions, or other forms of representation.1

Sub-skill              Performance indicator                                                            Difficulty    Discrimination
Assess claims          Recognizing the important factors for determining the degree of credibility     0.46          0.42
                       of a source of information or opinion.
Assess arguments       Judging whether a conclusion is correct based on the premises adopted.          0.28          0.28
                       Assessing the strength of the logic of an objection to an argument.             0.63          0.37
                       Judging whether an argument is relevant, applicable or has any implication      0.65          0.52
                       on a situation.
Assess alternatives    Assessing the advantages and disadvantages of different alternatives.b          0.67          0.62

Inference: Identifying and securing elements needed to draw reasonable conclusions; forming conjectures and hypotheses, considering relevant information and educing the consequences flowing from data, statements, principles, evidence, judgments, beliefs, opinions, concepts, descriptions, questions, or other forms of representation.1

Sub-skill              Performance indicator                                                            Difficulty    Discrimination
Query evidence         Determining whether information is useful for building an argument.             0.81          0.23
                       Identifying whether an inference comes from evidence.                           0.43          0.31
                       Identifying what additional information is needed to decide between two         0.36          0.33
                       contradicting opinions.
                       Accepting or rejecting a hypothesis based on empirical evidence.                0.46          0.30
Propose alternatives   Given a set of priorities with which one may or may not agree, seeing the       0.60          0.61
                       advantages and disadvantages that will come from implementing a decision
                       once it has been made.b
                       Using different strategies to solve problems.b                                  0.81          0.55
Draw conclusions       Arriving at a valid conclusion based on evidence.                               0.69          0.52

Reasoning/Explanation: Stating and justifying reasoning in terms of the evidential, conceptual, methodological, criteriological, and contextual considerations upon which one's results were based; also presenting one's reasoning in the form of cogent arguments.1

Sub-skill              Performance indicator                                                            Difficulty    Discrimination
State results          Communicating the conclusions reached by following a procedure.b                0.67          0.68
                       Graphically communicating the relationship between concepts and ideas.          0.46          0.45
                       Developing a narrative which examines the changes in a topic over time.         0.82          0.35
Justify procedures     Stating the standards used to assess a literary work.                           0.75          0.64
                       Describing the strategy used to make a reasonable decision.                     0.18          0.41
Present arguments      Arguing in favour of or against a point of view.                                0.63          0.65
                       Explaining the comprehension of a concept.b                                     0.35          0.14
                       Identifying and expressing evidence as part of reasoning for and against        0.72          0.66
                       other people's and one's own way of thinking.

1 The definitions of the skills are taken from the Delphi Report (Facione, 1990).
a Items based on the Delphi Report (Facione, 1990).
b Indicators that were measured on more than one occasion.

Table 5
Average scores for difficulty and discrimination, before and after the item analysis.

                  Pre-item analysis                Post-item analysis
Skill             Difficulty    Discrimination    Difficulty    Discrimination
Interpretation    0.62          0.43              0.60          0.45
Analysis          0.36          0.38              0.47          0.45
Inference         0.58          0.41              0.57          0.44
Evaluation        0.47          0.40              0.60          0.48
Explanation       0.41          0.47              0.59          0.57

4. Conclusions

The aim of this study was to answer the two research questions stated above.
The first question asked “How can we design and validate a test that determines the level of critical thinking among
primary school children in a specific subject?” To answer this question, a critical thinking test was designed and validated
within the context of Language Arts, using a graphic novel. The challenge highlighted in the research question was overcome
as the test was validated for both 3rd and 4th grade students.
It is interesting to note that no other test has used a graphic novel to measure the level of critical thinking among 3rd and 4th grade students. There are several benefits to designing a test using a graphic novel. These include its impact on student motivation and interest (Chase, Son, & Steiner, 2014), the opportunity it provides for developing higher-order skills (Pantaleo, 2014), its effectiveness for teaching literacy (Jennings, Rule, & Zanden, 2014) and the support it provides to readers who have difficulty visualizing stories (Lyga, 2006). Validating tests in other areas of the curriculum, such as science and mathematics, remains as future work.
The main limitation of the test that was used is the cultural context in which it was designed. Cultural factors are
particularly important when developing critical thinking skills in Language Arts classes (Guo, 2013). Examples of this include
the significant effect of culture on critical thinking among university students when writing blogs (Shahsavar, 2014) and
the influence of institutional culture on critical thinking among students (Tsui, 2000). It is also important to highlight that,
as the theoretical framework was developed within a specific cultural context, students may be at a disadvantage because
of the structure of the test and not because of their level of critical thinking (Howe, 1995; Childs, 1990). Further study is
required to analyse how local culture affects the theoretical framework for critical thinking described in this study. It is also
necessary to study how local culture affects the performance of the test, in terms of both how the images are understood and how the narrative is structured. Quantification of cultural factors has been studied among adults (Hofstede, 2001), but is only incipient among children (Hofstede, 1986). Developing such a tool would improve the robustness of a critical thinking test for children from different cultural backgrounds. Another limitation was that the study sample was
unbalanced towards the high SES students. This characteristic of the sample may have violated the assumptions of the parametric tests that were used, and may also have affected the item analysis that was performed to determine the final items to be included in the test. As for the former, homogeneity of variances between groups was confirmed by the Bartlett test. Although the data appeared to have a non-normal distribution, ANOVA was nevertheless used in the construct validity analysis as it is a robust test when used with large samples, even when the normality assumption is not met (Sheng, 2008). As for the latter, the results of the item analysis may indeed have been affected by the over-representation of the high SES students and should thus be taken as preliminary. Further research is required in order to fine-tune the items.
Despite this limitation we believe that this test provides an important contribution to the assessment of critical thinking
skills in young children. There is an urgent need to understand how these skills are promoted in schools in order to address
curricular changes from an early stage of development (UNESCO, 2000).
Once the test had been validated, it was decided to answer the second research question, which asked “How do critical
thinking skills differ among 3rd grade students, according to their socioeconomic status?” In this sense, significant differences
were found between the schools with different socioeconomic statuses. This suggests that there may be a relationship
between socioeconomic status and the development of critical thinking skills. This is particularly evident when observing that the standard deviation among students in the high SES school decreases over time, increasing homogeneity among students, while the standard deviation remains the same or increases in the other schools. Although further findings are needed in
this area, it would be worth asking whether this difference can be reduced by teaching the critical thinking skills that are
measured by this test.
This study is a first step towards giving primary schools access to a test that allows them to measure the development of
their students’ critical thinking skills. This test is an improvement on existing assessment tools as it replaces unidimensional
models with models that provide a more detailed and multidimensional picture of student learning (Conley, 2015). The
limitations of this study include the range of grade levels that were covered, the cultural context in which it took place, and
the area of the curriculum for which the test was developed. Future work is required in order to overcome these limitations.

Acknowledgements

This research was funded by CONICYT-FONDECYT grant 1150045, and by the VRI-UC Interdisciplina Research Project No. 13/2014.

References

Adkins, D. C. (1974). Test construction: development and interpretation of achievement tests. Columbus, Ohio: Merrill Publishing Company.
Ary, D., Jacobs, L., Sorensen, C., & Walker, D. (2013). Introduction to research in education. Cengage Learning.
Bailin, S. (2002). Critical thinking and science education. Science & Education, 11(4), 361–375.
Bailin, S., Case, R., Coombs, J. R., & Daniels, L. B. (1999). Conceptualizing critical thinking. Journal of Curriculum Studies, 31(3), 285–302.
Bean, J. C. (2011). Engaging ideas: the professor’s guide to integrating writing, critical thinking, and active learning in the classroom. San Francisco: John Wiley
& Sons.
Berland, L. K., & McNeill, K. L. (2012). For whom is argument and explanation a necessary distinction? A response to Osborne and Patterson. Science
Education, 96(5), 808–813.
Breen, R., & Goldthorpe, J. H. (1997). Explaining educational differentials towards a formal rational action theory. Rationality and Society, 9(3), 275–305.
Bourdieu, P. (2011). The forms of capital (1986). Cultural theory: An anthology, 81–93.
Brown, J. D. (2000). What is construct validity. JALT Testing and Evaluation SIG Newsletter, 4(2), 7–10.
Chase, M., Son, E. H., & Steiner, S. (2014). Sequencing and graphic novels with primary-grade students. The Reading Teacher, 67(6), 435–443.
Chapman, J. (2014). Critical thinking and TOEIC: advanced skills for teachers and test-takers. The 2013 PanSIG Proceedings, 102.
Cheung, C. K., Rudowicz, E., Lang, G., Yue, X. D., & Kwan, A. S. (2001). Critical thinking among university students: does the family background matter?
College Student Journal, 35(4), 577.
Childs, R. (1990). Gender bias and fairness. Practical Assessment, Research & Evaluation, 2(3).
Clark, L. A., & Watson, D. (1995). Constructing validity: basic issues in objective scale development. Psychological Assessment, 7(3), 309.
Conley, D. (2015). A new era for educational assessment. Education Policy Analysis Archives, 23, 8.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement. Washington, D. C: American Council on Education.
Dancey, C., & Reidy, J. (2004). Statistics without maths for psychology: using SPSS for windows. London: Prentice Hall.
Das, A. (2013). The impact of children's literature in the teaching of English to young learners. International Journal of English: Literature Language & Skills.
Dillashaw, F. G., & Okey, J. R. (1980). Test of integrated science process skills for secondary students. Science Education, 64, 601–608.
Downey, E. M. (2009). Graphic novels in curriculum and instruction collections. Reference & User Services Quarterly, 49(2), 181–188.
Dwyer, C. P., Hogan, M. J., & Stewart, I. (2014). An integrated critical thinking framework for the 21st century. Thinking Skills and Creativity, 12, 43–52.
Emmison, M., & Smith, P. (2000). Researching the visual: images, objects, contexts and interactions in social and cultural inquiry: Introducing qualitative
methods.
Ennis, R. H. (1989). Critical thinking and subject specificity: clarification and needed research. Educational Researcher, 18(3), 4–10.
Ennis, R. H., & Millman, J. (1985). Cornell critical thinking test-level X. CA, Pacific Grove: Midwest Publications.
Ennis, R. H., & Weir, E. (1985). The Ennis–Weir essay test: an instrument for testing and teaching. Pacific Grove, CA: Midwest Publications.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: verbal reports on data. Cambridge, MA: MIT Press.
Facione, P. A. (1990). Critical thinking: a statement of expert consensus for purposes of educational assessment and instruction. Research findings and recommendations. http://assessment.aas.duke.edu/documents/Delphi Report.pdf, (downloaded 03.02.14).
Flavell, J. H. (1979). Metacognition and cognitive monitoring: a new area of cognitive–developmental inquiry. American Psychologist, 34(10), 906.
Flaherty, E. G. (1974). The thinking aloud technique and problem-solving ability. Journal of Educational Research, 68, 223–225.
Frey, N., & Fisher, D. (2004). Using graphic novels, anime, and the Internet in an urban high school. English Journal, 19–25.
Gallego, F. A. (2002). Competencia y resultados educativos: teoría y evidencia para Chile. Cuadernos de Economía, 39(118), 309–352.
Gelman, S. A., & Markman, E. M. (1987). Young children’s inductions from natural kinds: the role of categories and appearances. Child Development,
1532–1541.
George, D., & Mallery, P. (2003). SPSS for Windows step by step: a simple guide and reference. Boston, MA: Allyn and Bacon.
Greenhill, V. (2009). P21 framework definitions document. (Retrieved 15.12.10), from http://www.21stcenturyskills.org/documents/p21 framework definitions 052909.pdf.
Griffin, P., McGaw, B., & Care, E. (2012). Assessment and teaching of 21st century skills. New York: Springer.
Guo, M. (2013). Developing critical thinking in english class: culture-based knowledge and skills. Theory and Practice in Language Studies, 3(3), 503–507.
Halpern, D. (2003). Thought and knowledge: an introduction to critical thinking (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Hapgood, S., Magnusson, S. J., & Sullivan-Palincsar, A. (2004). Teacher, text, and experience: a case of young children’s scientific inquiry. Journal of the
Learning Sciences, 13(4), 455–505.
Hassett, D. D., & Schleble, M. B. (2007). Finding space and time for the visual in K-12 literacy instruction. English Journal, 97(1).
Hernandez, D. J. (2011). Double jeopardy: how third-grade reading skills and poverty influence high school graduation. Annie E. Casey Foundation.
Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied statistics for the behavioral sciences. Boston: Houghton Mifflin.
Hofstede, G. (1986). Cultural differences in teaching and learning. International Journal of Intercultural Relations, 10(3), 301–320.
Hofstede, G. (2001). Culture's consequences: comparing values, behaviors, institutions and organizations across nations. Thousand Oaks, California: Sage.
Hong-Nam, K., Leavell, A. G., & Maher, S. (2014). The Relationships among reported strategy use, metacognitive awareness, and reading achievement of
high school students. Reading Psychology, 35(8), 762–790.
Howe, K. (1995). Validity, bias and justice in educational testing: the limits of consequentialist conception. In A. Neiman (Ed.), Philosophy of education 1995 (2nd ed., pp. 295–302). Normal, IL: Philosophy of Education Society.
Hudgins, B. B., et al. (1989). Children’s critical thinking: a model for its analysis and two examples. Journal of Educational Research, 82(6), 327–338.
Jennings, K. A., Rule, A. C., & Zanden, S. M. V. (2014). Fifth graders’ enjoyment, interest, and comprehension of graphic novels compared to
heavily-illustrated and traditional novels. International Electronic Journal of Elementary Education, 6(2), 257.
Jiménez-Aleixandre, M. P., & Puig, B. (2012). Argumentation, evidence evaluation and critical thinking. In Second international handbook of science
education. pp. 1001–1015. Netherlands: Springer.
Kennedy, M., Fisher, M. B., & Ennis, R. H. (1991). Critical thinking: Literature review and needed research. Educational Values and Cognitive Instruction:
Implications for Reform, 2, 11–40.
Kuhn, D. (1999). A developmental model of critical thinking. Educational Researcher, 28(2), 16–26.
Lai, E. R. (2011). Critical thinking: a literature review. Pearson’s Research Reports, 6, 40–41.
Lazo, V. G., & Smith, J. (2014). Developing thinking skills through the visual: An a/r/tographical journey. International Journal of Education through Art,
10(1), 99–116. Chicago.
Lyga, A. A. (2006). Graphic novels for (Really) young readers: owly, buzzboy, pinky and stinky. who are these guys? and why aren’t they ever on the shelf?
School Library Journal, 52(3), 56.

Marques, J. F. (2012). Moving from trance to think: why we need to polish our critical thinking skills. International Journal of Leadership Studies, 7(1), 87–95.
May, D. B., Hammer, D., & Roy, P. (2006). Children’s analogical reasoning in a third grade science discussion. Science Education, 90(2), 316–330.
McNamara, D. S., & Magliano, J. P. (2009). Self-explanation and metacognition. In Handbook of metacognition in education, 60.
McPeck, J. (1981). Critical thinking and education. Oxford: Martin Robertson.
Merrell, K. W. (2003). Behavioral, social, and emotional assessment of children and adolescents. Psychology Press.
Moore, M., & Zigo, D. (2005). Chicken soup for the science fiction soul: breaking the genre lock in the high school literacy experience. Journal of Curriculum,
21(3), 29–45.
Mulnix, J. W. (2012). Thinking critically about critical thinking. Educational Philosophy and Theory, 44(5), 464–479.
Nickerson, R. (1988). On improving thinking through instruction. Review of Research in Education, 15(3), 3–57.
Nippold, M. A., Frantz-Kaspar, M. W., Cramond, P. M., Kirk, C., Hayward-Mayhew, C., & MacKinnon, M. (2015). Critical Thinking about Fables: Examining
Language Production and Comprehension in Adolescents. Journal of Speech, Language, and Hearing Research.
Nitko, A. J. (1996). Educational assessment of students (2nd ed.). New Jersey: Prentice-Hall.
Nussbaum, M. C. (2011). Creating capabilities. Harvard University Press.
O’Flahavan, J., & Tierney, R. (2013). Reading, Writing, and Critical Thinking. Educational Values and Cognitive Instruction Implications for Reform, 41.
Osakwe, R. N. (2009). The effect of early childhood education experience on the academic performances of primary school children. Studies on Home and
Community Science, 3(2), 143–147.
Osborne, J. W. (Ed.). (2008). Best practices in quantitative methods. Thousand Oaks: Sage.
Pantaleo, S. (2014). Reading images in graphic novels: taking students to a 'greater thinking level'. English in Australia, 49(1), 38.
Rubio, D. M., Berg-Weger, M., Tebb, S. S., Lee, E. S., & Rauch, S. (2003). Objectifying content validity: conducting a content validity study in social work
research. Social Work Research, 27(2), 94–104.
Santelices, M. V., Ugarte, J. J., Flotts, P., Radovic, D., & Kyllonen, P. (2011). Measurement of new attributes for Chile's admissions system to higher education. ETS Research Report Series, 2011(1), i–44.
Schwarz, G. (2006). Expanding literacies through graphic novels. English Journal, 58–64.
Shahsavar, Z. (2014). The impact of culture on using critical thinking skills through the blog. Media & Mass Communication, 3, 99–105.
Sheng, Y. (2008). Testing the assumptions of analysis of variance. In J. W. Osborne (Ed.), Best practices in quantitative methods. California: Sage.
Sturken, M., Cartwright, L., & Sturken, M. (2001). Practices of looking: an introduction to visual culture (p. 385). Oxford: Oxford University Press.
Swaggerty, E. A. (2015). Selecting engaging texts for upper elementary students who avoid reading or find reading difficult. Children’s Literature in the
Reading Program: Engaging Young Readers in the 21st Century, 150.
Tsui, L. (2000). Effects of campus culture on students’ critical thinking. The Review of Higher Education, 23(4), 421–441.
UNESCO. (2000). Dakar framework for action. Education for all: meeting our collective commitments. Adopted by the World Education Forum, Dakar, Senegal, April 26–28, 2000. Paris: UNESCO. http://unesdoc.unesco.org/images/0012/001211/121147e.pdf
Van Den Broek, P., Kendeou, P., Lousberg, S., & Visser, G. (2011). Preparing for reading comprehension: Fostering text comprehension skills in preschool
and early elementary school children. International Electronic Journal of Elementary Education, 4(1), 259–268.
Veenman, M. V. J., Wilhelm, P., & Beishuizen, J. J. (2004). The relation between intellectual and metacognitive skills from a developmental perspective.
Learning and Instruction, 14(1), 89–109.
Watson, G. B., & Glaser, E. M. (1980). Watson–Glaser critical thinking appraisal manual: forms A and B. San Antonio: The Psychological Corporation.
Willingham, D. T. (2008). Critical thinking: why is it so hard to teach? Arts Education Policy Review, 109(4), 21–32.
Wolming, S. (1998). Validitet. Ett traditionellt begrepp i modern tillämpning [Validity: a modern approach to a traditional concept]. Pedagogisk Forskning i Sverige, 3(2), 81–103.
Womer, F. B. (1968). Basic concepts in testing. Boston: Houghton Mifflin.
Yeh, S. S. (2001). Tests worth teaching to: constructing state-mandated tests that emphasize critical thinking. Educational Researcher, 30(9), 12–17.
