
Article

Perceptual and Motor Skills
2018, Vol. 126(1) 70–86
© The Author(s) 2018
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0031512518812183
journals.sagepub.com/home/pms

Is the d2 Test of Attention Rasch Scalable? Analysis With the Rasch Poisson Counts Model

Purya Baghaei1, Hamdollah Ravand2, and Mahsa Nadri1

Abstract
The d2 test is a cancellation test to measure attention, visual scanning, and process-
ing speed. It is the most frequently used test of attention in Europe. Although it has
been validated using factor analytic techniques and correlational analyses, its fit to
item response theory models has not been examined. We evaluated the fit of the d2
test to the Rasch Poisson Counts Model (RPCM) by examining the fit of six different
scoring techniques. Only two scoring techniques—concentration performance
scores and total number of characters canceled—fit the RPCM. The individual
items fit the RPCM, with negligible differential item functioning across sex.
A graphical model check and likelihood ratio tests confirmed the overall fit of
these two scoring techniques to the RPCM.

Keywords
sustained attention, selective attention, d2 test, Rasch Poisson Counts Model,
validity

1English Department, Mashhad Branch, Islamic Azad University, Mashhad, Iran
2English Department, Vali-e-Asr University, Rafsanjan, Iran

Corresponding Author:
Purya Baghaei, English Department, Islamic Azad University, Ostad Yusofi St., 91871-Mashhad, Iran.
Email: [email protected]

Introduction
Attention is a basic neurocognitive function that is a prerequisite for perform-
ance on more complex cognitive tasks (Cooley & Morris, 1990). Deficits in
attention are symptoms of several disorders in individuals with acquired brain
injury (Sohlberg & Mateer, 2001) and other disorders including schizophrenia
and metabolic disturbances (Mirsky, Anthony, Duncan, Ahearn, & Kellam,
1991). Furthermore, affective and anxiety disorders such as depression and bipo-
lar disorder coincide with impairments of attention.
Research on identifying the cognitive profiles of adults with ADHD demon-
strates that they suffer from difficulties in attention, working memory, inhibi-
tion, and flexibility (Fuermaier et al., 2013). Inattention and problems of
concentration are deemed to be the core symptoms of attention deficit hyper-
activity disorder (ADHD). Therefore, measuring attention is part of the diag-
nosis for ADHD (American Psychiatric Association, 2013).
The d2 test of attention is a widely used paper-and-pencil measure
of sustained and selective attention whose scoring system takes both the
speed and the accuracy components of performance into account. It is a
cancellation test in which respondents have to cross out target variables
among similar nontarget stimuli (Brickenkamp & Zillmer, 1998). The target vari-
ables are d’s with two dashes above or below them. The targets are randomly
interspersed among nontarget characters. Nontarget characters are d’s with one,
three, or four dashes above or below them and p’s with one, two, three, or four
dashes above or below them. The target and nontarget characters are presented
in 14 consecutive lines. Separate time limits of 20 seconds are allotted for each
line with no pause between the lines. Because the d2 test is timed and requires
focus on the target stimuli among background irrelevant noise, it is also con-
sidered a measure of speed, scanning accuracy, and selective attention.
The construct validity and reliability of the test have been investigated with
factor analysis and against criterion measures. Brickenkamp and Zillmer (1998)
demonstrated that the d2 test correlates with symbol digit modalities test
(Smith, 1973), Stroop (1935) color word test, and trail making test Parts A
and B (Reitan & Wolfson, 1985), all of which are measures of attention, scan-
ning, and mental flexibility. Brickenkamp and Zillmer (1998) also showed that
the test has a weak correlation with performance and verbal subtests of the
Wechsler Adult Intelligence Scale-Revised (Wechsler, 1981), which was deemed
as evidence of divergent validity for the d2 test. Researchers have also reported
high retest reliability estimates for the test (Brickenkamp & Zillmer, 1998; Lee,
Lu, Liu, Lin, & Hsieh, 2017; Steinborn, Langner, Flehmig, & Huestegge, 2018).
In fact, the d2 test is regarded as an extremely reliable test. Using an
interpolated resampling technique (cumulative reliability function analysis),
Steinborn et al. (2018) recently demonstrated that the d2 test’s reliability is retained even when
test length is reduced by 50%. One hypothesized reason for the exceptional test–
retest reliability of the d2 test is the mode of administration. In particular, it is
believed that the participant’s task motivation is constantly refreshed by both
the experimenter’s presence and the time-critical instructions to shift to the next
line, both of which seem to encourage participants to give their best performance
(cf. Pieters, 1985; Steinborn, Langner, & Huestegge, 2017; Van Breukelen et al.,
1995, for further theoretical considerations).
In an attempt to demonstrate the validity of the d2 test with an American
sample and extend its scoring system, M. E. Bates and Lemay (2004) adminis-
tered it along with several other neuropsychological measures to a relatively
large participant sample. The other tests employed in their study included meas-
ures of attention, processing speed, abstract reasoning, verbal ability, visual
spatial ability, and working memory. Results demonstrated that the test is
internally consistent as measured by Cronbach’s alpha. Principal component
analysis of all the measures along with different subscores of the d2 (as separate
variables) revealed that five factors can be extracted from the data. The first
factor was a speed factor with the total number of characters processed, the total
number of characters correctly processed, concentration performance (CP; total
number correctly canceled minus total number incorrectly canceled), and the
digit symbol substitution test loading on this factor. A second factor, named
scanning accuracy, was comprised of total errors, errors of omission, and per-
cent errors. An intelligence factor, a scanning deterioration or acceleration
factor, and a memory factor also emerged. These findings were interpreted as
convergent and discriminant validity evidence for the d2 test.
To our knowledge, no study so far has examined the fit of the d2 test to item
response theory (IRT) models (Birnbaum, 1968). IRT models are a class of
psychometric theories which model the relationship between an examinee’s per-
formance on an item and the examinee’s overall location on the latent trait.
The probability of a correct response to an item is assumed to be a function
of a person’s ability and some item parameters. With a higher ability parameter,
the probability of a correct response is expected to increase. The relationship
between ability and probability of a correct response is depicted graphically in a
set of graphs called item characteristic curves which are the main tool for assess-
ing item quality. Therefore, one straightforward way to examine the psychomet-
ric quality of an item is to check whether examinees with higher locations on the
latent trait have higher probabilities of endorsing an item.
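As a concrete example, the simplest such function is the item characteristic curve of the binary Rasch model, in which the probability of a correct response depends only on the difference between person ability $\theta_v$ and item difficulty $b_i$:

$$P(X_{vi} = 1 \mid \theta_v, b_i) = \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)}$$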
Although intelligence tests are commonly analyzed using IRT models, pro-
cessing speed tests are still evaluated using classical test theory because the
structure of speed tests does not match the requirements of most IRT models
(Doebler & Holling, 2016). Usually in speed tests, there are several simple tasks,
and the unit of analysis is a count of correct answers to these tasks. This is
unlike the individual right/wrong or Likert-type items that are commonly fed into
IRT models. IRT models are very flexible and allow for detailed item analysis,
provision of standard errors of measurement for different ability levels,
computerized adaptive testing, optimal planning of test designs, test equating,
and differential item functioning (DIF; Doebler & Holling, 2016).
In this study, we examine the fit of the d2 test to the Rasch Poisson
Counts Model (RPCM, Rasch, 1960/1980). The structure of the test, that is, a
combination of 14 lines of stimuli each with a separate time limit, makes it an
ideal candidate for RPCM scaling. Thus, in this study, the overall fit of the
d2 test to RPCM, the fit of the individual items (lines), and the reliability of
the test are all examined.

The RPCM Measurement Model


The RPCM (Rasch, 1960/1980) is a unidimensional member of the family of
Rasch models (RMs). It is used for timed tests where counts of correct replies or
errors, within each task, are modeled instead of replies to individual items
(Doebler, Doebler, & Holling, 2014; Jansen, 1997). Such testing conditions
arise in speeded neuropsychological or psychomotor tests in which respondents
must tick off an unlimited number of items within a fixed time period (Spray,
1990). The RPCM is expressed as follows:

$$P(Y_{vi} = y_{vi}) = \frac{\exp(-\lambda_{vi})\,\lambda_{vi}^{y_{vi}}}{y_{vi}!}$$

where $y_{vi}$ is the total raw score or count of errors for person $v$ on part $i$ of the
test. The scores on the tasks are assumed to be independent conditional on the
parameters $\lambda_{vi}$, $v = 1, \ldots, N$, $i = 1, \ldots, I$, and Poisson distributed. $\lambda_{vi}$ is the mean of
the raw scores or errors on part $i$ of the test for person $v$. This mean is assumed
to have a multiplicative composition: it is the product of the person’s ability $\theta_v$ and
the item’s easiness $\varepsilon_i$:

$$\lambda_{vi} = \theta_v \varepsilon_i$$

It is possible to estimate $\theta_v$ and $\varepsilon_i$ independently of each other; hence, separ-
ability of parameters holds (for a derivation, see Rasch, 1960/1980). The RPCM has
been applied in psychomotor testing (Spray, 1990), the testing of attention or
processing speed (Doebler & Holling, 2015), oral reading errors (Jansen, 1997;
Rasch, 1960/1980; Verhelst & Kamphuis, 2009), reading comprehension
(Verhelst & Kamphuis, 2009), and divergent thinking (Forthmann et al., 2016).
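To make the multiplicative structure concrete, the following R sketch (our illustration with made-up parameter values loosely inspired by Table 2, not the authors' code) simulates line counts from the RPCM: each count is drawn from a Poisson distribution whose mean is the product of a person ability and an item easiness.

```r
## Hypothetical RPCM simulation: 138 persons by 14 lines, as in this study.
set.seed(1)
n_persons <- 138
n_items   <- 14
theta   <- exp(rnorm(n_persons, mean = 0,   sd = 0.7))   # abilities theta_v > 0
epsilon <- exp(rnorm(n_items,   mean = 2.2, sd = 0.05))  # easiness epsilon_i > 0
lambda  <- outer(theta, epsilon)            # lambda_vi = theta_v * epsilon_i
counts  <- matrix(rpois(length(lambda), lambda),
                  nrow = n_persons, ncol = n_items)      # simulated line scores
```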
The fit of data to a latent trait model, such as the RM, is evidence that the
covariation among the test items is caused by an underlying latent factor which
could be the intended construct and is, therefore, considered validity evidence
(Baghaei & Tabatabaee-Yazdi, 2016; Borsboom, 2008). When the RM fits, the
homogeneity of the latent variable is supported. “More” or “less” of a latent
trait only makes sense if the trait is homogeneous. Needless to say, when the RM

does not hold, adding raw scores to compute an overall score is not warranted,
as we are then adding components of a heterogeneous latent variable. If a set of
items measure a single latent variable, “then the Rasch model is the necessary
and sufficient conceptualization. If they do not, then the set of items contains a
mixture of variables and there is no simple, efficient, or unique way to know
their utility for measuring anything” (Wright, 1977, p. 224). The fit of the RM is
evidence that the latent variable is quantitative, and items and the latent variable
can be measured on an interval scale with a common unit of measurement
(Wright, 1988).

Method
Participants and Instrument
We administered the d2 test of attention according to its standard procedures
(Brickenkamp & Zillmer, 1998) to 138 nonclinical Iranian university students
(68% female). The age range was 19 to 52 years (M = 24.26, SD = 5.64).
The data were collected as part of a project on the cognitive correlates
of listening comprehension in English as a foreign language (Nadri, 2018). As
mentioned earlier, the test consists of 14 lines of characters where respondents
should cross out d’s with two dashes in 20 seconds. This time limit is allotted for each
line separately. This structure makes the test optimal for RPCM analysis.
Participation in the study was voluntary and no financial compensation was
made. On conclusion, participants were thanked for their cooperation and time
and were provided with the profiles of their cognitive and English language skills.
The institutional review board approved the study and waived the need for par-
ticipants’ written consent (IRB decision # 145/ ).

Results
There are a number of scores that are computed on the basis of d2 test perform-
ance, including the total number of characters canceled (TN, total number—an
index of processing speed), errors of commission (C, nontarget characters can-
celed), errors of omission (O, target characters respondents failed to cancel),
total errors (TE, sum of the errors of commission and errors of omission),
the number of characters correctly canceled minus the number of errors of
commission (CP), and the number of characters correctly canceled minus the
sum of the errors (TN-TE) (Brickenkamp & Zillmer, 1998). These scores are
separately calculated for each of the 14 lines of the test. In this analysis, each line
of the test is considered an item and serves as the unit of analysis.
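For illustration, here is a minimal R sketch of these line-level scores, assuming a hypothetical data frame with per-line counts of hits, commission errors, and omission errors (the column names and counts are ours, not from the d2 manual):

```r
## Hypothetical per-line counts for the 14 lines of the d2 test.
d2 <- data.frame(line        = 1:14,
                 hits        = rpois(14, 40),  # targets correctly canceled
                 commissions = rpois(14, 1),   # nontargets canceled (C)
                 omissions   = rpois(14, 4))   # targets missed (O)

d2$TN    <- d2$hits + d2$commissions       # total characters canceled
d2$TE    <- d2$commissions + d2$omissions  # total errors
d2$CP    <- d2$hits - d2$commissions       # concentration performance
d2$TN_TE <- d2$TN - d2$TE                  # TN minus total errors
```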
Table 1. LRTs With Median of Raw Scores as a Partitioning Criterion for Overall Model Check.

Scoring technique   Chi-square     p    df
O                        35.75   .00    13
C                        53.85   .00    13
TE                       37.34   .00    13
TN                       12.82   .46    13
CP                       14.98   .31    13
TN-TE                    36.57   .00    13

Note. TN = total number; C = nontarget characters canceled; O = errors of omission; CP = concentration performance; TE = total number of errors.

We ran six separate RPCM analyses on the six scoring techniques mentioned
earlier. We used the lme4 package (D. Bates et al., 2017) in R (R Development
Core Team, 2016) to estimate the models.1 We evaluated global model fit by
examining differences in item parameters across two subsamples of the data
using likelihood ratio tests (LRTs), similar to Andersen’s (1973) test for the
binary RM. The approach is based on the invariance requirement of the RM
which states that the item parameters should remain constant in different sub-
samples of the data (Baghaei, Yanagida, & Heene, 2017). In this approach, in
the first step, the sample is partitioned according to a criterion such as sex or raw
score median. Two new models are then estimated. In the first model, a group-
specific intercept is added. The item parameters are assumed to be equal in this
model for both groups. In a subsequent model, “group” by “item” interaction
terms which represent the group-specific deviations in item difficulty are added.
In other words, the items are allowed to have different difficulty parameters
across the groups. The interaction of group and the individual items, that is,
the difference in item difficulty for respondents in one of the groups relative to
those in the other group, is accounted for in this model. The two models are then
compared with LRT. If the model with the interaction term fits better, it means
that this model accounts for more variability in the data and the item parameters
are not constant across the groups; hence, the RPCM does not hold (Baghaei &
Doebler, 2018).
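One common way to carry out the procedure just described is to estimate the RPCM as a Poisson generalized linear mixed model in lme4 and compare the two nested models with anova(). The sketch below is our illustration under hypothetical object and column names; for the authors' actual code, see Baghaei and Doebler (2018).

```r
library(lme4)

## `d2long` is a hypothetical long-format data frame with one row per
## person-by-line count: columns `count`, `item` (factor, 14 levels),
## `person` (factor), and `group` (e.g., median split of raw scores, or sex).

# Model 1: common item parameters plus a group-specific intercept.
m1 <- glmer(count ~ 0 + item + group + (1 | person),
            family = poisson, data = d2long)

# Model 2: item-by-group interactions, i.e., group-specific item parameters.
m2 <- glmer(count ~ 0 + item * group + (1 | person),
            family = poisson, data = d2long)

# LRT: a significant result means the item parameters differ across groups,
# so the invariance requirement of the RPCM fails.
anova(m1, m2)  # 13 df for 14 items and 2 groups
```

The same comparison applies whichever grouping factor is used; only the `group` variable changes.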
To evaluate the overall fit of the d2 test, we partitioned the data on the basis
of test takers’ median raw scores for each scoring technique and performed
LRT for the low scorers versus high scorers. Table 1 shows that the LRT was
nonsignificant when the CP and TN scores were computed. Therefore, the d2 test
fits the RPCM only when these two scores are computed, while the other scoring
techniques result in misfit to the RM.
We also conducted graphical model checks to examine which scoring type is a
better fit to the model. In the graphical check, predicted values for each person
are plotted against their standardized Pearson residuals. Roughly symmetrical
Pearson residuals with few outliers confirm acceptable fit (Baghaei & Doebler,
2018). Furthermore, standardized (Pearson) residuals are normally distributed
with a mean of 0 and an SD of 1.0 for good model fit. The graphs in Figure 1
indicate that the CP scores (i.e., the number of characters correctly canceled
minus the number of characters incorrectly canceled) and the TN scores (i.e., the
total number of characters canceled) have the best fit. Because the C, O, TE, and
TN-TE scores did not have a good fit, further analyses were run only on the CP
and TN scores.

Figure 1. Graphical model check for different scoring techniques: (a) errors of omission, (b) errors of commission, (c) total errors, (d) total characters canceled, (e) concentration performance, and (f) characters correctly canceled minus the sum of the errors.
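A rough sketch of how such a residual plot can be drawn in base R, assuming `fit` is a glmer() RPCM fit like the hypothetical ones above (our illustration, not the authors' code):

```r
## Predicted counts against Pearson residuals, one point per person-by-line cell.
plot(fitted(fit), residuals(fit, type = "pearson"),
     xlab = "Predicted count", ylab = "Pearson residual")
abline(h = c(-2, 0, 2), lty = 2)  # rough reference band of +/- 2 SD
```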

Table 2. Item Easiness Parameters, Their Standard Errors, and Chi-Square Fit Values for the CP and TN Scores.

                 CP                     TN
Item   Estimate    SE    Fit   Estimate    SE     Fit
1          2.27   .068    .91      2.44   .053    3.90
2          2.21   .068   1.64      2.43   .053    2.41
3          2.29   .068   1.80      2.47   .053    1.48
4          2.25   .068   1.13      2.43   .053     .84
5          2.28   .068   1.05      2.46   .053     .89
6          2.22   .068   1.75      2.40   .053    1.08
7          2.21   .068   1.42      2.44   .053   20.58
8          2.27   .068   6.39      2.44   .053    2.82
9          2.21   .068   1.54      2.38   .053     .31
10         2.19   .068    .39      2.35   .053     .43
11         2.21   .068   3.43      2.37   .053    3.39
12         2.13   .068   4.33      2.37   .053    4.70
13         2.17   .068    .73      2.36   .053    5.50
14         2.21   .068   2.65      2.40   .053    1.62
SD (latent ability)      .73                      .55

Note. CP = concentration performance; TN = total number canceled.

Table 2 shows the item easiness parameters, their standard errors, and their
chi-square fit values for the two types of scores. A chi-square type item fit statistic
based on binning observed and predicted values showed that none of the items
misfit in either scoring technique (df = 5, the number of subsets specified in
the person scores). The SD of the latent ability parameters was .73 for the CP
scores and .55 for the TN scores.
In both scoring procedures, the item easiness parameters were very close to each
other, suggesting that they are not significantly different. Two further analyses
were therefore run assuming that all 14 items (lines) were equally difficult. For the CP
scores, the information criteria and the deviance statistic showed that this model
had a worse fit compared with the model where item difficulties were assumed
to differ, χ2(13) = 37, p < .01. The same result was observed for the TN scores,
χ2(13) = 33.87, p < .01. Therefore, the difficulties of the lines varied significantly,
regardless of the scoring method.
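In the same hypothetical lme4 setup sketched earlier, this comparison amounts to an LRT between a model with a single common easiness parameter and one with line-specific parameters:

```r
library(lme4)

# Equal-difficulty model: one intercept shared by all 14 lines.
m_equal <- glmer(count ~ 1 + (1 | person),
                 family = poisson, data = d2long)

# Line-specific difficulties, as in the analyses reported above.
m_items <- glmer(count ~ 0 + item + (1 | person),
                 family = poisson, data = d2long)

anova(m_equal, m_items)  # chi-square LRT with 13 df
```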

Gender DIF
CP scores. We examined gender DIF and global goodness of fit by investigating
differences in item parameters across sexes with an LRT. The interaction of sex
and the individual items (i.e., the difference in item difficulty on the log scale for
males relative to females) was significant only for Item 2 (contrast = .26, p < .01).
The LRT, however, showed that the model with the interaction terms did not
explain more variability in the data, χ2(13) = 18.84, p = .12. In other words, the
item parameters remained constant across the two partitions of the sample, and
the data fit the RPCM.

TN scores. We examined DIF and global fit across sex for the TN scores. The DIF
analysis across sex showed that Items 5, 6, and 8 manifested negligible DIF, with
contrasts equal to .22, .20, and .24, respectively (p < .01). Nevertheless, an LRT
showed that the model with the sex-by-item interaction terms did not explain more
variability in the data, χ2(13) = 15.86, p = .25. This means that the item param-
eters remained invariant when sex was the partitioning criterion. Therefore, the
magnitude of observed DIF for the three items across sex is harmless.

Reliability analysis. We estimated the ability-specific reliability of the measures
(Baghaei & Doebler, 2018). Figure 2 shows the reliability estimates for different
locations on the ability continuum for the CP and TN scores. The reliability
index was above .80 at the lowest level for the CP scores, rose to above .96, and
stayed above .90 for a large portion of the ability continuum. For the TN scores,
reliability was slightly below .80 at its lowest level but rose to above .95 at the
highest levels and was very high for a large section of the ability scale.

Figure 2. Ability-specific reliability graph: (a) concentration performance scores and (b) total characters canceled scores.

Discussion
The d2 test of attention is a short and easy-to-administer measure of sustained
attention. It is based on a well-grounded theoretical framework and is relatively
well researched. The test has sound psychometric properties, and its validity has
been demonstrated against other criterion measures by providing divergent and
discriminant evidence. The aim of this study was to contribute to the validity
literature of the test by examining its fit to a unidimensional RM. We examined
the psychometric functioning of the d2 test of attention by investigating the fit of
the individual items as well as the overall fit of the test to the unidimensional
RPCM. The RPCM was chosen as the psychometric model to fit to the d2 test,
as time limits are imposed on the individual lines and the counts of correct
replies or errors are modeled. Graphical inspection and LRT with sex and
median of scores as partitioning criteria were utilized to evaluate the fit of the
data to the model.
Our findings suggest that the test fits the RPCM best when the CP scores
(total number of characters processed minus errors of commission) and total
number of characters canceled are modeled. The model did not fit when it was
applied to the errors of omission, errors of commission, total errors, and total
correctly processed. This is consistent with previous research on the psychomet-
ric quality of the d2 test. In a recent study, Steinborn et al. (2018) examined the
reliability of the d2 test and concluded that “. . . (a) only the speed score
(and error-corrected speed score) is eligible for highly reliable assessment,
[and] that (b) error scores might be used as a secondary measure (e.g., to
check for aberrant behavior) . . .” (p. 339).
For both favored scoring types, none of the 14 items misfit the model accord-
ing to a chi-square test that compared the observed score for the items with the
score predicted by the Poisson function. A few items showed negligible
DIF across sex for both scoring types. However, the LRT with sex as a parti-
tioning criterion was nonsignificant, supporting unidimensionality. When the
sample was divided according to the medians of the CP and TN scores, all items
were invariant and the LRT was nonsignificant.
The item parameters had a limited range in both scoring procedures.
Nevertheless, the better fit of a model with different item parameters compared
with a model where item difficulties were assumed to be identical indicated that
item difficulties significantly vary. Ability-specific reliability measures showed
that test reliability ranges between .80 and .97 across different sections of the
ability scale for both the CP and TN scores.
The results of the RPCM analysis demonstrated that the d2 test is an intern-
ally valid and accurate measure of attention. Fit of data to the RM is evidence
that total raw score is a valid estimator of ability and justifies the use of sum
scores as an indication of respondents’ latent trait (Wright, 1989).
IRT models in general and RPCM in particular can be used as powerful
psychometric models to evaluate neuropsychological tests where respondents
have to perform simple tasks speedily within limited response time allotments.
Under such testing conditions where test takers are supposed to identify target
stimuli, every target is a potential item, and it is rather difficult to envisage clear-
cut individual items. In such situations, the RPCM can be employed, as in this
model, the total number of correct replies or errors within each time allotment
is the unit of analysis instead of individual stimuli. Another advantage of the
RPCM is that it is less complex than polytomous IRT models that might seem to
be viable alternatives in such situations and, therefore, requires smaller sample
sizes. Furthermore, commonly used polytomous IRT models such as the partial
credit model (Masters, 1982) and the rating scale model (Andrich, 1978) do not
incorporate the time limit into item difficulty estimation, and it is not possible to
untangle the impact of task complexity from the impact of time constraint on
item parameters.
A limitation of this study is that we examined overall fit using LRT only
across sex and score because we had a relatively small sample size, and the other
subgroups at our disposal (e.g., handedness or whether respondents wear
glasses) were extremely small. Future research should examine fit across
other partitions of the test takers. The RPCM analysis demonstrated that the d2
test is an efficient and precise instrument for measuring attention.

Summary and Conclusions


The findings of this study can be summarized as follows: (a) The d2 test fitted the
RPCM best when the CP scores and the total number of characters canceled
were modeled; (b) none of the 14 items (lines) misfit according to a chi-square
test in the CP and TN scoring techniques, supporting unidimensionality; (c) few

items showed negligible DIF across sex for the two scoring techniques; (d) LRTs
with sex and the median of raw scores as partitioning criteria were nonsignificant
for both scoring techniques, supporting model fit; (e) the test is highly reliable
across a wide range of the ability continuum; and (f) the d2 test is an internally
valid and accurate measure of attention.

Declaration of Conflicting Interests


The author(s) declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publica-
tion of this article.

ORCID iD
Purya Baghaei http://orcid.org/0000-0002-5765-0413

Note
1. For details on RPCM estimation and R codes, see Baghaei and Doebler (2018).

References
American Psychiatric Association (2013). Diagnostic and statistical manual of mental
disorders. Washington, DC: American Psychiatric Press.
Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38,
123–140.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika,
43, 561–573.
Baghaei, P., & Doebler, P. (2018). Introduction to the Rasch Poisson Counts Model: An
R tutorial. Psychological Reports. Advanced online publication. doi: 10.1177/
0033294118797577
Baghaei, P., Yanagida, T., & Heene, M. (2017). Development of a descriptive fit statistic
for the Rasch model. North American Journal of Psychology, 19, 155–168.
Baghaei, P., & Tabatabaee-Yazdi, M. (2016). The logic of latent variable analysis as
validity evidence in psychological measurement. The Open Psychology Journal, 9,
168–175.
Bates, D., Maechler, M., Bolker, B., Walker, S., Christensen, R., Singmann, H., et al.
(2017). lme4: Linear mixed-effects models using ‘Eigen’ and S4 (R package, version
1.1-14). Retrieved from https://cran.r-project.org/web/packages/lme4/index.html
Bates, M. E., & Lemay, E. P. Jr. (2004). The d2 test of attention: Construct validity and
extensions in scoring techniques. Journal of International Neuropsychological Society,
10, 392–400.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s
ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores
(pp. 395–479). Reading, MA: Addison-Wesley.
Borsboom, D. (2008). Latent variable theory. Measurement, 6, 25–53.
Brickenkamp, R., & Zillmer, E. (1998). The d2 test of attention. Seattle, WA: Hogrefe &
Huber Publishers.
Cooley, E. L., & Morris, R. D. (1990). Attention in children: A neuropsychologically
based model for assessment. Developmental Neuropsychology, 6, 239–247.
Doebler, A., Doebler, P., & Holling, H. (2014). A latent ability model for count data
and application to processing speed. Applied Psychological Measurement, 38, 587–598.
Doebler, A., & Holling, H. (2016). A processing speed test based on rule-based item
generation: An analysis with the Rasch Poisson Counts Model. Learning and
Individual Differences, 52, 121–128. doi: 10.1016/j.lindif.2015.01.013
Forthmann, B., Gerwig, A., Holling, H., Çelik, P., Storme, M., & Lubart, T. (2016). The
be-creative effect in divergent thinking: The interplay of instruction and object fre-
quency. Intelligence, 57, 25–32.
Fuermaier, A. B. M., Tucha, L., Koerts, J., Aschenbrenner, S., Westermann, C.,
Weisbrod, M., Lange, K. W., & Tucha, O. (2013). Complex prospective memory in
adults with attention deficit hyperactivity disorder. PLoS One, 8, e58338.
doi:10.1371/journal.pone.0058338
Jansen, M. G. H. (1997). Applications of Rasch’s Poisson counts model to
longitudinal count data. In R. Langeheine & J. Rost (Eds.), Applications of latent
trait and latent class models in the social sciences (pp. 380–388). Münster, Germany:
Waxmann.
Lee, P., Lu, W.-S., Liu, C.-H., Lin, H.-Y., & Hsieh, C.-L. (2017). Test–retest reliability
and minimal detectable change of the D2 test of attention in patients with schizophre-
nia. Archives of Clinical Neuropsychology. Advance online publication. doi: 10.1093/
arclin/acx123
Manly, T., Anderson, V., Nimmo-Smith, I., Turner, A., Watson, P., & Robertson, I. H.
(2001). The differential assessment of children’s attention: The Test of Everyday
Attention for Children (TEA-Ch), normative sample and ADHD performance.
Journal of Child Psychology and Psychiatry, 42, 1065–1081.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47,
149–174.
Mirsky, A. F., Anthony, B. J., Duncan, C. C., Ahearn, M. B., & Kellam, S. G. (1991).
Analysis of the elements of attention: A neuropsychological approach.
Neuropsychology Review, 2, 109–145.
Nadri, M. (2018). The Contribution of attention, processing speed, and fluid intelligence to
predicting Iranian EFL learners’ listening comprehension: Development and validation of
two measures of auditory attention. Unpublished master’s thesis. Mashhad: Islamic
Azad University.
Pieters, J. P. M. (1985). Reaction time analysis of simple mental tasks: A general
approach. Acta Psychologica, 59, 227–269.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests
(expanded edition). Copenhagen: Danish Institute for Educational Research.
(Original work published 1960).

R Development Core Team. (2016). R: A language and environment for statistical comput-
ing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from
http://www.R-project.org/
Reitan, R., & Wolfson, D. (1985). The Halstead-Reitan neuropsychological battery:
Theory and clinical implications. Tucson, Arizona: Neuropsychology Press.
Smith, A. (1973). Symbol digit modalities test. Los Angeles, CA: Western Psychological
Services.
Sohlberg, M. M., & Mateer, C. A. (2001). Improving attention and managing attentional
problems: Adapting rehabilitation techniques to adults with ADD. Annals of the New
York Academy of Sciences, 931, 359–375.
Spray, J. A. (1990). One-parameter item response theory models for psychomotor tests
involving repeated independent attempts. Research Quarterly for Exercise and Sport,
61, 162–168.
Steinborn, M. B., Langner, R., Flehmig, H. C., & Huestegge, L. (2018). Methodology
of performance scoring in the d2 sustained-attention test: Cumulative-reliability func-
tions and practical guidelines. Psychological Assessment, 30, 339–357.
Steinborn, M. B., Langner, R., & Huestegge, L. (2017). Mobilizing cognition for speeded
action: Try-harder instructions promote motivated readiness in the constant-foreper-
iod paradigm. Psychological Research, 81, 1135–1151.
Stroop, J. (1935). Studies of interference in serial verbal reactions. Journal of
Experimental Psychology, 18, 643–662.
Van Breukelen, G. J. P., Roskam, E. E. C. I., Eling, P. A. T. M., Jansen, R. W. T. L.,
Souren, D. A. P. B., & Ickenroth, J. G. M. (1995). A model and diagnostic measures
for response-time series on tests of concentration: Historical background, conceptual
framework, and some applications. Brain and Cognition, 27, 147–179.
Verhelst, N. D., & Kamphuis, F. H. (2009). A Poisson-gamma model for speed tests.
Measurement and Research Department Reports 2009-2. Arnhem, The Netherlands:
Cito.
Wechsler, D. (1981). WAIS-R: Wechsler adult intelligence scale-revised. New York, NY:
Psychological Corporation.
Wright, B. D. (1977). Misunderstanding the Rasch model. Journal of Educational
Measurement, 14, 219–225.
Wright, B. D. (1988). Rasch model derived from Campbell concatenation: Additivity,
interval scaling. Rasch Measurement Transactions, 2, 16. Retrieved from https://www.
rasch.org/rmt/rmt21b.htm
Wright, B. D. (1989). Dichotomous Rasch model derived from counting right answers:
Raw scores as sufficient statistics. Rasch Measurement Transactions, 3, 62. Retrieved
from https://www.rasch.org/rmt/rmt32e.htm
Zillmer, E. A., & Kennedy, C. H. (1999). Preliminary United States norms for the d2 test
of attention. Archives of Clinical Neuropsychology, 14, 727–728. doi: 10.1093/arclin/
14.8.727.

Author Biographies
Purya Baghaei is an associate professor in the English Department of Islamic Azad University,
Mashhad, Iran. His major research interest is foreign language proficiency testing with a focus on

the applications of item response theory models in test validation and scaling. He has also conducted
research on the role of cognition in second language acquisition.

Hamdollah Ravand is an assistant professor at Vali-e-Asr University of Rafsanjan. His major areas of
interest are Language Testing Assessment, Cognitive Diagnostic Modeling, Structural Equation
Modeling, Multilevel Modeling, and Item Response Theory.

Mahsa Nadri holds an MA in TEFL (teaching English as a foreign language) from Islamic Azad
University, Mashhad, Iran. Her research interests are the cognitive and neuropsychological correlates
of foreign language ability.
