
Educational Research and Evaluation

An International Journal on Theory and Practice

ISSN: 1380-3611 (Print) 1744-4187 (Online) Journal homepage: https://www.tandfonline.com/loi/nere20

A new approach for testing the Rasch model

Klaus D. Kubinger, Dieter Rasch & Takuya Yanagida

To cite this article: Klaus D. Kubinger, Dieter Rasch & Takuya Yanagida (2011) A new
approach for testing the Rasch model, Educational Research and Evaluation, 17:5, 321-333, DOI:
10.1080/13803611.2011.630529

To link to this article: https://doi.org/10.1080/13803611.2011.630529

Published online: 30 Nov 2011.

Educational Research and Evaluation
Vol. 17, No. 5, October 2011, 321–333

A new approach for testing the Rasch model

Klaus D. Kubinger^a*, Dieter Rasch^b and Takuya Yanagida^a
^a Division of Psychological Assessment and Applied Psychometrics, Faculty of Psychology,
University of Vienna, Vienna, Austria; ^b Institute of Applied Statistics and Computing, University
of Natural Resources and Applied Life Sciences, Vienna, Austria

Though calibration of an achievement test within a psychological and educational
context is very often carried out by the Rasch model, data sampling is hardly ever
designed according to statistical foundations. However, Kubinger, Rasch, and
Yanagida (2009) recently suggested an approach for the determination of sample
size according to a given Type I and Type II risk and a certain effect of model
misfit when testing the Rasch model; this approach is supported here by some new
results. The approach uses a three-way analysis of variance design (A ≻ B) × C with mixed
classification. There is a (fixed) group factor A, a (random) factor B of testees
within A, and a (fixed) factor C of items cross-classified with (A ≻ B). The
simulation study in this article deals with further item parameter ranges and
ability parameter distributions, and with larger sample sizes and item numbers
than the original paper. The results are: The approach works given several
restrictions, and its main aim, the determination of the sample size, is attained.
Keywords: Rasch model; sample size; Type I and Type II risk; analysis of
variance; mixed model

Introduction
Recently, Kubinger, Rasch, and Yanagida (2009) suggested a very simple approach
for testing the Rasch model (Rasch, 1980), nowadays also known as the 1-PL model.
Most importantly, the authors strictly distinguish between (Rasch-) “model tests”,
on the one hand, which test model implications or are, so to speak, performed
according to specific objective measurement, and “goodness-of-fit tests”, on the
other hand, which only measure the model’s appropriateness. Their approach deals
with model tests. And although there are several statistical approaches for testing the
Rasch model, among which Andersen’s Likelihood-Ratio test is best established
(LRT; Andersen, 1973; see Glas and Verhelst, 1995, for a current review of Rasch
model tests), the authors emphasize that all these tests offer no clear procedure for
planning a study for item calibration. That is, if a researcher tries to calibrate an
achievement test according to the Rasch model within a psychological or educational
context, data sampling is not designed based on statistical foundations concerning the
determination of the sample size for fulfilling certain “precision” requirements. So

*Corresponding author. Email: [email protected]


© 2011 Taylor & Francis

they point out that it is necessary to calculate the sample size given a certain Type I
risk α and a certain Type II risk β – that is, the probability of rejecting the Rasch
model though it is correct, on the one side, and the probability of accepting the
Rasch model though it is wrong, on the other side – and of course given a certain
effect δ. This effect refers to the degree of item parameters’ misfit to the Rasch
model, which is supposed to be of practical importance. That is, the sample size must
be calculated so that such an effect or even larger ones lead to a Type II error with at
most the probability of the fixed Type II risk. To achieve this, the authors aimed to
use an F-distributed statistic where the sample size directly affects the degrees of
freedom – bear in mind that Andersen’s LRT is chi-squared distributed and the
statistic’s degrees of freedom do not at all depend on the sample size, but only on the
number of estimated parameters. In contrast, the proposed F-distributed statistic
enables the researcher to calculate the sample size according to this distribution,
given a certain Type I and Type II risk and some specified alternative hypothesis
via δ.
Though Kubinger et al. (2009) focussed just on a proper approach to sample size
calculation, or rather to planning a study for calibrating an achievement test
according to the Rasch model, they incidentally suggested a new approach to testing
the Rasch model. They disclosed that this new approach is – despite some severe
shortcomings – seriously competitive with Andersen’s LRT. But as they restricted their
simulation study to a very small sample of scenarios, we now try in this article to give
some support to their results and to sound out the approach’s practicability in greater
detail. We therefore conduct a simulation study with a broader range of scenarios.

Method
Like Andersen’s LRT, the approach of Kubinger et al. (2009) shares the most
frequently referenced assumption of the Rasch model – that is that the item difficulty
parameters are statistically independent of the person ability parameters. The
approach is now a three-way analysis of variance.
Analysis of variance¹ is a special case of the so-called general linear model. That
is, some character y has to be modelled as a random variable y due to a linear
function of certain model parameters. Thus, models of analysis of variance differ
with respect to such parameters. The main aim of analysis of variance is to test the
null hypothesis that some of these parameters are zero. The pertinent test is an F-test
which presupposes a random variable y which is normally distributed. Given
different conditions or treatments by which this variable is sampled, the F-test
presupposes equal variances as well. These conditions or treatments establish the
different levels of so-called factors and might shift the mean of the random variable y. In
educational research, such factors could be sex and age of the pupils, social status of
their parents, localisation of the school, and many other variables. Depending on the
number p of such factors, we have a p-way layout of analysis of variance. At
least two different data situations and models must be distinguished. Case 1: All or
certain levels a of any factor are of interest and included in the analysis. That is, the
levels are fixed and not (randomly) sampled; hence, the factor is a fixed factor. From
the factors mentioned above, sex and age of pupils as well as social status of the
parents are fixed factors. We call this situation Model I. Case 2: There are many
levels of the interesting factor, the number of which has theoretically to be
considered as infinite. Those a levels included in the study have been randomly
selected by drawing a random sample from the population of all levels. Thus, the
factor has to be modelled as a random variable, too. Hence, the factor is a random
factor. From the factors mentioned above, the school would be a random factor if we
consider not all schools from a country but only take a random sample of schools.
We call this situation Model II. Besides these two models for a single factor or even
for p factors, there are several mixed models when p – x factors are fixed and x
factors are random (p = 2, 3, ...; x = 1, 2, ..., p – 1). A further characterization of
analysis of variance refers to a cross-classification of the p > 1 factors versus a
hierarchical classification or, so to say, “nested factors”. Let us consider the case of
p = 3 as in the following, dealing with the factors A, B, and C having a, b, and c
levels, respectively. If every level of any factor is combined with every level of the
other factors, then the design is a cross-classification. This occurs, for instance, if
we had girls and boys (levels of factor A = sex) combined completely
with the ages 10, 11, and 12 years (levels of factor B = age) and all five social groups
of parents (levels of factor C = social status). That is, we would have a three-way
cross-classification (Model I) with a = 2, b = 3, and c = 5 levels. However, if p = 2
and the levels of, for example, the factor B were realized only in a certain level of
another factor, say A, then B is nested within A; we would have a nested factor. In
case of p > 2, we of course could gain some mixed classification. In the following,
we need a mixed three-way classification: (A ≻ B) × C. That is, A and C are fixed
factors, and B (in bold) is a random factor, and B is nested within A; the
combination B nested in A, (A ≻ B), is cross-classified with C.
Now, the different items of an item pool to be calibrated according to the
Rasch model are considered as the c different levels of a first (fixed) factor C, and
the testees as the b different levels of a second (random) factor B – factor C is a
fixed one, because just these given items are of interest, and factor B is a random one,
as there is an almost randomly chosen sample of testees who are part of a certain
intended population. Finally, the second (fixed) factor A is due to grouping of
the testees into a different levels. These groups need to be defined in advance and
therefore establish a fixed factor; they might be, for instance, male versus female
testees. Obviously, then, the factor B is nested within A, that is, A is a partition of the
total set of testees. This leads to a mixed classification (A ≻ B) × C, where C
is crossed with (A ≻ B). Now, Rasch model fitting means that there is no
interaction effect between the fixed factors – irrespective of the presumably strong
main effect C due to different items being solved more or less frequently within the
sample. For simplification, a · b testees will be selected in such a way that each of the
a groups has equal size b (see the design in Figure 1). Then the equation for this
model is:

$$y_{ijk} = \mu + a_i + b_{ij} + c_k + (ac)_{ik} + e_{ijk} \tag{1}$$ ²

For instance, according to Rasch, Herrendörfer, Bock, Victor, and Guiard
(2008), the statistic for testing the null hypothesis H0: (ac)_ik = 0 for every i and k is

$$F = \frac{MS_{AC}}{MS_{BC\ \mathrm{within}\ A}},$$

which is F-distributed with (a – 1)(c – 1) and a(b – 1)(c – 1) degrees of
freedom (MS denoting the mean squares).
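The interaction statistic can be computed directly from a complete array of 0/1 responses. The following is a minimal sketch, not the authors' implementation (they worked in R): it assumes the usual textbook sums of squares for a balanced layout with B nested in A, both crossed with C, and one observation per cell. The function name `interaction_f_statistic` and the nested-list layout `y[i][j][k]` are illustrative choices.

```python
def interaction_f_statistic(y):
    """Return (F, df1, df2) for the A x C interaction.

    y[i][j][k] is the 0/1 response of testee j in group i to item k;
    every group must have the same number of testees (balanced design).
    """
    a, b, c = len(y), len(y[0]), len(y[0][0])

    # Marginal means needed for the two sums of squares.
    mean_ij = [[sum(y[i][j]) / c for j in range(b)] for i in range(a)]      # testees
    mean_ik = [[sum(y[i][j][k] for j in range(b)) / b for k in range(c)]
               for i in range(a)]                                            # group x item
    mean_i = [sum(mean_ij[i]) / b for i in range(a)]                         # groups
    mean_k = [sum(mean_ik[i][k] for i in range(a)) / a for k in range(c)]    # items
    grand = sum(mean_i) / a

    # Interaction A x C (assumed standard formula for the balanced case).
    ss_ac = b * sum((mean_ik[i][k] - mean_i[i] - mean_k[k] + grand) ** 2
                    for i in range(a) for k in range(c))
    # Testee-by-item variation within groups, used as the error term.
    ss_bc_within_a = sum(
        (y[i][j][k] - mean_ij[i][j] - mean_ik[i][k] + mean_i[i]) ** 2
        for i in range(a) for j in range(b) for k in range(c))

    df_ac = (a - 1) * (c - 1)
    df_bc_within_a = a * (b - 1) * (c - 1)
    f_value = (ss_ac / df_ac) / (ss_bc_within_a / df_bc_within_a)
    return f_value, df_ac, df_bc_within_a


# Toy data: a = 2 groups, b = 2 testees per group, c = 2 items.
toy = [[[1, 0], [1, 1]],
       [[0, 1], [0, 1]]]
print(interaction_f_statistic(toy))  # prints (9.0, 1, 2)
```

With a = 2, b = 100, and c = 40, the degrees of freedom are 39 and 7,722, which shows how the number of testees enters the reference distribution. A p-value would then be taken from any F-distribution routine; none is included here to keep the sketch free of external dependencies.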
The presented design of analysis of variance suffers from two problems: Firstly,
the design establishes just a single observation within each cell (n = 1) of a mixed
model; secondly, this design is applied to dichotomous, not interval-scaled – and not
remotely normally distributed – data. A simulation study needs to assess the actual

Figure 1. Rasch model data design interpreted as a three-way analysis of variance design
with mixed classification (A ≻ B) × C.
Note: The items are levels of a fixed factor C, and the testees are levels of a random factor B,
nested within a fixed factor A of different groups. y_ijk is either 1 or 0, depending on whether
Testee j from Group i has solved Item k or not.

Type I and Type II risk as these violations of the analysis of variance’s test
assumptions could destroy the test statistic’s distribution.
Kubinger et al. (2009) ran several scenarios: c = 6 and 20; b = 25, 50, and 100;
a = 2. The c levels with parameters c_k (matching the item difficulty parameters σ_k in
Rasch model terminology – see Formula (2)) were equally spaced within the interval
[–2.5, 2.5] for c = 6 and [–3, 3] for c = 20; the levels of the random factor b_ij
(matching the person ability parameters ξ_j in Rasch model terminology – see
Formula (2)) were randomly drawn from an N(0, 1.5) distribution. One hundred thousand data
matrices were generated for each combination of j(i) and k. A significance level
of α = .05 was applied. The main question of interest was whether the F-test for the
interaction effect A × C holds this nominal Type I risk. However, similar scenarios
were used for power analyses. Violations of the Rasch model were restricted to the
case of differential item functioning (DIF) as concerns specific item pairs. As
concerns c = 6, the first case refers to parameters c_k (matching σ_i) as

[–2.5, –1.5, –0.5, 0.5, 1.5, 2.5] for group i = 1 in A and as

[–2.5, –1.5, 0.5, –0.5, 1.5, 2.5] for i = 2 –

this corresponds in terms of the Model equation (1) with c_k as

[–2.5, –1.5, 0.0, 0.0, 1.5, 2.5], (ac)_1k as

[0.0, 0.0, –0.5, 0.5, 0.0, 0.0], and (ac)_2k as
[0.0, 0.0, 0.5, –0.5, 0.0, 0.0].
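The decomposition just given can be checked numerically. The sketch below assumes that the item main effect is the average of the group-specific item parameters and that the interaction term is each group's deviation from that average; the helper name `decompose` is hypothetical and only serves the illustration.

```python
def decompose(item_params_by_group):
    """Split group-specific item parameters into item main effects and
    group-by-item interaction effects (deviations from the item average)."""
    a = len(item_params_by_group)
    c = len(item_params_by_group[0])
    c_k = [sum(group[k] for group in item_params_by_group) / a for k in range(c)]
    ac_ik = [[group[k] - c_k[k] for k in range(c)] for group in item_params_by_group]
    return c_k, ac_ik


# First DIF case of the c = 6 scenario described in the text.
group_1 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
group_2 = [-2.5, -1.5, 0.5, -0.5, 1.5, 2.5]

c_k, ac_ik = decompose([group_1, group_2])
print(c_k)       # [-2.5, -1.5, 0.0, 0.0, 1.5, 2.5]
print(ac_ik[0])  # [0.0, 0.0, -0.5, 0.5, 0.0, 0.0]
print(ac_ik[1])  # [0.0, 0.0, 0.5, -0.5, 0.0, 0.0]
```

Only the third and fourth positions of the interaction vectors are nonzero, which is the numerical counterpart of a DIF confined to two items.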

That is, there actually is, apart from main effects of B(A) and C, only an interaction effect
A × C due to items 3 and 4. This means there is a DIF of both these items with respect to
the two groups of testees. The second case refers to parameters c_k (matching σ_i) as

[–2.5, –1.5, –0.5, 0.5, 1.5, 2.5] for group i = 1 in A and as

[–2.5, –0.5, –0.5, 0.5, 0.5, 2.5] for i = 2.

The difference between these two cases is that in the latter case not only a two-item DIF
but also a difference in the variation of the parameters σ_i applies – that variation
corresponds in terms of the Model equation (1) with the numerator of the non-centrality
parameter of the (ac)_ik. As concerns c = 20, for the analysis of the actual Type I risk (that
is when the null hypothesis is true), the parameters c_k (= σ_i) were

[–3, –2.5, –2, –1.75, –1.5, –1.25, –1, –0.75, –0.5, –0.25, 0.25, 0.5, 0.75, 1, 1.25, 1.5,
1.75, 2, 2.5, 3];

and referring to the analysis of the power (that is when a specific alternative
hypothesis is true), the parameters c_k (= σ_i) were

[–3, –2.5, –2, –1.75, –1.5, –1.25, –1, –0.75, –0.5, –0.25, 0.25, 0.5, 0.75, 1, 1.25, 1.5,
1.75, 2, 2.5, 3] for group i = 1 in A and
[–3, –2.5, –2, –1.75, –1.5, –1.25, –1, –0.75, 0.5, –0.25, 0.25, –0.5, 0.75, 1, 1.25, 1.5,
1.75, 2, 2.5, 3] for i = 2.

The main results were:

(1) If a main effect of A exists, an artificially high Type I risk of the A × C
interaction F-test results – that is, the new approach works as long as no
significant main effect of A occurs.
(2) Given no main effect of A and the null hypothesis is correct, a significant
interaction effect A × C occurs with a probability very near to the nominal Type
I risk.
(3) Given no main effect of A and the null hypothesis is wrong, a significant
interaction effect A × C occurs with an acceptable probability for a defined two-item
DIF, depending on the number of testees (and the size of the item pool).
(4) Throughout the investigated scenarios, the A × C interaction F-test approach
proves to be more powerful than Andersen’s LRT.

We now perform an additional simulation study in order to test several other
scenarios. Above all, a broader range of item parameters than [–3, 3] is of interest, as
well as peaked rather than equally spaced item parameters within that interval. Other
variances of person ability parameters are also of interest. Finally, we wondered
whether there is much difference in the power of the approach under discussion when
there is only a single-item DIF or, on the other side, two times a two-item DIF.
Hence, the following scenarios (in combination) were under consideration:

(1) number of items: c = 6; 25; 30; 40; 60; 100; 500;
(2) number of testees (within each of a = 2 groups): b = 25; 50; 100; 150; 500;
(3) standard deviations and distributions of person ability parameters: normally
distributed with N(0, 1); N(0, 1.5); N(0, 2); N(0, 2.5), and uniformly distributed
in [–3, 3]; [–4, 4];
(4) interval of item parameters: [–3, 3]; [–4, 4];
(5) item parameters’ concentration: equally spaced versus a peaked dispersion;
(6) number of DIF: a single-item DIF; a two-item DIF; two times a two-item
DIF; three times a two-item DIF;
(7) magnitude of DIF: 0.6.

Based on the already given results, we restricted the simulation study to cases
where there is no main effect A.

In each step of the simulation, the random number generator of R (R
Development Core Team, 2008) was used as implemented in the program package
extended Rasch modeling (eRm; Mair, Hatzinger, & Maier, 2010; cf. also Poinstingl,
Mair, & Hatzinger, 2007). A data set was generated by calculating the probability P
that testee v with person ability parameter ξ_v solves (+) item i with item difficulty
parameter σ_i according to the pertinent Rasch model formula:

$$P(+ \mid \xi_v, \sigma_i) = \frac{e^{\xi_v - \sigma_i}}{1 + e^{\xi_v - \sigma_i}} \tag{2}$$

Then a Bernoulli trial was carried out with the probability P, which led to a
matrix of data based on the Rasch model. In contrast to Kubinger et al. (2009), who
performed 100,000 simulation replications for each scenario, we only used for the
moment 10,000 replications if b < 150 and only 1,000 replications if b ≥ 150 and
c > 6. In the case of c = 500, we performed 1,000 replications for all conditions.
That is, 10,000 or 1,000 data matrices were generated for each combination of j(i)
and k. A significance level of α = .05 was applied. Of course, simulation studies in
statistics are always based on 100,000 replications, because otherwise chance effects
might be established leading to unacceptable imprecision. Nevertheless, we chose
smaller replication numbers in order to be able to take more relevant scenarios into
account. In the case of incongruent results, we had a larger number of replications up
our sleeve.
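The data-generating step described above can be sketched as follows. This is an illustration in Python rather than the R/eRm code the authors used; the function names, the seed, and the choice of the standard-library generator are assumptions made for the example. Following the text, the second argument of N(0, 1.5) is treated as a standard deviation.

```python
import math
import random


def equally_spaced(c, low, high):
    """c equally spaced item difficulty parameters on [low, high]."""
    step = (high - low) / (c - 1)
    return [low + k * step for k in range(c)]


def rasch_probability(xi, sigma):
    """Formula (2): probability that a testee with ability xi solves an item
    with difficulty sigma."""
    return math.exp(xi - sigma) / (1.0 + math.exp(xi - sigma))


def simulate_data(item_params_by_group, b, ability_sd, rng):
    """One data matrix y[i][j][k]: b testees per group, abilities ~ N(0, ability_sd),
    each response drawn as a Bernoulli trial with the Rasch probability."""
    data = []
    for item_params in item_params_by_group:
        group = []
        for _ in range(b):
            xi = rng.gauss(0.0, ability_sd)
            group.append([1 if rng.random() < rasch_probability(xi, sigma) else 0
                          for sigma in item_params])
        data.append(group)
    return data


rng = random.Random(2011)                 # arbitrary seed, for reproducibility only
items = equally_spaced(6, -2.5, 2.5)      # the c = 6 scenario
y = simulate_data([items, items], b=25, ability_sd=1.5, rng=rng)  # null hypothesis true
print(len(y), len(y[0]), len(y[0][0]))    # prints 2 25 6
```

Passing the same item parameters for both groups generates data under the null hypothesis; passing two different vectors generates data with DIF. Repeating the call many times and testing each matrix yields the estimated Type I risk or power.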

Results
We used the program package R for the calculation of all F-tests (main effects A, B,
C, and the interaction effect A × C).
The first question of interest was whether the F-test for the interaction effect
A × C holds the nominal Type I risk of 5%. Table 1 gives the results of Kubinger
et al. (2009). Table 2 gives our results for the case of equally spaced item parameters
within the interval [–3, 3] and the ability parameters randomly distributed according
to N(0, 1.5). Table 3 does the same; however, now the interval is [–4, 4]. As a matter
of fact, the corresponding cases in Table 1 and Table 2 (see the shadowed cells) led to

Table 1. The actual Type I risk of the F-test for the interaction effect A × C in a three-way
analysis of variance design (A ≻ B) × C with mixed classification (the nominal Type I risk is
5%).

p (F-test) A × C
                 b
  c      25       50       100
  6    .05371   .05276   .05208
 20    .05514   .05463   .05318

Note: A is a fixed factor with a = 2 levels (groups from the same population), B is a random factor nested
within A with the levels b = 25, 50, and 100 (testees) for each of the a = 2 levels, and C is a fixed factor
with c = 6, 20 levels (items). Estimations are based on 100,000 simulation replications of Rasch model-
based data; the item parameters c_k (= σ_i) are equally spaced within the interval [–2.5, 2.5] in the case of
c = 6 and within the interval [–3, 3] in the case of c = 20; the ability parameters are randomly distributed
according to N(0, 1.5).

Table 2. The actual Type I risk of the F-test for the interaction effect A × C in a three-way
analysis of variance design (A ≻ B) × C with mixed classification (the nominal Type I risk is
5%).

p (F-test) A × C
                        b
  c      25      50      100     150     500
  6    .0564   .0533   .0532   .0546   .0514
 25    .0573   .0562   .0575   .046    .057
 30    .0581   .0583   .0532   .055    .056
 40    .0576   .0597   .0560   .057    .060
 60    .0664   .0612   .0572   .056    .066
100    .0635   .0665   .0617   .065    .072
500    .098    .095    .087    .085    .102

Table 3. The actual Type I risk of the F-test for the interaction effect A × C in a three-way
analysis of variance design (A ≻ B) × C with mixed classification (the nominal Type I risk is
5%).

p (F-test) A × C
                        b
  c      25      50      100     150     500
  6    .0623   .0631   .0680   .0573   .0635
 25    .0705   .0680   .0682   .066    .064
 30    .0682   .0675   .0661   .068    .068
 40    .0682   .0692   .0715   .077    .069
 60    .0756   .0701   .0769   .075    .074
100    .0852   .0808   .0844   .075    .100
500    .134    .119    .143    .140    .153

almost the same values. But Table 3 specifically discloses a trend that the actual
Type I risk increases when b and c increase; and if the item parameters have a greater
range, then the actual Type I risk is almost always beyond the nominal Type I risk
plus 20%. For this reason, we analysed in greater detail the behavior of the actual
Type I risk depending on (a) the interval of the item parameters, (b) the standard
deviation of the ability parameters, and (c) the item parameters’ concentration.
Although we did this for every combination of b and c, we summarize the results in
Table 4 only for a representative combination of them, that is b = 100 and c = 40.
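When reading the tabulated risks, it may help to keep the Monte Carlo error of each entry in mind. The sketch below uses the ordinary binomial standard error of an estimated proportion; this is a standard approximation brought in for illustration, not a quantity reported in the article, and the function name is hypothetical.

```python
import math


def mc_standard_error(p, replications):
    """Binomial standard error of a rejection rate p estimated from
    the given number of independent simulation replications."""
    return math.sqrt(p * (1.0 - p) / replications)


nominal = 0.05
for replications in (100_000, 10_000, 1_000):
    se = mc_standard_error(nominal, replications)
    print(f"{replications:>7} replications: standard error about {se:.4f}")
```

Under these assumptions the standard error is roughly .0007, .0022, and .0069 for 100,000, 10,000, and 1,000 replications. An entry based on 1,000 replications can therefore differ from .05 by about .014 (two standard errors) through chance alone, which is more than the distance to the "nominal risk plus 20%" bound of .06.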
Hence, for the moment the approach suggested by Kubinger et al. (2009) for
testing the Rasch model has to be brought back down to earth somewhat. In

Table 4. The actual Type I risk of the F-test for the interaction effect A × C in a three-way
analysis of variance design (A ≻ B) × C with mixed classification depending on (a) the
interval of the item parameters, (b) the standard deviation of the ability parameters, and (c)
the item parameters’ concentration (the nominal Type I risk is 5%).

                                  testees’ ability
                                                                    uniformly distributed
items’ difficulty          N(0, 1)  N(0, 1.5)  N(0, 2)  N(0, 2.5)  in [–3, 3] / in [–4, 4]
equally spaced  [–3, 3]     .0591    .0560      .0604    .0671      .0581
                [–4, 4]     .0710    .0715      .0736    .0807      .0716
peaked          [–3, 3]     .0559    .0538      .0529    .0552      .0477
                [–4, 4]     .0600    .0628      .0577    .0673      .0655

Note: For simplicity, only the results for a representative combination of b (= 100) and c (= 40) are given.
Estimations are based on 10,000 simulation replications of Rasch model-based data.

addition to the already mentioned restriction that there must not be a significant
main effect in A (a partition of the total set of testees), we now also have to restrict
this approach to applications of an item parameter interval [–3, 3] (at most) with a
peaked dispersion (at best) and a (normal) distribution of the testees’ ability
parameters with a standard deviation not larger than 1.5 – and even then the number
of items should not be larger than 100 (better not larger than 40), and the number of
testees no larger than two times 150. Of course, there are two ways out or attempts to
rescue the approach. On the one hand, Kubinger, Rasch, and Yanagida (2009) did
not really aim for a new Rasch model test but only tried to use a three-way analysis
of variance design (A ≻ B) × C with mixed classification as a means for determining
the sample size according to a given Type I and Type II risk, and according to a
certain effect of model misfit when testing the Rasch model. And for this purpose,
the approach will work, given the mentioned restrictions. On the other hand, we
could of course apply some correction of the nominal Type I risk in order to gain an
actual Type I risk that fits. However, we cannot detect any mathematical function
for doing so in a well-founded manner.
So the second question of interest, the power of the F-test for the interaction effect
A × C, is to be restricted to cases where the nominal Type I risk has proven to hold
tolerably – that is, an item parameter interval [–3, 3] (equally spaced) and a normal
distribution of the testees’ ability parameter with a standard deviation of 1.5. And the
case of c = 500 is not considered any more. As indicated, we investigated a single-
item DIF, a two-item DIF, and a two times two-item DIF. The magnitude of the DIF
has been fixed to 0.6 as this is, in case of an item parameter interval [–3, 3], a 10th of
the item parameters’ range, which is, according to some rules of thumb, a relevant
effect size (cf. Kubinger, 2005). The single-item DIF is, for instance, for the case of
c = 25 and DIF’s location at the 15th item as follows: the parameters c_k (= σ_i) were

[–3.00, –2.75, –2.50, –2.25, –2.00, –1.75, –1.50, –1.25, –1.00, –0.75, –0.50, –0.25, 0.00,
0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00] for group i = 1 in A
and
[–3.02, –2.77, –2.52, –2.27, –2.02, –1.77, –1.52, –1.27, –1.02, –0.78, –0.52, –0.28,
–0.03, 0.22, 1.10, 0.72, 0.98, 1.23, 1.48, 1.73, 1.98, 2.23, 2.48, 2.73, 2.98] for i = 2.
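The second vector above is consistent with raising the 15th parameter by 0.6 and lowering each of the other 24 parameters by 0.6/24 = 0.025, so that the sum of the item parameters is the same in both groups. This reading is an inference from the printed numbers, not a construction the authors spell out, and the printed values differ from it by rounding in the second decimal. A minimal sketch under that assumption, with hypothetical helper names:

```python
def equally_spaced(c, low, high):
    """c equally spaced item parameters on [low, high]."""
    step = (high - low) / (c - 1)
    return [low + k * step for k in range(c)]


def single_item_dif(params, item, magnitude):
    """Shift one item (1-based position) up by `magnitude` and spread the
    opposite shift evenly over all other items, keeping the sum unchanged."""
    compensation = magnitude / (len(params) - 1)
    return [p + magnitude if k == item - 1 else p - compensation
            for k, p in enumerate(params)]


def swap_items(params, item_1, item_2):
    """Exchange the parameters of two items (1-based positions); this keeps
    both the sum and the variation of the parameters unchanged."""
    swapped = list(params)
    swapped[item_1 - 1], swapped[item_2 - 1] = swapped[item_2 - 1], swapped[item_1 - 1]
    return swapped


group_1 = equally_spaced(25, -3.0, 3.0)
group_2 = single_item_dif(group_1, item=15, magnitude=0.6)
print([round(p, 3) for p in group_2])
print(abs(sum(group_2) - sum(group_1)) < 1e-9)  # prints True
```

The `swap_items` helper covers the other pattern used in the article, where two item parameters that are 0.6 apart exchange places between the groups.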

The two-item DIF is, for instance, for the case of c = 25 and DIF’s location at
the 12th and 14th items as follows: the parameters c_k (= σ_i) were

[–3.00, –2.75, –2.50, –2.25, –2.00, –1.75, –1.40, –1.25, –1.00, –0.75, –0.50, –0.30, 0.00, 0.30,
0.50, 0.75, 1.00, 1.25, 1.40, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00] for group i = 1 in A and
[–3.00, –2.75, –2.50, –2.25, –2.00, –1.75, –1.40, –1.25, –1.00, –0.75, –0.50, 0.30, 0.00,
–0.30, 0.50, 0.75, 1.00, 1.25, 1.40, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00] for i = 2.

The two times a two-item DIF is, for instance, for the case of c = 25 and DIF’s
location at the 5th, 7th, 19th, and 21st items as follows: the parameters c_k (= σ_i) were

[–3.00, –2.75, –2.50, –2.25, –2.00, –1.75, –1.40, –1.25, –1.00, –0.75, –0.50, –0.30, 0.00,
0.30, 0.50, 0.75, 1.00, 1.25, 1.40, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00] for group i = 1 in A
and
[–3.00, –2.75, –2.50, –2.25, –1.40, –1.75, –2.00, –1.25, –1.00, –0.75, –0.50, –0.30, 0.00,
0.30, 0.50, 0.75, 1.00, 1.25, 2.00, 1.75, 1.40, 2.25, 2.50, 2.75, 3.00] for i = 2.

And the three times a two-item DIF is, for instance, for the case of c = 25 and
DIF’s location at the 5th, 7th, 12th, 14th, 19th, and 21st items as follows: the
parameters c_k (= σ_i) were

[–3.00, –2.75, –2.50, –2.25, –2.00, –1.75, –1.40, –1.25, –1.00, –0.75, –0.50, –0.30, 0.00,
0.30, 0.50, 0.75, 1.00, 1.25, 1.40, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00] for group i = 1 in A
and
[–3.00, –2.75, –2.50, –2.25, –1.40, –1.75, –2.00, –1.25, –1.00, –0.75, –0.50, 0.30, 0.00,
–0.30, 0.50, 0.75, 1.00, 1.25, 2.00, 1.75, 1.40, 2.25, 2.50, 2.75, 3.00] for i = 2.

That is, in accordance with the results of Kubinger et al. (2009), we avoided the case
that the relevant DIF changes the variation of the item parameters – and, remember, no
main effect between the two interesting groups (factor A) has been assumed.
Of course, the effect of DIF depends on the absolute value of the item parameter.
Hence we analysed different localisations of the DIF, too. We always took into
account four different locations on the item parameter continuum in the case of a
single-item DIF and always three different couples or quadruples of locations in the
cases of a two-item and two times two-item DIF; only in the case of three times a
two-item DIF did we just study a single form of localisations. Table 5 summarizes
the results for the single-item DIF and the two-item DIF, Table 6 those for two and
three times a two-item DIF.
The main result of Tables 5 and 6 is as follows: Obviously, and not surprisingly
(cf. the item standard error of estimation’s dependency on the parameter itself), the
power of the F-test is considerably greater if the same DIF happens to occur for
items with moderate item difficulty. Then, there is of course a big difference in the
power between a sample size of 2 × 150 and 2 × 500; but, as Table 5 (a single-item DIF
and a two-item DIF) shows, since a DIF of 0.6 leads in general to very low power,
differences between sample sizes of 2 × 25 (50, 100) and 2 × 150 are almost negligible.
And, also expectedly, the power decreases if the number of DIF-involved items
becomes relatively small. For instance, a two-item DIF in a medium position results
in power of about .435 when there are 25 items (with 2 × 150 testees), however only
about .365 when there are 40 items. All in all, Table 5 discloses that the F-test’s

Table 5. The power of the F-test for the interaction effect A × C in a three-way analysis of
variance design (A ≻ B) × C with mixed classification (the nominal Type I risk is 5%).

p (F-test) A × C
number   localisation                          b
of DIF   of DIF         c      25      50      100     150     500
1        1              6    .0693   .0821   .1162   .1612   .5117
1        3              6    .1159   .1826   .3156   .4668   .9501
1        4              6    .1063   .1598   .2838   .4196   .9270
1        6              6    .0628   .0750   .0923   .1165   .3260
1        5             25    .0667   .0745   .1019   .143    .312
1        11            25    .0805   .1018   .1555   .216    .704
1        15            25    .0731   .0969   .1395   .174    .655
1        21            25    .0671   .0702   .0794   .124    .266
2        5/7           25    .0814   .1019   .1588   .256    .786
2        12/14         25    .0985   .1587   .2974   .435    .971
2        19/21         25    .0776   .1049   .1557   .226    .776
1        6             30    .0655   .0792   .0979   .128    .394
1        13            30    .0716   .0949   .1456   .193    .674
1        18            30    .0696   .0887   .1324   .183    .629
1        25            30    .0641   .0714   .0845   .113    .239
2        6/9           30    .0778   .0955   .1528   .245    .750
2        14/17         30    .0953   .1448   .2688   .414    .957
2        22/25         30    .0724   .1016   .1507   .230    .753
1        7             40    .0658   .0740   .0891   .101    .310
1        16            40    .0704   .0920   .1340   .185    .570
1        25            40    .0732   .0828   .1165   .159    .489
1        34            40    .0669   .0694   .0697   .089    .172
2        7/11          40    .0702   .0902   .1281   .164    .631
2        18/23         40    .0863   .1271   .2348   .365    .943
2        30/34         40    .0758   .0907   .1261   .175    .608
1        10            60    .0680   .0706   .0844   .091    .252
1        24            60    .0739   .0910   .1154   .137    .495
1        37            60    .0708   .0805   .1032   .127    .403
1        51            60    .0654   .0682   .0792   .084    .149
2        10/16         60    .0738   .0808   .1131   .157    .511
2        27/34         60    .0887   .1159   .1931   .268    .887
2        45/51         60    .0749   .0889   .1155   .139    .514
1        18           100    .0718   .0731   .0897   .093    .203
1        41           100    .0747   .0828   .1050   .135    .371
1        60           100    .0715   .0763   .0998   .113    .301
1        83           100    .0678   .0697   .0768   .083    .146
2        18/28        100    .0771   .0802   .1045   .137    .452
2        45/56        100    .0849   .1052   .1513   .232    .771
2        73/83        100    .0758   .0834   .1016   .137    .411

Note: A is a fixed factor with a = 2 levels (groups from the same population), B is a random factor nested
within A with the levels b = 25, 50, 100, 150, and 500 (testees) for each of the a = 2 levels, and C is a fixed
factor with c = 6, 25, 30, 40, 60, and 100 levels (items). Estimations are based on 1,000 or 10,000
simulation replications of DIF-based data: Within the first group, Rasch model-based data were used with
either a single- or a two-item DIF as compared to the second group’s Rasch model-based data. There is no
difference in the variation of the item parameters in both the groups. The item parameters are equally
spaced within the interval [–3, 3], the ability parameters randomly distributed according to N(0, 1.5).

power for the given effect size of a DIF of 0.6 is acceptable if there are 25 to 40 items,
2 × 500 testees, and a two-item DIF. According to Table 6, the situation is better –
though no case of sample size 2 × 100 or smaller is satisfactory. On the other hand, a

Table 6. The power of the F-test for the interaction eﬀect A 6 C in a three-way analysis of
variance design ðA BÞ C with mixed classiﬁcation (the nominal Type I risk is 5%).

                                          p (F-test) for A × C, by number of testees b
Number   Localisation
of DIF   of DIF                  c       25       50      100      150      500
4        5/7/12/14              25    .1246    .2190    .4580     .670    1.000
4        5/7/19/21              25    .0987    .1568    .3178     .478     .992
4        12/14/19/21            25    .1282    .2222    .4607     .675    1.000
6        5/7/12/14/19/21        25    .1513    .3049    .6264     .850    1.000
4        6/9/14/17              30    .1220    .2041    .4187     .619    1.000
4        6/9/22/25              30    .0991    .1459    .2855     .468     .989
4        14/17/22/25            30    .1170    .2108    .4357     .661     .998
6        6/9/14/17/22/25        30    .1527    .2888    .5901     .796    1.000
4        7/11/18/23             40    .1086    .1794    .3637     .576     .998
4        7/11/30/34             40    .0987    .1286    .2362     .375     .974
4        18/23/30/34            40    .1124    .1699    .3504     .527     .999
6                               40    .1264    .2358    .4827     .737    1.000
4        10/16/27/34            60    .1039    .1552    .2881     .418     .992
4        10/16/45/51            60    .0815    .1117    .1903     .283     .944
4        27/34/45/51            60    .0989    .1549    .2892     .445     .990
6        10/16/27/34/45/51      60    .1118    .1951    .3964     .638    1.000
4        18/28/45/56           100    .0934    .1292    .2276     .335     .963
4        18/28/73/83           100    .0835    .1087    .1713     .240     .838
4        45/56/73/83           100    .0968    .1241    .2316     .363     .972
6        18/28/45/56/73/83     100    .1034    .1664    .3234     .473     .997

Note: A is a fixed factor with a = 2 levels (groups from the same population), B is a random factor nested
within A with the levels b = 25, 50, 100, 150, and 500 (testees) for each of the a = 2 levels, and C is a fixed
factor with c = 6, 25, 30, 40, 60, and 100 levels (items). Estimations are based on 1,000 or 10,000
simulation replications of DIF-based data: Within the first group, Rasch model-based data were used with
either two times or three times a two-item DIF as compared to the second group’s Rasch model-based
data. There is no difference in the variation of the item parameters in both the groups. The item
parameters are equally spaced within the interval [–3, 3], the ability parameters randomly distributed
according to N(0, 1.5).

sample size of 2 × 500 is generally too large; there is no realistic Type II risk then
(apart, probably, from a single case). Hence, interpolating the numerical results, a
sample size of 2 × 200 or 2 × 250 promises to fit best; however, be aware that in
order to hold the Type I risk, the sample size should not be larger than 2 × 150. As
concerns the sample size 2 × 150, a three times two-item DIF of magnitude 0.6 will be
discovered with an acceptable power of about .80, if the number of items is not larger
than 40.

Discussion and conclusion

As already indicated, with our results we have to constrain the approach suggested
by Kubinger et al. (2009) for testing the Rasch model. The approach is not only
restricted to the case where no significant main effect in A (a partition of the total set
of testees) results, but also to an item parameter interval of [–3, 3] and a (normal)
distribution of the testees’ ability parameter with a standard deviation not larger
than 1.5. Then, the number of items should not be larger than 40, and, from the
actual Type I error’s point of view, the number of testees not larger than two times
150 and, from the power’s point of view, not smaller than 150. However, this last
restriction is based on a single effect size of an item DIF of 0.6, and we did not
investigate more than a three times two-item DIF; a DIF involving more items would
most likely increase the power of the approach.
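
The simulation setting to which these restrictions refer can be sketched in Python as follows. This is an illustration only, not the authors' R code. The function names and the seed are hypothetical, and the way the DIF is injected (one item made 0.6 harder and a second item 0.6 easier in the first group) is an assumption, since the exact mechanism is not restated here. N(0, 1.5) is read as a standard deviation of 1.5.

```python
import math
import random


def item_parameters(c, low=-3.0, high=3.0):
    """Return c item parameters equally spaced over [low, high]."""
    step = (high - low) / (c - 1)
    return [low + k * step for k in range(c)]


def rasch_probability(theta, beta):
    """Rasch model: probability that ability theta solves an item of difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def simulate_group(b, betas, sd, rng):
    """Simulate b rows of 0/1 responses for testees with N(0, sd) abilities."""
    data = []
    for _ in range(b):
        theta = rng.gauss(0.0, sd)
        data.append([1 if rng.random() < rasch_probability(theta, beta) else 0
                     for beta in betas])
    return data


rng = random.Random(1)        # hypothetical seed
betas = item_parameters(40)   # c = 40 items in [-3, 3]

# Assumed DIF mechanism (not taken from the article): in the first group,
# item 7 becomes 0.6 harder and item 11 becomes 0.6 easier (1-based numbering).
betas_dif = list(betas)
betas_dif[7 - 1] += 0.6
betas_dif[11 - 1] -= 0.6

group1 = simulate_group(150, betas_dif, 1.5, rng)  # group with a two-item DIF
group2 = simulate_group(150, betas, 1.5, rng)      # Rasch-conforming group
```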
While a larger DIF as well as a DIF involving more items could probably meet a
researcher’s aspirations for the power of this approach (hence further research is
needed), the strongly inflated actual Type I risk reported here is hard to
explain. Again, further research may disclose which combinations of item
parameter interval and ability parameter standard deviation lead to analyses
in which the nominal Type I risk holds. At any rate, a more peaked ability distribution
leads to somewhat better results, and it is, incidentally, most likely the common case.
Maybe the problem arises because we have analysed only cases where
the range of the ability parameters is larger than the range of the item parameters;
as is well known, for N(0, 1), for instance, the ability parameter ranges from –3 to 3.
Thereby, many simulated testees will not master even the easiest items, and many
simulated testees will master every item; hence, any specific unlikely observation (an
unlikely solution or an unlikely failure) might happen for one of the observed items
in the one group and for another item in the other group. In other words, chance effects
with generally a very low probability become likely to polarize at least two items
between the two groups. However, our selective results for [–3, 3] and N(0, 0.5) do not
support this argument.
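
To make the range argument concrete, the Rasch model probabilities at the extremes can be computed directly. The following Python lines are a minimal sketch; taking three standard deviations as the practical range of the ability distribution is an assumption chosen for the illustration.

```python
import math


def rasch_probability(theta, beta):
    """Probability of solving an item of difficulty beta at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


sd = 1.5
low_ability, high_ability = -3 * sd, 3 * sd   # assumed practical range of N(0, 1.5)
easiest, hardest = -3.0, 3.0                  # ends of the item interval [-3, 3]

p_low_on_easiest = rasch_probability(low_ability, easiest)     # about 0.18
p_high_on_hardest = rasch_probability(high_ability, hardest)   # about 0.82
```

A testee at the lower end of this range thus solves even the easiest item with a probability of only about .18, so a correct response there counts as one of the unlikely observations discussed above.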
Nevertheless, the main aim of the approach, that is, to use a three-way analysis of
variance design (A ≻ B) × C with mixed classification as a means for determining
the sample size (according to a given Type I and Type II risk, and according to a
certain effect of model misfit when testing the Rasch model), is to some extent
attained. We now know in much more detail which sample size fits. Given that a DIF
of 0.6 (a 10th of the item parameters’ range) which occurs with respect to at least
three pairs of items is of relevance, a sample size of two times 150, for instance, will
detect such an effect in the case of 30 items at a nominal Type I risk of 1% with a
power of almost 80%. Of course, a table of various combinations of the number of
items, the number of testees, the relevant DIF, the Type I risk, and the power within
the stated frame of restrictions would be of further interest.
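
For readers who want to see the test statistic itself, the F-ratio for the interaction A × C in a balanced (A ≻ B) × C layout can be sketched in Python as below. The sketch assumes one observation per testee-item cell and that the interaction is tested against the residual mean square, as in model equation (1). The function name interaction_f and the toy data are hypothetical, and the p-value (which needs the F distribution from a statistics library) is left out.

```python
def interaction_f(y):
    """F statistic for A x C in the balanced design (A > B) x C.

    y[i][j][k] is the response of testee j (nested in group i) to item k.
    Returns (F, df_numerator, df_denominator).
    """
    a, b, c = len(y), len(y[0]), len(y[0][0])

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    grand = mean(y[i][j][k] for i in range(a) for j in range(b) for k in range(c))
    group = [mean(y[i][j][k] for j in range(b) for k in range(c)) for i in range(a)]
    item = [mean(y[i][j][k] for i in range(a) for j in range(b)) for k in range(c)]
    group_item = [[mean(y[i][j][k] for j in range(b)) for k in range(c)]
                  for i in range(a)]
    testee = [[mean(y[i][j]) for j in range(b)] for i in range(a)]

    # Sum of squares for the interaction A x C.
    ss_ac = b * sum((group_item[i][k] - group[i] - item[k] + grand) ** 2
                    for i in range(a) for k in range(c))
    # Residual sum of squares (testee-by-item variation within groups).
    ss_res = sum((y[i][j][k] - testee[i][j] - group_item[i][k] + group[i]) ** 2
                 for i in range(a) for j in range(b) for k in range(c))

    df_ac = (a - 1) * (c - 1)
    df_res = a * (b - 1) * (c - 1)
    return (ss_ac / df_ac) / (ss_res / df_res), df_ac, df_res


# Tiny made-up data set: a = 2 groups, b = 2 testees per group, c = 2 items.
toy = [[[1, 0], [1, 1]],
       [[0, 1], [1, 1]]]
f_value, df1, df2 = interaction_f(toy)
```

With a = 2 groups of b = 150 testees and c = 30 items, the degrees of freedom would be (a − 1)(c − 1) = 29 and a(b − 1)(c − 1) = 8,642.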

Notes
1. Because of a reviewer’s suggestion, we give here a short introduction to analysis of
variance (for a detailed presentation, see, e.g., Kubinger, Rasch, & Yanagida, 2011;
Rasch, 1995; or Rasch, Kubinger, & Yanagida, 2011).
2. Random variables are printed in bold.

References
Andersen, E.B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–
140.
Glas, C.A.W., & Verhelst, N.D. (1995). Testing the Rasch model. In G.H. Fischer & I.W.
Molenaar (Eds.) Rasch models: Foundations, recent developments, and applications (pp.
69–95). New York, NY: Springer.
Kubinger, K.D. (2005). Psychological test calibration using the Rasch Model – Some
critical suggestions on traditional approaches. International Journal of Testing, 5, 377–394.
Kubinger, K.D., Rasch, D., & Yanagida, T. (2009). On designing data-sampling for
Rasch model calibrating an achievement test. Psychology Science Quarterly, 51, 370–
384.
Kubinger, K.D., Rasch, D., & Yanagida, T. (2011). Statistik in der Psychologie – vom
Einführungskurs bis zur Dissertation [Statistics in Psychology – Introduction course up to
doctoral thesis]. Göttingen, Germany: Hogrefe.
Mair, P., Hatzinger, R., & Maier, M. (2010). eRm: extended Rasch modeling. R package
version 0.13-0 [Computer software]. Retrieved from http://cran.r-project.org/web/
packages/eRm/
Poinstingl, H., Mair, P., & Hatzinger, R. (2007). Manual zum Softwarepackage eRm (extended
Rasch modeling) – Anwendung des Rasch-Modells (1-PL Modell). Deutsche Version
[Manual of eRm. To apply the Rasch model – German version]. Lengerich, Germany:
Pabst.
Rasch, D. (1995). Mathematische Statistik. Berlin, Germany: Wiley.
Rasch, D., Herrendörfer, G., Bock, J., Victor, N., & Guiard, V. (2008). Verfahrensbibliothek
Versuchsplanung und -auswertung. Elektronisches Buch [Collection of procedures in design
and analysis of experiments. Electronic book]. München, Germany: Oldenbourg.
Rasch, D., Kubinger, K.D., & Yanagida, T. (2011). Statistics in psychology – Using R and
SPSS. Chichester, UK: Wiley.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests (Reprint).
Chicago, IL: The University of Chicago Press.
R Development Core Team. (2008). R: A language and environment for statistical computing.
Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.
R-project.org.

