
Source: https://www.researchgate.net/publication/228707933
A review of linear programming and its application to the assessment tools for teaching and learning (asTTle) projects
Article · January 2001
Richard Fletcher, Massey University


Technical Report 5, December 2000

A Review of Linear Programming and its Application to the Assessment Tools for Teaching and Learning (asTTle) Projects
Technical Report 5, Project asTTle, University of Auckland, 2000

Richard B. Fletcher¹
Massey University (Albany)

This report reviews the international research literature on linear programming as applied to
the issues of banking assessment items. It outlines the mathematical procedures needed to
obtain feasible solutions to selections made by teachers and constraints imposed by the
assessment developers. The various algorithms and heuristic procedures necessary for
feasible solutions in an item bank of only 500 items with testlets are discussed and
exemplified. The report recommends use of detailed item mapping, limiting the number of
ability levels, use of the simultaneous selection of items and sets method, use of the
maximin model, and use of the optimal rounding method in finding solutions.

Table of Contents

Introduction .................................................... 1
Item Response Theory ................................... 2
IRT Test Assembly ......................................... 3
Linear Programming ...................................... 3
   Objective Functions ................................... 4
Practical Constraints ...................................... 6
   Practical Constraints for Simultaneous Selection of Items ............ 6
   Practical Constraints for Simultaneous Selection of Sets of Items ............ 8
Solving Binary 0-1 and Integer LP Models ... 11
   LP Algorithms and Heuristics .................. 11
Infeasibility ................................................... 12
Conclusions ................................................... 14
Recommendations ......................................... 14
References ..................................................... 15
Appendix One – Item Bank Structure ........... 18

¹ Address for mailing: Richard Fletcher, School of Psychology, Massey University, Albany, Auckland, New Zealand. E-mail: [email protected]

Introduction

Item banking permits the test constructor to store a great deal of information about the items (e.g., item difficulty, item discrimination, response times, content attributes), which can then be used to develop tests according to detailed test specifications. A significant feature of item response theory (IRT) item banking is that items can be added to the item bank (once they are linked to the other items) or removed without affecting the measurement precision of the other items. For the test development process this is a great advantage, as item pools can be enlarged over time. The payoff of a larger item bank is the increased ability to select different sets of items, and therefore to reduce item exposure rates. IRT and item banking are major strengths that the asTTle projects will fully utilise.

In combination, IRT and computers have led to new innovations in test assembly. One practical example of the use of item banking is computer adaptive testing, where individual tests are constructed based on an examinee's responses to individual items (Lord, 1980). Ability is estimated after each item, and when the test is terminated at a predetermined criterion, the ability level is then assigned. The net result is a tailored test that accurately estimates the examinee's ability in less time, using fewer and different sets of items. Efficiency in trait estimation is the advantage of this approach to ability testing.

There are difficulties, however, that have emerged from large-scale implementation of item banking. Firstly, the "best" items become overexposed: a small set of items tends to be selected into most tests, and the net result is that tests do not vary from one another in terms of content. Secondly, item security can be problematic, in that it has been possible for candidates to memorise the first few items in an adaptive test, and therefore the potential exists for the content of the item bank to become known and used by specialist test preparation

organisations. These two issues can undermine the utility of a test if too much is known about the item content beforehand. Ultimately, ability estimation is compromised and the testing process is invalidated. The issue facing test developers is to create a very large item bank to alleviate some of these problems.

Another example of the utility of item banking is optimal test design in the form of 0-1 linear programming (LP), which allows complete tests to be assembled according to detailed test specifications while maximizing an objective function (Adema, Boekkooi-Timminga, & van der Linden, 1991; Baker, Cohen, & Barmish, 1988; Timminga & Adema, 1995; Theunissen, 1985). Test assembly using existing item banks and LP is applicable to both classical test theory and IRT (Adema, 1990, 1992a, 1992b; Adema, Boekkooi-Timminga, & Gademann, 1992; Adema, Boekkooi-Timminga, & van der Linden, 1991; Adema & van der Linden, 1989; Armstrong, Jones, & Wang, 1994; Baker, Cohen, & Barmish, 1988; Berger & Veerkamp, 1996; Boekkooi-Timminga, 1987, 1990a, 1990b, 1993; de Gruijter, 1990; Stocking, Swanson, & Pearlman, 1991, 1993; Swanson & Stocking, 1993; Theunissen, 1985, 1986; Timminga & Adema, 1995, 1996; van der Linden, 1996, 2000; van der Linden & Boekkooi-Timminga, 1989). Furthermore, LP can be used to assemble tests using dichotomous item formats (e.g., Baker, Cohen, & Barmish, 1988; de Gruijter, 1990; Stocking, Swanson, & Pearlman, 1991; Theunissen, 1985, 1986; van der Linden & Boekkooi-Timminga, 1989) or polytomous item formats (e.g., Berger, 1998; Berger & Mathijssen, 1997; Fletcher & Hattie, in review).

Baker, Cohen, and Barmish (1988, p. 190) suggest that "mathematical programming represents a major addition to the tools of the test constructor because it provides an analytical rather than an ad hoc procedure for item selection under IRT." Such an approach to test assembly is a major strength of the asTTle project, as tests can be assembled to teacher specifications using items with known properties. The use of LP in this situation furnishes a teacher with an adaptive test that is comparable to other tests constructed from the same item bank. This is a major advance in classroom-based assessment.

The aim of this paper is to discuss linear programming (LP) in the context of the asTTle projects using IRT-calibrated items, by providing practical examples for specifying objective functions and linear constraints. IRT test assembly is first addressed, to overview this measurement model and its applicability to test construction within the asTTle projects. Secondly, LP is presented along with an explanation of some of the objective functions available to the test constructor. Thirdly, practical constraints for individual items are introduced along with a worked example to exemplify the main issues. Fourthly, set-based items are presented along with a practical example to illustrate this method. In sum, the paper will provide computer programmers with the relevant information and references to enable them to set up the constraints as outlined in the Assessment Tools for Teaching and Learning (asTTle) proposal.

Item Response Theory

For the purpose of this paper it is assumed that the one-parameter Rasch model is used as the basis of item calibration. The Rasch model is computationally the simplest of the current IRT models for dichotomously scored data (see Hambleton, 1989; Hambleton & Swaminathan, 1985; and Harris, 1996, for a discussion of the various unidimensional dichotomous IRT models). A critical assumption of the Rasch model (and all other unidimensional IRT models) is unidimensionality for each item; that is, one underlying latent ability should account for the item response. The Rasch model implies that the probability of a correct response to an item (dichotomous in this case) is a function of the item difficulty and the examinee's ability level (θ), and is denoted by the following equation:

Pi(θ) = exp(θ − bi) / [1 + exp(θ − bi)]    (1)

where Pi(θ) is the item characteristic function and bi is the item difficulty parameter.
An important feature of IRT models is the concept of item information, which is the reciprocal of the standard error of measurement. Items with low standard errors have greater information, and vice versa. Item information provides the test constructor with an indication of item measurement precision, from which items can then be selected into the test on the basis of their information. For the Rasch model, the item information function (IIF) is estimated using the following equation:

Ii(θ) = Pi(θ)[1 − Pi(θ)]    (2)

The usefulness of IIF curves is that they can be added together. As a consequence, they specify the form of the test information function curve (TIFC). The TIFC is simply the sum of the IIF curves, and is computationally expressed:

I(θ) = Σi Ii(θ)    (3)

IRT Test Assembly

IIFs are the building blocks of IRT test assembly, and their use was first suggested by Birnbaum (1968) and Lord (1977, 1980), who indicated that their properties were useful in trying to match them to a target information function (TIF), such that they approximated the desired shape. The additive properties of IIF curves enable tests to be assembled according to the specified shape of the TIF. Accordingly, the TIF can be specified to cover a range of the trait continuum, or to provide information at certain ability points. For example, greater information may be required around a cut score on a test (i.e., mastery testing), whereas selecting information across a range of abilities would require the TIF to be spread across a predetermined interval.

Lord (1980) outlined the following procedure for assembling tests under the IRT framework.
1. The TIF is specified to be a certain shape (see Figure 1 for an example of a hypothetical TIF over the ability range -1 to +1).
2. Items whose information curves can fill the hard-to-fill areas under the TIF are selected.
3. Item information function curves are added to determine their information contribution to the TIF.
4. Items continue to be added until the TIF is approximated at the specified ability levels.

Figure 1. A hypothetical TIF specified over the ability range -1 to +1 (information, 0 to 4, plotted against ability/difficulty from -3 to +3).

A major limitation of this method is that it becomes difficult to achieve when the item bank is large (Theunissen, 1985). For example, an item bank of 500 items yields 2^500 possible tests that could be assembled, and enumerating all possible tests is difficult, if not impossible. As tests are usually assembled to meet detailed specifications, manually selecting items using IIFs to fit a TIF is an implausible task: one cannot be sure that the optimal set of items, which meets all of the specifications, has indeed been selected. Simultaneously taking all the test requirements into consideration when assembling tests under the IRT framework is a laborious manual task, or computationally intractable. Test assembly methods that can utilize IRT measurement properties, and which incorporate complex test specifications, are therefore required, and are of great advantage to test constructors.

Linear Programming

One method that can maximize the psychometric properties of IRT information curves and fully implements Birnbaum's (1968) and Lord's (1977, 1980) IRT test assembly method, while meeting complex test
specifications, is mathematical linear programming (LP) in the form of 0-1 binary [integer] programming (e.g., Adema & van der Linden, 1989; Boekkooi-Timminga, 1989, 1990; de Gruijter, 1990; Theunissen, 1985, 1986; van der Linden & Boekkooi-Timminga, 1989). Adapted from the area of operations research, LP maximizes or minimizes an objective function while satisfying a series of linear constraints, and is therefore suited to test assembly problems.

The primary aim of LP models is to maximize or minimize an objective function (e.g., maximizing test reliability, minimizing the number of items in the test, and/or maximizing information at a given point of the ability scale) so that test specifications (e.g., test length, test information, item format, and item content) in the form of optimal constraints (e.g., the number of items in the test equals 45; the mean p-value equals 0.25; items 3 and 5 should not appear together) are met. In a 0-1 LP model, the decision variables may take only the values 0 or 1. Accordingly, items that are selected into the test are assigned a 1, and those omitted from the test, a 0:

xi ∈ {0, 1}, i = 1, 2, ..., I    (4)

where xi is the decision variable and i indexes the items in the item bank.

LP provides the test constructor with a method for selecting items that meet the objective function while also meeting a series of complex linear constraints, and it can ensure that both content and statistical specifications are simultaneously satisfied (Fletcher & Hattie, in review). LP is a major advance in the test assembly process and will fully realize the objective of the asTTle projects in that it will present teachers with a set of items that matches, or closely approximates, their test specifications.

Objective Functions

LP involves an explicit statement about what the objective function of the test should be. In terms of the asTTle projects, this will likely entail teachers making an initial choice about the range of the ability they wish the test to cover. From an LP perspective, however, this means that the objective function used in each of the asTTle assessments will need to be flexible in its ability to realise feasible solutions and to satisfy the teachers' demands. In other words, the objective function must provide teachers with a set of items that meets, or closely approximates, their test specifications (e.g., range of ability). Thus, it is not desirable for teachers to be told there is no solution.

Timminga and Adema (1995) suggested six types of objective functions for achievement testing (see also van der Linden & Boekkooi-Timminga, 1989). The first two are based on information being maximized at a specified point of the ability scale (θf) (i.e., mastery testing), and the last four are based on a broader range of ability being tested (i.e., non-mastery). For example, information may be required to be greater across the ability range θk (k = 1, ..., K). For the asTTle projects, the teachers will be asked to make choices about the range of ability the test should cover. Generally, their selection should be of three to five ability levels, as this is sufficient to provide satisfactory results (Timminga, 1995; cited in van der Linden, 1994). The six objective functions are:

1. Maximize the information at θk

maximize Σ(i=1 to n) Ii(θk) xi    (5)

where Ii(θk) is the information of item i at θk, and n is the number of items in the item bank.

2. Minimize the deviation between the cut score and the βi (item difficulty) of the items selected

(6)

3. Exact target information values that delineate the TIF should be as close as possible to the specified information values. In actuality, obtaining an exact TIF will be an improbable task, and therefore a decision is required as to how the deviation from the target values should be reduced. Timminga
and Adema (1995) suggest three methods that can be used to achieve this objective:

a. Minimize the sum of the positive deviations from the target values

(7)

subject to:

(8)

where T(θk) is the target value.

The model suggests that an objective function and a set of constraints are required to obtain a set of values that are closely approached. Thus, minimizing the positive deviations from the target values is achieved when the values in the constraint (Equation 8) are at least reached. A limitation of this approach is that not all points of the TIF may be reached, and that at some ability points the accuracy is greater than at others (Timminga & Adema, 1995).

b. Minimize the sum of the absolute deviations

(9)

subject to:

(10)

(11)

In this model, two decision variables, yk (positive deviation ≥ 0) and uk (negative deviation ≥ 0), are introduced. As the sum of yk and uk is minimized, and both are equal to or greater than zero, when one is zero the other equals the deviation from the TIF.

c. Minimize the largest deviation from the TIF.

minimize y
subject to:

(12)

(13)

With this objective function, the decision variable y (y ≥ 0) is the largest deviation from the TIF.

4. Maximum information is required at all ability points. This objective function was originally specified by Theunissen (1985) and is formulated in the same way as (2). However, as information is required at more than one ability point, and as LP deals with one objective function at a time, these have to be re-specified for each θ level. Thus, the decision variable y (y ≥ 0) specifies the amount of information to be obtained at the specified ability levels. The model is specified:

maximize y
subject to:

(14)

5. Relative target information values specify that the shape of the TIF be approximated at some of the ability points. This objective function was developed by van der Linden and Boekkooi-Timminga (1989) as the maximin model, and specifies the shape of the TIF rather than its exact height. Thus, the maximin model is a more flexible approach to test construction. The decision variable y (y ≥ 0) denotes the degree to which information is maximized, and the constants rk denote the relative shape of the TIF across the specified θ points. The model is denoted:

maximize y
subject to:

(15)

6. Minimizing the test length while the TIF values are required to be equal to or greater than the target values. With this model, one should not set the test length as a constraint, as this will be in conflict with the objective
function, and will result in infeasibility. The objective function is to:

(16)

subject to:

(17)

The objective functions outlined above each have their potential uses in the construction of achievement tests. The maximin model (van der Linden & Boekkooi-Timminga, 1989) appears to be the most flexible of the above objective functions, and seems most appropriate for the asTTle projects in that it only requires the TIF to be approximated.

Practical Constraints

For the asTTle projects, teachers will not only specify an objective function but will also make choices about the characteristics of the type of test they desire. These choices will then be transferred into an objective function and a series of practical linear constraints. Teachers will therefore be setting test specifications, albeit limited ones. In practice, however, test specifications are often complex, and practical constraints are required within LP in order to fully realize the test constructor's demands.

Test specifications in LP are expressed as a series of linear constraints. Before setting practical constraints, it is advisable to consider the qualities of the item bank and its ability to meet them. For example, it would not make sense to ask for six deep-processing items at θ3 in the close reading curricular area when in fact there were only three such items. Such a constraint would lead to no solution being found to the test construction model, and as teachers will not know how to deal with such infeasibility, constraints will need to be set within reasonable limits.

Teachers will certainly need to know something about the structure of the item bank to enable them to understand the choices they can make. For the computer programmers, mapping out the item bank will prove invaluable when writing the linear constraints, and will help in identifying potential infeasibility problems and in finding out how to deal with them (see Tables 1, 2, and 4 for examples of hypothetical item bank structures). In general, the constraints should be reasonable, so as to allow a feasible solution to be obtained; knowing the qualities of the item bank will help to overcome this type of issue.

Practical Constraints for Simultaneous Selection of Items

The importance of setting accurate and reasonable practical constraints cannot be overstated: they are essential components in LP test assembly because they are the means by which test specifications are fully met. Constraints should be closely checked before any attempt is made to solve the model. Some simple examples of practical test constraints that are likely to be used in the asTTle projects are given below.

1. Constraining the number of items allowed to be selected into the test. In other words, the sum of the items from i = 1, 2, ..., I must be equal to N:

Σ(i=1 to I) xi = N    (18)

2. Limiting the number of items with certain characteristics (e.g., multiple-choice, open-ended questions, deep or surface learning). If Sd is the set of deep cognitive processing items, then one can limit the number of these items in the test using the equation below:

(19)

3. Limiting the sum of certain item attributes. For example, the sum of the administration times (ti) for the selected items should be less than or equal to a bound bj. For this type of constraint one should not use equality signs, as these can lead to no solution being found. The inequality sign provides some flexibility in the designation of items into the test.
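A minimal sketch may make these constraint types concrete. The six-item bank and all numbers below are invented for illustration, and brute-force enumeration stands in for a real LP solver: every 0-1 assignment is checked against a length constraint, a deep-item limit, and a time bound, and the feasible selection with maximum information is kept.

```python
from itertools import combinations

# Hypothetical miniature item bank: (information at theta_k, seconds, is_deep).
# These values are illustrative only, not drawn from the asTTle bank.
bank = [
    (0.25, 60, True), (0.22, 45, False), (0.20, 50, True),
    (0.18, 40, False), (0.15, 70, True), (0.12, 30, False),
]

def best_test(bank, n_items, max_deep, max_time):
    """Brute-force 0-1 selection: maximize summed information subject to
    test length == n_items, at most max_deep deep items, and total
    administration time <= max_time. An LP solver replaces this
    enumeration in practice; enumeration is exponential in bank size."""
    best, best_info = None, -1.0
    for combo in combinations(range(len(bank)), n_items):
        deep = sum(1 for i in combo if bank[i][2])
        time = sum(bank[i][1] for i in combo)
        if deep <= max_deep and time <= max_time:
            info = sum(bank[i][0] for i in combo)
            if info > best_info:
                best, best_info = combo, info
    return best, best_info

selection, info = best_test(bank, n_items=3, max_deep=1, max_time=160)
print(selection, round(info, 2))  # (0, 1, 3) 0.65
```

Note that the most informative unconstrained triple (items 0, 1, 2) is infeasible because it contains two deep items; the constraints, not the objective alone, determine the selection. At realistic bank sizes the enumeration explodes (2^500 subsets for 500 items), which is why the LP algorithms discussed later are needed.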
Σ(i=1 to I) ti xi ≤ bj    (20)

4. Select all or none of the items in a subset. If, for example, the items in a set are consecutively numbered 1–7, then the equation that denotes selecting all or none of the items is:

(21)

5. One can also specify the proportion of items to be selected from one subset in relation to another subset. For example, one might wish to have three times as many items from subset F1 as from subset F2. The equation that denotes this is:

Σ(i∈F1) xi = 3 Σ(i∈F2) xi    (22)

6. The inclusion of an item in the test requires the item to take on a fixed value (xi = 1). If this constraint is chosen, then a modification to the objective function should be made to take the pre-selected items into account.

7. The exclusion of an item from the test requires the item to take on a fixed value (xi = 0); alternatively, one can ensure that these items never enter the test assembly model. The latter of these two options is suggested, as it will speed up the selection process because fewer items are considered from the item bank (Timminga & Adema, 1995). In other words, leaving items out of the test assembly model will speed up the solution time.

An application of some of the practical constraints discussed above is presented below, highlighting some possible constraints that may be used in the asTTle projects. In the following example, the item bank is hypothesized to consist of 448 items (see Tables 1 & 2), with each item consecutively numbered from 1–448. Tables 1 and 2 represent the total item bank, but the items are presented in separate tables for ease of interpretation when examining the practical constraints. Table 1 presents the item bank structure for the closed-choice items (multiple-choice, "fill in the blanks", and "true or false"), and Table 2 outlines the item bank structure for the open-ended items. Table 3 presents the number of items at each θk level (k = 1, 2, ..., 7).

A test constructor may, for example, want a test that meets a set of test specifications to maximize information, such that:
1. The TIF should be approximately equal at θ3 = -1.0, θ4 = 0.0, θ5 = +1.0 (Equation 23) (see Figure 1).
2. Ten items must be from the personal reading curricular area (Equation 24).
3. Five closed-choice (C-C) items in the personal reading curricular area must be at the deep level of processing (Equation 25).
4. Ten items must be from the close reading curricular area (Equation 26).
5. Five C-C items in the close reading curricular area must be at the deep level of processing (Equation 27).
6. The test should contain 39 C-C items (Equation 28).
7. The test should contain one open-ended item (Equation 29).
8. The test administration time should take no more than 2,400 seconds (40 minutes) (Equation 30).

To meet these specifications the following model was constructed:

maximize y
subject to:

(23)

(24)

(25)

(26)
(27)

(28)

(29)

(30)

(31)

(32)

Practical Constraints for Simultaneous Selection of Sets of Items

For the asTTle projects the item bank is likely to be composed of set-based items (or testlets); that is, groups of items are linked to a common stimulus and as a result are set-bound. The implication is that if certain items are selected then the associated stimulus should also be selected, and vice versa. The challenge for the test constructor is to ensure that items related to a stimulus are selected, or that if the stimulus is selected then some or all of its associated items are selected into the test. Boolean constraints (e.g., if Item 1 is selected, then its related stimulus must be selected) are therefore needed to overcome the problem of simultaneous selection of items and stimuli.

Van der Linden (2000) suggests six methods for selecting set-based items and their associated stimuli. Only one method is presented below (simultaneous selection of items and sets), as this method was shown to be computationally efficient and more accurate in its ability to match the TIF. Furthermore, van der Linden (2000) suggests that this method is most suited to new test assembly problems or new item banks, and it is thus most appropriate for the asTTle projects. (For a further discussion, and the implications of the other methods, see van der Linden, 2000.)

The method outlined below is based on an item bank with set-based items. The method can incorporate discrete items, but for the sake of simplicity only set-based items are fully discussed. As van der Linden (2000, p. 229) states:

   The key feature of this method is that separate decision variables for the selection of items and stimuli are defined. The variables are used to model the constraints imposed on the selection of items and stimuli. Special logical constraints are added to keep the selection of items and stimuli consistent, that is, to prevent the selection of items without the selection of stimuli, or the selection of stimuli without the selection of items.

Set-based items can be conceptualized at four levels (although there could be more): individual item, stimulus, item set, and total test (van der Linden, 2000). Table 4 shows the attribute and constraint levels for items as specified by van der Linden (2000). It is important to note that constraints must be delineated at the same level as the attribute level, or higher. Understanding the levels of the items in terms of attribute and constraint levels will make setting the constraints a straightforward task.

The notation used is similar to that of van der Linden (2000). Items are arranged in item sets, s = 1, 2, ..., S. (If the item bank were to contain discrete items, then the items might be arranged into S − 1 sets, each with a common stimulus, with the final set holding the discrete items. The only change to the constraints below would be to the selection of quantitative and categorical stimulus attributes and to the number of item sets; the summation for these would be over S − 1.) Items within each set are denoted is = 1, 2, ..., Is. Specifically, for this example, each curricular area has seven stimuli, with each stimulus having seven associated items hypothesized to be at the different θ levels (θ1–θ7). Accordingly, for Stimulus 1 in the personal reading curricular area, the associated items are the first item at each of the seven θ levels as laid out in Table 5 (e.g., items 1, 8, 15, 22, 29, 36, and 43); for Stimulus 2 in the reading area
the associated items are 2, 9, 16, 23, 30, 37, and 44, and so on. Sets are selected into the test if zs = 1; alternatively, zs = 0 indicates that the set is not selected. Item is is denoted xis = 1 when it is selected into the test, and xis = 0 when it is not in the test.

The maximin objective function (van der Linden & Boekkooi-Timminga, 1989) is specified such that the information for item is at θk, denoted Iis(θk), is governed by the target values T(θk) (k = 1, 2, ..., K) for each value of the TIF at θk. Thus, the TIF at each θk is required to fall between the lower and upper bounds [T(θk) − y, T(θk) + y]. To control the size of the bounds, y ≥ 0 is an actual numerical value that defines the width of the interval. The objective is therefore to minimise y. Further notation is needed to fully explicate the model and is based on that suggested by van der Linden (2000), such that:

1. qi is the value of item is on quantitative attribute q. For example, the quantitative attribute for item is could be item difficulty.
2. rs denotes the value of stimulus s on quantitative attribute r. For example, the quantitative attribute may be the time taken to read a passage of writing.
3. Cg is defined as the set of indices of items with value g on categorical attribute C, g = 1, 2, ..., G. If the categorical attribute is level of processing, then the associated items would be those at the deep or the surface level.
4. Dh is a set of stimulus indices with value h on categorical attribute D, h = 1, 2, ..., H.
5. n defines the upper (u) and lower (l) bounds for the numbers of items in subsets from the item bank.

The following model formally outlines the maximin objective function for item sets, and some possible practical constraints that can be used. Equations 33 and 34 set the TIF to a relative shape and require it to fall within a certain range. Equations 35 and 36 set the length of the test. Equation 37 sets the number of sets to be selected. The number of items to be selected from an item set is given in Equations 38 and 39. Items to be selected on the basis of their quantitative or categorical attributes are denoted in Equations 40 to 43. Similarly, quantitative and categorical stimulus attributes are defined in Equations 44 to 47. Equations 48 to 50 are definitions of the decision variables.

Minimize y
subject to:

(33)

(34)

(35)

(36)

(37)

(38)

(39)

(40)

(41)

(42)

(43)
10 Fletcher, R. B.

(52)
(44)

(53)
(45)

(54)
(46)

(47) (55)

(48)
(56)

(49)

(57)
(50)
To provide a worked example for selecting set-based items using some of the above constraints, a hypothetical set of test specifications, using the item sets in Table 4, is specified, such that:
1. The information at θ1 = -2 and θ3 = 0 means that the TIF should fall between a series of lower and upper bounds such that the interval is minimized (Equations 51 & 52).
2. The test must contain no more than 40 items (Equations 53 & 54).
3. Ten sets must be selected (Equation 55).
4. Six sets must come from the personal and the close reading curricular areas (Equation 56).
5. There must be no more than five items per set (Equations 57 & 58).
6. The test can take no longer than 40 minutes to complete (Equations 59 & 60).
7. At least five items ≥ θ3 at the deep processing level from the personal reading area must be selected (Equations 61 & 62).
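To make the structure of such a model concrete, the specifications above can be sketched in 0-1 linear form. The following is an illustrative rendering in the spirit of van der Linden (2000), not a reproduction of the report's Equations 51 to 65: x_is and z_s are the item and stimulus decision variables, I_is(θ) denotes item information, t_is denotes item response time, b_is denotes item difficulty, and T_k, PR, and CR are the targets and index sets assumed for the example.

```latex
\begin{align*}
&\text{minimize } y \quad \text{subject to:} \\
&\textstyle\sum_{s}\sum_{i} I_{is}(\theta_k)\, x_{is} \ge T_k - y, \quad k = 1, 3 && \text{(TIF lower bound)} \\
&\textstyle\sum_{s}\sum_{i} I_{is}(\theta_k)\, x_{is} \le T_k + y, \quad k = 1, 3 && \text{(TIF upper bound)} \\
&\textstyle\sum_{s}\sum_{i} x_{is} \le 40 && \text{(test length)} \\
&\textstyle\sum_{s} z_s = 10 && \text{(number of sets)} \\
&\textstyle\sum_{s \in PR \cup CR} z_s = 6 && \text{(curricular areas)} \\
&\textstyle\sum_{i} x_{is} \le 5\, z_s \quad \text{for all } s && \text{(items per set)} \\
&\textstyle\sum_{s}\sum_{i} t_{is}\, x_{is} \le 40 && \text{(administration time)} \\
&\textstyle\sum_{s \in PR}\ \sum_{i:\, b_{is} \ge \theta_3,\ i \in \text{deep}} x_{is} \ge 5 && \text{(deep items)} \\
&x_{is} \in \{0, 1\}, \quad z_s \in \{0, 1\} && \text{(binary decision variables)}
\end{align*}
```

The key design point is that every teacher choice becomes either a bound on a weighted sum of the x_is, or a linking constraint between the x_is and their stimulus variable z_s.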
Minimize y,
subject to Equations 51 to 65 (the constraint set just described), where is ∈ CRdeep denotes the set of items at the deep processing level in the close reading curricular domain.

Solving Binary 0-1 and Integer LP Models

The solving of 0-1 and integer LP test assembly models is computationally complex and, in terms of computing time, very demanding (Timminga & Adema, 1995). In other words, the time taken to search for the optimal solution can be prohibitive, and hence the practical benefits of LP are lost. A major concern for the asTTle projects is to ensure that feasible solutions are obtained in reasonable computing time. There are, however, trade-offs between optimality and a relative degree of precision in obtaining a solution to the test assembly problem. As optimality may not be guaranteed, finding solutions as close to optimal as possible is required. Ensuring that a feasible solution close to optimality can be obtained requires heuristics. Heuristics facilitate close-to-optimal solutions but do affect the precision of the test obtained. Their strength is that they allow complex test specifications to be realized in practical computing time, and therefore place LP within the reach of the asTTle projects.

LP Algorithms and Heuristics

The main algorithms used to solve linear programming problems are the simplex method and the branch-and-bound method.

The simplex method is an iterative selection process that achieves optimality by finding successive basic feasible solutions. The procedure moves from one basic feasible solution to another to improve the value of the objective function; when the objective function cannot be improved any further, the solution is optimal and the process halts. In other words, when a set of items is found that satisfies all the constraints and maximizes the objective function, and no other set of items can better it, it is considered the best possible solution (see Ignizio & Cavalier, 1994; Sultan, 1993; Wolsey, 1998, for more detailed discussions of the simplex algorithm).

The branch-and-bound method for solving 0-1 LP problems refers to a class of algorithms (see Adema, 1992a; Ignizio & Cavalier, 1994; Sanders, Theunissen, & Baas, 1996; Sultan, 1993; Wolsey, 1998, for more detailed discussions of the branch-and-bound algorithm). Basically, the branch-and-bound algorithm starts by calculating a solution to a relaxed model, in which real-valued variables take the place of the integer variables. For maximization or minimization problems, the objective function value of the relaxed solution is an upper or lower bound, respectively, on the objective function value of the integer solution to be obtained.

Once the relaxed solution is obtained, two new problems are created: one in which a variable with a fractional value in the solution is fixed at 0, and one in which the same variable is fixed at 1. The two problems are then solved. Once a problem is solved, its solution is checked to determine whether its value is better than the best integer solution found thus far. The problem with the best objective function value is split again using another variable with a fractional value. If the objective function value is no better, backtracking to the previous solution takes place. If the solution is integer, it is compared with the best integer solution found thus far, and the better one is retained. The process continues until all the variables are assigned a value of either 1 or 0, and the solution with the best integer objective function value is determined. Timminga et al. (1996, p. 15) suggest that the branch-and-bound algorithm can be represented as a search in a tree, where each node is a problem with partially fixed variables and the two branches leaving a node are the problems associated with fixing a further variable. Fortunately, it is not necessary to search the whole tree for a solution: as soon as a node produces a solution with an objective function value worse than the one at hand, all problems farther along the branch can be skipped.
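The relax-branch-prune cycle described above can be sketched in a few lines of code. The following is a minimal illustration only, not the asTTle implementation: it maximizes total information at a single θ point subject to a time budget, using the fractional-knapsack solution as the relaxed model, branching on the one fractional variable, and pruning nodes whose relaxed bound cannot beat the incumbent. The item values are hypothetical.

```python
def relax(items, cap, fixed):
    """LP relaxation with variables fixed so far. `items` is a list of
    (information, time) pairs with time > 0; `fixed` maps item index -> 0 or 1.
    With a single time constraint, the relaxation is the fractional knapsack:
    fill by information-per-minute ratio, leaving at most one fractional item.
    Returns (upper_bound, fractional_index_or_None), or None if infeasible."""
    used = sum(items[j][1] for j in fixed if fixed[j] == 1)
    if used > cap:
        return None
    bound = sum(items[j][0] for j in fixed if fixed[j] == 1)
    free = sorted((j for j in range(len(items)) if j not in fixed),
                  key=lambda j: items[j][0] / items[j][1], reverse=True)
    room = cap - used
    for j in free:
        info, t = items[j]
        if t <= room:
            room -= t
            bound += info
        else:
            return bound + info * room / t, j   # fractional item: branch here
    return bound, None                          # relaxation is already integral

def branch_and_bound(items, cap):
    """Best 0-1 selection maximizing information within the time budget."""
    best_value, best_fixed = 0.0, {}
    stack = [{}]                                # each node = a set of fixed variables
    while stack:
        fixed = stack.pop()
        r = relax(items, cap, fixed)
        if r is None or r[0] <= best_value:     # infeasible, or bound cannot beat incumbent
            continue
        bound, frac = r
        if frac is None:                        # integral relaxed solution: new incumbent
            best_value, best_fixed = bound, fixed
            continue
        stack.append({**fixed, frac: 0})        # branch: exclude the fractional item...
        stack.append({**fixed, frac: 1})        # ...or include it
    return best_value, best_fixed
```

For example, with items [(10, 5), (7, 4), (6, 3), (4, 2)] and a 9-minute budget, the search prunes most of the tree and returns the optimal information total of 17.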
A full branch-and-bound search on a large LP problem requires a great deal of time to reach a solution, and thus to use it as the sole minimization method would be unacceptable in the asTTle projects. To facilitate test assembly using large item banks, test developers can employ heuristics. In general, heuristics provide varying degrees of precision depending on the process chosen and the constraints used. (For a more complete discussion of the uses and implications of various heuristics, see Timminga et al., 1996.) Stocking, Swanson, and Pearlman (1991) presented the following heuristics for facilitating feasible solutions, based on the work of van der Linden and Boekkooi-Timminga (1989).
1. Crude linear rounding. A solution to the relaxed problem is obtained, in which the decision variables are allowed to take on non-integer values 0 ≤ xi ≤ 1, and the resulting values are rounded to 0 or 1. A problem with this approach is that it may not reach optimality or satisfy all of the constraints.
2. Improved linear rounding. This approach differs from crude linear rounding in that the decision variables are ranked in descending order, and the first n (n = the number of items) are re-rounded to 1 and selected into the test. Again, optimality is not guaranteed, nor may all the constraints be satisfied.
3. Optimal rounding. First the relaxed linear solution is obtained (0 ≤ xi ≤ 1); then all variables equal to 0 or 1 are fixed. Optimality is then pursued by applying the branch-and-bound method to the remaining variables. Again, this method does not guarantee an optimal solution to the problem.
4. First 0-1 solution. Branch-and-bound methods are used to search for a global solution by discarding many local solutions. After the first 0-1 integer solution is found, the method stops. If a solution to the constraints exists, it is found, although it may not be optimal.
5. Second 0-1 solution. This approach is similar to the first 0-1 solution, except that the method terminates after the second integer solution is found. Likewise, optimality is not guaranteed.

The application of the simplex and the branch-and-bound methods to test assembly problems will produce an optimal solution, if one exists. The main limitation of these approaches, in terms of the asTTle projects, is the computing time needed to reach optimality. Heuristics greatly increase the ability of the test constructor to assemble tests using LP methods, and therefore make LP feasible for test assembly problems. Of the heuristics outlined above, van der Linden and Boekkooi-Timminga (1989) suggest that the optimal rounding method provides excellent results that are close to optimal in reasonable computing time. The drawback of heuristics is that they lead to varying degrees of precision in the final solution, and this should be borne in mind when examining results.
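The rounding heuristics can be illustrated with the same single-constraint relaxation: the fractional-knapsack solution plays the role of the relaxed LP, crude rounding simply rounds it, and optimal rounding fixes the variables already at 0 or 1 before any branch-and-bound work. This is a minimal, self-contained sketch with hypothetical item values, not the Stocking et al. implementation.

```python
def relaxed_solution(info, time, budget):
    """Relaxation of: maximize sum(info*x) s.t. sum(time*x) <= budget, 0 <= x <= 1.
    With a single resource constraint this is the fractional knapsack: fill by
    information-per-minute ratio (time > 0 assumed); at most one x is fractional."""
    x = [0.0] * len(info)
    room = float(budget)
    for j in sorted(range(len(info)), key=lambda j: info[j] / time[j], reverse=True):
        take = min(1.0, room / time[j])
        x[j] = take
        room -= take * time[j]
        if room <= 0:
            break
    return x

def crude_rounding(x):
    """Round each relaxed value to the nearer of 0 and 1 (may violate the budget)."""
    return [1 if v >= 0.5 else 0 for v in x]

def optimal_rounding_fixes(x):
    """Fix every variable already at exactly 0 or 1; only the remaining
    (fractional) variables would be passed on to branch-and-bound."""
    return {j: int(v) for j, v in enumerate(x) if v in (0.0, 1.0)}
```

On info = [10, 7, 6, 4], time = [5, 4, 3, 2], budget = 9, the relaxed solution is [1.0, 0.0, 1.0, 0.5]; crude rounding yields [1, 0, 1, 1], which exceeds the 9-minute budget and so illustrates why crude rounding may violate constraints, while optimal rounding fixes items 0, 1, and 2 and leaves only item 3 to be branched on.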
Infeasibility

An issue that can occur in LP is infeasibility. That is, no feasible solution can be found that fully meets the test constraints. Although this is not an uncommon problem, there are methods for overcoming it. Swanson and Stocking (1993) note that the studies reported in the literature use relatively small item banks of fewer than 1,000 items, with as few as 50 constraints. In practice, however, item banks tend to be much larger, with many more complex constraints, and therefore the probability of finding a feasible solution increases. The message is clear: the larger the item bank, the lower the potential for infeasibility.

Timminga and Adema (1995, p. 422) suggest that "the basic cause of infeasibility is that the set of test requirements is in contradiction with the characteristics of the item bank." An important consideration in the test development phase, therefore, is the identification of the test specifications and how these relate to the structure of the item bank. Test specifications need to be thoroughly reviewed prior to the application of LP methods in order to decrease the probability of infeasibility. If infeasibility is nevertheless encountered, then practical solutions are required.

The cause of infeasibility is often difficult to detect, as computer software is unable to identify the problem in the constraints. More often, the problem can be traced manually to constraints that are impractical for the item bank. For example, if the item bank has 10 items relating to surface learning, and the test specifications demand that 12 items be surface learning, then it is impossible to obtain a solution. In this obvious example, the test constructor can identify the problem and rectify the constraints.

Making changes to the right-hand side of a constraint is one method for increasing the solution space so as to produce feasible solutions. It is important to understand that some changes, for example to the right-hand side of Equation 15, will not increase the feasible region, as they are unrelated to the binding part of the solution space. Changing the values in Equations 35 and 36 will, however, assist in increasing the solution space, and therefore may result in a feasible solution being obtained. Some manipulation (i.e., an increase or decrease) of n on the right-hand side is needed to bring about any increase in the solution space.

Another approach to increasing the solution space is to examine closely the equality and inequality signs. In general, the use of inequality signs can increase the solution space if the cause of infeasibility is due to that constraint. Replacing inequality signs with equality signs can create infeasibility, as there may not be an exact solution to such a constraint. Timminga and Adema (1995) suggest that equality signs should be used only where constraints are applied to integer-valued variables. With item sets, one should attempt to increase the number of items associated with a common stimulus to reduce potential infeasibility problems. Thus, for set-based items, the implication for the asTTle projects is to have a large item bank with many items associated with each stimulus.

As previously stated, knowing the qualities of the item bank can help to identify sources of infeasibility. If, for example, a constraint is too severe, then it should be either modified or deleted in such a way as to increase the feasible region. Combining constraints will also enlarge the solution area. For example, if no solution could be found to the model specified above (i.e., Equations 22 to 32), then Equations 23 and 25 can be combined into a single constraint (Equation 66). Combining these two constraints results in the same number of items being selected, but it allows more items to be drawn from the curricular area that is best able to provide information that meets the TIF. It may be that one curricular area can provide 12 items and the other area 8 items, and so meet all the test specifications. Test specificity is somewhat compromised in this example, but not to a great extent. The same approach can also be applied to combining constraints on item sets to avoid infeasibility. In general, the problem with combining constraints is that test specificity may be compromised. As a consequence, if constraints are combined, then the final solution should be examined to determine its acceptability in terms of the original test specifications.

Problems may also be encountered if the objective function and one or more of the constraints are in conflict. For example, if items are designated not to be in the test (assigned 0, and left out of the problem), then modification of the objective function or the constraints is required to take this into account.

Although the literature on infeasibility in test assembly problems is limited, Timminga and Adema (1995) suggest the following analytical strategy to overcome infeasibility.
1. Check that the constraints are compatible with the item bank.
2. Solve the relaxed LP problem. If infeasibility occurs, then check the constraints, using some of the approaches outlined above to modify them.
3. If the relaxed problem can be solved, then solve the 0-1 LP problem.
4. If the 0-1 LP problem has a solution, the test is specified.
5. If the relaxed problem cannot be solved, then check the constraints.
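The first step of this strategy, checking the constraints against the item bank, can be automated before any solver runs: for each categorical lower-bound constraint, count the eligible items in the bank and flag demands the bank cannot possibly meet. A minimal sketch, with hypothetical field names for the bank records:

```python
def check_lower_bounds(bank, demands):
    """bank: list of dicts describing items (hypothetical attribute names);
    demands: list of (attribute, value, minimum_count) lower-bound constraints.
    Returns the demands the bank cannot possibly satisfy, with the count
    of items actually available for each."""
    impossible = []
    for attribute, value, minimum in demands:
        available = sum(1 for item in bank if item.get(attribute) == value)
        if available < minimum:
            impossible.append((attribute, value, minimum, available))
    return impossible
```

Run against the earlier example of a bank with only 10 surface-learning items, a demand for 12 surface items is flagged immediately, before any LP computation is attempted.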
6. If the 0-1 LP problem is still infeasible, then a heuristic method should be applied to the problem.
7. If infeasibility continues with the 0-1 LP problem, then the test constructor should modify the test specifications.

In sum, problems with infeasibility will generally be traceable to incompatibility with the item bank, or to constraints that are in conflict with one another or with the objective function. Most infeasibility can be overcome by knowing the structure of the item bank and by setting reasonable test specifications. Timminga and Adema (1995) point out that if the above procedures are followed, then the test constructor should always be able to assemble a test.

Conclusions

The review presented above suggests that LP is highly applicable to the asTTle projects, as it provides a search mechanism that will furnish teacher-specified tests with a high degree of precision. Linear programming simultaneously satisfies both the content and the statistical attributes specified by the teacher, through the use of an objective function and a series of linear constraints. As the teacher choices will be limited (difficulty, curricular area, level of processing, etc.), writing these in LP form should not be too problematic.

An issue for the computer programmers of the asTTle projects will be to set an objective function that conforms closely to the teachers' test specifications in terms of the range of ability. In general, three to five ability points should be sufficient for the range of the TIF. With set-based items, setting the objective function should not be too problematic, given that the maximin model (van der Linden, 2000; van der Linden & Boekkooi-Timminga, 1989) allows the TIF to be specified to fall between an upper and a lower bound. The task of the computer programmers will be to determine the width of this interval so as to alleviate potential infeasibility. Finding a set of values that decreases the potential for infeasibility but still maximizes information will be essential to the success of the asTTle projects. Setting these values will depend on the quality of the items and the amount of information they have. One method for determining the upper and lower bounds on the TIF would be to draw repeated random samples of 40 items to determine the upper and lower values of the TIF, which could then serve as the values for the TIF in all tests.

The practical constraints presented above should provide the computer programmers with the relevant information to formulate the teachers' choices (i.e., curricular area, item format, administration time, level of processing, and item difficulty) as linear constraints. The set-based items approach outlined by van der Linden (2000) allows the constraints to be designated between upper and lower bounds. As with the TIF, the use of upper and lower bounds on the practical constraints will give the computer programmers some flexibility when dealing with possible infeasibility issues.

Solving LP problems should not be problematic for the asTTle projects, as heuristics are available that should provide close-to-optimal results in reasonable computing time. Van der Linden and Boekkooi-Timminga (1989) state that as the size of the item bank increases, the amount of computing time needed to find an exact solution increases. In the future, items are likely to be added to the asTTle item bank, and therefore the adoption of heuristics, such as the optimal rounding heuristic, will allow accurate tests that meet detailed specifications to be generated quickly.

Recommendations

1. Have a detailed map of the item bank showing the quantitative and qualitative aspects of the items, and how these relate to the curricular areas (see Tables 1, 2, and 4, and also Timminga, van der Linden, & Schweizer, 1996).
2. Although there are seven ability levels to be covered across the asTTle projects, teachers should be constrained to choose no more than five ability levels, as increases beyond this number will likely result in difficulty in approximating the TIF, or result in infeasibility.
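The random-sampling method for setting TIF bounds can be sketched as follows. This is an illustrative sketch only: the item information function used is the two-parameter logistic form I(θ) = a²P(1−P), and the item parameters generated below are random stand-ins for a real calibrated bank.

```python
import math
import random

def item_information(a, b, theta):
    """2PL item information: I(theta) = a^2 * P * (1 - P),
    where a is discrimination and b is difficulty."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def tif_bounds(bank, theta_points, test_length=40, samples=1000, seed=1):
    """Draw repeated random samples of `test_length` items from `bank`
    (a list of (a, b) parameter pairs) and record the smallest and largest
    total test information observed at each theta point."""
    rng = random.Random(seed)
    bounds = {}
    for theta in theta_points:
        info = [item_information(a, b, theta) for a, b in bank]
        totals = []
        for _ in range(samples):
            sample = rng.sample(range(len(bank)), test_length)
            totals.append(sum(info[j] for j in sample))
        bounds[theta] = (min(totals), max(totals))
    return bounds
```

The resulting (lower, upper) pair at each θ point could then serve as the fixed TIF interval for all assembled tests, as suggested above; in practice the interval would be narrowed so that it remains attainable without admitting uninformative tests.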
3. As the asTTle projects will use mainly set-based items, the method for the simultaneous selection of items and sets outlined by van der Linden (2000) is recommended. This method is shown to provide a more accurate approximation of the objective function in reasonable computing time using a large number of constraints (e.g., 24 stimuli with 5–12 items per stimulus (498 items) and over 200 constraints across 2 tests; the LP models were solved in 1–5 minutes).
4. For set-based items, the TIF should be specified as a relative shape (the maximin model developed by van der Linden & Boekkooi-Timminga, 1989, and van der Linden, 2000). This is the most flexible approach, given that the TIF can be set to fall between lower and upper bounds. Determining the width of the interval will be a critical aspect of the computer programming.
5. For solving the binary 0-1 and integer problems, the optimal rounding method is the more flexible approach to finding feasible solutions in reasonable computing time. This method will be invaluable for finding accurate and fast solutions to the teachers' choices, as the size of the item bank for the asTTle projects is likely to increase in the coming years.
References

Adema, J. J. (1990). The construction of customized two-stage tests. Journal of Educational Measurement, 27(3), 241–253.
Adema, J. J. (1992a). Implementations of the branch-and-bound method for test construction problems. Methodika, 6, 99–117.
Adema, J. J. (1992b). Methods and models for the construction of weakly parallel tests. Applied Psychological Measurement, 16(1), 53–63.
Adema, J. J., Boekkooi-Timminga, E., & Gademann, A. J. R. M. (1992). Computerized test construction. In M. Wilson (Ed.), Objective Measurement: Theory into Practice (Vol. 1, pp. 261–273). Norwood, NJ: Ablex.
Adema, J. J., Boekkooi-Timminga, E., & van der Linden, W. J. (1991). Achievement test construction using 0-1 linear programming. European Journal of Operational Research, 55, 103–111.
Adema, J. J., & van der Linden, W. J. (1989). Algorithms for computerized test construction using classical item parameters. Journal of Educational Statistics, 14(3), 279–290.
Armstrong, R. D., Jones, D. H., & Wang, Z. (1994). Automated test construction using classical test theory. Journal of Educational Statistics, 19(1), 73–90.
Baker, F. B., Cohen, A. S., & Barmish, B. R. (1988). Item characteristics of tests constructed by linear programming. Applied Psychological Measurement, 12(2), 189–199.
Berger, M. P. F. (1998). Optimal design of tests with items with dichotomous and polytomous response formats. Applied Psychological Measurement, 22(3), 248–258.
Berger, M. P. F., & Mathijssen, E. (1997). Optimal test designs for polytomously scored items. British Journal of Mathematical and Statistical Psychology, 50, 127–141.
Berger, M. P. F., & Veerkamp, W. J. J. (1996). A review of selection methods for optimal test design. In G. Engelhard & M. Wilson (Eds.), Objective Measurement: Theory into Practice (Vol. 3, pp. 437–455). Norwood, NJ: Ablex.
Birnbaum, A. (1968). Some latent trait models. In F. M. Lord & M. R. Novick (Eds.), Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.
Boekkooi-Timminga, E. (1987). Simultaneous test construction by zero-one programming. Methodika, 1(2), 101–112.
Boekkooi-Timminga, E. (1990a). The construction of parallel tests from IRT-based item banks. Journal of Educational Statistics, 15(2), 129–145.
Boekkooi-Timminga, E. (1990b). A cluster-based method for test construction. Applied Psychological Measurement, 14(4), 341–354.
Boekkooi-Timminga, E. (1993). Computer-assisted test construction. Social Science Computer Review, 11(3), 292–300.
de Gruijter, D. N. M. (1990). Test construction by means of linear programming. Applied Psychological Measurement, 14(2), 175–181.
Fletcher, R. B., & Hattie, J. A. (in review). Automated test assembly for polytomous items: The application of linear programming to sports psychology test construction.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. London: Sage.
Harris, D. (1996). Comparison of 1-, 2-, and 3-parameter models. Educational Measurement: Issues and Practice, 15, 157–163.
Ignizio, J. P., & Cavalier, T. M. (1994). Introduction to Linear Programming. Englewood Cliffs, NJ: Prentice-Hall.
Lord, F. M. (1952). A theory of test scores. Psychometric Monograph, 7.
Lord, F. M. (1977). Practical applications of item characteristic curve theory. Journal of Educational Measurement, 14, 117–138.
Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Lawrence Erlbaum.
Sanders, P. F., Theunissen, T. J. J. M., & Baas, S. S. (1996). The optimization of decision studies. In G. Engelhard & M. Wilson (Eds.), Objective Measurement: Theory into Practice (Vol. 3, pp. 301–312). Norwood, NJ: Ablex.
Stocking, M. L., Swanson, L., & Pearlman, M. (1991). Automated item selection using item response theory (Research Report 91-9). Princeton, NJ: Educational Testing Service.
Stocking, M. L., Swanson, L., & Pearlman, M. (1993). Application of an automated item selection method to real data. Applied Psychological Measurement, 17(2), 167–176.
Sultan, A. (1993). Linear Programming: An Introduction with Applications. Boston: Academic Press.
Swanson, L., & Stocking, M. L. (1993). A model and heuristic for solving very large item selection problems. Applied Psychological Measurement, 17(2), 151–166.
Theunissen, T. J. J. M. (1985). Binary programming and test design. Psychometrika, 50(4), 411–420.
Theunissen, T. J. J. M. (1986). Some applications of optimization algorithms in test design and adaptive testing. Applied Psychological Measurement, 10(4), 381–389.
Timminga, E., & Adema, J. J. (1995). Test construction from item banks. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch Models: Foundations, Recent Developments, and Applications. New York: Springer-Verlag.
Timminga, E., & Adema, J. J. (1996). An interactive approach to modifying infeasible 0-1 linear models for test construction. In G. Engelhard & M. Wilson (Eds.), Objective Measurement: Theory into Practice (Vol. 3, pp. 419–436). Norwood, NJ: Ablex.
Timminga, E., van der Linden, W. J., & Schweizer, D. A. (1996). CONTEST (Version 2) [Computer program]. Groningen: ProGAMMA.
van der Linden, W. J. (1994). Optimum design in item response theory: Test assembly and item calibration. In G. H. Fischer & D. Laming (Eds.), Contributions to Mathematical Psychology, Psychometrics, and Methodology (pp. 308–318). New York: Springer-Verlag.
van der Linden, W. J. (1996a). Assembling tests for the measurement of multiple traits. Applied Psychological Measurement, 20(4), 373–388.
van der Linden, W. J. (2000). Optimal assembly of tests with item sets. Applied Psychological Measurement, 24(3), 225–240.
van der Linden, W. J., & Boekkooi-Timminga, E. (1989). A maximin model for test design with practical constraints. Psychometrika, 54(2), 237–247.
Wolsey, L. A. (1998). Integer Programming. New York: John Wiley & Sons.
Appendix One – Item Bank Structure

Appendix Table 1
Item Bank Structure for the Closed-Choice Items

Personal Reading. Items in area: 1–56. C-C items: 1–42. At θ1–θ7: 1–6; 7–12; 13–18; 19–24; 25–30; 31–36; 37–42. Deep processing: 1–3, 7–9, 13–15, 19–21, 25–27, 31–33, 37–39. Surface processing: 4–6, 10–12, 16–18, 22–24, 28–30, 34–36, 40–42.
Close Reading. Items in area: 57–112. C-C items: 57–98. At θ1–θ7: 57–62; 63–68; 69–74; 75–80; 81–86; 87–92; 93–98. Deep processing: 57–59, 63–65, 69–71, 75–77, 81–83, 87–89, 93–95. Surface processing: 60–62, 66–68, 72–74, 78–80, 84–86, 90–92, 96–98.
Expressive Writing. Items in area: 113–168. C-C items: 113–154. At θ1–θ7: 113–118; 119–124; 125–130; 131–136; 137–142; 143–148; 149–154. Deep processing: 113–115, 119–121, 125–127, 131–133, 137–139, 143–145, 149–151. Surface processing: 116–118, 122–124, 128–130, 134–136, 140–142, 146–148, 152–154.
Poetic Writing. Items in area: 169–224. C-C items: 169–210. At θ1–θ7: 169–174; 175–180; 181–186; 187–192; 193–198; 199–204; 205–210. Deep processing: 169–171, 175–177, 181–183, 187–189, 193–195, 199–201, 205–207. Surface processing: 172–174, 178–180, 184–186, 190–192, 196–198, 202–204, 208–210.
Transitional Writing. Items in area: 225–280. C-C items: 225–266. At θ1–θ7: 225–230; 231–236; 237–242; 243–248; 249–254; 255–260; 261–266. Deep processing: 225–227, 231–233, 237–239, 243–245, 249–251, 255–257, 261–263. Surface processing: 228–230, 234–236, 240–242, 246–248, 252–254, 258–260, 264–266.
Exploring Writing. Items in area: 281–336. C-C items: 281–322. At θ1–θ7: 281–286; 287–292; 293–298; 299–304; 305–310; 311–316; 317–322. Deep processing: 281–283, 287–289, 293–295, 299–301, 305–307, 311–313, 317–319. Surface processing: 284–286, 290–292, 296–298, 302–304, 308–310, 314–316, 320–322.
Thinking Critically. Items in area: 337–392. C-C items: 337–378. At θ1–θ7: 337–342; 343–348; 349–354; 355–360; 361–366; 367–372; 373–378. Deep processing: 337–339, 343–345, 349–351, 355–357, 361–363, 367–369, 373–375. Surface processing: 340–342, 346–348, 352–354, 358–360, 364–366, 370–372, 376–378.
Processing Information. Items in area: 393–448. C-C items: 393–434. At θ1–θ7: 393–398; 399–404; 405–410; 411–416; 417–422; 423–428; 429–434. Deep processing: 393–395, 399–401, 405–407, 411–413, 417–419, 423–425, 429–431. Surface processing: 396–398, 402–404, 408–410, 414–416, 420–422, 426–428, 432–434.
Note: C-C denotes closed-choice items.
Appendix Table 2
Item Bank Structure for the Open-Ended Items

Personal Reading. Items in area: 1–56. O-E items: 43–56. At θ1–θ7: 43–44; 45–46; 47–48; 49–50; 51–52; 53–54; 55–56. Deep processing: 43, 45, 47, 49, 51, 53, 55. Surface processing: 44, 46, 48, 50, 52, 54, 56.
Close Reading. Items in area: 57–112. O-E items: 99–112. At θ1–θ7: 99–100; 101–102; 103–104; 105–106; 107–108; 109–110; 111–112. Deep processing: 99, 101, 103, 105, 107, 109, 111. Surface processing: 100, 102, 104, 106, 108, 110, 112.
Expressive Writing. Items in area: 113–168. O-E items: 155–168. At θ1–θ7: 155–156; 157–158; 159–160; 161–162; 163–164; 165–166; 167–168. Deep processing: 155, 157, 159, 161, 163, 165, 167. Surface processing: 156, 158, 160, 162, 164, 166, 168.
Poetic Writing. Items in area: 169–224. O-E items: 211–224. At θ1–θ7: 211–212; 213–214; 215–216; 217–218; 219–220; 221–222; 223–224. Deep processing: 211, 213, 215, 217, 219, 221, 223. Surface processing: 212, 214, 216, 218, 220, 222, 224.
Transitional Writing. Items in area: 225–280. O-E items: 267–280. At θ1–θ7: 267–268; 269–270; 271–272; 273–274; 275–276; 277–278; 279–280. Deep processing: 267, 269, 271, 273, 275, 277, 279. Surface processing: 268, 270, 272, 274, 276, 278, 280.
Exploring Writing. Items in area: 281–336. O-E items: 323–336. At θ1–θ7: 323–324; 325–326; 327–328; 329–330; 331–332; 333–334; 335–336. Deep processing: 323, 325, 327, 329, 331, 333, 335. Surface processing: 324, 326, 328, 330, 332, 334, 336.
Thinking Critically. Items in area: 337–392. O-E items: 379–392. At θ1–θ7: 379–380; 381–382; 383–384; 385–386; 387–388; 389–390; 391–392. Deep processing: 379, 381, 383, 385, 387, 389, 391. Surface processing: 380, 382, 384, 386, 388, 390, 392.
Processing Information. Items in area: 393–448. O-E items: 435–448. At θ1–θ7: 435–436; 437–438; 439–440; 441–442; 443–444; 445–446; 447–448. Deep processing: 435, 437, 439, 441, 443, 445, 447. Surface processing: 436, 438, 440, 442, 444, 446, 448.
Note: O-E denotes open-ended items.


Appendix Table 3
Hypothetical Number of Items at Each θ across the Three Levels, Based on an Item Bank of 448 Items

There are 64 items at each of the seven ability points: θ1 = -3, θ2 = -2, θ3 = -1, θ4 = 0, θ5 = +1, θ6 = +2, and θ7 = +3. Each level spans five adjacent θ points, labelled very easy, easy, average, hard, and very hard: Level 1 covers θ1 to θ5, Level 2 covers θ2 to θ6, and Level 3 covers θ3 to θ7.
Note: θ2–θ6 share the same items due to the overlapping of the grade/ability levels.

Appendix Table 4
Item Attribute and Constraint Levels for Set-Based Items

Item level. Attributes: can be quantitative or categorical; e.g., content, processing level, item parameters, item format, word count, response time. Constraints: inclusion/exclusion of items with certain attribute values in the test; e.g., "the first five items should be surface processing".
Stimulus level. Attributes: can be quantitative or categorical; e.g., curricular area, processing level, reading passage. Constraints: inclusion/exclusion of stimuli with specific attribute values in the test; e.g., "no stimulus should be longer than 250 words".
Item set level. Attributes: mainly quantitative; e.g., number of items in the set. Constraints: control the distribution of categorical attributes, usually set between lower and upper bounds; e.g., "each item set should have at least three items at the deep processing level".
Test level. Attributes: mainly quantitative; e.g., test length, deviation of the test information function from the TIF. Constraints: distributions or functions of item or stimulus attributes; e.g., "no more than three open-ended questions should be in the test".
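As an illustration of how the constraint levels in Table 4 become linear (in)equalities, the two quoted stimulus- and set-level examples can be written as follows. The symbols w_s (word count of stimulus s) and V_s^deep (the deep-processing items in set s) are introduced here purely for illustration:

```latex
\begin{align*}
& z_s = 0 && \text{for every stimulus } s \text{ with } w_s > 250 \\
& \sum_{i \in V_s^{\text{deep}}} x_{is} \ge 3\, z_s && \text{for every set } s
\end{align*}
```

Multiplying the lower bound by z_s in the second constraint is what keeps the model consistent: the requirement is active only for sets whose stimulus is actually selected.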

Appendix Table 5
Hypothetical Item Bank with Set-Based Items

Personal Reading (PR). Items in area: 1–49. At θ1–θ7: 1–7; 8–14; 15–21; 22–28; 29–35; 36–42; 43–49. Deep processing: 1–4, 8–11, 15–18, 22–25, 29–32, 36–39, 43–46. Surface processing: 5–7, 12–14, 19–21, 26–28, 33–35, 40–42, 47–49.
Close Reading (CR). Items in area: 50–98. At θ1–θ7: 50–56; 57–63; 64–70; 71–77; 78–84; 85–91; 92–98. Deep processing: 50–53, 57–60, 64–67, 71–74, 78–81, 85–88, 92–95. Surface processing: 54–56, 61–63, 68–70, 75–77, 82–84, 89–91, 96–98.
Expressive Writing (EW). Items in area: 99–147. At θ1–θ7: 99–105; 106–112; 113–119; 120–126; 127–133; 134–140; 141–147. Deep processing: 99–102, 106–109, 113–116, 120–123, 127–130, 134–137, 141–144. Surface processing: 103–105, 110–112, 117–119, 124–126, 131–133, 138–140, 145–147.
Poetic Writing (PW). Items in area: 148–196. At θ1–θ7: 148–154; 155–161; 162–168; 169–175; 176–182; 183–189; 190–196. Deep processing: 148–151, 155–158, 162–165, 169–172, 176–179, 183–186, 190–193. Surface processing: 152–154, 159–161, 166–168, 173–175, 180–182, 187–189, 194–196.
Transitional Writing (TR). Items in area: 197–245. At θ1–θ7: 197–203; 204–210; 211–217; 218–224; 225–231; 232–238; 239–245. Deep processing: 197–200, 204–207, 211–214, 218–221, 225–228, 232–235, 239–242. Surface processing: 201–203, 208–210, 215–217, 222–224, 229–231, 236–238, 243–245.
Exploring Writing (EXW). Items in area: 246–294. At θ1–θ7: 246–252; 253–259; 260–266; 267–273; 274–280; 281–287; 288–294. Deep processing: 246–249, 253–256, 260–263, 267–270, 274–277, 281–284, 288–291. Surface processing: 250–252, 257–259, 264–266, 271–273, 278–280, 285–287, 292–294.
Thinking Critically (TC). Items in area: 295–343. At θ1–θ7: 295–301; 302–308; 309–315; 316–322; 323–329; 330–336; 337–343. Deep processing: 295–298, 302–305, 309–312, 316–319, 323–326, 330–333, 337–340. Surface processing: 299–301, 306–308, 313–315, 320–322, 327–329, 334–336, 341–343.
Processing Information (PI). Items in area: 344–392. At θ1–θ7: 344–350; 351–357; 358–364; 365–371; 372–378; 379–385; 386–392. Deep processing: 344–347, 351–354, 358–361, 365–368, 372–375, 379–382, 386–389. Surface processing: 348–350, 355–357, 362–364, 369–371, 376–378, 383–385, 390–392.
Note: C-C denotes closed-choice items.

