Dynamic Testing - Grigorenko and Sternberg


Psychological Bulletin Copyright 1998 by the American Psychological Association, Inc.

1998, Vol. 124, No. 1, 75-111 0033-2909/98/$3.00

Dynamic Testing

Elena L. Grigorenko and Robert J. Sternberg


Yale University

This article evaluatively reviews the literature on dynamic testing, a collection of testing procedures
designed to quantify not only the products or even the processes of learning but also the potential
to learn. The article considers a variety of approaches to dynamic testing and the strengths and
weaknesses of each. Moreover, the literature on each approach is reviewed and analyzed in terms
of the extent to which it fulfills the claims made for it. In all of these approaches, testing involves
learning at the time of test, rather than just static testing of what has been learned before. It is
concluded that dynamic testing has great potential for helping to understand people's potentials but
that its potential has yet to be realized fully.

Elena L. Grigorenko and Robert J. Sternberg, Department of Psychology, Yale University.
Work on this article was supported by the Partnership for Child Development of Oxford University, England, and by the U.S. Office of Educational Research and Improvement, U.S. Department of Education (Grant R206R50001). The findings and opinions expressed in this article do not reflect the positions or policies of the above mentioned agencies.
We are thankful to D. Tzuriel and J. Carlson for their attentive reading of the earlier version of this article.
Correspondence concerning this article should be addressed to Elena L. Grigorenko, Department of Psychology, Yale University, P.O. Box 208205, New Haven, Connecticut 06520-8205. Electronic mail may be sent to [email protected].

Conventional measures of cognitive skills quantify developed abilities. Thus, they indicate latent capacity only as it is realized in performance, which, in turn, is affected by many variables such as amount of education, test-taking skills, parental support, and so on. Often, one wishes to know the extent to which developed abilities reflect latent capacity and the extent to which they do not (in other words, the difference between latent capacity and developed abilities). Dynamic testing has been proposed as a way of uncovering this information. If it is successful, then it is also revolutionary. There is at last a way to reduce the effects of obstruction from all the environmental variables that can color performance and to quantify a person's true potential for growth from wherever he or she may happen to be cognitively at any one moment. Does dynamic testing deliver on its promise? If it does not yet quite deliver, what needs to be done so that it will? These are the questions addressed in this article.

The idea of quantifying the learning potential underlying the processes and products of learning initially appeared as a metaphor, as a product of wishful thinking. Wouldn't it be nice if researchers were equipped to quantify someone's potential rather than actualized abilities, something developing and modifiable rather than something developed and perhaps even fixed? Wouldn't it be nice if researchers could test people's ability to learn new things rather than just people's ability to demonstrate the knowledge they already have acquired?

Conventionally, attempts to operationalize this metaphor have been referred to as dynamic testing. The basic idea with such testing is to develop an evaluative model of children's potential and to substitute testing that is dynamic in nature for the static testing used within the framework of conventional psychometric tests. In other words, instead of quantifying the existing set of abilities and level of knowledge and viewing them as a basis for predicting children's subsequent cognitive development, dynamic testing has as its aim the quantification of the learning potential of the child during the acquisition of new cognitive operations.

Three major differences emerge when static and dynamic paradigms are compared. First, dynamic testing puts an emphasis on quantifying the psychological processes involved in learning and change, whereas static testing primarily is concerned with products formed as a result of pre-existing skills. The second difference has to do with the role of feedback. In static testing, an examiner presents a graded sequence of problems, and the test taker responds to each of the problems. There is no feedback from examiner to test taker regarding quality of performance. In dynamic testing, feedback is given. Similarly, the examiner presents a sequence of progressively more challenging tasks, but after the presentation of each task, the examiner gives the test taker feedback and continues with this feedback in successive iterations until the examinee either solves the problem or gives up. Testing thus joins with instruction, and the test taker's ability to learn is quantified while she or he learns. Third, the two paradigms differ in the quality of examiner-examinee relationships. Specifically, within the framework of dynamic testing, the test situation and the type of examiner-examinee relationship are modified from the one-way traditional setting of the conventional psychometric approach (in which neutrality and lack of involvement on the part of the experimenter are considered necessary to ensure standard measurement conditions) to form a two-way interactive relationship between the examiner and the examinee. This tester-testee interaction is individualized for each child: The conventional attitude of neutrality is thus replaced by an atmosphere of teaching and helping.

Thus, dynamic testing is based on the link between testing and intervention and examines the processes of learning as well as its products. By embedding learning in evaluation, dynamic testing assumes that the examinee can start at the zero (or almost zero) point of having certain knowledge and that teaching will provide virtually all the necessary information for mastery of
the tested knowledge. In other words, what is tested is not just previously acquired knowledge but the capacity to master, apply, and reapply knowledge taught in the dynamic testing situation. This view of the testing procedure underlies our use of the terms testing of learning potential and dynamic testing in this article.

In our interpretation, dynamic testing is part of a larger process referred to as dynamic assessment; testing and assessment here are not synonyms. Dynamic testing, along with other types of evaluations (e.g., observations and judgments; Salvia & Ysseldyke, 1981), is only one of many procedures used in assessment. This view of dynamic testing justifies our somewhat restricted conception of the tester-testee interaction. Because we define our undertaking as an attempt to review dynamic testing (not dynamic assessment), we correspondingly limit ourselves to a narrowly conceived interpretation of the tester-testee interaction. Broadly defined dynamic assessment is naturally linked with intervention. In essence, the goal of dynamic assessment is to evaluate, to intervene, and to change. The goal of dynamic testing, however, is much more modest: It is to see whether and how the subject will change if an opportunity is provided. In some approaches to dynamic testing, the tester-testee interaction component is limited to providing simple feedback, whereas in others, this interactive link exceeds simple feedback boundaries and takes the form of targeted intervention.

Notwithstanding the importance of the endeavor and of allocating significant resources to its realization, multiple attempts to quantify learning potential and to transform such testing into robust psychological diagnostic tools have not produced consistent results. Nevertheless, the idea of such testing is so appealing that, despite its lack of empirical validation, it has been widely discussed and fairly widely used. However, the paucity of published empirical data on the reliability and validity of dynamic testing as well as (for some approaches) insufficient detail in the presentation of methods, which has made replication difficult, have led the scientific community to pay insufficient attention to the power of the conclusions that can be drawn legitimately in this area. In fact, there have been only a handful of reviews of dynamic testing studies published in peer-reviewed journals (e.g., Elliott, 1993; Jitendra & Kameenui, 1993; Laughon, 1990; Missiuna & Samuels, 1988), and most of them have centered on the educational and clinical applicability of dynamic testing rather than on the underlying psychological models and hard empirical data yielded by such testing. This article attempts to fill in this gap. It summarizes current literature on dynamic testing and discusses the achievements and limitations of various approaches that have attempted to measure people's learning potential. It then proposes a three-prong approach to testing learning potential.

We wish to state up front that our goal is not to drive a stake through the heart of dynamic testing. On the contrary, we endorse the concept and are attempting to construct dynamic tests ourselves. Our specific concern is with the paucity of evidentiary support for the utility of the operationalizations of the constructs that have been proposed to date.

The Need for Dynamic Testing and the Attempts to Meet This Need

Rationale

Russian psychologist Sergei Rubinstein (1946) wrote that in order for an educator to evaluate students' ability to learn, the educator needs to teach students something and then to observe their learning. People draw conclusions about other people's ability to learn (their learning potential) all the time. Experts in different fields are able to predict the future performance of novices by first giving the novices a chance to participate in professional activities and then by evaluating their performance while they are learning. When a professor starts working with a student on research, the first step is usually some kind of informal pretest on the student's understanding of the problem to be solved. The student, who would have just started working on the problem, typically does not know much. Therefore, the professor suggests ideas, appropriate readings, and issues on which to concentrate. After a series of subsequent visits and discussions based on the learned material, the professor has enough information to make a preliminary judgment about the learning potential of the student. Similarly, an experienced car mechanic trying to train apprentices in the garage gradually involves them in the operation and lets them handle more and more difficult tasks, observing and correcting the novices' performance. In this way, the expert evaluates the ability of the novice to learn.

This kind of implicit prediction of a novice's future achievement, based on learning during an apprenticeship, occurs frequently in everyday life. Now imagine a test that measures the ability to learn something new. For example, a person is about to make a career decision. He takes two tests in different fields; let us say, biology and psychology. The tests are designed in such a way that they initially assess his unassisted performance, then his performance while working on problems with experts in each field, and then his individual performance when he is retested. He does equally well on the pretests, but he has learned much more successfully while working with the biology expert than with the psychology expert. Thus, his posttest biology performance is significantly better than his posttest psychology performance. These results might be interpreted as suggesting that the field of biology is more promising for him, that he can better realize his potential in this field. Thus, the test provides results that predict to some degree his future performance in the field. Alternatively, picture a child whose parents are recent immigrants to a new culture. Irrespective of his or her knowledge of English, if this child is given a conventional test that is not traditional in his or her culture, the child most likely is going to demonstrate a fairly low level of performance. However, if the same child is given a chance to be reevaluated after the test-specific intervention, his or her performance might be drastically different.

The most important application of dynamic testing perhaps has been in work with certain disadvantaged children who have performed exceptionally poorly on conventional static tests (e.g., Feuerstein, Rand, & Hoffman, 1979). The category disadvantaged (or sometimes, challenged), as opposed to advantaged (nonchallenged), is used to refer to a large class of pupils viewed as having unequal learning opportunities due to deficient previous education, to lack of a match in previous and current cultural and educational practices, or to learning disability or mental deficiency. The claim that these students should be tested dynamically is motivated by the belief that dynamic testing in its proper application should reduce educational inequalities by providing what are seen as more compassionate, fair, and equitable means for assessing students' learning capacities. For disadvantaged children, quantifying their learning in action, with the assistance of and under the supervision of an adult, might be the only way to evaluate their true level of functioning.

Concepts and Approaches

The idea of developing a methodological paradigm that goes beyond measurement of developed abilities and quantifies the potential that will be a main force in students' learning is an extraordinarily appealing idea to scientists and laypeople alike. A number of synonymous concepts, traditionally unified under the term dynamic testing/assessment (e.g., interactive testing/assessment, process testing/assessment, measuring the zone of proximal development, assisted testing/assessment, and tests of learning potential), have been suggested for this paradigm.

Although the concept of dynamic testing has been heavily researched only during the past 2 decades, the theoretical foundations for this methodology were proposed some time ago. For example, the creator of static testing, Alfred Binet (1909), advocated process assessment. He did not suggest, however, an instrument suitable for such an assessment. Thorndike (1924) argued for the necessity of measuring the ability to learn as part of intelligence. Buckingham (1921) suggested that the best measure of intelligence is one that takes into account the rate at which learning takes place, the products of learning, or both. Penrose (1934) wrote that "the ideal test in the study of mental deficiency would be one which investigates the ability to learn" (p. 49). Dearborn (1921) and DeWeerdt (1927) believed there was a need for tests involving and measuring actual learning and practice rather than just their results. Andre Rey (1934) suggested testing educability, and to this end he constructed some 400 tests. Most of them, however, were of a static nature. A German psychologist, Kern (1930), also attempted to conceptualize the measurement of learning ability. Kern criticized the psychometric tests of the times, stating that initial inhibitions frequently distort the results of the first testing session. What really should be a predictor of future success, according to Kern, is capacity for learning. Kern conducted an experiment in which intelligence tests were administered repeatedly for a duration of several weeks. He found a significant shift in the rank order of his test takers from the beginning of the training period to the fifth testing session, after which ranks remained fairly stable. He concluded that one-shot testing does not provide enough information for developing an adequate prognosis and might in fact lead to incorrect conclusions.

Credit for introducing the concept of dynamic testing to modern psychology is usually given to Lev Vygotsky (1934/1962). Actually, it is arguable who deserves credit for the modern concept of dynamic testing. Some research, such as that of Brown (e.g., Brown & Ferrara, 1985) and Guthke (e.g., Guthke, 1992), was directly derived from Vygotsky's theory (see Lidz, 1987), whereas other work (e.g., that of Feuerstein, Rand, & Hoffman, 1979) is presented as of independent origin (Kozulin & Falik, 1995). Whatever their antecedents, we formulate our analyses here on the conceptual rather than the historical antecedents of various approaches and therefore attempt to cluster research on the basis of theoretical and methodological dimensions. We do, however, assign historical priority to Vygotsky, whose theory appears to have been the first nearly complete theory of dynamic testing. Many of the implications of the theory, however, were worked out not by Vygotsky himself but by those who followed (e.g., Ginzburg, 1981; Vlasova, 1972).

Even though the theoretical background for dynamic testing was established, and the first experimental investigations were conducted in the 1930s and 1940s (e.g., Kern, 1930; Vygotsky, 1934/1962), extensive professional attention was devoted to this methodology only in the 1960s and 1970s, during a time of severe criticism of static testing. This interest led to intensive research in Israel (Feuerstein, Rand, & Hoffman, 1979) and in the United States (Brown & French, 1979; Budoff, 1975; Carlson, 1992; Lidz, 1987, 1991) on learning ability and dynamic testing.

This reestablished interest in dynamic testing has been attributed to the development of certain societal needs and to the formation of more mixed and even critical professional opinion on the usefulness of static testing (Tzuriel & Haywood, 1992). With regard to societal needs, researchers recognized the desirability of (a) more nearly culture-fair tests that could be used in work with immigrants for the purpose of integrating them into society, (b) tests that would be useful for comparing results obtained in culturally diverse populations, (c) developmental tests appropriate for testing of individuals with deprived educational experiences, and (d) the measurement of learning potential as distinct from what has been learned regardless of the culture, population, or social group of a tested individual. Such measurement is especially attractive when one considers the relatively modest-to-moderate forecasting power of most static ability tests for many predictive purposes (e.g., Sternberg, 1996).

The usage of the term dynamic testing involves multiple contexts. First, dynamic testing provides a basis for teaching and developing cognitive skills. In this context, teaching is a form of intervention, and school evaluations are forms of pre- and posttests. Whole pedagogical programs have been designed so that teaching would extend the child's ability by gradually expanding his or her skills to learn and work independently (Davydov, 1986; Jensen, Robinson-Zañartu, & Jensen, 1992; Kalmykova, 1975; Klein, 1987; Newman, Griffin, & Cole, 1989; Talyzina, 1995).

Second, the concept of dynamic testing deals with concrete methodologies for the testing of learning potential. Indeed, dynamic testing has come to be explored in a wide variety of testing contexts, including testing of the abilities of (a) job candidates (Coker, 1990; Downs, 1985; Robertson & Mindel, 1980), (b) children in social-learning situations (Zimmerman, as cited in Guthke, 1993), (c) persons with mental retardation (e.g., Ashman, 1985, 1992; Molina & Perez, 1993; Paour, 1992), (d) patients with brain damage (Baltes, Kühl, & Sowarka, 1992) and with various types of sensory handicaps (Keane, Tannenbaum, & Krapf, 1992), (e) aging individuals (Fernandez-Ballesteros, Juan-Espinosa, Colom, & Calero, 1997; Kliegl & Baltes, 1987; Kliegl, Smith, & Baltes, 1989), (f) candidates for undergraduate admissions (Shochet, 1992), (g) adults (Barr & Samuels, 1988) and children (e.g., Brownell, Mellard, & Deshler, 1993; Peña, Quinn, & Iglesias, 1992; Samuels, Tzuriel, & Malloy-Miller, 1989; Tzuriel, 1997) exhibiting learning difficulties, (h) gifted disadvantaged pupils (Bolig & Day, 1993; Borland & Wright, 1994; Hickson & Skuy, 1990), (i) immigrants to different cultures (Gutierrez-Clellen, Peña, & Quinn, 1995) and culturally diverse students (Coxhead & Gupta, 1988; Hamers, Hessels, & Van Luit, 1991; Hamers, Hessels, & Pennings, 1996; Luther, Cole, & Gamlin, 1996), (j) foreign-language learners (Frawley & Lantolf, 1985), (k) penitentiary inmates (Silverman & Waksman, 1992), and (l) preschoolers (Day, Engelhardt, Maxwell, & Bolig, 1997; Olswang & Bain, 1996; Spector, 1992). Of course, there have been other uses of dynamic testing as well.

Third, dynamic testing is a sociopolitical concept. For example, dynamic testing has had a long history in the former Soviet Union. Static testing was prohibited by state decree in 1936, because its goal, according to the Communist Party officials of that time, was to label children as deficient by estimating the current status of their abilities and without considering their future potential. Consequently, for a number of decades, dynamic testing was virtually the only paradigm accepted in Soviet psychology and remedial education, and when the concept of testing was used, it was assumed to be in the context of dynamic testing.

What are the most influential approaches to dynamic testing, and how are they related? We consider this question next.

Underlying Theories of the Testing of Learning Potential

The First Operationalization: Vygotsky, His Followers, and His Interpreters

We start our discussion with a description of the work of Lev Vygotsky, because Vygotsky's writings provide the context for contemporary approaches to dynamic testing (Lidz, 1995; Rutland & Campbell, 1996). The ideas of Vygotsky that are relevant to dynamic testing appeared in the context of his theory of higher mental functions (Vygotsky, 1931/1983).

One of the major concepts of this theory is that of the zone of proximal development (ZPD). One of Vygotsky's followers and colleagues, Leont'ev, in his discussions with Bronfenbrenner (cited in Bronfenbrenner, 1977), summarized the meaning of this concept by saying that it tries "to discover not how the child came to be what it is, but how it can become what it not yet is" (p. 528). Thus, the ZPD reflects development itself: It is not what one is but what one can become; it is not what has developed but what is developing. The ZPD is a purely social construct, because it exists only in social interaction and is created by the interaction. The developmental, interactive, and forward-looking nature of the ZPD resulted in its becoming one of the ideas of Vygotsky that has received the most attention in the West. According to Newman and Holzman (1993), the popularity of the ZPD is due to the fact that it (a) lends itself well to contemporary interests in social cognition and classroom interaction, (b) delves into the essence of learning and development, and (c) is an expression of the individual in society.

The concept of the ZPD is epistemologically complex and is not described or researched fully in Vygotsky's work (Ginzburg, 1981; Kozulin, 1990). Vygotsky discussed the multiple implications of the ZPD in the context of (a) mature-versus-maturing cognitive functions, (b) learning versus development, (c) the discrepancy between what the child can do independently versus in collaboration with others (adults and peers), and (d) the types of activities in which the ZPD is most likely to manifest itself. Below, we briefly sketch each of these implications.

In developing the idea of mature-versus-maturing cognitive functions, Vygotsky (1987) wrote:

the state of development is never defined only by what has matured. If a gardener decides only to evaluate the mature or harvested fruits of the apple tree, he cannot determine the state of his orchard. Maturing trees must also be taken into consideration. The psychologist must not limit his analysis to functions that have matured. He must consider those that are in the process of maturing. If he is to evaluate fully the state of the child's development, the psychologist must consider not only the actual level of development but the zone of proximal development. (pp. 208-209)

One can assess these maturing cognitive functions by setting up a collaborative effort between the child and others (adults or more capable peers) to provide a basis for estimating the discrepancy between what the child can do independently and what he or she can do with the help of others. In this context, the ZPD is viewed by Vygotsky as the distance between a child's "actual developmental level as determined by independent problem solving [and the higher level of] potential development as determined through problem solving under adult guidance or in collaboration with more capable peers" (Vygotsky, 1978, p. 86).

Vygotsky (1987) also wrote about the ZPD in the context of studying the relations between learning and development. Instruction is useful when it moves ahead of development and when it impels or wakens a whole series of functions that are in a stage of maturation and, thus, that still lie in the ZPD rather than having been actualized already.

The type of activity in which the ZPD is most likely to appear, according to Vygotsky (1978), is play. In play, a child always behaves beyond his average age, above his daily behavior; in play, it is as though he were "a head taller than himself" (Vygotsky, 1978, p. 102). McLane (1990) developed this idea, saying that play encourages the player to act as if he or she were already competent in the activity under consideration. Thus, he suggested, when children are learning how to write, their playing with the processes and forms of writing may result in a sense of ownership of this complex cultural activity, which makes them feel like they are writers long before they have the needed skills and knowledge to produce mature, culturally appropriate, and fully conventional writing.

It is important to note that, in contrast to the well-elaborated and grounded theoretical representation of the ZPD by Vygotsky, experimental validation of the construct is very scarce. Neither Vygotsky himself nor his immediate followers and colleagues conducted any systematic empirical validation of the ZPD. Thus, in the 1930s, this concept gained theoretical grounding but sorely needed compelling experimental validation.

Vygotsky's theoretical account of the ZPD and its various implications has spawned research in many countries, among them Russia (e.g., Davydov, 1986; Rubtsov, 1981; Vlasova, 1972), Germany (e.g., Guthke, 1993), and the United States (e.g., Rogoff & Wertsch, 1984). Two main lines of contemporary interpretation of Vygotsky's ZPD can be pointed out. One has to do with its sociological-pedagogical aspect; another has to do with an individualized means to improve the testing of a child's mental functioning (Wertsch & Tulviste, 1992).

Sociological-Pedagogical Interpretation of the ZPD

In the context of its sociological-pedagogical aspect, the ZPD is viewed as a dynamic region of sensitivity to learning
cultural skills that is created by more experienced members of the culture, has distinct historical and cultural qualities, and determines a child's learning and development within a given culture (Newman & Holzman, 1993; Rogoff, 1990). The research capitalizing on this aspect of the ZPD is heterogeneous in many aspects, including target populations of students, teaching methods, and the obtained results. Its style is qualitative rather than quantitative. Moreover, much of the work has been conducted at the ideological or the theoretical level or at the level of descriptive analysis and makes it difficult to conduct an evaluation of relevant empirical findings. What relevant literature there is can be classified into clusters.

One relatively small cluster of literature pertains to a broad interpretation of the ZPD as the main mechanisms of vertical transmission of cultural knowledge (Cole, 1985). Such understanding of the ZPD, according to Cole, is applicable to anthropological-psychological studies of socialization in different cultures. Moll and Greenberg (1990) illustrated this idea by using the concept of ZPD in their study of the means by which young children of the Mexican American community in Arizona master knowledge about household maintenance and showed that the initial steps of mastery are carried out in joint activities of children and adult housekeepers.

The second cluster includes studies on the pedagogical implications of the ZPD. In this context, the ZPD is viewed as being of such great importance that users of the ZPD see its manifestation in instructional settings as essential to good teaching. They believe that teaching should involve assisting performance through the ZPD (Tharp & Gallimore, 1988). From this point of view, teaching is interpreted as a kind of negotiation between teacher and student in which the teacher provides instruction that helps students take on additional responsibility for managing their own learning through activity (Cole, 1985). Teachers constantly "test the waters" (Wertsch, 1991) to see whether their students are ready to move to a new level of self-regulation. When students fail the test, teachers return to instruction that requires less developed thinking on the part of the students.

There have been a number of attempts to design teaching programs based on the concept of the ZPD. For example, Hedegaard (1990) conducted a study implementing the concept of the ZPD in a 3-year curriculum for school subjects (i.e., biology, history, and geography) in Danish school classrooms of third to fifth graders. Similarly, McNamee and her colleagues used the concept of the ZPD in devising programs to develop literacy in both preschoolers and learning-disabled children (McNamee, 1990; McNamee, McLane, Cooper, & Kerwin, 1985). Bodrova and Leong (1996) pulled together examples of and activities relevant to ZPD-based strategies of teaching and learning, all of which were developed and tested in different subject areas and at different stages of schooling.

In yet another educational use of the ZPD, Newman, Griffin, and Cole (1989) extended the pedagogical notion of the ZPD and renamed it the construction zone and referred to it as the
between expert and novice and later in the novice's independent activity. Initially, the novice not only lacks the skills necessary to carry out the tasks independently but also, and more important, does not understand the goal toward which a given situation is directed. Thus, in their school program, Newman et al. ensured that teaching took place in a Vygotskian manner, whereby children were introduced to a given task in such a way that "the goal and the procedure were simultaneously internalized in the course of the interaction" (p. 55). Generalizing from their school-based experiences, Newman et al. suggested that the concept of the ZPD refers to an interactive system within which people work on a problem that at least one of them could not solve effectively on his or her own. Thus, in their interpretation, the ZPD is a more general phenomenon, observed when two or more people with unequal expertise are jointly accomplishing a goal.

In Russia, this type of broad interpretation of the ZPD as a "place" in which development, stimulated by learning, takes place led to the formulation of the theory of systematic formation of mental actions and concepts (Gal'perin, 1966). In the context of this theory, teaching is viewed as a psychological experiment, the goal of which is to bring the student to a new, higher level of development. By conducting teaching as an experiment, that is, by determining the dependent (outcome) and the independent (treatment) variables in teaching, the researchers view teaching as the way systematically to influence and continually to monitor the child's cognitive and educational progress. The teaching is conducted in six stages (for details, see Haenen, 1996): (a) the motivational stage (preliminary introduction of the action to the learner and mobilization of the learning motive), (b) the orienting stage (construction of the orienting, schematic basis of the action), (c) the materialized stage (mastering the action by using material or materialized objects), (d) the stage of overt speech (mastering the action by speaking the steps out loud), (e) the stage of covert speech (mastering the action by speaking to oneself), and (f) the mental stage (transferring the action to the mental level or internalizing the steps necessary to the action). For example, to explain addition and subtraction, the teacher first tries to motivate students to master this action (Stage 1) then gives the learners a whole set of orienting elements to guide them in the execution of the action (e.g., explains what subtraction does and what components it includes; Stage 2) and then, using the subject matter as the basis for teaching, provides the students with a means to master the operation with mediating tools (e.g., objects, drawings, pictures; Stage 3). Then, at Stage 4, the children are taught to execute the action without any direct links to mediating tools by verbalizing the steps of addition and subtraction, which they originally learned by manipulating real objects. Similarly, at Stage 5, the action is still verbalized, but the learners are encouraged to whisper to themselves instead of speaking aloud. Eventually, whispering disappears, and the action moves entirely in the mind (Stage 6). Now the children can add and
zone between the thoughts of two people, a shared activity in subtract mentally; that is, the function taught has been mastered.
the frame of which interpsychological (shared-mental) pro- The followers of this approach state that their intention is not
cesses can take place. Cognitive change does not occur in a to install in the child a developmental stage or level but rather
closed, determined system. Rather, the child's cognitive system gradually to help the child move on to the next, higher stage of
opens up in the construction zone, in which the shared activity development (Talyzina, 1995). Gal'perin's theory has never
of constructing new (or advancing old) cognitive functions takes been implemented in the form of a complete curriculum but has
place. Cognitive tasks are first constructed in the interaction been applied in many subject-specific areas (e.g., reading and

arithmetic [El'konin, 1960] and math [Obukhova, 1972; Salmina & Kolmogorova, 1980]). A number of monographs (e.g., Talyzina, 1995) have presented mostly qualitative data indicating significant gains in children's school achievement when they were taught by Gal'perin's method, but no robust quantitative validity data have yet been presented.

Yet another Vygotsky-based Russian school program was developed by the group of Davydov in three Russian schools. Davydov (1986) combined Leont'ev's theory of activity and Vygotsky's ideas about the relationship between development and learning to create a theoretical approach that he called developing teaching. The general principles of this approach were used in the Russian, biology, mathematics, arts, and physics curricula in the elementary grades of Moscow School 91 and Khar'kov Schools 4 and 17 (see Dmitriev, 1997; Kozulin, 1984). The core of the program was the belief that students should master general reasoning (as a method of scientific analysis) in a given subject domain and that reasoning skills should later enable them to solve domain-specific concrete problems. Mastery of reasoning, according to Davydov, is possible even by the age of 6 or 7 if the child is engaged in an appropriate and carefully designed learning activity. Thus, the task of educators and psychologists is to design special reasoning-targeted learning-activity scenarios and to lead students through the process of mastering new concepts by guiding them from the abstract to the concrete, from the most general relationship characteristic of the mastered concept to its concrete, empirical manifestation. The main task of teaching is to ensure the identification of the concept and its representation in a symbolic form, which indicates the formation of a corresponding abstraction. Moreover, the teacher should lead the student through the exploration of the links between the abstraction and its various empirical manifestations and the established links between the original abstraction and abstractions of the second order.

All of these teacher-guided activities are conducted in the child's ZPD, and at the end point of the process, the original abstractions become concrete concepts, which are assumed to help students solve any empirical problem pertaining to a given subject. For example, the formation of the concept of real numbers is based on the child's mastery of the concept of quantity. The first-grade math course of School 91 started with the introduction of the concept of quantity through the concepts of equal to, more than, and less than. These concepts were given initially in their abstract form by means of letter formulas (e.g., a = b, a > b, b < a, a + c > b). At the next stage, the teacher introduced real-world comparisons and gave children problems comparing the lengths and weights of different objects. The teacher thereby ensured the establishment of a link between the real-world object and the child's ability to translate the quantity relationships into abstract forms (e.g., if the weight of one object is a and the weight of the other object is b, then a > b). At the next step, the children were expected to discover that immediate comparison is often not feasible and that they therefore had to find a mediating quantity that could be measured in relation to the two objects that were compared (e.g., if a > c and c > b, then a > b). Thus, the concept of mediation was introduced. The next stage involved the mastery of the concept of relation (e.g., if A/c = K and b < c, then A/b > K). The last stage of this section of the math curriculum was the mastery of the concepts of discrete and continuous. It was assumed that the mastery of the four concepts of quantity, mediation, relation, and discrete versus continuous results in the mastery of the concept of real numbers. The curriculum included sets of detailed practices and learning exercises based on Davydov's theory of learning activity. All of these activities were performed together with the teacher, that is, in the spectrum of the children's ZPD.

The 15-year-long intervention study showed significant positive differences in the short- and long-term academic achievements of students enrolled in the experimental program when compared with control students (Davydov, 1986). Moreover, related studies that investigated the development of general cognitive functions in experimental versus control students showed significant differences in metacognitive development as well as in the development of imagination, thinking, memory, and self-regulatory processes (Razvitie psikhiki shkol'nikov v protsesse uchebnoi deiatel'nosti, 1983). Though probably the best validated ZPD-based pedagogical program, Davydov's curricula are yet to be scrutinized by means of large-scale longitudinal analyses as well as by other independent researchers.

In general, the research conducted to date has not produced convincing quantitative empirical data to support the broad claim that ZPD-based teaching results in better educational and cognitive outcomes. Of course, we cannot ignore observed group differences in performance on logic games (with students from the experimental teaching group generalizing rules of the game significantly better; Davydov, Pushkin, & Pushkina, 1972), and we cannot dismiss students' overall positive affect and happiness, which have been reported on numerous occasions by many investigators mentioned above (see, e.g., Davydov, 1986; Newman, Griffin, & Cole, 1989). However, (usually unquantified) feelings of happiness do not necessarily translate to better school or life achievements. Unfortunately, most of the approaches have lacked hard data from carefully controlled studies regarding gains in achievement. Moreover, a very small proportion of these studies has been published in refereed journals; most of the accounts of the programs come from somewhat loose descriptions in monographs and book chapters, in which not much attention has been paid to quantitative aspects of the work in general and to program evaluation in particular. Such limited availability of quantitative information inevitably raises questions regarding various biases in available studies such as how samples were obtained, how and what statistics were applied, and whether there were adequate controls. In addition, to our knowledge, independent evaluations of the approaches described above have been scarce.

The ZPD as an Individualized Means of Improving Testing

The ZPD can be viewed as a means to improve testing of individual mental functioning. Within this framework, cognitive skills are taught initially in the external space formed between the teacher and the student and then are internalized by the student so that they become a part of the learning-disabled child's own personal repertoire (Das & Conway, 1992; Levina, 1968). For example, in Russia, this aspect of Vygotsky's ideas took the form of research developments in Soviet defectology (the term used in Soviet psychology to signify research on handicapped children; see Grigorenko, 1998). Soviet defectology considered the responsiveness of children to prompts as a crucial basis for separating mentally and educationally retarded children from normal children (e.g., Vlasova & Pevsner, 1971). Many Russian remedial programs are based on the ideas of the ZPD and the careful monitoring of the intervention within the ZPD (e.g., Goncharova, 1990; Goncharova, Akshonina, & Zarechnova, 1990; Nikolaeva, 1995; Pozhilenko, 1995; Spirova & Litvinova, 1988). In the United States, this notion of the ZPD as a means to improve testing to maximize the effectiveness of teaching has been used in intervention programs such as reciprocal teaching (Palincsar & Brown, 1984, 1988) and in the context of information-integration theory (Das, Kirby, & Jarman, 1979), in which teaching is highly individualized and targeted at minimizing the difference between assisted and nonassisted performance.

The ZPD thereby provides a way to create competence before performance (Cazden, 1981). It is this perspective on the ZPD that has formed the basis for the development of the dynamic-interactive testing paradigm that is discussed in detail in the following sections.

Thus, Vygotsky's approach launched research in two major directions: one that has studied social methods of transmitting knowledge (including pedagogy) and another that has dealt with enhancing and improving the quantification of individual cognitive functioning. Subsequent developments of Vygotsky's original ideas have modified initial concepts and methodologies, but all these later interpretations took off from Vygotsky's work as the main launching point for their research.

The boundary lines between the two (teaching-oriented and individual-testing-oriented) perspectives are somewhat artificial. These two subfields of dynamic testing have never been completely independent of one another, and, quite to the contrary, researchers have crossed these boundaries and extended work developed in one perspective into the other perspective as well (e.g., educational applications of Feuerstein's ideas by Jensen et al., 1992). Dynamic testing has been used for close to 3 decades with the goals of both testing and education, largely in the context of working with socially deprived, retarded, and learning-disabled children and adolescents. Recently, however, there have been more and more attempts to apply these ideas to regular school settings (for a review, see Haywood & Tzuriel, 1992a, 1992b).

Leading Modern Approaches to Dynamic Testing

The Approaches in Action

This section of the article describes what we believe to be the leading approaches to the individual-testing-oriented subfield of dynamic testing. Our description of these approaches focuses on a number of aspects including (a) method of testing, (b) target population, (c) format of testing, (d) nature of testing materials, (e) outcome measures, and (f) predictive power of the approach (see Table 1). Of course, it is not possible to describe and evaluate every approach that has been used. To provide adequate sampling, we review selected paradigms that are representative of the four major clusters of dynamic testing approaches. Expanding on Haywood's (1997) three-cluster classification (a-c), we point out one more group of approaches to dynamic testing (d). The clusters are (a) metacognitive intervention targeted at teaching generalizable concepts and principles (e.g., Feuerstein's mediated learning, Hurtig's experimental learning, Paour's induction of learning structures), (b) learning within the test (e.g., Guthke's learning test, Brown's graduated-prompts approach), (c) restructuring the test situation (e.g., Budoff's training tests, Carlson & Wiedl's optimization of test administration, Haywood's enriched input), and (d) training a single cognitive function (e.g., Swanson's working memory battery, Spector's test of phonemic awareness, Peña's test of narrative performance).

We used a four-point profile to evaluate the empirical data collected through these approaches: (a) comparative informativeness, (b) power of prediction, (c) degree of efficiency, and (d) robustness of results.

Comparative informativeness. This dimension indicates various qualities of the discussed method (e.g., its differentiating properties, its psychometric characteristics, and the quality and informativeness of the obtained data). The underlying question here is whether the given methodological paradigm contributes any new information over and above that obtained with conventional measures.

Power of prediction. This dimension pertains to the relationship between the information collected and the criteria used to assess validity. The underlying question is how successfully the new methodology predicts performance in a designated population for a given set of criteria.

Degree of efficiency (time and effort invested, in consideration of the uniqueness of information obtained). This dimension reflects how much effort a proposed test or testing procedure requires for its administration as compared with conventional testing and the uniqueness of the information obtained. The underlying issue is the extent to which using the new method takes more (or less) effort when compared with traditional static methods given the unique nature of the information obtained from the new method.

Robustness of results. This dimension indicates how robust the obtained findings are. The underlying question is whether the results have been shown to be replicable across studies and research groups.

We have to admit that our evaluations of the different approaches are not necessarily equivalent, either in depth or in critical analysis. As is often the case, the better developed the approach, the more the data that are available; and the more the work that has been put into it, the easier it is to find flaws and to identify questionable practices. The approaches we review have different publication frequencies ranging from dozens (e.g., Feuerstein and colleagues) to a few (e.g., the Russian defectological tradition) publications. We have tried to review most of the published material on dynamic testing; however, much of the relevant work has been presented orally or is in preparation for publication. We start our review with the leading approach to dynamic testing, the approach of Feuerstein and colleagues.

Feuerstein's Approach and Its Modifications

One of the most noteworthy and fully articulated contributions to the field of dynamic testing has been that made by Reuven Feuerstein and his colleagues (e.g., Feuerstein, Rand, & Hoffman, 1979; Feuerstein, Rand, Hoffman, & Miller, 1980; Feuerstein, Rand, Jensen, Kaniel, & Tzuriel, 1987), who devel-

[Table 1. Approaches to dynamic testing: method, target population, format, materials, outcome, and predictive power. Table body not recoverable.]

oped the Learning Potential Assessment Device (LPAD) as a dynamic-testing instrument. The LPAD is characterized as a method for assessing the potential of children, adolescents, and adults for growth in specific cognitive processes, first by guided exposure to problems and processes of thought and subsequently by their own independent efforts (Feuerstein, Rand, Haywood, Hoffman, & Jensen, 1985). In contrast to static testing measures, the LPAD changes the practice of testing in four major areas: the structure of the instruments, the nature of the test situation, the orientation to process, and the interpretation of results (Feuerstein, Rand, & Rynders, 1988). The specific instruments constituting the LPAD battery include verbal and nonverbal tasks targeted at specific skills such as analogical and numerical reasoning, categorization, and memory strategies.

The 15 instruments in the battery are designed to challenge a testee to use (or to form and use) different cognitive operations (e.g., serialization, classification) in different cognitive domains (i.e., numeric, verbal, logico-deductive, figural). Most of the tests are based on (or are) standardized psychometric instruments, which are used in dynamic, mediational modes. For example, the LPAD directly uses the Raven Colored (1956) and Standard Progressive Matrices (1958) and the Rey-Osterrieth Complex Figure Test (Rey, 1959). Other instruments (e.g., Set Variations I and II, the Organizer, the Representational Stencil Design Test) were developed for the LPAD specifically but either are based on or are modifications of older existing instruments (e.g., the LPAD Stencil Test is an adaptation of the Grace Arthur Stencil Test). Testing time is not limited, but the tempo of testees' responses is recorded.

A limited selection of the total battery of tests is usually administered. The number of tests used with a given testee as well as the time of testing vary widely depending on the individual profile of the testee and the amount of mediation (Kozulin & Falik, 1995). Consider some key features of this approach.

Target population. Initially, the LPAD was developed as a testing device for work with low-achieving children. Feuerstein started developing his theory at a point in time when Israeli society was struggling to integrate minorities and immigrants from all over the world into its culture (Feuerstein & Krasilowsky, 1972; Tzuriel, 1992). Throughout 20 years of research, the LPAD has been used in case studies involving severely handicapped children with discouraging prognoses on static IQ measures (Feuerstein et al., 1988) as well as in work with special-needs children (e.g., Keane & Kretschmer, 1987) and educationally deprived children such as immigrants in Israel (Kaniel, Tzuriel, Feuerstein, Ben-Shachar, & Eitan, 1991).

Later, however, Feuerstein redefined the target population of his instrument to include virtually everyone (Feuerstein, Feuerstein, & Schur, 1997). This expansion appears to have been justified only weakly. It might be assumed that an expansion of such magnitude would be supported by empirical evidence suggesting that the proposed instrument has proven itself to be appropriate for the newly designated population. However, the extended population has never been clearly specified. Moreover, a lack of criterion-related validity studies convincingly demonstrating the predictive power of the LPAD (see below) places the enterprise in doubt.

The goal. In contrast to other approaches with more modest aspirations, Feuerstein's approach allegedly exemplifies the cognitive modifiability of the individual's cognitive structure; when the modifiability is sampled, subsequent intervention attempts to induce, promote, and strengthen the modification. Thus, the goal is not a concrete, specific change but rather a global modification of a somewhat loosely defined set of cognitive structures. Cognitive modifiability is viewed as an independent ability to self-modify cognitive functioning and to adapt to changing demands. Thus, the stated goal of this approach is to evaluate the individual's ability to profit from instruction and subsequently to modify his or her own cognitive functioning. The goal of the LPAD is to ferment change in old cognitive structures and even to introduce new structures. Feuerstein has stated that evaluation and remediation should take place simultaneously.

The model. The LPAD is rooted in Feuerstein's concept of mediated learning experience (MLE; Feuerstein, Rand, Hoffman, & Miller, 1980), an interactional process in which an adult interposes himself or herself between the child and the task and modifies both the task (by adjusting frequency, order, complexity, and context) and the child (by arousing him or her to a higher level of curiosity and to a level at which structural cognitive changes can occur). The concept of MLE is ideologically quite close to some of Vygotsky's thoughts. MLE highlights the role of adults or older children as catalysts in younger children's mastery of both declarative and procedural knowledge. However, in its broader interpretation, any interaction can be viewed as MLE if it meets a set of criteria, among which are the mediator's intention to change the child and the generalizability of the projected change beyond the precipitating situation. According to Feuerstein, everybody undergoes and experiences MLE. What is important, however, is that inadequate MLE leads to inadequate cognitive development (Haywood, 1997). This belief might very well have served as the basis for Feuerstein's redefinition of the target population; everybody has a chance of experiencing inadequate MLE, so everybody has a chance of having some cognitive functions developed inadequately. Correspondingly, everybody can benefit from modification.

To implement his paradigm, Feuerstein eschewed the use of conventional standardized tests but nevertheless assembled tests that are similar or identical to the conventional ones into a device based on his theory (Feuerstein, Rand, & Hoffman, 1979). He argued for a flexible, individualized, and highly interactive format of testing. The LPAD model describes three dimensions crucial for the testing of learning potential: (a) modality of presentation, (b) novelty and complexity, and (c) operations required for task solution. Different subtests of the LPAD vary on which of these dimensions they assess in such a way that, together, the device covers various combinations of values on the three dimensions. The tests are administered in a specially designed test situation that is intended to be a flexible, individualized, and intensely interactive three-way (task-examinee-examiner) process.

In the LPAD paradigm, the examiner has a very important role. Note that the examiner not only has to detect failures but also has to find the best way to remedy them. To anchor the examiner's activity and to help him or her approach each child effectively, the concept of a cognitive map has been developed. The cognitive map is a heuristic representation proposed to analyze both cognitive tasks and corresponding mental acts. The cognitive map is composed of seven parameters (content, modality, phase, operation, level of complexity, level of abstraction, and level of efficiency). The function of the cognitive map is to chart the cognitive deficit and pinpoint the directions that instruction should take.

The administration of the LPAD is thought to be of special value to the child, because it provides an MLE by creating a ZPD. In this interaction, affect plays the crucial role. The neutral, unresponsive attitude of the examiner that is characteristic of static-testing situations is expected to reinforce the child's already negative self-image. Even the provision of simple positive feedback is not seen as a sufficient counterbalance to the child's negative self-image. The examiner is expected to behave as a good teacher who is responsive to the examinee in a variety of ways (e.g., giving and asking for explanations, selecting examples and control tasks, and monitoring progress).

The LPAD is designed to evaluate how modifiable the child's cognitive structures are and where the child's cognitive deficits lie. The LPAD-based evaluation is expected to reveal what must be done in intervention in order to provide a situation in which the MLE can take place. The outcome of the MLE is the active production of new, adequate cognitive structures. This is where, according to Haywood (1997), the crucial difference between static and dynamic tests lies: Whereas static tests accurately and reliably point to deficient cognitive functions, the dynamic tests point to what can be done to overcome these deficiencies and thereby defeat the prediction of static tests, which otherwise, if there is no dynamic testing followed by intervention, could have proved correct.

In contrast to other dynamic-testing approaches, Feuerstein's approach does not assume that the administration of the LPAD will result in deep structural changes in the testee's cognitive structures. The transient changes produced and observed while administering the LPAD provide a professional with an idea of what type and how much intervention will be needed to yield those desired deep structural changes. The quality of the child's response to MLEs is operationalized by the concept of structural cognitive modifiability (Feuerstein et al., 1985).

The LPAD is, by design, tightly linked to intervention and is distinct from traditional static testing. Feuerstein, Rand, and Hoffman (1979) have emphasized four principal differences between the LPAD and the standard testing approach: (a) the LPAD tasks are designed to teach and assess cognitive changes rather than to measure the individual's status relative to his or her peers; (b) the LPAD is oriented toward process rather than product; (c) the LPAD's testing situation provides an interactive-dynamic approach rather than a standard formal approach; and (d) the interpretation of results on the LPAD focuses on the peaks of individual performance, attempts to locate origins of success and failure, and assesses the changes possible through cognitive modifiability. Specific testings are made in different domains with the quality and the amount of intervention taken into account.

Thus, Feuerstein's is an elaborate theoretical model with the concept of cognitive modifiability at its center. Two comments have to be made regarding this model. First, researchers (e.g., Frisby & Braden, 1992) have pointed out the vagueness and imprecision of the theoretical terminology used by Feuerstein. In close analyses, it appears that semantic fields of different concepts overlap, lack exactness, and suggest undemonstrated causal links. For example, it has been argued (Feuerstein & Rand, 1974; Feuerstein, Rand, & Hoffman, 1979) that the proximal cause of individual differences in performance is lifetime exposure to MLE. The support for this claim is drawn from studies that showed improvement in cognitive performance after mediated learning had been experienced (Feuerstein et al., 1979). However, an individual's response to treatment does not necessarily imply causality. In other words, even a convincing demonstration of the fact that an individual's performance has improved after experiencing mediated learning does not mean that the initial low performance was caused by a lack of mediated experience. Moreover, strong controls are needed to draw any firm conclusions at all. Second, there is a certain degree of discrepancy between the elaborateness of these theoretical speculations and the fairly meager usage of these constructs within the framework of empirical research actually conducted with the LPAD. The ultimate test of Feuerstein's approach is in improvements in academic performance. The results of the studies addressing this question are mixed (see Bradley, 1983; Bransford, Stein, Arbitman-Smith, & Vye, 1985; Frisby & Braden, 1992, for a review and analysis; Missiuna & Samuels, 1989). Here we briefly analyze some of the studies conducted within this framework.

Empirical findings. Feuerstein's approach has been adopted by many psychologists and educational professionals. Despite large amounts of published material, however, the empirical findings are somewhat difficult to evaluate, because very few of these studies have been published as original empirical reports in peer-reviewed journals. Most of them have been presented in book chapters or as papers delivered at meetings. Therefore, it is difficult (and sometimes impossible) to ascertain such important data as sample size, F values, p values, and magnitudes of effects. Consequently, this review is by no means complete, and our conclusions are open to modification pursuant to new or more complete data. We structure our analyses by using the four-point profile we proposed for evaluating the empirical findings. Some of the points we make are illustrated with selected studies.

Comparative quality. Internal consistency and test-retest reliability of the LPAD have been assessed within a static mode. As such, reliability coefficients range between .70 and .95 (Wingenfeld, as presented in Frisby & Braden, 1992), thus falling within an acceptable spectrum. These coefficients, however, are no surprise, because the LPAD components, when administered statically, are conventional tests themselves. A challenge to the LPAD is the low interrater reliabilities that are obtained when different observers evaluate the types and severity of cognitive deficiencies exhibited by a child (Vaught & Haywood, 1990). These low interrater reliabilities make the results suspect, as they might reflect more of the observers' than the child's characteristics.

Despite vast amounts of data accumulated by the LPAD proponents in various studies, very little attention has been given to questions of either construct or criterion validity. Regarding the construct validity of the LPAD, factor-analytic studies, traditionally used for establishing internal construct validity, have not been published, so there is no evidence that the structure of the LPAD corresponds to the major parameters of the proposed cognitive map.

Many validity studies have compared the effectiveness of different types of mediation on test performance (e.g., Burns, Vye, Bransford, Delclos, & Ogan, 1987; Missiuna & Samuels, 1989). One of the most recent, as well as comprehensive and
elaborately designed studies in terms of its results, is that of Tzuriel and Feuerstein (1992). The goal of this study was to compare the performance of advantaged and disadvantaged children in Grades 4-9 (total N = 1,394) on Raven's Standard Progressive Matrices (RSPM) with their performance on a group-administered subtest of the LPAD based on selected matrices from the individual version of the test (the Set Variations II subtest of the LPAD) under different amounts of mediation. The group administration of the LPAD involved four stages, namely, demonstration, test, learning, and retest. All stages were conducted by the same experimenter.

In this study, the experimenter varied the amount of teaching (high, low, and no teaching) provided at the LPAD intervention stage. The RSPM were administered twice: before and 2 weeks after the intervention. According to the authors' interpretation, the results showed that (a) LPAD performance was predicted by RSPM scores; (b) children performed significantly better on the LPAD after intervention than at pretest; (c) children who received higher levels of teaching performed better; (d) disadvantaged children performed worse than did advantaged children, but with more teaching, the difference in performance was smaller; and (e) the gain acquired during the intervention remained for 2 weeks and was detectable with the RSPM when they were readministered 2 weeks after the intervention.

Our closer inspection of the statistical procedures that were used, however, revealed a number of details that might have influenced the final outcomes. These drawbacks in the design challenge the conclusions regarding the impact of the LPAD intervention.

First, there were in fact two different designs in one study. The first design, within which the comparison of the RSPM scores on pre- and posttest at different levels of teaching was carried out, included a control group (the no-teaching group), whereas the second design, allowing for comparison of the LPAD Set Variations' performance at different levels of teaching, did not have a control group (i.e., the comparison was carried out only in high vs. low teaching). The authors presented this circumstance as a logical feature of the design; the Set Variations are where the teaching had been carried out, so there was implicitly, according to the authors, no possibility for a control group.

However, let us draw an analogy. Suppose the effect of training on singing ability was studied. The participants were students from two different groups: The first consisted of participants who took music lessons, and the second was made up of those who did not. At the pretest, participants were asked to sing a song they had never heard before. Their performance was recorded as their initial level of ability. Then participants were randomly assigned to different groups: (a) a group to which five singing lessons were given, (b) a group to which one singing lesson was given, and (c) a group to which no lessons were given. The teaching was targeted at the test song. A week later the training participants were asked to sing the same song again. When researchers analyzed the data, they took into account the quality of music schooling and the initial performance of the song.

Virtually any psychologist familiar with the abilities literature would agree that this singing example manifests an Aptitude × Treatment interaction (ATI) design. Similarly, we argue that the Tzuriel and Feuerstein (1992) study is an example of a typical ATI design. The study would have been more relevant to the field of dynamic testing if the researchers had a measure of learning potential for all groups and incorporated this index into their models. If such an index remained significant over and above the effect of initial performance and if the children in the no-teaching condition (whose learning potential was controlled for) gained less, then the findings would have been novel. Otherwise, the results showing that more teaching leads to better outcome and that students who demonstrate lower levels of performance at the baseline gain more from targeted teaching appear to be of little novelty. Moreover, there was no control for any other (non-LPAD-relevant) type of instruction. It might be that the disadvantaged students were more sensitive to any kind of instruction and thus that their learning gain might have been greater than the gain of advantaged students, who were at a higher level from the beginning. Regression effects also could have accounted for the results.

Second, there is a statistical problem with the presented results because of the fact that most of the presented analyses include two highly correlated variables reflecting the initial level of performance: (a) a continuous variable, RSPM-initial, and (b) the level of performance, specified as low, medium, or high on the basis of the same pretest RSPM scores. The inclusion of these two highly correlated variables in the analyses results in multicollinearity (or possibly even singularity), causing logical and statistical problems. Moreover, the correlation itself is artificially inflated because of its part-whole nature.

Finally, the data analyses did not explore the practice effects that occur in RSPM performance simply as a result of taking the test twice. We have explored the data (as presented in Tzuriel & Feuerstein, 1992, pp. 194-195) on percentage gains in correct answers on the RSPM posttest compared with the pretest for children in Grades 4-6 and 7-9. Under the assumption that the standard deviation within different teaching groups is at least as high as the one calculated from the means of different subtypes of the groups, we compared the total group means for the high-, low-, and no-teaching groups; there was no significant difference between the gains under the three conditions. This finding suggests that any speculations regarding the effects of teaching are not warranted unless the variance in improvement due to the second test administration is taken into account.

In summary, strictly speaking, none of these findings are dynamic-testing specific. In essence, all they suggest is that if students are test-specifically trained (the more the better), then their posttraining performance improves. Similar results have been obtained with the test-train-retest paradigm by using static tests (e.g., Throne & Farb, 1978). Moreover, other researchers have observed significant spontaneous practice effects in performance on static tests in cultures in which these tests are not commonly used (Ombredane, Robayer, & Plumail, 1956; Serpell, 1993).

As noted earlier, the Tzuriel and Feuerstein study (1992) is prototypical of many other studies that have been conducted earlier and with different samples. The main goals of these studies are to show (a) that the performance on the dynamically administered tests is higher on posttest than on pretest, (b) that more mediation results in more improvement, and (c) that disadvantaged students tend to benefit more from dynamically administered tests than do advantaged students.

A number of studies, conducted within this framework, were
strongly influenced by Feuerstein's work but used different instruments and populations. For example, Lidz (1991) gave the Preschool Learning Assessment Device (PLAD; Lidz & Thomas, 1987) to a group of Head Start children. The PLAD is a modification of the LPAD. The PLAD is based on Feuerstein's approach and Luria's methodology (1966; Naglieri & Das, 1988) and is targeted at children 3-5 years old. The comparison was to a matched group that was exposed to the same pre- and posttest materials but that did not undergo the cognitive intervention that is usually a component of the PLAD. In other words, similar to other studies in the field, the design included a no-intervention control group (to rule out practice effects) but no placebo group for teaching. The researchers found that mediated children showed greater gains than did control children, who showed no change. In another study (Reinharth, 1989, as cited in Lidz, 1991), the PLAD was administered to a group of developmentally delayed children. The comparison was also made with members of a matched group who were not trained. Once again, the mediated group showed greater cognitive gains. A follow-up testing a week later documented an increase in performance of the experimental group. In addition, using another application of the LPAD for preschoolers, the Children's Analogical Thinking Modifiability Test (CATM; Tzuriel & Klein, 1985, 1987), Tzuriel and Klein (1985) conducted a study comparing the performances of advantaged, disadvantaged, special-education, and mentally retarded children. All four groups performed better (as shown by the absolute values; no variability measure was presented) on the dynamic test, the CATM, than on a static test, Raven's Colored Progressive Matrices (RCPM; Raven, 1956). For example, the disadvantaged children answered 64% of the CATM problems correctly as compared with 44% of the RCPM items. A similar pattern of results regarding the children's performance on static and dynamic tests was found in a study of hearing children (66% correct on the CATM vs. 42% on the RCPM) and deaf children (54% on the CATM vs. 39% on the RCPM; Tzuriel & Caspi, 1992) and in other case studies (e.g., Haywood & Menal, 1992; Kaniel & Tzuriel, 1992; Katz & Bucholz, 1984). Thus, one after another, the studies addressed the question of how much better the mediated performance and the posttest performance are when they are compared with pretest performance. Other validity-related issues are not addressed.

Researchers have addressed the issue of the external validity of some of the LPAD components, modified for young children. For example, the effectiveness of the mediation approach (Feuerstein et al., 1979) versus that of the graduated-prompts procedure (Brown & French, 1979) in assessing independent and transfer task performance was compared in a number of studies conducted by the Vanderbilt group (Burns, 1991; Delclos, Vye, Burns, Bransford, & Hasselbring, 1992; Vye, Burns, Delclos, & Bransford, 1987).

In Burns's (1991) study, 127 children ages 4-6 were divided into three groups: mediation, graduated-prompts, and static testing groups. The comparisons were made for children's independent performances on the task they had been trained on and on a transfer task. Dynamically assessed children, in both mediational and hinting approaches, performed better than children from the static group. Moreover, the mediational group did better than the hinted group. The transfer performance was better for mediated children, but there were no differences between the hinting and the static procedures. Thus, it appears that the data support the usefulness of the mediational approaches. The authors speculated, however, that the graduated-prompts procedure might be more relevant to school experiences and thus could be more predictive of children's performance than the mediation approach. There are two worrisome aspects of this study, however. First, there was no control group for the effect of teaching per se. As in the Tzuriel and Feuerstein (1992) study, the assumption was made that the no-treatment group could be used as a control group; the question, however, remains: If there is no control for the fact of teaching or simply interacting continuously with the tester and getting used to him or her, how does one know that the effects of intervention are not mostly due to teaching per se? To rule out the effect of teaching, some kind of placebo control is needed. Second, the magnitudes of the reported effects were quite low: in light of the facts that multivariate models were not tested and that the obtained p values were not corrected for multiple comparisons, the published p values of .01 might not hold under proper reanalysis. These findings, however, were supported by the results from two other studies conducted by the same group (Burns, Delclos, Vye, & Sloan, 1992; Burns et al., 1987). Thus, even though the magnitudes of the observed effects are not large, their consistency across the Vanderbilt group studies (Burns et al., 1987, 1992) suggests their durable nature.

Another important index of criterion validity of the LPAD would be a study that could demonstrate correlations between the device and changes in academic achievement. We are not aware of any studies that have directly investigated this issue.

Another important type of validity is treatment validity. This refers to the extent to which an individual's test scores can be used to detect the kind of instruction that will be most effective for that individual. In other words, on the basis of the belief that the LPAD should lead directly to intervention, it is expected that the device should be able to predict differential responses of the assessed children to instruction and to address the issue of ATI. However, beyond the observation that students of lower ability tend to benefit more from mediation than do students of higher ability, the LPAD-based studies do not address this question. For example, in the Tzuriel and Feuerstein (1992) study described above, the highest posttest gains were demonstrated by the initially low-performance group followed by the medium- and high-performance groups, with the larger magnitude of effect for the higher teaching condition. However, these findings have not been translated into intervention recommendations. They also may be due in part or wholly to regression effects.

Concluding this section, we pose a number of specific questions that have to do with the design of the studies and the magnitudes of the obtained effects. One issue relates to the fact that the LPAD and its variants represent not a single test but a collection of tests. Mean differences between pre- and posttest scores on different tests are not adjusted for multiple comparisons. This apparent oversight in failing to control for Type I error (the error of rejecting the null hypothesis when it is actually true) results in the probability levels of these tests being much higher than the .05 level claimed. Just by chance, some of these statistical tests would be expected to be significant.

Another problem that appears to be relevant to most of the studies mentioned above centers around the power of the statistical tests used. With df = 1, 1375 (e.g., Tzuriel & Feuerstein,
1992), the power to produce statistical significance is large even if the difference between the means is quite small and possibly trivial in a practical sense. Thus, researchers conducting large-scale studies need to formulate a priori hypotheses regarding the magnitude of differences between means that they believe a priori to be practically and psychologically important. If such estimates are not made a priori, it is often assumed that the differences should at least exceed the standard error of the measurement for a given test (Salvia, as cited in Bradley, 1983). Bradley (1983) conducted a scrupulous analysis of the mean differences obtained by Feuerstein et al. (1979) and found that none of the reported mean differences even approached the standard-error-of-measurement criterion.

Power of prediction. It seems reasonable to assume that if a child's cognitive processing has been modified and, presumably, strengthened by training, then there should be effects of these modifications on school performance. Feuerstein and his colleagues have argued against the use of school achievement as a criterion for the evaluation of the predictive validity of dynamic instruments (see Feuerstein et al., 1979). Feuerstein and colleagues suggested concentrating instead on "changes in the functioning of the individual following the intervention characteristic of the dynamic assessment" (p. 326), but it would be desirable to have some clearly specified and important criterion that is external to the tests whose validity is being evaluated. As of today, the target of prediction of dynamic testing is not clear. According to Tzuriel (1992), in order to predict cognitive modifiability, an intervention should be carried out first with the goal of actualizing the potential diagnosed by a dynamic test. Following Feuerstein's argument, Tzuriel argued that automatic transfer from test results to school achievements cannot be assumed. To reiterate, in order to validate and verify results of dynamic testing, a complex intervention should be carried out first. In other words, in this approach, testing and intervention are linked together: initially, testing is done for the sake of devising appropriate interventions; then intervention is carried out to validate the test results; then, once again, more testing is undertaken to identify the next round of intervention, and so on.

A criterion-validation study with clearly formulated external criteria was, however, conducted by Shochet (1992), who studied the role of dynamic testing in predicting the success of 104 advantaged and 52 disadvantaged undergraduate students in South Africa. The criterion measures were number of credits and grade point average at the end of the 1st year. Using scores on the Deductive Reasoning Test (DRT; Verster, 1973) administered both statically and dynamically, Shochet obtained three scores: (a) manifest intellectual functioning (as measured by a statically administered DRT), (b) potential intellectual functioning (as measured by a dynamically administered DRT), and (c) modifiability (as measured by the difference between scores on the statically and dynamically administered DRT). On the basis of the modifiability score (c), the sample of disadvantaged students was divided into two groups of greater and lesser cognitive modifiability. The results revealed significant differences for the beginning versus the end of the school year in the prediction of the criterion measures among the less modifiable students as compared with no differential prediction among the more modifiable students. Shochet found that the less modifiable students' scores on the DRT administered statically at the beginning of the academic year correlated significantly with both number of credits and grade point average (r = .55, p < .01, and r = .50, p < .01, respectively), whereas the DRT scores of more highly modifiable students did not correlate with the criterion measures. Shochet concluded that static tests might be not only unreliable but also unfair when applied to highly cognitively modifiable individuals.

The Shochet (1992) study has a number of methodological difficulties, among which are a relatively small sample size of disadvantaged students (N = 52), a limited sampling of abilities measured (deduction), and ambiguity of the cutoff point between more and less modifiable students. However, this is the only study that has anchored (even if weakly) Feuerstein's paradigm to a criterion measure. Many researchers, when they did not find any evidence for criterion validity, have used phrases like "as expected" (e.g., Haywood & Arbitman-Smith, 1981), stating after the fact that the changes one might get either could not be observed yet or that the time period in which changes could have been observed had passed. These equivocations led Bradley (1983) to question how many failures to find evidence for the paradigm's criterion validity would need to be accumulated before the researchers would admit that Feuerstein's measures have questionable predictive power.

Effort invested in training and administration. It has been stated many times that LPAD-based testing requires more skill and greater investment of time from both examiner and examinee than does static testing. On average, according to Tzuriel (1995), if a complete 15-subtest version of the LPAD is administered, a 1-participant investment may take about 10 hr. Vye et al. (1987) effectively described the demands of such testing: "[Feuerstein's] extended assessment can last from a number of hours to several days" (p. 330). To justify such an investment, one should be confident that the results of the testing will be adequately worthwhile for both examiner and examinee. Of course, most of the practitioners use only reduced versions of the LPAD and combine its components with other approaches. Moreover, group-administered versions of the LPAD save considerable time by testing many children simultaneously. However, as has been stated by the developers of the group version of the LPAD, group testing is only a first step of the process and, in many cases, is expected to be followed by the more refined and time-consuming individual LPAD testing (Rand & Kaniel, 1987).

An extremely important aspect of dynamic testing concerns teachers' perceptions of the evaluated children. Using their brief-form modification of Feuerstein's Stencil Design Test, Delclos, Burns, and Kulewicz (1987) investigated the effects that viewing dynamic-testing situations had on teachers' expectations of handicapped children as learners. Videotapes of two children, one evaluated across a static-static-dynamic test sequence and the other evaluated across a static-dynamic test sequence, were shown to 60 teachers, randomly divided into two groups, each of which viewed one of the children. Even though the teachers did not notice the difference in the children's involvement in static versus dynamic tests, both task-specific and general competencies were viewed as much higher when the children were evaluated dynamically rather than statically. In other words, seeing the children in the context of dynamic testing raised the teachers' expectations. It is interesting to note that Hoy (1983, as cited in Delclos et al., 1992) did not show any such effect when written reports of dynamic and static testing were com-
pared. Moreover, researchers found that, for programming purposes, standard psychological reports were rated as significantly more useful than LPAD reports (Hoy & Retish, 1984). This finding was not supported, however, in the study carried out by Delclos, Burns, and Vye (1993). These researchers found that the perceptions of the psychological reports depended on a number of factors, among which was the individual profile of the evaluated child and the familiarity of the teachers with relevant psychological theories. In combination, these results suggest that, although teachers' perception of children's competencies might be influenced by both observing children's actual performances on dynamic testing and reading about the results, this influence is mediated by many factors.

The LPAD's lower age boundary is 10, so the dynamic testing of younger children is conducted with either CATM or PLAD. Each test requires approximately 90-120 min.

The LPAD is designed as a clinical procedure. This fact has been viewed as a justification for saying that the device is not a standardized or normative tool and consequently that scoring involves clinical judgment and inference (Lidz, 1991). As for the LPAD administration, the issue of variability among testers in the administration of the LPAD (or similar instruments) has been raised by a number of researchers (Jitendra & Kameenui, 1993). For example, using a sample of 32 children evaluated by five testers, Burns (1996) showed that, overall, tester-clinician behavior correlated with task performance (for some reason, the researchers decided to match the children on their pretest scores rather than to keep the pretest score as a covariate). The correlation between such tester behavior as asking two-choice rule-goal questions correlated .62 (p < .01) with the child's correct nonverbal response. The comparison between the testers was carried out for only two testers. This decision considerably reduced the sample size and consequently the statistical power. No performance differences but some behavior differences were found. In other words, the tester-testee behavioral exchange varied among different pairs of testers and children, but the children's performance outcome did not vary within the groups of the two testers (N = 10 and N = 7). This study certainly raises a very interesting methodological question, but, unfortunately, it does not have enough power to answer this question one way or the other. In other words, some behavioral differences in response to testers' behaviors were registered, but the question of whether or not these differences are linked to differences in performance remains open.

As one can imagine, the scoring of the LPAD is a complex process involving mapping the obtained data back to the model and making subjective conclusions about such test dimensions as phase of the mental act (i.e., input, elaboration, or output), level of abstraction (i.e., conceptual distance between the object and the mental operations required), and five more dimensions that seem somewhat abstract and vague. Perhaps this is why the mastery of scoring requires extensive and potentially expensive residential training sponsored by the Feuerstein groups and why scoring reliabilities are nevertheless variable (Haywood & Wingenfeld, 1992).

The LPAD methodology seems to be effective in work with handicapped children and adolescents (e.g., Feuerstein et al., 1979) for differentiating populations of underachievers much more finely than do conventional tests. The device may also serve as a basis for remediation programs and for conducting individual evaluations of handicapped children.

A different situation arises when the LPAD is administered to mainstream children. No strong validation data have yet been provided that adequately support the claim that the information obtained through the LPAD has more predictive power than do scores from conventional tests. As Blagg (1991) noted, it is somewhat paradoxical that, although Feuerstein rejects IQ tests as predictors of learning potential, he still refers to changes in IQ scores observed after the mediation as major evidence to support the application of the LPAD (e.g., Feuerstein et al., 1979). Given the similarity of content, mediation may in effect be doing nothing more than teaching to the tests. At this point, we cannot be sure.

Robustness of results. Studies of the LPAD are probably more numerous in the field of dynamic testing than are studies of alternative dynamic measures. However, quite a few of these studies have what appear to be limitations.

First, consider the theoretical basis of this approach. In describing LPAD's drawbacks, Büchel and Scharnhorst (1993) pointed to what they believe to be the lack of explicitly and unequivocally defined concepts, the uncontrolled use of ill-defined concepts, the overlapping subcomponents of the theory, and the incorporation of heterogeneous description languages belonging to different types of theories. We suggest a somewhat different formulation of the problem: The vagueness of the concepts underlying mediated learning and cognitive modifiability theory renders operationalization difficult and validation of these concepts even more difficult. For example, it is not clear how the transcendent nature of intervention, which is one of the characteristics of MLE, translates into valid cognitive performance outcomes that are related to school or other activities.

This criticism, however, might not be totally fair. There are researchers who have been able to overcome, at least in part, the vagueness of the MLE-related concepts. For example, Lidz (1991) has attempted to operationalize MLE by modifying and reinterpreting some MLE components and placing them on a scale. However, empirical research on the MLE at this time is limited and has been carried out primarily in the context of parent-child interactions (for a review, see Tzuriel, 1997). The goals of this research are (a) to define characteristics of parent-child interactions that are linked to various components of the MLE, (b) to describe the profile of the MLE, and (c) to investigate the effect of the MLE on children's cognitive modifiability.

Second, at the level of methodology, the LPAD fails to fulfill a number of requisites. One is standardization (in test administration, analysis, and interpretation of results). A second is reliability; the LPAD and its modifications have weak test-retest reliability (Büchel & Scharnhorst, 1993) and are supported by virtually no data addressing the issue of the reliability of change. Moreover, most of the studies that have been done use less than ideal statistical analysis; they have failed to use multivariate research designs and have not controlled for numbers of comparisons in significance testing.

Any dynamic-testing procedure, especially one as interactive as Feuerstein's, may result in both temporary and lasting changes in the child's cognition. Even though, as we have mentioned earlier, the LPAD does not have as its goal to provide MLE that might result in durable changes, there have been a few reports registering the modifying effect of the group-
administered LPAD. We need a better sense of what the changes are and how long they last.

Third, this approach requires a substantial investment of time and money in training. The assumed ability of the mediator to respond to any slight change in an examinee requires a fairly high level of professionalism and, like any demanding requirement, limits the applicability and availability of competent administration of the instrument.

In sum, Feuerstein's work is an example of a well-intentioned paradigm that very much needs convincing empirical validation. The principles and ideas developed by Feuerstein and his followers have had major implications for the field of dynamic testing and have facilitated the development of new approaches to testing. The field of dynamic testing is indebted to Feuerstein for his pioneering and pathmaking efforts. He placed his work in a large psychological and philosophical framework; he articulated the societal need for alternative approaches to testing; he initiated practical movement away from conventional testing; he created an elaborate theory and a corresponding methodology. Feuerstein's work is ambitious, both in terms of its attempt to cover a wide range of populations and in the cognitive functions it addresses. Because of this globalism, such finer details as the psychometric properties of the instruments, adequate statistical approaches to data analysis, and precision in making inferential statements in interpreting the data do not appear to be the center of attention. Moreover, Feuerstein and his colleagues openly state their preferences. Thus, in response to criticism, the first line of defense is traditionally based on the societal and philosophical aspects of the approach rather than on the accumulated psychometric and experimental data (e.g., Tzuriel, 1992).

Feuerstein's approach has provoked more research than any other dynamic-testing approach. At approximately the same time as Feuerstein's approach was developed, a number of other approaches grew out of different traditions. Responding in part to

Ideologically, Budoff's position regarding the population for which the usage of measures of learning potential is crucial resembles Feuerstein's. For both, the underlying beliefs are that (a) there are substantial numbers of individuals who, as a result of their unique educational experiences (e.g., cultural differences or lack of proper education), have their actual capabilities underestimated and so are unfairly viewed and classified by their teachers and the whole educational system and (b) on average, the performance of mentally retarded individuals, as measured by static tests, is underestimated. These individuals have more learning potential than is usually identified by conventional tests. Correspondingly, Budoff's work addressed, in essence, two different populations: those low-achieving children unfairly classified as educable but mentally retarded on the basic IQ tests (primarily minority children and children of low socioeconomic status [SES]) and those children correctly diagnosed as educable but mentally retarded and who can demonstrate higher levels of performance when properly tested by other than conventional tests and therefore must be educated in a way that will allow them to maximize their capabilities.

The paradigm. Budoff (1987b), criticizing Feuerstein's work for lack of standardization, made a concerted effort to standardize the training component of his approach. According to Budoff (1987b), his approach differs from other dynamic-testing techniques in which "it is difficult to distinguish the contribution the tester makes to improved student responses from what the student actually understands and can apply" (p. 56). There are a number of characteristics of Budoff's approach that make his methodology distinct: (a) The procedure is explicitly designed to serve as an alternative to conventional intelligence tests in selecting and classifying children for purposes of special education; (b) only the use of standardized, reliable, and extensively validated tests is involved and even permitted; (c) the aim of training is to familiarize the students with the
the methodological limitations apparent in Feuerstein's work demands of the tests, thus attempting to equalize their
(e.g., see Budoff's criticisms of Feuerstein for lack of standard- experiences.
ization of the training segment as cited in Lidz, 1991), and The outcome of the training procedure is conceptualized as
partially driven by their own ideas to improve the psychometric the measure of learning potential. Budoff views learning poten-
properties of the instruments in the field, researchers have at- tial as a measure of general ability (g; Budoff, 1968) in the
tempted to design paradigms that would maintain the flexibility disadvantaged population of children. Budoff's g differs from
of dynamic testing but simultaneously introduce into the testing the conventional g in that it does not directly relate to school
situation more demonstrably reliable and valid measures. activities (Budoff, 1969) and is trainable (Corman & Budoff,
Among the attempts to standardize both methodology and inter- 1973).
pretation are the approaches of Budoff, Campione and Brown, Initially, Budoff (1987a, 1987b) operationalized learning po-
Guthke, and Carlson and Wiedl. tential in terms of qualitative classification of cases. He distin-
guished among high scorers (those who scored high prior to
training), gainers (those who performed poorly on the pretest
Budoff's Approach to Measuring Learning Potential
but improved their scores significantly following instruction),
Target population. Budoff (1968) based his work on the and nongainers (those who performed poorly at pretest and did
assumption that some educable disadvantaged children (e.g., not gain from the instruction). However, in response to criticism
educable mentally handicapped children from impoverished en- that this classification could not distinguish between more finely
vironments) are more capable of learning than their conven- defined groups (e.g., Lidz, 1991), the qualitative descriptor of
tional test results would suggest. The hypothesis was that if learning potential was replaced with a set of continuous scores,
children are given the opportunity to learn how to solve a prob- operationalized through pretraining score, posttraining score,
lem through organized, specialized instructions, at least some and posttraining score adjusted for pretest level (the residualized
disadvantaged students will demonstrate improved performance score).
beyond that predicted by ability tests. Thus, the target popula- Measures of learning potential. Budoff and his colleagues
tion is a broadly defined population of disadvantaged and low- have developed dynamic versions of about a dozen well-known
IQ students, which includes underachiever, learning-disabled, standardized tests, among which are the Kohs Learning Potential
recent-immigrant, and other children. Task, Raven Learning Potential Test, and Picture Word Game.
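
The posttraining score adjusted for pretest level (the residualized score) mentioned above can be illustrated with a short sketch. The source does not state how Budoff computed the adjustment; the example assumes the common approach of regressing posttest on pretest by ordinary least squares and taking each child's residual. The function name `residualized_scores` and the five pairs of scores are hypothetical and serve only as an illustration.

```python
from statistics import mean


def residualized_scores(pretest, posttest):
    """Return posttest scores adjusted for pretest level.

    Assumption (not taken from the source): the adjustment is the residual
    from a simple least-squares regression of posttest on pretest, that is,
    observed posttest minus the posttest predicted from the pretest.
    """
    if len(pretest) != len(posttest) or len(pretest) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    pre_mean, post_mean = mean(pretest), mean(posttest)
    # Sum of squared deviations of the pretest scores
    sxx = sum((x - pre_mean) ** 2 for x in pretest)
    if sxx == 0:
        raise ValueError("pretest scores must not all be identical")
    # Sum of cross-products of pretest and posttest deviations
    sxy = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pretest, posttest))
    slope = sxy / sxx
    intercept = post_mean - slope * pre_mean
    return [y - (intercept + slope * x) for x, y in zip(pretest, posttest)]


# Hypothetical scores for five children (illustrative only)
pre = [2, 4, 6, 8, 10]
post = [5, 6, 11, 10, 13]
residuals = residualized_scores(pre, post)
```

A positive residual marks a child who scored higher after training than the pretest alone would predict, and a negative residual marks the opposite. Under this assumption the residuals sum to zero within the sample, so the score expresses standing relative to the group tested, not an absolute amount of gain.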
These instruments (referred to by the authors as tests, procedures, or games) can be administered individually or in any combination and are referred to as Budoff's Measures of Learning Potential (Budoff, 1987a, 1987b).

Internal consistencies of the tests are satisfactory, averaging around .86 (Budoff, 1987a). Test-retest reliabilities were evaluated in a number of studies when posttests were administered with different time intervals (from 1 day to 6 months). The reliability estimated for the time interval of 1 month varied from .51 (for the Raven test) to .95 (for the Kohs test). No evaluation of the reliability of change has been performed.

These tests can be administered in both individual and group formats. For each test there is a specific set of instructions for the examiner. Furthermore, generalized approaches are suggested to testers at every step of the procedure. Thus, at pretest, the Budoff-based examiner is expected to adopt the style of an examiner in regular standardized static testing. At the training stage, the examiner's role is to direct the student's attention, to explain the crucial attributes of the task and the testing procedure, and to guide the student in mastering all actions (both cognitive and motor) that are necessary for finding the right solution. For example, training for the Kohs Block Designs involves five coaching strategies, which are printed in the same format and dimensions as the test designs. The coaching emphasizes four principles (Budoff, 1987b): (a) maximizing success by pointing out the simplest elements in the design, (b) frequent praise and encouragement, (c) underscoring the importance of checking the block construction against the design card, and (d) emphasizing the two-color design of the blocks. The coaching sequence for an item is designed so that the test taker initially has to solve the problem from a stimulus card on which the blocks are undifferentiated. In case of failure, the individual is presented one row of the design at a time. If he or she fails again, the blocks are progressively examined and their structure is explained on succeeding presentations. The standardization of the training is approximate, not absolute. At the last stage, the examiner once again uses the style of traditional static testing. The materials that were presented at pretest are presented twice more: usually 1 day and again 1 month following coaching.

Budoff and his colleagues typically categorized the slow learners and the educable mentally retarded children they worked with into three groups: (a) high scorers, those who performed well initially and without help; (b) gainers, those who initially had low scores but improved significantly with training; and (c) nongainers, those who scored low and did not improve significantly with training.

In terms of construct validity, Budoff and his colleagues have presented ample evidence that coaching leads to improvement on posttest (Budoff, 1967, 1987a, 1987b; Budoff & Friedman, 1964). In terms of concurrent validity, Budoff's findings showed that learning-potential status was related to performance but not to verbal IQ (Budoff, 1967; Budoff & Corman, 1974). This finding appears not to be surprising, because most of Budoff's tests measure nonverbal abilities. This result, however, has been challenged by a group of Spanish psychologists (Fernandez-Ballesteros et al., 1997). On the basis of Budoff's materials, they developed a Spanish version of a learning-potential assessment device (EPA). The EPA gain scores were predictive of improvement in verbal but not performance abilities. These results were especially unexpected, because both the Budoff group and the Vanderbilt group did not find any evidence of between-domain transfer either for a pretest predicting intervention performance or for an intervention predicting posttest performance (for a review, see Lidz, 1987). For example, it has been found consistently that neither full-scale intelligence measures nor cross-domain measures predict dynamic performance with any great precision. The predictive power of subscales, corresponding to the type of cognitive functioning used in the dynamic tests, is argued to be higher (e.g., 48 vs. 18; Vye et al., 1987). Explaining their results, the authors (Fernandez-Ballesteros et al., 1997) asserted that although the EPA materials include progressive matrices, the embedded training is verbal; Fernandez-Ballesteros et al. appealed to a statement made by Campione and Brown (1979), who claimed that the operation trained during dynamic testing is that of executive verbal control. This result poses a yet-unanswered question of what gets trained or coached: a targeted cognitive function or a more general strategy of problem solving.

Despite his early claim that measured learning potential does not directly relate to school achievement, Budoff has conducted a number of predictive-validity studies (for a review, see Budoff, 1987a, 1987b). In one study using teacher-rated achievement, correlations between the learning-potential posttest and achievement were different for three groups of students (above-average [high scorers], average [gainers], and below-average learners [nongainers], as identified by Budoff's measures of learning potential; for a review, see Budoff, 1987a, 1987b). Whereas the correlations between posttest and achievement were twice as high as the correlations between IQ scores and achievement (r = .35 and .16, respectively) for the average and below-average learners, the correlations were comparable (r = .35 and .31, respectively) for above-average learners (as presented in Laughon, 1990).

The predictive power of Budoff's measure of learning potential was also estimated by correlating it with classroom measures of educational achievement (Budoff, Meskin, & Harrison, 1971). As in the teacher-ratings study, students were classified into groups of below-average, average, and high-average learners on the basis of their scores on Budoff's measures. Students were taught about electricity, and their subsequent performance on the relevant achievement test was correlated with measures of their learning potential. The results showed that the measures of learning potential were more predictive of achievement differences in learning about electricity than was IQ.

Empirical findings. Within this paradigm, training improves scores for both special- and regular-education children (Budoff, 1987b). Moreover, for both nonchallenged and challenged learners, Budoff reported a high correlation between learning-potential testing results and teacher indications of students' learning rates (Budoff & Hamilton, 1976). The best predictor of the school-achievement criterion for children enrolled in special curriculum programs was found to be the posttraining scores on the learning-potential testing instruments (Budoff, Corman, & Gimon, 1976; Budoff, Meskin, & Harrison, 1971).

Most of the empirical work of Budoff and his colleagues (for a review, see Budoff, 1987a, 1987b) has been devoted to exploring correlations between learning potential and other cognitive (as well as demographic, motivational, and emotional) indicators. The schemes of these studies are quite similar to each other. Initially, Budoff's battery (or parts of it) was
used to evaluate students' learning potential; then the learning-potential indicator(s) is (or are) correlated with scores from a set of other measures. For example, Budoff and his colleagues conducted a number of large-scale studies (N > 400) investigating social, demographic, and psychometric correlates of learning potential (as reviewed in Budoff, 1987b). The participants in these studies were low-IQ pupils who were either students in special-education classes or residents of institutions for persons with mental retardation. The results showed that (a) static measures of ability (the Stanford-Binet intelligence test) were related more to such variables as noninstitutionalization, family demographic characteristics, and to Wechsler (1949; WISC) Verbal IQ than to measures of learning potential (pretraining Kohs scores); (b) immediate effects of training were predicted by gender, family demographics, and static measures of performance IQ; and (c) delayed effects of training were predicted by static measures of performance IQ. Moreover, whereas such variables as race and SES were significantly associated with static measures of IQ, these demographic characteristics did not elicit systematic differences on the scores following training.

Another series of studies conducted by Budoff and his colleagues examined the personality profiles of low-IQ students whose learning potential was evaluated as higher than average for the group of low-IQ students. Children with higher learning potential tended to be friendlier, showed higher motivation for achievement, and were less rigid and less impulsive than were children of similar IQ level but lower learning potential (Budoff, 1987b). Budoff also studied SES as a variable that might correlate with learning-potential measures and found that low-SES students needed more training and benefited more from individual training than from group training in comparison with advantaged students (Budoff, 1987b).

In their longitudinal studies, Budoff and his colleagues (Budoff, 1967, 1987b; Budoff & Friedman, 1964) followed various groups of low-IQ children over time. They found that gainers as compared with nongainers were more likely (a) to perform better in special-education programs, (b) to attain economic and social independence as adults to a greater degree, (c) to be qualified for military service more often, (d) to report having friends and dating experience, (e) to live away from home and to obtain a driver's license, and (f) to be released from institutions if institutionalized. However, these findings were obtained from a fairly small group of participants and without proper controls for possible covariates (e.g., type and amount of obtained education and family environment).

Four aspects of Budoff's experimental work are important to mention. First is the consistency of methodologies used and the research questions formulated. It appears that Budoff chose as his main goal the study of levels of learning potential as a function of students' various characteristics, ranging from their SES to their self-concept. Such consistency has both positive and negative aspects. On the positive side, Budoff has explored correlations between his measure of learning potential and a large number of measures of individual differences and has placed learning potential in the context of other individual characteristics. On the negative side, all of these studies are observational and of limited explanatory power. Neither experimental designs nor statistical analyses allowing for fitting causal models have been implemented. In other words, only limited causal or predictive information can be extracted from the obtained data. We do not yet know if the characteristics of learning potential are predictive of positive life outcomes and better life adjustment or if a higher learning potential is itself a characteristic of a more adaptive person.

Second, correlations have been obtained in samples that have differed in terms of (a) size, ranging from N = 20 (Budoff & Pagell, 1968) to N = 627 (Budoff & Corman, 1974); (b) nature, ranging from mainstreamed to institutionalized mentally retarded children; and (c) age, ranging from 12 to 17 years in one study (Budoff & Pagell, 1968). The question of comparability of the obtained results in terms of both the meaning of the findings and the magnitude of the effects has yet to be addressed.

Third, the authors have not fully addressed such issues as the lack of statistical power to detect an effect and the need to correct for multiple comparisons.

Finally, at least in published presentations, comparatively little attention has been given to the coaching procedures themselves. The initial agenda of Budoff's approach was to standardize the training component of dynamic testing (Budoff, 1987b). On the basis of the published material, it is unclear (a) whether and how the coaching is standardized and (b) how much of the registered effects can be accounted for by the way in which coaching is performed.

Evaluation based on the proposed four-point profile. How can Budoff's work be evaluated within the four-dimensional space described above? First, what kind of information is obtainable by Budoff's device (for a review, see Budoff, 1987a, 1987b)? In contrast to Feuerstein's approach (Feuerstein, Rand, & Hoffman, 1979), this approach (a) seeks only to improve performance on conventional tests and thus defines learning potential by a simple measure of improvement in performance on conventional tests and (b) works with a specific population of low-IQ children. Correspondingly, the obtained results are both population- and task-restricted. Budoff's device for assessing learning potential, like Feuerstein's, has been shown to help in differentiating the low-IQ population into distinct groups. In addition, Budoff, like the Soviet defectologists (for details, see Vlasova, 1972), suggested that low-IQ students with higher learning potential (high gainers) are educationally disadvantaged rather than mentally retarded. In a similar vein, Feuerstein et al. (1979) have spoken only of retarded performers and not of retarded individuals. This distinction refers to students whose school progress, for a number of reasons, is not satisfactory but who, given proper teaching, can progress successfully. Those students whose learning potential is quite low (nongainers) match the profile of mental retardation. This differentiation appears to be helpful and might be used by educational practitioners who work with populations of low-IQ students. This theme of Budoff's research is similar to Feuerstein et al.'s work on modifiability. Those students who gain more (or are more modifiable) appear (a) to show that their performance is better accounted for by the measure of learning potential than by IQ and (b) to have a better overall lifelong prognosis. In sum, in terms of its overall contribution to the theory and methodology of dynamic testing, Budoff's approach has attempted to place learning potential into the broader context of other cognitive, personality, and demographic characteristics.

The differentiation between gainers and nongainers as a way to account for the heterogeneity of the population classified as mentally retarded has been further explored by a number of
psychologists interested in dynamic testing. One suggested differentiation was that between biologically impaired and culturally deprived participants (e.g., Fernandez-Ballesteros et al., 1997). In one study, Fernandez-Ballesteros et al. (1997) investigated the differences in obtained gains between students with known biological causes of mental retardation (classified as organically impaired) and individuals who did not have any noticeable biological dysfunction (classified as culturally deprived). The researchers found that the gains of students without any known biological dysfunction were about twice as great as, and of a more stable nature than, improvements evidenced by students with a known biological cause of mental retardation.

In terms of the psychometric properties of the device as compared with those of other tests (both static and other dynamic), two points should be raised. First, it is not surprising that internal consistencies of the tests are high and match those of the corresponding static tests, because all of the tests composing the battery are slightly modified versions of the static tests themselves. Second, although test-retest reliabilities have been obtained for pretests, what is really needed is an evaluation of the reliabilities of the gain. For example, Carver (1974) suggested using two alternative pretests, followed by highly standardized training, and then two alternative posttests. It has been suggested (Embretson, 1987b) that the reliability of the gain can be measured by using one pretest but three posttests, a strategy that would allow researchers to evaluate the reliability of change repeatedly. Such (or similar) evaluations have yet to be conducted.

Regarding predictive power, the studies using Budoff's approach, as with the studies using the LPAD, have not accumulated enough evidence to suggest a resolution to the dynamic-versus-static testing dilemma. Only two studies have addressed the issue of predictive validity, and their findings, though promising, cannot be interpreted as certain. Moreover, similar to discussions initiated by the followers of Feuerstein's approach, Budoff and his colleagues questioned the appropriateness of using academic performance as the criterion measure (Budoff, 1987b).

The tests constituting the set of Budoff Learning Potential Measures are relatively easy to administer and do not require lengthy professional training (Budoff, 1987a, 1987b). The results of Budoff's studies are consistent with each other, and it is claimed that some of these outcomes (notably, those of high societal significance) have been replicated by independent groups (for a review, see Luther, Cole, & Gamlin, 1996). However, after a closer examination of some of the findings, we doubt the conclusiveness of these replications. For example, Sewell and colleagues (Sewell, 1979, 1987; Wurtz, Sewell, & Manni, 1985) explored the differential predictive effectiveness of the results of dynamic and static testing. Working with 70 White and 21 Black first graders, Sewell (1979) concluded that the conventional IQ tests provided a more valid estimate of learning potential under varied learning conditions for middle-class children than for lower-class students, whereas posttest performance was the best predictor for lower-class Black children. The conclusions were based on two analyses: the comparison of correlation patterns within the two groups and the results of stepwise regression. In the correlational analyses, although the correlations between the IQ score and the achievement tests were significant for Whites but not for Blacks, Fisher's z transformation showed that the correlations themselves did not differ from each other, most likely as a result of the lack of statistical power in the sample of Black children. Thus, the conclusion of a difference in the correlation patterns was not warranted statistically. The author chose not to present the details of the stepwise regression analysis (e.g., the variables used and the incremental R²s). If, however, the regression analyses included all of the variables presented in the correlation table, the results might have been problematic because of the presence of multicollinearity in the equation.

Similarly, when researchers (Wurtz, Sewell, & Manni, 1985) attempted to predict learning potential itself (estimated within Budoff's test-teach-test paradigm) with a conventional measure (WISC-R IQ; Wechsler, 1974) and a so-called Estimated Learning Potential (Mercer, 1979), virtually no differences in predictive patterns were found on the basis of statistical analyses. The authors, however, went outside the boundaries of the data by using an "eye-balling" approach to compare for Black versus White children the frequencies of cases classified as mentally retarded on the basis of tests of Wechsler and Mercer and the frequencies of Budoff's gainers. Similarly, Wurtz et al. (1985) concluded (without any statistics at all) that the differential impact of using Estimated Learning Potential "is clearly evident" (p. 301).

In sum, the four-point profile of Budoff's measures suggests that they are fairly robust instruments for the restricted specific purpose of differentiating the population of low-IQ children in order to conduct proper educational placement and to predict future performance. However, Budoff's search for the best outcome measure has raised an issue that is of concern to everyone working in the field of dynamic testing. This issue has to do with operationalization and interpretation of responsiveness to training and learning. It has been pointed out that it is quite common to find children who show good cognitive test results but have a slow rate of learning and vice versa (see Lidz, 1991). Moreover, this discrepancy is present in populations of both challenged (i.e., low-IQ) and nonchallenged (i.e., normal-IQ) learners. We agree with Lidz that Budoff's decision to change the gainers-versus-nongainers paradigm into a paradigm using a set of quantitatively distributed scores solved some analytic issues but also introduced the task of developing population-specific norms that would allow for comparison between obtained and normative scores. This task has yet to be accomplished.

Another concern involves the lack of incremental-validity studies. Although the relative properties (i.e., their associations with gender, static ability measures, personality traits, and SES) of the measures of learning potential have been investigated in large samples of students, what is missing is "evidence that Budoff's Learning Potential Assessment procedures contribute more to prediction of school success than do measures of nonverbal IQ" (Lidz, 1991, p. 27). This remark parallels the recent Spanish results (Fernandez-Ballesteros et al., 1997), according to which training based on Budoff's approach led to an increase in verbal IQ. However, as a methodology with restricted and well-defined goals, it appears to be adequate. Moreover, Budoff's is a pioneering, innovative, and important attempt to incorporate carefully developed static tests into a dynamic-testing procedure.

Ultimately, followers of this approach remain committed to the specific goal of obtaining results from standardized tests
that are administered dynamically and thus challenge primarily the procedure but not the content of traditional tests. Budoff has not developed any global theoretical paradigm or specific intervention programs based on his approach. Perhaps he or his followers will develop such a program in the future.

Testing by Learning and Transfer
(The Graduated-Prompts Approach)

The paradigm. The graduated-prompts approach was developed primarily by Campione and Brown (Campione, 1989; Campione & Brown, 1987) to establish a supportive framework that would gradually help individuals until they could solve a test problem. Kozulin and Falik (1995) referred to this approach as one that uses the idea of the ZPD as an explicit working concept. The theoretical foundation of this approach is an information-processing theory of intelligence (Campione & Brown, 1987).

The key concept of this approach is transfer (maintenance of learning), or an individual's ability to use learned information flexibly and in a variety of contexts (Campione, Brown, & Bryant, 1985). Transfer is considered to be especially important in academic learning situations, in which instruction is often incomplete and ambiguous. The ratio of learning (how much instruction is needed) to transfer (how far from the original example an individual can apply the mastered knowledge) is viewed as a measure of individual differences. A special concern here is that the indicators of learning and of transfer do not appear to be on the same scale, which challenges the meaning of the ratio.

Thus, the operationalization of the theory is in the quantification of indicators of learning and transfer. This quantification is conducted on the basis of a guided-learning paradigm. The typical sequence of testing consists of several sessions that include (a) collection of static, level-of-performance information (pretest); (b) initial learning (hinted stage); (c) static, unmediated maintenance and transfer testing (posttest); and (d) mediated maintenance and transfer testing (hinted posttest). The training procedure is considered successful if (a) performance on the task improves as a consequence of the instruction, (b) the benefit of the training is durable, and (c) the outcome of the training is generalizable, that is, if transfer occurs to tasks other than those on which training took place (Brown & Campione, 1981; Brown, Bransford, Ferrara, & Campione, 1983). Mediation in this approach is delivered by predetermined hints that range from general to specific. Every new hint is given in response to the child's struggle, failure, or error. Provision of hints stops when the child reaches the level of independent task solution predetermined for this task (e.g., two consecutive items solved correctly). The outcome variables are viewed as measures of the child's efficiency of learning. They are operationalized by the number of prompts and by breadth of transfer in terms of the degree of success with maintenance, near transfer (a child's ability to solve problems that are similar both contextually and formally to the training problems), and far transfer (a child's ability to solve problems that are similar to the training problems contextually but not necessarily formally).

Consider the following example of a graduated-prompts task developed by Ferrara (as presented in Campione, 1989). At the first, pretest stage of the procedure, the student's ability to solve simple two-digit addition problems (e.g., 3 + 2 = ?) is tested. During the learning session, the student and tester work collaboratively, and the math problems are presented as word problems such as the following: "Cookie Monster starts out with three cookies in his cookie jar, and I'm putting 2 more in the jar. Now how many cookies are there in the cookie jar?" (p. 161).

If the student runs into difficulties, the tester provides a sequence of hints and suggestions about how he or she should proceed. The amount of help needed to master the specific procedure is an outcome of the learning component of the test. Following the learning stage, the student is presented with a variety of transfer problems in the same interactive, assisted format. The new problems are designed so they require students to apply the procedures learned originally to (a) similar problems or near transfer (e.g., 3 + 1 = ?), (b) somewhat dissimilar problems or far transfer (e.g., 4 + 3 + 2 = ?), and (c) very different problems or very far transfer (e.g., 4 + ? = 6). As in the learning stage, the outcome measure of the transfer session is the amount of help students need to solve these transfer problems on their own. The final stage of the testing procedure is a posttest, in which the student is given a set of tasks that require utilization of the mastered procedures.

It should be mentioned that, in contrast to previous approaches, this mode of dynamic testing relies on new content rather than on complex tasks (which are typical of standardized tests). The content tends to be at the beginner level, which makes the standardization of hints easier and allows for more differentiation among children at the lower end of the ability distribution.

Formally, the outcome measure (viewed by these researchers as the measure of learning potential) is constructed as the inverse of the minimal number of hints that is necessary for each individual to reach a specified amount of learning (Resing, 1993). In other words, rather than concentrating on the amount of improvement in a student's performance, researchers determine how much help students need in order to reach a specified criterion and consequently how much additional help they need to transfer the learned rules and principles to novel tasks and situations. This position is different from the position of the majority of other dynamic testers (e.g., Budoff, Guthke, Hamer, and Ruijssenaars), who stress the maximal degree of improvement in performance, that is, how much better the individual does on the posttest than on the pretest. Values on the outcome variables within the graduated-prompts approach are calculated as sums of the total number of hints given at each stage of the task (i.e., initial learning, maintenance, and transfer) as well as of the total sum for the whole session. The profile of the outcome measures is viewed as an indicator of the student's ZPD. It is assumed that children with broad ZPDs profit more from the intervention and need less assistance than children with narrow ZPDs.

The primary target population for this approach is academically weak students, who are often labeled as learning disabled or mildly retarded (Campione & Brown, 1987). Researchers working in this paradigm have chosen to standardize the testing procedure with the goal of producing psychometrically defensible quantitative data (Campione, 1989). Types of tasks that are used within this framework include inductive-reasoning problems such as variants of progressive-matrices problems and se-
94 GRIGORENKO AND STERNBERG

ries-completion problems, mathematics problems, and reading- metacognitive training procedures have significant short-term
and listening-comprehension tasks. and long-term effects. The test scores of experimental groups
Empirical findings. Researchers using this approach have that had undergone the training were higher compared with the
been concerned with a number of issues. First, they have investi- scores of the control group several months after training. In
gated the role of learning and transfer processes in students at addition, Resing found that both posttest scores and learning-
different levels of scholastic performance. To accomplish this potential scores, when compared with pretest scores, make a
goal, the researchers developed measures of learning and trans- significant contribution to the prediction of school achievement
fer and evaluated their concurrent and predictive validity (for a (4% to 40% increase in explained variance, depending on the
review, see Campione, 1989). These measures were imple- test). These findings were replicated in a sample of preschool
mented in studies conducted to determine whether learning and children (Day, Engelhardt, Maxwell, & Bolig, 1997) who were
transfer are related to general ability differences and whether given pretests, training, and posttests on block design and simi-
these scores provide information beyond that obtainable from larities tasks. Using structural-equation modeling, the research-
the static tests. The learning and transfer studies indicated that ers demonstrated that the best-fit model was the one that in-
students of lower ability, as compared with higher ability stu- cluded paths from both pretest and learning testings to posttest
dents, require more instruction to reach the criterion and need performances within each domain. Yet another independent rep-
more help to show transfer (Campione, Brown, Ferrara, lication of these results was obtained in a study of 193 first-
Jones, & Steinberg, 1985; Ferrara, Brown, & Campione, 1986). grade children (Speece, Cooper, & Kibler, 1990). On the basis
For example, a study of this type was conducted among groups of static ability measures, 104 students were considered at risk
of third and fifth graders (Ferrara et al., 1986). A test of letter- for school failure, and 83 were identified as controls of average
series completions was presented to each child. Each consecu- ability. The researchers designed their own test for 6 year olds
tive step included a specific set of standard and increasingly with the instrument developed by Bryant (as presented in
explicit prompts. The first prompt was a subtle hint, whereas Speece, Cooper, & Kibler, 1990) as a proximal model. Again, the
the final one was direct teaching of the problem's solution. The dynamic measure proved superior. Verbal intelligence, pretest
ZPD was operationalized as the inverse of the number of knowledge, language variables, and the inverse of the number
prompts needed for each sequential task: A wide ZPD corre- of prompts needed during training accounted for 48% of the
sponded to a reduced number of prompts needed from one trial posttest variance, with the prompt measure accounting for a
to another, that is, for the effective transfer of a new solution significant amount of variance beyond all other variables in the
across similar problems. It was found that fifth graders learned model. Moreover, although the two groups of children were
more quickly than did third graders and that higher IQ children indistinguishable by standard achievement measures and the in-
needed significantly fewer hints to reach a learning criterion verse of the number of prompts, the groups could be discrimi-
than did average-IQ children. Furthermore, as the number of nated by the posttest measure.
characteristics distinguishing the learning and transfer tasks in- Second, researchers investigated the role of dynamic testing
creased, the performance differences became progressively more in clinical assessment. A number of group-comparison studies
pronounced when younger children were compared with older have been conducted to investigate possible differences between
ones and when lower ability students were compared with higher retarded and nonretarded children. Hall and Day (1984, as cited
ability students. in Day & Hall, 1987) tested the claim that learning-disabled
With regard to the superiority of dynamic-versus-slatic proce- and average-achieving children require less assistance than do
dures, Bryant, Brown, and Campione (as cited in Campione, retarded children to learn how to find a solution for a novel
1989) asserted that the learning and transfer scores would pro- problem and how to transfer their learning. There were no group
vide information beyond that obtainable from static tests. In differences during training or close transfer. Performance on the
this study, the individuals' gains from the pretest to the posttest far-transfer task, however, did reflect group differences: The
were treated as criterion measures; the intent was to investigate average children did the best, but the differences between their
the set of scores that would best predict these gains. It was performance and that of learning-disabled students were not
found that guided learning and transfer scores were the best significant. Mentally retarded children, however, did signifi-
individual predictors of gain (rs ~=^ .60), whereas static ability cantly worse than did either of the other groups and required
scores, although predictive, were of secondary importance (rs more hints to learn to solve the problems.
2 .45). Campione, Brown, Ferrara, et al. (1985) used the Raven
Similarly, the goal of two other studies, one involving a sim- matrices test in a comparative study involving mental-age-
plified version of the matrices test and the other a simplified matched retarded and nonretarded children. They found no dif-
version of the series-completion task (Campione & Brown, ference between the groups' means at the learning stage. It was
1987), was to evaluate the magnitude of the contributions of hypothesized that this lack of difference was due to matching
IQ and of measures of training and transfer to the residual gain, procedures that equated the groups for both mental age and
calculated on the basis of pretest and posttest. It has been found entering competence. However, there were significant group dif-
that both learning and transfer measures are associated with the ferences during both the maintenance and transfer phases in that
gain measure over and above measures of abilities (the incre- retarded children scored lower. It was found that the greater the
ments in K2 were 22% and 17% for the matrices task and 2% need for flexibility in applying the learned rules, the larger were
and 22% for the series-completion task). The ability measures the differences between retarded and nonretarded children.
accounted for approximately 37% of the variance, with the esti- Moreover, in the second study (Campione & Brown, 1987),
mated IQ explaining 24% and the Raven scores accounting for which used letter-series completions, group differences between
14% of the variance. Moreover, Resing (1993) showed that retarded and nonretarded children emerged at the learning stage,
and more differences appeared at maintenance, as well as during far transfer and very far transfer, once again with the retarded group scoring lower.

Day and Zajakowski (1991) compared assisted and unassisted performance of average and learning-disabled readers and found that children with learning disabilities required significantly more instruction than did average readers to reach a mastery criterion in reading. Resing (1993) studied children from mainstream primary classes, learning-disabled students, and mildly mentally retarded children of the same chronological age. Children in these three groups differed considerably in the mean number of hints needed to reach the established criterion. The three groups also needed a different number of hints per training item. Whereas children from primary school mostly needed metacognitive hints, about 12% of learning-disabled children and about 25% of mildly retarded children needed task-specific instruction in addition to metacognitive hints.

Evaluation based on the proposed four-point profile. The evaluation of this work on the four dimensions involves a number of issues. Similar to the findings obtained within other dynamic testing paradigms, the observation was made that retarded and nonretarded children differ in terms of their test performance. The novel piece of information here is that in addition to differences in pre- and posttest, that is, the differences that closely replicate the discrepancy registered when static tests are used, differences have also been documented in terms of the ability of retarded and nonretarded children to transfer and learn. The quantification of these differences became possible because performance at both learning and transfer stages was measured through the number of hints needed to reach the criterion. In other words, the remarkable achievement of this approach is in its creative quantification and standardization of the intervention and transfer stages, an achievement that has not been equally accomplished by either Feuerstein or Budoff.

Another novel piece of information came from studies of normal children, in which the researchers noticed a developmental difference between third and fifth graders in terms of the number of requested and processed hints: The older the child, the fewer the number of hints he or she needed in order to find the correct answer. The broader psychological meaning of this finding is that learning potential, as defined by followers of the graduated-prompts approach, changes its properties at different developmental stages, that is, it is itself developmentally dynamic. On the basis of this finding, a hypothesis might be formed that learning potential has a psychological structure developmentally similar to that of any other cognitive function and that it may be possible to intervene in its development.

As with any research methodology, the graduated-prompts approach raises a number of concerns. A first concern has to do with the nature and meaning of the hints' comparability with each other (Lidz, 1991). Because of the fact that the nature of hints is very heterogeneous (the hints, for example, vary in terms of their difficulty), hints given at different points of problem solution may not be comparable (or even additive in their effects). Moreover, hints might have different meanings and significances for individuals of different cognitive profiles. In order to understand the psychological meanings of the outcome measures obtained using this approach, these outcomes need to be studied with more traditional cognitive-developmental indicators (e.g., memory and attention), for example, because a more attentive and reflective child might require fewer hints than might a more impulsive child and consequently score higher on learning-potential measures. In this case, the learning-potential measure might be just a proxy measure of the child's attention rather than of his or her learning.

Another concern that needs to be investigated is whether the way learning potential is defined in this paradigm is really qualitatively different from the traditional way ability is defined in the static-testing paradigm, or whether it is another measure of it that complements (but is nonidentical to) existing measures. The number of prompted hints reflects the number of subtasks a child cannot solve independently. In other words, a graduated-prompts task might be viewed as a sequence of smaller tasks, each of which can be either solved (i.e., no hint is given, the child gets a score) or not solved (i.e., a hint is given, the child does not get a score). Thus, test performance could be rescored in terms of counting right and wrong answers rather than hints. When reconceived in this way, the outcome measure of the graduated-prompts approach seems to have more a static than a dynamic nature.

Moreover, in the graduated-prompts paradigm, the child is not directly taught anything; he or she is led to the discovery of a rule through a system of predetermined hints. What one really wants to know, then, is how the hint-based measures correlate with indicators of other cognitive functions (e.g., memory, cognitive styles, attention, specific abilities). It appears that these outcome measures might correlate higher with other ability measures than with other learning-potential measures.

In contrast to the two traditions discussed above, followers of this approach, as far as can be judged from the published material, have paid significantly less attention to such psychometric properties of their instruments as test-retest reliability and internal consistency. We are aware of only one study (Ferrara et al., 1986) that has addressed reliability issues. This study, however, appears to have been addressing the issue of reliability of change rather than the issue of the reliability of the tasks themselves. The researchers reported that the amount of instruction needed by participants was similar across two inductive-reasoning tasks, which suggests that learning potential can be measured reliably in the context of different but related tasks.

With regard to predictive validity, this approach is similar to other dynamic testing approaches, in that it has not yet shown the substantial predictive power of the learning and transfer measures for either school-achievement criteria or other adaptive measures. At present, the empirical research is limited, but we believe it is very promising.

The graduated-prompts approach operates with standardized measures that do not assume any special training and are easy to administer. Moreover, this methodology has been used by independent researchers (e.g., Day & Cordon, 1993; Day & Zajakowski, 1991; Speece et al., 1990), suggesting its transportability and applicability in different research contexts.

In sum, the graduated-prompts approach introduced a new feature to dynamic testing: standardization of intervention and transfer. This development has been realized by switching the center of the testing situation from the child to the task. The task and the developed hints anticipate virtually every move a child can make in the search for a solution. The question is not what the trainer should say to the child in order to lead him or her to the solution. The what is clear. The only issue is how
much should be said to the child in order for him or her to reach the criterion.

A similar approach, shifting the emphasis from the child to the task, has been implemented within the European tradition, especially in the work of Carlson and Wiedl. We discuss this work in the next section.

The European Contribution

The Lerntest(s): The paradigm. This approach represents a conglomerate of testing procedures united under the rubric of the Learning Test (Lerntest in German).

A German psychologist, Guthke (1992, 1993), has developed a series of tests framed in the pretest-training-posttest paradigm. In this paradigm, training (learning) is based on Gal'perin's theory (1966; previously discussed), the core of which states that every cognitive function can be initially formed (developed) within the child's ZPD under the teacher's assistance and then internalized and assimilated. Guthke and his colleagues have developed several types of learning-potential tests, adjusted to the length of the training phase. They use repetitions, prompts, and systematic feedback during the test in extended training programs interposed between a pretest and a posttest. It has been noted (Haywood, 1997) that Guthke's procedures are closer to the psychometric tradition than are most of the techniques used in other approaches to dynamic testing.

The learning test concept attempts to meet two criteria: (a) the need to meet the demands of modern psychometrics and (b) the need to determine an individual's ability to learn. The latter is accomplished by recording the effect of standardized cues incorporated into the test. According to the type of standardized cue, Guthke (Guthke, Beckman, & Dobat, 1997) distinguishes between short-term and long-term learning tests.

An example of a learning-potential battery with a long-term (7-day) training phase is a reasoning battery, the Reasoning Learning Test (Guthke, 1993), which consists of two parallel tests (Forms A and B) that can be alternated as pre- and posttests. The test is designed to assess reasoning in the basic domains: verbal (through analogies), numerical (through numerical sequences), and figural (through figural sequences). Training is standardized and can be conducted either in groups or individually. During the training phase, the students are provided with instruction manuals. Students are explicitly taught metacognitive strategies for solving the test items. The outcome variable is the posttest score, which is considered to be the result of the learning-potential test.

The short-term battery is designed in such a way that the training phase is embedded in the procedure. These tests do not involve intervention per se but are based on manipulations within the test situation itself. This approach is similar to the testing-the-limits procedures developed by Schmidt (1971). There are two types of short-term batteries: (a) tests providing systematic but fairly limited feedback and (b) tests providing extensive assistance in a standardized form in addition to simple feedback. There are five published German learning-potential tests (for a review, see Guthke, 1993): Sequence of Sets Test (for preschool and first-grade children; this test of series completion is designed to assess the learning skills prerequisite for math), the Preschool Learning Potential Test (for children ages 5-7 years), the Situation Learning Potential Test (for children ages 7-9 years), the Speed and Recall Test (for adults with functional brain disorders), and the Reasoning test. Some of these tests are based on conventional intelligence tests, whereas others use new types of items. An example of a short-term learning test based on a conventional measure is the Raven Short-Term Learning Test (Guthke, 1992), which is designed for early identification of developmentally challenged children. At the first stage, the child is tested with the original colored form of the Raven test for children. If problems are not solved, a set of graded hints is given. The hints (dosaged teaching intervention) are developed according to Gal'perin's (1966) theory of learning. All children are led to the correct solution and, if needed, are shown it. The outcome measure is the number of hints needed.

Finally, the so-called Diagnostic Programs (DP) are a special, newly developed variant of the short-term learning test. The DP, according to Guthke and Stein (1996), attempt to quantify not only learning gains but also the learning process itself. An example of DP is the Reasoning Diagnostic Program (Guthke, Rader, Caruso, & Schmidt, 1991). This is a sequence of increasingly difficult figural tests, administered and analyzed by computer. The items are ordered in terms of their complexity so that in order to be qualified to attempt the next level of complexity, the child has to adequately solve so-called target items, presented at the beginning and the end of every stage. If, however, the child makes errors, he or she will be returned to easier items and will be forced to work up to the point of his or her mistake. If, however, the child makes the same mistake again at this item, he or she will receive prompts. These prompts will, presumedly, lead the child to a more complex level of problem solving. Thus, a combination of (a) repetitions of easier items and (b) standardized prompts allows the child to progress within the test.

The Lerntest(s): Empirical findings. Researchers investigating the Lerntest have addressed various dynamic-versus-static testing issues such as the influence of training on performance, group differences on static and dynamic measures, and the predictive validity of their measures. However, because of the fact that European followers of dynamic testing are of different nationalities and because the majority of them have published their results in their own countries in their own languages, it was difficult to obtain the data from many of these studies. Therefore, the following section is not a comprehensive review but rather a representative collection of the published studies that were available to us.

With regard to training, a number of studies compared a trained experimental group with a control group to which a test was simply administered twice. The experimental group consistently has demonstrated significantly higher gain. Even short-term training has resulted in significant gain in learning (Guthke & Wingenfeld, 1992). It appears that pretest scores do not reliably predict either posttest scores or learning gain (e.g., simple repetition of the Raven test resulted in r = .70 between scores obtained on the first administration and the repetition, whereas the correlation between pre- and posttests was only .27 when an interview occurred between testings). However, the predictive validity of posttraining scores, when correlated with school grades and teacher ratings, appears to be fairly high; posttest scores tend to show significantly higher correlations with school performance than do pretest scores (Guthke &
Wingenfeld, 1992). For example, the posttest scores on the Raven Learning Test obtained for 28 kindergarten children were found to be predictive of school performance evaluated in the first, second, sixth, and seventh grades, whereas the results of the conventional RCPM were not found to correlate with school success (Guthke, 1992). In interpreting these results, one should be extremely cautious. First, the sample size appears to be quite small for obtaining the statistical power necessary to detect both the change itself and its magnitude. Second, the presented correlations are all statistically significant only at p < .05, which suggests possibly that the magnitude of the effects was quite low.

Thus, these studies replicated the finding that posttest scores are more informative than scores on both pretest and training measures. Guthke (1992) and his colleagues (Guthke, Beckman, & Dobat, 1997; Guthke & Stein, 1996; Guthke & Wingenfeld, 1992) summarized the results of a number of learning-potential studies that investigated the associations between the measures of learning potential and other psychological variables. According to these authors (who, unfortunately, did not include the statistical data in their report), learning potential (a) appears to be relatively insensitive to environmental manipulations (e.g., parental support), (b) tends to correlate with creativity (the higher the learning potential, the more creative a student is), and (c) tends to reduce the influence of nonintellectual components (e.g., personality traits) on test performance (the higher the learning potential, the smaller the role played by such factors as irritability and neuroticism).

In a study of group differences in learning potential, Groot-Zwaaftink, Ruijssenaars, and Schelbergen (1987) worked with groups of children with and without cerebral palsy. They used a computer version of the Tower of Hanoi problem on the basis of the pretest-training-posttest paradigm. They also administered two other learning-potential tests that dynamically approached the question of measuring fluid abilities. The results showed significant group differences in learning potential between affected and normal children. This finding might have turned out to be important had the researchers conducted a comparison of group differences on conventional tests. Had there been such data, especially if the patterns of group differences had been different for affected and normal children, these data would have offered yet further confirmation of the effectiveness of dynamic testing. Unfortunately, a finding of just group differences between affected and unaffected individuals does not tell us what is added to the usage of a conventional test in similar situations.

Similarly, Hessels and Hamers (1993) conducted a large comparative study (N ≈ 500) of group differences between immigrant children from Turkey and Morocco and native Dutch school-age children. In this study, the researchers developed the language-free Dutch Learning Potential Test (DLPT), on the basis of methodology originated by Guthke and his colleagues. Using factor analyses, the researchers established that the factor structure of the learning-potential measures was virtually the same for all groups. Although the mean difference among the groups was significant on conventional IQ tests, the difference between mean DLPT scores was not significant. In other words, although scoring significantly lower on static tests, immigrant children scored as well as the native Dutch children who participated in the study on the dynamic tests. However, the predictive power of the DLPT was not significantly different from the predictive power of static tests. This finding contradicts those from the 7-year-long longitudinal study (Guthke, 1992) described previously. Because of the facts that the sample size of the Dutch study was significantly larger and that the magnitudes of the effects in the German study were quite small, the inconsistency of the results might be explained by a range of factors; for example, the DLPT and Lerntests (though based on the same methodology) are different tests, the German results were limited, and there were some moderating factors of unknown effect in the Dutch study.

Testing-the-limits approach: The paradigm. A somewhat different methodology has been developed within what has been called a testing-the-limits approach (Carlson & Wiedl, 1978, 1979; Embretson, 1987a). Carlson and Wiedl (1980, 1992a, 1992b) have attempted to construct a theoretical framework for their approach by integrating their empirical findings with information-processing theory. They attributed poor performance on ability tests, at least to some degree, to participants' inability to understand clearly what they were supposed to do and to a set of personality variables (e.g., test anxiety, personality traits, self-esteem). Their conceptual schemata included three components: task characteristics, personal factors, and diagnostic approaches (Carlson & Wiedl, 1992a). Their main tasks were the Raven matrices and the Cattell Culture Fair Test (Cattell, 1940). Personal factors include cognitive and metacognitive variables. Epistemic (structural) and heuristic (procedural) structures are reflective of previous states of knowledge. Diagnostic approaches represent differentiations of testing strategies that are designed to boost the performance of disadvantaged children. In their work and specifically in their model, Carlson and Wiedl have addressed all three of these issues.

This approach, like other dynamic-testing approaches, states that test performance is conceptualized as the result of the dynamic interaction among the individual, the test materials, and the test situation (Carlson & Wiedl, 1992a). The special feature of this approach is that it is centered on the test situation. The initial idea of the testing-the-limits approach was originally formulated by Schmidt (1971), was subsequently developed by Guthke (1977), and later was refined by Carlson and Wiedl. The main assumption of this approach is that for particular individuals, certain manipulations of the testing situation designed to compensate for present intellectual or educational deficits can lead to significant improvements in performance (Carlson & Wiedl, 1979). Thus, the task is to find a match between a certain type of disadvantaged student and a certain type of manipulation of the test situation so that the match evokes the best performance possible.

Researchers working within this paradigm concentrate their attention on conventional tests (e.g., the Cattell Culture Fair Tests or the RCPM) in order to develop a system of test administration that can determine which characteristics of the test situation, the test itself, or an individual are related to changes in test performance and consequently can enhance performance. Hence, the researchers are primarily concerned with standardized interventions designed to facilitate performance and serve as more sensitive measures of abilities.

Testing-the-limits approach: Empirical findings. Most field work exploring the effectiveness of the testing-the-limits approach has been conducted with German children (e.g., Bethge,
Carlson, & Wiedl, 1982; Carlson & Wiedl, 1976, 1978, 1979, 1980; Wiedl & Carlson, 1976). Initially the work was done with normally developing children, but later it included children with learning difficulties and children of different ethnic groups. In the early studies, researchers compared the six conditions over the course of task administration. These conditions were (a) standard instruction, (b) verbalization during and after solution, (c) verbalization after solution, (d) simple feedback, (e) elaborate feedback, and (f) elaborate feedback plus verbalization during and after solution. For example, Carlson and Wiedl (1976, 1978, 1979) conducted a series of studies with the RCPM. The test was given to second and fourth graders in two forms: the regular booklet and puzzle form. The RCPM was administered under the six conditions described above. The first two studies (Carlson & Wiedl, 1976, 1978) demonstrated that test performance improved significantly because of testing condition and the form of the test. The most salient conditions resulting in improved performance were those involving verbalization and feedback. The most effective conditions were Conditions e and f, involving verbal descriptions during and after problem solution and elaborated feedback (p < .05). However, only the second-grade children showed further improvements in performance through feedback (Age × Testing Condition interaction, F(5, 410) = 2.22, p < .01). Closer analysis of the published results of the analysis of variance used in the 1978 and 1979 studies allowed us to compare the magnitude of the main effects specified in these analyses. In the 1978 study, the following main effects (Fs) and their magnitudes (fs) were revealed: for testing condition, F(5, 101) = 2.71, p < .05, f = 0.35; for version of the test, F(1, 101) = 15.55, p < .001, f = 0.38; and for repeated testing, F(1, 101) = 28.68, p < .001, f = 0.52. In the 1979 study, when the sample size was increased from 108 to 433 students, the following results were obtained: for testing condition, F(5, 410) = 12.63, p < .01, f = 0.38; for age, F(1, 410) = 92.29, p < .01, f = 0.46; for version of the test, F(1, 410) = 61.77, p < .01, f = 0.38. Clearly, the magnitude of the effect of testing condition appears to be similar to or smaller than the magnitude of the effect of simple repetition of the test and the effect of the test version.

When the researchers found that the most effective conditions were verbalization and feedback, they used these conditions and the condition of standard administration for comparison. They restructured the testing situation to incorporate verbalization and elaborate verbal feedback on the participants' performance. While solving the task, the participants were asked to describe both the task and their own cognitive activity (e.g., Tell me what you see and what you are thinking about as you solve this problem. Tell me why you think the solution you chose is correct. Why do you think it is correct and other possible answers are wrong?).

In general, the restructuring of testing situations led to higher levels of performance in participants with mental retardation, learning disabilities, or neurological impairments and in participants of a minority-ethnic cultural background. Moreover, it was shown that when related to general ability test scores and school achievement, the scores obtained in the situations of restructured testing had higher predictive validity than did the scores recorded under conventional testing.

Similar to the results of the earlier work by Budoff and Corman (1976), the results of these studies showed that the training had the greatest effect on items requiring reasoning by analogy, that is, those items for which higher level cognitive processes could be modified. Expanding on this finding, researchers (Cormier, Carlson, & Das, 1990; Kar, Dash, Das, & Carlson, 1993) used planning (operationalized by visual search) as an individual-differences dimension in the design and examined the effect of verbalization on task performance. No main effect of verbalization was shown, but there was a significant interaction effect: Only poor planners improved. In other words, overt verbalization compensated for intraindividual variability in planning and yielded an interaction between individual differences and test condition. Thus, on the basis of the results from these studies, the testing-the-limits approach appears to be most appropriate for assessing higher level cognitive functions in individuals whose level of performance on corresponding tasks is initially low.

Carlson and Wiedl (1979) were the first to introduce personality as a variable in dynamic testing. Using the sample from their previous studies, the researchers have modified the design and collected data on the introversion, neuroticism, and impulsivity-reflectivity of 203 second-grade and 230 fourth-grade students. The results showed differential correlational profiles for different testing conditions with the number of significant correlations between dynamic test performance and various personality measures varying from condition to condition. For example, for the impulsivity-reflectivity indicator, the number of significant correlations varied from zero out of six (Condition e) to five out of six (Condition f). However, the patterns of correlations between test performance and personality measures within Condition a (conventional administration) and Condition f (elaborate feedback plus verbalization during and after solution) were very similar. This finding suggests that personality variables operate in similar ways in situations of both static testing and testing with elaborate feedback and verbalization. In addition, the researchers tried to predict RCPM scores from personality traits. Whereas the same percentage of variation in performance on the Raven can be attributed to personality variables under all testing conditions, no main effects for specific personality traits were found. However, there were some interesting interaction effects. Neuroticism correlated positively with test results under Condition b, but impulsivity correlated with test results under Condition f. This finding was replicated in the 1980 study (as cited in Lidz, 1987), in which the performance of impulsive children improved when verbalization before and after problem solving took place. Similar results were obtained in a population of learning-disabled children (as cited in Lidz, 1987). This general result (performance improving under nonstandard administration) has also been replicated in a study of a racially mixed sample of American children (Dillon & Carlson, 1978). In this study, Dillon and Carlson administered the RCPM to children of three different age groups (5-6, 7-8, and 9-10 years) and different ethnic backgrounds (White American, Mexican American, and African American). Three test conditions were used: no help (conventional testing), verbalization, and verbalization plus elaborated feedback. Although there were marked differences in the performance of the three groups in the testing situation of no help, these discrepancies were significantly reduced under the verbalization-plus-feedback condition; the researchers found that differences by race declined markedly under the dynamic conditions, which suggests that the differences by race traditionally registered by static tests could be reduced or perhaps even eliminated in dynamic-testing settings. Similarly, Bethge, Carlson, and Wiedl (1982) studied 72 third-grade children and showed that dynamic testing reduces test anxiety and negative orientation to the testing situation. They found that both situation and achievement anxiety indicators are significantly lower in dynamic than in static testing settings, F(2, 69) = 5.73, p < .01, and F(2, 69) = 5.55, p < .01, for evaluation and achievement anxiety, respectively.

In sum, the work of Carlson, Wiedl, and their colleagues convincingly demonstrated that specific nontarget variables (e.g., anxiety, impulsivity, poor planning) affect performance on g-loaded factors, but these detrimental impacts could be compensated for by overt verbalization.

One advantage of the Carlson-Wiedl approach (for a review, see Carlson, 1989) is that it does not require a pretest, specific training, and a posttest (i.e., the test-teach-test paradigm). The studies are designed in such a way that children are randomly assigned to different testing conditions. Thus, the typical dynamic test-teach-test paradigm is not necessary. Moreover, we are not aware of any published material in which the psychometric properties of the Raven test, as it is administered under different conditions, has been evaluated. If testing conditions change the structure of external correlations between test performance and other (e.g., personality) variables, they might change the psychological structure revealed by the test itself by changing the internal consistency of the test. In addition, this approach mitigates methodological problems related to the measurement of change.

The main disadvantage of the approach is in the group nature of the results. In other words, tests are presented in different modes of administration (e.g., standard procedure, verbalization during and after solution, elaborate feedback) to different groups of participants, with the goal of determining which mode is better for which group (e.g., learning disabled versus mentally retarded). No individual comparison is possible. However, although the research findings are based on group comparison, the intervening procedures are highly individualized and targeted to a specific profile of the performance of every child.

Most of the findings obtained within this framework are interactive in nature. Specifically, verbalization and elaborated feedback improve performance in individuals (a) of below-average ability levels, (b) at certain developmental stages, (c) with high levels of anxiety, and (d) on tasks with a certain degree of difficulty that require higher level cognitive processing. This list of requirements limits the target population for whom this approach to testing is most beneficial. However, when the population is defined appropriately, the approach should be adequate. For example, below-average-ability students with a high level of anxiety are expected to demonstrate their best performance on the Raven's test under the testing condition of feedback and verbalization. Thus, the testing-the-limits approach has provided some evidence that when testing approaches are applied differentially and optimal conditions are defined for specific groups of individuals, these individuals are expected to demonstrate their best performance, a performance of higher quality than they would have shown in a situation of conventional testing.

The issue of predictive validity was addressed by Carlson and Wiedl (1979), who compared correlations of Raven scores under different testing conditions with measures of mathematical and language performance. Even though the numbers of participants tested under different conditions are small (ranging from 13 to 21), there is a trend toward improving the magnitude of correlations by increasing the degree to which the instruction is dynamic. Later examination of this issue in a sample of German children showed that under effective testing-the-limits procedures (verbalization), both Raven and Cattell scores correlate highly with mathematics performance when the teaching procedures match the testing procedures (J. S. Carlson, personal communication, September 10, 1997).

In summary, the Carlson-Wiedl approach explores one dimension that is important for dynamic testing: the impact of instruction and feedback. The paradigm is based on group comparison; it serves pedagogical goals and does not address the issue of individual differences. The results obtained with this methodology can be used within remedial programs for specific disadvantaged populations.

Swanson's Cognitive Processing Test

Swanson's (1996) work in the field of dynamic testing resulted in creation of the Swanson Cognitive Processing Test (S-CPT), the only dynamic test currently distributed by a major test publisher (Pro-Ed). The development of the S-CPT was triggered by an intention to create an instrument allowing the evaluation of "components of processing ability under standardized dynamic testing conditions" (Swanson, 1995b, p. 674). The theoretical roots of the S-CPT are in the information-processing approach to learning disabilities (Swanson, 1984a, 1984b, 1988). Swanson's approach to dynamic testing is both methodologically and terminologically linked to the work of his predecessors, especially Feuerstein and Brown and Campione.

Theoretical model. The main assumption of the model is the central role of working memory (WM), a critical component of many information-processing models, in skill acquisition and learning. Correspondingly, children's learning difficulties are attributed to deficits in WM, whereas children's academic excellence is considered to be linked to high levels of WM (Swanson, 1995b). Swanson defined WM as a system that simultaneously holds old and new information that is being manipulated and transformed. Long-term memory is defined as a system of highly interconnected units representing semantic and episodic information. The procedural basis of the model is in the assumption that WM encoding occurs when long-term memory representations are engaged in the process of testing as a result of previous learning.

Test description. The S-CPT consists of 11 subtests (Swanson, 1992, 1993) that can be administered as a battery or separately. Administration of the complete S-CPT battery requires approximately 3 hr. The subtests are Rhyming, Visual Matrix, Auditory Digit Sequence, Mapping and Directions, Story Retelling, Picture Sequence, Phrase Sequence, Spatial-Organization, Semantic Association, Semantic Categorization, and Nonverbal Sequencing. The test is viewed as an instrument for quantifying processing potential (Swanson, 1995a). The concept of processing potential was defined by Swanson as being close to Feuerstein's concept of cognitive modifiability. The administration of the S-CPT results in seven composite scores (Swanson, 1995b): (a) the initial score, indicating the highest level of unassisted performance, that is, the level corresponding to the
traditional static score; (b) the gain score, indicating the highest score obtained under probing conditions; (c) the probe score (also referred to as the instructional efficiency score), indicating the number of prompts or hints necessary to achieve the higher score under probing conditions; (d) the maintenance score, indicating the stability of the newly achieved highest level of performance without the support of probes or hints; (e) the processing difference score, measuring the difference between potential performance as determined under guided assistance and the actual performance level; (f) the processing stability score, demonstrating the difference between the maintenance score and the initial score; and (g) the strategy efficiency score, demonstrating the strategy to remember (obtainable only from the Auditory Digit Sequence, Mapping and Directions, Picture Sequence, Spatial-Organization, Semantic Categorization, and Nonverbal Sequencing subtests). Even though Swanson's terminology is novel, most of his dynamic testing scores map onto the indexes used in other approaches. The initial score corresponds to the pretest score; the gain score corresponds to the intervention score; the maintenance score corresponds to the posttest score; the processing-difference score corresponds to the intervention score value minus pretest score value; and the processing-stability score corresponds to the posttest score value minus pretest score value.

In addition to obtaining the dynamic scores, the battery provides the tester with indexes of semantic and episodic WM (so-called factorial composites), as well as scores on auditory-verbal, visuo-spatial, prospective, and retrospective memory aspects (the so-called S-CPT components). According to Swanson (1995b), both the factorial composites and the S-CPT components were characterized by high Cronbach alphas (ranging from .86 to .95) in all three conditions (initial, gain, and maintenance).

Practical context. The S-CPT was initially developed in the context of special education to address two major issues. The first issue is (a) whether children with specific learning disabilities (in particular, reading and math disabilities) reflect generalized or specific WM deficits when compared with average-achieving children and (b) whether children with these WM deficits are distinct from other groups of children experiencing learning problems (e.g., slow learners). The second issue concerns the degree of modifiability of WM performance in children with learning disabilities.

Empirical findings. Most of the studies that have used the S-CPT were carried out within a large-scale standardization study of this newly developed test. The majority of the questions raised within this study had to do with evaluating the psychometric properties of Swanson's test.

For example, in order to address the issue of construct validity, 98 children were evaluated through the S-CPT, a sentence-span measure (a WM test), a set of achievement tests (the Peabody Individual Achievement Test and the Peabody Picture Vocabulary Test-Revised [PPVT-R]; Dunn & Dunn, 1981), and short-term memory tests (word sequence and object sequence). To evaluate convergent validity, various S-CPT components (i.e., verbal, visuo-spatial, prospective, and retrospective) were correlated with the WM test; all correlations were significant. To evaluate divergent validity, both the S-CPT components and the short-term memory tests were correlated with achievement scores; the mean partial correlation (holding age and PPVT-R scores constant) of academic performance indicators with S-CPT components was slightly higher than with short-term memory scores but was not statistically different (.37 vs. .25, f(1, 59) = 0.93, ns, for mean correlations for S-CPT and the short-term memory measures). Although this difference in correlations is an interesting observation, these data do not warrant any strong conclusions such as the one made by Swanson (1995b) that correlations of higher magnitude were more associated with S-CPT scores than with short-term memory scores (p. 678). In reality, the difference in magnitude did not cross the threshold.

Three composite scores (initial, gain, and maintenance) from the data collected in the total sample (over 1,600 individuals) were subjected to factor analyses (Swanson, 1995b). Swanson chose not to present the total amount of variance explained by the two-factor solutions; for the three scores, the first-factor eigenvalues for the unrotated solutions were 4.43, 5.06, and 4.45, whereas for the second factor they ranged from 1.01 to 1.17. The best solution was found for the principal-factor analysis with a varimax rotation: A two-factor model emerged, reflecting semantic memory processes (with higher loadings of the Rhyming, Phrase Sequence, Semantic Association, and Semantic Categorization subtests) and episodic memory processes (with higher loadings of the Visual Matrix, Mapping and Directions, Story Retelling, and the Nonverbal Sequencing subtests). As a follow-up to the exploratory analysis, a confirmatory factor analysis was conducted. On the basis of presented chi-square values (but no other goodness-of-fit indexes and no parameter estimates), Swanson concluded that the confirmatory analysis also supported a two-factor model. Although the presented evidence supported this conclusion, this finding might have been even clearer had Swanson presented his results more fully.

The most interesting outcome of the factor analysis is that, somewhat surprisingly, the factor structures were practically identical for all three composite scores. This finding suggests that either (a) the intervention had almost a linear effect as if a constant were added to every test score: The correlational structure of gains was very similar to the structure of the initial scores; similarly, the maintenance scores appeared to be almost in linear dependency from both the initial and the gain scores; or (b) the initial, gain, and maintenance scores are highly correlated with each other so that intraindividual variability introduced by the intervention procedure was overshadowed by the magnitude of these correlations. Whereas the former is very unlikely (though possible), the latter is much more likely and, unfortunately, much less appealing. Swanson (1995b) did not present the correlations between the composite scores of the test for the total (N > 1,600) sample, so one cannot tell.

Another piece of evidence suggesting a possibility of high correlations between the three composite scores comes from stepwise regression analysis. Here the intention was to predict how much variance in academic performance might be accounted for by dynamic-testing composites. It is interesting to note that, unlike other investigators in the field (e.g., Campione & Brown, 1987), Swanson decided not to include ability measures in his regression equations. In his article (Swanson, 1995b), Swanson provided the reader with a single illustration of a stepwise regression predicting a reading achievement score (Wide Range Achievement Test [WRAT] reading subtest, J. F. Jastak & Jastak, 1978; S. Jastak & Wilkinson, 1984). With the
initial score always entered first in the equation, of the four composites entered (initial, gain, probe, and maintenance scores), only the initial score (R2 = .26, p < .001) and the gain score (incremental R2 = .05, p < .05) remained in the equation. Leaving aside a variety of technical and theoretical problems associated with stepwise analyses (Altman & Andersen, 1989), we would like to remind the reader that Resing (1993), working in the framework of the graduated-prompts approach, found that both posttest scores and learning-potential scores, when compared with pretest scores, made a significant contribution to the prediction of school achievement (4% to 40% increase in explained variance, depending on the test). Thus, it appears that Swanson's conclusion that his dynamic-testing procedures enhance prediction of school achievement is not yet supported enough by the data in terms of the absolute magnitude of the effect or in terms of its relative magnitude when compared with the results from other approaches.

The matrix of intercorrelations between composite scores (calculated on a much smaller sample of 61 children) published elsewhere (Swanson, 1995a) supports our impression. The intercorrelations between the initial, gain, and maintenance scores ranged between .85 and .88. In this study, the researcher once again used the stepwise regression paradigm, but now the predictors encompassed both static indexes (Full Scale IQ) and dynamic indexes (the initial score in one set of equations and the gain score in another set of equations). The results showed that both static and dynamic scores contributed significant amounts of variance: Independently of the order of entry, the initial score explained more variance in achievement scores (14% if entered first and 15% if entered second for reading, 18% for both orders for math) than did Full Scale IQ (8% for both orders for reading, 1% if entered first and 2% if entered second for math). In contrast to the results from the total data set (see previous discussion), the gain score appeared to be a more powerful predictor than the initial achievement scores. When three composite scores (gain, maintenance, and initial scores) along with Full Scale IQ were forced into the analysis predicting the reading achievement score, 14% of the variance was attributed to the gain score, 11% to Full Scale IQ, and 4% to the maintenance score; the initial score was dropped from the equation. In predicting the math achievement score, the gain score was the only variable that remained in the equation (R2 = .26, p < .001).

Two studies (Swanson, 1994, 1995b) were concerned with usefulness of dynamic testing in the classification of children with learning disabilities. The learning-disabled children were classified as such and then subtyped on the basis of their results on static IQ tests (the WISC-R, Slosson Intelligence Test-Revised [Slosson, 1971], or PPVT-R) and on math and reading subtests from a traditional achievement test (WRAT). Classification of learning disabilities used the cut-off scores procedures (e.g., Siegel, 1989) rather than the IQ-achievement discrepancy. Dynamic-testing results were used in this research in order to distinguish two groups of learning-disabled children: so-called instructionally or teaching-deficient children (those who improve in information-processing performance relative to their initial test performance) and so-called slow learners (those whose potential score is not discrepant from their achievement score). The methodology implemented in both studies was quite similar. First, Swanson (1994, 1995b) investigated the patterns of scores in the learning-disabled and control groups. In both studies, scores on both static and dynamic tests were significantly higher in the control group than in the learning-disabled group. Then, using stepwise regression analysis, reading and mathematical performance scores were predicted for the total sample. Then Swanson (1994, 1995b) tried to differentiate two groups (normal and disabled) by the means of discriminant analyses. In both studies, this procedure failed to provide clear-cut information and therefore was followed by other types of classification analyses. These analyses resulted in the differentiation of multiple subgroups. Finally, the derived groups were compared with each other.

In the first study (Swanson, 1994), one of the research questions was whether ability-group classification, based on traditional IQ and achievement measures, corresponds to the dynamic-testing classification. The sample included 47 average achievers, 26 children with specific reading disability, 24 children with specific math disability, 17 slow learners, and 29 underachievers. This study used all composite scores of the S-CPT.

In the initial group comparison, the groups differed from each other on both composite and component dynamic test scores. Although the profiles of the five groups differed significantly, overall, able students did better on both composite and component dynamic measures. However, there were significant differences between disabled groups, with the slow learners consistently the lowest group on the dynamic measures. The differences between reading- and math-disabled students were more subtle. The stepwise regression analyses were carried out for the whole sample. When the initial score was entered first, the variance in the reading performance score was attributed to the initial (R2 = .11, p < .001), probe (incremental R2 = .15, p < .001), maintenance (incremental R2 = .03, p < .01), and processing-difference (incremental R2 = .04, p < .01) scores. For mathematics, the analysis implemented the initial (R2 = .20, p < .001) and stability (incremental R2 = .12, p < .001) scores as significant predictors. When there was no fixed order of entry, the order of the predictors for reading was the gain (R2 = .14, p < .001), probe (incremental R2 = .12, p < .001), and maintenance (incremental R2 = .06, p < .001) scores; for math, only the gain score (R2 = .32, p < .001) predicted the criterion variance significantly. This pattern of results was similar to the one observed before. Specifically, if the initial score was entered first, the gain and maintenance scores tended not to be significant (probably because of high correlations between the scores). When the order of the entry was random, the gain score appeared to be the most powerful predictor. Some rather small amounts of variance were attributable to other dynamic-testing indicators.

As for the results of discriminant analysis, only 42% of the cases were classified correctly among the five groups. The single best discriminatory variable was the initial score. Only two dynamic measures (the probe and processing-difference scores) added significant variance to ability group classification. This result was followed up with a stepwise discriminant analysis, the purpose of which was to identify which groups were best predicted by which variables of the three that were significant in the previous analysis. The overall characteristics of the analyses were rather weak, but Swanson (1994) proceeded with an interpretation of the data for specific groups on the basis of the number of cases classified correctly. According to this analysis,
the initial score was the best classification variable for slow ities. In other words, by use of the first point of comparison
learners (59% classified correctly) and normal achievers from the four-point profile, what kind of new information does
(76%), the probe score was the best classifier for math-disabled this approach to dynamic testing provide, or what is the compar-
learners (58%), and the processing-difference score was helpful ative informativeness of this approach? The answer to this ques-
in classifying slow learners (52%). None of the variables were tion requires an examination of the statistical procedures used.
informative in classifying reading-disabled students (the maxi- There have been a number of warnings issued regarding the use
mum classified correctly was 15%) or underachieving learners of stepwise procedures (e.g., Altman & Andersen, 1989; Bollen,
(the maximum classified correctly was 38%). 1989), so the procedure must be used with caution. For example,
In the second study (Swanson, 1995a), 155 poor readers were the F distribution is unstable, parameter estimates are biased,
compared with 351 skilled readers. This study used all but the R2 tends to be unreliable, and problems of multicollinearity are
strategy-efficiency composite scores of the S-CPT. As in the troublesome. Obviously, all these concerns are applicable to the
1994 study, a multivariate analysis of variance revealed group analyses conducted by Swanson. Moreover, Swanson has had a
differences on all S-CPT composites, with skilled readers con- tendency not to follow a widely used strategy of conservatively
sistently outperforming poor readers. As a follow-up to this interpreting univariate analysis when multivariate models do not
result, Swanson conducted a stepwise discriminant analysis. fit well (as in the case of the discriminant procedures). In other
Unfortunately, the results of this analysis are presented in a words, heavy reliance on stepwise procedures and a tendency
fragmentary fashion so that it is difficult to estimate its overall to forego caution in making statistical inferences put most of
outcome (e.g., the percentage of the cases classified correctly). Swanson's findings in the gray area of results that require further
Swanson claimed that the best predictor of group classification proof by replication. However, the finding that we consider par-
was the gain score (R2 = .05, p < .001). Notice that this result ticularly relevant to the field of learning disability is the differen-
is quite different from the results of the 1994 study (Swanson, tiation of learning-disabled students into those who are more or
1994) discussed above. less responsive to intervention. The sensitivity of the S-CPT to
Having accomplished less than he perhaps hoped for with procedural characteristics of students' learning potential seems
discriminant analysis, Swanson (1995a) conducted a type of to be of special interest, although the same issue has been ad-
cluster analysis that he hoped would allow him to identify dis- dressed in many other approaches to dynamic testing. What is
tinct nonoverlapping clusters in such a way that children would new here is an attempt to identify a specific function, WM, as
be assigned to only one cluster (somehow, however, the number central to quantifying learning potential and to classifying both
of children in each group increased—there were 155 poor read- learning-disabled and average-achieving students into subgroups
ers before clustering and 156 after; similarly, the group of 351 based on this function.
skilled readers grew in size so that there were 355 of them after Regarding the power of prediction of the S-CPT, two com-
clustering). The clustering was performed on the basis of math ments should be made. High levels of correlations between the
achievement scores and two composite S-CPT scores (gain and initial (pretest), gain (intervention), and maintenance (posttest)
probe). Unfortunately, the justification of the variables selected scores warrant serious investigation. The interpretation of the
for the cluster analysis is somewhat difficult to ascertain; the stepwise regression results as indicative of the higher predictive
article does not clearly address why those particular variables power assignable to either the initial or gain scores might be
and no others were selected. Moreover, no overall statistical unreliable and due to fluctuations in correlations between the
characteristics of the procedure were presented. Swanson stated two variables. This is probably why the results of different
that eight groups (four nonoverlapping clusters in two reading- studies provide competing evidence supporting either the initial
skill groups) emerged from this analysis. On the basis of the score or the gain score as especially informative in predicting
analysis of patterns of discrepancies in the values of the criterion school achievement. Confirming previous observations obtained
variables, these groups were compared with each other. Relying by other dynamic-testing researchers, Swanson's data indicate
heavily on intuition, Swanson classified poor readers into slow independent contributions of abilities and indexes of learning
learners (with minimal discrepancy between dynamic and static potential (processing potential in Swanson's terms) to measures
scores), dyslexic learners (with average math performance and of school achievement.
no significant discrepancies between math performance and dy- The second comment addresses the uniqueness of the infor-
namic variables but significant discrepancies between reading mation provided by the S-CPT compared with the information
performance and dynamic variables), instructionally deficient provided by static tests. Swanson's data support the claim ini-
learners (with highest gain and probe scores and larger discrep- tially made by Vygotsky and then supported by evidence from
ancies between gain and maintenance than between achievement other studies (Day, Engelhardt, Maxwell, & Bolig, 1997) that
and initial scores), and students with learning disabilities in static and dynamic measures jointly provide valuable informa-
reading and mathematics (with large discrepancies between both tion regarding an individual's performance.
reading and math achievement scores and dynamic indexes, but As for the degree of efficiency of the S-CPT, the evidence
low gain and probe scores). The skilled readers group was presented in the reviewed publications suggests that the S-CPT
classified into subgroups of gifted students (with high gain is a psychometrically sound and robust instrument that does not
scores), low math achievers (with low scores in math), skilled require special training of examiners and that may be adminis-
math achievers (with high math scores), and instructionally tered in full in 3 hr. The effort invested in testing of the instru-
responsive children (with high probe scores). ment in the general U.S. population exceeds that done for any
As mentioned above, the main purpose of these studies was other dynamic test. Despite this ease of use, the question of the
to investigate whether the dynamic measures contribute more incremental informational content of the obtained data remains
information to the subgrouping of children with learning disabil- open. The underlying motivation in the creation of this instrument
was to tap into cognitive processes rather than cognitive approach to testing procedures, testing materials, and data
products. Although the correlations between the S-CPT scores analyses.
and various criterion measures are promising, the link between For example, the first goal of dynamic testing is applicable
the profile of dynamic scores and adequate teaching is still to situations in which the test does not measure the same trait
missing. In this context, the findings reported in the discriminant for all examinees. In this aspect, dynamic testing can be viewed
analyses of discrepancies between the S-CPT gain and mainte- as a means to increase comparability of individual differences
nance scores, if shown to be reliable and stable in subsequent by (a) equating for background, either by training examinees
research, might be used in designing intervention strategies. The in the content and relevant processes or by supplying the out-
assumption here is that those children who exhibit low initial comes from prerequisite processes for solving items (Sternberg,
scores and minimal discrepancy between the initial, gain, and 1977); (b) eliminating test-related artifacts (e.g., anxiety); and
maintenance scores should be viewed as better candidates for (c) taking into account cultural and other group differences
special education than those who show a significant discrepancy between examinees. The a priori assumption that estimates of
between the initial score and the gain score but minimal discrep- ability can be changed by varying any of the above aspects,
ancy between the gain and maintenance scores (Swanson, that is, by modifying the conditions under which the test is
1995b). administered, implies the necessity of having dynamic tests with
Our final comment is on the robustness of the results obtained (a) a fairly high goodness of fit to a latent-trait model, implying
on the S-CPT so far. The instrument was evaluated with a large a detailed understanding of the effects resulting from modifica-
heterogeneous sample. However, to our knowledge, all of the tion of the above aspects; (b) high predictive validity; and (c)
data were analyzed simultaneously and by one group. The publi- a clear idea of the impact of various related factors such as
cation of the S-CPT no doubt will result in the use of the processes, strategies, and executive functions (as mediated by
instrument by independent groups whose findings will extend personality and cultural factors) on performance. Another im-
those reviewed above. portant issue is the particular score or scores that should be
used. Current research on learning-potential tests consistently
shows that a posttest can be considered to be a sound predictor
A Three-Prong Approach to Dynamic Testing in comparison with a pretest; this suggests that perhaps only
the former score is interesting in the analyses. However, the
In this article, we have reviewed a number of approaches that
predictive validity even of posttests does not appear at this time
identify themselves as dynamic-testing methodologies. We have
consistently to be reliably higher than that of conventional intel-
attempted here to point out their many differences with respect
ligence tests. The predictive validity of the tests is dependent, of
to aims, tasks, training strategies, targeted processes, target pop-
course, on a number of factors such as the time period between
ulations, and predictive power. Differences among these ap-
measurements of the predictor and the criterion and the types
proaches compromise an accurate evaluation of the comparative
of tasks used as criterion variables. Thus, researchers need to
efficacies of the various approaches, as well as of the overall
take a very careful approach in designing posttraining studies.
efficacy of the dynamic approach in general. Nevertheless, de-
The second goal of dynamic testing, implemented in a number
spite variability in points of view, ideas, and concrete tech-
of existing approaches, is not only to improve the estimation
niques, what is characteristic of all the methodologies consid-
of ability but also to measure a newly formed or developed
ered here is that they share an underlying basic hypothesis that
psychological function. Some researchers (e.g., Feuerstein and
cognitive performance, with optimal aid, should provide the
Budoff) viewed cognitive modifiability as an independent ability
most valid testing of learning potential (Minick, 1987). The
and implemented this belief in their testing devices. Other re-
research activity of the past decade in the field of dynamic
searchers (e.g., Gal'perin) attempted to develop new specific
testing might be characterized as a series of attempts to verify
cognitive functions (e.g., subtraction and addition operations)
this hypothesis empirically.
and then assessed these newly developed functions.
The empirical results derived from studies of the testing of
Finally, the third goal of dynamic testing is to improve peo-
learning potential are not wholly consistent, however. In this
ple's mental efficiency. The major assumption here is that the
part of the article, we consider some of the factors that may
level of ability itself should be changed and that fairly extensive
have contributed to the lack of convergence of the empirical
training is required to change it. Examples of this application
findings regarding the usefulness and importance of dynamic
of dynamic-testing procedures are the Feuerstein Instrumental
testing. Our discussion centers around three issues: (a) corre-
Enrichment program and the work of Soviet-Russian defectolo-
spondence among aims, methodologies, and analytic strategies;
gists. In pursuing this goal, dynamic testing is closely linked to
(b) measurement of change; and (c) ecological validity of dy-
intervention. The dominant purpose is modifying, changing, and
namic testing.
improving cognitive performance. Testing here serves to deter-
mine the starting point, direction, and amount of intervention
General Aims: To Evaluate, Modify, or Both? necessary.
Each application of dynamic testing comes with its own cor-
It appears that some of the inconsistencies in research findings responding methodological and data-analytic assumptions and
are due to the multifaceted nature of the goals of dynamic test- limitations. Moreover, each application of dynamic testing re-
ing. Thus, Embretson (1987b) delineated three main goals of veals unique information beyond that which is derivable from
dynamic testing: (a) to provide a better estimate of a specified conventional testing. The varied approaches to dynamic testing
ability construct, (b) to measure new abilities, and (c) to im- that arose from different theoretical paradigms are based on
prove mental efficiency. Each of these goals assumes a different different assumptions and are targeted at different purposes.
But the general claim made by dynamic-testing promoters and training results in better performance. However, it has been
developers seems to have been justified, at least to some extent: shown that approximately 30% of children improve to a statisti-
dynamic testing does provide data unique to this type of testing. cally significant extent simply because of retesting (Klauer,
The multiplicity of the aims of dynamic testing requires a 1993; LeGagnoux, Michael, Hocevar, & Maxwell, 1990). Thus,
researcher to determine which particular aim of dynamic testing relatively large changes can be observed simply as an outcome
is relevant for a given situation and whether the given procedure of retesting. For example, in 20-30% of the cases, the absolute
will be applicable to this situation. Most likely, the relationship value of the retest effect is at least one standard deviation
between testing and instruction is viewed differently by teachers (Klauer, 1993). In other words, it appears that the most im-
and clinicians, who wish to promote change, and researchers, portant components of the change or posttest scores can be
who are generally more interested in measuring change. Whereas traced back to retesting (practice) effects. This issue is espe-
teachers and clinicians are expected to monitor a child's perfor- cially worrisome for methodologies using standardized tests in
mance closely and to decide when it is beneficial to intervene, their dynamically corrected administration. There are a number
researchers conducting testing typically are not expected (and of precautions that could be taken in order to minimize the
often are unable) to work in such an interactive mode. effects of posttesting: (a) to avoid or reduce retest effects
Hence, there are three differences between these two types through test construction, (b) to control for the effect of re-
of testing, specifically between testing conducted for the sake testing by using a control population, and (c) to model possible
of the evaluation and quantification of learning potential and effects of retesting by use of mathematical models of dynamic
testing conducted for the sake of teaching (Guthke, 1993). First, testing.
the priorities of the researcher and the teacher are different. However, even assuming that all potential methodological dis-
Whereas the researcher typically subordinates the goal of teach- turbances have been controlled for and that the dynamic tester
ing to the goal of determining the difference between a child's has registered the occurrence of true change, another important
unassisted and assisted levels of performance, the teacher typi- question arises: How does one measure change? Traditionally,
cally subordinates testing to instruction and modification. Sec- the measurement of change was based on the regression model
ond, the teacher rarely has a chance to work with a student in developed in the framework of classical test theory (see Cron-
an individual setting. Thus, the teacher accumulates the feedback bach & Furby, 1970). Recently, dynamically oriented psycho-
received from a group of students and adjusts teaching behavior metricians have objected to this approach, stating that its various
as part of his or her conception of the task at hand. Finally, premises typically are not met (Schottke, Bartram, & Wiedl,
much of what is taught at school cannot be broken down into 1993; Sijtsma, 1993). A useful review of harm-benefit ratios
a well-organized invariant sequence with a distinct path to mas- that result from applications of different statistical treatments
tery; on the contrary, there is a spectrum of appropriate reactions of change has been written by Embretson (1987b). This review
by the child, and the teacher is expected to interpret them and described a number of paradigms based on item-response theory
flexibly respond to them while teaching. (IRT; e.g., Fischer, 1987; Hambleton & Swaminathan, 1985;
These differences in attitude are especially important because Hambleton, Swaminathan, & Rogers, 1991; Lord, 1980) and
recently there has been a shift in how the role and mission of applicable to the problem of measuring change. For example,
learning-potential research are viewed. The major changes of Schottke et al. (1993) have suggested that the analysis should
emphasis were in (a) the shift from prediction-oriented testing take place in terms of qualitative determination of whether
to instruction-oriented testing (Delclos et al., 1992; Ruijsse- change (favorable or unfavorable) has occurred for each
naars, Castelijns, & Hamers, 1993), (b) the shift in emphasis individual.
from improved testing of general intelligence toward the analy- Yet another statistical treatment of change data, based on
sis and description of learning processes, and (c) the extension IRT, was developed by Fischer (1983a, 1983b). Fischer has
of dynamic testing methodology to groups of nonchallenged suggested using qualitative data of zero scores for incorrect
children. Researchers have observed the usefulness of the learn- answers and unity scores for correct answers on both pretest
ing-potential tests in (a) the analysis of individual differences and posttest as the input for the so-called linear logistic model
in cognitive performance and the causes that lead to these differ- with relaxed assumptions. The changes from 0 to 1 and from 1
ences, (b) the context of studying so-called differential sensitiv- to 0 between pre- and posttest answers are treated as the change
ity to instruction, and (c) the study of learning processes (Ruijs- data and are analyzed in order to formulate hypotheses concern-
senaars, Castelijns, & Hamers, 1993). Although this road might ing the salient moderator variables (e.g., task characteristics,
well be the most productive one for dynamic testing to take, the personal factors, diagnostic approaches) that could have led to
methodology that now exists is not fully amenable to this shift, the observed change. Embretson (1987b) developed an applica-
and much background methodological and psychometric re- tion of structural-equation modeling, suitable for dynamic test-
search needs to be conducted before it will be feasible to view ing, that alleviates some of the problems related to the measure
these goals as representing a new mission for dynamic testing. of change. In our work, we are attempting to implement Markov
chain-based methodology developed for classical learning the-
ory (see Atkinson, Bower, & Crothers, 1965), which allows for
Measurement of Change
carrying over a change acquired at a given stage to a new stage.
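
The change data described above (item scores of 0 or 1 on pretest and posttest, with 0-to-1 and 1-to-0 transitions treated as the quantities of interest) can be illustrated with a minimal sketch. The code below is not Fischer's linear logistic model and not the Markov chain methodology mentioned in the text; it only tabulates the raw transitions and, as a separate illustration, computes the expected probability of a correct response under the simple two-state all-or-none learning model from classical learning theory. The function names, the learning rate, and the guessing rate are assumptions introduced for this example.

```python
from collections import Counter


def tabulate_transitions(pretest, posttest):
    """Count 0->0, 0->1, 1->0, and 1->1 transitions for paired item scores.

    pretest and posttest are equal-length sequences of 0 (incorrect) and
    1 (correct) for the same items answered by one examinee.
    """
    if len(pretest) != len(posttest):
        raise ValueError("pretest and posttest must score the same items")
    counts = Counter(zip(pretest, posttest))
    # Counter returns 0 for transitions that never occurred.
    return {(a, b): counts[(a, b)] for a in (0, 1) for b in (0, 1)}


def net_change(transitions):
    """Favorable changes (0->1) minus unfavorable changes (1->0)."""
    return transitions[(0, 1)] - transitions[(1, 0)]


def all_or_none_p_correct(trial, learn_rate, guess_rate):
    """Expected P(correct) on a given trial under an all-or-none model.

    Assumption for illustration: an item starts unlearned, moves to the
    learned state with probability learn_rate after each trial, stays
    learned thereafter, and is answered correctly by guessing with
    probability guess_rate while unlearned. Trials are numbered from 1.
    """
    if trial < 1:
        raise ValueError("trial must be 1 or greater")
    p_still_unlearned = (1.0 - learn_rate) ** (trial - 1)
    return 1.0 - p_still_unlearned * (1.0 - guess_rate)


if __name__ == "__main__":
    pre = [0, 0, 1, 1, 0]   # hypothetical pretest item scores
    post = [1, 0, 1, 0, 1]  # hypothetical posttest item scores
    table = tabulate_transitions(pre, post)
    print(table)
    print(net_change(table))
    print(all_or_none_p_correct(trial=2, learn_rate=0.3, guess_rate=0.25))
```

A tabulation of this kind separates favorable from unfavorable change for each individual; it does not by itself distinguish training effects from retest effects, which would require a control group or an explicit model.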
One of the most serious criticisms that has been leveled at the
dynamic-testing paradigm is that it lacks a sound psychometric The Ecological Validity of Dynamic-Testing Instruments
foundation (Snow, 1990), particularly in regard to the measure-
ment of change occurring between pretest and posttest. One of Last but not least is the issue of the validity of dynamic-
the major assumptions of the dynamic-testing paradigm is that testing instruments. The main criticism of conventional tests of
cognitive abilities concerns construct validity: that these tests from the scientific community. We have suggested a number of
focus primarily on products rather than on processes and thus dimensions along which dynamic-testing studies can be re-
are valid only in terms of product-based criteria such as tests viewed and compared, and we have conducted such a compari-
of school achievement (Lidz, 1991). son. We have arrived at two conclusions.
Much of the early research on dynamic testing (Budoff, 1970; First, as of today, it is difficult to argue that this approach
Guthke, 1977) was devoted to comparing the predictive validi- clearly has proved its usefulness and has shown distinct advan-
ties of dynamic-versus-static tests. Even an advocate of dynamic tages over traditional static testing relative to the resources that
testing would find it quite difficult to argue that the empirical need to be expended. Second, certain requirements, once met,
data, as of now, have consistently shown the higher predictive will make dynamic-testing studies more compelling and, corre-
power of dynamic tests compared with static tests. However, spondingly, will make the field of dynamic testing stronger and
there are multiple interpretations of this result. its data more convincing. These requirements are of two types:
One is that school achievement, in itself, is a product rather macroscale and microscale. The macrorequirements are related
than a process. Thus, naturally, there is a better internal match to theoretical issues of defining dynamic testing as an indepen-
between conventional tests and school achievement than there dent tradition in the psychology of testing with its own goals,
is between dynamic tests and school achievement. In support methods, and applied techniques. The microrequirements are
of this statement, it has been shown that the degree of correspon- specific to empirical issues and (a) underscore the necessity of
dence between learning-potential tests and everyday learning conducting studies that involve larger participant populations,
tasks in school (e.g., how well the test represents the material (b) validate dynamic-testing results against educational or pro-
studied at school) is a very important parameter—the higher fessional criteria, and (c) replicate results from different labora-
the correspondence, the greater the test's predictive power. tories and independently use developed methodologies to arrive
Moreover, researchers have shown that the predictive validity at similar findings.
of domain-specific learning-potential tests is higher when cri- In sum, the work in the field of dynamic testing has suggested
terion tests are also similarly domain specific (Ruijssenaars, interesting paradigms and ideas as well as promising findings.
Castelijns, & Hamers, 1993). For example, researchers found The question is whether this potential can be realized in a branch
high correlations (p < .01) between a reading-simulation test, of psychological testing characterized by consistently converg-
administered in kindergarten, and first-grade reading and spell- ing results and techniques that provide information over and
ing tests. Consequently, different kinds of dynamic tasks may above the data collected by conventional tests. We believe that
correspond to different styles of teaching. For example, whereas dynamic testing will ultimately meet these challenges and will
more traditional test items tend to correspond to more conserva- prove to be a valuable resource to the psychological profession
tive styles of teaching, more dynamic, student-oriented tasks are and to the world.
expected to correspond to more open styles of teaching to small
groups of students. References
Thus, the idea of including prototypical learning tasks that
Altman, D. G., & Andersen, P. K. (1989). Bootstrap investigation of
match school activities in learning-potential testing devices is
the stability of a Cox regression model. Statistics in Medicine, 8,
very important. For example, European researchers (Hamers,
771-783.
Pennings, & Guthke, 1994) have developed a set of domain-
Ashman, A. F. (1985). Process-based interventions for retarded students.
specific tasks (e.g., an auditory-analysis test), all of which were Mental Retardation and Learning Disability Bulletin, 13, 62-74.
assumed to be very important precursors to initial reading, spell- Ashman, A. F. (1992). Process-based instruction: Integrating testing and
ing, and arithmetic processes. The results showed that the pre- instruction. In H. C. Haywood & D. Tzuriel (Eds.), Interactive testing
dictive power of this test was higher for many school achieve- (pp. 375-396). New -fork: Springer-Verlag.
ment tests than was that obtained for either domain-general tests Atkinson, R. C., Bower, G. H., & Crothers, E. J. (1965). An introduction
of learning potential or a static test of intelligence. to mathematical learning theory. New %rk: Wiley.
Another idea that could explain the lack of predictive validity Baltes. M. M, Kuhl. K. P., & Sowarka, D. (1992). Testing for limits
of cognitive reserve capacity: A promising strategy for early diagnosis
of domain-general dynamic tests lies in the mismatch between
of dementia? Journal of Gerontology, 47, 165-167.
the active nature of the tested process (e.g., learning) and the
Barr, P. M., & Samuels, M. T. (1988). Dynamic testing of cognitive and
passive behavior of the testees in most of the dynamic-testing
affective factors contributing to learning difficulties in adults: A case
situations. Even in child-centered methodologies such as study approach. Professional Psychology: Research and Practice, 19,
Feuerstein's, the degree of cognitive activity of the child is 6-13.
minimized and is framed within the examiner's understanding Bethge, H., Carlson, J. S., & Wiedl, K. H. (1982). The effects of dy-
of what he or she should do in order for a child to reach the namic testing procedures on Raven Matrices performance, visual
criterion of performance. An important question is to what ex- search behavior, test anxiety and test orientation. Intelligence, 6, 89-
tent learning-potential tests are better equipped and better suited 97.
than conventional intelligence tests to provide diagnostic and Binet, A. (1909). Les idées modernes sur les enfants [Modern concepts
concerning children]. Paris: Flammarion.
prescriptive information about an active process of learning.
Blagg, N. (1991). Can we teach intelligence? Hillsdale, NJ: Erlbaum.
Bodrova, E., & Leong, D. J. (1996). Tools of the mind. The Vygotskian
Conclusion approach to early childhood education. Englewood Cliffs, NJ: Pren-
tice Hall.
We started this article by stating that the field of dynamic Bolig, E. E., & Day, J. D. (1993). Dynamic testing and giftedness: The
testing, as a result of a number of historical and ideological promise of assessing training responsiveness. Roeper Review, 16,
circumstances, has experienced insufficient critical attention 110-113.
Bollen, K. A. (1989). Structural equations with latent variables. New learning potential testing with Spanish-speaking youth. Interamerican
York: Wiley. Journal of Psychology, 10, 13-24.
Borland, J. H., & Wright, L. (1994). Identifying young, potentially Budoff, M., & Friedman, M. (1964). "Learning potential" as a testing
gifted, economically disadvantaged students. Gifted Child Quarterly, approach to the adolescent mentally retarded. Journal of Consulting
38, 164-171. Psychology, 28, 434-439.
Bradley, T. B. (1983). Remediation of cognitive deficits: A critical ap- Budoff, M., & Hamilton, J. (1976). Optimizing test performance of the
praisal of the Feuerstein model. Journal of Mental Deficiency Re- moderately and severely mentally retarded. American Journal of Men-
search, 27, 79-92. tal Deficiency, 81, 49-57.
Bransford, J. D., Stein, B. S., Arbitman-Smith, R., & Vye, N. J. (1985). Budoff, M., Meskin, J., & Harrison, R. G. (1971). An educational test
Improving thinking and learning skills: An analysis of three ap- of the learning potential hypothesis. American Journal of Mental De-
proaches. In J. Segal, S. F. Chipman, & R. Glaser (Eds.), Thinking ficiency, 76, 159-169.
and learning skills: Relating instruction to research (Vol. 1, pp. 133- Budoff, M., & Pagell, W. (1968). Learning potential and rigidity in the
208). Hillsdale, NJ: Erlbaum. adolescent mentally retarded. Journal of Abnormal Psychology, 73,
Bronfenbrenner, U. (1977). Toward an experimental ecology of human 479-486.
development. American Psychologist, 32, 513-551. Burns, M. S. (1991). Comparison of two types of dynamic testing and
Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. static testing with young children. The International Journal of Dy-
(1983). Learning, remembering, and understanding. In J. H. Flavell & namic Testing and Instruction, 2, 29-42.
E. M. Markman (Eds.), Handbook of child psychology (Vol. III, pp. Burns, M. S. (1996). Dynamic assessment: Easier said than done. In
515-529). New York: Wiley. M. Luther, E. Cole, & P. Gamlin (Eds.), Dynamic assessment for
Brown, A. L., & Campione, J. C. (1981). Inducing flexible thinking: A instruction: From theory to application (pp. 182-187). North York,
problem of access. In M. Friedman, J. P. Das, & N. O'Connor (Eds.), Ontario, Canada: Captus Press.
Intelligence and learning (pp. 515-529). New York: Plenum Press. Burns, M. S., Delclos, V. R., Vye, N. J., & Sloan, K. (1992). Changes
Brown, A. L., & Ferrara, R. A. (1985). Diagnosing zones of proximal in cognitive strategies in dynamic testing. International Journal of
development. In J. V. Wertsch (Ed.), Culture, communication, and Dynamic Testing and Instruction, 2, 45-54.
cognition: Vygotskian perspectives (pp. 273-305). New York: Cam-
Burns, M. S., Vye, N., Bransford, J., Delclos, V., & Ogan, T. (1987).
bridge University Press.
Static and dynamic measures of learning in young handicapped chil-
Brown, A. L., & French, L. (1979). The zone of potential development:
dren. Diagnostique, 12(2), 59-73.
Implications for intelligence testing in the year 2000. Intelligence, 3,
Campione, J. C. (1989). Assisted testing: A taxonomy of approaches
255-273.
and an outline of strengths and weaknesses. Journal of Learning
Brownell, M. T., Mellard, D. F., & Deshler, D. D. (1993). Differences in
Disabilities, 22, 151-165.
the learning and transfer performance between students with learning
Campione, J. C., & Brown, A. L. (1979). Human intelligence. Nor-
disabilities and other low-achieving students on problem-solving tasks.
wood, NJ: Ablex.
Learning Disability Quarterly, 16, 138-156.
Campione, J. C., & Brown, A. L. (1987). Linking dynamic testing with
Büchel, F. P., & Scharnhorst, U. (1993). The Learning Potential Testing
school achievement. In C. S. Lidz (Ed.), Dynamic testing (pp. 82-
Device (LPAD): Discussion of theoretical and methodological prob-
115). New York: Guilford Press.
lems. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars
Campione, J. C., Brown, A., & Bryant, N. (1985). Individual differences
(Eds.), Learning potential testing (pp. 83-111). Amsterdam:
in learning and memory. In R. J. Sternberg (Ed.), Human abilities:
Swets & Zeitlinger.
An information-processing approach (pp. 103-126). New York:
Buckingham, B. R. (1921). Intelligence and its measurement: A sympo-
Freeman.
sium. Journal of Educational Psychology, 12, 271-275.
Campione, J. C., Brown, A. L., Ferrara, R. A., Jones, R. S., & Steinberg,
Budoff, M. (1967). Learning potential among institutionalized young
E. (1985). Differences between retarded and nonretarded children in
adult retardates. American Journal of Mental Deficiency, 72, 404-
transfer following equivalent learning performance: Breakdowns in
411.
flexible use of information. Intelligence, 9, 297-315.
Budoff, M. (1968). Learning potential as a supplementary testing proce-
dure. In J. Hellmuth (Ed.), Learning disorders (Vol. 3, pp. 295-343). Carlson, J. S. (1989). Advances in research on intelligence: The dynamic
testing approach. Mental Retardation and Learning Disability Bulle-
Seattle, WA: Special Child.
tin, 17, 1-20.
Budoff, M. (1969). Learning potential: A supplementary procedure for
assessing the ability to reason. Seminar in Psychiatry, 1, 278-290. Carlson, J. (Ed.). (1992). Cognition and educational practice: An inter-
Budoff, M. (1970). Learning potential: A supplementary procedure for national perspective. Greenwich, CT: JAI Press.
assessing the ability to reason. Acta Paedopsychiatrica, 37, 293-309. Carlson, J., & Wiedl, K. H. (1976). The factorial analysis of perceptual
(ERIC Document Reproduction Service No. ED 048 703) and abstract reasoning abilities in tests of concrete operational
Budoff, M. (1975). Learning potential among educable retarded pupils. thought. Educational and Psychological Measurement, 36, 1015-
Cambridge, MA: Research Institute for Educational Problems. 1019.
Budoff, M. (1987a). Measures for assessing learning potential. In C. S. Carlson, J. S., & Wiedl, K. H. (1978). Use of testing-the-limits proce-
Lidz (Ed.), Dynamic testing (pp. 173-195). New York: Guilford dures in the testing of intellectual capabilities in children with learning
Press. difficulties. American Journal of Mental Deficiency, 11, 559-564.
Budoff, M. (1987b). The validity of learning potential. In C. S. Lidz Carlson, J. S., & Wiedl, K. H. (1979). Toward a differential testing
(Ed.), Dynamic testing (pp. 52-81). New York: Guilford Press. approach: Testing-the-limits employing the Raven matrices. Intelli-
Budoff, M., & Corman, L. (1974). Demographic and psychometric gence, 3, 323-344.
factors related to improved performance on the Kohs learning-poten- Carlson, J. S., & Wiedl, K. H. (1980). Applications of a dynamic testing
tial procedure. American Journal of Mental Deficiency, 78, 578-585. approach: Empirical results and theoretical formulations. Zeitschrift
Budoff, M., & Corman, L. (1976). Effectiveness of a learning potential fur Differentielle und Diagnostische Psychologie, 4, 303-318.
procedure in improving problem-solving skills to retarded and nonre- Carlson, J. S., & Wiedl, K. H. (1992a). The dynamic testing of intelli-
tarded children. American Journal of Mental Deficiency, 81. 260- gence. In H. C. Haywood & D. Tzuriel (Eds.), Interactive testing
264. (pp. 167-186). New York: Springer-Verlag.
Budoff, M., Corman, L., & Gimon, A. (1976). An educational test of Carlson, J. S., & Wiedl, K. H. (1992h). Principles of dynamic testing:
DYNAMIC TESTING 107

The application of a specific model. Learning and Individual Differ- DeWeerdt, E. H. (1927). A study of the improvability of fifth grade
ences, 4, 153-166. school children in certain mental functions. Journal of Educational
Carver, R. (1974). Two dimensions of tests: Psychometric and edume- Psychology, 18, 547-557.
tric. American Psychologist, 29, 512-518. Dillon, R. R, & Carlson, J. S. (1978). Testing for competence in three
Cattell, R. B. (1940). A culture free intelligence test. I. Journal of ethnic groups. Educational and Psychological Measurement, 38, 436-
Educational Psychology, 31, 161-180. 443.
Cazden, C. B. (1981). Performance before competence: Assistance to Dmitriev, D. (1997). Pedagogical psychology and the development of
child discourse in the zone of proximal development. Quarterly News- education in Russia. In E. L. Grigorenko, P. Ruzgis, & R. I. Sternberg
letter of the Laboratory of Comparative Human Cognition, 3, 5—8. (Eds.), Russian psychology: Past, present, and future (pp. 225-265).
Coker, C. C. (1990). Dynamic testing, learning curve analysis and the Commack, NY: Nova Science Publishers.
training quotient. Vocational Evaluation and Work Adjustment Bulle- Downs, S. (1985). Testing trainability. Oxford, England: NFER Nelson.
tin, 23, 139-147. Dunn, L., & Dunn, L. (1981). Peabody Picture Vocabulary Test-
Cole, M. (1985). The zone of proximal development: Where culture Revised. Circle Pines, MN: American Guidance Service.
and cognition create each other. In J. V. Wertsch (Ed.), Culture, com- El'konin, D. B. (1960). Opyt psikhologicheskogo issledovaniia v ek-
munication, and cognition: Vygotskian perspectives (pp. 146-161). sperimental'nom klasse [A sample of psychological research in an
New Tfork: Cambridge University Press. intervention class]. Voprosy Psikhologii, 5, 30-40.
Corman, L., & Budoff, M. ( 1973). A comparison of group and individ- Elliott, J. (1993). Assisted testing: If it is "dynamic" why is it so
ual training procedures on the Raven Learning Potential Measure rarely employed? Educational and Child Psychology, 10, 48-58.
(RIE Print #56). Cambridge, MA: Research Institute for Educational Embretson, S. E. (1987a). Improving the measurement of spatial apti-
Problems. (ERIC Document Reproduction Service No. ED 086 924) tude by dynamic testing. Intelligence, 11, 333-358.
Cormier, P., Carlson, J. S., & Das, J. P. (1990). Planning ability and Embretson, S. E. (1987b). Toward development of a psychometric ap-
cognitive performance: The compensatory effects of a dynamic testing proach. In C. S. Lidz (Ed.), Dynamic testing (pp. 141-172). New
approach. Learning and Individual Differences, 2, 437-449. York: Guilford Press.
Coxhead, P., & Gupta, R. M. (1988). Construction of a test battery to Fernandez-Ballesteros, R., Juan-Espinosa, M., Colom, R., & Calero,
measure learning potential. In R. M. Gupta & P. Coxhead (Eds.), M. D. (1997). Contextual and personal sources of individual differ-
Cultural diversity and learning efficiency: Recent developments in ences in intelligence: Empirical results. In Advances in cognition and
testing (pp. 15-22). London: Macmillan. educational practice (Vol. 4, pp. 221-274). Greenwich, CT: JAI
Cronbach, L. J., & Furby, L. (1970). How we should measure Press.
"change"—Or should we? Psychological Bulletin, 74, 68-80. Ferrara, R. A., Brown, A. L., & Campione, J. C. (1986). Children's
Das, I. P., & Conway, R. N. F. (1992). Reflections on remediation and learning and transfer of inductive reasoning rules: Studies in proximal
transfer: A Vygotskian perspective. In H. C. Haywood & D. Tzuriel development. Child Development, 57, 1087-1099.
(Eds.), Interactive testing (pp. 94-115). New York: Springer-Verlag. Feuerstein, R., Feuerstein, R., & Schur, Y. (1997). Process as content in
Das, J. P., Kirby, J. R., & Jarman. R. E (1979). Simultaneous and suc- regular education and in particular in education of the low functioning
cessive cognitive processes. New \brk: Academic Press. retarded performer. In A. L. Costa & R. M. Liebmann (Eds.), Envi-
Davydov, V. V. (1986). Problemy razvivaiushchego obuchenia [Issues sioning process as content: Toward a renaissance curriculum. Thou-
in developing learning]. Moscow: Pedagogika. sand Oaks, CA: Corwin Press.
Davydov, V. V., Pushkin, V. N., & Pushkina, A. G. (1972). Zavisimosf Feuerstein, R., & Krasilowsky, D. (1972). Interventional strategies for
razvitia myshlenia rnladshikh shkol'nikov ot kharaktera oluchenia the significant modification of cognitive functioning in the disadvan-
[Relationships between the development of thinking of elementary taged adolescent. Journal of the American Academy of Child Psychia-
schools students and instruction]. Voprosy Psikhologii, 6, 124-132. try, 11, 572-582.
Day, J. D., & Cordon, L. A. (1993). Static and dynamic measures of Feuerstein, R., & Rand, Y. (1974). Mediated learning experiences: An
ability: An experimental comparison. Journal of Educational Psychol- outline of proximal etiology for differential development of cognitive
ogy, 85, 75-82. functions. International Understanding, 9-10, 7-37.
Day, J. D., Engelhardt, J. L., Maxwell, S. E., & Bolig, E. E. (1997). Feuerstein, R., Rand, Y, Haywood, H. C., Hoffman, M., & lensen, M.
Comparison of static and dynamic testing procedures and their relation (1985). The learning potential testing device (LPAD): Examiners'
to independent performance. Journal of Educational Psychology, 89, manual. Jerusalem, Israel: Hadassah-Wizo-Canada Research
358-368. Institute.
Day, J. D., & Hall, L. K. (1987). Cognitive testing, intelligence, and Feuerstein, R., Rand, Y, & Hoffman, M. B. (1979). The dynamic testing
instruction. In I. D. Day & J. G. Borkowski (Eds.), Intelligence and of retarded performers: The learning potential testing device: Theory,
exceptionality: New directions for theory, testing, and instructional instruments, and techniques. Baltimore: University Park Press.
practice (pp. 57-80). Norwood, NJ: Ablex. Feuerstein, R., Rand, Y., Hoffman, M. B., & Miller, R. (1980). Instru-
Day, J. D., & Zajakowski, A. (1991). Comparisons of learning ease and mental enrichment. Baltimore: University Park Press.
transfer propensity in poor and average readers. Journal of Learning Feuerstein, R., Rand, Y, lensen, M. R., Kaniel, S., & Tzuriel, D. (1987).
Disabilities, 24, 421-428. Prerequisites for testing of learning potential: The LPAD model. In
Dearborn, W. F. (1921). Intelligence and its measurement. Journal of C. S. Lidz (Ed.), Dynamic testing (pp. 35-51). New Tlbrk: Guilford
Educational Psychology, 12, 210-212. Press.
Delclos, V. R., Burns, M. S., & Kulewicz, S. J. (1987). Effects of dy- Feuerstein, R., Rand, J., & Rynders, J. E. (1988). Don't accept me as I
namic assessment on teachers' expectations of handicapped children. am: Helping "retarded" people to excel. New York: Plenum Press.
American Educational Research Journal, 24, 325-336. Fischer, G. H. (1983a). Logistic latent trait models with linear con-
Delclos, V. R., Burns, M. S., & Vye, N. J. (1993). A comparison of straints. Psychometrika, 48, 3-26.
teachers' responses to dynamic and traditional assessment reports. Fischer, G. H. (1983b). Some latent trait models for measuring change
Journal of Psychoeducational Assessment, 11, 46-55. in qualitative observations. In D. J. Weis (Ed.), New horizons in test-
Delclos, V. R., Vye, N. J., Burns, M. S., Bransford, J. D., & Hasselbring, ing (pp. 118-138). New York: Academic Press.
T. S. (1992). Improving the quality of instruction: Roles for dynamic Fischer, G. H. (1987). Applying the principles of specific objectivity
testing. In H. C. Haywood & D. Tzuriel (Eds.), Interactive testing and of generalizability to the measurement of change. Psychometrika,
(pp. 317-332). New York: Springer-Verlag. 52, 565-587.
Frawley, W., & Lantolf, J. P. (1985). Second language discourse: A Vygotskyan perspective. Applied Linguistics, 6(1), 21-43.
Frisby, C. L., & Braden, J. P. (1992). Feuerstein's dynamic testing approach: A semantic, logical, and empirical critique. Journal of Special Education, 26, 281-301.
Gal'perin, P. Ya. (1966). K ucheniiu ob interiorizatsii [Toward the theory of interiorization]. Voprosy Psikhologii, 6, 20-29.
Ginzburg, M. P. (1981). O vozmozhnoi interpretatsii poniatia zony blizhaishego razvitia [On a possible interpretation of the concept of the zone of proximal development]. In D. B. El'konin & A. L. Venger (Eds.), Diagnostika uchebnoi deiatel'nosti i intellectual'nogo razvitia detei (pp. 145-155). Moscow: Academia Pedagogicheskikh Nauk SSSR.
Goncharova, E. L. (1990). Nekotorye voprosy vyshego obrazovania vzroslykh slepoglukhikh [On higher education for the deaf-blind]. In V. N. Chulkov, V. I. Lubovsky, & E. N. Martsinovskaia (Eds.), Differentsirovannyi podkhod pri obuchenii i vospitanii slepoglukhikh detei (pp. 56-70). Moscow: Academia Pedagogicheskikh Nauk SSSR.
Goncharova, E. L., Akshonina, A. Ia., & Zarechnova, E. A. (1990). Formirovanie motivatsionnoi osnovy chtenia u detei s glubokimi narusheniiami zreniia i slukha [The formation of motivation for reading in children with severe visual and auditorial handicaps]. In V. N. Chulkov, V. I. Lubovsky, & E. N. Martsinovskaia (Eds.), Differentsirovannyi podkhod pri obuchenii i vospitanii slepoglukhikh detei (pp. 105-121). Moscow: Academia Pedagogicheskikh Nauk SSSR.
Grigorenko, E. L. (1998). Russian defectology: Anticipating perestroika in the field. Journal of Learning Disabilities, 31, 193-207.
Groot-Zwaaftink, T., Ruijssenaars, A. J. J. M., & Schelbergen, I. (1987). Computer controlled learning test: Learning test research with cerebral paresis. In F. J. Maarse, L. J. M. Mulder, W. P. B. Sjouw, & A. E. Akkerman (Eds.), Computers in psychology: Methods, instrumentation and psychodiagnostics (pp. 113-127). Lisse, The Netherlands: Swets & Zeitlinger.
Guthke, J. (1977). Zur Diagnostik der intellektuellen Lernfähigkeit [Assessment of intellectual learning potential]. Berlin: VEB Deutscher Verlag der Wissenschaften.
Guthke, J. (1992). Learning tests: The concept, main research findings, problems and trends. Learning and Individual Differences, 4, 137-151.
Guthke, J. (1993). Current trends in theories and testing of intelligence. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential testing (pp. 13-20). Amsterdam: Swets & Zeitlinger.
Guthke, J., Beckman, J. F., & Dobat, H. (1997). Dynamic testing—Problems, uses, trends and evidence of validity. Educational and Child Psychology, 14(4), 17-32.
Guthke, J., Rader, E., Caruso, M., & Schmidt, K. D. (1991). Entwicklung eines adaptiven computergestützten Lerntests auf der Basis der strukturellen Informationstheorie [Development of an adaptive computer-assisted learning test based on structural information theory]. Diagnostica, 37, 1-28.
Guthke, J., & Stein, H. (1996). Are learning tests the better version of intelligence tests? European Journal of Psychological Assessment, 12, 1-13.
Guthke, J., & Wingenfeld, S. (1992). The Learning Test concept: Origins, state of art, and trends. In H. C. Haywood & D. Tzuriel (Eds.), Interactive testing (pp. 64-93). New York: Springer-Verlag.
Gutierrez-Clellen, V. F., Peña, E., & Quinn, R. (1995). Accommodating cultural differences in narrative style: Multicultural perspective. Topics in Language Disorders, 15, 54-67.
Haenen, J. (1996). Piotr Gal'perin: Psychologist in Vygotsky's footsteps. Commack, NY: Nova Science Publishers.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory. Boston, MA: Kluwer Nijhoff.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Hamers, J., Pennings, A., & Guthke, J. (1994). Training-based assessment of school achievement. Learning and Instruction, 4, 347-360.
Hamers, J. H. M., Hessels, M. G. P., & Pennings, A. H. (1996). Learning potential in ethnic minority children. European Journal of Psychological Assessment, 12, 183-192.
Hamers, J. H. M., Hessels, M. G. P., & Van Luit, J. E. H. (1991). Learning potential test for ethnic minorities (LEM): Manual and test. Lisse, The Netherlands: Swets & Zeitlinger.
Haywood, H. C. (1997). Interactive assessment. In R. L. Taylor (Ed.), Assessment of individuals with mental retardation (pp. 100-140). San Diego, CA: Singular Publishing Group.
Haywood, H. C., & Arbitman-Smith, R. (1981). Modification of cognitive functions in slow-learning adolescents. In P. Mittler (Ed.), Frontiers of knowledge in mental retardation (Vol. 1, pp. 98-107). Baltimore: University Park Press.
Haywood, H. C., & Menal, C. (1992). Cognitive developmental psychotherapy: A case study. International Journal of Cognitive Education and Mediated Learning, 2, 43-54.
Haywood, H. C., & Tzuriel, D. (1992a). Epilogue: The status and future of interactive testing. In H. C. Haywood & D. Tzuriel (Eds.), Interactive testing (pp. 38-63). New York: Springer-Verlag.
Haywood, H. C., & Tzuriel, D. (Eds.). (1992b). Interactive testing. New York: Springer-Verlag.
Haywood, H. C., & Wingenfeld, S. A. (1992). Interactive testing as a research tool. The Journal of Special Education, 26, 253-268.
Hedegaard, M. (1990). The zone of proximal development as the basis for instruction. In L. Moll (Ed.), Vygotsky and education: Instructions and applications of sociohistorical psychology (pp. 349-372). Cambridge, England: Cambridge University Press.
Hessels, M. G. P., & Hamers, J. H. M. (1993). The learning potential test for ethnic minorities. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential testing (pp. 285-311). Amsterdam: Swets & Zeitlinger.
Hickson, J., & Skuy, M. (1990). Creativity and cognitive modifiability in gifted disadvantaged pupils: A promising alliance. School Psychology International, 11, 295-301.
Hoy, M. P., & Retish, P. M. (1984). A comparison of two types of assessment reports. Exceptional Children, 51, 225-229.
Jastak, J. F., & Jastak, S. R. (1978). The Wide Range Achievement Test (rev. ed.). Washington, DC: Guidance Associates.
Jastak, S., & Wilkinson, G. S. (1984). The Wide Range Achievement Test—Revised: Administration manual. Washington, DC: Jastak Associates.
Jensen, M. R., Robinson-Zañartu, C., & Jensen, M. L. (1992). Dynamic testing and mediated learning: Testing and intervention for developing cognitive and knowledge structures. Sacramento, CA: California Department of Education, Advisory Committee on the Reform of California's Testing Procedures in Special Education.
Jitendra, A. K., & Kameenui, E. J. (1993). Dynamic testing as a compensatory testing approach: A description and analysis. RASE: Remedial and Special Education, 14, 6-18.
Kalmykova, S. J. (1975). Problemy diagnostiki psikhicheskogo razvitiia shkol'nikov [Testing of schoolchildren's mental development]. Moscow: Pedagogika.
Kaniel, S., & Tzuriel, D. (1992). Mediated learning experience approach in the assessment and treatment of borderline psychotic adolescents. In H. C. Haywood & D. Tzuriel (Eds.), Interactive assessment (pp. 399-418). New York: Springer-Verlag.
Kaniel, S., Tzuriel, D., Feuerstein, R., Ben-Shachar, N., & Eitan, T. (1991). Dynamic assessment, learning, and transfer abilities of Jewish Ethiopian immigrants to Israel. In R. Feuerstein, P. S. Klein, & A. Tannenbaum (Eds.), Mediated learning experience (pp. 179-209). London: Freund.
Kar, B. C., Dash, U. N., Das, J. P., & Carlson, J. (1993). Two experiments on the dynamic testing of planning. Learning and Individual Differences, 5, 13-29.
Katz, M., & Bucholz, E. (1984). Use of the LPAD for cognitive enrichment of a deaf child. School Psychology Review, 13, 99-106.
Keane, K., & Kretschmer, R. (1987). The effect of mediated learning intervention on cognitive task performance with a deaf population. Journal of Educational Psychology, 79, 49-53.
Keane, K. J., Tannenbaum, A. J., & Krapf, G. F. (1992). Cognitive competence: Reality and potential in the deaf. In H. C. Haywood & D. Tzuriel (Eds.), Interactive testing (pp. 300-316). New York: Springer-Verlag.
Kern, B. (1930). Wirkungsformen der Übung [Effects in training]. Münster, Germany: Helios.
Klauer, K. J. (1993). Evaluation einer evaluation. Stellungsnahme zum beitrag von Hager und Hasselhorn [Evaluation of an evaluation: A critique of the study by Hager and Hasselhorn]. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 25, 322-327.
Klein, S. (1987). The effects of modern mathematics. Budapest, Hungary: Akademia.
Kliegl, R., & Baltes, P. B. (1987). Theory-guided analysis of development and aging mechanisms through testing-the-limits and research on expertise. In C. Schooler & K. W. Schaie (Eds.), Cognitive functioning and social structures over the life course (pp. 30-54). Norwood, NJ: Ablex.
Kliegl, R., Smith, J., & Baltes, P. B. (1989). Testing-the-limits and the study of adult age differences in cognitive plasticity of a mnemonic skill. Developmental Psychology, 25, 247-256.
Kozulin, A. (1984). Psychology in Utopia. Cambridge, MA: MIT Press.
Kozulin, A. (1990). Vygotsky's psychology (a biography of ideas). New York: Harvester Wheatsheaf.
Kozulin, A., & Falik, L. (1995). Dynamic cognitive testing of the child. Current Directions in Psychological Science, 4(6), 192-196.
Laughon, P. (1990). The dynamic testing of intelligence: A review of three approaches. School Psychology Review, 19, 459-470.
LeGagnoux, G., Michael, W. B., Hocevar, D., & Maxwell, V. (1990). Retest effects on standardized structure-of-intellect ability measures for a sample of elementary school children. Educational and Psychological Measurement, 50, 475-492.
Levina, R. E. (Ed.). (1968). Osnovy teorii i praktiki logopedii [Fundamentals of logopaedic theory and practice]. Moscow: Pedagogika.
Lidz, C. S. (Ed.). (1987). Dynamic testing: An interactional approach to evaluating learning potential. New York: Guilford Press.
Lidz, C. S. (1991). Practitioner's guide to dynamic testing. New York: Guilford Press.
Lidz, C. S. (1995). Dynamic testing and the legacy of L. S. Vygotsky. School Psychology International, 16, 143-153.
Lidz, C. S., & Thomas, C. (1987). The Preschool Learning Testing Device: Extension of a static approach. In C. S. Lidz (Ed.), Dynamic testing (pp. 288-326). New York: Guilford Press.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Luria, A. R. (1966). Human brain and psychological processes. New York: Harper & Row.
Luther, M., Cole, E., & Gamlin, P. (Eds.). (1996). Dynamic testing for instruction: From theory to application. North York, Ontario, Canada: Captus University Publications.
McLane, J. B. (1990). Writing as a social process. In L. Moll (Ed.), Vygotsky and education (pp. 304-318). Cambridge, England: Cambridge University Press.
McNamee, G. D. (1990). Learning to read and write in an inner-city setting: A longitudinal study of community change. In L. Moll (Ed.), Vygotsky and education (pp. 287-303). Cambridge, England: Cambridge University Press.
McNamee, G. D., McLane, J. B., Cooper, P. M., & Kerwin, S. M. (1985). Cognition and affect in early literacy development. Early Childhood Development and Care, 20, 229-244.
Mercer, J. R. (1979). Technical manual: SOMPA. New York: Psychological Corporation.
Minick, N. (1987). The zone of proximal development. In C. S. Lidz (Ed.), Dynamic assessment (pp. 116-140). New York: Guilford Press.
Missiuna, C., & Samuels, M. (1988). Dynamic testing: Review and critique. Special Services in the Schools, 5, 1-22.
Missiuna, C., & Samuels, M. T. (1989). Dynamic testing of preschool children with special needs: Comparison of mediation and instruction. Remedial and Special Education, 10, 53-62.
Molina, S., & Perez, A. A. (1993). Cognitive processes in the child with Down's syndrome. Developmental Disabilities Bulletin, 21, 21-35.
Moll, L., & Greenberg, J. (1990). Creating zones of possibilities: Combining social contexts for instruction. In L. Moll (Ed.), Vygotsky and education (pp. 319-348). Cambridge, England: Cambridge University Press.
Naglieri, J. A., & Das, J. P. (1988). Planning-attention-simultaneous-successive (PASS): A model for testing. School Psychology Review, 19, 423-458.
Newman, D., Griffin, P., & Cole, M. (Eds.). (1989). The construction zone: Working for cognitive change in school. New York: Cambridge University Press.
Newman, F., & Holzman, L. (1993). Lev Vygotsky: Revolutionary scientist. London: Routledge.
Nikolaeva, S. M. (1995). Vidy raboty po korrektsii narusheny pis'mennoi rechi u pervoklassnikov [Correcting written language problems in first graders]. Defectologiia, 3, 76-81.
Obukhova, L. F. (1972). Etapy razvitia detskogo myshlenia [Stages of the development of children's thinking]. Moscow: MGU.
Olswang, L. B., & Bain, B. A. (1996). Assessment information for predicting upcoming change in language production. Journal of Speech and Hearing Research, 39, 414-423.
Ombredane, A., Robayer, F., & Plumail, H. (1956). Résultats d'une application répétée du matrix-couleur à une population de Noirs Congolais [Results of repeated use of the color-matrix on a population of Congolese blacks]. Bulletin, Centre d'Etudes et Recherches Psychotechniques, 6, 129-147.
Palincsar, A. S., & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering and comprehension-monitoring activities. Cognition and Instruction, 1, 117-175.
Palincsar, A. S., & Brown, A. L. (1988). Teaching and practical thinking skills to promote comprehension in the context of group problem solving. RASE: Remedial and Special Education, 9, 53-59.
Paour, J.-L. (1992). Induction of logic structures in the mentally retarded: A testing and intervention instrument. In H. C. Haywood & D. Tzuriel (Eds.), Interactive testing (pp. 119-166). New York: Springer-Verlag.
Peña, E., Quinn, R., & Iglesias, A. (1992). The application of dynamic methods to language testing: A nonbiased procedure. Journal of Special Education, 26, 269-280.
Penrose, L. S. (1934). Mental defect. New York: Farrar and Rinehart.
Pozhilenko, E. A. (1995). Ispol'zovanie nagliadnykh posoby i igrovykh priemov v korrektsii rechi doshkol'nikov [Usage of visual materials and play in the speech correction of preschoolers]. Defectologiia, 3, 61-68.
Rand, Y., & Kaniel, S. (1987). Group administration of the LPAD. In C. S. Lidz (Ed.), Dynamic testing (pp. 196-214). New York: Guilford Press.
Raven, J. C. (1956). Guide to using the Coloured Progressive Matrices: Sets A, Ab, and B. London: H. K. Lewis.
Raven, J. C. (1958). Standard Progressive Matrices: Sets A, B, C, D, and E. London: H. K. Lewis.
Razvitie psikhiki shkol'nikov v protsesse uchebnoi deiatel'nosti [Schoolchildren's psychological development in learning activity]. (1983). Moscow: Pedagogika.
Resing, W. C. M. (1993). Measuring inductive reasoning skills: The construction of a learning potential test. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential testing (pp. 219-242). Amsterdam: Swets & Zeitlinger.
Rey, A. (1934). D'un procédé pour évaluer l'éducabilité [A method for assessing educability]. Archives de Psychologie, 24, 297-337.
Rey, A. (1959). Test de copie d'une figure complexe [Test of copying a complex drawing]. Paris: Centre de Psychologie Appliquée.
Robertson, I. T., & Mindel, R. M. (1980). A study of trainability testing. Journal of Occupational Psychology, 53, 131-138.
Rogoff, B. (1990). Apprenticeship in thinking: Cognitive development in social context. New York: Oxford University Press.
Rogoff, B., & Wertsch, J. V. (1984). Children's learning in the "zone of proximal development." San Francisco: Jossey-Bass.
Rubinstein, S. L. (1946). Osnovy obshchei psikhologii [Foundation of general psychology]. Moscow: Uchpedgiz.
Rubtsov, V. V. (1981). The role of cooperation in the development of intelligence. Soviet Psychology, 19(4), 41-62.
Ruijssenaars, A. J. J. M., Castelijns, J. H. M., & Hamers, J. H. M. (1993). The validity of learning potential tests. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential testing (pp. 69-82). Amsterdam: Swets & Zeitlinger.
Rutland, A. F., & Campbell, R. N. (1996). Relevance of Vygotsky's theory of the zone of proximal development to the testing of children with intellectual disabilities. Journal of Intellectual Disability Research, 40, 151-158.
Salmina, N., & Kolmogorova, L. S. (1980). Usvoenie nachal'nykh matematicheskikh poniatii pri raznykh vidakh materializatsii ob'ektov i orudii deistvia [The acquisition of elementary math concepts under different types of representation of objects and tools]. Voprosy Psikhologii, 1, 47-56.
Salvia, J., & Ysseldyke, J. E. (1981). Assessment in special and remedial education. Boston: Houghton Mifflin.
Samuels, M., Tzuriel, D., & Malloy-Miller, T. (1989). Dynamic testing of children with learning difficulties. In R. T. Brown & M. Chazan (Eds.), Learning difficulties and emotional problems (pp. 145-166). Calgary, Alberta, Canada: Detselig Enterprises.
Schmidt, L. (1971). Testing-the-limits im Leistungsverhalten: Möglichkeiten und Grenzen. In E. Duhm (Ed.), Praxis der Klinischen Psychologie, Band II (pp. 18-33). Göttingen, Germany: Hogrefe.
Schöttke, H., Bartram, M., & Wiedl, K. H. (1993). Psychometric implications of learning potential testing: A typological approach. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential testing (pp. 153-173). Amsterdam: Swets & Zeitlinger.
Serpell, R. (1993). The significance of schooling: Life journeys in an African society. New York: Cambridge University Press.
Sewell, T. E. (1979). Intelligence and learning tasks as predictors of scholastic achievement in black and white first-grade children. Journal of School Psychology, 17, 325-332.
Sewell, T. E. (1987). Dynamic assessment as a nondiscriminatory procedure. In C. S. Lidz (Ed.), Dynamic testing (pp. 426-443). New York: Guilford Press.
Shochet, I. M. (1992). A dynamic testing for undergraduate admission: The inverse relationship between modifiability and predictability. In H. C. Haywood & D. Tzuriel (Eds.), Interactive testing (pp. 332-355). New York: Springer-Verlag.
Siegel, L. S. (1989). IQ is irrelevant to the definition of learning disabilities. Journal of Learning Disabilities, 22, 469-478.
Sijtsma, K. (1993). Classical and modern test theory with an eye toward learning potential testing. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential testing (pp. 117-133). Amsterdam: Swets & Zeitlinger.
Silverman, H., & Waksman, M. (1992). Assessing the learning potential of penitentiary inmates: An application of Feuerstein's Learning Potential Testing Device. In H. C. Haywood & D. Tzuriel (Eds.), Interactive testing (pp. 356-374). New York: Springer-Verlag.
Slosson, R. (1971). Slosson Intelligence Test. East Aurora, NY: Slosson Educational Publications.
Snow, R. E. (1990). Progress and propaganda in learning testing. Contemporary Psychology, 35, 1134-1136.
Spector, J. E. (1992). Predicting progress in beginning reading: Dynamic testing of phonemic awareness. Journal of Educational Psychology, 84, 353-363.
Speece, D. L., Cooper, D. H., & Kibler, J. M. (1990). Dynamic testing, individual differences, and academic achievement. Learning and Individual Differences, 2, 113-127.
Spirova, L. F., & Litvinova, A. V. (1988). Differentsirovannyi podkhod k proiavleniiam narushenia pis'ma i chtenia u uchashchikhsia obshcheobrazovatel'nykh shkol [Differentiative approaches to the manifestation of writing and reading problems in school-aged children]. Defectologiia, 5, 4-9.
Sternberg, R. J. (1977). Intelligence, information processing and analogical reasoning: The componential analysis of human abilities. Hillsdale, NJ: Erlbaum.
Sternberg, R. J. (1996). Successful intelligence. New York: Simon & Schuster.
Swanson, H. L. (1984a). A multidirectional model for assessing learning disabled students' intelligence. Learning Disability Quarterly, 5, 316-326.
Swanson, H. L. (1984b). Process assessment of intelligence in learning disabled and mentally retarded children: A multidirectional model. Educational Psychologist, 19, 149-162.
Swanson, H. L. (1988). A multidirectional model for assessing learning disabled students' intelligence: An information-processing framework. Learning Disability Quarterly, 11, 233-247.
Swanson, H. L. (1992). Generality and modifiability of working memory among skilled and less skilled readers. Journal of Educational Psychology, 84, 473-488.
Swanson, H. L. (1993). Working memory in learning disability subtypes. Journal of Experimental Child Psychology, 56, 87-114.
Swanson, H. L. (1994). The role of working memory and dynamic assessment in the classification of children with learning disabilities. Learning Disabilities Research and Practice, 9, 190-202.
Swanson, H. L. (1995a). Effects of dynamic testing on the classification of learning disabilities: The predictive and discriminant validity of the Swanson Cognitive Processing Test. Journal of Psychoeducational Testing, 1, 204-229.
Swanson, H. L. (1995b). Using the cognitive processing test to assess ability: Development of a dynamic assessment measure. School Psychology Review, 24, 672-693.
Swanson, H. L. (1996). Swanson-Cognitive Processing Test. Austin, TX: Pro-Ed.
Talyzina, N. F. (Ed.). (1995). Formirovanie priemov matematicheskogo myshlenia [The formation of mathematical thinking skills]. Moscow: Ventana-Graf.
Tharp, R. G., & Gallimore, R. (1988). Rousing minds to life: Teaching, learning and schooling in social context. Cambridge, England: Cambridge University Press.
Thorndike, E. L. (1924). An introduction to the theory of mental and social measurement. New York: Wiley.
Throne, J. M., & Farb, J. (1978). Can mental retardation be reversed? British Journal of Mental Subnormality, 24, 63-73.
Tzuriel, D. (1992). The dynamic testing approach: A reply to Frisby and Braden. The Journal of Special Education, 26, 302-324.
Tzuriel, D. (1995). Dynamic-interactive testing: The legacy of L. S. Vygotsky and current developments. Unpublished manuscript.
Tzuriel, D. (1997). The relation between parent-child MLE interactions and children's cognitive modifiability. In A. Kozulin (Ed.), The ontogeny of cognitive modifiability (pp. 157-180). Jerusalem: International Center for the Enhancement of Cognitive Modifiability.
Tzuriel, D., & Caspi, N. (1992). Dynamic testing of cognitive modifiability in deaf and hearing preschool children. Journal of Special Education, 26, 235-252.
Tzuriel, D., & Feuerstein, R. (1992). Dynamic group testing for prescriptive teaching: Differential effects of treatment. In H. C. Haywood & D. Tzuriel (Eds.), Interactive testing (pp. 187-206). New York: Springer-Verlag.
Tzuriel, D., & Haywood, H. C. (1992). The development of interactive-dynamic approaches for assessment of learning potential. In H. C. Haywood & D. Tzuriel (Eds.), Interactive assessment (pp. 3-37). New York: Springer-Verlag.
Tzuriel, D., & Klein, P. S. (1985). Analogical thinking modifiability in disadvantaged, regular, special education, and mentally retarded children. Journal of Abnormal Child Psychology, 13, 539-552.
Tzuriel, D., & Klein, P. S. (1987). Assessing the young child: Children's analogical thinking modifiability. In C. S. Lidz (Ed.), Dynamic testing (pp. 268-282). New York: Guilford Press.
Vaught, S., & Haywood, H. C. (1990). Interjudge reliability in dynamic assessment. The Thinking Teacher, 5, 2-6.
Verster, J. M. (1973). Test administrators' manual for Deductive Reasoning Test. Johannesburg, South Africa: National Institute for Personnel Research.
Vlasova, T. A. (1972). New advances in Soviet defectology. Soviet Education, 14, 20-39.
Vlasova, T. A., & Pevsner, M. S. (1971). Deti s vremennoi otstalost'iu razvitia [Children with temporary retardation in development]. Moscow: Pedagogika.
Vygotsky, L. S. (1962). Thought and language. Cambridge, MA: MIT Press. (Original work published 1934)
Vygotsky, L. S. (1978). Mind in society. Cambridge, MA: Harvard University Press.
Vygotsky, L. S. (1983). Istoriia razvitiia vyshikh psikhicheskikh funktsy [A history of the development of the higher mental functions]. In A. N. Matushkin (Ed.), The collected works of L. S. Vygotsky (Vol. 3, pp. 5-328). Moscow: Pedagogika. (Original work published 1931)
Vygotsky, L. S. (1987). The collected works of L. S. Vygotsky (Vol. 1). New York: Plenum Press.
Wechsler, D. (1949). Manual for the Wechsler Intelligence Scale for Children. New York: Psychological Corporation.
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children—Revised. New York: Psychological Corporation.
Wertsch, J. V. (1991). A sociocultural approach to socially shared cognition. In L. B. Resnick, J. M. Levine, & S. D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 85-100). Washington, DC: American Psychological Association.
Wertsch, J. V., & Tulviste, P. (1992). L. S. Vygotsky and contemporary developmental psychology. Developmental Psychology, 28, 548-557.
Wiedl, K. H., & Carlson, J. S. (1976). The factorial structure of the Raven Coloured Progressive Matrices Test. Educational and Psychological Measurement, 36, 1015-1019.
Wurtz, R. G., Sewell, T., & Manni, J. L. (1985). The relationship of estimated learning potential to performance on a learning task and achievement. Psychology in the Schools, 22, 293-302.
Vye, N. J., Burns, M. S., Delclos, V. R., & Bransford, J. D. (1987).
A comprehensive approach to assessing intellectually handicapped Received January 24, 1997
children. In C. S. Lidz (Ed.), Dynamic testing (pp. 327-359). New Revision received November 5, 1997
York: Guilford Press. Accepted January 8, 1998 •
