The Role of Understanding in Word Problems
DENISE DELLAROSA CUMMINS, WALTER KINTSCH, KURT REUSSER, AND RHONDA WEIMER
University of Colorado
Word problems are notoriously difficult to solve. We suggest that much of the
difficulty children experience with word problems can be attributed to difficulty in
comprehending abstract or ambiguous language. We tested this hypothesis by (1)
requiring children to recall problems either before or after solving them, (2) re-
quiring them to generate final questions to incomplete word problems, and (3)
modeling performance patterns using a computer simulation. Solution perfor-
mance was found to be systematically related to recall and question generation
performance. Correct solutions were associated with accurate recall of the prob-
lem structure and with appropriate question generation. Solution “errors” were
found to be correct solutions to miscomprehended problems. Word problems that
contained abstract or ambiguous language tended to be miscomprehended more
often than those using simpler language, and there was a great deal of system-
aticity in the way these problems were miscomprehended. Solution error patterns
were successfully simulated by manipulating a computer model’s language com-
prehension strategies, as opposed to its knowledge of logical set relations. © 1988
Academic Press, Inc.
This work was supported by National Science Foundation Grant BNS-8309075 to Walter
Kintsch and James G. Greeno. We thank Arthur Samuel, Kurt Van Lehn, and an anony-
mous reviewer for helpful comments on this manuscript. Requests for reprints should be
sent to Denise D. Cummins, Psychology Department, University of Arizona, Tucson, AZ
85721.
0010-0285/88 $7.50
Copyright © 1988 by Academic Press, Inc.
All rights of reproduction in any form reserved.
Correct performance on this version of the problem ranged from 83% for
nursery school children to 100% for first graders. Importantly, even nurs-
ery school children exhibited sophisticated set knowledge when solving
this problem. They did not, for example, simply line up the birds and
worms (on an accompanying picture) and count the singletons. Instead,
they solved the problem by counting the smaller set (worms) to determine
its cardinality, counting out a subset of the larger set (birds) to the same
cardinality, and then counting the number of birds remaining and return-
ing that number as the answer. By using this “match-separate” strategy,
even nursery school children evidenced a tacit understanding of one-
to-one correspondence among sets that possess equivalent cardinality
(subset equivalence), as well as a sophisticated grasp of part-whole set
relations. Similar results were found by De Corte, Verschaffel, and De
Win (1985), who improved solution performance by manipulating lin-
guistic aspects of problem texts in such a way as to make the semantic
relations among sentences clearer. In fact, the influence of problem word-
ing was apparent in the Riley et al. data. For example, mean solution
accuracy on Compare 4 (see Table 1) was 25% higher than that on Com-
pare 5 (for second graders), even though these two problems describe
identical part-whole set structures, albeit with different words. The same
discrepancy was noted for Compare 3 (80%) and Compare 6 (35%), both
of which describe the same problem structure with different wordings.
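The "match-separate" strategy described above can be written out as a short procedure. The following Python sketch is our illustration of the counting steps, not code from the study; the function name and set representation are hypothetical.

```python
def match_separate(larger, smaller):
    """Answer "How many more X than Y?" by counting alone, without
    subtraction facts, as the children described above did.

    larger, smaller: lists of concrete objects (e.g., birds, worms).
    """
    # 1. Count the smaller set (worms) to determine its cardinality.
    n = len(smaller)
    # 2. Count out a subset of the larger set (birds) of the same
    #    cardinality, matching each counted bird to one worm.
    matched, leftover = larger[:n], larger[n:]
    # 3. Count the remaining, unmatched members of the larger set and
    #    return that number as the answer.
    return len(leftover)

# 5 birds and 3 worms: two birds go without a worm.
print(match_separate(["b1", "b2", "b3", "b4", "b5"], ["w1", "w2", "w3"]))  # 2
```

The singleton-counting step never requires knowing the subtraction fact 5 − 3; only one-to-one matching and counting are used, which is why the strategy evidences tacit set knowledge rather than arithmetic skill.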
Empirical results such as these are damaging to the logico-
mathematical explanation of solution difficulties. If children fail to solve
certain problems because they do not possess the conceptual knowledge
required to solve them, one would not expect minor wording changes to
improve solution performance. Yet this is precisely what is observed.
Instead, these results are entirely consistent with the linguistic develop-
ment view of problem solving development, since they suggest that chil-
dren find certain problems difficult because they cannot interpret key
words and phrases in the problem text.
An unanswered question in this work, however, is just how children do
interpret the problems they are asked to solve, particularly those that
employ troublesome language. This is of some importance because the
errors that children make are often counter-intuitive. For example, the
most commonly committed error on the birds/worms problem is to return
the large number “5” as the answer to the problem. In fact, these “given
Method
Subjects. Thirty-eight first grade children from the Boulder Valley School District served
as participants in the study. The children were tested late in the school year (during May).
Apparatus and materials. The 18 story problems used by Riley et al. (1983) served as
stimulus materials in the current study. These 18 problems are presented in Table 1. They
consist of six instances within each of three major problem types. The problem types are as
follows: Combine problems, in which a subset or superset must be computed given infor-
TABLE 1
Problems Used in Experiment 1 (Adapted from Riley, Greeno, & Heller, 1983)
Combine problems
1. Mary has 3 marbles. John has 5 marbles. How many marbles do they have
altogether?
2. Mary and John have some marbles altogether. Mary has 2 marbles. John has 4
marbles. How many marbles do they have altogether?
3. Mary has 4 marbles. John has some marbles. They have 7 marbles altogether. How
many marbles does John have?
4. Mary has some marbles. John has 6 marbles. They have 9 marbles altogether. How
many marbles does Mary have?
5. Mary and John have 8 marbles altogether. Mary has 7 marbles. How many marbles
does John have?
6. Mary and John have 4 marbles altogether. Mary has some marbles. John has 3
marbles. How many does Mary have?
Change problems
1. Mary had 3 marbles. Then John gave her 5 marbles. How many marbles does Mary
have now?
2. Mary had 6 marbles. Then she gave 4 marbles to John. How many marbles does
Mary have now?
3. Mary had 2 marbles. Then John gave her some marbles. Now Mary has 9 marbles.
How many marbles did John give to her?
4. Mary had 8 marbles. Then she gave some marbles to John. Now Mary has 3
marbles. How many marbles did she give to John?
5. Mary had some marbles. Then John gave her 3 marbles. Now Mary has 5 marbles.
How many marbles did Mary have in the beginning?
6. Mary had some marbles. Then she gave 2 marbles to John. Now Mary has 6
marbles. How many marbles did she have in the beginning?
Compare problems
1. Mary has 5 marbles. John has 8 marbles. How many marbles does John have more
than Mary?
2. Mary has 6 marbles. John has 2 marbles. How many marbles does John have less
than Mary?
3. Mary has 3 marbles. John has 4 marbles more than Mary. How many marbles does
John have?
4. Mary has 5 marbles. John has 3 marbles less than Mary. How many marbles does
John have?
5. Mary has 9 marbles. She has 4 marbles more than John. How many marbles does
John have?
6. Mary has 4 marbles. She has 3 marbles less than John. How many marbles does
John have?
mation about two other sets; Change problems, in which a starting set undergoes a transfer-
in or transfer-out of items, and the cardinality of the start set, transfer set, or result set must
be computed given information about two of the sets; Compare problems, in which the
cardinality of one set must be computed by comparing the information given about the
relative sizes of the other two sets. The instances within each problem type differ in terms
of which set cardinality must be computed and the wording of the problems. The story
problems used in the present study all contained “Mary and John” as actors and “marbles”
as objects. This was done to reduce the memory load required to comprehend the problem.
The child needed only to attend to the relationships among the sets and to remember the
numbers stated in the problems.
Each child solved 18 problems. Half of the 18 problems were first solved and then re-
called; the remaining half were first recalled and then solved. Two versions of problem
presentation were used to ensure that all 18 problems were tested in each solve-recall
condition. In the first version, one half of the problems in each problem type was assigned
to the Solve-Recall condition, and the remaining halves were assigned to the Recall-Solve
condition. In the second version, these assignments were reversed so that version 1 Solve-
Recall problems became Recall-Solve problems, and version 1 Recall-Solve problems be-
came Solve-Recall problems. The presentation version served as a between-subjects factor.
The number triples used in the story problems included only the numbers 1 through 9 and
were chosen such that correct answers were (a) less than 10 and (b) not the same as a
number used in the story. Nine triples that satisfied these constraints were chosen for use
in the problems: 3-2-5, 4-2-6, 5-2-7, 6-2-8, 7-2-9, 4-3-7, 5-3-8, 6-3-9, and 5-4-9. Half of these
triples were tested as addition problems and half as subtraction. Across subjects, these
triples were assigned to problems such that each triple was tested in each of the eighteen
problems.
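The constraints on the number triples can be captured in a few lines. This is an illustrative reconstruction; the further restriction to addends of 2 or more is our inference from the nine triples actually listed, not a constraint stated in the text.

```python
def candidate_triples():
    """Enumerate addend triples (a, b, a + b) satisfying the stated
    constraints: (a) the correct answer is less than 10, and (b) the
    answer differs from the numbers used in the story. Constraint (b)
    holds automatically, since a + b exceeds both a and b."""
    triples = []
    for a in range(3, 10):
        for b in range(2, a):   # b < a avoids duplicates such as 2-3-5 vs. 3-2-5
            c = a + b
            if c < 10:          # constraint (a)
                triples.append((a, b, c))
    return triples

print(candidate_triples())  # nine triples, matching the list in the text
```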
In addition to the story problems, these number triples were tested as numeric format
problems. Each child received the same number assignment condition for both the story
problem and numeric format tests. For example, a given child received 3 + 2 = ? as both
a story problem (Combine 1) and as a numeric format problem. The child’s performance on
that equation could therefore be observed under both the story and numeric formats. The
numeric formats mirrored the story problem structures to which they corresponded. Note
that in certain cases (e.g., Change 5) this meant that the equation to be solved contained an
unknown on the left side of the equation (e.g., ? + 2 = 5). All numeric format problems
were presented in vertical sentence form; equations such as ? + 2 = 5 were written as an
open box with “ + 2” underneath it, a line underneath “ + 2,” and “5” underneath the line.
Procedure. Children were tested individually in a quiet room in their schools during
school hours. In keeping with the methodology of Riley et al. (and others), all problems were
presented orally, and the child was required to solve them without benefit of paper and
pencil. The sessions were recorded on a small, unobtrusive tape recorder. The child was
informed of the presence of the tape recorder, but was assured that only the experimenter
would hear the tape (i.e., parents and teachers would not). No child seemed uncomfortable
having the session taped.
Problem presentation was randomized for each child. The session began with instructions,
followed by practice problems. The practice problems consisted of two solve-recall and two
recall-solve problems. Children were assisted in solving and recalling these if required.
Once the experimenter was satisfied that the child understood the procedure, the experi-
mental session was begun. Children were not assisted in solving or recalling experimental
problems. They also were not told whether a problem was to be solved first or recalled first
until after the problem had been read. This was done to ensure that the strategies used to
solve and recall the problems would be the same in both conditions. Following the oral story
problem session, the child was given a sheet with the numeric problems on it and was
required to solve these.
FIG. 1. Proportion of correct solutions for the problems shown in Table 1 when presented
as word problems (W) and when presented in numerical form (N). (Figure not reproduced;
x axis: problem instances 1-6 within the Combine, Change, and Compare types.)
As stated earlier, subjects’ verbal recall protocols were scored for ac-
curacy of structural recall. A correct structural recall was any recall that
preserved the logical relations among sets in the original problem. For
example, consider Compare problems 4 and 5. These two problems de-
scribe the same problem structure using different wording. In both cases,
the small set must be derived given information about the large and dif-
ference sets. “Recalling” Compare 5 as Compare 4, therefore, constitutes
accurate structural recall because the original problem’s logical structure
is preserved. Structure-preserving recall transformations such as these
were observed on 12% of the trials; along with veridical reproductions
(45%), they constitute our measure of correct structural recall. Together,
they constituted 57% of all recall instances.
Figure 2 illustrates proportion correct structural recall. Like the solu-
tion results, the recall data are also in agreement with those of Riley
(1981), although the greater sensitivity of our recall measure provides a bit
more information than the repetition measure used in the Riley study. As
predicted, the overall pattern of recall accuracy closely resembled that of
word problem solution accuracy, suggesting a strong relationship be-
tween the two.
We predicted that word problem performance would vary systemati-
cally with recall performance but not with numeric format performance.
To test this hypothesis, each subject’s protocol was scored for proportion
of correct word problem solutions, numeric format solutions, and struc-
tural recall across the 18 problems. A regression model was then con-
structed to predict each subject’s overall word problem solution perfor-
mance as a function of his or her overall performance on the structural
recall and numeric format tasks. A forward selection procedure was used
FIG. 2. Proportion correct structural recall for the problems shown in Table 1. (Figure not
reproduced; x axis: problem instances 1-6 within the Combine, Change, and Compare types.)
to select candidates for entry in the model.¹ Only one variable met the 5%
significance level for entry into the model, that of structural recall. This
simple model accounted for 72% of the variance in solution accuracy
(F(1,36) = 93.62, MSe = .01, p < .0001), supporting our prediction that
performance on word problems depends primarily on successful compre-
hension.
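The logic of this analysis can be sketched as follows. This is an illustrative reimplementation with made-up per-subject proportions, not the analysis actually run; a full forward-selection procedure would also test significance at each entry step, which is omitted here.

```python
def r_squared(x, y):
    """Proportion of variance in y accounted for by a simple
    least-squares regression on x (assumes both have nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def first_entry(predictors, y):
    """First step of forward selection: the candidate predictor whose
    single-variable model accounts for the most variance in y."""
    return max(predictors, key=lambda name: r_squared(predictors[name], y))

# Hypothetical per-subject proportions correct:
recall  = [0.3, 0.5, 0.6, 0.9]   # structural recall
numeric = [0.8, 0.7, 0.9, 0.8]   # numeric format
word    = [0.3, 0.5, 0.7, 0.9]   # word problem solutions
print(first_entry({"recall": recall, "numeric": numeric}, word))  # recall
```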
Finally, to ensure that our observed relationship was not simply re-
flecting subject variation (i.e., talented subjects performing well on all
tasks, less-talented subjects performing poorly on all tasks), we calcu-
lated a 2 x 2 contingency table for each subject indicating the number of
times problem recall and solution were equivalent in accuracy (i.e., both
right or both wrong) or were different in accuracy (i.e., one right, the
other wrong). The expected frequency was then computed for the right-
right case, and the deviation between observed and expected frequency in
that case calculated. These deviations were found to be significantly
greater than zero, t(37) = 6.75, p < .001, indicating that a dependency
between solutions and structural recall existed for individual subjects,
regardless of talent.
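Concretely, the per-subject computation looks like this. The trial data below are made up for illustration; in the study each subject contributed 18 trials.

```python
def right_right_deviation(trials):
    """Observed minus expected frequency of the right-right cell of a
    subject's 2 x 2 (recall correct x solution correct) table, where
    the expected frequency assumes the two accuracies are independent."""
    n = len(trials)
    recall_right = sum(r for r, s in trials)
    solve_right = sum(s for r, s in trials)
    observed = sum(1 for r, s in trials if r and s)
    expected = recall_right * solve_right / n  # row total x column total / n
    return observed - expected

# A subject whose recall and solution succeed or fail together shows a
# positive deviation, i.e., a dependency between the two tasks:
trials = [(True, True)] * 10 + [(False, False)] * 8
print(right_right_deviation(trials))  # 10 - 100/18, about 4.44
```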
Miscomprehensions and Error Types
While the quantitative relationship between recall and solution perfor-
mance strongly suggests that solution difficulties are driven by breakdowns
in story comprehension, we can offer more direct evidence by way
of qualitative error analyses. We assume that when a child recalls a prob-
lem, he or she describes the problem representation he or she constructed
during a solution attempt. The nature of a miscomprehension therefore
should be related to the type of solution error made. In the following
discussion, we will describe the relationships we noted between compre-
hension and solution errors.
Types of miscomprehensions. Aside from verbatim recall, subjects’
recall protocols could be classified into six categories. The first, termed
structure-preserving transformations (SP), was mentioned in the struc-
tural recall analysis and comprised 12% of all recall trials. These were
occasions on which the wording of the problem was changed during re-
call, but the all-important mathematical relations among sets were main-
tained (e.g., a subtraction Compare 5 problem became a subtraction Compare 4 problem).
¹ Separate regressions were also performed on the three problem types. The results were
not appreciably different than the overall regression, with the exception that the regression
coefficient for numeric accuracy was marginally significant for Change problems (b = .22,
p < .ll). This was not surprising since, as noted earlier, four out of these six problems
contain unknowns in their number sentences, and first grade children are not familiar with
such forms.
Here, the superset specification line (i.e., “Now Mary has 7 marbles.“) is
simply left out of the problem altogether. In contrast to the other trans-
formations, this category does not seem to be a “transformation” at all,
but rather a legitimate “misrecall,” or memory error. A line from the
problem was simply left out or forgotten. Compare this to category 2S,
which seems to suggest a true misconceptualization of the problem structure.
The sixth and final category simply included all misrecall instances that
did not fit into the above categories either because the child could re-
member nothing of the problem, or recall was so confused it could not be
classified. This category comprised 13% of all trials.
In summary, subjects’ miscomprehensions appeared to be systematic
in that they could be classified into five meaningful categories. It is also
interesting to note the distributions of problem types across these cate-
gories. When Compare problems were miscomprehended in a classifiable
way, they tended to fall into two categories, SV (38%) and NP (31%).
Change problems also tended to fall into the SV category (40%) and the
NP category (20%). Combine problems, on the other hand, tended to be
miscomprehended as double-superset problems (33%). Clearly, some as-
pect of these problems tends to invite certain interpretations from chil-
dren. We return to this question below; here, we turn to the more impor-
TABLE 2
Miscomprehensions and Conceptual Solution Errors
Response type
Recall type co wo SPN SBN AR OTH Total
Structure Preserving (SP) 51 1 5 2 20 1 80
Structure Violating (SV) 11 34 16 4 12 3 80
Nonsense Problem (NP) 10 2 14 24 4 1 55
Double Superset (2S) 4 5 13 0 5 3 30
Partial Recall (OS) 25 1 2 4 5 4 41
Correct (CO) 251 3 11 13 20 13 311
OTH (Other) 25 7 12 5 6 32 87
Total 377 53 73 52 72 57 684
Note. Frequencies are based on 18 observations from each of 38 subjects. WO, Wrong
operation errors; SPN, superset given number errors; SBN, subset given number errors;
AR, arithmetic error; OTH, unclassifiable error; CO, correct solution. See the text for an
explanation of recall types.
(e.g., John & Mary’s 7 marbles = John’s 5 marbles and Mary’s 2 mar-
bles.) In each case a part-whole (or SUPERSET) superschema is created
to capture the logical relations among the sets specified in the problems.
Finally, the presence of superschemata triggers arithmetic counting
procedures which produce answers to the problems. Failure to produce
an adequate superschema (i.e., misunderstanding the problem) causes the
program to use default strategies to produce a “best-guess” answer. An
example of such a default strategy is to search memory to determine
whether the answer is already known, that is, if a set that matches the
requested specifications has already been created. Another default strat-
egy is to mine the text base for key words (e.g., “altogether,” “in the
beginning”) that might cue a solution procedure. The effect of these de-
fault strategies will become apparent shortly.
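The two default strategies can be pictured as follows. This Python fragment is our paraphrase of the behavior just described; the set representation, function name, and key-word handling are hypothetical, not the simulation's actual data structures.

```python
def default_answer(sets, requested_spec, words):
    """Best-guess answer used when no superschema could be built.

    sets: sets created while reading, e.g. {"spec": ("John",), "card": 5}
    requested_spec: specification of the set the final question asks for
    words: the problem text, used for key-word cueing
    """
    # Default 1: is the answer already known? Search memory for a set
    # matching the requested specification and return its cardinality.
    for s in sets:
        if s["spec"] == requested_spec:
            return s["card"]
    # Default 2: mine the text base for key words that cue a solution
    # procedure, e.g. "altogether" cues adding all known sets and
    # "beginning" cues returning the first set created.
    if "altogether" in words:
        return sum(s["card"] for s in sets)
    if "beginning" in words:
        return sets[0]["card"]
    return None  # no answer derivable

sets = [{"spec": ("Mary",), "card": 3}, {"spec": ("John",), "card": 5}]
print(default_answer(sets, ("John",), "how many marbles does John have"))  # 5
print(default_answer(sets, ("Mary", "John"), "how many altogether"))       # 8
```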
When given the 18 problems to solve, the Dellarosa (1986) simulation
model solved all 18 without error, indicating that it and the Kintsch and
Greeno (1985) model upon which it is based are sufficient models of
children’s problem solving. More germane to our discussion here, how-
ever, is its usefulness in explaining children’s errors. Specifically, we
required the model to attempt these same problems under conditions of
impaired knowledge and compared its performance to that of children.
Presented in Table 3 are the answers produced by the simulation under
each of three knowledge impairment conditions, along with children’s
errors observed in this study.
Deficient conceptual knowledge. To test the contribution of conceptual
knowledge, we removed the simulation’s decontextualized knowledge
concerning part-whole relations and required it to solve the 18 problems.
The first thing to notice is that without conceptual knowledge, the
simulation’s performance matches that of children on four problems. Its
solution protocols can be described as attempts to model the actions in
the story, relying on linguistic knowledge and default strategies to obtain
answers. Relying on its “altogether means add” key word strategy, it
solves the addition Combine problems correctly (Combines 1 and 2), but
produces wrong operation errors on the subtraction Combine problems
(Combines 3 through 6). Children, however, produced given number er-
rors most frequently on these problems, with the exception of Combine 5.
On this problem, both simulation and children produced wrong operations
errors. Also, like children, the simulation had little difficulty solving
Change 1 and 2 problems because these two describe only simple transfer
operations. It cannot solve Changes 3 through 6, however, because the
story actions describe transfers involving unknown quantities, and it has
no way of mapping these onto part-whole structures. It cannot resort to
its default strategy of simply returning the quantity of the set specified in
the last line of the problem because that quantity is “SOME,” and that is
TABLE 3
Characteristic Errors: Observed and Simulated-Experiment 1
                         Children’s errors      Simulation errors
Problem                  (most frequent)        -CSL    C-SL    CS-L
Combine 1 - -* -* -*
Combine 2 - -* -* -*
Combine 3 SPN wo -wo SPN*
Combine 4 SPN wo -wo SPN*
Combine 5 wo wo* -wo* SPN
Combine 6 SPN wo -wo SPN*
Change 1 - -* U -*
Change 2 - -* U -*
Change 3 SPN U - -
Change 4 SPN U - -
Change 5 SPN-SBN U - SPN-SBN*
Change 6 SPN-SBN U - SPN-SBN*
Compare 1 SPN-WO - wo* SPN*
Compare 2 SBN-WO - wo* SBN*
Compare 3 SBN-WO U wo* SBN*
Compare 4 SBN U - SBN*
Compare 5 SBN U - SBN*
Compare 6 wo U wo* SBN
Total matches (*) 5 7 14
Note. Children’s errors constitute the most frequently observed error based on 38 sub-
jects’ observations per problem. Simulation conditions: -CSL, no conceptual knowledge;
C-SL, no problem situation knowledge; CS-L, degraded linguistic knowledge concerning
key words and phrases. Error types: WO, wrong operation errors; SPN, superset given
number errors; SBN, subset given number errors; -, no error; U, unable to derive any
solution.
* Problems for which the simulation matched children’s error patterns.
but instead could only search for key words and the like in order to trigger
its conceptual knowledge concerning part-whole relations. Under these
conditions, the simulation matched children’s response patterns on 7 out
of 18 problems. Its solution protocols revealed the following:
Using its “altogether” default key word strategy, the simulation suc-
cessfully solved Combine 1 and 2 problems. Its performance on the other
Combine problems, however, depended on the ordering of rules/
strategies. If the rules were ordered such that the simulation accessed its
default rules rather than its thinking rules, then it committed wrong op-
eration errors on Combines 3 through 6. If the rules were ordered such
that the simulation “thought” before “defaulting,” then it used its con-
junction-superset strategy to assign the role of SUPERSET to the set
owned by both Mary and John. As a result it solved these problems
correctly. Giving priority to defaulting produced a pattern that matched
children’s performance on three out of six Combine problems; giving
priority to thinking produced a pattern that matched children’s on two out
of six.
In contrast, without its Change schemata, the simulation could not
solve any of the Change problems because it could not understand the
transfers described in them. In other words, it had no way to map these
transfers onto its conceptual knowledge concerning part-whole relations.
This is because Change problems describe story situations, not simple
comparisons or combining of set quantities. Without its story-situation
knowledge, it could process the text base but not produce a coherent
representation of the story.
Turning finally to Compare problems, the simulation was found to pro-
duce wrong operation errors on Compares 1 through 3 and on Compare 6,
while correctly solving Compares 4 and 5, all using the same strategy.
This strategy is one that assigns the role of SUPERSET to the set that is
least specified in the problem. The reason it did this is rather interesting.
Since it no longer understood comparison scenarios, i.e., had no direct
mappings from comparisons to part-whole structures, it ignored the
phrases containing the comparative form. The remaining parts of these
lines therefore specified sets owned by no one and were hence considered
specifications of the superset. For Compares 1 and 2, this translates into
“Mary has 5 marbles. John has 3 marbles. How many marbles (are there
altogether)?” Under these circumstances, the simulation assigned the
role of superset to the unknown quantity referenced in the last line of its
representation and added the other two set quantities. Most importantly,
when children performed wrong operation errors on these problems, they
misrecalled the problem in just this way 77% of the time. For Compares
3 and 6, ignoring the comparative phrase in the last line translated into,
e.g., “Mary has 3 marbles. (There are) 5 marbles (altogether). How many
not recalled by children, but it was included in our simulation to allow the
MORE-THAN proposition to enter short-term memory (as it presum-
ably does when children hear it) and hence affect the processing load.
Note the difference between this treatment of the comparative and the
treatment received when the story schemata are missing. Here, the com-
parative form is completely misunderstood as a statement of ownership;
in the former case, it was ignored entirely because it was not taken as a
statement of ownership nor could it be understood as a comparison sce-
nario since no knowledge of such a scenario was present in the simula-
tion’s knowledge base.
Finally, there was some indication in our data that children have diffi-
culty interpreting the term “ALTOGETHER,” as in “Mary and John
have X marbles altogether.” There was some suggestion that children
interpret this as meaning that Mary and John EACH have X marbles.
Accordingly, the simulation’s proposition processing rules were made to
interpret these linguistic forms as follows: “Mary has X marbles and John
has X marbles.”
To summarize, in its present state, the simulation interpreted SOME as
an adjective, HAVE-MORE-THAN as simply HAVE, and ALTOGETHER
as EACH. With these changes in its linguistic structures, the simulation
was again required to solve the 18 problems.
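As a concrete picture, the three impairments amount to rewrite rules applied before comprehension. The simulation operated on propositions rather than raw strings, so the string transformations below are only our illustration:

```python
import re

def impair(sentence):
    """Apply the three linguistic impairments: the comparative read as
    plain possession (HAVE-MORE-THAN -> HAVE), ALTOGETHER read as EACH,
    and SOME not recognized as a quantity word (so the sentence yields
    no set at all, represented here by returning None)."""
    # "X has N marbles more/less than Y." -> "X has N marbles."
    s = re.sub(r" (?:more|less) than \w+", "", sentence)
    # "...N marbles altogether." -> "...N marbles each."
    s = s.replace("altogether", "each")
    # SOME as an adjective: the proposition introduces no quantified set.
    if "some" in s:
        return None
    return s

print(impair("She has 4 marbles more than John."))         # She has 4 marbles.
print(impair("Mary and John have 4 marbles altogether."))  # Mary and John have 4 marbles each.
print(impair("Mary had some marbles."))                    # None
```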
When the simulation’s linguistic knowledge was impaired as described,
it produced a response pattern that matched that of children on 15 out of
18 problems. These results indicate that the characteristic errors reported
here and elsewhere in the literature primarily reflect difficulties children
have in assigning interpretations to certain words and phrases used in
standard word problems.
Let us begin with the Compare type problems, since these are the most
straightforward cases. Recall that these problems are misinterpreted as
follows:
John has 4 marbles.
Mary has 3 marbles.
How many does (John, Mary) have?
In such a case, the simulation simply builds three unrelated sets, one
corresponding to John’s marbles, one corresponding to Mary’s, and one
corresponding to the set whose cardinality is requested. No superschema
is built since there is no information about how these sets are logically
related. As a result, none of the standard arithmetic operation rules apply,
and the simulation resorts to its default rules to produce an answer. In this
case, it searches memory to see if it already created a set that matches the
specifications of the requested set. Finding that it does, it returns the
Again, the simulation ends up with four unrelated sets in memory, and no
information about how they are logically related. As a result, it performs
a search for a set corresponding to John’s marbles, and returns “12,” or
the superset cardinal. Accordingly, it matched children’s performance on
five out of six Combine problems, the exception being Combine 5, on
which our subjects committed wrong operation errors instead of given
number errors.³
The case of Change problems is a bit more complex. In order for the
simulation to solve a Change problem, it must build a coherent TRANS-
FER-IN or TRANSFER-OUT schema. A TRANSFER-IN schema is
built if the problem describes a starting set into which objects are trans-
ferred. A TRANSFER-OUT schema is built if objects are transferred out
of the starting set. A difficulty arises when the simulation does not un-
derstand “SOME” to be a quantity word. In such a case, it does not
create a set when it encounters a proposition containing “Some.” In
Change 5 and Change 6 problems this is particularly disastrous, because
“Some” describes the starting set. Without this all important set, there is
not enough information to determine whether the problem describes a
TRANSFER-IN situation or a TRANSFER-OUT situation. As a result
the simulation again ends up with three unrelated sets (corresponding to
lines 2, 3, and 4 in the problems) instead of a coherent superschema under
which these sets are subsumed. In order to solve the problem, it resorts
to its default rules. In this case, it can either (1) return the cardinality of
the set specified in the final line of the problem (e.g., Mary’s marbles) or
it can (2) use the term “BEGINNING” as a cue to return the cardinal of
the first set it created (i.e., the cardinal of the transfer set, line 2 of the
problem). Accordingly, the simulation matched the children’s perfor-
mance on four of the six Change problems.
To summarize, the best match between the children’s performance and
³ It should be noted that although our subjects committed wrong operation errors on
Combine 5, De Corte, Verschaffel, and De Win (1985) reported that their subjects commit-
ted superset given number errors on this problem, just as our simulation did.
the simulation’s was obtained when the latter’s language processing was
altered, as opposed to its logico-mathematical knowledge. Two discrep-
ancies did occur, however. The first was the fact that children were
observed to make an abundance of wrong operation errors on Compare
problems in addition to given number errors; our linguistically deficient
model produced solely given number errors. Note, however, that the
simulation did produce wrong operation errors on these problems when
its story-understanding knowledge was deficient. These results suggest
that children have two strategies for dealing with the difficult comparative
linguistic form. The first is to simply treat it as a statement about posses-
sion (as our linguistically deficient model did) and the second is to ignore
it completely (as our schema-deficient model did). In the former case, a
given number error occurs; in the latter, a wrong operation error occurs.
The same can be said of the term ALTOGETHER which can be inter-
preted either as EACH or as a command to add the numbers in the
problem.
The other discrepancy occurred on Change 3 and Change 4. Here, the
simulation could solve the problem (even without knowing that SOME is
a quantity word). Children sometimes had difficulty with these problems,
as evidenced by the given number errors they produced. It is
not clear why this discrepancy occurred, although it should be noted that
even children do not find Changes 3 and 4 as difficult as Combines 3
through 6 or most of the Compare problems (see Figure 1).
Most important to our endeavor is the fact that the patterns of solution
difficulty reported here and elsewhere in the literature could be accounted
for simply by manipulating linguistic aspects of the simulation program.
The major determinant of its solution characteristics was whether the
nature of its linguistic processing afforded access to its conceptual knowl-
edge. Certain wordings allow direct mapping onto part-whole structures
(e.g., Changes 1 and 2); others instead require inferences about these
mappings (e.g., Compares 3 through 6, Changes 5 and 6) or instruction
concerning special interpretations/mappings of certain words in mathe-
matical settings (e.g., “altogether,” “some”); these problems are there-
fore more likely to be misunderstood.
EXPERIMENT 2
The results of Experiment 1 supported our claim that language com-
prehension strategies play a central role in word problem solving. In
Experiment 2, we tested our claim further in two ways. First, we em-
ployed a new set of problems that better deserve to be called story prob-
lems than the impoverished versions used in Experiment 1. The problems
were designed to be little vignettes, showing plausible, realistic situations
and setting up a motivation for the final arithmetic question that completed
each problem.
Method
Subjects. The participants were 36 second and 36 third grade children from the Jefferson
County School system. The majority were white, middle class children of average intelli-
gence. The schools were paid $5.00 for each student’s participation.
Materials. Four difficult problem types from those used in Experiment 1 were chosen on
which to base the stories. These included Combine 5, Change 5, Change 6, and Compare 5.
These problem types were then embedded in rich story contexts. The stories were all five
lines long, and their propositional content varied from 18 to 31 propositions, with a mean of
23.9 and a standard deviation of 3.9. There were 20 story problems in all, five from each of
the four problem types. One problem of each type was used for practice; the remaining
16 served as the stimulus materials. The numbers embedded in the problems were double-
digit numbers. Examples of the problems are presented in Table 4.
Procedure. Subjects were tested in a small, quiet room in their schools during school
hours. The sessions, which lasted approximately 1 h, were recorded on tape. The child was
informed of the tape recorder, but was assured that no one but the experimenter would hear
the tape (i.e., parents and teachers would not). The tape recorder was unobtrusive, and no
child seemed uncomfortable with its presence. Problems were presented by placing a card
in front of the child on which a problem was typed and reading it out loud. The reading rate
was kept slow enough to ensure that the child could follow along.
There were two experimental factors. The first was recall order: recall before or after
solving a problem. The second was the question task: generate or listen to the final line of
the problem story (Generation vs Standard question condition). These two factors were
crossed to form a 2 × 2 within-subject design. Each subject received one problem from each
problem type under each of these conditions.
The sessions began with verbal instructions concerning the problem solving and recall
tasks. The standard question condition always preceded the generation condition, and sep-
arate instructions were given prior to beginning the generation task. Following instructions,
the child was given two practice problems. One problem was first solved and then recalled,
and the other was first recalled and then solved. Recall was initiated by asking the subject
to tell the story back. In order to ensure a reasonable amount of recall, a set of recall
prompts was used whenever the child failed to respond. A subset of words from the stories
was reserved for this purpose. These words consisted primarily of character names and time
sequence words such as “then she.” All subjects were prompted using the same words. The
question generation task was initiated by asking the subject to think up a good question to
complete the story. Once a question was generated, the child was asked to answer the
question, that is, solve the problem.
During practice, subjects were assisted in solving the problems and clarifying the task.
Once the experimental session began, no help was given other than recall prompting. The
card on which the problem was typed was turned over immediately following reading. Typed
on the back of the card were the numbers required to solve the problem, including the word
“SOME.” Subjects were asked to write down the numbers on their work sheet and to
indicate whether they intended to add or subtract (i.e., they had to write an equation). When
they solved the problem, they also indicated their answers on the sheet.

TABLE 4
Examples of Each Problem Type Used in Experiment 2
To summarize, the sessions consisted of two practice problems, four standard problems,
two of which were solved first and two recalled first, another two practice problems, and
four question-generation problems of which two were solved first and two were recalled
first. Order of recall task presentation was counterbalanced across subjects, as was problem
presentation.
RESULTS
Protocols were scored for the following: (a) solution accuracy, (b) ques-
tion accuracy, both recalled and generated, (c) equation accuracy, (d)
solution error type, (e) propositional recall, and (f) structural recall. The
following analyses were conducted on these data. Unless otherwise
stated, rejection probability was .05. One third-grader’s data was lost due
to a faulty tape. In all subsequent analyses, the appropriate group mean
was substituted for that subject’s data.
TABLE 5
Proportion Correct Solutions to Word Problems in Experiment 2
                   Standard condition   Generation condition   Mean
Third grade
  Solve first            .67                   .71             .69
  Recall first           .63                   .63             .63
Second grade
  Solve first            .44                   .33             .39
  Recall first           .33                   .28             .30
Mean                     .52                   .49             .50
Note. Cell means are based on 72 observations (two problems from each of 36 subjects
under each cell condition).
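As a quick consistency check, the marginal means in Table 5 can be recomputed from the cell entries (values transcribed from the table above):

```python
# Proportions correct, as (standard, generation) pairs for the
# solve-first and recall-first rows of each grade.
third  = [(.67, .71), (.63, .63)]
second = [(.44, .33), (.33, .28)]

rows = third + second
standard_mean   = sum(r[0] for r in rows) / len(rows)
generation_mean = sum(r[1] for r in rows) / len(rows)
# Matches the .52 and .49 column margins reported in Table 5.
print(f"{standard_mean:.2f} {generation_mean:.2f}")
```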
ers under the standard condition, and one for third graders under the
generation condition.4
For both second and third graders, only one variable met the p = .05
entry level condition for predicting solution performance in the standard
condition, that of structural recall, F(1,34) = 22.45, MSe = .09, p < .001,
and F(1,35) = 6.51, MSe = .07, p < .02, respectively. Thus, a subject’s
ability to solve a problem depended on his or her ability to comprehend
the story properly. The ability to simply remember the final line to the
problem did not correlate significantly with solution performance. It
should be noted, however, that while this variable accounted for 40% of
the variance among third graders’ solution performance, it accounted for
only 16% of the variance among second graders’ solution performance.
The models for the generation condition presented a different picture.
Here third graders’ solution performance was determined by both their
ability to complete the problem story with an appropriate question (b =
.61, SE_b = .12, p < .0001) and their ability to recall the problem structure
properly (b = .35, SE_b = .16, p < .04). This two-variable model ac-
counted for 67% of the variance in solution performance. Second graders’
solution performance, however, was significantly influenced by only one
factor, the ability to complete the problem story with an appropriate
question (b = .33, SE_b = .13, p < .02). Despite its statistical signifi-
cance, however, it should be noted that this model accounted for only
16% of the variance in second graders’ solution performance in this con-
dition. (Moreover, the regression coefficient for structural recall was sig-
nificant at the .15 level, suggesting a lack of statistical power.)
Clearly, second graders’ performance was far more idiosyncratic and
variable than was third graders’ performance. One interpretation is that
the rather taxing task demands exceeded their processing resources.
Finally, to ensure that the relationship between structural recall and
solution performance was not simply reflecting subject variability, the
deviation from expected frequencies in the correct-recall/correct-answer
case was computed for each subject as described in Experiment 1.
These deviations were found to be significantly greater than zero, t(72) =
4.10, p < .0001, indicating that, on the average, structural recall and
solution performance correlated for each subject regardless of overall
performance level. The same result was observed for the relationship
4 Equation generation was not included in the regression models because it correlated
nearly perfectly with solution performance. That is, subjects had no difficulty carrying out
the computations in their equations. If they wrote an appropriate equation, they got the
problem right; if they wrote the wrong equation, they got the problem wrong. The more
important question is what determined the subject’s ability to generate the right equation.
Since equation-writing and solution performance were nearly perfectly correlated, it follows
that the same variables that influenced the latter would also influence the former.
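The per-subject deviation analysis can be sketched as follows. The function names and the three subjects' counts below are hypothetical illustrations, not our data.

```python
from math import sqrt

def deviation(n_trials, n_correct_recall, n_correct_answer, n_both):
    """Observed minus expected count of correct-recall/correct-answer
    trials, where the expected count assumes recall accuracy and
    solution accuracy are independent for this subject."""
    expected = n_correct_recall * n_correct_answer / n_trials
    return n_both - expected

def one_sample_t(xs):
    """One-sample t statistic testing the deviations against zero."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / sqrt(var / n)

# Three hypothetical subjects, 8 trials each:
# (trials, correct recalls, correct answers, both correct).
devs = [deviation(8, 6, 5, 5), deviation(8, 4, 4, 3), deviation(8, 7, 6, 6)]
print(devs)                 # positive values indicate an association
print(one_sample_t(devs))   # well above zero for these toy data
```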
TABLE 6
Questions Generated and Solutions in Experiment 2
                          Response type
Question type     CO      WO      GN      AR      OTH     Total
CO               106      17       7      17        2      149
WO                10      35       4       0        0       49
GN                13      17      18       1        0       49
OTH               12      12       7       2        4       37
Total            141      81      36      20        6      284
Note. Frequencies are based on four observations from each of 71 subjects. CO, correct
answer; WO, wrong operation error; GN, given number error; AR, arithmetic error; OTH,
unclassifiable error.
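The association in Table 6 can be summarized by computing, for each question type, the proportion of solutions that matched it; the frequencies below are transcribed from the table.

```python
# Rows: type of question the child generated.
# Columns: type of answer the child then produced.
table = {
    "CO": {"CO": 106, "WO": 17, "GN": 7,  "AR": 17, "OTH": 2},
    "WO": {"CO": 10,  "WO": 35, "GN": 4,  "AR": 0,  "OTH": 0},
    "GN": {"CO": 13,  "WO": 17, "GN": 18, "AR": 1,  "OTH": 0},
}

# Proportion of answers matching the generated question type.
for q, row in table.items():
    total = sum(row.values())
    print(q, round(row[q] / total, 2))
```

The diagonal dominance for CO and WO questions shows that children's solutions tracked the questions they themselves generated.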
TABLE 7
Characteristic Errors: Observed and Simulated (Experiment 2)

                Children's errors            Simulation error
Problem         (most frequent)      (-C)SL    C(-S)L    CS(-L)
Combine 5
  A             WO                   WO*       SBN       SBN
  B             WO                   WO*       WO*       WO*
  C             WO-SPN               WO*       WO*       WO*
  D             WO-SPN               WO*       WO*       WO*
Change 5
  A             WO                   U         -         SPN
  B             WO                   WO*       -         WO*
  C             WO                   WO*       WO*       WO*
  D             WO                   U         -         -
Change 6
  A             WO                   U         WO*       WO*
  B             WO-SPN               U         WO*       WO*
  C             WO                   U         -         -
  D             WO-SBN               U         WO*       WO*
Compare 5
  A             WO                   U         WO*       WO*
  B             WO                   U         -         WO*
  C             WO                   U         WO*       WO*
  D             WO                   U         U         U
Total matches (*)                    6         9         11

Note. Children's errors are the most frequently observed conceptual errors, based on
36 subject observations per problem. Error types: WO, wrong operation error; SPN,
superset given number error; SBN, subset given number error; -, no error; U, unable to
derive any solution. Simulation conditions: (-C)SL, no conceptual knowledge; C(-S)L, no
problem situation knowledge; CS(-L), degraded linguistic knowledge concerning key words
and phrases.
* Problems for which the simulation matched children's error patterns.
inferences. The best match with children’s error patterns was obtained
when its knowledge concerning key words and phrases was impaired.
More importantly, when its verbal knowledge was impaired but the prob-
lem texts were rich, its problem solving strategies changed. Rather than
simply terminating the problem solving episode with unrelated sets, it
attempted to use information from the problems’ rich text bases in order
to build part-whole schemas. It is possible that children attempted the
same strategies, causing a switch from given number to wrong operation
errors in their protocols.
DISCUSSION
Why are word problems so difficult? The results reported here suggest
that text comprehension factors figure heavily in word problem difficulty.
In Experiment 1, solution performance was found to mirror structural
recall, indicating that solution strategies are dictated by the quality of
comprehension achieved. Comprehension, in turn, appeared to be influ-
enced by the nature of the language used in the problem text. Problem
texts that contain certain linguistic forms are particularly difficult for
children to solve. These include forms such as “SOME,” “How many
more X’s than Y’s?”, and certain uses of “altogether.” Our structural
recall results show that these forms are often misinterpreted by children
and, moreover, misinterpreted in certain ways. Our simulation results
suggest that common solution error patterns are directly related to the
linguistic sophistication possessed by the solver. The empirical and sim-
ulation results of Experiment 2 indicated that solution strategies can be
directly influenced by the nature of the problem text.
More importantly, the simulation results of the two experiments clearly
show the interaction of text characteristics and knowledge in determining
solution strategies. Robust linguistic knowledge produced successful so-
lution attempts regardless of text characteristics, but the attempts were
more time- and resource-demanding. Poor linguistic knowledge, however,
produced poor solution performance, but the nature of the errors com-
mitted depended on text characteristics. Sparse texts were associated
with given number errors; rich texts were associated with wrong opera-
tion errors and a preference for a more global superset/subset conceptu-
alization of the problem situations. More important, the primary benefit
of proper linguistic knowledge in this domain appears to be in facilitating
access to conceptual knowledge concerning part-whole relations. Such
access allows the problem solver to construct large, cohesive problem
representations in which the relations among individual sets are clearly
specified.
This description of successful arithmetic problem solving as the build-
ing of large, cohesive structures is consistent with descriptions of problem
solving in other “adult” domains, such as physics (e.g., Chi, Feltovich, &
Glaser, 1981) and chess (Newell & Simon, 1972). In these domains, ex-
pertise development is typically considered a matter of constructing prob-
lem type schemata and using such structures to guide comprehension.
Indeed, the models of arithmetic problem solving expertise proposed by
Briars and Larkin and Riley et al. are based on this very notion. From this
view, then, the reason “How many more X’s are there than Y’s?” is
difficult for children is that it depends on the possession of a part-whole
superschema in memory to guide comprehension. This view, however,
does not explain adequately why a simple change in wording, such as that
used in Compare 4 as opposed to Compare 5, should improve perfor-
mance dramatically, as do other language manipulations (e.g., Hudson,
1983; De Corte et al., 1985). Instead, it appears that certain verbal formats
allow contact to be made with superschema knowledge while others do
not. Results such as these suggest that word problem difficulties may be
akin to reasoning fallacies. For example, dramatic differences in reason-
ing about the logical conditional (if p, then q) have been produced by
manipulating problem format (Cheng & Holyoak, 1985; Cheng, Holyoak,
Nisbett, & Oliver, 1986; Johnson-Laird & Wason, 1977; Reich & Ruth,
1982). Typically, adults exhibit better performance when reasoning tasks
are presented in concrete formats (e.g., envelopes and postage require-
ments) rather than abstract, formal formats (e.g., letters and numbers).
Moreover, the strategies adults employ change as a function of the stim-
ulus format. For example, Reich and Ruth (1982) reported that adults tend
to match the terms mentioned in abstractly stated rules, but to verify
concretely stated rules, the latter being the more appropriate strategy for
the task. The given number errors produced in Experiment 1 may also
be instances of this “matching” strategy: When faced with
problems whose text bases are sparse and incomprehensible due to am-
biguous linguistic terms, children may choose to ignore those terms and,
having no other useful information available, simply match actors’ names
with numbers stated in the problem. (Our simulation was found to employ
this strategy under these conditions.) If this were the case, then children
could be described as using the same strategies adults use to solve prob-
lems under conditions of uncertainty in unfamiliar domains.
Linguistic and resource factors, however, are not the whole story. How
well children perform on word problems depends, of course, on their
formal knowledge of the rules and operations in the problem domain and
on their level of conceptual development. These factors have been ex-
tensively studied both in the developmental and problem solving litera-
ture (Piaget, 1970; Greeno, 1978). The well-demonstrated significance of
such formal factors, however, need not obscure the role played by com-
prehension factors. The experiments reported here provide rich evidence
REFERENCES
Briars, D. J., & Larkin, J. H. (1984). An integrated model of skill in solving elementary
word problems. Cognition and Instruction, 1, 245-296.
Carpenter, T. P., Corbitt, M. K., Kepner, H. S., Lindquist, M. M., & Reys, R. E. (1980).
Solving verbal problems: Results and implications for National Assessment. Arithmetic
Teacher, 28, 8-12.
Cheng, P. W., & Holyoak, K. J. (1985). Pragmatic reasoning schemas. Cognitive Psychol-
ogy, 17, 391-416.
Cheng, P. W., Holyoak, K. J., Nisbett, R. E., & Oliver, L. M. (1986). Pragmatic versus
syntactic approaches to training deductive reasoning. Cognitive Psychology, 18, 293-
328.
Chi, M. T., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of
physics problems by experts and novices. Cognitive Science, 5, 121-152.
Dellarosa, D. (1986). A computer simulation of children’s arithmetic word problem solving.
Behavior Research Methods, Instruments, and Computers, 18, 147-154.
De Corte, E., & Verschaffel, L. (1986). Eye-movement data as access to solution processes
of elementary addition and subtraction problems. Paper presented at the meetings of
the American Educational Research Association, San Francisco, April.
De Corte, E., Verschaffel, L., & De Win, L. (1985). The influence of rewording verbal
problems on children’s problem representations and solutions. Journal of Educational
Psychology, 77, 460-470.
Fletcher, C. R. (1985). Understanding and solving arithmetic word problems: A computer
simulation. Behavior Research Methods, Instruments, and Computers, 17, 565-571.