Mathematics Self-Efficacy and Mathematical Problem Solving: Implications of Using Different Forms of Assessment
SOCIAL COGNITIVE THEORISTS have hypothesized that students' self-efficacy beliefs, that is, their judgments of their capability to accomplish specific academic tasks, are important determinants of academic motivation, choices, and performance (e.g., Bandura, 1986, 1997; Pajares, 1996b; Schunk, 1991). These beliefs of self-capability affect motivation by influencing the effort and
persistence with which students engage in academic tasks, as well as the anxiety
they experience. Social cognitive theorists also hypothesize that these beliefs of
self-capability mediate the influence of other determinants of academic outcomes. In part, this is because the confidence that students have in their own
capability helps determine what they do with the knowledge and skills they possess. Thus, social cognitive theorists maintain that the academic performance of
students is determined in large measure by the confidence with which they
approach academic tasks (Bandura, 1997; Schunk, 1991). For example, when
students take a mathematics exam, the self-confidence that they experience as
they read and analyze specific problems in part determines the amount of time
and effort they put into solving those problems. Students with greater confidence
work harder and longer and are less anxious. As a result, their chances of successful academic performance are enhanced.
Various researchers (e.g., Hackett, 1985; Pajares, 1996a; Pajares & Miller,
1994) have reported that students' judgments of their capability to solve mathematics problems are predictive of their actual capability to solve those problems.
These judgments also mediate the influence of other predictors such as math
background, math anxiety, perceived usefulness of mathematics, prior achievement, and gender. Math self-efficacy also has been shown to be as strong a predictor of mathematical problem-solving capability as general mental ability
(Pajares & Kranzler, 1995), a variable generally found to be a powerful predictor of academic performance (Thorndike, 1986). Other researchers (Collins,
1982; Schunk, 1989, 1991) have reported that, when students approach academic tasks, those with higher self-efficacy work harder and for longer periods of
time than do those with lower self-efficacy.
Researchers have usually assessed mathematics self-efficacy by asking students to use a Likert-type scale to indicate the strength of their confidence in
solving various mathematics problems. Specifically, students are presented with
a number of problems and asked to rate how confident they are that they can successfully solve each one. Efficacy instruments such as the Mathematics Self-Efficacy Scale (MSES; Betz & Hackett, 1983), for example, ask students to provide
judgments of their capability to solve specific algebra or geometry problems correctly. However, whereas the subsequent performance assessment presents students with these problems in a traditional multiple-choice format, the efficacy
instrument presents the problems without the multiple-choice options. Pajares
and Miller (1995) have shown the importance of having a close link between efficacy judgments and the criterion task. Therefore, it is important to determine
whether the form of self-efficacy assessment (multiple-choice or open-ended question format) differentially predicts performance and whether using multiple-choice or open-ended self-efficacy and performance assessments yields different relationships between self-efficacy and performance. In the present study, students were administered a self-efficacy measure in which math problems were presented either in a multiple-choice format or in an open-ended format.
[Figure: Design of the study. Crossing self-efficacy instrument format with performance test format yielded four treatment groups: Group 1 (OE Eff/MC Perf), open-ended self-efficacy instrument with a multiple-choice performance test; Group 2 (MC Eff/MC Perf), both instruments presented with a multiple-choice answer format; Group 3 (OE Eff/OE Perf), both instruments presented in an open-ended format; Group 4 (MC Eff/OE Perf), multiple-choice self-efficacy instrument with an open-ended performance test. OE = open-ended; MC = multiple-choice.]
¹We are aware that alternative assessment methods involve more than simply administering an open-ended test. However, mathematics tests with open-ended answers are one alternative to traditional assessment formats, and therefore we take the liberty of using the term alternative to refer to them.
A second benefit of such research relates to the implementation of statewide assessments that are in new formats. For example, the state of Florida is currently developing new forms of assessment that are to be adopted statewide to match the objectives of Blueprint 2000, which maps out the state's new educational mission. The hope is that new assessments will have a strong impact on both the self-efficacy and performance of students in the years to come. Specifically, it is hoped that the new assessments will more closely "mirror the teaching and learning process, resulting in greater instructional fidelity" (Miller & Seraphine, 1993, p. 119). Wiggins (1989) argued that if tests more closely match the instruction received, confidence will play a greater role in student learning and assessment. Thus, part of the rationale for alternative forms of assessment lies in their ability to increase students' confidence in their capability to do well on them.
A third benefit to be gained from research on formats for self-efficacy judgments and performance assessment involves the issue of calibration, that is, the degree to which students' judgments of their capability reflect their actual competence. Researchers (Hackett, 1985; Hackett & Betz, 1989; Pajares & Miller,
1994) consistently have found that most students are overconfident about their
capability to solve mathematics problems. Pajares and Kranzler (1995) found
that the vast majority of the high school students they tested demonstrated strong
confidence in their ability to solve mathematics problems, but this confidence
was not matched by their subsequent performance. Bandura (1986) argued that the most functional efficacy judgments are those that slightly exceed what one can actually accomplish. Excessive overconfidence, however, can result in "serious, irreparable harm" (p. 394).
It merits considering what role formats for self-efficacy judgments and performance assessment play in students' reported levels of confidence and in the observed correspondence between that confidence and their performance. It may be that the levels of overconfidence reported by researchers are at least partly a function of the formats that they used to measure self-efficacy and performance. Pajares and Kranzler (1995) recommended that research on the nature of the relationship between efficacy judgments and calibration is needed. They suggested
that it may be more important to develop instructional techniques and intervention strategies to improve students' calibration than to attempt to raise their
already overconfident beliefs. Improved calibration should result in better understanding by students of what they know and do not know so that they more effectively deploy appropriate cognitive strategies during the problem-solving
process. If improved calibration is in part a function of self-efficacy assessment,
then the assessment itself becomes a useful intervention to help students with this
metacognitive capability.
Our purpose in the present study was to determine whether varying the form
of assessment would influence students' self-efficacy judgments and affect the
relationship between self-efficacy judgments and mathematics performance.
Method
Participants and Procedures
The OE Eff/OE Perf and MC Eff/OE Perf groups received a version of the test in a fill-in-the-blank format and were asked to write the answer in the blank that corresponded to each problem. The OE Eff/MC Perf and MC Eff/MC Perf groups
were provided with 5-item multiple-choice answers and were asked to circle the
correct one. Cronbach's alpha coefficient for the open-ended test was .37; the alpha coefficient for the multiple-choice test was .32.
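For readers who wish to check such internal-consistency estimates against their own data, Cronbach's alpha can be computed from a students-by-items score matrix. The following sketch is illustrative only (the function name and input layout are ours, not part of the original study):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_students x n_items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items (30 here)
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of students' total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```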
Calibration. We used three measures of calibration. The first was the mean bias score described by Keren (1991), Schraw (1993), and Yates (1990). Bias reveals the direction of the errors in judgment and is computed by subtracting actual performance from predicted confidence. Performance was scored so that a correct answer was scored as 6 and an incorrect answer was scored as 1. Scores for the Likert-type scale for the self-efficacy instrument also ranged from 1 to 6. Thus, expressing no confidence (1) and answering incorrectly (1) would reflect zero bias (1 - 1), whereas the same lack of confidence with a correct answer would receive a bias score of -5 (1 - 6), indicating underconfidence. With this procedure, then, bias scores could range from -5 to +5. Scores larger than zero corresponded to overconfidence; scores less than zero corresponded to underconfidence. The 30 bias scores (1 for each item) were averaged to yield a mean bias
score.
The second measure of calibration was mean accuracy, which was computed
by subtracting the absolute value of each bias score from 5. This score reveals the
magnitude of the judgment error, which could range from 0 (complete inaccuracy) to 5 (complete accuracy). The 30 accuracy scores (1 for each item) were averaged to yield a mean accuracy score. Similar procedures and their rationale are described elsewhere (Schraw, 1995; Schraw & DeBacker Roedel, 1994; Schraw, Dunkle, Bendixen, & DeBacker Roedel, 1995; Schraw, Potenza, & Nebelsick-Gullet, 1993).
The third measure of calibration was item accuracy, which is a score that reflects the number of items on which a student's confidence judgment and performance attainment concurred. Specifically, the score is the number of items on which the student expressed confidence (by marking 4, 5, or 6) and answered correctly plus the number of items on which the student expressed lack of confidence (by marking 1, 2, or 3) and answered incorrectly. This last measure of calibration was computed so as to permit comparisons with prior findings when this technique was used.
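The three calibration measures follow directly from a student's 30 confidence ratings and item outcomes. The following sketch illustrates the definitions given above (the function and variable names are ours; we assume confidence is coded 1-6 and correctness as true/false):

```python
import numpy as np

def calibration_scores(confidence, correct):
    """Mean bias, mean accuracy, and item accuracy for one student."""
    confidence = np.asarray(confidence, dtype=float)  # Likert ratings, 1-6
    correct = np.asarray(correct, dtype=bool)

    # Performance is scored 6 for a correct answer and 1 for an incorrect one.
    performance = np.where(correct, 6.0, 1.0)

    # Bias = predicted confidence - actual performance, per item (-5 to +5);
    # positive values indicate overconfidence, negative values underconfidence.
    bias = confidence - performance
    mean_bias = bias.mean()

    # Accuracy = 5 - |bias| per item (0 = complete inaccuracy, 5 = complete accuracy).
    mean_accuracy = (5.0 - np.abs(bias)).mean()

    # Item accuracy: count of items on which judgment and outcome concur,
    # i.e., confident (4-6) and correct, or not confident (1-3) and incorrect.
    item_accuracy = int(np.sum((confidence >= 4) == correct))

    return mean_bias, mean_accuracy, item_accuracy
```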
Results
Descriptive statistics for each treatment group's scores on the dependent variables are shown in Table 1.
TABLE 1
Descriptive Statistics for Variables, by Treatment Group

                                          Group
                 OE Eff/MC Perf   MC Eff/MC Perf   OE Eff/OE Perf   MC Eff/OE Perf
Variable           M       SD       M       SD       M       SD       M       SD
Performance      18.78a   5.20    19.36a   5.14    14.78b   6.09    15.70b   5.82
Self-efficacy   146.31   18.52   144.79   20.38   141.61   22.55   145.91   18.40
Bias              0.75a   0.84     0.60a   0.78     1.26b   0.86     1.25b   0.84
Mean accuracy     3.17    0.60     3.28a   0.62     2.98b   0.58     3.04b   0.62
Item accuracy    19.54    4.23    20.26a   4.69    18.10b   4.44    18.41b   4.62

Note. Group means for a dependent variable (row) that are subscripted by different letters are statistically different (experimentwise α = .05) according to a Tukey HSD computed on an effect identified by MANOVA. OE = open-ended; MC = multiple-choice; Eff = self-efficacy; Perf = performance.
A multivariate analysis of variance (MANOVA) revealed a significant between-groups effect for the dependent variables of mathematics performance, self-efficacy, mean bias, mean accuracy, and item accuracy, Wilks's lambda = .77, F(12, 842) = 7.19, p < .0001. Univariate analysis of variance (ANOVA) results showed group effects for all dependent variables with the exception of self-efficacy: students reported similar levels of confidence whether problems were presented to them in a multiple-choice or open-ended format, F(5, 321) = 2.07, p > .07. There were significant group effects for performance, F(5, 321) = 41.60, p < .0001; for mean bias, F(5, 321) = 38.33, p < .0001; for mean accuracy, F(5, 321) = 32.01, p < .0001; and for item accuracy, F(5, 321) = 33.70, p < .0001. Tukey's HSD test showed that the two groups that
took the multiple-choice performance test significantly outperformed the groups
that took the open-ended test. As a result of their lower performance scores, the
groups that took the open-ended test also had higher mean bias scores, reflecting
greater overconfidence. Differences in mean accuracy and item accuracy were
found only between the group that took both measures in a multiple-choice format and the two open-ended performance groups.
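Analyses of this kind can be reproduced with standard statistical software. As one illustrative sketch (the file and column names here are ours, and the data are not supplied with this article), the omnibus MANOVA and a Tukey HSD follow-up might be run as follows:

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# One row per student; column names are illustrative.
df = pd.read_csv("students.csv")  # performance, self_efficacy, bias, mean_acc, item_acc, group

# Omnibus MANOVA across the treatment groups (reports Wilks's lambda).
manova = MANOVA.from_formula(
    "performance + self_efficacy + bias + mean_acc + item_acc ~ group", data=df)
print(manova.mv_test())

# Follow-up Tukey HSD on one dependent variable flagged by the MANOVA.
print(pairwise_tukeyhsd(df["performance"], df["group"]))
```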
Prediction of Mathematics Performance
TABLE 2
Correlations Between Self-Efficacy and Problem-Solving Performance, by Treatment Group, Level, and Gender

                                          Group
                          OE Eff/     MC Eff/     OE Eff/     MC Eff/
Group           Total     MC Perf     MC Perf     OE Perf     OE Perf
Full sample    .49***     .41***      .51***      .59***      .73***
  Girls        .38***     .21         .34*        .52***      .51**
  Boys         a***       .63***      .64***      .40**       .55**
Prealgebra     .46***     .47**       .41*        .69***      .54**
  Girls        .23        .13        -.32         .56***      .36
  Boys         a***       .76***      .66***      .41         .70***
Algebra        .55***     .57***      .52***      .70***      .62***
  Girls        .50***     .52**       .56***      .57***      .59***
  Boys         .62***     .61**       .54***      .50**       .66***
To assess the predictive value of self-efficacy, we regressed mathematics performance on gender, algebra level, math self-efficacy, self-efficacy test format (multiple-choice vs. open-ended), performance test format (multiple-choice vs. open-ended), and the interaction between the self-efficacy and performance test formats. In addition, we examined differences across subgroups in the predictive value of self-efficacy by including the interaction between self-efficacy and (a) gender, (b) self-efficacy test format, (c) performance test format, and (d) self-efficacy test format by performance test format (a three-way interaction). We examined the residuals of the multiple regression analyses for heteroscedasticity; plots of the residuals showed the data to be homoscedastic. Tests for nonlinearity were nonsignificant. The model for this analysis was significant (R2 = .57), but the last three interactions described proved to be nonsignificant. Consequently, we tested a reduced model with the main effects, the interaction of the self-efficacy and performance test formats, and the interaction between self-efficacy score and gender (see Table 3). The reduced model suffered no significant loss in predictive value, F(6, 320) = 72.39, p < .0001, R2 = .57.
Results of the reduced model showed that the form of self-efficacy assessment was not predictive of performance scores. This finding was consistent with the MANOVA finding of no significant differences between self-efficacy assessments. As expected, the main effect for performance format was predictive; students who took the multiple-choice test scored an average of 3.67 problems higher than did students who took the open-ended performance measure. Both math self-efficacy, t = 6.36, p < .0001, and algebra level, t = 13.04, p < .0001, proved predictive of performance. Students enrolled in algebra scored an average of 6.3 problems higher than did students enrolled in prealgebra.
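The reduced model described above can be expressed in standard regression software. A sketch follows (variable names are illustrative; the 0/1 codings match those given in the Table 3 note):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # illustrative; one row per student

# Main effects plus the two retained interactions. Codings follow the Table 3
# note: girls 0 / boys 1; prealgebra 0 / algebra 1; open-ended 0 / multiple-choice 1.
model = smf.ols(
    "performance ~ se_format + perf_format + self_efficacy + level + gender"
    " + gender:self_efficacy + se_format:perf_format",
    data=df,
).fit()
print(model.summary())  # R-squared and t tests comparable to Table 3
```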
One interesting effect was the significant interaction between gender and self-efficacy.
TABLE 3
Multiple Regression Model Predicting Mathematics Problem-Solving Performance

Independent                Parameter
variable                    estimate     SE        t         β     Prob > |t|
Intercept                    -2.63      2.21     -1.19     .000      .2348
Self-efficacy format          0.56      0.43      1.30     .048      .1933
Performance format            3.63      0.43      8.46*    .309      .0001
Math self-efficacy            0.10      0.02      6.36*    .331      .0001
Level                         5.78      0.44     13.04*    .481      .0001
Gender                       -7.73      3.14     -2.46*   -.655      .0145
Gender × Self-Efficacy        0.05      0.02      2.55*    .693      .0113

Note. Format denotes multiple-choice versus open-ended response assessment. Level denotes prealgebra (coded 0) versus algebra (coded 1) enrollment. For gender, girls were coded 0 and boys coded 1. For self-efficacy format, the open-ended instrument was coded 0 and the multiple-choice instrument was coded 1. For performance format, the open-ended test was coded 0 and the multiple-choice test was coded 1.
Calibration
The students who were administered the open-ended performance measure had greater bias and lower mean and item accuracy (see Table 1). These differences resulted from the lower performance scores obtained by the OE Eff/OE Perf and MC Eff/OE Perf groups on the open-ended performance measure without correspondingly lower self-efficacy scores. The MANOVA results revealed significant differences between the students enrolled in algebra and prealgebra for the dependent variables of performance, mean bias, mean accuracy, and item accuracy (see Table 4).
[Figure: Predicted mathematics performance (y axis, roughly 12 to 22 problems correct) as a function of math self-efficacy score (x axis from -2 SD to +2 SD, approximately 105 to 185), plotted separately for boys and girls.]
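The shape of this figure follows arithmetically from the Table 3 estimates, taken at face value: holding the other predictors constant, the predicted boy-girl difference is -7.73 + 0.05 × self-efficacy, so the two regression lines cross near a self-efficacy score of about 155. A quick check:

```python
# Predicted gender difference (boys - girls) implied by the Table 3 estimates.
for se in (105, 125, 145, 165, 185):  # the approximate x-axis range of the figure
    print(se, round(-7.73 + 0.05 * se, 2))
# The difference changes sign near se = 7.73 / 0.05 = 154.6: below that score,
# girls are predicted to score slightly higher; above it, boys are.
```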
TABLE 4
Mean Scores for Dependent Variables, by Algebra Level and Gender

                       Prealgebra         Algebra            Girls             Boys
Variable               M       SD        M       SD        M       SD        M       SD
Self-efficacy       142.01   20.60    146.35   19.54    142.96   19.1     146.67   20.98
  Multiple choice   140.82   22.52    148.16   16.69    144.22   17.5     146.53   21.40
  Open-ended        143.24   18.52    144.42   22.12    141.73   20.6     146.84   20.67
Performance          13.36*   5.11     19.69*   4.95     17.12    5.7      17.32    6.09
  Multiple choice    15.33*   4.61     21.52*   3.90     19.00    4.9      19.19    5.50
  Open-ended         11.27*   4.82     17.76*   5.21     15.20    5.9      15.27    6.08
Mean bias             1.51*    .81      0.60*    .72      0.91     .92      1.00     .81
Mean accuracy         2.71*    .55      3.38     .50      3.08     .58      3.17     .65
Item accuracy        15.98*   4.15     21.12*   3.60     18.99    4.51     19.24    4.66

Note. Mean differences reported are the result of MANOVA and follow-ups, with group, gender, algebra level, and all possible interactions as independent variables. No interactive effects of level or gender were found for any variable.
Calibration was poorer for students given the open-ended performance measure than for those tested with the multiple-choice measure. The algebra students had higher performance scores and better calibration
than did the prealgebra students. There were no gender differences for any of the
dependent measures.
Discussion
The first objective of this study was to determine whether a traditional, multiple-choice presentation of mathematics problems versus an alternative, open-ended presentation influences students' mathematics self-efficacy judgments. The data analysis demonstrated that the students' reported self-efficacy was not affected by these variations in presentation format. One possible explanation for
affected by these variations in presentation format. One possible explanation for
this result is that the students did not look at the multiple-choice answers when
they made their confidence judgments. As we discuss below, it is also possible
that, regardless of how their confidence is assessed, students' familiarity with traditional forms of assessment creates a mind-set that causes them to base their confidence judgments on the expectation that the performance task will be presented in a traditional, and therefore familiar, format. Supporting this mind-set is
the fact that students in the two self-efficacy groups were not informed that the
format of the performance task might differ from the format of the efficacy task.
In other words, when students in the MC Eff/OE Perf group were asked to provide confidence judgments on multiple-choice problems, they were not told that
they would be asked to take the subsequent test in an open-ended format. Students in the OE Eff/MC Perf group were also not told of the change in testing
format. Had this information been provided, it might have altered confidence
performance is crucial to understanding the relationship between these two variables. Although we found no difference between the two methods of assessing
self-efficacy, relations between self-efficacy and performance differed depending
on the method of assessing performance. These differences altered the predictive
utility of self-efficacy and influenced calibration results. As we have demonstrated, these differences have measurement implications for researchers
attempting to assess the relationship between self-efficacy beliefs and related
academic outcomes. These measurement concerns must be analyzed further
before researchers can make sound generalizations about the strength of the self-efficacy/performance relationship or the accuracy of students' self-perceptions.
Whereas some researchers (Fennema & Hart, 1994; Pajares, 1996a; Wigfield, Eccles, MacIver, Reuman, & Midgley, 1991) have reported that gender differences in mathematics confidence surface in middle school, we did not find them in the present sample. We found, however, that at higher self-efficacy levels, boys were slightly better predictors of their performance than were girls. At lower self-efficacy levels, boys were poorer predictors. Research is needed to better determine the nature of these effects. Pajares (1996a) found no gender differences in the calibration of regular-education middle-school students but discovered that gifted girls were biased toward underconfidence.
Students enrolled in algebra were better calibrated across all measures. This
finding is consistent with those of other studies (Pajares, 1996a; Pajares & Kranzler, 1995) of the relationship between calibration and mathematics capability.
Pajares (1996a) reported that middle-school gifted students are better calibrated
than are regular-education students. Similarly, we found that the more capable
students in the present sample were better judges of their capability. In general,
we concur with the recommendation of Pajares and Kranzler (1995) that instructional intervention is needed to help students better understand what they know
and do not know so that they can more effectively deploy appropriate cognitive
strategies during mathematical problem solving. Interventions should be particularly appropriate for students at lower levels of academic achievement. It seems
likely that a productive intervention would be to vary the form of assessment and
to familiarize students with each form. These recommendations are particularly
pertinent to states and school districts that are moving toward new forms of
assessment.
The present findings are based on a sample of regular-education eighth-grade
students enrolled in higher level mathematics courses. We recommend that the
findings be tested using students enrolled in other levels of mathematics and in
other grades. In addition, we acknowledge that the strength of the relationship
between self-efficacy and performance may be influenced by the correlated
specifics that can result from the use of the same items to measure both constructs. Marsh, Roche, Pajares, and Miller (in press) have cautioned that using
identical self-efficacy and performance indexes in an effort to closely match
belief and criterion may lead to positively biased estimates of effects from self-efficacy to performance outcomes. Thus, researchers are encouraged to use similar rather than identical items or tasks to assess self-efficacy beliefs and performance criteria or to use structural equation modeling analyses to sift out the bias
that might result from correlated specifics.
NOTE
We wish to express our gratitude to Mel Lucas, Professors Tim Urdan and Margaret Johnson, and
especially to Gio Valiante for their valuable assistance on this project. In addition, we thank the consulting editors and the executive editor of The Journal of Experimental Education for the many valuable comments and suggestions that served to strengthen the final manuscript. This study was funded by a grant from the Florida Educational Research Council, Inc.
Address correspondence to Frank Pajares, Department of Educational Studies, Emory University,
Atlanta, GA 30322. Tel: (404) 727-1775. Fax: (404) 727-2799. E-mail: [email protected].
REFERENCES
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood
Cliffs, NJ: Prentice Hall.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Betz, N. E., & Hackett, G. (1983). The relationship of mathematics self-efficacy expectations to the
selection of science-based college majors. Journal of Vocational Behavior, 23, 329-345.
Collins, J. L. (1982, March). Self-efficacy and ability in achievement behavior. Paper presented at the
meeting of the American Educational Research Association, New York.
Fennema, E., & Hart, L. E. (1994). Gender and the JRME. Journal for Research in Mathematics Education, 25, 648-659.
Fennema, E., & Sherman, J. A. (1978). Sex-related differences in mathematics achievement and related factors: A further study. Journal for Research in Mathematics Education, 9, 189-203.
Hackett, G. (1985). The role of mathematics self-efficacy in the choice of math-related majors of college women and men: A path analysis. Journal of Counseling Psychology, 32, 47-56.
Hackett, G., & Betz, N. E. (1989). An exploration of the mathematics self-efficacy/mathematics performance correspondence. Journal for Research in Mathematics Education, 20, 261-273.
International Association for the Evaluation of Educational Achievement. (1985). Second study of mathematics: Technical report IV. Washington, DC: Author.
Keren, G. (1991). Calibration and probability judgments: Conceptual and methodological issues.
Acta Psychologica, 77, 217-273.
Lapan, R. T., Boggs, K. R., & Morrill, W. H. (1989). Self-efficacy as a mediator of investigative and realistic general occupational themes on the Strong-Campbell Interest Inventory. Journal of Counseling Psychology, 36, 176-182.
Marsh, H. W., Roche, L. A., Pajares, F., & Miller, D. (in press). Item-specific efficacy judgments in
mathematical problem-solving. Contemporary Educational Psychology.
Miller, M. D., & Legg, S. M. (1993). Alternative assessment in a high stakes environment. Educational Measurement: Issues and Practice, 12, 9-15.
Miller, M. D., & Seraphine, A. E. (1993). Can test scores remain authentic when teaching to the test?
Educational Assessment, 1, 119-129.
Pajares, F. (1996a). Self-efficacy beliefs and mathematical problem solving of gifted students. Contemporary Educational Psychology, 21, 325-344.
Pajares, F. (1996b). Self-efficacy beliefs in academic settings. Review of Educational Research, 66, 543-578.
Pajares, F., & Kranzler, J. (1995). Self-efficacy beliefs and general mental ability in mathematical problem-solving. Contemporary Educational Psychology, 20, 426-443.
Pajares, F., & Miller, M. D. (1994). The role of self-efficacy and self-concept beliefs in mathematical problem-solving: A path analysis. Journal of Educational Psychology, 86, 193-203.
Pajares, F., & Miller, M. D. (1995). Mathematics self-efficacy and mathematics outcomes: The need
for specificity of assessment. Journal of Counseling Psychology, 42, 190-198.
Pintrich, P. R., & De Groot, E. V. (1990). Motivational and self-regulated learning components of
classroom academic performance. Journal of Educational Psychology, 82, 33-40.
Schraw, G. (1995). Measures of feeling-of-knowing accuracy: A new look at an old problem. Applied
Cognitive Psychology, 9, 321-332.
Schraw, G., Dunkle, M. E., Bendixen, L. D., & DeBacker Roedel, T. (1995). Does a general monitoring skill exist? Journal of Educational Psychology, 87, 435-444.
Schraw, G., & DeBacker Roedel, T. (1994). Test difficulty and judgment bias. Memory & Cognition, 22, 63-69.
Schraw, G., Potenza, M. T., & Nebelsick-Gullet, L. (1993). Constraints on the calibration of performance. Contemporary Educational Psychology, 18, 445-463.
Schunk, D. H. (1989). Self-efficacy and achievement behaviors. Educational Psychology Review, 1, 173-208.
Schunk, D. H. (1991). Self-efficacy and academic motivation. Educational Psychologist, 26, 207-231.
Thorndike, R. L. (1986). The role of general ability in prediction. Journal of Vocational Behavior, 29, 332-339.
Wigfield, A., Eccles, J., MacIver, D., Reuman, D., & Midgley, C. (1991). Transitions at early adolescence: Changes in children's domain-specific self-perceptions and general self-esteem across the transition to junior high school. Developmental Psychology, 27, 552-565.
Wiggins, G. (1989). Teaching to the (authentic) test. Educational Leadership, 46, 41-47.
Williams, E. J. (1959). The comparison of regression variables. Journal of the Royal Statistical Society, Series B, 21, 396-399.
Worthen, B. R. (1993). Critical issues that will determine the future of alternative assessment. Phi
Delta Kappan, 74, 444-456.
Yates, J. F. (1990). Judgment and decision making. Englewood Cliffs, NJ: Prentice Hall.