DIT2: Devising and Testing A Revised Instrument of Moral Judgment

Article  in  Journal of Educational Psychology · December 1999


DOI: 10.1037/0022-0663.91.4.644



Journal of Educational Psychology Copyright 1999 by the American Psychological Association, Inc.
1999, Vol. 91, No. 4, 644-659 0022-0663/99/$3.00

DIT2: Devising and Testing a Revised Instrument of Moral Judgment


James R. Rest and Darcia Narvaez
University of Minnesota, Twin Cities Campus

Stephen J. Thoma
University of Alabama, and University of Minnesota, Twin Cities Campus

Muriel J. Bebeau
University of Minnesota, Twin Cities Campus

The Defining Issues Test, Version 2 (DIT2), updates dilemmas and items, shortens the original
Defining Issues Test (DIT1) of moral judgment, and purges fewer participants for doubtful
response reliability. DIT1 has been used for over 25 years. DIT2 makes 3 changes: in
dilemmas and items, in the algorithm of indexing, and in the method of detecting unreliable
participants. With all 3 changes, DIT2 is an improvement over DIT1. The validity criteria for
DIT2 are (a) significant age and educational differences among 9th graders, high school
graduates, college seniors, and students in graduate and professional schools; (b) prediction of
views on public policy issues (e.g., abortion, religion in schools, rights of homosexuals,
women's roles); (c) internal reliability; and (d) correlation with DIT1. However, the increased
power of DIT2 over DIT1 is primarily due to the new methods of analysis (a new index called
N2, new checks) rather than to changes in dilemmas, items, or instructions. Although DIT2
presents updated dilemmas and smoother wording in a shorter test (practical improvements),
the improvements in analyses account for the validity improvements.
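
Criterion (c), internal reliability, is conventionally summarized with Cronbach's alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores). The block below is a minimal sketch of that standard formula, not the authors' scoring software. The function name `cronbach_alpha`, the layout of `scores` (one row per participant, one column per dilemma), and the sample numbers are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Return Cronbach's alpha for a participants-by-items score table.

    `scores` is a list of rows, one per participant; each row holds one
    numeric score per item. Treating each dilemma as an "item" is an
    assumption made for this illustration.
    """
    if len(scores) < 2:
        raise ValueError("need at least two participants")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every participant needs a score on every item")

    # Sample variance of each item (column) across participants.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each participant's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for four participants on five dilemmas.
sample = [
    [4, 5, 3, 4, 4],
    [2, 3, 2, 2, 3],
    [5, 5, 4, 5, 4],
    [1, 2, 2, 1, 2],
]
print(round(cronbach_alpha(sample), 2))
```

Alpha approaches 1 when items rank participants consistently and falls as item scores diverge, which is why it serves as a check that the dilemmas measure a common construct.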

James R. Rest, Department of Educational Psychology and Center for the Study of Ethical Development, University of Minnesota, Twin Cities Campus; Darcia Narvaez, College of Education and Human Development and Center for the Study of Ethical Development, University of Minnesota, Twin Cities Campus; Stephen J. Thoma, Department of Human Development, University of Alabama, and Center for the Study of Ethical Development, University of Minnesota, Twin Cities Campus; Muriel J. Bebeau, Department of Preventive Science and Center for the Study of Ethical Development, University of Minnesota, Twin Cities Campus. James R. Rest died in July 1999.

We thank Lee Fertig, Irene Getz, Carol Koskela, Christyan Mitchell, and Nanci Turner Shults for help in data collection. We also thank the Bloomington School District and the Moral Cognition Research Group at the University of Minnesota.

Correspondence concerning this article should be addressed to Darcia Narvaez, Department of Educational Psychology, University of Minnesota, 206 Burton Hall, 178 Pillsbury Drive Southeast, Minneapolis, Minnesota 55455. Electronic mail may be sent to [email protected].

The Defining Issues Test, Version 2 (DIT2), is a revision of the original Defining Issues Test (DIT1), which was first published in 1974. DIT2 updates the dilemmas and items, shortens the test, and has clearer instructions. This is the third in a series of articles in the Journal of Educational Psychology aimed at improving the measurement of moral judgment (Rest, Thoma, & Edwards, 1997; Rest, Thoma, Narvaez, & Bebeau, 1997). Rest, Thoma, and Edwards (1997) proposed an operational definition of construct validity (seven criteria) that could be used to evaluate various measurement devices of moral judgment. Rest, Thoma, Narvaez, et al. (1997) reported that a new way of indexing DIT data, the N2 index, had superior performance on the seven criteria in contrast to the traditional P index, which has been used for over 25 years. (See the Rest, Thoma, Narvaez, et al., 1997, article for further discussion of N2.) This article reports on a revised version (new dilemmas and items) of the DIT1—the DIT2—with more streamlined instructions and shorter length. Also, this article describes a new approach to detecting bogus data ("new checks").

While we were reexamining aspects of the DIT1, we also reconsidered our methods of checking for participant reliability. That is, given a multiple-choice test that can be group administered—often under conditions of anonymity—some participants might fill out the DIT1 answer sheet without regard to test instructions, and some participants might give bogus data. The participant reliability checks are methods for detecting bogus data. For the past decades, we have used a procedure called "standard checks" to check for bogus data. In sum, DIT2 uses new checks instead of standard checks and uses revised items and dilemmas as well as N2. With these three changes, we wanted to see whether the research dividends would increase by creating alternatives to DIT1, P index, and standard checks.

However, we had an important question to consider before getting into the matter of updating: Why would anyone want a DIT score, either updated or not? Two issues are at the heart of the matter. First, is Kohlberg's approach so flawed that research ought to start anew? Second, can a multiple-choice test like the DIT (as opposed to interview data) yield useful information?

The Kohlbergian Approach

The DIT is derived from Kohlberg's (1976, 1984) approach to morality. In the past decades, many challenges to this approach have been made. Critics raise both philosophical and psychological objections. In a recent book (Rest, Narvaez, Bebeau, & Thoma, 1999), the criticisms and challenges to Kohlberg's theory are reviewed and analyzed. In contrast to those who find Kohlberg's theory so faulty that they propose discarding it, we have found that continuing with many of Kohlberg's starting points has generated numerous findings in DIT research.

To appreciate how Kohlberg's basic ideas illuminate important aspects of morality, first consider a distinction between macromorality and micromorality. Just as in the field of economics, macro and micro distinguish different phenomena and different levels of abstraction in analysis, we use the terms to distinguish different phenomena and levels of analysis in morality. Macromorality concerns the formal structure of society, that is, its institutions, role structure, and laws. The following are the central questions of macromorality: Is this a fair institution (or role structure or general practice)? Is society organized in a way that different ethnic, religious, and subcultural groups can cooperate in it and should support it? Should I drop out of a corrupt society? On the other hand, micromorality focuses on the particular, face-to-face relationships of people in everyday life. The following questions are central to micromorality: Is this a good relationship? Is this a virtuous person? Both micro- and macromorality are concerned with establishing relationships and cooperation among people. However, micromorality relates people through personal relationships, whereas macromorality relates people through rules, role systems, and formal institutions. In macromorality, the virtues of acting impartially and abiding by generalizable principles are praised (for how else could strangers and competitors be organized in a societal system of cooperation?). In micromorality, the virtues of unswerving loyalty, dedicated care, and partiality are praised, because personal relationships depend on mutual caring and special regard. In our view, Kohlberg's theory is more pertinent to macromorality than to micromorality (for further discussion of macro- and micromorality, see Rest et al., 1999). Some of Kohlberg's critics fault his approach for not illuminating "everyday" morality (in the sense of micromorality; see Killen & Hart, 1995). However, it remains to be seen how well other approaches accomplish this.

The issues of macromorality are real and important, regardless of the relative contributions of a Kohlbergian or non-Kohlbergian approach to issues of micromorality. Regarding the importance of macromorality issues, consider Marty and Appleby's (e.g., 1991) six-volume series on current ideological clashes in the world. Marty and Appleby talked about the world's major disputes since the cessation of the Cold War. Formerly, the Soviet Union and Marxism/communism seemed to be the greatest threats to democracies. However, Marty and Appleby characterized the major ideological clash today as between fundamentalism and modernism; others describe the clash in ideology as the "culture war" between orthodoxy and progressivism (Hunter, 1991) or religious nationalism versus the secular state (Juergensmeyer, 1993). These clashes in ideology lead "to sectarian strife and violent ethnic particularisms, to skirmishes spilling over into border disputes, civil wars, and battles of secession" (Marty & Appleby, 1993, p. 1). Understanding how people come to hold opinions about macromoral issues is now no less important, urgent, and real than the study of micromoral issues. It is premature to say what approach best illuminates micromorality. However, we claim that a Kohlbergian approach illuminates macromorality issues (see Table 4.9 in Rest et al., 1999).

DIT1 research follows Kohlberg's approach in four basic ways. It (a) emphasizes cognition (in particular, the formation of concepts of how it is possible to organize cooperation among people on a society-wide scope); (b) promotes the self-construction of basic epistemological categories (e.g., reciprocity, rights, duty, justice, social order); (c) portrays change over time in terms of cognitive development (i.e., it is possible to talk of "advance" in which "higher is better"); and (d) characterizes the developmental change of adolescents and young adults in terms of a shift from conventional to postconventional moral thinking. However, we call our approach a "neo-Kohlbergian approach" (i.e., it is based on these starting points, but we have made some modifications in theory and method).

One major difference is our approach to assessment. Instead of Kohlberg's interview, which asks participants to solve dilemmas and explain their choices, the DIT1 uses a multiple-choice recognition task that asks participants to rate and rank a standard set of items. Some people are more accustomed to interview data and question whether data from multiple-choice tests are sufficiently nuanced to address the subtleties of morality research. Some researchers regard a multiple-choice test as a poor way to study morality, compared with the richness of explanation data from interviews. Therefore, the prior question concerning whether to update the DIT1 needs attention first. These challenges raise complex issues that are addressed in a recent book (Rest et al., 1999). Within the short span of an article, we can indicate only the general direction that we take.

The DIT Approach

A common assumption in the field of morality, and one with which we disagree, is that reliable information about the cognitive processes that underlie moral behavior is obtained only by interviewing people. The interview method asks a person to explain his or her choices. The moral judgment interview has been assumed to provide a clear window into the moral mind. In his scoring system (Colby et al., 1987), Kohlberg gave privileged status to interview data. At one point, Kohlberg (1976) referred to scoring interviews as "relatively error-free" and "theoretically the most valid method of scoring" (p. 47). According to this view, the psychologist's job is to create the conditions in which the participant is candid, ask relevant and clarifying questions, and then classify and report what the participant said. Then, in the psychologist's reports, the participant's theories about his or her own inner process are quoted to support the psychologist's theories of how the mind works.

However, consider some strange outcomes of interview material. When Kohlberg reported interviews, the participants talked like philosopher John Rawls (Kohlberg, Boyd,
& LeVine, 1990); when Gilligan reported interviews, the participants talked like gender feminists (Gilligan, 1982); and when Youniss and Yates (in press) reported interviews, the participants said that they don't reason or deliberate at all about their moral actions. This unreliability in explanation data exists because people do not have direct access to their cognitive operations. Perhaps people do not know how their minds work any more than they know how their immunization or digestive systems work. Perhaps asking a person to explain his or her moral judgments is likely to get back what they have understood current psychological theorists to be saying. Then, when psychologists selectively quote the participants' explanations that agree with their own views, such evidence is vulnerable to the charge of being circular. Thus, interview data need more than face validity.

Contrary to assuming the face validity of interviews, researchers in cognitive science and social cognition contend that self-reported explanations of one's own cognitive processes have severe limitations (e.g., Nisbett & Wilson, 1977; Uleman & Bargh, 1989). People can report on the products of cognition but cannot report so well on the mental operations they used to arrive at the product. We believe that people's minds work in ways they do not understand and in ways that they can't explain. We believe that one of the reasons that there is so little evidence for postconventional thinking in Kohlberg's studies (e.g., Snarey, 1985) is that interviewing people does not credit their tacit knowledge. There is now a greater regard for the importance of implicit processes and tacit knowledge in human decision making. Tacit knowledge is outside the awareness of the cognizer (e.g., Bargh, 1989; Holyoak, 1994) and beyond his or her ability to articulate verbally. For example, consider the inability of a 3-year-old to explain the grammatical rules used to encode and decode utterances in his or her native language. The lack of ability to state grammatical rules does not indicate what children know about language. Similarly, a lack of introspective access has been documented in a wide range of phenomena, including attribution studies (e.g., Lewicki, 1986), word recognition (Tulving, Schacter, & Stark, 1982), conceptual priming (Schacter, 1996), and expertise (Ericsson & Smith, 1991). This research calls into question the privileged place of interview data over recognition data (as in the DIT1). We believe that any data-gathering method needs to build a case for its validity and usefulness.

Note that the issue here is not whether Kohlberg distinguished normative ethics from meta-ethics. Rather, our point is that Kohlberg regarded explanation data from interviews as directly revealing the cognitive operations by which moral judgments are made. We are denying that people have access to the operations or inner processes by which they make moral decisions. We are denying that the royal road into the moral mind is through explanation data given in interviews. The upshot of all of this is extensive (see more detailed discussion in Rest et al., 1999). It not only means that multiple-choice data may have something of value to contribute to moral judgment research, but it also results in drawing the distinction between content and structure at a different place than Kohlberg did. All content is not purged from structure in assessment, the highest development in moral judgment is not defined in terms of a particular moral philosopher (i.e., John Rawls), and the concept of development is redefined so that development is not tied to the staircase metaphor.

We grant that the DIT started out in the 1970s as a "quick and dirty" method for assessing Kohlberg's stages. However, as time has passed and as data on the DIT1 has accumulated, different theories about human cognition have evolved (e.g., Taylor & Crocker, 1981). In keeping with these changes, we have reconceptualized our view of the DIT1 (see Rest et al., 1999, chapter 6). Now, we regard the DIT1 as a device for activating moral schemas (to the extent that a person has developed them) and for assessing them in terms of importance judgments. The DIT1 has dilemmas and standard items; the participant's task is to rate and rank the items in terms of their moral importance. As the participant encounters an item that both makes sense and also taps into the participant's preferred schema, that item is judged as highly important. Alternatively, when the participant encounters an item that either doesn't make sense or seems simplistic and unconvincing, he or she gives it a low rating and passes over it. The items of the DIT1 balance bottom-up, data-driven processing (stating just enough of a line of argument to activate a schema) with top-down, schema-driven processing (stating a line of argument in such a way that the participant has to fill in the meaning from schemas already in his or her head). In the DIT1, we are interested in knowing which schemas the participant brings to the task. We assume that those are the schemas that structure and guide the participant's thinking in decision making beyond the test.

Validity of the DIT1

Arguing that there are problems with interview data does not automatically argue for the validity of the DIT1. Rather, the DIT1 must make a case for validity on its own. Validity of the DIT1 has been assessed in terms of seven criteria. Rest, Thoma, and Edwards (1997) described the seven criteria for operationalizing construct validity. A recent book (Rest et al., 1999) cited over 400 published articles that more fully document the validity claims. The validity criteria briefly are as follows:

1. Differentiation of various age and education groups. Studies have shown that 30% to 50% of the variance of DIT scores is attributable to the level of education in heterogeneous samples.

2. Longitudinal gains. A 10-year longitudinal study showed significant gains of men and women and of college attenders and noncollege participants from diverse backgrounds. A review of a dozen studies of freshman to senior college students (n = 755) showed effect sizes of .80 (large gains). Of all the variables, DIT1 gains have been one of the most dramatic longitudinal gains in college (Pascarella & Terenzini, 1991).

3. DIT1 scores are significantly related to cognitive capacity measures of moral comprehension (r = .60s), recall and reconstruction of postconventional moral arguments (Narvaez, 1998), to Kohlberg's moral judgment interview measure, and (to a lesser degree) to other cognitive developmental measures.

4. DIT1 scores are sensitive to moral education interventions. One review of over 50 intervention studies reported an effect size for dilemma discussion interventions to be .41 (moderate gains), whereas the effect size for comparison groups was only .09 (small gains).

5. DIT1 scores are significantly linked to many prosocial behaviors and to desired professional decision making. One review reported that 32 of 47 measures were statistically significant (see also Rest & Narvaez, 1994, for recent discussions of professional decision making).

6. DIT1 scores are significantly linked to political attitudes and political choices. In a review of several dozen correlates with political attitude, DIT1 scores typically correlated in the range, r = .40 to .60. When combined in multiple regression with measures of cultural ideology, the combination predicted up to two thirds of the variance (Rs in the .80s) of controversial public policy issues such as abortion, religion in the public school, women's roles, rights of the accused, rights of homosexuals, and free-speech issues. Such issues are among the most hotly debated issues of our time, and DIT1 scores are a major predictor to these real-life issues of macromorality.

7. Reliability. Cronbach's alpha is in the upper .70s/low .80s. Test-retest is about the same.

A specification of validity criteria tells us which studies to do to test a construct and what results should be found in those studies. Operational definitions enable us to examine the special usefulness of information from a measure. We want to know how the construct is different from other theoretically related constructs. Accordingly, DIT1 scores show discriminant validity from verbal ability and general intelligence and from conservative and liberal political attitudes. That is, the information in a DIT1 score predicts the seven validity criteria above and beyond that accounted for by verbal ability and general intelligence or political attitude (Thoma, Narvaez, Rest, & Derryberry, in press). Further, the DIT1 is equally valid for men and women (Rest et al., 1999). In sum, there is no other variable or construct that accounts as well for the combination of the seven validity findings than the construct of moral judgment. The persuasiveness of the validity data comes from the combination of criteria that many independent researchers have found, not just from one finding with one criterion.

Why a Revised DIT?

Because we wanted to maintain comparability in studies, DIT1 went unchanged while we went through a full cycle of studies. It took much longer to go through a full cycle than we originally anticipated; the DIT1 was frozen for over 25 years.

There are several issues about DIT1 that DIT2 seeks to address (and this moves us to the specific purposes of the present article):

1. Some of the dilemmas in DIT1 are dated, and some of the items need new language (e.g., in DIT1, Kohlberg's well-known dilemma about "Heinz and the drug" is used, the Vietnam War is talked about in one dilemma as if it is a current event, and, in one of the items, the term Orientals was used to refer to Asian Americans). While updating dilemmas and items, we rewrote the instructions to clarify them, and we shortened the test from six stories to five stories when we found that one dilemma in DIT1 was not contributing as much to validity as were the other dilemmas (Rest, Narvaez, Mitchell, & Thoma, 1998b).

2. DIT2 takes advantage of a recently discovered way to calculate a developmental score (the N2 index; Rest, Thoma, Narvaez, et al., 1997). (Because issues of indexing are discussed at length in this recent publication, that discussion is not repeated here.)

3. There is the ever-present problem in group-administered, multiple-choice tests (that are also often anonymous) that participants might give bogus data. The challenge, therefore, is to develop methods for detecting bogus data so that we can purge the questionnaires that have bogus data. In DIT1, there are several checks for participant reliability; the usefulness of having some sort of check for participant reliability has been described (Rest, Thoma, & Edwards, 1997). Nevertheless, with DIT2, we reconsidered our particular method of checking for participant reliability, especially because such a large percentage (typically over 10%) of samples using DIT1 are discarded for questionable participant reliability. (Maybe in our zeal to detect bogus data, we threw out too many participants.)

To prepare the new dilemmas and items of DIT2, we first discussed various versions amongst ourselves. Then we asked members of an advanced graduate seminar on morality research at the University of Minnesota to take the reformulated DIT2 and to make comments. Then we discussed the dilemmas, items, and instructions again. Given that DIT1 has been unchanged for over 25 years and the fact that the Kohlberg group labored for decades over the scoring system of the moral judgment interview (Colby et al., 1987), changing the DIT might seem to be a big undertaking. However, the process was surprisingly straightforward and swift (and the results were positive). We conclude there is nothing sacred or special about the original Kohlberg dilemmas or the DIT1 dilemmas that cannot be reproduced in new materials. After freezing the DIT1 for years, we now encourage experimentation in new dilemmas and new formats. To encourage this experimentation, the new scoring guides and computer scoring from the Center for the Study of Ethical Development provide special aids to assist in the development of new dilemmas and new indexes (see Rest & Narvaez, 1998; Rest, Narvaez, Mitchell, & Thoma, 1998a).

DIT2 parallels DIT1 in construction:

1. Paragraph-length hypothetical dilemmas are used, each followed by 12 issues (or questions that someone deliberating on the dilemma might consider) representing different stages or schemas. The participant's task, a recognition task, is to rate and rank the items in terms of their importance.

2. The "fragment strategy" is used whereby each item is short and cryptic, presenting only enough verbiage to convey a line of thinking, not to present a full oration
defending one action choice or another (see Rest et al., 1999; Rest, Thoma, & Edwards, 1997).

3. Dilemmas and items on DIT2 closely parallel the moral issues and ideas presented in DIT1; however, the circumstances in the dilemmas and wording are changed, and the order of items is changed.

4. We presume that the underlying structure of moral judgment assessed by the DIT consists of three developmental schemas: personal interest, maintaining norms, and postconventional (Rest et al., 1999). See the Appendix for a sample story from DIT2.

Validating DIT2

How does one determine whether a new version of the DIT is working? We administered both DIT1 and DIT2 to the same participants, balancing the order of presentation. We included students at several age and education levels (from ninth-grade to graduate and professional school students). We wanted to pick criteria for this preliminary validation on which DIT1 was particularly strong, thinking that DIT2 would have to be at least as strong on these criteria. We used four criteria for initial validity:

1. Discrimination of age and education groups. This is our chief check on the presumption that our measure of moral judgment is measuring cognitive advance—a key assumption of any cognitive developmental measure.

2. Prediction of opinions on controversial public policy issues. As discussed in Rest et al. (1999), one of the most important payoffs of the moral judgment construct is its ability to illuminate how people think about the macromoral issues of society. The DIT predicts how people think about the morality of abortion, religion in public schools, and so on (matters dealing with the macro-issues of social justice, that is, how it is possible to organize cooperation on a society-wide basis, going beyond face-to-face relationships). The significant correlation between the DIT and various measures of political attitude has long been noted (see the review of over 30 correlations in Rest et al., 1999). A secondary goal of this study was to replicate a study by Narvaez, Getz, Thoma, and Rest (1999) by (a) using the specific measure of political attitude—the Attitudes Toward Human Rights Inventory (ATHRI; Getz, 1985); (b) testing whether DIT scores reduce to political identity or religious fundamentalism or to a common factor of liberalism or conservatism; and (c) testing whether or not the combination of DIT scores with cultural ideology (e.g., political identity and religious fundamentalism) more powerfully predicts controversial public policy issues than any one of these measures alone. Replicating the Narvaez et al. (1999) findings (both with DIT1 and DIT2 in a new study) is the first direct replication of these findings beyond the original study, on which we base our interpretation that moral judgment interacts with cultural ideology in parallel—not serially—in producing moral thinking about macromoral issues. More generally, we have taken the position that an important payoff of moral judgment research is to illuminate people's opinions about controversial public policy issues, and thus it is important to show that this interpretation is not based on only one study.

3. High correlations between DIT1 and DIT2. Of course this is important when comparing two tests purported to measure the same thing.

4. Adequate internal reliability in DIT2. This was the final criterion for determining the adequacy of DIT2.

We present our findings in four parts. Part 1 compares the performance of DIT2 (including the changes in dilemmas and items, in indexing, and in participant reliability checks) with DIT1, focusing on the four validity criteria mentioned previously. The central questions here are whether updating, clarifying, and shortening DIT2, and purging fewer participants for questionable reliability (practical improvements) can be done without sacrificing validity, and whether improvements in constructing a new index (N2) and new methods of detecting bogus data (new checks) are effective. In Part 2, we seek to isolate the effects of each of the three changes. What are the particular effects of changing the dilemma and item stimuli, the method of indexing, and the method of checking for participant reliability? In Part 3, we shift our focus to consider in some detail the problem of bogus data and methods for detecting unreliable participants. (New checks turns out to be the most unique methodological feature discussed in this article.) Finally, in Part 4, we further examine a replication with DIT2 of the Narvaez et al. (1999) study that concerns the discriminability of the DIT1 from political attitudes and examines the particular usefulness of the DIT2 in predicting opinions about public policy issues (seeking replication of the theoretical claim that moral judgment's most important payoff is the prediction of opinions about controversial public policy issues).

Method

Participants

The overall goal in constituting this sample was to have a mix of participants at various age and educational levels. We sought participants from four educational levels: students who were in the ninth grade, students who had recently graduated from high school and were enrolled for only a few weeks as freshmen in college, students who were college seniors, and students in graduate or professional school programs beyond the baccalaureate degree. These four levels of education have been used in studies of the DIT since 1974 (Rest, Cooper, Coder, Masanz, & Anderson, 1974). A total of 200 participants from these four age and educational levels turned in completions of all the major parts of the questionnaire package. Note that both the least advanced and the most advanced groups were from the upper Midwest, whereas the two middle groups were from the South. Thus, correlations with education could not be explained as regional differences.

Ninth-grade students. Two classrooms of ninth graders (n = 47) were asked to participate. The students attended a school that was located in a middle-class suburb of the Twin Cities metropolitan area. Testing took place over two class periods of a life skills class.

Senior high graduates, new freshmen. Students (n = 35) from a university in the southeastern United States were offered extra credit in several psychology classes for participation. Freshman students had recently graduated from high school and had been at the university for only a few weeks.

College seniors. Students (n = 65) from a university in the southeastern United States were offered extra credit in several
psychology classes. College seniors were students who were finishing their last year as undergraduates.

Graduate school and professional school students. Participants in this category (n = 53) consisted of 37 students in a dentistry program at a state university in the upper Midwest (at the end of their professional school program), 13 students at a private, moderately conservative seminary in the upper Midwest, and 3 students in a doctoral program in moral philosophy (we were unsuccessful in our attempts to recruit more moral philosophy students). Participants who took the tests on their own time were paid.

Instruments

The choice of instruments followed from the goals of the study, which are (a) to compare DIT1 with DIT2 and (b) to replicate the Narvaez et al. (1999) study.

Moral judgment: DIT1-P.¹ The DIT (Rest et al., 1999) is a paper-and-pencil test of moral judgment. DIT1 presents six dilemmas: (a) "Heinz and the drug" (whether Heinz ought to steal a drug for his wife who is dying of cancer, after Heinz has attempted to get the drug in other ways); (b) "escaped prisoner" (whether a neighbor ought to report an escaped prisoner who has led an exemplary life after escaping from prison); (c) "newspaper" (whether a principal of a high school ought to stop publication of a student newspaper that has stirred complaints from the community for its political ideas); (d) "doctor" (whether a doctor should give medicine that may kill a terminal patient who is in pain and who requests the medicine); (e) "Webster" (whether a manager ought to hire a minority member who is disfavored by the store's clientele); and (f) "students" (whether students should protest the Vietnam War). Each dilemma is followed by a list of 12 considerations in resolving the dilemma, each of which represents a different type of moral thinking. Items are rated and ranked for importance by the participant. For over 25 years, the most widely used index of the DIT1 has been the P score, representing the percentage of postconventional reasoning preferred by the respondent. Although the stages of moral thinking reflected on the DIT were inspired by Kohlberg's (1976) initial work, the DIT is not tied to a particular moral philosopher (as Kohlberg's is tied to Rawls, 1971). Kohlberg's stages are redefined in terms of three schemas (personal interests, maintaining norms, and postconventional).

¹ Operationalized variables used in statistical analysis are printed as an abbreviated name in capital letters (e.g., DIT1-P, FUNDA). Theoretical constructs are printed in the usual manner (e.g., moral judgment, religious fundamentalism). In the case of DIT variables, the version is designated by DIT1 or DIT2, and the index used is designated after the hyphen (e.g., DIT1-P, the original DIT using the P index; or DIT2-N2, the new DIT using the N2 index).

DIT2-N2. The revised test consists of five dilemmas: (a) "famine" (A father contemplates stealing food for his starving family from the warehouse of a rich man hoarding food; comparable to the Heinz dilemma in DIT1); (b) "reporter" (A newspaper reporter must decide whether to report a damaging story about a political candidate; comparable to the prisoner dilemma in DIT1); (c) "school board" (A school board chair must decide whether to hold a contentious and dangerous open meeting; comparable to the newspaper dilemma in DIT1); (d) "cancer" (A doctor must decide whether to give an overdose of a painkiller to a frail patient; comparable to the doctor dilemma in DIT1); and (e) "demonstration" (College students demonstrate against U.S. foreign policy; comparable to the students dilemma in DIT1). The validity of DIT2 is unknown because this is the first study to use it. The N2 index takes into account preference for postconventional schemas and rejection of less sophisticated schemas, using both ranking and rating data. Its rationale is discussed in Rest, Thoma, and Edwards (1997).

Opinions about public policy issues. As in the Narvaez et al. (1999) study, the ATHRI, constructed by Getz (1985), asks participants to agree or disagree (on a 5-point scale) with statements about controversial public policy issues such as abortion, euthanasia, homosexual rights, due process rights of the accused, free speech, women's roles, and the role of religion in public schools. The ATHRI poses issues suggested by the American Constitution's Bill of Rights, similar to the large-scale studies of American attitudes about civil liberties by McClosky and Brill (1983). The ATHRI contains 40 items, 10 of which are platitudinous, "apple pie" statements of a general nature with which everyone tends to agree. Here are two examples of the platitudinous, noncontroversial items: "Freedom of speech should be a basic human right" and "Our nation should work toward liberty and justice for all." In contrast, 30 items are controversial, specific applications of human rights. Two examples are "Books should be banned if they are written by people who have been involved in un-American activities" and "Laws should be passed to regulate the activities of religious cults that have come here from Asia." During initial validation, a pro-rights group (from an organization that had a reputation for backing civil liberties) and a selective-about-rights group (from a group with a reputation for backing rights of certain groups selectively) were enrolled for a pilot study (n = 101) with 112 controversial items (Getz, 1985). Thirty of the items that showed the strongest divergence between groups were selected for the final version of the questionnaire, along with 10 items that expressed platitudes with which there was no disagreement (see Getz, 1985, for further details on the pilot study). Therefore, with the ATHRI, we have a total of 40 human rights issues that are related to civil libertarian issues.

In the study by Narvaez et al. (1999), scores ranged from 40 to 200. High scores represent advocacy of civil liberties. Although the items of the ATHRI represent many different issues and contexts, they strongly cohere (Cronbach's alpha was .93). Narvaez et al. (1999) reported significant bivariate correlations of DIT1 with ATHRI (rs in the .60s). Also, when measures of political identity and religious fundamentalism were combined in multiple regression with the DIT to predict ATHRI, the R was in the range of .7 to .8, accounting for as much as two thirds of the variance. Further, each of the independent variables had unique predictability (as well as shared variance). Thus, each independent variable was not reduced to a single common factor of liberalism or conservatism. The present study was intended to replicate those findings using a different sample, with both DIT1 and DIT2.

Religious ideology. To measure religious fundamentalism, we chose Brown and Lowe's (1951) Inventory of Religious Belief, following Getz (1985) and Narvaez et al. (1999). It is a 15-item measure that uses a 5-point, Likert-type scale. Its items differentiate between those who believe and those who reject the literalness of Christian tenets. It includes items such as "I believe the Bible is the inspired Word of God" (a positively keyed item); "The Bible is full of errors, misconceptions, and contradictions" (a negatively keyed item); "I believe Jesus was born of a virgin"; and "I believe in the personal, visible return of Christ to earth." Scores on the Brown-Lowe inventory range from 15 to 75. High scores indicate strong literal Christian belief. Criterion group validity is good between more and less fundamentalistic church groups (Brown & Lowe, 1951; Getz, 1984; Narvaez et al., 1999). Test-retest reliability has been reported in the upper .70s. Spearman-Brown reliability has been found in the upper .80s (Brown & Lowe, 1951). In Narvaez et al. (1999), Cronbach's alpha was .95 for the entire

group of 158 participants. This scale taps religious fundamentalism and is labeled FUNDA.

Political identity: Liberalism and conservatism. Participants were asked to identify their political identity on a 5-point political conservatism scale, ranging from 1 (liberal) to 5 (conservative). This method of measuring liberalism and conservatism replicates the Narvaez et al. (1999) study and is the variable of contention in the challenge to the DIT1 by Emler, Resnick, and Malone (1983). This variable will be referred to as POLCON (political conservatism), with high scores being conservative.

Demographics. Age of participants was given in years. Participants were also asked to state their gender, but because there were no significant differences on any of the DIT scores for gender, scores for both males and females were collapsed for analysis. Education was measured in terms of the four levels of education (1 = ninth grade, 2 = college freshman, 3 = college senior, 4 = graduate or professional school student). Participants were asked whether they were Christians. Participants were also asked whether they were citizens of the United States (virtually all, 98.3%) and whether English was their first language (virtually all, 97%). Some participant demographics are shown in Table 1.

Table 1
Participant Groups and Demographics

Group                                     Number   Average age (SD)   Percent women
Ninth grade                                 47      14.64 (0.53)         34
High school graduates/college freshmen      35      18.51 (2.03)         77
College seniors                             65      21.55 (3.11)         77
Graduate/professional school                53      29.06 (5.90)         45
Total                                      200      21.4 (6.39)          58.5

Procedure

The order of materials was randomly varied (for all but the dentistry students), with DIT1 coming first for half of the participants and DIT2 coming first for the other half. There were no significant differences in terms of order for any of the major variables (P and N2 indexes on DIT1, P and N2 on DIT2 or on ATHRI, FUNDA, and POLCON). Because the 37 dentistry students had already taken the DIT1 as part of their regular curriculum requirements, we sought volunteers to take the remaining package of questionnaires, and the order was not varied.

For the high school participants, time in two class sessions was used to take the questionnaires; for the remaining participants, the questionnaire package was handed out and the participants filled out the questionnaires on their own time.

All Minnesota participants (and parents of the ninth graders) signed consent forms in accordance with the procedures of the University of Minnesota Human Participants Committee. Participants from the southeastern university were recruited in compliance with that institution's human participant requirements.

Results and Discussion

In Part 1, DIT1-P is compared with DIT2-N2. How does the new revision of the DIT stack up against the traditional DIT, which has been used for over 25 years and reported in hundreds of studies? The key question is whether, after decades of research, we have developed a better instrument. Then in Part 2, we examine the particular effects of each of the three changes in DIT2: (a) using the original wording of dilemmas and items versus the revised dilemmas and items, (b) using the P index versus using the new N2 index, and (c) using the standard participant reliability checks versus using new checks.

Part 1

Participant reliability. The DIT contains checks on the reliability of a participant's responses. DIT1 uses a different method for detecting participant unreliability than the DIT2 (discussed in detail in Part 3). From the total sample of 200 participants, 154 survived the reliability checks of the standard procedure for DIT1 (77%), whereas 192 survived the new reliability checks of DIT2 (96%). Given that in this study the same participants took both DIT1 and DIT2, we conclude that DIT2 purges fewer participants for suspected unreliability than does DIT1. The difference in proportion of participants purged between the new procedure and the standard procedure is significant (z = 5.56, p < .0001).

Criterion 1. We expect a developmental measure of moral judgment to increase as age and education increase. Table 2 presents the means and standard deviations of DIT1-P and DIT2-N2 for each of the four educational levels. An analysis of variance (ANOVA) with DIT1-P grouped by four levels of education produces F(3, 153) = 41.1, p < .0001; an ANOVA with DIT2-N2 produces F(3, 191) = 58.9, p < .0001. Table 3 presents age and educational trend data in terms of correlations of the moral judgment indexes with educational level (four levels) and with chronological age (14-53). Although there might be doubts about the strict linearity of education level (and therefore the use of level of education as a linear variable in correlations), we assume that deviations of the educational-level variable from strict linearity affect both DIT1 and DIT2 equally, thus not biasing the comparison between DIT measures. The correlational analysis shows stronger educational trends with DIT2-N2 than with DIT1-P, although this amount of difference may not make much practical difference. In sum, the practical advantages of DIT2 (i.e., being shorter, more up-to-date, and purging fewer participants) are not at the

Table 2
Means and Standard Deviations of DIT1-P and DIT2-N2 by Four Education Levels

                                    DIT1-P (n = 154)    DIT2-N2 (n = 192)
Education level                       M       SD          M       SD
1. Ninth grade                       23.0    10.0        20.5     9.7
2. College freshmen                  28.7    11.5        30.6    14.4
3. College seniors                   33.7    14.1        40.4    13.6
4. Graduate/professional school      53.9    13.1        53.3    11.5

Note. In DIT2-N2, for comparison purposes, the N2 index is adjusted so that the mean (37.85) and standard deviation (17.19) are equal to those of the P index. DIT1 = Defining Issues Test (original version); DIT2 = Defining Issues Test, Version 2; P = P index; N2 = N2 index.
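The note to Table 2 says that the N2 index is adjusted so that its mean and standard deviation equal those of the P index (37.85 and 17.19). The paper does not spell out the adjustment. A minimal sketch, assuming a plain linear rescaling of standardized scores, might look like the following; the function name `rescale_to` and the raw scores are hypothetical and serve only as illustration.

```python
from statistics import mean, pstdev


def rescale_to(scores, target_mean, target_sd):
    """Linearly rescale scores so they have the given mean and SD.

    Assumption: a simple z-score transform. The paper does not state
    which adjustment was actually applied to the N2 index.
    """
    m = mean(scores)
    sd = pstdev(scores)  # population SD of the raw scores
    if sd == 0:
        raise ValueError("scores have no variance; cannot rescale")
    return [target_mean + target_sd * (x - m) / sd for x in scores]


# Hypothetical raw N2 values, rescaled to the P-index mean and SD
# quoted in the note to Table 2.
raw_n2 = [1.2, 3.4, 2.8, 5.1, 4.0]
adjusted = rescale_to(raw_n2, target_mean=37.85, target_sd=17.19)
```

Because the transform is linear, it changes only the scale of the index. The rank order of participants, and therefore any correlation of the index with other variables, is unaffected.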

cost of poorer validity on Criterion 1. In fact, the opposite is true.

Table 3
Correlations of DIT With Education and Age

Measure           Education level (1-4)   Chronological age
DIT1-P index              .62                   .52
DIT2-N2 index             .69                   .56

Note. All correlations of DIT with age and education level are significant, p < .0001. The correlation of DIT2-N2 with education level is significantly higher, t(151) = 6.72, p < .001, than the correlation of DIT1-P with educational level. The correlation of DIT1-P with age is not significantly different, t(151) = 1.67, ns, from the correlation of DIT2-N2 with age. (Calculation of differences between correlations follows Howell, 1987, pp. 244ff, in which the correlations are first transformed to Fisher's r.) DIT1 = Defining Issues Test (original version); DIT2 = Defining Issues Test, Version 2; P = P index; N2 = N2 index.

Criterion 2. We expect a measure of moral judgment to be related to views on public policy issues such as abortion, free speech, rights of homosexuals, religion in public schools, women's roles, and so on.

In Table 4, we show both the old and new DIT correlated with ATHRI and also the partial correlations with ATHRI controlled for FUNDA and POLCON. We show partial correlations because previous studies (Rest, 1986) have shown that both religious fundamentalism and political conservatism and liberalism were significantly correlated with the DIT. Therefore, the partial correlation attempts to control the shared variance with political or religious conservatism of the DIT with ATHRI, estimating the relation of moral judgment to public policy issues after controlling for religious and political conservatism. Again, despite the practical advantages of DIT2-N2 over DIT1-P, the new version does not suffer any weaker trends on Criterion 2. In fact, in the partial correlation with ATHRI, DIT2-N2 has a significant advantage over DIT1-P.

Criterion 3. We expect a measure of moral judgment to have adequate reliability as measured by Cronbach's alpha. Because we use ranking data in the P index and as part of the N2 index, we cannot use the individual items as the unit of internal consistency. Ranks are ipsative; that is, if one item is ranked in first place, then no other item of a story can be in first place. Therefore, the unit of internal reliability is on the story level, not the item level. Cronbach's alpha for DIT1-P over the six stories (n = 154) is .76. For the DIT2-N2, it is .81 (n = 192). Although these levels of Cronbach's alpha are not outstandingly high, we regard them as adequate.

It is interesting to note that Cronbach's alpha for DIT1's 6 stories plus DIT2's 5 stories (for a total of 11 units) is .90. We might speculate that this finding (i.e., 5 or 6 stories have modest reliability, but 11 stories have high Cronbach's alpha) indicates that the five or six stories of DIT1 and DIT2 each tap some different subdomains within morality. Although the DIT1-P and DIT2-N2 cohere enough, there is nevertheless some diversity in what each story taps. When we add the 6-story DIT1 to the 5-story DIT2, the 11 stories show higher internal consistency because the 11 stories have more overlap and are more redundant than the smaller samples of the 5 or 6 stories. Paradoxically, however, a score based on the 11 stories contains essentially the same information (although somewhat redundantly) as the score from 5 stories (with less redundancy). This can be seen by comparing the correlations of the validity criteria from the 5-story DIT2-N2 with the 11-story DIT1 + DIT2: For the 5-story DIT2-N2, the correlation with education is .69, whereas the correlation with the 11-story DIT is .73. The correlation of the 5-story DIT2-N2 with ATHRI is .50, whereas the correlation of the 11-story DIT1 + DIT2 is .52. By using all 11 stories (virtually doubling the test), the gain in Cronbach's alpha is 8 points, whereas the gain in the correlations with validity criteria is only 2 to 4 points. (Hence, we conclude that on the basis of 5 stories, DIT2-N2 contains virtually the same information as a moral judgment variable that is based on 11 stories with high Cronbach's alpha.)

Criterion 4. We expect DIT1 to be significantly correlated with DIT2. This criterion is different from the previous three criteria in that it does not contrast DIT1 with DIT2, but, rather, examines their overlap. The correlation of DIT1-P with DIT2-N2 is .71 (using the standard participant reliability checks; n = 154). The correlation of DIT1-N2 with DIT2-N2 is .79 (using the N2 index and new checks; n = 178).

With Guilford's (1954, p. 400) correction for attenuation resulting from the less-than-perfect reliability of two measures, the upward bound estimate for the correlation between the two "true" scores is .95 to .99 (depending on the sample used for reliability estimates and the method of indexing). Hence, the DIT1 and DIT2 are correlated with each other about as much as their reliabilities allow. DIT1 is correlated with DIT2 about as much as previous studies have reported for the test-retest of DIT1 with itself (Rest, 1979, p. 239).

In sum, DIT2-N2 is shorter, more streamlined, more updated, and purges fewer participants than DIT1-P, and (with N2 and new checks) it has somewhat better validity characteristics. According to this study, if either measure has

Table 4
Correlations and Partial Correlations of Moral Judgment With ATHRI

Measure      ATHRI   ATHRI (controlling for FUNDA and POLCON)
DIT1-P        .48       .40
DIT2-N2       .50       .51

Note. All correlations of the DIT with ATHRI are significant, p < .001. The correlation of DIT1-P with ATHRI is not significantly lower, t(151) = .99, ns, than the correlation of DIT2-N2 with ATHRI. The correlation of DIT1-P with ATHRI, partialing out for FUNDA and POLCON, is significantly lower, t(149) = 4.43, p < .001, than the corresponding partial correlation of DIT2-N2. ATHRI = Attitudes Toward Human Rights Inventory (Getz, 1985); FUNDA = religious fundamentalism; POLCON = political identity as conservative; DIT1 = Defining Issues Test (original version); DIT2 = Defining Issues Test, Version 2; P = P index; N2 = N2 index.
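Table 4 reports the correlation of each DIT measure with ATHRI after controlling for FUNDA and POLCON. The paper does not say how these partial correlations were computed. The sketch below assumes the standard recursive formula for a second-order partial correlation; all function names are ours, and the small data set is synthetic, built only so that the result can be checked by hand.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_first_order(r_xy, r_xz, r_yz):
    """Correlation of x and y with one control variable z removed."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_second_order(x, y, z, w):
    """Correlation of x and y controlling for both z and w."""
    r_xy_z = partial_first_order(pearson(x, y), pearson(x, z), pearson(y, z))
    r_xw_z = partial_first_order(pearson(x, w), pearson(x, z), pearson(w, z))
    r_yw_z = partial_first_order(pearson(y, w), pearson(y, z), pearson(w, z))
    return partial_first_order(r_xy_z, r_xw_z, r_yw_z)


# Synthetic illustration (not study data): x and y are related only
# through the shared control z; w is unrelated to everything.
z = [1, 1, 1, 1, -1, -1, -1, -1]
w = [1, 1, -1, -1, 1, 1, -1, -1]
x = [2, 0, 2, 0, 0, -2, 0, -2]   # z plus noise independent of y
y = [2, 0, 0, 2, 0, -2, -2, 0]   # z plus noise independent of x

raw_r = pearson(x, y)
partial_r = partial_second_order(x, y, z, w)
```

In this constructed case the raw correlation between `x` and `y` is .50, but it falls to zero once `z` is controlled, because `z` is their only shared source of variance. The pattern in Table 4 is the opposite one: the DIT correlations with ATHRI remain substantial after FUNDA and POLCON are partialed out.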

the validity advantage, it seems to lie with DIT2 in addition to its practical advantages.

Part 2

What effects are unique to the new dilemmas and items and what effects are a result of the new analyses (N2 and new checks)? What if the new analyses are computed on the old DIT (i.e., the data from the 6-story DIT1)? Would there be any advantage in doing so (without using the new dilemmas and items)?

In Table 5, the top row repeats the correlations of DIT2-N2 with the validity criteria already given in Tables 2 and 3 and in the discussion of Cronbach's alpha in Criterion 3; the bottom row repeats the correlations of DIT1-P with the validity criteria (also given previously). Rows 1 and 4 are provided for easy comparison with rows 2 and 3. The second row (the most important row in Table 5) shows how the validity criteria are affected by using the old DIT (Heinz and the drug, etc.) with the new index (N2) and the new participant reliability checks. (In other words, row 2 uses the old DIT, including Heinz and the drug, but adopts the new data analyses of N2 and new checks.) The special interest in row 2 is whether there seems to be any advantage to reanalyzing DIT1 with N2 and new checks. The third row shows how the correlations are affected by using the new DIT with the old P index and the old standard reliability checks.

First, note the sample sizes. The new participant reliability checks allow more participants in the sample to be cleared for analysis (96% for DIT2; 92% for DIT1) than do the standard reliability checks (77% for DIT-P). The difference between 92% and 77% is statistically significant (z = 3.98, p < .001, n = 200), and the difference between 96% and 92% is statistically significant (z = 1.86, p = .05, n = 200). In other words, the new analyses (N2, new checks) retain significantly more participants on both DIT1 and DIT2 than do the old standard analyses, and with new checks, DIT2 retains slightly more participants than does DIT1. Although new checks retain more participants than do standard checks on both DITs, DIT1 lost nine more participants than did DIT2 using new checks.

Second, note that using the new analyses (N2 and new checks) makes more of a difference in the validity criteria than using new dilemmas (DIT2). In other words, the old DIT (6-story DIT1), for all its datedness and awkward wording, seems to produce trends as strong as the new DIT (5-story DIT2) with updated wording when the new analyses (N2 and new checks) are used. The particular advantages of DIT2 seem mostly to be that it is shorter and retains slightly more participants (nine more than DIT1), not that the changes in dilemmas or wording produce stronger validity trends. Perhaps the datedness and awkward wording of DIT1 put off some participants and undermined motivation to perform the task, but in the current study, this seemed to affect only 5% of the participants. When most participants perform the task of DIT1, the validity trends are as strong as the updated, shorter version. In both cases, however, the new analyses with N2 and new checks are preferable to the analyses used for over 25 years for DIT1.

The third row in Table 5 shows that it is not a good idea to use DIT2 without the N2 index and new checks. From the perspective of this study, the only disadvantage of N2 and new checks is that they are too labor intensive for hand scoring (the original DIT1 could be hand scored). It takes several hours of hand computation per participant to perform the routines of N2 and new checks. Only a computer should be put through the amount of calculation necessary to produce N2 and new checks.

One might wonder whether the DIT's relation to ATHRI is "piggybacking" on a third variable, education. After all, other research (e.g., McClosky & Brill, 1983) has shown correlations between public policy issues and education. Therefore, partialling out education, the partial correlation of DIT1 with ATHRI is .30 (n = 180, p < .001), and the partial correlation of DIT2 with ATHRI is .28 (n = 195, p < .001). Again, partialling out education, the partial correlation of DIT1 with DIT2 is .62 (n = 178, p < .001). Therefore, there is no indication that education can account for the predictability of the DIT.

Table 5
Correlations of DIT Measures With the Validity Criteria With and Without New Index and New Participant Reliability Checks

Measure     nᵃ     EDᵇ    ATHRIᶜ   ATHRI/partialᵈ   Cronbach's α

With new index and new participant reliability checks
DIT2-N2     192    .69    .50      .51              .81
DIT1-N2     183    .68    .54      .52              [illegible]

With old P index and standard participant reliability checks
DIT2-P      154    .62    .55      .42              .74
DIT1-P      154    .62    .48      .40              .76

Note. ED = educational level; ATHRI = Attitudes Toward Human Rights Inventory (Getz, 1985); DIT1 = Defining Issues Test (original version); DIT2 = Defining Issues Test, Version 2.
ᵃ Sample retained after participant reliability checks. ᵇ Correlation with educational level (4 levels). ᶜ Bivariate correlation with ATHRI. ᵈ Partial correlation with ATHRI, controlling for religious fundamentalism (FUNDA) and political identity as conservative (POLCON).
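The z values quoted for the retention comparisons (z = 5.56 in Part 1; z = 3.98 and z = 1.86 in Part 2) are consistent with a pooled two-proportion z test that treats the two retention rates as if they came from independent samples of 200. That the authors used this formula is an inference from the numbers, not something the paper states, and the function name below is ours. The retained counts are taken from the text and from the n column of Table 5.

```python
from math import sqrt


def two_proportion_z(k1, n1, k2, n2):
    """Pooled z statistic for the difference between two proportions.

    k1, k2 are the numbers of retained participants; n1, n2 are the
    numbers tested. Assumes independent samples, which is a
    simplification here because the same 200 people took both tests.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# DIT2 with new checks (192) vs. DIT1 with standard checks (154)
z_part1 = two_proportion_z(192, 200, 154, 200)
# DIT1 with new checks (183) vs. standard checks (154)
z_new_vs_standard = two_proportion_z(183, 200, 154, 200)
# DIT2 with new checks (192) vs. DIT1 with new checks (183)
z_dit2_vs_dit1 = two_proportion_z(192, 200, 183, 200)
```

Rounded to two decimals, the three statistics come out to 5.56, 3.98, and 1.86, which match the values reported in the text.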
Table 6
Multiple Regressions of Moral Judgment, Political Identity, and Religious Fundamentalism (Independent Variables) Predicting Controversial Public Policy Issues (Dependent Variable)

Variable      B        β       t

Equation 1, predicting to ATHRI including DIT1-P, n = 154, multiple R = .56, df = 151
DIT1-P        0.34     .38     5.3***
POLCON       -3.78    -.23    -3.2**
FUNDA        -0.18    -.15    -2.0*

Equation 2, predicting to ATHRI including DIT2-N2, n = 194, multiple R = .58, df = 191
DIT2-N2       0.28     .48     8.0***
POLCON       -3.85    -.23    -3.7**
FUNDA        -0.17    -.14    -2.2*

Note. POLCON and FUNDA are negatively related because high scores are more conservative, in contrast to DIT and ATHRI scores, which run in the opposite direction. ATHRI = Attitudes Towards Human Rights Inventory. DIT1-P = Defining Issues Test (original version), using P index; POLCON = political identity as conservative; FUNDA = religious fundamentalism; DIT2-N2 = Defining Issues Test, Version 2, using N2 index.
*p < .05. **p < .01. ***p < .001.

For completeness of analysis, additional tables of the validity criteria were also computed to separate the effect of indexing from methods of detecting participant reliability (i.e., using the P index with new checks, and using N2 with standard checks). The results were generally intermediate between rows 1 and 4. So nothing of special interest was found here. The general conclusion is that, for the strongest validity trends, the researcher might use either DIT1 or DIT2, but should use both new analyses together. The practical advantages of DIT2 (i.e., it is somewhat shorter, less dated, and likely to retain slightly more participants) are what recommend it over DIT1. We were expecting that the new dilemmas and wording of DIT2 would make a contribution to greater validity (in addition to using N2 and new checks), but we were surprised that DIT1 seems to work about as well (when used in conjunction with N2 and new checks).

Part 3

Recall that DIT2 involves three changes from DIT1: (a) changes in dilemmas and items (discussed earlier); (b) changes in indexing (discussed in detail elsewhere; Rest, Thoma, Narvaez, et al., 1997); and (c) changes in participant reliability checks, which are addressed in this section.

The problems of bogus data. One inevitable problem with a group-administered, multiple-choice test is that participants might put check marks down on the questionnaire without reading the items or following instructions, or they might proceed with a test-taking set that is alien to the instructions. How do researchers determine whether the participants' responses reflect moral thinking (as the moral judgment construct purports) or are bogus? There are four problem responses that give bogus data:

1. Random responding. The participant may fill in the bubbles on an answer sheet, but the marks may not have anything to do with his or her moral cognition. For instance, we have seen answer sheets on which participants filled in the answer bubbles to form Christmas trees and other geometric designs. We doubt that such responses accurately measure the construct of moral judgment.

2. Missing data. The participant may not be sufficiently motivated to take the test and may leave out large sections of answers, or just quit.

3. Alien test-taking sets. The participant may choose items not on the basis of their meaning, but on the basis of complex syntax, special wording, or the seemingly lofty sound of the words. In this case, the scores do not reflect the moral judgment construct but instead reflect a preference for complex style or verbiage.

4. Nondiscrimination of items. The participant may put down the same response for all items, failing to discriminate among the items (e.g., putting down 3s for all ratings and ranks). Rest, Narvaez, Mitchell, and Thoma (1998a) showed that for a very large sample (n > 58,000), participants show considerable variation in rating and ranking DIT items; therefore, some variation is expected.

If a participant is suspected of any one of these four response problems, we know of no way to salvage or correct the protocol. Instead, the entire protocol is discarded from analysis. In general, previous research has shown that purging the protocols of participants who manifest any of these four problems results in clearer data trends (Rest, Thoma, & Edwards, 1997), presumably because error variance has been minimized.

Standard checks. In the standard checks procedure used with DIT1, four checks are used to detect the likelihood of each of the four problems.

1. The problem of random responding. As a guard against random checking, a participant's ratings are checked for consistency with the participant's rankings. For example, if a participant chose Item 10 as the top rank (most important item), then no other item should be rated higher in importance than Item 10. Further, with this example, if a participant chose Item 8 as second most important rank, then only Item 10 should be rated higher. Our general approach is to count each violation of a pattern of rank-rate consistency as an inconsistency. Thus, with regard to the first problem (random responding), rate-rank inconsistencies are assumed to indicate random checking. Theoretically, the perfectly consistent participant will have no rank-rate inconsistencies. In reality, however, we can expect some inconsistency, even among serious, well-motivated participants. Participants sometimes change their minds after being exposed to a variety of issues. So the question becomes, how much inconsistency should researchers tolerate as the innocent shifting of item evaluations, and how much inconsistency is too much, reflecting random responding? Where do we draw the line? In the standard procedure, participants who have more than eight inconsistencies on a dilemma (counting only

the top two ranks) are considered to have too much inconsistency, as are participants who have inconsistencies on more than two dilemmas. Participants exceeding these cutoff points are eliminated from the sample. It turns out, over our 25-year experience, that this rate-rank consistency check (more than the other three participant reliability checks, described later) accounts for the bulk of purged participants.
2. The problem of missing data. Occasional missing data are tolerated in standard checks. For example, if someone omits an occasional rating or ranking, we do not purge the entire protocol. Instead, we readjust scores to make up for the missing data, in effect calculating readjusted scores to reflect the response patterns in the rest of the protocol and adjusting scores so that every participant's data are on the same scale. However, too much missing data may reflect a general lack of motivation to take the task. In this case, we cannot have confidence in any responses. Again, the question is how much is tolerable and how much is grounds for purging the entire protocol? In the standard checks procedure, if a participant leaves out two whole stories (for instance, a participant is asked to complete six stories but only completes four), then that is regarded as too much. The problem in interpreting such a protocol is not that we could not readjust a score based on four stories to be on the same scale as six stories; rather, the problem is that we are suspicious of the motivation of the participant to do the work of the DIT in the four stories (it is possible that even in the four stories, reliable data were not given).
3. The problem of alien test-taking sets. Participants who choose items for their pretentiousness or lofty sound are not following instructions to choose items based on their meanings. As a check on this alien test-taking set, we have distributed five meaningless items (M-items) throughout the DIT that may be attractive for their complex syntax or "high sounding" verbiage, but do not mean anything. If a participant ranks too many of these M-items too highly, we assume an alien test-taking set and purge the whole protocol. In the standard procedure, a score of 8 or more (weighting ranks by 4 for top rank, by 3 for second rank, etc.) on the M-items invalidates the protocol.
4. The problem of nondiscrimination. Participants who do not discriminate answers (e.g., those who give all items the same rating) are not complying with our instructions to make discriminations. Because nondiscriminating participants will not be picked up in the rate-rank check, a special check has been devised for nondiscrimination. In the standard procedure, no more than one story can have more than eight items rated the same.
New checks. The new checks procedure recognizes the same four problems in participant reliability, but deals with them in ways different from the standard checks procedure. To investigate the consequences of different methods and cutoff points, we concocted a set of protocols that deliberately epitomized one or more of the violations we sought to detect. Some of the deliberately bogus data were based on a random number table (to simulate random responding). Other bogus protocols were based on filling in the answer bubbles to form graphic designs (e.g., the Christmas tree design). In general, the objective was to have protocols that we knew were bogus data and to see whether our reliability checks would pick up all these bad protocols, but would pass through a high percentage of actual data. We also wanted to see if the validity trend was still robust with new cutoff scores. We were especially interested in comparing data trends of the new checks with the old standard checks.
1. The problem of random responding. Because in the standard checks, the largest numbers of participants are purged for unreliability based on the rate-rank consistency check, we paid the most attention to this procedure. To detect participants who are randomly checking, this is our new procedure: We look at a participant's ranks, weight the top rank as 4, the second most important as 3, the third as 2, and the fourth as 1 (same weights as in deriving the P score). Then we look at the item's rating. If there is an item different from the one in the first rank that is rated more highly than the item in the top rank, then that is one occurrence of inconsistency and is multiplied by 4. All other inconsistencies with the top rank are also multiplied by 4. Then we look at the item ranked as second most important. There should not be any item rated more highly than the second-ranked item except the item ranked in the top rank. The occurrences of exceptions to this expectation are counted and weighted by 3, and so on for the third- and fourth-ranked items (the violations are counted and weighted by 2, for third rank, or by 1, for fourth rank). The weighted inconsistencies for each story and across stories are summed. The summed weighted rank-rate inconsistencies across five stories can range from 0 to 600. Through trial and error, we arrived at cutoff points. We wanted a stringent enough threshold point to prevent any of the deliberately bogus data from getting through, but not so low a threshold as to make the validity trends suffer. Thus, we arrived at the cutoff point of how much is too much by empirical trial and error. It turns out that if the sum of rate-rank inconsistencies is more than 200, then that is too much, and the protocol is invalidated (purged from the sample). If the sum is under the 200 mark, it is regarded as innocent confusion, and we tolerate that much inconsistency by not purging the entire protocol.
2. The problem of missing data. Occasional missing data are tolerated by DIT2. Using the trial-and-error procedure described earlier, we arrived at cutoff values. If the participant leaves out more than three ratings on any of two stories, the protocol is invalidated. If the participant leaves out more than six ranks, the protocol is also invalidated.
3. The problem of alien test-taking sets. Participants who pick items for style rather than for meaning are not following our instructions. In the new checks procedure, we also use M-items to detect this problem. The protocols of participants whose weighted ranks on the M-items total more than 10 are invalidated (more lax than the cutoff of 8 on standard checks).
4. The problem of nondiscrimination. In new checks, participants who rate 11 items the same on a story are considered as not discriminating; if the participant fails to discriminate on two stories or more, the protocol is invalidated. Nondiscrimination by rates or ranks is grounds for purging the protocol.
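The weighted rank-rate consistency check described under the new checks (item 1) can be sketched in code. What follows is a minimal illustration, not the scoring service's actual implementation. It assumes ratings are coded as in the sample story in the Appendix (1 = great importance through 5 = no importance, so a numerically lower rating counts as "rated more highly"), that tied ratings are not violations, and that every ranked item has a rating (missing data is a separate check). The function and variable names are hypothetical, and the sketch makes no attempt to reproduce the stated 0 to 600 range.

```python
from typing import Dict, List, Sequence, Tuple

# Weights for the first- through fourth-ranked items, as given in the text.
RANK_WEIGHTS = (4, 3, 2, 1)
# New-checks cutoff from the text: a sum above 200 invalidates the protocol.
CUTOFF = 200

Story = Tuple[Dict[int, int], List[int]]  # (ratings, ranks) for one dilemma


def story_inconsistency(ratings: Dict[int, int], ranks: Sequence[int]) -> int:
    """Weighted rank-rate inconsistency for a single story.

    ratings: item number -> rating (assumed coding: 1 = great ... 5 = no importance)
    ranks:   item numbers in ranked order, most important first (up to four)
    """
    total = 0
    for position, (item, weight) in enumerate(zip(ranks, RANK_WEIGHTS)):
        # Items ranked above the current one are allowed to be rated more highly.
        allowed = set(ranks[:position])
        violations = sum(
            1
            for other, rating in ratings.items()
            if other != item
            and other not in allowed
            and rating < ratings[item]  # lower number = rated more highly
        )
        total += weight * violations
    return total


def protocol_inconsistency(stories: Sequence[Story]) -> int:
    """Sum the weighted inconsistencies across all stories in a protocol."""
    return sum(story_inconsistency(ratings, ranks) for ratings, ranks in stories)


def is_purged(stories: Sequence[Story], cutoff: int = CUTOFF) -> bool:
    """True when the summed inconsistency exceeds the cutoff."""
    return protocol_inconsistency(stories) > cutoff


# Hypothetical responses to one 12-item story. Item 7 is rated 1 but is not
# ranked, so it out-rates the second-, third-, and fourth-ranked items.
example_ratings = {1: 3, 2: 4, 3: 2, 4: 5, 5: 2, 6: 3,
                   7: 1, 8: 2, 9: 4, 10: 1, 11: 3, 12: 5}
example_ranks = [10, 8, 3, 5]
print(story_inconsistency(example_ratings, example_ranks))  # -> 6 (0*4 + 1*3 + 1*2 + 1*1)
```

Keeping the cutoff as a parameter mirrors the article's remark that the cutoff values were empirically derived and may need adjusting for other samples.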
As mentioned earlier, the new checks purged 8 participants from the sample of 200 (or 4%), whereas the standard checks purged 46 participants (or 23%) from the sample. In general, the new checks are less stringent than the standard checks. Paradoxically, the data in this study suggest that the less stringent method (new checks) produces stronger trends than does the more stringent method (standard checks). How can this be? One might expect the opposite: that making sure of participant reliability (the more stringent method) would produce stronger validity trends than a more lax method for checking for participant reliability. The key to this paradox lies in the fact that standard checks purges proportionately more of the youngest group of ninth graders (58% of the ninth graders were purged by the standard checks) than of the oldest group (only 8% were purged in the graduate and professional school subsample). In contrast, new checks purged only 11% of the ninth graders (and 1 participant from the graduate school subsample). The difference between 58% and 11% is significant, z = 4.80, p < .0001, n = 47. One might speculate that with standard checks, the disproportionate purging of the youngest participants from the sample in effect changes the distribution of scores in the total sample, making the sample more homogeneous, attenuating the spectrum of scores, and resulting in slightly weaker correlations and validity trends for standard checks. In other words, the new checks have stronger validity trends because they retain more of the lower scores from the youngest participants and thus retain a wider range of scores (by which the correlations increase).
Because the cutoff values for the reliability checks are empirically derived, it remains to be seen whether they are optimum for other samples. The experience of other researchers is the most important consideration here. To facilitate experimentation with different cutoff values for the checks, the scoring service of the Center for the Study of Ethical Development provides a set of variables that can be manipulated for each sample (Rest & Narvaez, 1998; Rest, Narvaez, Mitchell, & Thoma, 1998a).

Part 4

Recall that Criterion 2 of validity in this study deals with the correlation of the DIT with ATHRI. The correlation of moral judgment with political attitudes has been noted for some time (typically rs in the .4 to .6 range; see Rest et al., 1999, for a review of several dozen correlations over 25 years). Emler et al. (1983) interpreted this pattern of correlations, contending that the DIT is really liberalism-conservatism masquerading as developmental capacity. They stated that

    Moral reasoning and political attitude are by and large one and the same thing.... We believe that individual differences in moral reasoning among adults, and in particular those corresponding to the conventional-principled distinction, are interpretable as variations on a dimension of political-moral ideology and not as variations on a cognitive-developmental dimension. (pp. 1073-1075)

In contrast, our view (Narvaez et al., 1999) is that moral judgment, political identity (identifying oneself as a liberal or conservative), and religious fundamentalism are related but also distinct constructs. The variables carry unique information, and they cannot all be reduced to a common factor of liberalism-conservatism. (See Thoma et al., in press, and Rest et al., 1999, for discussion of the Emler et al., 1983, studies.) In support of the uniqueness of each construct or variable (moral judgment, religious fundamentalism, and political identity as liberal or conservative), Narvaez et al. reported a multiple regression having the ATHRI as the dependent variable and having the DIT, FUNDA, and POLCON as independent variables. Multiple regression analysis permits estimation of the unique contribution of each independent variable by examining the standardized beta weights. Narvaez et al. (Study 1) examined two church congregations and found that the beta weights for each of the three independent variables were each significant in their own right, indicating that each contributes distinct information to ATHRI. This finding for church samples was replicated in a student sample (Narvaez et al., 1999, Study 2). Now we wish to determine whether the findings replicate with both DIT1 and DIT2. Because we place so much importance on the DIT's unique contribution to understanding opinions about controversial public policy issues (the macrolevels of morality), we wanted to have more than just the Narvaez et al. studies to confirm our interpretation.
Because we wanted to replicate the Narvaez et al. (1999) study, we used the Brown and Lowe (1951) instrument as the measure of fundamentalism. However, there is a problem in that the Brown and Lowe instrument is a measure of Christian fundamentalism, and the sample from this study could have included Orthodox Jews, Orthodox Muslims, or others who would have a low score on Christian fundamentalism but nevertheless be very orthodox in a non-Christian way. Checking the total sample, it turned out that the overwhelming proportion (90%) indicated they considered themselves to be Christian. Only 20 participants indicated that they were non-Christian. However, leaving these participants out of the analysis made little difference in the relation of FUNDA to DIT, ATHRI, or POLCON. Correlations of FUNDA with DIT2-N2, ATHRI, and POLCON, including the non-Christians, were -.10, -.25, and .28, respectively; excluding the non-Christians, the correlations were -.13, -.26, and .21, respectively. Because including or excluding the non-Christians made little difference where FUNDA was concerned (Criterion 2), we left the non-Christians in the sample for the sake of maximizing the sample size on the other three criteria.
The multiple regressions in Table 6 on this sample replicate, with both DIT1-P and DIT2-N2, the Narvaez et al. (1999) studies: (a) Both studies used the same dependent variables, ATHRI (controversial public policy issues), and independent variables (FUNDA, POLCON, and DIT); (b) each independent variable (DIT, POLCON, and FUNDA) has significantly unique predictability to ATHRI; (c) moral judgment has higher standardized beta weights than does POLCON or FUNDA; (d) when all three independent variables are combined, the combination predicts powerfully

to ATHRI (with DIT1-P, R = .56; with DIT2-N2, multiple R = .58; with the 11-story DIT1+2-N2, R = .63).
A stronger test of Narvaez et al. (1999) is to use the same beta weights (nonstandardized) as those in Study 1 for combining DIT, POLCON, and FUNDA. Using beta weights from the original multiple regression (in Narvaez et al., 1999, Study 1) produces a variable called ORTHO (representing the construct, orthodoxy-progressivism, as discussed by Hunter, 1991). ORTHO provides a stronger replication of the Narvaez et al. (1999) study than a new multiple regression on this new sample because in combining the three variables from the beta weights of the original sample, we are not capitalizing on sample-specific chance factors (as do the multiple regressions in Table 6). ORTHO is, in effect, a transfer of the original relations of the independent variables from Narvaez et al. (1999, Study 1) to the present study in predicting ATHRI. Table 7 shows that the correlation of ATHRI with ORTHO is only two or three points weaker than the Rs run in Table 6 on the specific new samples of the present study. In other words, the beta weights derived from the multiple regression for ORTHO from Study 1 in Narvaez et al. (1999) generalize well to the present study.

Table 7
Predictability to ATHRI From Multiple Regression Beta Weights From Present Study With Multiple Regression Weights From Narvaez et al. (1999), Study 1

Measure            ORTHO^a (Study 1)    Multiple R^b (present study with DIT1-P)
ATHRI (n = 154)    -.54                 .56

Measure            ORTHO^c (Study 1)    Multiple R^d (present study with DIT2-N2)
ATHRI (n = 192)    -.55                 .58

Note. DIT, POLCON, and FUNDA are components that go into ORTHO and multiple R (from Table 5). All correlations are significant, p < .001. DIT1-P = Defining Issues Test (original version), using P index; DIT2-N2 = Defining Issues Test, Version 2, using N2 index; ATHRI = Attitudes Toward Human Rights Inventory (Getz, 1985).
^a Orthodoxy combination variable formed by combining DIT1-P, POLCON, and FUNDA according to weights of multiple regression in Study 1 of Narvaez et al. (1999). ^b Multiple regression from Table 5, Equation 1. ^c Same combination of variables based on weights of Study 1, but with DIT2-N2 for moral judgment component. ^d Multiple regression from Table 5, Equation 2.

One might be concerned with the problem of multicollinearity on the multiple regression results. As Howell (1982, pp. 500ff) noted, a problem can exist in interpretation of multiple regression results when the independent variables are intercorrelated; the problem is that the beta weights are unstable from sample to sample.
In the present sample, the independent variables are significantly intercorrelated; however, the extent of the correlation rises to only .28, far short of the problem caused when the correlations among independent variables approach +1.00 or -1.00. Furthermore, Howell (1987) suggested that the relative importance of each independent variable is indicated by the t statistic (indicated in Table 6 and all the multiple regression tables). It can be seen in our tables that relative importance is the same relative order as that of beta weights. Hence, what is said about the primary importance of DIT scores as one of the independent variables still stands in view of the t-test results.
Hence, one of the findings of the present study is the replication that the DIT is of first importance among independent variables (has higher beta weights and higher t-test scores). In all four replications (two in Narvaez et al., 1999, and both DIT1 and DIT2 results in the present study), this was the stable result; therefore, there does not seem to be a multicollinearity problem in the stability of these results regarding beta weights.

Table 8
Differences Between Students in Present Sample and Students in Narvaez et al. (1999), Sample 2

             Present sample       Narvaez et al. (1999),
             (n = 154)            Study 2 (n = 62)        Difference
Variable     M         SD         M         SD            (t test, df = 214)
DIT1-P       37.86     17.19      48.58     15.13         4.53****
POLCON       3.16      0.92       2.85      0.94          2.21*
FUNDA        58.13     12.81      55.48     14.78         1.24
ATHRI        145.93    15.18      159.16    17.36         5.28****

Note. DIT1-P = Defining Issues Test (original version), using P index; POLCON = political identity as conservative; FUNDA = religious fundamentalism; ATHRI = Attitudes Toward Human Rights Inventory (Getz, 1985).
*p < .05. ****p < .0001.

It is true that the Rs in Narvaez et al. (1999) were generally in the range of .7 to .8. In the present study, Rs were in the .5 to .6 range. The difference in R may be due to
the peculiarities of the samples. As shown in Table 8, the student sample in the present study is generally more conservative, that is, lower in moral judgment, lower on advocacy for human rights, and more politically conservative than the student sample of Narvaez et al. (1999, Study 2). Future research may clarify whether views on public policy issues (ATHRI) are better predicted in more liberal groups (Narvaez et al., 1999, Study 2) than in more conservative groups (present study).

Conclusions

The four parts of this article indicate the following conclusions:
1. After 25 years of research using DIT1-P, there may now be a better DIT that is shorter, more updated, purges fewer participants, and has significantly better validity characteristics.
2. To the extent that DIT2 shows an improvement in validity trends over DIT1 in this study, the increase in validity seems to be attributable to the new ways of analyzing data (in indexing and in checking participant reliability) and not to the new dilemmas or new wording. We were expecting significant gains in validity for new dilemmas and wording (DIT1 vs. DIT2), for N2 versus P, and for new checks versus standard checks. Instead, we found significant improvements for the new analyses (N2 and new checks) but not for DIT2 over DIT1. Still, the practical advantages of DIT2 (i.e., it is shorter and updated and thus purges slightly fewer participants) recommend experimentation.
3. The reason that new checks show stronger trends on the validity criteria seems to be that they retain a wider range of scores, resulting in a fuller distribution of scores.
4. The present study supports the particular interpretation of Narvaez et al. (1999) regarding the combination and interaction of moral judgment with cultural ideology in the formation of opinions on public policy issues. DIT2 seems to operate in a way similar to DIT1 when used to predict attitudes toward public policy issues. More generally, this supports our view that Kohlbergian theories of morality are more useful in describing macromorality than micromorality.
Despite the long tradition in using the same dilemmas in Kohlbergian (1976, 1984) research (e.g., Heinz and the drug), this study suggests that there is nothing exceptional or magical about the DIT1's dilemmas and items, or about the classic Kohlberg dilemmas. It is possible to update, shorten, and revise the DIT without sacrificing validity. This should be encouraging for experimentation with new dilemmas and items. For instance, profession-specific dilemmas may be devised for a profession (e.g., for dentists, accountants, or teachers) in the hope of accounting better for profession-specific behavior.
This study reconfirms several basic findings about the moral judgment construct. First, the developmental age and education trends are reconfirmed with DIT2 (i.e., moral judgment scores increase as age and education increase). Second, moral judgment scores are highly related to views on controversial public policy issues, as assessed by the ATHRI. Further, in multiple regression, moral judgment along with political identity and religious fundamentalism predict the ATHRI scores in combination more strongly than each independent variable alone, but each does not reduce to the other. This is consistent with the view expressed in Narvaez et al. (1999) about the relation of moral judgment to cultural ideology. Third, the new index, N2, as reported in Rest, Thoma, Narvaez, et al. (1997) shows advantages over the traditional ways of performing these calculations. Although there seems to be some gain in the power of trends using these new forms of analysis (N2 and new checks), the computations have become so labor intensive that hand scoring is no longer an option with N2 or new checks. To these replications, we add that DIT1 is highly correlated with DIT2 (r = .79) and that the 11 stories of DIT1 plus DIT2 show a very high degree of internal consistency (Cronbach's alpha = .90).
What are the practical implications of the present study? The findings encourage researchers to substitute DIT2 for DIT1. However, because this is only the first study with DIT2, with 200 participants, and because hundreds of studies have used the DIT1, involving about half a million participants, the older version must be regarded as the more established entity. Researchers for new projects must decide whether an updated, shorter, and slightly more powerful DIT2 with a short track record is preferable to the dated, longer, but better established DIT1. In any case, whether using DIT1 or DIT2, the new analyses (with N2 and new checks) should be employed. (Users of DIT1 can send previously scored data for rescoring, free of charge, to the Center for the Study of Ethical Development.)
The most meaningful verdict on DIT2 must come from independent researchers beyond the site of development (Center for the Study of Ethical Development). The generalizability of DIT2, N2, and new checks must come from other researchers who may or may not find these innovations useful.

References

Bargh, J. (1989). Conditional automaticity: Varieties of automatic influence in social perception and cognition. In J. Uleman & J. Bargh (Eds.), Unintended thought (pp. 3-51). New York: Guilford Press.
Brown, D. G., & Lowe, W. L. (1951). Religious beliefs and personality characteristics of college students. Journal of Social Psychology, 33, 103-129.
Colby, A., Kohlberg, L., Speicher, B., Hewer, A., Candee, D., Gibbs, J., & Power, C. (1987). The measurement of moral judgment (Vols. 1-2). New York: Cambridge University Press.
Emler, N., Resnick, S., & Malone, B. (1983). The relationship between moral reasoning and political orientation. Journal of Personality and Social Psychology, 45, 1073-1080.
Ericsson, K. A., & Smith, J. (1991). Toward a general theory of expertise. New York: Cambridge University Press.
Getz, I. (1984). The relation of moral reasoning and religion: A review of the literature. Counseling and Values, 28, 94-116.
Getz, I. (1985). The relation of moral and religious ideology to human rights. Unpublished doctoral dissertation, University of Minnesota.
Gilligan, C. (1982). In a different voice. Cambridge, MA: Harvard University Press.
Guilford, J. P. (1954). Psychometric methods. New York: McGraw-Hill.

Holyoak, K. J. (1994). Symbolic connectionism: Toward third-generation theories of expertise. In K. A. Ericsson & J. Smith (Eds.), Toward a general theory of expertise (pp. 301-336). New York: Cambridge University Press.
Howell, D. C. (1987). Statistical methods for psychology (2nd ed.). Boston: Duxbury.
Hunter, J. D. (1991). Culture wars: The struggle to define America. New York: Basic Books.
Juergensmeyer, M. (1993). The new cold war? Berkeley, CA: University of California Press.
Killen, M., & Hart, D. (Eds.) (1995). Morality in everyday life. New York: Cambridge University Press.
Kohlberg, L. (1976). Moral stages and moralization: The cognitive developmental approach. In T. Lickona (Ed.), Moral development and behavior (pp. 31-53). New York: Holt, Rinehart, & Winston.
Kohlberg, L. (1984). Essays on moral development: The nature and validity of moral stages, Vol. 2. San Francisco: Harper & Row.
Kohlberg, L., Boyd, D. R., & Levine, C. (1990). The return of Stage 6: Its principle and moral point of view. In T. Wren (Ed.), The moral domain: Essays in the ongoing discussion between philosophy and the social sciences (pp. 1151-1181). Cambridge, MA: The MIT Press.
Lewicki, P. (1986). Non-conscious social information processing. New York: Academic Press.
Marty, M. E., & Appleby, R. S. (1991). Fundamentalism observed. Chicago: University of Chicago Press.
Marty, M. E., & Appleby, R. S. (1993). Fundamentalism and the state. Chicago: University of Chicago Press.
McClosky, H., & Brill, A. (1983). Dimensions of tolerance: What Americans believe about civil liberties. New York: Sage.
Narvaez, D. (1998). The influence of moral schemas on the reconstruction of moral narratives in eighth graders and college students. Journal of Educational Psychology, 90, 13-24.
Narvaez, D., Getz, I., Thoma, S. J., & Rest, J. (1999). Individual moral judgment and cultural ideologies. Developmental Psychology, 35, 478-488.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.
Pascarella, E. T., & Terenzini, P. (1991). How college affects students: Findings and insights from twenty years of research. San Francisco: Jossey-Bass.
Rawls, J. A. (1971). A theory of justice. Cambridge, MA: Harvard University Press.
Rest, J. (1979). Development in judging moral issues. Minneapolis: University of Minnesota Press.
Rest, J. (1986). Moral development: Advances in research and theory. New York: Praeger.
Rest, J., Cooper, D., Coder, R., Masanz, J., & Anderson, D. (1974). Judging the important issues in moral dilemmas: An objective test of development. Developmental Psychology, 10, 491-501.
Rest, J., & Narvaez, D. (Eds.). (1994). Moral development in the professions: Psychology and applied ethics. Hillsdale, NJ: Erlbaum.
Rest, J., & Narvaez, D. (1998). Guide for DIT-2. Unpublished manuscript. (Available from Center for Study of Ethical Development, University of Minnesota, 206 Burton Hall, 178 Pillsbury Dr., Minneapolis, MN 55455.)
Rest, J., Narvaez, D., Bebeau, M. J., & Thoma, S. J. (1999). Postconventional moral thinking: A neo-Kohlbergian approach. Mahwah, NJ: Erlbaum.
Rest, J., Narvaez, D., Mitchell, C., & Thoma, S. J. (1998a). Exploring moral judgment: A technical manual for the Defining Issues Test. Unpublished manuscript. (Available from the Center for the Study of Ethical Development, University of Minnesota, 206 Burton Hall, 178 Pillsbury Dr., Minneapolis, MN 55455.)
Rest, J., Narvaez, D., Mitchell, C., & Thoma, S. J. (1998b). How test length affects validity and reliability of the Defining Issues Test. Manuscript submitted for publication.
Rest, J., Thoma, S. J., & Edwards, L. (1997). Designing and validating a measure of moral judgment: Stage preference and stage consistency approaches. Journal of Educational Psychology, 89, 5-28.
Rest, J., Thoma, S. J., Narvaez, D., & Bebeau, M. J. (1997). Alchemy and beyond: Indexing the Defining Issues Test. Journal of Educational Psychology, 89, 498-507.
Schacter, D. L. (1996). Searching for memory. New York: Basic Books.
Snarey, J. (1985). The cross-cultural universality of social-moral development. Psychological Bulletin, 97, 202-232.
Taylor, S. E., & Crocker, J. (1981). Schematic bases of social information processing. In E. T. Higgins, C. P. Herman, & M. P. Zanna (Eds.), Social cognition: The Ontario Symposium (Vol. 1, pp. 89-134). Hillsdale, NJ: Erlbaum.
Thoma, S., Narvaez, D., Rest, J., & Derryberry, P. (in press). The distinctiveness of moral judgment. Educational Psychology Review.
Tulving, E., Schacter, D. L., & Stark, H. A. (1982). Priming effects in word-fragment completion are independent of recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 336-342.
Uleman, J. S., & Bargh, J. A. (1989). Unintended thought. New York: Guilford Press.
Youniss, J., & Yates, M. (in press). Youth service and moral identity: A case for everyday morality. Educational Psychology Review.

Appendix

Sample Story From DIT2: The Famine

The small village in northern India has experienced shortages of food before, but this year's famine is worse than ever. Some families are even trying to feed themselves by making soup from tree bark. Mustaq Singh's family is near starvation. He has heard that a rich man in his village has supplies of food stored away and is hoarding food while its price goes higher so that he can sell the food later at a huge profit. Mustaq is desperate and thinks about stealing some food from the rich man's warehouse. The small amount of food that he needs for his family probably wouldn't even be missed.

What should Mustaq Singh do? Do you favor the action of taking the food? (Check one)

1 [ ] Strongly favor   2 [ ] Favor   3 [ ] Slightly favor   4 [ ] Neutral   5 [ ] Slightly disfavor   6 [ ] Disfavor   7 [ ] Strongly disfavor

Rate the following issues in terms of importance (1 = great, 2 = much, 3 = some, 4 = little, 5 = no). Please put a number from 1 to 5 alongside every item.

1. [ ] Is Mustaq Singh courageous enough to risk getting caught for stealing?
2. [ ] Isn't it only natural for a loving father to care so much for his family that he would steal?
3. [ ] Shouldn't the community's laws be upheld?
4. [ ] Does Mustaq Singh know a good recipe for preparing soup from tree bark?
5. [ ] Does the rich man have any legal right to store food when other people are starving?
6. [ ] Is the motive of Mustaq Singh to steal for himself or to steal for his family?
7. [ ] What values are going to be the basis for social cooperation?
8. [ ] Is the epitome of eating reconcilable with the culpability of stealing?
9. [ ] Does the rich man deserve to be robbed for being so greedy?
10. [ ] Isn't private property an institution to enable the rich to exploit the poor?
11. [ ] Would stealing bring about more total good for everybody concerned or not?
12. [ ] Are laws getting in the way of the most basic claim of any member of a society?

Which of these 12 issues is the 1st most important? (write in the number of the item) [ ]
Which of these 12 issues is the 2nd most important? [ ]
Which of these 12 issues is the 3rd most important? [ ]
Which of these 12 issues is the 4th most important? [ ]

Note. An information package can be obtained from the Center for the Study of Ethical Development, University of Minnesota, 206 Burton Hall, 178 Pillsbury Drive Southeast, Minneapolis, Minnesota 55455. Electronic mail may be sent to [email protected], or call (612) 624-0876.

Received November 10, 1998
Revision received February 16, 1999
Accepted February 16, 1999
