
Evaluation Review
http://erx.sagepub.com/

Evaluating With Sense: The Theory-Driven Approach

Huey-Tsyh Chen and Peter H. Rossi
Evaluation Review 1983, 7(3): 283
DOI: 10.1177/0193841X8300700301

The online version of this article can be found at:
http://erx.sagepub.com/content/7/3/283

Published by SAGE Publications
http://www.sagepublications.com

Arguing for more serious theorizing in connection with evaluation, this article shows that
although the randomized controlled experiment conceptualized as a "black box"
approach has dominated the discussions of impact assessment since the classic statements
of Campbell and Stanley (1966), the use of theoretical models in connection with impact
assessment can both heighten the power of experimental designs and compensate for some
of the major deficiencies of quasi-experimental designs. The article also emphasizes the
importance of theoretical models of implementation processes, arguing that this process
often constitutes the major obstacle to the full realization of effective programs.

EVALUATING WITH SENSE


The Theory-Driven Approach
HUEY-TSYH CHEN
Johns Hopkins University
PETER H. ROSSI
University of Massachusetts at Amherst

For more than two decades discussions about the appropriate methodology for estimating the net effects of social programs
have been dominated by the paradigm of the randomized controlled
experiment. For some evaluation commentators (e.g., Suchman, 1969;
Campbell and Stanley, 1966; Cook and Campbell, 1979) alternative
designs for impact assessment are valued to the extent that such designs
mimic the validity advantages of randomized experiments. For others
(e.g., Scriven, 1972; Guba and Lincoln, 1981; Deutscher, 1977) the
paradigm is used as an example of what not to do in assessing the effects

AUTHORS’ NOTE: This article is a revised version of a paper presented at the 1982
meeting of the Evaluation Research Society. Preparation of this article was aided by a
grant from the National Science Foundation (Grant SES-8121745), of which P. H. Rossi
is the Principal Investigator.


of programs, arguments that often stress the artificiality of standardized treatments and accompanying data collection strategies, especially for
labor-intensive human services programs.
The domination of the experimental paradigm in the program eval-
uation literature has unfortunately drawn attention away from a more
important task in gaining understanding of social programs, namely,
developing theoretical models of social interventions. A very seductive
and attractive feature of controlled experiments is that it is not necessary
to understand how a social program works in order to estimate its net
effects through randomized experiments, provided that the goals and
objectives of a program can be specified in reasonably measurable
terms. Thus cookbook evaluation manuals (e.g., Morris et al., 1978) can
outline how to proceed in an evaluation with scarcely a mention of any
theory underlying the programs in question. Or evaluability assessments
(Wholey et al., 1975) can concentrate on whether or not goals are
sufficiently defined to permit the application of experimental or quasi-
experimental evaluation methods. Even the critics of the experimental
paradigm have had their attention distracted from seriously considering
the theoretical issues in social programs by concentrating on the false
issue of artificial data collection methods and the old social science
finding that the actual goals of human organizations are often not their
professed goals.
An unfortunate consequence of this lack of attention to theory is that
the outcomes of evaluation research often provide narrow and sometimes
distorted understandings of programs. It is not usually clear whether the
recorded failures of programs are due to the fact that the programs were
built on poor conceptual foundations, usually preposterous sets of
"causal" mechanisms (e.g., the Impact Cities program); or because
treatments were set at such low dosage levels that they could not
conceivably affect any outcomes (e.g., Title I); or because programs
were poorly implemented. Note that the emphasis in the above
statements is on deficiencies in the theoretical underpinnings of the
treatment or of the treatment delivery systems.
The purpose of this article is to bring theory back into program
evaluation. Our aim is not to make a case for basic research-there is
enough justification for that goal-but to make a case that neglect of
existing theoretical knowledge and of thinking theoretically has retarded
both our understanding of social programs and the efficient employment
of evaluation designs in impact assessment. This perspective on


evaluation research we have called elsewhere (Chen and Rossi, 1980) the
"theory-driven" approach to evaluation, a perspective, we believe, that
has promise to yield better information on social programs, as well as
rich yields to the basic social science disciplines.
Of course the kind of theory we have in mind is not the global
conceptual schemes of the grand theorists, but much more prosaic
theories that are concerned with how human organizations work and
how social problems are generated. It advances evaluation practice very
little to adopt one or another of current global theories in attacking, say,
the problem of juvenile delinquency, but it does help a great deal to
understand the authority structure in schools and the mechanisms of
peer group influence and parental discipline in designing and evaluating
a program that is supposed to reduce disciplinary problems in schools.
Nor are we advocating an approach that rests exclusively on proven
theoretical schema that have received wide acclaim in published social
science literatures. What we are strongly advocating is the necessity for
theorizing, for constructing plausible and defensible models of how
programs can be expected to work before evaluating them. Indeed the
theory-driven perspective is closer to what econometricians call "model specification" than are more complicated and more abstract and general
theories.
Nor do we argue for uncritically using the theories that may underlie
policymakers’ and program designers’ views of how programs should
work. Often enough policymakers and program designers are not social
scientists and their theories (if any) are likely to be simply the current
folklore of the upper-middle-brow media. The primary criterion for
identifying theory in the sense used in this article is consistency with
social science knowledge and theory. Indeed theoretical structures
constructed out of social science concerns may directly contradict what
may be the working assumptions of policymakers and program
designers.
It is an acknowledged embarrassment to our viewpoint that social
science theory is not well enough developed that appropriate theoretical
frameworks and schema are ordinarily easily available "off the shelf."
But the absence of fully developed theory should not prevent one from
using the best of what is already at hand. Most important of all, it is
necessary to think theoretically, that is, to rise above the specific and the
particular to develop general understandings of social phenomena.


A GENERALIZED MODEL
FOR PROGRAM EVALUATION

A useful general scheme for identifying the main components of any social program is shown in Figure 1. The main causal relationships that would have to be worked out in any useful model of a program are shown in that diagram as arrows connecting the main components.
Central to the diagram are "delivered treatment variables," which constitute the program to be evaluated insofar and inasmuch as it has been delivered. Note that treatment is conceived not as designed but as
actually delivered, the delivery being affected by an implementation
system that includes organizations, personnel, facilities, clients, and
regulations concerning eligibility.
At the far right of the diagram are "outcome variables," those aspects, intended and unintended, that are affected either directly by
treatment variables and/or mediated through a set of intervening
variables, represented by the box so labeled. Whether or not intervening
processes are present is a matter of conceptualization. Thus the
provision of transfer payments as a treatment directly affects the
incomes of clients, but the introduction of a new payment incentive
system into a work organization will only affect incomes if the
appropriate intervening processes postulated come into play.
The classic statement of the problem of inferring treatment effects
centers around the contents of the box labeled "exogenous variables."
The outcomes of social programs are rarely, if ever, attributable solely
to treatments and intervening processes but are also determined by
other sources: those exogenous variables correlated with the treatment
variables, and stochastic disturbances that are independent of the treatment and exogenous variables. The exogenous processes may or may
not be correlated with treatment, but they often are. Confounded
estimates of the treatment effects usually result from the failure to
control for correlated exogenous variables, which leads to a correlation
between the disturbance and the treatment. Indeed, one of the great
virtues of the classic Campbell and Stanley (1966) statement is the
identification of some very general ways in which exogenous processes
may be correlated with treatment variables, thereby confounding
estimates of effects except under special circumstances. Indeed it is the
special (but not unique) quality of the classical randomized experiment
that its use can make exogenous variables uncorrelated with treatment
variables.1
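
The logic can be made concrete with a small simulation. The sketch below (in Python, with an invented linear structural model and a true treatment effect fixed at 2.0) contrasts an observational setting, in which treatment uptake is correlated with an exogenous variable, with a randomized setting, in which that correlation is forced toward zero:

import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Exogenous variable (e.g., a client characteristic); the names and
# effect sizes here are illustrative assumptions, not from the article.
x = rng.normal(size=n)

# Observational setting: uptake of the treatment depends on x.
t_obs = (x + rng.normal(size=n) > 0).astype(float)
y_obs = 2.0 * t_obs + 3.0 * x + rng.normal(size=n)

# Randomized setting: assignment is independent of x by construction.
t_rnd = rng.integers(0, 2, size=n).astype(float)
y_rnd = 2.0 * t_rnd + 3.0 * x + rng.normal(size=n)

naive = y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean()
randomized = y_rnd[t_rnd == 1].mean() - y_rnd[t_rnd == 0].mean()

print(f"corr(treatment, x), observational: {np.corrcoef(t_obs, x)[0, 1]:.2f}")
print(f"corr(treatment, x), randomized:    {np.corrcoef(t_rnd, x)[0, 1]:.2f}")
print(f"naive estimate (confounded):       {naive:.2f}")       # well above 2.0
print(f"randomized estimate:               {randomized:.2f}")  # close to 2.0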

[Figure 1: A schematic representation of the program evaluation model, connecting exogenous variables, the implementation system, delivered treatment variables, intervening variables, and outcome variables with causal arrows.]

OUTCOME VARIABLES:
SPECIFYING THE GOALS
OF PROGRAMS

The problem of specifying outcome variables is usually phrased as goal specification: the determination of those outcomes that policy-
makers and/or program designers envisaged as the intended outcomes
of the program. This issue constitutes one of the important distinctions
between basic and applied social research. In basic research, outcome
variables express the disciplinary interests of the researcher; in applied
social research, outcome variables are those of interest to policymakers
or other sponsors of applied work.
As traditionally viewed, goal specification in evaluation research
tends to be a search for appropriate operational definitions of the
intended effects of programs. Since such definitions are sometimes
cloaked in obscure and ambiguous statements, goal specification can be
a separate empirical research enterprise of its own, as Wholey's (1975) evaluability assessments exemplify. Some of the consequences of searching for those intentions that are measurable have been noted by
many commentators (e.g., Deutscher, 1977; Scriven, 1972; Chen and
Rossi, 1980). First of all, outcome variables tend to be narrower than the
connotative intentions of program designers and/or policymakers.
Thus the goals of Head Start were defined by the evaluators (Cicirelli et
al., 1969) in cognitive terms mainly because such goals could be more
easily operationalized than others that were more vaguely formulated.
Second, there is a large gap between enabling legislation and the actual
design of programs. Designers may narrow the goals of legislators,
elaborate upon them, or substitute entirely different goals. Sometimes
the enabling legislation deliberately fosters diversity in the processes of
implementation, as in Head Start. In other cases local conditions may
appear to require extensive adaptation, as, for example, in the Planned
Variation Education Program. Some of the critics of the experimental
paradigm were among the first to note these program changes in
execution (e.g., Scriven, 1972; Deutscher, 1977; Chen and Rossi, 1980), recommending that the goals also be inferred from actual program
operations, rather than solely from policymaker statements or legislative
intents. Indeed in the case of some programs, program operators are
encouraged to develop specific goals as the program is designed (e.g.,
High Impact and Head Start); hence the goals of programs cannot be
described in any specific sense through a consideration of legislative
intent or policymaker statements alone.
Most of the commentators who have advocated specifying goals
through empirical observation of programs in operation have been


strangely silent on how goals should be inferred through observation. Some come close to advocating connoisseurial approaches, i.e., "Anyone with any experience and smarts will obviously see that ..."
We believe that there is some wisdom to the admonition that goal
specification be empirically based. However, the process whereby it is
possible to go from observations to goal specification is through social
science theory and knowledge, not by the craftlore of experts and consultants, a point to which we will return later in this article. Nor are we advocating ignoring the goals of policymakers in
filling out the content of the goals specified in an evaluation. Indeed it is
useful to conceptualize goals as falling into one or the other of the
following three classes.

Policy-Directed, Plausible Goals


These are goals explicitly formulated by those who designed and/or
authorized the program and that are plausible in the following senses:
(1) The goals are consonant with current knowledge of the problem to
which the program is directed; (2) the program is designed so that it can
be implemented without heroic efforts; (3) the resources allocated to the
program are sufficient to deliver the treatment at reasonable dosage
levels.
For example, the policy goal of the current 55-mile-per-hour speed limit was to lower the consumption of gasoline by motor vehicles. The
goal was plausible in the sense that prior knowledge of the gas
consumption characteristics of typical gas engines indicated that
consumption would be so lowered. The mechanism of implementation
involved tying the passage of appropriate state laws to continuing
federal highway fund allocations-a reasonable strategy.

Policy-Directed, Implausible Goals

These are goals specified explicitly by those who designed or authorized a program and which are not plausible in the following senses: (1) The
goal is so vague that a relatively large number of specific operationaliza-
tions, some of which are mutually contradictory, are possible; (2) the
goals are not consonant with current knowledge about the problem in
question; (3) the program is not designed so that it can be implemented
by the agency given responsibility; (4) the resources allotted to the
program are not sufficient to deliver the treatment at reasonable dosage
levels.


For example, although the major goals of the 1968 Federal Firearms
Regulation Act were specific enough (i.e., to restrict access to firearms
on the part of felons, the insane, and certain other categories of persons),
the mode of implementation, namely, requiring the registration of gun
dealers and forbidding them to sell to the proscribed social categories,
was bound to fail since it was based on the assumptions that gun-using
criminals obtained their weapons through gun dealers, that gun dealers
could discern which of their potential customers fit into the proscribed
categories, and that gun registration records could be easily accessed to
trace gun ownership. None of these assumptions was tenable and some
went quite contrary to existing established knowledge (Wright et al.,
1983).

Theory-Derived Goals,
Not Specified by Policy

These are goals that are plausible but not specified in policy directives, and which can be discerned either through the a priori examination of policy and program or through the empirical study of the program (Chen and Rossi, 1980).
For example, in the TARP experiments (Rossi et al., 1980), official
goals included the reduction of recidivism among released felons through extending to this group eligibility for unemployment benefit payments. Consideration of the potential work disincentive effects
of such payments, as strongly suggested in the writing of labor
economists, led the investigators to postulate work disincentive effects.
Except for political reasons, it obviously makes little sense to evaluate a program whose goals were only policy-directed and implausible; but there is no way to decide whether a program's goals fall
into such a category without careful consideration of whether or not
existing social science theory and knowledge would support such a
judgment of plausibility. The main point of making such distinctions
among goals, however, is to highlight the fact that programs may be
accomplishing some things that were not intended by their designers and
that such effects may be either desirable or undesirable, may sometimes
(as in the case of the TARP experiments) produce effects that offset
those intended, and that a good evaluation should take into account
inferred effects as well as those directly intended.
As indicated above, the judgment whether or not a set of policy-
directed goals is plausible depends on examining the total program in


light of existing social theory and knowledge. We turn to that task in the
next two sections of this article.

SPECIFYING THE
TREATMENT MODEL

The treatment model consists of the treatment-as-delivered characteristics, other related exogenous factors, intervening processes, and
outcome variables along with the postulated relationships among these
component parts. An appropriate treatment model lays out in detail
how delivered treatments work (see Figure 1).
As shown in Figure 1 the treatment variables2 are likely correlated
with exogenous variables, which may also independently affect inter-
vening processes and/or outcome variables. This is simply a formal
statement of the truism that most social phenomena are the outcomes of
many interrelated processes. For example, whether or not a person quits
smoking may depend not only on a particular antismoking program to
which he or she may be exposed, but also on factors such as the
participant’s personal health problems believed to be caused by
smoking, whether family members and friends are smokers, and past
smoking history, any or all of which may be correlated with participation
in the program. Hence if we are modeling the outcome of an
antismoking program, we have to take such exogenous factors "into account" in estimating the effects of that antismoking program upon
participants. From a technical viewpoint, unbiased estimates of a
program’s effects can only be obtained when treatment variables are
adequately purged of correlated exogenous variables.
It is the real and present danger that treatment variables are correlated with exogenous variables affecting intervening processes and outcome variables that makes the randomized controlled experiment so attractive. By randomly assigning target units of a program
(usually persons) to experimental and control groups, the naturally
existing correlations between treatment variables and exogenous vari-
ables are forced to be essentially zero. If the only concern in a program
evaluation is to obtain unbiased treatment effects, then a randomized
controlled experiment that maintains its integrity need not be designed
with any knowledge of the relationships among exogenous, treatment,
intervening, and outcome variables. Hence randomized experiments
can be designed as "black box" researches in which how treatments affect outcomes is unknown.


But black box randomized experiments are not the only realization of
the experimental paradigm and, indeed, may often be an inefficient
form of that paradigm. This arises because advocates of the black box
experimental paradigm often neglect the fact that after randomization
exogenous variables are still correlated with outcome variables. Knowing
how such exogenous factors affect outcomes makes it possible to
construct more precise estimates of experimental effects by controlling
for such exogenous variables. For example, an experiment on the
recidivism of released prisoners can estimate treatment effects with
smaller standard errors by taking into account the fact that age,
education, and previous work experiences of the released prisoners
ordinarily affect tendencies to recidivate. For a given N, a randomized
experiment that takes into account existing theory and knowledge can have considerably more power than a black box randomized experiment.3
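
A sketch of this precision argument, again with simulated data: the covariates (age and education, echoing the recidivism example) and all coefficients below are invented, and the same randomized data set is analyzed twice, once as a black box and once adjusting for the covariates:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2_000
age = rng.normal(30, 8, size=n)
educ = rng.normal(11, 2, size=n)
t = rng.integers(0, 2, size=n).astype(float)

# Outcome depends on the treatment and on the covariates.
y = -0.5 * t - 0.1 * age - 0.3 * educ + rng.normal(size=n)

black_box = sm.OLS(y, sm.add_constant(t)).fit()
adjusted = sm.OLS(y, sm.add_constant(np.column_stack([t, age, educ]))).fit()

# Both estimates are unbiased; the adjusted one is more precise.
print(f"black box estimate: {black_box.params[1]:.3f} (SE {black_box.bse[1]:.3f})")
print(f"adjusted estimate:  {adjusted.params[1]:.3f} (SE {adjusted.bse[1]:.3f})")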
The black box paradigm also dominates classical discussions of nonexperimental approaches (Campbell and Stanley, 1966). Such discussions center around the inherent dangers of black box quasi-experimental approaches. This may be appropriate if one is
estimating the effectiveness of a program for which there is no
underlying sensible rationale, but it is not sensible to ignore existing
knowledge when its use can increase the power of the research design.
Indeed, at best, it may be possible to obtain unbiased estimates of
effects from quasi-experimental approaches if one can model with some
degree of accuracy the relationships among all the elements of the
treatment model. For example, an evaluation of an unemployment
insurance program in California (Rauma and Berk, 1982) was able to
control for the exogenous factors that determined the size of a released
prisoner’s benefit eligibility, because such benefits were completely
determined by the number of days worked while in prison.4 By holding
constant the number of days worked while in prison, it was possible to
hold constant the exogenous factors that determined receiving the
treatment and hence to construct unbiased estimates of the effects of the
treatment on subsequent recidivism.
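
A stylized sketch of that logic follows, under the simplifying assumption that eligibility turned on a single days-worked threshold (the actual benefit schedule was more complicated, and all numbers here are invented). Because days worked both determines the treatment and proxies traits related to recidivism, a naive comparison is distorted, while holding days worked constant recovers the true effect:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5_000
days = rng.uniform(0, 500, size=n)       # days worked in prison

# Assumed eligibility rule: benefits only above a work threshold.
treated = (days >= 250).astype(float)

# Days worked also proxies unobserved traits affecting recidivism;
# the true benefit effect is set to -0.15.
recid = 0.8 - 0.001 * days - 0.15 * treated + rng.normal(scale=0.3, size=n)

naive = recid[treated == 1].mean() - recid[treated == 0].mean()
fit = sm.OLS(recid, sm.add_constant(np.column_stack([treated, days]))).fit()

print(f"naive difference:  {naive:.3f}")          # confounded, about -0.40
print(f"adjusted estimate: {fit.params[1]:.3f}")  # close to -0.15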
The general issue of controlling for self-selection bias has been
discussed thoroughly in recent literature (Barnow et al., 1980) and more
recently by Berk and Ray (1982). How successfully such approaches can
be applied in particular cases is determined by how well known are the
exogenous processes and how well they can be measured. Furthermore,
it is somewhat obvious, but bears emphasis, that knowledge and theory


concerning the effects of exogenous processes need to be built into evaluations ab initio and not constructed ad hoc from the selections available in a given data set.
The dangers of black box quasi-experiments are real, but they flow
from the fact that they are black box efforts and not from their quasi-
experimental character. Theory-driven randomized and quasi-exper-
iments both are superior to their black box counterparts in power and
efficiency. At best the distinction between randomized experiments and
quasi-experiments becomes blurred to the extent that correctly specified
theory-driven treatment models are employed. This last statement has a
number of important implications. Randomized experimental designs
applied to field situations have a distressing tendency to deteriorate
rapidly into quasi-experiments. For example, one may randomly assign
persons to treatment groups, but if treatment acceptance depends even
partially on target population cooperation, differential cooperation can
easily change the research design into a quasi-experiment. Witness the
effects that attrition rates have had on the income maintenance and
housing allowance experiments (Watts and Rees, 1976). Randomized
experiments are difficult to install and carry out except on proposed but
not yet enacted programs. Existing, full-coverage programs can usually
only be evaluated for impact assessment by quasi-experimental designs.
Finally, theory-driven treatment modeling can meet the objections of
many evaluators who are concerned that programs once in place
develop goals that replace those officially proclaimed by policymakers
and program designers. The truism that every program has some effects
can be given some substance if treatment modeling can be used to uncover them.

THE PROBLEM OF GENERALIZATION

A given social program ordinarily is a complex bundle of specific items lumped together as a treatment. Even very simple-appearing
treatments can become quite complex in implementation. For example,
although the transfer payments were conceived as the treatments in
income maintenance or housing allowance experiments, the treatments
as delivered consisted of the payments of varying amounts, methods of
establishing and validating eligibility, housing inspections, and so on
through the entire apparatus of the experiments that dealt directly with
the families in the experiment. Without careful specification of the


treatment as delivered, interpretation of treatment effects may become very muddy indeed. More important, an experiment may be fatally
flawed by confounding the intended treatment with administrative
trappings that might nullify intended treatment effects. A priori analysis
of the treatment as delivered should lead to an experimental design that
can separate out the effects of various components of the treatment as
delivered. A very good example exists again in the TARP experiment in
which the administrative regulations of the unemployment benefit
systems of the states of Georgia and Texas negated the beneficial effects
of the payments (Rossi et al., 1980).
Of course any treatment as delivered can be broken down analytically
into a very large number of identifiable components, the vast majority of
which may have trivial impacts upon outcomes. Identifying the
important components is again the task of applying a priori knowledge
and theory. Thus in the income maintenance experiments, the guarantee
level and the implicit tax rates were identified on the basis of
microeconomic theories concerning labor force participation as critically
important components and hence were systematically varied within the
experimental design. Similarly in the housing allowance experiments,
the use of housing standards as a criterion for eligibility was conceived
to be an important device and hence built into the experimental design.
These considerations, it should be noted, apply with equal force to
quasi-experiments, especially those in which the design of treatments
can be influenced by the evaluation researcher.

One of the main benefits of departing from the black box treatment-
as-unit approach to evaluation is an enhanced ability to generalize from
the researches in question to other circumstances. The end result of a
black box evaluation is to know whether or not a given treatment-as-
unit is effective and to what extent it is so. A transfer into a different
administrative environment and subsequent modifications to fit the
requirements of that environment may drastically alter the treatment’s
effectiveness, if the elements changed are among the more important
within the treatment-as-unit. Indeed, since the translation of a proposed
program into an enacted program always requires modification to fit the
administrative environment into which it is placed, as well as to the
political acceptability constraints of the policymakers, it is important to
be able to point out what are the essential and nonessential components
of a proposed program.


MODELING INTERVENING PROCESSES

The main points made with respect to the modeling of the treatment processes and components of delivered treatments apply as well to the
modeling of intervening processes. Indeed any model of the treatment
process necessarily includes modeling intervening processes. From
some viewpoints it hardly makes any sense to distinguish intervening processes except that, for programs that may be expected to have very long-term effects, whether or not intervening processes occur may be the
first sign of whether or not a program is working. For example, if a
manpower training program is to be installed to increase the earning
power of participants over the long run, it may be useful as a first step to
specify what has to change in the short run in order that the long-range
effects of the desired sort may be eventually captured. Thus, if a training
program does not increase the job-relevant skills of participants, it
seems unlikely that long-run wages will also increase. In short, the specification of intervening processes provides the opportunity for more sensitive testing of the effectiveness of programs and also for their redesign in the unhappy eventuality that postulated intervening steps do not occur.

RESPONSE FUNCTIONAL FORMS

Another issue in modeling centers around the functional forms that relate program variables to each other and to outcome variables. Recursive models postulate one-way relations among variables and
nonrecursive models postulate that at least some of the relations involve
reciprocal effects.
In program evaluation it is possible to find that modeling causal
processes of the intervention requires postulating reciprocal relations
among outcome variables and/or between the outcome variables and
the intervening variables. For example, an educational program might affect students' test scores and self-esteem. However, if existing knowledge suggests that a reciprocal process exists between test scores and self-esteem, then a nonrecursive model should be proposed for evaluating this program.
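
The article does not prescribe an estimator, but two-stage least squares is one standard way to fit such a nonrecursive model, provided each equation excludes at least one exogenous variable that the other includes. In the sketch below the instruments (study time for test scores, peer support for self-esteem) are invented for illustration; real exclusion restrictions would need substantive defense:

import numpy as np

rng = np.random.default_rng(3)
n = 10_000
study = rng.normal(size=n)   # assumed to enter the scores equation only
peers = rng.normal(size=n)   # assumed to enter the esteem equation only
e1, e2 = rng.normal(size=n), rng.normal(size=n)

# Solve the reciprocal system:
#   scores = 0.4 * esteem + study + e1
#   esteem = 0.3 * scores + peers + e2
det = 1 - 0.4 * 0.3
scores = (0.4 * (peers + e2) + study + e1) / det
esteem = 0.3 * scores + peers + e2

def two_sls(y, endog, own_exog, instrument):
    # Stage 1: project the endogenous regressor on all exogenous variables.
    Z = np.column_stack([np.ones(len(y)), own_exog, instrument])
    endog_hat = Z @ np.linalg.lstsq(Z, endog, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted values and the own exogenous variable.
    X = np.column_stack([np.ones(len(y)), endog_hat, own_exog])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"esteem -> scores: {two_sls(scores, esteem, study, peers):.2f}")  # about 0.4
print(f"scores -> esteem: {two_sls(esteem, scores, peers, study):.2f}")  # about 0.3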
Interactive effects may also be postulated in which treatment
variables are differentially effective among subgroups of targets.


Interactions are sufficiently well known that evaluators routinely look for them, but the search for interactions should not be a matter of systematically testing out all possible interactions, a strategy that maximizes Type I errors, but one that looks for those interactions that one has a good a priori reason to suspect exist.
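
The arithmetic behind that warning is straightforward: even when no interaction effects exist at all, the chance of at least one spuriously significant result grows rapidly with the number of independent tests.

# Familywise Type I error when k independent interaction tests are run
# at the 0.05 level and every null hypothesis is in fact true.
alpha = 0.05
for k in (1, 5, 20):
    print(f"{k:>2} tests: P(at least one false positive) = {1 - (1 - alpha) ** k:.2f}")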
Finally, linear additive models of response effects are popular
because they are both easy to compute and simple to interpret when
found. But in some cases there may be good reason to suspect that
polynomial models may be more appropriate. For example, increasing
the amount of treatments may lead one to expect a point of maximum
effect per unit of treatment with lower rates of return for points above
and below the maximum. Thus transfer payments that are too small
may not affect labor force supply at all, while transfer payments that are
very large may not affect labor force supply any more than modest
transfer payments, as the diminishing marginal returns formulation
suggests.
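
A quadratic specification is one simple way to represent such a dose-response curve. In the sketch below the dose variable and its coefficients are invented; the fitted quadratic recovers the curvature and locates the dose of maximum effect, which a linear fit cannot do:

import numpy as np

rng = np.random.default_rng(4)
dose = rng.uniform(0, 10, size=1_000)    # e.g., size of a transfer payment

# Diminishing returns: the response rises, flattens, then turns down.
response = 1.5 * dose - 0.1 * dose**2 + rng.normal(scale=0.5, size=1_000)

quad = np.polyfit(dose, response, deg=2)   # [a, b, c] for a*x^2 + b*x + c
print(f"fitted curve: {quad[0]:.2f}*dose^2 + {quad[1]:.2f}*dose")  # about -0.1 and 1.5
# The effect peaks where the derivative is zero: dose = -b / (2a).
print(f"estimated dose of maximum effect: {-quad[1] / (2 * quad[0]):.1f}")  # about 7.5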

IMPLEMENTATION MODELING
Implementation systems traditionally have not been given the attention they deserve in evaluation research. As
pointed out earlier, experimental evaluations of prospective programs
involve setting up arrangements for delivering programs (or treatments);
hence even programs set up for testing purposes by researchers involve
implementation systems. Even more important is the fact that a
program once enacted must be carried out through an implementation
system that includes administrative rules and regulations, bureaucratic
structures, and personnel who have been given the responsibility to
administer the program in question.
An understanding of program implementation is important in program evaluation, since successful implementation is a necessary condition for testing program theory. Only when treatment
variables are implemented successfully, at least to some extent, can we
test whether or not the treatment variables have had any impact upon
outcome variables.
In the evaluation literature there has been no dearth of interest in implementation, but too much of the attention has been given to worrying about whether programs have been delivered as intended, and not enough attention has been given to understanding the process of


implementation. Thus Levine (1972) stated that the main problem of the War on Poverty was the failure of programs to be implemented in the field. Gramlich and Koshel (1975) found that the performance contracting experiments failed to the extent that they were not implemented (or implemented incorrectly) in the field.
Part of the problem of integrating a concern for implementation
process into evaluation stems from the fact that evaluators tend to be
specialists in the disciplines relevant to treatment processes. Thus an
evaluator concerned with the outcome of educational programs usually
knows a great deal about educational processes, but may know very
little about theories of organization; hence the organizational contexts
of the program may be neglected or unspecified.
In Figure 1 we have designated an implementation system as the
organizational arrangement that is either specially designed to deliver
treatments (or programs) or given the responsibility to do so. We do not
mean to imply that this box represents a simple system. Indeed at least
six subsystems have been identified in the existing literature (e.g., Van Meter and Van Horn, 1975; Williams and Elmore, 1976; Scheirer, 1981), and these are detailed below.

Implementing organization. An agency of some sort, either newly created for the purpose or already existing, is usually given the mandate
to administer a program. Its characteristics, such as the particular type
of authority structure, the composition of personnel, existing standard
operating procedures, and the system of incentives employed to achieve
coordination of activities among personnel and departments may all
affect how much and what specific forms of a given treatment are
delivered. For example, schools are considered to be loosely coupled
systems in which component personnel (e.g., teachers) are not linked
together into an extensive division of labor in which the work of one
member is closely dependent in time on the work of another member.
Hence the activities of teachers in their classrooms are notoriously
difficult to control (and affect). In contrast, a public welfare agency in
which caseworkers each handle only part of the treatment of a case may
be more easily changed, since it is easier to detect caseworkers who are
not performing according to plan.
Organizational theory is not the best developed of social sciences and
tends to be heavily dominated by theories that were developed in
connection with the study of industrial and business enterprises. The
study of public sector organizations that process people rather than
material objects has been relatively neglected.


Target groups. Every program defines a target population consisting of some human units (persons, households, communities, and so on) in which its effects are to be manifested in the form of specified changes.
Target groups affect the implementation of programs to the extent that
such implementation implies the cooperation, compliance, or partici-
pation of the groups in question. If targets, for whatever reasons, refuse
to accept delivery of a program’s treatment, clearly the program can
have no effect. Participation rates, therefore, are extremely important
characteristics of programs. For example, the fact that Sesame Street
achieved so large a penetration of its intended audience of nursery-
school-age children in poor families contributed greatly to its overall
success, even though the effects of viewing on each child may have been
relatively slight. In contrast, the failure of Feeling Good (Mielke and
Swinehart, 1976) was largely caused by its inability to reach more than a
very small proportion of its intended target of poor adults. Understand-
ing the conditions under which targets of various sorts will or will not
participate in programs may involve knowing how subgroups within the
population receive information, subcultural beliefs concerning partici-
pation in similar programs, and so on.
Environmental context. Implementation takes place within an
environment containing other organizations, competing activities and
programs, political structures, and so on. All these exogenous contextual
processes can affect whether or not a program can be effectively
implemented. Thus the Community Action Program of the Office of
Economic Opportunity was eventually handed over to local political
control after mayors protested against the federal government’s setting
up independent political entities in their domains (Moynihan, 1969).
Also, the fact that the health education television program Feeling
Good was broadcast at prime viewing times means that it had to
compete with extremely popular programs for the attention of its
intended audiences (Mielke and Swinehart, 1976).

Characteristics of treatments. Some treatments are intrinsically difficult and others much easier to deliver. Perhaps the critical element is the extent to which the treatment is "operator-robust": capable of
being delivered relatively intact as intended, regardless of the activities
of the persons in whose hands responsibility for delivery is given. Thus,
at one extreme, transfer payments are relatively robust treatments, since
there are limited numbers of ways in which checks can be delivered to


persons through the mail. At the other extreme, treatments that involve
tailoring interventions to the characteristics of targets usually involve
allowing considerable discretion to the frontline implementer, a circum-
stance that may considerably distort program intentions. Indeed for most human services programs the dilemma is: What is the optimum level of discretion to be allowed? If too little discretion is allowed, inappro-
priate treatments may be administered to clients. If too much discretion
is allowed, it may become very difficult to determine precisely what was
delivered, as is the case with many educational programs designed to
alter the teaching practices in the classroom.
Another characteristic of treatments that bears attention is the matter
of dosage. Thus a transfer payment that amounts to $100 per week is
simply worth a lot more than 100 times a transfer payment of $1 per
week. Or, just because we know that three hours of counseling per week may help a client does not mean that one hour per week will simply do
one-third less. The amount of an intervention, especially as actually
delivered, ought to be an important concern of evaluators.
Resources. Obviously a program requires sufficient resources to enable it to accomplish the delivery of treatment. Funds are used to hire personnel, acquire physical facilities, and so on. An underfunded program simply
will not be able to deliver the treatments as prescribed.

Interorganizational transactions. An implementing organization may have to deal with other organizations in order to be able to deliver
treatments. For example, treatments that call for the cooperation of
other organizations may not be able to function if cooperation is
withheld; or an organization may be under the jurisdiction of a superordinate organization whose command superiority may either interfere with or facilitate the implementation of the program.
All of the above characteristics of implementation systems may need
to be taken into account in developing a model of implementation in
particular cases. We cannot pretend that the construction of imple-
mentation models will be easy at this stage in our understanding of the
public sector human services organizations. All that is clear to us is that the neglect of implementation has made it ambiguous in many evaluation researches whether the program, the implementation system, or both were at fault in a demonstrated failure to achieve outcomes.


CONCLUSIONS

This article presents a set of arguments for a new appraisal of the dominant experimental paradigm as applied to evaluations. A central
feature of that paradigm as elaborated has been to emphasize black box
randomized experiments and quasi-experiments. We have argued for a
paradigm that accepts experiments and quasi-experiments as dominant
research designs, but that emphasizes that these devices should be used
in conjunction with a priori knowledge and theory to build models of the
treatment process and implementation system to produce evaluations
that are more efficient and that yield more information about how to
achieve desired effects.
It should also be emphasized that this article does not argue for
postponing evaluations until the most adequate theory and knowledge
have been constructed. It argues, rather, that we make do with what we
have, at least for the time being, drawing upon existing stocks of theory
and knowledge to the extent relevant. We also make a special plea for
more intensive attention to developing knowledge and theory concerning
how human services organizations work, so that our general under-
standing of implementation systems will be advanced.
In sum, we hope that what we have to say here will inspire evaluators
to spend more effort on understanding how programs work than on the
effort to find out whether or not they actually work in some specific and
nongeneralizable instance.

NOTES

1. It has become increasingly clear with experience in social experimentation (as opposed to short-term laboratory experiments) that the beneficial effects of randomization
can be undermined seriously by exogenous factors operating on the delivery of treatments,
as exemplified in nonrandom attrition in treatment and control groups in the income
maintenance experiments (Watts and Rees, 1976; Rossi and Lyall, 1976).
2. Treatment variables include treatment characteristics as delivered. In order to make
the discussion in this article flow more smoothly, we will simply refer to "treatment
factors" with the understanding that such terms refer to treatments as delivered.
3. A similar argument against black box experiments was made recently in a
commentary on experiments on the effectiveness of seeding clouds with silver iodide
crystals in order to produce rainfall (Kerr, 1982a, 1982b). The author made the point that
without good knowledge of the processes that take place within clouds that ordinarily lead


to rainfall, the very expensive experiments were not powerful enough to detect treatment
effects. A strong argument was then advanced against any additional black box
experiments on the effects of cloud seeding.
4. Since prisoners did not know that their working would affect their eligibility for and
amount of benefits (the legislation had been enacted at the time of imprisonment), they
could not have worked in prison because they anticipated postrelease benefits. Of course,
for subsequent cohorts of prisoners, the possibility of prison work being affected by
anticipated benefits will have to be taken into account.

REFERENCES

BARNOW, B., G. G. CAIN, and A. GOLDBERGER (1980) "Issues in the analysis of selectivity bias," in E. W. Stormsdorfer and G. Farkas (eds.) Evaluation Studies Review Annual, Vol. 5. Beverly Hills, CA: Sage.
BERK, R. and S. C. RAY (1982) "Selection biases in sociological data." Social Science
Research 11, 4: 352-398.
CAMPBELL, D. T. and J. C. STANLEY (1966) Experimental and Quasi-Experimental
Designs for Research. Chicago: Rand McNally.
CHEN, H. and P. H. ROSSI (1980) "Multi-goal theory-driven approach: a model linking basic and applied social science." Social Forces 59, 1: 106-122.
CICIRELLI, V. G. and Associates (1969) The Impact of Head Start. Athens, OH:
Westinghouse Learning Corporation and Ohio University.
COOK, T. D. and D. T. CAMPBELL (1979) Quasi-Experimentation: Design and
Analysis Issues for Field Settings. Chicago: Rand McNally.
DEUTSCHER, I. (1977) "Toward avoiding the goal trap in evaluation research," in F. G.
Caro (ed.) Readings in Evaluation Research. New York: Russell Sage.
GRAMLICH, E. M. and P. P. KOSHEL (1975) "Is real-world experimentation possible?
The case of educational performance contracting." Policy Analysis 1 (Summer):
511-530.
GUBA, E. G. and Y. LINCOLN (1981) Effective Evaluation. San Francisco: Jossey-Bass.
KERR, R. A. (1982a) "Test fails to confirm cloud seeding effect." Science 217, 4557.
——— (1982b) "Cloud seeding: one success in 35 years." Science 217, 4559.

LEVINE, R. A. (1972) Public Planning: Failure and Redirections. New York: Basic
Books.
MIELKE, K. W. and J. W. SWINEHART (1976) Evaluation of the Feeling Good
Television Series. New York: Children’s Television Workshop.
MORRIS, L. L., C. T. FITZ-GIBBON, and M. E. HENERSON (1978) Program
Evaluation Kit. Beverly Hills, CA: Sage.
MOYNIHAN, D. P. (1969) Maximum Feasible Misunderstanding. New York: Free
Press.
RAUMA, D. and R. BERK (1982) "Crime and poverty in California." Social Science
Research 11, 4: 318-351.
RIECKEN, H. W. and R. F. BORUCH (1974) Social Experimentation: A Method for Planning and Evaluating Social Intervention. New York: Academic.


ROSSI, P. H. and K. LYALL (1976) Reforming Social Welfare. New York: Russell Sage.
ROSSI, P. H., R. A. BERK, and K. J. LENIHAN (1980) Money, Work and Crime. New
York: Academic.
SCHEIRER, M. A. (1981) Program Implementation: The Organizational Context.
Beverly Hills, CA: Sage.
SCRIVEN, M. (1972) "Pros and cons about goal-free evaluation." Evaluation Comment
3: 1-4.
SUCHMAN, E. A. (1969) "Evaluating educational programs." Urban Review 3, 4: 15-17.
VAN METER, D. S. and C. E. VAN HORN (1975) "The policy implementation process:
a conceptual framework." Administration and Society 6, 4: 445-488.

WATTS, H. and A. REES (1976) The New Jersey Income Maintenance Experiment,
Vols. 2 and 3. New York: Academic.
WHOLEY, J. S., J. N. NAY, and R. E. SCHMIDT (1975) "Evaluation: where is it really
needed?" Evaluation Magazine 2, 2: 89-93.
WILLIAMS, W. (1976) "Implementation analysis and assessment," in W. Williams and R. F. Elmore (eds.) Social Program Implementation. New York: Academic.
——— and R. F. ELMORE [eds.] (1976) Social Program Implementation. New York: Academic.
WRIGHT, J. D., P. H. ROSSI, and K. DALY (1983) Under the Gun. Hawthorne, NY:
Aldine.

Huey-Tsyh Chen is Associate Research Scientist at the Center for Metropolitan Planning and Research and Assistant Professor in the Department of Social Relations at the Johns Hopkins University. He is interested in developing theoretical models in program evaluation and implementation for policy decisions. Currently he is editing an evaluation book with Peter H. Rossi.

Peter H. Rossi is Professor of Sociology and Director of Research at the Social and
Demographic Research Institute of the University of Massachusetts at Amherst. He is
coauthor (with Howard Freeman) of Evaluation: A Systematic Approach (Sage, 1982)
and (with S. Nock) of Measuring Social Judgments: The Factorial Survey Approach
(Sage, 1982). He is past president of the American Sociological Association and recipient
of the 1981 Alva and Gunnar Myrdal Prize of the Evaluation Research Society, awarded
for contributions to evaluation research methods.
