Single Case Experimental Designs: Strategies For Studying Behavior Change (Barlow & Hersen)
STRATEGIES FOR STUDYING
BEHAVIOR CHANGE
SECOND EDITION
Single Case
Experimental Designs
(PGPS-56)
David H. Barlow
SUNY at Albany
Michel Hersen
University of Pittsburgh School of Medicine
Donald P. Hartmann
University of Utah
and
Alan E. Kazdin
University of Pittsburgh School of Medicine
PERGAMON PRESS
NEW YORK OXFORD BEIJING FRANKFURT
Barlow, David H.
Preface
Epigram
1.1. Introduction
2.1. Introduction
2.2. Variability
2.3. Experimental Analysis of Sources of Variability Through Improvised Designs
2.4. Behavior Trends and Intrasubject Averaging
2.5. Relation of Variability to Generality of Findings
2.6. Generality of Findings
2.7. Limitations of Group Designs in Establishing Generality of Findings
2.8. Homogeneous Groups Versus Replication of a Single-Case Experiment
2.9. Applied Research Questions Requiring Alternative Designs
2.10. Blurring the Distinction Between Design Options
3.1. Introduction
3.2. Repeated Measurement
3.3. Choosing a Baseline
3.4. Changing One Variable at a Time
3.5. Reversal and Withdrawal
3.6. Length of Phases
3.7. Evaluation of Irreversible Procedures
3.8. Assessing Response Maintenance
References
David H. Barlow
Albany, New York
Michel Hersen
Pittsburgh, Pennsylvania
Epigram
1.1. INTRODUCTION
The individual is of paramount importance in the clinical science of human
behavior change. Until recently, however, this science lacked an adequate
methodology for studying behavior change in individuals. This gap in our
methodology has retarded the development and evaluation of new procedures
in clinical psychology and psychiatry as well as in educational fields.
Historically, the intensive study of the individual held a preeminent place in
the fields of psychology and psychiatry. In spite of this background, an
adequate experimental methodology for studying the individual was very
slow to develop in applied research.* To find out why, it is useful to gain some
perspective on the historical development of methodology in the broad area
of psychological research.
The purpose of this chapter is to provide such a perspective, beginning with
the origins of methodology in the basic sciences of physiology and experimen-
tal psychology in the middle of the last century. Because most of this early
work was performed on individual organisms, reasons for the development
of between-group comparison methodology in basic research (which did not
occur until the turn of the century) are outlined. The rapid development of
inferential statistics and sampling theory during the early 20th century
enabled greater sophistication in the research methodology of experimental
psychology. The manner in which this affected research methods in applied
areas during the middle of the century is discussed.
*In this book applied research refers to experimentation in the area of human
behavior change relevant to the disciplines of clinical psychology, psychiatry, social
work, and education.
In the meantime, applied research was off to a shaky start in the offices of
early psychiatrists with a technique known as the case study method. The
separate development of applied research is traced from those early begin-
nings through the grand collaborative group comparison studies proposed in
the 1950s. The subsequent disenchantment with this approach in applied
research forced a search for alternatives. The rise and fall of the major
alternatives (process research and naturalistic studies) is outlined near the
end of the chapter. This disenchantment also set the stage for a renewal of
interest in the scientific study of the individual. The multiple origins of single-
case experimental designs in the laboratories of experimental psychology and
the offices of clinicians complete the chapter. Descriptions of single-case
designs and guidelines for their use as they are evolving in applied research
comprise the remainder of this book.
Johannes Müller and Claude Bernard, but an important landmark for applied
research was the work of Paul Broca in 1861. At this time, Broca was
caring for a man who was hospitalized for an inability to speak intelligibly.
Before the man died, Broca examined him carefully; subsequent to death, he
performed an autopsy. The finding of a lesion in the third frontal convolution
of the cerebral cortex convinced Broca, and eventually the rest of the scien-
tific world, that this was the speech center of the brain. Broca's method was
Pierre Flourens in the 1850s. In this method, brain function was mapped out
by systematically destroying parts of the brain in animals and noting the
effects on behavior.
The importance of this research in the context of the present discussion lies
individual subjects who were highly trained. This training involved learning
to describe experiences in an objective manner, free from emotional or
language restraints. For example, the experience of seeing a brightly colored
object would be described in terms of shapes and hues without recourse to
aesthetic appeal. To illustrate the objectivity of this system, introspection of
well known that summaries are not required. What is often overlooked,
however, is that Pavlov's basic findings were gleaned from single organisms
and strengthened by replication on other organisms. In terms of scientific
yield, the study of the individual organism reached an early peak with Pavlov,
and Skinner would later cite this approach as an important link and a strong
bond between himself and Pavlov (Skinner, 1966a).
these findings to mean that nature strove to produce the "average" man but,
due to various reasons, failed, resulting in errors or variances in traits that
grouped around the average. As one moved further from this average, fewer
examples of the trait were evident, following the well-known normal distribu-
tion. This approach, in turn, had its origins in Darwin's observations on
individual variation within a species. Quetelet viewed these variations or
errors as unfortunate since he viewed the average man, which he termed
l'homme moyen, as a cherished goal rather than a descriptive fact of central
tendency. If nature were "striving" to produce the average man, but failed
due to various accidents, then the average, in this view, was obviously the
ideal. Where nature failed, however, man could pick up the pieces, account
for the errors, and estimate the average man through statistical techniques.
The influence of this finding on psychological research was enormous, as it
paved the way for the application of sophisticated statistical procedures to
psychological problems. Quetelet would probably be distressed to learn,
however, that his concept of the average individual would come under attack
during the 20th century by those who observed that there is no average
individual (e.g., Dunlap, 1932; Sidman, 1960).
This viewpoint notwithstanding, the study of individual differences and the
statistical approach to psychology became prominent during the first half of
the 20th century and changed the face of psychological research. With a push
from the American functional school of psychology and a developing interest
in the measurement and testing of intelligence, the foundation for comparing
groups of individuals was laid.
about the statistical approach and seemed to believe, at times, that inaccurate
data could be made to yield accurate conclusions if the proper statistics were
applied (Boring, 1950). Although this view was rejected by more conservative
colleagues, it points up a confidence in the power of statistical procedures that
reappears from time to time in the execution of psychological research (e.g.,
D. A. Shapiro & Shapiro, 1983; M. L. Smith & Glass, 1977; G. T. Wilson &
Rachman, 1983).
One of the best known psychologists to adopt this approach was James
McKeen Cattell. Cattell, along with Farrand, devised a number of simple
mental tests that were administered to freshmen at Columbia University to
determine the range of individual differences. Cattell also devised the order
of merit method, whereby a number of judges would rank items or people on
a given quality, and the average response of the judges constituted the rank of
that item vis-a-vis other items. In this way, Cattell had 10 scientists rate a
number of eminent colleagues. The scientist with the highest score (on the
average) achieved the top rank.
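The order of merit method described above amounts to averaging each item's position across judges. A minimal sketch in Python follows. It assumes each judge assigns rank 1 to the best item, so the lowest mean rank takes the top position; the judge data and the labels A, B, and C are hypothetical placeholders, not Cattell's ratings.

```python
from statistics import mean

def order_of_merit(rankings):
    """Order items by mean rank across judges (assumes rank 1 = best)."""
    items = rankings[0].keys()
    mean_rank = {item: mean(judge[item] for judge in rankings) for item in items}
    # Sort ascending: the lowest mean rank takes the top position.
    return sorted(mean_rank.items(), key=lambda pair: pair[1])

# Hypothetical data: three judges each rank three scientists.
judges = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 2, "B": 1, "C": 3},
    {"A": 1, "B": 3, "C": 2},
]
result = order_of_merit(judges)
top_ranked = result[0][0]
```

Note that the final ordering describes the pooled judgment of the panel; no single judge in this hypothetical data set need have produced exactly that ordering.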
It may seem ironic at first glance that a concern with individual differences
led to an emphasis on groups and averages, but differences among individu-
als, or intersubject variability, and the distribution of these differences neces-
among organisms could be accounted for or averaged out in large groups was
a commonsense notion emanating from the new emphasis on variability
among organisms. The fact that this research resulted in an average finding
from the hypothetical average rat drew some isolated criticism. For instance,
The Single-case in Basic and Applied Research 7
situations which should be grouped for statistical treatment are those which
have for the individual rats or for the individual children the same psycholog-
ical structure and only for such period of time as this structure exists" (p.
328). The new emphasis on variability and averages, however, would have
pleased Quetelet, whose slogan could have been "Average is Beautiful."
better on the average than a similar plot treated differently. The implications
of this philosophy for applied research will be discussed in chapter 2.
other words, inference is made from the sample to the population. This work
and the subsequent developments in the field of sampling theory made it
possible to talk in terms of psychological principles with broad generality and
applicability, a primary goal in any science. This type of estimation, however,
was based on appropriate statistics, averages, and intersubject variability
in the sample, which further reinforced the group comparison approach in
basic research.
As the science of psychology grew out of its infancy, its methodology was
reported during this period came tantalizingly close to providing the basic
scientific ingredients of experimental single-case research. The most famous
of these, of course, is the J. B. Watson and Rayner (1920) study of an
time, firmly link successful treatment with the necessity of discovering the
etiology of the behavior disorder. One wonders if the early development of
clinical techniques, including psychoanalysis, would have been different if
careful observers like Breuer had been cognizant of the experimental implica-
tions of their clinical work. Of course, this small leap from uncontrolled case
study to scientific investigation of the single case did not occur because of a
lack of awareness of basic scientific principles in early clinicians. The result
was an accumulation of successful individual case studies, with clinicians
from varying schools claiming that their techniques were indispensable to
success. In many cases their claims were grossly exaggerated. Brill noted in
1909 on psychoanalysis that "The results obtained by the treatment are
unquestionably very gratifying. They surpass those obtained by simpler
methods in two chief respects; namely, in permanence and in the prophylactic
value they have for the future" (Brill, 1909). Much later, in 1933, Kessel and
Hyman observed, "this patient was saved from an inferno and we are
convinced that this could have been achieved by no other method" (Kessel &
Hyman, 1933). From an early behavioral standpoint, Max (1935) noted that
electrical aversion therapy produced "95 percent relief" from the compulsion
of homosexuality.
These kinds of statements did little to endear the case study method to
serious applied researchers when they began to appear in the 1940s and 1950s.
In fact, the case study method, if anything, deteriorated somewhat over the
years in terms of the amount and nature of publicly observable data available
in these reports. Frank (1961) noted the difficulty in even collecting data from
a therapeutic hour in the 1930s due to lack of necessary equipment, reluc-
tance to take detailed notes, and concern about confidentiality. The advent of
the phonograph record at this time made it possible at least to collect raw data
from those clinicians who would cooperate, but this method did not lead to
any fruitful new ideas on research. With the advent of serious applied
research in the 1950s, investigators tended to reject reports from uncontrolled
case studies due to an inability to evaluate the effects of treatment. Given the
extraordinary claims by clinicians after successful case studies, this attitude is
understandable. However, from the viewpoint of single-case experimental
designs, this rejection of the careful observation of behavior change in a case
report had the effect of throwing out the baby with the bathwater.
defined than in most case reports, and techniques tended to be fixed and
"school" oriented. Because all procedures achieved some success, practi-
tioners within these schools concentrated on the positive results, explained
away the failures, and decided that the overall results confirmed that their
procedures, as applied, were responsible for the success. Due to the strong
and overriding theories central to each school, the successes obtained were
attributed to theoretical constructs underlying the procedure. This precluded
a careful analysis of elements in the procedure or the therapeutic intervention
that may have been responsible for certain changes in a given case and had
the effect of reinforcing the application of a global, ill-defined treatment,
from whatever theoretical orientation, to global definitions of behavior disor-
ders, such as neurosis. This, in turn, led to statements such as "psy-
chotherapy works with neurotics." Although applied researchers later
rejected these efforts as unscientific, one carryover from this approach was
the notion of the average response to treatment; that is, if a global treatment
is successful on the average with a group of "neurotics," then this treatment
will probably be successful with any individual neurotic who requests treat-
ment.
Intuitively, of course, descriptions of results from 50 cases provide a more
convincing demonstration of the effectiveness of a given technique than
separate descriptions of 50 individual cases. A modification of this approach
utilizing updated strategies and procedures and with the focus on individual
responses has been termed clinical replication. This strategy can make a
substantial contribution to the applied research process (see chapter 10). The
major difficulty with this approach, however, particularly as it was practiced
in early years, is that the category in which these clients are classified most
always becomes unmanageably heterogeneous. The neurotics described in
Eysenck's (1952) paper may have less in common than any group of people
one would choose randomly. When cases are described individually, however,
a clinician stands a better chance of gleaning some important information,
since specific problems and specific procedures are usually described in more
detail. When one lumps cases together in broadly defined categories, individ-
ual case descriptions are lost and the ensuing report of percentage success
becomes meaningless. This unavoidable heterogeneity in any group of pa-
tients is an important consideration that will be discussed in more detail in
this chapter and in chapter 2.
that clinical researchers must start defining the independent variables more
precisely and must ask the question: "What specific treatment is effective with
a specific type of client under what circumstances?"
Ethical objections
Practical problems
from these a tiny number are suitable for inclusion in the homogeneous sample
one wishes to study. Selection of the sample can be so time consuming that it
severely limits research possibilities. Consider the clinician who wishes to assem-
ble a series of obsessive-compulsive patients to be assigned at random into one of
two treatment conditions. He will need at least 20 such cases for a start, but
obsessive-compulsive neuroses (not personality) make up only 0.5-3 percent of
the psychiatric outpatients in Britain and the USA. This means the clinician will
need a starting population of about 2000 cases to sift from before he can find his
sample, and even then this assumes that all his colleagues are referring every
suitable patient to him. In practice, at a large center such as the Maudsley
Hospital, it would take up to two years to accumulate a series of
obsessive-compulsives for study (Bergin & Strupp, 1972, p. 130).
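The arithmetic behind the quoted estimate can be checked directly. The sketch below is a rough illustration only: it assumes every eligible patient in the pool is identified and referred, and the function name starting_pool is a hypothetical helper, not something from the source. The 20 cases and the 0.5 to 3 percent range come from the passage; the figure of about 2000 corresponds to a prevalence of 1 percent.

```python
from fractions import Fraction
from math import ceil

def starting_pool(cases_needed, prevalence):
    """Smallest pool expected to contain `cases_needed` eligible patients.

    `prevalence` is a decimal string such as "0.01"; Fraction keeps the
    division exact, so the ceiling is not disturbed by float rounding.
    """
    return ceil(cases_needed / Fraction(prevalence))

pool_at_3_percent = starting_pool(20, "0.03")      # most favorable end of the range
pool_at_1_percent = starting_pool(20, "0.01")      # matches the "about 2000" figure
pool_at_half_percent = starting_pool(20, "0.005")  # least favorable end of the range
```

Across the quoted range the required pool runs from several hundred to several thousand outpatients, which is why sample selection alone can limit what a single clinic is able to study.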
Averaging of results
ways to treatment. That is, some patients will improve and others will not.
The average response, however, will not represent the performance of any
individual in the group. In relation to this problem, Bergin (Bergin & Strupp,
1972) noted that he consulted a prominent statistician about a therapy
research project who dissuaded him from employing the usual inferential
statistics applied to the group as a whole and suggested instead that individual
Generality of findings
specific questions about effects of therapy, one loses the ability to make
inferential statements to the population of patients with a particular disorder
because the individual complexities in the population will not have been
adequately sampled. Thus it becomes difficult to generalize findings at all
Intersubject variability
Naturalistic studies
The advantage of the naturalistic study for most clinicians was that it did
little to disrupt the typical activities engaged in by clinicians in day-to-day
practice. Unlike with the experimental group comparison design, clinicians
were not restricted by precise definitions of an independent variable
(treatment), time limitation, or random assignment of patients to groups. Kiesler
(1971) noted that naturalistic studies involve "... live, unaltered, minimally
controlled, unmanipulated 'natural' psychotherapy sequences, so-called
experiments of nature" (p. 54). Naturally this approach had great appeal to
clinicians for it dealt directly with their activities and, in doing so, promised
to consider the complexities inherent in treatment. Typically, measures of
multiple therapist and patient behaviors are taken, so that all relevant vari-
ables (based on a given clinician's conceptualization of which variables are
relevant) may be examined for interrelationships with every other variable.
Perhaps the best known example of this type of study is the project at the
Menninger Foundation (Kernberg, 1973). Begun in 1954, this was truly a
mammoth undertaking involving 38 investigators, 10 consultants, three dif-
ferent project leaders, and 18 years of planning and data collection. Forty-
two patients were studied in this project. This group was broadly defined,
although overtly psychotic patients were excluded. Assignment of patient to
therapist and to differing modes of psychoanalytic treatment was not random
but based on clinical judgments of which therapist or mode of treatment was
most suitable for the patient. In other words, the procedures were those
normally in effect in a clinical setting. In addition, other treatments, such as
pharmacological or organic interventions, were administered to certain pa-
tients as needed. Against this background, the investigators measured multi-
ple patient characteristics (such as various components of ego strength) and
correlated these variables, measured periodically throughout treatment by
referring to detailed records of treatment sessions, with multiple therapeutic
activities and modes of treatment. As one would expect, the results are
enormously complex and contain many seemingly contradictory findings. At
least one observer (Malan, 1973) noted that the most important finding is that
purely supportive treatment is ineffective with borderline psychotics, but
working through of the transference relationship under hospitalization with
this group is effective. Notwithstanding the global definition of treatment and
the broad diagnostic categories (borderline psychotic) also present in early
group comparison studies, this report was generally hailed as an extremely
important breakthrough in psychotherapy research. Methodologists, how-
ever, were not so sure. While admitting the benefits of a clearer definition of
seem necessary to undermine the stated strengths of the study (that is, the
"unaltered, minimally controlled, unmanipulated" condition prevailing in
the typical naturalistic project) by randomly assigning patients, limiting
access to additional confounding modes of treatment, and observing devia-
tion of therapists from prescribed treatment forms. But if this were done, the
study would no longer be naturalistic.
A further problem is obvious from the example of the Menninger project.
The practical difficulties in executing this type of study seem very little less
than those inherent in the large group comparison approach. The one exception
is that the naturalistic study, in retaining close ties to the actual functioning
of the clinic, requires less structuring or manipulating of large numbers of
patients and therapists. The fact that this project took 18 years to complete
makes one consider the significant administrative problem inherent in main-
taining a research effort for this length of time. This factor is most likely
responsible for the admission from one prominent member of the Menninger
team, Robert S. Wallerstein, that he would not undertake such a project
again (Bergin & Strupp, 1972). Most seem to have heeded his advice because
few, if any, naturalistic studies have appeared in recent years.
Correlational studies, of course, do not have to be quite so "naturalistic"
as the Menninger study (Kazdin, 1980a; Kendall & Butcher, 1982). Kiesler
(1971) reviewed a number of studies without experimental manipulation that
contain adequate definitions of variables and experimental attempts to rule
out obvious confounding factors. Under such conditions, and if practically
feasible, correlational studies may expose heretofore unrecognized relation-
ships among variables in the psychotherapeutic process. But the fact remains
that correlational studies by their nature are incapable of determining causal
relationships on the effects of treatment. As Kiesler pointed out, the most
common error in these studies is the tendency to conclude that a relationship
between two variables indicates that one variable is causing the other. For
instance, the conclusion in the Menninger study that working through trans-
Process research
and therapist instead of the final outcome of any therapeutic effort. In the
late 1950s and early 1960s, a large number of studies appeared on such topics
as relation of therapist behavior to certain patient behaviors in a given
interview situation (e.g., Rogers, Gendlin, Kiesler, & Truax, 1967). As such,
process research held much appeal for clinicians and scientists alike. Clinicians
were pleased by the focus on the individual and the resulting ability to
study actual clinical processes. In some studies repeated measures during
therapy gave clinicians an idea of the patient's course during treatment.
Scientists were intrigued by the potential of defining variables more precisely
within one interview without concerning themselves with the complexities
involved before or after the point of study. The increased interest in process
research, however, led to an unfortunate distinction between process and
outcome studies (see Kiesler, 1966). This distinction was well stated by
Luborsky (1959), who noted that process research was concerned with how
changes took place in a given interchange between patient and therapist,
whereas outcome research was concerned with what change took place as a
result of treatment. As Paul (1969) and Kiesler (1966) pointed out, the
dichotomization of process and outcome led to an unnecessary polarity in the
manner in which measures of behavior change were taken. Process research
collected data on patient changes at one or more points during the course of
therapy, usually without regard for outcome, while outcome research was
concerned only with pre-post measures outside of the therapeutic situation.
Kiesler noted that this was unnecessary because measures of change within
treatment can be continued throughout treatment until an "outcome" point is
direct practical help. My clinical experience is the only thing that has helped
me in my practice to date. . . ." (Bergin & Strupp, 1972, p. 340). This opinion
was echoed by one of the most productive and best known researchers of the
1950s, Carl Rogers, who as early as the 1958 APA conference on psy-
chotherapy noted that research had no impact on his clinical practice and by
1969 advocated abandoning formal research in psychotherapy altogether
(Bergin & Strupp, 1972). Because this view prevailed among prominent
clinicians who were well acquainted with research methodology, it follows
that clinicians without research training or expertise were largely unaffected
by the promise or substance of scientific evaluation of behavior change
procedures. L. H. Cohen (1976, 1979) confirmed this state of affairs when he
summarized a series of surveys indicating that 40% of mental health profes-
sionals think that no research exists that is relevant to practice, and the
remainder believe that less than 20% of research articles have any applicabil-
ity to professional settings.
Although the methodological difficulties outlined above were only one
cannot be blamed on the techniques proper (after all, they are merely tools),
but their veneration mirrors a prevailing philosophy among behavioral scientists
which subordinates problems to methodology. The insidious effects of this trend
are tellingly illustrated by the typical graduate student who is often more in-
terested in the details of a factorial design than in the problem he sets out to
study; worse, the selection of a problem is dictated by the experimental design.
Needless to say, the student's approach faithfully reflects the convictions and
teachings of his mentors. With respect to inquiry in the area of psychotherapy,
the kinds of effects we need to demonstrate at this point in time should be
significant enough so that they are readily observable by inspection or descriptive
statistics. If this cannot be done, no fixation upon statistical and mathematical
niceties will generate fruitful insights, which obviously can come only from the
researcher's understanding of the subject matter and the descriptive data under
scrutiny. (1972, p. 440)
this point" (Bergin & Strupp, 1970, p. 19). The hope was also expressed that
this approach would tend to bring research and practice closer together.
With the recommendations emerging from Bergin and Strupp's compre-
hensive analysis, the philosophy underlying applied research methodology
had come full circle in a little over 100 years. The disillusionment with
large-scale between-group comparisons observed by Bergin and Strupp and their
subsequent advocacy of the intensive study of the individual is an historical
repetition of a similar position taken in the middle of the last century. At that
time, the noted physiologist, Claude Bernard, in An Introduction to the
Study of Experimental Medicine (1957), attempted to dissuade colleagues
who believed that physiological processes were too complex for experimental
inquiry within a single organism. In support of this argument, he noted that
the site of processes of change is in the individual organism, and group
averages and variance might be misleading. In one of the more famous
anecdotes in science, Bernard castigated a colleague interested in studying the
properties of urine in 1865. This colleague had proposed collecting specimens
from urinals in a centrally located train station to determine properties of the
average European urine. Bernard pointed out that this would yield little
information about the urine of any one individual. Following Bernard's
persuasive reasoning, the intensive scientific study of the individual in physi-
ology flourished.
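Bernard's objection to the "average" specimen can be illustrated numerically. The sketch below uses hypothetical change scores (positive values indicate improvement) invented purely for illustration; it shows a group whose mean is zero even though no individual in the group scored anywhere near zero.

```python
from statistics import mean

# Hypothetical change scores for six individuals: three improve, three worsen.
changes = [8, 7, 9, -8, -7, -9]

# The group average suggests "no change".
group_mean = mean(changes)

# Distance between the group mean and the nearest individual score.
closest_gap = min(abs(score - group_mean) for score in changes)

# Count of individuals whose score actually equals the group mean.
matching_individuals = sum(1 for score in changes if score == group_mean)
```

In this constructed case the average describes none of the six individuals, which is the sense in which group averages "might be misleading" about processes that occur within a single organism.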
But methodology in physiology and experimental psychology is not directly
applicable to the complexities present in applied research. Although the
splendid isolation of Pavlov's laboratories allowed discovery of important
psychological processes without recourse to sophisticated experimental de-
sign, it is unlikely that the same results would have obtained with a household
pet in its natural environment. Yet these are precisely the conditions under
which most applied researchers must work.
The plea of applied researchers for appropriate methodology grounded in
the scientific method to investigate complex problems in individuals is never
more evident than in the writings of Gordon Allport. Allport argued most
eloquently that the science of psychology should attend to the uniqueness of
the individual (e.g., Allport, 1961, 1962). In terms commonly used in the
1950s, Allport became the champion of the idiographic (individual) ap-
proach, which he considered superior to the nomothetic (general or group)
approach.
Why should we not start with individual behavior as a source of hunches (as we
have in the past) and then seek our generalization (also as we have in the past) but
finally come back to the individual not for the mechanical application of laws (as
we do now) but for a fuller and more accurate assessment than we are now able
to give? I suspect that the reason our present assessments are now so often feeble
and sometimes even ridiculous, is because we do not take this final step. We stop
with our wobbly laws of generality and seldom confront them with the concrete
person. (Allport, 1962, p. 407)
Due to the lack of a practical, applied methodology with which to study the
individual, however, most of Allport's own research was nomothetic. The
increase in the intensive study of the individual in applied research led to a
search for appropriate methodology, and several individuals or groups began
developing ideas during the 1950s and 1960s.
sen & Barlow, 1976; Kazdin, 1981; Leitenberg, 1973), and the inability of
many scientists and clinicians to discriminate the critical difference between
the uncontrolled case study and the experimental study of an individual case
has most likely retarded the implementation of single-case experimental
designs (see chapter 5).
Shontz also failed to recognize the value of the single-case study in isolating
effective therapeutic variables or building new procedures, as suggested later
by Bergin and Strupp (1972). Rather, he proposed the use of a single-case in a
deductive manner to test previously established hypotheses and measurement
instruments in an individual who is known to be so stable in certain personal-
ity characteristics that he or she is "representative" of these characteristics.
Conceptually, Shontz moved beyond Allport, however, in noting that this
approach was not truly idiographic in that he was not proposing to investigate
a subject as a self-contained universe with its own laws. To overcome this
objectionable aspect of single-case research, he proposed replication on sub-
jects who differed in some significant way from the first subject. If the general
hypothesis were repeatedly confirmed, this would begin to establish a gener-
ally applicable law of behavior. If the hypothesis were sometimes confirmed
and sometimes rejected, he noted that "... the investigator will be in a
position either to modify his thinking or to state more clearly the conditions
under which the hypothesis does and does not provide a useful model of
psychological events" (Shontz, 1965, p. 258). With this statement, Shontz
26 Single-case Experimental Designs
the guilt control phase and improved during the rational discussion phase.
These fluctuations around the regression line were statistically significant.
This effect, of course, is weak and of dubious importance because overall
improvement in paranoid scores was not functionally related to treatment.
Furthermore, several guidelines for a true experimental analysis of the treat-
ment were violated. Examples of experimental error include the absence of
baseline measurement to determine the pretreatment course of the paranoid
beliefs and the simultaneous withdrawal of one treatment and introduction of
a second treatment (see chapter 3). The importance of the case and other
early work from M. B. Shapiro, however, is not the knowledge gained from
any one experiment, but the beginnings of the development of a scientifically
based methodology for evaluating effects of treatment within a single-case.
To the extent that Shapiro's correlational studies were similar to process
research, he broke the semantic barrier which held that process criteria were
unrelated to outcome. He demonstrated clearly that repeated measures within
an individual could be extended to a logical end point and that this end point
was the outcome of treatment. His more important contribution from our
point of view, however, was the demonstration that independent variables in
applied research could be defined and systematically manipulated within a
single-case, thereby fulfilling the requirements of a "true" experimental ap-
proach to the evaluation of therapeutic technique (Underwood, 1957). In
addition, his demonstration of the applicability of the study of the individual
case to the discovery of issues relevant to psychopathology was extremely
important. This approach is only now enjoying more systematic application
by some of our creative clinical scientists (e.g., Turkat & Maisto, in press).
Quasi-experimental designs
given intervention. Thus one can observe changes from a baseline as a result
of a given intervention. While the inclusion of a baseline is a distinct method-
ological improvement, this design is basically correlational in nature and is
It remained for Chassan (1967, 1979) to pull together many of the method-
ological advances in single-case research to that point in a book that made
clear distinctions between the advantages and disadvantages of what he
termed extensive (group) design and intensive (single-case) design. Drawing
on long experience in applied research, Chassan outlined the desirability and
applicability of single-case designs evolving out of applied research in the
1950s and early 1960s. While most of his own experience in single-case design
concerned the evaluation of pharmacologic agents for behavior disorders,
Chassan also illustrated the uses of single-case designs in psychotherapy
research, particularly psychoanalysis. As a statistician rather than a practic-
ing clinician, he emphasized the various statistical procedures capable of
establishing relationships between therapeutic intervention and dependent
variables within the single-case. He concentrated on the correlation type of
design using trend analysis but made occasional use of a prototype of the
A-B-A design (e.g., Bellak & Chassan, 1964), which, in this case, extended the
work of M. B. Shapiro to evaluation of drug effects but, in retrospect,
contained some of the same methodological faults. Nevertheless, the sophisti-
cated theorizing in the book on thorny issues in single-case research, such as
generality of findings from a single-case, provided the most comprehensive
treatment of these issues to this time. Many of Chassan's ideas on this subject
will appear repeatedly in later sections of this book.
The Single-case in Basic and Applied Research 29
Kazdin, 1978, and Krasner, 1971a, for a history of behavior therapy). The
relevance of the experimental analysis of behavior to applied research is the
development of sophisticated methodology enabling intensive study of individual
subjects. In rejecting a between-subject approach as the only useful
scientific methodology, Skinner (1938, 1953) reflected the thoughts of the
intended for the animal laboratory was adapted more fully to the investiga-
tion of applied problems and "applied behavior analysis" became an impor-
tant supplementary and, in some cases, alternative methodological approach
to between-subjects experimental designs.
The early pleas to return to the individual as the cornerstone of an applied
science of behavior have been heeded. The last several years have witnessed
the crumbling of barriers that precluded publication of single-case research in
any leading journal devoted to the study of behavioral problems. Since the
first edition of this book, a proliferation of important books has appeared
2.1. INTRODUCTION
Two issues basic to any science are variability and generality of findings.
These issues are handled somewhat differently from one area of science to
another, depending on the subject matter. The first section of this chapter
concerns variability.
In applied research, where individual behavior is the primary concern, it is
our contention that the search for sources of variability in individuals must
occur if we are to develop a truly effective clinical science of human behavior
change. After a brief discussion of basic assumptions concerning sources of
variability in behavior, specific techniques and procedures for dealing with
behavioral variability in individuals are outlined. Chief among these are
repeated measurement procedures that allow careful monitoring of day-to-
day variability in individual behavior, and rapidly changing, improvised
experimental designs that facilitate an immediate search for sources of va-
riability in an individual. Several examples of the use of this procedure to
track down sources of intersubject or intrasubject variability are presented.
The second section of this chapter deals with generality of findings. Histori-
cally, this has been a thorny issue in applied research. The seeming limitations
General Issues in A Single-case Approach 33
2.2. VARIABILITY
The notion that behavior is a function of a multiplicity of factors finds
wide agreement among scientists and professional investigators. Most scientists
also agree that as one moves up the phylogenetic scale, the sources of
variability in behavior become greater. In response to this, many scientists
choose to work with lower life forms in the hope that laws of behavior will
emerge more readily and be generalizable to the infinitely more complex area
of human behavior. Applied researchers do not have this luxury. The task of
the investigator in the area of human behavior disorders is to discover
functional relations among treatments and specific behavior disorders over
and above the welter of environmental and biological variables impinging on
the patient at any given time. Given these complexities, it is small wonder that
most treatments, when tested, produce small effects or, in Bergin and Strupp's
terms, weak results (Bergin & Strupp, 1972).
nent of behavior, then procedures had to be found to deal with this issue
before meaningful research could be conducted. The solution involved
experimental designs and confidence level statistics that would elucidate
functional relations among independent and dependent variables over and above
the intrinsic variability. Sidman (1960) noted that this is not the case in some
other sciences, such as physics. Physics assumes that variability is imposed by
error of measurement or other identifiable factors. Experimental efforts are
then directed to discovering and eliminating as many sources of variability as
possible so that functional relations can be determined with more precision.
Sidman proposed that basic researchers in psychology also adopt this strategy.
Rather than assuming that variability is intrinsic to the organism, one
should make every effort to discover sources of behavioral variability among
organisms such that laws of behavior could be studied with the precision and
specificity found in physics. This precision, of course, would require close
attention to the behavior of the individual organism. If one rat behaves
differently from three other rats in an experimental condition, the proper
tactic is to find out why. If the experimenter succeeds, the factors that produce
that variability can be eliminated and a "cleaner" test of the effects of the
original independent variable can be made. Sidman recognized that behav-
ioral variability may never be entirely eliminated, but that isolation of as
many sources of variability as possible would enable an investigator to
estimate how much variability actually is intrinsic.
Applied researchers, by and large, have not been concerned with this
argument. Every practitioner is aware of multiple social or biological factors
that are imposed on his or her data. If asked, many investigators might also
assume some intrinsic variability in clients attributable to capriciousness in
nature; but most are more concerned with the effect of uncontrollable but
potentially observable events in the environment. For example, the sudden
appearance of a significant relative or the loss of a job during treatment of
depression may affect the course of depression to a far greater degree than the
particular intervention procedure. Menstruation may cause marked changes
in behavioral measures of anxiety. Even more disturbing are the multiple
unidentifiable sources of variability that cause broad fluctuation in a patient's
clinical course. Most applied researchers assume this variability is imposed
rather than intrinsic, but they may not know where to begin to factor out the
sources.
The solution, as in basic research, has been to accept broad variability as an
unavoidable evil, to employ experimental design and statistics that hopefully
control variability, and to look for functional relations that supersede the
"error."
As Sidman observed when discussing these tactics in basic research:
Although one may question this strategy in basic research, as Sidman has, the
amount of control an experimenter has over the behavioral history and
current environmental variables impinging on the laboratory animal makes
this strategy at least feasible. In applied research, when control over behav-
ioral histories or even current environmental events is limited or nonexistent,
there is far less probability of discovering a treatment that is effective over
and above these uncontrolled variables. This, of course, was the major cause
of the inability of early group comparison studies to demonstrate that the
treatment under consideration was effective. As noted in chapter 1, some
clients were improving while others were worsening, despite the presence of
the treatment. Presumably, this variability was not intrinsic but due to current
life circumstances of the clients.
Repeated measures
can offer only a minimum of information about the patient state. While such
information is literally better than no information, it provides no more data than
does any other statistical sample of one (1967, p. 182)
These terms are "ad hoc" definitions which move the focus of inquiry away from
repetitive patterns with observable frequencies to fixed momentary states. But
this notion of the momentary present is specious and deceptive; it is neither fixed
nor momentary nor immediately present, but an inferred condition (p. 39).
obvious. But the search for sources of individual variability cannot be re-
stricted to repeated measures of one small segment of a client's course
somewhere between the beginning and the end of treatment, as in process
research. With the multitude of events impinging on the organism, significant
behavior fluctuation may occur at any time from the beginning of an
intervention until well after completion of treatment. The necessity of re-
peated, frequent measures to begin the search for sources of individual
variability is apparent. Procedures for repeated measures of a variety of
behavior problems are described in chapter 4.
A prior design in which variables are distributed, for example, in a Latin square,
may be a severe handicap. When effects on behavior can be immediately
observed, it is more efficient to explore relevant variables by manipulating them in
an improvised and rapidly changing design. Similar practices have been responsi-
ble for the greater part of modern science (Honig, 1966, p. 21).
More recently, this feature of single-case designs has been termed response
guided experimentation (Edgington, 1983, 1984).
[Figure 2-2 graphic not reproduced. Phase labels: Backward Presentation, Classical Conditioning, Simultaneous Presentation. Horizontal axis: Individual Sessions.]
FIGURE 2-2. Mean penile circumference change to male and female slides expressed as a
percentage of full erection and total heterosexual urges and fantasies collected from 4 days
surrounding each session. Data are presented for individual sessions with circumference change to
males averaged over each phase. Mean UCR percentage is indicated for each treatment session.
(Figure 2, p. 40, from: Herman, S. H., Barlow, D. H., and Agras, W. S. [1974]. An experimental
analysis of classical conditioning as a method of increasing heterosexual arousal in homosexuals.
Behavior Therapy, 5, 33-47. Copyright 1974 by Association for the Advancement of Behavior
Therapy. Reproduced by permission.)
that 30 seconds of viewing the female slide alone was followed by 30 seconds
of viewing both the male and female slides simultaneously (side by side),
followed by 30 seconds of the male slide alone. This adjustment (labeled
simultaneous presentation) produced increases in heterosexual arousal in the
separate measurement sessions, which reversed during a return to the original
classical conditioning procedure and increased once again during the second
phase, in which the slides were presented simultaneously. The experiment
suggested that classical conditioning was also effective with this client but
only after a sensitive temporal adjustment was made.
Merely observing the "outcome" of the 2 subjects at the end of a fixed
point in time would have produced the type of intersubject variability so
common in outcome studies of therapeutic techniques. That is, one subject
would have improved with the initial classical conditioning procedure
whereas one subject would have remained unchanged. If this pattern contin-
ued over additional subjects, the result would be the typical weak effect
(Bergin & Strupp, 1972) with large intersubject variability. Highlighting the
variability through repeated measurement in the individual and improvising a
new experimental design as soon as a variation in response was noted (in this
case no response) allowed an immediate search for the cause of this unrespon-
siveness. It should also be noted that this research tactic resulted in immediate
[Figure 2-3 graphic not reproduced. Phase labels: Baseline, Female Exposure, Male Exposure, Female Exposure. Plotted series: circumference change to females and to males.]
FIGURE 2-3. Mean penile circumference change expressed as a percentage of full erection to
nude female (averaged over blocks of three sessions) and nude male (averaged over each phase)
slides. (Figure 1, p. 338, from: Herman, S. H., Barlow, D. H., and Agras, W. S. [1974]. An
activities during the day, such as games, shopping expeditions, meetings with
her mother, and other social visits. These daily recordings revealed that
asthmatic attacks most often followed meetings with the patient's mother,
particularly if these meetings occurred in the home of the mother. After this
relationship was demonstrated, the patient experienced a change in her life
circumstances which resulted in moving some distance away from her mother.
During the ensuing 20 months, only nine attacks were recorded despite the
fact that these attacks had occurred daily for a period of 2 years prior to
intervention. What is more remarkable is that eight of the attacks followed
her now infrequent visits to her mother.
Once again, the procedure of repeated measurement highlighted individual
fluctuation, allowing a search for correlated events that bore potential causal
relationships to the behavior disorder. It should be noted that no experimental
analysis was undertaken in this case to isolate the mother as the cause of
asthmatic attacks. However, the dramatic reduction of high-frequency attacks
after decreased contact with the mother provided reasonably strong
evidence about the contributory effects of visits to the mother, in an A-B
fashion. What is more convincing, however, is the reoccurrence of the attacks
at widely spaced intervals after visits to the mother during the 20-month
follow-up. This series of naturally occurring events approximates a contrived
A-B-A-B . . . design and effectively isolates the mother's role in the patient's
asthmatic attacks (see chapter 5).
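
The contrast described in this case can be made explicit with a little arithmetic. The following Python sketch is illustrative only. It uses the figures reported above (daily attacks before the move, nine attacks in the 20-month follow-up, eight of them after visits to the mother) and assumes a 30-day month, which is not stated in the case report. All variable names are hypothetical.

```python
# Figures from the case description; DAYS_PER_MONTH is an assumption
# introduced here only to put both periods on the same monthly scale.
DAYS_PER_MONTH = 30

# Before the move: attacks were reported to occur daily.
baseline_rate = 1 * DAYS_PER_MONTH  # attacks per month

# Follow-up: nine attacks recorded over 20 months.
followup_attacks = 9
followup_months = 20
followup_rate = followup_attacks / followup_months  # attacks per month

# Eight of the nine follow-up attacks followed a visit to the mother.
attacks_after_visits = 8
share_after_visits = attacks_after_visits / followup_attacks

print(f"Baseline: about {baseline_rate} attacks per month")
print(f"Follow-up: {followup_rate:.2f} attacks per month")
print(f"Share of follow-up attacks that followed a visit: {share_after_visits:.0%}")
```

These numbers summarize a correlation only. As the text notes, no experimental analysis was undertaken, so the calculation describes the size of the change rather than demonstrating its cause.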
behavior occur that cannot be correlated with any one variable. In these
cases, close examination of repeated measures of the target behavior and
correlated internal or external events does not produce an obvious relation-
ship. Most likely, many events may be correlated at one time or another with
deterioration or improvement in a client. At this point, it becomes necessary
to employ sophisticated experimental designs if one is to search for the source
of variability. The experienced applied researcher must first choose the most
likely variables for investigation from among the many impinging on the
client at any one time. In the case described above, not only visits to the
mother but visits to other relatives as well as stressful situations at work might
all have contributed to the variance. The task of the clinical investigator is to
tease out the relevant variables by manipulating one variable, such as visits to
mother, while holding other variables constant. Once the contribution of
visits to mother to behavioral fluctuation has been determined, the investiga-
experiment demonstrated that size of meals was related to caloric intake only
if feedback and reinforcement were present. This discovery led to inclusion of
this procedure in a recommended treatment package for anorexia nervosa.
Experimental designs to determine the effects of combinations of variables
will be discussed in section 6.6 of chapter 6.
[Figure 2-4 graphic not reproduced. Plotted series: weight and caloric intake. Horizontal axis: Days.]
FIGURE 2-4. Data from an experiment examining the effect of feedback on the eating behavior
of a patient with anorexia nervosa (Patient 4). (Figure 3, p. 283, from: Agras, W. S., Barlow, D.
H., Chapin, H. N., Abel, G. G., and Leitenberg, H. [1974]. Behavior modification of anorexia
nervosa. Archives of General Psychiatry, 30, 279-286. Copyright 1974 by American Medical
Association. Reproduced by permission.)
[Figure 2-5 graphic not reproduced. Phase labels: Reinforcement, Feedback. Horizontal axis: Days.]
FIGURE 2-5. Caloric intake presented on a daily basis during reinforcement and reinforcement
and feedback phases for the patient whose data is presented in Figure 2-4. (Replotted from Figure
3, p. 283, from: Agras, W. S., Barlow, D. H., Chapin, H. N., Abel, G. G., and Leitenberg, H.
[1974]. Behavior modification of anorexia nervosa. Archives of General Psychiatry, 30, 279-286.
Copyright 1974 by American Medical Association. Reproduced by permission.)
in blocks of 2 days. The averaged data, however, present a clear picture of the
effect of the variable over time. Since the major purpose of the experiment
was to demonstrate the effects of various therapeutic variables with anorex-
ics, we chose to present the data in this way. It was not our intention,
however, to ignore the daily variability. The fairly regular pattern of change
suggests several environmental or metabolic factors that may account for
these changes. If one were interested in more basic research on eating patterns
in anorexics, one would have to explore possible sources of this variability in
a finer analysis than we chose to undertake here.
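
The trade-off between averaged and daily presentation can be sketched in a few lines of Python. The values below are hypothetical and are not the data from the anorexia nervosa experiment. They are constructed to show a regular day-to-day alternation superimposed on a gradual increase, and the helper name `block_means` is an assumption introduced for illustration.

```python
from statistics import mean, pstdev

def block_means(values, block_size=2):
    """Average consecutive, non-overlapping blocks of observations."""
    return [mean(values[i:i + block_size])
            for i in range(0, len(values), block_size)]

# Hypothetical daily caloric intake: alternating low and high days
# around a slowly rising level (illustrative values only).
daily = [1500, 2500, 1600, 2600, 1700, 2700, 1800, 2800]

blocked = block_means(daily, block_size=2)

print("Daily values:   ", daily)
print("2-day averages: ", blocked)
print("Spread of daily values:   ", round(pstdev(daily), 1))
print("Spread of 2-day averages: ", round(pstdev(blocked), 1))
```

In this constructed example the 2-day averages rise smoothly, which makes the trend easy to see, while the alternation between low and high days disappears from view. That is the reason for the recommendation to make the unaveraged data available whenever the source of intrasubject variability is itself of interest.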
It is possible, of course, that feedback might not have produced the clear
and clinically relevant increase noted in these data. If feedback resulted in a
small increase in caloric intake that was clearly visible only when data were
averaged, one would have to resort to statistical tests to determine if the
increase could be attributed to the therapeutic variable over and above the
day-to-day variability (see chapter 9). Once again, however, one may question
the clinical relevance of the therapeutic procedure if the improvement in
behavior is so small that the investigator must use statistics to determine if
change actually occurred. If this situation obtained, the preferred strategy
might be to improvise on the experimental design and augment the therapeutic
procedure such that more relevant and substantial changes were produced.
The issue of clinical versus statistical significance, which was discussed
in some detail above, is a recurring one in single-case research. In the last
analysis, however, this is always reduced to judgments by therapists, educa-
tors, etc. on the magnitude of change that is relevant to the setting. In most
cases, these magnitudes are greater than changes that are merely statistically
significant.
The above example notwithstanding, the conservative and preferred ap-
proach of data presentation in single-case research is to present all of the data
so that other investigators may examine the intrasubject variability firsthand
and draw their own conclusions on the relevance of this variability to the
problem.
Large intrasubject variability is a common feature during repeated mea-
surements of target behaviors in a single-case, particularly in the beginning of
an experiment, when the subject may be accommodating to intrusive mea-
sures. How much variability the researcher is willing to tolerate before
And again,
It is unrealistic to expect that a given variable will have the same effects upon all
subjects under all conditions. As we identify and control a greater number of the
conditions that determine the effects of a given experimental operation, in effect
we decrease the variability that may be expected as a consequence of the opera-
tion. It then becomes possible to produce the same results in a greater number of
subjects. Such generality could never be achieved if we simply accepted inter-
subject variability and gave equal status to all deviant subjects in an investigation
(p. 190).
which treatments are most effective with a given client in a given setting.
Typically, clinicians have looked to the applied researcher to answer these
questions.
The most obvious limitation in studying a single-case is that one does not
know if the results from this case would be relevant to other cases. Even if
one isolates the active therapeutic variable in a given client through a rigorous
single-case experimental design, critics note that there is little basis for infer-
ring that this therapeutic procedure would be equally effective when applied
to clients with similar behavior disorders (client generality) or that different
therapists using this technique would achieve the same results (therapist
generality). Finally, one does not know if the technique would work in a
different setting (setting generality). This issue, more than any other, has
retarded the development of single-case methodology in applied research and
has caused many authorities on research to deny the utility of studying a
single-case for any other purpose than the generation of hypotheses (e.g.,
Kiesler, 1971). Conversely, in the search for generality of applied research
findings, the group comparison approach appeared to be the logical answer
(Underwood, 1957).
In the specific area of individual human behavior, however, there are issues
that limit the usefulness of a group approach in establishing generality of
findings. On the other hand, the newly developing procedures of direct,
systematic, and clinical replication offer an alternative, in some instances, for
establishing generality of findings relevant to individuals. The purpose of this
section is to outline the major issues, assumptions, and goals of generality of
findings as related to behavior change in an individual and to describe the
advantages and disadvantages of the various procedures for establishing
generality of findings.
outside of the study. As Edgington (1967) pointed out, "In the absence of
random samples hypothesis testing is still possible, but the significance state-
ments are restricted to the effect of the experimental treatments on the
subjects actually used in the experiment, generalization to other individuals
being based on logical nonstatistical considerations" (p. 195). If one wishes to
make statements about effectiveness of a treatment across therapists or
settings, random samples of therapists and settings must also be included in
the study.
Random sampling of characteristics in the animal laboratories of experimental
psychology is feasible, at least across subjects, since most relevant
characteristics such as genetic and environmental determinants of individual
behavior can be controlled. In clinical or educational research, however, it is
ments in recent years (Spitzer, Forman, & Nee, 1979), makes it very difficult
to determine the adequacy of a given sample. In addition, the therapeutic
emphasis may differ from setting to setting. In one center, bizarre behavior
and hallucinations may be emphasized. In another center, a thought disorder
may be the primary target of assessment (Neale & Oltmanns, 1980; Wallace,
Boone, Donahoe, & Foy, in press).
A second problem that arises when one is attempting an adequate sample
of a population is the availability of clients who have the needed behavior or
characteristics to fill out the sample (see chapter 1, section 1.5). In laboratory
animal research this is not a problem because subjects with specified characteristics
or genetic backgrounds can be ordered or produced in the laboratories.
In applied research, however, one must study what is available, and this
may result in a heavy weighting on certain client characteristics and inade-
quate sampling of other characteristics. Results of a treatment applied to this
sample cannot be generalized to the population. For example, techniques to
control disruptive behavior in the classroom will be less than generalizable if
they are tested in a class where students are from predominantly middle-class
suburbs and inner-city students are underrepresented.
Even in the great snake phobic epidemic of the 1960s, where the behavior
in question was circumscribed and clearly defined, the clients to whom
various treatments were applied were almost uniformly female college sopho-
mores whose fear was neither too great (they could not finish the experiment
on time) nor too little (they would finish it too quickly). Most investigators
admitted that the purpose of these experiments was not to generalize treat-
ment results to clinical populations, but to test theoretical assumptions and
generate hypotheses. The fact remains, however, that these results cannot even
be generalized beyond female college sophomores to the population of snake
fearers, where age, sex, and amount of fear would all be relevant.
It should be noted that all examples above refer to generality of findings
across clients with similar behavior and background characteristics. Most
studies at least consider the importance of generality of findings along this
dimension, although few have been successful. What is perhaps more impor-
tant is the failure of most studies to consider the generality problem in the
other two dimensions, namely, setting generality and behavior change agent
(therapist) generality. Several investigators (e.g., Kazdin, 1973b, 1980b;
McNamara & MacDonough, 1972) have suggested that this information may
be more important than client generality. For example, Paul (1969) noted
after a survey of group studies that the results of systematic desensitization
seemed to be a function of the qualifications of the therapist rather than
differences among clients. Furthermore, in regard to setting generality,
Brunswick (1956) suggested that, "In fact, proper sampling of situations and
problems may be in the end more important than proper sampling of subjects
considering the fact that individuals are probably on the whole much more
alike than are situations among one another" (p. 39). Because of these
problems, many sophisticated investigators specializing in research methodol-
ogy have accepted the impracticability of random sampling in this context
and have sought other methods for establishing generality (e.g., Kraemer,
1981).
The failure to be able to make statistically inferential statements, even
about populations of clients based on most clinical research studies, does not
mean that no statements about generality can be made. As Edgington (1966)
pointed out, one can make statements at least on generality of findings to
similar clients based on logical non-statistical considerations. Edgington re-
ferred to this as logical generalization, and this issue, along with generality to
while the other group becomes the no-treatment control. This arrangement,
which has characterized much clinical and educational research, suffers for
two reasons: (1) to the extent that the "available" clients are not a random
sample, one cannot generalize to the population; and (2) to the extent that the
group is heterogeneous on any of a number of characteristics, one cannot
make statements about the individual. The only statement that can be made
concerns the average response of a group with that particular makeup which,
unfortunately, is unlikely to be duplicated again. As Bergin (1966) noted, it
was even difficult to say anything important about individuals within the
group based on the average response because his analysis demonstrated that
some were improving and some deteriorating (see Strupp & Hadley, 1979).
The result, as Chassan (1967, 1979) eloquently pointed out, was that the
behavior change agent did not know which treatment or aspect of treatment
was effective that was statistically better than no treatment but that actually
might make a particular patient worse.
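
A small numerical sketch may make the point concrete. The change scores below are hypothetical and are not taken from Bergin's analysis. They are chosen so that the treated group shows a better average than the control group even though several treated clients deteriorate. The function name `summarize` is an assumption introduced for illustration.

```python
from statistics import mean

def summarize(changes):
    """Return the mean change plus counts of clients who improved or worsened."""
    return {
        "mean_change": mean(changes),
        "improved": sum(1 for c in changes if c > 0),
        "deteriorated": sum(1 for c in changes if c < 0),
    }

# Hypothetical pre-to-post change scores (positive = improvement).
treated = [8, 7, 9, -6, -5, 6, -7, 8]
control = [1, 0, -1, 2, 0, 1, -1, 0]

print("Treated:", summarize(treated))
print("Control:", summarize(control))
```

In this constructed example the treated group averages a larger improvement than the control group, yet three of the eight treated clients are worse off than when they began. The group average alone would not tell a clinician which kind of client the next patient is likely to resemble.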
What Bergin and Strupp (1972) and others (e.g., Kiesler, 1971; Paul, 1967)
recognized was that if anything important was going to be said about the
individual, after experimenting with a group, then the group would have to be
homogeneous for relevant client characteristics. For example, in a study of a
group of agoraphobics, they should all be in one age-group with a relatively
homogeneous amount of fear and approximately equal background (per-
sonality) variables. Naturally, clients in the control group must also be
homogeneous for these characteristics.
Although this approach sacrifices random sampling and the ability to make
inferential statements about the population of agoraphobics, one can begin to
say something about agoraphobics with the same or similar characteristics as
those in the study through the process of logical generalization (Edgington,
1967, 1980a). That is, if a study shows that a given treatment is successful
with a homogeneous group of 20- to 30-year-old female agoraphobics with
certain personality characteristics, then a clinician can be relatively confident
that a 25-year-old female agoraphobic with those personality characteristics
will respond well to that same treatment. (Recently some experts have suggested
that one should not assemble groups that are too homogeneous, for
even the ability to generalize on more logical grounds might be greatly
restricted [Kraemer, 1981].)
The process of logical generalization depends on similarities between the
patients in the homogeneous group and the individual in question in the
clinician's office. Which features of a case are important for extending logical
generalization and which features can be ignored (e.g., hair color) will depend
on the judgment of the clinician and the state of knowledge at the time. But if
one can generalize in logical fashion from a patient whose results or charac-
teristics are well specified as part of a homogeneous group, then one can also
Kazdin, 1980b, 1982b; Underwood, 1957), the sections to follow will describe
our views of the relative merits of replication studies versus generalization
from homogeneous groups.
As a basis for comparison, it is useful to compare the single-case approach
with Paul's (1967, 1969) incisive analysis of the power of various experimental
designs using groups of clients. Within the context of the power of these
various designs to establish cause-effect relationships, Paul reviewed the
several procedures commonly used in applied research. These procedures
range from case studies with and without measurement, from which cause-
effect relationships can seldom if ever be extracted, through series of cases
typically reporting percentage of success with no control group. Finally, Paul
cited the two major between-group experimental designs capable of establish-
ing functional relationships between treatments and the average response of
clients in the group. The first is what Paul referred to as the nonfactorial
design with no-treatment control, in other words the comparison of an
experimental (treatment) group with a no-treatment control group. The sec-
ond design is the powerful factorial design, which not only establishes cause-
effect relations between treatments and clients but also specifies what type of
clients under what conditions improve with a given treatment; in other words,
client-treatment interactions. The single-case replication strategy paralleling
the nonfactorial design with no-treatment control is direct replication. The
replication strategy paralleling the factorial design is called systematic replica-
tion.
relevant domains such as client, therapist, and setting. We would agree with
Paul's notions that the level of product of a single-case experimental design
only "approaches" that of treatment/no-treatment group designs, but for
somewhat different reasons. It is our contention that the single-case A-B-A
design approaches rather than equals the nonfactorial group design with no-
treatment controls only because the number of clients is considerably less in a
single-case design (N = 1) than in a group design, where 8, 10, or more clients
are not uncommon. It is our further contention that, in terms of external
validity or generality of findings, a series of single-case designs in similar
clients in which the original experiment is directly replicated three or four
times can far surpass the experimental group/no-treatment control group
design. Some of the reasons for this assertion are outlined next.
Results generated from an experimental group/no-treatment control group
study as well as a direct replication series of single-case experimental designs
yield some information on generality of findings across clients but cannot
address the question of generality across different therapists or settings.
Typically, the group study employs one therapist in one setting who applies a
given treatment to a group of clients. Measures are taken on a pre-post basis.
Premeasures and postmeasures are also taken from a matched group of
clients in the control group who do not receive the intervening treatment. For
example, 10 depressive patients homogeneous on behavioral and emotional
aspects of their depression, as well as personality characteristics, would be
compared to a matched group of patients who did not receive treatment.
Logical generalization to other patients (but not to other therapists or set-
tings) would depend on the degree of homogeneity among the depressives in
both groups. As noted above, the less homogeneous the depression in the
experiment, the greater the difficulty for the practicing clinician in determin-
ing if that treatment is effective for his or her particular patient. A solution to
this problem would be to specify in some detail the characteristics of each
patient in the treatment group and present individual data on each patient.
The clinician could then observe those patients that are most like his or her
(noted in section 2.3). If a particular procedure works well in one case but
works less well or fails when attempts are made to replicate this in a second or
third case, slight alterations in the procedure can be made immediately. In
many cases, reasons for the inability to replicate the findings can be ascertained
immediately, assuming that procedural deficiencies were, in fact, responsible
for the lack of generality. An example of this result was outlined in
section 2.3, describing intersubject variability. In this example, one patient
improved with treatment, but a second did not. Use of an improvised
experimental design at this point allowed identification of the reason for
failure. This finding should increase generality of findings by enabling imme-
diate application of the altered procedure to another patient with a similar
findings across all important domains in applied research (within the limits
and procedures and guidelines for replication will be described in chapter 10.
Actuarial questions
systematic replication series, the results would be stated differently. Here the
investigator would say that under certain conditions the treatment works,
while under other conditions it does not work, and other therapeutic variables
must be added. While this statement might be adequate for the practicing
clinician or educator, little information on the magnitude of effect is con-
veyed. Because society supports research and, ultimately, benefits from it, this
individuals within that group. It may not be important that two or three
children remain somewhat out of order if the classroom is substantially more
quiet. A particularly good example is an experiment on the modification of
classroom noise reported in chapter 7, Figure 7-5 (C. W. Wilson & Hopkins,
1973). A similar approach might be desirable with any coexisting group of
people, such as a ward in a state hospital where the control of disruptive
Once again, it is a good idea to have a treatment that has been adequately
worked out on individuals before attempting to modify behavior of a group.
If not, the investigator will encounter intolerable intersubject variability that
will weaken the effects of the intervention.
designs, to highlight the differences. This need not be the case. As described
throughout this chapter, group designs could be carried out with close atten-
the individual data to fall back on. This would be important for purposes of
logical generalization, which forms the only rational basis for generalizing
results from one group of individual subjects to another individual subject. In
our experience as editors of major journals, data from group studies are
being reported increasingly in this manner, as investigators alter their underly-
That is, if one subject improves dramatically while another improves only
General Procedures in
Single-case Research
3.1. INTRODUCTION
Advantages of the experimental single-case design and general issues involved
in this type of research were briefly outlined in chapter 2. In the present
chapter a more detailed analysis of general procedures characteristic of all
experimental single-case research will be undertaken. Although previous
discussion of these procedures has appeared periodically in the psychological
and psychiatric literatures (Barlow & Hersen, 1973; Hersen, 1982; Kazdin,
1982b; Kratochwill, 1978b; Levy & Olson, 1979), a more comprehensive
analysis, from both a theoretical and an applied framework, is very much
needed.
A review of the literature on applied clinical research since the 1960s shows
that there is a substantial increase in the number of articles reporting the use
of the experimental single-case design strategy. These papers have appeared in
a wide variety of educational, psychological, and psychiatric journals. How-
ever, many researchers have proceeded without the benefit of carefully
thought-out guidelines, and, as a consequence, needless errors in design and
practice have resulted. Even in the Journal of Applied Behavior Analysis,
which is primarily devoted to the experimental analysis model of research,
errors in procedure and practice are not uncommon in reported investiga-
tions.
public, and replicable in all respects. When measurement techniques require
the use of human observers, independent reliability checks must be established
(see chapter 4 for specific details). Secondly, measurements taken
repeatedly, especially over extended periods of time, must be done under
exacting and totally standardized conditions with respect to measurement
devices used, personnel involved, time or times of day measurements are
recorded, instructions to the subject, and specific environmental conditions
(e.g., location) where the measurement sessions occur.
Deviations from any of the aforementioned conditions may well lead to
spurious effects in the data and might result in erroneous conclusions. This is
turned toward the dial, and, for the most part, by the same experimenter. In
this study, consistency of the experimenter was not considered crucial to
the A-phase of study (Barlow, Blanchard, Hayes, & Epstein, 1977; Barlow &
Hersen, 1973; Hersen, 1982; Risley & Wolf, 1972; Van Hasselt & Hersen,
1981). It should be noted that this phase was earlier labeled O1 O2 O3 O4 by
Campbell and Stanley (1966) in their analysis of quasi-experimental designs
for research (time series analysis).
The primary purpose of baseline measurement is to have a standard by
which the subsequent efficacy of an experimental intervention can be evalu-
ated. In addition, Risley and Wolf (1972) pointed out that, from a statistical
framework, the baseline period functions as a predictor for the level of the
target behavior attained in the future. A number of statistical techniques for
analyzing time series data have appeared in the literature (Edgington, 1982;
Wallace & Elder, 1980); the use of these methods will be discussed in chapter
9.
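The predictive role of the baseline can be illustrated with a minimal sketch. It assumes, purely for illustration, that a straight line fitted by least squares to the baseline points is an adequate predictor; this is not one of the statistical techniques cited above, and the function names and data are hypothetical.

```python
from statistics import mean

def fit_line(values):
    """Least-squares line through (1, y1), (2, y2), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two baseline points")
    xs = list(range(1, n + 1))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def project_baseline(baseline, n_future):
    """Extend the fitted baseline line over the next n_future sessions."""
    slope, intercept = fit_line(baseline)
    start = len(baseline) + 1
    return [intercept + slope * day for day in range(start, start + n_future)]

# Hypothetical data: four baseline sessions, then three treatment sessions.
baseline = [100, 110, 120, 130]
treatment = [120, 100, 80]

predicted = project_baseline(baseline, len(treatment))
departures = [obs - pred for obs, pred in zip(treatment, predicted)]
print(predicted)    # level expected had the baseline simply continued
print(departures)   # observed minus predicted, session by session
```

Large and consistent departures from the projected level are what the baseline, used as a standard, is meant to reveal. Whether a departure is meaningful remains a matter for visual inspection or the statistical methods discussed in chapter 9.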
Baseline stability
that can be applied to this question, but a number of suggestions have been
made. Baer, Wolf, and Risley (1968) recommended that baseline measure-
ment be continued over time "until its stability is clear" (p. 94). McNamara
and MacDonough concurred with Wolf and Risley's (1971) recommendation
that repeated measurement be applied until a stable pattern emerges. How-
ever, there are some practical and ethical limitations to extending initial
measurement beyond certain limits. The first involved a problem of logistics.
In the context of basic animal research, where the behavioral history of the
organism can be determined and controlled, Sidman (1960) has recom-
mended that, for stability, rates of behavior should be within a 5 percent
range of variability. Indeed, the "basic science" research is in a position to
create baseline data through a variety of interval and ratio scheduling effects.
However, even in animal research, where scheduling effects are programmed
to ensure stability of baseline conditions, there are instances where unex-
pected variations take place as a consequence of extrinsic variables. When
such variability is presumed to be extrinsic rather than intrinsic, Sidman
(1960) has encouraged the researcher to first examine the source of variability
through the method of experimental analysis. Then extrinsic sources of
variation can be systematically eliminated and controlled.
Sidman acknowledged, however, that the applied clinical researcher, by
virtue of his or her subject matter, when control over the behavioral history is
nearly impossible, is at a distinct disadvantage. He noted that "The behav-
ioral engineer must continuously take variability as he finds it, and deal with
it as an unavoidable fact of life" (Sidman, 1960, p. 192). He also acknowl-
edged that "The behavioral engineer seldom has the facilities or the time that
would be required to eliminate variability he encounters in a given problem"
(p. 193). When variability in baseline measurements is extensive in applied
clinical research, it might be useful to apply statistical techniques for purposes
of comparing one phase to the next. This would certainly appear to be the
case when such variability exceeds a 50 percent level. The use of statistics
under these circumstances would then meet the kind of criticism that has been
leveled at the applied clinical researcher who uses single-case methodology.
For example, Bandura (1969) argued that there is no difficulty in interpreting
performance changes when differences between phases are large (e.g., the
absence of overlapping distributions) and when such differences can be
replicated across subjects (see chapter 10). However, he underscored the
difficulties in reaching valid conclusions when there is "considerable variabil-
ity during baseline conditions" (p. 243).
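The two criteria discussed above, the percentage range of variability within a phase and the absence of overlapping distributions between phases, can be sketched as follows. The definition of "range of variability" used here (maximum minus minimum, as a percentage of the phase mean) is an assumption made for illustration; the text does not fix a formula. The function names and data are hypothetical.

```python
from statistics import mean

STABILITY_LIMIT = 5.0    # Sidman's (1960) suggested range for basic animal research
STATISTICS_LEVEL = 50.0  # level above which statistical comparison is suggested

def percent_range(values):
    """Range (max - min) as a percentage of the phase mean (assumed definition)."""
    centre = mean(values)
    if centre == 0:
        raise ValueError("phase mean is zero; percentage range is undefined")
    return 100.0 * (max(values) - min(values)) / centre

def phases_overlap(phase_a, phase_b):
    """True if the ranges of the two phases share at least one value."""
    return max(phase_a) >= min(phase_b) and max(phase_b) >= min(phase_a)

# Hypothetical tic frequencies over six days.
stable = [100, 102, 99, 101, 100, 98]
variable = [50, 200, 75, 225, 100, 250]

for name, phase in (("stable", stable), ("variable", variable)):
    spread = percent_range(phase)
    print(name, round(spread, 1),
          "within 5 percent" if spread <= STABILITY_LIMIT else
          "consider statistics" if spread > STATISTICS_LEVEL else
          "intermediate")

# Non-overlapping phases are the easy case Bandura (1969) described.
print(phases_overlap(stable, [40, 35, 30]))
```

The thresholds are taken directly from the passage; how variability is quantified in a given study is a choice the investigator must state explicitly.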
Examples of baselines
trated and described. Methods for dealing with each pattern will be outlined,
and an attempt to formulate some specific rules (a la cookbook style) will be
undertaken.
The issue concerning the ultimate length of the baseline measurement
phase was previously discussed in some detail. However, it should be pointed
out here that "A minimum of three separate observation points, plotted on
the graph, during this baseline phase are required to establish a trend in the
data" (Barlow & Hersen, 1973, p. 320). Thus three successively increasing or
decreasing points would constitute establishment of either an upward or
downward trend in the data. Obviously, in two sets of data in which the same
trend is exhibited, differences in the slope of the line will indicate the extent or
power of the trend. By contrast, a pattern in which only minor variation is
seen would indicate the recording of a stable baseline pattern. An example of
such a stable baseline pattern is depicted in Figure 3-1. Mean number of facial
tics averaged over three daily 15-minute videotaped sessions is presented for
a 6-day period. Visual inspection of these data reveals no apparent upward or
downward trend. Indeed, data points are essentially parallel to the abscissa,
while variability remains at a minimum. This kind of baseline pattern, which
shows a constant rate of behavior, represents the most desirable trend, as it
permits an unequivocal departure for analyzing the subsequent efficacy of a
treatment intervention. Thus the beneficial or detrimental effects of the
following intervention should be clear. In addition, should there be an absence
of effects following introduction of a treatment, it will also be apparent.
Absence of such effects, then, would graphically appear as a
continuation of the steady trend first established during the baseline measurement
phase.
FIGURE 3-1. The stable baseline. Hypothetical data for mean number of facial tics averaged
over three daily 15-minute videotaped sessions.
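The three-point rule quoted above can be sketched in a few lines. The sketch treats a baseline as showing a trend only when every successive point moves in the same direction, which is one simple reading of the rule; the function names and data are hypothetical, and the average step per observation is used as a crude stand-in for the slope of the line.

```python
def classify_trend(points):
    """Label a baseline 'upward', 'downward', or 'no consistent trend'."""
    if len(points) < 3:
        raise ValueError("at least three observation points are required")
    steps = [later - earlier for earlier, later in zip(points, points[1:])]
    if all(step > 0 for step in steps):
        return "upward"
    if all(step < 0 for step in steps):
        return "downward"
    return "no consistent trend"

def mean_step(points):
    """Average change per observation; larger magnitude means a steeper trend."""
    return (points[-1] - points[0]) / (len(points) - 1)

# Hypothetical tic frequencies.
steep = [50, 100, 150]
shallow = [50, 60, 70]
flat = [100, 102, 99, 101]

print(classify_trend(steep), mean_step(steep))
print(classify_trend(shallow), mean_step(shallow))
print(classify_trend(flat))
```

Both the steep and the shallow series show the same upward trend; the difference in average step reflects what the text calls the extent or power of the trend.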
A second type of baseline trend that frequently is encountered in applied
clinical research is such that the subject's condition under study appears to be
worsening (known as the deteriorating baseline; Barlow & Hersen, 1973).
Once again, using our hypothetical data on facial tics, an example of this kind
of baseline trend is presented in Figure 3-2. Examination of this figure shows
a steadily increasing linear function, with the number of tics observed aug-
menting over days. The deteriorating baseline is an acceptable pattern inas-
much as the subsequent application of a successful treatment intervention
should lead to a reversed trend in the data (i.e., a decreasing linear function
over days). However, should the treatment be ineffective, no change in the
slope of the curve would be noted. If, on the other hand, the treatment
application leads to further deterioration (i.e., if the treatment is actually
detrimental to the patient; see Bergin, 1966), it would be most difficult to
assess its effects using the deteriorating baseline. In other words, a differential
analysis as to whether a trend in the data was simply a continuation of the
baseline pattern or whether application of a detrimental treatment specifically
led to its continuation could not be made. Only if there appeared to be a
pronounced change in the slope of the curve following introduction of a
detrimental treatment could some kind of valid conclusion be reached on the
basis of visual inspection. Even then, the withdrawal and reintroduction of
the treatment would be required to establish its controlling effects. But from
both clinical and ethical considerations, this procedure would be clearly
unwarranted.
FIGURE 3-2. The increasing baseline (target behavior deteriorating). Hypothetical data for
mean number of facial tics averaged over three daily 15-minute videotaped sessions.
A baseline pattern that provides difficulty for the applied clinical researcher
is one that reflects steady improvement in the subject's condition during the
course of initial observation. An example of this kind of pattern appears in
Figure 3-3. Inspection of this figure shows a linear decrease in tic frequency
over a 6-day period. The major problem posed by this pattern, from a
research standpoint, is that application of a treatment strategy while improve-
ment is already taking place will not allow for an adequate assessment of the
intervention. Secondly, should improvement be maintained following initiation
of the treatment intervention, the experimenter would be unable to
attribute such continued improvement to the treatment unless a marked
change in the slope of the curve were to occur. Moreover, removal of the
treatment and its subsequent reinstatement would be required to show any
controlling effects.
An alternative (and possibly a more desirable) strategy involves the contin-
uation of baseline measurement with the expectation that a plateau will be
reached. At that point, a steady pattern will emerge and the effects of
treatment can then be easily evaluated. It is also possible that improvement
seen during baseline assessment is merely a function of some extrinsic vari-
able (Sidman, 1960) of which the experimenter is currently unaware. Follow-
ing Sidman's recommendations, it then behooves the methodical
experimenter, assuming that time limitations and clinical and ethical consider-
ations permit, to evaluate empirically, through experimental analysis, the
possible source (e.g., "placebo" effects) of covariation. The results of this
kind of analysis could indeed lead to some interesting hunches, which then
might be subjected to further verification through the experimental analysis
method (see chapter 2, section 2.3).
The extremely variable baseline presents yet another problem for the
FIGURE 3-3. The decreasing baseline (target behavior improving). Hypothetical data for
mean number of facial tics averaged over three daily 15-minute videotaped sessions.
FIGURE 3-4. The variable baseline. Hypothetical data for mean number of facial tics
ate probability levels will not be reached. Further details regarding graphic
presentation and statistical analyses of data will appear in chapter 9.
A final strategy for dealing with the variable baseline is to assess systemati-
cally the sources of variability. However, as pointed out by Sidman (1960), the
amount of work and time involved in such an analysis is better suited to the
"basic scientist" than the applied clinical researcher. There are times when the
clinical researcher will have to learn to live with such variability or to select
measures that fluctuate to a lesser degree.
Another possible baseline pattern is one in which there is an initial period of
deterioration, which is then followed by a trend toward improvement (see
Figure 3-6). This type of baseline (increasing-decreasing) poses a number of
problems for the experimenter. First, when time and conditions permit, an
empirical examination of the covariants leading to reversed trends would be
of heuristic value. Second, while the trend toward improvement is continued
in the latter half of the baseline period of observation, application of a
treatment will lead to the same difficulties in interpretation that are present in
the improving baseline, previously discussed. Therefore, the most useful
course of action to pursue involves continuation of measurement procedures
until a stable and steady pattern emerges.
FIGURE 3-5. The variable-stable baseline. Hypothetical data for mean number of facial tics
averaged over three daily 15-minute videotaped sessions.
FIGURE 3-6. The increasing-decreasing baseline. Hypothetical data for mean number of
facial tics averaged over three daily 15-minute videotaped sessions.
FIGURE 3-7. The decreasing-increasing baseline. Hypothetical data for mean number of
facial tics averaged over three daily 15-minute videotaped sessions.
FIGURE 3-8. The unstable baseline. Hypothetical data for mean number of facial tics
averaged over three daily 15-minute videotaped sessions.
conditions (A), while Phase 4 consists of social reinforcement alone (C). Here
we have an example of an A-BC-A-C design, with A = baseline, BC = token
and social reinforcement, A = baseline, and C = social reinforcement. In
this experiment the researcher is hopeful of teasing out the relative effects of
token and social reinforcement. However, this is a totally erroneous assumption
on his or her part. From the A-BC-A portion of this experiment, it is feasible
only to assess the combined BC effect over baseline (A), assuming that the
appropriate trends in the data appear. Evaluation of the individual effects of
the two variables (social and token reinforcement) comprising the treatment
package is not possible. Moreover, application of the C condition (social
reinforcement alone) following the second baseline also does not permit firm
conclusions, either with respect to the effects of social reinforcement alone or
in contrast to the combined treatment of token and social reinforcement. The
experimenter is not in a position to examine the interactive effects of the BC
and C phases, as they are not adjacent to one another.
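The rule of changing one variable at a time can be checked mechanically from the design notation. The following is a minimal sketch, assuming that each letter other than A names one treatment component and that A denotes baseline with no component active; the helper names are hypothetical. The extended design A-B-A-B-BC-B-BC, discussed in this section, is included for comparison.

```python
def parse_design(design):
    """Turn 'A-BC-A-C' into a list of component sets; 'A' (baseline) is empty."""
    return [frozenset(ch for ch in label if ch != "A")
            for label in design.split("-")]

def adjacent_changes(design):
    """Components that differ between each pair of adjacent phases."""
    phases = parse_design(design)
    return [sorted(before ^ after) for before, after in zip(phases, phases[1:])]

def follows_one_variable_rule(design):
    """True when every phase change alters exactly one component."""
    return all(len(changed) == 1 for changed in adjacent_changes(design))

for design in ("A-BC-A-C", "A-B-A-B-BC-B-BC"):
    print(design, adjacent_changes(design), follows_one_variable_rule(design))
```

In A-BC-A-C the first two phase changes each alter both B and C, so the individual contributions cannot be separated. Passing the check is necessary rather than sufficient: the final A-to-C change in that design alters one component, yet, as noted above, it still does not permit firm conclusions.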
If our experimenter were interested in accurately evaluating the interactive
effects of token and social reinforcement, the following extended design
would be considered appropriate: A-B-A-B-BC-B-BC. When this experimen-
tal strategy is used, the interactive effects of social and token reinforcement
FIGURE 3-9. Time in which a knife was kept exposed by a phobic patient as a function of
feedback, feedback plus praise, and no feedback or praise conditions. (Figure 2, p. 131, from
Leitenberg, H., Agras, W. S., Thomson, L., & Wright, D. E. (1968), Feedback in behavior
modification: An experimental analysis in two phobic cases. Journal of Applied Behavior
Analysis, 1, 131-137. Copyright 1968 by Society for the Experimental Analysis of Behavior, Inc.
Reproduced by permission.)
FIGURE 3-10. Each point represents one session and indicates the number of intervals in which
the subject was out of his seat (top) or talking without permission (bottom). A total of 90 such
intervals was possible within a 15-minute session. Asterisks over points indicate sessions that
resulted in time being spent in the booth. (Figure 1, p. 237, from: Ramp, E., Ulrich, R., &
Dulaney, S. (1971). Delayed timeout as a procedure for reducing disruptive classroom behavior:
A case study. Journal of Applied Behavior Analysis, 4, 235-239. Copyright 1971 by Society for
the Experimental Analysis of Behavior, Inc. Reproduced by permission.)
FIGURE 3-11. Proportion of total intervals in which Bang (punished) and Bite (unpunished)
responses were recorded for S1 in 47 free-play periods. (Figure 1, p. 88, from: Pendergrass, V. E.
(1972). Timeout from positive reinforcement following persistent, high-rate behavior in retardates.
Journal of Applied Behavior Analysis, 5, 85-91. Copyright 1972 by Society for Experimen-
FIGURE 3-12. Mean number of looks and smiles for three couples in 10-second intervals plotted
in blocks of 2 minutes for the Videotape Feedback Plus Focused Instructions Design. (Figure 3,
p. 556, from: Eisler, R. M., Hersen, M., & Agras, W. S. (1973). Effects of videotape and
instructional feedback on nonverbal marital interaction: An analog study. Behavior Therapy, 4,
551-558. Copyright 1973 by Association for the Advancement of Behavior Therapy. Reproduced
by permission.)
FIGURE 3-13. Total score on card sort per experimental day and total frequency of pedophilic
sexual urges in blocks of 4 days surrounding each experimental day. (Lower scores indicate less
sexual arousal.) (Figure 1, p. 599, from: Barlow, D. H., Leitenberg, H., & Agras, W. S. (1969).
Experimental control of sexual deviation through manipulation of the noxious scene in covert
sensitization. Journal of Abnormal Psychology, 74, 596-601. Copyright 1969 by the American
Psychological Association. Reproduced by permission.)
(1973) have labeled this sequence the A-A1-B-A1-B design. More specifically,
Once again, pursuing the one variable rule, Liberman et al. (1973) have
shown how the combined effects of drugs and behavioral manipulations can
be evaluated. Maintaining a constant level of medication (600 mg of
chlorpromazine per day), the controlling effects of time-out on delusional
behavior (operationally defined) were examined as follows: (1) baseline plus
600 mg of chlorpromazine, (2) time-out plus 600 mg of chlorpromazine, and
(3) removal of time-out plus 600 mg of chlorpromazine. In this study
(AB-CB-AB) the only variable manipulated across phases was the time contingency.
There are several other important issues related to the investigation of drug
effects in single-case experimental designs that merit careful analysis. They
include the double-blind evaluation of results, long-term carryover effects of
phenothiazines, and length of phases. These will be discussed in some detail
in section 3.6 of this chapter and in chapter 7.
numerous extensions and permutations (see chapter 5 for details) are usually
placed in this category (Barlow et al., 1977; Barlow & Hersen, 1973; Hersen,
1982; Kazdin, 1982b; Van Hasselt & Hersen, 1981).
When speaking of a reversal, one typically refers to the removal (with-
drawal) of the treatment variable that is applied after baseline measurement
has been concluded. In practice, the reversal involves a withdrawal of the B
phase (in the A-B-A design) after behavioral change has been successfully
demonstrated. If the treatment (B phase) indeed exerts control over the
targeted behavior under study, a decreased or increased trend (depending on
which direction indicates deterioration) in the data should follow its removal.
In describing their experimental efforts when using A-B-A designs, applied
clinical researchers frequently have referred to both their procedures and
resulting data as reversals. This, then, represents a terminological confusion
between the independent variable and the dependent variable. However, from
either a semantic, logical, or scientific standpoint, it is untenable that both a
cause and an effect should be given an identical label. A careful analysis
reveals that a reversal involves a specific technical operation, and that its
FIGURE 3-14. Daily percentages of time spent in social interaction with adults and with children
during approximately 2 hours of each morning session. (Figure 2, p. 515, from: Allen, K. E.,
Hart, B. M., Buell, J. S., Harris, F. R., & Wolf, M. M. (1964). Effects of social reinforcement on
isolate behavior of a nursery school child. Child Development, 35, 511-518. Copyright 1964.
Reproduced by permission of The Society for Research in Child Development, Inc.)
Withdrawal of treatment
The specific point at which the experimenter removes the treatment vari-
able (second A phase in the A-B-A design) in the withdrawal design is
multidetermined. Among the factors to be considered are time limitations
imposed by the treatment setting, staff cooperation when working in institu-
tions (J. M. Johnston, 1972), and ethical considerations when removal of
treatment can possibly lead to some harm to the subject (e.g., head banging
in a retardate) or others in the environment (e.g., physical assaults toward
General Procedures in Single-case Research 91
15
9 11 13 15 17
DAYS
FIGURE 3-15. Increasing treatment phase followed by decreasing baseline. Hypothetical data
for frequency of social responses in a schizophrenic patient per 2-hour period of observation.
[Figure 3-16 phase labels: Baseline, Contingent Reinforcement, Baseline. x-axis: Days.]
FIGURE 3-16. High-level treatment phase followed by low-level baseline. Hypothetical data
for frequency of social responses in a schizophrenic patient per 2-hour period of observation.
[Figure omitted; phase labels: Baseline, Contingent Reinforcement, Baseline.]
in Figure 3-18. Inspection of the figure reveals that after a stable pattern is
last data point in contingent reinforcement is clearly above the highest point
achieved in baseline. Removal of treatment and a return to baseline conditions
on Day 13 similarly result in a decreasing trend in the data. Therefore,
no conclusions as to the controlling effects of contingent reinforcement are
possible, as it is not clear whether the decreasing trend in the second baseline
is a function of the treatment's withdrawal or mere continuation of the trend
begun during treatment. Even if withdrawal of treatment were to lead to the
stable low-level pattern seen in the first baseline period, the same problems in
interpretation would be posed.
When the aforementioned trend appears during the course of experimental
treatment, it is recommended that the phase be continued until a more
consistent pattern emerges. However, if this strategy is pursued, the equiva-
lent length of adjacent phases is altered (see section 3.6). A second strategy,
although admittedly somewhat weak, is to reintroduce treatment in Phase 4
(thus, we have an A-B-A-B design), with the expectation that a reversed trend
in the data will reflect improvement. There would then be limited evidence for
the treatment's controlling effects.
A similar problem ensues when treatment is withdrawn in the example that
appears in Figure 3-19. In spite of an initial upward trend in the data when
contingent reinforcement is first introduced (B), the decreasing trend in the
latter half of the phase, which is then followed by a similar decline during the
second baseline (A), prevents an analysis of the treatment's controlling effects.
Therefore, the same recommendations made in the case of Figure 3-18
apply here.
[Two figures omitted; phase labels: Baseline, Contingent Reinforcement, Baseline. x-axis: Days.]
effects and it exerts control over the targeted behavior being examined, then,
when reinstated, its controlling effects will be established. To the contrary,
Krasner (1971b) reported that recovery of low levels of baseline
initially
(1966). Leitenberg (1973) argued that "In such cases, where the therapeutic
procedure cannot be introduced and withdrawn at will, sequential ABA
designs are obviated" (p. 98). Under these circumstances, the use of alterna-
tive experimental strategies such as multiple baseline (Hersen, 1982) or al-
ing individual and relative length of phases, carryover effects and cyclic
variations. In addition, these considerations will be examined as they apply to
the study of drugs on behavior.
He notes further:
FIGURE 3-20. Extension of the treatment phase in an attempt to show its effects. Hypothetical
data in which the effects of time-out on daily frequency of hitting other children (based on a 2-
hour free-play situation) in a 3-year-old male child are examined.
Training (RCT) in a "secondary enuretic" child (see Figure 3-21). Two target
behaviors, number of enuretic episodes and mean frequency of daily urination,
were selected for study in an A-B-A-B experimental design. During
baseline, the child recorded the natural frequency of target behaviors and
received counseling from the experimenter on general issues relating to home
and school. Following baseline, the first week of RCT involved teaching the
child to postpone urination for a 10-minute period after experiencing each
urge. Delay of urination was increased to 20 and 30 minutes in the next 2
weeks. During Weeks 7-9 RCT was withdrawn, but was reinstated in Weeks
10-14.
Examination of Figure 3-21 indicates that each of the first three phases
consisted of 3 weeks, with data reflecting the controlling effects of RCT on
both target behaviors. Reinstatement of RCT in the final phase led to re-
newed control, and the treatment was extended to 5 weeks to ensure main-
tenance of gains.
It might be noted that phase and data patterns do not often follow the ideal
sequence depicted in the Miller (1973) study. And, as a consequence, experi-
menters frequently are required to make accommodations for ethical, proce-
[Figure 3-21 series: Daily Urination; Enuretic Episodes. Phases: Baseline, Retention Control Training, Baseline, Retention Control Training.]
FIGURE 3-21. Number of enuretic episodes per week and mean number of daily urinations per
week for Subject 1. (Figure 1, p. 291, from: Miller, P. M. (1973). An experimental analysis of
retention control training in the treatment of nocturnal enuresis in two institutionalized adoles-
cents. Behavior Therapy, 4, 288-294. Copyright 1973 by Association for the Advancement of
Behavior Therapy. Reproduced by permission.)
Carryover effects
treatment (B phase) and returning to the placebo (A1 phase) condition in the
A-A1-B-A1-B design. With respect to such effects, Chassan (1967) pointed out
that "This, for instance, is thought likely to be the case in the use of
monoaminoxidase inhibitors for the treatment of depression" (p. 204). Simi-
larly, when using phenothiazine derivatives, the experimenter must exercise
caution inasmuch as residuals of the drugs have been found to remain in body
tissues for extended periods of time (as long as 6 months in some cases)
following their discontinuance (Ban, 1969).
However, it is possible to examine the short-term effects of phenothiazines
on designated target behaviors (Liberman et al., 1973), but it behooves the
experimenter to demonstrate, via blood and urine laboratory studies, that
controlling effects of the drug are truly being demonstrated. That is to say,
correlations (statistical and graphic data patterns) between behavioral
changes and drug levels in body tissues should be demonstrated across
experimental phases.
Despite the carryover difficulties encountered with the major tranquilizers
and antidepressants, the possibility of conducting extended studies in long-
term facilities should be explored, assuming that high ethical and experimen-
tal standards prevail. In addition, study of the short-term efficacy of the
minor tranquilizers and amphetamines on selected target behaviors is quite
feasible.
Cyclic variations
the multiple baseline strategy is ideally suited for studying such variables, in
that withdrawals of treatment are not required to show the controlling effects
of particular techniques (Baer et al., 1968; Barlow & Hersen, 1973; Hersen,
1982; Kazdin, 1982b). A complete discussion of issues related to the varieties
of multiple baseline designs currently being employed by applied researchers
appears in chapter 7.
Agras, Leitenberg, Callahan, & Moore, 1972), but it is not possible to remove
it in the same sense as one does in the case of reinforcement. Therefore, in
light of these issues, when examining the interacting effects of instructions
and other therapeutic variables (e.g., social reinforcement), instructions are
typically maintained constant across treatment phases while the therapeutic
variable is introduced, withdrawn, and reintroduced in sequence (Hersen,
Gullick, Matherne, & Harbert, 1972).
Exceptions
There are some exceptions to the above that periodically have appeared in
the psychological literature. In two separate studies the short-term effects of
instructions (Eisler, Hersen, & Agras, 1973) and the therapeutic value of
instructional sets (Barlow et al., 1972) were examined in withdrawal designs.
In one of a series of analogue studies, Eisler, Hersen and Agras investigated
the effects of focused instructions ("We would like you to pay attention as to
how much you are looking at each other") on two nonverbal behaviors
(looking and smiling) during the course of 24 minutes of free interaction in
three married couples. An A-B-A-B design was used, with A consisting of 6
minutes of interaction videotaped between a husband and wife in a small
television studio. The B phase also involved 6 minutes of videotaped interac-
tion, but focused instructions on looking were administered three times at 2-
minute intervals over a two-way intercom system by the experimenter from
the adjoining control room. During the second A phase, instructions were
discontinued, while in the second B they were renewed, thus completing 24
minutes of taped interaction.
Retrospective ratings of looking and smiling for husbands and wives (mean
data for the three couples were used, as trends were similar in all cases)
appear in Figure 3-22. Looking duration in baseline for both spouses was
moderate in frequency. In the next phase, focused instructions resulted in a
[Figure 3-22 panels: Looking; Smiling. Phases: Baseline, Focused Instructions, Baseline, Focused Instructions. x-axis: Blocks of Two Minutes.]
FIGURE 3-22. Mean number of looks and smiles for three couples in 10-second intervals plotted
in blocks of 2 minutes for the Focused Instructions Alone Design. (Figure 4, p. 556, from: Eisler,
R. M., Hersen, M., & Agras, W. S. (1973). Effects of videotape and instructional feedback on
nonverbal marital interactions: An analog study. Behavior Therapy, 4, 551-558. Copyright 1973
by Association for the Advancement of Behavior Therapy. Reproduced by permission.)
positive instructional set (subjects were informed that pairing of the nauseous
scene with homosexual imagery, based on a review of their data, would lead
to greatest improvement).
Mean data for the four subjects presented in blocks of two sessions appear
in Figure 3-23. Baseline data suggest that the positive set failed to effect a
decreased trend. In the next phase (BC), a marked improvement was noted as
a function of covert sensitization despite the instigation of a negative set. In
the third phase (A), some deterioration was apparent although a positive set
had been instituted. Finally, in the last phase (BD), covert sensitization
coupled with positive expectation of treatment resulted in renewed improve-
ment.
FIGURE 3-23. Mean penile circumference changes to male slides for 4 Ss, expressed as a
percentage of full erection. In each phase, data from the first, middle, and last pair of sessions are
shown. (Figure 1, p. 413, from: Barlow, D. H., Agras, W. S., Leitenberg, H., Callahan, E. J., &
Moore, R. C. (1972). The contribution of therapeutic instruction to covert sensitization.
Behaviour Research and Therapy, 10, 411-415. Copyright 1972 by Pergamon. Reproduced by
permission.)
Assessment Strategies
by Donald P. Hartmann
4.1. INTRODUCTION
Assessment strategies that best complement single-case experimental designs
are direct, ongoing or repeated, and intraindividual or idiographic rather
than interindividual or normative. The search is for the determinants of
behavior through examination of the individual's transactions with the social
and physical environment. Thus behavior is a sample, rather than a sign of
the individual's repertoire in the specific assessment setting. This approach,
with its various strategies and philosophical underpinnings, has burgeoned of
late within the general area of behavioral assessment (Hartmann, Roper, &
Bradford, 1979). However, as noted throughout the book, the implementation
of these strategies is not in any way limited to behavioral approaches to
Thanks to Lynne Zarbatany for her critical reading of an earlier draft of this
chapter and to Andrea Stavros for her typing and editorial assistance.
and clarifying procedure, a final decision concerns the order of treating target
behaviors. While the existing (and scant) data on this issue suggest that the
order of treatment of target behaviors may have no effect on outcome
(Eyberg & Johnson, 1974), a number of suggestions have been offered for
choosing the first behavior to be treated (Mash & Terdal, 1981; Nelson &
Hayes, 1981). Behaviors recommended for initial treatment include those that
are (1) dangerous to the client or others; (2) most irritating to individuals in
the client's immediate social environment such as spouse or parent; (3) easiest
to modify; (4) most likely to produce generalized positive effects; (5) earliest
in a chain or prerequisite to other important behaviors; or (6) most difficult to
modify. Of course this decision, as well as many others faced by therapists,
may have to be based on more mundane considerations, such as skill level of
the therapist or demands of the referral source.
Elaboration: Peer interaction is scored when the child is (a) within three feet
of a peer and either (b) engaged in conversation or physical
activity with the peer or (c) jointly using a toy or other play
object.
Note. From Gelfand, D. M. & Hartmann, D. P. Child behavior: Analysis and therapy (2nd ed.).
Elmsford, NY: Pergamon Press. Copyright 1984. Reproduced by permission.
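The elaboration above is a compound scoring rule: condition (a) must hold together with either (b) or (c). A minimal sketch of that rule follows. The function and parameter names are hypothetical, and treating a distance of exactly three feet as "within three feet" is an assumption, since the definition does not settle the boundary case.

```python
def is_peer_interaction(distance_ft, conversing_or_active, sharing_play_object):
    """Score one observation as peer interaction.

    Rule taken from the elaboration: (a) within three feet of a peer AND
    either (b) conversation/physical activity with the peer OR
    (c) joint use of a toy or other play object.
    Assumption: exactly 3.0 feet counts as "within three feet".
    """
    within_range = distance_ft <= 3.0  # condition (a)
    return within_range and (conversing_or_active or sharing_play_object)


# Hypothetical observations, for illustration only.
print(is_peer_interaction(2.0, True, False))   # close and conversing
print(is_peer_interaction(2.0, False, False))  # close but not engaged
print(is_peer_interaction(5.0, True, True))    # engaged but too far away
```

Writing the rule out this way makes the boundary decisions explicit, which is the kind of clarification an observation manual would normally record.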
The settings used for conducting behavioral investigations have been lim-
ited only by the creativity of investigators and the location of subjects.
Because the occurrences of many behaviors are dependent upon specific
environmental stimuli, behavior rates may well vary across settings contain-
ing different stimuli (e.g., Kazdin, 1979). Thus, for example, drinking as-
sessed in a laboratory bar may not represent the rate of the behavior observed
in more natural contexts (Nathan, Titler, Lowenstein, Solomon, & Rossi,
1970), and cooperative behavior modified in the home may not generalize to
the school setting (R. G. Wahler, 1969b). Even within the home, desirable and
undesirable child behaviors may vary with temporal and climatic variables
(Russell & Bernal, 1977). Thus unless the purpose of an investigation is
limited to modifying a behavior in a narrowly defined treatment context,
observations need to be extended beyond the setting in which treatment
occurs. Observations conducted in multiple settings are required (1) if gener-
alization of treatment effects is to be demonstrated; (2) if a representative
portrayal of the target behavior is to be obtained; and (3) if important
contextual variables that control responding and that may be used to generate
effective interactions are to be identified (e.g., Gelfand & Hartmann, 1984;
Hutt & Hutt, 1970). Given the infrequency with which settings are typically
sampled (P. H. Bornstein et al., 1980), these issues either have not captured
the interests of behavior change researchers, or the cost of conducting obser-
vations in multiple settings has exceeded available resources.
While most investigators would prefer to observe behavior as it naturally
occurs (e.g., Kazdin, 1982b), a number of factors may require that observations
be conducted elsewhere. The reasons for employing contrived or analogue
settings include convenience to observers and clients; the need for
standardization or measurement sensitivity; or the fact that the target behavior
naturally occurs at a low rate, and observations in natural settings would
involve excessive dross. All of these factors may have determined R. T. Jones,
Kazdin and Haney's (1981b) choice of a contrived setting to assess the
effectiveness of a program to improve children's skill in escaping from home
emergency fires.
The correspondence between behavior observed in contrived observational
settings and in naturalistic settings varies as a function of (1) similarities in
their physical characteristics, (2) the persons present, and (3) the control
exerted by the observation process (Nay, 1979). Even if assessments are
conducted in naturalistic settings, the observations may produce variations in
the cues that are normally present in these settings. For example, setting cues
may change when structure is imposed on observation settings. Structuring
may range from presumably minor restrictions in the movement and activities
of family members during home observations to the use of highly contrived
situations, as in some assessments of fears and social skills. Haynes (1978),
McFall (1977), and Nay (1977, 1979) provided examples of representative
studies that employed various levels and types of structuring in observation
settings; they also discussed the potential advantages and limitations of
structuring relative to cost, measurement sensitivity, and generalizability.
Cues in observation settings may also be affected by the type of observers
used and their relationship to the persons observed. Observers can vary in
their level of participation with the observed. At the one extreme are nonpar-
ticipant (independent) observers whose only role is to gather data. At the
other extreme are self-observations conducted by the subject or client. In-
termediate levels of participant-observation are represented by significant
others, such as parents, peers, siblings, teachers, aides, and nurses, who are
normally present in the setting where observations take place (e.g., Bickman,
1976). The major advantages of participant-observers are that they may be
present at times that might otherwise be inconvenient for independent obser-
vers, and their presence may be less obtrusive. On the other hand, they may
be less dependable, more subject to biases, and more difficult to train and
evaluate than are independent observers (Nay, 1979).
When observation settings vary from natural life settings either because of
the presence of possibly obtrusive external observers or the imposition of
structure, the ecological validity of the observations is open to question (e.g.,
Barker & Wright, 1955; Rogers-Warren & Warren, 1977). Methods of limiting
these threats to ecological validity are discussed in the section on observer
effects.
Though selection of observation settings is an important issue, investiga-
tors must also determine how best to sample behaviors within these settings.
Sampling of behavior is influenced by how observations are scheduled.
Behavior cannot be continuously observed and recorded except by partici-
pant-observers and when the targets are low-frequency events (see, for exam-
ple, the Clinical Frequency Recording System employed by Paul & Lentz,
targeted such as the length of time required to perform the response, the
response latency, or the interresponse time (Cone & Foster, 1982). While
duration is less commonly observed than is frequency (e.g., M. B. Kelly,
1977), duration has been measured for a variety of target responses including
the length of time that a claustrophobic patient sat in a small room (Leitenberg
et al., 1968) and latency to comply with classroom instructions
(Fjellstedt & Sulzer-Azaroff, 1973).
Duration measures require the availability of a suitable timing device and a
target response with clearly discernible onsets and offsets. In single-variable
studies, the general availability and convenience of digital wristwatches with
real time and stopwatch functions may enable even a participant observer to
serve as the primary source of data. In the case of multiple-target behaviors, a
complex timing device, such as a multiple-channel event recorder (e.g., a
Datamyte), is required.
Response quality is typically assessed when target behaviors vary either in
(1) intensity or amplitude, such as noise level and penile erection; (2) ac-
curacy, such as descriptions of place and time used to test general orientation;
or (3) acceptability, such as the appropriateness of assertion and the intelligi-
bility of speech (Cone & Foster, 1982). These qualitative dimensions may be
evaluated on continuous or discrete scales, and the discrete scales can them-
selves be dichotomous or multi-categorical. For example, assessment of the
amount of food spilled by a child could be made by weighing the child and
the food on his or her plate before and after each meal (quantitative,
the fragmentary picture it gives of the stream of behavior; (2) the difficulty of
(1977).
Duration recording is used when one of the previously discussed temporal
aspects of responding is targeted. According to M.B. Kelly (1977), duration
recording is the least used of the common recording techniques, perhaps in
part because of the belief that frequency is a more basic response characteris-
Observer effects
Note. Adapted from Gelfand, D. M. & Hartmann, D. P. (1984). Child behavior: Analysis and
therapy (2nd ed.). Elmsford, NY: Pergamon Press. Copyright 1984. Reproduced by permission.
and public-relations skills (e.g., Haynes, 1978; also see Johnson & Bolstad,
1973); (4) young children under the age of six and subjects who are open and
confident or perhaps merely insensitive may react less to direct observation
than subjects who do not share these characteristics; and (5) the rationale for
observation may affect the degree to which subjects respond in an atypical
manner (see discussion by Weick, 1968). Johnson and Bolstad (1973) recom-
mended providing a thorough rationale for observation procedures in order
to reduce subject concerns and potential reactive effects due to the observa-
tion process. Other methods for reducing reactivity also may prove useful
(Kazdin, 1979; 1982a).
eye contact with the observee. Table 4-3 lists suggestions for classroom
observers that are intended to decrease their obtrusiveness and hence the
reactivity of their observations.
3. Increase reliance on reports from informants who are a natural part of the
client's social environment.
4. Obtain assessment data from multiple sources differing in method arti-
fact.
1. Obtain the caretaker's permission to observe the child in the classroom or other
school environment.
2. Consult the classroom teacher prior to making observations and agree upon an
acceptable introduction and explanation for your presence in the classroom. Also
arrange for mutually agreeable observation times, location, etc.
3. Insofar as possible, coordinate your entry and exit from the classroom with normal
breaks in the daily routine.
4. Be inconspicuous in your personal appearance and conduct.
5. Do not strike up conversations with the children.
6. Sit in an inconspicuous location from which you can see but cannot easily be seen.
7. Disguise your interest in the target child by varying the apparent object of your
glances.
8. Do not begin systematic behavioral observations until the children have become
accustomed to your presence.
9. Minimize disruptions by taking your observations at the same time each day.
10. Thank the teacher for allowing you to visit the classroom.
Note. Adapted from Gelfand, D. M. & Hartmann, D. P. (1984). Child behavior: Analysis and
therapy (2nd ed.). Elmsford, NY: Pergamon Press. Copyright 1984. Reproduced by permission.
being controlled, investigators should assess the nature and extent of bias by
systematically probing their observers (Hartmann, Roper, & Gelfand, 1977;
Johnson & Bolstad, 1973).
Observer drift, or instrument decay (Cook & Campbell, 1979; Johnson &
Bolstad, 1973), occurs when observer consistency or accuracy decreases, for
example, from the end of training to the beginning of formal data collection
(e.g., Taplin & Reid, 1973). Drift occurs when a recording-interpretation bias
has gradually evolved over time (Arrington, 1939, 1943) or when response
definitions or measurement procedures are informally altered to suit novel
changes in the topography of some target behavior (Doke, 1976). Drift can
also result from observer satiation or boredom (Weick, 1968). Observer drift
can cause inflated estimates of interobserver reliability when these estimates
are based on data obtained (1) during training sessions, (2) from overt
reliability assessment no matter when scheduled, or (3) from a long-standing,
familiar team of observers during the course of a lengthy investigation (see
Hartmann & Wood, 1982).
Drift can be limited or its effects reduced by providing continuing training
throughout a project, by training and recalibrating all observers at the same
time, and by inserting random and covert reliability probes throughout the
course of the investigation. Alternatively, investigators can take steps to
evaluate the presence of observer drift by having observers periodically rate
prescored videotapes (sometimes referred to as criterion videotapes), by
conducting reliability assessment across rotating members of observation
teams, and by using independent reliability assessors (see reviews by Cone &
Foster, 1982; Hartmann & Wood, 1982; Haynes, 1978).
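One way to picture the criterion-videotape check is to compare an observer's interval-by-interval record against the prescored record at several points in a study and to watch for declining agreement. The sketch below is illustrative only: the function names, the data, and the 0.80 floor are all hypothetical, and simple proportion agreement is used as the index even though it is not corrected for chance.

```python
def interval_agreement(observer, criterion):
    """Proportion of intervals on which the observer matches the criterion record."""
    if len(observer) != len(criterion) or not criterion:
        raise ValueError("records must be non-empty and of equal length")
    matches = sum(o == c for o, c in zip(observer, criterion))
    return matches / len(criterion)


def flag_drift(agreements, floor=0.80):
    """Return the indices of probes whose agreement falls below the floor.

    The floor is an arbitrary illustrative value, not a recommended standard.
    """
    return [i for i, a in enumerate(agreements) if a < floor]


# Hypothetical prescored (criterion) record: 1 = behavior scored in the interval.
criterion = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

# Hypothetical records from the same observer at three points in the study.
probes = [
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 1, 0, 0, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
]

agreements = [interval_agreement(p, criterion) for p in probes]
print(agreements)              # [1.0, 0.9, 0.7]
print(flag_drift(agreements))  # [2]
```

Because the criterion record is fixed, a falling series of values points to a change in the observer rather than in the behavior being scored.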
Observer cheating has been reported only rarely (e.g., Azrin, Holz, Ulrich,
& Goldiamond, 1961). More commonly, observers have been known to
calculate inflated reliability coefficients, though these calculation mistakes are
not necessarily the result of intentional fabrication (e.g., Rusch, Walker, &
Greenwood, 1975). Precautions against observer cheating include random,
unannounced reliability spot checks; collection of data forms immediately
after an observation session ends; restriction of data analysis and reliability
calculations to individuals who did not collect the data; provision of pens
rather than pencils to raters (obvious corrections might then be evaluated as
an indirect measure of cheating); and reminders to observers about the
canons of science and the dire consequences of cheating (Hartmann & Wood,
1982). See the section on staging reliability assessments (p. 124) for further
suggestions regarding limiting observer drift and observer cheating.
throughout this training phase, and all scoring decisions and clarifications
should be posted in an observer log or noted in the observation manual that
each observer carries.
Practice in the observation setting follows. Practice observations can serve
the dual purpose of desensitizing observers to fears about the setting (i.e.,
Reliability
possible, kept unaware of both when reliability assessment sessions are sched-
uled and the purpose of the study; (3) observers should be reminded of the
importance of accurate data and regularly retrained with observational stimuli
varying in complexity; (4) reliability assessments should be conducted
throughout the investigation, particularly in each part of multiphase behav-
ior-change investigations; and (5) the task of calculating reliability should be
undertaken by the investigator, not by the observers (Hartmann, 1982).
Before a reliability analysis can be completed, the investigator must determine
the appropriate behavioral units (or the levels of data) on which the
analysis will be conducted (Johnson & Bolstad, 1973). A common, molar unit
is obtained by combining the scores of either empirically or logically related
Jones, Reid, & Patterson, 1975). Still other composite units can be based on
aggregation of scores over time. For example, students' daily question asking
can be combined over a 5-day period to generate weekly question-asking
scores.
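Aggregation over time can be sketched in a few lines. The example below sums daily counts into 5-day blocks to form weekly scores; the function name and the data are hypothetical, and summing a short final block as-is is an assumption rather than a rule from the text.

```python
def aggregate_by_block(daily_scores, block_size=5):
    """Sum consecutive daily scores into blocks (e.g., 5 school days per week).

    Assumption: an incomplete final block is summed as-is, not discarded.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        sum(daily_scores[start:start + block_size])
        for start in range(0, len(daily_scores), block_size)
    ]


# Hypothetical daily counts of question asking over two school weeks.
daily_questions = [3, 5, 2, 4, 6, 1, 0, 2, 3, 4]
weekly_questions = aggregate_by_block(daily_questions)
print(weekly_questions)  # [20, 10]
```

Reliability computed on the weekly composites can differ from reliability computed on the daily scores, which is why the unit of analysis has to be chosen before the reliability analysis is run.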
SUMMARY TABLE
Note. Some of the summary statistics described here commonly employ a percentage scale (for
example, raw agreement). For convenience, these statistics are defined in terms of a proportion
scale. (Adapted from Hartmann, D. P. (1982). Assessing the dependability of observational data.
In D. P. Hartmann (Ed.), Using observers to study behavior: New directions for methodology of
social and behavioral science. San Francisco: Jossey-Bass. Copyright 1982 by D. P. Hartmann.
Reproduced by permission.)
ments (e.g., Cone & Foster, 1982; Hawkins & Dotson, 1975), whereas other
procedures provide formal correction for chance agreements. The most pop-
ular of these corrected statistics is Cohen's kappa (J. Cohen, 1960). Kappa
has been discussed and illustrated by Hartmann (1977) and Hollenbeck
(1978), and a useful technical bibliography on kappa appears in Hubert
(1977). Kappa may be used for summarizing observer agreement as well as
accuracy (Light, 1971), for determining consistency among many raters
(A. J. Conger, 1980), and for evaluating scaled (partial) consistency among
observers (J. Cohen, 1968).
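For readers who want to see the chance correction worked through, the following sketch computes the unweighted two-observer kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreements and p_e is the agreement expected by chance from each observer's marginal proportions. The interval records are hypothetical, and the multi-rater and weighted variants cited above are not covered.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two observers' categorical records."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("records must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the two observers' marginal category counts.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical occurrence (1) / nonoccurrence (0) records for ten intervals.
observer_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
observer_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 2))  # 0.58
```

In this example the observers agree on 8 of 10 intervals (p_o = .80), but chance agreement is .52, so kappa is considerably lower than the raw agreement figure.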
Table 4-5 includes qualitative data from a subject scores from six sessions for
two observers and analyses of these data. The percentage agreement for
these data, sometimes called marginal agreement (Frick & Semmel, 1978), is
the ratio of the smaller value (frequency or duration) to the larger value
obtained by two observers, multiplied by 100. This form of percentage
agreement also has been criticized for potentially inflating reliability estimates
(Hartmann, 1977). Berk (1979) advocated use of generalizability coefficients,
as these statistics provide more information and permit more options than do
either percentage agreement or simple correlation coefficients (also see Hart-
mann, 1977; Mitchell, 1979; and Shrout & Fleiss, 1979). Despite these advan-
tages, some researchers argue that generalizability and related correlational
approaches should be avoided because their mathematical properties may
                 OBSERVERS
Session    Observer 1    Observer 2    Percentage agreement
1          11            9             82%
2          8             6             75%
3          9             7             78%
4          10            9             90%
5          12            11            92%
6          8             8             100%
Note. Adapted from Hartmann, D. P. (1982). Assessing the dependability of observational data.
In D. P. Hartmann (Ed.), Using observers to study behavior: New directions for methodology of
social and behavioral science. San Francisco: Jossey-Bass. Copyright 1982 by D. P. Hartmann.
Reproduced by permission.
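The percentage agreement column of Table 4-5 can be reproduced from the definition given in the text: the smaller of the two observers' values divided by the larger, multiplied by 100. The sketch below assumes that the two numeric columns of the table are the observers' session totals; the function name is hypothetical.

```python
def percentage_agreement(total_1, total_2):
    """Smaller session total divided by the larger, multiplied by 100."""
    larger = max(total_1, total_2)
    if larger == 0:
        raise ValueError("agreement is undefined when both totals are zero")
    return 100.0 * min(total_1, total_2) / larger


# Session totals for the two observers, read from Table 4-5.
sessions = [(11, 9), (8, 6), (9, 7), (10, 9), (12, 11), (8, 8)]
percentages = [round(percentage_agreement(a, b)) for a, b in sessions]
print(percentages)  # [82, 75, 78, 90, 92, 100]
```

Because the index compares only session totals, two observers could obtain 100% agreement without ever scoring the same individual events, which is one reason this statistic has been criticized as potentially inflating reliability estimates.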
and from .60 to .75 for kappa-like statistics (see Hartmann, 1982). While
these recommendations will be adequate for many, even most, research
purposes, the overriding basis for judging the adequacy of data is whether
they provide a powerful means of detecting experimentally produced or
naturally occurring response covariation.
Power depends not only on data quality, but also on the magnitude of
covariation to be detected, the number of available investigative units (for
example, sessions), and the experimental design. Thus, data quality must be
evaluated in the context of these factors (Hartmann & Gardner, 1979). If
consideration of these factors indicates that the data are of adequate quality,
further modification of the observational system is not required. However,
if one or more forms of reliability prove unacceptable, revision of the
research plan is in order.
If the quality of data is judged unsatisfactory, a number of options are
Validity
Johnson & Bolstad, 1973; O'Leary, 1979). In fact, observations have been
rately measure some psychological construct. The need for construct validity
is most apparent when observation scores are combined to yield a measure of
Behavioral products
Self-report measures
ity have passed this validity hurdle, not all have done so successfully (e.g.,
ever, in the latter case, objective assessments typically play a more important
role, except when the target is itself a subjective response.
Self-monitoring has proven particularly useful for assessing rare and sensitive
behaviors and responses that are only accessible to the client such as pain
due to migraine headaches (Feuerstein & Adams, 1977) and obsessive rumina-
tions (Emmelkamp & Kwee, 1977). Other responses assessed via self-moni-
toring include appetitive urges, hallucinations, hurt and depressed feelings,
sexual behaviors, and waking time (for insomniacs). An array of behaviors
more susceptible to direct observations also has been monitored by the client,
including weight gain or loss, caloric intake, nail biting, exercise, academic
behaviors, alcohol consumption, and whining. Haynes (1978), Haynes and
Wilson (1979), Nay (1979), and Nelson (1977) surveyed applications of target
behaviors and recording procedures used in self-monitoring.
Self-monitoring procedures share a number of method-related problems.
Foremost among these is reactivity (Haynes & Wilson, 1979; Nelson, 1977).
Reactivity effects vary as a function of the social desirability of the behavior
recorded, with the frequency of positively valued responses likely to increase
and negatively valued acts likely to decrease during the course of self-
Psychophysiological measures
usually skin conductance or its reciprocal, skin resistance. EDRs have been
viewed as a measure of activation or autonomic arousal; thus, they often are
used to monitor changes in response to fear stimuli as a result of behavioral
interventions (e.g., Barlow, Leitenberg, Agras, & Wincze, 1969). However,
the use of electrodermal responding as a measure of arousal also must be done
cautiously, as scores vary depending on the EDR response component
measured (conductance, fluctuations, latency, and wave form), the time-
sampling parameters utilized, and the specific measurement site and proce-
dures used (e.g., Edelberg, 1972; Venables & Christie, 1973).
Sophisticated uses of physiological measures have been made primarily by
laboratory investigators rather than practicing clinicians, due to the expense
of the equipment, the inconvenience associated with its use, and the need for
the nature of the resulting record. For example, some responses display
substantial habituation or adaptation effects; that is, the same stimulus
change in heart rate from 120 to 125 is different from, and probably greater
than, a change from 70 to 75. Thus some form of data transformation may
be necessary to equate response changes at various ranges of the response
dimension (e.g., Ray & Raczynski, 1981). Individuals also may show response
specificity, or a particular pattern of responding across related stimuli (e.g.,
Lacey, 1959). Because individuals vary in the response system that is most
reactive, investigators should assess their clients' reactivity before selecting a
measure that will be sensitive to the changes resulting from treatment. Some
physiological systems also may be responsive to circadian rhythms, and to
diurnal as well as larger cyclic effects (Haynes & Wilson, 1979); again,
familiarity with standard technique references is critical to the judicious
selection of measurement procedures.
NOTES
1. The by-products, or traces (e.g., Webb, Campbell, Schwartz, Sechrest, & Grove,
1981), of behaviors such as pounds gained and cigarettes smoked also are consid-
ered grist for the assessment mill.
3. Not infrequently, additional behaviors will be monitored during one or more of the
aforementioned phases. For example, measurements may be regularly or periodi-
cally obtained on the independent, or treatment, variable to ensure that it is
manipulated in the intended manner. L. Peterson, Homer, and Wonderlich (1982)
argued that the infrequent use of independent variable checks seriously threatens
the reliability and validity of applied behavior studies. Along with J. M. Johnston
and Pennypacker (1980), they suggested a variety of methods of assessing the
integrity of independent variable manipulations. Similar recommendations are
given in related treatment literatures (e.g., Hartmann, Roper, & Gelfand, 1977;
Paul & Lentz, 1977).
At other times the investigator may choose to measure environmental events
such as the opportunities to perform the target response (Hawkins, 1982). For
example, when the target is "instruction following," assessing the client's perfor-
mance may require measurement of the occurrence of each instruction or request.
7. Self-report measures have proliferated at such a rapid rate that at least one well-
known behavioral assessor suggested that journal editors limit these devices by not
considering for publication those studies employing new instruments that are not
demonstrably superior to existing ones (see comments by blue-ribbon panelists in
Hartmann, 1983).
5.1. INTRODUCTION
In this chapter we will examine the prototype of experimental single-case
research (the A-B-A design) and its many variants. The primary objective
is to inform and familiarize the reader as to the advantages and limitations of
each design strategy while illustrating from the clinical, child, and behavior
modification literatures. The development of the A-B-A design will be traced,
beginning with its roots in the clinical case study and in the application of
"quasi-experimental designs" (Campbell & Stanley, 1966). Procedural issues
discussed at length in chapter 3 will also be evaluated here for each of the
specific design options as they apply. Both "ideal" and "problematic" exam-
ples, selected from the applied research area, will be used for illustrative
purposes.
Since the publication of the first edition of this book (Hersen & Barlow,
1976) the literature has become replete with examples of A-B-A designs.
However, there has been very little change with respect to basic procedural
issues. Therefore, we have retained most of the original design illustrations
but have added some more recent examples from the applied behavioral
literature.
Basic A-B-A Withdrawal Designs 141
niques (cf. Ashem, 1963; Barlow, 1980; Barlow et al., 1983; Lazarus, 1963;
Ullmann & Krasner, 1965; Wolpe, 1958, 1976).
Although there can be no doubt that the case history method yields
interesting (albeit uncontrolled) data, that it is a rich source for clinical
speculation, and that ingenious technical developments derive from its appli-
cation, the multitude of uncontrolled factors present in each study do not
permit sound cause-and-effect conclusions. Even when the case study method
is applied at its best (e.g., Lazarus, 1973), the absence of experimental control
and the lack of precise measures for target behaviors under evaluation remain
mitigating factors. Of course, proponents of the case study method (e.g.,
Lazarus & Davison, 1971) are well aware of its inherent limitations as an
evaluative tool, but they show how it can be used to advantage to generate
hypotheses that later may be subjected to more rigorous experimental
scrutiny. Among its advantages, the case study method can be used to (1)
foster clinical innovation, (2) cast doubt on theoretic assumptions, (3) permit
study of rare phenomena (e.g., Gilles de la Tourette's Syndrome), (4) develop
new technical skills, (5) buttress theoretical views, (6) result in refinement of
techniques, and (7) provide clinical data to be used as a departure point for
subsequent controlled investigations.
With respect to the last point, Lazarus and Davison (1971) referred to the
use of "objectified single case studies." Included are the A-B-A experimental
designs that allow for an analysis of the controlling effects of variables, thus
permitting scientifically valid conclusions. However, in the more typical case
study approach, a subjective description of treatment interventions and re-
sulting behavioral changes is made by the therapist. Most frequently, several
techniques are administered simultaneously, precluding an analysis of the
relative merits of each procedure. Moreover, evidence for improvement is
there the strong possibility of bias in these evaluations, but controls for the
treatment's placebo value are unavailable. Finally, the effects of time (ma-
turational factors) are confounded with application of the treatment(s), and
the specific contribution of each of the factors is obviously not distinguished.
More recently, Kazdin (1981) has pointed out how "... the scientific yield
from case reports might be improved in clinical practice where methodologi-
cal alternatives are unavailable" (p. 183). In ascending order of rigor, three
types are described: (1) cases with preassessment and postassessment, (2)
cases with repeated assessment and marked changes, and (3) multiple cases
with continuous assessment and stability information (e.g., no change in a
patient's condition over extended periods of time despite prior therapeutic
efforts). However, notwithstanding improvements inherent in the aforemen-
tioned case approaches, threats to internal validity are still present to one
degree or another.
A very modest improvement over the uncontrolled case study method
elsewhere (Browning & Stover, 1971) has been labeled the "B Design." In this
"design," baseline measurement is omitted, but the investigator monitors one
of a number of target measures throughout the course of treatment. One
might also categorize this procedure as the simplest of the time series analyses
(see G. V. Glass, Willson, & Gottman, 1973). Although this strategy ob-
viously yields a more objective appraisal of the patient's progress, the con-
founds that typify the case study method apply equally here. In that sense the
B Design is essentially an uncontrolled case study with objective measures
The weakness in this design is that the data in the experimental condition is
compared with a forecast from the prior baseline data. The accuracy of an
assessment of the role of the experimental procedure in producing the change
rests upon the accuracy of that forecast. A strong statement of causality there-
fore requires that the forecast be supported. This support is accomplished by
elaborating the A-B design. (p. 5)
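The logic of comparing treatment data with a forecast from baseline can be sketched as follows. This is only an illustration, assuming that the forecast is a straight line fitted to the baseline points by least squares and extended into the B phase; the quoted passage does not prescribe any particular forecasting method, and the function names and daily frequencies are hypothetical.

```python
def fit_line(ys):
    """Least-squares line through (0, y0), (1, y1), ...; returns (slope, intercept)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two baseline points to estimate a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def forecast_deviations(baseline, treatment):
    """Observed B-phase values minus the values forecast by extending the baseline line."""
    slope, intercept = fit_line(baseline)
    start = len(baseline)
    return [y - (intercept + slope * (start + i)) for i, y in enumerate(treatment)]

# Hypothetical daily frequencies, not data from any study cited here.
baseline = [10, 12, 11, 13]
treatment = [9, 7, 5, 4]
deviations = forecast_deviations(baseline, treatment)
```

Consistently negative deviations would suggest a departure from the projected baseline, but, as the quoted passage stresses, the inference is only as good as the forecast itself.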
Epstein and Hersen (1974) used an A-B design with a follow-up procedure
to assess the effects of reinforcement on frequency of gagging in a 26-year-old
psychiatric inpatient. The patient's symptomatology had persisted for ap-
proximately 2 years despite repeated attempts at medical intervention. During
baseline (A phase), the patient was instructed to record time and frequency of
each gagging episode on an index card, collected by the experimenter the
following morning at ward rounds. Treatment (B phase) consisted of present-
ing the patient with $2.00 in canteen books (exchangeable at the hospital store
for goods) for a decrease (N - 1) from the previous daily frequency. In
FIGURE 5-1. Frequency of gagging during baseline, treatment, and follow-up. (Figure 1, p. 103,
from: Epstein, L. H., & Hersen, M. (1974). Behavioral control of hysterical gagging. Journal of
Clinical Psychology, 30, 102-104. Copyright 1974 by American Psychological Association.
Reproduced by permission.)
Day 13. Renewed improvement was then noted between Days 15-18, and
treatment was continued through Day 24. Thus the B phase was twice as long
as baseline, but it was extended for very obvious clinical considerations.
The 12-week follow-up period reveals a zero level of gagging, with the
exception of Week 9, when three gagging episodes were recorded. Follow-up
data were corroborated by the patient's wife, thus precluding the possibility
that treatment affected only the patient's verbal report rather than the
actual symptomatology.
Although treatment appeared to be the effective ingredient of change in
and follow-up) could readily have been carried out in an outpatient facility
(clinic or private-practice setting) with a minimum of difficulty and with no
deleterious effects to the patient.
Lawson (1983) also used an A-B design with a single target behavior
(alcohol consumption) and obtained a follow-up assessment. His case in-
volved a divorced 35-year-old male with a history of problem drinking
beginning at age 16. He periodically would experience blackouts as a function
of his drinking. But despite the chronicity of his problem, with the exception
of a few AA meetings, the subject had not obtained any form of treatment
for his alcoholism. Baseline data (based on the subject's self-report) indicated
that he consumed an average of 65 drinks per week (see Figure 5-2). This was
confirmed by his girlfriend.
Treatment (B phase) began in the third week, and, on the basis of the
behavioral analyses performed, three goals were identified: (1) to decrease
alcohol consumption, (2) to improve social relationships, and (3) to diminish
frequency of anxiety and depression episodes. Thus the comprehensive
therapy program involved goal setting with regard to number of drinks
consumed, rate-reduction strategies, stimulus-control strategies, development
of new social relationships and recreational activities, assertion training, and
self-management of depression.
Examination of data in Figure 5-2 indicates that there were substantial
improvements in rate of drinking during the course of therapy (to about 10
drinks per week) that appeared to be maintained at the 3-month follow-up
(also confirmed by the girlfriend). Indeed, an informal communication
received by the therapist 1½ years subsequent to treatment further confirmed
that the subject still was drinking in a socially acceptable manner.
Treatment did appear to be responsible for change in Lawson's (1983)
alcoholic, particularly given the 19-year history of excessive drinking. This,
then, from a design standpoint, fits in nicely with Kazdin's notion of repeated
FIGURE 5-2. Weekly self-monitored alcohol consumption during baseline, treatment, and at 3-
month follow-up. (Figure 6-1, p. 165, from: Lawson, D. M. Alcoholism. In M. Hersen (Ed.).
(1983). Outpatient behavior therapy: A clinical guide. New York: Grune & Stratton. Copyright
1983 by M. Hersen. Reproduced by permission.)
In our next example we will examine the use of an A-B design in which a
number of target behaviors were monitored simultaneously (Eisler & Hersen,
1973). The effects of token economy on points earned, behavioral ratings of
depression (Williams et al., 1972), and self-ratings of depression (Beck
Depressive Inventory; A. T. Beck, Ward, Mendelsohn, Mock, & Erbaugh,
1961) were assessed in a 61-year-old reactively depressed male patient. In this
study the treatment variable was not withdrawn due to time limitations.
During baseline (A), the patient was able to earn points for a variety of
specified target behaviors (designated under general rubrics of work, personal
hygiene, and responsibility), but these earned points were exchangeable for
ward privileges and material goods in the hospital canteen. During each
phase, the patient filled out a Beck Depressive Inventory (three alternate
forms were used to prevent possible response bias) at daily morning "Bank-
ing Hours," at which time points previously earned on the token economy
were tabulated. In addition, behavioral ratings (talking, smiling, motor activity)
of depression (high ratings indicate low depression) were obtained surreptitiously
on the average of one per hour between the hours of 8:00 A.M.
and 10:00 P.M. during non-work-related activities.
The results of this study appear in Figure 5-3. Inspection of these data
indicates that number of points earned in baseline increased slightly but then
stabilized. Baseline ratings of depression show stability, with evidence of
greater daytime activity. Beck scores ranged from 19-28. Institution of token
economy on Day 5 resulted in a marked linear increase in points earned, a
substantial increase in day and evening behavioral ratings of depression, and
a linear decrease in self-reported Beck Inventory scores.
Thus it appears that token economy effected improvement in this patient's
depression as based on both objective and subjective indexes. However, as
was previously pointed out, this design does not permit a direct analysis of
the controlling effects of the therapeutic variable introduced (token
economy), as does our example of an A-B-A design seen in Figure 5-7
(Hersen, Eisler, Alford, & Agras, 1973). Nonetheless, the use of an A-B
design in this case proved to be useful for two reasons. First, from a clinical
standpoint, it was possible to obtain some objective estimate of the treat-
ment's success during the patient's abbreviated hospital stay. Second, the
results of this study prompted the further investigation of the effects of token
economic procedures in three additional reactively depressed subjects (Her-
sen, Eisler, Alford, & Agras, 1973). In that investigation more sophisticated
experimental strategies confirmed the controlling effects of token economy in
neurotic depression.
smiles, (3) extraneous movements, (4) appropriate verbal content, and (5)
overall social skill. Assessment involved the patient and a male confederate
role-playing 16 scenes (8 commendatory; 8 refusal) that were videotaped.
Social skills training was conducted twice a week for nine weeks and
consisted of modeling, instructions, behavior rehearsal, cognitive modification,
and in vivo practice. Training was carried out with half of the commendatory
and refusal scenes; the other half served as a measure of
generalization. In addition, follow-up sessions were conducted at 1 and 6
months after conclusion of treatment.
The results of this A-B analysis appear in Figure 5-4, with the left half
FIGURE 5-3. Number of points earned, mean behavioral ratings, and Beck Depression Scale
scores during baseline and token economy in a reactively depressed patient. (Figure 1, from:
Eisler, R. M., & Hersen, M. (1973). The A-B design: Effects of token economy on behavioral and
subjective measures in neurotic depression. Paper presented at the meeting of the American
Psychological Association, Montreal, August 29.)
FIGURE 5-4. Mean frequency of targeted behaviors in refused and commendatory role-play
situations. (Figure 1, p. 50, from: St. Lawrence, J. S., Bradlyn, A. S., & Kelly, J. A. (1983).
Interpersonal adjustment of a homosexual adult: Enhancement via social skills training. Behavior
portraying commendatory scenes and the right half refusal scenes. In general,
improvements during training suggest that the treatment was effective for
both categories (commendatory and refusal) and that there was transfer of
gains from trained to generalization scenes. Moreover, gains appeared to
remain in follow-up, with the exception of smiles (commendatory). However,
a closer examination does reveal a number of problems with these data. First,
for the commendatory scenes there are only one- or two-point baselines.
Therefore, complete establishment of baseline trends was not possible. Also,
for two of the behaviors (smiles, appropriate verbal content), improvements
in training similarly appear to be the continuation of baseline trends. Second,
this also seemed to be the case with regard to refusal scenes for the following
components: eye contact, extraneous movements, appropriate verbal con-
tent, and overall social skill. Thus, although the subject was obviously
clinically improved, these data do not clearly reflect experimental confirma-
tion of such improvement, given the limited confidence one can ever have
with the A-B strategy.
FIGURE 5-5. Mean penile circumference change to audiotapes and slides during baseline, covert
sensitization, and follow-up. (Figure 1, p. 83, from: Harbert, T. L., Barlow, D. H., Hersen, M.,
& Austin, J. B. (1974). Measurement and modification of incestuous behavior: A case study.
Psychological Reports, 34, 79-86. Copyright 1974 by Psychological Reports. Reproduced by
permission.)
FIGURE 5-6. Card sort scores on probe days during baseline, covert sensitization, and follow-
up. (Figure 2, p. 84, from: Harbert, T. L., Barlow, D. H., Hersen, M., & Austin, J. B. (1974).
Measurement and modification of incestuous behavior: A case study. Psychological Reports, 34,
79-86. Copyright 1974 by Psychological Reports. Reproduced by permission.)
any influence (e.g., some correlated or uncontrolled variable) other than the
treatment variable that is systematically changed. Also, replication of the A-
B-A design in different subjects strengthens conclusions as to power and
controlling forces of the treatment (see chapter 10).
Although the A-B-A strategy is acceptable from an experimental stand-
point, it has one major undesirable feature when considered from the clinical
However, despite this limitation, the A-B-A design is a useful research tool
when time factors (e.g., premature discharge of a patient) or clinical aspects
of a case (e.g., necessity of changing the level of medication in addition to
reintroducing a treatment variable after the second A phase) interfere with
the correct application of the more comprehensive A-B-A-B strategy.
A second problem with the A-B-A strategy concerns the issues of multiple-
treatment interference, particularly sequential confounding (Bandura, 1969;
Cook & Campbell, 1979). The problem of sequential confounding in an A-B-
A design and its variants also somewhat limits generalization to the clinic. As
Bandura (1969) and Kazdin (1973b) have noted, the effectiveness of a thera-
peutic variable in the final phase of an A-B-A design can only be interpreted
in the context of the previous phases. Change occurring in this last phase may
not be comparable to changes that would have occurred if the treatment had
been introduced initially. For instance, in an A-B-BC-B design, when A is
baseline and B and C are two therapeutic variables, the effects of the BC
phase may be more or less powerful than if they had been introduced initially.
This point has been demonstrated in studies by O'Leary and his associates
(O'Leary & Becker, 1967; O'Leary, Becker, Evans, & Saudargas, 1969), who
noted that the simultaneous introduction of two variables produced greater
change than the sequential introduction of the same two variables.
be recalled that an improved trend in baseline is not the most desirable trend.
FIGURE 5-7. Number of points earned and mean behavioral ratings for Subject 1. (Figure 1, p.
394, from: Hersen, M., Eisler, R. M., Alford, G. S., & Agras, W. S. (1973). Effects of token
economy on neurotic depression: An experimental analysis, Behavior Therapy, 4, 392-397.
Copyright 1973 by Association for the Advancement of Behavior Therapy. Reproduced by
permission.)
However, as the slope of the curve was not extensive, and in light of the
primary focus on behavioral ratings (depression), we proceeded with our
change in conditions on Day 5. Had there been unlimited time, baseline
conditions would have been maintained until number of points earned daily
stabilized to a greater extent.
We might note parenthetically at this point that all of the ideal conditions
(procedural rules) outlined in our discussion in chapter 3 are rarely approxi-
mated when conducting single-case experimental research. Our experience
shows that procedural variations from the ideal are required, as data simply
do not conform to theoretical expectation. Moreover, experimental finesse is
sometimes sacrificed because of time and clinical considerations.
Continued examination of Figure 5-7 indicates that instigation of token
economic procedures on Day 5 resulted in a marked linear increase in both
points earned and behavioral ratings. The abrupt change in slope of the
curves, particularly in points earned, strongly suggests the influence of the
token economy variable, despite the slightly upward trend initially seen in
baseline. Removal of token economy on Day 9 led to an initially large drop in
FIGURE 5-8. Percentage of attending behavior in successive time samples during the individual
conditioning program. (Figure 2, p. 247, from: Walker, H. M., & Buckley, N. K. (1968). The use
of positive reinforcement in conditioning attending behavior. Journal of Applied Behavior
Analysis, 1, 245-250. Copyright 1968 by Society for the Experimental Analysis of Behavior, Inc.
Reproduced by permission.)
of the curve in extinction (A) and the relatively equal lengths of the B and A
phases further dispel doubts that the reader might have as to the confound of
time.
Secondly, with respect to the decreasing-increasing baseline obtained in the
first A phase, although it might be preferable to extend measurement until
full stability is achieved (see section 3.3, chapter 3), the range of variability is
ment fortuitously occurs during the second baseline period. In the third we
will illustrate the use of the A-B-A-B design when concurrent behaviors are
monitored in addition to targeted behaviors of interest. Finally, in the fourth
we will examine the advantages and disadvantages of using the A-B-A-B
strategy without the experimenter's knowledge of results throughout the
different phases of study.
FIGURE 5-9. A record of talking out behavior of an educable mentally retarded student.
Baseline 1: before experimental conditions. Contingent Teacher Attention: systematic ignoring
of talking out and increased teacher attention to appropriate behavior. Baseline 2: reinstatement
of teacher attention to talking out behavior. (Figure 2, p. 143, from: Hall, R. V., Fox, R., Willard,
D., Goldsmith, L., Emerson, M., Owen, M., Davis, T, & Porcia, E. (1971). The teacher as
observer and experimenter in the modification of disputing and talking-out behaviors. Journal of
Applied Behavior Analysis, 4, 141-149. Copyright 1971 by Society for the Experimental Analysis
of Behavior, Inc. Reproduced by permission.)
derate neither initiated any of the three targeted behaviors nor responded to
any initiations of the three withdrawn children. However, during the first
In our next example we will illustrate the difficulties that arose in interpretation
when unexpected improvement took place during the latter half of the
second series of baseline (A) measurements. Epstein, Hersen, and Hemphill
(1974) used an A-B-A-B design in their assessment of the effects of feedback
on frontalis muscle activity in a patient who had suffered from chronic
headaches for a 16-year period. EMG recordings were taken for 10 minutes
following 10 minutes of adaptation during each of the six baseline (A)
sessions. EMG data were obtained while the patient relaxed in a reclining
chair in the experimental laboratory. During the six feedback (B) sessions, the
patient's favorite music (prerecorded on tape) was automatically turned on
whenever EMG activity decreased below a preset criterion level. Responses
above that level conversely turned off recordings of music. Instructions to the
patient during this phase were to "keep the music on." In the next six sessions
baseline (A) conditions were reinstated, while the last six sessions involved a
return to feedback (B). Throughout all phases of study, the patient was asked
to keep a record of the intensity of headache activity.
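The feedback contingency just described can be sketched in a few lines. This is a hypothetical illustration only: it assumes one integrated EMG reading per second, an arbitrary criterion value, and that a reading exactly at criterion leaves the music off, none of which is specified in the study.

```python
def music_states(emg_microvolts, criterion):
    """True (music on) for each reading below the criterion, otherwise False (music off)."""
    # Assumption: a reading exactly at the criterion does not turn the music on.
    return [level < criterion for level in emg_microvolts]

def seconds_above_criterion(emg_microvolts, criterion):
    """Number of readings above the criterion (seconds, if there is one reading per second)."""
    return sum(level > criterion for level in emg_microvolts)

# Hypothetical readings in microvolts; not data from the study.
readings = [12, 9, 8, 11, 7, 10]
criterion = 10
states = music_states(readings, criterion)
above = seconds_above_criterion(readings, criterion)
```

Counting readings above criterion over each minute of recording would give a measure of the same kind as the one plotted for this study, seconds per minute above the criterion microvolt level.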
FIGURE 5-11. Mean seconds per minute that contained integrated responses above criterion
microvolt level during baseline and feedback phases. (Figure 1, p. 61, from: Epstein, L. H.,
Hersen, M., & Hemphill, D. P. (1974). Music feedback as a treatment for tension headache: An
experimental case study. Journal of Behavior Therapy and Experimental Psychiatry, 5, 59-63.
Copyright 1974 by Pergamon. Reproduced by permission.)
When using the withdrawal strategy, such as the A-B-A-B design, most
experimenters have been concerned with the effects of their treatment variable
on one behavior (the targeted behavior). However, in a number of
reports (Kazdin, 1973a; Kazdin, 1973b; Lovaas & Simmons, 1969; Risley,
1968; Sajwaj, Twardosz, & Burke, 1972; Twardosz & Sajwaj, 1972) the
importance of monitoring concurrent (nontargeted) behaviors was docu-
mented. This is of particular importance when side effects of treatment are
possibly negative (see Sajwaj, Twardosz, & Burke, 1972). Kazdin (1973b) has
listed some of the potential advantages in monitoring the multiple effects of
treatment on operant paradigms.
One initial advantage is that such assessment would permit the possibility of
determining response generalization. If certain response frequencies are in-
This study . . . points out the desirability of measuring several child behaviors,
although a modification procedure might focus on only one. In this way the
preschool teacher can assess the efficacy of her program based upon changes in
other behaviors as well as the behavior of immediate concern, (p. 77)
FIGURE 5-12. Percentages of Tim's sitting, posturing, walking, use of toys, and proximity to
children during freeplay as a function of the teacher's ignoring him when he did not obey a
command to sit down. (Figure 1, p. 75, from: Twardosz, S., & Sajwaj, T. (1972). Multiple effects
of a procedure to increase sitting in a hyperactive retarded boy. Journal of Applied Behavior
Analysis, 5, 73-78. Copyright 1972 by Society for the Experimental Analysis of Behavior, Inc.
Reproduced by permission.)
Thus changes from one phase to the next are accomplished with the experimenter's
full knowledge of prior results. Moreover, specific techniques are
then applied with the expectation that they will be efficacious. Although these
factors are of benefit to the experimental clinician, they present certain
difficulties from a purely experimental standpoint. Indeed, critics of the
single-case approach have concerned themselves with the possibilities of bias
in evaluation and in actual application and withdrawal of specified techniques.
One method of preventing such "bias" is to determine lengths of
baseline and experimental phases on an a priori basis, while keeping the
experimenter uninformed as to trends in the data during their collection. A
problem with this approach, however, is that decisions regarding choice of
baselines and those concerned with appropriate timing of institution and
removal of therapeutic variables are left to chance.
The above-discussed strategy was carried out in an A-B-A-B design in
which target measures were rated from video tape recordings for all phases on
a postexperimental basis. Hersen, Miller, and Eisler (1973) examined the
effects of varying conversational topics (nonalcohol and alcohol-related) on
duration of looking and duration of speech in four chronic alcoholics and
their wives in ad libitum interactions videotaped in a television studio. Fol-
lowing 3 minutes of "warm-up" interaction, each couple was instructed to
converse for 6 minutes (A phase) about any subject unrelated to the hus-
band's drinking problem. Instructions were repeated at 2-minute intervals
over a two-way intercom from an adjoining room to ensure maintenance of
the topic of conversation. In the next 6 minutes (B phase) the couple was
instructed to converse only about the husband's drinking problem (instruc-
tions were repeated at 2-minute intervals). The last 12 minutes of interaction
consisted of identical replications of the A and B phases.
Mean data for the four couples are presented in Figure 5-13. Speech
duration data show no trends across experimental phases for either husbands
or wives. Similarly, duration of looking for husbands across phases does not
vary greatly. However, duration of looking for wives was significantly greater
during alcohol- than nonalcohol-related segments of interaction. In the first
FIGURE 5-13. Looking and speech duration in nonalcohol- and alcohol-related interactions of
alcoholics and their wives. Plotted in blocks of 2 minutes. Closed circles husbands; open
circles wives. (Figure 1, p. 518, from: Hersen, M., Miller, P. M., & Eisler, R. M. (1973).
Interactions between alcoholics and their wives: A descriptive analysis of verbal and non-verbal
behavior. Quarterly Journal of Studies on Alcohol, 34, 516-520. Copyright 1973 by Journal of
Studies on Alcohol, Inc. New Brunswick, N.J. 08903. Reproduced by permission.)
until the wives' looking duration achieved stability in the form of a plateau.
Then the second phase would have been introduced.
usually involves the application of a treatment. In the second phase (A) the
treatment is withdrawn and in the final phase (B) it is reinstated. Some
investigators (e.g., Agras et al., 1968) have introduced an abbreviated base-
line session prior to the major B-A-B phases. The B-A-B design is superior to
the A-B-A design, described in section 5.3, in that the treatment variable is in
effect in the terminal phase of experimentation. However, absence of an
Basic A-B-A Withdrawal Designs 167
and Hersen (1973), the use of the more complete A-B-A-B design is preferred
for assessment of singular therapeutic variables.
We will illustrate the use of the B-A-B strategy with one example selected
from the operant literature and a second drawn from the Rogerian frame-
work. In the first, an entire group of subjects underwent introduction,
removal, and reintroduction of a treatment procedure in sequence (Ayllon &
Azrin, 1965). In the second, a variant of the B-A-B design was employed by
proponents of client-centered therapy (Truax & Carkhuff, 1965) in an attempt
to experimentally manipulate levels of therapeutic conditions.
Ayllon and Azrin (1965) used the B-A-B strategy on a group basis in their
evaluation of the effects of token economy on the work performance of 44
"backward" schizophrenic subjects. During the first 20 days (B phase) of the
experiment, subjects were awarded tokens (exchangeable for a large variety
of "backup" reinforcers) for engaging in hospital ward work activities. In the
next 20 days (A phase) subjects were given tokens on a noncontingent basis,
regardless of their work performance. Each subject received tokens daily,
based on the mean daily rate obtained in the initial B phase. In the last 20
days (second B phase) the contingency system was reinstated. We might note
at this point that this design could alternately be labeled B-C-B, as the middle
phase is not a true measure of the natural frequency of occurrence of the
target measure (see section 5.6).
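The arithmetic behind the noncontingent (A) phase can be sketched briefly. Assuming hypothetical per-subject token records from the first B phase (the function name and the sample numbers are illustrative only and are not data from Ayllon and Azrin), each subject's fixed daily payment is that subject's mean daily earnings under the contingency, delivered regardless of work performed:

```python
from statistics import mean

def noncontingent_allotment(tokens_by_subject):
    """Return each subject's fixed daily token payment for the A phase.

    tokens_by_subject maps a subject id to the list of tokens earned on
    each day of the initial contingent (B) phase. The payment is the mean
    of those daily totals, so it no longer depends on work performed.
    """
    return {subject: mean(daily) for subject, daily in tokens_by_subject.items()}

# Hypothetical records for two subjects over four days (illustrative only).
first_b_phase = {"S01": [40, 44, 36, 40], "S02": [10, 20, 15, 15]}
allotment = noncontingent_allotment(first_b_phase)
```

Holding the amount of reinforcement at its earlier mean while removing only the contingency is what allows the middle phase to isolate the effect of the contingency itself.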
Work performance data (total hours per day) for the three experimental
phases appear in Figure 5-14. During the first B phase, total hours per day
[Figure 5-14: total hours of work per day for the group (N = 44), plotted over 60 days across three phases: reinforcement contingent upon performance, reinforcement not contingent upon performance, and reinforcement contingent upon performance.]
Although the withdrawal design has been used in physiological research for
years, and has been associated with the operant paradigm, the experimental
strategies that are applied can easily be employed in the investigation of
nonoperant (both behavioral and traditional) treatment procedures. In this
connection, Truax and Carkhuff (1965) systematically examined the effects of
high and low "therapeutic conditions" on the responses of 3 psychiatric
patients during the course of initial 1-hour interviews. Each of the interviews
consisted of the three 20-minute phases. In the first phase (B) the therapist
was instructed to evidence high levels of "accurate empathy" and "uncondi-
tional positive warmth" in his interactions with the patient. In the following
Each of the three interviews was audiotaped. From these audiotapes, five 3-
minute segments for each phase were obtained and rerecorded on separate
spools. These were then presented to raters (naive as to which phase the tape
originated in) in random order. Ratings made on the basis of the Accurate
Empathy Scale and the Unconditional Positive Regard Scale confirmed
(graphically and statistically) that the therapist followed directions as indi-
cated by the dictates of the experimental design (B-A-B).
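The blinding step described above can be illustrated with a short sketch. It assumes each segment is identified by a phase label and a segment number; the opaque tape codes, the seed, and the function name are hypothetical conveniences, not part of the original procedure:

```python
import random

def blind_presentation_order(segments, seed=None):
    """Shuffle (phase, segment_number) pairs and hide the phase from raters.

    Returns the opaque codes in presentation order, plus a key that maps
    each code back to its phase and segment for decoding after rating.
    """
    rng = random.Random(seed)
    shuffled = list(segments)
    rng.shuffle(shuffled)
    codes = [f"tape-{i:02d}" for i in range(1, len(shuffled) + 1)]
    key = dict(zip(codes, shuffled))
    return codes, key

# Three 20-minute phases (B-A-B), five 3-minute segments drawn from each.
segments = [(phase, n) for phase in ("B1", "A", "B2") for n in range(1, 6)]
codes, key = blind_presentation_order(segments, seed=0)
```

Raters would see only the codes; the key stays with someone who does no rating, so phase membership cannot influence the scores.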
The effects of high and low therapeutic conditions were then assessed in
terms of depth of the patient's intrapersonal exploration. Once again,
3-minute segments from the A and B phases were presented to "naive" raters in
randomized order. These new ratings were made on the basis of the Truax
Depth of Intrapersonal Exploration Scale (reliability of raters per segment =
.78). Data with respect to depth of intrapersonal exploration are plotted in
Figure 5-15. Visual inspection of these data indicates that depth of intraper-
sonal exploration, despite considerable overlapping in adjacent phases, was
somewhat lowered during the middle phase (A) for each of the three patients.
Although these data are far from perfect (i.e., overlap between phases), the
study does illustrate that the controlling effects of nonbehavioral therapeutic
variables can be investigated systematically using the experimental analysis of
behavior model. Those of nonbehavioral persuasion might be encouraged to
assess the effects of their technical operations more frequently in this fashion.
FIGURE 5-15. Depth of intrapersonal exploration. (Figure 4, p. 122, redrawn from: Truax,
C. B., & Carkhuff, R. R. (1965). Experimental manipulation of therapeutic conditions. Journal
of Consulting Psychology, 29, 119-124. Copyright 1965 by the American Psychological
Association. Reproduced by permission.)
analogous to the A1 phase (placebo) used in drug evaluations (see chapter 6).
In the final phase, contingent reinforcement procedures are reinstated. Thus
the last three phases of study are identical to those used by Ayllon and Azrin
(1965) in the example described in section 5.5 (however, there the study is
labeled B-A-B).
In the A-B-C-B design the A and C phases are not comparable, inasmuch
as experimental procedures differ. Therefore, the main experimental analysis
is derived from the B-C-B portion of study. However, baseline observations
are of some value, as the effects of B over A are suggested (here we have the
limitations of the A-B analysis). We will illustrate the use of the A-B-C-B
design with one example concerned with the control of drinking in a chronic
alcoholic.
Miller, Hersen, Eisler, and Watts (1974) examined the effects of monetary
reinforcement in a 48-year-old "skid row" alcoholic. During all phases of
study, a research assistant obtained breathalyzer samples, analyzed biochemi-
cally shortly thereafter for blood alcohol concentration, from the subject
(psychiatric outpatient) in various locations in his community. To avoid
possible bias in measurement, the subject was not informed as to specific
times that probe measures were to be taken. In fact, these times were
randomized in all phases to control for measurement bias.
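One way such an unannounced probe schedule might be generated is sketched below. The start date, the phase length, and the 08:00 to 20:00 sampling window are assumptions made purely for illustration; the source states only that probe times were randomized and not disclosed to the subject:

```python
import random
from datetime import datetime, timedelta

def random_probe_times(phase_start, phase_days, n_probes, seed=None):
    """Draw n_probes distinct probe days within a phase, each with a random
    time of day between 08:00 and 20:00 (the window is an assumption)."""
    rng = random.Random(seed)
    days = sorted(rng.sample(range(phase_days), n_probes))
    times = []
    for day in days:
        minutes = rng.randrange(8 * 60, 20 * 60)
        times.append(phase_start + timedelta(days=day, minutes=minutes))
    return times

# Eight baseline probes over a hypothetical 28-day phase.
probes = random_probe_times(datetime(1974, 1, 7), phase_days=28, n_probes=8, seed=1)
```

Because neither the day nor the hour is predictable, the subject cannot time his drinking around the measurement.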
During baseline (A phase), eight probe measures were obtained. During
contingent reinforcement (B), the subject was awarded $3.00 in canteen
booklets (redeemable at the hospital commissary for material goods)
whenever a negative blood alcohol sample was obtained. In the noncontingent
reinforcement phase (C), reinforcement ($3.00 in canteen booklets)
was administered regardless of blood alcohol concentration. In the final
phase, contingent reinforcement was reinstituted.
Inspection of Figure 5-16 reveals a variable baseline pattern ranging from a
.00 to .27 level of blood alcohol. In contingent reinforcement, five of the six
FIGURE 5-16. Biweekly blood-alcohol concentrations for each phase. (Figure 1, p. 262, from:
Miller, P. M., Hersen, M., Eisler, R. M., & Watts, J. G. (1974). Contingent reinforcement of
(B) the room manager condition was reinstated. Then there was a 69-day
follow-up period involving the room manager condition in the absence of the
experimenter.
Data appear in Figure 5-17 and are presented as the percentage of subjects
(i.e., trainees) engaged in activity. It is clear that baseline (A) functioning was
FIGURE 5-17. Percentage of trainees engaged during the activity hour for 19 days and follow-up
days. (Figure 1, p. 236 from: Porterfield, J., Blunden, R., & Blewitt, E. (1980). Improving
environments for profoundly handicapped adults: Using prompts and social attention to main-
tain high group engagement. Behavior Modification, 4, 225-241. Copyright 1980 by Sage
Publications. Reproduced by permission.)
effect change over baseline levels. However, in the A-B-A-B-A-C-A design the
individual controlling effects of B and C variables can be determined. A
careful distinction should be made between these kinds of designs and designs
where the interactive effects of variables are investigated (e.g., A-B-A-B-BC-
B-BC). In the latter design the effects of C above those of B can be assessed
experimentally. Once again, in the A-B-A-C-A design the effects of B and C
Extensions of the A-B-A Design 175
1972; Leitenberg et al., 1968; Turner, Hersen, & Alford, 1974). Such analysis
is accomplished by examining the effects of both variables alone and in
combination, to determine the interaction. This extends beyond analysis of
the separate effects of two therapeutic variables over baseline as represented
by the A-B-A-C-A type design described in the second category. It also
extends a step beyond merely adding a variation of a therapeutic variable on
the end of an A-B-A-B series (e.g., A-B-A-B-BC), since no experimental
analysis of the additive effects of BC is performed. Properly run, interaction
designs are complex and usually require more than one subject (see section
6.5).
The fifth category consists of the changing-criterion design (Hartmann &
Hall, 1976) and its variant, the periodic-treatments design (cf. Hayes, 1981).
Basically, in the changing-criterion design, baseline is followed by treatment
until a preset criterion is met. This then becomes the new baseline (A'), and a
new criterion is set. Such repetition, of course, continues until eventually the
final criterion is reached (see Hersen, 1982).
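The stepwise logic of the changing-criterion design can be sketched as follows. This is a minimal illustration that assumes equal-sized steps; the function name and the example values are hypothetical, and actual studies may vary step size from phase to phase:

```python
def changing_criterion_phases(baseline_level, final_criterion, step):
    """List the successive criteria for a changing-criterion design.

    Each criterion, once met, serves as the baseline for the next one.
    Works for targets below baseline (reduction) or above it (increase).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    direction = 1 if final_criterion > baseline_level else -1
    criteria = []
    level = baseline_level
    while level != final_criterion:
        level += direction * step
        # Do not overshoot the final criterion.
        if (final_criterion - level) * direction < 0:
            level = final_criterion
        criteria.append(level)
    return criteria

# Hypothetical example: reduce from 20 units per day to 5 in steps of 4.
print(changing_criterion_phases(20, 5, 4))  # [16, 12, 8, 5]
```

Experimental control is inferred when the observed behavior tracks each successive criterion in turn.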
The following subsections present examples of extensions and variations,
with illustrations selected from each of the five major categories.
FIGURE 6-1. A record of the weight of Subject 1 during all conditions. Each open circle
(connected by the thin solid line) represents a 2-week minimum weight loss requirement. Each
solid dot (connected by the thick solid line) represents the subject's weight on each day that he was
measured. Each triangle indicates the point at which the subject was penalized by a loss of
valuables, either for gaining weight or for not meeting a 2-week minimum weight loss require-
ment. NOTE: The subject was ordered by his physician to consume at least 2,500 calories per day
for 10 days, in preparation for medical tests. (Figure 1a, p. 104, from: Mann, R. A. [1972]. The
behavior-therapeutic use of contingency contracting to control an adult behavior problem:
Weight control. Journal of Applied Behavior Analysis, 5, 99-109. Copyright 1972 by Society for
the Experimental Analysis of Behavior, Inc. Reproduced by permission.)
A-B-A-C-A-C'-A design
aspects of the patient's ward life was instituted. Tokens could be earned by the
patient for "talking correctly" (nondelusionally) both in individual sessions
and on the ward. Tokens were exchangeable for meals, luxuries, and privi-
leges. Phase 5 (A) once again involved a return to baseline. In the sixth phase
FIGURE 6-2. Percentage of delusional talk of Subject 4 during therapist sessions and on ward for
each experimental day. (Figure 4, p. 256, from: Wincze, J. P., Leitenberg, H., & Agras, W. S.
[1972]. The effects of token reinforcement and feedback on the delusional verbal behavior of
chronic paranoid schizophrenics. Journal of Applied Behavior Analysis, 5, 247-262. Copyright
1972 by Society for the Experimental Analysis of Behavior, Inc. Reproduced by permission.)
(1972) carried out this necessary counterbalancing with half of their subjects
in order to analyze the effects of feedback on token reinforcement.
This design, then, approximates the group crossover design or the counter-
balanced within-subject group comparison (e.g., Edwards, 1968), with the
exception of the presence of repeated measures and individual analyses of the
data. Each design option suffers from possible multiple-treatment inter-
A-C-A single-case design, on the other hand, data are usually presented more
descriptively, with visual analysis sometimes combined with statistical descrip-
tions (rather than inferences) to estimate the effect of each treatment. Wincze
et al. (1972) did an excellent job of this in their series, which is fully described
in chapter 10. But analysis depends on comparing individuals experiencing
different orders of treatments. Thus the functional analysis cannot be carried
out within one individual with all of the experimental control that it affords.
Other alternatives to comparing two treatments include a between-groups
comparison design or an alternating-treatments design (see chapter 8).
As noted above, this direct replication series will be discussed in greater
detail in chapter 10.
in their papers. In the next phase (B) each child was permitted access to an
adjoining playroom, containing attractive toys, after his or her paper was
scored. The child was allowed to remain there until the 50-minute period was
terminated, unless he or she became too noisy; then he or she was required to
return to his or her seat. The next two phases (A and B) were identical to the
first two. In the last three phases each child was permitted access to the
playroom after his or her paper had been scored, but the length of class
periods was gradually decreased (45, 40, 35 minutes). A procedural exception
to the aforementioned was made in the last phase on Days 47-54 inasmuch as
the teacher noted that a concomitant of increased speed was decreased quality
(number of errors) in writing. Therefore, during the last 8 days a quality
criterion was imposed before the child gained access to the playroom. In some
cases the child was required to recopy a portion of writing.
Data for first-grade children are plotted in Figure 6-3. Examination of the
bottom half of the figure shows that access to the playroom (50-minute
period) increased the rate of letter writing over baseline levels. This was
confirmed on two occasions in the A-B-A-B portion of study. When total time
of classroom periods systematically decreased, a corresponding increase in
rate of writing resulted. However, data for the last three phases are
correlative, as an experimental analysis was not performed. For example, a
sequential comparison of 50-, 45- and 50-minute periods was not made. Therefore,
56 lbs. at baseline) who was profoundly retarded and who ruminated (emesis
of previously chewed food, rechewing food, and reswallowing food). The
disorder had begun some 17 years earlier.
Baseline (A) observations took place one hour after the subject had con-
sumed his meal. After each meal Bob was brought to the cottage lounge and
observed. Duration of rumination (cheek swelling, chewing, and swallowing)
FIGURE 6-3. The mean number of letters printed per minute by first-grade children are shown on
the lower coordinates, and the mean proportion of letters scored as errors are on the upper
coordinates. Each data point represents the mean averaged over all children for that day. The
horizontal dashed lines are the means of the daily means averaged over all days within the
experimental conditions noted by the legends at the top of the figure. (Figure 1, p. 81, from:
Hopkins, B. L., Schutte, R. C., & Garton, K. L. [1971]. The effects of access to a playroom on
the rate and quality of printing and writing of first- and second-grade students. Journal of
Applied Behavior Analysis, 4, 77-87. Copyright 1971 by Society for the Experimental Analysis of
Behavior, Inc. Reproduced by permission.)
was timed. In the second phase (B) a DRO procedure was implemented. This
consisted of giving Bob small portions of cookies or bits of peanut butter
contingent on no rumination. In the B phase reinforcement was provided if
no rumination occurred for 15 seconds or more (IRT> 15"). In the next phase
FIGURE 6-4. Duration of ruminations after meals by Bob. (Figure 2, p. 328, from: Conrin, J.,
Pennypacker, H. S., Johnston, J. M., & Rast, J. [1982]. Differential reinforcement of other
behaviors to treat chronic rumination of mental retardates. Journal of Behavior Therapy and
Experimental Psychiatry, 13, 325-329. Copyright 1982 by Pergamon. Reproduced by permission.)
and psychiatric literatures (e.g., Agras, Bellack, & Chassan, 1964; Chassan,
1967; K. V. Davis, Sprague, & Werry, 1969; Grinspoon, Ewalt, & Shader,
1967; Hersen & Breuning, in press; Liberman et al., 1973; Lindsley, 1962;
McFarlain & Hersen, 1974; Roxburgh, 1970). Indeed, Liberman et al. (1973)
have encouraged researchers to use the within-subject withdrawal design in
assessing drug-environment interactions. In support of their position they
contend that:
There is no doubt that this approach can be of value in the study of both the
major forms of psychopathology and those of more exotic origin (Hersen &
Breuning, in press). The single-case experimental strategy is especially well
suited to the latter, as control group analysis in the rarer disorders is obviously
not feasible.
Specific issues
method of assigning patients to drugs such that neither the patient nor the
investigator observing him knows which medication a patient is receiving at
any point along the course of treatment" (Chassan, 1967, pp. 80-81). In these
studies, placebos and active drugs are identical in size, shape, markings, and
color.
While the double-blind procedure is readily adaptable to group comparison
research, it is difficult to engineer for some of the single-case strategies and
impossible for others. Moreover, in some cases (see Table 6-1, Designs 1, 2, 4,
5, 8) even the single-blind strategy (where only the subject remains unaware
of differences in drug and placebo manipulations) is not applicable. In these
designs the changes from baseline observation to either placebo or drug
conditions obviously cannot be disguised in any manner.
one of the major advantages of the single-case strategy (i.e., its flexibility) is
lost. However, even though the experimenter is fully aware of treatment
changes, the spirit of the double-blind trial can be maintained by keeping the
observer (often a research assistant or nursing staff member) unaware of drug
and placebo changes (Barlow & Hersen, 1973). We might note here
additionally that despite the use of the double-blind procedure, the side effects of
drugs in some cases (e.g., Parkinsonism following administration of large
In some of the investigations in which the subject has served as his or her
own control, the standard experimental analysis method of study, where the
treatment variable is introduced, withdrawn, and reintroduced following
initial measurement, has not been followed rigorously. Thus the controlling
effects of the drug under evaluation have not been fully documented. For
example, K. V. Davis et al. (1969) used the following sequence of drug and
However, to date, most of these strategies have not yet been implemented.
A number of possible single-case strategies suitable for drug evaluation are
presented in Table 6-1. The first three strategies fall into the A-B category and
are really quasi-experimental designs, in that the controlling effects of the
treatment variable (placebo or active drug) cannot be determined. Indeed, it
FIGURE 6-5. Interpersonal eye contact, motor, and self-stimulation in a schizophrenic young
man during placebo and fluphenazine (20 mg daily) conditions. Each session represents the
average of a 2-day block of observations. (Figure 3, p. 437, from: Liberman, R. P., Davis, J.,
Moon, W., & Moore, J. [1973]. Research design for analyzing drug-environment-behavior
interactions. Journal of Nervous and Mental Disease, 156, 432-439. Copyright 1973. Reproduced
by permission.)
response facilitation which is seen most clearly in the increase of verbal self-
stimulation, and less so in rate of eye contact" (p. 437). It was also suggested
that residual phenothiazines during the placebo phase may have contributed
to the continued increase in eye contact. However, in the absence of concur-
rent monitoring of biochemical factors (phenothiazine blood and urine
levels), this hypothesis cannot be confirmed. In summary, Liberman et al.
(1973) were not able to confirm the controlling effects of fluphenazine over
any of the target behaviors selected for study in this A1-B-A1 design.
Let us now continue our examination of drug designs listed in Table 6-1.
Strategies 7-9 can be classified as B-A-B designs, and the same advantages
and limitations previously outlined in section 5.5 of chapter 5 apply here.
Strategies 10-12 fall into the general category of A-B-A-B designs and are
superior to the A-B-A and B-A-B designs for several reasons: (1) The initial
observation period involves baseline or baseline-placebo measurement; (2)
there are two occasions in which the controlling effects of the placebo or the
treatment variables can be demonstrated; and (3) the concluding phase ends
on a treatment variable.
Agras (1976) used an A-B-A-B design to assess the effects of chlorproma-
zine in a 16-year-old, black, brain-damaged, male inpatient who evidenced a
wide spectrum of disruptive behaviors on the ward. Included in his repertoire
were: temper tantrums, stealing food, eating with his fingers, exposing him-
self, hallucinations, and begging for money, cigarettes, or food. A specific
token economy system was devised for this youth, whereby positive behaviors
resulted in his earning tokens, and inappropriate behaviors resulted in his
being penalized with fines. Number of tokens earned and number of tokens
fined were the two dependent measures selected for study. The results of this
investigation appear in Figure 6-6. In the first phase (A) no thorazine was
administered. Although improvement in appropriate behaviors was noted,
the patient's disruptive behaviors continued to increase markedly, resulting in
his being fined many times. This occurred in spite of the addition of a
time-out contingency. On Hospital Day 9, thorazine (300 mg per day) was
introduced (B phase) in an attempt to control the patient's impulsivity. This dosage
was subsequently decreased to 200 mg per day, as he became drowsy. Ex-
amination of Figure 6-6 reveals that fines decreased to a zero level whereas
tokens earned for appropriate behaviors remained at a stable level. In the
[Figure 6-6: number of tokens earned and number of tokens fined across hospital days during alternating no-Thorazine and Thorazine phases.]
FIGURE 6-7. Average number of refusals to engage in a brief conversation. (Figure 2, p. 435,
from: Liberman, R. P., Davis, J., Moon, W., & Moore, J. [1973]. Research design for analyzing
drug-environment-behavior interactions. Journal of Nervous and Mental Disease, 156, 432-439.
Copyright 1973 Williams & Wilkins. Reproduced by permission.)
FIGURE 6-8. Mean duration of hand-washing and toothbrushing per day. (Figure 3, p. 654,
from: Turner, S. M., Hersen, M., Bellack, A. S., Andrasik, F., & Capparell, H. V. [1980].
Behavioral and pharmacological treatment of obsessive-compulsive disorders. Journal of Ner-
vous and Mental Disease, 168, 651-657. Copyright 1980 The Williams and Wilkins Co., Balti-
more. Reproduced by permission.)
obvious that these variables may have different effects when interacting with
other treatment variables. In advanced stages of the construction of complex
treatments it becomes necessary to determine the nature of these interactions.
Within the group comparison approach, statistical techniques, such as analy-
sis of variance, are quite valuable in determining the presence of interaction.
These techniques are not capable, however, of determining the nature of the
interaction or the relative contribution of a given variable to the total effect in
an individual.
To evaluate the interaction of two (or more) variables, one must analyze the
effects of both variables separately and in combination in one case, followed
by replications. However, one must be careful to adhere to the basic rule of
not changing more than one variable at a time (see chapter 3, section 3.4).
Before discussing examples of strategies for studying interaction, it will be
helpful to examine some examples of designs containing two or more
variables that are not capable of isolating interactive or additive effects. The first
example is one where variations of a treatment are added to the end of a
successful A-B-A-B (e.g., A-B-A-B'-B'-B' described above or an A-B-A-B-BC
design in which C is a different therapeutic variable). If the BC variable
produced an effect over and above the previous B phase, this would provide a
clue that an interaction existed, but the controlling effects of the BC phase
would not have been demonstrated. To do this, one would have to return to
the B phase and reintroduce the BC phase once again.
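The two requirements just described (changing only one variable at a time, and comparing B with BC more than once) can be checked mechanically on a design written in phase notation. The sketch below uses a deliberately simplified model in which each letter other than A is treated as an independent treatment component that is either present or absent; primed variants such as B' are not handled, and the function names are hypothetical:

```python
def components(phase):
    """Split a phase label into its treatment variables; 'A' is baseline."""
    return set() if phase == "A" else set(phase)

def one_variable_violations(design):
    """Return adjacent phase pairs that differ by more than one variable."""
    phases = design.split("-")
    return [
        (prev, nxt)
        for prev, nxt in zip(phases, phases[1:])
        if len(components(prev) ^ components(nxt)) > 1
    ]

def adjacent_comparisons(design, first, second):
    """Count the phase changes between first and second, in either direction."""
    phases = design.split("-")
    return sum(
        1
        for prev, nxt in zip(phases, phases[1:])
        if {prev, nxt} == {first, second}
    )

print(one_variable_violations("A-BC-A"))                     # [('A', 'BC'), ('BC', 'A')]
print(adjacent_comparisons("A-B-A-B-BC", "B", "BC"))         # 1
print(adjacent_comparisons("A-B-A-B-BC-B-BC", "B", "BC"))    # 3
```

Under this model, A-B-A-B-BC contains only a single B-to-BC change, which is why it offers a clue rather than a demonstration; extending it to A-B-A-B-BC-B-BC adds the return to B and the reintroduction of BC.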
A second design, containing two or more variables where analysis of
interaction is not possible, occurs if one performs an experimental analysis of
one variable against a background of one or more variables already present in
the therapeutic situation. For example, O'Leary et al. (1969) measured the
disruptive behavior of seven children in a classroom. Three variables (rules,
educational structure, and praising appropriate behavior while ignoring dis-
ruptive behavior) were introduced sequentially. At this point, we have an A-
B-BC-BCD design, where B is rules, C is structure, and D is praise and
ignoring. With the exception of one child, these procedures had no effect on
disruptive behavior. A fourth treatment, token economy, was then added.
In five of six children this was effective, and withdrawal and reinstatement of
the token economy confirmed its effectiveness. The last part of the design can
In that series (Leitenberg et al., 1968) the first subject was a severe knife
phobic. The target behavior selected for study was the amount of time (in
seconds) that the patient was able to remain in the presence of the phobic
object. The design can be represented as B-BC-B-A-B-BC-B, where B
represents feedback, C represents praise, and A is baseline. Each session consisted
of 10 trials. Feedback consisted of informing the patient after each trial as to
the amount of time spent looking at the knife. Praise consisted of verbal
reinforcement whenever the patient exceeded a progressively increasing time
criterion. The results of the study are reproduced in Figure 6-9. During
feedback, a marked upward linear trend in time spent looking at the knife
was noted. The addition of praise did not appear to add to the therapeutic
effect. Similarly, the removal of praise in the next phase did not subtract from
the progress. At this point, it appeared that feedback was responsible for the
therapeutic gains. Withdrawal and reinstatement of feedback in the next two
FIGURE 6-9. Time in which a knife was kept exposed by a phobic patient as a function of
feedback, feedback plus praise, and no feedback or praise conditions. (Figure 2, p. 136, from:
Leitenberg, H., Agras, W. S., Thomson, L. E., & Wright, D. E. [1968]. Feedback in behavior
modification: An experimental analysis in two phobic cases. Journal of Applied Behavior
Analysis, 1, 131-137. Copyright 1968 by Society for the Experimental Analysis of Behavior, Inc.
Reproduced by permission.)
FIGURE 6-10. (Figure 1, from: Leitenberg, H. [1973]. Interaction designs. Paper read at the
exceeded a certain criterion, the patient could leave her room, watch televi-
sion, play table games with the nurses, and so on. Feedback consisted of
providing precise information on weight, caloric intake, and number of
mouthfuls eaten. Specifically, the patient plotted on a graph the information
that was provided by hospital staff.
ment. During the first feedback phase (labeled baseline on the graph), slight
gains in caloric intake and weight were noted (see Figure 6-11). When
reinforcement was added to feedback, caloric intake and weight increased
sharply. Noncontingent reinforcement produced a drop in caloric intake and
a slowing of weight gain, while reintroduction of reinforcement once again
produced sharp gains in both measures. These data contain hints of an
FIGURE 6-11. Data from an experiment examining the effect of positive reinforcement in the
absence of negative reinforcement (Patient 3). (Figure 2, p. 281, from: Agras, W. S., Barlow, D.
H., Chapin, H. N., Abel, G. G., & Leitenberg, H. [1974]. Behavior modification of anorexia
nervosa. Archives of General Psychiatry, 30, 279-286. Copyright 1974 American Medical Asso-
ciation. Reproduced by permission.)
interaction, in that caloric intake and weight rose slightly during the first
feedback phase, a finding that replicated two earlier experiments. The addi-
tion of reinforcement, however, produced increases over and above those for
feedback alone. The drop and subsequent rise of caloric intake and rate of
weight gain during the next two phases demonstrated that reinforcement is a
controlling variable when combined with feedback.
These data only hint at the role of feedback in this study, in that some
improvement occurred during the initial phase when feedback alone was in
effect. Similarly, we cannot know from this experiment the independent
effects of reinforcement because this aspect was not analyzed separately. To
accomplish this, two experiments were conducted where feedback was introduced
against a background of reinforcement. Only one experiment will be
presented, although both sets of data are very similar. The design can be
represented as A-B-BC-B-BC, where A is baseline, B is reinforcement, and C
is feedback (see Figure 6-12). It should be noted that the patient continued to
be presented with 6,000 calories throughout the experiment, a point to which
we will return later. During baseline, in which no reinforcement or feedback
was present, caloric intake actually declined. The introduction of reinforce-
[Figure 6-12 graph; abscissa: days.]
FIGURE 6-12. Data from an experiment examining the effect of feedback on the eating behavior
of a patient with anorexia nervosa (Patient 5). (Figure 4, p. 283, from: Agras, W. S., Barlow, D.
H., Chapin, H. N., Abel, G. G., & Leitenberg, H. [1974]. Behavior modification of anorexia
nervosa. Archives of General Psychiatry, 30, 279-286. Copyright 1974 American Medical Asso-
ciation. Reproduced by permission.)
ment did not result in any increases; in fact, a slight decline continued.
Adding feedback to reinforcement, however, produced increases in weight
and caloric intake. Withdrawal of feedback stopped this increase, which
began once again when feedback was reintroduced in the last phase.
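The phase notation used for these interaction designs (e.g., A-B-BC-B-BC) can be checked mechanically against the rule of changing one variable at a time. The following is only a minimal sketch: the function names are hypothetical, and it assumes that "A" denotes baseline (no treatment variable in effect) and that every other letter in a phase label stands for one treatment variable.

```python
def parse_design(design):
    """Split a design string such as 'A-B-BC-B-BC' into phases.

    Each phase becomes the set of treatment letters in effect.
    Assumption: 'A' means baseline, i.e., no treatment variable.
    """
    phases = []
    for label in design.split("-"):
        phases.append(frozenset() if label == "A" else frozenset(label))
    return phases


def changes_between_phases(design):
    """List the variables added or removed at each phase transition."""
    phases = parse_design(design)
    return [sorted(prev ^ curr) for prev, curr in zip(phases, phases[1:])]


def one_variable_at_a_time(design):
    """True if every transition adds or removes exactly one variable."""
    return all(len(diff) == 1 for diff in changes_between_phases(design))


# The feedback experiment described above: A-B-BC-B-BC
print(changes_between_phases("A-B-BC-B-BC"))  # [['B'], ['C'], ['C'], ['C']]
print(one_variable_at_a_time("A-B-BC-B-BC"))  # True
print(one_variable_at_a_time("A-BC"))         # False
```

A sequence such as A-BC fails the check because two variables are introduced at once, so their separate contributions cannot be distinguished.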
With this experiment (and its replications) it becomes possible to draw
conclusions about the nature of what is in this case a complex interaction.
When both variables were presented alone, as in the initial phases in the
respective experiments, reinforcement produced no increases, but feedback
produced some increase. When presented in combination, reinforcement
added to the feedback effect and, against a background of feedback, became
the controlling variable, in that caloric intake decreased when contingent
reinforcement was removed. Feedback, however, also exerted a controlling
effect when it was removed and reintroduced against a background of rein-
Under this condition, size of meal did have an effect, in that more was
eaten when 6,000 calories were served than when 3,000 calories were pre-
sented (see Figure 6-13). In terms of treatment, however, even large meals
were incapable of producing weight gain in those phases where it was the only
therapeutic variable. Thus this variable is not as strong as feedback. The
authors concluded this series by summarizing the effects of the three variables
alone and in combination across five patients:
Thus large meals and reinforcement were combined in four experimental phases
and weight was lost in each phase. On the other hand, large meals and feedback
were combined in eight phases and weight was gained in all but one. Finally, all
three variables (large meals, feedback, and reinforcement) were combined in 12
phases and weight was gained in each phase. These findings suggest that informa-
[Figure 6-13 graph; abscissa: days.]
FIGURE 6-13. The effect of varying the size of meals upon the caloric intake of a patient with
anorexia nervosa (Patient 5). (Figure 5, p. 285, from: Agras, W. S., Barlow, D. H., Chapin, H.
N., Abel, G. G., & Leitenberg, H. [1974]. Behavior modification of anorexia nervosa. Archives
of General Psychiatry, 30, 279-286. Copyright 1974 American Medical Association. Reproduced
by permission.)
this point in time, in contrast with the experiments described above. One
example is the evaluation of cognitive strategies (M. E. Bernard et al., 1983)
and the other is concerned with the possible combined effects of drugs and
behavior therapy (Rapport, Sonis, Fialkov, Matson, & Kazdin, 1983). M. E.
Bernard et al. (1983) evaluated the effects of rational-emotive therapy (RET)
and self-instructional training (SIT) in an A-B-A-B-BC-B-BC-A design with
follow-up. The subject was a 17-year-old, overweight female who suffered
from trichotillomania (i.e., chronic hair pulling), especially while studying at
home. Throughout the study the subject self-monitored time studying and
number of hairs pulled out (deposited in an envelope). The dependent vari-
able was the ratio of hairs pulled out per minute of study time.
In baseline (A) the subject simply self-monitored. During the B phase, RET
was instituted, followed by a return to baseline (A) and reintroduction of
RET (B). In the next phase (BC), SIT, consisting of problem-solving dialogues,
was added to RET. Then, SIT was removed (B) and subsequently
reintroduced (BC). In the last phase (A) all treatment was removed, and then
follow-up was conducted.
Results of this study appear in Figure 6-14. The first four phases comprise
an A-B-A-B analysis and do appear to confirm the controlling effects of RET
in reducing hair pulling. However, at this point the subject, albeit improved,
still was engaging in the behavior a significant proportion of the time.
[Figure 6-14 graph; ordinate: number of hairs pulled out per minute of study time; abscissa: weeks.]
FIGURE 6-14. The number of hairs pulled out per minute of study time over baseline treatment
and follow-up phases. Missing data (*) reflect times when the subject did not study. (Figure 1, p.
277, from: Bernard, M. E., Kratochwill, T. R., & Keefauver, L. W. [1983]. The effects of rational-
emotive therapy and self-instructional training on chronic hair pulling. Cognitive Therapy and
Research, 7, 273-280. Copyright 1983 Plenum Publishing Corporation. Reproduced by permission.)
design, with two drugs (sodium valproate, carbamazepine) and one behav-
ioral technique (differential reinforcement of other behavior [DRO]) evalu-
ated (Rapport et al., 1983). The subject in this experimental analysis was a
13.7-year-old mentally retarded female who suffered from seizures and exhib-
ited aggressive behavior toward others. She had a long history of hospitaliza-
tions and had been tried on a large variety of medications, but with little
success. Aggressive behaviors included grabbing, biting, kicking, and hair
pulling. Aggression was the primary dependent measure in this study and was
recorded by inpatient staff with a high degree of interrater agreement (range
= 92%-100%).
The subject received carbamazepine (400 mg, t.i.d.) in each phase of the
study. In the first phase (BC) she received sodium valproate (1,200 mg) as
well. This was gradually withdrawn in Phase 2 (BC) and removed altogether
in Phase 3 (B). In Phase 4 (BD) a DRO procedure (edible reinforcements
delivered contingently for 15-minute time periods in which no aggression
occurred; then increased to 30 and 60 minutes) was added to carbamazepine.
DRO was discontinued in Phase 5 (B) and then reinstated in Phase 6 (BD).
Examination of Figure 6-15 shows a high rate of aggressive incidents (mean
= 15 per day) in the first phase (BC), which decreased (mean = 3 per day)
[Figure 6-15 graph; ordinate: number of incidents; abscissa: days.]
FIGURE 6-15. Data points represent the daily frequency of aggressive behavior during the child's
hospital stay. (Arrows indicate days when nocturnal enuresis was observed.) (Figure 1, p. 262,
from: Rapport, M. D., Sonis, W. A., Fialkov, M. J., Matson, J. L., & Kazdin, A. E. [1983].
Carbamazepine and behavior therapy for aggressive behavior: Treatment of a mentally retarded,
postencephalic adolescent with seizure disorder. Behavior Modification, 7, 255-264. Copyright
1983 by Sage Publication. Reproduced by permission.)
when sodium valproate was withdrawn (BC). However, when it was
totally withdrawn in Phase 3 (B), aggression rose to a mean of 10 a day.
Institution of DRO in Phase 4 (BD) led to a dramatic decrease (to 0); aggression
rose to 4-8 incidents when DRO was withdrawn (B) on days 63 and 64, and gradually
decreased to zero again when DRO was reintroduced (BD) on days 65-91.
Although there was only a 2-day withdrawal of DRO procedures, this is
truly justified given the aggressive nature of the behavior being observed.
Indeed, it is quite clear that although the drug carbamazepine had a minor
role in controlling aggression, the addition of DRO was the major controlling
force. Moreover, effectiveness of DRO allowed the subject to be discharged
to her family, with DRO procedures subsequently implemented at school in
order to ensure generalization of treatment gains.
Once again, replication on additional subjects and a subsequent reordering
of the experimental strategy so that DRO was analyzed separately and then
combined with the drug would be necessary for a more complete study of
interactions. Finally, the nature of this experimental strategy deserves some
comment, particularly when compared to other strategies attempting to
answer the same questions. First, in any experiment there are more things
interacting with treatment outcome than the two or more treatments or
variables under question. Foremost among these are client variables. This, of
course, is the reason for direct replication (see chapter 10). If the experimental
operations are replicated (in this example the interaction), despite the dif-
ferent experiences clients bring with them to the experiment, then one has
increasing confidence in the generality of the interactional finding across
subjects.
Second, as pointed out in chapter 5 and discussed more fully in chapter 8,
the latter phases of these experiments are subject to multiple-treatment inter-
ference. In other words, the effect of a treatment or interaction in the latter
phases may depend to some extent on experience in the earlier phases. But if
the interaction effect is consistent across subjects, both early and late in the
because the early phase most closely resembles the applied situation, where
the treatment would also be introduced and continued without a prior back-
ground of several treatments.
The other popular method of studying interactions is the between-group
factorial design. In this case, of course, one group would receive both
Treatments A and B, while two other groups would receive just A or just B.
(If the factorial were complete, another group would receive no treatment.)
Here treatments are not delivered sequentially, but the more usual problems
of intersubject variability, inflexibility in altering the design, infrequent mea-
surement, determination of results by statistical inference, and difficulties
generalizing to the individual obtain, as discussed in chapter 2. Each approach
to studying interactions obviously has its advantages and disadvantages.
This continues in graduated fashion until the final target (or criterion) is
between the criterion and behavior over the course of the intervention phase"
(Kazdin, 1982b, p. 160). When such close correspondence fails to materialize,
with stability not apparent in each successive phase, unambiguous interpreta-
tions of the data are not possible. One solution, of course, is to partially
withdraw treatment by returning to a lower criterion, followed by a return to
the more stringent one (as in a B-A-B withdrawal design). This adds experi-
mental confidence to the treatment by clearly documenting its controlling
effects. Or, on a more extended basis, one can reverse the procedure and
experimentally demonstrate successive increases in a targeted behavior fol-
lowing initial demonstration of successive decreases. This is referred to as bi-
directionality. Finally, Kazdin (1982b) pointed out that some experimenters
have dealt with the problem of excessive variability by showing that the mean
performance over adjacent subphases reflects the stepwise progression.
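The last of these strategies, comparing mean performance across adjacent subphases, can be sketched as follows. This is an illustration only: the function names and the smoking-style data are hypothetical, not taken from any study cited here, and each subphase is assumed to be given as a (criterion, observations) pair.

```python
from statistics import mean


def subphase_means(subphases):
    """Mean observed behavior within each subphase."""
    return [mean(observations) for _, observations in subphases]


def follows_stepwise_progression(subphases, decreasing=True):
    """True if subphase means move in one direction from phase to phase."""
    means = subphase_means(subphases)
    pairs = list(zip(means, means[1:]))
    if decreasing:
        return all(later < earlier for earlier, later in pairs)
    return all(later > earlier for earlier, later in pairs)


def max_deviation_from_criterion(subphases):
    """Largest gap between observed behavior and criterion, per subphase."""
    return [
        max(abs(x - criterion) for x in observations)
        for criterion, observations in subphases
    ]


# Hypothetical data: (criterion, daily counts) for three subphases
data = [(40, [41, 39, 40]), (35, [36, 33, 35]), (30, [31, 30, 28])]
print(follows_stepwise_progression(data))  # True
print(max_deviation_from_criterion(data))  # [1, 2, 2]
```

A stepwise pattern in the means is a weaker demonstration than close day-to-day correspondence with the criterion, which is why the deviations are reported alongside it.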
None of the aforementioned solutions to variability in the subphases is
ideal. Indeed, it behooves researchers using this design to demonstrate close
correspondence between the changing criterion and actually observed behav-
ior. Undoubtedly, as this design is employed more frequently, more elegant
[Figure 6-16 graph; abscissa: days (1 to 85); phases: baseline, treatment.]
FIGURE 6-16. Data from a smoking-reduction program used to illustrate the stepwise criterion
change design. The solid horizontal lines indicate the criterion for each treatment phase. (Figure
2, p. 529, from: Hartmann, D. P., & Hall, R. V. [1976]. The changing criterion design. Journal of
Applied Behavior Analysis, 9, 527-532. Copyright 1976 by Soc. for the Experimental Analysis of
Behavior. Reproduced by permission.)
These data do not show what about the treatment produced the change (any
more than an A-B-A design would). It may be therapist concern or the fact that
the client attended a session of any kind. These possibilities would then need to
be eliminated. For example, one could manipulate both the periodicity and
nature of treatment. If the periodicity of behavior change was shown only when
a particular type of treatment was in place, this would provide evidence for a
more specific effect. (p. 203)
FIGURE 6-17. The periodic treatments effect is shown on hypothetical data. (Data are graphed in
raw data form in the top graph.) Arrows on the abscissa indicate treatment sessions. This
apparent B-only graph does not reveal the periodicity of improvement and treatment as well as
the bottom graph, where each two data points are plotted in terms of the difference from the
mean of the two previous data points. Significant improvement occurs only after treatment. Both
graphs show an experimental effect; the lower is merely more obvious. (Figure 3, p. 202, from:
Hayes, S. C. [1981]. Single case experimental design and empirical clinical practice.
Journal of Consulting and Clinical Psychology, 49, 193-211. Copyright 1981 by American
Psychological Association. Reproduced by permission.)
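The transformation described in the caption to Figure 6-17 can be sketched in a few lines. This is one reading of the caption, stated as an assumption: the series is taken in consecutive pairs, and each pair is re-expressed as its difference from the mean of the immediately preceding pair. The function name and the data are hypothetical.

```python
def difference_from_previous_pair(scores):
    """Re-express each pair of points relative to the previous pair's mean.

    Assumption: scores are grouped in consecutive, non-overlapping pairs.
    The first pair has no predecessor and is dropped, as is a trailing
    unpaired point.
    """
    diffs = []
    for start in range(2, len(scores) - 1, 2):
        reference = (scores[start - 2] + scores[start - 1]) / 2
        diffs.append(scores[start] - reference)
        diffs.append(scores[start + 1] - reference)
    return diffs


# Hypothetical raw data: improvement only in the pairs that follow a
# treatment session (here, before the 3rd and 7th points).
raw = [10, 10, 14, 14, 14, 14, 18, 18]
print(difference_from_previous_pair(raw))  # [4.0, 4.0, 0.0, 0.0, 4.0, 4.0]
```

In the raw series the trend simply rises; in the transformed series the gains stand out as occurring only after treatment sessions.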
CHAPTER 7
7.1. INTRODUCTION
The use of sequential withdrawal or reversal designs is inappropriate when
treatment variables cannot be withdrawn or reversed due to practical limita-
tions, ethical considerations, or problems in staff cooperation (Baer et al.,
1968; Barlow et al., 1977; Barlow & Hersen, 1973; Birnbauer, Peterson, &
Solnick, 1974; Hersen, 1982; Kazdin & Kopel, 1975; Van Hasselt & Hersen,
1981). Practical limitations arise when carryover effects appear across adja-
cent phases of study, particularly in the case of therapeutic instructions
(Barlow & Hersen, 1973). A similar problem may occur when drugs with
known long-lasting effects are evaluated in single-case withdrawal designs.
Despite discontinuation of medication in the withdrawal (placebo) phase,
active agents persist psychologically and, with the phenothiazines, traces have
been found in body tissues many months later (Goodman & Gilman, 1975).
Also, when multiple behaviors within an individual are targeted for change,
withdrawal designs may not provide the most elegant strategy for such
evaluation.
Ethical considerations are of paramount importance when the treatment
variable is effective in reducing self- or other-destructive behaviors in sub-
jects. Here the withdrawal of treatment is obviously unwarranted, even for
brief periods of time. Related to the problem of undesirable behavior is the
matter of environmental cooperation. Even if the behavior in question does
not have immediate destructive effects on the environment, if it is considered
to be aversive (i.e., by teachers, parents, or hospital staff) the experimenter
will not obtain sufficient cooperation to carry out withdrawal or reversal of
treatment procedures. Under these circumstances, it is clear that the applied
clinical researcher must pursue the study using different experimental strate-
gies. In still other instances, withdrawal of treatment, despite absence of
In this chapter we will examine in detail the rationale and procedures for
multiple baseline designs. Examples of the three principal varieties of multiple
baseline strategies will be presented for illustrative purposes. In addition, we
will consider the more recent varieties and permutations, including the non-
concurrent multiple baseline design across subjects, the multiple-probe tech-
nique, and the changing criterion design. Finally, the application of the
multiple baseline across subjects in drug evaluations will be discussed.
continued in sequence until the experimental variable has been applied to all
of the target behaviors under study. In each case the treatment variable is
effective when a change in rate appears after its application while the rate of
concurrent (untreated) behaviors remains relatively constant. A basic as-
sumption is that the targeted behaviors are independent from one another. If
they should happen to covary, then the controlling effects of the treatment
variable are subject to question, and limitations of the A-B analysis fully
apply (see chapter 5).
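The logic of the multiple baseline design across behaviors can be illustrated with a rough numerical check. This is a sketch under stated assumptions, not a substitute for visual inspection of the data: the function names, the data, and the fixed change threshold are all hypothetical, and each onset index is assumed to be at least 1 so that every behavior has some baseline.

```python
from statistics import mean


def mean_change(series, split, end=None):
    """Mean of series[split:end] minus mean of series[:split]."""
    return mean(series[split:end]) - mean(series[:split])


def check_multiple_baseline(data, onsets, threshold):
    """For each behavior, report (1) whether it changed once treated and
    (2) whether it stayed stable while other behaviors were being treated.

    data: behavior name -> list of observations
    onsets: behavior name -> index at which treatment of it began
    threshold: hypothetical minimum mean shift counted as a change
    """
    report = {}
    for name, series in data.items():
        own = onsets[name]
        changed = abs(mean_change(series, own)) >= threshold
        # Compare this behavior's baseline before and after each earlier
        # treatment onset for another behavior.
        stable = all(
            abs(mean_change(series, other, own)) < threshold
            for other_name, other in onsets.items()
            if other_name != name and other < own
        )
        report[name] = {
            "changed_when_treated": changed,
            "stable_while_untreated": stable,
        }
    return report


onsets = {"behavior_1": 3, "behavior_2": 6}
independent = {
    "behavior_1": [1, 1, 1, 5, 5, 5, 5, 5, 5],
    "behavior_2": [2, 2, 2, 2, 2, 2, 6, 6, 6],
}
print(check_multiple_baseline(independent, onsets, threshold=2))
```

If the second behavior had instead risen at the first onset (covarying baselines), its `stable_while_untreated` entry would be False, and the limitations of the A-B analysis would apply.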
If general effects on multiple behaviors were observed after treatment had been
applied to only one, there would be no way to clearly interpret the results. Such
results may reflect a specific therapeutic effect and subsequent response general-
ization, or they may simply reflect non-specific therapeutic effects having little to
do with the specific treatment procedure under investigation. (p. 95)
While changes in target behaviors are the raison d'etre for undertaking treatment
or training programs, concomitant changes may take place as well. If so, they
should be assessed. It is one thing to assess and evaluate changes in a target
a
treated as a single organism" (p. 253). However, in this case the experimenter
would also be expected to present data for individual subjects, demonstrating
that sequential treatment applications to independent behaviors affected
most subjects in the same direction.
In the second design, the multiple baseline design across subjects, a
particular treatment is applied in sequence across matched subjects presumably
exposed to "identical" environmental conditions. Thus, as the same
treatment variable is applied to succeeding subjects, the baseline for each
subject increases in length. In contrast to the multiple baseline design across
behaviors (the within-subject multiple baseline design), in the multiple base-
line design across subjects a single targeted behavior serves as the primary
focus of inquiry. However, there is no experimental contraindication to
monitoring concurrent (untreated) behaviors as well. Indeed, it is quite likely
that the monitoring of concurrent behaviors will lead to additional findings of
merit.
As with the multiple baseline design across behaviors, a possible variation
of the multiple baseline design across subjects involves the sequential application
of the treatment variable across entire groups of subjects (see Domash et
al., 1980). But here, too, it behooves the experimenter to show that a large
majority of individual subjects for each group evidenced the same effects of
treatment.
We might note that the multiple baseline design across subjects has also
been labeled a time-lagged control design (Gottman, 1973; Gottman, McFall,
& Barnett, 1969). In fact, this strategy was followed by Hilgard (1933) some
50 years ago in a study in which she examined the effects of early and delayed
practice on memory and motoric functions in a set of twins (method of co-
twin control).
In the third design, the multiple baseline design across settings, a particular
treatment is applied sequentially to a single subject or a group of subjects
across independent situations. For example, in a classroom situation, one
with a peer, he cried or reported the incident to his teacher. Three target
behaviors were selected for modification as a result of role-played performance
in baseline: ratio of eye contact to speech duration, number of words,
and number of requests. In addition, independent evaluations of overall
assertiveness, based on role-played performance, were obtained. As can be
seen in Figure 7-1, baseline responding for targeted behaviors was low and
stable. Following baseline evaluation, Tom received 3 weeks of social skills
training consisting of three 15-30 minute sessions per week. These were
applied sequentially and cumulatively over the 3-week period. Throughout
training, six role-played scenes were used to evaluate the effects of treatment.
In addition, three scenes (on which the subject received no training) were used
to assess generalization from trained to untrained scenes.
The results for training scenes appear in Figure 7-1. Examination of the
graph indicates that institution of social skills training for ratio of eye contact
to speech duration resulted in marked changes in that behavior, but rates for
number of words and number of requests remained constant. When social
skills training was applied to number of words itself, the rate for number of
requests remained the same. Finally, when social skills training was directly
applied to number of requests, marked changes were noted. Thus it is clear
that social skills training was effective in increasing the rate of the three target
behaviors, but only when treatment was applied directly to each. Indepen-
dence of the three behaviors and absence of generalization effects from one
behavior to the next facilitate interpretation of these data. On the other hand,
had nontreated behaviors covaried following application of social skills train-
ing, unequivocal conclusions as to the controlling effects of the training could
not have been reached without resorting to Kazdin and Kopel's (1975) solu-
tion to withdraw and reinstate the treatment.
The reader should also note in Figure 7-1 that, despite the fact that overall
assertiveness was not treated directly, independent ratings evinced gradual
improvement over the 3-week period, with treatment gains for all behaviors
maintained in follow-up.
Examination of data for the untreated generalization scenes indicates that
similar results were obtained, confirming that transfer of training occurred
from treated to untreated items. Indeed, the patterns of data for Figures 7-1
and 7-2 are remarkably alike.
Liberman and Smith (1972) also used a multiple baseline design across
behaviors in studying the effects of systematic desensitization in a 28-year-
old, multiphobic female who was attending a day treatment center. Four
specific phobias were identified (being alone, menstruation, chewing hard
foods, dental work), and baseline assessment of the patient's self-report of
each was taken for 4 weeks. Subsequently, in vivo and standard systematic
desensitization (consisting of relaxation training and hierarchical presentation
of items in imagination) were administered in sequence to the four areas of
[Figure 7-1 graph: training scenes; abscissa: probe sessions, then follow-up weeks.]
FIGURE 7-1. Probe sessions during baseline, social skills treatment, and follow-up for training
scenes for Tom. A multiple baseline analysis of ratio of eye contact while speaking to speech
duration, number of words, number of requests, and overall assertiveness. (Figure 3, p. 190,
from: Bornstein, M. R., Bellack, A. S., Hersen, M. [1977]. Social-skills training for unassertive
children: A multiple-baseline analysis. Journal of Applied Behavior Analysis, 10, 183-195.
Copyright 1977 by Society for Experimental Analysis of Behavior. Reproduced by permission.)
[Figure 7-2 graph: generalization scenes.]
[Figure 7-3 graph: baseline and desensitization phases for four phobias (being alone, menstruation, chewing hard foods, dental work); abscissa: weeks.]
FIGURE 7-3. Multiple baseline evaluation of desensitization in a single case with four phobias.
(Figure 1, p. 600, from: Liberman, R. P., & Smith, V. [1972]. A multiple baseline study of
systematic desensitization in a patient with multiple phobias. Behavior Therapy, 3, 597-603.
Copyright 1972 by Association for the Advancement of Behavior Therapy. Reproduced by
permission.)
ment was maintained throughout all phases of study, the possibility that
expectancy of improvement and actual treatment effects were confounded
cannot be discounted, especially in light of the primary reliance on self-report
data. However, casually conducted behavioral observations corroborate self-
report data.
Despite the above-mentioned limitations, Liberman and Smith's (1972)
investigation is of interest from a number of standpoints. First, as most
multiple baseline studies emanate from the operant framework, this study
lends credence to the notion that nonoperant procedures (e.g., systematic
desensitization) can be assessed in this paradigm. Second, as the particular
consulting room practice (see chapter 3, section 3.2). Finally, the treatment
was fully implemented by a mental health paraprofessional who had only one
year's training in psychiatry.
In our next example of a multiple baseline design across behaviors, a
physiological measure (erectile strength as assessed with a penile gauge) was
used to determine efficacy of covert sensitization in the treatment of a 21-
year-old married male, admitted for inpatient treatment of exhibitionism and
obscene phone calling (Alford, Webster, & Sanders, 1980). History of exhibi-
tionism began at age 16, and obscene phone calling had taken place over the
previous year. During baseline assessment:
Audiotapes of both deviant and nondeviant sexual scenes were used to elicit
These included one taped description of intercourse with his wife and another
with different sexual partners.
Covert sensitization sessions were conducted twice daily in the hospital at
various locations. This treatment consisted of imaginally pairing the deviant
sexual approach (i.e., obscene phone calls, exhibitionism) with aversive stim-
uli such as suffocation, nausea, and arrest. Each session involved 20 pairings
of the deviant scenarios with aversive imagery. Following baseline assess-
ment, covert sensitization was first applied to obscene phone calling and then
to exhibitionism. In addition to therapist-conducted treatment sessions, the
patient was instructed to use covert imagery on his own initiative whenever he
experienced deviant sexual urges.
Data for this multiple baseline analysis are presented in Figure 7-4. During
baseline evaluation, penile tumescence in response to tapes of obscene phone
calling and exhibitionism was quite high. Similarly, tumescence was above
75% in response to nondeviant tapes of sexual activity with females other
than his wife, but only slightly higher than 25% in response to lovemaking
with his wife.
Institution of covert sensitization for obscene phone calling resulted in
marked diminution in penile responsivity to taped descriptions of that behavior,
eventually resulting in only a negligible response. However, such treatment
also appeared to affect changes in penile response to one of the
exhibitionism tapes (Ex. 1), even though that behavior had not yet been
specifically targeted. (We have here an instance where the baselines are not
independent from one another.) However, when treatment subsequently was
directed to exhibitionism itself, there was marked diminution in penile re-
[Figure 7-4 graph: responses to obscene phone calling tapes (OPC 1, OPC 2, OPC 3) under covert sensitization, with inpatient and discharged periods marked.]
FIGURE 7-4. Percentage of full erection to obscene phone call (OPC), exhibitionistic (EX), and
heterosexual stimuli (ND) during baseline, treatment, and follow-up phases. (Figure 1, p. 20,
from: Alford, G. S., Webster, J. S., & Sanders, S. H. [1980]. Covert aversion of two interrelated
deviant sexual practices: Obscene phone calling and exhibitionism. A single case analysis.
Behavior Therapy, 11, 13-25. Copyright 1980 by Association for the Advancement of Behavior
Therapy. Reproduced by permission.)
sponse to tapes Ex. 2 and Ex. 3 in addition to continued decreases to tape Ex.
1. During the course of treatment, penile responsivity to nondeviant hetero-
sexual interactions remained high, increasing considerably with respect to
lovemaking with the wife.
The reader might note that "the patient was preloaded with 36 oz of beer 90
to 60 minutes prior to Assessments 10 and 11" (Alford et al., 1980, p. 19).
This was carried out inasmuch as he had claimed that alcohol had disinhibited
deviant sexuality. However, experimental data did not seem to confirm this.
One-, 2-, and 10-month follow-up assessments indicated that all gains were
maintained, with the exception of decreased penile responsivity to taped
descriptions of intercourse with the wife. In addition, 10-month collateral
information from the patient's wife, parents, and attorney, as well as police,
court, and telephone company records revealed no incidents of sexual de-
viance.
Our illustration reveals a clinically successful intervention evaluated
[Figure 7-5 graph; abscissa: successive meals of the study.]
FIGURE 7-5. Concurrent group rates of Stealing, Fingers, Utensils, and Pigging behaviors, and
the sum of Stealing, Fingers, and Pigging (Total Disgusting Behaviors) through the baseline and
experimental phases of the study. (Figure 1, p. 80, from: Barton, E. S., Guess, D., Garcia, E., &
Baer, D. M. [1970]. Improvement of retardates' mealtime behaviors by time-out procedures using
multiple baseline techniques. Journal of Applied Behavior Analysis, 3, 77-84. Copyright 1970 by
Society for Experimental Analysis of Behavior, Inc. Reproduced by permission.)
rate for that behavior. Finally, application of time-out for pigging proved
successful in reducing its rate.
Independence of the target behaviors was observed, with the exception of
messy utensils, which increased in rate when the time-out contingency was
applied to fingers. Although group data for the 16 subjects were presented, it
would have been desirable if the authors had presented data for individual
subjects. Unfortunately, the time-sampling procedure used by Barton et al.
(1970) precluded obtaining such information. However, this factor should not
overshadow the clinical and social significance of this study, in that (1)
mealtime behaviors improved significantly; (2) a result of improved mealtime
behaviors was a concomitant improvement in staff morale, facilitating more
favorable interactions with the subjects; and (3) staff in other cottages were
sufficiently impressed with the results of this study to begin to implement
similar mealtime programs for their own retarded residents.
A more recent example of a multiple baseline design across behaviors
(carried out in group format) was presented by Bates (1980). This study is of
particular interest inasmuch as he contrasted the effects of interpersonal skills
training (i.e., social skills training) for an experimental group with a control
condition that received no treatment. Subjects were moderately and mildly
retarded adults (8 in the treatment group, 8 in the control group). Since
treatment was carried out sequentially and cumulatively across four behav-
iors (introductions and small talk, asking for help, differing with others,
handling criticism) following initial assessment, a multiple baseline analysis
was possible in addition to a controlled group evaluation.
A 16-item role-play test was the dependent measure, with subjects receiving
interpersonal skills training for eight of these scenarios. The remaining eight,
for which subjects received no training, served as a measure of transfer of
training. (But this was only accomplished on a pre-post basis.) Skills training
was conducted thrice weekly and consisted of modeling, behavior rehearsal,
coaching, feedback, incentives, and homework assignments. After each set of
three training sessions an assessment was performed.
Results of this analysis appear in Figure 7-6. As the reader will note,
improvements in each of the four targeted behaviors occurred in time-lagged
fashion only when treatment was specifically applied to each. Thus there was
no evidence of correlated baselines. Data indicate that interpersonal skills
training was effective in bringing about behavioral change. Further, results of
the group comparison indicated that there were statistically significant dif-
ferences in favor of the experimental condition.
Although these data are impressive, we would like to identify a few
problems. First, baseline assessment for introductions and small talk should
have been extended to three points, despite the apparent stability. Second a y
[Figure 7-6 graph. Panels shown: Introductions and Small Talk; Handling Criticism. Phases: Baseline (A), Group Instruction (B). X-axis: Pre 1, Pre 2, Wk 1 through Wk 4, Posttest.]
FIGURE 7-6. A multiple baseline analysis of the influence of interpersonal skills training on Exp.
I's cumulative content effectiveness score average across four social skill areas. (Figure 1, p. 244,
from: Bates, P. [1980]. The effectiveness of interpersonal skills training on the social skill
acquisition of moderately and mildly retarded adults. Journal of Applied Behavior Analysis, 13,
237-248. Copyright 1980 by Society for Experimental Analysis of Behavior. Reproduced by
permission.)
(See also section 10.2 for a discussion of issues arising from this strategy
relevant to replication.)
Although the multiple baseline design is frequently used in clinical research
when withdrawal of treatment is considered to be detrimental to the patient,
on occasion withdrawal procedures have been instituted following the se-
quential administration of treatment to target behaviors, particularly when
reinforcement techniques are being evaluated (e.g., Russo & Koegel, 1977). If
treatment is reintroduced after a withdrawal, a powerful demonstration of its
tives of others, and, when she did verbalize, her comments reflected pronoun
FIGURE 7-7. Social behavior, self-stimulation, and verbal response to command in the normal
kindergarten classroom during baseline, treatment by the therapist, and treatment by the trained
kindergarten teacher. All three behaviors were measured simultaneously. (Figure 1, p. 585, from:
Russo, D. C., & Koegel, R. L. [1977]. A method for integrating an autistic child into a normal
public school classroom. Journal of Applied Behavior Analysis, 10, 579-590. Copyright 1977 by
Society for Experimental Analysis of Behavior. Reproduced by permission.)
stimulation was quite high, and appropriate responses were low but increas-
ing. Treatment consisted of token reinforcement paired with verbal praise,
feedback, and response cost (removal of tokens) for self-stimulation. Tokens
were earned contingently upon occurrence of each instance of social behavior
and appropriate responses, and they were systematically removed for each
occurrence of self-stimulatory behavior. At the end of each training session
the child had the opportunity to trade remaining tokens for a menu of backup
reinforcers. Three pretraining sessions were carried out to establish the
reinforcing value of tokens.
Initial treatment by the therapist for social behaviors resulted in a marked
[Graph. Panel title: Training Scenes. Phases: Baseline, Social Skills Training, Follow-up.]
Hersen, M., Kazdin, A. E., Simon, J., & Mastantuono, A. K. [1983]. Social skills training for
blind adolescents. Journal of Visual Impairment and Blindness, 75, 199-203. Copyright 1983.
Reproduced by permission.)
ment follow-up revealed a decrement for gaze and requests for new behavior.
Examination of Figure 7-8 shows that retreatment in booster sessions for
those behaviors resulted in a renewed improvement, extending through the
8- and 10-week follow-up assessments. Thus our multiple baseline analysis
permitted a clear assessment of which behaviors were maintained after treat-
ment in addition to those requiring booster treatment.
Our first example of the multiple baseline strategy across subjects is taken
from the clinical child literature. Barmann, Katz, O'Brien, and Beauchamp
(1981) examined the sequential application of overcorrection training for
three developmentally disabled children who were diagnosed as irregular
enuretics. These children (4, 7, and 8 years old, respectively) had IQs that
ranged from 23 to 41. The first 2 subjects lived at home and the third resided in
a home care facility for the developmentally disabled. Subjects 1 and 3 were
[Figure 7-9 graph. X-axis: 4-day blocks.]
FIGURE 7-9. Total number of accidents at home and school during baseline, treatment, and
follow-up conditions. NOTE: Data are collapsed over 4-day periods. (Figure 1, p. 344, from:
Barmann, B. C., Katz, R. C., O'Brien, F., & Beauchamp, K. L. [1981]. Treating irregular
enuresis in developmentally disabled persons: A study in the use of overcorrection. Behavior
Modification, 5, 336-346. Copyright 1981 by Sage Publications. Reproduced by permission.)
[Figure 7-10 graph. Panels: Child 1, Child 2, Child 3. Conditions: No Delay, Delay. Y-axis scale: 0 to 100.]
FIGURE 7-10. Results of the multiple baseline analysis with subsequent repeated reversals of the
influence of a response-delay requirement on the correct responding of autistic children. (Figure 1,
p. 235, from: Dyer, K., Christian, W. P., & Luce, S. C. [1982]. The role of response delay in
improving the discrimination performance of autistic children. Journal of Applied Behavior
Analysis, 15, 231-240. Copyright 1982 by Society for Experimental Analysis of Behavior.
Reproduced by permission.)
tution overcorrection when the pants were found to be wet at home. (No
treatment was administered at school as this served as a measure of general-
ization.) Restitutional overcorrection "... required the child to (a) obtain a
towel, (b) clean up all traces of the accident, (c) go to the bedroom and put on
clean pants, and (d) dispose of the wet pants in the diaper pail" (Barmann et
al., 1981, p. 341). This was followed by 10 repetitions of positive practice
[Figure 7-11 graph. Panels by child (labels legible: Don, John). Y-axis: percentage correct, 0 to 100. X-axis: sessions.]
FIGURE 7-11. Percentage of correct emergency escape responses. Baseline: first 3 days of
performance from original baseline phase. Training: last 3 days of training from original
intervention phase. Post: postcheck assessment 2 weeks after training was terminated.
Follow-up 1: 5-month follow-up (FU) reassessment when no intervention in effect. Retraining:
reinstatement of original training program. Follow-up 2: 9-month follow-up (FU) reassessment
after original training and 4-month follow-up after retraining. (Figure 1, p. 718, from: Jones,
R. T., Kazdin, A. E., & Haney, J. L. [1981]. A follow-up to training emergency skills. Behavior
Therapy, 12, 716-722. Copyright 1981 by Association for Advancement of Behavior Therapy.
Reproduced by permission.)
The present follow-up study has several implications for future research. First,
conclusions about the effectiveness of particular procedures need to be tempered
unless accompanied by evidence showing maintenance of behavior. The implica-
tion of many demonstrations is that an important applied problem has been
solved by application of behavioral (or other) procedures. However, durability of
behavior change is not an ancillary measure of treatment effects. (p. 721)
Our illustration shows how the multiple baseline strategy allows for (1) an
initial demonstration of the controlling effects of a treatment, (2) an assess-
ment at follow-up, (3) a second demonstration of the controlling effects of
the treatment, and (4) a second follow-up assessment showing differential
responding among subjects.
A three-group application of the multiple baseline strategy across subjects
(groups of children with insulin dependent diabetes) was provided by Epstein
et al. (1981). The effects of a behavioral treatment program to increase the
percentage of negative urine tests were examined in 19 families of such
diabetic children. Treatment was directed to decrease intake of simple sugars
and saturated fats, decrease stress, increase exercise, and adjust insulin
intake. Parents were taught to use praise and token economic techniques to
reinforce improvements in the child's self-regulating behavior. When treat-
ment began, 10 of the children (ages 8 to 12) were self-administering their
insulin; the remaining 9 were receiving shots from their parents.
[Figure 7-12 graph. Panels: Group 1, Group 2, and a third group. Y-axis: % negative urines. X-axis: weeks.]
FIGURE 7-12. Percentage of 0% urine concentration tests weekly for children in each group. The
mean and standard error of the mean for all the observations in each phase by group are
represented by a solid and dotted line, respectively. (Figure 1, p. 371, from: Epstein, L. H., Beck,
S., Figueroa, J., Farkas, G., Kazdin, A. E., Daneman, D., & Becker, D. [1981]. The effects of
targeting improvements in urine glucose on metabolic control in children with insulin dependent
diabetes. Journal of Applied Behavior Analysis, 14, 365-375. Copyright 1981 by Society for
Experimental Analysis of Behavior. Reproduced by permission.)
of the 12-week program. Examination of Figure 7-12 indicates that percentage
of negative urines was relatively low for each of the three groups during
baseline. Institution of treatment resulted in marked improvements in per-
centage of negative urines, indicating the controlling effects of the strategy.
Moreover, it appears that these gains were maintained posttreatment, as
indicated by the follow-up assessment at 22 weeks.
In summary, Epstein et al. (1981) presented a powerful demonstration of
the effects of a behavioral treatment over a biochemical dependent measure
(that has serious health implications). From a design standpoint, this study is
rather than being collapsed across groups. Such data are important, as it is
[Figure 7-13 graph. Panels by department. X-axis: sessions.]
FIGURE 7-13. Frequency of hazards across department as a function of the introduction of the
"feedback package." Data for days following unplanned safety meetings are indicated by an open
circle. At point "a" there was a change in supervisors. (Figure 1, p. 293, from: Sulzer-Azaroff, B., &
deSantamaria, M. C. [1980]. Industrial safety hazard reduction through performance
feedback. Journal of Applied Behavior Analysis, 13, 287-295. Copyright 1980 by Society for
Experimental Analysis of Behavior. Reproduced by permission.)
[Figure 7-14 graph (David). Phase labels partly legible, including Self-Monitoring. X-axis: days, then follow-up in months.]
FIGURE 7-14. Effects of self-monitoring and self-administered overcorrection in the school and
home: David. (Figure I, p. 81, from: Ollendick, T. H. [1981]. Self-monitoring and self-adminis-
of aromatic ammonia was crushed and held under her nose for more
. . .
than 3 sec" (Singh et al., 1980, p. 563). Finally, during the 8 weeks of the
generalization phase, ward nurses were requested to carry out the punishment
procedure on an 8-hour-per-day basis. This is in contrast to original treatment
that was carried out for only four 30-minute sessions per day.
Results of this single-case analysis appear in Figure 7-15. Data clearly
indicate the controlling effects of the treatment, both in terms of its initial
[Figure 7-15 graph. Phases: Baseline I, Punishment I, Baseline II, Punishment II, Generalization. Settings legible: Dining Room, Bathroom. X-axis: sessions.]
FIGURE 7-15. Number of hyperventilation responses per minute and condition means across
experimental phases and settings. (Figure 1, p. 565, from: Singh, N. N., Dawson, J. H., &
Gregory, P. R. [1980]. Suppression of chronic hyperventilation using response-contingent
aromatic ammonia. Behavior Therapy, 11, 561-566. Copyright 1980 by Association for
Advancement of Behavior Therapy. Reproduced by permission.)
[Figure 7-16 graph. X-axis: probe assessment sessions 1 to 5.]
FIGURE 7-16. Maximum SUDS ratings during probe sessions (Subject 2). (Figure 2, p. 505,
from: Fairbank, J. A., & Keane, M. [1982]. Flooding for combat-related stress disorders:
Assessment of anxiety reduction across traumatic memories. Behavior Therapy, 13, 499-510.
Copyright 1982 by Association for Advancement of Behavior Therapy. Reproduced by permission.)
populations 49,978 and 65,910). During baseline, the mean number of home
burglaries committed per day was computed for each area (Xs = 2.83 and
2.25).
After 17 days of baseline in Area 1 of standard police patrolling, an
[Figure 7-17 graph. Phases: Baseline, Intervention.]
FIGURE 7-17. Number of home burglaries in two high-density areas over baseline and interven-
tion conditions. (Figure 1, p. 145, from: Kirchner, R. E., Schnelle, J. F., Domash, M., Larson,
L., Carr, A., & McNees, M. P. [1980]. The applicability of a helicopter patrol procedure to
diverse areas: A cost-benefit evaluation. Journal of Applied Behavior Analysis, 13, 143-148.
Copyright 1980 by Society for Experimental Analysis of Behavior. Reproduced by permission.)
As noted in section 7.2, in the multiple baseline design across subjects, each
individual targeted for treatment is exposed to the same environment. Treat-
ment is delayed for each successive subject in time-lagged fashion because of
the increased length of baselines required for each. The functional relationship
between treatment and behavior selected for change can be determined
only when such treatment is applied to each subject in succession. Thus, since
subjects (at least two but usually three or more) are simultaneously available
for assessment and treatment, this design is able to control for history (cf.
Campbell & Stanley, 1963), a possible experimental contaminant.
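The time-lagged logic described above can be laid out as a simple schedule. What follows is a minimal Python sketch, not a procedure taken from any study cited here: the function name `staggered_schedule`, the subject labels, and the session counts are all hypothetical and chosen only for illustration.

```python
def staggered_schedule(subjects, first_baseline, lag, total_sessions):
    """Return {subject: [phase per session]} with time-lagged treatment onset.

    All arguments are illustrative assumptions: each successive subject
    stays in baseline ("A") for `lag` more sessions than the previous one
    before treatment ("B") begins.
    """
    schedule = {}
    for i, subject in enumerate(subjects):
        baseline_len = first_baseline + i * lag
        if baseline_len >= total_sessions:
            raise ValueError(f"{subject} would never enter treatment")
        treatment_len = total_sessions - baseline_len
        schedule[subject] = ["A"] * baseline_len + ["B"] * treatment_len
    return schedule


if __name__ == "__main__":
    # Hypothetical values: three subjects, 12 sessions, onset lagged by 3.
    plan = staggered_schedule(["S1", "S2", "S3"], first_baseline=3, lag=3,
                              total_sessions=12)
    for subject, phases in plan.items():
        print(subject, "".join(phases))
```

Reading the printed rows column by column shows why concurrent observation helps with history: at any session where one subject has entered treatment, at least one other subject is still in baseline and serves as a comparison for events occurring at that time.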
There are times, however, when one is unable to obtain concurrent obser-
vations for several subjects, in that they may be available only in succession
(e.g., less frequently seen diagnostic conditions such as hysterical spasmodic
jects, this design ordinarily would not be considered appropriate under these
circumstances. However, more recently Watson and Workman (1981) have
proposed an alternative: the nonconcurrent multiple baseline across individuals.
In this . . . design, the researcher initially determines the length of each of several
baseline designs (e.g., 5, 10, 15 days). When a given subject becomes available
(e.g., a client referred who has the target behavior of interest, and is amenable to
through the treatment phase, as in a simple A-B design. Subjects who fail to
display stable responding would be dropped from the formal investigation;
however, their eventual reaction to treatment might serve as useful replication
data.
[Figure 7-18 schematic. Baseline and Treatment phases for Subjects 1, 2, and 3, with baselines of 5, 10, and 15 days. X-axis: days.]
FIGURE 7-18. Hypothetical data obtained through use of a nonconcurrent multiple baseline
design. (Figure 1, p. 258, from: Watson, P. J., & Workman, E. A. [1981]. The nonconcurrent
multiple baseline across-individuals design: An extension of the traditional multiple baseline
design. Journal of Behavior Therapy and Experimental Psychiatry, 12, 257-259. Copyright 1981
by Pergamon. Reproduced by permission.)
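The bookkeeping behind the nonconcurrent variant can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the baseline lengths (5, 10, 15 days) come from the passage quoted above, but the function name `assign_baselines`, the client labels, and the rule of handing out shuffled lengths without reuse as subjects arrive are our own assumptions for the example.

```python
import random


def assign_baselines(arrival_order, baseline_lengths, seed=None):
    """Pair each arriving subject with one predetermined baseline length.

    Lengths are shuffled once and handed out in arrival order, so no
    length is reused (an assumption of this sketch).
    """
    if len(arrival_order) > len(baseline_lengths):
        raise ValueError("more subjects than predetermined baseline lengths")
    rng = random.Random(seed)
    lengths = list(baseline_lengths)
    rng.shuffle(lengths)
    return dict(zip(arrival_order, lengths))


if __name__ == "__main__":
    # Hypothetical clients who become available one after another.
    plan = assign_baselines(["client_1", "client_2", "client_3"],
                            [5, 10, 15], seed=0)
    for client, days in plan.items():
        print(f"{client}: {days} baseline days, then treatment (A-B)")
```

Fixing the lengths before any subject is seen keeps the decision about when to begin treatment independent of how a particular baseline happens to look.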
Multiple-probe technique
[Figure 7-19 graph. Panels: Tom, Michael, Larry, Russell. Legend: hypothetical probes; reported data (Horner & Keilitz, 1975). X-axis: baseline sessions.]
FIGURE 7-19. Number of toothbrushing steps conforming to the definition of a correct response
across 4 subjects. (Figure 2, p. 194, from: Horner, R. D., & Baer, D. M. [1978]. Multiple-probe
technique: A variation of the multiple baseline. Journal of Applied Behavior Analysis, 11,
189-196. Copyright 1978 by Society for Experimental Analysis of Behavior. Reproduced by
permission.)
The multiple-probe technique, with probes every five days, would have provided
one, two, three, and five probe sessions to establish baselines across the four
subjects. The multiple-probe technique probably could have provided a stable
baseline with five or fewer probe sessions for the subject who had 15 days of
continuous baseline in the original study. The use of the multiple-probe proce-
dure might have precluded the increase in irrelevant and competing behaviors by
this subject because such behavior began to increase after the tenth baseline
session. (p. 195)
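To show how intermittent probes reduce the number of baseline assessments, here is a small Python sketch. The scheduling rule (a probe on the first session and on every fifth session thereafter) and the baseline lengths in the usage example are assumptions for illustration; the sketch does not attempt to reproduce the probe counts quoted above, which depend on details of the original study not given here.

```python
def probe_sessions(baseline_length, interval=5):
    """Sessions on which a baseline probe would be taken.

    Assumed rule for this sketch: probe on session 1, then on every
    `interval`-th session, always staying within the baseline.
    """
    if baseline_length < 1:
        raise ValueError("baseline must contain at least one session")
    if interval < 1:
        raise ValueError("interval must be at least 1")
    later = range(interval, baseline_length + 1, interval)
    return sorted({1, *later})


if __name__ == "__main__":
    # Hypothetical baseline lengths for four subjects.
    for length in (4, 7, 12, 15):
        probes = probe_sessions(length)
        print(f"{length} baseline sessions -> probes on {probes} "
              f"({len(probes)} instead of {length} assessments)")
```

Setting `interval=1` recovers continuous measurement, which makes the trade-off between assessment burden and density of data explicit.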
It should be noted that, over the years, a variety of researchers have applied
this variant of baseline assessment in the multiple baseline design (Baer &
Guess, 1971; Schumaker & Sherman, 1970; Striefel, Bryan, & Aikins, 1974;
Striefel & Wetherby, 1973). In each of these studies the design used was the
multiple baseline design across behaviors. But, as in Figure 7-19, it could be
across subjects, and it certainly might also be across settings.
If reactivity is the primary reason for using probe tech-
this variant, the
[Figure 7-20 graph. X-axis: probe sessions 1 to 19, then follow-up.]
FIGURE 7-20. Probe sessions during baseline, treatment, and follow-up for Subject 3. (Figure 3,
p. 396, from: Bellack, A. S., Hersen, M., & Turner, S. M. [1976]. Generalization effects of social
skills training in chronic schizophrenics: An experimental analysis. Behaviour Research and
Therapy, 14, 391-398. Copyright 1976 by Pergamon. Reproduced by permission.)
"ratio of words spoken to speech duration." Probe data (open circles) suggest
that there was further evidence of transfer of training to the Novel Scenes,
with the exception of "ratio of words spoken to speech duration." Finally, for
the three sets of scenes, data indicate that gradual improvements in overall
assertiveness were noted throughout treatment, which appeared to be main-
tained in follow-up.
As we have seen, the probe technique can be most useful in a number of
instances. However, as in the case of the nonconcurrent multiple baseline
design, it should not be employed as a substitute for continuous measurement
when that is feasible. That is, data accrued from use of probe measures are
suggestive rather than confirmatory of the controlling effects of a given
treatment.
[Figure 7-21 graph. Series labels legible: S-12, S-14, S-15, S-16, S-17. X-axis: weeks.]
FIGURE 7-21. Frequencies of inappropriate behaviors for Subjects 12-18 plotted as total
occurrences per week (summed daily interval totals). During the D condition, the subjects
received their drug; during the P condition, the subjects received a placebo, were no longer
receiving their drug, and the response cost procedure was not in effect. Drugs were discontinued
during the first 3 weeks of the P condition. During the RC condition, the response cost procedure
was in effect, and the subjects were not receiving their drug. The dotted vertical lines separate the
conditions. (Figure 2, p. 261, from: Breuning, S. E., O'Neill, M. J., & Ferguson, D. G. [1980].
Comparison of psychotropic drug, response cost, and psychotropic drug plus response cost
procedures for controlling institutionalized mentally retarded persons. Applied Research in
8.1. INTRODUCTION
Few areas of single-case experimental designs have advanced as much as the
design strategies to be discussed in this chapter. The strength and underlying
logic of these strategies, as well as the fact that some specific questions can
only be answered using these approaches, have ensured the rapid develop-
ment and increasing use of this design, particularly during the last 5 years.
The major question addressed by this design is the relative effectiveness of
two (or more) treatments or conditions. The most common experimental
approach employed to address this question until now has been the tradi-
tional between-group comparison. In this strategy, each of two or more
treatments is usually administered to a separate group of subjects, and the
outcome of the treatments is compared between groups. Since considerable
intersubject variability exists in each group (some subjects change and some
do not), inferential statistics are necessary to determine if an effect exists.
This leads to problems in generalizing results from the group average to the
individual subjects, as discussed in chapter 2. To avoid intersubject variabil-
ity, an ideal solution would be to divide the subject in two and apply two
Alternating Treatments Design 253
The name that has come to be employed for the experimental design that
accomplishes this goal is the alternating treatments design (ATD) (Barlow &
Hayes, 1979). As the name implies, the basic strategy involved in this design is
the rapid alternation of two or more treatments or conditions within a single
subject. Rapid does not necessarily mean rapid within a fixed period of time,
as, for example, every hour or every day. In applied research, rapid might
mean that each time the client is seen he or she would receive an alternative
treatment. For example, if an experimenter were comparing treatments A and
B in a client seen weekly, he or she might apply Treatment A one week and
Treatment B the next. If the client were seen monthly, alternations would be
monthly. Contrast this with the usual A-B-A withdrawal design where, after a
baseline, an experimenter would need at least three, and usually more,
consecutive data points measuring the effect of Treatment A in order to
examine any trends toward improvement. For a client seen weekly, at least 3
weeks would be needed to establish the trend.
Since one is alternating two or more treatments, an experimenter is not
interested simply in the trend toward improvement over time. Therefore, one
would not plot the data simply by connecting data points for Weeks 1, 2, 3,
and so on. Rather, what one is interested in is comparing treatments A and B.
Therefore, in order to examine visually the experimental effects, one would
connect all the data points measuring the effects of Treatment A and then
connect all the data points measuring the effects of Treatment B. If, over
time, these two series of data points separated (i.e., Treatment B, for example,
produced greater improvement than Treatment A), then one could say
with some certainty that Treatment B was the more effective. Naturally, these
results would then need replication on additional clients with the same
problem. Such hypothetical data are plotted in Figure 8-1 for a client who was
treated and assessed weekly.
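The idea of connecting data points by treatment rather than by week can be expressed as a short Python sketch. The scores below are invented for illustration and are not the values plotted in Figure 8-1; the function names `series_by_treatment` and `mean_separation` are likewise hypothetical.

```python
from collections import defaultdict


def series_by_treatment(observations):
    """Group (occasion, treatment, score) records into one series per treatment."""
    series = defaultdict(list)
    for occasion, treatment, score in sorted(observations):
        series[treatment].append((occasion, score))
    return dict(series)


def mean_score(points):
    """Average score of a list of (occasion, score) pairs."""
    return sum(score for _, score in points) / len(points)


def mean_separation(series, a="A", b="B"):
    """Mean score under treatment b minus mean score under treatment a."""
    return mean_score(series[b]) - mean_score(series[a])


if __name__ == "__main__":
    # Invented weekly scores for one client; higher means more improvement.
    data = [(1, "A", 20), (2, "B", 30), (3, "B", 40), (4, "A", 25),
            (5, "B", 55), (6, "A", 30), (7, "A", 30), (8, "B", 70)]
    series = series_by_treatment(data)
    print("Treatment A:", series["A"])
    print("Treatment B:", series["B"])
    print("Mean separation (B - A):", mean_separation(series))
```

Each series would be drawn as its own line when plotted. The mean difference is only a crude summary; visual inspection of whether the two lines diverge over time remains the primary analysis described here.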
Of course, one would not want to proceed in a simple A-B-A-B-A-B-A-B
fashion. Rather, one would want to randomize the order of introduction of
the treatments to control for sequential confounding, or the possibility that
introducing Treatment A first, for example, would bias the results in favor of
Treatment A. Therefore, notice in the hypothetical data that A and B are
introduced in a relatively random fashion. Thus, if one were seeing a client in
an office or a child in a school setting, one might administer the treatments in
an A-B-B-A-B-A-A-B fashion, as in the hypothetical data. For a client in an
office setting, these treatment occasions might be twice a week, with the
experiment taking a total of 4 weeks. For a child in a school setting, one
might alternate treatments 4 times a day, and the experiment would be
completed in a total of 2 days. Randomizing introduction of treatments and
[Figure 8-1 graph. Hypothetical data for Treatment A and Treatment B. X-axis: weeks.]
Terminology
While this basic research strategy has been used for years within a number
of experimental contexts, a confusing array of terminology has delayed a
widespread understanding of the basic logic of this design. In the first edition
of this book, we termed this strategy a multiple schedule design. Others have
termed the same design a multi-element baseline design (Sidman, 1960;
Ulman & Sulzer-Azaroff, 1973, 1975), a randomization design (Edgington,
1967), and a simultaneous treatment design (Kazdin & Hartmann, 1978;
McCullough, Cornell, McDaniel, & Mueller, 1974). These terms were originated
for somewhat different reasons, reflecting the multiple historical origins
Multiple-treatment interference
Kazdin (1982b) has used the term multiple-treatment designs very accurately, in our
view, to subsume both alternating and simultaneous treatment designs. However,
since simultaneous treatment designs are so rare and would seem to have such little
applicability in applied research, this book will concentrate on the description and
illustration of alternating treatment designs.
there are few strictly "applied" situations where treatments are ever alter-
name for sequential confounding is order effects. That is, much of the benefit
of Treatment B might be due simply to the order in which it is administered
vis-a-vis other treatments. Sequential confounding with A-B-A withdrawal
designs has been discussed in section 5.3. The solution, of course, is to
arrange for a random (or semirandom) sequencing of treatments. One can
view this random order of sequencing treatments in a typical ATD in the
hypothetical data presented in Figure 8-1. Such counterbalancing also allows
for statistical analyses of ATDs for those who so desire (see chapter 9).
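One way to generate a random (or semirandom) order of treatments is sketched below in Python. The specific constraints used here, equal numbers of occasions per treatment and no more than two consecutive occasions of the same treatment, are assumptions chosen for the example rather than rules stated in the text, and the function names are hypothetical.

```python
import random


def longest_run(sequence):
    """Length of the longest stretch of identical consecutive items."""
    longest = current = 0
    previous = object()  # sentinel that equals no treatment label
    for item in sequence:
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
        previous = item
    return longest


def semirandom_sequence(treatments, per_treatment, max_run=2, seed=None,
                        max_tries=10_000):
    """Shuffle a balanced list of treatments, rejecting long runs.

    Each treatment appears `per_treatment` times, and no treatment is
    given more than `max_run` times in a row (assumed limits).
    """
    rng = random.Random(seed)
    pool = [t for t in treatments for _ in range(per_treatment)]
    for _ in range(max_tries):
        rng.shuffle(pool)
        if longest_run(pool) <= max_run:
            return list(pool)
    raise RuntimeError("no acceptable sequence found; relax max_run")


if __name__ == "__main__":
    # Eight treatment occasions, four each of A and B.
    print("-".join(semirandom_sequence(["A", "B"], per_treatment=4, seed=1)))
```

The A-B-B-A-B-A-A-B order used for the hypothetical data satisfies both constraints; a strictly alternating A-B-A-B order would also pass them, so an investigator who wants to rule that pattern out would need to add a further check.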
Carryover effects, on the other hand, refer to the influence of one treat-
ment on an adjacent treatment, irrespective of overall sequencing. Terms such
differences in ability to learn discrimination may be the reason. That is, those
subjects (pigeons or rats) that are slower in learning the discriminations are
associated with longer periods of carryover effects, whereas subjects learning
the discriminations quickly evidence very short and transient carryover ef-
fects.
When carryover effects have been noticed in humans (e.g., Waite & Os-
borne, 1972), experimental operations similar to those employed in the
laboratories of basic research were in operation. Presumably the same lack of
discriminability was occurring.
In applied research, this would imply that carryover effects of the type
discussed here are a possibility only when learning is occurring. This would
exclude most biological treatments, such as pharmacotherapy, where no real
learning occurs (although biological multiple-treatment interference will oc-
cur if drugs are alternated too quickly, depending on the half-life of the
particular drug, see chapter 6). On the other hand, almost all psychosocial
sessions with a time interval should reduce carryover effects. Powell and
Hake (1971) minimized carryover effects in this way in a study comparing
two reinforcement conditions by presenting only one condition per session.
Fortunately, in applied research it is the usual case that only one treatment per
session is administered even if several sessions are held each day (e.g., Agras
et al., 1969; McCullough et al., 1974). Similar procedures have been sug-
The experimental design and the results are represented in Figure 8-2,
where the average responses of the five subjects are presented. (Individual
data were also presented, but this figure will suffice for purposes of illustra-
tion.) Thus this experiment really consisted of four separate ATDs after the
baseline condition, in which token reinforcement was alternated with either
baseline or response costs. Each of these ATDs was repeated twice. The
elegance of this design for examining multiple-treatment interference is found
in the fact that one can examine the effects of token reinforcement when
alternated with either another treatment or baseline. If multiple-treatment
interference is evident when token reinforcement is alternated with the other
treatment, response cost, then the effects of token reinforcement should be
different during that part of the experiment from when token reinforcement
is alternated with baseline.
First, it is important to note here that both token reinforcement and
response costs produced strong and comparable effects in increasing on-task
behavior, and that token reinforcement was clearly effective when compared
to baseline. The investigators decided, however, that token reinforcement was
the preferable treatment because they noticed that more disruptive behavior
occurred during the response-cost procedure than during the token reinforce-
ment procedure. Thus token procedures were continued during both sessions
in the last phase.
The investigators reported three different sets of findings from their ex-
amination of potential multiple-treatment interference. First, no evidence was
PERCENT
INTERVALS
ON TASK
-
A- - - A BL or MnponM Cmi
FIGURE 8-2. Group mean percentages of on-task behavior. Paired interventions in each phase
consisted of Baseline/Baseline; Token Reinforcement/Baseline; Token Reinforcement/Response
Cost; Token Reinforcement/Baseline; Token Reinforcement/Response Cost; Token
Reinforcement/Token Reinforcement. (Figure 1, p. 110, from: Shapiro, E. S., Kazdin, A. E., &
McGonigle, J. J. [1982]. Multiple-treatment interference in the simultaneous- or
alternating-treatments design. Behavioral Assessment, 4, 105-115. Copyright 1982 by Association for
Advancement of Behavior Therapy. Reproduced by permission.)
found that the overall level of on-task behavior was different when it was
would be important that one treatment did not always occur in the same
classroom. For example, in the McCullough et al. (1974) ATD examining the
effects of two treatments on disruptive behavior in a 6-year-old boy, two
factors were counterbalanced (see Table 8-1). In this particular experiment the
first treatment was social reinforcement for cooperative behavior and ig-
Table 8-1
TREATMENT
TIME
DAY 1 DAY 2 DAY 3 DAY 4
Redrawn Table 1, p. 260 from McCullough, J. P., Cornell, J. E., McDaniel, M. H., & Mueller, R.
K. (1974). Utilization of the simultaneous treatment design to improve student behavior in a
first-grade classroom. Journal of Consulting and Clinical Psychology, 42, 288-292. Copyright
1974 by the American Psychological Association. Reproduced by permission.
data points for each treatment would be necessary, although a higher number
would, of course, be much more desirable. Two data points per treatment
would allow an examination of the relative position of each treatment and
some tentative conclusions on treatment efficacy. However, returning to
Figure 8-1 once again, few investigators would be convinced of the superior-
ity of Treatment B if the experiment were stopped after Week 4. Nevertheless,
Alternating Treatments Design 269
before beginning the experiment, the investigators ruled out the use of an A-
B-A withdrawal design because even temporary increases in stereotypic be-
havior during withdrawal phases were unacceptable in this setting.
Furthermore, previous experience of these investigators suggested that there
was a chance the two treatments might be equally effective. Thus a no-
treatment condition might be necessary to determine if these treatments were
effective at all. Of course, this problem also arises in between-group research
because, if two treatments were equally effective (on the average) in two
groups, a control group would be necessary to determine if any clinical
effects occurred over and above no treatment.
In this procedure, three 15-minute sessions were administered by the same
experimenter each day. Individual sessions were separated by at least one
hour. Following baseline conditions for all three time periods, the two treat-
ments and the no-treatment conditions were administered in a counterbal-
anced order across sessions. When one of the treatments produced a zero or
near-zero rate of stereotypic behavior, that treatment was then selected and
implemented across all three time periods during the remainder of the study.
During sessions, each child was escorted to a small table in a classroom and
instructed to work on one of several visual motor tasks. One treatment was
physical restraint, consisting of a verbal warning and manual restraint of the
child's hand on the tabletop for 30 seconds contingent on each occurrence of
[Figure 8-3 graphic. Legend: no intervention; positive practice; physical restraint. X-axis: sessions.]
FIGURE 8-3. Stereotypic hair twirling and accurate task performance for John across experimental
conditions. The data are plotted across the three alternating time periods according to the
schedule that the treatments were in effect. The three treatments were presented only during the
alternating-treatments phase. During the last phase, physical restraint was used during all three
time periods. (Figure 1, p. 573, from Ollendick, T. H., Shapiro, E. S., & Barrett, R. P. (1981).
Reducing stereotypic behaviors: An analysis of treatment procedures utilizing an alternating
treatments design. Behavior Therapy, 12, 570-577. Copyright 1981 by Association for Advance-
ment of Behavior Therapy. Reproduced by permission.)
[Figure graphic. Phase labels (John): baseline; alternating treatments; physical restraint. Legend: no intervention; positive practice; physical restraint.]
FIGURE 8-4. Stereotypic hand posturing and accurate task performance for Tim across experimental
conditions. The data are plotted across the three alternating time periods according to the
schedule that the treatments were in effect. The three treatments were presented only during the
alternating-treatments phase. During the last phase, positive practice overcorrection was used
during all three time periods. (Figure 2, p. 574, from Ollendick, T. H., Shapiro, E. S., & Barrett,
R. P. (1981). Reducing stereotypic behaviors: An analysis of treatment procedures utilizing an
alternating treatments design. Behavior Therapy, 12, 570-577. Copyright 1981 by Association for
Advancement of Behavior Therapy. Reproduced by permission.)
differences, which in fact they did. Because of this, they were in a position to
examine more carefully client-treatment interactions that would predict
which treatment would be successful in an individual case. Once again,
highlighting intersubject variability in this way can only increase the precision
with which one can generalize the effects of these specific treatments to other
individual clients (see chapter 2).
Finally, the discerning reader will notice that posturing during the no-treatment
condition of the ATD is somewhat higher with John and Tim than
during baseline, where the same condition was in effect across all three time
periods (but this increased response during no treatment was not true for the
third subject). It is possible that this is an example of negative carryover
effects, because responding during no treatment was worse when it was
alternated with treatment than it was alone; that is, in baseline. In this
experiment the authors purposefully blurred the discriminability of the three
conditions as part of their experimental strategy, which may account, in part,
for the carryover effects. This finding, once again, occurred in baseline and
did not affect the ability of the investigators to determine the most effective
treatment and then to apply it successfully during the last phase.
Of course, determination of the effectiveness of a single treatment com-
pared to no treatment can also be examined via the most common A-B-A-B
withdrawal design (see chapter 6, section 6-3). In this particular experiment,
however, the authors were interested in comparing the effects of two treat-
ments with each other as well as the effects of each compared to no treat-
ment, and thus the ATD was the only choice. Furthermore, they had
determined clinically that it was not possible to allow an increase in stereotyp-
ic responding in the absence of treatment, a condition that would obtain
during the withdrawal phase of any A-B-A design. Nevertheless, when one
wishes to compare treatment with no treatment, one has a choice between a
more standard withdrawal design and an ATD. The advantages of the ATD
have already been mentioned. In addition to not requiring a withdrawal of
treatment for a period of time, the comparison within the ATD can usually be
made more quickly, and it can proceed without a formal baseline if this is
necessary. On the other hand, there is no single phase in the ATD where
The majority of ATDs compare the effects of two treatments rather than
the effects of treatment with no treatment. An early example in an adult
clinical situation examined the effects of two fear-reduction procedures
(Agras et al., 1969, see Figure 8-5). This study examined the effects of two
forms of exposure-based therapy. The subject was a 50-year-old female with
severe claustrophobia. Her fears had intensified following the death of her
husband some 7 years before admission to the treatment program. When
admitted, the patient was unable to remain in a closed room for longer than
one minute without experiencing considerable anxiety. As a consequence of
this phobia, her activities were seriously restricted. During the study she was
asked four times daily to remain inside a small room until she felt she had to
come out. Time in the room was the dependent measure. During the first four
data points, representing treatment, she kept her hand on the doorknob.
Before the fifth treatment data point (sixth block of session), she took her
hand off the doorknob, resulting in a considerable drop in times. During one
treatment she was simply exposed to the closet, with the therapist nearby
(outside the door). In the second treatment the therapist administered social
praise contingent on her remaining in the room for an increasing period of
time. The two therapists alternated sessions with one another. In the original
experimental phase the therapists switched roles, but they returned to their
original reinforcing or nonreinforcing roles in the third phase. The data
indicate that reinforced sessions were consistently superior to nonreinforced
sessions.
Several procedural considerations deserve comment. First, the counterbal-
ancing was rather weak because the therapists switched roles only twice
during the whole experiment. Ideally, a more systematic counterbalancing
strategy would have been planned. Second, the treatments were not adminis-
tered randomly. Sessions involving exposure without contingent praise always
preceded exposure with contingent praise. Despite this fact, a clear superiority
of one treatment over the other emerged. Nevertheless, the experiment
[Figure 8-5 graphic. Phases: baseline; experimental phases. Legend: RT = reinforcing therapist; NRT = nonreinforcing therapist; Therapist 1; Therapist 2.]
FIGURE 8-5. Comparison of effects of reinforcing and nonreinforcing therapists on the modi-
fication of claustrophobic behavior. (Figure 3, p. 1438, from: Agras, W. S., Leitenberg, H.,
Barlow, D. H., & Thomson, L. E. (1969). Instructions and reinforcement in the modification of
neurotic behavior. American Journal of Psychiatry, 125, 1435-1439. Copyright 1969 by the
American Psychiatric Association. Reproduced by permission.)
FIGURE 8-6. The effects of each treatment (COG = cognitive treatment; SS = social skill
treatment) in a multiple baseline design across the 3 subjects experiencing difficulties in social
skills on the weekly dependent measures administered. (Total score on the Lubin Depression
Adjective Checklist; Average score on the Personal Beliefs Inventory; Mean cross-product score
on the Interpersonal Events Schedule.) (Figure 2 from: McKnight, D. L., Nelson, R. O., Hayes, S.
C., & Jarrett, R. B. (in press). Importance of treating individually assessed response classes in the
amelioration of depression. Behavior Therapy. Copyright 1984 by Association for Advancement
of Behavior Therapy. Reproduced by permission.)
[Figure 8-7 graphic. Panel title: cognitive group. X-axis: weeks.]
FIGURE 8-7. The effects of each treatment (COG = cognitive treatment; SS = social skill
treatment) in a multiple baseline design across the 3 subjects experiencing difficulties in irrational
cognitions on the weekly dependent measures administered. (Total score on the Lubin Depression
Adjective Checklist; Average score on the Personal Beliefs Inventory; Mean cross-product score
on the Interpersonal Events Schedule.) (Figure 4, from: McKnight, D. L., Nelson, R. O., Hayes,
S. C., & Jarrett, R. B. (in press). Importance of treating individually assessed response classes in
This very elegant experiment is a model in many ways for the use of the
[Figure 8-8 graphic. Y-axis: percent attentive behavior; x-axis: days. Legend: self; class.]
FIGURE 8-8. Attentive behavior of Max across experimental conditions. Baseline (base): no
experimental intervention. Token reinforcement (token rft): implementation of the token program
where tokens earned could purchase events for himself (self) or the entire class (class).
Second phase of token reinforcement (token rft 2): implementation of the class-exchange intervention
across both time periods. The upper panel presents the overall data collapsed across time
periods and interventions. The lower panel presents the data according to the time periods across
which the interventions were balanced, although the interventions were presented only in the last
two phases. (Figure 2, p. 690, from: Kazdin, A. E., & Geesey, S. (1977). Simultaneous-treatment
design comparisons of the effects of earning reinforcers for one's peers versus for oneself.
Behavior Therapy, 8, 682-693. Copyright 1977 by Association for Advancement of Behavior
Therapy. Reproduced by permission.)
research indicates that this problem may not be as great as once feared, we
must still await systematic investigation of this issue to proceed with certainty.
In any case, when it comes to generalizing the results of single-case experi-
mental investigations to applied situations, there seems little question that the
first treatment phase of an A-B-A-B design (or a multiple baseline design) is
immediately following the switch in therapists, the Agras et al. (1969) ATD
presented nonoverlapping series (see Figure 8-5).
Kazdin and Geesey (1977) also presented two series of data from the two
treatments tested in their experiment which do not overlap, with the exception
of one point very early in the ATD experiment (see Figure 8-8). Also, these
data diverge increasingly as the ATD proceeds. Finally, Ollendick, Shapiro,
and Barrett (1981) demonstrated a clear divergence between treatment and no
treatment (see Figures 8-3 and 8-4). When one examines the effects of the two
treatments, several data points overlap initially, but the two series increasingly
diverge as the ATD proceeds. One must also remember that in this particular
experiment (Ollendick et al., 1981) there were no clear signs or signals
discriminating the treatments, and therefore this overlap may reflect some
confusion about which treatment was in effect early in the experiment.
If overlap among the series occurs, then there is little to choose among the
treatments or conditions, and most investigators say so. For example,
Weinrott et al. (1978) observed considerable overlap between observer-present
and observer-absent conditions in their experiment and concluded that observer
reactivity was not a factor. Last, Barlow, and O'Brien (1983) also observed
overlap between two cognitive therapies and concluded that each was
effective. Of course, when some overlap does exist, it is possible to utilize
statistical procedures to estimate if any differences that do exist are due to
chance or not (e.g., McKnight et al., 1983, Figure 8-7; E. S. Shapiro et al.,
1982, Figure 8-2). However, as discussed in chapter 9, one must then decide if
these rather small effects, even if statistically significant, are clinically useful.
Our recommendation for these designs, and throughout this book, is to be
conservative and to look for large, visually clear, clinically significant effects.
On the other hand, the ATD lends itself to a wide number of statistical tests,
as outlined by Edgington (1984) and reviewed in chapter 9. Many of these
tests require relatively few data points in each series. For example, using some
of the examples presented in this chapter, Edgington (1984) has demonstrated
how a variety of tests would be applicable to these data sets.
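The visual criterion of overlap between two alternated series, discussed above, can be made concrete with a short sketch. The data, the function name `overlap_summary`, and the choice to count points of the expected-higher series that fail to exceed the highest point of the other series are all illustrative assumptions; none of them comes from the studies cited.

```python
def overlap_summary(lower_series, higher_series):
    """Describe how far two alternated series overlap.

    Assumes the investigator expects `higher_series` to lie above
    `lower_series`, and that the two lists are paired by occasion.
    """
    ceiling = max(lower_series)
    # Points of the expected-higher series that fail to clear the other series.
    overlapping = [x for x in higher_series if x <= ceiling]
    # Occasion-by-occasion gaps show whether the series diverge over time.
    gaps = [high - low for low, high in zip(lower_series, higher_series)]
    return {
        "n_overlapping": len(overlapping),
        "nonoverlapping": not overlapping,
        "gaps": gaps,
    }


# Hypothetical percentages of attentive behavior under two alternated conditions.
condition_1 = [62, 70, 68, 74, 71, 76]
condition_2 = [66, 78, 82, 85, 88, 91]

summary = overlap_summary(condition_1, condition_2)
print(summary)
```

A count of this kind is only a descriptive aid to visual inspection; it says nothing about whether a difference is clinically useful.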
[Figure 8-9 graphic. Legend: total frequency; (B) positive attention; (C) verbal admonishment; (D) purposely ignore. X-axis: weeks.]
FIGURE 8-9. Total mean frequency of grandiose bragging responses throughout study and for
each reinforcement contingency during experimental period. (Figure 3, p. 241, from: Browning,
R. M. (1967). A same-subject design for simultaneous comparison of three reinforcement
contingencies. Behaviour Research and Therapy, 5, 237-243. Copyright 1967 by Pergamon Press.
Reproduced by permission.)
report resulting from asking a subject about his or her preference will not be
sufficient, for a variety of reasons. When these questions arise, the STD can
be a very powerful tool for studying preference in the individual subject. But
the STD is not well suited to an evaluation of the effectiveness of behavior
change procedures.
CHAPTER 9
Statistical Analyses for
Single-case Experimental Designs
by Alan E. Kazdin*
9.1. INTRODUCTION
Data evaluation consists of methods that are used to draw conclusions about
behavior change. In applied research where single-case designs are used,
experimental and therapeutic criteria are invoked to evaluate data (Risley,
1970). The experimental criterion refers to the way in which data are evaluated
to determine if an intervention has had a reliable or veridical effect on behav-
ior. The experimental criterion is based on a comparison of behavior under
criterion, whether certain types of tests should be used, and so on, they remain
in the background in terms of the actual conduct of research. Within single-
case research, application of statistical tests is far less well developed or
established. The types of statistical tests available are not widely familiar, and
their appropriate application has relatively few exemplars (Kratochwill, 1978b;
Kratochwill & Brody, 1978). More basic than the application of the tests is the
question of whether such tests should be used at all in single-case research. The
present chapter discusses issues regarding the use of statistical analyses in
single-case research. However, major emphasis will be given to various tests
themselves and how they are applied. Advantages and limitations in applying
particular tests will be presented as well.
nature of the data and the population from which subjects are drawn. In single-
case research, one or a few individuals are observed at several different points
in time. Statistical tests applicable to group studies may not be appropriate for
single cases where data are collected over time.
Serial dependency
tion for pairs of observations is assumed to be zero (i.e., r = 0). Typically, in
in which case the data are said to be serially dependent. The correlation among
successive data points means that knowing the level of performance of a
subject at a given time allows one to predict subsequent points in the series.
The extent to which there is dependency among successive observations can
[Figure 9-1 graphic. Y-axis: autocorrelation, -1.0 to +1.0; x-axis: lag.]
FIGURE 9-1. Correlograms for data with (upper portion)
and without serial dependency (lower portion).
F tests. Use of these tests when the data are serially dependent can lead to Type
I and Type II errors, and simple corrections to avoid these biases (e.g.,
adjustment of probability level) do not address the problem. (In passing, it may
be important to note as well that serial dependency in the data can also bias the
conclusions reached through visual inspection as well as statistical analyses [see
R. R. Jones, Weinrott, & Vaught, 1978].)
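Serial dependency is assessed by correlating a series with itself at successive lags, and a correlogram is simply the set of those correlations. The sketch below uses one common estimator (lagged cross-products of deviations divided by the total sum of squared deviations); the function names and the data are hypothetical, and other estimators differ slightly in small samples.

```python
def autocorrelation(series, lag=1):
    """Correlation between observations `lag` occasions apart in one series."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    total = sum(d * d for d in deviations)
    if total == 0:
        raise ValueError("series has no variability")
    shared = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return shared / total


def correlogram(series, max_lag):
    """Autocorrelations for lags 1 through max_lag."""
    return [autocorrelation(series, lag) for lag in range(1, max_lag + 1)]


# Hypothetical daily frequencies that drift steadily upward.
drifting = [10, 12, 13, 15, 18, 20, 21, 24, 26, 27, 30, 32]

for lag, r in enumerate(correlogram(drifting, 3), start=1):
    print(f"lag {lag}: r = {r:.2f}")
```

For a steadily drifting series such as this one, the lag 1 value is large and the values shrink as the lag increases, which is the pattern a correlogram for serially dependent data would display.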
General comments
methods are not fundamentally different, but they do vary in the sorts of
effects that are sought and the manner in which decisions are reached about
intervention effects.
Some of the objections to statistics in single-case research have stemmed
from the focus on groups of subjects in between-group research. Within-group
variability is often a basis for evaluating the effect of interventions in group
research. Yet, within-group variability is not part of the behavioral processes of
individual subjects and perhaps should not be included in the evaluation of
performance (Sidman, 1960; also see chapter 2). Relatedly, group research often
obscures the performance of the individual subject. Statistical analyses usually
reflect the performance of the group as a whole with data characteristics
(means, variances) that do not bear on the performance of any single subject. It
remains unclear how the intervention affects individuals and the extent to
which group performance represents individual subjects. As these objections
illustrate, concerns over statistical analyses extend beyond the manner in which
data are evaluated. The objections pertain to fundamental issues about experi-
mental design and the approach toward research more generally (J. M. Johnston
& Pennypacker, 1981; Kazdin, 1978).
Potential contributions
General comments
The controversy over statistical analyses is not whether all data in single-case
research should be evaluated statistically. Single-case research designs, the
tradition from which they derive, and the dual concerns in applied work for
experimental and therapeutic criteria for evaluating change all place limits on
the role of statistical analysis. Within the approach of single-case research, the
question is whether statistical tests can be of use in situations where visual
inspection might be difficult to apply. There are different reasons for posing an
affirmative answer. Although visual inspection can be readily applied to many
investigations, the method has its own weaknesses. In a variety of circum-
There are a large number of statistical tests that can be applied to data
obtained from a single subject over time. The range of available tests has not
been conveniently codified or illustrated. Indeed, the task is rather large
because a given test might be applied in a variety of different ways depending
on the specific variant of single-subject designs and the statement the investiga-
tor wishes to make about the intervention. Several tests discussed below
illustrate major variants currently available but do not exhaust the range of
appropriate tests.
 1   12        13   88
 2   10        14   28
 3   12        15   40
 4   22        16   63
 5   19        17   86
 6   10        18   90
 7   14        19   82
 8   29        20   95
 9   26        21   39
10    5        22   51
11   11        23   56
12   34        24   86
               25   31
               26   77
               27   76
Autocorrelation (lag 1): r = .005 for the first series; r = .010 for the second series
tions is the analysis proposed by Gentile, Roden, and Klein (1972). When
autocorrelation exists, these investigators suggested that nonadjacent phases
that employed the same treatment can be combined and will reduce the effect
of serial dependency. For example, in an A-B-A-B design, the two A phases are
not adjacent and could be combined and compared with the two B phases. The
rationale for combining phases is based on the fact that autocorrelations tend
to decrease as the lag between observations increases. Assuming serial dependency
in the data, Observation 1 in Phase A1 would be more highly correlated
with Observation 1 in Phase B1 (i.e., the immediately adjacent phase) than with
will reduce the dependency. Combining phases that are not adjacent should
make A and B treatments more dissimilar, due to dependency in the data. The
resulting t (or F) should be reduced because the dependency of adjacent
observations will minimize treatment differences. Additional variations of t
and F have been proposed, some of which attempt to address the issue of serial
dependency by developing special error terms to make statistical comparisons
of treatment effects (see Gentile et al., 1972; Shine & Bower, 1971).
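As a hedged illustration of a conventional t test applied to single-case data, the sketch below takes the two series tabulated earlier in this section, checks each for lag 1 autocorrelation, and then computes a pooled-variance t. Treating the two series as successive A and B phases is an assumption made here for illustration, as are the function names; the autocorrelation estimator is one common choice and may not reproduce the tabled values exactly.

```python
import math

# The two tabulated series, assumed here to be successive phases.
phase_a = [12, 10, 12, 22, 19, 10, 14, 29, 26, 5, 11, 34]
phase_b = [88, 28, 40, 63, 86, 90, 82, 95, 39, 51, 56, 86, 31, 77, 76]


def lag1_autocorrelation(series):
    """Lag 1 autocorrelation using deviations from the series mean."""
    mean = sum(series) / len(series)
    dev = [x - mean for x in series]
    return sum(a * b for a, b in zip(dev[:-1], dev[1:])) / sum(d * d for d in dev)


def pooled_t(a, b):
    """Independent-groups t with pooled variance; returns (t, degrees of freedom)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = len(a) + len(b) - 2
    standard_error = math.sqrt((ss_a + ss_b) / df * (1 / len(a) + 1 / len(b)))
    return (mean_b - mean_a) / standard_error, df


# The t test is defensible only if neither phase shows serial dependency.
r_a = lag1_autocorrelation(phase_a)
r_b = lag1_autocorrelation(phase_b)
t, df = pooled_t(phase_a, phase_b)
print(f"lag 1 r: A = {r_a:.3f}, B = {r_b:.3f}; t({df}) = {t:.2f}")
```

Checking the autocorrelations before computing t mirrors the order of reasoning in the text: the test statistic is interpretable only when the independence assumption is tenable.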
should be considered.
1981; Hartmann et al., 1980; R. R. Jones, Vaught, & Weinrott, 1977). The
analysis can be used in single-case designs in which alternative phases (e.g.,
baseline and intervention) are compared. There are two important features of
time series analysis for single-case research. First, the analysis provides a t test
Data analysis
The actual analysis itself cannot be outlined in a fashion that permits simple
computation. Time series analysis depends upon more than entering raw data
[Figure graphic. Three panels, each divided into A and B phases; y-axis: rate of behavior.]
in the data. Different patterns of dependency may emerge that depend upon
the pattern of autocorrelations, which are computed with different lags or
intervals, as noted earlier. Once the pattern of serial dependency is identified, a
model is applied to the data. The analysis consists of several steps, including
adoption of a model that best fits the data, evaluation of the model, estimation
of parameters for the statistic, and generation of t for level and slope changes
(G. V. Glass et al., 1974; Gorsuch, 1983; Gottman, 1981; Horne, Yang, &
Ware, 1982; Stoline, Huitema, & Mitchell, 1980). Computer programs are
available to handle these steps (see Gottman, 1981; Hartmann et al., 1980).
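The general logic of these steps (estimate the dependency, adjust for it, then test changes in level and slope) can be conveyed with a deliberately simplified sketch. It is not the model-identification procedure of the programs cited above: it substitutes a single first-order autoregressive correction applied to a segmented regression, and the data and all function names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Least-squares coefficients, their standard errors, and residuals."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se, resid


def level_and_slope_change(baseline, intervention):
    """t statistics for level and slope change after a lag 1 correction."""
    y = list(baseline) + list(intervention)
    n_a = len(baseline)
    # Columns: intercept, time, level change, slope change.
    X = [[1.0, float(t), float(t >= n_a), float(max(0, t - n_a))] for t in range(len(y))]
    _, _, resid = ols(X, y)
    # Step 1: estimate the lag 1 dependency in the residuals.
    phi = sum(a * b for a, b in zip(resid[:-1], resid[1:])) / sum(e * e for e in resid)
    # Step 2: remove that dependency from the data and the predictors.
    y_adj = [y[t] - phi * y[t - 1] for t in range(1, len(y))]
    X_adj = [[X[t][j] - phi * X[t - 1][j] for j in range(4)] for t in range(1, len(y))]
    # Step 3: re-estimate and form t ratios.
    beta, se, _ = ols(X_adj, y_adj)
    return {"phi": phi, "level_change": beta[2], "t_level": beta[2] / se[2],
            "slope_change": beta[3], "t_slope": beta[3] / se[3]}


# Hypothetical daily frequencies for a baseline and an intervention phase.
baseline = [20, 22, 19, 23, 21, 24, 22, 25, 23, 24]
intervention = [14, 13, 15, 11, 12, 10, 11, 8, 9, 7]
result = level_and_slope_change(baseline, intervention)
print(result)
```

With phases as short as these, the estimates are unstable; the sketch is meant only to show where the separate tests of level and slope come from.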
It is useful to examine the results of a time series analysis for illustrative
purposes and to evaluate the results in light of the characteristics of the data
that might be inferred from visual inspection. As an illustration, one program
focused on the frequency of inappropriate talking in a second-grade classroom
(R. V. Hall et al., 1971, Exp. 6). Although there were many children in class, the
class as a whole was treated as a single subject. The intervention consisted of
praise and other reinforcers provided to children for their appropriate class-
room behavior. The effects of the intervention, evaluated in an A-B-A-B
design, are plotted in Figure 9-3. The results suggest that inappropriate talking
out was generally high during the two different baseline phases and was much
lower during the different reinforcement phases (praise, tokens plus a sur-
prise). The first two phases (AB) have been analyzed using time series analysis
(R. R. Jones, Vaught, & Reid, 1975). Through a computer program, the
analyses revealed that the data were serially dependent, that is, the adjacent
points were significantly correlated. Indeed, autocorrelation for lag 1 was .96
principle, comparisons could be made across the other phases as well, although
restrictions on the number of data points in this particular study present a
limiting condition, discussed later.
The analysis is not restricted to variations of an A-B-A-B design. In any
design where there is a change across phases, time series analysis provides a
potentially useful tool. For example, in multiple baseline designs, time series
analysis can evaluate change from baseline to intervention phases for each of
the responses, persons, or situations, depending upon the precise design.
[Figure 9-3 graphic (Grade 2). Phase labels: baseline; praise plus a favorite activity; straws plus surprise; B2; praise. X-axis: days.]
FIGURE 9-3. Daily number of talk-outs in a second-grade classroom. Baseline: before experimental
conditions. Praise plus a favorite activity: systematic praise and permission to engage in
a favorite classroom activity contingent on not talking out. Straws plus surprise: systematic
praise plus token reinforcement (straws) backed by the promise of a surprise at the end of the
week. B2: withdrawal of reinforcement. Praise: systematic praise and attention for handraising
and ignoring of talking out. (From: Hall, R. V., Fox, R., Willard, D., Goldsmith, L., Emerson,
M., Owen, M., Davis, F., & Porcia, E. [1971]. The teacher as observer and experimenter in the
modification of disputing and talking-out behaviors. Journal of Applied Behavior Analysis, 4,
141-149. Copyright 1971 The Society for the Experimental Analysis of Behavior, Inc. Reproduced
by permission.)
tated when there is no slope in baseline or even a slope in the direction opposite
to that predicted by the intervention effects. In contrast, time series analysis
can be readily applied even when there is a trend toward improved perfor-
mance in baseline, as illustrated earlier. The separate analyses of the changes in
level and slope provide a reliable criterion in cases where visual inspection may
be particularly difficult to invoke. Notwithstanding the desirable features of
time series analysis, several issues need to be considered before using the
analysis in applied research.
Jenkins, 1970). The nature of the underlying data is revealed through autocor-
relations of different lags. In conventional analyses, large sample sizes are
important to achieve statistical power. In time series analysis, the large sample
(of data points) is necessary to identify the processes within the series itself and
statistically significant changes (R. R. Jones et al., 1977). Yet applied investiga-
tions often employ relatively short phases lasting only a few days to demon-
strate intervention effects. In such cases, time series analyses will not be
applicable.
dependency because multiple data points are generated by the same subject
over time and because any influence on a particular occasion may spread (i.e.,
continue) to other occasions as well. Thus data from one occasion to the next
are likely to be correlated, and the correlation is likely to attenuate over time as
new factors impinge on the subject. In the middle and late 1970s, when time
series analyses began to receive attention in single-case research, it seemed as if
serial dependency were likely to be the rule rather than the exception (e.g.,
itself requires multiple data points to detect a statistically significant effect, and
a small number of data points may not permit precise evaluation of the
processes involved in the data.
General Comments. Time series analysis has been used increasingly within the
last several years. The increased availability of publications on the topic (e.g.,
Gottman, 1981; McCleary & Hay, 1980) and several computer programs
(Hartmann et al., 1980; Horne et al., 1982) may be fostering increased use of
time series analyses. Nevertheless, use of the analysis has been relatively limited
for several reasons. The tests are complex and involve multiple steps that are
not easily described in terms familiar to most researchers. For example, serial
dependency and autocorrelation, two of the less esoteric notions underlying
time series analysis, are not part of the usual training of researchers who
conduct group studies in the social sciences. More in-depth examination of
time series analysis and its underlying rationale introduces many concepts that
depart from conventional statistical techniques and training (see Gottman,
1981). In addition, requirements for conducting time series analysis may not
foster widespread adoption within applied behavioral research. The relatively
brief phases typically used in single-case experimental designs make the test
difficult to apply and perhaps, simply, inappropriate. Recent controversy over
whether single-case data as a rule are serially dependent raises questions for
some about the need for time series analysis. Nevertheless, time series analyses
have been appropriately applied in several demonstrations and provide a
valuable addition to statistical analyses of single-case data.
Data analysis
Condition (days in order):   A    B    A    A    B    A    B    B
Data:                       20   50   15   10   60   25   65   70

   A     B
  20    50
  15    60
  10    65
  25    70
ΣA = 70        ΣB = 245
X̄A = 17.50     X̄B = 61.25
X̄B - X̄A = 43.75
TABLE 9-3. Critical Region for the Obtained Data from the Hypothetical Example
to conditions that yielded the greatest difference between A and B, then the
combination of data points that could show the next greatest difference, and so
on. A total of four combinations was selected because this is the number of
combinations that reflects the critical region for the .05 level of confidence.
Thus the critical region consists of the n set of data combinations in the
predicted direction that are the least likely to have occurred by chance (where n
= the number of combinations that constitutes the critical region). The
question for the randomization test is whether the difference between means
obtained in the original data is equal to or greater than one of the mean
differences included in the critical region. The obtained mean difference
(43.75) equals the most extreme value in the critical region and hence is a
statistically significant effect. The actual probability of the difference being
this large, given random assignment of conditions to occasions, is 1/70 or p =
.014. When the data represent the least probable combination of data (given a
one-tailed null hypothesis), the probability equals 1 divided by the total num-
ber of possible data combinations.
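The logic of the example can be checked by enumerating every possible assignment of conditions to occasions. The sketch below is illustrative only: it uses the eight scores and the A-B sequence from the hypothetical example, and the variable and function names are assumptions introduced here.

```python
from itertools import combinations

# Data from the hypothetical example: eight occasions, with condition A or B
# assigned to each occasion (sequence A B A A B A B B).
scores = [20, 50, 15, 10, 60, 25, 65, 70]
b_occasions = (1, 4, 6, 7)  # zero-based positions that received B


def mean_difference(b_idx):
    """Mean of B minus mean of A for one assignment of B to occasions."""
    b = [scores[i] for i in b_idx]
    a = [s for i, s in enumerate(scores) if i not in b_idx]
    return sum(b) / len(b) - sum(a) / len(a)


observed = mean_difference(b_occasions)

# Every way of assigning B to 4 of the 8 occasions (8 choose 4 = 70).
all_diffs = [mean_difference(idx) for idx in combinations(range(8), 4)]

# One-tailed p: proportion of assignments whose difference is at least as
# large as the one actually obtained.
p = sum(d >= observed for d in all_diffs) / len(all_diffs)

print(observed)        # 43.75
print(len(all_diffs))  # 70
print(round(p, 3))     # 0.014
```

Because the obtained arrangement is the single most extreme of the 70, the one-tailed probability is 1/70. Full enumeration is practical only for short series; the number of combinations grows quickly as occasions are added.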
In the above example, a one-tailed test was performed. For a two-tailed test,
the critical region is at both ends (tails) of the distribution. The number of data
combinations that constitute the critical region is unchanged for a given level of
dency may exist in the data. Yet the test is based on the null hypothesis that
there would be identical responses across occasions if the conditions were
presented in a different order. Every order of presenting treatments should lead
to an identical pattern of data (assuming the null hypothesis). Serial depen-
dency does not affect the estimation of the sampling distribution of the statistic
from which the inference of significance is drawn.
level, the investigator must compute the number of different ways in which the
able that permit use of the test without the cumbersome computation of the
critical region. The approximations depend on the same conditions as the
day. Thus the different conditions need not be shifted daily. Moreover, because
of random assignment, a given condition is likely to be assigned for two or
more consecutive occasions (periods). This would increase the length of the
period in which a particular condition is in effect (e.g., 6 days if two consecu-
tive 3-day periods of a particular condition are assigned). Thus the problem of
point when the intervention is introduced for any one of the subjects. Each
this should be reflected in the ranks. If each subject in turn shows a change
when the intervention is introduced, this would be reflected in the sum of the
ranks (or R) across all subjects, and it suggests that the ranks are not the
likely result of random factors. R requires several different baselines or
subexperiments to evaluate whether change at the point of treatment is
reliable. At the .05 level of confidence the minimum requirement for detecting
a statistically significant effect is four baselines (i.e., persons, behaviors, or
situations).
Data analysis
subjects are ranked when the intervention is introduced, not all ranks are
used. R consists of the sum of the ranks for those subjects who receive the
intervention at the point that the intervention is introduced. If treatment is
effective, the point of intervention should result in low ranks for each subject
at that point (if low numbers are assigned to the most extreme score in the
predicted direction of change).
                                DAYS
SUBJECT    1    2    3    4    5    6     7     8     9     10
   1      45   30   35   50   40   30a   70b
   2      60   75   80   60   50   70a   50a   65a   80b
   3      20   20   25   10   30   80b
   4      55   60   40   45   50   40a   75a   90b
   5      30   25   20   30   20   30a   30a   40a   35a   50b
Ranks =                             1     2     1     1     1     ΣR = 6
Note. Days 1 through 5 served as baseline (a) days for all subjects and are unmarked;
a = control or baseline, b = experimental or intervention point for a child.
As is evident in Table 9-4, hypothetical data show that the child who
receives the intervention at a given point in time, with the exception of
Subject 1, receives the lowest rank (i.e., 1 or 1st place) for performance on
that occasion. Summing the ranks for all children exposed to the intervention
yields R = 6. The significance of the ranks for designs employing different
numbers of subjects (or baselines) can be determined by examining Table 9-5.
The table provides a one-tailed test for R. (A two-tailed test, of course, can
be computed by doubling the probability level for the tabled columns.) To
return to the above example, R = 6 for 5 subjects (one-tailed test) is equal to
the tabled value required for the .05 level (see arrow). Thus the data in the
hypothetical example permit rejection of the null hypothesis of no treatment
effect.
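The ranking procedure for the hypothetical data in Table 9-4 can be sketched as follows. This is an illustration under stated assumptions: the highest score is taken as the predicted direction of change (rank 1), ties are not handled, and the data structure and function name are introduced here only for the example.

```python
# Scores from Table 9-4 for Days 6-10. Each entry maps a subject to the score
# on that day; subjects who received the intervention on an earlier day are
# omitted from later rankings.
days = [
    # (subject receiving the intervention, {subject: score})
    (3, {1: 30, 2: 70, 3: 80, 4: 40, 5: 30}),  # Day 6
    (1, {1: 70, 2: 50, 4: 75, 5: 30}),         # Day 7
    (4, {2: 65, 4: 90, 5: 40}),                # Day 8
    (2, {2: 80, 5: 35}),                       # Day 9
    (5, {5: 50}),                              # Day 10
]


def rank_of(target, day_scores):
    """Rank of the target subject; 1 = highest score (predicted direction)."""
    return 1 + sum(s > day_scores[target] for s in day_scores.values())


ranks = [rank_of(target, day_scores) for target, day_scores in days]
R = sum(ranks)
print(ranks)  # [1, 2, 1, 1, 1]
print(R)      # 6
```

Only the rank of the subject who receives the intervention on a given day enters the sum; the other subjects serve as the comparison on that occasion.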
Number of
subjects     Tabled values of R (one-tailed; first column = .05 level)
    4          4
    5          6     5     5     5
    6          8     7     7     7     6
    7         11    10    10     9     8
    8         14    13    13    12    11
    9         18    17    16    15    14
   10         22    21    20    19    18
   11         27    25    24    23    22
   12         32    30    29    27    26
rankings could be made on the basis of the mean performance across the
entire week while the intervention was in effect. Mean performance of the
target child would be compared with the mean of the other persons, and
ranks would be assigned on the basis of each person's mean for that time
period. Using means across days is likely to provide a more stable estimate of
actual performance, to allow the intervention to operate on behavior, and
consequently to reflect intervention effects more readily than evaluation
based on the first day that the intervention is applied. Also, by using averages,
the statistic takes into account the usual manner in which multiple baseline
designs are conducted where the intervention is continued for several days for
one person (baseline) before being introduced to the next person.
If ranks are to be based on several days rather than a single day, additional
considerations become important. First, the duration employed to evaluate
treatment changes within subjects should be specified in advance. If interven-
tion effects are expected to take a certain period of time, the precise number
of days (or a conservative estimate) should be specified. The mean for that
period is then used when the ranks are assigned. Second, the duration for
the intervention is introduced to one subject, and change occurs, the amount
of change does not bring the person's score higher (or lower) than the level of
another person who has continued in baseline conditions. The intervention
may have led to change, but this is not reflected in the rankings because of
discrepancies in the magnitude of scores across subjects.
For example, in Table 9-4, compare the hypothetical performance of Child
2 and Child 5. The performance of Child 2 was higher during baseline than
was the performance of Child 5 when treatment was introduced. Had treat-
ment been introduced to Child 5 before Child 2, the rank assigned to Child 5
would not have been as low as it was in the example. This would have been an
artifact of the differences in absolute levels of performance of the subjects
rather than of the ineffectiveness of the intervention. In general, the ranking
procedure, as described thus far, does not take into account the differences in
baseline magnitudes.
A simple data transformation can be used to ameliorate the problem of
different response magnitudes. The transformation corrects for the different
initial levels of baseline responding (Revusky, 1967). The formula for the
transformation is
$$\frac{B_i - A_i}{A_i}$$
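A short worked example may help show what the transformation accomplishes. The sketch below rests on an assumption not spelled out in the surviving text: A_i is taken to be subject i's mean across all baseline days and B_i the score on the day the intervention is introduced. The scores are those of Child 2 and Child 5 in Table 9-4; the function name is hypothetical.

```python
def transformed_score(baseline_scores, intervention_score):
    """(B_i - A_i) / A_i, taking A_i as the subject's mean baseline score."""
    a = sum(baseline_scores) / len(baseline_scores)
    if a == 0:
        raise ValueError("baseline mean of zero: transformation undefined")
    return (intervention_score - a) / a


# Baseline days (unmarked and "a" days) and intervention-point ("b") score.
child2 = transformed_score([60, 75, 80, 60, 50, 70, 50, 65], 80)
child5 = transformed_score([30, 25, 20, 30, 20, 30, 30, 40, 35], 50)

print(round(child2, 3))  # 0.255
print(round(child5, 3))  # 0.731
```

Although Child 2's raw score at the point of intervention (80) is higher than Child 5's (50), the change relative to each child's own baseline is larger for Child 5. Ranking the transformed scores rather than the raw scores reduces the influence of differences in baseline magnitude.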
Data description
[Figure 9-4: three panels (a, b, c) plotted against days; the bottom panel is annotated slope = 1.65, level = 39.]
FIGURE 9-4. Hypothetical data during one phase of an A-B-A-B design (top panel a), with
steps to determine the median data points in each half of the phase (middle panel b), and with
the original data (dashed) and adjusted (solid) celeration line (bottom panel c).
for the first and second halves of the phase. This median refers to the data
points that form the dependent measure rather than to the number of
sessions.
Two potentially confusing points should be resolved. First, although the
sessions are divided into quarters, only the first division (halves) is employed
at this stage. Second, the median data value within each half of the sessions is
Figure 9-4c shows the original line (dotted) and the line (solid) after it has
been adjusted to achieve the split-middle slope. Note that the original line did
not divide the data so that an equal number of points fell above and below the
line. The adjustment achieves this "middle" slope by altering the level of the
line (and not the slope). (In some cases, the original line may not have to be
adjusted.)
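The construction of the line can be sketched in code. The version below is a simplified illustration under several assumptions: it works on whatever scale the values are supplied in (an ordinary linear scale here, rather than semilog chart units), it leaves the middle session out of both halves when the number of sessions is odd, and it sets the level from the median residual so that the line splits the data evenly. The function name and the data are hypothetical.

```python
from statistics import median


def split_middle_line(values):
    """Return (slope, intercept) of a split-middle line for one phase.

    Sessions are numbered 1..n. With an odd number of sessions the middle
    session is left out of both halves (an assumption of this sketch).
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least four data points")
    half = n // 2
    first, second = values[:half], values[n - half:]
    # Middle session (abscissa) and median data value (ordinate) of each half.
    x1 = (1 + half) / 2
    x2 = (n - half + 1 + n) / 2
    y1, y2 = median(first), median(second)
    slope = (y2 - y1) / (x2 - x1)
    # Shift the level (not the slope) so the line splits the data evenly:
    # the median residual around a zero-intercept line gives that level.
    intercept = median(v - slope * (i + 1) for i, v in enumerate(values))
    return slope, intercept


# Hypothetical data for a 10-day phase (illustrative values only).
phase = [20, 24, 21, 27, 25, 30, 28, 35, 33, 39]
slope, intercept = split_middle_line(phase)
above = sum(v > slope * (i + 1) + intercept for i, v in enumerate(phase))
below = sum(v < slope * (i + 1) + intercept for i, v in enumerate(phase))
print(round(slope, 2), round(intercept, 2), above, below)
```

For these values the adjusted line leaves five points above and five below it, which is the defining property of the split-middle slope.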
The celeration line reflects the rate of behavior change, which can also be
expressed numerically. White (1974) has used the weekly rate of change as the
basis of calculating rate, although any time period that might be more
meaningful for a given situation can be employed. To calculate the rate of
change, a point of the celeration line (Day x) that passes through a given value
on the ordinate is determined. The data value on the ordinate for the
celeration line 7 days later (i.e., Day x + 7) is obtained. To compute the rate of
change, the numerically larger value (either Day x or Day x + 7) is divided by the
smaller value.
The procedure can be applied to the data in Figure 9-4c. At Day 1, the
celeration line is at 20. Seven days later, the line is at approximately 33.
Applying the above computations, the ratio for the rate of change is 1.65.
Because the celeration line is accelerating, this indicates that the average rate
of responding for a given week is 1.65 times greater than it was for the prior
week. The ratio merely expresses the slope of the line.
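The weekly rate of change described above amounts to a single division. The following sketch uses the two celeration-line values read from Figure 9-4c (20 at Day 1, approximately 33 seven days later); the function name and the direction labels are assumptions added for illustration.

```python
def weekly_rate(level_day_x, level_day_x_plus_7):
    """Ratio of the larger to the smaller celeration-line value, 7 days apart."""
    smaller = min(level_day_x, level_day_x_plus_7)
    larger = max(level_day_x, level_day_x_plus_7)
    if smaller <= 0:
        raise ValueError("line values must be positive to form a ratio")
    if level_day_x_plus_7 > level_day_x:
        direction = "accelerating"
    elif level_day_x_plus_7 < level_day_x:
        direction = "decelerating"
    else:
        direction = "flat"
    return larger / smaller, direction


ratio, direction = weekly_rate(20, 33)
print(round(ratio, 2), direction)  # 1.65 accelerating
```

Because the larger value is always divided by the smaller, the ratio is never below 1; the direction of the line has to be reported separately.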
The level of the slope can be expressed by noting the level of the celeration
line on the last day of the phase. In the above example, the level is approximately
39. When separate phases are evaluated (e.g., baseline and intervention),
the levels of the celeration lines refer to the last day of the first phase
and the first day of the second phase, as will be discussed below.
For each phase in the experimental design, separate celeration lines are
drawn. The slope of each line is expressed numerically. The change across
phases is evaluated by comparing the levels and slopes. Consider hypothetical
data for A and B phases, each with its separate celeration line, in Figure 9-5.
To estimate the change in level, a comparison is made between the last data
point in baseline (approximately 22) and the first data point during the
intervention (approximately 28). The larger value is divided by the smaller
value, yielding a ratio of 1.27. The ratio merely expresses how much higher
(or lower) the intersection of the different celeration lines is. Similarly, for a
change in slope, the larger slope is divided by the smaller slope, yielding a
value in the example of 1.52. The change in level and slope summarizes the
differences in performance across phases.
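The two summary ratios can be reproduced from the values reported for Figure 9-5 (levels of 22 and 28; slopes of 1.05 and 1.60). This is a small arithmetic sketch; the helper name is hypothetical.

```python
def change_ratio(first, second):
    """Larger value divided by the smaller, as described in the text."""
    larger, smaller = max(first, second), min(first, second)
    if smaller <= 0:
        raise ValueError("values must be positive to form a ratio")
    return larger / smaller


# Level: last baseline point of the line vs. first intervention point.
level_change = change_ratio(22, 28)
# Slope: baseline celeration vs. intervention celeration.
slope_change = change_ratio(1.05, 1.60)

print(round(level_change, 2))  # 1.27
print(round(slope_change, 2))  # 1.52
```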
Statistical analysis
[Figure 9-5: baseline and intervention phases. Baseline: slope = ×1.05, level = 22 (line at last day). Intervention: slope = ×1.60, level = 28 (line at first day).]
FIGURE 9-5. Hypothetical data across baseline (A) and intervention (B) phases, with separate
celeration lines for each phase (solid lines). The dashed line represents an extension of the
extended from baseline into the intervention phase. For purposes of the
statistical test, it is assumed that the probability of a data point during the
intervention phase falling above the projected celeration line of baseline is
50% (i.e., p = .5), given the null hypothesis of no change across phases. A
binomial test can be used to determine if the number of data points that are
above the slope, $p = \binom{10}{10}\left(\tfrac{1}{2}\right)^{10}$, yields a p < .001. Thus the null hypothesis can
be rejected; the data in the intervention phase are significantly different from
the data of the baseline phase. The results do not convey whether the level
and/or slope account for the differences but only that the data overall depart
from one phase to another.
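The binomial probability can be computed directly. The sketch below assumes, for illustration, that all 10 data points in the intervention phase fall above the projected baseline line; the count of 10 and the function name are assumptions of this example.

```python
from math import comb


def binomial_tail(x, n, p=0.5):
    """P(at least x of n points fall above the projected line) under the null."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(x, n + 1))


# All 10 of 10 intervention points above the projected celeration line.
p_value = binomial_tail(10, 10)
print(p_value)          # 0.0009765625
print(p_value < 0.001)  # True
```

With fewer points above the line the probability rises quickly; for instance, `binomial_tail(8, 10)` is about .055, which would not reach the .05 level.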
describe the data in a summary fashion and to predict the outcome given the
rate of change. The utility of the test is that it provides a computationally
simple technique for characterizing data and for examining if trends change
across phases. In the usual case of data presentation in single-case research,
summary statistics are often restricted to describing mean changes across
phases (see Kazdin, 1982b). The split-middle technique can provide addi-
tional descriptive information on the level, slope, and changes in these
characteristics over time (see Wolery & Billingsley, 1982).
Since a major purpose of the technique is to predict behavior rather than to
determine statistical significance of change, it is appropriate to examine the
extent to which this purpose is adequately achieved. White (1974) presented
data based upon "several thousand" analyses of classroom performance. The
analyses determined the accuracy of predicting behavior using the split-
middle procedure at different points in the future. As might be expected, the
extent to which the predictions approximated the actual data depended upon
the number of data points upon which the prediction was based and upon the
amount of time into the future that was predicted. For example, on the basis
of 7 days of data, performance one week into the future would be successfully
predicted (with a narrow margin of error) 64% of the time; for performance
3 weeks into the future, predictions were successful 50% of the time.
With 11 days of data, predictions one week into the future were successful
89% of the time; for performance 3 weeks into the future, predictions were
successful 81% of the time.
The predictive uses of the split-middle technique have been accorded
important applied significance. If the data suggest that behavior is not
changing at a sufficient rate to obtain a particular goal, the intervention can
be altered. Thus the technique may provide useful information that leads the
investigator to change the intervention as needed.
baseline. The binomial test might show a statistically significant effect even
though the numbers were assigned randomly and no intervention was imple-
mented. Thus problems may exist in drawing inferences using the binomial
test when trend is evident in baseline (or the condition from which a projected
and not the clinical significance of the changes. Although rules of science
have depended upon levels of confidence as a criterion to decide veridical
effects, no leap is warranted from levels of confidence to the applied value of
the finding. Clinical significance, as noted earlier, refers to the importance of
the change and entails different criteria from those invoked for statistical
analyses.
Clinical significance is usually viewed as a more stringent criterion than
statistical significance because many statistically reliable effects can be ob-
tained without clear or detectable impact on everyday client functioning. It is
especially marked and hence typically statistically significant. There are also
cases, however, where clinically significant effects might be evident where
statistical tests might not be applicable and/or where statistical significance is
not clear. For example, for clinical cases where complete amelioration of the
problem is achieved in one trial (e.g., Creer, Chai, & Hoffman, 1977),
statistical significance would be difficult if not impossible to demonstrate with
tests. For example, protracted baseline phases are difficult to justify but could
be essential in order to apply such tests of time series analyses.
An important characteristic of single-case designs is that they are quite
flexible. Design changes are made in part as a function of the client's re-
9.10. CONCLUSIONS
The present chapter has discussed specific statistical tests for single-case
experimental designs and considerations dictated by their use. The availability
of multiple statistics provides the investigator with diverse options for the
single-case. A few salient considerations underlying all of the tests warrant
reiteration. To begin with, the appropriateness of utilizing statistical criteria
for the evaluation of applied behavioral interventions remains a major source
of controversy. Statistical analysis is seen by many proponents of single-case
research as a violation of the rationale for conducting research with the
individual subject. Thus whether statistical tests should be used to draw
inferences from single-case research remains an issue.
On this issue, it is important to distinguish experimental designs (e.g.,
single-case and between-group designs), methods of data evaluation (e.g.,
visual inspection and statistical analyses), and types of research (e.g., basic or
applied). There are no necessary connections between particular types of
research, designs, and analyses. Thus use of statistical analyses does not
necessarily conflict with single-case designs or their purposes. When research
attempts to develop a technology of behavior change and to achieve clinically
important effects, statistical analyses may be of limited value. Small
effects that pass beyond a threshold of traditional levels of confidence may
not address the priorities of applied research. Yet there are several uses of
statistics, detailed earlier, that may contribute to the goals of applied re-
search.
Another issue important to mention is that the use of statistical tests may
NOTES
1. As the lag increases, the correlation becomes somewhat less stable, in part,
because of the decrease in the number of pairs of observations upon which the
coefficient can be based (Holtzman, 1963).
3. Baer (1977a) has articulately stated the similarities and differences in the ra-
tionales underlying statistical analysis and visual inspection. Both methods of data
evaluation attempt to avoid Type I and Type II error. Type I error refers to
concluding that the intervention produced a veridical effect when in fact the
results are attributable to chance. Type II error refers to concluding that the
intervention did not produce a veridical effect when in fact it did. Typically,
researchers give a higher priority to avoiding a Type I error. In statistical analyses,
the probability of committing a Type I error is specified (by the level of confidence
of the statistical test or a). With visual inspection, the probability of a Type I error
is not known. Hence, to avoid chance effects, the investigator searches for highly
consistent effects that can be readily seen. By minimizing the probability of a Type
I error, researchers increase the probability of making a Type II error. Investiga-
tors who rely on visual inspection are more likely to commit Type II errors than
investigators who rely on statistical analyses. Thus reliance on visual inspection
will tend to overlook and discount many reliable but weak effects. From the
standpoint of developing an effective applied technology of behavior change,
Baer (1977a) has argued persuasively that minimizing Type I errors leads to
identification of a few variables whose effects are consistent and potent across a
wide range of conditions. Thus visual inspection may be suited for the special
goals of applied research. For other research purposes (e.g., testing of alternative
theories), weak but reliable effects may be important to detect, and the priorities
of erring in one direction rather than another might change.
4. The randomization test discussed and illustrated here is one of many available
tests (see Edgington, 1969, 1984). The specific one selected, which compares
means from different conditions, is likely to be of special interest in single-case
experiments where performance is compared across phases.
6. As a general guideline, ranks are assigned so that the lowest number is given to the
baseline that shows the highest level of performance in the desired direction. An
easy rule of thumb is to assign "first place" (a rank of 1) to the highest or lowest
score that represents the "best" performance in terms of the dependent measure.
Thus 1 might be assigned to the highest performance of social skills or the lowest
8. The semilog units refer to the fact that the scale on the ordinate is logarithmic but
the scale on the abscissa is not. The effect of this arrangement is to ensure that
there is no zero origin on the graph and that low and high rates of performance
can be readily represented. The chart can be used for behaviors with extremely
high or low rates. Rates of behavior can vary from .0006944 per minute (i.e., one
every 24 hours) to 1000 per minute. (The semilog chart paper has been developed
by Behavior Research Company, Kansas City, KS.) Adoption of the charting
procedure has not been widespread in applied research. Hence it is useful to note
that the split-middle technique can be used with ordinary graph paper.
9. The binomial applied to the split-middle slope test would be the probability of
attaining x data points above the projected slope:
x = the number of data points above (or below) the projected slope
p = q = .5 by definition of the split-middle slope
p and q = the probability of data points appearing above or below the slope given
the null hypothesis
10. Other design options may raise special issues for statistical tests. For example, in a
changing criterion design, the intervention may be introduced in such a way that
only gradual and small changes in behavior are sought. Obviously, one might not
wish to test for changes in level in such instances, because abrupt changes at the
point of introducing the intervention might not be expected. In an alternating- or
simultaneous-treatment design of special interest, it is not the change from one
phase to another but rather whether separate interventions implemented in the
same phase differ significantly. Analyses discussed previously can be adapted to
these circumstances (e.g., see Edgington, 1982; Kratochwill & Levin, 1980).
CHAPTER 10
10.1. INTRODUCTION
Replication is at the heart of any science. In all sciences, replication serves at
least two purposes: first, to establish the reliability of previous findings; and,
second, to determine the generality of these findings under differing condi-
tions. These goals, of course, are intrinsically interrelated. Each time that
certain results are replicated under different conditions, this not only es-
tablishes generality of findings, but also increases confidence in the reliability
of these findings. The emphasis of this chapter, however, is on replication
procedures for establishing generality of findings.
In chapter 2 the difficulties of establishing generality of findings in applied
research were reviewed and discussed. The problem in generalizing from a
heterogeneous group to an individual limits generality of findings from this
approach. The problem in generalizing from one individual to other individu-
als who may differ in many ways limits generality of findings from a single-
case. One answer to this problem is the replication of single-case experiments.
Through this procedure, the applied researcher can maintain his or her focus
on the individual, but establish generality of findings for those who differ
from the individual in the original experiment. Sidman (1960) has outlined
two procedures for replicating single case experiments in basic research: direct
replication and systematic replication. In applied research a third type of
replication, which we term clinical replication, is assuming increasing impor-
tance.
The purpose of this chapter is to outline the procedures and goals of
replication strategies in applied research. Examples of each type of replication
series will be presented and criticized. Guidelines for the proper use of these
investigator" (p. 73). Sidman divided direct replication into two different
procedures: repetition of the experiment on the same subject and repetition
on different subjects. While repetition on the same subject increases con-
fidence in the reliability of findings and is used occasionally in applied
research (see chapter 5), generality of findings across clients can be ascertained
only by replication on different subjects. More specifically, direct
replication in applied research refers to administration of a given procedure
by the same investigator or group of investigators in a specific setting (e.g.,
hospital, clinic, or classroom) on a series of clients homogeneous for a
particular behavior disorder (e.g., agoraphobia, compulsive hand washing).
While it is recognized that, in applied research, clients will always be more
heterogeneous on background variables such as age, sex, or presence of
additional maladaptive behaviors than in basic research, the conservative
approach is to match clients in a replication series as closely as possible on
benefit from the procedure and some do not, can then be attributed to as few
differences as possible, thereby providing a clearer direction for further
experimentation. This point will be discussed more fully below.
Direct replication as we define it can begin to answer questions about
replications of a therapeutic procedure. This early clinical series examined the
effects of social reinforcement (praise) on severe agoraphobic behavior in
three patients (Agras et al., 1968). This series was also one of the first
evaluations of direct-exposure-based treatments for phobia that have become
the treatment of choice today (Mavissakalian & Barlow, 1981b). This proce-
dure has also come to be known as reinforced practice (Leitenberg, 1976) and
self-observation therapy (Emmelkamp, 1982). The procedure was straight-
forward.
All patients were hospitalized. Severity of agoraphobic behavior was
measured by observing the distance the patients were able to walk on a course
from the hospital to a downtown area. Landmarks were identified at 25-yard
intervals for over one mile. The patients were asked two or more times a day
to walk as far as they could on the course without feeling "undue tension."
Their report of distance walked was surreptitiously checked from time to time
by an observer to determine reliability. Precise feedback of progress in terms
of increases in distance was provided, and this progress was socially reinforced
with praise and approval during treatment phases and ignored during
withdrawal phases. In the first patient, increases in time spent away from the
center were praised first, but as this resulted in the patient simply standing
outside the front door of the hospital for longer periods, the target behavior
was changed to distance. Because baseline procedures were abbreviated, this
design is best characterized as a B-A-B design (see chapter 5). The compari-
son, then, is between treatment (praise) and no treatment (no praise).
For purposes of generality across clients, it is important to note that the
patients in this experiment were rather heterogeneous, as is typically the case
328 Single-case Experimental Designs
in applied research. Although each patient was severely agoraphobic, all had
numerous associated fears and obsessions. The extent and severity of
agoraphobic fears differed. One subject was a 36-year-old male with a 15-year
agoraphobic history. He was incapacitated to the extent that he could manage
a 5-minute drive to work in a rural area only with great difficulty. A second
subject was a 23-year-old female with only a one-year agoraphobic history.
This patient, however, could not leave her home unaccompanied. The third
subject, a 36-year-old female, also could not leave her home unaccompanied,
but had a 16-year agoraphobic history. In fact, this patient had to be sedated
and brought to the hospital in an ambulance. In addition, these 3 patients
presented different background variables such as personality characteristics
and cultural variations (one patient was European).
The results from one of the cases (the male) are presented in Figure 10-1.
Reinforcement produced a marked increase in distance walked, and with-
drawal of reinforcement resulted in a deterioration in performance. Reintro-
duction of reinforcement in the final phase produced a further increase in
distance walked. These results were replicated on the remaining 2 patients.
At least three conclusions can be drawn from these data. The first conclu-
sion is that the treatment was effective in modifying agoraphobic behavior.
The second conclusion is that within the limits of these data, the results are
reliable and not due to idiosyncrasies present in the first experiment, since two
replications of the first experiment were successful. The third conclusion,
however, is of most interest here. The procedure was clearly effective with 3
patients of different ages, sex, duration of agoraphobic behavior, and cultural
backgrounds. For purposes of generality of findings, this series of experiments
would be strengthened by a third replication (a total of 4 subjects). But
the consistency of the results across 3 quite different patients enables one to
draw initially favorable conclusions on the general effectiveness of this proce-
dure across the population of agoraphobic clients through the process of
logical generalization (Edgington, 1967).
On the other hand, if one client had failed to improve or improved only
slightly such that the result was clinically unimportant, an immediate search
would have had to be made for procedural or other variables responsible for
the lack of generality across clients. Given the flexibility of this experimental
design, alterations in procedure (e.g., adding additional reinforcers, changing
the criterion for reinforcement) could be made in an attempt to achieve
clinically important results. If mixed results such as these were observed,
further replication would be necessary to determine which procedures were
most efficacious for given clients (see section 2.2, chapter 2).
In this series, however, these steps were not necessary due to the uniformly
successful outcomes, and some preliminary statements about client generality
were made. The next step in this series, then, would be an attempt to replicate
the results systematically, that is, across different situations and therapists. It
[Figure 10-1: ordinate scale 200 to 1200; abscissa: blocks of 5 trials.]
FIGURE 10-1. The effects of reinforcement and nonreinforcement upon the performance of an
agoraphobic patient (Subject 2). (Figure 2, p. 425, from: Agras, W. S., Leitenberg, H., and
Barlow, D. H. [1968]. Social reinforcement in the modification of agoraphobia. Archives of
General Psychiatry, 19, 423-427. Copyright 1968 by American Medical Association. Reproduced
by permission.)
is evident that the preliminary series, which was carried out in Burlington,
Vermont, does not address questions on effectiveness of techniques in dif-
ferent settings or with different therapists. It is entirely possible that charac-
teristics of the therapist or the particular structure of the course that the
agoraphobic walked facilitated the favorable results. Thus these variables
must be systematically varied to determine generality of findings across all
important clinical domains. In fact, this step was taken many times. Using
procedures that were operationally quite similar to those described above, but
carrying different labels, Marks (1972) successfully treated a variety of severe
agoraphobics in an urban European setting (London) using, of course,
different therapists, and Emmelkamp (1974, 1982) treated a long series of
Dutch agoraphobics.
the treatment, if successful, must remain the same, and the comparison is
slightly during response prevention and continued into follow-up. This de-
cline continued beyond the data presented in Figure 10-2 until urges were
minimal. These results were essentially replicated in the remaining three hand
washers.
Before discussion of issues relative to replication, experimental design
considerations in this series deserve comment. The dramatic success of re-
FIGURE 10-2. In the upper half of the graph, the frequency of hand washing across treatment
phases is represented. Each point represents the average of 2 days. In the lower portion of the
graph, total urges reported by the patient are represented. (Figure 3, p. 527, from: Mills, H. L.,
Agras, W. S., Barlow, D. H., and Mills, J. R. [1973]. Compulsive rituals treated by response
prevention: An experimental analysis. Archives of General Psychiatry, 28, 524-529. Copyright
1973 by American Medical Association. Reproduced by permission.)
[Phases: baseline, placebo, response prevention, placebo, baseline. Horizontal axis: two-day blocks.]

attribute its reduction to response prevention using the basic A-B-A withdrawal
design. From the perspective of this design, it is possible that some
correlated event occurred concurrent with response prevention that was ac-
tually responsible for the gains. Fortunately, the aforementioned flexibility in
cations ensure that this finding is reliable. In addition, the clinical significance
of the result is easily observable by inspection, since rituals were entirely
eliminated in all 4 patients. More importantly, however, the fact that this
clinical result was consistently present across 4 patients lends considerable
confidence to the notion that this procedure would be effective with other
patients, again through the process of logical generalization. It is common
sense that confidence in generality of findings across clients increases with
each replication, but it is our rule of thumb that a point of diminishing
returns is reached after one successful experiment and three successful repli-
cations for a total of 4 subjects. At this point, it seems efficient to publish the
results so that systematic replication may begin in other settings.
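
The rule of thumb above can be restated as a simple tally. The following Python sketch is offered only as an illustration under stated assumptions: the function name, the status labels, and the `successes_needed` parameter are hypothetical conveniences and do not come from the studies discussed. The branch that flags failures is likewise an assumption, reflecting the general advice to look for causes of a failure before extending a series.

```python
def series_status(outcomes, successes_needed=4):
    """Summarize a direct replication series (hypothetical helper).

    outcomes: list of booleans in order; the first entry is the original
        experiment and later entries are direct replications.
    successes_needed: assumed encoding of the rule of thumb, that is, one
        successful experiment plus three successful replications.
    """
    successes = sum(outcomes)            # True counts as 1
    failures = len(outcomes) - successes
    if failures:
        # Assumption: any failure prompts a search for its causes first.
        return "examine failures before extending the series"
    if successes >= successes_needed:
        return "ready for systematic replication"
    return "continue direct replication"


# Four consistent successes, as in the hand-washing series described above.
print(series_status([True, True, True, True]))
```

The threshold is a parameter rather than a constant because the text presents four subjects as a rule of thumb, not a fixed requirement.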
An alternative strategy would be to administer the procedure in the same
setting to clients with behavior disorders demonstrating marked differences
from those of the first series. Some behavior disorders such as simple phobias
lend themselves to this method of replication since a given treatment (e.g., in
vitro exposure) should theoretically work on many different varieties of
simple phobia. Within a disorder such as compulsive rituals, this is also
feasible because several different types of rituals are encountered in the clinic
(Mavissakalian & Barlow, 1981a; Rachman & Hodgson, 1980). The question
that can be answered in the original setting then is: Will the procedure
work on other behavior disorders that are topographically different but
presumably maintained by similar psychological processes? In other words,
would rituals quite different from hand washing respond to the same proce-
dure? The fifth case in this series was the beginning of a replication along
these lines.
The fifth subject was a 15-year-old boy who performed a complex set of
rituals when retiring at night and another set of rituals when arising in the
morning. The night rituals included checking and rechecking the pillow
placement and folding and refolding pajamas. The morning rituals were
concerned mostly with dressing. This type of ritual has come to be known as
checking as opposed to previous washing rituals. The rituals were extremely
time consuming and disruptive to the family's routine. After a baseline phase
in which rituals remained relatively stable, the night rituals were prevented,
but the morning rituals were allowed to continue. Here again, response
prevention dramatically eliminated nighttime rituals. Morning rituals gradu-
ally decreased to zero during prevention of night rituals.
FIGURE 10-3. Mean penile circumference change, expressed as a percentage of full erection, to
nude female (averaged over blocks of three sessions) and nude male (averaged over each phase)
slides. (Figure 1, p. 338, from: Herman, S. H., Barlow, D. H., and Agras, W. S. [1974]. An
experimental analysis of exposure to "explicit" heterosexual stimuli as an effective variable in
changing arousal patterns of homosexuals. Behaviour Research and Therapy, 12, 335-346.
Copyright 1974 by Pergamon. Reproduced by permission.)
[Phases: baseline, female exposure, male exposure, female exposure. Horizontal axis: blocks of three sessions.]

treatment phase, but the increase was quite modest. Withdrawing treatment
resulted in a slight drop in heterosexual arousal, which increased once again
when the heterosexual film was reinstated. This last increase, however, does
not become clear until the last point in the phase, which represents only one
session. Subsequently, the patient was unable to continue treatment due to
prior commitments, precluding an extension of this phase, which would have
confirmed (or disconfirmed) the increase represented by that one point.
Reports of sexual fantasies and behavior were consistent with the modest
increases in heterosexual arousal. While some increase in heterosexual fantasies
was noted, the patient continued to employ homosexual fantasies occasionally
during sexual intercourse with his wife and was still unable to
ejaculate.

FIGURE 10-4. Mean penile circumference change, expressed as a percentage of full erection, to
nude female (averaged over blocks of two sessions) and nude male (averaged over each phase)
slides. (Figure 4, p. 342, from: Herman, S. H., Barlow, D. H., and Agras, W. S. [1974]. An
experimental analysis of exposure to "explicit" heterosexual stimuli as an effective variable in
changing arousal patterns of homosexuals. Behaviour Research and Therapy, 12, 335-346.
Copyright 1974 by Pergamon. Reproduced by permission.)
[Phases: female exposure, male exposure, female exposure. Horizontal axis: blocks of two sessions.]

Again, conclusions in three general areas can be drawn from these data.
First, exposure to explicit heterosexual films can be an effective variable for
increasing heterosexual arousal, as demonstrated by the experimental analysis
of the first patient. Second, to the extent that the results were replicated
directly on three patients, the data are reliable and are not due to idiosyncrasies
in the first case. It does not follow, however, that generality of findings
across patients has been firmly established. Although the results were clear
and clinically significant for the first 3 patients, results from the fourth patient
then, is that this procedure has only limited generality across clients, and the
task remains to pinpoint differences between this patient and the remaining
patients to ascertain possible causes for the limitations on client generality.
The authors (Herman et al., 1974b) noted that the fourth patient differed
in at least two ways from the remaining three. One difference falls under the
heading of background variables and the other is procedural. First, the
patient was married and therefore was required to engage in heterosexual
intercourse before heterosexual arousal or interest was generated. In fact, he
reported this to be quite aversive, which may have hampered the development
of heterosexual interest during treatment. The remaining patients had expe-
rienced no significant heterosexual behavior prior to treatment. Second, this
patient was seen less frequently than other patients. At most he was seen three
times a week, rather than daily. At times, this dropped to once a week and
even once every 3 weeks during periods when other commitments interfered
with treatment. It is possible that this factor retarded development of hetero-
sexual interest. To the extent that this was a procedural problem, rather than
a variable that the patient brought with him to the experiment, it would have
been possible to alter the procedure prior to the beginning of the experiment
or even during the experiment (i.e., require daily attendance). If this altera-
tion had been undertaken and similar results (the weak effect) had ensued, it
might have limited the search for causes of the weak effect to just the
background variables, such as the ongoing aversive heterosexual behavior. Of
course, this procedural variable was not thought to be important when the
experiment was designed. In fact, failures to replicate are always occurring in
direct replication series. Another good example was presented in the study by
Ollendick et al. (1981) in chapter 8 (Figures 8-3 and 8-4). In this comparison
of two treatments in an ATD, one treatment was more effective than another
for the first subject, but just the opposite was true for the second subject.
Because the investigators were close to the data, they speculated on one
seemingly obvious reason for this discrepancy. Thus, pending a subsequent
test of their hypothesis, they have already taken the first step on the road to
tracking down intersubject variability and establishing guidelines for general-
ity of findings. The investigators themselves are always in the best position to
identify, and subsequently test, putative sources of lack of generality of
findings.
The issue of interpreting mixed results and looking for causes of failure
FIGURE 10-5. Percentage delusional talk of Subject 1 during therapist sessions and on ward for
each experimental day. (Figure 1, p. 254, from: Wincze, J. P., Leitenberg, H., and Agras, W. S.
[1972]. The effects of token reinforcement and feedback on the delusional verbal behavior of
chronic paranoid schizophrenics. Journal of Applied Behavior Analysis, 5, 247-262. Copyright
tal analysis to determine which variables were responsible for the improve-
ment. The lack of replication, however, suggests that this would not be a
fruitful line of inquiry.
The results from token reinforcement were quite different. This procedure
was administered to 9 patients. Six (Subjects 1, 2, 4, 5, 8, and 9) improved,
an improvement that was confirmed by a return of delusional speech when
token reinforcement was removed. Subject 7 also improved, but delusional
speech did not reappear when token reinforcement was removed. In all of
these patients, the decrease was substantial both in percentage of delusional
speech and in trends across the token phase.
Several conclusions can be drawn from these data. In terms of reduction of
delusional speech within sessions, the experimental analysis demonstrated
that token reinforcement was effective, and replication indicated that the
finding had some reliability. Generality of findings across clients, however, is
extent that the experimental design was sound (internally valid). However,
applied researchers cannot stop here, satisfied that the procedure seems to
work well enough on most cases, since the practicing clinician would be at a
loss to predict which cases would improve with this procedure. In fact,
because the authors (Wincze et al., 1972) noted that these two cases actually
deteriorated on the ward during this treatment, the search for accurate
predictions of success becomes all the more important to the clinician. Thus a
careful search for differences that might be important in these cases should
ensue, leading to a more intensive functional investigation and experimental
manipulation of those factors that contribute to success or failure.
In view of the additional fact that all subjects in this series demonstrated
little generalization of improvement from session to ward behavior, analysis
of this treatment is in a very preliminary state and, as Wincze et al. (1972)
pointed out, "... much work needs to be done in order to predict when a
given type of behavioral intervention is likely to succeed in a given case"
(p. 262).
Finally, it seems important to make a methodological point on the size of
cal strategy, the experimental design was a multiple baseline across behaviors
for six subjects. Three different aspects of social skills were repeatedly
assessed by role playing. Intervention then proceeded for all six subjects on
the first social skill, followed by the second social skill, and so on. In this
hypothetical example, of course, all subjects did very well, with particular
aspects of social skills improving only when treated. Naturally, this strategy
need not be limited to a multiple-baseline-across-behaviors design. Almost
any single-subject design, such as an alternating treatments design or a
standard withdrawal design, could be simultaneously replicated.
From the point of view of replication, this is a very economical and
conservative way to proceed. It is economical because it is less time consum-
ing to treat six clients in a group than it is to treat six clients individually. But
one still has the advantage of observing individual data repeatedly measured
from six different subjects. Naturally, this is only possible where opportuni-
ties for group therapy exist. Furthermore, the procedure is conservative
because fewer variables are different from client to client. The gamble taken
by the investigator in a replication series with increasing heterogeneity or
diversity of subjects or settings was mentioned above. To repeat, if a replica-
tion fails, the more differences there are in subjects, settings, timing of the
intervention, and so forth, the harder it is to track down the cause of the
failure for replication during subsequent experimentation. If all subjects are
treated simultaneously in the same group, at the same time, then one can be
relatively sure that the intervention procedures, as well as setting and tem-
poral factors, are identical. If there is a failure to replicate, then the investiga-
tor should look elsewhere for possible causes, most likely in background
variables or personality differences in the subjects themselves.
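
The diagnostic advantage described above can be illustrated with a short sketch. It is only a toy model under stated assumptions: the function name, the variable names (`setting`, `schedule`, `background`), and all values are hypothetical, chosen to show how holding setting and timing constant narrows the list of candidate causes when one case fails to replicate.

```python
def candidate_causes(failed, succeeded):
    """Return the variables on which the failed case differs from every success.

    failed: dict of variable name -> value for the case that did not replicate.
    succeeded: list of such dicts for the cases that did replicate.
    """
    causes = {}
    for name, value in failed.items():
        if all(case.get(name) != value for case in succeeded):
            causes[name] = value
    return causes


# Hypothetical successful cases treated together in one group.
shared = {"setting": "group room", "schedule": "daily"}
successes = [dict(shared, background="no prior treatment") for _ in range(5)]

# A failure inside the same group differs only on a background variable.
failed_in_group = dict(shared, background="prior treatment")
print(candidate_causes(failed_in_group, successes))

# A failure from a more heterogeneous series differs on every recorded variable.
failed_elsewhere = {"setting": "outpatient clinic", "schedule": "weekly",
                    "background": "prior treatment"}
print(candidate_causes(failed_elsewhere, successes))
```

In the first case only the background variable remains to be investigated; in the second, setting, schedule, and background are all confounded, which is the gamble of heterogeneous replication noted in the text.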
Of course, treating clients in group therapy has its own special kind of
setting. If one were interested in the generality of these findings to individual
treatment settings, the first step in a systematic replication series would be to
test the procedure in subjects treated individually. Also, when groups of
individuals are treated simultaneously, one cannot stop the series at just any
time to begin examining for causes of failures if they occur. However, this is
not really a problem as long as the groups remain reasonably small (e.g.,
[Panels: frequency of first, second, and third component skill in role play. Horizontal axis: days.]
FIGURE 10-6. Graphed hypothetical data of simultaneous replications design. (Figure 2, p. 306,
from: Kelly, J. A., Laughlin, C., Claiborne, M., & Patterson, J. [1979]. A group procedure for
teaching job interviewing skills to formerly hospitalized psychiatric patients. Behavior Therapy,
10, 299-310. Copyright 1979 by Association for Advancement of Behavior Therapy. Reproduced
by permission.)

One of the first reports on differential attention appeared in 1959 (Ayllon &
Michael). This report contained several examples of the application of dif-
ferential attention to institutionalized patients in a state hospital. The thera-
pists in all cases were psychiatric nurses or aides. The purpose of this early
demonstration was to illustrate to personnel in the hospital the possible
clinical benefits of differential attention. Thus differential attention was
applied to most cases in an A-B design, with no attempt to demonstrate
experimentally its controlling effects. In several cases, however, an experimental
analysis was performed. One patient was extremely aggressive and
required a great deal of restraint. One behavior incompatible with aggression
was sitting or lying on the floor. Four-day baseline procedures revealed a
It is safe to say that the impact of this work on adult wards has been
substantial, and differential attention to psychotic behavior is now a common
therapeutic procedure on many wards. More importantly, it has been thor-
these populations (e.g., Paul & Lentz, 1977; Wallace et al., in press). In
retrospect, however, there are many methodological faults with this series,
leading to large gaps in our knowledge, which could have been avoided had
replication been more systematic.
While differential attention was successfully administered on psychiatric
wards in several different parts of the country across the range of therapists
or ward personnel typically employed in these settings and across a variety of
psychotic behaviors, from motor behavior through inappropriate speech,
only a few studies contained experimental analyses. On the other hand, many
of the reports would come under the category of case studies (A-B designs
with measurement). Certainly, this preliminary series on institutionalized
patients would be much improved had each class of behavior (e.g., verbal
behavior, withdrawn behavior, inappropriate behavior, aggressive or other
motor behaviors) been subjected to a direct replication series with three or
four patients and then systematically replicated in other settings with other
therapists.
This procedure most likely would have produced some failures. Reasons
for these failures could then have been explored, providing considerably more
information to clinicians and ward personnel on the limitations of differential
attention. As it stands, Ayllon and Michael (1959) reported a failure but did
not describe the patient in any detail or the circumstances surrounding the
failure. This type of reporting leads to undue confidence in a procedure
among naive clinicians; when failures do occur, disappointment is followed
by a tendency to eliminate the procedure entirely from therapeutic programs.
In this specific case, however, what has happened is that differential attention
has been incorporated into more comprehensive programs without adequate
analysis of its contribution. With some cases or in some settings it may be
either important or superfluous. In other cases it may even be detrimental (see
Herbert et al., 1973).
This early series also illustrated a second use of the single-case study (A-B).
In chapter 1 we noted that case studies can suggest initially that a new
technique is clinically effective, which can lead to more rigorous experimental
demonstration and direct replication. In a systematic replication series the
single-case study makes another appearance. Many reports are published that
include only one case, but replicate an earlier direct replication series in either
an experimental or an A-B form. Usually the reports are from different
settings and contain a slight twist, such as a new form of the behavior
disorder or a slight modification of the procedure. While these reports are less
desirable from the larger viewpoint of a systematic replication series, the fact
is that they are published. When a sufficient number accumulate, these
reports can provide considerable information on generality of findings. We
will return to this point later.
with an adult. More recently, Redd has extended this work by demonstrating
the usefulness of differential attention in controlling retching and vomiting in
cancer patients undergoing chemotherapy (e.g., Redd, 1980). Specifically,
case studies suggested that differential attention was effective in this context.
Since that time, marital therapies based broadly on social learning principles
have become well developed and are widely used for the treatment of marital
distress (Jacobson & Margolin, 1979; Liberman et al., 1980; O'Leary &
Turkewitz, 1981). Most of these programs contain a variety of interventions,
including communications training, problem solving, and instructions on altering
various dyadic patterns of behavior. Embedded within these approaches,
however, is a strong differential attention component. For
example, when leading marital therapists describe their actual approaches in
great detail (e.g., L. F. Wood & Jacobson, 1984), these treatments include
training in expressions of appreciation and praise contingent on desirable
partner behavior. Often this is most prominent in the early stages of therapy.
For example, during "caring days" husbands and wives are taught to express
appreciation for positive qualities or behaviors of their spouses. Ways in
which spouses would like their partners to express appreciation are carefully
explored in the therapy session. These types of expressions, most often
including positive verbal feedback of some sort or another, are then inte-
grated into the couples' daily lives. Unfortunately, this treatment component
has never been evaluated systematically, and thus, once again, we are not sure
of the specific conditions in which it succeeds or fails.
Thus the deficits and faults in this area are similar to those encountered in
the series with psychotic adults described above. Evidence exists that differential
attention can be effective in a number of settings (e.g., inpatient, outpatient,
or home) when applied by different therapists (e.g., doctors, nurses, or
wives) on a number of different behavioral problems. The difficulty here is
with the dearth of experimental analyses and direct replication in each new
setting or with each new problem. Nevertheless, clinical investigators have for
the most part not followed the type of detailed technique-building approach
described in chapter 2 that would ensure that treatment programs, such as
marital therapy, be as powerful as they might be.
was a minor part of a treatment package, such as parent training, were for the
most part omitted. It is certainly possible that a few additional studies were
inadvertently excluded. In the table, it is important to note the variety of
clients, problem behaviors, therapists, and settings described in the studies,
because generality of findings in all relevant domains is entirely dependent on
the diversity of settings, clients, and the rest employed in such studies. One
should also note that the bulk of this work occurred in the late 1960s and
early 1970s, with a decrease in published research since that time. Unlike the
examples above, this is due to the fact that many of the goals of this
systematic replication series were completed. We will discuss this issue further.
Most replication efforts through 1965 presented an experimental analysis
of results from a single-case (see Table 10-2). A good example of the early
studies was presented by Allen et al. (1964), who reported that differential
attention was responsible for increased social interaction with peers in a
socially isolated preschool girl. The setting for the demonstration was a
classroom, and the behavior change agent, of course, was the teacher. While
most of the early studies contained only one case, the experimental demon-
stration of the effectiveness of differential attention in different settings with
different therapists began to provide information on generality of findings
across all-important domains. These replications increased confidence in this
procedure as a generally effective clinical tool. In addition to isolate behavior,
the successful treatment of such problems as regressed crawling (Harris,
Johnston, Kelley, & Wolf, 1964), crying (Hart, Allen, Buell, Harris, & Wolf,
1964), and various behavior problems associated with the autistic syndrome
(e.g., Davison, 1965) also suggested that this procedure was applicable to a
wide variety of behavior problems in children while at the same time provid-
ing additional information on generality of findings across therapists and
settings.
Although studies of successful application of differential attention to a
single-case demonstrated that this procedure is applicable in a wide range of
situations, a more important development in the series was the appearance of
direct replication efforts containing three or more cases within the systematic
replication series. Although reports of single-cases are uniformly successful,
or they would not have been published, exceptions to these reports of success
can and do appear in series of cases, and these exceptions or failures begin to
define the limits of the applicability of differential attention.
For this reason, it is particularly impressive that many series of three or
more cases reported consistent success across many different clients, with
such behavior disorders as inappropriate social behavior in disturbed hospi-
talized children (e.g., Laws, Brown, Epstein, & Hocking, 1971), disruptive
behavior in the elementary classroom (e.g., Cormier, 1969; R. V. Hall et al.,
1971; R. V. Hall, Lund, & Jackson, 1968) or high school classroom (e.g.,
Schutte & Hopkins, 1970), chronic thumb-sucking (Skiba, Pettigrew, &
Alden), disruptive behavior in the home (Veenstra, 1971; Wahler, Winkel,
Peterson, & Morrison, 1965), and disruptive behavior in brain-injured
children (R. V. Hall & Broden, 1967). These improvements occurred in many
different settings such as elementary and high school classrooms, hospitals,
homes, kindergartens, and various preschools. Therapists included profes-
sionals, teachers, aides, parents, and nurses (see Table 10-2).
The consistency of their success was impressive, but as these series of cases
accumulated, the inevitable but extremely valuable reports of failures began
to appear. Almost from the beginning, investigators noted that differential
attention was not effective with self-injurious behavior in children. For
instance, Tate and Baroff (1966) noted that in the length of time necessary for
differential attention to work, severe injury would result. In place of differential
attention, a strong aversive stimulus (electric shock) proved effective in
suppressing this behavior. Later, Corte, Wolf, and Locke (1971) found that
differential attention was totally ineffective on mild self-injurious behavior in
retarded children but, again, electric shock proved effective. Because there
are no reports of success in the literature using differential attention for self-
injurious behavior, it is unlikely that these cases would have been published at
all if differential attention had not proven effective on other behavior disor-
Comment on replication
more difficult than for direct replication due to the variety of experimental
typically found in a systematic replication series (e.g., see Table 10-2), two
fall into this category: the experimental analysis containing only one case
and the group study.
As noted above, the report of a single-case, particularly when accompa-
nied by an experimental analysis, can be a valuable addition to a series in
that it describes another setting, behavior disorder, or other item where the
procedure was successful. Reports of single-cases also may lead to direct
and systematic replication, as in the differential attention series. Unfortunately,
however, failures in a single-case are seldom published in journals.
Among the numerous successful reports of single-case studies contained in
the differential attention series, very few reported a failure, although it is
our guess that differential attention has failed on many occasions, and
these failures simply have not been reported.
The group study suffers from the same limitation because failures are
lost in the group average. Again, group studies can play an important role
in systematic replication in that demonstration that a technique is success-
ful with a given group, as opposed to individuals in the group, may serve
an important function (see section 2.9). In the differential attention series,
several investigators thought it important to demonstrate that the proce-
dure could be effective in a classroom as a whole (e.g., Ward & Baker,
1968). These data contributed to generality of findings across several
domains. The fact remains, however, that failures will not be detected
(unless the whole experiment fails, in which case it would not be
published), thus leading us no closer to the goal of defining the conditions
in which a successful technique fails. In clinical replication, or field testing,
described below, one has more flexibility in examining results from large
groups of treated clients as long as it is possible to pinpoint individuals
who succeed or fail.
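The point that failures are lost in the group average can be illustrated with a minimal sketch. The client labels, improvement scores, and criterion below are hypothetical values invented for illustration; they are not data from any study in the series.

```python
from statistics import mean

# Hypothetical improvement scores (post minus pre) for eight clients.
# These numbers are invented for illustration only.
improvement = {
    "client_1": 12, "client_2": 10, "client_3": 11, "client_4": 9,
    "client_5": 13, "client_6": 0, "client_7": -1, "client_8": 10,
}

# Assumed criterion for a clinically meaningful change (arbitrary here).
CRITERION = 5


def summarize(scores, criterion):
    """Return the group mean and the individuals who met or missed the criterion."""
    group_mean = mean(scores.values())
    successes = sorted(c for c, s in scores.items() if s >= criterion)
    failures = sorted(c for c, s in scores.items() if s < criterion)
    return group_mean, successes, failures


group_mean, successes, failures = summarize(improvement, CRITERION)
print(f"group mean improvement: {group_mean:.1f}")
print(f"met criterion: {successes}")
print(f"did not meet criterion: {failures}")
```

In this invented example the group mean exceeds the criterion even though two clients show no improvement; only the individual listing makes those two failures visible.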
differential attention, with more confidence than procedures from less ad-
vanced series (Barlow, 1974). However, it is still possible through inspection
of these data to utilize those new procedures with a degree of confidence
dependent on the degree to which the experimental clients, therapists, and
settings are similar to those facing the clinician. At the very least, this is a
good beginning to the often discouraging and sometimes painful process of
clinical trial and error.
been developed for all coexisting problems, the next step would be to estab-
lish generality of findings by replicating this treatment package on additional
Thus, while all facets of single-case experimental research are much closer
to the procedures in clinical or applied practice than to other types of research
methodology (see below), clinical replication in its most elementary form
becomes almost identical with the activities of practitioners.
Lovaas et al. (1973) presented the results and follow-up data from the
initial clinical replication series for 13 children. Results were presented in
terms of response of the group as a whole, as well as of individual improve-
ment across the variety of behavioral and emotional problems. While these
data are complex, they can be summarized as follows. All children demon-
strated increases in appropriate behaviors and decreases in inappropriate
behaviors. There were marked differences in the amount of improvement. At
least one child was returned to a normal school setting, while several children
improved very little and required continued institutionalization. In other
words, each child improved, but the change was not clinically dramatic for
several children.
Because clinical replication is similar to direct replication, it can be ana-
lyzed in a similar fashion, and conclusions can be made in two general areas.
First, the treatment package can be effective for behaviors subsumed under
the autistic syndrome. This conclusion is based on (1) the initial experimental
analysis of each component of the treatment package in the original direct
replication series (e.g., Lovaas & Simmons, 1969) and (2) the withdrawal and
reintroduction of this whole package in A-B-A-B fashion in several children
(Lovaas et al., 1973). Second, replication of this finding across all subjects
indicates that the data are reliable and not due to idiosyncrasies in one child.
It does not follow, however, that generality across children was established.
As in example 3 in the section on direct replication (10.2), the results were
clear and clinically significant for several children, but the results were also
weak and clinically unimportant for several children. Thus the package has
only limited generality across clients, and the task remains to pinpoint dif-
ferences between children who improved and those who did not improve.
From these differences, possible causes for limitations on client generality
should emerge.
In fact, children in this series were quite heterogeneous. In many respects,
this was due to an inherent difficulty in clinical replication: the vagueness
and unreliability of many diagnostic categories. As Lovaas et al. (1973)
pointed out, "... the delineation of 'autism' is one area that will demand
considerably more work. It has not been a particularly useful diagnosis. Few
people agree on when to apply it" (p. 156). It follows that heterogeneity of
clients will most likely be greater than in a direct replication series, where the
target behavior is well defined and clients can be matched more closely.
Thus the causes of failure in a series with mixed results are more difficult to
ascertain, due to the greater number of differences among individuals. Never-
theless, it is necessary to pinpoint these differences and begin the search for
intersubject variability. As Lovaas et al. (1973) concluded:
Finally a major focus of future research should attempt more functional descrip-
tions of autistic children. As we have shown, the children responded in vastly
different ways to the treatment we gave them. We paid scant attention to
individual differences when we treated the first twenty children. In the future, we
will assess such individual differences. (p. 163)
schedule arrangement for a large group study, where years may pass before
publishable data are available.
Third, the experimental analysis of the single case is close to the clinic. As
noted in chapter 1, this approach tends to merge the role of scientist and
practitioner. Many an important series has started only after the clinician
confronted an interesting case. Subsequently, measures were developed, and
an experimental analysis of the treatment was performed (Mills et al., 1973).
As a result, the data increase one's understanding of the problem, but the
client also receives and benefits from treatment. If one plans to treat the
patient, it is an easy enough matter to develop measures and perform the
necessary experimental analyses. The recent book mentioned above (Barlow
et al., 1983) was designed to explore this potential in our full-time practi-
tioners by demonstrating how they can incorporate these principles into their
practices and thereby participate in the research process. This ability to work
with ease within the clinical setting, more than any other fact, may ensure the
future of meaningful replication efforts.
Finally, as noted above, the results of the series are cumulative, and each
new replicative effort has some immediate payoff for the practicing clinician.
As this is the ultimate goal of the applied researcher, it is far more satisfactory
than participating in a multiyear collaborative study where knowledge or
benefit to the clinician is a distant goal.
Nevertheless, the advancement of a systematic replication series is a long
and arduous road full of pitfalls and dead ends. In the face of the immediate
demands on clinicians and behavior change agents to provide services to
society, it is tempting to "grab the glimmer of hope" provided by treatments
that prove successful in preliminary reports or case studies. That these hopes
have been repeatedly dashed as therapeutic techniques and schools of therapy
have come and gone supplies the most convincing evidence that the slow but
inexorable process of the scientific method is the only way to meaningful
advancement in our knowledge. Although we are a long way from the
sophistication of the physical sciences, the single case experimental design
with adequate replication may provide us with the methodology necessary to
overcome the complex problems of human behavior disorders.
Hiawatha Designs an Experiment
Maurice G. Kendall
(Originally published in The American Statistician, Dec. 1959, Vol. 13,
No. 5. Reprinted by Permission).
Thus it happened in the contest
That their scores were most impressive
With one notable exception
This (I hate to have to say it)
Was the score of Hiawatha,
Who, as usual, shot his arrows
Shot them with great strength and swiftness
Managing to be unbiased
Not, however, with his salvo
Managing to hit the target.
There, they said to Hiawatha
That is what we all expected.

Hiawatha, nothing daunted,
Called for pen and called for paper
Did analyses of variance
Finally produced the figures
Showing, beyond peradventure,
Everybody else was biased
And the variance components
Did not differ from each other

All the same, his fellow tribesmen
Ignorant, benighted heathens,
Took away his bow and arrows.
Said that though my Hiawatha
Was a brilliant statistician
He was useless as a bowman.
As for variance components,
Several of the more outspoken
Made primeval observations
Hurtful to the finer feelings
Even of a statistician.

In a corner of the forest
Dwells alone my Hiawatha
Permanently cogitating
On the normal law of error.
Wondering in idle moments
Whether an increased precision
Might perhaps be rather better,
Even at the risk of bias,
If thereby one, now and then, could
Register upon the target.
References
Abel, G. G., Blanchard, E. B., Barlow, D. H., & Flanagan, B. (1975, December). A controlled
behavioral treatment of a sadistic rapist. Paper presented at the meeting of the Association for
Advancement of Behavior Therapy, San Francisco.
Agras, W. S. (1975). Behavior modification in the general hospital psychiatric unit. In H.
Leitenberg (Ed.), Handbook of behavior modification (pp. 547-565). Englewood Cliffs, NJ:
Prentice-Hall.
Agras, W. S., Barlow, D. H., Chapin, H. N., Abel, G. G., & Leitenberg, H. (1974). Behavior
modification of anorexia nervosa. Archives of General Psychiatry, 30, 279-286.
Agras, W. S., Kazdin, A. E., & Wilson, G. T. (1979). Behavior therapy: Toward an applied
clinical science. San Francisco: W. H. Freeman.
Agras, W. S., Leitenberg, H., & Barlow, D. H. (1968). Social reinforcement in the modification
of agoraphobia. Archives of General Psychiatry, 19, 423-427.
Agras, W. S., Leitenberg, H., Barlow, D. H., Curtis, N. A., Edwards, J. A., & Wright, D. E.
(1971). Relaxation in systematic desensitization. Archives of General Psychiatry, 25, 511-514.
Agras, W. S., Leitenberg, H., Barlow, D. H., & Thomson, L. E. (1969). Instructions and
reinforcement in the modification of neurotic behavior. American Journal of Psychiatry, 125,
1435-1439.
Alford, G. S., Blanchard, E. B., & Buckley, M. (1972). Treatment of hysterical vomiting by
modification of social contingencies: A case study. Journal of Behavior Therapy and Experi-
mental Psychiatry, 3, 209-212.
Alford, G. S., Webster, J. S., & Sanders, S. H. (1980). Covert aversion of two interrelated deviant
sexual practices: Obscene phone calling and exhibitionism. A single case analysis. Behavior
Therapy, 11, 15-25.
Allen, K. E., & Harris, F. R. (1966). Elimination of a child's excessive scratching by training the
mother in reinforcement procedures. Behaviour Research and Therapy, 4, 79-84.
Allen, K. E., Hart, B. M., Buell, J. S., Harris, F. R., & Wolf, M. M. (1964). Effects of social
reinforcement on isolate behavior of a nursery school child. Child Development, 35, 511-518.
Allen, K. E., Henke, L. B., Harris, F. R., Baer, D. M., & Reynolds, N. J. (1967). Control of
hyperactivity by social reinforcement of attending behavior. Journal of Educational Psychol-
ogy, 58, 231-237.
Allison, M. G., & Ayllon, T. (1980). Behavioral coaching in the development of skills in football,
gymnastics, and tennis. Journal of Applied Behavior Analysis, 13, 297-314.
Allport, G. W. (1961). Pattern and growth in personality. New York: Holt, Rinehart and
Winston.
Allport, G. W. (1962). The general and the unique in psychological science. Journal of Personal-
ity, 30, 405-422.
Altmann, J. (1974). Observational study of behavior: Sampling methods. Behaviour, 49, 227-267.
American Psychological Association. (1973). Ethical principles in the conduct of research with
human participants. Washington, DC: Author.
Anderson, R. L. (1942). Distribution of the serial correlation coefficient. Annals of Mathematical
Statistics, 13, 1-13.
Anderson, R. L. (1971). The statistical analysis of time series. New York: Wiley.
Arrington, R. E. (1939). Time-sampling studies of child behavior. Psychological Monographs, 51
( ).
Baer, D. M. (1971). Behavior modification: You shouldn't. In E. Ramp & B. L. Hopkins (Eds.),
A new direction for education: Behavior analysis. Lawrence, KS: Lawrence University Press.
Baer, D. M. (1977a). "Perhaps it would be better not to know everything." Journal of Applied
Behavior Analysis, 10, 167-172.
Baer, D. M. (1977b). Reviewer's comment: Just because it's reliable doesn't mean that you can use
it. Journal of Applied Behavior Analysis, 10, 117-119.
Baer, D. M., & Guess, D. (1971). Receptive training of adjectival inflections in mental retardates.
Journal of Applied Behavior Analysis, 4, 129-139.
Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior
analysis. Journal of Applied Behavior Analysis, 1, 91-97.
Bailey, J. S., Wolf, M. M., & Phillips, E. L. (1970). Home-based reinforcement and the
modification of pre-delinquents' classroom behavior. Journal of Applied Behavior Analysis, 3,
223-233.
Bakeman, R. (1978). Untangling streams of behavior: Sequential analysis of observational data.
In G. P. Sackett (Ed.), Observing behavior: Vol. 2. Data collection and analysis methods (pp.
63-78). Baltimore: University Park Press.
Ban, T. (1969). Psychopharmacology. Baltimore: Williams & Wilkins.
Bandura, A. (1969). Principles of behavior modification. New York: Holt, Rinehart & Winston.
Barker, R. G., & Wright, H. F. (1955). Midwest and its children: The psychological ecology of an
American town. New York: Harper & Row.
Barlow, D. H. (1974). The treatment of sexual deviation: Towards a comprehensive behavioral
approach. In K. S. Calhoun, H. E. Adams, & K. M. Mitchell (Eds.), Innovative treatment
methods in psychopathology. New York: Wiley.
Barlow, D. H. (1980). Behavior therapy: The next decade. Behavior Therapy, 11, 315-328.
Barlow, D. H. (Ed.). (1981). Behavioral assessment of adult disorders. New York: Guilford
Press.
Barlow, D. H., Agras, W. S., Leitenberg, H., Callahan, E. J., & Moore, R. C. (1972). The
contributions of therapeutic instructions to covert sensitization. Behaviour Research and
Therapy, 10, 411-415.
Barlow, D. H., Becker, R., Leitenberg, H., & Agras, W. S. (1970). A mechanical strain gauge for
recording penile circumference change. Journal of Applied Behavior Analysis, 3, 73-76.
Barlow, D. H., Blanchard, E. B., Hayes, S. C., & Epstein, L. H. (1977). Single case designs and
biofeedback experimentation. Biofeedback and Self-Regulation, 2, 211-236.
Barlow, D. H., & Hayes, S. C. (1979). Alternating treatments design: One strategy for comparing
the effects of two treatments in a single subject. Journal of Applied Behavior Analysis, 12,
199-210.
Barlow, D. H., Hayes, S. C., & Nelson, R. O. (1983). The scientist-practitioner: Research and
accountability in clinical and educational settings. Elmsford, New York: Pergamon Press.
Barlow, D. H., & Hersen, M. (1973). Single case experimental designs: Uses in applied clinical
research. Archives of General Psychiatry, 29, 319-325.
Barlow, D. H., Leitenberg, H., & Agras, W. S. (1969). Experimental control of sexual deviation
through manipulation of the noxious scene in covert sensitization. Journal of Abnormal
Psychology, 74, 596-601.
Barlow, D. H., Leitenberg, H., Agras, W. S., & Wincze, J. P. (1969). The transfer gap in
systematic desensitization: An analogue study. Behaviour Research and Therapy, 7, 191-197.
Barlow, D. H., Mavissakalian, M., & Schofield, L. (1980). Patterns of desynchrony in agorapho-
bia: A preliminary report. Behaviour Research and Therapy, 18, 441-448.
Barmann, B. C., Katz, R. C., O'Brien, F., & Beauchamp, K. L. (1981). Treating irregular
enuresis in developmentally disabled persons: A study in the use of overcorrection. Behavior
Modification, 5, 336-346.
Barnes, K. E., Wooton, M., & Wood, S. (1972). The public health nurse as an effective therapist-
behavior modifier of preschool play behavior. Community Mental Health Journal, 8, 3-7.
Barrera, R. D., & Sulzer-Azaroff, B. (1983). An alternating treatment comparison of oral and
total communication training program with echolalic autistic children. Journal of Applied
Behavior Analysis, 16, 379-395.
Barrett, R. P., Matson, J. L., Shapiro, E. S., & Ollendick, T. H. (1981). A comparison of
punishment and DRO procedures for treating stereotypic behavior of mentally retarded
children. Applied Research in Mental Retardation, 2, 247-256.
Barrios, B. A., & Hartmann, D. P. (in press). Traditional assessment's contributions to behavioral
assessment: Concepts, issues, and methodologies. In R. O. Nelson & S. C. Hayes (Eds.),
Conceptual foundations of behavioral assessment. New York: Guilford Press.
Barrios, B. A., Hartmann, D. P., & Shigetomi, C. (1981). Fears and anxieties in children. In E. J.
Mash & L. G. Terdal (Eds.), Behavioral assessment of childhood disorders (pp. 259-304). New
York: Guilford Press.
Barron, F., & Leary, T. (1955). Changes in psychoneurotic patients with and without psy-
chotherapy. Journal of Consulting Psychology, 19, 239-245.
Barton, E. S., Guess, D., Garcia, E., & Baer, D. M. (1970). Improvement of retardates' mealtime
behaviors by timeout procedures using multiple baseline techniques. Journal of Applied
Behavior Analysis, 3, 77-84.
Bates, P. (1980). The effectiveness of interpersonal skills training on the social acquisition of
moderately and mildly retarded adults. Journal of Applied Behavior Analysis, 13, 237-248.
Baum, C. G., Forehand, R. L., & Zegiob, L. E. (1979). A review of observer reactivity in adult-
child interactions. Journal of Behavioral Assessment, 1, 167-178.
Beck, A. T., Rush, A. J., Shaw, B. J., & Emery, G. (1979). Cognitive therapy of depression. New
York: Guilford Press.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for
measuring depression. Archives of General Psychiatry, 4, 561-571.
Boer, A. P. (1968). Application of a single recording system to the analysis of free-play behavior
Bolger, H. (1965). The case study method. In B. B. Wolman (Ed.), Handbook of clinical
psychology (pp. 28-39). New York: McGraw-Hill.
Boring, E. G. (1950). A history of experimental psychology. New York: Appleton-Century-
Crofts.
Bornstein, M. R., Bellack, A. S., & Hersen, M. (1977). Social-skills training for unassertive
children: A multiple-baseline analysis. Journal of Applied Behavior Analysis, 10, 183-195.
Bornstein, M. R., Bellack, A. S., & Hersen, M. (1980). Social skills training for highly aggressive
children: Treatment in an inpatient psychiatric setting. Behavior Modification, 4, 173-186.
Bornstein, P. H., Bridgewater, C. A., Hickey, J. S., & Sweeney, T. M. (1980). Characteristics and
trends in behavioral assessment: An archival analysis. Behavioral Assessment, 2, 125-133.
Bornstein, P. H., Hamilton, S. B., Carmody, T. B., Rychtarik, R. G., & Veraldi, D. M. (1977).
Reliability enhancement: Increasing the accuracy of self-report through mediation-based pro-
cedures. Cognitive Therapy and Research, 1, 85-98.
Bornstein, P. H., & Rychtarik, R. G. (1983). Consumer satisfaction in adult behavior therapy:
Procedures, problems, and future perspective. Behavior Therapy, 14, 191-208.
Bowdlear, C. M. (1955). Dynamics of idiopathic epilepsy as studied in one case. Unpublished
doctoral dissertation. Case Western Reserve University, Cleveland, Ohio.
Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis: Forecasting and control. San
Francisco: Holden-Day.
Box, G. E. P., & Tiao, G. C. (1965). A change in level of non-stationary time series. Biometrika,
52, 181-192.
Boykin, R. A., & Nelson, R. O. (1981). The effects of instruction and calculation procedures on
observers' accuracy, agreement, and calculation correctness. Journal of Applied Behavior
Analysis, 14, 479-489.
Bradley, L. A., & Prokop, C. K. (1982). Research methods in contemporary medical psychology.
In P. C. Kendall & J. N. Butcher (Eds.), Handbook of research methods in clinical psychology
(pp. 591-649). New York: Wiley.
Brady, J. P., & Lind, D. L. (1961). Experimental analysis of hysterical blindness. Archives of
General Psychiatry, 4, 331-339.
Brawley, E. R., Harris, F. R., Allen, K. E., Fleming, R. S., & Peterson, R. F. (1969). Behavior
modification of an autistic child. Behavioral Science, 14, 87-97.
Breuer, J., & Freud, S. (1957). Studies on hysteria. New York: Basic Books.
Breuning, S. E., O'Neill, M. J., & Ferguson, D. G. (1980). Comparison of psychotropic drugs,
response cost, and psychotropic drug plus response cost procedures for controlling institution-
alized mentally retarded persons. Applied Research in Mental Retardation, 1, 253-268.
Brill, A. A. (1909). Selected papers on hysteria and other psychoneuroses: Sigmund Freud.
Nervous and Mental Disease Monograph Series, 4.
Broden, M., Bruce, C., Mitchell, M. A., Carter, V., & Hall, R. V. (1970). Effects of teacher
attention on attending behavior of two boys at adjacent desks. Journal of Applied Behavior
Analysis, 3, 205-211.
Broden, M., Hall, R. V., Dunlap, A., & Clark, R. (1970). Effects of teacher attention and a token
reinforcement system in a junior high school special education class. Exceptional Children, 36,
341-349.
Brookshire, R. H. (1970). Control of "involuntary" crying behavior emitted by a multiple
sclerosis patient. Journal of Communication Disorders, 1, 386-390.
Browning, R. M. (1967). A same-subject design for simultaneous comparison of three reinforce-
ment contingencies. Behaviour Research and Therapy, 5, 237-243.
Mash & L. G. Terdal (Eds.), Behavioral assessment of childhood disorders (pp. 639-678). New
York: Guilford.
Cohen, D. C. (1977). Comparison of self-report and overt-behavioral procedures for assessing
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological
Measurement, 20, 37-46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provisions for scale disagree-
ment or partial credit. Psychological Bulletin, 70, 213-220.
Cohen, L. H. (1976). Clinicians' utilization of research findings. JSAS Catalog of Selected
Documents in Psychology, 6, 116.
Cohen, L. H. (1979). The research readership and information source reliance of clinical
psychologists. Professional Psychology, 10, 780-786.
Coleman, R. A. (1970). Conditioning techniques applicable to elementary school classrooms.
Journal of Applied Behavior Analysis, 3, 293-297.
Cone, J. D. (1977). The relevance of reliability and validity for behavior assessment. Behavior
Therapy, 8, 411-426.
Cone, J. D. (1979). Confounded comparisons in triple response mode assessment research.
Behavioral Assessment, 1, 85-95.
Cone, J. D. (1982). Validity of direct observation assessment procedures. In D. P. Hartmann
(Ed.), Using observers to study behavior: New directions for methodology of social and
behavioral science (pp. 67-79). San Francisco: Jossey-Bass.
Cone, J. D., & Foster, S. L. (1982). Direct observations in clinical psychology. In P. C. Kendall &
J. N. Butcher (Eds.), Handbook of research methods in clinical psychology (pp. 311-354).
New York: Wiley.
Cone, J. D., & Hawkins, R. P. (Eds.). (1977). Behavior assessment: New directions in clinical
psychology. New York: Brunner/Mazel.
Conger, A. J. (1980). Integration and generalization of kappas for multiple raters. Psychological
Bulletin, 88, 322-328.
Conger, J. C. (1970). The treatment of encopresis by the management of social consequences.
Behavior Therapy, 1, 386-390.
Conover, W. J. (1971). Practical nonparametric statistics. New York: Wiley.
Conrin, J., Pennypacker, H. S., Johnston, J. M., & Rast, J. (1982). Differential reinforcement of
other behaviors to treat chronic rumination of mental retardates. Journal of Behavior Therapy
and Experimental Psychiatry, 13, 325-329.
Cook, T. D., & Campbell, D. T. (Eds.). (1979). Quasi-experimentation: Design and analysis issues
for field settings. Chicago: Rand McNally.
Cormier, W. H. (1969). Effects of teacher random and contingent social reinforcement on the
classroom behavior of adolescents. Dissertation Abstracts International, 31, 1615A-1616A.
Corte, H. E., Wolf, M. M., & Locke, B. J. (1971). A comparison of procedures for eliminating
self-injurious behavior of retarded adolescents. Journal of Applied Behavior Analysis, 4,
201-215.
Cossairt, A., Hall, R. V., & Hopkins, B. L. (1973). The effects of experimenters' instructions,
feedback, and praise on teacher praise and student attending behavior. Journal of Applied
Behavior Analysis, 6, 89-100.
Creer, T. L., Chai, H., & Hoffman, A. (1977). A single application of an aversive stimulus to
eliminate chronic cough. Journal of Behavior Therapy and Experimental Psychiatry, 8,
107-109.
Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (pp.
Cuvo, A. J., & Riva, M. T. (1980). Generalization and transfer between comprehension and
production: A comparison of retarded and nonretarded persons. Journal of Applied Behavior
Analysis, 13, 215-231.
Dalton, K. (1959). Menstruation and acute psychiatric illness. British Medical Journal, 1,
148-149.
Dalton, K. (1960a). Menstruation and accidents. British Medical Journal, 2, 1425-1426.
Dalton, K. (1960b). School girls' behavior and menstruation. British Medical Journal, 2,
1647-1649.
Dalton, K. (1961). Menstruation and crime. British Medical Journal, 2, 1752-1753.
Davidson, P. O., & Costello, C. G. (1969). N = 1: Experimental studies of single cases. New York:
Van Nostrand Reinhold.
Davis, K. V., Sprague, R. L., & Werry, J. S. (1969). Stereotyped behavior and activity level in
severe retardates: The effect of drugs. American Journal of Mental Deficiency, 73, 721-727.
Davis, V. J., Poling, A. D., Wysocki, T., & Breuning, S. E. (1981). Effects of Phenytoin
withdrawal on matching to sample and workshop performance of mentally retarded persons.
Journal of Nervous and Mental Disease, 169, 718-725.
Davison, G. C. (1965). The training of undergraduates as social reinforcers for autistic children.
In L. P. Ullmann & L. Krasner (Eds.), Case studies in behavior modification (pp. 146-148).
New York: Holt, Rinehart and Winston.
DeProspero, A., & Cohen, S. (1979). Inconsistent visual analysis of intrasubject data. Journal of
Applied Behavior Analysis, 12, 573-579.
Doke, L. A. (1976). Assessment of children's behavioral deficits. In M. Hersen & A. S. Bellack
(Eds.), Behavioral assessment (pp. 493-536). Elmsford, New York: Pergamon Press.
Doke, L. A., & Risley, T. R. (1972). The organization of day-care environments: Required vs
optional activities. Journal of Applied Behavior Analysis, 5, 405-420.
Dollard, J., Doob, L. W., Miller, N. E., Mowrer, O. H., & Sears, R. R. (1939). Frustration and
aggression. New Haven: Yale University Press.
Domash, M. A., Schnelle, J. F., Stomatt, E. L., Carr, A. F., Larson, L. D., Kirchner, R. E., &
Risley, T. R. (1980). Police and prosecution systems: An evaluation of a police criminal case
29, 672A.
Emmelkamp, P. M. G. (1974). Self-observation versus flooding in the treatment of agoraphobia.
Behaviour Research and Therapy, 12, 229-237.
Emmelkamp, P. M. G. (1982). Phobic and obsessive-compulsive disorders: Theory, research and
practice. New York: Plenum.
Emmelkamp, P. M. G., & Kwee, K. G. (1977). Obsessional ruminations: A comparison between
thought stopping and prolonged exposure in imagination. Behaviour Research and Therapy,
15, 441-444.
Epstein, L. H., Beck, S. J., Figueroa, J., Farkas, G., Kazdin, A. E., Daneman, D., & Becker, D.
(1981). The effects of targeting improvements in urine glucose on metabolic control in children
with insulin dependent diabetes. Journal of Applied Behavior Analysis, 14, 365-375.
Epstein, L. H., & Hersen, M. (1974). Behavioral control of hysterical gagging. Journal of
Clinical Psychology, 30, 102-104.
Epstein, L. H., Hersen, M., & Hemphill, D. P. (1974). Music feedback in the treatment of tension
headache: An experimental case study. Journal of Behavior Therapy and Experimental Psy-
chiatry, 5, 59-63.
Etzel, B. C., & Gewirtz, J. L. (1967). Experimental modifications of caretaker-maintained
high-rate operant crying in a 6- and 20-week-old infant (Infans tyrannotearus): Extinction of
crying with reinforcement of eye contact and smiling. Journal of Experimental Child Psychol-
ogy, 5, 303-317.
Evans, I. M. (1983). Behavioral assessment. In C. E. Wallace (Ed.), Handbook of clinical
psychology: Vol. 1. Theory, research, and practice (pp. 391-419). Homewood, IL: Dow Jones-
Irwin.
Evans, I. M., & Wilson, F. E. (1983). Behavioral assessment on decision making: A theoretical
analysis. In M. Rosenbaum, C. M. Franks, & Y. Jaffe (Eds.), Perspectives on behavior therapy
in the eighties (Vol. 9, pp. 35-53). New York: Springer Publishing.
Eyberg, S. M., & Johnson, S. M. (1974). Multiple assessment of behavior modification with
families: Effects of contingency contracting and order of treated problems. Journal of Con-
sulting and Clinical Psychology, 42, 594-606.
Eysenck, H. J. (1952). The effects of psychotherapy: An evaluation. Journal of Consulting
Psychology, 16, 319-324.
97-178.
Ezekiel, M., & Fox, K. A. (1959). Methods of correlation and regression analysis: Linear and
curvilinear. New York: Wiley.
Fairbank, J. A., & Keane, T. M. (1982). Flooding for combat-related stress disorders: Assessment
of anxiety reduction across traumatic memories. Behavior Therapy, 13, 499-510.
Fisher, E. B. (1979). Overjustification effects in token economies. Journal of Applied Behavior
Analysis, 12, 407-415.
Fisher, R. A. (1925). On the mathematical foundations of the theory of statistics. In Cambridge
Phil. Society (Ed.), Theory of statistical estimation (Proceedings of the Cambridge Philosophi-
cal Society) England.
Fjellstedt, N., & Sulzer-Azaroff, B. (1973). Reducing the latency of a child's responding to
instructions by means of a token system. Journal of Applied Behavior Analysis, 6, 125-130.
Fleiss, J. H. (1975). Measuring agreement between two judges on the presence or absence of a
trait. Biometrics, 31, 651-659.
Foa, E. B. (1979). Failure in treating obsessive-compulsives. Behaviour Research and Therapy,
17, 169-175.
Foa, E. B., Grayson, J. B., Steketee, G. S., Doppelt, H. G., Turner, R. M., & Latimer, P. R.
(1983). Success and failure in the behavioral treatment of obsessive compulsives. Journal of
Consulting and Clinical Psychology, 51, 287-297.
Forehand, R. L. (Ed.). (1983). Mini-series on consumer satisfaction and behavior therapy.
Behavior Therapy, 14, 189-246.
Forehand, R. L., & McMahon, R. J. (1981). Helping the noncompliant child: A clinician's guide
to parent training. New York: Guilford Press.
Frank, J. D. (1961). Persuasion and healing. Baltimore: Johns Hopkins University Press.
Freund, K., & Blanchard, R. (1981). Assessment of sexual dysfunction and deviation. In M.
Hersen & A. S. Bellack (Eds.), Behavioral assessment: A practical handbook (2nd ed., pp.
427-455). Elmsford, New York: Pergamon Press.
Frick, T., & Semmel, M. I. (1978). Observer agreement and reliability of classroom observational
measures. Review of Educational Research, 48, 157-184.
Feuerstein, M., & Adams, H. E. (1977). Cephalic vasomotor feedback in the modification of
migraine headache. Biofeedback and Self-Regulation, 3, 241-254.
Garfield, S. L., & Bergin, A. E. (Eds.). (1978). Handbook of psychotherapy and behavior
change: An empirical analysis (2nd ed.). New York: Wiley.
Geer, J. H. (1965). The development of a scale to measure fear. Behaviour Research and Therapy,
3, 45-53.
Gelfand, D. M., Gelfand, S., & Dobson, W. R. (1967). Unprogrammed reinforcement of
patients' behavior in a mental hospital. Behaviour Research and Therapy, 5, 201-207.
Gelfand, D. M., & Hartmann, D. P. (1975). Child behavior analysis and therapy. Elmsford, New York:
Pergamon Press.
Gelfand, D. M., & Hartmann, D. P. (1984). Child behavior: Analysis and therapy (2nd ed.).
Elmsford, New York: Pergamon Press.
Geller, E. S., Winett, R. A., & Everett, P. B. (1982). Preserving the environment: New strategies
Analysis, J, 315-322.
Hallahan, D. P., Lloyd, J. W., Kneedler, R. D., & Marshall, K. J. (1982). A comparison of the
effects of self- versus teacher-assessment of on-task behavior. Behavior Therapy, 13, 715-723.
Halle, J. W., Baer, D. M., & Spradlin, J. E. (1981). Teachers' generalized use of delay as a
stimulus control procedure to increase language use in handicapped children. Journal of
Applied Behavior Analysis, 14, 389-409.
Harbert, T. L., Barlow, D. H., Hersen, M., & Austin, J. B. (1974). Measurement and modifica-
tion of incestuous behavior: A case study. Psychological Reports, 34, 79-86.
Harris, F. R., Johnston, M. K., Kelley, C. S., & Wolf, M. M. (1964). Effects of positive social
reinforcement on regressed crawling of a nursery school child. Journal of Educational Psychol-
ogy, 55, 35-41.
Hart, B. M., Allen, K. E., Buell, J. S., Harris, F. R., & Wolf, M. M. (1964). Effects of social
reinforcement on operant crying. Journal of Experimental Child Psychology, 1, 145-153.
Hart, B. M., Reynolds, N. J., Baer, D. M., Brawley, E. R., & Harris, F. R. (1968). Effect of
contingent social reinforcement on the cooperative play of a preschool child. Journal of
Applied Behavior Analysis, 1, 73-76.
Hartmann, D. P. (1974). Forcing square pegs into round holes: Some comments on "An analysis-
of-variance model for the intrasubject replication design." Journal of Applied Behavior
Analysis, 7, 635-638.
Hartmann, D. P. (1976). Some restrictions in the application of the Spearman-Brown prophecy
formula to observational data. Educational and Psychological Measurement, 36, 843-845.
Hartmann, D. P. (1977). Consideration in the choice of interobserver reliability estimates.
Journal of Applied Behavior Analysis, 10, 103-116.
Hartmann, D. P. (1982). Assessing the dependability of observational data. In D. P. Hartmann
(Ed.), Using observers to study behavior: New directions for methodology of social and
behavioral science (pp. 51-65). San Francisco: Jossey-Bass.
Hartmann, D. P. (1983). Editorial. Behavioral Assessment, 5, 1-3.
Hartmann, D. P., & Gardner, W. (1979). On the not so recent invention of interobserver reliability
statistics: A commentary on two articles by Birkimer and Brown. Journal of Applied Behavior
Analysis, 12, 559-560.
Hartmann, D. P., & Gardner, W. (1981). Considerations in assessing the reliability of observa-
tions. In E. E. Filsinger & R. A. Lewis (Eds.), Assessing marriage (pp. 184-196). Beverly Hills:
Sage.
Hartmann, D. P., Gottman, J. M., Jones, R. R., Gardner, W., Kazdin, A. E., & Vaught, R. S.
(1980). Interrupted time-series analysis and its application to behavioral data. Journal of
Applied Behavior Analysis, 13, 543-559.
Hartmann, D. P., & Hall, R. V. (1976). The changing criterion design. Journal of Applied
Behavior Analysis, 9, 527-532.
Hartmann, D. P., Roper, B. L., & Bradford, D. C. (1979). Some relationships between behavioral
and traditional assessment. Journal of Behavioral Assessment, 1, 3-21.
Hartmann, D. P., Roper, B. L., & Gelfand, D. M. (1977). Evaluation of alternative modes of
child psychotherapy. In B. Lahey & A. Kazdin (Eds.), Advances in child clinical psychology
(Vol 1, pp. 1-46). New York: Plenum.
Hartmann, D. P., & Wood, D. D. (1982). Observation methods. In A. S. Bellack, M. Hersen, &
A. E. Kazdin (Eds.), International handbook of behavior modification and therapy (pp.
109-138). New York: Plenum.
Hasazi, J. E., & Hasazi, S. E. (1972). Effects of teacher attention on digit-reversal behavior in an
elementary school child. Journal of Applied Behavior Analysis, 5, 157-162.
Hawkins, R. P. (1975). Who decided that was the problem? Two stages of responsibility for
applied behavior analysis. In W. S. Wood (Ed.), Issues in evaluating behavior modification (pp.
95-214). Champaign, IL: Research Press.
Hawkins, R. P. (1979). The functions of assessment: Implications for selection and development
of devices for assessing repertoires in clinical, educational, and other settings. Journal of
Applied Behavior Analysis, 12, 501-516.
Hawkins, R. P. (1982). Developing a behavior code. In D. P. Hartmann (Ed.), Using observers to
study behavior: New directions for methodology of social and behavioral science (pp. 21-35).
San Francisco: Jossey-Bass.
Hawkins, R. P., Axelrod, S., & Hall, R. V. (1976). Teachers as behavior analysts: Precisely
monitoring student performance. In J. A. Brigham, R. P. Hawkins, J. Scott, & J. F.
McLaughlin (Eds.), Behavior analysis in education: Self-control and reading (pp. 274-296).
Dubuque, IA: Kendall/Hunt.
Hawkins, R. P., & Dobes, R. W. (1977). Behavioral definitions in applied behavior analysis:
Explicit or implicit. In B. C. Etzel, J. M. LeBlanc, & D. M. Baer (Eds.), New directions in
behavioral research: Theory, methods, and applications. In honor of Sidney W. Bijou (pp.
167-188). Hillsdale, NJ: Erlbaum.
Hawkins, R. P., & Dotson, V. A. (1975). Reliability scores that delude: An Alice in Wonderland
trip through the misleading characteristics of interobserver agreement scores in interval record-
Kazdin, A. E. (1973a). The effect of response cost and aversive stimulation in suppressing
punished and non-punished speech dysfluencies. Behavior Therapy, 4, 73-82.
Kazdin, A. E. (1973b). Methodological and assessment considerations in evaluating reinforce-
ment programs in applied settings. Journal of Applied Behavior Analysis, 6, 517-531.
Kazdin, A. E. (1977). Assessing the clinical or applied significance of behavior change through
Shlien (Ed.), Research in psychotherapy (Vol. 3, pp. 90-102). Washington, DC: American
Psychological Association.
Last, C. G., Barlow, D. H., & O'Brien, G. T. (1983). Comparison of two cognitive strategies in
treatment of a patient with generalized anxiety disorder. Psychological Reports, 53, 19-26.
Laws, D. R., Brown, R. A., Epstein, J., & Hocking, N. (1971). Reduction of inappropriate social
behavior in disturbed children by an untrained paraprofessional therapist. Behavior Therapy,
2, 519-533.
Lawson, D. M. (1983). Alcoholism. In M. Hersen (Ed.), Outpatient behavior therapy: A clinical
Liberman, R. P., & Smith, V. (1972). A multiple baseline study of systematic desensitization in a
patient with multiple phobias. Behavior Therapy, 3, 597-603.
Liberman, R. P., Wheeler, E. G., DeVisser, L. A., Kuehnel, J., & Kuehnel, T. (1980). Handbook
of marital therapy. New York: Plenum.
Lick, J. R., Sushinsky, L. W., & Malow, R. (1977). Specificity of Fear Survey Schedule items and
the prediction of avoidance behavior. Behavior Modification, 1, 195-204.
Light, F. J. (1971). Measures of response agreement for qualitative data: Some generalizations
and alternatives. Psychological Bulletin, 76, 365-377.
Lindsley, O. R. (1962). Operant conditioning techniques in the measurement of psychopharmaco-
logical response. In J. H. Nodine & J. H. Moyer (Eds.), Psychosomatic medicine: The first
Hahnemann symposium on psychosomatic medicine (pp. 373-383). Philadelphia: Lea &
Febiger.
Linehan, M. M. (1980). Content validity: Its relevance to behavioral assessment. Behavioral
Assessment, 2, 147-159.
Lovaas, O. I., Berberich, J. P., Perloff, B. F., & Schaeffer, B. (1966). Acquisition of imitative
speech by schizophrenic children. Science, 161, 705-707.
Lovaas, O. I., Freitas, L., Nelson, K., & Whalen, C. (1967). The establishment of imitation and
its use for the development of complex behavior in schizophrenic children. Behaviour Research
and Therapy, 5, 171-181.
Lovaas, O. I., Koegel, R., Simmons, J. Q., & Long, J. D. (1973). Some generalization and
follow-up measures on autistic children in behavior therapy. Journal of Applied Behavior
Analysis, 5, 131-166.
Lovaas, O. I., Schaeffer, B., & Simmons, J. Q. (1965). Experimental studies in childhood
schizophrenia: Building social behaviors using electric shock. Journal of Experimental Re-
search in Personality, 1, 99-109.
Lovaas, O. I., & Simmons, J. Q. (1969). Manipulation of self-destruction in three retarded
children. Journal of Applied Behavior Analysis, 2, 143-157.
Luborsky, L. (1959). Psychotherapy. In P. R. Farnsworth & Q. McNemar (Eds.), Annual review of
psychology (pp. 317-344). Palo Alto, CA: Annual Review.
Lyman, R. D., Richard, H. C., & Elder, I. R. (1975). Contingency management of self-report
and cleaning behavior. Journal of Abnormal Child Psychology, 3, 155-162.
Madsen, C. H., Becker, W. C., & Thomas, D. R. (1968). Rules, praise, and ignoring: Elements of
elementary classroom control. Journal of Applied Behavior Analysis, 1, 139-150.
Malan, D. H. (1973). Therapeutic factors in analytically oriented brief psychotherapy. In R. H.
Gosling (Ed.), Support, innovation and autonomy (pp. 187-205). London: Tavistock.
Malone, J. C., Jr. (1976). Local contrast and Pavlovian induction. Journal of the Experimental
Analysis of Behavior, 26, 425-440.
Mandell, R. M., & Mandell, M. P. (1967). Suicide and the menstrual cycle. Journal of the
American Medical Association, 200, 792-793.
Mann, R. A. (1972). The behavior-therapeutic use of contingency contracting to control an adult
behavior problem: Weight control. Journal of Applied Behavior Analysis, 5, 99-109.
Mann, R. A., & Baer, D. M. (1971). The effects of receptive language training on articulation.
Journal of Applied Behavior Analysis, 4, 291-298.
Mann, R. A., & Moss, G. R. (1973). The therapeutic use of a token economy to manage a young
and assaultive inpatient population. Journal of Nervous and Mental Disease, 157, 1-9.
Mansell, J. (1982). Repeated direct replication of AB designs (Letter to the Editor). Journal of
Behaviour Therapy and Experimental Psychiatry, 13, 261-262.
Marks, I. M. (1972). Flooding (implosion) and allied treatments. In W. S. Agras (Ed.), Behavior
modification: Principles and clinical applications (pp. 151-213). Boston: Little, Brown.
Marks, I. M. (1981). New developments in psychological treatments of phobias. In M. R.
Mavissakalian & D. H. Barlow (Eds.), Phobia: Psychological and pharmacological treatment
(pp. 175-199). New York: Guilford Press.
Marks, I. M., & Gelder, M. G. (1967). Transvestism and fetishism: Clinical and psychological
changes during faradic aversion. British Journal of Psychiatry, 113, 711-729.
Martin, G., Pallotta-Cornick, A., Johnstone, G., & Celso-Goyos, A. (1980). A supervisory
strategy to improve work performance for lower functioning retarded clients in a sheltered
workshop. Journal of Applied Behavior Analysis, 13, 185-190.
Martin, P. J., & Lindsey, C. J. (1976). Irregular discharge as an unobtrusive measure of . . .
Bellack (Eds.), Behavioral assessment: A practical handbook, (2nd ed.) (pp. 3-37). Elmsford,
235-239.
Rapport, M. D., Sonis, W. A., Fialkov, M. J., Matson, J. L., & Kazdin, A. E. (1983).
Carbamazepine and behavior therapy for aggressive behavior: Treatment of a mentally re-
tarded, postencephalic adolescent with seizure disorder. Behavior Modification, 7, 255-265.
Ray, W. J., & Raczynski, J. M. (1981). Psychophysiological assessment. In M. Hersen & A. S.
Bellack (Eds.), Behavioral assessment: A practical handbook, (2nd ed.) (pp. 175-211). Elms-
ford, New York: Pergamon Press.
Redd, W. H. (1980). Stimulus control and extinction of psychosomatic symptoms in cancer
patients in protective isolation. Journal of Consulting and Clinical Psychology, 48, 448-456.
Redd, W. H., & Birnbrauer, J. S. (1969). Adults as discriminative stimuli for different reinforce-
ment contingencies with retarded children. Journal of Experimental Child Psychology, 7,
440-447.
Redfield, J. P., & Paul, G. L. (1976). Bias in behavioral observation as a function of observer
familiarity with subjects and typicality of behavior. Journal of Consulting and Clinical Psy-
chology, 44, 156.
Rees, L. (1953). Psychosomatic aspects of the premenstrual tension system. Journal of Mental
Science, 99, 62-73.
Reid, J. B. (1978). The development of specialized observation systems. In J. B. Reid (Ed.), A
social learning approach to family intervention: Vol. 2. Observation in home settings (pp.
43-49). Eugene, OR: Castalia.
Reid, J. B. (1982). Observer training in naturalistic research. In D. P. Hartmann (Ed.), Using
observers to study behavior: New directions for methodology of social and behavioral science
(pp. 37-50). San Francisco: Jossey-Bass.
Revusky, S. H. (1976). Some statistical treatments compatible with individual organism method-
ology. Journal of the Experimental Analysis of Behavior, 10, 319-330.
Hamerlynck, P. O. Davidson, & L. E. Acker (Eds.), Behavior modification and ideal health
services (pp. 103-127). Calgary, Alberta, Canada: University of Calgary Press.
Risley, T. R., & Wolf, M. M. (1972). Strategies for analyzing behavioral change over time. In J.
Nesselroade & H. Reese (Eds.), Life-span developmental psychology: Methodological issues
(pp. 175-183). New York: Academic Press.
Roberts, M. W., Hatzenbuehler, L. C., & Bean, A. W. (1981). The effects of differential attention
and timeout on child noncompliance. Behavior Therapy, 12, 93-99.
Rogers, C. R., Gendlin, E. T., Kiesler, D. J., & Truax, C. B. (1967). The therapeutic relationship
and its impact: A study of psychotherapy with schizophrenics. Madison: University of
Wisconsin Press.
Rogers-Warren, A., & Warren, S. F. (1977). Ecological perspectives in behavior analysis. Balti-
more: University Park Press.
Rojahn, J., Mulick, J. A., McCoy, D., & Schroeder, S. R. (1978). Setting effects, adaptive
clothing, and the modification of head-banging and self-restraint in two profoundly retarded
adults. Behavioural Analysis and Modification, 2, 185-196.
Rosen, J. C., & Leitenberg, H. (1982). Bulimia Nervosa: Treatment with exposure and response
evaluation. Behavior Therapy, 13, 117-124.
Rosenblum, L. A. (1978). The creation of a behavioral taxonomy. In G. P. Sackett (Ed.),
Observing behavior: Vol. 2. Data collection and analysis methods (pp. 15-24). Baltimore:
University Park Press.
Rosenthal, R. (1976). Experimenter effects in behavioral research (enlarged ed.). New York:
Irvington.
Rosenzweig, S. (1951). Idiodynamics in personality therapy with special reference to projective
methods. Psychological Review, 58, 213-223.
Ross, A. O. (1981). Child behavior therapy: Principles, procedures, and empirical basis. New
York: Wiley.
Roxburgh, P. A. (1970). Treatment of persistent phenothiazine-induced oral dyskinesia. British
Journal of Psychiatry, 116, 277-280.
Rubenstein, E. A., & Parloff, M. B. (1959). Research problems in psychotherapy. In E. A.
Rubenstein & M. B. Parloff (Eds.), Research in psychotherapy, (Vol. 1) (pp. 276-293).
Washington, DC: American Psychological Association.
Rugh, J. E., & Schwitzgebel, R. L. (1977). Instrumentation for behavioral assessment. In A. R.
Ciminero, K. S. Calhoun, & H. E. Adams (Eds.), Handbook of behavioral assessment (pp.
79-113). New York: Wiley.
Rusch, F. R., & Kazdin, A. E. (1981). Toward a methodology of withdrawal designs for the
assessment of response maintenance. Journal of Applied Behavior Analysis, 14, 131-140.
Rusch, F. R., Walker, H. M., & Greenwood, C. R. (1975). Experimenter calculation errors: A
potential factor affecting interpretation of results. Journal of Applied Behavior Analysis, 5,
460.
Russell, M. B., & Bernal, M. E. (1977). Temporal and climatic variables in naturalistic observa-
M. LeBlanc, & D. M. Baer (Eds.), New developments in behavioral research: Theory, methods
and application (pp. 303-315). Hillsdale, NJ: Erlbaum.
Sajwaj, T. E., & Hedges, D. (1971). Functions of parental attention in an oppositional retarded
boy. In Proceedings of the 79th Annual Convention of the American Psychological Association
(pp. 697-698). Washington, DC: American Psychological Association.
Sajwaj, T. E., Twardosz, S., & Burke, M. (1972). Side effects of extinction procedures in a
remedial preschool. Journal of Applied Behavior Analysis, 5, 163-175.
Sanson-Fisher, R. W., Poole, A. D., Small, G. A., & Fleming, I. R. (1979). Data acquisition in
real time: An improved system for naturalistic observations. Behavior Therapy, 10, 543-554.
Scheffe, H. (1959). The analysis of variance. New York: Wiley.
Schindele, R. (1981). Methodological problems in rehabilitation research. International Journal
of Rehabilitation Research, 4, 233-248.
Schleien, S. J., Weyman, P., & Kiernan, J. (1981). Teaching leisure skills to severely handicapped
adults: An age appropriate darts game. Journal of Applied Behavior Analysis, 14, 513-519.
Schreibman, L., Koegel, R. L., Mills, D. L., & Burke, J. C. (in press). Training parent child
interactions. In E. Scholper & G. Mesibov (Eds.), Issues in autism: Vol. III. The effects of
Sechrest, L. (Ed.). (1979). Unobtrusive measurement today: New directions for methodology of
behavioral science. San Francisco: Jossey-Bass.
Shapiro, D. A., & Shapiro, D. (1983). Comparative therapy outcome research: Methodological
implications of meta-analysis. Journal of Consulting and Clinical Psychology, 51, 42-53.
Shapiro, E. S., Barrett, R. P., & Ollendick, T. H. (1980). A comparison of physical restraint and
positive practice overcorrection in treating stereotypic behavior. Behavior Therapy, 11,
227-233.
Shapiro, E. S., Kazdin, A. E., & McGonigle, J. J. (1982). Multiple-treatment interference in the
simultaneous- or alternating-treatments design. Behavioral Assessment, 4, 105-115.
Shapiro, M. B. (1961). The single case in fundamental clinical psychological research. British
behaviors in an organically impaired and retarded patient. Journal of Behavior Therapy and
Experimental Psychiatry, 9, 253-258.
Turner, S. M., Hersen, M., Bellack, A. S., Andrasik, F., & Capparell, H. V. (1980). Behavioral
and pharmacological treatment of obsessive-compulsive disorders. Journal of Nervous and
Mental Disease, 168, 651-657.
Twardosz, S., & Sajwaj, T. E. (1972). Multiple effects of a procedure to increase sitting in a
hyperactive, retarded boy. Journal of Applied Behavior Analysis, 5, 73-78.
Ullmann, L. P., & Krasner, L. (Eds.) (1965). Case studies in behavior modification. New York:
Holt, Rinehart and Winston.
Ulman, J. D., & Sulzer-Azaroff, B. (1973, August). Multielement baseline design in applied
behavior analysis. Symposium conducted at the annual meeting of the American Psychological
Association, Montreal.
Ulman, J. D., & Sulzer-Azaroff, B. (1975). Multielement baseline design in educational research.
In E. Ramp & G. Semb (Eds.), Behavior analysis: Areas of research and application (pp.
377-391). Englewood Cliffs, NJ: Prentice-Hall.
Underwood, B. J. (1957). Psychological research. New York: Appleton-Century-Crofts.
VanBierliet, A., Spangler, P. P., & Marshall, A. M. (1981). An ecobehavioral examination of a
simple strategy for increasing mealtime language in residential facilities. Journal of Applied
Behavior Analysis, 14, 295-305.
Van Hasselt, V. B., & Hersen, M. (1981). Applications of single-case designs to research with
visually impaired individuals. Journal of Visual Impairment and Blindness, 75, 359-362.
Van Hasselt, V. B., Hersen, M., Kazdin, A. E., Simon, J., & Mastantuono, A. K. (1983). Social
skills training for blind adolescents. Journal of Visual Impairment and Blindness, 75, 199-203.
Van Houten, R., Nau, P. A., MacKenzie-Keating, S. E., Sameoto, D., & Colavecchia, B. (1982).
An analysis of some variables influencing the effectiveness of reprimands. Journal of Applied
Behavior Analysis, 15, 65-83.
Varni, J. W., Russo, D. C., & Cataldo, M. F. (1978). Assessment and modification of delusional
speech in an 11-year-old child: A comparative analysis of behavior therapy and stimulant drug
effects. Journal of Behavior Therapy and Experimental Psychiatry, 9, 377-380.
Veenstra, M. (1971). Behavior modification in the home with the mother as the experimenter: The
effect of differential reinforcement on sibling negative response rates. Child Development, 42,
2079-2083.
Venables, P. H., & Christie, M. H. (1973). Mechanism, instrumentation, recording techniques
change. In B. B. Lahey & A. E. Kazdin (Eds.), Advances in clinical child psychology (pp.
36-72). New York: Plenum.
Wahler, R. G., & Pollio, H. R. (1968). Behavior and insight: A case study in behavior therapy.
Journal of Experimental Research in Personality, 3, 45-56.
Wahler, R. G., Sperling, K. A., Thomas, M. R., Teeter, N. C., & Luper, H. L. (1970).
Modification of childhood stuttering: Some response-response relationships. Journal of Ex-
perimental Child Psychology, 9, 411-428.
Wahler, R. G., Winkel, G. H., Peterson, R. F., & Morrison, D. C. (1965). Mothers as behavior
therapists for their own children. Behaviour Research and Therapy, 3, 113-124.
Waite, W. W., & Osborne, J. G. (1972). Sustained behavioral contrast in children. Journal of the
Experimental Analysis of Behavior, 18, 113-117.
Walker, H. M., & Buckley, N. K. (1968). The use of positive reinforcement in conditioning
attending behavior. Journal of Applied Behavior Analysis, 1, 245-250.
Walker, H. M., & Lev, J. (1953). Statistical inference. New York: Holt, Rinehart and Winston.
Wallace, C. J., Boone, S. E., Donahoe, C. P., & Foy, D. W. (in press). Chronic mental disabilities.
In D. H. Barlow (Ed.), Behavioral treatment of adult disorders. New York: Guilford Press.
Wallace, C. J. (1982). The social skills training project of the Mental Health Clinical Research
Center for the Study of Schizophrenia. In J. P. Curran & P. M. Monti (Eds.), Social skills
training (pp. 57-89). New York: Guilford Press.
Wallace, C. J., & Elder, J. P. (1980). Statistics to evaluate measurement accuracy and treatment
effects in single subject research designs. In M. Hersen, R. M. Eisler, & P. M. Monti (Eds.),
Progress in behavior modification, (Vol. 10, pp. 40-82). New York: Academic Press.
Wampold, B. E., & Furlong, M. J. (1981a). The heuristics of visual inference. Behavioral
Assessment, 3, 79-82.
Wampold, B. E., & Furlong, M. J. (1981b). Randomization tests in single-subject designs:
Illustrative examples. Journal of Behavioral Assessment, 3, 329-341.
Ward, M. H., & Baker, B. L. (1968). Reinforcement therapy in the classroom. Journal of Applied
Behavior Analysis, 1, 323-328.
Warren, V. L., & Cairns, R. B. (1972). Social reinforcement satiation: An outcome of frequency
or ambiguity. Journal of Experimental Child Psychology, 13, 249-260.
Watson, J. B., & Rayner, R. (1920). Conditioned emotional reactions. Journal of Experimental
Psychology, 3, 1-14.
Watson, P. J., & Workman, E. A. (1981). The non-concurrent multiple baseline across-individu-
als design: An extension of the traditional multiple baseline design. Journal of Behavior
Therapy and Experimental Psychiatry, 12, 257-259.
Webb, E. J., Campbell, D. T., Schwartz, R. D., & Sechrest, L. (1966). Unobtrusive measures:
Nonreactive research in the social sciences. Chicago: Rand McNally.
Webb, E. J., Campbell, D. T., Schwartz, R. D., Sechrest, L., & Grove, J. B. (1981). Nonreactive
measures in the social sciences, (2nd ed.). Boston: Houghton Mifflin.
Weick, K. E. (1968). Systematic observational methods. In G. Lindzey & E. Aronson (Eds.), The
handbook of social psychology (Vol. 2, 2nd ed., pp. 357-451). Menlo Park, CA:
Addison-Wesley.
Weinrott, M. R., Garrett, B., & Todd, N. (1978). The influence of observer presence on classroom
behavior. Behavior Therapy, 9, 900-911.
Weinrott, M. R., Jones, R. R., & Boler, G. R. (1981). Convergent and discriminant validity of
five classroom observation systems: A secondary analysis. Journal of Educational Psychology,
73, 671-679.
Wells, K. C., Hersen, M., Bellack, A. S., & Himmelhock, J. M. (1979). Social skills training in
White, O. R. (1972). A manual for the calculation and use of the median slope: A technique of
progress estimation and prediction in the single case. Eugene, OR: University of Oregon,
Regional Resource Center for Handicapped Children.
White, O. R. (1974). The "split middle": A "quickie" method of trend estimation. Seattle, WA:
University of Washington, Experimental Education Unit, Child Development and Mental
Retardation Center.
Wildman, B. G., & Erickson, M. T. (1977). Methodological problems in behavioral observation.
In J. D. Cone & R. P. Hawkins (Eds.), Behavior assessment: New directions in clinical
psychology (pp. 255-273). New York: Brunner/Mazel.
Williams, C. D. (1959). Case report: The elimination of tantrum behavior by extinction proce-
dures. Journal of Abnormal and Social Psychology, 59, 269.
Williams, J. G., Barlow, D. H., & Agras, W. S. (1972). Behavioral measurement of severe
depression. Archives of General Psychiatry, 27, 330-334.
Wilson, C. W., & Hopkins, B. L. (1973). The effects of contingent music on the intensity of noise
in junior high home economics classes. Journal of Applied Behavior Analysis, 6, 269-275.
of the vomiting behavior of a retarded child. In L. P. Ullmann & L. Krasner (Eds.), Case
studies in behavior modification (pp. 364-366). New York: Holt, Rinehart and Winston.
Wolf, M. M., & Risley, T. R. (1971). Reinforcement: Applied research. In R. Glaser (Ed.), The
nature of reinforcement (pp. 310-325). New York: Academic Press.
Wolfe, J. L., & Fodor, I. G. (1977). Modifying assertive behavior in women: A comparison of
three approaches. Behavior Therapy, 8, 567-574.
Wolpe, J. (1958). Psychotherapy by reciprocal inhibition. Stanford: Stanford University Press.
Wolpe, J. (1976). Theme and variations: A behavior therapy casebook. Elmsford, New York:
Pergamon Press.
Wolstein, B. (1954). Transference: Its meaning and function in psychoanalytic therapy. New
York: Grune & Stratton.
Wong, S. E., Gaydos, G. R., & Fuqua, R. W. (1982). Operant control of pedophilia. Behavior
Modification, 6, 73-84.
Wood, D. D., Callahan, E. J., Alevizos, P. N., & Teigen, J. R. (1979). Inpatient behavioral
assessment with a problem-oriented psychiatric logbook. Journal of Behavior Therapy and
Experimental Psychiatry, 10, 229-235.
Wood, L. F., & Jacobson, N. S. (in press). Marital disorders. In D. H. Barlow (Ed.), Behavioral
treatment of adult disorders. New York: Guilford Press.
Wright, H. F. (1960). Observational child study. In P. Mussen (Ed.), Handbook of research
methods in child development (pp. 71-139). New York: Wiley.
Wright, J., Clayton, J., & Edgar, C. L. (1970). Behavior modification with low-level mental
retardates. Psychological Record, 20, 465-471.
Yarrow, M. R., & Waxler, C. Z. (1979). Dimensions and correlates of prosocial behavior in young
children. Child Development, 47, 118-125.
Yates, A. J. (1970). Behavior therapy. New York: Wiley.
Yates, A. J. (1975). Theory and practice in behavior therapy. New York: Wiley.
Yawkey, T. D. (1971). Conditioning independent work behavior in reading with seven-year-old
children in a regular early childhood classroom. Child Study Journal, 2, 23-34.
Yelton, A. R., Wildman, B. G., & Erickson, M. T. (1977). A probability-based formula for
calculating interobserver agreement. Journal of Applied Behavior Analysis, 10, 127-131.
Zeilberger, J., Sampen, S. E., & Sloane, H. N. (1968). Modification of a child's problem
behaviors in the home with the mother as therapist. Journal of Applied Behavior Analysis, 1,
47-53.
Zilbergeld, B., & Evans, M. B. (1980). The inadequacy of Masters and Johnson. Psychology
Today, 14, 28-43.
Zimmerman, E. H., & Zimmerman, J. (1962). The alteration of behavior in a special classroom
situation. Journal of the Experimental Analysis of Behavior, 5, 59-60.
Zimmerman, J., Overpeck, C., Eisenberg, H., & Garlick, B. (1969). Operant conditioning in a
sheltered workshop. Rehabilitation Literature, 30, 326-334.
Subject Index
Expectancy effects, 42, 184, 189, 219
Experimental analysis of behavior, 8, 29-31
Experimental criterion, 285, 286
Experimental psychology, 1, 2-5, 6, 14, 30, 35

Factor analysis, 6
Factorial Design. See Analysis of variance
Field testing, 365, 367
Follow-up, 44, 89, 110, 145, 150, 151, 234, 236, 247, 248
Functional manipulation, 260

Generality of findings, 2, 4, 7, 8, 14, 16, 25, 28, 32, 33, 49-66, 84, 112, 113, 127, 130, 150, 153, 154, 162, 204, 205, 211, 216, 226, 232, 239, 241, 247, 252, 257, 260, 272, 325, 325-371
Group comparison. See Group design
Group Comparison Design, 1, 2, 3, 5-8, 11-13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 28, 29, 30, 31, 33, 35, 36, 51-66, 99, 108, 167, 178, 179, 191, 193, 205, 226, 238, 252, 259, 286, 287, 291, 320, 321, 365, 370

Habituation, 138
Headache, 135, 136, 161-162
Homosexuality, 10, 39-42, 70, 86, 103-105, 147, 334-339

Independent variables, 9, 10, 17, 18, 27, 28, 29, 30, 33, 34, 35, 39, 48, 67, 154
Independent verification, 259
Individual differences, 5, 6, 7
Instrumentation, 108
Intensive Design, 28
Interaction effects, 193-205, 249, 272
Intelligence, 5
  tests, 6
Intrasubject averaging, 45-48
Introspection, 3-4
Irreversible procedures, 101-105

Law of Initial Values, 138
Learning Theory, 4, 6, 30, 31
  classical conditioning, 39, 40, 90
  operant conditioning, 8, 30, 99
Logical generalization, 253, 333, 369

Maintenance, 68, 105-106, 144, 230, 236, 239, 248, 250
Matching, 15, 54, 68, 213, 214
Merit Method, 6
Mixed Schedule Design, 255
Multi-Element Baseline Design, 254, 255, 299, 319
Multi-Element Experimental Designs, 30
Multiple Baseline Design, 9, 64, 66, 88, 95, 101, 102, 106, 164, 209-251, 275, 281, 308, 309, 311, 321, 333
  across behaviors, 215-230, 247, 344
  across individuals, 244
  across settings, 238-244, 247, 249
  across subjects, 230-238, 249, 251, 278, 343
Multiple Probe Technique, 245-248
Multiple Schedule Design, 254, 255
Multiple treatment interference, 143, 153, 179, 205, 256-263, 272, 273, 281

Naturalistic studies, 2, 17, 18-20, 21
Nonconcurrent Multiple Baseline Design, 244, 248
Norm Reference Tests. See Criterion reference tests
Normal distribution, 3, 5, 305

Obsessive Compulsive Disorder, 15, 16
Operational definition, 11

Paranoid delusions, 26
Patient Uniformity Myth, 16
Percent of success, 12, 17, 19, 56
Period treatment design, 175, 206
Phase, 26, 67, 72, 93, 95-101, 154, 162, 165, 280, 286, 292, 295, 299, 301, 302, 316, 319
Phobia, 53, 82, 195-197, 201, 216-219, 273, 284, 333, 343, 346, 347
Physiological measures, 108, 131, 135-138, 150
Physiological psychology, 1, 2-5, 8, 23
Placebo effects, 39, 60-61, 75, 78, 87, 101, 104, 105, 141, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 209, 249, 251, 255, 330, 331, 333, 335
Population, 8, 16, 305
Post Traumatic Stress Disorder, 241
Probe measures, 241
Process research, 2, 17, 20-21, 23, 25, 26, 27, 38

Quasi-Experimental Designs, 27-28, 71, 142, 143, 186, 206, 249
Questionnaires, 29

Random assignment, 15, 18, 19, 287
Random sampling, 52-54, 55, 65, 305
Randomization Design, 254, 255
Randomization Tests, 302-308, 319, 320
Reactivity, 118, 120, 130, 135, 143, 245, 247, 282
Regression techniques, 110
Reliability, 68, 109, 114, 118, 122, 124-129, 134, 158, 239, 286, 290, 293, 308, 322, 325, 326, 327, 333, 338, 341, 346, 364
Repeated measurement, 3, 4, 20, 21, 26, 27, 30, 32, 37-38, 39, 41, 42, 43, 44, 48, 64, 65, 67, 68-71, 72, 108, 110, 142, 179, 245, 287
Replication, 5, 11, 25, 26, 33, 51, 56-62, 111, 143, 153, 154, 156, 162, 165, 179, 193, 196, 200, 204, 205, 212, 225, 226, 232, 241, 244, 253, 260, 264, 286, 325-371
  clinical, 325, 366-369
  direct, 50, 58, 61, 325, 326-347, 351, 364, 365
  systematic, 56, 59, 61, 62, 63, 101, 325, 334, 339, 343, 344, 346, 347-354, 363-366
Representative case, 25-26
Response dimensions, 114-116
Response guided experimentation, 38
Response specificity, 138
Reversal design, 9, 30, 67, 88-95, 101, 209, 210
Rn Test of Ranks, 308-312, 320

Scientist-Practitioner Split, 21-22
Self-report measures, 70, 108, 109, 131, 132-135, 136, 150, 218-219, 284
  behavioral, 133
  questionnaires, 108, 109, 133-134
  self-monitoring, 108, 109, 133, 134-135, 203, 239
  structured interviews, 108, 109
Serial dependency, 287-290, 295, 296, 299, 300, 301, 302, 305-306
Sexual disorders, 86, 194, 220-222, 367
Simultaneous Replication Design, 226, 254
Simultaneous Treatment Design, 255, 282-284, 319
Social psychology, 30
Social validation procedures
  social comparison, 109, 110
  subjective evaluation, 109, 110
Splitmiddle technique, 312-319, 321
Spontaneous remission, 12, 19, 42
Statistical analysis, 3, 5, 6, 22, 28, 34, 36, 126, 128, 129, 255, 257, 281, 282, 321
  descriptive statistics, 3, 6, 22, 319, 321
  inferential statistics, 1, 7-8, 16, 53, 60, 65, 252, 318, 319, 321
  single-case, 285-324
Statistical significance, 35, 36, 48, 58, 294, 302, 303-304, 308, 309, 313, 316, 318, 319-320
Structuralism, 4

Target behaviors, 107, 108, 109-112, 126, 129, 131, 134, 142, 145, 146, 156, 158, 187, 212, 228, 251, 309
Term Series Design, 27
Therapeutic criterion, 285, 286
Time sampling, 70, 222, 224
Time Series Analysis, 71, 142, 288, 296-302, 308, 319, 321, 353
Trend, 37, 38, 45, 73, 77
Trend analysis, 28
Triple response system, 108, 132
Abel, G. G., 45, 46, 47, 198, 199, 201, Bailey, J. S., 175
262 Bakeman, R., 114, 116
Adams, H. E., 135, 139, 359 Baker, B. L., 357, 365
Agras, W. S., 30, 39, 40, 41, 42, 43, 45, Ban, T., 100
46, 47, 56, 69, 71, 80, 81, 82, 85, Bandura, A., 73, 99, 101, 153
86, 102, 103, 104, 136, 137, 138, Barker, R. G., 114
147, 150, 154, 155, 156, 166, 174, Barlow, D. H., 9, 15, 22, 24, 25, 30,
175, 176, 183, 188, 189, 194, 195, 35, 39, 40, 41, 42, 43, 45, 46, 47,
197, 201, 205, 255, 259, 273, 274, 61, 67, 69, 70, 71, 73, 74, 79, 80,
278, 282, 327, 329, 330, 332, 336, 82, 86, 88, 95, 96, 102, 103, 104,
337, 341, 342, 352 133, 136, 137, 138, 140, 141, 142,
Aikins, D. A., 247 143, 150, 151, 152, 153, 158, 164,
Alevizos, P. N., 115 166, 167, 184, 185, 194, 196, 198,
Alden, S. E., 355, 359 199, 201, 207, 209, 212, 253, 254,
Alford, G. S., 71, 147, 154, 155, 156, 255, 256, 257, 261, 263, 268, 270,
175, 214, 220, 221, 352 274, 280, 281, 282, 327, 329, 330,
Allen, K. E., 89, 90, 94, 354, 356, 358 332, 333, 336, 337, 347, 352, 366,
Allison, M. G., 214 367, 369, 370, 371
Allport, G. D., 24, 62 Barmann, B. C., 214, 230
Altmann, J., 116, 117 Barnes, K. E., 360
Anderson, R. L., 322 Barnett, J. T., 213
Andrasik, 191, 192 Baroff, G. S., 355
Angell, M. J., 215 Barrera, R. D., 268
Armstrong, M., 357 Barrett, R. P., 265, 267, 270, 271, 272,
Arnold, G. R., 357 282
Arrington, R. E., 114, 118, 122 Barrios, B. A., 132
Ashem, R., 141 Barton, E. S., 222, 223, 224, 275
Atiqullah, M., 287 Bates, P., 214, 224, 225, 236
Ault, 99, 108 Baum, C. G., 120
Austin, J. B., 150, 152 Beauchamp, K. L., 214, 230
Axelrod, S., 132 Beck, A. T., 146, 275
Ayllon, T., 64, 70, 166, 167, 168, 170, Beck, S. J., 8, 235
214, 348, 349, 351 Becker, D., 153, 154, 235
Azrin, N. H., 64, 70, 106, 122, 166, Becker, R., 69
167, 168, 170, 265, 266, 349 Becker, W. C., 357, 358
Bellack, A. S., 28, 68, 87, 133, 139,
Baker, T. B., 108 183, 191, 192, 214, 215, 217, 218,
Baer, D. M., 62, 63, 71, 88, 94, 102, 247, 248, 347
114, 116, 128, 138, 139, 209, 210, Bemis, K. M., 72
212, 214, 222, 223, 245, 246, 247, Berberich, J. P., 368
266, 286, 290, 322, 323, 356, 357, Berger, L., 17
358, 360 Bergin, A. E., 15, 16, 19, 21, 22, 23,
25, 33, 35, 36, 41, 51, 54, 55, 61, Bruce, C., 358
63, 74, 366, 370 Brunswick, E., 53
Berk, R. A., 126, 127 Bryan, K. S., 247
Berler, E. S., 214 Bryant, L. E., 214
Bernal, M. E., 112 Bucher, B., 268
Bernard, M. E., 175, 202 Buckley, N. K., 156, 157, 352
Bickman, L., 113 Budd, K. S., 214, 360
Bijou, S. W., 95, 99, 108, 117, 118, Buell, J. S., 89, 90, 354, 356, 357
356, 357 Bugle, C., 106
Billingsley, F. F., 308, 318, 323 Burgio, L. D., 214
Birkimer, J. C., 129 Burke, M., 162, 163, 360, 369
Birney, R. C., 6 Butcher, J. N., 19, 31
Bittle, R., 255, 266, 280 Buys, C. J., 359
Blackburn, B. L., 215
Blanchard, E. B., 71, 136, 263, 352 Cairns, R. B., 363
Blewitt, E., 171, 172 Calhoun, K. S., 139
Blough, P. M., 258 Callahan, E. J., 102, 104, 115
Blunden, R., 171, 172 Campbell, D. T., 27, 28, 45, 57, 71,
Boer, A. P., 118 111, 121, 126, 132, 138, 140, 142,
Boler, G. R., 131 143, 153, 157, 244, 252, 256
Bolger, H., 9 Capparell, H. V., 191, 192
Bolstad, O. D., 120, 121, 125, 129, 131, Carey, R. G., 268
139 Carkhuff, R. R., 167, 168, 169
Boone, S. E., 52 Carlson, C. S., 357
Bootzin, R. R., 94, 99 Carmody, T. B., 135
Borakove, L. S., 110 Carr, A., 243
Boring, E. G., 3, 4, 6 Carter, V., 358
Bornstein, M. R., 214, 215, 217, 218, Carver, R. P., 110
347 Cataldo, M. F., 360
Bornstein, P. H., 108, 113, 133, 135 Catania, A. C., 212
Bower, S. M., 295 Celso-Goyos, A., 267
Bowdler, C. M., 25 Chai, H., 320
Box, G. E. P., 300, 301, 306 Chapin, H. N., 198, 199, 201, 275
Boyer, E. G., 139 Chapin, J. P., 45, 46, 47
Boykin, R. A., 139 Christian, W. P., 214, 231, 232
Bradley, L. A., 136 Christie, M. H., 137
Bradlyn, A. S., 147, 149 Chassan, J. B., 15, 16, 20, 28, 35, 36,
Brady, J. P., 352 55, 87, 95, 99, 100, 183, 184, 185
Brawley, E. R., 357, 358 Ciminero, A. R., 139
Breuer, J., 9 Clairborne, M., 343, 345
Breuning, S. E., 214, 249, 250, 251 Clark, R., 358
Bridgwater, C. A., 108 Clayton, J., 359
Brill, A. A., 10 Coates, T. J., 136
Brinbauer, J. S., 209, 265, 352, 366 Cohen, D. C., 22, 70
Broden, M., 355, 356, 357, 358 Cohen, J., 127
Brody, G. H., 287 Cohen, S., 293
Brookshire, R. H., 352 Colavecchia, B., 268
Brouwer, R., 215 Coleman, R. A., 175
Brown, J. H., 129 Coles, E. M., 138
Brown, R. A., 355, 359 Conderman, L., 358
Browning, R. M., 142, 256, 283 Cone, J. D., 108, 109, 115, 118, 122,
Herman, S. H., 39, 40, 41, 42, 43, 334, Johnson, S. M., 110, 120, 121, 125,
336, 337, 338 129, 131, 139, 266
Hernstein, R. J., 212 Johnston, J. M., 31, 37, 72, 90, 94, 95,
Hersen, M., 25, 35, 61, 67, 68, 69, 70, 96, 100, 111, 128, 132, 175, 182,
71, 73, 74, 79, 80, 82, 85, 86, 88, 291, 347, 354
94, 95, 96, 102, 105, 133, 137, 139, Johnston, M. K., 356, 357
140, 142, 144, 146, 148, 150, 152, Johnstone, G., 267
153, 154, 155, 156, 158, 161, 164, Jones, R. R., 125, 131, 290, 293, 296,
165, 166, 167, 170, 171, 175, 183, 297, 299, 301
184, 185, 191, 192, 209, 212, 214, Jones, R. T., 214, 233, 234
215, 217, 218, 228, 229, 247, 248,
347, 352, 366 Kanowitz, J., 121
Hickey, J. S., 108 Katz, R. C., 214, 230
Hilgard, J. R., 213 Kaufman, K. F., 175
Himmelhock, J. M., 68, 347 Kazdin, A. E., 9, 19, 24, 25, 30, 31, 53,
Hinson, J. M., 258 56, 59, 60, 67, 88, 94, 95, 99, 101,
House, A. E., 126 102, 105, 106, 109, 110, 112, 113,
House, B. J., 126 115, 118, 120, 121, 130, 132, 139,
Horner, R. D., 245, 246, 349 141, 142, 153, 162, 202, 204, 206,
Horne, G. P., 299, 302 209, 211, 212, 214, 215, 216, 223,
Hopkins, B. L., 116, 175, 179, 355, 228, 229, 234, 235, 247, 254, 256,
358, 360 260, 261, 266, 267, 278, 279, 282,
Honing, W. K., 38, 212 286, 290, 291, 292, 307, 318
Homer, A. L., 138 Kane, M., 214, 241, 242
Holz, W., 122 Keefauver, L. W., 175, 202
Holtzman, W. H., 322 Kelley, C. S., 354, 356
Holmes, D. S., 356 Kelly, J. A., 149, 214, 226, 343, 345
Hollon, S. D., 72 Kelly, M. G., 111, 115, 117, 126, 147
Holmberg, M., 114 Kendall, P. C., 19, 31, 116
Holm, R. A., 115 Kennedy, R. E., 215, 301
Hollenbeck, A. R., 121, 127 Kent, R. N., 118, 121
Hollandsworth, J. G., 120 Kernberg, O. F., 18
Hoffman, A., 320 Kessel, L., 10
Hodgson, R. J., 333, 334 Kiernan, J., 110
Hocking, N., 355, 357 Kiesler, D. J., 16, 17, 18, 20, 49, 55, 60
Hoch, P. H., 17, 20 Kirby, F. D., 360
Hubert, L. J., 127, 302 Kircher, A. S., 266
Huitema, B. E., 299 Kirchner, R. E., 215, 243
Hundert, J., 214 Kirk, R. E., 307
Hutt, C., 111, 112 Kistner, J., 215
Hutt, S. J., 111, 112 Klein, R. D., 295
Hyman, R., 10, 17 Knapp, T. J., 293
Kneedler, R. D., 267
Inglis, J., 29 Koegel, R. L., 106, 214, 215, 226, 227,
Iwata, B. A., 267-268 368, 369
Kopel, S. A., 209, 211, 212, 216
Jackson, D., 355, 357 Kraemer, H. C., 55, 117
Jacobson, N. S., 353, 363 Krasner, L., 30, 57, 94, 99, 141
Jarrett, R. B., 268, 274, 276, 277 Kratchowill, T. R., 31, 67, 142, 175,
Jayaratne, S., 31 202, 287, 296, 301, 324
Jenkins, G. M., 301 Kulp, S., 117
Wincze, J. P., 69, 137, 174, 178 179, Wooton, M., 360
330, 339, 341, 342, 343, 366 Workman, E. A., 244, 245
Winett, R. A., 138, 292 Wright, D. E., 80, 81, 195
Winkel, G. H., 355, 356 Wright, H. E., 114, 116
Winkler, R. C., 138 Wright, J., 358
Winton, A. S., 268 Wysocki, T., 249
Wittlieb, E., 109
Wodarski, J. S., 215 Yang, M. C. K., 299
Wolery, M., 308, 318, 323 Yarrow, M. R., 123
Wolf, M. M., 64, 71, 89, 90, 110, 142, Yates, A. J., 29
143, 175, 212, 266, 286, 290, 352, Yawkey, T. D., 359
354, 355, 356, 359 Yelton, A. R., 129
Wolfe, J. L., 134, 215 Yule, W., 215
Wolstein, B., 37
Wonderlich, S. A., 138 Zegiob, L. E., 120
Wong, S. E., 215 Zeilberger, J., 99, 358
Wood, D. D., 122 Zilbergeld, B., 367
Wood, L. E, 108, 115, 116, 117, 123, Zimmerman, E. H., 356
125, 129, 353 Zimmerman, J., 265, 266, 356
Wood, S., 360 Zubin, J., 17, 20
About the Authors
DAVID H. BARLOW received his Ph.D. from the University of Vermont in
1969 and has published over 150 articles and chapters and seven books,
mostly in the areas of anxiety disorders, sexual problems, and clinical re-