2009 DOF Manual
observations; Aggressive/Rule-Breaking, Anxious/Depressed, and Somatic Complaints are scored from children's self-reports during the SCICA. The DSM-oriented scales on the SCICA are scored from interviewers' observations and children's self-reports.
5. Practical Applications and Case Examples
Practitioners can examine the DOF results, along
with cognitive and achievement test data, to plan
appropriate interventions and classroom accommo-
dations to address co-occurring behavioral and
emotional problems along with academic deficits
of children with learning disabilities.
CASE EXAMPLE OF ASSESSMENT
OF ADHD:
Melinda Brandt, Age 8
Melinda Brandt is the 8-year-old girl whose
computer-scored DOF Profile was shown in Chap-
ter 3. Melinda was the younger of two children in
a middle class family that included her mother,
father, and older brother. Her mother brought her
to a mental health outpatient clinic for a psycho-
logical evaluation because she was concerned about
Melinda's problems paying attention and her
struggles with schoolwork. Melinda's teacher had
sent home several notes complaining about
Melinda's behavior in school and her failure to
complete work on time. Melinda's mother was
especially worried that Melinda might be retained in
third grade, which she felt would be a great blow
to Melinda's self-esteem.
Melinda was evaluated by a child psychologist
in the mental health clinic who also provided
contracted consultation services in Melinda's school
district. Melinda's evaluation followed the
sequence illustrated in Figure 5-1 and included the
five assessment axes outlined earlier in Table 1-1
in Chapter 1. Prior to Melinda's appointment at
the clinic, Ms. Brandt completed the CBCL/6-18
and a questionnaire about Melinda's developmental
and medical history. With Ms. Brandt's permission,
Melinda's third grade teacher completed the
TRF and provided copies of Melinda's school
records. Ms. Brandt also gave permission for the
clinic psychologist to interview Melinda's teacher
and to obtain observations of Melinda's behavior
in the classroom.
As part of her consultation services to the school
district, the psychologist had trained teacher aides
in procedures for using the DOF. One week before
Melinda's evaluation at the clinic, a teacher aide
(Valerie Stone) used the DOF to obtain four 10-
minute observations of Melinda in her classroom.
Ms. Stone also made two 10-minute observations
of each of two of Melinda's classmates on the same
days that she observed Melinda. After she com-
pleted all the observations, Ms. Stone mailed the 8
DOFs to the clinic psychologist for computer-scor-
ing.
For Melinda's evaluation at the clinic, the
psychologist administered the Wechsler Intelligence
Scale for Children-Fourth Edition (WISC-IV;
Wechsler, 2003) and the Wechsler Individual
Achievement Test-Second Edition (WIAT-II;
Wechsler, 2002) to assess her cognitive and aca-
demic functioning. She also administered a com-
puterized continuous performance test (CPT) to
assess Melinda's impulsivity and ability to sustain
attention. After each test, the psychologist
completed the TOF to provide a standardized
assessment of Melinda's test session behavior. The
psychologist also interviewed Ms. Brandt about
Melinda's developmental and educational history
and her behavior at home, and interviewed
Melinda's teacher on the phone.
Parent and Teacher Reports. The CBCL/6-18
completed by Ms. Brandt produced scores in the
borderline clinical range for Externalizing (84th
percentile) and Attention Problems (95th
percentile), but normal range scores for all other scales.
Melinda's scores on the CBCL/6-18 competence
scales were also in the normal range, although her
mother expressed worries about her school perfor-
mance. In a structured diagnostic interview, Ms.
Brandt endorsed 6 of 9 DSM-IV-TR ADHD symp-
toms of inattention, with onset before age 7, but
no symptoms of hyperactivity-impulsivity. Al-
though Ms. Brandt acknowledged that Melinda
sometimes seemed restless (e.g., had trouble sit-
ting still at dinner and in church) and she did not
always think things through, Ms. Brandt did
not think that Melinda was unusually hyperac-
tive compared to other children in the family.
Ms. Brandt's main concern was that Melinda's
attention problems were interfering with her abil-
ity to do schoolwork and that she was falling be-
hind in class. Ms. Brandt said that Melinda needed
constant reminders to do her homework and some-
times failed to hand in work even when she did
complete it. Melinda's struggles with schoolwork
often led to arguments and temper tantrums at
home. Ms. Brandt had become especially alarmed
when Melinda's teacher suggested that Melinda
might not be ready to move on to fourth grade at
the end of the year.
The TRF completed by Melinda's third grade
teacher produced scores in the clinical range for
Externalizing and Total Problems (above the 90th
percentile), plus clinical range scores on the TRF
Attention Problems syndrome scale, the DSM-oriented
Attention Deficit/Hyperactivity Problems
scale, and the Inattention and Hyperactivity-Impulsivity
subscales (all above the 97th percentile).
Melinda's scores were in the borderline clinical
range on the TRF Social Problems, Thought Problems,
Rule-Breaking Behavior, and Aggressive Behavior
syndrome scales, as well as the DSM-oriented
Oppositional Defiant and Conduct Problems scales.
The teacher's ratings of Melinda's adaptive
functioning yielded a score in the clinical range
below the 10th percentile. The teacher rated Melinda
as behaving much less appropriately, learning much
less, and being somewhat less happy than typical pupils.
The teacher also rated Melinda's academic
performance as far below grade level in mathematics,
written language, and social studies, somewhat
below grade level in reading, but at grade level in
art. The teacher noted that Melinda was a capable
and creative child, but she had great difficulty sit-
ting still, seemed to talk constantly, and frequently
disturbed other children. Her schoolwork was
often messy and incomplete. She failed to listen to
instructions and seemed unconcerned about the
quality of her work. The teacher felt it was very
challenging to have Melinda in her class. The
teacher had tried accommodations to address
Melinda's attention problems and disruptive
behavior (e.g., moving her to a quiet corner in the
class and providing stickers for completed work),
but nothing seemed to work. The teacher confirmed
that school staff were considering retention in third
grade due to Melinda's poor academic and social
functioning.
Classroom Observations with the DOF.
Figure 2-2 in Chapter 2 showed the observer's notes
and on-task ratings on the DOF for the first
10-minute observation of Melinda. Figures 3-1 to 3-4
in Chapter 3 displayed Melinda's computer-scored
DOF Profile. Melinda scored in the clinical range
above the 97th percentile on the Attention Problems,
Intrusive, and Oppositional syndrome scales,
and in the borderline clinical range on the Slug-
gish Cognitive Tempo syndrome scale (see Figure
3-1). These high scores indicated that Melinda ex-
hibited many more attention problems and more
intrusive and oppositional behavior in the class-
room than was typical for the DOF normative
sample of 6-11-year-old girls. Melinda also scored
in the clinical range on the DSM-oriented Atten-
tion Deficit/Hyperactivity Problems scale and the
Hyperactivity/Impulsivity subscale, and in the bor-
derline range on the Inattention subscale (see Fig-
ure 3-3).
Test Scores and Observations with the TOF.
On the WISC-IV, Melinda obtained a full scale IQ
of 107, which was in the average range. She scored
in the average range for Verbal Comprehension
(VCI = 108), Perceptual Reasoning (PRI = 106),
and Working Memory (WMI = 107), but low aver-
age for Processing Speed (PSI = 88). On the WIAT-
II, Melinda scored in the average range for reading
and mathematics, but low average for written ex-
pression. On the math subtests, she scored much
lower for numerical operations than for math rea-
soning. Her scores on the CPT also suggested ten-
dencies toward impulsive responding and difficul-
ties sustaining attention.
The psychologist's ratings of Melinda's test
session behavior produced scores in the borderline
range on the TOF Attention Problems syndrome
and the DSM-oriented Attention Deficit/Hyperac-
tivity Problems scale, as well as borderline scores
on the Inattention and Hyperactivity/Impulsivity
subscales. Melinda also scored in the borderline
range on the TOF Oppositional syndrome during
achievement testing, but not during cognitive test-
ing.
Data Interpretation and Integration. Class-
room observations with the DOF were especially
useful in Melinda's case in light of discrepancies
between reports by her mother and her teacher.
The DOF Profile showed that Melinda exhibited
many more attention problems than was typical for
the normative sample of 6-11-year-old girls. She
also exhibited many more attention problems than
two classmates selected as DOF controls. These
DOF findings corroborated reports of inattention
by Melinda's mother and her teacher. At the same
time, the DOF also showed high levels of
hyperactivity and impulsivity in the classroom,
consistent with reports by Melinda's teacher but not her
mother. The TOF Profile also indicated high lev-
els of inattention and hyperactivity/impulsivity dur-
ing cognitive and achievement testing, but at less
severe levels than observed in the classroom with
the DOF.
Taken together, the DOF and TOF provided im-
portant independent evidence of problems with in-
attention and hyperactivity/impulsivity, as reported
by Melinda's teacher on the TRF. The results from
the DOF, TOF, and TRF, in conjunction with de-
velopmental and educational history, supported a
DSM-IV-TR diagnosis of ADHD-Combined type.
However, if the evaluation had relied only on
symptom reports by Melinda's mother, without
classroom observations that corroborated reports by her
teacher, a diagnosis of ADHD-Combined type
would not have been appropriate.
Melinda's average scores on the WISC-IV
suggested that her problems with schoolwork were not
due to low ability. However, low average WIAT-II
scores for math operations and written expression
indicated that Melinda was falling behind in these
basic academic skills. Interestingly, the TOF
Profile showed elevated oppositional behavior during
achievement testing, but not during cognitive
testing. The DOF also revealed severe oppositional
and intrusive behavior in the classroom,
consistent with the TOF and the teacher's reports on the
TRF. Although Melinda's mother did not report
oppositional behavior on the CBCL/6-18, she did
say that attempts to help Melinda with her
homework often erupted into arguments and temper
tantrums at home. Taken together, these findings
suggested a strong association between Melinda's
academic skill deficits and her oppositional behavior
when confronted with academic tasks.
Case Management and Outcome Evaluation.
To address problems revealed in the evaluation,
the psychologist referred Melinda and her parents
to a child psychiatrist in the clinic for possible
medication for ADHD. The psychologist also
consulted with Melinda's teacher to develop
accommodations and behavioral interventions in the
classroom. They moved Melinda's seat near the
teacher's desk for closer monitoring of her work.
They also paired Melinda with a peer tutor to work
on math and writing assignments and created an
incentive plan to encourage on-task behavior and
academic productivity. The school staff incorpo-
rated the accommodations and behavioral interven-
tions into a Section 504 plan for Melinda as an
alternative to retention in third grade. As part of
the Section 504 plan, a teacher's aide continued to
conduct biweekly classroom observations of
Melinda with the DOF. Following an RTI model,
the school team examined DOF scores for On-task,
Attention Problems, and the Attention Deficit/Hyperactivity
Problems scale to monitor Melinda's
progress toward their behavioral goals. They also
used curriculum-based measures to monitor
Melinda's academic progress in math and written
work.
CASE EXAMPLE OF A SCHOOL-BASED
ASSESSMENT OF BEHAVIOR PROBLEMS:
Ricky Johnson, Age 9
Ricky Johnson was the youngest child living
with his mother and two sisters in a low-income
inner city neighborhood. Ricky's fourth grade
teacher consulted the school MDT because Ricky had
been involved in several fights on the playground.
The teacher was not sure what started the fights or
who else was involved, but Ricky was usually the
one sent to the principal's office for in-school
suspensions. The teacher also reported that Ricky was
disruptive in class and seemed to have few friends.
The school psychologist (Harry Provo) contacted
Ricky's mother to express the team's concerns and
to obtain her permission for a behavioral
assessment. Ms. Johnson agreed to have the school
psychologist observe Ricky's behavior in school. She
also agreed to complete the CBCL/6-18 and to have
Ricky's teacher complete the TRF.
Recess and Classroom Observations with the
DOF. As an initial step in his evaluation, Mr. Provo
used the DOF to observe Ricky in his classroom
and during recess. On each of three days, Mr. Provo
conducted two observations of Ricky in the class-
room and two observations on the playground dur-
ing recess. Mr. Provo also observed two other boys
as control children in the same setting.
Figure 3-6 in Chapter 3 showed the DOF
Profile scored from Mr. Provo's six observations of
Ricky during recess. On the DOF Profile, Ricky
scored in the clinical range above the 97th
percentile on the Aggressive Behavior syndrome and in
the clinical range above the 90th percentile for
Total Problems. These results indicated that Ricky
exhibited many more problems at recess than was
typical for the DOF normative sample of 6-11-year-
old boys. On the Aggressive Behavior syndrome,
Mr. Provo rated six items as present for Ricky: 14.
Cruel, bullies, or mean to others; 30. Gets into
physical fights; 31. Gets teased; 47. Screams; 66.
Teases; and 86. Bossy. Ricky also exhibited six
other problems that contributed to his high DOF
Total Problems score: 3. Argues; 8. Difficulty
waiting turn in activities or tasks; 20. Disobedient; 22.
Doesn't seem to feel guilty after misbehaving; 67.
Temper tantrums, hot temper, or seems angry; and
83. Doesn't get along with peers. The two control
children, by contrast, showed little aggressive be-
havior. However, their borderline clinical score for
Total Problems indicated that they too showed other
problems on the playground, most notably scream-
ing and difficulty waiting their turn, similar to
Ricky.
The DOF Profile scored from Mr. Provo's
observations of Ricky in the classroom (not shown)
produced a score at the 84th percentile for Total
Problems, which fell in the borderline range for
6-11-year-old boys. Ricky scored in the borderline
range between the 93rd and 97th percentiles on the
DOF Intrusive and Oppositional syndrome scales,
but in the normal range on the Sluggish Cognitive
Tempo, Immature/Withdrawn, and Attention Prob-
lems syndrome scales and the DSM-oriented At-
tention Deficit/Hyperactivity Problems scale. On
the Intrusive syndrome, Mr. Provo rated five items
as present for Ricky: 8. Difficulty waiting turn in
activities or tasks; 21. Disturbs other children; 46.
Disrupts group activities; 55. Demands must be
met immediately; and 65. Talks too much. On the
Oppositional syndrome, Mr. Provo rated four items
as present: 16. Difficulty following directions; 23.
Doesn't seem to listen to what is being said; 52.
Shows off, clowns, or acts silly; and 83. Doesn't
get along with peers. Although the two control
children also showed difficulty waiting their turn
and talking too much, their scores on all DOF scales
were within the normal range.
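
The percentile ranges quoted in these case examples can be summarized in a few lines of code. The following is only an illustrative sketch, not the scoring software: the function name and the `scale_type` labels are hypothetical, and the handling of scores that fall exactly on a cutpoint is an assumption. It takes the cutpoints as they appear in the text (borderline from the 93rd to the 97th percentile and clinical above the 97th for syndrome and DSM-oriented scales; borderline from the 84th to the 90th percentile and clinical above the 90th for broad scales such as Total Problems and Externalizing).

```python
# Illustrative sketch only; names and boundary handling are assumptions.
# Cutpoints are taken from the ranges quoted in the case examples.
CUTPOINTS = {
    "syndrome": (93, 97),  # syndrome and DSM-oriented problem scales
    "broad": (84, 90),     # Total Problems, Externalizing, Internalizing
}


def classify_percentile(percentile, scale_type="syndrome"):
    """Return 'normal', 'borderline', or 'clinical' for a problem scale."""
    borderline_start, clinical_above = CUTPOINTS[scale_type]
    if percentile > clinical_above:
        return "clinical"
    if percentile >= borderline_start:
        return "borderline"
    return "normal"


# Ricky's classroom Total Problems score fell at the 84th percentile.
ricky_total = classify_percentile(84, "broad")
# His CBCL/6-18 Aggressive Behavior score fell at the 92nd percentile.
ricky_aggressive = classify_percentile(92, "syndrome")
```

Under these assumptions, the 84th percentile on Total Problems is classified as borderline and the 92nd percentile on a syndrome scale as normal, which matches the descriptions in the text. Competence and adaptive functioning scales run in the opposite direction (low scores are of concern) and are not covered by this sketch.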
Parent and Teacher Reports. The CBCL/6-18
completed by Ms. Johnson yielded a score in the
clinical range above the 90th percentile for
Externalizing, along with a borderline score on the
Rule-Breaking Behavior syndrome (95th percentile).
Ricky scored just below the borderline range on
the Social Problems syndrome (90th percentile) and
Aggressive Behavior syndrome (92nd percentile),
but well within the normal range on all other
syndrome scales. Ricky's total competence score on
the CBCL/6-18 was in the clinical range below the
10th percentile, with clinical range scores below the
3rd percentile on the Social and School scales.
In a phone interview with Mr. Provo, Ms.
Johnson said that she was worried that Ricky was
hanging out with older boys who had gotten into
trouble in the neighborhood. She said that the po-
lice had come to her house a month ago because
some of the boys were caught shoplifting at a local
grocery store. Ms. Johnson believed that Ricky was
innocent, but she worried that he might be led on
by the other boys. Because Ms. Johnson worked
two jobs to support her family, she was not able to
provide the supervision at home that she felt Ricky
needed. She also reported that Ricky often had
trouble with his schoolwork. She asked his older
sisters to help him with homework, but they were
often too busy with their own work or socializing
with friends.
The TRF completed by Ricky's fourth grade
teacher yielded clinical range scores above the 90th
percentile for Externalizing and Total Problems,
along with a clinical range score above the 97th
percentile on the Social Problems syndrome. Ricky
scored near the borderline range on the Rule-Break-
ing Behavior and Aggressive Behavior syndromes.
The teacher also reported several problems on the
Attention Problems syndrome (e.g., 4. Fails to fin-
ish things he/she starts; 22. Difficulty following
directions; 92. Underachieving, not working up to
potential), but Ricky's total score was in the
normal range. Ricky's score on the TRF adaptive
functioning scale was in the clinical range below
the 10th percentile. The teacher rated Ricky as
behaving much less appropriately and learning much
less than typical pupils. She also rated Ricky's
academic performance as far below grade level in
reading, mathematics, and written language, but at
grade level in social studies and science. Ricky's
teacher was especially concerned about his prob-
lems getting along with other children on the play-
ground and his disruptive behavior in class. She
was also concerned that in-school suspensions had
caused Ricky to miss instructional time to the point
that he was falling behind in his schoolwork. When
Ricky was in class, he often clowned around and
disrupted class activities. He also became easily
frustrated with his work, sometimes to the point of
ripping up his own papers.
Data Interpretation and Integration. Results
from the DOF, TRF, and CBCL/6-18 indicated that
Ricky was showing more externalizing problems
than most boys his age. All three ASEBA forms
produced scores above the 90th percentile on the
Aggressive Behavior syndrome. The CBCL/6-18
and TRF also produced scores above the 90th
percentile on the Rule-Breaking Behavior syndrome.
Low scores on the CBCL/6-18 Social competence
scale and high scores on the TRF Social Problems
syndrome also indicated problems in social rela-
tionships.
Following the three-tiered model discussed
earlier, the school MDT decided to initiate Tier 2
behavioral interventions to address Ricky's problems
in school. As a first step, Mr. Provo examined the
profiles from the ASEBA forms to identify prob-
lems that were consistent across informants and
settings. He listed several problems from the DOF
recess observations that were similar to problems
reported on the CBCL/6-18 and TRF: 3. Argues;
20. Disobedient; 22. Doesn't seem to feel guilty
after misbehaving; 30. Gets into physical fights;
31. Gets teased; 66. Teases; 67. Temper tantrums,
hot temper, or seems angry; and 83. Doesn't get
along with peers.
Mr. Provo conducted a functional behavior as-
sessment to identify antecedents and consequences
of the problem behaviors, particularly fighting on
the playground. He learned that fights usually
erupted after other children teased Ricky and called
him names or after Ricky argued with them, teased
them, or became bossy about the rules of a game.
As the arguing and teasing escalated, Ricky would
lose his temper and start hitting and punching. The
other children also lost their tempers and began
hitting and punching, so that it was not always clear
who started the fights. On one occasion, the play-
ground supervisor broke up a fight and sent Ricky
to the principal's office. On another occasion, no
one intervened and the fight ended when the bell
rang for children to return to class.
Mr. Provo conducted an additional functional
behavior assessment to identify antecedents and
consequences of problems observed in the class-
room, particularly 21. Disturbs other children; 46.
Disrupts group activities; and 52. Shows off,
clowns, or acts silly. Mr. Provo learned that Ricky
became easily frustrated when academic tasks were
too difficult. He would then start clowning and
acting silly or would disturb other children to avoid
doing his work. When this happened, the teacher
scolded Ricky and made him sit alone in the back
of the room for a time-out.
Mr. Provo met with Ricky's teacher and Ms.
Johnson to discuss his observations and learn more
about their perspectives on Ricky's problems. Ms.
Johnson and the teacher agreed that Ricky had dif-
ficulty controlling his temper and that he lacked
social skills for getting along with other children
his age. They also worried that Ricky might be
learning delinquent behaviors from the older
boys in his neighborhood. Mr. Provo noted that
teasing and name-calling seemed to be what
sparked Ricky's fights on the playground. He also
pointed out that a desire to avoid difficult
schoolwork might explain Ricky's disruptive behavior in
the classroom.
Mr. Provo consulted with Ms. Johnson and
Ricky's teacher to develop Tier 2 interventions to
address Ricky's problems. They delineated a clearer
set of playground rules for all children (e.g., no
hitting and fighting, no name calling) and increased
the level of supervision on the playground. They
developed a behavior contract for Ricky to encour-
age positive behaviors (e.g., working quietly, ask-
ing for help when needed, and working coopera-
tively with other children) that would replace un-
desirable behaviors in the classroom. The behav-
ior contract also included bonus points for days
when Ricky did not get into fights on the play-
ground. The teacher sent weekly reports home to
Ms. Johnson showing Ricky's progress toward the
behavioral goals. When Ricky met his weekly
goals, Ms. Johnson provided special rewards at
home (e.g., watching a DVD, getting a special din-
ner, playing a board game).
To improve his social functioning, Ricky was
enrolled in a weekly social skills group conducted
by the school guidance counselor. The teacher and
the guidance counselor also began teaching a so-
cial skills curriculum to the entire class. Ms.
Johnson enrolled Ricky in an after-school recre-
ational and sports program, which introduced him
to a new peer group and increased adult supervi-
sion in after-school hours. To improve his academic
skills, Ricky received daily small group instruc-
tion in reading and math and his teacher checked
his work regularly to ensure that he understood
directions and was staying on task. Ricky received
additional points on his behavior contract for meet-
ing academic goals (e.g., handing in assignments
on time).
Case Management and Outcome Evaluation.
Mr. Provo continued to use the DOF to monitor
Ricky's progress in meeting behavioral goals. To
do this, he trained a teacher aide in the DOF rating
procedures. The teacher aide used the DOF to make
two 10-minute observations of Ricky in the class-
room and two 10-minute observations at recess
each week over a period of 10 weeks. Mr. Provo
scored each set of weekly DOFs and created graphs
of Ricky's scores for DOF On-task and the
Intrusive, Oppositional, and Aggressive Behavior
syndrome scales. Mr. Provo and the teacher examined
the graphs each week to evaluate Ricky's progress.
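
The weekly monitoring described above amounts to averaging each week's observations and looking at the direction of change over time. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, the scores are invented for illustration (they are not Ricky's data), and a least-squares slope is used here simply as one way to summarize a trend that the team in the case example judged from graphs.

```python
from statistics import mean


def weekly_means(observations):
    """Average the scores from each week's observation sessions.

    observations: one inner list of scores per week.
    """
    return [mean(week) for week in observations]


def trend_slope(values):
    """Least-squares slope of values against week number (1, 2, ...)."""
    weeks = range(1, len(values) + 1)
    week_bar, value_bar = mean(weeks), mean(values)
    numerator = sum((w - week_bar) * (v - value_bar)
                    for w, v in zip(weeks, values))
    denominator = sum((w - week_bar) ** 2 for w in weeks)
    return numerator / denominator


# Hypothetical scores: two classroom observations per week for four weeks.
intrusive_scores = [[8, 6], [7, 5], [5, 5], [4, 2]]
on_task_scores = [[4, 5], [5, 6], [7, 6], [8, 8]]

intrusive_by_week = weekly_means(intrusive_scores)
on_task_by_week = weekly_means(on_task_scores)

# A negative slope on a problem scale and a positive slope for On-task
# would both point in the direction of the behavioral goals.
intrusive_trend = trend_slope(intrusive_by_week)
on_task_trend = trend_slope(on_task_by_week)
```

A slope of this kind only describes the direction of change. As the case example notes, it does not by itself show that the interventions caused the change.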
The graphs of DOF scores showed a gradual
decline in the DOF syndrome scale scores and an
increase in On-task scores over the 10-week
intervention period. At the end of the 10 weeks, Ricky's
teacher completed a new TRF and Ms. Johnson
completed a new CBCL/6-18 as additional mea-
sures of outcomes for Ricky. The CBCL/6-18 and
TRF both showed lower scores on the Social Problems,
Rule-Breaking Behavior, and Aggressive Behavior
syndrome scales than at baseline. These results,
combined with the DOF findings, suggested that the
school-based interventions were associated with
reductions in targeted problem behaviors and an
increase in on-task behavior. Although the MDT
could not attribute direct causal effects to the in-
terventions, they felt that the changes in ASEBA
scores justified continuing the Tier 2 interventions
until the end of the school year.
If the Tier 2 interventions had not been
associated with reductions in Ricky's problems, then the
MDT would have proceeded with a more compre-
hensive Tier 3 evaluation to determine whether
Ricky was eligible for special education services
under the category of emotional disturbance. The
MDT could use the existing CBCL/6-18, TRF, and
In earlier chapters, we described the 2009 DOF
and the DOF Profile. In this chapter, we
summarize the research used to develop previous versions of the
DOF and its scoring profile. We then describe our
research to develop and standardize the scales of
the 2009 DOF Profile. Readers who are not inter-
ested in the details of form and scale construction
should feel free to skim or skip this chapter.
EARLIER VERSIONS OF THE DOF
The original version of the DOF (Achenbach,
1981) included 96 problem items, plus an open-
ended item for other problems, and on-task ratings
for ten 1-minute intervals. To assemble the DOF
item set, Achenbach examined early versions of
the CBCL and TRF to find items appropriate for
rating direct observations of children's behavior
in group settings. (For brevity, we use "CBCL" here
to refer to all versions.) Whenever possible,
Achenbach retained the original wording of the
CBCL and TRF items for DOF items. Examples
are 2009 DOF items: 5. Defiant or talks back to
staff; 19. Destroys property belonging to others;
41. Physically attacks people; and 71. Unhappy,
sad, or depressed. Other CBCL/TRF items were
worded slightly differently on the DOF to make
them more appropriate for direct observations of
children's behavior. Examples are: rewording
CBCL/TRF item "8. Can't concentrate, can't pay
attention for long" to DOF item "7. Doesn't
concentrate or doesn't pay attention for long"; rewording
CBCL/TRF item "10. Can't sit still, restless, or
hyperactive" to DOF item "9. Doesn't sit still,
restless, or hyperactive"; and rewording CBCL/TRF
item "19. Demands a lot of attention" to DOF item
"17. Demands or tries to get attention of staff" (later
abbreviated to "17. Tries to get attention of staff").
Seventy-two of the 96 original DOF items had
counterparts on the CBCL and 86 had counterparts
on the TRF. Ten of the 96 original DOF items had
no direct counterpart on the CBCL and/or TRF. Ex-
amples are 2009 DOF items: 81. Easily led by
peers; 85. Tattles; and 86. Bossy (for details
of the 1981 DOF, see Achenbach & Edelbrock,
1983).
A revised edition of the DOF (Achenbach, 1986)
included the same 96 problem items, plus an open-
ended item for other problems. Some items were
reworded slightly for clarification. An example is
1981 DOF item "34. Isolates self from others," which
was reworded to "34. Physically isolates self from
others."
The 1986 DOF Profile displayed scores for six
syndrome scales derived from principal compo-
nents/varimax analyses of 212 clinically referred
5-to-14-year-old children: Withdrawn-Inattentive,
Nervous-Obsessive, Depressed, Hyperactive, At-
tention Demanding, and Aggressive. The profile
also displayed scores for Internalizing and Exter-
nalizing scales derived from second-order factor
analyses of scores on the six syndrome scales, plus
a Total Problems score (the sum of ratings on all
96 items plus other problems) and an On-task
score. The 1986 DOF problem scales were normed
on 287 children observed as controls for referred
children in regular classrooms of 45 schools in
Vermont, Nebraska, and Oregon. The 1986 DOF
computer-scoring program calculated raw scores
for the syndrome scales, plus raw scores and T
scores for Internalizing, Externalizing, and Total
Problems, averaged across observation sessions
separately for the identified child and matched con-
trols. The 1986 DOF profile also provided an On-
task score ranging from 1 to 10, averaged across
observation sessions for identified and control chil-
dren.
Constructing the DOF and DOF Profile
Chapter 6
Several studies have reported data on the reli-
ability, stability, and validity of the 1981 and 1986
versions of the DOF. Achenbach and Edelbrock
(1983) reported inter-rater reliabilities (Pearson r)
of .96 for DOF Total Problems and .71 for On-task
for 16 children observed in a residential treatment
setting by two trained research assistants. In the
same residential setting, 6-month stability Pearson
correlations for 36 children were .55 for classroom
observations, .59 for recess observations, and .51
for classroom On-task scores. Scores for each child
were averaged over six 10-minute observations at
Time 1 and Time 2. These stability coefficients for
the DOF were similar to the 6-month stability co-
efficient of .57 for CBCL ratings of the same chil-
dren by their mother or a child care worker.
For a sample of 25 public school boys referred
for special services for behavioral problems, Reed
and Edelbrock (1983) reported inter-rater
reliabilities of .91 for DOF Total Problems and .83
for On-task ratings summed across six 10-minute
observations in a 60-minute observation period.
They also reported mean inter-rater reliabilities of
.85 for Total Problems and .71 for On-task for the
same sample when rs for each of the 6 sets of 10-
minute observations were averaged across sessions.
For a sample of 62 randomly selected 6-11-year-
old boys, McConaughy, Achenbach, and Gent
(1988) reported inter-rater reliabilities of .75 for
DOF Total Problems and .88 for On-task scores
averaged across one 10-minute classroom obser-
vation and one 10-minute recess observation for
each child. As part of a school-based prevention
study, McConaughy, Kay, and Fitzgerald (1998,
1999) reported reliabilities for five pairs of trained
DOF raters. Each rater pair observed 20 randomly
selected elementary school-aged children in class-
rooms for one 10-minute period. Inter-rater
reliabilities averaged across rater pairs were .86
for DOF Total Problems and .90 for On-task.
Achenbach and Rescorla (2001) reported mean
inter-rater reliabilities of .90 for DOF Total Prob-
lems and .84 for On-task scores (after Fisher z trans-
formations), averaged across the studies of the 1981
and 1986 DOF discussed above.
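
Averaging correlations after a Fisher z transformation means converting each r to z = atanh(r), averaging the z values, and converting the mean back with tanh. The sketch below shows that calculation as a minimal illustration. The function name is hypothetical, the average is unweighted, and the four input values are On-task coefficients quoted in the studies above; which coefficients and what weighting were actually used for the reported means is not stated here, so this is not a reconstruction of the published figures.

```python
import math


def fisher_mean(correlations):
    """Average correlations via Fisher's r-to-z transformation.

    Each r is converted to z = atanh(r), the z values are averaged
    (unweighted, by assumption), and the mean z is converted back to r.
    """
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# On-task inter-rater coefficients quoted above (one per study).
on_task_rs = [0.71, 0.83, 0.88, 0.90]

fisher_average = fisher_mean(on_task_rs)
simple_average = sum(on_task_rs) / len(on_task_rs)
```

For high positive correlations such as these, the Fisher-based average comes out slightly higher than the simple arithmetic mean, because the transformation gives more weight to values near 1.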
As evidence for the validity of the DOF, Reed
and Edelbrock (1983) reported that DOF Total
Problems and On-task scores discriminated signifi-
cantly between referred and nonreferred boys ob-
served in the same classroom settings by observ-
ers blind to referral status. The convergent validity
of the DOF was supported by significant correla-
tions of .37 to .51 between DOF Total Problems
and TRF Total Problems (Achenbach & Edelbrock,
1986; Reed & Edelbrock, 1983) and significant
associations between DOF scores and CBCL pro-
file patterns (McConaughy et al., 1988).
McConaughy et al. (1998, 1999) also found that
DOF On-task, Internalizing, Nervous-Obsessive,
and Depressed scale scores significantly discrimi-
nated between outcomes for at-risk children who
received different school-based programs to pre-
vent serious emotional disturbance.
In another study, Skansgaard and Burns (1998)
added nine items to the 1986 DOF to create a priori
problem scales for Inattention, Hyperactivity/Im-
pulsivity, Oppositional Defiant Disorder/overt
Conduct Disorder (ODD/overt CD), and Slow
Cognitive Tempo (SCT). For a sample of 24 chil-
dren, inter-rater reliabilities were .97, .95, .99, and
.69 for each of these four scales, respectively, plus
1.00 for the DOF On-task score. To test the
discriminative validity of this 106-item DOF,
Skansgaard and Burns grouped the 24 children into
an ADHD-Combined subtype (ADHD-C; n = 6), an
ADHD-Inattentive subtype (ADHD-IN; n = 6), and
matched controls (n = 12), based on percentile
cutpoints on teachers' ratings of DSM-IV ADHD
symptoms. Despite the small sample sizes,
Skansgaard and Burns found that the ADHD-C and
ADHD-IN groups both scored significantly higher
on the DOF Inattention scale and lower on DOF
On-task scores than matched controls. The ADHD-
C group scored significantly higher than the
ADHD-IN group and controls on the DOF Hyper-
activity/Impulsivity and ODD/overt CD scales. The
ADHD-IN group scored significantly higher than
controls on the DOF SCT scale. Similar group
differences were reported for teachers' ratings on
comparable subsets of DSM-IV symptoms, except that
ADHD-IN scored significantly higher than both
ADHD-C and controls on SCT.
In 2003, we created an expanded version of the
1986 DOF for use in a large study of children with
behavioral and emotional problems and matched
controls (McConaughy & Achenbach, 2003). The
2003 DOF included 114 specific problem items,
plus an open-ended item for the observer to write
in any observed problems or behaviors not listed
above. We retained 95 of the 1986 DOF problem
items, plus the open-ended item for other problems.
We added 19 new items to expand the 1986 DOF
for research, as follows:
We replaced the 1986 item, "4. Behaves like
opposite sex," with a new item, "4. Cheats." We changed
the 1986 item, "75. Underactive, slow moving, lacks
energy, or yawns," by removing the word "yawns"
and created a new item, "114. Yawns." We changed
the 1986 item, "83. Fails to express self clearly,
including speech defects," by removing the words
"including speech defects" and created a new item,
"97. Speech problem (describe)." We created 16
additional items to correspond to TOF items and
problems that observers had written in for the
open-ended item in earlier research, plus items to tap
DSM-IV (American Psychiatric Association, 1994)
symptoms for ADHD that were not covered by the
original 1986 DOF items. We also slightly
reworded eleven 1986 DOF items.
We used data from samples rated on the 1986
DOF and the 2003 DOF to develop the 2009 DOF
with 89 problem items and to derive scales for the
2009 DOF Profile, as discussed in the next sec-
tions. (Appendix D shows the final 89 items of the
2009 DOF in comparison to the 97 items of the
1986 DOF.)
PSYCHOMETRIC APPROACH TO
THE 2009 DOF
Consistent with previous research, we designed
the 2009 DOF to obtain direct observational data
on children's problems and on-task behavior in
school classrooms, recess, and comparable group
settings. As part of the ASEBA, the DOF yields
scores for observed problems that can be meshed
with data from other sources, such as parent
reports, teacher reports, children's self-reports, test
session observations, and observations from child
clinical interviews. This facilitates a multiaxial
approach to assessment, as discussed in Chapter 1.
To develop the 2009 DOF and DOF Profile, we
used a psychometric approach similar to that used
for other ASEBA forms and profiles, including the
CBCL/6-18, TRF, and YSR (Achenbach &
Rescorla, 2001), SCICA (McConaughy &
Achenbach, 2001), and TOF (McConaughy &
Achenbach, 2004). We used the following proce-
dures:
1. We selected and tested a pool of items that
describe observable aspects of children's behavior,
affect, and interactions in group settings.
2. We obtained observers' ratings of problem items
for 6-11-year-old clinically referred children and
matched control children in the same settings.
3. We factor analyzed the observers' ratings to
aggregate problem items into quantitative
syndrome scales.
4. We constructed a DSM-oriented Attention
Deficit/Hyperactivity Problems scale composed of
problem items that are consistent with
DSM-IV-TR criteria for ADHD.
5. We constructed a Total Problems score
consisting of the sum of ratings on the 89 problem
items.
6. We assigned standard scores (T scores) and
percentiles for the DOF problem scales and the
On-task score that indicate how a child's scores
compare with scores for normative samples of
children.
7. We tested the problem scales and On-task score
for reliability and validity.
The next sections describe our research for steps
1 through 6. Chapter 7 reports reliability and Chap-
ter 8 reports validity for the 2009 DOF scales.
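Steps 4 and 5 both reduce to summing 0-1-2-3 ratings over a defined set of items. The sketch below illustrates that arithmetic in Python. The item numbers for the two subscales are taken from Table 6-5 later in this chapter; the function names, the dictionary layout, and the choice to count unrated items as 0 are assumptions made for illustration and are not part of the DOF scoring software.

```python
# Item numbers from Table 6-5 (2009 DOF numbering).
INATTENTION_ITEMS = (7, 16, 23, 25, 26, 27, 39, 49, 56, 59)
HYPERACTIVITY_IMPULSIVITY_ITEMS = (8, 9, 13, 21, 28, 32, 33, 43, 45, 46, 55, 65, 72)
ALL_PROBLEM_ITEMS = tuple(range(1, 90))  # the 89 problem items


def scale_score(ratings, items):
    """Sum ratings over the given items; unrated items count as 0 (assumption)."""
    for item, value in ratings.items():
        if not 0 <= value <= 3:
            raise ValueError(f"rating for item {item} is outside 0-3: {value}")
    return sum(ratings.get(item, 0) for item in items)


def score_dof(ratings):
    """Return the two subscale scores, their sum, and Total Problems."""
    inattention = scale_score(ratings, INATTENTION_ITEMS)
    hyper_imp = scale_score(ratings, HYPERACTIVITY_IMPULSIVITY_ITEMS)
    return {
        "inattention": inattention,
        "hyperactivity_impulsivity": hyper_imp,
        "adh_problems": inattention + hyper_imp,
        "total_problems": scale_score(ratings, ALL_PROBLEM_ITEMS),
    }


# Hypothetical ratings for one observation: {item number: rating}.
example = {7: 2, 9: 1, 21: 3, 4: 1}
print(score_dof(example))
```

The same function accepts ratings that have already been averaged across several DOFs, since it only requires each value to lie between 0 and 3.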
STATISTICAL DERIVATION OF DOF
SYNDROMES FOR CLASSROOM
OBSERVATIONS
Like other ASEBA forms, the DOF was devel-
oped both to document specific problems and to
identify co-occurring problems. We used various
statistical procedures to identify syndromes of
co-occurring problems. The original Greek meaning
of the word "syndrome" is "the act of running
together." Although "syndrome" is often equated with
disease, its most general meaning is "a set of
concurrent things" (Gove, 1971). Consistent with this
meaning, we performed a series of factor analyses
to identify syndromes of children's problems that
were observed in school classrooms.
Factor analysis refers to a family of statistical
methods for identifying patterns of co-occurring
items. Because different factor-analytic approaches
may produce different results, we used several ap-
proaches that included both exploratory factor
analysis (EFA) methodology (SPSS, 2007) and
confirmatory factor analysis (CFA) methodology
(Mplus; Muthén & Muthén, 2004). EFA yields
factors that summarize the associations among
problem items without testing specific models for the
factor structure, whereas CFA tests the fit of data
to particular measurement structures.
Samples for Factor Analyses
For the factor analyses, we assembled a sample
of 1,261 children ages 6-12 who were observed in
school classrooms. Of these, 486 were rated on the
1986 DOF and 775 were rated on the 2003 DOF.
The total sample included 649 children who were
clinically referred for evaluations of behavioral and
emotional problems and/or learning difficulties or
were identified as at risk for such problems (for
brevity, we label this group "referred"). The
referred samples were drawn from outpatient clin-
ics and schools in Vermont, New York, and Penn-
sylvania. The Vermont sample included children
referred to the outpatient clinic of the University
of Vermont Department of Psychiatry. Children in
the Vermont sample were drawn from urban, semi-
urban, and rural areas. The New York sample in-
cluded children referred to the outpatient clinic of
the Department of Psychiatry at SUNY Upstate
Medical University in Syracuse, New York. The
Pennsylvania sample included children referred to
the outpatient clinic of The Children's Hospital of
Philadelphia. An additional 612 children were
randomly selected control children in the same
classrooms as the referred children. The total sample of
1,261 children included 873 boys and 388 girls,
each having 2 to 4 DOFs. Each DOF covered a 10-
minute observation.
From the total sample of 1,261 children, we se-
lected two samples for EFAs. One sample included
955 children who were rated on either the 1986 or
2003 DOF. This sample included 649 referred chil-
dren plus 306 matched control children who scored
above the DOF median Total Problems score of
5.9 that we found for controls. As explained later,
this sample was used for EFAs of 41 high frequency
problem items included in both the 1986 and 2003
DOF. The second EFA sample was a subset of
the first sample of 955 children, which included
613 children who were rated only on the 2003 DOF
(335 referred and 278 controls). This sample was
used for EFAs of 57 high frequency problem items
included in the 2003 DOF. We used two different
samples for initial EFAs to determine whether the
factor structure differed for the 41-item set versus
the larger 57-item set from the 2003 DOF. The
sample for subsequent CFAs included all 1,261
children (649 referred and 612 controls). We com-
bined boys and girls in all analyses to maximize
our sample sizes. Table 6-1 summarizes the
samples used for factor analyses.
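The sample sizes quoted in this section can be cross-checked with simple addition. The short Python sketch below encodes the boys and girls counts reported in Table 6-1 and recomputes the group and sample totals; the dictionary keys and function names are hypothetical labels chosen for this illustration.

```python
# (boys, girls) counts as reported in Table 6-1.
SAMPLES = {
    "total": {"referred": (464, 185), "controls": (409, 203)},
    "efa_41_items": {"referred": (464, 185), "controls": (219, 87)},
    "efa_57_items": {"referred": (231, 104), "controls": (197, 81)},
}


def group_total(counts):
    """Number of children in one group, given its (boys, girls) counts."""
    boys, girls = counts
    return boys + girls


def sample_size(sample):
    """Referred plus control children in one sample."""
    return sum(group_total(counts) for counts in sample.values())


for name, sample in SAMPLES.items():
    groups = {group: group_total(counts) for group, counts in sample.items()}
    print(name, groups, sample_size(sample))
```

The recomputed totals can be compared with the 1,261, 955, and 613 children described in the text, and with the 486 and 775 children rated on the 1986 and 2003 forms.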
Items for Factor Analyses
To select DOF items for factor analyses, we
omitted the open-ended item for other problems
and combined two new items on the 2003 DOF
with two other items with similar wording, reduc-
ing the total item set to 112 items. Of these, 95
items were included on both the 1986 DOF and
2003 DOF. To obtain item frequencies, we aver-
aged the 0, 1, 2, and 3 ratings for each of 112 DOF
items across the 2 to 4 DOFs for each of the 1,261
children in the total sample shown in Table 6-1.
From the frequency distributions of averaged item
ratings, we identified 55 low frequency items that
were rated present (>0.00) for fewer than 5% of
referred children and fewer than 3% of referred
and control children combined. Omitting these 55
low frequency items, we retained 57 items from
the 2003 DOF that were rated present (>0.00) for
>5% of referred children and for >3% of referred
and control children combined. We then identified
41 of the 57 items from the 2003 DOF that were
also rated on the 1986 DOF. We used these two
item sets for initial EFAs, as described in the next
section.
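The item screening described above has two parts: averaging each item's ratings across a child's DOFs, and then checking what percentage of children had an averaged rating above 0.00. The following Python sketch shows one way to express the retention rule (present for more than 5% of referred children and more than 3% of referred and control children combined). The function names and the toy data are hypothetical; the actual analyses were run on the full sample of 1,261 children.

```python
from statistics import mean


def averaged_rating(dof_ratings):
    """Average one item's 0-3 ratings across a child's 2 to 4 DOFs."""
    return mean(dof_ratings)


def percent_present(averaged_ratings):
    """Percentage of children whose averaged rating is above 0.00."""
    present = sum(1 for rating in averaged_ratings if rating > 0)
    return 100 * present / len(averaged_ratings)


def is_high_frequency(referred, controls):
    """Apply the >5% (referred) and >3% (combined) retention rule."""
    pct_referred = percent_present(referred)
    pct_combined = percent_present(referred + controls)
    return pct_referred > 5 and pct_combined > 3


# Toy data for one item: each inner list holds one child's ratings.
referred = [averaged_rating(r) for r in [[0, 1, 0], [0, 0]] + [[0, 0]] * 8]
controls = [averaged_rating(r) for r in [[0, 0, 0]] * 10]
print(percent_present(referred), is_high_frequency(referred, controls))
```

In this toy example the item is present for 1 of 10 referred children and 1 of 20 children overall, so it passes both thresholds.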
Table 6-1
Samples for Factor Analyses to Derive 2009 DOF Syndromes

                                      Boys   Girls   Total
Total Sample (a)
  Referred children                    464     185     649
  Controls                             409     203     612
  Total                                873     388   1,261
Sample for EFAs of 41 high frequency items
from 1986 & 2003 DOF (b)
  Referred children                    464     185     649
  Controls                             219      87     306
  Total                                683     272     955
Sample for EFAs of 57 high frequency items
from 2003 DOF (c)
  Referred children                    231     104     335
  Controls                             197      81     278
  Total                                428     185     613
Ethnicity (d)
  Non-Latino White                                   79.4%
  African American                                   14.5%
  Latino/Hispanic                                     2.0%
  Mixed or Other                                      4.1%

(a) This sample was used for CFAs.
(b) This sample was a subset of N = 1,261.
(c) This sample was a subset of N = 955.
(d) Percents for N = 1,124 referred and control children for whom ethnicity was known.

Factor-Analytic Methods
EFAs for Deriving Factors. As a general strategy,
we performed exploratory Maximum Likelihood
(ML), Unweighted Least Squares (ULS), and
Principal Components Analyses (PCA) of Pearson
correlations among the retained DOF high
frequency items. The initial EFAs that yielded 3 to 10
factors were subjected to Varimax (orthogonal)
rotations to produce uncorrelated factors and
Oblimin (oblique) rotations to allow correlations
among factors. Using these general strategies, we
performed six separate EFAs (3 methods x 2
rotations) on the 41 high frequency items included on
the 1986 DOF and 2003 DOF, using the sample of
children rated on either of the two forms (N =
955). We then performed an additional six EFAs
on the 57 high frequency items included on the
2003 DOF, using the sample of children rated only
on the 2003 DOF (N = 613). We found similar
factor structures from the analyses of the 1986/2003
DOF 41-item set and the 2003 DOF 57-item set.
We therefore used solutions from the EFAs of the
2003 DOF 57-item set for our next analyses.
We identified five factors that were similar in
the six EFAs of the 2003 DOF 57-item set. We
retained DOF items that had loadings > .20 and p
< .01 on at least one of the five factors. The
different factor extraction and rotation methods thus
collectively contributed results that were
subsequently tested via CFA methods, as described in
the next two sections.
CFAs for Evaluating the Unidimensionality of
Factors. To test the unidimensionality of factors
derived in the foregoing analyses, we applied
single-factor Weighted Least Squares (WLS) analyses to
candidate items comprising each of the five
candidate factors. Items that loaded on multiple versions
of a particular factor were included if they met the
following criteria: (a) the item's factor loading had
to be significant at p < .01, i.e., the estimated factor
loading had to be at least 2.57 times its standard
error, and (b) the loading had to be > .20. To avoid
statistical risks associated with low frequency cells,
we applied the WLS analyses to tetrachoric
correlations between item scores dichotomized 0 vs. 1,
2, and 3. Because Mplus can take account of
dependency in a data set (i.e., more than one DOF
per subject), we were able to analyze ratings from
each DOF separately, rather than entering the mean
of the ratings on each item for all DOFs completed
for each child. The single-factor WLS analyses
were applied to 3,533 DOFs for the entire sample
of 1,261 children. The single-factor WLS analyses
identified five unidimensional factors, with
29 items loading on only one factor and 16 items
loading on more than one factor.
CFA Tests of the 5-Factor Model. To assign
items to only one factor for a final 5-factor model,
we performed a correlated 5-factor WLS analysis
of the candidate factors for the 3,533 DOFs for the
total sample of 1,261 children. From these
analyses, we identified 43 items meeting criteria (a) and
(b) above: 10 for Factor 1; 7 for Factor 2; 12 for
Factor 3; 6 for Factor 4; and 8 for Factor 5. An
additional item loaded .18 on Factor 2. Only one
of the 45 items from the single-factor solutions
failed to load significantly on any factor in the test
of the 5-factor model.
We then examined correlations between
dichotomous item scores (0 vs. 1-2-3) and latent
variables for the five factors for all items dropped from
the factors in previous analyses. We looked for
items that had correlations > .40 with at least one
factor and a difference > .10 between that
correlation and correlations with the remaining four
factors. Seven items met these criteria. To obtain a
final 5-factor solution, we then tested models with
and without these seven items for the 3,533 DOFs
for the total sample of 1,261 children.
For these tests, we used CFA methodology in an
exploratory manner, rather than seeking
confirmation of factor models. We examined
solutions for the following characteristics:
(a) proper convergence; (b) no out-of-range
parameter estimates; (c) reasonable model fit; and (d)
retention of items with factor loadings > .20 and
significant at p < .01. For the final 5-factor solution,
52 items met criteria (a) through (d): 12 for Factor
1; 10 for Factor 2; 12 for Factor 3; 8 for Factor 4;
and 10 for Factor 5. We evaluated goodness-of-fit
between the data and the models with the Root
Mean Square Error of Approximation (RMSEA;
Browne & Cudeck, 1993), which has been
recommended as the best measure of fit (Loehlin, 1998).
Table 6-2
Factor Loadings of Items on the DOF Syndrome Scales
for Classroom Observations

Loading  DOF Syndrome Scale and Item

I. Sluggish Cognitive Tempo
  .71    11. Confused or seems to be in a fog
  .53    15. Daydreams or gets lost in thoughts
  .47    27. Forgetful in activities or tasks
  .78    44. Apathetic, unmotivated, or won't try
  .55    51. Slow to respond verbally
  .32    53. Shy or timid
  .58    57. Stares blankly
  .22    60. Yawns
  .50    70. Underactive, slow moving, tired, or lacks energy
  .61    71. Unhappy, sad or depressed
II. Immature/Withdrawn
  .87    1. Acts too young for age
  .49    25. Difficulty organizing activities or tasks
  .57    26. Fails to give close attention to details
  .43    34. Physically isolates self from others
  .58    39. Loses things
  .62    49. Avoids or is reluctant to do tasks that require sustained mental effort
  .66    59. Wants to quit or does quit tasks
  .27    61. Strange behavior
  .31    75. Withdrawn, doesn't get involved with others
  .36    77. Fails to express self clearly
III. Attention Problems
  .74    7. Doesn't concentrate or doesn't pay attention for long
  .61    9. Doesn't sit still, restless, or hyperactive
  .55    13. Fidgets, including with objects
  .31    24. Eats, drinks, chews, or mouths things that are not food, excluding junk foods
  .23    42. Picks or scratches nose, skin, or other parts of body
  .62    56. Easily distracted by external stimuli
  .28    76. Sucks thumb, fingers, hand, or arm
  .52    82. Clumsy, poor motor control
IV. Intrusive
  .73    8. Difficulty waiting turn in activities or tasks
  .45    17. Tries to get attention of staff
  .63    21. Disturbs other children
  .68    32. Interrupts
  .66    33. Impulsive or acts without thinking, including calling out in class
The RMSEA for the final 5-factor solution was
.024, which was well within the range <.07 gener-
ally considered to indicate good fit.
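For readers unfamiliar with the index, the RMSEA is commonly computed from a model's chi-square, its degrees of freedom, and the sample size. The Python sketch below uses one conventional textbook form, sqrt(max(chi-square - df, 0) / (df x (N - 1))); some programs divide by N rather than N - 1, and estimators that adjust for clustered or categorical data may apply further corrections. It is therefore an illustration of the index, not a reconstruction of how the value of .024 was obtained, and the chi-square and df values in the example are hypothetical.

```python
import math

GOOD_FIT_CUTOFF = 0.07  # upper bound for good fit cited in the text


def rmsea(chi_square, df, n):
    """Conventional RMSEA point estimate (textbook form, using N - 1)."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def indicates_good_fit(value, cutoff=GOOD_FIT_CUTOFF):
    """True when the RMSEA falls below the cutoff."""
    return value < cutoff


# Hypothetical model: chi-square = 100 on 50 df with N = 101.
example = rmsea(chi_square=100.0, df=50, n=101)
print(round(example, 3), indicates_good_fit(example))
print(indicates_good_fit(0.024))  # the value reported for the 5-factor solution
```

The max(..., 0) term keeps the estimate at zero whenever the chi-square is smaller than its degrees of freedom.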
Results of Factor Analyses
Table 6-2 shows the five DOF syndrome scales
with the factor loadings for each item derived from
the final 5-factor solution. The names of the syn-
drome scales reflect the content of the items com-
prising each factor. We chose names that were con-
sistent with current literature and with the names
of scales derived from similar factor analyses of
the 1986 DOF and other ASEBA forms. The syn-
dromes are numbered in Table 6-2 according to
the order in which they appear on the DOF Pro-
file. The order of the syndrome scales was deter-
mined by subsequent second-order factor analyses
described in a later section.
Table 6-2 (cont.)

Loading  DOF Syndrome Scale and Item

IV. Intrusive (cont.)
  .41    45. Responds before instructions are completed
  .79    46. Disrupts group activities
  .73    55. Demands must be met immediately, easily frustrated
  .50    65. Talks too much
  .65    72. Unusually loud
  .76    78. Impatient
  .51    81. Easily led by peers
V. Oppositional
  .51    2. Makes odd noises
  .68    3. Argues
  .75    5. Defiant or talks back to staff
  .67    16. Difficulty following directions
  .76    20. Disobedient
  .87    22. Doesn't seem to feel guilty after misbehaving
  .60    23. Doesn't seem to listen to what is being said
  .50    43. Runs about or climbs excessively
  .40    52. Shows off, clowns, or acts silly
  .58    74. Whining tone of voice
  .54    83. Doesn't get along with peers
  .71    87. Complains

Note. N = 1,261 with 3,533 DOFs for the final Weighted Least Squares factor analyses of tetrachoric
correlations of dichotomous item scores; referred children, n = 649; matched controls, n = 612. Values in
bold show the three highest loadings for each factor.

The DOF Sluggish Cognitive Tempo syndrome
includes 10 items describing confusion, lack of
motivation, and underactivity. Items with the highest
factor loadings were: 11. Confused or seems to be
in a fog; 44. Apathetic, unmotivated, or won't try;
and 71. Unhappy, sad, or depressed. Five items
(11, 15, 44, 57, and 75) were consistent with the
2007 Sluggish Cognitive Tempo scales created for
the CBCL/6-18 and TRF (Achenbach & Rescorla,
2007). Interestingly, symptoms of sluggish
cognitive tempo were tested for possible inclusion in
the DSM-IV criteria for ADHD, but were not
included in the final criteria (Frick et al., 1994). As
described in an earlier section, a prior research
study using an expanded version of the 1986 DOF
showed a significant association between a DOF
scale labeled Slow Cognitive Tempo and the
Inattentive type of ADHD (Skansgaard & Burns,
1998).
The DOF Immature/Withdrawn syndrome
includes 10 items describing problems of
immaturity and disorganization, along with withdrawn
behavior. Items with the highest factor loadings were:
1. Acts too young for age; 49. Avoids or is
reluctant to do tasks that require sustained mental
effort; and 59. Wants to quit or does quit tasks. Five
items (25, 26, 39, 49, and 59) were consistent with
items comprising a DOF DSM-oriented
Inattention scale described in a later section. Item
75. Withdrawn, doesn't get involved with others was
consistent with similar items on the CBCL/6-18, TRF,
and TOF Withdrawn/Depressed syndrome, as well
as the 1986 DOF Withdrawn-Inattentive syndrome.
The DOF Attention Problems syndrome
includes eight items describing difficulty with
attention and restlessness. Items with the highest factor
loadings were: 7. Doesn't concentrate or doesn't
pay attention for long; 9. Doesn't sit still, restless,
or hyperactive; and 13. Fidgets, including with
objects. Four items (7, 9, 13, and 56) were consistent
with items comprising the 1986 DOF Hyperactive
scale and the TOF and TRF Attention Problems
scales.
The DOF Oppositional syndrome was the most
robust factor to emerge from our analyses. It
includes 12 items that reflect oppositional or
uncooperative behavior. The highest loading items were:
5. Defiant or talks back to staff; 20. Disobedient;
and 22. Doesn't seem to feel guilty after
misbehaving. Five items (3, 5, 20, 52, and 105) were
consistent with items on the TOF Oppositional syndrome.
The Oppositional syndrome also contains items
with counterparts on the CBCL/6-18 and TRF
Aggressive Behavior and SCICA Self-Control
Problems syndromes. Examples are: 3. Argues;
5. Defiant or talks back to staff; and 20. Disobedient.
The Oppositional syndrome does not include
problems reflecting physical aggression, such as
fighting, which are unlikely to be observed in
classroom settings.
LOW FREQUENCY ITEMS RETAINED
ON THE DOF
To finalize the DOF, we examined the frequency
distributions of averaged item ratings for the
sample of 649 referred children who were rated on
the 1986 DOF or 2003 DOF in their classrooms,
plus 232 of the referred children who were rated
on the 1986 DOF during recess. We retained items
that were scored present (>0.00) for >2% of the
classroom and recess samples. We retained 23
items from classroom observations and an
additional 7 items from recess observations. These 30
items, plus 6 additional items that were not on the
five DOF syndromes, and open-ended item
"89. Other problems not listed above," were grouped
together as Other Problems, as shown in Table 6-3.
The 37 Other Problems, plus the 52 items on the
DOF syndromes, are included in the final 2009
version of the DOF, which thus has 88 specific
problem items, plus one open-ended item. The
0-1-2-3 ratings on all 89 items are summed to
compute the DOF Total Problems score, as explained
in a later section.

Table 6-3
DOF Other Problems Item Set
DOF Items
4. Cheats
6. Brags, boasts
10. Clings to adults or too dependent
12. Cries
14. Cruel, bullies, or mean to others
18. Destroys own things
19. Destroys property belonging to others
28. Out of seat
29. Gets hurt, accident prone
30. Gets in physical fights
31. Gets teased
35. Lies
36. Bites fingernails
37. Nervous, highstrung, or tense
38. Nervous movements, twitching, tics, or other unusual movements (describe):
40. Too fearful or anxious
41. Physically attacks people
47. Screams
48. Secretive, keeps things to self, including refusal to show things to teacher
50. Self-conscious or easily embarrassed
54. Explosive or unpredictable behavior
58. Speech problem (describe):
62. Stubborn, sullen, or irritable
63. Sulks
64. Swears or uses obscene language
66. Teases
67. Temper tantrums, hot temper, or seems angry
68. Threatens people
69. Too concerned with neatness or cleanliness
73. Overly anxious to please
79. Tattles
80. Repeats behavior over & over; compulsions (describe):
84. Runs out of class (or similar setting)
85. Behaves irresponsibly (describe):
86. Bossy
88. Afraid to make mistakes
89. Other problems not listed above

AGGRESSIVE BEHAVIOR SYNDROME
FOR RECESS OBSERVATIONS
As explained in previous sections, five DOF
syndrome scales were derived from observations
of children in classroom settings. Because
activities in classrooms are often teacher-directed and
structured around a curriculum, children may be
less likely to exhibit certain types of problem
behaviors in the classroom than in less structured
settings, such as recess. Examples are getting into
fights, teasing, and being teased. To determine
whether there were any syndromes for recess
observations that were not identified in classroom
observations, we performed additional factor
analyses of 35 items from the Other Problems shown
in Table 6-3. We excluded item "28. Out of seat,"
which was not on the 1986 DOF and would not
have been relevant for recess observations. We also
excluded open-ended item 89. The factor analyses
were performed on the sample of 232 clinically
referred children ages 6-11 whose recess
observations were rated on the 1986 DOF, plus 248 matched
control children. (Of the 232 referred children, 124
had two matched control children in the same
recess setting.) Each child was rated on the DOF for
two 10-minute observations during recess,
alternating between control and referred children.
From the total sample of 480 children observed
at recess, we obtained frequency distributions of
averaged 0, 1, 2, 3 item ratings for each of the 35
items. From the frequency distributions of averaged
item ratings, we identified 12 DOF items that were
scored present (>0.00) for >5% of referred chil-
dren and >3% of control children. We then applied
single-factor ULS analyses to the 12 candidate
items to test the unidimensionality of a single-fac-
tor solution. Consistent with criteria for the five
syndromes for classroom observations, we retained
items with (a) factor loadings significant at p < .01,
i.e., the estimated factor loading had to be at least
2.57 times its standard error, and (b) loadings
> .20. Table 6-4 shows the factor loadings for the
nine items that met these criteria for a recess
observation scale, which we labeled Aggressive
Behavior. As expected, the Aggressive Behavior
syndrome included problems with physical aggression
as well as other social problems, such as teasing
and being teased. The three highest loading items
were: 30. Gets in physical fights; 41. Physically
attacks people; and 14. Cruel, bullies, or mean to
others. Most of the items comprising the DOF
Aggressive Behavior syndrome for recess obser-
vations have counterparts on the CBCL/6-18 and
TRF Aggressive Behavior syndromes (Achenbach
& Rescorla, 2001).
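The two retention criteria used for both the classroom and recess syndromes can be expressed as a simple check on each item's estimated loading and standard error: the loading divided by its standard error must reach 2.57 (the two-tailed critical value for p < .01), and the loading must exceed .20. The Python sketch below illustrates this under stated assumptions. The function name, the item labels, and all loadings and standard errors in the example are hypothetical, because standard errors for the DOF items are not reported in this chapter.

```python
CRITICAL_RATIO = 2.57  # loading / SE needed for p < .01 (two-tailed)
MIN_LOADING = 0.20


def meets_retention_criteria(loading, standard_error):
    """Apply criteria (a) significance at p < .01 and (b) loading > .20."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    significant = loading / standard_error >= CRITICAL_RATIO
    return significant and loading > MIN_LOADING


# Hypothetical candidate items: (loading, standard error).
candidates = {
    "item A": (0.52, 0.08),  # large and significant -> retained
    "item B": (0.23, 0.10),  # above .20 but ratio 2.3 -> dropped
    "item C": (0.15, 0.03),  # significant but below .20 -> dropped
}
retained = [name for name, (loading, se) in candidates.items()
            if meets_retention_criteria(loading, se)]
print(retained)
```

Both conditions are needed: an item with a small standard error can be statistically significant yet load too weakly, and an item with a moderate loading can fail the significance test in a small sample.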
Table 6-4
Factor Loadings of Items on the DOF Aggressive Behavior
Syndrome Scale for Recess Observations

Loading  DOF Item
  .40    14. Cruel, bullies, or mean to others
  .52    30. Gets in physical fights
  .21    31. Gets teased
  .45    41. Physically attacks people
  .23    47. Screams
  .34    63. Sulks
  .24    66. Teases
  .33    79. Tattles
  .23    86. Bossy

Note. N = 480 for Unweighted Least Squares single-factor analyses of averaged item scores; referred
children, n = 232; matched controls, n = 248. Values in bold show the three highest factor loadings.

DSM-ORIENTED ATTENTION DEFICIT/
HYPERACTIVITY PROBLEMS AND
INATTENTION AND HYPERACTIVITY-
IMPULSIVITY SUBSCALES
To aid practitioners and researchers in
diagnostic assessments, Achenbach and Rescorla (2001)
constructed DSM-oriented scales comprising
CBCL/6-18, TRF, and YSR items that mental
health experts judged to be very consistent with
DSM-IV (American Psychiatric Association, 1994)
diagnostic categories. To do this, they asked the
experts to rate items from all three ASEBA forms
as "very consistent," "somewhat consistent," or "not
consistent" with descriptive criteria for several
DSM-IV diagnostic categories. The raters were 22
highly experienced child psychiatrists and
psychologists from 16 cultures. All the raters had
published research on children's behavioral and
emotional problems. Raters were given the DSM-IV
criteria for guidance, but one-to-one matching of
DSM-IV criteria to ASEBA items was not
necessary to justify ratings of "very consistent." Some
ASEBA items could thus be judged as very
consistent with the experts' concepts of particular
DSM-IV categories, even if the DSM-IV criteria
did not include precise counterparts of the ASEBA
items. ASEBA items that were rated as very
consistent with the DSM-IV categories by at least 14
of the 22 raters were grouped into six
DSM-oriented scales: Affective Problems, Anxiety
Problems, Somatic Problems, Attention Deficit/
Hyperactivity Problems, Oppositional Defiant Problems,
and Conduct Problems (for details, see Achenbach
& Rescorla, 2001).
To create the DOF DSM-oriented Attention
Deficit/Hyperactivity Problems scale, we selected
DOF items that were comparable to the CBCL/6-18
and TRF items that the experts rated as very
consistent with the DSM-IV diagnosis of ADHD.
We identified 12 DOF items that were similar to
CBCL/6-18 and TRF items.
As indicated earlier, to develop the 2003 DOF,
we also wrote new items to tap DSM-IV
symptoms of ADHD that were not already covered by
other items: 8. Difficulty waiting turn in activities
or tasks; 23. Doesn't seem to listen to what is
being said; 25. Difficulty organizing activities or
tasks; 26. Fails to give close attention to details;
27. Forgetful in activities or tasks; 39. Loses things;
43. Runs about or climbs excessively; and 49.
Avoids or is reluctant to do tasks that require
sustained mental effort. Three other DOF items were
also consistent with DSM-IV ADHD symptoms:
28. Out of seat; 45. Responds before instructions
are completed; and 55. Demands must be met
immediately, easily frustrated. To cover all possible
DSM-IV ADHD symptoms, we added the above
11 items to the 12 items that the experts judged to
be very consistent with DSM-IV ADHD symptoms.
We then assigned the 23 items to Inattention
and Hyperactivity-Impulsivity subscales, as shown
in Table 6-5. Of these 23 items, 21 were similar to
items on the TOF Attention Deficit/Hyperactivity
Problems scale and its Inattention and
Hyperactivity-Impulsivity subscales (McConaughy &
Achenbach, 2004). Items in italic are similar to
items identified by experts for the CBCL/6-18 and
TRF Attention Deficit/Hyperactivity Problems
scales, while non-italicized items are the additional
DOF items consistent with DSM-IV symptoms.
The Attention Deficit/Hyperactivity Problems
Total score is the sum of the 0, 1, 2, and 3 ratings for
all 23 items.
NORMATIVE SAMPLE
For classroom observations, the DOF
normative sample included 661 children ages 6-11, as
shown in Table 6-6. These were randomly selected
children in general education classrooms in four
states: Arizona (n = 65), New York (n = 146),
Pennsylvania (n = 172), and Vermont (n = 278). The
DOF normative sample for recess observations
included 244 Vermont children ages 6-11, who
were a subsample of the normative sample for
classroom observations. Each child in the
normative samples was observed and rated on the DOF
for two to four 10-minute periods for classroom
observations and two 10-minute observations for
recess observations. The 0-1-2-3 ratings on each
of the DOF items were averaged across the 2 to 4
DOFs for each child. The averaged item scores
were then summed to obtain total raw scores for
each of the relevant DOF scales for classroom
observations and recess observations.
To test age and gender differences in the
normative sample, we performed a 2 (ages 6-8 vs. ages
9-11) x 2 (boys vs. girls) MANOVA on raw scale
scores for the five DOF syndromes, followed by
univariate 2 x 2 ANOVAs on scores for each syn-
drome scale. We performed a similar 2 x 2
MANOVA, followed by univariate ANOVAs, on
the DOF Inattention and Hyperactivity-Impulsiv-
ity scales, and 2 x 2 univariate ANOVAs on the
Attention Deficit/Hyperactivity Problems scale,
Total Problems-Classroom, Aggressive Behavior,
Total Problems-Recess, and On-task scores. As
shown in Table 6-7, boys scored significantly
higher than girls on 6 of 10 DOF scales for class-
room observations. There were no significant gen-
der differences for recess observations. Significant
age effects were found only on the Immature/With-
drawn syndrome, on which children ages 6-8
scored significantly higher (Mean = .23, SD = .58)
Table 6-5
Items Comprising the DOF DSM-Oriented Attention Deficit/Hyperactivity Problems Scale
and Inattention and Hyperactivity-Impulsivity Subscales
Inattention Subscale
7. Doesn't concentrate or pay attention for long
16. Difficulty following directions
23. Doesn't seem to listen to what is being said
25. Difficulty organizing activities or tasks
26. Fails to give close attention to details
27. Forgetful in activities or tasks
39. Loses things
49. Avoids or is reluctant to do tasks that require sustained mental effort
56. Easily distracted by external stimuli
59. Wants to quit or does quit tasks
Hyperactivity-Impulsivity Subscale
8. Difficulty waiting turn in activities or tasks
9. Doesn't sit still, restless, or hyperactive
13. Fidgets, including with objects
21. Disturbs others
28. Out of seat
32. Interrupts
33. Impulsive or acts without thinking, including calling out in class
43. Runs about or climbs excessively
45. Responds before instructions are completed
46. Disrupts group activities
55. Demands must be met immediately, easily frustrated
65. Talks too much
72. Unusually loud
Note. Items in italics have counterparts on the CBCL/6-18 and TRF Attention Deficit/Hyperactivity
Problems scales. All but two DOF items (21 and 46) have counterparts on the TOF Attention Deficit/
Hyperactivity Problems scale. The Attention Deficit/Hyperactivity Problems scale score is the sum of
0-1-2-3 ratings on the Inattention and Hyperactivity-Impulsivity subscales.
than children ages 9-11 (Mean = .12, SD = .40), p
= .005, Eta² = .012. We constructed norms separately
for boys and girls in each setting, as described
in the next section.
ASSIGNING NORMALIZED T SCORES
TO RAW SCORES
The sums of the averaged 1, 2, and 3 ratings on
the items of the DOF problem scales provide con-
tinuous distributions of scores that indicate the
degree to which problems are reported for each
child on each scale. The DOF On-task score also
provides continuous raw scores ranging from 0 to
10 in 0.5 increments. These raw scale scores are
especially useful for statistical analyses, because
they reflect all the variation that is possible on
each scale. To help users see how an individual
Table 6-6
Characteristics of Normative Samples for the DOF

                          Boys    Girls    Total
Classroom Observations
  Age 6                     79       45      124
  Age 7                     92       44      136
  Age 8                     68       59      127
  Age 9                     73       43      116
  Age 10                    56       26       82
  Age 11                    35       41       76
  Total                    403      258      661
Recess Observations
  Age 6                     32        6       38
  Age 7                     26       18       44
  Age 8                     34       18       52
  Age 9                     32       18       50
  Age 10                    28        8       36
  Age 11                    18        6       24
  Total                    170       74      244

Ethnicity^a
  Non-Latino White        63.8%
  African American        20.1%
  Native American          8.9%
  Latino/Hispanic          3.5%
  Asian                    2.1%
  Mixed or Other           0.3%
  Unknown                  1.2%
^a Percentages of total N = 661 for classroom observations. (Recess observations were obtained on a
subsample of children used for classroom observations.)
child's scores on each scale compare with scores
from the normative sample, we assigned normalized
T scores to the total raw scores for each DOF
scale. The T scores are standard scores that compare
the child's standing on a scale with the distribution
of scores obtained by the normative
sample of children of the same gender for the same
setting (classroom or recess). This enables users
to see whether a child's scale scores are high or
low compared to normal peers. The T scores also
enable users to compare a child's standing on each
scale with the child's standing on the other scales.
Assigning T scores to the DOF Problem Scales
We assigned normalized T scores to the total
raw scores of each DOF problem scale according
to the percentiles found for the raw score distributions
in each normative sample. For each DOF
scale, we computed midpoint percentiles for each
total raw score according to procedures specified
by Crocker and Algina (1986, p. 439). According
to this procedure, a raw score that occupies a particular
percentile of the cumulative frequency distribution
is assumed to also occupy all the next
lower percentiles down to the percentile occupied
by the next lower raw score. To represent the range
Table 6-7
Means and Standard Deviations of DOF Raw Scale Scores for the Normative Samples

                                            Boys              Girls
DOF Scales                                  Mean     SD       Mean     SD       Eta²
Classroom Observations
Empirically Based Scales
  Sluggish Cognitive Tempo                  .91^a    1.27     .68      .91      .010
  Immature/Withdrawn                        .21      .56      .13      .44      ns
  Attention Problems                        4.45^a   3.10     3.90     2.66     .007
  Intrusive                                 1.07     1.54     .99      1.43     ns
  Oppositional                              .97^a    1.57     .67      1.31     .009
  Total Problems-Classroom                  8.79^a   5.95     7.36     5.09     .014
DSM-Oriented Scales
  Attention Deficit/Hyperactivity Problems  5.41^a   3.95     4.70     3.52     .007^b
  Inattention subscale                      1.76     1.91     1.53     1.70     ns
  Hyperactivity-Impulsivity subscale        3.66^a   2.63     3.17     3.39     .008
  On-task                                   8.64     1.57     8.86     1.57     ns
Recess Observations
  Aggressive Behavior                       .48      .75      .49      .75      ns
  Total Problems-Recess                     1.56     2.07     1.56     2.15     ns

Note. N = 661 for classroom observations; N = 244 for recess observations.
^a Boys > girls, p <.05.
^b Not significant when corrected for the number of comparisons (Sakoda, Cohen & Beall, 1954).
of percentiles occupied by a raw score, the raw
score is assigned to the midpoint of the percentiles
that it occupies.
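The midpoint percentile idea can be sketched in a few lines of Python. This is an illustration only, not the DOF scoring program: the function name `midpoint_percentiles` and the sample scores are hypothetical, and the formula used (percentage of scores below a raw score plus half the percentage at that score) is the usual textbook form of a midpoint percentile, assumed here to correspond to the cited procedure.

```python
from collections import Counter

def midpoint_percentiles(raw_scores):
    """Map each distinct raw score to its midpoint percentile.

    Assumption: midpoint percentile =
    100 * (number of scores below + half the number at the score) / N.
    """
    n = len(raw_scores)
    counts = Counter(raw_scores)
    below = 0
    result = {}
    for score in sorted(counts):
        at = counts[score]
        result[score] = 100.0 * (below + 0.5 * at) / n
        below += at
    return result

# Hypothetical sample of 10 raw scale scores (not DOF data).
sample = [0, 0, 0, 0, 1, 1, 2, 2, 3, 6]
percentiles = midpoint_percentiles(sample)
```

In this made-up sample, the four scores of 0 occupy the bottom 40% of the distribution, so a raw score of 0 is placed at the middle of that span.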
For example, on the DOF Attention Problems
syndrome for classroom observations, we found
that a raw score of 6 spanned the 71st through 75th
percentiles for the normative sample of 6-11-year-old
boys. We therefore assigned the score of 6 to
the 73rd percentile, which is midway between the
71st and 75th percentiles. According to the procedure
for assigning normalized T scores to raw
scores (Abramowitz & Stegun, 1968), the 73rd percentile
score should get a T score of 56. To provide
a common metric for the five DOF syndrome
scales for classroom observations, we assigned nor-
malized T scores from 50 through 70 according to
the midpoint percentile procedures.
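The worked example above (percentiles 71 through 75, midpoint 73, T = 56) can be reproduced with the sketch below. It assumes that a normalized T score is 50 plus 10 times the standard normal deviate for the percentile, rounded to the nearest whole number; Python's standard-library inverse normal CDF stands in for the tabled procedure cited in the text, and the function names are hypothetical.

```python
from statistics import NormalDist

def midpoint(lowest_percentile, highest_percentile):
    """Midpoint of the span of percentiles occupied by one raw score."""
    return (lowest_percentile + highest_percentile) / 2

def normalized_t(percentile):
    """Convert a percentile (strictly between 0 and 100) to a normalized T score.

    Assumption: T = 50 + 10 * z, rounded to the nearest integer, where z is
    the standard normal deviate for the percentile.
    """
    z = NormalDist().inv_cdf(percentile / 100.0)
    return round(50 + 10 * z)

mid = midpoint(71, 75)        # raw score of 6 spans the 71st-75th percentiles
t_score = normalized_t(mid)   # T score for the 73rd percentile
```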
We followed the same midpoint percentile pro-
cedure for assigning normalized T scores from 50
to 70 to all the DOF problem scales: the five syn-
dromes, DSM-oriented Attention Deficit/Hyperac-
tivity Problems scale, Inattention and Hyperactiv-
ity-Impulsivity subscales, and Total Problems for
classroom observations, as well as the Aggressive
Behavior syndrome and Total Problems for recess
observations. Procedures for truncating lower T
scores at 50 and for assigning T scores >70 are
described below. Procedures for assigning lower
and higher T scores to DOF Total Problems scores
for classroom observations are described in a sepa-
rate section.
Truncation of Lower T Scores at 50. The raw
scores of the DOF problem scales were all posi-
tively skewed in the normative sample, with large
proportions of children having scale scores of 0.
That is, more children in the normative sample re-
ceived very low than very high DOF problem
scores. Furthermore, because high scores are clini-
cally significant on problem scales, it is more im-
portant for the scales to make finer discriminations
among high scores than among low scores that are
at the bottom of the normal range.
If we based T scores directly on midpoint percentiles,
the lowest T score for the Attention Problems
syndrome for boys 6-11 would be 32, reflecting
the 4th midpoint percentile for boys who obtained
a score of 0. By contrast, the lowest T score
for the Oppositional syndrome would be 43, reflecting
the 22nd midpoint percentile for boys 6-11
who obtained a score of 0 on this syndrome scale.
If these T scores were displayed on a profile for a
boy whose score was 0 on both syndrome scales,
the T score of 43 might suggest that the boy had
more problems on the Oppositional syndrome
scale than on the Attention Problems syndrome
scale, where the boy's T score would be 32. This
difference in T scores would mask the fact that the
boy really had no problems on either syndrome
scale.
To avoid misleading impressions like those de-
scribed above, we truncated the assignment of T
scores, as recommended by Petersen, Kolen, and
Hoover (1993), and as done for other ASEBA forms
(Achenbach & Rescorla, 2000, 2001). To equalize
the starting points for the five syndrome scales for
classroom observations and the DSM-oriented At-
tention Deficit/Hyperactivity Problems scale and
Inattention and Hyperactivity-Impulsivity
subscales, we assigned a T score of 50 to raw scores
that fell at approximately the 50th percentile and
lower.
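The effect of truncation on the example above can be shown with a short sketch. The function name `truncate_low_t` is hypothetical; the untruncated values of 32 and 43 are the ones given in the text for a boy with raw scores of 0 on both syndromes.

```python
def truncate_low_t(t_score, floor=50):
    """Return the T score, raised to the floor if it falls below it."""
    return max(t_score, floor)

# Untruncated T scores for raw scores of 0, as given in the text.
untruncated = {"Attention Problems": 32, "Oppositional": 43}
profile = {scale: truncate_low_t(t) for scale, t in untruncated.items()}
```

After truncation both scales show T = 50, so the profile no longer suggests a difference between two scales on which no problems were observed. Scores above the floor pass through unchanged.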
We also truncated T scores at 50 for the Ag-
gressive Behavior syndrome scale and Total Prob-
lems for recess observations. That is, we assigned
a T score of 50 to raw scores of 0 and then based
normalized T scores on midpoint percentiles for
Aggressive Behavior and Total Problems-Recess
up to the 98th percentile (T = 70).
Assignment of a T score of 50 to several raw
scale scores prevents users from overinterpreting
small differences among scores that are well within
the normal range. It also reduces differentiation
among low scores. However, loss of such differ-
entiation is of little practical importance, because
it involves differences that are all at the low end of
the normal range. If users nevertheless wish to pre-
serve differences at the low end of the normal
range, they can focus on the total raw scale scores.
For statistical analyses that do not involve com-
bining data across genders, raw scale scores are
usually preferable, because they directly reflect all
differences among scores without the effects of
truncation or other transformations.
Assigning T Scores Above 70 (>98th Percentile).
Most children in the normative samples obtained
scores that were well below the maximum
possible. It was therefore impossible to base the
highest T scores on percentiles, because the highest
possible scores were spread over a tiny percentage
of children in the normative sample. Because
there were hardly any children in the normative
samples on whom to base T scores above the
98th percentile (T >70), we assigned T scores from
71 to 100 in as many increments as there were remaining
raw scores on each scale.
As an example, on the DOF Attention Problems
syndrome scale, the raw score of 11 (occupying
the 98th percentile) was assigned a T score of 70
for boys 6-11. Because there are eight items on the
scale, the maximum possible score is 24 (i.e., if a
boy received a rating of 3 on all eight items, the
boy's raw scale score would be 24). There are 30
intervals from 71 to 100, but 26 possible raw scores
from 11.5 through 24. (Because of averaging, DOF
raw scores include scores rounded to .5). To as-
sign T scores to the 26 possible raw scores, we
divided 30 by 26. Because 30/26 = 1.15, we as-
signed T scores to raw scores in intervals of 1.15.
Thus, a raw score of 11.5 was assigned a T score
of 70 + 1.15 = 71.15, rounded off to 71. A raw
score of 12 was assigned a T score of 71.15 + 1.15
= 72.30, rounded off to 72, and so on. The highest
possible raw score of 24 on Attention Problems
was assigned a T score of 100. By comparison, on
the Oppositional syndrome, a raw score of 5.5 (occupying
the 98th percentile) was assigned a T score
of 70 for boys 6-11. The number of items on the
Oppositional syndrome is 12. Therefore, the high-
est possible score on the Oppositional syndrome
is 36, which was assigned a T score of 100.
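The interval arithmetic in this example can be sketched as follows. The function name `t_scores_above_70` is hypothetical, and the sketch uses the unrounded interval (30 divided by the number of remaining raw scores) rather than the rounded value of 1.15 quoted in the text, so individual T scores in the middle of the range could differ by a point from a table built with the rounded interval.

```python
def t_scores_above_70(raw_at_70, max_raw, step=0.5):
    """Spread T scores 71-100 evenly over the raw scores above the T = 70 score.

    raw_at_70: raw score assigned T = 70 (98th percentile)
    max_raw:   highest possible raw score on the scale
    step:      spacing of possible raw scores (.5 because of averaging)
    """
    n_remaining = round((max_raw - raw_at_70) / step)
    interval = 30 / n_remaining   # assumption: unrounded interval
    table = {}
    for k in range(1, n_remaining + 1):
        raw = raw_at_70 + k * step
        table[raw] = round(70 + k * interval)
    return table

# Attention Problems, boys 6-11: T = 70 at a raw score of 11, maximum 24.
attention = t_scores_above_70(raw_at_70=11, max_raw=24)
# Oppositional, boys 6-11: T = 70 at a raw score of 5.5, maximum 36.
oppositional = t_scores_above_70(raw_at_70=5.5, max_raw=36)
```

For Attention Problems this yields 26 entries, with 11.5 mapped to 71, 12 mapped to 72, and 24 mapped to 100, as in the text.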
We followed the same procedure for assigning
T scores above 70 to the DSM-oriented Attention
Deficit/Hyperactivity Problems scale, the Inatten-
tion and Hyperactivity-Impulsivity subscales, and
the Aggressive Behavior syndrome for recess ob-
servations. Our procedures for assigning T scores
to Total Problems are described below.
Assigning T scores to Total Problems
The DOF Total Problems score consists of the
sum of the 1, 2, and 3 ratings on all the specific
problem items of the DOF, plus the highest rating
(1, 2, or 3) for any problems written by the ob-
server in the spaces for the open-ended item 89.
Item 89 provides two spaces for adding problems
that are not listed elsewhere. However, only the
highest rating for added items is included in order
to limit the effects of idiosyncratic problems on
the Total Problems score. Separate Total Problems
scores are computed for classroom observations
and recess observations. There are gender-specific
norms for classroom and recess. To provide norm-
referenced scores for Total Problems, we computed
the scores obtained by each gender within each
setting. We then computed midpoint percentiles ac-
cording to the procedure described earlier for the
other DOF problem scales. We assigned T scores
to midpoint percentiles for Total Problems raw
scores, as described below.
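The rule for item 89 can be made concrete with a small sketch for a single DOF. The function name `total_problems` and the example ratings are hypothetical; only the scoring rule (sum of the ratings on the specific problem items plus the single highest rating among any added problems) comes from the text.

```python
def total_problems(item_ratings, open_ended_ratings=()):
    """Total Problems raw score for one DOF.

    item_ratings:       0-3 ratings for the specific problem items
    open_ended_ratings: 0-3 ratings for problems written in under item 89;
                        only the highest of these is counted
    """
    extra = max(open_ended_ratings, default=0)
    return sum(item_ratings) + extra

ratings = [0] * 88
ratings[6] = 2    # hypothetical rating of 2 on item 7
ratings[27] = 1   # hypothetical rating of 1 on item 28
score = total_problems(ratings, open_ended_ratings=[1, 3])  # 2 + 1 + 3
```

Because only the highest added rating counts, writing in two extra problems rated 1 and 3 contributes 3 points, not 4.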
No Truncation of Lower T Scores for Total
Problems-Classroom. There are more items on the
Total Problems scale than on any other scale, and
at least some of the items are endorsed for most
children. For classroom observations, relatively
few children in the normative samples obtained ex-
tremely low scores for Total Problems. It was there-
fore unnecessary to truncate Total Problems T
scores at 50 for classroom observations as we did
for other DOF problem scales. For Total Problems-Classroom,
the lowest raw score of 0 for boys and
for girls was assigned a T score of 33 (2nd percentile).
We then based normalized T scores directly
on midpoint percentiles for scores obtained by the
normative samples, up to the 98th percentile (T =
70).
For consistency in displaying scores on the DOF
Profile, the DOF computer-scoring program does
not print Total Problems-Classroom T scores be-
low 50. However, users can obtain these lower T
scores from the computer-scored data sets.
Truncation of Lower T Scores for Total Prob-
lems-Recess. For recess observations, 32% of boys
and 36% of girls in the normative samples obtained
raw scores of 0 for Total Problems. To take this
positive skew into account, we truncated T scores
at 50 for Total Problems-Recess, as done for other
DOF problem scales, as explained earlier.
Assigning T Scores Above 70 (>98th Percentile)
for Total Problems. No children in the normative
or referred samples obtained DOF Total
Problems scores close to the maximum scores pos-
sible. If we followed the same procedure as for the
other problem scales, we would have compressed
the Total Problems scores actually obtained into a
narrow range of T scores. We would also have as-
signed a relatively broad range of T scores to raw
scores obtained by few or no children. To enable
the upper Total Problems-Classroom and Total
Problems-Recess T scores to reflect differences
among the raw scores that are most likely to occur,
we did the following: (a) we identified the five
highest scores obtained by boys and girls in the
normative and referred samples combined, sepa-
rately for classroom and recess; (b) we computed
the mean of the five highest scores for each gender
in each setting; (c) we assigned a T score of 89 to
the mean of the five highest raw scores for each
gender in each setting; (d) we then assigned T
scores 90 through 100 in equal intervals to the raw
scores that were above those that had been assigned
T = 89. We followed these procedures for Total
Problems-Classroom T scores >70 and Total Prob-
lems-Recess T scores >70.
Assigning T Scores to DOF On-Task
DOF On-task is only scored for classroom ob-
servations. To score On-task, an observer records
whether the child is on-task or off-task in the
last 5 seconds of each 1-minute interval for each
10-minute observation period. On-task is determined
by the predominant activity sampling
method (i.e., the child must be doing what is expected
for more than one half of the 5-second interval).
The on-task intervals are then
summed for each 10-minute observation period and
are averaged by the DOF computer-scoring pro-
gram across multiple observations. The averaged
On-task raw score can thus range from 0 to 10, in
increments of 0.5.
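The On-task computation described above can be sketched as follows. The function names and example data are hypothetical. Rounding the average to the nearest 0.5 is an assumption made here to match the stated increments of 0.5; the text does not spell out how the scoring program rounds.

```python
import math

def predominantly_on_task(seconds_on_task, interval_seconds=5):
    """Predominant activity sampling: on-task for more than half the interval."""
    return seconds_on_task > interval_seconds / 2

def on_task_raw_score(observations):
    """Average On-task score across 10-minute observation periods.

    observations: one list per observation period, each holding 10 booleans
                  (True = on-task in the last 5 seconds of that minute).
    """
    per_period = [sum(1 for on_task in period if on_task) for period in observations]
    average = sum(per_period) / len(per_period)
    # Assumption: round half up to the nearest 0.5.
    return math.floor(average * 2 + 0.5) / 2

period_a = [True] * 9 + [False]        # on-task in 9 of 10 intervals
period_b = [True] * 8 + [False] * 2    # on-task in 8 of 10 intervals
score = on_task_raw_score([period_a, period_b])
```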
To provide norm-referenced scores for DOF On-
task, we obtained averaged raw scores for boys and
girls in the normative samples for classroom ob-
servations. We then computed midpoint percentiles
according to the procedures described earlier for
the DOF problem scales. The raw scores for DOF
On-task were all negatively skewed in the norma-
tive samples. That is, fewer children in the norma-
tive sample received very low than received very
high On-task scores. Furthermore, because low
scores are clinically significant for On-task, it is
more important to make finer discriminations
among low scores than among high scores. To take
account of the negatively skewed On-task scores
and the need for finer discrimination among low
than high scores, we assigned T scores to raw scores
in the following ways:
1. At the low end of the On-task scale, we
assigned a T score of 20 to On-task scores
of 0 for both boys and girls. We then
assigned T scores to raw scores of 0.5 to
9.5 based on the midpoint percentiles. The
T scores ranged from 21 to 51 (53rd
percentile) for girls and 21 to 53 (62nd
percentile) for boys.
2. We assigned a T score of 60 to the
highest possible On-task raw score of
10 for both boys and girls, which was
above the 80th percentile for both
genders.
MEAN T SCORES
Appendix A shows the mean DOF T scores and
raw scores for the normative samples of boys and
girls for classroom observations and recess obser-
vations. For all DOF problem scales, except Total
Problems-Classroom, raw scale score distributions
are positively skewed and low scores are truncated
at T = 50. Consequently, the mean T scores are
above 50 and their standard deviations are below
10 in the normative samples. Raw scores are less
skewed for DOF Total Problems-Classroom. Thus,
the mean T scores for DOF Total Problems-Class-
room are closer to 50, and their standard devia-
tions are closer to 10 in the normative samples.
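The effect of truncation on means and standard deviations can be illustrated with an idealized distribution. The sketch below does not use DOF data: it builds T scores for a perfectly normal distribution and then truncates them at 50, which is only a rough stand-in for the skewed DOF problem scales.

```python
from statistics import NormalDist, mean, pstdev

# T scores for an idealized normal distribution, one per 0.1-percentile midpoint.
normal_t = [50 + 10 * NormalDist().inv_cdf((i + 0.5) / 1000) for i in range(1000)]

# The same scores after truncating everything below T = 50.
truncated_t = [max(t, 50) for t in normal_t]

print("untruncated:", round(mean(normal_t), 1), round(pstdev(normal_t), 1))
print("truncated:  ", round(mean(truncated_t), 1), round(pstdev(truncated_t), 1))
```

In this idealized case the truncated scores have a mean above 50 and a standard deviation well below 10, which is the same direction of change described for the DOF problem scales.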
In contrast to the DOF problem scales, On-task
scores are negatively skewed and high scores are
truncated at T = 60. Thus, the mean T scores for
on-task are below 50 and their standard deviations
are below 10 in the normative samples.
Users should thus keep in mind that the T scores
for most DOF problem scales and T scores for On-
task deviate from the mean of 50 and standard de-
viation of 10 expected when normal bell-shaped
distributions are transformed directly into T scores.
Users should also keep in mind that the means and
standard deviations of the DOF scales may vary
from one sample of children to another. In particu-
lar, the means and standard deviations for prob-
lem scale scores obtained by samples of children
referred for mental health services are typically
higher than for nonreferred children. Examples of
this can be seen in Appendix B, which displays
means and standard deviations for scale scores ob-
tained by matched samples of referred children and
nonreferred control children observed in the same
settings. Scores for referred children are often less
skewed than for nonreferred children, because
fewer referred children obtain very low scores.
NORMAL, BORDERLINE, AND
CLINICAL RANGES
On the computer-scored DOF Profile shown in
Chapter 3, broken lines are printed across the
graphic displays to demarcate borderline and clinical
ranges for DOF scale scores. T scores from 65
to 69 (93rd through 97th percentiles) are considered
to be in the borderline clinical range for the DOF
syndrome scales, DSM-oriented Attention Deficit/
Hyperactivity Problems scale and Inattention and
Hyperactivity-Impulsivity subscales for classroom
observations, and the Aggressive Behavior scale
for recess observations. The borderline range indicates
scores that are high enough to be of concern,
but not so high as to be clearly deviant. T scores
>69 (>97th percentile) are considered to be in the
clinical range. T scores below 65 (<93rd percentile)
are considered to be in the normal range.
T scores from 60 to 63 (84th through 90th percentiles)
are considered to be in the borderline clinical
range for DOF Total Problems-Classroom and
Total Problems-Recess. T scores >63 (>90th percentile)
are considered to be in the clinical range.
T scores below 60 (<84th percentile) are considered
to be in the normal range.
For DOF On-task, T scores from 31 to 35 (3rd to
7th percentiles) are considered to be in the borderline
clinical range. T scores <31 (<3rd percentile)
are considered to be in the clinical range. T scores
above 35 (>7th percentile) are considered to be in
the normal range. DOF On-task is scored only for
classroom observations.
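The three sets of cutpoints above can be collected into one lookup, sketched below. The function name `dof_range` and the scale-group labels are hypothetical; the cutpoints are those stated in the text, and whole-number T scores are assumed.

```python
def dof_range(scale_group, t_score):
    """Classify a T score as 'normal', 'borderline', or 'clinical'.

    scale_group (hypothetical labels):
      'problem_scale'  - syndromes, Attention Deficit/Hyperactivity Problems
                         scale and subscales, Aggressive Behavior
      'total_problems' - Total Problems-Classroom or Total Problems-Recess
      'on_task'        - On-task (low scores are of concern)
    """
    if scale_group == "problem_scale":
        if t_score > 69:
            return "clinical"
        return "borderline" if t_score >= 65 else "normal"
    if scale_group == "total_problems":
        if t_score > 63:
            return "clinical"
        return "borderline" if t_score >= 60 else "normal"
    if scale_group == "on_task":
        if t_score < 31:
            return "clinical"
        return "borderline" if t_score <= 35 else "normal"
    raise ValueError(f"unknown scale group: {scale_group}")

example = dof_range("problem_scale", 67)
```

Note that the direction is reversed for On-task, where low T scores fall in the borderline and clinical ranges.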
As reported in Chapter 8, children who were
referred for mental health or special education services
scored significantly higher on the DOF problem
scales and lower on On-task than matched samples of
nonreferred control children in the same settings.
Because scores on the DOF problem scales are
quantitative measures of the number and degree
of problems observed for a child, the scores are
not intended to mark categorical differences between
children who are "sick" vs. "well." Instead,
the borderline and clinical ranges help users iden-
tify scores that are of enough concern to warrant
consideration for professional help. Users may
choose to apply higher or lower cutpoints for their
own clinical or research purposes. For example,
for some cases, or for certain clinical or research
purposes, scores at the high end of the normal range
(e.g., >90th percentile) on the syndrome scales or
Attention Deficit/Hyperactivity Problems scale
may also warrant concern. If you wish to classify
children's scores dichotomously as clearly in the
normal vs. clinical range on the DOF syndrome
scales, Attention Deficit/Hyperactivity Problems
scale, and Aggressive Behavior, we suggest using
T scores below 65 to designate the normal range
vs. T scores of 65 or higher to designate the clinical range.
For DOF Total Problems, we suggest using T scores
below 60 to designate the normal range vs. T scores
of 60 or higher to designate the clinical range. For DOF
On-task, we suggest using T scores above 35 to designate
the normal range vs. T scores of 35 or lower to designate
the clinical range.
SUMMARY
We developed the DOF from observations of
children's behavior in classroom and recess settings.
The DOF covers a 10-minute observation
window. We recommend obtaining 3 to 6 DOFs
for each identified child, plus additional DOFs for
control children in the same setting. The 2009 ver-
sion of the DOF contains 88 specific problem
items, plus one open-ended item for other prob-
lems. Each problem is rated on a 0-1-2-3 scale
ranging from 0 = no occurrence to 3 = definite oc-
currence with severe intensity, high frequency, or
3 or more minutes total duration. The DOF also
includes an On-task score ranging from 0 to 10,
which can easily be converted to a percentage.
We constructed the DOF syndromes by apply-
ing exploratory and confirmatory factor-analytic
methodology similar to procedures used for other
ASEBA forms, including the CBCL/6-18, TRF,
YSR, SCICA, and TOF. The factor analyses yielded
five syndromes: Sluggish Cognitive Tempo, Im-
mature/Withdrawn, Attention Problems, Intrusive,
and Oppositional. The Attention Problems syn-
drome was similar to syndromes derived from the
CBCL/6-18, TRF, YSR, SCICA, and TOF. The
Oppositional syndrome was similar to the Oppo-
sitional syndrome on the TOF and the Self-Con-
trol Problems syndrome of the SCICA. The Slug-
gish Cognitive Tempo syndrome was similar to the
2007 Sluggish Cognitive Tempo scales scored from
the CBCL/6-18 and TRF.
In addition to the syndrome scales for classroom
observations, we constructed an Attention Deficit/
Hyperactivity Problems scale comprising items
consistent with DSM-IV symptoms of ADHD. The
Attention Deficit/Hyperactivity Problems scale has
subscales for Inattention and Hyperactivity-Impul-
sivity. We also constructed an Aggressive Behav-
ior syndrome scale that can be scored from recess
observations. DOF Total Problems can be scored
separately for classroom and recess observations
by summing the 0-1-2-3 ratings for the 89 prob-
lem items.
The DOF scales are normed separately for boys
and girls ages 6-11. We assigned normalized T
scores to raw scores on each scale. The T scores
enable users to compare children with peers across
all scales and to compare a child's standing on each
syndrome with the same child's standing on each
of the other syndromes. To prevent over-interpre-
Chapter 7
Reliability of the DOF
Reliability refers to agreement between repeated
assessments when the phenomena being assessed
are expected to remain constant. The DOF is designed
to obtain observers' ratings of children's
functioning in group settings. To assess the reliability
of such observations, it is important to know
the degree to which two observers obtain similar
results for the same child in the same observation
period, i.e., the degree of inter-rater reliability. We
present inter-rater reliability between pairs of observers
for classroom observations of 212 children
and for recess observations of 17 children.
It is also important to know the degree to which
observers obtain similar results over periods when
children's behavior is not expected to change much,
i.e., test-retest reliability. In this chapter, we present
test-retest reliability for DOFs completed for two
separate sets of observations of 27 children over
intervals averaging 12 days.
Some users may be interested in the internal
consistency of the DOF scales. This refers to the
correlation between half of a scale's items and the
other half of the items. We report Cronbach's
(1951) alpha as a measure of internal consistency
for each DOF scale for separate samples of referred
children and control children in the same settings.
For direct observations of behavior, reliability
coefficients >.70 are generally considered good for
low-stakes screening or program evaluation, while
coefficients closer to .90 are desirable for high-stakes
eligibility or diagnostic decisions
(Chafouleas, Christ, Riley-Tillman, Briesch, &
Chanese, 2007; Hintze & Matthews, 2004). In
terms of effect sizes, Cohen (1988) considers
Pearson rs of .10 to .29 small, .30 to .49 medium,
and .50 or higher large.
INTER-RATER RELIABILITY
To assess inter-rater reliabilities for classroom
observations, pairs of trained observers used the
DOF to rate one to four 10-minute observations of
212 randomly selected children in elementary
school classrooms in Pennsylvania, New York, and
Vermont. The sample of 212 children included 112
boys and 100 girls, ages 6-11. Of these, 58 children
were rated by five pairs of observers in greater
Philadelphia, Pennsylvania; 91 children were rated
by four pairs of observers in greater Syracuse, New
York; and 63 children were rated by three pairs of
observers in greater Burlington, Vermont, for a total
of 12 observer pairs. For training, each pair of
observers simultaneously rated five practice cases
to learn the DOF procedures, as described in Chapter
4. Following training, the observer pairs independently
used the DOF to simultaneously rate 14
to 24 anonymously selected children. Observers
were instructed not to discuss their ratings with
each other until after all reliability data were collected.
The number of observation periods per child
varied across observer pairs. Nine observer pairs
completed one DOF per child per observer, while
three observer pairs completed 2 to 4 DOFs per
child per observer.
To assess inter-rater reliabilities for recess observations,
one pair of trained observers used the
DOF to rate two 10-minute observations during
recess (and lunch) for 17 anonymously selected
children (14 boys and 3 girls) in a Vermont school
for children with behavioral/emotional disorders.
When multiple observations were obtained per
7. Reliability of the DOF 92
child, we averaged the 0-1-2-3 ratings across DOFs
to obtain an average rating for each of the 88 items
for each observer. We then summed the average
ratings for relevant items to obtain raw scores for
each DOF problem scale. We also averaged On-
task scores across multiple DOFs per child per
observer. When only one DOF was obtained per
child per observer, we summed the 0-1-2-3 ratings
for relevant items to obtain raw scores for each
DOF problem scale and computed the On-task
score per child per observer.
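The averaging and summing steps described above can be sketched for one scale. The function name `raw_scale_score` and the example ratings are hypothetical; the item numbers are those listed for the Inattention subscale in Table 6-5. Treating an item that is missing from a DOF record as a rating of 0 is an assumption of this sketch.

```python
# Item numbers of the Inattention subscale, as listed in Table 6-5.
INATTENTION_ITEMS = [7, 16, 23, 25, 26, 27, 39, 49, 56, 59]

def raw_scale_score(dofs, scale_items):
    """Sum, over a scale's items, the rating averaged across one observer's DOFs.

    dofs: one dict per DOF, mapping item number -> 0-3 rating.
          Assumption: items absent from a dict were rated 0.
    """
    total = 0.0
    for item in scale_items:
        ratings = [dof.get(item, 0) for dof in dofs]
        total += sum(ratings) / len(ratings)
    return total

# Two hypothetical DOFs completed by the same observer for the same child.
dof_1 = {7: 2, 16: 1}
dof_2 = {7: 1, 56: 1}
score = raw_scale_score([dof_1, dof_2], INATTENTION_ITEMS)  # 1.5 + 0.5 + 0.5
```

With a single DOF the same function reduces to a plain sum of the ratings on the scale's items.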
To obtain reliabilities for classroom observa-
tions, we computed Pearson rs between raw scale
scores separately for 10 DOF scales for each of
the 12 observer pairs. Of the 120 Pearson rs for
classroom observations, 106 were significant at p
<.05. We converted each r to Fisher's z and then
obtained a mean z for each DOF scale across the
12 observer pairs. We also averaged Fisher's z
scores across the six DOF empirically based prob-
lem scales, the three DSM-oriented scales, and all
nine problem scales. We converted the mean z
scores back to r for each DOF scale. We also con-
verted mean z scores back to r to obtain the mean
r of the six empirically based scales, mean r of the
three DSM-oriented scales, and mean r of all nine
problem scales. Inter-rater reliabilities for the Ag-
gressive Behavior syndrome and Total Problems-
Recess were obtained directly for one pair of ob-
servers. Both rs were significant at p <.001.
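Averaging correlations through Fisher's z can be sketched as follows. The function name `mean_r_fisher` and the three example correlations are hypothetical and are not values from the reliability study; the transformation itself is the standard one (z is the inverse hyperbolic tangent of r, and the mean z is converted back with the hyperbolic tangent).

```python
import math

def mean_r_fisher(rs):
    """Average Pearson rs by converting to Fisher's z and back."""
    zs = [math.atanh(r) for r in rs]    # r -> z
    mean_z = sum(zs) / len(zs)
    return math.tanh(mean_z)            # mean z -> r

# Hypothetical inter-rater rs from three observer pairs for one scale.
pair_rs = [0.70, 0.80, 0.90]
mean_r = mean_r_fisher(pair_rs)
```

For these made-up values the result is slightly higher than the simple arithmetic mean of .80, because the z transformation stretches the scale as r approaches 1.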
As can be seen in the first column of Table 7-1,
inter-rater reliabilities for the empirically based
scales ranged from .71 for the Oppositional syn-
drome to .87 for Sluggish Cognitive Tempo and
.88 for Total Problems-Classroom, with a mean r
of .80. For the DSM-oriented scales, inter-rater
reliabilities were .80 for the Attention Deficit/Hy-
peractivity Problems scale, .70 for the Inattention
subscale, and .81 for the Hyperactivity-Impulsiv-
ity subscale, with a mean r of .77. The mean inter-
rater r was .79 across all nine problem scales and
.97 for On-task. For classroom observations, the
inter-rater reliabilities for DOF Total Problems and
On-task scores were consistent with previous find-
ings on earlier versions of the DOF, as discussed
in Chapter 6. For recess observations, inter-rater
reliabilities were .73 for the Aggressive Behavior
syndrome and .97 for Total Problems.
The second column in Table 7-1 shows inter-
rater reliabilities derived from raw scale scores
obtained only on the first 10-minute observation
with the DOF. The correlations were generally simi-
lar to those shown in the first column for scores
averaged across 1 to 4 DOFs: mean r = .78 versus
.79 for all nine problem scales for classroom ob-
servations and mean r = .94 versus .97 for On-task.
For the problem scales, seven rs for scores averaged
across 1 to 4 DOFs (column 1) were within
.02 of the r values for scores obtained from 1 DOF (column
2). To further examine inter-rater reliability
for one versus multiple observations per child, we
computed average rs by Fisher's z transformation
for the nine observer pairs who obtained only one
DOF per child versus the three observer pairs who
obtained 2 to 4 DOFs per child. For observer pairs
with only one DOF per child, the mean r was .82
across the nine problem scales for classroom ob-
servations and .94 for On-task. For observer pairs
who obtained 2 to 4 DOFs per child, the mean r
was .75 across the nine problem scales and .99 for
On-task.
The above findings indicate that inter-rater re-
liability was generally similar for observer pairs
obtaining only one 10-minute observation per child
versus multiple 10-minute observations per child.
The small differences between rs are useful to con-
sider for training purposes, since obtaining mul-
tiple DOFs per child is more time and labor inten-
sive than obtaining only one DOF per child. As
can be seen in Table 7-1, for most scales, good
inter-rater reliability can be obtained with only one
10-minute observation per child. Chapter 4 dis-
cusses procedures for training DOF observers.
As described in Chapter 6, revisions of the DOF
entailed adding and testing new items as well as
writing rules for rating various items. We analyzed
findings from various revisions to identify any sig-
nificant effects on inter-rater reliability. Computed
across the 12 observer pairs, we found similar mean
inter-rater reliabilities for DOF Total Problems
scores computed from the 88 items retained on
the 2009 DOF versus Total Problems scores com-
puted from the 96 items of the 1986 DOF (mean r
= .86 versus .83, respectively). We found similar
mean inter-rater reliabilities for the 43 retained
items that had scoring rules versus 45 retained
items without scoring rules (mean r = .81 versus
.76, respectively).
Table 7-1
Inter-Rater Reliabilities for DOF Scales

                                             Inter-Rater r         Inter-Rater r
                                             Scores averaged       Scores for first
                                             across 1 to 4 DOFs    DOF per child
DOF Scale                                    per child
Classroom Observations
  Empirically Based Scales
    Sluggish Cognitive Tempo                      .87                   .86
    Immature/Withdrawn                            .79                   .73
    Attention Problems                            .72                   .74
    Intrusive                                     .78                   .72
    Oppositional                                  .71                   .71
    Total Problems-Classroom                      .88                   .86
    Mean r for empirically based scales           .80                   .78
  DSM-Oriented Scales
    Attention Deficit/Hyperactivity Problems      .80                   .80
    Inattention subscale                          .70                   .72
    Hyperactivity-Impulsivity subscale            .81                   .80
    Mean r for DSM-oriented scales                .77                   .78
  Mean r for all problem scales                   .79                   .78
  On-task                                         .97                   .94
Recess Observations
    Aggressive Behavior                           .73                   .83
    Total Problems-Recess                         .97                   .98
Note. N = 212 for classroom observations; N = 17 for recess observations. For classroom observations,
inter-rater rs were obtained for each of 12 pairs of observers. Mean rs were then computed for each scale
by Fisher's z transformation. For recess observations, inter-rater rs were obtained from one pair of observers.
Mean rs across sets of scales for classroom and recess observations were computed by Fisher's z
transformation.

TEST-RETEST RELIABILITY
To assess test-retest reliability, we computed
Pearson rs for DOFs completed for 27 children,
who were rated by the same observer over inter-
vals of 7 to 22 days (average interval = 12.4 days).
The test-retest sample included 19 boys and 8 girls
attending Vermont schools. Ages ranged from 6 to
12 years, with a mean age of 8.4 (S.D. = 1.9). (Only
one child was age 12.) The observer obtained four
10-minute classroom observations for each child
over two days at Time 1 and four 10-minute class-
room observations over two days at Time 2. The
0-1-2-3 item ratings were averaged across the four
Time 1 observation sessions and across the four
Time 2 observation sessions. The averaged item
ratings were then summed to obtain raw scores for
the five DOF syndrome scales, the DSM-oriented
Attention Deficit/Hyperactivity Problems scale, In-
attention and Hyperactivity-Impulsivity subscales,
and Total Problems-Classroom. Averaged raw
scores were also obtained for On-task.
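The scoring steps described above (average each item's 0-1-2-3 ratings across observation sessions, then sum the averaged ratings for the items on a scale) can be sketched as follows. The item numbers, ratings, and scale membership below are hypothetical and serve only to show the arithmetic.

```python
def scale_raw_score(sessions, scale_items):
    """Sum, over a scale's items, the mean 0-3 rating across sessions.

    sessions: list of dicts mapping item number -> rating, one per DOF.
    scale_items: item numbers assigned to the scale (hypothetical here).
    """
    total = 0.0
    for item in scale_items:
        ratings = [session.get(item, 0) for session in sessions]
        total += sum(ratings) / len(ratings)   # average across sessions
    return total

# Four hypothetical 10-minute observations for one child at Time 1.
time1_sessions = [
    {7: 2, 13: 1, 23: 0},
    {7: 3, 13: 1, 23: 1},
    {7: 2, 13: 0, 23: 0},
    {7: 1, 13: 2, 23: 1},
]
hypothetical_scale = [7, 13, 23]
score = scale_raw_score(time1_sessions, hypothetical_scale)
print(score)  # averaged ratings 2.0 + 1.0 + 0.5
```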
We computed rs between raw scores obtained
for Time 1 versus Time 2 for each DOF problem
scale, plus On-task. Correlations were significant
at p <.05 for 8 of 10 scales. As Table 7-2 shows,
the significant test-retest rs for the empirically
based syndromes ranged from .48 for the Imma-
ture/Withdrawn syndrome to .73 for the Intrusive
syndrome. The test-retest r for Total Problems-
Classroom was .72. Test-retest rs were .76 for the
DSM-oriented Attention Deficit/Hyperactivity
Problems scale, .43 for the Inattention subscale,
and .77 for the Hyperactivity-Impulsivity subscale.
The mean test-retest rs were .53 across the empiri-
cally based scales, .73 across the DSM-oriented
scales, and .58 across all problem scales. The test-
retest r was .42 for On-task.
Test-retest reliabilities were moderate (.72 to
.77) for the Intrusive syndrome, Total Problems-
Classroom, the DSM-oriented Attention Deficit/
Hyperactivity Problems scale, and the Hyperactiv-
ity-Impulsivity subscale. Test-retest reliability was
low for the Sluggish Cognitive Tempo, Attention
Problems, and Oppositional syndromes, and the In-
attention subscale, suggesting that the problems
comprising these scales are more variable than the
problems comprising the other scales. Test-retest
reliability was also low for On-task scores. The
lower test-retest reliabilities versus higher inter-
rater reliabilities may also be due to the composi-
tion of our samples. The test-retest sample was
comprised only of clinically referred children, some
of whom were in treatment, in contrast to anony-
mously selected control children for the inter-rater
reliabilities.
Pearson r reflects similarities between the rank
orders of scores obtained at Time 1 and Time 2. It
is high when individuals' scores retain
approximately the same rank from Time 1 to Time
2. Because it is not affected by the absolute mag-
nitude of scores, r can be high even if all the Time
1 scores differ in magnitude from the Time 2 scores.
To test differences in mean scores relative to their
variance, we performed dependent t tests of differ-
ences between Time 1 versus Time 2 scores for
each of the 10 DOF scales. As shown in Table 7-2,
Time 1 scores differed significantly (p <.05) from
Time 2 scores only for the Immature/Withdrawn
syndrome, which could be a chance effect (Sakoda
et al., 1954).
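The distinction between r and the dependent t test can be illustrated with a small sketch. The scores below are hypothetical, not DOF data: every Time 2 score is lower than its Time 1 counterpart, yet the children keep the same relative standing, so r is high while the t test detects the shift in means.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_t(x, y):
    """Dependent (paired) t statistic with n - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

# Hypothetical raw scores for five children.
time1 = [4.0, 6.0, 8.0, 10.0, 12.0]
time2 = [3.0, 4.5, 7.5, 8.5, 11.0]

r = pearson_r(time1, time2)
t = paired_t(time1, time2)
print(round(r, 2), round(t, 2))
```

With 4 degrees of freedom, the two-tailed critical value of t at p = .05 is about 2.78, so the mean difference in this made-up example would be significant even though r is close to 1.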
INTERNAL CONSISTENCY
To assess internal consistency of the DOF
scales, we computed Cronbach's (1951) alpha for
each DOF scale. Alpha represents the mean of the
correlations between all possible sets of half the
items comprising a scale. The magnitude of alpha
tends to be directly related to the length of the scale,
because half the items of a short scale provide a
less stable measure than half the items of a long
scale.
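For readers who want to see the computation, the sketch below uses the standard variance-based formula for alpha, k/(k - 1) x (1 - sum of item variances / variance of total scores). The ratings are hypothetical, and the function names are ours rather than part of any ASEBA software.

```python
def variance(values):
    """Sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def cronbach_alpha(rows):
    """rows: one list of item scores per child, all of the same length."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical averaged ratings on a three-item scale for four children.
varied_items = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 2, 3]]
# The same item repeated three times: every child answers identically.
repeated_item = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]

alpha_varied = cronbach_alpha(varied_items)
alpha_repeated = cronbach_alpha(repeated_item)
print(round(alpha_varied, 3), round(alpha_repeated, 3))
```

The second data set shows why a very high alpha is not necessarily a virtue: a scale made of repeated versions of one item reaches the maximum alpha without sampling any additional aspect of the behavior.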
Although internal consistency is sometimes re-
ferred to as "split-half reliability," it is not reli-
ability in the sense of measuring how well a scale
will produce the same results over different occa-
sions when the target phenomena are expected to
remain constant. Furthermore, some scales with
relatively low internal consistency may be more
valid than other scales with very high internal con-
sistency. As an example, if a scale consists of 20
versions of the same item, it should produce very
high internal consistency, because respondents
should give similar answers to the 20 versions of
the item. However, such a scale would usually be
less valid than a scale that uses 20 different items
to assess the same phenomenon. Because each of
the 20 different items is likely to tap different as-
pects of the target phenomenon and to be subject
to different errors of measurement, the 20 differ-
ent items are likely to provide better measurement
despite lower internal consistency than a scale that
uses 20 versions of a single item.
As detailed in Chapter 6, the DOF syndrome
scales were derived from factor analyses of the
correlations among items. The composition of the
syndrome scales is therefore based on internal con-
sistencies among certain subsets of items. Mea-
sures of internal consistency of the syndrome scales
are thus somewhat redundant with the inter-item
correlations on which the scales were based. By
contrast, the DOF DSM-oriented scales were de-
veloped a priori, based on experts' ratings of how
consistent items are with a DSM-IV diagnosis of
ADHD (for details, see Chapter 6). Consequently,
internal consistencies for the DSM-oriented scales
are not redundant with inter-item correlations from
factor analyses.

Table 7-2
Test-Retest Reliabilities, Means, and Standard Deviations for DOF Scales

                                              Test-       Time 1 DOF          Time 2 DOF
DOF Scale                                     Retest r    Mean         SD     Mean     SD
Classroom Observations
  Empirically Based Scales
    Sluggish Cognitive Tempo                  (.25)        1.09        .84     1.08    .86
    Immature/Withdrawn                         .48          .69[b, c]  .67      .42    .52
    Attention Problems                        (.35)        8.30       1.68     8.00   1.66
    Intrusive                                  .73         2.02       1.56     2.37   1.35
    Oppositional                               .49         2.23       1.48     2.37   1.72
    Total Problems-Classroom                   .72        16.58       4.76    16.34   4.70
    Mean r empirically based scales[a]         .53
  DSM-Oriented Scales
    Attention Deficit/Hyperactivity Problems   .76        10.63       3.00    10.13   2.69
    Inattention subscale                       .43         3.77       1.19     3.42   1.25
    Hyperactivity-Impulsivity subscale         .77         6.86       2.18     6.72   1.78
    Mean r for DSM-oriented scales[a]          .73
  Mean r for all problem scales[a]             .58
  On-task                                      .42[c]      8.53       1.12     8.65    .91
Note. N = 27. All test-retest observations were done in classrooms. Mean test-retest interval = 12.4 days.
All significant Pearson rs were p <.05. Values in parentheses were not significant.
[a] Mean r was computed by Fisher's z transformation.
[b] Time 1 DOF > Time 2 DOF, p <.05.
[c] Not significant when corrected for the number of comparisons (Sakoda et al., 1954).
As shown in Table 7-3, we computed alphas,
derived separately from classroom observations of
332 children and from recess observations of 248
children. The samples included equal numbers of
referred children and randomly selected control
children of the same gender in the same settings.
The classroom sample included 224 boys and 108
girls ages 6-11 (mean age = 8.3, SD = 1.7). The
recess sample included 174 boys and 74 girls ages
6-12 (mean age = 8.4, SD = 1.6; only two referred
children were age 12). For classroom observations,
alphas ranged from .49 to .80 for the five empiri-
cally based syndromes, .68 to .81 for the three
DSM-oriented scales, and .87 for Total Problems.
For recess observations, alphas were .56 for Ag-
gressive Behavior and .70 for Total Problems.
SUMMARY
For classroom observations, the mean inter-rater
r was .80 across the five empirically based syn-
dromes and Total Problems and .77 across the three
DSM-oriented scales, with an overall mean r of
.79 across all DOF problem scales. The r of .97 for
the DOF On-task score and .88 for Total Problems
showed high inter-rater reliability, consistent with
prior research. For recess observations, the inter-
rater r was .73 for Aggressive Behavior and .97
for Total Problems.
To assess test-retest reliability, a trained observer
completed four DOFs at Time 1 and four DOFs at
Time 2 for 27 children observed in classrooms at
Table 7-3
Cronbach's Alpha Coefficients (Internal Consistency) for DOF Scales

DOF Scale                                      Alpha
Classroom Observations
  Empirically Based Scales
    Sluggish Cognitive Tempo                    .49
    Immature/Withdrawn                          .76
    Attention Problems                          .67
    Intrusive                                   .80
    Oppositional                                .69
    Total Problems-Classroom                    .87
    Mean alpha for empirically based scales[a]  .73
  DSM-Oriented Scales
    Attention Deficit/Hyperactivity Problems    .81
    Inattention subscale                        .68
    Hyperactivity-Impulsivity subscale          .72
    Mean alpha for DSM-oriented scales[a]       .74
Recess Observations
    Aggressive Behavior                         .56
    Total Problems-Recess                       .70
Note. Cronbach's alpha was computed for matched samples of referred children and control children in
the same settings. For classroom observations, N = 332; for recess observations, N = 248.
[a] Mean alpha computed by Fisher's z transformation.
Chapter 8
Validity of the DOF
Validity refers to the accuracy with which
instruments assess what they are supposed to assess.
The DOF is designed to measure independent
observations of children's behavioral, emotional, and
social problems in group settings. Data obtained
from the DOF are intended to mesh with data from
other sources, particularly ASEBA instruments for
obtaining parent reports (CBCL/6-18), teacher
reports (TRF), self-reports (YSR), clinical interviews
(SCICA), and test session observations (TOF). As
with other ASEBA instruments for assessing behavioral
and emotional problems, the validity of the DOF
must be evaluated in relation to a variety of
criteria, none of which is definitive by itself. In this
chapter, we present evidence for the content
validity and criterion-related validity of the DOF.
CONTENT VALIDITY OF DOF ITEMS
The most basic kind of validity is content
validity, which is the degree to which an instrument's
content includes what the instrument is intended
to measure. The DOF items were modeled on
CBCL/6-18 and TRF items that are appropriate for
direct observations. The TRF includes 97 items
paralleling those of the CBCL/6-18, plus additional
items appropriate to school settings. Nearly all the
CBCL/6-18 and TRF items discriminated
significantly (p <.01) between referred and nonreferred
children (Achenbach & Rescorla, 2001).
Beginning in the 1960s, ASEBA problem items
were developed and refined on the basis of research
and practical experience (Achenbach, 1966;
Achenbach & Lewis, 1978). The procedures for
selecting ASEBA problem items included
examination of child/adolescent psychiatric case
histories, extensive literature searches, and consultation
with mental health professionals and special
educators. Pilot editions were tested at multiple sites
and revised on the basis of feedback from parents,
paraprofessionals, and clinicians. Details of the
rationale and procedures for selecting ASEBA
items have been presented in previous manuals for
the instruments (Achenbach, 1991a, b, c;
Achenbach & Edelbrock, 1983, 1986, 1987).
As explained in Chapter 6, the original 1981
and 1986 DOF had 96 problem items, plus an
open-ended item for describing and rating problems not
specified on the DOF. To assemble the original
DOF item sets, Achenbach selected items from
early versions of the CBCL and TRF describing
problems that might be directly observed in group
settings. Whenever possible, he retained the
original wording of the CBCL and TRF items, but
reworded some items slightly to make them more
appropriate for direct observations. For the 2003
research edition of the DOF, we retained 95 of the
1986 DOF items and added 19 new items to
correspond to TOF items and to tap DSM-IV and
DSM-IV-TR symptoms for ADHD and other problems
that were not covered by the original DOF items.
Through analyses described in Chapter 6, we
reduced the item set for the 2009 version of the
DOF to 88 specific items, plus one open-ended item
for other problems. The 88 specific items on the
DOF have the following numbers of counterpart
items on other ASEBA forms: 51 on the CBCL/6-18, 63 on
the TRF, 49 on the YSR, 69 on the TOF, 60 on the
SCICA-Observation Form, and 35 on the SCICA
Self-Report Form. The content validity of the DOF
items is thus strongly supported by nearly four
decades of research, consultation, feedback, and
refinement of comparable ASEBA items. In addition,
63% of the DOF items significantly discriminated
between clinically referred and control children,
as described in the next section.
CRITERION-RELATED VALIDITY
Criterion-related validity refers to the degree
of association between a particular measure, such
as a scale scored from the DOF, and an external
criterion for characteristics that the scale is intended
to measure. One of the main reasons for deriving
syndrome scales from ASEBA forms was the lack
of an empirically based taxonomy of child psycho-
pathology (Achenbach & McConaughy, 1997;
Achenbach & Rescorla, 2001). The ASEBA syn-
drome scales provide empirically based groupings
of items that describe children's behavioral and
emotional problems, as reported by key informants.
In a similar fashion, the DOF items describe be-
havioral and emotional problems that can be ob-
served in group settings, such as classrooms and
school playgrounds. The DOF syndromes were
derived to provide empirically based scales for
scoring groups of these problem items that tend to
co-occur. The DOF DSM-oriented Attention Defi-
cit/Hyperactivity Problems scale was developed for
scoring problems consistent with the DSM-IV
and DSM-IV-TR diagnosis of ADHD.
An important way to test criterion-related va-
lidity of the DOF items and scales is to measure
their ability to discriminate between children who,
independently of their DOF scores, have been
judged to be at risk for emotional or behavioral
problems and have been referred for mental health
or special education evaluations and/or services.
We recognize that clinical referral is not an infal-
lible criterion of need for help. Some children in
our referred samples may not have needed profes-
sional help, while others in our control sample
may have needed help. However, actual referral
status is as ecologically valid as any other practi-
cal alternative for testing criterion-related valid-
ity.
Matched Referred and Control Samples
To test the criterion-related validity of DOF
items and scale scores, we used matched samples
of clinically referred children and randomly se-
lected control children in the same settings. From
the samples shown earlier in Table 6-1, we selected
6- to 12-year-olds who had been referred for evalu-
ation of behavioral and emotional problems and/
or learning difficulties and had participated in our
research studies using the DOF (for brevity, we call
this group "referred"). For each referred child, in-
dependent observers selected one or two control
children of the same gender and in the same class-
room as the referred child. For classroom observa-
tions, the matched samples included 166 referred
children ages 6-11 and 263 control children. (For
classroom observations, 97 referred children had
two matched controls). For recess observations, the
matched samples included 124 referred children
ages 6-12 and 248 control children. (For recess ob-
servations, all referred children had two matched
controls; only two referred children were age 12.)
Referred children with full scale IQ scores <75
were excluded from both samples.
Table 8-1 shows the characteristics of the
matched samples of boys and girls for classroom
observations (N = 430) and recess observations (N
= 372). Ethnicity of the sample for classroom ob-
servations was 85.2% non-Latino White and 14.8%
other ethnicities. Ethnicity of the sample for re-
cess observations was 100% non-Latino White.
MANCOVA of DOF Item Ratings
To test associations of referral status and de-
mographic variables with DOF item ratings for
classroom observations, we used a multivariate
analysis of covariance (MANCOVA) to analyze
DOF item ratings obtained by the matched samples
shown in Table 8-1. For classroom observations,
the MANCOVA design was 2 (referred vs. con-
trols) x 2 (boys vs. girls), with ethnicity (non-Latino
White vs. Other) as a covariate. For recess obser-
vations, we used a 2 (referred vs. controls) x 2 (boys
vs. girls) MANOVA, with no covariate because
ethnicity for that sample was 100% non-Latino
White. For each of the 88 DOF items, we aver-
aged 0-1-2-3 ratings across 2 to 4 DOFs separately
for each referred child and each control child. To
create equal sample sizes for referred and control
children, we computed the mean of the averaged
item ratings when there were two control children.
For classroom observations, the dependent vari-
ables for the MANCOVA were mean item ratings
for 166 referred children and 166 averaged ratings
for controls. For recess observations, dependent
variables for the MANOVA were mean item rat-
ings for 124 referred children and 124 averaged
ratings for controls. (For recess observations, ob-
servers used the 1986 version of the DOF, from
which 72 items were retained on the 2009 DOF.)
Because we found no significant differences on
DOF Total Problems for younger (ages 6 to 8) ver-
sus older (ages 9 to 12) children, we did not in-
clude age in the MANCOVA or MANOVA designs.
Socioeconomic status (SES) was also not included
as a covariate because SES was not available for
the control sample.
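The averaging steps used to equate sample sizes can be sketched as follows: each child's item ratings are first averaged across that child's DOFs, and when a referred child has two matched controls, the two controls' averaged ratings are then averaged with each other. The function names and ratings are hypothetical.

```python
def mean_ratings(dofs):
    """Average each item's 0-3 rating across one child's DOFs."""
    items = dofs[0].keys()
    return {i: sum(d[i] for d in dofs) / len(dofs) for i in items}

def control_comparison(controls):
    """Combine one or two matched controls into a single comparison record.

    controls: list of children, each a list of DOF dicts (item -> rating).
    """
    child_means = [mean_ratings(child) for child in controls]
    items = child_means[0].keys()
    return {i: sum(m[i] for m in child_means) / len(child_means) for i in items}

# Hypothetical ratings on two items for two controls of one referred child.
control_a = [{1: 0, 13: 2}, {1: 1, 13: 2}]                  # two DOFs
control_b = [{1: 0, 13: 1}, {1: 0, 13: 0}, {1: 0, 13: 2}]   # three DOFs
combined = control_comparison([control_a, control_b])
print(combined)
```

Averaging within each child first gives the two controls equal weight, regardless of how many DOFs each one contributed.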
The overall MANCOVA for classroom obser-
vations showed significant effects of referral sta-
tus, gender, and ethnicity (p < .01), but no signifi-
cant referral status x gender interaction. The over-
all MANOVA for recess observations showed sig-
nificant effects of referral status and gender (p <
.01), but no significant interaction. The first three
columns of Table 8-2 display significant effect sizes
(ES) of referral status, gender, and ethnicity for each
of the 88 specific problem items on the DOF, as
obtained from subsequent ANCOVAs of classroom
observations. The last two columns of Table 8-2
display significant ES of referral status and gender
obtained from subsequent ANOVAs of recess
observations. The ES is represented by the percent of
variance (partial Eta²) uniquely accounted for by
each independent variable that was significant at p
<.05. According to Cohen's (1988) criteria for ES
in ANCOVA/ANOVA, effects accounting for
1-5.8% of variance are small; effects accounting for
5.9-13.7% of variance are medium; and effects
accounting for >13.8% of variance are large. The ES
in Table 8-2 are values for partial Eta² rounded to
the nearest whole number. The superscript d in the
table indicates effects that could be regarded as
significant by chance when corrected for the
number of analyses for each independent variable,
using a p <.05 protection level (Sakoda et al., 1954).

Table 8-1
Characteristics of Matched Samples of Referred and Control Children

                                 Boys    Girls    Total
Classroom Observations
  Referred children               112       54      166
  Control children                179       85      264
  Total                           291      139      430
Recess Observations
  Referred children                87       37      124
  Control children                174       74      248
  Total                           261      111      372
Ethnicity for Classroom Sample[a]
  Non-Latino White              85.2%
  African American               9.7%
  Latino/Hispanic                0.9%
  Mixed or Other                 4.0%
[a] Percentages for N = 425 for classroom observations; ethnicity for recess observations was 100% non-
Latino White.

Table 8-2
Percent of Variance Accounted for by Significant (p <.05) Effects of Referral Status and
Demographic Variables on DOF Item Scores in ANCOVAs

Columns: Classroom Observations (Ref Status[a], Gender[b], Ethnicity[c]); Recess Observations (Ref Status[a], Gender[b])

DOF Item
1. Acts too young for age 5 1[W, d] 2[d]
87. Complains 1[d] 3[O] 3 3[G, d]
88. Afraid to make mistakes 2
Note. For classroom observations: N = 166 referred children ages 6-11 and 264 matched controls in the
same classrooms. Analyses were referral status x gender MANCOVA and ANCOVAs with ethnicity
(Non-Latino White vs. Other) as a covariate. For recess observations: N = 124 referred children ages
6-12 and 248 matched controls in the same setting. Analyses were referral status x gender MANOVA and
ANOVAs with no covariate. The percent of variance uniquely accounted for by each independent variable
is represented by partial Eta². Scores were item raw scores averaged across 2 to 4 DOFs per child.
[a] All significant effects of referral status reflected higher scores for referred than control children.
[b] B = boys scored higher; G = girls scored higher.
[c] W = Non-Latino White scored higher; O = Other scored higher.
[d] Not significant when corrected for number of analyses.
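The percent-of-variance effect sizes and the cutoffs quoted from Cohen (1988) for ANCOVA/ANOVA can be expressed as a short sketch. Partial Eta² is computed here by the usual formula, SS for the effect divided by the sum of SS for the effect and SS for error. The function names are hypothetical, and the handling of values that fall exactly on a cutoff (for example, exactly 13.8%) is our assumption.

```python
def partial_eta_squared_pct(ss_effect, ss_error):
    """Partial eta squared, expressed as a percent of variance."""
    return 100.0 * ss_effect / (ss_effect + ss_error)

def classify_effect(pct_variance):
    """Label an ES using the ANCOVA/ANOVA cutoffs quoted in the text."""
    if pct_variance < 1.0:
        return "below small"
    if pct_variance < 5.9:
        return "small"
    if pct_variance < 13.8:   # boundary handling is an assumption
        return "medium"
    return "large"

# Hypothetical sums of squares for one item.
pct = partial_eta_squared_pct(ss_effect=6.0, ss_error=94.0)
print(pct, classify_effect(pct))
```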
Referral Status Effects. For classroom
observations, referred children scored significantly
higher (p <.05) than control children on 38 of the
88 DOF items. Of these, eight effects could have
occurred by chance, which are marked by the
superscript d in Table 8-2. Three DOF items showed
medium ES for referral status: 7. Doesn't
concentrate or doesn't pay attention for long (6%); 13.
Fidgets (7%); and 23. Doesn't seem to listen to what
is being said (6%). The remaining 35 significant
ES were small, accounting for 1-5% of variance.
For recess observations, referred children scored
significantly higher (p <.05) than control children
on 24 of 67 DOF items. (Five items were scored 0
for 100% of cases and 16 items had missing
values because they were not included on the 1986
DOF used for recess observations.) One item, 75.
Withdrawn, doesn't get involved with others,
showed a medium ES for referral status. All other
ES for recess observations were small, accounting
for 1-5% of variance. Seventeen items showed
significant ES for recess observations but not
classroom observations. Thus, 55 of 88 (63%) DOF
items showed significant effects of referral status
in classroom observations, recess observations, or
both.
Demographic Effects. The demographic
variables of gender and ethnicity showed several small
ESs (p <.05) on DOF item ratings as follows: For
classroom observations, there were eight small ES
for gender, accounting for 2-4% of variance, all of
which could be due to chance. Boys were rated
higher on three DOF items and girls were rated
higher on five items. There were 22 small ES for
ethnicity, accounting for 1-5% of variance. Of
these, eight could be chance effects. Children with
"Other" ethnicity (which included African
American, Latino/Hispanic, and Mixed or other ethnicity)
were rated higher than non-Latino White children
on 20 DOF items, while non-Latino White
children were rated higher on two items.
For recess observations, there were eight small
ES for gender, accounting for 1-4% of variance,
all of which could be due to chance. Boys were
rated higher on three DOF items and girls were
rated higher on five items. (As indicated earlier,
ethnicity was not included in analyses of recess
observations because the entire sample was
non-Latino White.)
Multiple Regression Analyses of DOF
Scale Scores
To test associations of referral status and
demographic characteristics with DOF scale scores
for classroom observations, we performed multiple
regressions on raw scores for each scale (the
dependent variable) with the independent variables
of referral status, gender, and ethnicity (non-Latino
White versus Other). For multiple regressions of
recess observations, the independent variables were
referral status and gender. To obtain raw scores for
each problem scale, we first averaged item ratings
across 2 to 4 DOFs separately for each referred
child and each control child. We then computed
the mean of the averaged item ratings when there
were two control children, as done for the
MANCOVA and MANOVA of item ratings. Raw
scores for the DOF scales were the sums of the
averaged ratings for items comprising each scale.
For classroom observations, the raw score for
Total Problems was the sum of the averaged item
ratings for the 88 specific problem items, plus the
open-ended item for additional problems. For
recess observations, the raw score for Total
Problems was the sum of the averaged item ratings for
the 72 specific problem items retained from the
1986 DOF, plus the open-ended item for additional
problems. For On-task, we averaged the 0 to 10
scores across 2 to 4 DOFs separately for each
referred child and each control child, and then
computed the mean of the averaged On-task scores
when there were two control children.
Table 8-3 displays ESs for associations of
referral status and ethnicity with DOF scale scores.
The ES is the squared standardized regression
coefficient, which reflects the percent of variance in
scale scores that was uniquely accounted for by
each independent variable. According to Cohen's
(1988) criteria for ES in multiple regressions,
effects accounting for 2-12% of variance are small;
effects accounting for 13-25% of variance are
medium; and effects accounting for >26% of variance
are large. The superscript c in the table indicates
effects that could be regarded as significant by
chance when corrected for the number of analyses
for each independent variable, using a p <.05
protection level (Sakoda et al., 1954).
Referral Status Effects. Referral status effects
outweighed demographic effects on all DOF prob-
lem scales. Referred children scored significantly
(p <.05) higher than control children on all DOF
problem scales, accounting for 4 to 26% of vari-
ance. Control children scored significantly (p <.05)
higher than referred children on On-task (8% of
variance). Of the 12 significant ES for referral sta-
tus, two could have occurred by chance. DOF To-
tal Problems-Recess showed a large ES, account-
ing for 26% of variance. DOF Total Problems-Class-
room showed a medium ES, accounting for 13%
of variance. All other ES were small according to
Cohen's (1988) criteria. After the two DOF Total
Problems scales, the next highest ES were for the
DSM-oriented Attention Deficit/Hyperactivity
Table 8-3
Percent of Variance Accounted for by Significant (p <.05) Effects of Referral Status and
Ethnicity on DOF Scale Scores in Multiple Regressions

DOF Scale                                     Referral Status[a]    Ethnicity[b]
Classroom Observations
  Empirically Based Scales
    Sluggish Cognitive Tempo                        6[c]              1[O, c]
    Immature/Withdrawn                              6
    Attention Problems                              8
    Intrusive                                       4[c]              3[O]
    Oppositional                                    7                 3[O]
    Total Problems-Classroom                       13                 2[O]
  DSM-Oriented Scales
    Attention Deficit/Hyperactivity Problems       10                 1[O, c]
    Inattention subscale                            8                 2[O]
    Hyperactivity-Impulsivity subscale              9
  On-task                                           8                 6[W]
Recess Observations
    Aggressive Behavior                            10                 N/A
    Total Problems-Recess                          26                 N/A
Note. For classroom observations: N = 166 referred children ages 6-11 and 264 matched controls in the
same classrooms. For recess observations: N = 124 referred children ages 6-12 and 248 matched controls
in the same setting. For classroom observations, analyses were multiple regressions of raw scale scores
on referral status, gender, and ethnicity. For recess observations, analyses were multiple regressions of
raw scale scores on referral status and gender. Percent of variance is represented by the squared standardized
regression weight for each independent variable. There were no significant gender effects on any DOF
scale.
[a] Referred children scored significantly (p <.05) higher than control children on all problem scales; control
children scored significantly (p <.05) higher than referred children on On-task.
[b] O = Other scored higher than non-Latino White; W = non-Latino White scored higher than Other.
[c] Not significant when corrected for the number of analyses.
Mean Scale Scores for Referred and
Control Children
Table 8-4 displays the mean raw scores and stan-
dard deviations obtained by referred and control
children on each DOF scale, derived from
MANCOVA and ANOVA. MANCOVAs were
modeled on the regression analyses, treating refer-
ral status and gender as between subject measures
and ethnicity as a covariate. These included a 2
(referred vs. control) x 2 (boys vs. girls)
MANCOVA on raw scale scores for the five DOF
syndromes, followed by univariate 2 x 2 ANOVAs
on scores for each syndrome scale. We performed
a similar 2 x 2 MANCOVA, followed by univariate
ANOVAs, on the Inattention and Hyperactivity-Im-
pulsivity subscales, and 2 x 2 univariate ANOVAs
on the Attention Deficit/Hyperactivity Problems
scale, Total Problems-Classroom, and On-task
scores. ANOVAs for the Aggressive Behavior syn-
drome and Total Problems-Recess were also mod-
eled on the multiple regressions, treating referral
status and gender as between subject measures with
no covariate. The results mirrored those of the mul-
tiple regressions. Referred children scored signifi-
cantly (p <.05) higher than control children on all
DOF problem scales, while control children scored
significantly (p <.05) higher than referred children
on On-task. There were no significant gender ef-
fects.
Discriminant Analyses of DOF Scale
Scores
We used discriminant analyses to determine
which weighted combinations of DOF scale scores
best differentiated referred from control children
for the matched samples shown in Table 8-1. When
there were two matched control children for a re-
ferred child, we averaged item ratings across the
two controls, as done in previous analyses. For
classroom observations, we performed one dis-
criminant analysis using the five DOF syndromes
as candidate predictors and another discriminant
analysis using the DSM-oriented Inattention and
Hyperactivity-Impulsivity subscales as candidate
predictors. Predictors were entered simultaneously
within each set of discriminant analyses.
Discriminant analyses selectively weight pre-
dictors to maximize their collective associations
with the criterion groups being analyzed. The
weighting process makes use of characteristics of
the sample that may differ from other samples. To
avoid overestimating the accuracy of the classifi-
cation obtained by discriminant analyses, it is nec-
essary to correct for shrinkage in associations that
would occur when discriminant weights derived
in one sample are applied to a new sample. To cor-
rect for shrinkage, we employed a jackknife
(cross-validation) procedure whereby discriminant
functions are computed with a different child's data
excluded from (held out of) the sample each time. Each
discriminant function is then cross-validated by
testing the accuracy of its predictions for the child
who was held out when the discriminant function
was computed. Finally, the percentage of correct
predictions is averaged across all the held-out chil-
dren.
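The jackknife procedure described above can be sketched in Python for the simplest case of a single predictor. This is a minimal illustration, not the analysis reported in this chapter: the function names and the scores are hypothetical, and the sketch assumes equal prior probabilities and a common variance in the two groups, in which case the linear discriminant rule reduces to a cutoff halfway between the two group means.

```python
from statistics import mean


def midpoint_cutoff(referred, controls):
    """Cutoff of a one-predictor linear discriminant (equal priors, pooled variance)."""
    return (mean(referred) + mean(controls)) / 2


def jackknife_rates(referred, controls):
    """Leave-one-out percentages correctly classified: (sensitivity, specificity, overall).

    Assumes the referred group has the higher mean on the predictor.
    """
    hits_referred = 0
    for i, score in enumerate(referred):
        training = referred[:i] + referred[i + 1:]    # hold out referred child i
        cutoff = midpoint_cutoff(training, controls)  # rule computed without that child
        hits_referred += score > cutoff               # held-out child predicted "referred"?
    hits_controls = 0
    for i, score in enumerate(controls):
        training = controls[:i] + controls[i + 1:]    # hold out control child i
        cutoff = midpoint_cutoff(referred, training)
        hits_controls += score <= cutoff              # held-out child predicted "control"?
    n = len(referred) + len(controls)
    return (100 * hits_referred / len(referred),
            100 * hits_controls / len(controls),
            100 * (hits_referred + hits_controls) / n)


# Hypothetical raw scores on one scale, for illustration only.
referred_scores = [14.0, 9.5, 21.0, 6.0, 17.5, 11.0]
control_scores = [5.0, 8.0, 3.5, 12.5, 7.0, 6.5]
sensitivity, specificity, overall = jackknife_rates(referred_scores, control_scores)
```

Because each child is classified by a rule computed without that child's data, the resulting percentages are less likely to overestimate accuracy than percentages obtained by classifying the same children who were used to derive the rule.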
In addition to discriminant analyses of sets of
DOF scales, we obtained cross-validated percent-
ages of cases correctly classified by the DOF Total
Problems-Classroom, Attention Deficit/Hyperac-
tivity Problems scale, Aggressive Behavior, and To-
tal Problems-Recess as single predictors.
For each set of predictors, Table 8-5 shows the
cross-validated percentages of children correctly
classified as referred (sensitivity) versus controls
(specificity). The weighted combination of the five
syndrome scales correctly classified 56% of re-
ferred children and 74% of control children, with
an overall correct classification rate of 65% and
overall misclassification rate of 35%. An additional
forward stepwise discriminant analysis indicated
that all but the Intrusive syndrome were signifi-
cant (p <.05) predictors in the discriminant func-
tion. The Sluggish Cognitive Tempo syndrome was
the strongest predictor (standardized canonical co-
efficient = .478), with the other three syndromes
contributing about equally to the discriminant func-
tion (standardized canonical coefficients = .340
to .389). The DOF Total Problems-Classroom score
alone showed similar classification rates: 54% of
referred children and 75% of control children cor-
rectly classified, with an overall correct classifi-
cation rate of 65% and overall misclassification
rate of 35%.
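The relation among these rates can be checked with simple arithmetic. The sketch below assumes that each referred child was compared with one set of averaged control ratings, so that the two groups are of equal size (166 each for classroom observations); the function name is hypothetical. Because the published percentages are already rounded, values recomputed this way may differ from Table 8-5 by a percentage point.

```python
def overall_rates(sensitivity, specificity, n_referred, n_controls):
    """Overall correct and misclassification rates (in percent) from group-wise rates."""
    correct = (sensitivity * n_referred + specificity * n_controls) / (n_referred + n_controls)
    return correct, 100 - correct


# Five syndrome scales, classroom observations: 56% of referred children and
# 74% of control children correctly classified; equal group sizes are assumed.
correct, misclassified = overall_rates(56, 74, 166, 166)
```

With equal group sizes, the overall correct classification rate is simply the mean of sensitivity and specificity.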
The weighted combination of the DSM-oriented
Inattention and Hyperactivity-Impulsivity
subscales correctly classified 53% of referred chil-
dren and 73% of control children, with an overall
correct classification rate of 63% and overall
misclassification rate of 37%. An additional forward
stepwise discriminant analysis indicated that
both the Inattention and Hyperactivity-Impulsivity
subscales were significant (p <.05) predictors,
with Hyperactivity-Impulsivity contributing
slightly more (standardized canonical coefficient
= .610) than Inattention (standardized canonical
coefficient = .508). The Attention Deficit/Hyperactivity
Problems scale alone showed the same
overall correct classification rate of 63% and
misclassification rate of 37%.
Table 8-4
Means and Standard Deviations of DOF Raw Scale Scores for Referred and Control Children

                                             Referred (a)   Averaged Controls (b)
DOF Scale                                    Mean    SD     Mean    SD
Classroom Observations
  Empirically Based Scales
    Sluggish Cognitive Tempo                  1.2   1.3      0.7   0.8
    Immature/Withdrawn                        0.8   1.8      0.2   0.4
    Attention Problems                        5.6   3.0      4.1   2.6
    Intrusive                                 1.9   3.0      1.0   1.4
    Oppositional                              1.9   1.9      0.9   1.8
    Total Problems-Classroom                 13.6   8.8      8.0   5.4
  DSM-Oriented Scales
    Attention Deficit/Hyperactivity Problems  8.2   5.7      5.0   3.6
      Inattention subscale                    2.8   2.6      1.5   1.8
      Hyperactivity-Impulsivity subscale      5.4   3.7      3.5   2.4
  On-task                                     8.0   1.7      8.9   1.6
Recess Observations
  Aggressive Behavior                         1.4   1.6      0.5   0.6
  Total Problems-Recess                       4.2   3.2      1.3   1.4

Note. For classroom observations: N = 166 referred children ages 6-11 and averaged ratings for 264
matched controls in the same classrooms. For recess observations: N = 124 referred children ages 6-12
and averaged ratings for 248 matched controls in the same setting. Problem scale scores were the sums
of averaged item ratings.
(a) Referred children scored significantly (p <.05) higher than control children on all problem scales.
(b) Control children scored significantly (p <.05) higher than referred children on On-task.
The Aggressive Behavior syndrome, based on
recess observations, correctly classified 51% of re-
ferred children and 80% of control children, with
an overall correct classification rate of 65% and
overall misclassification rate of 35%. Total Problems-Recess
alone produced the best classification
rates, correctly classifying 64% of referred chil-
dren and 90% of control children, with an overall
correct classification rate of 77% and overall
misclassification rate of 23%.
SUMMARY
This chapter presented several kinds of evidence
for the validity of the 2009 DOF items and scale
scores. Content validity of the DOF items is based
on their derivation from similar items of the CBCL/
6-18 and TRF, most of which significantly dis-
criminated referred from nonreferred children
(Achenbach & Rescorla, 2001).
Criterion-related validity was supported by the
ability of the DOF items and scale scores to dis-
criminate between matched samples of referred and
nonreferred control children. Referred children
scored significantly higher on 55 of the 88 DOF
items for observations in classrooms and/or recess,
with referral status accounting for 1 to 7% of vari-
ance. Demographic variables of gender and
ethnicity showed small effects on item scores. Re-
ferred children scored significantly higher than
nonreferred children on all DOF problem scales,
accounting for 4 to 26% of variance. DOF Total
Problems accounted for 13% of variance in class-
room observations and 26% of variance in recess
observations. Control children scored significantly
higher on DOF On-task, accounting for 8% of vari-
ance.
A weighted combination of the five DOF syn-
dromes correctly classified 56% of referred chil-
dren (sensitivity) and 74% of nonreferred children
(specificity). A weighted combination of the DSM-
oriented Inattention and Hyperactivity-Impulsiv-
ity subscales showed only slightly lower sensitiv-
Table 8-5
Cross-Validated Percents of Cases Correctly Classified as Referred vs. Control

                                                            Averaged   Overall Correct
Candidate Predictors                             Referred   Controls   Classification
Classroom Observations
  Five syndrome scales                             56%        74%        65%
  Total Problems-Classroom                         54%        75%        65%
  DSM-oriented Inattention & Hyperactivity-
    Impulsivity subscales                          53%        73%        63%
  DSM-oriented Attention Deficit/Hyperactivity
    Problems                                       54%        72%        63%
Recess Observations
  Aggressive Behavior                              51%        80%        65%
  Total Problems-Recess                            64%        90%        77%

Note. For classroom observations: N = 166 referred children ages 6-11 and averaged ratings for 264
matched controls in the same classrooms. For recess observations: N = 124 referred children ages 6-12
and averaged ratings for 248 matched controls in the same setting. Scale scores were the sums of
averaged item ratings for referred and control children.
Chapter 9
Answers to Frequently Asked Questions
This chapter answers questions that may arise
about the DOF. The questions are grouped under
headings pertaining to the DOF form and profile,
applications of the DOF, relations to other assess-
ment procedures, coordinating data from multiple
sources, and relations to DSM and special educa-
tion classifications. If you have a question that is
not answered under one heading, look under the
other headings. The Table of Contents and Index
may also help you find answers to questions not
listed in this chapter.
FEATURES OF THE DOF
1. What is the DOF?
Answer: The Direct Observation Form (DOF)
is a standardized form for rating observations of
6- to 11-year-old children in school classrooms, at
recess, and in other group settings. During a 10-minute
period, the observer uses the DOF to write
a narrative description of the child's behavior, affect,
and interactions. The observer also rates the
child for being on-task or off-task for 5 seconds at
the end of each 1-minute interval. At the end of
the 10-minute observation, the observer rates the
child on 89 problem items using a 0-1-2-3 scale.
Chapter 2 of this Manual provides detailed instruc-
tions for using the DOF and rating the DOF items.
2. How many 10-minute observations are
necessary to score the DOF?
Answer: The DOF Module for computer scor-
ing requires at least two DOFs (i.e., two 10-minute
observations) and allows up to six DOFs per child
to score one DOF Profile. Because children's behavior
can vary from one occasion to another, we
recommend 3 to 6 observations of the identified
child on at least two different days. Observers may
also include 1 to 6 DOFs for each of two control
children matched to the identified child. Observa-
tions of control children are recommended but
optional.
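The limits stated in this answer can be expressed as a simple check. The sketch below is only an illustration of those limits, not the DOF Module's own logic; the function name and parameters are hypothetical.

```python
def can_score_profile(n_identified, n_control_1=0, n_control_2=0):
    """Return True if the numbers of DOFs fit the limits stated in the answer above."""
    # The identified child needs at least two and at most six DOFs.
    identified_ok = 2 <= n_identified <= 6
    # Control children are optional (0 DOFs); if included, 1 to 6 DOFs each.
    controls_ok = all(0 <= n <= 6 for n in (n_control_1, n_control_2))
    return identified_ok and controls_ok
```

For example, four observations of the identified child and four of each control child would pass the check, whereas a single observation of the identified child would not.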
3. Why are control children included on
the DOF?
Answer: Observations of control children pro-
vide a standard for evaluating the behavior of the
identified child in relation to peers in the same situ-
ation. Observers do not need to know the names of
control children. Observers should select a control
child of the same gender who is situated far enough
away so as not to influence the behavior of the iden-
tified child, if possible. The DOF Module for com-
puter scoring allows up to two control children per
identified child to score one DOF Profile.
4. What if it is not possible to match the
gender of a control child to the gender
of the identified child?
Answer: Although rare, this may occur in some
settings or special programs where one gender
vastly outnumbers the other. We recommend
matching the gender of control children to the gen-
der of the identified child because the DOF Profile
has separate norms for boys and girls. However, if
the gender of a control child is different from the
gender of the identified child, the DOF Module
for computer scoring weights the item scores of
that control child in order to derive scale scores
that approximate scores for the correct gender. If
the identified child is a boy and the control child is
a girl, then item scores of the control child are ad-
justed upward. If the identified child is a girl and
the control child is a boy, item scores of the con-
trol child are adjusted downward. These adjust-
ments are done only for DOF Profiles for class-
room observations, because there were no signifi-
cant gender differences for recess observations, as
reported in Chapter 8.
5. What is the DOF Profile?
Answer: The DOF Profile is a computer-scored
display of item and scale scores from classroom
observations and/or recess observations. The user
must select one setting (class or recess) for com-
puter-scoring. The DOF can only be scored by com-
puter because of the complexity of averaging item
scores across multiple observation sessions. As
discussed in Chapter 3, the DOF Profile for class-
room observations displays averaged item scores
plus raw scores, T scores, and percentiles for five
empirically based syndrome scales, a DSM-ori-
ented Attention Deficit/Hyperactivity Problems
scale with Inattention and Hyperactivity-Impulsiv-
ity subscales, Total Problems, and an On-task score.
The DOF Profile for recess observations displays
averaged item scores plus raw scores, T scores and
percentiles for an empirically based Aggressive
Behavior syndrome scale and Total Problems. Pro-
files for both settings also display averaged item
scores for Other Problems not scored on the syn-
drome scales. The DOF Profile has separate norms
for boys and girls ages 6 to 11.
6. What are the DOF syndrome scales?
Answer: As detailed in Chapter 6, the DOF syn-
drome scales were derived by factor analyzing av-
eraged scores for the DOF items to identify pat-
terns of co-occurring problems. Each of the five
syndrome scales consists of a set of problem items
that were found to co-occur. The 0-1-2-3 ratings
on each problem item are averaged across multiple
10-minute observations. A child's total score on a
syndrome scale is the sum of the averaged ratings
on the items comprising that scale. For total scores
on each syndrome scale, the DOF Profile indi-
cates standard scores (T scores) and percentiles
based on a normative sample of boys and girls ages
6-11. Classroom observations are scored on the
following five syndrome scales: Sluggish Cogni-
tive Tempo, Immature/Withdrawn, Attention Prob-
lems, Intrusive, and Oppositional. For recess ob-
servations, the one syndrome is designated as Ag-
gressive Behavior.
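The two scoring steps described in this answer, averaging each item across observations and then summing the averaged ratings for a scale, can be sketched as follows. The item names and ratings are hypothetical placeholders, not actual DOF items or scale assignments, and the sketch does not reproduce the conversion of raw scores to T scores and percentiles.

```python
from statistics import mean


def averaged_item_ratings(observations):
    """Average each item's 0-1-2-3 rating across several 10-minute observations."""
    items = observations[0].keys()
    return {item: mean(obs[item] for obs in observations) for item in items}


def scale_raw_score(averaged, scale_items):
    """Sum the averaged ratings of the items that make up one scale."""
    return sum(averaged[item] for item in scale_items)


# Three hypothetical observations of one child; the item names are placeholders.
observations = [
    {"item_a": 2, "item_b": 0, "item_c": 1},
    {"item_a": 1, "item_b": 0, "item_c": 3},
    {"item_a": 3, "item_b": 0, "item_c": 2},
]
averaged = averaged_item_ratings(observations)
raw_score = scale_raw_score(averaged, ["item_a", "item_c"])
```

Here each of the two scale items averages 2 across the three observations, so the raw scale score is 4.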
7. What are the Other Problems?
Answer: The Other Problems are problem
items that were not associated strongly enough to qualify
for any of the syndromes derived from factor analyses.
Therefore, they are not included in the syndrome
scales. However, each of the Other Problems
items may be important in its own right.
There is a different set of Other Problems for
DOF Profiles based on classroom observations
versus recess observations. The relevant Other
Problems item set is included along with items
from the syndrome scales when computing scores
for Total Problems-Classroom and Total Problems-
Recess.
8. Why doesn't the DOF Profile display
percentiles or T scores for the Other
Problems?
Answer: The Other Problems do not consti-
tute a separate scale. They are merely the items
that did not qualify for the syndrome scales. There
are thus no specific associations among them to
warrant treating them as a separate scale. How-
ever, each of these problems may be important, and
they are all included in computing Total Problems
scores.
9. How is the open-ended item 89
figured into the scale scores?
Answer: If the observer enters any problems in
item 89, the highest rating that the observer gave
to any of these problems (i.e., 1, 2, or 3) is added
to the Total Problems score.
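The rule for item 89 can be sketched for a single observation as follows. The function names are hypothetical, and the sketch does not address how the DOF Module combines item 89 across multiple observations, which is not described in this answer.

```python
def item_89_contribution(ratings):
    """Highest rating given to any problem written in open-ended item 89, else 0."""
    return max(ratings, default=0)


def total_problems(sum_of_other_items, item_89_ratings):
    """Add the item 89 contribution to the sum of ratings on the remaining items."""
    return sum_of_other_items + item_89_contribution(item_89_ratings)
```

If an observer wrote in three additional problems rated 1, 3, and 2, only the highest rating (3) would be added, so that several write-in problems do not inflate the Total Problems score.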
10. What are the DSM-oriented Attention
Deficit/Hyperactivity Problems scale
and its Inattention and Hyperactivity-
Impulsivity subscales?
Answer: The DSM-oriented Attention Deficit/
Hyperactivity Problems scale and its Inattention
and Hyperactivity-Impulsivity subscales consist of
items that are consistent with the DSM-IV and
DSM-IV-TR diagnostic categories of Attention
Deficit/Hyperactivity Disorder (ADHD). The In-
attention subscale has 10 items and the Hyperac-
tivity-Impulsivity subscale has 13 items. The At-
tention Deficit/Hyperactivity Problems total score
is the sum of ratings on all 23 items. Twelve of the
23 items were similar to CBCL/6-18 and/or TRF
items that an international panel of experts identi-
fied as being very consistent with the DSM-IV
symptoms of ADHD, as explained in Chapter 6.
Eleven additional items were added to the DOF to
tap ADHD symptoms that were not covered by the
other items.
11. Should raw scores, percentiles, or T
scores be used to report results for
DOF scales?
Answer: Percentiles and T scores are usually
preferable to raw scale scores for reporting find-
ings for individual children, because they indicate
degrees of deviance on each scale in comparison
with the normative sample for the child's gender.
However, for statistical analyses of scale scores,
raw scale scores should be used because the T
scores on all scales, except Total Problems-Classroom,
are truncated at 50, as explained in Chapter
6. If boys and girls are combined in the same sta-
tistical analyses, it may be useful to assign stan-
dard scores separately for each gender so that scores
for each gender have the same mean and the same
standard deviation. On the other hand, if the sta-
tistical analyses are intended to test gender differ-
ences in raw scale scores, then the scores should
not be standardized by gender.
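One way to assign standard scores separately for each gender in a research sample is to convert raw scale scores to z scores within each gender. The sketch below is an assumption about how this might be done, using the means and standard deviations of the research sample itself rather than the DOF norms; the function name and data are hypothetical.

```python
from statistics import mean, pstdev


def standardize_within_groups(scores, groups):
    """Convert raw scores to z scores separately within each group (e.g., gender)."""
    z_scores = [0.0] * len(scores)
    for group in set(groups):
        idx = [i for i, g in enumerate(groups) if g == group]
        values = [scores[i] for i in idx]
        # pstdev is 0 if all scores in a group are identical; such a group
        # cannot be standardized and would raise ZeroDivisionError here.
        m, sd = mean(values), pstdev(values)
        for i in idx:
            z_scores[i] = (scores[i] - m) / sd
    return z_scores


raw = [4.0, 6.0, 8.0, 1.0, 2.0, 3.0]
gender = ["boy", "boy", "boy", "girl", "girl", "girl"]
z = standardize_within_groups(raw, gender)
```

After this transformation, scores for each gender have a mean of 0 and a standard deviation of 1, which removes gender differences in raw scores. For that reason it should be skipped when gender differences are themselves the focus of the analysis.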
12. How should high scores on the DOF
problem scales be interpreted?
Answer: The DOF Profile shows the letter B
next to T scores that fall in the borderline clinical
range and the letter C next to T scores that fall in
the clinical range for each of the problem scales.
Scores in the borderline range warrant concern but
are not as clearly deviant as those in the clinical
range. For the syndrome scales and DSM-oriented
scales, T scores >69 (>97th percentile) are considered
to be in the clinical range, while T scores of
65 to 69 (93rd to 97th percentiles) are considered
to be in the borderline range. For Total Problems,
T scores >63 (>90th percentile) are in the clinical
range, while T scores of 60 to 63 (84th to 90th
percentiles) are in the borderline range. For certain
purposes, such as screening to identify children
who are at risk for problems, users may choose
to use lower cutpoints on the problem scales than
those that demarcate the borderline or clinical
range.
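The cutpoints given in this answer can be summarized in a short function. This is a sketch for illustration; the function name and the label "normal" for scores below the borderline range are assumptions, and the DOF Profile itself marks these ranges with the letters B and C.

```python
def problem_scale_range(t_score, total_problems=False):
    """Classify a problem-scale T score as 'normal', 'borderline', or 'clinical'."""
    # Total Problems uses lower cutpoints than the syndrome and DSM-oriented scales.
    borderline_min, clinical_above = (60, 63) if total_problems else (65, 69)
    if t_score > clinical_above:
        return "clinical"
    if t_score >= borderline_min:
        return "borderline"
    return "normal"
```

A T score of 64, for example, falls below the borderline range on a syndrome scale but in the clinical range on Total Problems.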
13. Should extremely low scores on the
DOF problem scales be considered
deviant?
Answer: No. Extremely low scores on the prob-
lem scales merely reflect an absence of problems
observed for a particular time frame and setting.
Because children may manifest problems that are
concentrated in particular areas, it is not unusual
for profiles to have high scores on some scales but
low scores on other scales. Low scores on the prob-
lem scales do not necessarily mean that problems
are absent in other contexts, such as the home or
other settings in school.
14. How should DOF On-task scores be
interpreted?
Answer: As explained in Chapter 2, observers
rate on-task behavior by marking boxes for on-task
or off-task that represent the last 5 seconds of each
1-minute interval over each 10-minute observation
period. Total On-task scores can thus range from 0
to 10 for each observation. The DOF Module for
computer scoring averages On-task scores across
multiple observation sessions. Low On-task scores
warrant clinical concern, in contrast to high scores
for the problem scales. The DOF Profile displays
mean raw scores, T scores and percentiles for On-task.
T scores <31 (<3rd percentile) are considered
to be in the clinical range, while T scores of 31 to
35 (3rd to 7th percentiles) are in the borderline range.
The On-task mean raw score can also be translated
into a percentage of on-task behavior, as done in
the DOF Narrative Report.
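The On-task computations described in this answer can be sketched as follows. The function names and the label "normal" are hypothetical, and the percentage is assumed to be the mean raw score divided by the maximum of 10; the sketch does not reproduce the conversion of raw scores to T scores.

```python
def on_task_score(intervals):
    """Count the 1-minute intervals marked on-task (True) in one 10-minute observation."""
    if len(intervals) != 10:
        raise ValueError("expected 10 interval ratings")
    return sum(intervals)


def on_task_summary(observations):
    """Mean On-task raw score across observations and the matching percentage."""
    scores = [on_task_score(obs) for obs in observations]
    mean_score = sum(scores) / len(scores)
    return mean_score, 100 * mean_score / 10


def on_task_range(t_score):
    """Unlike the problem scales, low On-task T scores are the ones of concern."""
    if t_score < 31:
        return "clinical"
    if t_score <= 35:
        return "borderline"
    return "normal"


# Two hypothetical observations: on-task in 8 and in 6 of the 10 intervals.
obs_1 = [True] * 8 + [False] * 2
obs_2 = [True] * 6 + [False] * 4
mean_score, percent = on_task_summary([obs_1, obs_2])
```

In this example the mean On-task raw score is 7, corresponding to on-task behavior in 70% of the sampled intervals.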
15. How are clinical interpretations of
the DOF Profile made?
Answer: The DOF is designed to provide a standardized
description of a child's behavioral and
emotional problems observed in school classrooms,
at recess, or in other comparable group settings.
The T scores and percentiles for the problem scales
and On-task provide a basis for comparing an in-
dividual child to normative samples of peers of the
same gender. The scale scores on the DOF Profile
can also be compared with analogous scale scores
on CBCL/6-18, TRF, YSR, SCICA, and TOF pro-
files to identify similarities and differences between
problems reported by different informants in dif-
ferent situations. Information from all available
sources should then be integrated to form a comprehensive
picture of the child's functioning, as illustrated
in case examples in Chapter 5.
APPLICATIONS OF THE DOF
1. Who should complete the DOF?
Answer: The DOF can be completed by anyone
who has sufficient understanding of the required
observation and rating procedures, as described
in Chapter 2. Observers can be teachers'
aides or other school paraprofessionals, undergraduate
or graduate students, and research assistants,
as well as professionals in education, school
psychology, clinical psychology, and related disci-
plines. Paraprofessionals and students should use
the DOF under the supervision of a qualified pro-
fessional who has knowledge of the theory and
methodology of standardized assessment. Chapter
4 provides guidelines for training DOF observers
and conducting school observations.
2. When should the DOF be completed?
Answer: The DOF should be completed imme-
diately after the 10-minute observation for which
it was used. Observers should complete a separate
DOF for each 10-minute observation.
3. Can the DOF be used below age 6 or
above age 11?
Answer: The 2009 version of the DOF was
normed for ages 6 to 11. The DOF may be appro-
priate for younger children in group settings, such
as Kindergarten or preschool, and children older
than age 11. However, the farther the departure
from the 6-11-year-old norms, the less appropriate
the percentiles and T scores may be for interpreting
a child's DOF Profile. Researchers who plan
to analyze only DOF raw scores (not T scores) may
choose to use the DOF for observations of children
outside of the 6-11 age range.
4. Can the DOF be used to assess child-
ren who have physical or mental dis-
abilities?
Answer: The DOF provides a standardized de-
scription of observed behavior. If a child has a
physical or mental disability, then observed behav-
ior must be interpreted with this in mind. How-
ever, children with physical and mental disabili-
ties were excluded from the DOF normative
sample. T scores and percentiles on the DOF Pro-
file therefore provide comparisons only to peers
without disabilities.
RELATIONS TO OTHER ASSESSMENT
PROCEDURES
1. Can other procedures for assessing
behavioral and emotional problems be
used with the DOF?
Answer: The DOF obtains samples of behav-
ioral and emotional problems observed during
multiple 10-minute observations of children in
group settings. As explained in Chapter 2, users
can include up to six DOFs in one observation
set for each identified child and up to six DOFs
for each of two control children matched to the
identified child. Scores from one observation set
can then be compared with scores from another
observation set for the same identified child. For
example, you might choose to compare DOF scores
for observations completed at the beginning of the
school year (e.g., observation set = Fall 2009) and
a second set of observations completed at the end
of the school year (e.g., observation set = Spring
2010). Or you might compare sets of observations
done before and after an intervention. By compar-
ing DOF scores obtained from observation sets for
different time periods, users can distinguish be-
tween problems that are quite consistent across
time versus those that are more variable. Users may
also choose to compare observation sets for differ-
ent situations, such as math class versus reading
class. In addition, the DOF scores can be compared
with the scores on analogous scales of other
ASEBA forms, which include counterparts of many
DOF items and scales. Assessment data obtained
from interviews of children, parents, and teachers,
medical exams, cognitive and achievement tests,
and behavioral and family assessment can also be
compared with DOF data to provide a comprehen-
sive basis for assessment, as discussed in Chap-
ters 1 and 5.
2. How do DOF scales compare with
scales scored from other ASEBA
forms?
Answer: Although the scales of other ASEBA
forms were derived independently from item pools,
samples of participants, and raters that differ from
those of the DOF, the following DOF scales have
counterparts on profiles scored from most ASEBA
forms for children and youth: Sluggish Cognitive
Tempo (similar to 2007 CBCL/6-18 and TRF Slug-
gish Cognitive Tempo), Immature/Withdrawn
(similar to CBCL/6-18 and TRF Withdrawn/De-
pressed), Attention Problems (similar to CBCL/6-
18 and TRF Attention Problems), Oppositional
(similar to TOF Oppositional and CBCL/6-18 and
TRF Aggressive Behavior), the DSM-oriented At-
tention Deficit/Hyperactivity Problems scale and
Inattention and Hyperactivity-Impulsivity
subscales (similar to CBCL/6-18, TRF, and TOF
DSM-oriented Attention Deficit/Hyperactivity
Problems scale and Inattention and Hyperactivity-
Impulsivity subscales). DOF Total Problems scores
can also be compared to Total Problems on other
ASEBA forms.
3. What if there are differences between
a child's pattern of problems on the
DOF Profile versus the child's
patterns of problems on other ASEBA
profiles?
Answer: Discrepancies between findings from
different assessment procedures can be as informative
as similarities. For example, if the DOF was
used to obtain observations of a child in a particular
school classroom, then DOF scale scores could
be compared with analogous scale scores on the TRF
completed by the child's teacher in the same classroom
and the CBCL/6-18 completed by one or both
parents, to see whether the child's observed behavior
was different from behavior reported by the teacher
or the parents. If a child has more than one teacher,
observations with the DOF could be done in both
teachers' classrooms and then scored on separate
DOF Profiles. If the DOF scores from the two classroom
settings differed on certain scales, then further
observations and interviews with the teachers
would be appropriate to determine why the child
may behave differently in the two classrooms.
Comparisons of DOF scores with test session ob-
servations scored on the TOF and interviewer rat-
ings scored on the SCICA may also help to docu-
ment discrepancies and consistencies between
problems observed in group settings, such as school
classrooms, versus elsewhere.
RELATIONS TO DSM AND SPECIAL
EDUCATION CLASSIFICATIONS
1. How can the DOF contribute to an
ADHD diagnosis and other DSM
diagnoses?
Answer: Because the DSM criteria for behav-
ioral and emotional disorders are not defined in
terms of specific assessment procedures, scores on
the DOF items and scales may be combined with
other kinds of data in judging whether the criteria
for DSM diagnoses are met. The 23 items of the
DOF DSM-oriented Attention Deficit/Hyperactiv-
ity Problems scale have fairly clear counterparts
among the symptom criteria for ADHD as defined
by DSM-IV (American Psychiatric Association,
1994) and DSM-IV-TR (American Psychiatric
Association, 2000). High scores on the DOF At-
tention Problems syndrome may also suggest that
DSM criteria for ADHD should be considered.
High scores on the DOF Oppositional syndrome
may suggest that DSM criteria for Oppositional
Defiant Disorder should be considered. At the same
time, to formulate DSM diagnoses, DOF results
must be combined with other assessment data, in-
cluding parent reports, teacher reports, and test
results as appropriate. The CBCL/6-18, TRF, and
TOF also have an Attention Problems syndrome
and a DSM-oriented Attention Deficit/Hyperactiv-
ity Problems scale, as well as Inattention and Hy-
peractivity-Impulsivity subscales, that can contrib-
ute useful information for making DSM diagnoses
of ADHD.
2. How can the DOF be used in
determining eligibility for special
education according to disability
categories, such as those defined by
the 2004 Individuals with Disabilities
Education Improvement Act (IDEA
2004)?
Answer: Categories of educational disabilities
are not defined in terms of specific tests and other
assessment procedures. However, IDEA 2004 does
require direct observations of children in school
as part of a comprehensive assessment for deter-
mining eligibility for special education services.
The DOF provides a structured format for conduct-
ing observations and produces a standardized pro-
file that documents the results of the observations.
The DOF, along with other ASEBA forms, can thus
provide important quantitative data for judging
whether children have the kinds of problems for
which particular special education services are in-
tended, as discussed in Chapter 5.
References
Abramowitz, M., & Stegun, I. A. (1968). Handbook
of mathematical functions. Washington, DC: National
Bureau of Standards.
Achenbach, T. M. (1966). The classification of
children's psychiatric symptoms: A factor-analytic
study. Psychological Monographs, 80 (No. 615).
Achenbach, T. M. (1981). The Direct Observation
Form of the Child Behavior Checklist (rev. ed.).
Burlington, VT: University of Vermont, Department
of Psychiatry.
Achenbach, T. M. (1986). The Direct Observation
Form of the Child Behavior Checklist (rev. ed.).
Burlington, VT: University of Vermont, Department
of Psychiatry.
Achenbach, T. M. (1991a). Manual for the Child Behavior
Checklist/4-18 and 1991 Profile. Burlington,
VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M. (1991b). Manual for the Teacher's
Report Form and 1991 Profile. Burlington, VT: University
of Vermont, Department of Psychiatry.
Achenbach, T. M. (1991c). Manual for the Youth Self-
Report and 1991 Profile. Burlington, VT: University
of Vermont, Department of Psychiatry.
Achenbach, T. M., & Edelbrock, C. (1983). Manual
for the Child Behavior Checklist/4-18 and Revised
Child Behavior Profile. Burlington, VT: University
of Vermont, Department of Psychiatry.
Achenbach, T. M., & Edelbrock, C. (1986). Manual
for the Teacher's Report Form and Teacher Version
of the Child Behavior Profile. Burlington, VT: University
of Vermont, Department of Psychiatry.
Achenbach, T. M., & Edelbrock, C. (1987). Manual
for the Youth Self-Report and Profile. Burlington, VT:
University of Vermont, Department of Psychiatry.
Achenbach, T. M., & Lewis, M. (1971). A proposed
model for clinical research and its application to encopresis
and enuresis. Journal of the American Academy
of Child Psychiatry, 10, 535-554.
Achenbach, T. M., & McConaughy, S. H. (1997).
Empirically based assessment of child and adolescent
psychopathology: Practical applications. Thousand
Oaks, CA: Sage.
Achenbach, T. M., & Rescorla, L. A. (2000). Manual
for the ASEBA Preschool Forms & Profiles.
Burlington, VT: University of Vermont, Department
of Psychiatry.
Achenbach, T. M., & Rescorla, L. A. (2001). Manual
for the ASEBA School Age Forms & Profiles.
Burlington, VT: University of Vermont, Research
Center for Children, Youth, & Families.
American Educational Research Association, American
Psychological Association, & National Council
on Measurement in Education. (1999). Standards
for educational and psychological testing. Washington,
DC: American Educational Research Association.
American Psychiatric Association. (1994). Diagnostic
and statistical manual of mental disorders-fourth
edition. Washington, DC: Author.
American Psychiatric Association. (2000). Diagnostic
and statistical manual of mental disorders-fourth
edition-text revision. Washington, DC: Author.
Barkley, R. A. (2006). Attention deficit hyperactivity
disorder: A handbook for diagnosis and treatment
(3rd ed.). New York: Guilford Press.
Browne, M. W., & Cudeck, R. (1993). Alternative
ways of assessing model fit. In K. A. Bollen, & J. S.
Long (Eds.), Testing structural equation models (pp.
136-162). Newbury Park, CA: Sage.
Cohen, J. (1988). Statistical power analysis for the
behavioral sciences (2nd ed.). New York: Academic
Press.
Chafouleas, S. M., Christ, T. J., Riley-Tillman, T. C.,
Briesch, A. M., & Chanese, J. A. M. (2007).
Generalizability and dependability of direct behav-
ior ratings to assess social behavior of preschoolers.
School Psychology Review, 36, 63-79.
Crocker, L., & Algina, J. (1986). Introduction to clas-
sical and modern test theory. New York: Holt,
Rinehart and Winston.
Cronbach, L.J. (1951). Coefficient alpha and the in-
ternal structure of tests. Psychometrika, 16, 297-334.
DuPaul, G.J., & Stoner, G. (2003). ADHD in the
schools (2nd ed.). New York: Guilford Press.
Gove, P. (Ed.). (1971). Webster's third new international
dictionary of the English language. Springfield,
MA: Merriam.
Guze, S. (1978). Validating criteria for psychiatric di-
agnosis: The Washington University approach. In
H.S. Akiskal & W.L. Webb (Eds.), Psychiatric diag-
nosis: Exploration of biological predictors (pp. 49-
59). New York: Spectrum.
Hintze, J. M. (2005). Psychometrics of direct obser-
vation. School Psychology Review, 34, 507-519.
Hintze, J. M., & Matthews, W. J. (2004). The
generalizability of systematic direct observations
across time and settings: A preliminary investigation
of the psychometrics of behavioral observation.
School Psychology Review, 33, 258-270.
Frick, P. J., Lahey, B. B., Applegate, B., Kerdyck, L.,
Ollendick, T., Hynd, G. W., et al. (1994). DSM-IV
field trials for the disruptive behavior disorders:
Symptom utility estimates. Journal of the American
Academy of Child and Adolescent Psychiatry, 33, 529-
539.
Individuals with Disabilities Education Improvement
Act of 2004. Public law. No. 108-446, 118 Stat. 2647
(2004). [Amending 20 U.S.C. 1400 et seq.]
Joint Committee on Testing Practices. (2004). Code
of fair testing practices in education. Washington,
D.C.: American Psychological Association.
Kaufman, A.S., & Kaufman, N.L. (1983). Kaufman
Assessment Battery for Children. Circle Pines, MN:
American Guidance Service.
Leff, S. S., & Lakin, R. (2005). Playground-based
observational systems: A review and implications for
practitioners and researchers. School Psychology
Review, 34, 475-489.
Loehlin, J. C. (1998). Latent variable models: An
introduction to factor, path, and structural analysis
(3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associ-
ates.
McConaughy, S.H. (2005). Clinical interviews for
children and adolescents: Assessment to intervention.
New York: Guilford Press.
McConaughy, S.H., & Achenbach, T.M. (2003). Di-
rect Observation Form (Research Ed.). Burlington,
VT: University of Vermont, Research Center for Chil-
dren, Youth, & Families.
McConaughy, S. H., & Achenbach, T. M. (2001).
Manual for the Semistructured Clinical Interview for
Children and Adolescents-Second Edition.
Burlington, VT: University of Vermont, Research
Center for Children, Youth, & Families.
McConaughy, S.H., & Achenbach, T.M. (2004).
Manual for the Test Observation Form for Ages 2-
18. Burlington, VT: University of Vermont, Research
Center for Children, Youth, & Families.
McConaughy, S. H., Achenbach, T. M., & Gent, C. L.
(1988). Multiaxial empirically-based assessment:
Parent, teacher, observational, cognitive, and person-
ality correlates of Child Behavior Profiles for 6-11
year-old boys. Journal of Abnormal Child Psychol-
ogy, 16, 485-509.
McConaughy, S. H., Kay, P. J., & Fitzgerald, M.
(1999). The Achieving, Behaving, Caring Project for
preventing ED: Two-year outcomes. Journal of Emo-
tional and Behavioral Disorders, 7, 224-239.
McConaughy, S. H., Kay, P. J., & Fitzgerald, M.
(1998). Preventing SED through parent-teacher action
research and social skills instruction: First-year out-
comes. Journal of Emotional and Behavioral Disor-
ders, 6, 81-93.
McConaughy, S.H., Mattison, R.E., & Peterson, R.L.
(1994). Behavioral/emotional problems of children
with serious emotional disturbance and learning dis-
abilities. School Psychology Review, 23, 81-98.
McConaughy, S.H., & Ritter, D. (2008). Best prac-
tices in multimethod assessment of emotional and
behavioral disorders. In A. Thomas & J. Grimes
(Eds.), Best practices in school psychology-V, Vol-
ume 3 (pp. 697-716). Bethesda, MD: National As-
sociation of School Psychologists.
McConaughy, S.H., & Skiba, R. (1993). Comorbidity
of externalizing and internalizing problems. School
Psychology Review, 22, 419-434.
Muthén, L.K., & Muthén, B.O. (2001). Mplus User's
Guide (Version 2). Los Angeles, CA: Muthén &
Muthén.
Naglieri, J. A., & Das, J. P. (1997). Cognitive Assess-
ment System. Itasca, IL: Riverside Publishing.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1993).
Scaling, norming, and equating. In R.L. Linn (Ed.),
Educational measurement (3rd ed., pp. 221-262).
Washington, D.C.: American Council on Education.
Reed, M. L., & Edelbrock, C. (1983). Reliability and
validity of the Direct Observation Form of the Child
Behavior Checklist. Journal of Abnormal Child Psy-
chology, 11, 521-530.
Rehabilitation Act of 1973, Section 504. (1973). 29
U.S.C. 706, 1996; 504 [ 30 C.F.R Part 104].
Roid, G. H. (2003). Stanford-Binet Intelligence Scales,
Fifth Edition. Itasca, IL: Riverside Publishing.
Sakoda, J. M., Cohen, B. H., & Beall, G. (1954). Test
of significance for a series of statistical tests. Psy-
chological Bulletin, 51, 172-175.
Sattler, J. M. (2008). Assessment of children: Cogni-
tive applications (5th ed.). La Mesa, CA: Author.
Sattler, J. M., & Hoge, R. D. (2006). Assessment of
children: Behavioral, social, and clinical foundations
(5th ed.). La Mesa, CA: Jerome Sattler, Publisher, Inc.
Shapiro, E. S., & Heick, P. (2004). School psycholo-
gist assessment practices in the evaluation of students
referred for social/behavioral/emotional problems.
Psychology in the Schools, 41, 551-561.
Shapiro, E. S., & Kratochwill, T. R. (2000).
Introduction: Conducting a multidimensional behav-
ioral assessment. In E.S. Shapiro & T.R. Kratochwill
(Eds.), Conducting school-based assessments of child
and adolescent behavior (pp. 1-20). New York:
Guilford.
Skansgaard, E. P., & Burns, G. L. (1998). Compari-
son of DSM-IV ADHD combined and predominantly
inattention types: Correspondence between teacher
ratings and direct observations of inattentive, hyper-
activity/impulsivity, slow cognitive tempo, opposi-
tional defiant, and overt conduct disorder symptoms.
Child & Family Behavior Therapy, 20, 1-14.
SPSS. (2007). SPSS Base 15.1 User's Guide. Chi-
cago, IL: SPSS.
Tilly, W.D. (2008). The evolution of school psychol-
ogy to science-based practice: Problem solving and
the three-tiered model. In A. Thomas & J. Grimes
(Eds.), Best practices in school psychology-V, Vol-
ume 1, (pp. 17-36). Bethesda, MD: National Asso-
ciation of School Psychologists.
Volpe, R.J., & McConaughy, S.H. (2005). (Guest Edi-
tors). Systematic direct observational assessment of
student behavior: Its use and interpretation in mul-
tiple settings: An introduction to the Mini-series.
School Psychology Review, 34, 451-453.
Volpe, R. J., DiPerna, J. C., Hintze, J. M., & Shapiro,
E. S. (2005). Observing students in classroom set-
tings: A review of seven coding systems. School Psy-
chology Review, 34, 454-474.
Wechsler, D. (2002). Wechsler Individual Achieve-
ment Tests-Second Edition. San Antonio, TX: Psy-
chological Corporation.
Wechsler, D. (2003). Wechsler Intelligence Scale
for Children-Fourth Edition. San Antonio, TX: Psy-
chological Corporation.
Wilson, M.S., & Reschly, D.J. (1996). Assessment in
school psychology training and practice. School Psy-
chology Review, 25, 9-23.
Woodcock, R.W., McGrew, K., & Mather, N. (2001).
Woodcock-Johnson III. Itasca, IL: Riverside Publish-
ing.
Appendix B
APPENDIX D
ITEMS COMPRISING THE 2009 DOF AND THE 1986 DOF
| 2009 DOF Items | 1986 DOF Items |
|---|---|
| 1. Acts too young for age | 1. Acts too young for age |
| 2. Makes odd noises | 2. Makes odd noises |
| 3. Argues | 3. Argues |
| 4. Cheats | 4. Cheats |
| 5. Defiant or talks back to staff | 5. Defiant or talks back to staff |
| 6. Brags, boasts | 6. Bragging, boasting |
| 7. Doesn't concentrate or doesn't pay attention for long | 7. Doesn't concentrate or doesn't pay attention for long |
| 8. Difficulty waiting turn in activities or tasks | 8. Can't get mind off certain thoughts; obsessions (describe): |
| 9. Doesn't sit still, restless, or hyperactive | 9. Doesn't sit still, restless, or hyperactive |
| 10. Clings to adults or too dependent | 10. Clings to adults or too dependent |
| 11. Confused or seems to be in a fog | 11. Confused or seems to be in a fog |
| 12. Cries | 12. Cries |
| 13. Fidgets, including with objects | 13. Fidgets, including with objects |
| 14. Cruel, bullies, or mean to others | 14. Cruelty, bullying, or meanness |
| 15. Daydreams or gets lost in thoughts | 15. Daydreams or gets lost in thoughts |
| 16. Difficulty following directions | 16. Deliberately harms self |
| 17. Tries to get attention of staff | 17. Tries to get attention of staff |
| 18. Destroys own things | 18. Destroys own things |
| 19. Destroys property belonging to others | 19. Destroys property belonging to others |
| 20. Disobedient | 20. Disobedient |
| 21. Disturbs other children | 21. Disturbs other children |
| 22. Doesn't seem to feel guilty after misbehaving | 22. Doesn't seem to feel guilty after misbehaving |
| 23. Doesn't seem to listen to what is being said | 23. Shows jealousy |
| 24. Eats, drinks, chews, or mouths things that are not food, excluding junk foods (describe): | 24. Eats, drinks, chews, or mouths things that are not food, excluding tobacco and junk foods (describe): |
| 25. Difficulty organizing activities or tasks | 25. Shows fear of specific situations or stimuli (describe): |
| 26. Fails to give close attention to details | 26. Says no one likes him/her |
| 27. Forgetful in activities or tasks | 27. Says others are out to get him/her |
| 28. Out of seat | 28. Expresses feelings of worthlessness or inferiority |
| 29. Gets hurt, accident prone | 29. Gets hurt, accident prone |
| 30. Gets in physical fights | 30. Gets in physical fights |
| 31. Gets teased | 31. Gets teased |
| 32. Interrupts | 32. Hears things that aren't there (describe): |
| 33. Impulsive or acts without thinking, including calling out in class | 33. Impulsive or acts without thinking, including calling out in class |
| 34. Physically isolates self from others | 34. Physically isolates self from others |
| 35. Lies | 35. Lying |
| 36. Bites fingernails | 36. Bites fingernails |
| 37. Nervous, highstrung, or tense | 37. Nervous, highstrung, or tense |
| 38. Nervous movements, twitching, tics, or other unusual movements (describe): | 38. Nervous movements, twitching, tics or other unusual movements (describe): |
| 39. Loses things | 39. Overconforms to rules |
| 40. Too fearful or anxious | 40. Too fearful or anxious |
| 41. Physically attacks people | 41. Physically attacks people |
| 42. Picks or scratches nose, skin, or other parts of body (describe): | 42. Picks or scratches nose, skin, or other parts of body (describe): |
| 43. Runs about or climbs excessively | 43. Falls asleep |
| 44. Apathetic, unmotivated, or won't try | 44. Apathetic, unmotivated, or won't try |
| 45. Responds before instructions are completed | 45. Refuses to talk |
| 46. Disrupts group activities | 46. Disrupts group activities |
| 47. Screams | 47. Screams |
| 48. Secretive, keeps things to self, including refusal to show things to teacher | 48. Secretive, keeps things to self, including refusal to show things to teacher |
| 49. Avoids or is reluctant to do tasks that require sustained mental effort | 49. Sees things that aren't there (describe): |
| 50. Self-conscious or easily embarrassed | 50. Self-conscious or easily embarrassed |
| 51. Slow to respond verbally | 51. Sexual activity (describe): |
| 52. Shows off, clowns, or acts silly | 52. Shows off, clowns, or acts silly |
| 53. Shy or timid | 53. Shy or timid |
| 54. Explosive or unpredictable behavior | 54. Explosive or unpredictable behavior |
| 55. Demands must be met immediately, easily frustrated | 55. Demands must be met immediately, easily frustrated |
| 56. Easily distracted by external stimuli | 56. Easily distracted by external stimuli |
| 57. Stares blankly | 57. Stares blankly |
| 58. Speech problem (describe) | 58. Acts like feelings are hurt when criticized |
| 59. Wants to quit or does quit tasks | 59. Steals |
| 60. Yawns | 60. Stores up things he/she doesn't need, except hobby items such as marbles (describe): |
| 61. Strange behavior (describe): | 61. Strange behavior (describe): |
| | 62. Strange ideas (describe): |
| 62. Stubborn, sullen, or irritable [a] | 63. Stubborn, sullen, or irritable |
| | 64. Sudden changes in mood or feelings |
| 63. Sulks [a] | 65. Sulks |
| | 66. Suspicious |
| 64. Swears or uses obscene language [a] | 67. Swears or uses obscene language |
| | 68. Talks about killing self |
| 65. Talks too much [a] | 69. Talks too much |
| 66. Teases [a] | 70. Teases |
| 67. Temper tantrums, hot temper, or seems angry [a] | 71. Temper tantrums, hot temper, or seems angry |
| | 72. Verbal expressions of preoccupation with sex |
| 68. Threatens people [a] | 73. Threatens people |
| 69. Too concerned with neatness or cleanliness [a] | 74. Too concerned with neatness or cleanliness |
| 70. Underactive, slow moving, tired, or lacks energy [a] | 75. Underactive, slow moving, or lacks energy |
| 71. Unhappy, sad, or depressed [a] | 76. Unhappy, sad, or depressed |
| 72. Unusually loud [a] | 77. Unusually loud |
| 73. Overly anxious to please [a] | 78. Overly anxious to please |
| 74. Whining tone of voice [a] | 79. Whining tone of voice |
| 75. Withdrawn, doesn't get involved with others [a] | 80. Withdrawn, doesn't get involved with others |
| | 81. Worries |
| 76. Sucks thumb, fingers, hand, or arm [a] | 82. Sucks thumb, fingers, hand, or arm |
| 77. Fails to express self clearly [a] | 83. Fails to express self clearly |
| 78. Impatient [a] | 84. Impatient |
| 79. Tattles [a] | 85. Tattles |
| 80. Repeats behavior over & over; compulsions (describe): [a] | 86. Repeats behavior over & over; compulsions (describe): |
| 81. Easily led by peers [a] | 87. Easily led by peers |
| 82. Clumsy, poor motor control [a] | 88. Clumsy, poor motor control |
| 83. Doesn't get along with peers [a] | 89. Doesn't get along with peers |
| 84. Runs out of class (or similar setting) [a] | 90. Runs out of class (or similar setting) |
| 85. Behaves irresponsibly (describe): [a] | 91. Behaves irresponsibly (describe): |
| 86. Bossy [a] | 92. Bossy |
| | 93. Plays with younger children |
| 87. Complains [a] | 94. Complains |
| 88. Afraid to make mistakes [a] | 95. Afraid to make mistakes |
| | 96. Acts like poor loser |
| 89. Other problems not listed above: [a] | 97. Other problems (specify): |

Note. Bold font in the first column shows new items added to the 2009 DOF. Bold italic font in the second column shows 1986 DOF items that were not included on the 2009 DOF.
[a] Items 62 through 89 on the 2009 DOF have counterparts on the 1986 DOF, but the item numbers were changed, as can be seen by comparing item numbers across columns.
A
Abramowitz, M., 84, 114
Achenbach, T.M., 2-3, 30, 56, 58, 62, 71-73, 78,
81, 86, 97-98, 114-116
ADHD, 61, 65, 73, 82, 109
Aggressive Behavior, 1, 33, 37, 79, 81
Algina, J., 84, 115
Alpha, 94-96
American Educational Research Association, 114
American Psychiatric Association, 1, 30, 58, 73, 81, 114
Applegate, B., 115
Attention Deficit/Hyperactivity Problems, 1, 23, 30-32, 81-83,
109
Attention Problems, 23, 77, 79
B
Barkley, R.A., 58, 61, 114
Beall, G., 85, 116
Borderline range, 26-28, 30-31, 37, 88, 110
Briesch, A. M., 91, 115
Browne, N. W., 76, 115
Burns, G. L., 72, 78, 116
C
Case Management, 59, 67, 70
CBCL/6-18, 60-63, 65, 68, 70-71, 78-79, 81, 97
Chafouleas, S. M., 91, 115
Chanese, J. A. M., 91, 115
Christ, T. J., 91, 115
Classroom observations, 23-24, 33-35, 66-67, 77,
82, 84-85, 87, 93, 95-96, 99-100, 104, 106, 107
Clinical interpretations, 110
Clinical range, 26-28, 30-31, 37, 88, 110
Cohen, B. H., 85, 116
Cohen, J., 91, 115
Computer-scoring program, 23
Content validity, 97
Continuous recording methods, 2
Control children, 1, 5, 10, 11, 23, 25, 45
Criterion-related validity, 98
Crocker, L., 84, 115
Cronbach, L.J., 91, 94, 96, 115
Cudeck, R., 76, 115
D
Das, J. P., 2, 116
Diagnosis, 112
DiPerna, J. C., 1, 117
Disabilities, 111
DOF profile, 108
DSM, 1, 30, 58, 78, 81-82, 109, 112
DSM-oriented scales, 31, 81, 85
DuPaul, G. J., 61, 115
E
Edelbrock, C., 71-72, 97, 114, 116
Emotional disturbance, 61, 63
Ethnicity, 84, 99, 100, 103-104
F
Factor analyses, 74, 75, 76, 79
Fitzgerald, M., 72, 115, 116
Frick, P. J., 78, 115
Functional behavioral assessment, 60, 69
G
Gender, 100, 103, 108
Gent, C. L., 72, 115
Gove, P., 73, 115
Guidelines for rating problem items, 15
Guze, S., 58, 115
H
Heick, P., 1, 116
Hintze, J. M., 1, 46, 47, 55, 91, 115, 117
Hoge, R. D., 1, 58, 116
Hoover, H. D., 86, 116
Hynd, G. W., 115
Hyperactivity-Impulsivity, 1, 30-32, 81-83, 109
I
ID number, 10
Identified child, 1, 5, 10, 11, 23, 25, 44
Immature/Withdrawn, 23, 77-78, 82
Inattention, 1, 30-32, 81-83, 109
Individualized Education Program (IEP), 59
Individuals with Disabilities Education Improvement, 115
Inter-observer agreement, 46, 48-50, 52-53
Inter-rater reliability, 51, 91-93
Internal consistency, 91, 94
Intrusive, 23, 77
Index
K
Kaufman, A.S., 3, 115
Kaufman, N.L., 3, 115
Kay, P. J., 72, 116
Kerdyck, L., 115
Kolen, M. J., 86, 116
Kratochwill, T. R., 58, 116
L
Lahey, B. B., 115
Lakin, R., 1, 115
Learning disabilities, 62
Leff, S. S., 1, 115
Lewis, M., 97, 114
Loehlin, J. C., 76, 115
Low scores, 110
M
Mather, N., 3, 117
Matthews, W. J., 91, 115
Mattison, R.E., 62, 116
McConaughy, S. H., 2-3, 30, 58, 60, 62, 72-73, 98,
114-116
McGrew, K., 3, 117
Mean T scores, 88
Multiaxial assessment, 2-3
Multidisciplinary team (MDT), 59
Multisource data, 59
Muthén, B.O., 74, 116
Muthén, L.K., 74, 116
N
Naglieri, J. A., 2, 116
Narrative report, 33, 35, 38, 40
Normal range, 26, 28, 30, 31, 37, 88
Normalized T scores, 82-84
Normative samples, 82, 84-85
O
Observation set, 11, 23, 25
Observer's notes, 12, 14
Ollendick, T., 115
On-task, 12-14, 23, 30, 47-50, 87-88, 110
Oppositional, 23, 78-79
Other problems, 23, 28, 33, 37, 79-80, 109
Outcome evaluation, 59, 67, 70
P
Percent agreement index, 46
Percentiles, 25, 28, 30-31, 33, 37, 110
Petersen, N. S., 62, 86, 116
Peterson, R. L., 62, 116
Problem items, 13, 15, 47, 52-54
Profile, 23
R
Recess observations, 36, 38-40, 67, 79, 81, 84-85,
87, 93, 96, 99-100, 102, 104, 106, 107
Reed, M. L., 72, 116
Referral status, 99-100, 103-107
Rehabilitation Act of 1973, 116
Reliability, 91
Reschly, D.J., 1, 117
Rescorla, L. A., 2-3, 30, 56, 58, 72-73, 78, 81, 86,
97-98, 114
Response-to-Intervention (RTI), 60, 67
Riley-Tillman, T. C., 91, 115
Ritter, D., 60, 62, 116
Roid, G. H., 116
S
Sakoda, J. M., 85, 95, 99, 116
Sattler, J. M., 1, 55, 58, 116
School psychologist, 59-60
SCICA, 58, 63, 79, 97
Section 504 Accommodations, 61, 67
Setting, 12
Shapiro, E. S., 1, 58, 116-117
Skansgaard, E. P., 72, 78, 116
Skiba, R., 116
Sluggish Cognitive Tempo, 23, 76-78, 105
Special education, 60, 112
SPSS, 74, 116
Stegun, I.A., 84, 114
Stoner, G., 61, 115
Syndromes, 73, 77, 109
T
T Scores, 25, 28, 30-31, 33, 37, 83, 85-87, 110
Test-retest reliability, 91, 93, 95
Three-tiered model, 60, 69
Tilly, W.D., 60, 116
Time sampling, 2
TOF, 58-59, 61, 63, 66, 78-79, 97
Total problems, 23, 28, 33, 37, 87
Training observers, 41
TRF, 60-63, 66, 68, 70-71, 78-79, 81, 97
V
Validity, 96
Volpe, R.J., 1, 2, 58, 116-117
W
Wechsler, D., 2, 65, 116-117
Wilson, M.S., 1, 117
Woodcock, R.W., 2, 117
Y
YSR, 63, 81, 97
Library of Congress Control Number: xxxxxxx
ISBN 978-1-932975-12-3