Early Childhood Assessment
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, N.W. Washington, DC 20001
NOTICE: The project that is the subject of this report was approved by the
Governing Board of the National Research Council, whose members are
drawn from the councils of the National Academy of Sciences, the National
Academy of Engineering, and the Institute of Medicine. The members of the
committee responsible for the report were chosen for their special competences and with regard for appropriate balance.
The study was supported by Award No. HHSP23320042509XI between
the National Academy of Sciences and the U.S. Department of Health and
Human Services. Any opinions, findings, conclusions, or recommendations
expressed in this publication are those of the author(s) and do not necessarily reflect the view of the organizations or agencies that provided support
for this project.
Library of Congress Cataloging-in-Publication Data
Early childhood assessment : why, what, and how / Committee on
Developmental Outcomes and Assessments for Young Children ; Catherine
E. Snow and Susan B. Van Hemel, editors.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-309-12465-2 (hardcover) ISBN 978-0-309-12466-9 (pdf) 1.
Children with social disabilities--Education (Preschool)--United States.
2. Child development--United States. 3. Competency-based education--
United States. I. Snow, Catherine E. II. Van Hemel, Susan B. III. Committee
on Developmental Outcomes and Assessments for Young Children.
LC4069.2.E37 2008
372.126--dc22
2008038565
Additional copies of this report are available from the National Academies
Press, 500 Fifth Street, N.W., Lockbox 285, Washington, DC 20055; (800)
624-6242 or (202) 334-3313 (in the Washington metropolitan area); Internet,
http://www.nap.edu.
Copyright 2008 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
Suggested citation: National Research Council. (2008). Early Childhood Assessment: Why, What, and How. Committee on Developmental Outcomes and
Assessments for Young Children, C.E. Snow and S.B. Van Hemel, Editors.
Board on Children, Youth, and Families, Board on Testing and Assessment,
Division of Behavioral and Social Sciences and Education. Washington, DC:
The National Academies Press.
Acknowledgments
the report review process, and Eugenia Grohman provided guidance during that process.
This report has been reviewed in draft form by individuals
chosen for their diverse perspectives and technical expertise,
in accordance with procedures approved by the Report Review
Committee of the NRC. The purpose of this independent review
is to provide candid and critical comments that will assist the
institution in making the published report as sound as possible
and to ensure that the report meets institutional standards for
objectivity, evidence, and responsiveness to the study charge. The
review comments and draft manuscript remain confidential to
protect the integrity of the deliberative process.
We thank the following individuals for their participation in
the review of this report: Stephen J. Bagnato, Early Childhood Partnerships, Children's Hospital of Pittsburgh; Virginia Buysse, Child
Development Institute, University of North Carolina at Chapel
Hill; Gayle Cunningham, Executive Director's Office, Jefferson
County Committee for Economic Opportunity, Birmingham,
AL; David Dickinson, Department of Teaching and Learning,
Vanderbilt University; Walter Gilliam, The Edward Zigler Center
in Child Development and Social Policy of the Yale Child Study
Center, Yale University School of Medicine; Robert L. Linn, Department of Education, University of Colorado; Joan Lombardi, The
Children's Project, Washington, DC; Helen Raikes, University of
Nebraska, Lincoln; David M. Thissen, Department of Psychology,
University of North Carolina; and Ross A. Thompson, Department
of Psychology, University of California, Davis.
Although the reviewers listed above have provided many
constructive comments and suggestions, they were not asked to
endorse the conclusions or recommendations, nor did they see the
final draft of the report before its release. The review of this report
was overseen by Aletha C. Huston, Priscilla Pond Flawn Regents
Professor of Child Development, University of Texas at Austin,
and Jack P. Shonkoff, Center on the Developing Child, Harvard
University, as review coordinator and monitor, respectively.
Appointed by the NRC, they were responsible for making sure
that an independent examination of this report was carried out in
accordance with institutional procedures and that all review comments were carefully considered.
Contents
Summary
Part I: Early Childhood Assessment
1 Introduction
2 Purposeful Assessment
3 Perspectives on Early Childhood Learning Standards and Assessment
Part II: Child-Level Outcomes and Measures
4 Screening Young Children
5 Assessing Learning and Development
10 Thinking Systematically
References
Appendixes
A Glossary of Terms Related to Early Childhood Assessment
Index
Summary
The assessment of young children's development and learning has recently taken on new importance. Private and
government organizations are developing programs to
enhance the school readiness of all young children, especially children from economically disadvantaged homes and communities
and children with special needs. These programs are designed to
enhance social, language, and academic skills through responsive
early care and education. In addition, they constitute a site where
children with developmental problems can be identified and
receive appropriate interventions.
Societal and government initiatives have also promoted
accountability for these educational programs, especially those
that are publicly funded. These initiatives focus on promoting
standards of learning and monitoring children's progress in meeting those standards. In this atmosphere, Congress has enacted
such laws as the Government Performance and Results Act and
the No Child Left Behind Act. School systems and government
agencies are asked to set goals, track progress, analyze strengths
and weaknesses in programs, and report on their achievements,
with consequences for unmet goals. Likewise, early childhood
education and intervention programs are increasingly being
asked to prove their worth.
C. Reporting: Maintenance of an integrated database of assessment instruments and results (with appropriate safeguards
of confidentiality) that is accessible to potential users, that
provides information about how the instruments and
scores relate to standards, and that can generate reports for
varied audiences and purposes.
D. Professional development: Ongoing opportunities provided
to those at all levels (policy makers, program directors,
assessment administrators, practitioners) to understand the
standards and the assessments and to learn to use the data
and data reports with integrity for their own purposes.
E. Opportunity to learn: Procedures to assess whether the
environments in which children are spending time offer
high-quality support for development and learning, as well
as safety, enjoyment, and affectively positive relationships,
and to direct support to those that fall short.
F. Inclusion: Methods and procedures for ensuring that all
children served by the program will be assessed fairly,
regardless of their language, culture, or disabilities, and
with tools that provide useful information for fostering
their development and learning.
G. Resources: The assurance that the financial resources
needed to ensure the development and implementation of
the system components will be available.
H. Monitoring and evaluation: Continuous monitoring of the
system itself to ensure that it is operating effectively and
that all elements are working together to serve the interests
of the children. This entire infrastructure must be in place to
create and sustain an assessment subsystem within a larger
system of early childhood care and education.
(S-2) A successful system of assessments must be coherent in a
variety of ways. It should be horizontally coherent, with the
curriculum, instruction, and assessment all aligned with
the early learning and development standards and with the
program standards, targeting the same goals for learning,
and working together to support children's developing
knowledge and skill across all domains. It should be vertically coherent, with a shared understanding at all levels of
the system of the goals for children's learning and development.
Part I
Early Childhood Assessment
1
Introduction
promote infant and child safety and physical health, but societal
attention to children's mental health is much less universal. Education policies, starting with the common school and continuing
through the No Child Left Behind Act of 2001, have been designed
to ensure adequate accomplishments in particular domains; reading and mathematics are almost always included, but science,
history, literature, art, music, and athletics receive more intermittent and contested support. American society has largely avoided
making policies related to positive ethics (how one should
act), consistent with the separation of church and state. The
criminal code can be seen as a set of ethical guidelines focusing
on the negative side (what one should not do), but here as well
the policies relevant to children typically exempt them from full
responsibility even for wrongful actions.
The largest body of child-oriented federal, state, and local
policies focuses on a subset of goals for child development: It is
fairly uncontroversial that society should legislate and appropriate funding to ensure safety and health and to promote academic
achievement. Much less attention has traditionally been devoted
to happiness; trustworthiness; friendship and social relationships;
membership in family, society, or nation; moral development; or
leading a productive life.
One might conceptualize the policies as a map that provides
a distorted representation of the underlying landscape, much
as the Mercator projection of the earth greatly overestimates
the areas of land masses at the poles. The policy projection of
child development has often shrunk the size of social, emotional,
and relational domains to focus on health and academics. This
perspective directly reflects (and may indeed result from) the
researchers' projection and the associated measurement projection. Somewhat more attention has been given by the field of
child development to language, literacy, and cognition than to
happiness, emotional health, friendship, or morality (although
some of these goals are beginning to attract research attention and
to be represented in states' early childhood standards), and the
tools available to measure development in that first set of domains
are more numerous and more precise.
Assessment strategies also traditionally have focused on
rather discrete aspects of a child's functioning, such as vocabulary
to influence the outcomes, for example by preventing malnutrition in pregnant women and infants, or increasing resources for
early childhood education, or promoting time for recess and
active play to reduce obesity. Social policy makers are committed
environmentalists when designing programs, but they too often
forget their environmentalist convictions when dictating ways of
assessing the outcomes of those programs.
Assessment of young children is crucial in meeting a variety
of purposes. It provides information with which caregivers and
teachers can better understand individual children's developmental
progress and status and how well they are learning, and
it can inform caregiving, instruction, and provision of needed
services. It helps early childhood program staff determine how
well they are meeting their objectives for the children they serve,
and it informs program design and implementation. It provides
some of the information needed for program accountability and
contributes to advancing knowledge of child development.
Furthermore, the tools available for assessing young children
and their environments have increased vastly in number and
variety in recent years. Advances in child development research
and demands from educators, evaluation researchers, and policy
makers have converged to provide a dizzying array of assessment
options, thus enhancing the urgency of providing some guidelines for deciding when and what to assess, choosing and using
assessment tools, and interpreting assessment data.
The assessment of young children's development and learning
has taken on new importance as investment in early childhood education rises. Private and government organizations are increasingly
implementing programs for young children, many of them targeted
toward those from disadvantaged homes and communities. These
programs attempt to improve children's chances for optimal development by compensating in various ways for perceived deficiencies. Some of the more intensive interventions include teaching
parenting skills through home visits, providing child care services
that nurture development, and offering such preschool programs
as Head Start and state prekindergarten (pre-K) programs.
At the same time, the last decade or so has seen societal
and government initiatives promoting accountability for such
programs, especially those that are publicly funded. In this
ble, identify opportunities to link measurement improvement strategies within diverse settings (such as educational, developmental,
and pediatric programs for young children) to avoid duplication
and to maximize collaboration and efficiencies.
The committee will provide recommendations to practitioners
and policy makers about criteria for the selection of appropriate
assessment tools for different purposes, as well as how to collect
and use contextual information to interpret assessment results
appropriately for young children. The committee will also develop
a research agenda to improve the quality and suitability of developmental assessment tools that can be used in a variety of early
childhood program and service environments.
information on the stakeholder forum held as part of the committee's information-gathering efforts. Appendix C has information
on the domains included in state pre-K learning standards, as well
as a description of recent state standards development. Appendix
D provides sources for detailed information on assessment instruments. Appendix E contains brief biographical sketches of the
committee members and staff.
2
Purposeful Assessment
designed for this purpose is implemented in the base tier to identify children who are not meeting established educational benchmarks in a high-quality instructional program. Those identified
as not making progress are provided with additional empirically
supported interventions or instructional strategies and their progress is monitored on a regular basis to determine the effectiveness
of the intervention, with additional intervention provided to those
who continue to show limited progress.
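To make the tiered logic concrete, the sketch below (ours, not drawn from the sources cited; the score scale, cutoffs, and tier labels are hypothetical) shows how a benchmark score and a short series of progress-monitoring scores might route a child through the tiers.

```python
def assign_tier(benchmark_score, progress_scores,
                benchmark_cutoff=40, growth_cutoff=2.0):
    """Hypothetical tiered (RTI-style) routing rule.

    Children at or above the benchmark stay in universal instruction (tier 1);
    those below it receive a targeted intervention with regular progress
    monitoring (tier 2); those whose monitored growth remains limited move to
    a more intensive intervention (tier 3). All numbers are illustrative.
    """
    if benchmark_score >= benchmark_cutoff:
        return "tier 1: universal high-quality instruction"
    if len(progress_scores) < 2:
        return "tier 2: targeted intervention, begin regular progress monitoring"
    growth_per_check = (progress_scores[-1] - progress_scores[0]) / (len(progress_scores) - 1)
    if growth_per_check >= growth_cutoff:
        return "tier 2: continue intervention, progress is adequate"
    return "tier 3: intensify intervention"

# Example: a child below benchmark whose monitored growth stays limited.
print(assign_tier(benchmark_score=32, progress_scores=[30, 31, 31]))
```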
Although there is considerable interest in applying tiered
models to preschool, how the principles would be applied has
not been thoroughly developed, and there has been very little
research to date on the application to early education (Coleman,
Buysse, and Neitzel, 2006; VanDerHeyden and Snyder, 2006).
An example of an RTI application for children under age 5 is
a model called Recognition and Response; it is under development as an approach to early identification and intervention for
children with learning disabilities (Coleman, 2006). The developmental and experiential variation in young children presents
challenges for the strict application of RTI's prescribed universal
screening, identification of low-performing children, and tiered
intervention. One concern is whether the early and frequent use
of assessment to single some children out as requiring additional
assistance is necessary, or even potentially harmful, before the
children have had the opportunity to benefit from a high-quality
preschool experience. Much more research is needed on how to
apply the assessment and intervention practices of multitiered
models in a way that is consistent with what is known about
young childrens development.
EVALUATING THE PERFORMANCE OF A PROGRAM OR SOCIETY
Perhaps the most talked-about of the many purposes for
which assessment can be used, especially since the passage
of the No Child Left Behind Act (NCLB) in 2001, is accountability. It is important to note that the term accountability
encompasses a number of distinct purposes, which we attempt
to distinguish here.
Program Effectiveness
If a government or an agency is investing money in a program,
it makes sense to ask the questions "Is this program effective? Is
it meeting our goals?" Assessment designed to evaluate program
effectiveness against a set of externally defined goals is one form
of accountability assessment. This may look a lot like progress
monitoring assessment, and indeed the selection of tools for the
two purposes might be identical. But evaluation differs from
progress monitoring in two key ways. First, progress monitoring
assessment is meant to be useful to those inside the program who
are responsible for day-to-day decisions about curriculum and
pedagogy, whereas evaluation of program effectiveness is useful
to those making decisions about funding, extending, or terminating programs. Second, progress monitoring requires data on all
relevant domains from all children in a program, whereas in many
cases it is possible to evaluate a program's effectiveness by sampling children rather than testing them all, or by using a matrix
design to sample different abilities in different children.
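As a rough illustration of the matrix-design idea (our sketch, not a procedure described in the report; the child identifiers, item-block labels, and sampling fraction are hypothetical), a program evaluation can draw a sample of children and give each sampled child only one block of items, so that every block is covered across the program without testing every child on everything.

```python
import random

def matrix_sample(children, item_blocks, sample_fraction=0.5, seed=0):
    """Sample children and assign each sampled child a single item block,
    rotating through the blocks so each block is administered to roughly
    equal numbers of children across the program."""
    rng = random.Random(seed)
    n_sampled = max(1, int(len(children) * sample_fraction))
    sampled = rng.sample(children, n_sampled)
    return {child: item_blocks[i % len(item_blocks)]
            for i, child in enumerate(sampled)}

# Example: six children, three item blocks, half of the children tested.
print(matrix_sample(["c1", "c2", "c3", "c4", "c5", "c6"],
                    ["literacy", "mathematics", "socioemotional"]))
```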
Using assessments for accountability purposes may seem
simple, but in fact interpreting test data as reflecting the value of
a program can be risky. There are many challenges to the conclusion that a program in which children perform poorly at the end
of the year should be terminated. What if they were extremely
low scorers at program entry and made notable progress, just
not enough to reach the norm or criterion? What if the program
is basically sound but disruptions to financing or staffing led to
poor implementation in this particular year? What if the program is potentially good but investments in needed professional
development or curricular materials were denied? What if the
alternative program in which the children would end up if this
one is terminated is even worse? Challenges like this have been
widely discussed in the context of accountability consequences for
school-age children under NCLB, and they are equally applicable
to programs for preschoolers.
In other words, establishment of program-level accountability
is a legitimate and important purpose for assessment, but not one
that can be sensibly met by sole reliance on child-focused assessment data. Accountability is part of a larger system and cannot be
derived from outcome data alone, or even from pre- and posttest
data, on a set of child assessments. We say more about the importance of the larger system in Chapter 10.
Program Impacts
A more specific purpose for assessing children participating
in a particular program is to evaluate the impact of that program,
ideally in comparison to another well-defined treatment (which
might be no program at all), and ideally in the context of random
assignment of individuals or classrooms to the two conditions.
Under these circumstances, it is possible to evaluate the impact of
the program on childrens performance on the assessments used.
Under these (relatively rarely encountered) ideal experimental
circumstances, it is appropriate to sample children in programs
rather than testing them all, and it is possible, if one is willing to
limit claims about program effectiveness to subsets of children, to
exclude groups of children (English language learners, for example, or children with disabilities) from the assessment regimen.
Social Benchmarking
Another purpose for early childhood assessment that relates
to accountability at a societal level is social benchmarking:
answering questions like "Are 3-year-olds healthier than they
were 20 years ago?" or "How do American 4-year-olds perform
compared with Australian 4-year-olds on emergent literacy
tasks?" Social benchmarking efforts include projects like those
launched by the National Center for Education Statistics (the
Birth Cohort Study, the Early Childhood Longitudinal Study-Kindergarten) and individual states (California's Desired Results
Developmental Profile).
These efforts provide profiles of expectable development
that can be used for comparisons with smaller groups in particular
studies and also as a baseline for comparison with data collected
at a later time. Furthermore, these studies provide policy makers
and the public with a view of what the society is doing well and
not so well at. The movement to develop early learning guidelines
can be seen as a contribution to the social benchmarking effort;
BOX 2-1
Guidelines or Documents Promulgated by
Major Early Childhood Professional Groups
Principles and Recommendations for Early Childhood Assessments (Shepard, Kagan, and Wurtz, 1998). Goal 1 Early Childhood Assessments Resource Group document.
Early Childhood Curriculum, Assessment, and Program Evaluation (and an accompanying extension for English language
learners), a position statement promulgated by the National Association for the Education of Young Children and the National
Association of Early Childhood Specialists in State Departments
of Education (2003).
Promoting Positive Outcomes for Children with Disabilities:
Recommendations for Curriculum, Assessment, and Program
Evaluation from the Division for Early Childhood (2007).
Council of Chief State School Officers' set of documents on
Building an Assessment System to Support Successful Early
Learners (undated, but circa 2003a, 2003b).
3
Perspectives on Early Childhood
Learning Standards and Assessment
BOX 3-1
The Development of Major Early Learning Standards
1989 Goal 1, "All children ready to learn," articulated by the
nation's governors at education summit
1995 Publication of Reconsidering Children's Early Development
and Learning (Kagan, Moore, and Bredekamp, 1995)
1998 Publication of Preventing Reading Difficulties (National
Research Council, 1998)
The risk of children being unfairly denied program participation based on what they do or do not know.
The risk that responsibility for meeting the standards will
shift from the adults charged with providing high-quality
learning opportunities to very young children.
Whether high-quality teaching will be undermined by
the pressure to meet standards, causing the curriculum to
become rigid and focused on test content and the erosion of
a child-centered approach to curriculum development and
instructional practices.
Whether switching to child outcome standards as the sole
criterion for determining the effectiveness of programs or
personnel is unfair. Early childhood services continue to be
underresourced, and poor child outcomes may reflect the
lack of resources.
Misunderstanding of how to achieve standards frequently
appears to engender more teacher-centered, didactic
practices.
Although these concerns cannot be dismissed, it is important
to note that early learning standards were developed as a tool to
improve program quality for all children. Their rapid development has resulted from a combination of policy shifts and an
emerging practitioner consensus, influenced by a number of
factors:
The standards-setting activity in K-12 education, which
gained momentum after the 1990 establishment of the
National Education Goals Panel and the subsequent passage of Goals 2000 by Congress in 1994. This act and its
accompanying funding led states to develop or refine K-12
standards in at least the areas of English language arts,
mathematics, science, and history.
Greater understanding about the capabilities of young
children. Earlier work of the National Research Council
(NRC) has played a key role in informing and developing
that understanding and thereby supporting the development of early learning standards. The most influential NRC
document influencing the development of standards for
childhood instruments, along with teacher reports, parent reports, and observation, to assess numerous cognitive and socioemotional outcomes. It follows
children from their Head Start experiences through kindergarten and through
the 1997 cohort into first grade (U.S. Department of Health and Human Services,
Administration for Children and Families, 2006a, available: http://www.acf.hhs.
gov/programs/opre/hs/faces/index.html).
From Thomas Schultz via personal communication with committee member
Harriet Egertson.
[Figure: pyramid relating Head Start program processes to the outcome of the child's social competence; objectives shown include enhancing children's growth and development, strengthening families as the primary nurturers of their children, and ensuring well-managed programs that involve parents in decision making.]
domains because of the difficulty in finding high-quality instruments that would meet NRS requirements. Most of the items in
the NRS battery were taken from existing assessment instruments
that had been used in Head Start research or in local Head Start
assessment programs.
A Spanish-language version of the assessment was developed
as well. In the first year of implementation, it was administered
after the English version to children whose home language was
Spanish and who passed a Spanish language screener. Thus all
children were assessed in English or Spanish only if they had
passed the screener for that language.
The NRS aroused much concern on the part of some early
childhood experts. More than 200 educators, researchers, and
practitioners signed letters to Congress in early 2003 laying out
their concerns about the NRS, along with some suggested ways
to improve it. The letters ended with the following words: "If we
can move ahead on adopting a matrix sampling design for the
proposed Reporting System; if we can ensure that the System is
composed of subtests that are reliable, valid, and fair; and if we
can have adequate time to learn how to mount this historically
largest-ever effort to test young children without creating chaos
and confusion, then we will have created a system that has a
chance of assisting young, at-risk children" (Meisels et al., 2003).
In May 2005, the Government Accountability Office (GAO)
released a report on the first year of implementation of the NRS
(U.S. Government Accountability Office, 2005). In it, the GAO
identified several weaknesses in the system and its implementation, noting: "Currently, results from the first year of the NRS are
of limited value for accountability purposes because the Head
Start Bureau has not shown that the NRS meets professional
standards for such uses, namely that (1) the NRS provides reliable
Among the other criticisms of the NRS was dissatisfaction with the omission of
any measure of socioemotional development. A socioemotional component, based
on teacher observations over a 1-month period, was added to the NRS as of the fall
2006 administration. For that administration, teachers were asked to assess only
children who had been in the program for at least 4 weeks. It included items asking
the teacher to report on approaches to learning, cooperative classroom behavior,
relations with other children, and behavior problems (U.S. Department of Health
and Human Services, Administration for Children and Families, 2006b).
Part II
Child-Level Outcomes and Measures
4
Screening Young Children
This list does not include many purposes typical of assessment for older preschoolers, such as evaluation of intervention
strategies, prediction of future competencies, or assessment of
skills that are fundamental for success in a classroom environment, such as ease of gaining the child's attention and ability to
sustain it. The focus is on the identification of possible developmental problems at an early age, in part, we argue, because of the
relatively undifferentiated nature of developmental organization
in early infancy and the associated difficulty of making precise
predictions to later abilities. We note also that in spite of wide
agreement that screening and monitoring of the development of
these youngest children is important, pediatricians still do not
fully agree on the most important domains to measure or the best
measures to use (McCormick, 2008).
Most of the assessment conducted in this age range is actually
screening to identify potential problems, to be followed by more
definitive diagnostic assessment. The principles of a good screening program are thus relevant (Wilson and Jungner, 1968):
a valid and reliable measure,
acceptability to the population being screened and their
parents or guardians,
facilities to conduct the screening,
facilities to ensure follow-up and treatment, and
cost-effectiveness.
Contexts and Assessment
As noted, assessment of infants and toddlers often takes
place in pediatric settings, with screening as a primary goal.
Screening may also take place in early childhood education and
intervention settings, such as Early Head Start and home visiting
programs. Interpreting results from such assessments must take
into account the effects of a wide variety of inputs into the child's
development, for example, safety of the residence, care practices
of parents and other caregivers, exposure to substances that might
hamper normal development, and consistency of care settings, as
well as information about the infant's state of health and alertness
during the assessment.
and the relevant part of the brain is intact, indicated by pupillary responses to light; and that the eyes move in a coordinated
fashion. Between ages 2 and 4 years, it becomes possible to test for
visual acuity, that is, the size of objects that can be seen at certain
distances (American Academy of Pediatrics, 1996). The goal of
these procedures is to reduce poor vision or risk factors that lead
to abnormal visual development. Recent evidence supports the
effectiveness of intensive screening for the reduction of amblyopia
and improved visual acuity. The U.S. Preventive Services Task
Force concluded that the routine screening currently done has
not been shown to be effective, although the potential benefit
outweighed the minimal risk of the screening (U.S. Preventive
Services Task Force, 2004).
Iron Deficiency Screening
A lengthy literature addresses the effect of nutritional deficiency on child development (Grantham-McGregor, 1984). Since
poor nutrition and micronutrient deficiency are more likely in
the context of poverty and ill health, disentangling the effect of
specific nutritional deficiencies on development is sometimes
difficult. However, evidence from developing and industrialized
countries supports a relation between iron deficiency and poorer
socioemotional, sensorimotor, and cognitive development and
school performance (Lozoff et al., 2000, 2003). Recommendations
for screening for iron deficiency are consistent with this body
of research (American Academy of Pediatrics, 2003). However,
substantial questions about the specificity of using blood hemoglobin levels to assess the presence of iron deficiency led the U.S.
Preventive Services Task Force to conclude that the evidence is
insufficient to recommend for or against such screening (U.S.
Preventive Services Task Force, 2006).
Acuity tests, such as Teller Acuity Cards, are available for infants and toddlers,
and they can be useful for at-risk (e.g., premature) infants, but they are not suitable
for general screening and good predictive validity has not been demonstrated
(National Research Council, 2002).
Lead Screening
Lead absorbed from the environment has long been recognized as a neurotoxicant, and major efforts have been undertaken
to reduce environmental lead (Grandjean and Landrigan, 2006).
The success of these efforts has led to a sharp decline in the
blood lead levels of children in America: as of 2006, only slightly
more than 1 percent had blood lead levels above the cutoff of
10 micrograms/deciliter (Centers for Disease Control and Prevention, 2007). Nonetheless, certain populations, such as minority
children and those living in older housing stock, remain at risk,
and thus a targeted screening strategy has been recommended
by the American Academy of Pediatrics (2005). Several studies
have reported that children with low-level prenatal lead exposure
(<10 µg/dL) have intellectual deficits as measured by standard
IQ tests (Banks, Ferretti, and Shucard, 1997; Lanphear et al., 2000,
2002; Needleman and Gatsonis, 1990), reflected in poorer performance on specific items on the Neonatal Behavioral Assessment
Scale (Brazelton and Nugent, 1995; Emory et al., 1999) and on
infant intelligence at age 7 months (Emory et al., 2003; Shepherd
and Fagan, 1981). The study by Emory et al. (2003) characterized
the effects found as lowered optimal performance rather than an
increase in impaired performance across the board.
DEVELOPMENTAL ASSESSMENT
Newborns
Developmental assessments provide useful information about
overall physiological status and risk. Neurodevelopmental examinations initially focused on neurological reflexes and postural
reactions that can be elicited in the newborn, which emerge and
disappear within fairly specific time periods, as a means of assessing central nervous system integrity, especially early signs of
cerebral palsy (Zafeiriou, 2003). Primitive reflexes are mediated
by the brainstem and consist of complex, automatic movement
patterns that emerge from 25 weeks of gestation and disappear
by age 6 months. Postural reactions are infant responses to being
held in different standardized positions and probably reflect more
Using the model of the NBAS, Als et al. (2005) have developed the Assessment of Preterm Infants' Behavior (APIB). The
scale assesses what are theorized to be five interacting systems
of functioning: autonomic, motor, state organization, attention,
and self-regulation. Like the NBAS, the APIB forms the basis
of an intervention, the Newborn Individualized Development
Care and Assessment Program, intended to improve the developmental
outcomes of preterm infants by teaching caregivers in the
neonatal intensive care unit how to interact more sensitively with
the infant. If the intervention improves performance on the APIB
and leads to better long-term outcomes in early childhood, then
one might argue that the APIB has predictive validity, and Als et
al. (2003) have argued for such an effect. However, a recent meta-analysis of individualized developmental interventions in the
neonatal intensive care unit suggests that the data do not support
this argument (Jacobs, Sokol, and Ohlsson, 2002).
Infants and Toddlers
Developmental assessment of infants and toddlers occurs
routinely in medical care settings and is carried out by a variety of
people; some children receive this service through infant-toddler
care/education/intervention programs. In view of the time pressures in primary care settings, the approach has been to rely on
brief screening instruments, with more complete assessments of
children who do not seem to be developing at the usual pace.
Since most young children are monitored by pediatricians or
other primary medical care personnel, it seems reasonable to use
the clinical guidelines from the American Academy of Pediatrics
(American Academy of Pediatrics, Committee on Children with
Disabilities, 2001; American Academy of Pediatrics, Council on
Children With Disabilities, 2006) as a template for this process.
The first step is developmental surveillance performed as part
of the regular well-child visit. Surveillance is considered to include
"eliciting and attending to the parents' concerns, documenting and maintaining a developmental history, making accurate
observations of the child, identifying risk and protective factors,
and . . . documenting the process and findings" (American Academy of Pediatrics, Council on Children with Disabilities, 2006). If
with infants and young children need training and support in the
appropriate procedures.
Finally, the effectiveness of screening may be further limited
by the fact that the system of access to screening settings and of
response to abnormalities found may be as diffuse and unstandardized as the assessment process itself. Unlike the classroom
setting, in which more standardized and local approaches to
developmental and learning problems may be taken, response
to abnormalities of development in infants, toddlers, and older
preschoolers not already enrolled in intervention programs typically requires referral to other services for diagnosis and management. In part, this variability in response reflects the diversity of
state and other policies regarding young children. This means
that some infants and toddlers are not screened, and that those
who are identified as requiring diagnostic assessments and other
services may not receive them. As noted above, much of the early
screening is accomplished in health care settings, and access to
care is heavily dependent on having health insurance. Children
without health insurance are more likely to have low family
income, to come from minority families, to use medical care less
intensely, and to be referred to other settings for services (Simpson
et al., 2005). Even with insurance, access to some services is more
difficult than others. Although the Individuals with Disabilities
Education Act does mandate testing for all children suspected
of developmental disability or delay and requires the provision
of appropriate services to children so identified, there remains
considerable local variation in the capacity to respond to this
mandate. A recent chapter by Gilliam, Meisels, and Mayes (2005)
proposes a system of screening and surveillance that uses many
available community resources to provide a more integrated
screening, referral, and assessment system.
Finally, even if the current assessment of infant and toddler
development were more universally effective, fitting well into
a larger system and building continuity with the assessment of
slightly older preschoolers would improve its usefulness. The
focus of infant-toddler assessment procedures is primarily on
monitoring development and risks to development for purposes
of ensuring adequate progress and to rule out health-related challenges to normal development. For example, the vision examinations conducted by health care providers may focus less on the
Appendix Tables: Summary of Assessment Instruments for Children 0-3 Years of Age

These tables classify each instrument by instrument type (screening or diagnostic) and by data-gathering method (caregiver report, observation, or both); some instruments require a trained interviewer/observer. The instruments listed include the Denver Prescreening Developmental Questionnaire; Parents' Evaluation of Developmental Status (PEDS); the NCHS/NLSY Questionnaire (U.S. Department of Health and Human Services, National Center for Health Statistics, 1981); the Developmental Profile-II; the Preschool Screening System; the Denver Developmental Screening Test II; the Brigance Screens; the MacArthur-Bates Communicative Development Inventories; the Test of Early Language Development; the Sequenced Inventory of Communication Development; the Motor Quotient (Capute and Shapiro, 1985); the Bayley Scales of Infant Development, Third ed.; screens for specific developmental disabilities, such as the Social Communication Questionnaire (SCQ) (Rutter, Bailey, and Lord, 2003) and the Pervasive Developmental Disorders Screening Test-II (PDDST-II) (Siegel, 2004); the Pictorial Assessment of Temperament (PAT) (Clarke-Stewart et al., 2000); the Infant Characteristics Questionnaire (Bates, Freeland, and Lounsbury, 1979); and the Preschool Assessment of Attachment (Teti and Gelfand, 1997).
5
Assessing Learning
and Development
Assessments for purposes other than screening and diagnosis have become more and more common for young
children. Some of these assessments are conducted to
answer questions about the child (e.g., monitoring progress during instruction or intervention). Other assessments are conducted
to provide information about classrooms and programs (e.g., to
evaluate a specific curriculum or type of program) or society in
general (e.g., to describe the school readiness of children entering
kindergarten). Many of the assessments widely in use in educational settings are designed primarily to inform instruction by
helping classroom personnel specify how children are learning
and developing and where they could usefully adapt and adjust
their instructional approaches. Thus, the goals of much testing in
this later period are more closely related to educational than to
medical or public health issues, and the nature of the assessments
as well as the domains assessed are modified accordingly.
The greater role of education in these assessments means that
the settings for assessing children may be different, and the range
of domains toward which assessments are directed is expanded.
Assessment that is educationally oriented often takes school-age
achievement as the ultimate target and thus is organized into
domains that are highly relevant to K-12 schooling (e.g., literacy,
science, social studies). Understanding the developmentally rel85
1995) and in the analysis of state learning standards by Scott-Little, Kagan, and Frelow (2006). For each of the domains, we
first discuss how it is defined and how its internal structure has
been delineated. We then present evidence for the importance
of the domain: that it is widely mentioned in child achievement
standards, that it is a focus of developmental theory and research,
or that it relates to other outcomes important in the short or long
term. We also consider evidence that the developmental domain
is malleable, that is, amenable to change through interventions,
since the capacity to change is another source of evidence for the
importance of assessing it. We then describe some of the assessment approaches and tools that have been widely used to reflect
status or progress in that domain. Appendix Tables 5-1 through
5-7 provide a summary listing of the major instruments discussed
here, with a table for each domain. For each table, the first column
indicates the subscale or specific domain assessed, and the second
through fifth columns list the instruments that offer the relevant
subscales, categorized by the measurement method(s) used by
each: direct assessment, questionnaire, observation, or interview.
Because many useful instruments do not quite fit into the domains
we discuss, we have also included a table for general knowledge
(sometimes categorized under cognitive skills), and have included
science in the table with mathematics.
For more detailed information on instruments, including
evaluative reviews, specific age range, time to administer, administrator qualifications required, as well as psychometric information, we have listed and described a variety of print and online
instrument compendia and reviews in Appendix D.
PHYSICAL WELL-BEING AND MOTOR DEVELOPMENT
Defining the Domain
This domain encompasses issues of health, intactness of sensory systems, growth, and fitness, as well as motor development.
Motor development has long been a topic of interest in pediatric
and developmental studies, and it also is one of the areas used in
screening children for possible developmental problems. The com-
Kagan, and Frelow, 2006). California's Preschool Learning Foundations in Social and Emotional Development for Ages 3 and 4
(http://www.cde.ca.gov/re/pn/fd/documents/preschoollf.pdf)
is an excellent example of the development of a consensus document regarding expectations for children's social and emotional
skills in the preschool years. Relying heavily on the research on
young children's social and emotional development, the document describes benchmarks for the behavior of 3- and 4-year-olds in central domains of social and emotional development. . . .
"In focusing on social and emotional foundations of school readiness, a central assumption, well supported by developmental
and educational research, is that school readiness consists of
social-emotional competencies as well as other cognitive competencies and approaches to learning required for school success"
(p. 1). The standards for social and emotional development in
California's early learning standards identify the dimensions
of self (self-awareness and self-regulation, social and emotional
understanding, empathy and caring, and initiative in learning),
social interaction (including interactions with familiar adults,
interaction with peers, group participation, and cooperation and
responsibility) and relationships (attachments to parents, close
relationships with teachers and caregivers, and friendships). The
perspective that social and emotional development and early
learning are closely linked is reflected in the inclusion of Initiative in Learning as a component of social and emotional development, involving the child's interest in activities in the classroom,
enjoyment of learning and exploring, and confidence in his or her
ability to make new discoveries.
Importance for Later Development
The social and emotional demands of formal schooling on
young children differ from those of early childhood settings,
and children's skills in this area at school entry are predictors
of how well they make the adjustment to the new setting and
progress academically (see Bierman and Erath, 2006; Campbell,
2006; Ladd, Herald, and Kochel, 2006; Mashburn and Pianta,
2006; Raver, 2002; Thompson and Raikes, 2007; Vandell, Nenide,
and Van Winkle, 2006). Early childhood care and educational
Buhs, 1999; Pianta and Steinberg, 1992; Silver et al., 2005). Closeness, conflict, and dependence have been identified as three features of teacher-child relationships that are important to children's
development (Mashburn and Pianta, 2006).
While relationships with teachers as well as peers during
the transition to formal schooling appear to be central to positive engagement in school and thereby achievement, positive
teacher and peer relations in turn appear to rest at least in part on
children's knowledge of emotions and their ability to regulate the
expression of their own emotions (Bierman et al., under review;
Denham, 2006; Vandell, Nenide, and Van Winkle, 2006).
Self-regulation: Recent research on self-regulation acknowledges that some aspects of it involve emotion (e.g., modulation in
the expression of negative emotions) and behavior (e.g., inhibition
of aggressive impulses), and other aspects focus more on attentional and cognitive skills (e.g., the ability to maintain a set of
instructions actively in working memory over time and despite
distractions, taking the perspective of another, switching attention
as task demands change) (Diamond et al., 2007; McClelland et al.,
2007; Raver, 2002, 2004).
Socioemotional development is of importance during the
early childhood period because it relates to children's capacities
to form relationships, both trusting relationships with adults and
friendships with peers, and these relationships in turn seem to be
related to the speed of learning in early care and educational settings. These markers of positive relations with peers and teachers
have implications for children's engagement and participation in
the classroom. Children learn to regulate the expression of emotion in a variety of ways, including turning to others with whom
they have secure relationships for comfort and support, using
external cues, and, increasingly with age, managing their own
states of arousal (Thompson and Lagattuta, 2006).
Behavior problems: Serious behavior problems are apparent
early in some children. Research summarized by Raver (2002)
indicates that children with early and serious problems of aggression who are rejected by peers are at elevated risk in terms of poor
academic achievement, grade retention, dropping out of school,
and eventually delinquency. Raver notes that children who are
disruptive tend to get less instruction and positive feedback from
confirming the relation of early social and emotional competencies, self-regulation, and absence of serious behavior problems to
early participation in learning activities and to academic achievement. While it is important to note that social and emotional
development predicts later academic outcomes, at the same time
we insist that children's social and emotional well-being and
competencies are worthy developmental goals in their own right,
independent of their relationship to academic outcomes.
Evidence of Malleability
According to a review by Raver (2002), there is substantial
evidence from experimental evaluations that it is possible to
improve young children's social and emotional development at
the point of school entry or earlier, helping them to develop and
stay on a positive course in their relationships with teachers and
peers and to engage positively in learning activities. While the
evidence summarized points to program effects across all the
levels of intensity and the setting of the interventions considered
(in the classroom, with parents, or both), findings are stronger
when interventions engage parents as well as teachers and are
more intensive. More recent reviews contribute to understanding
the complexity of this domain (Bierman and Erath, 2006; Fabes,
Gaertner, and Popp, 2006).
Several recent developments in intervention research on
young children's social and emotional development are noteworthy.
First, very recent work has focused explicitly on interventions targeting children's self-regulation skills. In recent work by
Diamond and colleagues (Diamond et al., 2007), the Tools of the
Mind curriculum, which embeds direct instruction in strengthening executive function in play activities and social interactions,
was experimentally evaluated in prekindergarten programs in
low-income neighborhoods. This intervention takes a Vygotskian
approach; that is, it encourages extended dramatic play, teaches
children to use self-regulatory private speech, and provides
external stimuli to support inhibition. Results showed significant improvements in direct assessments of children's executive
function. By the end of the school year, children in classrooms
implementing Tools of the Mind did not need help staying on task
or redirecting inappropriate behavior. This study provides important evidence that aspects of self-regulation are malleable.
Measurement Issues
An ongoing challenge in the research on social and emotional
development of young children is to forge agreement about specific constructs, measures, and the mapping of constructs to measures (Fabes, Gaertner, and Popp, 2006; Raver, 2002). The internal
complexity of the domain is reflected in the fact that different
measures parse it differently. The lack of agreement impedes the
capacity to look across studies at accumulating patterns of findings (Zaslow et al., 2006).
Another challenge is that some see measures of social and
emotional development as reflecting in part the early childhood environment and the teacher-child relationship, rather
than as pure measures of the child. For example, a teacher who
requires 3-year-olds in an early childhood classroom to sit still
for long periods to do seat work is likely to assess many children
as inattentive or disruptive (Thompson and Raikes, 2007). Her
rating of a child as having behavior problems may actually be a
reflection of her inappropriate expectations, rather than a child's
enduring behavior problem.
Another measurement challenge is the heavy reliance in this
domain on teacher and parent reports. In development are direct
assessments of children's behavioral self-regulation (Emotion
Matters II Direct assessments developed by Raver and modeled
after work by Kochanska and colleagues); of the executive function aspects of self-regulation (the Head to Toe Task described
by McClelland and colleagues, 2007); and of the Dots Task from
the Directional Stroop Battery and the Flanker Task described
by Diamond and colleagues (2007). Further work with these
measures may generate important evidence about their reliability and validity, as well as their sensitivity to intervention
approaches and their relation to teacher and parent reports and
direct observations.
and Morrow, 1989; Lewit and Baker, 1995), claiming that many
children, especially from low-income homes, enter kindergarten
lacking them (Rimm-Kaufman, Pianta, and Cox, 2000).
Evidence of Continuity and Associations with
Important Outcomes
Aspects of infant behavior, such as giving attention and the
ability to sustain attention, appear to show continuity over time
and relate to educational outcomes. Learning behaviors, such as
persistence and attention in the classroom, have been shown to
be related to specific academic skills in early childhood, such as
early mathematics and literacy skills, across a number of studies
(Fantuzzo, Perry, and McDermott, 2004; Green and Francis, 1988;
McDermott, 1984; McWayne, Fantuzzo, and McDermott, 2004),
even when measures of emotional adjustment were also considered. Approaches to learning as rated by the kindergarten teacher
at entry to school predicted growth in mathematics from kindergarten to third grade in a national sample, the Early Childhood
Longitudinal Study-Kindergarten Cohort (ECLS-K) (DiPerna, Lei,
and Reid, 2007).
Several studies have found significant associations between
young childrens learning-related behavior and their academic
performance. Normandeau and Guay (1998) reported that first
graders' cognitive self-control (the ability to plan, evaluate, and
regulate problem-solving activities; attend to tasks; persist; resist
distraction) was associated with their academic achievement, net
of their intellectual skills assessed in kindergarten. Howse et al.
(2003) found that teachers' ratings of kindergarteners' (but not
second graders') motivation (e.g., "is a self-starter," "likes to do
challenging work") predicted concurrent reading achievement,
with receptive vocabulary (but not previous reading achievement) held constant.
In a longitudinal study of children from kindergarten through
second grade by McClelland, Morrison, and Holmes (2000),
teachers' ratings of kindergarten children's work-related skills
(compliance with work instructions, memory for instructions,
completion of games and activities) were significantly associated
Evidence of Malleability
The theory of change for most early childhood intervention
programs is that some form of preschool enrichment will lead
to more rapid growth in cognitive skills for participants, often
children from low-income families. Most often, cognitive skills
are measured via individual direct assessments using standardized tests administered by trained staff members. A recent Rand
Corporation study (RAND Labor and Population, 2005) examined
programs implemented in the United States that provide services
to children and families during early childhood and reported
effect sizes (d) for cognitive outcomes for successful programs
that ranged from .13 to 1.23.
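The effect sizes (d) reported here are standardized mean differences; assuming the familiar Cohen's d formulation (the worked numbers below are illustrative and are not taken from the studies cited), the metric is

$$
d = \frac{\bar{X}_{\text{program}} - \bar{X}_{\text{control}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},
$$

so that, for example, hypothetical IQ means of 100 for program children and 92.5 for controls, with a pooled standard deviation of 15, give d = (100 - 92.5)/15 = 0.50, or half a standard deviation.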
The largest effect sizes were obtained in the most intensive interventions in assessments of children after age 2. The
Abecedarian Project, a single-site experimental intervention that
delivered 5 years of full-time quality child care, yielded effect sizes
of d = .50 at 18 months, d = .83 at 24 months, d = 1.23 at 36 months,
and d = .73 at 54 months on standardized infant developmental
or IQ tests (note that the reduction in effect sizes between ages 3
and 5 appears to be related to the fact that control children were
attending quality child care centers) (Burchinal, Lee, and Ramey,
1989). The High Scope/Perry Preschool Project, a single-site program that delivered 2 years of preschool between ages 3 and 5 and
included a home visit/parenting education component, yielded
effect sizes of d = 1.03 at age 5 on standardized IQ tests. The Infant
Health and Development Project, a large multisite research project
that delivered 3 years of home visiting and 2 years of full-time
high-quality child care from birth, yielded an effect size of d = .83
on an IQ test at the end of the program at 36 months. The Early
Training Project (Gray and Klaus, 1970), which included both
home visiting and child care for preschoolers, reported an effect
size of d = .70 on an IQ test.
In contrast, much weaker effect sizes were obtained for interventions that were less intense: d = .27 for the Ypsilanti Carnegie
Infant Education project, which provided home visiting (Epstein
and Weikart, 1979); d = .13 at 36 months for Early Head Start, a
large multisite research program that delivered 2-3 years of home visiting and high-quality child care in some sites (U.S. Department
of Health and Human Services, Administration for Children and
Families, 2004); d = .13 for the Prenatal Early Infancy Project-Elmira site (Olds et al., 1993), another home visiting project; and
d = .12 at 48 months for the Head Start Impact Study, which evaluated the impact of a year of Head Start involving both center care
and home visiting (U.S. Department of Health and Human Services, Administration for Children and Families, 2005). Finally, the
relatively frequent need to renorm cognitive tests provides further
evidence of mutability for general cognitive scores (Neisser et al.,
1996). As the average level of education rose in this country, IQ
tests had to be renormed to ensure that the mean score did not
rise substantially.
A growing literature also demonstrates mutability in executive functioning. Experimental studies have demonstrated that
children who participated in brain training activities and curricula exhibited improved neurocognitive abilities (including
executive function) and, in some cases, behavior relative to peers
who did not participate in the training activities (Diamond et al.,
2007; Dowsett and Livesey, 2000; Klingberg et al., 2005; Rueda et
al., 2005; Semrud-Clikeman et al., 1999).
Testing All Children
The challenges of collecting interpretable data on the cognitive skills of children from non-English-speaking or multicultural
backgrounds have been hotly debated. Overall, recent IQ and
general cognitive tests have been developed using diverse populations in their norming samples, and scores on these tests tend
to show similar patterns of prediction with academic achievement and other criteria for different ethnic and economic groups
(Neisser et al., 1996). However, insufficient evidence exists to draw
definitive conclusions regarding the use of these measures with
infants, toddlers, and preschoolers. Similarly, many measures of
specific cognitive skills were developed using middle-class white
children but have been used recently in studies in Head Start
classrooms or other programs serving low-income, ethnically
diverse children. There is growing attention to the psychometric
properties of these measures as the research moves away from
documenting normative development to examining individual
differences (Blair and Razza, 2007).
Available Measures
Measures of general cognitive skills during early childhood
include psychometrically developed developmental and IQ tests,
questionnaires, specific tasks, and curriculum-based assessments.
Many of these are listed in Appendix Table 5-4. The Bayley Scales
of Infant Development measure the mental and motor development and test the behavior of infants from 1 to 42 months of age.
The Wechsler tests may be the most widely used measures of 3- to
8-year-olds, although other psychometric tests are also widely
used for children age 2 years and older, including the Stanford
Binet Intelligence Scales, the Woodcock-Johnson III (WJ-III) Tests
of Cognitive Abilities, and the Kaufman Assessment Battery for
Children (K-ABC) (Bayley, 2005; Kaufman and Kaufman, 2006;
Roid, 2003; Wechsler, 2003; Woodcock, McGrew, and Mather,
2001). The K-ABC assesses sequential and simultaneous processing skills as well as achievement. Similarly, the WJ-III assesses
specific cognitive and achievement skills.
In contrast, most measures of executive function involve
laboratory-based tasks. The continuous performance task is
widely used to measure sustained attention for typically developing children in research and for children referred for cognitive delays or disorders. Assessments of executive skills were
reviewed recently (Carlson, 2005); the review lists tasks appropriate for
toddlers and preschoolers. Perhaps the most widely used measures include the continuous performance task, shape Stroop,
snack delay, day/night, and Simon says (note that these are also
used as measures of constructs defined under socioemotional
development, again pointing out the porous boundaries between
emotional and cognitive development). Assessments of memory
include scales on psychometrically developed assessments and a
wide variety of laboratory assessments (Gathercole, 1998). Ceiling
and floor effects have limited the use of many of the laboratory
tasks across a variety of ages, and concerns about the extent to
which tasks require multiple specific cognitive skills result in
measures that cannot provide pure assessment of a single executive function or memory skill.
Mathematics
In this section we discuss the development of mathematical
understanding, concepts, and skills during early childhood as a
particular aspect of the cognitive skills domain.
Defining the Domain
Researchers emphasize that very young children can and
should be acquiring knowledge that provides the foundations
for later mathematics learning in number sense, spatial sense and
reasoning (geometry), measurement, classification and patterning
(algebra), and mathematical reasoning. Each of these subdomains
of mathematics is described briefly below.
Research suggests that children begin developing number
sense in early infancy (Clements, 2004; Clements, Sarama, and
DiBiase, 2004; Feigenson, Dehaene, and Spelke, 2004; Xu, Spelke,
and Goddard, 2005) and much of what young children know
about numbers depends on their understanding and mastery of
counting (Fuson, 1992a; National Research Council, 2001). Studies suggest that the three major basic skills required for counting
are knowing the sequence of number words, one-to-one correspondence, and cardinality (Becker, 1989; Clements, 2004; Fuson,
1988, 1992a, 1992b; Hiebert et al., 1997; National Research Council,
2001). Following initial acquisition of counting, children begin to
acquire an understanding of number operations (Clements, 2004;
Hiebert et al., 1997; National Council of Teachers of Mathematics,
2000; National Research Council, 2001) and then simple operations and word problems (Fuson, 1992a). Number operations
for preschoolers mainly involve understanding additive number
relationships in which two (or more) small numbers make up
one larger number (e.g., 2 and 3 make 5), which will develop into
addition and subtraction concepts in the future. In acquiring these
skills related to number sense, young children and students of
nonmajority backgrounds tend to be influenced by the context of
the problem and perform better with more contextual information
(Boaler, 1994; Cooper and Dunne, 1998; Lubienski, 2000; Means
and Knapp, 1991).
Geometry is the study of space and shape (Clements, 1999).
identifying the core unit of the pattern, which, in turn, depends on the types of experiences the child has at home or
in care and educational settings (Klein and Starkey, 2004; Starkey,
Klein, and Wakeley, 2004).
Most young children can solve problems involving simple
mathematical reasoning by age 3, often by modeling with real
objects or thinking about sets of objects. Alexander, White, and
Daugherty (1997) propose three conditions for reasoning in young
children: (1) the children must have a sufficient knowledge base,
(2) the task must be understandable and motivating, and (3) the
context of the task must be familiar and comfortable to the problem solver. Although these conditions probably apply to problem
solvers of all ages, they may be particularly important for young
children who are not motivated to complete tasks for external
reasons (e.g., good grades).
Importance of the Domain
The case for assessing mathematics in early education programs is easy to make. Looking across international comparative studies, U.S. students' performance in mathematics is in the
bottom third (American Institutes for Research, 2005). And recent
analyses of longitudinal studies have shown that mathematical
concepts, such as knowledge of numbers and ordinality, at school
entry are the strongest predictors of later academic achievement,
even stronger than early literacy skills (Duncan et al., 2007). Efforts
clearly need to be made to improve opportunities for mathematics
learning and carefully monitor children's learning. Furthermore,
all the state early childhood standards mention mathematical
development as a target for attention.
Testing All Children
The ability to articulate thinking and problem-solving
approaches in mathematics is currently recognized as an important skill (National Council of Teachers of Mathematics, 2000),
although this may prove difficult for children who are not proficient in English or have not yet learned mathematics vocabulary.
Mathematical skills therefore need to be assessed in multiple
ways, with objects that can be manipulated and questions requiring verbal explanations.
Available Measures
Each of the domains in mathematics discussed above has
measures associated with it, although of varying quality and
degrees of development. Both formative and summative assessments should measure childrens skills in the different sub
domains and not focus only on number sense. Because childrens
mathematical experiences and learning are grounded in their
everyday lives, often in practical situations, it is also important
that the problems, even in formal and structured assessments,
be familiar and involve materials that children can use to solve
the problem and show their thinking. Young children need to be
able to touch and move objects to give an accurate demonstration of their understanding of the concepts. Assessments using
still pictures on a piece of paper are likely to underestimate their
mathematical understanding, as they may be better able to solve
problems when they are allowed to move actual objects around
physically. Some of the skills that should be examined in each
domain are listed below.
Since young children's primary experience with numbers
focuses on counting, any assessment of number sense should
examine how children count groups of objects. Assessments
should include asking the child to count, to measure knowledge of number sequence names and rote counting, and should assess
the child's understanding of one-to-one correspondence between
objects and counting and of cardinality. Similarly, assessment of
spatial sense and reasoning (geometry) should involve observation of children engaged in activities using shapes. Assessment
of children's understanding of measurement in early childhood
should begin with asking them to make direct comparisons of different attributes of objects. For classification and sorting, children
should be provided with materials or objects and asked to create
their own groups and describe their reasoning. Their reasoning
should be carefully noted and their understanding should be
evaluated based on their reasoning, not solely by the evaluator's
criteria. Assessment items for mathematical reasoning should be
measurement tools for use in the future. As the history of instrument development in the domains of approaches to learning and
social/emotional development shows, identifying a domain as
important can generate researcher and practitioner interest that
translates initially into informal assessments, which are then
refined and expanded to meet the psychometric criteria required for wider use.
The default when thinking about assessment is to think
about direct, formal testing: the familiar scenario of an adult
sitting down with a child and presenting prescribed questions or
challenges for him or her to solve, in a prescribed sequence. It is
important to emphasize that, although many of the assessment
tools discussed in this chapter have that character, the repertoire
of usable, reliable, and informative assessments is in fact much
larger, including observation of the child in natural or somewhat
structured settings, collecting information from primary caregivers and from adults in child care and educational settings
about the child's behavior, and interacting with the child directly
but without formal test items or materials. The reliability and
validity of such measures for young children needs more study,
and such research is beginning to be done. For example, Meisels,
Xue, and Shamblott (in press) studied the Work Sampling for
Head Start (WSHS) measure, derived from the Work Sampling
System, which has observers complete a checklist of children's
demonstrated capabilities. They reported moderate correlations with direct assessment instruments for language, literacy,
and mathematics, but did not recommend use of the WSHS for
accountability purposes.
APPENDIX TABLES:
5-1 through 5-7: Tables of Preschool Instruments1

These tables list, for the domains discussed in this chapter (physical development and well-being, social-emotional development, approaches to learning, general cognitive skills including executive functioning and memory, mathematics and science, and language and literacy), the available preschool instruments, their relevant subscales, and the data-gathering method used (direct assessment, questionnaire, observation, or interview). Instruments listed in the tables include: NEPSY; Denver II; growth charts and indices of obesity; Toddler-Parent Mealtime Behavior Questionnaire; Creative Curriculum Development Continuum for Ages 3-5; Games as Measurement for Early Self-Control (GAMES); High/Scope Child Observation Record (COR); Vineland Social-Emotional Early Childhood Scales (SEEC); Clinical Evaluation of Language Fundamentals (CELF)-Preschool, including its Behavioral Observation Checklist; Woodcock-Johnson III (WJ-III); Adapted EZ-Yale Personality/Motivation Questionnaire (Adapted EZPQ); ECLS-K Adaptation of the Social Skills Rating System (SSRS), Task Orientation/Approaches to Learning Scale; Delay-of-Gratification Task; Strange Situation; Attachment Q-Sort; Behavioral Assessment System for Children (BASC); Child Behavior Checklist (CBCL) and Caregiver-Teacher Report Form (C-TRF); Infant-Toddler Social and Emotional Assessment (ITSEA); Tower of Hanoi; The Galileo System for the Electronic Management of Learning (Galileo); Kaufman Assessment Battery for Children (K-ABC), including the Expressive Vocabulary Subtest; Bayley Scales of Infant Development; Stanford-Binet Intelligence Scale, Fourth ed. (SB-IV); Wechsler Preschool and Primary Scale of Intelligence, Third ed. (WPPSI-III), and WISC; Expressive One-Word Picture Vocabulary Test (EOWPVT); Peabody Individual Achievement Test (PIAT) and Peabody Individual Achievement Test-Revised (PIAT-R); Test of Early Mathematics Ability (TEMA); The Work Sampling System (WSS); MacArthur-Bates Communicative Development Inventories (CDI); Comprehensive Test of Phonological Processing (CTOPP); Diagnostic Evaluation of Language Variation (DELV); Test of Early Language Development, Third ed. (TELD-3); Reynell Developmental Language Scales, U.S. ed. (RDLS); Sequenced Inventory of Communication Development-Revised (SICD-R); Primary Test of Cognitive Skills (PTCS); and Dynamic Indicators of Basic Early Literacy Skills, Sixth ed. (DIBELS).

1These listings do not imply any approval or endorsement by the committee of particular instruments. They are included to provide examples of instruments available for measuring various domains and outcomes. Appendix D provides information on where reviews of the instruments may be found.
6
Measuring Quality in
Early Childhood Environments
of child care. For example, Pianta and colleagues use their tool,
the CLASS (Pianta, La Paro, and Hamre, 2007), to promote more
intentional instruction, classroom management, and emotional
support in the classroom through their professional program,
My Teaching Partner (Kinzie et al., 2006). The Quality Interventions for Early Care and Education (QUINCE) intervention and
evaluation, which uses on-site technical assistance to improve the
quality of home-based as well as center-based child care, uses the
environmental ratings scales, the Family Day Care Environment
Rating Scale, or FDCERS (Harms and Clifford, 1989), and the
Early Childhood Environment Rating Scale-Revised, or ECERS-R
(Harms, Clifford, and Cryer, 1998), to promote the use of age-appropriate activities and enhance teacher-child interactions in
their program, which follows the Partners for Inclusion model
(Bryant, 2007; Wesley, 1994).
Second, observational measures can be used in formative
assessment of programs that are striving to improve their quality.
Periodic observations and examination of scores on different
dimensions can help identify weaknesses that require further
attention. Fourteen states now have quality ratings systems available to the public, with summary ratings of the quality of early
care and education, and many more states are developing such
systems, with the aim of improving information to consumers
and providing supports to improve quality (Tout, Zaslow, and
Martinez-Beck, forthcoming). Local communities as well are
developing such systems. In most fully developed state quality
ratings systems, an observational measure of the quality of the
early care and education environment (usually the ECERS-R,
FDCERS, or the infant and toddler version of this measure, the
Infant/Toddler Environment Rating Scale; Harms, Cryer, and
Clifford, 1990) is used as one component of the overall rating of
the environment, which usually includes multiple components,
selected and weighted differently in each state or community. The
rating of the environment is used not only as a contributor to the
summary rating of quality, but also as a source of detailed information about the facets of quality that need improvement and in
which changes will help progress to the next quality rating.
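To make the idea of a weighted summary rating concrete, the sketch below combines component scores into an overall rating level. It is a minimal illustration only; the component names, weights, and cut points are invented for the example and do not correspond to any particular state's or community's system.

```python
# Hypothetical sketch of a weighted quality-rating composite.
# Component names, scales, weights, and thresholds are invented;
# each state or community selects and weights components differently.

def overall_quality_rating(components, weights, thresholds):
    """Combine component scores into a weighted average, then map the
    average to a discrete quality-rating level (1 = lowest)."""
    total_weight = sum(weights[name] for name in components)
    weighted_avg = sum(components[name] * weights[name]
                       for name in components) / total_weight
    level = 1
    for cutoff in thresholds:            # thresholds sorted low to high
        if weighted_avg >= cutoff:
            level += 1
    return weighted_avg, level

components = {"environment_rating": 5.2,       # e.g., an observed 1-7 rating
              "staff_qualifications": 4.0,
              "ratio_and_group_size": 6.0}
weights = {"environment_rating": 0.5,
           "staff_qualifications": 0.3,
           "ratio_and_group_size": 0.2}

score, level = overall_quality_rating(components, weights,
                                      thresholds=[3.0, 4.0, 5.0, 6.0])
print(f"Weighted average = {score:.2f}, quality level = {level}")
```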
Third, classroom observations can be used for accountability
purposes, instead of or as a supplement to child outcome measures.
Child care quality has been a consistent modest to moderate positive predictor of children's cognitive and language skills
in large, multisite studies and smaller local studies (Howes et al.,
2008; Lamb, 1998; NICHD Early Child Care Research Network,
2006; Peisner-Feinberg et al., 2001; Vandell, 2004) and a somewhat consistent predictor of social skill (NICHD Early Child Care
Research Network, 2006; Peisner-Feinberg et al., 2001; Vandell,
2004). Using early childhood assessments as part of an aligned
system requires the capacity to juxtapose information about
quality in the early care and education setting with change scores
on childrens development (along with other key components).
Thus, a complete system will require both ratings of the environment and assessments of children at multiple points in time,
although this is expensive.
In some federal and state efforts, observations of early care
and education settings serve both a monitoring and accountability function and a formative function, providing information to
improve quality. Thus, for example, as part of monitoring and
accountability, the Head Start Impact Study collected observations of the quality of Head Start programs as well as of formal
early care and education programs serving children in the control group (U.S. Department of Health and Human Services,
Administration for Children and Families, 2005). Similarly, the
Head Start Family and Child Experiences Survey (FACES) regularly collects observational data on a nationally representative
sample of Head Start programs. The observational data are used
in combination with child outcome data as part of ongoing program monitoring. However, the observational ratings and child
outcomes together are also used to inform ongoing program
improvement (see discussion in Zaslow, 2008). As one example,
information from Head Start FACES was instrumental in shaping an increased focus in Head Start programs on early literacy
development. Information from the Head Start Impact Study has
also been instrumental in increasing professional development
for Head Start teachers, focusing on early mathematics development in young children and how best to foster it.
Fourth, classroom observations are useful for research.
Indeed, most measures were originally developed as part of a
research initiative. An extensive body of work looks at the rela-
BOX 6-1
Dimensions of Quality Observable in the Classroom
1. Emotional climate, social interactions, support for social skills
development, and discipline strategies:
A. Degree to which adults are affectionate, supportive, attentive,
and respectful toward children.
B. Explicit support for social skills (e.g., encouraging children to
use "their words," modeling and engaging children in conversations about social problem-solving skills, encouraging use
of learned strategies to solve real social conflicts).
C. Conversations about feelings.
D. Collaboration and cooperation opportunities.
E. Clarity and developmental appropriateness of rules.
F. Teachers' use of redirection, positive reinforcement, encouragement, and explanations to minimize negative behavior.
2. Instructional activities: an explicit curriculum with specified learning goals for children.
3. General: individualized (adjusted to children's skills and interests); purposeful, planned instruction; integration of content areas; children actively interacting with materials.
4. Language: adults engage in conversations with children; activities that encourage conversation among children; explicit efforts to develop vocabulary and language skills in the context of meaningful activities.
5. Literacy: children read to and given opportunities to read; rhyming words, initial sounds, letter-sound links, and spellings of common words pointed out and practiced; functions and features of print pointed out; opportunities to dictate and write using invented spelling made available.
6. Mathematics: activities that involve counting objects, measuring, identifying shapes, creating patterns, telling time, classifying and seriating objects; instruction on concepts (e.g., big, bigger, equal, one-to-one correspondence, spatial relationships).
7. Science: active manipulation of materials (e.g., sink and float) with adult engaging children in prediction, systematic observation and analysis; instruction on scientific concepts linked to active exploration (e.g., care and observations of live animals).
8. Interactions with parents: activities and opportunities for parents to be informed about the program and their child.
9. Cultural responsiveness:
A. Evidence of supports for linguistic and cultural diversity (e.g.,
pictures, books, language).
B. Activities that expose children to diverse languages and
cultural practices.
C. Support for native language development.
D. Support for learning English.
10. Safety:
A. Adult-child ratio.
B. Absence of broken furniture, any objects that could cause
physical harm.
C. Sufficient space; open pathways.
D. Place for personal hygiene (e.g., teeth brushing, hand
washing).
11. Materials:
A. Technology (e.g., computers).
B. Music (e.g., CD player).
C. Creativity (e.g., art supplies, easels, play dough).
D. Dramatic play (e.g., store, post office, kitchen, clothes).
E. Science (e.g., sand, water, plants, live animals).
F. Literacy (e.g., books, writing materials).
G. Math (e.g., counting objects, blocks, measuring instruments).
H. Fine motor (e.g., materials for drawing, scissors).
12. Physical arrangement:
A. Space and equipment for gross motor activities (e.g., climbing
equipment, swings, balls).
B. Place for quiet and rest (e.g., rugs and pillows out of the
center of activity).
C. Children's access to materials.
13. Adaptations for children with disabilities.
measure contains 26 items divided into two subscales. The emotional climate subscale assesses the teacher's warmth, encouragement, and positive guidance. In the program focus subscale, half
of the 20 items refer to didactic, teacher-directed practices (e.g.,
large-group instruction; workbooks, ditto sheets, and flashcards;
memorization and drill; art projects that involve copying; focus
on getting the right answer), which were considered developmentally inappropriate by NAEYC. Of the 10 items that describe
positive activities, most concern child choice and initiative and
diversity of activities and materials that children can manipulate.
Three of the items refer to positive instructional approaches (e.g.,
teachers ask questions that encourage children to give more than
one right answer).
The CPI described center-based child care preschool programs
in the 10-site NICHD Study of Early Child Care and Youth Development. The program focus score predicted children's language
and academic outcomes at 4.5 years in unpublished analyses that
adjusted for family characteristics (available from the authors
on request).
A Developmentally Appropriate Practices Template
A Developmentally Appropriate Practices Template (ADAPT;
Van Horn and Ramey, 2004) has 19 items based on the 1987 NAEYC
guidelines. It also focuses on the teaching practices the teacher
uses with the entire preschool classroom. Items are anchored
on a 1 (developmentally inappropriate) to 5 (developmentally
appropriate) scale, with descriptions for each anchor. The items
form three scales: (1) integrated curriculum (e.g., teacher adapts
instruction to children's interests, needs, and prior knowledge;
literacy integrated across content areas with literacy materials of
social relevance), (2) social-emotional emphasis (e.g., children's
social and emotional development consistently supported by
peers and teachers; children and teacher collaborate, classroom exemplifies community of learners with shared goals),
and (3) child-centered approaches (e.g., children encouraged to
choose and interact with materials to create and problem-solve;
children work interdependently to complete task or project
and make joint decisions). Instructional practices are described
used quality measure, the positive caregiving composite, is calculated slightly differently for each age level. At 6, 15, and 24
months, positive caregiving composite scores are the mean of five
4-point qualitative ratings (sensitivity to child's nondistress signals,
stimulation of cognitive development, positive regard for child,
emotional detachment [reflected], flatness of affect [reflected]). At
36 months, these five scales plus two additional subscales, "fosters
child's exploration" and "intrusive" [reflected], are included in
the composite. At 54 months, the positive caregiving composite is
the mean of 4-point ratings of caregivers' sensitivity/responsivity,
stimulation of cognitive development, intrusiveness (reflected), and
detachment (reflected). The behaviors observed include language
stimulation, positive talk (e.g., praise, encouragement), positive
physical contact and other behaviors (e.g., positive affect, stimulation of social development, restricting activity, speaking negatively
to child, etc.) as well as the amount of time the child positively or
negatively interacted with the caregiver and other children.
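To illustrate the arithmetic of "reflected" (reverse-coded) scales, the sketch below averages 4-point ratings after reversing the negatively worded items. It is an illustrative computation under simple assumptions, not the ORCE's actual scoring code, and the ratings and item names shown are invented.

```python
# Illustrative computation of a positive-caregiving-style composite from
# 4-point qualitative ratings. Items marked as reflected are reverse-coded
# (1 becomes 4, 2 becomes 3, and so on) before averaging.

SCALE_MAX = 4

def composite(ratings, reflected):
    """Average 1-4 ratings after reverse-coding the reflected items."""
    adjusted = []
    for item, value in ratings.items():
        if item in reflected:
            value = (SCALE_MAX + 1) - value   # reflect: 1<->4, 2<->3
        adjusted.append(value)
    return sum(adjusted) / len(adjusted)

ratings_24_months = {
    "sensitivity_to_nondistress": 3,
    "stimulation_of_cognitive_development": 4,
    "positive_regard": 3,
    "emotional_detachment": 1,    # reflected item
    "flatness_of_affect": 2,      # reflected item
}
print(composite(ratings_24_months,
                reflected={"emotional_detachment", "flatness_of_affect"}))
```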
The ORCE composite quality ratings predicted concurrent
and later child outcomes in the 10-site NICHD Study of Early
Child Care and Youth Development in analyses that adjusted for
family demographic and parenting characteristics. Children who
experienced more responsive and stimulating care according to
the ORCE consistently had high language and cognitive scores
and tended to have better social skills while in child care (NICHD
Early Child Care Research Network, 2006) and to demonstrate
better language skills through fifth grade (Belsky et al., 2007) and
better academic skills through third grade (NICHD Early Child
Care Research Network, 2005).
Preschool Classroom Mathematics Inventory
The Preschool Classroom Mathematics Inventory (PCMI;
National Institute for Early Education Research, 2007) was created
to assess the quality of mathematics instruction for the preschool
classroom and is modeled after Supports for the Early Literacy
Assessment (see below). The 17 items assess instruction and
learning opportunities related to (1) number (e.g., materials for
counting, comparing number, and estimating; teachers encourage
children to recombine and count); (2) mathematical concept (e.g.,
The observational instruments discussed in this chapter are summarized in a table that shows, for each instrument, the age group covered; the type of setting in which it is used (center/school, home-based child care, home or lab, or all child care); and the extent to which the instrument represents physical environment and materials (safety, physical arrangement, materials), social/emotional climate (emotional climate, social interactions with adults, support for social skill development), learning environment and opportunities, language and literacy, math, and level of descriptive detail, with each feature marked as receiving some or substantial representation. Instruments covered are the Assessment Profile for Early Childhood Programs (APECP); Caregiver Interaction Scale (CIS); Child/Home Early Language and Literacy Observation (CHELLO); Classroom Assessment Scoring System (CLASS); Classroom Practices Inventory (CPI); A Developmentally Appropriate Practices Template (ADAPT); Early Childhood Environment Rating Scale-Revised (ECERS-R); Early Childhood Environment Rating Scale-Extension (ECERS-E); Early Childhood Classroom Observation Measure (ECCOM); Emerging Academics Snapshot (EAS); Observation Measure of Language and Literacy Instruction (OMLIT); Observation Record of the Caregiving Environment (ORCE); Preschool Classroom Mathematics Inventory (PCMI); Supports for English Language Learners Classroom Assessment (SELLCA); and Preschool Program Quality Assessment, 2nd ed. (PQA).
Part
III
How to Assess
In this part, we turn to the question of how to select and administer assessments, once purposes have been established and
domains selected. Some of the issues dealt with here are the
technical ones defined by psychometricians as key to test quality:
the reliability and validity of inferences, discussed in Chapter 7.
Others have to do with the usability and fairness of assessments,
issues that arise when assessing any child but in particular children with disabilities and children from cultural and language
minority homes; these are discussed in Chapter 8. In Chapter 9,
and in particular with regard to direct assessments, we discuss the
many ways in which the test as designed may differ from the test
as implemented. Testing a young child requires juggling many
competing demands: developing a trusting relationship with the
child, presenting the test items in a relatively standardized way
that is nonetheless natural, responding appropriately to both correct and incorrect answers and to other child behaviors (signs of
fear, anxiety, sadness, shyness). While it may not be possible to
manage all these demands optimally, it is important that they are
at least acknowledged when interpreting test results.
7
Judging the Quality and
Utility of Assessments
In this chapter we review important characteristics of assessment instruments that can be used to determine their quality
and their utility for defined situations and purposes. We
review significant psychometric concepts, including validity and
reliability, and their relevance to selecting assessment instruments, and we discuss two major classes of instruments and the
features that determine the uses to which they may appropriately
be put. Next we review methods for evaluating the fairness of
instruments, and finally we present three scenarios illustrating
how the process of selecting assessment instruments can work
in a variety of early childhood care and educational assessment
circumstances.
Many tests and other assessment tools are poorly designed.
The failure of assessment instruments to meet the psychometric
criteria of validity and reliability may be hard for the practitioner
or policy maker to recognize, but these failings reduce the usefulness of an instrument severely. Such characteristics as ease of
administration and attractiveness are, understandably, likely to be
influential in test selection, but they are of less significance than
the validity and reliability considerations outlined here.
Validity and reliability are technical concepts, and this chapter addresses some technical issues. Appendix A is a glossary of
words and concepts to assist the reader. Especially for Chapter 7,
target domain requires a complex chain of inferences and generalizations that must be made clear as a part of the interpretive
argument.
An interpretive argument for a measure of children's cognitive
development in the area of quantitative reasoning, for example,
may include inferences ranging from those involved in the scoring
procedure (Is the scoring rule that is used to convert an observed
behavior or performance by the child to an observed score appropriate? Is it applied accurately and consistently? If any scaling
model is used in scoring, does the model fit the data?); to those
involved in the generalization from observed score to universe
of scores (Are the observations made of the child in the testing or
observation situation representative of the universe of observations or performances defining the target cognitive domain? Is the
sample of observations of the child's behavior sufficiently large to
control for sampling error?); to extrapolation from domain score
to level of development (or level of proficiency) of the competencies for that domain (Is the acquisition of lower level skills a
prerequisite for attaining higher level skills? Are there systematic
domain-irrelevant sources of variability that would bias the interpretation of scores as measures of the child's level of development
of the target domain attributes?); to the decisions that are made,
or implications drawn, on the basis of conclusions about developmental level on the target outcome domain (e.g., children with
lower levels of the attribute are not likely to succeed in first grade;
programs with strong effects on this measure are more desirable
than those with weak effects).
The decision inference usually involves assumptions that rest
on value judgments. These value assumptions may represent
widely held cultural values for which there is societal consensus,
or they may represent values on which there is no consensus or
even bitter divisions, in which case they are readily identifiable
for the purposes of validation. When the underlying decision
assumptions represent widely held values, they can be difficult to
identify or articulate for validation through scientific analysis.
The interpretive argument may also involve highly technical inferences and assumptions (e.g., scaling, equating). The
technical sophistication of measurement models has reached
such a high degree of complexity that they have become a black
observational situations and allowed to explore the environment so that they are comfortable. The results can provide
insights ranging from the very plain (the children were very
distracted when responding) to the very detailed, including
evidence about particular behaviors and actions that were evident when they were responding.
The exit interview is similar in aim but is timed to occur after
the child has made his or her responses. It may be conducted
after each item or after the assessment as a whole, depending
on whether the measurer judges that the delay will or will
not interfere with the child's memory. Again, limitations with
infants and toddlers are obvious. The types of information
gained will be similar to those from the think-aloud, although
generally it will not be so detailed. It may be that a data collection strategy that involves both think-alouds or observations
and exit interviews will be best.
3. Evidence Based on Internal Structure. To collect evidence
based on internal structure, the measurer must first ensure that
there is an intended internal structure. Although this idea of
intended structure may not always be evident, it must always
exist, even if it is treated as being so obvious that it need not be
mentioned or only informally acknowledged in some cases. We
refer to this internal structure as the construct. This is what has
been described above in the section on construct validity. Note
that the issue of differential item functioning (DIF), discussed
later in this chapter, is one element of this type of evidence,
specifically one related to fairness of the assessment.
4. Evidence Based on Relations to Other Variables. If there are
other external variables that the construct should (according
to theory) be related to, and especially if another instrument is
intended to measure the same or similar variable, a strong relation (or lack of a strong relation) between the assessment under
scrutiny and these external variables can be used as validity evidence. Typical examples of these external variables are (a) caregiver judgments and (b) scores on other assessments. Another
source of external variables is treatment studies: if the measurer
has good evidence that a treatment does indeed change the construct, then the contrast on the assessment between a treatment
and a control group can be used as an external variable. (One
validity, information regarding the consequences of the assessment becomes part of the evidentiary basis for judging the
validity of the assessment. An illustration can be drawn from
high-stakes assessments in education, through which policy
makers have sought to establish accountability. As with any
form of assessment, these can have intended or unintended,
desirable or undesirable consequences. An alleged potential
consequence of high-stakes assessments is that they can drive
instructional decisions in unintended and undesirable ways,
usually by over-emphasizing the skills tested ("teaching to
the test"). They can also possibly have a corrupting influence,
since the motivation to misuse or misrepresent test scores can
be compelling. In addition, the psychometric characteristics
of the test can vary depending on whether it is administered
under low- or high-stakes conditions (e.g., level of motivation
or anxiety as construct-irrelevant sources of variance in test
performance). It is also possible that new and future technologies used to administer, score, or report assessments will have
unintended, unanticipated consequences, as many new technologies have had.
Social Consequences of Assessment
As in the field of medicine, in assessment there is an obligation to do no harm to those assessed. As such, it is important to
inquire into the intended as well as unintended consequences of
assessment. Validity theoreticians differ from one another in the
extent to which they incorporate the consequences of assessment
under the purview of validity. Thus, although evidence about
consequences can inform judgments about validity, it is important
to distinguish between evidence that is directly relevant to validity and consequences that may inform broader concerns, such as
educational or social policy.
For example, concerns have been raised about the impact of
certain forms of assessment on narrowing the curriculum. (That
is, it is often said that assessments should not have the effect of
unduly narrowing the early childhood program's focus to the
detriment of the program's wider or comprehensive goals.) For
example, an educational assessment system should not lead
test-retest reliability coefficient. For a test-retest reliability coefficient, the respondents answer the questions twice, and
the reliability coefficient is then calculated simply as the correlation
between the two sets of scores. On one hand, the test and the
retest should be so far apart that it is reasonable to assume that
the respondents are not answering the second time by remembering the first but are genuinely responding to each item anew.
This may be difficult to achieve for some sorts of complex items,
which may be quite memorable. On the other hand, as the aim
is to investigate variation in the scores not due to real change
in respondents' true scores, the measurements should be close
enough together for it to be reasonable to assume that there has
been little real change. Obviously, this form of reliability index
will work better when a stable construct is being measured with
forgettable items, compared with a less stable construct being
measured with memorable items.
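A minimal computational sketch of this coefficient, using invented scores for ten children assessed twice:

```python
# Test-retest reliability as the Pearson correlation between scores from
# two administrations of the same instrument. Scores are hypothetical.
from statistics import correlation  # available in Python 3.10 and later

time1 = [12, 15, 9, 20, 14, 18, 11, 16, 13, 17]
time2 = [13, 14, 10, 19, 15, 17, 12, 18, 12, 16]

print(f"Test-retest reliability r = {correlation(time1, time2):.2f}")
```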
Another type of reliability coefficient is the alternate forms
reliability coefficient. With this coefficient, two sets of items are
developed for the instrument, each following the same construction process. The two alternate copies of the instrument are
administered, and the two sets of scores are then correlated to
produce the alternate forms reliability coefficient. This coefficient
is particularly useful as a means of evaluating the consistency
with which the test has been developed.
Other classical consistency indices also have their equivalents in the construct modeling approach.
For example, in the so-called split-halves reliability coefficient, the
instrument is split into two different (nonintersecting) but similar
parts, and the correlation between them is used as a reliability
coefficient after adjustment with a factor that attempts to predict
what the reliability would be if there were twice as many items in
each half. The adjustment is a special case of the Spearman-Brown
formula:

r' = Lr / [1 + (L - 1)r],

where r is the correlation between the two halves and L is the factor by which the number of items is increased (L = 2 when projecting from half-length to full-length).
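As a purely numerical illustration (not a value taken from any instrument reviewed here), a split-half correlation of r = .60 would be adjusted to

r' = 2(.60) / (1 + .60) = 1.20 / 1.60 = .75.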
academic achievement in primary grades was predicted by assessments administered in preschool or kindergarten. This provides a
ceiling for possible external validity evidence. Observation-based
measures present an entirely different set of issues. They do not
present any of the problems associated with the young childs
ability to understand and comply with the demands of a structured testing situation, since the childs day-to-day behavior is
the basis for the inference of knowledge and skills. Teachers and
caregivers collect data over a variety of contexts and over time to
gain a more valid and reliable picture of what children know and
can do. Observation-based assessment approaches also are consistent with recommended practices for the assessment of young
children. The challenges associated with observation-based measures center on the caregiver or teacher as the source
of the information. Mathematica Policy Research (2007) has summarized challenges related to observation-based assessments:
• There is a need to establish trust in teachers' and caregivers' judgments. Research has identified the conditions under which their ratings are reliable, but there is an ongoing need to monitor reliability.
• Teachers and caregivers must be well trained in the administration of the tool to achieve reliable results. More research is needed to specify the level of training needed to obtain reliable ratings from preschool teachers. (Assessors of direct assessments need to be trained as well, but the protocol may be more straightforward.)
• The assessment needs to contain well-defined rubrics and scoring guides.
• Teachers and caregivers may be inclined to inflate their ratings if they know the information is being used for program accountability.
• Not all teachers or caregivers will be good assessors.
• Measurement carried out by teachers and caregivers requires that additional steps be taken to ensure the validity and reliability of the data, such as periodic monitoring.
A strength of observation-based measures is that the information has utility for instructional as well as accountability purposes.
methods for examining (a) test bias and (b) DIF. These issues are
most relevant for three populations of young children, which are
the subject of the next chapter: minority children, English language
learners, and children with disabilities.
Differential Item Functioning
Assessments are typically made of children from a variety of
backgrounds. One standard requirement of fairness in assessment
practice is that, for children who are at the same level of ability on
the variable being measured, the items in the instrument behave
in a reasonably similar way across different subgroups. That is,
the items should show no evidence of bias due to DIF (American Educational Research Association, American Psychological
Association, and National Council on Measurement in Education,
1999, p. 13). Typically these subgroups are gender, ethnic and
racial, language, or socioeconomic groups, although other groupings may be relevant in particular circumstances.
First, it is necessary to make an important distinction. If the
responses to an item have different frequencies for different subgroups, then that is evidence of differential impact of the item on
those subgroups. Although such results may well be of interest
for other reasons, they are not generally the focus of DIF studies.
Instead, DIF studies focus on whether children at the same locations
on the score distribution give similar responses across the different
subgroups.
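The sketch below illustrates this logic with invented data: children are grouped into bands of matched total score, and within each band the proportion answering a given item correctly is compared across subgroups. It is a deliberately simplified check, not a substitute for established DIF procedures such as the Mantel-Haenszel statistic or item response theory methods, and the group labels and scores are hypothetical.

```python
# Simplified illustration of a DIF check: within bands of matched total
# score, compare the proportion of each subgroup answering an item
# correctly. Large within-band gaps suggest possible DIF; different
# overall pass rates alone indicate only differential impact.
from collections import defaultdict

def dif_table(records, n_bands=4, max_score=20):
    """records: list of (subgroup, total_score, item_correct) tuples."""
    band_width = max_score / n_bands
    counts = defaultdict(lambda: [0, 0])   # (band, subgroup) -> [correct, total]
    for subgroup, total, correct in records:
        band = min(int(total // band_width), n_bands - 1)
        counts[(band, subgroup)][0] += int(correct)
        counts[(band, subgroup)][1] += 1
    return {key: c / n for key, (c, n) in counts.items() if n > 0}

records = [
    ("group_A", 5, False), ("group_A", 12, True), ("group_A", 18, True),
    ("group_B", 6, False), ("group_B", 11, False), ("group_B", 19, True),
]
for (band, subgroup), p in sorted(dif_table(records).items()):
    print(f"score band {band}, {subgroup}: proportion correct = {p:.2f}")
```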
DIF is not always indicated when different groups perform
differently on an assessment or on particular items. For example,
suppose that more English language learners than native English
speakers got a particular item wrong on an assessment of speaking
in English; that would constitute differential impact
on the results of the assessment and could well be an interesting result in itself. But the issue of DIF would not necessarily be
raised by such a resultit is to be expected that someone learning
a language will find it harder to speak that language than native
speakers, and hence the result does not challenge the contention
that the instrument was accurately measuring that difference in
their speaking performance.
However, if children from the two groups who scored at
FIGURE 7-1 Examining differential item functioning: proportion answering item Z correctly vs. score on entire test, for male and female subjects
(hypothetical data).
over the course of the year to ensure that their language skills are
developing at an appropriate pace and that they will be ready for
kindergarten when they finish at Honeycomb.
The committee discusses these purposes and works to further
clarify the assessment setting. They discuss who will administer
and score the assessments, who will interpret the assessments,
what specific decisions will be made on the basis of the assessment results, when these decisions will need to be made and how
often they will be reviewed and possibly revised, which children
will participate in the assessments, and what the characteristics of
these children are: their ages, their race/ethnicity, their primary
language, their socioeconomic status, and other aspects of their
background and culture that might affect the assessment of their
language skills. Dr. Thompson concludes, on the basis of the
answers to these questions and refinement of their purposes in
assessing children's language, that either a direct assessment or
a natural language assessment might be used. Ms. Conway likes
the idea of using a natural language assessment but considers
that such an assessment may be too costly. The committee decides
not to preclude any particular form of assessment until they have
more information on the available assessments; their reliability
and validity for the purposes they have specified with children
like those at Honeycomb; and the specific costs associated with
using each of them, including the costs of training personnel to
administer, score, and interpret the assessments and the costs
associated with reporting and storing the assessment results so
that they will be useful to teachers.
The committee next considers how they will go about identifying suitable tests. They consider what tests are being used in
other programs like Honeycomb. In one nearby program, the
director has adopted the use of a locally developed assessment.
Ms. Conway considers that perhaps Honeycomb could also use
this assessment, since the other program appears to be obtaining
excellent results with it. However, Dr. Thompson points out that
such a locally developed test, because it has not been normed with
a nationally representative sample, will not meet at least one of
the stated purposes for assessment, namely, to provide the teacher
with information about how each assessed child is doing relative
to other typically developing children. Knowledge about how
about reliability and validity that has been accumulated from all
available sources. It is tempting to think that the best decision will
be obvious and that everyone would make the same decision in
the face of the same information, but each setting is somewhat different, and choosing between tests is a matter of balancing competing objectives. For example, reviewers may differ in how much
weight they put on the desire for short testing times compared
with the desire for high reliability and validity for all subgroups of
children, or the desire for a single assessment compared with the
desire to measure all of the identified skills. Thus, decisions may
vary from setting to setting, or even between members of the same
committee in a given setting. These differences can be reduced by
deciding on specific weights for each criterion that all reviewers
will use, but in most situations these differences of opinion will
need to be resolved by the committee.
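When a committee does agree on explicit criterion weights, the arithmetic is straightforward: each test's ratings are multiplied by the weights and summed, and the candidates are compared on the resulting totals. The sketch below illustrates this with invented criteria, weights, and ratings; none of the numbers or criterion names come from this report.

```python
# Minimal sketch (hypothetical criteria, weights, and ratings): combining
# agreed-on criterion weights with reviewer ratings to compare candidate tests.
weights = {"reliability": 0.30, "validity_evidence": 0.30,
           "testing_time": 0.15, "subgroup_coverage": 0.15, "cost": 0.10}

ratings = {  # 1 (poor) to 5 (strong); invented committee judgments
    "Test A": {"reliability": 5, "validity_evidence": 4, "testing_time": 2,
               "subgroup_coverage": 4, "cost": 3},
    "Test B": {"reliability": 3, "validity_evidence": 3, "testing_time": 5,
               "subgroup_coverage": 2, "cost": 5},
}

def weighted_score(r):
    """Weighted sum of a single test's criterion ratings."""
    return sum(weights[c] * r[c] for c in weights)

for name, r in sorted(ratings.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(r):.2f}")
```

Even with explicit weights, the totals are an aid to deliberation rather than a substitute for it.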
It is important to keep in mind that, at this point, the goal is
simply to settle on a small slate of possible tests to review directly.
The committee can always decide to keep an extra test in the
review when opinions about it are divided. Some information
will remove a test from further consideration, such as a test that
has been shown to function differently for language-minority children, children with disabilities, or other important subgroups (see
the section on differential item and differential test functioning),
or a test found to have poor reliability for one or more subgroups,
or a test that is found to have special requirements for test administrators that cannot be met in the current setting.
Lack of information is not, in and of itself, a reason to reject
a test. For example, a test that appears strong on all other criteria
may have no information on its functioning for language-minority
children. Specifically, the published information may not discuss
the issue of test bias, and there may be no normative information
or validity studies that focus on the use of the test with this population. The decision that one makes about this test may depend
largely on: (1) the strength of other tests in the pool with respect to
their use with language-minority children, (2) the ability to locate
information from other sources that can provide the missing information on the test in question, and (3) the capacity of the center
to generate its own information on how the test functions with
this population of children through systematic use of the test and
elect to show the tests to the teachers who will use them, to have
teachers rate the difficulty of learning to administer the test, and
to pilot the tests with a few children in order to get a sense of how
they react to the procedures. This information will be compiled,
along with the technical and descriptive information about the
test, the information on cost, and the committee's best judgment
about any special infrastructure that might be needed to support
a particular test (e.g., a test may require computerized scoring to
obtain standard scores).
At this point, the committee can choose the test or tests that
will best meet the assessment needs of the center. The decision
about which test or tests to adopt will boil down to a compromise
across the many criteria agreed on by the committee. In this case,
these included the desire to have an assessment process that is
both child and teacher friendly, minimizes lost instructional time,
meets the highest standards of evidence for reliability and validity for the purposes for which assessment is being planned and
with the particular kinds of children who make up the center's population, and that can be purchased and supported within the budgetary limits set out by the director. To no one's surprise, no test has emerged that is rated at the top on all of the committee's dimensions. Nevertheless, the committee's diligence in collecting and reviewing information and in their deliberations has given them the best possible chance of selecting a test that meets their needs.
Selecting Tests for Multiple Related Entities
In this scenario we consider a consortium of early childhood
programs that seeks to establish an assessment system to guide
instructional decisions that can be used across all programs in the
consortium. The process is similar in many respects to the process
followed by Ms. Conway and the team at Honeycomb. Unique
to this situation are the facts that the consortium wishes to use
assessment to guide instructional decision making and that the
consortium would like to use the assessment system across all
members of the consortium. These differences suggest that the
processes adopted by Honeycomb should be modified in specific
in what form will the data be collected, and how will the data be
stored and aggregated for reporting purposes? Who will decide
on report formats and the process of disseminating the results?
This list is not exhaustive, but it highlights some of the additional
challenges that arise when more than one entity is involved in the
testing enterprise.
Another major difference between the current scenario and the
Honeycomb scenario is the focus on using assessment results to
guide instructional decisions. Using assessments to guide instructional decisions implies that assessments will occur at intervals
throughout the year, which may imply that different assessments
are used at different times during the year, or that different forms
of the same assessments are used at different times during the
year. In part this distinction hinges on the nature of the instructional decisions to be made throughout the year. Decisions that
relate to monitoring progress in a single domain would generally
argue for the use of different forms of the same assessment over
time, whereas decisions that relate to the introduction of instruction in a new domain or transitioning from one form of instruction to another (e.g., from native language instruction to English
instruction) might argue for the use of a different assessment.
Several questions must be considered when the focus is on
guiding instruction. The first is whether or not the assessment is
expected to assess progress against a specific set of standards set
forth by the state, the district, the consortium, or some other entity.
Ideally, there will not be multiple sets of standards against which
performance must be gauged, as each set of standards potentially
increases the number of behaviors that have to be assessed and
monitored, and the more standards that exist, the more likely it
becomes that sets of standards will come into conflict with one
another.
A second major question that must be addressed is the distinction between status and growth. If the assessment is to monitor
growth over time, it should be clear in what domain growth is
being measured, whether growth in that domain is captured
through quantitative change (i.e., change in level of performance),
or whether growth in that domain is captured through qualitative change (i.e., change in type), or both. Measuring quantitative
change requires that additional psychometric work has been done.
8
Assessing All Children
standardized tests, even controlling for socioeconomic background and proficiency in standard American English (Garcia
and Pearson, 1994; Rock and Stenner, 2005). The list of theories
related to such disparities is long; however, one reason relevant
to this report is that differences in test scores (e.g., between black
and white children) may be due to striking disparities in ecological conditions and to instruments that are not designed to be
sensitive to those cultural variations. Such contextual variations,
if not considered in the assessment instrument design, can lead
to systematic biases (Brooks-Gunn et al., 2003). Such bias may actually perpetuate or increase social inequalities, because a test whose content and measures reflect the values, culture, and experiences of the majority effectively legitimates those inequalities (Gipps, 1999).
Inappropriate Standardization Sample and Methods
Hall (1997) argues that Western psychology tends to operate from an ethnocentric perspective, assuming that research and theories based on the majority white population are applicable to all groups. These paradigms are seen as templates to be used on all
groups to derive parallel conclusions. As such, often the standardization samples of tests are primarily drawn from white
populations, and often minorities are included in insufficient
numbers for them to have a significant impact on item selection or to prevent bias. For example, there is a great deal of
concern about accurate identification of language disorders
among black children using standardized, norm-referenced
instruments, because many literacy tests are developed based
on mainstream American English and do not recognize dialect
differences. The tests have been normed on children from white,
middle-class backgrounds (Fagundes et al., 1998; Qi et al., 2003;
Washington and Craig, 1992). Often validity and sampling
tests do not include representative samples of nonmainstream
English speakers, so the statistical ability to find items that are
biased is limited (Green, 1980; Seymour et al., 2003).
The large proportion of minority children who score poorly on some standardized language assessment tools may have more to do with the fact that the tests have been normed
on children from primarily white, middle-class language backgrounds than with true differences in children's language abilities (Qi et al., 2003). Minority groups may be underrepresented
in standardization samples relative to their proportions in the
overall population, or their absolute number may be too small
to prevent bias. Standardized tests based on white middle-class normative data have inevitable bias against children from
minority and lower SES groups, providing information on their
status in comparison to mainstream children. They do not take
into account cultural differences in values, beliefs, attitudes, and
cultural influences on assessment content; contextual influences
of measuring behavior; or alternative pathways in development
(Notari-Syverson et al., 2003, p. 40).
In addition, the fact that a minority group is included in a
normative sample does not mean the assessment tool is unbiased
and appropriate to use with that group (Stockman, 2000). It
is a common misconception that, because a test is normed, it is unbiased with respect to minorities. The norming process, by its nature,
leans toward the mainstream culture (Garcia and Pearson, 1994).
When test companies draw strict probability samples of the
nation, very small numbers of particular minorities are likely to be
included, increasing the likelihood that minority group samples
will be unrepresentative. Even if a test is criterion-referenced
instead of norm-referenced, the performance standards (cutoff
scores) by which the children's performance is evaluated are
likely to be based on professional judgments about what typical
(that is, mainstream) children know and can do at a particular
developmental level (Garcia and Pearson, 1994).
Inappropriate Testing Situation and Examiner Bias
Rarely examined is the assessor's influence on child assessments and whether assessor familiarity or unfamiliarity exerts a
bias against different population groups. For example, situational
factors may systematically enhance or depress the performance
of certain groups differently, such as familiarity with the testing
situation, the speed of the test, question-answer communication
style, assessor personal characteristics, and the like (Green, 1980,
p. 244). Assessor and language bias is present particularly if the
Developmental Domain | Number of Assessment Tools Searched | Number of Bias Testing Articles Found | Tools With Bias Testing Articles Found
Cognitive | 11 | 16 | Kaufman Assessment Battery for Children (K-ABC) (n = 5); Peabody Individual Achievement Test-Revised (PIAT-R) (n = 2); Stanford-Binet Intelligence Scales, Fourth ed. (SB-IV) (n = 3); Wechsler Preschool and Primary Scale of Intelligence, Third ed. (WPPSI-III) (n = 3); Woodcock-Johnson III (WJ-III) (n = 3)
Language | 15 | | Expressive Vocabulary Test (n = 3); Peabody Picture Vocabulary Test III (n = 5); Preschool Language Scale (n = 1)
Socioemotional | 21 | | Behavioral Assessment System for Children (n = 1); Bayley Scales of Infant Development (n = 1); Child Behavior Checklist 1½-5 (n = 1); Attachment Q-Set (n = 1); Penn Interactive Peer Play Scale (n = 1)
Approaches to learning | | |
a lack of adequate instruments to use with them, especially considering the hundreds of languages spoken in the United States.
Some tests exist in Spanish, but most lack the technical qualities
of a high-quality assessment tool. In addition, there is a shortage
of bilingual professionals with the skills necessary to evaluate
these children, and a shortage as well of conceptual and empirical work systematically linking context with child learning. In
this section we discuss these challenges, review important principles associated with high-quality assessments of young English
language learners, and discuss further needs in the field so that
research and practice work together to see that such principles
are implemented.
Several terms are used in the literature to describe children
from diverse language backgrounds in the United States. A general term describing children whose native language is other than
English, the mainstream societal language in the United States, is
"language minority." This term is applied to nonnative English
speakers regardless of their current level of English proficiency.
Other common terms are "English language learner" and "limited English proficient." These two terms are used interchangeably to
refer to children whose native language is other than English and
whose English proficiency is not yet developed to a point at which
they can profit fully from English instruction or communication.
In this report, the term "English language learner" is used, rather than "limited English proficient," as a way of emphasizing children's learning and progress rather than their limitations. Given
the charge of the committee, the focus is particularly on children
from birth to age 8: young English language learners.
Young English Language Learners: Who Are They?
Young English language learners have been the fastest growing child population in the country over the past few decades, due
primarily to increased rates in both legal and illegal immigration.
Currently, one in five children ages 5-17 in the United States has a
foreign-born parent (Capps et al., 2005), and many, though not all,
of these children learn English as a second language. Whereas the
overall child population speaking a non-English native language
in the United States rose from 6 percent in 1979 to 14 percent in
2000). This diverse population of young children presents numerous challenges related to the validity of assessments, not only
because they are young, but also because of their developmental
or disability-related needs. The following pages address why
young children with special needs are being assessed, the principles that should guide assessment, and some of the unique issues
raised by conducting assessments for this population. The term
"young children with special needs" is used to describe children
from birth through age 5 years who have diagnosed disabilities,
developmental delays, or a condition that puts them at risk for a
delay or a disability.
Key to understanding the assessment issues in this area is
understanding who makes up this population. Many children
with special needs receiving services do so through programs
supported under the Individuals with Disabilities Education Act,
the primary law that provides funding and policy guidance for
the education of children with disabilities. The IDEA is basically a
grants program of federal funds going to states to serve students
with special needs on the condition that the education provided
for them is appropriate (National Research Council, 1997).
In 2006, nearly 1 million children with special needs under
age 5 received services through programs governed by the IDEA.
Specifically, almost 300,000 children under age 3 received early
intervention services and more than 700,000 children ages 3 to 5
received special education and related services (https://www.
ideadata.org/arc_toc8.asp#partbCC). Children under age 5 with
special needs are served under two different sections of IDEA.
Children from birth to age 3 receive services under Part C, Infants
and Toddlers with Disabilities, whereas children ages 3 through
5 are served under Part B, which addresses special education and
related services for children and youth ages 3 through 21.
Infants and toddlers receive services for a variety of developmental problems, with communication problems being the most
frequent. A total of 64 percent of children served under age 3 have
some kind of developmental delay. Nearly one in five (19 percent)
have some kind of a prenatal or perinatal abnormality, and 18
percent have motor problems. Three-fourths of the children identified between ages 2 and 3 receive services for a communication
problem. Smaller percentages have problems with movement (18
children, including those with special needs. These observation-based tools are unique because they were designed from the
beginning to ensure that young children with disabilities could
be included in the data collection (see http://www.draccess.org
for more information).
In addition to these general problems, we describe below several challenges of special relevance to the assessment of children
with disabilities.
Construct-Irrelevant Skills and the Interrelatedness of
Developmental Domains
For a young child to demonstrate competency on even a single
item on an assessment requires a combination of skills, yet some
of them may not be relevant to the construct being assessed. To
the extent that items on an assessment require skills other than
the construct being assessed (e.g., problem solving), construct-irrelevant variance exists in the scores. Some examples of this in
assessments of young children with special needs are obvious. A
child who cannot hear or who has no use of her arms will not be
able to point to a picture of a cat when asked. The item requires
hearing and pointing as well as knowledge of a cat, even though
these are not the skills being tested. The child who cannot point
will fail the item, regardless of what he or she knows about cats.
Other occurrences of construct-irrelevant variance may not
be so obvious. All assessments that require children to follow
and respond to the examiners directions require some degree
of language processing. Even though test developers attempt to
address this by keeping instructions simple, all young children
are imperfect language processors because they are still learning language. Many young children with special needs have
impairments related to communication, meaning their capacity
to process language is even less than the restricted capacity of a
typical peer. Unlike deafness, blindness, or a motor impairment,
language processing problems may present no visible signs of
impact on the assessment process.
Construct-irrelevant variance is a major problem for the
assessment of young children because many assessments are
organized and scored around domains of development. Domains
Inclusive assessment population; precisely defined constructs; accessible, nonbiased items; amenable to accommodations; simple, clear, and intuitive instructions and procedures; maximum readability and comprehensibility; maximum legibility.
Conclusion
The nearly 1 million young children with special needs are
regularly being assessed around the country for different purposes. Although a variety of assessment tools are being used
for these purposes, many have not been validated for use with
these children. Much more information is needed about assessments and children with special needs, such as what tools are
being used by what kind of professionals to make what kind of
decisions. Assessment for eligibility determines whether a young
child will have access to services provided under the IDEA. It is
unknown to what extent these critical decisions are being made
consistent with recommended assessment practices and whether
poor assessment practices are leading to inappropriate denial of
service. The increasing call for accountability for programs serving young children, including those with special needs, means
that even more assessment will be occurring in the future. Yet
the assessment tools available are often insufficiently vetted for use as accountability instruments, are difficult to use in standardized ways with children who have special needs, and focus inappropriately on discrete skills rather than functional capacity
in daily life. Until more information about assessment use is available and better measures are developed, extreme caution is critical
in reaching conclusions about the status and progress of young
children with special needs. The potential negative consequences
of poor measurement in the newest area of assessment, accountability, are especially serious. Concluding that programs serving
young children with special needs are not effective based on
flawed assessment data could lead to denying the next generation
of children and families the interventions they need. Conversely,
good assessment practices can be the key to improving the full
range of services for young children with special needs: screening,
identification, intervention services, and instruction. Good assessment practices will require investing in new assessment tools and
creating systems that ensure practitioners are using the tools in
accordance with the well-articulated set of professional standards
and recommendations that already exist.
9
Implementation of
Early Childhood Assessments
Findings from a representative sample of Head Start programs regarding implementation of the Head Start National Reporting System (NRS) suggest
that there was ambiguity as to whether the information from the
child assessments was to be used for evaluation and monitoring
purposes (with the intent of informing program improvement
and tracking whether improvements were occurring over time) or
whether it was intended for high-stakes purposes (to make determinations about program funding). Staff in 63 percent of the programs in this study indicated that they felt that it was not clear how
the results of the assessment were going to be used (Mathematica
Policy Research, 2006). This study concluded that when systems
of early childhood assessment are implemented, information
should be shared with programs about how data will be used.
Furthermore, if the intent is to guide program improvement, the
results at the program level should be shared with sufficient time
to guide decisions for the coming year, and guidance should be
provided on how to use the results at the program level.
Communicating with Parents
A further issue of importance in planning for the implementation of early childhood assessments is whether informed consent
is required of parents and how they will be informed of results.
Mathematica Policy Research (2006) reports that in the representative sample of Head Start programs studied to document
implementation of the NRS, nearly all programs had informed
parents that their children would be participating in the assessments. However, there was ambiguity as to whether informed
consent was needed. In the second year of implementation, in
this sample, two-thirds of programs had obtained written consent
from parents. This represented a substantial increase over the
proportion of programs collecting written consent in the first year
of implementation.
Thus, in preparing for administration of early childhood
assessments, a clear decision should be made about a requirement to obtain informed consent from parents, and it should be
A report of the spring 2006 NRS administration was published in 2008 and
received too late for inclusion here.
tion to the assessor was associated with higher scores for children
ages 6 to 9 on a measure of receptive vocabulary assessed in the
home as part of the National Longitudinal Survey of Youth-Child
Supplement. This finding suggests that a familiar presence may
help the child relax and focus during an assessment. It is also possible that the causal direction works in the opposite way, and that
children who have closer, more supportive, and stimulating relationships with parents (and therefore may tend to score higher on a vocabulary assessment) also tend to have parents who want
to stay with them and monitor a situation with an unfamiliar
adult present. In addition, in this study, when there was a match
between the child's and the assessor's race, the race-related gap
in assessment scores on measures of vocabulary, reading, and
mathematics was significantly reduced.
Counterbalancing these findings are reports from the study
by Mathematica Policy Research (2006) indicating that familiarity of the assessor and child can also pose difficulties. In the
small but representative sample of Head Start programs in which
implementation of the NRS was studied, teachers were used as
assessors in 60 percent of programs. Furthermore, teachers were
often permitted to assess the children in their own classes (this
was reported in 75 percent of programs that used teachers as
assessors). According to the report, teacher assessors sometimes
became frustrated when they felt that the child was responding
incorrectly, because the teacher felt that the child knew the correct
answer to an assessment question (for example, that the child could name more letters than he or she answered correctly on the letter-naming task).
Teachers sometimes felt uncomfortable with the standardization
required for the assessments, especially not being able to provide
praise when the child performed well. Some children also reportedly became concerned because of the discrepant behavior of their
teachers in not providing positive feedback.
Systematic study of the effects of familiarity of the assessor
on childrens assessment scores would make an important contribution. While evidence to date concerns variation in childrens
scores and reactions to the assessment situation when familiarity
with the assessor has varied naturally (that is, at the decision of
families or programs regarding who should be present during
an assessment), an important next step would be to randomly
One recent study examined variations in childrens performance associated with session length on the assessments carried
out for the Preschool Curriculum Evaluation Research Study
(PCER; Rowand et al., 2005). While the FACES early childhood
assessments and the assessments carried out for the Head Start
Impact Study required about 20 minutes to administer, and
the NRS took approximately 15 minutes, the PCER assessment
battery was substantially longer, requiring about 60 minutes.
Because the PCER study was designed to evaluate the full range
of impacts of different early childhood curricula, it was important
that multiple domains of development be assessed. However, an
important question was whether the longer assessment had implications for the children's performance.
Rowand et al. (2005) found that children who took longer to
complete the PCER assessments scored higher, probably because
these children were administered more items to reach their ceiling.
These researchers also asked whether children generally scored as
well on subtests focusing on literacy that were administered earlier versus later in the assessment battery. They found that 63 percent of children showed consistent performance on the early- and
late-administered literacy assessments. The 37 percent of children
whose performance varied on earlier versus later subtests of the
same domain, however, included 21 percent who scored worse
as the assessment proceeded (perhaps reflecting fatigue with the
long assessment) but 16 percent who scored better on the related
assessment carried out later in the session. In a sample of 1,168
preschool-age children, 228 needed two sessions instead of one
to complete the assessment. Performance on four key outcomes
did not differ significantly according to the number of sessions
required to complete the assessment. However, interviewers rated
children as more persistent, more likely to sit still, and less likely
to make frequent comments if they completed the assessment in
one session. These results suggest that long assessment batteries
may be difficult for some young children to complete, and that it
is important to train assessors to identify when to take breaks or
split administration. The authors of this study note the need for
a random assignment study in which children are assigned to
complete the same battery of assessments in one versus two sessions. This would eliminate issues of self-selection in the research
The conceptual scoring on the receptive vocabulary assessment is intended to acknowledge that children learning English
may have mastered particular words in one or another language,
giving the child the opportunity show mastery of vocabulary
across languages. This matches with the purpose noted above
of assessing overall mastery of concepts and vocabulary rather
than vocabulary in a particular language, an approach that will
not be appropriate if the underlying purpose is to assess retention of home language or progress in English. The important
point to note here is that the range of options for routing and of
approaches to assessment for children learning English is expanding and will enable better matching with the underlying purpose
of assessment.
Order of Administration
Questions about the order of administration of assessments
for children learning English arose in the initial year of the NRS
and resulted in a change in practice (Mathematica Policy Research,
2006). In the first year, all children receiving the assessment in
both Spanish and English started with the English assessment.
However, there was feedback that this was discouraging to children whose mastery of English was still limited. There was concern that scores on the Spanish language assessment were being
affected by these childrens initial negative experience with the
English assessment.
In the second year of administration, the order of administration was reversed, so that the Spanish version of the assessment
was always to be given first to children receiving the assessment
in both Spanish and English. Interestingly, this too caused some
problems, particularly in the spring administration. By this point,
children who were accustomed to speaking only in English in
their Head Start programs were not always comfortable being
assessed in Spanish. According to Mathematica Policy Research,
the children's discomfort may have arisen for several different
reasons: they may have been taught not to speak Spanish in
their Head Start programs, their Spanish may never have been
very strong, or their Spanish may have been deteriorating. There
were also some observed deviations from the sequencing of the
assessments in the small observational study of assessments conducted in both Spanish and English. Three of 23 programs that
participated in this study were observed continuing to administer
the assessment in English prior to the Spanish version after the
change in guidelines for administration.
These findings indicate that when assessments are to be administered in two languages, the order of administration is not an easy decision to make, because either ordering raises potential issues. Decisions about ordering may need
to take into account the nature and goals of the early childhood
program, especially whether the primary goal is to maintain two
languages or to introduce English. There is a need for systematic
study of whether scores for young children learning English
vary according to order of administration of home language and
English versions of assessments.
Length of Administration
The NRS implementation study found that administration
of the Spanish assessment took several minutes longer than the
English assessment (18.6 compared with 15.8 minutes). In addition, children who received the assessments in two languages had
to spend double the time or a little more in the assessment situation. The guidance that sites received was to try to administer both
assessments the same day, but to reserve the English language
assessment for another day if the child seemed bored or tired.
Interviews with program staff about their experiences in administering the NRS assessment indicated concern with the burden
to Spanish-speaking children of taking the assessment in two
languages (Mathematica Policy Research, 2006). There is a need
for systematic study of whether childrens assessment scores are
related to whether assessments in two languages are conducted as
part of a single session or broken up into two sessions.
Availability of Bilingual Assessors and Trainers
A further issue may be finding assessors who are sufficiently
bilingual to administer assessments in both Spanish and English.
Although the study conducted of assessments in both Spanish and
example, to ascertain whether an aide should be present, whether children need to take frequent breaks, or whether it is important to confirm
that hearing aids or other assistive devices are working properly.
It is possible that certification on assessments could include a
requirement to tape an assessment with a child who has a disability. Such a procedure would help to ensure that assessors are
aware of and are implementing appropriate practices for children
with special needs.
In the small study of NRS implementation, 30 of 35 programs reported carrying out assessments with children with
disabilities. Staff in these programs usually indicated that they
were comfortable with the accommodations made for these
children. However, about one in six programs would have liked
additional information on when to include children with disabilities in the assessment process and when to exempt them and
on the kinds of accommodations that were appropriate during
the assessments. Some direct observations of assessments carried
out as part of the study indicated that children who could have
been exempted were nonetheless being assessed. These findings
suggest that in implementing a system of early childhood assessments, it is a high priority to articulate clearly the decision rules
for including children with disabilities in the assessments as well
as to provide appropriate training for assessors on the use of
accommodations.
Following Up on Administration
Guiding the Use of Information from Assessments
Key implementation decisions for a system of early childhood assessments do not stop once the assessments have been
administered and the data analyzed and summarized. Decisions
have to be made about how assessment results will be reported
back to programs and program sponsors/funding agencies, and
what guidance will be provided on how programs should use the
information from the assessments. Fundamental decisions need to
be made about how results will be used if the purpose of carrying
out assessments is for program monitoring and evaluation or for
high-stakes purposes.
Part IV
Assessing Systematically
10
Thinking Systematically
In this volume we have discussed the dimensions of assessment, including its purposes, the domains to be assessed, and
guidelines for selecting, implementing, and using information
from assessments. Beyond this, however, one cannot make use
of assessments optimally without thinking of them as part of a
larger system. Assessments are used in the service of higher level
goals: ensuring the well-being of children and their families,
ensuring that societal resources are deployed productively,
distributing scarce educational or medical resources equitably,
facilitating the relevance of educational outcomes to economic
challenges, making informed decisions about contexts for the
growth and development of children, and so on. Assessments by
themselves cannot achieve these higher goals, although they are
a crucial part of a larger system designed to address them. Only
when the entire system is considered can reasonable decisions
about assessment be made.
This chapter argues that early childhood assessment needs to
be viewed not as an isolated process, but as integrated in a system that includes a clearly articulated higher level goal, such as
optimal growth, development, and learning for all children; that
defines strategies for achieving the goal, such as adequate funding, excellent teaching practices, and well-designed educational
environments; that recognizes the other elements of infrastructure
instrumental to achieving the goal, such as professional development and mechanisms for monitoring quality in the educational
environment; and that selects assessment instruments and procedures that fit with the other elements in service of the goal. We
begin by noting the multiple state and federal structures in which
early childhood assessments are being implemented.
These structures have emerged from different sources with
different funding streams (e.g., federally funded Head Start, state-funded prekindergarten, foundation-funded intervention programs) and rarely display complete convergence of performance
standards, criteria, goals, or program monitoring procedures.
Thus, referring to a larger system of early care and education
is slightly deceptive, or perhaps aspirational. Furthermore, even
the well-established programs in the system may lack key
components; for example, they may assess child outcomes but
not relate those outcomes to measures of the environment, or they
may not have a mechanism in place for sharing child outcome
data in helpful ways with caregivers and teachers.
We use recent National Research Council reports, state experiences with the No Child Left Behind Act, and the recent work
of the Pew Foundation-sponsored National Early Childhood Accountability Task Force, a national effort focused on accountability in early childhood, as a basis for articulating the components needed in order for early childhood assessment to be part
of a fully integrated system. We also provide some examples of
progress toward this goal at the state level. Although we did not
find any examples of fully integrated systems, in which services
are provided by a single source and the assessment infrastructure
is fully aligned and developed, the three states we describe are
moving toward integrating early childhood assessment in a well-articulated system.
WHAT DO WE MEAN BY A SYSTEM?
The idea of a system comes up often in education discussions and analyses: there are education systems, instructional systems, assessment systems, professional development systems. But it is not always clear what the word actually means. Systems have a number of important features, which are enumerated in Systems for State Science Assessment (National Research Council, 2006).
assistance in implementing instructional activities. These two subsystems, the individual- and the program-level feedback from child performance to teacher supports, function well as part of a
larger system if the same or consistent information is used in both
loops. However, if, for example, the teacher is responding to child
performance so as to enhance creative problem solving, whereas
the institution is encouraging teachers to focus on children's rote
memorization capacity, then the subsystems conflict and do not
constitute a well-functioning system.
In a well-designed program, the assessment subsystem is
part of a larger system of early childhood care and education
composed of multiple interacting subsystems. These other subsystems include the early learning standards, which describe what
young children should know and be able to do at the end of the
program; the curriculum, which describes the experiences and
activities that children will have; and the teaching practices,
which describe the conditions under which learning should take
place, including interactions among the teachers and children as
well as the provisioning and organization of the physical environment (National Research Council, 2006). The relationships among
these four subsystems are illustrated in Figure 10-1, adapted
from the curriculum, instruction, assessment (CIA) triangle
commonly cited in the educational assessment community. Each
of these subsystems is also affected by other forces, for example,
laws intended to influence what children are expected to learn,
professional development practices, and teacher preparation
policies influenced by professional organizations and accrediting agencies. We argue in this chapter that all these components
must be thought of as part of a larger system, and that they must
be designed so as to be coherent with one another, as well as with
the policy and education system they are a part of, and with the
goals for child development that the entire system is meant to be
promoting. We reframe these arguments as a conclusion to this
chapter.
Infrastructure for an Assessment System
An early childhood assessment subsystem should be part of
a larger system with a strong infrastructure that is designed to
org/projects/SCASS/projects/early_childhood_education_
assessment_consortium/publications_and_products/2838.cfm).
It defines standards as "widely accepted statements of expectations for children's learning or the quality of schools and other programs." Of critical importance in this definition is the inclusion of program standards on equal footing with expectations for children's learning.
The report Systems for State Science Assessment (National
Research Council, 2006) examines the role of standards in certain
educational assessments and recommends that they be designed
with a list of specific qualities in mind: standards should be clear,
detailed, and complete; be reasonable in scope; be correct in their
academic and scientific foundations; have a clear conceptual
framework; be based on sound models of learning; and describe
performance expectations and proficiency levels. State standards
that have been developed for K-12 education do not meet these
requirements as a whole, although some come closer than others.
Recent analyses of states' early childhood standards also suggest
some misunderstanding of the difference between content and
performance (Neuman and Roskos, 2005; Scott-Little, Kagan, and
Frelow, 2003a). Appendix C presents a brief description of the current status of state standards for early childhood education, and
includes some discussion of the efforts to align early childhood
with K-12 standards.
Standards should be arranged and detailed in ways that
clearly identify what children need to know and be able to do
and how their ideas and skills will develop over time. Learning
progressions (also called learning trajectories) and learning
performances are two useful approaches to arranging and detailing standards so as to guide curriculum, teaching practices, and
assessment.
Learning progressions are descriptions of successively more
sophisticated ways of thinking and behaving that tend to follow
one another as children mature and learn: they lay out in text and
through examples what it means to move toward more mature
understanding and performance.
A useful example of the ideas of learning progressions and
learning performances in the preschool years is California's Desired Results Developmental Profiles-Revised (DRDP-R).
FIGURE 10-2 An excerpt from the Desired Results Developmental Profile-Revised (Preschool, SOC 4 of 6, Measure 6). Definition: child interacts with other children through play that becomes increasingly cooperative and oriented towards a shared purpose. Developmental levels shown: Exploring, Developing, Building, and Integrating (at the Integrating level, the child leads or participates in planning cooperative play with other children). Examples: plays with blocks with another child; plays in sand to build a castle with several other children; joins another child to help look for a lost toy. Reprinted by permission from the California Department of Education, CDE Press, 1430 N Street, Suite 3207, Sacramento, CA 95814.
Reporting
The reporting of assessment results is frequently taken for
granted, but deliberation on this step is essential in the design of
assessment systems and for the sound use of assessment-based
information. In fact, decisions about the scope and targets of
reporting should be made before assessment design or selection
proper begins, and, most importantly, before the assessment data
themselves are collected (National Research Council, 2006).
Information about children's progress is useful for all tiers
of the system, although different tiers need varying degrees of
assessment frequency and varying degrees of detail. Parents,
teachers, early childhood program administrators, policy makers,
and the public need comprehensible and timely feedback about
what is taking place in the classroom (Wainer, 1997). Furthermore,
taking a systems perspective, many kinds of information need
to be accessible, but not all stakeholders need the same types of
information. Thus, very early in the process of system design,
questions need to be asked about how various types of information will be accessed and reported to different stakeholders and
how that reporting process can support valid interpretations.
Individual standards or clusters of standards can define the
scope of reporting, as can learning progressions if they have been
developed and made clear to the relevant audiences. Reports
can compare one child's performance, or the performance of a
group, with other groups or with established norms. They can
also describe the extent to which children have met established
criteria for performance (the current No Child Left Behind or
NCLB option). If descriptions of the skills, knowledge, and abilities that were targeted by the tasks in the assessment are included,
users will be better able to interpret the links between the results
and goals for children's learning. It is important to recognize that
many states lack the resources to design assessments that are
perfectly aligned with their standards. They may have to resort
to selecting existing assessments and cross-walking them to standards. While this may lead to a period of only partial alignment,
the exercise leads to useful opportunities to refine both standards
and assessment portfolios.
The reporting of assessment outcomes can take on many
The process described here may go beyond the resources available in many programs. In particular, some programs may need to
rely on selecting existing assessment tools and reporting strategies
rather than developing new ones. Nonetheless, we describe here
an ideal toward which programs should be moving.
The Current Landscape of Early Childhood Systems
An analysis of a systems approach for early childhood assessment starts with the somewhat utopian view presented in the
previous section, but it also requires careful review of the current
terrain: How are current early childhood assessment efforts linked
to standards, learning opportunities, or both? The early childhood landscape reveals multiple forms and targets of service and
assessment, varied sources of standards and mandates, numerous
ways of reporting and using data, and different approaches to
linking consequences with patterns of performance by children
and programs (Gilliam and Zigler, 2004); in other words, it is
at this moment very far from constituting a single system. The
National Early Childhood Accountability Task Force (2007) concluded that early childhood agencies are implementing a great
variety of child and program assessments.
Table 10-1 displays nine different forms of child and program assessments, including four forms of assessment used to
document the quality of early childhood programs, four forms
of assessments of young children, and one form of assessment
that gathers information on both program quality and children's
learning. Each form carries its own distinctive purposes, its procedure for reporting to different audiences, and its specific ways
of using assessment data. Taken together, these multiple assessments are generating many different types of data on children and
programs. They also require substantial time and effort from local
practitioners and program administrators (National Early Childhood Accountability Task Force, 2007).
Beyond drawing attention to the large number of different
forms of assessment, the Accountability Task Force Report notes
that current assessment models, with the single exception of program evaluation studies, separate reports about child outcomes
TABLE 10-1 Forms of Child and Program Assessments

Form of Assessment | Population Assessed | Uses of Data

Program Assessments
Quality rating systems | Providers seeking recognition for varied levels of quality | Consumer information on quality status; higher reimbursement rates for higher quality; program improvement
Program accreditation | Providers seeking recognition as above a threshold of quality | Consumer information on quality status; program improvement
Program monitoring | Providers receiving state/federal program funding | Program improvement; funding decisions
Program licensing | | Determine compliance with health and safety standards

Child Assessments
Kindergarten readiness assessment | All children at kindergarten entry | Report to public; planning early childhood investments
State/federal pre-K child assessments | Children enrolled in a state or federal program | Reporting to funding sources
Assessment for instruction | All children | Planning curriculum; informing parents
Developmental screening | All children |

Program evaluation | Representative samples of children and local programs | Report to legislatures and the public on program quality, outcomes, impacts; informs program improvement and appropriations decisions
 | Child Care | Head Start | State Pre-K | Early Childhood Special Education
Standards for children's learning | Early learning guidelines (49 states) | Head Start Child Outcomes Framework (federal) | Early learning guidelines (49 states) | 3 functional goals (federal)
Child assessments | No current requirements | National Reporting System* (federal) | Pre-K assessments (12 states); kindergarten assessments (16 states) | States report percent of children in 5 categories on 3 goals

*The National Reporting System was discontinued after this table was published.
SOURCE: National Early Childhood Accountability Task Force (2007).
not only for the required state and local reporting functions, but
also for ongoing program improvement and curriculum planning.
Nebraska's system is responsive to the federal mandate of the
IDEA Part C (birth to age 3) and Part B, 619 (ages 3 to 5), as well
as the state requirements of Nebraska Department of Education
Rule 11, Regulations for Early Childhood Programs (http://www.
nde.state.ne.us/LEGAL/RULE11.html), which apply to all pre-K
programs operated through public schools.
Program Quality Assessment
The system also includes regular evaluation of programs
to ensure that they achieve and maintain overall high quality,
employ qualified staff, and operate in compliance with federal and state guidelines. Programs receiving state funding are
required to conduct an annual evaluation using one of the environment rating scales, such as the Infant/Toddler Environment
Rating Scale-Revised, ITERS-R (Harms, Clifford, and Cryer, 1998);
Early Childhood Environment Rating Scale-Revised, ECERS-R
(Harms, Cryer, and Clifford, 1990); or the Early Language and
Literacy Classroom Observation, ELLCO (Smith and Dickinson,
2002), and complete Nebraska's Rule 11 reporting and approval
processes. Data obtained from these tools are used to develop
improvement plans. In addition, programs are strongly encouraged to participate in the accreditation process of the National
Association for the Education of Young Children and receive
technical and financial assistance to do so.
Professional Development
Programs receive continuous support to ensure that their
participation in Results Matter does generate the highest quality
data and knowledge about how to use it to improve program
quality and child and family outcomes. The state's Early Childhood Training Center, in cooperation with the organizations that
provide the program and child assessment tools, regularly offers
training in their use. The state maintains a cadre of professionals
who have achieved reliability in the use of the environment rating
scales. In addition, each program provider is required to submit
equivalent opportunity to achieve the defined goals, and the allocation of resources should reflect those goals. We emphasize that
a system of assessment is only as good as the effectiveness, and the coherence, of all of its components.
11
Guidance on Outcomes
and Assessments
can lead to decisions that are unfair or unclear, and they may do
harm to programs, teachers, and, most importantly, children.
In this chapter, we present a set of guidelines that should be
useful to a broad range of organizations charged with the assessment of children and of programs providing care and education
to young children. These guidelines are organized around the
major themes of the report and flow from the perspective that
any assessment decision should be made in the context of a
larger, coherent assessment system, which is in turn embedded
in a network of medical, educational, and family support systems
designed to ensure optimal development for all children.
Thus, though we briefly recap our rationale, based on our
review of the literature, and present our guidelines following the
order of topics in the volume, we hope the reader interprets our
discussion of purposes, targets, and procedures for assessment as
different specific topics subordinated to the notion of an assessment
system. In compliance with our charge, we have also included a
section presenting a recommended agenda for research on the
assessment of young children, following the detailed guidelines.
These guidelines should be useful to anyone contemplating the selection or implementation of an assessment for young children, including medical and educational service providers; classroom practitioners; federal, state, and local governments and private agencies operating or regulating child care and early childhood education programs; and those interested in expanding the knowledge base about child development and the conditions of childhood. To make our guidance more pointed and practical,
the chapter ends with a list of high-priority actions by members
of specific groups engaged in the assessment of young children,
which can be taken quickly and should provide maximum
payoffs.
Purposes and Uses of Assessment
Rationale
In recent years, the purposes for which young children are
being assessed have expanded, with more children being assessed
than ever before. Young children have been assessed to screen for
date assessments for the desired purpose and for use with
all the subgroups of children to be included. Although the
same measure may be used for more than one purpose,
prior consideration of all potential purposes is essential, as
is careful analysis of the actual content of the assessment
instrument. Direct examination of the assessment items is
important because the title of a measure does not always
reflect the content.
Domains and Measures of Developmental Outcomes
Rationale
During infancy and toddlerhood in particular, frequently
assessed domains include those implicated by the agenda of
screening for medical, developmental, or environmental risk.
Across the entire preschool period, a critical issue is what aspect
of young children's skills or behavior to measure. Research on
the developing child has traditionally conceived of development
as proceeding in different domains, for example, language or
motor or socioemotional development. These distinctions have
served science well and are helpful for assessment purposes, but
in reality the distinctions among children's skills and behaviors
are somewhat artificial and not as clear-cut as the organization
of research or assessment tools would suggest. Developmental
domains are intertwined, especially in the very young child,
making it challenging or even impossible to interpret measures in
some domains without also measuring the influence of others.
Health, socioemotional functioning, and cognitive functioning are closely interconnected in infancy, as for example when
sleeping difficulties affect both socioemotional and cognitive
functioning. For somewhat older preschoolers, the domains may
be more readily differentiated operationally and theoretically, but
they remain interdependent; for example, socioemotional (e.g.,
capacity to regulate negative emotion) and cognitive measures are
interrelated and appear to have linked neural bases.
Nevertheless, a conceptualization is needed that identifies the
areas of development society wants to track and that programs
conditions like hunger or fatigue, and to recognizing the possibility of bias if the tester is a caregiver or otherwise connected to the
child. Instruments that have the most user-appeal often do not
have the best psychometric properties. For example, portfolios
of children's artistic productions contain rich information but are
hard to rate reliably. In the experience of committee members,
selection of instruments is often more influenced by cost, by ease
of administration, and by use in other equivalent programs than
by the criteria proposed here.
Those charged with selecting assessment instruments need
to carefully review the information provided in the instrument's
technical manual. Although test publishers may provide extensive psychometric information about their products, additional
evidence beyond that provided in manuals should also be considered in instrument selection. Those selecting assessments
should be familiar with the assessment standards contained in
the standards document produced by the American Educational
Research Association, American Psychological Association, and
National Council on Measurement in Education (1999). Important
questions to ask are: Has this assessment been developed and
validated for the purpose for which it is being considered? If a
norm-referenced measure is being considered, has the assessment
been normed with children like those with whom it will be used?
For example, if the assessment is to be used as part of a program
evaluation with minority children, were similar children included in
the development studies, including any norming studies? There
is typically more robust evidence for inferences based on early
childhood measures when used for normally developing, white,
English-speaking children than for children from ethnic or language minorities or children with disabilities. Validity evidence
is quite sparse for these special groups on most extant measures.
Conducting valid assessments with language-minority children
and children with special needs is especially challenging, and the
reader is referred to Part III for more discussion of these topics. As
explained in Chapter 7, one cannot say that measurement instruments either possess or lack validity; rather, inferences from the
use of particular measurements for particular purposes may be
supported or not supported by validity evidence.
There are many special considerations when using existing
assessment subsystem within a larger system of early childhood care and education.
(S-2) A successful system of assessments must be coherent in a
variety of ways. It should be horizontally coherent, with the
curriculum, instruction, and assessment all aligned with
the early learning and development standards and with the
program standards, targeting the same goals for learning,
and working together to support children's developing
knowledge and skill across all domains. It should be vertically coherent, with a shared understanding at all levels of
the system of the goals for children's learning and development that underlie the standards, as well as consensus
about the purposes and uses of assessment. It should be
developmentally coherent, taking into account what is known
about how children's skills and understanding develop over
time and the content knowledge, abilities, and understanding that are needed for learning to progress at each stage of
the process. The California Desired Results Developmental
Profile provides an example of movement toward a multiply coherent system. These coherences drive the design of
all the subsystems. For example, the development of early
learning standards, curriculum, and the design of teaching
practices and assessments should be guided by the same
framework for understanding what is being attempted
in the classroom that informs the training of beginning
teachers and the continuing professional development of
experienced teachers. The reporting of assessment results
to parents, teachers, and other stakeholders should also be
based on this same framework, as should the evaluations of
effectiveness built into all systems. Each child should have
an equivalent opportunity to achieve the defined goals, and
the allocation of resources should reflect those goals.
(S-3) Following the best assessment practices is especially crucial
in cases in which assessment can have significant consequences for children, teachers, or programs. The NRC
report High Stakes: Testing for Tracking, Promotion, and Graduation (National Research Council, 1999) urged extreme
caution in basing high-stakes decisions on assessment outcomes, and we conclude that even more extreme caution
allow growth to be tracked on the same assessment, even if children are performing significantly below their age peers.
Recently developed tools for examining social-emotional
development need further work to generate evidence about their
reliability, validity, and sensitivity to intervention approaches.
More work is needed to develop key constructs within the domain
of approaches to learning, as well as tools to measure those constructs and their role in children's learning and development. The
shortcomings of current measures, especially standardized norm-referenced measures for young children and those with special
needs, have been extensively documented, yet it is precisely these
kinds of measures that are often employed in large-scale data
collections. New measures are needed that accurately capture
children's growth toward being able to meaningfully participate
in the variety of settings that make up their day-to-day lives.
Research is needed on how to effectively use technology in
all forms of early childhood assessment. Some assessments currently provide for online entry of data and computerized scoring
and automatic report generation, but more work is needed. More
research is needed on the use of computer adaptive procedures
for establishing floor and ceiling levels, to allow more in-depth assessment at the child's current performance level. Computer-adaptive assessment could be applicable to both direct and observation-based measures.
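To make the idea concrete, the following minimal sketch (not drawn from any instrument discussed in this report) illustrates one way a computer-adaptive procedure could step item difficulty up or down to bracket a child's current performance level; the item bank, step rule, and stopping rule are all hypothetical assumptions.

```python
# Minimal illustrative sketch of adaptive floor/ceiling estimation (hypothetical).
# administer_item is a callback that presents an item and returns True if the
# child responds correctly; items_by_level maps difficulty levels to items.

def adaptive_level_search(administer_item, items_by_level, start_level, max_items=12):
    """Staircase search: move up one difficulty level after a correct response,
    down one level after an error, and stop after three reversals of direction."""
    levels = sorted(items_by_level)          # difficulty levels, easiest to hardest
    idx = levels.index(start_level)
    responses = []                           # (level, correct) pairs
    reversals, last_direction = 0, None

    for _ in range(max_items):
        level = levels[idx]
        correct = administer_item(items_by_level[level])
        responses.append((level, correct))
        direction = 1 if correct else -1
        if last_direction is not None and direction != last_direction:
            reversals += 1                   # the child's threshold was crossed again
        last_direction = direction
        if reversals >= 3:                   # hypothetical stopping rule
            break
        idx = min(max(idx + direction, 0), len(levels) - 1)

    passed = [lvl for lvl, ok in responses if ok]
    failed = [lvl for lvl, ok in responses if not ok]
    floor = max(passed) if passed else levels[0]      # highest level passed
    ceiling = min(failed) if failed else levels[-1]   # lowest level failed
    return floor, ceiling
```

Operational computer-adaptive tests typically select items from a calibrated item response theory model rather than a fixed staircase, but the sketch conveys why adaptivity concentrates testing near the child's own performance level instead of administering many items that are far too easy or far too hard.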
For the Improvement of Screening
Research is needed to validate screening tools for the full
range of children represented in early childhood programs. There
is a need to continue to collect information on who currently conducts screenings, including consideration of the barriers working
against more widespread screening. There is a need for information on how many children are screened, how many fail the screen, how many receive follow-up testing, and how many receive treatment or intervention when a problem is verified. (Newborn hearing screening data are a model
for this; the dismal results on measures of follow-up have become
clear only because the data were systematically collected.)
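As a purely illustrative aid, the short sketch below tabulates the kind of screening follow-up funnel called for above, reporting each stage as a share of the stage before it; every count and category name here is invented and does not come from any study cited in this report.

```python
# Hypothetical screening follow-up counts; every number here is invented.
funnel = [
    ("screened", 10000),
    ("failed screen", 800),
    ("received follow-up testing", 480),
    ("problem confirmed", 300),
    ("received treatment or intervention", 210),
]

def follow_up_rates(stages):
    """Express each stage as a proportion of the preceding stage, the kind of
    attrition summary that made gaps in newborn hearing screening follow-up visible."""
    rates = []
    for (_, prev_n), (name, n) in zip(stages, stages[1:]):
        rates.append((name, n / prev_n if prev_n else float("nan")))
    return rates

for name, rate in follow_up_rates(funnel):
    print(f"{name}: {rate:.0%} of the preceding stage")
```

Keeping the stages explicit in this way makes the drop-off between a failed screen and completed follow-up immediately visible, which is the point of collecting such data systematically.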
More research documenting the current scenarios for the assessment of young ELLs across the country is needed, including more
work to evaluate assessment practices in various localities; survey
research and observational approaches to document practices in
preassessment and assessment planning, conducting the assessment,
analyzing and interpreting the results, reporting the results (in written and oral format), and determining eligibility and monitoring; and
a focus on the development of strategies to train professionals with
the skills necessary to serve young ELL children.
Research is needed to develop assessment tools normed especially for young English language learners using a bottom-up
approach, so that assessment tools, procedures, and constructs
assessed are aligned with cultural and linguistic characteristics
of ELL children.
Children with Special Needs
More research is needed on what the various practitioners who assess young children with special needs (early interventionists, special education teachers, speech therapists, psychologists, etc.) actually do.
More research is needed on the use of accommodations with
children with disabilities. What are appropriate guidelines for
decision making about what kind of accommodations to use with
what kind of child under what conditions?
Research is needed on the impact of accommodations on the
validity of the assessment results.
Accountability and Program Quality
There is a need for the development of assessment instruments designed for the purpose of accountability and program
evaluation. Instruments that are developed for federal studies
such as the Early Childhood Longitudinal Study, Kindergarten-First Grade Waves (ECLS-K), or national studies of Head Start should become publicly available, so they can be used by others.
There is a need for research on the implementation of accountability systems and the tracking of positive and negative consequences at all levels of the system:
both are correct and that both are limited. The final version of the
report, thus, explicitly does not take the position that assessment
is here to stay and we'd better learn to live with it. Rather, it takes
the position that assessments can make crucial contributions to
the improvement of childrens well-being, but only if they are
well designed, implemented effectively and in the context of systematic planning, and interpreted and used appropriately. Otherwise, assessment of children and programs can have negative
consequences for both. We conclude that the value of assessments
themselves cannot be judged without attention to the design of
the larger systems in which they are used.
References
SUMMARY
National Education Goals Panel. (1995). Reconsidering children's early development
and learning: Toward common views and vocabulary. Washington, DC: Author.
Chapter 1
National Research Council. (2001). Eager to learn: Educating our preschoolers.
Committee on Early Childhood Pedagogy, B.T. Bowman, M.S. Donovan,
and M.S. Burns (Eds.). Commission on Behavioral and Social Sciences and
Education. Washington, DC: National Academy Press.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Chapter 2
Brown, G., Scott-Little, C., Amwake, L., and Wynn, L. (2007). A review of methods
and instruments used in state and local school readiness evaluations. (Issues and
Answers Report, REL 2007-No. 004.) Washington, DC: U.S. Department of
Education, Institute of Education Sciences, National Center for Education
Evaluation and Regional Assistance, Regional Educational Laboratory
Southeast.
Chapter 3
Ackerman, D.J., and Barnett, W.S. (2006). Increasing the effectiveness of preschool
programs. Preschool Policy Brief, 11.
American Educational Research Association. (2005). Letter to Congress. Washington,
DC: Author.
Council of Chief State School Officers and Early Childhood Education Assessment
Consortium. (2003a). Key considerations: Building a system of standards to support
successful early learners: The relationship between early learning standards, program
standards, program quality measures and accountability. Washington, DC:
Author.
Council of Chief State School Officers and Early Childhood Education Assessment
Consortium. (2003b). Matrix of state early learning standards. Washington, DC:
Author.
Council of Chief State School Officers and Early Childhood Education Assessment
Consortium. (2007). The words we use: A glossary of terms for early childhood
education standards and assessment. Available: http://www.ccsso.org/Projects/
scass/projects/early_childhood_education_assessment_consortium/
publications_and_products/2892.cfm [accessed February 2008].
Gilliam, W.S., and Zigler, E.F. (2001). A critical meta-analysis of all impact
evaluations of state-funded preschool from 1977 to 1998: Implications for
policy, service delivery and program evaluation. Early Childhood Research
Quarterly, 15, 441-473.
Hatch, J.A. (2001). Accountability shove down: Resisting the standards movement
in early childhood education. Phi Delta Kappan, 83(6), 457-462.
High/Scope Educational Research Foundation. (2002). Making validated educational
models central in preschool standards. Ypsilanti, MI: Author.
High/Scope Educational Research Foundation. (in press). Michigan school
readiness program evaluation, grades 6-8 follow-up. Ypsilanti, MI: Author.
Kagan, S.L., Moore, E., and Bredekamp, S. (1995). Reconsidering children's early
development and learning: Toward common views and vocabulary. National
Education Goals Panel Goal 1 Technical Planning Group. Washington, DC:
U.S. Government Printing Office.
McKey, R.H., and Tarullo, L. (1998). Ensuring quality in Head Start: The FACES
Study. The Evaluation Exchange, 4(1).
Meisels, S.J., Barnett, W.S., Espinosa, L., Kagan, S.L., et al. (2003). Letter
to U.S. representatives concerning implementation of the National Reporting
System. Available: http://www.nhsa.org/download/research/headstart
letterforsenate.doc [accessed July 2008].
National Association for the Education of Young Children. (2006). NAEYC
accreditation criteria. Available: http://www.naeyc.org/academy/NAEYC
AccreditationCriteria.asp [accessed February 2008].
National Association for the Education of Young Children and National
Association of Early Childhood Specialists in State Departments of Education.
(2002). Early learning standards: Creating the conditions for success. Washington,
DC: Author.
National Early Childhood Accountability Task Force. (2007). Taking stock: Assessing
and improving early childhood learning and program quality. Philadelphia:
Author.
National Institute for Early Education Research. (2005). The effects of the Michigan
School Readiness Program on young children's abilities at kindergarten entry.
Rutgers, NJ: Author.
National Research Council. (1998). Preventing reading difficulties in young children.
Committee on the Prevention of Reading Difficulties in Young Children, C.E.
Snow, M.S. Burns, and P. Griffin (Eds.). Commission on Behavioral and Social
Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (2001). Eager to learn: Educating our preschoolers.
Committee on Early Childhood Pedagogy, B.T. Bowman, M.S. Donovan,
and M.S. Burns (Eds.). Commission on Behavioral and Social Sciences and
Education. Washington, DC: National Academy Press.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Pre-kindergarten Standards Panel. (2002). Pre-kindergarten standards: Guidelines for
teaching and learning. New York: McGraw-Hill.
Shepard, L., Kagan, S.L., and Wurtz, E. (1998). Principles and recommendations
for early childhood assessments. National Education Goals Panel Goal 1
Early Childhood Assessments Resource Group. Washington, DC: National
Education Goals Panel.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2000). Head Start child outcomes framework. Washington, DC:
Author.
Part II
Kagan, S.L., Moore, E., and Bredekamp, S. (1995). Reconsidering children's early
development and learning: Toward common views and vocabulary. National
Education Goals Panel Goal 1 Technical Planning Group. Washington, DC:
U.S. Government Printing Office.
McCartney, K., and Phillips, D. (Eds.). (2006). Handbook of early childhood
development. Malden, MA: Blackwell.
National Education Goals Panel. (1995). Reconsidering children's early development
and learning: Toward common views and vocabulary. Washington, DC: Author.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Chapter 4
Als, H., Butler, S., Kosta, S., and McAnulty, G. (2005). The assessment of preterm
infants' behavior (APIB): Furthering the understanding and measurement
of neurodevelopmental competence in preterm and full-term infants. Mental
Retardation and Developmental Disability Research Review, 11, 94-102.
Als, H., Gilkerson, L., Duffy, F.H., McAnulty, G.B., Buehler, D.M., Vandenberg,
K., Sweet, N., Sell, E., Parad, R.B., Ringer, S.A., Butler, S.C., Blickman,
J.G., and Jones, K.J. (2003). A three-center, randomized, controlled trial of
individualized developmental care for very low birth weight preterm infants:
Medical, neurodevelopmental, parenting and caregiving effects. Journal of
Developmental and Behavioral Pediatrics, 24(6), 399-408.
American Academy of Pediatrics. (1996). Eye examination and vision screening
in infants, children and young adults. Pediatrics, 98(1), 153-157.
American Academy of Pediatrics. (2003). Iron deficiency. In Pediatric nutrition
handbook (5th ed.). Elk Grove, IL: Author.
American Academy of Pediatrics. (2005). Lead exposure in children: Prevention,
detection, and management. Pediatrics, 116(4), 1036-1046.
American Academy of Pediatrics, Committee on Children with Disabilities.
(2001). Developmental surveillance and screening of infants and young
children. Pediatrics, 108(1), 192-196.
American Academy of Pediatrics, Council on Children with Disabilities. (2006).
Identifying infants and young children with developmental disorders in the
medical home: An algorithm for developmental surveillance and screening.
Pediatrics, 118(1), 405-420.
Bagnato, S.J., Neisworth, J.T., Salvia, J.J., and Hunt, F.M. (1999). Temperament
and Atypical Behavior Scale (TABS): Early childhood indicators of developmental
dysfunction. Baltimore, MD: Brookes.
Baird, G., Charman, T., Baron-Cohen, S., Cox, A., Swettenham, J., Wheelwright, S.,
and Drew, A. (2000). Screening for autism at 18 months of age: 6-year follow-up study. Journal of the American Academy of Child and Adolescent Psychiatry,
39(3), 694-702.
Banks, E.C., Ferrittee, L.E., and Shucard, D.W. (1997). Effects of low-level
lead exposure on cognitive function in children: A review of behavioral,
neuropsychological and biological evidence. Neurotoxicology, 18, 237-281.
Bates, J.E., Freeland, C.A., and Lounsbury, M.L. (1979). Measurement of infant
difficultness. Child Development, 50, 794-803.
Beeghly, M., Brazelton, T.B., Flannery, K.A., Nugent, J.K., Barrett, D.E., and
Tronick, E.Z. (1995). Specificity of preventative pediatric intervention effects in
early infancy. Journal of Developmental and Behavioral Pediatrics, 16, 158-166.
Biondich, P.G., Downs, S.M., Carroll, A.E., Laskey, A.L., Liu, G.C., Rosenman, M.,
Wang, J., and Swigonski, N.L. (2006). Shortcomings in infant iron deficiency
screening methods. Pediatrics, 117(2), 290-294.
Blake, P.E., and Hall, J.W. (1990). The status of state-wide policies for neonatal
hearing screening. Journal of the American Academy of Audiology, 1, 67-74.
Fisher, E.S., and Welch, H.G. (1999). Avoiding the unintended consequences of
growth in medical care: How might more be worse? The Journal of the American
Medical Association, 281(5), 446-453.
Folio, M.R., and Fewell, R.R. (1983). Peabody developmental motor scales and activity
cards: A manual. Allen, TX: DLM Teaching Resources.
Freed, G.L., Nahra, T.A., and Wheeler, J.R.C. (2004). Which physicians are
providing health care to Americas children? Archives of Pediatrics and
Adolescent Medicine, 158, 22-26.
Gilliam, W.S., Meisels, S.J., and Mayes, L. (2005). Screening and surveillance in
early intervention systems. In M.J. Guralnick (Ed.), The developmental systems
approach to early intervention. Baltimore, MD: Brookes.
Glascoe, F.P. (2003). Parents' evaluation of developmental status: How well do parents' concerns identify children with behavioral and emotional problems?
Clinical Pediatrics, 42, 133-138.
Glascoe, F.P. (2005). Screening for developmental and behavioral problems.
Mental Retardation and Developmental Disabilities Research Reviews, 11(3),
173-179.
Glascoe, F.P., Martin, E.D., and Humphrey, S. (1990). A comparative review of
developmental screening tests. Pediatrics, 86, 5467-5554.
Grandjean, P., and Landrigan, P. (2006). Developmental neurotoxicity of industrial chemicals. The Lancet, 368(9553), 2167-2178.
Grantham-McGregor, S. (1984). Chronic undernutrition and cognitive abilities.
Human Nutrition-Clinical Nutrition, 38(2), 83-94.
Gravel, J.S., Fausel, N., Liskow, C., and Chobot, J. (1999). Childrens speech
recognition in noise using omni-directional and dual-microphone hearing aid
technology. Ear & Hearing, 20(1), 1-11.
Ireton, H. (1992). Child development inventory manual. Minneapolis, MN: Behavior
Science Systems.
Jacobs, S.E., Sokol, J., and Ohlsson, A. (2002). The newborn individualized
developmental care and assessment program is not supported by meta-analyses of the data. Journal of Pediatrics, 140, 699-706.
Johnson, J.O. (2005). Who's minding the kids? Child care arrangements: Winter,
2002. Current Population Reports (P70-101). Washington, DC: U.S. Census
Bureau.
Kaye, C.I., and the Committee on Genetics. (2006). Introduction to the newborn
screening fact sheets. Pediatrics, 118(3), 1304-1312.
Lanphear, B.P., Dietrich, K., Auinger, P., and Cox, C. (2000). Cognitive deficits associated with blood lead concentrations <10 µg/dL in US children and
adolescents. Public Health Reports, 115, 521-529.
Lanphear, B.P., Hornung, R., Ho, M., Howard, C., Eberli, S., and Knauf, D.K.
(2002). Environmental lead exposure during early childhood. Journal of
Pediatrics, 140, 40-47.
Lloyd-Puryear, M.A., Tonniges, T., van Dyck, P.C., Mann, M.Y., Brin, A., Johnson,
K., and McPherson, M. (2007). American Academy of Pediatrics Newborn
Screening Task Force recommendations: How far have we come? Pediatrics,
117(5 Pt. 2), S194-S211.
Lozoff, B., Jiminez, E., Hagen, J., Mollen, E., and Wolf, A.W. (2000). Poorer
behavioral and developmental outcome more than 10 years after treatment for
iron deficiency in infancy. Pediatrics, 105(4), e51. Available: http://pediatrics.
aappublications.org/cgi/content/abstract/105/4/e51 [accessed August
2008].
Lozoff, B., Andraca, I.D., Castillo, M., Smith, J.B., Walter, T., and Pino, P. (2003).
Behavioral and developmental effects of preventing iron-deficiency anemia in
healthy full-term infants. Pediatrics, 112(4), 846-854.
Mangione-Smith, R., DeCristofaro, A.H., Setodji, C.M., Keesey, J., Klein, D.J.,
Adams, J.L., Schuster, M.A., and McGlynn, E.A. (2007). The quality of
ambulatory care delivered to children in the United States. The New England
Journal of Medicine, 357(15), 1515-1523.
Mathematica Policy Research. (2003). Resources for measuring services and outcomes
in Head Start programs serving infants and toddlers. Princeton, NJ: Author.
McCormick, M. (2008). Issues in measuring child health. Ambulatory Pediatrics,
8(2), 77-84.
Moeller, M.P. (2000). Early intervention and language development in children
who are deaf and hard of hearing. Pediatrics, 106, e43. Available: http://
pediatrics.aappublications.org/cgi/content/full/106/3/e43 [accessed August
2008].
Morgan, A.M., and Aldag, J.C. (1996). Early identification of cerebral palsy using
a profile of abnormal motor patterns. Pediatrics, 98(4), 692-697.
National Center for Hearing Assessment and Management. (2007). Early hearing
detection and intervention (EHDI) resources and information. Available: http://
www.infanthearing.org/ehdi.html [accessed July 2008].
National Research Council. (2002). Visual impairments: Determining eligibility
for Social Security benefits. Committee on Disability Determination for
Individuals with Visual Impairments, P. Lennie and S.B. Van Hemel (Eds.).
Board on Behavioral, Cognitive, and Sensory Sciences, Center for Studies of Behavior
and Development, Division of Behavioral and Social Sciences and Education.
Washington, DC: National Academy Press.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Needleman, H.L., and Gatsonis, C.A. (1990). Low-level lead exposure and the
IQ of children: A meta-analysis of modern studies. The Journal of the American
Medical Association, 263(2), 673-678.
Newman, T.B., Browner, W.S., and Hulley, S.B. (1990). The case against childhood
cholesterol screening. The Journal of the American Medical Association, 264, 3039-3043.
Piper, M.C., and Darrah, J. (1994). Motor assessment of the developing infant.
Philadelphia: W.B. Saunders.
Putnam, S.P., and Rothbart, M.K. (2006). Development of short and very
short forms of the children's behavior questionnaire. Journal of Personality
Assessment, 87, 102-112.
Rutter, M., Bailey, A., and Lord, C. (2003). The social and communication questionnaire (SCQ) manual. Los Angeles, CA: Western Psychological Services.
Schulze, A., Lindner, M., Kohlmuller, D., Olgemoller, K., Mayatepek, E., and
Hoffman, G.F. (2003). Expanded newborn screening for inborn errors of
metabolism by electrospray ionization-tandem mass spectrometry: Results,
outcome, and implications. Pediatrics, 111, 1399-1406.
Shepherd, P.A., and Fagan, J.F. (1981). Visual pattern detection and recognition
memory in children with profound mental retardation. In N.R. Ellis (Ed.),
International review of research in mental retardation. New York: Academic Press.
Siegel, B. (2004). Pervasive developmental disorders screening test-II (PDDST-II). Early
childhood screener for autistic spectrum disorders. San Antonio, TX: Harcourt
Assessment.
Simpson, L., Owens, P., Zodet, M., Chevarley, F., Dougherty, D., Elixhauser,
A., and McCormick, M.C. (2005). Health care for children and youth in the
United States: Annual report on patterns of coverage, utilization, quality, and
expenditures by income. Ambulatory Pediatrics, 5(1), 45-46.
Stone, W.I., Coonrod, E.E., and Ousley, O.Y. (2000). Brief report: Screening tool for
autism in two-year-olds (STAT): Development and preliminary data. Journal of
Autism and Developmental Disorders, 30, 607-701.
Teti, D.M., and Gelfand, D.M. (1997). The preschool assessment of attachment:
Construct validity in a sample of depressed and nondepressed families.
Development and Psychopathology, 9, 517-536.
Tronick, E.Z. (1987). The Neonatal Behavioral Assessment Scale as a biomarker
of the effects of environmental agents on the newborn. Environmental Health
Perspectives, 74, 185-189.
U.S. Department of Health and Human Services, National Center for Health
Statistics. (1981). National health interview survey-1981 child health supplement.
Washington, DC: Author.
U.S. Preventive Services Task Force. (2001). Guide to clinical preventive services.
Washington, DC: Office of Disease Prevention and Health Promotion.
U.S. Preventive Services Task Force. (2004). Screening for visual impairment in
children younger than age 5 years: Update of the evidence. Rockville, MD: Agency
for Healthcare Research and Quality. Available: http://www.ahrq.gov/clinic/
uspstf/uspsvsch.htm [accessed July 2008].
U.S. Preventive Services Task Force. (2006). Screening for iron deficiency anemia, including iron supplementation for children and pregnant women. Washington, DC:
Office of Disease Prevention and Health Promotion.
Voigt, R.G., Brown, F.R., Fraley, J.K., Liorente, A.M., Rozelle, J., Turcich, M.,
Jensen, C.L., and Heird, W.C. (2003). Concurrent and predictive validity of
the Cognitive Adaptive Test/Clinical Linguistic and Auditory Milestone Scale
(CAT/CLAMS) and the mental developmental index of the Bayley Scales of
Infant Development. Clinical Pediatrics, 42(5), 427-432.
Wasserman, R., Croft, C., and Brotherton, S. (1992). Preschool vision screening
in pediatric practice: A study from the Pediatric Research in Office Settings
(PROS) network. Pediatrics, 89(5 Pt. 1), 834-838.
Wetherby, A.M., and Prizant, B.M. (2002). Communication and symbolic behavior
scales: Developmental profile. Baltimore, MD: Brookes.
Widerstrom, A. (1999). Newborns and infants at risk for or with disabilities. In A.
Widerstrom, B. Mowder, and S. Sandall (Eds.), Infant development and risk (2nd
ed., pp. 3-24). Baltimore, MD: Brookes.
Wilson, J.M.G., and Jungner, G. (1968). Principles and practice of screening for
diseases. Geneva: World Health Organization.
Wyly, M.V. (1997). Infant assessment. Boulder, CO: Westview Press.
Yoshinaga-Itano, C., Sedey, A.L., Coulter, D.K., and Mehl, A.L. (1998). Language
of early- and later-identified children with hearing loss. Pediatrics, 102, 1161-1171.
Zafeieriou, D.I. (2003). Primitive reflexes and postural reactions in the neurodevelopmental examination. Pediatric Neurology, 31, 1-8.
Chapter 5
Alexander, P.A., White, C.S., and Daugherty, M. (1997). Analogical reasoning
and early mathematics learning. In L.D. English (Ed.), Mathematical reasoning:
Analogies, metaphors, and images (pp. 117-147). Mahwah, NJ: Lawrence Erlbaum
Associates.
American Academy of Pediatrics. (2005). Lead exposure in children: Prevention,
detection, and management. Pediatrics, 116(4), 1036-1046.
American Institutes for Research. (2005). Reassessing U.S. international mathematics
performance: New findings from the TIMSS and PISA. Washington, DC: Author.
American Psychological Association Task Force on Intelligence. (1996). Intelligence:
Knowns and unknowns. Washington, DC: Author.
Barone, T. (2001). The end of the terror: On disclosing the complexities of teaching.
Curriculum Inquiry, 31(1), 89-102.
Bayley, N. (2005). Bayley Scales of Infant and Toddler Development. San Antonio, TX:
Psychological Corporation.
Bear, D.R., Invernizzi, M., Templeton, S., and Johnston, F. (1999). Words their way:
Word study for phonics, vocabulary, and spelling instruction. Upper Saddle River,
NJ: Prentice Hall.
Beck, I., McKeown, M., and Kucan, L. (2002). Bringing words to life: Robust
vocabulary instruction. New York: Guilford Press.
Becker, J. (1989). Preschoolers' use of number words to denote one-to-one
correspondence. Child Development, 60, 1147-1157.
Bierman, K.L., and Erath, S.A. (2006). Promoting social competence in early
childhood: Classroom curricula and social skills coaching programs. In K.
McCartney and D. Phillips (Eds.), Handbook of early childhood development.
Malden, MA: Blackwell.
Bierman, K.L., Domitrovich, C.E., Nix, R.L., Gest, S.D., Welsh, J.A., Greenberg,
M.T., Blair, C., Nelson, K.E., and Gill, S. (under review). Promoting academic and
social-emotional school readiness: The Head Start REDI program. The Pennsylvania
State University. Available: http://www.srcd.org/journals/cdev/0-0/
Bierman.pdf [accessed July 2008].
Birch, S., and Ladd, G.W. (1998). Children's interpersonal behaviors and the
teacher-child relationship. Developmental Psychology, 34, 934-946.
Blair, C. (2002). School readiness: Integrating cognition and emotion in a
neurobiological conceptualization of children's functioning at school entry.
American Psychologist, 57(2), 111-127.
Blair, C. (2006). How similar are fluid cognition and general intelligence? A
developmental neuroscience perspective on fluid cognition as an aspect of
human cognitive ability. Behavioral and Brain Sciences, 29, 109-125.
Blair, C., and Razza, R.P. (2007). Relating effortful control, executive function,
and false belief understanding to emerging math and literacy ability in
kindergarten. Child Development, 78(2), 647-663.
Blank, M., Rose, S., and Berlin, L. (1978). Preschool Language Assessment Instrument
(PLAI). New York: Psychological Corporation.
Boaler, J. (1994). When do girls prefer football to fashion? An analysis of female
underachievement in relation to realistic mathematics contexts. British
Educational Research Journal, 20(5), 551-564.
Bodrova, E., and Leong, D.J. (2001). Tools of the mind: A case study of implementing
the Vygotskian approach in American early childhood and primary classrooms.
Geneva: UNESCO, International Bureau of Education.
Bull, R., and Scerif, G. (2001). Executive functioning as a predictor of children's mathematics ability: Inhibition, switching, and working memory. Developmental Neuropsychology, 19(3), 273-293.
Burchinal, M., Lee, M., and Ramey, C. (1989). Type of day-care and preschool
intellectual development in disadvantaged children. Child Development, 60(1),
128-137.
Campbell, S.B. (2006). Maladjustment in preschool children: A developmental
psychopathology perspective. In K. McCartney and D. Phillips (Eds.),
Handbook of early childhood development (pp. 358-378). Malden, MA: Blackwell.
Carlson, S. (2005). Developmentally sensitive measures of executive function in
preschool children. Developmental Neuropsychology, 28, 595-616.
Clay, M. (1979). Concepts about print tests: Sand and stones. Portsmouth, NH:
Heinemann.
Clements, D.H. (1999). Geometric and spatial thinking in young children. In J.V.
Copley (Ed.), Mathematics in the early years. Reston, VA: National Council of
Teachers of Mathematics.
Clements, D.H., Sarama, J., and DiBiase, A.M. (Eds.). (2004). Engaging young
children in mathematics: Standards for early childhood mathematics education.
Mahwah, NJ: Lawrence Erlbaum Associates.
Clements, D.H. (2004). Major themes and recommendations. In D.H. Clements,
J. Sarama, and A.-M. DiBiase (Eds.), Engaging young children in mathematics:
Standards for early childhood mathematics education. Mahwah, NJ: Lawrence
Erlbaum Associates.
Fabes, R.A., Gaertner, B.M., and Popp, T.K. (2006). Getting along with others:
Social competence in early childhood. In K. McCartney and D. Phillips (Eds.),
Handbook of early childhood development (pp. 297-316). Malden, MA: Blackwell.
Fantuzzo, J., Bulotsky-Shearer, R., Fusco, R.A., and McWayne, C. (2005). An
investigation of preschool classroom behavioral adjustment problems and
social-emotional school readiness competencies. Early Childhood Research
Quarterly, 20(3), 259-275.
Fantuzzo, J., Bulotsky-Shearer, R., McDermott, P.A., McWayne, C., Frye, D., and
Perlman, S. (2007). Investigation of dimensions of social-emotional classroom
behavior and school readiness for low-income urban preschool children.
School Psychology Review, 36(1), 44-62. Available: http://repository.upenn.
edu/gse_pubs/124/ [accessed July 2008].
Fantuzzo, J., Perry, M.A., and McDermott, P. (2004). Preschool approaches to
learning and their relationship to other relevant classroom competencies for
low-income children. School Psychology Quarterly, 19, 212-230.
Feigenson, L., Dehaene, S., and Spelke, E. (2004). Core systems of number. Trends
in Cognitive Sciences, 8, 307-314.
Fenson, L., Dale, P., Reznick, J.S., Thal, D., Bates, E., Hartung, J.P., Pethick, S., and
Reilly, J. (1993). MacArthur-Bates Communicative Development Inventories. San
Diego, CA: Singular.
Flanagan, D.P., and McGrew, K.S. (1997). A cross-battery approach to assessing
and interpreting cognitive abilities: Narrowing the gap between practice
and cognitive science. In D.P. Flanagan, J.L. Genshaft, and P. Harrison (Eds.),
Contemporary intellectual assessment: Theories, tests, and issues (pp. 314-325).
New York: Guilford Press.
Foulks, B., and Morrow, R.D. (1989). Academic survival skills for the young child
at risk for school failure. Journal of Educational Research, 82(3), 158-165.
Fuson, K.C. (1988). Children's counting and concepts of number. New York: Springer-Verlag.
Fuson, K.C. (1992a). Relationships between counting and cardinality from age
2 to age 8. In J. Bideau, C. Meljac, and J.P. Fischer (Eds.), Pathways to number:
Children's developing numerical abilities (Chapter 6, pp. 127-150). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Fuson, K.C. (1992b). Research on whole number addition and subtraction. In D.
Grouws (Ed.), Handbook of research on mathematics teaching and learning. New
York: Macmillan.
Gardner, M.F., and Brownell, R. (2000). Expressive One-Word Picture Vocabulary
Test. Novato, CA: Academic Therapy.
Gathercole, S.E. (1998). The development of memory. Journal of Child Psychiatry,
39, 3-27.
Ginsburg, H.P., Inoue, N., and Seo, K.H. (1999). Young children doing mathematics: Observations of everyday activities. In J.V. Copley (Ed.), Mathematics in the early years (pp. 87-99). Reston, VA: National Council of Teachers of
Mathematics.
Gray, S.W., and Klaus, R.A. (1970). The early training project: A seventh-year
report. Child Development, 7(4), 909-924.
Green, L.F., and Francis, J. (1988). Children's learning skills at the infant and
junior stages: A follow-on study. British Journal of Educational Psychology, 58(1),
120-126.
Hammill, D.D., and Newcomer, P.L. (1997). Test of language development-primary
(3rd ed.). Austin, TX: Pearson Education.
Hamre, B.K., and Pianta, R.C. (2001). Early teacher-child relationships and
the trajectory of children's school outcomes through eighth grade. Child
Development, 72, 625-638.
Hemphill, L., Uccelli, P., Winner, K., Chang, C.-J., and Bellinger, D. (2002).
Narrative discourse in young children with histories of early corrective heart
surgery. Journal of Speech, Language, and Hearing Research, 45, 318-331.
Herrnstein, R.J., and Murray, C. (1994). The bell curve: Intelligence and class structure
in American life. New York: Simon and Schuster.
Hiebert, J., Carpenter, T.P., Fennema, E., Fuson, K.C., Wearne, D., and Murray,
H. (1997). Making sense: Teaching and learning mathematics with understanding.
Portsmouth, NH: Heinemann.
Howes, C., Phillipsen, L., and Peisner-Feinberg, E. (2000). The consistency of
perceived teacher-child relationships between preschool and kindergarten.
Journal of School Psychology, 38, 113-132.
Howse, R.B., Lange, G., Farran, D.C., and Boyles, C.D. (2003). Motivation and
self-regulation as predictors of achievement in economically disadvantaged
young children. Journal of Experimental Education, 71(2), 151-174.
Hresko, W.P., Reid, D.K., and Hammill, D.D. (1999). Test of early language
development (3rd ed.). Austin, TX: Pearson Education.
Jordan, G.E., Snow, C.E., and Porche, M.V. (2000). Project EASE: The effect of a
family literacy project on kindergarten students' early literacy skills. Reading
Research Quarterly, 35(4), 524-546.
Juel, C., and Minden-Cupp, C. (2000). Learning to read words: Linguistic units
and instructional strategies. Reading Research Quarterly, 35(4), 458-492.
Kaufman, A.S., and Kaufman, N.L. (2006). Kaufman Assessment Battery for Children
(K-ABC) (2nd ed.). Upper Saddle River, NJ: Pearson Assessments.
Klein, A., and Starkey, P.J. (2004). Fostering preschool children's mathematical knowledge: Findings from the Berkeley Math Readiness Project. In D.H.
Clements, J. Samara, and A.-M. DiBiase (Eds.), Engaging young children in
mathematics: Standards for early childhood mathematics education (pp. 343-360).
Hillsdale, NJ: Lawrence Erlbaum Associates.
Klingberg, T., Fernell, E., Olesen, P.J., Johnson, M., Gustafsson, P., Dahlstrom,
K., Gillberg, C.G., Forssberg, H., and Westerberg, H. (2005). Computerized
training of working memory in children with ADHD: A randomized,
controlled trial. Journal of the American Academy of Child and Adolescent
Psychiatry, 44(2), 177-186.
Knight, G.P., and Hill, N.E. (1998). Measurement equivalence in research
involving minority adolescents. In V.C. McLoyd and L. Steinberg (Eds.),
Studying minority adolescents: Conceptual, methodological, and theoretical issues
(pp. 183-210). Mahwah, NJ: Lawrence Erlbaum Associates.
Ladd, G.W., and Burgess, K. (2001). Do relational risks and protective factors
moderate the linkages between childhood aggression and early psychological
and school adjustment? Child Development, 72, 1579-1601.
Ladd, G.W., Birch, S., and Buhs, E. (1999). Children's social lives in kindergarten:
Related spheres of influence. Child Development, 70, 1373-1400.
Ladd, G.W., Herald, S.L., and Kochel, K.P. (2006). School readiness: Are there
social prerequisites? Early Education and Development, 17(1), 115-150.
Lehto, J.E. (2004). A test for children's goal-directed behavior: A pilot study.
Perceptual and Motor Skills, 98(1), 223-236.
Lewit, E.M., and Baker, L.S. (1995). School readiness. The Future of Children, 5(2),
128-139.
Lubienski, S.T. (2000). Problem solving as a means toward mathematics for all:
An exploratory look through a class lens. Journal for Research in Mathematics
Education, 31(4), 454-482.
MacDonald, A.W., Cohen, J.D., Stenger, V.A., and Carter, C.S. (2000). Dissociating
the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive
control. Science, 288(5472), 1835-1838.
Malvern, D.D., and Richards, B.J. (1997). A new measure of lexical diversity. In
A. Ryan and A. Wray (Eds.), Evolving models of language. Bristol, England:
Multilingual Matters.
Mashburn, A.J., and Pianta, R.C. (2006). Social relationships and school readiness.
Early Education and Development, 17(1), 151-176.
Mason, J., Stewart, J., Peterman, C., and Dunning, D. (1992). Toward an integrated
model of early reading development (No. 566). Champaign, IL: Center for the
Study of Reading.
McCall, R.B. (1977). Childhood IQs as predictors of adult educational and
occupational status. Science, 197(4302), 482-483.
McCall, R.B., and Carriger, M.S. (1993). A meta-analysis of infant habituation and
recognition memory performance as predictors of later IQ. Child Development,
64(1), 57-79.
McCall, R.B., Appelbaum, M.I., and Hogarty, P.S. (1974). Developmental changes in
mental performance. Chicago, IL: University of Chicago Press.
McCartney, K. (2002). Language environments and language outcomes: Results
from the NICHD study of early child care and youth development. In L.
Girolametto and E. Weitzman (Eds.), Enhancing caregiver language facilitation
in child care setting (pp. 3-13-10). Toronto: The Hanen Centre.
McClelland, M.M., Acock, A.C., and Morrison, F.J. (2006). The impact of
kindergarten learning-related skills on academic trajectories at the end of elementary school. Early Childhood Research Quarterly, 21(4), 471-490.
McClelland, M.M., Cameron, C.E., Connor, C.M., Farris, C.L., Jewkes, A.M., and
Morrison, F.J. (2007). Links between behavioral regulation and preschoolers'
literacy, vocabulary, and math skills. Developmental Psychology, 43(4), 947-959.
McClelland, M.M., Morrison, F.J., and Holmes, D.L. (2000). Children at risk
for early academic problems: The role of learning-related social skills. Early
Childhood Research Quarterly, 15(3), 307-329.
Norman, D.A., and Shallice, T. (1986). Attention to action. In R.J. Davidson, G.E.
Schwartz, and D. Shapiro (Eds.), Consciousness and self-regulation: Advances in
theory and research (pp. 1-18). New York: Plenum Press.
Normandeau, S., and Guay, F. (1998). Preschool behavior and first-grade
school achievement: The mediational role of cognitive self-control. Journal of
Educational Psychology, 90(1), 111-121.
Olds, D.L., Henderson, C.R., Phelps, C., Kitzman, H., and Hanks, C. (1993). Effect
of prenatal and infancy nurse home visitation on government spending.
Medical Care, 31(2), 155-174.
Olson, S.L., Sameroff, A.J., Kerr, D.C.R., Lopez, N.L., and Weeman, H.M. (2005).
Developmental foundations of externalizing problems in young children: The
role of effortful control. Development and Psychopathology, 17, 25-45.
Pan, B.A., Mancilla-Martinez, J., and Vagh, S.B. (2008). Tracking bilingual children's vocabulary development: Reporter- and language-related measurement challenges. Poster presentation at Head Start's Ninth National Research Conference,
June 23-25, Washington, DC.
Phinney, J.S., and Landin, J. (1998). Research paradigms for studying ethnic
minority families within and across groups. In V.C. McLoyd and L. Steinberg
(Eds.), Studying minority adolescents: Conceptual, methodological, and theoretical
issues (pp. 89-110). Mahwah, NJ: Lawrence Erlbaum Associates.
Pianta, R.C., and Steinberg, M. (1992). Teacher-child relationships and the process
of adjusting to school. New Directions for Child Development, 57, 61-80.
Piotrkowski, C.S., Botsko, M., and Matthews, E. (2000). Parents' and teachers' beliefs about children's school readiness in a high-need community. Early
Childhood Research Quarterly, 15(4), 537-558.
Poe, M.D., Burchinal, M.R., and Roberts, J.E. (2004). Early language and the
development of children's reading skills. Journal of School Psychology, 42,
315-332.
Posner, M.I., and Rothbart, M.K. (2000). Developing mechanisms of self-regulation. Development and Psychopathology, 12, 427-441.
Quattrin, T., Liu, E., Shaw, N., Shine, B., and Chiang, E. (2005). Obese children
who are referred to the pediatric endocrinologist: Characteristics and outcome.
Pediatrics, 115, 348-351.
RAND Labor and Population. (2005). Early childhood interventions: Proven results,
future promise. Santa Monica, CA: RAND Corporation.
Raver, C. (2002). Emotions matter: Making the case for the role of young children's
emotional development for early school readiness (No. 3). Ann Arbor, MI: Society
for Research in Child Development.
Raver, C. (2004). Placing emotional self-regulation in sociocultural and
socioeconomic contexts. Child Development, 75(2), 8.
Raver, C., Gershoff, E.T., and Aber, J. (2007). Testing equivalence of mediating
models of income, parenting, and school readiness for white, black, and
Hispanic children in a national sample. Child Development, 78(1), 20.
Reynolds, A.J., and Temple, J.A. (1998). Extended early childhood intervention
and school achievement: Age thirteen findings from the Chicago Longitudinal
Study. Child Development, 69(1), 231-246.
Rimm-Kaufman, S.E., Pianta, R.C., and Cox, M.J. (2000). Teachers' judgments of
problems in the transition to kindergarten. Early Childhood Research Quarterly,
15(2), 147-166.
Robbins, T.W. (1996). Refining the taxonomy of memory. Science, 273(5280), 1353-1354.
Roid, G.H. (2003). Stanford-Binet Intelligence Scales for Early Childhood (5th ed.).
Rolling Meadows, IL: Riverside.
Roth, F.P., Speece, D.L., and Cooper, D.H. (2002). A longitudinal analysis of the
connection between oral language and early reading. Journal of Educational
Research, 95(5), 259-272.
Rothbart, M.K., Posner, M.I., and Kieras, J. (2006). Temperament, attention, and
the development of self-regulation. In K. McCartney and D. Phillips (Eds.),
Handbook of early childhood development (pp. 338-357). Malden, MA: Blackwell.
Rueda, M.R., Rothbart, M.K., McCandliss, B.D., Saccomanno, L., and Posner, M.I.
(2005). Training, maturation, and genetic influences on the development of
executive attention. Proceedings of the National Academy of Sciences of the United
States of America, 102(41), 14931-14936.
Schatschneider, C., Francis, D.J., Carlson, C.D., Fletcher, J.M., and Foorman, B.F.
(2004). Kindergarten prediction of reading skills: A longitudinal comparative
analysis. Journal of Educational Psychology, 96(2), 265-282.
Schrank, F.A., Mather, N., and Woodcock, R.W. (2006). Woodcock-Johnson III(r)
Diagnostic Reading Battery. Rolling Meadows, IL: Riverside.
Scott-Little, C., Kagan, S.L., and Frelow, V.S. (2005). Inside the content: The depth and
breadth of early learning standards. Greensboro: University of North Carolina,
SERVE Center for Continuous Improvement.
Scott-Little, C., Kagan, S.L., and Frelow, V.S. (2006). Conceptualization of
readiness and the content of early learning standards: The intersection of
policy and research? Early Childhood Research Quarterly, 21(2), 153-173.
Semrud-Clikeman, M., Nielsen, K.H., Clinton, A., Sylvester, L.H., Parle, N., and
Connor, R.T. (1999). An intervention approach for children with teacher- and
parent-identified attentional difficulties. Journal of Learning Disabilities, 32(6),
581-590.
Sénéchal, M., and LeFevre, J.-A. (2002). Parental involvement in the development of children's reading skill: A five-year longitudinal study. Child Development,
73(2), 445-460.
Seymour, H.N., Roeper, T.W., de Villiers, J., and de Villiers, P.A. (2003). Diagnostic
evaluation of language variation. San Antonio, TX: Pearson Assessments.
Silver, R., Measelle, J., Armstrong, J., and Essex, M. (2005). Trajectories of
classroom externalizing behavior: Contributions of child characteristics,
family characteristics, and the teacher-child relationship during the school
transition. Journal of School Psychology, 43, 39-60.
Silverman, R.D. (2007). Vocabulary development of English-language and
English-only learners in kindergarten. The Elementary School Journal, 107,
365-384.
Snow, C.E., Porche, M., Tabors, P., and Harris, S. (2007). Is literacy enough?
Pathways to academic success for adolescents. Baltimore, MD: Brookes.
Snow, C.E., Tabors, P.O., Nicholson, P., and Kurland, B. (1995). SHELL: Oral
language and early literacy skills in kindergarten and first grade children.
Journal of Research in Childhood Education, 10, 37-48.
Starkey, P., Klein, A., and Wakeley, A. (2004). Enhancing young children's
mathematical knowledge through a pre-kindergarten mathematics
intervention. Early Childhood Research Quarterly, 19, 99-120.
Sulzby, E. (1985). Children's emergent reading of favorite storybooks: A
developmental study. Reading Research Quarterly, 20(4), 458-481.
Thompson, R.A., and Lagattuta, K. (2006). Feeling and understanding: Early
emotional development. In K. McCartney and D. Phillips (Eds.), Handbook of
early childhood development (pp. 317-337). Malden, MA: Blackwell.
Thompson, R.A., and Raikes, A.H. (2007). The social and emotional foundations
of school readiness. In R.K. Kaufmann and J. Knitzer (Eds.), Social and emotional
health in early childhood (pp. 13-35). Baltimore, MD: Brookes.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2003a). Head Start child outcomes framework. Washington, DC:
Author.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2003b). Head Start child outcomes: Setting the context for the
National Reporting System. Head Start Bulletin, 76. Available: http://www.
headstartinfo.org/publications/hsbulletin76/cont_76.htm [accessed July
2008].
U.S. Department of Health and Human Services, Administration for Children
and Families. (2004). Early Head Start research: Making a difference in the lives of
infants, toddlers, and their families. The impacts of early Head Start, volume 1: Final
technical report. Washington, DC: Author.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2005). Head Start impact study: First year findings. Washington,
DC: Author.
Van Hiele, P.M. (1986). Structure and insight: A theory of mathematics education.
Orlando, FL: Academic Press.
Vandell, D., Nenide, L., and Van Winkle, S.J. (2006). Peer relationships in early
childhood. In K. McCartney and D. Phillips (Eds.), Handbook of early childhood
development (pp. 455-470). Cambridge, MA: Blackwell.
Vernon-Feagans, L. (1996). Children's talk in communities and classrooms. Cambridge,
MA: Blackwell.
Wagner, R., Torgesen, J., and Rashotte, C. (1990). Comprehensive test of phonological
processing. Bloomington, MN: Pearson Assessments.
Wasik, B.A., Bond, M.A., and Hindman, A. (2006). The effects of a language and
literacy intervention on Head Start children and teachers. Journal of Educational
Psychology, 98(1), 63-74.
Wechsler, D. (2003). The Wechsler Intelligence Scale for Children (4th ed.). San
Antonio, TX: Psychological Corporation.
Weiss, R., Dziura, J., Burgert, T.S., Tamborlane, W.V., Taksali, S.E., Yeckel, C.W.,
Allen, K., Lopes, M., Savoye, M., Morrison, J., Sherwin, R.S., and Caprio, S.
(2004). Obesity and the metabolic syndrome in children and adolescents. The
New England Journal of Medicine, 350(23), 2362-2374.
Welsh, M.C., Pennington, B.F., and Groisser, D.B. (1991). A normative-developmental study of executive function: A window on prefrontal function
in children. Developmental Neuropsychology, 7, 131-149.
Whitehurst, G.J., and Lonigan, C.J. (1998). Child development and emergent
literacy. Child Development, 69, 848-872.
Whitehurst, G.J., Arnold, D.H., Epstein, J.N., Angell, A.L., Smith, M., and Fischel,
J.E. (1994). A picture book reading intervention in day care and home for
children from low-income families. Developmental Psychology, 30, 679-689.
Woodcock, R.W. (1990). Theoretical foundations of the WJ-R measures of cognitive
ability. Journal of Psychoeducational Assessment, 8(3), 231-258.
Woodcock, R.W., McGrew, K.S., and Mather, N. (2001). Woodcock-Johnson III
(WJ-III) Tests of Cognitive Abilities. Rolling Meadows, IL: Riverside.
Xu, F., Spelke, E.S., and Goddard, S. (2005). Number sense in human infants.
Developmental Science, 8(1), 88-101.
Zaslow, M., Halle, T., Martin, L., Cabrera, N., Calkins, J., Pitzer, L., and Margie,
N.G. (2006). Child outcome measures in the study of child care quality:
Overview and next steps. Evaluation Review, 30, 577-610.
Chapter 6
Abbot-Shinn, M., and Sibley, A. (1992). Assessment profile for early childhood
programs: Research version. Atlanta, GA: Quality Assist.
Abt Associates Inc. (2006). Observation training manual: OMLIT early childhood.
Cambridge, MA: Author.
Adams, G., Tout, K., and Zaslow, M. (2007). Early care and education for children
in low-income families: Patterns of use, quality, and potential implications. Washington, DC: The Urban Institute.
Arnett, J. (1989). Caregivers in day-care centers: Does training matter? Journal of
Applied Developmental Psychology, 10, 541.
Belsky, J., Vandell, D.L., Burchinal, M., Clarke-Stewart, K.A., McCartney, K.,
Owen, M., and the NICHD Early Child Care Research Network. (2007). Are
there long-term effects of early child care? Child Development, 78, 681-701.
Bornstein, M.H., and Sawyer, J. (2006). Family systems. In K. McCartney and D.
Phillips (Eds.), Handbook of early childhood development (pp. 381-398). Malden,
MA: Blackwell.
Bradley, R.H., Corwyn, R.F., Burchinal, M., McAdoo, H.P., and Garcia-Coll,
C. (2001). The home environments of children in the United States: Part 2,
Relations with behavioral development from birth through age 13. Child
Development, 72, 1868-1886.
Bryant, D. (2007). Delivering and evaluating the Partnerships for Inclusion model of
early childhood professional development in a 5-state collaborative study. Paper
presented at the National Association for the Education of Young Children,
November, Chicago, IL.
Bryant, D. (forthcoming). Observational measures of quality in center-based early
care and education programs. Submitted to Child Development Perspectives.
Bryant, D.M., Burchinal, M.R., Lau, L.B., and Sparling, J.J. (1994). Family and
classroom correlates of Head Start children's developmental outcomes. Early
Childhood Research Quarterly, 9(4), 289-309.
Burchinal, M. (forthcoming). The measurement of child care quality. University of
California, Irvine.
Burchinal, M., Kainz, K., Cai, K., Tout, K., Zaslow, M., Martinez-Beck, I., and
Rathgeb, C. (2008). Child care quality and child outcomes: Multiple studies
analyses. Paper presented at Developing a Next Wave of Quality Measures
for Early Childhood and School-Age Programs: A Working Meeting, January,
Washington, DC.
Burchinal, M.R., Peisner-Feinberg, E., Bryant, D.M., and Clifford, R. (2000).
Children's social and cognitive development and child care quality: Testing
for differential associations related to poverty, gender, or ethnicity. Applied
Developmental Science, 4, 149-165.
Burchinal, M.R., Roberts, J.E., Riggins, R., Zeisel, S., Neebe, E., and Bryant, M.
(2000). Relating quality of center child care to early cognitive and language
development longitudinally. Child Development, 71, 339-357.
Caldwell, B.M., and Bradley, R.H. (1984). Home observation for measurement of the
environment. Little Rock: University of Arkansas.
Child Trends. (2007). Quality in early childhood care and education settings: A
compendium of measures. Washington, DC: Author.
Connell, C.M., and Prinz, R.J. (2002). The impact of childcare and parent-child
interactions on school readiness and social skills development for low-income
African American children. Journal of School Psychology, 40(2), 177-193.
DeTemple, J., and Snow, C.E. (1998). Mother-child interactions related to the
emergence of literacy. In C.A. Eldred (Ed.), Parenting behaviors in a sample of
young single mothers in poverty: Results of the New Chance Observational Study
(pp. 114-169). New York: Manpower Demonstration Research.
Dickinson, D.K., Sprague, K., Sayer, A., Miller, C., Clark, N., and Wolf, A. (2000).
Classroom factors that foster literacy and social development of children from
different language backgrounds. In M. Hopmann (Chair) (Ed.), Dimensions of
program quality that foster child development: Reports from 5 years of the Head Start
Quality Research Centers. Poster presentation at the biannual National Head
Start Research Conference, June, Washington, DC.
Dickinson, D.K., Sprague, K., Sayer, A., Miller, C.M., and Clark, N. (2001, April).
A multilevel analysis of the effects of early home and preschool environments
on children's language and early literacy development. Paper presented at the
Biennial Conference of the Society for Research in Child Development, April,
Minneapolis, MN.
Early, D.M., Bryant, D.M., Pianta, R.C., Clifford, R.M., Burchinal, M.R., Ritchie,
S., Howes, C., and Barbarin, O. (2006). Are teachers' education, major, and credentials related to classroom quality and children's academic gains in pre-kindergarten? Early Childhood Research Quarterly, 21(2), 174-195.
Egeland, B., and Deinard, A. (1975). Life stress scale and manual. Minneapolis:
University of Minnesota.
Englund, M.M., Luckner, A.E., Whaley, G., and Egeland, B. (2004). Childrens
achievement in early elementary school: Longitudinal effects of parental
involvement, expectations, and quality of assistance. Journal of Educational
Psychology, 96, 723-730.
Frosch, C.A., Cox, M.J., and Goldman, B.D. (2001). Infant-parent attachment and
parental and child behavior during parent-toddler storybook interaction.
Merrill-Palmer Quarterly, 47(4), 445-474.
Fuligni, A.S., Han, W.J., and Brooks-Gunn, J. (2004). The Infant-Toddler HOME
in the 2nd and 3rd years of life. Parenting, 4(2&3), 139-159.
Harms, T., and Clifford, R. (1980). Early Childhood Environment Rating Scale. New
York: Teachers College Press.
Harms, T., and Clifford, R.M. (1989). Family Day Care Rating Scale. New York:
Teachers College Press.
Harms, T., Clifford, R., and Cryer, D. (1998). Early Childhood Environment Rating
Scale (Revised ed.). New York: Teachers College Press.
Harms, T., Cryer, R., and Clifford, R. (1990). Infant/Toddler Environment Rating
Scale. New York: Teachers College Press.
Helburn, S. (1995). Cost, quality and child outcomes in child care centers. Denver:
University of Colorado, Department of Economics, Center for Research in
Economic and Social Policy.
High/Scope. (2003). Preschool Program Quality Assessment (2nd ed.). Ypsilanti, MI:
High/Scope Press.
Howes, C. (1997). Children's experiences in center-based child care as a function
of teacher background and adult:child ratio. Merrill-Palmer Quarterly, 43,
404-425.
Howes, C., Mashburn, A., Pianta, R., Hamre, B., Downer, J., Barbarin, O., Bryant,
D., Burchinal, M., and Early, D.M. (2008). Measures of classroom quality in
pre-kindergarten and children's development of academic, language and
social skills. Child Development, 79(3), 732-749.
Howes, C., Phillips, D.A., and Whitebook, M. (1992). Thresholds of quality: Implications for the social development of children in center-based child care. Child Development, 63, 449-460.
Hyson, M., Hirsh-Pasek, K., and Rescorla, L. (1990). The classroom practices
inventory: An observation instrument based on NAEYC's guidelines for
developmentally appropriate practices for 4- and 5-year-old children. Early
Childhood Research Quarterly, 5, 475-494.
Kinzie, M.B., Whitaker, S.D., Neesen, K., Kelley, M., Matera, M., and Pianta, R.C.
(2006). Innovative web-based professional development for teachers of at-risk
preschool children. Educational Technology & Society, 9(4), 194-204.
Kontos, S., Howes, C., and Galinsky, E. (1996). Does training make a difference to
quality in family child care? Early Childhood Research Quarterly, 11(4), 427-445.
Lamb, M. (1998). Nonparental child care: Context, quality, correlates, and
consequences. In W. Damon, I.E. Sigel, and K.A. Renninger (Eds.), Handbook
of child psychology (Vol. 4: Child). London: Wiley.
400
Lambert, R., Abbott-Shinn, M., and Sibley, A. (2006). Evaluating the quality of
early childhood education settings. In B. Spodek and O.N. Saracho (Eds.),
Handbook of research on the education of young children (2nd ed.). Mahwah, NJ:
Lawrence Erlbaum Associates.
McCartney, K. (1984). Effect of quality of day care environment on childrens
language development. Developmental Psychology, 20, 244-260.
McCartney, K., and Phillips, D. (Eds.). (2006). Handbook of early childhood
development. Malden, MA: Blackwell.
Mitchell, A.W. (2005). Stair steps to quality: A guide for states and communities
developing quality rating systems for early care and education. Alexandria, VA:
United Way Success by Six.
National Association for the Education of Young Children. (2005). Screening and
assessment of young English-language learners: Supplement to the NAEYC and
NAECS/SDE joint position statement on early childhood curriculum, assessment,
and program evaluation. Washington, DC: Author.
National Institute for Early Education Research. (2005). Support for English
language learners classroom assessment. Rutgers, NJ: Author.
National Institute for Early Education Research. (2006). The state of preschool.
Rutgers, NJ: Author.
National Institute for Early Education Research. (2007). Preschool classroom
mathematics inventory. Rutgers, NJ: Author.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Neuman, S., Dwyer, J., and Koh, S. (2007). Child/Home Early Language & Literacy
Observation Tool (CHELLO). Baltimore, MD: Brookes.
NICHD Early Child Care Research Network. (1999). Child care and mother-child interaction in the first three years of life. Developmental Psychology, 35,
1399-1413.
NICHD Early Child Care Research Network. (2000). The relation of child care to
cognitive and language development. Child Development, 71(4), 960-980.
NICHD Early Child Care Research Network. (2002). Early child care and
children's development prior to school entry: Results from the NICHD Study
of Early Child Care. American Educational Research Journal, 39, 133-164.
NICHD Early Child Care Research Network. (2003). Does quality of child care
affect child outcomes at age 4? Developmental Psychology, 39, 451-469.
NICHD Early Child Care Research Network. (2005). Duration and developmental
timing of poverty and children's cognitive and social development from birth
through third grade. Child Development, 76(4), 795-810.
NICHD Early Child Care Research Network. (2006). Child care effect sizes for
the NICHD Study of Early Child Care and Youth Development. American
Psychologist, 61(2), 99-116.
Sylva, K., Siraj-Blatchford, I., and Taggart, B. (2003). Assessing quality in the early
years: Early Childhood Environment Rating Scale-Extension (ECERS-E): Four
curricular subscales. Stoke-on-Trent, Staffordshire, England: Trentham Books.
Sylva, K., Siraj-Blatchford, I., Taggart, B., Sammons, P., Melhuish, E., Elliot,
K., and Totsika, V. (2006). Capturing quality in early childhood through
environmental rating scales. Early Childhood Research Quarterly, 21, 76-92.
Tout, K., Zaslow, M., and Martinez-Beck, I. (forthcoming). Measuring the quality
of early care and education programs at the intersection of research, policy,
and practice. Submitted to Child Development Perspectives.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2004). Early Head Start research: Making a difference in the lives of
infants, toddlers, and their families. The impacts of early Head Start, volume 1: Final
technical report. Washington, DC: Author.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2005). Head Start impact study: First year findings. Washington,
DC: Author.
Van Horn, M., and Ramey, S. (2004). A new measure for assessing develop
mentally appropriate practices in early elementary school, a developmentally
appropriate practice template. Early Childhood Research Quarterly, 19, 569-587.
Vandell, D. (2004). Early child care: The known and the unknown. Merrill-Palmer
Quarterly, 50, 387-414.
Vu, J.A., Jeon, H., and Howes, C. (in press). Formal education, credential, or both:
Early childhood program classroom practices. Submitted to Early Education
and Development.
Wasik, B.H., and Bryant, D.M. (2001). Home visiting: Procedures for helping families
(2nd ed.). Newbury Park, CA: Sage.
Weinfield, N.S., Egeland, B., and Ogawa, J.R. (1998). Affective quality of mother-child interactions. In C.A. Eldred (Ed.), Parenting behaviors in a sample of young single mothers in poverty: Results of the New Chance Observational Study (pp. 71-113). New York: Manpower Demonstration Research.
Wesley, P.W. (1994). Providing on-site consultation to promote quality in
integrated child care programs. Journal of Early Intervention, 18(4), 391-402.
Yoder, P.J., and Warren, S.F. (2001). Relative treatment effects of two prelinguistic
communication interventions on language development in toddlers with
developmental delays vary by maternal characteristics. Journal of Speech
Language and Hearing Research, 44, 224-237.
Zaslow, M. (2008). Issues for the learning community from the Head Start Impact
Study. Infants and Young Children, 21(1), 4-17.
Zaslow, M., Halle, T., Martin, L., Cabrera, N., Calkins, J., Pitzer, L., and Margie,
N.G. (2006). Child outcome measures in the study of child care quality:
Overview and next steps. Evaluation Review, 30, 577-610.
Chapter 7
American Educational Research Association, American Psychological Associa
tion, and National Council on Measurement in Education. (1999). Standards
for educational and psychological testing. Washington, DC: American Educational
Research Association.
American Institutes for Research. (2000). Voluntary national test, cognitive laboratory
report, year 2. Palo Alto, CA: Author.
Bagnato, S.J., Smith-Jones, J., McComb, G., and Cook-Kilroy, J. (2002). Quality early
learning–Key to school success: A first-phase 3-year program evaluation research report for Pittsburgh's Early Childhood Initiative (ECI). Pittsburgh, PA: SPECS
Program Evaluation Research Team.
Bagnato, S.J., Suen, H., Brickley, D., Jones, J., and Dettore, E. (2002). Child
developmental impact of Pittsburgh's Early Childhood Initiative (ECI) in
high-risk communities: First-phase authentic evaluation research. Early
Childhood Research Quarterly, 17(4), 559-589.
Brennan, R.L. (2006). Perspectives on the evolution and future of educational
measurement. In R.L. Brennan (Ed.), Educational measurement (4th ed., pp. 1-16). Westport, CT: ACE/Praeger.
Buros Institute of Mental Measurements. (2007). The seventeenth mental measurements yearbook. Lincoln, NE: Author.
Campbell, D.T., and Stanley, J.C. (1966). Experimental and quasi-experimental designs
for research. Chicago, IL: Rand McNally.
Child Trends. (2004). Early childhood measures profiles. Washington, DC: Author.
Cook, T.D., and Campbell, D.T. (1979). Quasi-experimentation: Design & analysis
issues for field settings. Boston: Houghton Mifflin.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests.
Psychometrika, 16(3), 297-334.
Cronbach, L.J. (1971). Test validation. In R.L. Thorndike (Ed.), Educational
measurement (2nd ed., pp. 443-507). Washington, DC: American Council on
Education.
Cronbach, L.J., and Meehl, P.E. (1955). Construct validity in psychological tests.
Psychological Bulletin, 52, 281-302.
Cureton, E.E. (1951). Validity. In E.F. Lindquist (Ed.), Educational measurement (pp.
621-694). Washington, DC: American Council on Education.
Dorans, N.J., and Holland, P.W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P.W. Holland and H. Wainer (Eds.),
Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
Goldstein, H. (1996). Assessment: Problems, developments, and statistical issues: A
volume of expert contributions. New York: Wiley.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Holland, P.W., and Wainer, H. (1993). Differential item functioning. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Huang, X. (2007). Validity equivalence between the Chinese and English versions
of the IEA Child Cognitive Developmental Status Test. Berkeley: University of
California.
Thissen, D., Steinberg, L., and Wainer, H. (1993). Detection of differential item
function using the parameters of item response models. In P.W. Holland and
H. Wainer (Eds.), Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Wilson, M. (2005). Constructing measures: An item response modeling approach.
Mahwah, NJ: Lawrence Erlbaum Associates.
Wilson, M., and Adams, R.J. (1996). Evaluating progress with alternative
assessments: A model for Chapter 1. In M.B. Kane (Ed.), Implementing
performance assessment: Promise, problems and challenges. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Zhou, Z., and Boehm, A.E. (1999). Chinese and American childrens knowledge of basic
relational concepts. Paper presented at the biennial meeting of the Society for
Research in Child Development, April, Albuquerque, NM.
Chapter 8
Abedi, J., Hofstetter, C.H., and Lord, C. (2004). Assessment accommodations for
English-language learners: Implications for policy-based empirical research.
Review of Educational Research, 74(1), 1-28.
Abedi, J., Lord, C., Hofstetter, C., and Baker, E. (2000). Impact of accommodation
strategies on English language learners' test performance. Educational
Measurement: Issues and Practice, 19(3), 16-26.
American Educational Research Association, American Psychological Associa
tion, and National Council on Measurement in Education. (1999). Standards for
educational and psychological testing. Washington, DC: Author.
August, D., and Shanahan, T. (Eds.). (2006). Developing literacy in second-language
learners: Report of the National Literacy Panel on language-minority children and
youth. Mahwah, NJ: Lawrence Erlbaum Associates.
Bagnato, S.J. (2007). Authentic assessment for early childhood intervention: Best
practices. New York: Guilford Press.
Bagnato, S.J., and Neisworth, J. (1995). A national study of the social and
treatment invalidity of intelligence testing in early intervention. School
Psychologist Quarterly, 9(2), 81-102.
Bagnato, S.J., and Yeh-Ho, H. (2006). High-stakes testing with preschool children:
Violation of professional standards for evidence-based practice in early
childhood intervention. KEDI International Journal of Educational Policy, 3(1),
23-43.
Bagnato, S.J., Macey, M., Salaway, J., and Lehman, C. (2007a). Research foundations
for authentic assessments to ensure accurate and representative early intervention
eligibility. Washington, DC: U.S. Department of Education, Office of Special
Education Programs, TRACE Center for Excellence.
Bagnato, S.J., Macey, M., Salaway, J., and Lehman, C. (2007b). Research foundations
for conventional tests and testing to ensure accurate and representative early
intervention eligibility. Washington, DC: U.S. Department of Education, Office
of Special Education Programs, TRACE Center for Excellence.
Carta, J.J., Greenwood, C.R., Walker, D., Kaminski, R., Good, R., McConnell,
S., and McEvoy, M. (2002). Individual growth and development indicators
(IGDIs): Assessment that guides intervention for young children. Young
Exceptional Children Monograph Series, 4, 15-28.
Carter, A.S., Briggs-Gowan, M.J., and Ornstein Davis, N. (2004). Assessment of
young children's social-emotional development and psychopathology: Recent
advances and recommendations for practice. Journal of Child Psychology and
Psychiatry, 45(1), 109-134.
Castenell, L.A., and Castenell, M.E. (1988). Testing the test: Norm-referenced
testing and low-income blacks. Journal of Counseling and Development, 67,
205-206.
Center for Universal Design. (1997). The principles of universal design. Available:
http://www.design.ncsu.edu/cud/about_ud/udprinciplestext.htm [accessed
December 2007].
Chachkin, N.J. (1989). Testing in elementary and secondary schools: Can miscue
be avoided? In B. Gifford (Ed.), Test policy and the politics of opportunity allocation:
The workplace and the law (pp. 163-187). Boston: Kluwer Academic.
Child Trends. (2004). Early childhood measures profiles. Washington, DC: Author.
Cho, S., Hudley, C., and Back, H.J. (2002). Cultural influences on ratings of self-perceived social, emotional, and academic adjustment for Korean American
adolescents. Assessment for Effective Intervention; Special Issue: Assessment of
Culturally-Linguistically Diverse Learners, 29(1), 3-14.
Christenson, S.L. (2004). The family-school partnership: An opportunity to
promote learning and competence of all students. School Psychology Review,
33(1), 83-104.
Coleman, M.R., Buysse, V., and Neitzel, J. (2006). Recognition and response: An
early intervening system for children at-risk for learning disabilities. Chapel Hill:
University of North Carolina, FPG Child Development Institute.
Cook, T.D., and Campbell, D.T. (1979). Quasi-experimentation: Design & analysis
issues for field settings. Boston: Houghton Mifflin.
Coutinho, M.J., and Oswald, D.P. (2000). Disproportionate representation in
special education: A synthesis and recommendations. Journal of Child and
Family Studies, 9, 135-156.
Crocker, L., and Algina, J. (1986). Introduction to classical and modern test theory.
New York: Holt, Rinehart and Winston.
De Avila, E., and Duncan, S. (1990). Language Assessment ScalesOral. Monterey,
CA: CTB McGraw-Hill.
Deno, S. (1985). Curriculum-based measurement: The emerging alternative.
Exceptional Children, 52, 219-232.
Deno, S. (1997). Whether thou goest . . . Perspectives on progress monitoring. In
J.W. Lloyd, E.J. Kameenui, and D. Chard (Eds.), Issues in educating students with
disabilities (pp. 77-99). Mahwah, NJ: Lawrence Erlbaum Associates.
Division for Early Childhood. (2007). Promoting positive outcomes for children with
disabilities: Recommendations for curriculum, assessment, and program evaluation.
Missoula, MT: Author.
Garcia, G.E., Stephens, D.L., Koenke, K.R., Pearson, P.D., Harris, V.J., and Jimenez,
R.T. (1989). A study of classroom practices related to the reading of low-achieving
students: Phase one (Study 2.2.3.5). Urbana: University of Illinois, Reading
Research and Education Center.
Genesee, F., Geva, E., Dressler, C., and Kamil, M. (2006). Synthesis: Cross-linguistic relationships. In D. August and T. Shanahan (Eds.), Report of the
National Literacy Panel on Language Minority Youth and Children. Mahwah, NJ:
Lawrence Erlbaum Associates.
Gipps, C. (1999). Socio-cultural aspects of assessment. Review of Research in
Education, 24, 355-392.
Goldenberg, C., Rueda, R., and August, D. (2006). Synthesis: Sociocultural
contexts and literacy development. In D. August and T. Shanahan (Eds.),
Report of the National Literacy Panel on Language Minority Youth and Children.
Mahwah, NJ: Lawrence Erlbaum Associates.
Gopaul-McNicol, S., and Armour-Thomas, S.A. (2002). Assessment and culture.
New York: Academic Press.
Graziano, W.G., Varca, P.E., and Levy, J.C. (1982). Race of examiner effects and the
validity of intelligence tests. Review of Educational Research, 52(4), 469-497.
Green, R.L. (1980). Critical issues in testing and achievement of black Americans.
Journal of Negro Education, 49(3), 238-252.
Greenspan, S.I., and Meisels, S.J. (1996). Toward a new vision for the developmental assessment of infants and young children. In S.J. Meisels and E.
Fenichel (Eds.), New visions for the developmental assessment of infants and young
children (pp. 11-26). Washington, DC: Zero to Three.
Gutiérrez-Clellen, V.F. (1999). Language choice in intervention with bilingual
children. American Journal of Speech-Language Pathology, 8, 291-302.
Gutiérrez-Clellen, V.F., and Kreiter, J. (2003). Understanding child bilingual
acquisition using parent and teacher reports. Applied Psycholinguistics, 24,
267-288.
Hagie, M.U., Gallipo, P.G., and Svien, L. (2003). Traditional culture versus
traditional assessment for American Indian students: An investigation of
potential test item bias. Assessment for Effective Intervention, 23(1), 15-25.
Hall, C.C.I. (1997). Cultural malpractice: The growing obsolescence of psychology
with the changing U.S. population. American Psychologist, 52(6), 624-651.
Harbin, G., Rous, B., and McLean, M. (2005). Issues in designing state accountability systems. Journal of Early Intervention, 27(3), 137-164.
Harry, B., and Klingler, J. (2006). Why are so many minority students in special
education? Understanding race and disability in schools. New York: Teachers
College Press.
Hatton, D.D., Bailey, D.B., Burchinal, M.R., and Ferrell, K.A. (1997). Developmental growth curves of preschool children with vision impairments. Child
Development, 68(5), 788-806.
Hebbeler, K., and Spiker, D. (2003). Initiatives on children with special needs.
In J. Brooks-Gunn, A.S. Fuligni, and L.J. Berlin (Eds.), Early child development
in the 21st century: Profiles of current research initiatives. New York: Teachers
College Press.
Hebbeler, K., Barton, L., and Mallik, S. (2008). Assessment and accountability for
programs serving young children with disabilities. Exceptionality, 16(1), 48-63.
Hebbeler, K., Spiker, D., Bailey, D., Scarborough, A., Mallik, S., Simeonsson, R.,
Singer, M., and Nelson, L. (2007). Early intervention for infants and toddlers with
disabilities and their families: Participants, services, and outcomes. Menlo Park,
CA: SRI International. Available: http://www.sri.com/neils/pdfs/NEILS_
Report_02_07_Final2.pdf [accessed July 2008].
Helms, J.E. (1992). Why is there no study of cultural equivalence in standardized
cognitive ability testing? American Psychologist, 49(9), 1083-1101.
Hemmeter, M.L., Ostrosky, M., and Fox, L. (2006). Social and emotional
foundations for early interventions: A conceptual model for intervention.
School Psychology Review, 35(4), 583-601.
Hernandez, D. (2006). Young Hispanic children in the U.S.: A demographic portrait
based on Census 2000. Tempe: Arizona State University.
Hilliard, A.G. (1976). Alternatives to I.Q. testing: An approach to the identification
of gifted minority students. Sacramento: California State Department of
Education.
Hilliard, A.G. (1979). Standardization and cultural bias impediments to
the scientific study and validation of intelligence. Journal of Research and
Development in Education, 12(2), 47-58.
Hilliard, A.G. (1991). Testing African American students. Morristown, NJ: Aaron
Press.
Hilliard, A.G. (1994). What good is this thing called intelligence and why bother
to measure it? Journal of Black Psychology, 20(4), 430-443.
Hilliard, A.G. (2004). Intelligence: What good is it and why bother to measure it?
In R. Jones (Ed.), Black psychology. Hampton, VA: Cobb and Henry.
Hiner, N.R. (1989). The new history of children and the family and its implications
for educational research. In W.J. Weston (Ed.), Education and the American
family. New York: New York University Press.
Laing, S.P., and Kamhi, A. (2003). Alternative assessment of language and literacy
in culturally and linguistically diverse populations. Language, Speech, and
Hearing Services in Schools, 34(1), 44-55.
Losardo, A., and Notari-Syverson, A. (2001). Alternative approaches to assessing
young children. Baltimore, MD: Brookes.
Macy, M., Bricker, D.D., and Squires, J.K. (2005). Validity and reliability of a
curriculum-based assessment approach to determine eligibility for Part C
services. Journal of Early Intervention, 28(1), 1-16.
Madhere, S. (1998). Cultural diversity, pedagogy, and assessment strategies. The
Journal of Negro Education, 67(3), 280-295.
Markowitz, J., Carlson, E., Frey, W., Riley, J., Shimshak, A., Heinzen, H., Strohl,
J., Klein, S., Hyunshik, L., and Rosenquist, C. (2006). Preschoolers with
disabilities: Characteristics, services, and results: Wave 1 overview report from the
Pre-Elementary Education Longitudinal Study (PEELS). Rockville, MD: Westat.
Available: https://www.peels.org/Docs/PEELS%20Final%20Wave%201%20
Overview%20Report.pdf [accessed July 2008].
McCardle, P., Mele-McCarthy, J., and Leos, K. (2005). English language learners
and learning disabilities: Research agenda and implications for practice.
Learning Disabilities Research and Practice, 20(1), 69-78.
McConnell, S.R. (2000). Assessment in early intervention and early childhood
special education: Building on the past to project into the future. Topics in Early
Childhood Special Education, 20, 43-48.
McCormick, L., and Noonan, M.J. (2002). Ecological assessment and planning.
In M.M. Ostrosky and E. Horn (Eds.), Assessment: Gathering meaningful
information (pp. 47-60). Missoula, MT: Division for Early Childhood.
McCune, L., Kalmanson, B., Fleck, M.B., Glazewski, B., and Sillari, J. (1990). An
interdisciplinary model of infant assessment. In S.J. Meisels and J.P. Shonkoff
(Eds.), Handbook of early childhood intervention (pp. 219-245). New York:
Cambridge University Press.
McLean, L., and Cripe, J.W. (1997). The effectiveness of early intervention for
children with communication disorders. In M. Guralnick (Ed.), The effectiveness
of early intervention (pp. 329-428). Baltimore, MD: Brookes.
McLean, M. (2004). Assessment and its importance in early intervention/early
childhood special education. In M. McLean, M. Wolery, and D.B. Bailey, Jr.
(Eds.), Assessing infants and preschoolers with special needs (3rd ed., pp. 1-21).
Upper Saddle River, NJ: Pearson Assessments.
McLean, M. (2005). Using curriculum-based assessment to determine eligibility:
Time for a paradigm shift. Journal of Early Intervention, 28(1), 23-27.
McWilliam, R.A. (2004). DEC recommended practices: Interdisciplinary models.
In S. Sandall, M.L. Hemmeter, B.J. Smith, and M.E. McLean (Eds.), DEC
recommended practices: A comprehensive guide for practical application in early
intervention/early childhood special education (pp. 127-131). Longmont, CO:
Sopris West.
Meisels, S.J., and Atkins-Burnett, S. (2000). The elements of early childhood
assessment. In J.P. Shonkoff and S.J. Meisels (Eds.), Handbook of early childhood
intervention (2nd ed., pp. 231-257). New York: Cambridge University Press.
Meisels, S., and Provence, S. (1989). Screening and assessment: Guidelines for
identifying young disabled and developmentally vulnerable children and their
families. Washington, DC: National Early Childhood Technical Assistance
System/National Center for Clinical Infant Programs.
National Association for the Education of Young Children. (2005). Screening and
assessment of young English-language learners: Supplement to the NAEYC and
NAECS/SDE joint position statement on early childhood curriculum, assessment,
and program evaluation. Washington, DC: Author.
National Association for the Education of Young Children and National
Association of Early Childhood Specialists in State Departments of Education.
(2003). Early childhood curriculum, assessment, and program evaluation: Building an
effective, accountable system in programs for children birth through age 8. A position
statement. Washington, DC: National Association for the Education of Young
Children.
National Association of School Psychologists. (2000). Professional conduct manual.
Bethesda, MD: Author.
Pretti-Frontczak, K., Jackson, S., Gross, S.M., Grisham-Brown, J., Horn, E., and
Harjusola-Webb, S. (2007). A curriculum framework that supports quality
early education for all children. In E.M. Horn, C. Peterson, and L. Fox (Eds.),
Linking curriculum to child and family outcomes (pp. 16-28). Missoula, MT:
Division for Early Childhood.
Qi, C.H., Kaiser, A.P., Milan, S.E., Yzquierdo, Z., and Hancock, T.B. (2003). The
performance of low-income African American children on the Preschool
Language Scale-3. Journal of Speech, Language, and Hearing Research, 46, 576-590.
Rebell, M.A. (1989). Testing, public policy, and the courts. In B. Gifford (Ed.), Test
policy and the politics of opportunity allocation: The workplace and the law. Boston:
Kluwer Academic.
Reynolds, C.R. (1982). Methods for detecting construct and prediction bias.
In R.A. Berk (Ed.), Handbook of methods for detecting test bias (pp. 199-259).
Baltimore, MD: Johns Hopkins University Press.
Reynolds, C.R. (1983). Test bias: In God we trust; all others must have data. Journal
of Special Education, 17(3), 241-260.
Reynolds, C.R., and Kamphaus, R.W. (2003). Behavior assessment system for children
(2nd ed.). Minneapolis, MN: Pearson.
Reynolds, C.R., Lowe, P.A., and Saenz, A.L. (1999). The problem of bias in
psychological assessment. In C.R. Reynolds and T.B. Gutkin (Eds.), Handbook
of school psychology (3rd ed., pp. 549-595). New York: Wiley.
Rhodes, R., Ochoa, S.H., and Ortiz, S. (2005). Assessing culturally and linguistically
diverse students: A practical guide. New York: Guilford Press.
Rock, D.A., and Stenner, A.J. (2005). Assessment issues in the testing of children
at school entry. The Future of Children, 15(1), 15-34.
Rodrigue, J.R., Morgan, S.B., and Geffken, G.R. (1991). A comparative evaluation
of adaptive behavior in children and adolescents with autism, Down
syndrome, and normal development. Journal of Autism and Developmental
Disorders, 21(2), 187-196.
Rueda, R. (2007). Motivation, learning, and assessment of English learners.
Paper presented at the School of Education, California State University,
Northridge.
Rueda, R., and Yaden, D. (2006). The literacy education of linguistically and
culturally diverse young children: An overview of outcomes, assessment, and
large-scale interventions. In B. Spodek and O.N. Saracho (Eds.), Handbook of
research on the education of young children (2nd ed., pp. 167-186). Mahwah, NJ:
Lawrence Erlbaum Associates.
Rueda, R., MacGillivray, L., Monzó, L., and Arzubiaga, A. (2001). Engaged
reading: A multi-level approach to considering sociocultural features with
diverse learners. In D. McInerny and S.V. Etten (Eds.), Research on sociocultural
influences on motivation and learning (pp. 233-264). Greenwich, CT: Information
Age.
Santos, R.M., Lee, S., Valdivia, R., and Zhang, C. (2001). Translating translations:
Selecting and using translated early childhood materials. Teaching Exceptional
Children, 34(2), 26-31.
Chapter 9
Duncan, S.E., and De Avila, E. (1998). Pre-Language Assessment Scale 2000.
Monterey, CA: CTB McGraw-Hill.
Espinosa, L. (2005). Curriculum and assessment considerations for young children
from culturally, linguistically, and economically diverse backgrounds. Special
Issue, Psychology in the Schools, 42(8), 837-853.
Kim, H., Baydar, N., and Greek, A. (2003). Testing conditions influence the
race gap in cognition and achievement by household survey data. Applied
Developmental Psychology, 23, 16.
Mathematica Policy Research. (2006). Implementation of the Head Start National
Reporting System: Spring 2005 update. Princeton, NJ: Author.
Mathematica Policy Research. (2007). Language routing protocol developed for the
First Five LA Universal Preschool Child Outcomes Study, 2007-2008. Princeton,
NJ: Author.
Mathematica Policy Research. (2008). Introduction to conducting assessments as
part of survey projects: Presentation for staff development trainings. Princeton, NJ:
Author.
Maxwell, K.L., and Clifford, R.M. (2004). School readiness assessment. Young
Children: Journal of the National Association for the Education of Young Children,
January, 10. Available: http://journal.naeyc.org/btj/200401/Maxwell.pdf
[accessed February 2008].
Meisels, S.J., and Atkins-Burnett, S. (2006). Evaluating early childhood assessments: A differential analysis. In K. McCartney and D. Phillips (Eds.), Handbook
of early childhood development (pp. 533-549). Cambridge, MA: Blackwell.
Rowand, C., Sprachman, S., Wallace, I., Rhodes, H., and Avellar, H. (2005). Factors
contributing to assessment burden in preschoolers. Paper presented at the American
Association for Public Opinion Research, May, Miami, FL. Available: http://
www.allacademic.com/meta/p_mla_apa_research_citation/0/1/6/7/2/
p16722_index.html [accessed July 2008].
Shepard, L., Kagan, S.L., and Wurtz, L. (1998). Principles and recommendations for
early childhood assessments. Goal 1 Early Childhood Assessments Resource
Group. Washington, DC: National Education Goals Panel.
Snow, K.L. (2006). Measuring school readiness: Conceptual and practical
considerations. Early Education and Development, 17(1), 7-41.
Spier, E.T., Sprachman, S., and Rowand, C. (2004). Implementing large-scale studies
of children using clinical assessments. Paper presented at the Children and the
Mediterranean Conference, January, Genoa, Italy.
Sprachman, S., Atkins-Burnett, S., Glazerman, S., Avellar, S., and Loewenberg,
M. (2007). Minimizing assessment burden on preschool children: Balancing burden
and reliability. Paper presented at the Joint Statistical Meetings, September,
Salt Lake City, UT.
Chapter 10
American Educational Research Association, American Psychological Associa
tion, and National Council on Measurement in Education. (1999). Standards
for educational and psychological testing. Washington, DC: American Educational
Research Association.
Baker, E.L., Linn, R.L., Herman, J.L., and Koretz, D. (2002). Standards for
educational accountability systems. Los Angeles: National Center for Research on
Evaluation, Standards, and Student Testing, University of California.
Bruner, C., Wright, M.S., Gebhard, B., and Hubbard, S. (2004). Building an early
learning system: The ABCs of planning and governing structures. Des Moines, IA:
SECPTAN.
California Department of Education. (2003). Desired results for children and families.
Sacramento: Author, Child Development Division.
California Department of Education. (2005). Desired Results Developmental
Profile-Revised (DRDP-R), Preschool Instrument. Sacramento: Author, Child
Development Division.
Espinosa, L.M. (2008). A review of the literature on assessment issues for young English
language learners. Paper commissioned by the Committee on Developmental
Outcomes and Assessments for Young Children, The National Academies,
Washington, DC.
Espinosa, L.M., and López, M.L. (2007). Assessment considerations for young
English language learners across different levels of accountability. Philadelphia: The
National Early Childhood Accountability Task Force.
García, E.E. (2005). Teaching and learning in two languages: Bilingualism and schooling
in the United States. New York: Teachers College Press.
Gilliam, W.S., and Zigler, E.F. (2004). State efforts to evaluate the effects of prekindergarten: 1977-2003. New Haven, CT: Yale University Child Study Center.
Goodman, D.P., and Hambleton, R.K. (2003). Student test score reports and
interpretive guides: Review of current practices and suggestions for future research.
Amherst: University of Massachusetts School of Education.
Hambleton, R.K., and Slater, S.C. (1997). Reliability of credentialing examinations
and the impact of scoring models and standard setting policies. Applied
Measurement in Education, 10, 19-38.
Harms, T., Clifford, R., and Cryer, D. (1998). Early Childhood Environment Rating
Scale (Revised ed.). New York: Teachers College Press.
Harms, T., Cryer, R., and Clifford, R. (1990). Infant/Toddler Environment Rating
Scale. (Revised ed.). New York: Teachers College Press.
Herman, J.L., and Perry, M. (2002). California student achievement: Multiple views of
K-12 progress. Menlo Park, CA: Ed Source.
Jaeger, R.M. (1998). Evaluating the psychometric qualities of the National
Board for Professional Teaching Standards assessments: A methodological
accounting. Journal of Personnel Evaluation in Education, 22, 189-210.
Kagan, S.L., Tarrant, K., and Berliner, A. (2005). Building a professional development
system in South Carolina: Review and analysis of other states experiences. New
York: Columbia University National Center for Children and Families.
Koretz, D.M., and Barron, S.I. (1998). The validity of gains in scores on the Kentucky
Instructional Results Information System (KIRIS). Santa Monica, CA: RAND
Corporation.
Linn, R.L. (2003). Accountability: Responsibility and reasonable expectations.
Educational Researcher, 32(7), 3-13.
Meisels, S.J. (2006). Accountability in early childhood: No easy answers. Chicago, IL:
Erikson Institute.
Mitchell, A.W. (2005). Stair steps to quality: A guide for states and communities
developing quality rating systems for early care and education. Alexandria, VA:
United Way Success by Six.
National Early Childhood Accountability Task Force. (2007). Taking stock: Assessing
and improving early childhood learning and program quality. Philadelphia:
Author.
National Research Council. (2001). Knowing what students know: The science
and design of educational assessment. Committee on the Foundations of
Assessment, J. Pellegrino, N. Chudowsky, R. Glaser (Eds.). Board on Testing
and Assessment, Center for Education, Division of Behavioral and Social
Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (2006). Systems for state science assessment. Committee
on Test Design for K-12 Science Achievement, M.R. Wilson and M.W.
Bertenthal (Eds.). Board on Testing and Assessment, Center for Education,
Division of Behavioral and Social Sciences and Education. Washington, DC:
The National Academies Press.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Neuman, S.B., and Roskos, K. (2005). The state of state pre-kindergarten
standards. Early Childhood Research Quarterly, 20(2), 125-145.
New Jersey Office of Early Childhood Education. (2004). NJ early learning
assessment system–Literacy. Trenton: New Jersey Department of Education.
New Jersey Office of Early Childhood Education. (2006). NJ early learning
assessment system–Math. Trenton: New Jersey Department of Education.
Pianta, R.C. (2003). Standardized classroom observations from pre-K to third grade: A
mechanism for improving quality classroom experiences during the P-3 years. New
York: Foundation for Child Development.
Scott-Little, C., Kagan, S.L., and Frelow, V.S. (2003a). Creating the conditions for
success with early learning standards: Results from a national study of state-level standards for children's learning prior to kindergarten. Early Childhood
Research & Practice, 5(2).
Scott-Little, C., Kagan, S.L., and Frelow, V.S. (2003b). Standards for preschool
childrens learning and development: Who has standards, how were they developed,
and how are they used? Greensboro: University of North Carolina.
Smith, M., and Dickinson, D. (2002). User's guide to the early language & literacy
classroom observation toolkit. Available: http://www.brookespublishing.com/
store/books/smith-ellco/index.htm [accessed July 2008].
U.S. Department of Education. (2004). Standards and assessments peer review
guidance: Information and examples for meeting requirements of the No Child Left
Behind Act of 2001. Washington, DC: Author.
Wainer, H. (1997). Improving tabular displays: With NAEP tables as examples and
inspirations. Journal of Educational and Behavioral Statistics, 22, 1-30.
Wainer, H., Hambleton, R.K., and Meara, K. (1999). Alternative displays for
communicating NAEP results: A redesign and validity study. Journal of
Educational Measurement, 36, 301-335.
Chapter 11
American Educational Research Association, American Psychological Associa
tion, and National Council on Measurement in Education. (1999). Standards for
educational and psychological testing. Washington, DC: Author.
Christenson, S.L. (2004). The family-school partnership: An opportunity to
promote learning and competence of all students. School Psychology Review,
33(1), 83-104.
Goldenberg, C., Rueda, R., and August, D. (2006). Synthesis: Sociocultural
contexts and literacy development. In D. August and T. Shanahan (Eds.),
Report of the National Literacy Panel on Language Minority Youth and Children.
Mahwah, NJ: Lawrence Erlbaum Associates.
National Education Goals Panel. (1995). Reconsidering children's early development
and learning: Toward common views and vocabulary. Washington, DC: Author.
National Research Council. (1999). High stakes: Testing for tracking, promotion,
and graduation. Committee on Appropriate Test Usage, J.P. Heubert and
R.M. Hauser (Eds.). Center for Education, Division of Behavioral and Social
Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (2006). Systems for state science assessment. Committee
on Test Design for K-12 Science Achievement, M.R. Wilson and M.W.
Bertenthal (Eds.). Board on Testing and Assessment, Center for Education,
Division of Behavioral and Social Sciences and Education. Washington, DC:
The National Academies Press.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Rueda, R. (2007). Motivation, learning, and assessment of English learners. Paper
presented at the School of Education, California State University, Northridge,
April.
Rueda, R., and Yaden, D. (2006). The literacy education of linguistically and
culturally diverse young children: An overview of outcomes, assessment, and
large-scale interventions. In B. Spodek and O.N. Saracho (Eds.), Handbook of
research on the education of young children (2nd ed., pp. 167-186). Mahwah, NJ:
Lawrence Erlbaum Associates.
Rueda, R., MacGillivray, L., Monzó, L., and Arzubiaga, A. (2001). Engaged reading:
A multi-level approach to considering sociocultural features with diverse
learners. In D. McInerny and S.V. Etten (Eds.), Research on sociocultural influences
on motivation and learning (pp. 233-264). Greenwich, CT: Information Age.
Appendixes

Appendix
Glossary

Accommodations
Achievement test
Alternative assessment
Assessment
Authentic assessment
Construct-irrelevant variance
Criterion-referenced assessment
Curriculum-based assessment
Developmental assessment

Developmentally appropriate: Developmentally appropriate practice is informed by what is known about child development and learning, what is known about each child as an individual, and what is known about the social and cultural contexts in which children live (adapted from National Association for the Education of Young Children, 1996, 2008).

Dynamic assessment: Assessment approach characterized by guided support or learning for the purpose of determining a child's potential for change (Losardo and Notari-Syverson, 2001).

Formal assessment: A procedure for obtaining information that can be used to make judgments about characteristics of children or programs using standardized instruments (Council of Chief State School Officers, 2008).

Formative assessment: An assessment designed to monitor progress toward an objective and used to guide curricular and instructional decisions.

High-stakes assessment: Tests or assessment processes for which the results lead to significant sanctions or rewards for children, their teachers, administrators, schools, programs, or school systems. Sanctions may be direct (e.g., retention in grade for children, reassignment for teachers, reorganization for schools) or unintended (e.g., narrowing of the curriculum, increased dropping out).

Informal assessment: A procedure for obtaining information that can be used to make judgments about characteristics of children or programs using means other than standardized instruments (Council of Chief State School Officers, 2008).

Naturalistic assessment: See Authentic assessment.

Norm-referenced test
Performance assessment
Portfolio assessment
Progress monitoring
Readiness test
Reliability
Screening
Standardized test
Standards-based assessment
Summative assessment
Validity (of an assessment or tool)
SOURCES
American Educational Research Association, American Psychological Association,
and National Council on Measurement in Education. (1999). Standards for
educational and psychological testing. Washington, DC: Author.
Association for Supervision and Curriculum Development. (2008). Homepage.
Available: http://www.ascd.org [accessed June 2008].
Bagnato, S.J., and Neisworth, J.T. (1991). Assessment for early intervention: Best
practices for professionals. New York: Guilford Press.
Council of Chief State School Officers. (2008). Glossary terms. Washington,
DC: Author. Available: http://www.ccsso.org/projects/scass/projects/
early_childhood_education_assessment_consortium/publications_and_
products/2892.cfm [accessed August 2008].
Losardo, A., and Notari-Syverson, A. (2001). Alternative approaches to assessing
young children. Baltimore, MD: Brookes.
McAfee, O., Leong, D.J., and Bodrova, E. (2004). Basics of assessment: A primer
for early childhood educators. Washington, DC: National Association for the
Education of Young Children.
Appendix
AGENDA
1:00 Catherine Snow, Committee Chair, and Susan Van
Hemel, Study Director. Welcome and introduction of
committee. Description of the study and purpose of the
forum. Review of procedure and ground rules.
1:20 Ben Allen, National Head Start Association
1:32 Tammy Mann, Zero to Three
1:44 Fasaha Traylor, Foundation for Child Development
1:56 Jerlean Daniel, National Association for the Education of
Young Children
2:08 Joan Isenberg, National Association of Early Childhood
Teacher Educators
2:20 Sally Flagler, National Association of School Psychologists
2:32 Andrea Browning, Society for Research in Child
Development (brief statement)
2:40 Break
3:00 Willard Gilbert, National Association for Bilingual
Education
Appendix
Development of State Standards for Early Childhood Education
TABLE C-1 Domain/Content Area Headings Included in National and State Pre-K Early Learning Standards Documents

The table records, for two national documents (the Head Start Child Outcomes Framework and Carnegie/McGraw-Hill) and for each state's pre-K early learning standards document (AL, AK, AZ, AR, CA, CO, CT, DE, FL, GA, HI, ID, IL, IN, IA, KS, KY, LA, ME [Learning Results; Early Learning Results], MD, MA, MI, MN, MS, MO, MT, NE, NV, NH, NJ, NM, NY, NC, ND, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VT, VA, WA, WV, WI, WY), which of the following domain/content area headings appear in its table of contents: Physical/Motor/Health; Social/Emotional; Approaches Toward Learning; Language/Communication; Literacy; Cognition/General Knowledge; Math; Science; Art/Aesthetics; Social Studies; and Other (e.g., Humanities, Safety, Technology, Engineering, Environmental Education, World Languages, Foreign Language, Nutrition, Self-Help, Career Preparation, Modern and Classic Languages).

NOTE: This table has been adapted with permission from a 2005 report by Scott-Little, Kagan, and Frelow, Inside the Content: The Depth and Breadth of Early Learning Standards. The table has been updated to include states that published their early learning standards documents after that report was completed. Data were collected by reviewing the table of contents of each early learning standards document and noting the developmental domain areas and academic subject areas included. Results from analyses conducted by Scott-Little, Kagan, and Frelow (2005) on the content of the actual early learning standards indicate that the table of contents is not always an accurate reflection of the content of the standards themselves. While a table of contents may reflect the intentions or overall mindset of the persons who developed the early learning standards, it does not necessarily give a complete or accurate indication of the areas of learning and development addressed in the standards themselves.
REFERENCES
Council of Chief State School Officers and Early Childhood Education Assessment
Consortium. (2007). The words we use: A glossary of terms for early childhood
education standards and assessment. Available: http://www.ccsso.org/Projects/
scass/projects/early_childhood_education_assessment_consortium/
publications_and_products/2892.cfm [accessed February 2008].
Foundation for Child Development. (2008). PreK-3rd: A new beginning for American
education. Available: http://www.fcd-us.org/initiatives/initiatives_show.
htm?doc_id=447080 [accessed May 2008].
Michigan State Board of Education. (2006). Early childhood standards of quality for
infant and toddler programs. Lansing: Author.
National Institute for Early Education Research. (2003). State standards database.
New Brunswick, NJ: Author.
National Research Council. (2001). Eager to learn: Educating our preschoolers.
Committee on Early Childhood Pedagogy, B.T. Bowman, M.S. Donovan,
and M.S. Burns (Eds.). Commission on Behavioral and Social Sciences and
Education. Washington, DC: National Academy Press.
Neuman, S.B., and Roskos, K. (2005). The state of state pre-kindergarten
standards. Early Childhood Research Quarterly, 20(2), 125-145.
Partnership for 21st Century Skills. (2007). The intellectual and policy foundations of
the 21st century skills framework. Tucson, AZ: Author.
Scott-Little, C., Kagan, S.L., and Frelow, V.S. (2005). Inside the content: The depth and
breadth of early learning standards. Greensboro: University of North Carolina,
SERVE Center for Continuous Improvement.
Scott-Little, C., Lesko, J., Martella, J., and Milburn, P. (2007). Early learning
standards: Results from a national survey to document trends in state-level
policies and practices. Early Childhood Research and Practice, 9(1), 1-22.
White House. (2002). Good start, grow smart: The Bush administration's early childhood initiative. Washington, DC: Executive Office of the President.
Appendix
Sources of Detailed Information on Test and Assessment Instruments
psychometric properties. It covers measures of child development; parenting, the home environment, and parent well-being;
and program implementation and quality.
Title: Screening for Developmental and Behavioral Problems
Source: Glascoe (2005), Mental Retardation and Developmental Disabilities Research Reviews, 11(3), 173-179
Notes: This recent review article by Glascoe describes screening
tools and instruments for use with infants and young children and
is focused chiefly on instruments for use in pediatric surveillance
and screening programs. Similar information authored by Glascoe
is available at the DBPeds website (see below).
Title: Developmental Screening Tools: Gross Motor/Fine Motor for
Newborn, Infants and Children
Source: (Beligere, Zawacki, Pennington, and Glascoe, 2007)
(available at DBPeds website)
Notes: Glascoe and colleagues provide a listing specifically of
screening tools for gross motor and fine motor development, also
available on the DBPeds website.
Title: Assessing Social-Emotional Development in Children from a
Longitudinal Perspective for the National Children's Study: Social-Emotional Compendium of Measures
Source: (Denham, 2005) (available at The National Children's
Study website)
Notes: Denham provides extensive information on content and
psychometric characteristics of social-emotional instruments,
with additional comments on their use for the National Children's
Study. She includes judgments on strengths and weaknesses of
each measure reviewed, and references for research studies of
each instrument. Many of the measures reviewed are not for
young children.
Mathematica Policy Research. (2003). Resources for measuring services and outcomes
in Head Start programs serving infants and toddlers. Princeton, NJ: Author.
Ringwalt, S. (2008). Developmental screening and assessment instruments with an
emphasis on social and emotional development for young children ages birth through
five. Chapel Hill: University of North Carolina, FPG Child Development
Institute, National Early Childhood Technical Assistance Center.
Appendix E
Biographical Sketches of Committee Members and Staff
science education, patient-reported outcomes, and child development; to policy issues in the use of assessment data in accountability systems. He has recently published three books: Constructing
Measures: An Item Response Modeling Approach is an introduction
to modern measurement; Explanatory Item Response Models: A
Generalized Linear and Nonlinear Approach (with Paul De Boeck)
introduces an overarching framework for the statistical modeling of measurements that makes available new tools for understanding the meaning and nature of measurement; and Towards
Coherence Between Classroom Assessment and Accountability explores
the issues relating to the relationships between large-scale assessment and classroom-level assessment. At the NRC, he chaired
the Committee on Test Design for K-12 Science Achievement. He
is the founding editor of Measurement: Interdisciplinary Research
and Perspectives. He has a Ph.D. in educational measurement and
educational statistics from the University of Chicago.
Martha Zaslow is the vice president for research at Child Trends
and area director for the early child development content area.
Her research takes an ecological perspective, considering the
contributions of different contexts to the development of children
in low-income families, including the family, early care and education, and policy contexts. In studying the role of the family, she
has focused especially on parenting, carrying out observational
studies of mother-child interaction. In studying early care and
education, her work has focused on patterns of child care use
among low-income families and on strategies to improve child
care quality. She has a particular interest in the professional
development of those working in early childhood settings and
its relation to quality and to child outcomes. With respect to the
policy context, she has studied the use of funding from the Child
Care and Development Fund to improve child care quality, state
initiatives to improve children's school readiness, and impacts
on children of different welfare reform policies. At the NRC, she
was a member of the NRC/IOM Committee on Promoting Child
and Family Well-Being Through Family Work Policies: Building a
Knowledge Base to Inform Policies and Practice. She has a Ph.D.
in psychology from Harvard University.
Index
A
A Developmentally Appropriate
Practices Template (ADAPT),
162-163, 175
Abecedarian Project, 104, 111
Access to health care, 75
Access to test data, 3
Accommodations
for children with disabilities, 5, 8,
40, 250, 254, 259, 260, 267, 273,
276-279, 295-296, 298, 330-331,
353, 367
defined, 423
for English language learners, 250,
254, 366
Accountability. See also High-stakes
assessment
appropriate use of assessments,
35-36, 39-41, 167, 198, 259
current practice, 326
demand for assessments, 1, 18-19,
246, 280
development of instruments for,
367-368
interpreting test data, 35
Adaptive Social Behavior Index, 122
Administration for Children and
Families (ACF), 20, 21, 23, 24,
25, 53, 55
Administration of assessments. See
also Implementing assessments
accommodations for children with
disabilities, 272, 295-296
to English language learners, 105,
116-117, 260, 291-295
familiarity of assessor to child,
185-186, 227, 286-288
guidelines, 37-39, 268-269, 272
individualization of, 272
length of, 288-291, 294
order of, 293-294
standardization, 40, 74-75, 287
standards for testing, 250-251
stop rules, 290-291
training of examiners, 3, 33, 64,
102, 150, 256, 285-286, 291,
294-295
Age of children, 3, 38, 71, 194
Ages and Stages, 77
Ainsworth Strange Situation
Procedure, 84
Alberta Infant Motor Scale, 80
Alternative assessment. See
Performance assessments
American Academy of Pediatrics, 67,
68, 70, 71, 453
American Educational Research
Association, 55
American Psychological Association,
107
Approaches to learning
consensus, 97-98
constructs, 97, 107
domain defined, 58, 97
early childhood education
standards, 445
instruments, 128-129
intervention studies, 99
malleability, 99
measures of, 100
and outcomes, 98-99
testing all children, 100, 242
Appropriate use of assessments
accommodations for disabilities,
5, 8, 267, 295-296, 298, 330-331,
353
for accountability, 35-36, 39-41,
167, 198, 259
age and, 3, 38, 71, 194
defining, 22, 184
developmental delays and, 38, 271
domain definition and, 3, 184, 433
English language learners, 3, 40,
250-251, 256, 258, 259, 292, 293,
295, 298
guidelines for, 37-39, 270, 345-346,
353, 360
inclusivity and, 320-321, 322
legal and ethical precedents,
250-251
level of expectation for program
target and, 197
minority children, 3, 235-240, 243,
244, 259
program evaluation, 39, 86, 148,
197-198, 259, 297
for progress monitoring, 39, 86,
148, 197, 259, 297
purpose of assessment and, 27,
259, 283, 341-342, 344, 433
for quality of learning
environments, 320
readiness assessment, 31
for screening, 360
for special needs children, 3, 4, 38,
40, 271, 280, 283
standardization sample and
methods and, 237-238, 243, 271
test and item bias, 205, 210-212,
235-240, 243
for testing all children, 96, 100,
104-106, 112, 116-117
testing situation and assessor
characteristics and, 238-239
Assessment, Evaluation and
Programming System (AEPS),
333
Assessment, generally. See also
Instruments for assessment
current forms, 324-328
B
Bank Street curriculum, 335
Battelle Developmental Inventory
Screening Test, 77
Bayley Scales of Infant and Toddler
Development, 40, 78, 80, 81, 88,
113, 120, 121, 122, 123, 124, 129,
133, 242, 285, 288
Brigance Screens, 77
Buros Institute of Mental
Measurements, 215, 450, 452
C
California Desired Results for
Children and Families (DRCF)
Access for Children with
Disabilities Project, 273-274,
330-331
Desired Results Developmental
Profile, 10, 36, 118, 277, 309-310,
312, 330, 332, 339, 358
system, 277, 329-332
California Preschool Learning
Foundations in Social and
Emotional Development for
Ages 3 and 4, 90
Capute Scales (CAT/CLAMS), 78
Caregiver. See Parent-child
interaction; Primary caregiver-child interactions
Caregiver Interaction Scale (CIS), 160-161, 174
Caregiver-Teacher Report Form, 124,
125
Center-based environments. See also
Classroom environments
components, 99
quality measures, 155-156, 160,
162, 168, 169, 173
Center for Universal Design, 277
Cerebral palsy, 68, 69
Charge to committee, 20-22
Checklist for Autism in Toddlers
(CHAT), 81
Child Behavior Checklist, 122, 124,
125, 242
Child Care and Development Fund,
49
Child Development Inventory and
Child Development Review-Parent Questionnaire, 78
Child/Home Early Language
and Literacy Observation
(CHELLO), 153-154, 174
Cognitive skills. See also Attention
span; Executive functioning;
General knowledge;
Intelligence tests; Memory
consensus, 109
continuity and associations with
important outcomes, 108, 110
domain defined, 58, 106-109
English language learners, 253-254
infants, 108, 110, 113
instruments, 68, 77-78, 110, 111,
130-132, 253
lead poisoning and, 68
malleability, 111-112
measures of, 107, 108, 110, 113,
153-154
minority children, 242
nutritional deficiency and, 67, 76
quality of environment and, 108,
148, 151, 153-154, 170
standards of learning, 89
stimulation in home environments,
146, 151-152, 153-154
testing all children, 112, 242,
253-254
Communication and Symbolic
Behavior Scales, 79
Comprehensive Test of Phonological
Processing, 103, 140
Concepts About Print, 102, 140
Congenital hypothyroidism, 65
Conners' Rating Scales-Revised
(CRS-R), 125
Construct, defined, 186. See also
Validity of assessments
Construct-irrelevant variance, 188,
191, 196, 274-275, 277, 278, 424
Context measures, 60
Contextual issues, 15-20, 40-41, 63-64,
67 n.1, 168, 226-231, 235-237,
243, 247, 250, 257-258
Continuous performance task (CPT),
113, 128
Council for Exceptional Children,
Division for Early Childhood
(DEC), 33, 38, 39, 268, 269
Council of Chief State School Officers,
38, 44, 308-309, 437 n.1, 439-440,
445-446
D
Databases on measurement
instruments, 452-453
Day/night test, 113
Delay-of-Gratification Task, 122, 128
Denver Developmental Screening Test
II, 77, 88, 120, 127, 144
Denver Prescreening Developmental
Questionnaire, 77
Desired Results Developmental
Profile, 10, 36, 118, 277, 309-310,
312, 330, 332, 339, 358, 375
Developmental assessment
charge to committee, 21-22,
431-432
clinical guidelines, 70
contexts for, 63-64
defined, 424
infants and toddlers, 2, 65, 70-72,
73, 110, 261
mandatory, 75
newborns, 68-70
normal limits, 72, 74
research agenda, 12, 360-368
for special needs children, 76, 261,
271, 369
types, 71, 77-84
Developmental delays, 38, 64, 65, 262,
263, 271. See also Special needs
children
Developmental Indicators for
Assessment of Learning-Revised, 77
Developmental outcome measures,
69, 73, 88, 275-276
English language learners, 52, 54,
55
guidelines on, 5, 348-349
Head Start, 50-52
infant-toddler period, 63
instruments by, 71, 77-84, 87
justifications for, 59-60, 346-348
measurement ease, 17
overlap across, 275
schooling-related, 86
subscales, 87
Dots Task, 95
Dynamic assessment, 143, 144, 425
Dynamic Indicators of Basic Early
Literacy Skills, 143, 144
E
Early Childhood Classroom
Observation Measure
(ECCOM), 165, 175
Early Childhood Education
Assessment Consortium, 44,
437 n.1, 439-440, 445-446
Early childhood education standards
for accreditation, 45
alignment with assessments, 184-185, 335-336, 338
concerns about, 46, 48
content, by state, 439-444
defined, 44, 437 n.1
development history, 36-37, 45-52,
437-447
differences among state
documents, 438-439
domains, 44, 97, 441-444, 445
Good Start, Grow Smart initiative,
52-53
Head Start Child Outcomes
Framework, 46, 49-52, 184, 445
important influences, 48-49
K-12 learning standards and,
445-447
national, 46
National Reporting System, 20, 23,
47, 49, 53-55, 201, 273, 284, 430
state, 45-46, 97
uses, 44
Early Childhood Environment Rating
Scale-Extension (ECERS-E),
164-165, 175
Early Childhood Environment
Rating Scale-Revised Edition
(ECERS-R), 147, 163-164, 167,
168, 175, 336
Early Childhood Learning and
Knowledge Center, 23
Early Childhood Longitudinal Study-Birth Cohort, 36, 285, 288
Early Childhood Longitudinal Study-Kindergarten (ECLS-K), 36, 98,
100, 122, 128, 129, 201, 266-267,
273, 367
Early Head Start, 23, 63, 64, 104, 111,
152, 201, 266-267, 430, 438, 450
Research and Evaluation Study,
152, 201, 267, 285, 466
Early Language and Literacy
Classroom Observation
(ELLCO), 154, 165-166, 175, 334
Early Language Milestone Scale, 79
Early Learning Assessment System,
335-336
Early Motor Pattern Profile (EMPP),
80
Early Training Project, 111
Educational Testing Service, 71, 207,
215, 452, 453
Effect sizes, 111, 208-209
Emerging Academics Snapshot (EAS),
166-167, 175
Emotion Matters II Direct
assessments, 95
English language learners
accommodations, 250, 254, 366
administration of assessments, 105,
116-117, 250-251, 260, 291-295
appropriateness of assessments
for, 3, 40, 250-251, 256, 258, 259,
292, 293, 295, 298
assessment issues, 23, 110, 112,
208-209, 249-258, 350-351
cognitive assessments, 253-254
contextual issues, 247, 250, 257-258
domains, 251-255
examiner issues, 247, 250
F
Fagan Test of Infant Intelligence, 77
Family and Child Experiences Survey
(FACES), 50, 148, 285-286, 289
Family Day Care Environment Rating
Scale (FDCERS), 147, 167, 176
Five LA Universal Preschool Child
Outcomes Study, 292
Flanker Task, 95
Formal assessment, 71, 106, 117, 119,
137, 236-237, 272, 370, 371, 425
G
Galileo System for the Electronic
Management of Learning, 121,
123, 128, 129, 132, 135, 137, 138,
139
Games as Measurement for Early
Self-Control (GAMES), 120, 121,
123, 130
General knowledge, 58, 87, 107
instruments, 133-135
Generalizability theory, 200
Genetic/metabolic screening, 64-65
Global functioning, 17
Goal 1 Early Childhood Assessments
Resource Group, 38
Goals 2000, 48
Good Start, Grow Smart initiative, 47,
49, 52-53, 348, 437-438, 446
Government Performance and Results
Act, 1, 19
Growth Charts, 120
Guidelines. See also Standards
developmental outcome measures,
5-6, 348-349
domains, 5, 348-349
government responsibility, 372-373
health care providers role, 369
implementing guidance, 369-374
instrument selection and
implementation, 6-8, 352-354
of professional organizations,
37-39
program administrators role,
371-372
purposeful assessments, 5, 345-346
rationales for, 342-345, 346-348,
349-351, 354-356
researchers role, 374
H
Head Start, 2, 18, 45, 156, 302
approaches to learning in, 99, 100
assessment practices, 52, 53, 54,
110, 327, 328, 430
Child Outcomes Framework, 46,
49-52, 97, 184, 326, 327, 445
Family and Child Experiences
Survey (FACES), 50, 148, 285-286, 289
Impact Study, 112, 148, 289
learning standards, 88
National Reporting System, 20, 23,
47, 49, 53-55, 201, 273, 284, 285,
287, 289, 291, 293, 294, 295, 296,
297, 327, 430
Office of Planning, Research and
Evaluation, 23
performance measures, 51, 322,
430
Pyramid of Services, 49, 50, 51
reauthorization, 55
State Collaboration Offices, 438
University Partnership
Measurement Development
Grants Program, 105
Head Start Act, 49
Head to Toe Task, 95
Health care providers
assessment of infants and toddlers,
64
implementing guidance, 369
Hearing
impairment, and assessment
validity, 274, 296
screening, 29, 30, 66, 255, 262, 263
n.2, 363
High/Scope Child Observation
Record (COR), 33, 121, 124, 129,
133, 135, 138, 139, 172, 333, 335
High/Scope Perry Preschool Project,
111
High-stakes assessment, 27. See also
Accountability
defined, 2-3 n.1, 425
guidelines for using, 7, 10, 296,
353, 355, 358, 373
reliability and validity, 283
systemic approach, 337
unintended or inappropriate uses
of data, 284, 286, 337, 355-356,
358, 373
unintended or undesirable
consequences, 195
Home environments
and academic and social outcomes,
155
assessing, 149, 150-155, 167, 168,
169, 172, 173
basic needs and safety monitoring
provided, 151, 152
cognitive stimulation, 146, 151-152,
153-154
primary caregiver-child
interactions, 152-153
Home Observation for Measurement
of the Environment (HOME),
154-155, 174
Home visiting programs, 18, 63, 111-112, 145-146, 149, 154
I
Implementing assessments. See
also Administration of
assessments
cost analysis, 97, 297-298
determining and communicating
purpose, 281, 282-284, 291, 292-293, 296-297, 341-342
following up on administration,
33, 296-298
guidelines, 6-8, 352-354
parental consent, 284-285
preparing for administration,
282-286
protecting data, 286
rationale for guidelines, 349-351
standardization in, 283
K
Kaufman Assessment Battery for
Children (K-ABC), 113, 130,
131, 137, 142, 242, 245
Knowledge. See General knowledge
L
Labeling vulnerable children, 46
Language and literacy, 16-17, 30. See
also Phonological awareness;
Reading; Vocabulary
accountability assessments, 102
associations with important
outcomes, 66, 103-104
cognitive skills and, 110
constructs, 58
delays/disorders, 101, 102, 106
diagnostic testing, 101, 102, 106
discourse skills, 101, 102, 103
domain defined, 79, 100-103,
139-144
early learning guidelines, 49
English language learners, 104-105, 172-173, 248-249, 251-252,
291-292
instructional and intervention
planning, 32, 104
instruments/tools for assessment,
59, 79, 101, 102, 106, 139-144,
162, 165-166, 168-169, 172-173,
174-177
learning behaviors and, 98
length of assessment, 289-290
malleability, 104
measures of, 102, 105, 106
minority children, 237, 242
quality of learning environment,
17, 104, 148, 154, 155, 157, 158,
161, 162, 164, 165-166, 167, 168,
169, 170, 174-177
receptive language, 66, 101
research-related assessment, 101,
106
standards of learning, 52-53, 89
testing all children, 104-106, 242
training examiners, 102
transfer theory, 248-249
validity of scores, 101
Language Assessment Scales (LAS),
252
Language minority. See English
language learners
Large-scale assessments, 40, 254, 259-260, 266-267, 285
Lead screening, 29, 30, 68
Learning disabilities, 34, 255, 263 n.2
Learning standards. See Approaches
to learning; Early childhood
education standards; Standards
Lexington Developmental Scales, 77
Limited English proficient. See
English language learners
Literacy. See Language and literacy
Literacy Activities Rating Scale, 166
Literacy Environment Checklist, 166
M
MacArthur-Bates Communicative
Development Inventories, 40,
79, 101, 105, 139, 142
Mathematica Policy Research, 201,
202, 203, 204, 283, 284, 285, 286,
287, 293, 295, 450-451
Mathematics
and academic achievement, 116
algebraic concepts, 115-116, 118
developmentally appropriate, 171
domain defined, 107, 114-116,
136-138
early learning standards, 49, 116
geometry, 114-115, 117, 118
importance, 116
instruments for assessment, 118,
136-138, 170-171, 175, 176
language-oriented problems, 154
learning-related behaviors and, 98,
99
mathematical reasoning, 116,
170-171
measurement skills, 114, 115, 117,
118, 120, 121, 123, 130, 170-171
measures of, 110, 117-118
number sense, 114, 117, 118, 165, 170
quality of learning environment,
157, 158, 161, 164, 170-171, 175,
176
testing all children, 116-117
U.S. students performance, 116
vocabulary and, 58
McCarthy Scales of Children's Abilities,
78
N
National Assessment of Educational
Progress, 337
National Association for the
Education of Young Children
(NAEYC), 33, 38, 39, 45, 161,
162, 258, 268, 334
National Association of Early
Childhood Specialists in State
Departments of Education, 38,
268
National Association of School
Psychologists, 250, 268
National Association of Test Directors,
246
National Center for Education
Statistics, 36
National Child Care Information
Center, 107
National Children's Study, 451
National Early Childhood
Accountability Task Force, 24,
39, 302, 322, 324
National Early Childhood Technical
Assistance Center, 71, 452
National Early Intervention
Longitudinal Study, 266
National Education Goals Panel, 4, 38,
48, 50, 58, 86, 97, 282, 347
National Head Start Association, 23,
55
National Institute for Early Education
Research (NIEER), 71, 439,
452-453
National Institute of Child Health and
Human Development, 110, 152,
153, 162, 170
National Longitudinal Survey of
Youth-Child Supplement, 287
National Registry Alliance, 318
National Research Council, 2, 20, 21,
24, 48-49, 431
Naturalistic assessment. See Authentic
assessment
NCHS/NLSY Questionnaire, 77
Nebraska, assessment system, 332-335
Neonatal Behavioral Assessment
Scale, 68, 69, 70
NEPSY, 120, 128, 129, 130, 139
New Jersey, Abbott Preschool
Program, 335-336, 351, 354
Newborn Individualized
Development Care and
Assessment Program, 70
Newborns
developmental assessment, 68-70
hearing screening, 66
No Child Left Behind Act (NCLB), 1,
16, 19, 34, 35, 302, 307-308, 314,
315
Norm-referenced tests, 40, 50 n.1, 112,
197, 237-238, 254, 259-260, 264-265, 270, 271-272, 273, 279, 350,
423, 426, 427
Normative development. See Threats
to normative development
Nursing Child Assessment Satellite
Training, 84
Nutritional deficiency, 67
O
Obesity, 18, 88, 120
Observation Measure of Language
and Literacy Instruction
(OMLIT), 168-169, 176
Observational measures
for accountability, 147-148, 149,
167, 201-202, 203, 204
classroom environments, 146, 154,
156-157, 158-159, 165-166, 168-169, 173, 175, 176, 334
of environmental quality, 63,
146-150
home environment, 152-153, 154-155, 174
instruments/tools, 120-144, 157-173, 201, 297, 351
language and literacy instruction,
168-169, 176
length of assessment, 150, 160, 166,
169
and professional development,
146-147, 149
purposes, 146-150, 201-202,
203-204
reliability, 149-150, 157, 203, 204,
268, 283, 334-335
research needs, 204, 364
selecting, 146
for special needs children, 274
strengths and weaknesses, 203-205
training assessors, 157, 203, 204
validity, 150, 157, 164-165
Observation Record of the Caregiving
Environment (ORCE), 169-170,
176
Office of Civil Rights, 251
Office of Special Education Programs,
328, 333, 348
Oral Language Development Scale,
292
Otoacoustic emissions, 66
Outcome measures. See
Developmental outcome
measures
P
Parent-child interaction, 104, 151, 155,
174
Parental/family involvement, 38, 94,
159, 171, 172, 177, 251, 260, 265,
268-269, 287
Parenting skills, 149
Parents' Evaluation of Developmental
Status (PEDS), 77, 78
Partners for Inclusion model, 147
Peabody Developmental Motor Scales,
80
Peabody Individual Achievement
Tests, 133, 136, 140, 242
Peabody Picture Vocabulary Test
(PPVT), 40, 79, 101, 139, 154,
166, 236, 242, 252, 291
Penn Interactive Peer Play, 242
Performance assessments, 11, 133,
213, 224-226, 238, 254-255,
264, 335-336, 359, 424, 426. See
also Authentic assessment;
Classroom environments;
individual instruments
Pervasive Developmental Disorders
Screening Test-II (PDDST-II), 82
Pew Foundation, 39, 302
Phenylketonuria screening, 29
Psychometric issues in assessment,
23, 119. See also Reliability
of assessments; Validity of
assessments
abbreviation or adaptation of tests,
40
bias testing, 235, 240, 243-244
cognitive skills, 107-108, 112, 113
direct tests, 370, 371, 372
guidelines, 6, 271, 350, 352, 370
information on instruments, 87,
449, 451, 452, 453
high-stakes vs. low-stakes
conditions, 195
measuring quantitative change,
224-225
precision, 263
research needs, 361, 364
special populations, 96, 112,
243-244
standards of evidence, 3, 225,
243-244
Purpose of assessments. See also
Accountability; Diagnostic
testing; Progress monitoring;
Program, performance
assessment; Screening
and appropriate use of
assessments, 27, 259, 283, 341-342, 344, 433
community-focused screening,
29-30
determining and communicating,
3, 282-284
diagnostic testing, 30
eligibility testing, 31
functional level, 2, 29-31
guidelines on, 5, 37-39, 345-346
importance of purposefulness, 2,
18, 313
individual-focused screening, 29
in infant-toddler period, 62, 74
intervention and instruction
planning, 2, 31-34, 39, 69, 70, 85,
201, 222-226, 259, 264-265, 283
rationale for guidelines, 342-345
readiness testing, 30-31
research-related, 2, 34, 37, 266-267
Q
Qualistar Early Learning Quality
Rating and Improvement
System, 173
Quality Interventions for Early Care
and Education (QUINCE), 147
Quality of assessments. See also Bias
in assessments; Reliability
of assessments; Validity of
assessments
measurement choices and, 200-205
Quality of environment. See also
Center-based environments;
Classroom environments;
Home environments
appropriate assessment of, 320
and cognitive skills, 108, 148, 151,
153-154, 170
and developmental outcomes, 17-18, 64, 86, 95, 104, 108
English language learners, 172-173
importance, 145
instruments, 147, 152-177
observational measures, 63,
146-150
strategy for assessing programs,
173
systems perspective, 319-320
Quality of Instruction in Language
and Literacy, 169, 174
The Quick Test, 79
R
RAND Corporation, 111
Ratings of Parent-Child Interactions,
174
Read Aloud Profile, 169
S
Sampling error, 188, 207, 211, 212
Science, 107, 136-138, 157, 158
Screening
appropriateness of assessment for,
360
community-focused, 29-30
contexts for, 63-64
defined, 427
developmental, 68-72, 87, 262
difficulties with young children,
72-74
implementing, 283
individual-focused, 29
infants and toddlers, 62-64, 70-72
instruments, by domain, 77-84
limitations in effectiveness, 74-76
newborns, 66, 68-70
principles of good programs, 63
research needs, 363
special needs children, 262
for threats to normative
development, 64-68
universal, 33-34
uses of assessments, 62-63
Screening Tool for Autism in Two-Year-Olds (STAT), 82
Selecting assessment tools, 4, 431
for accountability purposes, 40-41,
102, 201, 226-231
accuracy and quality issues, 2, 181,
210
committee approach, 22-25
guidelines on, 6-8, 37-39, 352-354
for local needs, 214-222
for multiple related entities,
222-226
in program evaluation context, 40-41, 226-231
rationale for guidelines, 349-351
Sequenced Inventory of
Communication Development,
79, 141
Shape Stroop measure, 113
Simon says test, 113
Slosson Intelligence Test, 77
Snack delay test, 113
Snapshot of Classroom Activities, 169
Social benchmarking, 36-37, 40
Social Communication Questionnaire
(SCQ), 82
Social Competence and Behavioral
Evaluation (SCBE), 126, 127
Social consequences of assessment
bias in assessment and, 195-196,
239-240
scenario, 227
Social Skills Rating Scale, 100, 122,
128, 129
Social studies, 50, 85, 107, 135, 440
Socioemotional development. See also
Approaches to learning
behavior problems, 54 n.4, 89, 91,
92-93, 94, 95, 99, 365
consensus, 90
constructs, 58, 95, 96, 113
domain defined, 89
early learning guidelines, 89
home environment and, 146, 148,
150-151
importance in practice and policy,
50-51, 89-90
infant assessment, 62, 69
instruments, 59, 71, 164, 166, 167,
170, 173-177, 362-363
and later development, 90-94
malleability, 94-95
measurement issues, 95, 242
measures of, 91, 95, 96-97
minority children, 242, 245
nutritional deficiency and, 67
quality of environment and, 164,
166, 167, 170, 174-177
reliability and validity of tests, 96-97, 194
research needs, 362-363
screening instruments, 81-82,
122-127
self-regulation, 70, 89, 90, 92, 93-94,
95, 96, 108, 123, 311, 365
social competence, 89, 91-92, 96,
108, 126, 127, 148, 164, 166, 167,
170, 194
testing all children, 96, 242, 255
Special education, 31, 153, 239, 252,
255, 256, 261, 262-263, 264, 267,
271, 325, 326, 327, 328, 330, 348,
367, 369
Special needs children
accommodations for, 8, 40, 250,
254, 259, 260, 267, 272, 273, 276-279, 295-296, 298, 330-331, 353,
367
accountability-related assessments,
267, 270, 272, 279
administration of assessments,
269-270, 272-273, 295-296
appropriateness of assessment for,
3, 4, 38, 40, 271, 280, 283
challenges in assessment, 270-279
construct-irrelevant skills, 274-275
developmental assessment, 76,
271, 369
diagnostic testing, 262-264
domain-based assessments,
274-276
eligibility determinations, 262-264,
271-272
functional outcomes approach,
275-276
inclusion in assessments, 36, 40,
266-267, 270, 273, 295-296, 320,
330-331
infants and toddlers, 261
instruments/tools, 273-274,
276-279
intervention or instruction
planning, 1, 264-265
labeling concerns, 46
large-scale assessments, 40, 266-267, 273, 279
outcome measures, 275-276
population characteristics, 261
principles of assessment, 267-270
progress monitoring, 263, 265-266,
267
purposes of assessment, 260,
262-267
reporting outcomes, 47
research needs, 367
research-related assessments, 266-267, 272
response to intervention approach,
263-264
screening, 262
T
Tandem mass spectrometry, 65
Teacher-child relationships, 91-92, 95,
147, 157, 160-161, 163, 164, 165,
171, 257, 287, 372
Teacher Rating Scale, 100
Temperament, screening instruments,
81, 83
Temperament and Atypical Behavior
Scale (TABS), 81
Test de Vocabulario en Imágenes
Peabody, 252
Test of Early Language Development
(TELD), 79, 102, 141, 143, 159
Test of Early Mathematics Ability
(TEMA), 136, 137
Test of Early Reading Ability (TERA),
140, 144
Test of Language Dominance (TOLD),
102, 139
Testing. See Diagnostic testing
Threats to normative development
genetic/metabolic screening, 64-65
iron deficiency screening, 67
lead screening, 68
newborn hearing screening, 66
vision screening, 66-67
Toddler Behavior Assessment
Questionnaire (Carey Scales),
83
Toddler-Parent Mealtime Behavior
Questionnaire, 120
Toddlers. See Infant and toddler
assessments
U
Universal design principles, 8, 33-34,
276-279, 353, 366, 374
Universal Nonverbal Intelligence Test
(UNIT), 253
University of Nebraska, 215
U.S. Department of Education, 23,
249, 267, 271, 333, 348
U.S. Department of Health and
Human Services, 52-53, 430, 431
Office of Head Start, 2, 20, 450
U.S. Government Accountability
Office, 23, 54, 55
U.S. Preventive Services Task Force,
66, 67
Use of assessments. See Appropriate
use of assessments; Purpose of
assessments
V
Validity argument, 187
Validity of assessments. See also Bias
in assessments
for accountability, 40, 54-55, 198
as argument, 186-191
consequence of use and, 194-196
consistency of assessment. See
Reliability of assessments
construct, 185-186, 193, 197, 236-237, 243, 250, 254, 255, 274-275
contemporary views of evidence,
192-196
content, 100, 184-185, 192, 193, 235-236, 244, 250, 254, 255
convergent/divergent evidence,
194
criterion model, 183, 211
Vocabulary, 16-17, 32, 40, 53, 58, 79,
98, 101, 102, 103, 106, 107, 116,
135, 139, 142, 146, 154, 158, 166,
215, 236, 242, 248, 252, 286-287,
291, 292-293
Vygotskian play-based preschool
curriculum, 94, 99
W
Wechsler Intelligence Scale for
Children, 113, 245, 253
Wechsler Preschool and Primary Scale
of Intelligence (WPPSI), 113,
130, 133, 141, 242, 245
Welfare reform policies, 153
Westat, 24, 53-54
Woodcock-Johnson III (WJ-III) Tests
of Cognitive Abilities, 113, 118,
121, 130, 131, 132, 133, 136, 137,
139, 140, 144, 242, 252
Woodcock-Johnson-Revised Tests of
Cognitive Ability (WJ-R COG),
253
Work Sampling for Head Start
(WSHS) measure, 119
Work Sampling System (WSS), 33,
119, 120, 122, 124, 135, 138, 139,
297
Y
Ypsilanti Carnegie Infant Education
project, 111