Early Childhood Assessment
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, N.W. Washington, DC 20001
NOTICE: The project that is the subject of this report was approved by the
Governing Board of the National Research Council, whose members are
drawn from the councils of the National Academy of Sciences, the National
Academy of Engineering, and the Institute of Medicine. The members of the
committee responsible for the report were chosen for their special competences and with regard for appropriate balance.
The study was supported by Award No. HHSP23320042509XI between
the National Academy of Sciences and the U.S. Department of Health and
Human Services. Any opinions, findings, conclusions, or recommendations
expressed in this publication are those of the author(s) and do not necessarily reflect the view of the organizations or agencies that provided support
for this project.
Library of Congress Cataloging-in-Publication Data
Early childhood assessment : why, what, and how / Committee on
Developmental Outcomes and Assessments for Young Children ; Catherine
E. Snow and Susan B. Van Hemel, editors.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-309-12465-2 (hardcover) ISBN 978-0-309-12466-9 (pdf) 1.
Children with social disabilities--Education (Preschool)--United States.
2. Child development--United States. 3. Competency-based education--
United States. I. Snow, Catherine E. II. Van Hemel, Susan B. III. Committee
on Developmental Outcomes and Assessments for Young Children.
LC4069.2.E37 2008
372.126--dc22
2008038565
Additional copies of this report are available from the National Academies
Press, 500 Fifth Street, N.W., Lockbox 285, Washington, DC 20055; (800)
624-6242 or (202) 334-3313 (in the Washington metropolitan area); Internet,
http://www.nap.edu.
Copyright 2008 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
Suggested citation: National Research Council. (2008). Early Childhood Assessment: Why, What, and How. Committee on Developmental Outcomes and
Assessments for Young Children, C.E. Snow and S.B. Van Hemel, Editors.
Board on Children, Youth, and Families, Board on Testing and Assessment,
Division of Behavioral and Social Sciences and Education. Washington, DC:
The National Academies Press.
Acknowledgments
the report review process, and Eugenia Grohman provided guidance during that process.
This report has been reviewed in draft form by individuals
chosen for their diverse perspectives and technical expertise,
in accordance with procedures approved by the Report Review
Committee of the NRC. The purpose of this independent review
is to provide candid and critical comments that will assist the
institution in making the published report as sound as possible
and to ensure that the report meets institutional standards for
objectivity, evidence, and responsiveness to the study charge. The
review comments and draft manuscript remain confidential to
protect the integrity of the deliberative process.
We thank the following individuals for their participation in
the review of this report: Stephen J. Bagnato, Early Childhood Partnerships, Children's Hospital of Pittsburgh; Virginia Buysse, Child
Development Institute, University of North Carolina at Chapel
Hill; Gayle Cunningham, Executive Director's Office, Jefferson
County Committee for Economic Opportunity, Birmingham,
AL; David Dickinson, Department of Teaching and Learning,
Vanderbilt University; Walter Gilliam, The Edward Zigler Center
in Child Development and Social Policy of the Yale Child Study
Center, Yale University School of Medicine; Robert L. Linn, Department of Education, University of Colorado; Joan Lombardi, The
Children's Project, Washington, DC; Helen Raikes, University of
Nebraska, Lincoln; David M. Thissen, Department of Psychology,
University of North Carolina; and Ross A. Thompson, Department
of Psychology, University of California, Davis.
Although the reviewers listed above have provided many
constructive comments and suggestions, they were not asked to
endorse the conclusions or recommendations, nor did they see the
final draft of the report before its release. The review of this report
was overseen by Aletha C. Huston, Priscilla Pond Flawn Regents
Professor of Child Development, University of Texas at Austin,
and Jack P. Shonkoff, Center on the Developing Child, Harvard
University, as review coordinator and monitor, respectively.
Appointed by the NRC, they were responsible for making sure
that an independent examination of this report was carried out in
accordance with institutional procedures and that all review comments were carefully considered.
Contents
Summary
Part I: Early Childhood Assessment
1 Introduction
2 Purposeful Assessment
3 Perspectives on Early Childhood Learning Standards and Assessment
Part II: Child-Level Outcomes and Measures
4 Screening Young Children
5 Assessing Learning and Development
10 Thinking Systematically
References
Appendixes
A Glossary of Terms Related to Early Childhood Assessment
Index
Summary
The assessment of young children's development and learning has recently taken on new importance. Private and
government organizations are developing programs to
enhance the school readiness of all young children, especially children from economically disadvantaged homes and communities
and children with special needs. These programs are designed to
enhance social, language, and academic skills through responsive
early care and education. In addition, they constitute a site where
children with developmental problems can be identified and
receive appropriate interventions.
Societal and government initiatives have also promoted
accountability for these educational programs, especially those
that are publicly funded. These initiatives focus on promoting
standards of learning and monitoring children's progress in meeting those standards. In this atmosphere, Congress has enacted
such laws as the Government Performance and Results Act and
the No Child Left Behind Act. School systems and government
agencies are asked to set goals, track progress, analyze strengths
and weaknesses in programs, and report on their achievements,
with consequences for unmet goals. Likewise, early childhood
education and intervention programs are increasingly being
asked to prove their worth.
C. Reporting: Maintenance of an integrated database of assessment instruments and results (with appropriate safeguards
of confidentiality) that is accessible to potential users, that
provides information about how the instruments and
scores relate to standards, and that can generate reports for
varied audiences and purposes.
D. Professional development: Ongoing opportunities provided
to those at all levels (policy makers, program directors,
assessment administrators, practitioners) to understand the
standards and the assessments and to learn to use the data
and data reports with integrity for their own purposes.
E. Opportunity to learn: Procedures to assess whether the
environments in which children are spending time offer
high-quality support for development and learning, as well
as safety, enjoyment, and affectively positive relationships,
and to direct support to those that fall short.
F. Inclusion: Methods and procedures for ensuring that all
children served by the program will be assessed fairly,
regardless of their language, culture, or disabilities, and
with tools that provide useful information for fostering
their development and learning.
G. Resources: The assurance that the financial resources
needed to ensure the development and implementation of
the system components will be available.
H. Monitoring and evaluation: Continuous monitoring of the
system itself to ensure that it is operating effectively and
that all elements are working together to serve the interests
of the children. This entire infrastructure must be in place to
create and sustain an assessment subsystem within a larger
system of early childhood care and education.
(S-2) A successful system of assessments must be coherent in a
variety of ways. It should be horizontally coherent, with the
curriculum, instruction, and assessment all aligned with
the early learning and development standards and with the
program standards, targeting the same goals for learning,
and working together to support children's developing
knowledge and skill across all domains. It should be vertically coherent, with a shared understanding at all levels of
the system of the goals for children's learning and development.
Part I
Early Childhood Assessment
1
Introduction
promote infant and child safety and physical health, but societal
attention to children's mental health is much less universal. Education policies, starting with the common school and continuing
through the No Child Left Behind Act of 2001, have been designed
to ensure adequate accomplishments in particular domains; reading and mathematics are almost always included, but science,
history, literature, art, music, and athletics receive more intermittent and contested support. American society has largely avoided
making policies related to positive ethics (how one should
act), consistent with the separation of church and state. The
criminal code can be seen as a set of ethical guidelines focusing
on the negative side (what one should not do), but here as well
the policies relevant to children typically exempt them from full
responsibility even for wrongful actions.
The largest body of child-oriented federal, state, and local
policies focuses on a subset of goals for child development: It is
fairly uncontroversial that society should legislate and appropriate funding to ensure safety and health and to promote academic
achievement. Much less attention has traditionally been devoted
to happiness; trustworthiness; friendship and social relationships;
membership in family, society, or nation; moral development; or
leading a productive life.
One might conceptualize the policies as a map that provides
a distorted representation of the underlying landscape, much
as the Mercator projection of the earth greatly overestimates
the areas of land masses at the poles. The policy projection of
child development has often shrunk the size of social, emotional,
and relational domains to focus on health and academics. This
perspective directly reflects (and may indeed result from) the
researchers' projection and the associated measurement projection. Somewhat more attention has been given by the field of
child development to language, literacy, and cognition than to
happiness, emotional health, friendship, or morality (although
some of these goals are beginning to attract research attention and
to be represented in states' early childhood standards), and the
tools available to measure development in that first set of domains
are more numerous and more precise.
Assessment strategies also traditionally have focused on
rather discrete aspects of a child's functioning, such as vocabulary
to influence the outcomes, for example by preventing malnutrition in pregnant women and infants, or increasing resources for
early childhood education, or promoting time for recess and
active play to reduce obesity. Social policy makers are committed
environmentalists when designing programs, but they too often
forget their environmentalist convictions when dictating ways of
assessing the outcomes of those programs.
Assessment of young children is crucial in meeting a variety
of purposes. It provides information with which caregivers and
teachers can better understand individual children's developmental
progress and status and how well they are learning, and
it can inform caregiving, instruction, and provision of needed
services. It helps early childhood program staff determine how
well they are meeting their objectives for the children they serve,
and it informs program design and implementation. It provides
some of the information needed for program accountability and
contributes to advancing knowledge of child development.
Furthermore, the tools available for assessing young children
and their environments have increased vastly in number and
variety in recent years. Advances in child development research
and demands from educators, evaluation researchers, and policy
makers have converged to provide a dizzying array of assessment
options, thus enhancing the urgency of providing some guidelines for deciding when and what to assess, choosing and using
assessment tools, and interpreting assessment data.
The assessment of young children's development and learning
has taken on new importance as investment in early childhood education rises. Private and government organizations are increasingly
implementing programs for young children, many of them targeted
toward those from disadvantaged homes and communities. These
programs attempt to improve children's chances for optimal development by compensating in various ways for perceived deficiencies. Some of the more intensive interventions include teaching
parenting skills through home visits, providing child care services
that nurture development, and offering such preschool programs
as Head Start and state prekindergarten (pre-K) programs.
At the same time, the last decade or so has seen societal
and government initiatives promoting accountability for such
programs, especially those that are publicly funded. In this
ble, identify opportunities to link measurement improvement strategies within diverse settings (such as educational, developmental,
and pediatric programs for young children) to avoid duplication
and to maximize collaboration and efficiencies.
The committee will provide recommendations to practitioners
and policy makers about criteria for the selection of appropriate
assessment tools for different purposes, as well as how to collect
and use contextual information to interpret assessment results
appropriately for young children. The committee will also develop
a research agenda to improve the quality and suitability of developmental assessment tools that can be used in a variety of early
childhood program and service environments.
information on the stakeholder forum held as part of the committee's information-gathering efforts. Appendix C has information
on the domains included in state pre-K learning standards, as well
as a description of recent state standards development. Appendix
D provides sources for detailed information on assessment instruments. Appendix E contains brief biographical sketches of the
committee members and staff.
2
Purposeful Assessment
designed for this purpose is implemented in the base tier to identify children who are not meeting established educational benchmarks in a high-quality instructional program. Those identified
as not making progress are provided with additional empirically
supported interventions or instructional strategies and their progress is monitored on a regular basis to determine the effectiveness
of the intervention, with additional intervention provided to those
who continue to show limited progress.
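To make the tiered logic concrete, the sketch below (ours, not drawn from the sources cited; the score scale, cutoffs, and tier labels are hypothetical) shows how a benchmark score and a short series of progress-monitoring scores might route a child through the tiers.

```python
def assign_tier(benchmark_score, progress_scores,
                benchmark_cutoff=40, growth_cutoff=2.0):
    """Hypothetical tiered (RTI-style) routing rule.

    Children at or above the benchmark stay in universal instruction (tier 1);
    those below it receive a targeted intervention with regular progress
    monitoring (tier 2); those whose monitored growth remains limited move to
    a more intensive intervention (tier 3). All numbers are illustrative.
    """
    if benchmark_score >= benchmark_cutoff:
        return "tier 1: universal high-quality instruction"
    if len(progress_scores) < 2:
        return "tier 2: targeted intervention, begin regular progress monitoring"
    growth_per_check = (progress_scores[-1] - progress_scores[0]) / (len(progress_scores) - 1)
    if growth_per_check >= growth_cutoff:
        return "tier 2: continue intervention, progress is adequate"
    return "tier 3: intensify intervention"

# Example: a child below benchmark whose monitored growth stays limited.
print(assign_tier(benchmark_score=32, progress_scores=[30, 31, 31]))
```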
Although there is considerable interest in applying tiered
models to preschool, how the principles would be applied has
not been thoroughly developed, and there has been very little
research to date on the application to early education (Coleman,
Buysse, and Neitzel, 2006; VanDerHeyden and Snyder, 2006).
An example of an RTI application for children under age 5 is
a model called Recognition and Response; it is under development as an approach to early identification and intervention for
children with learning disabilities (Coleman, 2006). The developmental and experiential variation in young children presents
challenges for the strict application of RTI's prescribed universal
screening, identification of low-performing children, and tiered
intervention. One concern is whether the early and frequent use
of assessment to single some children out as requiring additional
assistance is necessary, or even potentially harmful, before the
children have had the opportunity to benefit from a high-quality
preschool experience. Much more research is needed on how to
apply the assessment and intervention practices of multitiered
models in a way that is consistent with what is known about
young childrens development.
EVALUATING THE PERFORMANCE OF A PROGRAM OR SOCIETY
Perhaps the most talked-about of the many purposes for
which assessment can be used, especially since the passage
of the No Child Left Behind Act (NCLB) in 2001, is accountability. It is important to note that the term accountability
encompasses a number of distinct purposes, which we attempt
to distinguish here.
Program Effectiveness
If a government or an agency is investing money in a program,
it makes sense to ask the questions "Is this program effective? Is
it meeting our goals?" Assessment designed to evaluate program
effectiveness against a set of externally defined goals is one form
of accountability assessment. This may look a lot like progress
monitoring assessment, and indeed the selection of tools for the
two purposes might be identical. But evaluation differs from
progress monitoring in two key ways. First, progress monitoring
assessment is meant to be useful to those inside the program who
are responsible for day-to-day decisions about curriculum and
pedagogy, whereas evaluation of program effectiveness is useful
to those making decisions about funding, extending, or terminating programs. Second, progress monitoring requires data on all
relevant domains from all children in a program, whereas in many
cases it is possible to evaluate a program's effectiveness by sampling children rather than testing them all, or by using a matrix
design to sample different abilities in different children.
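As a rough illustration of the matrix-design idea (our sketch, not a procedure described in the report; the child identifiers, item-block labels, and sampling fraction are hypothetical), a program evaluation can draw a sample of children and give each sampled child only one block of items, so that every block is covered across the program without testing every child on everything.

```python
import random

def matrix_sample(children, item_blocks, sample_fraction=0.5, seed=0):
    """Sample children and assign each sampled child a single item block,
    rotating through the blocks so each block is administered to roughly
    equal numbers of children across the program."""
    rng = random.Random(seed)
    n_sampled = max(1, int(len(children) * sample_fraction))
    sampled = rng.sample(children, n_sampled)
    return {child: item_blocks[i % len(item_blocks)]
            for i, child in enumerate(sampled)}

# Example: six children, three item blocks, half of the children tested.
print(matrix_sample(["c1", "c2", "c3", "c4", "c5", "c6"],
                    ["literacy", "mathematics", "socioemotional"]))
```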
Using assessments for accountability purposes may seem
simple, but in fact interpreting test data as reflecting the value of
a program can be risky. There are many challenges to the conclusion that a program in which children perform poorly at the end
of the year should be terminated. What if they were extremely
low scorers at program entry and made notable progress, just
not enough to reach the norm or criterion? What if the program
is basically sound but disruptions to financing or staffing led to
poor implementation in this particular year? What if the program is potentially good but investments in needed professional
development or curricular materials were denied? What if the
alternative program in which the children would end up if this
one is terminated is even worse? Challenges like this have been
widely discussed in the context of accountability consequences for
school-age children under NCLB, and they are equally applicable
to programs for preschoolers.
In other words, establishment of program-level accountability
is a legitimate and important purpose for assessment, but not one
that can be sensibly met by sole reliance on child-focused assessment data. Accountability is part of a larger system and cannot be
derived from outcome data alone, or even from pre- and posttest
data, on a set of child assessments. We say more about the importance of the larger system in Chapter 10.
Program Impacts
A more specific purpose for assessing children participating
in a particular program is to evaluate the impact of that program,
ideally in comparison to another well-defined treatment (which
might be no program at all), and ideally in the context of random
assignment of individuals or classrooms to the two conditions.
Under these circumstances, it is possible to evaluate the impact of
the program on childrens performance on the assessments used.
Under these (relatively rarely encountered) ideal experimental
circumstances, it is appropriate to sample children in programs
rather than testing them all, and it is possible, if one is willing to
limit claims about program effectiveness to subsets of children, to
exclude groups of children (English language learners, for example, or children with disabilities) from the assessment regimen.
Social Benchmarking
Another purpose for early childhood assessment that relates
to accountability at a societal level is social benchmarking:
answering questions like "Are 3-year-olds healthier than they
were 20 years ago?" or "How do American 4-year-olds perform
compared with Australian 4-year-olds on emergent literacy
tasks?" Social benchmarking efforts include projects like those
launched by the National Center for Education Statistics (the
Birth Cohort Study, the Early Childhood Longitudinal Study-Kindergarten) and individual states (California's Desired Results
Developmental Profile).
These efforts provide profiles of expectable development
that can be used for comparisons with smaller groups in particular
studies and also as a baseline for comparison with data collected
at a later time. Furthermore, these studies provide policy makers
and the public with a view of what the society is doing well and
not so well at. The movement to develop early learning guidelines
can be seen as a contribution to the social benchmarking effort;
BOX 2-1
Guidelines or Documents Promulgated by
Major Early Childhood Professional Groups
Principles and Recommendations for Early Childhood Assessments (Shepard, Kagan, and Wurtz, 1998). Goal 1 Early Childhood Assessments Resource Group document.
Early Childhood Curriculum, Assessment, and Program Evaluation (and an accompanying extension for English language
learners), a position statement promulgated by the National Association for the Education of Young Children and the National
Association of Early Childhood Specialists in State Departments
of Education (2003).
Promoting Positive Outcomes for Children with Disabilities:
Recommendations for Curriculum, Assessment, and Program
Evaluation from the Division for Early Childhood (2007).
Council of Chief State School Officers' set of documents on
Building an Assessment System to Support Successful Early
Learners (undated, but circa 2003a, 2003b).
3
Perspectives on Early Childhood
Learning Standards and Assessment
BOX 3-1
The Development of Major Early Learning Standards
1989 Goal 1, "All children ready to learn," articulated by the
nation's governors at education summit
1995 Publication of Reconsidering Children's Early Development
and Learning (Kagan, Moore, and Bredekamp, 1995)
1998 Publication of Preventing Reading Difficulties (National
Research Council, 1998)
The risk of children being unfairly denied program participation based on what they do or do not know.
The risk that responsibility for meeting the standards will
shift from the adults charged with providing high-quality
learning opportunities to very young children.
Whether high-quality teaching will be undermined by
the pressure to meet standards, causing the curriculum to
become rigid and focused on test content and the erosion of
a child-centered approach to curriculum development and
instructional practices.
Whether switching to child outcome standards as the sole
criterion for determining the effectiveness of programs or
personnel is unfair. Early childhood services continue to be
underresourced, and poor child outcomes may reflect the
lack of resources.
Misunderstanding of how to achieve standards frequently
appears to engender more teacher-centered, didactic
practices.
Although these concerns cannot be dismissed, it is important
to note that early learning standards were developed as a tool to
improve program quality for all children. Their rapid development has resulted from a combination of policy shifts and an
emerging practitioner consensus, influenced by a number of
factors:
The standards-setting activity in K-12 education, which
gained momentum after the 1990 establishment of the
National Education Goals Panel and the subsequent passage of Goals 2000 by Congress in 1994. This act and its
accompanying funding led states to develop or refine K-12
standards in at least the areas of English language arts,
mathematics, science, and history.
Greater understanding about the capabilities of young
children. Earlier work of the National Research Council
(NRC) has played a key role in informing and developing
that understanding and thereby supporting the development of early learning standards. The most influential NRC
document influencing the development of standards for
childhood instruments, along with teacher reports, parent reports, and observation, to assess numerous cognitive and socioemotional outcomes. It follows
children from their Head Start experiences through kindergarten and through
the 1997 cohort into first grade (U.S. Department of Health and Human Services,
Administration for Children and Families, 2006a, available: http://www.acf.hhs.
gov/programs/opre/hs/faces/index.html).
From Thomas Schultz via personal communication with committee member
Harriet Egertson.
[Figure: pyramid relating Head Start program processes to the outcome of the child's social competence; objectives shown include enhancing children's growth and development, strengthening families as the primary nurturers of their children, and ensuring well-managed programs that involve parents in decision making.]
domains because of the difficulty in finding high-quality instruments that would meet NRS requirements. Most of the items in
the NRS battery were taken from existing assessment instruments
that had been used in Head Start research or in local Head Start
assessment programs.
A Spanish-language version of the assessment was developed
as well. In the first year of implementation, it was administered
after the English version to children whose home language was
Spanish and who passed a Spanish language screener. Thus all
children were assessed in English or Spanish only if they had
passed the screener for that language.
The NRS aroused much concern on the part of some early
childhood experts. More than 200 educators, researchers, and
practitioners signed letters to Congress in early 2003 laying out
their concerns about the NRS, along with some suggested ways
to improve it. The letters ended with the following words: "If we
can move ahead on adopting a matrix sampling design for the
proposed Reporting System; if we can ensure that the System is
composed of subtests that are reliable, valid, and fair; and if we
can have adequate time to learn how to mount this historically
largest-ever effort to test young children without creating chaos
and confusion, then we will have created a system that has a
chance of assisting young, at-risk children" (Meisels et al., 2003).
In May 2005, the Government Accountability Office (GAO)
released a report on the first year of implementation of the NRS
(U.S. Government Accountability Office, 2005). In it, the GAO
identified several weaknesses in the system and its implementation, noting: "Currently, results from the first year of the NRS are
of limited value for accountability purposes because the Head
Start Bureau has not shown that the NRS meets professional
standards for such uses, namely that (1) the NRS provides reliable
Among the other criticisms of the NRS was dissatisfaction with the omission of
any measure of socioemotional development. A socioemotional component, based
on teacher observations over a 1-month period, was added to the NRS as of the fall
2006 administration. For that administration, teachers were asked to assess only
children who had been in the program for at least 4 weeks. It included items asking
the teacher to report on approaches to learning, cooperative classroom behavior,
relations with other children, and behavior problems (U.S. Department of Health
and Human Services, Administration for Children and Families, 2006b).
Part II
Child-Level Outcomes and Measures
4
Screening Young Children
This list does not include many purposes typical of assessment for older preschoolers, such as evaluation of intervention
strategies, prediction of future competencies, or assessment of
skills that are fundamental for success in a classroom environment, such as ease of gaining the child's attention and ability to
sustain it. The focus is on the identification of possible developmental problems at an early age, in part, we argue, because of the
relatively undifferentiated nature of developmental organization
in early infancy and the associated difficulty of making precise
predictions to later abilities. We note also that in spite of wide
agreement that screening and monitoring of the development of
these youngest children is important, pediatricians still do not
fully agree on the most important domains to measure or the best
measures to use (McCormick, 2008).
Most of the assessment conducted in this age range is actually
screening to identify potential problems, to be followed by more
definitive diagnostic assessment. The principles of a good screening program are thus relevant (Wilson and Jungner, 1968):
a valid and reliable measure,
acceptability to the population being screened and their
parents or guardians,
facilities to conduct the screening,
facilities to ensure follow-up and treatment, and
cost-effectiveness.
Contexts and Assessment
As noted, assessment of infants and toddlers often takes
place in pediatric settings, with screening as a primary goal.
Screening may also take place in early childhood education and
intervention settings, such as Early Head Start and home visiting
programs. Interpreting results from such assessments must take
into account the effects of a wide variety of inputs into the child's
development, for example, safety of the residence, care practices
of parents and other caregivers, exposure to substances that might
hamper normal development, and consistency of care settings, as
well as information about the infant's state of health and alertness
during the assessment.
and the relevant part of the brain is intact, indicated by pupillary responses to light; and that the eyes move in a coordinated
fashion. Between ages 2 and 4 years, it becomes possible to test for
visual acuity, that is, the size of objects that can be seen at certain
distances (American Academy of Pediatrics, 1996). The goal of
these procedures is to reduce poor vision or risk factors that lead
to abnormal visual development. Recent evidence supports the
effectiveness of intensive screening for the reduction of amblyopia
and improved visual acuity. The U.S. Preventive Services Task
Force concluded that the routine screening currently done has
not been shown to be effective, although the potential benefit
outweighed the minimal risk of the screening (U.S. Preventive
Services Task Force, 2004).
Iron Deficiency Screening
A lengthy literature addresses the effect of nutritional deficiency on child development (Grantham-McGregor, 1984). Since
poor nutrition and micronutrient deficiency are more likely in
the context of poverty and ill health, disentangling the effect of
specific nutritional deficiencies on development is sometimes
difficult. However, evidence from developing and industrialized
countries supports a relation between iron deficiency and poorer
socioemotional, sensorimotor, and cognitive development and
school performance (Lozoff et al., 2000, 2003). Recommendations
for screening for iron deficiency are consistent with this body
of research (American Academy of Pediatrics, 2003). However,
substantial questions about the specificity of using blood hemoglobin levels to assess the presence of iron deficiency led the U.S.
Preventive Services Task Force to conclude that the evidence is
insufficient to recommend for or against such screening (U.S.
Preventive Services Task Force, 2006).
Acuity tests, such as Teller Acuity Cards, are available for infants and toddlers,
and they can be useful for at-risk (e.g., premature) infants, but they are not suitable
for general screening and good predictive validity has not been demonstrated
(National Research Council, 2002).
Lead Screening
Lead absorbed from the environment has long been recognized as a neurotoxicant, and major efforts have been undertaken
to reduce environmental lead (Grandjean and Landrigan, 2006).
The success of these efforts has led to a sharp decline in the
blood lead levels of children in America: as of 2006, only slightly
more than 1 percent had blood lead levels above the cutoff of
10 micrograms/deciliter (Centers for Disease Control and Prevention, 2007). Nonetheless, certain populations, such as minority
children and those living in older housing stock, remain at risk,
and thus a targeted screening strategy has been recommended
by the American Academy of Pediatrics (2005). Several studies
have reported that children with low-level prenatal lead exposure
(<10 µg/dL) have intellectual deficits as measured by standard
IQ tests (Banks, Ferretti, and Shucard, 1997; Lanphear et al., 2000,
2002; Needleman and Gatsonis, 1990), reflected in poorer performance on specific items on the Neonatal Behavioral Assessment
Scale (Brazelton and Nugent, 1995; Emory et al., 1999) and on
infant intelligence at age 7 months (Emory et al., 2003; Shepherd
and Fagan, 1981). The study by Emory et al. (2003) characterized
the effects found as lowered optimal performance rather than an
increase in impaired performance across the board.
DEVELOPMENTAL ASSESSMENT
Newborns
Developmental assessments provide useful information about
overall physiological status and risk. Neurodevelopmental examinations initially focused on neurological reflexes and postural
reactions that can be elicited in the newborn, which emerge and
disappear within fairly specific time periods, as a means of assessing central nervous system integrity, especially early signs of
cerebral palsy (Zafeiriou, 2003). Primitive reflexes are mediated
by the brainstem and consist of complex, automatic movement
patterns that emerge from 25 weeks of gestation and disappear
by age 6 months. Postural reactions are infant responses to being
held in different standardized positions and probably reflect more
Using the model of the NBAS, Als et al. (2005) have developed the Assessment of Preterm Infants' Behavior (APIB). The
scale assesses what are theorized to be five interacting systems
of functioning: autonomic, motor, state organization, attention,
and self-regulation. Like the NBAS, the APIB forms the basis
of an intervention, the Newborn Individualized Development
Care and Assessment Program, intended to improve the developmental
outcomes of preterm infants by teaching caregivers in the
neonatal intensive care unit how to interact more sensitively with
the infant. If the intervention improves performance on the APIB
and leads to better long-term outcomes in early childhood, then
one might argue that the APIB has predictive validity, and Als et
al. (2003) have argued for such an effect. However, a recent meta-analysis of individualized developmental interventions in the
neonatal intensive care unit suggests that the data do not support
this argument (Jacobs, Sokol, and Ohlsson, 2002).
Infants and Toddlers
Developmental assessment of infants and toddlers occurs
routinely in medical care settings and is carried out by a variety of
people; some children receive this service through infant-toddler
care/education/intervention programs. In view of the time pressures in primary care settings, the approach has been to rely on
brief screening instruments, with more complete assessments of
children who do not seem to be developing at the usual pace.
Since most young children are monitored by pediatricians or
other primary medical care personnel, it seems reasonable to use
the clinical guidelines from the American Academy of Pediatrics
(American Academy of Pediatrics, Committee on Children with
Disabilities, 2001; American Academy of Pediatrics, Council on
Children With Disabilities, 2006) as a template for this process.
The first step is developmental surveillance performed as part
of the regular well-child visit. Surveillance is considered to include
"eliciting and attending to the parents' concerns, documenting and maintaining a developmental history, making accurate
observations of the child, identifying risk and protective factors,
and . . . documenting the process and findings" (American Academy of Pediatrics, Council on Children with Disabilities, 2006). If
with infants and young children need training and support in the
appropriate procedures.
Finally, the effectiveness of screening may be further limited
by the fact that the system of access to screening settings and of
response to abnormalities found may be as diffuse and unstandardized as the assessment process itself. Unlike the classroom
setting, in which more standardized and local approaches to
developmental and learning problems may be taken, response
to abnormalities of development in infants, toddlers, and older
preschoolers not already enrolled in intervention programs typically requires referral to other services for diagnosis and management. In part, this variability in response reflects the diversity of
state and other policies regarding young children. This means
that some infants and toddlers are not screened, and that those
who are identified as requiring diagnostic assessments and other
services may not receive them. As noted above, much of the early
screening is accomplished in health care settings, and access to
care is heavily dependent on having health insurance. Children
without health insurance are more likely to have low family
income, to come from minority families, to use medical care less
intensely, and to be referred to other settings for services (Simpson
et al., 2005). Even with insurance, access to some services is more
difficult than others. Although the Individuals with Disabilities
Education Act does mandate testing for all children suspected
of developmental disability or delay and requires the provision
of appropriate services to children so identified, there remains
considerable local variation in the capacity to respond to this
mandate. A recent chapter by Gilliam, Meisels, and Mayes (2005)
proposes a system of screening and surveillance that uses many
available community resources to provide a more integrated
screening, referral, and assessment system.
Finally, even if the current assessment of infant and toddler
development were more universally effective, fitting well into
a larger system and building continuity with the assessment of
slightly older preschoolers would improve its usefulness. The
focus of infant-toddler assessment procedures is primarily on
monitoring development and risks to development for purposes
of ensuring adequate progress and to rule out health-related challenges to normal development. For example, the vision examinations conducted by health care providers may focus less on the
Appendix Tables: Summary of Assessment Instruments for Children 0-3 Years of Age

These tables classify each instrument by instrument type (screening or diagnostic) and by data-gathering method (caregiver report, observation, or both); some instruments require a trained interviewer/observer. The instruments listed include the Denver Prescreening Developmental Questionnaire; Parents' Evaluation of Developmental Status (PEDS); the NCHS/NLSY Questionnaire (U.S. Department of Health and Human Services, National Center for Health Statistics, 1981); the Developmental Profile-II; the Preschool Screening System; the Denver Developmental Screening Test II; the Brigance Screens; the MacArthur-Bates Communicative Development Inventories; the Test of Early Language Development; the Sequenced Inventory of Communication Development; the Motor Quotient (Capute and Shapiro, 1985); the Bayley Scales of Infant Development, Third ed.; screens for specific developmental disabilities, such as the Social Communication Questionnaire (SCQ) (Rutter, Bailey, and Lord, 2003) and the Pervasive Developmental Disorders Screening Test-II (PDDST-II) (Siegel, 2004); the Pictorial Assessment of Temperament (PAT) (Clarke-Stewart et al., 2000); the Infant Characteristics Questionnaire (Bates, Freeland, and Lounsbury, 1979); and the Preschool Assessment of Attachment (Teti and Gelfand, 1997).
5
Assessing Learning
and Development
Assessments for purposes other than screening and diagnosis have become more and more common for young
children. Some of these assessments are conducted to
answer questions about the child (e.g., monitoring progress during instruction or intervention). Other assessments are conducted
to provide information about classrooms and programs (e.g., to
evaluate a specific curriculum or type of program) or society in
general (e.g., to describe the school readiness of children entering
kindergarten). Many of the assessments widely in use in educational settings are designed primarily to inform instruction by
helping classroom personnel specify how children are learning
and developing and where they could usefully adapt and adjust
their instructional approaches. Thus, the goals of much testing in
this later period are more closely related to educational than to
medical or public health issues, and the nature of the assessments
as well as the domains assessed are modified accordingly.
The greater role of education in these assessments means that
the settings for assessing children may be different, and the range
of domains toward which assessments are directed is expanded.
Assessment that is educationally oriented often takes school-age
achievement as the ultimate target and thus is organized into
domains that are highly relevant to K-12 schooling (e.g., literacy,
science, social studies). Understanding the developmentally rel85
1995) and in the analysis of state learning standards by Scott-Little, Kagan, and Frelow (2006). For each of the domains, we
first discuss how it is defined and how its internal structure has
been delineated. We then present evidence for the importance
of the domain: that it is widely mentioned in child achievement
standards, that it is a focus of developmental theory and research,
or that it relates to other outcomes important in the short or long
term. We also consider evidence that the developmental domain
is malleable, that is, amenable to change through interventions,
since the capacity to change is another source of evidence for the
importance of assessing it. We then describe some of the assessment approaches and tools that have been widely used to reflect
status or progress in that domain. Appendix Tables 5-1 through
5-7 provide a summary listing of the major instruments discussed
here, with a table for each domain. For each table, the first column
indicates the subscale or specific domain assessed, and the second
through fifth columns list the instruments that offer the relevant
subscales, categorized by the measurement method(s) used by
each: direct assessment, questionnaire, observation, or interview.
Because many useful instruments do not quite fit into the domains
we discuss, we have also included a table for general knowledge
(sometimes categorized under cognitive skills), and have included
science in the table with mathematics.
For more detailed information on instruments, including
evaluative reviews, specific age range, time to administer, administrator qualifications required, as well as psychometric information, we have listed and described a variety of print and online
instrument compendia and reviews in Appendix D.
PHYSICAL WELL-BEING AND MOTOR DEVELOPMENT
Defining the Domain
This domain encompasses issues of health, intactness of sensory systems, growth, and fitness, as well as motor development.
Motor development has long been a topic of interest in pediatric
and developmental studies, and it also is one of the areas used in
screening children for possible developmental problems. The com-
Kagan, and Frelow, 2006). California's Preschool Learning Foundations in Social and Emotional Development for Ages 3 and 4
(http://www.cde.ca.gov/re/pn/fd/documents/preschoollf.pdf)
is an excellent example of the development of a consensus document regarding expectations for children's social and emotional
skills in the preschool years. Relying heavily on the research on
young children's social and emotional development, the document describes benchmarks for the behavior of 3- and 4-year-olds in central domains of social and emotional development. . . .
"In focusing on social and emotional foundations of school readiness, a central assumption, well supported by developmental
and educational research, is that school readiness consists of
social-emotional competencies as well as other cognitive competencies and approaches to learning required for school success"
(p. 1). The standards for social and emotional development in
California's early learning standards identify the dimensions
of self (self-awareness and self-regulation, social and emotional
understanding, empathy and caring, and initiative in learning),
social interaction (including interactions with familiar adults,
interaction with peers, group participation, and cooperation and
responsibility) and relationships (attachments to parents, close
relationships with teachers and caregivers, and friendships). The
perspective that social and emotional development and early
learning are closely linked is reflected in the inclusion of Initiative in Learning as a component of social and emotional development, involving the child's interest in activities in the classroom,
enjoyment of learning and exploring, and confidence in his or her
ability to make new discoveries.
Importance for Later Development
The social and emotional demands of formal schooling on
young children differ from those of early childhood settings,
and children's skills in this area at school entry are predictors
of how well they make the adjustment to the new setting and
progress academically (see Bierman and Erath, 2006; Campbell,
2006; Ladd, Herald, and Kochel, 2006; Mashburn and Pianta,
2006; Raver, 2002; Thompson and Raikes, 2007; Vandell, Nenide,
and Van Winkle, 2006). Early childhood care and educational
Buhs, 1999; Pianta and Steinberg, 1992; Silver et al., 2005). Closeness, conflict, and dependence have been identified as three features of teacher-child relationships that are important to children's
development (Mashburn and Pianta, 2006).
While relationships with teachers as well as peers during
the transition to formal schooling appear to be central to positive engagement in school and thereby achievement, positive
teacher and peer relations in turn appear to rest at least in part on
children's knowledge of emotions and their ability to regulate the
expression of their own emotions (Bierman et al., under review;
Denham, 2006; Vandell, Nenide, and Van Winkle, 2006).
Self-regulation: Recent research on self-regulation acknowledges that some aspects of it involve emotion (e.g., modulation in
the expression of negative emotions) and behavior (e.g., inhibition
of aggressive impulses), and other aspects focus more on attentional and cognitive skills (e.g., the ability to maintain a set of
instructions actively in working memory over time and despite
distractions, taking the perspective of another, switching attention
as task demands change) (Diamond et al., 2007; McClelland et al.,
2007; Raver, 2002, 2004).
Socioemotional development is of importance during the
early childhood period because it relates to children's capacities
to form relationships, both trusting relationships with adults and
friendships with peers, and these relationships in turn seem to be
related to the speed of learning in early care and educational settings. These markers of positive relations with peers and teachers
have implications for children's engagement and participation in
the classroom. Children learn to regulate the expression of emotion in a variety of ways, including turning to others with whom
they have secure relationships for comfort and support, using
external cues, and, increasingly with age, managing their own
states of arousal (Thompson and Lagattuta, 2006).
Behavior problems: Serious behavior problems are apparent
early in some children. Research summarized by Raver (2002)
indicates that children with early and serious problems of aggression who are rejected by peers are at elevated risk in terms of poor
academic achievement, grade retention, dropping out of school,
and eventually delinquency. Raver notes that children who are
disruptive tend to get less instruction and positive feedback from
confirming the relation of early social and emotional competencies, self-regulation, and absence of serious behavior problems to
early participation in learning activities and to academic achievement. While it is important to note that social and emotional
development predicts later academic outcomes, at the same time
we insist that children's social and emotional well-being and
competencies are worthy developmental goals in their own right,
independent of their relationship to academic outcomes.
Evidence of Malleability
According to a review by Raver (2002), there is substantial
evidence from experimental evaluations that it is possible to
improve young children's social and emotional development at
the point of school entry or earlier, helping them to develop and
stay on a positive course in their relationships with teachers and
peers and to engage positively in learning activities. While the
evidence summarized points to program effects across all the
levels of intensity and the setting of the interventions considered
(in the classroom, with parents, or both), findings are stronger
when interventions engage parents as well as teachers and are
more intensive. More recent reviews contribute to understanding
the complexity of this domain (Bierman and Erath, 2006; Fabes,
Gaertner, and Popp, 2006).
Several recent developments in intervention research on
young children's social and emotional development are noteworthy.
First, very recent work has focused explicitly on interventions targeting children's self-regulation skills. In recent work by
Diamond and colleagues (Diamond et al., 2007), the Tools of the
Mind curriculum, which embeds direct instruction in strengthening executive function in play activities and social interactions,
was experimentally evaluated in prekindergarten programs in
low-income neighborhoods. This intervention takes a Vygotskian
approach; that is, it encourages extended dramatic play, teaches
children to use self-regulatory private speech, and provides
external stimuli to support inhibition. Results showed significant improvements in direct assessments of children's executive
function. By the end of the school year, children in classrooms
implementing Tools of the Mind did not need help staying on task
or redirecting inappropriate behavior. This study provides important evidence that aspects of self-regulation are malleable.
Measurement Issues
An ongoing challenge in the research on social and emotional
development of young children is to forge agreement about specific constructs, measures, and the mapping of constructs to measures (Fabes, Gaertner, and Popp, 2006; Raver, 2002). The internal
complexity of the domain is reflected in the fact that different
measures parse it differently. The lack of agreement impedes the
capacity to look across studies at accumulating patterns of findings (Zaslow et al., 2006).
Another challenge is that some see measures of social and
emotional development as reflecting in part the early childhood environment and the teacher-child relationship, rather
than as pure measures of the child. For example, a teacher who
requires 3-year-olds in an early childhood classroom to sit still
for long periods to do seat work is likely to assess many children
as inattentive or disruptive (Thompson and Raikes, 2007). Her
rating of a child as having behavior problems may actually be a
reflection of her inappropriate expectations, rather than a child's
enduring behavior problem.
Another measurement challenge is the heavy reliance in this
domain on teacher and parent reports. In development are direct
assessments of children's behavioral self-regulation (Emotion
Matters II Direct assessments developed by Raver and modeled
after work by Kochanska and colleagues); of the executive function aspects of self-regulation (the Head to Toe Task described
by McClelland and colleagues, 2007); and of the Dots Task from
the Directional Stroop Battery and the Flanker Task described
by Diamond and colleagues (2007). Further work with these
measures may generate important evidence about their reliability and validity, as well as their sensitivity to intervention
approaches and their relation to teacher and parent reports and
direct observations.
and Morrow, 1989; Lewit and Baker, 1995), claiming that many
children, especially from low-income homes, enter kindergarten
lacking them (Rimm-Kaufman, Pianta, and Cox, 2000).
Evidence of Continuity and Associations with
Important Outcomes
Aspects of infant behavior, such as giving attention and the
ability to sustain attention, appear to show continuity over time
and relate to educational outcomes. Learning behaviors, such as
persistence and attention in the classroom, have been shown to
be related to specific academic skills in early childhood, such as
early mathematics and literacy skills, across a number of studies
(Fantuzzo, Perry, and McDermott, 2004; Green and Francis, 1988;
McDermott, 1984; McWayne, Fantuzzo, and McDermott, 2004),
even when measures of emotional adjustment were also considered. Approaches to learning as rated by the kindergarten teacher
at entry to school predicted growth in mathematics from kindergarten to third grade in a national sample, the Early Childhood
Longitudinal Study-Kindergarten Cohort (ECLS-K) (DiPerna, Lei,
and Reid, 2007).
Several studies have found significant associations between
young childrens learning-related behavior and their academic
performance. Normandeau and Guay (1998) reported that first
graders' cognitive self-control (the ability to plan, evaluate, and
regulate problem-solving activities; attend to tasks; persist; resist
distraction) was associated with their academic achievement, net
of their intellectual skills assessed in kindergarten. Howse et al.
(2003) found that teachers' ratings of kindergarteners' (but not
second graders') motivation (e.g., "is a self-starter," "likes to do
challenging work") predicted concurrent reading achievement,
with receptive vocabulary (but not previous reading achievement) held constant.
In a longitudinal study of children from kindergarten through
second grade by McClelland, Morrison, and Holmes (2000),
teachers' ratings of kindergarten children's work-related skills
(compliance with work instructions, memory for instructions,
completion of games and activities) were significantly associated
Evidence of Malleability
The theory of change for most early childhood intervention
programs is that some form of preschool enrichment will lead
to more rapid growth in cognitive skills for participants, often
children from low-income families. Most often, cognitive skills
are measured via individual direct assessments using standardized tests administered by trained staff members. A recent Rand
Corporation study (RAND Labor and Population, 2005) examined
programs implemented in the United States that provide services
to children and families during early childhood and reported
effect sizes (d) for cognitive outcomes for successful programs
that ranged from .13 to 1.23.
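The effect sizes (d) reported here are standardized mean differences; assuming the familiar Cohen's d formulation (the worked numbers below are illustrative and are not taken from the studies cited), the metric is

$$
d = \frac{\bar{X}_{\text{program}} - \bar{X}_{\text{control}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},
$$

so that, for example, hypothetical IQ means of 100 for program children and 92.5 for controls, with a pooled standard deviation of 15, give d = (100 - 92.5)/15 = 0.50, or half a standard deviation.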
The largest effect sizes were obtained in the most intensive interventions in assessments of children after age 2. The
Abecedarian Project, a single-site experimental intervention that
delivered 5 years of full-time quality child care, yielded effect sizes
of d = .50 at 18 months, d = .83 at 24 months, d = 1.23 at 36 months,
and d = .73 at 54 months on standardized infant developmental
or IQ tests (note that the reduction in effect sizes between ages 3
and 5 appears to be related to the fact that control children were
attending quality child care centers) (Burchinal, Lee, and Ramey,
1989). The High Scope/Perry Preschool Project, a single-site program that delivered 2 years of preschool between ages 3 and 5 and
included a home visit/parenting education component, yielded
effect sizes of d = 1.03 at age 5 on standardized IQ tests. The Infant
Health and Development Project, a large multisite research project
that delivered 3 years of home visiting and 2 years of full-time
high-quality child care from birth, yielded an effect size of d = .83
on an IQ test at the end of the program at 36 months. The Early
Training Project (Gray and Klaus, 1970), which included both
home visiting and child care for preschoolers, reported an effect
size of d = .70 on an IQ test.
In contrast, much weaker effect sizes were obtained for interventions that were less intense: d = .27 for the Ypsilanti Carnegie
Infant Education project, which provided home visiting (Epstein
and Weikart, 1979); d = .13 at 36 months for Early Head Start, a
large multisite research program that delivered 2-3 years of home visiting and high-quality child care in some sites (U.S. Department
of Health and Human Services, Administration for Children and
Families, 2004); d = .13 for the Prenatal Early Infancy Project-Elmira site (Olds et al., 1993), another home visiting project; and
d = .12 at 48 months for the Head Start Impact Study, which evaluated the impact of a year of Head Start involving both center care
and home visiting (U.S. Department of Health and Human Services, Administration for Children and Families, 2005). Finally, the
relatively frequent need to renorm cognitive tests provides further
evidence of mutability for general cognitive scores (Neisser et al.,
1996). As the average level of education rose in this country, IQ
tests had to be renormed to ensure that the mean score did not
rise substantially.
A growing literature also demonstrates mutability in executive functioning. Experimental studies have demonstrated that
children who participated in brain training activities and curricula exhibited improved neurocognitive abilities (including
executive function) and, in some cases, behavior relative to peers
who did not participate in the training activities (Diamond et al.,
2007; Dowsett and Livesey, 2000; Klingberg et al., 2005; Rueda et
al., 2005; Semrud-Clikeman et al., 1999).
Testing All Children
The challenges of collecting interpretable data on the cognitive skills of children from non-English-speaking or multicultural
backgrounds have been hotly debated. Overall, recent IQ and
general cognitive tests have been developed using diverse populations in their norming samples, and scores on these tests tend
to show similar patterns of prediction with academic achievement and other criteria for different ethnic and economic groups
(Neisser et al., 1996). However, insufficient evidence exists to draw
definitive conclusions regarding the use of these measures with
infants, toddlers, and preschoolers. Similarly, many measures of
specific cognitive skills were developed using middle-class white
children but have been used recently in studies in Head Start
classrooms or other programs serving low-income, ethnically
diverse children. There is growing attention to the psychometric
properties of these measures as the research moves away from
documenting normative development to examining individual
differences (Blair and Razza, 2007).
Available Measures
Measures of general cognitive skills during early childhood
include psychometrically developed developmental and IQ tests,
questionnaires, specific tasks, and curriculum-based assessments.
Many of these are listed in Appendix Table 5-4. The Bayley Scales
of Infant Development measure the mental and motor development and test the behavior of infants from 1 to 42 months of age.
The Wechsler tests may be the most widely used measures of 3- to
8-year-olds, although other psychometric tests are also widely
used for children age 2 years and older, including the Stanford
Binet Intelligence Scales, the Woodcock-Johnson III (WJ-III) Tests
of Cognitive Abilities, and the Kaufman Assessment Battery for
Children (K-ABC) (Bayley, 2005; Kaufman and Kaufman, 2006;
Roid, 2003; Wechsler, 2003; Woodcock, McGrew, and Mather,
2001). The K-ABC assesses sequential and simultaneous processing skills as well as achievement. Similarly, the WJ-III assesses
specific cognitive and achievement skills.
In contrast, most measures of executive function involve
laboratory-based tasks. The continuous performance task is
widely used to measure sustained attention for typically developing children in research and for children referred for cognitive delays or disorders. Assessments of executive skills were
reviewed recently (Carlson, 2005); the review lists tasks appropriate for
toddlers and preschoolers. Perhaps the most widely used measures include the continuous performance task, shape Stroop,
snack delay, day/night, and Simon says (note that these are also
used as measures of constructs defined under socioemotional
development, again pointing out the porous boundaries between
emotional and cognitive development). Assessments of memory
include scales on psychometrically developed assessments and a
wide variety of laboratory assessments (Gathercole, 1998). Ceiling
and floor effects have limited the use of many of the laboratory
tasks across a variety of ages, and concerns about the extent to
which tasks require multiple specific cognitive skills result in
measures that cannot provide pure assessment of a single executive function or memory skill.
Mathematics
In this section we discuss the development of mathematical
understanding, concepts, and skills during early childhood as a
particular aspect of the cognitive skills domain.
Defining the Domain
Researchers emphasize that very young children can and
should be acquiring knowledge that provides the foundations
for later mathematics learning in number sense, spatial sense and
reasoning (geometry), measurement, classification and patterning
(algebra), and mathematical reasoning. Each of these subdomains
of mathematics is described briefly below.
Research suggests that children begin developing number
sense in early infancy (Clements, 2004; Clements, Sarama, and
DiBiase, 2004; Feigenson, Dehaene, and Spelke, 2004; Xu, Spelke,
and Goddard, 2005) and much of what young children know
about numbers depends on their understanding and mastery of
counting (Fuson, 1992a; National Research Council, 2001). Studies suggest that the three major basic skills required for counting
are knowing the sequence of number words, one-to-one correspondence, and cardinality (Becker, 1989; Clements, 2004; Fuson,
1988, 1992a, 1992b; Hiebert et al., 1997; National Research Council,
2001). Following initial acquisition of counting, children begin to
acquire an understanding of number operations (Clements, 2004;
Hiebert et al., 1997; National Council of Teachers of Mathematics,
2000; National Research Council, 2001) and then simple operations and word problems (Fuson, 1992a). Number operations
for preschoolers mainly involve understanding additive number
relationships in which two (or more) small numbers make up
one larger number (e.g., 2 and 3 make 5), which will develop into
addition and subtraction concepts in the future. In acquiring these
skills related to number sense, young children and students of
nonmajority backgrounds tend to be influenced by the context of
the problem and perform better with more contextual information
(Boaler, 1994; Cooper and Dunne, 1998; Lubienski, 2000; Means
and Knapp, 1991).
Geometry is the study of space and shape (Clements, 1999).
identifying the core unit of the pattern, which, in turn, depends on the types of experiences the child has at home or
in care and educational settings (Klein and Starkey, 2004; Starkey,
Klein, and Wakeley, 2004).
Most young children can solve problems involving simple
mathematical reasoning by age 3, often by modeling with real
objects or thinking about sets of objects. Alexander, White, and
Daugherty (1997) propose three conditions for reasoning in young
children: (1) the children must have a sufficient knowledge base,
(2) the task must be understandable and motivating, and (3) the
context of the task must be familiar and comfortable to the problem solver. Although these conditions probably apply to problem
solvers of all ages, they may be particularly important for young
children who are not motivated to complete tasks for external
reasons (e.g., good grades).
Importance of the Domain
The case for assessing mathematics in early education programs is easy to make. Looking across international comparative studies, U.S. students' performance in mathematics is in the
bottom third (American Institutes for Research, 2005). And recent
analyses of longitudinal studies have shown that mathematical
concepts, such as knowledge of numbers and ordinality, at school
entry are the strongest predictors of later academic achievement,
even stronger than early literacy skills (Duncan et al., 2007). Efforts
clearly need to be made to improve opportunities for mathematics
learning and carefully monitor children's learning. Furthermore,
all the state early childhood standards mention mathematical
development as a target for attention.
Testing All Children
The ability to articulate thinking and problem-solving
approaches in mathematics is currently recognized as an important skill (National Council of Teachers of Mathematics, 2000),
although this may prove difficult for children who are not proficient in English or have not yet learned mathematics vocabulary.
Mathematical skills therefore need to be assessed in multiple
ways, with objects that can be manipulated and questions requiring verbal explanations.
Available Measures
Each of the domains in mathematics discussed above has
measures associated with it, although of varying quality and
degrees of development. Both formative and summative assessments should measure childrens skills in the different sub
domains and not focus only on number sense. Because childrens
mathematical experiences and learning are grounded in their
everyday lives, often in practical situations, it is also important
that the problems, even in formal and structured assessments,
be familiar and involve materials that children can use to solve
the problem and show their thinking. Young children need to be
able to touch and move objects to give an accurate demonstration of their understanding of the concepts. Assessments using
still pictures on a piece of paper are likely to underestimate their
mathematical understanding, as they may be better able to solve
problems when they are allowed to move actual objects around
physically. Some of the skills that should be examined in each
domain are listed below.
Since young children's primary experience with numbers
focuses on counting, any assessment of number sense should
examine how children count groups of objects. Assessments
should include asking the child to count, to measure knowledge of number sequence names and rote counting, and should assess
the child's understanding of one-to-one correspondence between
objects and counting and of cardinality. Similarly, assessment of
spatial sense and reasoning (geometry) should involve observation of children engaged in activities using shapes. Assessment
of children's understanding of measurement in early childhood
should begin with asking them to make direct comparisons of different attributes of objects. For classification and sorting, children
should be provided with materials or objects and asked to create
their own groups and describe their reasoning. Their reasoning
should be carefully noted and their understanding should be
evaluated based on their reasoning, not solely by the evaluator's
criteria. Assessment items for mathematical reasoning should be
measurement tools for use in the future. As the history of instrument development in the domains of approaches to learning and
social/emotional development shows, identifying a domain as
important can generate researcher and practitioner interest that
translates initially into informal assessments, which are then
refined and expanded to meet the psychometric criteria required for wider use.
The default when thinking about assessment is to think
about direct, formal testing: the familiar scenario of an adult
sitting down with a child and presenting prescribed questions or
challenges for him or her to solve, in a prescribed sequence. It is
important to emphasize that, although many of the assessment
tools discussed in this chapter have that character, the repertoire
of usable, reliable, and informative assessments is in fact much
larger, including observation of the child in natural or somewhat
structured settings, collecting information from primary caregivers and from adults in child care and educational settings
about the child's behavior, and interacting with the child directly
but without formal test items or materials. The reliability and
validity of such measures for young children needs more study,
and such research is beginning to be done. For example, Meisels,
Xue, and Shamblott (in press) studied the Work Sampling for
Head Start (WSHS) measure, derived from the Work Sampling
System, which has observers complete a checklist of children's
demonstrated capabilities. They reported moderate correlations with direct assessment instruments for language, literacy,
and mathematics, but did not recommend use of the WSHS for
accountability purposes.
APPENDIX TABLES:
5-1 through 5-7: Tables of Preschool Instruments1

These tables list, for the domains discussed in this chapter (physical development and well-being, social-emotional development, approaches to learning, general cognitive skills including executive functioning and memory, mathematics and science, and language and literacy), the available preschool instruments, their relevant subscales, and the data-gathering method used (direct assessment, questionnaire, observation, or interview). Instruments listed in the tables include: NEPSY; Denver II; growth charts and indices of obesity; Toddler-Parent Mealtime Behavior Questionnaire; Creative Curriculum Development Continuum for Ages 3-5; Games as Measurement for Early Self-Control (GAMES); High/Scope Child Observation Record (COR); Vineland Social-Emotional Early Childhood Scales (SEEC); Clinical Evaluation of Language Fundamentals (CELF)-Preschool, including its Behavioral Observation Checklist; Woodcock-Johnson III (WJ-III); Adapted EZ-Yale Personality/Motivation Questionnaire (Adapted EZPQ); ECLS-K Adaptation of the Social Skills Rating System (SSRS), Task Orientation/Approaches to Learning Scale; Delay-of-Gratification Task; Strange Situation; Attachment Q-Sort; Behavioral Assessment System for Children (BASC); Child Behavior Checklist (CBCL) and Caregiver-Teacher Report Form (C-TRF); Infant-Toddler Social and Emotional Assessment (ITSEA); Tower of Hanoi; The Galileo System for the Electronic Management of Learning (Galileo); Kaufman Assessment Battery for Children (K-ABC), including the Expressive Vocabulary Subtest; Bayley Scales of Infant Development; Stanford-Binet Intelligence Scale, Fourth ed. (SB-IV); Wechsler Preschool and Primary Scale of Intelligence, Third ed. (WPPSI-III), and WISC; Expressive One-Word Picture Vocabulary Test (EOWPVT); Peabody Individual Achievement Test (PIAT) and Peabody Individual Achievement Test-Revised (PIAT-R); Test of Early Mathematics Ability (TEMA); The Work Sampling System (WSS); MacArthur-Bates Communicative Development Inventories (CDI); Comprehensive Test of Phonological Processing (CTOPP); Diagnostic Evaluation of Language Variation (DELV); Test of Early Language Development, Third ed. (TELD-3); Reynell Developmental Language Scales, U.S. ed. (RDLS); Sequenced Inventory of Communication Development-Revised (SICD-R); Primary Test of Cognitive Skills (PTCS); and Dynamic Indicators of Basic Early Literacy Skills, Sixth ed. (DIBELS).

1These listings do not imply any approval or endorsement by the committee of particular instruments. They are included to provide examples of instruments available for measuring various domains and outcomes. Appendix D provides information on where reviews of the instruments may be found.
6
Measuring Quality in
Early Childhood Environments
of child care. For example, Pianta and colleagues use their tool,
the CLASS (Pianta, La Paro, and Hamre, 2007), to promote more
intentional instruction, classroom management, and emotional
support in the classroom through their professional program,
My Teaching Partner (Kinzie et al., 2006). The Quality Interventions for Early Care and Education (QUINCE) intervention and
evaluation, which uses on-site technical assistance to improve the
quality of home-based as well as center-based child care, uses the
environmental ratings scales, the Family Day Care Environment
Rating Scale, or FDCERS (Harms and Clifford, 1989), and the
Early Childhood Environment Rating Scale-Revised, or ECERS-R
(Harms, Clifford, and Cryer, 1998), to promote the use of age-appropriate activities and enhance teacher-child interactions in
their program, which follows the Partners for Inclusion model
(Bryant, 2007; Wesley, 1994).
Second, observational measures can be used in formative
assessment of programs that are striving to improve their quality.
Periodic observations and examination of scores on different
dimensions can help identify weaknesses that require further
attention. Fourteen states now have quality ratings systems available to the public, with summary ratings of the quality of early
care and education, and many more states are developing such
systems, with the aim of improving information to consumers
and providing supports to improve quality (Tout, Zaslow, and
Martinez-Beck, forthcoming). Local communities as well are
developing such systems. In most fully developed state quality
ratings systems, an observational measure of the quality of the
early care and education environment (usually the ECERS-R,
FDCERS, or the infant and toddler version of this measure, the
Infant/Toddler Environment Rating Scale; Harms, Cryer, and
Clifford, 1990) is used as one component of the overall rating of
the environment, which usually includes multiple components,
selected and weighted differently in each state or community. The
rating of the environment is used not only as a contributor to the
summary rating of quality, but also as a source of detailed information about the facets of quality that need improvement and in
which changes will help progress to the next quality rating.
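To make the idea of a weighted summary rating concrete, the sketch below combines component scores into an overall rating level. It is a minimal illustration only; the component names, weights, and cut points are invented for the example and do not correspond to any particular state's or community's system.

```python
# Hypothetical sketch of a weighted quality-rating composite.
# Component names, scales, weights, and thresholds are invented;
# each state or community selects and weights components differently.

def overall_quality_rating(components, weights, thresholds):
    """Combine component scores into a weighted average, then map the
    average to a discrete quality-rating level (1 = lowest)."""
    total_weight = sum(weights[name] for name in components)
    weighted_avg = sum(components[name] * weights[name]
                       for name in components) / total_weight
    level = 1
    for cutoff in thresholds:            # thresholds sorted low to high
        if weighted_avg >= cutoff:
            level += 1
    return weighted_avg, level

components = {"environment_rating": 5.2,       # e.g., an observed 1-7 rating
              "staff_qualifications": 4.0,
              "ratio_and_group_size": 6.0}
weights = {"environment_rating": 0.5,
           "staff_qualifications": 0.3,
           "ratio_and_group_size": 0.2}

score, level = overall_quality_rating(components, weights,
                                      thresholds=[3.0, 4.0, 5.0, 6.0])
print(f"Weighted average = {score:.2f}, quality level = {level}")
```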
Third, classroom observations can be used for accountability
purposes, instead of or as a supplement to child outcome measures.
Child care quality has been a consistent modest to moderate positive predictor of children's cognitive and language skills
in large, multisite studies and smaller local studies (Howes et al.,
2008; Lamb, 1998; NICHD Early Child Care Research Network,
2006; Peisner-Feinberg et al., 2001; Vandell, 2004) and a somewhat consistent predictor of social skill (NICHD Early Child Care
Research Network, 2006; Peisner-Feinberg et al., 2001; Vandell,
2004). Using early childhood assessments as part of an aligned
system requires the capacity to juxtapose information about
quality in the early care and education setting with change scores
on childrens development (along with other key components).
Thus, a complete system will require both ratings of the environment and assessments of children at multiple points in time,
although this is expensive.
In some federal and state efforts, observations of early care
and education settings serve both a monitoring and accountability function and a formative function, providing information to
improve quality. Thus, for example, as part of monitoring and
accountability, the Head Start Impact Study collected observations of the quality of Head Start programs as well as of formal
early care and education programs serving children in the control group (U.S. Department of Health and Human Services,
Administration for Children and Families, 2005). Similarly, the
Head Start Family and Child Experiences Survey (FACES) regularly collects observational data on a nationally representative
sample of Head Start programs. The observational data are used
in combination with child outcome data as part of ongoing program monitoring. However, the observational ratings and child
outcomes together are also used to inform ongoing program
improvement (see discussion in Zaslow, 2008). As one example,
information from Head Start FACES was instrumental in shaping an increased focus in Head Start programs on early literacy
development. Information from the Head Start Impact Study has
also been instrumental in increasing professional development
for Head Start teachers, focusing on early mathematics development in young children and how best to foster it.
Fourth, classroom observations are useful for research.
Indeed, most measures were originally developed as part of a
research initiative. An extensive body of work looks at the rela-
BOX 6-1
Dimensions of Quality Observable in the Classroom
1. Emotional climate, social interactions, support for social skills
development, and discipline strategies:
A. Degree to which adults are affectionate, supportive, attentive,
and respectful toward children.
B. Explicit support for social skills (e.g., encouraging children to
use "their words," modeling and engaging children in conversations about social problem-solving skills, encouraging use
of learned strategies to solve real social conflicts).
C. Conversations about feelings.
D. Collaboration and cooperation opportunities.
E. Clarity and developmental appropriateness of rules.
F. Teachers' use of redirection, positive reinforcement, encouragement, and explanations to minimize negative behavior.
2. Instructional activities: an explicit curriculum with specified learning goals for children.
3. General: individualized (adjusted to children's skills and interests); purposeful, planned instruction; integration of content areas; children actively interacting with materials.
4. Language: adults engage in conversations with children; activities that encourage conversation among children; explicit efforts to develop vocabulary and language skills in the context of meaningful activities.
5. Literacy: children read to and given opportunities to read; rhyming words, initial sounds, letter-sound links, and spellings of common words pointed out and practiced; functions and features of print pointed out; opportunities to dictate and write using invented spelling made available.
6. Mathematics: activities that involve counting objects, measuring, identifying shapes, creating patterns, telling time, classifying and seriating objects; instruction on concepts (e.g., big, bigger, equal, one-to-one correspondence, spatial relationships).
7. Science: active manipulation of materials (e.g., sink and float) with adult engaging children in prediction, systematic observation and analysis; instruction on scientific concepts linked to active exploration (e.g., care and observations of live animals).
8. Interactions with parents: activities and opportunities for parents to be informed about the program and their child.
9. Cultural responsiveness:
A. Evidence of supports for linguistic and cultural diversity (e.g.,
pictures, books, language).
B. Activities that expose children to diverse languages and
cultural practices.
C. Support for native language development.
D. Support for learning English.
10. Safety:
A. Adult-child ratio.
B. Absence of broken furniture, any objects that could cause
physical harm.
C. Sufficient space; open pathways.
D. Place for personal hygiene (e.g., teeth brushing, hand
washing).
11. Materials:
A. Technology (e.g., computers).
B. Music (e.g., CD player).
C. Creativity (e.g., art supplies, easels, play dough).
D. Dramatic play (e.g., store, post office, kitchen, clothes).
E. Science (e.g., sand, water, plants, live animals).
F. Literacy (e.g., books, writing materials).
G. Math (e.g., counting objects, blocks, measuring instruments).
H. Fine motor (e.g., materials for drawing, scissors).
12. Physical arrangement:
A. Space and equipment for gross motor activities (e.g., climbing
equipment, swings, balls).
B. Place for quiet and rest (e.g., rugs and pillows out of the
center of activity).
C. Children's access to materials.
13. Adaptations for children with disabilities.
measure contains 26 items divided into two subscales. The emotional climate subscale assesses the teacher's warmth, encouragement, and positive guidance. In the program focus subscale, half
of the 20 items refer to didactic, teacher-directed practices (e.g.,
large-group instruction; workbooks, ditto sheets, and flashcards;
memorization and drill; art projects that involve copying; focus
on getting the right answer), which were considered developmentally inappropriate by NAEYC. Of the 10 items that describe
positive activities, most concern child choice and initiative and
diversity of activities and materials that children can manipulate.
Three of the items refer to positive instructional approaches (e.g.,
teachers ask questions that encourage children to give more than
one right answer).
The CPI described center-based child care preschool programs
in the 10-site NICHD Study of Early Child Care and Youth Development. The program focus score predicted children's language
and academic outcomes at 4.5 years in unpublished analyses that
adjusted for family characteristics (available from the authors
on request).
A Developmentally Appropriate Practices Template
A Developmentally Appropriate Practices Template (ADAPT;
Van Horn and Ramey, 2004) has 19 items based on the 1987 NAEYC
guidelines. It also focuses on the teaching practices the teacher
uses with the entire preschool classroom. Items are anchored
on a 1 (developmentally inappropriate) to 5 (developmentally
appropriate) scale, with descriptions for each anchor. The items
form three scales: (1) integrated curriculum (e.g., teacher adapts
instruction to children's interests, needs, and prior knowledge;
literacy integrated across content areas with literacy materials of
social relevance), (2) social-emotional emphasis (e.g., children's
social and emotional development consistently supported by
peers and teachers; children and teacher collaborate, classroom exemplifies community of learners with shared goals),
and (3) child-centered approaches (e.g., children encouraged to
choose and interact with materials to create and problem-solve;
children work interdependently to complete task or project
and make joint decisions). Instructional practices are described
used quality measure, the positive caregiving composite, is calculated slightly differently for each age level. At 6, 15, and 24
months, positive caregiving composite scores are the mean of five
4-point qualitative ratings (sensitivity to child's nondistress signals,
stimulation of cognitive development, positive regard for child,
emotional detachment [reflected], flatness of affect [reflected]). At
36 months, these five scales plus two additional subscales, "fosters
child's exploration" and "intrusive" [reflected], are included in
the composite. At 54 months, the positive caregiving composite is
the mean of 4-point ratings of caregivers' sensitivity/responsivity,
stimulation of cognitive development, intrusiveness (reflected), and
detachment (reflected). The behaviors observed include language
stimulation, positive talk (e.g., praise, encouragement), positive
physical contact and other behaviors (e.g., positive affect, stimulation of social development, restricting activity, speaking negatively
to child, etc.) as well as the amount of time the child positively or
negatively interacted with the caregiver and other children.
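To illustrate the arithmetic of "reflected" (reverse-coded) scales, the sketch below averages 4-point ratings after reversing the negatively worded items. It is an illustrative computation under simple assumptions, not the ORCE's actual scoring code, and the ratings and item names shown are invented.

```python
# Illustrative computation of a positive-caregiving-style composite from
# 4-point qualitative ratings. Items marked as reflected are reverse-coded
# (1 becomes 4, 2 becomes 3, and so on) before averaging.

SCALE_MAX = 4

def composite(ratings, reflected):
    """Average 1-4 ratings after reverse-coding the reflected items."""
    adjusted = []
    for item, value in ratings.items():
        if item in reflected:
            value = (SCALE_MAX + 1) - value   # reflect: 1<->4, 2<->3
        adjusted.append(value)
    return sum(adjusted) / len(adjusted)

ratings_24_months = {
    "sensitivity_to_nondistress": 3,
    "stimulation_of_cognitive_development": 4,
    "positive_regard": 3,
    "emotional_detachment": 1,    # reflected item
    "flatness_of_affect": 2,      # reflected item
}
print(composite(ratings_24_months,
                reflected={"emotional_detachment", "flatness_of_affect"}))
```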
The ORCE composite quality ratings predicted concurrent
and later child outcomes in the 10-site NICHD Study of Early
Child Care and Youth Development in analyses that adjusted for
family demographic and parenting characteristics. Children who
experienced more responsive and stimulating care according to
the ORCE consistently had high language and cognitive scores
and tended to have better social skills while in child care (NICHD
Early Child Care Research Network, 2006) and to demonstrate
better language skills through fifth grade (Belsky et al., 2007) and
better academic skills through third grade (NICHD Early Child
Care Research Network, 2005).
Preschool Classroom Mathematics Inventory
The Preschool Classroom Mathematics Inventory (PCMI;
National Institute for Early Education Research, 2007) was created
to assess the quality of mathematics instruction for the preschool
classroom and is modeled after Supports for the Early Literacy
Assessment (see below). The 17 items assess instruction and
learning opportunities related to (1) number (e.g., materials for
counting, comparing number, and estimating; teachers encourage
children to recombine and count); (2) mathematical concept (e.g.,
The observational instruments discussed in this chapter are summarized in a table that shows, for each instrument, the age group covered; the type of setting in which it is used (center/school, home-based child care, home or lab, or all child care); and the extent to which the instrument represents physical environment and materials (safety, physical arrangement, materials), social/emotional climate (emotional climate, social interactions with adults, support for social skill development), learning environment and opportunities, language and literacy, math, and level of descriptive detail, with each feature marked as receiving some or substantial representation. Instruments covered are the Assessment Profile for Early Childhood Programs (APECP); Caregiver Interaction Scale (CIS); Child/Home Early Language and Literacy Observation (CHELLO); Classroom Assessment Scoring System (CLASS); Classroom Practices Inventory (CPI); A Developmentally Appropriate Practices Template (ADAPT); Early Childhood Environment Rating Scale-Revised (ECERS-R); Early Childhood Environment Rating Scale-Extension (ECERS-E); Early Childhood Classroom Observation Measure (ECCOM); Emerging Academics Snapshot (EAS); Observation Measure of Language and Literacy Instruction (OMLIT); Observation Record of the Caregiving Environment (ORCE); Preschool Classroom Mathematics Inventory (PCMI); Supports for English Language Learners Classroom Assessment (SELLCA); and Preschool Program Quality Assessment, 2nd ed. (PQA).
Part
III
How to Assess
In this part, we turn to the question of how to select and administer assessments, once purposes have been established and
domains selected. Some of the issues dealt with here are the
technical ones defined by psychometricians as key to test quality:
the reliability and validity of inferences, discussed in Chapter 7.
Others have to do with the usability and fairness of assessments,
issues that arise when assessing any child but in particular children with disabilities and children from cultural and language
minority homes; these are discussed in Chapter 8. In Chapter 9,
and in particular with regard to direct assessments, we discuss the
many ways in which the test as designed may differ from the test
as implemented. Testing a young child requires juggling many
competing demands: developing a trusting relationship with the
child, presenting the test items in a relatively standardized way
that is nonetheless natural, responding appropriately to both correct and incorrect answers and to other child behaviors (signs of
fear, anxiety, sadness, shyness). While it may not be possible to
manage all these demands optimally, it is important that they are
at least acknowledged when interpreting test results.
7
Judging the Quality and
Utility of Assessments
In this chapter we review important characteristics of assessment instruments that can be used to determine their quality
and their utility for defined situations and purposes. We
review significant psychometric concepts, including validity and
reliability, and their relevance to selecting assessment instruments, and we discuss two major classes of instruments and the
features that determine the uses to which they may appropriately
be put. Next we review methods for evaluating the fairness of
instruments, and finally we present three scenarios illustrating
how the process of selecting assessment instruments can work
in a variety of early childhood care and educational assessment
circumstances.
Many tests and other assessment tools are poorly designed.
The failure of assessment instruments to meet the psychometric
criteria of validity and reliability may be hard for the practitioner
or policy maker to recognize, but these failings reduce the usefulness of an instrument severely. Such characteristics as ease of
administration and attractiveness are, understandably, likely to be
influential in test selection, but they are of less significance than
the validity and reliability considerations outlined here.
Validity and reliability are technical concepts, and this chapter addresses some technical issues. Appendix A is a glossary of
words and concepts to assist the reader. Especially for Chapter 7,
target domain requires a complex chain of inferences and generalizations that must be made clear as a part of the interpretive
argument.
An interpretive argument for a measure of children's cognitive
development in the area of quantitative reasoning, for example,
may include inferences ranging from those involved in the scoring
procedure (Is the scoring rule that is used to convert an observed
behavior or performance by the child to an observed score appropriate? Is it applied accurately and consistently? If any scaling
model is used in scoring, does the model fit the data?); to those
involved in the generalization from observed score to universe
of scores (Are the observations made of the child in the testing or
observation situation representative of the universe of observations or performances defining the target cognitive domain? Is the
sample of observations of the child's behavior sufficiently large to
control for sampling error?); to extrapolation from domain score
to level of development (or level of proficiency) of the competencies for that domain (Is the acquisition of lower level skills a
prerequisite for attaining higher level skills? Are there systematic
domain-irrelevant sources of variability that would bias the interpretation of scores as measures of the child's level of development
of the target domain attributes?); to the decisions that are made,
or implications drawn, on the basis of conclusions about developmental level on the target outcome domain (e.g., children with
lower levels of the attribute are not likely to succeed in first grade;
programs with strong effects on this measure are more desirable
than those with weak effects).
The decision inference usually involves assumptions that rest
on value judgments. These value assumptions may represent
widely held cultural values for which there is societal consensus,
or they may represent values on which there is no consensus or
even bitter divisions, in which case they are readily identifiable
for the purposes of validation. When the underlying decision
assumptions represent widely held values, they can be difficult to
identify or articulate for validation through scientific analysis.
The interpretive argument may also involve highly technical inferences and assumptions (e.g., scaling, equating). The
technical sophistication of measurement models has reached
such a high degree of complexity that they have become a black
observational situations and allowed to explore the environment so that they are comfortable. The results can provide
insights ranging from the very plain (the children were very
distracted when responding) to the very detailed, including
evidence about particular behaviors and actions that were evident when they were responding.
The exit interview is similar in aim but is timed to occur after
the child has made his or her responses. It may be conducted
after each item or after the assessment as a whole, depending
on whether the measurer judges that the delay will or will
not interfere with the child's memory. Again, limitations with
infants and toddlers are obvious. The types of information
gained will be similar to those from the think-aloud, although
generally it will not be so detailed. It may be that a data collection strategy that involves both think-alouds or observations
and exit interviews will be best.
3. Evidence Based on Internal Structure. To collect evidence
based on internal structure, the measurer must first ensure that
there is an intended internal structure. Although this idea of
intended structure may not always be evident, it must always
exist, even if it is treated as being so obvious that it need not be
mentioned or only informally acknowledged in some cases. We
refer to this internal structure as the construct. This is what has
been described above in the section on construct validity. Note
that the issue of differential item functioning (DIF), discussed
later in this chapter, is one element of this type of evidence,
specifically one related to fairness of the assessment.
4. Evidence Based on Relations to Other Variables. If there are
other external variables that the construct should (according
to theory) be related to, and especially if another instrument is
intended to measure the same or similar variable, a strong relation (or lack of a strong relation) between the assessment under
scrutiny and these external variables can be used as validity evidence. Typical examples of these external variables are (a) caregiver judgments and (b) scores on other assessments. Another
source of external variables is treatment studies: if the measurer
has good evidence that a treatment does indeed change the construct, then the contrast on the assessment between a treatment
and a control group can be used as an external variable. (One
validity, information regarding the consequences of the assessment becomes part of the evidentiary basis for judging the
validity of the assessment. An illustration can be drawn from
high-stakes assessments in education, through which policy
makers have sought to establish accountability. As with any
form of assessment, these can have intended or unintended,
desirable or undesirable consequences. An alleged potential
consequence of high-stakes assessments is that they can drive
instructional decisions in unintended and undesirable ways,
usually by over-emphasizing the skills tested ("teaching to
the test"). They can also possibly have a corrupting influence,
since the motivation to misuse or misrepresent test scores can
be compelling. In addition, the psychometric characteristics
of the test can vary depending on whether it is administered
under low- or high-stakes conditions (e.g., level of motivation
or anxiety as construct-irrelevant sources of variance in test
performance). It is also possible that new and future technologies used to administer, score, or report assessments will have
unintended, unanticipated consequences, as many new technologies have had.
Social Consequences of Assessment
As in the field of medicine, in assessment there is an obligation to do no harm to those assessed. As such, it is important to
inquire into the intended as well as unintended consequences of
assessment. Validity theoreticians differ from one another in the
extent to which they incorporate the consequences of assessment
under the purview of validity. Thus, although evidence about
consequences can inform judgments about validity, it is important
to distinguish between evidence that is directly relevant to validity and consequences that may inform broader concerns, such as
educational or social policy.
For example, concerns have been raised about the impact of
certain forms of assessment on narrowing the curriculum. (That
is, it is often said that assessments should not have the effect of
unduly narrowing the early childhood program's focus to the
detriment of the program's wider or comprehensive goals.) For
example, an educational assessment system should not lead
test-retest reliability coefficient. For a test-retest reliability coefficient, the respondents answer the questions twice, and
the reliability coefficient is then calculated simply as the correlation
between the two sets of scores. On one hand, the test and the
retest should be so far apart that it is reasonable to assume that
the respondents are not answering the second time by remembering the first but are genuinely responding to each item anew.
This may be difficult to achieve for some sorts of complex items,
which may be quite memorable. On the other hand, as the aim
is to investigate variation in the scores not due to real change
in respondents' true scores, the measurements should be close
enough together for it to be reasonable to assume that there has
been little real change. Obviously, this form of reliability index
will work better when a stable construct is being measured with
forgettable items, compared with a less stable construct being
measured with memorable items.
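A minimal computational sketch of this coefficient, using invented scores for ten children assessed twice:

```python
# Test-retest reliability as the Pearson correlation between scores from
# two administrations of the same instrument. Scores are hypothetical.
from statistics import correlation  # available in Python 3.10 and later

time1 = [12, 15, 9, 20, 14, 18, 11, 16, 13, 17]
time2 = [13, 14, 10, 19, 15, 17, 12, 18, 12, 16]

print(f"Test-retest reliability r = {correlation(time1, time2):.2f}")
```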
Another type of reliability coefficient is the alternate forms
reliability coefficient. With this coefficient, two sets of items are
developed for the instrument, each following the same construction process. The two alternate copies of the instrument are
administered, and the two sets of scores are then correlated to
produce the alternate forms reliability coefficient. This coefficient
is particularly useful as a means of evaluating the consistency
with which the test has been developed.
Other classical consistency indices also have their equivalents in the construct modeling approach.
For example, in the so-called split-halves reliability coefficient, the
instrument is split into two different (nonintersecting) but similar
parts, and the correlation between them is used as a reliability
coefficient after adjustment with a factor that attempts to predict
what the reliability would be if there were twice as many items in
each half. The adjustment is a special case of the Spearman-Brown
formula:

r' = Lr / [1 + (L - 1)r],

where r is the correlation between the two halves and L is the factor by which the number of items is increased (L = 2 when projecting from half-length to full-length).
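As a purely numerical illustration (not a value taken from any instrument reviewed here), a split-half correlation of r = .60 would be adjusted to

r' = 2(.60) / (1 + .60) = 1.20 / 1.60 = .75.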
academic achievement in primary grades was predicted by assessments administered in preschool or kindergarten. This provides a
ceiling for possible external validity evidence. Observation-based
measures present an entirely different set of issues. They do not
present any of the problems associated with the young childs
ability to understand and comply with the demands of a structured testing situation, since the childs day-to-day behavior is
the basis for the inference of knowledge and skills. Teachers and
caregivers collect data over a variety of contexts and over time to
gain a more valid and reliable picture of what children know and
can do. Observation-based assessment approaches also are consistent with recommended practices for the assessment of young
children. The challenges associated with observation-based measures center on the caregiver or teacher as the source
of the information. Mathematica Policy Research (2007) has summarized challenges related to observation-based assessments:
• There is a need to establish trust in teachers' and caregivers' judgments. Research has identified the conditions under which their ratings are reliable, but there is an ongoing need to monitor reliability.
• Teachers and caregivers must be well trained in the administration of the tool to achieve reliable results. More research is needed to specify the level of training needed to obtain reliable ratings from preschool teachers. (Assessors of direct assessments need to be trained as well, but the protocol may be more straightforward.)
• The assessment needs to contain well-defined rubrics and scoring guides.
• Teachers and caregivers may be inclined to inflate their ratings if they know the information is being used for program accountability.
• Not all teachers or caregivers will be good assessors.
• Measurement carried out by teachers and caregivers requires that additional steps be taken to ensure the validity and reliability of the data, such as periodic monitoring.
A strength of observation-based measures is that the information has utility for instructional as well as accountability purposes.
methods for examining (a) test bias and (b) DIF. These issues are
most relevant for three populations of young children, which are
the subject of the next chapter: minority children, English language
learners, and children with disabilities.
Differential Item Functioning
Assessments are typically made of children from a variety of
backgrounds. One standard requirement of fairness in assessment
practice is that, for children who are at the same level of ability on
the variable being measured, the items in the instrument behave
in a reasonably similar way across different subgroups. That is,
the items should show no evidence of bias due to DIF (American Educational Research Association, American Psychological
Association, and National Council on Measurement in Education,
1999, p. 13). Typically these subgroups are gender, ethnic and
racial, language, or socioeconomic groups, although other groupings may be relevant in particular circumstances.
First, it is necessary to make an important distinction. If the
responses to an item have different frequencies for different subgroups, then that is evidence of differential impact of the item on
those subgroups. Although such results may well be of interest
for other reasons, they are not generally the focus of DIF studies.
Instead, DIF studies focus on whether children at the same locations
on the score distribution give similar responses across the different
subgroups.
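The sketch below illustrates this logic with invented data: children are grouped into bands of matched total score, and within each band the proportion answering a given item correctly is compared across subgroups. It is a deliberately simplified check, not a substitute for established DIF procedures such as the Mantel-Haenszel statistic or item response theory methods, and the group labels and scores are hypothetical.

```python
# Simplified illustration of a DIF check: within bands of matched total
# score, compare the proportion of each subgroup answering an item
# correctly. Large within-band gaps suggest possible DIF; different
# overall pass rates alone indicate only differential impact.
from collections import defaultdict

def dif_table(records, n_bands=4, max_score=20):
    """records: list of (subgroup, total_score, item_correct) tuples."""
    band_width = max_score / n_bands
    counts = defaultdict(lambda: [0, 0])   # (band, subgroup) -> [correct, total]
    for subgroup, total, correct in records:
        band = min(int(total // band_width), n_bands - 1)
        counts[(band, subgroup)][0] += int(correct)
        counts[(band, subgroup)][1] += 1
    return {key: c / n for key, (c, n) in counts.items() if n > 0}

records = [
    ("group_A", 5, False), ("group_A", 12, True), ("group_A", 18, True),
    ("group_B", 6, False), ("group_B", 11, False), ("group_B", 19, True),
]
for (band, subgroup), p in sorted(dif_table(records).items()):
    print(f"score band {band}, {subgroup}: proportion correct = {p:.2f}")
```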
DIF is not always indicated when different groups perform
differently on an assessment or on particular items. For example,
suppose that more English language learners than native English
speakers got a particular item wrong on an assessment of speaking
in English; that would constitute differential impact
on the results of the assessment and could well be an interesting result in itself. But the issue of DIF would not necessarily be
raised by such a resultit is to be expected that someone learning
a language will find it harder to speak that language than native
speakers, and hence the result does not challenge the contention
that the instrument was accurately measuring that difference in
their speaking performance.
However, if children from the two groups who scored at
FIGURE 7-1 Examining differential item functioning: proportion answering item Z correctly vs. score on entire test, for male and female subjects
(hypothetical data).
over the course of the year to ensure that their language skills are
developing at an appropriate pace and that they will be ready for
kindergarten when they finish at Honeycomb.
The committee discusses these purposes and works to further
clarify the assessment setting. They discuss who will administer
and score the assessments, who will interpret the assessments,
what specific decisions will be made on the basis of the assessment results, when these decisions will need to be made and how
often they will be reviewed and possibly revised, which children
will participate in the assessments, and what the characteristics of
these children are: their ages, their race/ethnicity, their primary
language, their socioeconomic status, and other aspects of their
background and culture that might affect the assessment of their
language skills. Dr. Thompson concludes, on the basis of the
answers to these questions and refinement of their purposes in
assessing children's language, that either a direct assessment or
a natural language assessment might be used. Ms. Conway likes
the idea of using a natural language assessment but considers
that such an assessment may be too costly. The committee decides
not to preclude any particular form of assessment until they have
more information on the available assessments; their reliability
and validity for the purposes they have specified with children
like those at Honeycomb; and the specific costs associated with
using each of them, including the costs of training personnel to
administer, score, and interpret the assessments and the costs
associated with reporting and storing the assessment results so
that they will be useful to teachers.
The committee next considers how they will go about identifying suitable tests. They consider what tests are being used in
other programs like Honeycomb. In one nearby program, the
director has adopted the use of a locally developed assessment.
Ms. Conway considers that perhaps Honeycomb could also use
this assessment, since the other program appears to be obtaining
excellent results with it. However, Dr. Thompson points out that
such a locally developed test, because it has not been normed with
a nationally representative sample, will not meet at least one of
the stated purposes for assessment, namely, to provide the teacher
with information about how each assessed child is doing relative
to other typically developing children. Knowledge about how
about reliability and validity that has been accumulated from all
available sources. It is tempting to think that the best decision will
be obvious and that everyone would make the same decision in
the face of the same information, but each setting is somewhat different, and choosing between tests is a matter of balancing competing objectives. For example, reviewers may differ in how much
weight they put on the desire for short testing times compared
with the desire for high reliability and validity for all subgroups of
children, or the desire for a single assessment compared with the
desire to measure all of the identified skills. Thus, decisions may
vary from setting to setting, or even between members of the same
committee in a given setting. These differences can be reduced by
deciding on specific weights for each criterion that all reviewers
will use, but in most situations these differences of opinion will
need to be resolved by the committee.
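When a committee does agree on explicit criterion weights, the arithmetic is straightforward: each test's ratings are multiplied by the weights and summed, and the candidates are compared on the resulting totals. The sketch below illustrates this with invented criteria, weights, and ratings; none of the numbers or criterion names come from this report.

```python
# Minimal sketch (hypothetical criteria, weights, and ratings): combining
# agreed-on criterion weights with reviewer ratings to compare candidate tests.
weights = {"reliability": 0.30, "validity_evidence": 0.30,
           "testing_time": 0.15, "subgroup_coverage": 0.15, "cost": 0.10}

ratings = {  # 1 (poor) to 5 (strong); invented committee judgments
    "Test A": {"reliability": 5, "validity_evidence": 4, "testing_time": 2,
               "subgroup_coverage": 4, "cost": 3},
    "Test B": {"reliability": 3, "validity_evidence": 3, "testing_time": 5,
               "subgroup_coverage": 2, "cost": 5},
}

def weighted_score(r):
    """Weighted sum of a single test's criterion ratings."""
    return sum(weights[c] * r[c] for c in weights)

for name, r in sorted(ratings.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(r):.2f}")
```

Even with explicit weights, the totals are an aid to deliberation rather than a substitute for it.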
It is important to keep in mind that, at this point, the goal is
simply to settle on a small slate of possible tests to review directly.
The committee can always decide to keep an extra test in the
review when opinions about it are divided. Some information
will remove a test from further consideration, such as a test that
has been shown to function differently for language-minority children, children with disabilities, or other important subgroups (see
the section on differential item and differential test functioning),
or a test found to have poor reliability for one or more subgroups,
or a test that is found to have special requirements for test administrators that cannot be met in the current setting.
Lack of information is not, in and of itself, a reason to reject
a test. For example, a test that appears strong on all other criteria
may have no information on its functioning for language-minority
children. Specifically, the published information may not discuss
the issue of test bias, and there may be no normative information
or validity studies that focus on the use of the test with this population. The decision that one makes about this test may depend
largely on: (1) the strength of other tests in the pool with respect to
their use with language-minority children, (2) the ability to locate
information from other sources that can provide the missing information on the test in question, and (3) the capacity of the center
to generate its own information on how the test functions with
this population of children through systematic use of the test and
elect to show the tests to the teachers who will use them, to have
teachers rate the difficulty of learning to administer the test, and
to pilot the tests with a few children in order to get a sense of how
they react to the procedures. This information will be compiled,
along with the technical and descriptive information about the
test, the information on cost, and the committee's best judgment
about any special infrastructure that might be needed to support
a particular test (e.g., a test may require computerized scoring to
obtain standard scores).
At this point, the committee can choose the test or tests that
will best meet the assessment needs of the center. The decision
about which test or tests to adopt will boil down to a compromise
across the many criteria agreed on by the committee. In this case,
these included the desire to have an assessment process that is
both child and teacher friendly, minimizes lost instructional time,
meets the highest standards of evidence for reliability and validity for the purposes for which assessment is being planned and
with the particular kinds of children who make up the center's population, and that can be purchased and supported within the budgetary limits set out by the director. To no one's surprise, no test has emerged that is rated at the top on all of the committee's dimensions. Nevertheless, the committee's diligence in collecting and reviewing information and in their deliberations has given them the best possible chance of selecting a test that meets their needs.
Selecting Tests for Multiple Related Entities
In this scenario we consider a consortium of early childhood
programs that seeks to establish an assessment system to guide
instructional decisions that can be used across all programs in the
consortium. The process is similar in many respects to the process
followed by Ms. Conway and the team at Honeycomb. Unique
to this situation are the facts that the consortium wishes to use
assessment to guide instructional decision making and that the
consortium would like to use the assessment system across all
members of the consortium. These differences suggest that the
processes adopted by Honeycomb should be modified in specific
in what form will the data be collected, and how will the data be
stored and aggregated for reporting purposes? Who will decide
on report formats and the process of disseminating the results?
This list is not exhaustive, but it highlights some of the additional
challenges that arise when more than one entity is involved in the
testing enterprise.
Another major difference between the current scenario and the
Honeycomb scenario is the focus on using assessment results to
guide instructional decisions. Using assessments to guide instructional decisions implies that assessments will occur at intervals
throughout the year, which may imply that different assessments
are used at different times during the year, or that different forms
of the same assessments are used at different times during the
year. In part this distinction hinges on the nature of the instructional decisions to be made throughout the year. Decisions that
relate to monitoring progress in a single domain would generally
argue for the use of different forms of the same assessment over
time, whereas decisions that relate to the introduction of instruction in a new domain or transitioning from one form of instruction to another (e.g., from native language instruction to English
instruction) might argue for the use of a different assessment.
Several questions must be considered when the focus is on
guiding instruction. The first is whether or not the assessment is
expected to assess progress against a specific set of standards set
forth by the state, the district, the consortium, or some other entity.
Ideally, there will not be multiple sets of standards against which
performance must be gauged, as each set of standards potentially
increases the number of behaviors that have to be assessed and
monitored, and the more standards that exist, the more likely it
becomes that sets of standards will come into conflict with one
another.
A second major question that must be addressed is the distinction between status and growth. If the assessment is to monitor
growth over time, it should be clear in what domain growth is
being measured, whether growth in that domain is captured
through quantitative change (i.e., change in level of performance),
or whether growth in that domain is captured through qualitative change (i.e., change in type), or both. Measuring quantitative
change requires that additional psychometric work has been done.
8
Assessing All Children
standardized tests, even controlling for socioeconomic background and proficiency in standard American English (Garcia
and Pearson, 1994; Rock and Stenner, 2005). The list of theories
related to such disparities is long; however, one reason relevant
to this report is that differences in test scores (e.g., between black
and white children) may be due to striking disparities in ecological conditions and to instruments that are not designed to be
sensitive to those cultural variations. Such contextual variations,
if not considered in the assessment instrument design, can lead
to systematic biases (Brooks-Gunn et al., 2003). Such bias may actually perpetuate or increase social inequalities, because a test whose content and measures reflect the values, culture, and experiences of the majority effectively legitimates those inequalities (Gipps, 1999).
Inappropriate Standardization Sample and Methods
Hall (1997) argues that Western psychology tends to operate from an ethnocentric perspective, assuming that research and theories based on the majority white population are applicable to all groups. These paradigms are seen as templates to be used on all
groups to derive parallel conclusions. As such, often the standardization samples of tests are primarily drawn from white
populations, and often minorities are included in insufficient
numbers for them to have a significant impact on item selection or to prevent bias. For example, there is a great deal of
concern about accurate identification of language disorders
among black children using standardized, norm-referenced
instruments, because many literacy tests are developed based
on mainstream American English and do not recognize dialect
differences. The tests have been normed on children from white,
middle-class backgrounds (Fagundes et al., 1998; Qi et al., 2003;
Washington and Craig, 1992). Often validity and sampling
tests do not include representative samples of nonmainstream
English speakers, so the statistical ability to find items that are
biased is limited (Green, 1980; Seymour et al., 2003).
The large proportion of minority children who score poorly on some standardized language assessment tools may have more to do with the fact that the tests have been normed
on children from primarily white, middle-class language backgrounds than with true differences in children's language abilities (Qi et al., 2003). Minority groups may be underrepresented
in standardization samples relative to their proportions in the
overall population, or their absolute number may be too small
to prevent bias. Standardized tests based on white middle-class normative data have inevitable bias against children from
minority and lower SES groups, providing information on their
status in comparison to mainstream children. They do not take
into account cultural differences in values, beliefs, attitudes, and
cultural influences on assessment content; contextual influences
of measuring behavior; or alternative pathways in development
(Notari-Syverson et al., 2003, p. 40).
In addition, the fact that a minority group is included in a
normative sample does not mean the assessment tool is unbiased
and appropriate to use with that group (Stockman, 2000). It
is a common misconception that, because a test is normed, it is unbiased with respect to minorities. The norming process, by its nature,
leans toward the mainstream culture (Garcia and Pearson, 1994).
When test companies draw strict probability samples of the
nation, very small numbers of particular minorities are likely to be
included, increasing the likelihood that minority group samples
will be unrepresentative. Even if a test is criterion-referenced
instead of norm-referenced, the performance standards (cutoff
scores) by which the children's performance is evaluated are
likely to be based on professional judgments about what typical
(that is, mainstream) children know and can do at a particular
developmental level (Garcia and Pearson, 1994).
Inappropriate Testing Situation and Examiner Bias
Rarely examined is the assessor's influence on child assessments and whether assessor familiarity or unfamiliarity exerts a
bias against different population groups. For example, situational
factors may systematically enhance or depress the performance
of certain groups differently, such as familiarity with the testing
situation, the speed of the test, question-answer communication
style, assessor personal characteristics, and the like (Green, 1980,
p. 244). Assessor and language bias is present particularly if the
Developmental Domain | Number of Assessment Tools Searched | Number of Bias Testing Articles Found | Tools With Bias Testing Articles Found
Cognitive | 11 | 16 | Kaufman Assessment Battery for Children (K-ABC) (n = 5); Peabody Individual Achievement Test-Revised (PIAT-R) (n = 2); Stanford-Binet Intelligence Scales, Fourth ed. (SB-IV) (n = 3); Wechsler Preschool and Primary Scale of Intelligence, Third ed. (WPPSI-III) (n = 3); Woodcock-Johnson III (WJ-III) (n = 3)
Language | 15 | | Expressive Vocabulary Test (n = 3); Peabody Picture Vocabulary Test III (n = 5); Preschool Language Scale (n = 1)
Socioemotional | 21 | | Behavioral Assessment System for Children (n = 1); Bayley Scales of Infant Development (n = 1); Child Behavior Checklist 1½-5 (n = 1); Attachment Q-Set (n = 1); Penn Interactive Peer Play Scale (n = 1)
Approaches to learning | | |
a lack of adequate instruments to use with them, especially considering the hundreds of languages spoken in the United States.
Some tests exist in Spanish, but most lack the technical qualities
of a high-quality assessment tool. In addition, there is a shortage
of bilingual professionals with the skills necessary to evaluate
these children, and a shortage as well of conceptual and empirical work systematically linking context with child learning. In
this section we discuss these challenges, review important principles associated with high-quality assessments of young English
language learners, and discuss further needs in the field so that
research and practice work together to see that such principles
are implemented.
Several terms are used in the literature to describe children
from diverse language backgrounds in the United States. A general term describing children whose native language is other than
English, the mainstream societal language in the United States, is
"language minority." This term is applied to nonnative English
speakers regardless of their current level of English proficiency.
Other common terms are "English language learner" and "limited English proficient." These two terms are used interchangeably to
refer to children whose native language is other than English and
whose English proficiency is not yet developed to a point at which
they can profit fully from English instruction or communication.
In this report, the term "English language learner" is used, rather than "limited English proficient," as a way of emphasizing children's learning and progress rather than their limitations. Given
the charge of the committee, the focus is particularly on children
from birth to age 8: young English language learners.
Young English Language Learners: Who Are They?
Young English language learners have been the fastest growing child population in the country over the past few decades, due
primarily to increased rates in both legal and illegal immigration.
Currently, one in five children ages 5-17 in the United States has a
foreign-born parent (Capps et al., 2005), and many, though not all,
of these children learn English as a second language. Whereas the
overall child population speaking a non-English native language
in the United States rose from 6 percent in 1979 to 14 percent in
2000). This diverse population of young children presents numerous challenges related to the validity of assessments, not only
because they are young, but also because of their developmental
or disability-related needs. The following pages address why
young children with special needs are being assessed, the principles that should guide assessment, and some of the unique issues
raised by conducting assessments for this population. The term
"young children with special needs" is used to describe children
from birth through age 5 years who have diagnosed disabilities,
developmental delays, or a condition that puts them at risk for a
delay or a disability.
Key to understanding the assessment issues in this area is
understanding who makes up this population. Many children
with special needs receiving services do so through programs
supported under the Individuals with Disabilities Education Act,
the primary law that provides funding and policy guidance for
the education of children with disabilities. The IDEA is basically a
grants program of federal funds going to states to serve students
with special needs on the condition that the education provided
for them is appropriate (National Research Council, 1997).
In 2006, nearly 1 million children with special needs under
age 5 received services through programs governed by the IDEA.
Specifically, almost 300,000 children under age 3 received early
intervention services and more than 700,000 children ages 3 to 5
received special education and related services (https://www.
ideadata.org/arc_toc8.asp#partbCC). Children under age 5 with
special needs are served under two different sections of IDEA.
Children from birth to age 3 receive services under Part C, Infants
and Toddlers with Disabilities, whereas children ages 3 through
5 are served under Part B, which addresses special education and
related services for children and youth ages 3 through 21.
Infants and toddlers receive services for a variety of developmental problems, with communication problems being the most
frequent. A total of 64 percent of children served under age 3 have
some kind of developmental delay. Nearly one in five (19 percent)
have some kind of a prenatal or perinatal abnormality, and 18
percent have motor problems. Three-fourths of the children identified between ages 2 and 3 receive services for a communication
problem. Smaller percentages have problems with movement (18
children, including those with special needs. These observation-based tools are unique because they were designed from the
beginning to ensure that young children with disabilities could
be included in the data collection (see http://www.draccess.org
for more information).
In addition to these general problems, we describe below several challenges of special relevance to the assessment of children
with disabilities.
Construct-Irrelevant Skills and the Interrelatedness of
Developmental Domains
For a young child to demonstrate competency on even a single
item on an assessment requires a combination of skills, yet some
of them may not be relevant to the construct being assessed. To
the extent that items on an assessment require skills other than
the construct being assessed (e.g., problem solving), construct-irrelevant variance exists in the scores. Some examples of this in
assessments of young children with special needs are obvious. A
child who cannot hear or who has no use of her arms will not be
able to point to a picture of a cat when asked. The item requires
hearing and pointing as well as knowledge of a cat, even though
these are not the skills being tested. The child who cannot point
will fail the item, regardless of what he or she knows about cats.
Other occurrences of construct-irrelevant variance may not
be so obvious. All assessments that require children to follow
and respond to the examiners directions require some degree
of language processing. Even though test developers attempt to
address this by keeping instructions simple, all young children
are imperfect language processors because they are still learning language. Many young children with special needs have
impairments related to communication, meaning their capacity
to process language is even less than the restricted capacity of a
typical peer. Unlike deafness, blindness, or a motor impairment,
language processing problems may present no visible signs of
impact on the assessment process.
Construct-irrelevant variance is a major problem for the
assessment of young children because many assessments are
organized and scored around domains of development. Domains
Inclusive assessment population; precisely defined constructs; accessible, nonbiased items; amenable to accommodations; simple, clear, and intuitive instructions and procedures; maximum readability and comprehensibility; maximum legibility.
Conclusion
The nearly 1 million young children with special needs are
regularly being assessed around the country for different purposes. Although a variety of assessment tools are being used
for these purposes, many have not been validated for use with
these children. Much more information is needed about assessments and children with special needs, such as what tools are
being used by what kind of professionals to make what kind of
decisions. Assessment for eligibility determines whether a young
child will have access to services provided under the IDEA. It is
unknown to what extent these critical decisions are being made
consistent with recommended assessment practices and whether
poor assessment practices are leading to inappropriate denial of
service. The increasing call for accountability for programs serving young children, including those with special needs, means
that even more assessment will be occurring in the future. Yet
the assessment tools available are often insufficiently vetted for use as accountability instruments, are difficult to use in standardized ways with children who have special needs, and focus inappropriately on discrete skills rather than functional capacity
in daily life. Until more information about assessment use is available and better measures are developed, extreme caution is critical
in reaching conclusions about the status and progress of young
children with special needs. The potential negative consequences
of poor measurement in the newest area of assessment, accountability, are especially serious. Concluding that programs serving
young children with special needs are not effective based on
flawed assessment data could lead to denying the next generation
of children and families the interventions they need. Conversely,
good assessment practices can be the key to improving the full
range of services for young children with special needs: screening,
identification, intervention services, and instruction. Good assessment practices will require investing in new assessment tools and
creating systems that ensure practitioners are using the tools in
accordance with the well-articulated set of professional standards
and recommendations that already exist.
9
Implementation of
Early Childhood Assessments
Findings from a representative sample of Head Start programs regarding implementation of the Head Start National Reporting System (NRS) suggest
that there was ambiguity as to whether the information from the
child assessments was to be used for evaluation and monitoring
purposes (with the intent of informing program improvement
and tracking whether improvements were occurring over time) or
whether it was intended for high-stakes purposes (to make determinations about program funding). Staff in 63 percent of the programs in this study indicated that they felt that it was not clear how
the results of the assessment were going to be used (Mathematica
Policy Research, 2006). This study concluded that when systems
of early childhood assessment are implemented, information
should be shared with programs about how data will be used.
Furthermore, if the intent is to guide program improvement, the
results at the program level should be shared with sufficient time
to guide decisions for the coming year, and guidance should be
provided on how to use the results at the program level.
Communicating with Parents
A further issue of importance in planning for the implementation of early childhood assessments is whether informed consent
is required of parents and how they will be informed of results.
Mathematica Policy Research (2006) reports that in the representative sample of Head Start programs studied to document
implementation of the NRS, nearly all programs had informed
parents that their children would be participating in the assessments. However, there was ambiguity as to whether informed
consent was needed. In the second year of implementation, in
this sample, two-thirds of programs had obtained written consent
from parents. This represented a substantial increase over the
proportion of programs collecting written consent in the first year
of implementation.
Thus, in preparing for administration of early childhood
assessments, a clear decision should be made about a requirement to obtain informed consent from parents, and it should be
A report of the spring 2006 NRS administration was published in 2008 and
received too late for inclusion here.
tion to the assessor was associated with higher scores for children
ages 6 to 9 on a measure of receptive vocabulary assessed in the
home as part of the National Longitudinal Survey of Youth-Child
Supplement. This finding suggests that a familiar presence may
help the child relax and focus during an assessment. It is also possible that the causal direction works in the opposite way, and that
children who have closer, more supportive, and stimulating relationships with parents (and therefore may tend to score higher on a vocabulary assessment) also tend to have parents who want
to stay with them and monitor a situation with an unfamiliar
adult present. In addition, in this study, when there was a match
between the child's and the assessor's race, the race-related gap
in assessment scores on measures of vocabulary, reading, and
mathematics was significantly reduced.
Counterbalancing these findings are reports from the study
by Mathematica Policy Research (2006) indicating that familiarity of the assessor and child can also pose difficulties. In the
small but representative sample of Head Start programs in which
implementation of the NRS was studied, teachers were used as
assessors in 60 percent of programs. Furthermore, teachers were
often permitted to assess the children in their own classes (this
was reported in 75 percent of programs that used teachers as
assessors). According to the report, teacher assessors sometimes
became frustrated when they felt that the child was responding
incorrectly, because the teacher felt that the child knew the correct
answer to an assessment question (for example, that the child could name more letters than he or she answered correctly on the letter-naming task).
Teachers sometimes felt uncomfortable with the standardization
required for the assessments, especially not being able to provide
praise when the child performed well. Some children also reportedly became concerned because of the discrepant behavior of their
teachers in not providing positive feedback.
Systematic study of the effects of familiarity of the assessor
on childrens assessment scores would make an important contribution. While evidence to date concerns variation in childrens
scores and reactions to the assessment situation when familiarity
with the assessor has varied naturally (that is, at the decision of
families or programs regarding who should be present during
an assessment), an important next step would be to randomly
One recent study examined variations in childrens performance associated with session length on the assessments carried
out for the Preschool Curriculum Evaluation Research Study
(PCER; Rowand et al., 2005). While the FACES early childhood
assessments and the assessments carried out for the Head Start
Impact Study required about 20 minutes to administer, and
the NRS took approximately 15 minutes, the PCER assessment
battery was substantially longer, requiring about 60 minutes.
Because the PCER study was designed to evaluate the full range
of impacts of different early childhood curricula, it was important
that multiple domains of development be assessed. However, an
important question was whether the longer assessment had implications for the children's performance.
Rowand et al. (2005) found that children who took longer to
complete the PCER assessments scored higher, probably because
these children were administered more items to reach their ceiling.
These researchers also asked whether children generally scored as
well on subtests focusing on literacy that were administered earlier versus later in the assessment battery. They found that 63 percent of children showed consistent performance on the early- and
late-administered literacy assessments. The 37 percent of children
whose performance varied on earlier versus later subtests of the
same domain, however, included 21 percent who scored worse
as the assessment proceeded (perhaps reflecting fatigue with the
long assessment) but 16 percent who scored better on the related
assessment carried out later in the session. In a sample of 1,168
preschool-age children, 228 needed two sessions instead of one
to complete the assessment. Performance on four key outcomes
did not differ significantly according to the number of sessions
required to complete the assessment. However, interviewers rated
children as more persistent, more likely to sit still, and less likely
to make frequent comments if they completed the assessment in
one session. These results suggest that long assessment batteries
may be difficult for some young children to complete, and that it
is important to train assessors to identify when to take breaks or
split administration. The authors of this study note the need for
a random assignment study in which children are assigned to
complete the same battery of assessments in one versus two sessions. This would eliminate issues of self-selection in the research
The conceptual scoring on the receptive vocabulary assessment is intended to acknowledge that children learning English
may have mastered particular words in one or another language,
giving the child the opportunity show mastery of vocabulary
across languages. This matches with the purpose noted above
of assessing overall mastery of concepts and vocabulary rather
than vocabulary in a particular language, an approach that will
not be appropriate if the underlying purpose is to assess retention of home language or progress in English. The important
point to note here is that the range of options for routing and of
approaches to assessment for children learning English is expanding and will enable better matching with the underlying purpose
of assessment.
Order of Administration
Questions about the order of administration of assessments
for children learning English arose in the initial year of the NRS
and resulted in a change in practice (Mathematica Policy Research,
2006). In the first year, all children receiving the assessment in
both Spanish and English started with the English assessment.
However, there was feedback that this was discouraging to children whose mastery of English was still limited. There was concern that scores on the Spanish language assessment were being
affected by these childrens initial negative experience with the
English assessment.
In the second year of administration, the order of administration was reversed, so that the Spanish version of the assessment
was always to be given first to children receiving the assessment
in both Spanish and English. Interestingly, this too caused some
problems, particularly in the spring administration. By this point,
children who were accustomed to speaking only in English in
their Head Start programs were not always comfortable being
assessed in Spanish. According to Mathematica Policy Research,
the children's discomfort may have arisen for several different
reasons: they may have been taught not to speak Spanish in
their Head Start programs, their Spanish may never have been
very strong, or their Spanish may have been deteriorating. There
were also some observed deviations from the sequencing of the
assessments in the small observational study of assessments conducted in both Spanish and English. Three of 23 programs that
participated in this study were observed continuing to administer
the assessment in English prior to the Spanish version after the
change in guidelines for administration.
These findings indicate that when assessments are to be administered in two languages, the order of administration is not an easy decision to make, because either ordering raises potential issues. Decisions about ordering may need
to take into account the nature and goals of the early childhood
program, especially whether the primary goal is to maintain two
languages or to introduce English. There is a need for systematic
study of whether scores for young children learning English
vary according to order of administration of home language and
English versions of assessments.
Length of Administration
The NRS implementation study found that administration
of the Spanish assessment took several minutes longer than the
English assessment (18.6 compared with 15.8 minutes). In addition, children who received the assessments in two languages had
to spend double the time or a little more in the assessment situation. The guidance that sites received was to try to administer both
assessments the same day, but to reserve the English language
assessment for another day if the child seemed bored or tired.
Interviews with program staff about their experiences in administering the NRS assessment indicated concern with the burden
to Spanish-speaking children of taking the assessment in two
languages (Mathematica Policy Research, 2006). There is a need
for systematic study of whether childrens assessment scores are
related to whether assessments in two languages are conducted as
part of a single session or broken up into two sessions.
Availability of Bilingual Assessors and Trainers
A further issue may be finding assessors who are sufficiently
bilingual to administer assessments in both Spanish and English.
Although the study conducted of assessments in both Spanish and
example, to ascertain whether an aide should be present, whether children need to take frequent breaks, or whether it is important to confirm
that hearing aids or other assistive devices are working properly.
It is possible that certification on assessments could include a
requirement to tape an assessment with a child who has a disability. Such a procedure would help to ensure that assessors are
aware of and are implementing appropriate practices for children
with special needs.
In the small study of NRS implementation, 30 of 35 programs reported carrying out assessments with children with
disabilities. Staff in these programs usually indicated that they
were comfortable with the accommodations made for these
children. However, about one in six programs would have liked
additional information on when to include children with disabilities in the assessment process and when to exempt them and
on the kinds of accommodations that were appropriate during
the assessments. Some direct observations of assessments carried
out as part of the study indicated that children who could have
been exempted were nonetheless being assessed. These findings
suggest that in implementing a system of early childhood assessments, it is a high priority to articulate clearly the decision rules
for including children with disabilities in the assessments as well
as to provide appropriate training for assessors on the use of
accommodations.
Following Up on Administration
Guiding the Use of Information from Assessments
Key implementation decisions for a system of early childhood assessments do not stop once the assessments have been
administered and the data analyzed and summarized. Decisions
have to be made about how assessment results will be reported
back to programs and program sponsors/funding agencies, and
what guidance will be provided on how programs should use the
information from the assessments. Fundamental decisions need to
be made about how results will be used if the purpose of carrying
out assessments is for program monitoring and evaluation or for
high-stakes purposes.
Part IV
Assessing Systematically
10
Thinking Systematically
In this volume we have discussed the dimensions of assessment, including its purposes, the domains to be assessed, and
guidelines for selecting, implementing, and using information
from assessments. Beyond this, however, one cannot make use
of assessments optimally without thinking of them as part of a
larger system. Assessments are used in the service of higher level
goals: ensuring the well-being of children and their families,
ensuring that societal resources are deployed productively,
distributing scarce educational or medical resources equitably,
facilitating the relevance of educational outcomes to economic
challenges, making informed decisions about contexts for the
growth and development of children, and so on. Assessments by
themselves cannot achieve these higher goals, although they are
a crucial part of a larger system designed to address them. Only
when the entire system is considered can reasonable decisions
about assessment be made.
This chapter argues that early childhood assessment needs to
be viewed not as an isolated process, but as integrated in a system that includes a clearly articulated higher level goal, such as
optimal growth, development, and learning for all children; that
defines strategies for achieving the goal, such as adequate funding, excellent teaching practices, and well-designed educational
environments; that recognizes the other elements of infrastructure
instrumental to achieving the goal, such as professional development and mechanisms for monitoring quality in the educational
environment; and that selects assessment instruments and procedures that fit with the other elements in service of the goal. We
begin by noting the multiple state and federal structures in which
early childhood assessments are being implemented.
These structures have emerged from different sources with
different funding streams (e.g., federally funded Head Start, state-funded prekindergarten, foundation-funded intervention programs) and rarely display complete convergence of performance
standards, criteria, goals, or program monitoring procedures.
Thus, referring to a larger system of early care and education
is slightly deceptive, or perhaps aspirational. Furthermore, even
the well-established programs in the system may lack key
components; for example, they may assess child outcomes but
not relate those outcomes to measures of the environment, or they
may not have a mechanism in place for sharing child outcome
data in helpful ways with caregivers and teachers.
We use recent National Research Council reports, state experiences with the No Child Left Behind Act, and the recent work
of the Pew Foundation-sponsored National Early Childhood Accountability Task Force, a national effort focused on accountability in early childhood, as a basis for articulating the components needed in order for early childhood assessment to be part
of a fully integrated system. We also provide some examples of
progress toward this goal at the state level. Although we did not
find any examples of fully integrated systems, in which services
are provided by a single source and the assessment infrastructure
is fully aligned and developed, the three states we describe are
moving toward integrating early childhood assessment in a well-articulated system.
WHAT DO WE MEAN BY A SYSTEM?
The idea of a system comes up often in education discussions and analyses: there are education systems, instructional systems, assessment systems, professional development systems. But it is not always clear what the word actually means. Systems have a number of important features, which are enumerated in Systems for State Science Assessment (National Research Council, 2006).
assistance in implementing instructional activities. These two subsystems, the individual- and the program-level feedback from child performance to teacher supports, function well as part of a
larger system if the same or consistent information is used in both
loops. However, if, for example, the teacher is responding to child
performance so as to enhance creative problem solving, whereas
the institution is encouraging teachers to focus on children's rote
memorization capacity, then the subsystems conflict and do not
constitute a well-functioning system.
In a well-designed program, the assessment subsystem is
part of a larger system of early childhood care and education
composed of multiple interacting subsystems. These other subsystems include the early learning standards, which describe what
young children should know and be able to do at the end of the
program; the curriculum, which describes the experiences and
activities that children will have; and the teaching practices,
which describe the conditions under which learning should take
place, including interactions among the teachers and children as
well as the provisioning and organization of the physical environment (National Research Council, 2006). The relationships among
these four subsystems are illustrated in Figure 10-1, adapted
from the curriculum, instruction, assessment (CIA) triangle
commonly cited in the educational assessment community. Each
of these subsystems is also affected by other forces, for example,
laws intended to influence what children are expected to learn,
professional development practices, and teacher preparation
policies influenced by professional organizations and accrediting agencies. We argue in this chapter that all these components
must be thought of as part of a larger system, and that they must
be designed so as to be coherent with one another, as well as with
the policy and education system they are a part of, and with the
goals for child development that the entire system is meant to be
promoting. We reframe these arguments as a conclusion to this
chapter.
Infrastructure for an Assessment System
An early childhood assessment subsystem should be part of
a larger system with a strong infrastructure that is designed to
org/projects/SCASS/projects/early_childhood_education_
assessment_consortium/publications_and_products/2838.cfm).
It defines standards as "widely accepted statements of expectations for children's learning or the quality of schools and other programs." Of critical importance in this definition is the inclusion of program standards on equal footing with expectations for children's learning.
The report Systems for State Science Assessment (National
Research Council, 2006) examines the role of standards in certain
educational assessments and recommends that they be designed
with a list of specific qualities in mind: standards should be clear,
detailed, and complete; be reasonable in scope; be correct in their
academic and scientific foundations; have a clear conceptual
framework; be based on sound models of learning; and describe
performance expectations and proficiency levels. State standards
that have been developed for K-12 education do not meet these
requirements as a whole, although some come closer than others.
Recent analyses of states' early childhood standards also suggest
some misunderstanding of the difference between content and
performance (Neuman and Roskos, 2005; Scott-Little, Kagan, and
Frelow, 2003a). Appendix C presents a brief description of the current status of state standards for early childhood education, and
includes some discussion of the efforts to align early childhood
with K-12 standards.
Standards should be arranged and detailed in ways that
clearly identify what children need to know and be able to do
and how their ideas and skills will develop over time. Learning
progressions (also called learning trajectories) and learning
performances are two useful approaches to arranging and detailing standards so as to guide curriculum, teaching practices, and
assessment.
Learning progressions are descriptions of successively more
sophisticated ways of thinking and behaving that tend to follow
one another as children mature and learn: they lay out in text and
through examples what it means to move toward more mature
understanding and performance.
A useful example of the ideas of learning progressions and
learning performances in the preschool years is California's Desired Results Developmental Profiles-Revised (DRDP-R).
FIGURE 10-2 An excerpt from the Desired Results Developmental Profile-Revised (Preschool, SOC 4 of 6, Measure 6). Definition: child interacts with other children through play that becomes increasingly cooperative and oriented towards a shared purpose. Developmental levels shown: Exploring, Developing, Building, and Integrating (at the Integrating level, the child leads or participates in planning cooperative play with other children). Examples: plays with blocks with another child; plays in sand to build a castle with several other children; joins another child to help look for a lost toy. Reprinted by permission from the California Department of Education, CDE Press, 1430 N Street, Suite 3207, Sacramento, CA 95814.
Reporting
The reporting of assessment results is frequently taken for
granted, but deliberation on this step is essential in the design of
assessment systems and for the sound use of assessment-based
information. In fact, decisions about the scope and targets of
reporting should be made before assessment design or selection
proper begins, and, most importantly, before the assessment data
themselves are collected (National Research Council, 2006).
Information about children's progress is useful for all tiers
of the system, although different tiers need varying degrees of
assessment frequency and varying degrees of detail. Parents,
teachers, early childhood program administrators, policy makers,
and the public need comprehensible and timely feedback about
what is taking place in the classroom (Wainer, 1997). Furthermore,
taking a systems perspective, many kinds of information need
to be accessible, but not all stakeholders need the same types of
information. Thus, very early in the process of system design,
questions need to be asked about how various types of information will be accessed and reported to different stakeholders and
how that reporting process can support valid interpretations.
Individual standards or clusters of standards can define the
scope of reporting, as can learning progressions if they have been
developed and made clear to the relevant audiences. Reports
can compare one child's performance, or the performance of a
group, with other groups or with established norms. They can
also describe the extent to which children have met established
criteria for performance (the current No Child Left Behind or
NCLB option). If descriptions of the skills, knowledge, and abilities that were targeted by the tasks in the assessment are included,
users will be better able to interpret the links between the results
and goals for children's learning. It is important to recognize that
many states lack the resources to design assessments that are
perfectly aligned with their standards. They may have to resort
to selecting existing assessments and cross-walking them to standards. While this may lead to a period of only partial alignment,
the exercise leads to useful opportunities to refine both standards
and assessment portfolios.
The reporting of assessment outcomes can take on many
The process described here may go beyond the resources available in many programs. In particular, some programs may need to
rely on selecting existing assessment tools and reporting strategies
rather than developing new ones. Nonetheless, we describe here
an ideal toward which programs should be moving.
The Current Landscape of Early Childhood Systems
An analysis of a systems approach for early childhood assessment starts with the somewhat utopian view presented in the
previous section, but it also requires careful review of the current
terrain: How are current early childhood assessment efforts linked
to standards, learning opportunities, or both? The early childhood landscape reveals multiple forms and targets of service and
assessment, varied sources of standards and mandates, numerous
ways of reporting and using data, and different approaches to
linking consequences with patterns of performance by children
and programs (Gilliam and Zigler, 2004); in other words, it is
at this moment very far from constituting a single system. The
National Early Childhood Accountability Task Force (2007) concluded that early childhood agencies are implementing a great
variety of child and program assessments.
Table 10-1 displays nine different forms of child and program assessments, including four forms of assessment used to
document the quality of early childhood programs, four forms
of assessments of young children, and one form of assessment
that gathers information on both program quality and children's
learning. Each form carries its own distinctive purposes, its procedure for reporting to different audiences, and its specific ways
of using assessment data. Taken together, these multiple assessments are generating many different types of data on children and
programs. They also require substantial time and effort from local
practitioners and program administrators (National Early Childhood Accountability Task Force, 2007).
Beyond drawing attention to the large number of different
forms of assessment, the Accountability Task Force Report notes
that current assessment models, with the single exception of program evaluation studies, separate reports about child outcomes
TABLE 10-1 Forms of Child and Program Assessments

Form of Assessment | Population Assessed | Uses of Data

Program Assessments
Quality rating systems | Providers seeking recognition for varied levels of quality | Consumer information on quality status; higher reimbursement rates for higher quality; program improvement
Program accreditation | Providers seeking recognition as above a threshold of quality | Consumer information on quality status; program improvement
Program monitoring | Providers receiving state/federal program funding | Program improvement; funding decisions
Program licensing | | Determine compliance with health and safety standards

Child Assessments
Kindergarten readiness assessment | All children at kindergarten entry | Report to public; planning early childhood investments
State/federal pre-K child assessments | Children enrolled in a state or federal program | Reporting to funding sources
Assessment for instruction | All children | Planning curriculum; informing parents
Developmental screening | All children |

Program evaluation | Representative samples of children and local programs | Report to legislatures and the public on program quality, outcomes, impacts; informs program improvement and appropriations decisions
 | Child Care | Head Start | State Pre-K | Early Childhood Special Education
Standards for children's learning | Early learning guidelines (49 states) | Head Start Child Outcomes Framework (federal) | Early learning guidelines (49 states) | 3 functional goals (federal)
Child assessments | No current requirements | National Reporting System* (federal) | Pre-K assessments (12 states); kindergarten assessments (16 states) | States report percent of children in 5 categories on 3 goals

*The National Reporting System was discontinued after this table was published.
SOURCE: National Early Childhood Accountability Task Force (2007).
not only for the required state and local reporting functions, but
also for ongoing program improvement and curriculum planning.
Nebraska's system is responsive to the federal mandate of the
IDEA Part C (birth to age 3) and Part B, 619 (ages 3 to 5), as well
as the state requirements of Nebraska Department of Education
Rule 11, Regulations for Early Childhood Programs (http://www.
nde.state.ne.us/LEGAL/RULE11.html), which apply to all pre-K
programs operated through public schools.
Program Quality Assessment
The system also includes regular evaluation of programs
to ensure that they achieve and maintain overall high quality,
employ qualified staff, and operate in compliance with federal and state guidelines. Programs receiving state funding are
required to conduct an annual evaluation using one of the environment rating scales, such as the Infant/Toddler Environment
Rating Scale-Revised, ITERS-R (Harms, Clifford, and Cryer, 1998);
Early Childhood Environment Rating Scale-Revised, ECERS-R
(Harms, Cryer, and Clifford, 1990); or the Early Language and
Literacy Classroom Observation, ELLCO (Smith and Dickinson,
2002), and complete Nebraska's Rule 11 reporting and approval
processes. Data obtained from these tools are used to develop
improvement plans. In addition, programs are strongly encouraged to participate in the accreditation process of the National
Association for the Education of Young Children and receive
technical and financial assistance to do so.
Professional Development
Programs receive continuous support to ensure that their
participation in Results Matter does generate the highest quality
data and knowledge about how to use it to improve program
quality and child and family outcomes. The state's Early Childhood Training Center, in cooperation with the organizations that
provide the program and child assessment tools, regularly offers
training in their use. The state maintains a cadre of professionals
who have achieved reliability in the use of the environment rating
scales. In addition, each program provider is required to submit
equivalent opportunity to achieve the defined goals, and the allocation of resources should reflect those goals. We emphasize that
a system of assessment is only as good as the effectiveness, and the coherence, of all of its components.
11
Guidance on Outcomes
and Assessments
can lead to decisions that are unfair or unclear, and they may do
harm to programs, teachers, and, most importantly, children.
In this chapter, we present a set of guidelines that should be
useful to a broad range of organizations charged with the assessment of children and of programs providing care and education
to young children. These guidelines are organized around the
major themes of the report and flow from the perspective that
any assessment decision should be made in the context of a
larger, coherent assessment system, which is in turn embedded
in a network of medical, educational, and family support systems
designed to ensure optimal development for all children.
Thus, though we briefly recap our rationale, based on our
review of the literature, and present our guidelines following the
order of topics in the volume, we hope the reader interprets our
discussion of purposes, targets, and procedures for assessment as
different specific topics subordinated to the notion of an assessment
system. In compliance with our charge, we have also included a
section presenting a recommended agenda for research on the
assessment of young children, following the detailed guidelines.
These guidelines should be useful to anyone contemplating the selection or implementation of an assessment for young children, including medical and educational service providers; classroom practitioners; federal, state, and local governments and private agencies operating or regulating child care and early childhood education programs; and those interested in expanding the knowledge base about child development and the conditions of childhood. To make our guidance more pointed and practical,
the chapter ends with a list of high-priority actions by members
of specific groups engaged in the assessment of young children,
which can be taken quickly and should provide maximum
payoffs.
Purposes and Uses of Assessment
Rationale
In recent years, the purposes for which young children are
being assessed have expanded, with more children being assessed
than ever before. Young children have been assessed to screen for
date assessments for the desired purpose and for use with
all the subgroups of children to be included. Although the
same measure may be used for more than one purpose,
prior consideration of all potential purposes is essential, as
is careful analysis of the actual content of the assessment
instrument. Direct examination of the assessment items is
important because the title of a measure does not always
reflect the content.
Domains and Measures of Developmental Outcomes
Rationale
During infancy and toddlerhood in particular, frequently
assessed domains include those implicated by the agenda of
screening for medical, developmental, or environmental risk.
Across the entire preschool period, a critical issue is what aspect
of young children's skills or behavior to measure. Research on
the developing child has traditionally conceived of development
as proceeding in different domains, for example, language or
motor or socioemotional development. These distinctions have
served science well and are helpful for assessment purposes, but
in reality the distinctions among children's skills and behaviors
are somewhat artificial and not as clear-cut as the organization
of research or assessment tools would suggest. Developmental
domains are intertwined, especially in the very young child,
making it challenging or even impossible to interpret measures in
some domains without also measuring the influence of others.
Health, socioemotional functioning, and cognitive functioning are closely interconnected in infancy, as for example when
sleeping difficulties affect both socioemotional and cognitive
functioning. For somewhat older preschoolers, the domains may
be more readily differentiated operationally and theoretically, but
they remain interdependent; for example, socioemotional (e.g.,
capacity to regulate negative emotion) and cognitive measures are
interrelated and appear to have linked neural bases.
Nevertheless, a conceptualization is needed that identifies the
areas of development society wants to track and that programs
conditions like hunger or fatigue, and to recognizing the possibility of bias if the tester is a caregiver or otherwise connected to the
child. Instruments that have the most user-appeal often do not
have the best psychometric properties. For example, portfolios
of children's artistic productions contain rich information but are
hard to rate reliably. In the experience of committee members,
selection of instruments is often more influenced by cost, by ease
of administration, and by use in other equivalent programs than
by the criteria proposed here.
Those charged with selecting assessment instruments need
to carefully review the information provided in the instrument's
technical manual. Although test publishers may provide extensive psychometric information about their products, additional
evidence beyond that provided in manuals should also be considered in instrument selection. Those selecting assessments
should be familiar with the assessment standards contained in
the standards document produced by the American Educational
Research Association, American Psychological Association, and
National Council on Measurement in Education (1999). Important
questions to ask are: Has this assessment been developed and
validated for the purpose for which it is being considered? If a
norm-referenced measure is being considered, has the assessment
been normed with children like those with whom it will be used?
For example, if the assessment is to be used as part of a program
evaluation with minority children, were similar children included in
the development studies, including any norming studies? There
is typically more robust evidence for inferences based on early
childhood measures when used for normally developing, white,
English-speaking children than for children from ethnic or language minorities or children with disabilities. Validity evidence
is quite sparse for these special groups on most extant measures.
Conducting valid assessments with language-minority children
and children with special needs is especially challenging, and the
reader is referred to Part III for more discussion of these topics. As
explained in Chapter 7, one cannot say that measurement instruments either possess or lack validity; rather, inferences from the
use of particular measurements for particular purposes may be
supported or not supported by validity evidence.
There are many special considerations when using existing
assessment subsystem within a larger system of early childhood care and education.
(S-2) A successful system of assessments must be coherent in a
variety of ways. It should be horizontally coherent, with the
curriculum, instruction, and assessment all aligned with
the early learning and development standards and with the
program standards, targeting the same goals for learning,
and working together to support children's developing
knowledge and skill across all domains. It should be vertically coherent, with a shared understanding at all levels of
the system of the goals for children's learning and development that underlie the standards, as well as consensus
about the purposes and uses of assessment. It should be
developmentally coherent, taking into account what is known
about how children's skills and understanding develop over
time and the content knowledge, abilities, and understanding that are needed for learning to progress at each stage of
the process. The California Desired Results Developmental
Profile provides an example of movement toward a multiply coherent system. These coherences drive the design of
all the subsystems. For example, the development of early
learning standards, curriculum, and the design of teaching
practices and assessments should be guided by the same
framework for understanding what is being attempted
in the classroom that informs the training of beginning
teachers and the continuing professional development of
experienced teachers. The reporting of assessment results
to parents, teachers, and other stakeholders should also be
based on this same framework, as should the evaluations of
effectiveness built into all systems. Each child should have
an equivalent opportunity to achieve the defined goals, and
the allocation of resources should reflect those goals.
(S-3) Following the best assessment practices is especially crucial
in cases in which assessment can have significant consequences for children, teachers, or programs. The NRC
report High Stakes: Testing for Tracking, Promotion, and Graduation (National Research Council, 1999) urged extreme
caution in basing high-stakes decisions on assessment outcomes, and we conclude that even more extreme caution
allow growth to be tracked on the same assessment, even if children are performing significantly below their age peers.
Recently developed tools for examining social-emotional
development need further work to generate evidence about their
reliability, validity, and sensitivity to intervention approaches.
More work is needed to develop key constructs within the domain
of approaches to learning, as well as tools to measure those constructs and their role in children's learning and development. The
shortcomings of current measures, especially standardized norm-referenced measures for young children and those with special
needs, have been extensively documented, yet it is precisely these
kinds of measures that are often employed in large-scale data
collections. New measures are needed that accurately capture
children's growth toward being able to meaningfully participate
in the variety of settings that make up their day-to-day lives.
Research is needed on how to effectively use technology in
all forms of early childhood assessment. Some assessments currently provide for online entry of data and computerized scoring
and automatic report generation, but more work is needed. More
research is needed on the use of computer adaptive procedures
for establishing floor and ceiling levels, to allow more in-depth assessment at the child's current performance level. Computer-adaptive assessment could be applicable to both direct and observation-based measures.
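To make the idea concrete, the following minimal sketch (not drawn from any instrument discussed in this report) illustrates one way a computer-adaptive procedure could step item difficulty up or down to bracket a child's current performance level; the item bank, step rule, and stopping rule are all hypothetical assumptions.

```python
# Minimal illustrative sketch of adaptive floor/ceiling estimation (hypothetical).
# administer_item is a callback that presents an item and returns True if the
# child responds correctly; items_by_level maps difficulty levels to items.

def adaptive_level_search(administer_item, items_by_level, start_level, max_items=12):
    """Staircase search: move up one difficulty level after a correct response,
    down one level after an error, and stop after three reversals of direction."""
    levels = sorted(items_by_level)          # difficulty levels, easiest to hardest
    idx = levels.index(start_level)
    responses = []                           # (level, correct) pairs
    reversals, last_direction = 0, None

    for _ in range(max_items):
        level = levels[idx]
        correct = administer_item(items_by_level[level])
        responses.append((level, correct))
        direction = 1 if correct else -1
        if last_direction is not None and direction != last_direction:
            reversals += 1                   # the child's threshold was crossed again
        last_direction = direction
        if reversals >= 3:                   # hypothetical stopping rule
            break
        idx = min(max(idx + direction, 0), len(levels) - 1)

    passed = [lvl for lvl, ok in responses if ok]
    failed = [lvl for lvl, ok in responses if not ok]
    floor = max(passed) if passed else levels[0]      # highest level passed
    ceiling = min(failed) if failed else levels[-1]   # lowest level failed
    return floor, ceiling
```

Operational computer-adaptive tests typically select items from a calibrated item response theory model rather than a fixed staircase, but the sketch conveys why adaptivity concentrates testing near the child's own performance level instead of administering many items that are far too easy or far too hard.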
For the Improvement of Screening
Research is needed to validate screening tools for the full
range of children represented in early childhood programs. There
is a need to continue to collect information on who currently conducts screenings, including consideration of the barriers working
against more widespread screening. There is a need for information on how many children are screened, how many fail the screen, how many receive follow-up testing, and how many receive treatment or intervention when a problem is verified. (Newborn hearing screening data are a model
for this; the dismal results on measures of follow-up have become
clear only because the data were systematically collected.)
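As a purely illustrative aid, the short sketch below tabulates the kind of screening follow-up funnel called for above, reporting each stage as a share of the stage before it; every count and category name here is invented and does not come from any study cited in this report.

```python
# Hypothetical screening follow-up counts; every number here is invented.
funnel = [
    ("screened", 10000),
    ("failed screen", 800),
    ("received follow-up testing", 480),
    ("problem confirmed", 300),
    ("received treatment or intervention", 210),
]

def follow_up_rates(stages):
    """Express each stage as a proportion of the preceding stage, the kind of
    attrition summary that made gaps in newborn hearing screening follow-up visible."""
    rates = []
    for (_, prev_n), (name, n) in zip(stages, stages[1:]):
        rates.append((name, n / prev_n if prev_n else float("nan")))
    return rates

for name, rate in follow_up_rates(funnel):
    print(f"{name}: {rate:.0%} of the preceding stage")
```

Keeping the stages explicit in this way makes the drop-off between a failed screen and completed follow-up immediately visible, which is the point of collecting such data systematically.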
More research documenting the current scenarios for the assessment of young ELLs across the country is needed, including more
work to evaluate assessment practices in various localities; survey
research and observational approaches to document practices in
preassessment and assessment planning, conducting the assessment,
analyzing and interpreting the results, reporting the results (in written and oral format), and determining eligibility and monitoring; and
a focus on the development of strategies to train professionals with
the skills necessary to serve young ELL children.
Research is needed to develop assessment tools normed especially for young English language learners using a bottom-up
approach, so that assessment tools, procedures, and constructs
assessed are aligned with cultural and linguistic characteristics
of ELL children.
Children with Special Needs
More research is needed on what the various practitioners who assess young children with special needs (early interventionists, special education teachers, speech therapists, psychologists, etc.) actually do.
More research is needed on the use of accommodations with
children with disabilities. What are appropriate guidelines for
decision making about what kind of accommodations to use with
what kind of child under what conditions?
Research is needed on the impact of accommodations on the
validity of the assessment results.
Accountability and Program Quality
There is a need for the development of assessment instruments designed for the purpose of accountability and program
evaluation. Instruments that are developed for federal studies
such as the Early Childhood Longitudinal Study, Kindergarten-First Grade Waves (ECLS-K), or national studies of Head Start should become publicly available, so they can be used by others.
There is a need for research on the implementation of accountability systems and the tracking of positive and negative consequences at all levels of the system:
both are correct and that both are limited. The final version of the
report, thus, explicitly does not take the position that assessment
is here to stay and we'd better learn to live with it. Rather, it takes
the position that assessments can make crucial contributions to
the improvement of childrens well-being, but only if they are
well designed, implemented effectively and in the context of systematic planning, and interpreted and used appropriately. Otherwise, assessment of children and programs can have negative
consequences for both. We conclude that the value of assessments
themselves cannot be judged without attention to the design of
the larger systems in which they are used.
References
SUMMARY
National Education Goals Panel. (1995). Reconsidering children's early development
and learning: Toward common views and vocabulary. Washington, DC: Author.
Chapter 1
National Research Council. (2001). Eager to learn: Educating our preschoolers.
Committee on Early Childhood Pedagogy, B.T. Bowman, M.S. Donovan,
and M.S. Burns (Eds.). Commission on Behavioral and Social Sciences and
Education. Washington, DC: National Academy Press.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Chapter 2
Brown, G., Scott-Little, C., Amwake, L., and Wynn, L. (2007). A review of methods
and instruments used in state and local school readiness evaluations. (Issues and
Answers Report, REL 2007-No. 004.) Washington, DC: U.S. Department of
Education, Institute of Education Sciences, National Center for Education
Evaluation and Regional Assistance, Regional Educational Laboratory
Southeast.
Chapter 3
Ackerman, D.J., and Barnett, W.S. (2006). Increasing the effectiveness of preschool
programs. Preschool Policy Brief, 11.
American Educational Research Association. (2005). Letter to Congress. Washington,
DC: Author.
Council of Chief State School Officers and Early Childhood Education Assessment
Consortium. (2003a). Key considerations: Building a system of standards to support
successful early learners: The relationship between early learning standards, program
standards, program quality measures and accountability. Washington, DC:
Author.
Council of Chief State School Officers and Early Childhood Education Assessment
Consortium. (2003b). Matrix of state early learning standards. Washington, DC:
Author.
Council of Chief State School Officers and Early Childhood Education Assessment
Consortium. (2007). The words we use: A glossary of terms for early childhood
education standards and assessment. Available: http://www.ccsso.org/Projects/
scass/projects/early_childhood_education_assessment_consortium/
publications_and_products/2892.cfm [accessed February 2008].
Gilliam, W.S., and Zigler, E.F. (2001). A critical meta-analysis of all impact
evaluations of state-funded preschool from 1977 to 1998: Implications for
policy, service delivery and program evaluation. Early Childhood Research
Quarterly, 15, 441-473.
Hatch, J.A. (2001). Accountability shove down: Resisting the standards movement
in early childhood education. Phi Delta Kappan, 83(6), 457-462.
High/Scope Educational Research Foundation. (2002). Making validated educational
models central in preschool standards. Ypsilanti, MI: Author.
High/Scope Educational Research Foundation. (in press). Michigan school
readiness program evaluation, grades 6-8 follow-up. Ypsilanti, MI: Author.
Kagan, S.L., Moore, E., and Bredekamp, S. (1995). Reconsidering children's early
development and learning: Toward common views and vocabulary. National
Education Goals Panel Goal 1 Technical Planning Group. Washington, DC:
U.S. Government Printing Office.
McKey, R.H., and Tarullo, L. (1998). Ensuring quality in Head Start: The FACES
Study. The Evaluation Exchange, 4(1).
Meisels, S.J., Barnett, W.S., Espinosa, L., Kagan, S.L., et al. (2003). Letter
to U.S. representatives concerning implementation of the National Reporting
System. Available: http://www.nhsa.org/download/research/headstart
letterforsenate.doc [accessed July 2008].
National Association for the Education of Young Children. (2006). NAEYC
accreditation criteria. Available: http://www.naeyc.org/academy/NAEYC
AccreditationCriteria.asp [accessed February 2008].
National Association for the Education of Young Children and National
Association of Early Childhood Specialists in State Departments of Education.
(2002). Early learning standards: Creating the conditions for success. Washington,
DC: Author.
National Early Childhood Accountability Task Force. (2007). Taking stock: Assessing
and improving early childhood learning and program quality. Philadelphia:
Author.
National Institute for Early Education Research. (2005). The effects of the Michigan
School Readiness Program on young children's abilities at kindergarten entry.
Rutgers, NJ: Author.
National Research Council. (1998). Preventing reading difficulties in young children.
Committee on the Prevention of Reading Difficulties in Young Children, C.E.
Snow, M.S. Burns, and P. Griffin (Eds.). Commission on Behavioral and Social
Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (2001). Eager to learn: Educating our preschoolers.
Committee on Early Childhood Pedagogy, B.T. Bowman, M.S. Donovan,
and M.S. Burns (Eds.). Commission on Behavioral and Social Sciences and
Education. Washington, DC: National Academy Press.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Pre-kindergarten Standards Panel. (2002). Pre-kindergarten standards: Guidelines for
teaching and learning. New York: McGraw-Hill.
Shepard, L., Kagan, S.L., and Wurtz, E. (1998). Principles and recommendations
for early childhood assessments. National Education Goals Panel Goal 1
Early Childhood Assessments Resource Group. Washington, DC: National
Education Goals Panel.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2000). Head Start child outcomes framework. Washington, DC:
Author.
Part II
Kagan, S.L., Moore, E., and Bredekamp, S. (1995). Reconsidering children's early
development and learning: Toward common views and vocabulary. National
Education Goals Panel Goal 1 Technical Planning Group. Washington, DC:
U.S. Government Printing Office.
McCartney, K., and Phillips, D. (Eds.). (2006). Handbook of early childhood
development. Malden, MA: Blackwell.
National Education Goals Panel. (1995). Reconsidering children's early development
and learning: Toward common views and vocabulary. Washington, DC: Author.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Chapter 4
Als, H., Butler, S., Kosta, S., and McAnulty, G. (2005). The assessment of preterm
infants' behavior (APIB): Furthering the understanding and measurement
of neurodevelopmental competence in preterm and full-term infants. Mental
Retardation and Developmental Disability Research Review, 11, 94-102.
Als, H., Gilkerson, L., Duffy, F.H., McAnulty, G.B., Buehler, D.M., Vandenberg,
K., Sweet, N., Sell, E., Parad, R.B., Ringer, S.A., Butler, S.C., Blickman,
J.G., and Jones, K.J. (2003). A three-center, randomized, controlled trial of
individualized developmental care for very low birth weight preterm infants:
Medical, neurodevelopmental, parenting and caregiving effects. Journal of
Developmental and Behavioral Pediatrics, 24(6), 399-408.
American Academy of Pediatrics. (1996). Eye examination and vision screening
in infants, children and young adults. Pediatrics, 98(1), 153-157.
American Academy of Pediatrics. (2003). Iron deficiency. In Pediatric nutrition
handbook (5th ed.). Elk Grove, IL: Author.
American Academy of Pediatrics. (2005). Lead exposure in children: Prevention,
detection, and management. Pediatrics, 116(4), 1036-1046.
American Academy of Pediatrics, Committee on Children with Disabilities.
(2001). Developmental surveillance and screening of infants and young
children. Pediatrics, 108(1), 192-196.
American Academy of Pediatrics, Council on Children with Disabilities. (2006).
Identifying infants and young children with developmental disorders in the
medical home: An algorithm for developmental surveillance and screening.
Pediatrics, 118(1), 405-420.
Bagnato, S.J., Neisworth, J.T., Salvia, J.J., and Hunt, F.M. (1999). Temperament
and Atypical Behavior Scale (TABS): Early childhood indicators of developmental
dysfunction. Baltimore, MD: Brookes.
Baird, G., Charman, T., Baron-Cohen, S., Cox, A., Swettenham, J., Wheelwright, S.,
and Drew, A. (2000). Screening for autism at 18 months of age: 6-year follow-up study. Journal of the American Academy of Child and Adolescent Psychiatry,
39(3), 694-702.
Banks, E.C., Ferrittee, L.E., and Shucard, D.W. (1997). Effects of low-level
lead exposure on cognitive function in children: A review of behavioral,
neuropsychological and biological evidence. Neurotoxicology, 18, 237-281.
Bates, J.E., Freeland, C.A., and Lounsbury, M.L. (1979). Measurement of infant
difficultness. Child Development, 50, 794-803.
Beeghly, M., Brazelton, T.B., Flannery, K.A., Nugent, J.K., Barrett, D.E., and
Tronick, E.Z. (1995). Specificity of preventative pediatric intervention effects in
early infancy. Journal of Developmental and Behavioral Pediatrics, 16, 158-166.
Biondich, P.G., Downs, S.M., Carroll, A.E., Laskey, A.L., Liu, G.C., Rosenman, M.,
Wang, J., and Swigonski, N.L. (2006). Shortcomings in infant iron deficiency
screening methods. Pediatrics, 117(2), 290-294.
Blake, P.E., and Hall, J.W. (1990). The status of state-wide policies for neonatal
hearing screening. Journal of the American Academy of Audiology, 1, 67-74.
Fisher, E.S., and Welch, H.G. (1999). Avoiding the unintended consequences of
growth in medical care: How might more be worse? The Journal of the American
Medical Association, 281(5), 446-453.
Folio, M.R., and Fewell, R.R. (1983). Peabody developmental motor scales and activity
cards: A manual. Allen, TX: DLM Teaching Resources.
Freed, G.L., Nahra, T.A., and Wheeler, J.R.C. (2004). Which physicians are
providing health care to Americas children? Archives of Pediatrics and
Adolescent Medicine, 158, 22-26.
Gilliam, W.S., Meisels, S.J., and Mayes, L. (2005). Screening and surveillance in
early intervention systems. In M.J. Guralnick (Ed.), The developmental systems
approach to early intervention. Baltimore, MD: Brookes.
Glascoe, F.P. (2003). Parents' evaluation of developmental status: How well do parents' concerns identify children with behavioral and emotional problems?
Clinical Pediatrics, 42, 133-138.
Glascoe, F.P. (2005). Screening for developmental and behavioral problems.
Mental Retardation and Developmental Disabilities Research Reviews, 11(3),
173-179.
Glascoe, F.P., Martin, E.D., and Humphrey, S. (1990). A comparative review of
developmental screening tests. Pediatrics, 86, 5467-5554.
Grandjean, P., and Landrigan, P. (2006). Developmental neurotoxicity of industrial chemicals. The Lancet, 368(9553), 2167-2178.
Grantham-McGregor, S. (1984). Chronic undernutrition and cognitive abilities.
Human Nutrition-Clinical Nutrition, 38(2), 83-94.
Gravel, J.S., Fausel, N., Liskow, C., and Chobot, J. (1999). Childrens speech
recognition in noise using omni-directional and dual-microphone hearing aid
technology. Ear & Hearing, 20(1), 1-11.
Ireton, H. (1992). Child development inventory manual. Minneapolis, MN: Behavior
Science Systems.
Jacobs, S.E., Sokol, J., and Ohlsson, A. (2002). The newborn individualized
developmental care and assessment program is not supported by meta-analyses of the data. Journal of Pediatrics, 140, 699-706.
Johnson, J.O. (2005). Who's minding the kids? Child care arrangements: Winter,
2002. Current Population Reports (P70-101). Washington, DC: U.S. Census
Bureau.
Kaye, C.I., and the Committee on Genetics. (2006). Introduction to the newborn
screening fact sheets. Pediatrics, 118(3), 1304-1312.
Lanphear, B.P., Dietrich, K., Auinger, P., and Cox, C. (2000). Cognitive deficits associated with blood lead concentrations <10 µg/dL in US children and
adolescents. Public Health Reports, 115, 521-529.
Lanphear, B.P., Hornung, R., Ho, M., Howard, C., Eberli, S., and Knauf, D.K.
(2002). Environmental lead exposure during early childhood. Journal of
Pediatrics, 140, 40-47.
Lloyd-Puryear, M.A., Tonniges, T., van Dyck, P.C., Mann, M.Y., Brin, A., Johnson,
K., and McPherson, M. (2007). American Academy of Pediatrics Newborn
Screening Task Force recommendations: How far have we come? Pediatrics,
117(5 Pt. 2), S194-S211.
Lozoff, B., Jiminez, E., Hagen, J., Mollen, E., and Wolf, A.W. (2000). Poorer
behavioral and developmental outcome more than 10 years after treatment for
iron deficiency in infancy. Pediatrics, 105(4), e51. Available: http://pediatrics.
aappublications.org/cgi/content/abstract/105/4/e51 [accessed August
2008].
Lozoff, B., Andraca, I.D., Castillo, M., Smith, J.B., Walter, T., and Pino, P. (2003).
Behavioral and developmental effects of preventing iron-deficiency anemia in
healthy full-term infants. Pediatrics, 112(4), 846-854.
Mangione-Smith, R., DeCristofaro, A.H., Setodji, C.M., Keesey, J., Klein, D.J.,
Adams, J.L., Schuster, M.A., and McGlynn, E.A. (2007). The quality of
ambulatory care delivered to children in the United States. The New England
Journal of Medicine, 357(15), 1515-1523.
Mathematica Policy Research. (2003). Resources for measuring services and outcomes
in Head Start programs serving infants and toddlers. Princeton, NJ: Author.
McCormick, M. (2008). Issues in measuring child health. Ambulatory Pediatrics,
8(2), 77-84.
Moeller, M.P. (2000). Early intervention and language development in children
who are deaf and hard of hearing. Pediatrics, 106, e43. Available: http://
pediatrics.aappublications.org/cgi/content/full/106/3/e43 [accessed August
2008].
Morgan, A.M., and Aldag, J.C. (1996). Early identification of cerebral palsy using
a profile of abnormal motor patterns. Pediatrics, 98(4), 692-697.
National Center for Hearing Assessment and Management. (2007). Early hearing
detection and intervention (EHDI) resources and information. Available: http://
www.infanthearing.org/ehdi.html [accessed July 2008].
National Research Council. (2002). Visual impairments: Determining eligibility
for Social Security benefits. Committee on Disability Determination for
Individuals with Visual Impairments, P. Lennie and S.B. Van Hemel (Eds.).
Board on Behavioral, Cognitive, and Sensory Sciences, Center for Studies of Behavior
and Development, Division of Behavioral and Social Sciences and Education.
Washington, DC: National Academy Press.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Needleman, H.L., and Gatsonis, C.A. (1990). Low-level lead exposure and the
IQ of children: A meta-analysis of modern studies. The Journal of the American
Medical Association, 263(2), 673-678.
Newman, T.B., Browner, W.S., and Hulley, S.B. (1990). The case against childhood
cholesterol screening. The Journal of the American Medical Association, 264, 3039-3043.
Piper, M.C., and Darrah, J. (1994). Motor assessment of the developing infant.
Philadelphia: W.B. Saunders.
Putnam, S.P., and Rothbart, M.K. (2006). Development of short and very
short forms of the children's behavior questionnaire. Journal of Personality
Assessment, 87, 102-112.
Rutter, M., Bailey, A., and Lord, C. (2003). The social and communication questionnaire (SCQ) manual. Los Angeles, CA: Western Psychological Services.
Schulze, A., Lindner, M., Kohlmuller, D., Olgemoller, K., Mayatepek, E., and
Hoffman, G.F. (2003). Expanded newborn screening for inborn errors of
metabolism by electrospray ionization-tandem mass spectrometry: Results,
outcome, and implications. Pediatrics, 111, 1399-1406.
Shepherd, P.A., and Fagan, J.F. (1981). Visual pattern detection and recognition
memory in children with profound mental retardation. In N.R. Ellis (Ed.),
International review of research in mental retardation. New York: Academic Press.
Siegel, B. (2004). Pervasive developmental disorders screening test-II (PDDST-II). Early
childhood screener for autistic spectrum disorders. San Antonio, TX: Harcourt
Assessment.
Simpson, L., Owens, P., Zodet, M., Chevarley, F., Dougherty, D., Elixhauser,
A., and McCormick, M.C. (2005). Health care for children and youth in the
United States: Annual report on patterns of coverage, utilization, quality, and
expenditures by income. Ambulatory Pediatrics, 5(1), 45-46.
Stone, W.I., Coonrod, E.E., and Ousley, O.Y. (2000). Brief report: Screening tool for
autism in two-year-olds (STAT): Development and preliminary data. Journal of
Autism and Developmental Disorders, 30, 607-701.
Teti, D.M., and Gelfand, D.M. (1997). The preschool assessment of attachment:
Construct validity in a sample of depressed and nondepressed families.
Development and Psychopathology, 9, 517-536.
Tronick, E.Z. (1987). The Neonatal Behavioral Assessment Scale as a biomarker
of the effects of environmental agents on the newborn. Environmental Health
Perspectives, 74, 185-189.
U.S. Department of Health and Human Services, National Center for Health
Statistics. (1981). National health interview survey-1981 child health supplement.
Washington, DC: Author.
U.S. Preventive Services Task Force. (2001). Guide to clinical preventive services.
Washington, DC: Office of Disease Prevention and Health Promotion.
U.S. Preventive Services Task Force. (2004). Screening for visual impairment in
children younger than age 5 years: Update of the evidence. Rockville, MD: Agency
for Healthcare Research and Quality. Available: http://www.ahrq.gov/clinic/
uspstf/uspsvsch.htm [accessed July 2008].
U.S. Preventive Services Task Force. (2006). Screening for iron deficiency anemia, including iron supplementation for children and pregnant women. Washington, DC:
Office of Disease Prevention and Health Promotion.
Voigt, R.G., Brown, F.R., Fraley, J.K., Liorente, A.M., Rozelle, J., Turcich, M.,
Jensen, C.L., and Heird, W.C. (2003). Concurrent and predictive validity of
the Cognitive Adaptive Test/Clinical Linguistic and Auditory Milestone Scale
(CAT/CLAMS) and the mental developmental index of the Bayley Scales of
Infant Development. Clinical Pediatrics, 42(5), 427-432.
Wasserman, R., Croft, C., and Brotherton, S. (1992). Preschool vision screening
in pediatric practice: A study from the Pediatric Research in Office Settings
(PROS) network. Pediatrics, 89(5 Pt. 1), 834-838.
Wetherby, A.M., and Prizant, B.M. (2002). Communication and symbolic behavior
scales: Developmental profile. Baltimore, MD: Brookes.
Widerstrom, A. (1999). Newborns and infants at risk for or with disabilities. In A.
Widerstrom, B. Mowder, and S. Sandall (Eds.), Infant development and risk (2nd
ed., pp. 3-24). Baltimore, MD: Brookes.
Wilson, J.M.G., and Jungner, G. (1968). Principles and practice of screening for
diseases. Geneva: World Health Organization.
Wyly, M.V. (1997). Infant assessment. Boulder, CO: Westview Press.
Yoshinaga-Itano, C., Sedey, A.L., Coulter, D.K., and Mehl, A.L. (1998). Language
of early- and later-identified children with hearing loss. Pediatrics, 102, 1161-1171.
Zafeieriou, D.I. (2003). Primitive reflexes and postural reactions in the neurodevelopmental examination. Pediatric Neurology, 31, 1-8.
Chapter 5
Alexander, P.A., White, C.S., and Daugherty, M. (1997). Analogical reasoning
and early mathematics learning. In L.D. English (Ed.), Mathematical reasoning:
Analogies, metaphors, and images (pp. 117-147). Mahwah, NJ: Lawrence Erlbaum
Associates.
American Academy of Pediatrics. (2005). Lead exposure in children: Prevention,
detection, and management. Pediatrics, 116(4), 1036-1046.
American Institutes for Research. (2005). Reassessing U.S. international mathematics
performance: New findings from the TIMSS and PISA. Washington, DC: Author.
American Psychological Association Task Force on Intelligence. (1996). Intelligence:
Knowns and unknowns. Washington, DC: Author.
Barone, T. (2001). The end of the terror: On disclosing the complexities of teaching.
Curriculum Inquiry, 31(1), 89-102.
Bayley, N. (2005). Bayley Scales of Infant and Toddler Development. San Antonio, TX:
Psychological Corporation.
Bear, D.R., Invernizzi, M., Templeton, S., and Johnston, F. (1999). Words their way:
Word study for phonics, vocabulary, and spelling instruction. Upper Saddle River,
NJ: Prentice Hall.
Beck, I., McKeown, M., and Kucan, L. (2002). Bringing words to life: Robust
vocabulary instruction. New York: Guilford Press.
Becker, J. (1989). Preschoolers' use of number words to denote one-to-one
correspondence. Child Development, 60, 1147-1157.
Bierman, K.L., and Erath, S.A. (2006). Promoting social competence in early
childhood: Classroom curricula and social skills coaching programs. In K.
McCartney and D. Phillips (Eds.), Handbook of early childhood development.
Malden, MA: Blackwell.
Bierman, K.L., Domitrovich, C.E., Nix, R.L., Gest, S.D., Welsh, J.A., Greenberg,
M.T., Blair, C., Nelson, K.E., and Gill, S. (under review). Promoting academic and
social-emotional school readiness: The Head Start REDI program. The Pennsylvania
State University. Available: http://www.srcd.org/journals/cdev/0-0/
Bierman.pdf [accessed July 2008].
Birch, S., and Ladd, G.W. (1998). Children's interpersonal behaviors and the
teacher-child relationship. Developmental Psychology, 34, 934-946.
Blair, C. (2002). School readiness: Integrating cognition and emotion in a
neurobiological conceptualization of children's functioning at school entry.
American Psychologist, 57(2), 111-127.
Blair, C. (2006). How similar are fluid cognition and general intelligence? A
developmental neuroscience perspective on fluid cognition as an aspect of
human cognitive ability. Behavioral and Brain Sciences, 29, 109-125.
Blair, C., and Razza, R.P. (2007). Relating effortful control, executive function,
and false belief understanding to emerging math and literacy ability in
kindergarten. Child Development, 78(2), 647-663.
Blank, M., Rose, S., and Berlin, L. (1978). Preschool Language Assessment Instrument
(PLAI). New York: Psychological Corporation.
Boaler, J. (1994). When do girls prefer football to fashion? An analysis of female
underachievement in relation to realistic mathematics contexts. British
Educational Research Journal, 20(5), 551-564.
Bodrova, E., and Leong, D.J. (2001). Tools of the mind: A case study of implementing
the Vygotskian approach in American early childhood and primary classrooms.
Geneva: UNESCO, International Bureau of Education.
Bull, R., and Scerif, G. (2001). Executive functioning as a predictor of children's mathematics ability: Inhibition, switching, and working memory. Developmental Neuropsychology, 19(3), 273-293.
Burchinal, M., Lee, M., and Ramey, C. (1989). Type of day-care and preschool
intellectual development in disadvantaged children. Child Development, 60(1),
128-137.
Campbell, S.B. (2006). Maladjustment in preschool children: A developmental
psychopathology perspective. In K. McCartney and D. Phillips (Eds.),
Handbook of early childhood development (pp. 358-378). Malden, MA: Blackwell.
Carlson, S. (2005). Developmentally sensitive measures of executive function in
preschool children. Developmental Neuropsychology, 28, 595-616.
Clay, M. (1979). Concepts about print tests: Sand and stones. Portsmouth, NH:
Heinemann.
Clements, D.H. (1999). Geometric and spatial thinking in young children. In J.V.
Copley (Ed.), Mathematics in the early years. Reston, VA: National Council of
Teachers of Mathematics.
Clements, D.H., Sarama, J., and DiBiase, A.M. (Eds.). (2004). Engaging young
children in mathematics: Standards for early childhood mathematics education.
Mahwah, NJ: Lawrence Erlbaum Associates.
Clements, D.H. (2004). Major themes and recommendations. In D.H. Clements,
J. Sarama, and A.-M. DiBiase (Eds.), Engaging young children in mathematics:
Standards for early childhood mathematics education. Mahwah, NJ: Lawrence
Erlbaum Associates.
Fabes, R.A., Gaertner, B.M., and Popp, T.K. (2006). Getting along with others:
Social competence in early childhood. In K. McCartney and D. Phillips (Eds.),
Handbook of early childhood development (pp. 297-316). Malden, MA: Blackwell.
Fantuzzo, J., Bulotsky-Shearer, R., Fusco, R.A., and McWayne, C. (2005). An
investigation of preschool classroom behavioral adjustment problems and
social-emotional school readiness competencies. Early Childhood Research
Quarterly, 20(3), 259-275.
Fantuzzo, J., Bulotsky-Shearer, R., McDermott, P.A., McWayne, C., Frye, D., and
Perlman, S. (2007). Investigation of dimensions of social-emotional classroom
behavior and school readiness for low-income urban preschool children.
School Psychology Review, 36(1), 44-62. Available: http://repository.upenn.
edu/gse_pubs/124/ [accessed July 2008].
Fantuzzo, J., Perry, M.A., and McDermott, P. (2004). Preschool approaches to
learning and their relationship to other relevant classroom competencies for
low-income children. School Psychology Quarterly, 19, 212-230.
Feigenson, L., Dehaene, S., and Spelke, E. (2004). Core systems of number. Trends
in Cognitive Sciences, 8, 307-314.
Fenson, L., Dale, P., Reznick, J.S., Thal, D., Bates, E., Hartung, J.P., Pethick, S., and
Reilly, J. (1993). MacArthur-Bates Communicative Development Inventories. San
Diego, CA: Singular.
Flanagan, D.P., and McGrew, K.S. (1997). A cross-battery approach to assessing
and interpreting cognitive abilities: Narrowing the gap between practice
and cognitive science. In D.P. Flanagan, J.L. Genshaft, and P. Harrison (Eds.),
Contemporary intellectual assessment: Theories, tests, and issues (pp. 314-325).
New York: Guilford Press.
Foulks, B., and Morrow, R.D. (1989). Academic survival skills for the young child
at risk for school failure. Journal of Educational Research, 82(3), 158-165.
Fuson, K.C. (1988). Children's counting and concepts of number. New York: Springer-Verlag.
Fuson, K.C. (1992a). Relationships between counting and cardinality from age
2 to age 8. In J. Bideau, C. Meljac, and J.P. Fischer (Eds.), Pathways to number:
Children's developing numerical abilities (Chapter 6, pp. 127-150). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Fuson, K.C. (1992b). Research on whole number addition and subtraction. In D.
Grouws (Ed.), Handbook of research on mathematics teaching and learning. New
York: Macmillan.
Gardner, M.F., and Brownell, R. (2000). Expressive One-Word Picture Vocabulary
Test. Novato, CA: Academic Therapy.
Gathercole, S.E. (1998). The development of memory. Journal of Child Psychiatry,
39, 3-27.
Ginsburg, H.P., Inoue, N., and Seo, K.H. (1999). Young children doing mathematics: Observations of everyday activities. In J.V. Copley (Ed.), Mathematics in the early years (pp. 87-99). Reston, VA: National Council of Teachers of
Mathematics.
Gray, S.W., and Klaus, R.A. (1970). The early training project: A seventh-year
report. Child Development, 7(4), 909-924.
Green, L.F., and Francis, J. (1988). Children's learning skills at the infant and
junior stages: A follow-on study. British Journal of Educational Psychology, 58(1),
120-126.
Hammill, D.D., and Newcomer, P.L. (1997). Test of language development-primary
(3rd ed.). Austin, TX: Pearson Education.
Hamre, B.K., and Pianta, R.C. (2001). Early teacher-child relationships and
the trajectory of children's school outcomes through eighth grade. Child
Development, 72, 625-638.
Hemphill, L., Uccelli, P., Winner, K., Chang, C.-J., and Bellinger, D. (2002).
Narrative discourse in young children with histories of early corrective heart
surgery. Journal of Speech, Language, and Hearing Research, 45, 318-331.
Herrnstein, R.J., and Murray, C. (1994). The bell curve: Intelligence and class structure
in American life. New York: Simon and Schuster.
Hiebert, J., Carpenter, T.P., Fennema, E., Fuson, K.C., Wearne, D., and Murray,
H. (1997). Making sense: Teaching and learning mathematics with understanding.
Portsmouth, NH: Heinemann.
Howes, C., Phillipsen, L., and Peisner-Feinberg, E. (2000). The consistency of
perceived teacher-child relationships between preschool and kindergarten.
Journal of School Psychology, 38, 113-132.
Howse, R.B., Lange, G., Farran, D.C., and Boyles, C.D. (2003). Motivation and
self-regulation as predictors of achievement in economically disadvantaged
young children. Journal of Experimental Education, 71(2), 151-174.
Hresko, W.P., Reid, D.K., and Hammill, D.D. (1999). Test of early language
development (3rd ed.). Austin, TX: Pearson Education.
Jordan, G.E., Snow, C.E., and Porche, M.V. (2000). Project EASE: The effect of a
family literacy project on kindergarten students' early literacy skills. Reading
Research Quarterly, 35(4), 524-546.
Juel, C., and Minden-Cupp, C. (2000). Learning to read words: Linguistic units
and instructional strategies. Reading Research Quarterly, 35(4), 458-492.
Kaufman, A.S., and Kaufman, N.L. (2006). Kaufman Assessment Battery for Children
(K-ABC) (2nd ed.). Upper Saddle River, NJ: Pearson Assessments.
Klein, A., and Starkey, P.J. (2004). Fostering preschool children's mathematical knowledge: Findings from the Berkeley Math Readiness Project. In D.H.
Clements, J. Samara, and A.-M. DiBiase (Eds.), Engaging young children in
mathematics: Standards for early childhood mathematics education (pp. 343-360).
Hillsdale, NJ: Lawrence Erlbaum Associates.
Klingberg, T., Fernell, E., Olesen, P.J., Johnson, M., Gustafsson, P., Dahlstrom,
K., Gillberg, C.G., Forssberg, H., and Westerberg, H. (2005). Computerized
training of working memory in children with ADHD: A randomized,
controlled trial. Journal of the American Academy of Child and Adolescent
Psychiatry, 44(2), 177-186.
Knight, G.P., and Hill, N.E. (1998). Measurement equivalence in research
involving minority adolescents. In V.C. McLoyd and L. Steinberg (Eds.),
Studying minority adolescents: Conceptual, methodological, and theoretical issues
(pp. 183-210). Mahwah, NJ: Lawrence Erlbaum Associates.
Ladd, G.W., and Burgess, K. (2001). Do relational risks and protective factors
moderate the linkages between childhood aggression and early psychological
and school adjustment? Child Development, 72, 1579-1601.
Ladd, G.W., Birch, S., and Buhs, E. (1999). Children's social lives in kindergarten:
Related spheres of influence. Child Development, 70, 1373-1400.
Ladd, G.W., Herald, S.L., and Kochel, K.P. (2006). School readiness: Are there
social prerequisites? Early Education and Development, 17(1), 115-150.
Lehto, J.E. (2004). A test for children's goal-directed behavior: A pilot study.
Perceptual and Motor Skills, 98(1), 223-236.
Lewit, E.M., and Baker, L.S. (1995). School readiness. The Future of Children, 5(2),
128-139.
Lubienski, S.T. (2000). Problem solving as a means toward mathematics for all:
An exploratory look through a class lens. Journal for Research in Mathematics
Education, 31(4), 454-482.
MacDonald, A.W., Cohen, J.D., Stenger, V.A., and Carter, C.S. (2000). Dissociating
the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive
control. Science, 288(5472), 1835-1838.
Malvern, D.D., and Richards, B.J. (1997). A new measure of lexical diversity. In
A. Ryan and A. Wray (Eds.), Evolving models of language. Bristol, England:
Multilingual Matters.
Mashburn, A.J., and Pianta, R.C. (2006). Social relationships and school readiness.
Early Education and Development, 17(1), 151-176.
Mason, J., Stewart, J., Peterman, C., and Dunning, D. (1992). Toward an integrated
model of early reading development (No. 566). Champaign, IL: Center for the
Study of Reading.
McCall, R.B. (1977). Childhood IQs as predictors of adult educational and
occupational status. Science, 197(4302), 482-483.
McCall, R.B., and Carriger, M.S. (1993). A meta-analysis of infant habituation and
recognition memory performance as predictors of later IQ. Child Development,
64(1), 57-79.
McCall, R.B., Appelbaum, M.I., and Hogarty, P.S. (1974). Developmental changes in
mental performance. Chicago, IL: University of Chicago Press.
McCartney, K. (2002). Language environments and language outcomes: Results
from the NICHD study of early child care and youth development. In L.
Girolametto and E. Weitzman (Eds.), Enhancing caregiver language facilitation
in child care setting (pp. 3-13-10). Toronto: The Hanen Centre.
McClelland, M.M., Acock, A.C., and Morrison, F.J. (2006). The impact of
kindergarten learning-related skills on academic trajectories at the end of elementary school. Early Childhood Research Quarterly, 21(4), 471-490.
McClelland, M.M., Cameron, C.E., Connor, C.M., Farris, C.L., Jewkes, A.M., and
Morrison, F.J. (2007). Links between behavioral regulation and preschoolers'
literacy, vocabulary, and math skills. Developmental Psychology, 43(4), 947-959.
McClelland, M.M., Morrison, F.J., and Holmes, D.L. (2000). Children at risk
for early academic problems: The role of learning-related social skills. Early
Childhood Research Quarterly, 15(3), 307-329.
Norman, D.A., and Shallice, T. (1986). Attention to action. In R.J. Davidson, G.E.
Schwartz, and D. Shapiro (Eds.), Consciousness and self-regulation: Advances in
theory and research (pp. 1-18). New York: Plenum Press.
Normandeau, S., and Guay, F. (1998). Preschool behavior and first-grade
school achievement: The mediational role of cognitive self-control. Journal of
Educational Psychology, 90(1), 111-121.
Olds, D.L., Henderson, C.R., Phelps, C., Kitzman, H., and Hanks, C. (1993). Effect
of prenatal and infancy nurse home visitation on government spending.
Medical Care, 31(2), 155-174.
Olson, S.L., Sameroff, A.J., Kerr, D.C.R., Lopez, N.L., and Weeman, H.M. (2005).
Developmental foundations of externalizing problems in young children: The
role of effortful control. Development and Psychopathology, 17, 25-45.
Pan, B.A., Mancilla-Martinez, J., and Vagh, S.B. (2008). Tracking bilingual children's vocabulary development: Reporter- and language-related measurement challenges. Poster presentation at Head Start's Ninth National Research Conference,
June 23-25, Washington, DC.
Phinney, J.S., and Landin, J. (1998). Research paradigms for studying ethnic
minority families within and across groups. In V.C. McLoyd and L. Steinberg
(Eds.), Studying minority adolescents: Conceptual, methodological, and theoretical
issues (pp. 89-110). Mahwah, NJ: Lawrence Erlbaum Associates.
Pianta, R.C., and Steinberg, M. (1992). Teacher-child relationships and the process
of adjusting to school. New Directions for Child Development, 57, 61-80.
Piotrkowski, C.S., Botsko, M., and Matthews, E. (2000). Parents' and teachers' beliefs about children's school readiness in a high-need community. Early
Childhood Research Quarterly, 15(4), 537-558.
Poe, M.D., Burchinal, M.R., and Roberts, J.E. (2004). Early language and the
development of children's reading skills. Journal of School Psychology, 42,
315-332.
Posner, M.I., and Rothbart, M.K. (2000). Developing mechanisms of self-regulation. Development and Psychopathology, 12, 427-441.
Quattrin, T., Liu, E., Shaw, N., Shine, B., and Chiang, E. (2005). Obese children
who are referred to the pediatric endocrinologist: Characteristics and outcome.
Pediatrics, 115, 348-351.
RAND Labor and Population. (2005). Early childhood interventions: Proven results,
future promise. Santa Monica, CA: RAND Corporation.
Raver, C. (2002). Emotions matter: Making the case for the role of young children's
emotional development for early school readiness (No. 3). Ann Arbor, MI: Society
for Research in Child Development.
Raver, C. (2004). Placing emotional self-regulation in sociocultural and
socioeconomic contexts. Child Development, 75(2), 8.
Raver, C., Gershoff, E.T., and Aber, J. (2007). Testing equivalence of mediating
models of income, parenting, and school readiness for white, black, and
Hispanic children in a national sample. Child Development, 78(1), 20.
Reynolds, A.J., and Temple, J.A. (1998). Extended early childhood intervention
and school achievement: Age thirteen findings from the Chicago Longitudinal
Study. Child Development, 69(1), 231-246.
Rimm-Kaufman, S.E., Pianta, R.C., and Cox, M.J. (2000). Teachers' judgments of
problems in the transition to kindergarten. Early Childhood Research Quarterly,
15(2), 147-166.
Robbins, T.W. (1996). Refining the taxonomy of memory. Science, 273(5280), 1353-1354.
Roid, G.H. (2003). Stanford-Binet Intelligence Scales for Early Childhood (5th ed.).
Rolling Meadows, IL: Riverside.
Roth, F.P., Speece, D.L., and Cooper, D.H. (2002). A longitudinal analysis of the
connection between oral language and early reading. Journal of Educational
Research, 95(5), 259-272.
Rothbart, M.K., Posner, M.I., and Kieras, J. (2006). Temperament, attention, and
the development of self-regulation. In K. McCartney and D. Phillips (Eds.),
Handbook of early childhood development (pp. 338-357). Malden, MA: Blackwell.
Rueda, M.R., Rothbart, M.K., McCandliss, B.D., Saccomanno, L., and Posner, M.I.
(2005). Training, maturation, and genetic influences on the development of
executive attention. Proceedings of the National Academy of Sciences of the United
States of America, 102(41), 14931-14936.
Schatschneider, C., Francis, D.J., Carlson, C.D., Fletcher, J.M., and Foorman, B.F.
(2004). Kindergarten prediction of reading skills: A longitudinal comparative
analysis. Journal of Educational Psychology, 96(2), 265-282.
Schrank, F.A., Mather, N., and Woodcock, R.W. (2006). Woodcock-Johnson III(r)
Diagnostic Reading Battery. Rolling Meadows, IL: Riverside.
Scott-Little, C., Kagan, S.L., and Frelow, V.S. (2005). Inside the content: The depth and
breadth of early learning standards. Greensboro: University of North Carolina,
SERVE Center for Continuous Improvement.
Scott-Little, C., Kagan, S.L., and Frelow, V.S. (2006). Conceptualization of
readiness and the content of early learning standards: The intersection of
policy and research? Early Childhood Research Quarterly, 21(2), 153-173.
Semrud-Clikeman, M., Nielsen, K.H., Clinton, A., Sylvester, L.H., Parle, N., and
Connor, R.T. (1999). An intervention approach for children with teacher- and
parent-identified attentional difficulties. Journal of Learning Disabilities, 32(6),
581-590.
Sénéchal, M., and LeFevre, J.-A. (2002). Parental involvement in the development of children's reading skill: A five-year longitudinal study. Child Development,
73(2), 445-460.
Seymour, H.N., Roeper, T.W., de Villiers, J., and de Villiers, P.A. (2003). Diagnostic
evaluation of language variation. San Antonio, TX: Pearson Assessments.
Silver, R., Measelle, J., Armstrong, J., and Essex, M. (2005). Trajectories of
classroom externalizing behavior: Contributions of child characteristics,
family characteristics, and the teacher-child relationship during the school
transition. Journal of School Psychology, 43, 39-60.
Silverman, R.D. (2007). Vocabulary development of English-language and
English-only learners in kindergarten. The Elementary School Journal, 107,
365-384.
Snow, C.E., Porche, M., Tabors, P., and Harris, S. (2007). Is literacy enough?
Pathways to academic success for adolescents. Baltimore, MD: Brookes.
Snow, C.E., Tabors, P.O., Nicholson, P., and Kurland, B. (1995). SHELL: Oral
language and early literacy skills in kindergarten and first grade children.
Journal of Research in Childhood Education, 10, 37-48.
Starkey, P., Klein, A., and Wakeley, A. (2004). Enhancing young children's
mathematical knowledge through a pre-kindergarten mathematics
intervention. Early Childhood Research Quarterly, 19, 99-120.
Sulzby, E. (1985). Children's emergent reading of favorite storybooks: A
developmental study. Reading Research Quarterly, 20(4), 458-481.
Thompson, R.A., and Lagattuta, K. (2006). Feeling and understanding: Early
emotional development. In K. McCartney and D. Phillips (Eds.), Handbook of
early childhood development (pp. 317-337). Malden, MA: Blackwell.
Thompson, R.A., and Raikes, A.H. (2007). The social and emotional foundations
of school readiness. In R.K. Kaufmann and J. Knitzer (Eds.), Social and emotional
health in early childhood (pp. 13-35). Baltimore, MD: Brookes.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2003a). Head Start child outcomes framework. Washington, DC:
Author.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2003b). Head Start child outcomes: Setting the context for the
National Reporting System. Head Start Bulletin, 76. Available: http://www.
headstartinfo.org/publications/hsbulletin76/cont_76.htm [accessed July
2008].
U.S. Department of Health and Human Services, Administration for Children
and Families. (2004). Early Head Start research: Making a difference in the lives of
infants, toddlers, and their families. The impacts of early Head Start, volume 1: Final
technical report. Washington, DC: Author.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2005). Head Start impact study: First year findings. Washington,
DC: Author.
Van Hiele, P.M. (1986). Structure and insight: A theory of mathematics education.
Orlando, FL: Academic Press.
Vandell, D., Nenide, L., and Van Winkle, S.J. (2006). Peer relationships in early
childhood. In K. McCartney and D. Phillips (Eds.), Handbook of early childhood
development (pp. 455-470). Cambridge, MA: Blackwell.
Vernon-Feagans, L. (1996). Children's talk in communities and classrooms. Cambridge,
MA: Blackwell.
Wagner, R., Torgesen, J., and Rashotte, C. (1990). Comprehensive test of phonological
processing. Bloomington, MN: Pearson Assessments.
Wasik, B.A., Bond, M.A., and Hindman, A. (2006). The effects of a language and
literacy intervention on Head Start children and teachers. Journal of Educational
Psychology, 98(1), 63-74.
Wechsler, D. (2003). The Wechsler Intelligence Scale for Children (4th ed.). San
Antonio, TX: Psychological Corporation.
Weiss, R., Dziura, J., Burgert, T.S., Tamborlane, W.V., Taksali, S.E., Yeckel, C.W.,
Allen, K., Lopes, M., Savoye, M., Morrison, J., Sherwin, R.S., and Caprio, S.
(2004). Obesity and the metabolic syndrome in children and adolescents. The
New England Journal of Medicine, 350(23), 2362-2374.
Welsh, M.C., Pennington, B.F., and Groisser, D.B. (1991). A normative-developmental study of executive function: A window on prefrontal function
in children. Developmental Neuropsychology, 7, 131-149.
Whitehurst, G.J., and Lonigan, C.J. (1998). Child development and emergent
literacy. Child Development, 69, 848-872.
Whitehurst, G.J., Arnold, D.H., Epstein, J.N., Angell, A.L., Smith, M., and Fischel,
J.E. (1994). A picture book reading intervention in day care and home for
children from low-income families. Developmental Psychology, 30, 679-689.
Woodcock, R.W. (1990). Theoretical foundations of the WJ-R measures of cognitive
ability. Journal of Psychoeducational Assessment, 8(3), 231-258.
Woodcock, R.W., McGrew, K.S., and Mather, N. (2001). Woodcock-Johnson III
(WJ-III) Tests of Cognitive Abilities. Rolling Meadows, IL: Riverside.
Xu, F., Spelke, E.S., and Goddard, S. (2005). Number sense in human infants.
Developmental Science, 8(1), 88-101.
Zaslow, M., Halle, T., Martin, L., Cabrera, N., Calkins, J., Pitzer, L., and Margie,
N.G. (2006). Child outcome measures in the study of child care quality:
Overview and next steps. Evaluation Review, 30, 577-610.
Chapter 6
Abbot-Shinn, M., and Sibley, A. (1992). Assessment profile for early childhood
programs: Research version. Atlanta, GA: Quality Assist.
Abt Associates Inc. (2006). Observation training manual: OMLIT early childhood.
Cambridge, MA: Author.
Adams, G., Tout, K., and Zaslow, M. (2007). Early care and education for children
in low-income families: Patterns of use, quality, and potential implications. Washington, DC: The Urban Institute.
Arnett, J. (1989). Caregivers in day-care centers: Does training matter? Journal of
Applied Developmental Psychology, 10, 541.
Belsky, J., Vandell, D.L., Burchinal, M., Clarke-Stewart, K.A., McCartney, K.,
Owen, M., and the NICHD Early Child Care Research Network. (2007). Are
there long-term effects of early child care? Child Development, 78, 681-701.
Bornstein, M.H., and Sawyer, J. (2006). Family systems. In K. McCartney and D.
Phillips (Eds.), Handbook of early childhood development (pp. 381-398). Malden,
MA: Blackwell.
Bradley, R.H., Corwyn, R.F., Burchinal, M., McAdoo, H.P., and Garcia-Coll,
C. (2001). The home environments of children in the United States: Part 2,
Relations with behavioral development from birth through age 13. Child
Development, 72, 1868-1886.
Bryant, D. (2007). Delivering and evaluating the Partnerships for Inclusion model of
early childhood professional development in a 5-state collaborative study. Paper
presented at the National Association for the Education of Young Children,
November, Chicago, IL.
Bryant, D. (forthcoming). Observational measures of quality in center-based early
care and education programs. Submitted to Child Development Perspectives.
Bryant, D.M., Burchinal, M.R., Lau, L.B., and Sparling, J.J. (1994). Family and
classroom correlates of Head Start children's developmental outcomes. Early
Childhood Research Quarterly, 9(4), 289-309.
Burchinal, M. (forthcoming). The measurement of child care quality. University of
California, Irvine.
Burchinal, M., Kainz, K., Cai, K., Tout, K., Zaslow, M., Martinez-Beck, I., and
Rathgeb, C. (2008). Child care quality and child outcomes: Multiple studies
analyses. Paper presented at Developing a Next Wave of Quality Measures
for Early Childhood and School-Age Programs: A Working Meeting, January,
Washington, DC.
Burchinal, M.R., Peisner-Feinberg, E., Bryant, D.M., and Clifford, R. (2000).
Children's social and cognitive development and child care quality: Testing
for differential associations related to poverty, gender, or ethnicity. Applied
Developmental Science, 4, 149-165.
Burchinal, M.R., Roberts, J.E., Riggins, R., Zeisel, S., Neebe, E., and Bryant, M.
(2000). Relating quality of center child care to early cognitive and language
development longitudinally. Child Development, 71, 339-357.
Caldwell, B.M., and Bradley, R.H. (1984). Home observation for measurement of the
environment. Little Rock: University of Arkansas.
Child Trends. (2007). Quality in early childhood care and education settings: A
compendium of measures. Washington, DC: Author.
Connell, C.M., and Prinz, R.J. (2002). The impact of childcare and parent-child
interactions on school readiness and social skills development for low-income
African American children. Journal of School Psychology, 40(2), 177-193.
DeTemple, J., and Snow, C.E. (1998). Mother-child interactions related to the
emergence of literacy. In C.A. Eldred (Ed.), Parenting behaviors in a sample of
young single mothers in poverty: Results of the New Chance Observational Study
(pp. 114-169). New York: Manpower Demonstration Research.
Dickinson, D.K., Sprague, K., Sayer, A., Miller, C., Clark, N., and Wolf, A. (2000).
Classroom factors that foster literacy and social development of children from
different language backgrounds. In M. Hopmann (Chair) (Ed.), Dimensions of
program quality that foster child development: Reports from 5 years of the Head Start
Quality Research Centers. Poster presentation at the biannual National Head
Start Research Conference, June, Washington, DC.
Dickinson, D.K., Sprague, K., Sayer, A., Miller, C.M., and Clark, N. (2001, April).
A multilevel analysis of the effects of early home and preschool environments
on children's language and early literacy development. Paper presented at the
Biennial Conference of the Society for Research in Child Development, April,
Minneapolis, MN.
Early, D.M., Bryant, D.M., Pianta, R.C., Clifford, R.M., Burchinal, M.R., Ritchie,
S., Howes, C., and Barbarin, O. (2006). Are teachers' education, major, and credentials related to classroom quality and children's academic gains in pre-kindergarten? Early Childhood Research Quarterly, 21(2), 174-195.
Egeland, B., and Deinard, A. (1975). Life stress scale and manual. Minneapolis:
University of Minnesota.
Englund, M.M., Luckner, A.E., Whaley, G., and Egeland, B. (2004). Childrens
achievement in early elementary school: Longitudinal effects of parental
involvement, expectations, and quality of assistance. Journal of Educational
Psychology, 96, 723-730.
Frosch, C.A., Cox, M.J., and Goldman, B.D. (2001). Infant-parent attachment and
parental and child behavior during parent-toddler storybook interaction.
Merrill-Palmer Quarterly, 47(4), 445-474.
Fuligni, A.S., Han, W.J., and Brooks-Gunn, J. (2004). The Infant-Toddler HOME
in the 2nd and 3rd years of life. Parenting, 4(2&3), 139-159.
Harms, T., and Clifford, R. (1980). Early Childhood Environment Rating Scale. New
York: Teachers College Press.
Harms, T., and Clifford, R.M. (1989). Family Day Care Rating Scale. New York:
Teachers College Press.
Harms, T., Clifford, R., and Cryer, D. (1998). Early Childhood Environment Rating
Scale (Revised ed.). New York: Teachers College Press.
Harms, T., Cryer, R., and Clifford, R. (1990). Infant/Toddler Environment Rating
Scale. New York: Teachers College Press.
Helburn, S. (1995). Cost, quality and child outcomes in child care centers. Denver:
University of Colorado, Department of Economics, Center for Research in
Economic and Social Policy.
High/Scope. (2003). Preschool Program Quality Assessment (2nd ed.). Ypsilanti, MI:
High/Scope Press.
Howes, C. (1997). Children's experiences in center-based child care as a function
of teacher background and adult:child ratio. Merrill-Palmer Quarterly, 43,
404-425.
Howes, C., Mashburn, A., Pianta, R., Hamre, B., Downer, J., Barbarin, O., Bryant,
D., Burchinal, M., and Early, D.M. (2008). Measures of classroom quality in
pre-kindergarten and children's development of academic, language and
social skills. Child Development, 79(3), 732-749.
Howes, C., Phillips, D.A., and Whitebook, M. (1992). Thresholds of quality: Implications for the social development of children in center-based child care. Child Development, 63, 449-460.
Hyson, M., Hirsh-Pasek, K., and Rescorla, L. (1990). The classroom practices
inventory: An observation instrument based on NAEYC's guidelines for
developmentally appropriate practices for 4- and 5-year-old children. Early
Childhood Research Quarterly, 5, 475-494.
Kinzie, M.B., Whitaker, S.D., Neesen, K., Kelley, M., Matera, M., and Pianta, R.C.
(2006). Innovative web-based professional development for teachers of at-risk
preschool children. Educational Technology & Society, 9(4), 194-204.
Kontos, S., Howes, C., and Galinsky, E. (1996). Does training make a difference to
quality in family child care? Early Childhood Research Quarterly, 11(4), 427-445.
Lamb, M. (1998). Nonparental child care: Context, quality, correlates, and
consequences. In W. Damon, I.E. Sigel, and K.A. Renninger (Eds.), Handbook
of child psychology (Vol. 4: Child). London: Wiley.
400
Lambert, R., Abbott-Shinn, M., and Sibley, A. (2006). Evaluating the quality of
early childhood education settings. In B. Spodek and O.N. Saracho (Eds.),
Handbook of research on the education of young children (2nd ed.). Mahwah, NJ:
Lawrence Erlbaum Associates.
McCartney, K. (1984). Effect of quality of day care environment on childrens
language development. Developmental Psychology, 20, 244-260.
McCartney, K., and Phillips, D. (Eds.). (2006). Handbook of early childhood
development. Malden, MA: Blackwell.
Mitchell, A.W. (2005). Stair steps to quality: A guide for states and communities
developing quality rating systems for early care and education. Alexandria, VA:
United Way Success by Six.
National Association for the Education of Young Children. (2005). Screening and
assessment of young English-language learners: Supplement to the NAEYC and
NAECS/SDE joint position statement on early childhood curriculum, assessment,
and program evaluation. Washington, DC: Author.
National Institute for Early Education Research. (2005). Support for English
language learners classroom assessment. Rutgers, NJ: Author.
National Institute for Early Education Research. (2006). The state of preschool.
Rutgers, NJ: Author.
National Institute for Early Education Research. (2007). Preschool classroom
mathematics inventory. Rutgers, NJ: Author.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Neuman, S., Dwyer, J., and Koh, S. (2007). Child/Home Early Language & Literacy
Observation Tool (CHELLO). Baltimore, MD: Brookes.
NICHD Early Child Care Research Network. (1999). Child care and mother-child interaction in the first three years of life. Developmental Psychology, 35,
1399-1413.
NICHD Early Child Care Research Network. (2000). The relation of child care to
cognitive and language development. Child Development, 71(4), 960-980.
NICHD Early Child Care Research Network. (2002). Early child care and
children's development prior to school entry: Results from the NICHD Study
of Early Child Care. American Educational Research Journal, 39, 133-164.
NICHD Early Child Care Research Network. (2003). Does quality of child care
affect child outcomes at age 4? Developmental Psychology, 39, 451-469.
NICHD Early Child Care Research Network. (2005). Duration and developmental
timing of poverty and children's cognitive and social development from birth
through third grade. Child Development, 76(4), 795-810.
NICHD Early Child Care Research Network. (2006). Child care effect sizes for
the NICHD Study of Early Child Care and Youth Development. American
Psychologist, 61(2), 99-116.
Sylva, K., Siraj-Blatchford, I., and Taggart, B. (2003). Assessing quality in the early
years: Early Childhood Environment Rating Scale-Extension (ECERS-E): Four
curricular subscales. Stoke-on-Trent, Staffordshire, England: Trentham Books.
Sylva, K., Siraj-Blatchford, I., Taggart, B., Sammons, P., Melhuish, E., Elliot,
K., and Totsika, V. (2006). Capturing quality in early childhood through
environmental rating scales. Early Childhood Research Quarterly, 21, 76-92.
Tout, K., Zaslow, M., and Martinez-Beck, I. (forthcoming). Measuring the quality
of early care and education programs at the intersection of research, policy,
and practice. Submitted to Child Development Perspectives.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2004). Early Head Start research: Making a difference in the lives of
infants, toddlers, and their families. The impacts of early Head Start, volume 1: Final
technical report. Washington, DC: Author.
U.S. Department of Health and Human Services, Administration for Children
and Families. (2005). Head Start impact study: First year findings. Washington,
DC: Author.
Van Horn, M., and Ramey, S. (2004). A new measure for assessing develop
mentally appropriate practices in early elementary school, a developmentally
appropriate practice template. Early Childhood Research Quarterly, 19, 569-587.
Vandell, D. (2004). Early child care: The known and the unknown. Merrill-Palmer
Quarterly, 50, 387-414.
Vu, J.A., Jeon, H., and Howes, C. (in press). Formal education, credential, or both:
Early childhood program classroom practices. Submitted to Early Education
and Development.
Wasik, B.H., and Bryant, D.M. (2001). Home visiting: Procedures for helping families
(2nd ed.). Newbury Park, CA: Sage.
Weinfield, N.S., Egeland, B., and Ogawa, J.R. (1998). Affective quality of mother-child interactions. In C.A. Eldred (Ed.), Parenting behaviors in a sample of young single mothers in poverty: Results of the New Chance Observational Study (pp. 71-113). New York: Manpower Demonstration Research.
Wesley, P.W. (1994). Providing on-site consultation to promote quality in
integrated child care programs. Journal of Early Intervention, 18(4), 391-402.
Yoder, P.J., and Warren, S.F. (2001). Relative treatment effects of two prelinguistic
communication interventions on language development in toddlers with
developmental delays vary by maternal characteristics. Journal of Speech
Language and Hearing Research, 44, 224-237.
Zaslow, M. (2008). Issues for the learning community from the Head Start Impact
Study. Infants and Young Children, 21(1), 4-17.
Zaslow, M., Halle, T., Martin, L., Cabrera, N., Calkins, J., Pitzer, L., and Margie,
N.G. (2006). Child outcome measures in the study of child care quality:
Overview and next steps. Evaluation Review, 30, 577-610.
Chapter 7
American Educational Research Association, American Psychological Associa
tion, and National Council on Measurement in Education. (1999). Standards
for educational and psychological testing. Washington, DC: American Educational
Research Association.
American Institutes for Research. (2000). Voluntary national test, cognitive laboratory
report, year 2. Palo Alto, CA: Author.
Bagnato, S.J., Smith-Jones, J., McComb, G., and Cook-Kilroy, J. (2002). Quality early
learning–Key to school success: A first-phase 3-year program evaluation research report for Pittsburgh's Early Childhood Initiative (ECI). Pittsburgh, PA: SPECS
Program Evaluation Research Team.
Bagnato, S.J., Suen, H., Brickley, D., Jones, J., and Dettore, E. (2002). Child
developmental impact of Pittsburgh's Early Childhood Initiative (ECI) in
high-risk communities: First-phase authentic evaluation research. Early
Childhood Research Quarterly, 17(4), 559-589.
Brennan, R.L. (2006). Perspectives on the evolution and future of educational
measurement. In R.L. Brennan (Ed.), Educational measurement (4th ed., pp. 1-16). Westport, CT: ACE/Praeger.
Buros Institute of Mental Measurements. (2007). The seventeenth mental measurements yearbook. Lincoln, NE: Author.
Campbell, D.T., and Stanley, J.C. (1966). Experimental and quasi-experimental designs
for research. Chicago, IL: Rand McNally.
Child Trends. (2004). Early childhood measures profiles. Washington, DC: Author.
Cook, T.D., and Campbell, D.T. (1979). Quasi-experimentation: Design & analysis
issues for field settings. Boston: Houghton Mifflin.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests.
Psychometrika, 16(3), 297-334.
Cronbach, L.J. (1971). Test validation. In R.L. Thorndike (Ed.), Educational
measurement (2nd ed., pp. 443-507). Washington, DC: American Council on
Education.
Cronbach, L.J., and Meehl, P.E. (1955). Construct validity in psychological tests.
Psychological Bulletin, 52, 281-302.
Cureton, E.E. (1951). Validity. In E.F. Lindquist (Ed.), Educational measurement (pp.
621-694). Washington, DC: American Council on Education.
Dorans, N.J., and Holland, P.W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P.W. Holland and H. Wainer (Eds.),
Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.
Goldstein, H. (1996). Assessment: Problems, developments, and statistical issues: A
volume of expert contributions. New York: Wiley.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Holland, P.W., and Wainer, H. (1993). Differential item functioning. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Huang, X. (2007). Validity equivalence between the Chinese and English versions
of the IEA Child Cognitive Developmental Status Test. Berkeley: University of
California.
Thissen, D., Steinberg, L., and Wainer, H. (1993). Detection of differential item
function using the parameters of item response models. In P.W. Holland and
H. Wainer (Eds.), Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Wilson, M. (2005). Constructing measures: An item response modeling approach.
Mahwah, NJ: Lawrence Erlbaum Associates.
Wilson, M., and Adams, R.J. (1996). Evaluating progress with alternative
assessments: A model for Chapter 1. In M.B. Kane (Ed.), Implementing
performance assessment: Promise, problems and challenges. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Zhou, Z., and Boehm, A.E. (1999). Chinese and American childrens knowledge of basic
relational concepts. Paper presented at the biennial meeting of the Society for
Research in Child Development, April, Albuquerque, NM.
Chapter 8
Abedi, J., Hofstetter, C.H., and Lord, C. (2004). Assessment accommodations for
English-language learners: Implications for policy-based empirical research.
Review of Educational Research, 74(1), 1-28.
Abedi, J., Lord, C., Hofstetter, C., and Baker, E. (2000). Impact of accommodation
strategies on English language learners' test performance. Educational
Measurement: Issues and Practice, 19(3), 16-26.
American Educational Research Association, American Psychological Associa
tion, and National Council on Measurement in Education. (1999). Standards for
educational and psychological testing. Washington, DC: Author.
August, D., and Shanahan, T. (Eds.). (2006). Developing literacy in second-language
learners: Report of the National Literacy Panel on language-minority children and
youth. Mahwah, NJ: Lawrence Erlbaum Associates.
Bagnato, S.J. (2007). Authentic assessment for early childhood intervention: Best
practices. New York: Guilford Press.
Bagnato, S.J., and Neisworth, J. (1995). A national study of the social and
treatment invalidity of intelligence testing in early intervention. School
Psychologist Quarterly, 9(2), 81-102.
Bagnato, S.J., and Yeh-Ho, H. (2006). High-stakes testing with preschool children:
Violation of professional standards for evidence-based practice in early
childhood intervention. KEDI International Journal of Educational Policy, 3(1),
23-43.
Bagnato, S.J., Macey, M., Salaway, J., and Lehman, C. (2007a). Research foundations
for authentic assessments to ensure accurate and representative early intervention
eligibility. Washington, DC: U.S. Department of Education, Office of Special
Education Programs, TRACE Center for Excellence.
Bagnato, S.J., Macey, M., Salaway, J., and Lehman, C. (2007b). Research foundations
for conventional tests and testing to ensure accurate and representative early
intervention eligibility. Washington, DC: U.S. Department of Education, Office
of Special Education Programs, TRACE Center for Excellence.
Carta, J.J., Greenwood, C.R., Walker, D., Kaminski, R., Good, R., McConnell,
S., and McEvoy, M. (2002). Individual growth and development indicators
(IGDIs): Assessment that guides intervention for young children. Young
Exceptional Children Monograph Series, 4, 15-28.
Carter, A.S., Briggs-Gowan, M.J., and Ornstein Davis, N. (2004). Assessment of
young children's social-emotional development and psychopathology: Recent
advances and recommendations for practice. Journal of Child Psychology and
Psychiatry, 45(1), 109-134.
Castenell, L.A., and Castenell, M.E. (1988). Testing the test: Norm-referenced
testing and low-income blacks. Journal of Counseling and Development, 67,
205-206.
Center for Universal Design. (1997). The principles of universal design. Available:
http://www.design.ncsu.edu/cud/about_ud/udprinciplestext.htm [accessed
December 2007].
Chachkin, N.J. (1989). Testing in elementary and secondary schools: Can miscue
be avoided? In B. Gifford (Ed.), Test policy and the politics of opportunity allocation:
The workplace and the law (pp. 163-187). Boston: Kluwer Academic.
Child Trends. (2004). Early childhood measures profiles. Washington, DC: Author.
Cho, S., Hudley, C., and Back, H.J. (2002). Cultural influences on ratings of self-perceived social, emotional, and academic adjustment for Korean American
adolescents. Assessment for Effective Intervention; Special Issue: Assessment of
Culturally-Linguistically Diverse Learners, 29(1), 3-14.
Christenson, S.L. (2004). The family-school partnership: An opportunity to
promote learning and competence of all students. School Psychology Review,
33(1), 83-104.
Coleman, M.R., Buysse, V., and Neitzel, J. (2006). Recognition and response: An
early intervening system for children at-risk for learning disabilities. Chapel Hill:
University of North Carolina, FPG Child Development Institute.
Cook, T.D., and Campbell, D.T. (1979). Quasi-experimentation: Design & analysis
issues for field settings. Boston: Houghton Mifflin.
Coutinho, M.J., and Oswald, D.P. (2000). Disproportionate representation in
special education: A synthesis and recommendations. Journal of Child and
Family Studies, 9, 135-156.
Crocker, L., and Algina, J. (1986). Introduction to classical and modern test theory.
New York: Holt, Rinehart and Winston.
De Avila, E., and Duncan, S. (1990). Language Assessment ScalesOral. Monterey,
CA: CTB McGraw-Hill.
Deno, S. (1985). Curriculum-based measurement: The emerging alternative.
Exceptional Children, 52, 219-232.
Deno, S. (1997). Whether thou goest . . . Perspectives on progress monitoring. In
J.W. Lloyd, E.J. Kameenui, and D. Chard (Eds.), Issues in educating students with
disabilities (pp. 77-99). Mahwah, NJ: Lawrence Erlbaum Associates.
Division for Early Childhood. (2007). Promoting positive outcomes for children with
disabilities: Recommendations for curriculum, assessment, and program evaluation.
Missoula, MT: Author.
Garcia, G.E., Stephens, D.L., Koenke, K.R., Pearson, P.D., Harris, V.J., and Jimenez,
R.T. (1989). A study of classroom practices related to the reading of low-achieving
students: Phase one (Study 2.2.3.5). Urbana: University of Illinois, Reading
Research and Education Center.
Genesee, F., Geva, E., Dressler, C., and Kamil, M. (2006). Synthesis: Cross-linguistic relationships. In D. August and T. Shanahan (Eds.), Report of the
National Literacy Panel on Language Minority Youth and Children. Mahwah, NJ:
Lawrence Erlbaum Associates.
Gipps, C. (1999). Socio-cultural aspects of assessment. Review of Research in
Education, 24, 355-392.
Goldenberg, C., Rueda, R., and August, D. (2006). Synthesis: Sociocultural
contexts and literacy development. In D. August and T. Shanahan (Eds.),
Report of the National Literacy Panel on Language Minority Youth and Children.
Mahwah, NJ: Lawrence Erlbaum Associates.
Gopaul-McNicol, S., and Armour-Thomas, S.A. (2002). Assessment and culture.
New York: Academic Press.
Graziano, W.G., Varca, P.E., and Levy, J.C. (1982). Race of examiner effects and the
validity of intelligence tests. Review of Educational Research, 52(4), 469-497.
Green, R.L. (1980). Critical issues in testing and achievement of black Americans.
Journal of Negro Education, 49(3), 238-252.
Greenspan, S.I., and Meisels, S.J. (1996). Toward a new vision for the developmental assessment of infants and young children. In S.J. Meisels and E.
Fenichel (Eds.), New visions for the developmental assessment of infants and young
children (pp. 11-26). Washington, DC: Zero to Three.
Gutiérrez-Clellen, V.F. (1999). Language choice in intervention with bilingual
children. American Journal of Speech-Language Pathology, 8, 291-302.
Gutiérrez-Clellen, V.F., and Kreiter, J. (2003). Understanding child bilingual
acquisition using parent and teacher reports. Applied Psycholinguistics, 24,
267-288.
Hagie, M.U., Gallipo, P.G., and Svien, L. (2003). Traditional culture versus
traditional assessment for American Indian students: An investigation of
potential test item bias. Assessment for Effective Intervention, 23(1), 15-25.
Hall, C.C.I. (1997). Cultural malpractice: The growing obsolescence of psychology
with the changing U.S. population. American Psychologist, 52(6), 624-651.
Harbin, G., Rous, B., and McLean, M. (2005). Issues in designing state accountability systems. Journal of Early Intervention, 27(3), 137-164.
Harry, B., and Klingler, J. (2006). Why are so many minority students in special
education? Understanding race and disability in schools. New York: Teachers
College Press.
Hatton, D.D., Bailey, D.B., Burchinal, M.R., and Ferrell, K.A. (1997). Developmental growth curves of preschool children with vision impairments. Child
Development, 68(5), 788-806.
Hebbeler, K., and Spiker, D. (2003). Initiatives on children with special needs.
In J. Brooks-Gunn, A.S. Fuligni, and L.J. Berlin (Eds.), Early child development
in the 21st century: Profiles of current research initiatives. New York: Teachers
College Press.
Hebbeler, K., Barton, L., and Mallik, S. (2008). Assessment and accountability for
programs serving young children with disabilities. Exceptionality, 16(1), 48-63.
Hebbeler, K., Spiker, D., Bailey, D., Scarborough, A., Mallik, S., Simeonsson, R.,
Singer, M., and Nelson, L. (2007). Early intervention for infants and toddlers with
disabilities and their families: Participants, services, and outcomes. Menlo Park,
CA: SRI International. Available: http://www.sri.com/neils/pdfs/NEILS_
Report_02_07_Final2.pdf [accessed July 2008].
Helms, J.E. (1992). Why is there no study of cultural equivalence in standardized
cognitive ability testing? American Psychologist, 49(9), 1083-1101.
Hemmeter, M.L., Ostrosky, M., and Fox, L. (2006). Social and emotional
foundations for early interventions: A conceptual model for intervention.
School Psychology Review, 35(4), 583-601.
Hernandez, D. (2006). Young Hispanic children in the U.S.: A demographic portrait
based on Census 2000. Tempe: Arizona State University.
Hilliard, A.G. (1976). Alternatives to I.Q. testing: An approach to the identification
of gifted minority students. Sacramento: California State Department of
Education.
Hilliard, A.G. (1979). Standardization and cultural bias impediments to
the scientific study and validation of intelligence. Journal of Research and
Development in Education, 12(2), 47-58.
Hilliard, A.G. (1991). Testing African American students. Morristown, NJ: Aaron
Press.
Hilliard, A.G. (1994). What good is this thing called intelligence and why bother
to measure it? Journal of Black Psychology, 20(4), 430-443.
Hilliard, A.G. (2004). Intelligence: What good is it and why bother to measure it?
In R. Jones (Ed.), Black psychology. Hampton, VA: Cobb and Henry.
Hiner, N.R. (1989). The new history of children and the family and its implications
for educational research. In W.J. Weston (Ed.), Education and the American
family. New York: New York University Press.
Laing, S.P., and Kamhi, A. (2003). Alternative assessment of language and literacy
in culturally and linguistically diverse populations. Language, Speech, and
Hearing Services in Schools, 34(1), 44-55.
Losardo, A., and Notari-Syverson, A. (2001). Alternative approaches to assessing
young children. Baltimore, MD: Brookes.
Macy, M., Bricker, D.D., and Squires, J.K. (2005). Validity and reliability of a
curriculum-based assessment approach to determine eligibility for Part C
services. Journal of Early Intervention, 28(1), 1-16.
Madhere, S. (1998). Cultural diversity, pedagogy, and assessment strategies. The
Journal of Negro Education, 67(3), 280-295.
Markowitz, J., Carlson, E., Frey, W., Riley, J., Shimshak, A., Heinzen, H., Strohl,
J., Klein, S., Hyunshik, L., and Rosenquist, C. (2006). Preschoolers with
disabilities: Characteristics, services, and results: Wave 1 overview report from the
Pre-Elementary Education Longitudinal Study (PEELS). Rockville, MD: Westat.
Available: https://www.peels.org/Docs/PEELS%20Final%20Wave%201%20
Overview%20Report.pdf [accessed July 2008].
McCardle, P., Mele-McCarthy, J., and Leos, K. (2005). English language learners
and learning disabilities: Research agenda and implications for practice.
Learning Disabilities Research and Practice, 20(1), 69-78.
McConnell, S.R. (2000). Assessment in early intervention and early childhood
special education: Building on the past to project into the future. Topics in Early
Childhood Special Education, 20, 43-48.
McCormick, L., and Noonan, M.J. (2002). Ecological assessment and planning.
In M.M. Ostrosky and E. Horn (Eds.), Assessment: Gathering meaningful
information (pp. 47-60). Missoula, MT: Division for Early Childhood.
McCune, L., Kalmanson, B., Fleck, M.B., Glazewski, B., and Sillari, J. (1990). An
interdisciplinary model of infant assessment. In S.J. Meisels and J.P. Shonkoff
(Eds.), Handbook of early childhood intervention (pp. 219-245). New York:
Cambridge University Press.
McLean, L., and Cripe, J.W. (1997). The effectiveness of early intervention for
children with communication disorders. In M. Guralnick (Ed.), The effectiveness
of early intervention (pp. 329-428). Baltimore, MD: Brookes.
McLean, M. (2004). Assessment and its importance in early intervention/early
childhood special education. In M. McLean, M. Wolery, and D.B. Bailey, Jr.
(Eds.), Assessing infants and preschoolers with special needs (3rd ed., pp. 1-21).
Upper Saddle River, NJ: Pearson Assessments.
McLean, M. (2005). Using curriculum-based assessment to determine eligibility:
Time for a paradigm shift. Journal of Early Intervention, 28(1), 23-27.
McWilliam, R.A. (2004). DEC recommended practices: Interdisciplinary models.
In S. Sandall, M.L. Hemmeter, B.J. Smith, and M.E. McLean (Eds.), DEC
recommended practices: A comprehensive guide for practical application in early
intervention/early childhood special education (pp. 127-131). Longmont, CO:
Sopris West.
Meisels, S.J., and Atkins-Burnett, S. (2000). The elements of early childhood
assessment. In J.P. Shonkoff and S.J. Meisels (Eds.), Handbook of early childhood
intervention (2nd ed., pp. 231-257). New York: Cambridge University Press.
Meisels, S., and Provence, S. (1989). Screening and assessment: Guidelines for
identifying young disabled and developmentally vulnerable children and their
families. Washington, DC: National Early Childhood Technical Assistance
System/National Center for Clinical Infant Programs.
National Association for the Education of Young Children. (2005). Screening and
assessment of young English-language learners: Supplement to the NAEYC and
NAECS/SDE joint position statement on early childhood curriculum, assessment,
and program evaluation. Washington, DC: Author.
National Association for the Education of Young Children and National
Association of Early Childhood Specialists in State Departments of Education.
(2003). Early childhood curriculum, assessment, and program evaluation: Building an
effective, accountable system in programs for children birth through age 8. A position
statement. Washington, DC: National Association for the Education of Young
Children.
National Association of School Psychologists. (2000). Professional conduct manual.
Bethesda, MD: Author.
Pretti-Frontczak, K., Jackson, S., Gross, S.M., Grisham-Brown, J., Horn, E., and
Harjusola-Webb, S. (2007). A curriculum framework that supports quality
early education for all children. In E.M. Horn, C. Peterson, and L. Fox (Eds.),
Linking curriculum to child and family outcomes (pp. 16-28). Missoula, MT:
Division for Early Childhood.
Qi, C.H., Kaiser, A.P., Milan, S.E., Yzquierdo, Z., and Hancock, T.B. (2003). The
performance of low-income African American children on the Preschool
Language Scale-3. Journal of Speech, Language, and Hearing Research, 46, 576-590.
Rebell, M.A. (1989). Testing, public policy, and the courts. In B. Gifford (Ed.), Test
policy and the politics of opportunity allocation: The workplace and the law. Boston:
Kluwer Academic.
Reynolds, C.R. (1982). Methods for detecting construct and prediction bias.
In R.A. Berk (Ed.), Handbook of methods for detecting test bias (pp. 199-259).
Baltimore, MD: Johns Hopkins University Press.
Reynolds, C.R. (1983). Test bias: In God we trust; all others must have data. Journal
of Special Education, 17(3), 241-260.
Reynolds, C.R., and Kamphaus, R.W. (2003). Behavior assessment system for children
(2nd ed.). Minneapolis, MN: Pearson.
Reynolds, C.R., Lowe, P.A., and Saenz, A.L. (1999). The problem of bias in
psychological assessment. In C.R. Reynolds and T.B. Gutkin (Eds.), Handbook
of school psychology (3rd ed., pp. 549-595). New York: Wiley.
Rhodes, R., Ochoa, S.H., and Ortiz, S. (2005). Assessing culturally and linguistically
diverse students: A practical guide. New York: Guilford Press.
Rock, D.A., and Stenner, A.J. (2005). Assessment issues in the testing of children
at school entry. The Future of Children, 15(1), 15-34.
Rodrigue, J.R., Morgan, S.B., and Geffken, G.R. (1991). A comparative evaluation
of adaptive behavior in children and adolescents with autism, Down
syndrome, and normal development. Journal of Autism and Developmental
Disorders, 21(2), 187-196.
Rueda, R. (2007). Motivation, learning, and assessment of English learners.
Paper presented at the School of Education, California State University,
Northridge.
Rueda, R., and Yaden, D. (2006). The literacy education of linguistically and
culturally diverse young children: An overview of outcomes, assessment, and
large-scale interventions. In B. Spodek and O.N. Saracho (Eds.), Handbook of
research on the education of young children (2nd ed., pp. 167-186). Mahwah, NJ:
Lawrence Erlbaum Associates.
Rueda, R., MacGillivray, L., Monzó, L., and Arzubiaga, A. (2001). Engaged
reading: A multi-level approach to considering sociocultural features with
diverse learners. In D. McInerny and S.V. Etten (Eds.), Research on sociocultural
influences on motivation and learning (pp. 233-264). Greenwich, CT: Information
Age.
Santos, R.M., Lee, S., Valdivia, R., and Zhang, C. (2001). Translating translations:
Selecting and using translated early childhood materials. Teaching Exceptional
Children, 34(2), 26-31.
Chapter 9
Duncan, S.E., and De Avila, E. (1998). Pre-Language Assessment Scale 2000.
Monterey, CA: CTB McGraw-Hill.
Espinosa, L. (2005). Curriculum and assessment considerations for young children
from culturally, linguistically, and economically diverse backgrounds. Special
Issue, Psychology in the Schools, 42(8), 837-853.
Kim, H., Baydar, N., and Greek, A. (2003). Testing conditions influence the
race gap in cognition and achievement by household survey data. Applied
Developmental Psychology, 23, 16.
Mathematica Policy Research. (2006). Implementation of the Head Start National
Reporting System: Spring 2005 update. Princeton, NJ: Author.
Mathematica Policy Research. (2007). Language routing protocol developed for the
First Five LA Universal Preschool Child Outcomes Study, 2007-2008. Princeton,
NJ: Author.
Mathematica Policy Research. (2008). Introduction to conducting assessments as
part of survey projects: Presentation for staff development trainings. Princeton, NJ:
Author.
Maxwell, K.L., and Clifford, R.M. (2004). School readiness assessment. Young
Children: Journal of the National Association for the Education of Young Children,
January, 10. Available: http://journal.naeyc.org/btj/200401/Maxwell.pdf
[accessed February 2008].
Meisels, S.J., and Atkins-Burnett, S. (2006). Evaluating early childhood assessments: A differential analysis. In K. McCartney and D. Phillips (Eds.), Handbook
of early childhood development (pp. 533-549). Cambridge, MA: Blackwell.
Rowand, C., Sprachman, S., Wallace, I., Rhodes, H., and Avellar, H. (2005). Factors
contributing to assessment burden in preschoolers. Paper presented at the American
Association for Public Opinion Research, May, Miami, FL. Available: http://
www.allacademic.com/meta/p_mla_apa_research_citation/0/1/6/7/2/
p16722_index.html [accessed July 2008].
Shepard, L., Kagan, S.L., and Wurtz, L. (1998). Principles and recommendations for
early childhood assessments. Goal 1 Early Childhood Assessments Resource
Group. Washington, DC: National Education Goals Panel.
Snow, K.L. (2006). Measuring school readiness: Conceptual and practical
considerations. Early Education and Development, 17(1), 7-41.
Spier, E.T., Sprachman, S., and Rowand, C. (2004). Implementing large-scale studies
of children using clinical assessments. Paper presented at the Children and the
Mediterranean Conference, January, Genoa, Italy.
Sprachman, S., Atkins-Burnett, S., Glazerman, S., Avellar, S., and Loewenberg,
M. (2007). Minimizing assessment burden on preschool children: Balancing burden
and reliability. Paper presented at the Joint Statistical Meetings, September,
Salt Lake City, UT.
Chapter 10
American Educational Research Association, American Psychological Associa
tion, and National Council on Measurement in Education. (1999). Standards
for educational and psychological testing. Washington, DC: American Educational
Research Association.
Baker, E.L., Linn, R.L., Herman, J.L., and Koretz, D. (2002). Standards for
educational accountability systems. Los Angeles: National Center for Research on
Evaluation, Standards, and Student Testing, University of California.
Bruner, C., Wright, M.S., Gebhard, B., and Hubbard, S. (2004). Building an early
learning system: The ABCs of planning and governing structures. Des Moines, IA:
SECPTAN.
California Department of Education. (2003). Desired results for children and families.
Sacramento: Author, Child Development Division.
California Department of Education. (2005). Desired Results Developmental
Profile-Revised (DRDP-R), Preschool Instrument. Sacramento: Author, Child
Development Division.
Espinosa, L.M. (2008). A review of the literature on assessment issues for young English
language learners. Paper commissioned by the Committee on Developmental
Outcomes and Assessments for Young Children, The National Academies,
Washington, DC.
Espinosa, L.M., and López, M.L. (2007). Assessment considerations for young
English language learners across different levels of accountability. Philadelphia: The
National Early Childhood Accountability Task Force.
García, E.E. (2005). Teaching and learning in two languages: Bilingualism and schooling
in the United States. New York: Teachers College Press.
Gilliam, W.S., and Zigler, E.F. (2004). State efforts to evaluate the effects of prekindergarten: 1977-2003. New Haven, CT: Yale University Child Study Center.
Goodman, D.P., and Hambleton, R.K. (2003). Student test score reports and
interpretive guides: Review of current practices and suggestions for future research.
Amherst: University of Massachusetts School of Education.
Hambleton, R.K., and Slater, S.C. (1997). Reliability of credentialing examinations
and the impact of scoring models and standard setting policies. Applied
Measurement in Education, 10, 19-38.
Harms, T., Clifford, R., and Cryer, D. (1998). Early Childhood Environment Rating
Scale (Revised ed.). New York: Teachers College Press.
Harms, T., Cryer, R., and Clifford, R. (1990). Infant/Toddler Environment Rating
Scale. (Revised ed.). New York: Teachers College Press.
Herman, J.L., and Perry, M. (2002). California student achievement: Multiple views of
K-12 progress. Menlo Park, CA: Ed Source.
Jaeger, R.M. (1998). Evaluating the psychometric qualities of the National
Board for Professional Teaching Standards assessments: A methodological
accounting. Journal of Personnel Evaluation in Education, 22, 189-210.
Kagan, S.L., Tarrant, K., and Berliner, A. (2005). Building a professional development
system in South Carolina: Review and analysis of other states experiences. New
York: Columbia University National Center for Children and Families.
Koretz, D.M., and Barron, S.I. (1998). The validity of gains in scores on the Kentucky
Instructional Results Information System (KIRIS). Santa Monica, CA: RAND
Corporation.
Linn, R.L. (2003). Accountability: Responsibility and reasonable expectations.
Educational Researcher, 32(7), 3-13.
Meisels, S.J. (2006). Accountability in early childhood: No easy answers. Chicago, IL:
Erikson Institute.
Mitchell, A.W. (2005). Stair steps to quality: A guide for states and communities
developing quality rating systems for early care and education. Alexandria, VA:
United Way Success by Six.
National Early Childhood Accountability Task Force. (2007). Taking stock: Assessing
and improving early childhood learning and program quality. Philadelphia:
Author.
National Research Council. (2001). Knowing what students know: The science
and design of educational assessment. Committee on the Foundations of
Assessment, J. Pellegrino, N. Chudowsky, R. Glaser (Eds.). Board on Testing
and Assessment, Center for Education, Division of Behavioral and Social
Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (2006). Systems for state science assessment. Committee
on Test Design for K-12 Science Achievement, M.R. Wilson and M.W.
Bertenthal (Eds.). Board on Testing and Assessment, Center for Education,
Division of Behavioral and Social Sciences and Education. Washington, DC:
The National Academies Press.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Neuman, S.B., and Roskos, K. (2005). The state of state pre-kindergarten
standards. Early Childhood Research Quarterly, 20(2), 125-145.
New Jersey Office of Early Childhood Education. (2004). NJ early learning
assessment system–Literacy. Trenton: New Jersey Department of Education.
New Jersey Office of Early Childhood Education. (2006). NJ early learning
assessment system–Math. Trenton: New Jersey Department of Education.
Pianta, R.C. (2003). Standardized classroom observations from pre-K to third grade: A
mechanism for improving quality classroom experiences during the P-3 years. New
York: Foundation for Child Development.
Scott-Little, C., Kagan, S.L., and Frelow, V.S. (2003a). Creating the conditions for
success with early learning standards: Results from a national study of state-level standards for children's learning prior to kindergarten. Early Childhood
Research & Practice, 5(2).
Scott-Little, C., Kagan, S.L., and Frelow, V.S. (2003b). Standards for preschool
childrens learning and development: Who has standards, how were they developed,
and how are they used? Greensboro: University of North Carolina.
Smith, M., and Dickinson, D. (2002). User's guide to the early language & literacy
classroom observation toolkit. Available: http://www.brookespublishing.com/
store/books/smith-ellco/index.htm [accessed July 2008].
U.S. Department of Education. (2004). Standards and assessments peer review
guidance: Information and examples for meeting requirements of the No Child Left
Behind Act of 2001. Washington, DC: Author.
Wainer, H. (1997). Improving tabular displays: With NAEP tables as examples and
inspirations. Journal of Educational and Behavioral Statistics, 22, 1-30.
Wainer, H., Hambleton, R.K., and Meara, K. (1999). Alternative displays for
communicating NAEP results: A redesign and validity study. Journal of
Educational Measurement, 36, 301-335.
Chapter 11
American Educational Research Association, American Psychological Associa
tion, and National Council on Measurement in Education. (1999). Standards for
educational and psychological testing. Washington, DC: Author.
Christenson, S.L. (2004). The family-school partnership: An opportunity to
promote learning and competence of all students. School Psychology Review,
33(1), 83-104.
Goldenberg, C., Rueda, R., and August, D. (2006). Synthesis: Sociocultural
contexts and literacy development. In D. August and T. Shanahan (Eds.),
Report of the National Literacy Panel on Language Minority Youth and Children.
Mahwah, NJ: Lawrence Erlbaum Associates.
National Education Goals Panel. (1995). Reconsidering children's early development
and learning: Toward common views and vocabulary. Washington, DC: Author.
National Research Council. (1999). High stakes: Testing for tracking, promotion,
and graduation. Committee on Appropriate Test Usage, J.P. Heubert and
R.M. Hauser (Eds.). Center for Education, Division of Behavioral and Social
Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (2006). Systems for state science assessment. Committee
on Test Design for K-12 Science Achievement, M.R. Wilson and M.W.
Bertenthal (Eds.). Board on Testing and Assessment, Center for Education,
Division of Behavioral and Social Sciences and Education. Washington, DC:
The National Academies Press.
National Research Council and Institute of Medicine. (2000). From neurons
to neighborhoods: The science of early childhood development. Committee on
Integrating the Science of Early Childhood Development, J.P. Shonkoff and
D.A. Phillips (Eds.). Board on Children, Youth, and Families, Commission
on Behavioral and Social Sciences and Education. Washington, DC: National
Academy Press.
Rueda, R. (2007). Motivation, learning, and assessment of English learners. Paper
presented at the School of Education, California State University, Northridge,
April.
Rueda, R., and Yaden, D. (2006). The literacy education of linguistically and
culturally diverse young children: An overview of outcomes, assessment, and
large-scale interventions. In B. Spodek and O.N. Saracho (Eds.), Handbook of
research on the education of young children (2nd ed., pp. 167-186). Mahwah, NJ:
Lawrence Erlbaum Associates.
Rueda, R., MacGillivray, L., Monzó, L., and Arzubiaga, A. (2001). Engaged reading:
A multi-level approach to considering sociocultural features with diverse
learners. In D. McInerny and S.V. Etten (Eds.), Research on sociocultural influences
on motivation and learning (pp. 233-264). Greenwich, CT: Information Age.
Appendixes

Appendix
Glossary

Accommodations
Achievement test
Alternative assessment
Assessment
Authentic assessment
Construct-irrelevant variance
Criterion-referenced assessment
Curriculum-based assessment
Developmental assessment

Developmentally appropriate: Developmentally appropriate practice is informed by what is known about child development and learning, what is known about each child as an individual, and what is known about the social and cultural contexts in which children live (adapted from National Association for the Education of Young Children, 1996, 2008).

Dynamic assessment: Assessment approach characterized by guided support or learning for the purpose of determining a child's potential for change (Losardo and Notari-Syverson, 2001).

Formal assessment: A procedure for obtaining information that can be used to make judgments about characteristics of children or programs using standardized instruments (Council of Chief State School Officers, 2008).

Formative assessment: An assessment designed to monitor progress toward an objective and used to guide curricular and instructional decisions.

High-stakes assessment: Tests or assessment processes for which the results lead to significant sanctions or rewards for children, their teachers, administrators, schools, programs, or school systems. Sanctions may be direct (e.g., retention in grade for children, reassignment for teachers, reorganization for schools) or unintended (e.g., narrowing of the curriculum, increased dropping out).

Informal assessment: A procedure for obtaining information that can be used to make judgments about characteristics of children or programs using means other than standardized instruments (Council of Chief State School Officers, 2008).

Naturalistic assessment: See Authentic assessment.

Norm-referenced test
Performance assessment
Portfolio assessment
Progress monitoring
Readiness test
Reliability
Screening
Standardized test
Standards-based assessment
Summative assessment
Validity (of an assessment or tool)
SOURCES
American Educational Research Association, American Psychological Association,
and National Council on Measurement in Education. (1999). Standards for
educational and psychological testing. Washington, DC: Author.
Association for Supervision and Curriculum Development. (2008). Homepage.
Available: http://www.ascd.org [accessed June 2008].
Bagnato, S.J., and Neisworth, J.T. (1991). Assessment for early intervention: Best
practices for professionals. New York: Guilford Press.
Council of Chief State School Officers. (2008). Glossary terms. Washington,
DC: Author. Available: http://www.ccsso.org/projects/scass/projects/
early_childhood_education_assessment_consortium/publications_and_
products/2892.cfm [accessed August 2008].
Losardo, A., and Notari-Syverson, A. (2001). Alternative approaches to assessing
young children. Baltimore, MD: Brookes.
McAfee, O., Leong, D.J., and Bodrova, E. (2004). Basics of assessment: A primer
for early childhood educators. Washington, DC: National Association for the
Education of Young Children.
Appendix
AGENDA
1:00 Catherine Snow, Committee Chair, and Susan Van
Hemel, Study Director. Welcome and introduction of
committee. Description of the study and purpose of the
forum. Review of procedure and ground rules.
1:20 Ben Allen, National Head Start Association
1:32 Tammy Mann, Zero to Three
1:44 Fasaha Traylor, Foundation for Child Development
1:56 Jerlean Daniel, National Association for the Education of
Young Children
2:08 Joan Isenberg, National Association of Early Childhood
Teacher Educators
2:20 Sally Flagler, National Association of School Psychologists
2:32 Andrea Browning, Society for Research in Child
Development (brief statement)
2:40 Break
3:00 Willard Gilbert, National Association for Bilingual
Education
Appendix
Development of State Standards for Early Childhood Education
TABLE C-1 Domain/Content Area Headings Included in National and State Pre-K Early Learning Standards Documents

The table records, for two national documents (the Head Start Child Outcomes Framework and Carnegie/McGraw-Hill) and for each state's pre-K early learning standards document (AL, AK, AZ, AR, CA, CO, CT, DE, FL, GA, HI, ID, IL, IN, IA, KS, KY, LA, ME [Learning Results; Early Learning Results], MD, MA, MI, MN, MS, MO, MT, NE, NV, NH, NJ, NM, NY, NC, ND, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VT, VA, WA, WV, WI, WY), which of the following domain/content area headings appear in its table of contents: Physical/Motor/Health; Social/Emotional; Approaches Toward Learning; Language/Communication; Literacy; Cognition/General Knowledge; Math; Science; Art/Aesthetics; Social Studies; and Other (e.g., Humanities, Safety, Technology, Engineering, Environmental Education, World Languages, Foreign Language, Nutrition, Self-Help, Career Preparation, Modern and Classic Languages).

NOTE: This table has been adapted with permission from a 2005 report by Scott-Little, Kagan, and Frelow, Inside the Content: The Depth and Breadth of Early Learning Standards. The table has been updated to include states that published their early learning standards documents after that report was completed. Data were collected by reviewing the table of contents of each early learning standards document and noting the developmental domain areas and academic subject areas included. Results from analyses conducted by Scott-Little, Kagan, and Frelow (2005) on the content of the actual early learning standards indicate that the table of contents is not always an accurate reflection of the content of the standards themselves. While a table of contents may reflect the intentions or overall mindset of the persons who developed the early learning standards, it does not necessarily give a complete or accurate indication of the areas of learning and development addressed in the standards themselves.
REFERENCES
Council of Chief State School Officers and Early Childhood Education Assessment
Consortium. (2007). The words we use: A glossary of terms for early childhood
education standards and assessment. Available: http://www.ccsso.org/Projects/
scass/projects/early_childhood_education_assessment_consortium/
publications_and_products/2892.cfm [accessed February 2008].
Foundation for Child Development. (2008). PreK-3rd: A new beginning for American
education. Available: http://www.fcd-us.org/initiatives/initiatives_show.
htm?doc_id=447080 [accessed May 2008].
Michigan State Board of Education. (2006). Early childhood standards of quality for
infant and toddler programs. Lansing: Author.
National Institute for Early Education Research. (2003). State standards database.
New Brunswick, NJ: Author.
National Research Council. (2001). Eager to learn: Educating our preschoolers.
Committee on Early Childhood Pedagogy, B.T. Bowman, M.S. Donovan,
and M.S. Burns (Eds.). Commission on Behavioral and Social Sciences and
Education. Washington, DC: National Academy Press.
Neuman, S.B., and Roskos, K. (2005). The state of state pre-kindergarten
standards. Early Childhood Research Quarterly, 20(2), 125-145.
Partnership for 21st Century Skills. (2007). The intellectual and policy foundations of
the 21st century skills framework. Tucson, AZ: Author.
Scott-Little, C., Kagan, S.L., and Frelow, V.S. (2005). Inside the content: The depth and
breadth of early learning standards. Greensboro: University of North Carolina,
SERVE Center for Continuous Improvement.
Scott-Little, C., Lesko, J., Martella, J., and Milburn, P. (2007). Early learning
standards: Results from a national survey to document trends in state-level
policies and practices. Early Childhood Research and Practice, 9(1), 1-22.
White House. (2002). Good start, grow smart: The Bush administration's early childhood initiative. Washington, DC: Executive Office of the President.
Appendix
Sources of Detailed Information on Test and Assessment Instruments
psychometric properties. It covers measures of child development; parenting, the home environment, and parent well-being;
and program implementation and quality.
Title: Screening for Developmental and Behavioral Problems
Source: Glascoe (2005), Mental Retardation and Developmental Disabilities Research Reviews, 11(3), 173-179
Notes: This recent review article by Glascoe describes screening
tools and instruments for use with infants and young children and
is focused chiefly on instruments for use in pediatric surveillance
and screening programs. Similar information authored by Glascoe
is available at the DBPeds website (see below).
Title: Developmental Screening Tools: Gross Motor/Fine Motor for
Newborn, Infants and Children
Source: (Beligere, Zawacki, Pennington, and Glascoe, 2007)
(available at DBPeds website)
Notes: Glascoe and colleagues provide a listing specifically of
screening tools for gross motor and fine motor development, also
available on the DBPeds website.
Title: Assessing Social-Emotional Development in Children from a
Longitudinal Perspective for the National Children's Study: Social-Emotional Compendium of Measures
Source: (Denham, 2005) (available at The National Children's
Study website)
Notes: Denham provides extensive information on content and
psychometric characteristics of social-emotional instruments,
with additional comments on their use for the National Children's
Study. She includes judgments on strengths and weaknesses of
each measure reviewed, and references for research studies of
each instrument. Many of the measures reviewed are not for
young children.
Mathematica Policy Research. (2003). Resources for measuring services and outcomes
in Head Start programs serving infants and toddlers. Princeton, NJ: Author.
Ringwalt, S. (2008). Developmental screening and assessment instruments with an
emphasis on social and emotional development for young children ages birth through
five. Chapel Hill: University of North Carolina, FPG Child Development
Institute, National Early Childhood Technical Assistance Center.
Appendix E
Biographical Sketches of Committee Members and Staff
science education, patient-reported outcomes, and child development; to policy issues in the use of assessment data in accountability systems. He has recently published three books: Constructing
Measures: An Item Response Modeling Approach is an introduction
to modern measurement; Explanatory Item Response Models: A
Generalized Linear and Nonlinear Approach (with Paul De Boeck)
introduces an overarching framework for the statistical modeling of measurements that makes available new tools for understanding the meaning and nature of measurement; and Towards
Coherence Between Classroom Assessment and Accountability explores
the issues relating to the relationships between large-scale assessment and classroom-level assessment. At the NRC, he chaired
the Committee on Test Design for K-12 Science Achievement. He
is the founding editor of Measurement: Interdisciplinary Research
and Perspectives. He has a Ph.D. in educational measurement and
educational statistics from the University of Chicago.
Martha Zaslow is the vice president for research at Child Trends
and area director for the early child development content area.
Her research takes an ecological perspective, considering the
contributions of different contexts to the development of children
in low-income families, including the family, early care and education, and policy contexts. In studying the role of the family, she
has focused especially on parenting, carrying out observational
studies of mother-child interaction. In studying early care and
education, her work has focused on patterns of child care use
among low-income families and on strategies to improve child
care quality. She has a particular interest in the professional
development of those working in early childhood settings and
its relation to quality and to child outcomes. With respect to the
policy context, she has studied the use of funding from the Child
Care and Development Fund to improve child care quality, state
initiatives to improve children's school readiness, and impacts
on children of different welfare reform policies. At the NRC, she
was a member of the NRC/IOM Committee on Promoting Child
and Family Well-Being Through Family Work Policies: Building a
Knowledge Base to Inform Policies and Practice. She has a Ph.D.
in psychology from Harvard University.
Index
A
A Developmentally Appropriate
Practices Template (ADAPT),
162-163, 175
Abecedarian Project, 104, 111
Access to health care, 75
Access to test data, 3
Accommodations
for children with disabilities, 5, 8,
40, 250, 254, 259, 260, 267, 273,
276-279, 295-296, 298, 330-331,
353, 367
defined, 423
for English language learners, 250,
254, 366
Accountability. See also High-stakes
assessment
appropriate use of assessments,
35-36, 39-41, 167, 198, 259
current practice, 326
demand for assessments, 1, 18-19,
246, 280
development of instruments for,
367-368
interpreting test data, 35
Adaptive Social Behavior Index, 122
Administration for Children and
Families (ACF), 20, 21, 23, 24,
25, 53, 55
Administration of assessments. See
also Implementing assessments
accommodations for children with
disabilities, 272, 295-296
to English language learners, 105,
116-117, 260, 291-295
familiarity of assessor to child,
185-186, 227, 286-288
guidelines, 37-39, 268-269, 272
individualization of, 272
length of, 288-291, 294
order of, 293-294
standardization, 40, 74-75, 287
standards for testing, 250-251
stop rules, 290-291
training of examiners, 3, 33, 64,
102, 150, 256, 285-286, 291,
294-295
Age of children, 3, 38, 71, 194
Ages and Stages, 77
Ainsworth Strange Situation
Procedure, 84
Alberta Infant Motor Scale, 80
Alternative assessment. See
Performance assessments
American Academy of Pediatrics, 67,
68, 70, 71, 453
American Educational Research
Association, 55
American Psychological Association,
107
Approaches to learning
consensus, 97-98
constructs, 97, 107
domain defined, 58, 97
early childhood education
standards, 445
instruments, 128-129
intervention studies, 99
malleability, 99
measures of, 100
and outcomes, 98-99
testing all children, 100, 242
Appropriate use of assessments
accommodations for disabilities,
5, 8, 267, 295-296, 298, 330-331,
353
for accountability, 35-36, 39-41,
167, 198, 259
age and, 3, 38, 71, 194
defining, 22, 184
developmental delays and, 38, 271
domain definition and, 3, 184, 433
English language learners, 3, 40,
250-251, 256, 258, 259, 292, 293,
295, 298
guidelines for, 37-39, 270, 345-346,
353, 360
inclusivity and, 320-321, 322
legal and ethical precedents,
250-251
level of expectation for program
target and, 197
minority children, 3, 235-240, 243,
244, 259
program evaluation, 39, 86, 148,
197-198, 259, 297
for progress monitoring, 39, 86,
148, 197, 259, 297
purpose of assessment and, 27,
259, 283, 341-342, 344, 433
for quality of learning
environments, 320
readiness assessment, 31
for screening, 360
for special needs children, 3, 4, 38,
40, 271, 280, 283
standardization sample and
methods and, 237-238, 243, 271
test and item bias, 205, 210-212,
235-240, 243
for testing all children, 96, 100,
104-106, 112, 116-117
testing situation and assessor
characteristics and, 238-239
Assessment, Evaluation and
Programming System (AEPS),
333
Assessment, generally. See also
Instruments for assessment
current forms, 324-328
B
Bank Street curriculum, 335
Battelle Developmental Inventory
Screening Test, 77
Bayley Scales of Infant and Toddler
Development, 40, 78, 80, 81, 88,
113, 120, 121, 122, 123, 124, 129,
133, 242, 285, 288
Brigance Screens, 77
Buros Institute of Mental
Measurements, 215, 450, 452
C
California Desired Results for
Children and Families (DRCF)
Access for Children with
Disabilities Project, 273-274,
330-331
Desired Results Developmental
Profile, 10, 36, 118, 277, 309-310,
312, 330, 332, 339, 358
system, 277, 329-332
California Preschool Learning
Foundations in Social and
Emotional Development for
Ages 3 and 4, 90
Capute Scales (CAT/CLAMS), 78
Caregiver. See Parent-child
interaction; Primary caregiver-child interactions
Caregiver Interaction Scale (CIS), 160-161, 174
Caregiver-Teacher Report Form, 124,
125
Center-based environments. See also
Classroom environments
components, 99
quality measures, 155-156, 160,
162, 168, 169, 173
Center for Universal Design, 277
Cerebral palsy, 68, 69
Charge to committee, 20-22
Checklist for Autism in Toddlers
(CHAT), 81
Child Behavior Checklist, 122, 124,
125, 242
Child Care and Development Fund,
49
Child Development Inventory and
Child Development Review-Parent Questionnaire, 78
Child/Home Early Language
and Literacy Observation
(CHELLO), 153-154, 174
Cognitive skills. See also Attention
span; Executive functioning;
General knowledge;
Intelligence tests; Memory
consensus, 109
continuity and associations with
important outcomes, 108, 110
domain defined, 58, 106-109
English language learners, 253-254
infants, 108, 110, 113
instruments, 68, 77-78, 110, 111,
130-132, 253
lead poisoning and, 68
malleability, 111-112
measures of, 107, 108, 110, 113,
153-154
minority children, 242
nutritional deficiency and, 67, 76
quality of environment and, 108,
148, 151, 153-154, 170
standards of learning, 89
stimulation in home environments,
146, 151-152, 153-154
testing all children, 112, 242,
253-254
Communication and Symbolic
Behavior Scales, 79
Comprehensive Test of Phonological
Processing, 103, 140
Concepts About Print, 102, 140
Congenital hypothyroidism, 65
Conners' Rating Scales-Revised
(CRS-R), 125
Construct, defined, 186. See also
Validity of assessments
Construct-irrelevant variance, 188,
191, 196, 274-275, 277, 278, 424
Context measures, 60
Contextual issues, 15-20, 40-41, 63-64,
67 n.1, 168, 226-231, 235-237,
243, 247, 250, 257-258
Continuous performance task (CPT),
113, 128
Council for Exceptional Children,
Division for Early Childhood
(DEC), 33, 38, 39, 268, 269
Council of Chief State School Officers,
38, 44, 308-309, 437 n.1, 439-440,
445-446
D
Databases on measurement
instruments, 452-453
Day/night test, 113
Delay-of-Gratification Task, 122, 128
Denver Developmental Screening Test
II, 77, 88, 120, 127, 144
Denver Prescreening Developmental
Questionnaire, 77
Desired Results Developmental
Profile, 10, 36, 118, 277, 309-310,
312, 330, 332, 339, 358, 375
Developmental assessment
charge to committee, 21-22,
431-432
clinical guidelines, 70
contexts for, 63-64
defined, 424
infants and toddlers, 2, 65, 70-72,
73, 110, 261
mandatory, 75
newborns, 68-70
normal limits, 72, 74
research agenda, 12, 360-368
for special needs children, 76, 261,
271, 369
types, 71, 77-84
Developmental delays, 38, 64, 65, 262,
263, 271. See also Special needs
children
Developmental Indicators for
Assessment of Learning-Revised, 77
Developmental outcome measures,
69, 73, 88, 275-276
English language learners, 52, 54,
55
guidelines on, 5, 348-349
Head Start, 50-52
infant-toddler period, 63
instruments by, 71, 77-84, 87
justifications for, 59-60, 346-348
measurement ease, 17
overlap across, 275
schooling-related, 86
subscales, 87
Dots Task, 95
Dynamic assessment, 143, 144, 425
Dynamic Indicators of Basic Early
Literacy Skills, 143, 144
E
Early Childhood Classroom
Observation Measure
(ECCOM), 165, 175
Early Childhood Education
Assessment Consortium, 44,
437 n.1, 439-440, 445-446
Early childhood education standards
for accreditation, 45
alignment with assessments, 184-185, 335-336, 338
concerns about, 46, 48
content, by state, 439-444
defined, 44, 437 n.1
development history, 36-37, 45-52,
437-447
differences among state
documents, 438-439
domains, 44, 97, 441-444, 445
Good Start, Grow Smart initiative,
52-53
Head Start Child Outcomes
Framework, 46, 49-52, 184, 445
important influences, 48-49
K-12 learning standards and,
445-447
national, 46
National Reporting System, 20, 23,
47, 49, 53-55, 201, 273, 284, 430
state, 45-46, 97
uses, 44
Early Childhood Environment Rating
Scale-Extension (ECERS-E),
164-165, 175
Early Childhood Environment
Rating Scale-Revised Edition
(ECERS-R), 147, 163-164, 167,
168, 175, 336
Early Childhood Learning and
Knowledge Center, 23
Early Childhood Longitudinal Study-Birth Cohort, 36, 285, 288
Early Childhood Longitudinal Study-Kindergarten (ECLS-K), 36, 98,
100, 122, 128, 129, 201, 266-267,
273, 367
Early Head Start, 23, 63, 64, 104, 111,
152, 201, 266-267, 430, 438, 450
Research and Evaluation Study,
152, 201, 267, 285, 466
Early Language and Literacy
Classroom Observation
(ELLCO), 154, 165-166, 175, 334
Early Language Milestone Scale, 79
Early Learning Assessment System,
335-336
Early Motor Pattern Profile (EMPP),
80
Early Training Project, 111
Educational Testing Service, 71, 207,
215, 452, 453
Effect sizes, 111, 208-209
Emerging Academics Snapshot (EAS),
166-167, 175
Emotion Matters II Direct
assessments, 95
English language learners
accommodations, 250, 254, 366
administration of assessments, 105,
116-117, 250-251, 260, 291-295
appropriateness of assessments
for, 3, 40, 250-251, 256, 258, 259,
292, 293, 295, 298
assessment issues, 23, 110, 112,
208-209, 249-258, 350-351
cognitive assessments, 253-254
contextual issues, 247, 250, 257-258
domains, 251-255
examiner issues, 247, 250
F
Fagan Test of Infant Intelligence, 77
Family and Child Experiences Survey
(FACES), 50, 148, 285-286, 289
Family Day Care Environment Rating
Scale (FDCERS), 147, 167, 176
Five LA Universal Preschool Child
Outcomes Study, 292
Flanker Task, 95
Formal assessment, 71, 106, 117, 119,
137, 236-237, 272, 370, 371, 425
G
Galileo System for the Electronic
Management of Learning, 121,
123, 128, 129, 132, 135, 137, 138,
139
Games as Measurement for Early
Self-Control (GAMES), 120, 121,
123, 130
General knowledge, 58, 87, 107
instruments, 133-135
Generalizability theory, 200
Genetic/metabolic screening, 64-65
Global functioning, 17
Goal 1 Early Childhood Assessments
Resource Group, 38
Goals 2000, 48
Good Start, Grow Smart initiative, 47,
49, 52-53, 348, 437-438, 446
Government Performance and Results
Act, 1, 19
Growth Charts, 120
Guidelines. See also Standards
developmental outcome measures,
5-6, 348-349
domains, 5, 348-349
government responsibility, 372-373
health care providers role, 369
implementing guidance, 369-374
instrument selection and
implementation, 6-8, 352-354
of professional organizations,
37-39
program administrators role,
371-372
purposeful assessments, 5, 345-346
rationales for, 342-345, 346-348,
349-351, 354-356
researchers role, 374
H
Head Start, 2, 18, 45, 156, 302
approaches to learning in, 99, 100
assessment practices, 52, 53, 54,
110, 327, 328, 430
Child Outcomes Framework, 46,
49-52, 97, 184, 326, 327, 445
Family and Child Experiences
Survey (FACES), 50, 148, 285-286, 289
Impact Study, 112, 148, 289
learning standards, 88
National Reporting System, 20, 23,
47, 49, 53-55, 201, 273, 284, 285,
287, 289, 291, 293, 294, 295, 296,
297, 327, 430
Office of Planning, Research and
Evaluation, 23
performance measures, 51, 322,
430
Pyramid of Services, 49, 50, 51
reauthorization, 55
State Collaboration Offices, 438
University Partnership
Measurement Development
Grants Program, 105
Head Start Act, 49
Head to Toe Task, 95
Health care providers
assessment of infants and toddlers,
64
implementing guidance, 369
Hearing
impairment, and assessment
validity, 274, 296
screening, 29, 30, 66, 255, 262, 263
n.2, 363
High/Scope Child Observation
Record (COR), 33, 121, 124, 129,
133, 135, 138, 139, 172, 333, 335
High/Scope Perry Preschool Project,
111
High-stakes assessment, 27. See also
Accountability
defined, 2-3 n.1, 425
guidelines for using, 7, 10, 296,
353, 355, 358, 373
reliability and validity, 283
systemic approach, 337
unintended or inappropriate uses
of data, 284, 286, 337, 355-356,
358, 373
unintended or undesirable
consequences, 195
Home environments
and academic and social outcomes,
155
assessing, 149, 150-155, 167, 168,
169, 172, 173
basic needs and safety monitoring
provided, 151, 152
cognitive stimulation, 146, 151-152,
153-154
primary caregiver-child
interactions, 152-153
Home Observation for Measurement
of the Environment (HOME),
154-155, 174
Home visiting programs, 18, 63, 111-112, 145-146, 149, 154
I
Implementing assessments. See
also Administration of
assessments
cost analysis, 97, 297-298
determining and communicating
purpose, 281, 282-284, 291, 292-293, 296-297, 341-342
following up on administration,
33, 296-298
guidelines, 6-8, 352-354
parental consent, 284-285
preparing for administration,
282-286
protecting data, 286
rationale for guidelines, 349-351
standardization in, 283
K
Kaufman Assessment Battery for
Children (K-ABC), 113, 130,
131, 137, 142, 242, 245
Knowledge. See General knowledge
L
Labeling vulnerable children, 46
Language and literacy, 16-17, 30. See
also Phonological awareness;
Reading; Vocabulary
accountability assessments, 102
associations with important
outcomes, 66, 103-104
cognitive skills and, 110
constructs, 58
delays/disorders, 101, 102, 106
diagnostic testing, 101, 102, 106
discourse skills, 101, 102, 103
domain defined, 79, 100-103,
139-144
early learning guidelines, 49
English language learners, 104-105, 172-173, 248-249, 251-252,
291-292
instructional and intervention
planning, 32, 104
instruments/tools for assessment,
59, 79, 101, 102, 106, 139-144,
162, 165-166, 168-169, 172-173,
174-177
learning behaviors and, 98
length of assessment, 289-290
malleability, 104
measures of, 102, 105, 106
minority children, 237, 242
quality of learning environment,
17, 104, 148, 154, 155, 157, 158,
161, 162, 164, 165-166, 167, 168,
169, 170, 174-177
receptive language, 66, 101
research-related assessment, 101,
106
standards of learning, 52-53, 89
testing all children, 104-106, 242
training examiners, 102
transfer theory, 248-249
validity of scores, 101
Language Assessment Scales (LAS),
252
Language minority. See English
language learners
Large-scale assessments, 40, 254, 259-260, 266-267, 285
Lead screening, 29, 30, 68
Learning disabilities, 34, 255, 263 n.2
Learning standards. See Approaches
to learning; Early childhood
education standards; Standards
Lexington Developmental Scales, 77
Limited English proficient. See
English language learners
Literacy. See Language and literacy
Literacy Activities Rating Scale, 166
Literacy Environment Checklist, 166
M
MacArthur-Bates Communicative
Development Inventories, 40,
79, 101, 105, 139, 142
Mathematica Policy Research, 201,
202, 203, 204, 283, 284, 285, 286,
287, 293, 295, 450-451
Mathematics
and academic achievement, 116
algebraic concepts, 115-116, 118
developmentally appropriate, 171
domain defined, 107, 114-116,
136-138
early learning standards, 49, 116
geometry, 114-115, 117, 118
importance, 116
instruments for assessment, 118,
136-138, 170-171, 175, 176
language-oriented problems, 154
learning-related behaviors and, 98,
99
mathematical reasoning, 116,
170-171
measurement skills, 114, 115, 117,
118, 120, 121, 123, 130, 170-171
measures of, 110, 117-118
number sense, 114, 117, 118, 165, 170
quality of learning environment,
157, 158, 161, 164, 170-171, 175,
176
testing all children, 116-117
U.S. students performance, 116
vocabulary and, 58
McCarthy Scales of Children's Abilities,
78
N
National Assessment of Educational
Progress, 337
National Association for the
Education of Young Children
(NAEYC), 33, 38, 39, 45, 161,
162, 258, 268, 334
National Association of Early
Childhood Specialists in State
Departments of Education, 38,
268
National Association of School
Psychologists, 250, 268
National Association of Test Directors,
246
National Center for Education
Statistics, 36
National Child Care Information
Center, 107
National Children's Study, 451
National Early Childhood
Accountability Task Force, 24,
39, 302, 322, 324
National Early Childhood Technical
Assistance Center, 71, 452
National Early Intervention
Longitudinal Study, 266
National Education Goals Panel, 4, 38,
48, 50, 58, 86, 97, 282, 347
National Head Start Association, 23,
55
National Institute for Early Education
Research (NIEER), 71, 439,
452-453
National Institute of Child Health and
Human Development, 110, 152,
153, 162, 170
National Longitudinal Survey of
Youth-Child Supplement, 287
National Registry Alliance, 318
National Research Council, 2, 20, 21,
24, 48-49, 431
Naturalistic assessment. See Authentic
assessment
NCHS/NLSY Questionnaire, 77
Nebraska, assessment system, 332-335
Neonatal Behavioral Assessment
Scale, 68, 69, 70
NEPSY, 120, 128, 129, 130, 139
New Jersey, Abbott Preschool
Program, 335-336, 351, 354
Newborn Individualized
Development Care and
Assessment Program, 70
Newborns
developmental assessment, 68-70
hearing screening, 66
No Child Left Behind Act (NCLB), 1,
16, 19, 34, 35, 302, 307-308, 314,
315
Norm-referenced tests, 40, 50 n.1, 112,
197, 237-238, 254, 259-260, 264-265, 270, 271-272, 273, 279, 350,
423, 426, 427
Normative development. See Threats
to normative development
Nursing Child Assessment Satellite
Training, 84
Nutritional deficiency, 67
O
Obesity, 18, 88, 120
Observation Measure of Language
and Literacy Instruction
(OMLIT), 168-169, 176
Observational measures
for accountability, 147-148, 149,
167, 201-202, 203, 204
classroom environments, 146, 154,
156-157, 158-159, 165-166, 168-169, 173, 175, 176, 334
of environmental quality, 63,
146-150
home environment, 152-153, 154-155, 174
instruments/tools, 120-144, 157-173, 201, 297, 351
language and literacy instruction,
168-169, 176
length of assessment, 150, 160, 166,
169
and professional development,
146-147, 149
purposes, 146-150, 201-202,
203-204
reliability, 149-150, 157, 203, 204,
268, 283, 334-335
research needs, 204, 364
selecting, 146
for special needs children, 274
strengths and weaknesses, 203-205
training assessors, 157, 203, 204
validity, 150, 157, 164-165
Observation Record of the Caregiving
Environment (ORCE), 169-170,
176
Office of Civil Rights, 251
Office of Special Education Programs,
328, 333, 348
Oral Language Development Scale,
292
Otoacoustic emissions, 66
Outcome measures. See
Developmental outcome
measures
P
Parent-child interaction, 104, 151, 155,
174
Parental/family involvement, 38, 94,
159, 171, 172, 177, 251, 260, 265,
268-269, 287
Parenting skills, 149
Parents' Evaluation of Developmental
Status (PEDS), 77, 78
Partners for Inclusion model, 147
Peabody Developmental Motor Scales,
80
Peabody Individual Achievement
Tests, 133, 136, 140, 242
Peabody Picture Vocabulary Test
(PPVT), 40, 79, 101, 139, 154,
166, 236, 242, 252, 291
Penn Interactive Peer Play, 242
Performance assessments, 11, 133,
213, 224-226, 238, 254-255,
264, 335-336, 359, 424, 426. See
also Authentic assessment;
Classroom environments;
individual instruments
Pervasive Developmental Disorders
Screening Test-II (PDDST-II), 82
Pew Foundation, 39, 302
Phenylketonuria screening, 29
Psychometric issues in assessment,
23, 119. See also Reliability
of assessments; Validity of
assessments
abbreviation or adaptation of tests,
40
bias testing, 235, 240, 243-244
cognitive skills, 107-108, 112, 113
direct tests, 370, 371, 372
guidelines, 6, 271, 350, 352, 370
information on instruments, 87,
449, 451, 452, 453
high-stakes vs. low-stakes
conditions, 195
measuring quantitative change,
224-225
precision, 263
research needs, 361, 364
special populations, 96, 112,
243-244
standards of evidence, 3, 225,
243-244
Purpose of assessments. See also
Accountability; Diagnostic
testing; Progress monitoring;
Program, performance
assessment; Screening
and appropriate use of
assessments, 27, 259, 283, 341-342, 344, 433
community-focused screening,
29-30
determining and communicating,
3, 282-284
diagnostic testing, 30
eligibility testing, 31
functional level, 2, 29-31
guidelines on, 5, 37-39, 345-346
importance of purposefulness, 2,
18, 313
individual-focused screening, 29
in infant-toddler period, 62, 74
intervention and instruction
planning, 2, 31-34, 39, 69, 70, 85,
201, 222-226, 259, 264-265, 283
rationale for guidelines, 342-345
readiness testing, 30-31
research-related, 2, 34, 37, 266-267
Q
Qualistar Early Learning Quality
Rating and Improvement
System, 173
Quality Interventions for Early Care
and Education (QUINCE), 147
Quality of assessments. See also Bias
in assessments; Reliability
of assessments; Validity of
assessments
measurement choices and, 200-205
Quality of environment. See also
Center-based environments;
Classroom environments;
Home environments
appropriate assessment of, 320
and cognitive skills, 108, 148, 151,
153-154, 170
and developmental outcomes, 17-18, 64, 86, 95, 104, 108
English language learners, 172-173
importance, 145
instruments, 147, 152-177
observational measures, 63,
146-150
strategy for assessing programs,
173
systems perspective, 319-320
Quality of Instruction in Language
and Literacy, 169, 174
The Quick Test, 79
R
RAND Corporation, 111
Ratings of Parent-Child Interactions,
174
Read Aloud Profile, 169
S
Sampling error, 188, 207, 211, 212
Science, 107, 136-138, 157, 158
Screening
appropriateness of assessment for,
360
community-focused, 29-30
contexts for, 63-64
defined, 427
developmental, 68-72, 87, 262
difficulties with young children,
72-74
implementing, 283
individual-focused, 29
infants and toddlers, 62-64, 70-72
instruments, by domain, 77-84
limitations in effectiveness, 74-76
newborns, 66, 68-70
principles of good programs, 63
research needs, 363
special needs children, 262
for threats to normative
development, 64-68
universal, 33-34
uses of assessments, 62-63
Screening Tool for Autism in Two-Year-Olds (STAT), 82
Selecting assessment tools, 4, 431
for accountability purposes, 40-41,
102, 201, 226-231
accuracy and quality issues, 2, 181,
210
committee approach, 22-25
guidelines on, 6-8, 37-39, 352-354
for local needs, 214-222
for multiple related entities,
222-226
in program evaluation context, 40-41, 226-231
rationale for guidelines, 349-351
Sequenced Inventory of
Communication Development,
79, 141
Shape Stroop measure, 113
Simon says test, 113
Slosson Intelligence Test, 77
Snack delay test, 113
Snapshot of Classroom Activities, 169
Social benchmarking, 36-37, 40
Social Communication Questionnaire
(SCQ), 82
Social Competence and Behavioral
Evaluation (SCBE), 126, 127
Social consequences of assessment
bias in assessment and, 195-196,
239-240
scenario, 227
Social Skills Rating Scale, 100, 122,
128, 129
Social studies, 50, 85, 107, 135, 440
Socioemotional development. See also
Approaches to learning
behavior problems, 54 n.4, 89, 91,
92-93, 94, 95, 99, 365
consensus, 90
constructs, 58, 95, 96, 113
domain defined, 89
early learning guidelines, 89
home environment and, 146, 148,
150-151
importance in practice and policy,
50-51, 89-90
infant assessment, 62, 69
instruments, 59, 71, 164, 166, 167,
170, 173-177, 362-363
and later development, 90-94
malleability, 94-95
measurement issues, 95, 242
measures of, 91, 95, 96-97
minority children, 242, 245
nutritional deficiency and, 67
quality of environment and, 164,
166, 167, 170, 174-177
reliability and validity of tests, 96-97, 194
research needs, 362-363
screening instruments, 81-82,
122-127
self-regulation, 70, 89, 90, 92, 93-94,
95, 96, 108, 123, 311, 365
social competence, 89, 91-92, 96,
108, 126, 127, 148, 164, 166, 167,
170, 194
testing all children, 96, 242, 255
Special education, 31, 153, 239, 252,
255, 256, 261, 262-263, 264, 267,
271, 325, 326, 327, 328, 330, 348,
367, 369
Special needs children
accommodations for, 8, 40, 250,
254, 259, 260, 267, 272, 273, 276-279, 295-296, 298, 330-331, 353,
367
accountability-related assessments,
267, 270, 272, 279
administration of assessments,
269-270, 272-273, 295-296
appropriateness of assessment for,
3, 4, 38, 40, 271, 280, 283
challenges in assessment, 270-279
construct-irrelevant skills, 274-275
developmental assessment, 76,
271, 369
diagnostic testing, 262-264
domain-based assessments,
274-276
eligibility determinations, 262-264,
271-272
functional outcomes approach,
275-276
inclusion in assessments, 36, 40,
266-267, 270, 273, 295-296, 320,
330-331
infants and toddlers, 261
instruments/tools, 273-274,
276-279
intervention or instruction
planning, 1, 264-265
labeling concerns, 46
large-scale assessments, 40, 266-267, 273, 279
outcome measures, 275-276
population characteristics, 261
principles of assessment, 267-270
progress monitoring, 263, 265-266,
267
purposes of assessment, 260,
262-267
reporting outcomes, 47
research needs, 367
research-related assessments, 266-267, 272
response to intervention approach,
263-264
screening, 262
T
Tandem mass spectrometry, 65
Teacher-child relationships, 91-92, 95,
147, 157, 160-161, 163, 164, 165,
171, 257, 287, 372
Teacher Rating Scale, 100
Temperament, screening instruments,
81, 83
Temperament and Atypical Behavior
Scale (TABS), 81
Test de Vocabulario en Imágenes
Peabody, 252
Test of Early Language Development
(TELD), 79, 102, 141, 143, 159
Test of Early Mathematics Ability
(TEMA), 136, 137
Test of Early Reading Ability (TERA),
140, 144
Test of Language Dominance (TOLD),
102, 139
Testing. See Diagnostic testing
Threats to normative development
genetic/metabolic screening, 64-65
iron deficiency screening, 67
lead screening, 68
newborn hearing screening, 66
vision screening, 66-67
Toddler Behavior Assessment
Questionnaire (Carey Scales),
83
Toddler-Parent Mealtime Behavior
Questionnaire, 120
Toddlers. See Infant and toddler
assessments
U
Universal design principles, 8, 33-34,
276-279, 353, 366, 374
Universal Nonverbal Intelligence Test
(UNIT), 253
University of Nebraska, 215
U.S. Department of Education, 23,
249, 267, 271, 333, 348
U.S. Department of Health and
Human Services, 52-53, 430, 431
Office of Head Start, 2, 20, 450
U.S. Government Accountability
Office, 23, 54, 55
U.S. Preventive Services Task Force,
66, 67
Use of assessments. See Appropriate
use of assessments; Purpose of
assessments
V
Validity argument, 187
Validity of assessments. See also Bias
in assessments
for accountability, 40, 54-55, 198
as argument, 186-191
consequence of use and, 194-196
consistency of assessment. See
Reliability of assessments
construct, 185-186, 193, 197, 236-237, 243, 250, 254, 255, 274-275
contemporary views of evidence,
192-196
content, 100, 184-185, 192, 193, 235-236, 244, 250, 254, 255
convergent/divergent evidence,
194
criterion model, 183, 211
Vocabulary, 16-17, 32, 40, 53, 58, 79,
98, 101, 102, 103, 106, 107, 116,
135, 139, 142, 146, 154, 158, 166,
215, 236, 242, 248, 252, 286-287,
291, 292-293
Vygotskian play-based preschool
curriculum, 94, 99
W
Wechsler Intelligence Scale for
Children, 113, 245, 253
Wechsler Preschool and Primary Scale
of Intelligence (WPPSI), 113,
130, 133, 141, 242, 245
Welfare reform policies, 153
Westat, 24, 53-54
Woodcock-Johnson III (WJ-III) Tests
of Cognitive Abilities, 113, 118,
121, 130, 131, 132, 133, 136, 137,
139, 140, 144, 242, 252
Woodcock-Johnson-Revised Tests of
Cognitive Ability (WJ-R COG),
253
Work Sampling for Head Start
(WSHS) measure, 119
Work Sampling System (WSS), 33,
119, 120, 122, 124, 135, 138, 139,
297
Y
Ypsilanti Carnegie Infant Education
project, 111