Timmy 029

Highlights From TIMSS 2007:
Mathematics and Science Achievement of U.S. Fourth-

and Eighth-Grade Students in an International Context
December 2008
Patrick Gonzales
Project Ofcer
National Center for Education Statistics
Trevor Williams
Leslie Jocelyn
Stephen Roey
David Kastberg
Summer Brenwald
Westat
NCES 2009-001
U.S. DEPARTMENT OF EDUCATION
U.S. Department of Education
Margaret Spellings
Secretary
Institute of Education Sciences
Sue Betka
Acting Director
Stuart Kerachsky
Acting Commissioner
The National Center for Education Statistics (NCES) is the primary federal entity for collecting, analyzing, and reporting data
related to education in the United States and other countries. t fulflls a congressional mandate to collect, collate, analyze,
and report full and complete statistics on the condition of education in the United States; conduct and publish reports and
specialized analyses of the meaning and signifcance of such statistics; assist state and local education agencies in improving
their statistical systems; and review and report on education activities in foreign countries.
NCES activities are designed to address high-priority education data needs; provide consistent, reliable, complete, and
accurate indicators of education status and trends; and report timely, useful, and high-quality data to the U.S. Department
of Education, the Congress, the states, other education policymakers, practitioners, data users, and the general public.
Unless specifcally noted, all information contained herein is in the public domain.
We strive to make our products available in a variety of formats and in language that is appropriate to a variety of audiences.
You, as our customer, are the best judge of our success in communicating information effectively. f you have any comments
or suggestions about this or any other NCES product or report, we would like to hear from you. Please direct your comments
to National Center for Special Education Research
Institute of Education Sciences
1990 K Street NW
Washington, DC 20006-5651
December 2008
The NCES World Wide Web Home Page address is http://nces.ed.gov.
The NCES World Wide Web Electronic Catalog is http://nces.ed.gov/pubsearch.
Suggested Citation
Gonzales, P., Williams, T., Jocelyn, L., Roey, S., Kastberg, D., and Brenwald, S. (2008). Highlights From TIMSS 2007:
Mathematics and Science Achievement of U.S. Fourth- and Eighth-Grade Students in an International Context (NCES 2009001).
National Center for Education Statistics, nstitute of Education Sciences, U.S. Department of Education. Washington, DC.
For ordering information on this report, write to
ED Pubs
P.O. Box 1398
Jessup, MD 20794-1398
or call toll free 1-877-4ED-Pubs or order online at http://www.edpubs.org.
Content Contact
Patrick Gonzales
(415) 920-9229
[email protected]
iii
Executive Summary
HIGHLIGHTS FROM TIMSS 2007
EXECUTIVE SUMMARY
The 2007 Trends in nternational Mathematics and Science
Study (TMSS) is the fourth administration since 1995 of this
international comparison. Developed and implemented at
the international level by the nternational Association for
the Evaluation of Educational Achievement (EA)an
international organization of national research institutions and
governmental research agenciesTMSS is used to measure
over time the mathematics and science knowledge and skills
of fourth- and eighth-graders. TMSS is designed to align
broadly with mathematics and science curricula in the
participating countries.
This report focuses on the performance of U.S. students
relative to that of their peers in other countries in 2007,
and on changes in mathematics and science achievement
since 1995.
1
Thirty-six countries or educational jurisdictions
participated at grade four in 2007, while 48 participated at
grade eight.
2
This report also describes additional details
about the achievement of U.S. student subpopulations.
All differences described in this report are statistically
signifcant at the .05 level. No statistical adjustments
to account for multiple comparisons were used.
Key fndings from the report include the following:
n 2007, the average mathematics scores of both U.S.
fourth-graders (529) and eighth-graders (508) were higher
than the TMSS scale average (500 at both grades).
3

The average U.S. fourth-grade mathematics score
was higher than those of students in 23 of the 35 other
countries, lower than those in 8 countries (all located in
Asia or Europe), and not measurably different from those
in the remaining 4 countries.
4
At eighth grade, the average
U.S. mathematics score was higher than those of students
in 37 of the 47 other countries, lower than those in 5
countries (all of them located in Asia), and not measurably
different from those in the other 5 countries.
Compared to 1995, the average mathematics scores for
both U.S. fourth- and eighth-grade students were higher in
2007. At fourth grade, the U.S. average score in 2007 was
529, 11 points higher than the 1995 average of 518. At
eighth grade, the U.S. average mathematics score in 2007
was 508, 16 points higher than the 1995 average of 492.
n 2007, 10 percent of U.S. fourth-graders and 6 percent
of U.S. eighth-graders scored at or above the advanced
international benchmark in mathematics.
5
At grade four,
seven countries had higher percentages of students
performing at or above the advanced international
mathematics benchmark than the United States:
Singapore, Hong Kong SAR, Chinese Taipei, Japan,
Kazakhstan, England, and the Russian Federation.
Fourth-graders in these seven countries were also found
to outperform U.S. fourth-graders, on average, on the
overall mathematics scale. At grade eight, a slightly
different set of seven countries had higher percentages
of students performing at or above the advanced
mathematics benchmark than the United States: Chinese
Taipei, Korea, Singapore, Hong Kong SAR, Japan,
Hungary, and the Russian Federation. These seven
countries include the fve countries that had higher
average overall mathematics scores than the United
States, as well as Hungary and the Russian Federation.
n 2007, the average science scores of both U.S. fourth-
graders (539) and eighth-graders (520) were higher
than the TMSS scale average (500 at both grades).
The average U.S. fourth-grade science score was higher
than those of students in 25 of the 35 other countries,
lower than those in 4 countries (all of them in Asia),
and not measurably different from those in the remaining
6 countries. At eighth grade, the average U.S. science
score was higher than the average scores of students
in 35 of the 47 other countries, lower than those in 9
countries (all located in Asia or Europe), and not
measurably different from those in the other 3 countries.
1
At grade four, a total of 257 schools and 10,350 students participated in the United States in 2007. At grade eight, 239 schools and 9,723 students participated.
The overall weighted school response rate in the United States was 70 percent at grade four before the use of substitute schools. The fnal weighted student
response rate at grade four was 95 percent. At grade eight, the overall weighted school response rate before the use of substitute schools was 68 percent.
The fnal weighted student response rate at grade eight was 93 percent.
2
The total number of countries reported here differs from the total number reported in the international TMSS reports (Mullis et al. 2008; Martin et al. 2008).
n addition to the 36 countries at grade four and 48 countries at grade eight, 8 other educational jurisdictions, or "benchmarking entities, participated: the states
of Massachusetts and Minnesota; the Canadian provinces of Alberta, British Columbia, Ontario, and Quebec; Dubai, United Arab Emirates; and the Basque region
of Spain.
3
TMSS provides two overall scalesmathematics and scienceas well as several content and cognitive domain subscales for each of the overall scales.
The scores are reported on a scale from 0 to 1,000, with the TMSS scale average set at 500 and standard deviation set at 100.
4
TMSS is open to countries and subnational entities, or educational jurisdictions, which are part of larger countries. For example, Hong Kong is a Special
Administrative Region (SAR) of the People's Republic of China. For convenience, this report uses the term "country or "nation to refer to all participating entities.
5
TMSS reports on four benchmarks to describe student performance in mathematics and science. Each benchmark is associated with a score on the achievement
scale and a description of the knowledge and skills demonstrated by students at that level of achievement. The advanced international benchmark indicates that
students scored 625 or higher. More information on the benchmarks can be found in the main body of the report and appendix A.
iv
The average science scores for both U.S. fourth- and
eighth-grade students in 2007 were not measurably
different from those in 1995. The U.S. fourth-grade
average science score in 2007 was 539 and in 1995
was 542. The U.S. eighth-grade average science score
in 2007 was 520 and in 1995 was 513.
n 2007, 15 percent of U.S. fourth-graders and 10 percent
of U.S. eighth-graders scored at or above the advanced
international benchmark in science. At grade four, two
countries had higher percentages of students performing
at or above the advanced international science
benchmark than the United States: Singapore and
Chinese Taipei. Fourth-graders in these two countries
were also found to outperform U.S. fourth-graders, on
average, on the overall science scale. At grade eight, six
countries had higher percentages of students performing
at or above the advanced science benchmark than the
United States: Singapore, Chinese Taipei, Japan,
England, Korea, and Hungary. These six countries also
had higher average overall eighth-grade science scores
than the United States.

EXECUTIVE SUMMARY
v
Acknowledgments
ACKNOWLEDGMENTS
The authors wish to thank all those who assisted with TMSS
2007, from its design to the reporting of fndings. Most
importantly, the authors wish to thank the many principals,
teachers, and students who participated in the study.
Page intentionally left blank
vii
Contents
CONTENTS
Page
Executive Summary .................................................................................................................................................................. iii
Acknowledgments .....................................................................................................................................................................v
List of Tables ............................................................................................................................................................................. viii
List of Figures .............................................................................................................................................................................. ix
List of Exhibits .............................................................................................................................................................................xi
Introduction .................................................................................................................................................................................1
TIMSS in brief ............................................................................................................................................................................ 1
Design and administration of TIMSS ..................................................................................................................................... 1
Reporting TIMSS results ........................................................................................................................................................... 3
Nonresponse bias in the U.S. TIMSS samples ....................................................................................................................... 4
Further information ................................................................................................................................................................. 4
Mathematics Performance in the United States and Internationally
The TIMSS mathematics assessment .................................................................................................................................... 5
Average scores in 2007 .......................................................................................................................................................... 6
Trends in scores since 1995 .................................................................................................................................................... 8
Content and cognitive domain scores in 2007 ................................................................................................................ 10
Performance on the TIMSS international benchmarks .................................................................................................... 13
Performance within the United States ............................................................................................................................... 15
Effect size of the difference in average scores ................................................................................................................ 28
Science Performance in the United States and Internationally
The TIMSS science assessment ............................................................................................................................................ 31
Average scores in 2007 ........................................................................................................................................................ 31
Trends in scores since 1995 .................................................................................................................................................. 33
Content and cognitive domain scores in 2007 ................................................................................................................ 35
Performance on the TIMSS international benchmarks .................................................................................................... 38
Performance within the United States ............................................................................................................................... 41
Effect size of the difference in average scores ................................................................................................................ 51
References .................................................................................................................................................................................55
Appendix A: Technical Notes.................................................................................................................................................A-1
Appendix B: Example Items ................................................................................................................................................... B-1
Appendix C: TIMSS-NAEP Comparison ................................................................................................................................. C-1
Appendix D: Online Resources and Publications .................................................................................................................D-1
viii
APPENDIX B

APPENDIX C

List of Tables
LIST OF TABLES
Table Page
1. Participation in the TIMSS fourth- and eighth-grade assessments, by grade and country:
1995, 1999, 2003, and 2007 ................................................................................................................................................ 2
2. Percent of fourth- and eighth-grade TIMSS mathematics assessment devoted to content and cognitive
domains: 2007 ..................................................................................................................................................................... 5
3. Average mathematics scores of fourth- and eighth-grade students, by country: 2007 ........................................... 7
4. Trends in average mathematics scores of fourth- and eighth-grade students, by country: 1995 to 2007 .............. 8
5. Description of TIMSS mathematics cognitive domains: 2007 ...................................................................................... 10
6. Average mathematics content and cognitive domain scores of fourth-grade students,
by country: 2007 ............................................................................................................................................................... 11
7. Average mathematics content and cognitive domain scores of eighth-grade students,
by country: 2007 ............................................................................................................................................................... 12
8. Description of TIMSS international mathematics benchmarks, by grade: 2007 ........................................................ 13
9. Mathematics scores of fourth- and eighth-grade students dening 10th and 90th percentiles,
by country: 2007 ............................................................................................................................................................... 17
10. Percent of fourth- and eighth-grade TIMSS science assessment devoted to content
and cognitive domains: 2007 ......................................................................................................................................... 31
11. Average science scores of fourth- and eighth-grade students, by country: 2007 ................................................... 32
12. Trends in average science scores of fourth- and eighth-grade students, by country: 1995 to 2007 ..................... 33
13. Description of TIMSS science cognitive domains: 2007 ................................................................................................ 35
14. Average science content and cognitive domain scores of fourth-grade students, by country: 2007 ................. 36
15. Average science content and cognitive domain scores of eighth-grade students, by country: 2007 ................ 37
16. Description of TIMSS international science benchmarks, by grade: 2007 ................................................................ 38
17. Science scores of fourth- and eighth-grade students dening 10th and 90th percentiles,
by country: 2007 ............................................................................................................................................................... 42
A-1. Coverage of target populations and participation rates, by grade and country: 2007 ...................................... A-5
A-2. Total number of schools and students, by grade and country: 2007 ....................................................................... A-7
A-3. Number of new and trend mathematics and science items in the TIMSS grade four and grade eight
assessments, by type: 2007 .......................................................................................................................................... A-12
A-4. Number of mathematics and science items in the TIMSS grade four and grade eight assessments,
by type and content domain: 2007 ........................................................................................................................... A-13
A-5. Within-country constructed-response scoring reliability for TIMSS grade four and grade eight
mathematics and science items, by exact percent score
agreement and country: 2007 .................................................................................................................................... A-16
A-6. Weighted response rates for unimputed variables for TIMSS grade four and grade eight: 2007 ....................... A-20
A-7. Difference between average scores, standard deviations, pooled standard deviations,
and effect sizes of mathematics and science scores of fourth- and eighth-grade students,
by country, sex, race/ethnicity, and school poverty level: 2007 ............................................................................ A-24
ix
Figure Page
1. Countries that participated in TIMSS 2007 ....................................................................................................................... 3
2. Difference between average mathematics scores of U.S. fourth-and eighth-grade students
and the TIMSS scale average: 1995, 1999, 2003, and 2007 ........................................................................................... 9
3. Percentage of US fourth- and eighth-grade students who reached each TIMSS international
mathematics benchmark compared with the international median percentage: 2007 ....................................... 14
4. Percentage of fourth- and eighth-grade students who reached the TIMSS advanced international
benchmark in mathematics, by country: 2007 ............................................................................................................. 16
5. Cutpoints at the 10th and 90th percentile for mathematics content domain scores of U.S. fourth-
and eighth-grade students: 2007 ................................................................................................................................... 18
6. Trends in 10th and 90th percentile mathematics scores of U.S. fourth- and eighth-grade students:
1995, 1999, 2003, and 2007 .............................................................................................................................................. 19
7. Difference in average mathematics scores of fourth- and eighth-grade students,
by sex and country: 2007 ................................................................................................................................................. 20
8. Average mathematics scores of U.S. fourth- and eighth-grade students, by content domain
and sex: 2007 .................................................................................................................................................................... 21
9. Trends in sex differences in average mathematics scores of U.S. fourth- and eighth-grade students:
1995, 1999, 2003, and 2007 .............................................................................................................................................. 22
10. Average mathematics scores of U.S. fourth- and eighth-grade students, by race/ethnicity: 2007 ....................... 23
11. Trends in differences in average mathematics scores of U.S. fourth- and eighth-grade students,
by selected race/ethnicity: 1995, 1999, 2003 and 2007 ............................................................................................... 24
12. Average mathematics scores of U.S. fourth- and eighth-grade students, by percentage of students
in public school eligible for free or reduced-price lunch: 2007 .................................................................................. 25
13. Trends in differences in average mathematics scores of U.S. fourth-
and eighth-grade students, by school poverty level: 1999, 2003, and 2007 ............................................................. 26
14. Effect size of difference in average mathematics achievement
of fourth- and eighth-grade students, by country, sex, race/ethnicity, and school poverty level: 2007 .............. 29
15. Difference between average science scores of U.S. fourth- and eighth-grade students
and the TIMSS scale average: 1995, 1999, 2003, and 2007 ......................................................................................... 34
16. Percentage of U.S. fourth- and eighth-grade students who reached each TIMSS international science
benchmark compared with the international median percentage: 2007 ............................................................... 39
17. Percentage of fourth- and eighth-grade students who reached the TIMSS advanced international
benchmark in science, by country: 2007 ...................................................................................................................... 40
18. Cutpoints at the 10th and 90th percentile for science content domain scores of U.S. fourth-
and eighth-grade students: 2007 ................................................................................................................................... 43
19. Trends in 10th and 90th percentile science scores of U.S. fourth- and eighth-grade students:
1995, 1999, 2003, and 2007 .............................................................................................................................................. 44
20. Difference in average science scores of fourth- and eighth-grade students, by sex and country: 2007 ............. 45
21. Average science scores of U.S. fourth- and eighth-grade students, by content domain and sex: 2007 .............. 46
LIST OF FIGURES
List of Figures
x
Figure Page
22. Trends in sex differences in average science scores of U.S. fourth- and eighth-grade students:
1995, 1999, 2003, and 2007 .............................................................................................................................................. 47
23. Average science scores of U.S. fourth- and eighth-grade students,
by race/ethnicity: 2007 .................................................................................................................................................... 48
24. Trends in differences in average science scores of U.S. fourth- and eighth-grade students,
by selected race/ethnicity: 1995, 1999, 2003, and 2007 .............................................................................................. 49
25. Average science scores of U.S. fourth- and eighth-grade students, by percentage of students
in public school eligible for free or reduced-price lunch: 2007 .................................................................................. 50
26. Trends in differences in average science scores of U.S. fourth- and eighth-grade students,
by school poverty level: 1999, 2003, and 2007.............................................................................................................. 51
27. Effect size of difference in average science achievement of fourth- and eighth-grade students,
by country, sex, race/ethnicity, and school poverty level: 2007 ................................................................................ 53

LIST OF FIGURES
xi
Exhibit Page
B1. Example fourth-grade mathematics item: 2007 .......................................................................................................... B-2
B4. Example eighth-grade mathematics item: 2007 ......................................................................................................... B-5
B8. Example fourth-grade science item: 2007 ................................................................................................................... B-9
B9. Example fourth-grade science item: 2007 ................................................................................................................. B-10
B10. Example fourth-grade science item: 2007 ................................................................................................................. B-11
B11. Example eighth-grade science item: 2007 ................................................................................................................ B-12
B14. Example eighth-grade science item: 2007 ............................................................................................................... B-15
LIST OF EXHIBITS
List of Exhibits
1
Introduction
INTRODUCTION
TIMSS in brief
The Trends in International Mathematics and Science
Study (TMSS) 2007 is the fourth time since 1995 that this
international comparison of student achievement has been
conducted. Developed and implemented at the international
level by the nternational Association for the Evaluation of
Educational Achievement (EA), an international organization
of national research institutions and governmental research
agencies, TMSS is used to measure over time the
mathematics and science knowledge and skills of fourth-
and eighth-graders.
TMSS is designed to align broadly with mathematics and
science curricula in the participating countries. The results,
therefore, suggest the degree to which students have learned
mathematics and science concepts and skills likely to have
been taught in school. TIMSS also collects background
information on students, teachers, and schools to allow
cross-national comparison of educational contexts that may
be related to student achievement. n 2007, there were 58
countries and educational jurisdictions
1
that participated
in TMSS, at the fourth- or eighth-grade level, or both.
2

This report presents the performance of U.S. students
relative to their peers in other countries, and on changes in
mathematics and science achievement since 1995. Most of
the fndings in the report are based on the results presented
in two reports published by the EA and available online at
http://www.timss.org:
TIMSS 2007 International Mathematics Report:
Findings From IEAs Trends in International Mathematics
and Science Study at the Fourth and Eighth Grades
(Mullis et al. 2008); and
TIMSS 2007 International Science Report: Findings
From IEAs Trends in International Mathematics
and Science Study at the Fourth and Eighth Grades
(Martin et al. 2008).
For a number of participating countries, changes in
achievement can be documented over the last 12 years,
from 1995 to 2007. For other countries, changes can be
documented over a shorter period of time. Table 1 and
fgure 1 show the countries that participated in TMSS 2007
as well as their participation status in the earlier TMSS data
collections. The TMSS fourth-grade assessment was
implemented in 1995, 2003, and 2007, while the eighth-grade
assessment was implemented in 1995, 1999, 2003, and 2007.
This report describes additional details about the achievement
of U.S. students that are not available in the international
reports, such as trends in the achievement of students of
different racial and ethnic and socioeconomic backgrounds.
Design and administration of TIMSS
TMSS 2007 is sponsored by the EA and carried out under
a contract with the TMSS & PRLS
3
International Study Center
at Boston College.

The National Center for Education
Statistics (NCES), in the Institute of Education Sciences
at the U.S. Department of Education, is responsible for the
implementation of TMSS in the United States. Data collection
in the United States was carried out under contract to
Windwalker Corporation and its subcontractors, Westat
and Pearson Educational Measurement.
Participating countries administered TMSS to two national
probability samples of students and schools, based on a
standardized defnition. Countries were required to draw
samples of students who were nearing the end of their fourth
year or eighth year of formal schooling, beginning with the
nternational Standard Classifcation of Education (SCED)
Level 1.
4
In most countries, including the United States, these
students were in the fourth and eighth grades. Details on the
grades assessed in each country are included in appendix A.
n the United States, TMSS was administered between April
and June 2007. The U.S. sample included both public and
private schools, randomly selected and weighted to be
representative of the nation.
5
n total, 257 schools and 10,350
students participated at grade four, and 239 schools and
9,723 students participated at grade eight. The overall
weighted school response rate in the United States was 70
1
TMSS is open to countries and subnational entities, or educational jurisdictions, which are part of larger countries. For example, Hong Kong is a Special
Administrative Region (SAR) of the People's Republic of China. For convenience, this report uses the term "country or "nation to refer to all participating entities.
2
Data from two nations were judged problematic by the EA. Morocco failed to meet the required school participation rates in grade eight because of a procedural
diffculty with some schools. Also, the quality of the data from Mongolia was not well documented at either grade level. n the international reports, Morocco is
included in the fourth-grade tables but is shown "below the line in the eighth-grade tables to indicate a problem in data quality. Data on Mongolia are reported
in an appendix. For the purposes of the present report, statistics relating to Moroccan eighth-graders and to Mongolian students in both grades are not reported.
3
The international study center takes its name from the two main EA studies it coordinates; the Trends in nternational Mathematics and Science Study (TMSS)
and the Progress in nternational Reading Literacy Study (PRLS).
4
The SCED was developed by the United Nations Educational, Scientifc, and Cultural Organization (UNESCO) to assist countries in providing comparable,
cross-national data. SCED Level 1 is termed primary schooling, and in the United States is equivalent to the frst through sixth grades (Matheson et al. 1996).
5
The sample frame data for public schools in the United States was based on the 2006 National Assessment of Educational Progress (NAEP) sampling frame.
This was done because recruitment of districts and schools began at the end of the 2005-06 school year to maximize response rates. The 2006 NAEP sampling
frame was based on the 2003-04 Common Core of Data (CCD), and the data for private schools were from the 2003-04 Private School Universe Survey (PSS).
Any school containing at least one grade four or one grade eight class was included in the school sampling frame.
2
APPENDIX B

INTRODUCTION
Grade four Grade eight
Country 1995 2003 2007 1995 1999 2003 2007
Total 26 25 36 41 38 46 48
Algeria
Armenia
Australia
1

Austria
Bahrain
Belgium (Flemish)
Belgium (French)
Bosnia and Herzegovina
Botswana
Bulgaria
Canada
Chile
Chinese Taipei
Colombia
Cyprus
Czech Republic
Denmark
Egypt
El Salvador
England
2

Estonia
Finland
France
Georgia
Germany
Ghana
Greece
Hong Kong SAR
3

Hungary
Iceland
Indonesia
Iran, Islamic Rep. of
Ireland
Israel
4

Italy
4

Japan
Jordan
Kazakhstan
Country 1995 2003 2007 1995 1999 2003 2007
Total 26 25 36 41 38 46 48
Korea, Rep. of
Kuwait
Latvia
5

Lebanon
Lithuania
Macedonia, Rep. of
Malaysia
Malta
Moldova, Rep. of
Morocco
4

Netherlands
New Zealand
Norway
Oman
Palestinian Nat'l Auth.
Philippines
Portugal
Qatar
Romania
Russian Federation
Saudi Arabia
Scotland
Serbia
Singapore
Slovak Republic
Slovenia
1

South Africa
6

Spain
Sweden
Switzerland
Syrian Arab Republic
Thailand
Tunisia
Turkey
Ukraine
United States
Yemen
1
Because of national-level changes in the starting age/date for school, 1999 data for Australia and Slovenia cannot be compared to 2003 data.
2
England collected data at grade eight in 1995, 1999, and 2003, but due to problems with meeting the minimum sampling requirements for 2003, its eighth-grade
data are not shown in this report.
3
Hong Kong is a Special Administrative Region (SAR) of the People's Republic of China.
4
Because of changes in the population tested, 1995 data for srael and taly, and 1999 data for Morocco are not shown.
5
Only Latvian-speaking schools were included in 1995 and 1999. For trend analyses, only Latvian-speaking schools are included in the estimates.
6
Because within-classroom sampling was not accounted for, 1995 data are not shown for South Africa.
NOTE: No fourth-grade assessment was conducted in 1999. Only countries that completed the necessary steps for their data to appear in the reports from the
nternational Study Center are listed. n addition to the countries listed above, eight separate jurisdictions participated in the Trends in nternational Mathematics
and Science Study (TMSS) 2007: the provinces of Alberta, British Columbia, Ontario, and Quebec in Canada; the Basque region of Spain; Dubai, UAE, and the
states of Massachusetts and Minnesota. nformation on these eight jurisdictions can be found in the international TMSS 2007 reports. Morocco participated in
TMSS 2007 at both the fourth and eighth grades, but due to sampling diffculties, its grade eight data are not shown in this report. Mongolia also participated in
TMSS 2007 but could not complete the steps necessary to have its data included in the report. Countries could participate at either grade level. Countries were
required to sample students enrolled in the grade corresponding to the fourth and eighth year of schooling, beginning with nternational Standard Classifcation
of Education (SCED) level 1, providing that the mean age at the time of testing was at least 9.5 years and 13.5 years, respectively. n the United States and most
countries, this corresponds to grade four and grade eight. See table A1 in appendix A for details.
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics and Science Study (TMSS), 1995,
1999, 2003 and 2007.
Table 1. Participation in the TIMSS fourth- and eighth-grade assessments, by grade and country:
1995, 1999, 2003, and 2007
3
APPENDIX B
INTRODUCTION
percent at grade four before the use of substitute schools and
89 percent with the inclusion of substitute schools.
6
At grade
eight, the overall weighted school response rate before the
use of substitute schools was 68 percent and 83 percent with
the inclusion of substitute schools. The fnal weighted student
response rate at grade four was 95 percent and at grade eight
was 93 percent. Student response rates are based on a
combined total of students from both sampled and substitute
schools. Detailed information on sampling, administration,
response rates, and other technical issues are included
in appendix A.
Reporting TIMSS results
Achievement results from TIMSS are reported on a scale from
0 to 1,000, with a TMSS scale average of 500 and standard
deviation of 100. Even though the countries participating in
TMSS have changed across the four assessments between
1995 and 2007, comparisons between the 2007 results and
prior results are still possible because the achievement scores
in each of the TMSS assessments are placed on a scale which
is not dependent on the list of participating countries in any
particular year. A brief description of the assessment equating
and scaling is presented in appendix A to this volume. A more
detailed presentation can be found in the TIMSS 2007
Technical Report (Olson, Martin, and Mullis 2008).
In addition to numerical scale results, TIMSS also includes
international benchmarks. The TIMSS international benchmarks
provide a way to interpret the scale scores and to understand
how students' profciency in mathematics and science varies
along the TIMSS scale. The TIMSS benchmarks describe four
levels of student achievement in each subject, based on the
kinds of skills and knowledge students at each score cutpoint
would need to successfully answer the mathematics and
science items. In general, the score cutpoints for the TIMSS
benchmarks were set based on the distribution of students

Lithuania
Latvia
Romania
Ukraine
Bulgaria Sweden
Bosnia &
Herzegovina
Hungary
Norway
Denmark
Germany
Netherlands
Scotland
England
El Salvador
United States
Colombia
Czech Republic
Austria
Slovenia
Italy
Morocco
Malta
Ghana
Israel
Jordan
Serbia
Egypt
Turkey
Saudi
Arabia
Qatar
Thailand
Malaysia
Singapore
Australia
Indonesia
New Zealand
Bahrain
Kuwait
Oman
Yemen
Cyprus
Georgia
Kazakhstan
Russian Federation
Armenia
Syrian Arab
Republic
Lebanon
Hong Kong
Korea
Chinese Taipei
Japan
Algeria
Botswana
Tunisia
Palestinian
National
Authority
Slovak Republic
Iran,
Islamic
Rep. of
Figure 1. Countries that participated in TIMSS 2007
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics and Science Study (TMSS), 2007.
6
NCES standards advise that substitute schools should not be included in the calculation of response rates (standard 1-3-8; National Center for Education
Statistics 2002). Response rates calculated "before replacement are consistent with this standard. Response rates calculated "after replacement include
substitute schools and hence are not consistent with NCES standards. Both kinds of response rates are reported here in the interests of comparability with
the TMSS international reports which report response rates before and after replacement.
4
along the TMSS scale. More information on the development
of the benchmarks and the procedures used to set the score
cutpoints can be found in the TIMSS 2007 Technical Report
(Olson, Martin, and Mullis 2008).
All differences described in this report are statistically
signifcant at the .05 level. No statistical adjustments to
account for multiple comparisons were used. Differences that
are statistically signifcant are discussed using comparative
terms such as "higher and "lower. Differences that are not
statistically signifcant are either not discussed or referred to
as "not measurably different or "not statistically signifcant.
n this latter case, failure to fnd a difference as statistically
signifcant does not necessarily mean that there was no
difference. t simply means that, given the precision of the
estimates, there is a larger than fve percent chance that the
difference was zero. n addition, because the results of tests
of statistical signifcance are, in part, infuenced by sample
sizes, statistically signifcant results may not identify those
fndings that have policy or practical importance. For this
reason, this report includes effect sizes to provide the reader
with a sense of the magnitude of statistically signifcant
differences. Further information about effect sizes and about
the tests conducted to determine statistical signifcance can
be found in appendix A. Supplemental tables providing all
estimates and standard errors discussed in this report are
available online at http://nces.ed.gov/pubsearch/pubsinfo.
asp?pubid=2009001.
All data presented in this report are used to describe
relationships between variables. These data are not intended,
nor can they be used, to imply causality. Student performance
can be affected by a complex mix of educational and other
factors that are not examined here.
Nonresponse bias in the U.S.
TIMSS samples
NCES standards require a nonresponse bias analysis if
school-level response rates fall below 85 percent, as they
did for both the fourth- and eighth-grade school samples
in TMSS 2007.
7
As a consequence, a nonresponse bias
analysis was undertaken, similar to that used for TMSS
2003 (Ferraro and Van De Kerckhove 2006).
These analyses examined whether the participation status
of schools (participant/non-participant) was related to seven
school characteristics: the region of the country in which the
school was located (Northeast, Southeast, Central, West);
the type of community served by the school (central city,
urban fringe/large town, rural/small town); whether the school
was public or private; percentage of students eligible for free
or reduced-price lunch; number of students enrolled in fourth
or eighth grade; total number of students; and percentage of
students from minority backgrounds. Details are provided in
appendix A.
8
The fndings indicate some potential for bias in the data
arising from regional and community-type differences in
participation, along with the fact that schools with higher
percentages of minority students were less likely to
participate. Specifcally, grade 4 schools in the central region
were more likely to participate than schools in the other
regions, and schools in rural/small towns were more likely
to participate than schools in central cities. However with
the inclusion of substitute schools there were no measurable
differences by region and differences by community type were
substantially reduced. At grade 8, after substitution, the results
of the analyses indicated that schools in central cities were
still more likely to participate than schools in urban/fringe/large
towns. At both grades, schools with higher percentages
of minority students were less likely to participate, but
the measurable differences were small after substitution
especially at grade 8. Since TMSS is conducted under a
set of standard rules designed to facilitate international
comparisons, the U.S. nonresponse bias analysis results
were not used to adjust the U.S. data for this source of bias.
While this may be possible at some later date, at present the
variables identifed above remain as potential sources of bias
in the published estimates.
Further information
To assist the reader in understanding how TMSS relates
to the National Assessment of Educational Progress (NAEP),
the primary source of national- and state-level data on U.S.
students' mathematics and science achievement, NCES
compared the form and content of the TMSS and NAEP
mathematics and science assessments. A summary of
the results of this comparison is included in appendix C.
Appendix D includes a list of TMSS publications and
resources published by NCES and the EA. Standard errors
for the estimates discussed in the report are available online
at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=
2009001. Detailed information on TMSS can also be found
on the NCES website (http://nces.ed.gov/timss) and the
international TMSS website (http://www.timss.org).
7
Standard 2-2-2 found in National Center for Education Statistics 2002.
8
The full text of the nonresponse bias analysis conducted for TMSS 2007 will be included in a technical report released with the U.S. national dataset.
See appendix A for a description of the analyses undertaken and additional details on the fndings.

INTRODUCTION
5
MATHEMATICS
Mathematics Performance in the United States
and Internationally
The TIMSS mathematics
assessment
The TMSS mathematics assessment is designed along two
dimensions: the mathematical topics or content that students
are expected to learn and the cognitive skills students are
expected to have developed. The topical or content domains
(as they are called in TMSS) covered at grade four are
number, geometric shapes and measures, and data display
(table 2). At grade eight, the content domains are number,
algebra, geometry, and data and chance. The cognitive
domains in each grade are knowing, applying, and reasoning.
Example items from the TMSS mathematics assessment are
included in appendix B (see items B1 through B7).
The proportion of items devoted to a domain, and, therefore,
the contribution of the domain to the overall mathematics
scale score differs somewhat across grades. For example,
in 2007 at grade four, 52 percent of the TMSS mathematics
assessment focused on the number domain, while the
analogous percentage at grade eight was 29 percent.
The proportion of items devoted to each cognitive domain
was similar across grades.
Also, within a content or cognitive domain, the makeup of
items, in terms of diffculty and form of knowledge and skills
addressed, differs across grade levels to refect the nature,
diffculty, and emphasis of the subject matter encountered in
school at each grade. TIMSS 2007 Assessment Frameworks
(Mullis et al. 2005) provides a more detailed description
of the content and cognitive domains assessed in TMSS.
The development and validation of the cognitive domains
is detailed in IEAs TIMSS 2003 International Report on
Achievement in the Mathematics Cognitive Domains:
Findings From a Developmental Project (Mullis, Martin,
and Foy 2005).
TMSS provides an overall mathematics scale score as well
as content and cognitive domain scores at each grade level.
The TIMSS mathematics scale is from 0 to 1,000 and the
international mean score is set at 500, with a standard deviation
of 100. The scaling of data is conducted separately for each
grade and each content domain. Thus, a score of 500 on the
grade four scale is not equivalent to a score of 500 on the
grade eight scale The scaling of data is conducted separately
for each grade and each content domain. While the scales
were created to each have a mean of 500 and a standard
deviation of 100, the subject matter and the level of diffculty
of items necessarily differ between the assessments at both
grades. Therefore, direct comparisons between scores across
grades should not be made. See appendix A for more details.
Table 2. Percentage of fourth- and eighth-grade TIMSS mathematics assessment
devoted to content and cognitive domains: 2007
Grade four
Content domains
Percent of
assessment
Number 52
Geometric shapes and measures 34
Data display 15
Cognitive domains
Percent of
assessment
Knowing 39
Applying 39
Reasoning 22
Grade eight
Content domains
Percent of
assessment
Number 29
Algebra 30
Geometry 22
Data and chance 19
Cognitive domains
Percent of
assessment
Knowing 38
Applying 41
Reasoning 21
NOTE: The content and cognitive domains are the foundation of the Trends in nternational Mathematics and Science Study (TMSS)
assessment. The content domains defne the specifc mathematics subject matter covered by the assessment, and the cognitive
domains defne the sets of behaviors expected of students as they engage with the mathematics content. Each mathematics content
domain has several topic areas. Each topic area is presented as a list of objectives covered in a majority of participating countries,
at either grade four or grade eight. However, the cognitive domains of mathematics are defned by the same three sets of expected
behaviorsknowing, applying, and reasoning. Detail may not sum to totals because of rounding.
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics and
Science Study (TMSS), 2007.
6
MATHEMATICS
Scores within a subject and grade are comparable over time.
The TMSS scale was established originally to have a mean
of 500 based on the average of all of the countries that
participated in TMSS 1995 at the fourth and eighth grades.
Successive TMSS assessments since then (TMSS 1999,
2003, and 2007) have scaled the achievement data so that
scores are equivalent from assessment to assessment.
That is, a score of 500 in eighth-grade mathematics in 2007
is equivalent to a score of 500 in eighth-grade mathematics
in 2003, in 1999, and in 1995. The same is true for the fourth-
grade scale: a score of 500 in fourth-grade mathematics
in 2007 is equivalent to a score of 500 in fourth-grade
mathematics in 2003 and 1995. More information on how
the TMSS scale was created can be found in appendix A.
Average scores in 2007
The average mathematics scores for both U.S. fourth- and
eighth-graders were higher than the TMSS scale average
(table 3). n 2007, the average score of U.S. fourth-graders
was 529 and the average score of U.S. eighth-graders was
508, compared with the TMSS scale average of 500 at each
grade level.
At grade four, the average U.S. mathematics score was higher
than those in 23 of the 35 other countries, lower than those in
8 countries (all 8 were in Asia or Europe), and not measurably
different from the average scores in the remaining 4 countries.
At grade eight, the average U.S. mathematics score was higher
than those in 37 of the 47 other countries, lower than those in
5 countries (all of them located in Asia), and not measurably
different from the average scores in the other 5 countries.
7
MATHEMATICS
Average score is higher than U.S. average score (p < .05)
Average score is not measurably different from the U.S. average score (p < .05)
Average score is lower than the U.S. average score (p < .05)
1
Hong Kong is a Special Administrative Region (SAR) of the People's Republic
of China.
2
National Target Population does not include all of the nternational Target
Population defned by the Trends in nternational Mathematics and Science
Study (TMSS) (see appendix A).
3
Nearly satisfed guidelines for sample participation rates only after substitute
schools were included (see appendix A).
4
Met guidelines for sample participation rates only after substitute schools were
included (see appendix A).
5
National Defned Population covers 90 percent to 95 percent of National
Target Population (see appendix A).
6
Kuwait tested the same cohort of students as other countries, but later
in 2007, at the beginning of the next school year.
7
National Defned Population covers less than 90 percent of National Target
Population (but at least 77 percent, see appendix A).
NOTE: Countries are ordered by 2007 average score. The tests for signifcance
take into account the standard error for the reported difference. Thus, a small
difference between the United States and one country may be signifcant while
a large difference between the United States and another country may not be
signifcant. The standard errors of the estimates are shown in tables E-1 and E-2
available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
SOURCE: nternational Association for the Evaluation of Educational
Achievement (EA), Trends in nternational Mathematics and Science Study
(TMSS), 2007.
Table 3. Average mathematics scores
of fourth- and eighth-grade students,
by country: 2007
Grade four
Country
Average
score
TMSS scale average 500
Hong Kong SAR
1
607
Singapore 599
Chinese Taipei 576
Japan 568
Kazakhstan
2
549
Russian Federation 544
England 541
Latvia
2
537
Netherlands
3
535
Lithuania
2
530
United States
4,5
529
Germany 525
Denmark
4
523
Australia 516
Hungary 510
Italy 507
Austria 505
Sweden 503
Slovenia 502
Armenia 500
Slovak Republic 496
Scotland
4
494
New Zealand 492
Czech Republic 486
Norway 473
Ukraine 469
Georgia
2
438
Iran, Islamic Rep. of 402
Algeria 378
Colombia 355
Morocco 341
El Salvador 330
Tunisia 327
Kuwait
6
316
Qatar 296
Yemen 224
Grade eight
Country
Average
score
Chinese Taipei 598
Korea, Rep. of 597
Singapore 593
Hong Kong SAR
1,4
572
Japan 570
Hungary 517
England
4
513
United States
4,5
508
Lithuania
2
506
Czech Republic 504
Slovenia 501
Armenia 499
Australia 496
Sweden 491
Malta 488
Scotland
4
487
Serbia
2,5
486
Italy 480
Malaysia 474
Norway 469
Cyprus 465
Bulgaria 464
Israel
7
463
Ukraine 462
Romania 461
Bosnia and Herzegovina 456
Lebanon 449
Thailand 441
Turkey 432
Jordan 427
Tunisia 420
Georgia
2
410
Bahrain 398
Indonesia 397
Syrian Arab Republic 395
Egypt 391
Algeria 387
Colombia 380
Oman 372
Palestinian Nat'l Auth. 367
Botswana 364
Kuwait
6
354
El Salvador 340
Saudi Arabia 329
Ghana 309
Qatar 307
8
MATHEMATICS
Trends in scores since 1995
Several countries participated in both the frst TMSS in 1995
and the most recent TMSS in 2007 and therefore the average
scores can be compared over a 12-year period. At grade four,
16 countries, including the United States, participated in both
the frst and most recent TMSS administrations. Comparing
2007 mathematics scores with those from 1995, one-half of
the countries (8 of 16), including the United States, showed
improvement in average scores and one-quarter of the
countries (4 of 16) showed declines (table 4). n 2007, the U.S.
fourth-grade average mathematics score of 529 was 11 scale
score points higher than the 1995 average of 518.
The gain in the U.S. fourth-grade average mathematics score
(11 scale score points) was greater than the difference in six
countries (the four countries with declines in average scores,
as well as two other countries) and less than the gain of four
countries (England, Hong Kong SAR, Slovenia, and Latvia).
There was no measurable difference between the 11 score
point gain in the United States and the gains or declines in
score points experienced in the other countries.
At grade eight, 20 countries, including the United States,
participated in TMSS in both 1995 and 2007. About one-
quarter of the countries (6 of 20), including the United States,
had higher average mathematics scores in 2007 than in 1995
and students in one-half of the countries (10 of 20) showed
declines in their average scores. The U.S. eighth-grade
average mathematics score of 508 was 16 scale score points
higher than the 1995 average of 492.
The gain in the U.S. eighth-grade mathematics score
(16 scale score points) was greater than the difference
Grade eight
Average score Difference
1
Country 1995 2007 20071995
Colombia 332 380 47*
Lithuania
3
472 506 34*
Korea, Rep. of 581 597 17*
United States
4,5
492 508 16*
England
4
498 513 16*
Slovenia 494 501 7*
Hong Kong SAR
2,4
569 572 4
Cyprus 468 465 -2
Scotland
4
493 487 -6
Hungary 527 517 -10*
Japan 581 570 -11*
Russian Federation 524 512 -12
Romania 474 461 -12*
Australia 509 496 -13*
Iran, Islamic Rep. of 418 403 -15*
Singapore 609 593 -16*
Norway 498 469 -29*
Czech Republic 546 504 -42*
Sweden 540 491 -48*
Bulgaria 527 464 -63*
Table 4. Trends in average mathematics scores of fourth- and eighth-grade students, by country:
1995 to 2007
Grade four
1
Country 1995 2007 20071995
England 484 541 57*
Hong Kong SAR
2
557 607 50*
Slovenia 462 502 40*
Latvia
3
499 537 38*
New Zealand 469 492 23*
Australia 495 516 22*
Iran, Islamic Rep. of 387 402 15*
United States
4,5
518 529 11*
Singapore 590 599 9
Scotland
4
493 494 1
Japan 567 568 1
Norway 476 473 -3
Hungary 521 510 -12*
Netherlands
6
549 535 -14*
Austria 531 505 -25*
Country difference in average scores between 1995 and 2007 is greater than analogous U.S. difference (p < .05)
Country difference in average scores between 1995 and 2007 is not measurably different from analogous U.S. difference (p < .05)
Country difference in average scores between 1995 and 2007 is less than analogous U.S. difference (p < .05)
*p < .05. Within-country difference between 1995 and 2007 average scores is signifcant.
1
Difference calculated by subtracting 1995 from 2007 estimate using unrounded numbers.
2
3
n 2007, National Target Population did not include all of the nternational Target Population defned by the Trends in nternational Mathematics and Science Study
(TMSS) (see appendix A).
4
n 2007, met guidelines for sample participation rates only after substitute schools were included (see appendix A).
5
n 2007, National Defned Population covered 90 percent to 95 percent of National Target Population (see appendix A).
6
n 2007, nearly satisfed guidelines for sample participation rates only after substitute schools were included (see appendix A).
NOTE: Countries are ordered based on the difference in 1995 and 2007 average scores. All countries met international sampling and other guidelines in 2007,
except as noted. Data are not shown for some countries, because comparable data from previous cycles are not available. The tests for signifcance take into
account the standard error for the reported difference. Thus, a small difference between averages for one country may be signifcant while a large difference for
another country may not be signifcant. Detail may not sum to totals because of rounding. The standard errors of the estimates are shown in tables E-1 and E-2
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics and Science Study (TMSS), 1995
and 2007.
9
MATHEMATICS
Figure 2. Difference between average
mathematics scores of U.S. fourth-
and eighth-grade students and the
TIMSS scale average: 1995, 1999,
2003, and 2007
1999
1
2003 2007 1995
18* 18*
29*
1999
1
2003 2007
-8
4
2
8*
1995
-80
-60
-40
-20
0
20
40
60
80
Year
U.S. difference from TIMSS scale average
-80
-60
-40
-20
0
20
40
60
80
Year
Grade four
Grade eight
*p < .05. Difference between U.S. average and Trends in nternational
Mathematics and Science Study (TMSS) scale average is statistically signifcant.
1
No fourth-grade assessment was conducted in 1999.
NOTE: n 2007, the United States met guidelines for sample participation rates
only after substitute schools were included. The National Defned Population
covered 90 percent to 95 percent of National Target Population (see appendix A).
Difference calculated by subtracting the TMSS scale average (500) from the
U.S. average mathematics score. The standard errors of the estimates are shown
in table E-39 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=
2009001.
(TMSS), 1995, 1999, 2003, and 2007.
in 13 countries (including the 10 countries with declining scores
and 3 others) and less than the gain of 2 countries (Colombia
and Lithuania). There was no measurable difference between
the 16 score point gain in the United States and the gains
or declines in score points experienced in the other countries.
The size of the difference in scores between the U.S. fourth-
graders' and TMSS scale averages was larger in 2007 at 29
scale score points than it was in 1995 at 18 scale score points
(fgure 2). U.S. fourth-graders' average mathematics scores
were higher than the TMSS scale average in each of the 3
data collection years: 1995, 2003, and 2007.
U.S. eighth-graders' average mathematics scores showed no
measurable difference from the TMSS scale average in 3 of
the 4 data collection years between 1995 and 2007. However,
the 2007 U.S. score was higher than the U.S. score in 1995,
with the U.S. score in 1995 some 8 points below the TMSS
scale average, but 8 points above the average in 2007.
10
MATHEMATICS
Table 5. Description of TIMSS mathematics cognitive domains: 2007
Cognitive domain Description
Knowing
Knowing addresses the facts, procedures, and concepts that students need to know to function mathematically. The key skills of
this cognitive domain include recalling defnitions, terminology, number properties, geometric properties, and notation; recognizing
mathematical objects, shapes, numbers, and expressions; recognizing mathematical entities that are mathematically equivalent;
computing algorithmic procedures for basic functions with whole numbers, fractions, decimals, and integers; approximating
numbers to estimate computations; carrying out routine algebraic procedures; retrieving information from graphs, tables, and
charts; reading simple scales; using appropriate units of measure and measuring instruments; estimating measures; classifying
or grouping objects, shapes, numbers, and expressions according to common properties; making correct decisions about class
membership; and ordering numbers and objects by attributes.
Applying
Applying focuses on students' abilities to apply knowledge and conceptual understanding to solve problems or answer questions.
The key skills of this cognitive domain include selecting appropriate operations, methods, or strategies for solving problems where
there is a known algorithm or method of solution; representing mathematics information and data in diagrams, tables, graphs,
and charts; generating equivalent representations for a given mathematical entity or relationship; generating an appropriate
mathematical model, such as an equation or diagram for solving a routine problem; following and executing a set of mathematical
instructions; drawing fgures and shapes given specifcations; solving routine problems (i.e., problems similar to those students are
likely to have encountered in class); comparing and matching different representations of data (grade eight) and using data from
charts, tables, graphs, and maps to solve routine problems.
Reasoning
Reasoning goes beyond the cognitive processes involved in solving routine problems to include unfamiliar situations, complex
contexts, and multistep problems. The key skills of this cognitive domain include determining and describing relationships between
variables or objects in mathematical situations; using proportional reasoning (grade four); decomposing geometric fgures to simplify
solving a problem; drawing the net of a given unfamiliar solid; visualizing transformations of three-dimensional fgures; comparing
and matching different representations of the same data (grade four); making valid inferences from given information; generalizing
mathematical results to wider applications; combining mathematical procedures to establish results and combining results to
produce a further result; making connections between different elements of knowledge and related representations; making linkages
between different elements of knowledge and related representations; making linkages between related mathematical ideas;
providing a justifcation for the truth or falsity of a statement by reference to mathematical results or properties; solving problems
set in mathematical or real life contexts that students are unlikely to have encountered before; applying mathematical procedures
in unfamiliar or complex contexts; and using geometric properties to solve non-routine problems.
NOTE: The descriptions of the cognitive domains are the same for grades four and eight, except where noted.
Content and cognitive domain
scores in 2007
n addition to an overall mathematics score, TMSS provides
scores for content domains and cognitive domains (see table
5 for a description of the cognitive domains). U.S. fourth-
graders scored higher than the TMSS scale average across
the mathematics content domains in 2007 (table 6). U.S.
fourth-graders' average scores in number, geometric shapes
and measures, and data display were between 22 and 43
scale score points above the TMSS scale average of 500
in each content domain.
U.S. fourth-graders performed better on average in the data
display domain than in the number and geometric shapes
and measures domains, at least in terms of comparisons
with other countries. That is, there were fewer countries that
outperformed the United States in data display than in the
other two domains. U.S. fourth-graders outperformed their
peers in 22 countries in the number domain, 20 countries
in the geometric shapes and measures domain, and 28
countries in the data display domain. They were outperformed
by their peers in 9 countries in the number domain, 10
countries in the geometric shapes and measures domain,
and 4 countries in the data display domain.
n the three cognitive domains, U.S. fourth-graders scored
higher than the TMSS scale average in 2007. U.S. fourth-
graders' average scores in the knowing, applying, and
reasoning domains were between 23 and 41 scale score
points higher than the TMSS scale average of 500.
n terms of comparisons with other countries, U.S. fourth-
graders performed relatively better on average in the applying
domain than the knowing and reasoning domains. U.S. fourth-
graders outperformed students in 16 to 27 countries across
the three cognitive domains and were outperformed by their
peers in 5 to 11 countries across the three cognitive domains.
At the eighth-grade level, U.S. students scored higher, on
average, than the TMSS scale average in two of the four
mathematics content domains in 2007 (table 7). U.S. eighth-
graders' average scores in number and data and chance were
10 and 31 scale score points above the TMSS scale score
average of 500, respectively. On the other hand, U.S. eighth-
graders' average score in the geometry domain was lower than
the TMSS scale score average by 20 scale score points. There
was no measurable difference between U.S. eighth-graders'
average score in algebra and the TMSS scale score average.
U.S. eighth-graders performed relatively better, on average,
in the data and chance domain than in the number, algebra,
11
MATHEMATICS
Table 6. Average mathematics content and cognitive domain scores of fourth-grade students,
by country: 2007
Content domain Cognitive domain
Country Number
Geometric shapes
and measures Data display Knowing Applying Reasoning
TMSS scale average 500 500 500 500 500 500
Hong Kong SAR
1
606 599 585 599 617 589
Singapore 611 570 583 590 620 578
Chinese Taipei 581 556 567 569 584 566
Japan 561 566 578 566 565 563
Kazakhstan
2
556 542 522 547 559 539
Russian Federation 546 538 530 547 538 540
England 531 548 547 540 544 537
Latvia
2
536 532 536 540 530 537
Netherlands
3
535 522 543 540 525 534
Lithuania
2
533 518 530 539 520 526
United States
4,5
524 522 543 524 541 523
Germany 521 528 534 531 514 528
Denmark
4
509 544 529 528 513 524
Australia 496 536 534 523 509 516
Hungary 510 510 504 507 511 509
Italy 505 509 506 501 514 509
Austria 502 509 508 507 505 506
Sweden 490 508 529 508 482 519
Slovenia 485 522 518 504 497 505
Armenia 522 483 458 493 518 489
Slovak Republic 495 499 492 498 492 499
Scotland
4
481 503 516 500 489 497
New Zealand 478 502 513 495 482 503
Czech Republic 482 494 493 496 473 493
Norway 461 490 487 479 461 489
Ukraine 480 457 462 466 472 474
Georgia
2
464 415 414 433 450 437
Iran, Islamic Rep. of 398 429 400 405 410 410
Algeria 391 383 361 376 384 387
Colombia 360 361 363 357 360 372
Morocco 353 365 316 346 354
El Salvador 317 333 367 339 312 356
Tunisia 352 334 307 329 343
Kuwait
6
321 316 318 305 326
Qatar 292 296 326 296 293
Yemen
Average score is higher than the U.S. average score (p < .05)
Not available. Average achievement could not be accurately estimated.
1
2
National Target Population does not include all of the nternational Target Population defned by the Trends in nternational Mathematics and Science Study
3
Nearly satisfed guidelines for sample participation rates only after substitute schools were included (see appendix A).
4
Met guidelines for sample participation rates only after substitute schools were included (see appendix A).
5
National Defned Population covers 90 percent to 95 percent of National Target Population (see appendix A).
6
Kuwait tested the same cohort of students as other countries, but later in 2007, at the beginning of the next school year.
NOTE: Countries are ordered by 2007 overall mathematics average scale score. The tests for signifcance take into account the standard error for the reported
difference. Thus, a small difference between the United States and one country may be signifcant while a large difference between the United States and
another country may not be signifcant. The standard errors of the estimates are shown in table E-3 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?
pubid=2009001.
12
MATHEMATICS
Table 7. Average mathematics content and cognitive domain scores of eighth-grade students,
by country: 2007
Country Number Algebra Geometry
Data and
chance Knowing Applying Reasoning
TMSS scale average 500 500 500 500 500 500 500
Chinese Taipei 577 617 592 566 592 594 591
Korea, Rep. of 583 596 587 580 595 596 579
Singapore 597 579 578 574 593 581 579
Hong Kong SAR
1,2
567 565 570 549 569 574 557
Japan 551 559 573 573 565 560 568
Hungary 517 503 508 524 513 518 513
England
2
510 492 510 547 514 503 518
Russian Federation 507 518 510 487 510 521 497
United States
2,3
510 501 480 531 503 514 505
Lithuania
4
506 483 507 523 511 508 486
Czech Republic 511 484 498 512 504 502 500
Slovenia 502 488 499 511 503 500 496
Armenia 492 532 493 427 493 507 489
Australia 503 471 487 525 500 487 502
Sweden 507 456 472 526 497 478 490
Malta 496 473 495 487 492 490 475
Scotland
2
489 467 485 517 489 481 495
Serbia
3,4
478 500 486 458 478 500 474
Italy 478 460 490 491 483 476 483
Malaysia 491 454 477 469 478 477 468
Norway 488 425 459 505 477 458 475
Cyprus 464 468 458 464 465 468 461
Bulgaria 458 476 468 440 458 477 455
Ukraine 460 464 467 458 464 471 445
Romania 457 478 466 429 462 470 449
Israel
5
469 470 436 465 456 473 462
Bosnia and Herzegovina 451 475 451 437 440 478 452
Lebanon 454 465 462 407 448 464 429
Thailand 444 433 442 453 446 436 456
Turkey 429 440 411 445 425 439 441
Jordan 416 448 436 425 422 432 440
Tunisia 425 423 437 411 423 421 425
Georgia
4
421 421 409 373 401 427 389
Iran, Islamic Rep. of 395 408 423 415 402 403 427
Bahrain 388 403 412 418 403 395 413
Indonesia 399 405 395 402 398 397 405
Syrian Arab Republic 393 406 417 387 401 393 396
Egypt 393 409 406 384 393 392 396
Algeria 403 349 432 371 412 371
Colombia 369 390 371 405 384 364 416
Oman 363 391 387 389 368 372 397
Palestinian Nat'l Auth. 366 382 388 371 371 365 381
Botswana 366 394 325 384 351 376
Kuwait
6
347 354 385 366 361 347
El Salvador 355 331 318 362 347 336
Saudi Arabia 309 344 359 348 335 308
Ghana 310 358 275 321 297 313
Qatar 334 312 301 305 305 307
1
2
3
4
National Target Population does not include all of the nternational Target Population defned by the Trends in nternational Mathematics and Science Study (TMSS)
(see appendix A).
5
National Defned Population covers less than 90 percent of National Target Population (but at least 77 percent, see appendix A).
6
NOTE: Countries are ordered by 2007 overall mathematics average scale score. The tests for signifcance take into account the standard error for the reported
difference. Thus, a small difference between the United States and one country may be signifcant while a large difference between the United States and another
country may not be signifcant. The standard errors of the estimates are shown in table E-4 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
13
MATHEMATICS
and geometry domains and relatively worse, on average,
in geometry than the other three content domains, at least
in terms of comparisons with other countries. U.S. eighth-
graders outperformed students in 38 countries in the data
and chance domain, 35 countries in the number domain,
37 countries in the algebra domain, and 29 countries in the
geometry domain. They were outperformed by their peers
in 6 countries in the data and chance domain, 5 countries
in the number domain, 7 countries in the algebra domain,
and 14 countries in the geometry domain.
n two of the three cognitive domains, the U.S. eighth-grade
average score was higher than the TMSS scale average
in 2007. U.S. eighth-graders' scores in the applying and
reasoning domains were 14 and 5 scale score points above
the TMSS scale score average of 500, respectively. On the
other hand, U.S. eighth-graders' average score in the knowing
domain was not measurably different from the TMSS scale
score average.
Like their fourth-grade counterparts, U.S. eighth-graders
performed relatively better in the applying domain than in the
knowing and reasoning domains in terms of comparisons with
other countries. U.S. eighth-graders outperformed students
in 30 to 38 countries across the three cognitive domains.
They were outperformed by their peers in 5 to 8 countries
across the three cognitive domains.
Performance on the TIMSS
international benchmarks
The TMSS international benchmarks provide a way to
understand how students' profciency in mathematics varies
along the TMSS scale (table 8). TMSS defnes four levels
of student achievement: advanced, high, intermediate, and
low. The benchmarks can then be used to describe the kinds
of skills and knowledge students at each score cutpoint would
need to successfully answer the mathematics items included
in the assessment. The descriptions of the benchmarks differ
between the two grade levels, as the mathematical skills and
knowledge needed to respond to the assessment items refect
the nature, diffculty, and emphasis at each grade.
Table 8. Description of TIMSS international mathematics benchmarks, by grade: 2007
Benchmark
(score cutpoint) Grade four
Advanced
(625)
Students can apply their understanding and knowledge in a variety of relatively complex situations and explain their reasoning.
They can apply proportional reasoning in a variety of contexts. They demonstrate a developing understanding of fractions and
decimals. They can select appropriate information to solve multistep word problems. They can formulate or select a rule for a
relationship. Students can apply geometric knowledge of a range of two- and three-dimensional shapes in a variety of situations.
They can organize, interpret, and represent data to solve problems.
High
(550)
Students can apply their knowledge and understanding to solve problems. Students can solve multistep word problems involving
operations with whole numbers. They can use division in a variety of problem situations. They demonstrate understanding of place
value and simple fractions. Students can extend patterns to fnd a later specifed term and identify the relationship between ordered
pairs. Students show some basic geometric knowledge. They can interpret and use data in tables and graphs to solve problems.
Intermediate
(475)
Students can apply basic mathematical knowledge in straightforward situations. Students at this level demonstrate an understanding
of whole numbers. They can extend simple numeric and geometric patterns. They are familiar with a range of two-dimensional
shapes. They can read and interpret different representations of the same data.
Low
(400)
Students have some basic mathematical knowledge. Students can demonstrate an understanding of adding and subtracting with
whole numbers. They demonstrate familiarity with triangles and informal coordinate systems. They can read information from
simple bar graphs and tables.
Grade eight
Advanced
(625)
Students can organize and draw conclusions from information, make generalizations, and solve nonroutine problems. They can
solve a variety of ratio, proportion, and percent problems. They can apply their knowledge of numeric and algebraic concepts
and relationships. Students can express generalizations algebraically and model situations. They can apply their knowledge
of geometry in complex problem situations. Students can derive and use data from several sources to solve multistep problems.
High
(550)
Students can apply their understanding and knowledge in a variety of relatively complex situations. They can relate and compute with
fractions, decimals, and percents, operate with negative integers, and solve word problems involving proportions. Students can work
with algebraic expressions and linear equations. Students use knowledge of geometric properties to solve problems, including area,
volume, and angles. They can interpret data in a variety of graphs and table and solve simple problems involving probability.
Intermediate
(475)
Students can apply basic mathematical knowledge in straightforward situations. They can add and multiply to solve one-step
word problems involving whole numbers and decimals. They can work with familiar fractions. They understand simple algebraic
relationships. They demonstrate understanding of properties of triangles and basic geometric concepts. They can read and
interpret graphs and tables. They recognize basic notions of likelihood.
Low (400) Students have some knowledge of whole numbers and decimals, operations, and basic graphs.
NOTE: Score cutpoints for the international benchmarks are determined through scale anchoring. Scale anchoring involves selecting benchmarks (scale points)
on the achievement scales to be described in terms of student performance, and then identifying items that students scoring at the anchor points can answer
correctly. The score cutpoints are set at equal intervals along the achievement scales. The score cutpoints were selected to be as close as possible to the
standard percentile cutpoints (i.e., 90th, 75th, 50th, and 25th percentiles). More information on the setting of the score cutpoints can be found in appendix A
and Martin et al. (2008).
14
MATHEMATICS
n 2007, there were higher percentages of U.S. fourth-graders
performing at or above each of the four TMSS international
benchmarks than the international medians
9
of the percentages
performing at each level (fgure 3). For example, 10 percent
of U.S. fourth-graders performed at or above the advanced
benchmark (625) compared to the international median of 5
percent. These students demonstrated an ability to apply
their understanding and knowledge to a variety of relatively
complex mathematical situations (see description in table 8).
At the other end of the scale, 95 percent of U.S. fourth-
graders performed at or above the low benchmark (400)
compared with the international median of 90 percent. These
students showed at least some basic mathematical skills by
demonstrating an understanding of adding and subtracting
with whole numbers, showing familiarity with triangles and
informal coordinate systems, and reading information from
simple bar graphs and tables.
Similar to their fourth-grade counterparts, there were higher
percentages of U.S. eighth-graders performing at or above
each of the four TIMSS international benchmarks than the
international medians of the percentage performing at each
level (fgure 3). For example, 6 percent of U.S. eighth-graders
performed at or above the advanced benchmark (625)
compared to the international median of 2 percent. These
students demonstrated an ability to organize information,
make generalizations, solve nonroutine problems, and draw
and justify conclusions from data (see description in table 8).
At the other end of the scale, 92 percent of U.S. eighth-
graders performed at or above the low benchmark (400)
compared with the international median of 75 percent.
These students showed at least a basic mathematical
understanding of whole numbers and decimals, could
perform simple computations, and complete a basic graph.
Figure 3. Percentage of U.S. fourth- and eighth-
grade students who reached each
TIMSS international mathematics
benchmark compared with the
international median percentage:
2007
Advanced
0
10
20
30
40
50
60
70
80
90
100
90
26
67
Benchmark
Percent
Percent
United States
International median
Grade four
5
Low Intermediate High
95*
77*
40*
10*
Grade eight
Advanced
0
10
20
30
40
50
60
70
80
90
100
75
15
46
Benchmark
United States
2
92*
67*
31*
6*
*p < .05. U.S. percentage is signifcantly different from the Trends in nternational
Mathematics and Science (TIMSS) international median percentage.
NOTE: The United States met guidelines for sample participation rates only
after substitute schools were included and the National Defned Population
covers 90 percent to 95 percent of National Target Population (see appendix
A). The TMSS international median represents all participating TMSS
jurisdictions, including the United States. The international median represents
the percentage at which half of the participating countries have that percentage
of students at or above the median and half have that percentage of students
below the median. The standard errors for the estimates are shown in table E-5
SOURCE: nternational Association for the Evaluation of Educational Achievement
(EA), Trends in nternational Mathematics and Science Study (TMSS), 2007.
9
The international median at each benchmark represents the percentage at which half of the participating countries have that percentage of students at or above
the median and half have that percentage of students below the median. For example, the low international benchmark median of 90 percent at grade four
indicates that half of the countries have 90 percent or more of their students who met the low benchmark, and half have less than 90 percent of their students
who met the low benchmark.
15
MATHEMATICS
At grade four, seven countries had higher percentages of
students performing at or above the advanced international
mathematics benchmark than the United States (fgure 4).
Fourth-graders in these seven countries were also found
to outperform U.S. fourth-graders, on average, on the overall
mathematics scale (see table 3). At grade eight, a slightly
different set of seven countries had higher percentages of
students performing at or above the advanced mathematics
benchmark than the United States (fgure 4). These seven
countries include the fve countries that had higher average
overall mathematics scores than the United States (see table
3), as well as Hungary and the Russian Federation.
At grade four in 2007, higher percentages of U.S. students
performed at or above the intermediate and low international
benchmarks than in 1995 (intermediate: 77 v. 71 percent; low:
95 v. 92 percent; data not shown). There were no measurable
differences in the percentage of U.S. fourth-graders
performing at or above either the high or advanced
international benchmarks between 1995 and 2007 (high: 37 v.
40 percent; advanced: 9 v. 10 percent). At grade eight, higher
percentages of U.S. students performed at or above the high,
intermediate, and low international benchmarks in 2007 than
in 1995 (high: 31 v. 26 percent; intermediate: 67 v. 61 percent;
low: 92 v. 86 percent; data not shown). There was no
measurable difference in the percentage of U.S. eighth-
graders performing at or above the advanced international
benchmark in 2007 than in 1995 (6 v. 4 percent).
Performance within
the United States
TMSS not only provides a measure of mathematics
performance of the nation as a whole, but also of the
performance of student subpopulations. For this report,
TMSS data were analyzed to investigate the performance
of students grouped in four ways: higher and lower
performing students; males and females; racial and ethnic
groups; and public schools serving students with different
low-income concentrations.
Scores of lower and higher
performing students
To examine the mathematics performance of each participating
country's higher and lower performing students, cutpoint
scores were calculated for students performing at or above
the 90th percentile (that is, the top 10 percent of students) and
those performing at or below the 10th percentile (the bottom
10 percent of students). The cutpoint scores were calculated
for each country, rather than across all countries combined.
n 2007, the highest-performing U.S. fourth-graders (those
performing at or above the 90th percentile) scored 625 or
higher (table 9). This was higher than the 90th percentile
scores for fourth-graders in 23 countries and lower than the
90th percentile score for students in 7 countries. The countries
in which the 90th percentile cutpoint score was higher than
the cutpoint score for U.S. are the same as those that
outperformed the United States as a whole (table 3), with
the exception of Latvia where the 90th percentile score of
628 is not signifcantly different from 625 in the United States.
The 90th percentile scores ranged between 371 (Yemen)
and 702 (Singapore). The difference in the 90th percentile
score between Singapore, the highest performing country,
and the United States was 77 score points.
The lowest-performing U.S. fourth-graders (those performing
at or below the 10th percentile) scored 430 or lower in 2007
(table 9). This was higher than the 10th percentile score
in 23 countries and lower than the 10th percentile score in
6 countries: Singapore, Hong Kong SAR, Japan, Chinese
Taipei, Latvia, and the Netherlands. The score at the 10th
percentile ranged between 81 (Yemen) and 520 (Hong Kong
SAR). The difference in the cutpoint scores between the
lowest-performing students in Hong Kong SAR and the
United States was 90 score points.
16
MATHEMATICS
Figure 4. Percentage of fourth- and eighth-grade students who reached the TIMSS advanced
international benchmark in mathematics, by country: 2007
0 10 20 30 40 50
0 10 20 30 40 50
2
2
2
2
3
3
3
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
6*
6*
8*
8*
10*
26*
31*
40*
40*
45*
4*
4*
4*
4*
4*
5*
5*
5*
6*
6*
6*
1*
1*
1*
1*
1*
1*
Algeria
Saudi Arabia
Ghana
El Salvador
Botswana
Kuwait
6
Qatar
Oman
Tunisia
Bahrain
Colombia
Norway
Indonesia
Egypt
Georgia
2
Lebanon
Jordan
Sweden
Cyprus
Malaysia
Italy
Ukraine
Thailand
Scotland
3
Romania
Israel
7
Slovenia
Bulgaria
Serbia
2,4
Malta
Turkey
Czech Republic
Armenia
Australia
United States
3,4
Lithuania
2
Russian Federation
England
3
Hungary
Japan
Hong Kong SAR
1,3
Singapore
Korea, Rep. of
Chinese Taipei
5
5
5
6
6
#
#
#
#
#
#
#
#
#
41*
40*
24*
23*
19*
16*
16*
11*
10*
10*
9*
9*
8*
7*
7*
4*
3*
3*
3*
2*
2*
2*
1*
Yemen
Qatar
Kuwait
6
El Salvador
Tunisia
Algeria
Morocco
Colombia
Georgia
2
Norway
Czech Republic
Ukraine
Sweden
Austria
Slovenia
Scotland
3
Slovak Republic
New Zealand
Germany
Netherlands
5
Denmark
3
Armenia
Australia
Hungary
Lithuania
2
United States
3,4
Latvia
2
Russian Federation
England
Kazakhstan
2
Japan
Chinese Taipei
Hong Kong SAR
1
Singapore
Italy
Percent
Percent
Percentage is higher than U.S. percentage (p < .05)
Percentage is not measurably different from U.S. percentage (p < .05)
Percentage is lower than U.S. percentage (p < .05)
*p < .05. Percentage is signifcantly different from the international median percentage.
# Rounds to zero.
1
2
3
4
5
6
7
NOTE: The TMSS international median represents all participating TMSS jurisdictions, including the United States. The international median represents the
percentage at which half of the participating countries have that percentage of students at or above the median and half have that percentage of students below
the median. The tests for signifcance take into account the standard error for the reported difference. Thus, a small difference between the United States and one
country may be signifcant while a large difference between the United States and another country may not be signifcant. The standard errors for the estimates are
shown in table E-41 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
17
MATHEMATICS
Grade four
Country 90th percentile 10th percentile
nternational average 576 366
Singapore 702 487
Hong Kong SAR
1
691 520
Japan 663 471
Chinese Taipei 663 488
Kazakhstan
2
653 435
England 647 429
Russian Federation 647 436
Latvia
2
628 444
United States
3,4
625 430
Lithuania
2
624 430
Hungary 620 389
Australia 620 408
Armenia 617 385
Netherlands
5
612 454
Denmark
3
611 431
Germany 607 440
Italy 601 406
New Zealand 598 377
Slovak Republic 597 389
Scotland
3
592 389
Austria 590 416
Slovenia 589 408
Sweden 586 417
Czech Republic 576 392
Ukraine 573 356
Norway 566 372
Georgia
2
549 322
Iran, Islamic Rep. of 508 290
Algeria 493 261
Colombia 470 238
Tunisia 469 178
Morocco 466 223
El Salvador 448 212
Kuwait
6
443 184
Qatar 413 179
Yemen 371 81
Grade eight
Korea, Rep. of 711 475
Singapore 706 463
Hong Kong SAR
1,3
681 438
Japan 677 460
Hungary 624 405
England
3
618 400
Lithuania
2
609 402
United States
3,4
607 408
Armenia 601 390
Australia 600 394
Malta 597 359
Serbia
2,4
597 368
Slovenia 594 409
Scotland
3
590 381
Romania 587 328
Bulgaria 586 324
Israel
7
584 328
Sweden 582 399
Turkey 581 297
Malaysia 578 372
Cyprus 575 347
Italy 574 381
Ukraine 572 346
Thailand 562 327
Jordan 556 290
Norway 552 382
Bosnia and Herzegovina 552 352
Lebanon 549 354
Georgia
2
532 280
Egypt 521 258
Indonesia 509 286
Tunisia 508 336
Bahrain 505 289
Syrian Arab Republic 502 290
Palestinian Nat'l Auth. 498 233
Oman 492 245
Colombia 477 281
Algeria 465 311
Botswana 460 264
Kuwait
6
455 252
El Salvador 433 248
Saudi Arabia 429 231
Ghana 428 192
Qatar 427 186
Table 9. Mathematics scores of fourth- and eighth-grade students dening 10th and 90th
percentiles, by country: 2007
Percentile cutpoint score is higher than U.S. cutpoint score (p < .05)
Percentile cutpoint score is not measurably different from U.S. cutpoint score (p < .05)
Percentile cutpoint score is lower than U.S. cutpoint score (p < .05)
1
2
(TMSS, see appendix A).
3
4
5
6
7
NOTE: Countries are ordered based on the 90th percentile cutpoint for mathematics scores. Cutpoints are calculated based on distribution of student scores within
each country. The international average is the average of the cutpoint scores for all reported countries. The standard errors of the estimates are shown in tables
E-6 and E-7 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
18
MATHEMATICS
Figure 5. Cutpoints at the 10th and 90th percentile for mathematics content domain scores
of U.S. fourth- and eighth-grade students: 2007
Mathematics score Mathematics score
Content domain
Total score
Content domain
Total score
625
430
632
413
428
615
621
464
Number
Geometric shapes
and measures
Data display
Grade four
90th percentile
10th percentile
Grade eight
90th percentile
10th percentile
Number
Algebra
Geometry
Data and chance
0 300 400 500 600 700 1,000 0 300 400 500 600 700 1,000
607
408
615
406
405
598
572
643
388
418
NOTE: The United States met guidelines for sample participation rates only after substitute schools were included. The National Defned Population covered 90
percent to 95 percent of National Target Population (see appendix A). Cutpoints are calculated based on distribution of U.S. student scores. The standard errors
of the estimates are shown in table E-8 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
On the three mathematics content domains at grade four,
the highest-performing U.S. fourth-graders (90th percentile
or higher) scored 632 or higher on the number domain, 615
or higher on the geometric shapes and measures domain,
and 621 or higher on the data display domain (fgure 5).
The lowest-performing U.S. students (10th percentile or lower)
scored 413 or lower on the number domain, 428 or lower on
the geometric shapes and measures domain, and 464 or
lower on the data display domain in 2007.
At grade eight, the highest-performing U.S. students (90th
percentile or higher) in mathematics scored 607 or higher
(table 9). The U.S. 90th percentile score was higher than
that of 34 countries and lower than the 90th percentile score
in 6 countries: Chinese Taipei, Korea, Singapore, Hong Kong
SAR, Japan, and Hungary. The range at the eighth grade
in 90th percentile scores was between 427 (Qatar) and 721
(Chinese Taipei). The difference in average scores between
the 90th percentile in Chinese Taipei and the United States
was 114 score points.
The lowest-performing U.S. eighth-graders (10th percentile
or lower) scored 408 or less in 2007 (table 9). The 10th
percentile score for U.S. eighth-graders in mathematics
was higher than the 10th percentile score in 34 countries
and lower than the 10th percentile score in 4 countries:
Chinese Taipei, Korea, Singapore, and Japan. The range
in 10th percentile scores was between 186 (Qatar) and 475
(Korea). The difference in the cutpoint scores between the
lowest-performing students in Korea and the United States
On the four mathematics content domains at grade eight,
the highest-performing U.S. eighth-graders (90th percentile
or higher) scored 615 or higher on the number domain,
598 or higher on the algebra domain, 572 or higher on the
geometry domain, and 643 or higher on the data and chance
domain (fgure 5). The same general pattern appears to hold
among the lowest-performing U.S. students (10th percentile
or lower) who scored 406 or lower on the number domain, 405
or lower on the algebra domain, 388 or lower on the geometry
domain, and 418 or lower on the data and chance domain.
19
MATHEMATICS
Figure 6. Trends in 10th and 90th percentile
and eighth-grade students: 1995,
1999, 2003, and 2007
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
1999 2003 2007
Year
1995
Mathematics score
594
*
611 608 607
408
400
380*
387*
1999
1
2003 2007
Year
Grade four
Grade eight
1995
Mathematics score
619 614*
625
430
417*
408*
90th percentile
10th percentile
90th percentile
10th percentile
*p < .05. Percentile cutpoint score is signifcantly different from 2007 percentile
cutpoint score.
1
Cutpoints are calculated based on distribution of U.S. student scores. The
standard errors of the estimates are shown in table E-9 available at http://nces.
ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
(TMSS), 1995, 1999, 2003, and 2007.
A comparison of 1995, when TMSS was frst administered,
and 2007 shows no measurable change in the cutpoint score
at the 90th percentile for U.S. fourth graders, the point marking
the top 10 percent of students (fgure 6). n 2007, the 90th
percentile score for U.S. fourth-graders was 625; the 90th
percentile score for 1995 was 619. However, a comparison
of data from 2003 and 2007 shows there was an increase in
the 90th percentile score defning the top-performing students:
from 614 to 625. On the other hand, the lowest-performing U.S.
fourth graders' showed statistically signifcant improvement
in mathematics: the 10th percentile score increased from 408
in 1995 and 417 in 2003 to 430 in 2007.
At grade eight, both the 90th and 10th percentile scores were
higher in 2007 than in 1995 (fgure 6). Though the 90th
percentile score has been relatively stable over the last three
administrations of TMSS, the 2007 score of 607 was higher
than the 1995 score of 594, showing improvement among top
students. The 10th percentile score for eighth-graders was
higher in 2007 than in 1995 or 1999.
Average scores of male and female students
n 2007, U.S. fourth-grade males outperformed females by 6
score points on average in mathematics (fgure 7). n addition
to the United States, of the 35 other countries participating at
grade four, 20 showed a signifcant difference in the average
mathematics scores of males and females: 12 in favor of
males and 8 in favor of females. The difference in average
scores between males and females ranged from 37 score
points in Kuwait (in favor of females) to 17 score points in
Colombia (in favor of males).
20
MATHEMATICS
Figure 7. Difference in average mathematics scores of fourth- and eighth-grade students, by sex
and country: 2007
Male-female difference in average mathematics scores favors males and is statistically signifcant (p < .05)
Male-female difference in average mathematics scores is not measurably different (p < .05)
Male-female difference in average mathematics scores favors females and is statistically signifcant (p < .05)
# Rounds to zero.
1
2
3
4
5
(see appendix A).
6
Kuwait tested the same cohort of students as other countries, but later in 2007, at the beginning of the next school year (see appendix A).
7
NOTE: The standard errors of the estimates are shown in tables E-10 and E-11 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
Grade four
Difference in favor
of females
Difference in
favor of males
Difference in favor
of females
Difference in
favor of males
Grade eight
1
2
3
3
4
7
9
7
6
5
6
6
7
9
10
12
14
15
17
37
22
22
18
14
9
8
6
6
6
3
3
5
#
#
#
#
Kuwait
6
Qatar
Yemen
Tunisia
Armenia
Kazakhstan
5
Russian Federation
Singapore
Algeria
Georgia
5
Latvia
5
Ukraine
Lithuania
5
England
Japan
New Zealand
Chinese Taipei
Morocco
Hungary
Hong Kong SAR
4
Slovenia
Australia
United States
2,3
Czech Republic
Sweden
Slovak Republic
Denmark
2
Norway
Scotland
2
El Salvador
Netherlands
1
Germany
Austria
Italy
Colombia
Difference in average mathematics score
Oman
Qatar
Bahrain
Thailand
Saudi Arabia
Kuwait
6
Jordan
Cyprus
Romania
Botswana
Singapore
Bulgaria
Egypt
Hong Kong SAR
2,4
Malaysia
Lithuania
5
Serbia
3,5
Ukraine
Russian Federation
Georgia
5
Armenia
Indonesia
Norway
Sweden
Israel
7
Czech Republic
Chinese Taipei
Turkey
Malta
Hungary
Slovenia
Scotland
2
Korea, Rep. of
Japan
Algeria
England
2
Italy
Lebanon
Australia
El Salvador
Tunisia
Ghana
Colombia
Difference in average mathematics score
United States
2,3
80 60 40 20 0 20 40 60 80
80 60 40 20 0 20 40 60 80
4
4
6
6
13
1
2
3
4
4
4
4
4
5
5
6
7
11
1
32
22
21
21
16
15
5
7
11
13
15
15
15
38
54
36
32
23
23
22
20
20
18
1
#
1
2
3
4
21
MATHEMATICS
Figure 8. Average mathematics scores of U.S. fourth- and eighth-grade students, by content domain
and sex: 2007
Average mathematics score Average mathematics score
Content domain
Total score
Content domain
Total score
532*
526
528*
520
522
523
544
543
Number
Geometric shapes
and measures
Data display
Grade four
Males
Females
Grade eight
Males
Females
Number
Algebra
Geometry
Data and chance
0 300 400 500 600 700 1000 0 300 400 500 600 700 1,000
510
507
515*
506
503
498
483*
535*
477
527
,
*p < .05. Difference between average mathematics scores for males and females is statistically signifcant and favors males.
NOTE: The United States met guidelines for sample participation rates only after substitute schools were included. The National Defned Population covered
90 percent to 95 percent of National Target Population (see appendix A). The standard errors of the estimates are shown in table E-12 available at http://nces.ed.gov/
pubsearch/pubsinfo.asp?pubid=2009001.
The higher average for U.S. male fourth graders on the total
mathematics scale refects higher average performance on one
content area: males outscored females 528 to 520, on average,
in number (fgure 8). There were no measurable sex differences
detected in the average scores in either the geometric shapes
and measures domain or the data display domain.
At grade eight, there was no measurable difference in the
average mathematics scores of U.S. males and females
in 2007 (fgure 7). Among the 47 other countries participating
in TMSS at grade eight, 24 showed a difference in the
average mathematics scores of males and females: 8 in favor
of males and 16 in favor of females. The difference in average
scores between males and females ranged from 54 score
points in Oman (in favor of females) to 32 score points in
Colombia (in favor of males).
Though there was no measurable difference detected in the
average mathematics scores of U.S. eighth-grade males and
females, U.S. males outperformed U.S. females in three of four
mathematics content domains: number (515 v. 506), geometry
(483 v. 477), and data and chance (535 v. 527; fgure 8).
22
MATHEMATICS
Figure 9. Trends in sex differences in average
and eighth-grade students: 1995,
1999, 2003, and 2007
Score
gap
5 6 7
4
Score
gap
3 8 6
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
1999 2003 2007
Year
1995
Average mathematics score
495*
505 507
510
507 502
490*
498
1999
1
2003 2007
Year
Grade four
Grade eight
1995
520* 522*
532
526
514* 516*
Males
Females
Males
Females
*p < .05. Signifcantly different from 2007.
1
The standard errors of the estimates are shown in table E-13 available at http://
nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
(TMSS), 1995, 1999, 2003 and 2007.
Both U.S. males and females' average scores, at the fourth
and eighth grades, were higher in 2007 than in 1995 (fgure
9). At grade four, the 2007 average scores of both males and
females were higher than their average scores in both 1995
and 2003. U.S. fourth-grade males scored 12 points higher
on average in mathematics in 2007 than in 1995 (532 v. 520),
and U.S. fourth-grade females scored 10 points higher, on
average (526 v. 516).
At grade eight in 2007, U.S. males and females had higher
scores, on average, compared to their scores in 1995: by
15 scale score points among males (510 v. 495) and by 17
scale score points among females (507 v. 490; fgure 9).
23
MATHEMATICS
Figure 10. Average mathematics scores of U.S.
fourth- and eighth-grade students,
by race/ethnicity: 2007
0
300
400
500
600
700
1,000
Race/ethnicity
Grade four
White Black Hispanic Asian Multiracial U.S.
average
TIMSS scale
average
550
482
504
582
534
529
500
533
457
475
549
506 508
500
0
300
400
500
600
700
1,000
Race/ethnicity
Grade eight
average
TIMSS scale
average
NOTE: Reporting standards were not met for American ndian/Alaska Native
and Native Hawaiian/Other Pacifc slander. Black includes African American.
Racial categories exclude Hispanic origin. Students who identifed themselves
as being of Hispanic origin were classifed as Hispanic, regardless of their race.
Although data for some race/ethnicities are not shown separately because the
reporting standards were not met, they are included in the U.S. totals shown
throughout the report. The United States met guidelines for sample participation
rates only after substitute schools were included. The National Defned
Population covered 90 percent to 95 percent of the National Target Population
(see appendix A). See appendix A in this report for more information. The
(TMSS), 2007.
Average scores of students of different
races and ethnicities
n 2007 U.S. non-Hispanic White, non-Hispanic Asian and
multiracial fourth-graders scored higher on average than
the TMSS scale average in mathematics, while U.S. non-
Hispanic Black fourth-graders scored lower (fgure 10).
10

U.S. Hispanic fourth-graders' average score showed
no measurable difference from the TMSS scale average.
n comparison to the U.S. national average, U.S. White
and Asian fourth-graders scored higher, on average, while
U.S. Black and Hispanic fourth-graders scored lower. U.S.
multiracial fourth-graders did not score measurably different
from the U.S. national average in mathematics.
At grade eight, U.S. White, and Asian students scored higher,
on average, than both the TMSS scale average and the U.S.
national average in mathematics. On the other hand, U.S.
Black and Hispanic eighth-graders scored lower, on average,
than the TMSS scale average and U.S. national average. U.S.
multiracial eighth-graders did not score measurably different
from either the TMSS scale average or the U.S. national
average score in mathematics.
Over time, U.S. White, Black, Hispanic, and Asian students,
in both fourth and eighth grades, have generally shown overall
improvement in mathematics (fgure 11). At grade four, U.S.
White, Black, and Asian students had higher scores in 2007
than in 1995 or 2003; Hispanic students improved their average
mathematics score over a shorter period of time, between 2003
and 2007, but not over the 12-year period since 1995.
11
Though
in each of the data collection years the differences in the
average scores of White fourth-graders and their Black peers
were statistically signifcant, the gap in scores decreased
between 1995 and 2007 (84 points v. 67 points). On the other
hand, the difference in average scores between White and
Asian fourth-graders has reversed and grown over the same
period of time, from being in favor of Whites in 1995 (541 v. 525)
to being in favor of Asians in 2007 (550 v. 582). There has been
no detectable change in the size of the gap in scores between
White fourth-graders and their Hispanic classmates.
At grade eight, U.S. White, Black, Hispanic, and Asian
students improved in mathematics, on average, when 2007
scores are compared to those from 1995 (fgure 11). Black
and Hispanic eighth-graders also showed an increase in scores
over a shorter period of time, when 2007 is compared to 1999.
Though in each of the data collection years the differences
in the average scores of White eighth-graders and their Black
and Hispanic peers were statistically signifcant, the sizes of
the gap in scores between these groups of students were
smaller in 2007 than they were 12 years earlier in 1995 (White
v. Black: 76 points v. 97 points; White v. Hispanic: 58 points v.
73 points). There has been no detectable change in the size
of the gap in scores between White eighth-graders and their
Asian peers.
10
Black includes African American and Hispanic includes Latino. Race categories exclude Hispanic origin.
11
The large apparent difference is not statistically signifcant because of relatively large standard errors.
24
MATHEMATICS
Score
gap
Score
gap
16* 8* 33 2 11 16 15
67
Score
gap
Score
gap
48 50 46 73* 60 58
81
Score
gap
97* 78 76 84* 70 67
Score
gap
0
300
400
500
600
700
1,000
1999
1
2003 2007
Year
Grade four
1995
541* 542*
550
482
471*
457*
1999 2003 2007
Year
Grade eight
1995
516*
444*
525 525
533
457
447
419*
516*
457*
525 525
533
475
465
443*
516*
539
525 525
533
549
537
514*
1999
1
2003 2007
Year
1995
541* 542*
550
504
492* 493
1999
1
2003 2007
Year
1995
541*
550*
582
550
542*
525*
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
1999 2003 2007
Year
1995
1999 2003 2007
Year
1995
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000 White
Black
White
Black
White
Hispanic
White
Hispanic
White
Asian
White
Asian
Figure 11. Trends in differences in average mathematics scores of U.S. fourth- and eighth-grade
students, by selected race/ethnicity: 1995, 1999, 2003, and 2007
1
NOTE: Only the four numerically largest racial categories are shown. Multiracial data were not collected in 1995 and 1999. Reporting standards were not met for American
ndian/Alaska Native and Native Hawaiian/Other Pacifc slander. Black includes African American. Racial categories exclude Hispanic origin. Students who identifed
themselves as being of Hispanic origin were classifed as Hispanic, regardless of their race. Although data for some race/ethnicities are not shown separately because the
reporting standards were not met, they are included in the U.S. totals shown throughout the report. n 2007, the United States met guidelines for sample participation rates
only after substitute schools were included. The National Defned Population covered 90 percent to 95 percent of National Target Population (see appendix A). The tests for
signifcance take into account the standard error for the reported difference. Thus, a small difference between averages for one student group may be signifcant while a large
difference for another student group may not be signifcant. See appendix A in this report for more information. The standard errors of the estimates are shown in table E-15
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics and Science Study (TMSS), 1995, 1999, 2003,
and 2007.
25
MATHEMATICS
Figure 12. Average mathematics scores of U.S.
by percentage of students in public
school eligible for free or reduced-
price lunch: 2007
0
300
400
500
600
700
1,000
Percentage of students eligible for free or reduced-price lunch
Grade four
Less
than 10
percent
10 to
24.9
percent
25 to
49.9
percent
50 to
74.9
percent
75
percent
or more
U.S.
average
TIMSS
scale
average
583
553
537
510
479
529
500
0
300
400
500
600
700
1,000
Grade eight
Less
than 10
percent
10 to
24.9
percent
25 to
49.9
percent
50 to
74.9
percent
75
percent
or more
U.S.
average
TIMSS
scale
average
557
543
514
482
465
508
500
NOTE: Analyses are limited to public schools only, based on school reports
of the percentage of students in public school eligible for the federal free or
reduced-price lunch program. The United States met guidelines for sample
participation rates only after substitute schools were included. The National
Defned Population covered 90 percent to 95 percent of the National Target
Population (see appendix A). The standard errors of the estimates are shown
in table E-16 available at http://nces.ed.gov/pubsearch/pubsinfo.
asp?pubid=2009001.
(TMSS), 2007.
Average scores of students attending
public schools of various poverty levels
The U.S. results are also arrayed by the concentration of low-
income enrollment in the public schools, as measured by
eligibility for free or reduced-price lunch, and shown in relation
to the TMSS scale average and the U.S. national average.
n comparison to the TMSS scale average, the average
mathematics score of U.S. fourth graders in the highest
poverty public schools (at least 75 percent of students eligible
for free or reduced-price lunch) in 2007 was lower (479 v. 500);
the average scores of fourth-graders in each of the other
categories of school poverty was higher than the TMSS
scale average (fgure 12). n comparison to the U.S. national
average score, fourth-graders in schools with 50 percent
or more students eligible for free or reduced-price lunch
scored lower, on average, while those in schools with lower
proportions of poor students scored higher, on average,
than the U.S. national average.
On average, U.S. eighth-graders in public schools with at
least 50 percent eligible for free and reduced price lunch
scored lower than the TMSS scale average in 2007 (482
and 465 v. 500). U.S. eighth-graders attending public schools
with fewer than 50 percent of students eligible for the free or
reduced-price lunch program scored higher than the TMSS
scale average in mathematics. n comparison to the U.S.
national average, U.S. eighth-graders in public schools with
fewer than 25 percent of students eligible scored higher in
mathematics, on average, while students in public schools
with at least 50 percent eligible scored lower, on average.
26
MATHEMATICS
Score
gap
33 46
23 29
Score
gap
2007
Year
Grade four
2003
Grade four
Grade four
Grade four
566*
583
553
543*
566*
583
537
533
Less than 10 percent
10-24.9 percent
Year
2003 2007
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
Score
gap
96 103
67 72
Score
gap
2007
Year
2003
566*
583
510
499*
566*
583
479
471
50-74.9 percent
25-49.9 percent
75 percent or more
Year
2003 2007
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
students, by school poverty level: 1999, 2003, and 2007
See notes at end of table.
Comparisons of scores in 2007 to 2003 showed an
inconsistent pattern of improvement in mathematics among
U.S. fourth-graders in public schools serving students from
various levels of poverty (fgure 13).
12
On the one hand, fourth
graders in public schools with relatively lower levels of poverty
(less than 10 percent to 24.9 percent eligible) and in public
schools with relatively higher levels of poverty (50 to almost
75 percent eligible) had higher average mathematics scores
in 2007 than in 2003. On the other hand, there was no
measurable difference detected in the average scores of
students in public schools serving students from medium
and the highest level of poverty. Moreover, though the
average mathematics scores were higher in 2007, the score
gaps evident in the earlier data collections did not appear
to diminish over time.
13

Consistent with the lack of signifcant change between
1999 and 2007 in eighth-grade mathematics scores overall,
students in different types of public schools categorized by
poverty also did not show detectable change in performance
generally. And, as at grade four, the score gaps evident in
earlier data collections did not appear to diminish over time.
12
nformation on the percentage of students eligible for the federal free or reduced-price lunch program was not collected in 1995 for either grade. Thus, comparisons
over time on this measure are limited to an 8-year period.
13
Large apparent differences are not statistically signifcant because of relatively large standard errors.
27
MATHEMATICS
Score
gap
13 16 14
51 42 43 97 103 92
70 67 75
Score
gap
Year
Grade eight
Grade eight
Grade eight
Grade eight
546
557
543
547
531 533
546
557
514
547
505
495*
546
557
465
547
444
449
546
557
482
547
480
476
10-24.9 percent
Year
2007 1999 2003
2007 1999 2003
2007 1999 2003
2007 1999 2003
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
Score
gap
Score
gap
Year
50-74.9 percent
25-49.9 percent
75 percent or more
Year
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
students, by school poverty level: 1999, 2003, and 2007Continued
NOTE: nformation on the percentage of students in school eligible for free or reduced-price lunch was not collected in 1995. No fourth-grade assessment was conducted
in 1999. Analyses are limited to public schools only, based on school reports of the percentage of students in public school eligible for the federal free or reduced-price
lunch program. n 2007, the United States met guidelines for sample participation rates only after substitute schools were included. The National Defned Population
covered 90 percent to 95 percent of the National Target Population (see appendix A). The standard errors of the estimates are shown in table E-17 available at http://nces.
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics and Science Study (TMSS), 1999, 2003,
and 2007.
28
MATHEMATICS
Effect size of the difference
in average scores
As noted in the introduction, this report includes effect sizes
to provide the reader with a sense of the magnitude of the
statistically signifcant differences reported thus far. Statistically
signifcant results do not necessarily indicate those fndings that
are important or large enough to consider as informing policy or
practice. Small differences may be statistically signifcant, but
may not have much practical import.
One way of looking at within-country differences in
achievement between groups of students is to ask how large
these differences are relative to across-country differences
between the U.S. national average and an international
benchmark, such as the national average for the country
with the highest estimated score. As shown previously, the
countries with the highest scores outpaced the United States
on a number of measures. For example, the difference at
grade four between the U.S. average mathematics score
(529) and Hong Kong SAR average score (607) was 78
score points (see table 3). The gap between the United States
and Hong Kong SAR is also apparent in the percentage of
students scoring at the advanced level: 10 percent of U.S.
fourth-graders met the advanced international benchmark
compared with 40 percent in Hong Kong SAR (see fgure 4).
Are differences within the United States between groups
of students (e.g., by race/ethnicity or poverty concentration
in schools) bigger or smaller than these international
differences? Effect sizes help make these comparisons.
Figure 14 shows the effect size of the difference only for those
groups with statistically signifcant score differences. Appendix
A provides a discussion of how effect sizes were calculated.
As shown in fgure 14, in grade four mathematics, the effect
size of the difference between U.S. White and Black students
is roughly the same as the effect size between the United
States and Hong Kong SAR, the country with the highest
estimated score, while the effect size between U.S. White
and Hispanic students is roughly three-ffths the effect size
between the United States and Hong Kong SAR. The largest
effect size, between U.S. fourth-graders in schools with the
lowest and highest poverty levels, is 1.4 times the effect size
between the United States and Hong Kong SAR.
At grade eight, the effect size of the difference in mathematics
scores between U.S. White and Black students is 1.1 times
the effect size between the United States and Chinese Taipei,
the country with the highest estimated score. The effect size
between U.S. White and Hispanic students is four-ffths the
effect size between the United States and Chinese Taipei.
The largest effect size, between U.S. eighth-graders in schools
with the lowest and highest poverty levels, is 1.3 times the
effect size between the United States and Chinese Taipei.
29
MATHEMATICS
Figure 14. Effect size of difference in average mathematics achievement
of fourth- and eighth-grade, by country, sex, race/ethnicity,
and school poverty level: 2007
Groups compared
Grade four
United States
v. Hong Kong
SAR
1
U.S. males
v.
U.S. females
U.S. White
students v.
U.S. Black
students
U.S. White
students v.
U.S. Hispanic
students
U.S. White
students v.
U.S. Asian
students
U.S. White
students v.
U.S. multiracial
students
U.S. public
schools with
lowest levels
of poverty v.
U.S. public
schools with highest
levels of poverty
Effect size
1.1
0.1
1.0
0.7
0.5
0.2
1.5
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0
Groups compared
Grade eight
United States
v.
Chinese Taipei
U.S. White
students v.
U.S. Black
students
U.S. White
students v.
U.S. Hispanic
students
U.S. White
students v.
U.S. Asian
students
U.S. White
students v.
U.S. multiracial
students
U.S. public
schools with
lowest levels
of poverty v.
U.S. public
levels of poverty
Effect size
1.0
1.1
0.8
0.2
0.4
1.3
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0
1
NOTE: Effect size is shown only for statistically signifcant differences between group means. Effect size is calculated
by dividing the raw difference between group means by the pooled standard deviation (see appendix A). Black includes
African American. Racial categories exclude Hispanic origin. Students who identifed themselves as being of Hispanic
origin were classifed as Hispanic, regardless of their race. High-poverty schools are those in which 75 percent or more
of students are eligible for the federal free or reduced-price lunch program. Low-poverty schools are those in which less
than 10 percent of students are eligible. The United States met guidelines for sample participation rates only after
substitute schools were included. The National Defned Population covered 90 percent to 95 percent of the National
Target Population. See table E-18 (available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001) for standard
deviations of the U.S. and other countries' student populations. See table E-19 (available at http://nces.ed.gov/
pubsearch/pubsinfo.asp?pubid=2009001) for standard deviations of U.S. student subpopulations.
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational
Mathematics and Science Study (TMSS), 2007.
31
SCIENCE
The TIMSS science assessment
Like the TMSS mathematics assessment, the TIMSS science
assessment is designed along two dimensions: the science
topics or content that students are expected to learn and the
cognitive skills students are expected to have developed.
The content domains covered at grade four are life science,
physical science, and Earth science (see table 10). At grade
eight, the content domains are biology, chemistry, physics,
and Earth science. The cognitive domains in each grade are
knowing, applying, and reasoning. Example items from the
TMSS science assessment are included in appendix B (see
items B8 through B14).
The proportion of items devoted to a domain, and therefore
the contribution of the domain to the overall science scale
score, differs somewhat across grades. For example, at grade
four in 2007, 37 percent of the TMSS science assessment
focused on the physical science domain, while at grade eight,
46 percent of the assessment focused on the analogous
chemistry and physics domains. The proportion of items
devoted to each cognitive domain is similar across grades.
Also, within a content or cognitive domain, the makeup of
items, in terms of diffculty and form of knowledge and skills
addressed, differs across grade levels to refect the nature,
diffculty, and emphasis of the subject matter encountered in
school at each grade. The TIMSS 2007 Assessment
Frameworks (Mullis et al. 2005) provides a more detailed
description of the content and cognitive domains assessed
in TMSS. The development and validation of the science
cognitive domains is based on the same processes used
in the development of the mathematics cognitive domains.
Details of the development of the mathematics cognitive
domains can be found in IEAs TIMSS 2003 International
Report on Achievement in the Mathematics Cognitive
Domains: Findings From a Developmental Project (Mullis,
Martin, and Foy 2005).
TMSS provides an overall science scale score as well
as content and cognitive domain scores at each grade level.
As with the mathematics scale, the TMSS science scale is
from 0 to 1,000, and the international mean score is set at 500,
with an international standard deviation of 100. The scaling of
data is conducted separately for each grade and each content
domain. While the scales were created to each have a mean
of 500 and a standard deviation of 100, the subject matter
and the level of diffculty of items necessarily differ between
the assessments at both grades. Therefore, direct
comparisons between scores across grades should not be
made. Comparability over time is established by linking the
data from each assessment to the data from the assessment
that preceded it. More information on how the TMSS scale
was created can be found in appendix A.
Average scores in 2007
The average science scores for both U.S. fourth- and eighth-
graders were higher than the TMSS scale average (table 11).
Table 10. Percentage of fourth- and eighth-grade TIMSS science assessment devoted
to content and cognitive domains: 2007
Grade four
Content domains
Percent of
assessment
Life science 43
Physical science 37
Earth science 21
Cognitive domains
Percent of
assessment
Knowing 44
Applying 36
Reasoning 20
Grade eight
Content domains
Percent of
assessment
Biology 36
Chemistry 20
Physics 26
Earth science 19
Cognitive domains
Percent of
assessment
Knowing 39
Applying 40
Reasoning 21
NOTE: The content and cognitive domains are the foundation of the Trends in nternational Mathematics and Science Study (TMSS)
assessment. The content domains defne the specifc science subject matter covered by the assessment, and the cognitive domains
defne the sets of behaviors expected of students as they engage with the science content. Each science content domain has several
topic areas. Each topic area is presented as a list of objectives covered in a majority of participating countries, at either grade four or
grade eight. However, the cognitive domains of science are defned by the same three sets of expected behaviorsknowing, applying,
and reasoning. Detail may not sum to totals because of rounding.
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics and
Science Study (TMSS), 2007.
Science Performance in the United States
and Internationally
32
SCIENCE
1
Hong Kong is a Special Administrative Region (SAR) of the People's Republic
of China.
2
National Target Population does not include all of the nternational Target
Population defned by the Trends in nternational Mathematics and Science
Study (TMSS) (see appendix A).
3
Met guidelines for sample participation rates only after substitute schools were
included (see appendix A).
4
National Defned Population covers 90 percent to 95 percent of National
Target Population (see appendix A).
5
Nearly satisfed guidelines for sample participation rates only after substitute
schools were included (see appendix A).
6
Kuwait tested the same cohort of students as other countries, but later in
2007, at the beginning of the next school year.
7
National Defned Population covers less than 90 percent of National Target
Population (but at least 77 percent, see appendix A).
NOTE: Countries are ordered by 2007 average score. The tests for signifcance
take into account the standard error for the reported difference. Thus, a small
difference between the United States and one country may be signifcant while
a large difference between the United States and another country may not be
signifcant. The standard errors of the estimates are shown in tables E-20 and
E-21 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
(TMSS), 2007.
Table 11. Average science scores of fourth-
and eighth-grade students,
by country: 2007
Grade four
Country
Average
score
Singapore 587
Chinese Taipei 557
Hong Kong SAR
1
554
Japan 548
Latvia
2
542
England 542
United States
3,4
539
Hungary 536
Italy 535
Kazakhstan
2
533
Germany 528
Australia 527
Slovak Republic 526
Austria 526
Sweden 525
Netherlands
5
523
Slovenia 518
Denmark
3
517
Czech Republic 515
Lithuania
2
514
New Zealand 504
Scotland
3
500
Armenia 484
Norway 477
Ukraine 474
Georgia
2
418
Colombia 400
El Salvador 390
Algeria 354
Kuwait
6
348
Tunisia 318
Morocco 297
Qatar 294
Yemen 197
Grade eight
Country
Average
score
Singapore 567
Chinese Taipei 561
Japan 554
Korea, Rep. of 553
England
3
542
Hungary 539
Czech Republic 539
Slovenia 538
Hong Kong SAR
1,3
530
United States
3,4
520
Lithuania
2
519
Australia 515
Sweden 511
Scotland
3
496
Italy 495
Armenia 488
Norway 487
Ukraine 485
Jordan 482
Malaysia 471
Thailand 471
Serbia
2,4
470
Bulgaria
7
470
Israel
7
468
Bahrain 467
Romania 462
Malta 457
Turkey 454
Cyprus 452
Tunisia 445
Indonesia 427
Oman 423
Georgia
2
421
Kuwait
6
418
Colombia 417
Lebanon 414
Egypt 408
Algeria 408
Saudi Arabia 403
El Salvador 387
Botswana 355
Qatar 319
Ghana 303
n 2007, the average score of U.S. fourth-graders was 539 and
the average score of U.S. eighth-graders was 520, compared
to the TMSS scale average of 500 at each grade level.
At grade four, the average U.S. science score was higher than
those in 25 of the 35 other countries, lower than the average
scores in 4 countries (all of them in Asia), and not measurably
different from the average scores of students in the remaining
6 countries.
At grade eight, the average U.S. science score was higher
than those in 35 of the 47 other countries, lower than in 9
countries (all located in Asia or Europe), and not measurably
different from the average scores in the other 3 countries.
33
SCIENCE
Trends in scores since 1995
At grade four, 16 countries, including the United States,
participated in both the frst TMSS in 1995 and the most
recent TMSS in 2007 and therefore can be compared over
a 12-year period. Comparing 2007 with 1995, 7 of the 16
countries showed improvement in average science scores,
5 countries showed declines, and 4 countries, including the
United States, had no measurable difference in average
scores (table 12). n 2007, the U.S. fourth-grade average
science score was 539, compared with 542 in 1995.
Grade eight
1
Country 1995 2007 20071995
Lithuania
2
464 519 55*
Colombia 365 417 52*
Hong Kong SAR
3,4
510 530 20*
England
4
533 542 8
United States
4,5
513 520 7
Korea, Rep. of 546 553 7*
Russian Federation 523 530 7
Hungary 537 539 2
Australia 514 515 1
Cyprus 452 452 #
Japan 554 554 -1
Iran, Islamic Rep. of 463 459 -4
Scotland
4
501 496 -5
Romania 471 462 -9
Singapore 580 567 -13
Norway 514 487 -28*
Sweden 553 511 -42*
Grade four
1
Country 1995 2007 20071995
Singapore 523 587 63*
Latvia
2
486 542 56*
Iran, Islamic Rep. of 380 436 55*
Hong Kong SAR
3
508 554 46*
Hungary 508 536 28*
England 528 542 14*
Australia 521 527 6
New Zealand 505 504 -1
United States
4,5
542 539 -3
Japan 553 548 -5*
Netherlands
6
530 523 -7
Austria 538 526 -12*
Scotland 514 500 -14*
Norway 504 477 -27*
Country difference in average scores between 1995 and 2007 is greater than analogous U.S. difference (p < .05)
Country difference in average scores between 1995 and 2007 is not measurably different from analogous U.S. difference (p < .05)
Country difference in average scores between 1995 and 2007 is less than analogous U.S. difference (p < .05)
# Rounds to zero.
*p < .05. Within-country difference between 1995 and 2007 average scores is signifcant.
1
Difference calculated by subtracting 1995 from 2007 estimate using unrounded numbers.
2
n 2007, National Target Population did not include all of the nternational Target Population defned by the Trends in nternational Mathematics and Science Study
3
4
n 2007, met guidelines for sample participation rates only after substitute schools were included (see appendix A).
5
n 2007, National Defned Population covered 90 percent to 95 percent of National Target Population (see appendix A).
6
n 2007, nearly satisfed guidelines for sample participation rates only after substitute schools were included (see appendix A).
NOTE: Bulgaria collected data in 1995 and 2007, but due to a structural change in its education system, comparable science data from 1995 are not available.
Countries are ordered by the difference between 1995 and 2007 overall average scores. All countries met international sampling and other guidelines in 2007,
except as noted. Data are not shown for some countries, because comparable data from previous cycles are not available. The tests for signifcance take into
account the standard error for the reported difference. Thus, a small difference between the United States and one country may be signifcant while a large
difference between the United States and another country may not be signifcant. Detail may not sum to totals because of rounding. The standard errors of the
estimates are shown in tables E-20 and E-21 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
and 2007.
Table 12. Trends in average science scores of fourth- and eighth-grade students, by country:
1995 to 2007
34
SCIENCE
Figure 15. Difference between average science
scores of U.S. fourth- and eighth-
grade students and the TIMSS scale
average: 1995, 1999, 2003, and 2007
1999
1
2003 2007
-80
-60
-40
-20
0
20
40
60
80
Year
1995
42*
36*
39*
1999
1
2003 2007
-80
-60
-40
-20
0
20
40
60
80
13*
27*
15*
Year
20*
1995
Grade four
Grade eight
*p < .05. Difference between U.S. average and Trends in nternational
Mathematics and Science Study (TMSS) scale average is statistically
signifcant.
1
NOTE: The United States met guidelines for sample participation rates only after
substitute schools were included. The National Defned Population covers 90
percent to 95 percent of National Target Population (see appendix A). Difference
calculated by subtracting the TMSS scale average (500) from the U.S. average
science score. The standard errors of the estimates are shown in table E-40
(TMSS), 1995, 1999, 2003 and 2007.
At grade eight, 19 countries, including the United States,
participated in TMSS in both 1995 and 2007. Five countries
had higher average science scores in 2007 than in 1995,
3 countries showed declines in their average scores,
and 11 countries, including the United States, had no
measurable difference between average scores in 1995
and 2007. The U.S. eighth-grade average science score
was 520, compared with 513 in 1995.
Figure 15 shows the difference between the average U.S.
science scores and the TMSS scale average at grades four
and eight for each of the TMSS administrations. The average
size of difference in science scores between the U.S. fourth-
graders and the TMSS scale average shows no signifcant
change across the data collection years, from 36 to 42 scale
score points above the TMSS scale average. Similarly, at
grade eight, there has been no measurable change in the size
of the difference, on average, across the data collection years.
35
SCIENCE
Table 13. Description of TIMSS science cognitive domains: 2007
Cognitive Domain Description
Knowing
Knowing addresses the facts, information, concepts, tools, and procedures that students need to know to function scientifcally.
The key skills of this cognitive domain include making or identifying accurate statements about science facts, relationships,
processes, and concepts; identifying the characteristics or properties of specifc organisms, materials, and processes; providing
or identifying defnitions of scientifc terms; recognizing and using scientifc vocabulary, symbols, abbreviations, units, and scales
in relevant contexts; describing organisms, physical materials, and science processes that demonstrate knowledge of properties,
structure, function, and relationships; supporting or clarifying statements of facts or concepts with appropriate examples;
identifying or providing specifc examples to illustrate knowledge of general concepts; and demonstrating knowledge of the use
of scientifc apparatus, tools, equipment, procedures, measurement devices, and scales.
Applying
Applying focuses on students' ability to apply knowledge and conceptual understanding to solve problems or answer questions.
The key skills of this cognitive domain include identifying or describing similarities and differences between groups of organisms,
materials, or processes; distinguishing, classifying, or ordering individual objects, materials, organisms, and processes based on
given characteristics and properties; using a diagram or model to demonstrate understanding of a science concept, structure,
relationship, process, or biological or physical system or cycle; relating knowledge of an underlying biological or physical concept
to an observed or inferred property, behavior, or use of objects, organisms, or materials; interpreting relevant textual, tabular, or
graphical information in light of a science concept or principle; identifying or using a science relationship, equation, or formula to
fnd a quantitative or qualitative solution involving the direct application or demonstration of a concept; providing or identifying an
explanation for an observation or natural phenomena, demonstrating understanding of the underlying science concept, principle,
law, or theory.
Reasoning
Reasoning goes beyond the cognitive processes involved in solving routine problems to include more complex tasks. The key
skills of this cognitive domain include analyzing problems to determine the relevant relationships, concepts, and problem-solving
steps; developing and explaining problem-solving strategies; providing solutions to problems that require consideration of a
number of different factors or related concepts; making associations or connections between concepts in different areas of science;
demonstrating understanding of unifed concepts and themes across the domains of science; integrating mathematical concepts
or procedures in the solutions to science problems; combining knowledge of science concepts with information from experience or
observation to formulate questions that can be answered by investigation; formulating hypotheses as testable assumptions using
knowledge from observation or analysis of scientifc information and conceptual understanding; making predictions about the effects
of changes in biological or physical conditions in light of evidence and scientifc understanding; designing or planning investigations
appropriate for answering scientifc questions or testing hypotheses; detecting patterns in data; describing or summarizing data
trends; interpolating or extrapolating from data or given information; making valid inferences based on evidence; drawing appropriate
conclusions; demonstrating understanding of cause and effect; making general conclusions that go beyond the experimental
or given conditions; applying conclusions to new situations; determining general formulas for expressing physical relationships;
evaluating the impact of science and technology on biological and physical systems; evaluating alternative explanations and
problem-solving strategies; evaluating the validity of conclusions through examination of the available evidence; and constructing
arguments to support the reasonableness of solutions to problems.
NOTE: The descriptions of the cognitive domains are the same for grades four and eight.
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics and Science Study
(TMSS), 2007.
Content and cognitive domain
scores in 2007
As in mathematics, TMSS also provides scores for science
content and cognitive domains (see table 13 for a description
of the science cognitive domains). U.S. fourth-graders scored
higher than the TMSS scale average across the science
content domains in 2007 (table 14). U.S. fourth-graders'
average scores in life science, physical science, and Earth
science were between 33 and 40 scale score points above
the TMSS scale average of 500 in each content domain.
U.S. fourth-graders outperformed their peers in 25 countries
in the life science domain, 24 countries in the physical science
domain, and 21 countries in the Earth science domain. They
were outperformed by their peers in 3 countries in the life
science and Earth science domains, and 7 countries in the
physical science domain.
U.S. fourth-graders' average scores in the cognitive domains of
knowing, applying, and reasoning were, on average, between
33 and 41 scale score points higher than the TMSS scale
average of 500. U.S. fourth-graders outperformed students
in 22 to 26 countries across the three cognitive domains. U.S.
fourth-graders were outperformed by their peers in 1 country
in the applying domain, and 5 countries in the knowing and
reasoning domains.
At the eighth-grade level, U.S. students scored higher than
the TMSS scale average in three of the four science content
domains and the three cognitive domains in 2007 (table 15).
U.S. eighth-graders' average score in biology, chemistry, and
Earth science was, on average, 10 to 30 scale score points
above the TMSS scale score average of 500. On the other
hand, U.S. eighth-graders' average score in the physics
domain was not measurably different from the TMSS scale
score average.
U.S. eighth-graders outperformed students in 36 countries
in the biology and Earth science domains, 35 countries in the
chemistry domain, and 32 countries in the physics domain.
They were outperformed by their peers in 5 countries in the
biology and Earth science domains, 9 countries in the
chemistry domain, and 10 countries in the physics domain.
36
SCIENCE
Table 14. Average science content and cognitive domain scores of fourth-grade students,
by country: 2007
Country Life science Physical science Earth science Knowing Applying Reasoning
TMSS scale average 500 500 500 500 500 500
Singapore 582 585 554 579 587 568
Chinese Taipei 541 559 553 556 536 571
Hong Kong SAR
1
532 558 560 549 546 561
Japan 530 564 529 542 528 567
Latvia
2
535 544 536 535 540 551
England 532 543 538 536 543 537
United States
3,4
540 534 533 533 541 535
Hungary 548 529 517 531 540 529
Italy 549 521 526 539 530 526
Kazakhstan
2
528 528 534 536 534 519
Germany 529 524 524 526 527 525
Australia 528 522 534 523 529 530
Slovak Republic 532 513 530 527 527 513
Austria 526 514 532 526 529 513
Sweden 531 508 535 521 526 527
Netherlands
5
536 503 524 525 518 525
Slovenia 511 530 517 525 511 527
Denmark
3
527 502 522 515 516 525
Czech Republic 520 511 518 516 520 510
Lithuania
2
516 514 511 515 511 524
New Zealand 506 498 515 500 511 505
Scotland
3
504 499 508 494 511 501
Armenia 489 492 479 487 486 484
Norway 487 469 497 478 485 480
Ukraine 482 475 474 477 476 478
Georgia
2
427 414 432 424 434 388
Colombia 408 411 401 404 409 409
El Salvador 410 392 393 393 410 376
Algeria 351 377 365 379 350 357
Kuwait
6
353 345 363 338 360 331
Tunisia 323 340 325 329 316 349
Morocco 292 324 293 311 291 318
Qatar 291 303 305 283 304 293
Yemen
1
2
3
4
5
6
NOTE: Countries are ordered by 2007 overall science average scale score. The tests for signifcance take into account the standard error for the reported
difference. Thus, a small difference between averages for the United States and one country may be signifcant while a large difference between averages for the
United States and another country may not be signifcant. The standard errors of the estimates are shown in table E-22 available at http://nces.ed.gov/pubsearch/
pubsinfo.asp?pubid=2009001.
37
SCIENCE
Table 15. Average science content and cognitive domain scores of eighth-grade students,
by country: 2007
Country Biology Chemistry Physics Earth Science Knowing Applying Reasoning
TIMSS scale average 500 500 500 500 500 500 500
Singapore 564 560 575 541 567 554 564
Chinese Taipei 549 573 554 545 560 565 541
Japan 553 551 558 533 555 534 560
Korea, Rep. of 548 536 571 538 547 543 558
England
1
541 534 545 529 538 530 547
Hungary 534 536 541 531 549 524 530
Czech Republic 531 535 537 534 539 533 534
Slovenia 530 539 524 542 533 533 538
Hong Kong SAR
1,2
527 517 528 532 522 532 533
Russian Federation 525 535 519 525 527 534 520
United States
1,3
530 510 503 525 516 512 529
Lithuania
4
527 507 505 515 512 513 527
Australia 518 505 508 519 510 501 530
Sweden 515 499 506 510 509 505 517
Scotland
1
495 497 494 498 495 480 511
Italy 502 481 489 503 498 494 493
Armenia 490 478 503 475 502 493 459
Norway 487 483 475 502 486 486 491
Ukraine 477 490 492 482 488 477 488
Jordan 478 491 479 484 485 491 471
Malaysia 469 479 484 463 473 458 487
Thailand 478 462 458 488 472 473 473
Serbia
3,4
474 467 467 466 469 485 455
Bulgaria
5
467 472 466 480 471 489 448
Israel
5
472 467 472 462 472 456 481
Bahrain 473 468 466 465 468 469 469
Bosnia and Herzegovina 464 468 463 469 463 486 452
Romania 459 463 458 471 470 451 460
Iran, Islamic Rep. of 449 463 470 476 454 468 462
Malta 453 461 470 456 462 436 473
Turkey 462 435 445 466 450 462 462
Syrian Arab Republic 459 450 447 448 445 474 440
Cyprus 447 452 458 457 456 438 460
Tunisia 452 458 432 447 445 441 458
Indonesia 428 421 432 442 425 426 438
Oman 414 416 443 439 423 428 428
Georgia
4
423 418 416 425 422 440 394
Kuwait
6
419 418 438 410 417 430 411
Colombia 434 420 407 407 417 418 428
Lebanon 405 447 431 389 422 403 420
Egypt 406 413 413 426 404 434 395
Algeria 411 414 397 413 410 409 414
Palestinian Nat'l Auth. 402 413 414 408 412 407 396
Saudi Arabia 407 390 408 423 403 417 395
El Salvador 398 377 380 400 388 394 384
Botswana 359 371 351 361 358 361 362
Qatar 318 322 347 312 322 325
Ghana 304 342 276 294 291 316
1
2
3
4
(TIMSS) (see appendix A).
5
6
NOTE: Countries are ordered by 2007 overall science average scale score. The tests for signifcance take into account the standard error for the reported
difference. Thus, a small difference between averages for the United States and one country may be signifcant while a large difference between averages for the
United States and another country may not be signifcant. The standard errors of the estimates are shown in table E-23 available at http://nces.ed.gov/pubsearch/
SOURCE: International Association for the Evaluation of Educational Achievement (IEA), Trends in International Mathematics and Science Study (TIMSS), 2007.
38
SCIENCE
Table 16. Description of TIMSS international science benchmarks, by grade: 2007
Benchmark
(score cutpoint) Grade four
Advanced
(625)
Students can apply knowledge and understanding of scientifc processes and relationships in beginning scientifc inquiry. Students
communicate their understanding of characteristics and life processes of organisms as well as of factors relating to human health.
They demonstrate understanding of relationships among various physical properties of common materials and have some practical
knowledge of electricity. Students demonstrate some understanding of the solar system and Earth's physical features and processes.
They show a developing ability to interpret the results of investigations and draw conclusions as well as a beginning ability to evaluate
and support an argument.
High
(550)
Students can apply knowledge and understanding to explain everyday phenomena. Students demonstrate some understanding of
plant and animal structure, life processes, and the environment and some knowledge of properties of matter and physical phenomena.
They show some knowledge of the solar system, and of Earth's structure, processes, and resources. Students demonstrate beginning
scientifc inquiry knowledge and skills, and provide brief descriptive responses combining knowledge of science concepts with
information from everyday experience of physical and life processes.
Intermediate
(475)
Students can apply basic knowledge and understanding to practical situations in the sciences. Students recognize some basic
information related to characteristics of living things and their interaction with the environment, and show some understanding of
human biology and health. They also show some understanding of familiar physical phenomena. Students know some basic facts
about the solar system and have a developing understanding of Earth's resources. They demonstrate some ability to interpret
information in pictorial diagrams and apply factual knowledge to practical situations.
Low
(400)
Students have some elementary knowledge of life science and physical science. Students can demonstrate knowledge of some
simple facts related to human health and the behavioral and physical characteristics of animals. They recognize some properties
of matter, and demonstrate a beginning understanding of forces. Students interpret labeled pictures and simple diagrams, complete
simple tables, and provide short written responses to questions requiring factual information.
Grade eight
Advanced
(625)
Students can demonstrate a grasp of some complex and abstract concepts in biology, chemistry, physics, and Earth science.
They have an understanding of the complexity of living organisms and how they relate to their environment. They show understanding
of the properties of magnets, sound, and light, as well as demonstrating understanding the structure of matter and physical and
chemical properties and changes. Students apply knowledge of the solar system and of Earth's features and processes, and apply
understanding of major environmental issues. They understand some fundamentals of scientifc investigation and can apply basic
physical principles to solve some quantitative problems. They can provide written explanations to communicate scientifc knowledge.
High
(550)
Students can demonstrate conceptual understanding of some science cycles, systems, and principles. They have some
understanding of biological concepts including cell processes, human biology and health, and the interrelationship of plants and
animals in ecosystems. They apply knowledge to situations related to light and sound, demonstrate elementary knowledge of heat
and forces, and show some evidence of understanding the structure of matter, and chemical and physical properties and changes.
They demonstrate some understanding of the solar system, Earth's processes and resources, and some basic understanding of
major environmental issues. Students demonstrate some scientifc inquiry skills. They combine information to draw conclusions,
interpret tabular and graphical information, and provide short explanations conveying scientifc knowledge.
Intermediate
(475)
Students can recognize and communicate basic scientifc knowledge across a range of topics. They demonstrate some
understanding of characteristics of animals, food webs, and the effect of population changes in ecosystems. They are acquainted
with some aspects of sound and force and have elementary knowledge of chemical change. They demonstrate elementary
knowledge of the solar system, Earth's processes, and resources and the environment. Students extract information from tables
and interpret pictorial diagrams. They can apply knowledge to practical situations and communicate their knowledge through brief
descriptive responses.
Low (400) Students can recognize some basic facts from the life and physical sciences. They have some knowledge of the human body,
and demonstrate some familiarity with everyday physical phenomena. Students can interpret pictorial diagrams and apply
knowledge of simple physical concepts to practical situations.
NOTE: Score cutpoints for the international benchmarks are determined through scale anchoring. Scale anchoring involves selecting benchmarks (scale points)
on the achievement scales to be described in terms of student performance, and then identifying items that students scoring at the anchor points can answer
correctly. The score cutpoints are set at equal intervals along the achievement scales. The score cutpoints were selected to be as close as possible to the
standard percentile cutpoints (i.e., 90th, 75th, 50th, and 25th percentiles). More information on the setting of the score cutpoints can be found in appendix A
and Martin et al. (2008).
In the three cognitive domains, the average U.S. score
at eighth grade was higher than the TMSS scale average.
n 2007, U.S. eighth-graders' average scores in the knowing,
applying, and reasoning domains were between 12 and 29
scale score points higher than the TMSS scale average of
500. U.S. eighth-graders outperformed students in 33 to 35
countries across the three cognitive domains. U.S. eighth-
graders were outperformed by their peers in 6 to 10 countries
across the three cognitive domains.
Performance on the TIMSS
international benchmarks
The TMSS international benchmarks distinguish four levels
of student achievement: advanced, high, intermediate,
and low, and provide a way to understand how students'
profciency in science varies along the TMSS scale (table 16).
The descriptions of the benchmarks differ between the two
grade levels, as the science skills and knowledge needed to
respond to the assessment items refect the nature, diffculty,
and emphasis at each grade.
39
SCIENCE
Figure 16. Percentage of U.S. fourth- and eighth-
grade students who reached each
TIMSS international science
benchmark compared with the
international median percentage:
2007
Advanced
0
10
20
30
40
50
60
70
80
90
100
93
34
74
Benchmark
Percent
Percent
United States
Grade four
7
94
78*
47*
15*
Grade eight
Advanced
0
10
20
30
40
50
60
70
80
90
100
78
17
49
Benchmark
United States
3
92*
71*
38*
10*
*p < .05. U.S. percentage is signifcantly different from the Trends in nternational
Mathematics and Science (TIMSS) international median percentage.
NOTE: The United States met guidelines for sample participation rates only after
substitute schools were included. The National Defned Population covered 90
percent to 95 percent of National Target Population (see appendix A). The TMSS
international median represents all participating TIMSS jurisdictions, including the
United States. The international median represents the percentage at which half
of the participating countries have that percentage of students at or above the
median and half have that percentage of students below the median. The
standard errors for the estimates are shown in table E-24 available at http://nces.
14
The international median at each benchmark represents the percentage at which half of the participating countries have that percentage of students at or above the
median and half have that percentage of students below the median. For example, the low international benchmark median of 93 percent at grade four indicates that half
of the countries have 93 percent or more of their students who met the low benchmark, and half have less than 93 percent of their students who met the low benchmark.
n 2007, there were higher percentages of U.S. fourth-graders
performing at or above three of the four TMSS international
benchmarks than the international median percentage
(fgure 16).
14
For example, 15 percent of U.S. fourth-graders
performed at or above the advanced benchmark (625) in
science compared to the international median of 7 percent.
These students demonstrated an ability to apply their
knowledge and understanding of scientifc processes and
relationships in beginning scientifc inquiry (see description
in table 16). At the other end of the scale, 94 percent of U.S.
fourth-graders performed at or above the low benchmark
(400) which was not measurably different from the international
median of 93 percent. These students showed at least some
elementary knowledge of life science and physical science.
At the eighth grade, there were higher percentages of U.S.
students performing at or above each of the four TMSS
international science benchmarks than the international
median (fgure 16). For example, 10 percent of U.S. eighth-
graders performed at or above the advanced benchmark
(625) compared to the international median of 3 percent.
These students demonstrated a grasp of some complex
and abstract concepts in biology, chemistry, physics, and
Earth science (see description in table 14). At the other end
of the scale, 92 percent of U.S. eighth-graders performed
at or above the low benchmark (400) compared with the
international median of 78 percent. These students recognized
some basic facts from the life science and physical science.
At grade four, two countries had higher percentages of students
performing at or above the advanced international science
benchmark than the United States (fgure 17). Fourth-graders
in these two countries, Singapore and Chinese Taipei, were also
found to outperform U.S. fourth-graders, on average, on the
overall science scale (see table 11). At grade eight, six countries
had higher percentages of students performing at or above the
advanced science benchmark than the United States (fgure
17). These six countries also had higher average overall eighth-
grade science scores than the United States (see table 11).
n comparison with earlier data collections, a lower percentage
of U.S. fourth-graders performed at or above the advanced
benchmark in 2007 than in 1995 (15 v. 19 percent; data
not shown). There were no measurable differences in the
percentage of U.S. fourth-graders performing at or above the
high, intermediate, or low international science benchmarks
between 1995 and 2007 (high: 50 v. 47 percent; intermediate:
78 v. 78 percent; low: 92 v. 94 percent). At grade eight, there
were fewer U.S. students performing at or above the advanced
benchmark than in 1999 (10 v. 12 percent), but not between
1995 and 2007 (data not shown). On the other hand, there
were more U.S. eighth-graders performing at or above the low
science benchmark in 2007 than in 1995 (92 v. 87 percent).
There was no measurable difference in the percentage of U.S.
eighth-graders performing at or above the high or intermediate
international benchmarks in 2007 than in 1995.
40
SCIENCE
0 10 20 30 40 50
0 10 20 30 40 50
3
2*
2
3
3
3
3
1*
1*
1*
1*
1*
#
#
#
#
#
#
#
#
#
#
10*
11*
11*
11*
13*
17*
17*
17*
25*
32*
4
5*
5*
5*
5*
5*
6*
8*
8*
8*
10*
2*
2*
2*
2*
1*
1*
International median 7
7
8
8
9*
1*
1*
#
#
#
#
#
#
#
36*
19*
16*
15*
14*
14*
13*
13*
12*
12*
11*
10*
10*
10*
10
7
6
4*
4*
3*
2*
2*
1*
Percent
Percent
Yemen
Qatar
Tunisia
Algeria
Morocco
Kuwait
6
El Salvador
Georgia
4
Colombia
Norway
Ukraine
Lithuania
4
Netherlands
5
Scotland
2
Slovenia
Denmark
2
Czech Republic
New Zealand
Sweden
Austria
Kazakhstan
4
Germany
Latvia
4
Australia
Slovak Republic
Armenia
Japan
Italy
Hungary
Hong Kong SAR
3
England
United States
1,2
Russian Federation
Chinese Taipei
Singapore
Algeria
Botswana
El Salvador
Ghana
Qatar
Saudi Arabia
Tunisia
Indonesia
Georgia
4
Kuwait
6
Colombia
Oman
Egypt
Lebanon
Cyprus
Norway
Romania
Serbia
1,4
Bahrain
Malaysia
Thailand
Ukraine
Turkey
Israel
7
Italy
Scotland
2
Bulgaria
7
Malta
Jordan
Sweden
Lithuania
4
Australia
Armenia
United States
1,2
Hong Kong SAR
2,3
Russian Federation
Slovenia
Czech Republic
Hungary
Korea, Rep. of
England
2
Japan
Chinese Taipei
Singapore
Figure 17. Percentage of fourth- and eighth-grade students who reached the TIMSS advanced
international benchmark in science, by country: 2007
Percentage is higher than U.S. percentage (p < .05)
Percentage is not measurably different from U.S. percentage (p < .05)
Percentage is lower than U.S. percentage (p < .05)
# Rounds to zero.
*p < .05. Percentage is signifcantly different from the international median percentage.
1
2
3
4
5
6
7
NOTE: The Trends in nternational Mathematics and Science Study (TMSS) international median represents all participating TMSS jurisdictions, including the
United States. The international median represents the percentage at which half of the participating countries have that percentage of students at or above the
median and half have that percentage of students below the median. The tests for signifcance take into account the standard error for the reported difference.
Thus, a small difference between the United States and one country may be signifcant while a large difference between the United States and another country
may not be signifcant. The standard errors for the estimates are shown in table E-42 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
41
SCIENCE
Performance within
the United States
As with mathematics, the TMSS science data were analyzed
to investigate the performance of students grouped in four
ways: the highest and lowest performing students; males and
females; racial and ethnic groups; and public schools serving
students with different low-income concentrations.
Scores of lower and higher
performing students
To examine the science performance of each participating
country's higher and lower performing students, cutpoint
scores were calculated for students performing at or above
the 90th percentile (the top 10 percent of students) and those
performing at or below the 10th percentile (the bottom 10
percent of students). The 10th and 90th percentiles cutpoint
scores were calculated for each country, rather than across
all countries combined.
n 2007, the highest-performing U.S. fourth-graders (those
performing at or above the 90th percentile) scored 643 or
higher in science (table 17). This was higher than the 90th
percentile score for fourth-graders in 27 countries and lower
than 2 of the 35 other countries. Of the 4 countries that
outperformed the United States, on average, in science
at grade four (see table 11), 2 had higher 90th percentile
cutpoint scores than the United States: Singapore and
Chinese Taipei. Scores at the 90th percentile ranged between
379 (Yemen) and 701 (Singapore). The difference in scores
between the highest-performing students in Singapore and
the United States was 58 score points.
The lowest-performing U.S. fourth-graders in science (those
performing at or below the 10th percentile) scored 427 or less
in 2007 (table 17). The 10th percentile score for U.S. fourth-
graders was higher than the 10th percentile score in 17
countries and lower than that in 7 countries: Singapore,
Chinese Taipei, the Russian Federation, Hong Kong SAR,
Japan, Latvia, and the Netherlands. The range in scores at
the 10th percentile was between 20 (Yemen) and 466 (Hong
Kong SAR). The difference in scores between the lowest-
performing students in Hong Kong SAR and the United States
42
SCIENCE
Grade four
Singapore 701 464
United States
1,2
643 427
England 641 438
Armenia 640 336
Hungary 637 425
Hong Kong SAR
3
637 466
Italy 636 429
Japan 633 459
Slovak Republic 627 416
Australia 626 423
Latvia
4
625 454
Kazakhstan
4
623 433
Germany 623 427
Austria 620 423
Sweden 617 429
New Zealand 614 382
Denmark
1
610 417
Slovenia 610 416
Netherlands
5
598 445
Lithuania
4
595 428
Scotland
1
593 400
Ukraine 576 364
Norway 570 374
Georgia
4
524 306
Colombia 522 271
El Salvador 507 267
Kuwait
6
505 182
Tunisia 497 119
Algeria 483 220
Morocco 465 139
Qatar 464 121
Yemen 379 20
Grade eight
Singapore 694 421
England
1
649 427
Japan 648 454
Korea, Rep. of 646 452
Hungary 635 437
Slovenia 628 442
Hong Kong SAR
3
625 419
United States
1,2
623 410
Australia 617 410
Lithuania
4
616 414
Armenia 612 366
Sweden 608 405
Jordan 601 349
Scotland
1
597 388
Bulgaria
7
595 330
Malta 595 298
Israel
7
591 329
Italy 590 393
Ukraine 588 374
Malaysia 581 357
Norway 578 389
Thailand 578 363
Turkey 577 336
Bahrain 575 351
Romania 572 345
Serbia
2,4
571 359
Bosnia and Herzegovina 565 359
Cyprus 556 339
Syrian Arab Republic 546 355
Palestinian Nat'l Auth. 543 255
Oman 541 293
Lebanon 539 284
Egypt 537 275
Kuwait
6
530 298
Georgia
4
527 309
Tunisia 524 367
Indonesia 520 330
Colombia 514 319
Saudi Arabia 503 300
Algeria 488 327
Qatar 480 146
Botswana 478 220
El Salvador 477 298
Ghana 445 163
Percentile cutpoint score is higher than U.S. cutpoint score (p < .05)
Percentile cutpoint score is not measurably different from U.S. cutpoint score (p < .05)
Percentile cutpoint score is lower than U.S. cutpoint score (p < .05)
1
2
3
4
5
6
7
NOTE: Countries are ordered based on the 90th percentile cutpoint for science scores. Cutpoints are calculated based on distribution of student scores within
each country. The international average is the average of the cutpoint scores for all reported countries. The standard errors of the estimates are shown in tables
E-25 and E-26 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
Table 17. Science scores of fourth- and eighth-grade students dening 10th and 90th percentiles,
by country: 2007
43
SCIENCE
Figure 18. Cutpoints at the 10th and 90th percentile for science content domain scores of U.S.
fourth- and eighth-grade students: 2007
Science score Science score
Content domain
Total score
Content domain
Total score
643
427
641
433
433
630
630
433
Life Science
Physical Science
Earth Science
Grade four
90th percentile
10th percentile
Grade eight
90th percentile
10th percentile
Biology
Chemistry
Physics
Earth Science
0 300 400 500 600 700 1,000 0 300 400 500 600 700 1,000
623
410
633
421
410
607
603
634
399
410
percent to 95 percent of National Target Population (see appendix A). The standard errors of the estimates are shown in table E-27 available at http://nces.ed.gov/
On the three science content domains at grade four in 2007,
the highest-performing U.S. students (90th percentile or
higher) scored 641 or higher on the life science domain and
630 or higher on both the physical science or Earth science
domains (fgure 18). The lowest-performing U.S. students
(10th percentile or lower) scored 433 or lower on the life
science, physical science, and Earth science domains.
At grade eight, the highest-performing U.S. students (90th
percentile or higher) in science scored 623 or higher in 2007
(table 17). This was higher than the 90th percentile score
in 34 countries and lower than in 6 countries: Singapore,
Chinese Taipei, England, Japan, Korea, and Hungary. The
range in 90th percentile scores was between 445 (Ghana)
and 694 (Singapore). The difference in scores between the
highest-performing students in Singapore and the United
States was 71 score points.
At the other end of the scale, the lowest-performing U.S.
eighth-graders (10th percentile or lower) scored 410 or
lower in science in 2007 (table 17). The 10th percentile score
for U.S. eighth-graders was higher than the 10th percentile
score in 34 countries and lower than in 8 countries: Chinese
Taipei, England, Japan, Korea, Hungary, the Czech Republic,
Slovenia, and the Russian Federation. The range in 10th
percentile scores was between 163 (Ghana) and 454 (Japan).
The difference in scores between the lowest-performing
students in Japan and the United States was 44 score points.
On the four science content domains at grade eight, the
highest-performing U.S. eighth-graders (90th percentile
or higher) scored 633 or higher on the biology domain,
607 or higher on the chemistry domain, 603 or higher on
the physics domain, and 634 or higher on the Earth science
domain (fgure 18). The lowest-performing U.S. students
(10th percentile or lower) scored 421 or lower on the biology
domain, 410 or lower on the chemistry and Earth science
domains, and 399 or lower on the physics domain in 2007.
44
SCIENCE
Figure 19. Trends in 10th and 90th percentile
science scores of U.S. fourth-
and eighth-grade students:
1995, 1999, 2003, and 2007
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
1999 2003 2007
Year
1995
Science score
628
636*
628 623
410
419
384* 386*
1999
1
2003 2007
Year
Grade four
Grade eight
1995
Science score
654*
636
643
427 426
419
90th percentile
10th percentile
90th percentile
10th percentile
*p < .05. Percentile cutpoint score is signifcantly different from 2007 percentile
cutpoint score.
1
Cutpoints are calculated based on distribution of U.S. student scores. The
(TMSS), 1995, 1999, 2003, and 2007.
A comparison of 1995 and 2007 shows a decline in the 90th
percentile cutpoint score for U.S. fourth graders in science, the
point marking the top 10 percent of students (fgure 19). n
2007, the 90th percentile score was 643, 11 score points lower
than the analogous score of 654 in 1995. A comparison of the
10th percentile science scores for U.S. fourth-graders in 1995
and 2007 and 2003 and 2007 shows no measurable difference.
At grade eight, the data suggest a different picture. The 90th
percentile cutpoint score in science showed no measurable
differences in comparisons of 2007 to 1995 or 2003, but
showed a decrease when the 2007 score was compared to
the 1999 score (636 v. 623). The score identifying the lowest-
performing U.S. eighth-graders in science was higher in 2007
than in 1995 (410 v. 384) and in 1999 (410 v. 386).
Average scores of male and female students
n 2007, U.S. fourth-grade males and females showed no
measurable difference in their average science performance
(fgure 20). Fourteen of the 35 other countries participating at
grade four showed a signifcant difference in average science
scores of males and females: 8 countries in favor of males
and 6 in favor of females. The largest differences were 64
score points in Kuwait (in favor of females) and 15 score
points in Colombia (in favor of males).
45
SCIENCE
Figure 20. Difference in average science scores of fourth- and eighth-grade students, by sex
and country: 2007
Male-female difference in average science scores favors males and is statistically signifcant (p < .05)
Male-female difference in average science scores is not measurably different (p < .05)
Male-female difference in average science scores favors females and is statistically signifcant (p < .05)
# Rounds to zero.
1
2
3
4
5
(see appendix A).
6
Kuwait tested the same cohort of students as other countries, but later in 2007, at the beginning of the next school year (see appendix A).
7
NOTE: The standard errors of the estimates are shown in tables E-29 and E-30 available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001.
Grade four
Difference in favor
of females
Difference in
favor of males
Difference in favor
of females
Difference in
favor of males
Grade eight
1
1
#
#
2
6
11
10
10
2
5
5
7
8
13
13
13
15
15
64
31
26
21
17
14
10
2
3
3
4
4
6
2
2
3
4
Difference in average science score
Difference in average science score
80 60 40 20 0 20 40 60 80
80 60 40 20 0 20 40 60 80
8
8
9
9
12
2
2
2
1
1
1
2
3
5
5
8
8
9
2
35
29
22
19
18
12
9
8
9
12
12
16
17
62
70
61
49
43
36
34
22
22
18
3
4
5
5
6
7
Qatar
Bahrain
Oman
Kuwait
6
Saudi Arabia
Jordan
Georgia
5
Botswana
Thailand
Egypt
Cyprus
Bulgaria
7
Israel
7
Malaysia
Romania
Armenia
Singapore
Turkey
Hong Kong SAR
2,4
Serbia
3,5
Sweden
Norway
Algeria
Lithuania
5
Indonesia
Ukraine
Slovenia
Malta
Japan
Chinese Taipei
Scotland
2
Russian Federation
Lebanon
Korea, Rep. of
Italy
England
2
Czech Republic
Hungary
United States
2,3
Australia
Tunisia
El Salvador
Ghana
Colombia
Kuwait
6
Tunisia
Qatar
Yemen
Armenia
Georgia
5
Algeria
Morocco
Latvia
5
New Zealand
Lithuania
5
Russian Federation
England
Ukraine
Sweden
Kazakhstan
5
Japan
Slovenia
Singapore
Scotland
2
Chinese Taipei
Norway
Hungary
Hong Kong SAR
4
Australia
United States
2,3
Denmark
2
Czech Republic
Slovak Republic
Netherlands
1
Italy
El Salvador
Austria
Germany
Colombia
46
SCIENCE
Figure 21. Average science scores of U.S. fourth- and eighth-grade students, by content domain
and sex: 2007
Average science score Average science score
Content domain
Total score
Content domain
Total score
541
536
541
538
532
536
536*
531
Life science
Physical science
Earth science
Grade four
Male
Female
Grade eight
Male
Female
Biology
Chemistry
Physics
Earth science
0 300 400 500 600 700 1,000 0 300 400 500 600 700 1,000
526*
514
533*
527
508
512
514*
534*
491
516
*p < .05. Difference between average science scores for males and females is statistically signifcant and favors males.
Although there was no measurable sex difference on the
total average science score, U.S. males outperformed U.S.
females in one content area: Earth science (536 v. 531; fgure
21). There was no measurable difference detected in the
average scores of U.S. fourth-grade males and females
in either the life science or physical science domains.
Unlike their fourth-grade counterparts, U.S. eighth-grade
males outperformed their female classmates in science in
2007 (fgure 20). Among the 47 other countries participating in
TMSS, 24 showed a difference in the average science scores
of males and females: 10 countries in favor of males and 14 in
favor of females. The largest differences were 70 score points
in Qatar (in favor of females) and 35 score points in Colombia
and Germany (in favor of males).
Like the overall science scale at grade eight, U.S. males
scored higher, on average, than their female classmates
in three of the four science content domains: biology (533
v. 527), physics (514 v. 491), and Earth science (534 v. 516;
fgure 21). There was no measurable difference detected in
the average science scores of U.S. eighth-grade males and
females in the chemistry domain.
47
SCIENCE
Figure 22. Trends in sex differences in average
science scores of U.S. fourth-
and eighth-grade students:
1995, 1999, 2003, and 2007
Score
gap
14 16 19
12
Score
gap
12* 5 5
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
1999 2003 2007
Year
1995
Average science score
520 524
536*
526
514
519
505 505
1999
1
2003 2007
Year
Grade four
Grade eight
1995
548
538 541
536 533 536
Males
Females
Males
Females
1
Detail may not sum to totals due to rounding. The standard errors of the
estimates are shown in table E-32 available at http://nces.ed.gov/pubsearch/
(TMSS), 1995, 1999, 2003, and 2007.
There was no measurable change in the average scores of
either U.S. males or females at grade four when 2007 scores
were compared to those from 1995 and 2003 (fgure 22).
However, the advantage for males decreased, from 12 scale
score points in 1995 to 5 scale score points in 2003 and 2007.
At grade eight, there was also no measurable change in the
average science scores of U.S. males and females or the gap
between them when 2007 scores were compared to 1995
(fgure 22). However, the average science score for males
was lower in 2007 than it was in 2003 (526 v. 536).
48
SCIENCE
Figure 23. Average science scores of U.S.
by race/ethnicity: 2007
0
300
400
500
600
700
1,000
Race/ethnicity
Grade four
average
TIMSS scale
average
567
488
502
573
550
539
500
551
455
480
543
522 520
500
0
300
400
500
600
700
1,000
Race/ethnicity
Grade eight
average
TIMSS scale
average
NOTE: Reporting standards were not met for American ndian/Alaska Native
and Native Hawaiian/Other Pacifc slander. Black includes African American.
Racial categories exclude Hispanic origin. Students who identifed themselves
as being of Hispanic origin were classifed as Hispanic, regardless of their race.
Although data for some race/ethnicities are not shown separately because the
reporting standards were not met, they are included in the U.S. totals shown
throughout the report. The United States met guidelines for sample participation
(see appendix A). See appendix A in this report for more information. The
(TMSS), 2007.
Average scores of students of different
races and ethnicities
n 2007, in comparison to the TMSS scale average, U.S.
White, Asian, and multiracial fourth-graders scored higher in
science, on average, while U.S. Black fourth-graders scored
lower (fgure 23). U.S. Hispanic fourth-graders' average score
showed no measurable difference from the TMSS scale
average. n comparison to the U.S. national average, U.S.
White and Asian fourth-graders scored higher in science, on
average, while U.S. Black and Hispanic fourth-graders scored
lower. U.S. multiracial fourth-graders' average score showed
no measurable difference from the U.S. national average.
At grade eight, U.S. White, Asian, and multiracial students
scored higher, on average, than the TMSS scale average in
science and U.S. Black and Hispanic eighth-graders scored
lower, on average (fgure 23). n comparison to the U.S.
national average, U.S. White and Asian eighth-graders scored
higher in science, on average, while U.S. Black and Hispanic
eighth-graders scored lower. U.S. multiracial eighth-graders'
average score showed no measurable difference from the
U.S. national average.
Examination of performance over time shows that U.S. Black
and Asian fourth-graders, and U.S. Black, Hispanic, and Asian
eighth-graders had an overall pattern of improvement in
science, on average (fgure 24). There was no measurable
change in the average science scores of White and Hispanic
fourth-graders, and White eighth-graders when 2007 scores
were compared to those from the earlier assessments.
Moreover, though signifcant differences remain in the
average scores of White students compared with most of their
classmates, the score gap between White students and their
counterparts decreased from 1995, at both grades. The
exception is the score gap in science between White and
Hispanic fourth-graders, which showed no measurable
change over the data collection years.
49
SCIENCE
Score
gap
Score
gap
47* 22* 6 38* 17 7 20
85
Score
gap
Score
gap
69 66 65 98* 70 71
109
Score
gap
122* 91 96 110* 78 79
Score
gap
0
300
400
500
600
700
1,000
1999
1
2003 2007
Year
Grade four
1995
572
565 567
488 486
462*
1999 2003 2007
Year
Grade eight
1995
438*
455
461
422*
462*
480 482
446*
544 547
527
536
543
551 552
506*
1999
1
2003 2007
Year
1995
572
565
567
502 498
503
1999
1
2003 2007
Year
1995
573
567
543*
525*
0
300
400
500
600
700
1,000
1999 2003 2007
Year
1995
1999 2003 2007
Year
1995
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000 White
Black
White
Black
White
Asian
544 547
551 552
544 547
551 552
572
565
0
300
400
500
600
700
1,000
White
Asian
White
Hispanic
White
Hispanic
Figure 24. Trends in differences in average science scores of U.S. fourth- and eighth-grade students,
by selected race/ethnicity: 1995, 1999, 2003, and 2007
1
NOTE: Only the four numerically largest racial categories are shown. Multiracial data were not collected in 1995 and 1999. Reporting standards were not met for American
ndian/Alaska Native and Native Hawaiian/Other Pacifc slander. Black includes African American. Racial categories exclude Hispanic origin. Students who identifed
themselves as being of Hispanic origin were classifed as Hispanic, regardless of their race. Although data for some race/ethnicities are not shown separately because the
reporting standards were not met, they are included in the U.S. totals shown throughout the report. n 2007, the United States met guidelines for sample participation rates
only after substitute schools were included. The National Defned Population covered 90 percent to 95 percent of National Target Population (see appendix A). The tests for
signifcance take into account the standard error for the reported difference. Thus, a small difference between averages for one student group may be signifcant while a large
difference for another student group may not be signifcant. See appendix A in this report for more information. The standard errors of the estimates are shown in table E-34
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics and Science Study (TMSS), 1995, 1999, 2003,
and 2007.
50
SCIENCE
Figure 25. Average science scores of U.S.
by percentage of students in public
school eligible for free or reduced-
price lunch: 2007
0
300
400
500
600
700
1,000
Grade four
Less
than 10
percent
10 to
24.9
percent
25 to
49.9
percent
50 to
74.9
percent
75
percent
or more
U.S.
average
TIMSS
scale
average
590
567
550
520
477
539
500
0
300
400
500
600
700
1,000
Grade eight
Less
than 10
percent
10 to
24.9
percent
25 to
49.9
percent
50 to
74.9
percent
75
percent
or more
U.S.
average
TIMSS
scale
average
572
559
528
495
466
520
500
NOTE: Analyses are limited to public schools only, based on school reports of the
percentage of students in public school eligible for the federal free or reduced-
price lunch program. The United States met guidelines for sample participation
(see appendix A). The standard errors of the estimates are shown in table E-35
Average scores of students attending public
schools of various poverty levels
The U.S. results are also arrayed by the concentration of
low-income enrollment in the public schools, as measured by
eligibility for free or reduced-price lunch, and shown in relation
to the TMSS scale average and the U.S. national average.
n comparison to the TMSS scale average, the average
science score of U.S. fourth graders in the highest poverty
public schools (at least 75 percent of students eligible for
free or reduced-price lunch) in 2007 was lower; the average
scores of fourth-graders in each of the other categories of
school poverty was higher than the TMSS scale average
(fgure 25). n comparison to the U.S. national average score,
fourth-graders in schools with 50 percent or more students
eligible for free or reduced-price lunch scored lower in
science, on average, while those in schools with lower
proportions of poor students scored higher, on average,
than the U.S. national average.
n comparison to the TMSS scale average, U.S. eighth-graders
attending public schools with fewer than 50 percent of students
eligible for the free or reduced-price lunch program scored
higher in science, on average (fgure 25). On the other hand,
U.S. eighth-graders in public schools with 75 percent or more
of students eligible scored lower in science, on average, than
the TMSS scale average. n comparison to the U.S. national
average, U.S. eighth-graders in public schools with fewer than
25 percent of students eligible scored higher in science, on
average, while students in public schools with at least 50
percent eligible scored lower, on average.
51
SCIENCE
Score
gap
29 40
13 23
Score
gap
2007
Year
Grade four
2003
Grade four
Grade four
Grade four
580
590
567 567
550
551
10-24.9 percent
Year
2003 2007
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
Score
gap
100 113
60 70
Score
gap
2007
Year
2003
520
519
477 480
50-74.9 percent
25-49.9 percent
75 percent or more
Year
2003 2007
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
580
590
580
590
580
590
by school poverty level: 1999, 2003, and 2007
Comparisons of the 2007 average science scores to
those for the earlier years within each school poverty level
revealed no measurable change in the average science
scores at either grade four or eight, with one exception
(fgure 26).
15
At grade eight, students in public schools
with the highest poverty levels (75 percent or more) had
a higher average science score in 2007 than in 1999
(466 v. 440).
n addition, the size of the difference in average scores,
or the score gap, between U.S. fourth- and eighth-graders
in public schools with the lowest poverty level (less than
10 percent) and their peers attending public schools with
higher poverty levels showed no measurable change
(fgure 26).
Effect size of the difference
in average scores
As noted in the mathematics section of this report, statistically
signifcant results do not necessarily indicate those fndings
that are important or large enough to consider as informing
policy or practice. Small differences may be statistically
signifcant, but may not have much practical import.
As discussed earlier, the highest scoring countries outpaced
the United States on a number of measures. The difference at
grade four between the U.S. average science score (539) and
the Singapore average score (587) was 48 score points (see
table 11). The gap between the United States and Singapore
is also apparent in the percentage of students scoring at the
advanced level: 15 percent of U.S. fourth-graders met the
advanced international benchmark compared with 36 percent
15
nformation on the percentage of students eligible for the federal free or reduced-price lunch program was not collected in 1995 for either grade. Thus,
comparisons over time on the poverty measure are limited to a 8-year period.
52
SCIENCE
Score
gap
12 16 13
55 42 44 128 110 105
85 67 76
Score
gap
Year
Grade eight Grade eight
Grade eight Grade eight
568
572
559
571
554 556
528
529
513
466 461
440*
495
504
484
10-24.9 percent
Year
2007 1999 2003
2007 1999 2003
2007 1999 2003
2007 1999 2003
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
Score
gap
Score
gap
Year
50-74.9 percent
25-49.9 percent
75 percent or more
Year
0
300
400
500
600
700
1,000
0
300
400
500
600
700
1,000
568
572 571
568
572 571
568
572 571
by school poverty level: 1999, 2003, and 2007Continued
NOTE: nformation on the percentage of students in school eligible for free or reduced-price lunch was not collected in 1995. No fourth-grade assessment was conducted
in 1999. Analyses are limited to public schools only, based on school reports of the percentage of students in school eligible for the federal free or reduced-price lunch
program. n 2007, the United States met guidelines for sample participation rates only after substitute schools were included. The National Defned Population covered 90
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics and Science Study (TMSS), 1999, 2003,
and 2007.
in Singapore (see fgure 17). Are differences within the United
States between groups of students (e.g., by race/ethnicity or
poverty concentration in schools) bigger or smaller than these
international differences? Effect sizes help make these
comparisons. Figure 27 shows the effect size of the difference
in science only for those groups with statistically signifcant
score differences. Appendix A includes a discussion of how
effect sizes were calculated.
As shown in fgure 27, and as observed in mathematics, the
effect sizes between groups vary considerably. For example,
in grade four science, the effect size of the difference between
U.S. White and Black students is 2.2 times and between U.S.
White and Hispanic students is 1.6 times the effect size
between the United States and Singapore, the country with
the highest estimated score. The largest observed effect size,
between U.S. fourth-graders in schools with the lowest and
highest poverty levels, is 3 times the effect size between the
United States and Singapore.
At grade eight, the effect size of the difference in science
scores between U.S. White and Black students is 2.6 times
and between U.S. White and Hispanic students is 2 times
the effect size between the United States and Singapore,
the country with the highest estimated score. The largest
observed effect size, between U.S. eighth-graders in schools
with the lowest and highest poverty levels, is 2.8 times the
effect size between the United States and Singapore.
53
SCIENCE
Figure 27. Effect size of difference in average science achievement
of fourth- and eighth-grade students, by country, sex,
race/ethnicity, and school poverty level: 2007
Groups compared
Grade four
United States
v. Singapore
U.S. White
students v.
U.S. Black
students
U.S. White
students v.
U.S. Hispanic
students
U.S. White
students v.
U.S. multiracial
students
U.S. public
schools with
lowest levels
of poverty v.
U.S. public
levels of poverty
Effect size
0.5
1.1
0.8
0.2
1.5
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0
Groups compared
Grade eight
United States
v. Singapore
U.S. males v.
U.S. females
U.S. White
students v.
U.S. Black
students
U.S. White
students v.
U.S. Hispanic
students
U.S. White
students v.
U.S. multiracial
students
U.S. public
schools with
lowest levels
of poverty v.
U.S. public
levels of poverty
Effect size
0.5
0.1
1.3
1.0
0.4
1.4
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0
NOTE: Effect size is shown only for statistically signifcant differences between group means. Effect size is calculated
by dividing the raw difference between group means by the pooled standard deviation (see appendix A). Black includes
African American. Racial categories exclude Hispanic origin. Students who identifed themselves as being of Hispanic
origin were classifed as Hispanic, regardless of their race. High-poverty schools are those in which 75 percent or more
of students are eligible for the federal free or reduced-price lunch program. Low-poverty schools are those in which less
than 10 percent of students are eligible. The United States met guidelines for sample participation rates only after
substitute schools were included. The National Defned Population covered 90 percent to 95 percent of the National
Target Population. See table E-37 (available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001) for standard
deviations of the U.S. and other countries' student populations. See table E-38 (available at http://nces.ed.gov/
pubsearch/pubsinfo.asp?pubid=2009001) for standard deviations of U.S. student subpopulations.
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational
Mathematics and Science Study (TMSS), 2007.
55
References
References
Beaton, A.E., and Gonzlez, E. (1995). The NAEP Primer.
Chestnut Hill, MA: Boston College.
Cohen, J. (1988). Statistical Power Analysis for the
Behavioral Sciences, 2nd ed. Hillsdale, NJ: Erlbaum.
Ferraro, D., and Van de Kerckhove, W. (2006). Trends in
International Mathematics and Science Study (TIMSS)
2003: Nonresponse Bias Analysis (NCES 2007-044).
National Center for Education Statistics, Institute of
Education Sciences, U.S. Department of Education.
Washington, DC.
Foy, P., Joncas, M., and Zuhlke, O. (2005).TIMSS 2007
School Sampling Manual. Unpublished Manuscript,
EA Data Processing Center. (2006). TIMSS 2007 Data Entry
Manager Manual. Hamburg, Germany: Author.
Martin, M.O., Mullis, .V.S., and Foy, P. (2008). TIMSS 2007
International Science Report: Findings from IEAs Trends
in International Mathematics and Science Study at the
Fourth and Eighth Grades. Chestnut Hill, MA: Boston
College.
Matheson, N., Salganik, L., Phelps, R., Perie, M., Alsalam, N.,
and Smith, T. (1996). Education Indicators: An
International Perspective (NCES 96-003). U.S.
Department of Education. Washington, DC: National
Center for Education Statistics.
Mullis, .V.S., Martin, M.O., and Foy, P. (2005). IEAs TIMSS
2003 International Report on Achievement in the
Mathematics Cognitive Domains: Findings From a
Developmental Project. Chestnut Hill, MA: Boston College.
Mullis, .V.S., Martin, M.O., Ruddock, G.J., O'Sullivan, C.Y.,
Arora, A., and Erberber, E. (2005). TIMSS 2007
Assessment Frameworks. Chestnut Hill, MA: Boston
College.
Mullis, .V.S., Martin, M.O., and Foy, P. (2008). TIMSS 2007
International Mathematics Report: Findings from IEAs
Trends in International Mathematics and Science Study
at the Fourth and Eighth Grades. Chestnut Hill, MA:
Boston College.
National Center for Education Statistics. (2002). NCES
Statistical Standards (NCES 2003-601). nstitute of
Education Sciences, U.S. Department of Education.
Washington, DC: Author.
Olson, J.F., Martin, M.O., and Mullis, .V.S. (2008). TIMSS
2007 Technical Report. Chestnut Hill, MA: Boston College.
Rosnow, R.L., and Rosenthal, R. (1996). Computing
Contrasts, Effect Sizes, and Counternulls on Other
People's Published Data: General Procedures for
Research Consumers. Psychological Methods, 1:331-340.
United Nations Educational, Scientifc and Cultural
Organization (UNESCO). (1999). Classifying Educational
Programmes Manual for ISCED-97 Implementation in
OECD Countries (1999 Edition). Paris: Author. Retrieved
April 9, 2008 from
http://www.oecd.org/dataoecd/7/2/1962350.pdf.
Westat. (2007). WesVar 5.0 Users Guide. Rockville, MD:
Author.
A-1
APPENDIX A
Introduction
The Trends in International Mathematics and Science Study
(TMSS) is a cross-national comparative study of the
performance and schooling contexts of fourth- and eighth-
grade students in mathematics and science. In this fourth
cycle of TIMSS, mathematics and science assessments
and associated questionnaires were administered in 43
jurisdictions at the fourth-grade level and 56 jurisdictions at
the eighth-grade level during 2007. TMSS is coordinated by
the nternational Association for the Evaluation of Educational
Achievement (EA), with national sponsors in each
participating jurisdiction. In the United States, TIMSS is
sponsored by the National Center for Education Statistics
(NCES), in the Institute of Education Sciences at the U.S.
Department of Education. This appendix provides an
overview of the technical aspects of TMSS 2007, including
the sampling, data collection, test development and
administration, weighting and variance estimation, scaling,
and statistical testing procedures used to collect and analyze
the data. More detailed information can be found in the
TIMSS 2007 Technical Report (Olson, Martin, and Mullis 2008).
International requirements
for sampling, data collection,
and response rates
In order to ensure comparability of the data across countries,
the EA provided detailed international guidelines on the
various aspects of data collection described here, and
implemented quality control procedures. Participating
countries were obliged to follow these guidelines.
Target populations
In order to identify comparable populations of students to
be sampled, the EA defned the target populations as follows
(Olson, Martin, and Mullis 2008):
Fourth-grade student population. The international
desired target population is all students enrolled in the grade
that represents 4 years of schooling, counting from the frst
year of the nternational Standard Classifcation of Education
(SCED)
1
Level 1, providing that the mean age at the time of
testing is at least 9.5 years. For most countries, the target
grade should be the fourth grade, or its national equivalent.
All students enrolled in the target grade, regardless of their
age, belong to the international desired target population.
Eighth-grade student population. The international
desired target population is all students enrolled in the grade
that represents 8 years of schooling, counting from the frst
year of SCED Level 1, providing that the mean age at the
time of testing is at least 13.5 years. For most countries,
the target grade should be the eighth grade, or its national
equivalent. All students enrolled in the target grade,
regardless of their age, belong to the international desired
target population.
Teacher population. The mathematics and science
teachers linked to the selected students. Note that these
teachers are not a representative sample of teachers within
the country. Rather, they are the mathematics and science
teachers who teach a representative sample of students in
two grades within the country (grades four and eight in the
United States).
School population. All eligible schools
2
containing either
of the following: one or more fourth-grade classrooms; or one
or more eighth-grade classrooms.
Sampling
The sample design employed by the TMSS 2007 assessment
is generally referred to as a three-stage stratifed cluster
sample. The sampling units at each stage were defned
as follows.
First-stage sampling units. The frst-stage sampling
units consisted of individual schools selected with probability
proportionate to size (PPS), size being the estimated number
of students enrolled in the target grade. Prior to sampling,
schools in the sampling frame could be assigned to a
predetermined number of explicit or implicit strata. Schools
were to be sampled using a PPS systematic sampling
method. Substitution schoolsschools selected to replace
those that were originally sampled but refused to participate
were to be identifed simultaneously.
Second-stage sampling units. The second-stage
sampling units were classrooms within sampled schools.
Countries were required to randomly select a minimum of one
eligible classroom per target grade per school from a list of
eligible classrooms prepared for each target grade. However,
countries also had the option of selecting more than one
eligible classroom per target grade per school and were
encouraged to do so.
Appendix A: Technical Notes
1
The SCED was developed by the United Nations Educational, Scientifc, and Cultural Organization (UNESCO) to facilitate the comparability of educational levels
across countries. SCED Level 1 begins with the frst year of formal, academic learning (UNESCO 1999). n the United States, SCED Level 1 begins at grade one.
2
Some sampled schools may be considered ineligible for reasons noted in the section below titled "School exclusions.
A-2
APPENDIX A

Third-stage sampling units. The third-stage sampling
units were students within sampled classrooms. Generally,
all students in a sampled classroom were to be selected for
the assessment though it was possible to sample a subgroup
of students within a classroom, but only after consultation
with Statistics Canada, the organization serving as the
sampling referee.
Sample size for the main survey
TMSS guidelines call for a minimum of 150 schools to
be sampled per grade, with a minimum of 4,000 students
assessed per grade. The basic sample design of one
classroom per target grade per school was designed to yield
a otal sample of approximately 4,500 students per population.
Countries with small class sizes or less than 30 students per
school, were directed to consider sampling more schools,
more classrooms per school, or both, to meet the minimum
target of 4,000 tested students.
n 2007, countries that had participated in TMSS 2003 were
required to increase the size of their student samples to provide
data for a bridge study. This study was designed to evaluate
the effect of a small change in the assessment design between
2003 and 2007. Countries that participated in TMSS 2003
were asked to include four additional booklets from 2003 in
with the 14 booklets for TMSS 2007 at each grade. As a result,
student sample sizes needed to be increased to ensure that
the number of students taking each booklet was suffcient for
the purposes of scaling. The 2003-07 Bridge Study is described
below in the section on "Scaling.
Exclusions
The following discussion draws on the TIMSS 2007 School
Sampling Manual (Foy, Joncas, and Zuhlke 2005). All schools
and students excluded from the national defned target
population are referred to as the excluded population.
Exclusions could occur at the school level, with entire schools
being excluded, or within schools, with specifc students or
entire classrooms excluded. TMSS 2007 did not provide
accommodations for students with disabilities or students who
were unable to read or speak the language of the test. The
EA requirement with regard to exclusions is that they should
not exceed more than 5 percent of the national desired target
population (Foy, Joncas, and Zuhlke 2005).
School exclusions. Countries could exclude schools that
are geographically inaccessible;
are of extremely small size;
offer a curriculum, or school structure, radically different
from the mainstream educational system; or
provide instruction only to students in the excluded
categories defned under "within-school exclusions,
such as schools for the blind.
Within-school exclusions. Countries were asked to adapt
the following international within-school exclusion rules to
defne excluded students:
Students with intellectual disabilitiesStudents who,
in the professional opinion of the school principal or
other qualifed staff members, are considered to have
intellectual disabilities or who have been tested
psychologically as such. This includes students who are
emotionally or mentally unable to follow even the general
instructions of the test. Students were not to be excluded
solely because of poor academic performance or normal
disciplinary problems.
Students with functional disabilitiesStudents who are
permanently physically disabled in such a way that they
cannot perform in the TIMSS testing situation. Students
with functional disabilities who are able to respond were
to be included in the testing.
Non-native-language speakersStudents who are
unable to read or speak the language(s) of the test and
would be unable to overcome the language barrier of
the test. Typically, a student who had received less than
1 year of instruction in the language(s) of the test was
to be excluded.
Dened participation rates
n order to minimize the potential for response biases, the EA
developed participation or response rate standards that apply
to all countries and govern whether or not a nation's data are
included in the TMSS 2007 international dataset and the way
in which national statistics are presented in the international
reports. These standards were set using composites of
response rates at the school, classroom, and student and
teacher levels and response rates were calculated with and
without the inclusion of substitute schools that were selected
to replace schools refusing to participate.
The response rate standards determine how a jurisdiction's
data will be reported in the international reports. These
standards take the following two forms, distinguished primarily
by whether or not meeting the school response rate of 85
percent requires the counting of substitute schools.
Category 1: Met requirements. Countries that meet all
of the following conditions are considered to have fulflled the
EA requirements: (a) a minimum school participation rate of
85 percent, based on original sampled schools only; and (b)
a minimum classroom participation rate of 95 percent, from
both original and substitute schools; and (c) a minimum
student participation rate of 85 percent, from both original
and substitute schools.
A-3
APPENDIX A
Category 2: Met requirements after substitutes. In the
case of countries not meeting the category 1 requirements,
provided that at least 50 percent of schools in the original
sample participate, a country's data are considered
acceptable if the following requirements are met: a minimum
combined school, classroom and student participation rate
of 75 percent, based on the product of the participation rates
described above. That is, the product of (a), (b) and (c), as
defned in the Category 1 standard, must be greater than
or equal to 75 percent.
Countries satisfying the Category 1 standard are included in the
international tabular presentations without annotation. Those
only able to satisfy the Category 2 standard are included as
well but are annotated to indicate their response rate status.
The data from countries failing to meet either standard are
presented separately in the international tabular presentations.
Sampling, data collection, and
response rates in the United States
and other countries
The U.S. TIMSS sample design
In the United States and most other countries, the target
populations of students corresponded to the fourth and eighth
grades. n sampling these populations TMSS used a three-
stage stratifed cluster sampling design.
3
While the U.S.
sampling frame was not explicitly stratifed it was implicitly
stratifed (that is, sorted for sampling) by four categorical
stratifcation variables: type of school (public or private),
region of the country (Northeast, Central, West, Southeast);
4

community type (eight levels);
5
and minority status (above
or below 15 percent of the student population).
The frst stage made use of a systematic PPS technique to
select schools for the original sample. Using a sampling frame
based on the 2006 National Assessment of Educational
Progress (NAEP) school sampling frame,
6
schools were
selected with a probability proportionate to the school's
estimated enrollment of fourth- or eighth-grade students.
Data for public schools were taken from the Common Core
of Data (CCD), and data for private schools were taken from
the Private School Universe Survey (PSS). n addition, for
each original school selected, the two neighboring schools
in the sampling frame were designated as substitute schools.
The frst school following the original sample school was the
frst substitute and the frst school preceding it was the second
substitute. f an original school refused to participate, the frst
substitute was contacted. f that school also refused to
participate, the second substitute was contacted. There
were several constraints on the assignment of substitutes.
One sampled school was not allowed to substitute for another,
and a given school could not be assigned to substitute for
more than one sampled school. Furthermore, substitutes
were required to be in the same implicit stratum as the
sampled school.
The second stage consisted of selecting intact mathematics
classes within each participating school. Schools provided
lists of fourth- or eighth-grade classrooms. Within schools,
classrooms with fewer than 15 students were collapsed into
pseudo-classrooms, so that each classroom on the school's
classroom sampling frame had at least 20 students.
7
An equal
probability sample of two classrooms (pseudo-classrooms)
was identifed from the classroom frame for the school. n
schools where there was only one classroom, this classroom
was selected with certainty. At the fourth-grade level, 30
pseudo-classrooms were created prior to classroom sampling
with 20 of these being selected in the fnal fourth-grade
classroom sample. At the eighth-grade level, 253 pseudo-
classrooms were created, of which 58 were included in the
fnal classroom sample.
All students in sampled classrooms (pseudo-classrooms)
were selected for assessment. n this way, the overall sample
design for the United States was intended to approximate
a self-weighting sample of students as much as possible,
with each fourth- or eighth-grade student having an equal
probability of selection.
3
The primary purpose of stratifcation is to improve the precision of the survey estimates. f explicit stratifcation of the population is used, the units of interest
(schools, for example) are sorted into mutually exclusive subgroupsstrata. Units in the same stratum are as homogeneous as possible, and units in different
strata are as heterogeneous as possible, with respect to the characteristics of interest to the survey. Separate samples are then selected from each stratum.
n the case of implicit stratifcation, the units of interest are simply sorted with respect to one or more variables known to have a high correlation with the variable
of interest. n this way, implicit stratifcation guarantees that the sample of units selected will be spread across the categories of the stratifcation variables.
4
The Northeast region consists of Connecticut, Delaware, the District of Columbia, Maine, Maryland, Massachusetts, New Hampshire, New Jersey, New York,
Pennsylvania, Rhode sland, and Vermont. The Central region consists of llinois, ndiana, owa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota,
Ohio, Wisconsin, and South Dakota. The West region consists of Alaska, Arizona, California, Colorado, Hawaii, daho, Montana, Nevada, New Mexico, Oklahoma,
Oregon, Texas, Utah, Washington, and Wyoming. The Southeast region consists of Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, North
Carolina, South Carolina, Tennessee, Virginia, and West Virginia.
5
Eight community types are distinguished: large city of 250,000+; midsize city of < 250,000; urban fringe of large city; urban fringe of mid-size city; large town
of 25,000+; small town of 2,500-25,000; rural outside metropolitan statistical area (MSA); rural inside MSA.
6
n order to maximize response rates from both districts and schools it was necessary to begin the recruitment of both prior to the end of the 2005-06 school year.
Since the 2007 NAEP sampling frame was not available until March 2006, it was necessary to base the TMSS samples on the 2006 NAEP sampling frame.
7
Since classrooms are sampled with equal probability within schools, small classrooms would have the same probability of selection as large classrooms.
Selecting classrooms under these conditions would likely mean that student sample size would be reduced, and some instability in the sampling weights created.
To avoid these problems, pseudo-classes are created for the purposes of classroom sampling. Following sampling, the pseudo-class combinations are dissolved
and the small classes involved retain their own identity. n this way, data on students, teachers, and classroom practices are linked in small classes in the same
way as with larger classes.
A-4
APPENDIX A

U.S.TIMSS fourth-grade sample
School sample. The fourth-grade school sample consisted
of 300 schools. Ten ineligible schools were identifed on the
basis that they served special student populations, or had
closed or altered their grade makeup since the sampling
frame was developed. This left 290 schools eligible to
participate, and 202 agreed to do so. The school response
rate before substitution then was 70 percent unweighted.
The analogous weighted school response rate was also 70
percent (see table A-1) and is given by the following formula:

weighted school response
rate before replacement
where Y denotes the set of responding original-sample
schools; N denotes the set of eligible non-responding original
sample schools; W
i
denotes the base weight for school i; W
i
=
1/P
i
, where P
i
denotes the school selection probability for
school i; and E
i
denotes the enrollment size of age-eligible
students, as indicated on the sampling frame.
n addition to the 202 participating schools from the original
sample, 55 substitute schools participated for a total of 257
participating schools at the fourth grade in the United States
(see table A-2). This gives a weighted (and unweighted)
school participation rate after substitution of 89 percent
(see table A-1).
8

Classroom sample. Schools agreeing to participate were
asked to list their fourth-grade mathematics classes as the
basis for sampling at the classroom level, resulting in the
identifcation of a total of 1,108 mathematics classrooms.
At this time, schools were given the opportunity to identify
special classesclasses in which all or most of the students
had intellectual or functional disabilities or were non-native-
language speakers. While these classes were regarded as
eligible, the students as a group were treated as "excluded
since, in the opinion of the school, their disabilities or
language capabilities would render meaningless their
performance on the assessment. Some 876 fourth-grade
students in a total of 99 classrooms in 63 schools were
excluded in this way. Schools identifed 32 classrooms
containing 222 students with intellectual disabilities (25
percent), 41 classrooms containing 221 students with
functional disabilities (25 percent) and 26 classrooms
containing 433 non-native-language speakers (50 percent).
The remaining 1,009 classrooms served as the pool from
which the classroom sample was drawn.
Classrooms with fewer than 15 students were collapsed into
pseudo-classrooms prior to sampling so that each eligible
classroom in a school had at least 20 students. Two
classrooms (pseudo-classrooms) were selected per school
where possible. n schools with only one classroom, this
classroom was selected with certainty. Some 521 classrooms
were selected as a result of this process. All selected
classrooms participated in TIMSS yielding a classroom
response rate of 100 percent (Olson, Martin, and Mullis 2008,
exhibit A.6).
Student sample. Schools were asked to list the students
in each of these 521 classrooms, along with the teachers who
taught mathematics and science to these students. A total of
11,454 students were listed as a result. Subsequently, 2,454
of these students were allocated to the bridge study since
they completed a TMSS 2003 assessment booklet rather
than the TMSS 2007 assessment (see the description of
the 2003-07 bridge study in the section on Scaling below).
Eliminating these students from further consideration leaves
9,000 fourth-grade students as the pool of students selected
to take part in TMSS 2007 proper. These students are
identifed by EA as "sampled students in participating
schools (Olson, Martin, and Mullis 2008, exhibit A.5).
This pool of students is reduced by within-school exclusions
and withdrawals. At the time schools listed the students in the
sampled classrooms, they had the opportunity to identify
particular students who were not suited to take the test
because of physical or intellectual disabilities (i.e., students
with disabilities who had been mainstreamed) or because they
were non-English-language speakers. Schools identifed a
total of 543 students they wished to have excluded from the
assessment; 323 students with intellectual disabilities (59
percent), 92 students with functional disabilities (17 percent),
and 128 students who were non-English-language speakers
(24 percent). And, by the time of the assessment a further
140 of the listed students had withdrawn from the school or
classroom. In total then, the pool of 9,000 sampled students
was reduced by 683 students (543 excluded and 140
withdrawn) to yield 8,317 "eligible students. The number of
eligible students is used as the base for calculating student
response rates (Olson, Martin, and Mullis 2008, exhibit A.5).
The number of eligible students was further reduced on
assessment day by 421 student absences, leaving 7,896
"assessed students identifed as having completed a TMSS
2007 assessment booklet (see Table A-2). EA defnes the
student response rate as the number of students assessed as
a percentage of the number of eligible students which, in this
case yields a weighted (and unweighted) student response
rate of 95 percent (see table A-1).
8
Substitute schools are matched pairs and do not have an independent probability of selection. NCES standards (Standard 1-3-8) indicate that, in these
circumstances, response rates should be calculated without including substitute schools (National Center for Education Statistics 2002). TMSS response rates
denoted as "before replacement conform to this standard. TMSS response rates denoted as "after replacement are not consistent with NCES standards since,
in the calculation of these rates, substitute schools are treated as the equivalent of sampled schools.
A-5
APPENDIX A
Table A-1. Coverage of target populations and participation rates, by grade and country: 2007
Grade four
Country
Years
of formal
schooling
Percentage of
international
desired
population
coverage
National
desired
population
overall
exclusion rate
Weighted
school
participation
rate before
substitution
Weighted
school
participation
rate after
substitution
Weighted
student
participation
rate
Combined
weighted
school and
student
participation
rate
1
Algeria
4 100 2.1 99 99 97 97
Armenia 4 100 3.4 93 100 96 96
Australia 4 100 4.0 99 100 95 95
Austria 4 100 5.0 98 99 98 97
Chinese Taipei 4 100 2.8 100 100 100 100
Colombia 4 100 2.1 93 99 98 97
Czech Republic 4 100 4.9 89 98 94 92
Denmark 4 100 4.1 71 91 94 85
El Salvador 4 100 2.3 99 100 98 98
England 5 100 2.1 83 90 93 84
Georgia 4 85 4.8 92 100 98 98
Germany 4 100 1.3 96 100 97 96
Hong Kong SAR 4 100 5.4 81 84 96 81
Hungary 4 100 4.4 93 99 97 96
Iran, Islamic Rep. of 4 100 3.0 100 100 99 99
Italy 4 100 5.3 91 100 97 97
Japan 4 100 1.1 97 99 97 95
Kazakhstan 4 94 5.3 99 100 100 100
Kuwait 4 100 0.0 100 100 85 85
Latvia 4 72 4.6 93 97 95 92
Lithuania 5 93 5.4 99 100 94 94
Morocco 4 100 1.4 81 81 96 77
Netherlands 4 100 4.8 48 95 97 91
New Zealand 4.5-5.5 100 5.4 97 100 96 96
Norway 4 100 5.1 88 97 95 92
Qatar 4 100 1.8 100 100 97 97
Russian Federation 4 100 3.6 100 100 98 98
Scotland 5 100 4.5 77 94 94 88
Singapore 4 100 1.5 100 100 96 96
Slovak Republic 4 100 3.3 98 100 97 97
Slovenia 4 100 2.1 92 99 95 93
Sweden 4 100 3.1 98 100 97 97
Tunisia 4 100 2.9 100 100 99 99
Ukraine 4 100 0.6 96 96 97 93
United States 4 100 9.2 70 89 95 84
Yemen 4 100 2.0 99 100 98 98
(See notes at end of table)
A-6
APPENDIX A

Table A-1. Coverage of target populations and participation rates, by grade and country: 2007
Continued
Grade eight
Country
Years
of formal
schooling
Percentage of
international
desired
population
coverage
National
desired
population
overall
exclusion rate
Weighted
school
participation
rate before
substitution
Weighted
school
participation
rate after
substitution
Weighted
student
participation
rate
Combined
weighted
school and
student
participation
rate
1
Algeria
8 100 0.1 99 99 96 95
Armenia 8 100 3.3 94 100 96 96
Australia 8 100 1.9 100 100 93 93
Bahrain 8 100 1.5 100 100 97 97
Bosnia and Herzegovina 8 or 9 100 1.5 100 100 98 98
Botswana 8 100 0.1 100 100 99 99
Bulgaria 8 100 20.3 94 98 96 94
Chinese Taipei 8 100 3.3 100 100 99 99
Colombia 8 100 1.6 96 100 98 98
Cyprus 8 100 2.5 100 100 96 96
Czech Republic 8 100 4.6 92 100 95 95
Egypt 8 100 0.5 99 100 98 98
El Salvador 8 100 2.8 99 100 98 98
England 9 100 2.3 78 86 88 75
Georgia 8 85 3.9 97 100 97 97
Ghana 8 100 0.9 100 100 98 98
Hong Kong SAR 8 100 3.8 73 79 96 75
Hungary 8 100 3.9 92 99 97 96
Indonesia 8 100 3.4 100 100 97 97
Iran, Islamic Rep. of 8 100 0.5 100 100 98 98
Israel 8 100 22.8 94 97 94 91
Italy 8 100 5.0 93 100 96 96
Japan 8 100 3.5 96 97 93 91
Jordan 8 100 2.0 100 100 96 96
Korea, Rep. of 8 100 1.6 100 100 99 99
Kuwait 8 100 0.3 97 97 87 84
Lebanon 8 100 1.4 81 92 93 85
Lithuania 8 92 4.2 98 99 91 90
Malaysia 8 100 3.3 100 100 98 98
Malta 9 100 2.9 100 100 95 94
Norway 8 100 2.6 88 93 93 86
Oman 8 100 1.2 100 100 99 99
Palestinian Nat'l Auth. 8 100 1.0 100 100 98 98
Qatar 9 100 0.8 100 100 97 97
Romania 8 100 1.8 99 99 97 97
Russian Federation 7 or 8 100 2.3 100 100 97 97
Saudi Arabia 8 100 0.5 99 99 95 94
Scotland 9 100 1.7 74 86 90 77
Serbia 8 80 6.8 100 100 98 98
Singapore 8 100 1.8 100 100 95 95
Slovenia 7 or 8 100 1.9 92 99 93 92
Sweden 8 100 3.6 100 100 94 94
Syrian Arab Republic 8 100 0.6 100 100 96 96
Thailand 8 100 3.4 90 100 99 99
Tunisia 8 100 0.0 100 100 98 98
Turkey 8 100 2.6 100 100 98 98
Ukraine 8 100 0.2 98 98 97 95
United States 8 100 7.9 68 83 93 77
1
The combined weighted school and student participation rate is derived by multiplying the unrounded weighted school and student participation rates.
NOTE: Only countries that completed the necessary steps for their data to appear in the reports from the nternational Study Center are listed. n addition to the
countries listed above, seven separate jurisdictions participated in TMSS 2007: the provinces of British Columbia, Ontario, and Quebec in Canada; the Basque
region of Spain; Dubai, UAE; and the states of Massachusetts and Minnesota. nformation on these seven jurisdictions can be found in the international TMSS
2007 reports (Mullis, Martin, and Foy 2008; Martin, Mullis, and Foy 2008). Countries could participate at either grade level. Countries were required to sample
students enrolled in the grade that represents 4 years of schooling, counting from the frst year of the nternational Standard Classifcation of Education (SCED)
Level 1, providing that the mean age at the time of testing is at least 9.5 years, or students enrolled in the grade that represents eight years of schooling,
counting from the frst year of SCED Level 1. n the United States and most countries, this corresponds to grade four and grade eight, respectively. n Bulgaria,
the science assessment was administered to a diminished number of schools and students. The weighted school participation rate before substitution shown
above refers to the mathematics assessment. This number should be reduced to 93 percent in describing the science assessment.
A-7
APPENDIX A
Table A-2. Total number of schools and students, by grade and country: 2007
Grade four
Country
Schools
in original
sample
Eligible
schools
in original
sample
Schools
in original
sample that
participated
Substitute
schools
Total
schools that
participated
Sampled
students in
participating
schools
Students
assessed
Algeria
150 150 149 0 149 4,366 4,223
Armenia 150 148 143 5 148 4,253 4,079
Australia 230 229 226 3 229 4,511 4,108
Austria 199 197 194 2 196 5,158 4,859
Chinese Taipei 150 150 150 0 150 4,260 4,131
Colombia 150 143 132 10 142 5,320 4,801
Czech Republic 150 147 132 12 144 4,583 4,235
Denmark 150 150 105 32 137 3,907 3,519
El Salvador 150 148 146 2 148 4,467 4,166
England 160 159 131 12 143 4,784 4,316
Georgia 152 144 131 13 144 4,384 4,108
Germany 250 247 239 7 246 5,464 5,200
Hong Kong SAR 150 150 122 4 126 3,965 3,791
Hungary 150 145 135 9 144 4,221 4,048
Iran, Islamic Rep. of 240 224 224 0 224 3,939 3,833
Italy 170 170 155 15 170 4,912 4,470
Japan 150 150 145 3 148 4,677 4,487
Kazakhstan 150 141 140 1 141 4,063 3,990
Kuwait 150 150 149 0 149 4,468 3,803
Latvia 150 150 140 6 146 4,188 3,908
Lithuania 163 156 154 2 156 4,345 3,980
Morocco 226 224 184 0 184 4,282 3,894
Netherlands 150 148 72 69 141 3,608 3,349
New Zealand 220 220 213 7 220 5,347 4,940
Norway 150 150 131 14 145 4,462 4,108
Qatar 114 114 114 0 114 7,411 7,019
Russian Federation 206 206 206 0 206 4,659 4,464
Scotland 150 148 114 25 139 4,320 3,929
Singapore 177 177 177 0 177 5,235 5,041
Slovak Republic 184 184 181 3 184 5,269 4,963
Slovenia 150 150 138 10 148 4,664 4,351
Sweden 160 155 151 4 155 4,965 4,676
Tunisia 150 150 150 0 150 4,242 4,134
Ukraine 150 150 144 0 144 4,459 4,292
United States 300 290 202 55 257 9,000 7,896
Yemen 150 144 143 1 144 6,128 5,811
Note that the 876 students excluded because whole classes
were excluded do not fgure in the calculation of student
response rates. They do, however, fgure in the calculation
of the coverage of the nternational Target Population.
Together, these 876 students excluded prior to classroom
sampling, plus the 543 within-class exclusions resulted in an
overall student exclusion rate of 9.2 percent (see table A-1
and Olson, Martin, and Mullis 2008, exhibit A.3). The reported
coverage of the nternational Target Population then is 90.8
percent (see Olson, Martin, and Mullis 2008, Exhibit A.3).
EA standards defne this degree of coverage as acceptable
though falling outside the desired range of 95 percent or better.
Combined participation rates. The combined school,
classroom, and student weighted response rate standard of
75 percent used by TMSS in situations in which it is necessary
to recruit substitute schools was met in this instance. Both the
weighted and unweighted product of the separate response
rates (84 percent) exceeded this 75 percent standard (see
table A-1). The application of international guidelines means,
however, that U.S. statistics describing fourth-grade students
are annotated in international reports to indicate that coverage
of the defned student population was less than the EA
standard of 95 percent and that participation rates were met
only after substitute schools were included.
Tables A-1 and A-2 are extracts from the international report
Exhibits noted above and are designed to summarize
information on school and student responses rates and
coverage of the fourth- and eighth-grade target populations
in each nation.
A-8
APPENDIX A

Table A-2. Total number of schools and students, by grade and country: 2007Continued
Grade eight
Country
Schools
in original
sample
Eligible
schools
in original
sample
Schools
in original
sample that
participated
Substitute
schools
Total
schools that
participated
Sampled
students in
participating
schools
Students
assessed
Algeria 150 150 149 0 149 5,793 5,447
Armenia 150 148 143 5 148 4,898 4,689
Australia 230 228 228 0 228 4,549 4,069
Bahrain 74 74 74 0 74 4,434 4,230
Bosnia and Herzegovina 150 150 150 0 150 4,373 4,220
Botswana 150 150 150 0 150 4,310 4,208
Bulgaria 170 166 158 5 163 4,312 4,019
Chinese Taipei 150 150 150 0 150 4,164 4,046
Colombia 150 148 142 6 148 5,343 4,873
Cyprus 67 67 67 0 67 4,755 4,399
Czech Republic 150 147 135 12 147 5,182 4,845
Egypt 237 233 231 2 233 6,906 6,582
El Salvador 150 145 143 2 145 4,329 4,063
England 160 160 126 11 137 4,768 4,025
Georgia 152 135 131 4 135 4,533 4,178
Ghana 163 163 163 0 163 5,678 5,294
Hong Kong SAR 152 152 112 8 120 3,657 3,470
Hungary 150 145 133 11 144 4,321 4,111
Indonesia 150 149 149 0 149 4,419 4,203
Iran, Islamic Rep. of 220 208 208 0 208 4,140 3,981
Israel 150 150 140 6 146 3,708 3,294
Italy 170 170 159 11 170 4,873 4,408
Japan 150 150 144 2 146 4,656 4,312
Jordan 200 200 200 0 200 5,733 5,251
Korea, Rep. of 150 150 150 0 150 4,358 4,240
Kuwait 163 163 158 0 158 4,721 4,091
Lebanon 150 148 120 16 136 4,062 3,786
Lithuania 150 144 141 1 142 4,537 3,991
Malaysia 150 150 150 0 150 4,589 4,466
Malta 60 59 59 0 59 5,053 4,670
Norway 150 150 133 6 139 5,085 4,627
Oman 150 146 146 0 146 4,894 4,752
Palestinian Nat'l Auth. 155 148 147 1 148 4,572 4,378
Qatar 67 67 66 0 66 7,558 7,184
Romania 150 150 149 0 149 4,447 4,198
Russian Federation 210 210 210 0 210 4,706 4,472
Saudi Arabia 167 166 165 0 165 4,515 4,243
Scotland 150 150 109 20 129 4,700 4,070
Serbia 150 147 147 0 147 4,246 4,045
Singapore 164 164 164 0 164 4,828 4,599
Slovenia 150 150 138 10 148 4,414 4,043
Sweden 160 159 158 1 159 5,712 5,215
Syrian Arab Republic 150 150 150 0 150 5,025 4,650
Thailand 150 150 134 16 150 5,579 5,412
Tunisia 150 150 150 0 150 4,258 4,080
Turkey 150 146 146 0 146 4,682 4,498
Ukraine 150 150 146 0 146 4,598 4,424
United States 300 287 197 42 239 8,447 7,377
NOTE: Only countries that completed the necessary steps for their data to appear in the reports from the nternational Study Center are listed. n addition to the
countries listed above, seven separate jurisdictions participated in TMSS 2007: the provinces of British Columbia, Ontario, and Quebec in Canada; the Basque
region of Spain; Dubai, UAE; and the states of Massachusetts and Minnesota. nformation on these seven jurisdictions can be found in the international TMSS
2007 reports (Mullis, Martin, and Foy 2008; Martin, Mullis, and Foy 2008). Countries could participate at either grade level. Countries were required to sample
students enrolled in the grade that represents 4 years of schooling, counting from the frst year of the nternational Standard Classifcation of Education (SCED)
Level 1, providing that the mean age at the time of testing is at least 9.5 years, or students enrolled in the grade that represents eight years of schooling,
counting from the frst year of SCED Level 1. n the United States and most countries, this corresponds to grade four and grade eight, respectively. n Bulgaria,
the science assessment was administered to a diminished number of schools and students. The numbers shown in the table refer to the mathematics
assessment. These should be reduced accordingly to describe the science assessment, as follows:eligible schools=142; participating schools in original
sample=134; total participating schools=134; sampled students in participating schools=3,426; students assessed=3,079.
A-9
APPENDIX A
U.S.TIMSS eighth-grade sample
School sample. The eighth-grade school sample consisted
of 300 schools. Thirteen ineligible schools were identifed
on the basis that they served special student populations, or
had closed or altered their grade makeup since the sampling
frame was developed. This left 287 schools eligible to
participate and 197 agreed to do so. The unweighted school
response rate before substitution then was 69 percent. The
analogous weighted school response rate was 68 percent
(see table A-1).
n addition to the 197 participating schools from the original
sample, 42 substitute schools participated for a total of 239
participating schools at the eighth grade in the United States
(see table A-2). This gives a weighted (and unweighted)
school participation rate after substitution of 83 percent
(see table A-1).
9

Classroom sample. Schools agreeing to participate were
asked to list their eighth-grade mathematics classes as the
basis for sampling at the classroom level, resulting in the
identifcation of a total of 3,125 mathematics classrooms.
At this time, schools were given the opportunity to identify
special classesclasses in which all or most of the students
had intellectual or functional disabilities or were non-English-
language speakers. While these classes were regarded as
eligible, the students as a group were treated as "excluded
since, in the opinion of the school, their disabilities or
language capabilities would render meaningless their
performance on the assessment. Some 2,834 eighth-grade
students in a total of 308 classrooms in 133 schools were
excluded in this way. Schools identifed 106 classrooms
containing 788 students with intellectual disabilities (28
percent), 136 classrooms containing 989 students with
functional disabilities (35 percent) and 66 classrooms
containing 1,057 non-native-language speakers (37 percent).
The remaining 2,775 classrooms served as the pool from
which the sample was drawn.
Classrooms with fewer than 15 students were collapsed into
pseudo-classrooms prior to sampling so that each eligible
classroom in a school had at least 20 students. Two
classrooms (pseudo-classrooms) were selected per school
where possible. n schools where there was only one
classroom, this classroom was selected with certainty.
Some 539 classrooms were selected as a result of this
process. All selected classrooms participated in TMSS
yielding a classroom response rate of 100 percent (Olson,
Martin, and Mullis 2008, exhibit A6).
Subsequently, schools were asked to list the students in
each sampled classroom, along with the teachers who taught
mathematics and science to these students. At this time,
schools were given the opportunity to identify particular
students in these classrooms who were not suited to take the
test because of physical or intellectual disabilities (i.e., students
with disabilities who had been mainstreamed) or because they
were non-native- language speakers.
Student sample. Schools were asked to list the students
in each of these 539 sampled classrooms, along with the
teachers who taught mathematics and science to these
students. A total of 10,793 students were listed as being in the
selected classrooms. Subsequently, 2,346 of these students
were allocated to the bridge study since they completed a
TMSS 2003 assessment booklet rather than the TMSS 2007
assessment (see the description of the 2003-07 bridge study in
the section on Scaling below). Eliminating these students from
further consideration leaves 8,447 eighth-grade students as the
pool of students selected to take part in TMSS 2007 proper.
These students are identifed by EA as "sampled students in
participating schools (Olson, Martin, and Mullis 2008, exhibit A5).
This pool of students is reduced by within-school exclusions
and withdrawals. At the time schools listed the students in
sampled classrooms, they had the opportunity to identify
particular students who were not suited to take the test
because of physical or intellectual disabilities (i.e., students
with disabilities who had been mainstreamed) or because
they were non-native-language speakers. Schools identifed
a total of 272 students they wished to have excluded from
the assessment; 154 students with intellectual disabilities
(57 percent), 48 students with functional disabilities (18
percent) and 70 students who were non-English-language
speakers (26 percent). And, by the time of the assessment
a further 202 of the listed students had withdrawn from the
school or classroom. n total then, the pool of 8,447 sampled
students was reduced by 474 students (272 excluded and 202
withdrawn) to yield 7,973 "eligible students. The number of
eligible students is used as the base for calculating student
response rates (Olson, Martin, and Mullis 2008, exhibit A5). .
The number of eligible students was further reduced on
assessment day by 596 student absences, leaving 7,377
"assessed students identifed as having completed a TMSS
2007 assessment booklet (see table A-2). The EA defnes the
student response rate as the number of students assessed as
a percentage of the number of eligible students which, in this
case yields a weighted (and unweighted) student response
rate of 93 percent (see table A-1).
Note that the 2,834 students excluded because whole classes
were excluded do not fgure in the calculation of student
response rates. They do, however, fgure in the calculation of
the coverage of the nternational Target Population. Together,
these 2,834 students excluded prior to classroom sampling,
plus the 272 within-class exclusions resulted in an overall
student exclusion rate of 7.9 percent (see table A-1 and Olson,
9
Substitute schools are matched pairs and do not have an independent probability of selection. NCES standards (Standard 1-3-8) indicate that, in these
circumstances, response rates should be calculated without including substitute schools (National Center for Education Statistics 2002). TMSS response rates
denoted as "before replacement conform to this standard. TMSS response rates denoted as "after replacement are not consistent with NCES standards since,
in the calculation of these rates, substitute schools are treated as the equivalent of sampled schools.
A-10
APPENDIX A

Martin, and Mullis 2008, exhibit A.3). The reported coverage
of the nternational Target Population then is 92.1 percent
(see Olson, Martin, and Mullis 2008, exhibit A.3). EA standards
defne this degree of coverage as acceptable though falling
outside the desired range of 95 percent or better.
Combined participation rates. The combined school,
classroom and student weighted response rate standard of 75
percent used by TMSS in situations where substitute schools
were necessary was met in this instance. Both the weighted
and unweighted product of the separate response rates (77
percent) exceeded this 75 percent standard (see table A-1).
The application of international guidelines means, however,
that U.S. statistics describing eighth-grade students are
annotated in international reports to indicate that coverage
of the defned student population was less than the EA
standard of 95 percent and that participation rates were
met only after substitute schools were included. Table A-2
summarizes information on the coverage of the eighth-grade
target populations in each nation.
Nonresponse bias in the U.S. TIMSS samples
NCES standards require a nonresponse bias analysis if
the school-level response rate falls below 85 percent of
the sampled schools (standard 2-2-2; National Center for
Education Statistics 2002), as they did for both fourth- and
eighth-grade samples. As a consequence a nonresponse bias
analysis was initiated and took a form similar to that adopted
for TMSS 2003 (Ferraro and Van de Kerckhove 2006). A full
report of this study will be included in a technical report to be
released with the U.S. national TMSS dataset.
Three methods were chosen to perform this analysis. The
frst method focused exclusively on the sampled schools and
ignored substitute schools. The schools were weighted by their
school base weights, excluding any nonresponse adjustment
factor. The second method focused on sampled schools plus
substitute schools, treating as nonrespondents those schools
from which a fnal response was not received. Again, schools
were weighted by their base weights, with the base weight
for each substitute school set to the base weight of the
original school that it replaced. The third method repeated
the analyses from the second method using nonresponse
adjusted weights.
10

In order to compare TIMSS respondents and nonrespondents,
it was necessary to match the sample of schools back to the
sample frame to identify as many characteristics as possible
that might provide information about the presence of
nonresponse bias.
11
The characteristics available for analysis in
the sampling frame were taken from the CCD for public
schools, and from the PSS for private schools. For categorical
variables, the distribution of the characteristics for respondents
was compared with the distribution for all schools. The
hypothesis of independence between a given school
characteristic and the response status (whether or not the
school participated) was tested using a Rao-Scott modifed
chi-square statistic. For continuous variables, summary
means were calculated and the difference between means
was tested using a t test. Note that this procedure took
account of the fact that the two samples in question were not
independent samples, but in fact the responding sample was
a subsample of the full sample. This effect was accounted for
in calculating the standard error of the difference. Note also
that in those cases where both samples were weighted using
just the base weights, the test is exactly equivalent to testing
that the mean of the respondents was equal to the mean of
the nonrespondents.
n addition, multivariate logistic regression models were set
up to identify whether any of the school characteristics were
signifcant in predicting response status when the effects of
all potential infuences were considered simultaneously.
Public and private schools were modeled together using the
following variables:
12
community type (central city, urban
fringe/large town, rural/small town); control of school (public
or private); NAEP region (Northeast, Southeast, Central,
West); poverty level (percentage of students in school eligible
for free or reduced-price lunch);
13
number of students enrolled
in fourth or eighth grade; total number of students; and,
percentage minority students.
14

10
A detailed treatment of the meaning and calculation of sampling weights, including the nonresponse adjustment factors, is provided in the TIMSS 2007 Technical
Report (Olson, Martin, and Mullis 2008).
11
Comparing characteristics for respondents and nonrespondents is not always a good measure of nonresponse bias if the characteristics are either unrelated
or weakly related to more substantive items in the survey. Nevertheless, this is often the only approach available.
12
NAEP region and community type were dummy coded for the purposes of these analyses. n the case of NAEP region, "West was used as the omitted group.
For community type, "urban fringe/large town was chosen as the omitted group.
13
The measure of school poverty is based on the proportion of students in a school eligible for the Free or Reduced-Price Lunch (FRPL) program, a federally
assisted meal program that provides nutritionally balanced, low-cost or free lunches to eligible children each school day. For the purposes of the nonresponse bias
analyses, schools were classifed as "low poverty if less than 50 percent of the students were eligible for FRPL, and "high poverty if 50 percent or more of
students were eligible. Since the nonresponse bias analyses involve both participating and nonparticipating schools, they are based, out of necessity, on data from
the sampling frame. TMSS data are not available for nonparticipating schools. The school frame data are derived from the CCD and PPS. The CCD data provide
information on the percentage of students in each school who are eligible for free- or reduced-price lunch, but are limited to public schools. The PPS data do not
provide the same information for private schools. n the interest of retaining all of the schools and students in these analyses, private schools were assumed to be
low-poverty schoolsthat is, they were assumed to be schools in which less than 50 percent of students were eligible for FRPL. Separate analyses of the TMSS
data for participating private schools suggest the reasonableness of this assumption. Of the 21 grade four private schools, only one reports having 50 percent or
more of students eligible for FRPL. Among the 21 grade eight private schools, only two report having 50 percent or more of students eligible for FRPL.
14
Two forms of this school attribute were used in the analyses. n the bivariate analyses the percentage of each race/ethnic group was related separately to
participation status. n the logistic regression analyses a single measure was used to characterize each school, namely, "percentage of minority students.
A-11
APPENDIX A
Results for the original sample of schools. In the
analyses for the original sample of schools, all substituted
schools were treated as nonresponding schools. The results
of these analyses follow.
Fourth grade. n the investigation into nonresponse
bias at the school level for TMSS fourth-grade schools,
comparisons between schools in the eligible sample
and participating schools showed that there was no
relationship between response status and the majority
of school characteristics available for analysis. n separate
variable-by-variable bivariate analyses, three variables
were found to be related to participation: community type,
region, and racial/ethnic composition. Central city schools
were underrepresented among participating schools by
almost 4 percent and rural small-town schools were
overrepresented by the same amount. Similarly, schools
in the Central region were overrepresented by close to 5
percent, and schools in the West underrepresented by
about 3.5 percent in the original sample of participating
schools. And, in regard to racial/ethnic composition, both
the percentage of White, non-Hispanic and the
percentage of American ndian or Alaska Native students
were higher in participating schools than in the eligible
sample. Although each of these fndings indicates some
potential for nonresponse bias, when all of these factors
were considered simultaneously in a regression analysis,
the results indicated that the only independent source of
bias lay with the fact that, relative to schools in the West,
schools in the Central region were somewhat
overrepresented among the participating schools.
Eighth grade. The bivariate analyses for eighth-grade
schools showed no relationship between participation and
any of the school characteristics examined. However, the
multivariate regression analysis showed that, relative to
urban fringe/large town schools, central city schools were
overrepresented among the participating schools. And,
relative to schools in the West region, schools in the
Central region were similarly overrepresented.
ResuIts for the naI sampIe of schooIs. In the analyses
for the fnal sample of schools, all substitute schools were
included with the original schools as responding schools,
leaving nonresponding schools as those for which no
assessment data were available. The results of these
analyses follow and are somewhat more complicated
than the analyses for the original sample of schools.
Fourth grade. The bivariate results for the fnal sample
of fourth-grade schools indicated that two of the three
variables were still found to be related to participation:
community type, and racial/ethnic composition. As in the
earlier analysis, central city schools were underrepresented
among participating schools (by some 2.5 percent) and
rural small-town schools were overrepresented (by some
2 percent). Similarly, both the percentage of White, non-
Hispanic and the percentage of American ndian or
Alaska Native students were higher in participating
schools than in the eligible sample. In each instance the
differences were substantially reduced over those seen
in connection with the original sample. These same
differences could not be demonstrated in the multivariate
regression analysis which failed to show any variables
as signifcant predictors of participation.
For the fnal sample of schools with school nonresponse
adjustments applied to the weights,
15
the results were
identical. These results suggest that there is some
potential for nonresponse bias in the fourth-grade original
sample based on the characteristics studied. It also
suggests that the use of substitute schools reduced the
potential for bias. The school nonresponse adjustment
had no effect on the characteristics of the weighted
responding sample of schools.
Eighth grade. The bivariate results for the fnal sample
indicated that two variables were related to participation:
community type, and the percentage of American ndian
or Alaska Native students. Central city schools were
overrepresented among participating schools by some
4 percent, and schools in urban fringe/large town were
underrepresented by nearly 4 percent, And, in regard
to racial/ethnic composition, the percentage of American
ndian or Alaska Native students in participating schools
was higher than in all eligible schools. The multivariate
regression analysis indicated that, relative to urban
fringe/large town schools, central city schools were
overrepresented among the participating schools, and
that the percentage of minority students in participating
schools was lower than in all eligible schools.
With school nonresponse adjustments applied to the
weights,
16
the results were identical. These results
suggest that there is some potential for nonresponse bias
in the original sample based on the characteristics studied.
t also suggests that, while there is no evidence that the
use of substitute schools reduced the potential for bias,
it has not added to it substantially. The school nonresponse
adjustment had no effect on the characteristics of the
weighted responding sample of schools.
15
The international weighting procedures created a nonresponse adjustment class for each explicit stratum; see the TIMSS 2007 Technical Report (Olson, Martin,
and Mullis 2008) for details. n the case of the U.S. fourth-grade sample, there was no explicit stratifcation and thus a single adjustment class. The procedures
could not be varied for individual countries to account for any specifc needs. Therefore, the U.S. nonresponse bias analyses could have no infuence on the
weighting procedures and were undertaken after the weighting process was complete.
16
The international weighting procedures created a nonresponse adjustment class for each explicit stratum. For the eighth grade, there was no explicit stratifcation
and thus a single adjustment class. Again, the procedures were not varied for individual countries to account for any specifc needs. As with the fourth grade, the
nonresponse bias analyses for the eighth grade could have no infuence on the weighting procedures
A-12
APPENDIX A

Test development
TMSS is a cooperative effort involving representatives from
every country participating in the study. For TMSS 2007,
the test development effort began with a revision of the
frameworks that are used to guide the construction of the
assessment (Mullis et al. 2005). The frameworks were
updated to refect changes in the curriculum and instruction
of participating countries. Extensive input from experts in
mathematics and science education, assessment, and
curriculum, and representatives from national educational
centers around the world contributed to the fnal shape of the
frameworks. Maintaining the ability to measure change over
time was an important factor in revising the frameworks.
As part of the TMSS dissemination strategy, approximately
one half of the 2003 assessment items were released for
public use. To replace assessment items that had been
released, countries submitted items for review by subject-
matter specialists, and additional items were written by
the EA Science and Mathematics Review Committee
in consultation with item-writing specialists in various
countries to ensure that the content, as explicated in the
frameworks, was covered adequately. tems were reviewed
by an international Science and Mathematics tem Review
Committee and feld-tested in most of the participating
countries. Results from the feld test were used to evaluate
item diffculty, how well items discriminated between high-
and low-performing students, the effectiveness of distracters
in multiple-choice items, scoring suitability and reliability
for constructed-response items, and evidence of bias toward
or against individual countries or in favor of boys or girls.
As a result of this review, 196 new fourth-grade items were
selected for inclusion in the international assessment. In total,
353 mathematics and science items were included in the
fourth-grade TMSS assessment booklets. At the eighth grade,
the review of the item statistics from the feld test led to the
inclusion 240 new eighth-grade items in the assessment. n
total, 429 mathematics and science items were included in the
eighth-grade TMSS assessment booklets. More detail on the
distribution of new and trend items is included in table A-3.
Table A-3. Number of new and trend mathematics and science items in the
TIMSS grade four and grade eight assessments, by type: 2007
Grade four
All items New items Trend items
Number Percent Number Percent Number Percent
All items
Total 353 100 196 100 157 100
Multiple choice 189 54 108 55 81 52
Constructed response 164 46 88 45 76 48
Mathematics items
Total 179 100 98 100 81 100
Science items
Total 174 100 98 100 76 100
Grade eight
All items New items Trend items
Number Percent Number Percent Number Percent
All items
Total 429 100 240 100 189 100
Multiple choice 224 52 117 49 107 57
Mathematics items
Total 215 100 120 100 95 100
Science items
Total 214 100 120 100 94 100
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA), Trends in nternational Mathematics
and Science Study (TMSS) 2007.
A-13
APPENDIX A
Design of instruments
TMSS 2007 included booklets containing assessment items
as well as self-administered background questionnaires for
principals, teachers, and students.
Assessment booklets
The assessment booklets were constructed such that not all
of the students responded to all of the items. This is consistent
with other large-scale assessments, such as the NAEP. To
keep the testing burden to a minimum, and to ensure broad
subject-matter coverage, TMSS used a rotated block design
that included both mathematics and science items. That is,
students encountered both mathematics and science items
during the assessment.
The 2007 fourth-grade assessment consisted of 14 booklets,
each requiring approximately 72 minutes of response time.
To ensure that TMSS 2007 maintains the trend, and to provide
for a correction through equating, if necessary, four additional
"bridge booklets were required but only for countries that
participated in TMSS 2003.
17
These bridge study booklets
were identical to booklets used in 2003. Performance on
the bridge booklets did not contribute to the overall score for
TMSS 2007 but the data were used in the trend scaling that
placed the 2007 results on the same scale as previous TMSS
assessments and so allowed for comparisons across the years.
For the United States and other countries participating in the
2003 assessment, this meant a total of 18 booklets. The 18
booklets were rotated among students, with each participating
student completing 1 booklet only. The mathematics and
science items were each assembled separately into 14
blocks, or clusters, of items. Each block contained either
mathematics items or science items only. The secure, or
trend, items used in prior assessments were included in 3
blocks, with the other 11 blocks containing new items. Each of
the 14 TMSS 2007 booklets contained 4 blocks in total. The 4
additional bridge study booklets from TMSS 2003 contained 6
blocks of items each.
The 2007 eighth-grade assessment followed the same pattern
and consisted of 18 booklets, each requiring approximately
90 minutes of response time. The 18 booklets were rotated
among students, with each participating student completing
1 booklet only. The mathematics and science items were
assembled into 14 blocks, or clusters, of items. Each block
contained either mathematics items or science items only.
The secure, or trend, items used in prior assessments were
included in 3 blocks, with the other 11 blocks containing new
items. Each of the 14 TMSS 2007 booklets contained 4
blocks in total. The 4 additional bridge study booklets from
TMSS 2003 contained 6 blocks of items each. Performance
on the bridge booklets did not contribute to the overall score
for TMSS 2007 but the data were used in the trend scaling
that placed the 2007 results on the same scale as previous
TMSS assessments and so allowed for comparisons across
the years.
As part of the design process, it was necessary to ensure that
the booklets showed a distribution across the mathematics and
science content domains as specifed in the frameworks. The
number of mathematics and science items in the fourth- and
eighth-grade TMSS 2007 assessments is shown in table A-4.
Grade eight
Response type
Content domain Total
Multiple
choice
Constructed
response
Total 429 224 205
Mathematics 215 117 98
Number 63 35 28
Algebra 64 34 30
Geometry 47 31 16
Data and chance 41 17 24
Science 214 107 107
Biology 76 36 40
Chemistry 42 21 21
Physics 55 31 24
Earth science 41 19 22
Grade four
Response type
Content domain Total
Multiple
choice
Constructed
response
Total 353 189
164
Mathematics 179 96 83
Number 78 50 28
Geometric shapes and measures 44 32 12
Data display 97 14 83
Science 174 93 81
Life science 74 42 32
Physical science 64 35 29
Earth science 36 16 20
Table A-4. Number of mathematics and science items in the TIMSS grade four and grade eight
assessments, by type and content domain: 2007
17
A detailed description of the bridge study and the use of the data obtained through the bridge booklets in scaling the 2007 assessment can be found in the
A-14
APPENDIX A

Background questionnaires
As in prior administrations of TMSS, TMSS 2007 included
self-administered questionnaires for principals, teachers,
and students. To create the questionnaires for 2007, the 2003
versions were reviewed extensively by the national research
coordinators from the participating countries as well as a
Questionnaire tem Review Committee (QRC). Based on
this review, the QRC deleted or revised some questions,
and added several new ones. Like the assessment items,
all questionnaire items were feld tested, and the results
reviewed carefully. As a result, some of the questionnaire
items needed to be revised prior to their inclusion in the fnal
questionnaires. The questionnaires requested information to
help provide a context for the performance scores, focusing
on such topics as students' attitudes and beliefs about
learning, their habits and homework, and their lives both in
and outside of school; teachers' attitudes and beliefs about
teaching and learning, teaching assignments, class size
and organization, instructional practices, and participation
in professional development activities; and principals'
viewpoints on policy and budget responsibilities, curriculum
and instruction issues and student behavior, as well as
descriptions of the organization of schools and courses.
Detailed results from the student, teacher, and school surveys
are not discussed in this report but are available in the two
international reports: the TIMSS 2007 International
Mathematics Report (Mullis, Martin, and Foy 2008) and
TIMSS 2007 International Science Report (Martin, Mullis,
and Foy 2008).
Calculator usage
Calculators were not permitted during the TMSS fourth-grade
assessment. However, the TMSS policy on calculator use at
the eighth grade was to give students the best opportunity to
operate in settings that mirrored their classroom experiences.
Calculators were permitted but not required for the eighth-grade
assessment materials. In the United States, students assigned
one of the 14 TMSS 2007 booklets were allowed, but not
required, to use calculators. However, students assigned one
of the trend booklets from the 2003 assessment were required
to follow the 2003 rules in this respect. These students could
use a calculator only for the second half of the booklet.
Translation
Source versions of all instruments (assessment booklets,
questionnaires, and manuals) were prepared in English and
translated into the primary language or languages of instruction
in each country. n addition, it was sometimes necessary to
adapt the instrument for cultural purposes, even in countries
that use English as the primary language of instruction. All
adaptations were reviewed and approved by the nternational
Study Center to ensure they did not change the substance or
intent of the question or answer choices. For example, proper
names were sometimes changed to names that would
be more familiar to students (e.g., Marja-leena to Maria).
Each country prepared translations of the instruments
according to translation guidelines established by the
nternational Study Center. Adaptations to the instruments
were documented by each country and submitted for review.
The goal of the translation guidelines was to produce
translated instruments of the highest quality that would
provide comparable data across countries.
Translated instruments were verifed by an independent,
professional translation agency prior to fnal approval and
printing of the instruments. Countries were required to submit
copies of the fnal printed instruments to the nternational
Study Center. Further details on the translation process can
be found in the TIMSS 2007 Technical Report (Olson, Martin,
and Mullis 2008).
Recruitment, test administration,
and quality assurance
TMSS 2007 emphasized the use of standardized procedures
in all countries. Each country collected its own data, based
on comprehensive manuals and trainings provided by the
international project team to explain the survey's
implementation, including precise instructions for the work
of school coordinators and scripts for test administrators to
use in testing sessions.
Recruitment of schools and students
With the exception of private schools, the recruitment of
schools required several steps. Beginning with the sampled
schools, the frst step entailed obtaining permission from
the school district to approach the sampled school(s) in that
district. If a district refused permission, then the district of the
frst substitute school was approached and the procedure was
repeated. With permission from the district, the school(s) was
contacted in a second step. If a sampled school refused to
participate, the district of the frst substitute was approached
and the permission procedure repeated. During most of the
recruitment period sampled schools and substitute schools
were being recruited concurrently. Each participating school
was asked to nominate a School Coordinator as the main
point of contact for the study. The school coordinator worked
with project staff to arrange logistics and liaise with staff,
students and parents as necessary.
On the advice of the school, parental permission for students
to participate was sought with one of three approaches to
parents: a simple notifcation; a notifcation with a refusal form;
and a notifcation with a consent form for parents to sign.
n each approach, parents were informed that their students
could opt out of participating.
A-15
APPENDIX A
Gifts to schools, School Coordinators, and students.
Schools, School Coordinators, and students were provided
with small gifts as a sign of appreciation for their willingness
to participate. Schools were provided with an all-in-one
printer/photocopier/scanner/fax, School Coordinators
received a TMSS satchel, and students were given a clock-
compass carabiner.
Test administration
Test administration in the United States was carried out
by professional staff trained according to the international
guidelines. School personnel were asked only to assist with
listings of students, identifying space for testing in the school,
and specifying any parental consent procedures needed for
sampled students.
Quality assurance
The nternational Study Center monitored compliance with
the standardized procedures. National research coordinators
were asked to nominate one or more persons unconnected
with their national center, such as retired school teachers,
to serve as quality control monitors for their countries.
The nternational Study Center developed manuals for
the monitors and briefed them in 2-day training sessions
about TIMSS, the responsibilities of the national centers in
conducting the study, and their own roles and responsibilities.
Some 30 schools in the U.S. samples were visited by the
monitors15 of the 257 schools in the fourth-grade sample,
and 15 of the 239 schools in the eighth-grade sample. These
schools were scattered geographically across the nation.
n addition, each country conducted its own separate quality
control procedures.
Scoring and scoring reliability
The TMSS assessment items included both multiple-choice
and constructed-response items. A scoring rubric (guide) was
created for every item included in the TMSS assessments.
The rubrics were carefully written and reviewed by national
research coordinators and other experts as part of the feld
test of items, and revised accordingly.
The national research coordinator in each country was
responsible for the scoring and coding of data in that country,
following established guidelines. The national research
coordinator and, sometimes, additional staff attended scoring
training sessions held by the International Study Center. The
training sessions focused on the scoring rubrics and coding
system employed in TMSS. Participants in these training
sessions were provided extensive practice in scoring example
items over several days. nformation on within-country
agreement among coders was collected and documented
by the International Study Center. Information on scoring
and coding reliability was also used to calculate cross-country
agreement among coders. Information on scoring reliability
for constructed-response scoring in TMSS 2007 is provided
in table A-5.
Data entry and cleaning
The national research coordinator from each country oversaw
data entry. The data collected for TMSS 2007 were entered
into data fles with a common international format, as specifed
in the Data Entry Manager Manual (EA Data Processing
Center 2006), which accompanied data entry software
(WinDEM) available to all participating countries. The software
facilitated the checking and correction of data by providing
various data consistency checks. The data were then sent to
the EA Data Processing Center (DPC) in Hamburg, Germany,
for cleaning. The DPC checked that the international data
structure was followed; checked the identifcation system
within and between fles; corrected single case problems
manually; and applied standard cleaning procedures to
questionnaire fles. Results of the data cleaning process were
documented by the DPC. This documentation was shared with
the national research coordinator with specifc questions to be
addressed. The national research coordinator then provided
the DPC with revisions to coding or solutions for anomalies.
The DPC subsequently compiled background univariate
statistics and preliminary test scores based on classical and
Rasch item analyses. Detailed information on the entire data
entry and cleaning process can be found in the TIMSS 2007
Technical Report (Olson, Martin, and Mullis 2008).
Weighting, scaling,
and plausible values
Before the data were analyzed, responses from the groups of
students assessed were assigned sampling weights to ensure
that their representation in TMSS 2007 results matched their
actual percentage of the school population in the grade
assessed. With these sampling weights in place, the analyses
of TMSS 2007 data proceeded in two phases: scaling and
estimation. During the scaling phase, item response theory
(RT) procedures were used to estimate the measurement
characteristics of each assessment question. During the
estimation phase, the results of the scaling were used to
produce estimates of student achievement. Subsequent
analyses related these achievement results to the background
variables collected by TMSS 2007.
Weighting
Responses from the groups of students were assigned
sampling weights to adjust for over- or under-representation
during the sampling of a particular group. The use of sampling
weights is necessary for the computation of sound, nationally
representative estimates. The weight assigned to a student's
responses is the inverse of the probability that the student
A-16
APPENDIX A

Table A-5. Within-country constructed-response scoring
reliability for TIMSS grade four and grade eight
mathematics and science items, by exact percent
score agreement and country: 2007
Grade four
Mathematics Science
Average
across items
Range
Average
across items
Range
Country Min Max Min Max
TMSS average 98 88 100 96 81 100
Algeria 92 58 99 88 69 98
Armenia 99 94 100 98 93 100
Australia 100 98 100 99 95 100
Austria 99 95 100 98 90 100
Chinese Taipei 98 84 100 97 74 100
Colombia 99 93 100 98 50 100
Czech Republic 98 90 100 94 78 100
Denmark 97 83 100 91 72 100
El Salvador 99 96 100 99 78 100
England 99 91 100 98 88 100
Georgia 97 88 100 92 68 100
Germany 97 75 100 93 73 100
Hong Kong SAR 100 98 100 99 98 100
Hungary 100 97 100 99 96 100
Italy 99 94 100 98 85 100
Japan 99 94 100 97 88 100
Kazakhstan 99 96 100 99 97 100
Kuwait 100 98 100 99 94 100
Latvia 95 41 100 85 42 100
Lithuania 98 88 100 95 80 100
Morocco 95 33 100 93 75 100
Netherlands 97 86 100 92 71 100
New Zealand 99 95 100 97 90 100
Norway 99 92 100 97 88 100
Qatar 99 91 100 99 94 100
Scotland 99 91 100 97 87 100
Singapore 99 93 100 96 90 100
Slovak Republic 99 92 100 99 97 100
Slovenia 100 99 100 99 93 100
Sweden 98 89 100 93 65 100
Tunisia 98 86 100 92 77 100
Ukraine 100 98 100 100 98 100
United States 98 83 100 94 68 100
Yemen 98 83 100 96 85 100
A-17
APPENDIX A
Table A-5. Within-country constructed-response scoring
reliability for TIMSS grade four and grade eight
mathematics and science items, by exact percent
score agreement and country: 2007Continued
Grade eight
Mathematics Science
Average
across items
Range
Average
across items
Range
Country Min Max Min Max
TMSS average 98 89 100 96 82 100
Algeria 95 60 100 94 75 100
Armenia 99 94 100 98 89 100
Australia 99 93 100 97 88 100
Bahrain 100 97 100 94 78 100
Bosnia and Herzegovina 98 90 100 95 74 100
Botswana 98 84 100 95 79 100
Bulgaria 96 70 100 91 69 100
Chinese Taipei 98 47 100 94 66 100
Colombia 99 92 100 98 88 100
Czech Republic 98 86 100 93 75 100
Egypt 99 94 100 97 88 100
El Salvador 100 98 100 100 98 100
England 99 94 100 97 88 100
Georgia 97 76 100 92 67 100
Ghana 100 98 100 99 96 100
Hong Kong SAR 99 95 100 99 96 100
Hungary 98 84 100 95 86 100
Indonesia 98 90 100 97 81 100
Israel 96 82 100 92 74 100
Italy 99 85 100 96 63 100
Japan 97 84 100 91 54 100
Jordan 100 97 100 99 93 100
Korea, Rep. of 99 96 100 99 95 100
Kuwait 99 96 100 99 88 100
Lebanon 100 97 100 100 97 100
Lithuania 98 94 100 97 90 100
Malaysia 99 96 100 99 96 100
Malta 97 81 100 93 81 100
Norway 99 94 100 97 88 100
Oman 99 95 100 99 95 100
Palestinian Nat'l Auth. 98 89 100 94 82 100
Qatar 99 91 100 99 95 100
Romania 99 96 100 99 89 100
Saudi Arabia 100 97 100 99 90 100
Scotland 99 95 100 97 84 100
Serbia 99 94 100 97 74 100
Singapore 98 93 100 96 90 100
Slovenia 100 98 100 100 95 100
Sweden 98 86 100 92 70 100
Syrian Arab Republic 99 95 100 99 92 100
Thailand 98 89 100 90 73 100
Tunisia 97 87 100 91 61 100
Turkey 100 95 100 97 81 100
Ukraine 98 80 100 92 68 100
United States 97 86 100 93 73 100
NOTE: The reliability of constructed-reponse scoring was determined by having two scorers
independently score a random sample of some 200 student responses to each item. Table A-5
displays the average and range of the within-country exact percent of inter-rater agreement
across all items. To gather and document within-country agreement among scorers, systematic
subsamples of at least 100 students' responses to each constructed-response item were coded
independently by two readers. The agreement score indicates the degree of agreement.
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA),
Trends in nternational Mathematics and Science Study (TMSS), 2007.
A-18
APPENDIX A

would be selected for the sample. When responses are
weighted, none are discarded, and each contributes to the
results for the total number of students represented by the
individual student assessed. Weighting also adjusts for various
situations (such as school and student nonresponse) because
data cannot be assumed to be randomly missing. The
internationally defned weighting specifcations for TMSS
require that each assessed student's sampling weight should
be the product of (1) the inverse of the school's probability
of selection, (2) an adjustment for school-level nonresponse,
(3) the inverse of the classroom's probability of selection, and
(4) an adjustment for student-level nonresponse.
18
All TMSS
1995, 1999, 2003, and 2007 analyses are conducted using
sampling weights. A detailed description of this process is
provided in the TMSS Technical 2007 Report (Olson, Martin,
and Mullis 2008).
Scaling
n TMSS, scale scores were estimated for each student using
an item response theory (RT) model. With RT the diffculty
of each item is deduced using information about how likely it
is for students to get some items correct versus other items.
Once the diffculty of each item is determined, the ability of
each student can be estimated even when different students
have been administered different items. At this point in the
estimation process achievement scores are expressed in a
standardized logit scale which ranges from -4 to +4. n order
to make the scores more meaningful and to facilitate their
interpretation, the scores are transformed to a new scale
with a mean of 500 and a standard deviation of 100.
The procedures TMSS used for the analyses were developed
to produce accurate results for groups of students while
limiting the testing burden on individual students. Furthermore,
these procedures provided data that could be readily used in
secondary analyses. RT scaling provides estimates of item
parameters (e.g., diffculty, discrimination) that defne the
relationship between the item and the underlying variable
measured by the test. Parameters of the RT model are
estimated for each test question, with an overall scale being
established as well as scales for each content area and
cognitive domain specifed in the assessment framework.
For example, the TMSS 2007 eighth-grade assessment had
four scales describing four mathematics content areas and
four science content areas, as well as three cognitive domains
in each of mathematics and science.
n order to allow for the calculation of trends in achievement,
comparisons of scores were necessary across the four TMSS
assessments conducted in 1995, 1999, 2003 and 2007. RT
estimation procedures were used to place scores from the
multiple administrations on the same scale (the scale of the
1995 administration). This is made possible by the inclusion
of common test items in successive administrations. This
allows comparison of item parameters (such as the relative
diffculty of items compared with each other and how well
individual items predict overall scores) across administrations.
This comparison of item parameters is used to drop items
whose item parameters change dramatically across
administrations and to equate scales across years. t is
important to note that the item parameters do not depend
directly on the average ability level of the students tested,
though they may depend on the range of abilities among
students tested (for example, to determine which of two
diffcult items is more diffcult, it is important to test students
of suffcient ability to get at least one of the items correct).
Therefore, even if the average ability levels of students in
countries participating in TMSS over time changes, the scales
still can be equated across administrations.
n TMSS, scales are equated across administrations by
linking the data from each administration to the data from
the administration that preceded it, as follows. Data for
students in adjacent assessments are pooled together and
scaled using RT to determine the diffculty and discrimination
of each item. This puts the scores from adjacent assessments
on the same scale. The achievement scores estimated from
the new item parameters are then put on the original 1995
TIMSS metric by a linear transformation.
For example, in order to allow an examination of trends
in eighth-grade achievement between 1995 and 1999, the
TMSS 1999 eighth-grade data were placed on the 1995
TMSS scale by frst scaling the 1995 and 1999 data for
countries that participated in both years together to determine
the item parameters. Ability estimates for all students (those
assessed in 1995 and those assessed in 1999) based on
the new item parameters were then estimated. n order to
put these jointly calibrated 1995 and 1999 scores on the 1995
metric, a linear transformation is applied. This transformation
is designed to give the jointly calibrated 1995 scores the
same mean and standard deviation as the original 1995
scores that were reported in the 1995 assessment cycle.
Once this linear transformation is established it is applied
to the 1999 assessment scores for all countries participating
in 1999. This puts the 1999 scores on the 1995 (longitudinal)
metric while preserving any growth that has occurred
between assessments.
Following this same procedure, TMSS 2003 scores were
jointly calibrated with the 1999 scores to place them on the
same (1995) metric and, fnally, TMSS 2007 scores were
jointly calibrated with the 2003 scores to place these on the
same (1995) metric. By linking scores for each adjacent pair
of assessments, all four sets of scores are placed on the same
18
These adjustments are for overall response rates and did not include any of the characteristics associated with differential nonresponse as identifed in the
nonresponse bias analyses reported above.
A-19
APPENDIX A
longitudinal scale. As a result, even if the makeup of the
countries participating in TMSS changes over time,
achievement comparisons within and between countries
are legitimate at a single point in time and across time.
Information obtained from the bridge study described
below was incorporated into this scaling to ensure strict
comparability of scores across the four assessments.
Details are provided in the TIMSS 2007 Technical Report
(Olson, Martin, and Mullis 2008).
The 2003-07 Bridge Study. As the name suggests, TMSS
places a great deal of emphasis on the measurement of
trends in achievement within and between countries. TMSS
provides for the measurement of these trends across the four
TMSS assessment years (1995, 1999, 2003, and 2007) by
placing the scores from each assessment on the same scale.
However, the TMSS assessment design changed a little in
2007, and it was considered prudent to devise a procedure to
measure the effect of this change, if any, on the comparability
of the 2007 assessment scores with those from previous
years. Given an effect, the intent was to incorporate a
correction into the scaling procedures which establish the
comparability of the 2007 achievement scores with those
from 1995, 1999, and 2003.
n order to evaluate the effect of the change in assessment
design in TMSS 2007, a bridge study was incorporated
into the main survey to allow a comparison of the 2007
assessment with the 2003 assessment. Countries that
participated in TMSS 2003 were asked to include four
additional booklets from 2003 in with the 14 booklets for
TMSS 2007 at each grade. As a result, sample sizes needed
to be increased to ensure that the number of students taking
each booklet was suffcient for the purposes of scaling.
The fndings from the bridge study indicated a small effect
from the change in the assessment design. To accommodate
this, a correction was introduced into the scaling procedures
which placed the 2007 assessment scores on the same scale
as the scores from the 1995, 1999 and 2003 assessments.
A detailed description of the bridge study is provided in the
Plausible values
To keep student burden to a minimum, TIMSS administered a
limited number of assessment items to each studenttoo few
to produce accurate content-related scale scores for each
student. To accommodate this situation, during the scaling
process plausible values were estimated to characterize
students participating in the assessment. Plausible values are
imputed values and not test scores for individuals in the usual
sense. n fact, they are biased estimates of the profciencies
of individual students. Plausible values do, however, provide
unbiased estimates of population characteristics.
Plausible values represent what the true performance of an
individual might have been, had it been observed. They are
estimated as random draws (usually fve) from an empirically
derived distribution of score values based on the student's
observed responses to assessment items and on background
variables. Each random draw from the distribution is
considered a representative value from the distribution
of potential scale scores for all students in the sample who
have similar characteristics and identical patterns of item
responses. Differences between the plausible values quantify
the degree of precision (the width of the spread) in the
underlying distribution of possible scale scores that could
have caused the observed performances.
An accessible treatment of the derivation and use of plausible
values can be found in Beaton and Gonzlez (1995). A more
technical treatment can be found in the TIMSS 2007 Technical
Report (Olson, Martin, and Mullis 2008).
International benchmarks
nternational benchmarks for achievement were developed
in an attempt to provide a concrete interpretation of what the
scores on the TMSS mathematics and science achievement
scales mean (for example, what it means to have a scale
score of 513 or 426). To describe student performance at
various points along the TMSS mathematics and science
achievement scales, TMSS used scale anchoring to
summarize and describe student achievement at four
points on the mathematics and science scalesAdvanced
nternational Benchmark (625), High nternational Benchmark
(550), ntermediate nternational Benchmark (475), and Low
nternational Benchmark (400). Scale anchoring involves
selecting benchmarks (scale points) on the TIMSS
achievement scales to be described in terms of student
performance and then identifying items that students scoring
at the anchor points can answer correctly. Subsequently,
these items are grouped by content area within benchmarks
and reviewed by mathematics and science experts. These
experts focus on the content of each item and describe the
kind of mathematics or science knowledge demonstrated
by students answering the item correctly. The experts then
provide a summary description of performance at each anchor
point leading to a content-referenced interpretation of the
achievement results. Detailed information on the creation of
the benchmarks is provided in the international TMSS reports
(Mullis, Martin, and Foy 2008; Martin, Mullis, and Foy 2008).
Data limitations
As with any study, there are limitations to TMSS 2007
that researchers should take into consideration. Estimates
produced using data from TMSS 2007 are subject to
two types of errornonsampling and sampling errors.
A-20
APPENDIX A

Nonsampling errors can be due to errors made in collecting
and processing data. Sampling errors can occur because
the data were collected from a sample rather than a complete
census of the population.
Nonsampling errors
Nonsampling error is a term used to describe variations
in the estimates that may be caused by population coverage
limitations, nonresponse bias, and measurement error, as
well as data collection, processing, and reporting procedures.
The sources of nonsampling errors are typically problems like
unit and item nonresponse, the difference in respondents'
interpretations of the meaning of the survey questions,
response differences related to the particular time the survey
was conducted, and mistakes in data preparation.
Missing data. Five kinds of missing data were identifed
by separate missing data codes: omitted, uninterpretable,
not administered, not applicable, and not reached. An item
was considered omitted if the respondent was expected to
answer the item but no response was given (e.g., no box
was checked in the item which asked "Are you a girl or a
boy?). tems with invalid responses (e.g., multiple responses
to a question calling for a single response) were coded
as uninterpretable. The not administered code was used
to identify items not administered to the student, teacher or
principal (e.g., those items excluded from the student's test
booklet because of the BB-spiraling of the items). An item
was coded as not applicable when it is not logical that the
respondent answer the question (e.g., when the opportunity
to make the response is dependent on a flter question).
Finally, items that are not reached were identifed by a string
of consecutive items without responses continuing through
to the end of the assessment or questionnaire.
Missing background data on other than key variables
19
are not
included in the analyses for this report and are not imputed.
tem response rates for variables discussed in this report
exceeded the NCES standard of 85 percent and so can be
reported without notation. Of the three key variables identifed
in the TMSS 2007 data for the United Statesex, race/
ethnicity and the percentage of students eligible for free- or
reduced-price lunch (FRPL)as table A-6 indicates, sex has
no missing responses and race/ethnicity missing responses
are minimal at some 2 percent. The FRPL variable, however,
has some 17 percent missing responses among the public
schools in the sample and these were imputed by substituting
values taken from the CCD for the schools in question.
Note, however, that the CCD provides this information only
for public schools. The comparable database for private
schools (PPS) does not include data on participation in the
FRPL program. While most private schools are ineligible
for this Federal program, a few indicated that some of their
students were taking part6 of the 18 fourth-grade schools
and 3 of the 14 eighth-grade schools. The reported values
for these schools are included along with the zero values for
schools who reported that they had no students taking part.
Missing value codes then are assigned only to the 3 fourth-
grade and 7 eighth-grade private schools who did not respond
to the question.
19
Key variables include survey-specifc items for which aggregate estimates are commonly published by NCES. They include, but are not restricted to, variables
most commonly used in table row stubs. Key variables also include important analytic composites and other policy-relevant variables that are essential elements of
the data collection. For example, the National Assessment of Educational Progress (NAEP) consistently uses gender, race-ethnicity, urbanicity, region, and school
type (public/private) as key reporting variables.
Table A-6. Weighted response rates for unimputed variables for TIMSS grade four and grade
eight: 2007
Variable Variable D Source of information
U.S.
response
rate
Range of
response
rates in other
countries
U.S.
response
rate
Range of
response
rates in other
countries
Sex ITSEX Classroom tracking form 100 99.5 - 100
1
100
100
Race/ethnicity STRACE Student questionnaire 98 98
Free or reduced-price lunch FRLUNCH School questionnaire 83 83
Not applicable.
1
All countries other than Morocco achieved 100 percent response on this variable.
NOTE: FRLUNCH variable available for public schools only.
A-21
APPENDIX A
Sampling errors
Sampling errors arise when a sample of the population, rather
than the whole population, is used to estimate some statistic.
Different samples from the same population would likely
produce somewhat different estimates of the statistic in
question. This fact means that there is a degree of uncertainty
associated with statistics estimated from a sample. This
uncertainty is referred to as sampling variance and is usually
expressed as the standard error of a statistic estimated from
sample data. The approach used for calculating standard
errors in TMSS was Jackknife Repeated Replication (JRR).
Standard errors can be used as a measure for the precision
expected from a particular sample. Standard errors for all
of the reported estimates are included in appendix C.
Confdence intervals provide a way to make inferences
about population statistics in a manner that refects the
sampling error associated with the statistic. Assuming a
normal distribution, the population value of this statistic can
be inferred to lie within the confdence interval in 95 out of 100
replications of the measurement on different samples drawn
from the same population.
That is, there is a 95 percent chance that the population value
of the statistic lies within the range of 1.96 times the standard
error above or below the estimated score. For example, the
average mathematics score for the U.S. eighth-grade students
was 508 in 2007, and this statistic had a standard error of
2.8. Therefore, it can be stated with 95 percent confdence
that the actual average of U.S. eighth-grade students in 2007
was between 503 and 514 (1.96 x 2.8 = 5.5; confdence
interval = 508 +/- 5.5).
Description of background
variables
The international versions of the TMSS 2007 student,
teacher, and school questionnaires are available at http://
timss.bc.edu. The U.S. versions of these questionnaires
are available at http://nces.ed.gov/timss.
Race/ethnicity
Students' race/ethnicity was obtained through student
responses to a two-part question. Students were asked frst
whether they were Hispanic or Latino, and then whether they
were members of the following racial groups: American ndian
or Alaska Native; Asian; Black or African American; Native
Hawaiian or other Pacifc slander; or White. Multiple
responses to the race classifcation question were allowed.
Results are shown separately for Blacks, Hispanics, Whites,
Asians and Mixed-Race as distinct groups. The small numbers
of students indicating that they were American ndian or Alaska
Native or Native Hawaiian or other Pacifc slander were
combined into a group labeled "Other. This category is
treated as a residual category and is not reported separately
in the analyses.
Poverty level in public schools
(percentage of students eligible
for free or reduced-price lunch)
The poverty level in public schools was obtained from
principals' responses to the school questionnaire. The
question asked the principal to report, as of approximately
the frst of October 2006, the percentage of students at the
school eligible to receive free or reduced-price lunch through
the National School Lunch Program. The answers were
grouped into fve categories: less than 10 percent; 10 to 24.9
percent; 25 to 49.9 percent; 50 to 74.9 percent; and 75
percent or more. Analysis was limited to public schools only.
Missing data on this variable were replaced with measures
taken from the CCD. The effect of this replacement on the
confdentiality of the data was examined as part of the
confdentiality analyses described in the following section.
Condentiality and disclosure
limitations
n accord with NCES standard 4-2-6 (National Center for
Education Statistics 2002), confdentiality analyses for the
United States were implemented to provide reasonable
assurance that public-use data fles issued by the EA would
not allow identifcation of individual U.S. schools or students
when compared against publicly available data collections.
Disclosure limitations included the identifcation and masking
of potential disclosure risks for TIMSS schools and adding
an additional measure of uncertainty of school, teacher, and
student identifcation through random swapping of a small
number of data elements within the student, teacher, and
school fles.
A-22
APPENDIX A

Statistical procedures
Tests of signicance
Comparisons made in the text of this report were tested for
statistical signifcance. For example, in the commonly made
comparison of country averages against the average of the
United States, tests of statistical signifcance were used to
establish whether or not the observed differences from the
U.S. average were statistically signifcant. The estimation
of the standard errors that are required in order to undertake
the tests of signifcance is complicated by the complex sample
and assessment designs, both of which generate error
variance. Together they mandate a set of statistically complex
procedures in order to estimate the correct standard errors.
As a consequence, the estimated standard errors contain
a sampling variance component estimated by the jackknife
repeated replication (JRR) procedure; and, where the
assessments are concerned, an additional imputation
variance component arising from the assessment design.
Details on the procedures used can be found in the WesVar
5.0 Users Guide (Westat 2007).
n almost all instances, the tests for signifcance used were
standard t tests.
20
These fell into two categories according
to the nature of the comparison being made: comparisons
of independent and nonindependent samples. Before
describing the t tests used, some background on the
two types of comparisons is provided below.
The variance of a difference is equal to the sum of the
variances of the two initial variables minus two times the
covariance between the two initial variables. A sampling
distribution has the same characteristics as any distribution,
except that units consist of sample estimates and not
observations. Therefore,
The sampling variance of a difference is equal to the sum of the
two initial sampling variances minus two times the covariance
between the two sampling distributions on the estimates.
f one wants to determine whether girls' performance differs
from boys' performance, for example, then, as for all statistical
analyses, a null hypothesis has to be tested. In this particular
example, it consists of computing the difference between the
boys' performance mean and the girls' performance mean
(or the inverse). The null hypothesis is
To test this null hypothesis, the standard error on this
difference is computed and then compared to the observed
difference. The respective standard errors on the mean
estimate for boys and girls ( ) can be
easily computed.
The expected value of the covariance will be equal to 0 if the
two sampled groups are independent. f the two groups are not
independent, as is the case with girls and boys attending the
same schools within a country, or comparing a country mean
with the international mean that includes that particular country,
the expected value of the covariance might differ from 0.
In TIMSS, country samples are independent. Therefore, for
any comparison between two countries, the expected value
of the covariance will be equal to 0, and thus the standard
error on the estimate is
with being any statistic.
Within a particular country, any subsamples will be considered
as independent only if the categorical variable used to defne
the subsamples was used as an explicit stratifcation variable.
If sampled groups are not independent, the estimation of the
covariance between, for instance,
(boys)
and
(girls)
would
require the selection of several samples and then the analysis
of the variation of
(boys)
in conjunction with
(girls)
. Such a
procedure is, of course, unrealistic. Therefore, as for any
computation of a standard error in TIMSS, replication methods
using the supplied replicate weights are used to estimate the
standard error on a difference. Use of the replicate weights
implicitly incorporates the covariance between the two
estimates into the estimate of the standard error on
the difference.
Thus, in simple comparisons of independent averages, such
as the U.S. average with other country averages, the following
formula was used to compute the t statistic:
Est
1
and est
2
are the estimates being compared (e.g., average
of country A and the U.S. average), and se
1
and se
2
are the
corresponding standard errors of these averages.
The second type of comparison used in this report occurred
when comparing differences of nonsubset, nonindependent
groups (e.g., when comparing the average scores of males
versus females within the United States). n such comparisons,
the following formula was used to compute the t statistic:
Est
grp1
and est
grp2
are the nonindependent group estimates
being compared. Se(est
grp1
- est
grp2
) is the standard error
of the difference calculated using a JRR procedure, which
accounts for any covariance between the estimates for the
two nonindependent groups.
20
Adjustments for multiple comparisons were not applied in any of the t-tests undertaken.
A-23
APPENDIX A
Effect size
Tests of statistical signifcance are, in part, infuenced by
sample sizes. To provide the reader with an increased
understanding of the importance of the signifcant difference
between student populations in the United States, effect
sizes are included in the report. Effect sizes use standard
deviations, rather than standard errors and, therefore, are
not infuenced by the size of the student population samples.
Following Cohen (1988) and Rosnow and Rosenthal (1996),
effect size is calculated by fnding the difference between the
means of two groups and dividing that result by the pooled
standard deviation of the two groups:
Est
grp1
and est
grp2
are the student group estimates being
compared. Sd
pooled
is the pooled standard deviation of the
groups being compared. The formula for the pooled standard
deviation is as follows (Rosnow and Rosenthal 1996):
where sd
1
and sd
2
are the standard deviations of the groups
being compared.
For example, to calculate the effect size between the 2007
fourth-grade U.S. average and Hong Kong SAR average in
mathematics, the difference in the estimated averages (607-
529 = 78) is divided by the pooled standard deviation. The
pooled standard deviation is calculated by fnding the square
root of the sum of the squared standard deviations for the
United States (sd = 75) and Hong Kong SAR (sd = 67) divided
by 2. Using this formula, the pooled standard deviation is 71.
Dividing the difference in average scores (78) by the pooled
standard deviation (71) produces an effect size of 1.1.
Table A-7 shows the differences in average scores, standard
deviations, pooled standard deviations, and effect sizes for
the comparisons reported in fgures 14 and 27. The standard
deviations for all countries and U.S. student subpopulations
discussed in this report are provided in tables E-18 and E-19
(mathematics) and E-37 and E-38 (science).
A-24
APPENDIX A

Table A-7. Difference between average scores, standard deviations, and pooled standard deviations
used to calculate effects sizes of mathematics and sciences scores of fourth- and eighth-
grade students, by country, sex, race/ethnicity, and school poverty level: 2007
Subject/grade and groups compared
Difference
in average
scores
Standard
deviation of
group 1
Standard
deviation of
group 2
Pooled
standard
deviation
Effect
size
Mathematics grade four
United States v. Hong Kong SAR 78 75 67 71 1.1
U.S. males v. U.S. females 6 77 74 76 0.1
U.S. White students v. U.S. Black students 67 68 70 69 1.0
U.S. White students v. U.S. Hispanic students 46 68 70 69 0.7
U.S. White students v. U.S. Asian students 33 68 74 71 0.5
U.S. White students v. U.S. multiracial students 15 68 84 76 0.2
U.S. public schools with lowest levels of poverty v. U.S. schools with highest levels of poverty 103 64 72 68 1.5
Mathematics grade eight
United States v. Chinese Taipei 90 77 106 93 1.0
U.S. White students v. U.S. Asian students 16 69 68 69 0.2
Science grade four
United States v. Singapore 48 84 93 89 0.5
Science grade eight
United States v. Singapore 47 82 104 94 0.5
U.S. males v. U.S. females 12 85 79 82 0.1
NOTE: Difference calculated by subtracting average score of group 1 from average score of group 2. Standard deviations and pooled standard deviations are
shown only for statistically signifcant differences between group means. The pooled standard deviation is calculated by fnding the square root of the sum of
the squared standard deviations for the groups being compared divided by 2, following Rosnow and Rosenthal (1996). Black includes African American. Racial
categories exclude Hispanic origin. Students who identifed themselves as being of Hispanic origin were classifed as Hispanic, regardless of their race. High-
poverty schools are those in which 75 percent or more of students are eligible for the federal free or reduced-lunch program. Low-poverty schools are those in
which less than 10 percent of students are eligible. The United States met guidelines for sample participation rates only after substitutes schools were included.
The National Defned Population covered 90 to 95 percent of the National Target
Population. See tables E-18 and E-19 (available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001) for standard deviations of the U.S. and other
countries' student populations in mathematics. See tables E-37 and E-38 (available at http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001) for the
analogous standard deviations in science.
B-1
APPENDIX B
Appendix B: Example Items
B-2
APPENDIX B

1
2
National Target Population does not include all of the nternational Target Population defned by TMSS
(see appendix A).
3
Nearly satisfed guidelines for sample participation rates only after replacement schools were included
(see appendix A).
4
Met guidelines for sample participation rates only after replacement schools were included (see appendix A).
5
6
Kuwait tested the same cohort of students as other countries, but late in 2007, at the beginning of the next
school year.
NOTE: Countries are sorted by 2007 average percent correct. The answer shown illustrates the type of student
response that was given full credit.
SOURCE: nternational Association for the Evaluation of Educational Achievement (EA).
Exhibit B1. Example fourth-grade mathematics item: 2007
M
0
3
1
3
0
1
Content Domain Number
Cognitive Domain Applying
M
0
3
1
3
0
1
Al wanted to fnd how much his cat weighed. He weighed himself and noted that
the scale read 57 kg. He then stepped on the scale holding his cat and found that
it read 62 kg.
What was the weight of the cat in kilograms?
Answer: _______________ kilograms
Country
Percent
full credit
nternational average 60
Chinese Taipei 95
Singapore 87
Hong Kong SAR
1
86
Kazakhstan
2
85
Netherlands
3
85
Japan 83
Lithuania
2
81
Austria 80
Germany 80
Latvia
2
80
Czech Republic 76
Denmark
4
75
Hungary 73
Slovenia 69
Italy 68
Ukraine 68
Norway 67
Sweden 66
Armenia 65
Scotland
4
64
England 63
Australia 61
Slovak Republic 60
United States
4,5
60
Georgia
2
59
New Zealand 53
Tunisia 28
Algeria 23
El Salvador 21
Morocco 19
Colombia 18
Kuwait
6
12
Qatar 9
Yemen 5
B-3
APPENDIX B
1
2
(see appendix A).
3
4
5
Nearly satisfed guidelines for sample participation rates only after replacement schools were included (see
appendix A).
6
school year.
Country
Percent
full credit
Hong Kong SAR
1
91
Slovenia 91
Lithuania
2
89
Denmark
3
88
Scotland
3
88
England 88
Singapore 88
Japan 87
Italy 87
Sweden 86
Australia 85
United States
3,4
85
Slovak Republic 84
Norway 84
Czech Republic 83
Austria 82
Chinese Taipei 81
Hungary 81
Latvia
2
81
New Zealand 81
Netherlands
5
79
Kazakhstan
2
77
Germany 76
Armenia 74
Ukraine 67
Colombia 59
Georgia
2
59
El Salvador 50
Algeria 44
Kuwait
6
40
Morocco 39
Tunisia 38
Qatar 32
Yemen 13
M
0
3
1
2
7
1
M
0
3
1
2
7
1
same size and shape.
Content Domain Geometric Shapes and Measures
Cognitive Domain Knowing
B-4
APPENDIX B

1
2
(see appendix A).
3
(see appendix A).
4
5
6
school year.
M
0
4
1
3
3
6
Class A and B each have 40 students.
Tere are more girls in Class A than in Class B. How many more:
14
!
16
24
!
30
M
0
4
l
3
3
6
Glrls
Glrls 8oys
8oys

0
4
8
l2
l6
20
24
Content Domain Data Display
Cognitive Domain Reasoning
Country
Percent
full credit
Singapore 63
Hong Kong SAR
1
63
Kazakhstan
2
51
Chinese Taipei 47
Lithuania
2
46
Netherlands
3
44
Japan 41
England 40
Slovak Republic 39
United States
4,5
38
Hungary 37
Sweden 37
Latvia
2
37
Australia 36
Slovenia 35
Germany 35
Denmark
4
34
Scotland
4
34
Austria 34
Armenia 33
Ukraine 32
New Zealand 32
Norway 31
Czech Republic 31
Georgia
2
26
Italy 26
Algeria 21
Morocco 15
Tunisia 14
Qatar 13
Kuwait
6
12
Yemen 9
El Salvador 9
Colombia 9
B-5
APPENDIX B
1
2
3
4
(see appendix A).
5
National Defned Population covers less than 90 percent of National Target Population (but at least 77
percent, see appendix A).
6
Kuwait tested the same cohort of students as other countries, but later in 2007, at the beginning of the next
school year.
Country
Percent
full credit
Korea, Rep. of 89
Japan 85
Hong Kong SAR
1,2
82
Chinese Taipei 81
United States
2,3
81
Singapore 81
Sweden 77
England
2
77
Hungary 77
Australia 75
Czech Republic 74
Lithuania
4
74
Malaysia 74
Scotland
2
74
Norway 73
Slovenia 72
Malta 72
Italy 70
Cyprus 70
Thailand 68
Israel
5
66
Turkey 64
Ukraine 63
Romania 62
Bahrain 61
Tunisia 61
Serbia
3,4
60
Bulgaria 59
Kuwait
6
56
Lebanon 55
Colombia 54
Algeria 54
Indonesia 52
Georgia
4
51
Jordan 48
El Salvador 47
Oman 46
Armenia 46
Qatar 44
Egypt 44
Saudi Arabia 41
Botswana 41
Ghana 34
M
0
2
2
0
4
3
Exhibit B4. Example eighth-grade mathematics item: 2007
Content Domain Number
Which circle has approximately the same fraction of its area shaded as the
rectangle above?
!
!
!
M
0
2
2
0
4
3
B-6
APPENDIX B

1
2
3
4
5
(see appendix A).
6
school year.
M
0
4
2
2
6
3
Country
Percent
full credit
Chinese Taipei 68
Korea, Rep. of 68
Singapore 59
Hong Kong SAR
1,2
53
Japan 42
United States
2,3
37
Australia 36
England
2
34
Sweden 34
Slovenia 30
Scotland
2
29
Czech Republic 25
Hungary 24
Israel
4
24
Malta 21
Armenia 21
Italy 19
Norway 18
Turkey 18
Bulgaria 17
Lithuania
5
15
Serbia
3,5
15
Romania 14
Malaysia 14
Thailand 13
Cyprus 11
Ukraine 11
Colombia 9
Georgia
5
8
Indonesia 8
Tunisia 6
Lebanon 5
Jordan 5
Oman 4
Bahrain 4
Saudi Arabia 3
El Salvador 2
Algeria 2
Egypt 2
Kuwait
6
2
Botswana 2
Qatar 2
Ghana 1
Joe knows that a pen costs 1 zed more than a pencil.
His friend bought 2 pens and 3 pencils for 17 zeds.
How many zeds will Joe need to buy 1 pen and 2 pencils?
Show your work.
M
0
4
2
2
6
3
Content Domain Algebra
B-7
APPENDIX B
1
2
3
(see appendix A).
4
5
6
school year.
M
0
3
2
2
9
4
Country
Percent
full credit
Chinese Taipei 86
Korea, Rep. of 82
Japan 81
Hong Kong SAR
1,2
80
Slovenia 80
Lithuania
3
78
Singapore 77
Hungary 74
Malaysia 73
Scotland
2
68
Ukraine 68
Serbia
3,4
67
Malta 65
Lebanon 65
Israel
5
64
England
2
63
Czech Republic 63
Kuwait
6
63
Romania 62
Italy 61
Bahrain 59
Indonesia 59
Oman 59
Bulgaria 58
Egypt 58
Norway 56
Thailand 55
Jordan 54
Armenia 53
Australia 51
Cyprus 51
Algeria 50
Sweden 48
Saudi Arabia 46
United States
2,4
45
Georgia
3
41
Turkey 38
Qatar 38
El Salvador 33
Colombia 30
Botswana 30
Tunisia 26
Ghana 26
M
0
3
2
2
9
4
Two points M and N are shown in the fgure above. John is looking for a point P
such that MNP is an isosceles triangle. Which of these points could be point P?

(3,5)
!
(3,2)

(1,5)
!
(5,1)
x
O
2
3
4
5
6
y
1
M N
1 2 3 4 5 6
Content Domain Geometry
B-8
APPENDIX B

1
2
3
National Target Population does not include all of the nternational Target Population defned by TMSS (see
appendix A).
4
5
6
school year.
M
0
4
2
2
0
Country
Percent
full credit
Korea, Rep. of 76
Singapore 75
Chinese Taipei 70
Japan 68
Hong Kong SAR
1,2
66
Sweden 56
Lithuania
3
51
Hungary 48
Czech Republic 45
England
2
45
Slovenia 44
Norway 41
United States
2,4
40
Malta 40
Australia 38
Scotland
2
38
Malaysia 35
Cyprus 33
Israel
5
31
Romania 29
Serbia
3,4
27
Italy 27
Thailand 26
Ukraine 24
Bulgaria 23
Jordan 22
Turkey 17
Lebanon 15
Georgia
3
15
Indonesia 14
Armenia 12
Colombia 10
Egypt 10
Bahrain 9
Tunisia 8
Botswana 7
Oman 6
El Salvador 4
Qatar 4
Saudi Arabia 3
Algeria 3
Kuwait
6
3
Ghana 2
M
0
4
2
2
2
0

Make a bar chart showing the number of students in each category in the pie
chart.
Popularity of Rock Bands
Dreadlocks 30%
Red Hot Peppers 25%
Stone Cold 45%
200
150
100
50
0
Red Hot Peppers Stone Cold Dreadlocks
N
u
m
b
e
r

o
f

S
t
u
d
e
n
t
s
Popularity of Rock Bands
Content Domain Data and Chance
B-9
APPENDIX B
# Rounds to zero.
1
2
3
(see appendix A).
4
(see appendix A).
5
school year.
6
S
0
4
1
0
1
8
Exhibit B8. Example fourth-grade science item: 2007
Country
Percent
full credit
Japan 93
Slovak Republic 66
Singapore 64
Chinese Taipei 61
Hungary 56
Australia 56
Sweden 53
New Zealand 52
United States
1,2
48
Denmark
1
45
Lithuania
3
43
Czech Republic 40
Latvia
3
39
Germany 38
Netherlands
4
37
Austria 36
England 36
Scotland
1
33
Kuwait
5
32
Italy 32
Kazakhstan
3
26
Slovenia 25
Hong Kong SAR
6
22
Armenia 21
Norway 20
Ukraine 18
Georgia
3
16
Qatar 7
El Salvador 5
Colombia 4
Algeria 1
Tunisia 1
Yemen #
Morocco #
Te diagram below shows the life cycle of a moth.
Write the name of each stage in the boxes provided.
One stage has been completed for you.
adult moth
Content Domain Life Science
B-10
APPENDIX B

S
0
3
1
0
7
8
1
2
(see appendix A).
3
4
5
(see appendix A).
6
school year.
Country
Percent
full credit
Japan 92
Singapore 88
Hong Kong SAR
1
75
Slovenia 70
Czech Republic 69
Latvia
2
69
Hungary 67
Kazakhstan
2
67
England 67
United States
3,4
66
Netherlands
5
65
Chinese Taipei 65
Italy 65
Ukraine 65
Germany 64
Austria 63
Lithuania
2
63
Slovak Republic 63
Denmark
3
62
Australia 59
Scotland
3
58
New Zealand 58
Armenia 56
Sweden 55
Norway 53
Georgia
2
41
Qatar 40
Colombia 39
El Salvador 36
Algeria 35
Kuwait
6
35
Tunisia 31
Morocco 24
Yemen 20
Beans are fxed on a metal ruler with butter as shown in the fgure above. Te
ruler is heated at one end. In which order will the beans fall of:

1, 2, 3, 4, 3

3, 4, 3, 2, 1

1, 3, 3, 4, 2

All at the same time
l 2 3 4 5
8eans
Metal Puler
Candle
Content Domain Physical Science
B-11
APPENDIX B
S
0
3
1
0
8
1
1
2
3
(see appendix A).
4
5
(see appendix A).
6
school year.
Country
Percent
full credit
Chinese Taipei 90
Singapore 88
Japan 88
Hong Kong SAR
1
82
Australia 80
England 78
Scotland
2
76
Latvia
3
76
United States
2,4
75
Netherlands
5
75
Kazakhstan
3
74
Sweden 72
Slovak Republic 72
New Zealand 70
Italy 70
Slovenia 68
Hungary 68
Denmark
2
68
Lithuania
3
67
Czech Republic 64
Austria 63
Germany 57
Norway 53
Ukraine 53
Georgia
3
49
Armenia 44
Colombia 37
Tunisia 29
Kuwait
6
24
El Salvador 23
Qatar 20
Algeria 16
Yemen 15
Morocco 12
Content Domain Earth Science
S
0
3
1
0
8
1
A ribbon is tied to a pole to measure the wind strength as shown below.
Write the numbers 1, 2, 3, and 4 in the correct order that shows the wind
strength from the strongest to weakest.
Answer : _____, _____, _____, _____
1 2 3 4
B-12
APPENDIX B

Exhibit B11. Example eighth-grade science item: 2007
S
0
3
2
3
8
5
1
2
3
(see appendix A).
4
school year.
5
6
Country
Percent
full credit
Chinese Taipei 91
Hong Kong SAR
1,2
86
Thailand 84
Turkey 82
Hungary 78
Lithuania
3
76
Slovenia 76
Japan 75
Czech Republic 74
Armenia 73
Cyprus 72
Jordan 72
Saudi Arabia 72
Kuwait
4
70
Bulgaria
5
70
Korea, Rep. of 70
Georgia
3
69
Israel
5
68
Serbia
3,6
67
Bahrain 66
Romania 66
Italy 65
Singapore 60
Lebanon 60
Algeria 58
Australia 56
Indonesia 55
Malaysia 55
Colombia 54
Ukraine 54
Botswana 53
United States
2,6
53
El Salvador 53
Sweden 53
England
2
53
Norway 51
Qatar 49
Oman 49
Tunisia 48
Malta 44
Scotland
2
41
Egypt 40
Ghana 31
Which characteristic is found ONLY in mammals:
eyes that detect color

!
glands that make milk
skin that absorbs oxygen

!
bodies that are protected by scales
Content Domain Biology
B-13
APPENDIX B
S
0
4
2
1
0
6
1
(see appendix A).
2
3
4
5
6
school year.
Country
Percent
full credit
Japan 65
Korea, Rep. of 51
Chinese Taipei 51
Italy 46
Czech Republic 43
Slovenia 39
Hungary 39
Sweden 38
Singapore 37
Lithuania
1
37
Israel
2
33
Hong Kong SAR
3,4
30
Ukraine 29
England
4
28
Armenia 28
Malta 27
Australia 25
Norway 25
Thailand 25
United States
4,5
24
Cyprus 24
Scotland
4
22
Tunisia 22
Romania 22
Serbia
1,5
20
Jordan 19
Bulgaria
2
19
Bahrain 18
Lebanon 18
Colombia 16
Turkey 16
Malaysia 14
El Salvador 9
Oman 9
Egypt 8
Algeria 7
Kuwait
6
7
Indonesia 6
Saudi Arabia 5
Georgia
1
4
Qatar 3
Ghana 3
Botswana 1
Te mass of substances A and B are measured on a balance, as shown in
Figure 1. Substance B is put into the beaker and substance C is formed. Te empty
beaker is put back on the balance, as shown in Figure 2.
Te scale in Figure 1 shows a mass of 110 grams.
What will it show in Figure 2:
(Check one box.)
More than 110 grams
110 grams
Less than 110 grams

Explain your answer.
Plgure l Plgure 2
l l 0 g
A
C
8
! ! ! g
Content Domain Chemistry
B-14
APPENDIX B

S
0
3
2
3
9
2
1
2
3
4
(see appendix A).
5
6
school year.
Country
Percent
full credit
Singapore 96
United States
1,2
91
Bulgaria
3
91
Korea, Rep. of 91
Hungary 90
Ukraine 90
Lithuania
4
89
Slovenia 88
Turkey 88
Serbia
2,4
87
Italy 87
Indonesia 86
Czech Republic 86
Australia 86
Lebanon 86
Malta 86
England
1
85
Malaysia 84
Scotland
1
83
Georgia
4
82
Sweden 82
Japan 82
Chinese Taipei 81
Armenia 80
Romania 79
Jordan 79
Norway 76
Hong Kong SAR
1,5
75
Thailand 74
Cyprus 72
Algeria 71
Israel
3
71
Bahrain 70
Egypt 70
Colombia 70
El Salvador 68
Kuwait
6
67
Botswana 64
Ghana 63
Saudi Arabia 61
Oman 58
Qatar 55
Tunisia 49
Work is done when an object is moved in the direction of an applied force. A
person performed diferent tasks as shown in the diagrams below. In which
diagram is the person doing work:

!

Holdlng a
heavy ob[ect
Pushlng
agalnst a wall
Pushlng a
cart up a ramp
Peadlng a book
Content Domain Physics
B-15
APPENDIX B
S
O
2
2
2
4
4
Country
Percent
full credit
Korea, Rep. of 48
Singapore 47
Hong Kong SAR
1,2
42
Lithuania
3
42
Japan 39
Slovenia 38
England
2
38
Chinese Taipei 35
Hungary 34
Australia 32
Jordan 30
Scotland
2
28
Italy 27
Czech Republic 25
Sweden 24
United States
2,4
23
Bulgaria 23
Malta 22
Norway 20
Armenia 20
Romania 19
Ukraine 18
Thailand 18
Bahrain 17
Israel
5
17
Egypt 17
Serbia
3,4
16
Malaysia 16
Algeria 13
Georgia
3
12
Indonesia 11
Oman 11
Turkey 10
Lebanon 9
Saudi Arabia 8
Cyprus 7
Colombia 7
Kuwait
6
5
Tunisia 5
El Salvador 4
Botswana 3
Ghana 3
Qatar 2
1
2
3
(see appendix A).
4
5
6
school year.
Content Domain Earth science
S
0
2
2
2
4
4
coal burns, sulfur that is present in the coal reacts with oxygen to form sulfur
How does this process result in acid rain?
C-1
APPENDIX B
APPENDIX C
How Does the Content of TIMSS
Compare with That of Other
Assessments?
It is often asked how TIMSS compares with other assessments
that measure similar subjects and populations, in particular, the
National Assessment of Educational Progress (NAEP). The
various assessments in which the United States participates,
including NAEP, TIMSS, and the Program for International
Student Assessment (PISA), vary in some obvious ways, such
as the goals of the studies (and whether they are focused
on national objectives or shared international objectives);
the precise defnitions of the populations they are measuring;
the degree of precision required for estimates and resulting
different sample sizes; their frameworks and specifcations;
and, for TIMSS and PISA, the different groups of countries that
participate. However, there also are differences that are less
obvious and that can only be found by comparing the content
of the assessments through examination of the items.
In a recent comparison study, TIMSS 2007 mathematics
and science items were classifed to the NAEP assessment
frameworks (2005/2007 for mathematics and 2005 for
science) in terms of content topics and objectives, grade-level
expectations, and cognitive dimensions in order to allow a
direct comparison of the two assessments. In other studies
(one past and one recent), PISA mathematics and science
items also were placed on the NAEP frameworks, which
allows content comparison of the TIMSS and PISA via the
national frameworks. This section highlights some of the main
fndings; additional details on the comparison study will be
included in a technical report to be released with the U.S.
national TIMSS dataset at a later date.
Although the TIMSS and NAEP fourth- and eighth-grade
mathematics frameworks are organized similarly and, broadly,
cover the same range of content (e.g., number, measurement,
geometry, algebra, and data), there are some differences
in the relative emphases on the different topic areas between
the assessments. For example, at the fourth grade, NAEP has
a greater percentage of items that focus on measurement
topics than does TIMSS (21 versus 14 percent, respectively),
whereas TIMSS has a greater percentage of items focusing
on geometry than NAEP (20 versus 16 percent, respectively).
There are similar examples at the eighth-grade level among
TIMSS, NAEP, and PISA, which focuses on an older group
of students.
As with mathematics, the TIMSS and NAEP science
frameworks cover the same range of major content areas,
including Earth, physical (including chemistry), and life
sciences. However, again, there are differences in the
distribution of items even at the broad content level.
These differences tend to be larger for science than for
mathematics, with differences between the two assessments
in the percentage of items in a given content area reaching
14 percent or more in Earth science and 8 percent or more in
physical sciences at both grades. As an example, 37 percent
of the TIMSS fourth-grade assessment is devoted to physical
science compared to 29 percent of NAEPs fourth-grade
assessment. This pattern continues at eighth grade. NAEP,
on the other hand, has higher percentages of Earth science
items than does TIMSS at both grades. PISAs focus (with
47 percent of items) tends to be on life science.
There is one other notable fnding from the comparison study
of science assessments. Twelve and 20 percent of fourth-
and eighth-grade TIMSS items, respectively, could not
be placed within the more detailed objectives of the NAEP
framework, indicating that there are some differences at
the item level between the two assessments, not just in
distribution of items across content areas.
Appendix C: TIMSS-NAEP Comparison
D-1
APPENDIX D
Online Resources
The NCES website (http://nces.ed.gov/timss) provides
background information on the TMSS surveys, copies of
NCES publications that relate to TIMSS, information for
educators about ways to use TMSS in the classroom, and
data fles. The international TMSS website (http://www.
timss.org) includes extensive information on the study,
including the international reports and databases.
NCES Publications
The following publications are intended to serve as examples
of some of the numerous reports that have been produced
in relation to the Trends in International Mathematics and
Science Study (TMSS) by NCES. All of the publications
listed here are available at http://nces.ed.gov/timss.
TIMSS 2003 Achievement Report
Gonzales, P., Guzmn, J.C., Partelow, L., Pahlke, E., Jocelyn,
L., Kastberg, D., and Williams, T. (2004). Highlights From
the Trends in International Mathematics and Science
Study (TIMSS) 2003 (NCES 2005005). National Center
for Education Statistics, U.S. Department of Education.
Washington, DC.
TIMSS 1999 Achievement Reports
Gonzales, P., Calsyn, C., Jocelyn, L., Mak, K., Kastberg, D.,
Arafeh, S., Williams, T., and Tsen, W. (2000). Pursuing
Excellence: Comparisons of International Eighth-Grade
Mathematics and Science Achievement From a U.S.
Perspective, 1995 and 1999 (NCES 2001028). National
Center for Education Statistics, U.S. Department of
Education. Washington, DC.
Gonzales, P., Calsyn, C., Jocelyn, L., Mak, D., Kastberg, D.,
Arafeh, S., Williams, T., and Tsen, W. (2000). Highlights
From TIMSS-R (NCES 2001027). National Center for
Education Statistics, U.S. Department of Education.
Washington, DC.
National Center for Education Statistics, U.S. Department
of Education. (1997). Pursuing Excellence: A Study of U.S.
Fourth-Grade Mathematics and Science Achievement in
International Context (NCES 97255). National Center for
Education Statistics, U.S. Department of Education.
Washington, DC.
Peak, L. (1996). Pursuing Excellence: A Study of U.S.
Eighth-Grade Mathematics and Science Teaching,
Learning, Curriculum, and Achievement in International
Context (NCES 97198). National Center for Education
Statistics, U.S. Department of Education. Washington, DC.
Takahira, S., Gonzales, P., Frase, M., and Salganik, L.H.
(1998). Pursuing Excellence: A Study of U.S. Twelfth-
Grade Mathematics and Science Achievement in
International Context (NCES 98049). National Center
for Education Statistics, U.S. Department of Education.
Washington, DC.
TIMSS Videotape Classroom
Study Reports
Hiebert, J., Gallimore, R., Garnier, H., Givvin Bogard, K.,
Hollingsworth, H., Jacobs, J., Miu-Ying Chui, A., Wearne,
D., Smith, M., Kersting, N., Manaster, A., Tseng, E.,
Etterbeek, W., Manaster, C., Gonzales, P., and Stigler, J.
(2003). Teaching Mathematics in Seven Countries:
Results From the TIMSS 1999 Video Study (NCES 2003
013 Revised). National Center for Education Statistics,
nstitute of Education Sciences, U.S. Department of
National Center for Education Statistics, U.S. Department
of Education. (2000). Highlights From the TIMSS
Videotape Classroom Study (NCES 2000094). National
Center for Education Statistics, U.S. Department of
Roth, K.J., Druker, S.L., Garnier, H., Lemmens, M., Chen, C.,
Kawanaka, T., Rasmussen, D., Trubacova, S., Warvi, D.,
Okamoto, Y., Gonzales, P., Stigler, J., and Gallimore, R.
(2006). Teaching Science in Five Countries: Results From
the TIMSS 1999 Video Study (NCES 2006-011). National
Center for Education Statistics, Institute of Education
Sciences, U.S. Department of Education. Washington, DC.
Stigler, J.W., Gonzales, P., Kawanaka, T., Knoll, S., and
Serrano, A. (1999). The TIMSS Videotape Classroom
Study: Methods and Findings From an Exploratory
Research Project on Eighth-Grade Mathematics
Instruction in Germany, Japan, and the United States
(NCES 1999074). National Center for Education
Statistics, U.S. Department of Education. Washington, DC.
Appendix D: Online Resources and Publications
D-2
APPENDIX D

IEA Publications
The following publications are intended to serve as examples
of some of the numerous reports that have been produced
in relation to TMSS by the EA. All of the publications
listed here are available at http://timss.bc.edu.
Martin, M.O., Mullis, .V.S., and Foy, P. (2008). TIMSS 2007
International Science Report: Findings From IEAs Trends
in International Mathematics and Science Study at the
Eighth and Fourth Grades. Chestnut Hill, MA: Boston
College.
Mullis, .V.S., Martin, M.O., and Foy, P. (2008). TIMSS 2007
International Mathematics Report: Findings From IEAs
Trends in International Mathematics and Science Study at
the Eighth and Fourth Grades. Chestnut Hill, MA: Boston
College.
Martin, M.O., Mullis, .V.S., Gonzlez, E.J., and Chrostowski,
S.J. (2004). TIMSS 2003 International Science Report:
Findings From IEAs Trends in International Mathematics
and Science Study at the Eighth and Fourth Grades.
Mullis, .V.S., Martin, M.O., Gonzlez, E.J., and Chrostowski,
S.J. (2004). TIMSS 2003 International Mathematics
Report: Findings From IEAs Trends in International
Mathematics and Science Study at the Eighth and Fourth
Grades. Chestnut Hill, MA: Boston College.
Martin, M.O., Mullis, .V.S., Gonzlez, E.J., Gregory, K.D.,
Smith, T.A., Chrostowski, S.J., Garden, R.A., and
O'Connor, K.M. (2000). TIMSS 1999 International Science
Report: Findings From IEAs Repeat of the Third
International Mathematics and Science Study at the
Eighth Grade. Chestnut Hill, MA: Boston College.
Mullis, .V.S., Martin, M.O., Gonzlez, E.J., Gregory, K.D.,
Garden, R.A., O'Connor, K.M., Chrostowski, S.J., and
Smith, T.A. (2000). TIMSS 1999 International Mathematics
Report: Findings From IEAs Repeat of the Third
International Mathematics and Science Study at the
Eighth Grade. Chestnut Hill, MA: Boston College.
Beaton, A.E., Martin, M.O., Mullis, .V.S., Gonzlez, E.J., Smith,
T.A., and Kelly, D.L. (1996). Science Achievement in the
Middle School Years: IEAs Third International Mathematics
and Science Study. Chestnut Hill, MA: Boston College.
Beaton, A.E., Mullis, .V.S., Martin, M.O., Gonzlez, E.J., Kelly,
D.L., and Smith, T.A. (1996). Mathematics Achievement in the
Middle School Years: IEAs Third International Mathematics
and Science Study. Chestnut Hill, MA: Boston College.
Martin, M.O., Mullis, .V.S., Beaton, A.E., Gonzlez, E.J.,
Smith, T.A., and Kelly, D.L. (1997). Science Achievement in
the Primary School Years: IEAs Third International
Mathematics and Science Study. Chestnut Hill, MA:
Boston College.
Mullis, .V.S., Martin, M.O., Beaton, A.E., Gonzlez, E.J., Kelly,
D.L., and Smith, T.A. (1997). Mathematics Achievement in
the Primary School Years: IEAs Third International
Mathematics and Science Study. Chestnut Hill, MA: Boston
College.
Mullis, .V.S., Martin, M.O., Beaton, A.E., Gonzlez, E.J.,
Kelly, D.L., and Smith, T.A. (1998). Mathematics and
Science Achievement in the Final Year of Secondary
School: IEAs Third International Mathematics and
Science Study. Chestnut Hill, MA: Boston College.
TIMSS Technical Reports
and Frameworks
Martin, M.O., and Kelly, D.L. (Eds.). (1996). Third International
Mathematics and Science Study Technical Report,
Volume I: Design and Development. Chestnut Hill, MA:
Boston College.
Volume II: Implementation and Analysis, Primary and
Middle School Years. Chestnut Hill, MA: Boston College.
Volume III: Implementation and Analysis, Final Year of
Secondary School. Chestnut Hill, MA: Boston College
Martin, M.O., Gregory, K.D., and Stemler, S.E. (2000). TIMSS
Martin, M.O., Mullis, .V.S. and Chrostowski, S.J. (2004).
TIMSS 2003 Technical Report: Findings From IEAs
Trends in International Mathematics and Science Study at
the Eighth and Fourth Grades. Chestnut Hill, MA: Boston
College.
Mullis, .V.S., Martin, M.O., Smith, T.A., Garden, R.A., Gregory,
K.D., Gonzlez, E.J., Chrostowski, S.J., and O'Connor, K.M.
(2003). TIMSS Assessment Frameworks and Specifcations
2003: 2nd Edition. Chestnut Hill, MA: Boston College.
Mullis, .V.S., Martin, M.O., Ruddock, G.J., O'Sullivan, C.Y.,
Arora, A., and Erberber, E. (2005). TIMSS 2007 Assessment
Frameworks. Chestnut Hill, MA: Boston College.
Olson, J.F., Martin, M.O., and Mullis, .V.S. (2008). TIMSS
TIMSS Encyclopedia
Mullis, .V.S., Martin, M.O., Olson, J.F., Berger, D.R., Milne, D.,
and Stanco, G.M. (Eds.). (2008). TIMSS 2007 Encyclopedia:
A Guide to Mathematics and Science Education Around the
World. Chestnut Hill, MA: Boston College.

Timmy 029

Uploaded by

Copyright:

Available Formats

Timmy 029

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Timmy 029

Uploaded by

Copyright:

Available Formats

Highlights From TIMSS 2007:

Mathematics and Science Achievement of U.S. Fourth-

eyes that detect color

skin that absorbs oxygen

More than 110 grams

Less than 110 grams

You might also like