How Standardized Tests Are Scored: Human Scoring

How Standardized Tests Are Scored
ETS uses both human scoring and automated scoring to score standardized tests.
Human Scoring
Human-scored tests are scored manually rather than by machine and require human judgment.
Since test scores can impact future student learning or opportunities such as placement, licensure
or professional advancement, successful scoring is critical. At ETS, test scorers are carefully
selected and go through rigorous training to ensure the accuracy of their work.
Automated Scoring
There are two types of automated scoring used at ETS:
• machine scoring of multiple-choice test questions
• automated scoring of open-ended responses, such as short written answers, essays and
recorded speech
We have developed a number of automated scoring technologies through extensive research
in Natural Language Processing (NLP) that spans more than a decade. They include:
• the e-rater® engine
• the c-rater™ system
• the m-rater engine
• the SpeechRater℠ engine
• the TextEvaluator® tool

We are innovators in the field of automated scoring and have incorporated these technologies
into many of our testing programs, products and services, including the GRE® General Test,
the TOEFL iBT® test and the Criterion® Online Writing Evaluation service. For more information,
download our Automated Scoring Technologies Brochure (PDF).
A Note About the Use of Standardized Test Scores

A standardized test score is a measurement of a test-taker's knowledge of a subject or a set of
skills that can be used as a basis for comparison, but only if used properly. R&D Connections, a
series of publications created by ETS Research & Development, can help you understand the
role of scores in standardized testing and how they should be used.
See also:
• What We Do: Test Scoring
• e-rater Scoring Engine
• Automated Scoring Technologies Brochure (PDF)
• A Culture of Evidence: An Evidence-Centered Approach to Accountability for Student
Learning Outcomes
https://www.ets.org/understanding_testing/scoring
STANDARDIZED TEST
LAST UPDATED: 11.12.15
A standardized test is any form of test that (1) requires all test takers to
answer the same questions, or a selection of questions from a common bank of
questions, in the same way, and that (2) is scored in a “standard” or
consistent manner, which makes it possible to compare the relative
performance of individual students or groups of students. While different
types of tests and assessments may be “standardized” in this way, the
term is primarily associated with large-scale tests administered to
large populations of students, such as a multiple-choice test given to all the
eighth-grade public-school students in a particular state.
In addition to the familiar multiple-choice format, standardized tests can
include true-false questions, short-answer questions, essay questions, or a
mix of question types. While standardized tests were traditionally presented
on paper and completed using pencils, and many still are, they are
increasingly being administered on computers connected to online programs
(for a related discussion, see computer-adaptive test). While standardized
tests may come in a variety of forms, multiple-choice and true-false formats
are widely used for large-scale testing situations because computers can
score them quickly, consistently, and inexpensively. In contrast, open-ended
essay questions need to be scored by humans using a common set
of guidelines or rubrics to promote consistent evaluations from essay to
essay—a less efficient and more time-intensive and costly option that is also
considered to be more subjective. (Computerized systems designed to
replace human scoring are currently being developed by a variety of
companies; while these systems are still in their infancy, they are
nevertheless becoming the object of growing national debate.)
While standardized tests are a major source of debate in the United States,
many test experts and educators consider them to be a fair and objective
method of assessing the academic achievement of students, mainly because
the standardized format, coupled with computerized scoring, reduces the
potential for favoritism, bias, or subjective evaluations. On the other hand,
subjective human judgment enters into the testing process at various stages
—e.g., in the selection and presentation of questions, or in the subject
matter and phrasing of both questions and answers. Subjectivity also enters
into the process when test developers set passing scores—a decision that
can affect how many students pass or fail, or how many achieve a level of
performance considered to be “proficient.” For more detailed discussions of
these issues, see measurement error, test accommodations, test
bias and score inflation.
Standardized tests may be used for a wide variety of educational purposes.
For example, they may be used to determine a young child’s readiness for
kindergarten, identify students who need special-education services or
specialized academic support, place students in different academic
programs or course levels, or award diplomas and other educational
certificates. The following are a few representative examples of the most
common forms of standardized tests:
• Achievement tests are designed to measure the knowledge and
skills students learned in school or to determine the academic progress
they have made over a period of time. The tests may also be used to
evaluate the effectiveness of schools and teachers, or identify the
appropriate academic placement for a student—i.e., what courses or
programs may be deemed most suitable, or what forms of academic
support they may need. Achievement tests are “backward-looking” in
that they measure how well students have learned what they were
expected to learn.
• Aptitude tests attempt to predict a student’s ability to succeed in an
intellectual or physical endeavor by, for example, evaluating
mathematical ability, language proficiency, abstract reasoning, motor
coordination, or musical talent. Aptitude tests are “forward-looking” in
that they typically attempt to forecast or predict how well students will
do in a future educational or career setting. Aptitude tests are often a
source of debate, since many question their predictive accuracy and
value.
• College-admissions tests are used in the process of deciding which
students will be admitted to a collegiate program. While there is a great
deal of debate about the accuracy and utility of college-admissions tests,
and many institutions of higher education no longer require applicants to
take them, the tests are used as indicators of intellectual and academic
potential, and some may consider them predictive of how well an
applicant will do in a postsecondary program.
• International-comparison tests are administered periodically to
representative samples of students in a number of countries, including
the United States, for the purposes of monitoring achievement trends in
individual countries and comparing educational performance across
countries. A few widely used examples of international-comparison tests
include the Programme for International Student
Assessment (PISA), the Progress in International Reading Literacy
Study (PIRLS), and the Trends in International Mathematics and
Science Study (TIMSS).
• Psychological tests, including IQ tests, are used to measure a
person’s cognitive abilities and mental, emotional, developmental, and
social characteristics. Trained professionals, such as school psychologists,
typically administer the tests, which may require students to perform a
series of tasks or solve a set of problems. Psychological tests are often
used to identify students with learning disabilities or other special needs
that would qualify them for specialized services.
Reform
Following a wide variety of state and federal laws, policies, and regulations
aimed at improving school and teacher performance, standardized
achievement tests have become an increasingly prominent part of public
schooling in the United States. When focused on reforming schools and
improving student achievement, standardized tests are used in a few
primary ways:
• To hold schools and educators accountable for educational
results and student performance. In this case, test scores are used
as a measure of effectiveness, and low scores may trigger a variety of
consequences for schools and teachers. For a more detailed discussion,
see high-stakes test.
• To evaluate whether students have learned what they are
expected to learn, such as whether they have met state learning
standards. In this case, test scores are seen as a representative
indicator of student achievement.
• To identify gaps in student learning and academic progress. In
this case, test scores may be used, along with other information about
students, to diagnose learning needs so that educators can provide
appropriate services, instruction, or academic support.
• To identify achievement gaps among different student groups,
including students of color, students who are not proficient in English,
students from low-income households, and students with physical or
learning disabilities. In this case, exposing and highlighting achievement
gaps may be seen as an essential first step in the effort to educate all
students well, which can lead to greater public awareness and changes in
educational policies and programs.
• To determine whether educational policies are working as
intended. In this case, elected officials and education policy makers may
rely on standardized-test results to determine whether their laws and
policies are working or not, or to compare educational performance from
school to school or state to state. They may also use the results to
persuade the public and other elected officials that their policies are in
the best interest of children and society.
Debate
While debates about standardized testing are wide-ranging, nuanced, and
sometimes emotionally charged, many debates tend to be focused on the
ways in which the tests are used, and whether they present reliable or
unreliable evaluations of student learning, rather than on whether
standardized testing is inherently good or bad (although there is certainly
debate on this topic as well). Most test developers and testing experts, for
example, caution against using standardized-test scores as an exclusive
measure of educational performance, although many would also contend
that test scores can be a valuable indicator of performance if used
appropriately and judiciously. Generally speaking, standardized testing is
more likely to become an object of debate and controversy when test scores
are used to make consequential decisions about educational policies,
schools, teachers, and students. The tests are less likely to be contentious
when they are used to diagnose learning needs and provide students with
better services—although the line separating these two purposes is
notoriously fuzzy in practice (thus, the ongoing debates).
While an exhaustive discussion of standardized-testing debates is beyond
the scope of this resource, the following questions will illustrate a few of the
major issues commonly discussed and debated in the United States:
• Are numerical scores on a standardized test misleading indicators of
student learning, since standardized tests can only evaluate a narrow
range of achievement using inherently limited methods? Or do the scores
provide accurate, objective, and useful evidence of school, teacher, or
student performance? (Standardized tests don’t measure everything
students are expected to learn in school. A test with 50 multiple-choice
questions, for example, can’t possibly measure all the knowledge and
skills a student was taught, or is expected to learn, in a particular subject
area, which is one reason why some educators and experts caution
against using standardized-test scores as the only indicator of
educational performance and success.)
• Are standardized tests fair to all students because every student takes
the same test and is evaluated in the same way? Do the tests have
inherent biases that may disadvantage certain groups, such as students
of color, students who are unfamiliar with American cultural conventions,
students who are not proficient in English, or students with disabilities
that may affect their performance?
• Is the use of standardized tests providing valuable information that
educators and school leaders can use to improve instructional quality? Is
the pervasive overuse of testing actually taking up valuable instructional
time that could be better spent teaching students more content and
skills?
• Do the benefits of standardized testing (consistent data on school and
student performance that can be used to inform efforts to improve
schools and teaching) outweigh the costs (the money spent on
developing the tests and analyzing the results, the instructional time
teachers spend prepping students, or the time students spend taking the
test)?
• Do math and reading test scores, for example, provide a full and
accurate picture of school, teacher, and student performance? Do
standardized tests focus too narrowly on a few academic subjects?
• Does the narrow range of academic content evaluated by standardized
tests cause teachers to focus too much on test preparation and a few
academic subjects (a practice known as “teaching to the test”) at the
expense of other worthwhile educational pursuits, such as art, music,
health, physical education, or 21st century skills?
• Do standardized tests, and the consequences attached to low scores,
hold schools, educators, and students to higher standards and improve
the quality of public education? Do the tests create conditions that
undermine effective education, such as cheating, unhealthy forms of
competition, or unjustly negative perceptions of public schooling?
• Should some of the most important decisions in public education, such
as whether to reduce or increase school funding or fire teachers and
principals, be made entirely or primarily on the basis of test scores?
Are standardized-test scores, which could potentially be misleading or
inaccurate, too limited a measure to use as a basis for such
consequential decisions?
https://www.edglossary.org/standardized-test/
Standardized test
From Wikipedia, the free encyclopedia
Young adults in Poland sit for their Matura exams. The Matura is standardized so that universities can easily
compare results from students across the entire country.
A standardized test is a test that is administered and scored in a consistent, or "standard", manner.
Standardized tests are designed in such a way that the questions, conditions for administering,
scoring procedures, and interpretations are consistent[1] and are administered and scored in a
predetermined, standard manner.[2]
Any test in which the same test is given in the same manner to all test takers, and graded in the
same manner for everyone, is a standardized test. Standardized tests do not need to be high-stakes
tests, time-limited tests, or multiple-choice tests. The questions can be simple or complex. The
subject matter among school-age students is frequently academic skills, but a standardized test can
be given on nearly any topic, including driving tests, creativity, personality, professional ethics, or
other attributes.
The opposite of standardized testing is non-standardized testing, in which either significantly
different tests are given to different test takers, or the same test is assigned under significantly
different conditions (e.g., one group is permitted far less time to complete the test than the next
group) or evaluated differently (e.g., the same answer is counted right for one student, but wrong for
another student).
Most everyday quizzes and tests taken by students typically meet the definition of a standardized
test: everyone in the class takes the same test, at the same time, under the same circumstances,
and all of the students are graded by their teacher in the same way. However, the term standardized
test is most commonly used to refer to tests that are given to larger groups, such as a test taken by
all adults who wish to acquire a license to have a particular kind of job, or by all students of a certain
age.
Standardized tests are perceived as being fairer than non-standardized tests, because everyone
gets the same test and the same grading system. This is fairer and more objective than a system in
which some students get an easier test and others get a more difficult test. The consistency also
permits more reliable comparison of outcomes across all test takers, because everyone is taking the
same test.[3] The prevalence of standardized testing in formal education has also been criticized for
many reasons.
Definition
The definition of a standardized test has changed somewhat over time.[4] In 1960, standardized tests
were defined as those tests in which the conditions and content were equal for everyone taking the
test, regardless of when, where, or by whom the test was given or graded. The purpose of this
standardization is to make sure that the scores reliably indicate the abilities or skills being measured,
and not other things, such as different instructions about what to do if the test taker does not know
the answer to a question.[4]
By the beginning of the 21st century, the focus shifted away from a strict sameness of conditions
towards equal fairness of conditions.[4] For example, a test taker with a broken wrist might write more
slowly because of the injury, and it would be more fair, and produce a more reliable understanding of
the test taker's actual knowledge, if that person were given a few more minutes to write down the
answers to a test. However, if the purpose of the test is to see how quickly the student could
write, then this would become a modification of the content, and no longer a standardized test.
History
China
Main article: Imperial examination
The earliest evidence of standardized testing was in China, during the Han Dynasty,[5] where
the imperial examinations covered the Six Arts which included music, archery, horsemanship,
arithmetic, writing, and knowledge of the rituals and ceremonies of both public and private life.
These exams were used to select employees for the state bureaucracy.
Later, sections on military strategies, civil law, revenue and taxation, agriculture and geography were
added to the testing. In this form, the examinations were institutionalized for more than a millennium.
Today, standardized testing remains widely used, most famously in the Gaokao system.
UK
Standardized testing was introduced into Europe in the early 19th century, modeled on the
Chinese mandarin examinations,[6] through the advocacy of British colonial administrators, the most
"persistent" of whom was Britain's consul in Guangzhou, China, Thomas Taylor Meadows.[6]
Meadows warned of the collapse of the British Empire if standardized testing was not implemented
throughout the empire immediately.[6]
Prior to its adoption, standardized testing was not traditionally a part of Western pedagogy; based
on the skeptical and open-ended tradition of debate inherited from Ancient Greece, Western
academia favored non-standardized assessments using essays written by students. Because of
this, the first European implementation of standardized testing did not occur in Europe proper,
but in British India.[7] Inspired by the Chinese use of standardized testing, in the early 19th century,
British "company managers hired and promoted employees based on competitive examinations in
order to prevent corruption and favoritism."[7] This practice of standardized testing was later adopted
in the late 19th century by the British mainland. The parliamentary debates that ensued made many
references to the "Chinese mandarin system."[6]
It was from Britain that standardized testing spread, not only throughout the British Commonwealth,
but to Europe and then America.[6] Its spread was fueled by the Industrial Revolution. The increase in
number of school students during and after the Industrial Revolution, as a result of compulsory
education laws, decreased the use of open-ended assessment, which was harder to mass-produce
and assess objectively due to its intrinsically subjective nature. For instance, measurement error is
easy to determine in standardized testing, whereas in open-ended assessment, graders have more
individual discretion and therefore are more likely to produce unfair results through unconscious
bias. When the score depends upon the graders' individual preferences, then the result an individual
student receives depends upon who grades the test.
More recently, standardized testing has been shaped in part by the ease and low cost of grading of
multiple-choice tests by computer. Though the process is more difficult than grading multiple-choice
tests electronically, essays can also be graded by computer. In other instances, essays and other
open-ended responses are graded according to a pre-determined assessment rubric by trained
graders. For example, at Pearson, all essay graders have four-year university degrees, and a
majority are current or former classroom teachers.[8]
United States
Further information: List of standardized tests in the United States
Standardized testing has been a part of American education since the 1800s, but the widespread
reliance on standardized testing is largely a 20th-century phenomenon. For instance, the College Entrance
Examination Board did not begin standardized testing in connection with higher education until 1900.
This test was implemented with the idea of creating standardized admissions for elite universities in the
northeastern United States. Originally, the test was also meant for top boarding schools in order to
standardize curriculum.[9] The Army Alpha and Beta tests, developed by Robert Yerkes and colleagues,
originated in World War I.[10] Before then, immigration in the mid-19th century contributed to
the growth of standardized tests in the United States.[11] Standardized tests were used in immigration
when people first came over to test social roles and find social power and status.[12]
Originally the standardized test was made of essays and was not intended for widespread testing.
The College Board then designed the SAT (Scholastic Aptitude Test) in 1926 as a broader IQ test.
Notably, the first SAT was based on the Army IQ tests and was meant to determine a
student’s intelligence, problem-solving skills, and critical thinking.[13] In 1959, Everett Lindquist offered
the ACT (American College Testing) for the first time.[14] The ACT currently includes four main sections
with multiple-choice questions to test English, mathematics, reading, and science, plus an optional
writing section.[15]
Large-population state testing began in the 1970s, and in the 1980s America began to assess
nationally.[16] In 2012, 45 states together spent $669 million annually on assessments, or $27 per
student. However, once test-related administrative costs were included, the cost per
student increased to $1,100.[17] The need for the federal government to make meaningful comparisons
across a highly decentralized (locally controlled) public education system has also contributed to the
debate about standardized testing, including the Elementary and Secondary Education Act of
1965, which required standardized testing in public schools. U.S. Public Law 107-110, known as the No
Child Left Behind Act of 2001 (NCLB), further ties public school funding to standardized testing. The goal of
No Child Left Behind was to improve the education system in the United States by holding schools
and teachers accountable and attempting to close the educational gap between minority and
non-minority children in public schools. Students' results on standardized tests were used to allocate
funds and other resources such as teachers and administrators to schools. This policy does not
provide a federal standard for schools, but allows each state to set its own standards.[18] The Every
Student Succeeds Act replaced the NCLB. It was signed into law by President Obama on December
10, 2015. This act was created to revise the provisions of the NCLB in order to better support
student achievement and success.[19]
Standardized testing is a very common way of determining a student's past academic achievement
and future potential. However, high-stakes tests (whether standardized or non-standardized) can
cause anxiety. When teachers or schools are rewarded for better performance on tests, then those
rewards encourage teachers to "teach to the test" instead of providing a rich and broad curriculum.[20]
In 2007, a qualitative study by Wayne Au demonstrated that standardized testing narrows the
curriculum and encourages teacher-centered instruction.[21] As a result, standardized testing has
become controversial in the United States.[22]
An additional factor to consider regarding standardized testing in the United States education
system is the socio-economic background of the students being tested. Research has shown that
children from low-income and poor families do not receive the same emphasis on education from
their parents as students from higher-income families. According to the National Center for
Children in Poverty, 41 percent of children under the age of 18 fall into the category of lower income
(Kobal, H. and Jiang, Y., 2018). This is a large percentage of the student population who start behind the
learning curve and require specialized attention to get to where they need to be in order to perform
well on the standardized test.[23]
Australia
The Australian National Assessment Program – Literacy and Numeracy (NAPLAN) standardized
testing was introduced in 2008 by the Australian Curriculum, Assessment and Reporting Authority,
an independent authority "responsible for the development of a national curriculum, a national
assessment program and a national data collection and reporting program that supports 21st century
learning for all Australian students".[24]
All students in Years 3, 5, 7 and 9 in Australian schools are assessed using
national tests. The subjects covered in these tests include Reading, Writing, Language
Conventions (Spelling, Grammar and Punctuation) and Numeracy.
The program presents student-level reports designed to enable parents to see their child's progress
over the course of their schooling life, and help teachers to improve individual learning opportunities
for their students. Student- and school-level data are also provided to the appropriate school system
on the understanding that they can be used to target specific supports and resources to schools that
need them most. Teachers and schools use this information, in conjunction with other information, to
determine how well their students are performing and to identify any areas of need requiring
assistance.
The concept of testing student achievement is not new, although the current Australian approach
may be said to have its origins in current educational policy structures in both the USA and the UK.
There are several key differences between the Australian NAPLAN and the UK and USA strategies.
Schools that are found to be under-performing in the Australian context will be offered financial
assistance under the current federal government policy.
Design and scoring

Some standardized testing uses multiple-choice tests, which are relatively inexpensive to score, but any form of
assessment can be used.
Standardized testing can be composed of multiple-choice questions, true-false questions, essay
questions, authentic assessments, or nearly any other form of assessment. Multiple-choice and
true-false items are often chosen because they can be given and scored inexpensively, quickly, and
reliably through using special answer sheets that can be read by a computer or via computer-
adaptive testing. Some standardized tests have short-answer or essay writing components that are
assigned a score by independent evaluators who use rubrics (rules or guidelines) and benchmark
papers (examples of papers for each possible score) to determine the grade to be given to a
response. Not all standardized tests involve answering questions; an authentic assessment for
athletic skills could take the form of running for a set amount of time or dribbling a ball for a certain
distance.
Most national and international assessments, however, are not fully evaluated by people; people are
used to score items that are not able to be scored easily by computer (such as essays). For
example, the Graduate Record Exam is a computer-adaptive assessment that requires no scoring
by people except for the writing portion.[25]
The term "normative assessment" refers to the process of comparing one test-taker to his or her
peers. A norm-referenced test (NRT) is a type of test, assessment, or evaluation which yields an
estimate of the position of the tested individual in a predefined population. The estimate is derived
from the analysis of test scores and other relevant data from a sample drawn from the population.
This type of test identifies whether the test taker performed better or worse than other students
taking this test. A criterion-referenced test (CRT) is a style of test which uses test scores to show
whether or not test takers performed well on a given task, not how well they performed compared to
other test takers. Most tests and quizzes that are written by school teachers can be considered
criterion-referenced tests. In this case, the objective is simply to see whether the student has learned
the material.
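The difference between the two interpretations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the norm sample, and the cut score of 70 are all hypothetical and do not come from any real testing program. The norm-referenced function places a score within a sample of peers, while the criterion-referenced function compares it with a fixed standard.

```python
from bisect import bisect_left


def percentile_rank(score, norm_sample):
    """Norm-referenced view: percentage of the norm sample scoring strictly below `score`."""
    ordered = sorted(norm_sample)
    below = bisect_left(ordered, score)  # number of sample scores lower than `score`
    return 100.0 * below / len(ordered)


def meets_criterion(score, cut_score):
    """Criterion-referenced view: compare the score with a fixed standard only."""
    return score >= cut_score


# Hypothetical norm sample and cut score, invented for illustration.
norm_sample = [52, 61, 64, 70, 73, 75, 78, 81, 88, 94]
cut_score = 70

for score in (64, 78):
    print(score, percentile_rank(score, norm_sample), meets_criterion(score, cut_score))
```

Note that the criterion-referenced result for one test taker does not change if every other score in the sample changes, whereas the percentile rank does.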
Scoring issues
Human scoring is relatively expensive and often variable, which is why computer scoring is preferred
when feasible. For example, some critics say that poorly paid employees will score tests badly.[26]
Agreement between scorers can vary between 60 and 85 percent, depending on the test and the
scoring session. Sometimes states pay to have two or more scorers read each paper; if their scores
do not agree, then the paper is passed to additional scorers.[26]
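The double-scoring procedure described above can be sketched as follows. This is a minimal illustration, not a description of any state's actual process: the function names and the two score lists are hypothetical, and "agreement" is assumed here to mean exact agreement (both scorers gave the same score).

```python
def exact_agreement(scores_a, scores_b):
    """Fraction of papers on which two scorers gave exactly the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both scorers must score the same non-empty set of papers")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def papers_needing_adjudication(scores_a, scores_b):
    """Indexes of papers whose two scores differ and would go to an additional scorer."""
    return [i for i, (a, b) in enumerate(zip(scores_a, scores_b)) if a != b]


# Hypothetical scores from two scorers on the same ten papers (1-5 scale).
scorer_1 = [4, 3, 5, 2, 4, 3, 1, 5, 4, 2]
scorer_2 = [4, 3, 4, 2, 4, 2, 1, 5, 3, 2]

print(exact_agreement(scorer_1, scorer_2))
print(papers_needing_adjudication(scorer_1, scorer_2))
```

With these made-up scores, the two scorers agree on seven of ten papers, and the remaining three would be passed to an additional scorer.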
Open-ended components of tests are often only a small proportion of the test. Most commonly, a
major academic test includes both human-scored and computer-scored sections.
Score
Sample scoring for the history question: What caused World War II?

Standardized grading. Grading rubric: Answers must be marked correct if they mention at least one of
the following: Germany's invasion of Poland, Japan's invasion of China, or economic issues.

Non-standardized grading. No grading standards. Each teacher grades however he or she wants to,
considering whatever factors the teacher chooses, such as the answer, the amount of effort, the
student's academic background, language ability, or attitude.

Student #1: WWII was caused by Hitler and Germany invading Poland.
  Standardized grading
    Teacher #1: This answer mentions one of the required items, so it is correct.
    Teacher #2: This answer is correct.
  Non-standardized grading
    Teacher #1: I feel like this answer is good enough, so I'll mark it correct.
    Teacher #2: This answer is correct, but this good student should be able to do better than that,
    so I'll only give partial credit.

Student #2: WWII was caused by multiple factors, including the Great Depression and the general
economic situation, the rise of national socialism, fascism, and imperialist expansionism, and
unresolved resentments related to WWI. The war in Europe began with the German invasion of Poland.
  Standardized grading
    Teacher #1: This answer mentions one of the required items, so it is correct.
    Teacher #2: This answer is correct.
  Non-standardized grading
    Teacher #1: I feel like this answer is correct and complete, so I'll give full credit.
    Teacher #2: This answer is correct, so I'll give full points.

Student #3: WWII was caused by the assassination of Archduke Ferdinand.
  Standardized grading
    Teacher #1: This answer does not mention any of the required items. No points.
    Teacher #2: This answer is wrong. No credit.
  Non-standardized grading
    Teacher #1: This answer is wrong. No points.
    Teacher #2: This answer is wrong, but this student tried hard and the sentence is grammatically
    correct, so I'll give one point for effort.
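The standardized rubric in the sample above can be sketched as a single function that is applied identically to every answer. The keyword lists are a crude, hypothetical stand-in for a trained scorer's judgment and are not part of the rubric itself; the point of the sketch is only that the same rule is used for every student.

```python
# Rubric items from the sample, each mapped to assumed trigger keywords.
REQUIRED_ITEMS = {
    "Germany's invasion of Poland": ("poland",),
    "Japan's invasion of China": ("china",),
    "economic issues": ("economic", "depression"),
}


def is_correct(answer: str) -> bool:
    """Mark an answer correct if it mentions at least one required item."""
    text = answer.lower()
    return any(
        keyword in text
        for keywords in REQUIRED_ITEMS.values()
        for keyword in keywords
    )


answers = {
    "Student #1": "WWII was caused by Hitler and Germany invading Poland.",
    "Student #2": (
        "WWII was caused by multiple factors, including the Great Depression "
        "and the general economic situation. The war in Europe began with the "
        "German invasion of Poland."
    ),
    "Student #3": "WWII was caused by the assassination of Archduke Ferdinand.",
}

results = {student: is_correct(text) for student, text in answers.items()}
print(results)
```

Because the function ignores who wrote the answer, it cannot give partial credit to a "good student" or a point for effort, which is the contrast the sample draws with non-standardized grading.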
There are two types of standardized test score interpretations: norm-referenced and
criterion-referenced.
• Norm-referenced score interpretations compare test-takers to a sample of peers. The
goal is to rank students as being better or worse than other students. Norm-referenced test
score interpretations are associated with traditional education. Students who perform better than
others pass the test, and students who perform worse than others fail the test.
• Criterion-referenced score interpretations compare test-takers to a criterion (a formal
definition of content), regardless of the scores of other examinees. These may also be described
as standards-based assessments, as they are aligned with the standards-based education
reform movement.[27] Criterion-referenced score interpretations are concerned solely with
whether or not this particular student's answer is correct and complete. Under
criterion-referenced systems, it is possible for all students to pass the test, or for all
students to fail the test.
Either of these systems can be used in standardized testing. What is important to standardized
testing is whether all students are asked equivalent questions, under equivalent circumstances, and
graded equally. In a standardized test, if a given answer is correct for one student, it is correct for all
students. Graders do not accept an answer as good enough for one student but reject the same
answer as inadequate for another student.
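The difference between the two interpretations can be made concrete with a small sketch. The student names, scores, cut score, and pass fraction are all invented for illustration, and the norm-referenced rule used here (the top fraction of the group passes, ties ignored) is only one simple way to rank against peers.

```python
def criterion_referenced(scores: dict[str, float], cut_score: float) -> dict[str, bool]:
    """Each student passes or fails against a fixed criterion, independent of peers."""
    return {name: score >= cut_score for name, score in scores.items()}


def norm_referenced(scores: dict[str, float], pass_fraction: float) -> dict[str, bool]:
    """Assumed rule: only the top `pass_fraction` of the group passes."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    passing = set(ranked[: int(len(ranked) * pass_fraction)])
    return {name: name in passing for name in scores}


# Hypothetical scores for four students (made-up data).
scores = {"Ana": 92, "Ben": 88, "Cruz": 85, "Dee": 81}

by_criterion = criterion_referenced(scores, cut_score=80)
by_norm = norm_referenced(scores, pass_fraction=0.5)
print(by_criterion)
print(by_norm)
```

With these numbers every student passes under the criterion-referenced rule, while only the two highest scorers pass under the norm-referenced rule, even though the scores themselves are identical.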
Standards
The considerations of validity and reliability typically are viewed as essential elements for
determining the quality of any standardized test. However, professional and practitioner associations
frequently have placed these concerns within broader contexts when developing standards and
making overall judgments about the quality of any standardized test as a whole within a given
context.
Evaluation standards
In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards
for Educational Evaluation[28] has published three sets of standards for evaluations. The Personnel
Evaluation Standards[29] was published in 1988, The Program Evaluation Standards (2nd edition)[30]
was published in 1994, and The Student Evaluation Standards[31] was published in 2003.
Each publication presents and elaborates a set of standards for use in a variety of educational
settings. The standards provide guidelines for designing, implementing, assessing and improving the
identified form of evaluation. Each of the standards has been placed in one of four fundamental
categories to promote educational evaluations that are proper, useful, feasible, and accurate. In
these sets of standards, validity and reliability considerations are covered under the accuracy topic.
For example, the student accuracy standards help ensure that student evaluations will provide
sound, accurate, and credible information about student learning and performance.
Testing standards
In the field of psychometrics, the Standards for Educational and Psychological Testing[32] set out
standards on validity and reliability, along with errors of measurement and issues related to
the accommodation of individuals with disabilities. The third and final major topic of the Standards
covers testing applications, credentialing, and testing in program evaluation and public policy.
Importance of testing
Standardized testing is considered important, and these tests assess what is taught at the
national level. They are used to measure objectives and how well schools are meeting state
educational standards.
There are three primary reasons for standardized tests: comparison among test takers, improvement
of ongoing instruction and learning, and evaluation of instruction.[33]
Students undergoing the testing have been told not to spend copious amounts of their own time
studying and preparing for the tests, although students believe they need to do well so as not to
let down their school.[34]
Standardized tests put large amounts of pressure on students. Some children who are considered at
the top of their class choke when it comes to standardized tests such as the citywide.
[Figure: A past standardized test paper with multiple-choice questions and the form used to answer them.]
Reflection of testing
Parents and community activists around the country argue that the education system is failing
students. Standardized testing is included in efforts to improve the education system. Standardized
testing gives a detailed account of how student improvement and teacher effectiveness are evaluated,
which can show where a school's effectiveness sits on a national scale.
Public policy
Standardized testing is used as a public policy strategy to establish stronger accountability
measures for public education. While the National Assessment of Educational Progress (NAEP) has
served as an educational barometer for some thirty years by administering standardized tests on a
regular basis to random schools throughout the United States, efforts over the last decade at the
state and federal levels have mandated annual standardized test administration for all public schools
across the country.[35]
The idea behind the standardized testing policy movement is that testing is the first step to improving
schools, teaching practice, and educational methods through data collection. Proponents argue that
the data generated by the standardized tests act like a report card for the community, demonstrating
how well local schools are performing. Critics of the movement, however, point to various
discrepancies that result from current state standardized testing practices, including problems with
test validity and reliability and false correlations (see Simpson's paradox).
Critics also charge that standardized tests encourage "teaching to the test" at the expense of
creativity and in-depth coverage of subjects not on the test. Multiple-choice tests are criticized for
failing to assess skills such as writing. Furthermore, students' success is being tied to a teacher's
relative performance, making teacher advancement contingent upon the teacher's success with
students' academic performance. Ethical and economic questions arise for teachers when faced
with clearly underperforming or underskilled students and a standardized test.
Critics also object to the type of material that is typically tested by schools. Although standardized
tests for non-academic attributes such as the Torrance Tests of Creative Thinking exist, schools
rarely give standardized tests to measure initiative, creativity, imagination, curiosity, good will, ethical
reflection, or a host of other valuable dispositions and attributes.[36] Instead, the tests given by
schools tend to focus less on moral or character development, and more on individual identifiable
academic skills.
Advantages
• Offers Guidance to Teachers. Standardized tests allow teachers to see how their
students are performing compared to others in the country. This helps them revise their
teaching methods if necessary to help their students meet the standards.[37]
• Allows Students to See Their Own Progress. Students are given the opportunity to reflect on
their scores and see where their strengths and weaknesses lie.[37]
• Provides Parents with Information about Their Child. The scores give parents an
idea of how their child is doing academically compared to everyone else of the same age in
the nation.[38]
• Lets Government Know What Areas Need to Be Improved. Tests that are taken by
everyone can help the government determine where students are struggling the most. With this
information, it can implement solutions to fix the issue, allowing students to learn and grow in
an academic environment.[37]
One of the main advantages of standardized testing is that the results can be empirically
documented; therefore, the test scores can be shown to have a relative degree
of validity and reliability, as well as results which are generalizable and replicable. [39] This is often
contrasted with grades on a school transcript, which are assigned by individual teachers. It may be
difficult to account for differences in educational culture across schools, difficulty of a given teacher's
curriculum, differences in teaching style, and techniques and biases that affect grading. This makes
standardized tests useful for admissions purposes in higher education, where a school is trying to
compare students from across the nation or across the world. Examples of such international
benchmark tests include the Trends in International Mathematics and Science Study (TIMSS) and
the Progress in International Reading Literacy Study (PIRLS). Performance on these exams has
been speculated to change based on the way standards like the Common Core State Standards
(CCSS) line up with those of top countries across the world.
There are three metrics by which the best performing countries in the TIMSS (the "A+ countries")
are measured: focus, coherence, and rigor. Focus is defined as the number of topics covered in
each grade; the idea is that the fewer topics covered in each grade, the more focus can be given to
each topic. Coherence is defined as adhering to a sequence of topics that follows the
natural progression or logical structure of mathematics. The Common Core State Standards for
Mathematics (CCSSM) were compared to both the current state standards and the A+ country
standards. With the greatest number of topics covered on average, the current state standards had
the lowest focus.[40] The Common Core Standards aim to fix
this discrepancy by helping educators focus on what students need to learn instead of becoming
distracted by extraneous topics. They encourage educational materials to go from covering a vast
array of topics in a shallow manner to a few topics in much more depth.[41]
Standardized tests also remove teacher bias in assessment. Research shows that teachers create a
kind of self-fulfilling prophecy in their assessment of students, giving higher scores to those they
expect to achieve and lower grades to those they expect to fail.[42]
Another advantage is aggregation. A well designed standardized test provides an assessment of an
individual's mastery of a domain of knowledge or skill which at some level of aggregation will provide
useful information. That is, while individual assessments may not be accurate enough for practical
purposes, the mean scores of classes, schools, branches of a company, or other groups may well
provide useful information because of the reduction of error accomplished by increasing the sample
size.
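The reduction of error through aggregation can be illustrated with the standard error of a mean. This is a minimal sketch that assumes each individual score carries an independent measurement error with the same standard deviation; the 10-point figure and the group sizes are invented for illustration.

```python
import math


def standard_error_of_mean(sd_individual: float, n: int) -> float:
    """Standard error of a group mean, assuming n independent errors of equal spread."""
    if n < 1:
        raise ValueError("group size must be at least 1")
    return sd_individual / math.sqrt(n)


# Hypothetical: each individual score is off by about 10 points (one standard deviation).
individual_error = 10.0
for group_size in (1, 25, 100):
    error = standard_error_of_mean(individual_error, group_size)
    print(f"group of {group_size:>3}: error of the mean is about {error:.1f} points")
```

Under these assumptions a class of 25 has a mean that is five times more precise than any single student's score, which is why group means can be informative even when individual scores are noisy.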
Opponents claim that standardized tests are misused and uncritical judgments of intelligence and
performance, but supporters argue that these aren't negatives of standardized tests, but criticisms of
poorly designed testing regimes. They argue that testing should and does focus educational
resources on the most important aspects of education — imparting a pre-defined set of knowledge
and skills — and that other aspects are either less important, or should be added to the testing
scheme.
Disadvantages and criticism
• Validity, efficacy, and predictive power. Many contend that overuse and misuse of these
tests harms teaching and learning by narrowing the curriculum. According to the group FairTest,
when standardized tests are the primary factor in accountability, schools use the tests to
narrowly define curriculum and focus instruction. Accountability creates an immense pressure to
perform and this can lead to the misuse and misinterpretation of standardized tests. [43] FairTest
says that negative consequences of test misuse include narrowing the curriculum, teaching to
the test, pushing students out of school, driving teachers out of the profession, and undermining
student engagement and school climate. Critics say that "teaching to the test" disfavors higher-
order learning. While it is possible to use a standardized test without letting its contents
determine curriculum and instruction, frequently, what is not tested is not taught, and how the
subject is tested often becomes a model for how to teach the subject.
• Uncritical use of standardized test scores to evaluate teacher and school
performance is inappropriate, because the students' scores are influenced by three things:
what students learn in school, what students learn outside of school, and the students'
innate intelligence.[44] The school only has control over one of these three factors. Value-
added modeling has been proposed to cope with this criticism by statistically controlling for
innate ability and out-of-school contextual factors.[45] In a value-added system of interpreting
test scores, analysts estimate an expected score for each student, based on factors such as
the student's own previous test scores, primary language, or socioeconomic status. The
difference between the student's expected score and actual score is presumed to be due
primarily to the teacher's efforts.
• Some teachers argue that a standardized test measures only a student’s
current knowledge and does not reflect the student’s progress from the beginning of the
year.[46] The test is also created not by the people who provide the student's regular instruction,
but by professionals who determine what students should know at different ages. In addition,
these teachers argue that they themselves are the best test creators and facilitators, because they
are the most aware of their students' abilities, capacities, and needs, which allows them
to spend longer on a subject or to proceed with the regular curriculum.
• Notable Opponents. In her book Now You See It, Cathy Davidson criticizes standardized
tests. She describes our youth as "assembly line kids on an assembly line model," referring to the
use of the standardized test as part of a one-size-fits-all educational model. She also criticizes
the narrowness of the skills being tested and the labeling of children without these skills as
failures or as students with disabilities.[47] Widespread and organized cheating has also become a
growing part of the culture of today's school reform.[48]
• Education theorist Bill Ayers has commented on the limitations of the standardized
test, writing that "Standardized tests can't measure initiative, creativity, imagination,
conceptual thinking, curiosity, effort, irony, judgment, commitment, nuance, good will, ethical
reflection, or a host of other valuable dispositions and attributes. What they can measure
and count are isolated skills, specific facts and function, content knowledge, the least
interesting and least significant aspects of learning."[49] In his book The Shame of the
Nation, Jonathan Kozol argues that students submitted to standardized testing are victims of
"cognitive decapitation." Kozol comes to this realization after speaking to many children in
inner city schools who have no spatial recollection of time, time periods, and historical
events. This is especially the case in schools where, due to shortages in funding and strict
accountability policies, subjects like the arts, history and geography have been dropped
in order to focus on the content of the mandated tests.[50]
• Testing Minorities. Monty Neill, the director of the National Center for Fair and Open
Testing, claims that students who speak English as a second language, who have a disability, or
who come from low-income families are disproportionately denied a diploma due to a test score,
which is unfair and harmful. In the late 1970s, when graduation tests began in the United
States, for example, a lawsuit claimed that many Black students had not had a fair opportunity
to learn the material tested on the graduation test because they had attended schools
segregated by law. “The interaction of under-resourced schools and testing most powerfully hits
students of color,” Neill argues. “They are disproportionately denied diplomas or grade
promotion, and the schools they attend are the ones most likely to fare poorly on the tests and
face sanctions such as restructuring.”[51]
• In the journal The Progressive, Barbara Miner explicates the drawbacks of
standardized testing by analyzing three different books. Linda M. McNeil, co-director of the
Center for Education at Rice University and a professor of education, writes in her
book Contradictions of School Reform: Educational Costs of Standardized Testing that
“Educational standardization harms teaching and learning and, over the long term,
restratifies education by race and class.” McNeil believes that test-based education reform
places higher standards on students of color. According to Miner, McNeil “shows how
test-based reform centralizes power in the hands of the corporate and political elite-- a
particularly frightening development during this time of increasing corporate and
conservative influence over education reform.” In this view, such test-based reform has dumbed
down learning, especially for students of color.[52]
• On a student and educator level. There is criticism from students themselves that tests,
while standardized, are unfair to the individual student. Some students are "bad test takers",
meaning they get nervous and unfocused on tests. Therefore, while the test is standard and
should provide fair results, the test takers are at a disadvantage, but have no way to prove their
knowledge otherwise, as there is no other testing alternative that allows students to prove their
knowledge and problem-solving skills.
• Some students suffer from test anxiety. Test anxiety applies to standardized tests as
well: even students who do not usually have test anxiety feel immense pressure to
perform when the stakes are so high. High-stakes standardized testing includes exams like
the SAT, the PARCC, and the ACT, where doing well is required for grade promotion or college
admission.
• Standardized tests are a way to measure the education level of students and schools
on a broad scale. From Kindergarten to 12th grade, students participate in required test
taking. In that amount of time, the average student takes 112 standardized tests, which
equates to about 10 tests per year.[53] At this rate, the average amount of testing takes about
2.3% of total class time.[54] Although standardized tests were designed to improve the
education system, they are creating many negative effects on students and teachers.
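The value-added interpretation mentioned earlier in this list (an expected score is estimated for each student, and the gap between expected and actual scores is attributed mainly to the teacher) can be sketched as follows. The expected scores are taken as given here; how they are predicted from prior scores, language, or socioeconomic status is outside this sketch, and all numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class StudentResult:
    expected: float  # score predicted by some model (assumed to be given)
    actual: float    # score the student actually obtained


def value_added(results: list[StudentResult]) -> float:
    """Average gap between actual and expected scores for one teacher's students."""
    if not results:
        raise ValueError("at least one student result is required")
    return sum(r.actual - r.expected for r in results) / len(results)


# Hypothetical class of three students (made-up numbers).
classroom = [
    StudentResult(expected=70, actual=74),
    StudentResult(expected=82, actual=80),
    StudentResult(expected=65, actual=71),
]

estimate = value_added(classroom)
print(f"estimated value added: {estimate:+.2f} points")
```

The estimate is only as good as the expected scores behind it; if the prediction leaves out an out-of-school factor, that factor's effect is silently credited to or blamed on the teacher.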
Standardized tests have caused the quality and depth of the educational curriculum to diminish
(Rooks). Instead of developing a curriculum that addresses the needs of the actual students in
their classrooms, teachers end up using required material that they took no part in creating.
The required material often contains pacing guidelines, which regulate when content should be
taught, and scripted lessons, which often limit the teacher’s ability to make relevant decisions
in the classroom. The tests have narrowed the curriculum in many schools, usually squeezing out
classes such as art and music simply because they are not covered by the tests. Teachers are then
forced to teach only subjects that influence a student's literacy level and comprehension ability,
and to leave out those that often require talent or skill.
Standardized testing places a lot of stress and pressure on children and teachers. Teachers are put
under a lot of stress because the better students do on the test, the more federal funding the school
and district will receive. This causes teachers to teach to the test rather than teach the life skills
children will use and need. In some cases, schools have shortened or removed recess so that more
time can be spent preparing and practicing for the standardized tests. This pressure, combined with
the removal of recess as a stress outlet, means that children, along with teachers, can become
depressed and sleep-deprived. Being depressed and sleep-deprived causes children to act out more
than usual, which places more stress on the teachers. Teachers do not get the results back until the
end of the summer, which means they cannot use those results to help those children,
who will already have moved on to the next grade. Standardized tests place an unnecessary amount
of stress on teachers and students without yielding any information in a timely manner.

• Standardized testing puts pressure not only on students, but on teachers as well.
New Jersey Governor Chris Christie has proposed educational reform in New Jersey that
pressures teachers not only to "teach to the test," but also to have their students perform at the
potential cost of their salary and job security. The reform calls for performance-based pay
that depends on students' performance on standardized tests and their educational gains.
However, students vary in cognitive, developmental, and psychological abilities, so this
is unfair to teachers whose students have difficulties with the test.[55]
In an April 1995 "meta-analysis" published in the Journal of Educational and Psychological
Measurement, Todd Morrison and Melanie Morrison examined two dozen validity studies of the test
required to get into just about any master's or PhD program in America: the Graduate Record
Examination (GRE). This study encompassed more than 5,000 test-takers over the preceding 30 years.
The authors found that GRE scores accounted for just 6 percent of the variation in grades in
graduate school. The GRE appears to be "virtually useless from a prediction standpoint," wrote the
authors. Repeated studies of the Law School Admissions Test (LSAT) find the same. The SAT's
maker, the Educational Testing Service (ETS), now claims the SAT is not an "aptitude" test but rather
an assessment of "developed abilities."[56]
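To put the 6 percent figure in more familiar terms, the sketch below converts it to a correlation coefficient. It assumes that "accounted for 6 percent of the variation" refers to the squared correlation between GRE scores and graduate grades, which is the usual reading but is not spelled out in the passage above.

```python
import math

# Share of variation in graduate grades accounted for by GRE scores, as reported.
variance_explained = 0.06

# Under the squared-correlation assumption, the correlation's magnitude is the square root.
correlation_magnitude = math.sqrt(variance_explained)
unexplained_share = 1 - variance_explained

print(f"implied correlation magnitude: about {correlation_magnitude:.2f}")
print(f"share of variation left unexplained: {unexplained_share:.0%}")
```

A correlation of roughly 0.24 leaves about 94 percent of the variation in grades unaccounted for, which is the basis for the authors' remark about prediction.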
Finally, standardized tests are not inexpensive. It has been reported that the United States spends
about 1.7 billion dollars annually on these tests.[57] In 2001, it was also reported that only three
companies (Harcourt Educational Measurement, CTB McGraw-Hill and Riverside Publishing) design
96% of the tests taken at the state level.[58]
Educational decisions
Test scores are in some cases used as a sole, mandatory, or primary criterion for admissions or
certification. For example, some U.S. states require high school graduation examinations. Adequate
scores on these exit exams are required for high school graduation. The General Educational
Development test is often used as an alternative to a high school diploma.
Other applications include tracking (deciding whether a student should be enrolled in the "fast" or
"slow" version of a course) and awarding scholarships. In the United States, many colleges and
universities automatically translate scores on Advanced Placement tests into college credit,
satisfaction of graduation requirements, or placement in more advanced courses. Generalized tests
such as the SAT or GRE are more often used as one measure among several, when making
admissions decisions. Some public institutions have cutoff scores for the SAT, GPA, or class rank,
for creating classes of applicants to automatically accept or reject.
Heavy reliance on standardized tests for decision-making is often controversial, for the reasons
noted above. Critics often propose emphasizing cumulative or even non-numerical measures, such
as classroom grades or brief individual assessments (written in prose) from teachers. Supporters
argue that test scores provide a clear-cut, objective standard that minimizes the potential for political
influence or favoritism.
The National Academy of Sciences recommends that major educational decisions not be based
solely on a single test score.[59] The use of minimum cut-scores for entrance or graduation does not
imply a single standard, since test scores are nearly always combined with other minimal criteria
such as number of credits, prerequisite courses, attendance, etc. Test scores are often perceived as
the "sole criteria" simply because they are the most difficult, or the fulfillment of other criteria is
automatically assumed. One exception to this rule is the GED, which has allowed many people to
have their skills recognized even though they did not meet traditional criteria.
https://en.wikipedia.org/wiki/Standardized_test
K-12
The education sector is set to mark a milestone this school year, 2016-2017, as the first
batch of students enters their first senior year in high school.
While it’s been years since the government put the new program into effect, many teachers,
students, and parents remain unclear on certain details about it. One of these is the
grading system.
New Grading System

Though the K-12 curriculum’s current grading system has separate metrics for Grades 1 to 10
and Grades 11 to 12, both have similar components: written work (WW), performance
tasks (PTs), and quarterly assessment (QA). The Department of
Education (DepEd) describes this new grading system as standards-based and
competency-based.
DepEd issued an order directing schools to use the current grading system in gauging and
tracking learners’ progress. In addition, it instructs them to adjust the grading
scheme further should the need arise.
What is the basis for students’ grades? DepEd says schools should base the grades on students’
weighted raw scores in summative assessments.
– On Core Subjects
For core subjects, senior high school students’ PTs carry the largest weight, 50%. WW and
QA each carry a 25% weight.
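The core-subject weights above can be applied as in the following sketch. It assumes that each component's raw score is first converted to a percentage before weighting; the function names and the sample scores are hypothetical, and the result is the weighted grade before any transmutation for the report card.

```python
# Weights for senior high school core subjects, as stated above.
CORE_SUBJECT_WEIGHTS = {"WW": 0.25, "PT": 0.50, "QA": 0.25}


def percentage_score(raw_score: float, highest_possible: float) -> float:
    """Convert a component's raw score to a percentage (assumed first step)."""
    return 100 * raw_score / highest_possible


def weighted_grade(percentages: dict[str, float], weights: dict[str, float]) -> float:
    """Combine component percentages using the given weights."""
    return sum(percentages[component] * weight for component, weight in weights.items())


# Hypothetical student: raw score and highest possible score per component.
percentages = {
    "WW": percentage_score(40, 50),    # written work
    "PT": percentage_score(90, 100),   # performance tasks
    "QA": percentage_score(35, 50),    # quarterly assessment
}

grade = weighted_grade(percentages, CORE_SUBJECT_WEIGHTS)
print(f"weighted grade before transmutation: {grade:.1f}")
```

Because performance tasks carry half the weight, a one-point change in the PT percentage moves the weighted grade twice as much as the same change in WW or QA.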
– On Tracks
Since students will choose a track to complete in their senior high school years, schools will
assess their performance based on the same components, but the weights for the
Academic Track will differ from those for the three other tracks (TVL, Sports, and Arts and Design).
Below is a table showing the weight for each track’s components:
In keeping with this reform program, DepEd provides public school teachers with templates
for e-class record (especially for Grades 1-10). These templates will aid them in computing
their students’ grades easily.
How It Differs from the Old One

The newly structured grading system is far different from its precursor, KPUP, an acronym
for its components: knowledge, performance, understanding, and product. Unlike
KPUP, the new system has a minimum passing grade of 60 for each learning area, which
is transmuted to 75 in the student’s report card. The lowest possible
grade a student can get for quarterly grades and final grades is 60.
For more information on DepEd’s K-12 senior high school program, please
visit www.k12philippines.com. You may also explore the CIIT’s senior high
school courses to know more about our specialized tracks.
http://www.ciit.edu.ph/k-12-a-review-of-the-new-senior-high-school-grading-system/
