Principles of Language Assessment
Daniela Hernandez, Isabel Andrade, Nathalia Alvarez
Assessment, Evaluation and Testing – English and French Program, 9th semester
PRACTICALITY
Practicality refers to evaluating an assessment in terms of its cost, logistics, the time
needed, and functionality. This principle is important for classroom teachers because an
expensive and time-consuming test is impractical. In other words, a practical test takes
into account time, budget, resources, and administration issues such as how the students'
work will be evaluated. A test is practical when it stays within budgetary limits and can be
completed by the test-taker within appropriate time constraints. The test should also have
clear directions for administration and appropriately utilize available human resources.
It should not exceed available material resources, and it should consider the time and
effort involved in both design and scoring.
Several examples illustrate when a test becomes impractical. For instance, while longer
tests can increase validity because they capture more measurement data, they may be
impractical to administer. Moreover, an overly long exam can induce fatigue in candidates,
which in turn can introduce error into the measurements. A practical examination is one
that does not place unreasonable demands on available resources. Therefore, analyzing the
practicality of an authentic assessment means considering its budget; the time required to
design, implement, and score it; administration issues; and material resources.
RELIABILITY
Reliability is the ability of a test to be repeated and still yield consistent results.
Rater Reliability: unreliability of this kind is caused by subjectivity, bias, and human
error. It falls into two categories. The first, inter-rater reliability, is threatened when
two or more raters give inconsistent scores on the same test, owing to factors such as
lack of adherence to scoring criteria, inexperience, inattention, or even preconceived
biases. The second, intra-rater reliability, is threatened by unclear scoring criteria,
fatigue, bias towards particular “good” and “bad” students, or simple carelessness. A
simple way to check inter-rater consistency is sketched below.
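As a minimal sketch only (not drawn from the chapter itself), one common way to quantify inter-rater reliability is to correlate the scores two raters give to the same set of performances. The rater labels and scores below are hypothetical, and the correlation is computed with the Python standard library.

```python
# Hypothetical illustration of inter-rater reliability (not from the chapter):
# correlate the scores two raters assign to the same five essays.
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

# Scores (0-10) that two hypothetical raters gave to the same essays.
rater_a = [8, 6, 9, 5, 7]
rater_b = [7, 6, 9, 4, 8]

print(f"Inter-rater agreement (Pearson r): {pearson(rater_a, rater_b):.2f}")
# A value near 1.0 suggests the raters apply the scoring criteria consistently;
# a low value signals the kind of inter-rater unreliability described above.
```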
VALIDITY
A test is valid if it measures what it is intended to measure in students. For example, if
we want to measure students' writing ability, we could ask them to write as many words
as they can in 15 minutes and then simply count the words for the final score; but such a
test would only be valid if it also considered comprehensibility, elements of rhetorical
discourse, and the organization of ideas.
Validity can be classified according to the strategies used to investigate it. The following
types of evidence are explained below:
Content-Related Evidence: a test offers this kind of evidence when conclusions about
measured performance are based on the very subject matter that is being tested. It also
relates to the measurement of achievement and to whether what is being measured is
clear when the test is applied.
A way of understanding content-related evidence is to consider direct and indirect tests.
In direct testing the student performs the target task itself, while in indirect testing
learners do not perform the task itself but rather a task that is related to it in some way.
Criterion-Related Evidence: this concerns how accurately performance on the assessment
predicts future performance, or estimates present performance, on some other valued
measure. It usually falls into two categories, concurrent validation and predictive
validation, the latter being important in the case of tests designed to determine a
student's readiness to move on to another unit. A minimal numerical sketch of predictive
validation follows below.
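As another purely illustrative sketch (the scores and cutoff are invented, not taken from the chapter), predictive validation can be pictured as checking how often a readiness test's pass/fail decision matches how students actually perform on the next unit.

```python
# Hypothetical illustration of criterion-related (predictive) validation:
# compare the readiness decision made from a test score against whether the
# student later passed the next unit. All scores and the cutoff are invented.
READINESS_CUTOFF = 60

# (readiness-test score, passed the next unit?)
students = [(72, True), (55, False), (81, True), (64, False), (48, False), (90, True)]

predicted_ready = [score >= READINESS_CUTOFF for score, _ in students]
actually_passed = [passed for _, passed in students]

matches = sum(p == a for p, a in zip(predicted_ready, actually_passed))
print(f"Test decisions matched later performance for {matches}/{len(students)} students")
# The more often the test's decision agrees with the later criterion measure,
# the stronger the criterion-related (predictive) evidence of validity.
```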
Construct-Related Evidence: this category is demonstrated when a test measures only the
ability it is supposed to measure. Constructs refer to underlying factors or skills of each
learner, such as communication, fluency, motivation, or self-esteem, that are crucial when
administering an oral test.
Consequential validity (impact): this refers to the positive or negative social
consequences of a given test. For example, the consequential validity of standardized
tests includes many positive attributes, such as improving student learning and
motivation and ensuring that all students have access to equal classroom content.
According to Bachman and Palmer, test results should be viewed at two levels, the macro
and the micro: the first refers to the effect on society and the second to the effect on the
individual.
Face validity: refers to the degree to which a test looks right, and appears to
measure the knowledge or abilities it claims to measure, based on the subjective
judgment of the examinees who take it, the administrative personnel who decide
on its use, and other psychometrically unsophisticated observers.
It is important to understand the differences between reliability and validity. Validity will
tell you how good a test is for a particular situation; reliability will tell you how
trustworthy a score on that test will be. You cannot draw valid conclusions from a test
score unless you are sure that the test is reliable. Even when a test is reliable, it may not
be valid. You should be careful that any test you select is both reliable and valid for your
situation.
AUTHENTICITY
Authenticity is defined as “the degree of correspondence of the characteristics of a given
language test task to the features of a target language task” (Bachman & Palmer, 1996,
p. 23). Authentic assessment occurs within the context of an authentic activity with
complex challenges, centers on an active learner who produces refined results or
products, and is associated with multiple indicators of learning. It includes the
development of tests and projects.
WASHBACK
Washback (Brown, 2004), also called backwash (Heaton, 1990), refers to the influence of
testing on teaching and learning. This influence can be positive or negative.
Positive washback has a beneficial influence on teaching and learning. It means
teachers and students have a positive attitude toward the examination or test and
work willingly and collaboratively towards its objective. A good test should have a
good effect.
Negative washback does not have any beneficial influence on teaching and learning.
Tests of this kind are considered a negative influence on teaching and learning.
PRACTICALITY
- Stays within budgetary limits.
- Can be completed by the test-taker.
- Has clear directions for administration.
- Does not exceed available material resources.
- Considers the time and effort for design and scoring.

RELIABILITY
- Is consistent in its conditions across two or more administrations.
- Gives clear directions for scoring/evaluation.
- Has uniform rubrics for scoring/evaluation.
- Lends itself to consistent application of those rubrics by the scorer.
VALIDITY
- Measures exactly what it proposes to measure.
- Does not measure irrelevant or “contaminating” variables.
- Relies as much as possible on empirical evidence.
- Involves performance that samples the test's criterion.

AUTHENTICITY
- Contains language that is as natural as possible.
- Has items that are contextualized rather than isolated.
- Includes meaningful, relevant, interesting topics.
- Provides some thematic organization to items.

WASHBACK
- Positively influences what and how teachers teach.
- Positively influences what and how learners learn.
- Offers learners a chance to adequately prepare.
- Gives learners feedback that enhances their language development.
- Is more formative in nature than summative.

TO THINK ABOUT…
The second chapter, entitled "Principles of Language Assessment," is a great contribution
for us as future teachers, since it explores the five principles of language assessment,
which are key when applying an assessment in the students' learning process. It is
important to emphasize the definitions of the concepts reviewed in the first chapter,
since they are the
basis for understanding and correctly applying the principles of language assessment. As
studied in the first chapter, assessment is a key component of learning because it not only
helps students to learn, but also helps teachers to evaluate the educational strategies they
use in the classroom and assess whether they meet their objectives or motivate the
students. When students can apply that knowledge in their daily lives and reflect what
they have learned in their behavior, we can say that the goal of education has been
achieved. Assessment can also help motivate students to appropriate the language
without being afraid of failure or mistakes. This second chapter has given us a very clear
idea of how to construct a test, from the first step, which is designing it, to the last, which
is giving a grade. This whole process needs to apply these principles in order to achieve a
successful result in the teaching and learning process.
Tests must be prepared in such a way that both teachers and students benefit from
them. It is important to understand the real goal of tests, and this is much better done
by applying the principles already mentioned. Assessment should be seen as an
adequate tool to gauge the teacher's performance and the students' learning. Assessment
should not be seen as something to be avoided or feared, since it is assessment itself
that allows us to grow and improve collectively. In this sense, a well-designed test will give
students the confidence they need to improve every day and, in the same way, will
provide the teacher with a useful tool to follow his/her students' learning process.
Regarding the principles, all of them are considered fundamental in the assessment
process. They provide specific and clear characteristics of a well-designed test. For
example, validity is fundamental when designing a test: it ensures that the test measures
exactly what it intends to measure
in students. In this sense, content validity is important to consider, since it proposes to
evaluate the specific topic of a unit and to focus on the skill the teacher wants his/her
students to develop. In this way, students feel confident about their knowledge, and
teachers are satisfied with a test designed without failures in timing, objectivity, external
or internal conditions, or results.