Principles of Language Assessment
There are five cardinal criteria for "testing a test": practicality, reliability, validity, authenticity, and washback.
1. Practicality
A practical test is not excessively expensive, stays within appropriate time constraints, is relatively easy to administer, and has a scoring procedure that is specific and time-efficient.
Example: A test with 20 listening items based on an audio recording, 80 multiple-choice items on grammar, vocabulary, and reading comprehension, and an accompanying scoring grid is considered practical.
2. Reliability
A reliable test is consistent and dependable.
• If you give the same test to the same student on two different occasions, the results should be similar.
• Inter-rater reliability is compromised when two or more scorers yield inconsistent scores on the same test because of inattention, inexperience, or preconceived biases; in other words, when the scorers are not applying the same standards. (A numerical sketch of rater agreement follows this list.)
• Intra-rater reliability is threatened by unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or carelessness. A teacher's scoring of the first tests in a stack can differ from the last ones because the teacher grows tired, producing inconsistent evaluation across tests. Careful specification of analytical scoring criteria can increase rater reliability.
• Test administration reliability concerns the conditions in which the test is administered. In a listening comprehension test with a lot of noise from the street, students sitting next to the windows cannot hear well. Other threats include photocopying variations (too dark, too blurry), the amount of light in different parts of the room, variations in temperature, and the condition of chairs and desks.
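To make rater reliability concrete, here is a minimal sketch in Python (the scores are hypothetical, purely for illustration) comparing two raters who scored the same ten essays on a 1-to-5 band scale. Percent agreement is the naive index; Cohen's kappa corrects it for agreement expected by chance.

# Hypothetical bands assigned by two raters to the same ten essays.
rater_a = [4, 3, 5, 2, 4, 3, 5, 1, 2, 4]
rater_b = [4, 2, 5, 2, 3, 3, 5, 1, 2, 5]

# Percent agreement: how often the raters assign the identical band.
n = len(rater_a)
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability that both raters pick the same band
# at random, given how often each rater actually used each band.
bands = set(rater_a) | set(rater_b)
p_chance = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in bands)

# Cohen's kappa: agreement above chance (1.0 means perfect agreement).
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"Percent agreement: {p_observed:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")

A kappa well below 1.0 would signal that the two raters are not applying the same standards and that the scoring criteria need to be specified more carefully.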
3. Validity
A valid test measures what it is intended to measure.
Example:
• A valid test of reading ability actually measures reading ability, not 20/20 vision.
• To measure writing ability, one might ask students to write as many words as they can in 15 minutes and then simply count the words for the final score. This is easy to administer, but it is not a valid test of writing ability without some consideration of the comprehensibility and organization of what was written.
How is the validity of a test established?
There is no final, absolute measure of validity, but there are certain aspects that one can take into consideration:
• Examine the extent to which a test calls for performance that matches that of the
course or unit of study being tested
• Consider how well a test determines whether or not students have reached an established set of goals or a level of competence
• Statistical correlation with other related but independent measures (a minimal sketch follows this list)
• Focus on the consequences of a test, or even on the test-takers' perception of validity
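As an illustration of the statistical-correlation point above, here is a minimal sketch in Python (the scores and the instrument names are hypothetical, not from the source) that correlates students' scores on a teacher-made reading test with their scores on an independent, established measure of the same ability. A strong positive correlation is one piece of criterion-related evidence of validity.

import math

# Hypothetical paired scores: a teacher-made test and an
# independent measure of the same reading ability.
classroom_test = [62, 75, 80, 55, 90, 70, 85, 66]
external_measure = [58, 72, 84, 50, 93, 68, 80, 70]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# r close to +1 supports the claim that both instruments measure the
# same underlying ability; r near 0 undermines it.
print(f"r = {pearson_r(classroom_test, external_measure):.2f}")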
Types of evidence to establish the validity of a test
1. Content-Related Evidence
• If a test samples the subject matter about which conclusions are to be drawn, and if it requires the test-taker to perform the behavior that is being measured, it can claim content-related evidence of validity
• You can usually identify content-related evidence if you can clearly define the achievement that you are measuring
Example:
• If you are trying to assess a person's ability to speak a second language in a conversational setting, asking the learner to answer paper-and-pencil multiple-choice questions requiring grammatical judgments does not achieve content validity. A test that requires the learner to actually speak within some sort of authentic context does.
• Highly specialized and sophisticated testing instruments may have questionable content validity.
• Standard language proficiency tests lack content validity since they do not require the full spectrum of communicative performance on the part of the learner.
• What such proficiency tests lack in content validity, they may gain in other forms of evidence.
Understand the difference between direct and indirect testing:
• Direct testing: the test-taker actually performs the target task (for example, speaking in order to assess oral production).
• Indirect testing: the test-taker performs a task that is only related to the target task (for example, marking stressed syllables in a list of written words to assess pronunciation).
The most feasible rule of thumb for achieving content validity is to test performance directly.
2. Criterion-Related Evidence
• The extent to which the criterion of the test has actually been reached
• Specified classroom objectives are measured, and implied predetermined levels of
performance are expected to be reached
• In the case of teacher-made classroom assessments, criterion-related evidence is
demonstrated through a comparison of results of an assessment with results of some
other measure of the same criterion
Example: In a course unit where students are expected to produce voiced and voiceless stops orally in all phonetic environments, the results of one teacher's unit test might be compared with an independent assessment of the same phonemic proficiency.
• Criterion-related evidence falls into one of two categories: concurrent validity and predictive validity.
• Concurrent validity: a test's results are supported by other concurrent performance beyond the assessment itself.
Example: the validity of a high score on the final exam of a foreign language course will be sustained by actual proficiency in the language.
• Predictive validity: an assessment predicts a test-taker's likelihood of future success, as in placement or admissions tests.
3. Construct-Related Evidence
• A construct is any theory, hypothesis or model that attempts to explain
observed phenomena in our universe of perceptions
• In the field of assessment, construct validity asks: Does this test actually tap into the theoretical construct as it has been defined?
• Standardized, large-scale tests adhere to the principle of practicality, and because they must sample a limited number of domains of language, they may not be able to contain all the content of a particular skill.
• The TOEFL, for example, has until recently not attempted to sample oral production, yet oral production is clearly an important part of the ability the test claims to measure.
4. Consequential Validity
• Encompasses the consequences of a test, including its impact on the preparation of test-takers, its effect on the learner, and the social consequences of its interpretation and use.
• The extent to which students view the assessment as fair, relevant, and useful for improving learning.
5. Face Validity
• The degree to which a test looks right and appears to measure the knowledge or abilities it claims to measure.
• Students may feel that a test is not testing what it is supposed to test.
• Face validity means that the students perceive the test to be valid.
Face validity will be high if students encounter a test that:
• Has clear instructions
• Has appropriate timing
• Contains no surprises (an expected format with familiar tasks)
• Is logically organized
The other side of this issue reminds us of the psychological state of the learner: confidence and low anxiety are important ingredients in peak performance.
4. Authenticity
• The degree of correspondence of the characteristics of a given language test task to the features of a target language task.
• The students have to perform tasks that were included in the previous classroom lessons and that represent the objectives of the unit.
• Are classroom objectives identified? Compare:
Valid objective: Students will produce yes/no questions with final rising intonation.
Invalid objectives: "Students should be able to demonstrate some reading comprehension" or "Practice vocabulary in context." These are invalid because they are ambiguous and no standards of performance are implied.
5. Washback
• The effect of testing on teaching and learning.
• When you return a written test, consider giving more than a number as your feedback.