Week V & VI

Testing Principles

Practicality
 Is not excessively expensive
 Stays within appropriate time constraints
 Is relatively easy to administer
 Has a scoring/evaluation procedure that is specific and time-efficient
 Uses items that can be replicated in terms of the resources needed, e.g. time, materials, people
 Can be administered, graded, and its results interpreted with reasonable ease
Reliability
 A reliable test is consistent and dependable.
 Reliability relates to accuracy, dependability and consistency, e.g. 20°C here today and 20°C in northern Italy – are they the same?
 According to Henning [1987], reliability is "a measure of accuracy, consistency, dependability, or fairness of scores resulting from the administration of a particular examination", e.g. 75% on a test today but 83% tomorrow signals a reliability problem.
Reliability
 Student-related reliability: the deviation of an observed score from one's true score because of temporary illness, fatigue, anxiety, a bad day, etc.
 Rater reliability: two or more raters yield inconsistent scores for the same test because of lack of attention to scoring criteria, inexperience, inattention, or preconceived bias (a correlation sketch follows this list).
 Administration reliability: unreliable results because of the testing environment, such as noise, poor-quality audio equipment, etc.
 Test reliability: measurement errors arise because the test is too long.
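Rater reliability is often checked empirically by correlating two raters' scores on the same set of performances. A minimal sketch in Python; the function and score data are illustrative, not from the source:

```python
# Minimal sketch: rater reliability as the Pearson correlation
# between two raters' scores on the same five performances.
# All scores are hypothetical illustration data.
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rater_a = [75, 82, 60, 90, 70]   # hypothetical scores, rater A
rater_b = [78, 80, 65, 88, 74]   # hypothetical scores, rater B
print(f"inter-rater correlation: {pearson(rater_a, rater_b):.2f}")
```

A correlation near 1 suggests the two raters are applying the scoring criteria consistently.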
To Make a Test More Reliable
 Take a large enough sample of behaviour
 Exclude items which do not discriminate well between weaker and stronger students (see the discrimination-index sketch after this list)
 Do not allow candidates too much freedom
 Provide clear and explicit instructions
 Make sure that tests are well laid out and legible
 Make candidates familiar with the format and testing techniques
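One common way to operationalize "discriminates well" is the discrimination index: the proportion of high scorers who answered an item correctly minus the proportion of low scorers who did. A minimal sketch with invented numbers:

```python
# Minimal sketch of an item discrimination index
# D = p_upper - p_lower, where p_upper and p_lower are the
# proportions of the top and bottom scoring groups answering
# the item correctly. All data below are hypothetical.

def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D ranges from -1 to +1; near-zero items discriminate poorly."""
    return upper_correct / upper_n - lower_correct / lower_n

# Hypothetically: 24 of the 27 strongest students got the item
# right, but only 9 of the 27 weakest did.
print(f"D = {discrimination_index(24, 27, 9, 27):.2f}")  # D = 0.56
```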
To Make a Test More Reliable
 Provide uniform and non-distracting conditions of administration
 Use items that permit objective scoring
 Provide a detailed scoring key
 Train scorers
 Identify candidates by number, not by name
 Employ multiple, independent scoring
Measuring Reliability

 Test-retest reliability: administer whatever test is involved two times to the same group.
 Equivalent-forms/parallel-forms reliability: administer two different but equal tests to a single group of students (e.g. Forms A and B).
 Internal consistency reliability: estimate the consistency of a test using only information internal to the test, available from one administration of a single test. This procedure is called the split-half method.
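A minimal sketch of the split-half procedure: correlate scores on the odd- and even-numbered items, then correct for full test length with the Spearman-Brown formula r_full = 2r / (1 + r). The 0/1 response matrix is invented for illustration (statistics.correlation requires Python 3.10+):

```python
# Minimal sketch of split-half reliability with the
# Spearman-Brown correction. Item responses are hypothetical.
from statistics import correlation  # Python 3.10+

# Rows = test takers, columns = items (1 = correct, 0 = wrong).
responses = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 1, 1, 1],
]

odd  = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, 7
even = [sum(row[1::2]) for row in responses]   # items 2, 4, 6, 8

r_half = correlation(odd, even)
r_full = 2 * r_half / (1 + r_half)             # Spearman-Brown
print(f"half-test r = {r_half:.2f}, split-half reliability = {r_full:.2f}")
```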
Validity
 Criterion-related validity: the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidates' ability.
 Construct validity: the degree to which a test measures a construct, i.e. any theory, hypothesis, or model that attempts to explain observed phenomena in our universe and perception. Proficiency and communicative competence are linguistic constructs; self-esteem and motivation are psychological constructs.
Reliability Coefficient
 The reliability coefficient is used to compare the reliability of different tests.
 Lado: vocabulary, structure, reading (0.90–0.99); auditory comprehension (0.80–0.89); oral production (0.70–0.79).
 Standard error: how far an individual test taker's actual score is likely to diverge from their true score.
 Classical analysis gives us a single estimate for all test takers.
 Item response theory gives an estimate for each individual, basing this estimate on that individual's performance.
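Under classical analysis, the standard error of measurement (SEM) is commonly computed from the test's standard deviation and its reliability coefficient. A worked example with hypothetical values (SD = 10, r = 0.91):

```latex
\mathrm{SEM} = SD\,\sqrt{1 - r_{xx}} = 10 \times \sqrt{1 - 0.91} = 3
```

So an observed score of, say, 75 most likely places the true score within roughly 72–78 (plus or minus one SEM).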
The Item Response Theory
 The item response theory (IRT), also known as latent response theory, refers to a family of mathematical models that attempt to explain the relationship between latent traits (unobservable characteristics or attributes) and their manifestations (i.e. observed outcomes, responses, or performance).
 These models establish a link between the properties of items on an instrument, the individuals responding to these items, and the underlying trait being measured.
The Item Response Theory

 IRT assumes that the latent construct (e.g. stress, knowledge, attitudes) and the items of a measure are organized along an unobservable continuum.
 Therefore, its main purpose focuses on establishing the individual's position on that continuum.
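A minimal sketch of the simplest IRT model, the one-parameter (Rasch) model, in which the probability of a correct response depends only on the distance between the test taker's ability theta and the item's difficulty b on that continuum. All values are hypothetical:

```python
# Minimal sketch of a one-parameter (Rasch) IRT model: the chance
# that a test taker with ability theta answers an item of
# difficulty b correctly. Both parameters live on the same
# unobservable continuum described above.
from math import exp

def p_correct(theta: float, b: float) -> float:
    """Rasch model: P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + exp(-(theta - b)))

for theta in (-1.0, 0.0, 1.0):
    print(f"ability {theta:+.1f} vs item difficulty 0.0: "
          f"P = {p_correct(theta, 0.0):.2f}")
# ability -1.0 -> P = 0.27; 0.0 -> 0.50; +1.0 -> 0.73
```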
Classical Test Theory
 Classical Test Theory [Spearman, 1904; Novick, 1966] focuses on the same objective; before the conceptualization of IRT, it was (and still is) used to predict an individual's latent trait based on an observed total score on an instrument.
 In CTT, the true score represents the level of the latent variable, and the observed score is the true score plus measurement error.
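The CTT decomposition is usually written as follows, with reliability interpretable as the share of observed-score variance attributable to true scores:

```latex
X = T + E, \qquad
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```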
Validity
 The extent to which the inferences made from
assessment results are appropriate, meaningful and
useful in terms of the purpose of the assessment.
 Content validity: requires the test taker to perform the behaviour that is being measured.
 Its content constitutes a representative sample of the language skills, structures, etc. that it is meant to measure.
Validity
 Consequential validity: encompasses accuracy in measuring intended criteria, the test's impact on the preparation of test takers, its effects on the learner, and the social consequences of test interpretation and use.
 Face validity: the degree to which the test looks right and appears to measure the knowledge and ability it claims to measure, based on the subjective judgement of the examinees who take it, the administrative personnel who decide on its use, and other psychometric observers.
Validity
Response validity [internal]
 the extent to which test takers respond in the way expected by the test developers
Concurrent validity [external]
 the extent to which test takers' scores on one test relate to those on another externally recognised test or measure
Predictive validity [external]
 the extent to which scores on test Y predict test takers' ability to do X, e.g. IELTS scores and subsequent success in academic studies at university
Validity
 'Validity is not a characteristic of a test, but a
feature of the inferences made on the basis of test
scores and the uses to which a test is put.'
 To make a test more valid:
1) Write explicit test specifications
2) Use direct testing
3) Relate the scoring of responses directly to what is being tested
4) Make the test reliable
Washback

 The quality of the relationship between a test and associated teaching.
 The effect can be positive or negative.
 A test is considered valid when it has good washback.
 Students should have ready access to discuss the feedback and evaluation you have given.
Washback
 The effect of testing on teaching and learning
 The effect of a test on instruction, in terms of how students prepare for the test
 A formative test provides washback in the form of information to the learner on progress toward goals, while a summative test is always the beginning of further pursuits, more learning, more goals
 To improve washback: use direct testing, use criterion-referenced testing, base achievement tests on objectives, and make sure that the tests are understood by students and teachers
Evaluation of Classroom Tests

 Are the test procedures practical?
 Is the test reliable?
 Does the procedure demonstrate content validity?
 Are the test tasks as authentic as possible?
 Does the test give beneficial washback?
Norm-Referenced Tests

 Norm-referenced refers to standardized tests that are designed to compare and rank test takers in relation to one another.
 Norm-referenced tests report whether test takers performed better or worse than a hypothetical average student, which is determined by comparing scores against the performance results of a statistically selected group of test takers, typically of the same age or grade level, who have already taken the exam.
Norm-Referenced Tests

 Calculating norm-referenced scores is called the "norming process," and the comparison group is known as the "norming group." Norming groups typically comprise only a small subset of previous test takers, not all or even most previous test takers.
 Test developers use a variety of statistical methods to select norming groups, interpret raw scores, and determine performance levels.
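A minimal sketch of one such method: interpret a raw score against the norming group's mean and standard deviation by converting it to a z-score and an approximate percentile. The norming-group statistics are invented:

```python
# Minimal sketch of interpreting a raw score against a norming
# group: convert to a z-score, then to an approximate percentile
# via the normal CDF. All numbers are hypothetical.
from statistics import NormalDist

norm_mean, norm_sd = 52.0, 8.0   # hypothetical norming-group stats
raw = 60.0

z = (raw - norm_mean) / norm_sd
percentile = NormalDist().cdf(z) * 100

print(f"z = {z:.2f}, roughly the {percentile:.0f}th percentile")
# z = 1.00, roughly the 84th percentile
```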
NRT

 An NRT is designed to measure global language abilities such as overall English proficiency, academic listening ability, reading comprehension, and so on.
 Each student's score on such a test is interpreted relative to the scores of all other students who took the test, with reference to the normal distribution.
CRT

 A criterion-referenced test is usually produced to measure well-defined and fairly specific instructional objectives.
 The interpretation of a CRT is considered absolute, in the sense that each student's score is meaningful without reference to the other students' scores.
Criterion-referenced test results

 They are often based on the number of correct answers provided by students, and scores might be expressed as a percentage of the total possible number of correct answers.
 On a norm-referenced exam, however, the score would reflect how many more or fewer correct answers a student gave in comparison to other students.
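The contrast can be made concrete with a small sketch: the criterion-referenced report is an absolute percentage of the possible answers, while the norm-referenced report locates the same raw score among peers. All numbers are invented:

```python
# Minimal sketch contrasting the two score reports described above.
# Data are hypothetical.

def crt_score(correct: int, total: int) -> float:
    """Criterion-referenced: percent of total possible correct."""
    return 100.0 * correct / total

def nrt_rank(raw: int, peer_scores: list[int]) -> float:
    """Norm-referenced: percent of peers the raw score beats."""
    return 100.0 * sum(s < raw for s in peer_scores) / len(peer_scores)

peers = [28, 35, 40, 41, 44, 47, 50, 52, 55, 58]
print(f"CRT report: {crt_score(45, 60):.0f}% correct")        # 75%
print(f"NRT report: outscores {nrt_rank(45, peers):.0f}% of peers")  # 50%
```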

Criterion-referenced test results

 Hypothetically, if all the students who took a norm-referenced test performed poorly, the least-poor results would rank students in the highest percentile. Similarly, if all students performed extraordinarily well, the least-strong performance would rank students in the lowest percentile.
Norm-Referenced vs. Criterion-Referenced Tests
 Norm-referenced tests are specifically designed to
rank test takers on a “bell curve,” or a distribution
of scores that resembles, when graphed, the outline
of a bell—i.e., a small percentage of students
performing well, most performing average, and a
small percentage performing poorly.
 To produce a bell curve each time, test questions
are carefully designed to accentuate performance
differences among test takers, not to determine if
students have achieved specified learning
standards, learned certain material, or acquired
specific skills and knowledge.
 Tests that measure performance against a fixed set
of standards or criteria are called criterion-
referenced tests.
Characteristics of NRT and CRT
 Type of interpretation: NRT relative; CRT absolute
 Type of measurement: NRT measures general language abilities; CRT measures specific objective-based language points
 Purpose of testing: NRT spreads students out along a continuum of general abilities or proficiencies; CRT assesses the amount of material known or learned by each student
 Distribution of scores: NRT normal distribution; CRT varies, often non-normal
 Test structure: NRT a few relatively long subtests with a variety of item content; CRT a series of short, well-defined subtests with similar item content
 Knowledge of questions: NRT students have little or no idea of what content to expect in test items; CRT students know exactly what content to expect in test items
Test and Decision Purposes
Types of decision: norm-referenced (Proficiency, Placement) and criterion-referenced (Achievement, Diagnostic)
 Detail of information: Proficiency very general; Placement general; Achievement specific; Diagnostic very specific
 Focus: Proficiency general skills prerequisite to entry; Placement all levels and skills of the program; Achievement terminal objectives of the course; Diagnostic terminal and enabling objectives
 Purpose of decision: Proficiency to compare individual with individual; Placement to find each student's appropriate level; Achievement to determine the degree of learning for advancement or graduation; Diagnostic to inform students and teachers of weaker objectives
 Relationship to program: Proficiency comparisons with other institutions; Placement comparisons within the program; Achievement directly related to objectives; Diagnostic related to objectives needing more work
 When administered: Proficiency before entry and at exit; Placement beginning of the program; Achievement end of courses; Diagnostic beginning and/or middle of courses
 Score interpretation: Proficiency wide spread of …; Placement spread of …; Achievement overall number …; Diagnostic percentage of …
Characteristics of communicative tests
 Communicative test setting requirements:
1) Meaningful communication
2) Authentic situation
3) Unpredictable language input
4) Creative language output
5) All language skills
 Bases for ratings
1) Success in getting meaning across
2) Use focus rather than usage
3) New components to be rated
Components of Communicative Competence
 Grammatical competence (phonology,
orthography, vocabulary, word formation,
sentence formation)
 Sociolinguistic competence (social
meanings, grammatical forms in different
sociolinguistic contexts)
Components of Communicative Competence
 Discourse competence (cohesion in different genres, coherence in different genres)
 Strategic competence (grammatical
difficulties, sociolinguistic difficulties,
discourse difficulties, performance
factors)
Discrete-point/Integrative Issue

 Discrete-point tests measure the small bits and pieces of a language, as in a multiple-choice test made up of questions constructed to measure students' knowledge of different structures.
 Integrative tests measure several skills at one time, as in dictation.
Practical Issues

 Fairness issue: a test treats every student the same
 The cost issue
 Ease of test construction
 Ease of test administration
 Ease of test scoring
 Interactions of theoretical issues
General Guidelines for Item Formats
 Match the format correctly to the purpose and content of the item
 Check that there is only one correct answer
 Write at the students' level of proficiency
 Avoid ambiguous terms and statements
 Avoid negatives and double negatives
General Guidelines for Item Formats
 Avoid giving clues that could be used in answering other items
 Keep all parts of the item on the same page
 Present only relevant information
 Avoid bias of race, gender and nationality
 Have another person look over the items
Stem
 A multiple-choice item consists of a problem, known as the stem, and a list of suggested solutions, known as alternatives.
 The alternatives consist of one correct or best alternative, which is the answer, and incorrect or inferior alternatives, known as distractors.
Stem
 The stem should not contain irrelevant material, which can decrease the reliability and the validity of the test scores (Haladyna and Downing 1989).
 The stem should be negatively stated only when significant learning outcomes require it.
 Consider whether the stem:
 presents a clearly defined problem or task to the student,
 contains unnecessary information,
 could be more simply, clearly, or concisely stated.
Versatility

 Multiple-choice test items can be written to assess various levels of learning outcomes, from basic recall to application, analysis, and evaluation.
 Because students are choosing from a set of potential answers, however, there are obvious limits on what can be tested with multiple-choice items.
 For example, they are not an effective way to test students' ability to organize thoughts or articulate explanations or creative ideas.
Consider the Following When Reviewing Multiple-Choice Questions
 Consider whether the alternatives:
 are parallel in structure,
 fit logically and grammatically with the stem,
 could be more simply, clearly, or concisely stated,
 are so inclusive that they logically eliminate any other option from being a possible answer.
Consider the Following When Reviewing Multiple-Choice Questions
 Consider whether the distractors:
 contain one or more options a student could reasonably consider a correct answer,
 are plausible enough to be attractive to students who are low achievers,
 contain one or more that call attention to the key.
More than one correct answer
 The apple is located on or around
A) a table   C) the table
B) an table   D) table
- Flaws: two correct answers (A and C); the stem is wordy ("on or around"); the word "table" is repeated inefficiently in every option.
Multiple Choice
 Do you see the chair and table? The apple is on _____ table.
a) a   c) the
b) an   d) (no article)
 Option d (no article) will be easily detected as a wrong option, so it is not a good distractor.
Guidelines for Writing Multiple-Choice Test Items
 The following are some guidelines that you should use when preparing multiple-choice test items.
 The entire stem must always precede the alternatives, and it should contain the problem and any clarifications.
 Avoid negatively stated stems.
Guidelines for Writing Multiple-Choice Test Items
 If an omission occurs in the stem, it should appear near the end of the stem, not at the beginning.
 Use only correct grammar in the stem and alternatives.
 Avoid repeating words between the stem and the key. You can, however, repeat words in distractors to make them more attractive.
Guidelines for Writing Multiple-Choice Test Items
 Avoid wording directly from a reading passage or
use of stereotyped phrasing in the key.
 Try to avoid “all of the above” as the last option. If
a student can eliminate any of the other choices, this
choice can be automatically eliminated as well.
Guidelines for Writing Multiple-Choice Test Items
 To test understanding of a term or concept,
present the term in the stem followed by
definitions or descriptions in the alternatives.
 Do not use “none of the above” as the last option
when the correct answer is simply the best
answer among the choices offered.
 Avoid terms such as “always” or “never,” as they
generally signal incorrect choices.
True-False

 According to the passage, antidisestablishmentarianism diverges fundamentally from the conventional proceedings and traditions of the Church of England. (T / F)
- Flaw: the vocabulary is too difficult.
Ambiguous Word

 Why are statistical studies inaccessible to language teachers in Brazil, according to the reading passage?
 Inaccessible (hard to understand): language teachers get very little training in mathematics and/or such teachers are averse (strongly disliking or opposed) to numbers.
 Inaccessible (hard to obtain): the libraries may be far away.
Double negatives
 One theory that is not unassociated with Noam Chomsky is:
 A. Transformational generative grammar
 B. Case grammar
 C. Non-universal phonology
 D. Acoustic phonology
- Use one negative only.
- Emphasize it by underlining, upper case, or boldface, for example: not, NEVER, inconsistent.
Receptive response items

 True-False
1) The statement is worded carefully enough that it can be judged without ambiguity
2) Absoluteness clues are avoided
 Matching
1) More options than premises
2) Options shorter than premises, to reduce reading
3) Option and premise lists related to one central theme
Multiple Choice

 Unintentional clues are avoided
 The distractors are plausible
 Needless redundancy in the options is avoided
 Ordering of the options is carefully considered
 The correct answers are randomly assigned
True-False

 Items should be worded carefully enough that they can be judged without ambiguity
 Avoid absoluteness clues
 This book is always crystal clear in all its explanations: T F
- Absoluteness clues allow the students to answer correctly without knowing the correct response.
- Absoluteness clues: all, always, absolutely, never, rarely, most often
Guidelines for good alternatives

 Use a logical sequence for alternatives (e.g., temporal (relating to time) sequence, length of the choice).
 If two alternatives are very similar (cognitively or visually), they should be placed next to one another to allow students to compare them more easily.
Guidelines for good alternatives

 Make all incorrect alternatives (i.e., distractors) plausible and attractive. It is often useful to use popular misconceptions and frequent mistakes as distractors.
 Make all alternatives grammatically consistent with the stem.
 Item distractors should include only correct forms and vocabulary that actually exist in the language.
Guidelines for good alternatives

 Use 4 or 5 alternatives in each item.
 If one or more alternatives are partially correct, ask for the "best" answer.
 Alternatives should not overlap in meaning or be synonymous with one another.
Guidelines for good alternatives

 All alternatives should be homogeneous in content, form, and grammatical structure.
 The length, explicitness, and technical information in each alternative should be parallel so as not to give away the correct answer.
Multiple Choice

 Avoid unintentional clues
 The fruit that Adam ate in the Bible was an ____
A. pear   C. apple
B. banana   D. papaya
 Unintentional clues can be grammatical, phonological, morphological, etc.; here the article "an" points to the only vowel-initial option, apple.
Multiple Choice

 Are all distractors plausible?
 Adam ate _______
A. an apple   C. an apricot
B. a banana   D. a tire
 "A tire" is not plausible, so it is a wasted distractor.
Multiple Choice

 Avoid needless redundancy
 The boy on his way to the store, walking down the street, when he stepped on a piece of cold wet ice and
A. fell flat on his face
B. fall flat on his face
C. felled flat on his face
D. falled flat on his face
Multiple Choice

 More effective:
The boy stepped on a piece of ice and ______ flat on his face.
A. fell
B. fall
C. felled
D. falled
Multiple Choice

 Correct answers should be randomly assigned
 Distractors like "none of the above", "A and B only", "all of the above" should be avoided
Matching

 Present the students with two columns of information; the students then must find and identify matches between the two sets of information.
 An entry in the left-hand column is called a matching-item premise.
 An entry in the right-hand column is called an option.
Matching

 More options should be supplied than premises, so that students cannot narrow down the choices as they progress through the test simply by keeping track of the options they have used.
 Options should be shorter than premises, because most students will read a premise and then search through the options.
 The options and premises should relate to one central theme that is obvious to the students.
Fill in Items
 The required response should be concise
 Bad item:
 John walked down the street ________
(slowly, quickly, angrily, carefully, etc.)
 Good item:
 John stepped onto the ice and immediately
____ down hard (fell)
Fill in Items

 There should be sufficient context to convey the intent of the question to the students.
 The blanks should be standard in length.
 The main body of the question should precede the blank.
 Develop a list of acceptable responses.
Short Response

 Items that the students can answer in a few phrases or sentences.
 The item should be formatted so that only one relatively concise answer is possible.
 The item should be framed as a clear and direct question.
 E.g. According to the reading passage, what are the three steps in doing research?
Task Items
 A task item is any of a group of fairly open-ended item types that require students to perform a task in the language being tested.
 The task should be clearly defined
 The task should be sufficiently narrow for the time
available.
 A scoring procedure should be worked out in advance
in regard to the approach that will be used.
Task Items
 A scoring procedure should be worked out
in advance in regard to the categories of
language that will be rated.
 The scoring procedure should be clearly
defined in terms of what each score within
each category means.
 The scoring should be anonymous
Analytic Score for Rating Composition Tasks

Holistic Version of the Scale for Rating Composition Tasks
 Content
 Organization
 Language Use
 Vocabulary
 Mechanics
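An analytic scale like this one typically combines the category ratings into a single composition score using per-category weights. A minimal sketch; the weights and ratings below are hypothetical, not taken from the source:

```python
# Minimal sketch of combining analytic category ratings into one
# composition score. The five categories come from the scale above;
# the weights and the sample ratings are invented.

WEIGHTS = {
    "content": 0.30,        # hypothetical weight
    "organization": 0.20,
    "language_use": 0.25,
    "vocabulary": 0.20,
    "mechanics": 0.05,
}

def composite(ratings: dict[str, float]) -> float:
    """Weighted average of 0-100 category ratings."""
    return sum(WEIGHTS[c] * r for c, r in ratings.items())

essay = {"content": 80, "organization": 70, "language_use": 75,
         "vocabulary": 85, "mechanics": 90}
print(f"composite score: {composite(essay):.1f}")   # 78.2
```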
Personal Response Items

 The response allows the students to communicate in ways and about things that are interesting to them personally.
 Personal responses include: self-assessment, conferences, portfolios.
Self-Assessment
 Decide on a scoring type
 Decide what aspects of the students' language performance they will be assessing
 Develop a written rating scale for the learners
 The rating scale should describe concrete language and behaviours in simple terms
 Plan the logistics of how the students will assess themselves
 The students should learn the self-scoring procedures
 Have another student/teacher do the same scoring
Conferences
 Introduce and explain conferences to the students
 Give the students the sense that they are in control of
the conference
 Focus the discussion on the students’ views concerning
the learning process
 Work with the students concerning self-image issues
 Elicit performances on specific skills that need to be
reviewed.
 The conferences should be scheduled regularly
Portfolios

 Explain the portfolios to the students
 Decide who will take responsibility for what
 Select and collect meaningful work
 The students periodically reflect in writing on their portfolios
 Have other students, teachers, and outsiders periodically examine the portfolios