Principles of High Quality Assessment and Reliability

Good day!

Recap
3. VALIDITY
◾ Something valid is something fair.
◾ A valid test is one that measures what it is supposed to measure.

Types of Validity
◾ Face: What do students think of the test?
◾ Construct: Am I testing in the way I taught?
◾ Content: Am I testing what I taught?
◾ Criterion-related: How does this compare with an existing valid test?

Tests can be made more valid by making them more subjective (open items).
Validity – the appropriateness, correctness, meaningfulness, and usefulness of the specific conclusions that a teacher reaches regarding the teaching-learning situation.

◾ Content validity – the content and format of the instrument
   i. Students' adequate experience
   ii. Coverage of sufficient material
   iii. Reflection of the degree of emphasis

◾ Face validity – the outward appearance of the test; the lowest form of test validity

◾ Criterion-related validity – the test is judged against a specific criterion

◾ Construct validity – the test loads on a "construct" or factor


PRINCIPLES OF HIGH QUALITY ASSESSMENT
1. Clarity of learning targets (knowledge, reasoning, skills, products, affect)
2. Appropriateness of assessment methods
3. Validity
4. Reliability
5. Fairness
6. Positive consequences
7. Practicality and efficiency
8. Ethics
1. CLARITY OF LEARNING TARGETS
(knowledge, reasoning, skills, products, affect)
Assessment can be made precise, accurate, and dependable only if what is to be achieved is clearly stated and feasible. The learning targets, involving knowledge, reasoning, skills, products, and affect, need to be stated in behavioral terms, i.e., terms that denote something observable in the behavior of the students.
CLARITY OF LEARNING TARGETS (CONT.)
Cognitive Targets
Benjamin Bloom (1956) proposed a hierarchy of educational objectives at the cognitive level. These are:
• Knowledge – acquisition of facts, concepts, and theories
• Comprehension – understanding; involves cognition or awareness of interrelationships
• Application – transfer of knowledge from one field of study to another, or from one concept to another concept in the same discipline
• Analysis – breaking down a concept or idea into its components and explaining the concept as a composition of these components
• Synthesis – the opposite of analysis; entails putting the components together in order to summarize the concept
• Evaluation and reasoning – valuing and judgment, or putting a "worth" on a concept or principle
CLARITY OF LEARNING TARGETS (CONT.)

Skills, Competencies and Abilities Targets
§ Skills – specific activities or tasks that a student can proficiently do
§ Competencies – clusters of skills
§ Abilities – made up of related competencies, categorized as:
   i. Cognitive
   ii. Affective
   iii. Psychomotor

Products, Outputs and Project Targets
- tangible and concrete evidence of a student's ability
- need to clearly specify the level of workmanship of projects:
   i. expert
   ii. skilled
   iii. novice
2. APPROPRIATENESS OF ASSESSMENT METHODS
a. Written-Response Instruments
§ Objective tests – appropriate for assessing the various levels of the hierarchy of educational objectives
§ Essays – can test the students' grasp of higher-level cognitive skills
§ Checklists – lists of several characteristics or activities presented to the subjects of a study, who place a mark opposite each characteristic that applies
2. APPROPRIATENESS OF ASSESSMENT METHODS (CONT.)
b. Product Rating Scales
§ Used to rate products like book reports, maps, charts, diagrams, notebooks, and creative endeavors
§ Need to be developed to assess various products over the years

c. Performance Tests – Performance Checklist
§ Consists of a list of behaviors that make up a certain type of performance
§ Used to determine whether or not an individual behaves in a certain way when asked to complete a particular task
2. APPROPRIATENESS OF ASSESSMENT METHODS (CONT.)
d. Oral Questioning – an appropriate assessment method when the objectives are to:
§ assess the students' stock knowledge, and/or
§ determine the students' ability to communicate ideas in coherent verbal sentences

e. Observation and Self-Reports
§ Useful supplementary methods when used in conjunction with oral questioning and performance tests
5. FAIRNESS

The concept that assessment should be "fair" covers a number of aspects:
◾ Student knowledge of the learning targets and assessments
◾ Opportunity to learn
◾ Prerequisite knowledge and skills
◾ Avoiding teacher stereotypes
◾ Avoiding bias in assessment tasks and procedures
6. POSITIVE CONSEQUENCES

Learning assessments provide students with effective feedback and can improve their motivation and/or self-esteem. Moreover, assessments of learning give students the tools to assess themselves and understand how to improve.
- Positive consequences on students, teachers, parents, and other stakeholders
7. PRACTICALITY AND EFFICIENCY

◾ Something practical is something effective in real situations.
◾ A practical test is one which can be practically administered.

Questions:
◾ Will the test take longer to design than to apply?
◾ Will the test be easy to mark?

Tests can be made more practical by making them more objective (more controlled items). The test should also be familiar to the teacher and should not require too much time.

Factors to consider:
◾ Teacher familiarity with the method
◾ Time required
◾ Complexity of administration
◾ Ease of scoring
◾ Ease of interpretation
◾ Cost
RELIABILITY, VALIDITY & PRACTICALITY

The problem:
◾ The more reliable a test is, the less valid.
◾ The more valid a test is, the less reliable.
◾ The more practical a test is, (generally) the less valid.

The solution:
As in everything, we need a balance (in both exams and exam items).
8. ETHICS

◾ Informed consent
◾ Anonymity and confidentiality in:
   1. Gathering data
   2. Recording data
   3. Reporting data
ETHICS IN ASSESSMENT – "RIGHT AND WRONG"

◾ Conforming to the standards of conduct of a given profession or group
◾ Ethical issues that may be raised:
   i. Possible harm to the participants
   ii. Confidentiality
   iii. Presence of concealment or deception
   iv. Temptation to assist students
Reliability and Other Desired Characteristics

RELIABILITY
◾ Something reliable is something that works well and that you can trust.
◾ A reliable test is a consistent measure of what it is supposed to measure.

Questions:
◾ Can we trust the results of the test?
◾ Would we get the same results if the test were taken again and scored by a different person?

Tests can be made more reliable by making them more objective (controlled items).
◾Reliability is the extent to
which an experiment, test, or
any measuring procedure yields
the same result on repeated
trials.
Equivalency reliability is the extent
to which two items measure identical
concepts at an identical level of
difficulty. Equivalency reliability is
determined by relating two sets of test
scores to one another to highlight the
degree of relationship or association.
◾ Stability reliability (sometimes called test-retest reliability) is the agreement of measuring instruments over time. To determine stability, a measure or test is repeated on the same subjects at a future date.
◾ Internal consistency is the extent to which the items of a test or procedure all assess the same characteristic, skill, or quality; it is a measure of the precision with which the instrument's items jointly measure that characteristic.
◾Interrater reliability is the extent
to which two or more individuals
(coders or raters) agree. Interrater
reliability addresses the consistency
of the implementation of a rating
system.
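As a minimal sketch, interrater reliability in its simplest form can be quantified as the proportion of cases on which two raters assign the same rating. The rating data below are hypothetical, purely for illustration:

```python
# Minimal sketch of interrater agreement; the ratings below are hypothetical.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]

def percent_agreement(ratings_1, ratings_2):
    """Proportion of cases on which two raters give the same rating."""
    matches = sum(1 for a, b in zip(ratings_1, ratings_2) if a == b)
    return matches / len(ratings_1)

print(percent_agreement(rater_a, rater_b))  # 5 of 6 ratings agree
```

More refined indices (such as Cohen's kappa) additionally correct for chance agreement, but simple percent agreement is the idea behind "consistency of the implementation of a rating system."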
RELIABILITY – CONSISTENCY, DEPENDABILITY, STABILITY
It can be estimated by:
◾ The split-half method, calculated using
   i. the Spearman-Brown prophecy formula
   ii. the Kuder-Richardson formulas (KR-20 and KR-21)
◾ Consistency of test results when the same test is administered at two different times
   i. the test-retest method
   ii. correlating the two sets of test results
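A minimal sketch of the split-half method, assuming hypothetical odd-item and even-item half scores for five students: correlate the two halves, then step the half-test correlation up to full-test length with the Spearman-Brown prophecy formula, r_full = 2r / (1 + r):

```python
import math

# Hypothetical half-test scores for five students (odd items vs. even items).
odd_half = [10, 12, 9, 15, 11]
even_half = [11, 13, 8, 14, 12]

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)

r_half = pearson(odd_half, even_half)
print(spearman_brown(r_half))  # estimated full-test reliability
```

The same `pearson` helper also serves the test-retest method: correlate the scores from the two administrations directly, with no Spearman-Brown correction.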
RELIABILITY
• It provides the consistency that makes validity possible.
• It indicates the degree to which various kinds of generalizations are justifiable.
• It refers to the consistency of measurement, that is, how consistent test scores or other assessment results are from one measurement to another.
NATURE OF RELIABILITY
The meaning of reliability, as applied to testing and assessment, can be further clarified by noting the following general points:

1. Reliability refers to the results obtained with an assessment instrument and not to the instrument itself.
2. An estimate of reliability always refers to a particular type of consistency.
3. Reliability is a necessary but not sufficient condition for validity.
Determining Reliability by
Correlation Methods
TERMINOLOGY
• CORRELATION COEFFICIENT – a statistic that indicates the degree of relationship between any two sets of scores obtained from the same group of individuals (e.g., the correlation between height and weight).
• VALIDITY COEFFICIENT – a correlation coefficient that indicates the degree to which a measure predicts or estimates performance on some criterion measure (e.g., the correlation between scholastic aptitude scores and grades in school).
• RELIABILITY COEFFICIENT – a correlation coefficient that indicates the degree of relationship between two sets of scores intended to be measures of the same characteristic (e.g., the correlation between scores assigned by two different raters, or scores obtained from administrations of two forms of a test).
Methods of Estimating Reliability

Test-retest (measure of stability): Give the same test twice to the same group, with a time interval between tests ranging from several minutes to several years.

Equivalent forms (measure of equivalence): Give two forms of the test to the same group in close succession.

Test-retest with equivalent forms (measure of stability and equivalence): Give two forms of the test to the same group, with an increased time interval between forms.

Split-half (measure of internal consistency): Give the test once; score two equivalent halves of the test; correct the correlation between halves to fit the whole test using the Spearman-Brown formula.

Coefficient alpha (measure of internal consistency): Give the test once; score the test items and apply the formula.

Interrater (measure of consistency of ratings): Give a set of student responses requiring judgmental scoring to two or more raters and have them independently score the responses.
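Coefficient alpha (Cronbach's alpha) can be computed directly from an item-by-student score matrix using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). This sketch uses hypothetical item scores purely for illustration:

```python
def variance(scores):
    """Population variance of a list of scores."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

def cronbach_alpha(item_scores):
    """item_scores: one inner list per item, each holding one score per student.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores)
    n_students = len(item_scores[0])
    sum_item_var = sum(variance(item) for item in item_scores)
    totals = [sum(item[s] for item in item_scores) for s in range(n_students)]
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Hypothetical 3-item test taken by 4 students (0/1 = wrong/right).
items = [
    [1, 1, 0, 1],   # item 1 scores for students 1-4
    [1, 0, 0, 1],   # item 2
    [1, 1, 0, 0],   # item 3
]
print(cronbach_alpha(items))
```

With dichotomous (0/1) items, as here, alpha reduces to the Kuder-Richardson KR-20 coefficient mentioned earlier.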
Standard Error of Measurement
[Figure: hypothetical distribution illustrating the standard error of measurement]
Standard Error of Measurement

• It shows why a test score should be interpreted as a band of scores (called a confidence band) rather than as a specific score.
• With a large standard error, the band of scores is wide, and we have less confidence in our obtained score.
• If the standard error is small, the band of scores will be narrow, and we will have greater confidence that our obtained score is a dependable measure of the characteristic.
Standard Error of Measurement

The relationship between the reliability coefficient and the standard error of measurement can be seen in the table, which presents the standard errors of measurement for various reliability coefficients and standard deviations.
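Each entry of such a table follows from the relationship SEM = SD * sqrt(1 - r): the higher the reliability, the smaller the standard error. A minimal sketch (the SD of 10, the reliability values, and the obtained score of 75 are illustrative assumptions):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

# Illustrative table entries: SD fixed at 10, reliability varying.
for r in (0.96, 0.91, 0.84, 0.75):
    print(f"r = {r:.2f} -> SEM = {sem(10, r):.1f}")

# A one-SEM confidence band around a hypothetical obtained score of 75
# (r = 0.91, SD = 10), covering roughly 68% of the score distribution:
score = 75
half_width = sem(10, 0.91)
print(f"band: {score - half_width:.1f} to {score + half_width:.1f}")
```

Note how a perfectly reliable test (r = 1) gives SEM = 0, while lower reliabilities widen the confidence band, matching the bullet points above.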
Standard Error of Measurement

• The standard error of measurement has two special advantages as a means of estimating reliability:
  1. The estimates are in the same units as the assessment scores.
  2. The standard error is likely to remain fairly constant from group to group.
• The main difficulty encountered with the standard error occurs when we want to compare two assessments that use different types of scores.
Factors Influencing Reliability Measures

1. Number of assessment tasks
• The larger the number of tasks on an assessment, the higher its reliability will be.
• A longer assessment provides a more adequate sample of the behavior being measured, and the scores are apt to be less distorted by chance factors.

2. Spread of scores
• The larger the spread of scores, the higher the estimate of reliability will be.
• Larger reliability coefficients result when individuals stay in the same relative position in the group from one assessment to another; it naturally follows that anything that reduces the possibility of shifting positions in the group also contributes to larger reliability coefficients.
Factors Influencing Reliability Measures (cont.)

3. Objectivity
• The objectivity of an assessment refers to the degree to which equally competent scorers obtain the same results.
• When the test items are of the objective type, the resulting scores are not influenced by the scorers' judgment or opinion.

4. Method of estimating reliability
• The size of the reliability coefficient is related to the method of estimating reliability.
• The variation in the size of the reliability coefficient resulting from the method of estimating reliability is directly attributable to the type of consistency included in each method.
Reliability of Assessments Evaluated in Terms of a Fixed Performance Standard

Fixed Performance Standard
• The most natural approach to reliability is to evaluate the consistency with which students are classified as performing above or below the standard.
• This type of reliability can be readily determined by computing the percentage of consistent decisions that result from having performances evaluated by different raters or over two equivalent forms of an assessment.
A classification of 30 students with respect to a fixed performance standard:

                                      Assessment B
                            Meets standard   Fails to meet standard
Assessment A
  Meets standard                  20                  2
  Fails to meet standard           1                  7

We can compute the percentage of consistency using the following formula:
Percentage of consistency = (consistent classifications / total students) × 100 = (20 + 7) / 30 × 100 = 90%
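The percentage of consistency for the 30 students classified above is simply the number of consistent classifications (both assessments agree, whether "meets" or "fails") divided by the total:

```python
# Counts from the slide's classification of 30 students.
meets_meets = 20   # meets the standard on both assessments
fails_fails = 7    # fails to meet the standard on both assessments
meets_a_only = 2   # meets on Assessment A, fails on Assessment B
meets_b_only = 1   # fails on Assessment A, meets on Assessment B

total = meets_meets + fails_fails + meets_a_only + meets_b_only
percent_consistency = (meets_meets + fails_fails) / total * 100
print(percent_consistency)  # 90.0
```

Here 27 of the 30 decisions agree across the two assessments, giving 90% consistency.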
Reliability Demands and Nature of the
Decision
High reliability is demanded when the

• Decision is important
• Decision is final
• Decision is irreversible
• Decision is unconfirmable
• Decision concerns individuals
• Decision has lasting consequences
Reliability Demands and Nature of the
Decision
Lower reliability can be tolerated when the

• Decision is of minor importance
• Decision making is in its early stages
• Decision is reversible
• Decision is confirmable by other data
• Decision concerns groups
• Decision has temporary effects
Thank you!
