
SAN JOSE COMMUNITY COLLEGE

Education Department

Prof. Ed. 6
ASSESSMENT OF LEARNING 1

Module 2 (Midterm)

Prepared by:

JOEWE B. BELGA, Ed. D.


Instructor
Unit 3: Designing and Developing Assessment

At the end of this unit, the students will be able to:

1. Develop assessment tools that are learner-appropriate and target-matched; and


2. Improve assessment tools based on assessment data.

Pre-Test. Read and React!

1. What are the assessment tools that the teachers mostly used in the classroom? Are
those tools effective? Justify your answer.
___________________________________________________________________________
___________________________________________________________________________
2. Choose a lesson with learning objectives. If you were to provide assessment
strategies before, during, and after instruction on the topic you chose, what would they be?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
3. How do you assess learners? What aspects do you consider prior to assessment?
___________________________________________________________________________
___________________________________________________________________________
4. Cite the arguments for and against testing frequency. ____________________________
___________________________________________________________________________
___________________________________________________________________________
5. Discuss the distinct features of a take-home examination_________________________
___________________________________________________________________________
___________________________________________________________________________

Lesson 1: Characteristics of Quality Assessment Tools

Introduction:
Assessment and instruction are parallel in a classroom that focuses on the
learner. Teachers need to use a variety of strategies to assess learners' readiness in a
particular unit of study. They need to plan their instruction around the needs that
learners demonstrate. Ongoing assessment of student learning is an important part of
the planning process (PPST 2018).

Characteristics of Quality Assessment Tools

Assessment literacy involves understanding how assessments are made, what type of
assessments answer what questions, and how the data from assessments can be used to help
teachers, students, parents and other stakeholders make decisions about teaching and
learning. Assessment designers strive to create assessments that show a high degree of fidelity
to the following traits:

a. Content validity
b. Reliability
c. Fairness
d. Student engagement and motivation
e. Consequential relevance

Task 1: EXPLORE
Compare me! Complete the table below to compare the five characteristics of quality
assessment tools.

Characteristics | Meaning | Importance


Task 2: Let’s Create and Develop.

Using the table above, develop a relational map of the types and
characteristics of assessment tools.

Task 3: ASSESS

Let’s do it, my reflection.

1. I realized that in developing an assessment tool__________________________________


__________________________________________________________________________
___________________________________________________________________________
2. As a future educator, I will adopt these five characteristics in developing assessment tools
because____________________________________________________________________
___________________________________________________________________________
3. How is a true-false test constructed? Analyze the rules, cite your interpretation,
and give an example.

4. How do you construct a multiple-choice test? Cite your basic understanding and
give an example.

Lesson 2: Types of Teacher-made Tests


Pre-Tests

Instruction: Below are four test item categories labeled A, B, C, and D, followed by sample
learning objectives. Write the letter of the most appropriate test item category before each
number.

A= Objective Test Item (Multiple choice, true-false, matching)


B= Performance Test item
C= Essay Test Item (extended response)
D= Essay Test Item (Short answer)
1. Name the parts of the human skeleton
2. Appraise a composition on the basis of its organization
3. Demonstrate safe laboratory skills
4. Cite four examples of satire that Twain uses in Huckleberry Finn
5. Design a logo for a web page
6. Describe the impact of a bull market
7. Diagnose a physical ailment
8. List important mental attributes necessary for an athlete
9. Categorize great American fiction writers
10. Analyze the major causes of learning disabilities

I. Activity

1. Cite the difference between the two general categories of test items.

2. As teachers, when do you use the objective type of test? Why?

3. How is a good multiple-choice question constructed?

Two general categories of test items


1. Objective items, which require students to select the correct response
from several alternatives or to supply a word or short phrase to answer a question
or complete a statement. These include:
a. Multiple choice
b. True-false
c. Matching type
d. Completion
2. Subjective or essay items, which permit the student to organize and
present an original answer. These include:
a. Short-answer essay
b. Extended-response essay
c. Problem solving
d. Performance test items

When to use Essay or Objective Tests

1. Essay tests are appropriate when:


a. The group to be tested is small and the test is not to be reused.
b. You wish to encourage and reward the development of student skill in writing.
c. You are more interested in exploring the student’s attitudes than in measuring his/her
achievement.
2. Objective tests are appropriate when:
a. The group to be tested is large and the test may be reused
b. Highly reliable scores must be obtained as efficiently as possible.
c. Impartiality of evaluation, fairness, and freedom from possible test scoring influences
are essential.

3. Either essay or objective tests can be used to:


a. Measure almost any important educational achievement a written test can
measure
b. Test understanding and ability to apply principles
c. Test ability to think critically.
d. Test ability to solve problems.

Rules of Thumb in Constructing True-False Test Items


1. Do not give hints in the body of the question.
2. Avoid using the words "always", "never", "often" and other words that tend to be either always true or always
false.
3. Avoid long sentences, as these tend to be "true". Keep sentences short.
4. Avoid tricky statements with some minor misleading word, spelling anomaly, misplaced phrase, etc. A wise
student who does not know the subject matter may detect this strategy and thus get the answer correct.
5. Avoid quoting verbatim from reference materials or textbooks. This practice sends the wrong signal to the
students that it is necessary to memorize the textbook word for word and thus, acquisition of higher-level
thinking skills is not given due importance.
6. Avoid specific determiners or give-away qualifiers.
7. With true-false questions, avoid a grossly disproportionate number of either true or false statements, or
discernible patterns in the occurrence of true and false statements.

Multiple Choice Tests

A generalization of the true-false test, the multiple-choice type of test offers the
student more than two (2) options per item to choose from. Each item in a multiple-choice
test consists of two parts: (a) the stem, and (b) the options. In the set of options, there
is a "correct" or "best" option, while all the others are considered "distracters". Distracters are chosen in such
a way that they are attractive to those who do not know the answer or are guessing but, at the
same time, have no appeal to those who actually know the answer. It is this feature of multiple-choice
tests that allows the teacher to test higher-order thinking skills even if the options
are clearly stated. As with true-false items, there are certain rules of thumb to be followed in
constructing multiple-choice tests.

Guidelines in Constructing Multiple-Choice Items


1. Do not use unfamiliar words, terms, and phrases.
2. Do not use modifiers that are vague and whose meanings can differ from one person to the next, such as:
much, often, usually, etc.
3. Avoid complex or awkward word arrangements. Also, avoid use of negatives in the stem as this may add
unnecessary comprehension difficulties.
4. Do not use negatives or double negatives as such statements tend to be confusing. It is best to use simpler
sentences rather than sentences that would require expertise in grammatical construction.
5. Each item stem should be as short as possible; otherwise you risk testing more for reading and
comprehension skills.
6. Distracters should be equally plausible and attractive.
7. All multiple choice options should be grammatically consistent with the stem.
8. The length, explicitness, or degree of technicality of alternatives should not be the determinants of the
correctness of the answer.
9. Avoid stems that reveal the answer to another item.
10. Avoid alternatives that are synonymous with others or those that include or overlap others.
11. Avoid presenting sequenced items in the same order as in the text.
12. Avoid use of assumed qualifiers that many examinees may not be aware of.
13. Avoid use of unnecessary words or phrases which are not relevant to the problem at hand. The item's value
is particularly damaged if the unnecessary material is designed to distract or mislead. Such items test the
student’s reading comprehension rather than knowledge of the subject matter.
14. Avoid use of non-relevant sources of difficulty, such as requiring a complex calculation when only knowledge
of a principle is being tested.
15. Pack the question in the stem.
16. Use the “None of the above” option only when the keyed answer is totally correct. When choice of the “best”
response is intended, “none of the above” is not appropriate, since the implication has already been made
that the correct response may be partially inaccurate.
17. Note that use of "all of the above" may allow credit for partial knowledge. In a multiple-option item, if a
student only knew that two (2) options were correct, he could then deduce the correctness of "all of the
above". This assumes you are allowed only one correct choice.
18. Better still, use "none of the above" and "all of the above" sparingly. But it is best not to use them at all.
19. Having compound response choices may purposefully increase difficulty of an item.

Matching Type
It may be considered a modified multiple-choice type of item, where the choices
progressively reduce as one successfully matches the items on the left with the items
on the right.
Guidelines in Constructing Matching Type of Test
1. Match homogeneous not heterogeneous items.
2. The stems must be in the first column while the options must be in the second column.
3. The options must be more in number than the stems to prevent the student from arriving at the answer by
mere process of elimination.
4. To help the examinee find the answer easier, arrange the options alphabetically or chronologically.
5. Like any other test, the directions for the test must be given.

Supply Type or Constructed-Response Type


Another useful device for testing lower-order thinking skills is the supply type of test.
Like the multiple-choice test, each item in this kind of test consists of a stem, but with a blank where
the student writes the correct answer. Its effectiveness depends heavily on the way the stems are
constructed. These tests allow for one and only one answer and, hence, often test only the
student's knowledge.

Completion Type of Test


Guidelines in the Formulation of a Completion Type of Test
1. Avoid over-mutilated sentences (sentences with too many blanks).
2. Avoid open-ended items.
3. The blank should be at the end or near the end of the sentence.
4. Ask question on more significant item not on trivial matter.
5. The length of the blanks must not suggest the answer.

Essay Test
A typical essay test usually consists of a small number of questions to which the
student is expected to recall and organize knowledge in logical, integrated answers. An essay
test item can be an extended-response item or a short-answer item.

Type of Essay

a. Extended Response: synthesis and evaluation levels; a lot of freedom in answers


b. Restricted response: more consistent scoring, outlines parameters of response.

Task 2: Let’s have some mental exercises to test your understanding

A. Give a non-example of each of the following rules of thumb in the construction
of a true-false test. Improve on the non-examples for them to become good examples
of test items.
1. Avoid giving hints in the body of the question.
2. Avoid using the words "always", "never" and other such adverbs which tend to be
always true or always false.
3. Avoid long sentences, which tend to be true. Keep sentences short.
4. Avoid a systematic pattern of true and false statements.
5. Avoid ambiguous sentences which can be interpreted as true and at the same time
false.
B. Give a non-example of each of the following rules of thumb in the construction of
multiple-choice tests. Improve on the non-examples for them to become good
examples of test items.
1. Phrase the stem to allow for only one correct or best answer.
2. Avoid giving away the answer in the stem.
3. Choose distracters appropriately.
4. Choose distracters so that they are all equally plausible and attractive.
5. Phrase questions so that they will test higher order thinking skills.

Task 3: Let’s try!

1. Construct a 5-item matching type test to assess this competency: Identify the parts of
a computer system.
2. Construct a 5-item supply type test to assess this competency: Identify farm tools
according to use.
3. Give an example of a supply type of test that will measure higher order thinking skills.
4. In what sense is a matching type test a variant of the multiple choice type of test?
Justify your answer.
5. Choose learning competencies from the K to 12 Curriculum Guide. Construct aligned
paper-and-pencil tests observing guidelines in test construction.

Creating a test is one of the most challenging tasks confronting a teacher. Well-
constructed tests motivate students and reinforce learning. Well-constructed tests
enable teachers to assess the student’s mastery of course objectives. Tests also provide
feedback on teaching, often showing what was or was not communicated clearly.
Lesson 3: Construction of Table of Specifications (TOS)
The important steps in planning for a test are:

1. Identifying test objectives/lesson outcomes - An objective test, if it is to be comprehensive,
must cover the various levels of Bloom's taxonomy. Each objective consists of a statement of
what is to be achieved, preferably by the students.
2. Deciding on the type of objective test to be prepared
3. Preparing a Table of Specifications (TOS)
4. Constructing the draft test items
5. Try-out and validation

Preparing a table of specification (TOS). A Table of Specification is a test map that guides the teacher in
constructing a test. The TOS ensures that there is balance between items that test lower level thinking
skills and those which test higher order thinking skills in the test. The simplest TOS consists of a) level
of objective to be tested, b) statement of objective, c) item numbers where such an objective is being
tested, d) Number of items and percentage out of the total for the particular objective, and e) number
of days taught.
No. | Objectives                                | No. of days taught | No. of items | %    | Remember | Understand | Apply | Analyze | Evaluate | Create
1   | Identify subject-verb                     | 10                 | 28           | 37%  |          |            |       |         |          |
2   | Determine subject and predicate           | 9                  | 26           | 35%  |          |            |       |         |          |
3   | Formulate rules on subject-verb agreement | 7                  | 21           | 28%  |          |            |       |         |          |
    | Total                                     | 26                 | 75           | 100% |          |            |       |         |          |
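The number of items allotted to each objective roughly tracks the number of days taught. A minimal Python sketch of this proportional allocation, using the counts from the sample table above (the rounding rule is an assumption, so the resulting counts can differ by an item or two from the printed table):

# Allocate test items to objectives in proportion to days taught.
# Objective names and totals come from the sample TOS above; the
# rounding rule is assumed, so counts may differ slightly from the table.
objectives = {
    "Identify subject-verb": 10,
    "Determine subject and predicate": 9,
    "Formulate rules on subject-verb agreement": 7,
}
total_items = 75

total_days = sum(objectives.values())
for name, days in objectives.items():
    share = days / total_days
    items = round(share * total_items)
    print(f"{name}: {days} days -> {items} items ({share:.0%})")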
Create. Produce new or original work: design, assemble, construct, conjecture, develop, formulate,
author, investigate.
Evaluate. Justify a stand or decision: appraise, defend, judge, select, support, value, critique, weigh.
Analyze. Draw connections among ideas: differentiate, organize, relate, compare, contrast,
distinguish, examine, experiment, question, test.
Apply. Use information in new situations: execute, implement, solve, use, demonstrate, interpret,
operate, schedule, sketch.
Understand. Explain ideas or concepts: classify, describe, discuss, explain, identify, locate,
recognize, report, select, translate.
Remember. Recall facts and basic concepts: define, duplicate, list, memorize, repeat, state.

Task 1: Reflection
1. I realized that the TOS is an important factor in developing quality assessment tools
because____________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
2. How can I use the TOS to establish a quality assessment tool?
___________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
3. Develop a sample TOS for any high school subject, with a sample test covering at least 3
competencies, 10 total days taught, and 10 total items.

Lesson 4: Assessment Tools Development

ENGAGE

The Four Steps of the Assessment Cycle

Step 1: Clearly define and identify the learning outcomes

Each program should formulate between 3 and 5 learning outcomes that describe what students should
be able to do (abilities), to know (knowledge), and appreciate (values and attitudes) following
completion of the program. The learning outcomes for each program will include Public Affairs
learning outcomes addressing community engagement, cultural competence, and ethical leadership.

Step 2: Select appropriate assessment measures and assess the learning outcomes

Multiple ways of assessing the learning outcomes are usually selected and used. Although direct and
indirect measures of learning can be used, it is usually recommended to focus on direct measures of
learning. Levels of student performance for each outcome are often described and assessed with the use
of rubrics.

It is important to determine how the data will be collected and who will be responsible for data
collection. Results are always reported in aggregate format to protect the confidentiality of the students
assessed.
Step 3: Analyze the results of the outcomes assessed

It is important to analyze and report the results of the assessments in a meaningful way. A small
subgroup of the DAC would ideally be responsible for this function. The assessment division of the
FCTL would support the efforts of the DAC and would provide data analysis and interpretation
workshops and training.

Step 4: Adjust or improve programs following the results of the learning outcomes assessed

Assessment results are worthless if they are not used. This step is a critical step of the assessment
process. The assessment process has failed if the results do not lead to adjustments or improvements in
programs. The results of assessments should be disseminated widely to faculty in the department in
order to seek their input on how to improve programs from the assessment results. In some instances,
changes will be minor and easy to implement. In other instances, substantial changes will be necessary

and recommended and may require several years to be fully implemented. (Source: Missouri State
University)

Task 1: Explore/Apply
1. What are the goals of assessing the performance of students?
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________
2. Correlate the four steps. How does each contribute to developing high performance? Explain.
__________________________________________________________________________________
__________________________________________________________________________________
3. I learned that_____________________________________________________________________

Lesson 5: Item Analysis

Introduction

The teacher normally prepares a draft of the test. Such a draft is subjected to item analysis and
validation in order to ensure that the final version of the test will be useful and functional. First, the
teacher tries out the draft test on a group of students with characteristics similar to the intended test
takers. From the try-out group, each item is analyzed in terms of its ability to discriminate
between those who know and those who do not know, and in terms of its level of difficulty. The item
analysis provides information that allows the teacher to decide whether to revise or replace an item.
Finally, the final draft of the test is subjected to validation if the intent is to make use of the test as a
standard test for the particular unit or grading period. We shall be concerned with these concepts in
this lesson.

Item Analysis

Two characteristics of an item are of interest to the teacher:

a. Item difficulty
b. Discrimination index

The difficulty of an item or item difficulty is defined as the number of students who are able to
answer the item correctly divided by the total number of students. Thus:

Item difficulty= number of students with correct answer/total number of students.

How do we decide on the basis of this index whether the item is too difficult or too easy? The following
arbitrary rule is often used in the literature:

Range of Difficulty Index | Interpretation   | Action
0 - 0.25                  | Difficult        | Revise or discard
0.26 - 0.75               | Right difficulty | Retain
0.76 - above              | Easy             | Revise or discard
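As a quick aid, this rule can be mechanized. A minimal Python sketch (the function names are illustrative; the cutoffs are the ones tabulated above):

def difficulty_index(num_correct, num_students):
    """Item difficulty = number of students with the correct answer / total students."""
    return num_correct / num_students

def interpret_difficulty(p):
    """Classify an item using the rule-of-thumb ranges above."""
    if p <= 0.25:
        return "Difficult - revise or discard"
    if p <= 0.75:
        return "Right difficulty - retain"
    return "Easy - revise or discard"

p = difficulty_index(40, 80)        # e.g., 40 of 80 students answered correctly
print(p, interpret_difficulty(p))   # 0.5 Right difficulty - retain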

Difficult items tend to discriminate between those who know and those who do not know the answer.
Conversely, easy items cannot discriminate between these two groups of students. We are therefore
interested in deriving a measure that will tell us whether an item can discriminate between these two
groups of students. Such a measure is called an index of discrimination.
An easy way to derive such a measure is to measure how difficult an item is with respect to those in the
upper 25% of the class and how difficult it is with respect to those in the lower 25% of the class. If the
upper 25% of the class found the item easy yet the lower 25% found it difficult, then the item can
discriminate properly between these two groups. Thus:

Index of discrimination = DU-DL

Example: Obtain the index of discrimination of an item if the upper 25% of the class had a
difficulty index of 0.60 (i.e. 60% of the upper 25% got the correct answer) while the lower 25% of the
class had a difficulty index of 0.20.

Here, DU = 0.60 while DL = 0.20; thus, the index of discrimination = 0.60 - 0.20 = 0.40.

Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1) to 1.0 (when DU = 1
and DL = 0). When the index of discrimination is equal to -1, this means that all of the lower 25% of
the students got the correct answer while all of the upper 25% got the wrong answer. In a sense, such an
item discriminates correctly between the two groups, but the item itself is highly questionable. Why
should the bright ones get the wrong answer and the poor ones get the right answer? On the other
hand, if the index of discrimination is 1.0, this means that all of the lower 25% failed to get the correct
answer while all of the upper 25% got the correct answer. This is a perfectly discriminating item and is
the ideal item that should be included in the test.

As in the case of the index of difficulty, we have the following rule of thumb:

Index Range  | Interpretation                            | Action
-1.0 - -0.50 | Can discriminate but item is questionable | Discard
-0.55 - 0.45 | Non-discriminating                        | Revise
0.46 - 1.0   | Discriminating item                       | Include
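The same bookkeeping works for the discrimination index. A minimal sketch using the cutoffs tabulated above (note that the worked example's value of 0.40 falls in the "revise" range):

def discrimination_index(du, dl):
    """Index of discrimination = DU - DL (difficulty indices of the upper and lower 25%)."""
    return du - dl

def interpret_discrimination(d):
    """Classify an item using the rule-of-thumb index ranges above."""
    if d <= -0.50:
        return "Can discriminate but item is questionable - discard"
    if d <= 0.45:
        return "Non-discriminating - revise"
    return "Discriminating item - include"

d = discrimination_index(0.60, 0.20)            # values from the worked example above
print(round(d, 2), interpret_discrimination(d)) # 0.4 Non-discriminating - revise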

Task 1: Explore

Consider the responses to an item in a multiple-choice test, for which the following data were obtained:

Item 1    | A | B* | C  | D
Total     | 0 | 40 | 20 | 20
Upper 25% | 0 | 15 | 5  | 0
Lower 25% | 0 | 5  | 10 | 5

The correct response is B. Let us compute the difficulty index and index of discrimination.

Solution:

* Difficulty index = 40/80 = 0.5 (right difficulty; retain the item)

* DU = difficulty index of the upper 25% = 15/20 = 0.75

* DL = difficulty index of the lower 25% = 5/20 = 0.25

* Index of discrimination = DU - DL = 0.75 - 0.25 = 0.50 (discriminating item; include)

Task 2: ASSESS

a. Find the index of difficulty in each of the following situations:

1. N = 60, number of wrong answers: upper 25% = 2, lower 25% = 6

* P = (2 + 6)/60 × 100

= 8/60 × 100

P = 13.3% of the class answered wrongly, so the difficulty index = 1 - 0.133 = 0.8666

2. N = 80, number of wrong answers: upper 25% = 2, lower 25% = 9

* P = (2 + 9)/80 × 100

= 11/80 × 100

P = 13.75% of the class answered wrongly, so the difficulty index = 1 - 0.1375 = 0.8625

3. N = 30, number of wrong answers: upper 25% = 1, lower 25% = 6

* P = (1 + 6)/30 × 100

= 7/30 × 100

P = 23.3% of the class answered wrongly, so the difficulty index = 1 - 0.233 = 0.7666

b. Which of the items in exercise A is found to be most difficult? Why?

* Item number 3 is the most difficult because it has the highest percentage of wrong answers (23.3%)
and, correspondingly, the lowest difficulty index (0.7666).

c. What are the benefits derived from Item Analysis?

* It provides useful information for the class discussion of the test.

* It provides data which help students improve their learning.

* It provides insights and skills that lead to the preparation of better tests in the future.

d. Item analysis for norm-referenced tests provides the following information:

* The difficulty of an item

* The discriminating power of an item

* The effectiveness of each alternative

Lesson 6: Reliability and Validity


Task 1: ENGAGE
Validation
After performing the item analysis and revising the items which need revision, the next
step is to validate the instrument. The purpose of validation is to determine the characteristics of the
whole test itself, namely, the validity and reliability of the test. Validation is the process of collecting
and analyzing evidence to support the meaningfulness and usefulness of the test.

Validity is the extent to which a test measures what it purports to measure; it refers to the
appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes
based on the test results. A test is valid when it is aligned to the learning outcomes.

Three main types of evidence

1. Content-related evidence of validity- refers to the content and format of the instrument.
2. Criterion-related evidence of validity- refers to the relationship between scores obtained
using the instrument and scores obtained using one or more other tests.
3. Construct-related evidence of validity- refers to the nature of the psychological construct
or characteristic being measured by the test.

In order to obtain evidence of criterion-related validity, the teacher usually compares scores on
the test in question with the scores on some other independent criterion test which presumably
already has high validity. Another type of criterion-related validity is called predictive validity,
wherein the test scores in the instrument are correlated with scores on a later performance of the
students.

Apart from the use of the correlation coefficient in measuring criterion-related validity, Gronlund
suggested using the so-called expectancy table. This table is easy to construct and consists of the test
categories listed on the left-hand side and the criterion categories listed horizontally along the top of
the chart.

Reliability

It refers to the consistency of the scores obtained: how consistent they are for each individual
from one administration of an instrument to another and from one set of items to another. For
internal consistency, for instance, we could use the split-half method or the Kuder-Richardson
formulas (KR-20 or KR-21).
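For instance, KR-20 can be computed directly from a 0/1 table of item scores. A minimal sketch with hypothetical data (KR-20 applies to dichotomously scored items):

import numpy as np

def kr20(scores):
    """Kuder-Richardson 20 for a 0/1 score matrix (rows = students, columns = items)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                    # number of items
    p = scores.mean(axis=0)                # proportion answering each item correctly
    pq_sum = (p * (1 - p)).sum()           # sum of item variances
    total_var = scores.sum(axis=1).var()   # variance of students' total scores
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Hypothetical responses of 5 students to 4 items (1 = correct, 0 = wrong)
data = [[1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0]]
print(round(kr20(data), 2))   # about 0.41 for this toy sample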

Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid
outcomes. As reliability improves, validity may improve. However, if an instrument is shown scientifically to
be valid, then it is almost certain that it is also reliable.

The following table is a standard followed almost universally in educational tests and measurement.

Reliability    | Interpretation
.90 and above  | Excellent reliability; at the level of the best standardized tests
.80 - .90      | Very good for a classroom test
.70 - .80      | Good for a classroom test; in the range of most classroom tests. There are
               | probably a few items which could be improved.
.60 - .70      | Somewhat low. This test needs to be supplemented by other measures (e.g.,
               | more tests) to determine grades. There are probably some items which could
               | be improved.
.50 - .60      | Suggests need for revision of the test, unless it is quite short. The test should
               | not contribute heavily to the course grade, and it needs revision.
.50 or below   | Questionable reliability. This test should not contribute heavily to the course
               | grade, and it needs revision.

Task 2: ASSESS

1. What is an expectancy table? Describe the process of constructing an expectancy table. When
do we use an expectancy table?
An expectancy table is a two-way table showing the relationship between two tests. Expectancy
tables are a device for interpreting the meaning of test results for those untrained in statistics,
and they can be used to display predictive validity data. Constructing an expectancy table is much
like constructing a TOS, except that the expectancy table is built around percentages that indicate
the probability of attaining a score given performance on another measure. Teachers can use
expectancy tables to differentiate instruction by addressing the academic needs of individual
students. An expectancy table is used in predictive validity studies and to monitor students'
learning progress.
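A minimal sketch of building such a table with pandas (all categories and grades here are hypothetical):

import pandas as pd

# Hypothetical data: entrance-test category and later course grade per student
df = pd.DataFrame({
    "test_category": ["High", "High", "High", "Average", "Average", "Low", "Low", "Low"],
    "course_grade":  ["A",    "A",    "B",    "B",       "C",       "C",   "C",   "D"],
})

# Each row gives the probability of attaining each grade, given the test category
expectancy = pd.crosstab(df["test_category"], df["course_grade"], normalize="index")
print(expectancy.round(2))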
2. What is the relationship between validity and reliability? Can a test be reliable and yet not
valid? Illustrate
Reliability and validity are both about how well a method measures something: Reliability
refers to the consistency of a measure (whether the results can be reproduced under the same
conditions). Validity refers to the accuracy of a measure (whether the results really do
represent what they are supposed to measure). A measure can be reliable but not valid if it is
measuring something very consistently but is consistently measuring the wrong construct.
Likewise, a measure can be valid but not reliable if it is measuring the right construct, but not
doing so in a consistent manner.
(Figure: illustration of reliability and validity.)

3. Discuss the different measures of reliability. Justify the use of each measure in the context of
measuring reliability.
1. Test-retest reliability - it measures the consistency of results when you repeat the same test on
the same sample at a different point in time. You use it when you are measuring something
that you expect to stay constant in your sample.
How to measure it?
To measure test-retest reliability, you conduct the same test on the same group of people at two
different points in time. Then you calculate the correlation between the two sets of results.
Test-retest reliability can be used to assess how well a method resists confounding factors over time.
The smaller the difference between the two sets of results, the higher the test-retest reliability.
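In practice this is a simple correlation. A minimal sketch with hypothetical scores (numpy's corrcoef returns the Pearson correlation):

import numpy as np

# Hypothetical scores of the same six students on two administrations of one test
first_admin  = np.array([78, 85, 90, 62, 74, 88])
second_admin = np.array([80, 83, 92, 65, 70, 90])

# Pearson correlation between administrations estimates test-retest reliability
r = np.corrcoef(first_admin, second_admin)[0, 1]
print(round(r, 2))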
2. Interrater reliability (also called interobserver reliability) - it measures the degree of agreement
between different people observing or assessing the same thing. You use it when data is collected by
researchers assigning ratings, scores or categories to one or more variables.
How to measure it?
To measure interrater reliability, different researchers conduct the same measurement or
observation on the same sample. Then you calculate the correlation between their different sets of
results. If all the researchers give similar ratings, the test has high interrater reliability.
People are subjective, so different observers’ perceptions of situations and phenomena naturally
differ. Reliable research aims to minimize subjectivity as much as possible so that a different
researcher could replicate the same results. When designing the scale and criteria for data collection,
it’s important to make sure that different people will rate the same variable consistently with minimal
bias. This is especially important when there are multiple researchers involved in data collection or
analysis.
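Beyond a simple correlation, agreement between raters is often summarized with Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch using scikit-learn, with hypothetical ratings:

from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings given by two raters to the same eight essays (1-5 scale)
rater_1 = [4, 3, 5, 2, 4, 3, 5, 4]
rater_2 = [4, 3, 4, 2, 4, 2, 5, 4]

# Kappa of 1.0 = perfect agreement; 0 = agreement no better than chance
print(round(cohen_kappa_score(rater_1, rater_2), 2))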
3. Parallel forms reliability - it measures the correlation between two equivalent versions of a test.
You use it when you have two different assessment tools or sets of questions designed to measure
the same thing.
How to measure it?
The most common way to measure parallel forms reliability is to produce a large set of questions to
evaluate the same thing, then divide these randomly into two question sets. The same group of
respondents answers both sets, and you calculate the correlation between the results. High
correlation between the two indicates high parallel forms reliability.
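A minimal sketch with hypothetical form scores; the closing comment links this to the split-half method mentioned earlier, where the Spearman-Brown formula estimates full-length reliability:

import numpy as np

# Hypothetical total scores of six respondents on two equivalent forms
form_a = np.array([14, 18, 11, 16, 9, 20])
form_b = np.array([15, 17, 12, 15, 10, 19])

# Correlation between equivalent forms = parallel-forms reliability
r = np.corrcoef(form_a, form_b)[0, 1]
print(round(r, 2))

# If the two score sets are halves of a single test (split-half method),
# the Spearman-Brown formula estimates the full-length reliability:
full_test_r = 2 * r / (1 + r)
print(round(full_test_r, 2))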

4. Criterion-related evidence of validity refers to the relationship between scores obtained using
the instrument and scores obtained using one or more other tests. How strong is this relationship?
How well do such scores estimate present performance or predict future performance of a certain type?

References:

https://www.td.org/videos/linking-assessment-data-to-training-outcomes

Navarro, Rosita L., Santos, Rosita G., & Corpuz, Brenda B. Assessment of Learning 1,
Third Edition.
