Education Department
Prof. Ed. 6
ASSESSMENT OF LEARNING 1
Module 2 (Midterm)
Prepared by: Joewe B. Belga, Ed.D., Instructor
1. What are the assessment tools that teachers mostly use in the classroom? Are
those tools effective? Justify your answer.
___________________________________________________________________________
___________________________________________________________________________
2. Choose a lesson with learning objectives. If you were to provide assessment
strategies before, during, and after instruction on the topic you chose, what would they be?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
3. How do you assess learners? What aspects do you consider prior to assessment?
___________________________________________________________________________
___________________________________________________________________________
4. Cite the arguments for and against testing frequency. ____________________________
___________________________________________________________________________
___________________________________________________________________________
5. Discuss the distinct features of a take-home examination_________________________
___________________________________________________________________________
___________________________________________________________________________
Introduction:
Assessment and instruction are parallel in a classroom that focuses on the
learner. Teachers need to use a variety of strategies to assess learners’ readiness in a
particular unit of study. They need to plan their instruction around the needs that
learners demonstrate. Ongoing assessment of student learning is an important part of
the planning process (PPST 2018).
Assessment literacy involves understanding how assessments are made, what type of
assessments answer what questions, and how the data from assessments can be used to help
teachers, students, parents and other stakeholders make decisions about teaching and
learning. Assessment designers strive to create assessments that show a high degree of fidelity
to the following traits:
a. Content validity
b. Reliability
c. Fairness
d. Student engagement and motivation
e. Consequential relevance
Task 1: EXPLORE
Compare me! Supply the table to compare the five characteristics of quality
assessment tools.
Develop a relational map, as provided in the table above, of the types and
characteristics of the assessment tools.
Task 3: ASSESS
4. How do you construct a multiple-choice test? Cite your basic understanding and
give an example.
Instruction: Below are four test item categories labeled A, B, C, and D, followed by sample
learning objectives. Write the letter of the most appropriate test item category before the
number of each objective.
I. Activity
A generalization of the true-false test, the multiple choice type of test offers the
student more than two (2) options per item to choose from. Each item in a multiple
choice test consists of two parts: (a) the stem, and (b) the options. In the set of options, there
is a "correct" or "best" option while all the others are considered "distracters." Distracters
are chosen in such a way that they are attractive to those who do not know the answer or are
guessing but, at the same time, have no appeal to those who actually know the answer. It is
this feature of multiple choice type tests that allows the teacher to test higher order thinking
skills even if the options are clearly stated. As in true-false items, there are certain rules of
thumb to be followed in constructing multiple choice tests.
Matching Type
It may be considered modified multiple choice type items where the choices
progressively reduce as one successfully matches the items on the left with the items
on the right.
Guidelines in Constructing Matching Type of Test
1. Match homogeneous not heterogeneous items.
2. The stems must be in the first column while the options must be in the second column.
3. The options must be more in number than the stems to prevent the student from arriving at the answer by
mere process of elimination.
4. To help the examinee find the answer easier, arrange the options alphabetically or chronologically.
5. Like any other test, the direction of the test must be given.
Essay Test
A typical essay test usually consists of a small number of questions, in response to which the
student is expected to recall and organize knowledge in logical, integrated answers. An essay
test item can be an extended response item or a short answer item.
Activity
1. Construct a 5-item matching type test for this competency: Identify the parts of the
computer system.
2. Construct a 5-item supply type test to assess this competency: Identify farm tools
according to use.
3. Give an example of a supply type of test that will measure higher order thinking skills.
4. In what sense is a matching type test a variant of the multiple choice type of test?
Justify your answer.
5. Choose learning competencies from the K to 12 Curriculum Guide. Construct aligned
paper-and-pencil tests observing guidelines in test construction.
Creating a test is one of the most challenging tasks confronting a teacher. Well-
constructed tests motivate students and reinforce learning. Well-constructed tests
enable teachers to assess the student’s mastery of course objectives. Tests also provide
feedback on teaching, often showing what was or was not communicated clearly.
Lesson 3: Construction of Table of Specifications (TOS)
An important step in planning for a test is preparing a table of specification (TOS). A Table of
Specification is a test map that guides the teacher in constructing a test. The TOS ensures that there
is balance between items that test lower level thinking skills and those which test higher order
thinking skills in the test. The simplest TOS consists of: (a) the level of objective to be tested, (b) the
statement of the objective, (c) the item numbers where such an objective is being tested, (d) the
number of items and the percentage out of the total for the particular objective, and (e) the number
of days taught.
No. | Objectives | No. of days taught | No. of items | % | Remember | Understand | Apply | Analyze | Evaluate | Create
1 | Identify subject-verb | 10 | 28 | 37% | | | | | |
2 | Determine subject and predicate | 9 | 26 | 35% | | | | | |
3 | Formulate rules on subject-verb agreement | 7 | 21 | 28% | | | | | |
Total | | 26 | 75 | 100% | | | | | |
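As a rough illustration, the sketch below derives a proportional item allocation from the days taught in the sample TOS above: items are distributed across objectives in proportion to instruction time. The largest-remainder rounding used here is an assumption, so the resulting counts can differ by an item or two from the sample table.

```python
# A minimal sketch of proportional item allocation for a TOS.
# Objectives and day counts are taken from the sample table above;
# the rounding scheme (largest remainder) is an assumption.

objectives = [
    ("Identify subject-verb", 10),
    ("Determine subject and predicate", 9),
    ("Formulate rules on subject-verb agreement", 7),
]
total_items = 75

total_days = sum(days for _, days in objectives)
# Provisional (fractional) allocations proportional to days taught.
raw = [days / total_days * total_items for _, days in objectives]
counts = [int(r) for r in raw]  # floor first...
# ...then hand leftover items to the largest fractional remainders.
for i in sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True):
    if sum(counts) == total_items:
        break
    counts[i] += 1

for (name, days), n in zip(objectives, counts):
    print(f"{name}: {days} days -> {n} items ({n / total_items:.0%})")
```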
Create. Produce new or original work: design, assemble, construct, conjecture, develop,
formulate, author, investigate.
Evaluate. Justify a stand or decision: appraise, defend, judge, select, support, value, critique,
weigh.
Analyze. Draw connections among ideas: differentiate, organize, relate, compare, contrast,
distinguish, examine, experiment, question, test.
Apply. Use information in new situations: execute, implement, solve, use, demonstrate,
interpret, operate, schedule, sketch.
Understand. Explain ideas or concepts: classify, describe, discuss, explain, identify, locate,
recognize, report, select, translate.
Remember. Recall facts and basic concepts: define, duplicate, list, memorize, repeat, state.
Task 1: Reflection
1. I realize that the TOS is an important factor in developing quality assessment tools
because____________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
2. How can I use the TOS to establish a quality assessment tool?
___________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
3. Develop a sample TOS for any high school subject, with a sample test covering at least 3
competencies, 10 total days taught, and 10 total items.
ENGAGE
Step 1: Develop student learning outcomes
Each program should formulate between 3 and 5 learning outcomes that describe what students should
be able to do (abilities), know (knowledge), and appreciate (values and attitudes) following
completion of the program. The learning outcomes for each program will include Public Affairs
learning outcomes addressing community engagement, cultural competence, and ethical leadership.
Step 2: Select appropriate assessment measures and assess the learning outcomes
Multiple ways of assessing the learning outcomes are usually selected and used. Although direct and
indirect measures of learning can be used, it is usually recommended to focus on direct measures of
learning. Levels of student performance for each outcome are often described and assessed with the
use of rubrics.
It is important to determine how the data will be collected and who will be responsible for data
collection. Results are always reported in aggregate format to protect the confidentiality of the students
assessed.
Step 3: Analyze the results of the outcomes assessed
It is important to analyze and report the results of the assessments in a meaningful way. A small
subgroup of the DAC would ideally be responsible for this function. The assessment division of the
FCTL would support the efforts of the DAC and would provide data analysis and interpretation
workshops and training.
Step 4: Adjust or improve programs following the results of the learning outcomes assessed
Assessment results are worthless if they are not used. This step is a critical step of the assessment
process. The assessment process has failed if the results do not lead to adjustments or improvements in
programs. The results of assessments should be disseminated widely to faculty in the department in
order to seek their input on how to improve programs from the assessment results. In some instances,
changes will be minor and easy to implement. In other instances, substantial changes will be necessary
and recommended and may require several years to be fully implemented (Missouri State
University).
Task 1: Explore/Apply
1. What are the goals of assessment in the performance of the students?
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________
2. Correlate the four processes. How do they contribute to developing high performance? Explain.
__________________________________________________________________________________
__________________________________________________________________________________
3. I learned that_____________________________________________________________________
Introduction
The teacher normally prepares a draft of the test. Such a draft is subjected to item analysis and
validation in order to ensure that the final version of the test will be useful and functional. First, the
teacher tries out the draft test on a group of students with characteristics similar to those of the
intended test takers. From the try-out group, each item is analyzed in terms of its ability to
discriminate between those who know and those who do not know, and also in terms of its level of
difficulty. The item analysis provides information that allows the teacher to decide whether to revise
or replace an item. Finally, the final draft of the test is subjected to validation if the intent is to use
the test as a standard test for the particular unit or grading period. We shall be concerned with these
concepts in this lesson.
Item Analysis
a. Item difficulty
b. Discrimination index
The difficulty of an item, or item difficulty, is defined as the number of students who are able to
answer the item correctly divided by the total number of students. Thus:

Item difficulty = (number of students who answered the item correctly) / (total number of students)
How do we decide on the basis of this index whether the item is too difficult or too easy? An
arbitrary rule of thumb is often used in the literature for this purpose.
Difficult items tend to discriminate between those who know and those who do not know the answer.
Conversely, easy items cannot discriminate between these two groups of students. We are therefore
interested in deriving a measure that will tell us whether an item can discriminate between these two
groups of students. Such a measure is called an index of discrimination.
An easy way to derive such a measure is to measure how difficult an item is with respect to those in the
upper 25% of the class and how difficult it is with respect to those in the lower 25% of the class. If the
upper 25% of the class found the item easy yet the lower 25% found it difficult, then the item can
discriminate properly between these two groups. Thus:

Index of discrimination = DU - DL

where DU is the difficulty index computed for the upper 25% of the class and DL is the difficulty
index computed for the lower 25%.
Example: Obtain the index of discrimination of an item if the upper 25% of the class had a
difficulty index of 0.60 (i.e., 60% of the upper 25% got the correct answer) while the lower 25% of the
class had a difficulty index of 0.20.
Here, DU = 0.60 while DL = 0.20; thus, the index of discrimination = 0.60 - 0.20 = 0.40.
Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1) to 1.0 (when
DU = 1 and DL = 0). When the index of discrimination is equal to -1, this means that all of the lower
25% of the students got the correct answer while all of the upper 25% got the wrong answer. In a
sense, such an index discriminates correctly between the two groups, but the item itself is highly
questionable. Why should the bright ones get the wrong answer and the poor ones get the right
answer? On the other hand, if the index of discrimination is 1.0, this means that all of the lower 25%
failed to get the correct answer while all of the upper 25% got the correct answer. This is a perfectly
discriminating item and is the ideal item that should be included in the test.
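To make these two indices concrete, here is a minimal Python sketch of the definitions above. The function names are illustrative and not part of the module.

```python
# A small sketch of the two item-analysis statistics defined above.

def difficulty_index(correct: int, total: int) -> float:
    """Proportion of students answering the item correctly."""
    return correct / total

def discrimination_index(du: float, dl: float) -> float:
    """Difference between the difficulty index of the upper 25% (DU)
    and that of the lower 25% (DL)."""
    return du - dl

# Worked example from the text: DU = 0.60, DL = 0.20.
print(discrimination_index(0.60, 0.20))  # 0.40
```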
As in the case of the index of difficulty, a rule of thumb is likewise used in the literature for
interpreting the index of discrimination.
Task 1: Explore
Consider the responses to a multiple choice test item from which the following data were obtained:

Item 1 | A | B* | C | D
Total | 0 | 40 | 20 | 20
Upper 25% | 0 | 15 | 5 | 0
Lower 25% | 0 | 5 | 10 | 5

The correct response is B. Let us compute the difficulty index and the index of discrimination:

Difficulty index = 40/80 = 0.50
DU = 15/20 = 0.75 and DL = 5/20 = 0.25, so the index of discrimination = 0.75 - 0.25 = 0.50.
Task 2: ASSESS
* P = 8/60 × 100 = 13.3%
* P = (2 + 9)/80 × 100 = 11/80 × 100 = 13.75%
* P = (1 + 6)/30 × 100 = 7/30 × 100 = 23.3%
* It provides insights and skills that lead to the preparation of better tests in the future.
Validity is the extent to which a test measures what it purports to measure; it refers to the
appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes
based on the test results. A test is valid when it is aligned to the learning outcomes.
1. Content-related evidence of validity- refers to the content and format of the instrument.
2. Criterion-related evidence of validity- refers to the relationship between scores obtained
using the instrument and scores obtained using one or more other tests.
3. Construct-related evidence of validity- refers to the nature of the psychological construct
or characteristic being measured by the test.
In order to obtain evidence of criterion-related validity, the teacher usually compares scores on
the test in question with scores on some other independent criterion test which presumably already
has high validity. Another type of criterion-related validity is called predictive validity, wherein the
test scores in the instrument are correlated with scores on a later performance of the students.
Apart from the use of the correlation coefficient in measuring criterion-related validity, Gronlund
suggested using the so-called expectancy table. This table is easy to construct and consists of the test
categories listed on the left-hand side and the criterion categories listed horizontally along the top of
the chart.
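As a rough illustration of this construction, the sketch below builds an expectancy table with pandas. The score categories and criterion grades are invented for the example.

```python
# A minimal sketch of a Gronlund-style expectancy table using pandas.
# The test categories and criterion grades below are invented.
import pandas as pd

data = pd.DataFrame({
    "test_category": ["High", "High", "Average", "Average",
                      "Low", "Low", "High", "Low"],
    "criterion_grade": ["A", "B", "B", "C", "C", "D", "A", "C"],
})

# Rows: test score categories; columns: criterion categories;
# cells: percentage of students in each row falling in each column.
expectancy = pd.crosstab(
    data["test_category"], data["criterion_grade"], normalize="index"
) * 100
print(expectancy.round(1))
```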
Reliability
It refers to the consistency of the scores obtained: how consistent they are for each individual
from one administration of an instrument to another and from one set of items to another. We already
gave the formula for computing the reliability of a test for internal consistency; for instance, we could
use the split-half method or the Kuder-Richardson formulae (KR-20 or KR-21).
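For instance, KR-20 can be computed directly from a matrix of dichotomously scored (0/1) responses. The sketch below is a minimal illustration; the score matrix is invented.

```python
# A minimal sketch of the Kuder-Richardson formula 20 (KR-20)
# for dichotomously scored (0/1) items. The score matrix is invented.
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """scores: students x items matrix of 0/1 responses."""
    k = scores.shape[1]                         # number of items
    p = scores.mean(axis=0)                     # proportion correct per item
    q = 1 - p
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

scores = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
])
print(round(kr20(scores), 2))  # about 0.92 for this toy data
```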
Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid
outcomes. As reliability improves, validity improves. However, if an instrument is shown scientifically
to be valid, then it is almost certain that it is also reliable.
The following table is a standard followed almost universally in educational tests and measurement.
Reliability | Interpretation
.90 and above | Excellent reliability; at the level of the best standardized tests
.80-.90 | Very good for a classroom test
.70-.80 | Good for a classroom test; in the range of most classroom tests. There are probably a few items which could be improved.
.60-.70 | Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.50-.60 | Suggests need for revision of the test, unless it is quite short. The test should not contribute heavily to the course grade.
.50 or below | Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
Task 2: ASSESS
1. What is an expectancy table? Describe the process of constructing an expectancy table. When
do we use an expectancy table?
An expectancy table is a two-way table showing the relationship between two tests. Expectancy
tables are discussed as a device for interpreting the meaning of test results for those untrained
in statistics, and they can be used to display predictive validity data. Constructing an expectancy
table is much like constructing a TOS, but an expectancy table is more concerned with computing
the percentages that indicate the probability of attaining a score based on performance on another
measure. Teachers can use the expectancy table to differentiate instruction by addressing the
academic needs of individual students, and it is used in predictive validity to monitor students'
learning progress.
2. What is the relationship between validity and reliability? Can a test be reliable and yet not
valid? Illustrate.
Reliability and validity are both about how well a method measures something: Reliability
refers to the consistency of a measure (whether the results can be reproduced under the same
conditions). Validity refers to the accuracy of a measure (whether the results really do
represent what they are supposed to measure). A measure can be reliable but not valid if it is
measuring something very consistently but is consistently measuring the wrong construct.
Likewise, a measure can be valid but not reliable if it is measuring the right construct, but not
doing so in a consistent manner.
Illustration of reliability and validity:
3. Discuss the different measures of reliability. Justify the use of each measure in the context of
measuring reliability.
1. Test-retest reliability - it measures the consistency of results when you repeat the same test on
the same sample at a different point in time. You use it when you are measuring something
that you expect to stay constant in your sample.
How to measure it?
To measure test-retest reliability, you conduct the same test on the same group of people at two
different points in time. Then you calculate the correlation between the two sets of results.
Test-retest reliability can be used to assess how well a method resists these factors over time. The
smaller the difference between the two sets of results, the higher the test-retest reliability.
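A minimal sketch of this procedure in Python follows; the scores are invented, and statistics.correlation requires Python 3.10 or later.

```python
# A minimal sketch of test-retest reliability: correlate the same
# group's scores from two administrations. Scores are invented.
from statistics import correlation  # Python 3.10+

first_administration  = [78, 85, 62, 90, 71, 83]
second_administration = [75, 88, 65, 92, 70, 80]

# A high Pearson correlation suggests stable (reliable) measurement.
print(round(correlation(first_administration, second_administration), 2))
```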
2. Interrater reliability (also called interobserver reliability) - it measures the degree of agreement
between different people observing or assessing the same thing. You use it when data is collected by
researchers assigning ratings, scores or categories to one or more variables.
How to measure it?
To measure interrater reliability, different researchers conduct the same measurement or
observation on the same sample. Then you calculate the correlation between their different sets of
results. If all the researchers give similar ratings, the test has high interrater reliability.
People are subjective, so different observers’ perceptions of situations and phenomena naturally
differ. Reliable research aims to minimize subjectivity as much as possible so that a different
researcher could replicate the same results. When designing the scale and criteria for data collection,
it’s important to make sure that different people will rate the same variable consistently with minimal
bias. This is especially important when there are multiple researchers involved in data collection or
analysis.
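Correlating the raters' results is one option; the sketch below illustrates another common chance-corrected agreement statistic, Cohen's kappa. Kappa is not mentioned in the module, and the ratings are invented; it is shown only as one way to quantify interrater agreement beyond chance.

```python
# A minimal sketch of interrater agreement via Cohen's kappa.
# Kappa corrects raw percent agreement for agreement expected by chance.
from collections import Counter

def cohens_kappa(r1, r2):
    n = len(r1)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each rater's marginal category frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    return (observed - expected) / (1 - expected)

rater1 = ["pass", "pass", "fail", "pass", "fail"]
rater2 = ["pass", "fail", "fail", "pass", "fail"]
print(round(cohens_kappa(rater1, rater2), 2))  # about 0.62 here
```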
3. Parallel forms reliability - it measures the correlation between two equivalent versions of a test.
You use it when you have two different assessment tools or sets of questions designed to measure
the same thing.
How to measure it
The most common way to measure parallel forms reliability is to produce a large set of questions to
evaluate the same thing, then divide these randomly into two question sets. The same group of
respondents answers both sets, and you calculate the correlation between the results. High
correlation between the two indicates high parallel forms reliability.
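The sketch below simulates this procedure: a question pool is split randomly into two forms, and each respondent's scores on the two forms are correlated. All data are invented, with each simulated student answering correctly with a probability given by an ability level.

```python
# A minimal sketch of parallel forms reliability: split one question
# pool randomly into two forms, then correlate scores on the forms.
import random
from statistics import correlation  # Python 3.10+

random.seed(0)
pool = list(range(20))                 # one large pool of question ids
random.shuffle(pool)
form_a, form_b = pool[:10], pool[10:]  # random split into two forms

students = 8
abilities = [random.uniform(0.2, 0.9) for _ in range(students)]
# Each student answers each question correctly with probability = ability.
responses = [[1 if random.random() < a else 0 for _ in pool] for a in abilities]

scores_a = [sum(r[q] for q in form_a) for r in responses]
scores_b = [sum(r[q] for q in form_b) for r in responses]
# High correlation between forms indicates high parallel-forms reliability.
print(round(correlation(scores_a, scores_b), 2))
```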
4. Criterion-related evidence of validity refers to the relationship between scores obtained using
the instrument and scores obtained using one or more other tests. How strong is this relationship?
How well do such scores estimate present performance or predict future performance of a certain type?
References:
https://www.td.org/videos/linking-assessment-data-to-training-outcomes
Navarro, Rosita L., Santos, Rosita G., & Corpuz, Brenda B. Assessment of Learning 1, Third Edition.