COLLEGE OF EDUCATION
(Bachelor in Technology and
Livelihood Education)
COURSE MODULE IN
ASSESSMENT
IN LEARNING 1
1st Semester; A.Y. 2021 – 2022
COURSE FACILITATOR: JUDY ANN B. CASTAÑEDA, LPT, MEd
FB/MESSENGER: Jo De Ann
Email: [email protected]
Phone No: 09462018351
MISSION
environment and to contribute to nation – building by providing education, training, research and
INSTITUTIONAL OUTCOMES
Welcome to the first semester of School Year 2021-2022! Welcome to the College of
Education and welcome to NONESCOST!
Despite all that is happening around us, there is still so much to be thankful for, and one
of these is the opportunity to continue learning.
You are right now browsing your course module in EDP 106, Assessment in Learning 1.
As you read on, you will have an overview of the course, its content,
requirements and other related information. The module is made up
of 3 lessons. Each lesson has seven parts:
LEARNING ACTIVITIES – to measure your learning in the lesson you have just wandered through
I encourage you to get in touch with me in case you encounter problems while
studying your modules. Keep constant and open communication. Use your real names
in your FB or Messenger accounts so I can recognize you based on the list of officially
enrolled students in the course. I would be very glad to assist you in your journey.
Furthermore, I would also suggest that you build a workgroup among your classmates.
Participate actively in our discussion board or online discussion if possible and submit
your outputs/requirements on time. You may submit them online through email and
messenger. You can also submit hard copies. Place them on short-size bond paper inside
a short plastic envelope with your name and submit them at the designated pick-up areas.
I hope that you will find this course interesting and fun. I hope to know more of your
experiences, insights, challenges and difficulties in learning as we go along this course. I
am very positive that we will successfully meet the objectives of the course.
May you continue to find inspiration to become a great professional. Keep safe and God
bless!
This course module is an official document of Northern Negros State College of Science
and Technology under its Learning Continuity Plan on Flexible Teaching-Learning
modalities.
Quotation from, contraction, reproduction, and uploading of all or any part of this
module are not authorized without permission from the faculty-author and from
NONESCOST.
Course Number: EDP 106
Course Title: Assessment in Learning 1
Course Description: This is a 3-unit course that focuses on the principles, development and
utilization of conventional assessment tools to improve the teaching-learning process. It
emphasizes the use of testing for the measurement of knowledge, comprehension and other
thinking skills. It allows students to go through the standard steps in test construction for
quality assessment. It includes competencies contained in the Trainers Methodology 1 of TESDA.
No. of Units: 3 units
Pre-requisites: None
Course Intended Learning Outcomes:
1. Demonstrate and apply the basic and higher levels of literacy, communication, numeracy
and critical thinking skills needed for higher learning;
2. Establish and maintain an environment needed for the holistic development of learners;
3. Create an environment conducive to learning;
4. Demonstrate mastery of the subject matter;
5. Apply familiarity with the learners' knowledge and experience in appropriate situations; and
6. Facilitate learning of diverse types of learners.
Content Coverage:
MODULE 1
LESSON 1
SHIFT OF EDUCATIONAL FOCUS FROM CONTENT TO LEARNING
OUTCOMES
A. Outcome-Based Education: Matching Intentions with
Accomplishments
B. The Outcomes of Education
C. Institutional, Program, Course and Learning Outcomes
D. Sample Educational Objectives and Learning Outcomes in TLE (K to
12)
LESSON 2
MEASUREMENT, ASSESSMENT AND EVALUATION IN OUTCOME
BASED EDUCATION
A. Measurement
B. Assessment
C. Evaluation
D. Assessment “FOR” “OF” and “AS” Learning: Approaches to
Assessment
LESSON 3
PROGRAM OUTCOMES
A. Program Outcomes and Student Learning Outcomes
B. Program Outcomes for Teacher Education
C. The Three Types of Learning
D. Domain I: Cognitive (Knowledge)
E. Domain II: Psychomotor (Skills)
F. Domain III: Affective (Attitude)
G. Kendall's and Marzano's New Taxonomy
MODULE 2
LESSON 2
DEVELOPMENT OF VARIED ASSESSMENT TOOLS
A. Planning a Test and Construction of Table of Specifications (TOS)
B. Types of Pencil-and-Paper Tests
C. Constructing Selected-Response Type
a. True-False Test
b. Multiple Choice Test
c. Matching Type
D. Constructing Supply Type or Constructed-Response Type
a. Completion Type of Test
b. Essays
LESSON 3
ITEM ANALYSIS AND VALIDATION
A. Item Analysis
a. Difficulty Index
b. Discrimination Index
B. The U-L Index Method of Item Analysis
C. Validity
D. Reliability
MODULE 3
LESSON 1
MEASURE OF CENTRAL TENDENCY
A. Measure of Central Tendency
a. Mean
b. Median
c. Mode
B. Normal and Skewed Distributions
C. Measure of Dispersion or Variability
a. Range
b. Variance
c. Standard Deviation
LESSON 2
GRADING SYSTEMS
LESSON 3
TRAINERS METHODOLOGY 1 (TM1) QUALIFICATIONS
A. Basic Competencies and Core Competencies of TM1
B. Requirements in Taking the Assessment for TM1
C. Factors to be Considered in Planning Assessment Activity
REFERENCES:
TEXTBOOK:
T1 – Navarro, et al. (2019). Assessment in Learning 1. Manila: Lorimar Publishing, Inc.
OTHER REFERENCES:
R1 – Copy of VMGO
R2 – TESDA Trainers Methodology 1
R3 – Gabuyo, Yonardo A., et al. (2013). Assessment of Learning I. Manila: Rex Bookstore.
R4 – Balagtas, M. & Dacanay, A. (2013). A Reviewer for the LET (Assessment of Student Learning). Manila: Philippine Normal University.
R5 – Caparas, H. (2002). Module 6.9: Curriculum and Instruction (Technology and Livelihood Education). DepEd Teacher Education Council.
R6 – Anderson, L., et al. (2001). A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. New York: Pearson, Allyn & Bacon.
R7 – Santos, R., et al. (2007). Assessment of Learning 1. Manila: Lorimar Publishing, Inc.
R8 – Reganit, A., et al. (2010). Assessment of Student Learning 1 (Cognitive Learning). Manila: C & E Publishing, Inc.
ONLINE REFERENCES:
OR1 – https://www.techlearning.com/t-l-partner-post/5-key-features-of-high-quality-assessment
OR2 – http://www.nwlink.com/~donclark/hrd/bloom.html
OR3 – Authentic Assessment, https://www.teachervision.com/authentic
This module explains that outcome assessment is the process of gathering information on whether
the instruction, services and activities that the program provides are producing the desired student
learning outcomes.
Student Learning Outcome #2: Students apply principles of logical thinking and persuasive
argument in writing.
Student Learning Outcome #3: Students write multiple page essays complying with standard style
and format
Study the phases of outcome assessment in the instructional cycle as shown above in the figure then answer
the following questions:
D. Constructive Alignment
Below is another diagram that illustrates the principle of constructive alignment in the assessment process.
Study it well.
[Figure: Constructive Alignment – the intended Learning Outcome is aligned with the Teaching-Learning Activities and the Assessment Task.]
The figure above illustrates the principle of constructive alignment. The principle of constructive alignment
simply means that the teaching-learning activity or activities and the assessment tasks are aligned with the
intended learning outcome. If the intended learning outcome is "to drive a car," the teaching-learning activity
is driving a car, not listening to lectures on car driving, and the assessment task is to let the student drive a
car, not to describe how to drive a car.
You may have been victims of teachers who taught you one thing but assessed you on another. The result is
much confusion and disappointment. If you have been victims of this lack of constructive alignment, then break
the cycle by not victimizing your own students. Observe the principle of constructive alignment: make sure
your assessment tasks are aligned with your learning outcomes.
Why the term "constructive"? Constructive alignment is based on constructivist theory (Biggs, 2007), which
holds that learners use their own activity to construct their knowledge or other outcome/s.
The paper-and-pencil test (traditional assessment) assesses learning in the cognitive domain (Bloom) or
declarative knowledge (Kendall and Marzano, 2012).
The paper-and-pencil test, however, is inadequate to measure all forms of learning. Psychomotor learning
or procedural knowledge (Kendall and Marzano, 2012), as well as learning proven by a product or a
performance, cannot be measured by a paper-and-pencil test.
Assessment tools for the cognitive domain (declarative knowledge) are the different paper-and-pencil tests.
Basic examples of paper-and-pencil tests are shown below.
Examples of the selected-response type of tests are the alternative-response type (e.g., True or False, Yes or No),
the matching type and the multiple-choice type.
Examples of the constructed-response type of tests are the completion type (fill-in-the-blanks), the short answer,
the essay test and problem solving.
Examples of authentic assessment tools are demonstrations of what has been learned through either a
product or a performance.
Examples of performance tests are executing the steps of the tango, delivering a keynote speech, opening a
computer, demonstration teaching, etc.
F. Portfolio
A portfolio falls under non-paper-and-pencil tests. A portfolio is a purposeful collection of student work or
documented performance (e.g., a video of a dance) that tells the story of a student's achievement or growth. The word
purposeful implies that a portfolio is not a collection of all of a student's work; it is not just a receptacle for
everything a student produces. The work that is collected depends on the type and purpose of the portfolio you want
to have. It can be a collection of products, recorded performances or photos of performances.
For example, if the standard or competency specifies persuasive, narrative, and descriptive writing,
an assessment portfolio should include examples of each type of writing. Similarly, if the curriculum
calls for a technical skill such as the use of PowerPoint in report presentations, then the display portfolio
will include entries documenting the reporting process with the use of PowerPoint.
G. Scoring Rubrics
A rubric is a coherent set of criteria for students' work that includes descriptions of levels of performance
quality on the criteria. The main purpose of rubrics is to assess performance made evident in processes
and products. A rubric can serve as a scoring guide that seeks to evaluate a student's performance in many
different tasks based on a full range of criteria rather than a single numerical score. Objective tests can
be scored by simply counting the correct answers, but essay tests, students' products and students'
performances cannot be scored the way objective tests are scored. Products and performances can be
scored reliably only with the use of scoring rubrics.
Rubrics have two major parts: coherent sets of criteria and descriptions of levels of performance for these
criteria (Brookhart, 2013, How to Create and Use Rubrics).
1. Analytic rubric – each criterion (dimensions, traits) is evaluated separately. Good for formative
assessment. It is also adaptable to summative assessment because if you need an overall score for grading,
you can combine the scores.
2. Holistic rubric – all criteria (dimensions, traits) are evaluated simultaneously. Scoring is faster. It is good
for summative assessment.
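To make the idea of combining analytic-rubric scores concrete, here is a minimal sketch in Python. The criterion names, weights and ratings are hypothetical examples only (they follow a 3-point scale such as the essay rubric used later in this lesson); this is one possible way of combining scores, not a prescribed procedure.

```python
# Minimal sketch: combining analytic-rubric criterion scores into one overall grade.
# The criteria follow a 3-point scale (3 = Very Good, 2 = Good, 1 = Needs Improvement);
# the criterion names and weights are hypothetical examples, not prescribed values.

def overall_score(scores, weights, max_level=3):
    """Convert weighted criterion ratings into a percentage score."""
    total_weight = sum(weights[c] for c in scores)
    weighted = sum(scores[c] * weights[c] for c in scores)
    return weighted / (max_level * total_weight) * 100

scores = {"content structure": 3, "content relevance": 2, "spelling": 3,
          "neatness": 2, "output format": 3}
weights = {"content structure": 2, "content relevance": 2, "spelling": 1,
           "neatness": 1, "output format": 1}

print(round(overall_score(scores, weights), 1))  # 85.7
```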
Learners have multiple intelligences and varied learning styles. Students must be given the opportunity to
demonstrate learning that is aligned to their multiple intelligences and to their learning styles. It is good for
teachers to consider the multiple intelligences of learners to enable learners to demonstrate learning in a
manner which makes them feel comfortable and successful. Teachers truly consider learners' multiple
intelligences when they make use of a variety of assessment tools and tasks.
Here are assessment practices lifted from DepEd Order No. 8, s. 2015 for the guidance of all teachers:
1. Teachers should employ assessment methods that are consistent with standards. This means that
assessment as a process must be based on standards and competencies that are stated in the K
to 12 Curriculum Guide. Assessment must be based NOT on content but on standards and
competencies. Therefore, there must be alignment between assessment tools or tasks and
standards and competencies.
2. Teachers must employ both formative and summative assessment both individually and
collaboratively. Assessment is done primarily to ensure learning, thus teachers are expected to
assess learning in every stage of lesson development – beginning, middle and at the end.
3. Grades are a function of written work, performance tasks and quarterly test. This means that
grades come from multiple sources with emphasis on performance tasks from Grades 1 to 12.
Grades do not come from only one source but rather from multiple sources.
4. The cognitive process dimensions given by Krathwohl and Anderson (2001) govern the formulation
of assessment tasks.
Direction: “Complete me...” In the table list down 3 learning outcomes then give assessment tasks
appropriate to the multiple intelligences. (15 points)
Rubric
Content structure – Very Good (3): Manifests excellence in the use of English grammar as well as a smooth flow of ideas. Good (2): Well written and observes correct grammatical structure. Needs Improvement (1): The essay is structurally poorly written.
Content relevance – Very Good (3): Relevant to the topic and provides a clear explanation of the subject matter. Good (2): Content is close to the topic yet lacks an in-depth explanation. Needs Improvement (1): Content is unsubstantial and incoherent.
Spelling – Very Good (3): All words are correctly spelled. Good (2): 1-3 words are misspelled. Needs Improvement (1): 4 or more words are misspelled.
Neatness in writing – Very Good (3): Written neatly in clear and formal handwriting; no erasures. Good (2): Shows a number of erasures. Needs Improvement (1): Unreadable and messily written.
Output format – Very Good (3): The entire specified format is correct. Good (2): 1 of the specified formats is incorrect. Needs Improvement (1): 2 or more of the specified formats are incorrect.
In this module we are concerned with developing paper-and-pencil tests for assessing the attainment
of educational objectives based on Bloom's taxonomy. A paper-and-pencil test can be of either the
selected-response or the constructed-response type.
In the activity you had, you were able to differentiate between a standardized test and a teacher-made test.
The quality of test construction depends largely on the classroom teacher.
Every classroom teacher is interested to know how far and wide they can facilitate, orient and
A teacher should have a plan for test development so that he will be guided as he chooses the contents
from which the items will be drawn as well as the behavior that he needs to assess. The principle of clarity should
be in the teacher’s mind before he begins writing the test. Clear objectives can guide him when he decides as to
what content and behavior he needs to assess. The same objective should be the basis of selecting appropriate
test item formats.
1. Identifying test objectives. An objective is defined as the statement of the expected behavior that the
students should display after the teacher has taught the content or subject matter. The teacher should
identify the objectives to be assessed. The objectives should be clearly stated in terms of the cognitive,
affective or psychomotor domains of learning. Therefore, in test development, the teacher should see
to it that the test assesses specific domains of learning.
Example. We want to construct a test on the topic: “Subject-Verb Agreement in English” for a
Grade V class. The following are typical objectives:
Knowledge. The students must be able to identify the subject and the verb in a given sentence.
Comprehension. The students must be able to determine the appropriate form of a verb to be
used given the subject of a sentence.
Application. The students must be able to write sentences observing rules on subject-verb
agreement.
Analysis. The students must be able to break down a given sentence into its subject and
predicate.
Synthesis. The students must be able to formulate rules to be followed regarding subject-verb
agreement.
2. Preparing a table of specifications. For test construction, the teacher is like an engineer who is
building a structure. An engineer is guided by a blueprint which specifies the materials to be used, the
dimensions to be followed and the craftsmanship to be considered. A teacher must also be guided by a
blueprint on which he bases the contents and behaviors to be measured and the types of item
formats to be constructed. This blueprint is called a table of specifications. A table of specifications is a
test blueprint that serves as the teacher's guide as he constructs the test.
One of the major complaints of students with regard to teacher-made tests is that they are often
invalid since items included in the test were not discussed. Although a table of specifications (TOS) is
no guarantee that errors will be corrected, such a blueprint may help improve the content validity of
teacher-made tests.
A TOS is a matrix whose rows consist of the specific topics or skills and whose columns consist of the objectives
cast in terms of Bloom's Taxonomy. It is sometimes called a test blueprint, test grid or content validity chart.
The main purpose of a TOS is to aid the test constructor in developing a balanced test, which
emphasizes the proportion of items to the amount of time spent in the classroom, activities engaged in and
topics discussed. It helps the teacher avoid the tendency to focus on materials that are easy to develop as
test items. Such tendency often limits the teacher in constructing items on knowledge.
Ideally and to be of most benefit, the TOS should be prepared before the beginning of instruction. It
would be good to consider it part and parcel of the course preparation because it can help the teacher be
more effective. It helps provide for optimal learning on the part of the students and optimal teaching
efficiency on the part of the teachers. It serves as a monitoring agent and can help keep the teacher from
straying off the instructional track.
Once the course content and instructional objectives have been specified, the teacher is ready to
integrate them in some meaningful fashion so that the test, when completed, will be a valid measure of the
students' knowledge.
One could delineate the course contents into finer subdivisions. Whether this needs to be done
depends on the nature of the content and the manner in which the course content has been outlined and
taught by the teacher. A good rule of thumb in determining how detailed the content areas should
be is to have a sufficient number of subdivisions to ensure adequate, detailed coverage. The more
detailed the blueprint is, the easier it is to get ideas for test items.
A table of specifications can be either one-way or two-way. A two-way table of specifications has
totals of items for the rows (contents) and for the columns (behavior criteria). The table below shows a two-way
table of specifications. (Behavior criteria: K = Knowledge, C = Comprehension, Ap = Application, An = Analysis, Syn = Synthesis, Eval = Evaluation.)

Content Areas (Sci & Tech 1) | K | C | Ap | An | Syn | Eval | No. of Items | % of Items
Nature of Science | 1,2,3 | 4,5,6 | 7,8,9 | 10,11,12,13 | 14,15,16,17 | 18,19,20 | 20 | 20
Nature of Matter | 21,22,23,24 | 25,26,27,28 | 29,30,31 | 32,33,34 | 35,36,37 | 38,39,40 | 20 | 20
Nature of Motion, Force | 41,42,43 | 44,45,46 | 47,48,49,50 | 51,52,53 | 54,55,56,57 | 58,59,60 | 20 | 20
Man & His Physical, Natural Environment | 61,62,63 | 64,65,66 | 67,68,69,70 | 71,72,73 | 74,75,76 | 77,78,79,80 | 20 | 20
The Nature of Living Things | 81,82,83,84 | 85,86,87 | 88,89,90 | 91,92,93,94 | 95,96,97 | 98,99,100 | 20 | 20
TOTAL | 17 | 16 | 17 | 17 | 17 | 16 | 100 | 100
The TOS ensures that there is a balance between items that test lower-level thinking skills and those
which test higher-order thinking skills (or, alternatively, a balance between easy and difficult items) in the
test.
The TOS guides the teacher in formulating the test. As we can see, the TOS also ensures that
each of the objectives in the hierarchy of educational objectives is well represented in the test. As such, the
resulting test constructed by the teacher will be more or less comprehensive. Without the table
of specifications, the tendency for the test maker is to focus too much on facts and concepts at the
knowledge level.
Given:
50-item Test
Modules covered: (Note: This can be broken down into topics or lessons, specifying the
time spent on each.)
Automotive = 6 hours
Civil Technology = 7 hours
Example: Automotive = 6/13 = 0.4615 × 50 = 23.08; round off the answer to the nearest whole number = 23.
So you will have to make 23 questions from the Automotive module (and, by the same computation, 27 from the Civil Technology module).
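The same computation can be applied to every module or topic at once. Below is a minimal sketch in Python; the module names and hours follow the example above, and the simple rounding rule is an assumption, so the teacher may still adjust an item or two to hit the exact total.

```python
# Minimal sketch: allocating test items in proportion to instructional time.
# Module names and hours follow the example above; simple rounding is assumed.

hours = {"Automotive": 6, "Civil Technology": 7}
total_items = 50
total_hours = sum(hours.values())

for module, h in hours.items():
    items = round(h / total_hours * total_items)
    print(f"{module}: {h} hours -> {items} items")

# Output:
# Automotive: 6 hours -> 23 items
# Civil Technology: 7 hours -> 27 items
```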
3. Selecting the test format. After developing a table of specifications, the teacher should decide on the
type of item formats to be constructed. The choice of item format should suit the objectives and content
being assessed. There are guidelines for constructing the different test item formats. As soon as the table
of specifications is developed, the teacher is ready for test construction.
In order to fit classroom testing in with the testing programs at the regional and national
levels, it is necessary that teachers construct tests in the multiple-choice item format. The
items used during regional and national achievement tests are constructed in the multiple-choice
format, so classroom tests should also be multiple choice. If students are accustomed to this type of
item format, they will not find difficulty in answering the regional and national examinations conducted
yearly.
4. Item analysis and try-out. The test draft is tried out on a group of pupils or students. The purpose of
this try-out is to determine: (a) item characteristics through item analysis, and (b) characteristics of
the test itself – validity, reliability and practicality.
[TOS template with columns: Learning Competencies; Time Allotment (hours); %age of Teaching Time; Behavior/Competency (Remembering, Understanding, Applying, Analyzing, Evaluating, Creating); No. of Items; %age of Items]
Rule 1. Do not give a hint (inadvertently) in the body of the question.
Example. The Philippines gained its independence in 1898 and therefore celebrated its
centennial year in 2000. _____
Obviously, the answer is FALSE because 100 years from 1898 is not 2000 but 1998.
Rule 2. Avoid using the words "always", "never", "often" and other adverbs that tend to make a statement
either always true or always false.
Statements that use the word "always" are almost always false. A test-wise student can easily
guess his way through a test like this and get a high score even if he does not know
anything about the topic being tested.
Rule 3. Avoid long sentences as these tend to be “true”. Keep sentences short.
Example. Tests need to be valid, reliable and useful, although, it would require a great amount
of time and effort to ensure that tests possess these test characteristics. _____
Notice that the statement is true. However, we are also not sure which part of the
sentence is deemed true by the student. It is just fortunate that in this case, all parts of
the above sentence are true and hence, the entire sentence is true. The following example
illustrates what can go wrong in long sentences:
Example. Tests need to be valid, reliable and useful since it takes very little amount of time,
money and effort to construct tests with these characteristics. ______
The first part of the sentence is true but the second part is debatable and may, in fact,
be false. Thus, both a "true" response and a "false" response can be defended, which makes the item ambiguous.
Rule 4. Avoid trick statements with some minor misleading word or spelling anomaly, misplaced phrases,
etc. A test-wise student who does not know the subject matter may detect this strategy and thus get the answer
correct.
Example. True or False. The Principle of our school is Mr. Albert P. Panadero.
The Principal's name may actually be correct, but since the word is misspelled and the
entire sentence takes on a different meaning, the answer would be false! This is an example
of a tricky but utterly useless item.
Rule 5. Avoid quoting verbatim from reference materials or textbooks. This practice sends the wrong signal
to the students that it is necessary to memorize the textbook word for word; thus, the acquisition of higher-level
thinking skills is not given due importance.
Rule 6. Avoid specific determiners or give-away qualifiers. Students quickly learn that strongly worded
statements are more likely to be false than true, for example, statements with "never", "no", "all" or "always".
Moderately worded statements are more likely to be true than false. Statements with "many", "often",
"sometimes", "generally", "frequently" or "some" should likewise be avoided.
Rule 7. With true or false questions, avoid a grossly disproportionate number of either true or false
statements or even patterns in the occurrence of true or false statements.
A multiple-choice test item format can assess multiple skills and can be constructed in different forms. A
teacher who possesses the skills required for constructing multiple-choice tests finds it easy to construct
the rest of the item formats. In every step of test construction, teachers are guided by the rules and
guidelines for test construction.
Item writing. When teachers write the test for the first time, they should consider it just a
first draft; it deserves a second or even a third checking, especially when the test format is
multiple-choice.
1. Do not use unfamiliar words, terms and phrases. The ability of the item to discriminate, or its
level of difficulty, should stem from the subject matter rather than from the wording of the question.
Example. What would be the system reliability of a computer system whose slave and
peripherals are connected in parallel circuits and each one has a known time to failure
probability of 0.05?
A student completely unfamiliar with the terms "slave" and "peripherals" may not be able
to answer correctly even if they know the subject matter of reliability.
2. Do not use modifiers that are vague and whose meanings can differ from one person to the
next such as much, often, usually etc.
The qualifier “much” is vague and could have been replaced by more specific qualifiers
like: “90% of the photosynthetic process” or some similar phrase that would be more precise.
3. Avoid complex or awkward word arrangements. Also, avoid use of negatives in the stem as
this may add unnecessary comprehension difficulties.
Example.
(Poor) As President of the Republic of the Philippines, Corazon Cojuangco Aquino would stand
next to which President of the Philippine Republic subsequent to the 1986 EDSA Revolution?
(Better) Who was the President of the Philippines after Corazon Cojuangco Aquino?
4. Do not use negatives or double negatives as such statements tend to be confusing. It is best to
use simpler sentences rather than sentences that would require expertise in grammatical
construction.
Example.
(Poor) Which of the following will not cause inflation in the Philippine economy?
(Better) Which of the following will cause inflation in the Philippine economy?
5. Each item stem should be as short as possible; otherwise you risk testing more for reading and
comprehension skills.
6. Choose distracters that are equally plausible and attractive.
Example. The short story "May Day Eve" was written by which Filipino author?
a. Jose Garcia Villa b. Nick Joaquin c. Robert Frost d. Edgar Allan Poe
If the distracters had all been Filipino authors, the value of the item would be greatly increased. In this
particular instance, only the first two options carry the burden of the entire item since the last two can be
essentially disregarded by the students.
7. All multiple choice options should be grammatically consistent with the stem.
8. The length, explicitness, or degree of technicality of the alternatives should not be the determinants of
the correctness of the answer. The following is an example of a violation of this rule:
Example. If the three angles of two triangles are congruent, then the triangles are:
a. congruent whenever one of the sides of the triangles are congruent
b. similar
c. equiangular and therefore, must also be congruent
d. equilateral if they are equiangular
The correct choice, "b", may be obvious from its brevity, while the longer, more explicit alternatives
argue so hard for why they must be correct that students are led to suspect they are, in fact, not the correct answers!
10. Avoid alternatives that are synonymous with others or those that include or overlap others.
Example. What causes ice to transform from solid state to liquid state?
a. Change in temperature. c. Changes in pressure.
b. Change in the chemical composition. d. Change in heat levels.
The options "a" and "d" are essentially the same. Thus, a student who spots these identical
choices would right away narrow down the field of choices to a, b and c. The last distracter would
play no significant role in increasing the value of the item.
11. Avoid presenting sequenced items in the same order as in the text.
12. Avoid the use of assuming qualifiers that many examinees may not be aware of.
13. Avoid use of unnecessary words or phrases, which are not relevant to the problem at hand
(unless such discriminating ability is the primary intent of the evaluation). The item’s value is
particularly damaged if the unnecessary material is designed to distract or mislead. Such items test
the student’s reading comprehension rather than knowledge of the subject matter.
Example. The side opposite the thirty-degree angle in a right triangle is equal to half the length of
the hypotenuse. If the sine of a 30-degree angle is 0.5 and the hypotenuse is 5, what is the length of the
side opposite the 30-degree angle?
a. 2.5 b. 3.5 c. 5.5 d. 1.5
The mention of the sine of the 30-degree angle is really quite unnecessary since the first sentence already
gives the method for finding the length of the side opposite the thirty-degree angle. This is a case
of a teacher who wants to make sure that no student in the class gets the wrong answer!
14. Avoid use of non-relevant sources of difficulty such as requiring a complex calculation when
only knowledge of a principle is being tested.
Note in the previous example, knowledge of the sine of the 30-degree angle would lead
some students to use the sine formula for calculation even if a simpler approach would have
sufficed.
16. Include as much of the item as possible in the stem. This allows less repetition and shorter
choice options.
17. Use the "None of the above" option only when the keyed answer is totally correct. When choice
of the "best" response is intended, "none of the above" is not appropriate, since the implication has
already been made that the correct response may be partially inaccurate.
18. Note that use of "all of the above" may allow credit for partial knowledge. In a multiple-option
item (allowing only one option choice), if a student only knew that two (2) options were correct, he
could then deduce the correctness of "all of the above". This assumes you allow only one
correct choice.
19. Having compound response choices may purposefully increase difficulty of an item.
Example.
(Less Homogeneous)
Thailand is located in:
a. Southeast Asia
b. Eastern Europe
c. South America
d. East Africa
e. Central America
(More Homogeneous)
Thailand is located in:
a. Laos and Kampuchea
b. India and China
c. China and Malaysia
d. Laos and China
e. India and Malaya
A. Give example questions to illustrate each of the following rules of thumb in the construction of a
true-false test. You may use the topics/lessons from Introduction to Industrial Arts 1 and Introduction to
Industrial Arts 2.
1. Avoid giving hints in the body of the question.
2. Avoid using the words “always”, “never” and other such adverbs which tend to be always true
or always false.
3. Avoid long sentences which tend to be true. Keep sentences short.
4. Avoid a systematic pattern for true and false statements.
5. Avoid ambiguous sentences which can be interpreted as true and at the same time false.
B. Give example questions to illustrate each of the following rules of thumb in the construction of a
multiple-choice test. You may use the topics from Introduction to Industrial Arts 1 and Introduction to
Industrial Arts 2.
1. Phrase the stem to allow for only one correct or best answer.
2. Avoid giving away the answer in the stem.
3. Choose distracters appropriately.
4. Choose distracters so that they are all equally plausible and attractive.
5. Phrase questions so that they will test higher order thinking skills.
Example of homogeneous items. The items are all about the Filipino heroes, nothing more.
Perfect Matching happens when an option is the only answer to one of the items in column A.
Column A Column B
Provinces Tourist Destinations
1. Albay a. Luneta Park
2. Bohol b. Mt. Mayon
3. Banaue c. Chocolate Hills
4. Pangasinan d. Rice Terraces
5. Manila e. Hundred Islands
f. Pagsanjan Falls
g. Malolos Church
2. The stems (longer in construction than the options) must be in the first column while the options (usually
shorter) must be in the second column.
3. The options must be more in number than the stems to prevent the student from arriving at the answer
by mere process of elimination.
4. To help the examinee find the answer more easily, arrange the options alphabetically or chronologically,
whichever is applicable.
5. Like any other test, clear directions must be given. The examinees must know exactly what to do.
Imperfect Matching happens when an option is the answer to more than one item in the column.
Column A Column B
Tourist Destinations Provinces
1. Luneta Park a. Albay
2. Mines View Park b. Manila
3. Chocolate Hills c. Banaue
4. Camp John Hay d. Bohol
5. Intramuros e. Pangasinan
f. Baguio
g. Palawan
Sequencing Matching requires the examinee to arrange things, steps, or events in chronological
order.
Arrange the steps on how to conduct historical research.
_________ 1. Reporting
_________ 2. Gathering of source materials.
_________ 3. Problem formulation.
_________ 4. Criticizing source materials.
_________ 5. Interpreting historical data
Multiple Matching requires the examinee to match the items in column A to B, then match the
answers from column B to column C and further match answers from column C to column D.
Match the provinces listed in Column A with their Capital towns in Column B and with the
tourist spots they are known for.
1. If possible, the response list should consist of short phrases, single words or numbers.
3. Have more options than the given items. Initially, a matching-item test decreases the student's
tendency to guess, but as the students progress in answering the test, the guessing tendency
increases. This can be avoided by increasing the number of options.
4. Arrange the options and items alphabetically, numerically or magnitudinally. This is one way to help the
examinees since they can maximize their time by not searching for the correct answers, especially if there
are many options.
5. Limit the number of items within each set. Ideally, the minimum is five items and the maximum is ten per
set.
6. Place the shorter responses in column B. This time-saving practice allows the students to read the longer
items first in column A and then search quickly through the shorter options to locate the correct alternative.
7. Provide complete directions. Directions should stipulate whether options can be used only once or more
than once. They should also instruct the students on how to respond. The instructions should also clarify
what columns A and B are about.
9. Avoid specific determiners and trivial information that can help the students find the correct response
without any effort on their part. The use of “none of the above” as an option is recommended if it is the only
correct answer.
Construct a 10-item matching type test on "Basic Hand Tools" that will measure higher-order
thinking skills (beyond mere recall of facts and information). 1 point for every correct item.
Essay questions can be used to measure attainment of a variety of objectives. Stecklein (1955) has listed
14 types of abilities that can be measured by essay items:
Note that all these involve the higher-order skills mentioned in Bloom’s Taxonomy.
The following are rules of thumb which facilitate the grading of essay papers:
Rule 1: Phrase the direction in such a way that students are guided on the key concepts to be included.
Example: Write an essay on the topic: “Plant Photosynthesis” using the following keywords and phrases:
chlorophyll, sunlight, water, carbon dioxide, oxygen, by-product, stomata.
Note that the students are properly guided in terms of the keywords that the teacher is
looking for in this essay examination. An essay such as the one given below will get a score of
zero (0). Why?
Rule 2: Inform the students on the criteria to be used for grading their essays. This rule allows the
students to focus on relevant and substantive materials rather than on peripheral and unnecessary
facts and bits of information.
Example: Write an essay on the topic "Plant Photosynthesis" using the keywords indicated. You will be
graded according to the following criteria: (a) coherence, (b) accuracy of statements, (c) use of
keywords, (d) clarity and (e) extra points for innovative presentation of ideas.
Rule 4: Decide on your essay grading system prior to getting the essays of your students.
Rule 5: Evaluate all of the students’ answers to one question before proceeding to the next question.
Scoring or grading essay tests question by question, rather than student by student,
makes it possible to maintain a more uniform standard for judging the answers to each question.
This procedure also helps offset the halo effect in grading. When all of the answers on one paper
are read together, the grader's impression of the paper as a whole is apt to influence the grades he
assigns to the individual answers. Grading question by question, of course, prevents the
formation of this overall impression of a student's paper. Each answer is more apt to be judged on
its own merits when it is read and compared with other answers to the same question.
Rule 6: Evaluate answers to essay questions without knowing the identity of the writer. This is another
attempt to control personal bias during scoring. Answers to essay questions should be evaluated in
terms of what is written, not in terms of what is known about the writers from other contacts with
them. The best way to prevent our prior knowledge from influencing our judgment is to evaluate
each answer without knowing the identity of the writer. This can be done by having the students
write their names on the back of the paper or by using code numbers in place of names.
Rule 7: Whenever possible, have two or more persons grade each answer. The best way to check on
the reliability of the scoring of essay answers is to obtain two or more independent judgments.
Although this may not be a feasible practice for routine classroom testing, it might be done
periodically with a fellow teacher (one who is equally competent in the area). Obtaining two or
more independent ratings becomes especially vital where the results are to be used for important
and irreversible decisions, such as in the selection of students for further training or for special
awards. Here the pooled ratings of several competent persons may be needed to attain a level of
reliability that is commensurate with the significance of the decision being made.
Some teachers use the cumulative criteria, i.e., adding the weights given to each criterion, as the basis for
grading, while others use the reverse. In the latter method, each student begins with a score of 100. Points
are then deducted every time the teacher encounters a mistake or finds that a criterion is missed
in the essay.
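As an illustration of the two approaches, here is a minimal sketch in Python. The criterion names echo the photosynthesis example above, but the point weights and the deduction per mistake are hypothetical, not prescribed values.

```python
# Minimal sketch of the two essay-grading approaches described above.
# Criterion point weights and the deduction per mistake are hypothetical examples.

def cumulative_grade(criterion_points):
    """Cumulative criteria: add the points earned on each criterion."""
    return sum(criterion_points.values())

def deduction_grade(mistakes, deduction_per_mistake=2, start=100):
    """Reverse method: start from 100 and deduct points for every mistake found."""
    return max(start - mistakes * deduction_per_mistake, 0)

print(cumulative_grade({"coherence": 25, "accuracy": 30, "keywords": 20, "clarity": 15}))  # 90
print(deduction_grade(mistakes=7))  # 86
```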
Information Collection
Interview a teacher in your locality on how they prepare a test for their students. If a face-to-face interview
cannot be done while observing physical distancing and wearing a face mask and face shield, you may conduct
the interview online through a video call on Messenger, Google Meet, Zoom or any other application that can
record the entire interview. Document the interview with a video. Ask their permission for the interview to
be recorded and submitted as one of your academic requirements for our subject. The rubric below
will be used to grade your interview video.
Here are the questions you will ask your teacher. Observe proper interview etiquette.
1. Is creating a test one of the most challenging tasks confronting you as a teacher? Why?
2. How can you design fair, yet challenging, exams that accurately gauge student learning?
3. What makes a test good or bad?
Rubric
First Impressions: Greetings & Attire – (5) The student greets and addresses the teacher politely in a very effective way; attire was very appropriate for the interview. (3) The student greets and addresses the teacher naturally and politely; attire was ok, but too casual. (1) The student does not greet or address the teacher naturally or politely; attire was unprofessional for an interview.
Politeness (*Self-assess – please keep in mind while interviewing!) – (5) Student never interrupted or hurried the person being interviewed and thanked them for being willing to be interviewed. (3) Student rarely interrupted or hurried the person being interviewed, but forgot to thank the person. (1) Several times, the student interrupted or hurried the person being interviewed AND forgot to thank the person.
Introduction – (5) The student fully introduces the teacher's personal information, such as name, the school where they are teaching and the subjects they are handling, and the topic of the interview. (3) The student briefly introduces the teacher's personal information and the topic of the interview. (1) The student does not introduce the teacher's personal information and the topic of the interview.
Communication – (5) Speaks clearly and distinctly with no lapses in sentence structure and grammar usage; speaks concisely with correct pronunciation; filler words were used minimally; maintained good posture. (3) Speaking is clear with minimal mistakes in sentence structure and grammar; filler words were used moderately; maintained ok posture. (1) Speaking is unclear – very difficult to understand the message of what is being said; filler words (um, like, uh, right, okay) were used too frequently; maintained poor posture.
Video Quality – (5) Video is not shaky and reflects clean editing and clear audio. (3) Minor issues with the video; may be shaky, with unclean editing and unclear audio. (1) Major issues with the video may include choppy editing, distracting images, shaky filming, and unclear audio or poor editing.
The teacher normally prepares a draft of the test. Such a draft is subjected to item analysis and
validation in order to ensure that the final version of the test would be useful and functional. The
item analysis will provide information that will allow the teacher to decide whether to revise or
replace an item (item revision phase).
A. ITEM ANALYSIS
Item analysis is applicable to test formats that require students to choose the correct or best answer from
the given choices. Therefore, the multiple-choice test is most amenable to item analysis. Examinations that
greatly influence students' course grades, like midterm and final examinations, or that serve other important
decision-making functions, should be as free from deceptive, ambiguous items as possible.
Unfortunately, it is often difficult to recognize problems before the test has been administered.
Item analysis procedures allow teachers to discover items that are ambiguous, irrelevant, too easy or too
difficult, and non-discriminating. The same procedures can also enhance the technical quality of an
examination by pointing out options that are non-functional and should be improved or eliminated. Another
purpose of item analysis is to facilitate classroom instruction, as in diagnostic testing, by providing information
for specific remediation.
Item analysis is a process of determining the plausibility of alternatives, difficulty index and discriminating
index of a test. For practical purposes and for ease of undertaking this process, a simple item analysis can
be done for quizzes and shorter tests.
The difficulty of an item, or item difficulty, is defined as the number of students who are able to answer the
item correctly divided by the total number of students. Thus:
Difficulty Index = number of students who answered the item correctly ÷ total number of students who took the test
Example: What is the item difficulty index of an item if 25 students are unable to answer it correctly while
75 answered it correctly?
Here, the total number of students is 100; hence, the item difficulty index is 75/100 or 0.75.
A sample analysis and scale for the interpretation of difficulty index is presented below.
Table for Difficulty Index (Number of students who took the test = 50)
Item No. | No. of students who got it correct | Difficulty Index (NCR/TNS) | Verbal Interpretation | Decision
1 | 38 | 0.76 | Average | Retain
2 | 15 | 0.30 | Average | Retain
3 | 28 | 0.56 | Average | Retain
4 | 49 | 0.98 | Very Easy | Reject or Revise
5 | 43 | 0.86 | Very Easy | Reject or Revise
6 | 33 | 0.66 | Average | Retain
7 | 24 | 0.48 | Average | Retain
8 | 30 | 0.60 | Average | Retain
9 | 41 | 0.82 | Very Easy | Reject or Revise
10 | 7 | 0.14 | Very Difficult | Reject or Revise
Items 1, 2, 3, 6, 7 and 8 have average difficulty level. These items are considered good; therefore, they are
retained and included as entries in the item bank. In the test bank, indicate the difficulty index of the items.
This index will be the basis of arranging the items according to difficulty level. Items 4, 5 and 9 are very
easy and item 10 is very difficult; therefore they are rejected or revised.
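A minimal sketch of this computation in Python is shown below. The counts reproduce a few rows of the table above (50 examinees), and the cut-offs follow the interpretation scale used later in this lesson; treat it as an illustration, not a prescribed procedure.

```python
# Minimal sketch: difficulty index = number of correct responses / number of examinees.
# Counts reproduce rows of the table above; cut-offs follow the scale used in this lesson
# (0.21-0.80 average, above 0.80 very easy, below 0.21 very difficult).

def difficulty_index(correct, examinees):
    return correct / examinees

def interpret(df):
    if df > 0.80:
        return "Very Easy", "Reject or Revise"
    if df >= 0.21:
        return "Average", "Retain"
    return "Very Difficult", "Reject or Revise"

for item, correct in [(1, 38), (4, 49), (10, 7)]:
    df = difficulty_index(correct, 50)
    label, decision = interpret(df)
    print(f"Item {item}: {df:.2f} {label} -> {decision}")
```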
This process is the simplest form of item analysis, one which teachers are capable of doing daily. When items
are entered in the test bank, the difficulty level should be indicated so that when they are needed for
future use, the easiest ones can be placed first in the test and the more difficult items last.
Imagine: if every day a teacher enters 5 good items in an item bank, then over an average of 150
school days they will have developed 750 good items – a very rich item bank from which they can draw for
long tests.
Difficult items tend to discriminate between those who know and those who do not know the answer.
Conversely, easy items cannot discriminate between these two groups of students. We are therefore
interested in deriving a measure that will tell us whether an item can discriminate between these two
groups of students. Such a measure is called an index of discrimination.
B. The U-L Index Method of Item Analysis (Stocklein, 1957; cited by Oriondo & Antonio, 1989)
The new test items in the test bank, analyzed in terms of difficulty level during quizzes, can be subjected to
a thorough item analysis to determine the plausibility of alternatives, the difficulty index and the discriminating index.
Plausibility of alternatives means that this analysis determines whether the alternatives in a multiple-choice
test have the capability to be chosen as a response.
The difficulty index is a value (ranging from 0 to 1.0) which gives information on whether the item is very easy,
average or very difficult.
The discriminating index is a value (ranging from -1.0 to +1.0) which gives information on whether the item can
distinguish the bright from the poor-performing students in the class.
The U-L Index Method is modified to shorten the tedious process of computing the indices. The Upper 27% and
the Lower 27% are separated from the Average (46%) members of the class. The question that comes to mind is,
"Why did Stocklein (1957) use 27%?" To present it in a nutshell, let us focus our attention on the theoretical curve,
the ideal normal distribution curve, as the basis of justification.
Theoretically, when a group of students is assessed through the use of an appropriate test, student’s
performance can be pictured out in a distribution curve. This is a curve which shows the distribution of
scores from the lowest (extreme left) to the highest (extreme right), with most of the scores clustering
somewhere at the middle. The distribution can either be normal or skewed. In this discussion, let us focus
first our attention on the normal distribution. See figure below:
Why 27 percent? Because the test is newly developed and is being tried out, the chance of
measurement error is high. To give allowance for these errors, it is suggested to move closer to the middle of the
distribution. By making it 27%, the teacher can lessen the adverse effect of measurement error. Other experts use
the concept of quartiles, 25% (lower group) and 25% (upper group); the teacher can reduce the chance of error in
measurement by increasing this to 27%. The steps in the process of item analysis are presented below:
1. Arrange the test papers from the highest to the lowest score.
2. Count the number of students who took the test (the normal classroom is composed of 50 students).
3. Compute 27% of the total number of students (in the case of 50 students, 27% is 14).
4. From the arranged test papers, count 14 papers from the top. Separate these papers and call them the
Upper Group (U). From the lowest, count another 14 test papers; separate them and call them the Lower
Group (L). The remaining 46% are considered to have average scores. They are not included in the analysis.
Put them aside for future use.
5. Tally the students' responses in a table for the analysis of the plausibility of alternatives. For example, if a
student answered B in item number 1, it is tallied as that student's answer to number 1. Do this until all the
responses of both groups (Upper and Lower) are tallied (see sample table below).
6. Put an indicator of the correct answer for every item (e.g., in number 1, the correct answer is B, with 8
correct in the upper group and 3 in the lower group). The quantities 8 and 3 mean that 8 out of 14 students
in the upper group and 3 out of 14 students in the lower group got item No. 1 correct.
7. Find the difference between the upper and lower groups. For the correct answer, the difference must be
positive for the option to be considered plausible; for the distracters, the difference should be negative to be
considered plausible. (In item number 1, all alternatives are plausible.)
8.In a table for the difficulty and discriminating indices (see table below), enter the number of students who
got correct in the upper and lower groups. The data for the upper group should be entered under the column
for the Upper (U) for each item; the same is done for the lower group (L). In Number 1, for example, there are
8 students who got correct in the upper group. The 8 is entered in the cell U1; 3 in L1. Do this for the rest of
the items.
Item No. | Upper (U) | Lower (L) | Difficulty Index (U+L)/T | Discriminating Index (U-L)/(27% of N) | Decision
1 | 8 | 3 | 0.39 | 0.36 | Retain
2 | 12 | 10 | 0.79 | 0.14 | Reject
3 | 10 | 4 | 0.50 | 0.43 | Revise alt.
4 | 5 | 5 | 0.36 | 0 | Reject
9.Compute the difficulty index by adding the U and L (8 + 3) and dividing the sum by the total (28). For Item
1, the difficulty index is computed as (8 + 3)/28 = 0.39. Based on the scale for the interpretation of the
difficulty index of the item presented below, this quantity shows an average difficulty index. Only items with
average difficulty index are considered good. Items that are very easy and very difficult are automatically
rejected. However, difficulty index alone is not the basis for rejecting or retaining an item. As in the case of
item No.4, the difficulty index is 0.36 – interpreted as average; however, the item shows no discriminating
power, hence, rejected.
Scale for interpreting the difficulty index:
0.81 – 1.00  Very easy
0.21 – 0.80  Average
0.00 – 0.20  Very difficult
Scale for interpreting the discriminating index:
+0.21 – +1.00  Good discriminating index
-1.00 – +0.20  Bad discriminating index
10. Compute the discriminating index by subtracting L from U (8 – 3) and dividing the difference by 27%
of N. For item 1, the discriminating index is computed as (8 – 3)/14 = 0.36. Based on the scale above, this
value shows a good discriminating index. Only items with good discriminating power are retained. The
discriminating power of an item can also be estimated by just looking at the U and L data: if more of the
students in the upper group than in the lower group got the item correct, the item appears to have a good
discriminating index. In practice, if the difference between the two observations (U and L) is 0 to 2, the item
has poor discriminating power; a difference of more than 2 shows good discriminating power. By estimating
in this way, the teacher is relieved of the burden of computation. (A short computational sketch is given
after the item-revision note below.)
11. Formulate decision for each item by answering a “yes” or a “no” to these questions. Are the alternatives
plausible? Does the item have average difficulty index? Does the item have good discriminating index? The
“no” answer to one or all of these questions warrants rejection of the item. A “no” answer to the plausibility of
alternatives and “yes” answers to the other two makes it imperative to revise the non-plausible alternative(s).
12. Include good test items in the test bank and indicate the difficulty and discriminating indices for reference
when the items will be used for future testing.
• Item revision. The items that need revision should be revised and noted in the test bank for future try outs
or item analysis.
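As noted in step 10, the computations in steps 9 and 10 can be checked quickly. Below is a minimal sketch in Python using the U and L counts from the sample table above. The decision rule shown combines only the two interpretation scales; the plausibility check of the alternatives (step 7), which led to the "Revise alt." decision for item 3, must still be done separately.

```python
# Minimal sketch of the U-L index method (following the steps above).
# U and L are the numbers of correct responses in the upper and lower 27% groups
# (14 students each for a class of 50); the data reproduce the sample table.
# Plausibility of alternatives (step 7) is judged separately from these indices.

GROUP_SIZE = 14                                   # 27% of 50, rounded up
items = {1: (8, 3), 2: (12, 10), 3: (10, 4), 4: (5, 5)}

for item, (u, l) in items.items():
    difficulty = (u + l) / (2 * GROUP_SIZE)       # (U + L) / T, where T = 28
    discrimination = (u - l) / GROUP_SIZE         # (U - L) / (27% of N)
    average = 0.21 <= difficulty <= 0.80
    discriminating = discrimination >= 0.21
    decision = "Retain" if (average and discriminating) else "Reject or revise"
    print(f"Item {item}: difficulty={difficulty:.2f}, "
          f"discrimination={discrimination:.2f} -> {decision}")
```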
Validity is the most important characteristic of a good test. It refers to the extent to which the test serves
its purpose, or the efficiency with which it measures what it intends to measure.
The validity of a test concerns what the test measures and how well it does so. For example, in order to
judge the validity of a test, it is necessary to consider what behavior the test is supposed to measure.
A test may yield consistent scores, but if it is not useful for its purpose, then it is not valid. For example, a
test for Grade V students given to Grade IV students is not valid.
Validity is classified into four types: content validity, concurrent validity, predictive validity, and construct
validity.
1. Content validity – the extent to which the content of the test is truly representative of the
content of the course. A well-constructed achievement test should cover the objectives of
instruction, not just its subject matter. Three domains of behavior are included: cognitive, affective
and psychomotor.
2. Concurrent validity – the degree to which the test agrees or correlates with a criterion
which is set up as an acceptable measure. The criterion is always available at the time of testing. A
statistical tool is used to interpret and correlate the test results.
For example, a teacher wants to validate an achievement test in Science (X) that he constructed. He
administers this test to his students. The result of this test can be compared with another Science test
(Y) which has been proven valid. If the relationship between X and Y is high, this means that the
achievement test in Science is valid. According to Garrett, a highly reliable test is always a valid measure of
some function.
3. Predictive validity – evaluated by relating the test to some later achievement of the students
which the test is supposed to predict. The criterion measure against which this type of validity is
judged is important because the future outcome of the testee is being predicted; the criterion measures
against which the test scores are validated are obtained only after a long period.
4. Construct validity – is the extent to which the test measures a theoretical trait. Test items must include factors that make up a psychological construct such as intelligence, critical thinking, reading comprehension, or mathematical aptitude.
Several factors affect the validity of a test:
2. Directions – unclear directions reduce validity. Directions that do not clearly indicate how the pupils should answer and record their answers affect the validity of the test items.
3. Reading vocabulary and sentence structure – vocabulary and sentence structures that are too difficult and complicated keep the test from measuring what it intends to measure.
4. Level of difficulty of items – test items that are too difficult or too easy cannot discriminate between bright and slow pupils, and this lowers validity.
5. Poorly constructed test items – test items that provide clues, and items that are ambiguous, confuse the students and will not reveal a true measure.
6. Length of the test – a test should be of sufficient length to measure what it is supposed to measure. A test that is too short cannot adequately sample the performance we want to measure.
8. Patterns of answers – when students can detect a pattern of correct answers, they are liable to guess, and this lowers validity.
D. Reliability
The reliability of an assessment method refers to its consistency. It is also a term that is synonymous with
dependability or stability.
Reliability means consistency and accuracy. It refers to the extent to which a test is dependable, self-consistent and stable. In other words, the test agrees with itself. It is concerned with the consistency of responses from moment to moment: if a person takes the same test twice, the test should yield the same results.
For example, if a student gets a score of 90 in an English achievement test this Monday and gets 30 on the same test given on Friday, then neither score can be relied upon.
Inconsistency of individual scores, however, may be affected by the person scoring the test, by limited sampling of certain areas of the subject matter, and particularly by the examinee himself. If the examinee's mood is unstable, this may affect his score.
2. Difficulty of the test. When a test is too easy or too difficult, it cannot show the differences among
individuals; thus it is unreliable. Ideally, achievement tests should be constructed such that the average
score is 50 percent correct and the scores range from near zero to near perfect.
3. Objectivity. Objectivity eliminates the bias, opinions or judgments of the person who checks the test.
Reliability is greater when test can be scored objectively.
4. Heterogeneity of the student group. Reliability is higher when test scores are spread over a wide range of abilities; a more heterogeneous group produces greater score variability, which yields a higher reliability coefficient.
5. Limited time. A test in which speed is a factor is more reliable than a test administered with a longer or unlimited time.
1. Test-retest method. The same instrument is administered twice to the same group of subjects. The scores from the first and second administrations of the test are correlated using the Spearman rank correlation coefficient (Spearman rho) or the Pearson Product-Moment Correlation Coefficient.
For example, 10 students were used as samples to test the reliability of an achievement test in Biology. After the two administrations of the test, the given data and the computation of Spearman rho are presented below:
Given:
ΣD² = 3.5
N = 10
Solution: ρ = 1 – (6ΣD²) / (N(N² – 1)) = 1 – (6)(3.5) / (10(10² – 1)) = 1 – 21/990 = 1 – 0.0212 = 0.98
The ρ value obtained is 0.98, which indicates a very high relationship; hence, the achievement test in Biology is reliable (refer to the scale below).
Spearman's Rho Correlation Coefficient    Interpretation
0.90 to 1.00                              Very High
0.70 to 0.89                              High
0.50 to 0.69                              Moderate
0.30 to 0.49                              Low
0.16 to 0.29                              Very Low
Below 0.16                                Too low to be meaningful
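As a quick check of the hand computation above, the Spearman rho formula can also be coded directly. This is a minimal Python sketch; it assumes the rank differences have already been obtained, so only ΣD² = 3.5 and N = 10 from the example are needed, and the function name is arbitrary.

```python
def spearman_rho(sum_d_squared: float, n: int) -> float:
    """Spearman rank correlation: rho = 1 - (6 * sum D^2) / (N(N^2 - 1))."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Values taken from the worked example above
rho = spearman_rho(3.5, 10)
print(round(rho, 2))  # 0.98 -> "Very High" on the interpretation scale
```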
2. Equivalent/Parallel Form Method – it establishes the consistency of results across two forms of a test. The teacher constructs two forms of the test, with items that are similar but not identical. The two forms are administered to one group of students, and the results are correlated using the Pearson Product-Moment Correlation.
The Pearson Product-Moment Correlation Coefficient can also be used in the test-retest method of estimating the reliability of a test.
The formula is: rxy = Σxy / √((Σx²)(Σy²))
Using the same data for Spearman rho, the scores for 1st and 2nd administration may be presented in this
way:
Student   Test (X)   Retest (Y)   X²      Y²      XY
1         89         90           7921    8100    8010
2         85         85           7225    7225    7225
3         77         76           5929    5776    5852
4         80         81           6400    6561    6480
5         83         83           6889    6889    6889
6         87         85           7569    7225    7395
7         90         90           8100    8100    8100
8         73         72           5329    5184    5256
9         85         85           7225    7225    7225
10        80         83           6400    6889    6640
ΣX = 829   ΣY = 830   ΣX² = 68,987   ΣY² = 69,174   ΣXY = 69,072
Solution: Substitute the values from the table into the formula.
rxy = Σxy / √((Σx²)(Σy²)) = 69,072 / √((68,987)(69,174)) = 69,072 / 69,080.44 = 0.9999
The coefficient of correlation obtained is interpreted as very high (refer to the interpretation scale given earlier).
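The Pearson Product-Moment Correlation can likewise be computed with a short script. In the sketch below, x and y are treated as deviation scores (X minus the mean of X, Y minus the mean of Y), which is what the lowercase symbols in the formula conventionally denote; the two score lists are hypothetical and are not the data from the table above.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """rxy = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y as deviations from their means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_xx = sum(a * a for a in dev_x)
    sum_yy = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical first and second administrations for five students
test = [85, 78, 92, 70, 88]
retest = [83, 80, 90, 72, 86]
print(round(pearson_r(test, retest), 2))  # approx. 0.99 -> very high
```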
3. Split-half method. The test may be administered once, but the test items are divided into two halves. The most common procedure is to divide the test into odd-numbered and even-numbered items. The two sets of scores are correlated, and the r obtained is the reliability coefficient for a half test.
The split-half method is applicable to measuring instruments that are not highly speeded. If the measuring instrument includes easy items and the subjects are able to answer correctly all or nearly all of the items within the time limit of the test, the scores on the two halves would be about the same and the correlation would be close to +1.00.
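Below is a minimal sketch of the split-half procedure, assuming each student's responses are recorded as 1 (correct) or 0 (wrong) per item; the response matrix is hypothetical. The Spearman-Brown step-up at the end is not described in this module but is commonly used to estimate full-test reliability from the half-test coefficient, so it is included only as an assumption of usual practice.

```python
from statistics import correlation  # Pearson's r; available in Python 3.10+

def split_half_r(responses: list[list[int]]) -> float:
    """Correlate each student's score on odd-numbered items with the score on even-numbered items."""
    odd_scores = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, ...
    even_scores = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    return correlation(odd_scores, even_scores)

def spearman_brown(half_r: float) -> float:
    """Common step-up from the half-test coefficient to an estimate for the whole test."""
    return (2 * half_r) / (1 + half_r)

# Hypothetical 0/1 responses of four students to a six-item quiz
data = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 0],
]
half = split_half_r(data)
print(round(half, 2), round(spearman_brown(half), 2))
```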
4. Kuder-Richardson Formula 20 measures inter-item consistency. Unlike the simpler KR-21 formula, it does not require that all items be of equal difficulty. The analysis is feasible if the test has undergone an item analysis, especially when the number of students who answered each item correctly is available. The teacher administers the test to a group of students once only. The result is analysed by determining the number of students who answered each item correctly. The consistency can be established by comparing the ratio of correct responses for each item with the ratios of the other items. If the ratios are consistent, then the test has internal consistency.
The formula is: KR20 = (K / (K – 1)) (1 – Σpq / σ²)
Where:
p = proportion of students who answered the item correctly
q = proportion of students who answered the item incorrectly (q = 1 – p)
K = the total number of items
σ² = variance of the total scores (square of the SD)
Example: Mr. Marvin administered a 50-item test to 50 of his grade 5 pupils. The scores of his pupils are presented in the table below:
Number of students who took the test = 50
Item No.   Correct   P      Q      pq        Item No.   Correct   P      Q      pq
1 33 0.76 0.24 0.18 26 31 0.62 0.38 0.24
2 39 0.78 0.22 0.17 27 29 0.58 0.42 0.24
3 42 0.84 0.16 0.13 28 27 0.54 0.46 0.25
4 22 0.44 0.56 0.25 29 36 0.72 0.28 0.20
5 27 0.54 0.46 0.25 30 44 0.88 0.12 0.11
6 36 0.72 0.28 0.20 31 43 0.86 0.14 0.12
7 44 0.88 0.12 0.11 32 28 0.56 0.44 0.25
8 46 0.92 0.08 0.07 33 22 0.44 0.56 0.25
9 41 0.82 0.18 0.15 34 25 0.50 0.50 0.25
10 38 0.76 0.24 0.18 35 28 0.56 0.44 0.25
11 33 0.66 0.34 0.22 36 29 0.58 0.42 0.24
12 36 0.72 0.28 0.20 37 30 0.60 0.40 0.24
13 29 0.58 0.42 0.24 38 33 0.66 0.34 0.22
14 27 0.54 0.46 0.25 39 37 0.74 0.26 0.19
Solution: Substitute the given values from the table into the formula.
KR20 = (50 / (50 – 1)) (1 – 9.95 / 40.32) = (1.0204)(0.7532) = 0.77
The result shows a high degree of internal consistency among the items in the test in terms of the proportions of students who gave correct and wrong responses. Hence, the test is reliable.
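The KR-20 computation can also be scripted. This minimal sketch uses only the two summary figures quoted in the solution above (Σpq = 9.95 and σ² = 40.32) together with K = 50; the function name is arbitrary, and in practice Σpq would be accumulated from the pq column of the item-analysis table.

```python
def kr20(k: int, sum_pq: float, variance: float) -> float:
    """KR-20 = (K / (K - 1)) * (1 - sum(pq) / variance)."""
    return (k / (k - 1)) * (1 - sum_pq / variance)

# 50-item test; sum of pq and the variance of total scores are taken from the example
print(round(kr20(50, 9.95, 40.32), 2))  # 0.77 -> high internal consistency
```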
Table 1 presents the scores of 10 students who were tested twice (test-retest) to establish the reliability of the test. Copy and complete the table and answer the questions below in your EDP106 notebook.
3. Based on the interpretation of the calculated rxy, what can you say about the test constructed?
“I believe…”
At this point, we will give you an opportunity to show how effectively you can develop and express ideas.
You should, therefore, take care to develop your point of view, present your ideas logically and clearly, and use
language precisely.
Question: Explain the purpose of item analysis in facilitating classroom instruction using formative testing.
Rubric
Criteria: Very Good (3) / Good (2) / Needs Improvement (1)
Content structure – Very Good (3): Manifests excellence in the use of English grammar as well as a smooth flow of ideas. Good (2): Well written and observes correct grammatical structure. Needs Improvement (1): The essay is structurally poorly written.
Content relevance – Very Good (3): Relevant to the topic and provides a clear explanation of the subject matter. Good (2): Content is close to the topic yet lacks an in-depth explanation. Needs Improvement (1): Content is unsubstantial and incoherent.
Spelling – Very Good (3): All words are correctly spelled. Good (2): 1-3 words are misspelled. Needs Improvement (1): 4 or more words are misspelled.
Neatness in writing – Very Good (3): Written neatly in clear and formal handwriting; no erasures. Good (2): Shows a number of erasures. Needs Improvement (1): Unreadable and messily written.
Output format – Very Good (3): All of the specified formats are correct. Good (2): 1 of the specified formats is incorrect. Needs Improvement (1): 2 or more of the specified formats are incorrect.