Assessment of Learning SLRC PNU

Mr. Angelo Unay
*BEED, PNU-Manila (Cum Laude)
*PGDE-Math & English, NTU-NIE, Singapore
 Diagnose learning strengths and difficulties
 Evaluate appropriate test items for given
objectives
 Use/Interpret measures of central tendency,
variability and standard scores
 Apply basic concepts and principles of
evaluation in classroom instruction, testing
and measurement
Fundamental Questions
 What is assessment?
 Why do teachers assess students' learning?
 What do we assess in students' learning?
 How do we assess learning?
Key Concepts: Test, Measurement, Assessment, Evaluation
Test
 An instrument designed to measure any quality,
ability, skill or knowledge.
 Composed of test items covering the area it is designed to measure.
Testing
 Methods used to measure the level of
performance or achievement of learners.
 Refers to the administration, scoring, and interpretation of procedures designed to obtain information.
Measurement
 A process of quantifying or assigning value to the
individual’s intelligence, personality, attitudes,
values and achievement.
 A process by which traits and behaviours are
differentiated.
Assessment
 A process of collecting and organizing information into an interpretable form.
 It is a prerequisite to evaluation.
Evaluation
 A process of systematic analysis of
both qualitative and quantitative
information in order to make
judgment or decision.
 It involves judgment about the
desirability of changes in students.
Classroom Assessment
 An on-going process of identifying, gathering,
organizing, and interpreting quantitative and
qualitative information about what learners know and
can do. (DepEd Order No. 8, s. 2015)
Its purpose is to provide feedback to students,
evaluate their knowledge and understanding, and
guide the instructional process. (Burke, 2005)
Role of Assessment in the Classroom
Placement
 done before instruction
 determines mastery of prerequisite skills
 not graded
Summative
 done after instruction
 certifies mastery of the intended learning outcomes
 graded
 assessment of learning
Placement assessment:
 determines the extent of what the pupils have achieved or mastered in the objectives of the intended instruction
 determines the students' strengths and weaknesses
 places the students in specific learning groups to facilitate teaching and learning
 serves as basis in planning for relevant instruction
Role of Assessment in Instruction
Formative
 reinforces successful learning
 provides continuous feedback to both students and teachers concerning learning successes and failures
 assessment for learning
Diagnostic
 determines recurring or persistent difficulties
 searches for the underlying causes of problems that do not respond to first-aid treatment
 helps formulate a plan for detailed remedial instruction
 administered during instruction
 designed to formulate a plan for remedial instruction
 modifies the teaching and learning process
 not graded
Identify the type of evaluation procedure each of the following examples uses.
1. College Admission Test
2. Quarterly Test
3. International English Language Testing
System (IELTS)
4. Licensure Examination for Teachers (LET)
5. 5-Item Math Exercise
6. National Achievement Test (NAT)
7. Alternative Learning System (ALS)
Qualifying Examination
8. Asking a question orally during recitation
DESCRIPTION: The objective paper-and-pen test, which usually assesses low-level thinking skills
EXAMPLES: Standardized tests; teacher-made tests
ADVANTAGES: Scoring is objective; administration is easy because students can take the test at the same time; there is only one best answer for any question asked
DISADVANTAGES: Preparation of the instrument is time-consuming; prone to cheating
Which is an advantage of teacher-made tests over standardized tests?
Teacher-made tests are:
a. highly reliable
b. better adapted to the needs of the
pupils
c. more objectively scored
d. highly valid
Multiple Intelligences (Gardner, 1992)
 Logical
 Visual/Spatial
 Interpersonal
 Intrapersonal
 Naturalist
 Musical
 Kinesthetic
DESCRIPTION: Students create an original response to answer a certain question
EXAMPLES: Essays; oral presentations; exhibitions; demos; observations; self-assessments
ADVANTAGES: Preparation of the instrument is relatively easy; measures original responses
DISADVANTAGES: Scoring tends to be subjective without rubrics; administration is time-consuming
Authentic Assessment
 Meaningful performance tasks
 Positive interaction between assessor and assessee
 Clear standards and criteria for excellence
 Quality products and performances
 Learning that transfers
 Emphasis on meta-cognition and self-evaluation
DESCRIPTION: Requires actual demonstration of skills or creation of products of learning; students perform real-world tasks that demonstrate knowledge and skills
EXAMPLES: Practical tests; oral and aural tests; projects
ADVANTAGES: Preparation of the instrument is relatively easy; measures behaviours that cannot be deceived
DISADVANTAGES: Scoring tends to be subjective without rubrics; administration is time-consuming
DESCRIPTION: A process of gathering multiple indicators of student progress to support course goals in a dynamic, ongoing, and collaborative process
EXAMPLES: Working portfolios; show portfolios; documentary portfolios
ADVANTAGES: Measures students' growth and development; intelligence-fair
DISADVANTAGES: Development is time-consuming; rating tends to be subjective without rubrics
Traditional vs Portfolio
 Measures ability at one time vs. measures ability over time
 Done by the teacher alone vs. done by teacher and students
 Conducted outside instruction vs. embedded in instruction
 Assigns the student a grade vs. involves the student in assessment
 Does not give the student responsibility vs. the student learns to take responsibility
“No one assessment tool by
itself is capable of producing
the quality information
needed to make an accurate
judgment.”
A union of insufficiencies: various methods of assessment are combined so that the strengths of one offset the limitations of another. (Shulman, 1998)
Which is the least authentic mode of
assessment?
a. Paper-and-pencil test in vocabulary
b. Oral performance to assess students’
spoken communication skills
c. Experiments in science to assess skill
in the use of scientific methods
d. Artistic production for music or art
subject
1. Clarity of Learning Targets
 Clear and appropriate learning targets
include (1) what students know and can do
and (2) the criteria for judging student
performance.

2. Appropriateness
 The method of assessment used matches the learning targets.
3. Validity
 A test measures what it is supposed to measure.

4. Reliability
 consistency of measurement
 stability when the same measures are
given across time.
5. Fairness
 Fair assessment is unbiased and provides
students with opportunities to demonstrate what
they have learned.

6. Positive Consequences
 The overall quality of assessment is enhanced
when it has a positive effect on student
motivation and study habits. For the teachers,
high-quality assessments lead to better
information and decision-making about students.
7. Scorability
 The test is easy to score, and the directions for scoring are clearly stated.

8. Administrability
 Assessment is given uniformly so that the scores obtained will not be affected by factors other than the student's knowledge and skills.
9. Adequacy
 The test contains a wide sampling of items.
10. Practicality and efficiency
 Assessments should consider the
teacher’s familiarity with the method, the
time required, the complexity of
administration, the ease of scoring and
interpretation, and cost.
Instructional Objectives
 Tools needed to accomplish what you want to achieve
 Give direction to the instructional process
 Provide a basis for assessing performance
 Convey instructional intent to stakeholders
Goals vs Objectives
 Goals are broad; objectives are specific.
 Goals are intangible/abstract; objectives are tangible/concrete.
 Goals sit outside instruction; objectives are embedded in instruction.
 Goals are long-term; objectives are short-term.
 Audience
 Observable behavior
 Special conditions
 Criterion level
Example: After a 50-minute period, the pupils will be able to multiply 2- to 3-digit numbers mentally with 75% accuracy.
Domains of Educational Objectives (EOs): Cognitive, Affective, Psychomotor
(Anderson/Krathwohl, 2001)
REMEMBERING
recall information and retrieve relevant
knowledge from long-term memory
state, tell, underline, locate, match, list,
define, recall, name
(Anderson/Krathwohl, 2001)

UNDERSTANDING
construct meaning from oral, written and
graphic messages or materials
 explain, report, express, illustrate,
differentiate, represent, draw
(Anderson/Krathwohl, 2001)
APPLYING
use information to undertake a procedure in
familiar situations or in new ways
application of rules, methods, concepts,
principles, laws, and theories
use, develop, apply, show, practice
(Anderson/Krathwohl, 2001)
ANALYSING
distinguish between parts and determine
how they relate to one another, and to the
overall structure and purpose
compare, contrast, dissect, inspect, classify,
separate
(Anderson/Krathwohl, 2001)

EVALUATING
Make judgements based on criteria and
standards through checking and critiquing
appraise, evaluate, judge, justify, rate, rank
(Anderson/Krathwohl, 2001)
CREATING
put elements together to form a functional whole or create a new product or perspective
compose, construct, write, plan, produce,
formulate
Yes or No: Justify if the objectives
match the test items.
Objective: Discriminate fact from opinion
from Pres. Rodrigo Duterte’s
inauguration speech.
Test Item: From the speech of Pres. Duterte,
give five examples of facts and five examples
of opinions.
Yes or No: Justify if the objectives
match the test items.
Objective: Recall the names and capitals
of all different provinces of Regions I
&II.
Test Item: List the names and capitals of two
provinces in Region I and three provinces in
Region II.
Yes or No: Justify if the objectives
match the test items.
Objective: Circle the nouns and pronouns
from the given list of words.
Test Item: Give five examples of pronouns and
five examples of verbs.
(Krathwohl, 1964)

RECEIVING
Willingness to listen or to attend to a
particular phenomenon
Acknowledge, ask, choose, follow, listen,
reply, watch
(Krathwohl, 1964)

RESPONDING
Refers to active participation on the part of
the student.
Answer, assist, contribute, cooperate, follow-
up, react
(Krathwohl, 1964)

VALUING
Ability to see worth or value in a subject, and
activity, or willingness to be involved
Adopt, commit, desire, display, explain,
initiate, justify, share
(Krathwohl, 1964)

ORGANIZATION
Bringing together held values, resolving
conflicts between them, and beginning to
build an internally consistent value system or
willingness to be an advocate.
Adapt, categorize, establish, integrate
(Krathwohl, 1964)

VALUE CHARACTERIZATION
Values have been internalized and have controlled one's behaviour for a sufficiently long period of time, producing a change in one's behavior or lifestyle
Advocate, behave, defend, encourage
(Dave, 1975)

IMITATION
Observing and patterning behaviour after
someone else.
carry out, assemble, practice, follow, repeat,
sketch, move
Eg. Following a dance step in a video
(Dave, 1975)

MANIPULATION
Being able to perform certain actions by
following instructions and practicing
 acquire, complete, conduct, improve,
perform, produce
Playing a guitar
(Dave, 1975)

PRECISION
Refining, and becoming more exact where
few errors are apparent
Achieve, accomplish, excel, master,
succeed, surpass
Shooting a ball with high accuracy
(Dave, 1975)

ARTICULATION
Coordinating a series of actions, and
achieving harmony and internal consistency
Adapt, change, excel, reorganize, rearrange,
revise
Dancing tinikling
(Dave, 1975)

NATURALIZATION
Having high level performance becomes
natural, without needing to think about it
Arrange, combine, compose, construct,
create, design
Playing a piano like Beethoven
With SMART lesson objectives in the
synthesis in mind, which one does
NOT belong to the group?
a. Formulate
b. Judge
c. Organize
d. Build
Which test item is in the highest level of
Bloom’s taxonomy of objectives?

a. Explain how a tree functions in relation to the ecosystem.
b. Explain how trees receive nutrients.
c. Rate three different methods of
controlling tree growth.
d. List the parts of a tree.
Which behavioral term describes a
lesson outcome in the highest level
of Bloom’s taxonomy?
a. Analyze
b. Create
c. Infer
d. Evaluate
POINT OF COMPARISON: Purpose
Psychological Tests
 Aim to measure students' intelligence or mental ability, largely without reference to what the student has learned
 Measure intangible characteristics of an individual (e.g., aptitude tests, personality tests, intelligence tests)
Educational Tests
 Aim to measure the results of instruction and learning (e.g., achievement tests, performance tests)
POINT OF COMPARISON: Scope of Content
Survey Tests
 Cover a broad range of objectives
 Measure general achievement in certain subjects
Mastery Tests
 Cover a specific objective
 Measure fundamental skills and abilities
POINT OF COMPARISON: Language Mode
Verbal Tests
 Words are used by students in attaching meaning to or in responding to test items
Non-Verbal Tests
 Students do not use words in attaching meaning to or in responding to test items (e.g., graphs, numbers, 3-D objects)
POINT OF COMPARISON: Construction
Standardized Tests
 Constructed by a professional item writer
 Cover a broad range of content in a subject area
 Use mainly multiple choice
 Items written are screened, and the best items are chosen for the final instrument
Informal Tests
 Constructed by a classroom teacher
 Cover a narrow range of content
 Various types of items are used
 Teacher picks or writes items as needed for the test
POINT OF COMPARISON: Scoring and Interpretation
Standardized Tests
 Can be scored by machine
 Interpretation of results is usually norm-referenced
Informal Tests
 Scored manually by the teacher
 Interpretation is usually criterion-referenced
POINT OF COMPARISON: Manner of Administration
Individual Tests
 Mostly given orally or require actual demonstration of skill
 One-on-one situations, thus many opportunities for clinical observation
 Chance to follow up the examinee's response in order to clarify or comprehend it more clearly
Group Tests
 Paper-and-pen tests
 Loss of rapport, insight, and knowledge about each examinee
 The same amount of time needed for one student yields information from many students
POINT OF COMPARISON: Effect of Biases
Objective Tests
 Scorer's personal judgment does not affect the scoring
 Worded so that only one answer is acceptable
 Little or no disagreement on what is the correct answer
Subjective Tests
 Affected by the scorer's personal opinions, biases, and judgments
 Several answers are possible
 Disagreement on what is the correct answer is possible
POINT OF COMPARISON: Time Limit and Level of Difficulty
Power Tests
 Consist of a series of items arranged in ascending order of difficulty
 Measure students' ability to answer more and more difficult items
Speed Tests
 Consist of items approximately equal in difficulty
 Measure students' speed or rate and accuracy in responding
POINT OF COMPARISON: Format
Selective Tests
 Multiple choice, true or false, matching type
 There are choices for the answer
 Can be answered quickly
 Prone to guessing
 Time-consuming to construct
Supply Tests
 Short answer, completion, restricted or extended essay
 There are no choices for the answer
 May require a longer time to answer
 Less chance of guessing but prone to bluffing
 Time-consuming to answer and score
POINT OF COMPARISON: Nature of Assessment
Maximum Performance
 Determines what individuals can do when performing at their best
 e.g., aptitude tests, achievement tests
Typical Performance
 Determines what individuals will do under natural conditions
 e.g., attitude, interest, and personality inventories; observation techniques; peer appraisal
POINT OF COMPARISON: Interpretation
Norm-Referenced Tests
 Result is interpreted by comparing one student's performance with other students' performance
 Some will really pass
 Constructed by trained professionals
Criterion-Referenced Tests
 Result is interpreted by comparing a student's performance against a predefined standard/criterion
 All or none may pass
 Typically constructed by the teacher
POINT OF COMPARISON: Interpretation (continued)
Norm-Referenced Tests
 There is competition for a limited percentage of high scores
 Typically cover a large domain of learning tasks
 Emphasize discrimination among individuals in terms of level of learning
Criterion-Referenced Tests
 There is no competition for a limited percentage of high scores
 Typically focus on a delimited domain of learning
 Emphasize description of what learning tasks individuals can and cannot perform
POINT OF COMPARISON: Interpretation (continued)
Norm-Referenced Tests
 Favor items of average difficulty and typically omit very easy and very hard items
 Interpretation requires a clearly defined group
Criterion-Referenced Tests
 Match item difficulty to learning tasks, without altering item difficulty or omitting easy or hard items
 Interpretation requires a clearly defined and delimited achievement domain
Similarities Between NRTs and CRTs
1. Both require specification of the achievement domain to be measured.
2. Both require a relevant and representative sample of test items.
3. Both use the same types of test items.
4. Both use the same rules for item writing (except for item difficulty).
5. Both are judged by the same qualities of goodness (validity and reliability).
6. Both are useful in educational assessment.
Question:
A test consists of a graph showing the
relationship between age and population.
Following it is a series of true-false items
based on the graph. Which type of test does
this illustrate?
a. Laboratory exercise
b. Problem solving
c. Performance
d. Interpretive
Steps in Developing Assessment Tools
1. Examine the instructional objectives (IOs).
2. Make the Table of Specifications (TOS).
3. Construct the test items.
4. Assemble the test items.
5. Check the items.
6. Write the directions.
7. Make the answer key.
8. Improve the items.
1. Go back to the instructional objectives of the topics previously taught.
2. Use your test specifications as a guide to item writing.
3. Write more test items than needed.
4. Write the test items well in advance of the testing date.
5. Write each test item so that the task to be performed is clearly defined.
6. Write each test item at the appropriate reading level.
7. Write each test item so that it does not provide help in answering other items in the test.
8. Write each test item so that the answer is one that would be agreed upon by experts.
9. Write test items at the proper level of difficulty.
10. Whenever a test is revised, recheck its relevance.
Question:
What should a teacher do before
constructing items for a particular test?

 a. Prepare the table of specifications.
b. Review the previous lessons.
c. Determine the length of time for
answering it.
d. Announce to students the scope of
the test.
Selective Type – provides choices for the answer
a. Multiple Choice – consists of a stem which
describes the problem and 3 or more alternatives which
give the suggested solutions. The incorrect alternatives
are the distractors.
b. True-False or Alternative Response –
consists of declarative statement that one has to mark
true or false, right or wrong, correct or incorrect, yes or
no, fact or opinion, and the like.
c. Matching Type – consists of two parallel columns:
Column A, the column of premises from which a match is
sought; Column B, the column of responses from which
the selection is made.
Multiple Choice
Advantages
 Measures learning outcomes from knowledge to evaluation
level.
 Scoring is highly objective, easy and reliable.
 Measures broad samples of content within a short time.
 Item analysis can reveal the difficulty of an item and can discriminate between good and poor students.
Disadvantages
 Time consuming to construct.
 Scores can be influenced by the reading ability of students.
 Not applicable when assessing the students’ ability to
organize and express ideas.
Alternate Response

Advantages
 Covers a lot of content in a short span of time.
 It is easy to score.
Disadvantages
 Limited only to low level thinking such as knowledge and
comprehension.
 High probability of guessing compared to other selective
type of tests.
Matching Type
Advantages
 Simpler to construct than an MCQ test.
 Reduces the effect of guessing compared to other selective
type of tests.
 More content can be covered in the given set of test.

Disadvantages
 It only measures simple recall or memorization of
information.
 Difficult to construct due to problems in selecting the
descriptions and options.
 Assesses only low level of cognitive domain (knowledge
and comprehension).
Supply Test
a. Short Answer – uses a direct question that can be
answered by a word, phrase, a number, or a symbol
b. Completion Test – it consists of an incomplete
statement

Essay Test
a. Restricted Response – limits the content of the
response by restricting the scope of the topic
b. Extended Response – allows the students to select
any factual information that they think is pertinent, to
organize their answers in accordance with their best
judgment
Completion or Short Answer

Advantages
 Covers a broad range of topic in a short time.
 It is easier to prepare and less time consuming compared
to MCQ and Matching Type.
 It assesses recall of information, rather than recognition.
Disadvantages
 It is only appropriate for questions that can be answered
with short responses.
 Scoring is tedious and time consuming.
 It is not adaptable in measuring complex learning
outcomes.
Essay Test

Advantages
 Easiest to prepare and less time consuming.
 It measures HOTS.
 It allows students’ freedom to express individuality.
 Reduces guessing answer compared to any objective test.
 It presents more realistic tasks to students.
Disadvantages
 Scoring is time consuming.
 The scores are not reliable without scoring criteria.
 It measures limited amount of contents and objectives.
 It usually encourages bluffing.
Question:
Which assessment tool will be most
authentic?

a. Short answer test
b. Alternate-response test
c. Essay test
d. Portfolio
Question:
Which does NOT belong to the
group?

a. Short Answer
b. Completion
c. Multiple Choice
d. Restricted-response essay
Supply Type
Short Answer
1. The item should require a single word answer or
brief and definite statement.
2. Be sure to omit keywords.
3. Avoid leaving blanks at the beginning or within a statement.
4. Use direct question rather than an incomplete
statement.
5. Indicate the units in which to be expressed when
the statement requires it.
6. Avoid lifting textbook sentences.
Supply Type
Essay
1. Use essay to measure complex learning
outcomes only.
2. Formulate questions that present a clear task.
3. Require the students to answer the same
question.
4. Number of points and time spent in answering
the question must be indicated.
5. Specify the number of words, paragraphs or the
number of sentences.
6. Scoring system must be discussed.
Selective Type
Alternative-Response
1. Avoid broad statements.
2. Avoid trivial statements.
3. Avoid the use of negative statements
especially double negatives.
4. Avoid specific determiner.
5. Avoid long and complex sentences.
6. Avoid including two ideas in one sentence
unless cause and effect relationship is being
measured.
Selective Type
Alternative-Response
7. If opinion is used, attribute it to some source unless the ability to identify opinion is being specifically measured.
8. True statements and false statements should be approximately equal in length.
9. The number of true statements and false statements should be approximately equal.
10. Start with a false statement, since it is a common observation that the first statement in this type is always positive.
Selective Type
Matching Type
1. The descriptions and options must be short and
homogeneous.
2. Descriptions are written at the left side and
options at the right side.
3. Include an unequal number of responses and
premises, and instruct the pupils that response
may be used once, more than once, or not at
all.
4. Keep the list of items to be matched brief, and
place the shorter responses at the right.
Selective Type
Matching Type
5. Matching directions should specify the basis for matching.
6. Arrange the list of responses in logical order.
7. Indicate in the directions the basis for matching the responses and premises.
8. Place all the items for one matching exercise on the same page.
9. Use a minimum of three items and a maximum of seven items for elementary, and a maximum of 17 for secondary and tertiary.
Selective Type
Multiple Choice
1. The stem of the item should be meaningful
by itself and should present a definite
problem.
2. The stem should include as much of the item as possible and should be free of irrelevant information.
3. State the stem in positive form.
4. Use a negatively stated item stem only when
a significant learning outcome requires it.
Selective Type
Multiple Choice
4. Highlight negative words in the stem for
emphasis.
5. All the alternatives should be grammatically
consistent with the stem of the item.
6. An item should have only one correct or clearly best answer.
7. Items used to measure understanding
should contain novelty, but beware of too
much.
Selective Type
Multiple Choice
8. All distractors should be plausible.
9. Verbal association between the stem and the
correct answer should be avoided.
10. The relative length of the alternatives should not
provide a clue to the answer.
11. The alternatives should be arranged logically.
12. The correct answer should appear in each of the alternative positions approximately an equal number of times, but in random order.
Selective Type
Multiple Choice
13. Use three to five options.
14. Use of special alternatives (e.g. None of the
above; all of the above) should be done
sparingly.
15. Do not use multiple choice items when other
types are more appropriate.
16. Always have the stem and alternatives on the
same page.
17. Break any of these rules when you have a good
reason for doing so.
Question:
In preparing a multiple-choice test,
how many options would be ideal?

a. Five
b. Three
c. Any
d. Four
Essay Type
1. Restrict the use of essay questions to
those learning outcomes that cannot be
satisfactorily measured by objective items.
2. Formulate questions that will bring forth the
behavior specified in the learning outcome.
3. Phrase each question so that the pupils’
task is clearly defined.
4. Indicate an approximate time limit for each
question.
5. Avoid the use of optional questions.
PERFORMANCE & AUTHENTIC ASSESSMENTS
When to Use
 Specific behaviors are to be observed
 Possibility of judging the appropriateness of students' actions
 A process or outcome cannot be directly measured by a paper-and-pencil test
PERFORMANCE & AUTHENTIC ASSESSMENTS
Advantages
 Allow evaluation of complex skills which are difficult to assess using written tests
 Positive effect on instruction and learning
 Can be used to evaluate both the process and the product
PERFORMANCE & AUTHENTIC ASSESSMENTS
Limitations
 Time-consuming to administer, develop, and score
 Subjectivity in scoring
 Inconsistencies in performance on alternative skills
PORTFOLIO ASSESSMENT

CHARACTERISTICS:
1) Adaptable to individualized instructional goals
2) Focus on assessment of products
3) Identify students’ strengths rather than
weaknesses
4) Actively involve students in the evaluation
process
5) Communicate student achievement to others
6) Time-consuming
7) Need of a scoring plan to increase reliability
RUBRICS – scoring guides, consisting of specific
pre-established performance criteria, used in
evaluating student work on performance
assessments

Types:
1) Holistic Rubric – requires the teacher to score
the overall process or product as a whole,
without judging the component parts separately
2) Analytic Rubric – requires the teacher to score
individual components of the product or
performance first, then sums the individual
scores to obtain a total score
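The difference between the two rubric types can be sketched in code. A minimal example, assuming a hypothetical essay rubric with made-up criteria and point values (none of these names come from the handout):

```python
# Analytic rubric: score each component separately, then sum the parts.
# Holistic rubric (shown for contrast): one overall judgment of the whole.
# Criteria and point values below are hypothetical.

def analytic_total(ratings):
    """Sum the per-criterion ratings into one total score."""
    return sum(ratings.values())

# Hypothetical analytic rubric for an essay, 4 points per criterion.
ratings = {"content": 4, "organization": 3, "mechanics": 2}
print(analytic_total(ratings))   # 9 out of a possible 12

# Holistic scoring is a single overall rating instead:
holistic_score = 3               # e.g., one judgment on a 1-4 scale
```

The analytic form trades extra scoring effort for diagnostic detail: the per-criterion breakdown shows a student where points were lost.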
1. Closed-Item or Forced-Choice Instruments – ask for one specific answer
a. Checklist – measures students' preferences, hobbies, attitudes, feelings, beliefs, interests, etc. by marking a set of possible responses
b. Scales – instruments that indicate the extent or degree of one's response
1) Rating Scale – measures the degree or extent of one's attitudes, feelings, and perceptions about ideas, objects, and people by marking a point along a 3- or 5-point scale
2) Semantic Differential Scale – measures the degree of one's attitudes, feelings, and perceptions about ideas, objects, and people by marking a point along a 5-, 7-, or 11-point scale of semantic adjectives

Ex:
Math is
easy __ __ __ __ __ __ __ difficult
important __ __ __ __ __ __ __ trivial
useful __ __ __ __ __ __ __ useless
c. Alternative Response – measures students' preferences, hobbies, attitudes, feelings, beliefs, interests, etc. by choosing between two possible responses
Ex:
T F 1. Reading is the best way of spending leisure time.

d. Ranking – measures students' preferences or priorities by ranking a set of responses
Ex: Rank the following subjects according to their importance.
___ Science ____ Social Studies
___ Math ____ Arts
___ English
3) Likert Scale – measures the degree of one's agreement or disagreement with positive or negative statements about objects and people
Ex:
Use the scale below to rate how much you agree or
disagree about the following statements.
5 – Strongly Agree
4 – Agree
3 – Undecided
2 – Disagree
1 – Strongly Disagree

1. Science is interesting.
2. Doing science experiments is a waste of time.
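Scoring such a scale can be sketched in code. A minimal example, assuming 5-point items where negatively worded statements (like item 2 above) are reverse-scored so that a high total always means a favorable attitude; the responses are hypothetical:

```python
def score_likert(responses, negative_items, points=5):
    """responses: {item_no: rating 1..points}; reverse-score negative items."""
    total = 0
    for item, rating in responses.items():
        if item in negative_items:
            rating = points + 1 - rating   # 5 -> 1, 4 -> 2, 3 -> 3, ...
        total += rating
    return total

# Hypothetical answers to the two sample items: Strongly Agree to both.
responses = {1: 5, 2: 5}
# Item 2 ("...waste of time") is negative, so agreeing with it lowers the total.
print(score_likert(responses, negative_items={2}))   # 5 + 1 = 6
```

Without reverse-scoring, a student who strongly agrees with every statement, favorable or not, would get an inflated attitude score.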
2. Open-Ended Instruments – open to more than one answer
Sentence Completion – measures students' preferences over a variety of attitudes by allowing students to answer by completing an unfinished statement, which may vary in length
Surveys – measure the values held by an individual by writing one or many responses to a given question
Essays – allow the students to reveal and clarify their preferences, hobbies, attitudes, feelings, beliefs, and interests by writing their reactions or opinions to a given question
Question:
To evaluate teaching skills, which is
the most authentic tool?

a. Observation
b. Non-restricted essay test
c. Short answer test
d. Essay test
ITEM ANALYSIS
 A process of examining the students' responses to each item in a test
 It helps identify good and defective test items
 Provides a basis for the general improvement of the class
STEPS:
1. Score the test. Arrange from lowest to
highest.
2. Get the top 27% (T27) and below 27% (B27)
of the examinees.
3. Get the proportion of the Top and Below who
got each item correct. (PT) & (PB)
4. Compute the Difficulty Index.
5. Compute the Discrimination Index.

- It refers to the proportion of the number of
students in the upper and lower groups who
answered an item correctly.
- Use the formula:

Df = n / N

where
Df = difficulty index;
n = number of students selecting the item
correctly in the upper group and in the lower group;
N = total number of students who answered
the test.
Index Range Difficulty Level
0.00 – 0.20 Very Difficult
0.21 – 0.40 Difficult
0.41 – 0.60 Average/Moderately
Difficult
0.61 – 0.80 Easy
0.81 – 1.00 Very Easy
- It is the power of an item to discriminate the
students between those who scored high and
those who scored low in the test.
- It also refers to the number of students in the
upper group who got an item correctly minus the
number of students in the lower group who got
an item correctly.
- It is the basis of the validity of an item.
1. Positive discrimination
- more students in the upper group got the item
correctly than those in the lower group

2. Negative discrimination
- more students in the lower group got the item
correctly than those in the upper group

3. Zero discrimination
- the number of students in the upper and lower
group who answer the test correctly are equal
Index Range Discrimination Level
0.19 and below Poor: reject
0.20 – 0.29 Moderate: revise
0.30 – 0.39 Good: accept
0.40 – above Very Good: accept
- Use the formula:

Di = (CU − CL) / D

where
Di = discrimination index value;
CU = number of students selecting the correct
answer in the upper group;
CL = number of students selecting the correct
answer in the lower group; and
D = number of students in either the upper or
lower group.
Checking an item's options (Yes / No):
1. Does the key discriminate positively?
2. Do the incorrect options discriminate negatively?

If 1 and 2 are both YES – accept the item
If 1 and 2 are a mix of YES and NO – revise the item
If 1 and 2 are both NO – reject the item
Example:

Question A B C D Df
1 0 3 24* 3 0.80
2 12* 13 3 2 0.40

# of students: 30

*To compute the Df:


Divide the number of students who chose the
correct answer by the total number of students.
Example:
Student Score (%) Q1 Q2 Q3
Joe 90 1 0 1
Dave 90 1 0 1
Sujie 80 0 0 1
Darrell 80 1 0 1
Eliza 70 1 0 1
Zoe 60 1 0 0
Grace 60 1 0 1
Hannah 50 1 1 0
Ricky 50 1 1 0
Anita 40 0 1 0
* “1” – correct; “0” – incorrect
Example:

Question PT PB Df Ds
1 4 4 0.80 0
2 0 3 0.30 - 0.6
3 5 1 0.60 0.8
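The Df and Ds values in the table above can be checked with a short Python sketch. The `item_analysis` helper and the 0/1 response lists are hypothetical names mirroring the 10-student example (top five and bottom five students serve as the upper and lower groups).

```python
# 1 = correct, 0 = incorrect; students are already ranked by score.
# Upper group: Joe, Dave, Sujie, Darrell, Eliza; lower: Zoe, Grace, Hannah, Ricky, Anita.
upper = {"Q1": [1, 1, 0, 1, 1], "Q2": [0, 0, 0, 0, 0], "Q3": [1, 1, 1, 1, 1]}
lower = {"Q1": [1, 1, 1, 1, 0], "Q2": [0, 0, 1, 1, 1], "Q3": [0, 1, 0, 0, 0]}

def item_analysis(upper, lower):
    results = {}
    for q in upper:
        pt, pb = sum(upper[q]), sum(lower[q])             # PT and PB
        df = (pt + pb) / (len(upper[q]) + len(lower[q]))  # difficulty index Df = n / N
        ds = (pt - pb) / len(upper[q])                    # discrimination index Di = (CU - CL) / D
        results[q] = (df, ds)
    return results

results = item_analysis(upper, lower)
# Q1: Df = 0.80, Ds = 0.0  -> very easy, zero discrimination
# Q2: Df = 0.30, Ds = -0.6 -> difficult, negative discrimination (reject)
# Q3: Df = 0.60, Ds = 0.8  -> average difficulty, very good discrimination
```

These values answer the review questions: Q1 is the easiest, Q2 the most difficult and the poorest discriminator, so Q2 is the item to eliminate.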
1. Which question was the easiest?
2. Which question was the most difficult?
3. Which item has the poorest discrimination?
4. Which question would you eliminate (if any)?
Why?
Question:
A negative discrimination index means that:

a. More from the lower group answered


the test items correctly.
b. The items could not discriminate
between the lower and upper group.
c. More from the upper group answered
the test item correctly.
d. Less from the lower group got the test
item correctly.
Question:
A test item has a difficulty index of 0.89
and a discrimination index of 0.44. What
should the teacher do?
a. Reject the item.

b. Retain the item.

c. Make it a bonus item.

d. Make it a bonus item and reject it.


VALIDITY - is the degree to which a test measures
what is intended to be measured. It is the
usefulness of the test for a given purpose. It is the
most important criterion of a good examination.

FACTORS influencing the validity of tests in general


 Appropriateness of test
 Directions
 Reading Vocabulary and Sentence Structure
 Difficulty of Items
 Construction of Items
 Length of Test
 Arrangement of Items
 Patterns of Answers
Face Validity – is done by examining the
physical appearance of the test

Content Validity – is done through a careful


and critical examination of the objectives of the
test so that it reflects the curricular objectives
Criterion-related validity – is established statistically
such that a set of scores revealed by a test is correlated
with scores obtained in another external predictor or
measure.

Has two purposes:


a. Concurrent Validity – describes the present status of
the individual by correlating the sets of scores obtained
from two measures given concurrently

b. Predictive Validity – describes the future performance


of an individual by correlating the sets of scores obtained
from two measures given at a longer time interval
Construct Validity – is established statistically by
comparing psychological traits or factors that influence
scores in a test, e.g. verbal, numerical, spatial, etc.

a. Convergent Validity – is established if the instrument
correlates with another measure of a similar trait
related to what it intends to measure (e.g. a Critical
Thinking Test may be correlated with a Creative
Thinking Test)

b. Divergent Validity – is established if an instrument can


describe only the intended trait and not other traits (e.g.
Critical Thinking Test may not be correlated with
Reading Comprehension Test)
RELIABILITY - it refers to the consistency of
scores obtained by the same person when retested
using the same instrument or one that is parallel to
it.

FACTORS affecting Reliability


Length of the test
Difficulty of the test
Objectivity
Administrability
Scorability
Economy
Adequacy
Method – Type of Reliability Measure – Procedure – Statistical Measure

Test-Retest – measure of stability. Give a test twice to the same group
with any time interval between sets, from several minutes to several
years. (Pearson r)

Equivalent Forms – measure of equivalence. Give parallel forms of the
test at the same time. (Pearson r)

Test-Retest with Equivalent Forms – measure of stability and
equivalence. Give parallel forms of the test with increased time
intervals between forms. (Pearson r)

Split-Half – measure of internal consistency. Give a test once; score
equivalent halves of the test (e.g. odd- and even-numbered items).
(Pearson r and Spearman-Brown Formula)

Kuder-Richardson – measure of internal consistency. Give the test once,
then correlate the proportion/percentage of the students passing and
not passing a given item. (Kuder-Richardson Formula 20 & 21)
Question:
Setting up criteria for scoring essay
tests is meant to increase their:

a. Objectivity
b. Reliability
c. Validity
d. Usability
Question:
The same test is administered to different
groups at different places at different
times. This process is done in testing
the:

a. Objectivity
b. Validity
c. Reliability
d. Comprehensiveness
Leniency error: Faculty tends to judge work as better than it really is.
Generosity error: Faculty tends to use high end of scale only.
Severity error: Faculty tends to use low end of scale only.
Central tendency error:
Faculty avoids both extremes of the scale.
Bias:
Letting other factors influence score (e.g., handwriting,
typos)
Halo effect:
Letting general impression of student influence rating of
specific criteria (e.g., student’s prior work)
Contamination effect:
Judgment is influenced by irrelevant knowledge about the
student or other factors that have no bearing on
performance level (e.g., student appearance)
Similar-to-me effect:
Judging more favorably those students whom faculty see
as similar to themselves (e.g., expressing similar interests
or point of view)
First-impression effect:
Judgment is based on early opinions rather than on a
complete picture (e.g., opening paragraph)
Contrast effect:
Judging by comparing student against other students
instead of established criteria and standards
Rater drift:
Unintentionally redefining criteria and standards over time
or across a series of scorings (e.g., getting tired and
cranky and therefore more severe, getting tired and
reading more quickly/leniently to get the job done)
NOMINAL

ORDINAL

INTERVAL

RATIO
ASSUMPTIONS / WHEN USED and the APPROPRIATE STATISTICAL TOOLS

Measures of Central Tendency – describe the representative value of a set of data
Measures of Variability – describe the degree of spread or dispersion of a set of data

When the frequency distribution is regular or symmetrical (normal);
usually used when data are numeric (interval or ratio):
Central Tendency: Mean – the arithmetic average
Variability: Standard Deviation – the root-mean-square of the deviations from the mean

When the frequency distribution is irregular or skewed;
usually when the data are ordinal:
Central Tendency: Median – the middle score in a group of scores that are ranked
Variability: Quartile Deviation – the average deviation of the 1st and 3rd quartiles from the median

When the distribution of scores is normal and a quick answer is needed;
usually used when the data are nominal:
Central Tendency: Mode – the most frequent score
Variability: Range – the difference between the highest and the lowest score in the distribution
Find the mean, median, and
mode.
Out of 10-item quiz, 10 students got
these scores:
3, 8, 9, 2, 5, 6, 4, 4, 7, 10
Find the range, quartile
deviation, mean deviation,
standard deviation.

Out of 10-item quiz, 10 students got


these scores:
3, 8, 9, 2, 5, 6, 4, 4, 7, 10
2, 3, 4, 4, 5, 6, 7, 8, 9, 10
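One way to check answers to the two exercises above with Python's `statistics` module. Two assumptions to note: the standard deviation is computed as a population SD, and the quartiles use the "median of each half" convention (other quartile conventions give slightly different QD values).

```python
from statistics import mean, median, mode, pstdev

scores = [2, 3, 4, 4, 5, 6, 7, 8, 9, 10]  # the sorted quiz scores above

m = mean(scores)                 # 5.8
md = median(scores)              # 5.5 (average of the two middle scores)
mo = mode(scores)                # 4 (most frequent score)
rng = max(scores) - min(scores)  # 8

mean_dev = mean(abs(x - m) for x in scores)  # mean (average) deviation
sd = pstdev(scores)                          # population standard deviation

# Quartile deviation, using the "median of each half" convention:
lower_half, upper_half = scores[:5], scores[5:]
q1, q3 = median(lower_half), median(upper_half)  # 4 and 8
qd = (q3 - q1) / 2                               # 2.0
```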
Question:
Teacher B is researching on a family
income distribution which is quite
symmetrical. Which measure/s of
central tendency will be most
informative and appropriate?
a. Mode
b. Mean
c. Median
d. Mean and median
Question:
What measure/s of central tendency does
the number 16 represent in the following
score distribution?
14, 15, 17, 16, 19, 20, 16, 14, 16
a. Mode only
b. Median only
c. Mode and median
d. Mean and mode
INTERPRETING MEASURES OF VARIABILITY
STANDARD DEVIATION (SD)
 The result will help you determine if the group is
homogeneous or not.
 The result will also help you determine the number of
students that fall below and above the average performance.

Main points to remember:

Points above Mean + 1SD = range of above average
Mean + 1SD and Mean − 1SD = limits of average ability
Points below Mean − 1SD = range of below average

Example:
A class of 25 students was given a 75-item test. The
mean score of the class is 61. The SD is 6.
Lisa, a student in the class, got a score of 63.
Describe the performance of Lisa.

X̄ = 61   SD = 6   Lisa's score X = 63

X̄ + SD = 61 + 6 = 67
X̄ − SD = 61 – 6 = 55

All scores between 55-67 are average.


All scores above 67 or 68 and above are above average.
All scores below 55 or 54 and below are below average.

Therefore, Lisa’s score of 63 is average.


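The mean ± 1SD rule above can be sketched as a small helper (`describe` is a hypothetical function name): scores inside the one-SD band are average, scores above it above average, scores below it below average.

```python
def describe(score, center, spread):
    """Classify a score against center ± 1 spread (e.g. mean ± 1 SD)."""
    if score > center + spread:
        return "above average"
    if score < center - spread:
        return "below average"
    return "average"

# Lisa's case: mean = 61, SD = 6, score = 63 -> inside the 55-67 band
print(describe(63, 61, 6))  # average
```

The same helper works for the quartile-deviation rule further below, passing the median and QD instead of the mean and SD.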
Question:
Zero standard deviation means that:

a. The students’ scores are the same.


b. 50% of the scores obtained is zero.
c. More than 50% of the scores
obtained is zero.
d. Less than 50% of the scores
obtained is zero.
Question:
Nellie’s score is within X̄ ± 1 SD. To which
of the following groups does she belong?
a. Below Average

b. Average

c. Needs Improvement

d. Above Average
Question:
The score distribution of Set A and Set B have
equal mean but with different SDs. Set A has an
SD of 1.7 while Set B has an SD of 3.2. Which
statement is TRUE of the score distributions?

a. The scores of Set B has less variability than


the scores in Set A.
b. Scores in Set A are more widely scattered.
c. Majority of the scores in Set A are clustered
around the mean.
d. Majority of the scores in Set B are clustered
around the mean.
INTERPRETING MEASURES OF VARIABILITY
QUARTILE DEVIATION (QD)
• The result will help you determine if the group is
homogeneous or not.
• The result will also help you determine the number of
students that fall below and above the average performance.

Main points to remember:

Points above Median + 1QD = range of above average
Median + 1QD and Median − 1QD = limits of average ability
Points below Median − 1QD = range of below average


Example:
A class of 30 students was given a 50-item test. The
median score of the class is 29. The QD is 3. Miguel,
a student in the class, got a score of 33. Describe the
performance of Miguel.
X̃ = 29   QD = 3   Miguel's score X = 33

X̃ + QD = 29 + 3 = 32
X̃ − QD = 29 – 3 = 26
All scores between 26-32 are average.
All scores above 32 or 33 and above are above average.
All scores below 26 or 25 and below are below average.

Therefore, Miguel’s score of 33 is above average.


Correlation
Extent to which the distributions are linearly related
or associated between two variables.
Types of Correlation
Positive
Types of Correlation
Negative
Types of Correlation
Zero
INTERPRETATION of Correlation Value
1 ----------- Perfect Positive Correlation
high positive correlation
0.5 ----------- Positive Correlation
low positive correlation
0 ----------- Zero Correlation
low negative correlation
-0.5 ----------- Negative Correlation
high negative correlation
-1 ----------- Perfect Negative Correlation

.81 – 1.0 = very high correlation
.61 – .80 = high correlation
.41 – .60 = moderate correlation
.21 – .40 = low correlation
0 – .20 = negligible correlation

For Validity: computed r should be at least 0.75 to be significant.
For Reliability: computed r should be at least 0.85 to be significant.
Question:
The computed r for scores in Math and
Science is 0.92. What does this mean?

a. Math score is positively related to


Science score.
b. Science score is slightly related to Math
score.
c. Math score is not in any way related to
Science score.
d. The higher the Math score, the lower
the Science score.
STANDARD SCORES

• Indicate the pupil’s relative position by showing how far his raw score is above or below average
• Express the pupil’s performance in terms of standard units from the mean
• Represented by the normal probability curve or what is commonly called the normal curve
• Used to have a common unit to compare raw scores from different tests
Corresponding Standard Scores and Percentiles
in a Normal Distribution

Z-Scores -3 -2 -1 0 +1 +2 +3

T-Scores 20 30 40 50 60 70 80

Percentiles 1 2 16 50 84 98 99.9
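The z-score, T-score, and percentile correspondences in the table can be checked against the standard normal curve: T = 50 + 10z, and the percentile is the area below z. The table's percentile entries are rounded (e.g. the exact area below z = −3 is about 0.1%).

```python
from statistics import NormalDist

# Print the z / T / percentile correspondence for whole-number z-scores.
for z in (-3, -2, -1, 0, 1, 2, 3):
    t = 50 + 10 * z                    # T-score
    pct = NormalDist().cdf(z) * 100    # percentage of cases below z
    print(f"z = {z:+d}   T = {t}   percentile = {pct:.1f}")
```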
TYPES OF DISTRIBUTION

Normal Distribution – symmetrical bell curve

Rectangular Distribution – scores occur with about equal frequency across the range

Unimodal Distribution – one peak
Bimodal Distribution – two peaks
Multimodal / Polymodal Distribution – more than two peaks

Positively Skewed Distribution – skewed to the right; most of the scores are low

Negatively Skewed Distribution – skewed to the left; most of the scores are high
KURTOSIS
Leptokurtic
distributions are tall and
peaked. Because the scores
are clustered around the
mean, the standard deviation
will be smaller.
Mesokurtic
distributions are the ideal
example of the normal
distribution, somewhere
between the leptokurtic and
platykurtic.
Platykurtic
distributions are broad
and flat.
Question:
Which statement applies when score
distribution is negatively skewed?

a. The scores are evenly distributed


from the left to the right.
b. Most pupils are underachievers.
c. Most of the scores are high.
d. Most of the scores are low.
Question:
If the scores of your test follow a
positively skewed score distribution,
what should you do? Find out _______.

a. why your items are easy


b. why most of the scores are high
c. why some pupils scored low
d. why most of the scores are low
PERCENTILE

tells the percentage of examinees that lies below


one’s score

Example:
Jose’s score in the LET is 70 and his percentile
rank is 85.

P85 = 70 (This means Jose, who scored 70,


performed better than 85% of all the examinees )
Z-Score
tells the number of standard deviations equivalent
to a given raw score
Formula:
Z = (X − X̄) / SD

where
X – individual's raw score
X̄ – mean of the normative group
SD – standard deviation of the normative group
Example:
Jenny got a score of 75 in a 100-item test. The mean
score of the class is 65 and SD is 5.

Z = 75 – 65
5
=2 (Jenny is 2 standard deviations above the mean)
Example:
Mean of a group in a test: X̄ = 26, SD = 2

Peter's score: X = 27          John's score: X = 25

Z = (27 − 26) / 2 = 0.5        Z = (25 − 26) / 2 = −0.5
T-Score
refers to any set of normally distributed standard deviation
score that has a mean of 50 and a standard deviation of 10

computed after converting raw scores to z-scores to get rid


of negative values

Formula:
T-score = 50 + 10(Z)
Example:
Peter’s T-score = 50 + 10(0.5)
= 50 + 5
= 55
John’s T-score = 50 + 10(-0.5)
= 50 – 5
= 45
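The z- and T-score formulas can be checked with a short sketch (hypothetical helper names), using the mean = 26, SD = 2 example above:

```python
def z_score(x, mean, sd):
    """Standard deviations of x above (+) or below (-) the mean."""
    return (x - mean) / sd

def t_score(z):
    """T = 50 + 10z, so negative z-scores become positive T-scores."""
    return 50 + 10 * z

peter_z = z_score(27, 26, 2)  # 0.5
john_z = z_score(25, 26, 2)   # -0.5
print(t_score(peter_z), t_score(john_z))  # 55.0 45.0
```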
ASSIGNING GRADES / MARKS / RATINGS

Marking or Grading is the process of assigning value to a


performance

Marks / Grades / Rating SYMBOLS:

Could be in:
1. percent such as 70%, 88% or 92%
2. letters such as A, B, C, D or F
3. numbers such as 1.0, 1.5, 2.75, 5
4. descriptive expressions such as Outstanding
(O), Very Satisfactory (VS), Satisfactory (S),
Moderately Satisfactory (MS), Needs Improvement
(NI)
ASSIGNING GRADES / MARKS / RATINGS

Could represent:
1. how a student is performing in relation
to other students (norm-referenced
grading)
2. the extent to which a student has
mastered a particular body of knowledge
(criterion-referenced grading)
3. how a student is performing in relation
to a teacher’s judgment of his or her
potential
ASSIGNING GRADES / MARKS / RATINGS

Could be for:

Certification that gives assurance that a student has


mastered a specific content or achieved a certain level
of accomplishment
Selection that provides basis in identifying or grouping
students for certain educational paths or programs
Direction that provides information for diagnosis and
planning
Motivation that emphasizes specific material or skills to
be learned and helping students to understand and
improve their performance
ASSIGNING GRADES / MARKS / RATINGS

Could be assigned by using:

Criterion-Referenced Grading – or grading based on


fixed or absolute standards where grade is assigned
based on how well a student has met the criteria or
well-defined objectives of a course that were spelled
out in advance.
or she wants to receive regardless of how other students
in the class have performed. This is done by transmuting
test scores into marks or ratings.
ASSIGNING GRADES / MARKS / RATINGS

Norm-Referenced Grading – or grading based on


relative standards where a student’s grade reflects his
or her level of achievement relative to the performance
of other students in the class. In this system, the grade
is assigned based on the average of test scores.
Point or Percentage Grading System whereby the
teacher identifies points or percentages for various tests
and class activities depending on their importance. The total
of these points will be the bases for the grade assigned to
the student.
Contract Grading System where each student agrees to
work for a particular grade according to agreed-upon
standards.
Question:
Marking on a normative basis means that
__________.

a. the normal curve of distribution should


be followed
b. The symbols used in grading indicate
how a student achieved relative to other
students
c. Some get high marks
d. Some are expected to fail
Guidelines in Grading the Pupils
 explain your grading system at the
start of the school year
 base the grades on a
predetermined and reasonable set
of standards
 base your grades on the student’s
attitude as well as achievement,
especially in elementary level
Guidelines in Grading the Pupils
 base grades on the student’s
relative standing compared to his
classmates
 base grades on variety of sources
 guard against bias in grading
 keep pupils informed of their
standing in the class
K to 12 Grading System
 uses a standards and competency-
based grading system
K to 12 Grading System
 the minimum grade to pass a learning
area is 60, which is transmuted to 75 in
the report card.
K to 12 Grading System
 the lowest mark that can appear on the
report card is 60 for quarterly grades
and final grades.
K to 12 Grading System
 the components of the grades are
written work, performance tasks, and
quarterly test
