Module 7 8 Assessment in Learning I
Tagudin Campus
MODULE 7
TITLE: Performance-Based Tests
WHAT IS THE MODULE ALL ABOUT
This module covers performance-based tests, rubrics and exemplars, and the
process of creating sample rubrics based on learning assessments.
Many people have noted serious limitations of performance-based tests: their vulnerability
to subjectivity in scoring, and the difficulty of creating or providing a real or close-to-real task
environment for assessment purposes. However, the concerns about subjectivity may be addressed
by automating the test. The second issue is a bigger problem, and there is no guarantee
that ideas from one domain will apply to another.
Performance-Based Tests
There are many testing procedures that are classified as performance tests with a generally
agreed upon definition that these tests are assessment procedures that require students to
perform a certain task or activity or perhaps, solve complex problems.
For example, Bryant suggested assessing portfolios of a student's work over time,
students' demonstrations, hands on execution of experiments by students, and a student's
work in simulated environments. Such an approach falls under the category of portfolio
assessment (i.e. keeping records of all tasks successfully and skillfully performed by a student).
According to Mehrens, performance testing is not new. In fact, various types of performance-based
tests were used even before the introduction of multiple-choice testing. For instance, the following
are considered performance testing procedures: performance tasks, rubrics scoring guides and
exemplars of performance.
Performance Tasks
In performance tasks, students are required to draw on the knowledge and skills they possess
and to reflect upon them for use in the particular task at hand. Not only are the students expected
to obtain knowledge from a specific subject or subject matter but they are in fact required to draw
knowledge and skills from other disciplines in order to fully realize the key ideas needed in doing
the task. Normally, the tasks require students to work on projects that yield a definite output or
product, or perhaps, following a process which tests their approach to solving a problem. In many
instances, the tasks require a combination of the two approaches. Of course, the essential idea in
performance tasks is that students or pupils learn optimally by actually doing the task (Learning
by Doing), which is a constructivist philosophy.
As in any other test, the tasks need to be consistent with the intended outcomes of the
curriculum and the objectives of instruction; and must require students to manifest (a) what they
know and (b) the process by which they came to know it. In addition, performance-based tests
require that tasks involve examining the processes as well as the products of student learning.
There are many reasons for the seeming popularity of rubrics scoring in the Philippine school
system.
First, they are very useful tools for both teaching and evaluation of learning outcomes.
Rubrics have the potential to improve student performance, as well as monitor it, by clarifying
teachers' expectations and by showing students how to meet these expectations.
Secondly, rubrics seem to allow students to acquire wisdom in judging and evaluating
the quality of their own work in relation to the quality of the work of other students. In
several experiments involving the use of rubrics, students progressively became more aware of
the problems associated with their solution to a problem and with the problems inherent in the
solutions of other students. In other words, rubrics increase the students' sense of responsibility
and accountability.
Third, rubrics are quite efficient and tend to require less time for the teachers in
evaluating student performance. Teachers tend to find that by the time a piece has been self-
and peer-assessed according to a rubric, they have little left to say about it. When they do have
something to say, they can often simply circle an item in the rubric, rather than struggling to
explain the flaw or strength they have noticed and figuring out what to suggest in terms of
improvements. Rubrics provide students with more informative feedback about their strengths and
areas in need of improvement.
Finally, it is easy to understand and construct a rubrics scoring guide. Most of the items
found in the rubrics scoring guide are self-explanatory and require no further help from outside
experts.
Levels: 3 = Most acceptable, 2 = Acceptable, 1 = Less acceptable, 0 = Not acceptable

Purposes
3 - The report explains the key purposes of the invention and points out less obvious ones as well.
2 - The report explains all of the key purposes of the invention.
1 - The report explains some of the purposes of the invention but misses key purposes.
0 - The report does not refer to the purposes of the invention.

Features
3 - The report details both key and hidden features of the invention and explains how they serve
several purposes.
2 - The report details the key features of the invention and explains the purposes they serve.
1 - The report neglects some features of the invention or the purposes they serve.
0 - The report does not detail the features of the invention or the purposes they serve.

Critique
3 - The report discusses the strengths and weaknesses of the invention, and suggests ways in which
it can be improved.
2 - The report discusses the strengths and weaknesses of the invention.
1 - The report discusses either the strengths or weaknesses of the invention but not both.
0 - The report does not mention the strengths or the weaknesses of the invention.

Connections
3 - The report makes appropriate connections between the purposes and features of the invention
and many different kinds of phenomena.
2 - The report makes appropriate connections between the purposes and features of the invention
and one or two phenomena.
1 - The report makes unclear or inappropriate connections between the invention and other
phenomena.
0 - The report makes no connections between the invention and other things.
Figure 14. Prototype of Rubric Scoring
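The rubric in Figure 14 can also be treated as a small data structure. The sketch below is only an illustration: the function name score_report and the idea of summing the four levels into one total are assumptions made here, not something the module prescribes.

```python
# Hypothetical encoding of the four criteria and the 0-3 levels in Figure 14.
CRITERIA = ("Purposes", "Features", "Critique", "Connections")
LEVEL_LABELS = {
    3: "Most acceptable",
    2: "Acceptable",
    1: "Less acceptable",
    0: "Not acceptable",
}


def score_report(ratings):
    """Return (total, maximum, labels) for one rated report.

    `ratings` maps each criterion name to the level (0-3) chosen by the rater.
    Summing the levels is an assumption of this sketch.
    """
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    for criterion in CRITERIA:
        if ratings[criterion] not in LEVEL_LABELS:
            raise ValueError(f"{criterion}: level must be 0, 1, 2 or 3")
    total = sum(ratings[c] for c in CRITERIA)
    labels = {c: LEVEL_LABELS[ratings[c]] for c in CRITERIA}
    return total, 3 * len(CRITERIA), labels


# Example ratings for one invention report (made-up values).
sample = {"Purposes": 3, "Features": 2, "Critique": 1, "Connections": 2}
total, maximum, labels = score_report(sample)
print(f"{total}/{maximum}", labels)
```

Keeping the per-criterion labels alongside the total preserves the feedback value of the rubric, since a single number alone would hide which criterion needs improvement.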
Creating Rubrics
In designing a rubric scoring guide, the students need to be actively involved in the process.
The following steps are suggested in actually creating a rubric:
1. Survey models - Show students examples of good and not-so-good work. Identify the
characteristics that make the good ones good and the bad ones bad.
2. Define criteria - From the discussions on the models, identify the qualities that define good
work.
3. Agree on the levels of quality - Describe the best and worst levels of quality, then fill in the
middle levels based on your knowledge of common problems and the discussion of not-so-
good work.
4. Practise on models - Using the agreed criteria and levels of quality, evaluate the models
presented in step 1 together with the students.
5. Use self- and peer-assessment - Give students their task. As they work, stop them
occasionally for self- and peer-assessment.
6. Revise - Always give students time to revise their work based on the feedback they get in
Step 5.
7. Use teacher assessment - Use the same rubric the students used to assess their work.
Desired Characteristics of Criteria for Classroom Rubrics
The criteria are:
Appropriate
Definable
Observable
Distinct from one another - Each criterion identifies a separate aspect of the learning outcomes
the performance is intended to assess.
Complete - All the criteria together describe the whole of the learning outcomes the performance
is intended to assess.
Able to support descriptions - Each criterion can be described over a range of performance along
a continuum of quality levels.
Tips in Designing Rubrics
Perhaps the most difficult challenge is to use clear, precise and concise language. Terms
like "creative", "innovative" and other vague terms need to be avoided. If a rubric is to teach
as well as evaluate, terms like these must be defined for students. Instead of these words,
try words that can convey ideas and which can be readily observed. Patricia Crosby and
Pamela Heinz, both seventh grade teachers (from Andrade, 2007), solved the same
problem in a rubric for oral presentations by actually listing ways in which students could
meet the criterion (fig. 19). This approach provides valuable information to students on how
to begin a talk and avoids the need to define elusive terms like creative.
Rubrics are scales that differentiate levels of student performance. They contain the criteria
that must be met by the student and the judgment process that will be used to rate how well
the student has performed. An exemplar is an example that delineates the desired
characteristics of quality in ways students can understand. These are important parts of the
assessment process.
In summary, to design problem-based tests, we have to ensure that both processes and
end-results are tested. The tests should be designed carefully enough that proper scoring
rubrics can be designed, so that the concerns about subjectivity in performance-based tests
are addressed. Indeed, this needs to be done anyway in order to automate the test, so that
performance-based testing can be used widely.
We have seen that in order to automate a performance based test, we need to identify a set
of tasks which all lead to the solution of a fairly complex problem. For the testing software to
be able to determine whether a student has completed any particular task, the end of the
task should be accompanied by a definite change in the system. The testing software can
track this change in the system, to determine whether the student has completed the task.
Indeed, a similar condition applies to every aspect of the problem solving activity that we
wish to test. In this case, a set of changes in the system can indicate that the student has
the desired competency.
Such tracking is used widely by computer game manufacturers, where the evidence of a
game player's competency is tracked by the system, and the game player is taken to the
next level of the game.
A user need not always end up accomplishing the task; hence it is important to identify
important milestones the test taker reaches while solving the problem. Having defined the
possible strategies, the process and milestones, the selection of tasks that comprise a test
should allow the design of good rubrics for scoring. Every aspect of the problem-solving
activity that we wish to test has to lead to a set of changes in the system, so that the testing
software can collect evidence of the student's competency.
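The idea of tracking changes in the system can be sketched in a few lines of code. Everything named below (Milestone, MilestoneTracker, the file-and-report task) is hypothetical and chosen only for illustration; a real testing system would define its own tasks and state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Milestone:
    name: str
    reached: Callable[[State], bool]  # True once the state shows the change


@dataclass
class MilestoneTracker:
    milestones: List[Milestone]
    achieved: List[str] = field(default_factory=list)

    def observe(self, state: State) -> None:
        """Record every milestone whose change is now visible in the state."""
        for m in self.milestones:
            if m.name not in self.achieved and m.reached(state):
                self.achieved.append(m.name)

    def score(self) -> float:
        """Fraction of milestones reached, so partial progress earns credit."""
        return len(self.achieved) / len(self.milestones)


# Hypothetical task: create a file, write a report in it, then submit it.
tracker = MilestoneTracker([
    Milestone("file_created", lambda s: s.get("file_exists") is True),
    Milestone("report_written", lambda s: len(str(s.get("contents", ""))) > 0),
    Milestone("report_submitted", lambda s: s.get("submitted") is True),
])

# Two snapshots of the system taken while the test taker works.
tracker.observe({"file_exists": True})
tracker.observe({"file_exists": True, "contents": "draft"})
print(tracker.achieved, round(tracker.score(), 2))
```

In this sketch the test taker never submits the report, yet two of the three milestones are still recorded, which mirrors the point that a user need not finish the whole task for evidence of competency to be collected.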
MODULE 8
TITLE: Grading System
WHAT IS THE MODULE ALL ABOUT
This module covers item analysis as a concept and as a process, including the item
analysis tools such as validity, reliability, etc. This lesson explains how to identify the range or
index of difficulty and discrimination. It also discusses the benefits of item analysis.
DepEd guidelines
INTRODUCTION
how the student is progressing in a course (and, incidentally, how a teacher is also
performing with respect to the teaching process). The first step in assessment is, of
course, testing (either by some pencil-paper objective test or by some performance based
testing procedure) followed by a decision to grade the performance of the student.
Grading, therefore, is the next step after testing. Over the years, grading systems have
evolved in different school systems all over the world. In the American
system, for instance, grades are expressed in terms of letters, A, B, B+, B-, C, C-, D, or what
is referred to as a seven-point system. In Philippine colleges and universities, the letters
are replaced with numerical values: 1, 1.25, 1.50, 1.75, 2.0, 2.5, 3.0 and 4.0, or an eight-point
system. In basic education, grades are expressed as percentages (of accomplishment)
such as 80% or 75%. With the implementation of the K to 12 Basic Education curriculum,
however, students' performance is expressed in terms of level of proficiency. Whatever
the system of grading adopted, it is clear that there appears to be a need to convert raw
score values into the corresponding standard grading system. This Chapter is concerned
with the underlying philosophy and mechanics of converting raw score values into
standard grading formats.
Course Code: Educ 105
Descriptive Title: Assessment of Learning I Instructor: Mr. Jhunrey Calibuso
ILOCOS SUR POLYTECHNIC STATE COLLEGE
Example: Consider the following two sets of scores in an English 1 class for two
sections of ten students each:
B = {60, 65, 70, 75, 80, 85, 90, 90, 95, 100}
In the first class, the student who got a raw score of 75 would get a grade of 80% while
in the second class, the same grade of 80% would correspond to a raw score of 90. Indeed,
if the test used for the two classes is the same, it would be a rather "unfair" system of
grading. A wise student would opt to enroll in class A since it is easier to get higher grades
in that class than in the other class (class B).
The previous example illustrates one difficulty with using a norm-referenced grading
system. This problem is called the problem of equivalency. Does a grade of 80 in one class
represent the same achievement level as a grade of 80 in another class of the same
subject? This problem is similar to the problem of trying to compare a Valedictorian from
some remote rural high school with a Valedictorian from some very popular University in
the urban area. Does one expect the same level of competence for these two
valedictorians?
the students would pass or fail a given course. For this reason, many opponents to
norm-referenced grading aver that such a grading system does not advance the cause of
education and contradicts the principle of individual differences.
In norm-referenced grading, the students, while they may work individually, are actually
in competition to achieve a standard of performance that will classify them into the desired
grade range. It essentially promotes competition among students or pupils in the same
class. A student or pupil who happens to enroll in a class of gifted students in Mathematics
will find that the norm-referenced grading system is rather worrisome. For example, a
teacher may establish a grading policy whereby the top 15 percent of students will receive
a mark of excellent or outstanding, which in a class of 100 enrolled students will be 15
students.
1.0 (Excellent) = Top 15% of class
1.50 (Good)
2.0 (Average, Fair)
3.0 (Poor, Pass)
5.0 (Failure)
Next 45% of class
Example: In a class of 100 students, the mean score in a test is 70 with a standard
deviation of 5. Construct a norm-referenced grading table that would have seven grade
scales and such that students scoring between plus or minus one standard deviation from
the mean receive an average grade.
Solution: The following table maps intervals of raw scores to grade equivalents:

Raw Score    Grade Equivalent    Percentage
Below 55     Fail                1%
55-60        Marginal Pass       4%
61-65        Pass                11%
66-75        Average             68%
76-80        Above Average       11%
81-85        Very Good           4%
Above 85     Excellent           1%
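The band boundaries in this example follow directly from the mean (70) and the standard deviation (5): each band is one standard deviation wide, except the Average band, which spans minus one to plus one standard deviation. The sketch below, whose function names are assumptions made for illustration, reproduces the raw-score intervals and shows how the share of a band could be estimated if the scores were assumed to be normally distributed. Under that strict assumption the middle band holds about 68% of the class, and the outer bands come out near 13.6%, 2.1% and 0.1%, so the percentages in the table are best read as rounded approximations.

```python
from statistics import NormalDist

MEAN, SD = 70, 5  # values given in the example

# Upper limits of the middle bands, expressed in standard deviations from the mean.
UPPER_LIMITS = [
    (-2, "Marginal Pass"),   # 55-60
    (-1, "Pass"),            # 61-65
    (1, "Average"),          # 66-75
    (2, "Above Average"),    # 76-80
    (3, "Very Good"),        # 81-85
]


def grade(raw_score):
    """Map a raw score to its grade equivalent in the seven-band table."""
    if raw_score < MEAN - 3 * SD:
        return "Fail"
    for k, label in UPPER_LIMITS:
        if raw_score <= MEAN + k * SD:
            return label
    return "Excellent"


def normal_share(lower_z, upper_z):
    """Share of a normal distribution lying between two z-scores."""
    std = NormalDist()
    return std.cdf(upper_z) - std.cdf(lower_z)


print(grade(72), "|", grade(58), "|", grade(86))
print(round(100 * normal_share(-1, 1), 1), "percent within one SD of the mean")
```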
Only a few of the teachers who use norm-referenced grading apply it with complete
consistency. When a teacher is faced with a particularly bright class, most of the time, he
does not penalize good students for having the bad luck to enroll in a class with a cohort of
other very capable students even if the grading system says he should fail a certain
percentage of the class. On the other hand, it is also unlikely that a teacher would reduce
the mean grade for a class when he observes a large proportion of poor performing
students just to save them from failure. A serious problem with norm-referenced grading is
that, no matter what the class level of knowledge and ability, and no matter how much they
learn, a predictable proportion of students will receive each grade. Since its essential
purpose is to sort students into categories based on relative performance, norm-referenced
grading and evaluation is often used to weed out students for limited places in selective
educational programs.
1.0 (Excellent) = 98-100 or 85-100
1.5 (Good) = 88-97 or 80-84
2.0 (Fair) = 75-87 or 70-79
3.0 (Poor/Pass) = 65-74 or 60-69
5.0 (Failure) = below 65 or below 60
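A fixed scale of this kind amounts to a simple lookup: a student's percentage is compared against preset cut-offs, with no reference to how classmates performed. The sketch below encodes the two alternative scales listed above; the names SCALE_A, SCALE_B and criterion_grade are assumptions made for illustration.

```python
# Lower cut-off for each grade, highest first, taken from the two scales above.
SCALE_A = [
    (98, "1.0 (Excellent)"),
    (88, "1.5 (Good)"),
    (75, "2.0 (Fair)"),
    (65, "3.0 (Poor/Pass)"),
]
SCALE_B = [
    (85, "1.0 (Excellent)"),
    (80, "1.5 (Good)"),
    (70, "2.0 (Fair)"),
    (60, "3.0 (Poor/Pass)"),
]


def criterion_grade(percent, scale):
    """Return the grade whose cut-off the percentage meets or exceeds."""
    for cutoff, label in scale:
        if percent >= cutoff:
            return label
    return "5.0 (Failure)"


# The same performance earns different marks under the two sets of criteria.
print(criterion_grade(86, SCALE_A), "|", criterion_grade(86, SCALE_B))
```

Because the cut-offs are fixed in advance, every student who reaches a given level receives the same grade, however many classmates reach it too.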
Criterion-referenced systems are often used in situations where the teachers agree
on the meaning of a "standard of performance" in a subject but the quality of the students is
unknown or uneven; where the work involves student collaboration or teamwork; and
where there is no external driving factor such as needing to systematically reduce a pool
can help a fellow student in a group work without necessarily worrying about lowering
his grade in that course. This is because the criterion-referenced grading system does not
require the mean (of the class) as basis for distributing grades among the students. It is
therefore an ideal system to use in collaborative group work. When students are evaluated
based on predefined criteria, they are freed to collaborate with one another and with the
instructor. With criterion-referenced grading, a rich learning environment is to everyone's
advantage, so students are rewarded for finding ways to help each other, and for
contributing to class and small group discussions.
Marilla D. Svinicki (2007) of the Center for Teaching Effectiveness of the University of
Texas at Austin poses four intriguing questions relative to grading. We present these
questions here in this section and the corresponding opinion of Ms. Svinicki for your own
reflection:
a single mark?
The grading system an instructor selects reflects his or her educational philosophy.
There are no right or wrong systems, only systems which accomplish different objectives.
The following are questions which an instructor may want to answer when choosing what
will go into a student's grade.
This is often referred to as the controversy between norm referenced versus criterion-
referenced grading. In norm-referenced grading systems the letter grade a student receives
is based on his or her standing in the class. A certain percentage of those at the top receive A's, a
specified percent of the next highest grades receive B's and so on. Thus an outside person,
looking at the grades, can decide which student in that group performed best under those
circumstances. Such a system also takes into account circumstances beyond the students'
control which might adversely affect grades, such as poor teaching, bad tests or
unexpected problems arising for the entire class. Presumably, these would affect all the
students equally, so all performance would drop but the relative standing would stay the
same.
On the other hand, under such a system, an outside evaluator has little additional
information about what a student actually knows since that will vary with the class. A
student who has learned an average amount in a class of geniuses will probably know
more than a student who is average in a class of low ability. Unless the instructor provides
more information than just the grade, the external user of the grade is poorly informed.
The system also assumes sufficient variability among student performances that the
difference in learning between them justifies giving different grades. This may be true in
large beginning classes, but is a shaky assumption where the student
The other most common grading system is the criterion referenced system. In this case
the instructor sets a standard of performance against which the students' actual
performance is measured. All students achieving a given level receive the grade assigned
to that level regardless of how many in the class receive the same grade. An outside
evaluator, looking at the grade, knows only that the student has reached a certain level or
set of objectives. The usefulness of that information to the outsider will depend on how
much information he or she is given on what behavior is represented by that grade. The
grade, however, will always mean the same thing and will not vary from class to class. A
possible problem with this is that outside factors such as those discussed under norm-
referenced grading might influence the entire class and performance may drop. In such a
case all the students would receive lower grades unless the instructor made special
allowances for the circumstances.
An advantage of this system is that the criteria for various grades are known from the
beginning. This allows the student to take some responsibility for the level at which he or
she is going to perform. Although this might result in some students working below their
potential, it usually inspires students to work for a high grade. The instructor is then faced
with the dilemma of a lot of students receiving high grades. Some people view this as a
problem.
A positive aspect of this foreknowledge is that much of the uncertainty which often
accompanies grading for students is eliminated. Since they can plot their own progress
toward the desired grade, the students have little uncertainty about where they stand.
course, primarily because the need to motivate students to get their work done is a real
problem for instructors. Also it may be appropriate to the selection function of grading that
such values as timeliness and diligence be reflected in the grades. External users of the
grades may be interpreting the mark to include such factors as attitude and compliance in
addition to competence in the material.
The primary problem with such inclusion is that it makes grades even more ambiguous
than they already are. It is very difficult to assess these nebulous traits accurately or
consistently. Instructors must use real caution when incorporating such value judgments
into final grade assignment. Two steps instructors should take are (1) to make students
aware of this possibility well in advance of grade assignment and (2) to make clear what
behavior is included in such qualities as prompt completion of work and neatness or
completeness.
There are many problems with "growth" measures as a basis for change, most of them
being related to statistical artifacts. In some cases the ability to accurately measure
entering and exiting levels is shaky enough to argue against change as a basis for grading.
Also many courses are prerequisite to later courses and, therefore, are intended to provide
the foundation for those courses. "Growth" scores in this case would be disastrous.
work and effort and to acknowledge the existence of different abilities. Unfortunately,
there is no easy answer to this question. Each instructor must review his or her own
philosophy and content to determine if such factors are valid components of the grade.
How can several grades on diverse skills combine to give a single mark?
The basic answer is that they can't really. The results of instruction are so varied that
the single mark is really a "Rube Goldberg" as far as indicating what a student has
achieved. It would be most desirable to be able to give multiple marks, one for each of the
variety of skills which are learned. There are, of course, many problems with such a
proposal. It would complicate an already complicated task. There might not be enough
evidence to reliably grade any one skill. The "halo" effect of good performance in one area
could spill over into others. And finally, most outsiders are looking for only one overall
classification of each person so that they can choose the "best." Our system requires that
we produce one mark. Therefore, it is worth our while to see how that can be done even
though currently the system does not lend itself to any satisfactory answers.
the scoring system of a specific standardized test, refer to the policies of the test's
producers.
In the Philippines, there are two types of grading systems used: the averaging and the
cumulative grading systems. In the averaging system, the grade of a student in a particular
grading period equals the average of the grades obtained in the prior grading periods and
the current grading period. In the cumulative grading system, the grade of a student in a
grading period equals his current grading period grade, which is assumed to have the
cumulative effects of the previous grading periods. In which grading system would there be
more fluctuations observed in the students' grades? How do these systems relate to
either norm- or criterion-referenced grading?
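The two systems can be compared side by side with a short sketch. It assumes a simple unweighted mean for the averaging system, which the text does not specify, and the four quarterly grades are made-up values.

```python
from statistics import mean


def averaging_grades(period_grades):
    """Reported grade per period = mean of that period and all earlier ones.

    An unweighted mean is an assumption of this sketch.
    """
    return [round(mean(period_grades[: i + 1]), 2)
            for i in range(len(period_grades))]


def cumulative_grades(period_grades):
    """Reported grade per period = that period's own grade, unchanged."""
    return list(period_grades)


quarters = [80, 90, 70, 88]  # hypothetical grades earned in four grading periods
print("averaging :", averaging_grades(quarters))
print("cumulative:", cumulative_grades(quarters))
```

Comparing how much each printed list moves from one period to the next is one way to explore the question on fluctuations posed above.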
8.7. Policy Guidelines on Classroom Assessment for the K to 12 Basic Education, DepEd
Order No. 8, s. 2015
Below are some of the highlights of the new K to 12 Grading System which was
implemented starting SY 2015-2016. These are all lifted from DepEd Order No. 8, s. 2015.
Weights of the Components for the Different Grade Levels and Subjects

Grades 1 to 10

Components              Languages, AP, EsP    Science, Math    MAPEH, EPP/TLE
Written Work            30%                   40%              20%
Performance Tasks       50%                   40%              60%
Quarterly Assessment    20%                   20%              20%

Table 5 presents the weights of the components for the Senior High School subjects
which are grouped into 1) core subjects, 2) all other subjects (applied and specialization)
and work immersion of the academic track, and 3) all other subjects (applied and
specialization) and work immersion / research / exhibit / performance. An analysis of the
figures reveals that among the components, performance tasks have the highest percentage
contribution to the grade. This means that DepEd's grading system consistently puts most
emphasis on application of learned concepts and skills.

Senior High School

Subject Group                                  Written Work    Performance Tasks    Quarterly Assessment
Core Subjects                                  25%             50%                  25%
Academic Track: All other subjects             25%             45%                  30%
Academic Track: Work Immersion/Performance     35%             40%                  25%
Group 3 (see text above)                       20%             60%                  20%
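To see how the component weights shape a grade, the sketch below combines three component percentage scores using the Grades 1 to 10 weights listed above. The function name weighted_grade, the dictionary keys and the sample scores are assumptions made for illustration, and any further conversion of the result that DepEd Order No. 8, s. 2015 prescribes is outside this sketch.

```python
# Component weights for Grades 1 to 10, as listed in the table above.
WEIGHTS = {
    "Languages/AP/EsP": {
        "written_work": 0.30, "performance_tasks": 0.50, "quarterly_assessment": 0.20,
    },
    "Science/Math": {
        "written_work": 0.40, "performance_tasks": 0.40, "quarterly_assessment": 0.20,
    },
    "MAPEH/EPP/TLE": {
        "written_work": 0.20, "performance_tasks": 0.60, "quarterly_assessment": 0.20,
    },
}


def weighted_grade(percentage_scores, subject_group):
    """Weighted sum of the three component percentage scores (0-100 each)."""
    weights = WEIGHTS[subject_group]
    return sum(percentage_scores[component] * weight
               for component, weight in weights.items())


# Hypothetical percentage scores for one student in one quarter.
scores = {"written_work": 80.0, "performance_tasks": 90.0, "quarterly_assessment": 70.0}
for group in WEIGHTS:
    print(group, round(weighted_grade(scores, group), 2))
```

With the same three scores, the subject group that weights performance tasks most heavily yields the highest result here, because this sample student scored best on performance tasks.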
Pass-Fail Systems. Some colleges and universities, faculties, schools, and institutions
in the Philippines use pass-fail grading systems, especially when the student's work to be
evaluated is highly subjective (as in the fine arts and music), there are no generally
accepted standard gradations (as with independent studies), or the critical requirement is
meeting a single satisfactory standard (as in some professional examinations and
practicum).
Non-Graded Evaluations. While not yet practised in Philippine schools and institutions,
non-graded evaluations do not assign numeric or letter grades as a matter of policy. This
practice is usually based on a belief that grades introduce an inappropriate and distracting
element of competition into the learning process, or that they are not as meaningful as
measures of intellectual growth and development as are carefully crafted faculty
evaluations. Many faculty, schools, and institutions that follow a no-grade policy will, if
requested, produce grades or convert their student evaluations into formulae acceptable to
authorities who require traditional measures of performance.
The process of deciding on a grading system is a very complex one. The problems
faced by an instructor who tries to design a system which will be accurate and fair are
common to any manager attempting to evaluate those for whom he or she is responsible.
The problems of teachers and students with regard to grading are almost identical to those
of administrators and faculty with regard to evaluation for promotion and tenure. The need
for completeness and objectivity felt by teachers and administrators must be balanced
against the need for fairness and clarity felt by students and faculty in their respective
situations. The fact that the faculty member finds himself or herself in both the position of
evaluator and evaluated should help to make him or her more thoughtful about the needs
of each position.