
SEMI-FINAL and FINAL MODULES IN EDUC 5

PREPARED BY: DIOCEMEE A. DOCUMENTO

THE 21ST CENTURY ASSESSMENT

In order to thrive in this constantly changing and extremely challenging period, the acquisition of 21st century skills is necessary. It is imperative that the educational system sees that these skills are developed and honed before the learners graduate. They should be integrated into the program of each discipline. More than just acquiring knowledge, its application is important. To ensure that education has truly done its role, ways to measure or to assess the learning process are necessary. Thus, the assessment process and tools must be suited to the needs and requirements of the 21st century. In this chapter, the characteristics of 21st century assessment, how it is used as one of the inputs in making instructional decisions, and outcome-based assessment will be discussed.

Inevitably, the 21st century is here, demanding a lot of changes, development, and re-engineering of systems in different fields for this generation to thrive. In the field of education, most of the changes have focused on teaching and learning. Preparing and equipping teachers to cater to the needs of 21st century learners is part of the adjustments being done in the education system. Curricula are updated to address the needs of the community in relation to the demands of the 21st century. This aspect of teaching and learning has been given its share of focus, with the various components and factors analyzed and updated to ensure that students' learning will be at par with the demands of the 21st century. Although a lot of changes have been made to the different facets of education, some members of the educational community are calling for a corresponding change in educational assessment. Viewing educational assessment as an agent of educational change is of great importance. This belief, coupled with the traditional focus on teaching and learning, will produce a strong and emerging imperative to alter our long-held conceptions of these three parts: teaching, learning, and assessment (Greenstein, 2012).

Twenty-first century skills must build on the core literacy and numeracy that all students must master. Students need to think critically and creatively, communicate and collaborate effectively, and work globally to be productive, accountable citizens and leaders. These skills must be honed and assessed, not simply to obtain numerical results but, more importantly, to use the results of assessment as a guide for further action.

Educators need to focus on what to teach, how to teach it, and how to assess it (Greenstein, 2012; Schmoker, 2011).

The Assessment and Teaching of 21st Century Skills project (atc21s.org) holds the core belief that alignment of goals with learning and assessment is essential to policy and practice. It emphasizes the importance of balanced assessment systems that incorporate 21st century goals.

This section focuses on the characteristics of 21st century assessment and the different
types of assessment. You are expected to integrate the concepts that will be discussed
and apply them in using appropriate assessment tools and techniques in making
instructional decisions; and finally, relate assessment to learning outcomes.

1. Characteristics of the 21st Century Assessment

1.1 Responsive

Visible performance-based work (as a result of assessment) generates data that inform curriculum and instruction. Teachers can adjust instruction, school leaders can consider additional educational opportunities for students, and policy makers can modify programs and resources to cater to the present needs of the school community.

Processes for responding to assessments are thoughtfully developed, incorporating best practices in feedback and formative assessment. Feedback is to be targeted to the goal and outcome. Rather than just a single test grade, students are informed of progress toward the attainment of goals. Self-reflection, peer feedback, and opportunities for revision will be natural outcomes.

1.2 Flexible

Lesson design, curriculum, and assessment require flexibility, suppleness, and adaptability. Assessments and responses may not be fitted to expected answers; assessments need to be adaptable to students' settings. Rather than the identical approach that works in traditional assessment, 21st century approaches are more versatile. These approaches best fit the demands of the present learning environment since, as students' decisions, actions, and applications vary, the assessments and the system need to be flexible, too.

1.3 Integrated

Assessments are to be incorporated into day-to-day practice rather than as add-ons at the end of instruction or during a single specified week of the school calendar.

Assessments are enriched by metacognition. Assessment is about stimulating thinking, building on prior learning, constructing meaning, and thinking about one's thinking. It offers opportunities for students to consider their choices, identify alternative strategies, transfer earlier learning, and represent knowledge through different means.

1.4 Informative

The desired 21st century goals and objectives are clearly stated and explicitly taught.
Students display their range of emerging knowledge and skills. Exemplars routinely guide
students toward achievement of targets.

Learning objectives, instructional strategies, assessment methods, and reporting processes are clearly aligned. Complex learning takes time. Students have opportunities to build on prior learning in a logical sequence. As students develop and build skills, i.e., learning and innovation skills; information, communication and technology skills; and life and career skills, the work gets progressively more rigorous.

Demonstrations of 21st century skills are evident and support learning. Students show the steps they go through and display their thought processes for peer and teacher review.

1.5 Multiple Methods

An assessment continuum that includes a spectrum of strategies is the norm. Students demonstrate knowledge and skills through relevant tasks, projects, and performances. Authentic and performance-based assessment is emphasized. There is recognition of and appreciation for the processes and products of learning.

1.6 Communicated

Communication of assessment data is clear and transparent for all stakeholders. Results are routinely posted to a database along with standards-based commentary, both of which must be available and comprehensible at all levels. Students receive routine feedback on their progress, and parents are kept informed through access to visible progress reports and assessment data.

The educational community recognizes achievement of students beyond standardized test scores. Large-scale measures, including all the results of traditional and authentic assessment, include and report on 21st century skills.

1.7 Technically Sound


Adjustments and accommodations are made in the assessment process to meet student needs and ensure fairness. Students demonstrate what they know and how they can apply that knowledge in ways that are relevant and appropriate for them.

To be valid, the assessments must measure the stated objective and 21st century skills
with legitimacy and integrity.

To be reliable, the assessment must be precise and technically sound so that users are
consistent in their administration and interpretation of data. They produce accurate
information for decision-making in all relevant circumstances.

1.8 Systemic

Twenty-first century assessment is part of a comprehensive and well-aligned assessment system that is balanced, inclusive of all students, constituents, and stakeholders, and designed to support improvement at all levels.

These eight characteristics of 21st century assessment are an essential guide for the preparation of assessment activities by educators. It is necessary to refer to these characteristics to ensure that the learners are being assessed according to the skills and demands of the 21st century.

2. Instructional Decision in Assessment

The major objective of educational assessment is to have a holistic appraisal of a learner, his/her environment, and accomplishments.

The educational assessment process starts with analyzing the criterion together with the teaching-learning environment. This is done to determine the effect of the environment on the teaching-learning situation, after which the kinds of evidence that are appropriate to use for assessment of the individuals are set. This helps determine the strengths, weaknesses, needs, personality characteristics, skills, and abilities of the learner (Bloom, 1970).

It is clear that educational assessment encompasses the total educational setting and is not limited to teacher-student engagement. It is not merely based on a single aspect such as taking a test and checking it. In totality, the processes of measurement and evaluation are subsumed in the educational assessment process.

2.1 Decision-making at Different Phases of Teaching-Learning Process

Assessment is constantly taking place in educational settings. Decisions are made about content/subject matter and specific targets, the nature of students and faculty, the morale and satisfaction of both teachers and students, as well as the extent to which student performances meet the standard and/or deliver the outcomes expected from them by the teacher.

Assessments can be used as a basis for decision-making at different phases of the teaching-learning process. The table below depicts the different phases of the teaching-learning process and how and what decisions are made by teachers:

Using Classroom Assessment to Promote 21st Century Learning in Emerging Market Countries

Introduction

Today, educational systems across the globe are undergoing efforts to move beyond the
ways they operated at the beginning of the 20th century, with traditional instructional
practices that commonly ask students to work individually on exams that require them to
recall facts or respond to pre-formulated problems within the narrow boundaries of
individual school subjects. Reforms currently underway reframe what is taught, how it is
learned, and how it is being evaluated in innovative ways that help personalize learning.
Assessments that support learning must explicitly communicate the nature of expected
learning. Research, in fact, shows the powerful effect that on-going assessment embedded
into the learning process has on student learning, particularly for low ability students
(Black & Wiliam, 1998). Creating such a system of personalized learning requires new
forms of formative and summative student performance assessments that enable
individual students to stretch onward from wherever they are in a learning continuum. For
over a decade, Intel® Corporation has been involved in a number of global initiatives
such as Assessment and Teaching of 21st Century Skills (ATC21S) that support developing
new national assessment strategies and new benchmarking tests. Through its partnerships
with ministries of education, Intel Teach’s teacher professional development programs
have helped millions of teachers in developing countries integrate these innovative
assessment strategies, as well as technology, into their classroom practice (EDC & SRI
International, 2008; Light, Polin, & Strother, 2009). While these strategies support new
assessments of learning, all of the Intel Teach professional development programs also
use a variety of assessment for learning approaches. Assessment for learning is the idea
that classroom assessments should support ongoing teaching and learning (Assessment
Reform Group, 2002; Heritage, 2010), thus highlighting the vital role that teacher-made
classroom-based formative and process-focused assessments could play in improving the
entire education system. In Intel's Getting Started course, teachers learn the technical skills to design rubrics, and the Essentials course teaches teachers how to use rubrics to assess student products and encourages performance-based assessments. The Teaching Thinking
with Technology and the Essentials V10 courses stress formative assessments for 21st
century skills. The online Elements courses include one entirely devoted to assessing 21st
century learning. Intel also offers a free online rubric maker. Additionally, courses like
Getting Started and Essentials model good assessment practices when they have teachers
assess and provide feedback on their work or when the courses ask teachers to reflect on
their own learning in the course. But, these programs alone are probably not sufficient
and local agencies and ministries may need to do more to support the needed shifts in
classroom assessment strategies.

Fostering 21st Century Learning with Classroom-Based Assessments
Teachers have always evaluated student knowledge through recall tests or by asking content questions during a lecture, but researchers and practitioners are beginning to understand that a different type of teacher-developed assessment can play an important role in supporting learning (Black & Wiliam, 1998; W. J. Popham, 2008b) and in helping to transform teaching practice. In fact, incorporating 21st century teaching practices should
start with updating teachers’ arsenal of assessment strategies that they use in the
classroom to support their teaching (Jacobs, 2010). In a seminal review of the literature on
how people learn, the National Research Council asserts that “appropriately designed
assessments can help teachers realize the need to rethink their teaching practices” (2000,
p. 141).

The research around classroom assessments suggests that the tools and strategies we wish to discuss share, to different degrees, three important traits: high quality teacher-designed assessments provide insight on what and how students are learning in time for
teachers to modify or personalize instruction; they allow teachers to assess a broader
range of skills and abilities in addition to content recall; and these assessments give
students new roles in the assessment process that can make assessment itself a learning
experience and deepen student engagement in content.

1) Provide Insight on Student Learning so Teachers Can Modify Instruction: Because many
of these assessment tools and strategies are formative in nature, the information garnered
from their implementation can be used to immediately inform teachers’ instructional
decisions. For example, information garnered from portfolios can help teachers evaluate
the effectiveness of their own instruction while helping them make informed decisions
about future lessons. The implementation of portfolio assessments stimulates student self-
reflection providing valuable feedback to both students and teachers, which in turn can
be used to inform the teaching and learning processes. When employing the peer
assessment strategy, if students and teachers assess a student differently it can open up
productive dialogue to discuss student learning needs and goal creation (J. Ross, 2006).
The teacher can then use that information to structure the following lesson around the
needs and goals of those students. Whether taking a pre- and post-survey poll or asking multiple-choice questions to reveal students' subtle misunderstandings and misconceptions, a Student Response System (SRS) allows teachers to take a quick snapshot of where their students are on a learning continuum and devise the appropriate strategies to take them to the next level. As teachers become more aware of
their students’ interests, needs, strengths and weaknesses, they are better positioned to
modify their instructional strategies and content focus to help maximize student learning.

2) Assess Broader Range of Skills and Abilities: Traditional forms of assessment like
multiple-choice, fill in the blank, and true/false, privilege memorization and recall skills
that demand only a low level of cognitive effort (Dikli, 2003; Shepard, et al., 1995). The
assessment tools and strategies outlined in this paper provide more robust means to
measure higher order thinking skills and complex problem solving abilities (Palm, 2008).
Strategies such as performance-based assessment (PBA) and portfolios take into account multiple measures of achievement and rely on multiple sources of evidence, moving beyond the standardized examinations most commonly used for school accountability (Shepard, et al., 1995; Wood, Darling-Hammond, Neill, & Roschewski, 2007). Self- and peer-assessment both teach and assess a broader range of life skills like self-reflection,
collaboration, and communication. As a tool to measure student learning, rubrics allow
teachers to measure multiple dimensions of learning rather than just content knowledge,
and to provide a more detailed assessment of each student’s abilities instead of just a
number or percent correct.

3) Give Students New Roles in the Assessment Process that Make Assessment a Learning
Experience: In contrast to the traditional teacher-designed, teacher-administered, teacher-
graded tests, this cadre of assessments involves students throughout the assessing
process. Involving students in the creation of assessment criteria, the diagnosis of their
strengths and weaknesses, and the monitoring of their own learning, transfers the locus of
instruction from the teacher to his or her students (Nunes, 2004). For example, the most
successful rubrics involve students in the creation of the evaluation criteria. This creates
buy-in, increases engagement, and fosters a deeper commitment to the learning process.
In the assembly of a portfolio, students not only get to decide which work is graded, they also have the opportunity to reflect upon and evaluate the quality of those submissions. This type
of involvement fosters metacognition, active participation, and ultimately puts students at
the center of the learning process (McMillan & Hearn, 2008). During peer-assessment
students are asked to be the actual evaluator offering feedback and suggestions on how
to improve their classmates’ work. When created collaboratively, many of these
assessments enable teachers and students to interact in a way that blurs the roles in the
teaching and learning process (Barootchi & Keshavarz, 2002). When students are part of
the assessment process they are more likely to “take charge” of their own learning
process and products and will be more likely to want to make improvements on future
work (Sweet, 1993).

Six Effective Assessment Strategies

The following sections describe six assessment tools and strategies shown to impact
teaching and learning as well as help teachers foster a 21st century learning environment
in their classrooms: 1) Rubrics, 2) Performance-based assessments (PBAs), 3) Portfolios, 4)
Student self-assessment
, 5) Peer-assessment, 6) Student response systems (SRS). Although the list does not
include all innovative assessment strategies, it includes what we think are the most
common strategies, and ones that may be particularly relevant to the educational context
of developing countries. Many of the assessment strategies currently in use fit under one
ore more of the categories discussed. Furthermore, it is important to note that these
strategies also overlap in a variety of ways.

1. Rubrics
Rubrics are both a tool to measure students’ knowledge and ability as well as an
assessment strategy. A rubric allows teachers to measure certain skills and abilities not
measurable by standardized testing systems that assess discrete knowledge at a fixed
moment in time (Reeves & Stanford, 2009). This section discusses the research on rubrics,
but because rubrics are frequently used as part of other assessment strategies (portfolios,
performances, projects, peer-review and self-assessment), they will be discussed in those
sections as well. Unlike a standard checklist used to assess performance, a rubric is a set
of criteria that articulates expectations and describes degrees of quality along a
continuum (H. L. Andrade, Ying, & Xiaolei, 2008; Rezaei & Lovorn, 2010; Wiggins &
McTighe, 2005). The rubric is not only utilized in conjunction with summative assessments;
it is a tool that can enhance the entire learning process from start to finish by serving a
number of purposes including communicating expectations for an assignment, providing
focused feedback on a project still in process. Additionally, they encourage self-
monitoring and self-assessment and give structure for a final grade on an end product (H.
L. Andrade, et al., 2008; Lee & Lee, 2009; National Research Council, 2002). Rubrics are
considered “inclusive assessment tools” that can be used as class-wide assessment tools
to help students at all levels make meaningful progress towards curricular goals (Lee &
Lee, 2009). Andrade, et. al. (2010), in their research around assessment and middle school
writing, found that students who are involved in three major components of rubric
assessment (reading an exemplary sample, generating criteria, and using a rubric to
self-assess) can actually produce more effective writing. In addition, students with access to
the evaluation criteria for a project had higher quality discussions and better group
products than their peers who did not know the grading criteria in advance (H. Andrade,
Buff, Terry, Erano, & Paolino, 2009). Skillings (2000), in her two years observing an
elementary school classroom noted that “both lower and higher achieving students were
able to be successful in showing their knowledge” when they were assessed with a rubric
(p. 454). Similarly, the awareness of lesson objectives and the encouragement of self-
monitoring associated with the use of rubrics increase engagement levels and help
students with disabilities learn more successfully in an inclusive classroom (Lee and Lee,
2009). One of the major strengths of the rubric as an assessment method is that it
functions as a teaching as well as an evaluative tool (H. L. Andrade, et al., 2008; J. W.
Popham, 1997). The development of high quality evaluation criteria is essential to the
effectiveness of a rubric as both an instructional and assessment tool (Wiggins &
McTighe, 2005). Popham (2008a) suggests that in fact, the evaluative criteria “should be
the most instructionally relevant component of the rubric. They should guide the teacher
in designing lessons because it is the students’ mastery of the evaluative criteria that
ultimately will lead to skill mastery” (p. 73). In order to ensure the rubric criteria are
rigorous and accurate, Wiggins and McTighe suggest designing and refining rubrics based
on actual student work that has been collected, sorted and rated. Collaborative rubric
development can also promote cooperation between teachers and students as they work
together to build and utilize the tool (Lee & Lee, 2009). As a result, students are more
comfortable because they feel some ownership in the process, recognize that their
opinion is valued and are more successful because they know what is expected of them
(Lundenberg, 1997; Reeves & Stanford, 2009). Inviting students to participate in the
generation of rubric criteria not only pushes students to think more deeply about their
learning but also helps foster a sense of responsibility for their own learning process and develop critical thinking skills that can be transferred to other learning situations (Andrade
et. al., 2008; Lee and Lee, 2009; Skillings and Ferrell, 2000; National Research Council,
2002). Wiggins and McTighe (2005) in fact emphasize that the ultimate test of student
knowledge is their ability to transfer what they know to a variety of contexts.
Metacognition can also lead to more self-directed learning through self-monitoring and
self-assessment (Lee and Lee, 2009).
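To make the idea of "criteria that describe degrees of quality along a continuum" concrete, the sketch below shows one possible way to represent a small rubric and turn per-criterion ratings into an overall score. It is only an illustration in Python; the criteria, level descriptors, and scoring scheme are hypothetical examples and are not taken from the studies cited above.

```python
# Illustrative sketch only: a rubric as data plus a simple scoring helper.
# The criteria, level descriptors, and scoring scheme are hypothetical.

# Each criterion maps a quality level (1 = beginning ... 4 = exemplary)
# to a short descriptor along a continuum, as the section describes.
RUBRIC = {
    "content_accuracy": {
        1: "Major factual errors",
        2: "Some errors, key ideas present",
        3: "Accurate with minor gaps",
        4: "Accurate, thorough, well supported",
    },
    "organization": {
        1: "No clear structure",
        2: "Structure attempted but uneven",
        3: "Clear structure throughout",
        4: "Clear, logical, and engaging structure",
    },
    "communication": {
        1: "Hard to follow",
        2: "Understandable with effort",
        3: "Clear in most places",
        4: "Clear, precise, audience-aware",
    },
}

def score_with_rubric(ratings: dict) -> float:
    """Convert per-criterion ratings (1-4) into an overall score out of 100."""
    max_level = 4
    earned = sum(ratings[criterion] for criterion in RUBRIC)   # points earned
    possible = max_level * len(RUBRIC)                          # maximum points
    return round(100 * earned / possible, 1)

# Example: a teacher, a peer, or the student self-assessing rates one product.
ratings = {"content_accuracy": 3, "organization": 4, "communication": 3}
print(score_with_rubric(ratings))  # -> 83.3
```

Because the level descriptors are explicit, the same structure can be shared with students in advance or co-written with them, which is the involvement the research above associates with better work.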

2. Performance-based Assessments
Performance-based assessments (PBA), also known as project-based or authentic
assessments, are generally used as a summative evaluation strategy to capture not only
what students know about a topic, but if they have the skills to apply that knowledge in a
“real-world” situation. By asking them to create an end product, PBA pushes students to
synthesize their knowledge and apply their skills to a potentially unfamiliar set of
circumstances that is likely to occur beyond the confines of a controlled classroom setting
(Palm, 2008). Some examples of PBA include designing and constructing a model,
developing, conducting and reporting on a survey, carrying out a science experiment,
writing a letter to the editor of a newspaper, creating and testing a computer program,
and outlining, researching and writing an in-depth report (Darling-Hammond & Pecheone,
2009; Wren, 2009). Regardless of the type of performance, the common denominator
across all PBAs is that students are asked to perform an authentic task that simulates a
real life experience and mimics real world challenges (Wiggins & McTighe, 2005; Shepard,
1995). Performance-based assessments have been used in many countries for decades
and offer many advantages not afforded by standardized paper and pencil multiple-
choice exams. Wiggins and McTighe (2005) assert that in fact, “authentic assessments are
meant to do more than “test”: they should teach students (and teachers) what the “doing”
of a subject looks like and what kinds of performance challenges are actually considered
most important in a field or profession” (p. 337). PBA, coupled with a well-designed measurement tool such as a scoring rubric, can reveal how and why a student might be struggling, versus just the what of standardized tests; as a result, PBA can
actually help teachers figure out how their students best learn (Falk, Ort, & Moirs, 2007;
Shepard, 2009). PBA, used as a formative assessment, also provides more timely feedback
than large-scale standardized tests. Standardized tests can take a number of months to
produce results, but PBA allows teachers to make meaningful adjustments while they are
still teaching their current students (Darling-Hammond & Pecheone, 2009; Wood, et al.,
2007). Additional benefits of PBA are that they are inherently more student-centered and
are better at assessing higher order thinking and other 21st century skills (Wood, et al.,
2007; Wren, 2009). In a yearlong study of 13 third grade teachers in Maryland, Shepard
and her team (1995) noted “small but real gains” in students’ ability to explain
mathematical patterns and tables; a skill previously exhibited by only the most adept
students (p. 27). Not surprisingly, PBA helps students to be more engaged and invested in
their learning (Wood et. al., 2007; Wiggins & McTighe, 2005). PBA also allows for
differentiation of assessment so that all students have space to demonstrate
understanding, including special education and ELL students (Darling-Hammond, 2009). In
addition to impacts on student outcomes, research has shown that the implementation of
performance-based assessment strategies can also impact other instructional strategies in
the classroom. Though it can be challenging to change general teaching paradigms, a
small study of teachers in the US found that “under some circumstances, performance-
based assessment can change specific behaviors and procedures in the classroom”
(Firestone, Mayrowetz, & Fairman, 1998, p. 11).

3. Portfolio Assessment
Portfolios are a collection of student work gathered over time that is primarily used as a
summative evaluation method. The most salient characteristic of the portfolio assessment
is that rather than being a snapshot of a student’s knowledge at one point in time (like a
single standardized test), it highlights student effort, development, and achievement over
a period of time; portfolios measure a student’s ability to apply knowledge rather than
simply regurgitate it. They are considered both student-centered and authentic
assessments of learning (Anderson & Bachor, 1998; Barootchi & Keshavarz, 2002).
Portfolios are one of the most flexible forms of assessment because they can be
effectively adapted across subject areas, grade levels and administrative contexts (i.e. to
report individual student progress, to compare achievement across classroom or schools
and to increase parent involvement in student learning) (Sweet, 1993; National Research
Council, 2002). The content included in the portfolio, along with who chooses what to
include, vary by the teacher and the learning goals associated with the portfolio. Some
portfolios only include final products, while other portfolios will incorporate drafts and
other process documents. Some will contain items chosen exclusively by the teacher, while
others will fold in input from the student, their peers, administrators and even parents.
One of the strengths of the portfolio as an assessment tool is that it can be smoothly
integrated into classroom instruction (as opposed to being an add-on in the style of a standardized summative test). The portfolio acts as a repository for work assigned and
completed throughout the year. It does not necessitate additional tests or writing
assignments. The additional inputs required (i.e. student reflection (written or spoken),
student-teacher collaboration, rubric creation and implementation) aid rather than distract
from the teaching and learning process. Barootchi and Keshavarz highlight that the
student portfolio is an assessment that is “truly congruent with instruction” because of its
ability to simultaneously teach and test (p. 286). In fact, when implemented effectively,
portfolios can supplement rather than take time away from instruction (Sweet, 1993;
National Research Council, 2002). When the portfolio is well integrated into a teacher’s
instructional practices, it can function as a strategy to increase student learning across a
variety of subject areas. Studies in Iran and in Turkey showed increased student
achievement in English as a foreign language (Barootchi & Keshavarz, 2002), science
(Çakan, Mihladiz, & Göçmen-Taskin, 2010), and writing and drawing (Tezci & Dikici, 2006).
All high quality portfolios involve students at some point in the process. In fact, the
selection process can be hugely instructive and impactful for students as they are asked
to collect, select and reflect upon what they want to include in their portfolio (Sweet,
1993). Portfolios foster self-reflection and awareness among students as they are often
asked to review previous assignments and projects and assess strengths and weaknesses
of both their processes as well as their final products (Sweet, 1993). Barootchi and
Keshavarz (2002) also emphasize the role that portfolios can have in helping students to
become more independent learners (p. 281). When well integrated, portfolios can also
foster collaboration both among students and their peers as well as between students
and their teacher (Tezci & Dikici, 2006). Students' critiques and evaluations of classmates'
work can even be included as an additional artifact in the portfolio collection. Nunes
(2004) believes that one of the underlying principles of portfolio development is that “it
should be dialogic and facilitate ongoing interaction between teacher and students” (p.
328). Technology is playing an increasingly important role enabling teachers to use
portfolios. In the past decade portfolios have moved from paper folders and file cabinets
to electronic databases and social networks embedded within the online “cloud.” While e-
portfolios offer many of the same benefits of conventional portfolios, there are additional
advantages that affect learning, teaching and administration. Chang (2009) describes the
e-portfolio as an “abundant online museum” connoting an ease of storage, a creativity of
presentation, and the facilitation of collaboration (p. 392). Research suggests that e-
portfolios can not only aid in the development of information technology (IT) skills but also
increase learning in low-motivation students (Chang, 2009). Online portfolios also allow
for real-time information collection, collaboration and editing with fewer physical
resources required. Finally, students are pushed to consider a wider audience when they
put their products online (Diehm, 2004). They also eliminate the space limitations normally
associated with paper portfolios.

4. Self-assessment

While the previous assessment tools and strategies listed in this report generally function
as summative approaches, self-assessment is generally viewed as a formative strategy,
rather than one used to determine a student’s final grade. Its main purpose is for students
to identify their own strengths and weaknesses and to work to make improvements to meet
specific criteria (H. Andrade & Valtcheva, 2009). According to McMillan and Hearn (2008)
“self-assessment occurs when students judge their own work to improve performance as
they identify discrepancies between current and desired performance” (p. 1). In this way,
self-assessment aligns well with standards-based education because it provides clear
targets and specific criteria against which students or teachers can measure learning. Self-
assessment is used to promote self-regulation, to help students reflect on their progress
and to inform revisions and improvements on a project or paper (Andrade and Valtcheva,
2009). Ross (2006) argues that in order for self-assessment to be truly effective, four conditions must be in place: the self-assessment criteria are negotiated between teachers and students, students are taught how to apply the criteria, students receive feedback on their self-assessments, and teachers help students use assessment data to develop an
action plan (p. 5). A number of studies point to the positive effects self-assessment can
have on achievement, motivation, self-perception, communication, and behavior (H.
Andrade & Valtcheva, 2009; Klenowski, 1995; McMillan & Hearn, 2008). McDonald and
Boud (2003) report that high school students who were trained in self-assessment not
only felt better prepared for their external examinations, they actually outperformed their
peers who had not received the training. Similarly, students across grade levels and
subject areas including narrative writing, mathematics and geography outperformed their
peers in the control group who had not received self-assessment training (Ross, 2006).
Andrade and Valtcheva (2009) in their literature reviews cite numerous studies that found
a positive relationship between the use of self-assessments and the quality of writing,
depth of communication skills, level of engagement and degree of learner autonomy.
Finally, self-assessment is also a lifelong learning skill that is essential outside of the
confines of the school or classroom (McDonald and Boud, 2003). An additional strength of
self-assessment as a formative assessment tool is that it allows every student to get
feedback on his or her work. Few classrooms allow teachers the luxury of regularly
responding to each individual student, so when students are trained in self-assessment it
makes them less reliant on teachers to advance their learning (Andrade and Valtcheva,
2009). While the focus is self-evaluation, the process can also be enhanced through peer
and teacher based assessments that offer alternative interpretation and additional
evidence to support a student’s understanding of their own learning (Andrade and
Valtecheva, 2009). A number of channels can be used to aid students in their self-
assessment including journals, checklists, rubrics, questionnaires, interviews and student-
teacher conferences. As with the previous assessment strategies, the rubric is often the
most effective tool to help monitor and measure student self-assessment, though
Andrade and Valtcheva (2009) warn that simply handing one out to students before an
activity does not guarantee any learning gains because students need to deeply
understand and value the criteria. As the rubric section of this paper points out, students
can benefit deeply from being involved in the process of developing evaluation criteria
and benchmark targets (Ross, 2006). In addition to involving students in the process, the
assessment criteria needs to be appropriately challenging in order for the evaluation to be
meaningful (McMillan and Hearn, 2008). Ross (2006) also notes the importance of creating
a classroom climate in which students feel comfortable assessing themselves publicly. He
urges teachers to focus students’ attention on learning goals (with a focus on learning
ideas) rather than performance goals (that tend to focus on outdoing one’s peers).

5. Peer Assessment
Peer assessment, much like self-assessment, is a formative assessment strategy that gives
students a key role in evaluating learning (Topping, 2005). Peer assessment approaches
can vary greatly but, essentially, it is a process for learners to consider and give feedback
to other learners about the quality or value of their work (Topping, 2009). Peer
assessments can be used for a variety of products like papers, presentations, projects, or other skilled behaviors. Peer assessment is understood as more than only a grading procedure and is also envisioned as a teaching strategy, since engaging in the process
develops both the assessor and assessee’s skills and knowledge (Li, Liu, & Steckelberg,
2010; Orsmond & Merry, 1996). Feedback that students are asked to provide can confirm
existing information, identify or correct errors, provide feedback on process, problem
solutions or clarity of communication (Butler & Winne, 1995). The primary goal for using
peer assessment is to provide feedback to learners. This strategy may be particularly
relevant in classrooms with many students per teacher since student time will always be
more plentiful than teacher time. Although any single student’s feedback may not be as
rich or in-depth as a teacher’s feedback, the research suggests that peer assessment can
improve learning. The research base has found peer assessment strategies to be effective
in different content areas from language arts (Karegianes, Pascarella, & Pflaum, 1980;
McLeod, Brown, McDaniels, & Sledge, 2009), to mathematics (Bangert, 2003; Jurow, Hall,
& Ma, 2008) and science (Peters, 2008). Peer assessment has even proven beneficial for
students as young as six years old (Jasmine & Weiner, 2007). There is research on peer
assessment from North America and Europe (Sluijsmans, Dochy, & Moerkerke, 1999;
Topping, 2005), and there are a few research studies from Asian countries (Bryant &
Carless, 2010; Carless, 2005). Peer assessment is associated with performance gains and
cognitive gains for students who receive feedback and for students as they give feedback.
The research suggests that, when done properly, peer assessment strategies can improve
the quality of learning to a degree equivalent to gains from teacher assessment (Topping,
2009). Giving and receiving feedback impacts meta-cognitive abilities like self-regulation
(Bangert, 2003; Butler & Winne, 1995) influencing time on task and engagement in
learning and improving learning outcomes. Asking students to provide feedback to others
can also improve their own work as they internalize standards of excellence (Li, et al.,
2010). When used in conjunction with collaborative learning peer assessment can also
improve interpersonal skills like group work, consensus building, or seeking and providing
help (Brown, Topping, Henington, & Skinner, 1999; J. A. Ross, 1995). In collaborative peer
assessment techniques, students could work in groups to review work, an entire class might evaluate student presentations, or students can even be asked to assess their own group's
work. Peer assessment is usually used in conjunction with other types of teacher
assessment so that the peer assessment is seldom the only evaluation provided. For
example, peer editing may be done on a draft report but the teacher evaluates the final
draft or peers may provide part of the score on a student’s performance but the rest of
the score comes from the teachers’ assessment. Peers are generally defined as students of
equal status in that they are in a similar grade and similar levels of proficiency with
content, although there is often flexibility and slightly older students may assess younger
students, or a student moving more quickly through the material may be asked to assess
a less advanced student. Topping (2005) contends that peer assessment works best when
students are asked to provide formative and qualitative feedback rather than merely
grading or giving a score to peers since this often makes students uncomfortable.

6. Student Response Systems


Student response system (SRS), also known as classroom response system (CRS), audience
response system (ARS) or colloquially as “clickers,” is a general term that refers to a
variety of technology-based formative assessment tools that can be used to gather
student-level data instantly in the classroom. Through the combination of hardware (hand
held clickers, receiver, PC, internet connection, projector and screen) and software,
teachers can ask students a wide range of questions (both closed and open-ended),
students can respond quickly and anonymously, and the teacher can display the data
immediately and graphically. The value of SRS comes from teachers analyzing information
quickly and then devising real-time pedagogical solutions to maximize student learning
(Beatty & Gerace, 2009; Bruff; Caldwell, 2007). As with most teaching tools (including the
rubric), an SRS is only as effective as the pedagogy it is couched in (Beatty & Gerace,
2009; Rochelle, Penuel, & Abrahamson, 2004). As a result, this section discusses not only
the tool but also the questioning strategies at the heart of its implementation. At its core,
SRS allows for the generation of data that can guide the ongoing modification of
pedagogy and content coverage to better differentiate teaching strategies to meet all
students’ needs (Bruff; Caldwell, 2007; Salend, 2009). What makes SRS distinct from other
assessment tools is its ability to collect and display data instantly rather than waiting days
to present the outcome as with a test, essay or project. SRS has been found to be
effective across grade levels and in a variety of subject areas (Beatty & Gerace, 2009;
Bruff, 2007; Caldwell, 2007; Rochelle, et al., 2004). The effectiveness of the SRS tool is
closely linked to the type, quality, quantity, speed and sequence of the questions being
asked (Bruff, 2007; Beatty & Gerace, 2009; Caldwell, 2007). SRS technology can be used to
pose a variety of types of questions including recall questions, conceptual understanding
questions, application questions, critical thinking questions, student perspective questions,
confidence level questions, monitoring questions, and classroom experiment questions
(Bruff). Depending on the learning goal for the lesson, a teacher can ask questions to help
gauge understanding, foster discussion, elicit feedback or give student voice in what they
are studying. An instructor may also choose from a number of questioning sequences
including easy-hard-hard (a “warm-up” question followed by two more challenging
questions meant to elicit student discussion and test transferability across contexts) or
rapid fire (a series of moderately difficult questions around one concept). Some general
examples of effective SRS questions include: given a graph, match it with the best
description or interpretation; match the method of analysis with a particular data set; sort
ideas or steps into the correct order; or apply a familiar idea to a new context. One strand
of questioning strategies that is highly effective at integrating SRS is a series of questions
designed to promote peer learning. Peer learning is an active learning method where
students spend time collaborating and discussing issues in small groups (Caldwell, 2007).
To foster peer learning, SRS can be used to pose a question that a teacher knows
students will have varying opinions on. Peer learning has been proven an effective
teaching method that increases student engagement, improves learning outcomes,
promotes the circulation of knowledge between students, fosters metacognitive learning,
and provides feedback to the instructor (Beatty & Gerace, 2009). Practitioners and
researchers report many other benefits to the use of SRS in the classroom. The research
suggests that when integrated effectively into instruction SRS can 1) improve
engagement, 2) provoke critical thinking; 3) give students voice in classroom decisions, 4)
improve classroom discussion, 5) increase attendance and retention, and finally, 6)
increase enjoyment of class (Caldwell, 2007; Bruff; Salend, 2009; Beatty & Gerace, 2009;
Johnson & McLeod, 2004). Though small studies show that SRS has been effective at
increasing achievement levels among special populations like students with learning
disabilities, on a larger scale, researchers have difficulty making a causal link between the
tool and academic outcomes (Jerome & Barbetta, 2005; Caldwell, 2007; Roschelle et. al.,
2004). In addition to enhancing instructional strategies, SRS can be used as an effective
classroom management tool to help monitor participation (Rochelle et. al., 2004), manage
a large classroom (Caldwell, 2007; Beatty and Gerace, 2009), practice and review for tests
(Beatty and Gerace, 2009), and facilitate homework collection (Bruff).
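As a rough sketch of the instant tallying an SRS performs, the snippet below counts responses to a single question and prints the distribution as a simple text chart so the teacher can react immediately. The question, options, and responses are invented for illustration; real systems pair clicker hardware or web clients with reporting software.

```python
# Illustrative sketch only: tallying student responses to one SRS question and
# displaying the distribution immediately, so the teacher can adjust instruction.
# The question, options, and responses below are hypothetical.
from collections import Counter

QUESTION = "Which graph best matches the description?"
OPTIONS = ["A", "B", "C", "D"]

# Anonymous responses as they arrive from the students' devices.
responses = ["A", "C", "C", "B", "C", "D", "C", "A", "C", "B"]

counts = Counter(responses)
total = len(responses)

print(QUESTION)
for option in OPTIONS:
    n = counts.get(option, 0)
    share = n / total
    bar = "#" * n                      # simple text bar chart
    print(f"{option}: {bar:<10} {share:>5.0%}")

# If many students converge on the same wrong option, the teacher might pose a
# follow-up question or start a peer discussion before moving on.
```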

Conclusion

More and more, discussions concerning education reform are paying increasing attention to the role that classroom based assessment strategies play in fostering student centered
teaching practices. Together, all of the research cited here strongly suggests that these
assessment tools and strategies can positively impact a number of key areas that we know
are important aspects of education reform: student/teacher relationships, teacher’s ability
to personalize instruction, acquisition of 21st century skills, student engagement and
student metacognition. These practices are becoming more common in developed
countries, but there is still little research on how to adapt these approaches to the school
contexts of many emerging market countries. It is important to note that with access to
professional development resources, teachers and administrators can become proficient
with assessment for learning approaches without returning to the university for continuing
education courses. Many teachers who have participated in Intel teacher professional development programs are beginning to use assessment for learning strategies, and this has offered us a chance to see these new assessment strategies in action (León Sáenz, Castro, & Light, 2008; Light, et al., 2009). With support from the ministries of education, the Intel education portfolio of professional development courses is available online, and
face-to-face courses are available in over 30 countries. But there is still more work to be
done for local governments, ministries, and NGOs both in researching and adapting these
strategies to developing country contexts and to developing programs to promote their
use in classrooms.

10 Types of Assessment:
1) Summative Assessment
Summative comes from the word summary. A summative assessment arrives at the very end of the learning sequence and is used to record the student's overall achievement at the end of learning. The primary objective of summative assessment is to measure a student's achievement after instruction or learning.

Examples of summative assessment include midterms and final papers and examinations, which test the overall knowledge of the student at the end of the learning. The summative assessment gives an insight into the overall picture of the student's understanding of a particular learning or topic. Summative assessment helps to answer questions like what happened and what went wrong at the end of the learning.

The United States uses summative assessment all over its educational institutions. Summative assessments have more weight compared to formative assessments. Questionnaires, surveys, interviews, tests, and projects are a few of the methods used for summative assessment.

2) Formative Assessment
Formative assessment includes a variety of formal and informal assessment procedures used by teachers in the classroom so that they can modify teaching and improve students' attention, retention, and learning.

Formative assessment is carried out throughout the learning process and usually determines the performance of the student during the learning, unlike summative assessment, which determines the performance at the end of the learning.

The primary objective of formative assessment is to engage the attention of the students and help them achieve their goals. It is performed in the classroom and determines the strengths and weaknesses of students. Routine questioning during the teaching of a lesson is an example of formative assessment.

Following are the characteristics of formative assessment

Formative assessment is positive in its intention: it is directed towards promoting learning and hence is an integral part of teaching.
It helps in addressing individual or group deficiencies by identifying them.

3) Evaluative assessment
This is concerned only with evaluating the assessment. The overall idea is to evaluate the assessment in the school, in the system, or in the department. Evaluation of candidates
helps in assessing and judging whether the candidates are capable enough for the
learning program. Evaluative assessment is done only with the aim of evaluating and
grading the candidates.
4) Diagnostic Assessment
When the objective is to identify individual strengths and areas of improvement, diagnostic assessment is the one that is used. It helps to inform next steps in the assessment by identifying strengths and weaknesses, areas of improvement, and other characteristics. Unlike evaluative assessment, diagnostic assessment does not aim to grade the candidates; rather, it helps in diagnosing the issue, after which the teacher can take steps to address it.

5) Norm-referenced tests (NRT)


Robert Glaser coined the term Norm-Referenced Test.

Norm-referenced tests, commonly known as NRTs, are used to assess or evaluate with the aim of determining the position of the tested individual against a predefined group on the traits being measured. The term normative assessment refers to the process of comparing one test taker to his or her seniors or peers.

The primary objective behind this test is to determine whether the test taker has
performed better or worse than the other test takers which in turn determines whether
the test taker knows more or less than the other test takers. Comparison by
benchmarking is the method used in NRT.

The primary advantage is that this kind of test can provide information about an individual vis-à-vis the reference group, while a disadvantage is that the reference group may not represent the current population of interest, since many norms are misleading and therefore do not hold over a period of time. This test also does not ensure that the test is valid in itself. Norms do not mean standards, which is another disadvantage of this test.
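The notion of "position against a predefined group" can be illustrated with a short percentile-rank calculation. This is only a sketch: the norm-group scores and the test taker's score are invented, and operational norm-referenced tests use carefully constructed norming samples.

```python
# Illustrative sketch: norm-referenced interpretation of a score.
# The norm-group scores and the test taker's score are invented examples.

def percentile_rank(score, norm_group):
    """Percent of the norm group scoring at or below the given score."""
    at_or_below = sum(1 for s in norm_group if s <= score)
    return 100 * at_or_below / len(norm_group)

norm_group = [42, 55, 61, 63, 67, 70, 72, 75, 81, 88]  # previous test takers
print(percentile_rank(72, norm_group))  # -> 70.0: at or above 70% of the group
```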

6) Performance-based assessments
This is also known as educational assessment, in which the skills, attitudes, knowledge, and beliefs of the student are checked to improve the standard of learning. The assessment is at times done with a test, but it is not confined to tests and can extend to class or workshop or real-world applications of knowledge by the student.

It is further divided into a few subtypes such as:

Initial and diagnostic assessment
Objective and subjective assessment
Referenced and norm-referenced assessment
Informal and formal assessment
Internal and external assessment

Examples would include a multiple-choice questions-and-answers approach, as opposed to the traditional responses, which are normally done by writing a report.

7) Selective response assessment


This refers to objective assessments, including multiple-choice, true-or-false, and matching questions. It is a selective, effective, and efficient method to measure the knowledge of students and is also the most common method of assessment for students in the classroom.

Selective response assessment determines the exact amount of knowledge that the
student has and also provides an insight into the skills the student has acquired over the
time of learning.

8) Authentic assessment
Intellectual accomplishments that are worthwhile, significant, and substantial are measured by authentic assessment. In contrast to standardized tests, authentic assessment provides deep insights about the student.

It focuses on enabling students to demonstrate their capabilities and competencies in a more authentic setting, such as performing a particular skill, demonstrating a particular form of knowledge assimilation, role plays, or strategically selecting items. Authentic assessment helps to determine and develop the problem-solving skills that are required outside of school. Case studies are one of the common examples of authentic assessment.

9) Criterion-referenced tests
This kind of assessment determines the performance of a student against a fixed set of pre-determined and agreed-upon criteria for student learning. Unlike a norm-referenced test, here the reference is made against a particular criterion rather than against a benchmark or another student.

While criterion-referenced assessment will indicate whether or not the answer is correct, norm-referenced assessment will provide information on whether the answer is better than student number 1's or worse than student number 3's.

That the comparison here is not against a person or fellow competitor is the biggest advantage of criterion-referenced assessment over norm-referenced assessment. While the former provides the exact standing of a student, the latter provides the standing of a student in comparison to others.
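The contrast between the two interpretations can be sketched in a few lines of Python: the same raw score is judged once against a fixed, pre-agreed cut-off and once against peers. The cut-off value and the peer scores are hypothetical.

```python
# Illustrative sketch: the same raw score interpreted two ways.
# The passing criterion (cut-off) and the class scores are hypothetical.

def criterion_referenced(score, cutoff=75.0):
    """Judge the score against a fixed, pre-agreed criterion."""
    return "meets criterion" if score >= cutoff else "does not meet criterion"

def norm_referenced(score, peers):
    """Judge the score relative to other test takers."""
    better_than = sum(1 for p in peers if p < score)
    return f"better than {better_than} of {len(peers)} peers"

peers = [60, 68, 70, 74, 79, 85]
print(criterion_referenced(72))    # -> does not meet criterion
print(norm_referenced(72, peers))  # -> better than 3 of 6 peers
```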

10) Written and Oral Assessment


These include projects, term papers, exam papers, essays etc. The primary objective
behind the written assessment is to determine the knowledge and understanding of the
student. Written assessments are performed under the supervision of the teacher and the
questions are given on the assessment day with limited time to answer the questions.

Written assessments are one of the most popular methods of summative assessment. Oral assessments, on the other hand, involve evaluating the candidates orally. Their knowledge is evaluated through their verbal answers. Questions can be elaborative or objective or a combination of both.
NATURE OF PERFORMANCE-BASED TESTS
Most of the time, teachers have relied on paper-and-pencil tests, which measure knowledge and understanding instead of a student's ability. With the implementation of OBE (Outcome-Based Education), greater emphasis shall be given to assessing student outcomes through real-life tasks (authentic assessment), which require the students to carry out activities or produce products to demonstrate meta-cognitive knowledge.

1. Meaning and Characteristics


Performance-Based Assessment:
• It is one during which the teacher observes and makes a judgment about the student's demonstration of a skill or competency in creating a product, constructing a response, or making a presentation.

• It is an alternative form of assessment that moves away from traditional paper-and-pencil tests.

• It is a process that draws out the creative aspect of the students by bringing out what they know and what they can do through different performance tasks like exhibits, projects, and work samples (hands-on experiences).

• It is stipulated in DepEd Order No. 7, s. 2012 that the very best level of assessment focuses on the performance (product) which the students are expected to produce through authentic performance tasks.

Multiple Evaluation Criteria. The student's performance must be judged using more than one evaluation criterion.

Pre-specified Quality Standards. Each of the evaluative criteria on which a student's performance is to be judged is clearly explicated in advance of judging the quality of the student's performance.

Judgmental Appraisal. Unlike the scoring of selected-response tests, during which electronic computers and scanning machines can, once programmed, keep going without the necessity of manual intervention, genuine performance assessments need human subjective judgments.

2. Types of Performance Tasks

• Solving problems – Critical thinking and problem-solving skills are important. Teachers may include activities that require students to make sense of and solve complex, authentic problems or issues.

• Completing an inquiry – An inquiry task is one where students are asked to collect data in order to develop understanding about a topic or issue. Examples of these include investigations, research-based activities, surveys and interviews, or independent studies.

• Determining a position – This task requires students to make a decision or clarify a position. Case analysis and issue-related activities or debates are some examples of this task.

• Demonstration task – This task shows how students use knowledge and skills to complete well-defined complex tasks. Examples are demonstrating steps in cooking, explaining earthquake safety, etc.

• Developing Exhibits – Exhibits are visual representations or displays that need little or no explanation from the creators. An exhibit is offered to explain, demonstrate, or show something.

• Presentation Task – This is a work or task performed in front of an audience, like storytelling, singing and dancing, a musical play, or theatrical acting.

• Capstone Performances – These are tasks that occur at the end of a program of study and enable students to show knowledge and skills in a context that matches the world of practicing professionals. These tasks include research papers, practice teaching, internship, or on-the-job training.

3. Strengths and Limitations

• Performance assessment clearly identifies and clarifies learning targets. Authentic performance tasks like real-world challenges and situations can closely match varied, complex learning targets.

• Performance assessment allows students to exhibit their own skills, talents, and expertise. Tasks show the integration of the student's knowledge and skills, provide challenge, and offer opportunities to exhibit their best work.

• Performance assessment advocates the constructivist principle of learning. Students are more engaged in active learning and are provided more opportunities to demonstrate their learning in several ways through complex tasks.

• Performance assessment uses a variety of approaches to student evaluation. This offers students a variety of ways of expressing their learning and increases the validity of student evaluation.

• Performance assessment allows teachers to explore the main goals and processes of teaching and learning. Teachers may reflect on and revisit learning targets, curriculum, instructional practices, and standards as they utilize performance-based assessment.

Performance-based assessments have some distinct limitations as well:

1. Development of high-quality performance assessments can be a tedious process. Performance assessment needs careful planning and implementation. It is very time-consuming to construct good tasks.

2. Performance assessment requires a substantial amount of time to administer. A paper-and-pencil test takes 15-20 minutes per task to finish, depending on the number of items. Most authentic tasks take a number of days to complete.

3. Performance assessment takes a great deal of time to score. The more complex the process and performance, the longer you can expect to spend on scoring. To reduce scoring time, crafting high-quality rubrics is suggested.

4. Performance task scores may have lower reliability. This results from inconsistency in scoring by teachers who interpret observations quite differently.

5. Performance task completion may be discouraging to less able students. Some tasks that require students to sustain their interest for an extended time may discourage disadvantaged students.

Designing Meaningful Performance-Based Assessment

As we learned the nature of performance-based assessment, its characteristics, types, advantages, and limitations, the next step is to design it aligned to the learning goals. Focusing on the knowledge and skills targeted, you will need to think of some tasks which must be performed authentically. Clearly, comprehensive planning and designing of performance-based assessment should be taken into consideration.


Chapter Intended Learning Outcome

At the end of the chapter, you should be able to develop a portfolio of performance-based assessment tools that measure learners' competencies in a given subject.

Designing performance assessment entails critical processes which start from the tasks that the teacher wants to assess. A well-designed performance assessment helps students to see the connections between the knowledge, skills, and abilities they have learned in the classroom, including the experiences which help them construct their own meaning of knowledge.

The following steps will guide you in developing a meaningful performance assessment, for both process and product, that will match the desired learning outcomes.

1. Defining the Purpose of Assessment


The first step in designing performance-based assessment is to define the purpose of assessment. Defining the purpose and target of assessment provides information on what students need to perform in a given task. By identifying the purpose, teachers are able to easily identify the weaknesses and strengths of the students' performance. The purpose must be specified at the beginning of the process so that the proper kinds of performance criteria and scoring procedures can be established. Basic questions which teachers ask in determining possible learning competencies to be considered are listed below.

Herman (1992)

Basically, the teacher should select those learning targets which can be assessed by performance and which fit the plan, along with the assessment techniques to be utilized for measuring other complex skills and performance.

1.1. Four Types of Learning Targets Used in Performance Assessment

In defining the purpose of assessment, learning targets must be carefully identified and taken into consideration. Performance assessments primarily use four types of learning targets, which are deep understanding, reasoning, skills, and products (McMillan, 2007).

Deep Understanding

The essence of performance assessment includes the development of students' deep understanding. The idea is to involve students meaningfully in hands-on activities for extended periods of time so that their understanding is rich and more extensive than what can be attained by more conventional instruction and traditional paper-and-pencil assessments. This focuses on the use of knowledge and skills.

Reasoning

Reasoning is essential in performance assessment as the students demonstrate skills and construct products. Typically, students are given a problem to solve or are asked to make a decision or produce another outcome, such as a letter to the editor or a school newsletter, based on information that is provided.

Skills

In addition to logical and reasoning skills, students are required to demonstrate communication, presentation, and psychomotor skills. These targets are ideally suited to performance assessment.

Psychomotor Skills
Psychomotor skills describe clearly the physical action required for a given task. These may be developmentally appropriate skills or skills that are needed for specific tasks: fine motor skills (holding a pen, focusing a microscope, and using scissors), gross motor actions (jumping and lifting), more complex athletic skills (shooting a basketball or playing soccer), some visual skills, and verbal/auditory skills for young children. These targets also identify the level at which the skill is to be performed.
Generally, deep understanding and reasoning involve in-depth, complex thinking about what is known and the application of knowledge and skills in novel and more sophisticated ways. Skills include student proficiency in reasoning, communication, and psychomotor tasks.

Products

These are completed works, such as term papers, projects, and other assignments in which students use their knowledge and skills.

1.2 Process and Product-Oriented Performance-Based Assessments

In defining the purpose of assessment, the teacher should identify whether the students will have to demonstrate a process or a product. If the learning outcomes deal with procedures which you can specify, then the focus is on process assessment. In assessing the process, it is also essential that assessment be done while the students are performing the procedures or steps.

Learning targets which require students to demonstrate a process include the procedures for proper handling/manipulation of a microscope, or the steps to be done in an earthquake drill. Mathematical operations, reciting a poem, and constructing a table of specifications are other examples of this target.

Example of process-oriented performance-based assessment in which the main domain is Oral Language and Fluency (Enclosure No. 4, DepEd Order No. 73, s. 2012):

Example 1: English Grade 7

Content Standard: The students demonstrate oral language proficiency and fluency
in various social contexts.

Performance Standard: The learner proficiently renders rhetorical pieces.

Task: Oral – Aural Production (The teacher may use dialogs or passages from other written
or similar texts).

Specific Competencies:

1. Observe the right syllable stress pattern in different categories.


2. Observe the use of the rising and falling intonation, rising intonation, and the combination of
both intonation patterns in utterances.
3. Demonstrate how prosodic patterns affect understanding of the message.

Example 2: Filipino Grade 7

Kakayahan (domain): Pag-unawa sa Napakinggan


Pamantayang Pangnilalaman (Content Standard): Naipamamalas ng mga mag-aaral ang pag-unawa sa paksa ng akdang napakinggan.

Pamantayan sa Pagganap para sa aralin (Performance Standard): Ang mga mag-aaral ay nakasusulat ng talata na may kaugnayan sa paksa ng akdang napakinggan.

Kakayahan:

1. Nakapagbabahagi ng mga nasaliksik na impormasyon.


2. Nakapag-uugnay ng mga nasaliksik na impormasyon sa paksa ng akdang napakinggan.
3. Natutukoy ang ilang akda o awitin na may pagkakatulad sa paksa ng akdang napakinggan.

Usually, the learning objectives start with a general competency, which is the main target of the task, followed by specific competencies which are observable in the target behavior. This can also be observed in defining the purpose of assessment for product-oriented performance-based assessment.

Sometimes, even though you teach a specific process, the learning outcomes simply imply that the major focus is the product that the student produces. Nitko (2011) suggested focusing assessment on the product students produce if most or all of the evidence about their achievement of the learning targets is found in the product itself, and little or none of the evidence you need to evaluate students is found in the procedures they use or the way in which they perform.

Assessment of products must be done if the students may use a variety of ways to produce high-quality products; sometimes, the method or sequence does not make a difference as long as the product is the focus of the assessment.

Examples of learning targets which require students to produce products include building a garden, conducting classroom-based research, publishing a newspaper, and creating commercials or PowerPoint presentations.

In the given examples 1 and 2 for the English and Filipino Grade 7 domains, product-oriented performance-based assessment can be stated as:

 Use the correct prosodic patterns (stress, intonation, phrasing, pacing, tone) in rendering various speech acts or in oral reading activities, and
 Nakasusulat ng talatang nagsasalaysay ng ilang pangyayari sa kasalukuyan na may kaugnayan sa paksa ng akdang napakinggan.
Below is another example of product-oriented performance-based assessment task.

Example 3: Creating a Book Cover Taken from a Digital Camera

Performance Task: Creating A Book Cover

Competencies: The students should be able to:

1. Generate appropriate shots for book cover using digital camera;


2. Use a page layout software (MS Publisher) or presentation software (MS PowerPoint);
3. Estimate the sizes of images, shapes, and text boxes in terms of importance, emphasis, and visual hierarchy; and
4. Demonstrate skills in information design principles such as clarity, balance, relevance, contrast, alignment, repetition, and proximity.

Product-oriented competencies require students to demonstrate multiple levels of metacognitive skills, which require the use of complex procedural skills for creating authentic products. The discussion on the steps of designing performance-based assessment shall be focused on the process and product assessments.

2. Identifying Performance Tasks

Having a clear understanding of the purpose of assessment, the next step is to identify performance tasks which measure the learning target you are about to assess. Some targets imply that the tasks should be structured; others require unstructured tasks. Below are some questions that should be answered in designing tasks:

 What ranges of tasks do the learning targets imply?
 Which parts of the tasks should be structured, and to what degree?
 Does each task require students to perform all the important elements implied by the learning targets?
 Do the tasks allow me to assess the achievement dimensions I need to assess?
 What must I tell students about the task and its scoring to communicate to them what they need to perform?
 Will students with different ethnic and social backgrounds interpret my task appropriately?
(Nitko 2011)

Performance tasks need to be identified so that students may know what tasks are to be performed and by what criteria they will be judged. In this case, a task description must be prepared to provide a listing of the specifications of the task and to elicit the desired performance of the students. The task description should include the following:

1. Content and skill targets to be assessed


2. Description of the student activities
3. Group or individual
4. Help allowed
5. Resource needed
6. Teacher role
7. Administrative process
8. Scoring procedures

(McMillan 2007)

Tasks, on the other hand, should be meaningful and must let the student be personally involved in doing and creating the tasks. This could be done by selecting a task which has personal meaning for most of the students. Choose a task in which students have the ability to demonstrate knowledge and skills from classroom activities or other similar ways. These tasks should be of high value, worth teaching to, and worth learning as well.

In creating performance tasks, one should specify the learning targets, the criteria by which you will evaluate performance, and the instructions for completing the task. Include also the time needed to complete the tasks. Be sure students understand how long a response you are expecting. Some learning targets can be assessed in a relatively short period of 20 to 30 minutes, but it also depends on the learning targets, some of which necessitate a longer time. Examples are conducting an opinion survey and gathering data for research, which need more than two weeks and are done outside of class. With these activities, the results can support a valid generalization of how the students achieved the learning target.

The participation of groups must also be considered in crafting performance tasks. Some tasks require cooperative or collaborative learning or group work. With this, the number of tasks must be given attention as well; as a rule, the fewer the number of tasks, the fewer targets can be assessed in a given performance.

2.1 Suggestions for Constructing Performance Tasks


The development of high-quality performance assessments that effectively measure complex
learning outcomes requires attention to task development and to the ways in which
performances are rated. Linn (1995) suggested ways to improve the development of tasks:

1. Focus on learning outcomes that require complex cognitive skills and student
performances. Tasks need to be developed or selected in light of important learning outcomes.
Since performance-based tasks generally require a substantial investment of student time, they
should be used primarily to assess learning outcomes that are not adequately measured by
less time-consuming approaches.
2. Select or develop tasks that represent both the content and the skills that are central to important learning outcomes. It is important to specify the range of content and resources students can use in performing the task. In any event, the specification of assumed content understandings is critical in ensuring that a task functions as intended.
3. Minimize the dependence of task performance on skills that are irrelevant to the intended purpose of the assessment task. The key here is to keep the assessment focused on its intended purpose. For example, the ability to read complicated texts and the ability to communicate clearly are both important learning outcomes, but they are not necessarily the intent of a particular assessment.
4. Provide the necessary scaffolding for students to be able to understand the tasks and what is expected. Challenging tasks often involve ambiguities and require students to experiment, gather information, formulate hypotheses, and evaluate their own progress in solving a problem. However, problems cannot be solved in a vacuum. Students need to have the prior knowledge and skills required to address the problem. These prerequisites can be a natural outcome of prior instruction or may be built into the task.
5. Construct task directions so that the student's task is clearly indicated. Vague directions can lead to such a diverse array of performances that it becomes impossible to rate them in a fair or reliable fashion. By design, many performance-based tasks give students a substantial degree of freedom to explore, approach problems in different ways, and come up with novel solutions.
6. Clearly communicate performance expectations in terms of the criteria by which the
performances will be judged. Specifying the criteria to be used in rating performance helps
clarify task expectations for a student. Explaining the criteria that will be used in rating
performances not only provides students with guidance on how to focus their efforts, but
helps to convey priorities for learning outcomes.
Example of a Process-Oriented Performance Task on Problem Solving and Decision-Making:

Example 4 Problem-Solving and Decision-Making Performance Task

Key Competencies:

1. Uses reading skills and strategies to comprehend and interpret what is read.
2. Demonstrates competence in speaking and listening as tools for learning.
3. Constructs complex sentences.

Your friend is going through a difficult time. You have tried talking about the issue but to no avail. After much thought, you recall a book you had read where the character went through a similar experience as your friend. How might the book help your friend deal with the problem? What other sources of information or resources could you find to help your friend? What might be some strategies your friend could use? Use your writing skills to compose a letter to your friend as to why he should read the book or the resources you have collected. Be sure your letter contains examples from the readings, your feelings, and encouragement.

As a problem solver, devise a plan to meet with your friend to identify possible solutions to the problem after he has read the materials. Be sure you are considerate of his feelings and outline the steps you'll take to make sure your discussion is one of collaboration.

You will be assessed on your ability to make informed decisions, your ability to create a letter with complex sentences, your ability to solve problems, and your ability to work collaboratively with a peer.

Adapted from Educational Planning, Portland Public Schools

The example below shows a performance task for product-oriented performance-based assessment:

Competency: Prepare Useful Solution

Performance Task

Barangay Luntian is celebrating its 50th anniversary with the theme "Kalikasan Ko, Mahal Ko". The barangay captain called for a council meeting to discuss the preparations for the program. As a councilor, you are asked to take charge of the preparation of a "natural beverage" for the guests. This healthful drink should promote your locally produced fruits or vegetables as well as health and wellness. At your next council meeting, you will present your plan for the preparation of the drink and let the council members do the taste testing. The council members will rate your drink based on the following criteria: practicality, preparation, availability of materials, and composition of the solution (drink).

Crafting tasks for both process- and product-oriented performance-based assessments needs careful planning. Engagement, elaboration, and experience are some factors to consider in making authentic tasks, which make them different from traditional assessment. Tasks should also center on the concepts, principles, and issues that are important to the context of the subject matter. Moreover, teachers must know what they want to observe before performance criteria can be identified. Below is a checklist for writing good performance tasks:
Checklist for Writing Performance Tasks

✓ Are essential content and skills targets integrated?

✓ Are multiple targets included?

✓ Is the task authentic?

✓ Is the task teachable?

✓ Is the task feasible?

✓ Are multiple solutions and paths possible?

✓ Is the nature of the task clear?

✓ Is the task challenging and stimulating?

✓ Are criteria for scoring included?

✓ Are constraints for completing the task included?

McMillan (2007)
Regardless of whether these are process- or product-oriented performance tasks, clearly stated performance criteria are critical to the success of both instruction and assessment. Criteria, in the real essence of performance-based assessment, define the target process and product, guide and help the students on what should be taught and done, and provide a target in assessing the performance of the students.

3. Developing Scoring Schemes

There are different useful ways to record the assessment of students' performance. A variety of tools can be used for assessment depending on the nature of the performance it calls for. As a teacher, you need to critically examine the task to be performed and match it with the assessment tools to be utilized. Some ways of assessing the students' performance are the utilization of anecdotal records, interviews, direct observations using a checklist or Likert scale, and the use of rubrics, especially for performance-based assessment.

3.1 Rubrics as an Assessment Tool


Rubrics nowadays are widely used as assessment tools in various disciplines, most especially in the field of education. Different authorities have defined rubrics as follows:

 A set of rules specifying the criteria used to find out what the students know and are able to do (Musial, 2009)
 A scoring tool that lays out specific expectations for an assignment (Levy, 2005)
 A scoring guide that uses criteria to differentiate between levels of student proficiency (McMillan, 2007)
 Descriptive scoring schemes that are developed by teachers or evaluators to guide the analysis of products or processes of students' efforts (Brookhart, 1999)
 The scoring procedures for judging students' responses to performance tests (Popham, 2011)

A rubric that's used to score students' responses to a performance assessment has, at minimum, three important features:

 Evaluative criteria. These are the factors to be used in determining the quality of a student's response.
 Descriptions of qualitative differences for the evaluative criteria. For each evaluative criterion, a description must be supplied so qualitative distinctions in students' responses can be made using the criterion.
 An indication of whether a holistic or analytic scoring approach is to be used. The rubric must indicate whether the evaluative criteria are to be applied collectively in the form of holistic scoring or on a criterion-by-criterion basis in the form of analytic scoring.

(Popham, 2011)

Rubrics are used also to communicate how teachers evaluate the essence of what is being
assessed. Rubrics not only improve scoring consistency, they also improve validity by clarifying
the standards of achievement the teacher will use in evaluating. In the development and
scoring of rubrics, Nitko (2011) suggested some questions which the teacher should address:

 What important criteria and learning targets do I need to assess?
 What are the levels of development (achievement) for each of these criteria and learning targets?
 Should I use a holistic or an analytic scoring rubric?
 Do I need to use a rating scale or a checklist as my scoring scheme?
 Should my students be involved in rating their own performance?
 How can I make my scoring efficient and less time-consuming?
 What do I need to record as the result of my assessments?
 What are some useful methods of recording students' responses to performance tasks?

3.2 Types of Rubrics

The structure of a rubric changes when measuring different learning targets. Generally, rubrics can be classified into two major types: analytic and holistic rubrics.

Analytic Rubric. It requires the teacher to list and identify the major knowledge and skills which are critical in the development of process or product tasks. It identifies specific and detailed criteria prior to assessment. Teachers can easily assess specific concept understanding, skills, or products with a separate component. Each criterion in this kind of rubric receives a separate score, thus providing better diagnostic information and feedback for the students as a form of formative assessment.

Holistic Rubric. It requires the teacher to make a judgment about the overall quality of each student response. Each category of the scale contains several criteria which are given a single score that provides an overall rating. This gives a reasonable summary rating in which traits are efficiently combined and scored quickly with only one score, thus limiting the precision of the assessment results and providing little specific information about the performance of the students and what needs further improvement.
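To make the difference between the two rubric types concrete, here is a minimal scoring sketch in Python. It is only an illustration and is not drawn from the cited sources; the criteria names, the 1-4 point range, and the level labels are assumptions chosen for the example.

# A minimal sketch (assumed criteria and point range) contrasting analytic
# and holistic rubric scoring for a single student response.

# Analytic rubric: each criterion is scored separately (1-4 here),
# preserving diagnostic detail per criterion.
analytic_scores = {
    "thesis statement": 4,
    "use of evidence": 3,
    "organization": 2,
    "delivery": 3,
}

def report_analytic(scores, max_per_criterion=4):
    """Print each criterion score and the total, keeping the detail."""
    for criterion, score in scores.items():
        print(f"{criterion}: {score}")
    total = sum(scores.values())
    print(f"Total (analytic): {total} / {len(scores) * max_per_criterion}")
    return total

# Holistic rubric: the rater weighs all criteria together and records
# one overall level, which is faster but less diagnostic.
holistic_levels = {4: "Exemplary", 3: "Proficient", 2: "Marginal", 1: "Unacceptable"}

def report_holistic(level):
    """Print the single overall judgment for the response."""
    print(f"Overall (holistic): {level} - {holistic_levels[level]}")
    return level

report_analytic(analytic_scores)
report_holistic(3)

In practice, the analytic version feeds formative assessment because each criterion can be discussed with the student, while the holistic version is the quicker summary score described above.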

 Rubric Development
Stevens and Levi's Introduction to Rubrics (2005) enumerated the steps in developing a rubric. Basically, rubrics are composed of a task description, a scale, dimensions, and descriptions of the dimensions.

Task Description

The task description involves the performance of the students. Tasks can be taken from assignments, presentations, and other classroom activities. Usually, task descriptions are set when defining performance tasks.

 Community Development

Task Description; Each student will make a 10-minute presentation on his / her observations,
experiences, analysis and interpretation of developing community. Student may use his/her
own community as a sample and look into its changes over the past 10 years. He / She may
use any form or any focus of presentation, but it’s a must to have a thesis statement, not just
an exposition. The presentation should include table, graphs, photographs, maps, landmarks,
and conclusions for the audience.

 Scale

The scale describes how well or poorly any given task has been performed and determines to what degree the student has met a certain criterion. Generally, it is used to describe the level of performance. Below are some commonly used labels compiled by Huba and Freed (2000).

 Sophisticated, competent, partly competent, not yet competent


 Exemplary, proficient, marginal, unacceptable
 Advanced, intermediate high, intermediate, novice
 Distinguished, proficient, intermediate, novice
 Accomplished, average, developing, beginning

Dimensions

This is a set of criteria which serves as the basis for evaluating student output or performance. The dimensions of a rubric lay out the parts of the task and how it is divided into its important components, which also serve as the basis for scoring the students.


Description of the Dimensions

Dimensions should contain descriptions of the levels of performance as standards of excellence, accompanied by examples. This allows both the teachers and the students to identify the level of expectation and which dimension must be given emphasis.


4. Rating the Performance

This is the final step in performance-based assessment: determining the learning outcomes of the students. The main objective in rating the performance is to be objective and consistent. Be sure also that the scoring system is feasible. In most classroom situations, the teacher is both the observer and the rater. If there are important instructional decisions to be made, additional raters must be considered in order to make scoring more fair.

Since performance-based assessment involves professional judgment, some common errors in rating should be avoided: personal bias and the halo effect. McMillan (2007) stated that personal bias results in three kinds of error: a generosity error occurs when the teacher tends to give higher scores; a severity error results when the teacher uses the low end of the scale and underrates student performances; and a central tendency error occurs when students are rated in the middle. On the other hand, the halo effect occurs when the teacher's general impression of a student affects the scores given on individual traits or performances.
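To illustrate the three kinds of personal bias named above, the short sketch below checks where one rater's scores cluster on a 1-to-5 scale. The ratings and the cut-off values are invented for the example; they are not standards taken from McMillan (2007).

# A minimal sketch (invented ratings and thresholds) that flags a possible
# generosity, severity, or central tendency error in one rater's scores.
from statistics import mean, pstdev

ratings = [4, 5, 5, 4, 5, 4, 5, 5]  # one teacher's scores on a 1-5 scale

average = mean(ratings)
spread = pstdev(ratings)

if average >= 4.5:
    print("Possible generosity error: scores cluster at the top of the scale.")
elif average <= 1.5:
    print("Possible severity error: scores cluster at the bottom of the scale.")
elif 2.5 <= average <= 3.5 and spread < 0.5:
    print("Possible central tendency error: nearly all scores sit in the middle.")
else:
    print("No obvious clustering; scores are spread across the scale.")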

Students, on the other hand, can assess their own progress. Student participation need not be limited to the use of assessment instruments. It is also useful to have students help develop the instrument. In some practices, students rate themselves and compare their ratings with those of the teacher-in-charge. With this, the teacher can elaborate and explain to each student the reasons for the rating and discuss the gap between the ratings, most especially in an individual conference.

Follow-up conferences, peer evaluation, and self-evaluation of output enable teachers to better understand curriculum and instructional learning goals and the progress being made towards the achievement of those goals. These can also better diagnose the strengths and limitations of the students and, most importantly, develop the self-reflection and analysis skills of both the teachers and the students.

Affective Learning Competencies

Affective learning outcomes have been recently introduced into our modern pedagogy. They require students to reflect on and personalize/actualize their experience in their own lives, hence the word "affective," meaning to "affect" a learner's life in some way.

Using the affective domain to select targeted teaching techniques might help foster the development of key beliefs and values that underlie the advanced competencies (and sub-competencies). As educators, our aim is not only to impart knowledge, attitudes, and skills, but to impact the daily behavior of our graduates. (Yanofsky & Nyquist, 2014)
According to the statement above, the target is the core values of our learners and how they behave. William James also stated that emotion and cognition are inextricably linked and perhaps never entirely separate, distinctive, nor pure.

If we go back to behaviorism, it is said that a person or even an animal can be trained under a condition where there is reinforcement for a schema to be embedded deeply. These schemas are considered the learning, and the stimuli are considered the "affective" instructions that teachers intend to apply.

Through this, there will be a deep connection between the topic and the students' lives. For example, instead of just teaching the topic "Literary Devices" in a traditional way, it would be much more preferable to reinforce it with an affective activity or agreement. Students would instead be tasked to write a poem for their loved ones or to describe their family using similes or metaphors. This way, there will be an emotional bond tied to the student's knowledge of a certain topic.

"It is often the evaluation that is stored in memory, often without corresponding conditions and affect that were responsible for its formation." (Robert Scholl, University of Rhode Island, 2012)
Giving students the "need" for why they should study, evaluate, and reflect on their lessons is the most effective and holistic way of learning. The question that will pop up is "Why do we need this?" It is simply because we can spark the motivation within the student (either intrinsic or extrinsic) to push their boundaries within that certain topic through their own self-driven motivation. In the traditional setting, which we also call the "banking concept of education," students are merely deposited with historical, linguistic, and scientific facts. This process is somewhat dull and empty from the perspective of students. Unfortunately, this concept pushes them to cope and memorize the topics just for the sake of passing a particular subject.

Furthermore, assignments already existed before this time, tasking the students to perform something within a limited number of hours. The process was mostly made up of recollection of learning instead of application. For example, in the field of English, the teacher taught conjunctions. Generally, as we know them, conjunctions are words that bridge phrases in order to form a sentence. In the traditional manner, the teacher would simply give out phrases and let the students find out for themselves which are the proper conjunctions to be used. On the contrary, in applying the affective-learning competency, the teacher would tap into the personal schema or various stimuli in order for the students to have a connection with the topic at hand (even if it is just a simple grammar lesson).

Another benefit of the affective-learning competency is bringing the English lessons (which obviously cannot be touched) closer to realia by actualizing what has been learned in the world. It gives the students "the real thing" by incorporating it once, twice, or even a dozen times in their lives. This is aligned with the Natural Approach to language acquisition, which says that language can be naturally acquired if there is a "need" to acquire it. Thus, the lessons for the affective-learning competency should be, as much as possible, relatable to the real world. The teacher should create the most meaningful learning possible and let his or her learners use it as a means in their daily lives.

Executing this strategy may take some time and effort on the part of the teacher. First and foremost, the teacher should prepare ahead of time to make the topic as easy to comprehend as possible. In his or her DLL or lesson plan, the teacher should give a positive impact or motivation as he or she starts the class. That way, the students will find the lesson memorable and will create meaning in their learning. The teacher should then remember to create a balance between the psychomotor and cognitive objectives; this way, the students will be equipped with the proper know-how to perform their tasks (whether in grammar or in literature). After setting specific objectives, delivering meaningful motivation, and delivering comprehensible lessons, the teacher should proceed with the affective domain, the part already discussed, which gives the students the need to use the lesson at hand.

There are also difficulties and barriers which hinder the application of this strategy. One of these is the mood of the classroom or of your students.

There will be times when the mood inside the classroom is grim and chaotic. These instances cannot be avoided and will sometimes occur even for teachers who are most proficient in handling their classes. We must keep in mind that no matter how intelligent, talented, or witty your students are, they will never learn in an uncomfortable environment. This negative environment usually occurs when a teacher is very strict and is always in a bad mood. It also occurs when the teacher spends most of his/her time scolding the students for the smallest of mistakes that they make.

According to my short interview with the student-teachers of Capiz State University, this happened to one of their co-student-teachers. It reached a point where the class was devoid of all affective learning and instead turned into three months of spending most of the time scolding the students. This resulted in the students' negligence towards their subjects and disrespect towards their teacher, which proves that affective learning does not thrive in negative environments, because the lessons being given have become the opposite of the word "meaningful".

Still, in the short interview that I conducted, one of the CapSU student-teachers reported that he heavily implemented the affective-learning competency during his practice teaching. The first step he took was to create different motivations for each meeting of his class, making it more exciting as each day passed. Through this, he created a "positive" classroom environment. Next, he created unique assessments that slightly deviated from the standard activities given by their textbook, giving the topics a more personalized touch. He also created a multiple-intelligence affective assessment that was suited to the individual differences of his students, making them more comfortable in terms of their capacity to learn. Lastly, they had an agreement (assignment) that directly tapped into their emotions in order for them to create something from what they had learned. This resulted in a deep connection between the given topic and the students, proving that the affective-learning competency is effective in the field of teaching. It was a stress-free, meaningful, and holistic experience for the students first-hand. The downside is that the activities take a long time to develop and can be a bit stressful for the teacher: developing new learning materials, unique motivations, unique assessment tools, and multiple rubrics that would satisfy the demands of the prepared material.

In the example given, the tasks were mostly non-cognitive variables which tapped into the students' attitudes, interests, and values. According to William James Popham (2003), teachers should assess affect to remind themselves that there is more to being a successful teacher than helping students obtain high scores and achievement. As Filipinos, we always hear the phrase "Grades and knowledge will only get you to a certain point; what matters is your attitude and how you treat other people." This is the local philosophy, if not a universal belief, that developing one's attitude can bring prosperity in the future. According to Fraser (1994), students are more proficient in problem-solving if they enjoy what they do. A positive environment fosters better student engagement and learning than a classroom with a negative climate.
This could also contribute greatly to the psychological aspect of the students. Being multicultural (originating from many local cultures), students have problems in each of their lives. These problems, if left unchecked, can result in loss of motivation. This factor will eventually result in rebellion or a student dropping out of class. A negative classroom will deteriorate the morale of that particular troubled student, which will result in his/her disinterest in the classroom. On the other hand, if the classroom has a positive environment, the student will set aside his/her struggles and the classroom will start to become a "safe haven" for that troubled student.

In terms of benefits, there are a lot of aspects that can be developed through the use of
this competency. First is the ability to collaborate with others. The students will be able to
exercise their social skills and express themselves in a manner where they’ll be confident
about who they are and what they believe in. Second is the capability to stand on what
they believe in and resolve conflicts in different scenarios. Third is the innate trait of
kindness and empathy towards their peers or even to a complete stranger. This is all due
to the focused development of the affective aspect of the students.

Talking about the downside of this strategy, it is focused more on the "attitude" part rather than on relentlessly memorizing every fact in the textbook. Focusing heavily on this strategy would not necessarily deprive students of their capability on objective assessments, but it may slightly degrade their capacity to point out some specific facts. Although the outcomes are subjective (depending on the capability of the teacher to maintain this type of instruction), the students still have a high likelihood of going beyond their institution's standards.

This strategy can be developed in the first few years of a new teacher, or of a teacher who has just recently implemented the strategy, and can be revised annually so that he/she would not have to create another set of instruction all over again, unless the Department of Education decides to change the curriculum for the current educational program.

Summing up everything that was stated prior to this part, the affective-learning competency aims to develop the very foundations of a student's norms along with their objective learning. Specifically, it aims to develop their social skills as well as their knowledge of different subject matters. Before proceeding with this strategy, the teacher should first make a pre-assessment of a student's background, both psychological and subject matter. This can be significantly achieved through a positive classroom environment and through proper assessment planning and consideration.

The affective-learning competency should not be disregarded in our modern pedagogy because this is the part of the lesson where we teach our students "humanity," the act of being human, the part which separates humans from animals. Thus, I highly recommend this strategy and would like to remind all future teachers to never take the "affective" part of our lessons for granted, because this is the fragment of memory that your students will probably recollect even after decades have passed.

Development of Affective Assessment Tools

The relevance of affective targets and attitude traits, and how these concepts are related to student learning, were discussed in the preceding chapter. Assessment of the affective domain is one of the requirements of the 21st century teaching-learning proposition. A holistic approach is required so as to have a meaningful evaluation of student learning. Both traditional and authentic assessment tools are to be utilized to come up with good, quality results. There are various instruments or tools that can be used, but each has its own focus and each instrument is designed to cater to a specific purpose. In this chapter are the various methods and assessment tools that can be used to assess the affective domain of learners. Samples are provided to help you craft your own affective assessment tools.

Chapter Intended Learning Outcome

At the end of this chapter, you should be able to develop instruments for assessing
affective learning.

The cognitive and affective domains are inseparable aspects of a learner. Each completes the other with respect to the learner's important domains. Proper, ongoing assessment of the affective domain, covering students' attitudes, values, dispositions, and ethical perspectives, is essential in any effort to improve academic achievement and the quality of the educational experience provided. Unfortunately, the practice of routinely assessing learners' affective constructs is often left behind, and focus is given most of the time to assessing learners' cognitive aspect. In addition, unlike the cognitive domain, fewer assessment tools are available for the affective constructs.

1. Methods of Assessing Affective Targets

There are three feasible methods of assessing affective traits and dispositions. These methods are teacher observation, student self-report, and peer ratings (McMillan, 2007). Since affective traits are not directly observable, they must be deduced from behaviour or from what students say about themselves and others. There are a variety of psychological measures that assess affective traits, but due to the sophistication of such instruments, classroom teachers rarely use them. Instead, teachers' own observations and students' self-reports are mostly used.

There are three considerations in assessing affect. These are:

1. Emotions and feelings change quickly, most especially for young children and during early adolescence. This means that to obtain a valid indication of an individual student's emotion or feeling, it is necessary to conduct several assessments over a period of time. A single assessment is not enough to see what the prevalent affect is; it needs to be repeated several times.
2. Use as many varied approaches to measuring the same affective trait as possible. It is better not to rely on a single method because of the limitations inherent in each method. For example, students' self-reports may be faked and hence may significantly distort the results. (However, if the self-reports are consistent with the teacher's observation, then a stronger case can be made.)
3. Decide what type of data or results is needed: individual or group data? Consideration of the purpose of the assessment will influence the method that must be used. For reporting or giving feedback to parents or interested individuals about the learner, individual student information is necessary. Thus, multiple methods of collecting data over a period of time, and keeping records to verify the judgements made, are appropriate. If the assessment is to improve instruction, then results for the group or whole class are more appropriate to use. This is one of the uses of affective assessment. It is more reliable to use anonymous student self-reports.

1.1 Teacher Observation

Teacher observation is one of the essential tools for formative assessment. However, in
this chapter, the emphasis is on how to use this method so that teachers can make more
systematic observations to record student behaviour that indicates the presence of
targeted affective traits.

In using observation, the first thing to do is determine in advance how specific behaviours relate to the target. It starts with a vivid definition of the trait, followed by a list of student behaviours and actions, identified initially by listing what students with positive and negative attitudes do and say. Classify those and create a separate list of the positive student behaviours and another list for the negative student behaviours. These lists will serve as the starting point of what will be observed. Contained in the table below are some possible student behaviours indicating positive and negative attitudes toward learning.

Student Behaviours Indicating Positive and Negative Attitudes Toward Learning

POSITIVE | NEGATIVE
Rarely misses class | Is frequently absent
Is rarely late to class | Is frequently tardy
Asks lots of questions | Rarely asks questions
Helps other students | Rarely helps other students
Works well independently without supervision | Needs constant supervision
Is involved in extracurricular activities | Is not involved in extracurricular activities
Says he or she likes school | Says he or she doesn't like school
Comes to class early | Rarely comes to class early
Stays after school | Rarely stays after school
Volunteers to help | Doesn't volunteer
Completes homework | Often does not complete homework
Tries hard to do well | Doesn't care about bad grades
 | Never does extra credit work
Completes assignments before they are due | Never completes assignments before the due date
Rarely complains | Complains
Is rarely off-task | Sleeps in class
Rarely bothers other students | Bothers other students
 | Stares out the window

These behaviors provide the foundation for developing guidelines, checklists, or rating scales. The positive behaviors are called approach behaviors, while the negative ones are termed avoidance behaviors. Approach behaviors result in more direct, more frequent, and more intense contact, while avoidance behaviors result in less. These dimensions are helpful in describing the behaviors that indicate positive and negative attitudes.

These behaviors may serve as vital input on how to perform observation, particularly teacher observation.

McMillan (2007) suggested that the best approach is to develop a list of positive and negative behaviors. Although published instruments are available, the unique characteristics of a school and its students were not considered when these instruments were developed.

After the list of behaviors has been developed, the teacher needs to decide whether to use an informal, unstructured observation or a formal, structured one. These two types differ in terms of preparation and what is recorded.

1.1.1 Unstructured Observation

Unstructured (anecdotal) observation may also be used for the purpose of making summative judgements. It is normally open-ended; no checklist or rating scale is used, and everything observed is simply recorded. In using unstructured observation, it is necessary to have at least some guidelines and examples of behaviors that indicate the affective trait. Thus, it is a must to determine in advance what to look for; however, observation should not be limited to what was predetermined, as it also needs to be open to other actions that may reflect the trait.

Unstructured observation is more realistic, which means teachers can record everything they have observed and are not limited by what is contained in a checklist or rating scale.

1.1.2 Structured Observation

Structured observation is different from unstructured observation in terms of the preparation needed as well as in the way observation is recorded. In structured observation, more time is needed since checklists or rating forms have to be made, as these will be used to record observations. The form is generated from a list of positive and negative behaviors to make recording easy and convenient.

Below are the things that should be considered if the teacher observation method will be used to assess affect.

 Determine the behaviors to be observed in advance.
 Record important data such as time, date, and place.
 If unstructured, record brief descriptions of relevant behaviour.
 Keep interpretations separate from descriptions.
 Record both positive and negative behaviors.
 Have as many observations of each student as necessary.
 Avoid personal bias.
 Record the observations immediately.
 Apply a simple and efficient procedure.

1.2 Student Self-Report

There are varied ways for students to express their affect through self-report. The most common and direct way is through a casual conversation or interview. Students can also respond to a written questionnaire or survey about themselves or other students.

1.2.1 Student Interview

There are different types of personal communication that teachers can use with their students, like individual and group interviews, discussions, and casual conversations, to assess affect. It is similar to observation, but here teachers have the opportunity for direct involvement with the student, wherein they can probe and respond for better understanding.

1.2.2 Surveys and Questionnaire

The second type under self-report method is questionnaires and surveys. The two types of
format using questionnaires and surveys are: (a) Constructed-Response format; and (b)
Selected-Response format.

Constructed-Response format

It is a straightforward approach of asking students about their affect by having them respond to a simple statement or question. Another way to implement the constructed-response format is by means of an essay. Essay items provide more in-depth and extensive responses than simple short sentences. Reasons for students' attitudes, values, and beliefs are expressed better using essays.

Selected-Response format

There are three ways of implementing the selected-response format in assessing affective learning outcomes. These are the rating scale, the semantic differential scale, and the checklist.

The advantage of selected-response formats is that they assure anonymity. This is an important aspect when considering traits that are personal, such as values and self-concept. These self-response formats are considered an efficient way of collecting information.

Checklist for Using Student’s Self-Response to Assess Affect (McMillan, 2007):

 Keep measures focused on specific affective traits


 Establish trust with students
 Match response format to the trait being assessed
 Ensure anonymity if possible
 Keep questionnaires brief
 Keep items short and simple
 Avoid negatives and absolutes
 Write items in present tense
 Avoid double-barreled items

1.3 Peer Ratings

Peer rating or appraisal is the least common among the three methods of assessing affect discussed in this chapter. Because of the nature of learners, they do not always take this activity seriously, and more often than not they are subjective in conducting peer rating. Thus, peer rating is seen as relatively inefficient in terms of conducting, scoring, and interpreting the ratings. However, teachers can accurately observe what is being assessed in peer ratings since teachers are very much engaged and present inside the classroom and thus can verify the authenticity of the results of peer rating. The two methods of conducting peer ratings are: (a) the guess-who approach; and (b) the socio-metric approach. These approaches can be used together with observations and self-reports to strengthen assessment of interpersonal and classroom environment targets.

2. Utilizing the Different Methods or a Combination of Methods in Assessing Affect

Each of the three methods (observation, self-report, peer ratings) discussed previously has its own advantages and disadvantages. In choosing which method or methods to use, consider the following factors:

2.1 The type of affect that needs to be assessed;

A general reaction to something or someone can best be gathered through observation. However, if attitude components are to be diagnosed, a self-report will give better information. Observation can be supported by the peer rating method if the target is a socially-oriented affect.

2.2 Whether the information needed is from group or individual responses; and

If group responses and tendencies are needed, the selected-response self-report method is suited because it assures anonymity and is easily scored.

2.3 The use of the information

If the intention of the affective assessment is to use the results as a supporting input to
grading, then multiple approaches are necessary; be mindful of the possibility of faked
results from self-reports and even from peer judgments.

3. Affective Assessment Tools

The affective domain encompasses behaviors in terms of attitudes, beliefs, and feelings.
Sets of attitudes, beliefs, and feelings comprise one's values. There are various assessment
tools that can be used to measure affect.

3.1 Checklist

A checklist is one of the effective assessment strategies for monitoring specific skills,
behaviors, or dispositions of individual students or groups of students (Burke, 2009).

Checklists contain criteria that focus on the intended outcome or target. Checklists help
students organize the tasks assigned to them into logically sequenced steps that will lead
to successful completion of the task. For teachers, a criteria checklist can be used for
formative assessment by giving emphasis to specific behaviors, thinking skills, social
skills, writing skills, speaking skills, athletic skills, or whatever outcomes are to be
measured and monitored. Checklists can be used for individual or group cases, as sketched
in the example below.
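
Below is a minimal sketch, in Python, of how observations against a criteria checklist
might be recorded and tallied. The criteria and student names are hypothetical and only
illustrate the idea of marking observed behaviors.

# Hypothetical criteria checklist for a group activity
criteria = [
    "Listens attentively during group discussion",
    "Contributes at least one idea to the group",
    "Uses respectful language with peers",
]

# One row per student: True if the behavior was observed, False otherwise
observations = {
    "Student A": [True, True, False],
    "Student B": [True, False, False],
}

for student, marks in observations.items():
    met = sum(marks)
    print(f"{student}: {met}/{len(criteria)} criteria observed")
    for criterion, observed in zip(criteria, marks):
        print(f"  [{'x' if observed else ' '}] {criterion}")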

3.1.1 Criteria for Checklists


In planning the criteria that will be used in checklists, the criteria must be aligned with
the outcomes that need to be observed and measured. Generally, a criterion is defined as a
standard that serves as a reference for a judgment or decision. Popham (1999) explains that
when teachers set criteria, the main emphasis is to use these criteria in making judgments
regarding the adequacy of student responses, and the criteria will influence the way the
response is scored.

3.1.2 Why Use Checklists

Checklists should be utilized because they:

1. Provide a quick and easy way to observe and record skills, criteria, and behaviors prior
to a final test or summative evaluation.
2. Provide information to teachers about students who need help, so as to avoid failing them.
3. Provide formative assessment of students' learning and help teachers monitor whether
students are on track with the desired outcomes.

3.2 Rating Scale

According to Nitko (2001), rating scales can be used for both teaching purposes and
assessment:

1. Rating scales help students understand the learning targets/outcomes and focus students'
attention on performance.
2. A completed rating scale gives specific feedback to students on their strengths and
weaknesses with respect to the targets against which they were measured.
3. Students not only learn the standards but may also internalize them.
4. Ratings help to show each student's growth and progress.

Example: Rating Scale (Attitude towards Mathematics)

Directions: Put the score in the column for each of the statements as it applies to you.
Use 1 to 5, 1 being the lowest and 5 the highest possible score.

Statement                                          Score
1. I am happy during Mathematics class.            ____
2. I get tired doing board work and drills.        ____
3. I enjoy solving word problems.                  ____

3.2.1 Types of Rating Scales

The most commonly used types of rating scales are:

Numerical Rating Scales

A numerical rating scale translates judgments of quality or degree into numbers. To increase
the objectivity and consistency of results from numerical rating scales, a short verbal
description of the quality level of each number may be provided.

Example:

To what extent does the student participate in team meetings and discussions?

1 2 3 4

Descriptive Graphic Rating Scales

A better format for rating is the descriptive graphic rating scale, which replaces ambiguous
single words with short behavioral descriptions of the various points along the scale.

Example:

To what extent does the student participate in team meetings and discussions?

Never participates;     Participates as much as     Participates more than
quiet, passive          other team members          any other team member

Comment(s):
______________________________________________________________________________

3.2.2 Common Rating Scale Errors

Listed below are the common rating scale errors that teachers and students must be familiar
with in order to avoid committing them during assessment.

Leniency Error – Occurs when a teacher tends to make almost all ratings toward the high end
of the scale, avoiding the low end of the scale.

Severity Error – Occurs when a teacher tends to make almost all ratings toward the low end
of the scale. This is the opposite of the leniency error.

Central Tendency Error – Occurs when a teacher hesitates to use the extremes and uses only
the middle part of the scale.

Halo Effect – Occurs when a teacher lets his/her general impression of the student affect
how he/she rates the student on specific dimensions.

Personal Bias – Occurs when a teacher has a general tendency to use inappropriate or
irrelevant stereotypes, favoring boys over girls, students from rich families over those
from middle-income families, etc.

Logical Error – Occurs when a teacher gives similar ratings to two or more dimensions that
the teacher believes to be related when in fact they are not related at all.

Rater Drift – Occurs when raters, whose ratings originally agreed, begin to redefine the
rubrics for themselves.

3.3 Likert Scale

Another simple and widely used self-report method of assessing affect is the Likert scale,
in which a list of clearly favorable and unfavorable attitude statements is provided and
students are asked to respond to each statement.

A Likert scale uses a five-point scale: Strongly Agree (SA); Agree (A); Undecided (U);
Disagree (D); and Strongly Disagree (SD).

The scoring of a Likert scale is based on assigning weights from 1 to 5 to each position on
the scale (a short scoring sketch follows the example below). In using an attitude scale, it
is best to ask for anonymous responses. In interpreting the results, it is important to keep
in mind that these are the verbal expressions, feelings, and opinions that individuals are
willing to report.

Example: Likert Scale


Directions: Put a check in the column for each of the statements that applies to you.
Legend: SA – Strongly Agree, A – Agree, U – Undecided, D – Disagree, SD –
Strongly Disagree

(SA) (A) (U) (D) (SD)


5 4 3 2 1

1. I am happy during Mathematics class.

2. I get tired doing board work and drills.

3. I enjoy solving word problems.
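
Below is a minimal Python sketch of the weighted scoring described earlier, using the
hypothetical responses of one student to the three items above. The assumption that item 2
is negatively worded (and therefore reverse-scored) is an illustration, not a rule from the
text.

# Weights for each scale position
weights = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}
negative_items = {2}   # e.g., "I get tired doing board work and drills."

responses = {1: "A", 2: "D", 3: "SA"}   # one student's anonymous responses (hypothetical)

total = 0
for item, answer in responses.items():
    score = weights[answer]
    if item in negative_items:          # reverse-score unfavorable statements
        score = 6 - score
    total += score

print(f"Attitude score: {total} out of {5 * len(responses)}")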

3.3.1 Constructing Likert Scale Instrument

Below are the steps in constructing Likert scale instrument:

1. Write a series of statements expressing positive and negative opinions toward attitude
object.
2. Select the best statements expressing positive and negative opinions and edit as
necessary.
3. List the statements combining the positive and negative and put the letters of the five-
point scale to the left of each statement for easy marking.
4. Add the directions, indicating how to mark the answer and include a key at the top of the
page if letters are used for each statement.
5. Some prefer to drop the undecided category so that respondents will be forced to
indicate agreement or disagreement.

3.4 Semantic Differential Scale

Another common approach to measuring affective traits is to use variations of the semantic
differential. These scales use adjective pairs that provide anchors for feelings or beliefs
that are opposite in direction and intensity. Students place a check between each pair of
adjectives at the point that describes the positive or negative aspect of the trait.

Example: Traits/attitude toward the Mathematics subject

Mathematics

Boring    __ __ __ __ __  Interesting
Important __ __ __ __ __  Useless


An advantage of the semantic differential, like other selected-response formats, is that it
makes it easier to assure anonymity. Anonymity is important when the traits are more
personal, such as values and self-concept. It is also an efficient way of collecting
information. Efficient as it may be, note that it is not good to ask too many questions; it
is important to carefully select the traits that are included in the defined affective
targets or outcomes. It is also good practice to include open-ended items such as
“comments” or “suggestions”.

3.5 Sentence Completion

The advantage of using the incomplete-sentence format is that it captures whatever comes to
mind from each student. However, there are disadvantages too. One is that students may fake
their responses, thinking that the teacher will recognize their penmanship, and hence tend
to give answers they believe the teacher would like. Another is scoring, which takes more
time and is more subjective than the other traditional objective formats.

Examples:
I think Mathematics as a subject is ________________________________.
I like my Mathematics teacher the most because ______________________.

THE PORTFOLIO
Documentation Portfolio
– This approach involves a collection of work over time showing growth and improvement
reflecting students’ learning of identified outcomes. This portfolio is also called a “growth
portfolio” in the literature. The documentation portfolio can include everything from
brainstorming activities to drafts to finished products. The collection becomes meaningful
when specific items are selected to focus on particular educational experiences or goals. It
can include the best and weakest of student work. It is important to realize that even
drafts and scratch papers should be included in the portfolio, for they actually demonstrate
the growth process that the students have been through.

Process Portfolio
– The process portfolio demonstrates all facets or phases of the learning process. As such,
these portfolios contain an extensive number of reflective journals, think logs and other
related forms of metacognitive processing. They are particularly useful in documenting
students’ overall learning process. It can show how students integrate specific knowledge
or skills and progress towards both basic and advanced mastery.

Showcase Portfolio
– The showcase portfolio only shows the best of the students’ outputs and products. As
such, this type of portfolio is best used for summative evaluation of students’ mastery of
key curriculum outcomes. It should include students’ very best work, determined through
a combination of student and teacher selection. Only completed work should be included.
In addition, this type of portfolio is especially compatible with audio-visual artifact
development, including photographs, videotapes, and electronic records of students’
completed work. The showcase portfolio should also include written analysis and
reflections by the student upon the decision-making process(es) used to determine which
works are included.

2) What are the essential elements of a portfolio? Describe each.

-ESSENTIAL ELEMENTS-
1. Cover Letter – “About the author” and “What my portfolio shows about my progress as a
learner” (written at the end, but put at the beginning). The cover letter summarizes the
evidence of a student’s learning and progress.
2. Table of Contents with numbered pages.
3. Entries – both core (items students have to include) and optional (items of the student’s
choice). The core elements will be required for each student and will provide a common base
from which to make decisions on assessment. The optional items will allow the folder to
represent the uniqueness of each student. Students can choose to include “best” pieces of
work, but also a piece of work which gave trouble or one that was less successful, and give
reasons why.
4. Dates on all entries, to facilitate proof of growth over time.
5. Drafts of aural/oral and written products and revised versions; i.e., first drafts and
corrected/revised versions.
6. Reflections can appear at different stages in the learning process (for formative and/or
summative purposes) and at the lower levels can be written in the mother tongue or by
students who find it difficult to express themselves in English.

4) What type of portfolio are you required to submit in this course? Justify your
answer.
A process portfolio, because making this portfolio requires a process to create and submit
it. It is a process portfolio because we gain mastery of the different areas of the teaching
profession through the episodes given by my course facilitator.

Steps Needed for Developing Portfolio Assessment

The design and use of a portfolio begins with a clear description of your focus: the
questions ‘Why do I want a portfolio?’ and ‘What learning targets and curriculum goals will
it serve?’

Preparing to Use the Portfolio

1. Who will construct the portfolio?

 Individual students with teacher input and help
 Individual students with the input and help of cooperative learning groups
 Cooperative base groups (whole-group work) with the teacher's input and help

2. What type of portfolio do you want to use?

3. What are the purposes and objectives of the portfolio?


4. What categories of work samples should go into the portfolio?

5. What criteria will students or groups use to select their entries?

6. Who will develop the rubrics to assess and evaluate portfolios?

Below is an example of assessing reading skills performance, which shows the alignment of
the teaching and learning goal, activities, and assessment tasks, including portfolio
evidence.
Sample Classroom Assessment

Goal: Decode (basic reading skills)
Activity: Read simple texts
Portfolio evidence: Word bank, selected ‘texts I can read’, reading on a cassette
Tools: Individual progress report, peer compliment, checklists, rating scales

Goal: Appreciate literature, understanding characters and themes
Activity: Semi-extended reading activities (both guided and independent learning)
Portfolio evidence: Reading logs, reading journal, book tasks, cassette, video clips, artwork
Tools: Self/peer assessment checklists

Goal: Reading for pleasure (extensive reading)
Activity: Sustained silent reading in class as well as at home
Portfolio evidence: A log of books, creative tasks and comment cards
Tools: Teacher's record of the student's reading: rating scale relating to content,
presentation and language

Identify the physical structure

Once the purpose and targets have been clarified, we need to think of the physical structure
of the portfolio. Some practical questions affect the successful use of the portfolio in your
classroom.

 What will it look like?
 Where will the students place the outputs?
 What type of container is appropriate?
 Do they need file folders? Clear books? Plastic bins?
 How are the materials to be organized?
 Where can students store the portfolios for easy access?
Determine the appropriate organization and sources of content

The content of the portfolio consists of entries which provide assessment information about
the content and processes identified in the dimensions to be assessed. These are naturally
artifacts derived from the different learning activities. The range of samples is extensive
and must be determined to some extent by the subject matter and the instruction.

THE USE OF PORTFOLIO ASSESSMENT IN EVALUATION

Meg Sewell, Mary Marczak, & Melanie Horn

WHAT IS PORTFOLIO ASSESSMENT?

In program evaluation, as in other areas, a picture can be worth a
thousand words. As an evaluation tool for community-based
programs, we can think of a portfolio as a kind of scrapbook or
photo album that records the progress and activities of the
program and its participants, and showcases them to interested
parties both within and outside of the program. While portfolio
assessment has been predominantly used in educational settings
to document the progress and achievements of individual children
and adolescents, it has the potential to be a valuable tool for
program assessment as well.

Many programs do keep such albums, or scrapbooks, and use


them informally as a means of conveying their pride in the
program, but most do not consider using them in a systematic
way as part of their formal program evaluation. However, the
concepts and philosophy behind portfolios can apply to
community evaluation, where portfolios can provide windows into
community practices, procedures, and outcomes, perhaps better
than more traditional measures.

Portfolio assessment has become widely used in educational
settings as a way to examine and measure progress, by
documenting the process of learning or change as it occurs.
Portfolios extend beyond test scores to include substantive
descriptions or examples of what the student is doing and
experiencing. Fundamental to "authentic assessment" or
"performance assessment" in educational theory is the principle
that children and adolescents should demonstrate, rather than tell
about, what they know and can do (Cole, Ryan, & Kick, 1995).
Documenting progress toward higher order goals such as
application of skills and synthesis of experience requires obtaining
information beyond what can be provided by standardized or
norm-based tests. In "authentic assessment", information or data
is collected from various sources, through multiple methods,
and over multiple points in time (Shaklee, Barbour, Ambrose, &
Hansford, 1997). Contents of portfolios (sometimes called
"artifacts" or "evidence") can include drawings, photos, video or
audio tapes, writing or other work samples, computer disks, and
copies of standardized or program-specific tests. Data sources can
include parents, staff, and other community members who know
the participants or program, as well as the self-reflections of
participants themselves. Portfolio assessment provides a practical
strategy for systematically collecting and organizing such data.

PORTFOLIO ASSESSMENT IS MOST USEFUL FOR:

*Evaluating programs that have flexible or individualized goals or


outcomes. For example, within a program with the general
purpose of enhancing children's social skills, some individual
children may need to become less aggressive while other shy
children may need to become more assertive.

Each child's portfolio assessment would be geared to his or her


individual needs and goals.
*Allowing individuals and programs in the community (those being
evaluated) to be involved in their own change and decisions to
change.

*Providing information that gives meaningful insight into behavior


and related change. Because portfolio assessment emphasizes
the process of change or growth, at multiple points in time, it may
be easier to see patterns.

*Providing a tool that can ensure communication and


accountability to a range of audiences. Participants, their families,
funders, and members of the community at large who may not
have much sophistication in interpreting statistical data can often
appreciate more visual or experiential "evidence" of success.

*Allowing for the possibility of assessing some of the more


complex and important aspects of many constructs (rather than
just the ones that are easiest to measure).

PORTFOLIO ASSESSMENT IS NOT AS USEFUL FOR:

*Evaluating programs that have very concrete, uniform goals or


purposes. For example, it would be unnecessary to compile a
portfolio of individualized "evidence" in a program whose sole
purpose is full immunization of all children in a community by the
age of five years. The required immunizations are the same, and
the evidence is generally clear and straightforward.

*Allowing you to rank participants or programs in a quantitative or


standardized way (although evaluators or program staff may be
able to make subjective judgements of relative merit).
*Comparing participants or programs to standardized norms.
While portfolios can (and often do) include some standardized test
scores along with other kinds of "evidence", this is not the main
purpose of the portfolio.

USING PORTFOLIO ASSESSMENT WITH THE STATE


STRENGTHENING EVALUATION GUIDE

Tier 1 - Program Definition

Using portfolios can help you to document the needs and assets
of the community of interest. Portfolios can also help you to clarify
the identity of your program and allow you to document the
"thinking" behind the development of and throughout the
program. Ideally, the process of deciding on criteria for the
portfolio will flow directly from the program objectives that have
been established in designing the program. However, in a new or
existing program where the original objectives are not as clearly
defined as they need to be, program developers and staff may be
able to clarify their own thinking by visualizing what successful
outcomes would look like, and what they would accept as
"evidence". Thus, thinking about portfolio criteria may contribute
to clearer thinking and better definition of program objectives.

Tier 2 - Accountability

Critical to any form of assessment is accountability. In the


educational arena for example, teachers are accountable to
themselves, their students, and the families, the schools and
society. The portfolio is an assessment practice that can inform all
of these constituents. The process of selecting "evidence" for
inclusion in portfolios involves ongoing dialogue and feedback
between participants and service providers.
Tier 3 - Understanding and Refining

Portfolio assessment of the program or participants provides a


means of conducting assessments throughout the life of the
program, as the program addresses the evolving needs and assets
of participants and of the community involved. This helps to
maintain focus on the outcomes of the program and the steps
necessary to meet them, while ensuring that the implementation is
in line with the vision established in Tier 1.

Tier 4 - Progress Toward Outcomes

Items are selected for inclusion in the portfolio because they


provide "evidence" of progress toward selected outcomes.
Whether the outcomes selected are specific to individual
participants or apply to entire communities, the portfolio
documents steps toward achievement. Usually it is most helpful for
this selection to take place at regular intervals, in the context of
conferences or discussions among participants and staff.

Tier 5 - Program Impact

One of the greatest strengths of portfolio assessment in program


evaluation may be its power as a tool to communicate program
impact to those outside of the program. While this kind of data
may not take the place of statistics about numbers served, costs,
or test scores, many policy makers, funders, and community
members find visual or descriptive evidence of successes of
individuals or programs to be very persuasive.

ADVANTAGES OF USING PORTFOLIO ASSESSMENT


*Allows the evaluators to see the student, group, or community as
individuals, each unique with their own characteristics, needs, and
strengths.

*Serves as a cross-section lens, providing a basis for future analysis


and planning. By viewing the total pattern of the community or of
individual participants, one can identify areas of strengths and
weaknesses, and barriers to success.

*Serves as a concrete vehicle for communication, providing


ongoing communication or exchanges of information among
those involved.

*Promotes a shift in ownership; communities and participants can


take an active role in examining where they have been and where
they want to go.

*Portfolio assessment offers the possibility of addressing


shortcomings of traditional assessment. It offers the possibility of
assessing the more complex and important aspects of an area or
topic.

*Covers a broad scope of knowledge and information, from many


different people who know the program or person in different
contexts (e.g., participants, parents, teachers or staff, peers, or
community leaders).

DISADVANTAGES OF USING PORTFOLIO ASSESSMENT

*May be seen as less reliable or fair than more quantitative


evaluations such as test scores.

*Can be very time consuming for teachers or program staff to


organize and evaluate the contents, especially if portfolios have to
be done in addition to traditional testing and grading.
*Having to develop your own individualized criteria can be difficult
or unfamiliar at first.

*If goals and criteria are not clear, the portfolio can be just a
miscellaneous collection of artifacts that don't show patterns of
growth or achievement.

*Like any other form of qualitative data, data from portfolio


assessments can be difficult to analyze or aggregate to show
change.

HOW TO USE PORTFOLIO ASSESSMENT

Design and Development

Three main factors guide the design and development of a


portfolio: 1) purpose, 2) assessment criteria, and 3)
evidence (Barton & Collins, 1997).

1) Purpose

The primary concern in getting started is knowing the purpose


that the portfolio will serve. This decision defines the operational
guidelines for collecting materials. For example, is the goal to use
the portfolio as data to inform program development? To report
progress? To identify special needs? For program accountability?
For all of these?

2) Assessment Criteria

Once the purpose or goal of the portfolio is clear, decisions are


made about what will be considered success (criteria or
standards), and what strategies are necessary to meet the goals.
Items are then selected to include in the portfolio because
they provide evidence of meeting criteria, or making progress
toward goals.

3) Evidence

In collecting data, many things need to be considered. What


sources of evidence should be used? How much evidence do we
need to make good decisions and determinations? How often
should we collect evidence? How congruent should the sources of
evidence be? How can we make sense of the evidence that is
collected? How should evidence be used to modify program and
evaluation? According to Barton and Collins (1997), evidence can
include artifacts (items produced in the normal course of
classroom or program activities), reproductions (documentation of
interviews or projects done outside of the classroom or
program), attestations (statements and observations by staff or
others about the participant), and productions (items prepared
especially for the portfolio, such as participant reflections on their
learning or choices) . Each item is selected because it adds some
new information related to attainment of the goals.

Steps of Portfolio Assessment

Although many variations of portfolio assessment are in use, most


fall into two basic types: process portfolios and product portfolios
(Cole, Ryan, & Kick, 1995). These are not the only kinds of
portfolios in use, nor are they pure types clearly distinct from each
other. It may be more helpful to think of these as two steps in the
portfolio assessment process, as the participant(s) and staff
reflectively select items from their process portfolios for inclusion
in the product portfolio.

Step 1: The first step is to develop a process portfolio, which


documents growth over time toward a goal. Documentation
includes statements of the end goals, criteria, and plans for the
future. This should include baseline information, or items
describing the participant's performance or mastery level at the
beginning of the program. Other items are "works in progress",
selected at many interim points to demonstrate steps toward
mastery. At this stage, the portfolio is a formative evaluation tool,
probably most useful for the internal information of the
participant(s) and staff as they plan for the future.

Step 2: The next step is to develop a product portfolio (also


known as a "best pieces portfolio"), which includes examples of
the best efforts of a participant, community, or program. These
also include "final evidence", or items which demonstrate
attainment of the end goals. Product or "best pieces" portfolios
encourage reflection about change or learning. The program
participants, either individually or in groups, are involved in
selecting the content, the criteria for selection, and the criteria for
judging merits, and "evidence" that the criteria have been met
(Winograd & Jones, 1992). For individuals and communities alike,
this provides opportunities for a sense of ownership and strength.
It helps to show-case or communicate the accomplishments of the
person or program. At this stage, the portfolio is an example of
summative evaluation, and may be particularly useful as a public
relations tool.

Distinguishing Characteristics

Certain characteristics are essential to the development of any


type of portfolio used for assessment. According to Barton and
Collins (1997), portfolios should be:

1) Multi-sourced (allowing for the opportunity to evaluate a


variety of specific evidence)

Multiple data sources include both people (statements and


observations of participants, teachers or program staff, parents,
and community members), and artifacts (anything from test scores
to photos, drawings, journals, & audio or videotapes of
performances).

2) Authentic (context and evidence are directly linked)

The items selected or produced for evidence should be related to


program activities, as well as the goals and criteria. If the portfolio
is assessing the effect of a program on participants or
communities, then the "evidence" should reflect the activities of
the program rather than skills that were gained elsewhere. For
example, if a child's musical performance skills were gained
through private piano lessons, not through 4-H activities, an audio
tape would be irrelevant in his 4-H portfolio. If a 4-H activity
involved the same child in teaching other children to play, a tape
might be relevant.

3) Dynamic (capturing growth and change)

An important feature of portfolio assessment is that data or


evidence is added at many points in time, not just as "before and
after" measures. Rather than including only the best work, the
portfolio should include examples of different stages of mastery.
At least some of the items are self-selected. This allows a much
richer understanding of the process of change.

4) Explicit (purpose and goals are clearly defined)

The students or program participants should know in advance


what is expected of them, so that they can take responsibility for
developing their evidence.

5) Integrated (evidence should establish a correspondence


between program activities and life experiences)
Participants should be asked to demonstrate how they can apply
their skills or knowledge to real-life situations.

6) Based on ownership (the participant helps determine evidence


to include and goals to be met)

The portfolio assessment process should require that the


participants engage in some reflection and self-evaluation as they
select the evidence to include and set or modify their goals. They
are not simply being evaluated or graded by others.

7) Multipurposed (allowing assessment of the effectiveness of the


program while assessing performance of the participant).

A well-designed portfolio assessment process evaluates the


effectiveness of your intervention at the same time that it
evaluates the growth of individuals or communities. It also serves
as a communication tool when shared with family, other staff, or
community members. In school settings, it can be passed on to
other teachers or staff as a child moves from one grade level to
another.

Analyzing and Reporting Data

As with any qualitative assessment method, analysis of portfolio


data can pose challenges. Methods of analysis will vary depending
on the purpose of the portfolio, and the types of data collected
(Patton, 1990). However, if goals and criteria have been clearly
defined, the "evidence" in the portfolio makes it relatively easy to
demonstrate that the individual or population has moved from a
baseline level of performance to achievement of particular goals.

It should also be possible to report some aggregated or


comparative results, even if participants have individualized goals
within a program. For example, in a teen peer tutoring program,
you might report that "X% of participants met or exceeded two or
more of their personal goals within this time frame", even if one
teen's primary goal was to gain public speaking skills and
another's main goal was to raise his grade point average by
mastering study skills. Comparing across programs, you might be
able to say that the participants in Town X on average mastered 4
new skills in the course of six months, while those in Town Y only
mastered 2, and speculate that lower attendance rates in Town Y
could account for the difference.
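
A minimal Python sketch of how the kind of aggregate figure described above might be
computed. The participants, counts of goals met, and the threshold of two goals are all
hypothetical illustrations of the teen tutoring example.

# Number of personal goals each participant met (hypothetical data)
goals_met = {
    "Participant 1": 3,
    "Participant 2": 1,
    "Participant 3": 2,
    "Participant 4": 2,
}

threshold = 2  # "met or exceeded two or more of their personal goals"
count = sum(1 for met in goals_met.values() if met >= threshold)
percent = 100 * count / len(goals_met)
print(f"{percent:.0f}% of participants met or exceeded {threshold} or more personal goals")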

Subjectivity of judgements is often cited as a concern in this type


of assessment (Bateson, 1994). However, in educational settings,
teachers or staff using portfolio assessment often choose to
periodically compare notes by independently rating the same
portfolio to see if they are in agreement on scoring (Barton &
Collins, 1997). This provides a simple check on reliability, and can
be very simply reported. For example, a local programmer could
say "To ensure some consistency in assessment standards, every
5th portfolio (or 20%) was assessed by more than one staff
member. Agreement between raters, or inter-rater reliability, was
88%".
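
A minimal Python sketch of the simple percent-agreement check described above. The two
raters and their five ratings are hypothetical; the calculation is just the share of
portfolios on which the independent ratings match.

# Ratings given independently by two staff members to the same five portfolios
rater_1 = ["Proficient", "Developing", "Proficient", "Advanced", "Developing"]
rater_2 = ["Proficient", "Developing", "Advanced",   "Advanced", "Developing"]

matches = sum(1 for a, b in zip(rater_1, rater_2) if a == b)
agreement = 100 * matches / len(rater_1)
print(f"Inter-rater agreement: {agreement:.0f}%")   # 4 of 5 ratings match -> 80%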

1 PORTFOLIO ASSESSMENT METHODS

2 Historical perspective
Portfolios have been widely used for many years. The late 80s saw interest in portfolios for
assessment (Belanoff and Dickson 1991), and the 90s saw the advent of e-portfolios. A shift
in emphasis away from assessment to learning?

3 Definitions
“Collection of student work that demonstrates achievement or improvement” (Stiggins 1994).
“A portfolio is a collection of evidence that is gathered together to show a person’s
learning journey over time and to demonstrate their abilities” (Butler 2006).

4 Definitions
“…student writing over time, which contains exhibits showing the stages in the writing
processes a text has gone through and the stages of the writer’s growth as a writer, and
evidence of the writer’s self-reflection on her/his identity and progress as a writer”
(Hamp-Lyons 1996).

5 Definitions
Portfolios are “…prepared with a particular audience in mind”, “…are selective” and “call
for judgments” (Calfee and Freedman 1996).

6 Definitions
“…a purposeful collection of student work that illustrates efforts, progress, and
achievement in one or more areas [over time]. The collection must include: student
participation in selecting contents, the criteria for selection, the criteria for judging
merit, and evidence of self-reflection” (The Northwest Evaluation Association, cited in
Barrett 2005).

7 Definitions – main characteristics

They are collections of work, different from a single timed impromptu essay or a class essay
carried out over a semester. They are purposeful in that they “demonstrate”, “exhibit” or
provide “evidence” of “achievement”, “improvement”, “the writer’s self-reflection”, “the
writing process” and “the writer’s growth”. The degree to which these characteristics are
evidenced in portfolios largely depends on their purpose.

8 The collection must include:

A portfolio is a purposeful collection of student work that exhibits the student’s progress
and achievements in one or more areas. The collection must include: student participation in
selecting contents, the criteria for selection, the criteria for judging merit, and evidence
of student self-reflection.

9 The purpose of creating a portfolio is to enable the student to demonstrate his or her
learning and progress to others. The value is that, in building portfolios, students become
active participants in the learning process and its assessment.

10 Features and Principles of Portfolio Assessment

A portfolio is a form of assessment that students do together with their teachers. A
portfolio represents a selection of what the students believe are best included from among
the possible collection of things related to the concept being studied.

11 A portfolio provides samples of the student’s work which show growth over
time
The criteria for selecting and assessing the portfolio contents must be clear to the
teacher and the students at the outset of the process.

12 Purposes of Portfolio Assessment

Portfolio assessment matches assessment to teaching. Portfolio assessment has clear goals;
they are decided on at the beginning of instruction and are clear to teacher and students.
Portfolio assessment gives a profile of learner abilities in terms of depth, breadth and
growth.

13 Portfolio assessment is a tool for assessing a variety of skills not normally testable in
a single setting for traditional testing. Portfolio assessment develops the students’
awareness of their own learning. Portfolio assessment caters to individuals in a
heterogeneous class.

14 Portfolio assessment develops social skills

Portfolio assessment develops social skills: students interact with other students in the
development of their own portfolios. Portfolio assessment promotes independent and active
learners. Portfolio assessment can improve motivation for learning and thus achievement.
Portfolio assessment provides opportunity for student-teacher dialogue.

15 Essential Elements of the Portfolio

 Cover Letter – “About the author” and “What my portfolio shows about my progress as a
learner” (written at the end, but put at the beginning)
 Table of Contents with numbered pages
 Entries – both core (items students have to include) and optional (items of the student’s
choice)
 Dates on all entries, to facilitate proof of growth over time
 Drafts of aural/oral and written products and revised versions; e.g., first drafts and
corrected/revised versions
 Reflections, which can appear at different stages in the learning process (for formative
and/or summative purposes) and at the lower levels can be written in the mother tongue

17 Students can choose to reflect upon some of the following:

For each item, a brief rationale for choosing the item should be included. Students can
choose to reflect upon some of the following: What did I learn from it? What did I do well?
Why did I choose this item? What do I want to improve in the item? How do I feel about my
performance? What were the problem areas?

18 Planning for Portfolio Assessment


Steps for Planning and Implementing Portfolio Assessment

 Determine Purpose
 Identify Physical Structure
 Determine Source of Content
 Determine Student Self-Reflective Guidelines and Scoring Criteria
 Review with Students
 Teacher Evaluation of Contents and Student Self-Evaluation
 Student Self-Evaluation of Contents
 Student-Teacher Conference
 Portfolio Content Supplied by Teacher and/or Student
 Portfolios Returned to Students or School
19 Purpose involves specific learning targets – targets that reflect all contents are
broader and more general, e.g., “development as a reader”, “speaks clearly”, “adapts writing
styles to different purposes”.

The use of the portfolio: Documentation, Showcasing Growth, Evaluation

20 Identify Physical Structure

What will it look like? Where are the portfolios stored so that students can have easy
access to them? Do I have boxes to put them in? What is the actual arrangement of documents
in the portfolio?

21 Stages in Implementing Portfolio Assessment


Identifying teaching goals to assess through portfolio. Introducing the idea of
portfolio assessment to your class. Specification of portfolio content. Giving clear and
detailed guidelines for portfolio presentation. Informing key school officials, parents
and other stakeholders. Development of a portfolio

22 Types of Portfolios. The types of portfolios differ from each other depending on the
purposes or objectives set for the overall classroom assessment program. As a rule, portfolio
assessment is used where traditional testing is inadequate to measure desired skills and
competencies.

23 Types of portfolio:
A process portfolio, a showcase portfolio, an assessment portfolio, dossier portfolio,
reflective portfolio, classroom portfolio, positivist portfolio, constructivist portfolio,
personal portfolio, structured portfolio, employment portfolio, a working portfolio

24 Types of Portfolios: Documentation Portfolio – involves a collection of work over time
showing growth and improvement reflecting students’ learning of identified outcomes. Also
called a “growth portfolio”, it can include everything from brainstorming activities to
drafts to finished products.

25 Process Portfolio – demonstrates all facets or phases of the learning process

Contains an extensive number of reflective journals, think logs and other related forms of
metacognitive processing. Useful in documenting students’ overall learning process. Can show
how students integrate specific knowledge or skills and progress towards basic and advanced
mastery.

26 Showcase Portfolio – shows the best of the students’ outputs and products.
Best used for summative evaluation of students’ mastery of key curriculum outcomes. Should
include students’ best work, determined through a combination of student and teacher
selection. Only completed work should be included. Compatible with audio-visual artifact
development, including photographs and electronic records of students’ completed work.
Should include written analysis and reflections by the student upon the decision-making
process used to determine which works are included.

27 Assessing and Evaluating the Portfolio


“Portfolio assessment provides the teacher and students an opportunity to observe
students in a broader context: taking risks, developing creative solutions, and learning
to make judgments about their own performances”(Paulson, Paulson and Meyer,
1991)

28 Rating criteria:
 Thoughtfulness (including evidence of students’ monitoring of their own comprehension,
metacognitive reflection, and productive habits of mind)
 Growth and development in relationship to key curriculum expectancies and indicators
 Understanding and application of key processes
 Completeness, correctness, and appropriateness of products and processes presented in the
portfolio
 Diversity of entries (e.g., use of multiple formats to demonstrate achievement of designed
performance standards)

29 In evolving evaluation criteria, teachers and students must work together and agree on
the criteria to be applied to the portfolio. Such evaluative criteria need to be set and
agreed prior to the development of the portfolio. The criteria to be used may be formative
(i.e., applied throughout the instructional time period) or summative (i.e., applied as part
of a culminating project, activity or related assessment to determine the extent to which
identified curricular expectancies, indicators, and standards have been achieved).

30 Sample of Rating Scale for Cover Letter

Grade 1–3: Shows limited awareness of portfolio goals. Has difficulty understanding the
process of revision. Demonstrates little evidence of progress over time. Limited explanation
of choices made. Has difficulty relating to self/peer assessment.

Grade 4–7: Reflects awareness of some portfolio goals. Understands the process of revision
to a certain extent. Demonstrates some evidence of progress over time. Explains choices made
in a relevant way. Relates to self/peer assessment.

Grade 8–10: Reflects awareness of portfolio goals. Understands the process of revision.
Demonstrates evidence of progress over time. Fully explains choices made. Reaches a high
level of reliability in self/peer assessment. Draws conclusions about his/her learning.

31 “Collaborative approach” – a significant aspect of portfolio assessment

Students and teacher work together to identify significant or important artifacts and
processes to be captured in the portfolio and to determine the grades or scores to be
assigned.
32 For grading and scoring, rubrics, rules and scoring keys can be designed for a variety of
portfolio components. Letter grades might also be assigned, where appropriate. For summative
purposes, a panel of interviewers may be designated to evaluate the students’ portfolios
based on the agreed set of criteria.

33 Each portfolio entry needs to be assessed with reference to its specific goals.
Self- and peer-assessment can also be used for formative evaluation, with students having to
justify their grades with reference to the goals and to specific pages in the portfolio.

34 The teacher provides feedback on the portfolios:


Write a letter about the portfolio which details strengths and weaknesses and
generates a profile of a student’s ability, which is then added to the portfolio. Prepare
certificates which comment on the portfolio strengths and suggest future goals.

35 Student-Teacher Conferences
The main philosophy embedded in portfolio assessment is “shared and active assessment”. For
the formative evaluation process, the teacher should have short individual meetings with
each student, in which progress is discussed and goals are set for the future meeting. The
student and the teacher keep careful documentation of the meetings, noting the significant
agreements and findings in each session.

36 For summative evaluation purposes,

students can negotiate for the appropriate grade to be given, using as evidence the minutes
of the regular student-teacher conferences. Notes from conferences have to be included in
the portfolio as they contain joint decisions about the individual’s strengths and
weaknesses.

37 Advantages and Disadvantages of Portfolio Assessment

Advantages:
 Promotes student self-assessment
 Promotes collaborative assessment
 Enhances student motivation
 Systematic assessment is ongoing
 Focus on improvement, not comparison with others
 Focus on students’ strengths
 Assessment process is individualized
 Allows demonstration of unique accomplishments
 Provides concrete examples for parent conferences
 Products can be used for individualized teacher diagnosis
 Flexibility and adaptability

Disadvantages:
 Scoring difficulty may lead to low reliability
 Teacher training needed
 Time consuming to develop criteria, score and meet with students
 Students may not make good selections of which materials to include
 Sampling of student products may lead to weak generalization
 Parents may find the portfolio difficult to understand

38 Research in Portfolio Assessment

Impact: Richardson’s (2000) study involved classroom observations, teacher and student
interviews, and examination of student writing and teacher responses. It found that students
regard teacher responses as directives and were not prepared to make independent judgments,
largely because of the threat of grades.

39 Impact: Herman and Winters (1994), based on self-reports from teachers and others
implementing portfolios, found that portfolio assessment appears to have positive effects on
instruction. Vermont principals affirmed that the portfolio assessment program had beneficial
effects on curriculum and instruction.

40 Impact: Hirvela and Sweetland (2005) used two case studies showing that the two students
did not strongly endorse the portfolios as used in two different courses. They seemed to
need more explanation of what portfolio approaches were meant to achieve. Even with a 5%
final course grade, students saw the portfolio as essentially summative in nature.

GRADING AND REPORTING SYSTEM

The past chapters of this book discussed the different methods and tools that measure
student achievement in the context of the different learning targets. In this chapter, the
assigning of grades to students and how it should be done in relation to the intended
learning outcomes will be discussed, bearing in mind that the grading policies of schools
must also be taken into consideration in developing a grading system.

Section Intended Learning Outcomes

At the end of this section, you should be able to demonstrate skills in preparing and
interpreting grades. Also, you should be able to assess the effectiveness of the
parent-teacher conference as a venue for reporting learners’ performance.

Chapter Intended Learning Outcome


At the end of this chapter, you are expected to demonstrate skills in interpreting test results
and reporting of grades.

Assessment of learning during instruction and after instruction may be achieved in a number
of ways. One of the challenges in grading is summarizing the variety of information collected
from different types of assessment and coming up with a standardized numerical grade,
descriptive letter rating, or brief report.

The guiding premises in developing grading and reporting system are provided below:

1. The primary goal of grading and reporting is communication.


2. Grading and reporting are integral parts of the instructional process.
3. Good reporting is based on good evidence
4. Changes in grading and reporting are best accomplished through the development of a
comprehensive reporting system.

In developing and implementing grading and reporting systems, these premises must be taken
into consideration to produce a meaningful output and help in the attainment of the student
learning objectives, from which the assessment objectives are cascaded.

1. K to 12 Grading of Learning Outcomes

The K to 12 curriculum has specific assessment requirements and designs catering to the
delivery modes of learning, i.e., formal education and the alternative learning system.

The K to 12 assessment is learner-centered and carefully considers the learning environment.
The 21st century skills such as research, analytical/critical, practical and creative skills
are part of the indicators included in the K to 12 assessment. Both cognitive and
non-cognitive skills, which include values, motivation, attitude, behavior traits, and
interpersonal relations, are part of the assessment.

Formative assessment (assessment FOR learning) is given importance to ensure learning.
Learners are encouraged to take part in the process of self-assessment (assessment AS
learning). Summative forms of assessment (assessment OF learning) are also part of the
curriculum assessment under the K to 12.

The K to 12 curriculum prescribes that the assessment process should utilize the wide variety
of traditional and authentic assessment tools and techniques for a valid, reliable, and realistic
assessment of learning. Traditional and authentic assessments complement each other though
they are not mutually exclusive. Furthermore, it gives greater importance on assessing
understanding and skills development rather than on mere accumulation of content.

In the K to 12 curriculum, assessment will be standards-based to ensure that there is
standardization in teaching and learning. The Department of Education (DepEd) issued an
order (DepEd Order No. 31, s. 2012) stating that assessment will be done at four levels and
will be weighted accordingly.

These levels are the following:

Knowledge refers to the essential content of the curriculum, the facts and information that the
student acquires.

Process refers to cognitive acts that the student does on facts and information to come up
with meanings and understandings.

Understanding refers to lasting big ideas, principles, and generalizations that are fundamental
to the discipline which may be assessed using the facets of understanding.

Products/Performances refers to real-life application of understanding as shown by the
student’s performance of authentic tasks.

The assigned weight per level of assessment is shown in the following table:

Level of Assessment       Percentage Weight
Knowledge                 15%
Process or Skills         25%
Understanding             30%
Products/Performances     30%
TOTAL                     100%

Source: DepEd Order No. 31, s. 2012
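
As a simple illustration of how these weights combine into a single quarterly grade, the
short Python sketch below uses hypothetical student scores for each level of assessment; the
weights are those tabulated above.

# Weights per level of assessment (DepEd Order No. 31, s. 2012)
weights = {
    "Knowledge": 0.15,
    "Process or Skills": 0.25,
    "Understanding": 0.30,
    "Products/Performances": 0.30,
}

# Percentage scores a student earned in each level (hypothetical data)
scores = {
    "Knowledge": 88,
    "Process or Skills": 82,
    "Understanding": 79,
    "Products/Performances": 85,
}

final_grade = sum(weights[level] * scores[level] for level in weights)
print(f"Weighted quarterly grade: {final_grade:.1f}%")  # 13.2 + 20.5 + 23.7 + 25.5 = 82.9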

At the end of the quarter, the student’s performance will be described based on the prescribed
level of proficiency which has equivalent numerical values. Proficiency level is computed from
the sum of all the performances of students in various levels of assessment. Each level is
described as follows:

Beginning. The student at this level struggles with his/her understanding; prerequisite and
fundamental knowledge and/or skills have not been acquired or developed adequately.

Developing. The student at this level possesses the minimum knowledge and skills and core
understanding but needs help throughout the performance of authentic tasks.

Approaching Proficiency. The student at this level has developed the fundamental knowledge
and skills and core understandings, and, with little guidance from the teacher and/or with
some assistance from peers, can transfer these understandings through authentic performance
tasks.

Proficient. The student at this level has developed the fundamental knowledge and skills and
core understandings, and can transfer them independently through authentic performance
tasks.
Advanced. The student at this level exceeds the core requirements in terms of knowledge,
skills and core understandings, and can transfer them automatically and flexibly through
authentic performance tasks.

Translating these proficiency levels into their numerical values is described in the
following table.

Level of Proficiency        Equivalent Numerical Value
Beginning                   74% and below
Developing                  75–79%
Approaching Proficiency     80–84%
Proficient                  85–89%
Advanced                    90% and above

Source: DepEd Order No. 31, s. 2012
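
A minimal Python sketch of translating a computed numerical grade into the proficiency
levels tabulated above. The sample grades passed to the function are hypothetical; the
boundaries follow the table.

def proficiency_level(grade: float) -> str:
    # Boundaries from DepEd Order No. 31, s. 2012
    if grade >= 90:
        return "Advanced"
    if grade >= 85:
        return "Proficient"
    if grade >= 80:
        return "Approaching Proficiency"
    if grade >= 75:
        return "Developing"
    return "Beginning"

for grade in (72, 78, 83, 87, 93):   # hypothetical quarterly grades
    print(grade, "->", proficiency_level(grade))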

Comparison of Levels of Proficiency

Acquisition of knowledge, skills and understanding: struggling or not yet acquired
(Beginning); minimum (Developing); fundamental (Approaching Proficiency); fundamental
(Proficient); exceeding (Advanced).

Transfer or application of knowledge: needs help (Beginning/Developing); with little guidance
from the teacher or some assistance from peers (Approaching Proficiency); independent
(Proficient); automatic and flexible (Advanced).

Source: Marilyn D. Dimaano’s presentation materials on Assessment and Rating

Note: You may do some research in order to learn more about the grading and reporting system
used in the old curriculum as well as the newly implemented K to 12 curriculum.

2. The Effects of Grading on Students

Over the years, studies have been made on how grades and the comments teachers write on
students’ papers might affect students’ achievement. An early investigation by Page (1958)
focused specifically on this issue. In the said study, 74 school teachers administered a test
to the students in their classes and scored it in the usual way. A numerical score was
assigned to each student’s paper and, on the basis of the scores obtained, a corresponding
letter grade of A, B, C, D, or F was given. Next, teachers randomly divided the students’
papers into three groups. The first group received only the numerical score and letter grade.
The second group, aside from the score and grade, received standard comments: A: Excellent!
Keep it up; B: Good work! Keep it up; C: Perhaps try to do still better?; D: Let’s bring this
up; and F: Let’s raise this grade. For the third group, teachers marked the score and letter
grade and then wrote on each paper a variety of individualized comments. Page asked the
teachers to write anything they wished on these papers, so long as it was consistent with
their personal feelings and instructional practices. Papers were then returned to students
in the normal way.

Page then evaluated the effects of the comments by considering students’ scores on the very
next test or assessment given in the class. The results showed that students who received
the standard comments with their grade achieved significantly higher scores than those who
received only a score and grade. Those students who received individualized comments did
even better. This led him to conclude that grades can have a beneficial effect on student
learning when accompanied by specific or individualized comments from the teacher (Stewart &
White, 1976). Studies conducted in more recent years confirmed Page’s conclusion.

Based on the study presented in the previous paragraphs, its relevance is as follows:

1. It illustrated that while grades may not be compulsory for teaching or learning, they can
be used in positive ways to enhance students’ achievement and performance.
2. It showed that positive effects can be gained with relatively little effort on the part of
teachers. Stamps or stickers with standard comments could easily be produced for teachers to
use, yet this simple effort has a significant positive effect on students’ performance.

3. Building a Grading and Reporting System

3.1 The Basis of Good Reporting is Good Evidence

Whatever format is preferred and required of the teacher, grading and reporting should
provide high-quality information to interested persons in a form they can understand and
use. The basis of such high-quality information is good evidence on student learning.
Evaluation experts stress that if one is going to make important decisions about students
that have broad implications, such as the decisions involved in grading, good evidence must
be ready at hand (Airasian, 1994; Linn & Gronlund, 2000; Stiggins, 2001). In the absence of
good evidence, even the most detailed and hi-tech grading and reporting system is useless;
it simply cannot serve the basic communication functions for which it is intended.

There are three qualities that contribute to the goodness of evidence gathered on student
learning. These three qualities are described below.

Validity – Refers to the appropriateness and adequacy of the interpretations made from
assessment information (Linn & Gronlund, 2000). Example: if an assessment is to be used to
describe students’ reading comprehension, the evidence should actually reflect reading
comprehension and not other irrelevant factors.

Reliability – Refers to the consistency of assessment results. Example: when students attain
very similar scores when the same assessment procedures are used with the same students at
two different times, the results have a high degree of reliability.

Quality – The more sources of evidence on students’ learning, the better the information
that can be reported. Example: any single source of evidence of student learning can be
imperfect, so it is essential that multiple sources of evidence be utilized in grading and
reporting students.

3.2 Major Purposes of Grading and Reporting

The following are the major purposes of grading and reporting:

 To communicate the achievement status of students to parents and others


 To provide information that students can use for self-evaluation
 To select, identify or group students for certain educational paths or programs
 To provide evidence of students’ lack of effort or inappropriate responsibility

Below are possible sources of evidence for a grading and reporting system:

 Major Exams or Composition


 Class observation
 Class quizzes
 Oral Presentations
 Reports or projects
 Homework completion
 Homework quality
 Students’ Portfolios
 Exhibits of students’ work
 Laboratory projects
 Students’ notebook or journal
 Class participation
 Work habits and neatness
 Effort
 Attendance
 Punctuality of assignments
 Class behavior or attitude
 Progress made

3.3 Grading and Reporting Methods

3.3.1 Letter Grades

 The most common and best known of all grading methods


 Mostly composed of a five-level grading scale
 Letter Grade Descriptors

Despite their apparent simplicity, the true meaning of letter grades is not always clear. What
the teacher would like to communicate with a particular letter grade and what parents interpret
that grade to mean are often not the same (Waltman & Frisbie, 1994). To give more clarity to
the meaning of letter grades, most schools include a key or legend on the reporting form in
which each letter grade is paired with an explanatory word or phrase. Descriptors must be
carefully chosen to avoid additional complications and misunderstanding.

Advantages:

 A brief description of students’ achievement and level of performance including students’


potentials can be provided to parents and other interested persons.
 Because parents encountered letter grades in their own schooling, it is easier for them to
understand what a letter grade means.

Disadvantages:

 Requires abstraction of a great deal of information into a single symbol (Stiggins, 2001)
 Despite educators’ best effort, letter grades tend to be interpreted by parents in strictly norm-
referenced terms. The cut-offs between grade categories are always arbitrary and difficult to
justify.
 Lacks the richness of other more detailed reporting methods such as standards-based grading,
mastery grading, and narrative.

Different Interpretations of Letter Grades

Grade A
· Criterion-referenced (standards-based): Outstanding or advanced; complete knowledge of all content; mastery of all targets; exceeds standards
· Norm-referenced: Outstanding; among the highest or best performance
· Combined norm- and criterion-referenced: Outstanding; very high level of performance
· Based on improvement: Outstanding; much improvement on most or all targets

Grade B
· Criterion-referenced (standards-based): Very good or proficient; complete knowledge of most content; mastery of most targets; meets most standards
· Norm-referenced: Very good; performs above the class average
· Combined norm- and criterion-referenced: Very good; better than average performance
· Based on improvement: Very good; some improvement on most or all targets

Grade C
· Criterion-referenced (standards-based): Acceptable or basic; command of only basic concepts or skills; mastery of some targets; meets some standards
· Norm-referenced: Average; performs at the class average
· Combined norm- and criterion-referenced: Average
· Based on improvement: Acceptable; some improvements on some targets

Grade E
· Criterion-referenced (standards-based): Unsatisfactory; lacks knowledge of the content; no mastery of targets; does not meet any standards
· Norm-referenced: Unsatisfactory; far below average; among the worst in the class
· Combined norm- and criterion-referenced: Unsatisfactory; lacks sufficient knowledge to pass
· Based on improvement: Unsatisfactory; no improvement on any targets

(McMillan, 2007)

3.3.2 Percentage Grades

 Are the ultimate multi-category grading method


 Can range from 0 to 100
 Generally more popular among high school teachers than elementary teachers

Advantages:

 Allows for maximum discrimination in evaluating students' achievement and performance


 Maximizes the variation among students, making it easier to choose students for honors or as
representatives for special programs

Disadvantages:

 Requires a great deal of abstraction


 Interpreting the meaning of a percentage grade is extremely difficult
 The cut-offs between grades are no less arbitrary and are even more difficult to justify (see the brief sketch below)
 Because of the large number of grade categories, it is less reliable and more subjective.
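
To make the arbitrariness of percentage cut-offs concrete, the short Python sketch below maps percentage grades onto a five-level letter scale. The cut-offs (90/80/70/60), the function name, and the example scores are illustrative assumptions only, not part of any prescribed grading scheme.

    # Illustrative only: hypothetical cut-offs showing how a percentage grade
    # is collapsed into a five-level letter grade. Actual cut-offs vary by school.
    def to_letter(percent: float) -> str:
        if percent >= 90:
            return "A"
        elif percent >= 80:
            return "B"
        elif percent >= 70:
            return "C"
        elif percent >= 60:
            return "D"
        return "F"

    print(to_letter(89.5), to_letter(90.0))  # B A -- half a point changes the symbol

A difference of half a point moves a student across a category boundary, which illustrates why such cut-offs are hard to defend.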

3.3.3 Standards-Based Grading

In an effort to bring greater clarity and specificity to the grading process, many schools
initiated standards-based grading procedures and reporting forms. Guskey and Bailey (2001)
identify four steps in developing standards-based grading. These steps are:

1. Identify the major learning goals or standards that students will be expected to achieve at
each grade level or in each course of study.
2. Establish performance indicators for the learning goals.
3. Determine graduated levels of quality (benchmarks) for assessing each goal or standard.
4. Develop reporting tools that communicate teachers’ judgments of students’ learning progress
and culminating achievement in relation to the learning goals and standards.

Advantages:
 When clear learning goals or standards are established, standards-based grading offers
meaningful information about students’ achievement and performance to students, parents
and to others
 If information is detailed, it can be useful for diagnostic and prescriptive purposes.
 Facilitates teaching and learning processes better than any other grading method.

Disadvantages:

 Takes a lot of effort.


 Often too complicated for parents to understand
 May not communicate the appropriateness of students’ progress.

3.3.4 Pass/Fail Grading

The simplest alternative grading method available to educators reduces the number of grade
categories to just two: Pass or Fail. Pass/Fail grading was originally introduced in the late 1800s
in college-level courses so that students would give more importance to
learning and less to the grades they attained. By lessening the emphasis on grades, many
educators believed that students would be encouraged to take more challenging subjects.

Pass/Fail grading became popular in many universities and colleges in the 1970s, which
applied it to various programs.

Advantages:

 Simplifies the grading process for teachers and students.


 Use of a single minimal cut-off and only two grade categories improves the reliability of grade
assignments.
 Pass/Fail grading has the potential to make learning environments more relaxed by focusing
students’ attention on learning rather than on grade (Goldstein & Tilker, 1971).
 Pass/Fail grading is what students will face in many real-life situations.

Disadvantages:

 Students gain very little prescriptive information.


 Students spend less time studying when pass/fail grading is used than when a wider range of
grades is utilized.
 Students only study to attain minimum passing level and show less effort in striving for
excellence.

The table below provides a summary of the different grading methods discussed:

Letter Grades
· Advantages: convenient; concise; familiar
· Disadvantages: broad, sometimes unclear indication of performance; often includes a jumble of factors such as effort and improvement

Percentage Grades
· Advantages: easy to calculate, record, and combine; familiar
· Disadvantages: broad, sometimes unclear indication of performance; false sense of difference between close scores; high scores do not necessarily signify mastery

Standards-Based
· Advantages: focus on high standards for all students; pre-established performance levels
· Disadvantages: may not reflect student learning in many areas; does not include effort or improvement

Pass/Fail
· Advantages: simple; consistent with mastery learning
· Disadvantages: little discrimination in performance; less emphasis on high performance

3.4 Developing an Effective Reporting System

The most critical issue to be addressed in selecting the tools to include in a reporting system is
the purpose or purposes the system is to serve: why we need to convey the information and what we
hope to accomplish.

To determine the purpose or purposes, three aspects of communication must be considered.

Critical Aspects in Determining Communication Purposes:

1. What information or messages do we want to communicate?


2. Who is the primary audience for that message?
3. How would we like that information or message to be used?

Tools for Comprehensive Reporting System


Reporting systems most highly regarded by parents typically include a mix of traditional and
more modern reporting tools.

Tools that might be included in a comprehensive reporting system:

1. Report Cards
2. Notes: Attached to Report Cards
3. Standardized Assessment Report
4. Phone Calls to Parents
5. Weekly/Monthly Progress Report
6. School Open-Houses
7. Newsletter to Parents
8. Personal Letter to Parents
9. Evaluated Projects or Assignments
10. Portfolios or Exhibits of Students’ Work
11. Homework Assignments
12. Homework Hotlines
13. School Web Pages
14. Parent-Teacher Conferences
15. Student-Teacher Conferences
16. Student- Led Conference

3.5 Guidelines for Better Practice

To ensure better practice of grading and reporting systems, the following statements serve as
a guide on how to utilize grading and reporting systems effectively:

1. Begin with a clear statement of purpose.


2. Provide accurate and understandable descriptions of learning.
3. Use grading and reporting to enhance teaching and learning.

6 BASIC STATISTICAL TOOLS


There are lies, damn lies, and statistics......
(Anon.)

6.1 Introduction
6.2 Definitions
6.3 Basic Statistics
6.4 Statistical tests

6.1 Introduction

In the preceding chapters basic elements for the proper execution of analytical
work such as personnel, laboratory facilities, equipment, and reagents were
discussed. Before embarking upon the actual analytical work, however, one more
tool for the quality assurance of the work must be dealt with: the statistical
operations necessary to control and verify the analytical procedures (Chapter 7) as
well as the resulting data (Chapter 8).

It was stated before that making mistakes in analytical work is unavoidable. This is
the reason why a complex system of precautions to prevent errors and traps to
detect them has to be set up. An important aspect of the quality control is the
detection of both random and systematic errors. This can be done by critically
looking at the performance of the analysis as a whole and also of the instruments
and operators involved in the job. For the detection itself as well as for the
quantification of the errors, statistical treatment of data is indispensable.

A multitude of different statistical tools is available, some of them simple, some


complicated, and often very specific for certain purposes. In analytical work, the
most important common operation is the comparison of data, or sets of data, to
quantify accuracy (bias) and precision. Fortunately, with a few simple convenient
statistical tools most of the information needed in regular laboratory work can be
obtained: the "t-test, the "F-test", and regression analysis. Therefore, examples of
these will be given in the ensuing pages.

Clearly, statistics are a tool, not an aim. Simple inspection of data, without
statistical treatment, by an experienced and dedicated analyst may be just as
useful as statistical figures on the desk of the disinterested. The value of statistics
lies with organizing and simplifying data, to permit some objective estimate
showing that an analysis is under control or that a change has occurred. Equally
important is that the results of these statistical procedures are recorded and can
be retrieved.

6.2 Definitions
6.2.1 Error
6.2.2 Accuracy
6.2.3 Precision
6.2.4 Bias

Discussing Quality Control implies the use of several terms and concepts with a
specific (and sometimes confusing) meaning. Therefore, some of the most
important concepts will be defined first.

6.2.1 Error

Error is the collective noun for any departure of the result from the "true"
value*. Analytical errors can be:

1. Random or unpredictable deviations between replicates,


quantified with the "standard deviation".

2. Systematic or predictable regular deviation from the "true" value,


quantified as "mean difference" (i.e. the difference between the true
value and the mean of replicate determinations).

3. Constant, unrelated to the concentration of the substance


analyzed (the analyte).

4. Proportional, i.e. related to the concentration of the analyte.

* The "true" value of an attribute is by nature


indeterminate and often has only a very relative
meaning. Particularly in soil science for several
attributes there is no such thing as the true
value as any value obtained is method-
dependent (e.g. cation exchange capacity).
Obviously, this does not mean that no adequate
analysis serving a purpose is possible. It does,
however, emphasize the need for the
establishment of standard reference methods
and the importance of external QC (see Chapter
9).

6.2.2 Accuracy
The "trueness" or the closeness of the analytical result to the "true" value. It is
constituted by a combination of random and systematic errors (precision and
bias) and cannot be quantified directly. The test result may be a mean of several
values. An accurate determination produces a "true" quantitative value, i.e. it is
precise and free of bias.

6.2.3 Precision

The closeness with which results of replicate analyses of a sample agree. It is a


measure of dispersion or scattering around the mean value and usually expressed
in terms of standard deviation, standard error or a range (difference between the
highest and the lowest result).

6.2.4 Bias

The consistent deviation of analytical results from the "true" value caused by
systematic errors in a procedure. Bias is the opposite of, and the most commonly used measure
for, "trueness": the agreement of the mean of analytical results with the true
value, i.e. excluding the contribution of randomness represented in precision.
There are several components contributing to bias:

1. Method bias

The difference between the (mean) test result obtained from a


number of laboratories using the same method and an accepted
reference value. The method bias may depend on the analyte level.

2. Laboratory bias

The difference between the (mean) test result from a particular


laboratory and the accepted reference value.

3. Sample bias

The difference between the mean of replicate test results of a


sample and the ("true") value of the target population from which
the sample was taken. In practice, for a laboratory this refers mainly
to sample preparation, subsampling and weighing techniques.
Whether a sample is representative for the population in the field is
an extremely important aspect but usually falls outside the
responsibility of the laboratory (in some cases laboratories have
their own field sampling personnel).
The relationship between these concepts can be expressed as follows: the total error of a
result is the combination of the systematic error (bias) and the random error (precision).

The types of errors are illustrated in Fig. 6-1.

Fig. 6-1. Accuracy and precision in laboratory measurements. (Note that the
qualifications apply to the mean of results: in c the mean is accurate but
some individual results are inaccurate)
6.3 Basic Statistics

6.3.1 Mean
6.3.2 Standard deviation
6.3.3 Relative standard deviation. Coefficient of variation
6.3.4 Confidence limits of a measurement
6.3.5 Propagation of errors

In the discussions of Chapters 7 and 8 basic statistical treatment of data will be


considered. Therefore, some understanding of these statistics is essential and they
will briefly be discussed here.
The basic assumption to be made is that a set of data, obtained by repeated
analysis of the same analyte in the same sample under the same conditions, has
a normal or Gaussian distribution. (When the distribution is skewed statistical
treatment is more complicated). The primary parameters used are the mean (or
average) and the standard deviation (see Fig. 6-2) and the main tools the F-
test, the t-test, and regression and correlation analysis.

Fig. 6-2. A Gaussian or normal distribution. The figure shows that (approx.)
68% of the data fall in the range x̄ ± s, 95% in the range x̄ ± 2s, and
99.7% in the range x̄ ± 3s.
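
The 68/95/99.7% figures quoted for Fig. 6-2 can be verified numerically. The short Python sketch below is an illustrative addition, not part of the original text; it uses the error function from the standard library to compute the fraction of a normal distribution lying within x̄ ± k·s.

    # Check of the 68 / 95 / 99.7 % coverage figures of a normal distribution.
    from math import erf, sqrt

    def coverage(k: float) -> float:
        """Fraction of a normal distribution within mean +/- k standard deviations."""
        return erf(k / sqrt(2))

    for k in (1, 2, 3):
        print(f"x̄ ± {k}s covers {coverage(k):.3%}")
    # prints approximately 68.27 %, 95.45 %, and 99.73 %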

6.3.1 Mean

The average of a set of n data xi:

x̄ = Σxi / n   (6.1)

6.3.2 Standard deviation

This is the most commonly used measure of the spread or dispersion of data
around the mean. The standard deviation is defined as the square root of
the variance (V). The variance is defined as the sum of the squared deviations
from the mean, divided by n-1. Operationally, there are several equivalent ways of
calculation:

s = √[ Σ(xi - x̄)² / (n-1) ]   (6.2)

or

s = √[ (Σxi² - (Σxi)²/n) / (n-1) ]   (6.3)

or

s = √[ (Σxi² - n·x̄²) / (n-1) ]   (6.4)
The calculation of the mean and the standard deviation can easily be done on a
calculator but most conveniently on a PC with computer programs such as dBASE,
Lotus 123, Quattro-Pro, Excel, and others, which have simple ready-to-use
functions. (Warning: some programs use n rather than n- 1!).
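
As a concrete illustration of Eqs. (6.1)–(6.4) and of the n versus n-1 warning above, the following short Python sketch (the use of Python and the variable names are illustrative choices, not part of the original text) computes the mean and both forms of the standard deviation for the pipette calibration data used later in Section 6.3.4.

    # Mean and standard deviation; statistics.stdev divides by n-1,
    # statistics.pstdev divides by n (the "warning" mentioned in the text).
    import statistics

    data = [19.941, 19.812, 19.829, 19.828, 19.742,
            19.797, 19.937, 19.847, 19.885, 19.804]   # pipette data, Section 6.3.4

    mean = statistics.mean(data)       # x̄
    s = statistics.stdev(data)         # n-1 divisor (correct for samples)
    s_pop = statistics.pstdev(data)    # n divisor (what some programs use)
    print(round(mean, 3), round(s, 4), round(s_pop, 4))   # 19.842  0.0627  0.0595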

6.3.3 Relative standard deviation. Coefficient of variation

Although the standard deviation of analytical data may not vary much over
limited ranges of such data, it usually depends on the magnitude of such data:
the larger the figures, the larger s. Therefore, for comparison of variations (e.g.
precision) it is often more convenient to use the relative standard deviation
(RSD) than the standard deviation itself. The RSD is expressed as a fraction, but
more usually as a percentage and is then called coefficient of variation
(CV). Often, however, these terms are confused.

RSD = s / x̄   (6.5)

CV = (s / x̄) × 100%   (6.6)

Note. When needed (e.g. for the F-test, see Eq. 6.11) the variance
can, of course, be calculated by squaring the standard deviation:
V = s²   (6.7)

6.3.4 Confidence limits of a measurement

The more an analysis or measurement is replicated, the closer the mean x̄ of the
results will approach the "true" value m of the analyte content (assuming absence
of bias).

A single analysis of a test sample can be regarded as literally sampling the


imaginary set of a multitude of results obtained for that test sample. The
uncertainty of such subsampling is expressed by

m = x̄ ± t·s/√n   (6.8)

where

m = "true" value (mean of large set of replicates)


¯x = mean of subsamples
t = a statistical value which depends on the number of data and
the required confidence (usually 95%).
s = standard deviation of the set of subsamples
n = number of subsamples
(The term s/√n is also known as the standard error of the mean.)

The critical values for t are tabulated in Appendix 1 (they are, therefore, here
referred to as ttab ). To find the applicable value, the number of degrees of
freedom has to be established by: df = n -1 (see also Section 6.4.2).

Example

For the determination of the clay content in the particle-size analysis, a semi-
automatic pipette installation is used with a 20 mL pipette. This volume is
approximate and the operation involves the opening and closing of taps.
Therefore, the pipette has to be calibrated, i.e. both the accuracy (trueness) and
precision have to be established.

A tenfold measurement of the volume yielded the following set of data (in mL):

19.941 19.812 19.829 19.828 19.742


19.797 19.937 19.847 19.885 19.804

The mean is 19.842 mL and the standard deviation 0.0627 mL. According to
Appendix 1, for n = 10 (df = 9), ttab = 2.26, and using Eq. (6.8) this calibration
yields:

pipette volume = 19.842 ± 2.26 × (0.0627/√10) = 19.84 ± 0.04 mL

(Note that the pipette has a systematic deviation from 20 mL as this is outside
the found confidence interval. See also bias).
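
For readers who wish to reproduce the pipette example, the following minimal Python sketch (an illustrative addition; the tabulated value ttab = 2.26 is simply typed in from Appendix 1) recalculates the confidence interval of Eq. (6.8).

    # Confidence interval of the mean pipette volume, Eq. (6.8).
    from math import sqrt

    mean, s, n, t_tab = 19.842, 0.0627, 10, 2.26
    half_width = t_tab * s / sqrt(n)
    print(f"volume = {mean:.2f} ± {half_width:.2f} mL")   # 19.84 ± 0.04 mL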

In routine analytical work, results are usually single values obtained in batches of
several test samples. No laboratory will analyze a test sample 50 times to be
confident that the result is reliable. Therefore, the statistical parameters have to
be obtained in another way. Most usually this is done by method validation (see
Chapter 7) and/or by keeping control charts, which is basically the collection of
analytical results from one or more control samples in each batch (see Chapter 8).
Equation (6.8) is then reduced to

m = x ± t·s   (6.9)

where
m = "true" value
x = single measurement
t = applicable ttab (Appendix 1)
s = standard deviation of set of previous measurements.

In Appendix 1 it can be seen that if the set of replicated measurements is large (say
> 30), t is close to 2. Therefore, the (95%) confidence of the result x of a single
test sample (n = 1 in Eq. 6.8) is approximated by the commonly used and well
known expression

m = x ± 2s   (6.10)

where s is the previously determined standard deviation of the large set of


replicates (see also Fig. 6-2).

Note: This "method-s" or s of a control sample is not a constant


and may vary for different test materials, analyte levels, and with
analytical conditions.

Running duplicates will, according to Equation (6.8), increase the confidence of


the (mean) result by a factor √2:

m = x̄ ± t·s/√2

where

x̄ = mean of duplicates
s = known standard deviation of the large set

Similarly, triplicate analysis will increase the confidence by a factor √3, etc.


Duplicates are further discussed in Section 8.3.3.

Thus, in summary, Equation (6.8) can be applied in various ways to determine the
size of errors (confidence) in analytical work or measurements: single
determinations in routine work, determinations for which no previous data exist,
certain calibrations, etc.

6.3.5 Propagation of errors


6.3.5.1. Propagation of random errors
6.3.5.2 Propagation of systematic errors

The final result of an analysis is often calculated from several measurements


performed during the procedure (weighing, calibration, dilution, titration,
instrument readings, moisture correction, etc.). As was indicated in Section 6.2, the
total error in an analytical result is an adding-up of the sub-errors made in the
various steps. For daily practice, the bias and precision of the whole method are
usually the most relevant parameters (obtained from validation, Chapter 7; or
from control charts, Chapter 8). However, sometimes it is useful to get an insight
into the contributions of the subprocedures (which then have to be determined
separately), for instance when one wants to change (part of) the method.

Because the "adding-up" of errors is usually not a simple summation, this will be
discussed. The main distinction to be made is between random errors (precision)
and systematic errors (bias).

6.3.5.1. Propagation of random errors

In estimating the total random error from factors in a final calculation, the
treatment of summation or subtraction of factors is different from that of
multiplication or division.

I. Summation calculations

If the final result x is obtained from the sum (or difference) of


(sub)measurements a, b, c, etc.:

x = a + b + c +...

then the total precision is expressed by the standard deviation obtained by taking
the square root of the sum of the individual variances (squares of the standard deviations):

sx = √(sa² + sb² + sc² + ...)

If a (sub)measurement has a constant multiplication factor or coefficient (such as


an extra dilution), then this is included when calculating the effect of the variance
concerned, e.g. (2·sb)².

Example
The Effective Cation Exchange Capacity of soils (ECEC) is obtained by summation
of the exchangeable cations:

ECEC = Exch. (Ca + Mg + Na + K + H + Al)

Standard deviations experimentally obtained for exchangeable Ca, Mg, Na, K and
(H + Al) on a certain sample, e.g. a control sample, are: 0.30, 0.25, 0.15, 0.15, and
0.60 cmolc/kg respectively. The total precision is:

sECEC = √(0.30² + 0.25² + 0.15² + 0.15² + 0.60²) = 0.75 cmolc/kg

It can be seen that the total standard deviation is larger than the highest
individual standard deviation, but (much) less than their sum. It is also clear that if
one wants to reduce the total standard deviation, qualitatively the best result can
be expected from reducing the largest individual contribution, in this case the
exchangeable acidity.
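
The ECEC example can be reproduced with a few lines of Python (illustrative only; the component standard deviations are those quoted above).

    # Propagation of random errors for a sum: the total standard deviation
    # is the square root of the sum of the individual variances.
    from math import sqrt

    s_components = [0.30, 0.25, 0.15, 0.15, 0.60]   # Ca, Mg, Na, K, (H+Al), in cmolc/kg
    s_total = sqrt(sum(s**2 for s in s_components))
    print(round(s_total, 2))   # about 0.75 cmolc/kg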

2. Multiplication calculations

If the final result x is obtained from multiplication (or division) of


(sub)measurements according to

x = (a × b) / (c × d) × ...

then the total relative error is expressed by the relative standard deviation obtained by
taking the square root of the sum of the squares of the individual relative standard
deviations (RSD or CV, as a fraction or as a percentage, see Eqs. 6.5 and 6.6):

RSDx = √(RSDa² + RSDb² + RSDc² + ...)

If a (sub)measurement has a constant multiplication factor or coefficient, then this


is included when calculating the effect of the RSD concerned, e.g. (2·RSDb)².

Example

The calculation of Kjeldahl-nitrogen may be as follows:

%N = (a - b) × M × 1.4 × mcf / s

where
a = mL HCl required for titration of the sample
b = mL HCl required for titration of the blank
s = air-dry sample weight in gram
M = molarity of the HCl
1.4 = 14 × 10⁻³ × 100% (14 = atomic weight of N)
mcf = moisture correction factor

Note that in addition to multiplications, this calculation contains a subtraction


also (often, calculations contain both summations and multiplications.)

Firstly, the standard deviation of the titration (a - b) is determined as indicated for


summations above. This is then transformed to an RSD using Equation (6.5) or (6.6).
Then the RSDs of the other individual parameters have to be determined
experimentally. The found RSDs are, for instance:

distillation: 0.8%,
titration: 0.5%,
molarity: 0.2%,
sample weight: 0.2%,
mcf: 0.2%.

The total calculated precision is:

RSDtotal = √(0.8² + 0.5² + 0.2² + 0.2² + 0.2²) = 1.0%

Here again, the highest RSD (of distillation) dominates the total precision. In
practice, the precision of the Kjeldahl method is usually considerably worse
(≈ 2.5%), probably mainly as a result of the heterogeneity of the sample. The
present example does not take that into account. It would imply that 2.5% - 1.0%
= 1.5% or 3/5 of the total random error is due to sample heterogeneity (or other
overlooked cause). This implies that painstaking efforts to improve subprocedures
such as the titration or the preparation of standard solutions may not be very
rewarding. It would, however, pay to improve the homogeneity of the sample, e.g.
by careful grinding and mixing in the preparatory stage.
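
Similarly, the propagation of the relative standard deviations in the Kjeldahl example can be checked with a short Python sketch (illustrative only; the RSD values are those assumed above).

    # Propagation of random errors for a product/quotient: relative standard
    # deviations (here in %) add quadratically.
    from math import sqrt

    rsd = {"distillation": 0.8, "titration": 0.5, "molarity": 0.2,
           "sample weight": 0.2, "mcf": 0.2}          # in %
    rsd_total = sqrt(sum(v**2 for v in rsd.values()))
    print(round(rsd_total, 1))   # about 1.0 %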

Note. Sample heterogeneity is also represented in the moisture


correction factor. However, the influence of this factor on the final
result is usually very small.
6.3.5.2 Propagation of systematic errors

Systematic errors of (sub)measurements contribute directly to the total bias of the


result since the individual parameters in the calculation of the final result each
carry their own bias. For instance, the systematic error in a balance will cause a
systematic error in the sample weight (as well as in the moisture determination).
Note that some systematic errors may cancel out, e.g. weighings by difference
may not be affected by a biased balance.

The only way to detect or avoid systematic errors is by comparison (calibration)


with independent standards and outside reference or control samples.

6.4 Statistical tests

6.4.1 Two-sided vs. one-sided test


6.4.2 F-test for precision
6.4.3 t-Tests for bias
6.4.4 Linear correlation and regression
6.4.5 Analysis of variance (ANOVA)

In analytical work a frequently recurring operation is the verification of


performance by comparison of data. Some examples of comparisons in practice
are:

- performance of two instruments,

- performance of two methods,

- performance of a procedure in different periods,

- performance of two analysts or laboratories,

- results obtained for a reference or control sample compared with the "true",


"target" or "assigned" value of this sample.

Some of the most common and convenient statistical tools to quantify such
comparisons are the F-test, the t-tests, and regression analysis.

Because the F-test and the t-tests are the most basic tests they will be discussed
first. These tests examine if two sets of normally distributed data are similar or
dissimilar (belong or not belong to the same "population") by comparing
their standard deviations and means respectively. This is illustrated in Fig. 6-3.
Fig. 6-3. Three possible cases when comparing two sets of data (n1 = n2). A.
Different mean (bias), same precision; B. Same mean (no bias), different
precision; C. Both mean and precision are different. (The fourth case, identical
sets, has not been drawn).
6.4.1 Two-sided vs. one-sided test

These tests for comparison, for instance between methods A and B, are based on
the assumption that there is no significant difference (the "null hypothesis"). In
other words, when the difference is so small that a tabulated critical
value of F or t is not exceeded, we can be confident (usually at 95% level)
that A and B are not different. Two fundamentally different questions can be
asked concerning both the comparison of the standard deviations s1 and s2 with
the F-test, and of the means x̄1 and x̄2 with the t-test:

1. are A and B different? (two-sided test)


2. is A higher (or lower) than B? (one-sided test).

This distinction has an important practical implication as statistically the


probabilities for the two situations are different: the chance that A and B are only
different ("it can go two ways") is twice as large as the chance that A is higher (or
lower) than B ("it can go only one way"). The most common case is the two-sided
(also called two-tailed) test: there are no particular reasons to expect that the
means or the standard deviations of two data sets are different. An example is
the routine comparison of a control chart with the previous one (see 8.3).
However, when it is expected or suspected that the mean and/or the standard
deviation will go only one way, e.g. after a change in an analytical procedure, the
one-sided (or one-tailed) test is appropriate. In this case the probability that it
goes the other way than expected is assumed to be zero and, therefore, the
probability that it goes the expected way is doubled. Or, more correctly, the
uncertainty in the two-way test of 5% (or the probability of 5% that the critical
value is exceeded) is divided over the two tails of the Gaussian curve (see Fig. 6-
2), i.e. 2.5% at the end of each tail beyond 2s. If we perform the one-sided test
with 5% uncertainty, we actually increase this 2.5% to 5% at the end of one tail.
(Note that for the whole gaussian curve, which is symmetrical, this is then
equivalent to an uncertainty of 10% in two ways!)

This difference in probability in the tests is expressed in the use of two tables of
critical values for both F and t. In fact, the one-sided table at 95% confidence
level is equivalent to the two-sided table at 90% confidence level.

It is emphasized that the one-sided test is only appropriate when a difference in


one direction is expected or aimed at. Of course it is tempting to perform this
test after the results show a clear (unexpected) effect. In fact, however, then a two
times higher probability level was used in retrospect. This is underscored by the
observation that in this way even contradictory conclusions may arise: if in an
experiment calculated values of F and t are found within the range between the
two-sided and one-sided values of Ftab, and ttab, the two-sided test indicates no
significant difference, whereas the one-sided test says that the result of A is
significantly higher (or lower) than that of B. What actually happens is that in the
first case the 2.5% boundary in the tail was just not exceeded, and then,
subsequently, this 2.5% boundary is relaxed to 5% which is then obviously more
easily exceeded. This illustrates that statistical tests differ in strictness and that for
proper interpretation of results in reports, the statistical techniques used,
including the confidence limits or probability, should always be specified.

6.4.2 F-test for precision

Because the result of the F-test may be needed to choose between the
Student's t-test and the Cochran variant (see next section), the F-test is discussed
first.

The F-test (or Fisher's test) is a comparison of the spread of two sets of data to
test if the sets belong to the same population, in other words if the precisions are
similar or dissimilar.

The test makes use of the ratio of the two variances:

F = s1² / s2²   (6.11)

where the larger s2 must be the numerator by convention. If the performances are
not very different, then the estimates s1 and s2 do not differ much and their ratio
(and that of their squares) should not deviate much from unity. In practice, the
calculated F is compared with the applicable F value in the F-table (also called
the critical value, see Appendix 2). To read the table it is necessary to know the
applicable number of degrees of freedom for s1, and s2. These are calculated by:

df1 = n1-1
df2 = n2-1

If Fcal ≤ Ftab one can conclude with 95% confidence that there is no significant
difference in precision (the "null hypothesis" that s1 = s2 is accepted). Thus, there
is still a 5% chance that we draw the wrong conclusion. In certain cases more
confidence may be needed, then a 99% confidence table can be used, which can
be found in statistical textbooks.

Example I (two-sided test)


Table 6-1 gives the data sets obtained by two analysts for the cation exchange
capacity (CEC) of a control sample. Using Equation (6.11) the calculated F value is
1.62. As we had no particular reason to expect that the analysts would perform
differently, we use the F-table for the two-sided test and find Ftab = 4.03
(Appendix 2, df1, = df2 = 9). This exceeds the calculated value and the null
hypothesis (no difference) is accepted. It can be concluded with 95% confidence
that there is no significant difference in precision between the work of Analyst 1
and 2.

Table 6-1. CEC values (in cmolc/kg) of a control sample determined by two
analysts.

Analyst 1: 10.2  10.7  10.5  9.9  9.0  11.2  11.5  10.9  8.9  10.6
Analyst 2:  9.7   9.0  10.2  10.3  10.8  11.1  9.4  9.2  9.8  10.2

Analyst 1: x̄ = 10.34, s = 0.819, n = 10
Analyst 2: x̄ = 9.97, s = 0.644, n = 10

Fcal = 1.62; Ftab = 4.03
tcal = 1.12; ttab = 2.10
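
A minimal Python sketch of this F-test (illustrative only; it uses the summary statistics of Table 6-1, and the critical value Ftab = 4.03 is read from Appendix 2) is:

    # F-test of Eq. (6.11) with the larger variance in the numerator.
    s1, s2 = 0.819, 0.644                  # standard deviations of Analyst 1 and Analyst 2
    F = max(s1, s2)**2 / min(s1, s2)**2
    print(round(F, 2))                     # about 1.62 -> below 4.03: no significant difference in precision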

Example 2 (one-sided test)

The determination of the calcium carbonate content with the Scheibler standard
method is compared with the simple and more rapid "acid-neutralization" method
using one and the same sample. The results are given in Table 6-2. Because of
the nature of the rapid method we suspect it to produce a lower precision then
obtained with the Scheibler method and we can, therefore, perform the one sided
F-test. The applicable Ftab = 3.07 (App. 2, df1, = 12, df2 = 9) which is lower
than Fcal (=18.3) and the null hypothesis (no difference) is rejected. It can be
concluded (with 95% confidence) that for this one sample the precision of the
rapid titration method is significantly worse than that of the Scheibler method.
Table 6-2. Contents of CaCO3 (in mass/mass %) in a soil sample determined with
the Scheibler method (A) and the rapid titration method (B).

A: 2.5  2.4  2.5  2.6  2.5  2.5  2.4  2.6  2.7  2.4
B: 1.7  1.9  2.3  2.3  2.8  2.5  1.6  1.9  2.6  1.7  2.4  2.2  2.6

A: x̄ = 2.51, s = 0.099, n = 10
B: x̄ = 2.13, s = 0.424, n = 13

Fcal = 18.3; Ftab = 3.07
tcal = 3.12; ttab* = 2.18

(ttab* = Cochran's "alternative" ttab)

6.4.3 t-Tests for bias

6.4.3.1. Student's t-test


6.4.3.2 Cochran's t-test
6.4.3.3 t-Test for large data sets (n ≥ 30)
6.4.3.4 Paired t-test

Depending on the nature of two sets of data ( n, s, sampling nature), the means of
the sets can be compared for bias by several variants of the t-test. The following
most common types will be discussed:

1. Student's t-test for comparison of two independent sets of data


with very similar standard deviations;
2. the Cochran variant of the t-test when the standard deviations of
the independent sets differ significantly;

3. the paired t-test for comparison of strongly dependent sets of


data.

Basically, for the t-tests Equation (6.8) is used but written in a different way:

tcal = |x̄ - m| · √n / s   (6.12)

where

¯x = mean of test results of a sample


m = "true" or reference value
s = standard deviation of test results
n = number of test results of the sample.

To compare the mean of a data set with a reference value normally the "two-
sided t-table of critical values" is used (Appendix 1). The applicable number of
degrees of freedom here is:

df = n-1

If a value for t calculated with Equation (6.12) does not exceed the critical value in
the table, the data are taken to belong to the same population: there is no
difference and the "null hypothesis" is accepted (with the applicable probability,
usually 95%).

As with the F-test, when it is expected or suspected that the obtained results are
higher or lower than that of the reference value, the one-sided t-test can be
performed: if tcal > ttab, then the results are significantly higher (or lower) than the
reference value.

More commonly, however, the "true" value of proper reference samples is


accompanied by the associated standard deviation and number of replicates used
to determine these parameters. We can then apply the more general case of
comparing the means of two data sets: the "true" value in Equation (6.12) is then
replaced by the mean of a second data set. As is shown in Fig. 6-3, to test if two
data sets belong to the same population it is tested if the two Gauss curves do
sufficiently overlap. In other words, if the difference between the means x̄1 - x̄2 is
small. This is discussed next.
Similarity or non-similarity of standard deviations

When using the t-test for two small sets of data (n1 and/or n2<30), a choice of
the type of test must be made depending on the similarity (or non-similarity) of
the standard deviations of the two sets. If the standard deviations are sufficiently
similar they can be "pooled" and the Student t-test can be used. When the
standard deviations are not sufficiently similar an alternative procedure for the t-
test must be followed in which the standard deviations are not pooled. A
convenient alternative is the Cochran variant of the t-test. The criterion for the
choice is the passing or non-passing of the F-test (see 6.4.2), that is, if the
variances do or do not significantly differ. Therefore, for small data sets, the F-test
should precede the t-test.

For dealing with large data sets (n1, n2 ≥ 30) the "normal" t-test is used (see
Section 6.4.3.3 and App. 3).

6.4.3.1. Student's t-test

To be applied to small data sets (n1, n2 < 30)


where s1 and s2 are similar according to the F-test.

When comparing two sets of data, Equation (6.12) is rewritten as:

tcal = (x̄1 - x̄2) / [ sp · √(1/n1 + 1/n2) ]   (6.13)

where

¯x1 = mean of data set 1


¯x2 = mean of data set 2
sp = "pooled" standard deviation of the sets
n1 = number of data in set 1
n2 = number of data in set 2.

The pooled standard deviation sp is calculated by:

sp = √{ [ (n1-1)·s1² + (n2-1)·s2² ] / (n1 + n2 - 2) }   (6.14)

where
s1 = standard deviation of data set 1
s2 = standard deviation of data set 2
n1 = number of data in set 1
n2 = number of data in set 2.

To perform the t-test, the critical ttab has to be found in the table (Appendix 1);
the applicable number of degrees of freedom df is here calculated by:

df = n1 + n2 -2

Example

The two data sets of Table 6-1 can be used. With Equations (6.13) and
(6.14) tcal is calculated as 1.12, which is lower than the critical value ttab of 2.10
(App. 1, df = 18, two-sided), hence the null hypothesis (no difference) is accepted
and the two data sets are assumed to belong to the same population: there is no
significant difference between the mean results of the two analysts (with 95%
confidence).

Note. Another illustrative way to perform this test for bias is to


calculate if the difference between the means falls within or outside
the range where this difference is still not significantly large. In
other words, if this difference is less than the least significant
difference (lsd). This can be derived from Equation (6.13):
lsd = ttab · sp · √(1/n1 + 1/n2)   (6.15)

In the present example of Table 6-1, the calculation yields lsd = 0.69. The
measured difference between the means is 10.34 -9.97 = 0.37 which is smaller
than the lsd indicating that there is no significant difference between the
performance of the analysts.

In addition, in this approach the 95% confidence limits of the difference between
the means can be calculated (cf. Equation 6.8):

confidence limits = 0.37 ± 0.69 = -0.32 and 1.06

Note that the value 0 for the difference is situated within this confidence interval
which agrees with the null hypothesis of x̄1 = x̄2 (no difference) having been
accepted.
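
The same comparison can be written out in Python following Eqs. (6.13)–(6.15) (an illustrative sketch, not part of the original text; the summary statistics are those of Table 6-1 and ttab = 2.10 is taken from Appendix 1).

    # Student's t-test with a pooled standard deviation, Eqs. (6.13)-(6.15).
    from math import sqrt

    x1, s1, n1 = 10.34, 0.819, 10     # Analyst 1
    x2, s2, n2 = 9.97, 0.644, 10      # Analyst 2

    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))   # pooled sd, Eq. (6.14)
    t_cal = abs(x1 - x2) / (sp * sqrt(1/n1 + 1/n2))                    # Eq. (6.13)
    lsd = 2.10 * sp * sqrt(1/n1 + 1/n2)                                # Eq. (6.15), t_tab = 2.10 (df = 18)
    print(round(t_cal, 2), round(lsd, 2))   # about 1.12 and 0.69: no significant difference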

6.4.3.2 Cochran's t-test


To be applied to small data sets (n1, n2 < 30)
where s1 and s2 are dissimilar according to the F-test.

Calculate t with:

tcal = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)   (6.16)

Then determine an "alternative" critical t-value:

ttab* = (t1·s1²/n1 + t2·s2²/n2) / (s1²/n1 + s2²/n2)   (6.17)

where

t1 = ttab at n1-1 degrees of freedom


t2 = ttab at n2-1 degrees of freedom

Now the t-test can be performed as usual: if tcal< ttab* then the null hypothesis that
the means do not significantly differ is accepted.

Example

The two data sets of Table 6-2 can be used.

According to the F-test, the standard deviations differ significantly so that the
Cochran variant must be used. Furthermore, in contrast to our expectation that
the precision of the rapid test would be inferior, we have no idea about the bias
and therefore the two-sided test is appropriate. The calculations yield tcal = 3.12
and ttab* = 2.18, meaning that tcal exceeds ttab*, which implies that the null hypothesis
(no difference) is rejected and that the mean of the rapid analysis deviates
significantly from that of the standard analysis (with 95% confidence, and for this
sample only). Further investigation of the rapid method would have to include the
use of more different samples and then comparison with the one-sided t-test
would be justified (see 6.4.3.4, Example 1).
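
A short Python sketch of the Cochran variant (illustrative only; it works from the summary statistics of Table 6-2, with the tabulated values t1 = 2.262 at df = 9 and t2 = 2.179 at df = 12 from Appendix 1) is:

    # Cochran variant of the t-test, Eqs. (6.16)-(6.17), for dissimilar standard deviations.
    from math import sqrt

    x1, s1, n1 = 2.51, 0.099, 10      # Scheibler method (A)
    x2, s2, n2 = 2.13, 0.424, 13      # rapid titration method (B)
    t1, t2 = 2.262, 2.179             # t_tab at df = 9 and df = 12 (Appendix 1, two-sided)

    se1, se2 = s1**2 / n1, s2**2 / n2
    t_cal = abs(x1 - x2) / sqrt(se1 + se2)            # Eq. (6.16)
    t_star = (t1 * se1 + t2 * se2) / (se1 + se2)      # Eq. (6.17), "alternative" t_tab
    print(round(t_cal, 2), round(t_star, 2))          # about 3.12 and 2.18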

6.4.3.3 t-Test for large data sets (n ≥ 30)


In the example above (6.4.3.2) the conclusion happens to have been the same if
the Student's t-test with pooled standard deviations had been used. This is
caused by the fact that the difference in result of the Student and Cochran
variants of the t-test is largest when small sets of data are compared, and
decreases with increasing number of data. Namely, with increasing number of
data a better estimate of the real distribution of the population is obtained (the
flatter t-distribution converges then to the standardized normal distribution).
When n ≥ 30 for both sets, e.g. when comparing Control Charts (see 8.3), for all
practical purposes the difference between the Student and Cochran variants is
negligible. The procedure is then reduced to the "normal" t-test by simply
calculating tcal with Eq. (6.16) and comparing this with ttab at df = n1 + n2 - 2. (Note
in App. 1 that the two-sided ttab is now close to 2).

The proper choice of the t-test as discussed above is summarized in a flow


diagram in Appendix 3.

6.4.3.4 Paired t-test

When two data sets are not independent, the paired t-test can be a better tool
for comparison than the "normal" t-test described in the previous sections. This is
for instance the case when two methods are compared by the same analyst using
the same sample(s). It could, in fact, also be applied to the example of Table 6-1
if the two analysts used the same analytical method at (about) the same time.

As stated previously, comparison of two methods using different levels of analyte


gives more validation information about the methods than using only one level.
Comparison of results at each level could be done by the F and t-tests as
described above. The paired t-test, however, allows for different levels provided
the concentration range is not too wide. As a rule of thumb, the range of results
should be within the same order of magnitude. If the analysis covers a longer range, i.e.
several powers of ten, regression analysis must be considered (see Section 6.4.4).
In intermediate cases, either technique may be chosen.

The null hypothesis is that there is no difference between the data sets, so the
test is to see if the mean of the differences between the data deviates
significantly from zero or not (two-sided test). If it is expected that one set is
systematically higher (or lower) than the other set, then the one-sided test is
appropriate.

Example 1

The "promising" rapid single-extraction method for the determination of the


cation exchange capacity of soils using the silver thiourea complex (AgTU,
buffered at pH 7) was compared with the traditional ammonium acetate method
(NH4OAc, pH 7). Although for certain soil types the difference in results appeared
insignificant, for other types differences seemed larger. Such a suspect group
were soils with ferralic (oxic) properties (i.e. highly weathered sesquioxide-rich
soils). In Table 6-3 the results of ten soils with these properties are grouped to test
if the CEC methods give different results. The difference d within each pair and
the parameters needed for the paired t-test are given also.

Table 6-3. CEC values (in cmolc/kg) obtained by the NH4OAc and AgTU methods
(both at pH 7) for ten soils with ferralic properties.

Sample NH4OAc AgTU d


1 7.1 6.5 -0.6
2 4.6 5.6 +1.0
3 10.6 14.5 +3.9
4 2.3 5.6 +3.3
5 25.2 23.8 -1.4
6 4.4 10.4 +6.0
7 7.8 8.4 +0.6
8 2.7 5.5 +2.8
9 14.3 19.2 +4.9
10 13.6 15.0 +1.4
d̄ = +2.19; sd = 2.395
tcal = 2.89; ttab = 2.26

Using Equation (6.12) and noting that md = 0 (the hypothesis value of the


differences, i.e. no difference), the t-value can be calculated as:

tcal = |d̄| · √n / sd = 2.19 × √10 / 2.395 = 2.89

where

d̄ = mean of the differences within each pair of data


sd = standard deviation of the differences
n = number of pairs of data

The calculated t value (=2.89) exceeds the critical value of 1.83 (App. 1, df = n -1
= 9, one-sided), hence the null hypothesis that the methods do not differ is
rejected and it is concluded that the silver thiourea method gives significantly
higher results as compared with the ammonium acetate method when applied to
such highly weathered soils.

Note. Since such data sets do not have a normal distribution, the
"normal" t-test which compares means of sets cannot be used here
(the means do not constitute a fair representation of the sets). For
the same reason no information about the precision of the two
methods can be obtained, nor can the F-test be applied. For
information about precision, replicate determinations are needed.
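
The paired t-test of this example can be reproduced with the following Python sketch (illustrative only; the data are those of Table 6-3, and the one-sided critical value 1.83 at df = 9 is taken from Appendix 1).

    # Paired t-test: t = |d̄|·√n / s_d (cf. Eq. 6.12 with m_d = 0).
    import statistics
    from math import sqrt

    nh4oac = [7.1, 4.6, 10.6, 2.3, 25.2, 4.4, 7.8, 2.7, 14.3, 13.6]
    agtu   = [6.5, 5.6, 14.5, 5.6, 23.8, 10.4, 8.4, 5.5, 19.2, 15.0]

    d = [b - a for a, b in zip(nh4oac, agtu)]                     # differences AgTU - NH4OAc
    t = abs(statistics.mean(d)) * sqrt(len(d)) / statistics.stdev(d)
    print(round(statistics.mean(d), 2), round(t, 2))   # d̄ ≈ +2.19, t ≈ 2.89 > 1.83 (one-sided)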

Example 2

Table 6-4 shows the data of total-P in four plant tissue samples obtained by a
laboratory L and the median values obtained by 123 laboratories in a
proficiency (round-robin) test.

Table 6-4. Total-P contents (in mmol/kg) of plant tissue as determined by 123
laboratories (Median) and Laboratory L.

Sample Median Lab L d


1 93.0 85.2 -7.8
2 201 224 23
3 78.9 84.5 5.6
4 175 185 10
d̄ = +7.70; sd = 12.702
tcal = 1.21; ttab = 3.18

To verify the performance of the laboratory a paired t-test can be performed:

Using Eq. (6.12) and noting that md = 0 (the hypothesis value of the differences, i.e. no
difference), the t-value can be calculated as:

tcal = |d̄| · √n / sd = 7.70 × √4 / 12.702 = 1.21

The calculated t-value is below the critical value of 3.18 (Appendix 1, df = n - 1 =


3, two-sided), hence the null hypothesis that the laboratory does not significantly
differ from the group of laboratories is accepted, and the results of
Laboratory L seem to agree with those of "the rest of the world" (this is a so-
called third-line control).

6.4.4 Linear correlation and regression


6.4.4.1 Construction of calibration graph
6.4.4.2 Comparing two sets of data using many samples at different
analyte levels

These are also among the most common and useful statistical tools for comparing effects
and performances X and Y. Although the technique is in principle the same for
both, there is a fundamental difference in concept: correlation analysis is applied
to independent factors: if X increases, what will Y do (increase, decrease, or
perhaps not change at all)? In regression analysis a unilateral response is
assumed: changes in X result in changes in Y, but changes in Y do not result in
changes in X.

For example, in analytical work, correlation analysis can be used for comparing
methods or laboratories, whereas regression analysis can be used to construct
calibration graphs. In practice, however, comparison of laboratories or methods is
usually also done by regression analysis. The calculations can be performed on a
(programmed) calculator or more conveniently on a PC using a home-made
program. Even more convenient are the regression programs included in
statistical packages such as Statistix, Mathcad, Eureka, Genstat, Statcal, SPSS, and
others. Also, most spreadsheet programs such as Lotus 123, Excel, and Quattro-
Pro have functions for this.

Laboratories or methods are in fact independent factors. However, for regression


analysis one factor has to be the independent or "constant" factor (e.g. the
reference method, or the factor with the smallest standard deviation). This factor
is by convention designated X, whereas the other factor is then the dependent
factor Y (thus, we speak of "regression of Y on X").

As was discussed in Section 6.4.3, such comparisons can often be done with
the Student/Cochran or paired t-tests. However, correlation analysis is indicated:

1. When the concentration range is so wide that the errors, both


random and systematic, are not independent (which is the
assumption for the t-tests). This is often the case where
concentration ranges of several magnitudes are involved.

2. When pairing is inappropriate for other reasons, notably a long


time span between the two analyses (sample aging, change in
laboratory conditions, etc.).
The principle is to establish a statistical linear relationship between two sets of
corresponding data by fitting the data to a straight line by means of the "least
squares" technique. Such data are, for example, analytical results of two methods
applied to the same samples (correlation), or the response of an instrument to a
series of standard solutions (regression).

Note: Naturally, non-linear higher-order relationships are also


possible, but since these are less common in analytical work and
more complex to handle mathematically, they will not be discussed
here. Nevertheless, to avoid misinterpretation, always inspect the
kind of relationship by plotting the data, either on paper or on the
computer monitor.

The resulting line takes the general form:

y = bx + a   (6.18)

where

a = intercept of the line with the y-axis


b = slope (tangent)

In laboratory work ideally, when there is perfect positive correlation without bias,
the intercept a = 0 and the slope = 1. This is the so-called "1:1 line" passing
through the origin (dashed line in Fig. 6-5).

If the intercept a ≠ 0 then there is a systematic discrepancy (bias, error)


between X and Y; when b ≠ 1 then there is a proportional response or difference
between X and Y.

The correlation between X and Y is expressed by the correlation


coefficient r which can be calculated with the following equation:

r = Σ[(xi - x̄)(yi - ȳ)] / √[ Σ(xi - x̄)² · Σ(yi - ȳ)² ]   (6.19)

where

xi = data X
¯x = mean of data X
yi = data Y
¯y = mean of data Y

It can be shown that r can vary from 1 to -1:

r = 1 perfect positive linear correlation


r = 0 no linear correlation (maybe other correlation)
r = -1 perfect negative linear correlation

Often, the correlation coefficient r is expressed as r²: the coefficient of


determination. The advantage of r² is that, when multiplied by 100, it indicates the
percentage of variation in Y associated with variation in X. Thus, for example, when
r = 0.71 about 50% (r² = 0.504) of the variation in Y is due to the variation in X.

The line parameters b and a are calculated with the following equations:

b = Σ[(xi - x̄)(yi - ȳ)] / Σ(xi - x̄)²   (6.20)

and

a = ȳ - b·x̄   (6.21)

It is worth noting that r is independent of the choice of which factor is the


independent factor X and which is the dependent factor Y. However, the regression
parameters a and b do depend on this choice, as the regression lines will be
different (except when there is ideal 1:1 correlation).

6.4.4.1 Construction of calibration graph

As an example, we take a standard series of P (0-1.0 mg/L) for the


spectrophotometric determination of phosphate in a Bray-I extract ("available P"),
reading in absorbance units. The data and calculated terms needed to determine
the parameters of the calibration graph are given in Table 6-5. The line itself is
plotted in Fig. 6-4.

Table 6-5 is presented here to give an insight in the steps and terms involved.
The calculation of the correlation coefficient r with Equation (6.19) yields a value
of 0.997 (r2 = 0.995). Such high values are common for calibration graphs. When
the value is not close to 1 (say, below 0.98) this must be taken as a warning and
it might then be advisable to repeat or review the procedure. Errors may have
been made (e.g. in pipetting) or the used range of the graph may not be linear.
On the other hand, a high r may be misleading as it does not necessarily indicate
linearity. Therefore, to verify this, the calibration graph should always be plotted,
either on paper or on computer monitor.

Using Equations (6.20) and (6.21) we obtain:

b = 0.438 / 0.70 = 0.626

and

a = 0.350 - 0.626 × 0.5 = 0.350 - 0.313 = 0.037

Thus, the equation of the calibration line is:

y = 0.626x + 0.037   (6.22)

Table 6-5. Parameters of calibration graph in Fig. 6-4.

xi     yi     xi-x̄    (xi-x̄)²   yi-ȳ    (yi-ȳ)²   (xi-x̄)(yi-ȳ)

0.0    0.05   -0.5     0.25     -0.30    0.090     0.150
0.2    0.14   -0.3     0.09     -0.21    0.044     0.063
0.4    0.29   -0.1     0.01     -0.06    0.004     0.006
0.6    0.43    0.1     0.01      0.08    0.006     0.008
0.8    0.52    0.3     0.09      0.17    0.029     0.051
1.0    0.67    0.5     0.25      0.32    0.102     0.160
Σ: 3.0  2.10    0       0.70      0       0.2754    0.438

x̄ = 0.5   ȳ = 0.35

Fig. 6-4. Calibration graph plotted from data of Table 6-5. The dashed lines
delineate the 95% confidence area of the graph. Note that the confidence is
highest at the centroid of the graph.
During calculation, the maximum number of decimals is used, rounding off to the
last significant figure is done at the end (see instruction for rounding off in
Section 8.2).

Once the calibration graph is established, its use is simple: for each y value
measured the corresponding concentration x can be determined either by direct
reading or by calculation using Equation (6.22). The use of calibration graphs is
further discussed in Section 7.2.2.
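
As an illustration, the calibration line and correlation coefficient of Table 6-5 can be reproduced with the Python sketch below (illustrative only; statistics.linear_regression and statistics.correlation require Python 3.10 or later), including the back-calculation of a concentration from a measured absorbance.

    # Least-squares calibration line for the P standard series of Table 6-5;
    # compare with y = 0.626x + 0.037 (Eq. 6.22).
    import statistics

    x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]          # mg/L P
    y = [0.05, 0.14, 0.29, 0.43, 0.52, 0.67]    # absorbance

    b, a = statistics.linear_regression(x, y)   # slope, intercept
    r = statistics.correlation(x, y)
    print(round(b, 3), round(a, 3), round(r, 4))   # ≈ 0.626, 0.037, 0.9976 (r² ≈ 0.995)

    # Reading a concentration back from a measured absorbance:
    absorbance = 0.35
    print(round((absorbance - a) / b, 3))       # ≈ 0.5 mg/L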

Note. A treatise of the error or uncertainty in the regression line is


given below (see: Error or uncertainty in the regression line).
6.4.4.2 Comparing two sets of data using many samples at different analyte
levels

Although regression analysis assumes that one factor (on the x-axis) is constant,
when certain conditions are met the technique can also successfully be applied to
comparing two variables such as laboratories or methods. These conditions are:
- The most precise data set is plotted on the x-axis
- At least 6, but preferably more than 10 different samples are
analyzed
- The samples should rather uniformly cover the analyte level range
of interest.

To decide which laboratory or method is the most precise, multi-replicate results


have to be used to calculate standard deviations (see 6.4.2). If these are not
available then the standard deviations of the present sets could be compared
(note that we are now not dealing with normally distributed sets of replicate
results). Another convenient way is to run the regression analysis on the
computer, reverse the variables and run the analysis again. Observe which variable
has the lowest standard deviation (or standard error of the intercept a, both given
by the computer) and then use the results of the regression analysis where this
variable was plotted on the x-axis.

If the analyte level range is incomplete, one might have to resort to spiking or
standard additions, with the inherent drawback that the original analyte-sample
combination may not adequately be reflected.

Example

In the framework of a performance verification programme, a large number of soil


samples were analyzed by two laboratories X and Y (a form of "third-line control",
see Chapter 9) and the data compared by regression. (In this particular case, the
paired t-test might have been considered also). The regression line of a common
attribute, the pH, is shown here as an illustration. Figure 6-5 shows the so-called
"scatter plot" of 124 soil pH-H2O determinations by the two laboratories. The
correlation coefficient r is 0.97 which is very satisfactory. The slope (= 1.03)
indicates that the regression line is only slightly steeper than the 1:1 ideal
regression line. Very disturbing, however, is the intercept a of -1.18. This implies
that laboratory Y measures the pH more than a whole unit lower than
laboratory X at the low end of the pH range (the intercept -1.18 is at pH x = 0)
which difference decreases to about 0.8 unit at the high end.

Fig. 6-5. Scatter plot of pH data of two laboratories. Drawn line: regression
line; dashed line: 1:1 ideal regression line.
The t-test for significance is as follows:

For intercept a: μ_a = 0 (null hypothesis: no bias; the ideal intercept is then zero),
standard error = 0.14 (calculated by the computer), and using Equation (6.12) we
obtain:

t = |-1.18 - 0| / 0.14 = 8.4

Here, ttab = 1.98 (App. 1, two-sided, df = n - 2 = 122; n - 2 because an extra
degree of freedom is lost as the data are used for both a and b). Since 8.4 > 1.98,
the laboratories have a significant mutual bias.

For slope b: μ_b = 1 (ideal slope; the null hypothesis is no difference), standard
error = 0.02 (given by the computer), and again using Equation (6.12) we obtain:

t = |1.03 - 1| / 0.02 = 1.5

Again, ttab = 1.98 (App. 1; two-sided, df = 122). Since 1.5 < 1.98, the difference
between the laboratories is not significantly proportional (or: the laboratories do
not have a significant difference in sensitivity). These results suggest that, in
spite of the good correlation, the two laboratories would have to look into the
cause of the bias.
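
The two tests can be reproduced with a few lines of Python. The helper below is a hypothetical illustration (not part of the Guidelines), using the intercept, slope and standard errors reported above and scipy for the critical t-value:

from scipy import stats

def regression_t_tests(a, se_a, b, se_b, n, alpha=0.05):
    """Test intercept against 0 and slope against 1 for n paired results."""
    df = n - 2                              # two parameters are estimated
    t_tab = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
    t_a = abs(a - 0.0) / se_a               # H0: no bias (intercept = 0)
    t_b = abs(b - 1.0) / se_b               # H0: same sensitivity (slope = 1)
    return t_a, t_b, t_tab

t_a, t_b, t_tab = regression_t_tests(a=-1.18, se_a=0.14, b=1.03, se_b=0.02, n=124)
print(t_a, t_b, t_tab)   # about 8.4, 1.5 and 1.98: bias significant, slope not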

Note. In the present example, the scattering of the points around
the regression line does not seem to change much over the whole
range. This indicates that the precision of laboratory Y does not
change very much over the range with respect to laboratory X. This
is not always the case. In such cases, weighted regression (not
treated in detail here; see the sketch below) is more appropriate
than the unweighted regression as used here.
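
For completeness, a hedged sketch of such a weighted fit (weighted regression is not elaborated in these Guidelines; the per-level standard deviations below are assumed values). numpy.polyfit applies the weights to the residuals, so 1/s_i is the appropriate weight when s_i is the standard deviation of y at level i:

import numpy as np

x   = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
y   = np.array([0.05, 0.14, 0.29, 0.43, 0.52, 0.67])
s_i = np.array([0.01, 0.01, 0.02, 0.02, 0.03, 0.04])   # assumed SD of y per level

# Weighted least squares: points with small s_i pull the line more strongly
b_w, a_w = np.polyfit(x, y, 1, w=1.0 / s_i)
print(f"weighted fit: a = {a_w:.3f}, b = {b_w:.3f}")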

Validation of a method (see Section 7.5) may reveal that precision
can change significantly with the level of analyte (and with other
factors such as sample matrix).

6.4.5 Analysis of variance (ANOVA)

When results of laboratories or methods are compared and more than one
factor can be of influence and must be distinguished from random effects,
ANOVA is a powerful statistical tool to be used. Examples of such factors are:
different analysts, samples with different pre-treatments, different analyte levels,
and different methods within one of the laboratories. Most statistical packages for
the PC can perform this analysis; a minimal example is sketched below.
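
As an illustration (assumed data, not from the Guidelines), a one-way ANOVA for an "analyst" effect can be run with scipy:

from scipy import stats

# Hypothetical replicate results of one sample analyzed by three analysts
analyst_1 = [10.2, 10.4, 10.1, 10.3]
analyst_2 = [10.6, 10.8, 10.5, 10.7]
analyst_3 = [10.1, 10.0, 10.3, 10.2]

f_stat, p_value = stats.f_oneway(analyst_1, analyst_2, analyst_3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A p-value below 0.05 would indicate a significant analyst effect
# on top of the random (within-analyst) variation.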

As a treatise of ANOVA is beyond the scope of the present Guidelines, for further
discussion the reader is referred to statistical textbooks, some of which are given
in the list of Literature.

Error or uncertainty in the regression line

The "fitting" of the calibration graph is necessary because the response


points yi, composing the line do not fall exactly on the line. Hence, random errors
are implied. This is expressed by an uncertainty about the slope and
intercept b and a defining the line. A quantification can be found in the standard
deviation of these parameters. Most computer programmes for regression will
automatically produce figures for these. To illustrate the procedure, the example
of the calibration graph in Section 6.4.3.1 is elaborated here.
A practical quantification of the uncertainty is obtained by calculating the
standard deviation of the points on the line; the "residual standard
deviation" or "standard error of the y-estimate", which we assumed to be
constant (but which is only approximately so, see Fig. 6-4):

s_y = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}}        (6.23)

where

ŷ_i = "fitted" y-value for each x_i (read from the graph or calculated with
Eq. 6.22); thus, y_i - ŷ_i is the (vertical) deviation of the found y-values
from the line;

n = number of calibration points.

Note: Only the y-deviations of the points from the line are
considered. It is assumed that deviations in the x-direction are
negligible. This is, of course, only the case if the standards are very
accurately prepared.

Now the standard deviations for the intercept a and slope b can be calculated
with:

s_a = s_y \sqrt{\frac{\sum_i x_i^2}{n \sum_i (x_i - \bar{x})^2}}        (6.24)

and

6.25

To make this procedure clear, the parameters involved are listed in Table 6-6.

The uncertainty about the regression line is expressed by the confidence limits of
a and b according to Eq. (6.9): a ± t·s_a and b ± t·s_b.

Table 6-6. Parameters for calculating errors due to calibration graph (use also
figures of Table 6-5).

   xi      yi      ŷi       yi-ŷi    (yi-ŷi)²
   0.0     0.05    0.037     0.013   0.0002
   0.2     0.14    0.162    -0.022   0.0005
   0.4     0.29    0.287     0.003   0.0000
   0.6     0.43    0.413     0.017   0.0003
   0.8     0.52    0.538    -0.018   0.0003
   1.0     0.67    0.663     0.007   0.0001
                               Σ     0.001364

In the present example, using Eq. (6.23) with the sum of squared deviations from
Table 6-6 (n = 6), we calculate s_y ≈ 0.018; using Eq. (6.24) and Table 6-5
(Σx_i² = 2.2, Σ(x_i - x̄)² = 0.70), we obtain s_a ≈ 0.0132; and, using Eq. (6.25)
and Table 6-5, s_b ≈ 0.0219.

The applicable ttab is 2.78 (App. 1, two-sided, df = n - 2 = 4), hence, using Eq. (6.9):

a = 0.037 ± 2.78 × 0.0132 = 0.037 ± 0.037

and

b = 0.626 ± 2.78 × 0.0219 = 0.626 ± 0.061
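
The whole calculation, including Eqs. (6.23)-(6.25), can be checked with a short Python sketch (assuming numpy and scipy are available; the data are those of Table 6-5):

import numpy as np
from scipy import stats

x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
y = np.array([0.05, 0.14, 0.29, 0.43, 0.52, 0.67])
n = len(x)

b, a = np.polyfit(x, y, 1)                     # slope and intercept
y_fit = a + b * x                              # fitted y-values (cf. Table 6-6)

s_y = np.sqrt(np.sum((y - y_fit) ** 2) / (n - 2))                        # Eq. (6.23)
s_a = s_y * np.sqrt(np.sum(x ** 2) / (n * np.sum((x - x.mean()) ** 2)))  # Eq. (6.24)
s_b = s_y / np.sqrt(np.sum((x - x.mean()) ** 2))                         # Eq. (6.25)

t_tab = stats.t.ppf(0.975, n - 2)              # 2.78 for df = 4
print(f"a = {a:.3f} +/- {t_tab * s_a:.3f}")    # about 0.037 +/- 0.037
print(f"b = {b:.3f} +/- {t_tab * s_b:.3f}")    # about 0.626 +/- 0.061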

Note that if sa is large enough, a negative value for a is possible, i.e. a negative
reading for the blank or zero-standard. (For a discussion about the error
in x resulting from a reading in y, which is particularly relevant for reading a
calibration graph, see Section 7.2.3)

The uncertainty about the line is somewhat decreased by using more calibration
points (assuming sy has not increased): one more point reduces ttab from 2.78 to
2.57 (see Appendix 1).
