Semi Final Module in Educ 5
In order to thrive in this constantly changing and extremely challenging period, the acquisition of 21st century skills is necessary. It is imperative that the educational system sees to it that these skills are developed and honed before the learners graduate, and that they are integrated into the program of each discipline. More than just the acquisition of knowledge, its application is important. To ensure that education has really done its role, ways to measure or assess the learning process are necessary. Thus, the assessment process and tools must be suited to the needs and requirements of the 21st century. In this chapter, the characteristics of 21st century assessment, how assessment is used as one of the inputs in making instructional decisions, and outcome-based assessment will be discussed.
Inevitably, the 21st century is here, demanding a lot of change, development, and re-engineering of systems in different fields for this generation to thrive. In the field of education, most of the changes have focused on teaching and learning. Preparing and equipping teachers to cater to the needs of 21st century learners is part of the adjustments being made in the education system. Curricula are updated to address the needs of the community in relation to the demands of the 21st century. This aspect of teaching and learning has been given its share of focus, with the various components and factors analyzed and updated to ensure that students' learning will be at par with the demands of the 21st century. Although a lot of changes have been made to the different facets of education, some members of the educational community are calling for a corresponding change in educational assessment. Viewing educational assessment as an agent of educational change is of great importance. This belief, coupled with the traditional focus on teaching and learning, produces a strong and emerging imperative to alter our long-held conceptions of these three parts: teaching, learning, and assessment (Greenstein, 2012).
Twenty-first century skills must build on the core literacy and numeracy that all students must master. Students need to think critically and creatively, communicate and collaborate effectively, and work globally to be productive, accountable citizens and leaders. These skills, if they are to be honed, must be assessed, not simply to obtain numerical results but, more so, to use the results of assessment as a guide for further action.
Educators need to focus on what to teach, how to teach it, and how to assess it (Greenstein, 2012; Schmoker, 2011).
The Assessment and Teaching of 21st Century Skills project (atc21s.org) holds the core belief that the alignment of goals with learning and assessment is essential to policy and practice. It emphasizes the importance of balanced assessment systems that incorporate 21st century goals.
This section focuses on the characteristics of 21st century assessment and the different types of assessment. You are expected to integrate the concepts that will be discussed, apply them in using appropriate assessment tools and techniques in making instructional decisions, and, finally, relate assessment to learning outcomes.
1.1 Responsive
Visible performance-based work (as a result of assessment) generates data that inform curriculum and instruction. Teachers can adjust instruction, school leaders can consider additional educational opportunities for students, and policy makers can modify programs and resources to cater to the present needs of the school community.
1.2 Flexible
Lesson design, curriculum, and assessment require flexibility, suppleness, and adaptability. Assessments and responses may not be fitted to expected answers; assessment needs to be adaptable to students' settings. Rather than the identical approach that works in traditional assessment, 21st century approaches are more versatile. These approaches best fit the demands of the present learning environment: as students' decisions, actions, and applications vary, the assessments and the system need to be flexible, too.
1.3 Integrated
Assessments are to be incorporated into day-to-day practice rather than as add-ons at the end of instruction or during a single specified week of the school calendar.
1.4 Informative
The desired 21st century goals and objectives are clearly stated and explicitly taught.
Students display their range of emerging knowledge and skills. Exemplars routinely guide
students toward achievement of targets.
1.5 Multiple Methods
Demonstrations of 21st century skills are evident and support learning. Students show the steps they go through and display their thought processes for peer and teacher review.
1.6 Communicated
Communication of assessment data is clear and transparent for all stakeholders. Results are routinely posted to a database along with standards-based commentary, both of which must be available and comprehensible at all levels. Students receive routine feedback on their progress, and parents are kept informed through access to visible progress reports and assessment data.
1.7 Technically Sound
To be valid, assessments must measure the stated objectives and 21st century skills with legitimacy and integrity.
To be reliable, assessments must be precise and technically sound so that users are consistent in their administration and interpretation of data. They produce accurate information for decision-making in all relevant circumstances.
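To make the notion of scoring consistency concrete, the sketch below compares the scores two raters gave the same set of student work. This is a minimal illustration, not part of the module: the 1-4 rubric scale, the sample scores, and the 0.80 agreement threshold are all invented assumptions.

```python
# A minimal sketch of checking scoring consistency (reliability) between
# two raters. Scale, scores, and the 0.80 threshold are illustrative only.

def percent_agreement(rater_a, rater_b):
    """Fraction of works to which both raters gave the same score."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

# Scores (1-4 rubric levels) two teachers gave the same ten essays.
teacher_1 = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]
teacher_2 = [4, 3, 2, 2, 4, 1, 3, 3, 4, 3]

agreement = percent_agreement(teacher_1, teacher_2)
print(f"Inter-rater agreement: {agreement:.0%}")  # 80%
if agreement < 0.80:
    print("Raters should recalibrate on the rubric before scoring more work.")
```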
1.8 Systemic
These eight characteristics of 21st century assessment are an essential guide in the preparation of assessment activities by educators. It is necessary to refer to these characteristics to ensure that learners are being assessed toward the skills and demands of the 21st century.
The educational assessment process starts with analyzing the criterion together with the teaching-learning environment. This is done to determine the effect of the environment on the teaching-learning situation, after which the kinds of evidence that are appropriate to use for the assessment of individuals are set. This helps determine the strengths, weaknesses, needs, personality characteristics, skills, and abilities of the learner (Bloom, 1970).
It is clear that educational assessment encompasses the total educational setting and is not limited to teacher-student engagement. It is not merely based on a single aspect such as taking a test and checking it. In totality, the processes of measurement and evaluation are subsumed in the educational assessment process.
Assessment is constantly taking place in educational settings. Decisions are made about content/subject matter and specific targets, the nature of students and faculty, the morale and satisfaction of both teachers and students, as well as the extent to which student performances meet the standards and/or deliver the outcomes expected of them by the teacher.
Assessments can be used as a basis for decision-making at different phases of the teaching-learning process, shaping how and what decisions teachers make at each phase.
Introduction
Today, educational systems across the globe are undergoing efforts to move beyond the
ways they operated at the beginning of the 20th century, with traditional instructional
practices that commonly ask students to work individually on exams that require them to
recall facts or respond to pre-formulated problems within the narrow boundaries of
individual school subjects. Reforms currently underway reframe what is taught, how it is
learned, and how it is being evaluated in innovative ways that help personalize learning.
Assessments that support learning must explicitly communicate the nature of expected
learning. Research, in fact, shows the powerful effect that on-going assessment embedded
into the learning process has on student learning, particularly for low-ability students
(Black & Wiliam, 1998). Creating such a system of personalized learning requires new
forms of formative and summative student performance assessments that enable
individual students to stretch onward from wherever they are in a learning continuum. For
over a decade, Intel® Corporation has been involved in a number of global initiatives
such as Assessment and Teaching of 21st Century Skills (ATC21S) that support developing
new national assessment strategies and new benchmarking tests. Through its partnerships
with ministries of education, Intel Teach’s teacher professional development programs
have helped millions of teachers in developing countries integrate these innovative
assessment strategies, as well as technology, into their classroom practice (EDC & SRI
International, 2008; Light, Polin, & Strother, 2009). While these strategies support new
assessments of learning, all of the Intel Teach professional development programs also
use a variety of assessment for learning approaches. Assessment for learning is the idea
that classroom assessments should support ongoing teaching and learning (Assessment
Reform Group, 2002; Heritage, 2010), thus highlighting the vital role that teacher-made
classroom-based formative and process-focused assessments could play in improving the
entire education system. In Intel's Getting Started course, teachers learn the technical skills to design rubrics, and the Essentials course teaches teachers how to use rubrics to assess student products and encourages performance-based assessments. The Teaching Thinking
with Technology and the Essentials V10 courses stress formative assessments for 21st
century skills. The online Elements courses include one entirely devoted to assessing 21st
century learning. Intel also offers a free online rubric maker. Additionally, courses like
Getting Started and Essentials model good assessment practices when they have teachers
assess and provide feedback on their work or when the courses ask teachers to reflect on
their own learning in the course. But these programs alone are probably not sufficient, and local agencies and ministries may need to do more to support the needed shifts in classroom assessment strategies.
Fostering 21st Century Learning with Classroom-Based Assessments
Teachers have always evaluated student knowledge through recall tests or by asking content questions during a lecture, but researchers and practitioners are beginning to understand that a different type of teacher-developed assessment can play an important role in supporting learning (Black & Wiliam, 1998; W. J. Popham, 2008b) and in helping to transform teaching practice. In fact, incorporating 21st century teaching practices should start with updating the arsenal of assessment strategies that teachers use in the classroom to support their teaching (Jacobs, 2010). In a seminal review of the literature on how people learn, the National Research Council asserts that “appropriately designed assessments can help teachers realize the need to rethink their teaching practices” (2000, p. 141).
The research around classroom assessments suggests that the tools and strategies we wish to discuss share, in different degrees, three important traits: high quality teacher-designed assessments provide insight on what and how students are learning in time for teachers to modify or personalize instruction; they allow teachers to assess a broader range of skills and abilities in addition to content recall; and they give students new roles in the assessment process that can make assessment itself a learning experience and deepen student engagement in content.
1) Provide Insight on Student Learning so Teachers Can Modify Instruction: Because many
of these assessment tools and strategies are formative in nature, the information garnered
from their implementation can be used to immediately inform teachers’ instructional
decisions. For example, information garnered from portfolios can help teachers evaluate
the effectiveness of their own instruction while helping them make informed decisions
about future lessons. The implementation of portfolio assessments stimulates student self-
reflection providing valuable feedback to both students and teachers, which in turn can
be used to inform the teaching and learning processes. When employing the peer
assessment strategy, if students and teachers assess a student differently, it can open up
productive dialogue to discuss student learning needs and goal creation (J. Ross, 2006).
The teacher can then use that information to structure the following lesson around the
needs and goals of those students. Whether taking a pre- and post-survey poll or asking multiple-choice questions to reveal students' subtle misunderstandings and misconceptions, a Student Response System (SRS) allows teachers to take a quick snapshot of where his or her students are on a learning continuum and devise the
appropriate strategies to take them to the next level. As teachers become more aware of
their students’ interests, needs, strengths and weaknesses, they are better positioned to
modify their instructional strategies and content focus to help maximize student learning.
2) Assess Broader Range of Skills and Abilities: Traditional forms of assessment like
multiple-choice, fill in the blank, and true/false, privilege memorization and recall skills
that demand only a low level of cognitive effort (Dikli, 2003; Shepard, et al., 1995). The
assessment tools and strategies outlined in this paper provide more robust means to
measure higher order thinking skills and complex problem solving abilities (Palm, 2008).
Strategies such as performance-based assessment (PBA) and portfolios take into account multiple measures of achievement and rely on multiple sources of evidence, moving beyond the standardized examinations most commonly used for school accountability (Shepard, et al., 1995; Wood, Darling-Hammond, Neill, & Roschewski, 2007). Self- and peer-assessment both teach and assess a broader range of life skills like self-reflection,
collaboration, and communication. As a tool to measure student learning, rubrics allow
teachers to measure multiple dimensions of learning rather than just content knowledge,
and to provide a more detailed assessment of each student’s abilities instead of just a
number or percent correct.
3) Give Students New Roles in the Assessment Process that Make Assessment a Learning
Experience: In contrast to the traditional teacher-designed, teacher-administered, teacher-
graded tests, this cadre of assessments involves students throughout the assessing
process. Involving students in the creation of assessment criteria, the diagnosis of their
strengths and weaknesses, and the monitoring of their own learning, transfers the locus of
instruction from the teacher to his or her students (Nunes, 2004). For example, the most
successful rubrics involve students in the creation of the evaluation criteria. This creates
buy-in, increases engagement, and fosters a deeper commitment to the learning process.
In the assembly of a portfolio, students not only get to decide which work is graded, they also have the opportunity to reflect upon and evaluate the quality of those submissions. This type
of involvement fosters metacognition, active participation, and ultimately puts students at
the center of the learning process (McMillan & Hearn, 2008). During peer-assessment, students are asked to be the actual evaluators, offering feedback and suggestions on how to improve their classmates' work. When created collaboratively, many of these
assessments enable teachers and students to interact in a way that blurs the roles in the
teaching and learning process (Barootchi & Keshavarz, 2002). When students are part of
the assessment process they are more likely to “take charge” of their own learning
process and products and will be more likely to want to make improvements on future
work (Sweet, 1993).
The following sections describe six assessment tools and strategies shown to impact
teaching and learning as well as help teachers foster a 21st century learning environment
in their classrooms: 1) Rubrics, 2) Performance-based assessments (PBAs), 3) Portfolios, 4) Student self-assessment, 5) Peer-assessment, and 6) Student response systems (SRS). Although the list does not
include all innovative assessment strategies, it includes what we think are the most
common strategies, and ones that may be particularly relevant to the educational context
of developing countries. Many of the assessment strategies currently in use fit under one or more of the categories discussed. Furthermore, it is important to note that these
strategies also overlap in a variety of ways.
1. Rubrics
Rubrics are both a tool to measure students’ knowledge and ability as well as an
assessment strategy. A rubric allows teachers to measure certain skills and abilities not
measurable by standardized testing systems that assess discrete knowledge at a fixed
moment in time (Reeves & Stanford, 2009). This section discusses the research on rubrics,
but because rubrics are frequently used as part of other assessment strategies (portfolios,
performances, projects, peer-review and self-assessment), they will be discussed in those
sections as well. Unlike a standard checklist used to assess performance, a rubric is a set
of criteria that articulates expectations and describes degrees of quality along a
continuum (H. L. Andrade, Ying, & Xiaolei, 2008; Rezaei & Lovorn, 2010; Wiggins &
McTighe, 2005). The rubric is not only utilized in conjunction with summative assessments;
it is a tool that can enhance the entire learning process from start to finish by serving a
number of purposes including communicating expectations for an assignment, providing
focused feedback on a project still in process. Additionally, they encourage self-
monitoring and self-assessment and give structure for a final grade on an end product (H.
L. Andrade, et al., 2008; Lee & Lee, 2009; National Research Council, 2002). Rubrics are
considered “inclusive assessment tools” that can be used as class-wide assessment tools
to help students at all levels make meaningful progress towards curricular goals (Lee &
Lee, 2009). Andrade et al. (2010), in their research around assessment and middle school writing, found that students who are involved in three major components of rubric assessment (reading an exemplary sample, generating criteria, and using a rubric to self-assess) can actually produce more effective writing. In addition, students with access to
the evaluation criteria for a project had higher quality discussions and better group
products than their peers who did not know the grading criteria in advance (H. Andrade,
Buff, Terry, Erano, & Paolino, 2009). Skillings (2000), in her two years observing an elementary school classroom, noted that “both lower and higher achieving students were able to be successful in showing their knowledge” when they were assessed with a rubric (p. 454). Similarly, the awareness of lesson objectives and the encouragement of self-
monitoring associated with the use of rubrics increase engagement levels and help
students with disabilities learn more successfully in an inclusive classroom (Lee and Lee,
2009). One of the major strengths of the rubric as an assessment method is that it
functions as a teaching as well as an evaluative tool (H. L. Andrade, et al., 2008; J. W.
Popham, 1997). The development of high quality evaluation criteria is essential to the
effectiveness of a rubric as both an instructional and assessment tool (Wiggins &
McTighe, 2005). Popham (2008a) suggests that in fact, the evaluative criteria “should be
the most instructionally relevant component of the rubric. They should guide the teacher
in designing lessons because it is the students’ mastery of the evaluative criteria that
ultimately will lead to skill mastery” (p. 73). In order to ensure the rubric criteria are rigorous and accurate, Wiggins and McTighe suggest designing and refining rubrics based on actual student work that has been collected, sorted, and rated. Collaborative rubric
development can also promote cooperation between teachers and students as they work
together to build and utilize the tool (Lee & Lee, 2009). As a result, students are more
comfortable because they feel some ownership in the process, recognize that their
opinion is valued and are more successful because they know what is expected of them
(Lundenberg, 1997; Reeves & Stanford, 2009). Inviting students to participate in the generation of rubric criteria not only pushes students to think more deeply about their learning; it also helps foster a sense of responsibility for their own learning process and develops critical thinking skills that can be transferred to other learning situations (Andrade et al., 2008; Lee and Lee, 2009; Skillings and Ferrell, 2000; National Research Council,
2002). Wiggins and McTighe (2005) in fact emphasize that the ultimate test of student
knowledge is their ability to transfer what they know to a variety of contexts.
Metacognition can also lead to more self-directed learning through self-monitoring and
self-assessment (Lee and Lee, 2009).
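To picture the point that a rubric articulates criteria and degrees of quality along a continuum, here is a minimal sketch of a rubric as a simple data structure with a scoring routine. The criteria, level descriptors, and 1-4 scale are invented for illustration and are not drawn from the studies cited above.

```python
# A minimal sketch: a rubric as named criteria, each with level
# descriptors on a 1-4 continuum. All names and descriptors are invented.

rubric = {
    "Thesis":       {1: "absent", 2: "vague", 3: "clear", 4: "clear and arguable"},
    "Evidence":     {1: "none", 2: "sparse", 3: "adequate", 4: "rich and relevant"},
    "Organization": {1: "disordered", 2: "loose", 3: "logical", 4: "logical and fluid"},
}

def score_essay(levels):
    """Return (total, max_total, feedback_lines) for one student's levels."""
    feedback = [f"{c}: level {l} ({rubric[c][l]})" for c, l in levels.items()]
    return sum(levels.values()), 4 * len(rubric), feedback

total, max_total, feedback = score_essay({"Thesis": 4, "Evidence": 3, "Organization": 2})
print(f"Score: {total}/{max_total}")  # Score: 9/12
for line in feedback:                 # descriptive feedback, not just a number
    print(line)
```

Because each criterion reports a descriptor rather than a bare number, the same structure that produces the grade also tells the student what "better" looks like on each dimension.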
2. Performance-based Assessments
Performance-based assessments (PBA), also known as project-based or authentic
assessments, are generally used as a summative evaluation strategy to capture not only what students know about a topic but also whether they have the skills to apply that knowledge in a “real-world” situation. By asking them to create an end product, PBA pushes students to
synthesize their knowledge and apply their skills to a potentially unfamiliar set of
circumstances that is likely to occur beyond the confines of a controlled classroom setting
(Palm, 2008). Some examples of PBA include designing and constructing a model,
developing, conducting and reporting on a survey, carrying out a science experiment,
writing a letter to the editor of a newspaper, creating and testing a computer program,
and outlining, researching and writing an in-depth report (Darling-Hammond & Pecheone,
2009; Wren, 2009). Regardless of the type of performance, the common denominator
across all PBAs is that students are asked to perform an authentic task that simulates a
real life experience and mimics real world challenges (Wiggins & McTighe, 2005; Shepard,
1995). Performance-based assessments have been used in many countries for decades
and offer many advantages not afforded by standardized paper and pencil multiple-
choice exams. Wiggins and McTighe (2005) assert that in fact, “authentic assessments are
meant to do more than “test”: they should teach students (and teachers) what the “doing”
of a subject looks like and what kinds of performance challenges are actually considered
most important in a field or profession” (p. 337). PBA, coupled with a well-designed measurement tool such as a scoring rubric, can reveal how and why a student might be struggling, versus just the what of standardized tests; as a result, PBA can
actually help teachers figure out how their students best learn (Falk, Ort, & Moirs, 2007;
Shepard, 2009). PBA, used as a formative assessment, also provides more timely feedback
than large-scale standardized tests. Standardized tests can take a number of months to
produce results, but PBA allows teachers to make meaningful adjustments while they are
still teaching their current students (Darling-Hammond & Pecheone, 2009; Wood, et al.,
2007). Additional benefits of PBA are that they are inherently more student-centered and
are better at assessing higher order thinking and other 21st century skills (Wood, et al.,
2007; Wren, 2009). In a yearlong study of 13 third grade teachers in Maryland, Shepard
and her team (1995) noted “small but real gains” in students’ ability to explain
mathematical patterns and tables, a skill previously exhibited by only the most adept students (p. 27). Not surprisingly, PBA helps students to be more engaged and invested in
their learning (Wood et al., 2007; Wiggins & McTighe, 2005). PBA also allows for differentiation of assessment so that all students, including special education and ELL students, have space to demonstrate understanding (Darling-Hammond, 2009). In addition to impacts on student outcomes, research has shown that the implementation of
performance-based assessment strategies can also impact other instructional strategies in
the classroom. Though it can be challenging to change general teaching paradigms, a
small study of teachers in the US found that “under some circumstances, performance-
based assessment can change specific behaviors and procedures in the classroom”
(Firestone, Mayrowetz, & Fairman, 1998, p. 11).
3. Portfolio Assessment
Portfolios are a collection of student work gathered over time that is primarily used as a
summative evaluation method. The most salient characteristic of the portfolio assessment
is that rather than being a snapshot of a student’s knowledge at one point in time (like a
single standardized test), it highlights student effort, development, and achievement over
a period of time; portfolios measure a student’s ability to apply knowledge rather than
simply regurgitate it. They are considered both student-centered and authentic
assessments of learning (Anderson & Bachor, 1998; Barootchi & Keshavarz, 2002).
Portfolios are one of the most flexible forms of assessment because they can be
effectively adapted across subject areas, grade levels and administrative contexts (i.e. to
report individual student progress, to compare achievement across classroom or schools
and to increase parent involvement in student learning) (Sweet, 1993; National Research
Council, 2002). The content included in the portfolio, along with who chooses what to
include, varies by the teacher and the learning goals associated with the portfolio. Some
portfolios only include final products, while other portfolios will incorporate drafts and
other process documents. Some will contain items chosen exclusively by the teacher, while
others will fold in input from the student, their peers, administrators and even parents.
One of the strengths of the portfolio as an assessment tool is that it can be smoothly
integrated into classroom instruction (as opposed to being an add-on in the style of a standardized summative test). The portfolio acts as a repository for work assigned and
completed throughout the year. It does not necessitate additional tests or writing
assignments. The additional inputs required (i.e., student reflection, written or spoken; student-teacher collaboration; rubric creation and implementation) aid rather than detract from the teaching and learning process. Barootchi and Keshavarz highlight that the student portfolio is an assessment that is “truly congruent with instruction” because of its ability to simultaneously teach and test (p. 286). In fact, when implemented effectively, portfolios can supplement rather than take time away from instruction (Sweet, 1993;
National Research Council, 2002). When the portfolio is well integrated into a teacher’s
instructional practices, it can function as a strategy to increase student learning across a
variety of subject areas. Studies in Iran and in Turkey showed increased student
achievement in English as a foreign language (Barootchi & Keshavarz, 2002), science
(Çakan, Mihladiz, & Göçmen-Taskin, 2010), and writing and drawing (Tezci & Dikici, 2006).
All high quality portfolios involve students at some point in the process. In fact, the
selection process can be hugely instructive and impactful for students as they are asked
to collect, select and reflect upon what they want to include in their portfolio (Sweet,
1993). Portfolios foster self-reflection and awareness among students as they are often
asked to review previous assignments and projects and assess strengths and weaknesses
of both their processes as well as their final products (Sweet, 1993). Barootchi and
Keshavarz (2002) also emphasize the role that portfolios can have in helping students to
become more independent learners (p. 281). When well integrated, portfolios can also
foster collaboration both among students and their peers as well as between students
and their teacher (Tezci & Dikici, 2006). Students' critiques and evaluations of classmates' work can even be included as additional artifacts in the portfolio collection. Nunes
(2004) believes that one of the underlying principles of portfolio development is that “it
should be dialogic and facilitate ongoing interaction between teacher and students” (p.
328). Technology is playing an increasingly important role enabling teachers to use
portfolios. In the past decade, portfolios have moved from paper folders and file cabinets to electronic databases and social networks embedded within the online “cloud.” While e-
portfolios offer many of the same benefits of conventional portfolios, there are additional
advantages that affect learning, teaching and administration. Chang (2009) describes the
e-portfolio as an “abundant online museum” connoting an ease of storage, a creativity of
presentation, and the facilitation of collaboration (p. 392). Research suggests that e-
portfolios can not only aid in the development of information technology (IT) skills but also increase learning in low-motivation students (Chang, 2009). Online portfolios also allow
for real-time information collection, collaboration and editing with fewer physical
resources required. Finally, students are pushed to consider a wider audience when they
put their products online (Diehm, 2004). They also eliminate the space limitations normally
associated with paper portfolios.
4. Self-assessment
While the previous assessment tools and strategies listed in this report generally function
as summative approaches, self-assessment is generally viewed as a formative strategy,
rather than one used to determine a student’s final grade. Its main purpose is for students
to identify their own strengths and weaknesses and to work to make improvements to meet
specific criteria (H. Andrade & Valtcheva, 2009). According to McMillan and Hearn (2008)
“self-assessment occurs when students judge their own work to improve performance as
they identify discrepancies between current and desired performance” (p. 1). In this way,
self-assessment aligns well with standards-based education because it provides clear
targets and specific criteria against which students or teachers can measure learning. Self-
assessment is used to promote self-regulation, to help students reflect on their progress
and to inform revisions and improvements on a project or paper (Andrade and Valtcheva,
2009). Ross (2006) argues that in order for self-assessment to be truly effective, four conditions must be in place: the self-assessment criteria are negotiated between teachers and students, students are taught how to apply the criteria, students receive feedback on their self-assessments, and teachers help students use assessment data to develop an
action plan (p. 5). A number of studies point to the positive effects self-assessment can
have on achievement, motivation, self-perception, communication, and behavior (H.
Andrade & Valtcheva, 2009; Klenowski, 1995; McMillan & Hearn, 2008). McDonald and
Boud (2003) report that high school students who were trained in self-assessment not
only felt better prepared for their external examinations, they actually outperformed their
peers who had not received the training. Similarly, students across grade levels and
subject areas including narrative writing, mathematics and geography outperformed their
peers in the control group who had not received self-assessment training (Ross, 2006).
Andrade and Valtcheva (2009) in their literature reviews cite numerous studies that found
a positive relationship between the use of self-assessments and the quality of writing,
depth of communication skills, level of engagement and degree of learner autonomy.
Finally, self-assessment is also a lifelong learning skill that is essential outside of the
confines of the school or classroom (McDonald and Boud, 2003). An additional strength of
self-assessment as a formative assessment tool is that it allows every student to get
feedback on his or her work. Few classrooms allow teachers the luxury of regularly
responding to each individual student, so when students are trained in self-assessment it
makes them less reliant on teachers to advance their learning (Andrade and Valtcheva,
2009). While the focus is self-evaluation, the process can also be enhanced through peer
and teacher based assessments that offer alternative interpretation and additional
evidence to support a student's understanding of their own learning (Andrade and Valtcheva, 2009). A number of channels can be used to aid students in their self-
assessment including journals, checklists, rubrics, questionnaires, interviews and student-
teacher conferences. As with the previous assessment strategies, the rubric is often the
most effective tool to help monitor and measure student self-assessment, though
Andrade and Valtcheva (2009) warn that simply handing one out to students before an
activity does not guarantee any learning gains because students need to deeply
understand and value the criteria. As the rubric section of this paper points out, students
can benefit deeply from being involved in the process of developing evaluation criteria
and benchmark targets (Ross, 2006). In addition to involving students in the process, the assessment criteria need to be appropriately challenging in order for the evaluation to be meaningful (McMillan and Hearn, 2008). Ross (2006) also notes the importance of creating
a classroom climate in which students feel comfortable assessing themselves publicly. He
urges teachers to focus students’ attention on learning goals (with a focus on learning
ideas) rather than performance goals (that tend to focus on outdoing one’s peers).
5. Peer Assessment
Peer assessment, much like self-assessment, is a formative assessment strategy that gives
students a key role in evaluating learning (Topping, 2005). Peer assessment approaches
can vary greatly but, essentially, it is a process for learners to consider and give feedback
to other learners about the quality or value of their work (Topping, 2009). Peer
assessments can be used for a variety of products like papers, presentations, projects, or other skilled behaviors. Peer assessment is understood as more than only a grading procedure; it is also envisioned as a teaching strategy, since engaging in the process develops both the assessor's and assessee's skills and knowledge (Li, Liu, & Steckelberg,
2010; Orsmond & Merry, 1996). Feedback that students are asked to provide can confirm
existing information, identify or correct errors, provide feedback on process, problem
solutions or clarity of communication (Butler & Winne, 1995). The primary goal for using
peer assessment is to provide feedback to learners. This strategy may be particularly
relevant in classrooms with many students per teacher since student time will always be
more plentiful than teacher time. Although any single student’s feedback may not be as
rich or in-depth as a teacher’s feedback, the research suggests that peer assessment can
improve learning. The research base has found peer assessment strategies to be effective
in different content areas from language arts (Karegianes, Pascarella, & Pflaum, 1980;
McLeod, Brown, McDaniels, & Sledge, 2009), to mathematics (Bangert, 2003; Jurow, Hall,
& Ma, 2008) and science (Peters, 2008). Peer assessment has even proven beneficial for
students as young as six years old (Jasmine & Weiner, 2007). There is research on peer
assessment from North America and Europe (Sluijsmans, Dochy, & Moerkerke, 1999;
Topping, 2005), and there are a few research studies from Asian countries (Bryant &
Carless, 2010; Carless, 2005). Peer assessment is associated with performance gains and
cognitive gains for students who receive feedback and for students as they give feedback.
The research suggests that, when done properly, peer assessment strategies can improve
the quality of learning to a degree equivalent to gains from teacher assessment (Topping,
2009). Giving and receiving feedback impacts meta-cognitive abilities like self-regulation
(Bangert, 2003; Butler & Winne, 1995), influencing time on task and engagement in learning and improving learning outcomes. Asking students to provide feedback to others
can also improve their own work as they internalize standards of excellence (Li, et al.,
2010). When used in conjunction with collaborative learning, peer assessment can also
improve interpersonal skills like group work, consensus building, or seeking and providing
help (Brown, Topping, Henington, & Skinner, 1999; J. A. Ross, 1995). In collaborative peer
assessment techniques, students could work in groups to review work, an entire class might evaluate student presentations, or students can even be asked to assess their own group's work. Peer assessment is usually used in conjunction with other types of teacher
assessment so that the peer assessment is seldom the only evaluation provided. For
example, peer editing may be done on a draft report while the teacher evaluates the final draft, or peers may provide part of the score on a student's performance while the rest of the score comes from the teacher's assessment. Peers are generally defined as students of
equal status in that they are in a similar grade and similar levels of proficiency with
content, although there is often flexibility: slightly older students may assess younger students, or a student moving more quickly through the material may be asked to assess a less advanced student. Topping (2005) contends that peer assessment works best when
students are asked to provide formative and qualitative feedback rather than merely
grading or giving a score to peers since this often makes students uncomfortable.
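The arrangement described above, in which peers supply part of a performance score and the teacher supplies the rest, amounts to a weighted combination of the two. A minimal sketch, assuming an illustrative 30/70 split and invented ratings:

```python
# A minimal sketch of blending peer and teacher scores. The 30/70 split
# and the sample ratings are illustrative assumptions, not a fixed rule.

def combined_score(peer_ratings, teacher_score, peer_weight=0.30):
    """Average several peer ratings, then blend with the teacher's score."""
    peer_avg = sum(peer_ratings) / len(peer_ratings)
    return peer_weight * peer_avg + (1 - peer_weight) * teacher_score

# Four classmates rated a presentation out of 100; so did the teacher.
peers = [85, 78, 90, 82]   # peer average: 83.75
teacher = 88

print(f"Final presentation score: {combined_score(peers, teacher):.1f}")
# 0.30 * 83.75 + 0.70 * 88 = 86.7
```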
Conclusion
More and more, discussions concerning education reform are paying increasing attention to the role that classroom-based assessment strategies play in fostering student-centered teaching practices. Together, all of the research cited here strongly suggests that these
assessment tools and strategies can positively impact a number of key areas that we know
are important aspects of education reform: student/teacher relationships, teacher’s ability
to personalize instruction, acquisition of 21st century skills, student engagement and
student metacognition. These practices are becoming more common in developed
countries, but there is still little research on how to adapt these approaches to the school
contexts of many emerging market countries. It is important to note that with access to
professional development resources, teachers and administrators can become proficient
with assessment for learning approaches without returning to the university for continuing
education courses. Many teachers who have participated in Intel teacher professional development programs are beginning to use assessment for learning strategies, and this has offered us a chance to see these new assessment strategies in action (León Sáenz, Castro, & Light, 2008; Light, et al., 2009). With support from ministries of education, the Intel education portfolio of professional development courses is available online, and face-to-face courses are available in over 30 countries. But there is still more work to be
done for local governments, ministries, and NGOs both in researching and adapting these
strategies to developing country contexts and to developing programs to promote their
use in classrooms.
10 Types of Assessment:
1) Summative Assessment
Summative comes from the word summary. A summative assessment arrives at the very end of the learning sequence and is used to record the student's overall achievement at the end of learning. The primary objective of summative assessment is to measure a student's achievement after instruction or learning.
Examples of summative assessment include midterms, final papers, and examinations, which give an overall test of the student's knowledge at the end of the learning. Summative assessment gives an insight into the overall understanding of the student regarding a particular learning unit or topic. It helps to answer questions like what happened and what went wrong at the end of the learning.
Summative assessment is used in educational institutions all over the United States. Summative assessments carry more weight than formative assessments. Questionnaires, surveys, interviews, tests, and projects are a few of the methods used for summative assessment.
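The heavier weighting of summative over formative work can be pictured as a weighted average of category means. A minimal sketch, assuming an illustrative 70/30 weighting and invented scores:

```python
# A minimal sketch of a weighted final grade. The 70/30 weights and the
# scores are illustrative assumptions, not a prescribed grading scheme.

weights = {"formative": 0.30, "summative": 0.70}
scores = {                      # percentage scores per category
    "formative": [92, 85, 88],  # quizzes, drafts, classwork
    "summative": [81, 86],      # midterm, final exam
}

final_grade = sum(
    weights[cat] * (sum(vals) / len(vals)) for cat, vals in scores.items()
)
print(f"Final grade: {final_grade:.1f}")
# formative mean 88.3, summative mean 83.5 -> about 84.9
```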
2) Formative Assessment
Formative assessment includes a variety of formal and informal assessment procedures used by teachers in the classroom so that they can modify the teaching and improve students' attention, retention, and learning activity.
Formative assessments are carried out throughout the learning process and usually gauge the performance of the student during the learning, unlike summative assessment, which determines the performance at the end of the learning.
The primary objective of formative assessments is to involve the attention of the students
and help them achieve their goals. It is performed in the classroom and determines the
strengths and weaknesses of students. The routine question during the teaching of a
lesson is an example of formative assessment.
Formative assessment is positive in its intention in that it is directed toward promoting learning; hence, it is an integral part of teaching.
It helps in addressing individual or group deficiencies by identifying them.
3) Evaluative assessment
This is concerned only with evaluating assessment. The overall idea is to evaluate the assessment in the school, in the system, or in the department. Evaluation of candidates helps in assessing and judging whether the candidates are capable enough for the learning program. Evaluative assessment is done only with the aim of evaluating and grading the candidates.
4) Diagnostic Assessment
When the objective is to identify individual strengths and areas of improvement, diagnostic assessment is the one that is used. It helps to inform the next steps in assessment by identifying strengths, weaknesses, areas of improvement, and other characteristics. Unlike evaluative assessment, diagnostic assessment does not aim to grade the candidates; rather, it helps in diagnosing the issue, after which the teacher can take steps to address it.
5) Norm-referenced tests
Norm-referenced tests, commonly known as NRTs, are used to assess or evaluate with the aim of determining the position of the tested individual against a predefined group on the traits being measured. The term normative assessment refers to the process of comparing one test taker to his seniors or peers.
The primary objective behind this test is to determine whether the test taker has performed better or worse than the other test takers, which in turn indicates whether the test taker knows more or less than the other test takers. Comparison by benchmarking is the method used in NRTs.
The primary advantage is that this kind of test can provide information about an individual vis-à-vis the reference group. The disadvantages include that the reference group may not represent the current population of interest, since many norms are misleading and do not hold over a period of time. This kind of test also does not ensure that the test is valid in itself. That norms do not mean standards is another disadvantage of this test.
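Norm-referencing places a raw score relative to the distribution of a norm group, for instance by converting it to a z-score and percentile. A minimal sketch, assuming an invented norm group and roughly normally distributed scores:

```python
# A minimal sketch of norm-referenced interpretation: where does one
# test taker stand relative to a norm group? All data here are invented.
from statistics import NormalDist, mean, stdev

norm_group = [62, 70, 75, 68, 80, 73, 66, 77, 71, 74]  # peers' raw scores
mu, sigma = mean(norm_group), stdev(norm_group)

raw_score = 78
z = (raw_score - mu) / sigma
percentile = NormalDist().cdf(z) * 100  # assumes roughly normal scores

print(f"z = {z:.2f}; better than about {percentile:.0f}% of the norm group")
```

Note that this says nothing about whether 78 is good enough in absolute terms; it only ranks the test taker against the reference group.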
6) Performance-based assessments
This is also known as educational assessment, in which the skills, attitudes, knowledge, and beliefs of the student are checked to improve the standard of learning. The assessment is at times done with a test, but it is not confined to tests and can extend to class or workshop or real-world applications of knowledge by the student.
7) Selective response assessment
Selective response assessment determines the exact amount of knowledge that the student has and also provides insight into the skills the student has acquired over the time of learning.
8) Authentic assessment
Intellectual assessments that are worthwhile, significant, and substantial are measured by authentic assessment. In contrast to standardized tests, authentic assessment provides deeper insights about the student.
9) Criterion-referenced tests
This kind of assessment determines the performance of a student against a fixed set of predetermined and agreed-upon criteria for student learning. Unlike a norm-referenced test, here the reference is made against a particular criterion rather than against a benchmark, another human being, or another student.
While a criterion-referenced assessment will indicate whether or not the answer is correct, a norm-referenced assessment will indicate whether the answer is better than student number 1's or worse than student number 3's.
That the comparison here is not against a person or fellow competitor is the biggest advantage of criterion-referenced assessment over norm-referenced assessment. While the former provides the exact standing of a student, the latter provides the standing of a student with respect to, or in comparison with, others.
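By contrast, criterion-referencing judges each student against a fixed, agreed-upon cutoff rather than against classmates. A minimal sketch, assuming an illustrative 75% mastery cutoff and invented scores:

```python
# A minimal sketch of criterion-referenced interpretation: each student
# is compared to a fixed cutoff, never to other students. Data invented.

MASTERY_CUTOFF = 75  # percent correct required on the agreed criteria

scores = {"Ana": 82, "Ben": 74, "Carla": 91}

for student, score in scores.items():
    verdict = "mastered" if score >= MASTERY_CUTOFF else "not yet mastered"
    print(f"{student}: {score}% -> {verdict}")
# All students could 'master' at once; no result depends on anyone else's.
```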
10) Written and oral assessments
Written assessments are one of the most popular methods of summative assessment. Oral assessments, on the other hand, involve evaluating the candidates orally. Candidates are evaluated on their knowledge through their verbal answers. Questions can be elaborative or objective or a combination of both.
NATURE OF PERFORMANCE-BASED TESTS
Most of the time, teachers rely on paper-and-pencil tests, which measure knowledge and understanding instead of a student's ability. With the implementation of O.B.E. (Outcome-Based Education), greater emphasis shall be given to assessing student outcomes through real-life (authentic) assessment, which requires the students to carry out activities or produce products that demonstrate meta-cognitive knowledge.
• It taps the creative aspect of learners in bringing out what they know and what they can do through different performance tasks like exhibits, projects, and work samples (hands-on experiences).
• It is stipulated in DepEd Order No. 7, s. 2012 that the very best level of assessment focuses on the performance (product) which the learners are expected to produce through authentic performance tasks.
Multiple Evaluation Criteria. The student's performance must be judged using multiple evaluation criteria.
Judgmental Appraisal. Unlike the scoring of selected-response tests, in which electronic computers and scanning machines can, once programmed, carry on without the necessity of manual intervention, genuine performance assessments require human subjective judgment.
• Solving problems – Critical thinking and problem-solving skills are important. Teachers may include activities that ask students to make sense of complex authentic problems or issues and to solve them.
• Completing an inquiry – An inquiry task is one where students are asked to collect data in order to develop understanding about a topic or issue. Examples include investigations, research-based activities, surveys and interviews, or independent studies.
• Determining a position – This task requires students to make a decision or clarify a position. Case analyses and issue-related activities or debates are some examples of this task.
• Demonstration task – This task shows how students use knowledge and skills to complete well-defined complex tasks. Examples are demonstrating steps in cooking, explaining earthquake safety, etc.
• Developing Exhibits – Exhibits are visual representations or displays that need little or no explanation from the creators. An exhibit is put up to elucidate, demonstrate, or show something.
• Presentation Task – This is a work or task performed in front of an audience, like storytelling, singing and dancing, a musical play, or theatrical acting.
• Capstone Performances – These are tasks that occur at the end of a program of study and enable students to show knowledge and skills in a context that matches the world of practicing professionals. These tasks include research papers, practice teaching, internships, or on-the-job training.
3. Strengths and Limitations
• Performance assessment allows students to exhibit their own skills, talents, and expertise. Tasks show integration of the student's knowledge and skills, provide challenge, and offer opportunities to exhibit their best creations.
• Performance assessment allows teachers to explore the main goals and processes of teaching and learning. Teachers may reflect on and revisit learning targets, curriculum and instructional practices, and standards as they utilize performance-based assessment.
The more complex the process and performance, the longer you can expect to spend on scoring. To reduce the scoring time, crafting high-quality rubrics is suggested.
4. Performance task scores may have lower reliability. This results in inconsistency of scoring by teachers who interpret observations quite differently.
5. Performance task completion may be discouraging to less able students. Some tasks that need students to sustain their interest for an extended time may discourage disadvantaged students.
At the end of the chapter, you should be able to develop a portfolio of performance-based assessment tools that measure learners' competencies in a given subject.
Designing a performance assessment entails critical processes which start from the tasks that the teacher wants to assess. A well-designed performance assessment helps students see the connections between the knowledge, skills, and abilities they have learned in the classroom, including the experiences which help them construct their own meaning of knowledge.
The following steps, adapted from Herman (1992), will guide you in developing a meaningful performance assessment, for both process and product, that will match the desired learning outcomes.
Basically, the teacher should select those learning targets which can be assessed by performance and which fit the plan, along with the assessment techniques to be utilized for measuring other complex skills and performances.
In defining the purpose of assessment, learning targets must be carefully identified and taken into consideration. Performance assessments primarily use four types of learning targets: deep understanding, reasoning, skills, and products (McMillan, 2007).
Deep Understanding
Reasoning
Reasoning is essential in performance assessment, as the students demonstrate skills and construct products. Typically, students are given a problem to solve or are asked to make a decision or produce another outcome, such as a letter to the editor or a school newsletter, based on information that is provided.
Skills
Psychomotor Skills
Psychomotor skills describe clearly the physical action required for a given task. These may be developmentally appropriate skills or skills that are needed for specific tasks: fine motor skills (holding a pen, focusing a microscope, and using scissors), gross motor actions (jumping and lifting), more complex athletic skills (shooting a basketball or playing soccer), some visual skills, and verbal/auditory skills for young children. These skill targets also identify the level at which the skill is to be performed.
Generally, deep understanding and reasoning involve in-depth, complex thinking about what is known and the application of knowledge and skills in novel and more sophisticated ways. Skills include student proficiency in reasoning, communication, and psychomotor tasks.
Products
These are completed works, such as term papers, projects, and other assignments in which students use their knowledge and skills.
In defining the purpose of assessment, the teacher should identify whether the students will have to demonstrate a process or a product. If the learning outcomes deal with procedures which you can specify, then the focus is on process assessment. In assessing a process, it is essential that the assessment be done while the students are performing the procedures or steps.
Learning targets which require students to demonstrate a process include the procedures for proper handling or manipulation of a microscope, or the steps to be done in an earthquake drill. Mathematical operations, reciting a poem, and constructing a table of specifications are other examples of this target.
Content Standard: The students demonstrate oral language proficiency and fluency
in various social contexts.
Task: Oral – Aural Production (The teacher may use dialogs or passages from other written
or similar texts).
Specific Competencies:
Kakayahan (Competencies):
Usually, the learning objectives start with a general competency which is the main target of the task, followed by specific competencies which are observable in the target behavior. This can also be observed in defining the purpose of assessment for product-oriented performance-based assessment.
Sometimes, even though you teach a specific process, the learning outcomes simply imply that the major focus is the product that the student produces. Nitko (2011) suggested focusing assessment on the product students produce if most or all of the evidence about their achievement of the learning targets is found in the product itself, and little or none of the evidence you need to evaluate students is found in the procedures they use or the way in which they perform.
Assessment of products must be done if the students may use a variety of ways to produce high-quality products; sometimes, the method or sequence does not make a difference as long as the product is the focus of the assessment.
Examples of learning targets which require students to produce products include building a garden, conducting classroom-based research, publishing a newspaper, and creating commercials or PowerPoint presentations.
In the given examples 1 and 2 for the English and Filipino Grade 7 domains, product-oriented performance-based assessment can be stated as:
Use the correct prosodic patterns (stress, intonation, phrasing, pacing, tone) in rendering various speech acts or in oral reading activities, and
Nakasusulat ng talatang nagsasalaysay ng ilang pangyayari sa kasalukuyan na may kaugnayan sa paksa ng akdang napakinggan. (Is able to write a paragraph narrating some current events related to the topic of the text listened to.)
Below is another example of a product-oriented performance-based assessment task.
Performance needs to be identified so that students may know what tasks are to be performed and by what criteria. In this case, a task description must be prepared to provide the listing of specifications of the task that will elicit the desired performance of the students. A task description should include the following (McMillan, 2007):
Tasks, on the other hand, should be meaningful and must let the student be personally involved in doing and creating the tasks. This can be done by selecting a task which has personal meaning for most of the students. Choose a task in which students have the ability to demonstrate knowledge and skills from classroom activities or other similar ways. These tasks should be of high value, worth teaching to, and worth learning as well.
In creating performance tasks, one should specify the learning targets, the criteria by which you will evaluate performance, and the instructions for completing the task. Include also the time needed to complete the tasks. Be sure students understand how long a response you are expecting. Some learning targets can be assessed in a relatively short period of 20 to 30 minutes, but others necessitate a longer time. Examples are conducting an opinion survey and gathering data for research, which need more than two weeks and are done outside of class. With these activities, the results can support a valid generalization of how the students achieved the learning target.
Participation of groups must also be considered in crafting performance tasks. Some tasks require cooperative or collaborative learning or group work. With this, the number of tasks must be given attention as well; as a rule, the fewer the number of tasks, the fewer targets can be assessed in a given performance.
1. Focus on learning outcomes that require complex cognitive skills and student
performances. Tasks need to be developed or selected in light of important learning outcomes.
Since performance-based tasks generally require a substantial investment of student time, they
should be used primarily to assess learning outcomes that are not adequately measured by
less time-consuming approaches.
2. Select or develop tasks that represent both the content and the skills that are central to important learning outcomes. It is important to specify the range of content and resources students can use in performing the task. In any event, the specification of assumed content understandings is critical in ensuring that a task functions as intended.
3. Minimize the dependence of task performance on skills that are irrelevant to the intended purpose of the assessment task. The key here is to focus the attention of the assessment. For example, the ability to read complicated texts and the ability to communicate clearly are both important learning outcomes, but they are not necessarily the intent of a particular assessment.
4. Provide the necessary scaffolding for students to be able to understand the task and what is expected. Challenging tasks often involve ambiguities and require students to experiment, gather information, formulate hypotheses, and evaluate their own progress in solving a problem. However, problems cannot be solved in a vacuum: students need to have the prior knowledge and skills required to address the problem. These prerequisites can be a natural outcome of prior instruction or may be built into the task.
5. Construct task directions so that the student's task is clearly indicated. Vague directions can lead to such a diverse array of performances that it becomes impossible to rate them in a fair or reliable fashion. By design, many performance-based tasks give students a substantial degree of freedom to explore, approach problems in different ways, and come up with novel solutions.
6. Clearly communicate performance expectations in terms of the criteria by which the
performances will be judged. Specifying the criteria to be used in rating performance helps
clarify task expectations for a student. Explaining the criteria that will be used in rating
performances not only provides students with guidance on how to focus their efforts, but
helps to convey priorities for learning outcomes.
Example of a process-oriented performance task on problem solving and decision-making:
Key Competencies:
1. Uses reading skills and strategies to comprehend and interpret what is read.
2. Demonstrates competence in speaking and listening as tools for learning.
3. Constructs complex sentences.
Your friend is going through a difficult time. You have tried talking about the issue but to no avail. After much thought you recall a book you had read in which the character went through an experience similar to your friend's. How might the book help your friend deal with the problem? What other sources of information or resources could you find to help your friend? What might be some strategies your friend could use? Use your writing skills to compose a letter to your friend as to why he should read the book or the resources you have collected. Be sure your letter contains examples from the readings, your feelings, and encouragement.
As a problem solver, devise a plan to meet with your friend to identify possible solutions to
the problem after he has read the materials. Be sure you are considerate of feelings and
outline steps you’ll take to make sure your discussion is one of collaboration.
You will be assessed on your ability to make informed decisions, your ability to create a letter with complex sentences, your ability to solve problems, and your ability to work collaboratively with a peer.
Performance Task
Barangay Luntian is celebrating its 50th anniversary with the theme “Kalikasan Ko, Mahal Ko”. The barangay captain called for a council meeting to discuss the preparations for the program. As a councilor, you are asked to take charge of the preparation of a “Natural Beverage” for the guests. This healthful drink should promote your locally produced fruits or vegetables as well as health and wellness. At your next council meeting, you will present your plan for the preparation of the drink and let the council members do the taste testing. The council members will rate your drink based on the following criteria: practicality, preparation, availability of materials, and composition of the solution (drink).
McMillan (2007)
Regardless of whether these are process- or product-oriented performance tasks, clearly stated performance criteria are critical to the success of both instruction and assessment. Criteria, in the real essence of performance-based assessment, define the target process and product, guide the students on what should be learned and done, and provide a target for assessing the students' performance.
There are different useful ways to record the assessment of students' performance. A variety of tools can be used depending on the nature of the performance being assessed. As a teacher, you need to critically examine the task to be performed and match it with the assessment tools to be utilized. Some ways of assessing students' performance are the use of anecdotal records, interviews, direct observations using a checklist or Likert scale, and, especially for performance-based assessment, the use of rubrics.
Different authors define a rubric as follows:
A set of rules specifying the criteria used to find out what students know and are able to do (Musial, 2009).
A scoring tool that lays out specific expectations for an assignment (Stevens & Levi, 2005).
A scoring guide that uses criteria to differentiate between levels of student proficiency.
(McMillan, 2007)
Descriptive scoring schemes that are developed by teachers or evaluators to guide the analysis of the products or processes of students' efforts (Brookhart, 1999).
The scoring procedures for judging students’ responses to performance tests (Popham, 2011)
Evaluative criteria. These are the factors to be used in determining the quality of a student's response.
Descriptions of qualitative differences for the evaluative criteria. For each evaluative criterion, a description must be supplied so qualitative distinctions in students' responses can be made using the criterion.
An indication of whether a holistic or analytic scoring approach is to be used. The rubric must
indicate whether the evaluative criteria are to be applied collectively in a form of holistic
scoring or on a criterion-by-criterion basis in the form of analytic scoring.
(Popham, 2011)
Rubrics are also used to communicate how teachers evaluate the essence of what is being assessed. Rubrics not only improve scoring consistency; they also improve validity by clarifying the standards of achievement the teacher will use in evaluating. For the development and scoring of rubrics, Nitko (2011) suggested some questions which the teacher should address.
The structure of a rubric changes when measuring different learning targets. Generally, rubrics can be classified into two major types: analytic and holistic rubrics.
Analytic Rubric. It requires the teacher to list and identify the major knowledge and skills which are critical in the development of process or product tasks. It identifies specific and detailed criteria prior to assessment, so teachers can easily assess specific concept understanding, skills, or product components separately. Each criterion in this kind of rubric receives a separate score, thus providing better diagnostic information and feedback for the students as a form of formative assessment.
Holistic Rubric. It requires the teacher to make a judgment about the overall quality of each student response. Each category of the scale contains several criteria which are given a single score that provides an overall rating. This gives a reasonable summary rating in which traits are efficiently combined and scored quickly, but, with only one score, it limits the precision of the assessment results and provides little specific information about the students' performance and what needs further improvement.
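To make the contrast concrete, below is a minimal sketch in Python (the criteria, the 4-point scale, and the scores are hypothetical illustrations, not from this module) of the kind of information each rubric type yields:

```python
# Minimal sketch: analytic vs. holistic rubric scoring (hypothetical data).

analytic_scores = {
    "content accuracy": 4,   # analytic: each criterion gets its own score
    "organization": 3,
    "delivery": 2,
}

# Analytic scoring reports each criterion separately plus a total,
# preserving diagnostic feedback per criterion.
for criterion, score in analytic_scores.items():
    print(f"{criterion}: {score}/4")
print(f"analytic total: {sum(analytic_scores.values())}/{4 * len(analytic_scores)}")

# Holistic scoring is a single overall judgment on the same scale:
# quick to apply, but it carries no per-criterion diagnostic information.
holistic_score = 3
print(f"holistic rating: {holistic_score}/4")
```

Note how the analytic version supports formative feedback (the student sees that delivery, not content, pulled the total down), while the holistic version yields only the overall rating.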
Rubric Development
Stevens and Levi's Introduction to Rubrics (2005) enumerated the steps in developing a rubric. Basically, rubrics are composed of a task description, a scale, and descriptions of dimensions.
Task Description
The task description involves the performance of the students. Tasks can be taken from assignments, presentations, and other classroom activities. Usually, task descriptions are set when defining performance tasks.
Community Development
Task Description: Each student will make a 10-minute presentation on his/her observations, experiences, analysis, and interpretation of a developing community. The student may use his/her own community as a sample and look into its changes over the past 10 years. He/She may use any form or any focus of presentation, but it is a must to have a thesis statement, not just an exposition. The presentation should include tables, graphs, photographs, maps, landmarks, and conclusions for the audience.
Scale
The scale describes how well or poorly any given task has been performed and determines to what degree the student has met a certain criterion. Generally, it is used to describe the level of performance. Below are some commonly used labels compiled by Huba and Freed (2000).
Dimensions
This is a set of criteria which serves as the basis for evaluating student output or performance. The dimensions of a rubric lay out how the task is divided into its important components, which also serve as the basis for scoring the students.
Students, on the other hand, can assess their own progress. Student participation need not be limited to the use of assessment instruments; it is also useful to have students help develop the instrument. In some practices, students rate themselves and compare their ratings with the teacher's. With this, the teacher can elaborate and explain to each student the reasons for the rating and discuss any gap between the ratings, most especially in an individual conference. Follow-up conferences and peer and self-evaluation of output enable teachers to better understand curriculum and instructional learning goals and the progress being made towards the achievement of those goals. These can also better diagnose the strengths and limitations of the students and, most importantly, develop the self-reflection and analysis skills of both the teachers and the students.
Using the affective domain to select targeted teaching techniques might help foster the development of key beliefs and values that underlie the advanced competencies (and sub-competencies). As educators, our aim is not only to impart knowledge, attitudes, and skills, but to impact the daily behavior of our graduates (Yanofsky & Nyquist, 2014).
According to the statement above, the aim is to target the core values of our learners and how they behave. William James also stated that emotion and cognition are inextricably linked and perhaps never entirely separate, distinctive, nor pure.
Through this, there will be a deep connection between the topic and the students' lives. For example, instead of just teaching the topic “Literary Devices” in a traditional way, it would be much more preferable to reinforce it with an affective activity or agreement. Students could instead be tasked to write a poem for their loved ones or to describe their family using similes or metaphors. This way, an emotional bond will be tied to the student's knowledge of the topic.
“It is often the evaluation that is stored in memory, often without the corresponding conditions and affect that were responsible for its formation.” (Robert Scholl, University of Rhode Island, 2012)
Giving students the “need” to study, evaluate, and reflect on their lessons is the most effective and holistic way of learning. The question that will pop up is “Why do we need this?”. We ask it because we can spark the motivation within the student (either intrinsic or extrinsic) to push his or her boundaries within a certain topic out of self-driven motivation. In the traditional setting, also called the “banking concept of education”, students are merely deposited with historical, linguistic, and scientific facts. This process is somewhat dull and empty from the perspective of students. Unfortunately, this concept pushes them to cope and memorize the topics just for the sake of passing a particular subject.
Furthermore, assignments already existed before this time, tasking the students to perform something within a limited number of hours. The process was mostly one of recollection of learning instead of application. For example, in the field of English, the teacher taught conjunctions. Generally, as we know, conjunctions are words that bridge phrases in order to form a sentence. In the traditional manner, the teacher would simply give out phrases and let the students find out for themselves which conjunctions are proper to use. On the contrary, in applying the “affective-learning competency”, the teacher would tap into the students' personal schema or various stimuli in order for the students to have a connection with the topic at hand (even if it is just a simple grammar lesson).
Executing this strategy may take some time and effort on the part of the teacher. First and foremost, the teacher should prepare ahead of time to make the topic as easy to comprehend as possible. In his or her DLL or lesson plan, the teacher should provide a positive impact or motivation as he or she starts the class. That way, the students will find the lesson memorable and will create meaning in their learning. The teacher should then remember to create a balance between the psychomotor and cognitive objectives; this way, the students will be equipped with the proper know-how to perform their tasks (be it in grammar or in literature). After putting up specific objectives, delivering meaningful motivation, and delivering comprehensible lessons, the teacher should proceed to the affective domain, the part discussed above that gives the students the need to use the lesson at hand.
There are also difficulties and barriers which hinder the application of this strategy. One of these is the mood of the classroom or of your students.
There will be times when the mood inside the classroom is grim and chaotic. These instances cannot be avoided and will occur even to teachers who are most proficient in handling their classes. We must keep in mind that no matter how intelligent, talented, or witty your students are, they will never learn in an uncomfortable environment. This negative environment usually occurs when a teacher is very strict and always in a bad mood, or when the teacher spends most of his or her time scolding the students for the smallest mistakes they make.
According to my short interview with the student-teachers of Capiz State University, this happened to one of their fellow student-teachers. It reached a point where the class was devoid of all affective learning; instead, three months were spent mostly scolding the students. This resulted in the students' negligence of their subjects and disrespect towards their teacher, which shows that affective learning does not thrive in negative environments, because the lessons being given had become the opposite of “meaningful”.
Still, in the same short interview, one of the CapSU student-teachers reported that he heavily implemented the affective-learning competency during his practice teaching. The first step he took was to create different motivations for each meeting of his class, making it more exciting as each day passed; through this, he created a “positive” classroom environment. Next, he created unique assessments that slightly deviated from the standard activities given by the textbook, giving the topics a more personalized touch. He also created a multiple-intelligence affective assessment suited to the individual differences of his students, making them more comfortable in terms of their capacity to learn. Lastly, they had an agreement (assignment) that directly tapped into the students' emotions in order for them to create something from what they had learned. This resulted in a deep connection between the given topic and the students, suggesting that the affective-learning competency is effective in the field of teaching. It was a stress-free, meaningful, and holistic experience for the students first-hand. The downside is that the activities take a long time to develop and can be stressful for the teacher: developing new learning materials, unique motivations, unique assessment tools, and multiple rubrics to satisfy the demands of the prepared material.
In the example given, the tasks were mostly non-cognitive variables which tapped into the students' attitudes, interests, and values. According to W. James Popham (2003), teachers should assess affect to remind themselves that there is more to being a successful teacher than helping students obtain high scores and achievement. As Filipinos, we always hear the phrase “Grades and knowledge will only get you to a certain point; what matters is your attitude and how you treat other people.” This is the local philosophy, if not a universal belief, that developing one's attitude can bring prosperity in the future.
According to Fraser (1994), students are more proficient in problem-solving if they enjoy what they do. A positive classroom environment fosters better student engagement and learning than a classroom with a negative climate.
This could also contribute greatly to the psychological well-being of the students. Being multicultural (originating from many local cultures), students carry problems from their own lives. These problems, if left unchecked, can result in loss of motivation, which may eventually lead to rebellion or to a student dropping out of class. A negative classroom will deteriorate the morale of a troubled student and result in his or her disinterest in the classroom. On the other hand, if the classroom has a positive environment, the student will set aside his or her struggles and the classroom will start to become a “safe haven” for that troubled student.
In terms of benefits, there are many aspects that can be developed through the use of this competency. First is the ability to collaborate with others: students will be able to exercise their social skills and express themselves in a manner in which they are confident about who they are and what they believe in. Second is the capability to stand for what they believe in and resolve conflicts in different scenarios. Third is the trait of kindness and empathy towards their peers or even complete strangers. All of this is due to the focused development of the affective aspect of the students.
As for the downside of this strategy, it focuses more on the “attitude” part rather than on relentlessly memorizing every fact in the textbook. Focusing heavily on this strategy would not necessarily deprive students of their capability in objective assessments, but it may slightly degrade their capacity to point out specific facts. Although the outcomes are subjective (depending on the capability of the teacher to maintain this type of instruction), the students still have a strong likelihood of going beyond their institution's standards.
This strategy can be developed in the first few years of a new teacher, or of a teacher who has just recently implemented it, and can be revised annually so that he or she will not have to create the instruction all over again unless the Department of Education decides to change the curriculum for the current educational program.
To summarize everything stated prior to this part: the affective-learning competency aims to develop the very foundations of a student's norms along with his or her objective learning. Specifically, it aims to develop students' social skills as well as their knowledge of different subject matters. Before proceeding with this strategy, the teacher should first make a pre-assessment of each student's background, both psychological and in the subject matter. These aims can be significantly achieved through a positive classroom environment and through proper assessment planning and consideration.
At the end of this chapter, you should be able to develop instruments for assessing
affective learning.
Cognitive and affective domains are inseparable aspects of a learner; each complements the other. Proper, ongoing assessment of the affective domain (students' attitudes, values, dispositions, and ethical perspectives) is essential in any effort to improve academic achievement and the quality of the educational experience provided. Unfortunately, the practice of routinely assessing learners' affective constructs is often left behind, and focus is given most of the time to assessing learners' cognitive aspect. In addition, unlike for the cognitive domain, fewer assessment tools are available for affective constructs.
There are three feasible methods of assessing affective traits and dispositions. These
methods are: teacher observation, student self-report, and peer ratings. (McMillan, 2007).
Since affective traits are not directly observable, they must be deduced from behaviour or from what students say about themselves and others. There is a variety of psychological measures that assess affective traits, but due to the sophistication of such instruments, classroom teachers rarely use them. Instead, teachers' own observations and students' self-reports are mostly used.
1. Emotions and feelings change quickly, most especially for young children and during early adolescence. This means that to obtain a valid indication of an individual student's emotion or feeling, it is necessary to conduct several assessments over a period of time. A single assessment is not enough to see what the prevalent affect is; it needs to be repeated several times.
2. Use as many varied approaches in measuring the same affective trait as possible. It is better not to rely on a single method because of the limitations inherent in each method. For example, students' self-reports may be faked, which can significantly distort the results. (However, if the self-reports are consistent with the teacher's observation, then a stronger case can be made.)
3. Decide what type of data or results are needed: individual or group data? Consideration of the purpose of the assessment will influence the method that must be used. For reporting or giving feedback to parents or interested individuals about the learner, individual student information is necessary; thus, multiple methods of collecting data over a period of time and keeping records to verify the judgements made are appropriate. If the assessment is meant to improve instruction, then results for the group or whole class are more appropriate, and it is more reliable to use anonymous student self-reports. This is one of the useful features of affective assessment.
Teacher observation is one of the essential tools for formative assessment. However, in
this chapter, the emphasis is on how to use this method so that teachers can make more
systematic observations to record student behaviour that indicates the presence of
targeted affective traits.
In using observation, the first thing to do is determine in advance how specific behaviours relate to the target. It starts with a clear definition of the trait, followed by a list of student behaviours and actions: initially, list what students with positive and with negative attitudes do and say. Classify those and create a separate list of the positive student behaviours and another list for the negative student behaviours. These lists will serve as the starting point of what will be observed. Contained in the table below are some possible student behaviours indicating positive and negative attitudes toward learning.
POSITIVE NEGATIVE
Rarely misses class Is frequently absent
Rarely late to class Is frequently tardy
These behaviors may serve as a vital input on how to perform observation, particularly the
teacher observation.
McMillan (2007) suggested that the best approach is to develop a list of positive and negative behaviors. Although published instruments are available, the unique characteristics of a school and its students were not considered in these instruments when they were developed.
After the list of behaviors has been developed, the teacher needs to decide whether to use an informal, unstructured observation or a formal, structured one. These two types differ in terms of preparation and what is recorded.
Unstructured observation (anecdotal) may also be used for the purpose of making summative judgements. It is normally open-ended; no checklist or rating scale is used, and everything observed is simply recorded. In using unstructured observation, it is necessary to have at least some guidelines and examples of behaviors that indicate the affective trait. Thus, it is a must to determine in advance what to look for; however, the observation should not be limited to what was predetermined, and it needs to be open to other actions that may reflect the trait.
Unstructured observation is more realistic, which means teachers can record everything
they have observed and are not limited by what is contained in a checklist or rating scale.
Below are the things that should be considered if the teacher observation method will be used to assess affect.
There are varied ways for students to express affect through self-report. The most common and direct way is through a casual conversation or interview. Students can also respond to a written questionnaire or survey about themselves or other students.
There are different types of personal communication that teachers can use with their students, like individual and group interviews, discussions, and casual conversations, to assess affect. It is similar to observation, but here there is an opportunity for direct involvement with the student, wherein teachers can probe and respond for better understanding.
The second type under self-report method is questionnaires and surveys. The two types of
format using questionnaires and surveys are: (a) Constructed-Response format; and (b)
Selected-Response format.
Constructed-Response format
Selected-Response format
There are three ways of implementing the selected-response format in assessing affective learning outcomes. These are the rating scale, the semantic differential scale, and the checklist.
Peer ratings or appraisal is the least common among the three methods of assessing affect discussed in this chapter. Because of the nature of learners, they do not always take this activity seriously, and more often than not they are subjective in conducting peer ratings. Thus, peer rating is seen as relatively inefficient in terms of conducting, scoring, and interpreting. However, teachers can accurately observe what is being assessed in peer ratings, since teachers are very much engaged and present inside the classroom and can thus verify the authenticity of the results. The two methods of conducting peer ratings are: (a) the guess-who approach; and (b) the sociometric approach. These approaches can be used together with observations and self-reports to strengthen the assessment of interpersonal and classroom environment targets.
Each of the three methods (observation, self-report, peer ratings) discussed previously has its own advantages and disadvantages. In choosing which method or methods to use, consider the following factors:
If grouped responses and tendencies are needed, the selected-response self-report method is suited because it assures anonymity and is easily scored.
If the intention of the affective assessment is to utilize the results as supporting input to grading, then multiple approaches are necessary; be mindful of the possibility of faked results from self-reports and even from peer judgements.
The affective domain encompasses behaviors in terms of attitudes, beliefs, and feelings. Sets of attitudes, beliefs, and feelings comprise one's values. There are various assessment tools that can be used to measure affect.
3.1 Checklist
A checklist is one of the effective assessment strategies to monitor specific skills, behaviors, or dispositions of an individual student or a group of students (Burke, 2009).
Checklists contain criteria that focus on the intended outcome or target. They help students organize the tasks assigned to them into logically sequenced steps that will lead to successful completion of the task. For teachers, a criteria checklist can be used for formative assessment by giving emphasis to specific behaviors, thinking skills, social skills, writing skills, speaking skills, athletic skills, or whatever outcomes are to be measured and monitored. Checklists can be used for individual or group cases.
1. Provide a quick and easy way to observe and record skills, criteria, and behaviors prior to a final test or summative evaluation.
2. Provide information to teachers about students who need help, so that failure can be avoided.
3. Provide formative assessment of students' learning and help teachers monitor whether students are on track with the desired outcomes (a short illustrative sketch follows this list).
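As a simple illustration of the idea, a criteria checklist can be recorded and tallied as in the following sketch (the criteria are hypothetical examples, not from this module):

```python
# Minimal sketch: recording and tallying a criteria checklist
# (hypothetical criteria for one student).

checklist = {
    "States the problem clearly": True,    # observed
    "Uses an appropriate strategy": True,  # observed
    "Shows a complete solution": False,    # not yet observed
    "Explains the answer": False,          # not yet observed
}

# Tally how many criteria the student has demonstrated so far (formative use).
met = sum(checklist.values())
print(f"criteria met: {met} of {len(checklist)}")
for criterion, observed in checklist.items():
    print(f"[{'x' if observed else ' '}] {criterion}")
```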
According to Nitko (2001), rating scales can be used for teaching purposes and
assessment.
1. Rating scales help students understand the learning targets/outcomes and focus students' attention on performance.
2. A completed rating scale gives specific feedback to students as far as their strengths and weaknesses with respect to the targets are concerned.
3. Students not only learn the standards but may also internalize them.
4. Ratings help to show each student's growth and progress.
Example:
To what extent does the student participate in team meetings and discussions?
1 2 3 4
A better format for rating is the descriptive graphic rating scale, which replaces ambiguous single words with short behavioural descriptions of the various points along the scale.
Example:
To what extent does the student participate in team meetings and discussions?
Comment(s):
______________________________________________________________________________
The table below contains the common rating scale errors that teachers and students must be familiar with in order to avoid committing such errors during assessment.
Error Description
Another simple and widely used self-report method in assessing affect is the Likert scale, wherein a list of clearly favourable and unfavourable attitude statements is provided. The students are asked to respond to each of the statements.
Likert scale uses the five-point scale: Strongly Agree (SA); Agree (A); Undecided (U);
Disagree (D); and Strongly Disagree (SD).
The scoring of a Likert scale is based on assigning weights from 1 to 5 to each position on the scale. In using an attitude scale, it is best to ask for anonymous responses. In interpreting the results, it is important to keep in mind that these are verbal expressions of feelings and opinions that individuals are willing to report. A short scoring sketch follows the construction steps below.
1. Write a series of statements expressing positive and negative opinions toward the attitude object.
2. Select the best statements expressing positive and negative opinions and edit as necessary.
3. List the statements, combining the positive and negative, and put the letters of the five-point scale to the left of each statement for easy marking.
4. Add the directions, indicating how to mark the answers, and include a key at the top of the page if letters are used for each statement.
5. Some prefer to drop the undecided category so that respondents are forced to indicate agreement or disagreement.
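As a rough illustration of the scoring described above, the sketch below (with hypothetical statements and responses, not from any published instrument) assigns weights of 5 down to 1 for SA through SD and reverses the weights for negative statements:

```python
# Minimal sketch: scoring a Likert scale (hypothetical statements/responses).

WEIGHTS = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}

# (statement, is_positive) pairs: positive and negative items are combined.
statements = [
    ("Mathematics is useful in daily life.", True),
    ("I feel anxious during Mathematics class.", False),
]

def score_response(answer: str, is_positive: bool) -> int:
    """Return the 1-5 weight, reversing the scale for negative statements."""
    w = WEIGHTS[answer]
    return w if is_positive else 6 - w

# One anonymous student's responses, in the same order as the statements.
responses = ["A", "D"]

total = sum(score_response(r, pos) for r, (_, pos) in zip(responses, statements))
print(f"attitude score: {total} out of {5 * len(statements)}")  # higher = more favourable
```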
The advantage of using the incomplete-sentence format is that it captures whatever comes to mind from each student. However, there are disadvantages too. One is students faking their responses, thinking that the teacher will recognize their penmanship; hence students tend to give the answers they believe the teacher favours. Another is scoring, which takes more time and is more subjective than the other traditional objective formats.
Examples:
I think Mathematics as a subject is ________________________________.
I like my Mathematics teacher the most because ______________________.
THE PORTFOLIO
Documentation Portfolio
– This approach involves a collection of work over time showing growth and improvement, reflecting students' learning of identified outcomes. This portfolio is also called a “growth portfolio” in the literature. The documentation portfolio can include everything from brainstorming activities to drafts to finished products. The collection becomes meaningful when specific items are selected to focus on particular educational experiences or goals. It can include the best and weakest of student work. It is important to realize here that even drafts and scratch papers should be included in the portfolio, for they actually demonstrate the growth process that the students have been through.
Process Portfolio
– The process portfolio demonstrates all facets or phases of the learning process. As such,
these portfolios contain an extensive number of reflective journals, think logs and other
related forms of metacognitive processing. They are particularly useful in documenting
students’ overall learning process. It can show how students integrate specific knowledge
or skills and progress towards both basic and advanced mastery.
Showcase Portfolio
– The showcase portfolio only shows the best of the students’ outputs and products. As
such, this type of portfolio is best used for summative evaluation of students’ mastery of
key curriculum outcomes. It should include students’ very best work, determined through
a combination of student and teacher selection. Only completed work should be included.
In addition, this type of portfolio is especially compatible with audio-visual artifact
development, including photographs, videotapes, and electronic records of students’
completed work. The showcase portfolio should also include written analysis and
reflections by the student upon the decision-making process(es) used to determine which
works are included.
-ESSENTIAL ELEMENTS-
1. Cover Letter “About the author” and “What my portfolio shows about my progress
as a learner” (written at the end, but put at the beginning). The cover letter summarizes
the evidence of a student’s learning and progress.
2. Table of Contents with numbered pages.
3. Entries – both core (items students have to include) and optional (items of student’s
choice). The core elements will be required for each student and will provide a common
base from which to make decisions on assessment. The optional items will allow the
folder to represent the uniqueness of each student. Students can choose to include “best”
pieces of work, but also a piece of work which gave trouble or one that was less
successful, and give reasons why.
4. Dates on all entries, to facilitate proof of growth over time.
5. Drafts of aural/oral and written products and revised versions; i.e., first drafts and
corrected/revised versions.
6. Reflections can appear at different stages in the learning process (for formative and/or summative purposes) and, at the lower levels, can be written in the mother tongue or by students who find it difficult to express themselves in English.
4) What type of portfolio are you required to submit in this course? Justify your
answer.
A process portfolio, because making this portfolio requires a process to create and submit it. It is a process portfolio because we gain mastery of the different areas of the teaching profession through the episodes that were given by my course facilitator.
Below is an example of assessing reading-skills performance which shows the alignment of the teaching and learning goal, activities, and assessment task, including portfolio evidence.
Goal: Decode (basic reading skills). Activities: read simple texts. Portfolio evidence: word bank, selected “texts I can read”, reading on a cassette. Assessment: individual progress report, peer compliment, checklists, rating scales.

Goal: Appreciate literature, understanding characters and themes. Activities: semi-extended reading activities (both guided and independent learning). Portfolio evidence: reading logs, reading journal, book tasks, cassette, video clips, artwork. Assessment: self/peer assessment checklists.

Goal: Reading for pleasure (extensive reading). Activities: sustained silent reading in class as well as at home. Portfolio evidence: a log of books, creative tasks, and comment cards. Assessment: teacher's record of the student's reading, with a rating scale relating to content, presentation, and language.
Once the purpose and targets have been clarified, we need to think of the physical structure of the portfolio. Some practical questions affect the successful use of portfolios in your classroom. The content of a portfolio consists of entries which provide assessment information about the content and processes identified in the dimensions to be assessed. These are naturally artifacts derived from the different learning activities. The range of samples is extensive and must be determined to some extent by the subject matter and the instruction.
Using portfolios can help you to document the needs and assets of the community of interest. Portfolios can also help you to clarify the identity of your program and allow you to document the "thinking" behind the development of and throughout the program. Ideally, the process of deciding on criteria for the portfolio will flow directly from the program objectives that have been established in designing the program. However, in a new or existing program where the original objectives are not as clearly defined as they need to be, program developers and staff may be able to clarify their own thinking by visualizing what successful outcomes would look like, and what they would accept as "evidence". Thus, thinking about portfolio criteria may contribute to clearer thinking and better definition of program objectives.
A caution on accountability: if goals and criteria are not clear, the portfolio can be just a miscellaneous collection of artifacts that do not show patterns of growth or achievement.
1) Purpose
2) Assessment Criteria
3) Evidence
Historical perspective: Portfolios have been widely used for many years. The late 1980s saw interest in portfolios for assessment (Belanoff and Dickson, 1991), and the 1990s saw the advent of e-portfolios, with a shift in emphasis away from assessment and toward learning.
A portfolio provides samples of the student's work which show growth over time.
The criteria for selecting and assessing the portfolio contents must be clear to the
teacher and the students at the outset of the process.
Determine Purpose
Identify Physical Structure
Determine Source of Content
Determine Student Self-Reflective Guidelines and Scoring Criteria
Review with Students
Teacher Evaluation of Contents and Student Self-Evaluation
Student Self-Evaluation of Contents
Student-Teacher Conference
Portfolio Content Supplied by Teacher and/or Student
Portfolios Returned to Students or School
Purpose involves specific learning targets; targets that reflect all of the contents are broader and more general, e.g., “development as a reader”, “speaks clearly”, “adapts writing styles to different purposes”.
Types of Portfolios. The types of portfolios differ from each other depending on the purposes or objectives set for the overall classroom assessment program. As a rule, portfolio assessment is used where traditional testing is inadequate to measure desired skills and competencies.
Types of portfolio: process portfolio, showcase portfolio, assessment portfolio, dossier portfolio, reflective portfolio, classroom portfolio, positivist portfolio, constructivist portfolio, personal portfolio, structured portfolio, employment portfolio, and working portfolio.
Rating criteria:
1. Thoughtfulness (including evidence of students' monitoring of their own comprehension, metacognitive reflection, and productive habits of mind)
2. Growth and development in relationship to key curriculum expectancies and indicators
3. Understanding and application of key processes
4. Completeness, correctness, and appropriateness of products and processes presented in the portfolio
5. Diversity of entries (e.g., use of multiple formats to demonstrate achievement of designated performance standards)
In evolving evaluation criteria, teachers and students must work together and agree on the criteria to be applied to the portfolio. Such evaluative criteria need to be set and agreed upon prior to the development of the portfolio. The criteria used may be formative (i.e., applied throughout the instructional time period) or summative (i.e., applied as part of a culminating project, activity, or related assessment to determine the extent to which identified curricular expectancies, indicators, and standards have been achieved).
Each portfolio entry needs to be assessed with reference to its specific goals. Self- and peer-assessment can also be used for formative evaluation, with students having to justify their grades with reference to the goals and to specific pages in the portfolio.
Student-Teacher Conferences
The main philosophy embedded in portfolio assessment is “shared and active assessment”. For the formative evaluation process, the teacher should have short individual meetings with each student, in which progress is discussed and goals are set for a future meeting. The student and the teacher keep careful documentation of the meetings, noting the significant agreements and findings in each session.
Impact: Herman and Winters (1994) found, based on self-reports from teachers and others implementing portfolios, that portfolio use appears to have positive effects on instruction. Vermont principals affirmed that the portfolio assessment program had beneficial effects on curriculum and instruction.
Impact: Hirvela and Sweetland (2005) used two case studies showing that the two students did not strongly endorse the portfolios as used in two different courses. The students seemed to need more explanation of what the portfolio approaches were meant to achieve; even with the portfolio counting for 5% of the final course grade, students saw it as essentially summative in nature.
At the end of this section, you should be able to demonstrate skills in preparing and interpreting grades. Also, you should be able to assess the effectiveness of the parent-teacher conference as a venue for reporting learners' performance.
Assessment of learning during instruction and after instruction may be achieved in a number of ways. One of the challenges in grading is that of summarizing the variety of information collected from different types of assessment and coming up with a standardized numerical grade, descriptive letter rating, or brief report.
The guiding premises in developing grading and reporting system are provided below:
In developing and implementing grading and reporting systems, these premises must be taken into consideration to have a meaningful output and to help in the attainment of the student learning objectives, from which the assessment objectives are cascaded.
The K to 12 curriculum has specific assessment requirements and a design catering to the delivery modes of learning, i.e., formal education and the alternative learning system.
The K to 12 curriculum prescribes that the assessment process should utilize a wide variety of traditional and authentic assessment tools and techniques for a valid, reliable, and realistic assessment of learning. Traditional and authentic assessments complement each other, though they are not mutually exclusive. Furthermore, it gives greater importance to assessing understanding and skills development rather than mere accumulation of content.
Knowledge refers to the essential content of the curriculum, the facts and information that the
student acquires.
Process refers to cognitive acts that the student does on facts and information to come up
with meanings and understandings.
Understanding refers to lasting big ideas, principles, and generalizations that are fundamental
to the discipline which may be assessed using the facets of understanding.
The assigned weight per level of assessment is shown in the following table:
Knowledge 15%
Process 25%
Understanding 30%
Products/Performance 30%
TOTAL 100%
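As a worked illustration, the sketch below computes a quarterly grade as the weighted sum of the four levels. The component scores are made up, and the 25% weight for Process is an assumption that completes the table above to 100%:

```python
# Minimal sketch: weighted quarterly grade (hypothetical component scores;
# the 25% Process weight is assumed so the weights total 100%).

weights = {"Knowledge": 0.15, "Process": 0.25,
           "Understanding": 0.30, "Products/Performance": 0.30}

scores = {"Knowledge": 90, "Process": 85,
          "Understanding": 80, "Products/Performance": 88}   # percentages

grade = sum(weights[level] * scores[level] for level in weights)
print(f"quarterly grade: {grade:.2f}%")   # 13.5 + 21.25 + 24.0 + 26.4 = 85.15
```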
At the end of the quarter, the student’s performance will be described based on the prescribed
level of proficiency which has equivalent numerical values. Proficiency level is computed from
the sum of all the performances of students in various levels of assessment. Each level is
described as follows:
Beginning. The student at this level struggles with his/her understanding; prerequisite and fundamental knowledge and skills have not been acquired or developed adequately.
Developing. The student at this level possesses the minimum knowledge and skills and core
understanding but needs help throughout the performance of authentic tasks.
Approaching Proficiency. The student at this level has developed the fundamental knowledge
and skills and core understandings, and with little guidance from the teacher and or with some
assistance from peers, can transfer these understandings through authentic performance tasks.
Proficient. The student at this level has developed the fundamental knowledge and skills and
core understandings, and can transfer them independently through authentic performance
tasks.
Advanced. The student at this level exceeds the core requirements in terms of knowledge,
skills and core understandings, and can transfer them automatically and flexibly through
authentic performance tasks.
Translating these proficiency levels into numerical values is described in the following table:
Beginning: 74% and below
Developing: 75-79%
Approaching Proficiency: 80-84%
Proficient: 85-89%
Advanced: 90% and above
Indicator: Acquisition of knowledge, skills, and understanding
Beginning: struggling; knowledge and skills not yet acquired
Developing: minimum
Approaching Proficiency: fundamental
Proficient: fundamental
Advanced: exceeding

Indicator: Transfer / application of knowledge
Beginning and Developing: needs help
Approaching Proficiency: with little guidance from the teacher or some assistance from peers
Proficient: independent
Advanced: automatic and flexible
Source: Marilyn D. Dimaano’s presentation materials on Assessment and Rating
Note: You may do some research in order to learn more about the grading and reporting systems used in the old curriculum as well as in the newly implemented K to 12 curriculum.
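Continuing the illustration, a computed quarterly grade can be mapped to the proficiency descriptors. The sketch below uses the cut-offs tabulated above; the boundaries not explicitly given in the source are filled in so the ranges are contiguous:

```python
# Minimal sketch: mapping a numerical grade to a proficiency level,
# assuming contiguous cut-offs around the ranges given in the table.

def proficiency_level(grade: float) -> str:
    if grade >= 90: return "Advanced"
    if grade >= 85: return "Proficient"
    if grade >= 80: return "Approaching Proficiency"
    if grade >= 75: return "Developing"
    return "Beginning"

print(proficiency_level(85.15))  # -> Proficient
```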
Over the years, studies have also been made on how grades and the comments teachers write on students' papers might affect students' achievement. An early investigation by Page (1958) focused specifically on this issue. In the said study, 74 school teachers administered a test to the students in their classes and scored it in the usual way. A numerical score was assigned to each student's paper and, on the basis of the scores obtained, a corresponding letter grade of A, B, C, D, or F was given. Next, teachers randomly divided the students' papers into three groups. The first group received only the numerical score and letter grade. The second group, aside from the score and grade, received standard comments:
A: Excellent! Keep it up. B: Good work! Keep it up. C: Perhaps try to do still better? D: Let's bring this up. F: Let's raise this grade. For the third group, teachers marked the score and letter grade, and then wrote on each paper a variety of individualized comments. Page asked the teachers to write anything they wished on these papers, provided the comments matched their personal feelings and instructional practices. Papers were then returned to the students in the normal way.
Page then evaluated the effects of the comments by considering students’ scores on the very
next test or assessment given in the class. The results showed that students who received the
standard comments with their grade achieved significantly higher scores than those who
received only a score and grade. Those students who received individualized comments did
even better. This led him to conclude that grades can have a beneficial effect on student learning when accompanied by specific or individualized comments from the teacher (Stewart & White, 1976). Studies conducted in more recent years confirmed Page's conclusion.
Based on the study presented in the previous paragraphs, its relevant points are:
1. It illustrated that while grades may not be compulsory for teaching or learning, they can be used in positive ways to enhance students' achievement and performance.
2. It showed that positive effects can be gained with relatively little effort on the part of teachers. Stamps or stickers with standard comments such as these could easily be produced for teachers to use, yet this simple effort has a significant positive effect on students' performance.
Whatever format is preferred and required of the teacher, grading and reporting should provide high-quality information to interested persons by means of a schema they can understand and use. The basis of such high-quality information is critical evidence on student learning. Evaluation experts stress that if one is going to make important decisions about students that have broad implications, such as the decisions involved in grading, the more important it is that good evidence be ready at hand (Airasian, 1994; Linn & Gronlund, 2000; Stiggins, 2001). In the absence of good evidence, even the most detailed and high-tech grading and reporting system is useless; it simply cannot serve the basic communication functions for which it is intended.
There are three qualities that contribute to the goodness of evidence that are gathered on
student learning. These three qualities are described in the following table.
Quality: multiple sources of evidence.
Description: The more sources of evidence on students' learning there are, the better the information that can be reported.
Example: Any single source of evidence of student learning can be imperfect, so it is essential to utilize multiple sources of evidence in grading and reporting on students.
Despite their apparent simplicity, the true meaning of letter grades is not always clear. What teachers intend to communicate with a particular letter grade and what parents interpret that grade to mean are often not the same (Waltman & Frisbie, 1994). To give more clarity to the meaning of a letter grade, most schools include a key or legend on the reporting form in which each letter grade is paired with an explanatory word or phrase. Descriptors must be carefully chosen to avoid additional complications and misunderstanding.
Advantages:
Disadvantages:
Requires abstraction of a great deal of information into a single symbol (Stiggins, 2001).
Despite educators' best efforts, letter grades tend to be interpreted by parents in strictly norm-referenced terms. The cut-offs between grade categories are always arbitrary and difficult to justify.
Lacks the richness of other, more detailed reporting methods such as standards-based grading, mastery grading, and narratives.
Grade A. Standards-based: outstanding or advanced; complete knowledge of all content; mastery of all targets; exceeds standards. Norm-referenced: outstanding, among the highest or best performance. Level of performance: outstanding, very high level of performance. Improvement: outstanding, much improvement on most or all targets.

Grade B. Standards-based: very good or proficient; complete knowledge of most content; mastery of all targets; meets most standards. Norm-referenced: very good, performs above the class average. Level of performance: very good, better than average performance. Improvement: very good, some improvement on most or all targets.

Grade C. Standards-based: acceptable or basic; command of only basic concepts or skills; mastery of some targets; meets some standards. Norm-referenced: average, performs at the class average. Improvement: acceptable, some improvement on some targets.
(McMillan, 2007)
Advantages:
Disadvantages:
In an effort to bring greater clarity and specificity to the grading process, many schools
initiated standards-based grading procedures and reporting forms. Guskey and Bailey (2001)
identify four steps in developing standards-based grading. These steps are:
1. Identify the major learning goals or standards that students will be expected to achieve at
each grade level or in each course of study.
2. Establish performance indicators for the learning goals.
3. Determine graduated level of quality (benchmarks) for assessing each goal or standard.
4. Develop reporting tools that communicate teachers’ judgments of students’ learning progress
and culminating achievement in relation to the learning goals and standards.
Advantages:
When clear learning goals or standards are established, standards-based grading offers meaningful information about students' achievement and performance to students, parents, and others.
If the information is detailed, it can be useful for diagnostic and prescriptive purposes.
It facilitates the teaching and learning processes better than any other grading method.
Disadvantages:
The simplest alternative grading method available to educators reduces the number of grade categories to just two: Pass or Fail. Pass/Fail grading was originally introduced in college-level courses in the late 1800s in order for students to give more importance to learning and less to the grades they attained. By lessening the emphasis on grades, many educators believed that students would be encouraged to take more challenging subjects. Pass/Fail grading was popular in most universities and colleges in the 1970s, which utilized it in various programs.
Advantages:
Disadvantages:
The table below provides a summary of the different grading methods discussed:
Percentage Grade. Advantages: easy to calculate, record, and combine; familiar. Disadvantages: broad, sometimes unclear indication of performance; false sense of difference between close scores; high scores do not necessarily signify mastery.

Pass/Fail. Advantages: simple; consistent with mastery learning. Disadvantages: little discrimination in performance; less emphasis on high performance.
The most critical issue to be addressed in selecting the tools included in a reporting system is what purpose or purposes it is to serve: why do we need to convey this information, and what do we need to accomplish?
1. Report Cards
2. Notes: Attached to Report Cards
3. Standardized Assessment Report
4. Phone Calls to Parents
5. Weekly/Monthly Progress Report
6. School Open-Houses
7. Newsletter to Parents
8. Personal Letter to Parents
9. Evaluated Projects or Assignments
10. Portfolios or Exhibits of Students’ Work
11. Homework Assignments
12. Homework Hotlines
13. School Web Pages
14. Parent-Teacher Conferences
15. Student-Teacher Conferences
16. Student-Led Conferences
To ensure better practice, the following statements serve as a guide on how to utilize grading and reporting systems effectively:
6.1 Introduction
6.2 Definitions
6.3 Basic Statistics
6.4 Statistical tests
6.1 Introduction
In the preceding chapters basic elements for the proper execution of analytical
work such as personnel, laboratory facilities, equipment, and reagents were
discussed. Before embarking upon the actual analytical work, however, one more
tool for the quality assurance of the work must be dealt with: the statistical
operations necessary to control and verify the analytical procedures (Chapter 7) as
well as the resulting data (Chapter 8).
It was stated before that making mistakes in analytical work is unavoidable. This is
the reason why a complex system of precautions to prevent errors and traps to
detect them has to be set up. An important aspect of quality control is the
detection of both random and systematic errors. This can be done by critically
looking at the performance of the analysis as a whole and also of the instruments
and operators involved in the job. For the detection itself as well as for the
quantification of the errors, statistical treatment of data is indispensable.
Clearly, statistics are a tool, not an aim. Simple inspection of data, without
statistical treatment, by an experienced and dedicated analyst may be just as
useful as statistical figures on the desk of the disinterested. The value of statistics
lies in organizing and simplifying data, to permit some objective estimate
showing that an analysis is under control or that a change has occurred. Equally
important is that the results of these statistical procedures are recorded and can
be retrieved.
6.2 Definitions
6.2.1 Error
6.2.2 Accuracy
6.2.3 Precision
6.2.4 Bias
Discussing Quality Control implies the use of several terms and concepts with a
specific (and sometimes confusing) meaning. Therefore, some of the most
important concepts will be defined first.
6.2.1 Error
Error is the collective noun for any departure of the result from the "true" value. Analytical errors can be random (unpredictable deviations between replicates, quantified with the standard deviation, i.e. precision) or systematic (predictable, regular deviation from the "true" value, quantified as "mean error" or bias).
6.2.2 Accuracy
The "trueness" or the closeness of the analytical result to the "true" value. It is
constituted by a combination of random and systematic errors (precision and
bias) and cannot be quantified directly. The test result may be a mean of several
values. An accurate determination produces a "true" quantitative value, i.e. it is
precise and free of bias.
6.2.3 Precision

The closeness with which results of replicate analyses of a sample agree. It is a measure of the dispersion or scattering of results around the mean, usually expressed as a standard deviation or relative standard deviation.
6.2.4 Bias
The consistent deviation of analytical results from the "true" value caused by systematic errors in a procedure. Bias is the opposite, but most used, measure of "trueness", which is the agreement of the mean of analytical results with the true value, i.e. excluding the contribution of randomness represented in precision.
There are several components contributing to bias:
1. Method bias
2. Laboratory bias
3. Sample bias
Fig. 6-1. Accuracy and precision in laboratory measurements. (Note that the qualifications apply to the mean of results: in c the mean is accurate but some individual results are inaccurate.)
6.3 Basic Statistics
6.3.1 Mean
6.3.2 Standard deviation
6.3.3 Relative standard deviation. Coefficient of variation
6.3.4 Confidence limits of a measurement
6.3.5 Propagation of errors
Fig. 6-2. A Gaussian or normal distribution. The figure shows that (approx.) 68% of the data fall in the range x̄ ± s, 95% in the range x̄ ± 2s, and 99.7% in the range x̄ ± 3s.
6.3.1 Mean

The mean (or average) x̄ of a set of n results is defined as:

\[ \bar{x} = \frac{\sum x_i}{n} \tag{6.1} \]

6.3.2 Standard deviation
This is the most commonly used measure of the spread or dispersion of data
around the mean. The standard deviation is defined as the square root of
the variance (V). The variance is defined as the sum of the squared deviations
from the mean, divided by n-1. Operationally, there are several ways of
calculation:
\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \tag{6.2} \]

or

\[ s = \sqrt{\frac{\sum x_i^2 - (\sum x_i)^2 / n}{n - 1}} \tag{6.3} \]

or

\[ s = \sqrt{\frac{\sum x_i^2 - n\bar{x}^2}{n - 1}} \tag{6.4} \]
The calculation of the mean and the standard deviation can easily be done on a
calculator but most conveniently on a PC with computer programs such as dBASE,
Lotus 123, Quattro-Pro, Excel, and others, which have simple ready-to-use
functions. (Warning: some programs use n rather than n − 1!).
Although the standard deviation of analytical data may not vary much over
limited ranges of such data, it usually depends on the magnitude of such data:
the larger the figures, the larger s. Therefore, for comparison of variations (e.g.
precision) it is often more convenient to use the relative standard deviation
(RSD) than the standard deviation itself. The RSD is expressed as a fraction, but
more usually as a percentage and is then called coefficient of variation
(CV). Often, however, these terms are confused.
\[ \mathrm{RSD} = \frac{s}{\bar{x}} \tag{6.5} \]
\[ \mathrm{CV} = \frac{s}{\bar{x}} \times 100\% \tag{6.6} \]
Note. When needed (e.g. for the F-test, see Eq. 6.11) the variance
can, of course, be calculated by squaring the standard deviation:
\[ V = s^2 \tag{6.7} \]
The more an analysis or measurement is replicated, the closer the mean x̄ of the results will approach the "true" value μ of the analyte content (assuming absence of bias).
\[ \mu = \bar{x} \pm \frac{t \cdot s}{\sqrt{n}} \tag{6.8} \]

where
μ = "true" value
x̄ = mean of the test results
t = statistical factor ("Student's t") depending on the number of degrees of freedom and the chosen confidence level
s = standard deviation of the test results
n = number of test results
The critical values for t are tabulated in Appendix 1 (they are, therefore, here
referred to as ttab ). To find the applicable value, the number of degrees of
freedom has to be established by: df = n -1 (see also Section 6.4.2).
Example
For the determination of the clay content in the particle-size analysis, a semi-
automatic pipette installation is used with a 20 mL pipette. This volume is
approximate and the operation involves the opening and closing of taps.
Therefore, the pipette has to be calibrated, i.e. both the accuracy (trueness) and
precision have to be established.
A tenfold measurement of the volume yielded the following set of data (in mL):
The mean is 19.842 mL and the standard deviation 0.0627 mL. According to Appendix 1, for n = 10, ttab = 2.26 (df = 9), and using Eq. (6.8) this calibration yields:

\[ \mu = 19.842 \pm \frac{2.26 \times 0.0627}{\sqrt{10}} = 19.842 \pm 0.045 \ \mathrm{mL} \]
(Note that the pipette has a systematic deviation from 20 mL as this is outside
the found confidence interval. See also bias).
In routine analytical work, results are usually single values obtained in batches of
several test samples. No laboratory will analyze a test sample 50 times to be
confident that the result is reliable. Therefore, the statistical parameters have to
be obtained in another way. Most usually this is done by method validation (see
Chapter 7) and/or by keeping control charts, which is basically the collection of
analytical results from one or more control samples in each batch (see Chapter 8).
Equation (6.8) is then reduced to

\[ \mu = x \pm t \cdot s \tag{6.9} \]
where
μ = "true" value
x = single measurement
t = applicable ttab (Appendix 1)
s = standard deviation of set of previous measurements.
In Appendix 1 can be seen that if the set of replicated measurements is large (say
> 30), t is close to 2. Therefore, the (95%) confidence of the result x of a single
test sample (n = 1 in Eq. 6.8) is approximated by the commonly used and well
known expression
\[ \mu = x \pm 2s \tag{6.10} \]

where
x = result of a single test sample (or x̄ = mean of duplicates)
s = known standard deviation of the large set of previous measurements
Thus, in summary, Equation (6.8) can be applied in various ways to determine the
size of errors (confidence) in analytical work or measurements: single
determinations in routine work, determinations for which no previous data exist,
certain calibrations, etc.
6.3.5 Propagation of errors

Because the "adding up" of errors is usually not a simple summation, the propagation of errors is discussed here. The main distinction to be made is between random errors (precision) and systematic errors (bias).
In estimating the total random error from factors in a final calculation, the
treatment of summation or subtraction of factors is different from that of
multiplication or division.
1. Summation calculations

If

\[ x = a + b + c + \ldots \]

then the total precision is expressed by the standard deviation obtained by taking the square root of the sum of the individual variances (squares of the standard deviations):

\[ s_x = \sqrt{s_a^2 + s_b^2 + s_c^2 + \ldots} \]
Example
The Effective Cation Exchange Capacity of soils (ECEC) is obtained by summation
of the exchangeable cations:

ECEC = exch. (Ca + Mg + Na + K + H + Al)

Standard deviations experimentally obtained for exchangeable Ca, Mg, Na, K and (H + Al) on a certain sample, e.g. a control sample, are 0.30, 0.25, 0.15, 0.15, and 0.60 cmolc/kg respectively. The total precision is:

\[ s_{ECEC} = \sqrt{0.30^2 + 0.25^2 + 0.15^2 + 0.15^2 + 0.60^2} = 0.75 \ \mathrm{cmol_c/kg} \]
It can be seen that the total standard deviation is larger than the highest
individual standard deviation, but (much) less than their sum. It is also clear that if
one wants to reduce the total standard deviation, qualitatively the best result can
be expected from reducing the largest individual contribution, in this case the
exchangeable acidity.
2. Multiplication calculations

If

\[ x = \frac{a \times b}{c \times \ldots} \]

then the total error is expressed by the relative standard deviation obtained by taking the square root of the sum of the squares of the individual relative standard deviations (RSD or CV, as a fraction or as a percentage, see Eqs. 6.5 and 6.6):

\[ \mathrm{RSD}_x = \sqrt{\mathrm{RSD}_a^2 + \mathrm{RSD}_b^2 + \mathrm{RSD}_c^2 + \ldots} \]
Example

The nitrogen content determined with the Kjeldahl method is calculated as:

\[ \%N = \frac{(a - b) \times M \times 1.4 \times mcf}{s} \]

where
a = mL HCl required for titration of the sample
b = mL HCl required for titration of the blank
s = air-dry sample weight in grams
M = molarity of the HCl
1.4 = 14 × 10⁻³ × 100% (14 = atomic weight of N)
mcf = moisture correction factor
The RSDs of the contributing factors were estimated as:
- distillation: 0.8%
- titration: 0.5%
- molarity: 0.2%
- sample weight: 0.2%
- mcf: 0.2%

The total random error is then:

\[ \mathrm{RSD}_{total} = \sqrt{0.8^2 + 0.5^2 + 0.2^2 + 0.2^2 + 0.2^2} = 1.0\% \]
Here again, the highest RSD (of distillation) dominates the total precision. In
practice, the precision of the Kjeldahl method is usually considerably worse
(≈ 2.5%), probably mainly as a result of the heterogeneity of the sample. The
present example does not take that into account. It would imply that 2.5% - 1.0%
= 1.5% or 3/5 of the total random error is due to sample heterogeneity (or other
overlooked cause). This implies that painstaking efforts to improve subprocedures
such as the titration or the preparation of standard solutions may not be very
rewarding. It would, however, pay to improve the homogeneity of the sample, e.g.
by careful grinding and mixing in the preparatory stage.
6.4 Statistical tests

Some of the most common and convenient statistical tools to quantify such comparisons (e.g. between methods, analysts, or laboratories) are the F-test, the t-tests, and regression analysis.
Because the F-test and the t-tests are the most basic tests they will be discussed
first. These tests examine if two sets of normally distributed data are similar or
dissimilar (belong or not belong to the same "population") by comparing
their standard deviations and means respectively. This is illustrated in Fig. 6-3.
Fig. 6-3. Three possible cases when comparing two sets of data (n1 = n2). A.
Different mean (bias), same precision; B. Same mean (no bias), different
precision; C. Both mean and precision are different. (The fourth case, identical
sets, has not been drawn).
6.4.1 Two-sided vs. one-sided test
These tests for comparison, for instance between methods A and B, are based on
the assumption that there is no significant difference (the "null hypothesis"). In
other words, when the difference is so small that a tabulated critical
value of F or t is not exceeded, we can be confident (usually at 95% level)
that A and B are not different. Two fundamentally different questions can be
asked concerning both the comparison of the standard deviations s1 and s2 with
the F-test, and of the means x̄1 and x̄2 with the t-test:
1. Are A and B different from each other? (two-sided test)
2. Is A higher (or lower) than B? (one-sided test)
This difference in probability in the tests is expressed in the use of two tables of
critical values for both F and t. In fact, the one-sided table at 95% confidence
level is equivalent to the two-sided table at 90% confidence level.
Because the result of the F-test may be needed to choose between the
Student's t-test and the Cochran variant (see next section), the F-test is discussed
first.
The F-test (or Fisher's test) is a comparison of the spread of two sets of data to
test if the sets belong to the same population, in other words if the precisions are
similar or dissimilar.
\[ F = \frac{s_1^2}{s_2^2} \tag{6.11} \]
where the larger s2 must be the numerator by convention. If the performances are
not very different, then the estimates s1, and s2, do not differ much and their ratio
(and that of their squares) should not deviate much from unity. In practice, the
calculated F is compared with the applicable F value in the F-table (also called
the critical value, see Appendix 2). To read the table it is necessary to know the
applicable number of degrees of freedom for s1, and s2. These are calculated by:
df1 = n1-1
df2 = n2-1
If Fcal ≤ Ftab one can conclude with 95% confidence that there is no significant
difference in precision (the "null hypothesis" s1 = s2 is accepted). Thus, there
is still a 5% chance that we draw the wrong conclusion. In certain cases more
confidence may be needed, then a 99% confidence table can be used, which can
be found in statistical textbooks.
Table 6-1. CEC values (in cmolc/kg) of a control sample determined by two
analysts.
1 2
10.2 9.7
10.7 9.0
10.5 10.2
9.9 10.3
9.0 10.8
11.2 11.1
11.5 9.4
10.9 9.2
8.9 9.8
10.6 10.2
x̄: 10.34 9.97
s: 0.819 0.644
n: 10 10
Fcal = 1.62   tcal = 1.12
Ftab = 4.03   ttab = 2.10
The determination of the calcium carbonate content with the Scheibler standard
method is compared with the simple and more rapid "acid-neutralization" method
using one and the same sample. The results are given in Table 6-2. Because of the nature of the rapid method we suspect it to produce a lower precision than that obtained with the Scheibler method; we can, therefore, perform the one-sided F-test. The applicable Ftab = 3.07 (App. 2, df1 = 12, df2 = 9), which is lower than Fcal (= 18.3), so the null hypothesis (no difference) is rejected. It can be
concluded (with 95% confidence) that for this one sample the precision of the
rapid titration method is significantly worse than that of the Scheibler method.
Table 6-2. Contents of CaCO3 (in mass/mass %) in a soil sample determined with
the Scheibler method (A) and the rapid titration method (B).
A B
2.5 1.7
2.4 1.9
2.5 2.3
2.6 2.3
2.5 2.8
2.5 2.5
2.4 1.6
2.6 1.9
2.7 2.6
2.4 1.7
- 2.4
- 2.2
- 2.6
x̄: 2.51 2.13
s: 0.099 0.424
n: 10 13
Fcal = 18.3   tcal = 3.12
Ftab = 3.07   ttab* = 2.18
Depending on the nature of two sets of data ( n, s, sampling nature), the means of
the sets can be compared for bias by several variants of the t-test. The following
most common types will be discussed:
- Student's t-test for comparison of one data set with a reference value;
- Student's t-test for comparison of two independent data sets with equal standard deviations;
- the Cochran variant of the t-test for two independent data sets with unequal standard deviations;
- the paired t-test for two non-independent (paired) data sets.
Basically, for the t-tests Equation (6.8) is used but written in a different way:
\[ t = \frac{|\bar{x} - \mu| \sqrt{n}}{s} \tag{6.12} \]

where
x̄ = mean of the test results
μ = reference value ("true" or hypothesized value)
s = standard deviation of the test results
n = number of test results
To compare the mean of a data set with a reference value normally the "two-
sided t-table of critical values" is used (Appendix 1). The applicable number of
degrees of freedom here is:
df = n-1
If a value for t calculated with Equation (6.12) does not exceed the critical value in
the table, the data are taken to belong to the same population: there is no
difference and the "null hypothesis" is accepted (with the applicable probability,
usually 95%).
As with the F-test, when it is expected or suspected that the obtained results are
higher or lower than that of the reference value, the one-sided t-test can be
performed: if tcal > ttab, then the results are significantly higher (or lower) than the
reference value.
When using the t-test for two small sets of data (n1 and/or n2 < 30), a choice of
the type of test must be made depending on the similarity (or non-similarity) of
the standard deviations of the two sets. If the standard deviations are sufficiently
similar they can be "pooled" and the Student t-test can be used. When the
standard deviations are not sufficiently similar an alternative procedure for the t-
test must be followed in which the standard deviations are not pooled. A
convenient alternative is the Cochran variant of the t-test. The criterion for the
choice is the passing or non-passing of the F-test (see 6.4.2), that is, if the
variances do or do not significantly differ. Therefore, for small data sets, the F-test
should precede the t-test.
For dealing with large data sets (n1, n2 ≥ 30) the "normal" t-test is used (see
Section 6.4.3.3 and App. 3).
\[ t = \frac{|\bar{x}_1 - \bar{x}_2|}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \tag{6.13} \]

where
x̄1 = mean of data set 1
x̄2 = mean of data set 2
sp = "pooled" standard deviation of the two sets, calculated with:

\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \tag{6.14} \]

where
s1 = standard deviation of data set 1
s2 = standard deviation of data set 2
n1 = number of data in set 1
n2 = number of data in set 2.
To perform the t-test, the critical ttab has to be found in the table (Appendix 1);
the applicable number of degrees of freedom df is here calculated by:
df = n1 + n2 -2
Example
The two data sets of Table 6-1 can be used. With Equations (6.13) and (6.14), tcal is calculated as 1.12, which is lower than the critical value ttab of 2.10 (App. 1, df = 18, two-sided); hence the null hypothesis (no difference) is accepted
and the two data sets are assumed to belong to the same population: there is no
significant difference between the mean results of the two analysts (with 95%
confidence).
A convenient check is the "least significant difference" (lsd):

\[ \mathrm{lsd} = t \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \tag{6.15} \]

When the measured difference between two means exceeds the lsd, the means differ significantly. In the present example of Table 6-1, the calculation yields lsd = 0.69. The
measured difference between the means is 10.34 -9.97 = 0.37 which is smaller
than the lsd indicating that there is no significant difference between the
performance of the analysts.
In addition, in this approach the 95% confidence limits of the difference between the means can be calculated (cf. Equation 6.8): 0.37 ± 0.69, i.e. the interval −0.32 to +1.06.
Note that the value 0 for the difference is situated within this confidence interval, which agrees with the null hypothesis x̄1 = x̄2 (no difference) having been accepted.
When the F-test has shown that the standard deviations of the two sets differ significantly, calculate t with:

\[ t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{6.16} \]

and compare it with an adjusted critical value:

\[ t_{tab}^* = \frac{t_1 \dfrac{s_1^2}{n_1} + t_2 \dfrac{s_2^2}{n_2}}{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \tag{6.17} \]

where
t1 = ttab at n1 − 1 degrees of freedom
t2 = ttab at n2 − 1 degrees of freedom
Now the t-test can be performed as usual: if tcal < ttab*, then the null hypothesis that
the means do not significantly differ is accepted.
Example
According to the F-test, the standard deviations differ significantly so that the
Cochran variant must be used. Furthermore, in contrast to our expectation that
the precision of the rapid test would be inferior, we have no idea about the bias
and therefore the two-sided test is appropriate. The calculations yield tcal = 3.12
and ttab* = 2.18, meaning that tcal exceeds ttab*, which implies that the null hypothesis
(no difference) is rejected and that the mean of the rapid analysis deviates
significantly from that of the standard analysis (with 95% confidence, and for this
sample only). Further investigation of the rapid method would have to include the
use of more different samples and then comparison with the one-sided t-test
would be justified (see 6.4.3.4, Example 1).
When two data sets are not independent, the paired t-test can be a better tool
for comparison than the "normal" t-test described in the previous sections. This is
for instance the case when two methods are compared by the same analyst using
the same sample(s). It could, in fact, also be applied to the example of Table 6-1
if the two analysts used the same analytical method at (about) the same time.
The null hypothesis is that there is no difference between the data sets, so the
test is to see if the mean of the differences between the data deviates
significantly from zero or not (two-sided test). If it is expected that one set is
systematically higher (or lower) than the other set, then the one-sided test is
appropriate.
Example 1
Table 6-3. CEC values (in cmolc/kg) obtained by the NH4OAc and AgTU methods
(both at pH 7) for ten soils with ferralic properties.
Using Eq. (6.12) on the differences d between the paired results, with μd = 0 under the null hypothesis:

\[ t = \frac{|\bar{d}| \sqrt{n}}{s_d} \]

where
d̄ = mean of the differences between the paired results
sd = standard deviation of the differences
n = number of pairs
The calculated t value (=2.89) exceeds the critical value of 1.83 (App. 1, df = n -1
= 9, one-sided), hence the null hypothesis that the methods do not differ is
rejected and it is concluded that the silver thiourea method gives significantly
higher results as compared with the ammonium acetate method when applied to
such highly weathered soils.
Note. Since such data sets do not have a normal distribution, the
"normal" t-test which compares means of sets cannot be used here
(the means do not constitute a fair representation of the sets). For
the same reason no information about the precision of the two
methods can be obtained, nor can the F-test be applied. For
information about precision, replicate determinations are needed.
Example 2
Table 6-4 shows the data of total-P in four plant tissue samples obtained by a
laboratory L and the median values obtained by 123 laboratories in a
proficiency (round-robin) test.
Table 6-4. Total-P contents (in mmol/kg) of plant tissue as determined by 123
laboratories (Median) and Laboratory L.
Using Eq. (6.12) and noting that μd = 0 (the hypothesis value of the differences, i.e. no
difference), the t value can be calculated as:
Correlation and regression analysis also belong to the most common and useful statistical tools for comparing effects and performances X and Y. Although the technique is in principle the same for
both, there is a fundamental difference in concept: correlation analysis is applied
to independent factors: if X increases, what will Y do (increase, decrease, or
perhaps not change at all)? In regression analysis a unilateral response is
assumed: changes in X result in changes in Y, but changes in Y do not result in
changes in X.
For example, in analytical work, correlation analysis can be used for comparing
methods or laboratories, whereas regression analysis can be used to construct
calibration graphs. In practice, however, comparison of laboratories or methods is
usually also done by regression analysis. The calculations can be performed on a
(programmed) calculator or more conveniently on a PC using a home-made
program. Even more convenient are the regression programs included in
statistical packages such as Statistix, Mathcad, Eureka, Genstat, Statcal, SPSS, and
others. Also, most spreadsheet programs such as Lotus 123, Excel, and Quattro-
Pro have functions for this.
As was discussed in Section 6.4.3, such comparisons can often be done with
the Student/Cochran or paired t-tests. However, correlation analysis is indicated:
The basic equation of the straight regression line is:

\[ y = bx + a \tag{6.18} \]

where
a = intercept of the line with the y-axis
b = slope (regression coefficient)
In laboratory work ideally, when there is perfect positive correlation without bias,
the intercept a = 0 and the slope b = 1. This is the so-called "1:1 line" passing
through the origin (dashed line in Fig. 6-5).
The correlation coefficient r is calculated with:

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \tag{6.19} \]

where
xi = data X
¯x = mean of data X
yi = data Y
¯y = mean of data Y
The line parameters b and a are calculated with the following equations:
\[ b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \tag{6.20} \]

and

\[ a = \bar{y} - b\bar{x} \tag{6.21} \]
Table 6-5 is presented here to give an insight into the steps and terms involved.
The calculation of the correlation coefficient r with Equation (6.19) yields a value
of 0.997 (r2 = 0.995). Such high values are common for calibration graphs. When
the value is not close to 1 (say, below 0.98) this must be taken as a warning and
it might then be advisable to repeat or review the procedure. Errors may have
been made (e.g. in pipetting) or the used range of the graph may not be linear.
On the other hand, a high r may be misleading as it does not necessarily indicate
linearity. Therefore, to verify this, the calibration graph should always be plotted,
either on paper or on computer monitor.
Applying Equations (6.20) and (6.21) to the data yields the line parameters, hence:

\[ y = 0.626x + 0.037 \tag{6.22} \]
Fig. 6-4. Calibration graph plotted from data of Table 6-5. The dashed lines
delineate the 95% confidence area of the graph. Note that the confidence is
highest at the centroid of the graph.
During calculation, the maximum number of decimals is used, rounding off to the
last significant figure is done at the end (see instruction for rounding off in
Section 8.2).
Once the calibration graph is established, its use is simple: for each y value
measured the corresponding concentration x can be determined either by direct
reading or by calculation using Equation (6.22). The use of calibration graphs is
further discussed in Section 7.2.2.
Although regression analysis assumes that one factor (on the x-axis) is constant,
when certain conditions are met the technique can also successfully be applied to
comparing two variables such as laboratories or methods. These conditions are:
- The most precise data set is plotted on the x-axis
- At least 6, but preferably more than 10 different samples are
analyzed
- The samples should rather uniformly cover the analyte level range
of interest.
If the analyte level range is incomplete, one might have to resort to spiking or
standard additions, with the inherent drawback that the original analyte-sample
combination may not adequately be reflected.
Example
Fig. 6-5. Scatter plot of pH data of two laboratories. Drawn line: regression
line; dashed line: 1:1 ideal regression line.
The t-test for significance is as follows: using sa and sb (Eqs. 6.24 and 6.25, below), tcal = |a − 0|/sa and tcal = |b − 1|/sb are compared with ttab to check whether the intercept deviates significantly from 0 and the slope from 1, the values of the ideal "1:1 line".
When results of laboratories or methods are compared where more than one
factor can be of influence and must be distinguished from random effects, then
ANOVA is a powerful statistical tool to be used. Examples of such factors are: different analysts, samples with different pre-treatments, different analyte levels, and different methods within one of the laboratories. Most statistical packages for the PC can perform this analysis.
As a treatise of ANOVA is beyond the scope of the present Guidelines, for further
discussion the reader is referred to statistical textbooks, some of which are given
in the list of Literature.
The scatter of the calibration points about the regression line is expressed by the standard deviation of the y-residuals:

\[ s_y = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} \tag{6.23} \]

where
ŷi = value of y on the regression line at xi (the fitted value)
n = number of calibration points
Note: Only the y-deviations of the points from the line are
considered. It is assumed that deviations in the x-direction are
negligible. This is, of course, only the case if the standards are very
accurately prepared.
Now the standard deviations for the intercept a and slope b can be calculated
with:
\[ s_a = s_y \sqrt{\frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}} \tag{6.24} \]

and

\[ s_b = \frac{s_y}{\sqrt{\sum (x_i - \bar{x})^2}} \tag{6.25} \]
To make this procedure clear, the parameters involved are listed in Table 6-6.
The uncertainty about the regression line is expressed by the confidence limits of
a and b according to Eq. (6.9): a ± t·sa and b ± t·sb
Table 6-6. Parameters for calculating errors due to calibration graph (use also
figures of Table 6-5).
The applicable ttab is 2.78 (App. 1, two-sided, df = n − 2 = 4); hence, using Eq. (6.9):
Note that if sa is large enough, a negative value for a is possible, i.e. a negative
reading for the blank or zero-standard. (For a discussion about the error
in x resulting from a reading in y, which is particularly relevant for reading a
calibration graph, see Section 7.2.3)
The uncertainty about the line is somewhat decreased by using more calibration
points (assuming sy has not increased): one more point reduces ttab from 2.78 to
2.57 (see Appendix 1).