
Measurement and evaluation

Measurement: the process of obtaining a numerical description of the degree to which an individual possesses a given attribute.
Numbers are assigned through tests and similar instruments.

Test: a tool, procedure, examination, assessment, or measure of an outcome.
Tests are designed to measure any quality, ability, skill, or knowledge.
We test:
Achievement: knowledge
Personality: characteristics
Aptitude: potential to succeed
Ability or intelligence: skill

We test to classify, place, select, and diagnose.

Instructional: assess progress.
Curricular: make decisions about school curricula.
Selection: determine ability.
Placement: group students.
Personal: help individuals make wise decisions for themselves.
Evaluation:
Establishing objectives
Classifying objectives
Defining objectives
Selecting indicators
Comparing data with objectives
Making judgements on worth: how good…?

Evaluation is analyzing information to determine the extent of students' achievement of objectives.

Assessment principles:
Address learning targets
Provide efficient feedback.
Use a variety of assessment procedures.
Ensure that assessments are valid.
Keep records of assessments.
Address the results meaningfully.

Measurement: a quantitative determination of the level of an individual's performance.
Measurement describes a situation; evaluation judges its worth or value.

Assessment:
Gathering information
Pinpoints strengths and weaknesses
Diagnostic and formative
Focus on individual students.

Evaluation:
Setting a value on assessment information
Judgment
Ranks and sorts.
Summative
Focus on group.

Formative test: monitors the attainment of instructional objectives.
Summative test: measures the extent to which a student has attained the desired outcome.
Standardized test: valid, reliable, and objective.
Norm-referenced test: based on the standard level of accomplishment of the whole group taking the test.
Criterion-referenced test: a measuring device with a predetermined level of success for test takers.

Levels of measurement:
Variable: takes on more than one value and can be measured in different ways; the way it is measured determines the level of measurement being used.
Measurement: the assignment of labels to a variable or an outcome. The level represents how much information is being provided by the outcome measure.

Nominal: differences in quality; discrete in nature// example: hair color, nationality, names.
Ordinal: can be ordered or ranked// example: level of education.
Interval: assigns values to outcomes based on a continuum of equal intervals// example: the difference between 100 degrees and 90 degrees is the same as between 60 degrees and 70 degrees.
Ratio: true zero point// example: income, height, weight,
unemployment rate.

Data: pieces of information that you collect and examine about your topic.
Variable: an element that is liable to change.
Statistics: describing and analyzing quantitative data// example: mean.

Measures of central tendency: indices that represent the average score among a group of scores.
Mean, median, mode.
Mean: X̄ = ∑X / n
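A minimal sketch, not from the notes, showing all three measures on a hypothetical score list using Python's built-in statistics module:

```python
# Minimal sketch: mean, median, and mode on made-up test scores.
import statistics

scores = [70, 75, 75, 80, 85, 90, 95]

mean = statistics.mean(scores)      # X-bar = sum of scores / n
median = statistics.median(scores)  # middle score when ordered
mode = statistics.mode(scores)      # most frequent score

print(mean, median, mode)  # 81.43..., 80, 75
```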

Frequency: the number of times something occurs// example: how many times will the coin land tails side up?
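A quick hypothetical illustration of frequency as a count of occurrences, simulating the coin example:

```python
# Minimal sketch: count the tails in 100 simulated coin flips.
import random

flips = [random.choice(["heads", "tails"]) for _ in range(100)]
tails = flips.count("tails")
print(f"The coin landed tails side up {tails} times out of 100")
```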

3 common measures of variability:


Range: difference between highest and lowest score.
Quartile deviation.
Upper quartile: top 25%
Lower quartile: lowest 25%
QD= (Q3-Q1)/2

Variance: the amount of spread amongst scores.


Standard deviation: the square root of variance. Used with interval and ratio data (see the sketch below).
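A minimal sketch, not from the notes, computing the three variability measures on hypothetical scores (treated as a full population, hence pvariance/pstdev):

```python
# Minimal sketch: range, quartile deviation, variance, and SD.
import statistics

scores = [55, 60, 65, 70, 75, 80, 85, 90]

score_range = max(scores) - min(scores)          # highest minus lowest

q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
qd = (q3 - q1) / 2                               # QD = (Q3 - Q1) / 2

variance = statistics.pvariance(scores)          # spread among scores
sd = statistics.pstdev(scores)                   # square root of variance

print(score_range, qd, variance, sd)
```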
Distribution: when a distribution is not normal, it is said to be skewed.
Skewed: not symmetrical; skew can be positive or negative.
Measures of relative position: where a score falls in a distribution relative to all other scores.
How well an individual has scored in comparison to others.
Measures: percentile ranks and standard scores.

The z-score of the mean = 0.
A score 1 standard deviation above the mean has a z-score of 1.
z = (x − μ) / σ, where x is the raw score, μ is the mean, and σ is the standard deviation.

t-score: T = 10z + 50 (multiply the z-score by 10 and add 50).
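A small sketch, with made-up values, converting a raw score to a z-score and then a t-score using the two formulas above:

```python
# Minimal sketch: raw score -> z-score -> t-score.
import statistics

scores = [60, 70, 75, 80, 85, 90, 100]
mean = statistics.mean(scores)   # 80.0
sd = statistics.pstdev(scores)   # ~12.25

x = 90                           # raw score of interest
z = (x - mean) / sd              # SDs above/below the mean (~0.82)
t = 10 * z + 50                  # rescaled: mean 50, SD 10 (~58.2)

print(f"z = {z:.2f}, T = {t:.1f}")
```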

Types of reliability:
Consistency of scores
Consistency among raters
Consistency across time

Theory of reliability:
X = T + E
X is the observed score// what you got on the test.
T is the true score// an accurate reflection of what you really know.
E is measurement error// the day-to-day difference between the true score and the observed score.

Observed score = true score + error.

As error increases, reliability decreases; as error decreases, reliability increases.

Sources of error:
Trait error: did not study.
Method error: lousy instruction, hot room.
Administration errors: inaccurate timing.
Scoring errors: subjective scoring, clerical errors.

Reliability is calculated using a correlation coefficient (rxy).
It ranges between .00 and 1.0; higher = more reliable.
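A rough simulation, not from the notes, tying the two ideas together: observed scores are generated as X = T + E, and reliability is estimated as the correlation rxy between two administrations of the same test. All parameters are invented; statistics.correlation needs Python 3.10+.

```python
# Rough sketch: more error -> lower reliability (correlation).
import random
import statistics

random.seed(1)
true_scores = [random.gauss(75, 10) for _ in range(500)]

for error_sd in (2, 10, 20):
    # each administration adds fresh random error to the true score
    form_a = [t + random.gauss(0, error_sd) for t in true_scores]
    form_b = [t + random.gauss(0, error_sd) for t in true_scores]
    r_xy = statistics.correlation(form_a, form_b)
    print(f"error SD = {error_sd:2d} -> reliability ~ {r_xy:.2f}")
```

As the error term grows, the correlation between the two score sets falls, matching the rule above.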

Types of reliability:

Test-retest: tests whether an exam is reliable over time.
Problems: practice effects, time between tests, and the nature of the sample.

Parallel forms: examines the similarity of two different forms of the same test.

Internal consistency: determines whether items consistently represent one construct.

Interrater: how much two raters agree in their judgements of some outcome.
Interrater reliability = #agreements / #possible agreements
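A minimal sketch of the agreement formula above, with hypothetical rater judgements:

```python
# Minimal sketch: interrater reliability = agreements / possible.
rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]

agreements = sum(a == b for a, b in zip(rater_1, rater_2))
possible = len(rater_1)

print(agreements / possible)  # 5 / 6 ~ 0.83
```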

To increase reliability:
Increase standardization of tests
Increase number of items
Delete unclear items.
Moderate difficulty
Minimize effect of external events.

Validity: the tool measures what it claims to measure.


Threats:

Construct underrepresentation: when a test fails to measure an important aspect of the specified construct.
Construct-irrelevant variance: when a test measures characteristics, content, or skills that are unrelated to the test construct.

Types of validity evidence:


Content
Criterion
Construct
Consequential

Content validity: the characteristic of a test made up of items that fairly represent all the items that could be on the test.
Tests with well-established content validity have a list or table detailing the material elements of the construct.

Criterion validity: the characteristic of a test that produces scores correlated with some other measure.
Criterion validity is important to establish for tests that predict future performance or estimate concurrent performance on some other test.
There are two kinds: predictive criterion validity and concurrent criterion validity.
Predictive validity: assesses whether a test reflects a set of abilities in the future// example: entrance exams.

Concurrent validity: assesses whether a test reflects a set of abilities at the current moment// example: licensing exams.
Construct validity: characteristic of a test with scores that reflect the
construct (invisible trait) a test is intended to measure.
Cognitive abilities, intelligence, addiction.

Consequential validity: concerned with unintended social consequences from the use of a test.
The aim is for test use to help society and the people being tested.

Biased test: unfair towards a certain group.

If you can't establish validity:
Redo the questions.
Revisit underdeveloped models.

Validity is more important than reliability.


Steps to develop a classroom test:

Identification and statement of educational objectives is the first step.
Educational objectives: goals describing what you hope the student will learn.
Also referred to as instructional or learning objectives.

Bloom’s taxonomy: knowledge, comprehension, application, analysis,


synthesis, evaluation.
To show compatibility between class instruction and test content, use a table of specifications (test blueprint).
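A minimal sketch of what such a blueprint might look like, crossing content areas with Bloom's taxonomy levels. All topics and item counts are invented:

```python
# Minimal sketch: a test blueprint as content areas x Bloom levels,
# where each cell is the planned number of items.
blueprint = {
    "Measurement": {"knowledge": 4, "comprehension": 3, "application": 2},
    "Reliability": {"knowledge": 3, "comprehension": 4, "application": 3},
    "Validity":    {"knowledge": 3, "comprehension": 3, "application": 5},
}

total_items = sum(sum(row.values()) for row in blueprint.values())
print(f"Planned test length: {total_items} items")  # 30 items
```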

Norm-referenced assessment: compares a student's performance to that of other students.
Criterion-referenced assessment: compares students' performance to an absolute standard or criterion.

Selected response: items require a student to select a response from available alternatives.
Strengths:
Can include many items, which facilitates adequate sampling of the content.
Scored in an efficient, objective, and reliable manner.
Good at measuring lower-level objectives.
Weaknesses:
Difficult to write.
Not able to assess all educational objectives.
Subject to random guessing.

Constructed response: items require the student to construct a response.
Strengths:
Easier to write.
Assess higher order cognitive abilities.
Eliminate random guessing.

Weaknesses:
Can't include many items in a test, so the content domain is not sampled as thoroughly.
More difficult to score in a reliable manner.
Sensitive to feigning.

Suggestions for assembling an assessment:


Adhere to your table of specifications.
Provide clear instructions.
State items clearly.
Include items that contribute to the reliability and validity of your
assessment results.
Multiple-choice items:
The most popular format.
Objective items.
The preferred way of testing achievement-oriented outcomes.
Stem: the question or incomplete statement.

Possible answers: alternatives.
Incorrect alternatives: distractors.

Benefits:
Easy to score.
Easy to analyse.
Flexible.
Easy to create items that match learning objectives.
Can be written at any level of Bloom's taxonomy.

Distractors should be plausible.

No intentional clues should be given.


No inconsistent lengths.
No inconsistent categories.
• Best-answer multiple-choice items: There may be more than one
correct answer, but only one of them is the best.
• Rearrangement multiple-choice items: Here is where the test
taker arranges a set of items in sequential order.
• Interpretive multiple-choice items: Test taker reads through a
passage and then selects a response where the alternatives all are
based on the same passage.
• Substitution multiple-choice items: the test taker selects, from a set of responses, those he or she thinks answer the question correctly.

Strengths:
Versatile
Can be scored in a reliable manner.
Easy to refine using results of item analysis.
Efficient way of sampling content domain.
Weaknesses:
Not effective for measuring all educational objectives.
Not easy to write.
Limit creativity.

Matching items:
Assess a particular topic.
Easy to administer.
Easy to score.
An acceptable tool for assessment.

Good when there are lots of possible answers without repetition.
Matching involves selection; use it when you would otherwise need more than five alternatives for a multiple-choice item or more than two alternatives for a true/false item.

Premises: the statements in one column.
Options: the responses.
Should be reasonable.
List responses in different order than premises.
Place premises in logical order.
Make sure they are on the same page.

Pros:
Easy to score.
Scored in a reliable manner.
Easy to administer to large numbers.
Responses are short and easy.
Allow comparison of ideas.
Cons:
Limited knowledge testing.
Scoring can be a problem.
Good memory is needed.

True/false items:
Used to assess achievement when there is a clear distinction between two alternatives.
Also called binary-choice items.

Use declarative sentences.
Make the choice clear and binary.
Focus on one specific topic.
Avoid statements of opinion.
Give no clues.

Pros:
Reliable and objective scoring.
Efficient.

Cons:
Vulnerable to guessing.
Subject to response sets.
Not easy to write.

The probability of guessing correctly on a true/false item is 50%.
Limited to knowledge-based items.

Correction-for-guessing formula:
CS = R − W
Corrected score = number correct − number incorrect.
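A minimal sketch of this true/false correction, where a wrong answer cancels out a lucky guess. The numbers are made up:

```python
# Minimal sketch: corrected score CS = R - W for binary items.
def corrected_score(right: int, wrong: int) -> int:
    return right - wrong

# 40 right, 10 wrong on a 50-item true/false test:
print(corrected_score(40, 10))  # 30
```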

Constructed response items: essays or short answers.
Known as supply items: the student supplies rather than selects an answer.
Used to assess lower-level thinking skills.
Focus on a certain level of material.

Pros:
Flexible
Minimized guessing
Easy to write.
Cons:
No machine scoring.
Subjective.
Limited cognitive skills assessed.
Questions hard to create.

Essay items:
Higher level thinking.
Informative responses.
There are open-ended and close-ended essay items.
Open-ended: unrestricted.
Close-ended: restricted.

Assess different levels of complexity.


Make sure the question is complete and clear.
Evaluate higher order outcomes.
Have all test takers answer the same questions.
Allow adequate time to answer.
Pros:
Shows how to relate ideas to each other.
Increases security.
Flexibility.
Easy to construct.

Cons:
Emphasize writing.
Tough to write.
Not easy to score.

Use a model correct answer for comparison when scoring.
Grade responses without knowing the identity of the writer.
Take your time.

Developing a rubric:
A systematic scoring guideline to evaluate students’ performance
(papers, speeches, problem solutions, portfolios, cases) using a detailed
description of performance levels.
Gives consistent score.
Makes us more aware of expectations.
Components:
Task description
Criteria
Level of attainment

Rubric:
A flexible tool to measure students' learning related to a specific objective of a task.
Provides reliable, consistent grading.

For teachers:
Provides students with detailed feedback.
Encourages critical thinking.
Helps refine teaching skills.

For students:
Helps them monitor and critique their own work.
Provides informative descriptions of expected performance.

A good rubric is:


Well defined.
Context specific.
Finite and exhaustive.
Ordered.
Related to a common core theme.

Descriptive rubric:
Allows scoring of a task on several different aspects of the task.
Pros: provides judgment on each criterion.
Cons: time-consuming to make.

Holistic rubric:
Single scale with all criteria included in the evaluation being considered
together.
Pros: saves time in scoring.
Cons: no specific feedback.

Use rubrics on:


Projects
Presentations
Portfolios
(performance based)
Sample work should be scored.
More than one evaluator should score papers; if two disagree, a third decides (see the sketch below).
Frequent disagreements mean that the rubric needs to be adjusted.
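A rough sketch of this adjudication rule, with hypothetical rubric scores (1-5):

```python
# Rough sketch: two raters score each paper; when they disagree,
# a third rater's score decides.
papers = [
    {"rater_1": 4, "rater_2": 4, "rater_3": 3},
    {"rater_1": 2, "rater_2": 3, "rater_3": 3},
    {"rater_1": 5, "rater_2": 5, "rater_3": 4},
]

disagreements = 0
for i, p in enumerate(papers, start=1):
    if p["rater_1"] == p["rater_2"]:
        final = p["rater_1"]        # the two raters agree
    else:
        final = p["rater_3"]        # the third rater decides
        disagreements += 1
    print(f"Paper {i}: final score {final}")

# a high rate would signal that the rubric needs adjusting
print(f"Disagreement rate: {disagreements}/{len(papers)}")
```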

Using a rubric with students:
Explain what the test will emphasize.
Inform students how the assessment will be recorded.
Explain how the results will be used.
Make sure the rubric is understandable.
This works best with holistic rubrics.
Provide the rubric in advance.

Differentiated instruction:
Proactive acceptance of and planning for student differences in readiness, motivation, and learning profiles.
Making adjustments throughout the teaching and learning cycle.

Why do we need it?
Autism rates have risen.
One in five children experiences behavioural or emotional difficulty.
Some students live in poverty.
Some students are gifted.

To differentiate instruction, students should have access to a high-quality curriculum and documented assessment.

A classroom is a system of five interdependent elements:
1. Classroom environment
2. Curriculum
3. Assessment
4. Instruction
5. Classroom management

Kinds of assessment:
Formative: adjusting the course of instruction to improve outcomes.
Summative: measuring and evaluating student outcomes.

When to assess for effective differentiation?


Pre-assessment
Formative assessment
Summative assessment.

• Choice is key to the process.
• Learning tasks always consider the students' strengths and weaknesses.
• Groupings of students will vary: some will work better independently, and others will work in various group settings.
• Multiple intelligences are taken into consideration, as are the students' learning and thinking styles. Lessons are authentic to ensure that all students can make connections.
• Project- and problem-based learning are also key in differentiated instruction and assessment.
• Lessons and assessments are adapted to meet the needs of all learners.
• Opportunities for children to think for themselves are evident.

Examples of Differentiated assessment:


Quizzes
Debates
Journals
Peer-evaluations.

Approaches:
Find ways to get to know students better.
Build small-group teaching into daily or weekly routines.
Offer more ways to explore and express learning.
Teach in multiple ways.
Allow students to work alone or with peers.

One lesson plan a month is enough to assess differentiation.
