
1.0 Brief Overview of Educational Assessment



Psychologists and educators use the term "construct" to denote hypothetical abstractions of mental processes that are related to behavior or experience (Murphy & Davidshofer, p. 156). For example, "extroversion" is an abstract personality dimension which may be assessed through an individual's behavioral responses to a personality inventory. "Achievement, defined as the extent to which students can demonstrate mastery of a scholastic curriculum, is the most frequently assessed construct in the classroom" (Chatterji, p. 27).

Any test is an assessment instrument which records behavior obtained under standardized conditions, with established rules for scoring (Murphy & Davidshofer, p. 3; Standards, p. 3). (Please note that the terms "test," "exam," and "assessment" will be used interchangeably.) Tests vary in the precision and detail of their scoring, from the exact scoring of multiple-choice tests to the more subjective judgment entailed by short-answer or essay tests. Tests may be used to assess maximal performance, such as aptitude or achievement tests (examinees are asked to "do their best"), or to assess typical performance, such as an attitude survey or personality inventory (respondents are asked to report their typical responses) (Crocker & Algina, p. 4).

Assessment allows us to identify individual differences among people. A norm-referenced test compares an individual's scores on an assessment instrument to the scores of a norm group. Norm groups vary, depending on the purpose of the assessment. For example, the scores of a child on an intelligence test may be compared to those of a group of children of the same age, which would indicate the child's standing relative to that age group. Well-known norm-referenced aptitude tests are the Scholastic Assessment Test and the Graduate Record Examinations. Grading "on a curve" is also a norm-referenced procedure in which the class itself serves as the norm. So, for example, the top 20% may receive an A, the second 20% a B, and so on (Chatterji, p. 85).
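To make the curve-grading procedure concrete, here is a minimal sketch in Python that assigns letter grades by rank within the class, assuming the equal 20% bands of Chatterji's example. The function name and tie-breaking rule are illustrative, not drawn from the cited texts.

def grade_on_a_curve(scores):
    # Norm-referenced grading: each grade depends on a student's rank
    # within the class (the norm group), not on any fixed standard.
    # Bands are the equal 20% slices from Chatterji's example; ties
    # are broken by list position, an arbitrary illustrative choice.
    n = len(scores)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    letters = ["A", "B", "C", "D", "F"]
    grades = [None] * n
    for rank, student in enumerate(ranked):
        grades[student] = letters[min(rank * 5 // n, 4)]
    return grades

# Ten students: exactly two fall into each 20% band.
print(grade_on_a_curve([95, 91, 88, 86, 80, 77, 70, 65, 55, 40]))
# ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'F', 'F']

Note that the same raw score can earn different grades in different classes, because the norm group itself shifts.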

In contrast to norm-referenced tests, criterion-referenced tests compare an individual's score on an assessment instrument to a specific standard, usually related to the degree of content mastered. "The focus is on what test takers can do and what they know, not on how they compare to others" (Anastasi, p. 102). Many educational and licensing tests are criterion-referenced tests which are used to establish knowledge or competency. For example, the typical academic grade scale (90% to 100% = A, 80% to 89% = B, 70% to 79% = C, 60% to 69% = D, <60% = F) establishes standards for achievement with respect to course content.
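The contrast with curve grading is easy to see in code: a criterion-referenced grade depends only on the student's own performance against fixed cut scores. Below is a minimal Python sketch using the grade scale above; the function name is illustrative.

def grade_to_standard(percentage):
    # Criterion-referenced grading: the result is independent of how
    # other students perform; only mastery of the content matters.
    if percentage >= 90:
        return "A"
    if percentage >= 80:
        return "B"
    if percentage >= 70:
        return "C"
    if percentage >= 60:
        return "D"
    return "F"

# A 92% earns an A whether it is the best score in the class or the worst.
print([grade_to_standard(p) for p in (92, 81, 74, 63, 48)])
# ['A', 'B', 'C', 'D', 'F']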

Assessments assist teachers in making two kinds of instructional decisions: Formative decisions are used to shape the instructional design or delivery process, e.g., several of my students need additional training on using the calculator. Summative decisions describe students' mastery of learning objectives, e.g., 70% of my students received an A or a B on the final exam (Chatterji, p. 28).

Educational decisions may also be described as "low-stakes" or "high-stakes." High-stakes decisions determine "who will and who will not gain access to employment, education, and licensure or certification (jointly referred to as credentialing) opportunities" (Sackett, Schmitt, Ellingson, & Kabin, p. 302). Although many classroom assessments are used for low-stakes decision making, the stakes tend to be higher when making summative decisions, in which case the assessments "should be of defensible quality" (Chatterji, p. 29).

Psychometrics is "the science of the assessment of individual differences," often referring to the quantitative aspects of psychological measurement (Shultz & Whitney, p. 425). Two long-standing hallmarks of a test's quality are its validity and reliability. "The validity of a test concerns what the test measures and how well it does so. It tells us what can be inferred from test scores" (Anastasi, p. 139). "Reliability refers to the consistency of scores obtained by the same persons when reexamined . . ." (Anastasi, p. 109). Because an unreliable test cannot be valid, a test's reliability places limits on its validity.¹

¹ Precise speakers and writers would note the following: tests themselves are not valid or invalid; the inferences we make from their use are. Likewise, tests are not reliable or unreliable. Reliability coefficients are specific to a sample of the population, so the best we can say is that the use of a test is likely reliable, particularly across similar populations. It should also be noted that reliability coefficients for criterion-referenced tests tend to be lower than is typical, because scores on a criterion-referenced test show less variability, and restricted variance limits reliability coefficients.
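The limit that reliability places on validity can be stated quantitatively. A standard result of classical test theory (not quoted from the sources above) is that a validity coefficient, the correlation between test scores and a criterion, cannot exceed the square root of the product of the two measures' reliability coefficients. A minimal Python sketch:

import math

def max_validity(test_reliability, criterion_reliability=1.0):
    # Classical test theory bound: the observed test-criterion
    # correlation cannot exceed sqrt(r_xx * r_yy). With a perfectly
    # reliable criterion, the bound reduces to sqrt(r_xx).
    return math.sqrt(test_reliability * criterion_reliability)

# A test with reliability .64 can correlate at most .80 with even a
# perfectly measured criterion.
print(round(max_validity(0.64), 2))  # 0.8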

Tests also differ in the manner in which individuals respond to items. Probably the most common form of testing is with items that call for a written response (even if only indicating an item choice), although assessments may also be conducted by observing behavior, judging a product, conducting an interview, or reviewing a portfolio. Among written assessments, items may call for a structured response, in which there is only one correct answer (e.g., multiple choice, true-false), or an open response, in which the length and content of the response vary (e.g., short answer or essay tests) (Chatterji, pp. 86-89).

In general, assessments consisting of structured-response items allow a large amount of the achievement domain to be covered in a relatively short period of time (improving reliability and validity), may be administered to large groups, can be quickly graded, and consist of objectively correct answers (improving reliability). However, if structured-response items are not well written, e.g., the distractors increase the likelihood of correct guessing, both the validity and reliability may be diminished.
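The link between broad domain coverage and reliability also has a standard quantitative expression: the Spearman-Brown prophecy formula from classical test theory, which predicts how reliability changes when a test is lengthened with comparable items. This is a textbook result rather than a quotation from the sources above; the sketch assumes the added items are parallel to the originals.

def spearman_brown(reliability, length_factor):
    # Spearman-Brown prophecy formula: predicted reliability when a
    # test is lengthened (length_factor > 1) or shortened
    # (length_factor < 1) with comparable items.
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling a test with reliability .70 predicts about .82;
# halving it predicts about .54.
print(round(spearman_brown(0.70, 2.0), 2))  # 0.82
print(round(spearman_brown(0.70, 0.5), 2))  # 0.54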

Although some argue that structured-response items can be written to assess higher-order cognitive skills, they also admit that training and practice in creating such items are required (Anastasi, p. 417). Most agree that open-response items can be created which require higher-level cognitive functioning, providing access to some areas of the achievement domain not accessible by structured-response items (increasing validity). In general, assessments consisting of open-response items allow less of the domain to be tested (reducing validity), require human scorers, and take more time to grade.


Because Chatterji was the course text, a full citation is not provided for it; the references below are non-course sources.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington, DC: AERA, APA, & NCME.

Anastasi, A. (1988). Psychological Testing (6th ed.). New York, NY: Macmillan.

Crocker, L., & Algina, J. (1986). Introduction to Classical and Modern Test Theory. New York,
NY: Harcourt College Publishers.

Murphy, K.R., & Davidshofer, C.O. (1998). Psychological Testing: Principles and Applications (4th
ed.). Upper Saddle River, NJ: Prentice Hall.

Sackett, P.R., Schmitt, N., Ellingson, J.E., & Kabin, M.B. (2001). High stakes testing in
employment, credentialing, and higher education. American Psychologist, 56, 302-318.

Shultz, K.S., & Whitney, D.J. (2005). Measurement Theory in Action: Case Studies and Exercises.
Thousand Oaks, CA: Sage Publications, Inc.
