Module 1 - PSYCH 3140 Lab
Overview
I. Objectives
At the end of this lesson, you should be able to:
1. Know the definition of a test and its features.
2. Understand the difference between psychological assessment and psychological
testing.
3. Know the major landmarks in the history of psychological testing.
4. Know the different types of tests.
5. Understand basic concepts of standardized and non-standardized testing and
other assessment techniques including norm-referenced and criterion-
referenced assessment, environmental assessment, performance assessment,
individual and group test and inventory methods, psychological testing, and
behavioral observations.
6. Understand the varied purposes of psychological testing in addition to the
various settings in which tests are employed.
7. Apply technical concepts and the basic principles and tools of measurement to
psychological processes.
A psychological test is a standardized procedure for sampling behavior and
describing it with categories or scores. Most tests share the following defining
features:
• Standardized procedure
• Behavior sample
• Scores or categories
• Norms or standards
• Prediction of non-test behavior
A psychological test is also a limited sample of behavior. Neither the subject nor
the examiner has sufficient time for truly comprehensive testing, even when the test is
targeted to a well-defined and finite behavior domain. Thus, practical constraints dictate
that a test is only a sample of behavior. Yet, the sample of behavior is of interest only
insofar as it permits the examiner to make inferences about the total domain of relevant
behaviors.
Another important distinction is between testing and assessment, which are often
considered equivalent. However, they do not mean exactly the same thing. Assessment
is a more comprehensive term, referring to the entire process of compiling information
about a person and using it to make inferences about characteristics and to predict
behavior. Assessment can be defined as appraising or estimating the magnitude of one or
more attributes in a person. The assessment of human characteristics involves
observations, interviews, checklists, inventories, projectives, and other psychological tests.
In sum, tests represent only one source of information used in the assessment process.
In assessment, the examiner must compare and combine data from different sources. This
is an inherently subjective process that requires the examiner to sort out conflicting
information and make predictions based on a complex gestalt of data.
The differences between assessment and testing can be summarized along several
dimensions:

Role of Evaluator
  Assessment: The assessor is key to the process of selecting tests and/or other
  tools of evaluation, as well as drawing conclusions from the entire evaluation.
  Testing: The tester is not key to the process; practically speaking, one tester
  may be substituted for another without appreciably affecting the evaluation.

Skill of the Evaluator
  Assessment: Typically requires an educated selection of tools of evaluation,
  skill in evaluation, and thoughtful organization and integration of data.
  Testing: Typically requires technician-like skills in terms of administering and
  scoring a test as well as in interpreting a test result.

Outcome
  Assessment: Entails a logical problem-solving approach that brings many sources
  of data to bear on a referral question.
  Testing: Typically yields a test score or series of test scores.

Complexity
  Assessment: More complex; involves various procedures and dimensions.
  Testing: Simple; involves one uniform procedure, frequently one dimension.

Duration
  Assessment: Longer.
  Testing: Shorter.

Sources of Data
  Assessment: Several.
  Testing: One.

Focus
  Assessment: The uniqueness of the group, individual, or situation.
  Testing: How one person or group compares with others.

Qualifications
  Assessment: Knowledge of methods and the field of assessment.
  Testing: Knowledge of tests and testing.

Procedure
  Assessment: Subjective; clinical judgment is critical.
  Testing: Objective; quantification is critical.

Cost
  Assessment: High.
  Testing: Low.

Purpose
  Assessment: Arriving at a decision concerning the referral question or problem.
  Testing: Obtaining data to make decisions.

Structure
  Assessment: Entails both structured and unstructured aspects.
  Testing: Highly structured.
Major Landmarks in the History of Psychological Testing
1917 Robert Woodworth develops the Personal Data Sheet, the first personality
test.
1920 The Rorschach inkblot test is published.
1921 Psychological Corporation—the first major test publisher—is founded by
Cattell, Thorndike, and Woodworth.
1926 Florence Goodenough publishes the Draw-A-Man Test.
1926 The first Scholastic Aptitude Test is published by the College Entrance
Examination Board.
1927 The first edition of the Strong Vocational Interest Blank is published.
1935 The Thematic Apperception Test is released by Morgan and Murray at
Harvard University.
1936 Lindquist and others publish the precursor to the Iowa Tests of Basic Skills.
1936 Edgar Doll publishes the Vineland Social Maturity Scale for assessment of
adaptive behavior in those with mental retardation.
1938 L. L. Thurstone proposes that intelligence consists of about seven group
factors known as primary mental abilities.
1938 Raven publishes the Raven’s Progressive Matrices, a nonverbal reasoning
test intended to measure Spearman’s g factor.
1938 Lauretta Bender publishes the Bender Visual Motor Gestalt Test, a
design-copying test of visual-motor integration.
1938 Oscar Buros publishes the first Mental Measurements Yearbook.
1938 Arnold Gesell releases his scale of infant development.
1939 The Wechsler-Bellevue Intelligence Scale is published; revisions are
published in 1955 (WAIS), 1981 (WAIS-R), 1997 (WAIS-III), and 2008
(WAIS-IV).
1939 Taylor–Russell tables are published for determining the expected proportion
of successful applicants with a test.
1939 The Kuder Preference Record, a forced-choice interest inventory, is
published.
1942 The Minnesota Multiphasic Personality Inventory (MMPI) is published.
1948 Office of Strategic Services (OSS) uses situational techniques for selection
of officers.
1949 The Wechsler Intelligence Scale for Children is published; revisions are
published in 1974 (WISC-R), 1991 (WISC-III), and 2003 (WISC-IV).
1950 The Rotter Incomplete Sentences Blank is published.
1951 Lee Cronbach introduces coefficient alpha as an index of reliability (internal
consistency) for tests and scales.
1952 American Psychiatric Association publishes the Diagnostic and Statistical
Manual (DSM-I).
1953 Stephenson develops the Q-technique for studying the self-concept and
other variables.
1954 Paul Meehl publishes Clinical vs. Statistical Prediction.
1956 The Halstead-Reitan Test Battery begins to emerge as the premier test
battery in neuropsychology.
1957 C. E. Osgood describes the semantic differential.
1958 Lawrence Kohlberg publishes the first version of his Moral Judgment Scale;
research with it expands until the mid-1980s.
1959 Campbell and Fiske publish a test validation approach known as the
Multitrait-multimethod matrix.
1963 Raymond Cattell proposes the theory of fluid and crystallized intelligences.
1967 In Hobson v. Hansen the court rules against the use of group ability tests to
“track” students on the grounds that such tests discriminate against
minority children.
1968 American Psychiatric Association publishes DSM-II.
1969 Nancy Bayley publishes the Bayley Scales of Infant Development (BSID).
The revised version (BSID-2) is published in 1993.
1969 Arthur Jensen proposes the genetic hypothesis of African American versus
white IQ differences in the Harvard Educational Review.
1971 In Griggs v. Duke Power the Supreme Court rules that employment test
results must have a demonstrable link to job performance.
1971 George Vaillant popularizes a hierarchy of 18 ego adaptive mechanisms and
describes a methodology for their assessment.
1972 The Model Penal Code rule for legal insanity is published and widely
adopted in the United States.
1974 Rudolf Moos begins publication of the Social Climate Scales to assess
different environments.
1974 Friedman and Rosenman popularize the Type A coronary-prone behavior
pattern; their assessment is interview-based.
1975 The U.S. Congress passes Public Law 94-142, the Education for All
Handicapped Children Act.
1978 Jane Mercer publishes SOMPA (System of Multicultural Pluralistic
Assessment), a test battery designed to reduce cultural discrimination.
1978 In the Uniform Guidelines on Employee Selection adverse impact is defined
by the four-fifths rule; also guidelines for employee selection studies are
published.
1979 In Larry P. v. Riles the court rules that standardized IQ tests are culturally
biased against low-functioning black children.
1980 In Parents in Action on Special Education v. Hannon the court rules that
standardized IQ tests are not racially or culturally biased.
1985 The American Psychological Association and other groups jointly publish the
influential Standards for Educational and Psychological Testing.
1985 Sparrow and others publish the Vineland Adaptive Behavior Scales, a
revision of the pathbreaking 1936 Vineland Social Maturity Scale.
1987 American Psychiatric Association publishes DSM-III-R.
1989 The Lake Wobegon Effect is noted: Virtually all states of the union claim
that their achievement levels are above average.
1989 The Minnesota Multiphasic Personality Inventory-2 is published.
1992 American Psychological Association publishes a revised Ethical Principles of
Psychologists and Code of Conduct (American Psychologist, December
1992).
1994 American Psychiatric Association publishes DSM-IV.
1994 Herrnstein and Murray revive the race and IQ heritability debate in The Bell
Curve.
1999 APA and other groups publish revised Standards for Educational and
Psychological Testing.
2003 New revision of APA Ethical Principles of Psychologists and Code of Conduct
goes into effect.
Types of Tests
Tests can be broadly grouped into two camps: group tests versus individual
tests. Group tests are largely pencil-and-paper measures suitable to the testing of large
groups of persons at the same time. Individual tests are instruments that by their design
and purpose must be administered one on one. An important advantage of individual tests
is that the examiner can gauge the level of motivation of the subject and assess the
relevance of other factors (e.g., impulsiveness or anxiety) on the test results.
Tests may also be classified according to the following criteria:
A. Form
Paper and Pencil test
Performance test
B. Time Element
Speed tests
Power tests
Tests without time limits
C. Responses
Verbal responses
Non-verbal responses
D. Scoring Procedure
Objectively scored test
Subjectively scored test
E. Standardization
Standardized test
Non-standardized test
F. Levels
Level A
Level B
Level C
Uses of Tests
By far the most common use of psychological tests is to make decisions about
persons. For example, educational institutions frequently use tests to determine placement
levels for students, and universities ascertain who should be admitted, in part, on the basis
of test scores. State, federal, and local civil service systems also rely heavily on tests for
purposes of personnel selection.
Even the individual practitioner uses tests primarily for decision making.
Examples include the consulting psychologist who uses a personality test to determine
that a police department should hire one candidate and not another, and the
neuropsychologist who employs tests to conclude that a client has suffered brain damage.
But simple decision making is not the only function of psychological testing. It is
convenient to distinguish five uses of tests:
• Classification
• Diagnosis and treatment planning
• Self-knowledge
• Program evaluation
• Research
Education Setting
• School readiness and school admission
• Classroom selection or classification of students with reference to their ability to
profit from different types of school instruction.
• Identification of exceptionality
• Diagnosis of academic failures and learning disabilities
• Educational planning and career counseling
• Evaluation of student competencies
• Evaluation of teacher competencies
• Evaluation of instructional programs
Business/Industrial Setting
Psychological tests are used in conjunction with other methods of obtaining information
about individuals, e.g., biographical data, application forms, interviews, work samples,
and employment records.
• Selection of new employees: Hiring, Classification, and Job Assignment
• Evaluation of current employees: Job Transfer, Training, Promotion, Termination
• Evaluation of programs and/or products
• Assessment of consumer behavior
Counseling or Clinical Setting
The use of tests in counseling has broadened from educational/vocational planning to
involvement in all aspects of the person’s life. Tests are used to enhance self-
understanding and personal development.
• Identification of intellectual deficiencies
• Psychodiagnosis/differential diagnosis of psychopathology
• Clinical assessment of emotional/behavioral disorders
• Marital and family assessment
• Assessment in Health and Legal Context
Characteristics of a Good Test
Characteristic 1. Reliability:
The dictionary meaning of reliability is consistency, dependability, or trust. In
measurement, reliability is the consistency with which a test yields the same result in
measuring whatever it does measure. A test score is called reliable when we have reason
to believe the score is stable and trustworthy. Stability and trustworthiness depend upon
the degree to which the score is free from chance error.
Therefore, reliability can be defined as the degree of consistency between two
measurements of the same thing. For example, suppose we administered an achievement
test to Group A and found a mean score of 55. Three days later we administered the same
test to Group A and again found a mean score of 55. This indicates that the measuring
instrument (the achievement test) is providing a stable and dependable result. On the
other hand, if the second measurement yielded a mean score of around 77, we would say
that the test scores are not consistent.
In the words of Gronlund and Linn (1995), “reliability refers to the consistency of
measurement—that is, how consistent test scores or other evaluation results are from one
measurement to another.” C.V. Good (1973) defined reliability as the “worthiness with
which a measuring device measures something; the degree to which a test or other
instrument of evaluation measures consistently whatever it does in fact measure.”
According to Ebel and Frisbie (1991), “the term reliability means the consistency with which
a set of test scores measure whatever they do measure.” Theoretically, reliability is defined
as the ratio of true-score variance to observed-score variance. According to Davis (1980),
“the degree of relative precision of measurement of a set of test scores is defined as
reliability.”
Thus, reliability answers the following questions (Gronlund and Linn, 1995):
• How similar are the test scores if the test is administered twice?
• How similar are the test scores if two equivalent forms of the test are administered?
• To what extent do the scores of an essay test differ when it is scored by different
teachers?
It is not always possible to obtain perfectly consistent results, because several
factors (such as physical health, memory, guessing, fatigue, and forgetting) may affect the
results from one measurement to another. These extraneous variables may introduce some
error into our test scores. This error is called measurement error. So, while determining
the reliability of a test, we must take into consideration the amount of error present in the
measurement.
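The first of these questions (test-retest consistency) can be estimated numerically. The
sketch below, using hypothetical scores for five students tested twice, estimates
reliability as the Pearson correlation between the two administrations; values near 1.0
mean that pupils keep nearly the same relative standing across administrations.

```python
# A minimal sketch of test-retest reliability, using hypothetical scores.
# Reliability is estimated here as the Pearson correlation between two
# administrations of the same test to the same group.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for five students tested twice, three days apart
first = [55, 60, 48, 70, 52]
second = [56, 59, 50, 69, 51]
print(round(pearson_r(first, second), 3))  # prints 0.99
```

Note that the Pearson correlation reflects only the consistency of students’ relative
standing; it does not by itself detect a uniform shift in scores between administrations.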
Characteristic 2. Validity:
Validity of an evaluation device is the degree to which it measures what it is
intended to measure. Validity is always concerned with the specific use of the results
and the soundness of our proposed interpretation.
A test that is reliable is not necessarily valid. For example, suppose a clock is
set forward ten minutes. If the clock is a good timepiece, the time it tells us will be
reliable, because it gives a consistent result. But it will not be valid as judged by
standard time. This illustrates the principle that reliability is a necessary but not a
sufficient condition for validity.
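The clock example can be made concrete with a few lines of arithmetic. In this sketch
(the times are hypothetical), repeated readings of the fast clock agree with each other
perfectly, yet every reading misses the standard by the same constant amount:

```python
# A small illustration of the clock example: a clock set forward ten
# minutes is reliable (perfectly consistent) but not valid (biased).
standard_time = [100, 200, 300, 400]              # "standard time" in minutes (hypothetical)
clock_reading = [t + 10 for t in standard_time]   # the fast clock's readings

# Reliability: a second set of readings agrees with the first exactly.
second_reading = [t + 10 for t in standard_time]
assert clock_reading == second_reading            # consistent, hence reliable

# Validity: every reading misses the standard by a constant ten minutes.
errors = [c - t for c, t in zip(clock_reading, standard_time)]
print(errors)  # prints [10, 10, 10, 10]
```

The constant error never shows up in a consistency check, which is exactly why
reliability alone cannot guarantee validity.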
Characteristic 3. Objectivity:
Gronlund and Linn (1995) state that objectivity of a test refers to the degree to
which equally competent scorers obtain the same results. So, a test is considered
objective when it eliminates the scorer’s personal opinion and biased judgment. In this
context there are two aspects of objectivity which should be kept in mind while
constructing a test:
• Objectivity in scoring.
• Objectivity in interpretation of test items by the test taker.
Objectivity of Scoring:
Objectivity of scoring means that the same person or different persons scoring the
test at any time arrive at the same result without any chance error. To be objective, a
test must be worded so that only one correct answer can be given to each item. In other
words, the personal judgment of the individual who scores the answer script should not be
a factor affecting the test scores. The result of an objectively scored test can then be
obtained in a simple and precise manner. The scoring procedure should leave no doubt as
to whether an item is right or wrong, or partly right or partly wrong.
Objectivity of Test Items:
By item objectivity we mean that each item must call for a definite, single answer.
Well-constructed test items should lend themselves to one and only one interpretation by
students who know the material involved. This means the test items should be free from
ambiguity: a given test item should mean the same thing to all students, namely what the
test maker intends to ask. Items with dual meanings or with more than one correct answer
should not be included in the test, as they make the test subjective.
Characteristic 4. Usability:
A test must also be practical to use. While constructing or selecting a test, the
following practical aspects must be taken into account:
1. Ease of Administration:
The test should be easy to administer so that general classroom teachers can use it.
Therefore, simple and clear directions should be given, and the test should possess very
few subtests. The timing of the test should not be difficult to manage.
An appropriate time limit for taking the test should be provided. If, in order to
provide ample time, we make the test shorter, then the reliability of the test will be
reduced. Gronlund and Linn (1995) are of the opinion that “somewhere between 20 and 60
minutes of testing time for each individual score yielded by a published test is probably
a fairly good guide.”
Another important aspect is the interpretation of test scores and the application of
test results. If the results are misinterpreted, they can be harmful; if they are not
applied at all, they are useless.
Equivalent forms of a test help to verify questionable test scores. They also help to
eliminate the factor of memory when retesting pupils on the same domain of learning.
Therefore, equivalent forms of the same test, matched in content, level of difficulty, and
other characteristics, should be available.
5. Cost of Testing:
A test should be economical from the point of view of preparation, administration, and scoring.
References:
Davis, C. (1980). Perkins-Binet Tests of Intelligence for the blind. Watertown, MA:
Perkins School for the Blind.
Gronlund, N. E., & Linn, R. L. (1995). Measurement and assessment in teaching. New
York: Macmillan.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and
equating. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York:
American Council on Education/Macmillan.