
Psychological Assessment Transcripts

RGO Review Center | Board Licensure Examination for Psychometrician 2023

PSYCHOLOGICAL TESTING AND ASSESSMENT

-Alfred Binet published a test designed to help place Paris schoolchildren in appropriate classes

-World War I: the military needed a way to screen large numbers of recruits quickly for intellectual and emotional problems

-World War II: the military would depend even more on psychological tests to screen recruits for service

-tests came to measure not only intelligence but also personality, brain functioning, performance at work, and many other aspects of psychological and social functioning

-by World War II: distinction between testing and a more inclusive term, “assessment”

-psychological assessment: gathering and integration of psychology-related data for the purpose of making a psychological evaluation that is accomplished through the use of tools such as tests, interviews, case studies, behavioral observation, and specially designed apparatuses and measurement procedures

-psychological testing: process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior

-therapeutic psychological assessment: assessment that has a therapeutic component to it

-educational assessment: evaluates abilities and skills relevant to success or failure in a school or pre-school context (e.g., intelligence tests, achievement tests, and reading comprehension tests)

-retrospective assessment: use of evaluative tools to draw conclusions about psychological aspects of a person as they existed at some point in time prior to the assessment

-remote assessment: draws conclusions about a subject who is not in physical proximity to the person or people conducting the evaluation

-Ecological Momentary Assessment (EMA): psychological assessment by means of smartphones; “in the moment” evaluation

-dynamic assessment: used in educational settings; follows a model:
 evaluation
 intervention of some sort
 evaluation

-collaborative assessment: assessor and assessee may work as “partners”

PROCESS OF ASSESSMENT
1. Referral
2. Pre-assessment: meet with the assessee or others before the formal assessment to clarify aspects of the reason for referral
3. The assessor prepares for the assessment by preparing the tools to be used
4. The assessor writes a report of the findings that is designed to answer the referral question

TOOLS OF ASSESSMENT

-test: measuring device or procedure

-psychological test: device or procedure designed to measure variables related to psychology; almost always involves an analysis of behavior

 Interview
 Portfolio
 Case history data
 Behavioral observation
 Role-play tests
 CAPA (computer-assisted psychological assessment): scoring is immediate
 CAT (computer adaptive testing): ability to tailor the test to the testtaker’s ability or test-taking pattern

-advantages over paper-and-pencil tests:
 Test administrators have greater access to potential users because of the global reach of the internet
 Scoring and interpretation tend to be quicker than for paper-and-pencil tests
 Costs associated with internet testing tend to be lower than costs associated with paper-and-pencil tests
 The internet facilitates the testing of otherwise isolated populations, as well as people with disabilities for whom getting to a test center might prove a hardship

VARIABLES IN PSYCHOLOGICAL TESTS

 Content
 Format
 Administration procedures
 Scoring and interpretation
o score: code or summary statement, usually but not necessarily numerical in nature; reflects an evaluation of performance on a test, task, interview, or some other sample of behavior
o scoring: process of assigning such evaluative codes or statements to performance on tests, tasks, interviews, or other behavior samples
o cut score: reference point, usually numerical, derived by judgment and used to divide a set of data into two or more classifications; used in schools, and also used by employers as an aid to decision making about personnel hiring, placement, and advancement
 Psychometric soundness

-some tests are self-scored by testtakers themselves, others are scored by a computer, and others require scoring by trained examiners

-most tests of intelligence come with test manuals

WHO, WHAT, WHY, HOW, AND WHERE?

WHO?

-Parties in the assessment enterprise include developers and publishers of tests, users of tests, and people who are evaluated by means of tests

-may consider society at large as a party to the assessment enterprise

IN WHAT TYPES OF SETTINGS?

 Educational
 Clinical
 Counseling
 Geriatric
 Business and military
 Governmental and organizational
 Academic research

HOW ARE ASSESSMENTS CONDUCTED?

-responsible test users have obligations before, during, and after a test or any measurement procedure is administered

-before the test, when test users have discretion with regard to the test administered, they should select and use only the test or tests most appropriate for the individual being tested

-the test administrator (or examiner) must be familiar with the test materials and procedures and must have at the test site all the materials needed to properly administer the test

-test users have the responsibility of ensuring that the room in which the test will be conducted is suitable and conducive to the testing

-during test administration, rapport between the examiner and the examinee can be critically important

-after a test administration, obligations range from safeguarding the test protocols to conveying the test results in a clearly understandable fashion

-if third parties were present during testing, or if anything else that might be considered out of the ordinary happened during testing, it is the test user’s responsibility to make a note of such events on the report of the testing

-if a test is to be scored by people, scoring needs to conform to pre-established scoring criteria

-test users who have responsibility for interpreting scores or other test results have an obligation to do so in accordance with established procedures and ethical guidelines

Assessment of people with disabilities

Alternate assessment
-programs for children who, as a result of a disability, could not otherwise participate in state- and district-wide assessments
-an evaluative or diagnostic procedure or process that varies from the usual, customary, or standardized way a measurement is derived, either by virtue of some special accommodation made to the assessee or by means of alternative methods designed to measure the same variables

Accommodation
-adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with exceptional needs

WHERE TO GO FOR AUTHORITATIVE INFORMATION?

 Test catalogues
-distributed by the publisher of the test
-usually contain only a brief description of the test and seldom contain the kind of detailed technical information that a prospective user might require; the objective is to sell the test

 Test manuals
-detailed information concerning the development of a particular test and technical information relating to it should be found in the test manual, which usually can be purchased from the test publisher
-for security purposes, the test publisher will typically require documentation of professional training before filling an order for a test manual

 Professional books
 Reference volumes
 Journal articles
-articles in current journals may contain reviews of the test, updated or independent studies of its psychometric soundness, or examples of how the instrument was used in either research or an applied context

 Online databases

HISTORY

19TH CENTURY

China as early as 2200 B.C.E.
-Testing was instituted as a means of selecting who, of many applicants, would obtain government jobs
-tests examined proficiency in subjects like music, archery, horsemanship, writing, and arithmetic, as well as agriculture, geography, civil law, and military strategy
-knowledge of and skill in the rites and ceremonies of public and social life were also evaluated during the Song (or Sung) dynasty
-tests emphasized knowledge of classical literature
-testtakers who demonstrated their command of the classics were perceived as having acquired the wisdom of the past and were therefore entitled to a government position
-passing the examinations could result in exemption from taxes

Ancient Greco-Roman writings
-attempts to categorize people in terms of personality types
-such categorizations typically included reference to an overabundance or deficiency in some bodily fluid (such as blood or phlegm) as a factor believed to influence personality

Renaissance
-psychological assessment in the modern sense began to emerge

-Charles Darwin: chance variation in species would be selected or rejected by nature according to adaptivity and survival value
-Darwin’s writing on individual differences kindled interest in research on heredity

-Francis Galton: influential contributor to the field of measurement; aspired to classify people “according to their natural gifts” and to ascertain their “deviation from an average”
-credited with devising or contributing to the development of many contemporary tools of psychological assessment, including questionnaires, rating scales, and self-report inventories

-Karl Pearson: developed the product-moment correlation technique; its roots can be traced directly to the work of Galton

-assessment was also an important activity at the first experimental psychology laboratory, founded at the University of Leipzig in Germany by Wilhelm Max Wundt

-James McKeen Cattell: dealt with individual differences; coined the term mental test

-Spearman is credited with originating the concept of test reliability as well as building the mathematical framework for the statistical technique of factor analysis

-Victor Henri: collaborated with Alfred Binet on papers suggesting how mental tests could be used to measure higher mental processes

20TH CENTURY

-much of the nineteenth-century testing that could be described as psychological in nature involved the measurement of sensory abilities, reaction time, etc.

The measurement of intelligence
-Binet and collaborator Theodore Simon: published a 30-item “measuring scale of intelligence” designed to help identify Paris schoolchildren with intellectual disability

-David Wechsler: introduced a test designed to measure adult intelligence

-a natural outgrowth of the individually administered intelligence test devised by Binet was the group intelligence test, developed in response to the military’s need for an efficient method of screening the intellectual ability of World War I recruits

-the same need again became urgent as the United States prepared for entry into World War II

The measurement of personality
-Personal Data Sheet by Woodworth: World War I had brought with it not only the need to screen the intellectual functioning of recruits but also the need to screen for recruits’ general adjustment

-Woodworth Psychoneurotic Inventory: after the war, Woodworth developed a personality test for civilian use that was based on the Personal Data Sheet; first widely used self-report measure of personality

-best known of all projective tests is the Rorschach: a series of inkblots developed by the Swiss psychiatrist Hermann Rorschach

The academic and applied traditions
-researchers at universities throughout the world use the tools of assessment to help advance knowledge and understanding of human and animal behavior

-in the applied tradition, examinations were developed to help select applicants for various positions on the basis of merit

CULTURE AND ASSESSMENT

-professionals involved in the assessment enterprise have shown increasing sensitivity to the role of culture in many different aspects of measurement

Henry H. Goddard
-translated Binet’s test
-found most immigrants from various nationalities to be mentally deficient when tested; largely the result of using a translated Binet test that overestimated mental deficiency even in native English-speaking populations

-one way that early test developers attempted to deal with the impact of language and culture on tests of mental ability was, in essence, to “isolate” the cultural variable

Culture-specific tests
-tests designed for use with people from one culture but not from another

-the Stanford-Binet Intelligence Scale included no minority children
-the Wechsler-Bellevue Intelligence Scale contained no minority members

ISSUES IN CULTURE AND ASSESSMENT

-communication between assessor and assessee is a most basic part of assessment

-assessors must be sensitive to any differences between the language or dialect familiar to assessees and the language in which the assessment is conducted

-assessors must also be sensitive to the degree to which assessees have been exposed to the dominant culture and the extent to which they have made a conscious choice to become assimilated

Verbal communication
-language is a key yet sometimes overlooked variable in the assessment process
-the examiner and the examinee must speak the same language
-when an assessment is conducted with the aid of a translator, different types of problems may emerge
-depending upon the translator’s skill and professionalism, subtle nuances of meaning may be lost in translation, or unintentional hints to the correct or more desirable response may be conveyed

Nonverbal communication and behavior
-humans communicate not only through verbal means but also through nonverbal means

Standards of evaluation
-cultures differ from one another in the extent to which they are individualist or collectivist
-individualist culture: self-reliance, autonomy, independence, uniqueness, and competitiveness
-collectivist culture: conformity, cooperation, interdependence, and striving toward group goals

TESTS AND GROUP MEMBERSHIP

Conflict
-groups systematically differ in terms of scores on a particular test

Affirmative action
-voluntary and mandatory efforts undertaken by federal, state, and local governments, private employers, and schools to combat discrimination and to promote equal opportunity for all in education and employment
-seeks to create equal opportunity actively, not passively
-in assessment, one way of implementing affirmative action is by altering test-scoring procedures according to set guidelines

LEGAL CONSIDERATIONS

Code of professional ethics
-recognized and accepted by members of a profession; defines the standard of care expected of members of that profession

Standard of care
-level at which the average, reasonable, and prudent professional would provide diagnostic or therapeutic services under the same or similar conditions

The Concerns of the Public
-the assessment enterprise has never been well understood by the public
-concern about the use of psychological tests first became widespread in the aftermath of World War I, when various professionals sought to adapt group tests developed by the military for civilian use in schools and industry

The Concerns of the Profession

Test-user qualifications
 Level A
-tests that can adequately be administered, scored, and interpreted with the aid of the manual and a general orientation to the kind of institution or organization in which one is working
 Level B
-tests that require some technical knowledge of test construction and use and of supporting psychological and educational fields such as statistics, individual differences, psychology of adjustment, personnel psychology, and guidance
 Level C

-tests that require substantial understanding of testing and supporting psychological fields, together with supervised experience in the use of these tests

Testing people with disabilities
-challenges may include:
 Transforming the test into a form that can be taken by the testtaker
 Transforming the responses of the testtaker so that they are scorable
 Meaningfully interpreting the test data

Computerized test administration, scoring, and interpretation
-some major issues with regard to CAPA are as follows:
 Access to test administration, scoring, and interpretation software
 Comparability of pencil-and-paper and computerized versions of tests
 The value of computerized test interpretations
 Unprofessional, unregulated “psychological testing” online

Guidelines with respect to certain populations
-designed to assist professionals in providing informed and developmentally appropriate services
-standards must be followed by all psychologists; guidelines are more aspirational in nature

THE RIGHTS OF TESTTAKERS

 The right of informed consent
 The right to be informed of test findings
 The right to privacy and confidentiality
-privacy right: recognizes the freedom of the individual to pick and choose for himself the time, circumstances, and particularly the extent to which he wishes to share or withhold from others his attitudes, beliefs, behavior, and opinions
-privileged information: information that is protected by law from disclosure in a legal proceeding
-confidentiality: concerns matters of communication outside the courtroom
-privilege is not absolute; privilege in the psychologist–client relationship belongs to the client, not the psychologist
 The right to the least stigmatizing label

STATISTICS REFRESHER

SCALES OF MEASUREMENT

Measurement
-act of assigning numbers or symbols to characteristics of things according to rules
-the rules used in assigning numbers are guidelines for representing the magnitude of the object being measured
-measurement always involves error: the collective influence of all of the factors on a test score or measurement beyond those specifically measured by the test or measurement

Scale
-set of numbers whose properties model empirical properties of the objects to which the numbers are assigned
-a scale used to measure a continuous variable might be referred to as a continuous scale, whereas a scale used to measure a discrete variable might be referred to as a discrete scale
-measurement using continuous scales always involves error

Nominal Scales
-simplest form of measurement
-classification or categorization based on one or more distinguishing characteristics, where all things measured must be placed into mutually exclusive and exhaustive categories (e.g., yes/no responses)

Ordinal Scales
-rank ordering on some characteristic
-imply nothing about how much greater one ranking is than another
-even though ordinal scales may employ numbers or “scores” to represent the rank ordering, the numbers do not indicate units of measurement

Interval Scales
-contain equal intervals between numbers
-each unit on the scale is exactly equal to any other unit on the scale; interval scales contain no absolute zero point (e.g., IQ tests)

Ratio Scales
-have a true zero point

MEASUREMENT SCALES IN PSYCHOLOGY
-the ordinal level of measurement is most frequently used in psychology
-intelligence, aptitude, and personality test scores are, basically and strictly speaking, ordinal

Describing Data
-distribution: a set of test scores arrayed for recording or study
-raw score: straightforward, unmodified accounting of performance that is usually numerical

Frequency Distributions
-all scores are listed alongside the number of times each score occurred (the frequency with which it occurs)
-scores might be listed in tabular or graphic form
-often, a frequency distribution is referred to as a simple frequency distribution to indicate that individual scores have been used and the data have not been grouped
-in a grouped frequency distribution, test-score intervals, also called class intervals, replace the actual test scores
-frequency distributions of test scores can also be illustrated graphically
 Graph: diagram or chart composed of lines, points, bars, or other symbols that describe and illustrate data
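The simple frequency distribution described in this refresher can be tallied in a few lines of Python; the quiz scores below are made-up illustrative data, not from any real sample:

```python
from collections import Counter

# Hypothetical raw scores on a ten-item quiz
scores = [7, 8, 8, 9, 7, 6, 8, 10, 7, 8]

# Simple frequency distribution: each obtained score listed
# alongside the number of times it occurred
freq = Counter(scores)
for score in sorted(freq, reverse=True):
    print(score, freq[score])
```

Grouping the same scores into class intervals (e.g., 6–7, 8–9, 10–11) would yield a grouped frequency distribution instead.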


-in a bar graph, the rectangular bars typically are not contiguous

 Histogram: vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles

 Frequency polygon: continuous line connecting the points where test scores or class intervals meet frequencies

MEASURES OF CENTRAL TENDENCY

-a measure of central tendency indicates the average or midmost score between the extreme scores in a distribution

Mean
-most commonly used measure of central tendency
-equal to the sum of the observations divided by the number of observations
-most appropriate measure of central tendency for interval or ratio data when the distributions are believed to be approximately normal

Median
-middle score in a distribution
-determine the median of a distribution of scores by ordering the scores in a list by magnitude, in either ascending or descending order
-if the total number of scores ordered is an odd number, then the median will be the score that is exactly in the middle
-when the total number of scores ordered is an even number, then the median can be calculated by determining the arithmetic mean of the two middle scores
-appropriate measure of central tendency for ordinal, interval, and ratio data
-may be a particularly useful measure of central tendency in cases where relatively few scores fall at the high end of the distribution or relatively few scores fall at the low end of the distribution

Mode
-the most frequently occurring score in a distribution of scores is the mode
-bimodal distribution: two scores that occur with the highest frequency
-except with nominal data, the mode tends not to be a very commonly used measure of central tendency
-the value of the modal score is not calculated; one simply counts and determines which score occurs most frequently
-the mode is useful in analyses of a qualitative or verbal nature

Variability
-an indication of how scores in a distribution are scattered or dispersed
-statistics that describe the amount of variation in a distribution are referred to as measures of variability

Range
-equal to the difference between the highest and the lowest scores; simplest measure of variability to calculate

The interquartile and semi-interquartile ranges
-Q1, Q2, and Q3
-quartile refers to a specific point, whereas quarter refers to an interval
-the interquartile range is a measure of variability equal to the difference between Q3 and Q1; an ordinal statistic
-semi-interquartile range: equal to the interquartile range divided by 2
-in a perfectly symmetrical distribution, Q1 and Q3 will be exactly the same distance from the median
-lack of symmetry is referred to as skewness
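As a rough check on these measures, Python's standard `statistics` module can compute each of them for a small hypothetical score set (the scores are made up; `method="inclusive"` is one of several quartile conventions, so other software may place Q1 and Q3 slightly differently):

```python
import statistics

# Hypothetical distribution of ten test scores
scores = [2, 3, 5, 5, 6, 7, 7, 7, 8, 9]

mean = statistics.mean(scores)      # sum of observations / number of observations
median = statistics.median(scores)  # even n: mean of the two middle scores
mode = statistics.mode(scores)      # most frequently occurring score
range_ = max(scores) - min(scores)  # simplest measure of variability

# Quartiles, interquartile range, and semi-interquartile range
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1        # interquartile range: Q3 - Q1
semi_iqr = iqr / 2   # semi-interquartile range

print(mean, median, mode, range_, iqr, semi_iqr)
```

Note that Q2 equals the median, consistent with the definitions above.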


-average deviation: another tool that could be used to describe the amount of variability in a distribution

The standard deviation
-equal to the square root of the average squared deviations about the mean
-equal to the square root of the variance: the variance is equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
-the variance is calculated by squaring and summing all the deviation scores and then dividing by the total number of scores
-variance is a widely used measure in psychological research
-the standard deviation is a very useful measure of variation because each individual score’s distance from the mean of the distribution is factored into its computation

Skewness
-an indication of how the measurements in a distribution are distributed
-positive skew: relatively few of the scores fall at the high end of the distribution; the test was difficult
-negative skew: relatively few of the scores fall at the low end of the distribution; the test was too easy

Kurtosis
-the steepness of a distribution in its center
 Platykurtic (relatively flat)
 Leptokurtic (relatively peaked)
 Mesokurtic (somewhere in the middle)

THE NORMAL CURVE

-Karl Friedrich Gauss: “Laplace-Gaussian curve”

-Karl Pearson is credited with being the first to refer to the curve as the normal curve

-a bell-shaped, smooth, mathematically defined curve that is highest at its center

AREA UNDER THE NORMAL CURVE

-the normal curve can be conveniently divided into areas defined in units of standard deviation

-the normal curve has two tails; the area on the normal curve between 2 and 3 standard deviations above the mean, and between −2 and −3 standard deviations below the mean, is referred to as a tail

STANDARD SCORES
-a raw score that has been converted from one scale to another scale

-a standard score obtained by a linear transformation is one that retains a direct numerical relationship to the original raw score

-a nonlinear transformation may be required when the data under consideration are not normally distributed yet comparisons with normal distributions need to be made

-the resulting standard score does not necessarily have a direct numerical relationship to the original raw score; as the result of a nonlinear transformation, the original distribution is said to have been normalized

Z Score (M: 0, SD: 1)
-conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution
-zero plus or minus one scale (−1, 0, +1)
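A minimal sketch of the variance, standard deviation, and z-score computations just described, using made-up scores and the population formula (dividing by N) that the text gives:

```python
import math

# Hypothetical distribution of five test scores
scores = [4, 6, 8, 10, 12]
n = len(scores)
mean = sum(scores) / n

# variance: arithmetic mean of the squared deviations from the mean
# (squaring and summing all deviation scores, then dividing by N)
variance = sum((x - mean) ** 2 for x in scores) / n

# standard deviation: square root of the variance
sd = math.sqrt(variance)

# z score: number of SD units a raw score lies above/below the mean
z_scores = [(x - mean) / sd for x in scores]

print(mean, variance, sd, z_scores)
```

The resulting z scores always have a mean of 0 and a standard deviation of 1, the "zero plus or minus one" scale.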


T Scores (M: 50, SD: 10)
-fifty plus or minus ten scale (50 ± 10)
-T = z(10) + 50

Stanine (M: 5, SD: 2)
-familiar to many students from achievement tests; scores take on whole values from 1 to 9

Sten (M: 5.5, SD: 2)

NORMALIZED STANDARD SCORES

-normalizing a distribution involves “stretching” the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores, a scale that is technically referred to as a normalized standard score scale

-one of the primary advantages of a standard score on one test is that it can readily be compared with a standard score on another test

-transformations should be made only when there is good reason to believe that the test sample was large enough and representative enough and that the failure to obtain normally distributed scores was due to the measuring instrument

CORRELATION AND INFERENCE

-central to psychological testing and assessment are inferences (deduced conclusions) about how some things are related to other things

-correlation is an expression of the degree and direction of correspondence between two things

-coefficient of correlation: strength of the relationship between two things

-a coefficient of correlation (r) expresses a linear relationship between two variables, usually continuous in nature; tells us the extent to which X and Y are “co-related”

-two ways to describe a perfect correlation between two variables are as either +1 or −1

-a negative (or inverse) correlation occurs when one variable increases while the other variable decreases

-if a correlation is zero, then absolutely no relationship exists between the two variables

-although correlation does not imply causation, there is an implication of prediction

PEARSON R
-most widely used of all correlation coefficients
-also called the Pearson correlation coefficient or the Pearson product-moment coefficient of correlation
-Pearson r should be used only if the relationship between the variables is linear
-statistical tool of choice when the two variables being correlated are continuous
-the value obtained for the coefficient of correlation can be further interpreted by deriving from it what is called a coefficient of determination, or r²
-r² is an indication of how much variance is shared by the X- and Y-variables
-calculation of r²: square the correlation coefficient and multiply by 100; the result is equal to the percentage of the variance accounted for

SPEARMAN RHO
-also known as the rank-order correlation coefficient, the rank-difference correlation coefficient, or simply Spearman’s rho
-frequently used when the sample size is small (fewer than 30 pairs of measurements) and especially when both sets of measurements are in ordinal (or rank-order) form

GRAPHIC REPRESENTATIONS OF CORRELATION

Scatterplot
-also known as a bivariate distribution, a scatter diagram, or a scattergram

-simple graphing of the coordinate points

-provides a quick indication of the direction and magnitude of the relationship, if any, between the two variables

-useful in revealing the presence of curvilinearity: an “eyeball gauge” of how curved a graph is

-a graph also makes the spotting of outliers relatively easy

-outlier: extremely atypical point located at an outlying distance from the rest of the coordinate points in a scatterplot

-outliers are sometimes the result of administering a test to a very small sample of testtakers

META-ANALYSIS

-a family of techniques used to statistically combine information across studies to produce single estimates of the data under study

-the estimates derived (effect size) may take several different forms

-in most meta-analytic studies, effect size is typically expressed as a correlation coefficient
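The Pearson and Spearman coefficients discussed in this section can be sketched in plain Python (the paired scores are illustrative assumptions; the `ranks` helper assigns tied values their average rank, one common convention):

```python
import statistics

def pearson_r(x, y):
    # Pearson product-moment r: sum of cross-products of deviations,
    # divided by the product of the square roots of the sums of squares
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    # Spearman rho: Pearson r computed on the ranks of the scores
    def ranks(v):
        sv = sorted(v)
        return [sv.index(a) + (sv.count(a) + 1) / 2 for a in v]
    return pearson_r(ranks(x), ranks(y))

# Hypothetical paired scores for five examinees
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2             # coefficient of determination
print(r, r_squared * 100)      # r ~ .77; ~60% of variance accounted for
print(spearman_rho(x, y))
```

Multiplying r² by 100 gives the percentage of variance the two variables share, exactly as described above.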

-effect size: the size of the differences between groups

-a key advantage of meta-analysis over simply reporting a range of findings is that more weight can be given to studies that have larger numbers of subjects

-some advantages of meta-analyses are:
 Meta-analyses can be replicated
 The conclusions of meta-analyses tend to be more reliable and precise than the conclusions from single studies
 There is more focus on effect size rather than statistical significance alone
 Meta-analysis promotes evidence-based practice, which may be defined as professional practice that is based on clinical and research findings

OTHER STATISTICAL TREATMENTS

PARAMETRIC MEASURES

-based on assumptions about the distribution of the population from which the sample was taken
-for data with a normal distribution

Independent samples t-test
-tests whether the means of two groups are different from each other

Dependent samples t-test
-pre-test and post-test; same subjects

One-way ANOVA
-compares the means of more than 2 groups when there is only one independent and one dependent variable; different respondents

One-way Repeated Measures ANOVA
-compares the means of three or more groups where the respondents are the same for each treatment

Pearson r

NON-PARAMETRIC MEASURES

-data can be collected from a sample that does not follow a specific distribution
-for skewed data

Mann-Whitney U Test
-used to compare differences between two independent groups where the dependent variable is either ordinal or continuous, but not normally distributed
-sister of the independent samples t-test

Wilcoxon Signed Rank Test
-used to compare the same samples to assess whether the samples differ from each other
-sister of the dependent samples t-test

Kruskal-Wallis H Test
-used to test whether or not a group of independent samples is from the same or different populations
-sister of the one-way ANOVA

Friedman Test
-used to test whether or not scores from the same sample differ across three or more conditions
-sister of the one-way repeated measures ANOVA

PARAMETRIC | NON-PARAMETRIC
Independent samples t-test | Mann-Whitney U Test
Dependent samples t-test | Wilcoxon Signed Rank Test
One-way ANOVA | Kruskal-Wallis H Test
Repeated measures ANOVA | Friedman Test
Pearson r | Spearman Rho

OF TESTS AND TESTING

Assumption 1: Psychological Traits and States Exist
Assumption 2: Psychological Traits and States Can Be Quantified and Measured
Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses
Assumption 5: Various Sources of Error Are Part of the Assessment Process
Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
Assumption 7: Testing and Assessment Benefit Society

WHAT’S A GOOD TEST?

-criteria for a good test would include clear instructions for administration, scoring, and interpretation

-one that measures what it purports to measure

-psychometric soundness of tests: reliability and validity

Other Considerations
-one that trained examiners can administer, score, and interpret with a minimum of difficulty

-useful: one that yields actionable results that will ultimately benefit individual testtakers or society at large

-if the purpose of a test is to compare the performance of the testtaker with the performance of other testtakers, then a “good test” is one that contains adequate norms

NORMS

-also referred to as normative data; norms provide a standard with which the results of measurement can be compared

-norm-referenced testing and assessment: way of deriving meaning from test scores by evaluating an individual testtaker’s score and comparing it to scores of a group of testtakers

-norms: test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores

-normative sample: group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers

-norming: process of deriving norms

-some test manuals provide user norms or program norms, which consist of descriptive statistics based on a group of testtakers in a given period of time rather than norms obtained by formal sampling methods

TYPES OF NORMS

Percentile norms
-raw data from a test’s standardization sample converted to percentile form

 Percentile: an expression of the percentage of people whose score on a test or measure falls below a particular raw score


whose score on a test or measure falls below a particular
Spearman Rho raw score
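A percentile rank as just defined (the percentage of the normative sample scoring below a given raw score) can be computed directly. A minimal Python sketch; the normative sample scores are invented for illustration:

```python
def percentile_rank(raw_score, norm_sample):
    """Percentage of scores in the normative sample falling below raw_score."""
    below = sum(1 for s in norm_sample if s < raw_score)
    return 100.0 * below / len(norm_sample)

# Hypothetical normative sample of 20 raw scores
norms = [55, 58, 60, 61, 63, 64, 66, 67, 68, 70,
         71, 72, 74, 75, 77, 78, 80, 82, 85, 90]

print(percentile_rank(70, norms))  # 45.0: a raw score of 70 exceeds 45% of the sample
```

Note that this is the percentile, not the percentage of items answered correctly; the two are easy to confuse but unrelated.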
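Each parametric test listed above has a non-parametric counterpart; Spearman Rho, the counterpart of Pearson r, is simply Pearson r computed on ranks. A self-contained sketch with invented data:

```python
def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    sx = sorted(xs)
    return [sx.index(x) + 1 + (sx.count(x) - 1) / 2 for x in xs]

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman Rho: Pearson r applied to the ranks of the data."""
    return pearson(ranks(x), ranks(y))

# A perfectly monotonic but non-linear relationship: rho is exactly 1
print(round(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]), 2))  # 1.0
```

Because only rank order matters, rho is appropriate for the skewed or ordinal data mentioned under the non-parametric measures.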

-distinct from percentage correct, which refers to the distribution of raw scores—more specifically, to the number of items that were answered correctly multiplied by 100 and divided by the total number of items

-popular way of organizing all test-related data

Age norms
-age-equivalent scores
-age norms indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered

Grade norms
-average test performance of testtakers in a given school grade
-grade norms are developed by administering the test to representative samples of children over a range of consecutive grade levels
-mean or median score for children at each grade level is calculated
-primary use of grade norms is as a convenient, readily understandable gauge of how one student's performance compares with that of fellow students in the same grade
-not typically designed for use with adults who have returned to school
-grade norms and age norms are referred to more generally as developmental norms

National norms
-derived from a normative sample that was nationally representative of the population at the time the norming study was conducted

National anchor norms
-provide some stability to test scores by anchoring them to other test scores
-begins with the computation of percentile norms for each of the tests to be compared using the equipercentile method; the scores must have been obtained on the same sample

Subgroup norms
-segmented by any of the criteria initially used in selecting subjects for the sample

Local norms
-typically developed by test users themselves, local norms provide normative information with respect to the local population's performance on some test

SAMPLING TO DEVELOP NORMS

Test standardization
-process of administering a test to a representative sample of testtakers for the purpose of establishing norms

Sampling
-process of selecting the portion of the universe deemed to be representative of the whole population is referred to as sampling

Sample of the population
-a portion of the universe of people deemed to be representative of the whole population

Stratified sampling
-sample people representing different subgroups (or strata) of the population
-if such sampling were random, it would be termed stratified-random sampling

Purposive sample
-arbitrarily select some sample because we believe it to be representative of the population

Convenience sample
-employ a sample that is not necessarily the most appropriate but is simply the most convenient

Developing norms for a standardized test
-having obtained a sample, the test developer administers the test according to the standard set of instructions that will be used with the test
-after all the test data have been collected and analyzed, the test developer will summarize the data using descriptive statistics, including measures of central tendency and variability
-norms are developed with data derived from a group of people who are presumed to be representative of the people who will take the test in the future

Fixed Reference Group Scoring Systems
-distribution of scores obtained on the test from one group of testtakers
-used as the basis for the calculation of test scores for future administrations of the test

Norm-Referenced Versus Criterion-Referenced Evaluation
-criterion-referenced: evaluate a score on the basis of whether or not some criterion has been met
-criterion: a standard on which a judgment or decision may be based
-criterion-referenced testing and assessment: method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard
-approach has also been referred to as domain- or content-referenced testing and assessment
-in norm-referenced interpretations of test data, a usual area of focus is how an individual performed relative to other people who took the test
-in criterion-referenced interpretations of test data, a usual area of focus is the testtaker's performance: what the testtaker can or cannot do; what the testtaker has or has not learned; whether the testtaker does or does not meet specified criteria for inclusion in some group
-because criterion-referenced tests are frequently used to gauge achievement or mastery, they are sometimes referred to as mastery tests

RELIABILITY

-consistency in measurement

-reliability coefficient is an index of reliability: a proportion that indicates the ratio between the true score variance on a test and the total variance

-statistic useful in describing sources of test score variability is the variance: the standard deviation squared

-true variance: variance from true differences

-error variance: variance from irrelevant, random sources

-the greater the proportion of the total variance attributed to true variance, the more reliable the test

-because true differences are assumed to be stable, they are presumed to yield consistent scores on repeated administrations of the same test as well as on equivalent forms of tests
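The true-variance/error-variance decomposition of reliability above can be made concrete with a toy calculation; the variance components here are hypothetical:

```python
# Classical test theory: total variance = true variance + error variance
true_variance = 80.0   # variance from true differences (hypothetical value)
error_variance = 20.0  # variance from irrelevant, random sources (hypothetical)

total_variance = true_variance + error_variance
reliability = true_variance / total_variance  # proportion of variance that is "true"

print(reliability)  # 0.8: 80% of score variability reflects true differences
```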

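The sampling strategies described above (simple random vs. stratified) can be contrasted in code. A sketch using Python's random module; the population and its strata are invented for illustration:

```python
import random

random.seed(0)  # reproducible illustration

# Hypothetical population split into two strata (subgroups)
population = {
    "urban": list(range(0, 60)),    # 60 members
    "rural": list(range(60, 100)),  # 40 members
}
pooled = [p for members in population.values() for p in members]

# Simple random sampling: ignore the strata, draw from the pooled population
simple_sample = random.sample(pooled, 10)

# Stratified(-random) sampling: draw at random from each stratum
# in proportion to its share of the population
stratified_sample = []
for name, members in population.items():
    k = round(10 * len(members) / len(pooled))  # 6 urban, 4 rural
    stratified_sample.extend(random.sample(members, k))

print(len(stratified_sample), sum(1 for p in stratified_sample if p < 60))  # 10 6
```

Stratification guarantees the sample mirrors the subgroup proportions, which simple random sampling only approximates on average.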

-measurement error: all of the factors associated with the process of measuring some variable, other than the variable being measured; can be categorized as being either:
• random error: caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process
• systematic error: typically constant or proportionate to what is presumed to be the true value of the variable being measured

-once a systematic error becomes known, it becomes predictable as well as fixable

SOURCES OF ERROR VARIANCE
• Test construction
• Test administration
• Test scoring and interpretation
• Other sources of error

RELIABILITY ESTIMATES

Test-Retest Reliability Estimates
-using the same instrument to measure the same thing at two points in time
-correlating pairs of scores from the same people on two different administrations of the same test
-appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time, such as a personality trait
-passage of time can be a source of error variance; the longer the time that passes, the greater the likelihood that the reliability coefficient will be lower
-coefficient of stability: when the interval between testings is greater than six months

Parallel-Forms and Alternate-Forms Reliability Estimates
-degree of the relationship between various forms of a test
-two test administrations with the same group are required, and test scores may be affected by intervening factors
-coefficient of equivalence

• Parallel forms
-for each form of the test, the means and the variances of observed test scores are equal
-means of scores obtained on parallel forms correlate equally with the true score
-scores obtained on parallel tests correlate equally with other measures

• Alternate forms
-different versions of a test that have been constructed so as to be parallel
-alternate forms of a test are typically designed to be equivalent with respect to variables such as content and level of difficulty
-alternate-forms reliability: estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error

Internal consistency estimate of reliability
-evaluation of the internal consistency of the test items
-degree of correlation among all the items on a scale
-a measure of inter-item consistency is calculated from a single administration of a single form of a test
-an index of inter-item consistency is useful in assessing the homogeneity of the test
-tests are said to be homogeneous if they contain items that measure a single trait
-heterogeneity: degree to which a test measures different factors; composed of items that measure more than one trait
-the more homogeneous a test is, the more inter-item consistency it can be expected to have
-a homogeneous test is often an insufficient tool for measuring multifaceted psychological variables such as intelligence or personality

Split-Half Reliability Estimates
-obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
-useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice
-one acceptable way to split a test is to randomly assign items to one or the other half of the test
-another way is to assign odd-numbered items to one half of the test and even-numbered items to the other half; yields an estimate of split-half reliability that is also referred to as odd-even reliability
-another way to split a test is to divide the test by content so that each half contains items equivalent with respect to content and difficulty
-primary objective in splitting a test in half for the purpose of obtaining a split-half reliability estimate is to create mini-parallel-forms

• Spearman–Brown formula
-used to estimate internal consistency reliability from a correlation of two halves of a test
-reliability increases as test length increases
-could also be used to determine the number of items needed to attain a desired level of reliability

• Kuder–Richardson formula 20 (KR-20)
-for determining the inter-item consistency of dichotomous items; right or wrong
-if test items are more heterogeneous, KR-20 will yield lower reliability estimates than the split-half method
-approximation of KR-20 can be obtained by the use of KR-21
-KR-21 formula may be used if there is reason to assume that all the test items have approximately the same degree of difficulty

• Coefficient alpha
-mean of all possible split-half correlations; used on tests containing nondichotomous items
-preferred statistic for obtaining an estimate of internal consistency reliability
-requires only one administration of the test
-ranges in value from 0 to 1
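The Spearman–Brown step-up from a half-test correlation, and coefficient alpha, can both be sketched in a few lines; the item scores below are invented for illustration:

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def spearman_brown(r_half, n=2):
    """Predicted reliability when test length changes by factor n;
    n=2 steps a half-test correlation up to a full-length estimate."""
    return n * r_half / (1 + (n - 1) * r_half)

def cronbach_alpha(items):
    """Coefficient alpha; items[i] = all testtakers' scores on item i."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # total score per testtaker
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

print(round(spearman_brown(0.70), 2))  # 0.82: half-test r of .70, full-test ~.82

items = [[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]  # 3 items x 4 testtakers
print(round(cronbach_alpha(items), 2))  # 0.95
```

Consistent with the notes above, alpha needs only one administration and, here, its high value reflects items that rank the four testtakers very similarly.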


-calculated to help answer questions about how similar sets of data are

-a value of alpha above .90 may be "too high" and indicate redundancy in the items

• Average proportional distance (APD)
-used to evaluate the internal consistency of a test; focuses on the degree of difference that exists between item scores
-general "rule of thumb" for interpreting an APD is that an obtained value of .2 or lower is indicative of excellent internal consistency, and that a value of .2 to .25 is in the acceptable range
-a calculated APD of .25 is suggestive of problems with the internal consistency of the test

Measures of Inter-Scorer Reliability
-also referred to as scorer reliability, judge reliability, observer reliability, and inter-rater reliability
-consistency between two or more scorers with regard to a particular measure
-reference to levels of inter-scorer reliability for a particular test may be published in the test's manual or elsewhere
-often used when coding nonverbal behavior
-simplest way of determining the degree of consistency among scorers in the scoring of a test is to calculate a coefficient of correlation: the coefficient of inter-scorer reliability

USING AND INTERPRETING A COEFFICIENT OF RELIABILITY
-three approaches to the estimation of reliability: test-retest, alternate or parallel forms, and internal or inter-item consistency

PURPOSE OF THE RELIABILITY COEFFICIENT
-for a test designed for a single administration only, an estimate of internal consistency would be the reliability measure of choice
-transient error: a source of error attributable to variations in the testtaker's feelings, moods, or mental state over time

NATURE OF THE TEST
-considerations such as whether:
• The test items are homogeneous or heterogeneous in nature
• The characteristic, ability, or trait being measured is presumed to be dynamic or static
• The range of test scores is or is not restricted
• The test is a speed or a power test
• The test is or is not criterion-referenced

Restriction of range or restriction of variance
-if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower

Inflation of range or inflation of variance
-if the variance is inflated by the sampling procedure, the correlation coefficient tends to be higher

Speed tests versus power tests

Power test
-when a time limit is long enough to allow testtakers to attempt all items, and some items are so difficult that no testtaker is able to obtain a perfect score

Speed test
-items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly
-score differences on a speed test are therefore based on performance speed, because items attempted tend to be correct
-reliability estimate of a speed test should be based on performance from two independent testing periods using one of the following:
• test-retest reliability
• alternate-forms reliability
• split-half reliability from two separately timed half tests
-reliability of a speed test should not be calculated from a single administration of the test with a single time limit; the result will be a spuriously high reliability coefficient

Criterion-referenced tests
-scores on criterion-referenced tests tend to be interpreted in pass–fail terms
-scrutiny of performance on individual items tends to be for diagnostic and remedial purposes

CLASSICAL TEST THEORY
-referred to as the true score (or classical) model of measurement
-most widely used and accepted model
-everyone has a "true score" on a test; one's true score on one test of extraversion, for example, may not bear much resemblance to one's true score on another test of extraversion
-favors the development of longer rather than shorter tests

Domain sampling theory and generalizability theory
-estimate the extent to which specific sources of variation under defined conditions are contributing to the test score
-test's reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample
-the items in the domain are thought to have the same means and variances as those in the test that samples from the domain

Generalizability theory
-test scores vary from testing to testing because of variables in the testing situation
-universe score: analogous to a true score
-generalizability study examines how generalizable scores from a particular test are if the test is administered in different situations; how much of an impact different facets of the universe have on the test score
-influence of particular facets on the test score is represented by coefficients of generalizability

Decision study
-developers examine the usefulness of test scores in helping the test user make decisions
-designed to tell the test user how test scores should be used and how dependable those scores are as a basis for decisions

ITEM RESPONSE THEORY
-also known as latent-trait theory
-a person with X ability will be able to perform at a level of Y
-a person with X amount of a particular personality trait will exhibit Y amount of that trait on a personality test designed to measure it
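IRT's "person with X ability performs at level Y" idea is usually expressed as an item characteristic curve. A sketch of the common two-parameter logistic (2PL) model, with invented parameter values; the source names no specific IRT model, so this is illustrative only:

```python
import math

def p_correct(theta, a, b):
    """2PL item characteristic curve: probability that a testtaker with
    ability theta answers correctly an item with difficulty b and
    discrimination a (how sharply the item separates ability levels)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Hypothetical item of average difficulty (b=0) and moderate discrimination
for theta in (-2, 0, 2):
    print(round(p_correct(theta, a=1.5, b=0.0), 2))  # 0.05, then 0.5, then 0.95
```

A larger discrimination parameter a steepens the curve, which is the "differential weight" an IRT model can give to individual items.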


-discrimination: degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured

-can assign differential weight to the value of individual items

STANDARD ERROR OF MEASUREMENT
-often abbreviated as SEM
-provides a measure of the precision of an observed test score
-an estimate of the amount of error inherent in an observed score or measurement
-the higher the reliability of a test, the lower the SEM
-tool used to estimate or infer the extent to which an observed score deviates from a true score
-also known as the standard error of a score
-index of the extent to which one individual's scores vary over tests presumed to be parallel
-most frequently used in the interpretation of individual test scores
-useful in establishing what is called a confidence interval: a range or band of test scores that is likely to contain the true score
-example: 68% CI: 60.7-66.3 (1 SEM away); 95% CI: 58.4-67.6 (2 SEM away)
-standard error of measurement can be used to set the confidence interval for a particular score or to determine whether a score is significantly different from a criterion

STANDARD ERROR OF THE DIFFERENCE BETWEEN TWO SCORES

-used in determining how large a difference between two scores should be before it is considered statistically significant

-if the probability is more than 5%, it is presumed that there was no difference

-a more rigorous standard is the 1% standard

-applying the 1% standard, no statistically significant difference would be deemed to exist unless the observed difference could have occurred by chance alone less than one time in a hundred

VALIDITY

Validation
-process of gathering and evaluating evidence about validity
-both the test developer and the test user may play a role in the validation of a test

Local validation studies
-necessary when the test user plans to alter in some way the format, instructions, language, or content of the test
-necessary if a test user sought to use a test with a population of testtakers that differed in some significant way from the population on which the test was standardized

CONTENT VALIDITY
-evaluation of the subjects, topics, or content covered by the items in the test
-judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample

CONSTRUCT VALIDITY

-executing a comprehensive analysis of:
• how scores on the test relate to other test scores and measures
• how scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure

-construct validity has been viewed as "umbrella validity" because every other variety of validity falls under it

-appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct

-construct: informed, scientific idea developed or hypothesized to describe or explain behavior

-constructs are unobservable, underlying traits that a test developer may invoke to describe test behavior or criterion performance

-investigating a test's construct validity requires formulating hypotheses about the expected behavior of high scorers and low scorers on the test

-viewed as the unifying concept for all validity evidence

Evidence of Construct Validity
-various techniques of construct validation may provide evidence, for example, that:

• The test is homogeneous, measuring a single construct
-Pearson r could be used to correlate average subtest scores with the average total test score
-one way a test developer can improve the homogeneity of a test containing items that are scored dichotomously is by eliminating items that do not show significant correlation coefficients with total test scores
-coefficient alpha may also be used in estimating the homogeneity of a test composed of multiple-choice items

• Test scores increase or decrease as a function of age, the passage of time, or an experimental manipulation as theoretically predicted

• Test scores obtained after some event or the mere passage of time (or, posttest scores) differ from pretest scores as theoretically predicted

• Test scores obtained by people from distinct groups vary as predicted by the theory
-method of contrasted groups: one way of providing evidence for the validity of a test is to demonstrate that scores on the test vary in a predictable way as a function of membership in some group
-if a test is a valid measure of a particular construct, then test scores from groups of people who would be presumed to differ with respect to that construct should have correspondingly different test scores

• Test scores correlate with scores on other tests in accordance with what would be predicted from a theory that covers the manifestation of the construct in question

Convergent evidence
-if scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same (or a similar) construct
</gr-replace>
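The SEM and confidence-interval ideas above follow from two standard formulas, SEM = SD × √(1 − reliability) and, for the difference between two scores, SD × √(2 − r1 − r2). The source gives only the worked numbers, so the sketch below uses hypothetical values:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.0):
    """Band around an observed score likely to contain the true score
    (z=1 gives the ~68% band, z=2 the ~95% band)."""
    err = z * sem(sd, reliability)
    return score - err, score + err

def se_difference(sd, rel_1, rel_2):
    """Standard error of the difference between two scores on the same scale."""
    return sd * math.sqrt(2 - rel_1 - rel_2)

# Hypothetical test: SD = 10, reliability = .91, observed score = 100
print(round(sem(10, 0.91), 2))               # 3.0
lo, hi = confidence_interval(100, 10, 0.91)  # ~68% CI
print(round(lo, 1), round(hi, 1))            # 97.0 103.0
```

Note that the standard error of the difference is always larger than either score's SEM, which is why a gap between two scores must be sizable before it is called significant.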

Discriminant evidence
-a validity coefficient showing little relationship between test scores and/or other variables with which scores on the test being construct-validated should not theoretically be correlated

Factor analysis
-both convergent and discriminant evidence of construct validity can be obtained by the use of factor analysis
-designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ
-employed as a data reduction method in which several sets of scores and the correlations between them are analyzed
-factor analysis is conducted on either an exploratory or a confirmatory basis
• Exploratory factor analysis: "estimating, or extracting factors; deciding how many factors to retain; and rotating factors to an interpretable orientation"
• Confirmatory factor analysis: tests the degree to which a hypothetical model (which includes factors) fits the actual data
-factor loading: extent to which the factor determines the test score or scores
-high factor loadings: provide convergent evidence of construct validity
-moderate to low factor loadings: provide discriminant evidence of construct validity

CRITERION-RELATED VALIDITY

-evaluating the relationship of scores obtained on the test to scores on other tests or measures

-how adequately a test score can be used to infer an individual's most probable standing on some measure of interest—the measure of interest being the criterion

-criterion: standard against which a test or a test score is evaluated
-characteristics of a criterion:
• relevant; applicable to the matter at hand
• must also be valid for the purpose for which it is being used
• a criterion is also uncontaminated
o criterion contamination: criterion measure that has been based, at least in part, on predictor measures

Concurrent validity
-test score is related to some criterion measure obtained at the same time (concurrently)
-test scores are obtained at about the same time as the criterion measures are obtained
-sometimes the concurrent validity of a particular test (Test A) is explored with respect to another test (Test B); Test B is used as the validating criterion

Predictive validity
-test score predicts some criterion measure
-criterion measure is obtained not concurrently but at some future time
-test scores may be obtained at one time and the criterion measures obtained at a future time, usually after some intervening event has taken place
-how accurately scores on the test predict some criterion measure

• base rate: the extent to which a particular attribute exists in the population
• hit rate: the proportion of people the test accurately identifies as possessing or exhibiting a particular trait
• miss rate: the proportion of people the test fails to identify as having a particular characteristic or attribute
• false positive: the test predicted that the testtaker possessed the particular characteristic even though the testtaker did not
• false negative: the test predicted that the testtaker did not possess the particular characteristic even though the testtaker actually did

-to evaluate the predictive validity of a test, a test targeting a particular attribute may be administered to a sample of research subjects in which approximately half of the subjects possess or exhibit the targeted attribute and the other half do not

-judgments of criterion-related validity, whether concurrent or predictive, are based on two types of statistical evidence:

Validity coefficient
-relationship between test scores and scores on the criterion measure
-Pearson correlation coefficient is used
-affected by restriction or inflation of range
-attrition in the number of subjects: the validity coefficient may be adversely affected

OTHER TYPES OF VALIDITY

Ecological validity
-judgment regarding how well a test measures what it purports to measure at the time and place that the variable being measured actually occurs

Face validity
-face validity relates more to what a test appears to measure to the person being tested than to what the test actually measures
-how relevant the test items appear to be
-a test's lack of face validity could contribute to a lack of confidence in the perceived effectiveness of the test; may lead to a decrease in the testtaker's cooperation or motivation to do his or her best
-face validity may be more a matter of public relations than psychometric soundness

Incremental validity
-the additional predictor explains something about the criterion measure that is not explained by predictors already in use

VALIDITY, BIAS, AND FAIRNESS

Test Bias
-factor inherent in a test that systematically prevents accurate, impartial measurement
-bias implies systematic variation
-one reason some tests have been found to be biased has more to do with the design of the research study than the design of the test
-a test is biased if some portion of its variance stems from some factor(s) that are irrelevant to performance on the criterion measure; as a consequence, one group of testtakers will systematically perform differently from another
-prevention during test development is the best cure for test bias
-procedure called estimated true score transformations represents one of many available post hoc remedies
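The base rate, hits, misses, and false positives/negatives above can all be tallied once a cut score is chosen (the validity coefficient itself is just the Pearson r between test and criterion scores). A sketch with invented scores and outcomes:

```python
def classification_rates(test_scores, has_attribute, cut_score):
    """Tally prediction outcomes; the test 'predicts' the attribute
    when a score meets or exceeds the cut score."""
    tally = {"base_rate": sum(has_attribute) / len(test_scores),
             "hits": 0, "misses": 0, "false_positives": 0, "correct_rejections": 0}
    for score, has_it in zip(test_scores, has_attribute):
        predicted = score >= cut_score
        if predicted and has_it:
            tally["hits"] += 1
        elif predicted and not has_it:
            tally["false_positives"] += 1
        elif not predicted and has_it:
            tally["misses"] += 1  # false negatives
        else:
            tally["correct_rejections"] += 1
    return tally

# Invented scores and actual outcomes for eight testtakers
scores = [40, 55, 60, 72, 80, 85, 90, 95]
actually_has = [False, False, True, False, True, True, True, True]
rates = classification_rates(scores, actually_has, cut_score=70)
print(rates)  # hits=4, misses=1, false_positives=1, correct_rejections=2
```

Moving the cut score trades misses against false positives, which is the practical decision the later sections on cut scores address.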



-rating error: numerical or verbal judgment (or both) that places a Naylor-Shine tables
person or an attribute along a continuum identified by a scale of -difference between the means of the selected and unselected groups
numerical or word descriptors known as a rating scale to derive an index of what the test is adding to already established
procedures
-a rating error is a judgment resulting from the intentional or
unintentional misuse of a rating scale -with both tables, the validity coefficient used must be one obtained
by concurrent validation procedures
-leniency error: also known as a generosity error; error in rating
that arises from the tendency on the part of the rater to be lenient in The Brogden-Cronbach-Gleser formula
scoring -used to calculate the dollar amount of a utility gain resulting from
the use of a particular selection instrument under specified conditions
-central tendency error: exhibits a general and systematic
reluctance to giving ratings at either the positive or the negative -utility gain: estimate of the benefit of using a particular test or
extreme; all of this rater’s ratings would tend to clusterin the middle selection method
of the rating continuum
-productivity gain: an estimated increase in work output
-one way to overcome restriction-of-range rating errors (central
tendency, leniency, severity errors) is to use rankings: measure -test is obviously of no value if the hit rate is higher without
individuals against one another instead of against an absolute scale using it

-Halo effect: some raters, some ates can do no wrong Decision theory
-provides guidelines for setting optimal cutoff scores
Test Fairness
-a test is used in an impartial, just, and equitable way SOME PRACTICAL CONSIDERATIONS
-when the base rates are extremely low or high because such a
-some tests, for example, have been labeled “unfair” because they discriminate among groups of people

UTILITY
-how useful a test is

-practical value of using a test to aid in decision making

FACTORS THAT AFFECT A TEST’S UTILITY

Psychometric soundness
-the higher the criterion-related validity of test scores for making a particular decision, the higher the utility of the test is likely to be

Costs
-losses, or expenses in both economic and noneconomic terms

-a basic element in any utility analysis is the financial cost of the selection device

Benefits
-profits, gains, or advantages

-cost of administering tests can be well worth it if the result is certain noneconomic benefits

UTILITY ANALYSIS
-family of techniques that entail a cost–benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment

-undertaken for the purpose of evaluating whether the benefits of using a test outweigh the costs

-purpose of a utility analysis is to answer a question related to costs and benefits in terms of money

HOW IS A UTILITY ANALYSIS CONDUCTED?

Expectancy data
-provide an indication of the likelihood that a testtaker will score within some interval of scores on a criterion measure—an interval that may be categorized as “passing,” “acceptable,” or “failing”

Taylor-Russell tables
-provide an estimate of the extent to which inclusion of a particular test in the selection system will improve selection

-provide an estimate of the percentage of employees hired by the use of a particular test who will be successful at their jobs, given different combinations of three variables: the test’s validity, the selection ratio used, and the base rate

-such a situation may render the test useless as a tool of selection

 The pool of job applicants
 The complexity of the job
 The cut score in use

o cut score: usually numerical reference point derived as a result of a judgment and used to divide a set of data into two or more classifications

o relative cut score (norm-referenced cut score): reference point—in a distribution of test scores used to divide a set of data into two or more classifications—that is set based on norm-related considerations

o fixed cut score (absolute cut score): reference point—in a distribution of test scores used to divide a set of data into two or more classifications—that is typically set with reference to a judgment concerning a minimum level of proficiency required to be included in a particular classification

o multiple cut scores: use of two or more cut scores with reference to one predictor for the purpose of categorizing testtakers

o multiple hurdle: a cut score is in place for each predictor used; cut score used for each predictor will be designed to ensure that each applicant possesses some minimum level of a specific attribute or skill

o compensatory model of selection: an assumption is made that high scores on one attribute can, in fact, “balance out” or compensate for low scores on another attribute; within the framework of a compensatory model is multiple regression

METHODS FOR SETTING CUT SCORES

The Angoff Method
-experts provide estimates regarding how testtakers who have at least minimal competence for the position should answer test items correctly

-an expert panel makes judgments concerning the way a person with that trait, attribute, or ability would respond to test items

-Disadvantage: when there is low inter-rater reliability and major disagreement regarding how certain populations of testtakers should respond to items
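The multiple-hurdle and compensatory selection models above differ in how cut scores are applied, and the contrast is easy to show in a short sketch. All applicant scores, cut scores, and regression-style weights below are hypothetical, invented for illustration only.

```python
# Contrast of two selection models described in the notes.
# Scores, cut scores, and weights are hypothetical.

applicants = {
    "A": {"ability": 70, "integrity": 90},
    "B": {"ability": 95, "integrity": 40},   # strong ability, weak integrity
}

cut_scores = {"ability": 60, "integrity": 60}
weights = {"ability": 0.5, "integrity": 0.5}  # stand-in for regression weights
composite_cut = 60

def multiple_hurdle(scores):
    """Applicant must meet the cut score on EVERY predictor."""
    return all(scores[p] >= cut for p, cut in cut_scores.items())

def compensatory(scores):
    """High scores on one attribute may 'balance out' low scores on another:
    predictors are combined into a single weighted composite."""
    composite = sum(scores[p] * w for p, w in weights.items())
    return composite >= composite_cut

for name, scores in applicants.items():
    print(name, "hurdle:", multiple_hurdle(scores),
          "compensatory:", compensatory(scores))
```

Applicant B fails the multiple-hurdle screen (the integrity score falls below its cut score) but passes under the compensatory model, where the high ability score offsets the low integrity score in the composite.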
The Known Groups Method


-method of contrasting groups

-collection of data on the predictor of interest from groups known to possess, and not to possess, a trait, attribute, or ability of interest

-main problem with using known groups is that determination of where to set the cutoff score is inherently affected by the composition of the contrasting groups

IRT-Based Methods
-each item is associated with a particular level of difficulty

-in order to “pass” the test, the testtaker must answer items that are deemed to be above some minimum level of difficulty, which is determined by experts and serves as the cut score

-item-mapping method: setting cut scores for licensing examinations

 Bookmark method
-more typically used in academic applications

-begins with the training of experts with regard to the minimal knowledge, skills, and/or abilities that testtakers should possess in order to “pass”

-subsequent to this training, the experts are given a book of items, with one item printed per page, such that items are arranged in an ascending order of difficulty

-expert places a “bookmark” between the two pages (or, the two items) that are deemed to separate testtakers who have acquired the minimal knowledge, skills, and/or abilities from those who have not

-bookmark serves as the cut score

TEST DEVELOPMENT
-umbrella term for all that goes into the process of creating a test

-The process of developing a test occurs in five stages:
1. test conceptualization
2. test construction
3. test tryout
4. item analysis
5. test revision

Preliminary questions
 What is the test designed to measure?
 What is the objective of the test? In the service of what goal will the test be employed?
 In what way or ways is the objective of this test the same as or different from other tests with similar goals?
 What real-world behaviors would be anticipated to correlate with testtaker responses?
 Is there a need for this test?
 Who will use this test?
 Who will take this test?
 What content will the test cover?
 How will the test be administered?
 What is the ideal format of the test?
 Should more than one form of the test be developed?
 What special training will be required of test users for administering or interpreting the test?
 What types of responses will be required of testtakers?
 Who benefits from an administration of this test?
 Is there any potential for harm as the result of an administration of this test?
 How will meaning be attributed to scores on this test?

Norm-referenced versus criterion-referenced tests
-a good item on a norm-referenced achievement test is an item for which high scorers on the test respond correctly

-on a criterion-oriented test, this same pattern of results may occur: high scorers on the test get a particular item right whereas low scorers on the test get that same item wrong

-the development of a criterion-referenced test or assessment procedure may entail exploratory work with at least two groups of testtakers: one group known to have mastered the knowledge or skill being measured and another group known not to have mastered such knowledge or skill

-the items that best discriminate between these two groups would be considered “good” items

Pilot Work
-preliminary research surrounding the creation of a prototype of the test

-Test items may be pilot studied (or piloted) to evaluate whether they should be included in the final form of the instrument

-Once the idea for a test is conceived (test conceptualization), test construction begins

TEST CONSTRUCTION
-a stage in the process of test development that entails writing test items (or rewriting or revising existing items), as well as formatting items, setting scoring rules, and otherwise designing and building a test

-once a preliminary form of the test has been developed, it is administered to a representative sample of testtakers under conditions that simulate the conditions under which the final version of the test will be administered

Scaling
-process of setting rules for assigning numbers in measurement

-process by which a measuring device is designed and calibrated and by which numbers are assigned to different amounts of the trait

Types of scales

Age-based scale
-if the testtaker’s test performance as a function of age is of critical interest

Grade-based scale
-testtaker’s test performance as a function of grade is of critical interest

Stanine scale
-all raw scores on the test are to be transformed into scores that can range from 1 to 9

Scaling methods

Rating scale
-grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker

-used to record judgments of oneself, others, experiences, or objects, and they can take several forms

-when the final test score is obtained by summing the ratings across all the items, it is termed a summative scale
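Summative scoring as just described is simply the sum of the item ratings. A minimal sketch, with invented five-point ratings; the reverse-keyed item is an added assumption (reverse-keying is common practice but is not discussed in the notes):

```python
# Summative (Likert-type) scale scoring: the final score is the sum of the
# item ratings. The ratings below are hypothetical.

RATING_MIN, RATING_MAX = 1, 5   # 1 = strongly disagree ... 5 = strongly agree

def reverse_key(rating):
    """Flip a rating for an assumed reverse-keyed (negatively worded) item."""
    return RATING_MAX + RATING_MIN - rating

def summative_score(ratings, reverse_keyed=()):
    """Sum the ratings, flipping any items listed (by index) as reverse-keyed."""
    return sum(reverse_key(r) if i in reverse_keyed else r
               for i, r in enumerate(ratings))

responses = [4, 5, 3, 2, 4]                            # one testtaker, invented data
print(summative_score(responses))                      # 18
print(summative_score(responses, reverse_keyed={3}))   # item 3 flipped (2 -> 4): 20
```

As the notes point out for rating scales generally, the resulting scores are ordinal-level data.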


Likert scale
-usually to scale attitudes

-use of rating scales of any type results in ordinal-level data

-some rating scales are unidimensional: only one dimension is presumed to underlie the ratings

-other rating scales are multidimensional: meaning that more than one dimension is thought to guide the testtaker’s responses

Method of paired comparisons
-testtakers are presented with pairs of stimuli, which they are asked to compare

-for each pair of options, testtakers receive a higher score for selecting the option deemed more justifiable by the majority of a group of judges

-comparative scaling: judgments of a stimulus in comparison with every other stimulus on the scale (e.g., providing testtakers with a list of 30 items on a sheet of paper and asking them to rank the justifiability of the items from 1 to 30)

-categorical scaling: stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum

Guttman scale
-ordinal-level measures

-items on it range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured

-all respondents who agree with the stronger statements of the attitude will also agree with milder statements

Scalogram analysis
-item-analysis procedure and approach to test development that involves a graphic mapping of a testtaker’s responses

Writing Items
 What range of content should the items cover?
 Which of the many different types of item formats should be employed?
 How many items should be written in total and for each content area covered?

Item pool
-reservoir or well from which items will or will not be drawn for the final version of the test

Item format
-form, plan, structure, arrangement, and layout of individual test items

-selected-response format: require testtakers to select a response from a set of alternative responses

 multiple-choice: an item written in a multiple-choice format has three elements: (1) a stem, (2) a correct alternative or option, and (3) several incorrect options referred to as distractors or foils

-multiple-choice item that contains only two possible responses is called a binary-choice item

-most familiar binary-choice item is the true–false item

-good binary-choice item contains a single idea, is not excessively long, and is not subject to debate

 matching item: testtaker is presented with two columns; premises on the left and responses on the right

-testtaker’s task is to determine which response is best associated with which premise

-constructed-response format: require testtakers to supply or to create the correct answer, not merely to select it

 short answer

 completion item: requires the examinee to provide a word or phrase that completes a sentence

-should be worded so that the correct answer is specific

-may also be referred to as a short-answer item

 essay item: requires the testtaker to respond to a question by writing a composition, typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation

-useful when the test developer wants the examinee to demonstrate a depth of knowledge about a single topic

Writing items for computer administration

Item bank
-relatively large and easily accessible collection of test questions

computerized adaptive testing (CAT)
-computer-administered test-taking process wherein items presented to the testtaker are based in part on the testtaker’s performance on previous items

-test administered may be different for each testtaker, depending on the test performance on the items presented

-tends to reduce floor effects and ceiling effects

-floor effect: diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured

-ceiling effect: refers to the diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, or other attribute being measured

-item branching: ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items

-item-branching technology may be used in personality tests to recognize nonpurposive or inconsistent responding

Scoring Items

Cumulative model
-the higher the score on the test, the higher the testtaker is on the ability, trait, or other characteristic that the test purports to measure

Class scoring (category scoring)
-responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way

Ipsative scoring
-comparing a testtaker’s score on one scale within a test to another scale within that same test

TEST TRYOUT
-an informal rule of thumb is that there should be no fewer than 5 subjects and preferably as many as 10 for each item on the test

-the more subjects in the tryout the better

-definite risk in using too few subjects during test tryout comes during factor analysis of the findings, when what we might call phantom factors—factors that actually are just artifacts of the small sample size—may emerge

What Is a Good Item?
-good test item is reliable and valid
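The item-branching idea behind computerized adaptive testing, described earlier in this section, can be sketched crudely: present a harder item after a correct response and an easier one after an incorrect response. The item bank and the simulated response pattern below are invented for illustration; real CAT systems select items with IRT-based estimates, not a fixed ladder.

```python
# Crude item-branching sketch: the bank is ordered by ascending difficulty;
# branch up after a correct response, down after an incorrect one.
# Bank contents and responses are hypothetical.

bank = ["very easy", "easy", "medium", "hard", "very hard"]

def next_item_index(current, answered_correctly):
    """Move one step up or down the difficulty ladder, staying inside the bank."""
    step = 1 if answered_correctly else -1
    return min(max(current + step, 0), len(bank) - 1)

idx = 2                                   # start at medium difficulty
for answered_correctly in (True, True, False):
    idx = next_item_index(idx, answered_correctly)

print(bank[idx])  # "hard": two correct answers moved up, one error moved back down
```

Because the presented difficulty tracks the testtaker's performance, few testtakers spend time at the extremes of the bank, which is one way adaptive administration reduces the floor and ceiling effects defined above.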

-good test item is one that is answered correctly by high scorers on the test as a whole

-item analysis: different types of statistical scrutiny that the test data can potentially undergo at this point

ITEM ANALYSIS
-tools test developers might employ to analyze and select items are
 an index of the item’s difficulty
 an index of the item’s reliability
 an index of the item’s validity
 an index of item discrimination

The Item-Difficulty Index
-the larger the item-difficulty index, the easier the item

-item-difficulty index in the context of achievement testing may be an item-endorsement index in other contexts, such as personality testing

-statistic provides not a measure of the percent of people passing the item but a measure of the percent of people who said yes to, agreed with, or otherwise endorsed the item

-value of an item-difficulty index can theoretically range from 0 (if no one got the item right) to 1 (if everyone got the item right)

The Item-Reliability Index
-indication of the internal consistency of a test

-the higher this index, the greater the test’s internal consistency

-optimal average item difficulty is approximately .5, with individual items on the test ranging in difficulty from about .3 to .8

Factor analysis and inter-item consistency
-statistical tool useful in determining whether items on a test appear to be measuring the same thing

-useful in the test interpretation process

The Item-Validity Index
-designed to provide an indication of the degree to which a test is measuring what it purports to measure

-the higher the item-validity index, the greater the test’s criterion-related validity

-calculating the item-validity index will be important when the test developer’s goal is to maximize the criterion-related validity of the test

The Item-Discrimination Index
-measures of item discrimination indicate how adequately an item separates or discriminates between high scorers and low scorers on an entire test

-item-discrimination index (d): a measure of item discrimination

-compares performance on a particular item with performance in the upper and lower regions of a distribution of continuous test scores

-measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly; the higher the value of d, the greater the number of high scorers answering the item correctly

-negative d-value on a particular item is a red flag because it indicates that low-scoring examinees are more likely to answer the item correctly than high-scoring examinees; items then need to be revised or eliminated

-higher the value of d, the more adequately the item discriminates the higher-scoring from the lower-scoring testtakers

-ideal is .3+

Item-Characteristic Curves (ICCs)
-can play a role in decisions about which items are working well and which items are not

-item-characteristic curve is a graphic representation of item difficulty and discrimination

-the steeper the slope, the greater the item discrimination

-item may also vary in terms of its difficulty level

-difficult item will shift the ICC to the right along the horizontal axis

Other Considerations in Item Analysis

Guessing
-criteria that any correction for guessing must meet, as well as the other interacting issues that must be addressed:

1. A correction for guessing must recognize that, when a respondent guesses at an answer on an achievement test, the guess is not typically made on a totally random basis. It is more reasonable to assume that the testtaker’s guess is based on some knowledge of the subject matter and the ability to rule out one or more of the distractor alternatives. However, the individual testtaker’s amount of knowledge of the subject matter will vary from one item to the next.

2. A correction for guessing must also deal with the problem of omitted items. Sometimes, instead of guessing, the testtaker will simply omit a response to an item.

3. Some testtakers may be luckier than others in guessing the choices that are keyed correct.

-a responsible test developer addresses the problem of guessing by including in the test manual:
 explicit instructions regarding this point for the examiner to convey to the examinees
 specific instructions for scoring and interpreting omitted items

Item fairness
-refers to the degree, if any, to which a test item is biased

-biased test item is an item that favors one particular group of examinees in relation to another when differences in group ability are controlled

-choice of item-analysis method may affect determinations of item bias

-item-characteristic curves can be used to identify biased items

Speed tests
-the test developer ideally should administer the test to be item-analyzed with generous time limits to complete the test

Qualitative Item Analysis
-qualitative methods are techniques of data generation and analysis that rely primarily on verbal rather than mathematical or statistical procedures

-general term for various nonstatistical procedures designed to explore how individual test items work

-analysis compares individual test items to each other and to the test as a whole

-exploration of the issues through verbal means such as interviews and group discussions conducted with testtakers and other relevant parties

“Think aloud” test administration
-having respondents verbalize thoughts as they occur; sheds light on the testtaker’s thought processes during the administration of a test
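The quantitative indices from this section reduce to simple proportions: difficulty p is the proportion of testtakers answering the item correctly, and discrimination d is the proportion correct in the upper-scoring group minus the proportion correct in the lower-scoring group. The response data below are invented for the example.

```python
# Hypothetical item-analysis sketch for one item.
# 1 = correct, 0 = incorrect (invented data).

def item_difficulty(responses):
    """p: proportion correct, ranging from 0 (no one) to 1 (everyone)."""
    return sum(responses) / len(responses)

def item_discrimination(upper_group, lower_group):
    """d = p(upper) - p(lower); a negative d is a red flag."""
    return item_difficulty(upper_group) - item_difficulty(lower_group)

all_responses = [1, 1, 1, 0, 1, 0, 1, 0, 1, 1]
upper = [1, 1, 1, 1, 0]   # top scorers on the test as a whole
lower = [1, 0, 0, 1, 0]   # bottom scorers

print(item_difficulty(all_responses))                 # 0.7
print(round(item_discrimination(upper, lower), 2))    # 0.4, above the .3 ideal
```

With p = .7 the item sits inside the .3 to .8 difficulty band mentioned above, and d = .4 clears the .3+ rule of thumb, so this invented item would survive the screen.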

Expert panels
-expert panels may also provide qualitative analyses of test items

-sensitivity review: a study of test items, typically conducted during the test development process, in which items are examined for fairness to all prospective testtakers and for the presence of offensive language, stereotypes, or situations

TEST REVISION
-tremendous amount of information is generated at the item-analysis stage

-next step is to administer the revised test under standardized conditions to a second appropriate sample of examinees

-on the basis of an item analysis of data derived from this administration of the second draft of the test, the test developer may deem the test to be in its finished form

-test’s norms may be developed from the data, and the test will be said to have been “standardized” on this (second) sample

-when the item analysis of data derived from a test administration indicates that the test is not yet in finished form, the steps of revision, tryout, and item analysis are repeated until the test is satisfactory and standardization can occur

Test Revision in the Life Cycle of an Existing Test
-tests are revised when significant changes in the domain represented, or new conditions of test use and interpretation, make the test inappropriate for its intended use

-many tests are deemed to be due for revision when any of the following conditions exist:

1. The stimulus materials look dated and current testtakers cannot relate to them.

2. The verbal content of the test, including the administration instructions and the test items, contains dated vocabulary that is not readily understood by current testtakers.

3. As popular culture changes and words take on new meanings, certain words or expressions in the test items or directions may be perceived as inappropriate or even offensive to a particular group and must therefore be changed.

4. The test norms are no longer adequate as a result of group membership changes in the population of potential testtakers.

5. The test norms are no longer adequate as a result of age-related shifts in the abilities measured over time, and so an age extension of the norms (upward, downward, or in both directions) is necessary.

6. The reliability or the validity of the test, as well as the effectiveness of individual test items, can be significantly improved by a revision.

7. The theory on which the test was originally based has been improved significantly, and these changes should be reflected in the design and content of the test.

-steps to revise an existing test parallel those to create a brand-new one

-formal item-analysis methods must be employed to evaluate the stability of items between revisions of the same test

-key step in the development of all tests, brand-new or revised editions, is cross-validation

Cross-validation and co-validation
-cross-validation refers to the revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion

-validity shrinkage: decrease in item validities that inevitably occurs after cross-validation of findings

-co-validation may be defined as a test validation process conducted on two or more tests using the same sample of testtakers

-when used in conjunction with the creation of norms or the revision of existing norms, this process may also be referred to as co-norming

Quality assurance during test revision
-a mechanism for ensuring consistency in scoring is the anchor protocol

-anchor protocol: test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies

-discrepancy between scoring in an anchor protocol and the scoring of another protocol is referred to as scoring drift

The Use of IRT in Building and Revising Tests
-Using IRT, test developers evaluate individual item performance with reference to item-characteristic curves (ICCs)

-Three of the many possible applications of IRT in building and revising tests include:
 evaluating existing tests for the purpose of mapping test revisions
 determining measurement equivalence across testtaker populations
 developing item banks

Evaluating the properties of existing tests and guiding test revision
-IRT information curves can help test developers evaluate how well an individual item (or entire test) is working to measure different levels of the underlying construct

Determining measurement equivalence across testtaker populations
-help ensure that the same construct is being measured, no matter what language the test has been translated into

differential item functioning (DIF)
-an item functions differently in one group of testtakers as compared to another group of testtakers known to have the same (or similar) level of the underlying trait

DIF analysis
-test developers scrutinize group-by-group item response curves, looking for what are termed DIF items

DIF items
-items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership

-another application of DIF analysis has to do with the evaluation of item-ordering effects and the effects of different test administration procedures

Developing item banks
-each of the items assembled as part of an item bank, whether taken from an existing test or written especially for the item bank, has undergone rigorous qualitative and quantitative evaluation
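An item-characteristic curve of the kind these IRT methods rely on can be sketched with the widely used two-parameter logistic model; the parameter values below are invented for illustration. Difficulty b shifts the curve to the right along the trait axis, and discrimination a controls how steep the slope is, exactly as the ICC section above describes.

```python
import math

# Two-parameter logistic ICC: probability of a correct/keyed response at
# trait level theta. a = discrimination (slope), b = difficulty (location).
# Parameter values are hypothetical.

def icc(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

easy_item  = dict(a=1.0, b=-1.0)
hard_item  = dict(a=1.0, b=1.0)    # same slope, curve shifted right
sharp_item = dict(a=2.5, b=0.0)    # steeper slope -> better discrimination

for theta in (-2, 0, 2):           # three trait levels along the horizontal axis
    print(theta,
          round(icc(theta, **easy_item), 2),
          round(icc(theta, **hard_item), 2),
          round(icc(theta, **sharp_item), 2))
```

A DIF analysis compares curves like these group by group: if respondents from two groups at the same trait level theta have different probabilities of endorsing the item, the item is flagged as a DIF item.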


-new items may also be written when existing measures are either not available or do not tap targeted aspects of the construct being measured

INTELLIGENCE AND ITS MEASUREMENT
-a multifaceted capacity that manifests itself in different ways across the life span

-includes the abilities to:
 acquire and apply knowledge
 reason logically
 plan effectively
 infer perceptively
 make sound judgments and solve problems
 grasp and visualize concepts
 pay attention
 be intuitive
 find the right words and thoughts with facility
 cope with, adjust to, and make the most of new situations

Galton
-believed that the most intelligent persons were those equipped with the best sensory abilities

Binet
-when one solves a particular problem, the abilities used cannot be separated because they interact to produce the solution

Wechsler
-explicit reference to an “aggregate” or “global” capacity

-intelligence, operationally defined, is the aggregate or global capacity of the individual to act purposefully, to think rationally and to deal effectively with his environment

Piaget
-evolving biological adaptation to the outside world

PERSPECTIVES ON INTELLIGENCE

Interactionism
-refers to the complex concept by which heredity and environment are presumed to interact and influence the development of one’s intelligence

Louis L. Thurstone
-conceived of intelligence as composed of what he termed primary mental abilities (PMAs)

-early model of multiple abilities

Factor-analytic theories
-focus is squarely on identifying the ability or groups of abilities deemed to constitute intelligence

Charles Spearman
-measures of intelligence tended to correlate to various degrees with each other

-theory of general intelligence that postulated the existence of a general intellectual ability factor (denoted by an italic lowercase g) that is partially tapped by all other mental abilities

-referred to as a two-factor theory of intelligence

-tests that exhibited high positive correlations with other intelligence tests were thought to be highly saturated with g, whereas tests with low or moderate correlations with other intelligence tests were viewed as possible measures of specific factors

-greater the magnitude of g in a test of intelligence, the better the test was thought to predict overall intelligence

-abstract-reasoning problems were thought to be the best measures of g in formal tests

-acknowledged the existence of an intermediate class of factors common to a group of activities but not to all, called group factors: neither as general as g nor as specific as s (e.g., linguistic, mechanical, and arithmetical abilities)

Gardner
-developed a theory of multiple intelligences: logical-mathematical, bodily-kinesthetic, linguistic, musical, spatial, interpersonal, and intrapersonal

-wrote about interpersonal intelligence and intrapersonal intelligence, which have found expression in popular books written by others on the subject of so-called emotional intelligence

Raymond B. Cattell and Horn
-theory of intelligence first proposed by Cattell and subsequently modified by Horn

-theory postulated the existence of two major types of cognitive abilities: crystallized intelligence and fluid intelligence

-abilities that make up crystallized intelligence (Gc) include acquired skills and knowledge that are dependent on exposure to a particular culture as well as on formal and informal education

-retrieval of information and application of general knowledge are conceived of as elements of crystallized intelligence

-abilities that make up fluid intelligence (Gf) are nonverbal, relatively culture-free, and independent of specific instruction

-proposed the addition of several factors: visual processing (Gv), auditory processing (Ga), quantitative processing (Gq), speed of processing (Gs), facility with reading and writing (Grw), short-term memory (Gsm), and long-term storage and retrieval (Glr)

-some of the abilities (such as Gv) are vulnerable abilities in that they decline with age and tend not to return to preinjury levels following brain damage

-some abilities (such as Gq) are maintained abilities; they tend not to decline with age and may return to preinjury levels following brain damage

Three-stratum theory of cognitive abilities (Carroll)
-top stratum or level in Carroll’s model is g, or general intelligence

-second stratum is composed of eight abilities and processes: fluid intelligence (Gf), crystallized intelligence (Gc), general memory and learning (Y), broad visual perception (V), broad auditory perception (U), broad retrieval capacity (R), broad cognitive speediness (S), and processing/decision speed (T)

-below each of the abilities in the second stratum are many “level factors” and/or “speed factors”

-three-stratum theory is a hierarchical model, meaning that all of the abilities listed in a stratum are subsumed by or incorporated in the strata above

-Cattell-Horn-Carroll (CHC) model of cognitive abilities; g has no place in the Cattell-Horn model

McGrew-Flanagan CHC model
-makes no provision for the general intellectual ability factor (g)

-model was the product of efforts designed to improve the practice of psychological assessment in education (sometimes referred to as psychoeducational assessment) by identifying tests from different batteries that could be used to provide a comprehensive assessment of a student’s abilities

-cross-battery assessment: assessment that employs tests from different test batteries and entails interpretation of data from specified subtests to provide a comprehensive assessment

-English translation of the Binet-Simon test authored by Lewis


Thorndike Terman
-intelligence can be conceived in terms of three clusters of ability:
social intelligence (dealing with people), concrete intelligence -first published intelligence test to provide organized and detailed
(dealing with objects), and abstract intelligence (dealing with verbal administration and scoring instructions
and mathematical symbols)
-first American test to employ the concept of IQ
-incorporated a general mental ability factor (g) into the theory,
defining it as the total number of modifiable neural connections or -first test to introduce the concept of an alternate item, an item to be
“bonds” available in the brain substituted for a regular item under specified conditions

-one’s ability to learn is determined by the number and speed of the -earlier versions of the Stanford-Binet had employed the ratio IQ,
bonds that can be marshaled which was based on the concept of mental age: the age level at
which an individual appears to be functioning intellectually as
indicated by the level of items responded to correctly
Information-processing theories
-the focus is on identifying the specific mental processes that -ratio IQ: ratio of the testtaker’s mental age divided by his or her
constitute intelligence

-Russian neuropsychologist Aleksandr Luria

-focuses on the mechanisms by which information is processed—how
information is processed, rather than what is processed

-two basic types of information-processing styles:

 simultaneous (or parallel) processing: information is
integrated all at one time; “synthesized.” Information is
integrated and synthesized at once and as a whole

-tasks that involve the simultaneous mental representation
of images or information involve simultaneous processing

 successive (or sequential) processing: each bit of
information is individually processed in sequence

-logical and analytic in nature; piece by piece and one piece
after the other, information is arranged and rearranged so
that it makes sense

PASS model of intellectual functioning
-Planning: strategy development for problem solving

-Attention: (also referred to as arousal) receptivity to information

-Simultaneous and Successive: type of information processing employed

MEASURING INTELLIGENCE
-measurement of intelligence entails sampling an examinee’s
performance on different types of tests and tasks as a function of
developmental level

SOME TASKS USED TO MEASURE INTELLIGENCE
-in infancy, intellectual assessment consists primarily of measuring
sensorimotor development

-nonverbal motor responses such as turning over, lifting the head,
sitting up, following a moving object with the eyes, imitating
gestures, and reaching for a group of objects

-evaluation of the older child shifts to verbal and performance
abilities

-adult intelligence scales should tap abilities such as retention of
general information, quantitative reasoning, expressive language and
memory, and social judgment

-tests of intelligence are seldom administered to adults for purposes
of educational placement; may be given to obtain clinically relevant
information or some measure of learning potential and skill
acquisition

SOME TESTS USED TO MEASURE INTELLIGENCE
The Stanford-Binet Intelligence Scales: Fifth Edition (SB5)
-Binet collaborated with Theodore Simon to create the world’s first
formal test of intelligence

-ratio IQ: mental age divided by chronological age, multiplied by
100 to eliminate decimals in its computation

-child whose mental age and chronological age were equal would
thus have an IQ of 100

-deviation IQ: comparison of the performance of the individual with
the performance of others of the same age in the standardization
sample

-M: 100, SD: 16

-SB5 is exemplary in terms of what is called adaptive testing: testing
individually tailored to the testtaker

The Wechsler Tests
-Bellevue Hospital in Manhattan needed an instrument for evaluating
the intellectual capacity of its multilingual, multinational, and
multicultural clients

-the W-B 1 was a point scale, not an age scale; items were classified
by subtests rather than by age, and the scale was organized into six
verbal subtests and five performance subtests; all the items in each
subtest were arranged in order of increasing difficulty

-WAIS-IV is the current Wechsler adult scale

-WAIS-IV is made up of subtests that are designated either as core
or supplemental

-core subtest: administered to obtain a composite score

-supplemental subtest: (also sometimes referred to as an optional
subtest) used for purposes such as providing additional clinical
information or extending the number of abilities or processes
sampled

-WAIS-IV contains ten core subtests:
 Block Design
 Similarities
 Digit Span
 Matrix Reasoning
 Vocabulary
 Arithmetic
 Symbol Search
 Visual Puzzles
 Information
 Coding

and five supplemental subtests:
 Letter-Number Sequencing
 Figure Weights
 Comprehension
 Cancellation
 Picture Completion

-scoring of subtests yields four index scores: a Verbal
Comprehension Index, a Working Memory Index, a Perceptual
Reasoning Index, and a Processing Speed Index
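The ratio IQ and deviation IQ described in this section can be sketched computationally. The sketch below is illustrative only: the function names and sample numbers are mine, and the standard-deviation defaults follow the values given in these notes (SD 16 on the SB5 metric; Wechsler composites such as the CPI use SD 15).

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age,
    multiplied by 100 to eliminate decimals."""
    return mental_age / chronological_age * 100


def deviation_iq(raw_score, norm_mean, norm_sd, mean=100, sd=16):
    """Deviation IQ: locate the raw score within the same-age
    standardization sample as a z score, then rescale it to the
    test's metric (M = 100, SD = 16 per the SB5 values in these
    notes; pass sd=15 for a Wechsler-style metric)."""
    z = (raw_score - norm_mean) / norm_sd
    return mean + sd * z


# A child whose mental age equals their chronological age:
print(ratio_iq(10, 10))                            # 100.0
# A raw score one SD above the same-age norm group:
print(deviation_iq(60, norm_mean=50, norm_sd=10))  # 116.0
```

The deviation IQ expresses standing relative to age peers, so the same numerical score carries the same relative meaning at every age, which the ratio IQ does not guarantee.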

-fifth index score, the General Ability Index (GAI): kind of
“composite of two composites”; calculated using the Verbal
Comprehension and Perceptual Reasoning Indexes

-GAI is useful to clinicians as an overall index of intellectual ability

-another composite score that has clinical application is the
Cognitive Proficiency Index (CPI): comprised of the Working
Memory Index and the Processing Speed Index, the CPI is used to
identify problems related to working memory or processing speed

-CPI was calibrated to have a M: 100 and SD: 15

Short forms of intelligence tests
-test that has been abbreviated in length, typically to reduce the time
needed for test administration, scoring, and interpretation

-validity of a test is affected by and is somewhat dependent on the
test’s reliability

-changes in a test that lessen its reliability may also lessen its validity

-reducing the number of items in a test typically reduces the test’s
reliability and hence its validity

Group tests of intelligence
-Robert M. Yerkes: began efforts to mobilize psychologists to help
in the war effort

-Army Alpha test: administered to Army recruits who could read;
contained tasks such as general information questions, analogies, and
scrambled sentences to reassemble

-Army Beta test: designed for administration to foreign-born recruits
with poor knowledge of English or to illiterate recruits; contained
tasks such as mazes, coding, and picture completion

-original objective of the Alpha and Beta tests was to measure the
ability to be a good soldier

-group tests of intelligence are extensively used in schools and related
educational settings

-now also referred to as a school ability test

-group intelligence test results provide school personnel with
valuable information for instruction-related activities and increased
understanding of the individual pupil

-first group intelligence test to be used in U.S. schools was the Otis-
Lennon School Ability Test, formerly the Otis Mental Ability Test

-designed to measure abstract thinking and reasoning ability and to
assist in school evaluation and placement decision-making

Other measures of intellectual abilities
-tests designed to measure creativity may well measure variables
related to intelligence; measures of creativity may also be thought of
as tools for assessing intelligence

-four terms common to many measures of creativity are:
 Originality: ability to produce something that is innovative
or nonobvious

 Fluency: ease with which responses are reproduced and is
usually measured by the total number of responses produced

 Flexibility: variety of ideas presented and the ability to
shift from one approach to another

 Elaboration: richness of detail in a verbal explanation or
pictorial display

-the thought process typically required in achievement tests is
convergent thinking: deductive reasoning process that entails recall
and consideration of facts as well as a series of logical judgments to
narrow down solutions and eventually arrive at one solution

-divergent thinking: thought is free to move in many different
directions, making several solutions possible; requires flexibility of
thought, originality, and imagination

Remote Associates Test (RAT)
-presents the testtaker with three words; the task is to find a fourth
word associated with the other three

ISSUES IN THE ASSESSMENT OF INTELLIGENCE
-measured intelligence may vary as a result of factors related to the
measurement process

Culture and Measured Intelligence
Culture loading
-extent to which a test incorporates the vocabulary, concepts,
traditions, knowledge, and feelings associated with a particular
culture; subjective, qualitative, nonnumerical judgment

Culture-fair intelligence test
-designed to minimize the influence of culture with regard to various
aspects of the evaluation procedures, such as administration
instructions, item content, responses required of testtakers, and
interpretations made from the resulting data; nonverbal

-been found to lack the hallmark of traditional tests of intelligence:
predictive validity

-one culture-specific intelligence test developed expressly for use
with African-Americans was the Black Intelligence Test of Cultural
Homogeneity: test was measuring a variable that could be
characterized as streetwiseness

The Flynn Effect
-intelligence inflation

-measured intelligence seems to rise on average, year by year,
starting with the year for which the test is normed

-progressive rise in intelligence test scores that is expected to occur
on a normed test of intelligence from the date when the test was
first normed

The Construct Validity of Tests of Intelligence
-evaluation of a test’s construct validity proceeds on the assumption
that one knows in advance exactly what the test is supposed to
measure

ASSESSMENT FOR EDUCATION
Integrative assessment
-employs not only various tools of assessment but also input from
various school personnel, as well as parents and other relevant
sources of information

Dynamic Assessment
-originally developed for use with children

-exploring learning potential that is based on a test-intervention-retest
model

Achievement tests
-designed to measure accomplishment

-measure the degree of learning that has taken place

-items may be characterized by the type of mental processes required
by the testtaker to successfully retrieve the information needed to
respond to the item

-fact-based items and conceptual items

Aptitude Tests
-tend to focus on informal learning or life experiences, whereas
achievement tests tend to focus on the learning that has
occurred as a result of relatively structured input

-referred to as prognostic tests; used to make predictions

-tend to draw on a broader fund of information and abilities and may
be used to predict a wider variety of variables

THE PRESCHOOL LEVEL
Apgar number
-“everybody’s first test”

-conducted at 1 minute after birth to assess how well the infant
tolerated the birthing process

-evaluation is conducted again at 5 minutes after birth to assess how
well the infant is adapting to the environment

-each evaluation is made with respect to the same five variables; each
variable can be scored on a range from 0 to 2; and each score (at 1
minute and 5 minutes) can range from 0 to 10

-activity (or muscle tone), pulse (or heart rate), grimace (or reflex
irritability), appearance (or color), respiration

*Approximately one hour is a good rule-of-thumb limit for an
entire test session with a preschooler; less time is preferable

*As testing time increases, so does the possibility of fatigue and
distraction

THE ELEMENTARY-SCHOOL LEVEL
Metropolitan Readiness Tests (MRT6)
-assesses the development of the reading and mathematics skills
important in the early stages of formal school learning

-orally administered

-runs about 90 minutes

THE SECONDARY-SCHOOL LEVEL
SAT
-aptitude test widely used in the schools at the secondary level

-test has been of value not only in the college selection process but
also as an aid to high-school guidance and job placement counselors

ACT Assessment
-formerly known as the American College Testing Program

-curriculum-based questions directly based on typical high-school
subject areas

-scores on the ACT may be predictive of creativity as well as
academic success

COLLEGE LEVEL AND BEYOND
The Miller Analogies Test (MAT)
-draws not only on the examinee’s ability to perceive relationships
but also on general intelligence, vocabulary, and academic learning

-one of the most cost-effective of all existing aptitude tests when it
comes to forecasting success in graduate school

Diagnostic Tests
-used in educational contexts to pinpoint a student’s difficulty,
usually for remedial purposes

-tool used to identify areas of deficit to be targeted for intervention

Reading Tests
The Woodcock Reading Mastery Tests, Third Edition (WRMT-III)
-measure of reading readiness, reading achievement, and reading
difficulties

-takes between 15 and 45 minutes to administer the entire battery

-used with children as young as 4½, adults as old as 80

-subtests:
 Letter Identification
 Word Identification
 Word Attack
 Word Comprehension

-three subtests new to the third edition are Phonological Awareness,
Listening Comprehension, and Oral Reading Fluency

Psychoeducational Test Batteries
-test kits that generally contain two types of tests: those that measure
abilities related to academic success and those that measure
educational achievement in areas such as reading and arithmetic

-data derived from these batteries allow for normative comparisons as
well as an evaluation of the testtaker’s own strengths and weaknesses

Kaufman Assessment Battery for Children (K-ABC)
-use with testtakers from age 2½ through age 12½

-measurement of intelligence and achievement

-divided into two groups, reflecting the two kinds of information-
processing skills identified by Luria and his students: simultaneous
skills and sequential skills

Kaufman Assessment Battery for Children, Second Edition
(KABC-II)
-age range for the second edition of the test was extended upward
(ages 3 to 18) to expand the possibility of making ability/achievement
comparisons with the same test through high school

-10 new subtests were created, 8 of the existing subtests were
removed, and only 8 of the original subtests remained

Woodcock-Johnson IV (WJ IV)
-WJ III was a psychoeducational test package consisting of two co-
normed batteries (the Tests of Achievement and the Tests of
Cognitive Abilities)

-WJ IV consists of three co-normed test batteries

-battery of tests designed to measure oral language ability, listening
comprehension, and speed of lexical access

-total of 12 tests in the Tests of Oral Language battery, including
nine English language tests and three parallel tests in Spanish

-battery may be used to gauge proficiency in English or Spanish, and
to evaluate various aspects of oral language

-may be used with persons as young as 2, and as old as 90 (or older)

Other Tools of Assessment in Educational Settings
 Performance task
 Portfolio

 Authentic Assessment
-performance-based assessment

-evaluation of relevant, meaningful tasks that may be conducted to
evaluate learning of academic subject matter but that demonstrate the
student’s transfer of that study to real-world activities

-thought to increase student interest and the transfer of knowledge to
settings outside the classroom
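The Apgar arithmetic described earlier under THE PRESCHOOL LEVEL (five variables, each rated 0, 1, or 2, summed into a 0-10 total at 1 minute and again at 5 minutes) can be sketched as follows; the function name and the sample ratings are mine, not from the notes:

```python
def apgar_score(activity, pulse, grimace, appearance, respiration):
    """Total Apgar score: each of the five variables (muscle tone,
    heart rate, reflex irritability, color, respiration) is rated
    0, 1, or 2, and the ratings are summed into a 0-10 total."""
    ratings = (activity, pulse, grimace, appearance, respiration)
    if any(r not in (0, 1, 2) for r in ratings):
        raise ValueError("each Apgar variable is rated 0, 1, or 2")
    return sum(ratings)


# Scored once at 1 minute after birth and again at 5 minutes:
print(apgar_score(2, 2, 1, 1, 2))  # 8
```

The five parameter names follow the APGAR mnemonic given in the notes: activity, pulse, grimace, appearance, respiration.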
 Peer Appraisal Techniques
-results of a peer appraisal can be graphically illustrated
through a sociogram

Measuring Study Habits, Interests, and Attitudes
Study Habits Checklist
-consists of 37 items that assess study habits with respect to note
taking, reading material, and general study practices

PERSONALITY ASSESSMENT
-the measurement and evaluation of psychological
traits, states, values, interests, attitudes, worldview, acculturation,
sense of humor, cognitive and behavioral styles, and/or related
individual characteristics

Personality
-individual’s unique constellation of psychological traits that is
relatively stable over time

Personality traits
-distinguishable, relatively enduring way in which one individual
varies from another

Personality types
-constellation of traits that is similar in pattern to one identified
category of personality within a taxonomy of personalities

-whereas traits are frequently discussed as if they were characteristics
possessed by an individual, types are more clearly descriptions of
people

-typology devised by Carl Jung became the basis for the Myers-
Briggs Type Indicator

-MBTI: an assumption guiding the development of this test was that
people exhibit definite preferences in the way that they perceive or
become aware of—and judge or arrive at conclusions

Meyer Friedman and Ray Rosenman
-conceived of a Type A personality: competitiveness, haste,
restlessness, impatience, feelings of being time-pressured, and strong
needs for achievement and dominance

-Type B personality: has the opposite of the Type A’s traits; mellow
or laid-back

Profile
-narrative description, graph, table, or other representation of the
extent to which a person has demonstrated certain targeted
characteristics as a result of the administration or application of tools
of assessment

Personality profile
-targeted characteristics are typically traits, states, or types

Personality states
-transitory exhibition of some personality trait

-the use of the word trait presupposes a relatively enduring behavioral
predisposition, whereas the term state is indicative of a relatively
temporary predisposition

PERSONALITY ASSESSMENT: SOME BASIC QUESTIONS
 Who is being assessed, and who is doing the assessing?
-some methods of personality assessment rely on the assessee’s own
self-report: process wherein information about assessees is supplied
by the assessees themselves

-self-reported information may be obtained in the form of
diaries kept by assessees or in the form of responses to oral or written
questions or test items

-self-report methods are very commonly used to explore an
assessee’s self-concept: one’s attitudes, beliefs, opinions, and related
thoughts about oneself

Another person as the referent
-in some situations, the best available method for the assessment of
personality, behavior, or both involves reporting by a third party such
as a parent, teacher, peer, supervisor, spouse, or trained observer

 What is assessed when a personality assessment is conducted?
 Where are personality assessments conducted?
 How are personality assessments structured and conducted?

Nomothetic approach
-learn how a limited number of personality traits can be applied to all
people

Idiographic approach
-learn about each individual’s unique constellation of personality
traits, with no attempt to characterize each person according to any
particular set of traits

Ipsative approach
-testtaker’s responses, as well as the presumed strength of measured
traits, are interpreted relative to the strength of measured traits for
that same individual

DEVELOPING INSTRUMENTS TO ASSESS PERSONALITY
Logic and Reason
-may dictate what content is covered by the items

-the use of logic and reason in the development of test items is
sometimes referred to as the content or content-oriented approach
to test development

Theory
-personality measures differ in the extent to which they rely on a
particular theory of personality in their development as well as their
interpretation

Data Reduction Methods
-include several types of statistical techniques collectively
known as factor analysis or cluster analysis

-one use of data reduction methods in the design of personality
measures is to aid in the identification of the minimum number of
variables or factors that account for the intercorrelations in observed
phenomena

The Big Five (NEO PI-R)
-widely used in both clinical applications and a wide range of
research that involves personality assessment

-essentially self-administered

 Neuroticism: referred to as the Emotional Stability factor;
adjustment and emotional stability, including how people
cope in times of emotional turmoil

 Extraversion: sociability, how proactive people are in
seeking out others, as well as assertiveness

 Openness: referred to as the Intellect factor; openness to
experience as well as active imagination, aesthetic
sensitivity, attentiveness to inner feelings, preference for
variety, intellectual curiosity, and independence of
judgment

 Agreeableness: interpersonal tendencies that include
altruism, sympathy toward others, friendliness, and the
belief that others are similarly inclined

 Conscientiousness: active processes of planning,
organizing, and following through
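The ipsative approach described in this section interprets each trait score relative to the strength of the same person's other measured traits. One simple way to express that idea in code is to center each score on the testtaker's own mean; this centering scheme, along with the trait labels and numbers, is my own illustration rather than a prescribed scoring procedure:

```python
from statistics import mean


def ipsative_view(trait_scores):
    """Re-express each trait score as a deviation from the testtaker's
    own mean, so traits are compared within the individual rather than
    against a normative sample."""
    person_mean = mean(trait_scores.values())
    return {trait: score - person_mean
            for trait, score in trait_scores.items()}


# Hypothetical scores on the five NEO PI-R domains for one testtaker:
scores = {"N": 40, "E": 60, "O": 55, "A": 50, "C": 45}
print(ipsative_view(scores))
```

Here E would be this person's relatively strongest trait and N the relatively weakest, regardless of how the same scores compare with other people's (the nomothetic question).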

PERSONALITY ASSESSMENT AND CULTURE
Acculturation
-an ongoing process by which an individual’s thoughts, behaviors,
values, worldview, and identity develop in relation to the general
thinking, behavior, customs, and values of a particular cultural group

-process of acculturation begins at birth, a time at which the newborn
infant’s family or caretakers serve as agents of the culture

-through the process of acculturation, one develops culturally
accepted ways of thinking, feeling, and behaving
