Psych Testing Reviewer Midterm

The document provides a comprehensive overview of statistical concepts relevant to psychological testing, including measures of central tendency, variability, reliability, and validity. It details various scales of measurement, methods for ensuring reliability, and the importance of validity in test construction. Additionally, it discusses utility analysis and practical considerations in test development and evaluation.


Reviewer for Psychological Testing and Assessment

CHAPTER 3: A Statistics Refresher

Scales of Measurement
• Measurement: Assigning numbers or symbols to characteristics of things according to rules.
• Scale: A set of numbers or symbols representing empirical properties.

Types of Scales:
1. Nominal Scale: Classification or categorization (e.g., gender in a study).
2. Ordinal Scale: Rank ordering but no absolute zero (e.g., intelligence test rankings).
3. Interval Scale: Equal intervals between numbers but no absolute zero (e.g., IQ scores).
4. Ratio Scale: Has a true zero point (e.g., time taken to complete a puzzle).

Describing Data
• Distribution: A set of test scores arranged for study.
• Raw Score: An unmodified numerical representation of performance.
• Frequency Distributions:
o Simple Frequency Distribution: Lists all scores and their occurrences.
o Grouped Frequency Distribution: Groups test scores into class intervals.
• Graph Types:
1. Histogram: Contiguous vertical bars drawn at the true limits of the test scores.
2. Bar Graph: Non-contiguous rectangular bars showing categorical data.
3. Frequency Polygon: A continuous line connecting score points.

Measures of Central Tendency
1. Mean: The average score.
2. Median: The middle score in a distribution.
3. Mode: The most frequently occurring score.

Measures of Variability
• Range: Difference between the highest and lowest scores.
• Interquartile Range (IQR): Difference between Q3 and Q1 (the middle 50% of scores).
• Semi-Interquartile Range: The IQR divided by 2.
• Standard Deviation: The square root of the variance; indicates how widely scores are dispersed.
• Skewness:
o Positive skew: Few scores at the high end (tail points right).
o Negative skew: Few scores at the low end (tail points left).
• Kurtosis:
o Platykurtic: Relatively flat distribution.
o Leptokurtic: Relatively peaked distribution.
o Mesokurtic: In between; characteristic of the normal distribution.

The Normal Curve
• A bell-shaped, symmetrical, mathematically defined curve.
• Developed by de Moivre and Laplace, and later elaborated by Pearson.

Standard Scores
• Standard Score: Converts raw scores into a scale with a set mean and standard deviation.
• Types of Standard Scores:
1. Z-score: Indicates how many standard deviations a raw score lies from the mean (mean = 0, SD = 1).
2. T-score: Mean of 50, standard deviation of 10.
3. Stanine: A standardized nine-point scale (1 to 9), with a mean of 5 and an SD of about 2.
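To make the z-score and T-score conversions concrete, here is a minimal Python sketch; the raw scores are invented for illustration:

```python
import statistics

raw_scores = [68, 75, 79, 82, 88, 91]  # hypothetical raw test scores

mean = statistics.mean(raw_scores)
sd = statistics.pstdev(raw_scores)  # population SD, as used with a fixed norm group

for raw in raw_scores:
    z = (raw - mean) / sd  # z-score: mean 0, SD 1
    t = 50 + 10 * z        # T-score: mean 50, SD 10
    print(f"raw={raw}  z={z:+.2f}  T={t:.1f}")
```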
CHAPTER 5: Reliability

I. Key Concepts
• Reliability: The consistency of a measurement.
• Reliability Coefficient: The ratio of true score variance to total observed score variance.
• Variance: A statistic describing the sources of test score variability.
o True Variance: Variance due to actual differences in the trait being measured.
o Error Variance: Variance caused by random, irrelevant factors.

II. Sources of Error Variance
1. Test Construction
2. Test Administration
3. Test Scoring and Interpretation

III. Types of Reliability Estimates
1. Test-Retest Reliability
o Correlates scores from the same test taken at two different times.
o Coefficient of Stability: The term used when the time interval is greater than six months.
2. Parallel-Forms & Alternate-Forms Reliability
o Coefficient of Equivalence: Measures the relationship between different forms of a test.
o Parallel Forms: Identical means and variances; scores on each form correlate equally with the true score.
o Alternate Forms: Different versions of a test, designed to be parallel.
o Both require two test administrations and may be affected by factors such as fatigue or practice.
3. Internal Consistency Reliability
o Measures the correlation among test items without needing multiple test administrations.
o Homogeneous test: Items measure a single trait.
o Heterogeneous test: Items measure multiple factors.

IV. Methods of Internal Consistency Reliability
1. Split-Half Reliability
o Divides the test into two halves and correlates scores on the halves.
o Uses the Pearson r together with the Spearman-Brown formula.
o Odd-Even Reliability: A specific type of split-half reliability in which the halves are the odd- and even-numbered items.
2. Spearman-Brown Formula
o Estimates how reliability changes as test length changes.
o Not suitable for heterogeneous tests or speed tests.
3. Kuder-Richardson Formulas
o KR-20: Measures inter-item consistency for dichotomously scored items.
o KR-21: A simplified version used when items are assumed to be of equal difficulty.
4. Coefficient Alpha (Cronbach's Alpha)
o Equivalent to the mean of all possible split-half correlations.
o Used for Likert-scale tests and other non-dichotomous items.
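A minimal sketch of two of the internal-consistency formulas above, assuming item-level data are available; the numbers are invented for illustration:

```python
import statistics

def spearman_brown(r, n=2):
    """Predicted reliability when test length is multiplied by n.
    With n=2, this steps a half-test correlation up to the
    full-length (split-half) reliability estimate."""
    return (n * r) / (1 + (n - 1) * r)

def cronbach_alpha(items):
    """items: one list of scores per item, all over the same test-takers.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items)
    item_vars = sum(statistics.pvariance(item) for item in items)
    totals = [sum(person) for person in zip(*items)]
    return (k / (k - 1)) * (1 - item_vars / statistics.pvariance(totals))

# A half-test correlation of .70 implies roughly .82 at full length:
print(round(spearman_brown(0.70), 2))  # 0.82
```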
V. Measures of Inter-Scorer Reliability
• Inter-Scorer Reliability: The degree of agreement between multiple raters.
• Coefficient of Inter-Scorer Reliability: A correlation coefficient measuring scorer consistency.
• Kappa Statistic: Used to calculate inter-scorer agreement while correcting for chance agreement.

VI. Standard Error of Measurement (SEM)
• Estimates the amount of error in an observed score.
• Measures the precision of individual test scores.
• Computed as SEM = SD√(1 − r), where SD is the test's standard deviation and r is its reliability coefficient.

CHAPTER 6: Validity

I. Definition of Validity
• Validity: The extent to which a test measures what it claims to measure in a specific context.
• Trinitarian View of Validity:
1. Content Validity – Evaluates the extent to which a test covers the subject matter.
2. Criterion-Related Validity – Examines the relationship between test scores and external measures.
3. Construct Validity – Assesses whether a test aligns with theoretical concepts.

II. Types of Validity

1. Face Validity
• Refers to how well a test appears to measure a certain trait from the perspective of test-takers.
• More about perception than actual psychometric soundness.

2. Content Validity
• Determines whether test items representatively sample the subject matter.
• Used in objective, achievement, and aptitude tests.
• Two key tools:
o Table of Specifications (TOS)
o Subject Matter Experts (SMEs)
• Content Validity Ratio (CVR): A formula used to quantify content validity from expert panel ratings.
o Negative CVR: Fewer than half the panelists rate an item as essential.
o Zero CVR: Exactly half of the panelists rate an item as essential.
o Positive CVR: More than half, but not all, panelists rate an item as essential.
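Those three cases fall directly out of Lawshe's formula, CVR = (nₑ − N/2) / (N/2); a short sketch with an invented ten-person panel:

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: ranges from -1 (no panelist rates the item
    essential) to +1 (every panelist does); 0 means exactly half."""
    half = n_panelists / 2
    return (n_essential - half) / half

print(content_validity_ratio(8, 10))  # 0.6  -> positive CVR
print(content_validity_ratio(5, 10))  # 0.0  -> zero CVR
print(content_validity_ratio(3, 10))  # -0.4 -> negative CVR
```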
3. Criterion-Related Validity
• Evaluates how well test scores predict outcomes based on a criterion.
• Criterion: A standard used for evaluating the accuracy of test scores.
• Characteristics of a good criterion:
1. Relevant – Applicable to the test's purpose.
2. Valid – Meaningful for its intended purpose.
3. Uncontaminated – Not influenced by the predictor measures.
• Types of Criterion-Related Validity:
1. Concurrent Validity: Compares test scores with criterion scores collected at the same time.
2. Predictive Validity: Measures how well test scores predict future criterion scores.
• Statistical Measures of Criterion-Related Validity:
o Validity Coefficient: The correlation between test scores and criterion scores.
o Incremental Validity: Determines how much a new predictor improves predictive ability.
o Expectancy Data: Uses expectancy tables to predict the probability of certain outcomes.

4. Construct Validity
• Assesses whether test scores meaningfully relate to a theoretical concept.
• Construct: A theoretical trait or ability that is not directly observable (e.g., intelligence, motivation).
• Evidence of Construct Validity (the first two are illustrated in the sketch below):
1. Convergent Evidence – High correlation with similar constructs.
2. Divergent Evidence – Low correlation with unrelated constructs.
3. Factor Analysis – Identifies test components that contribute to a construct.
4. Evidence of Homogeneity – Shows that test items measure a single construct.
5. Evidence of Changes with Age – Constructs develop predictably over time.
6. Evidence from Pretest-Posttest Changes – Scores change in response to interventions.
7. Evidence from Distinct Groups – The test differentiates between groups known to differ on the construct.
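A small Python sketch of convergent and divergent evidence: correlate a hypothetical new anxiety scale with an established anxiety measure (expect high r) and with an unrelated vocabulary test (expect near-zero r). All scores below are invented:

```python
def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

new_scale   = [12, 18, 9, 22, 15, 20, 7, 17]   # hypothetical new anxiety scale
established = [14, 19, 11, 24, 13, 21, 8, 18]  # established anxiety measure
vocabulary  = [33, 41, 28, 30, 37, 29, 35, 31] # unrelated vocabulary test

print(pearson_r(new_scale, established))  # high (~ .97): convergent evidence
print(pearson_r(new_scale, vocabulary))   # near zero: divergent evidence
```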
III. Validity, Bias, and Fairness

1. Test Bias
• Refers to systematic errors that unfairly advantage or disadvantage certain groups.
• Types of Rating Errors:
o Leniency Error – Overly generous ratings.
o Severity Error – Overly harsh ratings.
o Central Tendency Error – Avoiding extreme ratings.
o Halo Effect – Ratings influenced by unrelated characteristics.

2. Test Fairness
• Ensures that tests are administered and interpreted equitably for all individuals.

CHAPTER 8: Test Development

I. Stages of Test Development
The process of developing a psychological test involves five key stages:
1. Test Conceptualization – Defining the purpose and scope of the test.
2. Test Construction – Developing items and selecting the format.
3. Test Tryout – Administering a preliminary version of the test.
4. Item Analysis – Evaluating the quality of test items.
5. Test Revision – Modifying the test based on findings from item analysis.

II. Test Conceptualization
• The initial stage, in which the idea for a test is conceived.
• Important questions to consider:
o What will the test measure?
o Who will take and use the test?
o How will it be administered?
o What responses will it require?
o How will scores be interpreted?
• Norm-Referenced vs. Criterion-Referenced Tests:
o Norm-Referenced Tests: Compare test-takers' scores to a group norm.
o Criterion-Referenced Tests: Measure mastery of specific skills or knowledge.

Pilot Work
• Preliminary research to refine the test before full development.
• Involves literature reviews, experimenting with test items, and refining content.

III. Test Construction

Scaling Methods (Assigning Numbers to Measurements)
• Age-Based Scaling – Compares performance based on age.
• Grade-Based Scaling – Compares performance by educational level.
• Stanine Scaling – Transforms raw scores into a scale from 1 to 9.
• Unidimensional vs. Multidimensional Scales:
o Unidimensional – Measures a single construct.
o Multidimensional – Measures multiple constructs.

Common Scaling Methods
1. Rating Scale – Assesses the intensity of a trait (e.g., Likert scale).
2. Method of Paired Comparisons – Presents pairs of stimuli for comparison.
3. Comparative Scaling – Requires ranking of items.
4. Categorical Scaling – Assigns items to distinct categories.
5. Guttman Scaling – Arranges items from the weakest to the strongest expression of a trait.

IV. Writing Test Items
• Item Pool: The collection of potential test items.
• Two Major Item Formats:
1. Selected-Response Items: Multiple-choice, matching, true/false.
2. Constructed-Response Items: Short-answer, essay, completion-type.

Computerized Test Item Development
• Item Bank – A large collection of test items for future use.
• Item Branching – Adjusts test difficulty based on a test-taker's responses.
• Computerized Adaptive Testing (CAT):
o Dynamically selects items based on prior responses.
o Reduces floor effects (inability to measure at very low ability levels) and ceiling effects (inability to measure at very high ability levels).
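A toy sketch of the item-branching idea behind CAT: serve a harder item after a correct answer and an easier one after a miss. The item bank and stepping rule here are invented for illustration; real CAT systems select items using IRT-based information functions rather than a fixed step:

```python
# Toy item bank keyed by difficulty level (invented for illustration).
item_bank = {
    1: "Very easy item", 2: "Easy item", 3: "Medium item",
    4: "Hard item", 5: "Very hard item",
}

def next_level(level, answered_correctly):
    """Simple branching rule: step difficulty up after a correct
    response, down after an incorrect one, staying within the bank."""
    step = 1 if answered_correctly else -1
    return max(1, min(5, level + step))

level = 3  # start every test-taker at medium difficulty
for correct in [True, True, False, True]:  # simulated response pattern
    level = next_level(level, correct)
    print(f"next item: {item_bank[level]}")
```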
V. Test Tryout
• Administering the test to a sample that is similar to the target population.
• Must simulate real testing conditions as closely as possible.

VI. Item Analysis
Statistical techniques used to evaluate test items (see the sketch after this section):
1. Item Difficulty Index – The proportion of test-takers who answered the item correctly.
2. Item Reliability Index – Indicates the internal consistency of the test.
3. Item Validity Index – Assesses whether items measure the intended construct.
4. Item Discrimination Index – Determines how well an item differentiates high from low scorers.
5. Item Characteristic Curve – Graphically represents difficulty and discrimination.

Other Considerations in Item Analysis
• Guessing – Difficult to control for, but affects test accuracy.
• Bias – Items should not unfairly favor one group over another.
• Speed Tests – Later items may appear more difficult simply because of time constraints.
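A minimal sketch of the first and fourth indices, using an invented response matrix (1 = correct, 0 = incorrect):

```python
# Rows are test-takers (sorted from highest to lowest total score),
# columns are items; data invented for illustration.
responses = [
    [1, 1, 1, 0],  # top scorers
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],  # bottom scorers
]

def item_difficulty(item):
    """p = proportion of test-takers answering the item correctly."""
    col = [person[item] for person in responses]
    return sum(col) / len(col)

def item_discrimination(item):
    """D = p(upper group) - p(lower group), using top and bottom halves."""
    half = len(responses) // 2
    upper = sum(person[item] for person in responses[:half]) / half
    lower = sum(person[item] for person in responses[half:]) / half
    return upper - lower

for i in range(4):
    print(f"item {i}: p={item_difficulty(i):.2f}, D={item_discrimination(i):+.2f}")
```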
VII. Test Revision
• Occurs at two stages:
1. During new test development – After analyzing tryout results.
2. Throughout the life cycle of an existing test – Whenever updates become necessary.
• Cross-Validation: Re-testing on a different sample to confirm validity.
• Validity Shrinkage: The decrease in test validity typically observed after cross-validation.
• Co-Validation: Validating multiple assessments on the same sample for efficiency.

Other Scoring Considerations
• Anchor Protocol: A highly accurate reference protocol used to check scoring.
• Scoring Drift: Changes in scoring consistency over time.

CHAPTER 9: Utility

I. Definition of Utility
• Utility: The usefulness or practical value of a test or assessment.
• Helps determine whether a test improves efficiency in decision-making.
• Can also refer to the effectiveness of a training program or intervention.

II. Factors Affecting a Test's Utility
1. Psychometric Soundness – A test is useful only if it provides reliable and valid information for decision-making.
2. Costs – Expenses related to:
o Purchasing the test.
o Printing test materials.
o Scoring and interpretation (manual or computerized).
3. Benefits – The profits, advantages, or improvements gained from using the test.

III. Utility Analysis
• Definition: A family of techniques used for cost-benefit analysis of a test.
• Determines whether the use of a test is worth the investment.

How Utility Analysis is Conducted
1. Expectancy Data
o Used to predict the likelihood of success based on test scores.
o Expectancy tables help categorize test-takers into passing, acceptable, or failing groups.
o Taylor-Russell and Naylor-Shine Tables: Used to estimate the benefit of using a test of a given validity in employment selection.
2. Brogden-Cronbach-Gleser Formula
o Used to estimate the monetary or practical benefits of using a test in selection decisions (see the sketch below).
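A sketch of one common statement of the Brogden-Cronbach-Gleser utility gain: benefit = (number selected) × (tenure) × (validity) × (SDy, the dollar value of one SD of job performance) × (mean standardized test score of those selected), minus the total cost of testing. All numbers below are invented:

```python
def bcg_utility_gain(n_selected, tenure_years, validity, sd_y,
                     mean_z_selected, n_tested, cost_per_applicant):
    """Brogden-Cronbach-Gleser utility gain (one common statement):
    gain = N_selected * T * r_xy * SD_y * (mean z of selectees)
           - N_tested * cost per applicant tested."""
    benefit = n_selected * tenure_years * validity * sd_y * mean_z_selected
    cost = n_tested * cost_per_applicant
    return benefit - cost

# Invented numbers: hire 10 of 50 applicants (mean z of hires = 1.0),
# validity .40, SD_y $12,000, average tenure 2 years, $30 per test.
print(bcg_utility_gain(10, 2, 0.40, 12_000, 1.0, 50, 30))  # 94500.0
```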
IV. Practical Considerations in Utility Analysis
• Size of the Applicant Pool – Affects how selective the hiring process can be.
• Job Complexity – More complex jobs require more predictive and specialized tests.
• Cut Scores – The minimum score required for passing or selection.

V. Methods for Setting Cut Scores
1. Angoff Method
o Experts estimate how minimally competent individuals would perform on each item; the averaged estimates determine the cut score (see the sketch after this list).
o Issue: Low inter-rater reliability can lead to disagreement among judges.
2. Known Groups Method
o Compares test scores of groups known to have or lack a trait.
o Issue: The cut score depends on group composition, which can vary.
3. IRT-Based Methods (Item Response Theory)
o Determine cut scores based on test-takers' performance across all items.
o Item Mapping Method: Used for licensing exams.
o Bookmark Method: Common in academic settings.
4. Discriminant Analysis
o A statistical method for classifying individuals into categories (e.g., successful vs. unsuccessful employees).
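A minimal sketch of the Angoff computation: each judge estimates, per item, the probability that a minimally competent test-taker would answer correctly; averaging across judges and summing across items gives the cut score. The ratings are invented:

```python
# judge_ratings[j][i]: judge j's estimated probability that a minimally
# competent test-taker answers item i correctly (invented numbers).
judge_ratings = [
    [0.9, 0.7, 0.5, 0.6],
    [0.8, 0.6, 0.4, 0.7],
    [0.9, 0.8, 0.5, 0.5],
]

n_judges = len(judge_ratings)
n_items = len(judge_ratings[0])

# Average the judges' estimates item by item, then sum across items.
item_means = [sum(j[i] for j in judge_ratings) / n_judges for i in range(n_items)]
cut_score = sum(item_means)

print(f"Angoff cut score: {cut_score:.2f} of {n_items} items")  # 2.63 of 4
```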
CHAPTER 10: Intelligence and its Measurement

I. Definition of Intelligence
• Intelligence is a multifaceted capacity that includes the ability to:
o Acquire and apply knowledge
o Reason logically
o Plan effectively
o Solve problems
o Make sound judgments
o Visualize concepts
o Adapt to new situations
o Find words and thoughts easily
o Pay attention and be intuitive

II. Early Theories of Intelligence
1. Francis Galton – First to study the heritability of intelligence.
o Believed intelligence was hereditary and linked to sensory abilities.
2. Alfred Binet – Developed the first intelligence test.
o Saw intelligence as including reasoning, judgment, memory, and abstraction.
3. David Wechsler – Viewed intelligence as a global capacity.
o Emphasized the role of non-intellective factors (e.g., motivation).
4. Jean Piaget – Viewed intelligence as a form of biological adaptation.
o Learning occurs through:
▪ Assimilation – Incorporating new information into existing structures.
▪ Accommodation – Modifying existing structures to fit new information.

III. Factor-Analytic Theories of Intelligence
Factor analysis identifies relationships between different cognitive abilities.
1. Charles Spearman (Two-Factor Theory)
o Intelligence consists of:
▪ General Intelligence (g) – Affects all cognitive tasks.
▪ Specific Abilities (s) – Unique to particular tasks.
2. Louis Thurstone (Primary Mental Abilities)
o Identified seven primary abilities (e.g., verbal comprehension, numerical ability).
o Later acknowledged that a g-factor influences all abilities.
3. J.P. Guilford (Structure of Intellect Model)
o Rejected g and proposed that intelligence consists of 150+ abilities.
4. Howard Gardner (Multiple Intelligences)
o Identified seven types of intelligence:
▪ Logical-mathematical
▪ Linguistic
▪ Musical
▪ Spatial
▪ Bodily-kinesthetic
▪ Interpersonal
▪ Intrapersonal
o Basis for later theories of emotional intelligence.
5. Raymond Cattell (Fluid & Crystallized Intelligence)
o Crystallized Intelligence (Gc) – Knowledge gained from education and experience.
o Fluid Intelligence (Gf) – Nonverbal, relatively culture-free problem-solving ability.
6. John Horn (Extended Cattell's Model)
o Added factors such as visual, auditory, and quantitative processing.
o Distinguished vulnerable abilities (decline with age) from maintained abilities (stay stable).
7. John Carroll (Three-Stratum Theory)
o Intelligence has three levels:
▪ Stratum I – Narrow abilities (e.g., memory, speed).
▪ Stratum II – Broad abilities (e.g., Gf, Gc).
▪ Stratum III – The g-factor (general intelligence).
8. CHC Model (Cattell-Horn-Carroll)
o A combination of the Cattell-Horn and Carroll models.
o Guides modern intelligence testing.

IV. Measuring Intelligence
• Mental Age – An index that compares a test-taker's performance to that of a specific age group.

V. Intelligence: Key Issues
1. Nature vs. Nurture
o Preformationism: Intelligence is fixed at birth.
o Predeterminism: Intelligence is genetically determined and unchangeable.
2. Stability of Intelligence
o Vocabulary tends to improve with age.
o Arithmetic and verbal reasoning decline in later adulthood.
o After age 75, cognitive abilities decline significantly.

CHAPTER 11: Tests of Intelligence

I. The Stanford-Binet Intelligence Scales
1. First Edition (Original Stanford-Binet)
o First test to provide organized and detailed administration and scoring instructions.
o First American test to introduce the Intelligence Quotient (IQ).
o Introduced alternate items (items used under special conditions).
2. Fourth Edition
o Used a point scale (organized by item category, not age).
o Based on the Cattell-Horn model of intelligence.
3. Fifth Edition (SB5)
o Based on the Cattell-Horn-Carroll (CHC) theory.
o Administered to individuals ages 2 to 85+.
o Includes 10 subtests that contribute to a Full Scale IQ.
o Subtest scores: Mean = 10, Standard Deviation = 3.
o Includes a behavioral checklist for examiners.

II. The Wechsler Tests
• Created by David Wechsler.
• Designed for test-takers from preschool age to adulthood.
• Evolution of the Wechsler tests:
o WAIS-IV (Wechsler Adult Intelligence Scale)
o WISC-IV (Wechsler Intelligence Scale for Children)
o WPPSI-III (Wechsler Preschool and Primary Scale of Intelligence)

1. Wechsler Adult Intelligence Scale (WAIS-IV)
• Consists of core and supplemental subtests:
o Core subtests: Used to calculate a composite score.
o Supplemental subtests: Provide additional clinical information.
• 10 Core Subtests:
o Block Design, Similarities, Digit Span, Matrix Reasoning, Vocabulary, Arithmetic, Symbol Search, Visual Puzzles, Information, Coding.
• 5 Supplemental Subtests:
o Letter-Number Sequencing, Figure Weights, Comprehension, Cancellation, Picture Completion.

2. Wechsler Intelligence Scale for Children (WISC-IV)
• First published in 1949.
• Based on the CHC model of intelligence.
• Contains five supplemental tests (which add about 30 minutes to testing time).

3. Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III)
• First test to properly sample the total U.S. population, including racial minorities.
• Tests children from 2 years 6 months of age and up.
• Subtests are categorized as core, supplemental, or optional.
• Used for children with short attention spans or special conditions.

III. Thinking Styles in Intelligence
1. Convergent Thinking (Guilford, 1967)
o Deductive reasoning: Narrows down possible solutions to one correct answer.
o Requires fact recall and logical judgment.
o Example: Standardized IQ tests.
2. Divergent Thinking
o Creative reasoning: Generates multiple possible solutions.
o Requires flexibility, originality, and imagination.
o Example: Creative problem-solving tasks.