
Methods and Stats in I/O

• Science
• Research
• Data Analysis
• Descriptive and Inferential
• Correlation and Regression
• Psychometrics
Psychometrics
• Study of how people respond to tests and questionnaires
• Measurement
• System of rules for assigning numbers to represent a person’s standing on some underlying characteristic
• Latent Variable
• Theoretical variable of interest
• Cannot be observed directly
• Measure
• Operational definition of construct
• Imperfect indicator of latent variable
Applications of Psychometrics
• Validity
• Measurement Precision
• Reliability
• Understanding error
• Scale Development
• Item selection
• Computer Adaptive Testing
The Concept of Validity
• How well a test fulfills the function for which it is being used
• Can we predict Y from X?
• Does the test measure what it claims to measure?
• Does the test “look right”?
• A match between empirical relations and theoretical relations
• A property of tests
Validity as a property of tests
• A test is valid for measuring an attribute if variation in the attribute
causes variations in the test scores

• The attribute must exist and have a causal impact on test scores

• Therefore, if one does not have an idea of how variations in the attribute produce variations in measurement outcomes, one cannot have a clue as to whether the test measures what it should measure
Criterion-Related Validity
• A couple of definitions
• Predictor
• The test chosen or developed to assess attributes (e.g., abilities) identified as important
for successful job performance
• Criterion
• An outcome variable that describes important aspects or demands of the job
• The variable that we want to predict when evaluating the validity of the predictor

• Criterion-Related
• Correlation of test scores (predictor) with job performance (criterion)
• Represented as the validity coefficient (i.e., a correlation)
Criterion-Related Validity
• Predictive Validity
• Predictor scores correlate with criterion scores collected at a later point in time

• Concurrent Validity
• Predictor scores correlate with criterion scores collected at the same time
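
A minimal sketch of the validity coefficient as a correlation: the data, sample size, and variable names below are invented for illustration, not taken from the slides.

```python
import numpy as np

# Hypothetical data: predictor = test scores collected at hire,
# criterion = later job performance ratings.
rng = np.random.default_rng(0)
predictor = rng.normal(50, 10, size=200)
criterion = 0.5 * predictor + rng.normal(0, 10, size=200)

# The validity coefficient is simply the Pearson correlation
validity = np.corrcoef(predictor, criterion)[0, 1]
print(f"Validity coefficient: {validity:.2f}")
```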
Predicting Preference to Work Alone
Content-Related Validity
• The content of the predictor and criterion represent an adequate
sample of important work behaviors and KSAOs defined by the job
analysis

• Using the knowledge of incumbents (or subject matter experts, SMEs), we make logical connections between tests and job performance
Construct-Related Validity
• Construct
• Concept that a test is intending to measure
• A broad representation of a human characteristic

• Construct Validity
• The integration of validity evidence which is important for determining the
meaning of test scores
• Correlation between similar and dissimilar tests should be in the predicted
direction (and sometimes strength)
• Evidence from other sources (literature reviews, studies, theories, etc.)
Multi-Trait Multi-Method Matrix

                                       1    2    3    4    5
1. Verbal Ability Test
2. Interview Rating of Communication  .5
3. Sample Lecture                     .4   .6
4. Test of I/O Knowledge              .2   .1   .1
5. Interview Rating of I/O Knowledge  .1   .3   .1   .4
6. Number of Top Tier Pubs            .1   .1   .1   .5   .4
Variable                              F1   F2
1. Verbal Ability Test                .8   .1
2. Interview Rating of Communication  .6   .2
3. Sample Lecture                     .7   .2
4. Test of I/O Knowledge              .1   .8
5. Interview Rating of I/O Knowledge  .4   .5
6. Number of Top Tier Pubs            .2   .7
Measurement Theory
• Classical Test Theory
• Item Response Theory
Classical Test Theory
• The main idea in CTT is that observed scores can be decomposed into
a true score and an error component
• Observed = True + Error

• The true score is defined as the expected value of the observed scores
• Derived from the Theory of Errors
• The central limit theorem
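
A small simulation makes the decomposition concrete. This sketch uses only numpy and invented numbers; it shows that the mean of many error-perturbed administrations converges on the fixed true score, i.e., the true score is the expected value of the observed scores.

```python
import numpy as np

rng = np.random.default_rng(1)
true_score = 25.0                        # one person's fixed true score
errors = rng.normal(0, 3, size=10_000)   # random error on each administration
observed = true_score + errors           # Observed = True + Error

# Averaging many repeated observations recovers the true score
print(f"Mean observed score: {observed.mean():.2f} (true score = {true_score})")
```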
Reliability
• Reliability
• Consistency or stability of a measure
• A measure is said to be reliable when you get the same results at different
times, with different users, or in different situations

• More precisely,
• It indicates the fraction of observed variance that is systematic, as opposed to
random
• In CTT, reliability is the squared correlation between true and observed scores
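
Both statements above can be checked directly on simulated data. In this sketch (hypothetical values; error variance 0.25 vs. true variance 1, so reliability should be 1/1.25 = 0.8), the variance ratio Var(T)/Var(X) and the squared true-observed correlation agree.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
true = rng.normal(0, 1, size=n)       # true scores across people
error = rng.normal(0, 0.5, size=n)    # random error, independent of true scores
observed = true + error

# Reliability: fraction of observed variance that is systematic (true)
rel_variance = true.var() / observed.var()

# In CTT this equals the squared correlation between true and observed scores
rel_corr_sq = np.corrcoef(true, observed)[0, 1] ** 2

print(f"Var(T)/Var(X) = {rel_variance:.3f}, corr(T, X)^2 = {rel_corr_sq:.3f}")
```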
Types of Reliability
• Test-Retest
• Equivalent Forms
• Internal Consistency
• Inter-Rater
Test-Retest Reliability
• Calculated by correlating measurements taken at time 1 with
measurements taken at time 2
• Represented as a correlation coefficient
• Higher the correlation, higher the reliability
Equivalent Forms Reliability
• Calculated by correlating measurements from a sample of individuals
who complete two different forms of the same test

• Split halves are another form of equivalent forms
Internal Consistency
• Assesses how consistently the items of a test measure a single
construct
• Affected by the number of items in the test, and
• Correlations among test items
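
The slide does not name a specific index, but the most common internal-consistency estimate, coefficient alpha, depends on exactly the two quantities listed: the number of items and the correlations among them. A minimal sketch with hypothetical data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a respondents-by-items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical data: 100 respondents, 4 items tapping one construct
rng = np.random.default_rng(3)
trait = rng.normal(0, 1, size=(100, 1))
items = trait + rng.normal(0, 1, size=(100, 4))   # each item = trait + noise
print(f"alpha = {cronbach_alpha(items):.2f}")
```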
Internal Consistency
Mini-IPIP vs. Students: Internal Consistency
• Extraversion (.77) (.79)
• Agreeableness (.70) (.66)
• Conscientiousness (.69) (.73)
• Neuroticism (.68) (.52)
• Openness to Experience (.65) (.76)
Agreeableness items:
2. Sympathize with others’ feelings
7. Am not really interested in others (R)
12. Believe others have good intentions
17. Am not interested in other people’s problems (R)

Neuroticism items:
4. Have frequent mood swings
9. Am relaxed most of the time (R)
14. Get upset easily
19. Seldom feel blue (R)
Inter-Rater Reliability
• The reliability of several different individuals making judgements
• Assesses how much consensus there is in ratings
• Absolute vs. relative agreement
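
The sketch below illustrates the absolute-vs.-relative distinction with two hypothetical raters who rank candidates identically but differ in leniency: relative agreement is perfect while absolute agreement is poor.

```python
import numpy as np

# Two raters judging the same 5 candidates (hypothetical ratings)
rater_a = np.array([2, 3, 4, 5, 6])
rater_b = rater_a + 2    # same rank order, consistently 2 points more lenient

# Relative agreement: do the raters order candidates the same way?
relative = np.corrcoef(rater_a, rater_b)[0, 1]    # = 1.0, perfect

# Absolute agreement: do they give the same actual scores?
mean_abs_diff = np.abs(rater_a - rater_b).mean()  # = 2.0, consistently off

print(f"relative agreement r = {relative:.2f}, mean |difference| = {mean_abs_diff:.1f}")
```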
Validity & Reliability

[Figure: four target diagrams illustrating the combinations: neither valid nor reliable; reliable but not valid; fairly valid but not very reliable; valid & reliable]
Item Response Theory
• Items have a number of parameters
• Difficulty
• Discrimination
• Guessing
• IRT can estimate these values and provide useful data
• Item and test information
• Ability level estimates
• Applications include
• Study of item bias
• Creating equivalent forms
• Computer adaptive testing
Item Information
• Item 1 (a = 2, b = -1)
• Item 2 (a = 2, b = -0.5)
• Item 3 (a = 1, b = 1)
• Item 4 (a = 1.5, b = 2)
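
Assuming the two-parameter logistic (2PL) model, which matches the a and b parameters listed above (no guessing parameter), item information is I(θ) = a²P(θ)(1 − P(θ)). A sketch:

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1 / (1 + np.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """2PL item information: I(theta) = a**2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1 - p)

# (a, b) pairs from the slide
items = [(2.0, -1.0), (2.0, -0.5), (1.0, 1.0), (1.5, 2.0)]
for i, (a, b) in enumerate(items, start=1):
    # Information peaks at theta = b, with height a**2 / 4
    print(f"Item {i} (a={a}, b={b}): peak info {item_info(b, a, b):.2f} at theta = {b}")
```

High-discrimination items carry the most information, but only near their own difficulty; that is what makes item selection in CAT worthwhile.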
Test Information
• The information provided by a set of items is simply the sum of the item information:

  TI(θ) = Σᵢ Iᵢ(θ)

• Example: test information for Items 1 and 4
Which item adds the most?
• Start with Items 1 & 4

• If we add Item 2

• If we add Item 3
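
A sketch of this comparison (self-contained, with item_info as in the previous block): it evaluates how much information Item 2 versus Item 3 would add to the Items 1 and 4 baseline at a few trait levels. Which item "adds the most" depends on where on the θ scale precision is needed.

```python
import numpy as np

def item_info(theta, a, b):
    """2PL item information (as in the previous sketch)."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

items = {1: (2.0, -1.0), 2: (2.0, -0.5), 3: (1.0, 1.0), 4: (1.5, 2.0)}
theta = np.array([-1.0, 0.0, 1.0, 2.0])   # trait levels to compare at

# Baseline: test information = sum of item information for Items 1 and 4
base = item_info(theta, *items[1]) + item_info(theta, *items[4])
print("theta:     ", theta)
print("Items 1+4: ", np.round(base, 2))
for candidate in (2, 3):
    total = base + item_info(theta, *items[candidate])
    print(f"+ Item {candidate}:  ", np.round(total, 2))
# Item 2 adds the most information at low theta, where Item 1 already works;
# Item 3 helps more in the gap around theta = 1 where Items 1 and 4 give little.
```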
Computerized Adaptive Testing
• CATs are used in many popular tests today
• SAT, ACT, GRE

• In CAT, we choose as the next item the one we expect to supply the most information about the individual’s trait level
Intro To CAT
1. Pick an initial item
2. Based on the response, estimate θ
3. Using the current θ, select the item with maximum I(θ)
4. Based on the response, update the θ estimate
5. Check the stopping rule
• e.g., stop if SE < .3
6. If the stopping rule is not met, repeat steps 3-5
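
A compact sketch of this loop, under assumptions the slide leaves open: a 2PL model, a grid-based standard normal prior with EAP θ estimates (matching the prior-distribution example that follows), and a hypothetical administer() stand-in for actually presenting an item.

```python
import numpy as np

# Item parameters (a, b) follow the example on the slides below
items = {1: (1.0, 0.0), 2: (1.5, 2.0), 3: (2.0, -1.0), 4: (2.0, 1.0)}
grid = np.linspace(-4, 4, 161)
posterior = np.exp(-grid**2 / 2)        # standard normal prior (unnormalized)
posterior /= posterior.sum()

def p_correct(theta, a, b):
    return 1 / (1 + np.exp(-a * (theta - b)))

def administer(item_id):
    """Hypothetical: present the item, return 1 (correct) or 0 (wrong)."""
    return int(np.random.rand() < 0.5)

available = set(items)
while available:
    theta_hat = (grid * posterior).sum()          # current EAP estimate

    # 3. Pick the remaining item with maximum information at theta_hat
    def info(i, th=theta_hat):
        a, b = items[i]
        p = p_correct(th, a, b)
        return a**2 * p * (1 - p)
    item = max(available, key=info)
    available.remove(item)

    # 4. Update the posterior with the 2PL likelihood of the response
    a, b = items[item]
    resp = administer(item)
    posterior *= p_correct(grid, a, b) if resp else 1 - p_correct(grid, a, b)
    posterior /= posterior.sum()

    # 5. Stopping rule: posterior SD (the SE of theta) below .3
    mean = (grid * posterior).sum()
    se = np.sqrt(((grid - mean)**2 * posterior).sum())
    if se < 0.3:
        break
```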
Example: Prior Distribution
Item 1 (a = 1, b = 0)
Item 2 (a = 1.5, b = 2)
Item 3 (a = 2, b = -1)
Item 4 (a = 2, b = 1)

[Plot: prior probability distribution over θ, from −4 to 4]
Item 1 Correct
Item 1 (a = 1, b = 0)
Item 2 (a = 1.5, b = 2)
Item 3 (a = 2, b = -1)
Item 4 (a = 2, b = 1)

[Plot: updated distribution over θ after a correct response to Item 1]
Item 2 Wrong
Item 1 (a = 1, b = 0)
Item 2 (a = 1.5, b = 2)
Item 3 (a = 2, b = -1)
Item 4 (a = 2, b = 1)

[Plot: updated distribution over θ after an incorrect response to Item 2]
Item 3 Correct
Item 1 (a = 1, b = 0)
Item 2 (a = 1.5, b = 2)
Item 3 (a = 2, b = -1)
Item 4 (a = 2, b = 1)

[Plot: updated distribution over θ after a correct response to Item 3]
Item 4 Wrong
Item 1 (a = 1, b = 0)
Item 2 (a = 1.5, b = 2)
Item 3 (a = 2, b = -1)
Item 4 (a = 2, b = 1)

[Plot: updated distribution over θ after an incorrect response to Item 4]
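
The updating shown on the preceding four slides can be reproduced with a simple grid approximation: start from a standard normal prior (an assumption; the slides do not state the prior's exact form) and multiply in the 2PL likelihood of each observed response.

```python
import numpy as np

# Responses in the order shown on the slides:
# Item 1 correct, Item 2 wrong, Item 3 correct, Item 4 wrong
items = {1: (1.0, 0.0), 2: (1.5, 2.0), 3: (2.0, -1.0), 4: (2.0, 1.0)}
responses = [(1, 1), (2, 0), (3, 1), (4, 0)]

grid = np.linspace(-4, 4, 161)
posterior = np.exp(-grid**2 / 2)    # standard normal prior (unnormalized)
posterior /= posterior.sum()

for item, correct in responses:
    a, b = items[item]
    p = 1 / (1 + np.exp(-a * (grid - b)))
    posterior *= p if correct else (1 - p)
    posterior /= posterior.sum()
    eap = (grid * posterior).sum()
    print(f"After Item {item} ({'correct' if correct else 'wrong'}): "
          f"EAP theta = {eap:.2f}")
```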
Research on PROMIS
• Our team at IIT was able to improve the PROMIS CAT by reducing the number of items by 50% and making the CAT more efficient in general
• We did this by performing a multidimensional CAT
