Psyc 385 Exam 2 Study Guide
Chapter 4 continued
● In selecting a test for use, responsible test users should research a test's available
norms to check how appropriate they are for use with the targeted testtaker population
● When interpreting test results, it helps to know about the culture and era of the testtaker
Chapter 5
● What is reliability?
Reliability refers to consistency in measurement: the extent to which a test yields similar
results across items, forms, scorers, and occasions.
Reliability coefficient: an index of reliability, a proportion that indicates the ratio between the
true score variance on a test and the total variance
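A minimal numeric illustration of this ratio under the classical true score model (X = T + E); all numbers here are made up for demonstration:

import numpy as np

rng = np.random.default_rng(0)

true_scores = rng.normal(100, 15, size=10_000)  # T: stable trait levels
error = rng.normal(0, 5, size=10_000)           # E: random measurement error
observed = true_scores + error                  # X = T + E

# Reliability = true score variance / total observed variance
reliability = true_scores.var() / observed.var()
print(round(reliability, 2))  # ~0.90 here, since 15^2 / (15^2 + 5^2) = 225/250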
--> Some things we want to measure are not socially acceptable (like infidelity), so people
underreport those behaviors because it is not socially desirable to admit to them; response
distortion like this is a source of systematic error
--> Random error, by contrast, is hard or impossible to correct for, but it has smaller
problematic implications than systematic error; examples include bad weather, test anxiety,
and other chance fluctuations
● Test Construction: Variation may exist within items on a test or between tests (i.e.,
item sampling or content sampling).
● Test Administration: Sources of error may stem from the testing environment. Also,
testtaker variables such as pressing emotional problems, physical discomfort, lack of
sleep, and the effects of drugs or medication. Examiner-related variables such as
physical appearance, training, and demeanor may play a role.
● Test Scoring and Interpretation: Computer testing reduces error in test scoring, but
many tests still require expert interpretation (e.g., projective tests). Subjectivity in scoring
can enter into behavioral assessment.
● Under what circumstances are the following indices of reliability most appropriate?
Test-retest reliability: an estimate of reliability obtained by correlating pairs of scores from the
same people on two different administrations of the same test
● Most appropriate for variables that should be stable over time (e.g., personality) and not
appropriate for variables expected to change over time (e.g., mood)
● Estimates tend to decrease as time passes
● With intervals over 6 months the estimate of test-retest reliability is called the coefficient
of stability
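In practice this estimate is simply a Pearson correlation between the two administrations. A minimal Python sketch with made-up scores for five testtakers:

import numpy as np

# Hypothetical scores for the same five people tested on two occasions
time1 = np.array([12, 18, 25, 30, 22])
time2 = np.array([14, 17, 27, 29, 21])

# Test-retest reliability = Pearson r between the two administrations
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 3))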
Parallel-Forms and Alternate-Forms Reliability
Coefficient of equivalence: The degree of the relationship between various forms of a test.
● Parallel forms: for each form of the test, the means and the variances of observed test
scores are equal
● Alternate forms: different versions of a test that have been constructed so as to be
parallel. Do not meet the strict requirements of parallel forms but typically item content
and difficulty is similar between tests
● Reliability is checked by administering two forms of a test to the same group. Scores
may be affected by error related to the state of testtakers (e.g., practice, fatigue, etc.) or
item sampling
The same construct is measured with two forms of the test – the coefficient of equivalence
tells us how similar the forms are to each other
○ Split-half reliability
Split-half reliability: is obtained by correlating two pairs of scores obtained from equivalent
halves of a single test administered once. Entails three steps:
Step 1. Divide the test into equivalent halves.
Step 2. Calculate a Pearson r between scores on the two halves of the test.
Step 3. Adjust the half-test reliability using the Spearman-Brown formula.
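A minimal sketch of Step 3, the Spearman-Brown correction (the half-test correlation of .70 is a made-up example value):

# Spearman-Brown: estimate the reliability of a lengthened (or shortened) test;
# n is the ratio of new length to old length (n = 2 for the split-half case)
def spearman_brown(r_half, n=2):
    return (n * r_half) / (1 + (n - 1) * r_half)

# A half-test correlation of .70 implies a full-test reliability of about .82
print(round(spearman_brown(0.70), 2))  # 0.82

The correction is needed because, other things being equal, longer tests are more reliable, and each half is only half as long as the full test.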
○ Inter-item consistency
Inter-item consistency: The degree of relatedness of items within a test; it gauges the
homogeneity of a test
○ Coefficient alpha
Coefficient alpha: the mean of all possible split-half correlations, corrected by the
Spearman-Brown formula. The most popular approach for estimating internal consistency.
Values range from 0 to 1
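A minimal Python sketch of the computation, using the standard variance-based formula for alpha and a made-up person-by-item score matrix:

import numpy as np

def cronbach_alpha(items):
    # items: 2-D array, rows = people, columns = test items
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical responses: 5 people x 4 items
scores = [[2, 3, 3, 2],
          [4, 4, 5, 4],
          [1, 2, 2, 1],
          [3, 3, 4, 3],
          [5, 4, 5, 5]]
print(round(cronbach_alpha(scores), 2))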
○ Inter-scorer reliability
Inter-scorer reliability: The degree of agreement or consistency between two or more scorers
(or judges or raters) with regard to a particular measure
● It is often used with behavioral measures
● Guards against biases or idiosyncrasies in scoring
● Coefficient of inter-scorer reliability – The scores from different raters are correlated with
one another.
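A minimal sketch of that correlation with made-up ratings from two hypothetical raters:

import numpy as np

# Hypothetical ratings of the same ten subjects by two raters
rater_a = np.array([3, 4, 2, 5, 4, 3, 1, 5, 2, 4])
rater_b = np.array([3, 4, 3, 5, 4, 2, 1, 5, 2, 4])

# Coefficient of inter-scorer reliability = Pearson r between raters
r = np.corrcoef(rater_a, rater_b)[0, 1]
print(round(r, 3))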
Interpreting coefficient alpha: as a rule of thumb, values below .5 are considered not useful at
all, while values of .7 or above are generally considered acceptable.
Homogeneous = functionally uniform throughout; tests designed to measure one factor are
expected to be homogeneous and to show a high degree of internal consistency
Item Response Theory (IRT): provides a way to model the probability that a person with X
ability will be able to perform at a level of Y.
● IRT refers to a family of methods and techniques.
● IRT incorporates considerations of item difficulty and discrimination.
● An approach that is used to derive item characteristics.
● Used to create shorter and more efficient measures.
● More formally: IRT refers to a family of mathematical models that attempt to explain the
relationship between latent traits (unobservable characteristics or attributes) and their
manifestations (i.e., observed outcomes, responses, or performance)
● Discrimination (a) refers to the degree to which an item differentiates among people with
higher or lower levels of the trait, ability, or other variable being measured
● Difficulty (b) relates to how difficult an item is to be accomplished, solved, or
comprehended
● If nearly everyone responds to an item in the same way, that item provides little
discrimination; a highly discriminating item separates testtakers with high levels of the trait
from those with low levels (see the sketch below)
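A minimal Python sketch of the two-parameter logistic (2PL) IRT model; the parameter values are hypothetical and chosen only to show how a (discrimination) and b (difficulty) shape the response probability:

import numpy as np

def p_correct(theta, a, b):
    # 2PL model: probability that a person with ability theta answers
    # an item with discrimination a and difficulty b correctly
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# A highly discriminating item (a = 2) separates abilities around b = 0 sharply
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(p_correct(theta, a=2.0, b=0.0), 2))
# -1.0 -> 0.12, 0.0 -> 0.50, 1.0 -> 0.88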
Generalizability theory: a person’s test scores may vary because of variables in the testing
situation and sampling.
● Cronbach encouraged test developers and researchers to describe the details of the
particular test situation or universe leading to a specific test score.
- In other words: the reliability of generalizing from a person's observed score on a test to the
average score that person would obtain under all possible acceptable conditions
Standard error of measurement (SEM, symbolized σ_meas): provides a measure of the
precision of an observed test score. An estimate of the amount of error inherent in an
observed score or measurement. It can be computed from the standard deviation of the test
scores and the test's reliability:
σ_meas = σ √(1 − r_xx)
● Generally, the higher the reliability of the test, the lower the standard error.
● Standard error can be used to estimate the extent to which an observed score deviates
from a true score.
● Confidence interval: a range or band of test scores that is likely to contain the true
score.
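A minimal worked example with hypothetical numbers, applying the σ_meas formula above and building a 95% confidence interval around an observed score:

import math

sd, reliability = 15.0, 0.91  # hypothetical test: SD = 15, r_xx = .91
sem = sd * math.sqrt(1 - reliability)  # 15 * sqrt(0.09) = 4.5

observed = 110
lo, hi = observed - 1.96 * sem, observed + 1.96 * sem  # 95% confidence interval
print(round(sem, 2), (round(lo, 1), round(hi, 1)))     # 4.5 (101.2, 118.8)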
ASK / LOOK UP
● What is the standard error of the difference and how does it help us with
interpretation of two scores?
The standard error of the difference: a measure that can aid a test user in determining how
large a difference in test scores should be expected before it is considered statistically
significant. It can be used to address three types of questions:
1. How did this individual's performance on test 1 compare with their own performance on
test 2?
2. How did this individual’s performance on test 1 compare with someone else’s
performance on test 1?
3. How did this individual’s performance on test 1 compare with someone else’s
performance on test 2?
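A minimal sketch, assuming both tests are reported on the same score scale (the SD and reliabilities are made-up values); it uses the standard formula σ_diff = SD √(2 − r1 − r2):

import math

def se_difference(sd, r1, r2):
    # Standard error of the difference between two scores on the same scale
    return sd * math.sqrt(2 - r1 - r2)

# Two tests with SD = 15 and reliabilities .90 and .85
sed = se_difference(15, 0.90, 0.85)          # 7.5
# A difference larger than about 1.96 * sed is significant at the .05 level
print(round(sed, 2), round(1.96 * sed, 2))   # 7.5 14.7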
Chapter 6
● What is validity?
Validity: a judgment or estimate of how well a test measures what it is supposed to measure
within a particular context
Face Validity
Face Validity: a judgment concerning how relevant the test items appear to be.
● If a test appears to measure what it is supposed to be measuring “on the face of it,” it
could be said to be high in face validity
● A perceived lack of face validity may contribute to a lack of confidence in the test
Content Validity
Content validity: how well a test samples behaviors that are representative of the broader set
of behaviors it was designed to measure
● Do the test items adequately represent the content that should be included in the test?
● Test blueprint: A plan regarding the types of information to be covered by the items, the
number of items tapping each area of coverage, the organization of the items in the test,
etc.
● Typically established by recruiting a team of experts on the subject matter and obtaining
expert ratings on the degree of item importance as well as scrutinize what is missing
from the measure
● Important to remember that content validity of a test varies across cultures and time
Characteristics of a Criterion
An adequate criterion is relevant for the matter at hand, valid for the purpose for which it is
being used, and uncontaminated, meaning it is not part of the predictor.
Criterion-related validity: A judgment of how adequately a test score can be used to infer an
individual’s most probable standing on some measure of interest (i.e., the criterion).
Construct validity: arrived at by executing a comprehensive analysis of (a) how scores on the
test relate to other test scores and measures, and (b) how scores on the test can be understood
within some theoretical framework for understanding the construct that the test was designed to
measure; the ability of a test to measure a theorized construct that it aims to measure
● Understand what constitutes good face validity and what happens if it is lacking/why
we might not want the test to be face valid.
● If a test appears to measure what it is supposed to be measuring “on the face of it,” it
could be said to be high in face validity
● A perceived lack of face validity may contribute to a lack of confidence in the test
● We might not want a test to be face valid when it measures something sensitive or
socially undesirable (e.g., infidelity): if testtakers can tell what is being measured, they can
fake or distort their responses
○ An adequate criterion is ___ for the matter at hand, ___ for the purpose it is
being used, and ___, meaning it is not part of the predictor
Characteristics of a Criterion
An adequate criterion is relevant for the matter at hand, valid for the purpose for which it is
being used, and uncontaminated, meaning it is not part of the predictor.
Predictive validity: an index of the degree to which a test score predicts some criterion, or
outcome, measure in the future. Tests are evaluated as to their predictive validity.
● Understand how base, hit, and miss rates relate to predictive validity
Base rate: the proportion of people in the population who actually possess the attribute being
predicted; hit rate: the proportion of cases the test classifies correctly; miss rate: the
proportion of cases the test classifies incorrectly
Type 1 error: false positive --> the test says the attribute (or effect) is present when it is not
Type 2 error: false negative --> the test says the attribute (or effect) is absent when it is present
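A minimal sketch with a made-up 2x2 classification table showing how base, hit, and miss rates are computed:

# Hypothetical screening outcomes: test prediction vs. actual criterion status
true_pos, false_pos = 40, 10   # false_pos = Type 1 error (false positive)
false_neg, true_neg = 15, 35   # false_neg = Type 2 error (false negative)
n = true_pos + false_pos + false_neg + true_neg

base_rate = (true_pos + false_neg) / n  # proportion who truly have the attribute
hit_rate = (true_pos + true_neg) / n    # proportion the test classifies correctly
miss_rate = (false_pos + false_neg) / n # proportion the test misclassifies
print(base_rate, hit_rate, miss_rate)   # 0.55 0.75 0.25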
● What happens to the validity coefficient when you restrict or inflate the range of
scores?
When you restrict the range of scores, correlations get weaker --> reliability gets weaker -->
weaker validity (reliability is a prerequisite for validity); an inflated range --> stronger
correlations --> stronger reliability --> stronger validity
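A small simulation on synthetic data illustrating the effect; selecting only high scorers on the predictor noticeably shrinks the observed correlation:

import numpy as np

rng = np.random.default_rng(1)

# Simulate a predictor and criterion that correlate about .60 in the full range
x = rng.normal(size=50_000)
y = 0.6 * x + np.sqrt(1 - 0.6**2) * rng.normal(size=50_000)

full_r = np.corrcoef(x, y)[0, 1]

# Restrict the range: keep only people above the predictor's median
keep = x > np.median(x)
restricted_r = np.corrcoef(x[keep], y[keep])[0, 1]

print(round(full_r, 2), round(restricted_r, 2))  # restricted r is clearly smaller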
Incremental validity: the degree to which an additional predictor explains something about the
criterion measure that is not explained by predictors already in use
● To what extent does a test predict the criterion over and above other variables?
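A minimal sketch on synthetic data: one way to quantify incremental validity is the gain in R² when the new predictor is added to a regression that already contains the old one:

import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Criterion driven by two predictors plus noise (hypothetical data)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

def r_squared(predictors, y):
    # Least-squares fit with an intercept; R^2 = 1 - residual var / total var
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared([x1], y)        # predictor already in use
r2_full = r_squared([x1, x2], y)    # after adding the new predictor
print(round(r2_full - r2_base, 3))  # delta R^2 = incremental validity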
● If a test has high construct validity, what does this tell you about the test?
If a test is a valid measure of a construct, high scorers and low scorers should behave as
theorized. High construct validity tells you that the test measures what it claims to measure.
Construct validity: the ability of a test to measure a theorized construct (e.g., intelligence,
aggression, personality, etc.) that it aims to measure
● Construct validity is an umbrella term – content- and criterion-related validity provide
evidence for construct validity and much more
○ How do bias and fairness relate? Can you have an unbiased, yet unfair test?
Bias is a technical, statistical concept: a factor inherent in a test that systematically prevents
accurate, impartial measurement for members of some group. Fairness is a value judgment
about how test results are used. Because fairness concerns use rather than measurement, a
statistically unbiased test can still be used in an unfair way; so yes, a test can be unbiased yet
unfair.
Rating error: a judgment resulting from the intentional or unintentional misuse of a rating scale.
● Raters may be too lenient (leniency or generosity error), too severe (severity error), or
reluctant to give ratings at the extremes (central tendency error)
● Halo effect - a tendency to give a particular person a higher rating than he or she
objectively deserves because of a favorable overall impression
Chapter 7
● QUIZLET LINKS:
○ https://quizlet.com/632740492/psyc-385-exam-2-flash-cards/
○ https://quizlet.com/739168853/psyc-385-exam-2-flash-cards/
○ https://quizlet.com/737356291/psyc-385-exam-2-flash-cards/
● TOPHAT QUESTIONS: