Psyc 385 Exam 2 Study Guide


STUDY: study guide, lecture, textbook, quizzes, tophat questions

LIGHT YELLOW: LECTURE NOTES


LIGHT GREEN: QUIZLETS
GREEN: STILL NEEDED

Preliminary Exam 2 Study Guide

Chapter 4 continued

● What are some cultural considerations in test construction/standardization?

● In selecting a test for use, responsible test users should research the test’s available norms to check how appropriate they are for use with the targeted testtaker population

● When interpreting test results, it helps to know about the culture and era of the testtaker

● It is important to conduct culturally informed assessment

Chapter 5

● What is reliability?

Reliability: dependability or consistency in measurement.


- The proportion of the total variance attributed to true variance

● What is a reliability coefficient?

Reliability coefficient is an index of reliability, a proportion that indicates the ratio between the
true score variance on a test and the total variance

● What are the components of an observed score?

Observed score = true score plus error (X = T + E)
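
A quick numeric illustration of how the observed-score model connects to the reliability coefficient (the variance values below are made up for illustration, not from the course materials):

```python
# Classical test theory: X = T + E, and (assuming error is uncorrelated with
# true scores) var(X) = var(T) + var(E).
# Hypothetical variance values for illustration only.
true_variance = 80.0
error_variance = 20.0

total_variance = true_variance + error_variance
reliability = true_variance / total_variance   # proportion of total variance that is true variance

print(reliability)   # 0.8 -> 80% of observed-score variance reflects true differences
```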

● What is random error?

Random Error: a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (i.e., noise)

● What is systematic error?

Systematic Error: a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.

● Describe the difference between random error and systematic error.


-Systematic error: error tied to identity characteristics (racial, ethnic, gender, socioeconomic), which makes it a controversial, discrimination-type error

--> Some behaviors we want to measure in our society are not socially acceptable (like infidelity), so people will not report them because doing so is not socially desirable

-Random error: caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (e.g., noise)

-Random error is hard or impossible to correct for, but it has smaller problematic implications than systematic error; examples include bad weather, testing anxiety, etc.

● Understand the general contributions of test construction, administration, and interpretation to error variance

● Test Construction: Variation may exist within items on a test or between tests (i.e.,
Item sampling or content sampling).
● Test Administration: Sources of error may stem from the testing environment. Also,
testtaker variables such as pressing emotional problems, physical discomfort, lack of
sleep, and the effects of drugs or medication. Examiner-related variables such as
physical appearance, training, and demeanor may play a role.
● Test Scoring and Interpretation: Computer testing reduces error in test scoring, but
many tests still require expert interpretation (e.g., projective tests). Subjectivity in scoring
can enter into behavioral assessment.

● Under what circumstances are the following indices of reliability most appropriate?

○ Test re-test reliability

Test-retest reliability: an estimate of reliability obtained by correlating pairs of scores from the
same people on two different administrations of the same test
● Most appropriate for variables that should be stable over time (e.g., personality) and not
appropriate for variables expected to change over time (e.g., mood)
● Estimates tend to decrease as time passes
● With intervals over 6 months the estimate of test-retest reliability is called the coefficient
of stability

Test-retest Reliability

Same person, same test at two different time points.

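A minimal sketch of how the test-retest estimate is computed, using hypothetical scores for five people at two time points (numbers are illustrative only):

```python
import numpy as np

# Hypothetical scores for the same five people on the same test at two time points.
time1 = np.array([10, 14, 18, 22, 26])
time2 = np.array([11, 13, 19, 21, 27])

# The test-retest reliability estimate is simply the Pearson r between administrations.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 3))
```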

○ Parallel or alternate-forms (note: these are not the same thing)

Coefficient of equivalence: The degree of the relationship between various forms of a test.
● Parallel forms: for each form of the test, the means and the variances of observed test
scores are equal
● Alternate forms: different versions of a test that have been constructed so as to be parallel. They do not meet the strict requirements of parallel forms, but item content and difficulty are typically similar between forms
● Reliability is checked by administering two forms of a test to the same group. Scores
may be affected by error related to the state of testtakers (e.g., practice, fatigue, etc.) or
item sampling

Parallel or Alternate Forms

The same construct is measured with two forms of the test – the coefficient of equivalence tells us how similar the forms are to each other

○ Split-half reliability

Split-half reliability: obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once. It entails three steps:
Step 1. Divide the test into equivalent halves.
Step 2. Calculate a Pearson r between scores on the two halves of the test.
Step 3. Adjust the half-test reliability using the Spearman-Brown formula.
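
A minimal sketch of Step 3, assuming a hypothetical half-test correlation of .70 (the same formula generalizes to a test n times as long):

```python
def spearman_brown(r, n=2.0):
    """Spearman-Brown formula: estimated reliability of a test n times as long as
    the test that produced correlation r (n=2 adjusts a half-test r to full length)."""
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical correlation between the two halves of a single test.
r_half = 0.70
print(round(spearman_brown(r_half), 3))   # ~0.824 estimated full-test reliability
```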

○ Inter-item consistency

Inter-item consistency: The degree of relatedness of items within a test. It is used to gauge the homogeneity of a test

○ Coefficient alpha

Coefficient alpha: mean of all possible split-half correlations, corrected by the Spearman-
Brown formula. The most popular approach for internal consistency. Values range from 0 to 1

○ Inter-scorer reliability

Inter-scorer reliability: The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure
● It is often used with behavioral measures
● Guards against biases or idiosyncrasies in scoring
● Coefficient of inter-scorer reliability – the scores from different raters are correlated with one another.

● How is coefficient alpha determined and generally interpreted?

Coefficient alpha is determined by taking the mean of all possible split-half correlations, corrected by the Spearman-Brown formula.

It is the most popular approach to estimating internal consistency; values range from 0 to 1.

As a rough guide to interpreting alpha, values below 0.5 are not useful at all, and values above 0.7 are acceptable.
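
A rough sketch of computing coefficient alpha directly from item responses, using the standard formula alpha = k/(k-1) × (1 − Σ item variances / variance of total scores); the 5-person, 4-item data are made up for illustration:

```python
import numpy as np

# Hypothetical responses: rows = 5 testtakers, columns = 4 items on the same scale.
items = np.array([
    [2, 3, 3, 2],
    [4, 4, 5, 4],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
])

k = items.shape[1]                                # number of items
item_variances = items.var(axis=0, ddof=1)        # variance of each item across people
total_variance = items.sum(axis=1).var(ddof=1)    # variance of each person's total score

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(round(alpha, 3))   # close to 1 here because the hypothetical items hang together
```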

● How does homogeneity vs heterogeneity of test items impact reliability?

Homogeneous = functionally uniform throughout; tests designed to measure one factor are expected to be homogeneous and to show a high degree of internal consistency

Heterogeneous items (tapping more than one factor) have lower internal consistency

● Know the relation between range of test scores and reliability.

Restricted range results in a lower correlation coefficient and lower reliability

● What is the impact of a speed test or power test on reliability?


Power test = the time limit is long enough for testtakers to attempt every item, but some items are so difficult that no testtaker can obtain a perfect score; when many items are not answered correctly, coefficient alpha is depressed and the reliability estimate is lower. Speed test = items have a uniform level of difficulty, but the time limit prevents testtakers from obtaining a perfect score; here too the reliability estimate is low.

● Assumptions, pros, and cons of:

○ Classical Test Theory (CTT)

Perhaps the most widely used model due to its simplicity.


True score: a value that according to classical test theory genuinely reflects an individual’s
ability (or trait) level as measured by a particular test.
● CTT assumptions are more readily met than Item Response Theory (IRT)
● A problematic assumption of CTT has to do with the equivalence of items on a test
● Typically yield longer tests
● CTT is also known as the true score model
● Because items are assumed to be equivalent, the greater the number of items, the greater the reliability; we generally think that adding more items makes a better test, but under CTT it mainly makes a longer test

○ Item Response Theory (IRT)

Item-Response Theory: Provides a way to model the probability that a person with X ability will
be able to perform at a level of Y.
● IRT refers to a family of methods and techniques.
● IRT incorporates considerations of item difficulty and discrimination
- An approach that is used to derive item characteristics
- Used to create shorter and more efficient measures
- Google: refers to a family of mathematical models that attempt to explain the relationship between latent traits (unobservable characteristics or attributes) and their manifestations (i.e., observed outcomes, responses, or performance)

● What is difficulty and discrimination in IRT?

● Discrimination (a) refers to the degree to which an item differentiates among people with
higher or lower levels of the trait, ability, or other variable being measured
● Difficulty (b) relates to how difficult an item is to be accomplished, solved, or
comprehended
- In other words, discrimination (a) deals with how sharply an item separates testtakers with higher versus lower levels of the trait: a highly discriminating item differentiates well around its targeted trait level, whereas an item that nearly everyone answers the same way provides little discrimination
- Difficulty (b) reflects how hard the item is to accomplish, solve, or comprehend
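
One common member of the IRT family is the two-parameter logistic (2PL) model; the sketch below uses hypothetical a and b values (not from the course materials) to show how a highly discriminating item separates ability levels more sharply than a weakly discriminating one:

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: probability that a person with ability theta
    answers an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical items, both centered at difficulty b = 0.
for theta in (-1.0, 0.0, 1.0):
    steep = p_correct(theta, a=2.0, b=0.0)   # high discrimination: probability changes fast near b
    flat = p_correct(theta, a=0.5, b=0.0)    # low discrimination: item barely separates ability levels
    print(theta, round(float(steep), 2), round(float(flat), 2))
```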

● Know the basic tenets of Generalizability Theory.

Generalizability theory: a person’s test scores may vary because of variables in the testing
situation and sampling.
● Cronbach encouraged test developers and researchers to describe the details of the
particular test situation or universe leading to a specific test score.
- Google: the reliability of generalizing from a student's observed score on a test to his/her
average measure that would occur under all possible conditions that are acceptable

● What is the standard error of measurement?

Standard error of measurement (SEM, symbolized σmeas): provides a measure of the precision of an observed test score. It is an estimate of the amount of error inherent in an observed score or measurement.
● Generally, the higher the reliability of the test, the lower the standard error.
● Standard error can be used to estimate the extent to which an observed score deviates
from a true score.
● Confidence interval: a range or band of test scores that is likely to contain the true
score.

● Be able to calculate the confidence interval if given the standard error of measurement and the confidence level index (e.g., 95% confidence level, z-score of 2).

Confidence interval = observed score ± (z × σmeas). For example, with an observed score of 50, SEM = 2, and a 95% confidence level (z ≈ 2), the interval is 50 ± 4, i.e., 46 to 54 (see the sketch below).
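
A minimal sketch of that calculation (observed score and SEM are hypothetical):

```python
def confidence_interval(observed, sem, z=2.0):
    """Band of scores likely to contain the true score: observed +/- z * SEM."""
    return observed - z * sem, observed + z * sem

# Hypothetical observed score of 50 with SEM = 2; z of 2 approximates the 95% level.
print(confidence_interval(50, 2))   # (46.0, 54.0)
```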

● Be able to use confidence interval information to interpret test scores.

Report and interpret the score as a band rather than a single point: using the example above, we can be about 95% confident that the band from 46 to 54 contains the testtaker’s true score, so the observed score of 50 should not be treated as exact.

● What is the standard error of the difference and how does it help us with
interpretation of two scores?

The standard error of difference: a measure that can aid a test user in determining how large
a difference in test scores should be expected before it is considered statistically significant. It
can be used to address three types of questions:
1. How did this individual’s performance on test 1 compare with their own performance on test 2?
2. How did this individual’s performance on test 1 compare with someone else’s
performance on test 1?
3. How did this individual’s performance on test 1 compare with someone else’s
performance on test 2?
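
One common way to compute the standard error of the difference (not necessarily the exact formula emphasized in lecture) combines the two scores’ standard errors of measurement; the SEM values in this sketch are hypothetical:

```python
import math

def standard_error_of_difference(sem_1, sem_2):
    """Standard error of the difference between two scores, combining each score's SEM."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)

# Hypothetical SEMs for two tests (or two administrations of the same test).
se_diff = standard_error_of_difference(3.0, 4.0)
print(se_diff)   # 5.0 -> a score difference needs to clear roughly 2 * 5 = 10 points
                 # before it is treated as statistically meaningful
```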

Chapter 6

● What is validity?

Validity: a judgment or estimate of how well a test measures what it is supposed to measure
within a particular context

● Know the relationship between reliability and validity.

In order for a test to have validity, it MUST have reliability; however, a reliable test is not necessarily valid (reliability is necessary but not sufficient for validity)

● Know the different types of validity and their unique qualities.

Validity is often conceptualized according to three categories:


1. Content validity: evaluation of the subjects, topics, or content covered by the items in
the test
2. Criterion-related validity: evaluating the relationship of scores obtained on the test to
scores on other tests or measures
3. Construct validity: This is a measure of validity that is arrived at by executing a
comprehensive analysis of:
a. how scores on the test relate to other test scores and measures, and
b. how test scores can be interpreted within a theoretical framework that explains
the construct the test was designed to measure

Face Validity
Face Validity: a judgment concerning how relevant the test items appear to be.
● If a test appears to measure what it is supposed to be measuring “on the face of it,” it
could be said to be high in face validity
● A perceived lack of face validity may contribute to a lack of confidence in the test
Content Validity
Content validity: how well a test samples behaviors that are representative of the broader set
of behaviors it was designed to measure
● Do the test items adequately represent the content that should be included in the test?
● Test blueprint: A plan regarding the types of information to be covered by the items, the
number of items tapping each area of coverage, the organization of the items in the test,
etc.

Content Validity
● Typically established by recruiting a team of experts on the subject matter, obtaining expert ratings of the degree of item importance, and having the experts scrutinize what is missing from the measure
● Important to remember that content validity of a test varies across cultures and time

Criterion Related Validity


A Criterion is the standard against which a test or a test score is evaluated

Characteristics of a Criterion
An adequate criterion is relevant for the matter at hand, valid for the purpose for which it is being used, and uncontaminated, meaning it is not part of the predictor.

Criterion-related validity: A judgment of how adequately a test score can be used to infer an
individual’s most probable standing on some measure of interest (i.e., the criterion).
Construct validity: arrived at by executing a comprehensive analysis of: a. how scores on the
test relate to other test scores and measures, and b. how scores on the test can be understood
within some theoretical framework for understanding the construct that the test was designed to
measure; the ability of a test to measure a theorized construct that it aims to measure

● Understand what constitutes good face validity and what happens if it is lacking/why
we might not want the test to be face valid.

● If a test appears to measure what it is supposed to be measuring “on the face of it,” it
could be said to be high in face validity
● A perceived lack of face validity may contribute to a lack of confidence in the test

● Be able to define criterion and its characteristics

○ An adequate criterion is ___ for the matter at hand, ___ for the purpose it is
being used, and ___, meaning it is not part of the predictor

A Criterion is the standard against which a test or a test score is evaluated

Characteristics of a Criterion
An adequate criterion is relevant for the matter at hand, valid for the purpose for which it is being used, and uncontaminated, meaning it is not part of the predictor.

● What is the difference between concurrent and predictive validity?


Concurrent validity: an index of the degree to which a test score is related to some criterion
measure obtained at the same time (concurrently).

Predictive validity: an index of the degree to which a test score predicts some criterion, or
outcome, measure in the future. Tests are evaluated as to their predictive validity.

● Understand how base, hit, and miss rates relate to predictive validity

● Base rate: extent to which the phenomenon exists in the population
● Hit rate: accurate identification
● Miss rate: failure to identify accurately
● Hit rate and miss rate are negatively associated with one another

○ False positive vs. false negative

False positive: identifying an outcome that is not actually present (Type I error)

False negative: failing to identify an outcome that is present (Type II error)

○ Type I error vs. type II error

Type I error: false positive --> saying that there is statistical significance when there is not

Type II error: false negative --> saying that there is no significance when there is
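
A small sketch tying base rate, hits, misses, false positives, and false negatives together, using made-up screening data (1 = condition present / flagged, 0 = absent / not flagged):

```python
# Made-up screening data: truth says whether the condition is really present,
# decision is what the test concluded for each of ten people.
truth    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # base rate = 4 / 10 = 0.4
decision = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

hits            = sum(t == 1 and d == 1 for t, d in zip(truth, decision))
false_negatives = sum(t == 1 and d == 0 for t, d in zip(truth, decision))  # misses / Type II errors
false_positives = sum(t == 0 and d == 1 for t, d in zip(truth, decision))  # Type I errors
correct_rejects = sum(t == 0 and d == 0 for t, d in zip(truth, decision))

print(hits, false_negatives, false_positives, correct_rejects)   # 3 1 1 5
```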

● What happens to the validity coefficient when you restrict or inflate the range of
scores?
When you restrict the range of scores, correlations get weaker --> reliability gets weaker --> weaker validity (reliability is a prerequisite for validity); an inflated range --> stronger correlations --> stronger reliability --> stronger validity

● What is the importance of incremental validity?

Incremental validity: the degree to which an additional predictor explains something about the
criterion measure that is not explained by predictors already in use
● To what extent does a test predict the criterion over and above other variables?

Importance: administering additional tests costs time and money, so a new measure is only worth adding if it explains something about the criterion over and above the predictors already in use (a rough sketch of checking this appears below).
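
A rough sketch of the idea using simulated (made-up) data and ordinary least squares: the incremental validity of the new test shows up as the gain in explained variance (R²) when it is added alongside the existing predictor:

```python
import numpy as np

def r_squared(X, y):
    """R-squared from ordinary least squares (an intercept column is added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1 - residuals.var() / y.var()

# Made-up data: a criterion (e.g., job performance), one predictor already in use,
# and a new test whose incremental validity we want to check.
rng = np.random.default_rng(0)
existing = rng.normal(size=100)
new_test = rng.normal(size=100)
criterion = 0.5 * existing + 0.3 * new_test + rng.normal(scale=0.8, size=100)

r2_without = r_squared(existing.reshape(-1, 1), criterion)
r2_with = r_squared(np.column_stack([existing, new_test]), criterion)

# The difference is the extra criterion variance explained by the new test.
print(round(r2_with - r2_without, 3))
```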

● If a test has high construct validity, what does this tell you about the test?

If a test is a valid measure of a construct, high scorers and low scorers should behave as theorized; construct validity is an umbrella term; high construct validity tells you that the test measures what it claims to measure

● Be familiar with the different types of evidence for construct validity.

Construct validity: the ability of a test to measure a theorized construct (e.g., intelligence,
aggression, personality, etc.) that it aims to measure
● If a test is a valid measure of a construct, high scorers and low scorers should behave
as theorized
● Construct validity is an umbrella term – content- and criterion-related validity provide
evidence for construct validity and much more

Evidence of Construct Validity


● Evidence of homogeneity - how uniform a test is in measuring a single concept
● Evidence of changes - Some constructs are expected to change over time (e.g., reading
rate)
● Evidence of pretest/posttest changes - test scores change as a result of some
experience between a pretest and a posttest (e.g., therapy).
● Evidence from distinct groups - scores on a test vary in a predictable way as a function
of membership in some group (e.g., scores on the Psychopathy Checklist for prisoners
vs. civilians).

● Convergent evidence - the test correlates highly, in the predicted direction, with scores on previously psychometrically established tests designed to measure the same (or a similar) construct
● Discriminant evidence - showing little relationship between test scores and other
variables with which scores on the test should not theoretically be correlated
● Factor analysis – A new test should load on a common factor with other tests of the
same construct.

● Understand the definition of bias and fairness.


Bias: a factor inherent in a test that systematically prevents accurate, impartial measurement
● Bias implies systematic variation in test scores
● Prevention during test development is the best cure for test bias
Fairness: The extent to which a test is used in an impartial, just, and equitable way.

○ How do bias and fairness relate? Can you have an unbiased, yet unfair test?

Bias is a technical, psychometric property of the test itself (systematic error in measurement), while fairness concerns whether the test is used in an impartial, just, and equitable way. Yes – a test can be unbiased yet still be used unfairly, for example when a psychometrically sound test is administered to people who never had an equal opportunity to learn the material, or when valid scores are used for an unjust purpose.

● Know the types of rater error.

Rating error: a judgment resulting from the intentional or unintentional misuse of a rating scale.
● Leniency (generosity) error – the rater is too lenient
● Severity error – the rater is too severe
● Central tendency error – the rater is reluctant to give ratings at either extreme of the scale
● Halo effect – see the next question

● What is the Halo effect?

● Halo effect - a tendency to give a particular person a higher rating than he or she
objectively deserves because of a favorable overall impression

Chapter 7

● Be able to define utility and understand the relationship between psychometric soundness and utility.

Utility: the usefulness or practical value of the test

Factors Affecting Utility


Psychometric soundness – generally, the higher the criterion-related validity of a test, the greater its utility
● Many factors affect the utility of an instrument and utility is assessed in many different
ways
● Valid tests are not always useful tests

Examples of when a test might be valid but not useful?


● Cost, time, and people – e.g., a valid test may be too expensive, too time-consuming, or require too many trained staff to be practical

● Make sure you can provide examples of:

○ Economic costs and noneconomic costs

Factors Affecting Utility Costs


● Cost in the context of test utility refers to disadvantages, losses, or expenses in both
economic and noneconomic terms
● Economic costs may include purchasing a test, a supply bank of test protocols, and
computerized test processing
● Other economic costs are more difficult to calculate such as the cost of not testing or
testing with an inadequate instrument
● Non-economic costs include things such as societal consequences

Good luck studying and on the exam!!

● QUIZLET LINKS:
○ https://quizlet.com/632740492/psyc-385-exam-2-flash-cards/
○ https://quizlet.com/739168853/psyc-385-exam-2-flash-cards/
○ https://quizlet.com/737356291/psyc-385-exam-2-flash-cards/

● TOPHAT QUESTIONS:
