1 - Concept of Testing Theory (CTT & IRT)

The document discusses the concept of psychological testing, emphasizing its systematic approach to measuring behaviors and cognitive abilities. It outlines key principles such as reliability, validity, and standardization, and contrasts Classical Test Theory (CTT) with Item Response Theory (IRT), highlighting their respective assumptions and limitations. The document concludes by reinforcing the idea that testing focuses on attributes rather than individuals.

Concept of Testing
Psychological Testing and Assessment (Module 3.1)
What does testing mean in psychology?
• In psychology, testing refers to the systematic and
structured way of measuring variables, assessing
behaviors, and evaluating cognitive abilities or
traits in individuals.
• Psychological testing involves the use of
standardized procedures to collect data, allowing
psychologists to make informed observations, draw
conclusions, and make predictions about an
individual's psychological characteristics.
REMEMBER SOMETHING.
• WE DO NOT MEASURE, ASSESS,
EVALUATE, AND LABEL HUMANS.
• WE MEASURE, ASSESS, EVALUATE,
AND LABEL (also compare)
CHARACTERISTICS OR
ATTRIBUTES OF VARIOUS
ASPECTS OF HUMANS AND THEIR
BEHAVIOR(S).
Purpose of testing
• Assessment and Diagnosis
• Research
• Gauging Cognitive Abilities
• Educational Assessment
• Tracking Performance
• Personality Assessment
Psychometric Concepts
The three key principles:
• Reliability
• Validity
• Standardization and Normalization
Classical Testing Theory
• Classical Test Theory (CTT) is a traditional framework in psychometrics that provides a
foundation for understanding and interpreting test scores. It focuses on the relationship
between observed scores, true scores, and measurement error.

• The key components of Classical Test Theory include the observed score (X), the
true score (T), and the error score (E).

• 1. Observed Score (X):

The observed score is what is actually measured when an individual takes a test. It includes both
the true score and the measurement error.

• 2. True Score (T):

The true score represents the individual's actual level of the attribute being measured, such as
intelligence, knowledge, or skill. It is the hypothetical, error-free score that would be obtained if
the test were perfect.

• 3. Error Score (E):

The error score is the discrepancy between the observed score and the true score. It reflects the
influence of random or systematic errors that can affect the accuracy of the measurement.
Mathematical representation of CTT
• The relationship between the observed score (X), true score (T), and error score (E) is often expressed as an equation:
• X = T + E
• Activity: Use the properties of equality to find out what the true score (T) and the error score (E) each equal when the equation is rearranged.


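The decomposition X = T + E can be illustrated with a short simulation; the normal error distribution, its standard deviation, and the number of administrations below are illustrative assumptions, not part of CTT itself.

```python
import random

random.seed(42)

# Illustrative CTT simulation: a fixed true score T plus random error E
# produces the observed score X on each test administration.
T = 100                                  # hypothetical true score
observed = []
for administration in range(1000):
    E = random.gauss(0, 5)               # random measurement error, mean 0
    X = T + E                            # CTT decomposition: X = T + E
    observed.append(X)

# Because errors are random with mean zero, the average observed score
# approaches the true score over many administrations.
mean_X = sum(observed) / len(observed)
print(round(mean_X, 2))                  # close to the true score of 100
```

This is why CTT treats the true score as the expected value of the observed score over repeated testings.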
Concepts in CTT
• Reliability:
• CTT is concerned with the reliability of a test, which refers to the
consistency and stability of measurements. Reliability indicates the
extent to which the observed scores accurately reflect the true scores
and not just measurement error.
• Sources of Error:
• CTT recognizes that there are various sources of error, including random
fluctuations and systematic biases, that can impact test scores. These
sources contribute to the difference between observed and true scores.

• The concept of VALIDITY DOES NOT EXIST IN CTT.


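One common CTT estimate of reliability is Cronbach's alpha, an internal-consistency coefficient. A minimal sketch, using made-up response data and only the standard library:

```python
from statistics import pvariance

# Illustrative item-response matrix: 5 examinees x 4 items (made-up data).
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(rows[0])                          # number of items
    items = list(zip(*rows))                  # column-wise item scores
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)

alpha = cronbach_alpha(scores)
print(round(alpha, 3))                        # high alpha for these correlated items
```

Higher alpha indicates that the items vary together, i.e., that observed-score differences reflect the true score more than random error.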
7 assumptions of CTT
• 1. Observed Score Decomposition:

• CTT assumes that an observed test score (X) can be decomposed into two
components: the true score (T) and the error score (E). The observed score is
seen as the sum of the true score and random measurement error.
• 2. Additivity of Scores:

• CTT assumes that observed scores are additive. In other words, the observed
score on a test is the sum of the true score and the error associated with that
particular administration of the test.
• 3. Homogeneity of Items:

• CTT assumes that all items in a test measure the same underlying trait or
construct. This is referred to as the homogeneity or parallel-forms
assumption. The idea is that all items contribute equally to the measurement
of the underlying trait.
7 assumptions of CTT
• 4. Homoscedasticity of Errors:

• CTT assumes that the variability of errors is constant across all levels of the true
score. This is known as the homoscedasticity assumption. In other words, the
spread or dispersion of measurement errors does not change with different levels
of the true score.

• 5. Local Independence:

• CTT assumes that the responses to different items on a test are locally
independent, meaning that the response to one item is not influenced by the
response to another item. This assumption allows for the valid use of summative
total scores.

• 6. Unidimensionality:

• CTT assumes that the underlying trait or construct being measured by the test is
unidimensional, meaning that it can be adequately represented by a single
dimension. This assumption supports the validity of using a single score to
represent an individual's standing on the trait.
7 assumptions of CTT
• 7. No Differential Item Functioning (DIF):

• CTT assumes that the probability of endorsing an item (answering it correctly) is the same for individuals with the same true score, regardless of other characteristics such as gender, age, or cultural background. This assumption is related to the concept of measurement invariance across different groups.
Limitations of CTT
• Assumption of Additivity:

• CTT assumes observed scores are the sum of true scores and independent errors,
which may not hold in situations where item interactions affect performance.

• Homogeneity of Items:

• CTT assumes all items measure the same trait equally. In cases where items differ
in importance or measure different aspects, CTT may not accurately represent the
test's structure.

• Local Independence:

• CTT assumes responses to items are locally independent. Violations, like response
dependencies, can impact the accuracy of test score interpretations.

• Unidimensionality:

• CTT assumes the underlying trait is unidimensional. If the construct is multidimensional, CTT may not accurately capture an individual's abilities.
Limitations of CTT
• Sensitivity to Test Length:

• Reliability in CTT is influenced by test length. Longer tests tend to have higher
reliability, potentially inflating estimates for shorter tests.

• Limited Information about Items:

• CTT does not provide detailed information about item characteristics, limiting
insights into item difficulty and discrimination.

• Does Not Consider Test Item Difficulty:

• CTT does not explicitly model item difficulty, hindering understanding of how
different items contribute to overall test difficulty.

• Not Well-Suited for Adaptive Testing:

• CTT is less efficient for adaptive testing, where item difficulty is adjusted based on
individual performance.
Item Response Theory of testing
• Item Response Theory (IRT) is a modern psychometric
framework used to design, analyze, and score tests.
• Unlike Classical Test Theory (CTT), IRT focuses on the
characteristics of individual test items and examines
how individuals with different ability levels respond to
those items.
Components of IRT
• Item Response Theory (IRT) involves several key components that contribute to its modeling of the
relationship between individuals' abilities and their responses to test items. Here are the main
components of IRT:

• 1. Item Parameters:
• Difficulty Parameter (β):

• Represents the ability level at which individuals have a 50% chance of answering the item correctly.
Higher values indicate more difficult items.

• Discrimination Parameter (a):

• Reflects how well an item discriminates between individuals with high and low abilities. Higher values
indicate greater discriminatory power.

• Guessing Parameter (c):

• Indicates the likelihood of guessing the correct answer when an individual lacks the ability to respond
correctly. Lower values suggest less guessing.
Components of IRT
• 2. Item Characteristic Curve (ICC):
• The ICC is a graphical representation of how the probability of a correct response varies across
different levels of ability. Each item has its unique ICC, illustrating how the item performs across the
ability continuum.

• 3. Test Information Function:


• The test information function shows the amount of information the test provides at different levels of
ability. It is derived from the item parameters and indicates the precision of measurement. Peaks in the
information function represent areas where the test is most informative.

• 4. Item Response Curve:


• The item response curve is another term for the ICC. It graphically depicts the relationship between
the probability of a correct response and the examinee's ability level for a specific item.

• 5. Test Characteristic Curve (TCC):


• Similar to the ICC, the Test Characteristic Curve is a graphical representation of the overall test's
performance across different levels of ability. It is created by combining the ICCs of all items in the test.
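The test information function described above can be sketched numerically. For the 2PL model, the standard result is that item information equals a² · P(θ) · (1 − P(θ)), and test information is the sum of item information across items; the item parameters below are made-up illustrations.

```python
import math

def p_2pl(theta, a, beta):
    """2PL item response probability (standard logistic form)."""
    return 1 / (1 + math.exp(-a * (theta - beta)))

def item_information(theta, a, beta):
    """For the 2PL model, item information is a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, beta)
    return a ** 2 * p * (1 - p)

# Hypothetical two-item test: (a, beta) pairs are illustrative values.
items = [(1.2, -0.5), (0.8, 1.0)]
for theta in (-2, 0, 2):
    # Test information at this ability level = sum over items.
    info = sum(item_information(theta, a, b) for a, b in items)
    print(theta, round(info, 3))
```

Each item's information peaks where ability equals its difficulty (θ = β), which is why the test information function is highest where the items are targeted.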
Mathematical Representation (IRT MODEL)
• Item Response Function (IRT Model):
• The item response function describes the probability of a correct response to a test item based on an individual's ability level. In the three-parameter logistic form it is represented as:
• P(θ) = c + (1 − c) / (1 + e^(−a(θ − β)))
• Here θ is the examinee's ability, β the item difficulty, a the item discrimination, and c the guessing parameter.
Mathematical Representation (IRT MODEL)
• PARAMETRIC REPRESENTATIONS:

• 1-Parameter Logistic Model (1PLM):

• Parameter: Only one parameter is included, which is the item difficulty parameter (β).

• Assumption: Assumes that all items have the same discrimination; only item difficulty varies across items.

• Applicability: Suitable for measuring unidimensional constructs when discrimination differences are
not crucial.

• 2-Parameter Logistic Model (2PLM)

• Parameters:

• Item difficulty parameter (β): Location of the item on the ability scale.

• Item discrimination parameter (a): Steepness of the curve, indicating how well the item differentiates
between high and low ability.

• Assumption: Allows item discrimination to vary across items.

• Applicability: Widely used and suitable for many testing scenarios.


Mathematical Representation (IRT MODEL)
• 3-Parameter Logistic Model (3PLM)

• Parameters:

• Item difficulty parameter (β): Location of the item on the ability scale.

• Item discrimination parameter (a): Steepness of the curve, indicating how well the item differentiates
between high and low ability.

• Guessing parameter (c): Accounts for the probability of guessing the correct response when ability is
very low.

• Assumption: Adds a guessing parameter to address the likelihood of a correct response due to
guessing.

• Applicability: Useful when guessing may significantly influence responses, common in multiple-choice
items.
Let’s Summarise the Models:

• 1PLM: Simplest model with only an item difficulty parameter.


• 2PLM: Adds an item discrimination parameter to account for variable
discrimination.
• 3PLM: Introduces a guessing parameter to address the probability of a
correct response due to guessing.
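The summary above can be sketched by treating 1PLM and 2PLM as special cases of the 3PLM logistic form (a fixed at 1 and c at 0 for the 1PLM; c at 0 for the 2PLM); the parameter values below are illustrative assumptions.

```python
import math

def irt_probability(theta, beta, a=1.0, c=0.0):
    """3PL response probability; reduces to 2PL when c=0 and to 1PL when also a=1."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - beta)))

# Examinee ability exactly matches item difficulty (illustrative values).
theta, beta = 0.0, 0.0
p1 = irt_probability(theta, beta)                    # 1PLM: difficulty only
p2 = irt_probability(theta, beta, a=2.0)             # 2PLM: adds discrimination
p3 = irt_probability(theta, beta, a=2.0, c=0.2)      # 3PLM: adds guessing floor
print(p1, p2, p3)
```

At θ = β the logistic term is 0.5, so the 1PLM and 2PLM give a 50% chance of success, while the 3PLM's guessing parameter raises the probability's lower asymptote (here toward 0.6).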
Assumptions of IRT:
• Unidimensionality of the Measured Trait
• Local Independence
• Monotonicity
• Item Invariance
Unidimensionality of the Measured Trait
• The test is designed to measure a single underlying trait or construct. This means that all items on the test are related to the same latent variable.
• Unidimensionality assumes that variations in individual responses are primarily due to differences in this single trait and not influenced by other unrelated factors.
Local Independence
• The responses to different test items are statistically independent after accounting for an individual's level of the latent trait. Once the latent trait is known, knowledge of an individual's response to one item should provide no information about their response to another item.
• Local independence is crucial for ensuring that the test measures the intended trait without the interference of relationships between items.
Monotonicity
• The probability of a correct response to an item increases monotonically with the individual's level of the latent trait. In other words, as an individual's ability on the latent trait increases, so does the likelihood that they will respond correctly to a particular item.
• Monotonicity ensures that the test is sensitive to changes in the underlying trait.
Item Invariance
• Also known as measurement invariance or item parameter invariance, this assumption implies that the parameters of the IRT model (such as item difficulty and discrimination) are consistent across different groups or conditions.
• Item invariance ensures that the same item measures the same trait in a consistent manner across various subgroups (e.g., different demographic groups).
Limitations of IRT

• While Item Response Theory (IRT) offers numerous advantages in test development and measurement, it is not without limitations.
• 1. Assumption Sensitivity: IRT models are built on several
assumptions, including unidimensionality, local
independence, monotonicity, and item invariance. Deviations
from these assumptions can affect the accuracy of parameter
estimates and compromise the validity of test scores.
Limitations of IRT
• 2. Data Requirements: IRT models often require larger
sample sizes compared to Classical Test Theory, particularly
when estimating parameters for more complex models.
Inadequate sample sizes can lead to imprecise parameter
estimates and reduced model performance.
• 3. Complexity and Expertise: Implementing and interpreting
IRT models can be complex and requires a certain level of
statistical expertise. Researchers and practitioners may find it
challenging to navigate and apply IRT models effectively.
Limitations of IRT
• 4. Limited Applicability for Small-Scale Assessments: IRT is
most beneficial when applied to large-scale assessments with
a sufficient number of items and examinees. For small-scale
assessments or tests with a limited number of items, the
benefits of IRT may be diminished.
• 5. Difficulty in Test Development: Developing items for IRT
can be challenging. Items need to be carefully crafted to meet
the assumptions of the model, and item parameters must be
accurately estimated through pre-testing or calibration.
Remember…
We test attributes, not
humans.
We are open to questions!

Your speakers (nominal):


1. Prasad Deshpande
2. Srimoyee Kabiraj
3. Avishi Shah
4. Rukkaiya Ali
5. Parishi Somani
6. Priyal Sanghvi
