Module 2 - Measurement Fundamentals

Uploaded by

Sheikh Badshah
Level A: Basic Psychometrics

Module 2: Measurement Fundamentals

Professor Muhammad Kamal Uddin


(PhD, Kyushu University, Japan)
Department of Psychology
University of Dhaka, Bangladesh
Phone: 01713456644 Email:[email protected]
What is Psychometrics?
Psychometrics is a field of study concerned with the theory and
technique of psychological measurement. As defined by the US
National Council on Measurement in Education (NCME),
psychometrics refers to psychological measurement. Generally, it
refers to the field in psychology and education that is devoted to
testing, measurement, assessment, and related activities.

Psychological Assessment

It can be defined as the gathering and integration of psychology-


related data for the purpose of making a psychological evaluation,
accomplished through the use of tools such as tests, interviews, case
studies, behavioral observation, and specially designed apparatuses
and measurement procedures.

Tools of Psychological Assessment
1. Psychological test
2. Interview
3. Portfolio
4. Case History Data
5. Behavioral Observation
6. Role-Play Tests
7. Computers as Tools
8. Other Tools

A Taxonomy of Psychological Assessment
Psychological Test
According to Cronbach (1960) – “A psychological test is a systematic
procedure for comparing the behavior of two or more people”.
According to Anstey (1966) – “Psychological tests can be defined as devices
and techniques for the quantitative assessment of psychological attribute of an
individual”.
According to Anastasi (1969) – “A psychological test is an objective and
standardized measure of a sample of behavior”.

According to Gregory (1996) – “A test is a standardized procedure for
sampling behavior and describing it with categories or score”.
According to Anastasi and Urbina (1997) – “A psychological test is
essentially an objective and standardized measure of a sample of behavior”.
Definition of Standardized Tests
Standardized tests are constructed by test construction specialists, usually with
the assistance of curriculum experts, teachers, and school administrators. They
may be used to determine a student’s level of performance relative to (a) the
performance of other students of similar age and grade or (b) a criterion, such
as state academic standards, or the new Common Core State Standards. When
standardized tests are used to compare performance to students across the
country, they are called standardized norm-referenced tests, and when they are
used to determine whether performance meets or exceeds criteria like state
standards, they are called standardized criterion-referenced tests.
Norm-Referenced Tests (NRTs) and Criterion-Referenced Tests (CRTs)
• NRTs are typically standardized tests developed by commercial test publishers or some state education
agencies (e.g., the SAT). They are designed to enable us to compare the performance of students who
currently take the test with a sample of students who completed the test in the past. The sample of students
who completed the test in the past is called a norm group (or normative group or sample). NRTs tend to
measure broad educational goals and are usually lengthy (hours long in duration).

• CRTs may be standardized or teacher-made and enable a different kind of comparison. Compared to
NRTs, CRTs are typically shorter in length and narrower in focus. Instead of comparing current student
performance to other students, CRTs enable comparisons to an absolute standard or criterion. CRTs help us
to determine what a student can or cannot do. Rather than stating that “Marie is above average,” CRTs
enable us to make judgments about a student’s (or group of students’) level of proficiency or mastery over a
skill or set of skills (e.g., “Marie is able to spell the words in the third grade spelling list with greater than
80% accuracy”). For this reason, and because they are shorter than NRTs, scores from a CRT are more likely
to be useful for instructional decision making than scores from an NRT would be.
Definition of Measurement
• Measurement is the assignment of numerals to objects or events according to rules
(Stevens, 1946).
• Measurement is a process that involves three components – an object of measurement, a
set of numbers, and a system of rules – that serve to assign numbers to attributes or
magnitudes of the variable being measured.
• The rules are the specific procedures used to transform qualities of attributes into numbers
(Camilli, Cizek, & Lugg, 2001; Nunnally & Bernstein, 1994; Yanai, 2003).
• An educational or psychological test is a measuring device, and as such it involves rules
(e.g., specific items, administration, and scoring instructions) for assigning numbers to an
individual’s performance that are interpreted as reflecting characteristics of the individual.
Definition of Measurement…

Properties of Numbers

• Property of Identity
• Property of Order
• Property of Quantity
• The Number 0

Four Levels of Measurement
1. Nominal Scale: a scale in which the numbers or letters assigned to an object serve only
as labels for identification or classification, e.g. Gender (Male = 1, Female = 2).
2. Ordinal Scale: a scale that arranges objects or alternatives according to their magnitude
in an ordered relationship, e.g. Academic status (Freshman = 1, Sophomore = 2, Junior
= 3, etc.).
3. Interval Scale: a scale that arranges objects according to their magnitude and
expresses this ordered arrangement in units of equal intervals, but does not have a
natural zero representing absence of the given attribute, e.g. the Celsius temperature scale (40°C
is not twice as hot as 20°C).
4. Ratio Scale: a scale that has absolute rather than relative quantities and an absolute
(natural) zero where there is an absence of a given attribute, e.g. income, age.
Association Between Property of Number and Level of Measurement

Property of Number | Nominal | Ordinal | Interval | Ratio
Identity | ✓ | ✓ | ✓ | ✓
Order | | ✓ | ✓ | ✓
Quantity | | | ✓ | ✓
Absolute Zero | | | | ✓
Example | Sex | Class Rank | Temperature | Distance
Why Does Level of Measurement Matter?

• It helps you decide what mathematical operations and statistical analyses are appropriate
for the values that were assigned.
• It helps you decide how to interpret the data from that variable.
Mathematical Operations
• With nominal level data the only mathematical operations that are applicable are “equal
to” (=) and “not equal to” (≠).
• With ordinal level data one can also include “greater than” (>) and “less than” (<) as
applicable operations.
• With interval level data, addition and subtraction are also meaningful, so differences
between scores can be compared. However, because interval level scores do not have an
absolute or true zero, multiplying or dividing scores does not give accurate statements
about relative magnitude, and ratios cannot be created.
• With ratio level data, however, one can make accurate statements about relative
magnitude and create ratios.
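
The interval-versus-ratio point above can be checked numerically. The following is a minimal sketch, assuming Celsius and Fahrenheit as two interval scales for the same attribute; the function and variable names are illustrative and not part of the module.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius reading to Fahrenheit (another interval scale)."""
    return celsius * 9 / 5 + 32


hot_c, mild_c, cool_c = 40.0, 30.0, 20.0
hot_f, mild_f, cool_f = (celsius_to_fahrenheit(t) for t in (hot_c, mild_c, cool_c))

# Ratios of raw scores change with the unit, so "twice as hot" is not meaningful.
ratio_c = hot_c / cool_c
ratio_f = hot_f / cool_f

# Ratios of differences agree across units, so differences are meaningful.
diff_ratio_c = (hot_c - cool_c) / (mild_c - cool_c)
diff_ratio_f = (hot_f - cool_f) / (mild_f - cool_f)

print(f"Raw-score ratio: {ratio_c:.2f} (Celsius) vs {ratio_f:.2f} (Fahrenheit)")
print(f"Difference ratio: {diff_ratio_c:.2f} (Celsius) vs {diff_ratio_f:.2f} (Fahrenheit)")
```

The raw-score ratio depends on the arbitrary zero point of each scale, while the ratio of two differences does not, which is why interval data support subtraction but not statements such as "twice as much".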
Statistical Analyses
Descriptive Statistics
• With nominal level data the only applicable measure of central tendency is the mode. No common
measure of variability is applicable. One can describe the categories and the count (frequency
distribution) in each category.
• If ordinal scales are used, analysis of raw data can be done using the median and range (plus mode and
frequency distribution).
• If interval or ratio scales are used, analysis of raw data can be done through the use of mean, median,
mode, range, variance, and standard deviation.
Inferential Statistics
• Nominal and ordinal data are amenable to nonparametric statistics, whereas interval and ratio data can be
analyzed using parametric statistics. When their assumptions are met, parametric tests are more powerful,
meaning that they are more sensitive in detecting true differences between groups.
Types of Correlations
(Graph and statistic by the level of measurement of each pair of variables)

Nominal × Nominal: Clustered bar graph; Chi-square; Phi (φ) or Cramer's V
Nominal × Ordinal: Clustered bar graph; Chi-square; Phi (φ) or Cramer's V
Nominal × Interval/Ratio (Scale): Scatter plot, bar chart, or error-bar chart; Point bi-serial correlation
Ordinal × Ordinal: Scatter plot or clustered bar chart; Spearman’s Rho or Kendall’s Tau
Ordinal × Interval/Ratio (Scale): Recode; Scatter plot; Pearson Point bi-serial, or Spearman’s Rho, or Kendall’s Tau
Interval/Ratio (Scale) × Interval/Ratio (Scale): Scatter plot; Pearson Product-moment correlation
Types of Correlations
No. | Correlation | Level of Measurement
1 | Phi, contingency | Both variables nominal
2 | Spearman rank order, Kendall’s tau | Both variables ordinal
3 | Pearson product moment | Both variables interval
4 | Pearson Point biserial | One variable interval, one variable (naturally) dichotomous/binary
5 | Pearson biserial | One variable interval, one variable artificially dichotomous/binary
6 | Polychoric | Both variables ordinal with underlying continuities
7 | Tetrachoric | Both variables artificially dichotomous
Types of Graphs
Type | Level of Measurement
Bar Chart | Nominal; must be organized into categories
Pie Chart | Nominal, ordinal, interval, or ratio. However, it is not practical to use a pie chart when there are more than five or six possible values for a variable.
Histogram/Frequency Polygon | Ordinal, interval, or ratio level data. Most often used with ratio or interval level data
Line Chart | Interval and ratio data
Single System Design | Interval and ratio data
Types of Graphs

[Figure: ABA Design line chart. Y-axis: Low Self-esteem (0 to 70). X-axis: sessions 1 to 14, divided into Baseline, Intervention, and Baseline phases.]
Definition of Variable

Any characteristic which is subject to change and can have more


than one value such as age, intelligence, motivation, gender, etc.

Types of Variables Based on How They are Measured
Exercise
1. The number of participants in the current batch.
2. A person’s weight is measured on which kind of scale?
3. Salaries of college professors.
4. What level of measurement is of an IQ score?
5. Variables gender or ethnicity are measured on which scale?
6. Where do you live?
7. What is your political preference?
8. What is your hair color?
9. What is your social class?
10. Number of students present in the class
11. Time it takes to get to school
12. Number of red marbles in a jar
13. Distance traveled between classes
14. Students’ grade level
15. Height of students in class
16. Weight of students in class
17. Number of heads when flipping three coins
18. Temperature measured on the Kelvin scale
19. How satisfied are you with the training program?
• Very Unsatisfied – 1
• Unsatisfied – 2
• Neutral – 3
• Satisfied – 4
• Very Satisfied – 5
20. How would you rate our app?
• Excellent
• Very Good
• Good
• Bad
• Poor
21. In medical practice, burns are commonly described as
• First-Degree
• Second-Degree
• Third-Degree
Main Differences Between Discrete Versus Continuous Variables
Basis of Comparison | Discrete Variable | Continuous Variable
Meaning | A variable with a limited number of isolated values | A variable with an unlimited number of values within a range
Values | Countable | Measurable
Range of specified number | Complete or whole | Incomplete
Represented by | Lone points on a graph | Linked points
Classification | Do not overlap | Overlapping
Assumes | Separate or distinct value | Any value within a range
Types of Variables Based on the Role They Play in a Study

Independent, Input, Covariate, Exploratory, Organismic, Predictor, Exogenous, Manipulated, Treatment, Test, Explanatory

Dependent, Output, Outcome, Effect, Criterion, Endogenous, Response

Extraneous, Confounding, Intervening, Mediating, Moderating, Interaction, Control or Constant, Dummy or Indicator
Assignment

1. Prepare a psychometric test or questionnaire including demographics to
demonstrate categorical (nominal and ordinal), continuous (interval and
ratio), and discrete (either continuous or categorical) variables based on
how they are measured and label them graphically.

2. Explain all kinds of variables based on the role they play in a study and
present them pictorially.
Errors in Measurement

1. Random Error
2. Systematic Error
What Is Random Error?
Any factors that randomly affect measurement of the variable
across the sample. For instance, each person’s mood can
inflate or deflate performance on any occasion. Random error
adds variability to the data but does not affect average
performance for the group.

Random Error

[Figure: frequency distributions of X with no random error and with random error.]

Notice that random error doesn’t affect the average, only the
variability around the average.
Participant | Observed Score = X | True Score = T | Error = E
1 | 7 | 6 | 1
2 | 5 | 5 | 0
3 | 8 | 7 | 1
4 | 8 | 7 | 1
5 | 7 | 8 | -1
6 | 6 | 5 | 1
7 | 5 | 6 | -1
8 | 3 | 5 | -2
9 | 5 | 6 | -1
10 | 6 | 5 | 1
Mean | 6.00 | 6.00 | 0.00
Variance | 2.44 | 1.11 | 1.33
Variance T / Variance X = 0.45
Squared Correlation (X, T) = 0.45
Correlation (X, T) = 0.67
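
The summary rows of the random-error table can be recomputed from the ten score pairs. This is a minimal sketch, assuming the sample (N − 1) formulas used elsewhere in this module; the helper names are illustrative only.

```python
from math import sqrt

observed = [7, 5, 8, 8, 7, 6, 5, 3, 5, 6]    # X
true_scores = [6, 5, 7, 7, 8, 5, 6, 5, 6, 5]  # T
errors = [x - t for x, t in zip(observed, true_scores)]  # E = X - T


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Sample variance with an N - 1 denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def covariance(a, b):
    """Sample covariance with an N - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def correlation(a, b):
    return covariance(a, b) / sqrt(variance(a) * variance(b))


variance_ratio = variance(true_scores) / variance(observed)
r_xt = correlation(observed, true_scores)

print(f"Mean error: {mean(errors):.2f}")
print(f"Var(X) = {variance(observed):.2f}, Var(T) = {variance(true_scores):.2f}, "
      f"Var(E) = {variance(errors):.2f}")
print(f"Var(T)/Var(X) = {variance_ratio:.2f}, r(X,T) = {r_xt:.2f}, r^2 = {r_xt ** 2:.2f}")
```

Because the errors in this table average zero and are uncorrelated with the true scores, the variance ratio and the squared correlation coincide.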
What Is Systematic Error/Bias?
Any factors that systematically affect measurement of the variable across
the sample. For instance, asking leading questions like “Do you favor eliminating
the wasteful excess in the military budget?” will tend to yield a systematically
higher endorsement rate for one response option. Systematic error does
affect average performance for the group.
Systematic Error

[Figure: frequency distributions of X with no systematic error and with systematic error.]

Notice that systematic error does affect the average; we call
this a bias.
Participant | Observed Score = X | True Score = T | Error = E
1 | 7 | 6 | 1
2 | 5 | 4 | 1
3 | 8 | 7 | 1
4 | 8 | 7 | 1
5 | 7 | 6 | 1
6 | 6 | 5 | 1
7 | 5 | 4 | 1
8 | 3 | 2 | 1
9 | 5 | 4 | 1
10 | 6 | 5 | 1
Mean | 6.00 | 5.00 | 1.00
Variance | 2.44 | 2.44 | 0.00
Variance T / Variance X = 1.00
Squared Correlation (X, T) = 1.00
Correlation (X, T) = 1.00
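Variance, Covariance, and Correlation

$$\text{Variance: } \sigma_X^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{\sum (X - \bar{X})(X - \bar{X})}{N - 1}$$

$$\text{Covariance: } \sigma_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$$

$$\text{Correlation: } \rho_{XY} = \frac{\text{Covariance}}{\sigma_X \sigma_Y} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$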
Computing Variance
Student | X | X − X̄ | (X − X̄)²
1 | 4 | -2.6 | 6.76
2 | 5 | -1.6 | 2.56
3 | 3 | -3.6 | 12.96
4 | 8 | 1.4 | 1.96
5 | 8 | 1.4 | 1.96
6 | 6 | -0.6 | 0.36
7 | 6 | -0.6 | 0.36
8 | 7 | 0.4 | 0.16
9 | 9 | 2.4 | 5.76
10 | 10 | 3.4 | 11.56
Sum | 66.00 | 0.00 | 44.40
Mean | 6.60 | 0.00 | (N = 10)
Variance = (Sum of Squared Deviations) / (N − 1) = 44.40 / 9 = 4.93
Computing Covariance
Student | X | X − X̄ | Y | Y − Ȳ | (X − X̄)(Y − Ȳ)
1 | 4 | -2.6 | 20 | 2.6 | -6.76
2 | 5 | -1.6 | 19 | 1.6 | -2.56
3 | 3 | -3.6 | 21 | 3.6 | -12.96
4 | 8 | 1.4 | 16 | -1.4 | -1.96
5 | 8 | 1.4 | 16 | -1.4 | -1.96
6 | 6 | -0.6 | 18 | 0.6 | -0.36
7 | 6 | -0.6 | 18 | 0.6 | -0.36
8 | 7 | 0.4 | 17 | -0.4 | -0.16
9 | 9 | 2.4 | 15 | -2.4 | -5.76
10 | 10 | 3.4 | 14 | -3.4 | -11.56
Sum | 66.00 | 0.00 | 174.00 | 0.00 | -44.40
Mean | 6.60 | 0.00 | 17.40 | 0.00 |
Covariance = (-44.40 / 9) = -4.93
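
The two worked tables above (variance of X, covariance of X and Y) can be reproduced in a few lines. This is a minimal sketch using the ten X and Y values from the tables and the N − 1 denominator; the function names are illustrative.

```python
from math import sqrt

x = [4, 5, 3, 8, 8, 6, 6, 7, 9, 10]
y = [20, 19, 21, 16, 16, 18, 18, 17, 15, 14]


def mean(values):
    return sum(values) / len(values)


def sum_of_products(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))


n = len(x)
var_x = sum_of_products(x, x) / (n - 1)   # variance is the covariance of X with itself
var_y = sum_of_products(y, y) / (n - 1)
cov_xy = sum_of_products(x, y) / (n - 1)
r_xy = cov_xy / (sqrt(var_x) * sqrt(var_y))

print(f"Var(X) = {var_x:.2f}")
print(f"Cov(X, Y) = {cov_xy:.2f}")
print(f"r(X, Y) = {r_xy:.2f}")
```

In this data set every Y equals 24 − X, so the covariance is the negative of the variance and the correlation is −1.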
Computing Correlation
Participant | X = Height (cm) | Y = Weight (g) | X = Height (inch) | Y = Weight (dag)
1 | 7 | 6 | 2.76 | 0.60
2 | 5 | 5 | 1.97 | 0.50
3 | 8 | 7 | 3.15 | 0.70
4 | 8 | 7 | 3.15 | 0.70
5 | 7 | 8 | 2.76 | 0.80
6 | 6 | 5 | 2.36 | 0.50
7 | 5 | 6 | 1.97 | 0.60
8 | 3 | 5 | 1.18 | 0.50
9 | 5 | 6 | 1.97 | 0.60
10 | 6 | 5 | 2.36 | 0.50
Mean | 6 | 6 | 2.36 | 0.60
Covariance (N − 1) | 1.11 (cm, g) | 0.04 (inch, dag)
Correlation | 0.67 (cm, g) | 0.67 (inch, dag)
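
The point of the table above is that covariance depends on the units of measurement while correlation does not. The following is a minimal sketch of that comparison, assuming the N − 1 divisor used in this module and assuming the conversions 1 inch = 2.54 cm and 1 decagram = 10 g; the variable names are illustrative.

```python
from math import sqrt

height_cm = [7, 5, 8, 8, 7, 6, 5, 3, 5, 6]
weight_g = [6, 5, 7, 7, 8, 5, 6, 5, 6, 5]

# Rescaled copies of the same measurements.
height_in = [h / 2.54 for h in height_cm]
weight_dag = [w / 10 for w in weight_g]


def covariance(a, b):
    """Sample covariance with an N - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def correlation(a, b):
    return covariance(a, b) / sqrt(covariance(a, a) * covariance(b, b))


cov_original = covariance(height_cm, weight_g)
cov_rescaled = covariance(height_in, weight_dag)
r_original = correlation(height_cm, weight_g)
r_rescaled = correlation(height_in, weight_dag)

print(f"Covariance: {cov_original:.2f} (cm, g) vs {cov_rescaled:.2f} (inch, dag)")
print(f"Correlation: {r_original:.2f} (cm, g) vs {r_rescaled:.2f} (inch, dag)")
```

Dividing the covariance by both standard deviations cancels the units, which is why the correlation is unchanged by the rescaling.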
$$COV_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1} = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{N - 1}$$

$$COV_{XX} = \frac{\sum (X - \bar{X})(X - \bar{X})}{N - 1} = \frac{\sum XX - \frac{\sum X \sum X}{N}}{N - 1}$$

$$Var_X = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}$$

$$Var_Y = \frac{\sum (Y - \bar{Y})^2}{N - 1} = \frac{\sum Y^2 - \frac{(\sum Y)^2}{N}}{N - 1}$$

• Often multiple items are combined in order to create a composite score.
• The variance of the composite is a combination of the variances and
covariances of the items creating it.
• The General Variance Sum Law states that if X and Y are random variables:

$$\sigma_{X \pm Y}^2 = \sigma_X^2 + \sigma_Y^2 \pm 2\sigma_{XY}$$
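Variance Sum Law

$$\sigma_{X \pm Y}^2 = \frac{\sum [(X \pm Y) - (\bar{X} \pm \bar{Y})]^2}{N - 1}$$

$$\sigma_{X \pm Y}^2 = \frac{\sum [(X - \bar{X}) \pm (Y - \bar{Y})]^2}{N - 1}$$

$$\sigma_{X \pm Y}^2 = \frac{\sum (X - \bar{X})^2}{N - 1} + \frac{\sum (Y - \bar{Y})^2}{N - 1} \pm \frac{2\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$$

$$\sigma_{X \pm Y}^2 = \sigma_X^2 + \sigma_Y^2 \pm 2\sigma_{XY}$$

$$\sigma_{X \pm Y}^2 = \sigma_X^2 + \sigma_Y^2 \pm 2\rho_{XY}\sigma_X\sigma_Y$$

$$\text{Correlation } (\rho_{XY}) = \frac{\text{Covariance } (\sigma_{XY})}{\sigma_X \sigma_Y}$$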
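$$\sigma_{X + Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}$$

Individual | X (Sleep) | Y (Awaken) | (X+Y)
1 | 4 | 20 | 24
2 | 5 | 19 | 24
3 | 3 | 21 | 24
4 | 8 | 16 | 24
5 | 8 | 16 | 24
6 | 6 | 18 | 24
7 | 6 | 18 | 24
8 | 7 | 17 | 24
9 | 9 | 15 | 24
10 | 10 | 14 | 24
Variance | 4.93 | 4.93 | 0
Covariance (X, Y) = -4.93
Variance (X+Y) = 4.93 + 4.93 + 2(-4.93) = 0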
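$$\sigma_{X - Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}$$

Individual | X (Sleep) | Y (Awaken) | (X-Y)
1 | 4 | 20 | -16
2 | 5 | 19 | -14
3 | 3 | 21 | -18
4 | 8 | 16 | -8
5 | 8 | 16 | -8
6 | 6 | 18 | -12
7 | 6 | 18 | -12
8 | 7 | 17 | -10
9 | 9 | 15 | -6
10 | 10 | 14 | -4
Variance | 4.93 | 4.93 | 19.73
Covariance (X, Y) = -4.93
Variance (X-Y) = 4.93 + 4.93 - 2(-4.93) = 19.73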
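$$\sigma_{X + Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}$$

Individual | X (Sleep) | Y (Study) | (X+Y)
1 | 5 | 4 | 9
2 | 6 | 5 | 11
3 | 7 | 4 | 11
4 | 8 | 4 | 12
5 | 6 | 5 | 11
6 | 7 | 5 | 12
7 | 6 | 5 | 11
8 | 7 | 4 | 11
9 | 8 | 5 | 13
10 | 6 | 4 | 10
Variance | 0.93 | 0.28 | 1.21
Covariance (X, Y) = 0.00
Variance (X+Y) = 0.93 + 0.28 + 2(0.00) = 1.21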
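$$\sigma_{X - Y}^2 = \sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}$$

Individual | X (Sleep) | Y (Study) | (X-Y)
1 | 5 | 4 | 1
2 | 6 | 5 | 1
3 | 7 | 4 | 3
4 | 8 | 4 | 4
5 | 6 | 5 | 1
6 | 7 | 5 | 2
7 | 6 | 5 | 1
8 | 7 | 4 | 3
9 | 8 | 5 | 3
10 | 6 | 4 | 2
Variance | 0.93 | 0.28 | 1.21
Covariance (X, Y) = 0.00
Variance (X-Y) = 0.93 + 0.28 - 2(0.00) = 1.21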
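
The variance sum law can be verified directly on the Sleep/Awaken data from the tables above. This is a minimal sketch, assuming sample (N − 1) variances; the helper and variable names are illustrative.

```python
sleep = [4, 5, 3, 8, 8, 6, 6, 7, 9, 10]
awake = [20, 19, 21, 16, 16, 18, 18, 17, 15, 14]


def covariance(a, b):
    """Sample covariance with an N - 1 denominator; covariance(a, a) is the variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


var_x = covariance(sleep, sleep)
var_y = covariance(awake, awake)
cov_xy = covariance(sleep, awake)

# Variances computed directly from the composite scores.
sums = [x + y for x, y in zip(sleep, awake)]
diffs = [x - y for x, y in zip(sleep, awake)]
var_sum_direct = covariance(sums, sums)
var_diff_direct = covariance(diffs, diffs)

# Variances predicted by the variance sum law.
var_sum_law = var_x + var_y + 2 * cov_xy
var_diff_law = var_x + var_y - 2 * cov_xy

print(f"Var(X+Y): direct = {var_sum_direct:.2f}, law = {var_sum_law:.2f}")
print(f"Var(X-Y): direct = {var_diff_direct:.2f}, law = {var_diff_law:.2f}")
```

Both routes agree: the sum has zero variance because sleep and waking hours always add to 24, and the difference has variance 19.73.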
Theories of Measurement

1. Classical Test Theory (CTT)


2. Item Response Theory (IRT)
3. Generalizability Theory (GT)
Classical Test Theory

Lord and Novick (1968) say “The classical test theory model is
based on a particular, mathematically convenient and conceptually
useful, definition of true score and on certain basic assumptions
concerning the relationships among true and error scores”.
Classical Test Theory

$$X = T + E$$

Observed Score = True Score + Random Error
Classical test theory also assumes that (a) the distribution of observed scores
that a person may have under repeated independent testing is normal and (b)
the standard deviation of the normal distribution, referred to as standard error
of measurement (SEM), is the same for all persons taking the test.

Under these assumptions, the figure represents the (hypothetical) normal
distribution of observed scores for repeated measurements of one person
with the same test. The mean of this distribution is, in fact, the person’s
true score (T = 20) and the standard deviation is the standard error of
measurement (SEM = 2). An approximate 95% band for the true score around
an observed score X is:

[X – 2(SEM)] < T < [X + 2(SEM)]
Assumptions in CTT
The classical measurement model, which asserts that an observed score, X, results
from the summation of a true score, T, plus error, E, starts with common assumptions
about items and their relationships to the latent variable and sources of error:

The observed score (X) of a person is the sum of the true part (T) and the error part (E).

$$X = T + E$$
Assumptions
1. The amount of error associated with individual items varies randomly.
The error associated with individual items has a mean of zero when
aggregated across a large number of people. Thus, items’ means tend to
be unaffected by error when a large number of respondents complete the
items.

$$\mu_E = 0 \quad ...(1)$$

The expected value of E is zero: the errors are deviations from T, that is,
deviation scores from the mean.
Assumptions…
2. Measurement errors between two items of the same scale are
uncorrelated. That is, one item’s error term is not correlated with
another item’s error term (i.e. assumption of local independence); the
only routes linking items always pass through the latent variable,
never through any error term.

$$\rho_{E_1 E_2} = 0 \quad ...(2)$$
Assumptions…
3. Error terms are not correlated with the true score of the latent variable.
For example, with a high value of T, the E is not systematically lower or higher.
So, the assumption is:

$$\rho_{TE} = 0 \quad ...(3)$$
Important Deduction from the CTT
1. The observed variance of scores in a sample or population equals the true
score variance of a sample/population plus the error variance. The total
observed variance in a test/questionnaire consists of the sum of the true
variances and the error variances. Because the correlation between the T’s
and E’s is zero, no covariance term between T and E has to be added.

$$\sigma_X^2 = \sigma_T^2 + \sigma_E^2 \quad ...(4)$$

Note. Cov(T, E) = 0 by assumption (3).
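
The step from X = T + E to equation (4) is an application of the variance sum law presented earlier in this module. A short worked sketch, using only symbols already defined in the text:

```latex
\begin{align*}
\sigma_X^2 &= \sigma_{T+E}^2 \\
           &= \sigma_T^2 + \sigma_E^2 + 2\,\mathrm{Cov}(T, E) \\
           &= \sigma_T^2 + \sigma_E^2 + 2(0) \\
           &= \sigma_T^2 + \sigma_E^2
\end{align*}
```

The covariance term vanishes only because of assumption (3); if errors were related to true scores, observed variance would no longer split cleanly into true and error parts.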


Important Deduction from the CTT…
2. The covariance of observed score with true score is just the
variance of true score.

$$Cov(X, T) = Cov(T + E, T) = \sigma_T^2 + Cov(T, E) = \sigma_T^2 + 0 = \sigma_T^2 \quad ...(5)$$

3. The correlation of observed score with true score is

$$\rho_{XT} = \frac{Cov(X, T)}{\sigma_X \sigma_T} = \frac{\sigma_T^2}{\sigma_X \sigma_T} = \frac{\sigma_T}{\sigma_X} \quad ...(6)$$

4. The correlation of observed score with error score is

$$\rho_{XE} = \frac{Cov(X, E)}{\sigma_X \sigma_E} = \frac{Var(E)}{\sigma_X \sigma_E} = \frac{\sigma_E \sigma_E}{\sigma_X \sigma_E} = \frac{\sigma_E}{\sigma_X} \quad ...(7)$$
Theoretical Definition of Reliability
1. Reliability can be defined as the ratio of the true score to the observed
score.

$$R = \frac{T}{X} \quad ...(1)$$

2. Reliability is generally defined as the ratio of the true score variance to the
observed score variance.

$$\text{Reliability} = \frac{\sigma_T^2}{\sigma_X^2} \quad ...(2)$$

3. Reliability is the squared correlation between true score and observed
score.

$$\text{Reliability} = \rho_{XT}^2 \quad ...(3)$$

$$\therefore \text{Reliability } (\rho_{XX'}) = \frac{\sigma_T^2}{\sigma_X^2} = \rho_{XT}^2 \quad ...(4)$$
Theoretical Definition of Reliability…
You may be wondering how we can compute a reliability coefficient if we don’t know the
true scores of all the test takers. Fortunately, the answer is simple. There is another
definition of reliability/precision that is mathematically equivalent to the formula that uses
true score variance and observed score variance to calculate reliability. That definition is as
follows: Reliability/precision is equal to the correlation between the observed scores on two
parallel tests (Crocker & Algina, 1986).

$$\text{Reliability} = \rho_{XX'}$$
Theoretical Definition of Reliability…
The theoretical reliability coefficient is not practical; we do not know each
person’s true score. So, we cannot compute reliability. If we can’t compute
reliability, perhaps the best we can do is to estimate it.

Test-Retest Reliability
Alternate/Equivalent/Parallel Forms Reliability
Internal Consistency Reliability
Inter-Rater /Inter-Scorer /Scorer Reliability
Each traditional estimation method – test-retest, parallel (equivalent) forms,
and internal consistency – defines reliability somewhat differently; none is
isomorphic with the theoretical definition.
Test-Retest Reliability
The test-retest method for estimating reliability involves
administering the same test to the same group of individuals on
two different occasions and then correlating the two sets of
scores. When using this method, the reliability coefficient
indicates the degree of stability (consistency) of examinees'
scores over time and is also known as the coefficient of stability.
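
The test-retest idea can be sketched with a small simulation. This is an illustration under assumed values, not data from the module: true scores with SD 2 and independent errors with SD 1 on each occasion, so the theoretical reliability is 4 / (4 + 1) = 0.80. The seed, sample size, and names are arbitrary choices.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / sqrt(ss_a * ss_b)


rng = random.Random(42)          # fixed seed so the run is repeatable
n_people = 20000
true_sd, error_sd = 2.0, 1.0     # assumed values for illustration
theoretical_reliability = true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)

true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
# Same true score on both occasions, fresh random error each time.
occasion_1 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
occasion_2 = [t + rng.gauss(0.0, error_sd) for t in true_scores]

test_retest_r = pearson(occasion_1, occasion_2)
print(f"Theoretical reliability: {theoretical_reliability:.2f}")
print(f"Test-retest correlation: {test_retest_r:.2f}")
```

With a large simulated sample the correlation between the two occasions lands close to the ratio of true score variance to observed score variance, which is the sense in which the coefficient of stability estimates reliability.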

Now, to see how repeatable or consistent an observation is,
we can measure it twice.
If we can't compute reliability, perhaps the best we can do is to
estimate it. Maybe we can get an estimate of the variability of the
true scores. How do we do that? Remember our two observations,
X1 and X2? We assume that these two observations would be related
to each other to the degree that they share true scores. So, let’s
calculate the correlation between X1 and X2. Here’s a simple
formula for the correlation:

$$\text{Correlation} = \frac{\text{Covariance}(X_1, X_2)}{SD_{X_1} \times SD_{X_2}}$$

Because the errors on the two occasions are uncorrelated with each other and
with T, Covariance(X1, X2) = Variance(T). If the two occasions also have equal
error variance, then SD(X1) = SD(X2), and the denominator is Variance(X):

$$\text{Correlation} = \frac{\text{Variance}(T)}{\text{Variance}(X)}$$

$$\therefore \text{Correlation} = \text{Reliability} \quad \left[\because \frac{\text{Variance}(T)}{\text{Variance}(X)} = \text{Reliability}\right]$$
Alternate/Equivalent/Parallel Forms Reliability
The theoretical reliability coefficient is not practical; we do not know each person’s
true score. Nevertheless, we can estimate the theoretical coefficient with the sample
correlation between scores on two parallel tests. Assume that X and X′ are two
strictly parallel tests (for simplicity) – that is, tests with equal means, variances,
covariances with each other, and equal covariances with any outside measure. The
Pearson product-moment correlation between parallel tests produces an estimate of
the theoretical reliability coefficient:

$$r_{XX'} = \frac{Cov(X, X')}{S_X S_{X'}} = \frac{Cov(T + E, T + E')}{S_X^2} = \frac{S_T^2}{S_X^2} = \text{Reliability}$$

• Kuder-Richardson 20 (KR 20):
• a special case of alpha
• applies only to dichotomous items

$$KR_{20} = \frac{k}{k - 1}\left(\frac{s_{Total}^2 - \sum pq}{s_{Total}^2}\right)$$

Where, k is the number of items and pq is the variance for each dichotomous item:
the proportion of individuals who pass (p) multiplied by the proportion who
fail (q) each item.
Calculate Kuder-Richardson 20 reliability coefficients on the following scores on an achievement test, where 1
indicates a right answer and 0 a wrong answer.

Item | Examinee A | B | C | D | E | p | q | pq
1 | 1 | 1 | 0 | 1 | 1 | 0.8 | 0.2 | 0.16
2 | 1 | 0 | 0 | 0 | 0 | 0.2 | 0.8 | 0.16
3 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0.00
4 | 1 | 1 | 1 | 0 | 0 | 0.6 | 0.4 | 0.24
5 | 1 | 0 | 1 | 1 | 0 | 0.6 | 0.4 | 0.24
Total Score | 5 | 3 | 3 | 3 | 2 | | | Σpq = 0.80

Mean of total scores = 3.2
S²Total = 1.2
Σpq = 0.8
2 

 X -X 
2

N σ 2  pq
2 2 p  Item Facility
X - 2 X X   X
2 
N q  Item Difficulty
2
X 2 2 X X X
2   
N N N
2
 X 0 or 1
X  ΣX  X
2   2  X   X X 2
N  N  N
2
NX
σ 2  p  2p 2 
N
σ 2  p  2p 2  p 2
σ 2  p(1  p) σ 2  pq
80
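
The KR-20 exercise above can be worked through in code. This is a minimal sketch, assuming the total-score variance is the sample (N − 1) variance, since that is what reproduces the 1.2 shown in the table; the data layout and names are illustrative.

```python
# Rows are items 1-5; columns are examinees A-E (1 = right, 0 = wrong).
item_scores = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
]

k = len(item_scores)               # number of items
n = len(item_scores[0])            # number of examinees

# Item variance for a dichotomous item is p * q.
sum_pq = 0.0
for item in item_scores:
    p = sum(item) / n              # proportion passing the item
    q = 1 - p                      # proportion failing the item
    sum_pq += p * q

# Total score per examinee, then its sample variance (N - 1 denominator, assumed).
totals = [sum(column) for column in zip(*item_scores)]
mean_total = sum(totals) / n
total_variance = sum((t - mean_total) ** 2 for t in totals) / (n - 1)

kr20 = (k / (k - 1)) * ((total_variance - sum_pq) / total_variance)

print(f"Totals: {totals}")
print(f"Sum of pq = {sum_pq:.2f}, total variance = {total_variance:.2f}")
print(f"KR-20 = {kr20:.2f}")
```

Note that pq is a population-style (divide by N) item variance while the total variance here divides by N − 1, following the table; using N for both would give a different coefficient.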
$$\alpha = \frac{k}{k - 1}\left(\frac{\sum s_{ij}}{s_{Total}^2}\right)$$

• s²Total is the composite variance (if items were summed)
• s_ij is the covariance between the ith and jth items, where i is not equal to j
• k is the number of items
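Item | Ria | Zia | Pia | Variance
(Ria, Zia, and Pia are the respondents)
1 | 6 | 5 | 4 | 1.00
2 | 6 | 4 | 5 | 1.00
3 | 5 | 3 | 3 | 1.33
4 | 4 | 4 | 4 | 0.00
5 | 4 | 5 | 4 | 0.33
Total | 25 | 21 | 20 |
Variance of Total Scores = 7.0
Total of item variances = 3.67 (from unrounded item variances)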
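
The table above supplies everything needed for coefficient alpha in its item-variance form, α = k/(k − 1) × (1 − Σs²ᵢ / s²Total). The following is a minimal sketch, assuming that this is the intended use of the table and that all variances use the N − 1 denominator; names are illustrative.

```python
# Rows are items 1-5; columns are respondents Ria, Zia, Pia.
ratings = [
    [6, 5, 4],
    [6, 4, 5],
    [5, 3, 3],
    [4, 4, 4],
    [4, 5, 4],
]


def sample_variance(values):
    """Variance with an N - 1 denominator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


k = len(ratings)
item_variances = [sample_variance(item) for item in ratings]
sum_item_variances = sum(item_variances)

totals = [sum(column) for column in zip(*ratings)]   # total score per respondent
total_variance = sample_variance(totals)

alpha = (k / (k - 1)) * (1 - sum_item_variances / total_variance)

print(f"Totals: {totals}")
print(f"Sum of item variances = {sum_item_variances:.2f}")
print(f"Variance of totals = {total_variance:.2f}")
print(f"Alpha = {alpha:.2f}")
```

With only three respondents the estimate is very unstable; the example is useful for tracing the arithmetic rather than for drawing conclusions about the scale.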
A 3-item (X, Y, Z) psychological test is administered to 60 participants in Basic Psychometrics
and the following variance-covariance matrix is obtained:

  | X | Y | Z
X | 55.83 | 29.52 | 30.33
Y | 29.52 | 17.49 | 16.15
Z | 30.33 | 16.15 | 29.06

The sum of all the item variances is 102.38
The sum of all the item covariances is 152.03
The variance of the Total Test Scores is 254.41

$$S_{Total}^2 = 55.83 + 17.49 + 29.06 + 2(29.52 + 30.33 + 16.15)$$
$$\alpha = \frac{k}{k - 1}\left(\frac{s_{Total}^2 - \sum s_i^2}{s_{Total}^2}\right) = \frac{3}{3 - 1}\left(\frac{254.41 - 102.38}{254.41}\right) = .8964$$

$$\alpha = \frac{k}{k - 1}\left(\frac{\sum s_{ij}}{s_{Total}^2}\right) = \frac{3}{3 - 1}\left(\frac{152.03}{254.41}\right) = .8964$$
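
Both forms of alpha can be computed straight from the variance-covariance matrix. This is a minimal sketch using the rounded matrix entries printed above, so the sums differ slightly from the 152.03 and 254.41 quoted in the text and alpha comes out at about .896 rather than exactly .8964; the names are illustrative.

```python
# Variance-covariance matrix for items X, Y, Z (rounded values from the text).
cov_matrix = [
    [55.83, 29.52, 30.33],
    [29.52, 17.49, 16.15],
    [30.33, 16.15, 29.06],
]

k = len(cov_matrix)

# Item variances sit on the diagonal; covariances are the off-diagonal cells.
sum_item_variances = sum(cov_matrix[i][i] for i in range(k))
total_variance = sum(sum(row) for row in cov_matrix)   # variance of the summed score
sum_covariances = total_variance - sum_item_variances  # each pair counted twice

alpha_variance_form = (k / (k - 1)) * ((total_variance - sum_item_variances) / total_variance)
alpha_covariance_form = (k / (k - 1)) * (sum_covariances / total_variance)

print(f"Sum of item variances = {sum_item_variances:.2f}")
print(f"Sum of covariances = {sum_covariances:.2f}")
print(f"Total variance = {total_variance:.2f}")
print(f"Alpha (variance form) = {alpha_variance_form:.4f}")
print(f"Alpha (covariance form) = {alpha_covariance_form:.4f}")
```

The two forms are algebraically identical because the total variance is the sum of every cell in the matrix, so subtracting the diagonal leaves exactly the off-diagonal covariances.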
Models of Measurement

Just as routine tests to check for any violations of normality should be carried out, so equally should tests for
assumptions in reliability estimation be applied. Despite the seemingly obscure labels given to the models, all
are connected by four underlying and easily-described properties of a scale (e.g., see Graham, 2006). These
properties are as follows:
i) the extent to which each item measures the same underlying trait (unidimensionality);
ii) whether the true scores for different items have the same mean (sensitivity);
iii) whether the true scores for different items have the same variance; and
iv) whether the error variance is the same for each item.
Thus the degree to which one assumes either constancy or variability of properties ii) to iv) is what distinguishes
the essentially tau-equivalent from parallel or congeneric models.
Models of Measurement…

In CTT, measures of the same thing (e.g., items, subtests, or tests) can be classified by their levels of similarity.
In this section, I define four levels of similarity: parallel, tau-equivalent, essentially tau-equivalent, and
congeneric. Note that these levels are hierarchical in the sense that the highest level (parallel) requires the most
similarity, whereas levels lower in the hierarchy allow for less similarity in test properties. For example, parallel
measures must have equal true score variances, whereas congeneric measures do not require this. One useful
way of thinking about these levels is in terms of the relationships between the true scores of pairs of measures
(Komaroff, 1997). In CTT, the basic relationship between the true scores on two measures (ti and tj) is

t i a ij  bij t j
Models of Measurement…

A. Congeneric Model (Least Restrictive)


B. Essentially Tau-equivalent (More Restrictive)
C. Tau-equivalent Model (Even More Restrictive)
D. Parallel Model (Most Restrictive)
Models of Measurement…
A. Parallel Model (Most Restrictive)
B. Tau-equivalent Model (Less Restrictive)
C. Essentially Tau-equivalent (Even Less Restrictive)
D. Congeneric Model (Least Restrictive)
$$X_k = \eta + E \quad ...(A)$$
$$X_k = \eta + E_k \quad ...(B)$$
$$X_k = \eta + a_k + E_k \quad ...(C)$$
$$X_k = \lambda_k \eta + a_k + E_k \quad ...(D)$$
Assumptions in CTT Measurement Models

Congeneric Model

X k λ k η  a k   E k ...( A )
X is an observed score linearly related to a single latent trait η ,
λ is the slope representing units /scales of measurement,
a is the intercept representing origin of scale, E is the residual (error term), and subscript k is the item
in question.
The least restrictive measurement model referred to as congeneric model assumes
that a group of observed items
1. measure the same latent trait
2. can measure the latent trait on different scales/units (the slopes λk can be different)
3. can measure the latent trait with different degrees of precision (dissimilar scale
origins: the intercepts ak can be different)
4. can measure the latent trait with different amounts of error (Var(Ek) can be different).

Factor Analysis (typically) uses the Congeneric Measurement Model (Raykov, 1997a).

Essentially Tau-equivalent Model

$$X_k = \eta + a_k + E_k \quad ...(B)$$
Essentially Tau-Equivalent Model
A more restricted case of congeneric measures, referred to as essentially
tau-equivalent measures, occurs when only the first condition is in place:
1. λ1 = λ2 = λ3 (all slopes are equal).

As variables that differ by a constant have equal variances, one can say that
essentially tau-equivalent measures have equal true score variances but unequal
error variances.
Tau-equivalent Model
$$X_k = \eta + E_k \quad ...(C)$$
Tau-Equivalent Model

An even more restricted case of congeneric measures, referred to as tau-equivalent
measures, occurs when only the first two conditions are in place:
1. λ1 = λ2 = λ3 (all slopes are equal); and
2. a1 = a2 = a3 (all intercepts are equal).
Parallel Model

$$X_k = \eta + E \quad ...(D)$$
Parallel Model
The most restricted case of congeneric measures, referred to as parallel measures, occurs
when the following three conditions are in place:
1. λ1 = λ2 = λ3 (all slopes are equal);
2. a1 = a2 = a3 (all intercepts are equal); and
3. Var(E1) = Var(E2) = Var(E3) (all error variances are equal).

Thus, parallel measures have the same units of measurement, scale origins, and error
variances.
Setting a Confidence Interval
Var X VarT  Var E
Var E Var X  VarT
Var E  Var X  VarT
 VarT 
SDE  Var X  1  
 Var X 
SDE SD X 1  R
Setting a Confidence Interval…
FIVE Steps:
1. Estimate True Score (T) = (R)(X)
Where, R = Reliability, X = Observed score
(This simplified form treats X as a deviation from the group mean; with a group
mean M, the usual estimate is T = M + R(X – M).)
2. Calculate Standard Error of Measurement (SE)

$$SE = SD_X\sqrt{1 - R}$$

3. Find z value associated with the confidence level
4. Multiply SE by z
5. Confidence Interval (CI) = T ± (SE × z)
Setting a Confidence Interval…
Suppose a score of 120 on an IQ test is obtained, and the test
has a reliability of 0.95 and a standard deviation of 12.
Calculate the 95% confidence Interval of the estimated true
score.

Setting a Confidence Interval…
• The estimated true score is T = 0.95 × 120 = 114, and the standard error of
measurement is SE = 12 × √(1 – 0.95) = 2.68.
• Then, if I want to be 95% confident that the score will fall
within a certain range, the z value associated with the 95%
confidence interval (1.96) will be multiplied with the value of
the SE (1.96 × 2.68 = 5.25). Then this value is added to and
subtracted from the estimated true score.
• This leads to the following two equations:
• 114 + 5.25 = 119.25 and
• 114 – 5.25 = 108.75
Setting a Confidence Interval …

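Thus, I can be 95% confident that the test taker's true score lies between
108.75 and 119.25. Put differently, if the test were administered to the test
taker 100 times, intervals built this way would be expected to contain the
true score about 95 times out of 100.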
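
The five steps can be collected into one small function. This is a minimal sketch that follows the slides' simplified true-score estimate T = R × X; the function name and return layout are illustrative. Because it keeps the unrounded SE, its limits differ from the hand-rounded 108.75 and 119.25 in the second decimal place.

```python
from math import sqrt


def true_score_interval(observed, reliability, sd, z=1.96):
    """Return (estimated true score, SE, lower limit, upper limit)."""
    estimated_true = reliability * observed      # step 1: T = R * X (simplified form)
    se = sd * sqrt(1 - reliability)              # step 2: SE = SD * sqrt(1 - R)
    margin = z * se                              # steps 3-4: z times SE
    return estimated_true, se, estimated_true - margin, estimated_true + margin


t_hat, se, lower, upper = true_score_interval(observed=120, reliability=0.95, sd=12)

print(f"Estimated true score = {t_hat:.2f}")
print(f"SE = {se:.2f}")
print(f"95% CI = {lower:.2f} to {upper:.2f}")
```

Passing a different z (for example 2.58 for a 99% interval) widens the band without changing the estimated true score.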

THANK YOU
