
Psychometrics PSY 329

Presentation 1 – Reliability

Table of contents:

1. Theory of measurement error

2. Methods of estimating reliability

3. Reliability and correlation with external variables

4. Reliability and number of items

Theory of measurement error & Methods of estimating reliability - IMAN H

The theory of measurement error

Some error is involved in any measurement, whether it is the measurement of temperature, blood
pressure, or intelligence. One definition of the true score is that it represents the average score that
would be obtained over repeated testings.

-Measurement error causes obtained scores to vary across testings. The standard deviation of an
individual's obtained scores over these (generally hypothetical) testings defines the "standard error of
measurement"

-In particular, measurement error associated with extreme true scores is estimated to be

(1) Smaller than the error associated with true scores closer to the mean

(2) Skewed (positively for low scores and negatively for high scores)

-In contrast to definitions of reliability based upon internal consistency or the covariances among
components of a linear combination, "reliability" can also mean temporal stability, which concerns
the correlation between scores over repeated testings.

Domain sampling - a particularly useful model of the process that gives rise to true scores

-Tests are constructed by selecting a specified number of measures at random from a homogeneous,
infinitely large pool. Under these conditions, the correlation of any given test score with the average of
all test scores (the reliability index) can be shown to equal the square root of the correlation of any
given test score with another given test score (the reliability coefficient).

-In turn, the reliability coefficient can be shown to estimate the ratio of variance in true scores to the
variance in observed scores.
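In symbols, these are the standard classical-test-theory statements of the two claims above (with $r_{1t}$ the reliability index, $r_{11}$ the reliability coefficient, and $\sigma_T^2$, $\sigma_X^2$ the true-score and observed-score variances):

$$ r_{1t} = \sqrt{r_{11}}, \qquad r_{11} = \frac{\sigma_T^2}{\sigma_X^2} $$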

Parallel test - one alternative to domain sampling

- It assumes that two or more tests produce equal true scores but generate independent random
measurement error.

-Although it defines rather than estimates the reliability coefficient, its major predictions are the same.
The role of factorial complexity in measures of reliability is considered; a key point is that a test may
measure more than one thing (factor) yet be highly reliable.

The concept of measurement error

-Measurement error can be thought of as a measure deviating from its true value, and such measurement
error can be a mixture of systematic and random processes. When it is systematic, it can either affect all
observations equally (a constant error) or affect certain types of observations differently than others
(a bias).

Example 1: A miscalibrated thermometer that always reads three degrees too high illustrates a constant
error in the physical sciences. If the thermometer were sensitive to some irrelevant attribute, such as the
color or the density of what was being measured, the error would be a bias.

Example 2: Random error would be introduced if the person reading the thermometer were to
transpose digits from time to time when recording observations.

Constant errors

There are obvious biases and random errors in the behavioral sciences, although the situation may be
less obvious with constant errors. If clinician A were to judge the intelligence of each of a series of
individuals five points higher than clinician B, they would be calibrated differently, but either or both
could have a constant error since the true IQ is unknown.

Likewise, unsystematic differences in ratings on repeated testing illustrate one form of random error
when it can be assumed the person rated did not change.

Even when the concept of constant error is meaningful in the behavioral sciences, it affects all
observations equally and therefore does not influence group comparisons, so it need not be
considered further. Indeed, it has no effect by definition unless a scale has a meaningful zero (for
example, a ratio or absolute scale), since it affects only the location of the scale mean.
However, a clinician, a rater, or an evaluative process may be sensitive to irrelevant attributes like race,
gender, etc., and thereby be biased!

Random errors

What is more important and meaningful is the presence of random errors. They are important because
they limit the degree of lawfulness in nature by complicating relationships. They might, for example,
make a curve appear jagged, and therefore more complex, rather than smooth and simple.

- Scores on a particular classroom test are influenced by:

1. The content sampled

2. Luck in guessing

3. State of alertness

4. Clerical errors

Random measurement errors

-Random measurement errors are never completely eliminated, but one should seek to minimize them
as much as possible

-One definition of reliability is freedom from random error – how repeatable observations are:

1. When different persons take the measurement

2. With alternative instruments intended to measure the same thing

3. When incidental variation exists in the conditions of measurement

Measurement reliability

In other words, a measurement is reliable to the extent that it leads to the same or similar results
whenever essentially the same results should be obtained, regardless of the opportunities for variation
to occur. Reliable measures allow one to generalize from one particular use of the method to a wide
variety of related circumstances.

-But be aware that high reliability does not mean high validity!

Example 3: One could measure intelligence by having individuals throw stones as far as possible.
Distances obtained by individuals on one occasion will correlate highly with distances obtained on
another occasion. Being repeatable, the measures are highly reliable; but stone tossing is obviously not
a valid measure of intelligence.

Results will correlate with measures of strength, not intelligence.

-Measurement error places limits on the validity of an instrument, but even its complete absence does
not guarantee validity; thus, reliability is necessary but not sufficient for validity.

Estimates of reliability

-Practical measures of reliability are usually based upon either:

1. Correlations among items within a single test

2. The correlation between a test and one other test

Some measures are so readily obtainable that it is possible to retest the subject many times.

Example 4: Practice a sample of subjects at reaction-time responses until their performance is stable, to
help satisfy the above assumption. Then run them for at least 100 trials to produce a domain of
responses. Their reaction time on any one trial is a one-item test.

Correlate their results on one arbitrarily chosen trial (e.g., the tenth) with:

(1) Their results on another arbitrarily chosen trial (e.g., the fifteenth) and

(2) The average over all trials

-These two correlations reflect the reliability of individual differences in a one-item test of reaction time,
and the correlation with the average over all trials should closely approximate the square root of the
trial-to-trial correlation.
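A minimal simulation sketch of this check, assuming a simple classical true-score model; the sample sizes, means, and noise levels below are illustrative assumptions, not values from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_trials = 500, 100

# Each subject has a true mean reaction time; every trial adds
# independent random measurement error (values in ms, illustrative).
true_rt = rng.normal(400, 50, size=(n_subjects, 1))
trials = true_rt + rng.normal(0, 80, size=(n_subjects, n_trials))

# (1) Correlate one arbitrary trial (the tenth) with another (the fifteenth):
r_coefficient = np.corrcoef(trials[:, 9], trials[:, 14])[0, 1]

# (2) Correlate the same trial with the average over all trials:
r_index = np.corrcoef(trials[:, 9], trials.mean(axis=1))[0, 1]

print(f"trial-to-trial r = {r_coefficient:.3f} (reliability coefficient)")
print(f"sqrt of that     = {np.sqrt(r_coefficient):.3f}")
print(f"trial-average r  = {r_index:.3f} (reliability index)")
```

With enough trials, the trial-average correlation approaches the square root of the trial-to-trial correlation, as the domain-sampling model predicts.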

Reliability and correlation with external variables – AHMED

Reliability and number of items – IMAN EHNECH-SERAN

Reliability and number of items

In psychometrics, reliability refers to the consistency and stability of a measurement instrument or test
over repeated administrations. It is a crucial aspect of any psychological assessment, as it indicates the
extent to which the instrument produces consistent and dependable results. One of the factors that
contribute to the reliability of a test is the number of items it contains.

The number of items in a psychological test plays a significant role in determining its reliability. A test
with too few items may not adequately capture the construct it intends to measure, leading to
unreliable results. On the other hand, a test with too many items may become cumbersome and time-
consuming, potentially leading to participant fatigue and decreased motivation, affecting the overall
quality of responses.

To understand the relationship between reliability and the number of items, it is essential to consider the
concept of internal consistency. Internal consistency is a measure of how well the items within a test
correlate with each other; one common method for assessing it is Cronbach's alpha coefficient. The
number of items in a test directly affects such internal-consistency estimates.

In general, increasing the number of items in a test can enhance its internal consistency up to a certain
point. This is because a larger number of items provides a more comprehensive sampling of the
construct being measured, increasing the likelihood of capturing its various facets. However, after a
certain threshold, adding more items may result in diminishing returns, and the additional items might
not contribute substantially to the overall reliability.
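This diminishing-returns pattern is captured by the classical Spearman-Brown prophecy formula (its two-half special case appears later in the split-half discussion). For a test lengthened by a factor of $n$ with comparable items, the projected reliability is:

$$ r_{nn} = \frac{n\, r_{11}}{1 + (n-1)\, r_{11}} $$

For example, doubling a test with reliability $r_{11} = 0.50$ projects $r = 0.67$, while quadrupling its length only raises the projection to $0.80$.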

Researchers and test developers often conduct reliability analyses to determine the optimal number of
items required to achieve a balance between precision and practicality. They may use statistical
techniques such as item-total correlations, factor analysis, and reliability coefficients to assess the
internal consistency of the test and make informed decisions about whether to add or remove items.

Several common statistics are used to estimate the reliability of a set of items in psychometrics. These
statistics help assess the consistency and stability of measurements, providing valuable information
about the quality of a psychological test. Here are some of the common reliability statistics:

1. Cronbach's Alpha (α):

 Cronbach's alpha is a widely used measure of internal consistency. It assesses how well
items in a test are correlated with each other, providing an overall estimate of reliability.
Values closer to 1 indicate higher internal consistency.

 Developed by Lee Cronbach in 1951, it has become one of the most widely used
measures of internal consistency.

 The formula for Cronbach's alpha is as follows:

$$ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right) $$

 Where: $k$ is the number of items in the test, $\sigma_i^2$ is the variance of item $i$, and $\sigma_X^2$ is
the variance of total scores.
 Interpretation:

1. Range of Values:

Cronbach's alpha ranges from 0 to 1.

A higher alpha indicates greater internal consistency.


2. Interpretation of Values:

Values close to 1 suggest high internal consistency among the items.

Values closer to 0 indicate lower internal consistency.

 Key Consideration: While there is no universally agreed-upon threshold for an acceptable
alpha value, a commonly cited guideline is that values above 0.70 or 0.80 are considered
indicative of satisfactory reliability.

 Practical Applications:

1. Educational Testing:

Used to assess the reliability of educational tests, ensuring that the test
consistently measures a student's knowledge or ability.

2. Psychological Research:

Applied in research studies to evaluate the internal consistency of scales or
questionnaires measuring psychological constructs.

3. Personnel Selection:

Employed in employment and personnel selection assessments to ensure that
the items reliably measure relevant traits or skills.

 In summary, Cronbach's alpha is a valuable tool in psychometrics for evaluating the
internal consistency of a test. It provides a quantitative measure of how well the items
in a test correlate with each other, offering insights into the reliability of the overall
measurement instrument. Researchers and test developers use Cronbach's alpha to
assess and enhance the quality of psychological assessments.

 Alternative terminology. Cronbach's alpha, when computed for binary (e.g., true/false)
items, is identical to the so-called Kuder-Richardson-20 formula of reliability for sum
scales. In either case, because the reliability is actually estimated from the consistency
of all items in the sum scales, the reliability coefficient computed in this manner is also
referred to as the internal-consistency reliability.
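A minimal computational sketch of the alpha formula above; the function name and the rating data are illustrative assumptions, not from the notes:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = scores.shape[1]                   # number of items
    item_vars = scores.var(axis=0)        # variance of each item
    total_var = scores.sum(axis=1).var()  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: 5 respondents answering 4 Likert-type items.
ratings = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])
print(f"alpha = {cronbach_alpha(ratings):.3f}")
```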

2. Kuder-Richardson Formula 20 (KR-20):


KR-20 is a formula used for assessing the reliability of dichotomous (yes/no) items. It is similar to
Cronbach's alpha but is specifically designed for tests with binary response options. While
Cronbach tends to get the credit, to the point that the index is often called "Cronbach's alpha,"
he did not actually invent this approach.

Kuder and Richardson (1937) suggested the following equation to estimate the reliability of a
test with dichotomous (right/wrong) items:

$$ KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right) $$

where $p_i$ is the proportion answering item $i$ correctly, $q_i = 1 - p_i$, and $\sigma_X^2$ is the
variance of total scores. Note that it is the same as Cronbach's equation, except that Cronbach
replaced the binomial item variance $p_i q_i$ with the more general variance notation $\sigma_i^2$.

This means you can use Cronbach's equation on polytomous data such as Likert rating
scales. In the case of dichotomous data such as multiple-choice items, Cronbach's alpha and KR-20
are exactly the same.

As a formula designed for dichotomous items, KR-20 is not suitable for tests with items that
have more than two response categories.
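A short self-contained sketch illustrating this equivalence on made-up binary data (population variances are used throughout, matching the classical formulas; all values are illustrative):

```python
import numpy as np

# Illustrative right/wrong responses: 6 examinees x 5 items.
answers = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],
])
k = answers.shape[1]
total_var = answers.sum(axis=1).var()   # variance of total scores

# KR-20 uses the binomial item variance p*q ...
p = answers.mean(axis=0)
kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

# ... while alpha uses the general item variance, which for 0/1 items
# is exactly p*q, so the two coefficients coincide.
alpha = (k / (k - 1)) * (1 - answers.var(axis=0).sum() / total_var)

print(f"KR-20 = {kr20:.4f}, alpha = {alpha:.4f}")
```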

3. Split-Half Reliability:

This method involves dividing a test into two halves and comparing the scores on each half. The
correlation between the scores on the two halves is then calculated. The Spearman-Brown
formula is often applied to correct for the artificially low reliability that can result from splitting
the test.

Procedure for Calculating Split-Half Reliability:


1. Divide the Test:
 The test is divided into two halves, typically by randomly splitting the items into
two sets. It is crucial to ensure that each half is representative of the overall
content and difficulty level of the entire test.
2. Score Each Half:
 Scores are calculated for each individual on both halves of the test. For example,
if the test has 50 items, the first 25 items might form one half, and the
remaining 25 items would constitute the second half.
3. Correlate the Scores:
 The scores on the two halves are then correlated. The most common correlation
coefficient used for split-half reliability is the Pearson correlation coefficient.
4. Adjustment for Reliability:
The Spearman-Brown prophecy formula is often applied to correct the correlation
coefficient for the artificially lower reliability that can result from splitting the test. The
formula is as follows:

$$ r_{\text{adjusted}} = \frac{2r}{1 + r} $$

where $r_{\text{adjusted}}$ is the adjusted correlation coefficient and $r$ is the observed
correlation coefficient obtained from the split-half reliability analysis.

Example:
Suppose the observed correlation $r$ from a split-half analysis is 0.70. Applying the Spearman-
Brown prophecy formula:

$$ r_{\text{adjusted}} = \frac{2 \times 0.70}{1 + 0.70} = \frac{1.40}{1.70} \approx 0.824 $$

In this example, the adjusted correlation coefficient $r_{\text{adjusted}}$ suggests an estimate of the reliability of
the full-length test.
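A minimal sketch of the whole procedure, assuming an odd/even item split (one common way to form the halves); the data and function name are illustrative:

```python
import numpy as np

def split_half_reliability(scores: np.ndarray) -> float:
    """Split-half reliability with the Spearman-Brown correction.

    `scores` is a (respondents x items) matrix; odd- and even-numbered
    items form the two halves.
    """
    half_a = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
    half_b = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...
    r = np.corrcoef(half_a, half_b)[0, 1]  # Pearson r between halves
    return 2 * r / (1 + r)                 # Spearman-Brown adjustment

# Illustrative data: 6 respondents x 6 items.
scores = np.array([
    [4, 5, 4, 5, 4, 4],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 4, 5, 5],
    [1, 2, 1, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [4, 4, 5, 4, 4, 5],
])
print(f"adjusted split-half reliability = {split_half_reliability(scores):.3f}")
```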

Other statistical techniques include: Test-Retest Reliability, Intraclass Correlation Coefficient (ICC),
Coefficient Omega (ω), Guttman Split-Half Coefficient, Average Inter-Item Correlation, Coefficient of
Stability, and Factor Analysis.

When evaluating the reliability of a set of items, researchers often consider a combination of these
statistics to gain a comprehensive understanding of the measurement instrument's quality. Each statistic
has its strengths and limitations, and the choice of which to use depends on the nature of the test and
the goals of the assessment.

Attenuation
Attenuation in the context of reliability and the number of items refers to the potential
underestimation of the true reliability of a measurement instrument due to measurement error
associated with a limited number of items. Reliability refers to the consistency and stability of
measurements, and it is typically assessed using reliability coefficients such as Cronbach's alpha. The
number of items in a scale or test can influence the observed reliability estimate, and attenuation is
a concern when the number of items is insufficient to accurately reflect the underlying reliability of
the construct.

Key Concepts:

Attenuation in Reliability:

Attenuation in reliability occurs when the observed reliability coefficient is lower than the true reliability
due to measurement error. This is particularly relevant when the number of items in the scale is limited.
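For context, the classical statement of attenuation (not spelled out in these notes) concerns correlations with external variables: if two measures $x$ and $y$ have reliabilities $r_{xx}$ and $r_{yy}$, the observed correlation is pulled toward zero relative to the true-score correlation,

$$ r_{xy} = \rho_{T_x T_y} \sqrt{r_{xx}\, r_{yy}} $$

so any underestimate of reliability also distorts conclusions about relationships with other variables.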

Reasons for Attenuation in Reliability with Fewer Items:

Limited Coverage of the Construct:

A small number of items may not adequately cover the full range of the construct being
measured, leading to an incomplete representation and potential underestimation of reliability.

Increased Sensitivity to Item Variability:

With fewer items, the reliability estimate becomes more sensitive to the variability of individual
items. If one or a few items have low variability or are not well-aligned with the construct, it can
disproportionately impact the observed reliability.

Impact of Random Measurement Error:

Random measurement error has a greater impact on reliability estimates when the number of
items is limited. Inconsistencies or fluctuations in responses to a small number of items can
contribute to attenuation.

Strategies to Address Attenuation in Reliability:

Increase the Number of Items:

Whenever possible, include a sufficient number of items in the scale to improve reliability. More
items provide a more robust and stable estimate of the underlying construct.
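As a rough planning sketch, the Spearman-Brown prophecy formula from the earlier section can project how much a longer test would help; the starting reliability and test lengths below are illustrative assumptions:

```python
# Project reliability of a lengthened test: r_nn = n*r / (1 + (n-1)*r).
def projected_reliability(r: float, n: float) -> float:
    return n * r / (1 + (n - 1) * r)

base_items, base_r = 10, 0.60        # illustrative 10-item scale, r = .60
for total_items in (10, 20, 30, 40, 60):
    n = total_items / base_items     # lengthening factor
    print(f"{total_items:3d} items -> projected r = "
          f"{projected_reliability(base_r, n):.3f}")
```

The rapidly shrinking gains per added block of items are one reason item quality (the next point) matters as much as item count.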

Item Quality:

Ensure that the selected items are of high quality, tapping into different facets of the construct.
Well-designed items contribute more to reliability than a larger number of poorly constructed or
redundant items.

Factor Analysis:

Conduct factor analysis to explore the underlying factor structure of the scale. It helps ensure
that the items are measuring the intended construct and identifies potential sources of
attenuation.

Replication Studies:

Replicate studies with different samples to assess the generalizability of reliability estimates.
Consistent reliability across diverse samples enhances confidence in the stability of the measure.
