
109 PSYCHOLOGICAL ASSESSMENT 1 REVIEWER

Reliability
● Refers to the consistency of scores obtained by the same persons when they are re-examined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions.
● Can be expressed in terms of a correlation coefficient.

Correlation coefficient (r)
● Expresses the degree of correspondence or relationship between two sets of scores.
● May be computed in different ways, depending on the nature of the data.
● Reliability coefficients usually fall in the .80 to .90 range.

Correlation formula
A. Pearson Product Moment Correlation (interval scores):
   r = Σ(X - X̄)(Y - Ȳ) / √[Σ(X - X̄)² × Σ(Y - Ȳ)²]
B. Spearman's Rank Correlation (ordinal data):
   ρ = 1 - 6Σd² / [n(n² - 1)], where d is the difference between each pair of ranks and n is the number of pairs.

Interpretation of Correlation Value
➔ 0.00 = zero correlation
➔ ±0.01 to ±0.20 = negligible correlation
➔ ±0.21 to ±0.40 = low or slight correlation
➔ ±0.41 to ±0.70 = marked or moderate relationship
➔ ±0.71 to ±0.90 = high relationship
➔ ±0.91 to ±0.99 = very high relationship
➔ ±1.00 = perfect correlation
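To make the two formulas concrete, here is a minimal Python sketch (an addition, not part of the original reviewer); the score lists are hypothetical, and the Spearman version assumes no tied ranks.

```python
# Minimal sketch of the two correlation formulas above, using only the
# Python standard library. The score lists are hypothetical examples.
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for interval scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman rank correlation for ordinal data (no tied ranks)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

scores_form_a = [12, 15, 9, 20, 17, 11]   # hypothetical test scores
scores_form_b = [14, 16, 10, 19, 18, 12]
print(pearson_r(scores_form_a, scores_form_b))
print(spearman_rho(scores_form_a, scores_form_b))
```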
Error of Measurement
● Refers to the range of fluctuation likely to occur in a single score as a result of irrelevant or unknown chance factors.
● X = T + E; E = X - T
  where X = observed score, T = true score, E = error.
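As an added illustration of X = T + E (all numbers hypothetical): each simulated examinee's observed score is a true score plus random error, and as the error grows, the correlation between two administrations of the same test shrinks, previewing why random error lowers reliability.

```python
# Hypothetical simulation of X = T + E: a larger random-error SD lowers
# the correlation between two administrations of the same test.
import random
import statistics  # statistics.correlation requires Python 3.10+

random.seed(1)

def simulated_retest_r(error_sd, n=500):
    true_scores = [random.gauss(50, 10) for _ in range(n)]          # T
    first  = [t + random.gauss(0, error_sd) for t in true_scores]   # X1 = T + E1
    second = [t + random.gauss(0, error_sd) for t in true_scores]   # X2 = T + E2
    return statistics.correlation(first, second)

for sd in (0, 5, 10):
    print(f"error SD = {sd:2} -> retest r = {simulated_retest_r(sd):.2f}")
```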
Systematic Error
● Error that is consistent across uses of the measurement tool; likely to affect validity, not reliability.
● Examples include an incorrectly worded item, poorly written directions, or the inclusion of items unrelated to the content or theory upon which the measurement tool is based.

Random Error
● Exerts a differential effect on the same examinee across different testing sessions.
● This inconsistency affects reliability.
● Varies from examinee to examinee; there is no consistency in the source of error.

Sources of random error
● Individual examinee variations.
● Administration condition variations such as noise, temperature, lighting, and seat comfort.
● Measurement device bias (e.g., ambiguous wording, test item bias).
● Participant bias (e.g., guessing, motivation, cheating, sabotage).
● Test administrator bias (e.g., nonstandard directions, inconsistent proctoring, scoring errors).

Types of reliability
● Can be identified through one or more testing occasions, one or more test forms, or both.
● Internal: the extent to which a measure is consistent within itself.
● External: the extent to which a measure varies from one use to another.

1. Test-retest
● Involves administering the same test twice to the same person or group after a certain time interval has elapsed.
● Measures consistency of scores over time.
● The longer the time interval, the lower the reliability.
● Keep the interval short; it should rarely exceed six months.

Procedure
● Administer the test.
● Wait awhile (preferably two weeks), then administer it again to the same individuals.
● Compute the correlation between the two sets of results.
● The test is considered reliable if the correlation is high (e.g., above +.90).

Example: A Math test is given to students on Monday and given again the next Monday, without any Math lessons taught between these times. The scores from the first administration are correlated with the scores from the second administration. The resulting index is the reliability coefficient.

Disadvantages
● When the time interval is short, respondents may recall their previous responses, which tends to inflate the correlation coefficient (memory effect).
● When the time interval is long, factors such as unlearning and forgetting may result in a low correlation.
● Environmental conditions such as noise, temperature, lighting, and other factors may affect the correlation coefficient of the test.
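The whole procedure reduces to a single correlation. A minimal sketch, assuming hypothetical scores from two administrations one week apart (numpy is one convenient way to get Pearson r):

```python
# Test-retest reliability: correlate the same group's scores from two
# administrations of the same test. Scores are hypothetical.
import numpy as np

monday_scores      = [23, 31, 18, 27, 35, 22, 29, 25]  # first administration
next_monday_scores = [25, 30, 20, 26, 36, 21, 28, 27]  # same students, retest

r = np.corrcoef(monday_scores, next_monday_scores)[0, 1]
print(f"test-retest reliability = {r:.2f}")
```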

2. Alternate-form
● Also called equivalent or parallel-forms reliability.
● Uses two different but equivalent forms of the test administered to the same group.
● Fluctuation in performance depends on the different forms of the test, not on the passage of time.

How to ensure parallel forms of a test:
● Contain the same number of items.
● Expressed in the same form.
● Cover the same type of content.
● Have an equal range and level of difficulty.

3. Split-half
● Also a form of internal consistency.
● Two scores are obtained separately for each person by dividing the test into equivalent halves.
● Use odd and even items.
● The longer the test, the higher the reliability.

Spearman-Brown formula
Estimates the reliability of the whole test from the correlation between its two halves:
   r(whole test) = 2r(half) / [1 + r(half)]

Procedure
● Administer the test once.
● Randomly split the items into two groups with half the items in each (split halves, or odd and even items).
● Score the split halves separately.
● Compute Pearson r on the resulting pairs of scores.

Advantage/Disadvantage of Split-half
● Advantage: Requires only one testing session; eliminates the possibility that the variable being measured will change between measurements.
● Disadvantage: No guarantee that the two "split halves" are equivalent. If they are not, the method underestimates the reliability of the test.
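Putting the split-half procedure and the Spearman-Brown correction together, a short sketch (the 0/1 item matrix is invented for illustration):

```python
# Split-half reliability with the Spearman-Brown correction.
# Rows = examinees, columns = items (1 = right, 0 = wrong); data hypothetical.
import numpy as np

items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
])

odd_half  = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = (2 * r_half) / (1 + r_half)     # Spearman-Brown: whole-test estimate
print(f"half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```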

4. Kuder-Richardson
● Also called inter-item consistency.
● Based on the consistency of responses to all items in the test.
● Used in tests scored as 1 or 0, right or wrong.
● The more homogeneous the domain, the higher the consistency.

Influenced by two factors:
● Content sampling.
● Heterogeneity of the behavior domain sampled.

Kuder-Richardson Formula 21 (KR21)
● Requires only three pieces of information: the number of items on the test, the mean, and the standard deviation. Note that KR21 can be used only if it can be assumed that the items are of equal difficulty.

   KR21 = [K / (K - 1)] × [1 - M(K - M) / (K × SD²)]

K = number of items on the test
M = mean of the set of test scores
SD = standard deviation of the set of test scores
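KR21 translates directly into code. A minimal sketch using the K, M, and SD definitions above; the 40-item example values are hypothetical:

```python
# KR21 from summary statistics alone; example values are hypothetical.
def kr21(k, m, sd):
    """Kuder-Richardson Formula 21 (assumes items of equal difficulty)."""
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * sd ** 2))

# e.g., a 40-item test with mean 27.5 and standard deviation 6.2
print(f"KR21 = {kr21(k=40, m=27.5, sd=6.2):.2f}")
```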

5. Coefficient alpha
● Also called Cronbach's alpha.
● Used with items that are not scored simply as 1 or 0.
● Also used in tests where two or more scoring weights are assigned to answers.
● Common for personality or attitude scales.
● Used in calculating the reliability of items in essay tests where more than one answer is possible.
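A minimal sketch of coefficient alpha for multi-weight items (e.g., a 5-point attitude scale); the response matrix is hypothetical, and the computation uses the standard alpha formula, alpha = [k / (k - 1)] × (1 - sum of item variances / variance of total scores):

```python
# Cronbach's (coefficient) alpha for items with more than two score values
# (e.g., a 5-point attitude scale). Data are hypothetical.
import numpy as np

ratings = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])  # rows = respondents, columns = scale items

k = ratings.shape[1]
item_vars = ratings.var(axis=0, ddof=1)       # variance of each item
total_var = ratings.sum(axis=1).var(ddof=1)   # variance of total scores
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"coefficient alpha = {alpha:.2f}")
```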
6. Scorer reliability
● Also called interscorer or interrater reliability.
● Applies when the score depends upon the judgment of the scorer.
● Also referred to as examiner variance or scorer variance.

Reliability standards
A. For instruments where groups are concerned: 0.80 or higher is adequate.
B. For decisions about individuals: 0.90 is the bare minimum; 0.95 is the desired standard.

Techniques for Measuring Reliability

Test Sessions Required | Test Forms Required: One                        | Test Forms Required: Two
One                    | Split-half, Kuder-Richardson, coefficient alpha | Alternate form (immediate)
Two                    | Test-retest                                     | Alternate form (delayed)

Type of reliability coefficient      | Error variance
Test-retest                          | Time sampling
Alternate-form (immediate)           | Content sampling
Alternate-form (delayed)             | Time and content sampling
Split-half                           | Content sampling
Kuder-Richardson & coefficient alpha | Content sampling & content heterogeneity
Scorer                               | Interscorer differences

Factors affecting Reliability
● Test length. Generally, the longer a test is, the more reliable it is. If a test is too short, the reliability coefficient is low.
● Speed. The rate at which an examinee works will systematically influence performance.
● Group homogeneity. In general, the more heterogeneous the group of examinees, the higher the correlation coefficient and the more reliable the measure will be.
● Item difficulty. When there is little variability among test scores, the reliability will be low.
● Objectivity. Objectively scored tests show higher reliability than subjectively scored tests.

● Test-retest interval. The shorter the time interval between two administrations of a test, the less likely that changes will occur and the higher the reliability will be.
● The number of tasks in the test or assessment. More tasks will generally lead to higher reliability.
● The spread of scores produced by the assessment. The larger the spread of results, the higher the reliability.
● The clearness of marking guides and checking of marking procedures. Scoring errors (e.g., inconsistent scoring) will depress a reliability estimate; keep scoring simple and consistently applied.
● Item quality. Poorly constructed test items introduce ambiguity into the testing situation, thus affecting examinee performance.
● The suitability of the questions or tasks for the students being assessed. Questions that are too hard or too easy for the students will not increase reliability.
● The training of the assessors.
● The wording of the rubric. Carefully worded rubrics make it easier to decide on achievement levels.
● How closely standardized procedures and conditions for assessment are followed.
● How well questions and tasks are phrased.
● Variation in the testing situation. Errors in the testing situation (e.g., students misunderstanding or misreading test directions, noise level, distractions, and sickness) can cause test scores to vary.
● The anxiety or readiness of the students for assessment. Assessing students when they are tired or after an exciting event is less likely to produce reliable results.
● Other threats. These include differences in content across test or measurement forms; administration, examinee, and/or scoring errors; and guessing, effects of memory, practice, boredom, etc.

Norm
● Normal or average performance.
● Established by determining what persons in a representative group do on a test.

Raw score

A raw score is interpreted against the distribution of scores obtained by a standardization sample, to determine where the examinee belongs in that distribution. It is often the number of correct responses on a test. To make the raw score meaningful, it must be converted to a relative measure, or derived score.

Derived score

Can be expressed in two ways: the developmental level attained, or the examinee's relative position within a group.
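As one added illustration of "relative position in a group" (the specific derived score is not named in these notes), a raw score can be converted to a standard (z) score using the norm group's mean and standard deviation:

```python
# Converting a raw score to a relative (derived) measure using the
# standardization sample's mean and SD; all values are hypothetical.
norm_mean, norm_sd = 50.0, 8.0   # from the standardization sample
raw_score = 62

z = (raw_score - norm_mean) / norm_sd
print(f"z = {z:.2f}")   # 1.50: the examinee is 1.5 SDs above the norm group
```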

Developmental Norms

A. Mental Age - from the Binet-Simon scale; often used when scores are grouped by year levels. It is calculated by adding the basal age and the extra months of credit earned at higher year levels. For example, a child with a basal age of 6 years who earns 8 more months of credit at higher year levels has a mental age of 6 years, 8 months.

Basal age - the highest age level at which an individual can still answer all questions.
Ceiling age - the point at which an individual can no longer answer any questions.

B. Grade Equivalents - computed from the mean raw score of students across different year levels, testing the children at different times of the year.

C. Ordinal Scales - determine how far along children are in the normal stages of development; they identify the stage a child has reached in the development of specific behavior functions.

by Nicole Kaye E. Lipa, ABPS 3B
