True Score
Tests of abilities and other personal characteristics play a large role in modern life,
contributing to countless decisions that shape individuals' upbringing, schooling, and careers.
Tests direct attention to the talented; they issue an early warning and constructive hints
regarding individuals who will need special help. Almost never does, or should, a test score
by itself determine what is to be done.
Tests have been much criticized because misconceptions and misapplications have led to
some decisions that, in hindsight, we see as unwise or unjust. No single observation fully
represents a person. To know how trustworthy a procedure is, we examine the consistency
among measurements. There are many reasons for inconsistency. Attention and effort can
change from moment to moment. Over longer periods, scores change with physical growth,
learning, changes in health, and changes in personality. If we employ fresh test items for each
measurement, another type of variation is introduced. To these factors must be added the
unaccountable chance effects.
TRUE SCORE
Charles Spearman, one of the founders of classical test theory, recognized that test
measurements always contain some error, that this error can be treated as a random variable,
and that its magnitude can be indexed through correlations among repeated measurements.
By estimating error in this way, tests can be improved: reducing error increases the
reliability of the test. A more reliable test yields observed scores that lie closer to true
scores, which is the central aim of the classical theory, and makes the test a more valuable
tool for finding the right candidate for a job.
Classical test theory is rarely considered by individuals taking psychometric tests or by the
companies using them, but it is essential in practice: there is little point in a test whose
scores must be heavily scrutinized for error before the candidates' responses can even be
interpreted. High reliability also matters simply because companies do not want to waste time
or money on a test for gauging prospective employees if its answers bear no relation to, and
give no indication of, job performance.
Classical test theory may be regarded as roughly synonymous with true score theory. The term
"classical" refers not only to the chronology of these models but also contrasts with the more recent
psychometric theories, generally referred to collectively as item response theory, which sometimes
bear the appellation "modern" as in "modern latent trait theory".
Classical test theory assumes that each person has a true score, T, that would be obtained if there were
no errors in measurement. A person's true score is defined as the expected number-correct score over
an infinite number of independent administrations of the test. Unfortunately, test users never observe
a person's true score, only an observed score, X. It is assumed that observed score = true score plus
some error:
X = T + E

where X is the observed score, T is the true score, and E is the error.
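The definition of the true score as an expected value over many independent administrations can be illustrated with a small simulation (the true score of 25 and the error spread are invented numbers, not from any source above):

```python
import random

random.seed(0)

T = 25.0           # hypothetical true score (expected number-correct)
n_admin = 100_000  # independent administrations of the test

# Each administration yields an observed score X = T + E,
# where E is a random error term with mean zero.
observed = [T + random.gauss(0, 3) for _ in range(n_admin)]

mean_observed = sum(observed) / n_admin
print(mean_observed)  # close to T = 25.0
```

No real test can be administered infinitely often, which is exactly why the true score is never observed directly; the simulation only shows what the definition means.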
ERROR
As used in classical test theory, the term error refers to unwanted variation. The score the person
earns on a particular testing, the observed score, differs from the ideal measurement the tester
would prefer to base conclusions on. That ideal, error-free measurement is traditionally called the true
score. The difference between the observed score and true score is the error of measurement.
In statistics, an error is not a "mistake". Variability is an inherent part of things being
measured and of the measurement process. Measurement errors can be divided into two
components: random error and systematic error.
Random error: Random error is caused by any factors that randomly affect measurement of
the variable across the sample. For instance, each person's mood can inflate or deflate their
performance on any occasion. In a particular testing, some children may be feeling in a good
mood and others may be depressed. If mood affects their performance on the measure, it may
artificially inflate the observed scores for some children and artificially deflate them for
others. The important thing about random error is that it does not have any consistent effects
across the entire sample. Instead, it pushes observed scores up or down randomly. This means
that if we could see all of the random errors in a distribution they would have to sum to 0 --
there would be as many negative errors as positive ones. The important property of random
error is that it adds variability to the data but does not affect average performance for the
group. Because of this, random error is sometimes considered noise.
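A small simulation with made-up scores illustrates the property claimed above: random error inflates the spread of observed scores but leaves the group average essentially unchanged.

```python
import random

random.seed(1)

true_scores = [60, 70, 80, 90, 100] * 200  # hypothetical group of 1000 examinees

# Random error: each examinee's "mood" nudges the score up or down, mean zero.
observed = [t + random.gauss(0, 5) for t in true_scores]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(mean(true_scores), mean(observed))          # group means are nearly equal
print(variance(true_scores), variance(observed))  # observed variance is larger
```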
Systematic error: Systematic error is caused by any factors that systematically affect
measurement of the variable across the sample. For instance, if there is loud traffic going by
just outside of a classroom where students are taking a test, this noise is liable to affect all of
the children's scores -- in this case, systematically lowering them. Unlike random error,
systematic errors tend to be consistently either positive or negative -- because of this,
systematic error is sometimes considered to be bias in measurement.
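The contrast with random error can be sketched the same way (the traffic-noise bias of 4 points is invented for illustration): a systematic error shifts every score, and therefore the group mean, in the same direction.

```python
import random

random.seed(2)

true_scores = [random.gauss(75, 10) for _ in range(1000)]

BIAS = -4.0  # hypothetical systematic error: traffic noise lowers every score

# Each observed score carries the same bias plus a little random error.
observed = [t + BIAS + random.gauss(0, 2) for t in true_scores]

mean_true = sum(true_scores) / len(true_scores)
mean_observed = sum(observed) / len(observed)
print(mean_observed - mean_true)  # close to the bias, -4.0
```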
Classical test theory assumes linearity: the regression of the observed score on the
true score is linear. This linearity assumption underlies the practice of creating tests from the
linear combination of items or subtests. In addition, the following assumptions are often
made by classical test theory:
1. The mean of the errors of measurement is zero.
2. True scores and errors of measurement are uncorrelated.
3. Errors of measurement on distinct measurements are uncorrelated.
4. Errors of measurement on one measurement are uncorrelated with the true scores on another.
5. It is possible to construct parallel tests: tests measuring the same true score with equal error variances.
The first four assumptions can be readily derived from the definitions of true score and
measurement error. Thus, they are commonly shared by all the models of CTT. The fifth
assumption is also adopted by most of the models because it is needed to estimate
reliability. All of these assumptions are generally considered "weak assumptions," that is,
assumptions that are likely to hold in most data. Some models of CTT make further,
stronger assumptions that, although they are not needed for deriving most formulas central to
the theory, provide estimation convenience:
Measurement error is normally distributed within a person and across persons in the
population.
Distributions of measurement error have the same variance across all levels of true
score.
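Under these assumptions, reliability can be expressed as the ratio of true-score variance to observed-score variance. A sketch with invented variances (true-score SD 8, error SD 4, so the expected ratio is 64 / 80 = 0.8):

```python
import random

random.seed(3)

n = 10_000
true_scores = [random.gauss(50, 8) for _ in range(n)]  # var(T) is about 64

# Equal error variance for every examinee, per the stronger assumptions above.
observed = [t + random.gauss(0, 4) for t in true_scores]  # var(E) is about 16

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Errors are uncorrelated with true scores, so var(X) = var(T) + var(E)
# and reliability = var(T) / var(X) is about 64 / (64 + 16) = 0.8.
reliability = variance(true_scores) / variance(observed)
print(reliability)  # close to 0.8
```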
SOURCES OF ERROR
In an ideal research study, measurement would be precise and unambiguous. In practice, this
objective is often not met in its entirety, so the researcher must be aware of the sources of
error in measurement. The possible sources of error in measurement are listed below.
Failure to account for a factor (usually systematic) - The most challenging part of
designing an experiment is trying to control or account for all possible factors except the one
independent variable that is being analyzed. For instance, you may inadvertently ignore air
resistance when measuring free-fall acceleration or you may fail to account for the effect of
the Earth's magnetic field when measuring the field of a small magnet. The best way to
account for these sources of error is to brainstorm with your peers about all the factors that
could possibly affect your result. This brainstorm should be done before beginning the
experiment so that arrangements can be made to account for the confounding factors before
taking data. Sometimes a correction can be applied to a result after taking data, but this is
inefficient and not always possible.
Instrument resolution (random) - All instruments have finite precision that limits the ability
to resolve small measurement differences. For instance, a meter stick cannot distinguish
distances to a precision much better than about half of its smallest scale division (0.5 mm in
this case). One of the best ways to obtain more precise measurements is to use a null
difference method instead of measuring a quantity directly. Null or balance methods involve
using instrumentation to measure the difference between two similar quantities, one of which
is known very accurately and is adjustable. The adjustable reference quantity is varied until
the difference is reduced to zero. The two quantities are then balanced and the magnitude of
the unknown quantity can be found by comparison with the reference sample. With this
method, problems of source instability are eliminated, and the measuring instrument can be
very sensitive and does not even need a scale.
Physical variations (random) - It is always wise to obtain multiple measurements over the
entire range being investigated. Doing so often reveals variations that might otherwise go
undetected. If desired, these variations may be cause for closer examination, or they may be
combined to find an average value.
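The gain from averaging repeated measurements can be quantified: with independent random errors, the spread of an n-reading average shrinks by a factor of the square root of n. A sketch with invented numbers (a 12.34 cm length measured with 0.05 cm random error per reading):

```python
import random
import statistics

random.seed(4)

TRUE_LENGTH = 12.34  # hypothetical true value, in cm
SD_SINGLE = 0.05     # random error of a single reading, in cm

def reading():
    return TRUE_LENGTH + random.gauss(0, SD_SINGLE)

# Spread of single readings versus spread of 25-reading averages:
singles = [reading() for _ in range(2000)]
averages = [sum(reading() for _ in range(25)) / 25 for _ in range(2000)]

print(statistics.stdev(singles))   # about 0.05
print(statistics.stdev(averages))  # about 0.05 / sqrt(25) = 0.01
```

Note that averaging only reduces random error; a systematic error, such as a mis-calibrated instrument, survives the averaging untouched.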
Parallax (systematic or random) - This error can occur whenever there is some distance
between the measuring scale and the indicator used to obtain a measurement. If the observer's
eye is not squarely aligned with the pointer and scale, the reading may be too high or low.
Instrument drift (systematic) - Most electronic instruments have readings that drift over
time. The amount of drift is generally not a concern, but occasionally this source of error can
be significant and should be considered.
Lag time and hysteresis (systematic) - Some measuring devices require time to reach
equilibrium, and taking a measurement before the instrument is stable will result in a
measurement that is generally too low. The most common example is taking temperature
readings with a thermometer that has not reached thermal equilibrium with its environment.
A similar effect is hysteresis where the instrument readings lag behind and appear to have a
"memory" effect as data are taken sequentially moving up or down through a range of values.
Hysteresis is most commonly associated with materials that become magnetized when a
changing magnetic field is applied.
One thing that can be done is to pilot test the instruments, getting feedback from respondents
about how easy or hard the measure was and about how the testing environment affected their
performance. Second, if we are gathering measures using people to collect the data (as
interviewers or observers), we should make sure that we train them thoroughly so that they
aren't accidentally introducing error. All data entry for computer analysis should be
"double-punched" and verified. We can also use statistical procedures to adjust for
measurement error; these range from rather simple formulas that can be applied directly to the
data to very complex procedures for modelling the error and its effects. Finally, one of the
best ways to deal with measurement error, especially systematic error, is to use multiple
measures of the same construct. Especially if the different measures don't share the same
systematic errors, we will be able to triangulate across the multiple measures and get a more
accurate sense of what's going on.
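The triangulation idea can be sketched numerically (the three biases below are invented, and assumed not to be shared across measures): when each measure of the construct carries its own systematic error, the average across measures tends to sit closer to the truth than any single biased measure.

```python
import random

random.seed(5)

# Three hypothetical measures of the same construct, each with its own
# systematic bias (not shared across measures) plus random error.
BIASES = [3.0, -2.5, -0.5]

def measure(true_level, bias):
    return true_level + bias + random.gauss(0, 2)

people = [random.gauss(70, 10) for _ in range(2000)]

# Mean absolute error of one biased measure vs. the triangulated average:
err_single = sum(abs(measure(t, BIASES[0]) - t) for t in people) / len(people)
err_triang = sum(
    abs(sum(measure(t, b) for b in BIASES) / 3 - t) for t in people
) / len(people)

print(err_single, err_triang)  # the triangulated estimate is more accurate
```

The benefit depends on the biases partially cancelling; if all the measures shared the same systematic error, triangulation would not remove it.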
REFERENCE
http://www.psychometrictest.org.uk/classic-test-theory/
http://psychology.iresearchnet.com/industrial-organizational-psychology/i-o-psychology-theories/classical-test-theory/
https://en.wikipedia.org/wiki/Classical_test_theory
https://www2.southeastern.edu/Academics/Faculty/rallain/plab193/labinfo/Error_Analysis/06_Sources_of_Error.html
http://www.socialresearchmethods.net/kb/measerr.php