5) The Freiburg Visual Acuity Test-Variability Unchanged by Post-Hoc Re-Analysis
DOI 10.1007/s00417-006-0474-4
CLINICAL INVESTIGATION
Received: 26 January 2006 / Revised: 28 August 2006 / Accepted: 9 October 2006 / Published online: 12 January 2007
© Springer-Verlag 2007
Fig. 1 Landolt C. The unit u, measured in minutes of arc, defines the decimal visual acuity (VA): For a VA of 1.0 (=20/20), u would span 1 min of visual angle. [Graphic not reproduced; dimension labels: 1u and 5u]

Methods

[...] are very low relative to current computer capabilities. The graphics should be able to resolve at least 256 gray levels, or millions of colors (3×8-bit color depth). The resolution of the visual display unit (VDU) is the most likely bottleneck (see “Limitations” in Discussion). Both CRT- and LCD-type VDUs are possible. EN ISO 8596 [6] details a luminance of the Landolt C between 80 and 320 cd/m² at a contrast of 85%. This is easily reached with consumer-grade equipment. The present results were obtained with a 17” CRT monitor at a distance of 4 m, a luminance of 105 cd/m² at a contrast of 95% and a background illuminance of 60 lux, measured at the subject’s eye.
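To make the geometry behind these hardware figures concrete, the following is a minimal sketch (not part of FrACT) that converts a decimal acuity into the physical gap size of the Landolt C at a given viewing distance. It uses the definition from Fig. 1, namely a gap of u = 1/VA minutes of arc. The 4 m distance is taken from the text; the pixel pitch of 0.25 mm is a purely hypothetical value, and all function names are illustrative.

```python
import math

ARCMIN_PER_RADIAN = 60.0 * 180.0 / math.pi


def gap_arcmin(decimal_va: float) -> float:
    """Gap of the Landolt C in minutes of arc (u = 1 / VA, as in Fig. 1)."""
    return 1.0 / decimal_va


def gap_mm(decimal_va: float, distance_mm: float) -> float:
    """Physical gap size on the screen for a given viewing distance."""
    angle_rad = gap_arcmin(decimal_va) / ARCMIN_PER_RADIAN
    return distance_mm * math.tan(angle_rad)


def max_va_for_one_pixel_gap(pixel_pitch_mm: float, distance_mm: float) -> float:
    """Highest decimal VA whose gap still spans one full pixel."""
    angle_arcmin = math.atan(pixel_pitch_mm / distance_mm) * ARCMIN_PER_RADIAN
    return 1.0 / angle_arcmin


DISTANCE_MM = 4000.0      # 4 m, as in the present study
PIXEL_PITCH_MM = 0.25     # hypothetical pixel pitch, not from the paper

print(f"gap at VA 1.0: {gap_mm(1.0, DISTANCE_MM):.3f} mm")
print(f"VA at a 1-pixel gap: {max_va_for_one_pixel_gap(PIXEL_PITCH_MM, DISTANCE_MM):.2f}")
```

Under these assumptions a VA of 1.0 at 4 m corresponds to a gap of roughly 1.16 mm, which illustrates why display resolution rather than computing power limits the upper end of the measurable range.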
Versions
[...] or in one of four positions; in the present study, eight different orientations were used.

[...] value of the gap size ≥1 pixel is possible though, with the help of anti-aliasing. The Landolt C orientation is calculated randomly for each trial.
An important parameter is the number of trials [13]. In the interest of rapid measurement and avoidance of subject fatigue, it should be as low as possible; for best precision, however, it should be as high as possible. For clinical studies, 30 trials yield a test-retest variability comparable to the ETDRS procedure [19]. In the present study, only 18 trials were presented. This increases variability, making room for improvement. Post-hoc re-analysis (see below) sought to reduce the variability, though without success.

Procedure

The procedure with FrACT is subject to the same boundary conditions as any acuity test: well-defined surround luminance, no screen illumination that would reduce contrast, a [...]

Threshold definition

Like all sensory thresholds, the detection rate versus optotype size is described by a psychometric function (Fig. 2) [11]. Given this continuous relation, threshold definition is not obvious. The optimal choice from a signal-detection point of view is the point of steepest slope, which is also the point of inflection. With the use of eight alternatives, this point lies in the middle between the guessing rate of 12.5% and 100%, i.e., at 56.25%. At this point, any deviation on the detection scale (ordinate) transforms into the smallest possible deviation on the acuity scale (abscissa). This definition is widely used and also underlies the EN ISO 8596 standard [6]. One could also define this region as the most uncomfortable one for the patient: here, they are most uncertain whether or not they can recognize the target.
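The arithmetic behind the 56.25% criterion can be written out in a few lines. This is only an illustrative sketch; the function name is an assumption, and the criterion is simply the midpoint between the guessing rate 1/n and 100% for n equally likely alternatives.

```python
def threshold_criterion(n_alternatives: int) -> float:
    """Proportion correct midway between the guessing rate and 100%."""
    if n_alternatives < 2:
        raise ValueError("a forced-choice task needs at least two alternatives")
    guessing_rate = 1.0 / n_alternatives
    return guessing_rate + (1.0 - guessing_rate) / 2.0


# Eight orientations as in the present study; four orientations for comparison
for n in (8, 4):
    print(n, threshold_criterion(n))
```

For eight orientations this gives 0.5625, the value stated above; with four orientations the same definition would place the criterion at 0.625.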
Threshold estimation

It is suggested by signal detection theory [9], and has been shown experimentally (e.g., [17]), that when comparing the psychometric acuity function in subjects with a wide range of acuity, the position of the inflection point shifts, but slope stays rather constant, when plotted on a log(VA) scale. Neglecting lapses, only one parameter needs to be estimated, namely the threshold. A number of algorithms have been developed for such a situation (review: [20] or the special edition of Perception [15]). For FrACT, the Best-PEST algorithm was selected [12], which needs no prior information and assumes a constant fixed slope of the [...]

[Fig. 3 graphic not reproduced: two panels, each showing one run; ordinate: gap size [acuity units], 0.03 to 3.0; abscissa: trial #, 0 to 15; plot symbols ø and x; one panel labeled “#61, VA=1.28”]
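The single-parameter estimation described under “Threshold estimation” can be sketched as follows. This is a minimal illustration in the spirit of Best PEST [12], not FrACT's actual implementation: after every response the threshold is re-estimated by maximum likelihood under a fixed slope, and the next optotype is presented at that estimate. The slope value, the candidate grid, and the noise-free simulated observer are all assumptions made for the example.

```python
import math

P_CHANCE = 1.0 / 8.0   # guessing rate with eight orientations
SLOPE = 4.0            # illustrative fixed slope (assumption, not FrACT's value)
# Candidate thresholds on a log10(decimal VA) grid, VA ~0.03 to ~3.2 (assumed)
CANDIDATES = [-1.5 + 0.05 * i for i in range(41)]


def p_correct(log_va, log_threshold):
    """Probability of a correct response to an optotype of acuity 10**log_va."""
    x = 10.0 ** (SLOPE * (log_va - log_threshold))
    return P_CHANCE + (1.0 - P_CHANCE) / (1.0 + x)


def log_likelihood(trials, log_threshold):
    """Log-likelihood of all (log_va, correct) responses for one candidate."""
    total = 0.0
    for log_va, correct in trials:
        # Clamp to keep log() finite for extreme probabilities
        p = min(max(p_correct(log_va, log_threshold), 1e-9), 1.0 - 1e-9)
        total += math.log(p if correct else 1.0 - p)
    return total


def ml_threshold(trials):
    """Maximum-likelihood threshold; with no data, start at the easiest level."""
    if not trials:
        return CANDIDATES[0]
    return max(CANDIDATES, key=lambda c: log_likelihood(trials, c))


def run_staircase(respond, n_trials=18):
    """Present each optotype at the current estimate, then re-estimate."""
    trials = []
    for _ in range(n_trials):
        log_va = ml_threshold(trials)
        trials.append((log_va, respond(log_va)))
    return ml_threshold(trials), trials


# Idealized, noise-free observer (hypothetical): correct up to log10(VA) = 0.12
estimate, history = run_staircase(lambda log_va: log_va <= 0.12)
print(f"estimated decimal VA: {10 ** estimate:.2f}")
```

Because only the threshold is free, each trial needs just a one-dimensional search over the candidate grid, which is what makes this kind of procedure practical for short runs.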
Fig. 5 Typical characteristics of the psychometrical function (cf. Fig. 2) for three different subjects. The z-axis (represented via contour lines) depicts the likelihood of the fit producing the specific test run result; the z-ranges covered are: 0–5·10⁻³, 0–3·10⁻³, 0–6·10⁻⁵, respectively. The abscissa represents the acuity threshold, and slope is on the ordinate. The two left graphs are typical for low and high acuity cases; the right one depicts a run where the subject gave an incorrect response to a (very easy) bonus trial, resulting in a low slope. In none of the 148 cases was there a marked obliqueness of the likelihood ‘hill’ (the center graph represents an extreme case), indicating that the threshold estimate does not depend strongly on the slope. [Graphic not reproduced; panels: #144, #92, #0; abscissa: decimal acuity, 0.3 to 3.0; ordinate: slope, 0 to 50]
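A likelihood surface of the kind shown in Fig. 5 can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the author's analysis code: it uses a psychometric function of the Eq. (1) type with the ratio oriented so that the probability of a correct response falls toward the guessing rate as the optotype's acuity value rises (an assumption about how ν is scaled). The run data are invented for the example, and the grids merely mimic the axis ranges of the figure.

```python
import math

P_CHANCE = 1.0 / 8.0   # guessing rate with eight orientations


def p_correct(va, threshold, slope):
    """Probability of a correct response; equals 56.25% at va == threshold."""
    return P_CHANCE + (1.0 - P_CHANCE) / (1.0 + (va / threshold) ** slope)


def log_likelihood(run, threshold, slope):
    """Log-likelihood of the whole correct/incorrect sequence."""
    total = 0.0
    for va, correct in run:
        # Clamp to keep log() finite for extreme probabilities
        p = min(max(p_correct(va, threshold, slope), 1e-9), 1.0 - 1e-9)
        total += math.log(p if correct else 1.0 - p)
    return total


def likelihood_surface(run, thresholds, slopes):
    """Rows follow slope (ordinate), columns follow threshold (abscissa)."""
    return [[math.exp(log_likelihood(run, t, s)) for t in thresholds]
            for s in slopes]


def best_threshold(run, thresholds, slope):
    """Maximum-likelihood threshold for one fixed slope."""
    return max(thresholds, key=lambda t: log_likelihood(run, t, slope))


# Hypothetical run, invented for illustration: (decimal acuity shown, correct?)
run = [(0.1, True), (0.3, True), (0.6, True), (1.0, True),
       (1.6, False), (1.2, True), (1.4, False), (1.3, True)]
thresholds = [0.3 * 10 ** (i / 20) for i in range(21)]   # 0.3 to 3.0, log-spaced
slopes = [2.0, 5.0, 10.0, 20.0, 50.0]

surface = likelihood_surface(run, thresholds, slopes)
for s in slopes:
    print(f"slope {s:5.1f}: best threshold {best_threshold(run, thresholds, s):.2f}")
```

Printing the best threshold separately for each slope is a simple way to inspect how strongly the threshold estimate depends on the slope, which is the property the obliqueness of the likelihood ‘hill’ expresses graphically.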
Graefe’s Arch Clin Exp Ophthalmol (2007) 245:965–971 969
The run data thus obtained (similar to those depicted in Fig. 3) were fitted with a psychometric function P using maximum-likelihood [21, 24] where both the threshold ν₀ and slope s were free parameters according to the following Eq. (1):

$$P(\nu) = p_{\mathrm{chance}} + \frac{1 - p_{\mathrm{chance}}}{1 + (\nu_0/\nu)^{s}} \qquad (1)$$

In Fig. 3, two representative runs of FrACT in two subjects are depicted. It can be seen that a run starts with “easy” optotypes (low acuity) that become smaller until an incorrect response is encountered (the fifth trial for the upper example). Consequently, the Best-PEST algorithm next selects a somewhat larger optotype.

The average time per 18-trial run, including data transfer to a spreadsheet, was 103±40 s (range, 53–210 s). It took about 6 min for the acuity of both eyes including a binocular training run.

Figure 4 demonstrates that the threshold obtained by fitting the psychometric function with slope as another free parameter never differed by more than one line (1 dB, or a factor of 1.26) from the one obtained by Best PEST. On average, the results differed by 1.1%. The slope ranged widely from 2.0 to over 100, averaging at 17.0; this is markedly higher than the slope value of 1.7 currently used. Figure 5 depicts the maximal likelihood function depending on slope and threshold (decimal acuity) for three representative cases. On the left, the slope of the psychometric function is very steep, while on the right it is very shallow (in that run, an incorrect response to a large optotype, a bonus trial, was entered). The threshold depends very little on slope as can be seen from the missing obliqueness of the likelihood vs. slope and threshold function. A brief explanation of the likelihood function may be in order: Likelihood is the hypothetical probability that an event that has already occurred will yield a specific outcome. The concept differs from that of a probability in that a probability refers to the occurrence of future events, while a likelihood refers to past events with known outcomes [22]. The event here is the occurrence of the entire sequence of correct-incorrect responses, given the specific values of acuity threshold and slope value (cf. Fig. 1).

Figure 6 illustrates the test-retest reproducibility. Points on the continuous 45° line would be perfectly reproduced; the dashed lines correspond to deviations by a factor of two (corresponding to three lines on an acuity chart or 3 dB). For the Best-PEST method, 72 of 74 (97.3%) run pairs differed by 2 dB or less; the mean CV was 12.9±9.7%. After post-hoc processing, 73 of 74 (98.6%) run pairs differed by 2 dB or less; the mean CV was 13.5±9.7%. Thus, post-processing does not yield significantly different thresholds (P=0.79, Wilcoxon test), it does not reduce test-retest variability, and it removes about as many outliers as it adds. In Fig. 6b, the Best-PEST results are depicted as a Bland-Altman [4, 5] plot (after taking the logarithm of all acuities, the difference test-retest vs. average of test-retest). The mean difference (dotted line) was 0.025 logMAR, and the dashed lines indicate ±2·SD around the mean (2·SD=0.196 logMAR). This plot shows that: (1) The negative mean difference hints at a small, though insignificant (P=0.6) learning effect, and (2) there is no marked skewness for low acuities, but a tendency towards higher variability. The post-hoc data have been left out to avoid clutter; the mean difference was 0.014 logMAR and the 95% confidence band is spanned by 0.204 logMAR.

Fig. 6 Test-retest variability of FrACT using 18 trials: 74 eyes of 37 naive, visually normal subjects, not necessarily wearing best correction, were analyzed. (a) Scatter plot. Along the continuous 45° line, perfect reproducibility would be obtained. The parallel dashed lines indicate a deviation of ±3 lines on retest from the initial test. There was no improvement of repeatability by post-hoc analysis. (b) The best-PEST data, depicted as difference test-retest vs. average of test-retest (Bland-Altman plot [4, 5]); the abscissa covers the same range as (a). The mean test-retest difference is close to zero, and variability does not change markedly across the acuity range covered. [Graphic not reproduced; panel (b) abscissa: mean of test-retest [log(VA) = –logMAR], –1.0 to 0.2; horizontal lines: Mean, Mean+2·SD, Mean–2·SD]

Discussion

The old version of FrACT has been successfully validated in independent laboratories [7, 14, 23]. The new version is geometrically identical and showed an agreement between the (new) FrACT and ETDRS charts within 9% down to very low acuities in the author’s laboratory [19]. This suggests that the FrACT results are bias-free estimators of visual acuity over the full range from ≈0.01 to ≈3.0. The present study assessed the test-retest variability of FrACT with only 18 trials. The test-retest variability as quantified by the coefficient of variation (CV) of VA was around 13%, corresponding to about half a line (1 line = a factor of 1.26). The corresponding 95% confidence interval spans ±0.196 logMAR. This leaves room for improvement.

The post-hoc analysis, namely fitting slope as another free parameter in addition to the threshold, resulted in nearly identical acuity estimates and nearly identical average test-retest values. The maximum likelihood fitting surface showed that slope and threshold are highly decoupled; in other words, whichever value of slope is chosen has very little influence on the acuity outcome. This suggests that the fixed slope as used in Best PEST is no disadvantage. Somewhat disappointingly, post-hoc processing did not improve test-retest variability. Either this reflects inherent fluctuation of the threshold itself, or the loss of degrees of freedom to estimate the additional parameter slope offsets a possible closer approximation of the psychometric function. Still, post-hoc processing has been integrated into FrACT as an option.

A number of additional modifications of the post-hoc analysis were tried out: removal of the bonus trial results, restricting analysis to the final part, iteratively removing outliers, and removing erroneous bonus trials. None of these modifications resulted in a lower test-retest variability.

One problem of FrACT occurs when a subject mistypes their first response; that is when a very large optotype is seemingly not recognized correctly. The Best-PEST algorithm then searches too long for the threshold in the low acuity region and may not converge to full acuity. In such a case, it is best to abort the run and restart. This was not necessary in the present study.

The main application fields for FrACT are thus clinical studies where acuity is an outcome variable. FrACT can be seen as an automated alternative to ETDRS, extending its range both at the upper and lower end and being safe from being learned by heart on repeated testing. In laboratory environments, FrACT has proven useful for subject screening and for quantifying acuity after optical or physiological manipulations. Since the present study was not successful in reducing the variability of the rather short 18-trial runs, for highly reliable results, the test should either be repeated and the results averaged, or the number of trials should be increased to 30 [1, 19].

Acknowledgement Thanks to many users for their inspiring support, providing feedback that helped to root out bugs and suggesting useful expansions. Special thanks to Lew Harvey, Hans Strasburger and Thomas Meigen for tutoring in signal detection theory, psychometric threshold assessment and probability statistics and to Margret Schumacher for assiduous testing. Finally, thanks to two very persistent reviewers who considerably helped to clarify my thoughts.

References

1. Bach M (1996) The Freiburg Visual Acuity Test-automatic measurement of visual acuity. Optom Vis Sci 73:49–53
2. Bach M (1997) Anti-aliasing and dithering in the Freiburg Visual Acuity Test. Spat Vis 11:85–89
3. Bach M (2006) Homepage of the Freiburg Visual Acuity and Contrast Test (‘FrACT’). Retrieved 2006-07-04, from http://www.michaelbach.de/fract.html
4. Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1:307–310
5. Bland JM, Altman DG (1995) Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet 346:1085–1087
6. CEN (Comité Européen de Normalisation) (1996) Ophthalmic optics-visual acuity testing-the standard optotype and its presentation. Beuth-Verlag, Berlin
7. Dennis RJ, Beer JM, Baldwin JB, Ivan DJ, Lorusso FJ, Thompson WT (2004) Using the Freiburg Acuity and Contrast Test to measure visual performance in USAF personnel after PRK. Optom Vis Sci 81:516–524
8. Foley JD, Van Dam A, Feiner SK, Hughes JF (1990) Computer Graphics, Principles and Practice. Addison-Wesley, Reading
9. Green DM, Swets JA (1966) Signal detection theory and psychophysics. Wiley, New York
10. Hess R, Woo G (1978) Vision through cataracts. Invest Ophthalmol Vis Sci 17:428–435
11. Klein SA (2001) Measuring, estimating, and understanding the psychometric function: a commentary. Percept Psychophys 63:1421–1455
12. Lieberman HR, Pentland AP (1982) Microcomputer-based estimation of psychophysical thresholds: The best PEST. Behav Res Methods Instrument 14:21–25
13. Linschoten MR, Harvey LO Jr, Eller PM, Jafek BW (2001) Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure. Percept Psychophys 63:1330–1347
14. Loumann Knudsen L (2003) Visual acuity testing in diabetic subjects: the decimal progression chart versus the Freiburg visual acuity test. Graefes Arch Clin Exp Ophthalmol 241:615–618
15. Macmillan NA (2001) Threshold estimation: the state of the art. Percept Psychophys 63:1277–1278
16. Peters BT, Bloomberg JJ (2005) Dynamic visual acuity using “far” and “near” targets. Acta Otolaryngol 125:353–357
17. Petersen J (1990) Zur Fehlerbreite der subjektiven Visusmessung. Fortschr Ophthalmol 87:604–608
18. Ruamviboonsuk P, Tiensuwan M, Kunawut C, Masayaanon P (2003) Repeatability of an automated Landolt C test, compared with the early treatment of diabetic retinopathy study (ETDRS) chart testing. Am J Ophthalmol 136:662–669
19. Schulze-Bonsel K, Feltgen N, Burau H, Hansen LL, Bach M (2006) Visual acuities “Hand Motion” and “Counting Fingers” can be quantified using the Freiburg Visual Acuity Test. Invest Ophthalmol Vis Sci [in print]
20. Treutwein B (1995) Adaptive psychophysical procedures. Vision Res 35:2503–2522
21. Treutwein B, Strasburger H (1999) Fitting the psychometric function. Percept Psychophys 61:87–106
22. Weisstein EW (2006) “Likelihood.” From MathWorld - A Wolfram Web Resource. Retrieved 2006-06-27, from http://mathworld.wolfram.com/Likelihood.html
23. Wesemann W (2002) [Visual acuity measured via the Freiburg visual acuity test (FVT), Bailey Lovie chart and Landolt Ring chart]. Klin Monatsbl Augenheilkd 219:660–667
24. Wichmann FA, Hill NJ (2001) The psychometric function: I. Fitting, sampling, and goodness of fit. Percept Psychophys 63:1293–1313