Sampling Frequency and Eye-Tracking Measures
3(3), 6, 1-12
Introduction

The role of sampling frequency and its mathematical implications for the resulting data may be common knowledge to statisticians, but in the eye-tracking community, sampling frequency is rarely highlighted in methodological discussions. Sampling frequency is not the only important property of eye-trackers (precision, accuracy and versatility across participants are others), but it is the technical property that manufacturers highlight most. Manufacturers use sampling frequency as a major sales argument, and it is the most frequently mentioned technical property of eye-trackers in journal papers. There are good reasons for this: sampling frequency affects what you can do with your eye-tracker in a number of ways. Some uses, such as precise measurements of small saccades, are so sensitive that a higher sampling frequency (and precision) is necessary to estimate them accurately (see e.g. the discussion in the supplemental methods of Kagan, Gur, & Snodderly, 2008). However, there are also cases where a naturalistic setup demands eye-trackers that currently lack the speed of their stationary counterparts. Consider the following quote from Green (2002):

"Similarly, Crundall & Underwood (1998) reported that experienced drivers had shorter fixation durations for suburban roads (324 versus 335 ms) and divided highways (349 versus 395 ms), but not for rural roads (381 versus 364 ms). Although statistically significant differences are claimed, these differences are at the limits of recording accuracy at 30 Hz (33 milliseconds per video frame)."

Sampling frequency is measured in Hertz (Hz), the number of samples per second. Most modern eye-trackers have sampling frequencies ranging from 25 to 2000 Hz. For the many 50 Hz eye-trackers, a sample is registered once every 20 ms, whereas a 250 Hz eye-tracker samples every 4 ms. We tend to assume that faster is automatically better, just as we assume that more pixels always make a better digital camera, but it is also reasonable to expect the marginal benefit of every further Hz to diminish at some level. For instance, the benefit of a 2000 Hz system over a 1000 Hz system should not equal that of a 100 Hz over a 50 Hz eye-tracker, even though both constitute a doubling of the frequency. It is currently not clear what sampling [...]

The authors wish to thank Kazuo Koga of EcoTopia Science Institute, Nagoya University, Japan, and one anonymous reviewer for extremely helpful reviews. The authors also thank the eye-tracking group at Lund University Humanities Lab for valuable comments. Source code for the simulations can be found at the main author's webpage: http://www.humlab.lu.se/richard
DOI 10.16910/jemr.3.3.6. ISSN 1995-8692. This article is licensed under a Creative Commons Attribution 4.0 International license.
ANDERSSON, NYSTRÖM & HOLMQVIST (2010)

[Figure 1, panels a) and b): velocity plotted over time & samples, with sampling points n and n+1 and the resulting error marked]

[...] standards vary.

This points to the question: what sampling frequency and/or data amount is necessary to be certain of eye-tracking results where the sampling-related uncertainty exceeds the effect magnitudes found?
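To make the arithmetic in the Introduction concrete, here is a minimal sketch that converts sampling frequencies into sample intervals and expresses the fixation-duration differences quoted from Green (2002) in units of 30 Hz video frames. The helpers `sample_interval_ms` and `diff_in_samples` are hypothetical names chosen for illustration, not part of any eye-tracking package; the numbers are taken from the text.

```python
def sample_interval_ms(fs_hz: float) -> float:
    """Time between two sampling points (ms) for a sampling frequency in Hz."""
    return 1000.0 / fs_hz


def diff_in_samples(a_ms: float, b_ms: float, fs_hz: float) -> float:
    """Express a duration difference as a number of sample intervals."""
    return abs(a_ms - b_ms) / sample_interval_ms(fs_hz)


# Sampling frequencies mentioned in the text.
for fs in (25, 30, 50, 250, 1000, 2000):
    print(f"{fs:>5} Hz -> {sample_interval_ms(fs):6.2f} ms per sample")

# Mean fixation durations (ms) quoted from Crundall & Underwood (1998) via Green (2002),
# kept as (first value, second value) in the order they appear in the quote.
quoted = {
    "suburban roads": (324, 335),
    "divided highways": (349, 395),
    "rural roads": (381, 364),
}
for road, (a, b) in quoted.items():
    print(f"{road}: {abs(a - b)} ms = {diff_in_samples(a, b, 30):.2f} frames at 30 Hz")
```

Two of the three differences (11 ms and 17 ms) are smaller than one 30 Hz frame, and the third (46 ms) is less than two frames, which is the point the quote makes about recording accuracy.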
Figure 1. Consider the measurement of the event that triggers a saccade velocity criterion (dashed line). The temporal sampling error occurs when the eye increases velocity after a recent sampling of the eye, n, resulting in changes not being registered until the next sample, n + 1. Case a) shows a large error when the eye accelerates right after it has just been sampled, resulting in an error equal to almost the duration of one sample. Case b) shows a small error, where the eye accelerates at the end of the current sample and just before the next sample.

For oscillating eye movements, such as tremors, we can argue based on the Nyquist-Shannon sampling theorem (Shannon, 1949) that the sampling frequency should be more than twice the frequency of the particular eye movement (e.g., behaviour at 150 Hz requires a sampling frequency above 300 Hz). Other than that, the typical practice is to use whatever is used in your particular field of research, or faster. Low-level visual cognition research can use constraining setups favouring systems with speeds from 1000 Hz to 2000 Hz, as the primary concern there is usually to maintain control over the variables rather than naturalism. Research using gaze-contingent display changes, for example exchanging peripheral letters with 'x' in real time to manipulate parafoveal preview benefits in a reading task (e.g., McConkie & Rayner, 1975), usually involves the most demanding experiments in terms of frequency. This is because high speeds allow the system to detect saccade launches earlier and make the display changes even faster, which minimizes the risk of the participant noticing the manipulation. Research investigating higher-level cognition and using naturalistic tasks commonly prefers systems allowing free movement of the head, either by remote eye-tracking or by head-mounted and mobile eye-tracking. These systems typically operate at speeds from 25 Hz to 250 Hz. A specific community of researchers chooses to use web-cameras as eye-trackers, with the goal of bringing inexpensive gaze-interfacing capabilities to the masses, and these cameras typically have sampling frequencies of 30 Hz or below. Also, analyses using video are typically limited by the frame rate, which most often is around 24 fps (Hz). But even high-end eye-tracking systems allow different setups that trade sampling frequency for binocularity or remote filming. This requires us to know what speeds we need for our particular research questions, and also to know when it yields a net improvement to sacrifice speed in order to capture the behaviour in a more naturalistic setting, e.g., using the less intrusive remote filming for slower eye movements.

Additionally, sampling frequency heavily affects many measures we use, and what we can use them for. For instance, Enright (1998) provides evidence that saccadic peak velocity can be well estimated in 60 Hz data from eye-trackers using the relation between pupil and corneal reflection, but only for saccades larger than 10°. For saccades shorter than 10°, typical of reading, the peak velocity calculation is not accurate with 60 Hz data. Juhola, Jäntti, and Pyykkö (1985), using electrooculography (EOG) and photoelectric eye-trackers to study 20° saccades, argue that the sampling frequency should be higher than 300 Hz to accurately calculate the maximum saccadic velocity. Inchingolo and Spanio (1985) found that saccadic duration and velocity data from a 200 Hz EOG system that they tested are equivalent to the same data from a 1000 Hz system, but only for saccades larger than 5°.

Whereas previous studies have focused on sampling frequency and its role for particular, often saccade-related, measures, we explore the effect of sampling-related errors more generally. Our aim is to explain the source of these errors, describe them mathematically, simulate their effects in actual experiments and provide easy-to-follow heuristics to compensate for these effects.

The source of the error

As it is impossible to have an infinite sampling frequency, each eye-tracker instead takes an instantaneous snapshot of the eye at a fixed rate (typically 25-2000 Hz). Each snapshot is a point in time, taken to be representative of a whole interval of time. For instance, with a 50 Hz system, the position of the eye at each sample is assumed to be valid for the whole 20 ms, even though it is very likely that the eye did not have that exact position just before the moment of sampling. The eye-tracker cannot sample the eye in a position it has not yet moved to, but it may sample the eye in the correct position or in a position it recently had. By necessity, sampling always lags behind the position of the eye, because the eye is constantly moving to some extent. Figure 1 illustrates the resulting temporal sampling error, where the eye-tracker mis-estimates the point in time at which a particular event (the triggering of a velocity criterion) takes place.

It should be noted that this paper addresses temporal measures, i.e., any measures that try to estimate either the duration of an event or the point in time when an event takes place. These temporal measures are estimated using two reference points in time. The two
points we will call the start criterion and the stop criterion. The exact operationalization of the criteria will vary with the specific measure, but typically we focus on the system clock time of these events as they occur. For example, if an event starts at time 1 and stops at time 10, then the resulting duration of the event is 10 − 1 = 9 in whatever time unit we are measuring. If we want to estimate the point in time when something occurs rather than a duration, we typically set the start criterion to be the point at which we start counting the time from 0. For example, if we want to estimate the point in time when the eye makes a particular movement in a trial, then typically we start counting the time from the beginning of the trial (our zero point). In this case, the trial start is our start criterion.

Throughout this paper, we will use the term sampling point to refer to the particular point in time when the eye image is captured by the eye-tracker. We use the term sample to refer to the eye image and the resulting coordinate pair, which is taken to be valid for a period of time related to the sampling frequency of the eye-tracker. We use the term window of no sampling to refer to the time that passes between two sampling points. The term temporal sampling error, or simply error, will refer to the time between the actual, objective occurrence of an event and the detected occurrence of that event, e.g., the time between the point where the gaze enters an area of interest and the point where the system actually registers the gaze inside the area. What is referred to as a temporal sampling error in this paper seems to be the same phenomenon that Kagan et al. (2008) call "temporal offset" (in the caption of Supplemental Figure 6). Do not confuse this temporal sampling error with other measurement errors, such as spatial offset. The distribution of means of temporal sampling errors will often be modulated by the amount of data we have, and we will use the term data points to refer to the number (count) of a particular measure we have recorded, for example the number of fixation durations, dwell times, saccade durations and so on.

We make the very plausible assumption that a true oculomotor event, e.g. the eye passing a 60°/s velocity criterion of saccade detection, is equally likely to occur at any time between two sampling points (i.e., it is uniformly distributed). For this paper, we also assume an eye-tracking system with zero system latency, and cameras that take snapshots of the eye rather than continuously transmitting camera pixels from the top-left corner to the bottom-right corner.

The one-point temporal sampling error

Consider a visual search task where we are interested in how fast participants can locate a target object in a cluttered scene. We may use, as a dependent variable, a saccadic latency measure, where we measure the duration from the onset of a stimulus until a saccade is launched towards a designated target on the screen ($\text{duration} = \text{time}_{stop} - \text{time}_{start}$). If the time-stamp of the stimulus onset is 5674 ms and the saccade to the target is detected to be launched at 6743 ms, then the resulting saccadic latency is 6743 − 5674 = 1069 ms. This precise saccadic latency value, however, assumes that we correctly detected the saccade launch at exactly 6743 ms. We will now describe how a temporal sampling error occurs in this example. Assuming no system latency, the onset of the stimulus will appear at the same time as the system timer starts counting the designated trial duration. The control computer records the processed eye images from the onset of the stimulus until its offset. During our analysis, we extract only the eye data corresponding to the time between the stimulus onset and the first saccade to the target. When the system detects the eye making a saccade, the qualifying velocity criterion was fulfilled somewhere between two sampling points, one on each side of the velocity criterion. This in turn gives us our temporal sampling error, which will be, on average, half a sample in time (the expected mean of the uniform distribution on [0, 1], in units of samples). As a result, our measured saccadic latency is the time from the stimulus onset to the true triggering of the saccade criterion plus the temporal sampling error. In Figure 2, this is the temporal error that results from measuring from the start of the trial, a, to the registered gaze criterion, c, rather than the true saccadic latency between point a and point b. The temporal sampling error is the time between points b and c.

This form of temporal error is the same for a large group of measures we choose to call one-point measures. By our definition, these measures have either the start criterion or the stop criterion defined by a frequency-independent system event, and the other criterion defined by a sampled gaze criterion; the two criteria cannot both be determined by the same type of qualifying event.

The temporal sampling error caused by a finite sampling frequency, where the true oculomotor events are uniformly distributed between sampling points, can be described mathematically as follows. If the sampling interval spans $[0, \frac{1}{f_s}]$, the sampling error can be described as

$$\varepsilon = t_{Measured} - t_{True}, \qquad \varepsilon \sim U\left(0, \frac{1}{f_s}\right) \qquad (1)$$

where $\varepsilon$ represents the error between the time ($t_{True}$) of the oculomotor movement and the time ($t_{Measured}$) when the movement was registered by an eye-tracker with sampling frequency $f_s$. $U(a, b)$ denotes the uniform distribution on the interval $[a, b]$.

The net effect of the temporal sampling error on our desired measure depends on the particular measure. Specifically, the error will be positive or negative depending on whether the sampled gaze criterion constitutes the stop criterion or the start criterion in [...]

JOURNAL OF EYE MOVEMENT RESEARCH, VOLUME 3, ISSUE 3
[...] generated criteria (which do not depend on a sampling frequency) have no such errors. The two possible outcomes of a one-point measure can be expressed as follows, where d is the estimated duration, and the subscripts s and g denote system-generated and gaze-sampled events, respectively:

$$\text{Overestimation:}\quad d + \varepsilon = (\text{stop}_g + \varepsilon) - \text{start}_s \qquad (2a)$$

$$\text{Underestimation:}\quad d - \varepsilon = \text{stop}_s - (\text{start}_g + \varepsilon) \qquad (2b)$$

An overestimated d would result from any measure that uses a sampled gaze property as its stop criterion, for example saccadic latency, where the stop is the triggering of the saccade, or time to target, where the stop is the position of the eye inside the target area of interest. An underestimated d, on the other hand, would result from measures that use a sampled gaze property as the start criterion and a system-generated event as the stop criterion. An example would be a measure we can call "time from decision", where we measure the time from when the participant gazed at a particular area of interest to when the trial ends and the participant is forced to make a choice. In this case, the start event is sampled and the stop event is system-generated (the end of the trial). This measure would result in an underestimated latency.

One-point measures that use a gaze-sampled stop criterion are much more common than measures using only a gaze-sampled start criterion. Therefore, we will primarily focus on the former type in this paper, but the equations and simulations can easily be adjusted to accommodate the latter as well. Overestimated one-point measures have errors within the interval $[0, \frac{1}{f_s}]$, whereas underestimated one-point measures are the mirror image, with errors within $[-\frac{1}{f_s}, 0]$.

This means that, on average, the sampling error (whether it is an overestimation or an underestimation) will amount to half a sample's worth of time. With a 50 Hz system, half a sample amounts to $\frac{1/50}{2} = 0.01$ s = 10 ms.

The two-point temporal sampling error

There is another large group of measures that behave differently from the one-point measures. We choose to call them two-point measures, because they are qualified by two gaze-related criteria. These measures are both initiated and concluded by events determined from eye movements, and as such they contain sampling errors at both events. Their duration is determined by simply taking the end time of the event and subtracting the start time of the same event. What differentiates these measures from one-point measures is that two gaze-generated criteria result in two sampling errors within the same event. Consider Figure 2 and assume the gaze-based criteria to be the entry into and the exit from an area of interest. Our measure will be the time spent gazing at an area of interest in one visit, and we call this measure the dwell time. The dwell time can be expressed as $d^{dwell} = t^{dwell}_{stop} - t^{dwell}_{start}$, where d and t denote duration and point in time, respectively. At one end of this measure we have an entry event that occurs slightly before the registration of that event, and at the other end we have an exit event that occurs slightly before the registration of the exit. The temporal sampling error can thus occur at both ends, i.e. at both qualifying criteria. The start of the measure is overestimated, yielding a later/higher start time, which results in a shorter dwell time (more is subtracted), but on the other hand we also overestimate the end of the measure, yielding a later/higher stop time and consequently a longer dwell time (more to subtract from). Thus, we can summarize the error of the dwell time, in this case, as $\varepsilon_{dwell} = \varepsilon_{stop} - \varepsilon_{start}$. As the two events have equal distributions of the temporal sampling error, we expect that, on average, the net temporal sampling error will be zero: the two errors cancel each other out. A net error of exactly zero results when the errors in estimating the start time and the stop time of the two event criteria are exactly equal. However, the net error is not always zero, but is located between two extreme values, $\varepsilon_{dwell} \in [-\frac{1}{f_s}, \frac{1}{f_s}]$. The first extreme value, a maximal underestimation of the dwell duration (−1 sample), occurs when we correctly capture the dwell stop as it occurs ($\varepsilon_{stop} = 0$), but overestimate the dwell start by (almost) a complete sample ($\varepsilon_{start} = \frac{1}{f_s}$): the real dwell starts immediately after we just sampled the eye, so we [...]

[Figure 2: position plotted against time, t]

Figure 2. The x-axis shows continuous time with regularly occurring sampling points as brief vertical lines, starting from the start of the trial, a. The curved line crossing the x-axis shows the eye movement fulfilling some gaze-based criterion, e.g. eye velocity or position in relation to an area of interest. This precise moment happens where the curve crosses the x-axis, b. The event is registered by the eye-tracker shortly afterwards, at c. Similarly, another criterion may be fulfilled later, at d, but is registered only a while afterwards, at e. The temporal sampling error is the difference between the true event and the registered event, e.g. c − b or e − d.
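The error model above can be checked with a small Monte Carlo sketch. It draws one-point errors from Equation (1), builds two-point (dwell) errors as $\varepsilon_{stop} - \varepsilon_{start}$ under the additional assumption that the start and stop errors are independent, and applies Equation (2a) to the saccadic latency example. The function names, the seed and the number of draws are illustrative choices, not taken from the authors' simulation code.

```python
import random
import statistics


def one_point_error_ms(fs_hz: float, rng: random.Random) -> float:
    """Equation (1): eps ~ U(0, 1/fs), returned in milliseconds."""
    return rng.uniform(0.0, 1000.0 / fs_hz)


def two_point_error_ms(fs_hz: float, rng: random.Random) -> float:
    """Dwell-time error eps_stop - eps_start, assuming independent start and stop errors."""
    return one_point_error_ms(fs_hz, rng) - one_point_error_ms(fs_hz, rng)


def measured_latency_ms(true_latency_ms: float, fs_hz: float, rng: random.Random) -> float:
    """Equation (2a): a gaze-sampled stop criterion is registered late by eps."""
    return true_latency_ms + one_point_error_ms(fs_hz, rng)


rng = random.Random(1)
fs = 50  # Hz, i.e. 20 ms between sampling points
n = 100_000
one = [one_point_error_ms(fs, rng) for _ in range(n)]
two = [two_point_error_ms(fs, rng) for _ in range(n)]

print(f"one-point: mean {statistics.fmean(one):5.2f} ms, range [{min(one):.1f}, {max(one):.1f}] ms")
print(f"two-point: mean {statistics.fmean(two):5.2f} ms, range [{min(two):.1f}, {max(two):.1f}] ms")
print(f"a true latency of 1069 ms is registered as {measured_latency_ms(1069, fs, rng):.1f} ms")
```

With these settings the one-point mean should land close to half a sample (10 ms at 50 Hz) and the two-point mean close to zero, with every two-point error inside ±20 ms. A histogram of `two` would show the triangular shape expected for the difference of two independent uniform variables.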
As we add data (e.g. dwells) the error distribution [...]

[Algorithm fragment, remainder lost in extraction: ... do ... i = 1 (data amount, i.e. number of one-point measures) ... end if ... end while ... end for ... end for]

[Figure residue: a panel with a "Probability" axis, and a plot of data points needed against sampling frequency (Hz), log scale, 10 to 1000]

Figure 5. The number of one-point measures we need in order to get a two-sample t-test significant at the 95 % level for three different effect magnitudes. Values below 5 data points are not shown, as t-tests are not reliable for such small samples.
[...] the number of data points you need. As you investigate smaller effects, the curve will shift outwards and increase your data requirements. For small effects of 5 ms, a 100 Hz system needs around 10 data points. Lowering the speed to 50 Hz increases the data requirements to around 40 data points. It is important to point out that these effects are constant, i.e. always 5, 20 or 50 ms for each measure in one of the vectors. Real effects are seldom constant, but rather normally distributed, which means these results represent an optimistic, minimal-requirements case for these effect magnitudes. The absolute levels are not the main focus here, but rather the relation between sampling frequency and data requirements.

Simulation 3 - two-point temporal error reduction

In this simulation, we investigate how many data points from a two-point measure we need, given a particular sampling frequency, in order for the temporal sampling error to be limited to at most 1 ms (we select the same span as in the one-point simulation, for comparison). This is similar to repeatedly convolving the distribution in Equation (4) with itself until it reaches such a narrow Gaussian form that it is very unlikely (p < .05) that the mean sampling error falls outside a 1 ms band (.5 ms in either direction of the mean).

Procedure

The same procedure as in Simulation 1 was used, but the two-point temporal sampling error was instead calculated by Equation (3). Note that the expected temporal sampling error for a two-point measure is zero, so the resulting error distribution will be centered on zero.

Results & Discussion

The results in Figure 6 indicate that for very low sampling frequencies, the data requirements to greatly reduce the sampling errors are enormous, but these requirements drop off very quickly as the frequency increases. At around 200 Hz or above, there is little marginal benefit of higher sampling frequencies with regard to reducing sampling errors. Furthermore, we managed to fit the data near perfectly ($r^2 = 1.00$) using Equation (5), where N is the number of data points required and $f_s$ is the sampling frequency, but with the constant c = 2429400 (which differs from the constant for solving one-point errors).

Given the sampling frequency, we can solve for the minimum number of data points required to contain the temporal sampling error within 1 ms of the expected mean of the error. Similarly, given the number of data points, we can solve for the minimum sampling frequency needed to contain the temporal sampling error within 1 ms of its expected mean.

Simulation 4 - two-point data compensation

Of course, the typical researcher is often not interested in completely cancelling this sampling error, but rather in showing that her experimental manipulation has a statistically significant effect. In this simulation we investigate, given a particular sampling frequency and a particular effect magnitude, how many pairs of data points from a two-point measure will suffice to achieve a significant two-sample t-test on the average comparison. This simulation will show how large a [...]
[Figure 6: number of data points needed (left scale linear, 0 to 1400; right scale log) against frequency (bottom scale linear, 20 to 200 Hz; top scale log, 10 to 1000 Hz)]

Figure 6. How many two-point measures we need in order for the mean temporal noise to be less than 1 ms. The line shows the fixations needed for all simulated frequencies (left and bottom scales). The dots show the fixations needed for typical frequencies of modern eye-trackers (top and right scales, base-10 log-transformed).

[Figure 7: number of data points needed against sampling frequency (Hz), 10 to 1000]

Figure 7. Number of two-point measures needed for two samples to be significantly different at various effect magnitudes. The two samples compared are one with a base two-point sampling error and another with a base sampling error plus the added constant effect. Data below 5 data points are not shown, as t-tests are not reliable for very small data sets.
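Returning to Simulation 3 and Figure 6, the steep drop in data requirements can be motivated with a normal approximation. A two-point error with independent uniform terms has variance $1/(6 f_s^2)$, so the mean of N such errors has standard deviation $1/(f_s\sqrt{6N})$; requiring 1.96 standard deviations to fit inside ±0.5 ms gives $N \ge \left(\frac{1.96}{0.0005\,f_s}\right)^2 / 6 \approx 2.56 \times 10^6 / f_s^2$, within about 5 % of the reported constant. The sketch below compares that bound with $N = c/f_s^2$ using c = 2429400. That Equation (5) has exactly this inverse-square form is an assumption, since the equation itself is not reproduced here, and the function names are hypothetical.

```python
import math

C_REPORTED = 2_429_400  # constant reported for two-point errors (Hz^2), per the text
HALF_BAND_S = 0.0005    # 1 ms band = 0.5 ms on either side of the expected mean
Z_95 = 1.96             # two-sided 95 % normal quantile


def n_required_clt(fs_hz: float) -> int:
    """Normal-approximation bound: 1.96 * sd(mean of N two-point errors) <= 0.5 ms."""
    sd_single = 1.0 / (fs_hz * math.sqrt(6.0))  # sd of eps_stop - eps_start, in seconds
    return math.ceil((Z_95 * sd_single / HALF_BAND_S) ** 2)


def n_required_fit(fs_hz: float, c: float = C_REPORTED) -> int:
    """Data points needed if Equation (5) is assumed to read N = c / fs^2."""
    return math.ceil(c / fs_hz ** 2)


def fs_required_fit(n_points: int, c: float = C_REPORTED) -> float:
    """The same assumed form solved for the sampling frequency: fs = sqrt(c / N)."""
    return math.sqrt(c / n_points)


for fs in (50, 60, 120, 200, 250, 500, 1000):
    print(f"{fs:>5} Hz: fit N = {n_required_fit(fs):>4}, normal approx. N = {n_required_clt(fs):>4}")
print(f"100 data points need about {fs_required_fit(100):.0f} Hz under the assumed form")
```

Either way, halving the sampling frequency roughly quadruples the number of data points needed, which matches the qualitative pattern described for Figure 6.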
Procedure

The same procedure as in Simulation 2 was used, but the errors were generated for a two-point measure using Equation (4).

Results & Discussion

The results in Figure 7 show the same relation between sampling frequency and data requirements as Simulation 2, which was the same simulation for one-point measures. However, the absolute values are slightly different, reflecting the fact that two-point temporal sampling errors span a larger interval and consequently require more data to obtain a mean near zero. The same reservation as for the one-point measures remains: this reflects an optimistic, minimal-requirements case only.

Simulation 5 - real-world data

In this simulation, we resample real eye-tracking data from a reading task to quantify the sampling error and verify the predicted shape of the two-point error distribution in Figure 3. This is done in order to show that this is not only a purely theoretical effect, but that it can also affect actual recorded data.

We used real eye-tracking data from the large reading experiment described in Nyström and Holmqvist (2010). In short, eye movements were recorded at 1250 Hz while university students read texts on a computer screen. The text was divided into 16 screens (images), and for each screen we defined an area of interest around a single high-frequency word near the center of the screen. We used this area of interest to calculate the time spent gazing inside it during one single visit from entry to exit, referred to as a dwell, and the resulting dwell time. We estimated dwell time using the original 1250 Hz raw data and only included dwells that were longer than 50 ms (similar to the shortest fixations, see e.g. Rayner, 1998, p. 376) and were separated by at least 20 ms. The 20 ms criterion corresponds to the minimum duration of a saccade, which is 10 ms according to Nyström and Holmqvist (2010), rounded up to 20 ms to equal a full 50 Hz sample. This allowed us to ignore high-frequency noise such as visits due to passing saccades and eye-tracker imprecision. We then downsampled the data to 50 Hz by using every 25th coordinate pair. The 1250 Hz data function as a baseline that we, for the sake of argument, assume to be identical to the objective sampling of an unlimited sampling frequency. The differences that arise are thus due to the longer sampling intervals of the 50 Hz system, essentially showing the sampling error of a 50 Hz system relative to a 1250 Hz system.
[Figure: Reproduced 50 Hz sampling error]

Procedure

[...]
Inchingolo, P., & Spanio, M. (1985). On the identification and analysis of saccadic eye movements: A quantitative study of the processing procedures. IEEE Transactions on Biomedical Engineering, 32(9), 683–695.
Juhola, M., Jäntti, V., & Pyykkö, I. (1985). Effect of sampling frequencies on computation of the maximum velocity of saccadic eye movements. Biological Cybernetics, 53(2), 67–72.
Kagan, I., Gur, M., & Snodderly, D. M. (2008). Saccades and drifts differentially modulate neuronal activity in V1: Effects of retinal image motion, position, and extraretinal influences. Journal of Vision, 8(14), 1–25. Available from http://journalofvision.org/8/14/19/
McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception & Psychophysics, 17(6), 578–586.
Nyström, M., & Holmqvist, K. (2010). An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data. Behavior Research Methods, 42, 188–204.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422.
Salvucci, D., & Goldberg, J. (2000). Identifying fixations and saccades in eyetracking protocols. In ETRA '00: Proceedings of the 2000 symposium on eye tracking research & applications (pp. 71–78). New York, NY, USA: ACM Press.
Shannon, C. (1949). Communication in the presence of noise. Proceedings of the IRE, 37(1), 10–21.
Shic, F., Scassellati, B., & Chawarska, K. (2008). The incomplete fixation measure. In ETRA '08: Proceedings of the 2008 symposium on eye tracking research & applications (pp. 111–114). New York, NY, USA: ACM Press. Available from http://dx.doi.org/10.1145/1344471.1344500