
Hands-free Selection in Scroll Lists for AR Devices

ANONYMOUS AUTHOR(S)

While desktops and smartphones have established user interface standards, such standards are still lacking for virtual and augmented reality devices. Hands-free interaction for these devices is desirable. This paper explores utilizing eye and head tracking for interaction beyond buttons, in particular for selection in scroll lists. We conducted a user study with three different interaction methods based on eye and head movements - gaze-based dwell-time, gaze-head offset, and gaze-based head gestures - and compared them with state-of-the-art hand-based interaction. The evaluation of quantitative and qualitative measurements provides insights into the trade-off between physical and mental demands for augmented reality interfaces.

CCS Concepts: • Human-centered computing → Empirical studies in interaction design.

Additional Key Words and Phrases: hands-free interaction, eye tracking, head tracking, head gestures, selection in list boxes

ACM Reference Format:
Anonymous Author(s). 2024. Hands-free Selection in Scroll Lists for AR Devices. In . ACM, New York, NY, USA, 13 pages. https://doi.org/XXXXXXX.XXXXXXX
1 INTRODUCTION

The higher-level goal of our research is to understand what a user interface could look like if we want to transfer large portions of smartphone functionality into AR glasses to get something we call smart glasses. Typical smartphone tasks are making telephone calls, reading and writing emails, surfing the internet, and using a navigation application. Most of these tasks need selections in scroll lists, for example, choosing an entry in a phone list. Desirably, such a user interface should keep the hands free for other tasks. Keeping the hands free would be a good argument to switch from a smartphone to smart glasses, but there is also demand from industrial use cases where the hands need protective gloves or have to handle tools.

Smartphone user interfaces use standard interaction objects like buttons, menus, list boxes, etc., which are more or less the same as those used for interaction with desktop devices. However, desktop systems use mouse and keyboard input, while smartphones get their input from a touch-sensitive display. Keeping the standard interaction objects for AR glasses, as they are familiar to users, means finding interaction methods for these objects. AR devices typically can track head movements, and many AR devices come with a built-in eye tracker. So, the idea of using head and eye movements for interacting with AR devices suggests itself. Additionally, there is a considerable amount of research on selecting and pressing buttons with gaze, head movements, or both. However, there is not much research on more complex interaction tasks.

In this paper, we investigate interaction possibilities for selection in scroll lists. We designed three different methods to interact with scroll lists using eye and head movements and conducted a user study (N = 25) in which we compared these methods against each other and additionally against interaction with the hand, which is the default interaction method provided by the device manufacturer.
Our observations and insights are that lifting the arms for mid-air gestures is physically demanding, while interaction with the eyes demands cognitive effort. Users prefer head movements over arm movements, as these are less physically demanding, and wish for only very few gaze interactions, as these create cognitive load. Cognitive load influences human motor performance [29], but humans are well-trained to perform controlled arm, hand, and finger movements in the presence of cognitive load. For controlled eye movements in the presence of cognitive load, however, especially eye movements for interaction, most individuals are not well-trained. The big question for the future of gaze interaction is whether training reduces cognitive load and makes controlled eye movements less dependent on other cognitive loads and stress.
2 RELATED WORK

The first eye-tracking devices for interaction date back to the early 1980s. These systems provided eye-typing applications for disabled people. In 1981, Bolt presented a vision of using gaze for interaction [2]. Jacob did the first systematic research on how to use eye trackers for interacting with graphical user interfaces in 1990 [12].

Despite four decades of research, there has been no eye-tracking application in the wild other than eye typing. However, there is new hope that eye tracking will become an interaction technology for the masses with the introduction of AR and VR glasses. In contrast to public gaze-aware displays, a one-time calibration is no obstacle for a personal device. Building eye trackers into glasses also alleviates problems with outdoor usage caused by changing environmental light conditions. Eye-tracking devices have become better and cheaper in the last few years, and many hardware manufacturers have equipped their AR and VR glasses with eye trackers, such as the HoloLens 2¹ and the HTC Vive Pro Eye². However, the standard interaction with these devices works with controllers or hand gestures.

Apple announced the Vision Pro³ together with the operating system visionOS, which will be controllable with eyes, hands, and voice. The gaze addresses the interaction object, and a pinch gesture with the fingers makes the selection [25]. The hands can rest at the side or in the lap and do not have to be in mid-air, which avoids the gorilla arm. We are very curious whether Apple's product will make eye tracking a standard interaction technique, similar to the touch gestures introduced with the iPhone. However, Apple's interface is not hands-free and does not cover advanced interaction like selecting from scroll lists.

There has been research on eye tracking in VR since at least the beginning of this millennium [7]. Since then, many publications on gaze interaction with AR and VR glasses have appeared, e.g., [9, 13, 17, 24, 26]. Some research expresses positive expectations with statements "that eye tracking will soon become an integral part of many, perhaps most, HMD systems" [1], while other research on heads-up computing [28] mentions gaze interaction only in the related work.

Using gaze for interaction with scroll lists, as in our study, is not the only solution. Besides standard interaction with scroll lists via pointing devices or directly with the hand, it is also possible to utilize novel devices such as a wristband [8]. The question of whether gaze interaction will be established as a standard interaction method and which second input modality will be used in combination - hand and finger gestures, head movements, or controllers like finger rings or wristbands - is still open. If gaze interaction turns out to be problematic, "fallback modalities could be leveraged to ensure stable interaction" [31], as Sidenmark et al. proposed.

The basic publication for the eye-dwell method researched in our study is from Jacob [12]. Majaranta et al. [19] researched feedback for dwell-time-based eye typing. Isomoto et al. [11] focus on dwell selection in AR and VR.


Other research deals with dynamic and adjustable dwell times [18, 21]. There is research on scrolling in gaze-based interfaces by Kumar and Winograd [15] and on auto-scrolling when reading text by Sharmin et al. [30]. Sharmin et al. [30] also point to three US patents on the topic in their references. An important paper on the head-gesture method is "Eye-Based Head Gestures" by Mardanbegi et al. [20] from 2012, who used the vestibulo-ocular reflex to separate natural head movements from intended head movements for interaction. The idea of using the vestibulo-ocular reflex, however, was already presented in 2003 by Nonaka [22]. Also, Špakov and Majaranta [33] and Nukarinen et al. [23] presented interaction methods based on gaze and head movements, however, without mentioning the vestibulo-ocular reflex. The head-gaze offset method for scrolling is our idea, but the selection method with head-gaze offset was also inspired by Sidenmark et al. [32]. As smart glasses should provide a pedestrian navigation system and our future research also aims for interaction with maps, it is worth mentioning the research on interacting with maps on optical head-mounted displays by Rudi et al. [27] and Liao et al. [16].

¹ https://www.microsoft.com/en-us/hololens/hardware
² https://www.vive.com/us/support/vive-pro-eye/
³ https://www.apple.com/apple-vision-pro/
3 INTERACTION DESIGN FOR SCROLL LISTS

AR and VR devices allow for free movement of the user, and the first design decision is the placement of the list box to interact with. As the interface should work in any place, it should not be world-stable. As we need the head orientation relative to the interaction object, a head-stable display would not work either. For adequate interaction, we need a body-stable projection of the interface. In our implementation, we realized this by taking the head position and ignoring the head orientation.

According to the principles of VR interaction by Bowman et al. [3], our user study task consists of a manipulation of the list to bring the desired list item into view, shown on the left side of Figures 1a to 2b, and a subsequent selection, depicted on the right side of the figures. Several interaction methods can be used for the two sub-tasks, and our design decision was to use interaction methods of the same type for both sub-tasks to get a consistent interface. We used interaction methods from the literature, dwell-time [12] and head gestures [20], and, as a novel method, the offset between head and gaze vector, where the selection sub-task is similar to Sidenmark's Gaze-Activated Head-Crossing [32].

Implementing the interaction methods demanded decisions on parameter values for sizes, times, speeds, and angles. The choice of these parameter values influences the results, and this should be considered when comparing the methods. We conducted a pilot study to estimate these values. The study goal, however, was not to find optimal parameter values but to find the optimal method.
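To illustrate the body-stable projection described above, the following sketch updates a panel position from the head position only and ignores the head orientation. It is a minimal sketch under our own assumptions (a per-frame update, a fixed forward direction chosen at startup); names such as BodyStablePanel are illustrative and not from the study implementation.

```python
import numpy as np

class BodyStablePanel:
    """Keeps the scroll list at a fixed offset from the user's head position
    while ignoring the head orientation (body-stable projection).
    Hypothetical helper, not the authors' implementation."""

    def __init__(self, distance_m=1.0, forward=(0.0, 0.0, 1.0)):
        # The forward direction is fixed once (e.g., at startup); it does not
        # follow head rotation, so looking around does not drag the list along.
        self.distance_m = distance_m
        self.forward = np.asarray(forward, dtype=float)
        self.forward /= np.linalg.norm(self.forward)

    def panel_position(self, head_position):
        # Only the head *position* is used; the head *orientation* is ignored.
        return np.asarray(head_position, dtype=float) + self.forward * self.distance_m
```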
3.1 Hands

One of the standard interaction methods provided by the device manufacturers is hand tracking. The system shows the tracked hand as a virtual object; see Figure 1a. The reason for implementing the Hands interaction in our study was to have a baseline for comparison with the eye-based interaction techniques. For this reason, the scroll list looked the same in all tasks, with the exception of the eye-dwell method, which had two additional buttons above and below the list. One constraint of interaction with the hand, similar to interaction with real objects, is that the interaction objects must be within arm's reach. Consequently, we displayed the list box at a virtual distance of 45 cm, while we used 1 m for the gaze interaction techniques.

Figure 1a shows the scroll list implementation, which has the familiar design of a vertical list with a scroll bar slider at the right side. Putting the finger into the slider allows for scrolling the list, which moves opposite to the finger movement. Putting the finger into a list entry selects the entry at the moment of pulling the finger out.
(a) Hands-only interaction method: The left side shows the interaction with the scroll bar with a hand; the right side shows the selection of a list item.
(b) Gaze-only dwell-time method: The left side shows the dwell-time button to scroll; the right side shows the selection of a list item with a progress bar indicating the dwell time. The light blue color indicates gaze feedback.

Fig. 1. Screenshots of the hands-only interaction method (a) and the gaze-only dwell-time interaction (b).
3.2 Eye-Dwell

The standard gaze-only interaction method, typically used as an accessibility option, is dwell time. The user has to look at the interaction element for a certain time, the dwell time. There are other options for gaze-only scrolling, for example, auto-scrolling as presented by Sharmin et al. [30], which would also be worth studying. However, we wanted to provide a simple and easy method for those participants who might struggle with the other methods.

For the Eye-Dwell interface, we placed a button at the top of the list for scrolling down and another button at the bottom of the list for scrolling up. While the user looks at one of these buttons, the list scrolls down or up, respectively. To select a list entry, the user has to look at the list item for a longer time. The list item provides feedback with a growing bar indicating the time already elapsed; see Figure 1b.

The height of both the dwell button and the list entry was exactly 2°, which is sufficient to avoid problems with the eye tracker accuracy. The optimal dwell time depends on the user's experience and can be as low as a few hundred milliseconds. However, as we did not expect experienced users in our study, we set the dwell time to 2 seconds. Due to the problems with dynamic scrolling discussed later in Section 3.5, we used a static scroll speed of 6.4°/s.
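To make the dwell logic concrete, here is a minimal sketch of a dwell-time selection loop using the 2-second dwell reported above; the frame-based update API and the names DwellSelector and gaze_target are our own illustration, not the study code.

```python
DWELL_TIME_S = 2.0  # dwell time used in the study

class DwellSelector:
    """Accumulates gaze time on the currently fixated element and fires a
    selection once the dwell threshold is reached (illustrative sketch)."""

    def __init__(self, dwell_time_s=DWELL_TIME_S):
        self.dwell_time_s = dwell_time_s
        self.current_target = None
        self.elapsed_s = 0.0
        self.progress = 0.0  # drives the growing feedback bar (Figure 1b)

    def update(self, gaze_target, dt_s):
        """Call once per frame with the element under the gaze (or None).
        Returns the selected element, or None if no selection happened."""
        if gaze_target is not self.current_target:
            # Gaze moved to another element: restart the dwell timer.
            self.current_target = gaze_target
            self.elapsed_s = 0.0
            self.progress = 0.0
            return None
        if gaze_target is None:
            return None
        self.elapsed_s += dt_s
        self.progress = min(self.elapsed_s / self.dwell_time_s, 1.0)
        if self.progress >= 1.0:
            self.elapsed_s = 0.0
            self.progress = 0.0
            return gaze_target  # selection event
        return None
```

The same timer structure could drive the scroll buttons, where looking at a button moves the list at the static 6.4°/s instead of triggering a selection.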
3.3 Head-Gaze Offset

For scrolling up with the Head-Gaze-Offset method, the head direction has to be at least 7° above the middle of the list, while the gaze has to be below the head direction. The scrolling speed depends on the angle between the head and gaze direction: the bigger the angle, the quicker the scrolling, within a range from 4.8°/s to 12.8°/s. In contrast to the dwell-time method, the eyes stay on the list content and can recognize when the item to select comes into the field of view. Once the gaze is on the item, the eyes can follow the moving item, which decreases the angle between gaze and head and reduces the scroll speed. Eventually, the list stops scrolling. When the scrolling stops, the gaze is already on the item to make the selection.
(a) The head-gaze offset interaction method: The left side shows the scrolling with the head direction above the middle of the list and the gaze below the head direction. The right side shows the selection of a list item by placing the red dot and the gaze onto the list item. The red dot shows the head direction with a 9° offset to the right.
(b) The gaze and head gesture interaction method: The left side shows the scrolling with just the head direction above the middle of the list. The right side shows the selection of a list item by looking at the list item and performing a head gesture (roll).

Fig. 2. Screenshots of the head-gaze offset interaction method (a) and the gaze and head gesture interaction method (b).
The red dot represents the head direction with a 9° offset to the left (see Figure 2a). For the selection, the red dot has to be brought onto the selected item while the gaze stays on the item. This means that the head has to turn 9° to the right. We expect that displaying the red dot becomes unnecessary after some practice with this method.
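The following sketch illustrates how the head-gaze offset could drive the scroll speed with the thresholds and speed range given above (7° activation, 4.8-12.8°/s). The function and parameter names are ours, and the exact mapping used in the study implementation may differ; in particular, max_offset_deg is an assumed value that the paper does not report.

```python
def head_gaze_scroll_speed(head_pitch_deg, gaze_pitch_deg,
                           activation_deg=7.0,
                           min_speed=4.8, max_speed=12.8,
                           max_offset_deg=20.0):
    """Map the vertical head-gaze offset to a scroll speed in degrees/second.

    head_pitch_deg and gaze_pitch_deg are angles relative to the middle of the
    list (positive = above). Scrolling is active only when the head is at least
    `activation_deg` above the list center while the gaze is below the head
    direction. The speed grows with the head-gaze offset and is clamped to the
    reported 4.8-12.8 deg/s range.
    """
    offset = head_pitch_deg - gaze_pitch_deg
    if head_pitch_deg < activation_deg or offset <= 0.0:
        return 0.0  # no scrolling
    t = min(offset / max_offset_deg, 1.0)  # assumed linear mapping
    return min_speed + t * (max_speed - min_speed)
```

As described above, once the gaze follows the approaching target item, the offset shrinks, the returned speed drops, and the list comes to a halt with the gaze already on the item.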
3.4 Head Gestures

With the head-gesture method, the list scrolls down when the head direction is at least 7° above the middle of the list and scrolls up when it is below. The eyes are not involved in scrolling.

A head gesture performed while keeping the eyes on the list item, i.e., using the vestibulo-ocular reflex, triggers the selection; see Figure 2b. We chose a roll movement for the head gesture with a minimal roll angle of 9°. Both gesture type and angle were results of our pilot study. The head gesture detection works the same way as for the gaze gestures introduced by Drewes and Schmidt [6], however, based on angles.
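A minimal sketch of the selection trigger just described, assuming the head roll angle and a gaze-on-item flag are available per frame; this simplification checks only the 9° roll threshold and does not reproduce the full angle-based gesture detector of Drewes and Schmidt [6].

```python
def detect_roll_selection(head_roll_deg, gaze_on_item,
                          baseline_roll_deg=0.0, min_roll_deg=9.0):
    """Return True when a selection head gesture is detected: the head rolls
    by at least `min_roll_deg` relative to its resting roll angle while the
    gaze stays on the list item (the vestibulo-ocular reflex keeps the eyes
    on the target during the head movement). Illustrative sketch only."""
    return gaze_on_item and abs(head_roll_deg - baseline_roll_deg) >= min_roll_deg
```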
3.5 Interface Design Decisions

Implementing the interaction methods means making decisions on many details, such as the width and height of the list. We did a pilot study with three individuals to estimate reasonable values for some of the parameters, such as scroll speed, distance of the interface, and preferred head gesture. Other parameters, such as the number of list items, we chose arbitrarily but based on plausible assumptions.

One of these details, which is worth discussing in depth, is the scroll speed. For the Hands method, the scroll speed is the speed of the hand and depends on the list length and how quickly the user moves the hand. For the other methods, the scroll speed is a value coded in the source code that eventually should be adjustable in the preferences. The choice of these values - scroll speed, dwell time, number of list items - influences the completion times for the tasks and makes comparisons between the four methods questionable. Nevertheless, we will make comparisons, but for this reason, we prefer to speak about observations.

We intended to offer a dynamic scroll behavior, as this eases selection in long lists, and we expect such behavior for a real product. For the head-gaze offset and the head gesture, there are two possibilities to realize dynamic scrolling. The first option is to increase the scroll speed over time. The other option is to use the angle between the head and gaze direction (head-gaze offset) or between the head direction and the horizon (head gesture) as a control parameter for the scroll speed.

For the eye-dwell method, the only option is to increase the scroll speed over time. However, we encountered a problem. With the head-gaze offset and the head gesture, the head direction may be above or below the list, but the gaze stays on the list to recognize whether the desired item comes into view. In contrast, the eye-dwell method requires that the gaze is on the button above or below the list, and for this reason, the eye cannot see whether the desired item has already appeared. In consequence, the user has to look at the list from time to time, and this would reset the dynamic scroll behavior. We decided not to implement dynamic scroll for the eye-dwell method. Additionally, we decided to use a slow scrolling speed of 6.4°/s and a long dwell time of two seconds.
4 USER STUDY

We used a HoloLens 2 for the study. The study design followed common standards with a training phase, a questionnaire for demographic data, tasks for each interaction method in randomized order according to a Latin square, a questionnaire after every task, and a final questionnaire with questions on how the four interaction methods compare to each other.
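For illustration, a balanced Latin square for the four conditions can be generated as follows; this is a generic counterbalancing construction, not the authors' tooling, and it applies to any even number of conditions.

```python
def balanced_latin_square(conditions):
    """Generate a balanced Latin square for an even number of conditions:
    each condition appears once per position, and each condition follows
    every other condition equally often across rows."""
    n = len(conditions)
    rows = []
    for i in range(n):
        row, j, k = [], 0, 0
        for col in range(n):
            if col % 2 == 0:
                val = (i + j) % n
                j += 1
            else:
                k += 1
                val = (i + n - k) % n
            row.append(conditions[val])
        rows.append(row)
    return rows

# Example: one row per participant group, cycled over 25 participants.
orders = balanced_latin_square(["Hands", "Eye-Dwell", "Head-Gaze Offset", "Head Gesture"])
```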
4.1 Procedure

After we informed the participants about the study, they signed a consent agreement and filled out the demographic questionnaire. Next, participants familiarized themselves with the device and went through the eye tracking calibration, and then they entered a training phase for all four interaction methods. When the participants confirmed that they understood the interaction methods, the training phase ended, and the main part of the study started. Here, we asked participants to perform five selections for each condition.

For each selection, we presented a scroll list with 50 alphabetically sorted entries, of which eight were visible at a time. To avoid learning effects, we used a different set of list items than the one in the training. The first two list items to select were randomly chosen. The next three list items were from the list positions top (within the first 10 entries), middle (10 entries around the 25th entry), and bottom (within the last 10 entries) in random order. The start position of the list for the first task was the first item at the top. The start position for the subsequent tasks was the position from the end of the previous task.

After each condition, we asked users to fill in the raw NASA Task Load Index (NASA-TLX) [10] and the System Usability Scale (SUS) [4].
4.2 Participants

We conducted a within-group user study with 25 people. The ages ranged from 23 to 60 years, with an average of 37 years and a standard deviation of 11.8 years. The gender distribution was 60% male and 40% female.
5 OBSERVATIONS AND INSIGHTS

The results of the study depend on the design decisions for parameter values. This makes it questionable whether it is legitimate to compare the results. For example, a shorter dwell time would make the completion time for the gaze-dwell method shorter. Consequently, it would be possible to tweak the results towards a desired outcome. For this reason, we prefer to speak about observations. Nevertheless, we did significance tests, as this is common scientific practice. Despite the training, we excluded the first two trials per condition to ensure that participants understood the interaction techniques and that no onset training effects were present.

Fig. 3. Average task completion time for the four interaction methods with the post hoc p-values indicated. The long task completion time for the Eye-Dwell method is a consequence of the slow scroll speed chosen for this method. Error bars represent standard error.
5.1 Task Completion Time

The time to successful selection of the item is the task completion time (TCT), measured in seconds. First, we confirmed that the TCT was not normally distributed using a Shapiro-Wilk test (W = .856, p < .001). Consequently, we performed a Friedman test comparing the four methods. The results showed significant differences (χ² = 34.295, p < .001). All significant post hoc comparisons using Wilcoxon signed-rank tests with Bonferroni correction applied are indicated in Figure 3. The fastest interaction method was the Head Gesture method, which was even quicker than the Hands selection; however, this difference is not significant. The head-gesture method was also the one with the fewest incorrect selections. The slowest interaction was the Eye-Dwell method. The reason was the slow and non-dynamic scroll speed and the long dwell time for the selection, which was an intended design decision to provide an easy interface method for participants who might be overstrained by the other interaction methods. A higher scroll speed for gaze-dwell would reduce the execution time to a value similar to that of the other methods.
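A sketch of the analysis pipeline described here (Shapiro-Wilk, Friedman, pairwise Wilcoxon with Bonferroni correction), assuming the TCTs are arranged as one list per method with participants in the same order; it mirrors the reported procedure but is not the authors' analysis script.

```python
from itertools import combinations
from scipy import stats

def analyze_tct(tct_by_method):
    """tct_by_method: dict mapping method name -> list of per-participant TCTs
    (same participant order in every list)."""
    # Normality check (the paper reports W = .856, p < .001).
    all_values = [v for values in tct_by_method.values() for v in values]
    w, p_normal = stats.shapiro(all_values)

    # Friedman test across the four within-subject conditions.
    chi2, p_friedman = stats.friedmanchisquare(*tct_by_method.values())

    # Post hoc pairwise Wilcoxon signed-rank tests, Bonferroni-corrected.
    pairs = list(combinations(tct_by_method.keys(), 2))
    posthoc = {}
    for a, b in pairs:
        _, p = stats.wilcoxon(tct_by_method[a], tct_by_method[b])
        posthoc[(a, b)] = min(p * len(pairs), 1.0)  # corrected p-value
    return (w, p_normal), (chi2, p_friedman), posthoc
```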
5.2 Error Rate

We counted an error if at least one wrong selection was made within a trial. In general, participants continued after a wrong selection until they selected the correct object. While this could have led to consecutive wrong selections, this prolonged interaction is already penalized by the increased TCT. As for the TCT, we performed a Friedman test comparing the four methods; it showed no significant differences (χ² = 1.637, p = .713). The high number of incorrect selections with the Head-Gaze Offset method was unexpected. Participants reported that they found this method particularly challenging and complex to use. The wrong selections with the Hands method also surprised us. Participants told us that the scroll bar was not wide enough and too close to the list items. Maybe occlusion by the hand and parallax effects are further reasons for the wrong selections.
Fig. 4. Error rate for the four methods. Error bars represent standard error.
5.3 Scroll Speed

Figure 5 shows the scrolled distance over time for the four interaction methods as measured in the study. Theoretically, the data points should lie on a curve, but with some dispersion of the values, as human performance varies.

There are predictive models for classical scroll methods [5] that estimate how much time it takes to acquire a list item. However, for the Hands method, the interaction process is complex and not fully understood. In our data for the Hands method, see Figure 5, the dispersion is high, and a functional relation is not recognizable. Although the Hands method was not the fastest interaction, it achieved the highest scroll speeds. Only for the Hands method does the scroll speed depend on the number of items in the list [5].

The data (see Figure 5) reflect the constant scroll speed for the gaze-dwell method nearly perfectly. The data points lie mostly on a straight line. The scroll speed, which is the slope of the regression line, does not depend on human abilities but was a design decision.

For the dynamic scroll methods, i.e., the head-gaze offset and the head-gesture method, with a constant increase in speed the data should lie on a parabola. However, because of the high dispersion, this is not recognizable.
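To illustrate the distinction between a constant scroll speed (straight line) and a constantly increasing speed (parabola), the sketch below fits linear and quadratic trends to scrolled-distance-over-time data; it is a generic illustration, not the analysis behind Figure 5.

```python
import numpy as np

def fit_scroll_trend(times_s, distances_deg):
    """Fit linear and quadratic trends to scrolled distance over time and
    report a goodness-of-fit (R²) for each, as a rough way to judge whether
    the data follow a line or a parabola."""
    t = np.asarray(times_s, dtype=float)
    d = np.asarray(distances_deg, dtype=float)
    results = {}
    for degree, label in [(1, "linear"), (2, "quadratic")]:
        coeffs = np.polyfit(t, d, degree)
        residuals = d - np.polyval(coeffs, t)
        ss_res = float(np.sum(residuals ** 2))
        ss_tot = float(np.sum((d - d.mean()) ** 2))
        results[label] = {"coeffs": coeffs, "r2": 1.0 - ss_res / ss_tot}
    return results
```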
5.4 User Ratings

Figure 6 shows the average SUS score [4] for the four interaction methods. We confirmed that the data is not normally distributed (W = .947, p < .001). Next, a Friedman test showed that the conditions are significantly different (χ² = 17.766, p < .001). All significant post hoc comparisons using Wilcoxon signed-rank tests with Bonferroni correction applied are indicated in Figure 6. Again, the head-gesture method got the best rating.

Figure 7 shows the average ratings in the six categories of the raw NASA-TLX questionnaire [10]. The head-gesture method has the lowest frustration and effort ratings and the best performance rating.
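For reference, the SUS score used above is computed from the ten questionnaire items with the standard scoring by Brooke [4]; the helper below is a generic implementation, not part of the study software.

```python
def sus_score(responses):
    """Compute the System Usability Scale score from ten item responses,
    each on a 1-5 scale (Brooke [4]). Odd items contribute (response - 1),
    even items contribute (5 - response); the sum is multiplied by 2.5,
    giving a score from 0 to 100."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # example: 85.0
```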
6 DISCUSSION

According to the SUS scores, none of the interaction methods is completely inoperative. The favorite interaction method, however, was the head-gesture method. Interestingly, the Head Gesture method has the smallest portion of gaze interaction of all the interaction methods using gaze. Eye-Dwell interaction is mentally demanding, while body movements are physically demanding.
Fig. 5. Scrolled distance over time with the four methods. The dashed line shows the trend line for the four methods.

Fig. 6. Results from the SUS questionnaire with the post hoc p-values indicated. The red line indicates the threshold of 68, below which systems are considered below average. Error bars represent standard error.
There is a clear hierarchy in how physically demanding body movements are. Lifting the arms for mid-air gestures is physically demanding, an effect that is well known and called the gorilla arm syndrome. Moving the head is much less demanding, and for this reason, people bend their head down to look at their smartphone display instead of lifting the arm holding the smartphone. Eye movements cause nearly no physical demand, as the eyes are constantly moving, even while we sleep. In contrast, the mental demand for intentional eye movements seems to be high. Maybe the reason for the mental demand is the novel way of interacting, which needs high concentration, and the mental demand will get lower after some time of practice. However, fewer eye movements will always be less demanding.
6.1 Multi-Modal Interaction Approach

For a multi-modal interaction method combining body and eye movements, there is a trade-off between mental and physical demands. The head-gesture method, with scrolling by head movement only and selection with a small head movement while looking at the intended item, minimizes both the physical and the mental demand. At the same time, future research has to examine the robustness of the methods against accidental selection while doing other interactions.

Fig. 7. Results from the NASA-TLX questionnaire. The red stars indicate significance at the 5% level. The Hands method has the highest physical and the lowest mental demand.
It seems that it was a good choice to use a roll movement for the head gesture. In previous studies, we used nodding (tilt) and shaking (pan), which was a source of problems. Participants tended to nod or shake their heads too vigorously, and the heavy head-mounted device got out of place and eventually spoiled the eye-tracker calibration. In this study, the participants performed the roll movement much more softly. For answering yes-no questions with a head gesture, nodding and shaking the head are more intuitive. For selecting an item, however, rolling the head may be the better option.

Selecting an item from a scroll list consists of two basic interactions: scrolling the list and selecting the item. Both basic interactions can be done with the dwell-time approach, the offset between head and gaze vector, or head gestures utilizing the vestibulo-ocular reflex. We designed our study with both interactions being of the same type for consistency. However, this is not mandatory, and combining interaction methods of different types, for example, head-gaze offset for scrolling and dwell-time for selection, needs to be explored in future research.
6.2 Effect of List Entries

Another interesting question is the influence of the number of list entries on the quantitative and qualitative results. It is a limitation of our study that we used only a fixed number of list entries. Also, dynamic scrolling behavior definitely needs further investigation. Other open questions concern selection in unsorted lists and how helpful a page-scroll mechanism would be.
6.3 Relation to Cognitive Load

Observations from our previous studies on gaze interaction suggest that gaze interaction works best when the users are in a relaxed mood, while stressed users typically perform poorly. Stress seems to degrade manual fine motor skills and also influences eye movements. Hands gesticulate unconsciously, but we are used to performing willful hand movements even under stress. In contrast, the eyes are much less used to performing willful movements, and stress degrades this ability [14]. The big question for gaze interaction is whether willful eye movements under stress are doable at a level similar to what the hands or the head can achieve, or whether gaze interaction is only feasible with relaxed users.
7 CONCLUSIONS AND FUTURE WORK

Based on our results, we see the main task for future research as finding out why gaze interaction is so out of favor with users and, especially, whether training will increase the acceptance of gaze interaction. For this, we intend to investigate the effects of long-term usage. Effects like the gorilla arm syndrome only become obvious in longer studies. The same is true for training effects. After some training with a dwell-time approach, it is possible to reduce the dwell-time period, and the interaction will be more efficient. Experienced users may not need the red dot for the gaze-offset method, resulting in less distraction. Working several days with one of the tested interaction methods may change the picture. Based on our insights, we derive a general design rule for developing hands-free interaction in AR and VR: we recommend using mostly head movements with only a little support from eye movements.
REFERENCES

[1] Isayas Berhe Adhanom, Paul MacNeilage, and Eelke Folmer. 2023. Correction to: Eye Tracking in Virtual Reality: A Broad Review of Applications and Challenges. Virtual Real. 27, 2 (2023), 1569–1570. https://doi.org/10.1007/s10055-023-00781-4
[2] Richard A. Bolt. 1981. Gaze-Orchestrated Dynamic Windows. SIGGRAPH Comput. Graph. 15, 3 (1981), 109–119. https://doi.org/10.1145/965161.806796
[3] Doug A. Bowman, Donald B. Johnson, and Larry F. Hodges. 1999. Testbed Evaluation of Virtual Environment Interaction Techniques. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology (London, United Kingdom) (VRST '99). Association for Computing Machinery, New York, NY, USA, 26–33. https://doi.org/10.1145/323663.323667
[4] John Brooke. 1995. SUS: A quick and dirty usability scale. Usability Eval. Ind. 189 (11 1995).
[5] Andy Cockburn and Carl Gutwin. 2009. A Predictive Model of Human Performance With Scrolling and Hierarchical Lists. Human–Computer Interaction 24, 3 (2009), 273–314. https://doi.org/10.1080/07370020902990402 arXiv:https://www.tandfonline.com/doi/pdf/10.1080/07370020902990402
[6] Heiko Drewes and Albrecht Schmidt. 2007. Interacting with the Computer Using Gaze Gestures. In Human-Computer Interaction – INTERACT 2007, Cécilia Baranauskas, Philippe Palanque, Julio Abascal, and Simone Diniz Junqueira Barbosa (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 475–488.
[7] Andrew T. Duchowski, Vinay Shivashankaraiah, Tim Rawls, Anand K. Gramopadhye, Brian J. Melloy, and Barbara Kanki. 2000. Binocular Eye Tracking in Virtual Reality for Inspection Training. In Proceedings of the 2000 Symposium on Eye Tracking Research & Applications (Palm Beach Gardens, Florida, USA) (ETRA '00). Association for Computing Machinery, New York, NY, USA, 89–96. https://doi.org/10.1145/355017.355031
[8] Jacqui Fashimpaur, Amy Karlson, Tanya R. Jonker, Hrvoje Benko, and Aakar Gupta. 2023. Investigating Wrist Deflection Scrolling Techniques for Extended Reality. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI '23). Association for Computing Machinery, New York, NY, USA, Article 386, 16 pages. https://doi.org/10.1145/3544548.3580870
[9] Ajoy S. Fernandes, T. Scott Murdison, and Michael J. Proulx. 2023. Leveling the Playing Field: A Comparative Reevaluation of Unmodified Eye Tracking as an Input and Interaction Modality for VR. IEEE Transactions on Visualization and Computer Graphics 29, 5 (2023), 2269–2279. https://doi.org/10.1109/TVCG.2023.3247058
[10] Sandra G. Hart. 2006. Nasa-Task Load Index (NASA-TLX); 20 Years Later. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 50, 9 (2006), 904–908. https://doi.org/10.1177/154193120605000909
[11] Toshiya Isomoto, Shota Yamanaka, and Buntarou Shizuki. 2022. Interaction Design of Dwell Selection Toward Gaze-Based AR/VR Interaction. In 2022 Symposium on Eye Tracking Research and Applications (Seattle, WA, USA) (ETRA '22). Association for Computing Machinery, New York, NY, USA, Article 39, 2 pages. https://doi.org/10.1145/3517031.3531628
[12] Robert J. K. Jacob. 1990. What You Look at is What You Get: Eye Movement-based Interaction Techniques. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Seattle, Washington, USA) (CHI '90). ACM, New York, NY, USA, 11–18. https://doi.org/10.1145/97243.97246
[13] Pekka Kallioniemi, Tuuli Keskinen, Ville Mäkelä, Jussi Karhu, Kimmo Ronkainen, Arttu Nevalainen, Jaakko Hakulinen, and Markku Turunen. 2018. Hotspot Interaction in Omnidirectional Videos Using Head-Mounted Displays. In Proceedings of the 22nd International Academic Mindtrek Conference (Tampere, Finland) (Mindtrek '18). Association for Computing Machinery, New York, NY, USA, 126–134. https://doi.org/10.1145/3275116.3275148
[14] Thomas Kosch, Mariam Hassib, Paweł W. Woźniak, Daniel Buschek, and Florian Alt. 2018. Your Eyes Tell: Leveraging Smooth Pursuit for Assessing Cognitive Workload. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI '18). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3173574.3174010
[15] Manu Kumar and Terry Winograd. 2007. Gaze-Enhanced Scrolling Techniques. In Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology (Newport, Rhode Island, USA) (UIST '07). Association for Computing Machinery, New York, NY, USA, 213–216. https://doi.org/10.1145/1294211.1294249
[16] Hua Liao, Changbo Zhang, Zhao Wendi, and Weihua Dong. 2022. Toward Gaze-Based Map Interactions: Determining the Dwell Time and Buffer Size for the Gaze-Based Selection of Map Features. ISPRS International Journal of Geo-Information 11 (02 2022), 127. https://doi.org/10.3390/ijgi11020127
[17] Mathias N. Lystbæk, Peter Rosenberg, Ken Pfeuffer, Jens Emil Grønbæk, and Hans Gellersen. 2022. Gaze-Hand Alignment: Combining Eye Gaze and Mid-Air Pointing for Interacting with Menus in Augmented Reality. Proc. ACM Hum.-Comput. Interact. 6, ETRA, Article 145 (2022), 18 pages. https://doi.org/10.1145/3530886
[18] Päivi Majaranta, Ulla-Kaija Ahola, and Oleg Špakov. 2009. Fast Gaze Typing with an Adjustable Dwell Time. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Boston, MA, USA) (CHI '09). Association for Computing Machinery, New York, NY, USA, 357–360. https://doi.org/10.1145/1518701.1518758
[19] Päivi Majaranta, Anne Aula, and Kari-Jouko Räihä. 2004. Effects of Feedback on Eye Typing with a Short Dwell Time. In Proceedings of the 2004 Symposium on Eye Tracking Research & Applications (San Antonio, Texas) (ETRA '04). Association for Computing Machinery, New York, NY, USA, 139–146. https://doi.org/10.1145/968363.968390
[20] Diako Mardanbegi, Dan Witzner Hansen, and Thomas Pederson. 2012. Eye-Based Head Gestures. In Proceedings of the Symposium on Eye Tracking Research and Applications (Santa Barbara, California) (ETRA '12). Association for Computing Machinery, New York, NY, USA, 139–146. https://doi.org/10.1145/2168556.2168578
[21] Martez E. Mott, Shane Williams, Jacob O. Wobbrock, and Meredith Ringel Morris. 2017. Improving Dwell-Based Gaze Typing with Dynamic, Cascading Dwell Times. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI '17). Association for Computing Machinery, New York, NY, USA, 2558–2570. https://doi.org/10.1145/3025453.3025517
[22] Hidetoshi Nonaka. 2003. Communication Interface with Eye-Gaze and Head Gesture Using Successive DP Matching and Fuzzy Inference. J. Intell. Inf. Syst. 21, 2 (2003), 105–112. https://doi.org/10.1023/A:1024754314969
[23] Tomi Nukarinen, Jari Kangas, Oleg Špakov, Poika Isokoski, Deepak Akkil, Jussi Rantala, and Roope Raisamo. 2016. Evaluation of HeadTurn: An Interaction Technique Using the Gaze and Head Turns. In Proceedings of the 9th Nordic Conference on Human-Computer Interaction (Gothenburg, Sweden) (NordiCHI '16). Association for Computing Machinery, New York, NY, USA, Article 43, 8 pages. https://doi.org/10.1145/2971485.2971490
[24] Ken Pfeuffer, Benedikt Mayer, Diako Mardanbegi, and Hans Gellersen. 2017. Gaze + Pinch Interaction in Virtual Reality. In Proceedings of the 5th Symposium on Spatial User Interaction (Brighton, United Kingdom) (SUI '17). Association for Computing Machinery, New York, NY, USA, 99–108. https://doi.org/10.1145/3131277.3132180
[25] Ken Pfeuffer, Jan Obernolte, Felix Dietz, Ville Mäkelä, Ludwig Sidenmark, Pavel Manakhov, Minna Pakanen, and Florian Alt. 2023. PalmGazer: Unimanual Eye-hand Menus in Augmented Reality. In Proceedings of the 2023 ACM Symposium on Spatial User Interaction (Sydney, NSW, Australia) (SUI '23). Association for Computing Machinery, New York, NY, USA, Article 10, 12 pages. https://doi.org/10.1145/3607822.3614523
[26] Thammathip Piumsomboon, Gun Lee, Robert W. Lindeman, and Mark Billinghurst. 2017. Exploring natural eye-gaze-based interaction for immersive virtual reality. In 2017 IEEE Symposium on 3D User Interfaces (3DUI). 36–39. https://doi.org/10.1109/3DUI.2017.7893315
[27] David Rudi, Ioannis Giannopoulos, Peter Kiefer, Christian Peier, and Martin Raubal. 2016. Interacting with Maps on Optical Head-Mounted Displays. In Proceedings of the 2016 Symposium on Spatial User Interaction (Tokyo, Japan) (SUI '16). Association for Computing Machinery, New York, NY, USA, 3–12. https://doi.org/10.1145/2983310.2985747
[28] Shardul Sapkota, Ashwin Ram, and Shengdong Zhao. 2021. Ubiquitous Interactions for Heads-Up Computing: Understanding Users' Preferences for Subtle Interaction Techniques in Everyday Settings. In Proceedings of the 23rd International Conference on Mobile Human-Computer Interaction (Toulouse & Virtual, France) (MobileHCI '21). Association for Computing Machinery, New York, NY, USA, Article 36, 15 pages. https://doi.org/10.1145/3447526.3472035
[29] Stoo Sepp, Steven J. Howard, Sharon Tindall-Ford, Shirley Agostinho, and Fred Paas. 2019. Cognitive Load Theory and Human Movement: Towards an Integrated Model of Working Memory. Educational Psychology Review 31, 2 (2019), 293–317. https://doi.org/10.1007/s10648-019-09461-9
[30] Selina Sharmin, Oleg Špakov, and Kari-Jouko Räihä. 2013. Reading On-Screen Text with Gaze-Based Auto-Scrolling. In Proceedings of the 2013 Conference on Eye Tracking South Africa (Cape Town, South Africa) (ETSA '13). Association for Computing Machinery, New York, NY, USA, 24–31. https://doi.org/10.1145/2509315.2509319
[31] Ludwig Sidenmark, Mark Parent, Chi-Hao Wu, Joannes Chan, Michael Glueck, Daniel Wigdor, Tovi Grossman, and Marcello Giordano. 2022. Weighted Pointer: Error-aware Gaze-based Interaction through Fallback Modalities. IEEE Transactions on Visualization and Computer Graphics 28, 11 (2022), 3585–3595. https://doi.org/10.1109/TVCG.2022.3203096
[32] Ludwig Sidenmark, Dominic Potts, Bill Bapisch, and Hans Gellersen. 2021. Radi-Eye: Hands-Free Radial Interfaces for 3D Interaction Using Gaze-Activated Head-Crossing. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI '21). Association for Computing Machinery, New York, NY, USA, Article 740, 11 pages. https://doi.org/10.1145/3411764.3445697
[33] Oleg Špakov and Päivi Majaranta. 2012. Enhanced Gaze Interaction Using Simple Head Gestures. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing (Pittsburgh, Pennsylvania) (UbiComp '12). Association for Computing Machinery, New York, NY, USA, 705–710. https://doi.org/10.1145/2370216.2370369