How to Tune Autofocals: A Comparative Study of Advanced Tuning Methods

Benedikt W. Hosp
ZVSL, Institute for Ophthalmic Research, University of Tübingen, Maria-von-Linden Straße 6, 72076 Tübingen, Germany

Yannick Sauer
ZVSL, Institute for Ophthalmic Research, University of Tübingen, Maria-von-Linden Straße 6, 72076 Tübingen, Germany
Carl Zeiss Vision International GmbH, Turnstraße 27, 73430 Aalen, Germany

Björn Severitt
ZVSL, Institute for Ophthalmic Research, University of Tübingen, Maria-von-Linden Straße 6, 72076 Tübingen, Germany

Rajat Agarwala
ZVSL, Institute for Ophthalmic Research, University of Tübingen, Maria-von-Linden Straße 6, 72076 Tübingen, Germany

Siegfried Wahl
ZVSL, Institute for Ophthalmic Research, University of Tübingen, Maria-von-Linden Straße 6, 72076 Tübingen, Germany
Carl Zeiss Vision International GmbH, Turnstraße 27, 73430 Aalen, Germany

Abstract

This study comprehensively evaluates tuning methods for autofocal glasses using virtual reality (VR), addressing the challenge of presbyopia. With aging, presbyopia diminishes the eye’s ability to focus on nearby objects, impacting the quality of life for billions. Autofocals, employing focus-tunable lenses, dynamically adjust optical power for each fixation, promising a more natural visual experience than traditional bifocal or multifocal lenses. Our research contrasts the most common tuning methods—manual, gaze-based, and vergence—within a VR setup to mimic real-world scenarios. Utilizing the XTAL VR headset equipped with eye-tracking, the study replicated autofocal scenarios, measuring performance and usability through psychophysical tasks and NASA TLX surveys. Results show varying strengths and weaknesses across methods, with gaze control excelling in precision but not necessarily comfort and manual control providing stability and predictability. The findings guide the selection of tuning methods based on task requirements and user preferences, highlighting a balance between precision and ease of use.

Introduction

The eye’s natural aging process leads to the common condition of presbyopia, a decline in the eye’s ability to focus on nearby objects [1, 2]. This widespread issue significantly affects the visual acuity and quality of life of billions of individuals as they age [3, 4]. Traditional approaches, such as bifocal or multifocal lenses, impose limitations and cannot fully replicate the eye’s natural accommodation [5, 6]. These solutions restrict gaze behavior to predefined lens regions, limiting visual convenience and field of view. Furthermore, progressive lenses, while effective in correcting vision, can introduce optical aberrations that compromise visual perception [7, 8, 9, 10, 11, 12, 13, 14]. Autofocals are gaining attention as a new spectacle solution for presbyopia [15, 16, 17, 18]. These spectacles utilize focus-tunable lenses that dynamically adjust their optical power to focus on the currently fixated object [9]. Unlike traditional spectacles with predefined lens regions, autofocals offer the potential for a visual perception that closely resembles the natural healthy accommodation of the eye. Adjusting the focus for every new fixation aims to provide a more seamless and natural visual experience. To sense the environment and the distances of the objects within it, different kinds of sensors, e.g., LiDAR (Light Detection and Ranging) or stereo cameras [15, 17], have been proposed. Other research focuses on the lens design, such as Hasan et al. using liquid membrane lenses [19, 15] or Li et al. [20] using liquid crystal-based lenses. Although autofocals are still a young research field, most work so far has concentrated on optimal hardware settings. One important aspect that has not been researched well is the user’s interaction method (or tuning method) for controlling the autofocals. The most common tuning methods are gaze, vergence, and manual [21]. The aim is to find a method that enables smooth and accurate focus control and thereby achieves high usability.
However, there has been no systematic comparison between different tuning methods regarding performance and convenience. As a consumer device, user experience is an essential part of the design decision of autofocals. To adequately compare the most common methods, it is necessary to recreate realistic scenarios involving dynamic changes in gaze distance. Virtual reality (VR) can provide several benefits to evaluating autofocals as it is easy to recreate a natural scene with high experimental control. This allows for precise measurement and analysis of the performance and convenience of autofocals, providing valuable insights for further development and optimization.
This work presents a user study that compares different tuning methods for autofocals in virtual reality regarding performance and usability ratings. To simulate appropriate scenarios, we use VisionaryVR [22], a virtual reality simulation tool for optical vision correction, which offers a novel approach to evaluating and optimizing autofocal algorithms and methods. The simulation tool realistically replicates optical aberrations, depth of field, and other factors affecting focus performance. It enables precise evaluation of visual performance and convenience by recreating dynamic gaze distance changes using psychophysical paradigms and behavioral quantifiers. To address the limitations of current autofocal tuning methods and explore their potential, the paper employs a task-based evaluation framework that recreates natural scenarios involving dynamic changes in gaze distance. We aim to comprehensively assess visual performance and convenience, simulating real-life situations where individuals with presbyopia frequently experience shifts in focus. Through experiments and data analysis, the paper aims to gain insights into the effectiveness and efficiency of these control algorithms.

Methods

The method section describes the setting of the experiment environment and the hardware and software of the data-collection procedure for reproducing the results. In the second part, we describe the data collection process and the experimental design for our user study.

VR & ET

For the experiment, we used the XTAL VR headset (VRgineers Inc., Prague, Czech Republic) together with the simulation tool to replicate autofocals. The XTAL headset offers a high-resolution 8k display (4k per eye at a 75 Hz frame rate) and a wide field of view (150° horizontal and 100° vertical), enhancing the visual realism and immersion of the VR experience. These specifications gave participants a clear and detailed view of the virtual environment. We used the headset’s integrated eye tracker to record the participants’ eye movements and gaze direction throughout the experiment. By evaluating the participants’ gaze behavior, we could assess the performance of the gaze-based control approach for autofocals. Combining the XTAL VR headset with eye-tracking data acquisition contributed to a comprehensive and immersive evaluation of the autofocal system’s performance.

Autofocal Simulation

To simulate autofocals, our VR setup incorporated a virtual tunable optical power representing the optical power of the simulated autofocal lens. The simulated optical power could be changed dynamically between 0 and 3 diopters at a rate of two diopters per second. The simulation of autofocals in VR was achieved using a simulation of spectacle lens blur, as described by Sauer et al. [23]. The calculation of defocus blur considered the distance of the objects, the optical power of the tunable lens, and the pupil size. From these factors, the strength of blur at each point in the field of view (FoV) could be determined. The blur size, or circle of confusion, was calculated for each pixel based on the depth buffer. The rendered image was then blurred using a disk kernel with a location-dependent size, replicating the defocus blur experienced by individuals with presbyopia. As a result, only objects near the distance matching the current optical power appeared sharp, mimicking the focusing behavior of autofocals.
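The per-pixel blur size described above can be illustrated with a minimal sketch. It assumes a fully presbyopic, emmetropic eye and the common small-angle approximation in which the angular blur diameter equals pupil diameter times defocus in diopters; the function names, the pupil size, and the pixels-per-degree value are hypothetical and are not taken from Sauer et al. [23].

```python
import math


def defocus_diopters(object_distance_m, lens_power_d):
    """Absolute defocus between the object vergence (1/d) and the lens power."""
    return abs(1.0 / object_distance_m - lens_power_d)


def blur_diameter_px(object_distance_m, lens_power_d, pupil_diameter_m, px_per_degree):
    """Blur-disk diameter in pixels under the small-angle approximation.

    Angular blur (radians) is approximated as pupil diameter (m) times defocus (D).
    """
    blur_rad = pupil_diameter_m * defocus_diopters(object_distance_m, lens_power_d)
    return math.degrees(blur_rad) * px_per_degree


if __name__ == "__main__":
    # Lens focused at 1 m (1 D); hypothetical 4 mm pupil and 27 px per degree.
    for distance_m in (0.3, 1.0, 6.0):
        print(distance_m, round(blur_diameter_px(distance_m, 1.0, 0.004, 27.0), 2))
```

In a renderer, the object distance would come from the depth buffer for each pixel, and the resulting diameter would set the size of the disk kernel at that location.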

Tuning methods

In this study, we evaluated two distinct tuning methods for controlling simulated tunable lenses: manual control and gaze-based control.

Manual Control Method

The manual control approach in our study required users to actively select between set focus distances. This involved choosing from three distinct power settings, each corresponding to different focal lengths: near (20 cm), intermediate (1 m), and far (6 m). For this purpose, the intuitive SteamVR Knuckles controllers [24] were utilized, allowing users to effortlessly transition between these settings using thumb movements on the joystick. This user-directed method afforded full control over the focal distance, catering to individual preferences. A key benefit of this approach was its stability, owing to the absence of input signal noise and the non-requirement of complex sensors. However, it’s important to note the difference between this simulated interaction and real-world hardware, where users typically adjust focus by physically touching the frame of the glasses, a function simplified in our setup as a button press.
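As an illustration of the preset-based control, the following sketch maps the three presets named above to optical powers and moves the simulated lens toward the selected preset at a limited rate. The two diopters per second default mirrors the tuning speed stated for the simulation; the function names and the update loop are assumptions for illustration, not the study’s implementation.

```python
import math

# Preset focus distances in meters, as listed in the text above.
PRESET_DISTANCES_M = {"near": 0.2, "intermediate": 1.0, "far": 6.0}


def preset_power(name):
    """Optical power in diopters that focuses at the preset distance."""
    return 1.0 / PRESET_DISTANCES_M[name]


def step_power(current_d, target_d, dt_s, rate_d_per_s=2.0):
    """Move the lens power toward the target, limited to rate_d_per_s."""
    max_step = rate_d_per_s * dt_s
    delta = target_d - current_d
    if abs(delta) <= max_step:
        return target_d
    return current_d + math.copysign(max_step, delta)


if __name__ == "__main__":
    power = preset_power("far")
    target = preset_power("intermediate")
    # Simulate a few frames at 75 Hz after the user selects "intermediate".
    for _ in range(5):
        power = step_power(power, target, dt_s=1.0 / 75.0)
        print(round(power, 4))
```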

Gaze-Based Control Method

The gaze-based control method in our study provided an automated system for adjusting focus, driven by the user’s gaze within a virtual reality environment. Utilizing sophisticated eye-tracking technology, this approach dynamically updated the focus distance based on where the user looked. The system determined the focus distance by identifying the intersection of the user’s gaze with virtual objects. This method was designed to simulate a potential real-world application of autofocals, where a distance sensor, such as LIDAR or a stereo camera, could be employed. Such sensors would enhance the accuracy of the system by mapping the gaze position to a precise distance distribution. This feature would significantly benefit visual performance, aligning the focus adjustments more accurately with the user’s natural gaze behavior. However, challenges such as noise, delays, and accuracy concerns in the eye-tracking system could still impact the fidelity of the focus distance estimations. This addition suggests a direction for future development in autofocal technology, enhancing its practicality and user experience.
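The intersection step can be sketched as a ray cast against the screens. This minimal example assumes that each screen is an axis-aligned rectangle facing the viewer at a fixed depth `z`; the dictionary fields and the function name are hypothetical, and a VR engine would normally provide its own ray-casting routine.

```python
import math


def gaze_focus_distance(origin, direction, screens):
    """Distance from the gaze origin to the nearest screen hit by the gaze ray.

    Returns None when the ray hits no screen.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    best = None
    for screen in screens:
        if dz == 0.0:
            continue  # ray is parallel to the screen plane
        t = (screen["z"] - oz) / dz
        if t <= 0.0:
            continue  # screen plane lies behind the viewer
        hit_x, hit_y = ox + t * dx, oy + t * dy
        inside = (abs(hit_x - screen["cx"]) <= screen["half_w"]
                  and abs(hit_y - screen["cy"]) <= screen["half_h"])
        if inside:
            distance = t * norm
            if best is None or distance < best:
                best = distance
    return best


if __name__ == "__main__":
    # Hypothetical layout: a monitor at 1 m and a TV at 6 m.
    screens = [
        {"z": 1.0, "cx": 0.0, "cy": 0.0, "half_w": 0.3, "half_h": 0.2},
        {"z": 6.0, "cx": 0.0, "cy": 0.8, "half_w": 0.8, "half_h": 0.5},
    ]
    print(gaze_focus_distance((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), screens))
```

The returned distance would then serve as the target focus distance, so any noise in the gaze direction translates directly into jumps between screens near their borders.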

Experiment

Our sample consisted of 21 emmetropic individuals (11 male, 10 female) with an average age of 31.68 years (standard deviation = 11.71). All participants were free from any known eye diseases. Of these, 9 had experience with eye-tracking technologies, and 16 had been previously exposed to virtual reality environments. The study followed ethical guidelines and received approval from the Faculty of Medicine Ethics Committee of the University of Tübingen under reference number 439/2020BO. Written consent was collected. Our experiment involved a matching task developed in VisionaryVR [22], designed to imitate daily life scenarios with dynamic eye movements. Participants were presented with stimuli on three screens at varying distances: a smartphone at 30 cm, a computer monitor at 1 m, and a television screen at 6 m. The task required participants to match a Landolt ring on one screen and a random Sloan letter on another, determining if they corresponded to the same column on a third screen (for implementation details, see [22] or Figure 1, which is taken from VisionaryVR [22]). This activity necessitated shifting focus across all three distances, mirroring real-life situations. The screen displaying the table was randomized to ensure diverse and comparable gaze shifts, preventing a fixed screen-viewing sequence. Participants tested both tuning methods in separate sessions, completing 30 tasks per method. The stimuli, distributed normally, appeared at the screen’s center or corners, incorporating complex scenarios with stimuli near the borders of different depth fields. Following each session, participants completed a NASA Task Load Index (TLX) survey in VR, providing feedback on task ease and workload. The survey dimensions included:

Figure 1: Key Elements of the Matching Task Demonstrating Focus Adjustments from VisionaryVR [22]: A) The Landolt ring, used as the initial stimulus in the task, features an opening that can face any of eight directions. B) A second screen displays one of eight Sloan letters. C) Three screens showcase stimuli at varying distances: a smartphone at 30 cm, mimicking reading distance; a computer display at 1 m for intermediate vision; and a far-distance visualization at 6 m. The task involves identifying if a Landolt ring and a Sloan letter, shown on two separate screens, appear in the same column on a third screen, which displays a table of both stimuli. This task necessitates shifting focus across all three distances. D) A virtual environment screenshot of the matching task, where defocus blur is simulated based on local depth and the adjustable focus distance of the virtual autofocal lens. Here, the focus is set to the distant screen. E) An identical scene, but with the focus adjusted to the intermediate distance, resulting in the Landolt stimulus appearing blurred and out of focus on the distant screen.

1. Mental Demand: Assessing cognitive load and mental effort required for the task. Higher scores indicate greater mental demand.
2. Physical Demand: Evaluating the physical effort needed for task execution. Higher scores suggest increased physical exertion.
3. Temporal Demand: Measuring perceived time pressure or constraints during the task. Higher scores reflect more intense time demands.
4. Performance: Gauging participants’ self-assessment of their task performance. Higher scores denote poorer self-perceived performance.
5. Effort: Assessing overall mental and physical effort invested in task completion. Higher scores point to greater perceived effort.
6. Frustration: Determining the level of annoyance or frustration experienced during the task. Higher scores indicate increased frustration.

Experiment Procedure

The overall experiment procedure consisted of the following steps.

• Familiarization: Subjects performed a few trials without the simulated blur of autofocals to become familiar with the task and the virtual environment.

• Explanation of Tuning Methods: The two different tuning methods, manual control and gaze-based control, were explained to the subjects. They were informed about the operation and capabilities of each method.

• Randomized Trials: Subjects operated both tuning methods in separate trials. The order of the tuning algorithms used in each trial was randomized to avoid any bias or order effects. Each subject completed 30 tasks for each tuning method, resulting in 60 tasks.

• Stimulus Presentation: Stimuli were presented on three different screens at different distances. The screens represented a smartphone at a reading distance of 30 cm, a computer display at 1 m, and a far-distance TV screen at 6 m. The stimuli consisted of Landolt rings and Sloan letters, standardized optotypes for visual acuity testing.

• Matching Task: The task for the subjects was to compare if the combination of the Landolt ring and Sloan letter on different screens belonged to the same column in a table displayed on a third screen. This required a focus shift between the three viewing distances and dynamic gaze changes. The screen showing the table was randomized to prevent the subjects from developing a fixed strategy for fixating the screens.

• NASA TLX Questionnaire: After each trial, a NASA TLX questionnaire [25] was presented in VR to assess the subjective task load and convenience experienced by the subjects. The NASA TLX questionnaire is used to determine subjective workload and task performance. It comprises six dimensions: Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration. Participants rate each dimension on a scale from low to high or good to poor, indicating their perceived experience in each aspect. The questionnaire provides insights into participants’ subjective workload and helps evaluate the convenience and user experience of the different tuning methods for autofocals.

The stimuli on the screens were presented at the center or the corner of the screen, with their positions randomized. This randomization incorporated complex cases where a stimulus was close to the border of two distinct depths, making the task more challenging and representative of real-world scenarios.

Evaluation

We employed several performance metrics to compare and evaluate the performance of the tuning methods. These metrics included Pearson correlation, root mean square error (RMSE), cosine similarity, mean absolute error (MAE), cross-correlation, and dynamic time warping (DTW). Each similarity metric contributed to the evaluation by assessing the similarity, agreement, or alignment between the control signal (the desired focus distance) and the ground truth (the target distance). The specific formulas or calculations associated with each metric were employed to quantify the performance of the tuning methods accurately.

Similarity Metrics

To evaluate the performance of the tuning methods, we employed several similarity metrics that captured different aspects of agreement and similarity between the control signal and the ground truth. The chosen similarity metrics were:

• Pearson correlation: The Pearson correlation coefficient measures the linear relationship between two variables, in this case, the control signal and the ground truth. A higher correlation coefficient indicates a stronger linear relationship and better agreement between the two signals.

• RMSE: The root mean square error quantifies the average magnitude of the differences between the control signal and the ground truth. A smaller RMSE value indicates a closer agreement between the two signals.

• Cosine similarity: Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space. It ranges from -1 to 1, with 1 indicating signals that point in the same direction (identical up to a positive scale factor) and -1 indicating signals that point in opposite directions. A higher cosine similarity suggests a stronger similarity or agreement between the control signal and the ground truth.

• Mean absolute error: The mean absolute error (MAE) measures the average dissimilarity between the control signal and the ground truth as the mean of the absolute differences between corresponding samples of the two signals. A smaller mean absolute error suggests a closer match between the two signals.

• Cross-correlation: Cross-correlation measures the similarity between two signals as a function of the displacement of one relative to the other. It indicates the strength and timing of their correspondence. Higher cross-correlation values indicate a stronger similarity between the control signal and the ground truth.

• Dynamic Time Warping (DTW): DTW is used to measure the similarity of two time series that may vary in time or speed. It aligns the time axis of the control signal with the ground truth, allowing for non-linear alignments. Smaller DTW values suggest a better alignment and similarity between the two signals.
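The four sample-wise metrics in the list above (Pearson correlation, RMSE, cosine similarity, and MAE) can be written out as a minimal sketch using their textbook definitions. It assumes that both signals are sampled at the same time points and have equal length; the example values are illustrative and are not study data.

```python
import math


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(x, y):
    """Root mean square error between corresponding samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mae(x, y):
    """Mean absolute error between corresponding samples."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def cosine_similarity(x, y):
    """Cosine of the angle between the two signals seen as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    control = [1.0, 1.0, 6.0, 6.0]  # illustrative focus distances in meters
    truth = [1.0, 1.0, 1.0, 6.0]
    print(pearson(control, truth), rmse(control, truth),
          mae(control, truth), cosine_similarity(control, truth))
```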

By utilizing these similarity metrics, we could quantitatively assess the performance of the tuning methods based on different criteria and perspectives, providing a comprehensive assessment of their effectiveness and accuracy.
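The two alignment-aware metrics can be sketched in the same spirit. The example below uses a mean-removed, unnormalized cross-correlation to find the lag of the control signal and the classic dynamic-programming form of DTW with an absolute-difference cost. Both are generic textbook variants chosen for illustration; the exact normalization and step pattern used in the study are not specified in the text.

```python
def best_lag(control, truth, max_lag):
    """Lag in samples (positive: control trails truth) with the highest cross-correlation."""
    mean_c = sum(control) / len(control)
    mean_t = sum(truth) / len(truth)
    best_k, best_score = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(len(truth)):
            if 0 <= t + k < len(control):
                score += (truth[t] - mean_t) * (control[t + k] - mean_c)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def dtw_distance(x, y):
    """Accumulated cost of the cheapest monotonic alignment between x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    truth = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    control = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # same pulse, two samples later
    print(best_lag(control, truth, max_lag=3))
    print(dtw_distance([1.0, 1.0, 6.0, 6.0], [1.0, 6.0, 6.0, 6.0]))
```

Because DTW may stretch either signal in time, a control signal that reaches the correct distance late is penalized far less than under RMSE or MAE, which is why the two families of metrics can rank methods differently.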

Results

Ground Truth Approximation

During the experiment, accurate control signals were obtained by tracking various objects. The position and orientation of the main camera in the VR setup provided the spatial origin of the view. Additionally, the known position and depth of specific objects allowed a ground truth depth signal to be determined precisely. Tracking the camera made it possible to calculate the distance to objects throughout the experiment. Assuming that the object of interest was always on one of the three screens (mobile, screen, or TV), the object with the shortest Euclidean distance from its spatial center to the 3D gaze point was taken to be the current object of interest. The correct depth was then determined from the distance to the center point of the viewed object. To explain the variance between the gaze-based method and the ground truth distance estimates, several aspects need to be clarified. First, the gaze-based method relies on tracking the intersection point of the user’s gaze within the virtual environment. Ideally, this intersection should lie on the object identified in the ground truth data. However, discrepancies arise due to the difficulty of accurately tracking the gaze in a dynamic virtual reality (VR) setting. The primary challenge lies in ensuring that the gaze ray intersects with the intended object. Often, due to slight misalignments or the user’s gaze behavior, the intersection occurs with an adjacent or different object, leading to a deviation from the ground truth measurements. Additionally, our method assumed the object nearest to the 3D gaze point to be the object of interest, which might not always represent the user’s actual focus, especially in a cluttered or densely populated VR scene. To mitigate these issues, future improvements could include enhanced gaze tracking algorithms and more sophisticated criteria for determining the object of interest. This would potentially align the gaze-based measurements more closely with the ground truth, thereby improving the system’s overall accuracy and reliability.
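The nearest-center rule used for the ground truth can be sketched as follows. The screen positions in the example are hypothetical placeholders rather than the actual scene layout, and the function name is an assumption for illustration.

```python
import math


def ground_truth_depth(camera_pos, gaze_point, screen_centers):
    """Pick the screen whose center is closest to the 3D gaze point.

    Returns the screen name and the camera-to-center distance used as depth.
    """
    nearest = min(screen_centers,
                  key=lambda name: math.dist(screen_centers[name], gaze_point))
    return nearest, math.dist(camera_pos, screen_centers[nearest])


if __name__ == "__main__":
    # Hypothetical layout in meters, viewer at the origin looking along +z.
    centers = {
        "mobile": (0.0, -0.1, 0.3),
        "screen": (0.0, 0.0, 1.0),
        "tv": (0.0, 0.5, 6.0),
    }
    print(ground_truth_depth((0.0, 0.0, 0.0), (0.05, 0.0, 0.9), centers))
```

Because the rule snaps every gaze sample to one of three known depths, the ground truth stays stable under small gaze errors, at the cost of misclassifying samples that fall between two screens.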

Although the ground truth signal determination does not reflect the true signal of depth perception of the subjects, it can be used to measure the performance of the control signals, as they all have their limitations. The manual condition lacks flexibility in offering a wide range of focus distances, while the gaze-based control approach may have accuracy limitations in estimating gaze direction. Additionally, the eye vergence angle alone is insufficient for accurate depth estimation due to several factors, such as ambiguity in vergence changes, differences between accommodation and vergence, individual variability, and limited depth range. Using the raw gaze signal directly to determine the ground truth is not the preferred solution in this study due to the inherent accuracy error of the eye tracker. The accuracy of eye-tracking devices can vary, and relying solely on the raw gaze signal may introduce inaccuracies in the ground truth determination. Instead, the study aims to estimate the best method for autofocal tuning despite the error sources and limitations present in each method, including the accuracy error of the eye tracker used to capture the gaze signal. By considering each method’s accuracy limitations and potential errors, the study seeks to identify the most effective and reliable approach for autofocal tuning.

Uncertainty avoidance

Ensuring an adequate distance between the objects of interest is crucial to prevent potential interference and misclassification caused by uncertainties in the gaze signal. When objects are too close together, inaccuracies or noise in the eye-tracking data may lead to false classifications of the current object of interest. This misclassification can introduce errors in the ground truth determination and, subsequently, impact the evaluation and comparison of different tuning methods for autofocals. By maintaining sufficient spacing between the objects of interest, the study aims to reduce the likelihood of misclassification and improve the accuracy of the ground truth signal. This approach ensures a more reliable comparison between tuning methods, as the ground truth accurately reflects the intended focus distance without significant distortions from uncertainties in the gaze signal. The appropriate distance between objects of interest is critical in obtaining a robust and accurate ground truth signal. It helps minimize the influence of uncertainties in the gaze signal, ensuring a fair evaluation of different autofocal tuning methods and yielding more meaningful and reliable results.

These errors are consistent across all tuning methods, ensuring a fair and meaningful comparison. However, the ground truth computation’s potential errors and limitations can affect similarity measurements in the study. Assumptions about the object of interest and inaccuracies in eye-tracking measurements can introduce deviations from perfect similarity measurements. The simulation’s limitations in replicating real-world conditions can also impact the accuracy of the ground truth signal and similarity measurements. It is important to consider these limitations when interpreting the similarity measurements and assessing the performance and accuracy of the tuning methods.

Performance of Tuning Methods

In our research, we have chosen to use direct gaze tracking as our primary method for assessing gaze depth, as it offers superior reliability compared to relying solely on the eye vergence angle [26, 27]. While previous studies may have used the eye vergence angle as an indirect measure to approximate gaze depth, we believe that direct gaze tracking provides a more accurate and direct measurement of the person’s focus point. However, to ensure a comprehensive assessment, we will still incorporate the evaluation of the vergence signal in our study. This combination of methods will allow us to understand gaze behavior better and provide more robust findings for our research objectives.

The performance of different tuning methods, namely manual control, gaze control, and vergence control, was evaluated within the simulation environment. The analysis considered various factors, including visual performance, response speed, and subjective convenience measures. Here are the findings and a discussion of the performance of each tuning method:

Visual Performance

Quantitative results based on the descriptive statistics of the differences between the control and ground truth signals provide insights into visual performance. The manual control signal exhibited the smallest mean and median differences in magnitude, as well as the smallest spread, compared to the gaze and vergence control signals. It also displayed a less skewed and less heavy-tailed distribution. These results suggest that the manual control method aligns more closely with the ground truth signal, indicating better visual performance. The gaze control signal showed larger deviations, skewness, and kurtosis than the manual control signal. While it still performed better than the vergence control signal, these findings suggest that the gaze control method introduces some deviations from the ground truth signal. The vergence control signal demonstrated the largest deviations, skewness, and kurtosis among the three methods. This indicates a less accurate and less consistent alignment with the ground truth signal, resulting in poorer visual performance than the other two methods.

Method	Mean	Median	Std Dev	Skewness	Kurtosis
Manual	0.162	0.043	0.375	1.472	10.291
Gaze	-0.917	-0.373	-0.629	3.170	21.360
Vergence	-3.658	-4.304	8.250	24.146	852.334

Table 1: Error between control and ground truth signal in meters.

The mean difference represents the average deviation between the control and ground truth signals. In this analysis, the vergence control signal exhibits the largest mean difference in magnitude (-3.658), followed by the gaze control signal (-0.917) and the manual control signal (0.162), as can be seen in Table 1. A mean difference closer to zero signifies a closer alignment with the ground truth signal, so the manual control signal deviates only slightly from the ground truth. Similarly, a median difference closer to zero indicates a better alignment. Here, the manual control signal demonstrates the smallest median difference (0.043), followed by the gaze control signal (-0.373) and the vergence control signal (-4.304). The standard deviation measures the variability or spread of the differences between the control and ground truth signals. A smaller standard deviation implies less variability and greater consistency. Among the control signals, the manual control signal displays the smallest standard deviation (0.375), followed by the gaze control signal (-0.629) and the vergence control signal (8.250). Skewness is used to assess the asymmetry of the distribution of differences. Positive skewness suggests a longer tail on the right side, whereas negative skewness indicates a longer tail on the left side. All three signals are positively skewed: the vergence control signal most strongly (skewness = 24.146), followed by the gaze control signal (3.170) and the manual control signal, which shows only a slight positive skewness (1.472). Kurtosis measures the heaviness of the tails in the distribution of differences compared to a normal distribution. Higher kurtosis values signify heavier tails. The vergence control signal displays the highest kurtosis (852.334), followed by the gaze control signal (21.360) and the manual control signal (10.291). All three control signals exhibit kurtosis values higher than that of a normal distribution (which has a kurtosis of 3), suggesting the presence of outliers or extreme values in the distributions.
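For reference, the five descriptive statistics discussed above can be computed from a list of signed errors as in the following sketch. It assumes population (biased) moments and the non-excess definition of kurtosis, under which a normal distribution scores 3, in line with the text; whether the study applied bias corrections is not stated.

```python
import math
import statistics


def describe(errors):
    """Mean, median, standard deviation, skewness, and kurtosis of signed errors."""
    n = len(errors)
    mean = sum(errors) / n
    m2 = sum((e - mean) ** 2 for e in errors) / n  # variance
    m3 = sum((e - mean) ** 3 for e in errors) / n
    m4 = sum((e - mean) ** 4 for e in errors) / n
    std = math.sqrt(m2)  # non-negative by construction
    return {
        "mean": mean,
        "median": statistics.median(errors),
        "std": std,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / m2 ** 2,  # equals 3 for a normal distribution
    }


if __name__ == "__main__":
    # Illustrative errors in meters (control minus ground truth), not study data.
    print(describe([-0.2, 0.0, 0.1, 0.1, 5.0]))
```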

Response Speed

The response speed of each tuning method was measured in terms of trial duration. The analysis showed that the manual control method generally had shorter trial durations than the gaze control method, although in a few instances the gaze control method was faster. On average, however, the gaze control method had a slight advantage: the mean trial duration was slightly longer for the manual control method (8.72 seconds) than for the gaze control method (8.57 seconds). The standard deviations reveal that trial durations varied far more for the manual method (22.00 seconds) than for the gaze method (4.05 seconds). This suggests that response speed with manual control varies more widely across subjects than with gaze control. Figure 2 shows the trial durations of the manual and gaze control methods for each subject, indicating which method was faster for each subject as well as the average performance of each method.

Correct response rate

The analysis of correct response rates across all subjects reveals that the gaze control method slightly outperformed the manual control method in task performance. The gaze control method achieved a perfect score of 100% in 7 out of 21 subjects, while the manual control method achieved a perfect score in 5 out of 21 subjects. The gaze control method yielded the higher correct response rate in 10 out of 21 subjects, while the manual control method won in 7 out of 21. For four subjects, the two methods performed equally well. An overview of each subject and the condition performance can be seen in Figure 3. Based on the correct response rate, the gaze control method exhibited a higher average performance of 94.7%, compared to the manual control method’s average performance of 94.0%. This indicates that, on average, the gaze control method resulted in slightly more accurate responses and better task performance than the manual control method. However, the difference in performance between the two methods is relatively small and insignificant, with an average difference of only 0.71 percentage points. As shown in Figure 3, both control methods have similar task performance, with the gaze control method having a slight advantage in average correct response rate and lower variability. However, other factors such as ease of use, user preference, and system requirements may also play a role in selecting the most suitable control method for specific use cases.

Subjective Convenience Measures

The task load evaluation included subjective convenience measures to assess participants’ experiences with each tuning method. These measures were based on the NASA Task Load Index (TLX) [28], a widely used tool for evaluating mental workload and perceived task demands. The TLX assesses six dimensions of workload: mental demand, physical demand, temporal demand, performance, effort, and frustration. Participants were asked to rate each of these dimensions on a scale from 0 to 60, with 0 indicating "low" and 60 indicating "high." The scores for each dimension were then averaged to obtain a comprehensive evaluation of the subjective convenience of each tuning method. The manual control method received lower scores than the gaze control method in all six dimensions. This suggests that participants perceived the manual control method as requiring less mental and physical effort, imposing less time pressure, causing less frustration during task execution, and being more effective. Conversely, the gaze control method requires more mental and physical effort from participants and may induce higher frustration levels. A direct comparison of the averaged values of the individual dimensions can be seen in Figure 4.
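The averaging step can be sketched as below, assuming one rating per participant and dimension on the 0 to 60 scale. The ratings are hypothetical, the function name `tlx_summary` is an assumption, and the unweighted overall mean is one plausible aggregate rather than necessarily the one used in the study.

```python
from statistics import mean

DIMENSIONS = (
    "mental demand",
    "physical demand",
    "temporal demand",
    "performance",
    "effort",
    "frustration",
)


def tlx_summary(ratings, scale_max=60):
    """Average ratings per dimension across participants, plus an overall mean.

    `ratings` is a list with one dict per participant, mapping each
    dimension name to a rating between 0 ("low") and `scale_max` ("high").
    """
    for participant in ratings:
        for dim in DIMENSIONS:
            if not 0 <= participant[dim] <= scale_max:
                raise ValueError(f"Rating for {dim!r} is outside 0..{scale_max}.")
    per_dimension = {
        dim: mean([participant[dim] for participant in ratings])
        for dim in DIMENSIONS
    }
    overall = mean(list(per_dimension.values()))  # unweighted (assumption)
    return per_dimension, overall


# Hypothetical ratings of two participants for one method (NOT study data).
example_ratings = [
    {"mental demand": 10.0, "physical demand": 20.0, "temporal demand": 10.0,
     "performance": 10.0, "effort": 20.0, "frustration": 20.0},
    {"mental demand": 20.0, "physical demand": 10.0, "temporal demand": 10.0,
     "performance": 20.0, "effort": 20.0, "frustration": 10.0},
]

per_dimension, overall = tlx_summary(example_ratings)
print(per_dimension)
print(f"overall: {overall:.1f}")
```

Running the same function once per tuning method yields the per-dimension averages that can then be compared side by side.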

Similarity measures

The similarity measures quantify how closely each control signal follows the ground truth signal and thus how accurate each tuning method is. The results for all control signals are displayed in Figure 5 and are summarized below.

The Pearson correlation coefficient assesses the linear correlation between two signals. A value close to 1 indicates a strong positive correlation, while a value close to -1 indicates a strong negative correlation. In our study, all three control signals (manual, gaze, and vergence) exhibit weak negative correlations with the ground truth signal. Specifically, the manual control signal achieves the highest correlation coefficient (-0.016), followed by the gaze control signal (-0.028) and the vergence control signal (-0.049). These values are all close to zero, which indicates that none of the control signals captures the linear course of the ground truth signal well.

The root mean square error (RMSE) quantifies the average magnitude of the differences between the control signal and the ground truth signal. A lower RMSE value indicates a closer alignment between the two signals. In our analysis, the gaze control signal demonstrates the lowest RMSE (2.142), followed by the manual control signal (2.681) and the vergence control signal (13.215). The lower RMSE of the gaze control signal suggests a better match to the ground truth signal. RMSE gives more weight to large errors: the errors are squared before they are averaged, so larger errors affect the result disproportionately. A higher RMSE therefore indicates that the signal is less accurate or deviates more substantially from the ground truth signal, which could be due to a few very large errors or to a consistent pattern of moderate errors. RMSE does not distinguish between these scenarios, so it is often helpful to look at other metrics or to analyze error distributions to understand the differences between the two signals better.

Cosine similarity measures the cosine of the angle between two vectors and thereby assesses the similarity between the control signal and the ground truth signal. Its values range from -1 to 1, where 1 indicates perfect similarity and -1 represents complete dissimilarity. The manual control signal has the highest cosine similarity (0.483), followed by the gaze control signal (0.404) and the vergence control signal (0.209). The manual control signal thus resembles the ground truth signal most closely in this respect. However, all signals show only weak cosine similarity, so this metric is of limited use for our comparison.

The mean absolute error (MAE) measures the average distance between the data points of the control signal and the ground truth signal. A lower average distance signifies a closer alignment. The gaze control signal demonstrates the lowest MAE (273.231 cm), followed by the manual control signal (529.342 cm) and the vergence control signal (1103.577 cm). MAE treats all errors equally, regardless of their size. A higher MAE suggests that the signal consistently deviates more from the ground truth, but, unlike RMSE, it does not emphasize extremely large errors.

Cross-correlation quantifies the similarity between two signals by sliding one over the other and calculating the correlation at each step. Higher cross-correlation values indicate a more pronounced similarity. In our comparison, the manual control signal exhibits the highest cross-correlation (154281.867 cm ${}^{2}$ ), followed by the vergence control signal (44439.620 cm ${}^{2}$ ) and the gaze control signal (38962.862 cm ${}^{2}$ ). By this measure, the manual control signal and, to a lesser extent, the vergence control signal are more similar to the ground truth signal than the gaze control signal.

Dynamic time warping (DTW) quantifies the similarity between two time series by warping the time axis to align data points. Lower DTW values indicate a closer alignment. In our investigation, the gaze control signal demonstrates the lowest DTW value (28140.367 cm), followed by the vergence control signal (32185.166 cm) and the manual control signal (64675.913 cm). By this measure, the gaze control signal is aligned best with the ground truth signal.
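The six measures above can be sketched in plain Python as follows. This is an illustrative sketch and not the authors' implementation: the signals are hypothetical focus distances in cm, the cross-correlation is assumed to be the peak of the unnormalized cross-correlation over all lags, and the DTW variant is assumed to be the classic dynamic program with absolute difference as local cost.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long signals."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root mean square error: squaring emphasizes large errors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mae(x, y):
    """Mean absolute error: every error counts in proportion to its size."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def cosine_similarity(x, y):
    """Cosine of the angle between the two signals seen as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def max_cross_correlation(x, y):
    """Peak of the unnormalized cross-correlation over all lags (assumption)."""
    n, m = len(x), len(y)
    best = -math.inf
    for lag in range(-(m - 1), n):
        # Sum the products of all sample pairs that overlap at this lag.
        total = sum(x[i] * y[i - lag]
                    for i in range(max(0, lag), min(n, m + lag)))
        best = max(best, total)
    return best


def dtw_distance(x, y):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in x only
                                     cost[i][j - 1],      # step in y only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


# Hypothetical focus distances in cm (NOT the study's data): the control
# signal follows the ground truth with a delay of one sample.
truth = [50.0, 50.0, 200.0, 200.0, 50.0]
control = [50.0, 50.0, 50.0, 200.0, 200.0]

print("Pearson:", pearson(truth, control))
print("RMSE:", rmse(truth, control))
print("MAE:", mae(truth, control))
print("Cosine similarity:", cosine_similarity(truth, control))
print("Max cross-correlation:", max_cross_correlation(truth, control))
print("DTW distance:", dtw_distance(truth, control))
```

In this toy example the one-sample delay is penalized heavily by the pointwise measures (RMSE and MAE), while DTW and the peak cross-correlation tolerate it because they allow the signals to be shifted or warped in time. The unnormalized cross-correlation also grows with the amplitude of the signals, which is why it carries squared units and is harder to compare across signals with different value ranges.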

Discussion

The analysis of the autofocal system’s performance reveals notable distinctions among the control signals. Gaze control excels in precision, as shown by its low RMSE and MAE values and by the lowest dynamic time warping value, although its cosine similarity is only moderate. This suggests a close alignment with the ground truth signal. Conversely, manual control displays the highest Pearson correlation and cross-correlation, indicating a more linear and consistent relationship with the ground truth signal, even though all correlation coefficients are close to zero. Vergence control lags behind, as evidenced by the highest RMSE and MAE, the lowest cosine similarity, and a higher dynamic time warping value than gaze control.

Looking closer, gaze control is superior in RMSE and MAE, which indicates high precision, whereas manual control exhibits better correlation metrics and thus follows the overall trend of the ground truth signal more closely. The low RMSE and MAE of gaze control suggest fine-tuned accuracy, but accuracy does not necessarily equate to reduced effort, a factor that is better evaluated through subjective assessments such as the NASA TLX.

Statistics	Manual	Gaze	Vergence
Response speed	-	Marginally Faster	-
Correct response rate	Comparable	Comparable	-
Pearson Correlation	Most Prominent	-	-
RMSE	-	Most Favorable	-
Cosine Similarity	Most Prominent	-	-
MAE	-	Most Favorable	-
Cross-Correlation	Most Prominent	-	-
Dynamic Time Warping	-	Most Favorable	-
NASA TLX	Optimal	-	-

Table 2: Comparative Overview of Autofocal Glasses Tuning Methods

The manual approach, though marginally slower on average, offers desirable control and predictability. Its comparatively higher correlation and cosine similarity might contribute to an intuitive and user-friendly operation, which suggests a gradual, comfortable learning curve for users. Gaze control’s lower correlation, despite its precision, might imply minor deviations from the expected path. Conversely, manual control’s higher correlation, despite larger errors, shows adherence to general trends, albeit with variability, potentially influenced by the fixed-depth plane selection.

These insights highlight that the control method should be selected based on task demands and user preferences, weighing precision against user-friendliness. The findings also point to the limitations of vergence control, which needs improvement to perform better. The experimental setup could likewise be enhanced, particularly the gaze tracking and the calibration of the depth steps in manual control.

Practically, gaze control suits tasks needing quick, frequent focus adjustments where accuracy is critical, though it may increase cognitive load. In contrast, manual control is better for scenarios requiring steady, user-led operation, providing stability and comfort, particularly in less hurried environments. Ultimately, manual and gaze-based methods show comparable performance, but manual control appears to have an edge in subjective assessments. The preference for automated methods, despite associated frustrations, indicates a demand for such controls. However, limitations, particularly in eye tracking, often lead to user frustration. Addressing these issues through advanced algorithms could enhance the user experience and reduce frustration.

References

[1] Atchison, D. A. Accommodation and presbyopia. Ophthalmic and Physiological Optics 15, 255–272 (1995).
[2] Charman, W. N. The eye in focus: accommodation and presbyopia. Clinical and experimental optometry 91, 207–225 (2008).
[3] Frick, K. D., Joy, S. M., Wilson, D. A., Naidoo, K. S. & Holden, B. A. The global burden of potential productivity loss from uncorrected presbyopia. Ophthalmology 122, 1706–1710 (2015).
[4] Tahhan, N., Papas, E., Fricke, T. R., Frick, K. D. & Holden, B. A. Utility and uncorrected refractive error. Ophthalmology 120, 1736–1744 (2013).
[5] Lord, S. R., Dayhew, J. & Howland, A. Multifocal glasses impair edge-contrast sensitivity and depth perception and increase the risk of falls in older people. Journal of the American Geriatrics Society 50, 1760–1766 (2002).
[6] Johnson, L., Buckley, J. G., Scally, A. J. & Elliott, D. B. Multifocal spectacles increase variability in toe clearance and risk of tripping in the elderly. Investigative ophthalmology & visual science 48, 1466–1471 (2007).
[7] Charman, W. N. Developments in the correction of presbyopia i: spectacle and contact lenses. Ophthalmic and Physiological Optics 34, 8–29 (2014).
[8] Pope, D. R. Progressive addition lenses: history, design, wearer satisfaction and trends. In Vision science and its applications, NW9 (Optica Publishing Group, 2000).
[9] Agarwala, R., Dechant, M., Sauer, Y. & Wahl, S. Feasibility of eye tracking to control a prototype for presbyopia correction with focus tunable lenses. Investigative Ophthalmology & Visual Science 64, 2503–2503 (2023).
[10] Jarosz, J. et al. Pilot clinical investigation of adaptative eyeglasses for the correction of presbyopia. Investigative Ophthalmology & Visual Science 64, 2501–2501 (2023).
[11] Sheedy, J. E., Campbell, C., King-Smith, E. & Hayes, J. R. Progressive powered lenses: the minkwitz theorem. Optometry and Vision science 82, 916–922 (2005).
[12] Meister, D. J. & Fisher, S. W. Progress in the spectacle correction of presbyopia. part 2: Modern progressive lens technologies. Clinical and experimental optometry 91, 251–264 (2008).
[13] Selenow, A., Bauer, E. A., Ali, S. R., Spencer, L. W. & Ciuffreda, K. J. Assessing visual performance with progressive addition lenses. Optometry and vision science 79, 502–505 (2002).
[14] Sauer, Y. et al. Self-motion illusions from distorted optic flow in multifocal glasses. Iscience 25 (2022).
[15] Agarwala, R., Sanz, O. L., Seitz, I. P., Reichel, F. F. & Wahl, S. Evaluation of a liquid membrane-based tunable lens and a solid-state lidar camera feedback system for presbyopia. Biomedical Optics Express 13, 5849–5859 (2022).
[16] Wang, L., Cassinelli, A., Oku, H. & Ishikawa, M. A pair of diopter-adjustable eyeglasses for presbyopia correction. In Novel Optical Systems Design and Optimization XVII, vol. 9193, 369–378 (SPIE, 2014).
[17] Padmanaban, N., Konrad, R. K. & Wetzstein, G. Autofocals: Evaluating gaze-contingent eyeglasses for presbyopes. In ACM SIGGRAPH 2019 Talks, 1–2 (2019).
[18] Hasan, N. et al. Adaptive optics for autofocusing eyeglasses. In Applied Industrial Optics: Spectroscopy, Imaging and Metrology, AM3A–1 (Optica Publishing Group, 2017).
[19] Hasan, N., Banerjee, A., Kim, H. & Mastrangelo, C. H. Tunable-focus lens for adaptive eyeglasses. Optics express 25, 1221–1233 (2017).
[20] Li, G. et al. Switchable electro-optic diffractive lens with high efficiency for ophthalmic applications. Proceedings of the National Academy of Sciences 103, 6100–6104 (2006).
[21] De Smet, J. Morrow eyeware: An introduction to morrow’s autofocal tunable lens technology. In SPIE AVR21 Industry Talks II, vol. 11764, 117640K (SPIE, 2021).
[22] Mompeán, J., Aragón, J. L. & Artal, P. Portable device for presbyopia correction with optoelectronic lenses driven by pupil response. Scientific Reports 10, 20293 (2020).
[23] Sauer, Y., Wahl, S. & Habtegiorgis, S. W. Realtime blur simulation of varifocal spectacle lenses in virtual reality. In SIGGRAPH Asia 2022 Technical Communications, 1–4 (2022).
[24] Valve Corporation. SteamVR Knuckles Controllers. https://www.valvesoftware.com/en/index/controllers (2023). Accessed: 01.12.2023.
[25] Feick, M., Kleer, N., Tang, A. & Krüger, A. The virtual reality questionnaire toolkit. In Adjunct Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, 68–69 (2020).
[26] Hooge, I. T., Hessels, R. S. & Nyström, M. Do pupil-based binocular video eye trackers reliably measure vergence? Vision Research 156, 1–9, DOI: https://doi.org/10.1016/j.visres.2019.01.004 (2019).
[27] Linton, P. Does vision extract absolute distance from vergence? Attention, Perception, & Psychophysics 82, 3176–3195 (2020).
[28] Hart, S. G. & Staveland, L. E. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Advances in Psychology (1986).

Acknowledgements

The German Research Foundation (DFG) generously supported this research under SFB 1233, Robust Vision: Inference Principles and Neural Mechanisms, TP TRA, with project number 276693517.

Author contributions statement

BWH contributed to the development, data collection, analysis, writing, and hypothesis formulation. YS contributed to the development, writing, and hypothesis formulation. RA contributed to writing and hypothesis formulation. SW contributed to writing, hypothesis formulation, and project funding. The authors declare no competing interests.