Article

Adaptive Real-Time Translation Assistance Through Eye-Tracking

by Dimosthenis Minas *, Eleanna Theodosiou, Konstantinos Roumpas and Michalis Xenos
Software Quality and Human-Computer Interaction Laboratory, University of Patras, 26504 Rio, Greece
* Author to whom correspondence should be addressed.
Submission received: 31 October 2024 / Revised: 1 December 2024 / Accepted: 16 December 2024 / Published: 2 January 2025
(This article belongs to the Special Issue Machine Learning for HCI: Cases, Trends and Challenges)

Abstract

This study introduces the Eye-tracking Translation Software (ETS), a system that leverages eye-tracking data and real-time translation to enhance reading flow for non-native language users in complex, technical texts. By measuring fixation duration to detect moments of cognitive load, ETS selectively provides translations, maintaining reading flow and engagement without undermining language learning. The key technological components include a desktop eye-tracker integrated with a custom Python-based application. Through a user-centered design, ETS dynamically adapts to individual reading needs, reducing cognitive strain by offering word-level translations when needed. A study involving 53 participants assessed ETS’s impact on reading speed, fixation duration, and user experience, with findings indicating improved comprehension and reading efficiency. Results demonstrated that gaze-based adaptations significantly improved participants’ reading experience and reduced cognitive load. Participants rated ETS’s usability positively and expressed preferences for customization, such as pop-up placement and sentence-level translations. Future work will integrate AI-driven adaptations, allowing the system to adjust based on user proficiency and reading behavior. The study contributes to the growing evidence of eye-tracking’s potential in educational and professional applications, offering a flexible, personalized approach to reading assistance that balances language exposure with real-time support.

1. Introduction

In the ongoing wave of technological advancement, we envision a near future where technology seamlessly integrates into every facet of life, trying to precisely fulfill human needs. Developing robust and intelligent systems is crucial to driving this transformation, ensuring that these technologies are not only efficient and reliable but also adaptive to the complexities of human interactions. Such advancements will pave the way for a more intuitive, user-centered world where technology actively enhances daily experiences and capabilities [1,2,3]. However, the applications must be developed in a way that addresses the current and future needs of users, especially as eye-tracking technology transitions to the industry level and is utilized in real-life applications [4,5].
Eye-tracking technology, once expensive and limited to specialized fields [6,7], is now becoming more affordable and accessible, making it a viable tool in a wide range of industries. This democratization of eye-tracking, driven by advances in artificial intelligence (AI) and machine learning, enables systems to become more efficient and adaptive. AI enhances the capability of eye-tracking by predicting user intent and improving data accuracy, which in turn reduces cognitive load during interactions [8,9]. By integrating AI, eye-tracking systems can now provide real-time insights, making them more useful not only for entertainment and accessibility but also for academic and professional settings [10].
The potential of eye-tracking to impact niche industries, such as research, is increasing. Researchers, especially young researchers, often engage in cognitively demanding tasks such as reading technical and scientific texts. Eye-tracking systems can help identify moments when a user struggles, offering adaptive support, such as text translation [11,12]. Such interventions can ease mental fatigue and enhance productivity, making formerly complex tasks more manageable [13,14]. These technologies can also assist in improving human-computer interaction (HCI) through adaptive user interfaces that react to eye movements, providing a more seamless user experience.
One such activity–which is the subject of this work–is using eye-tracking to assist in reading and understanding scientific papers in foreign languages. By automatically detecting moments of cognitive load where users are stuck on specific words, ETS v1.0 (Eye-tracking Translation Software) can display pop-up translations in real time. With this approach, we aim to improve reading comprehension and minimize user frustration, helping users stay focused on their primary tasks without unnecessary interruptions.
Reading in a non-native language, particularly in specialized scientific texts, presents unique challenges for researchers [15,16]. While the use of AI translation tools offers a convenient solution for understanding foreign texts, these tools can diminish the reader’s engagement with the language, preventing the natural development of reading skills [17,18]. Conversely, relying on traditional dictionary lookups for unfamiliar words can be equally disruptive, as it interrupts the reading flow and leads to significant time loss [19]. Our system aims to bridge this gap by leveraging eye-tracking technology to provide real-time translations only when necessary, specifically when the user struggles with particular words or phrases. Through this, we aim to retain the readers’ focus on the text and continue to engage with the foreign language in a meaningful way without the need for constant manual intervention. In doing so, our system supports researchers in maintaining their language proficiency while enhancing reading efficiency, striking a balance between learning and productivity.
ETS provides immediate translation for challenging words that increase the user’s cognitive load, aiming to minimize disruptions to the reading flow. By assisting with these difficult words, we aim to reduce the amount of mental effort required from the working memory during cognitively demanding tasks, such as reading and understanding a scientific publication [20,21]. Through the integration of eye-tracking technology, the system adapts to individual reader needs, offering a personalized and dynamic reading experience. Real-time monitoring of user interactions with translated text allows for identifying areas of difficulty, which can inform future improvements. Ultimately, this research introduces an eye-tracking-based system that enhances the reading process by detecting words that frustrate the user and providing instant translation, showcasing the potential of intelligent, user-centered reading aids.
By capturing and storing eye movement data for individual users, our system lays the groundwork for future integration with machine learning algorithms [22,23]. This will enable more sophisticated, adaptive learning environments tailored to individual user needs. Currently, the system enhances reading comprehension by providing immediate translations of words that frustrate the user, demonstrating the practical application of eye-tracking in improving accessibility and personalized learning experiences. Practically, this system could be applied in educational settings to assist students with reading difficulties, ensuring they receive immediate support and thereby enhancing their learning outcomes. Additionally, it could be used in professional environments to help non-native speakers quickly comprehend complex and scientific documents, improving efficiency and reducing cognitive load.
The paper is organized as follows: Section 2 offers a literature review, presenting the previous work and the proposed solutions corresponding to text translation assistance systems. It also explores how eye movements can be used to infer user needs, drawing from prior research in psychometrics and eye-tracking studies to establish the rationale for this investigation. Section 3 details the development of our system, covering the technologies utilized, the structure of the application, and the specific processes implemented to ensure its functionality. It includes an overview of the core components, system specifications, and the step-by-step procedures required for the system to operate effectively. Section 4 encompasses the research questions and the study design, including the design and procedure, the methods applied, the experiment protocol, and the results with a detailed statistical analysis. Finally, a summary discussion will be included to interpret the findings and their implications.

2. Literature Review

The development of software applications, particularly those integrating innovative technologies such as eye-tracking, necessitates a thorough understanding of existing methodologies, tools and frameworks. Ching-I, Wu [24] stated that the utilization of eye-tracking technology in education will revolutionize teaching methods by providing real-time feedback and improving cognitive processing during learning activities. Therefore, understanding the basics of eye-tracking technology and its relation to cognition, as well as examining previous work on eye movements, will help us analyze and understand cognitive effort during post-editing and in interactive interfaces. Building on this foundation, Biedert et al. [25] presented the eyeBook, an application that uses eye-tracking data to create an interactive and entertaining reading experience. This system observes which text parts are currently being read by the user on the screen and generates appropriate effects, such as playing sounds or changing the color scheme based on the reader’s gaze behavior. Users responded favorably to the eyeBook system, noting that the integration of eye-tracking technology significantly enriched their reading experience by making it more immersive and interactive, which they found to be both engaging and intuitive. However, using eye-tracking in various domains, including translation tasks, poses challenges, particularly in connecting gaze data with cognitive processes and addressing reactivity effects in the recorded gaze data [26].
Hurskykari et al. [27] presented iDict, a gaze-assisted translation aid system that aims to assist users in reading text written in a foreign language by tracking their gaze path and providing real-time translation assistance when difficulties in word comprehension are encountered. Additionally, iDict incorporates information obtained from reading research and a language model to provide relevant translations. The system utilizes lexical and syntactic analysis of the text to identify potentially unfamiliar words, phrases and idioms. Various indicators of reading difficulties were examined, with the total fixation time on a word adjusted by its frequency in the language corpus being the primary measure. When difficulties were detected, a translation of the word appeared automatically above it on the screen, along with a dictionary definition in a separate panel. Two automatic drift correction algorithms were used to determine which line of text a new fixation should be linked to. Both algorithms employed a forward-looking box or mask from the last mapped fixation to decide if the new fixation should remain on the same line. The first algorithm, Sticky Lines, increased the mask’s vertical size as the reading progressed along a line, allowing more vertical drift with longer sequences of fixations. The second algorithm, Magnetic Lines, aimed to correct vertical shifts at the start of new lines by carrying forward the average fixation-to-line distance from the previous line and centering the mask at this height rather than the new line’s actual height. Additionally, readers could manually correct drift using arrow keys to move the current line up or down if the automatic correction failed. The effectiveness of these algorithms depended partly on the line spacing. The best results, with 1.5 line spacing and 11-point Verdana font, showed that 86% of fixations were correctly mapped to the intended lines.
Similarly, the study of Sibert et al. [28] presents a visually controlled auditory prompting system aimed at assisting individuals with reading disabilities. The system uses eye-tracking to follow the reader’s gaze and provide real-time assistance in recognizing and pronouncing words: the reading assistant is a visually activated, interactive reading program in which gaze triggers synthetic speech feedback. To evaluate the effectiveness of the proposed system, eight children participated in a pilot experiment, which indicated early signs of increased learning speed. Another system, presented by Sibert et al. [29], is KiEV, which introduces a new method for visualizing gaze and keystroke data. Detected fixations are linked to the corresponding word in the text, and the blocks of reading and typing processes are shown in parallel, with details for each word presented in word bars. By visualizing gaze patterns and writing actions, the tool helps identify areas where translators face difficulties, offering a clearer understanding of the translation process. This research emphasizes the potential of integrating eye-tracking into translation studies to optimize workflows and improve translator training.
Ho et al. [30] introduced an innovative reading interface designed to aid learners of English as a second language by utilizing eye-tracking to adaptively display machine translations, with sentence-level mapping highlighted through background colors. An experimental investigation was conducted to assess the impact of various methods on second-language readers. The results demonstrated that active translation significantly enhanced the readers’ comprehension of English texts without causing visual discomfort. Additionally, the sentence-level mapping using background colors effectively facilitated the correlation between original and translated sentences while also mitigating issues such as line skipping and difficulty resuming reading, thereby enhancing overall reading performance.
Guo and Chen [31] proposed a novel reading assistance approach that leverages eye-tracking data and text features of the gazed area during reading. This innovative method can automatically detect the user’s intent, whether it be for word translation or summarizing long sentences. Upon detecting the intent, it displays the meaning of the word or provides a summary of the sentences in the form of annotations. The pilot study results indicated that this approach achieved an average accuracy of 80.6 ± 6.3%, and the automatically generated annotations significantly improved the user’s reading efficiency and subjective experience. Similarly, Palmer and Sharif [32] describe an algorithmic method for adjusting fixation locations when reading source code. Unlike reading regular text, source code reading is non-linear and does not follow a top-to-bottom sequence. Fixations are grouped into temporal clusters, and these clusters are then repositioned to align with areas of interest (AOIs), which are defined by lines of source code and lexical tokens within those lines. The automatically adjusted output was compared with manual adjustments made by two human judges working together. The algorithm achieved an accuracy of 89% compared to the manual adjustments for line AOIs rather than token AOIs.
Our work builds upon previous advancements in gaze-assisted translation systems, particularly the iDict system developed by Hurskykari et al. [27]. iDict provided a robust solution by tracking users’ gaze paths and offering real-time translations when word comprehension difficulties arose. It utilized lexical and syntactic analysis to identify unfamiliar words and presented translations in an external panel, supplemented by dictionary definitions. While iDict demonstrated the potential of eye-tracking in assisting users with foreign language texts, its reliance on separate panels for translation led to interruptions in the natural reading flow, as users had to divert their attention away from the text. In contrast, ETS integrates dynamic, real-time word-level translations directly within the text, so the reader can stay focused on the content without diverting attention to external sources of information. Moreover, ETS translations disappear dynamically when the user moves on, enhancing reading fluidity, whereas iDict’s static tooltips can obscure parts of the text. ETS also incorporates an adaptive timing mechanism (50 ms + 10 ms per letter) for translation pop-ups, calibrated to user behavior, enabling contextual assistance. iDict, by comparison, monitors general indicators such as regressions and extended fixation times without quantifying thresholds in milliseconds, which may lack granularity. Furthermore, ETS leverages modern technologies such as Elasticsearch [33] and AI-ready data storage for future personalization, enabling dynamic adaptation to user proficiency and behavior. This technological stack also facilitates the handling of diverse document formats through real-time Optical Character Recognition (OCR) [34], surpassing iDict’s reliance on preprocessed lexical analysis.
Additionally, whereas Guo and Chen’s [31] system focuses on summarizing sentences and detecting broader reading intent, our work narrows the focus to real-time word-level translations. Instead of offering summaries or broader interpretations, we decided to concentrate specifically on detecting moments of cognitive load at the word level, providing immediate, word-specific translations. This allows us to directly support the reader’s engagement with complex texts without altering the overall structure or meaning of the content.

3. Design and Specification of ETS System

In this section, we present the technical architecture and implementation details of ETS, outlining the technology stack, methodologies, and engineering principles behind a robust and responsive application designed to provide real-time translation support for users reading text in a non-native language. The design and development of ETS were guided by a set of critical requirements to ensure that the system is not only functional but also reliable, user-centric, and effective in practical scenarios. Key design priorities included real-time data processing, precision in eye-tracking accuracy, intuitive user interface (UI) design, scalability, flexibility, and stringent data privacy. The system’s capability to process eye-tracking data in real time was achieved through the integration of high-performance computing resources and optimized algorithms, enabling seamless data processing without perceptible delays. This ensures that users receive immediate and contextually relevant translation assistance. Achieving high accuracy was essential, necessitating the use of advanced algorithms and precise calibration techniques to accurately map eye movements to specific words on the screen. The following sections delve into these aspects in detail, showcasing how the design choices underpin the system’s precise responsiveness to user behavior.

3.1. Technology Stack and System Architecture

The development of ETS involved a multi-level approach that included the utilization of various technologies and hardware devices. The UI was developed using the React library [35], known for its modular and flexible structure along with component reusability, which is essential for maintaining a scalable and maintainable codebase. React’s virtual DOM was instrumental in optimizing application performance, while JSX (JavaScript XML) enhanced the clarity and manageability of the code. The interactive elements of the UI were implemented using React in conjunction with Tailwind CSS [36,37], a utility-first framework that facilitated efficient and consistent styling across the application. The React-PDF library [38] was integrated to enable seamless interaction with PDF documents within the application, providing users with a dynamic and customizable reading experience. The system supports document uploads, enabling users to select their preferred zoom level and navigate through scrollable content in uploaded PDF files.
On the backend, the system was developed using Flask [39], a micro web framework selected for its lightweight architecture and flexibility. The flask-cors extension was employed to manage Cross-Origin Resource Sharing (CORS) [40], ensuring secure communication exclusively between the React frontend and the Flask backend. For data storage, SQLite [41], a self-contained, serverless database engine, was chosen due to its lightweight nature, providing an efficient solution for simple embedded data management. Additionally, Elasticsearch [33], a distributed search and analytics engine, was integrated to handle the recording, processing, and analysis of the large volumes of data that the eye-tracking device generates for each user during system usage. To ensure user privacy, eye-tracking data stored in Elasticsearch were fully anonymized, with unique, non-identifiable user IDs assigned to each participant.
Furthermore, the data collection and storage processes comply with GDPR regulations [42], ensuring restricted access to these data. These measures safeguard sensitive behavioral information and uphold ethical standards for data handling. These data are stored and leveraged by advanced AI algorithms to achieve eye-tracking user personalization, enabling the system to dynamically adapt and optimize user experiences based on individual gaze patterns and preferences. To leverage real-time eye-tracking data and access translation assistance, users must connect to a compatible eye-tracking device linked to their computer system, ensuring that the system responds accurately to user behavior.
To manage eye-tracking data collection and interaction with the eye-tracking device, a separate application, the ETSDVM (Eye-tracking Translation Device Management Service), was developed in Python using the low-level Tobii Pro Software Development Kit (SDK) v1.11. On request, the ETSDVM service subscribes to the eye tracker’s gaze data stream and captures eye-tracking data in real time. Python’s websockets library [43] enabled real-time communication between the ETSDVM service and the React client, allowing the application to process and display eye-tracking data instantly. Finally, the Google Translate API [44] was integrated to provide real-time translation services based on the user’s calculated frustration.
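As an illustration of the kind of processing such a service might perform before sending data to the client, the following minimal sketch converts one gaze sample into screen pixels and serializes it as JSON. It assumes the sample is a dictionary shaped like the gaze data the Tobii Pro SDK delivers when a subscription requests dictionaries (normalized per-eye gaze points plus validity flags); the function names, the message format, and the choice to average both eyes are hypothetical and not taken from the ETS implementation.

```python
import json
import math


def gaze_to_pixels(sample, screen_w, screen_h):
    """Average the valid eyes' normalized gaze points and convert to pixels.

    Returns an (x, y) tuple, or None when neither eye reports a valid point.
    """
    points = []
    for eye in ("left", "right"):
        valid = sample.get(f"{eye}_gaze_point_validity", 0)
        point = sample.get(f"{eye}_gaze_point_on_display_area")
        if valid and point is not None and not any(math.isnan(c) for c in point):
            points.append(point)
    if not points:
        return None
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    return (x * screen_w, y * screen_h)


def to_message(sample, screen_w, screen_h):
    """Serialize one gaze sample as JSON (hypothetical message format)."""
    pixel = gaze_to_pixels(sample, screen_w, screen_h)
    if pixel is None:
        return None
    return json.dumps(
        {"t": sample.get("system_time_stamp"), "x": pixel[0], "y": pixel[1]}
    )
```

Dropping samples in which neither eye is valid keeps blinks and tracking losses from reaching the fixation logic on the client side.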
For optimal system performance, the ETS application was responsible for managing UI, backend processing, data storage, and saving the user’s eye-tracking data to Elasticsearch. In contrast, the ETSDVM application focused exclusively on communication with the eye-tracking device, handling the collection and real-time transmission of eye-tracking data.
During development, several challenges arose that were not accounted for initially, necessitating additional technologies and custom algorithms. Optical Character Recognition (OCR) using Pytesseract [34,45] was employed to convert image content from PDF documents into digital text, facilitating accurate word positioning on the screen. To enhance this process, we utilized Pytesseract to create virtual boxes around each word, excluding stopwords [46], enabling precise tracking of user focus. This integration, combined with real eye-tracking data, enabled the system to determine the precise word on which the user is focusing at any given moment. Custom algorithms were developed to address scaling inconsistencies and determine the exact timing for displaying translations based on user gaze duration. This dynamic system adjusts the translation pop-up based on word length, as longer words require more time for the user to process.
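The word-box step can be sketched as follows. This is a minimal illustration, not the ETS code: it assumes the OCR result is a dictionary of parallel lists with the keys `text`, `left`, `top`, `width`, and `height`, which is the shape Pytesseract's `image_to_data` returns when dictionary output is requested. The stopword set is a small illustrative subset, and the `zoom` parameter stands in for whatever scaling the application applies.

```python
# Illustrative subset only; the paper does not list the stopwords it uses.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def word_boxes(ocr, zoom=1.0, stopwords=STOPWORDS):
    """Build scaled bounding boxes for every non-stopword OCR token."""
    boxes = []
    columns = zip(ocr["text"], ocr["left"], ocr["top"], ocr["width"], ocr["height"])
    for text, left, top, width, height in columns:
        word = text.strip()
        # Skip empty tokens (layout rows) and stopwords, ignoring punctuation.
        if not word or word.lower().strip(".,;:()") in stopwords:
            continue
        boxes.append(
            {
                "word": word,
                "x0": left * zoom,
                "y0": top * zoom,
                "x1": (left + width) * zoom,
                "y1": (top + height) * zoom,
            }
        )
    return boxes
```

Scaling the box corners by the same zoom factor as the rendered page keeps the boxes aligned with what the reader actually sees.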
For the accurate implementation of translation display timing, Equation (1) was used:
$$\text{Display Time} = 50\ \text{ms} + 10\ \text{ms} \times \text{Number of Letters} \qquad (1)$$
Equation (1) was derived following an initial pilot study involving seven users, during which we tested multiple timing configurations to evaluate and determine the optimal translation display speed for user comfort and efficiency. Equation (1) ensures that the translation begins to appear after 50 ms when the user focuses on a word, with an additional 10 ms added for each letter in the word. To achieve this, we buffered the gaze data and applied a smoothing algorithm to filter out noise and detect sustained fixations accurately. By continuously comparing the user’s gaze coordinates with the boundaries of each word on the screen, we identified the specific word the user was fixating on. Upon detecting a word fixation, we displayed a translation pop-up near the word. The translations were fetched from the Google API [47] translation service, providing real-time translation based on the user’s gaze and language preference.
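To make the timing rule concrete, the sketch below computes the delay from Equation (1) and pairs it with a simple moving-average buffer. Two points are assumptions rather than details from the paper: the number of letters is taken to be the count of alphabetic characters in the word, and the smoothing algorithm, which the text does not name, is illustrated with a moving average over a hypothetical window of five samples.

```python
from collections import deque


def display_delay_ms(word):
    """Equation (1): 50 ms base delay plus 10 ms per letter."""
    letters = sum(ch.isalpha() for ch in word)  # assumption: letters only
    return 50 + 10 * letters


class GazeSmoother:
    """Moving average over the most recent gaze points (illustrative)."""

    def __init__(self, window=5):
        self.buffer = deque(maxlen=window)

    def add(self, x, y):
        """Store a new point and return the smoothed (x, y)."""
        self.buffer.append((x, y))
        n = len(self.buffer)
        mean_x = sum(p[0] for p in self.buffer) / n
        mean_y = sum(p[1] for p in self.buffer) / n
        return (mean_x, mean_y)
```

For example, a seven-letter word such as "protein" yields 50 + 10 × 7 = 120 ms, while the 14-letter "cardiomyopathy" yields 190 ms.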
Initially, the pop-up was designed with an alternative UI concept (see Figure 1) aimed at providing seamless, immediate translation feedback to enhance the reading experience. To refine this approach, part of the pilot study was dedicated to determining the preferred translation pop-up for optimal usability and efficiency. Based on the pilot participants’ input, and as we gained experience with how the system performs best for typical users, we decided on a design that balances usability with minimal disruption. This resulted in the final UI shown in Figure 2, which more closely aligns with both user preferences and our observations for a seamless reading aid.
Finally, to maintain accuracy, the system continuously monitors the user’s eye-tracking data. Before each session, participants underwent a stringent calibration process to meet high accuracy standards: accuracy below 0.1°, and precision (standard deviation) and precision (RMS) both under 0.15°. This calibration ensured the eye tracker was finely tuned to each participant’s gaze patterns. In our study, the before-line spacing was set to 5.3 pt, the text size was 10 with Times New Roman as the font style, and the zoom level was adjusted to 330% on the Spectrum built-in monitor. These specific settings were chosen to align with the optimal accuracy calibration parameters of our eye-tracking system, ensuring high tracking reliability and minimizing the risk of misalignment during real-time translations.
The system utilizes Pytesseract to process the document and create virtual bounding boxes around each word, which are scaled and adjusted to match the determined zoom level. If at least one gaze data point falls outside of the virtual box around the word, the system resets the display time and recalculates before displaying the translation again. To enhance accuracy in triggering translations, ETS applies a calculated delay of 50 ms + 10 ms per letter, a parameter refined through pilot studies. To further support this adaptive process, we developed a dedicated connector between the user and the eye tracker that optimizes data flow and keeps latency minimal; through it, the user can connect to and disconnect from the eye-tracking device at any time (Figure 3). This logic provides a tailored and dynamic response to the user’s reading behavior, optimizing the translation experience based on the exact reading position.
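The reset behavior described above can be sketched as a small dwell-time state machine. This is an illustration under stated assumptions, not the ETS implementation: the class name, the box dictionary layout (`word`, `x0`, `y0`, `x1`, `y1`), and the decision to fire once per uninterrupted dwell are hypothetical, and letters are counted as alphabetic characters.

```python
def display_delay_ms(word):
    """50 ms base delay plus 10 ms per letter (letters = alphabetic chars)."""
    return 50 + 10 * sum(ch.isalpha() for ch in word)


class DwellTrigger:
    """Fire a translation once gaze has stayed inside a word box long enough."""

    def __init__(self, boxes):
        self.boxes = boxes
        self.current = None   # box the gaze is currently inside, if any
        self.start_ms = None  # time the gaze entered that box
        self.fired = False    # whether this dwell already triggered

    def _hit(self, x, y):
        for box in self.boxes:
            if box["x0"] <= x <= box["x1"] and box["y0"] <= y <= box["y1"]:
                return box
        return None

    def update(self, t_ms, x, y):
        """Feed one gaze point; return the word to translate, or None."""
        box = self._hit(x, y)
        if box is not self.current:
            # A point fell outside the previous box: restart the timer.
            self.current = box
            self.start_ms = t_ms if box is not None else None
            self.fired = False
            return None
        if box is None or self.fired:
            return None
        if t_ms - self.start_ms >= display_delay_ms(box["word"]):
            self.fired = True
            return box["word"]
        return None
```

A single sample outside the box is enough to restart the timer here, which mirrors the rule stated in the text; in practice, feeding smoothed rather than raw gaze points would make the trigger less sensitive to jitter.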

3.2. System Specifications and Eye-Tracking Hardware

A high-performance desktop environment is essential to handle the real-time transmission and interpretation of eye-tracking data between the ETS application and the ETSDVM service. Such an environment is needed to support the high-frequency collection, processing, and transmission of eye-tracking data, ensuring seamless application interaction.
A head-free eye-tracking setup was selected as the optimal solution for the ETS system, as it allows users to maintain natural movement without requiring a fixed position relative to the screen. This flexibility is crucial for prolonged reading sessions, where freedom of movement can reduce physical strain and improve user comfort. The system is compatible with a range of remote eye-tracking models that support this head-free capability, including the Spectrum, Nano, TX300, T60XL, X3-120, X2-60, X2-30, X60, X120, T60, and T120. For development and testing, we utilized the Spectrum 300 Hz eye tracker, which combines a high sampling rate with reliable head-free tracking. This configuration ensures accurate gaze data while accommodating natural head movement, thereby capturing authentic reading behaviors and enhancing the system’s suitability for real-world application [48,49]. The choice of the Spectrum was guided by its high accuracy, down to 0.1 degrees, which is essential for ETS to accurately determine word-level fixations and provide timely translations.

4. Study Design

In this section, we describe the study design, including the experiment procedure, the metrics used and the results. Our goal is to evaluate if ETS enhances the reader’s experience and efficiency. To evaluate that, we designed an experiment in which participants were asked to fill out a pre- and post-questionnaire, interact with our system, and finally, participate in a semi-structured interview.

4.1. Research Questions

This paper aims to investigate whether ETS can enhance the reading flow and understanding of the user during the reading process of research works. Therefore, this study aims to explore and answer the following research questions:
  • RQ1: Will the ETS system alter reading speed compared to reading without system assistance?
  • RQ2: Will the ETS system alter the fixation times for each word compared to reading without system assistance?
  • RQ3: Will the ETS system alter the number of fixations compared to reading without system assistance?
  • RQ4: Will the ETS system alter the average fixation times for each word compared to reading without system assistance?
The experiment was conducted in a university laboratory setting within a department specializing in Computer Engineering. Forty-seven participants contributed to our survey, a number that allowed us to make both qualitative and quantitative assessments of eye-tracking indicators, including the fixation count, fixation duration and average fixation duration. A depiction of the experimental laboratory setup is provided in Figure 4.

4.2. Experiment Procedure

A within-subjects design was chosen for this study due to its several methodological advantages that are particularly beneficial for assessing the effectiveness of the ETS software [50,51]. A completely randomized technique was employed, ensuring the unbiased assignment of participants to each experimental condition [52,53,54]. The laboratory setup was designed to ensure optimal conditions for eye-tracking experiments. A separate room was used to eliminate external noise, creating a distraction-free environment. Soft lighting was employed to facilitate precise calibration of the eye-tracking equipment, enhancing the accuracy of eye-tracking data collection. Additionally, the setup ensured that participants had no visual contact with the experimenter, thereby minimizing potential biases and distractions [9].
Upon arrival, participants were guided to the experiment room, where they filled out a consent form, were briefed on the study’s purpose and their rights, and were informed that they could stop participating at any time without giving a reason. Informed consent was obtained from all subjects involved in the study. They then completed a pre-experiment questionnaire collecting demographic information, including age, gender, and language proficiency, to help gather general opinions about the ETS. An overview of the calibration procedure was provided, ensuring that each participant could achieve the best possible calibration for their individual eye movements and position. To proceed with the experiment, participants were required to meet specific calibration criteria: an accuracy of less than 0.1°, precision (SD) under 0.15°, and precision (RMS) under 0.15°. These stringent calibration parameters were set to ensure the highest level of accuracy and reliability in tracking eye movements, thus enabling the collection of precise and meaningful data during the experiment.
Following calibration, the experiment commenced. Participants were asked to read a technical, medical paper for four minutes using the ETS. As most of our participants had a computer science background, a medical paper was selected to add difficulty to the comprehension of the paper. After the four-minute mark, participants stopped reading and were then asked to read for another four minutes without the ETS software. The order of conditions was randomized, ensuring that some participants started with the ETS while others began without it.
The analytical focus was on the spatial and temporal distribution of fixations. These were identified using the advanced I-VT (Identification by Velocity Threshold) algorithm, which builds on the work of [55,56] and the I-VT framework in [57]. The I-VT filtering mechanism is widely regarded as the standard for pinpointing fixations in eye-tracking data analysis. Data was collected and analyzed using specialized software.
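As a rough illustration of how velocity-threshold identification works, the sketch below groups consecutive gaze samples whose point-to-point angular velocity stays under a threshold into one fixation. It is a minimal sketch and not the specialized software used in the study: the `GazeSample` type, the function name `ivt_fixations`, and the default values (30°/s velocity threshold, 60 ms minimum fixation duration) are illustrative assumptions, and gaze positions are assumed to be already expressed in degrees of visual angle.

```python
import math
from dataclasses import dataclass


@dataclass
class GazeSample:
    t_ms: float   # timestamp in milliseconds
    x_deg: float  # horizontal gaze position in degrees of visual angle
    y_deg: float  # vertical gaze position in degrees of visual angle


def ivt_fixations(samples, velocity_threshold=30.0, min_duration_ms=60.0):
    """Return a list of (start_ms, end_ms) fixations.

    velocity_threshold (deg/s) and min_duration_ms are assumed defaults,
    not values reported in the study.
    """
    fixations = []
    start = None  # start time of the fixation currently being built
    end = None    # time of the last sample that belongs to it
    for prev, cur in zip(samples, samples[1:]):
        dt_s = (cur.t_ms - prev.t_ms) / 1000.0
        if dt_s <= 0:
            continue  # skip duplicated or out-of-order timestamps
        distance = math.hypot(cur.x_deg - prev.x_deg, cur.y_deg - prev.y_deg)
        velocity = distance / dt_s
        if velocity < velocity_threshold:
            # Slow movement: the sample pair belongs to a fixation.
            if start is None:
                start = prev.t_ms
            end = cur.t_ms
        elif start is not None:
            # Fast movement (saccade): close the current fixation.
            if end - start >= min_duration_ms:
                fixations.append((start, end))
            start = None
    # Close a fixation that is still open when the recording ends.
    if start is not None and end - start >= min_duration_ms:
        fixations.append((start, end))
    return fixations
```

Fixations shorter than the minimum duration are discarded, which is one common way to filter out noise between saccades; other implementations merge adjacent fixations instead.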
A 27-inch monitor with a 1920 × 1080 resolution was used. Participants maintained a viewing distance of 70 cm from the screen to ensure optimal visibility for the eye-tracking apparatus.
Upon completing the tasks, participants filled out a seven-point Likert scale (1: Strongly disagree to 7: Strongly agree, with 4 being a neutral statement) [58] post-experiment questionnaire to provide feedback on their experience. Questions included:
  • The system’s word translation feature enhanced my reading experience compared to my preferred conventional translation method.
  • I would recommend this system to others for reading and reading with the need for translation tasks.
  • Overall, I am satisfied with my experience using this system for reading and translation.
  • I found the system to be a valuable tool for improving my reading experience in a non-native language.
  • On a scale of 1 to 7, the system’s performance in word translation met my expectations.
After completing the post-questionnaire, a semi-structured interview was conducted to gain deeper insights into the user experience. The interview questions included:
  • Could you describe your overall experience with the eye-tracking-based translation aid system for non-native language reading comprehension? What aspects did you find the most interesting?
  • How engaged did you feel during the learning process? What factors contributed to your level of engagement?
  • Did you find the eye-tracking-based translation aid system to be an effective tool for improving your non-native language reading comprehension? Why or why not?
  • What was your favorite aspect of the eye-tracking-based translation system? How did it enhance your understanding of the text?
  • What was the most challenging aspect of using the ETS system? How did you overcome these challenges?
  • How would you compare this learning method to conventional methods of non-native language learning and comprehension? What are the main differences, and which approach do you prefer?
  • Based on your experience, do you have any suggestions for improving the eye-tracking-based translation system or the learning environment for non-native learning reading comprehension?
  • Would you recommend the use of the eye-tracking-based translation system to other non-native language learners? Why or why not?
  • When exactly would you prefer the translation pop-up to appear when you are reading the text?
This procedure provided quantitative data on reading performance and qualitative insights into user experience and system usability.
Finally, regarding eye-tracking data, the following key metrics were collected:
  • Fixation Count: The average number of fixations detected within an active area of interest (AOI).
  • Fixation Duration: The average duration of all fixations within an active AOI. A visit is defined as the period from the first fixation on the AOI to the end of the last fixation within the same AOI, provided that no fixations occur outside the AOI during this time.
  • Average Fixation Duration: The average fixation duration refers to the mean time spent by the eyes in a single fixation within a specified area of interest (AOI). It is calculated by summing the duration of all fixations within the AOI and dividing them by the number of fixations. This metric provides insights into how long a participant visually engages with specific elements within the AOI, reflecting cognitive processing and attention levels [59,60].
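The metrics listed above can be sketched as a simple aggregation over detected fixations. This is a minimal sketch under stated assumptions: the input format (pairs of AOI label and fixation duration in milliseconds) and the function name `aoi_metrics` are hypothetical, and the total duration is included because it is the quantity compared in the results.

```python
from collections import defaultdict


def aoi_metrics(fixations):
    """Aggregate fixations per AOI.

    fixations: iterable of (aoi_label, duration_ms) pairs (assumed format).
    Returns a dict mapping each AOI label to its count, total duration,
    and average duration.
    """
    durations = defaultdict(list)
    for aoi, duration_ms in fixations:
        durations[aoi].append(duration_ms)
    return {
        aoi: {
            "fixation_count": len(d),
            "total_duration_ms": sum(d),
            # Sum of fixation durations divided by the number of fixations.
            "average_duration_ms": sum(d) / len(d),
        }
        for aoi, d in durations.items()
    }
```

For example, two fixations of 200 ms and 300 ms on the same word give a fixation count of 2, a total duration of 500 ms, and an average fixation duration of 250 ms.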

4.3. Participants

We initially recruited fifty-three participants by announcing the experiment to the “Software Quality Assurance and Standards” undergraduate course. However, six participants were excluded from the final analysis due to issues encountered during the calibration process, which prevented us from retrieving their eye-tracking data with the desired level of accuracy. This resulted in a sample of forty-seven participants (M = 22.32, SD = 1.086; thirty-four identified as males, thirteen as females). Regarding age distribution, most participants belonged to the age group of 18–24 (96.2%), and the rest were in the age group 26–30, which is indicative of our target audience. Bachelor students accounted for 95.7% of the sample level of education, whereas master’s students accounted for the remaining 4.3%.

4.4. Method

Data regarding the fixations across all areas of interest (AOIs) were extracted and subjected to an in-depth statistical analysis. AOIs were set and labeled to correspond to each word that the participants might glance at throughout the document given to them. The dual use of the eye-tracker in our experiment (translating the words that the users fixate on and tracking the users’ eye movements) made it easy to correlate our data with their representation in the experiment. Simply put, words that would be translated would also serve as fixation points. Furthermore, we calculated the average reading speed of each participant using Equation (2):
$$\text{Reading Speed} = \frac{\text{Number of Words Read}}{\text{Reading Time}} \tag{2}$$
Given that the purpose of the software is to make it easier for users to read through academic papers and improve over time, measuring reading speed would help us get a solid first impression on the matter.

4.5. Results

As previously mentioned, this was a within-subject study where the order of the conditions was randomized for each participant, as shown in Table 1. We measured four variables through our system: number of words read, total duration of whole fixation, number of whole fixations, and average duration of fixations, as well as a five-item Likert scale questionnaire. Descriptive statistics of the first four appear in Table 2. Descriptive statistics regarding participants’ self-reported ratings of the system appear in Table 3. An independent sample Mann–Whitney U Test for all cases proved that whether the participants started the experiment using the system or not did not affect their performance. This behavior can be further inspected in Table 4. Consequently, we are able to analyze the data collectively, irrespective of participants’ starting conditions. The assumption of normality was evaluated using the Shapiro–Wilk test, with the results presented in Table 5.
For the cases regarding reading speed (number of words read) and average duration of fixations, we could not disprove the assumption that our data was following a normal distribution. The same cannot be said for the other two cases regarding the total duration of whole fixations and the total number of whole fixations, where the assumption for normality was ultimately disproved.

4.5.1. RQ1: System Impact on Reading Speed

An independent sample t-test on the number of words read, depending on whether the system was used or not, produced a p-value of 0.027, rejecting the null hypothesis. Comparing the group means also shows a significant increase in words read when the software is used: the mean is 181.28 words without the system and 223.81 words with it. Figure 5 illustrates the total words read by users in each condition, highlighting the overall improvement in reading performance when the software is utilized. By applying Equation (2), we can estimate the average reading speed of each case.
This result aligns with previous findings indicating that eye-tracking systems that offer real-time assistance can significantly improve reading speed and comprehension, particularly when reading non-native language and technical texts [27,31]. The faster reading speed suggests that the system helps alleviate the cognitive load by providing immediate translations, thus allowing users to maintain their reading flow without being disrupted by unfamiliar words [28]. By calculating Cohen’s d, we found a d-value of 0.59, signifying a meaningful and likely substantial difference between the two groups.
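The reading speed estimate and the effect size can be sketched as follows. This is a minimal sketch under stated assumptions: the reading time is taken to be the four-minute window of each condition, and `cohens_d` uses the pooled-standard-deviation form of Cohen's d, which is one common variant; the text does not state which variant was used. Both function names are illustrative.

```python
import math
from statistics import mean, variance


def reading_speed(words_read, reading_time_min):
    """Equation (2): words read divided by reading time (words per minute)."""
    return words_read / reading_time_min


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Mean words read per condition, as reported above; 4 minutes is assumed
# to be the reading time in both conditions.
speed_with_ets = reading_speed(223.81, 4.0)
speed_without_ets = reading_speed(181.28, 4.0)
```

Under the four-minute assumption, the reported means correspond to roughly 56 words per minute with the system and roughly 45 words per minute without it.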

4.5.2. RQ2: System Impact on Fixation Duration

The independent samples Mann–Whitney U Test on the total fixation duration, depending on whether the system was used or not, produced a p-value of <0.01, rejecting the null hypothesis that the system does not have an impact on the total duration of fixations. A closer look at the descriptives showed that the mean total duration of whole fixations was 208,700.1 ms with the system and 176,651.5 ms without it, indicating that the use of the system results in longer total fixation times compared to reading without system assistance. This may seem to contradict the findings for RQ1, but it suggests that the system’s pop-up translations may have prompted users to pause and better comprehend unfamiliar terms. Such results echo prior studies highlighting how gaze-based tools can direct cognitive attention to critical learning moments, leading to deeper processing of challenging words [25,32]. We calculated a Cohen’s d of 1.13, signifying a substantial difference between the group means relative to the variability.

4.5.3. RQ3: System Impact on Fixation Count

An independent-sample Mann–Whitney U Test on the null hypothesis of “the distribution of the number of whole fixations is the same with and without the use of the system”, produces a p-value of 0.677 and thus does not reject the null hypothesis. This implies that the reading flow of the participants was not disturbed by the use of the system.
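For readers unfamiliar with the test, the U statistic behind the Mann–Whitney U Test can be sketched by direct pair counting. This is a minimal sketch for illustration only: it computes the statistic but not the p-value, and the function name `mann_whitney_u` is hypothetical. In practice a statistics package would typically be used, for example `scipy.stats.mannwhitneyu`.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x against sample y.

    Counts the pairs (xi, yj) in which xi exceeds yj; ties count as 0.5.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u
```

The two statistics always satisfy U(x, y) + U(y, x) = len(x) × len(y), so a value near half of that product indicates that neither condition tends to produce larger values, which is consistent with a non-significant result.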

4.5.4. RQ4: System Impact on Average Fixation Duration

The null hypothesis, “The average fixation duration is the same with and without the use of the system”, is rejected since an independent samples t-test showed a p-value < 0.01. By comparing the mean values of the average fixation duration while using the system (mean = 235.1), we can see that our system significantly reduced the average fixation duration in comparison to when the users are not using the system (mean = 261.3).

4.5.5. Post-Questionnaire Likert Scale Analysis

After examining the results from the Likert scale and testing for normality (p < 0.05 for all cases), we could see a very strong internal consistency across all the questions (Cronbach’s Alpha = 0.914), which indicates that the responses were consistent throughout. We can now treat the mean values of the Likert scale results as indicative of what each question represented. Namely, the five questions had mean values of 4.68, 5.3, 5.02, 5.47, and 4.89, in order.
These means suggest a generally positive response across the items, with each question scoring close to or above the midpoint on the scale, pointing to favorable responses. The highest mean score (5.47) suggests particularly strong agreement or positive feedback on that item, indicating it may hold greater significance or resonance with participants. Conversely, while the lowest mean (4.68) still reflects a generally positive trend, it may indicate slightly less agreement or emphasis compared to other items.
Given this internal consistency and high Cronbach’s Alpha, these items likely tap into a coherent underlying construct, allowing us to interpret these average scores with confidence. Most of the respondents reported that they found the system’s word translation feature to be beneficial for their reading experience. Participants generally expressed a positive sentiment towards recommending the system to others for reading if they require assistance with translating the text. Additionally, they generally viewed the system as a valuable tool for enhancing their reading experience in a non-native language. A closer look at user satisfaction was conducted through the interviews with the participants.
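The internal consistency measure used above can be sketched with the standard formula for Cronbach's alpha, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. This is a minimal sketch: the function name `cronbach_alpha` and the input layout (one list of item scores per participant) are assumptions, and the data in the note that follows are made up for illustration, not the study's responses.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item scores.

    responses: list of rows, one row per participant, one score per item.
    """
    k = len(responses[0])                 # number of questionnaire items
    items = list(zip(*responses))         # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

When every participant gives the same score on all items, for instance `[[1, 1], [2, 2], [3, 3]]`, the items are perfectly consistent and alpha equals 1; values close to 1, such as the 0.914 reported here, indicate that the items vary together.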

4.5.6. Participant Interviews

The overall opinion and response from participants regarding the system were generally positive during the anonymous interviews and data collection phase. Users reported improved usability, user experience, and effectiveness on the reading task. Opinions diverged on certain issues, such as the timing of the translation pop-ups and their placement, as well as the preferred number of words to be translated. Below, the most prevalent opinions are presented.
Most of the users provided positive feedback regarding their experience with the eye-tracking system. Many appreciated the user interface, noting that “the system had a good and user-friendly UI”. They found that as they continued using the system, they were able to adapt to it effectively, with users reporting that “I was learning to adapt to the system’s behavior as I was using it”. Additionally, the translation pop-ups were praised for their responsiveness, with users noting that “The response of the translation pop-up was immediate”. The system also gave the users the impression that it was helping them to be more efficient with their reading, as they commented that: “It reduced the time needed to read and understand the text”. Users felt that it was easy to translate the desired words to a satisfactory degree, which contributed to a more streamlined reading experience.
With the applied settings, most users mentioned that the system translated more words than they desired, which means that the fixation threshold of 50 ms + (word length × 10 ms) made the translation trigger more sensitive than the average user needed. Some users reported difficulties with translation pop-ups obscuring the text they were reading. In our study, 16 out of 47 participants (34%) noted some minor occurrences of misalignment, while most of them (13 out of 16) mentioned that the misalignment was not constant but occurred sporadically. More importantly, all participants explained that this issue did not impair their reading comprehension. These instances were primarily linked to minor head movements outside the calibrated virtual track box.
This issue likely stems from the eye tracker’s sensitivity; moving around too much and outside the track-box of the eye tracker can lead to reduced accuracy—an issue that the participants were warned about—but is of course logical given the duration of the whole experiment. As a result, the text would sometimes appear at the top or the bottom row.
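The translation trigger mentioned above, 50 ms + (word length × 10 ms), can be sketched as a small function. This is a minimal sketch under stated assumptions: the function names, the use of a greater-or-equal comparison, and the `sensitivity` multiplier are hypothetical. The multiplier is not part of the system described here; it only illustrates how a per-user adjustment could be layered on the same formula.

```python
def translation_threshold_ms(word, base_ms=50, per_char_ms=10, sensitivity=1.0):
    """Fixation time after which a word would be translated.

    base_ms and per_char_ms follow the formula quoted in the text;
    sensitivity is a hypothetical per-user multiplier (values above 1.0
    make the pop-up appear later).
    """
    return sensitivity * (base_ms + len(word) * per_char_ms)


def should_translate(word, fixation_ms, sensitivity=1.0):
    """True when the fixation on the word reaches the threshold (assumed >=)."""
    return fixation_ms >= translation_threshold_ms(word, sensitivity=sensitivity)
```

For a ten-letter word such as "myocardium", the formula gives 50 + 10 × 10 = 150 ms, so a 200 ms fixation would trigger a translation; doubling the hypothetical `sensitivity` raises the threshold to 300 ms and the same fixation would no longer trigger one.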
Regarding interaction, a few users preferred a less intrusive pop-up system, suggesting the appearance of translations in a lower or sidebar of the screen to avoid disrupting the reading flow. One participant noted, “I would prefer the pop-up to not be intrusive in the text. Maybe it could be in a lower bar at the bottom of the screen”. The adaptability of the system to the user’s foreign language proficiency was significant.
Regarding translation accuracy, participants suggested alterations that they considered would benefit both their experience and the software’s efficiency. A participant suggested, “I would like to see the system adapting to the language level of expertise of each user so that it does not translate as many words for experienced users”. There was also a preference for translating entire sentences to prevent distraction from multiple pop-ups. As one user pointed out, “The meaning of the text changes based on the sentence context. So, I would prefer if the system translated in a phrase or sentence level instead, or even replaced the text with the Greek translation in real time”, similar to tools like Google Lens [61].
Some users noted they would prefer greater freedom of movement while using the eye-tracking system, as they were instructed not to move significantly from the initial calibration position to maintain data accuracy, given the eye-tracking device’s sensitivity. However, due to the varying data accuracy among participants, there were some deviations in the reading line, resulting in negative feedback for the system. Specifically, some users complained about the accuracy of the lines they were reading, as during some experiments, translation pop-ups did not correspond to the current word on the line but referred to the one above or below. One user mentioned: “I would like to be less stable and at a static position while reading. I would like a more flexible reading position”.
The overall positive feedback received from the participant interviews complements the Likert scale results and supports their validity. Furthermore, through the interviews, participants had the freedom to express their own views and ideas on how such software could be tailored to their needs. Ideas like adjusting the response time based on the experience of the user, making the translation text less intrusive by having it on the bottom of the screen, and translating an entire sentence instead of one word at a time could all be great scenarios for future work. On the other hand, drawbacks of the system, like having the users sit still in order to avoid errors with the system calibration, can act as valuable insight into the potential limitations of eye-tracking-based systems or an opportunity to solve those problems.

5. Discussion

The present study investigated the effectiveness of ETS in enhancing reading comprehension and engagement for users reading complex, technical texts in a non-native language. As digital learning tools evolve, adaptive technologies like ETS have the potential to bridge comprehension gaps while maintaining the natural flow of reading. Through a combination of quantitative data analysis and qualitative user feedback, this study provides insights into the cognitive impact of ETS, highlighting both the benefits and areas for improvement in real-time reading assistance. By examining how ETS influenced reading speed, fixation duration, and user satisfaction, we can assess its practical application in educational and professional settings.
The conflicting nature of the results may point to a trade-off between speed and depth of comprehension. While participants read statistically faster on average with ETS, the increased fixation duration on specific words may imply that the system encouraged thorough understanding of difficult terms rather than facilitating only quick, surface-level comprehension. This is particularly important in educational contexts, where the balance between efficiency and depth of learning is crucial [21,30]. Notably, the cognitive load was significantly higher without ETS, as users frequently needed to interrupt their reading to manually search for word translations, disrupting their focus and processing flow. ETS helped mitigate this by providing automatic translations only when cognitive strain was detected, thus minimizing manual interruptions.
Future iterations of the software could incorporate features that allow users to adjust the sensitivity of the translation pop-up timing to align with their personal learning preferences, thus optimizing for either speed or depth as needed. To address the varying sensitivity preferences observed in the study, adjustable thresholds for translation triggers could be included to enable users to customize the pop-up frequency to better match their reading pace. Additionally, further customization options, such as toggling between word and sentence-level translations, could provide users with greater control over the type of assistance offered. Currently, ETS relies on Google Translate API for real-time translations. While effective for general purposes, the quality of translations can vary based on several factors [62]. Integrating a custom language model or applying AI algorithms for better translation results could enhance users’ reading comprehension, particularly in domain-specific contexts. This enhancement would not only address participant feedback but also align with the broader goal of optimizing ETS for professional and educational applications.
Technical limitations of ETS also warrant further discussion. The requirement for users to remain relatively still for optimal eye-tracking accuracy was cited as a constraint. Head movements outside the calibrated virtual track box occasionally resulted in translation misalignment. Wearable eye trackers, while less precise than desktop trackers [63,64], could offer greater mobility and adaptability, expanding ETS’s applicability to dynamic reading scenarios.
Future research could further broaden ETS’s applicability by testing its effectiveness with a wider range of text types, from simple narratives to highly technical materials. This would provide a deeper understanding of how ETS performs in varying cognitive and linguistic contexts and extend its utility beyond the current focus on technical and professional domains. By including multiple text difficulties and user proficiency levels, ETS could better cater to diverse user needs and demonstrate adaptability across different reading scenarios. Another crucial part for future research would be the integration of AI techniques to analyze reading patterns and dynamically adjust settings based on the reader’s proficiency and behavior, as the adaptability and personalization potential of ETS offer opportunities for future development. This real-time personalization could enhance both usability and learning outcomes, making ETS a valuable tool for long-term language retention and comprehension.
Additionally, while the current study focused on immediate reading metrics such as speed and fixation duration, understanding ETS’s impact on long-term language retention and comprehension would provide valuable insights into its broader educational impact. Future follow-up studies could assess how ETS influences users’ ability to recall and retain new vocabulary or concepts over time. Exploring these long-term effects will clarify whether ETS’s benefits extend beyond short-term reading support and further establish its value as a tool for educational and professional applications. Enhancing ETS with additional customization options, such as adjustable pop-up timing and personalized user settings, could better support diverse learning strategies and optimize educational outcomes.
Nonetheless, the study also identified areas for improvement, particularly regarding user interface design and customization options. Some participants found the translation pop-ups to be intrusive, with several suggesting that displaying translations in a sidebar or lower part of the screen could reduce the cognitive load associated with shifting between the translated and original text. This feedback aligns with established principles in human-computer interaction, which emphasize the importance of non-intrusive design in assistive technologies to improve user engagement and learning outcomes [65].
The participant feedback gathered from the Likert scale ratings and interviews provided valuable insights into ETS’s usability and effectiveness. Users generally expressed satisfaction with the system’s word translation feature, acknowledging its positive impact on their reading experience. Many participants noted that ETS’s immediate translations helped them understand complex terms more quickly than conventional methods, such as using dictionaries or separate translation software. Moreover, the positive sentiment towards recommending ETS to others indicates that users see the potential of the system to benefit a wider audience, including students, researchers, and professionals working with foreign-language materials. These findings highlight ETS’s relevance as a tool that enhances accessibility to scientific and technical literature, broadening the scope of accessible information across linguistic barriers.
Moreover, the user feedback gathered during the semi-structured interviews provided further insights into how future iterations of the software might evolve. Most of the participants appreciated the frequent translation pop-ups, finding them helpful for comprehension, although a smaller yet notable percentage expressed a preference for fewer translations. For example, users suggested a less intrusive translation pop-up, such as having translations displayed in a sidebar, which could help reduce the cognitive load associated with shifting between the translation and the original text. This observation is consistent with other research in human-computer interaction (HCI), which emphasizes that non-intrusive assistive tools tend to enhance user experience and learning outcomes [1,24]. Additionally, some participants proposed the inclusion of sentence-level translation, a feature that could further enhance comprehension by contextualizing individual word translations, which is supported by findings in translation studies that emphasize the importance of context for accurate translation [30,66,67,68].

6. Conclusions

This study explored the design, development, and evaluation of an eye-tracking-based translation assistance system (ETS) aimed at enhancing the reading flow and understanding of non-native speakers when engaging with complex and technical texts. By leveraging real-time eye-tracking data, ETS dynamically identifies moments of cognitive load when users encounter challenging words and provides immediate translations. The system’s ability to offer targeted assistance at critical points allows users to maintain reading flow and comprehension without constant manual intervention, making it a valuable tool for research and learning in non-native languages.
The results of our study revealed an impact of ETS on reading speed. Participants demonstrated an increase in reading speed when using ETS, suggesting that the system successfully reduces the cognitive burden associated with unfamiliar vocabulary. The immediate translation of challenging words likely prevented disruptions in the reading flow, allowing users to engage more naturally with the text. However, our findings also indicated longer fixation durations on translated words, highlighting a trade-off between speed and depth of comprehension. This suggests that while ETS facilitates more efficient reading, it also encourages users to spend additional time on complex terms, potentially promoting a deeper understanding of the content. This aligns with the educational objective of fostering meaningful learning rather than superficial reading, especially in contexts where comprehension is as critical as efficiency.
The eye-tracking and translation features implemented in ETS address key limitations in existing language learning and translation tools. Unlike general-purpose translators that translate entire texts, ETS focuses on maintaining the reader’s engagement with the original language by intervening only when necessary. This selective translation approach preserves the user’s exposure to the foreign language, enhancing their language proficiency while optimizing reading efficiency. This balance between language exposure and cognitive support reflects a promising direction for reading aids that aim to assist without undermining the learning process.
The experimental results and participant feedback highlight the importance of flexibility in assistive reading tools. The varying preferences for pop-up placement, response time, and translation level demonstrate that a one-size-fits-all approach may not fully address diverse user needs [69]. Customizable settings in future iterations of ETS could allow users to tailor the system’s behavior to their reading habits and proficiency levels. For example, advanced users might benefit from delayed or sentence-level translations, while beginners could prefer quicker, word-level assistance. Adjusting the timing and sensitivity of translation pop-ups could reduce perceived intrusiveness and make ETS adaptable to a broader range of reading strategies and preferences. By storing eye-tracking data in Elasticsearch, ETS also enables the integration of AI techniques to analyze reading patterns and dynamically adjust system settings to match individual skill levels, paving the way for enhanced personalization and improved user experiences.
Finally, the positive reception of ETS and the insights gained from this study contribute to a growing body of evidence supporting the potential of eye-tracking technology in educational and professional applications [64]. By addressing both cognitive and linguistic challenges, ETS provides a foundation for further development of personalized reading aids. By incorporating AI techniques, future work could achieve real-time personalization of the reading assistance, adapting the system’s support to the unique reading patterns and comprehension levels of each user. This approach also aims to respond to individual learning needs and improve the accessibility of complex information. The implications of this work extend beyond language translation, suggesting a framework for adaptive, user-centered reading aids that respond dynamically to individual needs.

Author Contributions

Conceptualization, D.M. and M.X.; Methodology, D.M. and M.X.; Software, D.M. and E.T.; Validation, D.M. and M.X.; Formal analysis, D.M., E.T. and K.R.; Investigation, D.M. and E.T.; Resources, D.M. and E.T.; Data curation, D.M., E.T. and K.R.; Writing—original draft, D.M., E.T. and K.R.; Writing—review & editing, D.M., K.R. and M.X.; Supervision, M.X.; Project administration, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study as it did not require formal approval under the regulations of University of Patras, since the study did not involve medical interventions or high-risk procedures. However, the study was conducted in accordance with the Declaration of Helsinki, and informed consent was obtained in writing from all participants prior to their involvement.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. They are not shared publicly because of privacy concerns and the need to comply with the GDPR.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Stephanidis, C.; Salvendy, G.; Antona, M.; Chen, J.Y.C.; Dong, J.; Duffy, V.G.; Fang, X.; Fidopiastis, C.; Fragomeni, G.; Fu, L.P.; et al. Seven HCI Grand Challenges. Int. J. Hum.–Comput. Interact. 2019, 35, 1229–1269.
  2. Hao, R.; Liu, D.; Hu, L. Enhancing Human Capabilities through Symbiotic Artificial Intelligence with Shared Sensory Experiences. arXiv 2023, arXiv:2305.19278. Available online: http://arxiv.org/abs/2305.19278 (accessed on 4 July 2024).
  3. Rawas, S. AI: The future of humanity. Discov. Artif. Intell. 2024, 4, 25.
  4. Jeelani, I.; Albert, A.; Han, K. Improving Safety Performance in Construction Using Eye-Tracking, Visual Data Analytics, and Virtual Reality. In Construction Research Congress 2020; American Society of Civil Engineers: Tempe, AZ, USA, 2020; pp. 395–404.
  5. Vlačić, S.; Knežević, A.; Rođenkov, S.; Mandal, S.; Vitsas, P.A. Improving the pilot selection process by using eye-tracking tools. J. Eye Mov. Res. 2020, 12, 5–6.
  6. Hutton, S.B. Eye Tracking Methodology. In Eye Movement Research; Klein, C., Ettinger, U., Eds.; Studies in Neuroscience, Psychology and Behavioral Economics; Springer International Publishing: Cham, Switzerland, 2019; pp. 277–308. ISBN 978-3-030-20083-1.
  7. Borsato, F.H.; Morimoto, C.H. Towards a low cost and high speed mobile eye tracker. In ETRA’19, Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, Denver, CO, USA, 25–28 June 2019; ACM: New York, NY, USA, 2019; pp. 1–9.
  8. Cioffi, G.M.; Pinilla-Echeverri, N.; Sheth, T.; Sibbald, M.G. Does artificial intelligence enhance physician interpretation of optical coherence tomography: Insights from eye tracking. Front. Cardiovasc. Med. 2023, 10, 1283338.
  9. Duchowski, A.T. Eye Tracking Methodology; Springer International Publishing: Cham, Switzerland, 2017; ISBN 978-3-319-57881-1.
  10. Ndiaye, Y.; Lim, K.H.; Blessing, L. Eye tracking and artificial intelligence for competency assessment in engineering education: A review. Front. Educ. 2023, 8, 1170348.
  11. Gardony, A.L.; Lindeman, R.W.; Brunyé, T.T. Eye-tracking for human-centered mixed reality: Promises and challenges. In Proceedings of the Optical Architectures for Displays and Sensing in Augmented, Virtual, and Mixed Reality (AR, VR, MR), San Francisco, CA, USA, 2–4 February 2020; SPIE: Bellingham, WA, USA, 2020; Volume 11310, pp. 230–247. Available online: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11310/113100T/Eye-tracking-for-human-centered-mixed-reality--promises-and/10.1117/12.2542699.short (accessed on 25 October 2024).
  12. Toyama, T.; Sonntag, D.; Dengel, A.; Matsuda, T.; Iwamura, M.; Kise, K. A mixed reality head-mounted text translation system using eye gaze input. In Proceedings of the 19th International Conference on Intelligent User Interfaces, Haifa, Israel, 24–27 February 2014; ACM: New York, NY, USA, 2014; pp. 329–334.
  13. Menges, R.; Kumar, C.; Staab, S. Improving User Experience of Eye Tracking-Based Interaction: Introspecting and Adapting Interfaces. ACM Trans. Comput.-Hum. Interact. 2019, 26, 37.
  14. Lupu, R.G.; Bozomitu, R.G.; Păsărică, A.; Rotariu, C. Eye tracking user interface for Internet access used in assistive technology. In Proceedings of the 2017 E-Health and Bioengineering Conference (EHB), Sinaia, Romania, 22–24 June 2017; pp. 659–662. Available online: https://ieeexplore.ieee.org/abstract/document/7995510/ (accessed on 24 October 2024).
  15. Giglio, A.D.; da Costa, M.U.P. The use of artificial intelligence to improve the scientific writing of non-native English speakers. Rev. Assoc. Médica Bras. 2023, 69, e20230560.
  16. Amano, T.; Ramírez-Castañeda, V.; Berdejo-Espinola, V.; Borokini, I.; Chowdhury, S.; Golivets, M.; González-Trujillo, J.D.; Montaño-Centellas, F.; Paudel, K.; White, R.L. The manifold costs of being a non-native English speaker in science. PLoS Biol. 2023, 21, e3002184.
  17. Yang, J. The Perception of Pre-service English Teachers’ use of AI Translation Tools in EFL Writing. J. Converg. Cult. Technol. 2024, 10, 121–128.
  18. De la Vall, R.R.F.; Araya, F.G. Exploring the benefits and challenges of AI-language learning tools. Int. J. Soc. Sci. Humanit. Invent. 2023, 10, 7569–7576.
  19. Liu, T.-C.; Lin, P.-H. What comes with technological convenience? Exploring the behaviors and performances of learning with computer-mediated dictionaries. Comput. Hum. Behav. 2011, 27, 373–383.
  20. Guarnera, D.T.; Bryant, C.A.; Mishra, A.; Maletic, J.I.; Sharif, B. iTrace: Eye tracking infrastructure for development environments. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, Warsaw, Poland, 14–17 June 2018; ACM: New York, NY, USA, 2018; pp. 1–3.
  21. Morrison, A.B.; Richmond, L.L. Offloading items from memory: Individual differences in cognitive offloading in a short-term memory task. Cogn. Res. 2020, 5, 1.
  22. Klaib, A.F.; Alsrehin, N.O.; Melhem, W.Y.; Bashtawi, H.O.; Magableh, A.A. Eye tracking algorithms, techniques, tools, and applications with an emphasis on machine learning and Internet of Things technologies. Expert Syst. Appl. 2021, 166, 114037.
  23. Tong, Y.; Lu, W.; Yu, Y.; Shen, Y. Application of machine learning in ophthalmic imaging modalities. Eye Vis. 2020, 7, 22.
  24. Wu, C.-I. HCI and eye tracking technology for learning effect. Procedia-Soc. Behav. Sci. 2012, 64, 626–632.
  25. Biedert, R.; Buscher, G.; Dengel, A. The eyeBook—Using Eye Tracking to Enhance the Reading Experience. Inform. Spektrum 2010, 33, 272–281.
  26. Jakobsen, A.L. Translation technology research with eye tracking. In The Routledge Handbook of Translation and Technology; Routledge: London, UK, 2019; pp. 398–416. Available online: https://www.taylorfrancis.com/chapters/edit/10.4324/9781315311258-28/translation-technology-research-eye-tracking-arnt-lykke-jakobsen (accessed on 8 July 2024).
  27. Hyrskykari, A.; Majaranta, P.; Aaltonen, A.; Räihä, K.-J. Design issues of iDICT: A gaze-assisted translation aid. In Proceedings of the Symposium on Eye Tracking Research & Applications—ETRA’00, Gardens, FL, USA, 6–8 November 2000; ACM Press: New York, NY, USA, 2000; pp. 9–14.
  28. Sibert, J.L.; Gokturk, M.; Lavine, R.A. The reading assistant: Eye gaze triggered auditory prompting for reading remediation. In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology—UIST’00, San Diego, CA, USA, 6–8 November 2000; ACM Press: New York, NY, USA, 2000; pp. 101–107.
  29. Špakov, O.; Räihä, K.-J. KiEV: A tool for visualization of reading and writing processes in translation of text. In Proceedings of the 2008 Symposium on Eye Tracking Research & Applications—ETRA’08, Savannah, GA, USA, 26–28 March 2008; ACM Press: New York, NY, USA, 2008; p. 107.
  30. Ho, T.-Y.; Wang, H.-C.; Lai, S.-H. Non-native Language Reading Support with Display of Machine Translation Based on Eye-Tracking and Sentence-Level Mapping. In Proceedings of the Sixth International Symposium of Chinese CHI, Montreal, QC, Canada, 21–22 April 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 57–63.
  31. Guo, W.; Cheng, S. An Approach to Reading Assistance with Eye Tracking Data and Text Features. In Proceedings of the Adjunct of the 2019 International Conference on Multimodal Interaction, Suzhou, China, 14–18 October 2019; ACM: New York, NY, USA, 2019; pp. 1–7.
  32. Palmer, C.; Sharif, B. Towards automating fixation correction for source code. In Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, Charleston, SC, USA, 14–17 March 2016; ACM: New York, NY, USA, 2016; pp. 65–68.
  33. Elasticsearch, B.V. Elasticsearch. Software. Version 2018, 6. Available online: https://www.aeris-consulting.com/wp-content/uploads/2022/04/Elasticsearch.pdf (accessed on 12 August 2024).
  34. Akhil, S. An overview of tesseract OCR engine. In Proceedings of the A Seminar Report. Department of Computer Science and Engineering National Institute of Technology, Calicut, Monsoon, 2016; Available online: https://www.academia.edu/32328825/An_overview_of_Tesseract_OCR_Engine (accessed on 12 August 2024).
  35. Fedosejev, A. React.js Essentials; Packt Publishing Ltd.: Birmingham, UK, 2015.
  36. Al Salmi, H. Comparative CSS frameworks. Multi-Knowl. Electron. Compr. J. Educ. Sci. Publ. (MECSJ) 2023, 66, 1–35. Available online: https://mecsj.com/uplode/images/photo/hat4_.pdf (accessed on 12 August 2024).
  37. Libby, A. Styling the Site and Content. In Beginning Eleventy; Apress: Berkeley, CA, USA, 2022; pp. 155–187. ISBN 978-1-4842-8314-1.
  38. Maj, W. TypeScript. Available online: https://github.com/wojtekmaj/react-pdf (accessed on 12 August 2024).
  39. Ronacher, A. Flask Documentation 2021. Welcome to Flask-Flask Documentation (2.0. x). 2021. Available online: https://media.readthedocs.org/pdf/flask-russian-docs/0.9/flask-russian-docs.pdf (accessed on 12 August 2024).
  40. Abdelhamied, M.A.H. Client-Side Security Using CORS. 2016. Available online: http://dspace.unive.it/handle/10579/8046 (accessed on 12 August 2024).
  41. Allen, G.; Owens, M. Introducing SQLite. In The Definitive Guide to SQLite; Apress: Berkeley, CA, USA, 2010; pp. 1–16. ISBN 978-1-4302-3225-4.
  42. Voigt, P.; Von Dem Bussche, A. The EU General Data Protection Regulation (GDPR); Springer International Publishing: Cham, Switzerland, 2017; ISBN 978-3-319-57958-0.
  43. Lombardi, A. WebSocket: Lightweight Client-Server Communications; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2015.
  44. Google Translate. Wikipedia 2024. Available online: https://en.wikipedia.org/w/index.php?title=Google_Translate&oldid=1253147815 (accessed on 25 October 2024).
  45. Patel, C.; Patel, A.; Patel, D. Optical character recognition by open source OCR tool tesseract: A case study. Int. J. Comput. Appl. 2012, 55, 50–56.
  46. Sarica, S.; Luo, J. Stopwords in technical language processing. PLoS ONE 2021, 16, e0254937.
  47. Johnson, M.; Schuster, M.; Le, Q.V.; Krikun, M.; Wu, Y.; Chen, Z.; Thorat, N.; Viégas, F.; Wattenberg, M.; Corrado, G. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Trans. Assoc. Comput. Linguist. 2017, 5, 339–351.
  48. Niehorster, D.C.; Cornelissen, T.H.W.; Holmqvist, K.; Hooge, I.T.C.; Hessels, R.S. What to expect from your remote eye-tracker when participants are unrestrained. Behav. Res. Methods 2018, 50, 213–227.
  49. Hessels, R.S.; Cornelissen, T.H.W.; Kemner, C.; Hooge, I.T.C. Qualitative tests of remote eyetracker recovery and performance during head rotation. Behav. Res. Methods 2015, 47, 848–859.
  50. Charness, G.; Gneezy, U.; Kuhn, M.A. Experimental methods: Between-subject and within-subject design. J. Econ. Behav. Organ. 2012, 81, 1–8.
  51. Lakens, D. Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Front. Psychol. 2013, 4, 863.
  52. Carter, B.T.; Luke, S.G. Best practices in eye tracking research. Int. J. Psychophysiol. 2020, 155, 49–62.
  53. Suresh, K.P. An overview of randomization techniques: An unbiased assessment of outcome in clinical research. J. Hum. Reprod. Sci. 2011, 4, 8–11.
  54. Mohr, D.L.; Wilson, W.J.; Freund, R.J. Statistical Methods; Academic Press: Cambridge, MA, USA, 2021.
  55. Komogortsev, O.V.; Gobert, D.V.; Jayarathna, S.; Gowda, S.M. Standardization of automated analyses of oculomotor fixation and saccadic behaviors. IEEE Trans. Biomed. Eng. 2010, 57, 2635–2645.
  56. Olsen, A. The Tobii I-VT fixation filter. Tobii Technol. 2012, 21, 4–19.
  57. Ezer, T.; Greiner, M.; Grabinger, L.; Hauser, F.; Mottok, J. Eye Tracking as Technology in Education: Data Quality Analysis and Improvements. In ICERI2023 Proceedings, Proceedings of the 16th Annual International Conference of Education, Research and Innovation, Seville, Spain, 13–15 November 2023; IATED: New York, NY, USA, 2023; pp. 4500–4509. Available online: https://library.iated.org/view/EZER2023EYE (accessed on 26 July 2024).
  58. Sullivan, G.M.; Artino Jr, A.R. Analyzing and interpreting data from Likert-type scales. J. Grad. Med. Educ. 2013, 5, 541–542.
  59. Silva, B.B.; Orrego-Carmona, D.; Szarkowska, A. Using linear mixed models to analyze data from eye-tracking research on subtitling. Transl. Spaces 2022, 11, 60–88.
  60. Brown-Schmidt, S.; Naveiras, M.; De Boeck, P.; Cho, S.-J. Statistical modeling of intensive categorical time-series eye-tracking data using dynamic generalized linear mixed-effect models with crossed random effects. In Psychology of Learning and Motivation; Elsevier: Amsterdam, The Netherlands, 2020; Volume 73, pp. 1–31. Available online: https://www.sciencedirect.com/science/article/pii/S0079742120300232 (accessed on 26 July 2024).
  61. Google Lens: Translate Text in Real Time. Available online: https://lens.google.com/ (accessed on 6 September 2024).
  62. Sun, Y.-C.; Yang, F.-Y.; Liu, H.-J. Exploring Google Translate-friendly strategies for optimizing the quality of Google Translate in academic writing contexts. SN Soc. Sci. 2022, 2, 147.
  63. Huang, Z.; Duan, X.; Zhu, G.; Zhang, S.; Wang, R.; Wang, Z. Assessing the data quality of AdHawk MindLink eye-tracking glasses. Behav. Res. Methods 2024, 56, 5771–5787.
  64. Rosas, H.J.; Sussman, A.; Sekely, A.C.; Lavdas, A.A. Using eye tracking to reveal responses to the built environment and its constituents. Appl. Sci. 2023, 13, 12071.
  65. Keates, S.; Dowland, R. User Modelling and the Design of Computer-Based Assistive Devices. 1997. Available online: https://chooser.crossref.org/?doi=10.1049%2Fic%3A19970638 (accessed on 25 October 2024).
  66. Schaeffer, M.; Dragsted, B.; Hvelplund, K.T.; Balling, L.W.; Carl, M. Word Translation Entropy: Evidence of Early Target Language Activation During Reading for Translation. In New Directions in Empirical Translation Process Research; Carl, M., Bangalore, S., Schaeffer, M., Eds.; New Frontiers in Translation Studies; Springer International Publishing: Cham, Switzerland, 2016; pp. 183–210. ISBN 978-3-319-20357-7.
  67. Bowker, L. Machine Translation and Global Research: Towards Improved Machine Translation Literacy in the Scholarly Community. 2019. Available online: https://www.emerald.com/insight/content/doi/10.1108/978-1-78756-721-420191009/full/html (accessed on 6 September 2024).
  68. Brude, F.; Öhman Ekman, A. A Closer Look at Reading Strategies in the Swedish Syllabus for English as a Second Language: A Literature Review on Strategies for Reading in Upper Secondary School in Sweden. 2021. Available online: https://www.diva-portal.org/smash/record.jsf?pid=diva2:1609828 (accessed on 5 July 2024).
  69. King, R. Assistive reading technologies for struggling readers. Mt. R. Undergrad. Educ. Rev. 2015, 1, 26–28. Available online: http://mrujs.mtroyal.ca/index.php/mruer/article/view/317 (accessed on 25 October 2024).
Figure 1. Mockup screen of the original translation pop-up, where user frustration was identified.
Figure 2. The final graphical user interface of ETS when the user triggers a translation.
Figure 3. The user’s document is displayed as scrollable content, and a connection with the eye-tracker can be initialized.
Figure 4. Experimental laboratory.
Figure 5. Word with and without ETS.
Table 1. Participant demographics.

| Characteristic | Category | Frequency | Percentage |
|---|---|---|---|
| Gender | Male | 34 | 72.3% |
| | Female | 13 | 27.7% |
| Age | 18–24 | 44 | 93.6% |
| | 25–30 | 3 | 6.4% |
| Education | Bachelor Student | 44 | 95.7% |
| | Master’s Student | 3 | 4.3% |
| English Proficiency Level | B2 (Lower) | 5 | 10.6% |
| | C1 (Advanced) | 7 | 14.9% |
| | C2 (Proficiency) | 35 | 74.5% |
| Learning Difficulties | No | 45 | 96.2% |
| | Yes | 2 | 3.8% |
| Condition | Software was used in the 1st part | 21 | 44.7% |
| | Software was used in the 2nd part | 26 | 55.3% |
Table 2. Descriptive statistics of independent variables.

| Variable | Mean | Median | Standard Deviation | 95% CI |
|---|---|---|---|---|
| Number of words read | 202.5 | 200.5 | 86.9 | [185.0, 220.1] |
| Total duration of whole fixations | 192,675.8 | 200,514.0 | 61,549.4 | [180,233.2, 205,118.3] |
| Number of whole fixations | 248.6 | 250.0 | 33.8 | [241.8, 255.5] |
| Average duration of fixations | 786.1 | 810.5 | 255.3 | [434.5, 837.7] |
Table 3. Descriptive statistics of participants’ self-reported ratings of their experience with the ETS system.

| Question (1: Strongly Disagree, 5: Strongly Agree) | Mean | Median | Standard Deviation | 95% CI |
|---|---|---|---|---|
| Q1. The system’s word translation feature enhanced my reading experience compared to my preferred conventional translation method. | 4.68 | 5.00 | 1.60 | [4.36, 5.00] |
| Q2. I would recommend this system to others for reading and reading with the need for translation tasks. | 5.30 | 6.00 | 1.44 | [5.01, 5.59] |
| Q3. Overall, I am satisfied with my experience using this system for reading and translation. | 5.02 | 5.00 | 1.38 | [4.74, 5.30] |
| Q4. I found the system to be a valuable tool for improving my reading experience in a non-native language. | 5.47 | 6.00 | 1.32 | [5.20, 5.73] |
| Q5. The system’s performance in word translation met my expectations. | 4.89 | 5.00 | 1.57 | [4.58, 5.21] |
Table 4. Results of the Independent-Samples Mann–Whitney U Test on starting condition.

| Null Hypothesis | Test | Sig. | Decision |
|---|---|---|---|
| The distribution of the number of words read with the system is the same regardless of the starting condition. | Independent-Samples Mann–Whitney U Test | 0.831 | Retain the null hypothesis. |
| The distribution of the number of words read without the system is the same regardless of the starting condition. | Independent-Samples Mann–Whitney U Test | 0.28 | Retain the null hypothesis. |
| The distribution of the total duration of whole fixations with the system is the same regardless of the starting condition. | Independent-Samples Mann–Whitney U Test | 0.797 | Retain the null hypothesis. |
| The distribution of the total duration of whole fixations without the system is the same regardless of the starting condition. | Independent-Samples Mann–Whitney U Test | 0.5 | Retain the null hypothesis. |
| The distribution of the average duration of fixations with the system is the same regardless of the starting condition. | Independent-Samples Mann–Whitney U Test | 0.65 | Retain the null hypothesis. |
| The distribution of the average duration of fixations without the system is the same regardless of the starting condition. | Independent-Samples Mann–Whitney U Test | 0.387 | Retain the null hypothesis. |
| The distribution of the number of whole fixations with the system is the same regardless of the starting condition. | Independent-Samples Mann–Whitney U Test | 0.872 | Retain the null hypothesis. |
| The distribution of the number of whole fixations without the system is the same regardless of the starting condition. | Independent-Samples Mann–Whitney U Test | 0.374 | Retain the null hypothesis. |
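For readers who want to see what an order-effect check of this kind involves, the following is a minimal standard-library Python sketch of a two-sided Mann–Whitney U test. It is an illustration under stated assumptions, not the authors' analysis: it uses the normal approximation with a tie correction and no continuity correction, the function name `mann_whitney_u` is hypothetical, and the sample values in the usage section are made up rather than taken from the study.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p) where U is the smaller of the two U statistics.
    No continuity correction is applied, so p-values may differ slightly
    from those reported by statistics packages.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, min(p, 1.0)


if __name__ == "__main__":
    # Made-up word counts for two starting-condition groups.
    system_first = [180, 210, 195, 240, 205]
    system_second = [190, 200, 215, 230, 185]
    u, p = mann_whitney_u(system_first, system_second)
    print(f"U = {u}, p = {p:.3f}")
```

A p-value above the chosen significance level (commonly 0.05) corresponds to the "Retain the null hypothesis" decisions in Table 4, meaning the check found no evidence that the starting condition affected the measure.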
Table 5. Tests of normality of all cases.

| Variable | p-Value |
|---|---|
| Words read with ETS | 0.335 |
| Words read without ETS | 0.177 |
| Total duration of whole fixations with ETS | <0.001 |
| Total duration of whole fixations without ETS | <0.001 |
| Average duration with ETS | 0.735 |
| Average duration without ETS | 0.206 |
| Number of whole fixations with ETS | <0.001 |
| Number of whole fixations without ETS | <0.001 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Minas, D.; Theodosiou, E.; Roumpas, K.; Xenos, M. Adaptive Real-Time Translation Assistance Through Eye-Tracking. AI 2025, 6, 5. https://doi.org/10.3390/ai6010005
