Eye-tracking technology is widely used in various scientific fields including cognitive psychology, linguistics, and computational sciences. Multilingual eye-tracking data collection refers to the process of collecting eye movement data from individuals while they read or look at stimuli in different languages. This type of data collection is useful for both human and machine language processing, as it provides insight into how people process information in different languages and can help researchers and developers improve language-related technologies.
Such data also shows how the brain processes different languages, including differences between languages in reading speed, attention allocation, and visual fixations. This gives us a better understanding of the cognitive processes involved in language processing, and of bilingualism and multilingualism.
For machine language processing, multilingual eye-tracking data can be used to train and evaluate language models. By tracking eye movements as people read text, researchers can estimate which information readers treat as important, which can help improve natural language processing models. Additionally, the data can be used to train machine learning algorithms that predict eye movements, which may improve the accuracy of language models and the overall user experience in language-related technologies.
Eye-tracking is a useful technology for a multitude of applications. For example, it can help to detect tiredness while driving, or it can support the diagnosis of attention and language disorders. In addition to other applications in the medical domain, eye-tracking is also used in gaming, marketing and human-computer interaction.
Why is eye-tracking while reading especially interesting?
As you read these words, an eye tracker can follow your eye movements over the text. This provides information about how long you spend looking at a text, or more specifically, how long you spend on each word, which words you skip, and which words you dwell on.
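To make these word-level measures concrete, here is a minimal Python sketch. It assumes the eye tracker's output has already been reduced to a list of fixations, each assigned to one word. The `Fixation` type, the `word_measures` function and the example durations are hypothetical illustrations, not part of any MultiplEYE tooling. As a simplification, a word counts as skipped only if it never receives a fixation.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    """Hypothetical record: one fixation already mapped to a word."""
    word_index: int   # position of the fixated word in the text
    duration_ms: int  # fixation duration in milliseconds


def word_measures(words, fixations):
    """Summarise fixations (in temporal order) into per-word reading measures."""
    measures = [
        {
            "word": word,
            "total_ms": 0,              # summed duration of all fixations
            "first_fixation_ms": None,  # duration of the first fixation
            "n_fixations": 0,
            "skipped": True,            # stays True if never fixated
        }
        for word in words
    ]
    for fix in fixations:
        m = measures[fix.word_index]
        if m["first_fixation_ms"] is None:
            m["first_fixation_ms"] = fix.duration_ms
        m["total_ms"] += fix.duration_ms
        m["n_fixations"] += 1
        m["skipped"] = False
    return measures


if __name__ == "__main__":
    words = ["As", "you", "read", "these", "words"]
    # Invented example data: the reader returns to "read" a second time.
    fixations = [Fixation(0, 180), Fixation(2, 220), Fixation(4, 250), Fixation(2, 140)]
    for m in word_measures(words, fixations):
        print(m)
```

In this invented example, "you" and "these" are never fixated and are flagged as skipped, while "read" accumulates two fixations.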
Even though eye-tracking research on reading has been conducted for decades, there is no common standard for stimulus presentation. Nor is there a common standard for data pre-processing that settles open questions such as the minimum duration of fixations and saccades, the velocity thresholds for saccades, and the size of interest areas in reading.
The COST Action "Enabling multilingual eye-tracking data collection for human and machine language processing research" (MultiplEYE) aims to foster an interdisciplinary network of research groups collecting eye-tracking data from reading in many languages. The Action also supports research beyond Europe (US, Canada, Mexico, Pakistan) and aims to broaden language coverage in the future.
This collaborative network kicked off in September 2022 and brings together 129 researchers and scientists from 35 countries. Their areas of expertise cover linguistics, psychology, and computer and information sciences, supporting the development of a large multilingual eye-tracking corpus. The network will enable researchers to collect data by sharing infrastructure and knowledge across these fields.
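The following Python sketch shows why these pre-processing choices matter. It implements a simple velocity-threshold approach: a sample whose point-to-point velocity exceeds a threshold is treated as part of a saccade, and runs of slower samples are grouped into fixations. The function name `detect_fixations`, the input format (gaze positions in degrees of visual angle) and the default values of 30 degrees per second and 50 ms are assumptions chosen for illustration, not an agreed standard and not the MultiplEYE pipeline.

```python
import math


def detect_fixations(samples, sampling_rate_hz,
                     velocity_threshold=30.0, min_fixation_ms=50.0):
    """Group gaze samples into fixations with a velocity threshold.

    samples: list of (x_deg, y_deg) gaze positions in degrees of visual angle.
    Returns a list of (start_index, end_index, duration_ms) tuples.
    The default thresholds are illustrative assumptions only.
    """
    # Label each sample as slow (fixation candidate) or fast (saccade).
    # The first sample has no predecessor and is treated as slow.
    is_slow = [True]
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) * sampling_rate_hz  # deg/s
        is_slow.append(velocity < velocity_threshold)

    fixations = []
    start = None
    # The trailing False is a sentinel that closes a run ending at the last sample.
    for i, slow in enumerate(is_slow + [False]):
        if slow and start is None:
            start = i
        elif not slow and start is not None:
            duration_ms = (i - start) * 1000.0 / sampling_rate_hz
            if duration_ms >= min_fixation_ms:  # drop runs that are too short
                fixations.append((start, i - 1, duration_ms))
            start = None
    return fixations


if __name__ == "__main__":
    # Invented data at 1000 Hz: 100 samples at one position, then a jump.
    samples = [(0.0, 0.0)] * 100 + [(5.0, 0.0)] * 100
    print(detect_fixations(samples, sampling_rate_hz=1000))
```

Changing `velocity_threshold` or `min_fixation_ms` changes which fixations are reported from the same raw samples, which is why shared conventions affect how comparable results are across studies.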
Laying the foundations
The motivation behind MultiplEYE is that eye-tracking data is still sparse, especially for smaller languages. Such a large data collection is challenging because the participating groups must develop and agree on the experimental design and on the complexity and types of texts participants will read. Other decisions that seem less relevant but are in fact very important include the font type and size in which the text is presented, the order of the texts, the experiment procedure, and how the data will be processed.
However, once completed, this dataset will make it possible to investigate many topics in psycholinguistics and computational linguistics. For example, it will allow the comparison of reading behavior across languages: does the script, such as the Latin alphabet versus Cyrillic or Arabic scripts, have an impact on reading times? On the computational side, eye-tracking data could be used to advance artificial intelligence applications that imitate the human reading process, for instance to build better machine translation systems or to improve the automatic extraction of keywords from text.
Eyes on the future
The main outcomes of MultiplEYE will be a large dataset containing eye-tracking data in many languages and a platform for new collaborations leveraging this type of data.
Multilingual eye-tracking data collection is a valuable tool for both human and machine language processing. It offers insights into how we process information in different languages and how technology can be improved to better support language processing.
This article was first published on 13 February by COST Association.