Machine Learning Model Used to Predict Focus of Attention in Humans in Different Environments

Aryan Inamdar
Jun 6, 2021
3 min read

In recent years, researchers have begun to analyze the impact of an organism's environment on their neurological state, and more specifically, their attention span and field of view (FOV) when introduced to various driving scenarios/situations.

Dr. Davide Abati and Dr. Simone Calderara at the University of Modena in Italy both specialize in deep learning and computer vision and set out to design an experiment for such a topic. The goal of the experiment was to estimate what a person would pay attention to while driving, and which part of the scene around the vehicle is more critical for the task. As part of their background research, they analyzed the DR (eye) VE dataset, a collection of driving scenes for which eye-tracking annotations are available. This dataset features more than 500,000 registered frames, and was carefully analyzed by both Abati and Calderara to familiarize themselves with basic human attention patterns.

After gaining a fundamental understanding of their topic, the researchers began to design their experimentation method. Their test group consisted of genetically unrelated people of different ethnic backgrounds, genders, and ages. Over a span of 2 months, the subjects were shown videos that were recorded in different contexts, both in terms of landscape (downtown, countryside, highway) and traffic condition, ranging from traffic-free to highly cluttered scenarios. They were recorded in diverse weather conditions (sunny, rainy, cloudy) and at different hours of the day (both daytime and night).

From here, using a high quality eye-tracking camera, Abati and Calderara were able to track where the subject was looking by deriving x-y coordinates which represented what pixel the subject was looking at. Using this raw data, the researchers created an optical flow chart for each frame, showing the concentrations of their attention on different parts of the map by using a somewhat "infrared indexing" as shown below.

Note: Here, the subject gave its most attention to the bottom left of the image, where there is a green blob, showing an increased viewing in that area.

However, there was one problem with only this data. Although Abati and Calderara knew where the subject was looking, they did not know what he/she was looking at. In order to fix this problem, the researchers used an advanced, yet common method of computer vision known as semantic image segmentation. This is a method in which once can label specific regions of an image according to what's being shown. More specifically, the goal of semantic image segmentation is to label each pixel of an image with a corresponding class of what is being represented. Because every pixel in the image is being predicted for, this task is commonly referred to as dense prediction.

The researchers also implemented convolutional neural networks in order to predict what each pixel represented in terms of the frame. Specifically, each pixel in the frame was assigned an integer/semantic label that made it easier to distinguish class differences (shown below).

Note: In this case, the input was every frame of the videos used for experimentation on the test subjects.

Once each pixel has been properly assigned to an object class, fully segmenting the image becomes easy.

Note: an example of a fully segmented image using convolutional neural networks.

With both the optical flow chart and the segmented image, the researchers were able to integrate both of these aspects into one image, as shown below. This makes it easier to predict where and what the subject will look at given a certain driving environment.

Note: An integrated frame showing object segmentation and optical flow (green blob).

Finally, Abati and Calderara are able to use each integrated frame in their video as data points in order to accurately predict the location and quality of the subject's attention and field of view. Such technology can be implemented in cars in order to reduce the risk of car accidents.

Machine Learning Model Used to Predict Focus of Attention in Humans in Different Environments

Recent Posts

Comments