Eye-tracking software has been under development for at least two decades, and one of the leading companies in the domain is a startup success case from Sweden’s Stockholm area. Tobii, which now employs more than 600 people around the world, builds devices that provide attention computing services, of which eye tracking is a subset.
Eye-tracking technologies were first used for scientific research purposes. Studies conducted in a controlled environment tracked eye movements to draw conclusions about where subjects were directing their attention.
Unaware of all the work that had already gone into eye tracking for use in scientific research, three Swedish entrepreneurs came up with their own camera-based solution using infrared illumination to paint a pattern on the user’s eyes, which the camera then recognises. Software analysed the data to determine the direction of the person’s gaze.
The three entrepreneurs established Tobii 20 years ago on the premise that their technology could be used to enable people to control a computer with their eye movements.
Now, two decades later, the system is much more sophisticated – and the software uses some of the latest computer vision and machine learning techniques to determine much more than just where someone is looking. The techniques are very similar, whether used as a screen-based or a wearable eye-tracking system.
The market has changed in the meantime. Enterprises now use eye-tracking technologies for a growing number of applications, ranging from measuring user reactions to new products to detecting drowsy car drivers. Requirements have also changed.
In the early days, when eye tracking was used only in scientific settings, it was easy to control the environment. Researchers would have the subject put their head in a stand to keep it still, and they could also control the lighting. If a subject was particularly difficult to track, the researchers could just pick somebody else for the study. But with business applications, the objective is to attract users and to operate in as many environments as possible.
“The challenge with eye tracking isn’t so much getting it to work as it is getting it to work for all people in all types of environment,” says Anand Srivatsa, CEO of Tobii. “For the last 20 years, much of what the company has been doing is looking at how we can take eye-tracking technology and make it robust over a large population. We want to cover people with different ethnicities where the shape of the eye is different, and the colour of the iris is different. We want to cover people with eye conditions, such as lazy eye.
“Not only are people different, but the environment might include bright light, or there might be light in front of the user. The system has to correct itself for reflections, and it has to work for users wearing glasses.”
For example, enterprises devising new packaging may want to know if their bottle of drink is attractive. One way to find out is to ask a question, but as soon as you ask a question, the subject uses his or her “active mind”, which is usually biased. A man might deny liking the colour pink, for instance, when that is really what caught his eye.
To overcome these biases, commercial market research is now moving towards measuring people’s responses rather than asking them for explicit information. Eye-tracking technology is perfectly adapted to this need.
The first area of commercial success for Tobii was in helping companies understand user preferences. Then the company entered another phase, where it marketed medical-grade devices for communication assistance. The technology could be used to give people a voice – someone with amyotrophic lateral sclerosis (ALS), like the late Stephen Hawking, could use their eyes to access the keyboard.
“Now, for the last five or six years, our technology has become mature enough to be considered for mass-market deployment,” says Srivatsa. “Today we see a lot of opportunity in gaming and in extended realities – virtual reality (VR) and augmented reality (AR). We also see opportunities in automotive for driver monitoring systems, healthcare, education and training. At the same time, we continue to market products and services for consumer research and behaviour.”
The company has now moved beyond simple gaze detection to measuring other things to gain deeper insights into human attention, says Srivatsa. “We’ve started with this first measure, which is where you look. But there are other biometrics you can measure when you look at images of people’s eyes or their faces, like where their head is pointed, whether you are pointing perpendicularly to the screen. Are you moving your head to the left or the right? We call that head pose.
“Of course, when you have a picture of somebody’s face, you could do facial identification like you do for your iPhone. You can look at how open the eye is. You can tell if the person is about to shut his or her eyes. You can look at things like pupil diameter, which can be an indicator for other types of higher signals.
“When we look at images of people’s faces and their eyes, we are able to measure additional signals, which we call core signals. And then the other thing we’ve done in the last two decades is to combine these core signals into higher-level signals, which we call attention signals.
“We can tell if you’re looking at something like a fixation, or if your eyes are moving rapidly in what is called a saccade. Or, based on pupil dilation, you can understand things about cognitive load. You can detect if somebody is stressed, and things like that.
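The fixation/saccade distinction Srivatsa describes is commonly made with a velocity-threshold classifier (often called I-VT): if the gaze point moves faster than a threshold between samples, the eye is in a saccade. The sketch below illustrates the general technique only; the threshold and sample rate are typical illustrative values, not Tobii’s implementation.

```python
# Minimal velocity-threshold (I-VT) gaze classifier: separates fixations
# from saccades in a stream of gaze samples. Parameters are illustrative.
import math

SAMPLE_RATE_HZ = 120        # assumed eye-tracker sample rate
SACCADE_THRESHOLD = 30.0    # degrees of visual angle per second (typical)

def classify_gaze(samples):
    """samples: list of (x, y) gaze positions in degrees of visual angle.
    Returns one label per inter-sample interval: 'fixation' or 'saccade'."""
    labels = []
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)   # angular distance moved
        velocity = dist * SAMPLE_RATE_HZ      # degrees per second
        labels.append('saccade' if velocity > SACCADE_THRESHOLD else 'fixation')
    return labels

# A steady gaze followed by a rapid jump:
print(classify_gaze([(0.0, 0.0), (0.05, 0.0), (5.0, 0.0)]))
# → ['fixation', 'saccade']
```

Real systems add noise filtering and merge adjacent fixations, but the core signal is simply gaze velocity.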
“These are all signals that, in scientific research or in other applications, can be quite important to measure and understand. In the automotive space, we have attention signals around drowsiness, to help a car determine if you need a coffee break, or whether you are distracted.”
Tobii built up layers of signals, which brought it from eye tracking to the more general area of “attention computing”, of which eye tracking is an essential component. It is no accident that parents tell their kids to “look at me when I’m speaking to you”: where people fix their gaze says a lot about where they are directing their attention.
One of the areas for future innovation is AR, where users wear a pair of normal-looking glasses into which all the technology has to fit snugly. Power management is a big concern, with the eye-tracking software sharing limited battery life with networking, the display and everything else. Miniaturisation, lowering power consumption and reducing the cost of the overall solution are all areas where more innovation is needed.
Tobii markets a variety of platforms, some of which are USB peripherals operating at less than 10W. It also has systems that sit within a VR headset, running on a digital signal processor in the Qualcomm system-on-chip that powers the headset. One of the innovations of the past two decades has been reducing the footprint, both of the sensors needed to capture images of the eye and project the infrared pattern, and of the chip that performs the higher-level analysis required by the use case.
Whenever new solutions are found, anyone making eye-tracking platforms must then make those solutions work for a large population and in a variety of environments. “Even if we fix the population problem for a certain type of application, as soon as you miniaturise, you have to get back to that level of population,” says Srivatsa.
“Ideally, you want to say this works 100% of the time on everybody, which is a goal that is probably impossible to attain. But if you want to be in consumer markets, you have to be in the high 90s in percentage of coverage. This means that every time you miniaturise, you have to work the population problem again to get back to the high 90s. Otherwise, you can’t put a product out.”
Killer apps for eye tracking and attention computing
Smaller and faster processors, along with smarter software, have given rise to new solutions that can solve a variety of problems in different industries. Several killer apps are on the horizon, and one of the more obvious cases is automotive driver monitoring.
The number one reason for car accidents is driver error. A worldwide movement, called Vision Zero, is lobbying for legislation in industrialised countries to dramatically reduce the number of fatal accidents. The European Union has mandated that, by 2026, new cars cannot be sold without a camera-based driver-monitoring system to detect whether a driver is drowsy.
Even when self-driving cars become more prevalent, it will still be necessary to make sure the driver is ready to take back control of the vehicle. It will not be until autonomy levels 4 and 5 are reached that the driver will no longer be needed. And full autonomy (level 5) is not expected for at least 15 years.
The second killer app is in VR, where eye tracking or attention computing can be used in consumer headsets to enable higher levels of immersiveness. The problem for VR is that it is even more graphically intensive than a big screen because it requires an almost retina-quality image over a very broad field of view. Foveated rendering uses eye tracking to determine where users are directing their attention and then renders in high definition only that part of their view, dramatically reducing the amount of graphics power needed.
“If you look at how our eyes operate, we can only see a very small part in high definition,” says Srivatsa. “The fovea is the part of your retina that can see in full fidelity and has full colour range – and that’s only about 1% of your field of view.
“If I know exactly where you are looking, I can reduce the rendering load to only focus there, and you could actually get back to super low resolution in other parts of the screen. You don’t even perceive colours in some parts of that spectrum, so if you could do it super smart, you can really reduce the load. Sony believes in foveated rendering and they are going to include that in their PSVR 2 headset.”
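The idea Srivatsa outlines can be sketched as a per-tile shading-rate decision: the further a screen tile sits from the gaze point, the coarser it is rendered. The falloff bands below are illustrative assumptions, not parameters from any shipping headset.

```python
# Sketch of foveated rendering's core decision: choose a rendering
# resolution for each screen tile based on its angular distance
# (eccentricity) from the current gaze point. Bands are illustrative.
import math

def shading_rate(tile_center, gaze, px_per_degree=20.0):
    """Return the fraction of full resolution at which to render a tile.
    tile_center and gaze are (x, y) pixel coordinates on the eye buffer."""
    ecc_deg = math.dist(tile_center, gaze) / px_per_degree
    if ecc_deg < 2.0:      # foveal region: full fidelity
        return 1.0
    elif ecc_deg < 10.0:   # parafoveal band: half resolution
        return 0.5
    else:                  # periphery: coarse shading
        return 0.125

gaze = (960, 540)  # gaze point on a 1920x1080 eye buffer
print(shading_rate((960, 540), gaze))   # at the fovea → 1.0
print(shading_rate((1800, 900), gaze))  # far periphery → 0.125
```

GPUs expose this through variable-rate shading, so the saving comes directly from shading fewer fragments outside the foveal region.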
A third killer app is in AR, where there is a need to contextualise information. If a user is walking up to a bus station and looks at the station through AR-equipped glasses, the device should understand that the user is at a bus stop and tell him or her when the next bus is coming.
Role of machine learning
Because attention detection is not yet personalised, today’s systems cannot handle some outlier cases. They cannot distinguish between users who naturally look drunk and those who really are drunk.
“The market expectation is that you can make a judgement based on data that you’ve collected when you’re developing your product,” says Srivatsa. “So, for example, you would say this is the reference population I have tested it on in a controlled environment. A person who is known to be drowsy goes in front of the system and we collect measurements to be used as training data. Machine-learning algorithms then look for signs that can be correlated with the drowsy person.”
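The workflow Srivatsa describes — collect measurements from reference subjects in known states, then learn a decision rule — can be illustrated with a deliberately simple sketch. PERCLOS (the fraction of time the eyes are mostly closed) is a widely used drowsiness feature; the data, function names and threshold fitting here are illustrative assumptions, and production systems use far richer features and models.

```python
# Toy version of the training workflow: derive a PERCLOS feature from
# eye-openness samples, then fit a decision threshold from labelled
# reference data. All numbers are illustrative.

def perclos(openness, closed_below=0.2):
    """Fraction of frames where eye openness (0=closed, 1=open) is low."""
    return sum(o < closed_below for o in openness) / len(openness)

def fit_threshold(labelled):
    """labelled: list of (perclos_value, is_drowsy) from reference subjects.
    Place the threshold midway between the highest alert score and the
    lowest drowsy score (assumes the two groups are separable)."""
    alert_max = max(p for p, drowsy in labelled if not drowsy)
    drowsy_min = min(p for p, drowsy in labelled if drowsy)
    return (alert_max + drowsy_min) / 2

# Illustrative reference data: (PERCLOS, known drowsy?)
training = [(0.05, False), (0.10, False), (0.35, True), (0.50, True)]
threshold = fit_threshold(training)           # 0.225

# Classify a new subject from a window of eye-openness measurements:
print(perclos([1.0, 0.9, 0.1, 0.05, 0.8]) > threshold)  # → True (drowsy)
```

The same pattern scales up: swap the single hand-crafted feature for many core signals (blink rate, head pose, pupil diameter) and the midpoint threshold for a trained machine-learning model.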
Srivatsa thinks the market will change very quickly as people get used to eye tracking and attention computing. User acceptance might follow a similar path to the uptake of voice-controlled systems.
“I was at Intel when we were looking at putting speech into computers in the early 2010s and we gave up because we said it’s too hard,” he says. “Besides, it was too gimmicky. And then we heard about Alexa. We assumed it would be a failure – and personally, I couldn’t think why anybody would want to talk to a device. And now my kids are doing it all the time.
“I expect eye tracking – and the broader attention computing – to take a similar path, becoming a part of everyday life before we know it.”