The Pixel 4's radar-powered Soli motion-sensing system now lets you play and pause music with a swipe, which might seem like a simple thing to implement. According to a new explainer post by Google, however, Soli doesn't work exactly how you might expect, and it took a lot of new deep learning tech and training to get it working.
Soli's short-range radar can essentially do two things: detect your presence to prepare the screen for face unlock, and read swiping or tapping gestures performed in the air above the phone. However, to make the antennas small enough for a smartphone, the radar is geared to detect motion rather than shapes.
That has the added advantage of privacy. Since Soli can't form a well-defined image, there's "no distinguishable image of a person's body or face" that can be generated or used by Google, the researchers wrote.
The challenge for Google was to quickly and accurately interpret these temporal motions in order to figure out what the user is doing. At the same time, it had to account for movement of the phone when you're walking, for instance, or vibrations when music is playing.
To do that, Soli sees the world as a sort of 3D graph, with the distance of the subject on the vertical axis, its velocity toward or away from the phone on the horizontal axis, and the brightness of each pixel representing the relative size of the object detected. The resulting data is shown above.
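Google's post doesn't include code, but the kind of distance-versus-velocity map described above can be sketched in a few lines. What follows is a minimal illustration, not Soli's actual pipeline: it assumes a radar that sends repeated chirps and applies a standard two-dimensional Fourier transform to the returns. The function names (`synth_target`, `range_doppler_map`) and the toy signal are hypothetical, chosen only for this example.

```python
import numpy as np

def synth_target(num_chirps, samples, range_bin, doppler_bin, amplitude=1.0):
    """Hypothetical toy signal: one reflector at a given distance and velocity.

    Within a chirp (fast time) the tone's frequency encodes distance;
    from chirp to chirp (slow time) its phase drift encodes velocity.
    """
    n = np.arange(samples)                 # fast-time sample index
    m = np.arange(num_chirps)[:, None]     # slow-time chirp index
    phase = range_bin * n / samples + doppler_bin * m / num_chirps
    return amplitude * np.exp(2j * np.pi * phase)

def range_doppler_map(frames):
    """Turn raw returns, shape (num_chirps, samples), into a 2D map.

    Rows are distance bins (vertical axis), columns are velocity bins
    (horizontal axis, zero velocity in the middle), and each value is
    the strength of the reflection (the pixel's brightness).
    """
    range_fft = np.fft.fft(frames, axis=1)             # fast time -> distance
    doppler_fft = np.fft.fft(range_fft, axis=0)        # slow time -> velocity
    doppler_fft = np.fft.fftshift(doppler_fft, axes=0) # center zero velocity
    return np.abs(doppler_fft).T

# One simulated reflector at distance bin 5, moving at velocity bin +3.
frames = synth_target(num_chirps=16, samples=32, range_bin=5, doppler_bin=3)
rd = range_doppler_map(frames)
row, col = np.unravel_index(np.argmax(rd), rd.shape)
print("distance bin:", row, "velocity bin:", col - rd.shape[1] // 2)
```

In this toy setup the brightest pixel lands at the simulated reflector's distance and velocity, and a stronger reflection (a larger `amplitude`) makes that pixel proportionally brighter. A real gesture would show up as a blob that shifts across the map from frame to frame, which is the temporal pattern the model has to interpret.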
Using that data, the researchers trained the system on millions of gestures recorded from thousands of volunteers. Those were compared against radar recordings to create AI models using Google's TensorFlow machine learning framework. The whole thing was optimized to run on the Pixel 4's custom digital signal processors at up to 18,000 frames per second.
While Soli can only interpret relatively simple gestures for now, Google has high hopes for what it'll do in the future. It believes the tech could one day be used on smaller devices like smartwatches, where gestures would be more useful, or put into service for security, entertainment, education and more.