AI might soon be an ally in the quest to banish wake words from voice assistants. Carnegie Mellon University researchers have developed a machine learning model that estimates the direction a voice is coming from, indicating your intent without the need for a special phrase or gesture. The approach relies on the inherent properties of sound as it bounces around a room.
The system recognizes that when you speak directly toward a device, the first sound to reach it is also the loudest and clearest. Anything else tends to be quieter, delayed and muffled. The model is also aware that human speech frequencies vary depending on the direction you’re facing: lower frequencies tend to be more omnidirectional, while higher frequencies are more directional.
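To make those two cues concrete, here is a minimal Python sketch. It is not the researchers' model: it is a toy illustration on synthetic signals, and the function names, the 2 ms window and the 1 kHz split are all assumptions chosen for the example. It measures how much of a sound's energy arrives with the first wavefront, and how much of its energy sits in higher frequencies.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate for this toy example


def first_arrival_share(signal, sample_rate, window_ms=2.0):
    """Fraction of total energy within window_ms of the first arrival."""
    energy = signal ** 2
    # First sample whose energy exceeds 1% of the peak energy.
    onset = int(np.argmax(energy > 0.01 * energy.max()))
    window = int(sample_rate * window_ms / 1000.0)
    return energy[onset:onset + window].sum() / (energy.sum() + 1e-12)


def high_to_low_energy_ratio(signal, sample_rate, split_hz=1000.0):
    """Ratio of spectral energy at or above split_hz to energy below it."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    high = power[freqs >= split_hz].sum()
    low = power[freqs < split_hz].sum()
    return high / (low + 1e-12)


def toy_arrivals(direct, echo_1, echo_2, n=1600):
    """Synthetic click: a direct arrival followed by two later reflections."""
    h = np.zeros(n)
    h[0], h[400], h[900] = direct, echo_1, echo_2
    return h


# Facing the device: strong, crisp direct sound and weak reflections.
facing = toy_arrivals(1.0, 0.3, 0.2)

# Facing away: weak direct sound, stronger reflections, and an 8-sample
# moving average as a crude stand-in for the "muffled" high frequencies.
away = np.convolve(toy_arrivals(0.2, 0.5, 0.4), np.ones(8) / 8)[:1600]

for name, sig in (("facing", facing), ("away", away)):
    print(name,
          first_arrival_share(sig, SAMPLE_RATE),
          high_to_low_energy_ratio(sig, SAMPLE_RATE))
```

In this toy setup, the "facing" signal scores higher on both measures. A real system would have to extract comparable cues from actual speech in actual rooms, which is presumably where the machine learning comes in.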
This method is “lightweight,” software-based and doesn’t require sending audio data to the cloud, the researchers added.
It could be a while before you see the technology in use, although the team has publicly released code and data to help others build on their work. It’s easy to see where this might lead, at least. You could tell a smart speaker to play music without using a wake word or setting off a horde of other connected devices. It might also help with privacy by requiring your physical presence while eschewing the need for gaze-detecting cameras. In other words, it would be closer to that Star Trek vision of voice assistants that always know when you’re talking to them.