
For VR to be truly immersive, it needs convincing sound to match

Personalized 3D audio delivers the best results.

By Mona Lalwani Jan. 22, 2016 10:00 am EST

I'm staring at a large iron door in a dimly lit room. "Hey," a voice says, somewhere on my right. "Hey buddy, you there?" It's a heavily masked humanoid. He proceeds to tell me that my sensory equipment is down and will need to be fixed. Seconds later, the heavy door groans. A second humanoid leads the way into the spaceship where my suit will be repaired.

Inside a wide room with bright spotlights I notice an orange drilling machine. "OK, before we start, I need to remove the panel from the back of your head," says the humanoid. I hear the whirring of a drill behind me. I squirm and reflexively raise my shoulders. The buzzing gets louder, making the hair on the nape of my neck stand up.

Then I snapped out of it. I pulled off the Oculus Rift DK2 strapped to my face and the headphones pressed against my ears, and I was back on the crowded floor of the Consumer Electronics Show in Las Vegas. But for a few terrifying seconds, the realistic audio in Fixing Incus, a virtual reality demo built on the RealSpace 3D audio engine, had tricked my brain into believing a machine was pulling nails out of the back of my head.

"There's a little map in your brain even when you're not seeing the objects," says Ramani Duraiswami, professor of computer science at the University of Maryland and co-founder of VisiSonics, the startup that licensed its RealSpace 3D audio technology to Oculus in 2014. "If the sound is consistent with geometry, you'll know automatically where things are even if they're not in your view field."

The premise of VR is to create an alternate reality, but without the right audio cues to match the visuals, the brain doesn't buy into the illusion. For the trickery to succeed, the immersive graphics need equally immersive 3D audio that replicates the natural listening experience.

There are a couple of ways to capture and play back 3D audio. Binaural recordings, made with a dummy head that has microphones in place of ears, create a clear distinction between sounds on the left and sounds on the right. Musicians like Beck and Björk have experimented with the format, and since 2010 a YouTube community has been using binaural audio to record whispers and hair snipping, a brain trick that has reportedly helped some of its followers overcome insomnia and anxiety. Creators of live-action 360-degree video, meanwhile, have been toying with "ambisonics," a technique that uses a spherical microphone to capture a sound field in all directions, including above and below the listener.
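Ambisonics stores a sound field as a small set of direction-encoding channels rather than as feeds for particular speakers. As a rough illustration, the sketch below encodes a single mono sample arriving from a given azimuth and elevation into first-order B-format (W, X, Y, Z), assuming the traditional FuMa convention in which W is scaled by 1/√2. The function name is hypothetical; the newer AmbiX format orders and scales the channels differently, and production 360-video pipelines often use higher orders.

```python
import math

def encode_first_order(sample: float, azimuth_deg: float, elevation_deg: float):
    """Encode one mono sample into first-order B-format (FuMa order: W, X, Y, Z).

    Azimuth is counterclockwise from straight ahead (0 = front, 90 = left);
    elevation is positive above the horizon.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                 # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)    # front/back figure-of-eight
    y = sample * math.sin(az) * math.cos(el)    # left/right figure-of-eight
    z = sample * math.sin(el)                   # up/down figure-of-eight
    return w, x, y, z

# A source directly overhead lands only in W and Z, which is how
# ambisonics represents sounds above the listener.
w, x, y, z = encode_first_order(1.0, azimuth_deg=0.0, elevation_deg=90.0)
print(f"W={w:.3f} X={x:.3f} Y={y:.3f} Z={z:.3f}")
```

Because every direction maps to the same four channels, the field can later be rotated or decoded to whatever playback system is available.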

But in simulated VR, such as gaming, where the visual setting is predetermined, 3D audio is best created by a rendering engine that can attach sounds to objects as they move through the scene. That's how a drilling machine that's out of sight can feel like a torture tool when it's at the back of your head.

This object-based audio technique uses software to attach audio cues to objects and characters in 3D space. It isn't a new invention, though. Dolby Laboratories uses the same approach in Atmos, a four-year-old adaptive sound technology that brought immersive audio to cinemas.

Back in the '70s, when Dolby first launched a multiple-speaker setup called surround sound, the technology was based on fixed audio channels: audio was directed to speakers placed at prescribed locations. So if someone in a scene screamed on the right side of the screen, the sound was sent to a speaker on that side of the theater, or even of the living room. This changed the way people experienced movies at home.

Indeed, movie audio has been mixed specifically this way for decades now. While that approach fueled the rise of 5.1 and 7.1 home theater systems, it didn't always work in cinemas whose speakers weren't placed in the prescribed locations. For movie theaters, then, Dolby needed a more flexible format.

"The content creators wanted more freedom in terms of where to place the sound. They didn't want to think in terms of channels," says Joel Susal, director of Dolby's virtual reality and augmented reality business. "Dolby Atmos gives sound designers a 3D canvas to design a soundscape that they want." It offers object-based audio that isn't tied down to fixed speakers. It's also a scalable technology, meaning it can be used for movie theaters, home speaker systems and even headphones. And while Atmos's flexible sound environment was intended for movie theaters, its immersive capabilities also make it a natural fit for VR.
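To make the channel-versus-object distinction concrete: a channel-based mix fixes the speaker feeds at mixing time, while an object-based mix ships each sound with position metadata and leaves it to a renderer to compute speaker gains for whatever layout is actually present. The sketch below shows one common way such a renderer can work, pairwise constant-power panning between the two loudspeakers that bracket the object's direction (a 2D form of vector base amplitude panning, or VBAP). It is not Dolby's Atmos renderer, whose internals are proprietary; the function names and layouts are illustrative assumptions.

```python
import math

def unit(az_deg: float) -> tuple[float, float]:
    """Unit vector for an azimuth in degrees (0 = front, positive = left)."""
    a = math.radians(az_deg)
    return math.cos(a), math.sin(a)

def vbap_2d(source_az: float, speaker_azs: list[float]) -> list[float]:
    """Return one gain per speaker for an audio object at source_az degrees."""
    n = len(speaker_azs)
    order = sorted(range(n), key=lambda i: speaker_azs[i] % 360.0)
    gains = [0.0] * n
    px, py = unit(source_az)
    for k in range(n):
        i, j = order[k], order[(k + 1) % n]          # adjacent speakers around the circle
        (ax, ay), (bx, by) = unit(speaker_azs[i]), unit(speaker_azs[j])
        det = ax * by - ay * bx
        if abs(det) < 1e-12:
            continue                                 # skip collinear pairs
        # Solve p = g_i * a + g_j * b for (g_i, g_j) with Cramer's rule.
        gi = (px * by - py * bx) / det
        gj = (ax * py - ay * px) / det
        if gi >= -1e-9 and gj >= -1e-9:              # pair encloses the source
            norm = math.hypot(gi, gj)                # constant-power normalization
            gains[i], gains[j] = gi / norm, gj / norm
            return gains
    raise ValueError("no speaker pair encloses the source direction")

# The same object metadata rendered to two different layouts.
layout_51 = [30.0, -30.0, 0.0, 110.0, -110.0]  # L, R, C, Ls, Rs (LFE omitted)
print([round(g, 3) for g in vbap_2d(70.0, layout_51)])      # between L and Ls
print([round(g, 3) for g in vbap_2d(10.0, [30.0, -30.0])])  # plain stereo pair
```

The mix itself never names a speaker; only the renderer knows the room, which is what makes the format scalable from theaters down to home systems.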


With more players now tackling the problem of 3D audio, everyday consumers might soon have the chance to experience it for themselves. But a challenge lingers: maintaining the cues the brain needs to localize sound, so the illusion stays intact. Human ears pick up audio in three dimensions, and the brain processes multiple cues to spatialize that sound. One of the most basic is proximity. The ear closer to the source picks up the sound wave before the other, so there's a small gap between its arrival at one ear and at the other. The difference in distance also changes the level at each ear. Together, these timing and level differences help the brain pinpoint where a sound is coming from, at least from side to side.
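The timing cue can be estimated with Woodworth's classic rigid-sphere approximation, ITD = (r / c)(θ + sin θ), where r is the head radius, c the speed of sound and θ the source's lateral angle away from straight ahead. The sketch below assumes an average head radius of 8.75 cm and c = 343 m/s; both constants and the function name are illustrative, and it covers only the timing cue, not the frequency-dependent level differences.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C
HEAD_RADIUS = 0.0875    # m, an assumed average; real heads vary

def woodworth_itd(azimuth_deg: float, radius: float = HEAD_RADIUS,
                  c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference in seconds for a distant source on the horizontal plane.

    Azimuth is counterclockwise from the front (90 = left). Positive results
    mean the sound reaches the left ear first.
    """
    # Fold rear directions onto their front mirror image: 150 deg -> 30 deg.
    lateral = math.asin(math.sin(math.radians(azimuth_deg)))
    return (radius / c) * (lateral + math.sin(lateral))

for az in (0, 30, 90, 150):
    print(f"{az:>3} deg: {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

Note that 30 and 150 degrees produce the same delay: timing alone cannot tell a sound ahead of you from one behind you.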

But these cues don't work equally well in every direction. A sound directly in front of you and one directly behind you, for instance, reach both ears at the same moment and at the same level, so the brain finds them ambiguous. To resolve the confusion, it relies on the way the outer ears, head, neck and shoulders color a sound before it reaches the eardrum, a coloring that changes with the direction the sound comes from. This direction-dependent filtering is described by a head-related transfer function (HRTF), which has now become the linchpin of personalized immersive audio.
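In the time domain, an HRTF for one direction is a pair of head-related impulse responses (HRIRs), one per ear. Rendering a virtual source then amounts to convolving a dry mono signal with the left and right HRIRs for the desired direction. The sketch below uses made-up three-tap filters purely for illustration; measured HRIRs typically run to a few hundred taps, and the helper names are hypothetical.

```python
def convolve(signal: list[float], kernel: list[float]) -> list[float]:
    """Full linear convolution: len(out) == len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def render_binaural(mono: list[float], hrir_left: list[float],
                    hrir_right: list[float]) -> tuple[list[float], list[float]]:
    """Filter one mono source through an HRIR pair to get (left, right) ear signals."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Hypothetical HRIRs for a source on the listener's left: the left ear hears
# it at once and louder; the right ear hears it one sample later, quieter and
# with a different spectral shape.
HRIR_LEFT = [0.9, 0.2, 0.05]
HRIR_RIGHT = [0.0, 0.5, 0.15]

click = [1.0, 0.0, 0.0, 0.0]
left, right = render_binaural(click, HRIR_LEFT, HRIR_RIGHT)
print("left: ", left)
print("right:", right)
```

Swapping in another listener's HRIRs changes every output sample, which is why a generic set can mislead some ears.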

Capturing a person's HRTFs is the equivalent of taking a fingerprint. Everyone's ears are unique, so the imprint one person's anatomy leaves on a sound differs from anyone else's. That's why generic binaural recordings made with a dummy head don't have the same effect on everyone, and why they don't always work for VR either.

To solve the VR audio problem, scientists have been experimenting with ways to measure each person's audio modifications so that the brain can localize simulated sounds with pinpoint precision. So far, the norm has been to place small microphones inside the ears to pick up those modifications while a technician plays a sound from a specific point in space. The catch is that each measurement covers only one position. To cover an entire soundscape, the speaker would need to be placed in hundreds of different spots and the sound recorded at each one. As you can imagine, the technique is tedious and can take hours. But VisiSonics, the startup behind the Oculus Rift's audio technology, found a solution: swap the speakers and the microphones.

At the company's research lab at the University of Maryland, there's a sound booth covered in 256 tiny, disc-shaped microphones. The researchers place an earbud-shaped speaker inside your ear that plays the sound of birds. The chirping hits the microphone array, which records how your anatomy has modified it. Unlike other testing methods, which measure each possible location one by one, VisiSonics's patented technology picks up all the audio cues simultaneously, allowing it to measure a person's unique audio imprint within seconds. "We can [do this] in a lab but we want to ... set it up in every Best Buy," says CEO Gregg Wilkes.
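The reversed setup relies on acoustic reciprocity: the path from a speaker at the ear to a microphone out in the booth behaves like the path from a speaker at that microphone's position back to the ear. One played signal therefore yields a response at every microphone at once, and each channel's impulse response can be recovered by deconvolving its recording against the known excitation. The sketch below shows only that last step, using exact time-domain deconvolution on noise-free toy data; the names and numbers are hypothetical, and practical systems typically use sweeps or noise with regularized frequency-domain division, because exact deconvolution is sensitive to noise.

```python
def convolve(x: list[float], h: list[float]) -> list[float]:
    """Full linear convolution, used here only to simulate each microphone's recording."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def deconvolve(recording: list[float], excitation: list[float], n_taps: int) -> list[float]:
    """Recover h from recording = convolve(excitation, h) by exact long division.

    Assumes excitation[0] != 0 and noise-free data; not robust for real measurements.
    """
    if excitation[0] == 0:
        raise ValueError("excitation must start with a nonzero sample")
    h = [0.0] * n_taps
    for n in range(n_taps):
        acc = recording[n]
        # Remove the contribution of taps already solved for.
        for k in range(1, min(n, len(excitation) - 1) + 1):
            acc -= excitation[k] * h[n - k]
        h[n] = acc / excitation[0]
    return h

def measure_all_channels(recordings: list[list[float]], excitation: list[float],
                         n_taps: int) -> list[list[float]]:
    """One played signal, many microphones: deconvolve every channel independently."""
    return [deconvolve(rec, excitation, n_taps) for rec in recordings]

# Toy demo: a made-up excitation and three made-up responses standing in
# for three of the booth's microphones.
excitation = [1.0, -0.5, 0.25, 0.1]
true_responses = [
    [0.8, 0.1, 0.0, -0.05],
    [0.0, 0.6, 0.2, 0.0],
    [0.0, 0.0, 0.4, 0.3],
]
recordings = [convolve(excitation, h) for h in true_responses]
recovered = measure_all_channels(recordings, excitation, n_taps=4)
for true_h, est_h in zip(true_responses, recovered):
    print([round(v, 6) for v in est_h], "vs", true_h)
```

Scaling from three channels to 256 changes only the length of the list, which is the point of measuring every direction in one shot.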

A bespoke 3D audio experience feels like putting on prescription glasses for fuzzy eyesight. Unlike conventional stereo, which sounds as if it's trapped inside your headphones, personalized sound seems to come from far enough outside your head that you forget you have a headset on. This kind of realism is essential to VR, but apart from a few musical experiments and obscure art projects, 3D audio has largely lived inside research labs for the last few decades.

Like older binaural recordings, the newer 3D audio format works best over headphones; it doesn't translate easily into a realistic soundscape over speakers. And without a head-tracker, the listener has to sit fairly still to stay inside the illusion. These restrictions have kept the technology from reaching movie watchers at home. Now, with VR headsets about to hit shelves, immersive audio is moving to the forefront of sound technologies. At CES earlier this month, Sennheiser brought out a suite of 3D audio technologies called Ambeo, which included a VR microphone that captures ambisonics and an upmix algorithm that converts stereo tracks into a high-quality 9.1 sound experience.
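Head tracking is what lets headphone audio stay put in the world. Each time the headset reports a new orientation, a renderer can subtract the head's rotation from every source direction before picking HRTFs, so a voice on your right stays there even as you turn toward it. A minimal yaw-only sketch follows; the function name is illustrative, and real systems also handle pitch and roll, usually with quaternions or rotation matrices.

```python
def relative_azimuth(source_az_deg: float, head_yaw_deg: float) -> float:
    """Direction of a world-fixed source relative to where the listener faces.

    Both angles are counterclockwise in degrees (0 = front, 90 = left);
    the result is wrapped to the range (-180, 180].
    """
    rel = (source_az_deg - head_yaw_deg) % 360.0
    return rel - 360.0 if rel > 180.0 else rel

# A voice fixed 90 degrees to the listener's right while they turn toward it.
for yaw in (0.0, -45.0, -90.0):
    print(f"head yaw {yaw:6.1f} -> source at {relative_azimuth(-90.0, yaw):6.1f} deg")
```

Without this step, the voice would swing around with the listener's head, which is exactly why untracked binaural audio asks the listener to sit still.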

Innovation in this space isn't limited to big audio companies either. Ossic, a San Diego-based startup, has a set of 3D audio headphones, which will make their debut on Kickstarter next month. The company claims to have sensors that can automatically calibrate the headphones to your ears for personalized audio. In addition to the hardware, Ossic also has a rendering engine that creates object-based sound for virtual experiences like the HTC Vive demo Secret Shop, which relies heavily on audible cues to guide the viewer through the game.

Despite the high-quality demos available now, audio for VR is still a work in progress. Still, it's the combination of 3D audio and head-tracking that makes VR feel complete. Without accurate audio cues, if you strap on a headset and look in one direction, you run the risk of missing the humanoid on your right. "Audio, from an evolutionary perspective, is the thing that makes you turn your head quickly when you hear a twig snap behind you," says Susal. "It's very common that people put on the headset and don't even realize they can look around. You need techniques to nudge people to look where you want them to look, and sound is the thing that has nudged us as humans as we've evolved."

[Image credit: VisiSonics]
