Google DeepMind's AI can mimic realistic human speech

It can also play music.
Mariella Moon, @mariella_moon
September 10, 2016
Image credit: JUNG YEON-JE/AFP/Getty Images

It's still pretty easy to tell whether it's a real person who's talking or a text-to-speech program. But there might come a time when a robot could dupe you into thinking that you're speaking with a real person, thanks to a new AI called WaveNet developed by Google's DeepMind team. They have a pretty good track record when it comes to building neural networks -- you probably know them as the folks who created AlphaGo, the AI that defeated one of the world's best Go players.

Currently, developers use one of two methods to create speech programs. The first, known as concatenative synthesis, stitches together words and speech fragments from a large collection recorded by a single person, which makes sounds and intonations hard to manipulate. The second, known as parametric synthesis, forms words electronically, depending on how they're supposed to sound. That makes things easier to tweak, but the results sound much more robotic.

To build a speech program that actually sounds human, the team fed the neural network raw audio waveforms recorded from real human speakers. Waveforms are the visual representations of the shapes sounds take -- those squiggly waves that squirm and dance to the beat in some media player displays. As such, WaveNet speaks by building the sound wave itself, one tiny slice at a time, rather than by stitching together prerecorded clips. (By the way, the AI also has a future in music. The team fed it classical piano pieces, and it came up with some interesting samples on its own.)
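For the curious, here's a toy sketch of what "forming a sound wave piece by piece" can look like in code. It is not DeepMind's model: where WaveNet uses a trained neural network to predict each new slice of audio, this example swaps in a fixed math rule that happens to produce a pure tone. The function name, the 440Hz pitch and the 16,000-samples-per-second rate are all assumptions made for illustration.

```python
import math

def generate_tone(freq_hz=440.0, sample_rate=16000, n_samples=160):
    """Build a waveform one sample at a time (hypothetical helper)."""
    w = 2 * math.pi * freq_hz / sample_rate
    # Seed the wave with the first two samples of a sine tone.
    samples = [0.0, math.sin(w)]
    coeff = 2 * math.cos(w)
    while len(samples) < n_samples:
        # Stand-in for the neural network: a fixed rule that computes
        # the next sample from the two samples that came before it.
        samples.append(coeff * samples[-1] - samples[-2])
    return samples[:n_samples]

wave = generate_tone()
print(len(wave), "samples generated")
```

The point is only the loop: every new sample depends on the ones already produced, which is the general idea behind generating audio directly as a waveform.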

When used as a text-to-speech program, it transforms the text you type into a series of phonemes and syllables, which it then voices out. Subjects who took part in blind tests in English and Mandarin Chinese thought WaveNet's results sounded more human than the other methods'. In the AI's announcement post, DeepMind said it can "reduce the gap between the state of the art and human-level performance by over 50 percent" based on those experiments. You don't have to take the team's word for it: A WaveNet-powered app is still a long way off, but you can listen to some samples on DeepMind's website.
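If the "gap" claim sounds abstract, a quick back-of-the-envelope calculation shows how a figure like that can be worked out. The scores below are made up for illustration and are not DeepMind's results; the sketch simply assumes listeners rate each voice on a 1-to-5 scale and compares how far each system falls short of real human speech.

```python
def gap_reduction(baseline, new_system, human):
    """Fraction of the baseline-to-human gap closed by the new system."""
    gap_before = human - baseline    # how far the old method trails humans
    gap_after = human - new_system   # how far the new method trails humans
    return (gap_before - gap_after) / gap_before

# Hypothetical listener ratings, NOT figures from DeepMind's tests.
reduction = gap_reduction(baseline=3.9, new_system=4.3, human=4.6)
print(f"Gap closed: {reduction:.0%}")
```

With these invented numbers, the older method trails human speech by 0.7 points and the new one by 0.3, so a little more than half of the gap is gone.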
