Google's new text-to-speech service has more realistic voices

In all, the engine has 32 different voice options in 12 languages.

By Mallory Locklear March 27, 2018 1:17 pm EST

Google will now let developers use the text-to-speech synthesis that powers the voices in Google Assistant and Maps. Cloud Text-to-Speech is available now through the Google Cloud Platform, and the company says it can be used to power voice response systems in call centers, enable speech on IoT devices and convert media like news articles and books into a spoken format. There are 32 voice options across 12 languages, and users can customize pitch, speaking rate and volume gain.
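For developers curious what those three controls look like in practice, here is a minimal sketch of a request body for the service's REST `text:synthesize` method, written in Python with only the standard library. The helper name `build_synthesize_request` is hypothetical, and the numeric ranges it checks are an assumption based on the public API reference rather than anything stated in Google's announcement. Actually sending the request requires Google Cloud credentials, which the sketch leaves out.

```python
import json

# Public REST endpoint (v1). Authentication is not shown in this sketch.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, language_code="en-US", voice_name=None,
                             pitch=0.0, speaking_rate=1.0, volume_gain_db=0.0):
    """Build a JSON-serializable body for a text:synthesize call.

    Hypothetical helper for illustration. The limits below are assumed
    from the API reference and may change over time.
    """
    if not -20.0 <= pitch <= 20.0:
        raise ValueError("pitch is expected in semitones, from -20.0 to 20.0")
    if not 0.25 <= speaking_rate <= 4.0:
        raise ValueError("speaking_rate is expected from 0.25 to 4.0")
    if not -96.0 <= volume_gain_db <= 16.0:
        raise ValueError("volume_gain_db is expected from -96.0 to 16.0")

    voice = {"languageCode": language_code}
    if voice_name:
        # Optional: pick a specific voice instead of the language default.
        voice["name"] = voice_name

    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": "MP3",
            "pitch": pitch,
            "speakingRate": speaking_rate,
            "volumeGainDb": volume_gain_db,
        },
    }


if __name__ == "__main__":
    # Slightly higher, faster and quieter than the defaults.
    body = build_synthesize_request(
        "Hello from Cloud Text-to-Speech.",
        pitch=2.0, speaking_rate=1.25, volume_gain_db=-3.0,
    )
    print(json.dumps(body, indent=2))
```

The printed JSON would be sent as an authenticated POST to `SYNTHESIZE_URL`. The response carries the audio as a base64-encoded string, which a client decodes before saving or playing it.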

Additionally, some of the available voices were built with Google's WaveNet model, which was developed by Google's DeepMind team and first announced in 2016. Rather than stringing together fragments of recorded speech to make words, which often sounds very robotic, WaveNet generates the raw audio waveform itself, creating more natural-sounding speech. Google has since improved WaveNet, making it 1,000 times faster and able to generate higher-quality audio. In tests, listeners rated WaveNet voices 20 percent better than other generated voices, and their scores suggested that WaveNet reduces the quality gap between generated speech and human speech by about 70 percent.

Those who want to try out the voices can do so here. Pricing is available here.
