Speaking another language may be getting easier. Google is showing off Translatotron, a first-of-its-kind translation model that can directly convert speech from one language into another while maintaining a speaker's voice and cadence. The tool forgoes the usual chain of transcribing speech to text, translating that text, and synthesizing speech again, a process in which errors can pile up along the way. Instead, the end-to-end technique translates a speaker's voice directly into another language. The company hopes the work will serve as a starting point for future research on direct speech-to-speech translation.
According to Google, Translatotron uses a sequence-to-sequence network model that takes a voice input, processes it as a spectrogram (a visual representation of a signal's frequencies over time) and generates a new spectrogram in the target language. The result, Google says, is a much faster translation with less likelihood of something getting lost along the way. The tool also has an optional speaker encoder component, which helps preserve the speaker's voice. The translated speech is still synthesized and sounds a bit robotic, but it can retain some elements of the speaker's voice. You can listen to samples of Translatotron's attempts to maintain a speaker's voice as it completes translations on Google Research's GitHub page. Some are certainly better than others, but it's a start.
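For readers curious what "processing speech as a spectrogram" means in practice, here is a minimal sketch in plain Python. It is not Google's code and says nothing about how Translatotron itself computes its inputs; the function names, the frame length of 64 samples, the hop of 32 samples, and the 1 kHz test tone are all assumptions chosen for illustration. It slices a signal into overlapping frames and measures how much energy each frame holds at each frequency.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to taper the edges of each frame."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of magnitudes per frame, for bins 0..frame_len // 2.

    frame_len and hop are illustrative defaults, not values from any
    real speech system.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive discrete Fourier transform for bin k.
            # A production system would use an FFT instead.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical input: a pure 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

spec = spectrogram(tone)
# Find the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

Each row of `spec` is one slice of time and each column is one frequency band, which is why a spectrogram can be drawn as an image. For the test tone, the strongest bin in the first frame works out to 1000 Hz.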
Google has been fine-tuning its translations in recent months. Last year, the company introduced accents in Google Translate, letting it speak a variety of languages with region-based pronunciations, and added more languages to its real-time translation feature. Earlier this year, Google Assistant got an "interpreter mode" for smart displays and speakers that can translate between 26 languages.