Image credit: BabelOn

BabelOn is trying to create Photoshop for your voice

One day, an app could translate your words into another language -- in your own voice.

Speech synthesis -- the process of artificially creating the human voice -- isn't anything new. But a San Francisco startup called BabelOn is working on an unusual offshoot of this technology. In a nutshell, BabelOn wants to make it a trivial matter to translate your own voice into another language, even if you don't speak that language yourself. The company says its combo of software and custom-built hardware can analyze what makes up your voice and then use that to recreate speech that sounds just like you, in a language of your choosing.

Initially, the company wants to use its technology for things like improving dubbed films or localizing video games, but eventually it wants to be able to translate your speech in real time, say while you're on a Skype call. Microsoft has done this for a while, translating Skype voice calls on the fly, but BabelOn promises that its translations will sound like you, not an anonymous Siri- or Cortana-like digital voice.

It's an intriguing idea, but let's be clear: It's very early days for BabelOn. We haven't seen the software in action, and the company hasn't booked a client yet. The company is in negotiations with a video game developer to use BabelOn for translating a forthcoming title, but the deal's not done yet. There's promise here but also plenty of potential pitfalls, not the least of which is the idea of someone's voice being "stolen" and used in a way she didn't consent to.

Though BabelOn isn't ready just yet, the idea behind it has existed since 2004. Co-founder Daisy Hamilton's parents had noticed a demand for better language dubbing in the film industry. They received a patent for the core technology behind BabelOn, but the rest of the technology they needed to make this vision a reality wasn't around yet.

Now, though, the surrounding technologies and hardware are sophisticated enough that BabelOn can begin to put its idea into practice. The core part of the process is creating a BabelOn Language Information Profile, or BLIP. Over the course of about two hours in the company's San Francisco studio, an individual's BLIP is created by having them read specific texts in a variety of emotional states.

But BabelOn doesn't just capture the sound of a voice. Hamilton described it as looking at your body as an instrument. BabelOn's custom hardware can capture and analyze your breath, how your voice comes out of your chest and throat, how your mouth moves, and a variety of other key factors. "It's both visual and vocal feedback that's captured into a single continuous stream," Hamilton said.

Once recorded, BabelOn will be able to take your voice and translate it into other languages and replicate the corresponding emotion that a script calls for, without you needing to go out and record entirely new dialogue. Imagine a game company wanting to localize an English voice-acting performance for other countries; BabelOn could let companies use the same voice actor and digitally create her dialogue rather than having to find a native speaker to rerecord the entire script.

To start, the company is focusing on English, French, Spanish, German, Portuguese, Mandarin, Japanese and Hindi, with additional languages coming down the line based on demand. But it's important to note that you can't just type words in English into a computer and have BabelOn do both the voice creation and translation: It needs to be provided with a specific script or input in the language you're looking to translate to. However, you can specify the desired emotional output of the translated performance; Hamilton called it an "emotional markup language."

As for the hardware itself, it was developed in partnership with the Lawrence Livermore National Laboratory, a federal institution focused on developing science and technology. It's actually a variation on hardware that's been in use by the US Department of Defense for unrelated applications. Hamilton didn't offer up many other details, but eventually the company hopes to set up multiple studios in locations beyond San Francisco.

Hamilton said it takes a few hours to fully process a script and output it in another language. But with further work and processing improvements, she envisions the system working in near-real time. That's something that would greatly expand BabelOn's capabilities beyond films and games. Doing a video call that gets translated almost instantly with your own voice could make multi-language conversation a lot more personal and expressive.

But the idea of taking BabelOn to consumers brings up a major security challenge. If the technology to create a BLIP becomes more commonplace and the translation software is used in more applications, it's easy to imagine voice data being an appealing target for hackers who want to literally put words in someone's mouth. Hamilton noted that the company has an ethics board to head off potential misuse, but that doesn't solve the security challenge of keeping your voice safe.

Hamilton addressed those concerns, noting that BabelOn will "use a highly encrypted offline voice vault to store all of the BLIP, which would be curated upon request of the [original] speaker." Offline storage would certainly make this harder to break into, and Hamilton also noted that BLIPs would have a reference visual cue that indicates when voices and languages have been altered. It's still not clear how this will scale if the service becomes popular, but it's something BabelOn is aware of. "Security of BLIPs is massively important to us, as we'd never want to threaten someone's vocal authenticity," she said.

Security is the kind of challenge that could keep BabelOn from ever being something consumers can use. For people recording dialogue in a movie or game, their BLIP could be destroyed when the work is done. But a tool that can capture and then create language using someone's voice in real time is basically unheard of and something that could be a huge target for hackers.

BabelOn's introduction to the public is via an Indiegogo campaign -- a strange choice given that the technology isn't directed at consumers. Hamilton said its purpose is to get funds to extend a software license the company needs to finish its own work. But she also stressed that they have backup plans in place if the campaign doesn't meet its goal. "It's just as much about using Indiegogo as a launch pad to put BabelOn out in the world," Hamilton said.

Hamilton hopes and expects that BabelOn will have its first client soon. If it can get a video game made with BabelOn, it'll give the company a concrete example of its technology to court other clients and push development forward -- but until then, we're still in the theoretical realm. It's way too early to know whether this technology will take off with the movie and game companies BabelOn is targeting, let alone whether we'll see it in consumer-focused products some years down the line.
