IBM inches toward human-like accuracy for speech recognition

The company has reached a 5.5 percent word error rate that's nearly on par with humans.

Wachiwit via Getty Images

The tech world has spent years trying to create speech recognition software that listens as well as humans. Now, IBM says it's achieved a 5.5 percent word error rate, down from its previous record of 6.9 percent -- an industry milestone that could eventually lead to improvements in voice assistants like Siri and Alexa.
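For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only; the function name and sample sentences are made up for illustration, and it is not IBM's scoring pipeline, which would also apply text normalization not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not any vendor's scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion: reference word missed
                curr[j - 1] + 1,                  # insertion: extra hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six in the reference.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

By this definition, a 5.5 percent word error rate means roughly 5.5 errors for every 100 words in the reference transcript.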

Microsoft claimed to reach a 5.9 percent word error rate last October using neural language models resembling associative word clouds. At the time, the company believed 5.9 percent was equivalent to human parity. But IBM says it's not popping the champagne yet. "As part of our process in reaching today's milestone, we determined human parity is actually lower than what anyone has yet achieved -- at 5.1 percent," George Saon, IBM principal research scientist, wrote in a blog post this week.

IBM reached the 5.5 percent milestone by combining Long Short-Term Memory (LSTM), a type of artificial neural network, and WaveNet language models with three strong acoustic models. The system was then measured using the "SWITCHBOARD" corpus, a collection of telephone conversations that's been used as a benchmark for speech recognition software for decades. SWITCHBOARD is not the industry standard for measuring human parity, however, which makes breakthroughs harder to achieve.

"The ability to recognize speech as well as humans do is a continuing challenge, since human speech, especially during spontaneous conversation, is extremely complex," said Julia Hirschberg, a professor and Chair at the Department of Computer Science at Columbia University, in a statement to IBM. "It's also difficult to define human performance, since humans also vary in their ability to understand the speech of others."

