Image credit: Christine Daniloff/MIT

AI can identify objects based on verbal descriptions

It could lead to much smarter translation.

Modern speech recognition is clunky and often requires massive amounts of annotations and transcriptions to understand what you're referencing. There might, however, be a more natural way: teaching the algorithms to recognize things much as you would teach a child. Scientists have devised a machine learning system that can identify objects in a scene based on a spoken description. Mention a blue shirt while describing an image, for example, and it can highlight the clothing without any transcriptions involved.

The team started with an existing approach where two neural networks process the images and audio spectrograms, learning to match an audio caption with images containing a given object. However, they modified the image-handling neural network so that it would split the image into a grid of cells, while the audio network cuts up the spectrogram into short (1-2 second) snippets. After pairing the right image and caption, the training process scores the AI system based on how well the audio segments match objects in the cell grids. Effectively, it's like telling children what they're looking at by pointing at objects and naming them.
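To make the grid-and-snippet matching concrete, here is a minimal sketch in plain Python. It is not the researchers' code: it assumes the two networks have already turned each image cell and each audio snippet into a small embedding vector, and the scoring rule (take the best-matching cell for each snippet, then average over snippets) is an illustrative assumption, as are all the function names and the toy numbers.

```python
from typing import List, Tuple

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] is the embedding of one image cell


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def match_map(cells: Grid, snippets: List[Vector]) -> List[List[List[float]]]:
    """Similarity of every audio snippet to every grid cell.

    Returns scores[t][row][col] for snippet t and the cell at (row, col).
    """
    return [[[dot(cell, snip) for cell in row] for row in cells]
            for snip in snippets]


def pair_score(cells: Grid, snippets: List[Vector]) -> float:
    """Assumed image/caption score: best cell per snippet, averaged over snippets."""
    scores = match_map(cells, snippets)
    best_per_snippet = [max(max(row) for row in grid) for grid in scores]
    return sum(best_per_snippet) / len(best_per_snippet)


def best_cell(cells: Grid, snippet: Vector) -> Tuple[int, int]:
    """Grid coordinates (row, col) of the cell most similar to one snippet."""
    scored = [(dot(cell, snippet), (r, c))
              for r, row in enumerate(cells)
              for c, cell in enumerate(row)]
    return max(scored, key=lambda pair: pair[0])[1]


# Toy data (made up): a 2x2 image grid and two audio snippets, 2-D embeddings.
cells = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 0.0], [0.5, 0.5]]]
snippets = [[1.0, 0.0], [0.0, 2.0]]

if __name__ == "__main__":
    print(pair_score(cells, snippets))    # overall image/caption match
    print(best_cell(cells, snippets[0]))  # cell to highlight for snippet 0
```

In a setup like this, training would push `pair_score` up for correctly paired images and captions and down for mismatched ones, while `best_cell` plays the role of "pointing" at the region a given stretch of speech refers to.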

The system has a number of possible uses, but the researchers are most enamored with the potential for translation. Rather than asking a bilingual annotator to make the connections, you could have people speaking different languages describe the same thing -- the system could assume that one description is a translation of the other. That could make speech recognition viable for many more languages than just the roughly 100 that have enough transcriptions for the old-fashioned method.

All products recommended by Engadget are selected by our editorial team, independent of our parent company. Some of our stories include affiliate links. If you buy something through one of these links, we may earn an affiliate commission.