SpeechRecognition

Latest

  • Google's real-time speech recognition AI can run offline on Pixel

    by Amrita Khalid
    03.12.2019

    You can now dictate your texts with Google's Gboard keyboard even when you're offline, at least if you use a Pixel. Google's AI team announced that it has updated Gboard's speech recognizer to recognize characters one by one as they're spoken, and that the recognizer now runs directly on the device. Because it no longer has to send data over the internet, Gboard's voice typing should be faster and more reliable. Google explained in a blog post that it wanted to create a speech recognizer that was "compact enough to reside on a phone" and wouldn't be derailed by unreliable WiFi or mobile networks.

  • AI can identify objects based on verbal descriptions

    by Jon Fingas
    09.18.2018

    Modern speech recognition is clunky and often requires massive amounts of annotations and transcriptions to help understand what you're referencing. There might, however, be a more natural way: teaching the algorithms to recognize things much like you would a child. Scientists have devised a machine learning system that can identify objects in a scene based on their description. Point out a blue shirt in an image, for example, and it can highlight the clothing without any transcriptions involved.

  • Voice assistants still have problems understanding strong accents

    by Jon Fingas
    07.19.2018

    Cultural biases in tech aren't just limited to facial recognition -- they crop up in voice assistants as well. The Washington Post has partnered with research groups on studies showing that Amazon Alexa and Google Assistant aren't as accurate at understanding people with strong accents, no matter how fluent their English might be. People with Indian accents were at a relatively mild disadvantage in one study, but the overall accuracy went down by at least 2.6 percent for those with Chinese accents, and by as much as 4.2 percent for Spanish accents. The gap was particularly acute in media playback, where a Spanish accent might net a 79.9 percent accuracy rate versus 91.8 percent for an Eastern US accent.

  • Google voice recognition could transcribe doctor visits

    by Jon Fingas
    11.22.2017

    Doctors work long hours, and a disturbingly large part of that is documenting patient visits -- one study indicates that they spend 6 hours of an 11-hour day making sure their records are up to snuff. But how do you streamline that work without hiring an army of note takers? Google Brain and Stanford think voice recognition is the answer. They recently partnered on a study that used automatic speech recognition (similar to what you'd find in Google Assistant or Google Translate) to transcribe both doctors and patients during a session.

  • IBM inches toward human-like accuracy for speech recognition

    by Stefanie Fogel
    03.10.2017

    The tech world has spent years trying to create speech recognition software that listens as well as humans. Now, IBM says it's achieved a 5.5 percent word error rate, down from its previous record of 6.9 percent -- an industry milestone that could eventually lead to improvements in voice assistants like Siri and Alexa.

  • Facebook's new mobile AI can process video in real time

    by Steve Dent
    11.08.2016

    Facebook has started rolling out its "Caffe2Go" AI platform that does advanced style transfer video effects in real time using only your iOS or Android smartphone's horsepower. While the painterly effects are cool (see the video, below), the tech behind it is much more interesting. Deep learning normally requires content "be sent off to data centers for processing on big-compute servers," Facebook wrote, but with Caffe2Go, the processing can be done "in the palm of your hand."

  • Microsoft's speech recognition engine listens as well as a human

    by Andrew Tarantola
    10.18.2016

    When humans try to transcribe a spoken conversation all in one go, they manage to miss 5.9 percent of what they hear on average. Microsoft announced on Tuesday that, for the first time, it has managed to get a computer to perform that same transcription task just as well as a person. "We've reached human parity," Microsoft's chief speech scientist, Xuedong Huang, said in a statement.

  • Personal assistants are ushering in the age of AI at home

    by Mona Lalwani
    10.05.2016

    Google Home is the latest embodiment of a virtual assistant. The voice-activated speaker can help you make a dinner reservation, remind you to catch your flight, fire up your favorite playlist and even translate words for you on the fly. While the voice interface is expected to make quotidian tasks easier, it also gives the company unprecedented access to human patterns and preferences that are crucial to the next phase of artificial intelligence. Comparing an AI agent to a personal assistant, as most companies have been doing of late, makes for a powerful metaphor. It is one that is indicative of the human capabilities that most major technology companies want their disembodied helpers to adopt. Over the last couple of years, with improvements in speech-recognition technology, Siri, Cortana and Google Now have slowly learned to move beyond the basics of weather updates to take on more complex responsibilities like managing your calendar or answering your queries. But products that invade our personal spaces -- like Amazon's Echo and Google Home -- point to a larger shift in human-device interaction that is currently underway.

  • Hello Barbie has some career advice for your child

    by Mona Lalwani
    09.17.2015

    There's a new Barbie on the block. She's chatty and she comes with a charging station. She's dressed in a cropped, metallic leather jacket, dark skinny jeans and a white sweater vest with the word "HELLO" printed thrice on the front. Within seconds of switching her on, her chunky necklace lights up as it searches for a WiFi connection. When the LED goes from red to green, you know she's ready to play. A shiny, round belt buckle doubles as a button. You press it down to activate speech recognition for your child's first two-way conversation with the iconic, inanimate doll.

  • Amazon's Echo lets you control iTunes, Pandora and Spotify with your voice

    by Jon Fingas
    01.31.2015

    If you accepted an invitation to buy Amazon's Echo speaker, you've noticed that the device didn't have a vast musical vocabulary at first -- you could tell it to play iHeartRadio or Prime Music tunes, and that's about it. You'll have a better time of things from now on, though. Amazon is rolling out an update that lets you use your voice to steer iTunes, Pandora radio or Spotify on your mobile device. It's not super-sophisticated, but you no longer have to reach for your phone just to skip tracks. And in case millions more songs won't keep you entertained, there's also a "Simon says" command that you can use to prank people (or simply tell them something) from across the home. We'd argue that the biggest upgrade to the Echo would be getting to buy one, but these new features will do in a pinch.

  • Windows 10's browser reportedly lets you search with your voice (update: pics)

    by Jon Fingas
    01.08.2015

    Windows 10's oft-reported Spartan web browser may not just be a leaner, fresher substitute for Internet Explorer -- it could have a few tricks up its sleeve, too. Sources for The Verge claim that Microsoft's voice-guided Cortana assistant will be present both in the OS and in Spartan -- much like Chrome's "OK Google" feature, you can reportedly open a new browser tab and ask Cortana to look something up, whether it's a website or your flight itinerary. There's also talk of pen-based annotations for websites that you can share with others through the cloud.

  • Facebook just bought a speech recognition company

    by Jon Fingas
    01.05.2015

    Facebook is clearly eager to check out new interface concepts these days. Just months after its acquisition of Oculus VR wrapped up, it's buying a speech recognition company, Wit.ai. The social network isn't saying just what it plans to do with its new purchase, but Wit.ai's focus has been on a platform for voice-guided natural language interfaces. It's not a stretch to see Facebook giving you ways to dictate your status updates or chats. Also, voice recognition is particularly important for virtual reality, where you can't always reach for a keyboard -- this may play an important role in Oculus' immersive experiences going forward.

  • Google's short film examines the science of voice recognition

    by Mariella Moon
    10.18.2014

    People used to think it's harder to make computers play chess (or Jeopardy) and do mathematics than it is to make them understand human language. Turns out the opposite is true -- yes, engineers have made great advancements in voice recognition (Siri and Google voice commands are perfect examples), but they've yet to create a system that can speak with us like another human can. Google's documentary (after the break) talks about the beginnings of voice recognition, the current state of language understanding, as well as the future of artificial neural network technology, which can be used to improve both. The main goal of scientists and engineers is to make computers reach human levels of language understanding, but whether that'll ever happen remains to be seen.

  • GM wants voice-controlled cars that learn what you really mean

    by Jon Fingas
    07.15.2014

    Voice control is easy to find in cars, but it's not always intuitive. You often have to use specific syntax, which might be hard to remember when you're barreling down the highway. GM may have a smarter approach in store, though. The Wall Street Journal understands that the automaker is working with machine learning firm VocalIQ on an "advanced voice-control system" that would let you control navigation, wipers and other car components in a more intuitive way.

  • Apple buys tech that could take Siri offline

    by Sharif Sakr
    04.04.2014

    Apple has sort-of-confirmed that it recently snapped up another small company, called Novauris. The firm specializes in speech recognition and has historical ties to the core technology and patents underpinning Siri. TechCrunch reports that Novauris' experts are already working inside Apple to improve its voice assistant, but no one really knows exactly what they're up to. One of Novauris' big strengths has been locally processed recognition, which doesn't rely on distant servers, so it's possible that Apple wants Siri to accomplish more without a data connection. (Apple's current Siri partner, Nuance, can also do offline processing, but Apple hasn't been able to bring that technology in-house.) We're just speculating, of course, but this is a function that no voice assistant has really mastered so far (although others are definitely working on it), and it's even more important now that iOS is getting into the car.

  • Tell Gmail what to do with the latest Dragon Dictate for Mac

    by Timothy J. Seppala
    03.05.2014

    Let's face it: not everyone uses Nuance's Dragon Dictate software to power a ridiculously automated dorm room; the less creative among us have had to get by using it to take notes or write term papers. No matter what you do with it, however, you might appreciate that the latest Mac version of the app lets you use your mouth instead of your fingers to write emails and navigate your inbox -- so long as you're using Firefox or Safari to access Gmail. You can tell the Mac's word processor, Pages, what to do too. Beyond that, Dictate will now also transcribe single-speaker recordings from either a smartphone or a digital voice recorder (including .mp3 and .wav files), and, what's more, it apparently boasts improved voice recognition accuracy. The suite is $200 directly from the developer should you want to give your hands a rest, or perhaps you just really like hearing the sound of your own voice.

  • Google Chrome can listen in on your conversations (but it probably isn't)

    by Ben Gilbert
    01.22.2014

    Google Chrome users are no strangers to speech recognition software -- heck, the internet browser has "Ok Google!" voice recognition built right into its URL navigation bar. But that recognition is triggered to "listen" only when you've opened a new tab or navigated to Google's homepage, and the expectation is that the browser isn't able to listen in otherwise. Not so, says speech recognition program developer Tal Ater, who discovered an exploit in Chrome's speech recognition that enables unscrupulous websites with speech recognition software to listen in when users aren't expecting it. First, you have to give permission to a website to allow speech recognition to work. After that, however, the website may open a pop-under window with the intent of secretly continuing to listen -- even if you've closed the tab and moved on. Google Chrome must remain running, and you have to miss seeing the pop-under, but it's certainly an issue. Moreover, Google knows of the problem and has yet to fix it... despite a fix existing. Ater describes reporting the issue to Google, finding out that the company fixed it soon after, and then seeing that fix left out of subsequent updates. Google confirmed that to Engadget with the following statement: "The security of our users is a top priority, and this feature was designed with security and privacy in mind. We've re-investigated and this is not eligible for a reward, since a user must first enable speech recognition for each site that requests it. The feature is in compliance with the current W3C specification, and we continue to work on improvements." Given Google's compliance with speech recognition standards, it sounds like Mountain View isn't changing the way Chrome's speech software works just yet, though we'd be surprised if some form of visual indication of recording wasn't included in a future build. A video of Ater demonstrating the exploit is just below.

  • Intel reportedly acquires Indisys, gets an edge in natural language recognition (update: official)

    by Jon Fingas
    09.13.2013

    Intel is quickly transforming its dream of perceptual computing into reality: the company will soon ship motion control technology, and it acquired the gesture interface firm Omek back in July. The chip giant may not be done yet, as there are reports from Spain that it has acquired Indisys, a small natural language recognition company. Details of the buyout are scarce, but the move would give Intel its own voice control software; it wouldn't have to license code from third parties like Nuance. We've reached out to Intel to confirm the acquisition. If real, the Indisys takeover might have come at just the right time -- Intel is swinging its attention to wearables, and voice control is now more of a necessity than a luxury. Update: Intel just confirmed to us that it acquired Indisys on May 31st, and that the deal has already closed.

  • Facebook to acquire speech recognition startup Mobile Technologies

    by Alexis Santos
    08.12.2013

    Facebook may not seem like an obvious match for a machine translation company, but it's just agreed to snatch up speech recognition startup Mobile Technologies to strengthen its chops in the area. If you're not familiar with the outfit, they're the minds behind the Android and iOS app Jibbigo, which translates your text or dulcet tones into other languages. While Zuckerberg and Co. haven't revealed precise plans for the freshly acquired firm, they note that the voice tech factors into their long-term plan for the web and mobile devices. "Voice technology has become an increasingly important way for people to navigate mobile devices and the web, and this technology will help us evolve our products to match that evolution," said Tom Stocky, Facebook's Director of Product Management. "We believe this acquisition is an investment in our long-term product roadmap as we continue towards our company's mission." There's no word on whether Jibbigo will still receive support once Mobile Technologies joins the social network in Menlo Park, but we've gotten in touch with the team to find out.

  • Dragon Mobile Assistant 4 for Android adds driving mode, voice notifications

    by Jon Fingas
    06.18.2013

    For Nuance, it's not enough that Dragon Mobile Assistant spares Android users from pecking at the keyboard -- with the app's new 4.0 upgrade, those users can sometimes avoid contact altogether. Dragon Mobile Assistant can now detect when you're in a moving car and automatically invoke a Driver Mode that relies solely on voice recognition and feedback, keeping your focus on the road. Accordingly, the upgrade builds in spoken notifications for inbound calls, messages, upcoming meetings and Facebook updates. There's also voice-aware email and customizable wake-up commands. All told, 4.0 is a big boost for Android fans who see touchscreens as old hat; if you do, you can grab the update shortly (if not already) through Google Play.