voice recognition

Latest

Hitting the Books: Voice-controlled AI copilots could lead to safer flights
by
Andrew Tarantola
10.15.2023
With an AI riding shotgun, the pilots of tomorrow will have fewer minutiae to split their attention between while in the air.
Apple TV devices now recognize up to six different voices
by
Jon Fingas
12.14.2022
Apple TV 4K and HD now recognize your voice to customize your Siri searches.
Spotify is testing a new car mode focused on voice commands
by
Amrita Khalid
03.25.2022
The company retired its old Car View mode last year.
Microsoft is reportedly close to buying speech tech giant Nuance
by
Jon Fingas
04.11.2021
Microsoft is reportedly in late talks to buy Nuance for $16 billion, giving it advantages in speech tech and AI.
Mercedes' new touchscreen controls eliminate 27 physical buttons
by
Christine Fisher
07.08.2020
Mercedes' second-gen MBUX system will replace 27 physical buttons with touchscreen, voice, gesture and gaze controls.
Android TV may soon recognize your exact voice
by
Jon Fingas
06.14.2020
Android TV code hints at Voice Match coming to Google's living room OS, letting it deliver more personal material when you speak.
Lexus’s first autonomous EV has drones and ‘artificial muscle technology’
by
Christine Fisher
10.23.2019
Lexus is finally ready to unveil its first electric vehicle prototype. At the Tokyo Motor Show today, it pulled back the curtain on its LF-30 Electric Concept, its vision for the next generation of EVs.
Google trains its AI to accommodate speech impairments
by
Christine Fisher
05.07.2019
For most users, voice assistants are helpful tools. But for the millions of people with speech impairments caused by neurological conditions, voice assistants can be yet another frustrating challenge. Google wants to change that. At its I/O developer conference today, Google revealed that it's training AI to better understand diverse speech patterns, such as impaired speech caused by brain injury or conditions like ALS.
Spotify tests voice-enabled ads in the US
by
Amrita Khalid
05.02.2019
If you're a free user on Spotify's streaming music service, you may hear an ad soon that asks you to respond verbally. Spotify has started testing voice-enabled ads on a small number of free subscribers in the US. The voice-enabled ads will only be deployed to users who already have their microphone permissions turned on to use Spotify's voice search feature, the company confirmed to Engadget on Thursday.
Google's real-time speech recognition AI can run offline on Pixel
by
Amrita Khalid
03.12.2019
You can now dictate your texts with Google's Gboard keyboard even when you're offline, at least if you use a Pixel. Google's AI team announced that it updated the Gboard's speech recognizer to recognize characters one-by-one as they're spoken, and it is now hosted directly on the device. By no longer having to send data over the internet, Gboard's voice typing should now be faster and more reliable. Google explained in a blog post that it wanted to create a speech recognizer that was "compact enough to reside on a phone" and wouldn't be derailed by unreliable WiFi or mobile networks.
ICYMI: Password via voice recognition, drone delivery & more
by
Kerry Davis
07.30.2015
Today on In Case You Missed It: Customers at the Netherlands ING Bank can now check their account balance by saying "my voice is my password." A delivery company named Workhorse is testing out a parcel delivery service with drones, from a base at the tops of delivery vans. And Microsoft researchers have outlined how to record content viewable with HoloLens and a very odd assortment of characters are ready to entertain you.
Amazon is offering Echo voice tech to other manufacturers
by
Edgar Alvarez
06.25.2015
Now that Amazon's voice-controlled Echo speaker is available to everyone, the company is hinting at third-party devices that will make use of the same voice tech that powers the Echo's built-in assistant, "Alexa." Additionally, Amazon is giving developers access to the Alexa Skills Kit, a free SDK that will make it easy for them to create new features for the Echo platform. Lastly, the company launched the Alexa Fund, a $100 million endowment designed to support developers, manufacturers and startups who are interested in making voice-powered products for its ecosystem. Amazon says it will decide who gets to be a part of it based on the technology's ability to influence the Alexa Skills Kit or the Alexa Voice Service. What this tells us, though, is that Amazon is getting serious about what appeared to be a simple side-project from the beginning.
Sainsbury's teams up with Google to stop you wasting food
by
Matt Brian
06.05.2014
It turns out that us Brits are a wasteful bunch. Studies suggest we're throwing out as much as £60 worth of food and drink each month when we could be putting it to better use. Instead of trying to convince you to head over to one of its stores to replenish your supplies, Sainsbury's has teamed up with Google to create a tool that provides suggestions on how to use the food you'd otherwise be chucking out. It's called Food Rescue, and Google plays a small but vital role in proceedings by lending the same voice-recognition tech that powers its search engine to the supermarket's new mobile and online tool. When you visit the website, you can say (or type) what foodstuffs you have and it'll find a range of recipes that use those ingredients. In a bid to get more people involved, the supermarket chain will record the weight of food rescued and calculate the money saved in each recipe. That information will then be added to a real-time leaderboard of top 'rescuers' across the UK. You say tomato, I say tomato, it'll still work the whole thing out.
Crowdfunding Roundup: Too many projects, too little time
by
Steve Sande
06.04.2014
Every week, TUAW provides readers with an update on new or significant crowdfunded Apple-related projects in the news. While our policy is to not go into detail on items that haven't reached at least 80 percent of their funding goal, this update is designed to give readers a heads-up on projects they might find interesting enough to back. Wow, as usual we received notification of a lot of crowdfunding projects that are currently underway. Due to a lack of time this week, we're only going to cover a few. Let's start with a look at Kickstarter projects that are underway: Have you ever wanted to record a call off of your iPhone for future reference? Maybe you're talking with your lawyer, or perhaps it's an interview that you want to go back and reference later. That's the idea behind RECAP USB, a new version of Igor Ramos' RECAP devices. You can just plug this device into your iPhone on one end, your Mac on the other end, and then use GarageBand or Audacity to record those calls. With 50 days to go, RECAP USB is already over 15 percent funded. Do you know what a slider is? In the food world, it's a small hamburger, but for videographers a slider allows a camera to move horizontally during a shot to provide a more compelling view of a subject. The Extralite G2 Camera Slider project seeks funding to create a family of sliders that will work with everything from an iPhone to your DSLR. It's starting a bit slowly; at this point, with 29 days to go, the project is only about 7 percent funded. You can change that with your support. Hey, it's another tablet stand! This one not only has an odd name -- Scööb -- but also interesting functionality. It's foldable for extra portability, and adjusts to a number of different angles. I think this may be one case where the low cost of the device -- CAD$3 -- is going to make it difficult for the team to reach its goal of CAD$5000. It's 19 percent funded at this point.
Remember the Pluggy Lock from a few weeks ago? It's a tiny lanyard lock for your iPhone that plugs into the headphone jack. Unsurprisingly, it's funded to the tune of 459 percent with three weeks to go! There's still time for you to back the project and get one of these innovative little devices before you lose your iPhone. During this week's WWDC Keynote, we heard about "Hey, Siri", a feature of iOS 8 that will let you interact with Siri without the need for you to press a button or raise your phone to your head. But what about being able to say "Hey, Homey, I want to watch Star Trek" and then have lights turned down, blinds closed, a TV turned on, and Plex searched for the movie? That's the idea behind Homey, a home automation tool that uses voice recognition and apps to control your home. It's already funded at 159 percent of goal with just over three weeks to go. And one quick project from Indiegogo. Remember that little puck-type printer that crawled around a piece of paper to print stuff? There's somebody else trying the idea now. PPrintee wants to fit in your pocket and self-drive on a piece of paper of any size, leaving a trail of print behind it. It's just .2 percent funded with about a month to go, but all it takes is some deep-pocketed individual to throw a few hundred thousand bucks in the direction of the PPrintee team and this is reality. And that, my friends, is all we have time for this week. Be sure to drop by next week for our continuing coverage of the world of crowdfunded projects that have some relevance to Apple fans. If you're aware of any other crowdfunded Apple-related projects, be sure to let us know about them through the Tip Us button at the upper right of the TUAW home page for future listing on the site. Thanks again to reader Hal Sherman, our faithful provider of tips about Kickstarter and Indiegogo projects!
Xbox One's May update to add audio options for apps, chat
by
Thomas Schulenberg
05.03.2014
Participants in the Xbox One's early access program can expect a new update to arrive sometime this week, while general users will see it sometime in May. Larry "Major Nelson" Hryb's post explains that with the update, snapped apps will soon be manageable with a sound mixer found in the Settings menu, which will allow users to adjust volume levels for apps independently. The same functionality will allow users to tweak volume levels while using the Kinect for chatting. The update will also allow users to opt into allowing their speech data to be collected, which Major Nelson states will "be used for product improvement only." Users will be able to toggle their related permission by visiting the Settings menu, selecting Privacy & Online Safety, heading into Customizing Privacy and Online Safety, and setting Share Voice Data to "Allow." The post explains that having additional voice samples for the software's algorithms would help improve the Kinect's responsiveness, but if you'd rather let Microsoft smooth the kinks out on their own, that's definitely an option as well.
Dragon Dictate 4 released today with new features and speed enhancements
by
Mel Martin
03.04.2014
I've been a longtime fan of Dragon products, which seem to be at the very top of the line of speech recognition applications. I used an earlier version of Dragon Dictate to write large portions of a book, and I frequently use it for email as well as general control of Safari by voice. Nuance Communications has released Dragon Dictate version 4 today. The new version now includes the features of what used to be a separate product called MacSpeech Scribe. This functionality, which was introduced in the previous version of Dragon Dictate, has been considerably enhanced in function and performance. You can play a sound file into the application, and it creates a profile for the voice, which then results in a pretty accurate transcription of what was said. Many other voice recognition apps require an Internet connection because the processing is being done on a remote server. Dragon Dictate is all Mac-based, so you don't need WiFi or a data connection to let Dragon do its magic. I have spent about a week using DD4 around my home office. I noticed right off the bat that it is faster than version 3.5, and control of other applications like Apple Mail and Safari is smoother. Version 4 has also added precise control of Gmail, making creating, editing and sending a message a completely hands-off affair. Apple's Pages is also directly supported, so you can do all your formatting of text and speak other commands that would otherwise require mouse moves to a menu. It should be noted, however, that Pages 5.1 reduced support for AppleScript, so you don't get quite the range of options that you did with earlier versions. Big mistake, Apple. As a test of transcription, I downloaded a couple of podcasts. You point the Dragon application to an audio file, and it starts taking in sounds to turn into text. You then highlight a sentence of the converted sound file and make any corrections to the text. If needed, it's possible to play the audio file to hear what actually was said.
Dragon needs about 60 seconds of corrected text to create a conversion profile, after which you play the sound file and the transcript appears on your screen faster than real time. I played a 15-minute sound file into Dragon Dictate and it had the transcript ready in about 5 minutes. Things are surprisingly accurate -- better than 99% in my tests -- but one issue is there are no automatic paragraph breaks, which makes for a pretty large chunk of text to navigate. I have suggested to the Nuance folks that the app should automatically insert paragraphs based on pauses in speech, counting sentences, or every 20 seconds or so. Transcription is a great feature for students who want to preserve a lecture, or anyone wanting to turn recorded speech into editable text. The microphone needs to be pretty close to the speaker though, as you won't capture usable audio at a distance. Supported audio files include .mp3, .aif, .aiff, .wav, .mp4, .m4a, and .m4v. One nice way to record audio is to use the voice memo app that comes with iOS. When the recording is complete, email the file to yourself and let DD4 transcribe it. I tried that with a 40-second file and the transcription was perfect. Are all transcriptions mistake-free without editing? No, but Dragon Dictate 4 sure beats hours of typing. You will usually have to make some corrections. Of course Apple has long been involved in text-to-speech, and Mavericks has a built-in dictation function. It does need an internet connection, but you can download a large file that will allow local processing of speech just like Dragon does. Although neither company admits it, it's likely that Siri and Apple's OS X dictation are really Nuance products. Apple dictation is not nearly as powerful as the Dragon Dictate product, but it works well for basic dictation. If you want to dictate to your Mac while also controlling various apps without ever picking up a mouse, Dragon Dictate is the app for you.
There is a certain joy and freedom that comes with seeing your words accurately appear on screen. I also love using Safari with voice only, initiating Google searches, clicking on links by voice only, and scrolling pages up and down. Dragon Dictate 4 requires an Intel Core Duo CPU running at 2.4 GHz or faster. The app currently supports both OS X Mountain Lion and Mavericks. A headset/microphone is included with purchase, but I did fine training the app with my Blue desk microphone. On my Mac laptop, I did quite well using the built-in microphone. By the way, this review was mostly written using Dragon Dictate 4 by dictating directly into our content editor on Safari. Making hyperlinks still involves using a mouse for part of the work. Dragon Dictate 4 sells for US$199.00. An upgrade from version 3 or 3.5 is $99.00 during this month. After that, upgrade pricing for previous owners with a valid install of Dragon Dictate or MacSpeech Scribe is $149.00. Dragon Dictate 4 can be bought directly from Nuance or resellers.
Meet Siri's great grandfather
by
Mike Wehner
02.17.2014
Siri is the most recognizable face -- er, voice -- of Apple's faux sentient virtual assistants these days, but let's take a moment to remember one of the company's best efforts at voice recognition from yesteryear. He didn't ever seem to have a name aside from simply "Macintosh," but he lived in the corner of your screen and could do a number of nifty things, including opening files and shutting down the computer. For a look at the little man in action, check out this Macintosh ad from the mid-nineties, and be sure to listen for the short, robotic "Goodbye" at the end. You may even gain a greater appreciation for the faceless assistant that lives inside your iPhone.
ZTE Grand S II has a smooth look and clever customizable voice recognition (hands-on)
by
Brad Molen
01.06.2014
ZTE brought a handful of products to show off at CES 2014, and they range from your run-of-the-mill flagship to projectors and smartwatches. Leading the pack is the Grand S II, an obvious follow-up to the Grand S announced at last year's January extravaganza. As the company announced this morning, the sequel features a 5.5-inch 1080p display, Snapdragon 800, 13MP rear camera and a respectable 3,000mAh battery. ZTE hasn't made any official announcements on when and where it plans to launch the device, but we're not holding our breath for US availability. We were able to spend a few minutes with the new Grand S, and it's just as fast as we'd come to expect from a Snapdragon 800 device; it features ZTE's custom skin on Android 4.3, which certainly takes a little getting used to. But one thing that really aroused our curiosity is its voice control capabilities, which are completely customizable and can recognize multiple voices. Just like the Moto X, you can completely unlock the screen verbally; no training is necessary, and ZTE tells us that it's even possible to add your own trigger phrases. The phone was able to recognize voices and process commands with only a short delay, and we imagine this will continue to improve as the company works on the finishing touches. The device has a removable plastic back which looks a lot like brushed metal, and it looks smooth and classy. However, while ZTE couldn't confirm what kind of plastic it's using, we weren't terribly convinced that it'll stand the test of time. Just adding a little bit of pressure on the back resulted in a hefty amount of creaking, and while a little bit of give can actually be beneficial if you're hard on the phone, this offered generous amounts of it. We'll have a hands-on video from the show floor tomorrow, but enjoy our image gallery in the meantime.
Apple adds internet-free dictation to Mavericks
by
Mel Martin
11.05.2013
There hasn't been any fanfare about this, but if you are using Mavericks, you now have a universal dictation feature that doesn't require an internet connection. Dictation made its debut in Mountain Lion and required an internet connection so your speech could be processed on an Apple server and sent back to you as text. There were limits, and you had to pause to let your data get to Apple and back again. It worked, but it wasn't very effective. Mavericks users will now find what Apple calls "enhanced dictation," and if you turn it on, you'll need to download about an 800 MB file that makes your speech recognition local rather than server based. No training is needed; once you have the files, it works right away, and I found the recognition quick and accurate. As you talk, you see your words appear on the screen in near real time, something not possible with the internet-based method. If you liked the old method, it still works, and you can choose to use enhanced dictation or not. As a default, you start dictation by hitting the Fn key twice, but you can choose your own keys if you don't like that choice. You don't have all the fancy commands and capabilities that you'd get in one of the Dragon products, but for a quick email or word processing, it works fine. You can edit your text normally with your mouse or trackpad. This is a very nice feature that hasn't had much publicity, but it's free and works well. Give it a try if you have a need to dictate.
How Siri gained its voice
by
Mel Martin
10.01.2013
The Verge has a terrific article about voice synthesis and speech recognition that gives some interesting insights into how Siri and other digital voice assistants work. Although not officially acknowledged by Apple, Siri is based on technology from Nuance, the folks behind Dragon Dictate for Mac. Nuance also offers the free Dragon Dictation and Dragon Go! for iOS. Nuance licenses speech-recognition and voice-synthesis technology to many software companies, and has made some dramatic breakthroughs that are being used extensively in the medical field. While Siri's voice isn't quite as good as the HAL 9000 in the movie 2001, it is getting close. In iOS 7, you can choose to have a male or female voice, and Apple has added more languages. Most voice synthesis starts with a human reading sounds, which are then taken in by a computer. It's not a matter of reading every possible word, but having a catalog of sounds, called phonemes, that can be used to construct new words. If you used one of the Dragon Dictation products, you see the process in reverse. You read a story composed of various words into the computer, and the computer learns not just the words but key parts of speech that can be used to understand words not in the story. It's complex, and requires intensive processing. With a product like Dragon Dictate, your computer does the processing. With Siri and other smartphone assistants, like Google Search, the computing is done not on your device, but on powerful servers in the cloud. To keep speech from sounding robotic, computer voices now have inflection, rising at the end of sentences where appropriate, but following a set of rules so the style and tone of speech match the context. It isn't perfect, but Siri sounds a lot better than the computer voices of 10 years ago, and Siri does sound more natural in iOS 7. The next few years are likely to show even more progress. Better recognition, more realistic voices and faster processing will be rapidly coming.
I find Siri a bit half-baked at times, with server time-outs or bafflingly inaccurate recognition. Still, a feature like Siri was unthinkable on a phone just a few short years ago, and the best is yet to come.