speech to text

Latest

iOS 8 real time voice-to-text feature looks incredible
by
John-Michael Bond
07.28.2014
One of the downsides of using iOS' dictation feature for taking notes is having to wait until your note is done to see whether your message was accurately transcribed. While iOS 7 features real-time transcription in Siri, the feature has yet to make it to iOS messaging. That's going to change with iOS 8. In the recently released Beta 4 for iOS 8, Apple debuted its new voice-to-text feature that allows you to see your message in real time as it's being transcribed. You can see the feature below in this video from MacRumors. Overall, this should lead to an easier-to-use and less frustrating dictation experience on iOS devices. Fewer errors mean less time spent editing your messages, one of the few things that currently make dictation slower than typing on your phone. With Apple's push for CarPlay ramping up as more cars become compatible, improving voice dictation for messaging could help the company out in the long run, especially as more states outlaw texting while driving.
Dragon Dictate 4 released today with new features and speed enhancements
by
Mel Martin
03.04.2014
I've been a longtime fan of Dragon products, which seem to be at the very top of the line of speech recognition applications. I used an earlier version of Dragon Dictate to write large portions of a book, and I frequently use it for email as well as general control of Safari by voice. Nuance Communications has released Dragon Dictate version 4 today. The new version now includes the features of what used to be a separate product called MacSpeech Scribe. This functionality, which was introduced in the previous version of Dragon Dictate, has been considerably enhanced in function and performance. You can play a sound file into the application, and it creates a profile for the voice, which then results in a pretty accurate transcription of what was said. Many other voice recognition apps require an Internet connection because the processing is being done on a remote server. Dragon Dictate is all Mac-based, so you don't need WiFi or a data connection to let Dragon do its magic. I have spent about a week using DD4 around my home office. I noticed right off the bat that it is faster than version 3.5, and control of other applications like Apple Mail and Safari is smoother. Version 4 has also added precise control of Gmail, making creating, editing and sending a message a completely hands-off affair. Apple's Pages is also directly supported, so you can do all your formatting of text and speak other commands that would otherwise require mouse moves to a menu. It should be noted, however, that Pages 5.1 reduced support for AppleScript, so you don't get quite the range of options that you did with earlier versions. Big mistake, Apple. As a test of transcription, I downloaded a couple of podcasts. You point the Dragon application to an audio file, and it starts taking in sounds to turn into text. You then highlight a sentence of the converted sound file and make any corrections to the text. If needed, it's possible to play the audio file to hear what actually was said.
Dragon needs about 60 seconds of corrected text to create a conversion profile, after which you play the sound file and the transcript appears in faster than real time on your screen. I played a 15 minute sound file into Dragon Dictate and it had the transcript ready in about 5 minutes. Things are surprisingly accurate -- better than 99% in my tests -- but one issue is there are no automatic paragraph breaks, which makes for a pretty large chunk of text to navigate. I have suggested to the Nuance folks that the app should automatically insert paragraphs based on pauses in speech, counting sentences, or every 20 seconds or so. Transcription is a great feature for students who want to preserve a lecture, or anyone wanting to turn recorded speech into editable text. The microphone needs to be pretty close to the speaker though, as you won't capture usable audio at a distance. Supported audio files include .mp3, .aif, .aiff, .wav, .mp4, .m4a, and .m4v. One nice way to record audio is to use the voice memo app that comes with iOS. When the recording is complete, email the file to yourself and let DD4 transcribe it. I tried that with a 40 second file and the transcription was perfect. Are all transcriptions mistake-free without editing? No, but Dragon Dictate 4 sure beats hours of typing. You will usually have to make some corrections. Of course Apple has long been involved in text-to-speech, and Mavericks has a built-in dictation function. It does need an internet connection, but you can download a large file that will allow local processing of speech just like Dragon does. Although neither company admits it, it's likely that Siri and Apple's OS X dictation are really Nuance products. Apple dictation is not nearly as powerful as the Dragon Dictate product, but it works well for basic dictation. If you want to dictate to your Mac while also controlling various apps without ever picking up a mouse, Dragon Dictate is the app for you. 
There is a certain joy and freedom that comes with seeing your words accurately appear on screen. I also love using Safari with voice only, initiating Google searches, clicking on links by voice only, and scrolling pages up and down. Dragon Dictate 4 requires an Intel Core Duo CPU running at 2.4 GHz or faster. The app currently supports both OS X Mountain Lion and Mavericks. A headset/microphone is included with purchase, but I did fine training the app with my Blue desk microphone. On my Mac laptop, I did quite well using the built-in microphone. By the way, this review was mostly written using Dragon Dictate 4 by dictating directly into our content editor on Safari. Making hyperlinks still involves using a mouse for part of the work. Dragon Dictate 4 sells for US$199.00. An upgrade from version 3 or 3.5 is $99.00 during this month. After that, upgrade pricing for previous owners with a valid install of Dragon Dictate or MacSpeech Scribe is $149.00. Dragon Dictate 4 can be bought directly from Nuance or resellers.
Apple seeks patent for hearing aids that deliver speech at an even keel
by
Jon Fingas
08.23.2012
Although they're called hearing aids, they can sometimes be as much of a hindrance as a help. Catch an unfamiliar accent and the attention might be on just parsing the words, let alone moving the conversation forward. Apple is applying for a patent on a technique that would take the guesswork out of listening by smoothing out all the quirks. The proposed idea would convert speech to text and back, using the switch to remove any unusual pronunciation or too-quick talking before it reaches the listener's ear. Not surprisingly for a company that makes phones and tablets, the hearing aid wouldn't always have to do the heavy lifting, either: iOS devices could handle some of the on-the-fly conversion, and pre-recorded speech could receive advance treatment to speed up the process. We don't know if Apple plans to use its learning in any kind of shipping product, although it's undoubtedly been interested in the category before -- and its ambitions of having iPhone-optimized hearing aids could well get a lift from technology that promises real understanding, not just a boost in volume.
Mountain Lion 101: Dictation
by
Steve Sande
07.25.2012
What can I say about my love of Mountain Lion's new Dictation feature? I've wanted to be able to talk and have my words transcribed to text ever since I saw the original "Assignment: Earth" episode of Star Trek back in 1968 (image at top of post). That's actress Teri Garr talking to a typewriter, and it's transcribing her words. Now it's finally happening, and I think that's pretty cool. I know that a lot of people are unimpressed by the dictation capabilities of Mountain Lion, the iPhone 4S, and the third-generation iPad, but I'm one of those people who is both blessed with a voice that seems to be made for Siri (the brains behind Dictation) and who has practiced dictating to my Mac and iOS devices. Unlike Rich Gaywood, who stated in his big Mountain Lion review that Dictation was having trouble cutting through his Welsh accent, I seem to be having very few problems. As you'd expect, I am dictating this post on my Mountain Lion-equipped MacBook Air. By default, Dictation is turned on in Mountain Lion. To shut it off permanently or change other settings, use the new Dictation & Speech pref in System Preferences. With the pref it's possible to select the microphone used by Dictation, set the key(s) to press to activate Dictation (by default, you press the fn key on your keyboard twice), or learn more about Dictation and privacy. That last feature comes courtesy of a button on the bottom of the preference pane. Click it, and you're basically told that anything you dictate is recorded and sent to Apple to convert into text. That's right; it won't work without a live Internet connection. The Apple privacy statement also says that your computer will send Apple "other information, such as your first name and nickname; and the names, nicknames, and relationship with you (for example, "my dad") of your address book contacts." Enough about the preferences panel. Let's talk about how accurate dictation really is.
If I stop and think about what I'm trying to say to my Mac, and then speak clearly and a little bit slowly, then the accuracy rate is almost 100 percent. On the other hand, if I just start talking and stumble over what I'm saying, my accuracy suffers. Don't expect to be able to talk to your Mac for an hour and have a perfectly-typed term paper ready to submit at the end. Dictation works in 30-second chunks; any more than that and it will chime to let you know that it's done. I've found that the response time for Dictation is very fast compared to that on the iPhone 4S and third-generation iPad. In our book, "Talking to Siri", Erica Sadun and I discuss ways of improving accuracy of Siri dictation. We also talk about how to add caps and punctuation to your dictation, but you'll find that some of those commands don't work quite the same in Mountain Lion. For example, it was previously possible to say "My cat is named cap emerald" to have Siri type out "My cat is named Emerald." You no longer need to say "cap" to get Dictation to capitalize the proper name. However, none of the capitalization commands work any more. Likewise, spacing commands -- "space" and "no space" -- that used to add or eliminate spaces between words no longer work. All punctuation commands seem to be enabled from the testing I've been able to do. Dictation is one of those Mountain Lion features that you're either going to love or hate -- I'm not sure there's much of an in-between. Personally, I find it to be extremely useful, especially in combination with Messages. There's nothing more satisfying than tapping the function key twice, dictating a quick response to my wife, and then getting back to work. I'd suggest to anyone who is upgrading to Mountain Lion to at least give Dictation a try. You might find out that it works better than you think.
BMW's 3 and 7 Series to be the first with Nuance's Dragon Drive! Messaging aboard
by
Edgar Alvarez
07.09.2012
It somehow feels like it was only yesterday that Nuance unveiled its Dragon Drive! creation to the world, hoping to in the process make drivers' lives easier by delivering a fresh eyes / hands-free messaging system inside connected cars. Unfortunately, back then the savvy company didn't announce any partnerships with auto manufacturers -- still, we had a feeling it wouldn't be too long before one of them would want to come along for the voice dictation ride. The good news is, that's about to change pretty soon. Per the outfit itself, BMW's decided to bring the Dragon Drive! tech to its 2012 7 Series later this month, with the 3 Series Touring and the eco-friendly 3 Series ActiveHybrid expected to get it "later this year." Notably, Dragon Drive! will offer multi-language support, including English, Spanish, Italian, French and German. There's no word yet on just how much the fee for the service will be, but we do know those who land themselves one of these new Beemers will get a two-month trial to take Dragon Drive! for a quick spin.
AT&T Translator app hands-on: smashing the language barrier (video)
by
Terrence O'Brien
04.19.2012
Translation apps aren't exactly the newest or sexiest thing in the world of technology, but we've got to hand it to AT&T for whipping up a rather impressive demo. The company showed off a next-gen version of its AT&T Translator app, which may one day allow people to communicate in real time regardless of their spoken language. The app uses the carrier's new Watson Speech API, in this case via a VoIP call on a pair of iPads, to not only transcribe dialog, but translate it from English to Spanish (and vice-versa), then play it back in the target tongue using a computer generated voice. This isn't like the Google Translate app on your phone -- the translation happens in near real time, with only a slight latency as your words are fed through the system. The demo wasn't without its hitches (the room was noisy and filled with bloggers toting wireless devices), but it went more or less as planned, and our gracious hosts were able to complete a call requesting a taxi cab. One day AT&T hopes to make this a standard feature of its services, eliminating the language barrier once and for all. To see the app in action check out the video after the break.
Ask Ziggy: the Windows Phone 7 counter to Apple's Siri (video)
by
Darren Murph
01.02.2012
Ask Ziggy has actually been on the Windows Store for right around a fortnight, but there's an updated version hitting soon that brings an astounding amount of Siri-ness to Microsoft's own Windows Phone 7 platform. Developed by Shai Leib, the app is a free (and even ad-free) program that can "translate human speech into transcribed text." According to Leib, the text is then "analyzed for patterns to detect commands or general queries, while commands are interpreted and routed to routine phone tasks such as emailing, texting, calling, social network updates, and getting directions." If you're asking a more generic question, the app uses a hodgepodge of technologies and web searches to find the answer, and we're told that "several passes may be required to find a concise answer." Still, what's shown on the video just past the break is impressive -- particularly for a gratis app from a single Earthling -- and you can expect the latest edition to pop up in the Store within the next couple of days. Just don't ask it if it's hot for Siri, okay? [Thanks, Alex]
Windows Phone Apollo to feature speech-to-text for email, low-end Tango gets split in two
by
Joseph Volpe
09.13.2011
If this hodgepodge of sorta, kinda official confirmation is to be believed, Windows Phone users can look forward to deeper integration of voice command functionality built into the Apollo update. Nokia US' CEO, Chris Weber, first spilled the speech recognition beans in an interview with VentureBeat back in early August, referring to the tech as a killer WP feature. Now, a report over on ZDNet backs up that leaked info with resume tidbits from former MS Windows Phone / Mobile Communications team members that had a hand in creating the so-called "Voice-Compose" and "Read-Aloud" features for native email clients -- even tipping us off to a possible Windows 8 and WP 8 convergence. There's also mention (gleaned from a company job listing) of MS' lower-end mobile OS splitting into two separate versions -- Tango1 and Tango2. We know what you're thinking. It's hard to get excited about far-off OS updates when we're still waiting on Mango's release. Still, it's good to know Ballmer and co. aren't just resting on their Windows laurels.
Leak: future iOS update to introduce Siri-based voice control
by
Sean Buckley
07.25.2011
When Apple snatched up Siri back in April, we had to wonder exactly what Cupertino was planning for the voice controlled virtual assistant. The answer, according to a new leak, is unsurprisingly obvious: iOS integration. A screenshot leaked to 9to5Mac flaunts an "Assistant" feature presumably built into a firmware update. To back up the screenshot, the aforesaid site dove into the iOS SDK and uncovered code describing Siri-like use of the iPhone's location, contact list, and song metadata. The code also outlined a "speaker" feature, opening a door for further Nuance integration in Apple products. Sound awesome? Sure it does, but keep it salty: 9to5's source says the assistant feature only just went into testing, and may not be ready in time for Apple's next big handset upgrade. Hit the source link to see the code and conjecture for yourself.
Pioneer solicits Whodoo guinea pigs for speech-based Android assistant (video)
by
Zachary Lutz
07.13.2011
Ever wish you could have a personal attendant living inside your Android smartphone? You know... one you can boss around without incurring human rights or labor law violations? Apparently Pioneer shares your vision, because its voice-controlled social assistant named Whodoo is seemingly ready to "hop to" at a moment's notice -- willing to locate a restaurant and send it to friends, route the appropriate directions, and announce your intentions to Facebook or Twitter -- all based on your verbal commands (and ostensibly perfect for in-dash navigation). The company is seeking bossy applicants for its closed beta experiment, which involves completing a lengthy application, providing considerable feedback, and submitting audio samples that are gathered by Whodoo. Think you've got the chops? Just follow the source, where you're free to convince Pioneer of the same.
Windows Phone 7.5 Mango in-depth preview (video)
by
Brad Molen
06.27.2011
Make no mistake, Microsoft isn't playing coy in the smartphone market any longer. The folks in Redmond are making a significant jump forward in the mobile arena, announcing that the upcoming version of Windows Phone, codenamed "Mango," will be heading to a device near you in time for the holidays. As its competitors have raised the bar of expectations to a much higher level, Microsoft followed suit by adding at least 500 features to its mobile investment, which the company hopes will plug all of the gaping holes the first two versions left open. We received a Samsung Focus preloaded with the most recent developer build (read: not even close to the market release version) and we had a few good days to put it through its paces. It's still far from completion, as there were several key features that we couldn't test out; some weren't fully implemented, and others involved third-party apps that won't be updated until closer to launch. Yet we don't want to call this build half-baked -- in fact, it was surprisingly smooth for software that still has at least four months to go before it's available for public consumption. At the risk of sounding ridiculously obvious, we're mighty interested in seeing the final result when all is said and done this holiday season. As a disclaimer, we can't guarantee that the stuff we cover here will actually look or act the same when it's ready to peek out and make its official introduction in Q4; as often happens, features and UI enhancements are subject to change by the Windows Phone team as Mango gets closer and closer to release. Let's get straight to brass tacks, since there are a lot of details to dive into. It'd be best to grab a large beverage (we'd recommend a Big Gulp, at least), find your most comfortable chair, and meet us after the break.
NTT DoCoMo exhibits on-the-fly speech translation, lets both parties just talk (video)
by
Sharif Sakr
05.30.2011
The race to smash linguistic barriers with simultaneous speech-to-speech translation is still wide open, and Japanese mobile operator NTT DoCoMo has just joined Google Translate and DARPA on the track. Whereas Google Translate's Conversation Mode was a turn-based affair when it was demoed back in January, requiring each party to pause awkwardly between exchanges, NTT DoCoMo's approach seems a lot more natural. It isn't based on new technology as such, but brings together a range of existing cloud-based services that recognize your words, translate them and then synthesize new speech in the other language -- hopefully all before your cross-cultural buddy gets bored and hangs up. As you'll see in the video after the break, this speed comes with the sacrifice of accuracy and it will need a lot of work after it's trialled later in the year. But hey, combine NTT DoCoMo's system with a Telenoid robot or kiss transmission device and you can always underline your meaning physically.
Nuance voices found in OS X Lion, patent application suggests new iPhone speech / text capabilities
by
Donald Melanson
05.16.2011
Apple's certainly no stranger to speech recognition, but it looks like it may have enlisted a bit of outside help for the next version of OS X, otherwise known as Lion. As Netputing reports, some of the text-to-speech voice options available in the developer preview of Lion just so happen to match the voices available from Nuance -- which would seem to suggest a partnership or licensing agreement of some sort, as the voices themselves cost $45 apiece directly from Nuance. In somewhat related news, Apple has also recently filed a patent application that would bring some fairly extensive new speech recognition options to the iPhone -- if it ever actually moves beyond a patent application, that is. In short, it would let you either instantly have a phone call converted to text, or send some text and have it converted to voice on the other end -- which the application notes could come in handy both in noisy environments and in situations where you simply aren't able to talk. It would even apparently incorporate a noise meter that could automatically trigger various options when the ambient noise hits a certain level. Hit up the source link below for a closer look at how it would work. [Thanks to everyone who sent this in]
iOS 5 speech recognition concept showcased in video
by
Kelly Hodgkins
05.16.2011
Recent rumors and a patent application suggest an upcoming version of iOS will include some form of speech recognition. Inspired by these revelations, graphic designer Jan-Michael Cart created a short video that shows how Apple could add this speech-to-text functionality to iOS 5. His conceptualization takes speech recognition one step further than the patent, which focuses on calling only. Cart envisions a world where speech is incorporated into the core of iOS and used throughout the user interface. For example, a long-press of the home button would launch the speech recognition module and let you create text messages. An API could be made available to developers so that they could add speech recognition to their applications. It's an interesting concept that would make many users happy if Apple implements speech-to-text in this way. Read on for Jan-Michael Cart's concept video. [Via iPhoneDownloadBlog]
Apple patent reveals a text-to-speech and speech-to-text system for the iPhone
by
Kelly Hodgkins
05.13.2011
Apple recently filed a patent application for a text-to-speech and a speech-to-text converter designed to work in noisy environments. The patent describes a system that uses a converter included on the logic board of the phone. This hardware-based conversion would have a distinct advantage over current text-to-speech systems, which use an internet-based service from a company like Nuance to perform conversions. Unlike Android's text-to-speech system, which is used for searching and navigation, Apple's patent describes a system used for sending and receiving phone calls. In one embodiment, a microphone on the iPhone would detect the ambient noise level and, in a noisy environment where talking on the phone may be difficult, prompt the user to answer a call using text-to-speech. The person answering the call would type in their messages, and the phone would convert them to speech heard by the caller. In another example, the user could choose to talk via a two-way texting system that uses both text-to-speech and speech-to-text within the conversation. Basically, your caller's words would be converted to text that you could read, and you could input a text response that is converted to speech for the caller on the other end. It's an elegant system that would be useful for making phone calls at a loud sporting event or a crowded bar. Apple is rumored to be in talks with Nuance that could bring an advanced speech recognition system to iOS. The patent above may describe a small part of what is to come for iPhone owners in the future.
Chrome 11 goes beta with speech-to-text capabilities
by
Donald Melanson
03.23.2011
Well, it looks like Google is unsurprisingly adding more than just a new logo to the latest version of its Chrome browser -- the just-released beta of Chrome 11 also now boasts speech-to-text capabilities. That comes in the form of support for the HTML5 speech input API, which web developers will be able to take advantage of to let folks simply talk to websites and have their speech magically transcribed to text. Also making a first appearance in the beta is support for GPU-accelerated 3D CSS, which will let developers apply all sorts of 3D effects to websites -- Blingee will never be the same, surely. Hit up the link below to try it out for yourself.
Nuance opens Dragon Mobile SDK to app developers, we see end to embarrassing dictation
by
Christopher Trout
01.23.2011
There are some messages that are just too embarrassing to dictate to a human being. Lucky for us and the retired circus contortionist we hired to type up our missives, Nuance is expanding the reach of its transcription software by making its Dragon Mobile SDK available to developers for use in iOS and Android applications. The SDK, which is free to members of the Nuance Mobile Developer Program, sports speech-to-text capabilities in eight languages and text-to-speech in 35. There are already apps out there that can do the job, including Nuance's own Dragon Dictation, but we welcome new advances in automated transcription. You know, it's not exactly a walk in the park dictating an entire Clay Aiken Fan Club newsletter to a guy named Sid the Human Pretzel.
OnStar announces Bluetooth voice app, reads your Facebook messages to you
by
Tim Stevens
01.04.2011
Texting while driving is deadly, for serious. But, letting someone else read to you is rather less risky, and talking isn't so bad either -- in moderation, anyway. Bring those two together and you have OnStar's solution, an upcoming Bluetooth app that will read text messages and status updates to you and, somewhat more interestingly, lets you speak a custom message that will be transcribed to your recipient. Fascinating? Absolutely, but we can't wait to hear what sort of fun and cheeky mistranslations come out of that feature. You can also post voice messages to Facebook and say things like "call back" to return a call. The app is, as of now, intended for Android devices only and is set to hit the Market sometime in the first half of the year, and at least initially it'll only work on cars that have Bluetooth or those equipped with the company's new aftermarket mirror, though you'll have to be paid up on your OnStar dues if you want to use it. Full details in the PR after the break.
Google Voice Search update helps you personalize your results, helps Google build another database to take over the world
by
Sean Hollister
12.14.2010
Google Voice Actions was the first step towards our Star Trek dreams of lassoing the world with naught but vocal cords, and today Google's taken a second hop towards that inevitable future by letting Android devices record our every utterance. Yes, if you've got a handset running Froyo or better, you can download an update for Google Voice Search right now, which will let your phone dynamically personalize its speech-to-text engine to better recognize your voice most every time you use it. Of course, by so doing you're giving Google permission to record your sentences -- anonymously, of course -- to use in future products, but whether that's a problem or just a happy coincidence depends on whether you take Google at its word. We hit the "yes" button, in case you're curious. Find it on Android Market, or just use the handy-dandy QR code below.
LG's Optimus 7 gets previewed by Korean newspaper, has voice to text feature?
by
Sean Hollister
09.28.2010
You know how we abhor machine translation, but this rumor was too juicy to pass up -- the Korea Economic Daily reportedly got hands-on with LG's Optimus 7 (aka E900) way ahead of release, and if we're reading this right, the Windows Phone 7 device will be capable of writing your text messages, emails and status updates just by hearing you speak. The publication also reports it's got a 3.8-inch, 800 x 480 screen (rather than the 3.5 or 3.7 inches we've heard before), a 1500 mAh battery, 16GB of built-in storage and a 1GHz processor. There's also apparently an "automatic panorama" feature where you simply pan the camera to take stills and stitch them together, which sounds a lot like the Sweep Panorama dealie Sony recently added to its Cyber-Shot lineup. Can we expect a US version to have these features? Hard to say. Even should this preview be wholly legit, speech-to-text would probably need quite the overhaul to tell English from Korean -- and let's not even get started on Engrish.