SpeechToText

Latest

  • Microsoft improves Office's hands-free typing with Dictate

    by 
    Mallory Locklear
    06.20.2017

    Microsoft has released a new app called Dictate. It's an add-in for Word, Outlook and PowerPoint that uses Cortana's speech-recognition technology to let you speak what you want to type.

  • Evernote comes to Windows Phone 8 with multi-shot camera, speech to text support

    by 
    Jon Fingas
    08.27.2013

    Evernote has let its Windows Phone app languish at times, but the company is making amends with a new version of its note-taking client that supports Windows Phone 8. The update primarily improves navigation and speed for modern hardware, although there are a few platform-independent additions. Users can now snap multiple photos without leaving the camera mode; the refresh also introduces photo slideshows, speech-to-text transcription and an automatic title generator. Evernote's big upgrade is available in the Windows Phone Store today, and there are promises of "lots more" coming to Microsoft's mobile platform in the near future.

  • Study suggests voice-to-text 'just as dangerous' as texting while driving

    by 
    Mat Smith
    04.23.2013

    If you thought you were driving safely when activating your voice-to-text service or starting up Siri, a study by the Texas Transportation Institute suggests it might dull your reactions just as much as finger-based typing. The study compared traditional text messaging with voice-to-text and found that drivers still took about twice as long to react compared to when they weren't trying to communicate. According to Christine Yager, who led the research, "eye contact to the roadway" also suffered, regardless of which texting method was used. In fact, voice-to-text often took longer than manual input, due to the need to correct transcription errors while using the software: "You're still using your mind to try to think of what you're trying to say, and that by proxy causes some driving impairment, and that decreases your response time." The bigger issue is that the drivers apparently felt safer when using voice-based entry, even though test results showed that response times were just as negatively affected. "It is important to educate the public that even these seemingly new distractions are still distractions, and it will help people be safer when they get into the vehicle," Yager added.

  • Apple seeks patent for hearing aids that deliver speech at an even keel

    by 
    Jon Fingas
    08.23.2012

    Although they're called hearing aids, they can sometimes be as much of a hindrance as a help. Catch an unfamiliar accent and the attention might go to just parsing the words rather than moving the conversation forward. Apple is applying for a patent on a technique that would take the guesswork out of listening by smoothing out all the quirks. The proposed idea would convert speech to text and back, using the switch to remove any unusual pronunciation or too-quick talking before it reaches the listener's ear. Not surprisingly for a company that makes phones and tablets, the hearing aid wouldn't always have to do the heavy lifting, either: iOS devices could handle some of the on-the-fly conversion, and pre-recorded speech could receive advance treatment to speed up the process. We don't know if Apple plans to use its learning in any kind of shipping product, although it's undoubtedly been interested in the category before -- and its ambitions of having iPhone-optimized hearing aids could well get a lift from technology that promises real understanding, not just a boost in volume.

  • Mountain Lion 101: Dictation

    by 
    Steve Sande
    07.25.2012

    What can I say about my love of Mountain Lion's new Dictation feature? I've wanted to be able to talk and have my words transcribed to text ever since I saw the original "Assignment: Earth" episode of Star Trek back in 1968 (image at top of post). That's actress Teri Garr talking to a typewriter, and it's transcribing her words. Now it's finally happening, and I think that's pretty cool.

    I know that a lot of people are unimpressed by the dictation capabilities of Mountain Lion, the iPhone 4S, and the third-generation iPad, but I'm one of those people who is both blessed with a voice that seems to be made for Siri (the brains behind Dictation) and who has practiced dictating to my Mac and iOS devices. Unlike Rich Gaywood, who stated in his big Mountain Lion review that Dictation was having trouble cutting through his Welsh accent, I seem to be having very few problems. As you'd expect, I am dictating this post on my Mountain Lion-equipped MacBook Air.

    By default, Dictation is turned on in Mountain Lion. To shut it off permanently or change other settings, use the new Dictation & Speech pref in System Preferences. With the pref it's possible to select the microphone used by Dictation, set the key(s) to press to activate Dictation (by default, you press the fn key on your keyboard twice), or learn more about Dictation and privacy. That last feature comes courtesy of a button on the bottom of the preference pane. Click it, and you're basically told that anything you dictate is recorded and sent to Apple to convert into text. That's right; it won't work without a live Internet connection. The Apple privacy statement also says that your computer will send Apple "other information, such as your first name and nickname; and the names, nicknames, and relationship with you (for example, "my dad") of your address book contacts."

    Enough about the preferences panel. Let's talk about how accurate dictation really is. If I stop and think about what I'm trying to say to my Mac, and then speak clearly and a little bit slowly, then the accuracy rate is almost 100 percent. On the other hand, if I just start talking and stumble over what I'm saying, my accuracy suffers. Don't expect to be able to talk to your Mac for an hour and have a perfectly typed term paper ready to submit at the end. Dictation works in 30-second chunks; any more than that and it will chime to let you know that it's done. I've found that the response time for Dictation is very fast compared to that on the iPhone 4S and third-generation iPad.

    In our book, "Talking to Siri", Erica Sadun and I discuss ways of improving accuracy of Siri dictation. We also talk about how to add caps and punctuation to your dictation, but you'll find that some of those commands don't work quite the same in Mountain Lion. For example, it was previously possible to say "My cat is named cap emerald" to have Siri type out "My cat is named Emerald." You no longer need to say "cap" to get Dictation to capitalize the proper name. However, none of the capitalization commands work any more. Likewise, spacing commands -- "space" and "no space" -- that used to add or eliminate spaces between words no longer work. All punctuation commands seem to be enabled from the testing I've been able to do.

    Dictation is one of those Mountain Lion features that you're either going to love or hate -- I'm not sure there's much of an in-between. Personally, I find it to be extremely useful, especially in combination with Messages. There's nothing more satisfying than tapping the function key twice, dictating a quick response to my wife, and then getting back to work. I'd suggest to anyone who is upgrading to Mountain Lion to at least give Dictation a try. You might find out that it works better than you think.

  • BMW's 3 and 7 Series to be the first with Nuance's Dragon Drive! Messaging aboard

    by 
    Edgar Alvarez
    07.09.2012

    It somehow feels like it was only yesterday that Nuance unveiled its Dragon Drive! creation to the world, hoping in the process to make drivers' lives easier by delivering a fresh eyes- / hands-free messaging system inside connected cars. Unfortunately, back then the savvy company didn't announce any partnerships with auto manufacturers -- still, we had a feeling it wouldn't be too long before one of them would want to come along for the voice dictation ride. The good news is, that's about to change pretty soon. Per the outfit itself, BMW's decided to bring the Dragon Drive! tech to its 2012 7 Series later this month, with the 3 Series Touring and the eco-friendly 3 Series ActiveHybrid expected to get it "later this year." Notably, Dragon Drive! will offer multi-language support, including English, Spanish, Italian, French and German. There's no word yet on just how much the fee for the service will be, but we do know those who land themselves one of these new Beemers will get a two-month trial to take Dragon Drive! for a quick spin.

  • AT&T Translator app hands-on: smashing the language barrier (video)

    by 
    Terrence O'Brien
    04.19.2012

    Translation apps aren't exactly the newest or sexiest thing in the world of technology, but we've got to hand it to AT&T for whipping up a rather impressive demo. The company showed off a next-gen version of its AT&T Translator app, which may one day allow people to communicate in real time regardless of their spoken language. The app uses the carrier's new Watson Speech API, in this case via a VoIP call on a pair of iPads, to not only transcribe dialog, but translate it from English to Spanish (and vice-versa), then play it back in the target tongue using a computer-generated voice. This isn't like the Google Translate app on your phone -- the translation happens in near real time, with only a slight latency as your words are fed through the system. The demo wasn't without its hitches (the room was noisy and filled with bloggers toting wireless devices), but it went more or less as planned, and our gracious hosts were able to complete a call requesting a taxi cab. One day AT&T hopes to make this a standard feature of its services, eliminating the language barrier once and for all. To see the app in action check out the video after the break.

  • Ask Ziggy: the Windows Phone 7 counter to Apple's Siri (video)

    by 
    Darren Murph
    01.02.2012

    Ask Ziggy has actually been on the Windows Store for right around a fortnight, but there's an updated version hitting soon that brings an astounding amount of Siri-ness to Microsoft's own Windows Phone 7 platform. Developed by Shai Leib, the app is a free (and even ad-free) program that can "translate human speech into transcribed text." According to Leib, the text is then "analyzed for patterns to detect commands or general queries, while commands are interpreted and routed to routine phone tasks such as emailing, texting, calling, social network updates, and getting directions." If you're asking a more generic question, the app uses a hodgepodge of technologies and web searches to find the answer, and we're told that "several passes may be required to find a concise answer." Still, what's shown on the video just past the break is impressive -- particularly for a gratis app from a single Earthling -- and you can expect the latest edition to pop up in the Store within the next couple of days. Just don't ask it if it's hot for Siri, okay? [Thanks, Alex]

  • Will Dragon speech apps remain in the app store for iPhone 4 owners?

    by 
    Mel Martin
    10.05.2011

    It was a bit of a shock to learn yesterday that the terrific Siri app, now owned by Apple, will get pulled from the app store. It's being done, I'm sure, to encourage people to get the Siri technology built into the new iPhone 4S, although an interview with the co-founder of Siri indicated that they had to cut some corners to get the app to work on "older" hardware. Still, it seems, shall we say, small of Apple to kill an app that seemed to work just fine, and did some of the tasks that the new incarnation of Siri will do on the iPhone 4S. One bright spot for those sticking with their current phones is the pair of Dragon apps from Nuance. Dragon Dictation will take your voice and turn it into text for a note, an email, or a text message. Dragon Go!, which we have reviewed very positively, does much of what Siri does, connecting to Yelp, Google Maps, Open Table, various search engines and other web services so you can ask about a weather forecast, directions to any destination, and even the latest sports scores. Like Siri, the Dragon apps are powered by Nuance speech recognition software, and the processing is done in the cloud. Both the Dragon apps are free and work fine on the iPhone 4, 3GS, 3rd and 4th generation iPod touch, and the iPad. A Nuance spokesperson assured me today that both apps are doing very well, and the company has no plans to pull them from the App Store. Together the apps can give you a rough approximation of what Siri on the iPhone 4S can do, but they don't have the same integration with iOS as Siri, so they won't be as slick. I'm hoping Apple will reconsider what I think is a customer-hostile decision to yank Siri. How about you? Do you think Apple should have pulled the plug on the Siri app?

  • Windows Phone Apollo to feature speech-to-text for email, low-end Tango gets split in two

    by 
    Joseph Volpe
    09.13.2011

    If this hodgepodge of sorta, kinda official confirmation is to be believed, Windows Phone users can look forward to deeper integration of voice command functionality built into the Apollo update. Nokia US' CEO, Chris Weber, first spilled the speech recognition beans in an interview with VentureBeat back in early August, referring to the tech as a killer WP feature. Now, a report over on ZDNet backs up that leaked info with resume tidbits from former MS Windows Phone / Mobile Communications team members that had a hand in creating the so-called "Voice-Compose" and "Read-Aloud" features for native email clients -- even tipping us off to a possible Windows 8 and WP 8 convergence. There's also mention (gleaned from a company job listing) of MS' lower-end mobile OS splitting into two separate versions -- Tango1 and Tango2. We know what you're thinking. It's hard to get excited about far-off OS updates when we're still waiting on Mango's release. Still, it's good to know Ballmer and co. aren't just resting on their Windows laurels.

  • Leak: future iOS update to introduce Siri-based voice control

    by 
    Sean Buckley
    07.25.2011

    When Apple snatched up Siri back in April, we had to wonder exactly what Cupertino was planning for the voice controlled virtual assistant. The answer, according to a new leak, is unsurprisingly obvious: iOS integration. A screenshot leaked to 9to5Mac flaunts an "Assistant" feature presumably built into a firmware update. To back up the screenshot, the aforesaid site dove into the iOS SDK and uncovered code describing Siri-like use of the iPhone's location, contact list, and song metadata. The code also outlined a "speaker" feature, opening a door for further Nuance integration in Apple products. Sound awesome? Sure it does, but keep it salty: 9to5's source says the assistant feature only just went into testing, and may not be ready in time for Apple's next big handset upgrade. Hit the source link to see the code and conjecture for yourself.

  • Pioneer solicits Whodoo guinea pigs for speech-based Android assistant (video)

    by 
    Zachary Lutz
    07.13.2011

    Ever wish you could have a personal attendant living inside your Android smartphone? You know... one you can boss around without incurring human rights or labor law violations? Apparently Pioneer shares your vision, because its voice-controlled social assistant named Whodoo is seemingly ready to "hop to" at a moment's notice -- willing to locate a restaurant and send it to friends, route the appropriate directions, and announce your intentions to Facebook or Twitter -- all based on your verbal commands (and ostensibly perfect for in-dash navigation). The company is seeking bossy applicants for its closed beta experiment, which involves completing a lengthy application, providing considerable feedback, and submitting audio samples that are gathered by Whodoo. Think you've got the chops? Just follow the source, where you're free to convince Pioneer of the same.

  • Windows Phone 7.5 Mango in-depth preview (video)

    by 
    Brad Molen
    06.27.2011

    Make no mistake, Microsoft isn't playing coy in the smartphone market any longer. The folks in Redmond are making a significant jump forward in the mobile arena, announcing that the upcoming version of Windows Phone, codenamed "Mango," will be heading to a device near you in time for the holidays. As its competitors have raised the bar of expectations to a much higher level, Microsoft followed suit by adding at least 500 features to its mobile investment, which the company hopes will plug all of the gaping holes the first two versions left open. We received a Samsung Focus preloaded with the most recent developer build (read: not even close to the market release version) and we had a few good days to put it through its paces. It's still far from completion, as there were several key features that we couldn't test out; some weren't fully implemented, and others involved third-party apps that won't be updated until closer to launch. Yet we don't want to call this build half-baked -- in fact, it was surprisingly smooth for software that still has at least four months to go before it's available for public consumption. At the risk of sounding ridiculously obvious, we're mighty interested in seeing the final result when all is said and done this holiday season. As a disclaimer, we can't guarantee that the stuff we cover here will actually look or act the same when it's ready to peek out and make its official introduction in Q4; as often happens, features and UI enhancements are subject to change by the Windows Phone team as Mango gets closer and closer to release. Let's get straight to brass tacks, since there are a lot of details to dive into. It'd be best to grab a large beverage (we'd recommend a Big Gulp, at least), find your most comfortable chair, and meet us after the break.

  • NTT DoCoMo exhibits on-the-fly speech translation, lets both parties just talk (video)

    by 
    Sharif Sakr
    05.30.2011

    The race to smash linguistic barriers with simultaneous speech-to-speech translation is still wide open, and Japanese mobile operator NTT DoCoMo has just joined Google Translate and DARPA on the track. Whereas Google Translate's Conversation Mode was a turn-based affair when it was demoed back in January, requiring each party to pause awkwardly between exchanges, NTT DoCoMo's approach seems a lot more natural. It isn't based on new technology as such, but brings together a range of existing cloud-based services that recognize your words, translate them and then synthesize new speech in the other language -- hopefully all before your cross-cultural buddy gets bored and hangs up. As you'll see in the video after the break, this speed comes with the sacrifice of accuracy and it will need a lot of work after it's trialled later in the year. But hey, combine NTT DoCoMo's system with a Telenoid robot or kiss transmission device and you can always underline your meaning physically.

  • Nuance voices found in OS X Lion, patent application suggests new iPhone speech / text capabilities

    by 
    Donald Melanson
    05.16.2011

    Apple's certainly no stranger to speech recognition, but it looks like it may have enlisted a bit of outside help for the next version of OS X, otherwise known as Lion. As Netputing reports, some of the text-to-speech voice options available in the developer preview of Lion just so happen to match the voices available from Nuance -- which would seem to suggest a partnership or licensing agreement of some sort, as the voices themselves cost $45 apiece directly from Nuance. In somewhat related news, Apple has also recently filed a patent application that would bring some fairly extensive new speech recognition options to the iPhone -- if it ever actually moves beyond a patent application, that is. In short, it would let you either instantly have a phone call converted to text, or send some text and have it converted to voice on the other end -- which the application notes could come in handy both in noisy environments and in situations where you simply aren't able to talk. It would even apparently incorporate a noise meter that could automatically trigger various options when the ambient noise hits a certain level. Hit up the source link below for a closer look at how it would work. [Thanks to everyone who sent this in]

  • iOS 5 speech recognition concept showcased in video

    by 
    Kelly Hodgkins
    05.16.2011

    Recent rumors and a patent application suggest an upcoming version of iOS will include some form of speech recognition. Inspired by these revelations, graphic designer Jan-Michael Cart created a short video that shows how Apple could add this speech-to-text functionality to iOS 5. His conceptualization takes speech recognition one step further than the patent, which focuses on calling only. Cart envisions a world where speech is incorporated into the core of iOS and used throughout the user interface. For example, a long-press of the home button would launch the speech recognition module and let you create text messages. An API could be made available to developers so that they could add speech recognition to their applications. It's an interesting concept that would make many users happy if Apple implements speech-to-text in this way. Read on for Jan-Michael Cart's concept video. [Via iPhoneDownloadBlog]

  • Apple patent reveals a text-to-speech and speech-to-text system for the iPhone

    by 
    Kelly Hodgkins
    05.13.2011

    Apple recently filed a patent application for a text-to-speech and speech-to-text converter designed to work in noisy environments. The patent describes a system that uses a converter included on the logic board of the phone. This hardware-based conversion would have a distinct advantage over current text-to-speech systems, which use an internet-based service from a company like Nuance to perform the conversion. Unlike Android's text-to-speech system, which is used for searching and navigation, Apple's patent describes a system used for sending and receiving phone calls. In one embodiment, a microphone on the iPhone would detect the ambient noise level and prompt the user to answer a call using text-to-speech in a noisy environment when talking on the phone may be difficult. The person answering the call would type in their messages, and the phone would convert them to speech heard by the caller. In another example, the user could choose to talk via a two-way texting system that uses both text-to-speech and speech-to-text within the conversation. Basically, your caller's words would be converted to text that you could read, and you could input a text response that is converted to speech for the caller on the other end. It's an elegant system that would be useful for making phone calls at a loud sporting event or a crowded bar. Apple is rumored to be in talks with Nuance that could bring an advanced speech recognition system to iOS. The above patent may describe a small part of what is to come for iPhone owners in the future.

  • Chrome 11 goes beta with speech-to-text capabilities

    by 
    Donald Melanson
    03.23.2011

    Well, it looks like Google is unsurprisingly adding more than just a new logo to the latest version of its Chrome browser -- the just-released beta of Chrome 11 also now boasts speech-to-text capabilities. That comes in the form of support for the HTML5 speech input API, which web developers will be able to take advantage of to let folks simply talk to websites and have their speech magically transcribed to text. Also making a first appearance in the beta is support for GPU-accelerated 3D CSS, which will let developers apply all sorts of 3D effects to websites -- Blingee will never be the same, surely. Hit up the link below to try it out for yourself.

  • Nuance opens Dragon Mobile SDK to app developers, we see end to embarrassing dictation

    by 
    Christopher Trout
    01.23.2011

    There are some messages that are just too embarrassing to dictate to a human being. Lucky for us and the retired circus contortionist we hired to type up our missives, Nuance is expanding the reach of its transcription software by making its Dragon Mobile SDK available to developers for use in iOS and Android applications. The SDK, which is free to members of the Nuance Mobile Developer Program, sports speech-to-text capabilities in eight languages and text-to-speech in 35. There are already apps out there that can do the job, including Nuance's own Dragon Dictation, but we welcome new advances in automated transcription. You know, it's not exactly a walk in the park dictating an entire Clay Aiken Fan Club newsletter to a guy named Sid the Human Pretzel.

  • Google Voice Search update helps you personalize your results, helps Google build another database to take over the world

    by 
    Sean Hollister
    12.14.2010

    Google Voice Actions was the first step towards our Star Trek dreams of lassoing the world with naught but vocal cords, and today Google's taken a second hop towards that inevitable future by letting Android devices record our every utterance. Yes, if you've got a handset running Froyo or better, you can download an update for Google Voice Search right now, which will let your phone dynamically personalize its speech-to-text engine to better recognize your voice most every time you use it. Of course, by so doing you're giving Google permission to record your sentences -- anonymously, of course -- to use in future products, but whether that's a problem or just a happy coincidence depends on whether you take Google at its word. We hit the "yes" button, in case you're curious. Find it on Android Market, or just use the handy-dandy QR code below.