speech

Latest

  • GM wants voice-controlled cars that learn what you really mean

    by Jon Fingas
    07.15.2014

    Voice control is easy to find in cars, but it's not always intuitive. You often have to use specific syntax, which might be hard to remember when you're barreling down the highway. GM may have a smarter approach in store, though. The Wall Street Journal understands that the automaker is working with machine learning firm VocalIQ on an "advanced voice-control system" that would let you control navigation, wipers and other car components in a more intuitive way.

  • Apple buys tech that could take Siri offline

    by Sharif Sakr
    04.04.2014

    Apple has sort-of-confirmed that it recently snapped up another small company, called Novauris. The firm specializes in speech recognition and has historical ties to the core technology and patents underpinning Siri. TechCrunch reports that Novauris' experts are already working inside Apple to improve its voice assistant, but no one really knows exactly what they're up to. One of Novauris' big strengths has been locally processed recognition, which doesn't rely on distant servers, so it's possible that Apple wants Siri to accomplish more without a data connection. (Apple's current Siri partner, Nuance, can also do offline processing, but Apple hasn't been able to bring that technology in-house.) We're just speculating, of course, but this is a function that no voice assistant has really mastered so far (although others are definitely working on it), and it's even more important now that iOS is getting into the car.

  • Intel reportedly acquires Indisys, gets an edge in natural language recognition (update: official)

    by Jon Fingas
    09.13.2013

    Intel is quickly transforming its dream of perceptual computing into reality: the company will soon ship motion control technology, and it acquired the gesture interface firm Omek back in July. The chip giant may not be done yet, as there are reports from Spain that it has acquired Indisys, a small natural language recognition company. Details of the buyout are scarce, but the move would give Intel its own voice control software; it wouldn't have to license code from third parties like Nuance. We've reached out to Intel to confirm the acquisition. If real, the Indisys takeover might have come at just the right time -- Intel is swinging its attention to wearables, and voice control is now more of a necessity than a luxury. Update: Intel just confirmed to us that it acquired Indisys on May 31st, and that the deal has already closed.

  • Dragon Mobile Assistant 4 for Android adds driving mode, voice notifications

    by Jon Fingas
    06.18.2013

    For Nuance, it's not enough that Dragon Mobile Assistant spares Android users from pecking at the keyboard -- with the app's new 4.0 upgrade, those users can sometimes avoid contact altogether. Dragon Mobile Assistant can now detect when you're in a moving car and automatically invoke a Driver Mode that relies solely on voice recognition and feedback, keeping your focus on the road. Accordingly, the upgrade builds in spoken notifications for inbound calls, messages, upcoming meetings and Facebook updates. There's also voice-aware email and customizable wake-up commands. All told, 4.0 is a big boost for Android fans who see touchscreens as old hat; if you do, you can grab the update shortly (if not already) through Google Play.

  • Google's conversational search goes live with latest version of Chrome

    by Steve Dent
    05.22.2013

    After revealing it at I/O 2013 only days ago, Google's new conversational voice search function is up and running on Chrome 27. If you've got that version, you'll now get a spoken response on top of a web page display when using the voice search function (the microphone in the main search window), for starters. More interestingly, the new feature also includes semantic search, meaning you can ask follow-up questions without repeating needless info -- for instance, "who's the CEO of GE?" can now be followed up with "how old is he?" and Google will know who "he" is. We gave it a spin for ourselves and found that when it worked, it worked well. However, the system may be overwhelmed by the launch; it's been giving us a "no internet connection" message most of the time -- not exactly what we're looking for.

  • Google's conversational voice search reaches the desktop through Chrome

    by Jon Fingas
    05.15.2013

    We're used to Google's mobile search apps letting us ask questions as we would with real people, but the desktop has usually been quite stiff. That's changing today: Google is bringing conversation-like voice search to our computers through Chrome, with no typing required. Web denizens just have to say "okay, Google," ask their question, and get back a spoken response similar to what they'd hear on their phones. The company hasn't said just how soon Chrome will incorporate the new voice features, however.

  • Nuance Dragon Notes brings quick, spoken memos and messages to Windows 8

    by Jon Fingas
    05.14.2013

    Sometimes, the smallest and simplest apps make the most sense. Take Nuance's new Dragon Notes for Windows 8, for example. Unlike its NaturallySpeaking cousin, it's not a universal tool: instead, it's narrowly focused on the voice dictation of memos, email, social networking updates and web searches. That limited scope leads to a very simple interface, however, and slims down the price from $100 to a far more accessible $20. Fans of minimalism can grab Dragon Notes directly from Nuance on May 15th, although they'll need to spend $10 for every language they speak beyond English.

  • AppleScripting Mail > Announce New Emails By Voice

    by Ben Waldie
    03.25.2013

    You're in the kitchen cooking dinner, or sitting down watching TV, or exercising. Ding! You have a new email. Quickly, run to your Mac to see who it's from. Meh, spam. Ding! Meh, a message from your boss. Ding! Meh. Sure would be nice if Mail could announce who's emailing you. That way, you could just listen for the ones you care about. Well, with the help of AppleScript and Mail rules, you can set this up on your own. Think of it as audible caller ID, but for email.

    Setting It Up

    1. Launch AppleScript Editor (in /Applications/Utilities) and create a new script document.
    2. Enter the following script into the document. NOTE: If you have any trouble following along, you can download the completed script here. NOTE: If you wish to test the script, which is always a good idea, you can do so by running it in AppleScript Editor. Just select a message or two in Mail, then return to AppleScript Editor and click Run in the script document's window.
    3. Make any adjustments to the properties at the top of the script to customize its behavior. For example, if you don't want the script to raise your volume if it's too low, change the raiseVolumeIfNeeded property value to false. If you don't want the script to read the first few paragraphs of each message to you, set the readFirstParagraphsOfEachMessage property value to false.
    4. Save the script in Script format to your Desktop as Mail > Announce New Emails By Voice.scpt.
    5. Launch Mail, open its Preferences window, and click Rules in the toolbar.
    6. Click Add Rule to create a Mail rule.
    7. Set the rule's description to Announce New Emails by Voice.
    8. Set the rule to trigger if any of the following conditions are met: Account matches [Your Account]. If you have multiple accounts, click + and add each one.
    9. Set the rule to perform the following action: Run AppleScript. From the list of scripts, choose Open in Finder.
    10. Copy the Announce New Emails By Voice.scpt script file from your Desktop into the newly opened folder (~/Library/Application Scripts/com.apple.mail in Mountain Lion).
    11. Go back to Mail, then save and close the rule.

    Now, whenever a new message arrives for the accounts you specified, the rule should trigger the script, and the messages should be announced audibly. If you get tired of listening to the announcements or want to mute them, just open Mail's Preferences > Rules window again and de-select the Active checkbox next to the rule. Happy Scripting!
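    The downloadable script isn't reproduced here, but a minimal sketch of what a Mail rule script of this shape looks like follows. The raiseVolumeIfNeeded property name comes from the article; the handler body, the minimumVolume property and the exact spoken phrasing are illustrative assumptions, not the author's actual code:

    ```applescript
    -- Hypothetical sketch of a Mail rule script. Mail invokes the
    -- "perform mail action with messages" handler for each batch of
    -- messages matched by the rule.
    property raiseVolumeIfNeeded : true -- property named in the article
    property minimumVolume : 30 -- assumed threshold, not from the article

    using terms from application "Mail"
        on perform mail action with messages theMessages for rule theRule
            -- Optionally nudge the output volume up so announcements are audible
            if raiseVolumeIfNeeded and (output volume of (get volume settings)) < minimumVolume then
                set volume output volume minimumVolume
            end if
            repeat with aMessage in theMessages
                -- Ask Mail for a human-readable sender name, then speak it
                tell application "Mail" to set senderName to extract name from sender of aMessage
                say "New email from " & senderName
            end repeat
        end perform mail action with messages
    end using terms from
    ```

    Saved into ~/Library/Application Scripts/com.apple.mail and attached to a rule via the Run AppleScript action, a handler like this runs each time the rule matches an incoming message.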

  • QNX builds in-car speech framework with AT&T's Watson, knows our true intentions

    by Jon Fingas
    01.07.2013

    QNX wants to put an end to in-car voice systems that require an awkward-sounding syntax to get the job done. As part of its CES launches, it's rolling out a framework for its speech recognition technology leaning on AT&T's Watson engine. By offloading the phrase interpretation to AT&T's servers, any infotainment system with the framework inside can focus on deciphering the speaker's intent -- letting drivers spend more time navigating or playing music, instead of remembering the necessary magic words. QNX will roll out the voice element as part of its CAR platform at an unspecified point in 2013. We'll have to wait until car and head unit designers implement the platform in tangible hardware, but the new speech system will hopefully lead to more organic-sounding conversations with our cars. Follow all the latest CES 2013 news at our event hub.

  • Samsung patent ties emotional states to virtual faces through voice, shows when we're cracking up

    by Jon Fingas
    11.06.2012

    Voice recognition usually applies to communication only in the most utilitarian sense, whether it's to translate on the spot or to keep those hands on the wheel while sending a text message. Samsung has just been granted a US patent that would convey how we're truly feeling through visuals instead of leaving it to interpretation of audio or text. An avatar could change its eyes, mouth and other facial traits to reflect the emotional state of a speaker depending on the pronunciation: sound exasperated or brimming with joy and the consonants or vowels could lead to a furrowed brow or a smile. The technique could be weighted against direct lip syncing to keep the facial cues active in mid-speech. While the patent won't be quite as expressive as direct facial mapping if Samsung puts it to use, it could be a boon for more realistic facial behavior in video games and computer-animated movies, as well as signal whether there was any emotional subtext in that speech-to-text conversion -- try not to give away any sarcasm.

  • Mountain Lion 101: Updated high-quality voice synthesis

    by Michael Rose
    07.28.2012

    Most of the speech hubbub around Mountain Lion has centered on the OS's marquee Dictation feature, which happily accepts your spoken words as a substitute for typing them in. Dictation works in almost any text entry field, and it's surprisingly effective; Steve even dictated his entire post about Dictation. Speech-to-text is only one side of the coin; there's also text-to-speech. OS X Leopard introduced a single high-quality voice named Alex. "He" sounded so natural compared to the previous generation of Mac synth voices that it was a little bit disconcerting. Starting in OS X Lion, users were given the choice to install high-quality synthetic voices licensed from Nuance that supplemented or replaced the "classic" Mac voice options in scores of languages. These voices delivered uncanny quality while chewing up hefty amounts of disk space (upwards of 500 MB in some cases). As pointed out by AppleInsider, the enhanced speech voices have now been updated for Mountain Lion. Users who previously installed a custom voice should now see 2.0 versions of those voices available in the new Software Update zone (which appears at the top of the Updates area in the Mac App Store). If you've never experimented with the voice synth options in OS X, you can change the system voice in the Dictation & Speech system preference pane. Want to make your Mac speak? TextEdit (and most Cocoa-based editors) offers Speech options in the Edit menu or via a contextual menu. You can also pick a hotkey in the Dictation & Speech preference pane to speak any selected text in any application. The preference pane also allows you to turn on spoken alerts for notifications, speak the time or let you know when an app needs your attention, which I imagine would grow tiresome awfully quickly. OS X's voice synthesis skills are also put to full use with VoiceOver, the accessibility screenreader that assists visually impaired users. 
    Sighted users may find it worth experiencing VoiceOver once or twice, just to get a sense of the amount of engineering work that goes into making OS X a truly accessible platform. To turn on the full screenreader interface with VoiceOver, you can use the Accessibility preference pane or just hit ⌘-F5. Automator and AppleScript both support speech output, and there's still the venerable say command-line tool, which lets you specify a voice with the -v flag. If you're feeling particularly mischief-minded, remember that say will work on a remote machine via an ssh session. Watch as your officemate jumps clear out of his or her skin when you tell the MacBook Pro on the desk to say -v Trinoids "Do not adjust your screen. You will be assimilated. The process is quick and painless, stand by." Fun for the whole family.
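    As a taste of the AppleScript hooks mentioned above, a minimal sketch (assuming the named voice is installed on your system):

    ```applescript
    -- Speak with the default system voice, then with a specific one.
    say "Do not adjust your screen."
    say "You will be assimilated." using "Alex"
    ```

    The same two lines work from Automator's Run AppleScript action, and the shell equivalent is the say command with the -v flag, as noted above.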

  • Microsoft job posting hints at Connected Car strategy: Azure, Kinect and WP8

    by Steve Dent
    06.25.2012

    Redmond seems to have more grandiose ideas for Connected Car than it's let on before, judging from a recent help wanted ad on its site. Reading more like PR for its car-based plans, the job notice waxes poetic about using "the full power of the Microsoft ecosystem" in an upcoming auto platform with tech such as Kinect, Azure, Windows 8 and Windows Phone. Those products would use face-tracking, speech and gestures to learn your driving habits and safely guide or entertain you on the road, according to the software engineer listing. It also hints that everything would be tied together using Azure's cloud platform, so that your favorite music or shortcuts would follow you around, even if you're not piloting your own rig. All that makes its original Connected Car plans from 2009 seem a bit laughable -- check the original video for yourself after the break.

  • ASUS Computex keynote now on YouTube: relive the excitement, the yelling

    by Sharif Sakr
    06.07.2012

    You think it's easy up there on stage? Then just try shouting "ubiquitous cloud computing era" at the top of your lungs without sounding silly. It's virtually impossible, as ASUS chairman Jonney Shih discovered 45 seconds into the video after the break. Fortunately, he quickly moved on to his company's rather stellar array of Computex reveals, including the dual-booting Transformer AiO (which doubles up as the "world's biggest tablet"), a couple of Windows 8 hybrids and the Taichi swiveller -- not to mention some live performance art ten minutes before the end. If you're the "Home C.I.O." in your family, then it could be professionally negligent to miss this.

  • EyeRing finger-mounted connected cam captures signs and dollar bills, identifies them with OCR (hands-on)

    by Zach Honig
    04.25.2012

    Ready to swap that diamond for a finger-mounted camera with a built-in trigger and Bluetooth connectivity? If it could help identify otherwise indistinguishable objects, you might just consider it. The MIT Media Lab's EyeRing project was designed with an assistive focus in mind, helping visually disabled persons read signs or identify currency, for example, while also serving to assist children during the tedious process of learning to read. Instead of hunting for a grownup to translate text into speech, a young student could direct EyeRing at words on a page, hit the shutter release, and receive a verbal response from a Bluetooth-connected device, such as a smartphone or tablet. EyeRing could be useful for other individuals as well, serving as an ever-ready imaging device that enables you to capture pictures or documents with ease, transmitting them automatically to a smartphone, then on to a media sharing site or a server. We peeked at EyeRing during our visit to the MIT Media Lab this week, and while the device is buggy at best in its current state, we can definitely see how it could fit into the lives of people unable to read posted signs, text on a page or the monetary value of a currency note. We had an opportunity to see several iterations of the device, which has come quite a long way in recent months, as you'll notice in the gallery below. The demo, which like many at the Lab includes a Samsung Epic 4G, transmits images from the ring to the smartphone, where text is highlighted and read aloud using a custom app. Snapping the text "ring," it took a dozen or so attempts before the rig correctly read the word aloud, but considering that we've seen much more accurate OCR implementations, it's reasonable to expect a more advanced version of the software to make its way out once the hardware is a bit more polished -- at this stage, EyeRing is more about the device itself, which had some issues of its own maintaining a link to the phone. 
You can get a feel for how the whole package works in the video after the break, which required quite a few takes before we were able to capture an accurate reading.

  • AT&T opens Watson API up to developers

    by Brian Heater
    04.19.2012

    Admit it, you don't have nearly enough opportunities to talk back to your phone. AT&T is giving you more. The company today announced that it will be offering its Watson real-time speech-to-text software to developers as APIs aimed at a number of different application types -- things like web search, question and answer apps and anything that uses AT&T's U-Verse TV services. A number of additional varieties are also in the works, including gaming and social media. Check out a cheery informational video after the break.

  • Next generation iPad adds voice dictation

    by Dave Caolo
    03.07.2012

    Earlier today at the iPad media event, Phil Schiller announced that the new iPad will feature voice dictation. Users will find a new microphone button on the keyboard. Give it a tap and start speaking, much like you do with the iPhone 4S. As of the announcement, voice dictation on the new iPad supports US, British and Australian English, as well as French, German and Japanese.

  • SpeechJammer gun gives loudmouths a dose of their own medicine to keep 'em quiet

    by Michael Gorman
    03.01.2012

    Silence is golden, so there are plenty of times when it'd be awfully convenient to mute those around us, and a couple of Japanese researchers have created a gadget that can do just that. Called the SpeechJammer, it's able to "disturb remote people's speech without any physical discomfort" by recording and replaying what you say a fraction of a second after you say it. Why would that shut up the chatty Cathy next to you? Delayed auditory feedback (DAF) is based on an established psychological principle that it's well-nigh impossible for folks to speak when their words are played back to them just after they've been uttered. SpeechJammer puts the power of DAF in a radar gun-style package that uses a directional mic and speaker, distance sensor and a trigger switch to turn it on, plus a laser pointer for targeting purposes -- so you simply point and shoot at your talkative target, and enjoy the silence that ensues. Piggy, your new conch has arrived, and this one can make Jack keep quiet.

  • Microsoft strikes deal with 24/7, promises to 'redefine' customer service

    by Donald Melanson
    02.07.2012

    A partnership between Microsoft and customer service company 24/7 may not exactly sound like the most exciting proposition on the face of things, but the two are making some fairly lofty promises, and Microsoft seems to be making a serious investment in the initiative. As ZDNet's Mary Jo Foley reports, part of the deal will see Microsoft send at least some of the 400 employees it brought on in its 2007 acquisition of TellMe Networks to 24/7, and it will also license some of its speech-related IP to the company (in addition to taking an equity stake in it). The goal is to combine natural user interfaces with a cloud-based customer service platform, which Microsoft promises will "redefine what customer service looks like." To that end, it gives the example of a credit card company getting in touch with you to report suspicious behavior; rather than a phone call, you could get a notification with all the pertinent details sent directly to your phone, which could anticipate a number of potential actions and let you respond by voice (or touch, presumably). Unfortunately, while the two are talking plenty about the future of customer service, there's not a lot of word as to when that might arrive.

  • Nuance gobbles up Vlingo, yearns to transcribe its own announcement

    by Dante Cesa
    12.21.2011

    Apparently, if you can't (legally) beat them, you buy them. Such is the thinking over at Nuance, which has decided to acquire its competitor and former courtroom dance partner, Vlingo. It should make for some nice additions to the former's voice recognition tubes -- technology that powers everything from Apple's Siri to Dragon dictation and even various autos. No indication as to how many greenbacks changed hands, but the newlyweds were happy to boast that their "complementary research and development efforts" will result in a company "stronger together than alone." We'll have to see about that. PR after the break.

  • New computer system can read your emotions, will probably be annoying about it (video)

    by Amar Toor
    11.22.2011

    It's bad enough listening to your therapist drone on about the hatred you harbor toward your father. Pretty soon, you may have to put up with a hyper-insightful computer, as well. That's what researchers from the Universidad Carlos III de Madrid have begun developing, with a new system capable of reading human emotions. As explained in their study, published in the Journal on Advances in Signal Processing, the computer has been designed to intelligently engage with people, and to adjust its dialogue according to a user's emotional state. To gauge this, researchers looked at a total of 60 acoustic parameters, including the tenor of a user's voice, the speed at which one speaks, and the length of any pauses. They also implemented controls to account for any endogenous reactions (e.g., if a user gets frustrated with the computer's speech), and enabled the adaptable device to modify its speech accordingly, based on predictions of where the conversation may lead. In the end, they found that users responded more positively whenever the computer spoke in "objective terms" (i.e., with more succinct dialogue). The same could probably be said for most bloggers, as well. Teleport past the break for the full PR, along with a demo video (in Spanish).