For as much as we complain, we totally get it. Teaching a computer program how to recognize, understand and act upon the movement of human vocal cords is a Herculean task. Throw in nearly unlimited amounts of dialect and regional variation within even a single language, and it's a wonder that programs such as Nuance's Dragon Dictate even exist. Teaching a vehicle how to route calls, adjust volume and tweak a radio station is one thing, but having a program that turns actual speech into presentable documents requires a heightened level of accuracy. The newest build of Dragon Dictate for Mac (v2.5) allows users to seamlessly combine dictation with mouse and keyboard input in Microsoft Word 2011; it also gives yappers the ability to more finely control how Dragon formats text such as dates, times, numbers and addresses, while a free iOS app turns your iPhone, iPad or iPod touch into a wireless microphone. We recently pushed our preconceived notions about this stuff aside in order to spend a solid week relying on our voice instead of our fingertips -- read on to see how it turned out.
Installation and setup
For those unaware, Dragon Dictate is comparatively new to the Mac platform. In short, it's an application that runs quietly in the background and picks up whatever you say into the bundled Plantronics USB microphone. The installation process is relatively painless -- our boxed test unit did indeed require an optical drive to be nearby, but digital download versions are available for those who have opted for one of Apple's ODD-less machines. The most time-consuming part of the setup is right at the beginning. Users are asked to read 10 or 15 paragraphs aloud so that the software can learn how you speak, how you pronounce certain things and whether or not you've a southern drawl.
Once your profile is created, you can then add certain words that aren't in the (admittedly gargantuan) dictionary. For example, we hastily added DARPA, MHz and BRB -- you know, essential terms in this line of work. That said, we were downright flabbergasted by how many words were already there, including ye olde Engadget. One of the things that we wondered about going into this review was compatibility; the company doesn't exactly make clear if you need a certain program to have words transcribed. Thankfully, we found that pretty much any program that accepts text works just fine with Dragon Dictate. That includes Microsoft Word, TextEdit, Skype, insert-your-IRC-app-here, Gmail, ScribeFire, Microsoft Excel and even the URL bar in Google Chrome.
Inputting text via voice couldn't be easier. The program includes a few modes: dictation is one -- it's the one you'll likely use most frequently -- and command mode is another. If you're in the latter, you're able to simply tell your computer to open a program, close a window, or quit a certain program. In practice, I found it easier to use my mouse cursor to select which program I wanted my voice to input text into. For example, I would exit the conversation in IRC, mouse over to Gmail, and begin talking once again. So long as you keep an eye on where your cursor is, you'll have no problems getting voice memos into your application or area of choice.
In practice...
It's also worth pointing out that there is somewhat of a learning curve for first-time users. Unfortunately, things like natural pauses aren't recognized as commas. In other words, you actually have to say the word "comma" in order for it to insert one. The same goes for any other type of punctuation mark, and while it's rather awkward at first to speak out these marks that usually fly through your thoughts as you're going from word to word, it grows easier with time. On the plus side, having the ability to input commands through your voice enables you to actually initiate macros with spoken commands. Yeah, that means you can close an existing Firefox tab by just saying "Press the key Command W."
In practice, the program had no issues whatsoever recognizing our commands. It was only when we attempted to speak entirely too fast that we saw missed actions. Nuance claims that you don't actually have to change the way you speak for the program. To an extent, that's true. But the reality is this: you need to speak solemnly and deliberately, and enunciate properly, in order to reduce the number of errors you see. If you begin to speak quickly, as if you were just jawing with a friend, the number of errors will inevitably rise. Trust us on that one. That's not to say you have to speak slowly, but think of it this way -- you're aiming to speak more like Brian Williams on the Nightly News than you are like yourself when chatting with your long-lost sister.
Like many things, this program works best if you put a lot of effort into learning how to use it up front. It took us around two to three hours before we felt completely comfortable speaking into the microphone and predicting exactly what would happen on the other end. As time went on, the number of mistakes we saw decreased. But that's largely because we began to compensate for the program in areas where we knew it was weak. To some extent, that's an unfortunate necessity. We found ourselves with fewer errors, but we were intentionally sidestepping words that we genuinely wanted to use simply because we knew the program had a difficult time transcribing them. If you work in an industry where flowery language isn't exactly an expectation, this may be a nonfactor for you.
Productivity impact
Overall, we were supremely blown away by just how well Dragon Dictate operated. We went into this with the goal of seeing just how much more productive we could be by speaking all day instead of typing. Turns out, those who consider themselves to be Professional Emailers stand to gain quite a bit from using this. We can attest to the wrist pain that creeps up after banging away for 10+ hours, and folks who are looking to at least delay the inevitable onset of Carpal Tunnel Syndrome will be downright thrilled with the boost in perceived health. After years of typing for the majority of our waking hours, being able to cut that by a good 60 to 70 percent (in favor of using voice) was downright heavenly. If you don't type for a living, there's a solid chance you won't be in the target market looking to drop $137 on a program like this, but if you do... well, you can probably relate.
The real kicker, however, is speed and accuracy. On our Core i7 test rig, speed was never an issue. The program didn't visibly bog down our system, and our words were spat out onto digital paper mere milliseconds after we uttered them. Accuracy, however, still isn't perfect. But it's wildly close. When banging out long-form content (such as this review, or a term paper, or a dissertation), you might see five to ten errors throughout -- errors that are easily addressable as you're proofreading your content before submitting. Annoying? Sure, but no more so than leaving out words altogether or trying to peck out "convenience" for three minutes sans a dictionary -- things we routinely do when typing. For what it's worth, the updated build does indeed integrate well with Word 2011, allowing users to insert typed text in the midst of spoken text. In fact, that functionality extended to every single application we used, and it's a vital part of the workflow.
We often came upon sections of text -- mostly littered with subscripts, parenthetical references and a litany of quotation marks -- where we simply preferred to type them out instead of speak them. So we did. And when we started speaking again, it simply picked up wherever the cursor was left blinking. Every so often we'd have a word wrap issue, but those were few and far betwixt, and could be solved with a close + reopen. While we're on the topic of errors, we should point out that we spoke this entire review, top to bottom. All told, we had six errors, and we're guessing that we would've found at least six had we penned the entire thing on a keyboard. Specifically, the software erroneously understood "eye on" to be "aisle," "aren't" to be "are," "end" to be "hand," and "weak" to be "week." It also tried to close TextEdit when we said the word "close," and it routinely assumed that "nah" was "now."
Some of those problems are more grating than others. We had to train ourselves to say "aren't" in a way that the software could recognize; otherwise, we handed out the completely wrong message when speaking. We also longed for more slang terms to be understood. When conversing with colleagues, the use of "nah" is a far softer way to express dissent than "no," but we found it nigh impossible to get the software to recognize it. Words like "duuuuuude" and "hahaha" also have no place here -- not that we're shocked or anything, but we're guessing the corporate set will want to use this for friendly sidechats on occasion.