Category Archives: Speech Recognition

Utter Command Knowledge Base Updated

We’ve updated the Utter Command Knowledge Base with a couple of new pages:
Generally useful software, mostly free
Useful help and effective complaint URLs

“Generally useful software, mostly free” is just what it sounds like. In the coming weeks you’ll see more updates to the Knowledge Base, including strategies on using the software listed on this page.

“Useful help and effective complaint URLs” points you to effective places to complain about problems with common software. Make sure to mention that you use speech recognition when you register a bug or complaint about other software you are using. The more obvious it is that speech users are using their software, the more software makers will pay attention to how their software works with speech.

Urgent Dragon Alert: automatic check glitch can prevent Dragon from opening

By Kimberly Patch

[2-27-13 Update: We’ve gotten word that the issue with the Dragon update service has been fixed. It’s safe to turn on automatic updates if you wish.

In addition, there is a service pack available for Dragon 12. We strongly recommend downloading and installing this update.

A version of Utter Command that is compatible with this update is scheduled for release next week.]

Dragon NaturallySpeaking maker Nuance is having technical issues with its check-for-updates service.

The bottom line: don’t let Dragon automatically check for updates until this is fixed. The software checks periodically unless the “Check for Product Updates at Startup” feature is turned off. This feature is turned on by default.

Trouble is, if your software checks for updates and runs into this issue, Dragon will then not open, making it difficult to turn off the “Check for Product Updates at Startup” feature.

To protect yourself from this potential problem, turn off the “Check for Product Updates at Startup” feature: go to Dragon Options\Tools\Administrative Settings\Miscellaneous and uncheck “Check for Product Updates at Startup”.

If you’ve already run into this problem and Dragon won’t open, there’s a more elaborate fix posted in the Dragon forum: http://nuance.custhelp.com/app/answers/detail/a_id/15105

It’s fairly obvious from the trouble that Nuance is getting ready to release an update. Once Nuance solves the update issues, you’ll want to download the update. The update is fully compatible with Utter Command.

Check back here periodically – we’ll let you know when you can turn the update service back on.

Heads-up: Dragon Recorder iPhone App

By Kimberly Patch

Nuance has released a free iPhone Recorder application you can use with the “Transcribe Recording” feature of Dragon NaturallySpeaking for the desktop.

Dragon Recorder is a relatively simple recorder with a fairly clean interface that lets you record WAV files and transfer them to your computer via wifi. Once the files are on your computer, you can process them through Dragon’s Transcribe Recording feature, which is designed to transcribe the voice of a person who has trained a profile on Dragon NaturallySpeaking. It does pretty well with a relatively quiet recording of just that person’s voice.

Dragon Recorder gives you some useful, basic abilities:

  • You can pause, then continue recording.
  • You can play back the recording on the iPhone, and you can move the pause/play button to jump to different portions of the recording.
  • You can continue recording at the end of any previous recording. This is a little tricky: drag the play button all the way to the right and the play button will turn into a record button.
  • You designate the first portion of the name of your file in settings. The second portion of the name is an automatic date and time stamp.

I can think of a couple of additions I’m hoping to see in updates:

  • The ability to bookmark recordings on-the-fly during recording and playback. I’m picturing several types of bookmarks you can use like hash tags. Bookmarks should also show up in the transcription.
  • Although this is designed to be transcribed automatically, it would also be useful to have slider bars for controlling the speed and pitch of recording on playback so you have a good way to manually transcribe as well.

What do you think? Let me know at Kim at this website address or look me up on Google+. Feel free to + me if you want to be in my Accessibility, Utter Command or Redstart Reports circles.

iPhone 4S: speech advances but there’s more to do

By Kimberly Patch

Apple’s iPhone 4S has taken a couple of nice big steps toward adding practical speech to smart phones. There are still some big gaps, mind you. I’ll get to those as well.

Speech on the keyboard

The long-awaited speech button is now part of the keyboard. Everywhere there’s a keyboard you can dictate rather than type. This is far better than having to use an app to dictate, then cut and paste into applications. This is one of the big steps. This will make life much easier for people who have trouble using the keyboard. And I suspect a large contingent of others will find themselves dictating into the iPhone a good amount of time, increasingly reserving the keyboard for situations where they don’t want to be overheard.

The key question about speech on the keyboard is how it works beyond the letter keys and straight dictation.
For instance, after you type
“Great! I’ll meet you at the usual place (pool cue at the ready) at 6:30.”
how easy is it to change what you said to something like this?
“Excellent :-) I’ll meet you at the usual place (pool cue at the ready) at 7:00.”
And then how easy is it to go back to the original if you change your mind again?

Speech assistant

After we all use the speech assistant for a couple of days or weeks it’ll become readily apparent where Siri lies on the very-useful-to-very-annoying continuum.

The key parameters are
– how much time Siri saves you
– how a particular type of Siri audio feedback hits you the 10th time you’ve heard it
– how physically and cognitively easy it is to switch between the assistant and whatever you have to do with your hands on the phone.

One thing that has the potential to tame the annoyance factor is giving users some control over the feedback.

I think the tricky thing about computer-human feedback is it’s inherently different from human-human feedback. One difference is the computer has no feelings and we know that. Good computer-human feedback isn’t necessarily the same as good human-human feedback.

The big gap

There’s still a big speech gap on the iPhone. Speech is still just a partial interface.

Picture sitting in an office with a desktop computer and a human assistant. Type anything you want using the letter keys on your keyboard or ask the assistant to do things for you. You could get a fair amount of work done this way, but there’d still be situations where you’d want to control your computer directly using keyboard shortcuts, arrow keys or the mouse. Partial interfaces have a high annoyance factor.

Even if you use a mix of speech, keyboard and gesture, if you’re able to choose the method of input based on what you want to do rather than what happens to be available, true efficiencies will emerge.

Ultimately, I want to be able to completely control my phone by speech. And I suspect if we figure out how to do that, then make it available for everyone, the general mix of input will become more efficient.

I’d like to see the computer industry tap folks who have to use speech recognition as testers. I think this would push speech input into practical use more quickly and cut out some of the annoyance-factor growing pains.

What do you think? Let me know at Kim@ this domain name.

Getting Gmail working well with speech commands

By Kimberly Patch

If you haven’t used speech commands to control a computer, it might not be obvious that single character commands, for instance “y” to archive a message in Gmail, can present a challenge.

Single-character commands seem like a great idea, especially for Web programs, because your Web browser already takes up some common keyboard shortcuts. Gmail has a lot of single-character commands, and once you get to know them you can fly along using the keyboard. In general I’m all for more keyboard shortcuts because it’s easy to enable them using speech.

Command conundrum

Single-character commands that can’t be changed, however, can get speech users in a lot of trouble. Say a command or make a noise that’s misheard as text in a program that doesn’t use single-character shortcuts and either nothing happens or you get some stray text you can easily undo. Do the same thing in a single-character-command program and you can cause many actions to happen at once.

A stray “Kelly” in your Gmail inbox, for instance, will move the cursor up one message (single-character command “k”) and archive it (single-character command “y”). “Bruno” causes even more damage.

Turn off the keyboard shortcuts, though, and the program becomes fairly inaccessible for speech users. We need the shortcuts, and we can combine multiple keystrokes into single utterances to make things even better. It’s having little control over them that presents a problem.

Speech-safe single character shortcuts

Google Labs has a nifty extension that presents a simple fix. It lets you change the characters you use for keyboard shortcuts, including using two characters rather than one. Add a plus sign (+) to the beginning of every shortcut and they all become speech-safe.
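To see why the plus-sign prefix helps, here is a small Python simulation. It is only a sketch: the shortcut tables and the `actions_triggered` function are made up for illustration and are not Gmail code. Only the “k”, “y” and “j” shortcuts come from this post.

```python
# Hypothetical shortcut tables for illustration. Only "k" (move up),
# "y" (archive) and "j" (move down) are taken from the post.
SINGLE_CHAR = {"k": "move up", "y": "archive", "j": "move down"}

# The same shortcuts with a plus sign added to the front of each one.
PLUS_PREFIXED = {"+" + key: action for key, action in SINGLE_CHAR.items()}


def actions_triggered(typed, shortcuts):
    """Return the actions fired when `typed` arrives as raw keystrokes."""
    fired = []
    i = 0
    while i < len(typed):
        # Try two-character shortcuts first so "+k" wins over a bare "k".
        for length in (2, 1):
            chunk = typed[i:i + length].lower()
            if len(chunk) == length and chunk in shortcuts:
                fired.append(shortcuts[chunk])
                i += length
                break
        else:
            i += 1  # this keystroke is not a shortcut, so skip it
    return fired


# A misheard "Kelly" fires two actions with single-character shortcuts...
print(actions_triggered("Kelly", SINGLE_CHAR))
# ...but none once every shortcut starts with a plus sign.
print(actions_triggered("Kelly", PLUS_PREFIXED))
```

With the prefixed table, stray text only triggers an action if it happens to contain a plus sign followed by a shortcut letter, which is much less likely to come out of a misrecognition.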

Here are step-by-step instructions.
– go to your Gmail account, click the settings gear icon at the top right of the screen
– click “Labs”
– search for the “Custom Keyboard Shortcuts” extension and click to download. This will add a “Keyboard Shortcuts” tab to your Gmail settings
– now, click the settings gear icon at the top right of the screen
– click Keyboard Shortcuts
– add “+” to the beginning of every command

If you’re using Utter Command 2.0 you’re now all set. Say “Plus” and any one- or two-character command. Say, for instance, “Plus j” or “Plus Juliet” to move down one item. You can also say a command multiple times in a single utterance. Say “Plus j Repeat 5” to move down five items, for instance. And you can combine two commands: “Plus j Plus y” moves down one item, then archives that item. (Say “Question Mark” to call up the keyboard shortcuts list.)
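As a rough picture of what those utterances turn into, here is a toy Python sketch that expands a spoken phrase into the keystrokes Gmail would receive. This is not how Utter Command is implemented. The `SPOKEN_KEYS` table and the `expand` function are hypothetical, and the sketch assumes “Repeat N” means the preceding two-key command is sent N times in total.

```python
# Hypothetical mapping from spoken words to keys, covering only the
# examples in this post.
SPOKEN_KEYS = {"plus": "+", "j": "j", "juliet": "j", "y": "y"}


def expand(utterance):
    """Expand a spoken phrase into the string of keys it would type."""
    words = utterance.lower().split()
    keys = []
    i = 0
    while i < len(words):
        if words[i] == "repeat":
            count = int(words[i + 1])
            # Assumption: repeat the most recent two-key command ("+j")
            # so that it is sent `count` times in total.
            last_command = keys[-2:]
            keys.extend(last_command * (count - 1))
            i += 2
        else:
            keys.append(SPOKEN_KEYS[words[i]])
            i += 1
    return "".join(keys)


print(expand("Plus j Repeat 5"))
print(expand("Plus j Plus y"))
```

The point of the sketch is that each utterance ends up as plain keystrokes, so any program that accepts “+j” from the keyboard can be driven the same way by speech.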

Raising the bar

The Google Labs add-on enables Gmail for speech users, but there are many other programs out there that use single-character shortcuts, including other Google programs, and other Web-based programs like Twitter. Message for Google: How about one facility that would let us control keyboard shortcuts across Google programs?

It would also improve things if we could have a larger number of characters available for a given character shortcut, the ability to also control control-key shortcuts, the ability to save and share different sets, and the ability to apply at least some shortcuts across applications.

Important Note: If you were a beta tester or received the Utter Command 2.0 pre-release, you might not have the “Plus” set of commands. If this is the case, send e-mail to “Info” at this web address, and we’ll make sure you have the release version. The release version shows 15 new sets of commands on the “New commands for 2.0” list you can open from the Taskbar icon menu.

Tips, tricks, productivity, accessibility, usability and all things speech recognition.

Making filling out forms fast and easy

By Kimberly Patch

Here’s a simple way to make filling out forms in Firefox easier.

If you find yourself frequently putting the same old information (name, address, etc.) in a Web form, this will save you a lot of time, and it’s probably worth the time to set up even if you fill out forms just a few times a year (speech instructions are for Dragon plus Utter Command):

– Click on this link to download the Autofill Forms extension:
https://addons.mozilla.org/en-US/firefox/addon/autofill-forms/
– In Firefox say “Under Tango Alpha” to Click Tools/Add-ons
– “Shift Tab”, and if necessary “1-10 Down” to Navigate to Autofill Forms
– “2 Tab · Enter” to Click “Options”
– “2 Tab” to Navigate to the first field
– Fill in all applicable fields
– “Enter” to save your information

Now anytime you find yourself in a form field say “Under Juliet” and applicable fields will automatically fill in.

That was the quick easy setup. If you want to change the keyboard shortcut or set several different profiles, take a look at the options. There’s a lot you can do with this add-on.

Feel free to +Kim Patch if you want me to add you to my Utter Command Circle on Google+.


Posting to WordPress by speech

I get a lot of inquiries about how I carry out particular computer tasks by speech.

Here are the gory details on what I do to write a blog item and post it to WordPress:

Getting ready to write

When I think of an idea for a Patch on Speech blog post I say
– “Blog Pending Site” to bring up the Google document I write the blog in. Then I say
– “Find Mark 1”, then “Another Graph” to position the cursor. I have “MARK 1” written at the top of my working section. The first command selects “MARK 1”, and the second one positions the cursor two lines below it at the top of the section. Then I say
– “Today Short Enter” to add the current date and move the cursor to the next line

Writing

I either jot down an idea, or write a whole post.

When I’m writing I make heavy use of “1-20 Befores” to select the last few words I said and change them. A key point about this technique is I don’t count how many words I want to select back. I just make sure to select more words than I need to change, then look to see what is selected and resay what I need to.

I also make use of the Dragon inline commands, which allow you to say punctuation like “Open Quote” and “New Paragraph” without pausing. I use “Another Graph” to start a new paragraph when I’m not at the very end of a line. I occasionally find myself speaking keyboard to fix something, for instance “Left Backspace Right” to correct “two” to “to”.

We’ve just been testing a series of commands that lets you use a mouse without clicking, and I’ve been experimenting with commands like “Touch Word” and “Touch 3 Words” to select text.

Posting

After I’ve written and edited a piece, I say
– “Find Mark 1”, then “2 Down Home” to put the cursor at the beginning of the headline
Then I use several “1-100 Up\Downs” commands combined with a copy command to select the story, e.g. “50 Downs”, “20 Downs”, “5 Ups Copy”

Then I open the page where I post by saying
– “WordPress Site”
If I’m not already logged on it prompts me for my username. I have my username in the UC Enter list so I can say it and hit the Enter key in one utterance. Since my password is stored I can log in with a single utterance:
“<username> Enter”
Once I’m in I say
– “31 Go” to click the “New Post” link
– “Tab Paste” to tab to the body field and paste the text
– “Go Top” to move the cursor to the top of the file
– “Line Cut” to cut the headline
– “2 Delete” to remove the extra lines
– “49 Go” to move to the headline field
– “This Paste” to paste the headline

Categories and Publish

I add categories using the Go numbers, one or two at a time, e.g. “31 Go” to add one category and “38 Go 41 Go” to add two categories in a single utterance, and use a Go number to hit the “Preview” button.

Then I look over the post, say “Doc Close” to close the preview, and use a Go number to hit “Publish”.

Avoid having to remember commands

I think the key to enabling a program for efficient speech control is to take the time to look at what you want to do in detail and plot it out — take the time to write out the steps. Make a game of figuring out just how efficient you can be. Then take the steps and put them in one of the UC Custom Guides, so you can call it up instantly, e.g. “Custom 3 Guide”, and read the set of commands to carry out the task.

This way you don’t have to remember commands. Eventually, after using the guide a bunch of times, you’ll have the sequence memorized without having to consciously memorize it.

If you have a way of carrying out a task by speech that you’re particularly proud of — or if there’s something you’re struggling with — drop me a line at kim @ this web address.

I get a lot of inquiries into how I carry out particular computer tasks by speech.

Here are the gory details on what I do to write a blog item and post it to WordPress.

Getting ready to write

When I think of an idea for a Patch on Speech blog post I say

– “Blog Pending Site” to bring up the Google document I write the blog in. Then I say

– “Find Placeholder”, then “Another Graph” to position the cursor. I have “MARK 1” written at the top of my working section. The first command selects “MARK 1”, and the second command positions the cursor two lines below it, so the new ideas are always at the top of the section. Then I say

– “Today Short Enter” to add the current date and move the cursor to the next line

Writing

I either jot down an idea, or write a whole post.

When I’m writing I make heavy use of “1-20 Befores” to select the last few words I said and change them. A key point about this technique: I don’t count how many words I want to select back. I just make sure to go over the number I want to change, then I look to see what is selected and resay what I need to. I also make use of the Dragon Inline commands, which allow you to say punctuation like “Open Quote” and “New Paragraph” without pausing. I use “Another Graph” to start a new paragraph when I’m not at the very end of a line. I occasionally find myself speaking keyboard to fix something, for instance “Left Backspace Right” to correct “two” to “to”. We’ve just been testing out a series of commands that lets you use a mouse device without clicking, including “Touch Word”.

Posting

After I’ve written and edited a piece, I select the blog text and say

– “Copy to 1 File” to copy the story to the “1 File” clipboard so I can paste it later

– “2 Up” to unselect and put the cursor on the headline, and

– “Line Copy” to copy the headline

Once I have the blog and headline loaded up, I open the page where I post by saying

– “Word Press Site”

If I’m not already logged on it prompts me for my username. I have my username in the UC Enter list so I can say it and hit the Enter key in one utterance. Since my password is stored, this is all I need to say to log in:

“<username> Enter”

Once I’m in I say

– “31 Go” to click the “post” link

– “Paste Tab” to paste the headline and tab to the next field

– “1 File Paste” to paste the blog text.

I think the key to enabling a program for efficient speech control is to take the time to look at what you want to do in detail and plot it out: take the time to write out the steps. Make a game of figuring out just how efficient you can be. Then take the steps and put them in one of the UC custom guides, so you can call it up instantly and simply read the set of commands to carry out the task, e.g. “Custom 3 Guide”. This way you don’t have to remember commands. Eventually, from the repetition of saying and picturing the commands in the guide, you’ll have them memorized. But you won’t have to spend extra energy memorizing them while you’re trying to do your work.

If you have a way of carrying out a task by speech that you’re particularly proud of — or if there’s something you’re struggling with — drop me a line.

Spell Everywhere

I’ve been getting a lot of questions lately about the Dragon NaturallySpeaking “Spell XYZ” command. This command lets you say, for instance “Spell s a”. People are complaining that it sometimes doesn’t work. They’re right.

This command doesn’t work everywhere. It only works in text boxes. This is an unfortunate oversight in the Dragon user interface.

Logically, any speech command should work in all contexts where it could be useful. It’s unnecessarily difficult to make the user remember different commands to carry out the same operations in different contexts. Something as basic as pressing a letter key should work anywhere you might want to use a letter, including menus.

This is what people are complaining about. Those who are complaining have gotten adept enough at speech that something basic like pressing letter keys becomes second nature. They have a habit of saying “Spell” and then a letter, number or symbol name whenever they have to hit separate keys. The definition of habit is you don’t have to think about it. And this is where they get in trouble — the habit kicks in everywhere, including when you are in a drop-down menu that doesn’t respond to full words.

If you’d like to use the “Spell XYZ” command everywhere rather than having to stop and think about where you can and can’t use it, complain to Nuance, the company that makes Dragon (there are a couple of ways to do this; details are posted on the Redstart wiki: http://redstartsystems.com/Wikka/wikka.php?wakka=NatSpeakUtilitiesandResources).

What’s in a name? Lots.

I get a lot of inquiries from people who are confused about the Dragon speech engine’s many names, and also the name of the company that owns it.

Here’s a brief history:

The Dragon speech engine has changed hands twice, but the name of the company owning it has changed three times.
In the beginning Dragon Systems created the DragonDictate speech engine. Also in the beginning, several other companies created programs that let you speak to a computer: Kurzweil Applied Intelligence, Lernout & Hauspie, IBM and Philips. These early speech engines all required you to pause between words. This was a somewhat frustrating way to dictate and was hard on your voice.

Dragon, Lernout & Hauspie, IBM and Philips eventually improved their speech engines so you could dictate in phrases. When Dragon Systems brought out continuous speech recognition, it changed the name of its product to Dragon NaturallySpeaking. Dragon NaturallySpeaking generally worked better for dictation than DragonDictate.

People who were trying to use Dragon NaturallySpeaking hands-free, however, found that Dragon NaturallySpeaking lacked some of the DragonDictate features. Some of us who needed hands-free speech input used a combination of DragonDictate and Dragon NaturallySpeaking for years. (For me it was until NaturallySpeaking 3.5 came out. There are still a couple of features that were in the old DragonDictate that haven’t made it into Dragon NaturallySpeaking. The one I miss the most is the ability to go straight to a macro script from the recognition dialog box where you could see what Dragon had heard.) So DragonDictate was used and talked about long after development stopped.

Just before Dragon NaturallySpeaking version 5 came out Dragon Systems was sold to Lernout & Hauspie, makers of rival speech engine VoiceXpress Pro. NaturallySpeaking 6 was a merger of the products, keeping the NaturallySpeaking name and most of the look and feel (with the notable exception of the macro creation facility). When Lernout & Hauspie famously melted down, the Lernout & Hauspie speech assets were sold to ScanSoft, a company that started with optical scanning recognition technology acquired from Xerox, who acquired it by buying Kurzweil Computer Products, Inc., one of several companies started by Ray Kurzweil. (The Lernout & Hauspie speech assets also included the Kurzweil Voice speech engine, which Lernout & Hauspie had acquired by buying Kurzweil Applied Intelligence, another company started by Ray Kurzweil.)

Just before ScanSoft acquired Dragon, they’d signed a 10-year deal with IBM to market IBM’s ViaVoice, which by then included PC and Mac versions. After the ScanSoft acquisition there were no more new ViaVoice products. Over the next few years ScanSoft acquired many more speech-related companies including Nuance. After the Nuance acquisition, ScanSoft switched its name to Nuance. Some people refer to the old Nuance as blue Nuance and the current Nuance as green Nuance. (This was the second name change for ScanSoft. It was founded in 1992 as Visioneer.)

This year, Nuance created an iPhone app named Dragon Dictation — name sound familiar?

Also this year Nuance bought MacSpeech. There’s some name history here too. MacSpeech’s original speech engine for the Mac, iListen, was based on the Philips FreeSpeech2000 speech engine. MacSpeech changed its product name to match the company name after signing an initial deal with Nuance in early 2008 to use the Dragon NaturallySpeaking engine. (Later in 2008 Nuance bought Philips Speech Recognition Systems.) After buying MacSpeech, Nuance renamed the speech engine product Dragon Dictate for Mac. Name sound familiar? The old DragonDictate had no space between words. The new Dragon Dictate is two separate words.

OK. Got that all straight? There’s a little more nitty-gritty. The Dragon NaturallySpeaking product line includes a basic version, middle version, professional version, legal version and medical version. The professional, legal and medical versions all originally had the “Dragon NaturallySpeaking” first and middle names, but somewhere along the line the legal and medical versions lost NaturallySpeaking, becoming Dragon Legal, and Dragon Medical.

Meanwhile, the basic version and middle versions have recently changed names. The basic version has in the past gone by “standard” but is currently “home”. The middle version has in the past gone by “preferred” but is currently “premium”. There’s also a sub-basic version not usually sold by resellers that can be found in retail stores usually around Christmastime named Dragon NaturallySpeaking Essentials.

One last thing. I’m not sure where Dragon Speak came from. I’ve heard many people refer to Dragon NaturallySpeaking as Dragon Speak, but that’s never been an official name — so far.

So — I hope that clears everything up.

Utter Command has always been named Utter Command — just saying.

Suggestion for Dragon: Easier Correction

In the last couple of months I’ve had a couple of occasions to suggest to the folks at Nuance, the company that makes the Dragon NaturallySpeaking speech engine, that their “Resume With” command is under-advertised. The command is very useful, but I keep meeting people who don’t know about it.

“Resume With” lets you change text on the fly. For instance, if you say “The black cat jumped over the brown dog”, then — once you see it on the screen — change your mind about the last bit and say “Resume With over the moon”, the phrase will change to “The black cat jumped over the moon.”
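Here is a minimal Python sketch of the behavior described above. It is a guess at the visible effect, not Dragon’s actual implementation: the `resume_with` function is hypothetical, and it assumes the command looks backward for the most recent occurrence of the first word you say and replaces everything from there on.

```python
def resume_with(dictated, new_words):
    """Replace the tail of `dictated`, starting at the most recent
    occurrence of the first word in `new_words`."""
    words = dictated.split()
    replacement = new_words.split()
    if not replacement:
        return dictated
    anchor = replacement[0].lower()
    # Search backward so the most recently dictated match wins.
    for i in range(len(words) - 1, -1, -1):
        if words[i].lower().strip(".,!?") == anchor:
            return " ".join(words[:i] + replacement)
    # Assumption: if the anchor word is not found, leave the text alone.
    return dictated


print(resume_with("The black cat jumped over the brown dog", "over the moon"))
```

Searching backward matters when the anchor word appears more than once, since you almost always want to change the most recent bit of what you just said.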

This is a particularly useful command for doing something people do a lot — change text as they dictate.

Now I have a suggestion that I think would make the command both better and more often used. Split “Resume With” into two commands: “Try Again” and “Change To”. The two commands would have the same result as “Resume With”, but “Try Again” would tell the computer that the recognition engine got it wrong the first time and you are correcting the error. “Change To” would tell the computer that you are simply changing text.

This would be a less painful way to correct text than the traditional correction box. Users are tempted to change text rather than correct it because it’s easier. This would make it equally easy to correct and change using what is arguably the fastest and easiest way to make a change.

Easy correcting is important because NaturallySpeaking learns from correcting and because it’s annoying when the computer gets things wrong. Correcting improves recognition. Minimizing the interruption reduces frustration and lets users concentrate on their work rather than spending time telling Dragon how to do its job. From my observations, many users are tempted to change text rather than correct it when the computer gets something wrong simply because it’s easier.

It would be great to have these commands both in Dragon NaturallySpeaking on the desktop and in Dragon Dictation, the iPhone application. This would enable truly hands-free dictation in Dragon Dictation.