Trying out Dragon Search for the iPhone

Dragon Search is a nice app. Here’s how it works: open the app, hit one button, speak the phrase you want to search for. By default the app stops listening and starts the search when you pause so you don’t have to hit another button when you’re done.

The app comes up quickly, which from a practical standpoint is extremely important. And in my experience so far the search has been fast. There’s also a button you can push to cancel out of the search. The big plus of this application is the different search channels: Google, iTunes, Twitter, Wikipedia, and YouTube. You can search for something, like green apples, and the results will come up in the channel you used last. Once you’ve done a search you can switch channels easily to see results across channels.

I have a couple of practical suggestions.

1. The history list is just three items long — I’d like a much longer scrolling history list. Google Voice Search has a long scrolling list that includes dates. I would’ve liked to have seen Nuance improve on that.

2. I’d also like to be able to add my own channel.

I’ll also take the opportunity to repeat what I said a couple of days ago. I appreciate the progress on speech apps — don’t get me wrong. But speech on the iPhone is still not what I really want, which is system-level speech control of a mobile device that would give me the option to use speech for anything. These new apps are steps in the right direction — making the iPhone more hands-free. But there’s still a long way to go.

A few more thoughts on Dragon Dictation

I’ve been using Dragon Dictation on the iPhone a little more over the past few days and have a couple more thoughts for improvement.

1. If you select text in the full-screen application, then switch to the keyboard, the text doesn’t stay selected. The text should stay selected. If you’ve selected an incorrect word or phrase, found there are no correct choices, and are proceeding to the keyboard to correct it, it’s frustrating to have to select again.

2. I’ve lost dictation a couple of times because I’ve switched out of the app. This is unexpected because writing apps like Notepad tend to stay where you left them. I suspect that Dragon Dictation maker Nuance made this choice in order to limit the number of steps for new dictation. I think there are ways to keep the last dictation available without increasing steps. The quick solution would be a “remember last dictation” option in settings that would let the user decide which way to do it. Maybe a better solution would be adding a “continue” button to the bottom of the initial screen. If you wanted to start fresh you would press the main button in the middle of the screen, but if you wanted to continue you could press the smaller “continue” button at the bottom of the screen.

Trying out Dragon Dictation for the iPhone

I’ve been trying out the Dragon Dictation iPhone app. It’s still not what I really want, which is system-level speech control of a mobile device that would give me the option to use speech for anything. But it’s a step in the right direction of making the iPhone more hands-free.

Here’s how Dragon Dictation for the iPhone works: open the app, hit one button, speak up to 30 seconds of dictation, then hit another button to say you’re done. Your dictation shows up on the screen a few seconds later. Behind the scenes the audio file you’ve dictated is sent to a server, put through a speech-recognition engine, and the results sent back to your screen. Now you can add to your text by dictating again, or hit an actions button that gives you three choices: send what you’ve written to your e-mail app, send it to your text app, or copy it to the clipboard so you can paste it someplace else.

The recognition is usually fairly accurate in quiet environments. Not surprisingly, you get a lot of errors in noisy environments. The built-in microphone on a mobile device is not optimal for speech recognition. To its credit, the app does pretty well given these constraints.

Here’s a practical suggestion that should be easy to implement: Add a decibel meter so people can see exactly how much background noise there is at any given time. This would make people more aware of background noise so they could set their expectations accordingly.
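A meter like this could be driven by a very small calculation. As a rough sketch only (this is not how Dragon Dictation works internally, and the function names and the quiet/moderate/noisy thresholds are my own assumptions), here is one way to turn a buffer of 16-bit microphone samples into a level in decibels relative to full scale:

```python
import math

FULL_SCALE = 32768  # magnitude of the largest 16-bit PCM sample


def level_dbfs(samples):
    """Return the RMS level of 16-bit PCM samples in dB relative to full scale.

    Silence (or an empty buffer) is reported as negative infinity.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / FULL_SCALE)


def describe_noise(dbfs, quiet_below=-50.0, noisy_above=-30.0):
    """Map a level to a label. The thresholds are illustrative guesses."""
    if dbfs < quiet_below:
        return "quiet"
    if dbfs > noisy_above:
        return "noisy"
    return "moderate"
```

An app would call `level_dbfs` on each short chunk of audio before and during dictation and show the result as a bar or label. The thresholds would need tuning against real recognition results on a real device.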

The interface for correcting errors is reasonable. Tap on a word and there are sometimes alternates available or you can delete it. Tap the keyboard button and you can use the regular system keyboard to clean things up.

I have two interface suggestions:

1. You can’t use the regular system copy and paste without going into the keyboard mode. You should be able to. I suspect this is fairly easy to fix.

2. There is no speech facility for correcting errors. I think there’s a practical fix here as well.

First, some background. Full dictation on a mobile device is tricky. Full dictation speech engines take a lot of horsepower. Dragon Dictation sidesteps the problem by sending the dictation over the network to a server running a speech engine. The trade-off is it’s difficult to give the user close control of the text — you must dictate in batches and wait briefly to see the results. This makes it more difficult to offer ways to correct using speech. But I think there is a good solution already in use on another platform.

Although it’s difficult to implement most speech commands given the server setup, the “Resume With” command that’s part of the Dragon NaturallySpeaking desktop speech application is a different animal. This command lets you start over at any point in the phrase you last dictated by picking up the last couple of words that will remain the same and dictating the rest over again.
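To show why this command fits the batch model, here is a minimal sketch of the splice it implies. This is my own illustration, not Nuance’s implementation: the function name `resume_with` and the two-word anchor are assumptions. The idea is that the new batch begins with words that already appear in the previous text, so the app only has to find those words and replace everything from that point on.

```python
def resume_with(previous_text, new_dictation, anchor_word_count=2):
    """Splice new_dictation onto previous_text at the anchor words.

    The first anchor_word_count words of new_dictation are treated as the
    anchor. The search runs from the end of previous_text so the most recent
    occurrence wins. If the anchor is not found, previous_text is returned
    unchanged. Matching is case-insensitive and ignores nothing else, so
    punctuation attached to a word will prevent a match.
    """
    prev_words = previous_text.split()
    new_words = new_dictation.split()
    anchor = [w.lower() for w in new_words[:anchor_word_count]]
    n = len(anchor)
    if n == 0:
        return previous_text
    for start in range(len(prev_words) - n, -1, -1):
        window = [w.lower() for w in prev_words[start:start + n]]
        if window == anchor:
            return " ".join(prev_words[:start] + new_words)
    return previous_text


fixed = resume_with(
    "send the report to Bob on Tuesday",
    "to Bob on Thursday morning",
)
print(fixed)
```

Because the whole operation works on the returned text of the last batch, it needs no close, word-by-word control of the speech engine on the server.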

This would make Dragon Dictation much more useful for people who are trying to be as hands-free as possible. It would also lower the frustration of misrecognitions and subtly teach people to dictate better.

It’s nice to see progress on mobile speech. I’m looking forward to more.

Tip: Scrolling by speech

I’ve gotten several questions lately about scrolling by speech, which is key to comfortable hands-free operation. Utter Command gives you several ways to scroll by speech. The best way depends on the situation.

To quickly look something over, use the speech command that allows you to see successive screens with a pause between changes. For example, “3 Screen Down Wait” moves down a screen, then after a default wait of two seconds moves down another screen, then two seconds later moves down a third screen. If you want a longer wait, add a specific number of seconds, e.g. “3 Screen Down Wait 5” (UC Lesson 7.23). 
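The timing in these examples is easy to spell out. The sketch below is only an illustration of the behavior described above, not Utter Command’s code; the parser, the function names, and the “Screen Up” variant are my assumptions. It turns a spoken command into a list of moments (in seconds from the start) at which the screen moves.

```python
import re

DEFAULT_WAIT_SECONDS = 2  # the default wait described above

# "<count> Screen Down Wait [<seconds>]"; accepting "Up" is an assumption.
_PATTERN = re.compile(r"^(\d+) Screen (Up|Down) Wait(?: (\d+))?$")


def parse_screen_wait(command):
    """Return (count, direction, wait_seconds) for a screen-wait command."""
    match = _PATTERN.match(command.strip())
    if match is None:
        raise ValueError(f"not a screen-wait command: {command!r}")
    count = int(match.group(1))
    direction = match.group(2).lower()
    wait = int(match.group(3)) if match.group(3) else DEFAULT_WAIT_SECONDS
    return count, direction, wait


def schedule(command):
    """Return (seconds_from_start, direction) for each screen move."""
    count, direction, wait = parse_screen_wait(command)
    return [(i * wait, direction) for i in range(count)]


print(schedule("3 Screen Down Wait"))
print(schedule("3 Screen Down Wait 5"))
```

The first move happens right away and each later move follows after the wait, which matches the “moves down a screen, then after two seconds moves down another” description.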

To directly control the scroll bar by speech, place the mouse pointer on the scroll bar using a command like “99 by 10” and use the vertical drag command to move the scroll bar to a given point. For example, “Drag By 50” moves the scroll bar to the middle. If you then want to go three quarters of the way down, say “Drag By 75”. You can also control the scroll bar incrementally, for instance, “Drag 3 Down” (UC Lesson 4.2, 4.5).
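The numbers in commands like “Drag By 50” read as percentages, so the arithmetic behind them is simple. Here is a small sketch under that assumption; the function name and the 1080-pixel example height are mine, and this is not taken from Utter Command itself.

```python
def percent_to_pixel(percent, length_px):
    """Map a 0-100 percentage to a pixel offset along a span of length_px.

    0 maps to the first pixel and 100 to the last one. Integer arithmetic
    keeps the result predictable.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if length_px < 1:
        raise ValueError("length_px must be at least 1")
    return percent * (length_px - 1) // 100


# On a hypothetical 1080-pixel-tall screen:
middle = percent_to_pixel(50, 1080)
three_quarters = percent_to_pixel(75, 1080)
print(middle, three_quarters)
```

Thinking in percentages rather than pixels is what makes the commands portable: the same words land in roughly the same place regardless of screen size.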

In some programs, including some versions of Word, the cursor moves to the page you scrolled to when you use an arrow command like “5 Down”. And in some programs, like Firefox, you can say a link number to move the cursor. In these cases you can leave the arrow parked on the scroll bar, edit the text, then say another drag command to move the scroll bar without having to move the mouse to the scroll bar again. In some programs, including WordPad, you have to move the cursor to the new page by clicking. In this case, keep the right ruler open on your screen so you can easily click back to the scroll bar when you’re ready to scroll again.

– If you use this method a lot, try naming a mouse click to move the arrow to the scroll bar at the home position (UC Lesson 10.24).

– You can also use this method to control horizontal scrollbars — use the “Drag 1-100 By” command.

– If you’re a ZoomText user, you can use this method even when the scrollbar is not showing on the screen.

Tell me what you think about scrolling by speech – reply here or let me know at info@ this website address.

Highlighting and hot water

Have you ever used a faucet that had a hot water knob on the right side instead of the left?

Even if it’s well labeled, chances are you’ll turn the wrong handle a good percentage of the time. This is because controlling the faucet is something you usually do without thinking and your habit is to turn with your left hand when you want hot, not your right.

Consistency allows for habit, which saves time. Do a consistent navigation task a few times and after that you don’t have to think about it. It’s become habit, which means you can use more of your brain to think. The system backfires, however, when you unconsciously expect consistency, use habit, and are caught by surprise.

I often talk about the importance of consistent keyboard shortcuts across programs, because I use keyboard shortcut navigation more than mouse/toolbar navigation.

But consistency is just as important in toolbars.

The default order for many common groups of items is consistent across programs. For instance, Bold, Italic and Underline are commonly shown in that order. Left Justify, Center and Right Justify are commonly shown in that order. Style, Font and Size are commonly shown in that order. There’s a glaring problem, however, when it comes to the highlight and text color icons.

Microsoft Office toolbars put the highlight on the left and the text color icon on the right, while Google Docs and OpenOffice defaults put the highlight button on the right and the text color icon on the left.

The inconsistency makes it impossible to form a habit that’s useful across programs. If you get used to one way you’ll inevitably pick the wrong button when you’re in the program you’re not used to. If you regularly use a mix of inconsistent programs you’re likely to get things wrong fairly often.

In a world where people use multiple programs, inconsistent default order in groups of icons puts a larger-than-necessary cognitive load on folks. Worse, it makes habit a liability rather than an advantage.

It would be good for people if we had a standard order for related icons like Highlight and Text Color just as we have a standard order for faucet controls. The exact order matters much, much less than consistency across programs. Software is complicated enough already — we need to give people all the easy breaks we can.

Speeding Web navigation: single-step deep menu access

Utter Command speech-enables the Firefox Mouseless Browsing extension, which puts a unique number on every clickable item on a Web page. UC lets you click every item on a page, including links, by saying the number plus the word “Go”, for instance “7 Go”.

This works pretty well, but it gets even better when you discover that an item doesn’t have to be visible for you to click it.

This lets you click items that are off-screen. Better yet, it lets you click items on drop-down menus without having to first drop down the menu. This lets you use a single step to get to any menu item in a Web application once you know the number.

For instance, to insert a horizontal line in a Google Document you can click the “Insert” menu, then click the menu item “Horizontal Line”. There’s no direct keyboard shortcut for horizontal line, so it’s usually a two-step task.

Using numbers you can say “7 Go” to drop down the Insert menu, then “84 Go” to click Horizontal Line. But if, like me, you add horizontal lines often enough to remember the number, you can cut straight to the chase and say “84 Go” anytime you want a horizontal line.

Tip: Beat the heat

Here in Boston right now it’s ridiculously hot outside. If you’re using speech recognition on a computer in a room that’s hot, you might have a fan going and/or the computer fan might be going continuously rather than occasionally. And if this is the case, you’re probably getting worse than usual recognition.

There’s something you can do about it, however. Dragon NaturallySpeaking does an audio check when you initially train a user. The audio check adjusts sound levels and checks for background noise. If your background noise changes, it’s a good idea to do an audio check. This includes if it’s a hellishly hot day out and there’s extra fan noise around.

To do an audio check, say:
1. “NatSpeak Accuracy” to open the NatSpeak Accuracy Center window
2. “Under c” (or “Under Charlie”) to click “Check your audio settings”, which brings up the Audio Setup Wizard dialog box
3. Now follow the instructions to go through the wizard

Unfortunately the Audio Setup Wizard is not hands-free. Log a complaint to NaturallySpeaking maker Nuance about this (see the UC Exchange page on NatSpeak Utilities and Resources for ideas about where to do so).

Remember to run the Audio Setup Wizard whenever the general noise around you changes, or when you take a laptop to a new space. Accurate audio settings make for faster, better recognition.

Keep cool.

What are your speech pet peeves? Tell me about them – reply here or let me know at info@ this website address.

Tip: Not my mistake

One thing that the Dragon NaturallySpeaking speech engine could do better is hyphenation. I don’t mind so much when I say something that should be hyphenated and it’s not. I can always say the NaturallySpeaking command “hyphenate that” or the UC command “1-10 Hyphenate” after the fact if the NaturallySpeaking engine leaves out the hyphenation. I can also specify hyphenation when I want it, e.g. “on hyphen the hyphen fly” will type “on-the-fly”.

If I have something that’s not hyphenated and should be, it’s either a mistake or something I accidentally left out.

But if NaturallySpeaking puts in hyphenation where I don’t want it, there are two problems. First, there’s not an easy way to remove hyphenation after the fact — I have to select the phrase, then say it again in two phrases so it won’t be hyphenated, which is 3 steps. Second, there’s no way to specify no hyphenation.

If NaturallySpeaking over-hyphenates and I don’t notice, it looks like I’m consciously adding hyphens where they shouldn’t be. There’s nothing more annoying than having another entity introduce mistakes into your work.

Because the minuses of over-hyphenation are larger than the minuses of not hyphenating enough, when I see a phrase hyphenated when it’s not supposed to be I remove the hyphenated version from NatSpeak Vocabulary so it won’t happen again.

For instance, I removed “follow-up”, which I often put as a stand-alone tag in my todo list. It’s a clunky workaround, but it’ll have to do until speech engines get better at analyzing hyphenation.

To remove a vocabulary word say “NatSpeak Vocabulary”, say the word or phrase you want to delete, “Under d c” to delete and close the window, and “Enter” to confirm the change.

I think Nuance could mitigate this problem with a pair of in-line commands: “no-hyphen that” would remove hyphenation in the last phrase and “no-hyphen” would specify that something not be hyphenated, parallel to the “no-caps” command. I’m adding this to the Nuance wish list.
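The text change behind a “no-hyphen that” command would be tiny. Here is a minimal sketch of what I have in mind; the function name is hypothetical and this is a proposal, not an existing NaturallySpeaking command. It replaces only hyphens that sit directly between two word characters, so a spaced dash used as punctuation is left alone.

```python
import re


def no_hyphen_that(last_phrase):
    """Replace hyphens that join two words with spaces in the last phrase."""
    return re.sub(r"(?<=\w)-(?=\w)", " ", last_phrase)


print(no_hyphen_that("follow-up"))
print(no_hyphen_that("on-the-fly"))
```

This is one step instead of the three it takes now to select the phrase and dictate it again in pieces.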

Tip: What to do when dictation isn't recognized as text

Occasionally the Dragon NaturallySpeaking speech engine will get mixed up about whether or not the program or field in focus is something you should be able to type text into. When this happens you’ll see lots of question marks in the recognition box.

The problem is usually easy to fix — move the focus out of whatever program this is happening in, then back in. Here’s a quick way to do that — the UC command “Notepad Open · Notepad Close”.
