Category Archives: mobile

Heads-up: Dragon Recorder iPhone App

By Kimberly Patch

Nuance has released a free iPhone Recorder application you can use with the “Transcribe Recording” feature of Dragon NaturallySpeaking for the desktop.

Dragon Recorder is a relatively simple recorder with a fairly clean interface that lets you record WAV files and transfer them to your computer via wifi. Once the files are on your computer, you can process them through Dragon’s Transcribe Recording feature, which is designed to transcribe the voice of a person who has trained a profile on Dragon NaturallySpeaking. It does pretty well with a relatively quiet recording of just that person’s voice.

Dragon Recorder gives you some useful, basic abilities:

  • You can pause, then continue recording.
  • You can play back the recording on the iPhone, and you can move the pause/play button to jump to different portions of the recording.
  • You can continue recording at the end of any previous recording. This is a little tricky: drag the play button all the way to the right and it will turn into a record button.
  • You designate the first portion of the name of your file in settings. The second portion of the name is an automatic date and time stamp.

I can think of a couple of additions I’m hoping to see in updates:

  • The ability to bookmark recordings on-the-fly during recording and playback. I’m picturing several types of bookmarks you can use like hash tags. Bookmarks should also show up in the transcription.
  • Although recordings are designed to be transcribed automatically, it would also be useful to have slider bars for controlling the speed and pitch of playback so you have a good way to transcribe manually as well.

What do you think? Let me know at Kim at this website address or look me up on Google+. Feel free to + me if you want to be in my Accessibility, Utter Command or Redstart Reports circles.

iPhone 4S: speech advances but there’s more to do

By Kimberly Patch

Apple’s iPhone 4S has taken a couple of nice big steps toward adding practical speech to smart phones. There are still some big gaps, mind you. I’ll get to those as well.

Speech on the keyboard

The long-awaited speech button is now part of the keyboard. Everywhere there’s a keyboard you can dictate rather than type. This is far better than having to use an app to dictate, then cut and paste into applications. This is one of the big steps. It will make life much easier for people who have trouble using the keyboard. And I suspect a large contingent of others will find themselves dictating into the iPhone a good amount of the time, increasingly reserving the keyboard for situations where they don’t want to be overheard.

The key question about speech on the keyboard is how it works beyond the letter keys and straight dictation.
For instance, after you dictate
“Great! I’ll meet you at the usual place (pool cue at the ready) at 6:30.”
how easy is it to change what you said to something like this?
“Excellent :-) I’ll meet you at the usual place (pool cue at the ready) at 7:00.”
And then how easy is it to go back to the original if you change your mind again?

Speech assistant

After we all use the speech assistant for a couple of days or weeks it’ll become readily apparent where Siri lies on the very-useful-to-very-annoying continuum.

The key parameters are:
– how much time Siri saves you
– how a particular type of Siri audio feedback hits you the 10th time you’ve heard it
– how physically and cognitively easy it is to switch between the assistant and whatever you have to do with your hands on the phone

One thing that has the potential to tame the annoyance factor is giving users some control over the feedback.

I think the tricky thing about computer-human feedback is it’s inherently different from human-human feedback. One difference is the computer has no feelings and we know that. Good computer-human feedback isn’t necessarily the same as good human-human feedback.

The big gap

There’s still a big speech gap on the iPhone. Speech is still just a partial interface.

Picture sitting in an office with a desktop computer and a human assistant. Type anything you want using the letter keys on your keyboard or ask the assistant to do things for you. You could get a fair amount of work done this way, but there’d still be situations where you’d want to control your computer directly using keyboard shortcuts, arrow keys or the mouse. Partial interfaces have a high annoyance factor.

If you’re able to choose the method of input based on what you want to do rather than what happens to be available, true efficiencies will emerge, even if you end up using a mix of speech, keyboard and gesture.

Ultimately, I want to be able to completely control my phone by speech. And I suspect if we figure out how to do that, then make it available for everyone, the general mix of input will become more efficient.

I’d like to see the computer industry tap folks who have to use speech recognition as testers. I think this would push speech input into practical use more quickly and cut out some of the annoyance-factor growing pains.

What do you think? Let me know at Kim@ this domain name.

Dragon 11.5 free upgrade finally here

By Kimberly Patch

If you already use Dragon NaturallySpeaking 11, you’re entitled to a free upgrade to 11.5.

You must access the upgrade from within Dragon 11: click on “Help\Check for Updates”, select “Dragon NaturallySpeaking 11.5” and follow the instructions to download the upgrade. Once it’s downloaded, click on the upgrade to install.

There’s more information here: nuance.custhelp.com/app/answers/detail/a_id/6213

We strongly recommend upgrading. The upgrade fixes several bugs and gives you the ability to use your iPhone as a wireless microphone over a wifi network.

Utter Command is fully compatible with Dragon 11.5.

Trying out Dragon Search for the iPhone

Dragon Search is a nice app. Here’s how it works: open the app, hit one button, speak the phrase you want to search for. By default the app stops listening and starts the search when you pause so you don’t have to hit another button when you’re done.

The app comes up quickly, which from a practical standpoint is extremely important. And in my experience so far the search has been fast. There’s also a button you can push to cancel out of the search. The big plus of this application is the different search channels: Google, iTunes, Twitter, Wikipedia, and YouTube. You can search for something, like green apples, and the results will come up in the channel you used last. Once you’ve done a search you can switch channels easily to see results across channels.

I have a couple of practical suggestions.

1. The history list is just three items long — I’d like a much longer scrolling history list. Google Voice Search has a long scrolling list that includes dates. I would’ve liked to have seen Nuance improve on that.

2. I’d also like to be able to add my own channel.

I’ll also take the opportunity to repeat what I said a couple of days ago. I appreciate the progress on speech apps — don’t get me wrong. But speech on the iPhone is still not what I really want, which is system-level speech control of a mobile device that would give me the option to use speech for anything. These new apps are steps in the right direction — making the iPhone more hands-free. But there’s still a long way to go.

A few more thoughts on Dragon Dictation

I’ve been using Dragon Dictation on the iPhone a little more over the past few days and have a couple more thoughts for improvement.

1. If you select text in the full-screen application, then switch to the keyboard, the text doesn’t stay selected. It should. If you’ve selected an incorrect word or phrase, found there are no correct choices, and are proceeding to the keyboard to correct it, it’s frustrating to have to select again.

2. I’ve lost dictation a couple of times because I’ve switched out of the app. This is unexpected because writing apps like Notepad tend to stay where you left them. I suspect that Dragon Dictation maker Nuance made this choice in order to limit the number of steps for new dictation. I think there are ways to provide this valuable option without increasing steps. The quick solution would be a “remember last dictation” option in settings that would let the user decide which way to do it. Maybe a better solution would be adding a “continue” button to the bottom of the initial screen. If you wanted to start fresh you would press the main button in the middle of the screen, but if you wanted to pick up where you left off you could press the smaller “continue” button at the bottom of the screen.

Trying out Dragon Dictation for the iPhone

I’ve been trying out the Dragon Dictation iPhone app. It’s still not what I really want, which is system-level speech control of a mobile device that would give me the option to use speech for anything. But it’s a step in the right direction of making the iPhone more hands-free.

Here’s how Dragon Dictation for the iPhone works: open the app, hit one button, speak up to 30 seconds of dictation, then hit another button to say you’re done. Your dictation shows up on the screen a few seconds later. Behind the scenes the audio file you’ve dictated is sent to a server, put through a speech-recognition engine, and the results sent back to your screen. Now you can add to your text by dictating again, or hit an actions button that gives you three choices: send what you’ve written to your e-mail app, send it to your text app, or copy it to the clipboard so you can paste it someplace else.

The recognition is usually fairly accurate in quiet environments. Not surprisingly, you get a lot of errors in noisy environments. To be fair to the app, the built-in microphone on a mobile device is not optimal for speech recognition. It does pretty well given these constraints.

Here’s a practical suggestion that should be easy to implement: Add a decibel meter so people can see exactly how much background noise there is at any given time. This would make people more aware of background noise so they could set their expectations accordingly.
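To give a sense of how little is involved, here is a minimal sketch of the calculation behind such a meter. It assumes the app can see raw 16-bit audio samples from the microphone. The function name and the full-scale constant are my own illustration, not anything from Nuance, and the result is a level relative to the loudest value the recording can hold (dBFS), not a calibrated sound-pressure reading.

```python
import math

# Hypothetical helper: full_scale assumes signed 16-bit PCM samples.
def level_dbfs(samples, full_scale=32768):
    """Return the RMS level of a block of samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    # Root-mean-square amplitude of the block.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence
    # 0 dBFS is the loudest possible level; quieter audio is more negative.
    return 20 * math.log10(rms / full_scale)
```

A meter built on this would recompute the level for each short block of audio and show it before you start dictating, so a reading that sits well above the quiet-room baseline warns you to expect more errors.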

The interface for correcting errors is reasonable. Tap on a word and there are sometimes alternates available or you can delete it. Tap the keyboard button and you can use the regular system keyboard to clean things up.

I have two interface suggestions:

1. You can’t use the regular system copy and paste without going into the keyboard mode. You should be able to. I suspect this is fairly easy to fix.

2. There is no speech facility for correcting errors. I think there’s a practical fix here as well.

First, some background. Full dictation on a mobile device is tricky. Full dictation speech engines take a lot of horsepower. Dragon Dictation sidesteps the problem by sending the dictation over the network to a server running a speech engine. The trade-off is it’s difficult to give the user close control of the text — you must dictate in batches and wait briefly to see the results. This makes it more difficult to offer ways to correct using speech. But I think there is a good solution already in use on another platform.

Although it’s difficult to implement most speech commands given the server setup, the “Resume With” command that’s part of the Dragon NaturallySpeaking desktop speech application is a different animal. This command lets you start over at any point in the phrase you last dictated by picking up the last couple of words that will remain the same and dictating the rest over again.

This would make Dragon Dictation much more useful for people who are trying to be as hands-free as possible. It would also lower the frustration of misrecognitions and subtly teach people to dictate better.
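The “Resume With” idea described above can be sketched as a simple text splice. This is only an illustration under my own assumptions, not how Dragon implements it: the function name is hypothetical, matching is done on whole words and ignores case, and the opening words of the new dictation are treated as the anchor that stays the same.

```python
# Hypothetical sketch of a "Resume With"-style splice; not Dragon's actual code.
def resume_with(previous, redictated):
    """Replace the tail of `previous`, starting at the anchor words that
    open `redictated`, with the re-dictated text."""
    prev_words = previous.split()
    new_words = redictated.split()
    # Try the longest possible anchor first, then shorter ones.
    for anchor_len in range(min(len(new_words), len(prev_words)), 0, -1):
        anchor = [w.lower() for w in new_words[:anchor_len]]
        # Search backward so the most recent occurrence wins.
        for start in range(len(prev_words) - anchor_len, -1, -1):
            window = [w.lower() for w in prev_words[start:start + anchor_len]]
            if window == anchor:
                return " ".join(prev_words[:start] + new_words)
    # No anchor found: treat the new dictation as a continuation.
    return " ".join(prev_words + new_words)
```

For example, if the last phrase was “meet you at the usual place at 6:30” and you then dictate “usual place at 7:00”, the sketch keeps everything up to “usual place at” and swaps in the new ending. Because only the previous text and the new batch are needed, a splice like this could run on the phone after each result comes back from the server.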

It’s nice to see progress on mobile speech. I’m looking forward to more.