Monthly Archives: October 2011

Heads-up: Dragon Recorder iPhone App

By Kimberly Patch

Nuance has released Dragon Recorder, a free iPhone recording application you can use with the “Transcribe Recording” feature of Dragon NaturallySpeaking on the desktop.

Dragon Recorder is a relatively simple recorder with a fairly clean interface that lets you record WAV files and transfer them to your computer over Wi-Fi. Once the files are on your computer, you can process them through Dragon’s Transcribe Recording feature, which is designed to transcribe the voice of a person who has trained a Dragon NaturallySpeaking profile. It does pretty well with a relatively quiet recording of just that person’s voice.

Dragon Recorder gives you some useful, basic abilities:

  • You can pause, then continue recording.
  • You can play back the recording on the iPhone, and you can move the pause/play button to jump to different portions of the recording.
  • You can continue recording at the end of any previous recording. This is a little tricky: drag the play button all the way to the right, and it will turn into a record button.
  • You designate the first portion of the name of your file in settings. The second portion of the name is an automatic date and time stamp.
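To illustrate that naming scheme, here’s a minimal sketch in Python of how a prefix-plus-timestamp filename might be assembled. The separator and timestamp format are my assumptions for illustration, not the app’s documented format:

```python
from datetime import datetime

def recording_filename(prefix: str, when: datetime) -> str:
    """Build a name in the style Dragon Recorder uses: a user-chosen
    prefix (set once in the app's settings) followed by an automatic
    date and time stamp. Separator and stamp format are assumptions."""
    stamp = when.strftime("%Y-%m-%d %H.%M.%S")
    return f"{prefix} {stamp}.wav"

# Example: a recording started October 14, 2011 at 9:30 a.m.
name = recording_filename("Meeting", datetime(2011, 10, 14, 9, 30, 0))
print(name)  # Meeting 2011-10-14 09.30.00.wav
```

The benefit of this convention is that files with the same prefix sort chronologically in a folder listing, so you never have to stop and name a recording while you’re capturing it.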

I can think of a couple of additions I’m hoping to see in updates:

  • The ability to bookmark recordings on the fly during recording and playback. I’m picturing several types of bookmarks you can use like hashtags. Bookmarks should also show up in the transcription.
  • Although the app is designed for automatic transcription, it would also be useful to have slider bars for controlling the speed and pitch of the recording on playback, so you have a good way to transcribe manually as well.

What do you think? Let me know at Kim at this website address or look me up on Google+. Feel free to + me if you want to be in my Accessibility, Utter Command or Redstart Reports circles.

iPhone 4S: speech advances but there’s more to do

By Kimberly Patch

Apple’s iPhone 4S has taken a couple of nice big steps toward adding practical speech to smartphones. There are still some big gaps, mind you. I’ll get to those as well.

Speech on the keyboard

The long-awaited speech button is now part of the keyboard. Everywhere there’s a keyboard you can dictate rather than type. This is far better than having to use an app to dictate, then cut and paste into applications, and it’s one of the big steps: it will make life much easier for people who have trouble using the keyboard. And I suspect a large contingent of others will find themselves dictating into the iPhone a good amount of the time, increasingly reserving the keyboard for situations where they don’t want to be overheard.

The key question about speech on the keyboard is how it works beyond the letter keys and straight dictation.
For instance, after you type
“Great! I’ll meet you at the usual place (pool cue at the ready) at 6:30.”
how easy is it to change what you said to something like this?
“Excellent :-) I’ll meet you at the usual place (pool cue at the ready) at 7:00.”
And then how easy is it to go back to the original if you change your mind again?

Speech assistant

After we all use the speech assistant for a couple of days or weeks it’ll become readily apparent where Siri lies on the very-useful-to-very-annoying continuum.

The key parameters are:
- how much time Siri saves you
- how a particular type of Siri audio feedback hits you the 10th time you’ve heard it
- how physically and cognitively easy it is to switch between the assistant and whatever you have to do with your hands on the phone.

One thing that has the potential to tame the annoyance factor is giving users some control over the feedback.

I think the tricky thing about computer-human feedback is it’s inherently different from human-human feedback. One difference is the computer has no feelings and we know that. Good computer-human feedback isn’t necessarily the same as good human-human feedback.

The big gap

There’s still a big speech gap on the iPhone. Speech is still just a partial interface.

Picture sitting in an office with a desktop computer and a human assistant. Type anything you want using the letter keys on your keyboard or ask the assistant to do things for you. You could get a fair amount of work done this way, but there’d still be situations where you’d want to control your computer directly using keyboard shortcuts, arrow keys or the mouse. Partial interfaces have a high annoyance factor.

If you use a mix of speech, keyboard and gesture, and you’re able to choose the input method based on what you want to do rather than what happens to be available, true efficiencies will emerge.

Ultimately, I want to be able to completely control my phone by speech. And I suspect if we figure out how to do that, then make it available for everyone, the general mix of input will become more efficient.

I’d like to see the computer industry tap folks who have to use speech recognition as testers. I think this would push speech input into practical use more quickly and cut out some of the annoyance-factor growing pains.

What do you think? Let me know at Kim@ this domain name.