By Kimberly Patch
Apple’s iPhone 4S has taken a couple of nice big steps toward adding practical speech to smart phones. There are still some big gaps, mind you. I’ll get to those as well.
Speech on the keyboard
The long-awaited speech button is now part of the keyboard. Everywhere there’s a keyboard you can dictate rather than type. This is far better than having to dictate into a separate app, then cut and paste into other applications. This is one of the big steps. It will make life much easier for people who have trouble using the keyboard. And I suspect a large contingent of others will find themselves dictating into the iPhone a good amount of the time, increasingly reserving the keyboard for situations where they don’t want to be overheard.
The key question about speech on the keyboard is how it works beyond the letter keys and straight dictation.
For instance, after you type
“Great! I’ll meet you at the usual place (pool cue at the ready) at 6:30.”
how easy is it to change what you said to something like this?
“Excellent :-) I’ll meet you at the usual place (pool cue at the ready) at 7:00.”
And then how easy is it to go back to the original if you change your mind again?
After we all use the speech assistant for a couple of days or weeks it’ll become readily apparent where Siri lies on the very-useful-to-very-annoying continuum.
The key parameters are
- how much time Siri saves you
- how a particular type of Siri audio feedback hits you the 10th time you’ve heard it
- how physically and cognitively easy it is to switch between the assistant and whatever you have to do with your hands on the phone.
One thing that has the potential to tame the annoyance factor is giving users some control over the feedback.
I think the tricky thing about computer-human feedback is that it’s inherently different from human-human feedback. One difference is that the computer has no feelings, and we know that. Good computer-human feedback isn’t necessarily the same as good human-human feedback.
The big gap
There’s still a big speech gap on the iPhone. Speech is still just a partial interface.
Picture sitting in an office with a desktop computer and a human assistant. Type anything you want using the letter keys on your keyboard or ask the assistant to do things for you. You could get a fair amount of work done this way, but there’d still be situations where you’d want to control your computer directly using keyboard shortcuts, arrow keys or the mouse. Partial interfaces have a high annoyance factor.
True efficiencies will emerge when you can use a mix of speech, keyboard and gesture, choosing the method of input based on what you want to do rather than on what happens to be available.
Ultimately, I want to be able to completely control my phone by speech. And I suspect if we figure out how to do that, then make it available for everyone, the general mix of input will become more efficient.
I’d like to see the computer industry tap folks who have to use speech recognition as testers. I think this would push speech input into practical use more quickly and cut out some of the annoyance-factor growing pains.
What do you think? Let me know at Kim@ this domain name.