Now that there are always-on audio interfaces to networked applications, we can begin the conversation about “Talk UI.” In some homes, devices are already listening, ready to execute commands. These commands and their acknowledgements take the form of a conversation between humans.
An optimistic vision of this interaction might be the computer on the television series “Star Trek.” A dystopian vision would be HAL in the Stanley Kubrick film “2001: A Space Odyssey.”
At this moment, audio interfaces are closer to the bad handwriting translations of the Apple Newton, or the unintended word transformations of “auto-complete” in texting applications. We ask the audio interface a question and we get back a non sequitur. We sigh, and type in a specific query.
I can get my Apple TV to show subtitles for a Danish television series by saying: “subtitles (pause) on.” But I can't say, “Siri, please turn the subtitles on.” That's because this isn't a conversational user interface. Words aren't words as humans generally use them. Words are buttons; they have specific meanings. The spoken sounds must mean just what the interaction designer chose them to mean, neither more nor less.
“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean–neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master–that's all.”
The Talk UI is “like” a conversation. It has some of the form of a conversation, while not actually being a conversation. We call it a “Conversational UI” to sell it to the masses. They will be disappointed unless they understand that this new thing is just pushing buttons with sound.
The surface area of today's sound buttons is too small. They're hard to press. Creating a larger surface area is the usability challenge for this new interface.
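To make the “sound button” idea concrete, here is a minimal sketch in Python. It is not how Siri or any real assistant works; the command table, the filler-word list, and the function names `press_exact` and `press_wide` are all hypothetical, invented for illustration. The first function is a small button: only the exact phrase registers. The second is a slightly larger one: it ignores punctuation, word order, and a few polite filler words before matching.

```python
import re

# Hypothetical command table: each exact phrase is one "sound button".
BUTTONS = {
    "subtitles on": "SUBTITLES_ON",
    "subtitles off": "SUBTITLES_OFF",
}

# Illustrative filler words a more forgiving matcher might ignore (an assumption).
FILLER = {"siri", "please", "turn", "the", "could", "you"}


def press_exact(utterance):
    """Small button: only the exact phrase (ignoring case) registers."""
    return BUTTONS.get(utterance.lower().strip())


def press_wide(utterance):
    """Larger button: drop punctuation and filler words, then match
    the remaining words against each phrase, in any order."""
    words = re.findall(r"[a-z]+", utterance.lower())
    kept = frozenset(w for w in words if w not in FILLER)
    for phrase, action in BUTTONS.items():
        if kept == frozenset(phrase.split()):
            return action
    return None
```

With this sketch, `press_exact("Siri, please turn the subtitles on.")` misses the button, while `press_wide` given the same sentence hits it. Even the wider version is still a button: anything outside the table, however reasonable it sounds to a human, returns nothing.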