As phones have become smartphones, our personal technology has graduated from a conduit between people to a more sophisticated breed of device that allows for – even invites – direct control. In tandem, people are getting rid of voicemail, making fewer phone calls, and texting more. In one sense, this is simply a more truncated, efficient behavior, but it also implies greater intimacy with the device.
We’re also coming to expect the level of control we have over our phones to extend to the devices in our environment. For the time being, the “smart home” and “connected” objects are commanded through our phones. Unlike the shift in phone use, though, this kind of control is not efficient, because it is buried in a growing library of apps.
The technological response to the need for quick control over these smart objects is the voice interface. The Xbox’s Kinect allows for voice control of your Xbox apps and access to media. The Xfinity remote control makes “change the channel to HBO” possible. Apple’s Siri, Microsoft’s Cortana, Amazon’s Alexa, and Google Now are all serious attempts at broadening voice control to reach many services.
While speech-to-text recognition has improved considerably, the voice-controlled services themselves still lack the sophistication that people presume when communicating through a medium as nuanced as speech. Even if that level of sophistication is attained, and the services understand and respond exactly as we expect them to, the challenge of intimacy remains.
When common interaction with phones shifted from calls to text as the interfaces allowed more direct (read: intimate) control, we created a controversial-yet-accepted balance between interacting with people directly and multitasking with our pocket computers. Voice interaction necessitates a more public display of that human-computer interaction, one so uncomfortable that it directly inhibits its use. Think of the times you have used voice input on your phone: were you in a public setting, a private setting with people around, or alone?
Although we may not be able to out-design social mores, we can take the first challenge – that of accuracy, intuitive use, and predictable outcomes – to the whiteboard and to the APIs.