Talking to Your Tech: The Future of Voice Commands

The popular science fiction show Star Trek depicted early versions of Skype, cell phones, solid-state storage, supercomputers, and more. However, the ability to give voice commands, one of the most widely used technologies on the show, has been slower to make it into the real world and realize its enormous potential. Anyone who has unsuccessfully tried to use voice commands on their phone, or to interact with a car's speak-and-drive system, understands that the days of simply speaking naturally to a computer and receiving a coherent response every time are still a long way off.

Progress has been made since the early days of computer-voice interaction. Early cell phones equipped with a voice command system recognized only very limited commands such as "Call Mike," and even those with difficulty. Today, Google and Apple have both created extremely sophisticated systems that allow users to ask questions and send texts or emails using natural-sounding speech in a number of languages. Despite these advances, voice command systems are still easily flummoxed by accents, minor background noise, speaking slightly too quickly, and a host of other factors that an average human can easily work around when speaking to another human. Voice command systems also fail to detect or understand emotion, a vital component of human speech.

Rather than give up on human-computer voice interaction, companies around the world are investing in new software that allows computers to better understand the natural emotion and cadence of the human voice, and to use that data to interpret and execute commands more accurately. The benefits of near-perfect voice-computer integration are obvious: the world's fastest typist managed acceptable long-term accuracy at a speed of 150 words per minute, a pace at which an average person can easily hear, comprehend, and speak. Better voice command systems would also allow computers to interact with those who are illiterate or unable to type due to blindness or physical disability. The benefits extend to peripheral technology makers as well. For example, makers of smart light bulbs benefit when consumers can walk into a room and simply tell the lights how bright to shine.

Despite the possibilities that human-computer voice interaction offers, these advances risk eroding yet more of the precious little privacy consumers retain. Computers capable of understanding emotion will be able to track emotional states over time and use that information to target advertising during periods of emotional vulnerability. This loss of privacy does have upsides, in areas such as safety (a car could warn a stressed driver about their emotional state before setting off) and practicality (computers that understand human emotion would be more intuitive to interact with). Even so, many consumers will rightly be worried by the thought of a computer knowing exactly how they are feeling.

Despite these concerns, the field of human-computer interaction (and the technology sector in general) is an exciting and rapidly growing place to work. Consumers are, for the moment, happy to exchange privacy for amazing (and typically free) tools, and technology's march shows no signs of slowing. However, companies would be wise to realize that as consumers increasingly come to regard their devices as a trusted friend, the day draws ever closer when they will stop accepting that this friend shares everything about them.