But then, why didn’t nature, or God, make speech just as asymmetric as vision? I’ll venture the guess that speaking and listening were meant for intercommunication rather than perception, and that for intercommunication, unlike for survival, symmetry was desirable. And since survival was more important than chatting, the lion’s share of the human brain was dedicated to seeing.
These conclusions run against the common wisdom that for human-machine communication, “vision is just like speech, only more powerful.” Not so! These two serve different roles, which we should imitate in human-machine communication: Spoken dialogue should be the primary approach for back-and-forth exchanges, and vision should be the primary approach for human perception of information from the machine.
We can imagine situations where a visual human-machine dialogue would be preferable, for example in learning from a machine how to ski or juggle. But we are interested in human-machine intercommunication across the full gamut of human interests, where, as telephony has demonstrated, speech-only exchanges go a long way. (Might these basic differences between speech and vision have contributed to the lack of success of video telephony?)
Finally, if we can combine speech and vision in communicating with our machines, as we do in our interactions with other people, we’ll be even better off. But that’s not easy to do yet, because the technologies for speech and vision are in different stages of development. Nor is the wish to combine them reason enough to ignore their different roles.
Conclusion: When you face a machine, instead of your surrounding world and other people, your interactions will be comparably natural, and the machine easiest to use, if it uses speech understanding and speech synthesis for two-way human-machine dialogue (these technologies have begun appearing commercially), and if it has large realistic displays that convey to you a great deal of visual information (as do today’s displays). As machine vision improves, it should be combined with speech for even more natural human-machine exchanges (such combined capabilities are now being explored and demonstrated in several research labs).
So, take heart: Simpler, more natural computer systems will enter our lives within the next 5 to 10 years. Let’s speed up their arrival, as users by asking for them, and as technologists by daring to build them.