On Monday, Apple showed off the newest capabilities of Siri, its popular voice-activated smartphone assistant. Following an update, Siri will be able to search through the day's sports scores, read out movie times at the local theater, and find dinner reservations for you -- and all you have to do is speak to it, as you would to another human, in order for it to work.
Siri is, as I have written, the most important feature on Apple's most important product. This is not just true financially: With Siri, and its pervasive advertising campaign, Apple has staked its claim as the company most closely associated with natural language control of devices.
Apple, however, is not alone in marking its territory in the future of human-device interaction.
Think about what are arguably the three most elite consumer tech companies in the world: Apple, Google and Microsoft. All are clearly attempting to transform the traditional point-and-click experience; and yet, interestingly, these companies have conceived three radically different ideas about how we will interact with our devices in the coming years.
Specifically: Apple thinks that we will talk to them (Siri); Google thinks we will stare through them (Google Glasses); and Microsoft thinks we will wave our hands in front of them (Kinect).
I have highlighted what I consider each company's most distinguishing and inspiring product in the personal device sector; each one demonstrates its company's core idea about the future. Apple has Siri, a voice-based system that can retrieve information when you ask for it; Google will soon have Google Glasses, a sight-based system in which everything on a tiny screen in front of your eyeball can be controlled by simple eye movements; and Microsoft has Kinect, in which an operating system can be manipulated via body movement, simulating "touching" a screen but really moving your hands in the air, touching nothing.
In other words, each company chose one of the five human senses on which to build its future, betting that the sense it has chosen will be the dominant means by which we interact with our devices. Apple has chosen hearing (speech); Google has chosen sight; and Microsoft has chosen touch.
(The choices also align, vaguely, with visions from popular science fiction of decades ago. Apple and Siri -- the ability to talk to your device, and have it understand and respond to your requests -- is Knight Rider. Google and its Glasses -- with the ability to have data about the world you are looking at displayed in front of your eyes -- is Terminator. And Microsoft and Kinect -- the ability to virtually move applications with your movements -- is Minority Report. Perhaps engineers at Google, Apple and Microsoft spent their childhoods nose-deep in sci-fi literature and film.)
But back to the senses, and Apple, Google, and Microsoft. We have all seen what Siri, the speech-based system, can and cannot currently do, and it is easy to predict where Siri is going. As my friend Andrew Ferguson, a computer scientist at Brown University, wrote me, Apple seems to be steering Siri toward the classic futurist dream of a personal assistant who simply understands and has a corresponding response for everything you say. Expect Apple to continue to build out Siri, to enhance its capabilities, and to make it available on all of its future products: Soon you will be able to talk to your smartphone, tablet, MacBook, television, and -- yes, Knight Rider fans -- automobile.
Google, meanwhile, laid out its vision for a sight-operated device in an enormously viral video this past April, and the Glass prototypes are being rigorously tested as we speak. Though the lens-less eyeglasses can also be controlled via touch and voice, the most intriguing and captivating operating method comes via sight. We know that, even in the earliest stages, one can operate the camera and post photos to Google+ simply by looking in the correct direction. Maps with turn-by-turn directions that flash before one's eyes are also being tested. The Terminator-style overlay of an operating system -- which can be controlled merely by blinking and staring in the right spot -- could be closer than we think.
Finally, the idea of Microsoft's Kinect, the touch-based system, has clearly touched a nerve. For now, it is only officially being used on televisions with the Xbox, where the motion-tracking system can be used to control applications like Netflix and ESPN by moving your hands in front of you as though you were touching the screen. Kinect, however, is far more useful than a reinvention of the TV remote: This technology could be on your next laptop; hacks of the Kinect sensor have shown the system being used to control robots, perform surgery, and manipulate a projection of your smartphone's screen. Future applications of the Kinect sensor could turn any surface into a touchscreen, and any smartphone or tablet into a device that could be manipulated without needing to see the screen.
Hearing, sight and touch. Apple, Google, and Microsoft will depend on each, respectively, to power future generations of devices and entice future generations of consumers. Debating which is "right" or "wrong" is pointless -- all are likely to converge into single devices at some point. In the immediate future, however, these are the individual roles to be mindful of: Apple as Knight Rider, Google as Terminator, and Microsoft as Minority Report.
And our role? Simple: Sit back, relax and watch science fiction transform into science fact.
Follow Jason Gilbert on Twitter: www.twitter.com/gilbertjasono
#2- The "voice control" perception of Siri is incredibly skewed, since you still have to occupy one of your hands to hold the phone while you talk into it. It's not hands-free if I have to use my hand to activate it in the first place.
http://www.youtube.com/watch?v=2ysjrbSmMI4
http://www.youtube.com/watch?v=NPvavXqO_Do&feature=related
I like the idea of gestures better than voice overall, you can't always use voice, it isn't always practical. Also, gestures would make it easier for those with some disabilities to use tech. Someone who is deaf could communicate with sign language and it would type it for them.
They are both great technologies, but I think Kinect is ahead of the curve because it does something Siri can't (gestures) and does voice, just not as well. But it gets the job done in the voice department, and software can be improved more easily than creating all-new hardware.
I do like the Kinect's potential that you've pointed out as well. I think it's only a matter of time before Kinect's voice capabilities are on par with Siri; and meanwhile, its facial recognition and gesture software is being explored on PCs way beyond the simple use of video games.
Those Google glasses are fine in an enclosed space, but the jury is still out on the wide adoption of these as the next step. They need serious fine-tuning. Anyway, it is good we are seeing glimpses of what tech could become. Microsoft has a serious arsenal. They could have put a couple of their offerings together and taken us to the next level with Windows 8. Such as a mind-mapping desktop, or a packable, multiple-desktop system that allows you to arrange your project on one desktop and then flip to another for other purposes. Integrating Visio, video mail/messaging and 3D into the desktop.
1. The natural language processing and the complexity of Siri's analysis are far from ordinary, and are truly bridging the divide between structured and unstructured information. For the record, before being acquired by Apple, Siri was a startup that spun out of SRI International with researchers from Stanford. Siri's roots are truly cutting-edge tech.
2. The mode of Siri's operation takes 'software as a service' a huge step further than any before. It sends voice data to a server that does recognition and all the real data-crunching, then sends the results back - all within a mobile paradigm. This is an extremely important step in the commercialization of AI technologies, which have so far largely been limited by the capabilities of the computers that run them. (By moving it to the cloud, the practical difficulties are shifted to the corporate level, which is far more manageable.)
3. Siri is personable. For several decades - as far back as the 1950s - AI has been portrayed as an evil technology. Siri is finally reversing public sentiment toward AI technology, which has already 'infiltrated' people's lives in numerous ways (e.g. spam filters).
Haha that's *SO* true and one need only look at my friends to know that. Then again so was my nose and if it wasn't for SciFi I might never have 'friends' so ...
Google's glasses are John Carpenter's 'They Live'. That is, seeing things that otherwise are not there.
And Kinect is the film adaptation of Lost in Space.
And bringing them all together...Star Trek The Next Generation's holodeck.
Siri is Star Trek. (A utopian vision, with weekly peril.)
Google is "They Live". (A dystopian vision with perpetual peril.)
Microsoft is "Lost in Space". ("Danger Will Robinson!!" shouts Robby the Robot as it waves its arms frenetically.)
Excellent!
Fixed that for ya...
Yea I'm a nitpicker but you never misquote the Captain.
The author is marginalizing the technology too much for the top players.
Google = Terminator
Microsoft = Minority Report
Where in reality, the future will be Iron Man - whoever is able to integrate all three of what those tech companies are pursuing will be the overall winner.
Microsoft, too, was placing its bets on the growth of AI long before Apple bought Siri. When MS reinvented its search engine to create Bing, it acquired Powerset, which had bought the rights to the natural language processing technology from PARC - the company that has revolutionized the world several times over in ways people don't often appreciate (the personal computer, GUIs, ethernet, laser printing).
In short, all three are trying to stake out niches within a larger territory. I personally think Siri has more potential for a huge, long-lasting impact than Google Glasses or Kinect. (It's not hard to see Siri "turned into" glasses within a few years.)