Will this summer be remembered as a turning point in the story of man versus machine? On June 23, with little fanfare, a computer program came within a hair’s breadth of passing the Turing test — a kind of parlour game for evaluating machine intelligence devised by mathematician Alan Turing more than 60 years ago.
This wasn’t as dramatic as Skynet becoming self-aware in the Terminator films, or HAL killing off his human crewmates in 2001: A Space Odyssey. But it was still a sign that machines are getting better at the art of talking, something that comes naturally to humans but has always been a formidable challenge for computers.
Turing proposed the test — he called it “the imitation game” — in a 1950 paper titled “Computing Machinery and Intelligence”. Back then, computers were very simple machines, and the field now known as artificial intelligence (AI) was in its infancy. But already scientists and philosophers were wondering where the new technology would lead. In particular, could a machine “think”?
Turing considered that question to be meaningless, and proposed the imitation game as a way of sidestepping it. Better, he argued, to focus on what the computer can actually do: can it talk? Can it hold a conversation well enough to pass for human? If so, he concluded, we may as well grant that the machine is, at some level, intelligent.
In a Turing test, judges converse by text with unseen entities, which may be either human or artificial. (Turing imagined using teletype; today it’s done with chat software.) Based on a five-minute conversation, the judge must decide whether each correspondent is a person or a machine.
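How might that look in practice? Here is a toy sketch in Python of a single judging session. The canned replies, the deflecting chatbot, and the wiring of the five-minute clock are all invented for illustration; none of it is taken from any real Turing-test software.

```python
# A toy sketch of the imitation-game setup described above.
import random
import time

CANNED_REPLIES = [
    "That's an interesting question. What makes you ask?",
    "I'd rather talk about you. Where did you grow up?",
    "Hmm, I'm not sure. Can you rephrase that?",
]

def hidden_entity(message: str) -> str:
    """Stands in for the unseen correspondent: here, a trivial chatbot
    that deflects rather than answers -- a classic chatbot giveaway."""
    return random.choice(CANNED_REPLIES)

def run_session(limit_seconds: int = 300) -> None:
    """One judging session: the judge types questions until the
    five-minute clock (Turing's limit) runs out, then gives a verdict."""
    start = time.time()
    while time.time() - start < limit_seconds:
        question = input("Judge> ")
        print("Entity>", hidden_entity(question))
    verdict = input("Human or machine? ")
    print("Verdict recorded:", verdict)

if __name__ == "__main__":
    run_session()
```

Even this crude version captures the game’s essential asymmetry: the judge probes, the entity parries, and the clock decides how much probing is possible.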
Turing speculated that by 2000, “an average interrogator will not have more than a 70 per cent chance of making the right identification” — that is, computers would trick the judges 30 per cent of the time. For years, his prediction failed to come true, as software systems couldn’t match wits with their human interrogators. But in June, they came awfully close.
The event in question, billed as a “Turing test marathon”, was organized by the University of Reading as part of the centenary celebrations of the mathematician’s birth, and held, appropriately enough, at Bletchley Park in Buckinghamshire, where Turing played a key role in the Allied effort to crack the Enigma code. I joined 29 other judges in chatting electronically with 25 “hidden humans” (ensconced in an adjacent room) and five sophisticated “chatbots”, computer programs designed to imitate human conversation.
Altogether, some 150 separate conversations were held. The winning program, developed by a Russian team, was called “Eugene”. Attempting to emulate the personality of a 13-year-old boy, Eugene fooled the judges 29.2 per cent of the time, just a smidgen below Turing’s 30 per cent threshold.
As a judge, I got a first-hand look at the strengths and weaknesses of the test. First of all, there’s the five-minute time limit — an arbitrary figure mentioned by Turing in his paper. The shorter the conversation, the greater the computer’s advantage; the longer the interrogation, the higher the probability that the computer will give itself away — typically by changing the subject for no reason, or by not being able to answer a question. The 30 per cent mark, too, is arbitrary.
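A back-of-the-envelope calculation shows why length matters so much. Suppose, purely for illustration, that every exchange carries a fixed, independent chance of tripping the machine up; the odds of surviving a whole conversation then shrink rapidly as the questioning drags on. The slip rate and exchange counts below are assumptions, not measurements of any real chatbot.

```python
# An illustrative model: each exchange is an independent chance to slip.
SLIP_RATE = 0.15  # assumed chance that any single exchange betrays the machine

def survival_probability(num_exchanges: int, slip_rate: float = SLIP_RATE) -> float:
    """Chance the machine gets through the whole session undetected,
    treating each exchange as an independent trial."""
    return (1 - slip_rate) ** num_exchanges

# A five-minute chat might fit ~10 exchanges; a half-hour one, ~60.
for n in (10, 60):
    print(f"{n} exchanges: {survival_probability(n):.1%} chance of passing")
```

Under these made-up numbers, a machine that survives a short chat about one time in five has essentially no chance of lasting a half-hour interrogation, which is the intuition behind the five-minute limit’s generosity.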
But what about the nature of the test itself? Traditionally, language has been seen as the ultimate hallmark of intelligence, which is why Turing chose to focus on it. Yet while it may be our most impressive cognitive tool, it is certainly not the only one. In fact, what gives our species its edge may be the sheer variety of skills we have at our disposal, rather than our proficiency at any one task. “Human intelligence,” says Manuela Veloso, a computer scientist at Carnegie Mellon University, “has to do with the breadth of things that we can do.”
Not that we would necessarily want a machine that could “do it all”. Aside from being staggeringly ambitious, the idea of building an all-purpose robot, an “artificial human”, has never been a useful approach to AI, not least because it would merely duplicate capabilities that humans already have in abundance.
Instead, the greatest progress has come when AI is applied to very specific tasks, such as the satellite navigation system in your car, the apps on your iPhone, or the search engines that pull needles out of the Internet’s haystack. Indeed, its most widely publicized achievements — the chess-playing skills of the computer Deep Blue, or the quiz knowledge of an IBM supercomputer called Watson, which last year triumphed in the American TV show Jeopardy! — are very narrow indeed. (Watson can answer difficult trivia questions with impressive skill, but it can’t do your taxes, fold your laundry, or make you a cup of tea.)
“That very simplistic idea, that we’re trying to imitate a human being, has sort of become an embarrassment,” says Pat Hayes of the Institute for Human and Machine Cognition in Pensacola, Florida. “It’s not that we’re failing, which is what a lot of people think — it’s that we’ve decided to do other things which are far more interesting.”
Finally, there’s another aspect of the Turing test that’s easy to overlook: it makes a virtue out of deception, forcing the machine to pretend to be something it’s not. “This is a test of being a successful liar,” says Hayes. “If you had something that really could pass Turing’s imitation game, it would be a very successful human mimic.”
Of course, a computer’s ability to act human is aided by the amount of information at its fingertips, and the rise of powerful Internet search engines and massive storage and retrieval capability is already paying off for AI. A good example is Siri, Apple’s “intelligent personal assistant”, which has become a big hit with iPhone users, chirping out birthday reminders, restaurant recommendations, and more. With the web at its disposal, Siri seemingly “knows” a great deal.
But is Siri merely simulating intelligence? Some scientists and philosophers have argued that machines lack awareness or consciousness, and that, as a result, their words are just empty chatter. Professor Roger Penrose of Oxford University has long argued that computers are missing something — he can’t say precisely what — that is necessary for full-fledged consciousness and awareness, which in turn make intelligent conversation possible.
What the Turing test measures, Penrose argues, is a machine’s ability to dupe humans. “You make it look as though they understand something, when they really don’t,” he says. As a result, the Turing test is only “a fairly rough test” of understanding and intelligence — though, he admits, “I don’t know of a better one.”
At the opposite end of the spectrum, there are those who feel that the difference between machines and humans is merely one of complexity. The staunchest defender of that view is probably the philosopher Daniel Dennett, of Tufts University in Massachusetts. Dennett rejects the idea that we have a mysterious “essence” endowed by our biological structure that underlies our cognitive abilities. “It’s not impossible to have a conscious robot,” he told me. “You’re looking at one.” What he means is that human beings are machines too; the brain simply happens to be the most complex known arrangement of matter in the universe.
That, more than anything, accounts for the challenge facing AI: the human body is a machine, too, but one that far exceeds the current capabilities of information technology.
Perhaps, however, we’re closer than we think to “true” AI. After the Wright brothers’ airplane lifted off in 1903, skeptics continued to debate whether we were “really” flying — an argument that simply faded away. It may be like that with AI. As Hayes puts it, “You could argue we’ve already passed the Turing test.” If someone from 1950 could talk to Siri, he says, they’d think they were talking to a human being. “There’s no way they could imagine it was a machine — because no machine could do anything like that in 1950. So I think we’ve passed the Turing test, but we don’t know it.”