Google’s artificial intelligence researchers are starting to have to code around their own code, writing patches that limit a robot’s abilities so that it continues to develop down the path desired by the researchers — not by the robot itself. It’s the beginning of a long-term trend in robotics and AI in general: once we’ve put in all this work to increase the insight of an artificial intelligence, how can we make sure that insight will only be applied in the ways we would like?
That’s why researchers from Google’s DeepMind and the Future of Humanity Institute have published a paper outlining a software “killswitch” they claim can stop those instances of learning that could make an AI less useful — or, in the future, less safe. It’s really less a killswitch than a blind spot, removing from the AI the ability to learn the wrong lessons.
Specifically, they code the AI to ignore human interruptions and their consequences when judging its own success or failure. If going inside is a “failure” and it learns that every time a human picks it up, the human then carries it inside, the robot might decide to start running away from any human who approaches. If going inside is a desired goal, it may learn to give up on pathfinding its way inside, and simply bump into human ankles until it gets what it wants. Writ large, the “law” being developed is basically, “Thou shalt not learn to win the game in ways that are annoying and that I didn’t see coming.”
It’s a very good rule to have.
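To make the idea concrete, here is a minimal sketch of that kind of blind spot, written as a toy Q-learning agent. This is not the mechanism from the DeepMind and Future of Humanity Institute paper, just an illustration of the principle under simple assumptions; the `interrupted` flag and the class name are invented for this example.

```python
import random
from collections import defaultdict

# Toy Q-learning agent illustrating the idea: transitions caused by a human
# interruption are excluded from learning, so being picked up and carried
# inside never becomes a reward or a penalty the agent plans around.

class InterruptibleQAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state, interrupted):
        if interrupted:
            # A human stepped in: skip the update entirely, so the
            # interruption and whatever followed it never shape the policy.
            return
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

Because interrupted transitions never enter the value estimates, being scooped up and carried indoors is neither punished nor rewarded, so the agent has no incentive to flee from the humans who step in, or to exploit them.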
Elon Musk seems to be using the media’s love of sci-fi panic headlines to promote his name and brand, at this point, but he’s not entirely off base when he says that we need to worry about AI run amok. The issue isn’t necessarily hegemony by the robot overlords, but widespread chaos as AI-based technologies enter an ever-wider swathe of our lives. Without the ability to safely interrupt an AI without influencing its learning, the simple act of stopping a robot from doing something unsafe or unproductive could make it less safe or productive — making human intervention a tortured, overly complex affair with unforeseeable consequences.
Asimov’s Three Laws of Robotics are conceptual in nature — they describe the types of things that cannot be done. But to provide the Three Laws in such a form requires a brain that understands words like “harm” and can accurately identify the situations and actions that will produce it. The laws, though simple when written in English, will be of absolutely ungodly complexity when written out in software. They will reach into every nook and cranny of an AI’s cognition, editing not the thoughts that can be produced from input, but which inputs will be noticed, and how they will be interpreted. The Three Laws will be attributes of machine intelligence, not limitations put upon it — that is, they will be that, or they won’t work.
This Google initiative might seem a ways off from First Do No (Robot) Harm, but this grounded understanding of the Laws shows how it really is the beginning of robot personality types. We’re starting to shape how robots think, not what they think, and to do it with the intention of adjusting their potential behavior, not their observed behavior. That is, in essence, the very basics of a robot morality.
We don’t know violence is bad because evolution provided us a group of “Violence Is Bad” neurons, but in part because evolution provided us with mirror neurons and a deeply-laid cognitive bias to project ourselves into situations we see or imagine, experiencing some version of the feelings therein. The higher-order belief about morality emerges at least in part from comparatively simple changes in how data is processed. The rules being imagined and proposed at Google are even more rudimentary, but they’re the beginning of the same path. So, if you want to teach a robot not to do harm to humans, you have to start with some basic aspects of its cognition.
Modern machine learning is about letting machines re-code themselves within certain limits, and those limits mostly exist to direct the algorithm in a positive direction. It doesn’t know what “good” means, and so we have to give it a definition, and a means to judge its own actions against that standard. But with so-called “unsupervised machine learning,” it’s possible to let an artificial intelligence change its own learning rules and learn from the effects of those modifications. It’s a branch of learning that could make ever-pausing Tetris bots seem like what they are: quaint but serious reminders of just how alien a computer’s mind really is, and how far things could very easily go off course.
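As a concrete example of handing the machine a definition of “good,” here is a hypothetical reward function for a household robot; the state fields and weights are invented for illustration, but they show how “good” ends up being whatever the function measures and nothing more.

```python
# Hypothetical reward function for a household robot. The agent has no
# concept of "good"; it only has this number to maximize, so its notion
# of good is exactly what these terms measure -- nothing more.

def reward(state: dict) -> float:
    score = 0.0
    score += 1.0 if state["dishes_clean"] else 0.0    # desired outcome
    score -= 10.0 if state["vase_broken"] else 0.0    # undesired outcome
    score -= 0.01 * state["energy_used"]              # small efficiency penalty
    return score
```

An agent judging itself against this standard will happily pursue any strategy that scores well, including strategies the designer never anticipated, which is exactly the failure mode the interruption research is meant to contain.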
The field of unsupervised learning is in its infancy today, but it carries the potential for true robot versatility and even creativity, as well as exponentially fast change in abilities and traits. It’s the field that could realize some of the truly fantastical predictions of science fiction — from techno-utopias run by super-efficient and unbiased machines, to techno-dystopias run by malevolent and inhuman ones. It could let a robot usefully navigate in a totally unforeseen alien environment, or lead that robot to slowly acquire some V’ger-like improper understanding of its mission.