
How DeepMind Is Reinventing the Robot

by
Tom Chivers
from IEEE Spectrum

Artificial intelligence has reached deep into our lives, though you might be hard pressed to point to obvious examples of it. Among countless other behind-the-scenes chores, neural networks power our virtual assistants, make online shopping recommendations, recognize people in our snapshots, scrutinize our banking transactions for evidence of fraud, transcribe our voice messages, and weed out hateful social-media postings. What these applications have in common is that they involve learning and operating in a constrained, predictable environment.

But embedding AI more firmly into our endeavors and enterprises poses a great challenge. To get to the next level, researchers are trying to fuse AI and robotics to create an intelligence that can make decisions and control a physical body in the messy, unpredictable, and unforgiving real world. It's a potentially revolutionary objective that has caught the attention of some of the most powerful tech-research organizations on the planet. "I'd say that robotics as a field is probably 10 years behind where computer vision is," says Raia Hadsell, head of robotics at DeepMind, Google's London-based AI partner. (Both companies are subsidiaries of Alphabet.)

This article is part of our special report on AI, "The Great AI Reckoning."

Even for Google, the challenges are daunting. Some are hard but straightforward: For most robotic applications, it's difficult to gather the huge data sets that have driven progress in other areas of AI. But some problems are more profound, relating to longstanding conundrums in AI: How do you learn a new task without forgetting the old one? And how do you create an AI that can apply the skills it learns for a new task to the tasks it has mastered before?

Success would mean opening AI to new categories of application. Many of the things we most fervently want AI to do-drive cars and trucks, work in nursing homes, clean up after disasters, perform basic household chores, build houses, sow, nurture, and harvest crops-could be accomplished only by robots that are much more sophisticated and versatile than the ones we have now.

Beyond opening up potentially enormous markets, the work bears directly on matters of profound importance not just for robotics but for all AI research, and indeed for our understanding of our own intelligence.

Let's start with the prosaic problem first. A neural network is only as good as the quality and quantity of the data used to train it. The availability of enormous data sets has been key to the recent successes in AI: Image-recognition software is trained on millions of labeled images. AlphaGo, which beat a grandmaster at the ancient board game of Go, was trained on a data set of hundreds of thousands of human games, and on the millions of games it played against itself in simulation.

To train a robot, though, such huge data sets are unavailable. "This is a problem," notes Hadsell. You can simulate thousands of games of Go in a few minutes, run in parallel on hundreds of CPUs. But if it takes 3 seconds for a robot to pick up a cup, then you can only do it 20 times per minute per robot. What's more, if your image-recognition system gets the first million images wrong, it might not matter much. But if your bipedal robot falls over the first 1,000 times it tries to walk, then you'll have a badly dented robot, if not worse.

The problem of real-world data is-at least for now-insurmountable. But that's not stopping DeepMind from gathering all it can, with robots constantly whirring in its labs. And across the field, robotics researchers are trying to get around this paucity of data with a technique called sim-to-real.

The San Francisco-based lab OpenAI recently exploited this strategy in training a robot hand to solve a Rubik's Cube. The researchers built a virtual environment containing a cube and a virtual model of the robot hand, and trained the AI that would run the hand in the simulation. Then they installed the AI in the real robot hand, and gave it a real Rubik's Cube. Their sim-to-real program enabled the physical robot to solve the physical puzzle.

Despite such successes, the technique has major limitations, Hadsell says, noting that AI researcher and roboticist Rodney Brooks "likes to say that simulation is 'doomed to succeed.' " The trouble is that simulations are too perfect, too removed from the complexities of the real world. "Imagine two robot hands in simulation, trying to put a cellphone together," Hadsell says. If you allow them to try millions of times, they might eventually discover that by throwing all the pieces up in the air with exactly the right amount of force and exactly the right amount of spin, they can build the cellphone in a few seconds: The pieces fall down into place precisely where the robot wants them, making a phone. That might work in the perfectly predictable environment of a simulation, but it could never work in complex, messy reality. For now, researchers have to settle for these imperfect simulacra. "You can add noise and randomness artificially," Hadsell explains, "but no contemporary simulation is good enough to truly recreate even a small slice of reality."
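One common way of adding that artificial noise and randomness is domain randomization, the family of techniques OpenAI relied on for its Rubik's Cube work: every training episode gets a slightly different simulated physics, so the policy can't overfit to one "too perfect" world. Here is a minimal sketch in Python; the parameter names, ranges, and the make_hand_env constructor are illustrative stand-ins, not OpenAI's actual code.

```python
import random

def randomized_sim_params() -> dict:
    """Draw a fresh set of physics parameters for one training episode.
    Names and ranges are hypothetical, for illustration only."""
    return {
        "cube_mass": random.uniform(0.05, 0.2),      # kg
        "surface_friction": random.uniform(0.5, 1.5),
        "motor_strength": random.uniform(0.8, 1.2),  # scale factor
        "sensor_noise": random.uniform(0.0, 0.05),
    }

# In a training loop, each episode would run in a differently perturbed world:
# for episode in range(num_episodes):
#     env = make_hand_env(**randomized_sim_params())  # hypothetical constructor
#     train_policy(env)
```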

Catastrophic forgetting: When an AI learns a new task, it has an unfortunate tendency to forget all the old ones.

There are more profound problems. The one that Hadsell is most interested in is that of catastrophic forgetting: When an AI learns a new task, it has an unfortunate tendency to forget all the old ones.

The problem isn't lack of data storage. It's something inherent in how most modern AIs learn. Deep learning, the most common category of artificial intelligence today, is based on neural networks that use neuronlike computational nodes, arranged in layers, that are linked together by synapselike connections.

Before it can perform a task, such as classifying an image as that of either a cat or a dog, the neural network must be trained. The first layer of nodes receives an input image of either a cat or a dog. The nodes detect various features of the image and either fire or stay quiet, passing these inputs on to a second layer of nodes. Each node in each layer will fire if the input from the layer before is high enough. There can be many such layers, and at the end, the last layer will render a verdict: "cat" or "dog."
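For readers who want to see that pipeline in code, here is a minimal PyTorch sketch of such a layered network. The layer sizes are arbitrary, and a real image classifier would be convolutional and vastly larger.

```python
import torch
import torch.nn as nn

# A tiny layered network: each layer passes its outputs to the next,
# and the final layer renders the verdict.
net = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(),   # first layer: detects simple features
    nn.Linear(32, 16), nn.ReLU(),   # middle layer: combines them
    nn.Linear(16, 2),               # last layer: one score each for "cat" and "dog"
)

image = torch.randn(1, 64)          # stand-in for an input image's features
scores = net(image)
verdict = ["cat", "dog"][scores.argmax().item()]
print(verdict)
```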

Each connection has a different "weight." For example, node A and node B might both feed their output to node C. Depending on their signals, C may then fire, or not. However, the A-C connection may have a weight of 3, and the B-C connection a weight of 5. In this case, B has greater influence over C. To give an implausibly oversimplified example, A might fire if the creature in the image has sharp teeth, while B might fire if the creature has a long snout. Since the length of the snout is more helpful than the sharpness of the teeth in distinguishing dogs from cats, C pays more attention to B than it does to A.

Each node has a threshold over which it will fire, sending a signal to its own downstream connections. Let's say C has a threshold of 7. Then if only A fires, it will stay quiet; if only B fires, it will stay quiet; but if A and B fire together, their signals to C will add up to 8, and C will fire, affecting the next layer.
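The A-B-C example reduces to a few lines of Python. This toy uses a hard threshold for clarity; real networks use smooth activation functions so that they can be trained by gradient descent.

```python
def node_c_fires(a_fires: bool, b_fires: bool,
                 w_ac: float = 3.0, w_bc: float = 5.0,
                 threshold: float = 7.0) -> bool:
    """C fires only if the weighted input from A and B reaches its threshold."""
    total = w_ac * a_fires + w_bc * b_fires   # True counts as 1, False as 0
    return total >= threshold

print(node_c_fires(True, False))   # False: 3 < 7, C stays quiet
print(node_c_fires(False, True))   # False: 5 < 7, C stays quiet
print(node_c_fires(True, True))    # True:  3 + 5 = 8 >= 7, C fires
```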

What does all this have to do with training? Any learning scheme must be able to distinguish between correct and incorrect responses and improve itself accordingly. If a neural network is shown a picture of a dog, and it outputs "dog," then the connections that fired will be strengthened; those that did not will be weakened. If it incorrectly outputs "cat," then the reverse happens: The connections that fired will be weakened; those that did not will be strengthened.
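In modern networks, the strengthening and the weakening both fall out of a single rule: gradient descent on a loss that is large for wrong answers. A minimal PyTorch sketch, with illustrative shapes:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

image = torch.randn(1, 64)    # stand-in for an input image's features
label = torch.tensor([1])     # 0 = "cat", 1 = "dog"; this picture is a dog

loss = loss_fn(net(image), label)  # large if the network answers "cat"
optimizer.zero_grad()
loss.backward()               # computes how much each connection contributed to the error
optimizer.step()              # nudges every weight in the direction that makes "dog" more likely
```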

Training of a neural network to distinguish whether a photograph is of a cat or a dog uses a portion of the nodes and connections in the network [shown in red, at left]. Using a technique called elastic weight consolidation, the network can then be trained on a different task, distinguishing images of cars from buses. The key connections from the original task are "frozen" and new connections are established [blue, at right]. A small fraction of the frozen connections, which would otherwise be used for the second task, are unavailable [purple, right diagram]. That slightly reduces performance on the second task.

But imagine you take your dog-and-cat-classifying neural network, and now start training it to distinguish a bus from a car. All its previous training will be useless. Its outputs in response to vehicle images will be random at first. But as it is trained, it will reweight its connections and gradually become effective. It will eventually be able to classify buses and cars with great accuracy. At this point, though, if you show it a picture of a dog, all the nodes will have been reweighted, and it will have "forgotten" everything it learned previously.

This is catastrophic forgetting, and it's a large part of the reason that programming neural networks with humanlike flexible intelligence is so difficult. "One of our classic examples was training an agent to play Pong," says Hadsell. You could get it playing so that it would win every game against the computer 20 to zero, she says; but if you perturb the weights just a little bit, such as by training it on Breakout or Pac-Man, "then the performance will-boop!-go off a cliff." Suddenly it will lose 20 to zero every time.

This weakness poses a major stumbling block not only for machines built to succeed at several different tasks, but also for any AI systems that are meant to adapt to changing circumstances in the world around them, learning new strategies as necessary.

There are ways around the problem. An obvious one is to simply silo off each skill: Train your neural network on one task, save its weights to data storage, then train it on a new task, saving those weights elsewhere. Then the system need only recognize the type of challenge at the outset and apply the proper set of weights.
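In code, the silo strategy amounts to a dictionary of saved weights. A hedged PyTorch sketch, where train_one_task is a stand-in for a full training run and task recognition is reduced to a lookup:

```python
import copy
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

def train_one_task(net: nn.Module, task: str) -> None:
    """Placeholder for a full training run on one task (fake data here)."""
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    x, y = torch.randn(16, 8), torch.randint(0, 2, (16,))
    loss = nn.functional.cross_entropy(net(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

weights_by_task = {}
for task in ["pong", "breakout"]:
    train_one_task(net, task)
    weights_by_task[task] = copy.deepcopy(net.state_dict())  # silo the skill

# At run time: recognize the challenge, then swap in the matching weights.
net.load_state_dict(weights_by_task["pong"])
```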

But that strategy is limited. For one thing, it's not scalable. If you want to build a robot capable of accomplishing many tasks in a broad range of environments, you'd have to train it on every single one of them. And if the environment is unstructured, you won't even know ahead of time what some of those tasks will be. Another problem is that this strategy doesn't let the robot transfer the skills that it acquired solving task A over to task B. Such an ability to transfer knowledge is an important hallmark of human learning.

Hadsell's preferred approach is something called "elastic weight consolidation." The gist is that, after learning a task, a neural network will assess which of the synapselike connections between the neuronlike nodes are the most important to that task, and it will partially freeze their weights. "There'll be a relatively small number," she says. "Say, 5 percent." Then you protect these weights, making them harder to change, while the other nodes can learn as usual. Now, when your Pong-playing AI learns to play Pac-Man, those neurons most relevant to Pong will stay mostly in place, and it will continue to do well enough on Pong. It might not keep winning by a score of 20 to zero, but possibly by 18 to 2.
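In the published formulation (Kirkpatrick et al., 2017), the "elastic" protection is a penalty added to the training loss: each weight is pulled back toward its old value in proportion to its importance, which the paper estimates from the diagonal of the Fisher information. A minimal sketch of that penalty:

```python
import torch
import torch.nn as nn

def ewc_penalty(net: nn.Module,
                old_params: dict,      # weights saved after finishing task A
                importance: dict,      # per-weight importance scores for task A
                lam: float = 1000.0) -> torch.Tensor:
    """Elastic-weight-consolidation penalty: important weights resist change."""
    penalty = torch.tensor(0.0)
    for name, p in net.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# While learning task B, the total loss would be, roughly:
#   loss = task_b_loss + ewc_penalty(net, old_params, importance)
```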

Raia Hadsell [top] leads a team of roboticists at DeepMind in London. At OpenAI, researchers used simulations to train a robot hand [above] to solve a Rubik's Cube. Top: DeepMind; Bottom: OpenAI

There's an obvious side effect, however. Each time your neural network learns a task, more of its neurons will become inelastic. If Pong fixes some neurons, and Breakout fixes some more, "eventually, as your agent goes on learning Atari games, it's going to get more and more fixed, less and less plastic," Hadsell explains.

This is roughly similar to human learning. When we're young, we're fantastic at learning new things. As we age, we get better at the things we have learned, but find it harder to learn new skills.

"Babies start out having much denser connections that are much weaker," says Hadsell. "Over time, those connections become sparser but stronger. It allows you to have memories, but it also limits your learning." She speculates that something like this might help explain why very young children have no memories: "Our brain layout simply doesn't support it." In a very young child, "everything is being catastrophically forgotten all the time, because everything is connected and nothing is protected."

The loss-of-elasticity problem is, Hadsell thinks, fixable. She has been working with the DeepMind team since 2018 on a technique called "progress and compress." It involves combining three relatively recent ideas in machine learning: progressive neural networks, knowledge distillation, and elastic weight consolidation, described above.

Progressive neural networks are a straightforward way of avoiding catastrophic forgetting. Instead of having a single neural network that trains on one task and then another, you have one neural network that trains on a task-say, Breakout. Then, when it has finished training, it freezes its connections in place, moves that neural network into storage, and creates a new neural network to train on a new task-say, Pac-Man. Its knowledge of each of the earlier tasks is frozen in place, so cannot be forgotten. And when each new neural network is created, it brings over connections from the previous games it has trained on, so it can transfer skills forward from old tasks to new ones. But, Hadsell says, it has a problem: It can't transfer knowledge the other way, from new skills to old. "If I go back and play Breakout again, I haven't actually learned anything from this [new] game," she says. "There's no backwards transfer."
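A simplified two-column sketch of the idea in PyTorch follows. The real architecture (Rusu et al., 2016) adds a fresh column for every task, with lateral connections at each layer; the sizes here are illustrative.

```python
import torch
import torch.nn as nn

class ProgressiveNet(nn.Module):
    def __init__(self, in_dim=64, hidden=32, out_dim=4):
        super().__init__()
        self.col1 = nn.Linear(in_dim, hidden)     # column trained on task 1
        self.col2 = nn.Linear(in_dim, hidden)     # fresh column for task 2
        self.lateral = nn.Linear(hidden, hidden)  # pipes task-1 features into column 2
        self.head2 = nn.Linear(hidden, out_dim)

    def freeze_column1(self):
        # Task 1's knowledge is frozen in place, so it cannot be forgotten...
        for p in self.col1.parameters():
            p.requires_grad = False

    def forward(self, x):
        h1 = torch.relu(self.col1(x))
        # ...but its features still feed forward into the new column, giving
        # forward transfer. Nothing flows the other way: no backward transfer.
        h2 = torch.relu(self.col2(x) + self.lateral(h1))
        return self.head2(h2)
```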

That's where knowledge distillation, developed by the British-Canadian computer scientist Geoffrey Hinton, comes in. It involves taking many different neural networks trained on a task and compressing them into a single one, averaging their predictions. So, instead of having lots of neural networks, each trained on an individual game, you have just two: one that learns each new game, called the "active column," and one that contains all the learning from previous games, averaged out, called the "knowledge base." First the active column is trained on a new task-the "progress" phase-and then its connections are added to the knowledge base, and distilled-the "compress" phase. It helps to picture the two networks as, literally, two columns. Hadsell does, and draws them on the whiteboard for me as she talks.
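The compress step can be sketched as a standard distillation loss: the knowledge base (the student) is trained to match the active column's (the teacher's) softened output distribution. In progress and compress, this loss is combined with the elastic-weight-consolidation penalty sketched earlier so that earlier tasks survive each compression.

```python
import torch
import torch.nn.functional as F

def distill_loss(kb_logits: torch.Tensor,
                 active_logits: torch.Tensor,
                 temperature: float = 2.0) -> torch.Tensor:
    """Knowledge-distillation loss: knowledge base mimics the active column."""
    teacher = F.softmax(active_logits / temperature, dim=-1)
    student = F.log_softmax(kb_logits / temperature, dim=-1)
    # kl_div expects log-probabilities for the student and probabilities for
    # the teacher; the T^2 factor keeps gradient magnitudes comparable.
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2
```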

If you want to build a robot capable of accomplishing many tasks in a broad range of environments, you'd have to train it on every single one of them.

The trouble is, by using knowledge distillation to lump the many individual neural networks of the progressive-neural-network system together, you've brought the problem of catastrophic forgetting back in. You'll change all the weights of the connections and render your earlier training useless. To deal with this, Hadsell adds in elastic weight consolidation: Each time the active column transfers its learning about a particular task to the knowledge base, it partially freezes the nodes most important to that particular task.

By having two neural networks, Hadsell's system avoids the main problem with elastic weight consolidation, which is that all its connections will eventually freeze. The knowledge base can be as large as you like, so a few frozen nodes won't matter. But the active column itself can be much smaller, and smaller neural networks can learn faster and more efficiently than larger ones. So the progress-and-compress model, Hadsell says, will allow an AI system to transfer skills from old tasks to new ones, and from new tasks back to old ones, while never either catastrophically forgetting or becoming unable to learn anything new.

Other researchers are using different strategies to attack the catastrophic forgetting problem; there are half a dozen or so avenues of research. Ted Senator, a program manager at the Defense Advanced Research Projects Agency (DARPA), leads a group that is using one of the most promising, a technique called internal replay. "It's modeled after theories of how the brain operates," Senator explains, "particularly the role of sleep in preserving memory."

The theory is that the human brain replays the day's memories, both while awake and asleep: It reactivates its neurons in similar patterns to those that arose while it was having the corresponding experience. This reactivation helps stabilize the patterns, meaning that they are not overwritten so easily. Internal replay does something similar. In between learning tasks, the neural network recreates patterns of connections and weights, loosely mimicking the awake-sleep cycle of human neural activity. The technique has proven quite effective at avoiding catastrophic forgetting.
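The simplest relative of this idea can be sketched with an explicit buffer: between new-task batches, the network also rehearses examples from earlier tasks so that their patterns are refreshed rather than overwritten. (Internal replay proper regenerates activity patterns inside the network instead of storing raw data; this buffer-based version is just the most accessible cousin of the technique.)

```python
import random

replay_buffer = []   # (input, label) pairs retained from earlier tasks

def mixed_batch(new_examples: list, batch_size: int = 32,
                replay_fraction: float = 0.5) -> list:
    """Interleave rehearsed old-task examples with new-task examples."""
    n_old = min(int(batch_size * replay_fraction), len(replay_buffer))
    rehearsed = random.sample(replay_buffer, n_old)
    return new_examples[: batch_size - n_old] + rehearsed
```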

There are many other hurdles to overcome in the quest to bring embodied AI safely into our daily lives. "We have made huge progress in symbolic, data-driven AI," says Thrishantha Nanayakkara, who works on robotics at Imperial College London. "But when it comes to contact, we fail miserably. We don't have a robot that we can trust to hold a hamster safely. We cannot trust a robot to be around an elderly person or a child."

Nanayakkara points out that much of the "processing" that enables animals to deal with the world doesn't happen in the brain, but rather elsewhere in the body. For instance, the shape of the human ear canal works to separate out sound waves, essentially performing "the Fourier series in real time." Otherwise that processing would have to happen in the brain, at a cost of precious microseconds. "If, when you hear things, they're no longer there, then you're not embedded in the environment," he says. But most robots currently rely on CPUs to process all the inputs, a limitation that he believes will have to be surmounted before substantial progress can be made.

You know the cat is never going to learn language, and I'm okay with that.

His colleague Petar Kormushev says another problem is proprioception, the robot's sense of its own physicality. A robot's model of its own size and shape is programmed in directly by humans. The problem is that when it picks up a heavy object, it has no way of updating its self-image. When we pick up a hammer, we adjust our mental model of our body's shape and weight, which lets us use the hammer as an extension of our body. "It sounds ridiculous but they [robots] are not able to update their kinematic models," he says. Newborn babies, he notes, make random movements that give them feedback not only about the world but about their own bodies. He believes that some analogous technique would work for robots.

At the University of Oxford, Ingmar Posner is working on a robot version of "metacognition." Human thought is often modeled as having two main "systems"-system 1, which responds quickly and intuitively, such as when we catch a ball or answer questions like "which of these two blocks is blue?," and system 2, which responds more slowly and with more effort. It comes into play when we learn a new task or answer a more difficult mathematical question. Posner has built functionally equivalent systems in AI. Robots, in his view, are consistently either overconfident or underconfident, and need ways of knowing when they don't know something. "There are things in our brain that check our responses about the world. There's a bit which says don't trust your intuitive response," he says.

For most of these researchers, including Hadsell and her colleagues at DeepMind, the long-term goal is "general" intelligence. However, Hadsell's idea of an artificial general intelligence isn't the usual one-of an AI that can perform all the intellectual tasks that a human can, and more. Motivating her own work has "never been this idea of building a superintelligence," she says. "It's more: How do we come up with general methods to develop intelligence for solving particular problems?" Cat intelligence, for instance, is general in that it will never encounter some new problem that makes it freeze up or fail. "I find that level of animal intelligence, which involves incredible agility in the world, fusing different sensory modalities, really appealing. You know the cat is never going to learn language, and I'm okay with that."

Hadsell wants to build algorithms and robots that will be able to learn and cope with a wide array of problems in a specific sphere. A robot intended to clean up after a nuclear mishap, for example, might have some quite high-level goal-"make this area safe"-and be able to divide that into smaller subgoals, such as finding the radioactive materials and safely removing them.

I can't resist asking about consciousness. Some AI researchers, including Hadsell's DeepMind colleague Murray Shanahan, suspect that it will be impossible to build an embodied AI of real general intelligence without the machine having some sort of consciousness. Hadsell herself, though, despite a background in the philosophy of religion, has a robustly practical approach.

"I have a fairly simplistic view of consciousness," she says. For her, consciousness means an ability to think outside the narrow moment of "now"-to use memory to access the past, and to use imagination to envision the future. We humans do this well. Other creatures, less so: Cats seem to have a smaller time horizon than we do, with less planning for the future. Bugs, less still. She is not keen to be drawn out on the hard problem of consciousness and other philosophical ideas. In fact, most roboticists seem to want to avoid it. Kormushev likens it to asking "Can submarines swim?...It's pointless to debate. As long as they do what I want, we don't have to torture ourselves with the question."

Pushing a star-shaped peg into a star-shaped hole may seem simple, but it was a minor triumph for one of DeepMind's robots. Photo: DeepMind

In the DeepMind robotics lab it's easy to see why that sort of question is not front and center. The robots' efforts to pick up blocks suggest we don't have to worry just yet about philosophical issues relating to artificial consciousness.

Nevertheless, while walking around the lab, I find myself cheering one of them on. A red robotic arm is trying, jerkily, to pick up a star-shaped brick and then insert it into a star-shaped aperture, as a toddler might. On the second attempt, it gets the brick aligned and is on the verge of putting it in the slot. I find myself yelling "Come on, lad!," provoking a raised eyebrow from Hadsell. Then it successfully puts the brick in place.

One task completed, at least. Now, it just needs to hang on to that strategy while learning to play Pong.

This article appears in the October 2021 print issue as "How to Train an All-Purpose Robot."
