AI fake-face generators can be rewound to reveal the real faces they trained on
Load up the website This Person Does Not Exist and it'll show you a human face, near-perfect in its realism yet totally fake. Refresh and the neural network behind the site will generate another, and another, and another. The endless sequence of AI-crafted faces is produced by a generative adversarial network (GAN), a type of AI that learns to produce realistic but fake examples of the data it is trained on.
But such generated faces, which are starting to be used in CGI movies and ads, might not be as unique as they seem. In a paper titled This Person (Probably) Exists, researchers show that many faces produced by GANs bear a striking resemblance to actual people who appear in the training data. The fake faces can effectively unmask the real faces the GAN was trained on, making it possible to expose the identity of those individuals. The work is the latest in a string of studies that call into doubt the popular idea that neural networks are "black boxes" that reveal nothing about what goes on inside.
To expose the hidden training data, Ryan Webster and his colleagues at the University of Caen Normandy in France used a type of attack called a membership attack, which can be used to find out whether certain data was used to train a neural network model. These attacks typically take advantage of subtle differences between the way a model treats data it was trained on (and has thus seen thousands of times before) and the way it treats unseen data.
For example, a model might identify a previously unseen image accurately, but with slightly less confidence than one it was trained on. A second, attacking model can learn to spot such tells in the first model's behavior and use them to predict when certain data, such as a photo, is in the training set or not.
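The confidence gap described above can be illustrated with a deliberately simplified sketch. Rather than training a second attacking model, it fits a single confidence cutoff on examples whose membership is already known; the confidence values, function names, and the cutoff approach itself are assumptions made for illustration, not the method used in any of the studies discussed here.

```python
from typing import List


def fit_threshold(member_conf: List[float], nonmember_conf: List[float]) -> float:
    """Pick the confidence cutoff that best separates known members from non-members."""
    candidates = sorted(set(member_conf + nonmember_conf))
    total = len(member_conf) + len(nonmember_conf)
    best_threshold, best_accuracy = candidates[0], -1.0
    for t in candidates:
        # Count members at or above the cutoff and non-members below it.
        correct = sum(c >= t for c in member_conf) + sum(c < t for c in nonmember_conf)
        accuracy = correct / total
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold


def predict_member(confidence: float, threshold: float) -> bool:
    """Guess that an example was in the training set if the model is confident enough."""
    return confidence >= threshold


# Hypothetical confidences reported by a target model (made-up numbers).
member_conf = [0.99, 0.97, 0.98, 0.95]      # examples known to be in the training set
nonmember_conf = [0.90, 0.85, 0.93, 0.88]   # examples known to be unseen

threshold = fit_threshold(member_conf, nonmember_conf)
print(threshold, predict_member(0.96, threshold), predict_member(0.91, threshold))
```

Real attacks tend to use richer signals than a single number, but the principle is the same: any systematic difference in how a model treats seen and unseen data can be turned into a membership test.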
Such attacks can lead to serious security leaks. For example, finding out that someone's medical data was used to train a model associated with a disease might reveal that this person has that disease.
Webster's team extended this idea so that instead of identifying the exact photos used to train a GAN, they identified photos in the GAN's training set that were not identical but appeared to portray the same individual. In other words, they looked for faces with the same identity. To do this, the researchers first generated faces with the GAN and then used a separate facial-recognition AI to detect whether the identity of these generated faces matched the identity of any of the faces seen in the training data.
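One common way to compare identities, sketched below, is to turn each face into a numeric embedding and measure how closely two embeddings point in the same direction. This is only an assumption about how such a matching step might look: the three-number embeddings, the photo names, and the 0.9 similarity cutoff are all hypothetical, and real facial-recognition systems produce much longer vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def matching_identities(
    generated: Sequence[float],
    training: Dict[str, Sequence[float]],
    min_similarity: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> List[str]:
    """Return training photos whose embedding is close to the generated face, best first."""
    scored = [(name, cosine_similarity(generated, emb)) for name, emb in training.items()]
    matches = [(name, score) for name, score in scored if score >= min_similarity]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in matches]


# Toy embeddings standing in for the output of a face-recognition model.
training_embeddings = {
    "photo_a": [1.0, 0.0, 0.0],
    "photo_b": [0.9, 0.1, 0.0],  # a different photo of the same person as photo_a
    "photo_c": [0.0, 1.0, 0.0],  # a different person
}
generated_face = [1.0, 0.05, 0.0]

matches = matching_identities(generated_face, training_embeddings)
print(matches)
```

In this toy data, the generated face lands close to both photos of the first person and far from the second, which mirrors the article's point that several distinct training photos can match one fake face.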
The results are striking. In many cases, the team found multiple photos of real people in the training data that appeared to match the fake faces generated by the GAN, revealing the identity of individuals the AI had been trained on.
[Image: The left-hand column in each block shows faces generated by a GAN. These fake faces are followed by three photos of real people identified in the training data. Credit: University of Caen Normandy]
The work raises some serious privacy concerns. "The AI community has a misleading sense of security when sharing trained deep neural network models," says Jan Kautz, vice president of learning and perception research at Nvidia.
In theory this kind of attack could apply to other data tied to an individual, such as biometric or medical data. On the other hand, Webster points out that people could also use the technique to check whether their data has been used to train an AI without their consent.
Artists could find out whether their work had been used to train a GAN in a commercial tool, he says: "You could use a method such as ours for evidence of copyright infringement."
The process could also be used to make sure GANs don't expose private data in the first place. The GAN could check whether its creations resembled real examples in its training data, using the same technique developed by the researchers, before releasing them.
Yet this assumes that you can get hold of that training data, says Kautz. He and his colleagues at Nvidia have come up with a different way to expose private data, including images of faces and other objects, medical data, and more, that does not require access to training data at all.
Instead, they developed an algorithm that can re-create the data that a trained model has been exposed to by reversing the steps that the model goes through when processing that data. Take a trained image-recognition network: to identify what's in an image, the network passes it through a series of layers of artificial neurons. Each layer extracts different levels of information, from edges to shapes to more recognizable features.
Kautz's team found that they could interrupt a model in the middle of these steps and reverse its direction, re-creating the input image from the internal data of the model. They tested the technique on a variety of common image-recognition models and GANs. In one test, they showed that they could accurately re-create images from ImageNet, one of the best-known image-recognition data sets.
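The general idea of running a model backwards can be shown with a toy example. The sketch below is not the Nvidia team's algorithm: it assumes a single, tiny linear layer with made-up weights, and it recovers the input by repeatedly adjusting a guess until the layer produces the same internal values that were observed. Real networks are far larger and nonlinear, which is what makes the reported image quality notable.

```python
from typing import List

Vector = List[float]

# Hypothetical weights of a toy first layer (an assumption for illustration only).
W = [[1.0, 0.5],
     [0.0, 1.0]]


def layer(x: Vector) -> Vector:
    """Forward pass: the internal values the model computes from an input."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def invert(activation: Vector, steps: int = 200, lr: float = 0.5) -> Vector:
    """Recover an input whose internal values match the observed activation."""
    guess = [0.0] * len(W[0])
    for _ in range(steps):
        # How far the guess's internal values are from the observed ones.
        residual = [out - act for out, act in zip(layer(guess), activation)]
        # Gradient of 0.5 * ||layer(guess) - activation||^2 with respect to the guess.
        grad = [
            sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(guess))
        ]
        guess = [g - lr * d for g, d in zip(guess, grad)]
    return guess


private_input = [0.3, -0.7]          # stands in for data that never leaves the device
leaked = layer(private_input)        # the internal values an attacker gets to see
recovered = invert(leaked)
print(recovered)
```

The attacker in this sketch never sees `private_input`, only the layer's weights and its internal output, yet ends up with a near-exact copy of the original values.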
[Image: Images from ImageNet (top) alongside recreations of those images made by rewinding a model trained on ImageNet (bottom). Credit: Nvidia]
As in Webster's work, the re-created images closely resemble the real ones. "We were surprised by the final quality," says Kautz.
The researchers argue that this kind of attack is not simply hypothetical. Smartphones and other small devices are starting to use more AI. Because of battery and memory constraints, models are sometimes only half-processed on the device itself and sent to the cloud for the final computing crunch, an approach known as split computing. Most researchers assume that split computing won't reveal any private data from a person's phone because only the model is shared, says Kautz. But his attack shows that this isn't the case.
Kautz and his colleagues are now working to come up with ways to prevent models from leaking private data. "We wanted to understand the risks so we can minimize vulnerabilities," he says.
Even though they use very different techniques, he thinks that his work and Webster's complement each other well. Webster's team showed that private data could be found in the output of a model; Kautz's team showed that private data could be revealed by going in reverse, re-creating the input. "Exploring both directions is important to come up with a better understanding of how to prevent attacks," says Kautz.