Scientists prove that deepfake detectors can be duped

They tested their method on 'state-of-the-art deepfake detectors.'

UC San Diego Jacobs School of Engineering

Universities, organizations and tech giants such as Microsoft and Facebook have been working on tools that can detect deepfakes, in an effort to prevent their use for spreading malicious media and misinformation. Deepfake detectors, however, can still be duped, a group of computer scientists from UC San Diego has warned. At the WACV 2021 computer vision conference, which took place online in January, the team showed how detection tools can be fooled by inserting inputs called “adversarial examples” into every video frame.

In their announcement, the scientists explained that adversarial examples are manipulated images that can cause AI systems to make a mistake. See, most detectors work by tracking faces in videos and sending cropped face data to a neural network — deepfake videos are convincing because they were modified to copy a real person’s face, after all. The detector system can then determine if a video is authentic by looking at elements that aren’t reproduced well in deepfakes, such as blinking.

The UC San Diego scientists found that by creating adversarial examples of the face and inserting them into every video frame, they were able to fool “state-of-the-art deepfake detectors.” Further, the technique they developed works even on compressed videos, and even when the attacker does not have complete access to the detector model. A bad actor who came up with the same technique could then create deepfakes that evade even the best detection tools.
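
To make the idea concrete, here is a minimal sketch of a per-frame adversarial perturbation. It is not the researchers' code: the detector is a toy linear model rather than a neural network, the attacker is assumed to know its weights, and every name and number (`fake_score`, `perturb_frames`, `w`, `eps`, the 64-pixel "face crops") is hypothetical. It only illustrates how a small, bounded change to every frame can flip a detector's verdict.

```python
import numpy as np

def fake_score(w, b, frames):
    """Toy detector: probability that each flattened face crop is fake."""
    return 1.0 / (1.0 + np.exp(-(frames @ w + b)))

def perturb_frames(w, frames, eps):
    """Shift every pixel by at most eps in the direction that lowers the score.

    For this linear model the gradient of the logit with respect to the
    input is w, so one signed-gradient step is frames - eps * sign(w).
    """
    return np.clip(frames - eps * np.sign(w), 0.0, 1.0)

# Hypothetical detector weights over a 64-pixel face crop (assumption).
w = 10.0 * np.linspace(-1.0, 1.0, 64)
b = 0.0

# Three frames of a "deepfake": each carries a faint artifact pattern
# aligned with the weights, so the toy detector flags all of them.
strength = np.array([0.015, 0.020, 0.025])[:, None]
video = 0.5 + strength * np.sign(w)

eps = 0.03  # per-pixel budget on a 0-to-1 pixel scale (assumption)
adv_video = perturb_frames(w, video, eps)

before = fake_score(w, b, video)      # detector's verdict on the original frames
after = fake_score(w, b, adv_video)   # verdict after perturbing every frame

print("flagged before:", before > 0.5)
print("flagged after: ", after > 0.5)
```

In this toy setup every frame is flagged before the perturbation and none is flagged afterwards, even though no pixel moves by more than `eps`. A real attack on a neural network would need the network's gradients, or estimates of them when the attacker cannot see inside the model.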

So, how can developers create detectors that can’t be duped? The scientists recommend an approach similar to adversarial training, wherein an adaptive adversary keeps generating deepfakes that can bypass the detector while it’s being trained, so that the detector keeps getting better at spotting inauthentic images.
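
As a rough illustration of that training loop, the sketch below trains a toy linear detector twice: once on clean data only, and once while an adversary regenerates evasive fakes against the current detector at every step. Everything here is an assumption made for illustration, including the synthetic 64-feature "crops", the helper names (`evade`, `train`, `detection_rate`) and the values of `mu` and `eps`. It is not the setup used in the paper.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def evade(w, fakes, eps):
    """Adaptive adversary: shift each fake against the sign of the detector's weights."""
    return fakes - eps * np.sign(w)

def train(reals, fakes, eps=0.0, steps=2000, lr=1.0):
    """Logistic-regression detector; eps > 0 turns on adversarial training."""
    w, b = np.zeros(reals.shape[1]), 0.0
    y = np.concatenate([np.zeros(len(reals)), np.ones(len(fakes))])
    for _ in range(steps):
        # Regenerate evasive fakes against the current detector each step.
        x = np.vstack([reals, evade(w, fakes, eps)])
        err = sigmoid(x @ w + b) - y   # gradient of the log-loss w.r.t. the logit
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def detection_rate(w, b, fakes, eps):
    """Fraction of evasive fakes that the detector still flags (score > 0.5)."""
    return float(np.mean(sigmoid(evade(w, fakes, eps) @ w + b) > 0.5))

# Synthetic data (assumption): fakes carry a faint positive signal in every
# feature, reals carry the mirror image. The adversary's budget eps is
# larger than that per-feature signal.
d, mu, eps = 64, 0.05, 0.08
scale = np.linspace(0.9, 1.1, 50)[:, None]
fakes = mu * scale * np.ones((1, d))
reals = -fakes

plain = train(reals, fakes)            # never sees evasive fakes
robust = train(reals, fakes, eps=eps)  # trained against the adaptive adversary

plain_rate = detection_rate(*plain, fakes, eps)
robust_rate = detection_rate(*robust, fakes, eps)
print("plain detector catches:  ", plain_rate)
print("robust detector catches: ", robust_rate)
```

The detector trained only on clean data puts its decision boundary midway between real and fake, so fakes pushed past that midpoint slip through. The adversarially trained one learns to place the boundary between the real samples and the evasive fakes instead. With a deep network the same loop applies, but the adversary's step has to be computed from the network's gradients.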

The researchers wrote in their paper:

“To use these deepfake detectors in practice, we argue that it is essential to evaluate them against an adaptive adversary who is aware of these defenses and is intentionally trying to foil these defenses. We show that the current state of the art methods for deepfake detection can be easily bypassed if the adversary has complete or even partial knowledge of the detector.”