It's terrifyingly easy to just make stuff up online these days, such is life in the post-truth era. But recent advancements in machine learning (ML) and artificial intelligence (AI) have compounded the issue exponentially. It's not just the news that's fake anymore; all sorts of media and consumer goods can now be knocked off thanks to AI. From audio tracks and video clips to financial transactions and counterfeit products -- even your own handwriting can be mimicked with startling levels of accuracy. But what if we could leverage the same computer systems that created these fakes to reveal them just as easily?
People have been falling for trickery and hoaxes since forever. Human history is filled with false prophets, demagogues, snake-oil peddlers, grifters and con men. The problem is that these days, any two-bit huckster with a conspiracy theory and a supplement brand can hop on YouTube and instantly reach a global audience. And while the definition of "facts" now depends on who you're talking to, one thing that most people agreed on prior to January 20th this year is the veracity of hard evidence. Video and audio recordings have long been considered reliable sources of evidence but that's changing thanks to recent advances in AI.
In July 2016, researchers at the University of Washington developed a machine learning system that not only accurately synthesizes a person's voice and vocal mannerisms but lip syncs their words onto a video. Essentially, you can fake anybody's voice and create a video of them saying whatever you want. Take the team's demo video, for example. They trained the ML system using footage of President Obama's weekly address. The recurrent neural network learned to associate various audio features with their respective mouth shapes. From there, the team generated CGI mouth movements, and with the help of 3D pose matching, ported the animated lips onto a separate video of the president. Basically, they're able to generate a photorealistic video using only its associated audio track.
While the team took an outsized amount of blowback over the potential misuses of such technology, they had far more mundane uses for it in mind. "The ability to generate high-quality video from audio could significantly reduce the amount of bandwidth needed in video coding/transmission (which makes up a large percentage of current internet bandwidth)," they suggested in their study, Synthesizing Obama: Learning Lip Sync from Audio. "For hearing-impaired people, video synthesis could enable lip-reading from over-the-phone audio. And digital humans are central to entertainment applications like film special effects and games."
UW isn't the only facility looking into this sort of technology. Last year, a team from Stanford debuted the Face2Face system. Unlike UW's technology, which generates video from audio, Face2Face generates video from other video. It uses a regular webcam to capture the user's facial expressions and mouth shapes, then uses that information to deform the target YouTube video to best match the user's expressions and speech -- all in real time.
AI-based audio-video transcription is a two-way street. Just as UW's system managed to generate video from an audio feed, a team from MIT's CSAIL figured out how to create audio from a silent video reel. And do it well enough to fool human audiences.
"When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it," Andrew Owens, the paper's lead author told MIT News. "An algorithm that simulates such sounds can reveal key information about objects' shapes and material types, as well as the force and motion of their interactions with the world."
MIT's deep learning system was trained over the course of a few months using 1,000 videos containing some 46,000 sounds resulting from different objects being poked, struck or scraped with a drumstick. Like the UW algorithm, MIT's learned to associate different audio properties with specific onscreen actions and synthesize those sounds as the video played. When tested online against a video with authentic sound, people chose the system's fake audio over the real thing twice as often as they chose the output of a baseline algorithm.
The MIT team figures that they can leverage this technology to help give robots better situational awareness. "A robot could look at a sidewalk and instinctively know that the cement is hard and the grass is soft, and therefore know what would happen if they stepped on either of them," Owens said. "Being able to predict sound is an important first step toward being able to predict the consequences of physical interactions with the world."
Research into audio synthesis isn't limited to universities; a number of major corporations are investigating the technology as well. Google, for example, has developed WaveNet, a "deep generative model of raw audio waveforms." Among the first iterations of computer-generated text-to-speech (TTS) systems is "concatenative" TTS. That's where a single person records a variety of speech fragments, which are fed into a database and then reassembled by a computer to form words and sentences. The problem is that the output sounds more like the MovieFone guy (ask your parents) than a real person.
WaveNet, on the other hand, is trained on waveforms of people speaking. The system samples those recordings for data points up to 16,000 times per second. To output sound, WaveNet uses a model to predict what the next sound will be based on the sounds that came before it. The process is computationally expensive but does produce superior audio quality compared to the conventional TTS methods.
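To make the "predict the next sound from the ones before it" idea concrete, here is a minimal sketch of that generation loop. It is not Google's model: the learned neural network is replaced by a hypothetical `predict_next` function that is just a fixed weighted sum of the last two samples, and a real system would pick each sample from a learned probability distribution rather than compute it outright. Every name here is an assumption made for illustration; only the 16,000-samples-per-second figure comes from the description above.

```python
import math

SAMPLE_RATE = 16_000  # samples per second, the figure quoted above


def predict_next(history, weights):
    """Toy stand-in for the learned model: a weighted sum of the latest samples."""
    recent = history[-len(weights):]
    return sum(w * s for w, s in zip(weights, recent))


def generate(seed, weights, n_samples):
    """Grow the waveform one sample at a time, feeding each output back in as input."""
    samples = list(seed)
    for _ in range(n_samples):
        nxt = predict_next(samples, weights)
        samples.append(max(-1.0, min(1.0, nxt)))  # keep within the valid audio range
    return samples


# With these weights the recurrence x[n] = 2*cos(w)*x[n-1] - x[n-2] traces a sine wave,
# so the loop "hums" a 440 Hz tone. One second of audio takes 16,000 prediction steps.
w = 2 * math.pi * 440 / SAMPLE_RATE
one_second = generate([0.0, math.sin(w)], [-1.0, 2 * math.cos(w)], SAMPLE_RATE)
```

Even in this toy version, each sample has to wait for the one before it, which hints at why the approach is computationally expensive once `predict_next` is a deep network.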
In the future, robots could potentially forge your signature on official documents, if this AI-based handwriting mimic developed at University College London is ever misused. Dubbed the "My Text in Your Handwriting" program, this system can accurately recreate a subject's handwriting with as little as a paragraph's input. The program is based on "glyphs," essentially the unique traits of each person's handwriting. By measuring various aspects like horizontal and vertical spacing, connectors between letters and writing texture, the program can readily copy the style.
"Our software has lots of valuable applications. Stroke victims, for example, may be able to formulate letters without the concern of illegibility, or someone sending flowers as a gift could include a handwritten note without even going into the florist," Dr. Tom Haines, UCL Computer Science and lead author of the study, told UCL News. "It could also be used in comic books where a piece of handwritten text can be translated into different languages without losing the author's original style."
And while this technology could be used to create forgeries, it can just as easily be leveraged to spot them as well. "Forgery and forensic handwriting analysis are still almost entirely manual processes," Dr. Gabriel Brostow, of the UCL computer science department, said. "But by taking the novel approach of viewing handwriting as texture-synthesis we can use our software to characterise handwriting to quantify the odds that something was forged."
Forgeries and faked products don't stop at the bounds of the internet. Recent estimates by the Organisation for Economic Co-operation and Development put the global market for counterfeit goods at around $460 billion annually. And that's where the Entrupy authentication system comes in.
"In an ideal world, we shouldn't exist," Entrupy CEO Vidyuth Srinivasan lamented. "The more we can instill trustworthiness in the market, the better it will be for commerce in general."
The company first imaged a wide variety of luxury goods and now uses that database to help its customers -- generally those in secondary retail markets like vintage clothing stores or eBay sellers -- authenticate products with around 98.5 percent accuracy. Customers receive a handheld microscope and take various images of the product in question, such as the exterior, logo or interior lining. These photos are then fed into a mobile app and transmitted to the company's servers where a classification algorithm goes to work, differentiating between legitimate goods and counterfeits. If the product is real, Entrupy will provide a certificate of authenticity.
Although the company's product database is varied, there are limits to the system's current capabilities. Because it's optical, reflective or transparent items are no good, nor is anything without surface texture. Some things that it does not work on include porcelain, diamonds and glass, pure plastic and bare metal.
Unlike the other AI-based systems discussed here, there's little chance of the Entrupy system being corrupted or gamed. "We have had [counterfeiters] pose as real customers and legitimate businesses to try and buy [the system] and we're fine with it," Srinivasan explained. That's because the system doesn't actually tell the user which of the images they're taking are actually being used to verify the product's authenticity. "We ask our customers to take images of different parts of the item because it's not just pure material [being used for verification]...," he continued. "It's a holistic view of the different aspects of the item -- from the workmanship to the material used to the wear" as well as a number of other contextual bits of metadata.
What's more, the system is continually updated with new data, not just from the company's internal efforts of posing as secret buyers to acquire counterfeit goods, but also from the users themselves. Images taken during the authentication process -- whether the item turns out to be real or not -- are incorporated into the company's database, further improving the system's accuracy.
"In the near to medium future, I think that AI and ML will, as a counterfeiting solution, will definitely raise the bar," he concluded. "It's a spy versus spy game, cat versus mouse."
Increasing our ability to spot fakes will force counterfeiters to up their game and start using better quality materials and better workmanship. That, however, will increase the production cost of these products, hopefully to a price that is no longer economically viable. "The MO of any counterfeiter is to make something that they can sell a lot of, that can be easily produced and that does not cost a lot to produce a fake of," Srinivasan said. "Otherwise there's no profitability."
Similar measures have been adopted by PayPal, one of the internet's top financial service providers, for cases of account fraud. "Say my account was accessed today from San Francisco, tomorrow from NYC, and some other IP the day after," Hui Wang, PayPal's senior director of global risk sciences, told Engadget. This sort of activity is indicative of some kind of account takeover. "In order to detect these kinds of fraud," she explained, "we track the IP, we track the machine and we track the network."
The company created an algorithm that looks at both the IP and the geolocation of that IP, then compares them to your account history to see if this matches up with previous actions. PayPal developed a proprietary technology that compares this IP location pattern with those of other users, to see if there is a larger effect at work or there's a reasonable explanation for the movement -- i.e., perhaps you're flying through New York on business and buy a souvenir at the airport gift shop before continuing on the trip.
The company's AI system also attempts to identify each previous IP, whether it's a hotel's secured ethernet connection or the public WiFi at the airport. "[The algorithm] is retrieving tons of data from your account history and going beyond your account to look at the traffic on your network, like the other people using the same IP," Wang said. From this raw information, the algorithm selects specific data points and uses those to estimate whether the transaction is legitimate.
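The company's technology is proprietary, so its actual logic isn't public. As a rough illustration of the kind of location check described above, here is a minimal sketch of a generic "implied travel speed" heuristic: take the geolocations of two consecutive logins, compute the distance between them, and ask whether anyone could have covered it in the time elapsed. The `Login` type, the function names and the 900 km/h ceiling (roughly airliner speed) are all assumptions made for this example, and it assumes the IP addresses have already been resolved to coordinates.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Login:
    hours: float  # time of the login, in hours since an arbitrary reference point
    lat: float    # latitude of the IP's estimated location, in degrees
    lon: float    # longitude of the IP's estimated location, in degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev, curr):
    """Speed a person would need to travel between two consecutive logins."""
    dist = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    elapsed = curr.hours - prev.hours
    if elapsed <= 0:
        return math.inf if dist > 0 else 0.0
    return dist / elapsed


def is_plausible_travel(prev, curr, max_kmh=900.0):
    """True if the move between logins could be explained by ordinary travel."""
    return implied_speed_kmh(prev, curr) <= max_kmh


san_francisco = Login(hours=0.0, lat=37.7749, lon=-122.4194)
new_york_next_day = Login(hours=24.0, lat=40.7128, lon=-74.0060)
new_york_an_hour_later = Login(hours=1.0, lat=40.7128, lon=-74.0060)
```

Under these assumptions, `is_plausible_travel(san_francisco, new_york_next_day)` passes, matching the business-trip scenario, while the one-hour version does not. A check like this would only ever be one signal among the many described here, such as the device, the network and the behavior of other users on the same IP.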
Most of these actions and their subsequent decisions -- such as verifying or denying a payment -- are performed autonomously. However, if the algorithm's confidence value is too low, human investigators from the operations center will investigate the transaction manually.
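The hand-off between automated decisions and human investigators can be sketched as a simple confidence threshold. This is an assumed, simplified shape for such a rule, not the company's implementation: the `triage` function, the `Decision` names and both cutoff values are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    DENY = "deny"
    MANUAL_REVIEW = "manual_review"


def triage(fraud_probability, confidence, min_confidence=0.8, fraud_cutoff=0.5):
    """Decide automatically when the model is confident; otherwise escalate to a person."""
    for name, value in (("fraud_probability", fraud_probability), ("confidence", confidence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if confidence < min_confidence:
        return Decision.MANUAL_REVIEW  # too uncertain: route to human investigators
    if fraud_probability >= fraud_cutoff:
        return Decision.DENY
    return Decision.APPROVE
```

For example, `triage(0.9, 0.95)` returns `Decision.DENY`, while `triage(0.9, 0.4)` returns `Decision.MANUAL_REVIEW` because the confidence falls below the assumed threshold.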
"We are also in the process of ensuring that human intelligence can be fed back into the automated system," she continued, so that the ML system continually learns, improves and increases its accuracy.
These sorts of systems, both those designed to generate fakes and those trained to uncover them, are still in their infancy. But in the coming decades, artificial intelligence and machine learning techniques will continue to improve, often in ways that we have yet to envision. There is a very real danger in technologies that can create uncannily convincing lies, hoaxes and fakes -- in front of our very eyes, no less. But, like movable type, radio and the internet that came before them, AI systems like these, ones capable of generating photorealistic content, will only be as dangerous as the intentions of the people using them. And that's a terrifying thought.