China's gigantic multi-modal AI is no one-trick pony

Sporting 1.75 trillion parameters, Wu Dao 2.0 is roughly ten times the size of OpenAI's GPT-3.

By Andrew Tarantola June 2, 2021 5:14 pm EST

When OpenAI's GPT-3 model made its debut in May of 2020, its performance was widely considered the state of the art. Capable of generating text nearly indistinguishable from human-written prose, GPT-3 set a new standard in deep learning. But oh, what a difference a year makes. Researchers from the Beijing Academy of Artificial Intelligence (BAAI) announced on Tuesday the release of their own generative deep learning model, Wu Dao, a mammoth AI seemingly capable of doing everything GPT-3 can do, and more.

First off, Wu Dao is flat-out enormous. It has 1.75 trillion parameters (essentially, the coefficients the model learns during training), ten times as many as GPT-3's 175 billion and 150 billion more than Google's Switch Transformer.

To train a model with this many parameters, and to do so quickly (Wu Dao 2.0 arrived just three months after version 1.0's release in March), the BAAI researchers first developed FastMoE, an open-source training system akin to Google's Mixture of Experts. The system, which is built on PyTorch, allowed the model to be trained on both supercomputer clusters and conventional GPUs. That gives FastMoE more flexibility than Google's system: it doesn't require proprietary hardware like Google's TPUs and can therefore run on off-the-shelf hardware, supercomputing clusters notwithstanding.
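For readers curious about what a Mixture of Experts approach involves, here is a minimal sketch of the general idea in plain Python. It is not FastMoE's code or API, and every name in it (`moe_forward`, `make_expert`, the toy gate weights) is a hypothetical stand-in chosen for illustration. The point it shows: a gate scores each "expert" sub-network for a given input and only the top-scoring few actually run, which is how such models can hold a huge number of parameters while using a fraction of them per input.

```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, k=1):
    """Route input x to the k experts the gate scores highest.

    gate_weights holds one vector per expert; the gate score is the
    dot product of that vector with x (an assumption of this sketch).
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Indices of the k most probable experts; the rest are never run.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        weight = probs[i] / norm  # renormalize over the chosen experts
        expert_out = experts[i](x)
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out, top


# Four toy experts, but only one does any work for this input.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
output, chosen = moe_forward([0.5, 2.0], gate_weights, experts, k=1)
print(chosen, output)
```

In a real system the experts would be neural network layers spread across many devices and the gate would be learned during training; the toy version above only illustrates the routing step.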

With all that computing power comes a whole bunch of capabilities. Unlike most deep learning models, which perform a single task (write copy, generate deepfakes, recognize faces, win at Go), Wu Dao is multi-modal, similar in theory to Facebook's anti-hate speech AI or Google's recently released MUM. BAAI researchers demonstrated Wu Dao's ability to perform natural language processing, text generation, image recognition, and image generation tasks during the lab's annual conference on Tuesday. The model can not only write essays, poems and couplets in traditional Chinese, it can also generate alt text from a static image and generate nearly photorealistic images from natural language descriptions. Wu Dao also showed off its ability to power virtual idols (with a little help from Microsoft spinoff XiaoIce) and to predict the 3D structures of proteins, as AlphaFold does.

"The way to artificial general intelligence is big models and big computer," Dr. Zhang Hongjiang, chairman of BAAI, said during the conference Tuesday. "What we are building is a power plant for the future of AI, with mega data, mega computing power, and mega models, we can transform data to fuel the AI applications of the future."
