Alphabet-owned DeepMind may be best known for building the AI that beat a world-class Go player, but the company announced another, perhaps more important, breakthrough this morning. In the 14th Critical Assessment of Protein Structure Prediction, or CASP, DeepMind's AlphaFold 2 AI showed it can predict how certain proteins fold with surprising accuracy. In some cases, its results were judged to be "competitive" with actual experimental data.
"We have been stuck on this one problem – how do proteins fold up – for nearly 50 years," said Professor John Moult, CASP chair and co-founder, in a DeepMind blog post. "To see DeepMind produce a solution for this, having worked personally on this problem for so long and after so many stops and starts, wondering if we’d ever get there, is a very special moment."
Researchers and enthusiasts across the internet have greeted the news with excitement, with some proclaiming that AlphaFold has solved the "protein folding problem." But what does that mean, exactly? And how do we stand to benefit from it?
To start answering these questions, we need to take a closer look at the proteins themselves. As your biology teacher might have said, proteins are the building blocks of life, responsible for countless functions inside and outside the human body. Each one starts as a series of amino acids strung together into a chain, but it doesn't take long -- sometimes just milliseconds -- before things start to get complicated. Some parts of the amino acid chain twist into helixes. Others fold back onto themselves as "sheets". Before long, these helixes and sheets coalesce and contort into a protein's final structure, and that's what gives a protein the ability to perform specific tasks, like ferrying oxygen through your body or strengthening the structure of your bones.
In other words, shape is everything, and researchers have spent decades trying to find a way to determine a protein’s final, folded structure based solely on its sequence of amino acids. That’s where CASP comes in. Since 1994, the program has served as a focal point of sorts for teams around the world working to crack the protein folding problem with computational ingenuity. The rules are fairly simple: every other year, organizers select a series of target proteins from a pool of submitted structures that have been determined experimentally but haven’t been published yet. Researchers then get a few months to tune their systems and submit their predictions, which experts in the field judge for about a month after submissions close.
While CASP has been running for 26 years, only in the past few has the scientific community been able to bring quantum leaps in compute power and machine learning to bear on the challenge. In DeepMind’s case, that involved training AlphaFold 2’s prediction model on about 170,000 known protein structures, along with a vast number of protein sequences whose 3D structures haven’t yet been determined. This training data, the team admits, is fairly similar to what it used in 2018, when the original AlphaFold system achieved top marks at CASP 13. (At the time, organizers hailed DeepMind’s “unprecedented progress in the ability of computational methods to predict protein structure.”)
That said, the team made some notable changes to its machine learning approach. It hasn’t published a full paper yet, but the CASP 14 abstract book highlights some of the modifications. Beyond that, DeepMind relied on about 128 of Google’s cloud-based TPUv3 cores, which ultimately let AlphaFold 2 predict a protein’s structure accurately within days, if not sooner; the New York Times notes that, in some cases, predictions can be generated in a matter of hours.
This all sounds impressive, and it certainly is, but there’s still plenty of work to be done. On the whole, AlphaFold’s results represented a dramatic improvement in accuracy over past years, and as mentioned, some of DeepMind’s predictions were accurate enough to rival experimental results at an atomic level. Others, however, fell short of that threshold. The company notes that “for the very hardest protein targets, those in the most challenging free-modelling category, AlphaFold achieves a median score of 87.0 GDT,” just shy of the 90 GDT mark that CASP co-founder Moult uses as the bar for calling results “competitive” with real data. Put another way, DeepMind hasn’t fully solved the protein folding problem, but it’s closer than many had thought possible.
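For readers wondering what those numbers measure: GDT, short for Global Distance Test, scores a prediction by the share of its residues that land close to their experimentally determined positions. The common GDT_TS variant averages that share over four distance cutoffs (1, 2, 4 and 8 angstroms) and reports the result on a 0-100 scale. The Python below is a minimal sketch of that averaging, not CASP's official scoring code. It assumes the per-residue distances between predicted and experimental alpha-carbon positions have already been measured after a single superposition, whereas the official tooling searches for the best superposition at each cutoff. The function name `gdt_ts` is our own hypothetical illustration.

```python
from typing import Sequence

# Distance cutoffs (in angstroms) averaged by the GDT_TS variant.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = GDT_TS_CUTOFFS) -> float:
    """Simplified, hypothetical GDT_TS score on a 0-100 scale.

    `distances[i]` is the distance (angstroms) between residue i's predicted
    and experimental C-alpha atoms after ONE fixed superposition (an
    assumption). Official CASP scoring searches for the best superposition
    per cutoff, so this sketch will generally match or fall below it.
    """
    if not distances:
        raise ValueError("need at least one residue")
    # Count residues within each cutoff, summed over all cutoffs.
    hits = sum(sum(1 for d in distances if d <= c) for c in cutoffs)
    # Average fraction across cutoffs, expressed as a percentage.
    return 100.0 * hits / (len(distances) * len(cutoffs))


if __name__ == "__main__":
    print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]))  # 50.0
```

With five residues sitting 0.5, 1.5, 3, 6 and 10 angstroms from their true positions, one, two, three and four residues clear the successively looser cutoffs, so the sketch returns 50.0. Because the four fractions can only grow as the cutoff loosens, a score near 90 requires the great majority of residues to sit within a few angstroms of where experiments place them.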
As DeepMind’s work continues, the full potential of accurate protein prediction will start to take shape; for now, the jury is still out on what practical benefits we can expect in the short term. The company points to potential advances in sustainability and drug design as a result of its protein folding research, though it didn’t elaborate on specifics. Meanwhile, Janet Thornton, a structural biologist at the European Molecular Biology Laboratory’s European Bioinformatics Institute, told Nature that she hopes this leap in accuracy could shed light on the functions of “thousands” of unsolved proteins at work in the human body. If nothing else, researchers could soon have a glut of new protein structure data to investigate, test against and work backward from, and that’s worth celebrating even if we don’t know how it’ll be used yet.