AlphaFold, an artificial intelligence (AI) network developed by Google’s DeepMind, outperformed about 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction. The results were announced on November 30, at the start of the conference (held virtually this year) that takes stock of the exercise.
“It’s a big deal,” said John Moult, a computational biologist at the University of Maryland at College Park who co-founded CASP in 1994 to improve computational methods for accurately predicting protein structures. “In a sense, the problem is solved.”
The ability to accurately predict protein structures from their amino acid sequence would be a huge boon to the life sciences and medicine. It would significantly speed up efforts to understand the building blocks of cells and allow for faster and more advanced drug discovery.
AlphaFold came top of the table at the last CASP, in 2018, the first year in which London-based DeepMind participated. But this year, the deep-learning network was head and shoulders above other teams and, scientists say, performed so astonishingly well that it could herald a revolution in biology.
“This is a game changer,” said Andrei Lupas, an evolutionary biologist at the Max Planck Institute for Developmental Biology in Tübingen, Germany, who assessed the performance of the various teams at CASP. AlphaFold has already helped him find the structure of a protein that has puzzled his lab for a decade, and he expects it to change the way he works and the questions he tackles. “It will change medicine. It will change research. It will change bioengineering. It will change everything,” Lupas added.
In some cases, AlphaFold’s structure predictions were indistinguishable from those determined using “gold standard” experimental methods such as X-ray crystallography and, in recent years, cryo-electron microscopy (cryo-EM). AlphaFold may not eliminate the need for these time-consuming and expensive methods, scientists say, but the AI will make it possible to study living things in new ways.
The problem with structure
Proteins are the building blocks of life, responsible for most of what happens inside cells. How a protein works and what it does is determined by its 3D shape: “structure is function” is an axiom of molecular biology. Proteins tend to adopt their shape without help, guided only by the laws of physics.
For decades, laboratory experiments have been the main way to obtain good protein structures. The first complete structures of proteins were determined, beginning in the 1950s, using a technique in which X-rays are fired at crystallized proteins and the diffracted light is translated into the protein’s atomic coordinates. X-ray crystallography has produced the lion’s share of protein structures. But over the past decade, cryo-EM has become the preferred tool of many structural biology laboratories.
Scientists have long wondered how the components of a protein, a string of different amino acids, map out the many twists and turns of its eventual shape. Early attempts to use computers to predict protein structures in the 1980s and 1990s performed poorly, researchers say. Lofty claims for methods in published articles tended to fall apart when other scientists applied them to other proteins.
Moult launched CASP to bring more rigor to these efforts. The event challenges teams to predict the structures of proteins that have been solved using experimental methods but whose structures have not yet been made public. Moult credits the experiment (he doesn’t call it a competition) with vastly improving the field by calling time on overhyped claims. “You really find out what looks promising, what works, and what you need to move away from,” he says.
DeepMind’s 2018 performance at CASP13 startled many scientists in the field, which has long been a bastion of small academic groups. But its approach was broadly similar to that of other teams that were applying AI, says Jinbo Xu, a computational biologist at the University of Chicago, Illinois.
The first iteration of AlphaFold applied the AI method known as deep learning to structural and genetic data to predict the distance between pairs of amino acids in a protein. In a second step, which does not involve AI, AlphaFold used this information to come up with a “consensus” model of what the protein should look like, says John Jumper of DeepMind, who is leading the project.
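To make “the distance between pairs of amino acids” concrete, here is a minimal sketch. It is not DeepMind’s code: it only computes a distance matrix from known 3D coordinates, which is the kind of quantity the first AlphaFold was trained to predict from sequence data. The function name and the toy coordinates are invented for illustration.

```python
import math

def pairwise_distances(coords):
    """Return an n x n matrix of Euclidean distances between residues.

    coords holds one (x, y, z) position per amino acid, for example one
    representative atom per residue (an assumption of this sketch).
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical coordinates, in angstroms, for a four-residue fragment.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]
dist_map = pairwise_distances(toy_coords)

for row in dist_map:
    print(" ".join(f"{d:5.2f}" for d in row))
```

The matrix is symmetric with zeros on the diagonal. A structure-building step can then search for 3D coordinates whose pairwise distances agree with the predicted ones.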
The team tried to build on this approach, but eventually hit a wall. So it changed tack, Jumper says, and developed an AI network that incorporates additional information about the physical and geometric constraints that determine how a protein folds. They also set it a more difficult task: instead of predicting the relationships between amino acids, the network predicts the final structure of the target protein sequence. “It’s a much more complex system,” says Jumper.
CASP takes place over several months. Target proteins or parts of proteins called domains, about 100 in total, are released on a regular basis, and teams have several weeks to submit their structure predictions. A team of independent scientists then evaluates the predictions using metrics that assess how similar the predicted protein is to the experimentally determined structure. The assessors do not know who made each prediction.
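As a rough illustration of such a similarity metric, the sketch below computes a simplified score in the spirit of the global distance test (GDT) that CASP uses: the share of residues lying within 1, 2, 4 and 8 angstroms of their experimental positions, averaged and scaled to 100. It assumes the two structures are already superimposed, whereas the real measure searches over superpositions, so it is an approximation and not CASP’s scoring code. The function name and coordinates are hypothetical.

```python
import math

def gdt_like_score(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score on a 0-100 scale.

    Both inputs are equal-length lists of (x, y, z) positions, one per
    residue, assumed to be already superimposed (a simplification).
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and the same length")
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical four-residue example: the predicted positions are off by
# 0.5, 1.5, 3.0 and 9.0 angstroms respectively.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

score = gdt_like_score(predicted, experimental)
print(score)  # 56.25
```

A perfect prediction scores 100, and a single badly placed residue lowers the score only in proportion to its share of the chain.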
AlphaFold’s predictions arrived under the name “group 427”, but the astonishing accuracy of many of its entries made them stand out, Lupas said. “I had guessed it was AlphaFold. Most people had,” he said.
Some predictions were better than others, but almost two-thirds were comparable in quality to experimental structures. In some cases, Moult says, it was unclear whether the discrepancy between AlphaFold’s prediction and the experimental result was a prediction error or an artifact of the experiment.
AlphaFold’s predictions were poor matches to experimental structures determined by a technique called nuclear magnetic resonance spectroscopy, but that may be down to how the raw data are converted into a model, Moult said. The network also struggles to model individual structures within protein complexes, or groups, in which interactions with other proteins distort their shapes.
Overall, teams predicted structures more accurately this year than at the last CASP, but much of the progress can be attributed to AlphaFold, Moult said. On protein targets considered moderately difficult, the best performances of other teams typically scored 75 on a 100-point scale of prediction accuracy, whereas AlphaFold scored about 90 on the same targets, Moult said.
About half of the teams mentioned “deep learning” in the abstract summarizing their approach, Moult said, suggesting that AI is having a broad impact on the field. Most of them were academic teams, but Microsoft and the Chinese technology company Tencent also entered CASP14.
Mohammed AlQuraishi, a computational biologist at Columbia University in New York and a CASP participant, is eager to dig into the details of AlphaFold’s performance at the competition and learn more about how the system works when the DeepMind team presents its approach on December 1. It is possible, but unlikely, he says, that an easier-than-usual crop of protein targets contributed to the performance. AlQuraishi’s strong hunch is that AlphaFold will be transformational.
“I think it’s fair to say that this will be very disruptive to the field of protein structure prediction. I suspect that many will leave the field, as the core problem has arguably been solved,” he said. “It’s a breakthrough of the first order, certainly one of the most significant scientific results of my lifetime.”
An AlphaFold prediction helped determine the structure of a bacterial protein that Lupas’ lab had been trying to crack for years. The Lupas team had previously collected raw X-ray diffraction data, but transforming these Rorschach-like patterns into a structure requires some information about the shape of the protein. Tricks for obtaining this information, as well as other prediction tools, had failed. “The model from group 427 gave us our structure in half an hour, after we had spent a decade trying everything,” Lupas said.
Demis Hassabis, co-founder and CEO of DeepMind, says the company plans to make AlphaFold useful so that other scientists can employ it. (It previously published enough details about the first version of AlphaFold for other scientists to replicate the approach.) It can take AlphaFold days to come up with a predicted structure, which includes estimates of the reliability of different regions of the protein. “We are just beginning to understand what biologists would want,” added Hassabis, who sees drug discovery and protein design as potential applications.
In early 2020, the company published predictions for the structures of a handful of SARS-CoV-2 proteins that had not yet been determined experimentally. DeepMind’s prediction for a protein called Orf3a ended up being very similar to the structure later determined by cryo-EM, said Stephen Brohawn, a molecular neurobiologist at the University of California, Berkeley, whose team released the structure in June. “What they have managed to do is very impressive,” he added.
Impact in the real world
AlphaFold is unlikely to shut down laboratories such as Brohawn’s, which use experimental methods to solve protein structures. But it could mean that lower-quality and easier-to-collect experimental data would be all that is needed to obtain a good structure. Some applications, such as the evolutionary analysis of proteins, are set to flourish because the tsunami of available genomic data might now be reliably translated into structures. “This will allow a new generation of molecular biologists to ask more advanced questions,” Lupas said. “It will require more thinking and less pipetting.”
“This is a problem that I was beginning to think would not get solved in my lifetime,” said Janet Thornton, a structural biologist at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, UK, and a former CASP assessor. She hopes the approach can help shed light on the function of the thousands of unsolved proteins in the human genome and make sense of disease-causing gene variations that differ between people.
AlphaFold’s performance also marks a turning point for DeepMind. The company is best known for using AI to master games such as Go, but its long-term goal is to develop programs capable of achieving broad, human-like intelligence. Tackling grand scientific challenges, such as protein structure prediction, is one of the most important applications its AI can make, Hassabis said. “I think that’s the most important thing we’ve done in terms of real-world impact.”