W. BIKEL /SCIENCE
When DeepMind first competed in 2018, its algorithm, called AlphaFold, relied on this comparative strategy. But AlphaFold also incorporated a computational approach called deep learning, in which software is trained on vast troves of data (in this case, the sequences and structures of known proteins) and learns to spot patterns. DeepMind won handily, beating the competition by an average of 15% on each structure and achieving GDT scores of up to about 60 for the hardest targets.
But the predictions were still too coarse to be useful, says John Jumper, who heads AlphaFold’s development at DeepMind. “We knew how far we were from biological relevance.” To do better, Jumper and his colleagues combined deep learning with a “tension algorithm” that mimics the way a person might assemble a jigsaw puzzle: first connecting pieces in small clumps, in this case clusters of amino acids, and then searching for ways to join the clumps into a larger whole. Working on a modest computer network of 128 processors, they trained the algorithm on all 170,000 or so known protein structures.
And it worked. Across the target proteins in this year’s CASP, AlphaFold achieved a median GDT score of 92.4. For the most challenging proteins, AlphaFold scored a median of 87, 25 points above the next best predictions. It even excelled at solving the structures of proteins that sit wedged in cell membranes, which are central to many human diseases but notoriously difficult to solve with X-ray crystallography. Venki Ramakrishnan, a structural biologist at the Medical Research Council Laboratory of Molecular Biology, calls the result “a stunning advance on the protein folding problem.”
All of the groups in this year’s competition improved, Moult says. But with AlphaFold, Lupas says, “The game has changed.” The organizers even worried that DeepMind might have been cheating somehow. So Lupas set a special challenge: a membrane protein from a species of archaea, an ancient group of microbes. For 10 years, his research team had tried every trick in the book to get an X-ray crystal structure of the protein. “We couldn’t solve it.”
But AlphaFold had no trouble. It returned a detailed image of a three-part protein with two long helical arms in the middle. The model enabled Lupas and his colleagues to make sense of their X-ray data; within half an hour, they had fit their experimental results to AlphaFold’s predicted structure. “It’s almost perfect,” Lupas says. “They could not possibly have cheated on this. I don’t know how they do it.”
As a condition of entering CASP, DeepMind, like all groups, agreed to reveal sufficient details about its method for other groups to re-create it. That will be a boon for experimentalists, who will be able to use accurate structure predictions to make sense of opaque X-ray and cryo-EM data. It could also enable drug designers to quickly work out the structure of every protein in new and dangerous pathogens such as SARS-CoV-2, a key step in the hunt for molecules to block them, Moult says.
Still, AlphaFold does not do everything well yet. In the contest, it faltered noticeably on one protein, an amalgam of 52 small repeating segments that distort one another’s positions as they assemble. Jumper says the team now wants to train AlphaFold to solve such structures, as well as those of complexes of proteins that work together to carry out key functions in the cell.
Even though one grand challenge has fallen, others will undoubtedly emerge. “This isn’t the end of something,” Thornton says. “It’s the beginning of many new things.”