Google open sources AlphaFold -- a monumental achievement for biology and medicine.
Nearly a year ago, Google’s AI outfit DeepMind announced it had cracked one of the oldest problems in biology: predicting a protein’s structure from its amino acid sequence alone. On July 22, 2021, it turned the breakthrough AI system AlphaFold on nearly every human protein and hundreds of thousands of additional proteins from organisms important to medical research.
Freely available to researchers and companies, the new database of roughly 350,000 protein sequences and structures represents a monumental achievement for the life sciences, one that could hasten new biological insights and the development of new drugs.
Proteins are the molecular machines of life. As chains of amino acids are translated from a corresponding RNA sequence, they acquire their functionality by folding into complex 3D structures. Nature makes the process look easy; even with an astronomical number of possible 3D configurations, most small proteins fold into a stable functional structure almost instantaneously, driven by the physical interactions among their atoms. Understanding how proteins fold is key to answering many open questions in biology. For example, how do mutations in the genome lead to disease, and can an enzyme be engineered to degrade plastic effectively?
An amino acid chain for chymotrypsin inhibitor 2 (1LW6) in its unfolded and folded forms. Source: Wikimedia Commons.
The ultimate question, a now 50-year-old postulate called the “protein folding problem,” asks whether a protein’s structure can be predicted using its amino acid sequence alone. Until now, the answer has been a firm NO. Given an amino acid chain, there are just too many possible structures to know which is correct.
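To get a feel for why brute force fails, here is a back-of-the-envelope calculation in the spirit of what is often called Levinthal's paradox. It is only a sketch: the three constants (three conformations per amino acid, a 100-residue chain, and ten trillion trial structures per second) are illustrative assumptions, not measured values, and the function names are made up for this example.

```python
# Back-of-the-envelope estimate of the conformational search space.
# All three constants below are illustrative assumptions, not measured values.
STATES_PER_RESIDUE = 3        # assumed backbone conformations per amino acid
CHAIN_LENGTH = 100            # assumed length of a small protein, in residues
SAMPLES_PER_SECOND = 1e13     # assumed rate of trying conformations
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def count_conformations(states_per_residue: int, chain_length: int) -> int:
    """Number of chain conformations if every residue varies independently."""
    return states_per_residue ** chain_length


def years_to_enumerate(conformations: int, samples_per_second: float) -> float:
    """Time needed to try every conformation once at a fixed sampling rate."""
    return conformations / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    total = count_conformations(STATES_PER_RESIDUE, CHAIN_LENGTH)
    years = years_to_enumerate(total, SAMPLES_PER_SECOND)
    print(f"conformations: {total:.2e}")
    print(f"years to enumerate: {years:.2e}")
```

Under these assumptions, the search would take vastly longer than the age of the universe, which is why checking candidate structures one by one was never a realistic option.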
With AlphaFold, biologists working on cancer or other diseases could now reveal new binding pockets where small-molecule drugs could attach effectively to targets in tumor cells or compensate for the effects of mutations. The database could also help researchers better understand how bacteria evade antibiotics and unveil mechanisms to overcome that resistance.
With successes in predicting protein structures associated with COVID-19, AlphaFold could also help researchers understand how the mutations found in the SARS-CoV-2 Delta variant alter the structure of the spike protein. Armed with such information, they could better explain how the variant behaves, which could help curb its spread.
The database is far from perfect. Researchers will want to validate DeepMind’s predictions with traditional experimental methods like X-ray crystallography. At present, AlphaFold assigns a confidence level to each of its predictions. Although only 36% of the predictions are deemed accurate at this point, the number of high-resolution structures available to researchers has more than doubled.
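For readers who want to work with those confidence levels: AlphaFold reports a per-residue score from 0 to 100 called pLDDT, and the database's documentation describes scores above 90 as very high confidence, 70 to 90 as confident, 50 to 70 as low, and below 50 as very low. The sketch below bins a list of scores into those bands. The function names and the example scores are hypothetical, and treating a score that falls exactly on a boundary as belonging to the lower band is an assumption of this example.

```python
from collections import Counter

# Confidence bands for AlphaFold's per-residue pLDDT score (0-100).
# Assumption: a score exactly on a threshold falls into the lower band.
BANDS = [
    (90.0, "very high"),
    (70.0, "confident"),
    (50.0, "low"),
]
LABELS = ("very high", "confident", "low", "very low")


def confidence_band(plddt: float) -> str:
    """Map one pLDDT score to its confidence band."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError(f"pLDDT must be between 0 and 100, got {plddt}")
    for threshold, label in BANDS:
        if plddt > threshold:
            return label
    return "very low"


def band_fractions(scores: list[float]) -> dict[str, float]:
    """Fraction of residues that fall into each confidence band."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(confidence_band(s) for s in scores)
    return {label: counts[label] / len(scores) for label in LABELS}


if __name__ == "__main__":
    # Hypothetical scores for a ten-residue stretch, made up for illustration.
    example_scores = [96.1, 93.4, 91.0, 88.2, 74.5, 71.9, 65.3, 52.0, 44.8, 30.2]
    for label, fraction in band_fractions(example_scores).items():
        print(f"{label:>9}: {fraction:.0%}")
```

A summary like this makes it easy to see at a glance how much of a predicted structure is trustworthy enough to build on and how much still calls for experimental validation.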
DeepMind’s software is also unable to model how proteins interact with each other or with molecules like DNA and RNA, which are crucial questions for building new gene-editing proteins or targeting certain immunological disorders. Nevertheless, the tech will free researchers to work on these more difficult questions. And this is certainly welcome news for the entire drug industry!