Artificial intelligence can help predict the three-dimensional structure of proteins. Beat Christen describes how such algorithms should soon help to develop tailored artificial proteins.
Computer algorithms have been a helpful tool in biomedical research for decades, and their importance has been growing steadily over that time. But what we’re now experiencing is nothing short of a quantum leap; it overshadows all that came before and it will have unforeseen effects. Artificial intelligence (AI) algorithms have made it possible to use nothing but the linear sequence of the building blocks of proteins – amino acids – to deliver extremely accurate predictions of the three-dimensional structure into which this chain of amino acids will assemble.
Grasping the importance of this development hinges on knowing that biology on a cellular level is always about spatial interactions between molecules – and that it's the three-dimensional structure of these molecules that determines those interactions. Once we understand the structures and interactions in play, we understand the biology. And only once we understand the structure of molecules can we engineer medications capable of influencing the function of these molecules.
Up to now, there have been three experimental methods for determining the three-dimensional structure of proteins: X-ray structure analysis, nuclear magnetic resonance and, just in the past few years, cryo-electron microscopy. The addition now of AI as a fourth precision method is due not just to improvements in AI algorithms and the vast computing power that is available today. For AI to make accurate predictions, it also needs to be trained using a wealth of data of exceptional quality. What makes the abovementioned quantum leap possible is considerable progress and effort in both data science and experimental protein research.
Competition between private and public research
Currently occupying most of the spotlight is the AlphaFold AI program developed by DeepMind, a sister company of Google. At present, DeepMind is undoubtedly the most important player in predicting protein structures. But what gets lost in the public discussion is that DeepMind is by no means the only player in this area; in particular the team led by David Baker from the University of Washington is conducting some outstanding research.
Overall, this competition between private and public research has surely served to inspire and invigorate the field, even if, as one would expect, private players keep many of their insights to themselves to protect their own business interests. But highly competitive research has also led to vast improvements to the AI algorithms that are in the public domain, which the entire scientific community can now use and develop. I expect this trend to continue. AI algorithms will soon provide us with highly precise structures for all known proteins. This will enable us to design precision medications on the computer.
In the future, it should be possible to start from a three-dimensional molecular scaffold designed on a computer and employ AI to calculate a sequence of amino acids that will precisely assemble into the desired structure with the desired molecular function.
Once this sequence of amino acids has been determined, my area of research comes into play. My work deals with the development of artificial genes and genomes, and it also employs computer algorithms. Based on sequences of amino acids, we calculate how protein information can be encoded into sequences of genetic building blocks – in other words into DNA. And we do it in a way that provides a simple means of synthesising these genes for practical applications.
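To give a feel for what encoding a protein into DNA involves, here is a minimal Python sketch of reverse translation. It is an illustration only, not the method used in this research: the `PREFERRED_CODON` table and the `reverse_translate` function are hypothetical names, and the choice of one fixed codon per amino acid is an arbitrary simplification. Real gene design also has to weigh factors such as how easily the resulting DNA can be synthesised.

```python
# Illustrative only: one arbitrarily chosen codon per amino acid,
# taken from the standard genetic code (DNA alphabet, coding strand).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"  # one of the three stop codons (TAA, TAG, TGA)


def reverse_translate(protein: str) -> str:
    """Return one possible DNA coding sequence for a protein.

    `protein` uses one-letter amino acid codes. A stop codon is appended.
    Raises ValueError for letters that are not standard amino acids.
    """
    codons = []
    for residue in protein.upper():
        if residue not in PREFERRED_CODON:
            raise ValueError(f"Unknown amino acid: {residue!r}")
        codons.append(PREFERRED_CODON[residue])
    return "".join(codons) + STOP_CODON


if __name__ == "__main__":
    # Methionine, lysine, tryptophan
    print(reverse_translate("MKW"))
```

Because most amino acids can be written with several different codons, this is only one of many DNA sequences that encode the same protein; choosing among them is where the actual design work lies.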
Reversing the information flow
This means we are on the verge of being able to calculate an artificial gene for any given three-dimensional protein structure designed on a computer, and then synthesise that gene. In biotechnology, this paves the way for manufacturing artificial proteins in microorganisms – including new pharmaceutical agents, vaccines or enzymes for use in industry.
Ever since the earliest lifeforms emerged several billion years ago, biological information has been stored in the form of DNA. Inside biological cells, this information is first transcribed into RNA molecules and then translated into proteins. Until now, there has been no mechanism for reversing the flow of information such that protein information is translated back into DNA information. AI will soon change all that. For biologists such as myself, this is a spectacular development, one that will have a profound impact on biotechnology and medicine.
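The natural direction of this information flow can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function names `transcribe` and `translate` are illustrative, the input is taken to be the coding strand of a gene read from its first base, and the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G
# (third base varying fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """DNA coding strand -> mRNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein in one-letter codes, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # look up in DNA alphabet
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Two different genes that yield the same protein
    print(translate(transcribe("ATGAAATGGTAA")))
    print(translate(transcribe("ATGAAGTGGTGA")))
```

Both example genes produce the same short protein, which shows why the step cannot simply be run backwards: many DNA sequences map to one protein, so going from protein to DNA means choosing among alternatives rather than reading off a single answer.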
This article was first published on August 17 by ETH Zurich.