Perspective on the DeepMind/EMBL-EBI AlphaFold Database and how we at Nexco can put these new bioinformatics tools to work for you

Reported in two back-to-back Nature papers, these new tools are ones we at Nexco can directly put to work for you

Figure composed by the authors from Dall-E 2 generations and screenshots from PyMOL v0.99.

When DeepMind made its star AI model for protein structure prediction, the CASP-winning AlphaFold 2, available to the scientific community in 2021, it didn’t stop there. It also partnered with the European Bioinformatics Institute (EMBL-EBI) to create the largest-ever database of confident protein structure models, in an attempt to chart the whole “protein universe”. Thus the AlphaFold Protein Structure Database (AFDB) was born, which today contains models for over 200 million proteins. Browsing, managing and understanding structural models at this scale presents many challenges. The new tools reported in two recent Nature papers tackle these challenges, letting scientists get far more out of this huge, rich structural dataset.

AlphaFold 2 and the AFDB are game-changers because they address an unmet need in the life sciences: while modern genomics benefits from DNA sequencing at unprecedented speed and low cost, determining the three-dimensional structures of the proteins encoded in the sequenced genes remains a major bottleneck in terms of time and money, holding back downstream analyses and experimentation. AlphaFold 2 solves a big chunk of the problem by using AI to predict structures quite accurately and by accompanying each prediction with rich quality estimates. AlphaFold 2 and the AFDB thus represent major breakthroughs for protein biotechnology, diagnostics, and personalized treatments.
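To make the idea of quality estimates concrete, here is a minimal Python sketch of how one might summarize the confidence of a single AFDB model. It relies on the fact that AFDB model files in PDB format store the per-residue confidence score (pLDDT, on a 0 to 100 scale) in the B-factor column, and it uses the confidence bands shown on AFDB entry pages. The function names, the handling of values that fall exactly on a band boundary, and the synthetic demo atoms are our own assumptions for illustration, not part of any official tool.

```python
from collections import Counter

# Confidence bands as displayed on AFDB entry pages (pLDDT, 0-100).
# Treating a score exactly on a boundary as the higher band is an assumption.
BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low"), (0.0, "very low")]


def plddt_per_residue(pdb_text):
    """Return {(chain, residue number): pLDDT} read from the CA atoms of a PDB-format model."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, B-factor in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def band(plddt):
    """Map a pLDDT value to its confidence band label."""
    for threshold, label in BANDS:
        if plddt >= threshold:
            return label
    return "very low"


def summarize(pdb_text):
    """Return (mean pLDDT, number of residues per confidence band)."""
    scores = plddt_per_residue(pdb_text)
    counts = Counter(band(s) for s in scores.values())
    mean = sum(scores.values()) / len(scores) if scores else float("nan")
    return mean, dict(counts)


def atom_line(serial, name, resname, chain, resnum, plddt):
    """Hypothetical helper: build one synthetic ATOM record (dummy coordinates) for the demo."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:3s} {chain:1s}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


if __name__ == "__main__":
    # Synthetic three-residue fragment; in practice, read the text of a downloaded AFDB model.
    demo = "\n".join([
        atom_line(1, " CA ", "MET", "A", 1, 95.0),
        atom_line(2, " CA ", "LYS", "A", 2, 72.5),
        atom_line(3, " CA ", "GLY", "A", 3, 41.0),
    ])
    mean, counts = summarize(demo)
    print(f"mean pLDDT = {mean:.1f}; residues per band = {counts}")
```

A summary like this is only a first filter: a model with a high average can still contain poorly predicted loops or domains, which is why the per-residue breakdown matters more than the single mean value.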

To help navigate and analyze the vast amount of structural data in the AFDB, two new web resources have been released, together with new methods and tools developed specifically to solve the problems that arise from the sheer scale of the dataset. One paper introduces clustering algorithms that reduce the 200 million protein structures in the AFDB by about two orders of magnitude, to “just” 2.3 million clusters that you can browse online. The other paper connects AFDB entries to UniProt and Pfam annotations, charting the whole “protein universe” in a highly graphical, easy-to-browse fashion.

In “Clustering predicted structures at the scale of the known protein universe”, Barrio-Hernandez and colleagues employ innovative clustering algorithms based on Foldseek to efficiently group the 200 million protein structures within the AFDB. This approach reveals a staggering 2.30 million non-singleton structural clusters, 31% of which correspond to previously uncharacterized protein structures. This clustering opens doors to unexplored territories in the “protein universe”, offering glimpses into ancient origins and potential species-specific functions.

The second paper, “Uncovering new families and folds in the natural protein universe”, by Durairaj and collaborators, delves into the “dark matter” of protein diversity by connecting AFDB entries directly to UniProt and Pfam annotations. This resource presents the models clustered at different levels; close inspection of these clusters allowed its creators to detect novel folds and other peculiar features of protein families.

Perspective on AFDB and how we at Nexco can put these new tools to work for you

While still mainly fundamental in nature, the AI revolution in structural biology is already making itself felt in protein biotechnology and will sooner or later also reach clinical applications, in the form of more accurate diagnostics and personalized treatments. In fact, DeepMind itself is already laying the groundwork for this, as we covered recently in our blog post on their latest tool, AlphaMissense, which is built on AlphaFold at its core and outperforms all other methods at predicting the pathogenicity of missense mutations in proteins.

There are also applications that are possible today by leveraging AlphaFold 2, the AFDB and the new resources presented here. For example, these tools can help us find proteins that exist in nature but remain unknown to us and that have the potential to solve biotechnological problems. They can also uncover links between protein families through unexpected structural relationships, because the models in the database fill in links that were previously missing.

We at Nexco eagerly anticipate exciting breakthroughs in protein biotechnology, diagnostics, therapeutics, and our overall understanding of the intricacies of life, all driven by AI-based protein modeling. Critical to such advances, our team includes scientists who are experts in the molecular modeling of biological systems, among them recent assessors for CASP, the structure prediction contest in which AlphaFold made its name. This means we are thoroughly familiar with the latest methods, their strengths and weaknesses, the metrics required to assess predictions, how to interpret quality estimates, and all aspects related to practical applications. Plus, with these new resources we can navigate the models more efficiently and then inspect and judge AFDB models with the most up-to-date approaches, reaching conclusions that are also backed by our experts’ decades-long experience in computational and experimental structural biology, protein design, and molecular evolution.

To learn more about our services in computational structural biology, visit our website at https://www.nexco.ch or contact us directly at contact@nexco.ch.

  • Sunday, Oct 22, 2023, 9:47 PM
  • alphafold, artificial-intelligence, structural-biology, bioinformatics