Advances in High-throughput Protein Structural Bioinformatics
Abstract This article summarizes recent advances in high-throughput protein structural bioinformatics in three main areas: structure data management, tool development, and structure data mining. In structure data management, the development of AlphaFold-like systems has led to explosive growth in the volume of protein structure data, which has directly driven upgrades in compression technology and drawn researchers' attention to structure data management. In tool development, new algorithms represented by Foldseek achieve high-speed structure alignment and break the throughput bottleneck of structural analysis; in addition, the widespread application of deep learning models has improved structure-based protein function annotation in several respects. In structure data mining, researchers are handling large structure datasets with an omics mindset, distilling analyzable elements and optimizing methods through continued exploration, and advancing structure data mining with the help of new tools. With the development of high-throughput methods, structural bioinformatics is expected to play a more important role in the life sciences.
This review provides a comprehensive summary of the latest advances in high-throughput protein structural bioinformatics, a field that has been transformed by deep learning-based protein structure prediction systems such as AlphaFold2. These systems have significantly increased the accuracy, speed, and scale of protein structure prediction, resulting in exponential growth in the number of protein structures available for analysis. Notably, the AlphaFold Protein Structure Database (AFDB) has amassed over 214 million protein structures, surpassing the PDB's 50-year cumulative data by over 1000-fold within several months. Big data is driving a comprehensive upgrade of protein structural bioinformatics. This review focuses on three main areas: structure data management, tool development, and structure data mining. In structure data management, the review highlights the optimization strategies of AlphaFold-like systems, which significantly reduce the resource requirements for protein folding, enabling more researchers to make custom structure predictions and further enlarging the data scale. The resulting "data explosion" has put increased pressure on storage and bandwidth, prompting the development of tools such as Foldcomp, PDC, and ProteStAr for compressing PDB files. The review also underscores the critical role of public repositories such as ModelArchive and PDB-Dev in archiving and sharing third-party AlphaFold models, and it highlights the use of independent services such as MineProt and 3D-Beacons to create more interactive and accessible data portals. In tool development, the review covers recent breakthroughs in structure alignment algorithms, represented by Foldseek, which enable ultra-fast searching of large protein structure databases. It also covers tools for structure-based functional annotation of proteins, including AlphaFill for ligand annotation, DeepFRI for Gene Ontology (GO) annotation, and TT3D for protein-protein interaction (PPI) prediction, among others. The review proposes that 3Di sequences, introduced together with Foldseek, can enhance many sequence-based deep learning models developed in the pre-AlphaFold era, enabling them to be applied to structure-based function prediction. The challenges facing traditional molecular docking methods in the high-throughput era are also noted, to draw researchers' attention to them. Finally, the review explores the growing field of structure data mining. Whole-proteome structure prediction has become feasible in recent years, and scientists are processing large structure datasets from an omics viewpoint, continuously identifying analyzable elements, optimizing methodologies, and using newly developed tools to push the boundaries. Notable examples include the identification of new protein families, the development of protein structure clustering, and the integration of AlphaFold with conventional experimental techniques to solve large structures. These advances are paving the way for a deeper understanding of protein structure and function and have the potential to unlock new discoveries in the life sciences. However, the review also acknowledges challenges and limitations that persist in the field, including the lack of diversity in high-throughput software for protein structural bioinformatics and the existing bottleneck in rapidly predicting protein complex structures. Overall, structural bioinformatics is expected to play an even more important role in the life sciences as high-throughput methodology develops.
Authors ZHU Yun-Chi (祝云篪); LU Zu-Hong (陆祖宏) (State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 211189, China)
Source Progress in Biochemistry and Biophysics (《生物化学与生物物理进展》), 2024, Issue 9, pp. 1989-1999 (11 pages). Indexed in SCIE, CAS, CSCD, and the Peking University Core Journals list.
Funding Supported by the National Key Research and Development Program of China (2016YFA0501600).
Keywords protein structural bioinformatics; high-throughput; AlphaFold-like system; structural proteomics