Abstract
Gene Ontology (GO) has been widely used to annotate the functions of genes and gene products. Here, we propose a new method, TripletGO, to deduce the GO terms of protein-coding and noncoding genes through the integration of four complementary pipelines built on transcript expression profiles, genetic sequence alignment, protein sequence alignment, and naïve probability. TripletGO was tested on a large set of 5754 genes from 8 species (human, mouse, Arabidopsis, rat, fly, budding yeast, fission yeast, and nematoda) and on 2433 proteins with available expression data from the third Critical Assessment of Protein Function Annotation challenge (CAFA3). Experimental results show that TripletGO achieves function annotation accuracy significantly higher than that of current state-of-the-art approaches. Detailed analyses show that the major advantage of TripletGO lies in the coupling of a new triplet network-based profiling method with the feature space mapping technique, which can accurately recognize function patterns from transcript expression profiles. Meanwhile, the combination of multiple complementary models, especially those from transcript expression and protein-level alignments, improves the coverage and accuracy of the final GO annotation results. The standalone package and an online server of TripletGO are freely available at https://zhanggroup.org/TripletGO/.
Funding
This work was supported in part by:
the National Natural Science Foundation of China (Grant Nos. 62072243 and 61772273 to Dong-Jun Yu);
the Natural Science Foundation of Jiangsu, China (Grant No. BK20201304 to Dong-Jun Yu);
the Foundation of National Defense Key Laboratory of Science and Technology, China (Grant No. JZX7Y202001SY000901 to Dong-Jun Yu);
the China Scholarship Council (Grant No. 201906840041 to Yi-Heng Zhu);
the National Institute of Environmental Health Sciences, USA (Grant No. P30ES017885 to Gilbert S. Omenn);
the National Cancer Institute, USA (Grant No. U24CA210967 to Gilbert S. Omenn);
the National Institute of General Medical Sciences, USA (Grant Nos. GM136422 and S10OD026825 to Yang Zhang);
the National Institute of Allergy and Infectious Diseases, USA (Grant No. AI134678 to Peter L. Freddolino and Yang Zhang);
the National Science Foundation, USA (Grant Nos. IIS1901191, DBI2030790, and MTM2025426 to Yang Zhang).
This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation, USA (Grant No. ACI1548562).