Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61972100 and 62172300).
Abstract: A fundamental principle of biology is that proteins tend to form complexes to play important roles in the core functions of cells. For a complete understanding of human cellular functions, it is crucial to have a comprehensive atlas of human protein complexes. Unfortunately, we still lack such a comprehensive atlas of experimentally validated protein complexes, which prevents us from gaining a complete understanding of the compositions and functions of human protein complexes, as well as the underlying biological mechanisms. To fill this gap, we built the Human Protein Complexes Atlas (HPC-Atlas), which is, as far as we know, the most accurate and comprehensive atlas of human protein complexes available to date. We integrated two of the latest protein interaction networks and developed a novel computational method to identify nearly 9000 protein complexes, including many previously uncharacterized complexes. Compared with existing methods, our method achieved outstanding performance on both testing and independent datasets. Furthermore, with HPC-Atlas we identified 751 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-affected human protein complexes and 456 multifunctional proteins, which include many potential moonlighting proteins. These results suggest that HPC-Atlas can serve not only as a computing framework to effectively identify biologically meaningful protein complexes by integrating multiple protein data sources, but also as a valuable resource for exploring new biological findings. The HPC-Atlas webserver is freely available at http://www.yulpan.top/HPC-Atlas.
Funding: Supported by the Major State Basic Research and Development Program of China (973 Program) (Grant No. 2010CB126604), NSFC (Grant Nos. 61272380 and 61173118), and the Shuguang Program of Shanghai Education Foundation.
Abstract: MicroRNAs (miRNAs), a class of ~20-24 nt long non-coding RNAs, have critical roles in diverse biological processes including development, proliferation, and stress response. With the development and availability of experimental technologies and computational approaches, the field of miRNA biology has advanced tremendously over the last decade. By sequence complementarity, miRNAs are estimated to regulate certain mRNA transcripts. Although it was once thought to be simple and straightforward to find plant miRNA targets, this viewpoint is being challenged by genetic and biochemical studies. In this review, we summarize recent progress in plant miRNA target recognition mechanisms and principles of target prediction, and introduce current experimental and computational tools for plant miRNA target prediction. Finally, we present our outlook on future directions in the development of plant miRNA target finding methods.
Funding: Supported by the National Natural Science Foundation of China (NSFC) (No. 61772367), the Program of Shanghai Subject Chief Scientist (No. 15XD1503600), and the National Key Research and Development Program of China (No. 2016YFC0901704).
Abstract: Functional networks are extracted from resting-state functional magnetic resonance imaging data to explore biomarkers for distinguishing brain disorders in disease diagnosis. Previous works have primarily focused on using a single Resting-State Network (RSN) with various techniques. Here, we apply fusion analysis of RSNs to capture biomarkers that combine the complementary information among the RSNs. Experiments are carried out on three groups of subjects, i.e., Cognition Normal (CN), Early Mild Cognitive Impairment (EMCI), and Alzheimer's Disease (AD) groups, which correspond to the three progressing stages of AD; each group contains 18 subjects. First, we apply group Independent Component Analysis (ICA) to extract the Default Mode Network (DMN) and Dorsal Attention Network (DAN) for each subject group. Then, using the common DMN and DAN obtained for each group as templates, we employ individual ICA to extract the DMN and DAN for each subject. Finally, we fuse the DMNs and DANs to explore the biomarkers. The results show that (1) the templates generated by group ICA can effectively extract the RSN for each subject by individual ICA; (2) the RSNs combined with fusion analysis yield more informative biomarkers than those without fusion analysis; (3) the regions of the DMN and DAN that differ most are identified between CN and EMCI and between EMCI and AD: for the DMN, the difference in the medial prefrontal cortex between EMCI and AD is smaller than that between CN and EMCI, whereas that in the posterior cingulate between EMCI and AD is larger; for the DAN, the difference in the intraparietal sulcus between EMCI and AD is smaller than that between CN and EMCI; and (4) extracting the DMN and DAN for each subject via the back reconstruction of group ICA is invalid.
Funding: Financially supported by the National High Technology Research and Development Program of China (863 Program, Grant No. 2012AA020403) and the National Natural Science Foundation of China (Grant Nos. 61173118 and 61272380).
Abstract: In the past decades, advances in high-throughput technologies have led to the generation of huge amounts of biological data that require analysis and interpretation. Recently, nonnegative matrix factorization (NMF) has been introduced as an efficient way to reduce the complexity of data as well as to interpret them, and has been applied to various fields of biological research. In this paper, we present CloudNMF, a distributed open-source implementation of NMF on a MapReduce framework. Experimental evaluation demonstrated that CloudNMF is scalable and can be used to deal with huge amounts of data, which may enable various kinds of high-throughput biological data analysis in the cloud. CloudNMF is freely accessible at http://admis.fudan.edu.cn/projects/CloudNMF.html.
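For readers unfamiliar with NMF, the factorization itself can be illustrated with a minimal single-machine sketch. This is not CloudNMF's code or API: the abstract does not say which update rule CloudNMF uses, so the widely known Lee-Seung multiplicative updates for the Frobenius objective are assumed here, and every function name (`matmul`, `transpose`, `frobenius_error`, `nmf`) is hypothetical. The matrix products inside the loop are the kind of step a MapReduce implementation would need to distribute.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate a non-negative n x m matrix V as W (n x rank) times H (rank x m).

    Assumption: Lee-Seung multiplicative updates; eps guards against
    division by zero and keeps the random initial factors positive.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h
```

Calling `nmf(v, rank=2)` on a small non-negative matrix `v` (for example, a genes-by-samples expression table) returns two non-negative factors whose product approximates `v`; `frobenius_error(v, w, h)` reports how close the approximation is.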
Funding: Supported by the National Institutes of Health, United States (Grant Nos. U01CA200147 and DP1HD087990), awarded to SZ.
Abstract: Interactions between chromatin segments play a large role in functional genomic assays, and developments in genomic interaction detection methods have revealed interacting topological domains within the genome. Among these methods, Hi-C plays a key role. Here, we present the Genome Interaction Tools and Resources (GITAR), software for comprehensive Hi-C data analysis, including data preprocessing, normalization, and visualization, as well as analysis of topologically-associated domains (TADs). GITAR is composed of two main modules: (1) HiCtool, a Python library to process and visualize Hi-C data, including TAD analysis; and (2) a processed data library, a large collection of human and mouse datasets processed using HiCtool. HiCtool leads the user step-by-step through a pipeline that goes from the raw Hi-C data to the computation, visualization, and optimized storage of intra-chromosomal contact matrices and TAD coordinates. A large collection of standardized processed data allows users to compare different datasets in a consistent way, while saving the time needed to obtain data for visualization or additional analyses. More importantly, GITAR enables users without any programming or bioinformatic expertise to work with Hi-C data. GITAR is publicly available at http://genomegitar.org as open-source software.
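As a rough illustration of what an intra-chromosomal contact matrix is, the sketch below bins read-pair positions from one chromosome into a symmetric count matrix. It is not HiCtool's API or pipeline: the function name `contact_matrix`, its parameters, and the assumption of 0-based positions are all hypothetical, and the normalization and TAD steps mentioned in the abstract are not covered.

```python
def contact_matrix(pairs, chrom_length, bin_size):
    """Count Hi-C contacts between fixed-size bins of a single chromosome.

    pairs: iterable of (pos1, pos2) read-pair positions, assumed 0-based
           and both on the same chromosome.
    Returns a symmetric n_bins x n_bins matrix as a list of rows.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            # Mirror off-diagonal contacts so the matrix stays symmetric.
            matrix[j][i] += 1
    return matrix
```

For example, `contact_matrix([(100, 2500), (150, 180)], chrom_length=3000, bin_size=1000)` yields a 3 x 3 matrix with one contact between the first and last bins and one contact within the first bin.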