Funding: Supported by grants from the National Center for Research Resources (5P20RR016471-12) and the National Institute of General Medical Sciences (8 P20 GM103442-12) from the National Institutes of Health, and by the seed collaborative research grant from the Odegard School of Aerospace Sciences and the School of Medicine and Health Sciences at the University of North Dakota.
Abstract: The rapid development of next-generation sequencing technology presents a major computational challenge for data processing and analysis. The de Bruijn graph, a fast algorithmic approach, has been successfully used for de novo assembly of genomic DNA; nevertheless, its performance for transcriptome assembly is unclear. In this study, we used both simulated and real RNA-Seq data, from either artificial RNA templates or human transcripts, to evaluate five de novo assemblers: ABySS, Mira, Trinity, Velvet, and Oases. Of these assemblers, ABySS, Trinity, Velvet, and Oases are all based on the de Bruijn graph, whereas Mira uses an overlap graph algorithm. Various numbers of short RNA reads were selected from the External RNA Control Consortium (ERCC) data and human chromosome 22. A number of statistics were then calculated for the resulting contigs from each assembler. Each experiment was repeated multiple times to obtain the mean statistics and standard error estimates. Trinity performed relatively well on both ERCC and human data, but it may not consistently generate full-length transcripts. ABySS was the fastest method, but its assembly quality was low. Mira gave a good rate for mapping its contigs onto human chromosome 22, but its computational speed was not satisfactory. Our results suggest that transcript assembly remains a challenging problem for the bioinformatics community. Therefore, a novel assembler is needed for assembling transcriptome data generated by next-generation sequencing techniques.
Funding: Supported by the National Natural Science Foundation of China (31370715, 31160181, 31360277, 30860011), the National Basic Research Program of China (2013CB127500), the Program of Innovation Group of Yunnan Province (2011CI123), and the Foundation for Key Teacher in Yunnan University (XT412003).
Abstract: Proteins are essential parts of living organisms and participate in virtually every process within cells. As the genomic sequences of an increasing number of organisms are completed, research into how proteins can perform such a variety of functions has become much more intensive, because the value of the genomic sequences relies on an accurate understanding of the encoded gene products. Although the static three-dimensional structures of many proteins are known, the functions of proteins are ultimately governed by their dynamic characteristics, including the folding process, conformational fluctuations, molecular motions, and protein-ligand interactions. In this review, the physicochemical principles underlying these dynamic processes are discussed in depth based on the free energy landscape (FEL) theory. Questions of why and how proteins fold into their native conformational states, why proteins are inherently dynamic, and how their dynamic personalities govern protein functions are answered. This paper will contribute to the understanding of the structure-function relationship of proteins in the post-genome era of life science research.