Abstract: In the post-genomic era, identifying specific regulatory motifs, or transcription factor binding sites (TFBSs), in non-coding DNA sequences is essential for elucidating transcriptional regulatory networks, yet it remains a major challenge for many researchers. Consequently, numerous motif discovery tools and related databases have been developed to address this problem. However, these existing methods are based on different computational algorithms and differ in their motif prediction efficiency on non-coding DNA sequences. Understanding the similarities and differences among these algorithms, and expanding the motif discovery literature, is therefore important for helping users choose the most appropriate of the available online tools. Moreover, there is still no credible criterion for assessing motif discovery tools, nor guidance to help researchers choose the best tool for their own projects. Integrating the related resources may thus be a good way to improve application accuracy. Recent studies integrate regulatory motif discovery tools with experimental methods to offer researchers a complementary approach, and also provide a much-needed model for current research on transcriptional regulatory networks. Here we present a comparative analysis of regulatory motif discovery tools for TFBSs.
Funding: Science Research Foundation of Yunnan Educational Committee (grant number: 2011J079); Yunnan Fundamental Research Foundation of Application (grant number: 2009ZC049M); Science Research Foundation for the Overseas Chinese Scholars, State Education Ministry (grant number: 2010-1561)
Abstract: Hepatitis B virus (HBV) infection is a severe global health problem. In recent years, mutations, as an essential element of HBV evolution, have been studied extensively. However, the study of conserved sequences in HBV evolution is still in its infancy. In this paper, we applied the MEME (Multiple EM for Motif Elicitation) algorithm for motif discovery and proposed a new metric, the conserved index (CI), for phylogenetic analysis of HBV sequences. Our results indicate that MEME can efficiently discover multiple motifs in HBV sequences and that CI, as a measure of sequence conservation, can effectively help build the phylogenetic tree, from which the evolutionary relationships among HBV sequences can be inferred.
Funding: This work was supported by the National Natural Science Foundation of China (No. 61572180) and the Scientific and Technological Research Project of the Education Department of Jiangxi Province (No. GJJ170383).
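MEME fits motif models by expectation-maximization (EM). The sketch below is a heavily simplified illustration of that idea, not MEME itself and not the authors' HBV pipeline. It assumes DNA input, exactly one motif occurrence per sequence (MEME's simplest site model, OOPS), a fixed uniform background, and a single motif of known width; it seeds EM from every substring of that width in the first sequence, loosely mirroring MEME's substring-based starting points. All function names and the seeding rule are hypothetical.

```python
import math

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumption: uniform, fixed background model


def site_ratio(seq, j, pwm):
    """Likelihood ratio of a motif occurrence starting at offset j vs. background."""
    ratio = 1.0
    for k, column in enumerate(pwm):
        ratio *= column[seq[j + k]] / BACKGROUND
    return ratio


def e_step(seqs, pwm):
    """Posterior over start offsets, assuming one occurrence per sequence (OOPS)."""
    width = len(pwm)
    posteriors, loglik = [], 0.0
    for seq in seqs:
        ratios = [site_ratio(seq, j, pwm) for j in range(len(seq) - width + 1)]
        total = sum(ratios)
        posteriors.append([r / total for r in ratios])
        # Log-likelihood up to the constant background term, uniform prior on offsets.
        loglik += math.log(total / len(ratios))
    return posteriors, loglik


def m_step(seqs, posteriors, width, pseudocount=0.1):
    """Re-estimate each motif column from posterior-weighted letter counts."""
    counts = [{a: pseudocount for a in ALPHABET} for _ in range(width)]
    for seq, post in zip(seqs, posteriors):
        for j, weight in enumerate(post):
            for k in range(width):
                counts[k][seq[j + k]] += weight
    return [{a: c[a] / sum(c.values()) for a in ALPHABET} for c in counts]


def seed_pwm(word, match=0.7):
    """Starting model built from one substring (hypothetical seeding rule)."""
    other = (1.0 - match) / (len(ALPHABET) - 1)
    return [{a: (match if a == ch else other) for a in ALPHABET} for ch in word]


def em_motif(seqs, width, iterations=20):
    """Run EM from every width-length substring of seqs[0]; keep the best model."""
    best_ll, best_pwm = -math.inf, None
    first = seqs[0]
    for start in range(len(first) - width + 1):
        pwm = seed_pwm(first[start:start + width])
        for _ in range(iterations):
            posteriors, _ = e_step(seqs, pwm)
            pwm = m_step(seqs, posteriors, width)
        _, ll = e_step(seqs, pwm)
        if ll > best_ll:
            best_ll, best_pwm = ll, pwm
    return best_pwm


def consensus(pwm):
    """Most probable letter in each motif column."""
    return "".join(max(column, key=column.get) for column in pwm)
```

For example, planting the same 8-mer at a random offset in each of several random DNA sequences and calling `em_motif(seqs, 8)` should recover that 8-mer as the consensus. MEME itself also supports zero-or-one and multiple occurrences per sequence, searches over motif widths, and reports E-values.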
Abstract: Background: The frequency of small subtrees in biological, social, and other types of networks could shed light on the structure, function, and evolution of such networks. However, counting all possible subtrees of a prescribed size can be computationally expensive because of their potentially large number, even in small, sparse networks. Moreover, most existing algorithms for subtree counting are subtree-centric: they search for one specific subtree type at a time and may therefore spend extra time searching the same network repeatedly. Methods: In this paper, we propose a network-centric algorithm (MTMO) to efficiently count subtrees of size k. Our algorithm is based on enumerating all connected sets of k-1 edges, incorporates a labeled rooted tree data structure into the enumeration process to reduce the number of required isomorphism tests, and uses an array-based indexing scheme to simplify subtree counting. Results: Experiments on three representative undirected complex networks show that our algorithm is roughly an order of magnitude faster than existing subtree-centric approaches and than a baseline network-centric algorithm that does not use rooted trees, allowing larger subtrees to be counted in larger networks than was previously possible. We also show major differences between unicellular and multicellular organisms. In addition, we apply our algorithm to find network motifs using a pattern-growth approach. Conclusions: We propose a network-centric algorithm that allows faster counting of non-induced subtrees. This enables us to count larger motifs in larger networks than was previously possible.
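The network-centric idea in the Methods can be sketched as follows: enumerate connected sets of k-1 edges once, keep those spanning exactly k vertices (i.e., trees), and tally them by isomorphism class. This is a minimal sketch, not MTMO. As a simplifying assumption, it replaces MTMO's labeled rooted tree structure and array-based index with an AHU canonical string computed from the tree's center(s) and a Python `Counter`, and it deduplicates edge sets by brute force with Python sets, so it is only practical for small k and small graphs. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def connected_edge_sets(graph_edges, size):
    """All connected sets of `size` edges, grown one adjacent edge at a time.

    Sets reached through different growth orders collapse in a Python set
    (brute-force deduplication, not MTMO's rooted-tree/array scheme).
    """
    edges = [tuple(sorted(e)) for e in graph_edges]
    incident = defaultdict(set)
    for e in edges:
        incident[e[0]].add(e)
        incident[e[1]].add(e)
    frontier = {frozenset([e]) for e in edges}
    for _ in range(size - 1):
        grown = set()
        for es in frontier:
            touched = {v for e in es for v in e}
            for v in touched:
                for e in incident[v]:
                    if e not in es:
                        grown.add(es | {e})
        frontier = grown
    return frontier


def tree_canon(tree_edges):
    """Canonical string for an unrooted tree: AHU encoding rooted at its center(s)."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    leaves = [n for n, d in degree.items() if d <= 1]
    remaining = len(adj)
    # Peel leaf layers until one or two center vertices are left.
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for nb in adj[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:
                    next_leaves.append(nb)
        leaves = next_leaves

    def encode(node, parent):
        children = sorted(encode(c, node) for c in adj[node] if c != parent)
        return "(" + "".join(children) + ")"

    return min(encode(c, None) for c in leaves)


def count_subtrees(graph_edges, k):
    """Count non-induced subtrees with k vertices, grouped by isomorphism class."""
    counts = Counter()
    for es in connected_edge_sets(graph_edges, k - 1):
        vertices = {v for e in es for v in e}
        # k-1 connected edges span k vertices exactly when they contain no cycle.
        if len(vertices) == k:
            counts[tree_canon(es)] += 1
    return counts
```

Because each edge set is counted regardless of which other network edges join its vertices, these are non-induced counts: a triangle, for instance, contains three 3-vertex paths but no 4-vertex subtree.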