Abstract
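Objective: Phylogenetic tree analysis is an important tool in bioinformatics research, but the phylogenetic methods that give more accurate results are computationally expensive and cannot be applied directly to large samples. This paper addresses the problem by combining cluster analysis with phylogenetic tree analysis.
Methods: Using H3A1 sequences of influenza A virus as an example, the data were first split with two-step clustering. A phylogenetic tree was then built separately for each cluster, and the trees were finally spliced together into one complete phylogenetic tree.
Results: The clustering of the sequences was highly consistent with the tree structure. The way the clusters succeeded one another over time appeared in the tree as an alternation of the corresponding tree segments.
Conclusion: Combining clustering with phylogenetic tree methods meets the needs of phylogenetic analysis of large sequence samples well. Adding further parameters to the model could make the results richer. The approach merits wider adoption in this field.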
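The split, build, and splice strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The sequences are assumed to be pre-aligned to equal length, and greedy leader clustering on p-distance stands in for the two-step clustering used in the paper. UPGMA stands in for the more accurate and more expensive tree search that would run inside each cluster in practice, and the cluster subtrees are spliced by continuing average-linkage merging over them. The names `p_distance`, `leader_clusters`, `upgma`, and `split_and_splice` and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def leader_clusters(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Greedy leader clustering (a stand-in for two-step clustering).

    Each sequence joins the first cluster whose leader (first member) is
    within `threshold`; otherwise it starts a new cluster.
    """
    clusters: list[list[str]] = []
    for name in seqs:
        for members in clusters:
            if p_distance(seqs[members[0]], seqs[name]) <= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def upgma(nodes, seqs) -> str:
    """Average-linkage (UPGMA) merging of (newick, members) pairs.

    Returns a topology-only Newick string. Averages are recomputed from the
    leaf distances every round: naive, but fine for a sketch.
    """
    nodes = list(nodes)

    def avg(m1, m2):
        total = sum(p_distance(seqs[a], seqs[b]) for a in m1 for b in m2)
        return total / (len(m1) * len(m2))

    while len(nodes) > 1:
        i, j = min(combinations(range(len(nodes)), 2),
                   key=lambda ij: avg(nodes[ij[0]][1], nodes[ij[1]][1]))
        (ti, mi), (tj, mj) = nodes[i], nodes[j]
        merged = (f"({ti},{tj})", mi + mj)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


def split_and_splice(seqs: dict[str, str], threshold: float) -> str:
    # Step 1: split the data set into clusters.
    clusters = leader_clusters(seqs, threshold)
    # Step 2: build a subtree inside each cluster independently
    # (this is where an expensive method would be substituted).
    subtrees = [(upgma([(n, [n]) for n in members], seqs), members)
                for members in clusters]
    # Step 3: splice the cluster subtrees into one tree.
    return upgma(subtrees, seqs) + ";"


if __name__ == "__main__":
    demo = {
        "A1": "AAAAAAAAAA",
        "A2": "AAAAAAAAAT",
        "A3": "AAAAAAAATT",
        "B1": "CCCCCAAAAA",
        "B2": "CCCCCAAAAT",
    }
    print(leader_clusters(demo, threshold=0.3))  # [['A1', 'A2', 'A3'], ['B1', 'B2']]
    print(split_and_splice(demo, threshold=0.3))  # ((A3,(A1,A2)),(B1,B2));
```

The saving comes from step 2. An expensive tree search only ever sees one cluster at a time, so its cost depends on the cluster size rather than on the size of the whole data set.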
Objective: Phylogenetic tree analysis is an important tool in bioinformatics research. However, the most accurate methods are too computationally intensive to be applied to large sequence data sets. This paper discusses a strategy that combines cluster analysis with phylogenetic tree analysis to solve this problem.
Methods: A large sequence data set is first split with a clustering method. The phylogenetic tree topology is then searched within each cluster separately. Finally, an integrated tree topology is constructed by combining the cluster segments. H3A1 sequence data of influenza A virus were used to evaluate the method.
Results: The tree topology is highly consistent with the relationships between clusters, and the order of the clusters in the tree matches their temporal order.
Conclusion: Combining cluster analysis with phylogenetic tree analysis can meet the needs of large sequence data analysis well. Additional parameters could be added to the models to make the results easier to interpret. This strategy merits wider use in the field.
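The Results report that cluster membership agrees closely with the tree structure, but the abstract does not say how that agreement was measured. One simple check, assumed here only for illustration, is the share of clusters whose members form exactly one clade of the rooted tree. The sketch below parses a topology-only Newick string (no branch lengths or support values) and computes that share. `parse_newick`, `clades`, and `monophyletic_share` are hypothetical helpers.

```python
def parse_newick(s: str):
    """Parse a topology-only Newick string such as '((A,B),C);' into nested tuples."""
    s = s.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1
            children = [node()]
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            return tuple(children)
        # Leaf: read the label up to the next structural character.
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    tree = node()
    if pos != len(s):
        raise ValueError("trailing characters after tree")
    return tree


def clades(tree) -> list[frozenset[str]]:
    """Leaf set of every node (leaves included) of a rooted tree."""
    out: list[frozenset[str]] = []

    def walk(t):
        if isinstance(t, str):
            leaves = frozenset([t])
        else:
            leaves = frozenset().union(*(walk(c) for c in t))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def monophyletic_share(newick: str, clusters: list[list[str]]) -> float:
    """Fraction of clusters whose members form exactly one clade of the tree."""
    all_clades = set(clades(parse_newick(newick)))
    hits = sum(frozenset(c) in all_clades for c in clusters)
    return hits / len(clusters)


if __name__ == "__main__":
    groups = [["A1", "A2", "A3"], ["B1", "B2"]]
    print(monophyletic_share("((A3,(A1,A2)),(B1,B2));", groups))  # 1.0
    print(monophyletic_share("((A1,B1),(A2,(A3,B2)));", groups))  # 0.0
```

A rooted check is stricter than an unrooted one. A cluster that forms a split of the unrooted tree can still fail here if the root falls inside it.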
Source
Chinese Journal of Health Statistics (《中国卫生统计》)
CSCD
Peking University Core Journal (北大核心)
2006, Issue 5, pp. 393-396 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 30400370)
Keywords
Bioinformatics, Cluster analysis, Phylogenetic analysis, Influenza A virus