Application of Clustering Methods in Phylogenetic Tree Analysis of Large Sequence Samples
Abstract  Objective: Phylogenetic tree analysis is an important tool in bioinformatics research, but the more accurate tree-building methods are too computationally intensive to be applied directly to large sequence samples. This paper combines cluster analysis with phylogenetic tree analysis to address the problem. Methods: Using H3A1 sequences of influenza A virus as an example, the data were first split by two-step clustering; a phylogenetic tree was then built separately for each cluster, and the cluster-level trees were finally joined into one complete tree. Results: The clustering results were highly consistent with the tree structure, and the replacement of one cluster by another over time appeared in the tree as a succession of tree segments in the same order. Conclusion: Combining cluster methods with phylogenetic tree methods can meet the needs of phylogenetic analysis of large sequence samples well; adding further parameters to the model could make the results richer and easier to interpret. The strategy merits wider use in this field.
Source  Chinese Journal of Health Statistics (《中国卫生统计》), CSCD, Peking University Core Journal, 2006, No. 5, pp. 393-396 (4 pages)
Funding  National Natural Science Foundation of China (Grant No. 30400370)
Keywords  Bioinformatics; Cluster analysis; Phylogenetic analysis; Influenza A virus
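The split, build and join strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a greedy leader clustering stands in for the two-step clustering, topology-only UPGMA stands in for the more accurate tree methods, and all function names and the `threshold` parameter are hypothetical choices made for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def leader_clusters(seqs, threshold):
    """Greedy leader clustering (an assumed stand-in for two-step clustering).

    Each sequence joins the first cluster whose leader (first member) lies
    within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for name in seqs:
        for members in clusters:
            if p_distance(seqs[name], seqs[members[0]]) <= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def mean_distance(seqs, group_a, group_b):
    """Average pairwise distance between two groups of sequence names."""
    pairs = [(a, b) for a in group_a for b in group_b]
    return sum(p_distance(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)


def upgma(seqs, nodes):
    """Topology-only UPGMA over `nodes`, a list of (newick_text, leaf_names).

    Repeatedly merges the two nodes with the smallest average leaf-to-leaf
    distance and returns the final (newick_text, leaf_names) pair.
    """
    nodes = list(nodes)
    while len(nodes) > 1:
        i, j = min(
            combinations(range(len(nodes)), 2),
            key=lambda ij: mean_distance(seqs, nodes[ij[0]][1], nodes[ij[1]][1]),
        )
        merged = (f"({nodes[i][0]},{nodes[j][0]})", nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def clustered_tree(seqs, threshold):
    """Split by clustering, build one subtree per cluster, then join them."""
    clusters = leader_clusters(seqs, threshold)
    subtrees = [
        upgma(seqs, [(name, [name]) for name in members]) for members in clusters
    ]
    newick, _ = upgma(seqs, subtrees)  # backbone tree over whole clusters
    return clusters, newick + ";"
```

Calling `clustered_tree(seqs, threshold)` on a dictionary of aligned sequences returns the cluster membership and a Newick string in which every cluster forms its own subtree. Tree search inside each cluster only ever sees that cluster's sequences, which is where the saving on large samples would come from.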