Abstract
This paper compares DNA sequences using the basic principles of a 2-step (second-order) Markov model combined with weighted relative entropy. A DNA sequence is a string over the four characters A, T, C, and G, and can be treated as a Markov chain with state space I = {A, T, C, G}. Describing each sequence by its 2-step transition probability matrix yields characteristic values of the sequence, which are then used to define a similarity measure between DNA sequences. This gives a new method for sequence comparison, which is applied to classify the mitochondrial DNA sequences of 30 species. The phylogenetic tree built by this alignment-free method, from the distance matrix obtained via weighted relative entropy, shows clearer divisions and higher accuracy.
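The pipeline described in the abstract can be sketched roughly as follows. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the 2-step matrix is taken as the square of a one-step transition matrix estimated from adjacent-base counts, the weights are the base frequencies of the first sequence, and the distance is a symmetrized weighted sum of row-wise relative entropies. The function names and the pseudocount are hypothetical and are not taken from the paper.

```python
import math

STATES = "ATCG"  # state space I = {A, T, C, G}


def transition_matrix(seq, pseudocount=1.0):
    """One-step transition probabilities from adjacent-base counts.

    The pseudocount (an assumption) keeps every entry strictly positive.
    """
    index = {s: i for i, s in enumerate(STATES)}
    counts = [[pseudocount] * 4 for _ in range(4)]
    seq = seq.upper()
    for a, b in zip(seq, seq[1:]):
        if a in index and b in index:  # skip ambiguous symbols such as N
            counts[index[a]][index[b]] += 1
    return [[c / sum(row) for c in row] for row in counts]


def two_step(p):
    """2-step transition matrix, computed here as the matrix product P * P."""
    return [[sum(p[i][k] * p[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def base_frequencies(seq):
    """Relative frequency of each base, used as row weights (an assumption)."""
    seq = seq.upper()
    counts = [seq.count(s) for s in STATES]
    total = sum(counts)
    return [c / total for c in counts]


def weighted_relative_entropy(seq_x, seq_y):
    """Sum over rows i of w_i * KL(row i of P2_x || row i of P2_y)."""
    px = two_step(transition_matrix(seq_x))
    py = two_step(transition_matrix(seq_y))
    w = base_frequencies(seq_x)
    return sum(w[i] * px[i][j] * math.log(px[i][j] / py[i][j])
               for i in range(4) for j in range(4))


def distance(seq_x, seq_y):
    """Symmetrized weighted relative entropy between two sequences."""
    return 0.5 * (weighted_relative_entropy(seq_x, seq_y)
                  + weighted_relative_entropy(seq_y, seq_x))


def distance_matrix(seqs):
    """Pairwise distances, suitable as input to a tree-building method."""
    return [[distance(a, b) for b in seqs] for a in seqs]
```

A matrix returned by `distance_matrix` could then be passed to any standard distance-based tree construction method, such as UPGMA or neighbor joining. The abstract does not say which one the authors used.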
Source
Journal of Dalian Jiaotong University (《大连交通大学学报》)
CAS
2013, Issue 5, pp. 112-117 (6 pages)
Keywords
2-step (second-order) Markov model
DNA sequence classification and comparison
weighted relative entropy
phylogenetic tree