摘要
生物序列比对是生物信息学中最基础的研究课题之一。基于动态规划的Needleman-Wunsch双序列比对算法主要采用迭代算法及空位罚分规则对基因序列进行逐一比对,计算二者相似性得分,最后通过回溯分析得出序列之间的最佳比对。虽然该算法可以得到最佳比对结果,但是时间复杂度和空间复杂度较高。首先对原算法进行分析,对计算得分和回溯进行改进。接着设计2次实验,以金黄色葡萄球菌和银葡萄球菌分别作为目标序列和待比对序列,分别生成序列长度范围相同和不同的5组数据进行实验测试。最后通过对新型冠状病毒和SARS病毒全序列进行比对,进一步验证了改进算法的有效性。实验结果表明,改进后的算法可以缩短序列比对时间,提高序列比对效率。
Sequence alignment is one of the most fundamental research problems in bioinformatics.The pairwise alignment Needleman-Wunsch based on dynamic programming mainly uses the iterative algorithm and the vacancy penalty rule to compare gene sequences one by one,calculates their similarity score,and finally obtains the best alignment between sequences through backtracking analysis.Although the algorithm can get the best result,it has high time and space complexity.Firstly,the original algorithm is analyzed and improved from the aspects of calculation score and backtracking.Secondly,two experiments are designed.In the experiments,Staphylococcus aureus is used as the target sequence,and Staphylococcus aureus is used as the counterpart sequence.Five groups of experiments with the same and different sequence length range are conducted.Finally,the novel coronavirus and SARS virus sequences are compared to verify the effectiveness of the algorithm.The experimental results show that the improved algorithm can reduce the sequence alignment time and improve the efficiency of sequence alignment.
作者
甘秋云
GAN Qiu-yun(School of Computing and Information Science,Fuzhou Institute of Technology,Fuzhou 350014,China)
出处
《计算机工程与科学》
CSCD
北大核心
2021年第2期340-346,共7页
Computer Engineering & Science
基金
福建省教育厅中青年教师教育科研项目(JAT191019)。
关键词
序列比对
动态规划
相似性
sequence alignment
dynamic programming
identity