Abstract
The constrained longest common subsequence (CLCS) problem has deep roots in biological applications and is often used as a measure of similarity between homologous gene sequences, but computing a CLCS is expensive. The earliest CLCS algorithm runs in O(rn^4) time, and the fastest algorithm at present runs in O(rn^2) time. Using the duality principle, we convert the CLCS problem into a constrained minimal covering set problem. We then build a weighted ref tree structure, construct the constrained covering subset that contains the constraint sequence, reduce this subset, and search it for critical paths, from which the CLCS is finally constructed. This improves the time complexity to O(n log n + (q + r)L), where r is the length of the constraint sequence, q is the number of ordered pairs of the two given sequences, and L is the length of the longest common subsequence (LCS) of the two given sequences.
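For orientation, the O(rn^2) bound cited above matches the classical dynamic-programming formulation of CLCS, which can be sketched as follows. This is only a minimal illustration of that baseline, not the dual (covering-set) algorithm the paper proposes; the function name `clcs_length` and its parameters are hypothetical and do not come from the paper.

```python
def clcs_length(x, y, p):
    """Length of the longest common subsequence of x and y that
    contains p as a subsequence, or None if no such subsequence exists.

    Classical layered DP: layer k holds, for every prefix pair
    (x[:i], y[:j]), the best length that already embeds p[:k].
    Time O(r * n * m), memory O(n * m) for two layers.
    """
    NEG = float("-inf")  # marks "constraint prefix cannot be embedded"
    n, m, r = len(x), len(y), len(p)
    prev = None
    for k in range(r + 1):
        # With k == 0 an empty prefix is feasible (length 0);
        # with k > 0 an empty prefix cannot contain p[:k].
        cur = [[0 if k == 0 else NEG] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if x[i - 1] == y[j - 1]:
                    if k > 0 and x[i - 1] == p[k - 1]:
                        # The match also consumes one constraint symbol.
                        cur[i][j] = prev[i - 1][j - 1] + 1
                    else:
                        # Ordinary match inside the same layer.
                        cur[i][j] = cur[i - 1][j - 1] + 1
                else:
                    cur[i][j] = max(cur[i - 1][j], cur[i][j - 1])
        prev = cur
    best = prev[n][m]
    return None if best == NEG else int(best)
```

With an empty constraint the function reduces to the ordinary LCS length, so `clcs_length("xab", "abx", "")` and `clcs_length("xab", "abx", "x")` can be compared to see how a constraint shortens the result.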
Source
《南京大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2009, Issue 5, pp. 576-584 (9 pages)
Journal of Nanjing University(Natural Science)
Funding
National Natural Science Foundation of China (60573024)
Natural Science Foundation of Jiangsu Province (BK2009393)
Keywords
constrained longest common subsequence, fast algorithm, primal-dual