Abstract
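Chromatin immunoprecipitation has extended the motif identification problem to the whole genome, but because of the sheer data volume, traditional motif identification algorithms are often too slow to solve it well. To overcome this shortcoming, FastESE, an algorithm for the substituted emerging substring finding problem on ChIP-seq data, is proposed. It finds emerging substrings by comparing the test set with the control set, searches for their (l,d) substituted instances to build the corresponding position probability matrices, clusters these substrings by weighted information content, and finally identifies the substituted emerging substrings in the set. The algorithm is validated on real ChIP-seq data, and the experimental results show that FastESE can effectively solve the motif identification problem in ChIP-seq data within a reasonable time.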
Recently, the development of chromatin immunoprecipitation techniques has extended the motif identification problem to the genome-wide scale, but traditional motif identification algorithms run too slowly to handle data of this size. To address this shortcoming, a substituted emerging substring search algorithm named FastESE, designed for ChIP-seq data, is proposed in this research. Emerging substrings are found by comparing the test dataset with the control dataset, and their substituted instances are then searched to build the corresponding position probability matrices. Weighted information content is used to cluster these substrings, which finally yields the substituted emerging substrings. The effectiveness of the proposed algorithm was verified on real ChIP-seq data. The experimental results show that FastESE can solve the motif identification problem in ChIP-seq data within a reasonable time.
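The first steps named in the abstract (emerging substrings, (l,d) substituted instances, position probability matrix) can be illustrated with a minimal Python sketch. This is not the FastESE implementation: the abstract gives no formulas, so the growth-rate definition, the pseudocount, the `min_growth` threshold, and every function name below are assumptions made for illustration. Plain information content against a uniform background stands in for the paper's weighted information content, and the clustering step is omitted.

```python
from collections import Counter
from math import log2

def support(seqs, l):
    """Number of sequences in which each l-mer occurs at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + l] for i in range(len(s) - l + 1)})
    return counts

def emerging_substrings(test, control, l, min_growth=2.0, pseudo=1.0):
    """l-mers whose relative support in `test` exceeds that in `control`.

    Growth rate (an assumed definition, with a pseudocount so that
    substrings absent from the control set do not divide by zero).
    """
    t, c = support(test, l), support(control, l)
    result = {}
    for kmer, n in t.items():
        growth = (n / len(test)) / ((c.get(kmer, 0) + pseudo) / (len(control) + pseudo))
        if growth >= min_growth:
            result[kmer] = growth
    return result

def hamming(x, y):
    """Number of mismatching positions between two equal-length strings."""
    return sum(a != b for a, b in zip(x, y))

def substituted_instances(seed, seqs, d):
    """All occurrences of l-mers within Hamming distance d of `seed`."""
    l = len(seed)
    return [s[i:i + l]
            for s in seqs
            for i in range(len(s) - l + 1)
            if hamming(seed, s[i:i + l]) <= d]

def position_probability_matrix(instances):
    """Per-position base frequencies over the aligned instances."""
    n = len(instances)
    return [{b: col.count(b) / n for b in "ACGT"}
            for col in zip(*instances)]

def information_content(ppm):
    """Total information content in bits, assuming a uniform background."""
    return sum(2.0 + sum(p * log2(p) for p in col.values() if p > 0)
               for col in ppm)

# Toy data, invented for this example (not from the paper).
test_set = ["ACGTACGT", "TTACGTAA", "GGACGAAC"]
control_set = ["TTTTTTTT", "GGGGCCCC", "AACCAACC"]

emerging = emerging_substrings(test_set, control_set, l=4)
instances = substituted_instances("ACGT", test_set, d=1)
ppm = position_probability_matrix(instances)
print(sorted(emerging), instances, round(information_content(ppm), 3))
```

Enumerating every l-mer in this way is only workable for small inputs. Handling genome-scale ChIP-seq data efficiently is what the paper's algorithm is proposed for, and this sketch does not attempt it.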
Authors
张懿璞
闫茂德
侯俊
阚丹会
ZHANG Yipu; YAN Maode; HOU Jun; KAN Danhui (School of Electronics and Control Engineering, Chang’an University, Xi’an 710064, China; School of Information Engineering, Chang’an University, Xi’an 710064, China)
Source
《现代电子技术》
Peking University Core Journal
2017, Issue 12, pp. 6-10 (5 pages)
Modern Electronics Technique
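Funding
National Natural Science Foundation of China, Young Scientists Fund (61501058)
Natural Science Foundation of Shaanxi Province, Youth Fund (2016JQ6075)
Fundamental Research Funds for the Central Universities (310832161008)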