Abstract
This paper presents HMM-TFM (hidden Markov model based transcription factor name mining), a text mining algorithm for extracting transcription factor names. The method does not require a dictionary of transcription factor names. It uses a small predefined verb set to select sentences that describe transcription factors, predicts the part of speech of each word with a hidden Markov model, and identifies transcription factor names from the parts of speech of the surrounding words. Experimental results show that, on English-language literature, HMM-TFM achieves a recall of 74.2% and a precision of 77.9%.
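The three steps named in the abstract (verb-based sentence filtering, HMM part-of-speech tagging, and context-based name identification) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the verb set, the three-tag inventory, all probabilities, the unknown-word handling, and the "noun directly before a filter verb" rule are hypothetical and are not taken from the paper, which does not give these details in the abstract.

```python
import math

# Hypothetical verb set; the paper's actual verb list is not given in the abstract.
FILTER_VERBS = {"binds", "activates", "represses", "regulates"}

# Toy HMM parameters (illustrative values only, not trained on any corpus).
STATES = ["NOUN", "VERB", "DET"]
START = {"NOUN": 0.5, "VERB": 0.1, "DET": 0.4}
TRANS = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.6, "DET": 0.1},
    "VERB": {"NOUN": 0.4, "VERB": 0.1, "DET": 0.5},
    "DET": {"NOUN": 0.9, "VERB": 0.05, "DET": 0.05},
}
EMIT = {
    "NOUN": {"promoter": 0.4, "gene": 0.4},
    "VERB": {"binds": 0.5, "activates": 0.5},
    "DET": {"the": 1.0},
}
# Assumption: out-of-vocabulary words (such as gene symbols) are most likely nouns.
UNKNOWN = {"NOUN": 0.2, "VERB": 1e-4, "DET": 1e-4}


def emission(state, word):
    """Emission probability with a simple fallback for unseen words."""
    w = word.lower()
    if any(w in EMIT[s] for s in STATES):
        return EMIT[state].get(w, 1e-6)
    return UNKNOWN[state]


def viterbi(tokens):
    """Return the most likely tag sequence under the toy HMM (log-space Viterbi)."""
    if not tokens:
        return []
    scores = [{s: math.log(START[s]) + math.log(emission(s, tokens[0]))
               for s in STATES}]
    back = []
    for word in tokens[1:]:
        prev, row, ptr = scores[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            row[s] = (prev[best] + math.log(TRANS[best][s])
                      + math.log(emission(s, word)))
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def candidate_names(tokens):
    """Keep sentences containing a filter verb, then return nouns directly before one."""
    if not any(t.lower() in FILTER_VERBS for t in tokens):
        return []
    tags = viterbi(tokens)
    return [tokens[i] for i in range(len(tokens) - 1)
            if tags[i] == "NOUN" and tokens[i + 1].lower() in FILTER_VERBS]


sentence = "GATA1 binds the promoter".split()
tags = viterbi(sentence)
names = candidate_names(sentence)
```

For the example sentence, the tagger assigns a noun tag to the unseen token before the verb, so that token is returned as a candidate name; a sentence with no filter verb is skipped before tagging. A real system would estimate the HMM parameters from a tagged corpus and use a richer tag set and context rule.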
Source
《吉林大学学报(理学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2012, No. 2, pp. 320-322 (3 pages)
Journal of Jilin University: Science Edition
Funding
National Natural Science Foundation of China (Grant No. 61073075)