Chinese Term Extraction Based on Improved C-value Method (基于改进C-value方法的中文术语抽取) Cited by: 23
Abstract This paper proposes IC-value, a term extraction method that improves on C-value. After the text is preprocessed with a stop-word list, candidate terms are extracted by an algorithm based on string frequency statistics and then filtered with linguistic rules. IC-value is obtained by improving C-value in three respects (inverse document frequency, broken substrings, and term length) and is used to compute the termhood of each candidate term. An empirical study on 1,000 abstracts of papers about Hepatitis B shows that IC-value outperforms C-value, TF-IDF, and V-value in both precision and recall, is better at discovering long terms, and is markedly effective at identifying broken (meaningless) substrings.
Source New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, Peking University Core Journal, 2013, No. 2, pp. 24-29 (6 pages)
Keywords Term extraction; String frequency statistics; Linguistic rules; Termhood
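
For orientation, the baseline that IC-value modifies can be sketched as follows. This is a minimal sketch of the standard C-value score as it is commonly stated in the term-extraction literature, not the IC-value formula of this paper, which the abstract does not give. It assumes that candidate length is counted in characters, that nesting means proper substring containment, and that the function name `c_value` and the frequencies in the example are hypothetical.

```python
import math


def c_value(freq):
    """Score candidates with the standard C-value formula.

    freq maps each candidate string to its corpus frequency.
    Assumption: length is counted in characters and nesting is
    proper substring containment among the candidates.
    """
    scores = {}
    for a in freq:
        # Longer candidates that contain `a` as a proper substring.
        containers = [b for b in freq if b != a and a in b]
        weight = math.log2(len(a))
        if not containers:
            # Non-nested candidate: length weight times frequency.
            scores[a] = weight * freq[a]
        else:
            # Nested candidate: subtract the mean frequency of its containers.
            nested = sum(freq[b] for b in containers)
            scores[a] = weight * (freq[a] - nested / len(containers))
    return scores


# Hypothetical frequencies, for illustration only.
example = {"乙型肝炎": 10, "乙型肝炎病毒": 4, "肝炎": 12}
scores = c_value(example)
```

In this example the short string "肝炎" has the highest raw frequency, but its score is reduced because most of its occurrences fall inside longer candidates. A single-character candidate always scores zero under this formula, since the length weight is log2(1) = 0.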