Abstract
This paper proposes a hybrid strategy for the automatic extraction of domain-specific terms. Multi-word candidate term extraction and Chinese word segmentation are first run concurrently in two threads, and their result sets are then merged. Finally, domain-specific terms are selected from the merged set using domain relevance and domain topic consensus. The multi-word candidate extraction stage and the final term extraction stage each improve on existing methods, reducing the time complexity of string decomposition and raising the precision and recall of domain-specific term extraction. Experimental results show that the hybrid method achieves a precision of 90.64%, which is better than that of existing Chinese domain-specific term extraction methods.
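The pipeline in the abstract (two concurrent extraction passes, a merge, then a two-score filter) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the thresholds `dr_min` and `dc_min`, the n-gram counting used as a stand-in for multi-word candidate extraction, pre-tokenized input in place of a real Chinese segmenter, a frequency-ratio domain relevance, and an entropy-based consensus score standing in for the paper's domain topic consensus, whose actual formula is not given here.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def multiword_candidates(docs, max_n=3, min_freq=2):
    """Hypothetical stand-in: repeated token n-grams (n >= 2) as candidates."""
    counts = Counter()
    for tokens in docs:
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return {gram for gram, c in counts.items() if c >= min_freq}


def single_words(docs):
    """Hypothetical stand-in for segmentation: docs are already token lists."""
    return {tok for tokens in docs for tok in tokens}


def term_freq(term, tokens):
    """Count occurrences of a (possibly multi-word) term in one token list."""
    parts = term.split(" ")
    n = len(parts)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == parts)


def domain_relevance(term, domain_docs, contrast_docs):
    """Assumed score: share of the term's relative frequency in the domain."""
    def rel(docs):
        total = sum(len(d) for d in docs) or 1
        return sum(term_freq(term, d) for d in docs) / total
    p_dom, p_con = rel(domain_docs), rel(contrast_docs)
    return p_dom / (p_dom + p_con) if p_dom + p_con > 0 else 0.0


def domain_consensus(term, domain_docs):
    """Assumed score: normalized entropy of the term over domain documents."""
    freqs = [term_freq(term, d) for d in domain_docs]
    total = sum(freqs)
    if total == 0 or len(domain_docs) < 2:
        return 0.0
    entropy = -sum((f / total) * math.log(f / total) for f in freqs if f > 0)
    return entropy / math.log(len(domain_docs))


def extract_terms(domain_docs, contrast_docs, dr_min=0.7, dc_min=0.5):
    """Run both candidate passes in two threads, merge, then filter by score."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        multi = pool.submit(multiword_candidates, domain_docs)
        single = pool.submit(single_words, domain_docs)
        candidates = multi.result() | single.result()
    return sorted(
        t for t in candidates
        if domain_relevance(t, domain_docs, contrast_docs) >= dr_min
        and domain_consensus(t, domain_docs) >= dc_min
    )
```

Calling `extract_terms(domain_docs, contrast_docs)` with lists of token lists returns the candidates that are both concentrated in the domain corpus and spread evenly across its documents. A candidate that is frequent everywhere fails the relevance test, and one that occurs in a single document fails the consensus test.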
Source
《计算机应用研究》
CSCD
PKU Core Journal (北大核心)
2009, No. 7, pp. 2652-2655 (4 pages)
Application Research of Computers
Funding
Supported by the Doctoral Student Innovation Fund of the Electronic Engineering Institute (No. 2008006)
Keywords
domain-specific term extraction
domain topic consensus
domain ontology learning
multi-word candidate terms
string decomposing