Abstract
[Objective] To improve the precision and recall of Chinese term extraction in the new energy vehicle domain, this paper proposes a term extraction method tailored to that domain. [Methods] Building on previous work, the method uses a conditional random field (CRF) model as the extraction model. Its feature templates include the word, word length, part of speech, dependency relation, dictionary position, and stop-word status. [Results] Experiments show a precision of 93.12% and a recall of 90.47%. Precision is 7.73% higher than that of the baseline method. [Limitations] The method improves precision only for shorter terms. [Conclusions] Using dependency relations as a CRF feature can improve the precision and recall of Chinese term extraction in the new energy vehicle domain.
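The feature templates listed under [Methods] can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the source. Sentences are assumed to be already word-segmented, POS-tagged and dependency-parsed, and each token is a (word, POS, dependency label) tuple with hand-assigned, LTP-style labels. The helper names (`dictionary_positions`, `token_features`, `sentence_features`) are hypothetical. The sketch also assumes a ±1 context window and a B/I/E/S/O encoding of "dictionary position" based on greedy longest match against a term lexicon. The paper's actual templates, window sizes and label scheme are not given here.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical token representation: (word, POS tag, dependency relation label).
Token = Tuple[str, str, str]


def dictionary_positions(words: Sequence[str], lexicon: Set[str],
                         max_words: int = 5) -> List[str]:
    """Label each word B/I/E/S/O by greedy longest match against a term lexicon."""
    labels = ["O"] * len(words)
    i = 0
    while i < len(words):
        match_end = None
        # Try the longest span first, shrinking until a lexicon entry matches.
        for j in range(min(len(words), i + max_words), i, -1):
            if "".join(words[i:j]) in lexicon:
                match_end = j
                break
        if match_end is None:
            i += 1
            continue
        if match_end - i == 1:
            labels[i] = "S"  # single-word dictionary term
        else:
            labels[i] = "B"
            for k in range(i + 1, match_end - 1):
                labels[k] = "I"
            labels[match_end - 1] = "E"
        i = match_end
    return labels


def token_features(sent: Sequence[Token], i: int, dict_pos: Sequence[str],
                   stopwords: Set[str]) -> Dict[str, str]:
    """Build the feature dict for token i: current-token features plus a +/-1 window."""
    word, pos, dep = sent[i]
    feats = {
        "w[0]": word,                        # word
        "len[0]": str(len(word)),            # word length in characters
        "pos[0]": pos,                       # part of speech
        "dep[0]": dep,                       # dependency relation label
        "dict[0]": dict_pos[i],              # dictionary position
        "stop[0]": str(word in stopwords),   # stop-word flag
    }
    for off in (-1, 1):
        j = i + off
        key = f"{off:+d}"  # "-1" or "+1"
        if 0 <= j < len(sent):
            feats[f"w[{key}]"] = sent[j][0]
            feats[f"pos[{key}]"] = sent[j][1]
        else:
            feats[f"w[{key}]"] = "<PAD>"
            feats[f"pos[{key}]"] = "<PAD>"
    return feats


def sentence_features(sent: Sequence[Token], lexicon: Set[str],
                      stopwords: Set[str]) -> List[Dict[str, str]]:
    """Return one feature dict per token of the sentence."""
    dict_pos = dictionary_positions([w for w, _, _ in sent], lexicon)
    return [token_features(sent, i, dict_pos, stopwords) for i in range(len(sent))]


if __name__ == "__main__":
    # Illustrative sentence: 纯电动汽车的动力电池衰减 (hand-assigned tags).
    sent = [("纯电动", "b", "ATT"), ("汽车", "n", "ATT"), ("的", "u", "RAD"),
            ("动力", "n", "ATT"), ("电池", "n", "SBV"), ("衰减", "v", "HED")]
    for f in sentence_features(sent, {"纯电动汽车", "动力电池"}, {"的"}):
        print(f)
```

A list of such per-sentence feature-dict lists, paired with per-token term labels, is the input shape that CRF toolkits such as sklearn-crfsuite accept in `CRF.fit(X, y)`. Which toolkit and tag set the authors used is not stated here.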
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
2015, Issue 10, pp. 88-94 (7 pages)
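Funding
National Natural Science Foundation of China project "Ontology-Based Automatic Patent Indexing" (Project No. 61271304)
Key Project of the Beijing Municipal Education Commission Science and Technology Development Program and Class B Key Project of the Beijing Natural Science Foundation, "Domain-Oriented Precise Search Methods for Multimodal Internet Information" (Project No. KZ201311232037)
Beijing Academy of Science and Technology Science and Technology Innovation Program project "Methods for Comprehensively Assessing the Impact of Transportation on the Atmospheric Environment Based on the CGE-TIMES Model" (Project No. PXM2015_178215_000008)
This paper is one of the research outcomes of the above projects.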
Keywords
Term extraction
New energy vehicles
Conditional random fields
Dependency syntactic relations