Abstract
Research on acquiring effect phrases from Chinese patents has mostly focused on single effect phrases, while compound effect phrases remain difficult to extract. To address this, a compound effect phrase extraction process based on rules and conditional random fields (CRF) is proposed. Compound effect phrases that are split apart during word segmentation are analyzed with part-of-speech (POS) tagging and syntactic rules. The split fragments are then merged according to these rules to form a candidate compound effect phrase data set. Different feature combinations are introduced into the CRF model, and the optimal combination is determined experimentally to build the model. Finally, the model filters the candidate data set to extract the compound effect phrases in patents. Experimental results show that the method significantly improves the accuracy and F-measure of compound effect phrase extraction.
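The two stages described above, rule-based merging of split fragments into candidates and per-token features for a CRF, can be illustrated with a minimal Python sketch. Everything specific here is an assumption: the ICTCLAS/jieba-style POS tags (`v`, `n`, `a`, `d`, `w`, `r`), the three pairs in the hypothetical `MERGE_RULES`, and the feature names in the hypothetical `token_features` are illustrative and are not the rules or feature templates used by the authors. The feature dicts follow the list-of-dicts-per-sentence input that CRFsuite-style toolkits (for example sklearn-crfsuite) accept. Training the CRF and filtering candidates with it are not shown.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag) from a segmenter/POS tagger

# Hypothetical merge rules (NOT the paper's rule set): adjacent POS-tag pairs
# whose tokens are re-joined into one candidate compound effect phrase.
MERGE_RULES = {
    ("v", "n"),  # verb + noun, e.g. 提高 + 效率
    ("a", "n"),  # adjective + noun, e.g. 高 + 精度
    ("d", "a"),  # adverb + adjective, e.g. 更 + 稳定
}


def merge_candidates(tokens: List[Token]) -> List[str]:
    """Greedily join adjacent token pairs whose POS pair matches MERGE_RULES."""
    candidates: List[str] = []
    i = 0
    while i < len(tokens) - 1:
        (w1, t1), (w2, t2) = tokens[i], tokens[i + 1]
        if (t1, t2) in MERGE_RULES:
            candidates.append(w1 + w2)
            i += 2  # both tokens are consumed by this candidate
        else:
            i += 1
    return candidates


def token_features(tokens: List[Token], i: int) -> Dict[str, object]:
    """Per-token CRF features: word, POS, neighbours, and one combined feature."""
    word, tag = tokens[i]
    feats: Dict[str, object] = {"bias": 1.0, "word": word, "pos": tag}
    if i > 0:
        prev_word, prev_tag = tokens[i - 1]
        feats["-1:word"] = prev_word
        feats["-1:pos"] = prev_tag
        feats["-1:pos|pos"] = prev_tag + "|" + tag  # combined POS-bigram feature
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        next_word, next_tag = tokens[i + 1]
        feats["+1:word"] = next_word
        feats["+1:pos"] = next_tag
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[Token]) -> List[Dict[str, object]]:
    """Feature dicts for every token of one sentence (one CRF input sequence)."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sent = [("该", "r"), ("装置", "n"), ("有效", "a"), ("提高", "v"),
            ("效率", "n"), ("，", "w"), ("降低", "v"), ("成本", "n")]
    print(merge_candidates(sent))  # ['提高效率', '降低成本']
    print(sentence_features(sent)[3]["-1:pos|pos"])  # a|v
```

The greedy pairwise merge is the simplest possible rule engine; phrases spanning three or more fragments would need repeated passes or longer patterns. Trying different feature combinations amounts to adding or removing keys such as `-1:pos|pos` before training and comparing the resulting scores.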
Authors
MA Jian-hong (马建红); YANG Cheng (杨成); YAO Shuang (姚爽)
School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University core journal (北大核心)
2019, Issue 2, pp. 449-454 (6 pages)