Abstract
This paper proposes a hybrid model for recognizing Chinese simple noun phrases, consisting of a cascaded combination classifier and a coordinate structure identification algorithm. First, we analyze the similarities and differences between simple noun phrases and other types of noun phrases to define the recognition task more precisely. Second, we use word, part-of-speech, and word-sense (semantic) information to build a cascaded combination classifier that recognizes simple noun phrases. Finally, we construct a coordinate structure identification algorithm from part-of-speech combination templates and word-vector-based semantic similarity, which improves recognition precision while keeping the internal structure of the phrases clear. The model reaches an F-score of 91.19% on simple noun phrase recognition, 0.85 percentage points above the best previously reported result, which supports the effectiveness of the method. The left and right boundaries of internal coordinate structures are identified with precision of 80.93% and 82.11%, respectively, which goes some way toward addressing the difficulty of recognizing coordinate structures of multiple nouns.
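As an illustration only, the following Python sketch shows one way a part-of-speech template could be combined with word-vector similarity to flag candidate coordinate structures. The template (noun, conjunction, noun), the tag names `n` and `c`, the toy vectors, the function names, and the 0.8 threshold are all assumptions made for this example; the abstract does not specify the paper's actual templates, embeddings, or thresholds.

```python
import math

# Hypothetical toy word vectors; a real system would load trained embeddings.
TOY_VECTORS = {
    "苹果": [0.9, 0.1, 0.0],  # "apple"
    "香蕉": [0.8, 0.2, 0.1],  # "banana"
    "公司": [0.1, 0.9, 0.3],  # "company"
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_coordinations(tokens, vectors, threshold=0.8):
    """Return (left_index, right_index) spans matching the assumed template.

    tokens is a list of (word, pos) pairs. A span is reported when a
    conjunction (tag "c") sits between two nouns (tag "n") whose vectors
    have cosine similarity at or above the threshold.
    """
    spans = []
    for i in range(1, len(tokens) - 1):
        if tokens[i][1] != "c":
            continue
        left_word, left_pos = tokens[i - 1]
        right_word, right_pos = tokens[i + 1]
        if left_pos != "n" or right_pos != "n":
            continue
        if left_word not in vectors or right_word not in vectors:
            continue  # no embedding available, so similarity cannot be checked
        if cosine(vectors[left_word], vectors[right_word]) >= threshold:
            spans.append((i - 1, i + 1))
    return spans


similar = [("苹果", "n"), ("和", "c"), ("香蕉", "n")]
dissimilar = [("苹果", "n"), ("和", "c"), ("公司", "n")]
print(find_coordinations(similar, TOY_VECTORS))
print(find_coordinations(dissimilar, TOY_VECTORS))
```

In this sketch the template proposes candidates and the similarity check filters them, so a pair of nouns joined by a conjunction is only treated as coordinated when the two conjuncts are semantically close under the chosen vectors.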
Source
《小型微型计算机系统》
CSCD
PKU Core Journal
2017, Issue 4, pp. 749-754 (6 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 61672127, 61173100, 61672126, 61272375)
Keywords
simple noun phrase
conditional random field
support vector machine
coordinate structure
word vector