Abstract
Identifying maximal-length noun phrases is important for machine translation and many other natural language processing tasks. Taking Chinese maximal-length noun phrase identification as the research task, and building on an analysis of existing methods, we start from the linguistic particularities of Chinese and the characteristics of sequence labeling based on support vector machines (SVM) to examine the adaptability of a combination algorithm based on hybrid features. Experiments show that labeling over hybrid units of words and base chunks is effective for identifying Chinese maximal-length noun phrases, and that the forward and backward labeling results are complementary to some degree. On this basis, we propose a bi-directional sequence labeling combination algorithm based on "boundary fork", which exploits the complementarity of the two labeling directions and achieves high combination accuracy.
Source
Acta Automatica Sinica
EI
CSCD
Peking University Core Journals
2015, No. 7, pp. 1274-1282 (9 pages)
Funding
National Basic Research Program of China (973 Program) (2013CB329303)
National Natural Science Foundation of China (61132009, 61202244, 61201352)
Keywords
Maximal-length noun phrase
bi-directional labeling
base chunk
hybrid feature