Abstract
This paper presents a chunk parsing scheme, a shallow syntactic knowledge representation whose descriptive power lies between a linear word (part-of-speech) sequence and a full parse tree. It discusses the basic concepts of, and automatic identification algorithms for, the scheme's two main parts: word boundary stems and constituent groups. Building on this scheme, the paper proposes a staged strategy for constructing a Chinese treebank, in which a chunk bank is built first and the treebank second, and reports a series of syntactic parsing and knowledge acquisition experiments, including 1) automatic identification of Chinese maximal noun phrases and 2) automatic acquisition of Chinese syntactic knowledge in the form of a probabilistic context-free grammar. These results demonstrate the usefulness and effectiveness of the scheme for natural language processing research and development.
Source
《计算机学报》
EI
CSCD
PKU Core Journals (北大核心)
1999, No. 11, pp. 1158-1165 (8 pages)
Chinese Journal of Computers
Funding
National Natural Science Foundation of China
China Postdoctoral Science Foundation
Keywords
Syntactic parsing
Natural language processing
Chinese sentences
Chunk parsing scheme
Word boundary stem, constituent group, partial parsing, syntactic parsing.