Abstract
This paper presents a statistical method for identifying Chinese prepositional phrases, based on a maximum entropy model. The method works in two stages: automatic identification of prepositional phrase boundaries with the maximum entropy model, followed by error correction based on dependency grammar. In the first stage, the maximum entropy model identifies the prepositional phrases. In the second stage, dependency grammar knowledge about the words at the left and right boundaries of prepositional phrases, drawn from a dependency treebank, is used to correct misidentified right boundaries. The method thus delimits the prepositional phrases in sentences that have already been word-segmented and part-of-speech tagged, laying a good foundation for subsequent syntactic analysis. Experimental results indicate that the method is feasible and effective.
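The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature templates, the toy training sentences, the POS tags (p, n, f, v, u, r), and the `ATTESTED` table of (preposition, right-boundary POS) pairs are all assumptions made for illustration. The maximum entropy model is reduced to its two-class form (logistic regression) and fitted by plain stochastic gradient ascent, and the dependency knowledge is reduced to a lookup table standing in for statistics that would be drawn from a dependency treebank.

```python
import math
from collections import defaultdict

def features(sent, prep_idx, i):
    """Hypothetical feature templates for 'token i closes the PP opened at prep_idx'."""
    pos = sent[i][1]
    next_pos = sent[i + 1][1] if i + 1 < len(sent) else "<END>"
    return [f"prep={sent[prep_idx][0]}", f"pos={pos}",
            f"next_pos={next_pos}", f"pos|next={pos}|{next_pos}"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=200, lr=0.5):
    """Stage 1 model: two-class maximum entropy (logistic regression) fitted by SGD."""
    w = defaultdict(float)
    for _ in range(epochs):
        for sent, prep_idx, gold in data:
            for i in range(prep_idx + 1, len(sent)):
                feats = features(sent, prep_idx, i)
                p = sigmoid(sum(w[f] for f in feats))
                label = 1.0 if i == gold else 0.0
                for f in feats:
                    w[f] += lr * (label - p)  # gradient of the log-likelihood
    return w

def score_candidates(w, sent, prep_idx):
    """Probability that each token after the preposition is the right boundary."""
    return {i: sigmoid(sum(w.get(f, 0.0) for f in features(sent, prep_idx, i)))
            for i in range(prep_idx + 1, len(sent))}

def correct(sent, prep_idx, scores, attested):
    """Stage 2 (assumed form): prefer the best-scoring candidate whose
    (preposition, boundary POS) pair is attested; otherwise keep stage 1's choice."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    for i in ranked:
        if (sent[prep_idx][0], sent[i][1]) in attested:
            return i
    return ranked[0]

# Toy data: (segmented and POS-tagged sentence, preposition index, gold right boundary).
TRAIN = [
    ([("他", "r"), ("在", "p"), ("桌子", "n"), ("上", "f"),
      ("放", "v"), ("了", "u"), ("书", "n")], 1, 3),
    ([("我", "r"), ("从", "p"), ("学校", "n"), ("回", "v"), ("家", "n")], 1, 2),
    ([("她", "r"), ("在", "p"), ("房间", "n"), ("里", "f"),
      ("看", "v"), ("书", "n")], 1, 3),
]
# Hypothetical stand-in for knowledge extracted from a dependency treebank.
ATTESTED = {("在", "f"), ("从", "n")}

model = train(TRAIN)
test_sent = [("他", "r"), ("从", "p"), ("图书馆", "n"), ("借", "v"), ("书", "n")]
scores = score_candidates(model, test_sent, 1)
boundary = correct(test_sent, 1, scores, ATTESTED)
print([word for word, _ in test_sent[1:boundary + 1]])
```

Keeping the correction step separate from the classifier mirrors the two-stage design in the abstract: the statistical model proposes a right boundary, and the treebank-derived knowledge can only veto or replace that proposal.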
Source
Communications Technology (《通信技术》)
2010, No. 5, pp. 181-183, 186 (4 pages)
Funding
Key Science and Technology Project funded by the Ministry of Education (No. 03081)
Keywords
Chinese prepositional phrase
phrase identification
maximum entropy
dependency grammar