Abstract
Relation words are the connective elements of multiple compound sentences: they link clauses and mark the semantic relations between them, so they are central to the study of such sentences. In rule-based automatic identification of relation words in Modern Chinese compound sentences, however, we find that many of the relation markers initially identified within a multiple compound sentence are pseudo relation words. Each marker must therefore be judged as a true relation word or not, and that judgment rests on determining the collocations among the relation markers, which is the difficult step. To address this, the paper proposes two algorithms: (1) building a solution space tree to obtain all candidate collocation sets of the relation markers; (2) pruning the solution space tree to discard useless collocation sets. Experiments show that the two algorithms are general-purpose and reach a judgment accuracy of 98.9%, with approximate solutions still obtained for the remaining 1.1%, which indicates that the approach is feasible for processing multiple compound sentences.
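The two steps named in the abstract (enumerating collocation sets with a solution space tree, then pruning) can be illustrated with a minimal sketch. This is not the paper's algorithm, whose details the abstract does not give. It assumes that each marker is either left unpaired or paired with one later marker, and that pruning means rejecting pairs missing from a collocation lexicon. The function name `collocation_sets`, the `lexicon` parameter, and the example markers are all hypothetical.

```python
from typing import Iterator, List, Optional, Set, Tuple

Pair = Tuple[int, int]  # positions of two markers treated as one collocation


def collocation_sets(
    markers: List[str],
    lexicon: Optional[Set[Tuple[str, str]]] = None,
) -> Iterator[List[Pair]]:
    """Depth-first walk of a solution space tree over marker pairings.

    Level i of the tree decides marker i: leave it unpaired, or pair it
    with a later marker that is still free. If `lexicon` is given
    (an assumed pruning rule), any branch whose pair is not in the
    lexicon is cut before it is explored.
    """
    n = len(markers)
    used = [False] * n      # True once a marker is the second half of a pair
    chosen: List[Pair] = []  # pairs on the current root-to-node path

    def walk(i: int) -> Iterator[List[Pair]]:
        if i == n:
            yield list(chosen)  # a leaf: one complete collocation set
            return
        if used[i]:
            yield from walk(i + 1)  # already claimed by an earlier marker
            return
        yield from walk(i + 1)      # branch: marker i stays unpaired
        for j in range(i + 1, n):   # branches: marker i pairs with marker j
            if used[j]:
                continue
            if lexicon is not None and (markers[i], markers[j]) not in lexicon:
                continue            # pruned: not a known collocation
            used[j] = True
            chosen.append((i, j))
            yield from walk(i + 1)
            chosen.pop()
            used[j] = False

    yield from walk(0)


# Hypothetical example: "because", "if", "then", "therefore".
markers = ["因为", "如果", "就", "所以"]
lexicon = {("因为", "所以"), ("如果", "就")}
all_sets = list(collocation_sets(markers))          # unpruned tree
kept_sets = list(collocation_sets(markers, lexicon))  # pruned tree
```

With four markers the unpruned tree has ten leaves, while the pruned tree keeps four, one of which pairs 因为 with 所以 and 如果 with 就. A real system would still need a further criterion to choose among the surviving sets.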
Source
《计算机工程与科学》
CSCD
PKU Core (Peking University Core Journals)
2011, No. 11, pp. 177-182 (6 pages)
Computer Engineering & Science
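Funding
Key Project of the Key Research Bases of Humanities and Social Sciences, Ministry of Education of China (10JJD740012)
National Social Science Fund of China, 2011 (11BYY052)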
Keywords
multiple compound sentences
collocations between relation words
solution space tree