Abstract
This paper proposes an approach to base noun phrase (BaseNP) identification based on rough sets. The approach divides BaseNP identification into two sequential subtasks, tagging and identification, and treats BaseNP tagging as a decision problem that can be solved with rough set theory. As a result, it is characterized by feature reduction and rule optimization. The paper first briefly introduces the rough-set-based rule learning method and the relevant algorithms, then describes the algorithm flows for BaseNP tagging and identification, and puts forward a solution to the instance conflict problem that improves identification performance. Finally, it gives the detailed experimental steps and results, compares the approach with several representative systems, and, based on an analysis of the results, points out directions for further improvement.
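To make the idea of treating tagging as a decision problem more concrete, the following is a minimal sketch of rough-set feature reduction on a toy decision table. It is not the paper's algorithm: the condition attributes (POS tags of the previous, current, and next word), the B/I/O tag set, the sample rows, and the greedy QuickReduct-style search are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical decision table: three condition attributes (POS of the previous,
# current, and next word) and one decision attribute (an assumed B/I/O tag).
ATTRS = ("prev_pos", "cur_pos", "next_pos")
TABLE = [
    ("VB", "DT", "NN", "B"),
    ("DT", "NN", "IN", "I"),
    ("NN", "IN", "DT", "O"),
    ("IN", "DT", "JJ", "B"),
    ("DT", "JJ", "NN", "I"),
    ("JJ", "NN", ".", "I"),
    ("NN", "VB", "NN", "O"),
    ("VB", "NN", "IN", "B"),
    ("NN", "NN", ".", "I"),
]


def partition(rows, attr_idx):
    """Group row indices into equivalence classes: rows that agree on every
    attribute in attr_idx are indiscernible and land in the same block."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[j] for j in attr_idx)].append(i)
    return list(blocks.values())


def dependency(rows, attr_idx):
    """Dependency degree: the fraction of rows in the positive region, i.e.
    rows whose equivalence class carries exactly one decision value."""
    consistent = 0
    for block in partition(rows, attr_idx):
        if len({rows[i][-1] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)


def greedy_reduct(rows, n_attrs):
    """Add, one at a time, the attribute that raises the dependency degree
    most, until it equals the degree of the full attribute set."""
    target = dependency(rows, tuple(range(n_attrs)))
    chosen = []
    while dependency(rows, tuple(chosen)) < target:
        best = max(
            (a for a in range(n_attrs) if a not in chosen),
            key=lambda a: dependency(rows, tuple(chosen + [a])),
        )
        chosen.append(best)
    return sorted(chosen)


if __name__ == "__main__":
    reduct = greedy_reduct(TABLE, len(ATTRS))
    print([ATTRS[a] for a in reduct])
```

On this toy table no single attribute determines the tag, but the pair `prev_pos` and `cur_pos` does, so `next_pos` can be dropped without losing any discriminating power. A greedy search like this one finds a small sufficient attribute set, not necessarily a minimal one.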
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal (北大核心)
2006, Issue 3, pp. 14-21 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60372038)
Keywords
artificial intelligence
natural language processing
base noun phrase
rough sets
machine learning
rule-based method
algorithm