Abstract
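This paper presents a Chinese chunking method that combines an error-driven learning mechanism with SVM. Building on SVM chunking, the method analyzes the part-of-speech and chunk-tag information of the word sequences that the SVM labels incorrectly, and derives a set of candidate correction rules. The candidates are then filtered against threshold conditions to yield the final correction rule set, which is applied to correct the SVM chunking output. Experimental results show that, compared with chunking using the SVM model alone, adding error-driven learning improves precision, recall, and F-score.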
Chunk parsing of Chinese text reduces the difficulty of full syntactic parsing. This paper proposes a chunking approach that combines support vector machines (SVM) with error-driven learning. First, an SVM model is used to chunk the training data. Then, through error-driven learning, correction rules are acquired automatically from the SVM chunking results. After filtering, the rules are used to revise the SVM chunking output. Experimental results show that the approach is effective for Chinese chunk parsing: compared with purely SVM-based chunking, precision, recall, and F-score all improve.
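The three steps described in the abstract (collect candidate rules from SVM errors, filter them by thresholds, apply the survivors to the SVM output) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the rule template (previous POS, current POS, predicted tag), the threshold names `min_count` and `min_precision`, and the function names are all hypothetical, since the abstract does not specify them.

```python
from collections import Counter

def context(pos_tags, pred_tags, i):
    # Hypothetical rule template: previous POS, current POS, SVM-predicted tag.
    prev_pos = pos_tags[i - 1] if i > 0 else "<s>"
    return (prev_pos, pos_tags[i], pred_tags[i])

def learn_rules(sentences, min_count=2, min_precision=0.8):
    """sentences: list of sentences, each a list of (pos, predicted_tag, gold_tag).

    Returns a dict mapping a context to the corrected chunk tag.
    """
    seen = Counter()   # how often each context occurs at all
    fixes = Counter()  # how often a context occurs with a wrong prediction, per gold tag
    for sent in sentences:
        pos_tags = [p for p, _, _ in sent]
        pred_tags = [t for _, t, _ in sent]
        for i, (_, pred, gold) in enumerate(sent):
            ctx = context(pos_tags, pred_tags, i)
            seen[ctx] += 1
            if pred != gold:
                fixes[(ctx, gold)] += 1  # candidate correction rule

    rules = {}
    # Filter candidates by the (assumed) thresholds; most frequent fix wins per context.
    for (ctx, gold), n in fixes.most_common():
        if n >= min_count and n / seen[ctx] >= min_precision:
            rules.setdefault(ctx, gold)
    return rules

def apply_rules(tagged, rules):
    """tagged: list of (pos, predicted_tag) for one sentence; returns corrected tags."""
    pos_tags = [p for p, _ in tagged]
    pred_tags = [t for _, t in tagged]
    return [rules.get(context(pos_tags, pred_tags, i), pred_tags[i])
            for i in range(len(tagged))]
```

Filtering on both a minimum count and a minimum precision keeps a rule only when the SVM makes the same mistake repeatedly in that context and is rarely right there, so applying the rule is unlikely to damage tags that were already correct.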
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2006, No. 6, pp. 17-24 (8 pages)
Journal of Chinese Information Processing
Funding
Supported by the National Natural Science Foundation of China (60373095, 60373096)