Abstract
Chunks extracted from running text are useful in many applications, such as machine translation and information retrieval. This paper describes rule-based and statistics-based strategies for chunk analysis and proposes a method that combines the two. To reflect the practical conditions of chunk analysis, the usual metrics for evaluating system performance are adapted, and a mistake recall measure is introduced. In closed and open tests, the combined approach achieves higher chunk-identification precision and recall than the purely rule-based chunker, and the chunking error rate is reduced by about 7%.
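For readers unfamiliar with the metrics named in the abstract, the following minimal sketch shows how chunk-level precision and recall are conventionally computed. It uses the standard exact-match definitions only; the adapted metrics and the mistake recall measure are not defined in this record and are not reproduced here. The `Chunk` tuple layout, the function name `chunk_precision_recall`, and the sample spans are illustrative assumptions, not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical representation: (start token index, end token index (exclusive), label).
Chunk = Tuple[int, int, str]


def chunk_precision_recall(predicted: Set[Chunk], gold: Set[Chunk]) -> Tuple[float, float]:
    """Standard exact-match chunk precision and recall.

    A predicted chunk counts as correct only if its boundaries and label
    both match a gold-standard chunk.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up example data: the chunker wrongly splits the gold chunk (3, 6, "NP") in two.
gold = {(0, 2, "NP"), (2, 3, "VP"), (3, 6, "NP"), (6, 8, "PP")}
predicted = {(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP"), (5, 6, "NP"), (6, 8, "PP")}

precision, recall = chunk_precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.60 recall=0.75
```

In this example three of the five predicted chunks match the gold standard exactly, so precision is 3/5, and three of the four gold chunks are recovered, so recall is 3/4.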
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journal
2002, No. 4, pp. 385-391 (7 pages)
Journal of Computer Research and Development
Funding
Supported by the National "973" Key Basic Research Program of China (G1998030507-4)