Abstract
The automatic identification of abbreviations is of great significance: it helps improve the accuracy of automatic word segmentation and tagging, and it supports the timely and rapid compilation of abbreviation dictionaries. The main research topics are the automatic extraction of abbreviations, their automatic restoration (recovering the full forms), classification systems oriented toward Chinese information processing, and the construction of abbreviation knowledge bases. Methodologically, the work relies on corpora and on the mechanisms of abbreviation, and deliberately combines rule-based and statistical methods. Research on automatic identification has made considerable progress: the research goals are clear; a certain amount of experimentation and engineering has been carried out, with precision and recall both reaching a reasonable level; and high-quality abbreviation knowledge bases have been built. Problems remain, however: most studies are still preliminary, the experiments are small in scale, precision and recall are not yet high enough, and the results are still some distance from practical use.
Source
Journal of Weinan Normal University (《渭南师范学院学报》)
2008, No. 6, pp. 39-43 (5 pages)
Funding
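Anhui Provincial Research Funding Program for Young University Teachers, 2007: project "Research on the Automatic Identification of Modern Chinese Abbreviations" (2007jqw104)
Chaohu University (巢湖学院) Research Start-up Fund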
Keywords
abbreviation
unknown words
Chinese information processing
automatic identification