Abstract
The precision of Chinese organization name recognition has long been one of the factors affecting the performance of natural language processing. Based on the characteristics of Chinese organization names, a recognition method using the maximum entropy model is proposed. On a relatively large-scale dataset, the experimental system compares how different feature selection methods affect the model. It also examines how much lexical (word form), part-of-speech, and grammatical information each contribute to the model. Experimental results show that, across the different feature selection algorithms, the average open-test results differ by only 0.2 to 0.5 percentage points.
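The abstract names the model (maximum entropy) and the feature types (word form, part of speech). It does not specify the tagging scheme, feature templates, training algorithm, or the feature selection methods that were compared. The sketch below fills those gaps with explicit assumptions, none of them taken from the paper:

- tokens are labelled with a hypothetical BIO scheme (`B-ORG`, `I-ORG`, `O`);
- features come from a hypothetical +/-1 window of words and POS tags;
- feature selection is a simple count cutoff, shown as one possible method;
- training uses plain gradient ascent on the regularized conditional log-likelihood instead of GIS/IIS;
- the training sentences are toy data.

```python
import math
from collections import Counter, defaultdict

def token_features(words, pos, i):
    """Hypothetical feature template: word forms and POS tags in a +/-1 window."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    prev_p = pos[i - 1] if i > 0 else "<s>"
    return [f"w={words[i]}", f"p={pos[i]}", f"w-1={prev_w}", f"w+1={next_w}", f"p-1={prev_p}"]

def select_features(samples, min_count=1):
    """Count-cutoff feature selection (an assumed example): keep predicates seen >= min_count times."""
    counts = Counter(f for feats, _ in samples for f in feats)
    return {f for f, c in counts.items() if c >= min_count}

class MaxEnt:
    """Conditional maximum entropy model: p(y|x) is proportional to exp(sum of active weights)."""

    def __init__(self, labels, features):
        self.labels = sorted(labels)
        self.features = features        # selected context predicates
        self.w = defaultdict(float)     # one weight per (predicate, label) pair

    def probs(self, feats):
        active = [f for f in feats if f in self.features]
        scores = {y: sum(self.w.get((f, y), 0.0) for f in active) for y in self.labels}
        m = max(scores.values())        # subtract the max score for numerical stability
        exps = {y: math.exp(s - m) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, samples, epochs=200, lr=0.5, l2=0.01):
        """Gradient ascent on the L2-regularized conditional log-likelihood (not GIS/IIS)."""
        n = len(samples)
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in samples:
                p = self.probs(feats)
                for f in feats:
                    if f not in self.features:
                        continue
                    grad[(f, gold)] += 1.0      # empirical feature count
                    for y in self.labels:
                        grad[(f, y)] -= p[y]    # expected feature count under the model
            for k in set(grad) | set(self.w):
                self.w[k] += lr * (grad[k] / n - l2 * self.w[k])

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)

# Toy, hypothetical data: (words, POS tags, BIO labels for organization names).
corpus = [
    (["北京", "大学", "成立", "于", "1898年"], ["ns", "n", "v", "p", "t"], ["B-ORG", "I-ORG", "O", "O", "O"]),
    (["清华", "大学", "位于", "北京"], ["nz", "n", "v", "ns"], ["B-ORG", "I-ORG", "O", "O"]),
    (["他", "在", "中国", "银行", "工作"], ["r", "p", "ns", "n", "v"], ["O", "O", "B-ORG", "I-ORG", "O"]),
]
samples = [(token_features(ws, ps, i), labs[i]) for ws, ps, labs in corpus for i in range(len(ws))]
model = MaxEnt({lab for _, lab in samples}, select_features(samples, min_count=1))
model.train(samples)

words, pos = ["南京", "大学", "发布", "公告"], ["ns", "n", "v", "n"]
print([model.predict(token_features(words, pos, i)) for i in range(len(words))])
```

Raising `min_count` in `select_features` prunes rare predicates. Retraining with each setting and comparing held-out results is one crude way to study feature selection; it is not necessarily the set of algorithms the paper compared.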
Source
Computer & Digital Engineering (《计算机与数字工程》), 2010, No. 12, pp. 36-40 (5 pages)