Abstract
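Material classification is a basic task in enterprise material management. In large enterprises the number of materials is huge and the categories are numerous, so automatic computer-based classification techniques are needed to improve classification efficiency. In automatic classification, material name similarity is one of the key factors affecting classification quality. Based on an analysis of the characteristics of material name strings and of the Jaro-Winkler algorithm, a dynamic-weight-based method for computing Chinese string similarity is proposed. Experiments on a real material classification dataset verify that this similarity computation method can effectively improve the accuracy of material classification.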
Material classification plays a fundamental role in enterprise material management, but the huge number of materials and categories makes it impractical to accomplish the task by manual editing. It is therefore important to integrate automatic classification methods into enterprise material classification. In automatic material classification, the material name similarity metric is essential; however, traditional string similarity metrics are not well suited to Chinese material names. In this paper, after evaluating the Jaro-Winkler algorithm, a novel material-classification-oriented Chinese string similarity metric is proposed that dynamically estimates the weights of the suffixes in Chinese material names. Finally, experimental results on a real dataset of Chinese materials are reported; they show that the dynamic-weighting-based string similarity metric outperforms the traditional metrics.
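For orientation, the standard Jaro-Winkler similarity that the abstract takes as its starting point can be sketched as follows. This is a minimal textbook version, not the dynamic-weighting metric proposed in the paper: the function names, the prefix scale of 0.1, and the prefix cap of 4 characters are the conventional defaults and are assumptions here, since the abstract does not state them. Note that the standard algorithm rewards a shared prefix, whereas the abstract describes weighting suffixes.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]; works on any Unicode string."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order in the two strings.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2  # number of transpositions

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (capped at max_prefix)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)


if __name__ == "__main__":
    # Classic English test pair, followed by two illustrative (made-up) material names.
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    print(jaro_winkler("不锈钢螺栓", "碳钢螺栓"))
```

The second call compares two hypothetical material names that share a suffix but not a prefix, so the prefix bonus contributes nothing to their score; this is the kind of case that motivates a suffix-oriented weighting for Chinese material names.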
Source
《情报学报》
CSSCI
Peking University Core Journal
2012, No. 7, pp. 709-714 (6 pages)
Journal of the China Society for Scientific and Technical Information
Keywords
string similarity
automatic classification
material classification