Abstract
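This paper designs and builds a text classification system based on the vector space model and naive Bayes. The system introduces subcategory correction and multi-category judgment algorithms to perform hierarchical multi-label classification. Classification performance under the vector space model is compared with that under naive Bayes. Experiments on a test set of about 30,000 documents (15 parent categories and 244 subcategories in total) show that the vector space model scores 25.2 percentage points higher on parent-category classification and 26.3 percentage points higher on hierarchical subcategory classification.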
This paper implements a text categorization system based on the Vector Space Model (VSM) and Naive Bayes (NB). When estimating the category, the authors improve parent-category accuracy by correcting it with the subcategory result, and they judge whether a document belongs to multiple categories and carries multiple labels from the difference between the classifier's final similarity values. The experiment shows that VSM outperforms NB: MicroF1 is 25.2 percentage points higher for parent categories and 26.3 percentage points higher for subcategories.
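The multi-label judgment mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each category is represented by a term-weight centroid, that similarity is cosine similarity, and that a category counts as an additional label when its score lies within a hypothetical threshold `gap` of the best score. The names `cosine`, `classify`, `centroids`, and `gap` are all illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc_tokens, centroids, gap=0.05):
    """Return every label whose score is within `gap` of the best score.

    `gap` is an assumed threshold, not a value taken from the paper.
    """
    doc = Counter(doc_tokens)  # raw term frequencies as the document vector
    scores = sorted(
        ((cosine(doc, centroid), label) for label, centroid in centroids.items()),
        reverse=True,
    )
    if not scores:
        return []
    best = scores[0][0]
    # A small difference between final scores is read as multi-label membership.
    return [label for score, label in scores if best - score <= gap]


# Hypothetical category centroids for illustration only.
centroids = {
    "sports": {"ball": 1.0, "team": 1.0},
    "finance": {"stock": 1.0, "market": 1.0},
}
single = classify(["ball", "team", "team"], centroids)
mixed = classify(["ball", "team", "stock", "market"], centroids)
```

Here `single` receives one label because the two scores are far apart, while `mixed` receives both labels because the scores tie. A real system would likely use weighted terms (for example TF-IDF) and a threshold tuned on held-out data.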
Source
《现代图书情报技术》
CSSCI
PKU Core Journal (北大核心)
2006, Issue 4, pp. 53-55 (3 pages)
New Technology of Library and Information Service
Funding
One of the research outcomes of the Ministry of Education "National Language Resources Monitoring" project (project no. L2004-01-01-04)