Abstract
The class centroid vectors of the traditional centroid-based text classification algorithm are adjusted using the empirical risk minimization inductive principle and gradient descent. This addresses the weak representational power of those centroid vectors, which arises because the traditional algorithm ignores the weights of the training texts. The resulting improved centroid-based algorithm achieves classification performance essentially on par with that of support vector machines. Experimental results show that the approach is an effective way to improve the classification performance of centroid-based text classification.
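The idea of adjusting class centroids by gradient descent on an empirical risk can be illustrated with a minimal sketch. The abstract does not give the paper's loss function, similarity measure, or update rule, so everything below is an assumption made for illustration: a dot-product similarity, a hinge-style margin loss as the surrogate for empirical risk, and the hypothetical names `init_centroids`, `adjust_centroids`, `lr`, and `margin`. It is not the authors' algorithm.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def init_centroids(docs, labels):
    """Traditional centroid: unweighted mean of each class's document vectors."""
    sums, counts = {}, {}
    for x, y in zip(docs, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid has the largest dot product with x."""
    return max(sorted(centroids), key=lambda c: dot(centroids[c], x))

def adjust_centroids(centroids, docs, labels, lr=0.1, margin=0.1, epochs=50):
    """Stochastic gradient descent on an assumed hinge-style loss:
    loss = max(0, margin - (sim(true class) - sim(best rival class))).
    Requires at least two classes."""
    cents = {c: list(v) for c, v in centroids.items()}
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            rival = max((c for c in sorted(cents) if c != y),
                        key=lambda c: dot(cents[c], x))
            loss = margin - (dot(cents[y], x) - dot(cents[rival], x))
            if loss > 0:
                # Gradient step: pull the true centroid toward x,
                # push the rival centroid away from x.
                cents[y] = [w + lr * xi for w, xi in zip(cents[y], x)]
                cents[rival] = [w - lr * xi for w, xi in zip(cents[rival], x)]
    return cents

# Toy data (made up for illustration): the third "a" document lies near class "b".
docs = [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
labels = ["a", "a", "a", "b"]
base = init_centroids(docs, labels)
tuned = adjust_centroids(base, docs, labels)
print([predict(base, x) for x in docs])
print([predict(tuned, x) for x in docs])
```

On this toy data the unweighted mean centroids misassign the borderline training document, while the adjusted centroids separate all four training documents. That mirrors, in miniature, the motivation stated in the abstract.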
Source
Journal of Jilin University (Science Edition) (《吉林大学学报(理学版)》)
Indexed in: CAS, CSCD, PKU Core (Peking University Core Journals)
2013, No. 5, pp. 876-880 (5 pages)
Funding
National Natural Science Foundation of China (Grant Nos. 61170092, 61133011, 61272208, 61103091, 61202308)
Keywords
text classification
centroid-based text classification algorithms
empirical risk minimization