Centroid Classifier Based on Empirical Risk for Text Categorization (基于经验风险的中心文本分类算法)


Abstract: The empirical risk minimization inductive principle and the gradient descent method are used to adjust the class centroid vectors of the traditional centroid-based text classification algorithm. This addresses the poor expressive power of the class centroid vectors, which results from the traditional algorithm ignoring the weighting factors of the training texts. The outcome is an improved centroid-based text classification algorithm whose classification performance is essentially on par with that of support vector machines. Experimental results show that the method is an effective means of improving the performance of centroid-based text classification.
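The idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss function or the update rule, so everything specific here is an assumption made for illustration: the softmax cross-entropy loss standing in for the empirical risk, the learning rate, the epoch count, the toy data, and all function names (`init_centroids`, `refine_centroids`, and so on). It is not the authors' formulation, only one plausible way to start from classic centroids and refine them by gradient descent on a training-set loss.

```python
import numpy as np

def init_centroids(X, y, n_classes):
    """Classic centroid classifier: one mean vector per class."""
    return np.vstack([X[y == c].mean(axis=0) for c in range(n_classes)])

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def empirical_risk(C, X, y):
    """Assumed loss: mean cross-entropy of softmax over document-centroid scores."""
    P = softmax(X @ C.T)
    return -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))

def refine_centroids(C, X, y, lr=0.5, epochs=100):
    """Adjust the centroids by plain gradient descent on the assumed risk."""
    C = C.copy()
    Y = np.eye(C.shape[0])[y]              # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ C.T)
        grad = (P - Y).T @ X / len(y)      # gradient of the mean cross-entropy w.r.t. C
        C -= lr * grad
    return C

def predict(C, X):
    """Assign each document to the centroid with the highest dot-product score."""
    return np.argmax(X @ C.T, axis=1)

# Toy term-weight vectors (hypothetical data), L2-normalized so that
# dot products with the centroids behave like cosine-style similarities.
X = np.array([[1.0, 0.0, 0.2],
              [0.9, 0.1, 0.0],
              [0.1, 1.0, 0.3],
              [0.0, 0.8, 0.1]])
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.array([0, 0, 1, 1])

C0 = init_centroids(X, y, n_classes=2)     # traditional centroids
C1 = refine_centroids(C0, X, y)            # centroids after risk minimization
risk_before = empirical_risk(C0, X, y)
risk_after = empirical_risk(C1, X, y)
```

In this sketch each training document contributes to the gradient in proportion to how poorly it is currently classified, which is one way of letting documents carry different weights instead of being averaged uniformly.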

Authors: ZHOU Xiao-tang (周晓堂), OUYANG Ji-hong (欧阳继红), LI Xi-ming (李熙铭)

Affiliation: College of Computer Science and Technology, Jilin University

Source: Journal of Jilin University (Science Edition), 2013, No. 5, pp. 876-880 (5 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (Grant Nos. 61170092, 61133011, 61272208, 61103091, 61202308)

Keywords: text classification; centroid-based text classification algorithms; empirical risk minimization

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]



