
Text Feature Selection Method for Hierarchical Classification (面向层次分类的文本特征选择方法). Cited by: 2
Abstract: A text feature selection method for hierarchical classification is proposed. First, the concept of category hierarchical correlation degree is introduced and computed from the category tree and the probability distribution of the training data at different levels of the hierarchy. The importance of each category in the tree is then derived from this correlation degree. Finally, the discriminative ability of each feature with respect to the categories is computed from these results, and the features with the greatest discriminative ability are selected as the feature set for classification. Experiments show that the proposed method outperforms traditional feature selection methods both in the quality of the selected features and on classification metrics including accuracy, F1, and micro-precision.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list; 2011, No. 1, pp. 103-110 (8 pages).
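Funding: Supported by the National Natural Science Foundation of China (No. 60970047), the Natural Science Foundation of Shandong Province (No. Y2008G19), and the Shandong Province Science and Technology Research Program (Nos. 2007GG10001002, 2008GG10001026).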
Keywords: text feature selection; category hierarchical correlation; hierarchical classification; machine learning
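
The abstract outlines a three-step pipeline: weight the categories of the tree, score each feature by how well it separates those categories, and keep the top-scoring features. The abstract does not give the paper's formulas, so the sketch below is only an illustration of that general shape and not the authors' method. Everything in it is an assumption: the function names `ancestors` and `select_features`, the use of a category's share of the training documents as its importance, and the use of the difference in document frequency inside and outside a category's subtree as the discriminative score.

```python
from collections import Counter


def ancestors(cat, parent):
    """Return cat followed by all of its ancestors in the category tree."""
    chain = []
    while cat is not None:
        chain.append(cat)
        cat = parent.get(cat)
    return chain


def select_features(docs, parent, k):
    """Pick k terms by an importance-weighted discrimination score.

    docs   : list of (set_of_terms, leaf_category) training pairs
    parent : dict mapping each category to its parent (None for top level)
    """
    n = len(docs)
    cat_docs = Counter()       # documents falling in each category's subtree
    term_docs = Counter()      # documents containing each term
    term_cat_docs = Counter()  # documents in a subtree that contain the term
    for terms, leaf in docs:
        cats = ancestors(leaf, parent)
        cat_docs.update(cats)
        term_docs.update(terms)
        for t in terms:
            for c in cats:
                term_cat_docs[(t, c)] += 1

    # ASSUMPTION: category importance is the subtree's share of training data.
    # The paper derives importance from its hierarchical correlation degree.
    importance = {c: count / n for c, count in cat_docs.items()}

    scores = {}
    for t, df in term_docs.items():
        score = 0.0
        for c, nc in cat_docs.items():
            if nc == n:
                continue  # a category covering every document separates nothing
            inside = term_cat_docs[(t, c)] / nc
            outside = (df - term_cat_docs[(t, c)]) / (n - nc)
            # ASSUMPTION: discrimination is the gap in document frequency.
            score += importance[c] * abs(inside - outside)
        scores[t] = score
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    toy_parent = {"sports": None, "tech": None,
                  "soccer": "sports", "tennis": "sports", "ai": "tech"}
    toy_docs = [({"match", "goal"}, "soccer"),
                ({"match", "racket"}, "tennis"),
                ({"model", "data"}, "ai"),
                ({"model", "chip"}, "ai")]
    print(select_features(toy_docs, toy_parent, 2))
```

In the toy data, "match" and "model" each separate the two top-level branches cleanly, so they score higher than terms such as "goal" that only mark a single leaf. Swapping in a different importance or discrimination formula only requires changing the two lines marked as assumptions.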
