
Improved mutual information method of feature selection in text categorization (cited by: 15)
Abstract: This paper proposes an optimized mutual information (MI) method for text feature selection. It addresses the shortcomings of the MI model in three ways: a weight factor distinguishes positively correlated features from negatively correlated ones; a correction factor introduces term-frequency information into MI to suppress low-frequency words; and features are further weighted according to their position in the text. The method improves the feature selection efficiency of the MI model. Text classification experiments confirm that the proposed method is sound and effective.
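For orientation, the baseline that the abstract sets out to improve can be sketched as follows. This is a minimal illustration of the standard document-count estimate of pointwise MI between a term and a class, not the paper's method: the weight factor, the term-frequency correction factor, and the position weighting are not specified in this record and are not reproduced here. The function name `mutual_information` and all counts are hypothetical, chosen only to show why low-frequency words and negatively correlated features are a concern.

```python
import math

def mutual_information(a, b, c, n):
    """Standard pointwise MI estimate: log(a * n / ((a + c) * (a + b))).

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    n: total number of documents
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    if a == 0:
        return float("-inf")  # the term never co-occurs with the class
    return math.log((a * n) / ((a + c) * (a + b)))

# Illustrative counts: 1000 documents, 50 of them in the class.
rare = mutual_information(a=1, b=0, c=49, n=1000)       # term seen in a single document
common = mutual_information(a=40, b=10, c=10, n=1000)   # term seen in 40 of the 50 class documents
negative = mutual_information(a=1, b=199, c=49, n=1000) # term mostly seen outside the class

print(rare, common, negative)
```

With these made-up counts, the term that occurs in one document scores higher than the term that covers most of the class, and the term that mostly occurs outside the class receives a negative score. These are the two behaviours of plain MI that the abstract's frequency correction and positive/negative weighting are aimed at.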
Source: Computer Engineering and Applications (CSCD), 2012, Issue 25, pp. 1-4, 97 (5 pages)
Funding: National Natural Science Foundation of China (No. 71071161)
Keywords: text categorization (TC); feature selection; mutual information (MI); feature reduction

