
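Comprehensive Feature Selection Based on Category Discrimination Degree and Correlation Analysis (cited by: 1)

Published English title: Syntaxic Feature Selection Based on Category Discrimination Degree and Correlation Analysis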
Abstract: This paper proposes a comprehensive feature selection algorithm based on category discrimination degree and correlation analysis. The algorithm first uses category discrimination degree to extract feature words with strong category-discriminating ability, which reduces the sparsity of the feature space. It then applies correlation analysis to measure the relevance between features and categories and the redundancy among features, selecting feature words that are representative of their categories and not redundant with one another. Experimental results show that the algorithm effectively improves classifier performance.
Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2012, No. 9, pp. 186-188, 192 (4 pages in total)
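Funding: National Natural Science Foundation of China (60873196); Natural Science Foundation of Gansu Province (1010RJZA022); Northwest Normal University 2010 Third-Phase Knowledge and Innovation Project Fund for Key Researchers (nwnu-kjcxgc-03-67)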
Keywords: text categorization; feature selection; correlation analysis; category discrimination degree; relevant independence degree
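
The two-stage idea described in the abstract (filter by discrimination, then balance relevance against redundancy) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas for category discrimination degree or relevant independence degree, so this sketch substitutes stand-ins that are assumptions, not the authors' definitions: a gap between per-category document-frequency rates for stage one, and a greedy mutual-information criterion (relevance minus summed redundancy) for stage two. The function names, the `min_disc` and `beta` parameters, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def discrimination(term, docs, labels):
    """Stand-in score (assumption): gap between the highest and lowest
    per-category document-frequency rate of `term`."""
    totals = Counter(labels)
    hits = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    rates = [hits[c] / totals[c] for c in totals]
    return max(rates) - min(rates)


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def select_features(docs, labels, k, min_disc=0.5, beta=1.0):
    """Pick up to k terms: filter by discrimination, then greedily maximise
    relevance to the labels minus beta * redundancy with chosen terms."""
    vocab = sorted(set().union(*docs))
    # Stage 1: keep only terms whose frequency differs strongly across categories.
    candidates = [t for t in vocab if discrimination(t, docs, labels) >= min_disc]
    presence = {t: [int(t in d) for d in docs] for t in candidates}
    relevance = {t: mutual_information(presence[t], labels) for t in candidates}

    # Stage 2: greedy relevance-versus-redundancy selection.
    selected = []
    while candidates and len(selected) < k:
        def score(t):
            redundancy = sum(
                mutual_information(presence[t], presence[s]) for s in selected
            )
            return relevance[t] - beta * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy corpus (hypothetical): each document is a set of tokens.
docs = [
    {"goal", "match", "the"},
    {"goal", "match", "the"},
    {"stock", "the"},
    {"stock", "the"},
    {"chip", "the"},
    {"chip", "the"},
]
labels = ["sports", "sports", "finance", "finance", "tech", "tech"]

selected = select_features(docs, labels, k=3)
print(selected)
```

In this toy corpus "the" appears in every document, so stage one discards it, while "goal" and "match" always co-occur, so stage two keeps only one of them and prefers "stock" and "chip" for the remaining slots. Which of the two co-occurring terms survives depends on tie-breaking, which the sketch leaves unspecified.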