Feature Selection by Applying Feature Resolution and Correlation Matrix of Equivalence Classes

Cited by: 1
Abstract: Feature selection is one of the key steps in text categorization, and the quality of the selected feature subset directly affects the categorization results. First, word frequency and document frequency are analyzed, and an improved document frequency is derived from them. Feature resolution is then defined on the basis of the improved document frequency. Next, rough sets are introduced into feature selection, and a new attribute reduction algorithm based on the correlation matrix of equivalence classes is given. Finally, feature resolution and this attribute reduction algorithm are combined into a new feature selection method: it first uses feature resolution to select text features and filter out some terms, which reduces the sparsity of the text feature space, and then applies the attribute reduction algorithm to eliminate redundant features. Simulation results show that the proposed method has certain advantages in precision, recall, time performance, and average classification accuracy.
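The two-stage structure described in the abstract (a frequency-based pre-selection followed by rough-set attribute reduction) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract gives no formula for feature resolution and does not define the correlation matrix of equivalence classes, so stage 1 below substitutes a plain document-frequency filter and stage 2 substitutes a standard greedy reduct based on the rough-set dependency degree. All function names, parameters, and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def document_frequency_filter(docs, min_df=2, max_df_ratio=0.9):
    """Stage 1 stand-in (assumption): keep terms whose document frequency
    lies in [min_df, max_df_ratio * number_of_documents]."""
    df = Counter(term for doc in docs for term in set(doc))
    upper = max_df_ratio * len(docs)
    return sorted(t for t, n in df.items() if min_df <= n <= upper)


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: fraction of documents that fall in an
    equivalence class (identical values on attrs) with a single label."""
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[tuple(row[a] for a in attrs)].append(label)
    consistent = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, labels, attrs):
    """Stage 2 stand-in (assumption): drop every attribute whose removal
    leaves the dependency degree of the full attribute set unchanged."""
    target = dependency(rows, labels, attrs)
    kept = list(attrs)
    for a in list(attrs):
        trial = [b for b in kept if b != a]
        if dependency(rows, labels, trial) == target:
            kept = trial
    return kept


def select_features(docs, labels, min_df=2, max_df_ratio=0.9):
    """Run the frequency filter, then remove redundant terms."""
    candidates = document_frequency_filter(docs, min_df, max_df_ratio)
    # Binary decision table: 1 if the term occurs in the document, else 0.
    rows = [{t: int(t in doc) for t in candidates} for doc in docs]
    return greedy_reduct(rows, labels, candidates)


# Toy corpus (hypothetical): tokenized documents with class labels.
docs = [
    ["goal", "match", "team"],
    ["goal", "team", "coach"],
    ["stock", "market", "team"],
    ["stock", "market", "bank"],
]
labels = ["sport", "sport", "finance", "finance"]

print(document_frequency_filter(docs))  # ['goal', 'market', 'stock', 'team']
print(select_features(docs, labels))    # ['stock']
```

In this toy run the filter removes terms that occur in only one document, and the reduct step then discards terms that add no discriminating power once "stock" is kept. The greedy result depends on the order in which attributes are tried, so a different ordering may return a different, equally valid reduct.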
Source: Science Technology and Engineering (《科学技术与工程》), PKU Core Journal (北大核心), 2012, No. 34, pp. 9234-9237, 9242 (5 pages)
Funding: Supported by the university-level research project of Aba Teachers College (阿坝师范高等专科学校), grant ASB12-23
Keywords: feature selection; text categorization; feature resolution; rough sets; correlation matrix