
Feature selection method combining optimized document frequency with PA (cited by: 2)
Abstract: The high dimensionality of the feature space restricts the choice of classification algorithms, complicates classifier design, hurts accuracy, and weakens generalization, which leads to overfitting. Feature selection is therefore needed to avoid the curse of dimensionality. This paper first briefly analyzes several classic feature selection methods and summarizes their shortcomings. It then presents an optimized document frequency method and uses it to filter out some terms, reducing the sparsity of the text matrix. Finally, it applies pattern aggregation (PA) theory to build a vector space model of the text collection, which strengthens the role of terms according to their contribution to classification and removes redundant patterns contained in the original term matrix, thereby effectively reducing the dimensionality of the vector space and improving both the accuracy and the speed of text categorization. Experimental results show that this combined feature selection method performs well.
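Authors: Zhu Haodong (朱颢东), Zhong Yong (钟勇)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal (北大核心), 2010, No. 1, pp. 36-38 (3 pages)
Funding: Sichuan Province Science and Technology Program (2008GZ0003)
Keywords: feature selection; text categorization; word frequency; document frequency; pattern aggregation (PA)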
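
The term-filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula of the optimized document frequency method, so the code below shows only plain document-frequency thresholding; the function names and the `min_df` and `max_ratio` parameters are hypothetical and are not taken from the paper. Pattern aggregation is not covered here.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document
        df.update(set(tokens))
    return df


def filter_by_df(docs, min_df=2, max_ratio=0.9):
    """Keep terms whose document frequency is at least min_df and whose
    share of documents is at most max_ratio (both thresholds are
    illustrative assumptions, not values from the paper)."""
    df = document_frequency(docs)
    n_docs = len(docs)
    keep = {
        term
        for term, count in df.items()
        if count >= min_df and count / n_docs <= max_ratio
    }
    filtered = [[t for t in tokens if t in keep] for tokens in docs]
    return filtered, sorted(keep)


# Toy corpus: each document is already tokenized.
docs = [
    ["stock", "market", "rises", "the"],
    ["market", "falls", "the"],
    ["team", "wins", "match", "the"],
    ["team", "loses", "the", "match"],
]

filtered, vocab = filter_by_df(docs, min_df=2, max_ratio=0.9)
print(vocab)     # ['market', 'match', 'team']
print(filtered)  # [['market'], ['market'], ['team', 'match'], ['team', 'match']]
```

In this toy corpus, terms that appear in a single document are dropped as too rare and "the" is dropped because it appears in every document, so the term-document matrix shrinks from nine columns to three.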
