
Feature selection based on term weight for text categorization (cited by: 4)
Abstract: To improve the efficiency and effectiveness of text categorization and to reduce its computational complexity, this paper analyzes the classical feature selection methods and proposes a weighted text feature selection method. The method uses not only the number of documents in the dataset but also the weight of each index term, and constructs new evaluation functions that improve on information gain, expected cross entropy, and weight of evidence for text. A K-nearest neighbor (KNN) classifier is trained and tested on the standard Reuters-21578 dataset. Experimental results show that the method selects effective features and improves text categorization performance.
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2010, No. 5, pp. 1149-1151 (3 pages)
Funding: National Natural Science Foundation of China (60673186)
Keywords: text categorization; feature selection; term weight; information gain; expected cross entropy; weight of evidence for text
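
The abstract does not give the paper's evaluation functions, so the following is only a minimal sketch of the general idea, not the authors' formula. It computes information gain for one term, but lets each document contribute a weight in [0, 1] instead of a 0/1 presence flag. The function name `information_gain`, the "soft count" formulation, and the assumption that weights are already scaled to [0, 1] are all illustrative choices made here.

```python
import math
from collections import defaultdict


def information_gain(weights, labels):
    """Information gain of one term over the class labels.

    weights: per-document weight of the term, assumed scaled to [0, 1]
             (hypothetical input; e.g. a normalized term weight).
    labels:  class label of each document, same length as weights.

    With 0/1 weights this reduces to the classic document-count
    information gain. Fractional weights are an assumption of this
    sketch, not the evaluation function from the paper.
    """
    n = len(labels)
    present = defaultdict(float)  # soft count of "term present" per class
    absent = defaultdict(float)   # soft count of "term absent" per class
    class_count = defaultdict(int)
    for w, c in zip(weights, labels):
        present[c] += w
        absent[c] += 1.0 - w
        class_count[c] += 1

    def plogp(p):
        # p * log2(p), with the usual convention 0 * log(0) = 0
        return p * math.log2(p) if p > 0 else 0.0

    # Class entropy H(C)
    gain = -sum(plogp(k / n) for k in class_count.values())
    # Subtract the conditional entropy given term present / term absent
    for mass in (present, absent):
        total = sum(mass.values())
        if total > 0:
            gain += (total / n) * sum(plogp(m / total) for m in mass.values())
    return gain


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    print(information_gain([1, 1, 0, 0], labels))          # binary presence
    print(information_gain([0.9, 0.8, 0.1, 0.2], labels))  # weighted variant
```

With binary weights, a term that perfectly separates two balanced classes scores one bit and a term spread evenly across both scores zero. Fractional weights fall in between, which is how weight information could change the ranking of candidate features.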
