
Naive Bayes Text Classifier Based on Association Features (cited by: 4)
Abstract: Word co-occurrence information can contribute to automatic text classification, yet current text classifiers do not take full advantage of it. We define association features to describe this information and propose a strategy that uses them to improve the performance of a naive Bayes text classifier. To make the association features good discriminators, so that every feature in the feature space has strong classification ability, we construct the association feature set in three steps. First, we generate association features with an Apriori-like algorithm. Second, we evaluate the discriminative ability of the association features and prune the redundant ones. Third, we apply a feature selection algorithm based on information gain (IG) to further reduce the dimensionality of the feature set. Experiments on the Reuters-21578 dataset show that using association features raises the Macro F1 of the naive Bayes text classifier from 72% to 83.5%, which indicates that association features can improve the performance of naive Bayes text classification.
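Source: Journal of Northwestern Polytechnical University (indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2004, No. 4, pp. 413-416 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (60073055)
Keywords: naive Bayes classifier; association features; feature selection; algorithms; classification (of information); data mining; discriminators; information analysis; performance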
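The abstract names the steps of the method but gives no algorithmic detail. The following is a minimal sketch of two of them, under stated assumptions: association features are restricted to unordered word pairs, candidates are generated with an Apriori-style support threshold, and candidates are ranked by plain information gain. The function names (`candidate_pairs`, `information_gain`, `select_features`) and the parameters `min_support` and `top_k` are hypothetical. The paper's redundancy-pruning step and the naive Bayes training itself are not reproduced here.

```python
from collections import Counter
from itertools import combinations
from math import log2


def candidate_pairs(docs, min_support):
    """Return word pairs that co-occur in at least `min_support` documents."""
    # Apriori-style pruning: a pair can only be frequent if both words are.
    word_df = Counter(w for doc in docs for w in set(doc))
    frequent = {w for w, n in word_df.items() if n >= min_support}
    pair_df = Counter()
    for doc in docs:
        words = sorted(set(doc) & frequent)
        pair_df.update(combinations(words, 2))
    return {p for p, n in pair_df.items() if n >= min_support}


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def information_gain(feature, docs, labels):
    """IG of a binary feature: H(class) - H(class | feature present or absent)."""
    present, absent = Counter(), Counter()
    for doc, label in zip(docs, labels):
        # The feature fires only when every word of the pair is in the document.
        if set(feature) <= set(doc):
            present[label] += 1
        else:
            absent[label] += 1
    n = len(docs)
    h_class = entropy(list((present + absent).values()))
    h_cond = 0.0
    for part in (present, absent):
        size = sum(part.values())
        if size:
            h_cond += size / n * entropy(list(part.values()))
    return h_class - h_cond


def select_features(docs, labels, min_support=2, top_k=10):
    """Keep the `top_k` candidate pairs with the highest information gain."""
    pairs = candidate_pairs(docs, min_support)
    ranked = sorted(pairs, key=lambda p: (-information_gain(p, docs, labels), p))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy, made-up documents (not taken from Reuters-21578).
    docs = [
        ["wheat", "price", "rise"],
        ["wheat", "price", "fall"],
        ["oil", "price", "rise"],
        ["oil", "price", "fall"],
    ]
    labels = ["grain", "grain", "crude", "crude"]
    print(select_features(docs, labels, min_support=2, top_k=2))
```

In this toy data the word "price" alone carries no class information, while pairs such as ("price", "wheat") separate the two classes, which is the kind of signal the abstract attributes to co-occurrence features.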

References (7)

  • [1] McCallum A, Nigam K. A Comparison of Event Models for Naive Bayes Text Classification. AAAI-98 Workshop on Learning for Text Categorization, 1998
  • [2] Meretakis D, Fragoudis Dimitris, Lu Hongjun, Likothanassis Spiros. Scalable Association-Based Text Classification. Proceedings of the 9th ACM Int Conf Information and Knowledge Management (CIKM'00), Washington, US, 2000, 5-11
  • [3] Antonie M, Osmar R. Text Document Categorization by Feature Association. Proceedings of the IEEE International Conference on Data Mining (ICDM'2002), 2002, 3: 19-26
  • [4] Deshpande Mukund, Karypis George. Using Conjunction of Attribute Values for Classification. Proceedings of the 11th ACM Int Conf Information and Knowledge Management (CIKM'02), 2002, 356-364
  • [5] Lesh Neal, Zaki Mohammed J, Ogihara Mitsunori. Mining Features for Sequence Classification. Proceedings of 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 1999, 342-346
  • [6] Mladenic D, Grobelnik M. Word Sequences as Features in Text-Learning. Proceedings of the 17th Electrotechnical and Computer Science Conference, Ljubljana, Slovenia, 1998, 145-148
  • [7] Tan Chade-Meng, Wang Yuan-Fang, Lee Chan-Do. The Use of Bigrams to Enhance Text Categorization. Information Processing and Management, 2002, 38(4): 529-546

