Association Rules Text Categorization Based on Weighted Rules
Abstract: Text categorization methods based on association rules have attracted wide attention in recent years. Such methods generally achieve good classification results, but when feature words are distributed very unevenly across the training samples, the classification rules are also distributed unevenly across categories, and classification accuracy drops. This paper designs and implements Weighted Association Rules Categorization (WARC), an association-rule text categorization algorithm based on rule weight adjustment, to address this problem. The algorithm defines rule intensity according to the number of misclassified training samples. The weight of each strong rule is reduced by multiplying it by an adjustment factor less than 1, and the weight of each weak rule is increased by multiplying it by an adjustment factor greater than 1. Experimental results show that adjusting rule weights in this way significantly improves classification quality.
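Source: Journal of Chinese Information Processing (《中文信息学报》), 2005, No. 4, pp. 52-59 (8 pages). Indexed in CSCD and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (Grant 69933010); Science and Technology Fund of the Fujian Provincial Education Commission (Grant JB02069).
Keywords: computer application; Chinese information processing; association classification; rule intensity; weight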
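
The weight-adjustment idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definition of rule intensity, the values of the adjustment factors, or the scoring function, so everything below is an assumption made for illustration and not the paper's WARC implementation: the names `Rule`, `classify` and `adjust_weights` are hypothetical, a document is scored by summing the weights of the rules it matches, and a rule's intensity is taken to be the number of misclassified training samples it voted wrongly on minus the number it voted correctly on but was outvoted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rule:
    terms: frozenset      # antecedent: feature words that must all occur
    label: str            # consequent: the category the rule votes for
    weight: float = 1.0   # adjusted during training


def classify(rules, doc_terms):
    """Return the label with the largest summed weight of matching rules."""
    scores = defaultdict(float)
    for rule in rules:
        if rule.terms <= doc_terms:
            scores[rule.label] += rule.weight
    if not scores:
        return None  # no rule matches this document
    # Sorting first makes ties resolve alphabetically.
    return max(sorted(scores), key=scores.get)


def adjust_weights(rules, samples, down=0.8, up=1.25, max_rounds=10):
    """Assumed scheme: shrink rules that cause errors, grow outvoted ones.

    `down` (< 1) and `up` (> 1) are illustrative adjustment factors.
    """
    for _ in range(max_rounds):
        intensity = [0] * len(rules)
        errors = 0
        for doc_terms, true_label in samples:
            predicted = classify(rules, doc_terms)
            if predicted == true_label:
                continue
            errors += 1
            for i, rule in enumerate(rules):
                if not rule.terms <= doc_terms:
                    continue
                if rule.label == predicted:
                    intensity[i] += 1   # voted for the wrong class
                elif rule.label == true_label:
                    intensity[i] -= 1   # voted correctly but was outvoted
        if errors == 0:
            break
        for i, rule in enumerate(rules):
            if intensity[i] > 0:        # treated as a "strong" rule
                rule.weight *= down
            elif intensity[i] < 0:      # treated as a "weak" rule
                rule.weight *= up
    return rules


# Toy data: two rules for "finance" but only one for "sports".
rules = [
    Rule(frozenset({"market"}), "finance"),
    Rule(frozenset({"price"}), "finance"),
    Rule(frozenset({"transfer"}), "sports"),
]
samples = [
    ({"market", "price", "stock"}, "finance"),
    ({"transfer", "market", "price"}, "sports"),
]
print("before:", [classify(rules, doc) for doc, _ in samples])
adjust_weights(rules, samples)
print("after: ", [classify(rules, doc) for doc, _ in samples])
```

In this toy data the second sample is first labeled "finance", because two finance rules outvote the single sports rule. After two rounds of adjustment the sports rule outweighs them and both samples are labeled correctly.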