
Correlated Rules Based Associative Classification for Imbalanced Datasets

Cited by: 3
Abstract: Many studies have shown that associative classification achieves high classification accuracy. However, most associative classifiers are built on the "support-confidence" framework, and on imbalanced datasets both confidence and support are biased toward generating rules for the majority class, so instances of the minority class are easily misclassified. To address this problem, an associative classification algorithm for imbalanced data based on correlated rules is proposed. The algorithm mines itemsets that are both frequent and mutually associated, and among the classification rules that take such an itemset as their antecedent it selects the rule with the largest lift. Rules are ranked by a rule strength that combines lift, confidence, and Complement Class Support (CCS). Experiments show that the algorithm achieves high average classification accuracy and higher accuracy when classifying minority-class instances.

Many studies have shown that associative classification is a promising classification method. However, most associative classification algorithms may not achieve high performance on imbalanced datasets because they generate rules within the "support-confidence" framework: confidence (and support) tends to be biased toward the majority class in imbalanced datasets, so instances of the minority class may be misclassified. We propose a new associative classification approach called CRAC (Correlated Rules based Associative Classification for Imbalanced Datasets). First, CRAC mines itemsets that are both frequent and mutually associated, which yields a small set of high-quality rules. Second, among all rules whose antecedent is a given frequent, mutually associated itemset, CRAC selects only the rule with the largest lift as a class association rule (CAR); as a result, the antecedent and consequent of every rule CRAC generates are positively correlated. Finally, rules are ranked by a new metric that integrates lift, support, and Complement Class Support (CCS), so positively correlated rules are more likely to be used to predict the minority class. Experiments on fifteen UCI datasets show that the approach is effective for both balanced and imbalanced datasets and achieves better average classification accuracy than CBA.
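The bias of confidence toward the majority class, and why lift corrects it, can be seen with a few counts. The Python sketch below scores the two rules an antecedent X can produce on a 90/10 dataset. The `Counts` helper and the toy numbers are hypothetical, and CCS is assumed here to follow the CCCS definition of Arunasalam and Chawla (the fraction of records outside class c that contain X); it illustrates the measures only, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical helper: record counts needed to score a rule X -> c."""
    n: int     # total records
    n_x: int   # records containing antecedent X
    n_c: int   # records labelled with class c
    n_xc: int  # records containing X and labelled c


def confidence(k: Counts) -> float:
    return k.n_xc / k.n_x


def lift(k: Counts) -> float:
    # P(X, c) / (P(X) * P(c)); above 1 means X and c are positively correlated
    return (k.n_xc * k.n) / (k.n_x * k.n_c)


def ccs(k: Counts) -> float:
    # Complement Class Support (assumed CCCS definition): the share of
    # records NOT labelled c that still contain X; lower is better for X -> c.
    return (k.n_x - k.n_xc) / (k.n - k.n_c)


# Toy imbalanced data: 100 records, 90 "neg" (majority), 10 "pos" (minority);
# antecedent X occurs in 20 records, 12 of them neg and 8 pos.
to_neg = Counts(n=100, n_x=20, n_c=90, n_xc=12)
to_pos = Counts(n=100, n_x=20, n_c=10, n_xc=8)

for name, k in [("X -> neg", to_neg), ("X -> pos", to_pos)]:
    print(f"{name}: conf={confidence(k):.2f} lift={lift(k):.2f} ccs={ccs(k):.2f}")
# X -> neg: conf=0.60 lift=0.67 ccs=0.80
# X -> pos: conf=0.40 lift=4.00 ccs=0.13
```

Confidence prefers X -> neg (0.60 vs 0.40) even though its lift is below 1, while lift shows X is strongly positively correlated with the minority class; keeping the largest-lift rule for X, as the abstract describes, selects X -> pos.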
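The rule-generation and ranking steps outlined in the abstract can be sketched in Python under several assumptions. "Mutually associated" is measured here with all-confidence (Omiecinski's interest measure: support of the itemset divided by the largest single-item support in it). The function names, thresholds, and toy data are hypothetical. Because the abstract does not give the formula for the combined rule strength, `rank_rules` orders rules by lift, then confidence, then CCS purely as a stand-in. Prediction uses the first matching rule with a fixed default class.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy training data: (set of items, class label); 7 "neg", 3 "pos".
DATA = [
    ({"a", "b"}, "neg"), ({"a", "b"}, "neg"), ({"a", "c"}, "neg"),
    ({"a"}, "neg"), ({"b"}, "neg"), ({"b", "c"}, "neg"), ({"a", "b"}, "neg"),
    ({"c", "d"}, "pos"), ({"c", "d"}, "pos"), ({"d"}, "pos"),
]


def generate_rules(data, min_sup, min_allconf, max_len=2):
    """Return (itemset, class, lift, confidence, ccs) for each kept rule."""
    n = len(data)
    class_count = Counter(label for _, label in data)
    item_count = Counter(item for items, _ in data for item in items)
    rules = []
    for size in range(1, max_len + 1):
        for itemset in combinations(sorted(item_count), size):
            covered = [label for items, label in data if set(itemset) <= items]
            n_x = len(covered)
            if n_x / n < min_sup:
                continue  # itemset is not frequent
            # All-confidence (assumed measure of mutual association).
            if n_x / max(item_count[i] for i in itemset) < min_allconf:
                continue
            by_class = Counter(covered)

            def lift(c):
                # P(X, c) / (P(X) * P(c))
                return by_class[c] * n / (n_x * class_count[c])

            best = max(class_count, key=lift)  # keep only the max-lift class
            if lift(best) <= 1:
                continue  # antecedent and class are not positively correlated
            n_xc = by_class[best]
            n_not_c = n - class_count[best]
            ccs = (n_x - n_xc) / n_not_c if n_not_c else 0.0
            rules.append((itemset, best, lift(best), n_xc / n_x, ccs))
    return rules


def rank_rules(rules):
    # Stand-in ordering, NOT the paper's strength formula: higher lift,
    # then higher confidence, then lower CCS (sorted() is stable on ties).
    return sorted(rules, key=lambda r: (-r[2], -r[3], r[4]))


def classify(ranked, items, default):
    # The first rule whose antecedent is contained in the record wins.
    for itemset, label, *_ in ranked:
        if set(itemset) <= items:
            return label
    return default


ranked = rank_rules(generate_rules(DATA, min_sup=0.2, min_allconf=0.5))
for itemset, label, lift_, conf, ccs in ranked:
    print(itemset, "->", label, f"lift={lift_:.2f} conf={conf:.2f} ccs={ccs:.2f}")
print(classify(ranked, {"c", "x"}, default="neg"))  # pos
```

With these toy counts the three minority-class rules rank above the majority-class rules because their lift is higher, so a record containing `c` is predicted as `pos`. Substituting the paper's actual strength metric would only change `rank_rules`.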
Source: Computer Science (《计算机科学》), 2014, No. 2, pp. 111-113, 122 (4 pages). Indexed in CSCD; Peking University core journal.
Funding: National Natural Science Foundation of China (61170129); Natural Science Foundation of Fujian Province (2013J01259)
Keywords: data mining; associative classification; imbalanced datasets; correlated rules