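Abstract
Many studies have shown that associative classification achieves high classification accuracy. However, most associative classifiers are built on the "support-confidence" framework, and on imbalanced datasets both confidence and support favor rules that predict the majority class, so minority-class instances are easily misclassified. To address this problem, an associative classification algorithm for imbalanced data based on correlated rules is proposed. The algorithm mines itemsets that are frequent and mutually associated and, among the classification rules that take such an itemset as antecedent, selects the rule with the largest lift. Rules are ranked by a rule strength that combines lift, confidence, and Complement Class Support (CCS). Experiments show that the algorithm achieves high average classification accuracy and is more accurate when classifying minority-class instances.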
Many studies have shown that associative classification is a promising classification method. However, most associative classification algorithms may not achieve high classification performance on imbalanced datasets because they generate rules based on the "support-confidence" framework. In imbalanced datasets, confidence (and support) tends to be biased toward the majority class. As a result, instances of the minority class may be misclassified. We propose a new associative classification approach called CRAC (Correlated Rules based Associative Classification for Imbalanced Datasets). First, we mine frequent and mutually associated itemsets for classification, which yields a small set of high-quality rules. Second, among all rules that take such a frequent and associated itemset as their condition, CRAC selects only the rule with the largest lift as a class association rule (CAR). As a result, the antecedent and the consequent of the rules CRAC generates are positively correlated. Finally, we rank rules according to a new metric that integrates lift, support, and Complement Class Support (CCS), so that positively correlated rules are more likely to be used to predict the minority class. Our experiments on fifteen UCI datasets show that our approach is an effective classification technique for both balanced and imbalanced datasets, and that it has better average classification accuracy than CBA.
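To illustrate why confidence favors the majority class while lift does not, the following is a minimal Python sketch on a toy dataset. It is not the CRAC implementation: the function names, the toy data, and the CCS formula used here (the share of complement-class instances that contain the antecedent) are assumptions made for illustration, and the paper's combined rule-strength metric is not reproduced because the abstract does not define it.

```python
# Toy dataset (hypothetical): 2 minority "pos" rows and 8 majority "neg" rows.
# Each row is (set of items, class label).
ROWS = (
    [({"a", "b"}, "pos"), ({"a"}, "pos")]
    + [({"a", "c"}, "neg")] * 3
    + [({"c"}, "neg")] * 5
)


def rule_metrics(rows, antecedent, label):
    """Return support, confidence, lift and CCS for the rule antecedent -> label."""
    n = len(rows)
    n_a = sum(1 for items, _ in rows if antecedent <= items)   # rows matching antecedent
    n_c = sum(1 for _, c in rows if c == label)                # rows of the target class
    n_ac = sum(1 for items, c in rows if antecedent <= items and c == label)
    n_not_c = n - n_c                                          # rows of the complement class

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # lift = P(A, c) / (P(A) * P(c)); values above 1 mean positive correlation.
    lift = (n_ac * n) / (n_a * n_c) if n_a and n_c else 0.0
    # Assumed CCS definition: fraction of complement-class rows containing A.
    ccs = (n_a - n_ac) / n_not_c if n_not_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift, "ccs": ccs}


def best_label(rows, antecedent, metric):
    """Pick the consequent class that maximizes the given metric for one antecedent."""
    labels = sorted({c for _, c in rows})
    return max(labels, key=lambda c: rule_metrics(rows, antecedent, c)[metric])


if __name__ == "__main__":
    for label in ("pos", "neg"):
        print(label, rule_metrics(ROWS, {"a"}, label))
    print("chosen by confidence:", best_label(ROWS, {"a"}, "confidence"))
    print("chosen by lift:", best_label(ROWS, {"a"}, "lift"))
```

In this toy data, item `a` occurs in both minority rows but in only three of the eight majority rows. Selecting the consequent by confidence still yields the majority class, whereas selecting by lift yields the minority class, which is the behavior the abstract attributes to lift-based rule selection.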
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2014, No. 2, pp. 111-113, 122 (4 pages)
Computer Science
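Funding
Supported by the National Natural Science Foundation of China (61170129)
and the Natural Science Foundation of Fujian Province (2013J01259)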
Keywords
Data mining
Associative classification
Imbalanced datasets
Correlated rules