Abstract
Data mining studies theories and methods for discovering knowledge from very large databases, and classification is one of its important research topics. The naive Bayes (NB) classifier is a simple and efficient learning technique for classification. It assumes that attributes are conditionally independent of one another given the class label, whereas in real data attributes often depend on one another. Patterns composed of "attribute-value" pairs play a key role in classification, and many researchers build classifiers from such patterns; the dependencies between the attributes contained in a pattern and the remaining attributes have a significant impact on classification results. Based on a study of these dependencies, a Bayesian classification algorithm based on selective patterns is proposed. It exploits the strong classification ability of Bayesian network classifiers and, by further analyzing the dependencies among the attributes within patterns, weakens the restriction imposed by the conditional independence assumption. Experiments show that mining highly discriminative patterns according to the characteristics of a dataset, and constructing the dependencies between attributes appropriately, helps improve classification accuracy. Compared with the benchmark algorithms NB and AODE, the proposed algorithm improves average accuracy on 10 datasets by 1.65% and 4.29%, respectively.
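To make the conditional independence assumption concrete, here is a minimal sketch of the NB baseline only, not the proposed selective-pattern algorithm, whose details the abstract does not give. The function names `train_nb` and `predict_nb`, the Laplace smoothing, and the toy data are all illustrative assumptions.

```python
from collections import Counter, defaultdict
import math


def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, used for Laplace smoothing.
    domains = defaultdict(set)
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains


def predict_nb(model, row):
    """Return the class maximizing log P(y) + sum_i log P(x_i | y)."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, cy in class_counts.items():
        # Laplace-smoothed class prior (smoothing is an assumption here).
        score = math.log((cy + 1) / (n + len(class_counts)))
        # Independence assumption: each attribute contributes its own
        # factor P(x_i | y), ignoring every other attribute.
        for i, v in enumerate(row):
            count = value_counts[(i, y)][v]
            score += math.log((count + 1) / (cy + len(domains[i])))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical toy data: each row is a tuple of attribute values.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
```

With this toy data, `predict_nb(model, ("sunny", "hot"))` selects the class `"no"`. Because each attribute is scored separately, the sketch cannot represent any dependency between attributes, which is the limitation the abstract sets out to weaken.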
Authors
Ju Zhuoya (鞠卓亚)
Wang Zhihai (王志海)
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; Unit 32178, Beijing 100012
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
Peking University Core Journals (北大核心)
2020, Issue 8, pp. 1605-1616 (12 pages)
Funding
National Natural Science Foundation of China (61672086)
Beijing Natural Science Foundation (4182052)
Keywords
classification
pattern discovery
selective patterns
dependency
Bayesian classifier