Abstract
Data streams are dynamic: they are unstable and subject to concept drift, so an ensemble classification model must adapt promptly to a new environment. Existing methods usually update the weights of the base classifiers using supervision information, assigning higher weights to the base classifiers that best fit the current environment. In a real data stream, however, supervision information cannot be obtained immediately. To address this problem, this paper proposes a data stream ensemble classification algorithm that updates the weights of the base classifiers based on information entropy. First, each base classifier is initialized with a random feature subspace to construct the ensemble classifier. Second, a new base classifier is built from each newly arriving data block and replaces the base classifier with the lowest weight in the ensemble. Then, an information-entropy-based weight update strategy updates the weights of the base classifiers in real time. Finally, the base classifiers that meet the requirements take part in weighted voting to produce the classification result. The proposed algorithm is compared with several classic learning algorithms, and the experimental results show that it has a clear advantage in classification accuracy and is suitable for multiple types of concept drift.
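The abstract does not give the weight formula or the voting criterion, so the following Python sketch is only an illustration of the general idea of label-free, entropy-based weighting. It assumes that each base classifier outputs a class probability vector, that a classifier's weight is one minus the normalized Shannon entropy of that vector, and that classifiers below a threshold are excluded from the vote. The names `entropy_weight`, `weighted_vote`, and `min_weight` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weight(probs: Sequence[float]) -> float:
    """Assumed weight: 1 - normalized entropy, so confident outputs score near 1."""
    n_classes = len(probs)
    if n_classes < 2:
        return 1.0
    return 1.0 - shannon_entropy(probs) / math.log(n_classes)


def weighted_vote(predictions: List[Sequence[float]], min_weight: float = 0.1) -> int:
    """Return the winning class index from entropy-weighted probability votes.

    predictions holds one probability vector per base classifier.
    min_weight is an assumed admission threshold, not a value from the paper.
    """
    n_classes = len(predictions[0])
    scores = [0.0] * n_classes
    for probs in predictions:
        weight = entropy_weight(probs)
        if weight < min_weight:
            continue  # classifier is too uncertain to take part in the vote
        for cls, p in enumerate(probs):
            scores[cls] += weight * p
    if not any(scores):
        # No classifier qualified: fall back to an unweighted sum.
        scores = [sum(probs[cls] for probs in predictions) for cls in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)


if __name__ == "__main__":
    outputs = [[0.9, 0.1], [0.45, 0.55], [0.4, 0.6]]
    print(weighted_vote(outputs))
```

In this example a simple majority vote over predicted labels would choose class 1, while the entropy-weighted vote chooses class 0 because the two classifiers favoring class 1 are nearly undecided and fall below the assumed threshold. No true labels are needed to compute the weights, which is the property the abstract emphasizes.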
Authors
XIA Yuan (夏源)
ZHAO Yun-long (赵蕴龙)
FAN Qi-lin (范其林)
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2022, No. 3, pp. 92-98 (7 pages)
Keywords
Data stream
Concept drift
Information entropy
Classification
Ensemble algorithm