Abstract
Concept drift is common in skewed data streams (SDS). However, most existing concept drift detection methods for data streams assume that the class distribution is balanced, which makes them hard to apply to skewed data streams. This paper therefore proposes CDPSD, a concept drift detection method for skewed data streams based on the distribution of positive instances. First, CDPSD uses a modified resampling method that avoids sampling instances from different concepts into the same data block, and then builds the classifiers. Second, it detects concept drift and updates the classifiers by monitoring changes in the class distribution of the positive instances only, rather than of all instances. Experiments show that CDPSD detects concept drift in skewed data streams in a timely manner, updates the classification model quickly, and improves classification performance on positive-class instances.
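The idea of watching only the positive instances can be illustrated with a small sketch. The abstract does not give the CDPSD drift statistic, so everything below is an assumption made for illustration: the names `positive_centroid` and `detect_drift`, the use of the mean feature vector of the positives as a summary of their distribution, the Euclidean distance test, and the `threshold` parameter are all hypothetical and are not taken from the paper.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def positive_centroid(block):
    """Mean feature vector of the positive (label == 1) instances in a block.

    A block is a list of (features, label) pairs. Returns None when the
    block contains no positive instance.
    """
    positives = [x for x, y in block if y == 1]
    if not positives:
        return None
    n = len(positives)
    return [sum(col) / n for col in zip(*positives)]


def detect_drift(blocks, threshold):
    """Return indices of blocks flagged as drifted.

    Hypothetical criterion (not the paper's): a block is flagged when the
    centroid of its positives lies more than `threshold` away from the
    centroid of the most recent earlier block that contained positives.
    """
    drift_at = []
    reference = None
    for i, block in enumerate(blocks):
        centroid = positive_centroid(block)
        if centroid is None:
            continue  # no positives here, so there is nothing to compare
        if reference is not None and dist(reference, centroid) > threshold:
            drift_at.append(i)  # a real system would rebuild its classifier here
        reference = centroid
    return drift_at


# Toy stream: the positives sit near the origin, then move to about (3, 3).
blocks = [
    [([0.0, 0.0], 1), ([5.0, 5.0], 0), ([0.2, 0.0], 1)],
    [([0.1, 0.1], 1), ([5.0, 5.1], 0)],
    [([3.0, 3.0], 1), ([5.0, 5.0], 0)],
    [([3.1, 3.0], 1)],
]
print(detect_drift(blocks, threshold=1.0))
```

In this toy stream the negatives never move, so a test on positives alone responds only to the shift in the minority class. A statistic computed over all instances would be dominated by the unchanged majority class, which is the motivation the abstract gives for restricting detection to the positives.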
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
2013, No. 6, pp. 545-550 (6 pages)
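Funding
National Natural Science Foundation of China, No. 60975034
Fundamental Research Funds for the Central Universities, Nos. 2011HGBZ1329 and 2011HGQC1013
Natural Science Foundation of Anhui Province, No. 1208085QF122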
Keywords
concept drift
skewed data streams
resample
classification
positive class