Concept Drift Class Detection Based on Time Window (基于时序窗口的概念漂移类别检测)

Cited by: 10
Abstract: Streaming data, a new type of data, is used in many fields. Because it arrives quickly, in large volumes, and continuously, an online learning algorithm must be able to scan it accurately in a single pass. Concept drift often occurs as streaming data is generated. Research on detecting concept drift sites (the points at which drift occurs) is relatively mature. In practice, however, factors in the learning environment evolve in different directions, which leads to diverse classes of concept drift in streaming data and poses new challenges for streaming data mining and online learning. To address this problem, this paper proposes a concept drift class detection method based on time window (CD-TW). The method uses a stack and a queue to store and access streaming data, and a window mechanism to learn from the data in chunks. First, it creates two basic site time windows, one loading historical data and one loading current data, and detects the concept drift site by comparing how the distributions of the data in the two windows change. Then, it creates a span time window that loads part of the data after the drift site, determines the drift span by analyzing the stability of the data distribution in that window, and uses the span to judge the concept drift class. Experimental results show that CD-TW not only locates concept drift sites precisely but also performs well in judging the drift class.
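The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' CD-TW implementation: the abstract does not say which statistic is compared or how stability is measured, so the difference of window means, the population standard deviation as a stability measure, every parameter value, the function names, and the "abrupt"/"gradual" labels are all assumptions chosen for illustration. The stream is taken to be a list of numbers (for example, a per-sample error signal).

```python
from collections import deque
from statistics import mean, pstdev


def detect_drift_site(stream, window=50, threshold=0.2):
    """Return the index at which drift is first flagged, or None.

    Assumption: drift is flagged when the means of the historical window
    and the current window differ by more than `threshold`.
    """
    history = deque(maxlen=window)  # queue of older samples
    current = deque(maxlen=window)  # queue of the most recent samples
    for i, x in enumerate(stream):
        if len(current) == window:
            # The oldest "current" sample ages into the historical window.
            history.append(current[0])
        current.append(x)
        if len(history) == window:
            if abs(mean(current) - mean(history)) > threshold:
                return i
    return None


def drift_span(stream, site, span_window=20, tolerance=0.01):
    """Return how many samples pass after `site` before a span window
    becomes stable, or None if no stable window is found.

    Assumption: a window is stable when its population standard
    deviation is at most `tolerance`.
    """
    for start in range(site, len(stream) - span_window + 1):
        if pstdev(stream[start:start + span_window]) <= tolerance:
            return start - site
    return None


def classify_drift(span, abrupt_limit=10):
    """Map a drift span to a hypothetical class label."""
    if span is None:
        return "unresolved"
    return "abrupt" if span <= abrupt_limit else "gradual"


if __name__ == "__main__":
    # A synthetic stream that jumps from 0.0 to 1.0 at index 200.
    stream = [0.0] * 200 + [1.0] * 200
    site = detect_drift_site(stream)
    if site is not None:
        span = drift_span(stream, site)
        print(site, span, classify_drift(span))
```

In this sketch the flagged site lags the true change point, because the current window has to fill with enough post-drift samples before the two means separate. A short span means the data settles quickly after the site, and a long span means the distribution keeps moving.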
Authors: Guo Husheng (郭虎升); Ren Qiaoyan (任巧燕); Wang Wenjian (王文剑) (School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006)
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2022, No. 1, pp. 127-143 (17 pages)
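Funding: National Natural Science Foundation of China (61503229, U1805263, 62076154); Natural Science Foundation of Shanxi Province (201901D111033); Key Research and Development Program of Shanxi Province (International Cooperation) (201903D421050).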
Keywords: streaming data; concept drift; time window; drift span; concept drift class
