Abstract
Data stream mining, the technique of extracting useful information from continuous data streams, has recently become a research hotspot in data mining and has broad application prospects. In the data stream model, data does not take the form of persistent relations but arrives in multiple, continuous, rapid, and time-varying streams. Because data arrives continuously, at high speed, and in huge volume, novel algorithms are needed to address these problems. Among these research topics, classification is an especially active one. This review paper presents the state of the art in classification algorithms for data stream mining and systematically introduces and analyzes these methods from two perspectives: data streams with a stationary distribution and data streams with concept drift. Finally, the open challenges and development trends in this field are summarized and discussed.
Source
《计算机研究与发展》
EI
CSCD
PKU Core (Peking University Core Journals list)
2007, No. 11, pp. 1809-1815 (7 pages)
Journal of Computer Research and Development
Funding
National Natural Science Foundation of China (Grant No. 60573057)
Keywords
data streams
mining
classification
stationary distribution
concept drift