A Survey of Classification of Data Streams

Original title: 数据流挖掘分类技术综述. Cited by: 41
Abstract: Data stream mining, the technology of extracting valuable information from continuous data streams, has recently attracted increasing attention in the data mining community and has broad application prospects. In the data stream model, data does not take the form of persistent relations but arrives in multiple, continuous, rapid, and time-varying streams. Because data arrives continuously and at high speed, and the overall volume is huge, novel algorithms are needed to handle these characteristics. Among these research topics, classification is a particularly active one. This survey presents the state of the art in classification algorithms for data stream mining, and introduces and analyzes the methods from two perspectives: data streams with a stationary distribution and data streams with concept drift. Finally, it summarizes the open challenges and likely future directions of data stream classification.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 11, pp. 1809-1815 (7 pages)
Funding: National Natural Science Foundation of China (60573057)
Keywords: data streams; mining; classification; stationary distribution; concept drift