Outlier mining algorithm based on data partitioning and grid

Cited by: 2
Abstract: To address the low efficiency of existing grid-based outlier mining algorithms and their poor adaptability to large data sets, this paper proposes an outlier mining algorithm based on data partitioning and grid. The algorithm first partitions the data, filters out non-outliers cell by cell, and temporarily stores the intermediate results. It then uses an improved Cell Dimension Tree (CD-Tree) structure to maintain the spatial information of the remaining data points, filters out non-outliers micro-cell by micro-cell, and applies two optimization strategies to keep these operations efficient. Finally, it mines outliers point by point to obtain the outlier set. Theoretical analysis and experimental results show that the method is feasible and effective, and that it scales better to large and high-dimensional data sets.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 8, pp. 2193-2197 (5 pages)
Keywords: data mining; outlier data; grid; data partitioning; cell; micro-cell; Cell Dimension Tree (CD-Tree)
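
The abstract gives only the outline of the method, so the following is a minimal Python sketch of the general "filter by cell first, then check by point" idea, not the paper's algorithm. It assumes a distance-based outlier definition (a point is an outlier when fewer than k other points lie within distance r), which the abstract does not state. The CD-Tree, the micro-cell stage, and the two optimization strategies are not reproduced. The names `cell_of`, `mine_outliers`, `r`, and `k` are hypothetical.

```python
import math
from collections import Counter


def cell_of(point, side):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(math.floor(x / side) for x in point)


def mine_outliers(partitions, r, k):
    """Return points with fewer than k other points within distance r.

    partitions: non-empty list of lists of points (tuples of floats),
    an assumed stand-in for the data partitions named in the abstract.
    """
    dim = len(partitions[0][0])
    # With this side length the cell diagonal is r/2, so any two points
    # that share a cell are closer than r to each other.
    side = r / (2 * math.sqrt(dim))

    # Pass 1: count points per cell, one partition at a time, keeping
    # only the per-cell counts as the intermediate result.
    counts = Counter()
    for part in partitions:
        counts.update(cell_of(p, side) for p in part)

    points = [p for part in partitions for p in part]
    outliers = []
    for i, p in enumerate(points):
        # Cell-level filter: a cell holding more than k points gives each
        # of its points at least k neighbours, so none can be an outlier.
        if counts[cell_of(p, side)] > k:
            continue
        # Point-level check, run only for points in sparse cells.
        neighbours = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= r
        )
        if neighbours < k:
            outliers.append(p)
    return outliers
```

In this sketch the cell-level filter discards whole dense cells without any distance computation, which is where a grid-based approach saves work. The point-level loop is a brute-force scan that a real implementation would restrict to neighbouring cells.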