
Random projection algorithm for outlier mining technology research (cited by 3)
Abstract: Outlier mining in d-dimensional point sets is currently one of the active research areas in data mining. Existing outlier mining approaches based on distance or on nearest neighbors perform poorly on high-dimensional data. To address this, this paper investigates the use of the angle-based outlier factor for mining outliers in high-dimensional data and proposes a novel random-projection-based technique that estimates the angle-based outlier factor for all data points in time near-linear in the size of the data. The approach can also run in a parallel environment to achieve a parallel speedup. A theoretical analysis of the approximation quality is given to guarantee the reliability of the algorithm. Experiments on synthetic and real-world data sets show that the approach is efficient and scales to very large high-dimensional data sets.
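The abstract gives no pseudocode, so the following is a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes the outlier score is the unweighted variance of the angles between a - p and b - p over all pairs of other points a, b (the angle-based outlier factor of Kriegel and Zimek additionally weights each term by distances). It relies on the random-hyperplane fact that a random Gaussian direction separates a - p from b - p with probability theta/pi. Projecting all points onto one random line and sorting gives, for each p, the number of pairs straddling p as (#below) * (#above), which estimates the mean angle in O(n log n) per line. The mean squared angle is estimated from two independent lines by counting pairs that straddle p on both, using 2-D dominance counting with a Fenwick tree; this second-moment construction is an illustrative choice, not necessarily the paper's estimator. The names `exact_voa`, `dominance_counts` and `estimate_voa` and the parameters `t` and `seed` are hypothetical.

```python
import math

import numpy as np

# Assumption: the score is the UNWEIGHTED variance of angles (a stand-in for the
# paper's angle-based outlier factor, whose exact form the abstract does not give).


def exact_voa(X: np.ndarray) -> np.ndarray:
    """Exact variance of angles for every point. O(n^3); reference only.

    Assumes all points are distinct, so no difference vector has zero length.
    """
    n = len(X)
    voa = np.empty(n)
    iu = np.triu_indices(n - 1, k=1)              # all unordered pairs {a, b} of other points
    for i in range(n):
        D = np.delete(X, i, axis=0) - X[i]        # vectors from p = X[i] to every other point
        U = D / np.linalg.norm(D, axis=1, keepdims=True)
        theta = np.arccos(np.clip(U @ U.T, -1.0, 1.0))[iu]
        voa[i] = theta.var()
    return voa


def dominance_counts(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """For each i, count j with x[j] < x[i] and y[j] < y[i]. Assumes no ties."""
    n = len(x)
    ry = np.argsort(np.argsort(y))                # rank of each point along y (0-based)
    tree = [0] * (n + 1)                          # Fenwick tree over y-ranks
    out = np.zeros(n, dtype=np.int64)
    for i in np.argsort(x):                       # sweep points in increasing x
        k = int(ry[i])
        s, j = 0, k                               # prefix sum = inserted y-ranks below k
        while j > 0:
            s += tree[j]
            j -= j & -j
        out[i] = s
        j = k + 1                                 # insert this point's y-rank
        while j <= n:
            tree[j] += 1
            j += j & -j
    return out


def estimate_voa(X: np.ndarray, t: int = 256, seed: int = 0) -> np.ndarray:
    """Random-projection estimate of the variance of angles, O(t * n log n).

    Uses Pr[random line separates a - p and b - p] = angle / pi. Requires n >= 3.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = (n - 1) * (n - 2) / 2                     # number of pairs {a, b} seen from each p
    s1 = np.zeros(n)
    s2 = np.zeros(n)
    for _ in range(t):
        x = X @ rng.standard_normal(d)            # two independent random projections
        y = X @ rng.standard_normal(d)
        lx = np.argsort(np.argsort(x))            # number of points projected below p on x
        ly = np.argsort(np.argsort(y))
        # pairs straddling p on a single line: (#below) * (#above)
        s1 += lx * (n - 1 - lx) + ly * (n - 1 - ly)
        # pairs straddling p on both lines lie in opposite quadrants around p
        mm = dominance_counts(x, y)               # below on x, below on y
        mp = lx - mm                              # below on x, above on y
        pm = ly - mm                              # above on x, below on y
        pp = (n - 1) - lx - ly + mm               # above on both
        s2 += pp * mm + pm * mp
    moa1 = math.pi * s1 / (2 * t * m)             # estimated mean angle
    moa2 = math.pi ** 2 * s2 / (t * m)            # estimated mean squared angle
    return moa2 - moa1 ** 2
```

Points with the smallest estimated variance are the outlier candidates. Each loop iteration draws fresh projections and is independent of the others, so iterations could be spread across workers and their sums added, which is one plausible reading of the abstract's parallel-speedup claim. The Fenwick loop is plain Python, so the sketch demonstrates the counting argument rather than competitive speed; `exact_voa` is included only to check the estimate on small inputs.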
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD-indexed, 2013, No. 24, pp. 122-129 (8 pages)
Funding: 2011 Scientific Research Project of the Hunan Provincial Department of Education (No. 11C0784)
Keywords: outlier data mining; angle; random projection algorithm; near-linear time; reliability; efficiency

