Parallel Mining of Distance-Based Outliers Using MapReduce

Cited by: 4
Abstract: Data mining is an effective way to address the problem of abundant data but scarce knowledge. Outlier mining is an important research topic in data mining and has been widely applied to network intrusion detection, credit card fraud detection, spam analysis, and gene mutation analysis. In massive high-dimensional data, the large data volume and high dimensionality seriously degrade the accuracy and efficiency of outlier mining. Building on KNN and defining the concept of a "solving set", this paper implements a distance-based outlier mining algorithm in the MapReduce programming environment. Experiments on synthetic data sets and UCI data sets examine how the parameters affect the algorithm's performance under different conditions.
Author: Ren Yan (任燕)
Source: Computer Systems & Applications (《计算机系统应用》), 2018, No. 2, pp. 151-156 (6 pages)
Keywords: MapReduce; distance-based; KNN; outlier data mining
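
The abstract describes scoring outliers by KNN distance inside a MapReduce job. The sketch below is one possible reading of that idea, not the paper's algorithm: the abstract does not define the "solving set", so it is left out, and the outlier score used here (the sum of distances to the k nearest neighbours) is an assumption. The function names `map_partition`, `reduce_scores`, and `top_outliers` are hypothetical, and the map and reduce steps are simulated in a single process rather than run on a cluster.

```python
import heapq
import math
from collections import defaultdict


def map_partition(partition, points, k):
    """Map step: for every point, emit its k smallest distances to this partition."""
    for pid, p in enumerate(points):
        # Skip the point itself so a point is never its own neighbour.
        dists = [math.dist(p, q) for qid, q in partition if qid != pid]
        yield pid, heapq.nsmallest(k, dists)


def reduce_scores(mapped, k):
    """Reduce step: merge the partial lists per point and keep the k smallest overall."""
    merged = defaultdict(list)
    for pid, dists in mapped:
        merged[pid].extend(dists)
    # Assumed score: sum of distances to the k nearest neighbours.
    return {pid: sum(heapq.nsmallest(k, d)) for pid, d in merged.items()}


def top_outliers(points, k=2, n=1, num_partitions=2):
    """Return the indices of the n points with the largest KNN-distance score."""
    indexed = list(enumerate(points))
    # Round-robin split standing in for the input splits of a real job.
    partitions = [indexed[i::num_partitions] for i in range(num_partitions)]
    mapped = [pair for part in partitions for pair in map_partition(part, points, k)]
    scores = reduce_scores(mapped, k)
    return heapq.nlargest(n, scores, key=scores.get)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_outliers(data, k=2, n=1))
```

Because each mapper keeps only its k smallest distances per point, the reducer can recover the exact global k nearest distances by merging the partial lists, so the result does not depend on how the data are partitioned.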