
Research and Implementation of Big Data Clustering Based on Spark (cited by: 24)
Abstract: Traditional clustering algorithms can no longer meet the demands of big data processing because of the memory and computing limits of a single machine, so new solutions are urgently needed. To address the single-machine memory bottleneck, and taking into account the iterative nature of clustering algorithms, this paper proposes and implements a clustering system based on the Spark platform. The system first applies different preprocessing strategies to two types of data sets, sparse and dense. It then analyzes and compares the clustering performance of different clustering algorithms on the Spark platform and gives the best scheme. Finally, it uses data persistence to speed up computation. Experimental results show that the proposed system can effectively meet the requirements of clustering analysis on massive data.
Authors: Wang Lei (王磊); Zou Encen (邹恩岑); Zeng Cheng (曾诚); Xi Xuefeng (奚雪峰); Lu You (陆悠) (School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China; Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou, 215009, China; Big Data Key Laboratory of PuKai, Suzhou University of Science and Technology, Suzhou, 215009, China; Kunshan Public Security Bureau Command Center, Suzhou, 215300, China)
Source: Journal of Data Acquisition and Processing (《数据采集与处理》; indexed in CSCD and the Peking University Core Journals list), 2018, No. 6, pp. 1077-1085 (9 pages)
Funding: Supported by the National Natural Science Foundation of China (61673290, 61750110534, 61728205) and the Suzhou Science and Technology Development Program (SYG201707, SYG201817)
Keywords: Spark; clustering; big data
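
The abstract names data persistence as the step that speeds up iterative clustering but gives no implementation details. The following is a minimal plain-Python sketch of that idea, not Spark code and not the authors' system: it runs Lloyd's k-means twice, once re-parsing the raw text on every iteration and once parsing it a single time and keeping the parsed points in memory. All names (`parse_points`, `closest`, `kmeans`, the `counter` dictionary) and the toy data are hypothetical and chosen only for illustration.

```python
import random


def parse_points(lines, counter):
    """Parse 'x,y' text lines into float tuples; counter records each parse pass."""
    counter["parses"] += 1
    return [tuple(float(v) for v in line.split(",")) for line in lines]


def closest(point, centers):
    """Return the index of the center nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans(load, k, iterations, seed=0):
    """Lloyd's k-means; load() is called once up front and once per iteration."""
    rng = random.Random(seed)
    centers = rng.sample(load(), k)  # pick k initial centers from the data
    for _ in range(iterations):
        points = load()
        groups = [[] for _ in range(k)]
        for p in points:
            groups[closest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


# Toy data (assumption): two well-separated groups of 2-D points stored as text.
lines = ["0,0", "0,1", "1,0", "10,10", "10,11", "11,10"]

# Without persistence: the text is re-parsed on every pass over the data.
uncached = {"parses": 0}
kmeans(lambda: parse_points(lines, uncached), k=2, iterations=5)

# With persistence: parse once, then reuse the in-memory points in every iteration.
cached = {"parses": 0}
points = parse_points(lines, cached)
centers = kmeans(lambda: points, k=2, iterations=5)

print(uncached["parses"], cached["parses"])  # 6 parse passes versus 1
```

In Spark, the analogous step is calling `cache()` or `persist()` on the parsed RDD before the iterative loop, so that each iteration reads the parsed data from memory instead of recomputing it from the input files. Whether the paper's system does exactly this is an assumption; the abstract only says that data persistence is used.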