Distributed Data Stream Clustering Based on LSH on Cloud Environments

Cited by: 3
Abstract: In recent years, the wide application of computer and information-processing technology in industrial production, information processing, and other fields has continuously produced large volumes of sequence data that evolve over time, forming time-series data streams. Examples include internet news feed analysis, network intrusion detection, stock market analysis, and sensor network data analysis. Real-time clustering of data streams is a hot topic in current data stream mining research. Because such streams are fast and large and require real-time analysis, the data must often be analyzed on the fly. Although one-pass scanning algorithms meet these requirements, the lack of efficient clustering algorithms for identifying and distinguishing patterns limits their effectiveness and scalability. To address these problems, we propose DLCStream, a distributed data stream clustering algorithm based on locality-sensitive hashing (LSH) for cloud environments. By introducing the Map-Reduce framework and the LSH mechanism, DLCStream can quickly find clustering patterns in a data stream. Theoretical analysis and experimental results show that, compared with the traditional data stream clustering framework CluStream, DLCStream has advantages in parallel processing efficiency, scalability, and clustering quality.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2014, No. 11, pp. 195-202 (8 pages)
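Funding: Supported by the National Key Basic Research Program of China ("973" Program, 2007CB310803), the Key Program of the National Natural Science Foundation of China (61035004), the National Natural Science Foundation of China (60875029), and the Postdoctoral Fund of the Ministry of Science and Technology of China (2013M541005)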
Keywords: data stream clustering; locality-sensitive hashing; Map-Reduce framework; DLCStream algorithm
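
The abstract names two building blocks, LSH and Map-Reduce, without giving details. As a rough illustration of how the two can fit together, the sketch below hashes each point with random-hyperplane LSH in a map step and summarizes each hash bucket in a reduce step. Everything here is an assumption made for illustration: the hash family, the function names (`make_hyperplanes`, `lsh_key`, `map_phase`, `reduce_phase`), and the count-plus-centroid summary are hypothetical and are not taken from the DLCStream paper.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumed hash family: sign of random projection)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(point, hyperplanes):
    """One bit per hyperplane: which side of the plane the point falls on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, point)) >= 0 else 0
        for plane in hyperplanes
    )


def map_phase(points, hyperplanes):
    """Map step: emit (bucket key, point) pairs, so nearby points tend to share a key."""
    for p in points:
        yield lsh_key(p, hyperplanes), p


def reduce_phase(pairs):
    """Reduce step: per bucket, keep a point count and the centroid (hypothetical summary)."""
    stats = {}
    for key, p in pairs:
        if key not in stats:
            stats[key] = [0, [0.0] * len(p)]
        stats[key][0] += 1
        stats[key][1] = [s + x for s, x in zip(stats[key][1], p)]
    return {k: (n, [s / n for s in total]) for k, (n, total) in stats.items()}


if __name__ == "__main__":
    planes = make_hyperplanes(dim=2, n_bits=4, seed=0)
    stream = [[1.0, 1.1], [0.9, 1.0], [-1.0, -1.2], [-1.1, -0.9]]
    for key, (count, centroid) in reduce_phase(map_phase(stream, planes)).items():
        print(key, count, centroid)
```

In a real Map-Reduce deployment the map and reduce steps would run on separate workers, with the framework grouping pairs by key between them; here they are plain Python functions chained in one process to keep the sketch self-contained.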