
Experimental comparison of two acceleration approaches for K-nearest neighbors

Cited by: 3
Abstract: K-nearest neighbors (K-NN) is a well-known data mining algorithm with a wide range of applications. The idea behind K-NN is simple, and it is easy to implement. Both its time complexity and its space complexity are O(n), where n is the number of instances in the training set. When the training set is large, and especially when it is a big data set, K-NN becomes very inefficient or even impracticable. This paper experimentally compares two approaches for accelerating K-NN: the condensed nearest neighbor (CNN) method and MapReduce-based K-NN. Specifically, K-NN is implemented with MapReduce in a Hadoop environment and compared experimentally with CNN on 8 data sets. Some valuable conclusions are obtained, which may be useful to researchers in related fields.
Source: Journal of Hebei University (Natural Science Edition), 2016, No. 6, pp. 650-656 (7 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
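Funding: National Natural Science Foundation of China (71371063); Key Project of Science and Technology Research of Higher Education Institutions of Hebei Province (ZD20131028); Hebei University Graduate Innovation Project (X2016059).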
Keywords: K-nearest neighbors; data mining; MapReduce; Hadoop
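
The abstract names three techniques without showing them: brute-force K-NN, condensed nearest neighbor (CNN), and MapReduce-based K-NN. The following is a minimal single-process Python sketch of each, not the paper's implementation. It assumes Euclidean distance, majority voting, Hart-style condensing with a 1-NN rule, and a map/reduce split in which each mapper returns its local k nearest candidates; the abstract confirms none of these details, and every function name here is hypothetical. No Hadoop API is used.

```python
import heapq
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length feature tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k):
    """Brute-force K-NN: scans all n training instances for a single query."""
    nearest = heapq.nsmallest(k, train, key=lambda item: distance(item[0], query))
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def condense(train):
    """Hart-style condensing (assumed variant): keep only the instances that
    the current store misclassifies under the 1-NN rule."""
    store = [train[0]]
    changed = True
    while changed:  # repeat passes until a full pass adds nothing
        changed = False
        for item in train:
            if item in store:
                continue
            if knn_predict(store, item[0], 1) != item[1]:
                store.append(item)
                changed = True
    return store


def map_phase(split, query, k):
    """Mapper sketch: the k nearest (distance, label) pairs within one data split."""
    return heapq.nsmallest(k, ((distance(x, query), y) for x, y in split))


def reduce_phase(partials, k):
    """Reducer sketch: merge the per-split candidates, then vote among the
    k nearest overall."""
    merged = heapq.nsmallest(k, (pair for part in partials for pair in part))
    return Counter(label for _, label in merged).most_common(1)[0][0]
```

Here `train` is a list of `(features, label)` pairs. Calling `knn_predict(condense(train), query, k)` classifies against the reduced store, while calling `map_phase` on each split and passing the results to `reduce_phase` gives the same k nearest neighbors as the brute-force scan, because the global k nearest are always contained in the union of the per-split k nearest. The two approaches therefore trade off differently: condensing shrinks the training set and may change predictions, whereas the map/reduce split keeps the full set and distributes the scan.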
