
Research on an Improved Collaborative Filtering Algorithm Based on Spark   Cited by: 8

AN IMPROVED COLLABORATIVE FILTERING ALGORITHM BASED ON SPARK
Abstract: To improve the scalability of collaborative filtering in big-data environments and its recommendation accuracy on high-dimensional sparse data, a hierarchical co-clustering collaborative filtering algorithm is implemented on the Spark platform. Co-clustering is applied to the dataset to mitigate its sparsity and to build a clustering model. An analytic hierarchy model, combined with a rating-density analysis, then determines the weights of the latent user and item categories in the co-clustering model. These weights are used to compute item similarities and to construct each item's nearest-neighbor set, from which online recommendations are made. Experiments on MovieLens datasets of different scales provided by GroupLens show that the improved algorithm clearly improves recommendation accuracy and offers good recommendation efficiency and scalability in a distributed environment.
Source: Computer Applications and Software (计算机应用与软件), 2017, No. 5, pp. 247-254 and 278, 9 pages in total
Funding: Tianjin Science and Technology Plan Project (14ZCDGSF00124); Youth Science Foundation of Hebei Province (F2015202311)
Keywords: collaborative filtering; co-clustering; analytic hierarchy model; Spark
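The abstract names the stages of the pipeline (co-clustering, AHP plus rating-density weighting, item similarity, nearest-neighbor sets) but not their formulas. The following is a minimal single-machine sketch of how such stages could fit together. It is not the authors' method and not Spark code, and every specific in it is an assumption: the co-clustering result is given as fixed, hypothetical assignments (`USER_CLUSTERS`, `ITEM_CLUSTERS`); AHP only weights two made-up criteria (mean block rating density and relative cluster size) using the row-geometric-mean approximation of the priority vector; each user inherits its cluster's score as a weight; and item similarity is a user-weighted adjusted cosine. The toy ratings are invented for illustration.

```python
import math

# Hypothetical toy ratings, (user, item) -> score; not MovieLens data.
RATINGS = {
    ("u1", "i1"): 5, ("u1", "i2"): 4, ("u1", "i3"): 1,
    ("u2", "i1"): 4, ("u2", "i2"): 5,
    ("u3", "i3"): 5,
    ("u4", "i2"): 2, ("u4", "i3"): 4, ("u4", "i4"): 5,
}

# Assumed output of a co-clustering step (fixed here, not computed).
USER_CLUSTERS = {"A": ["u1", "u2"], "B": ["u3", "u4"]}
ITEM_CLUSTERS = {"X": ["i1", "i2"], "Y": ["i3", "i4"]}


def ahp_weights(pairwise):
    """AHP priority vector approximated by normalized row geometric means."""
    n = len(pairwise)
    gm = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]


def density(users, items, ratings):
    """Share of (user, item) cells in one co-cluster block that are rated."""
    observed = sum(1 for u in users for i in items if (u, i) in ratings)
    return observed / (len(users) * len(items))


def user_weights(user_clusters, item_clusters, ratings, criteria_w):
    """Give every user its cluster's AHP-weighted score on two hypothetical
    criteria: mean block density and relative cluster size."""
    n_users = sum(len(us) for us in user_clusters.values())
    weights = {}
    for users in user_clusters.values():
        dens = sum(density(users, items, ratings)
                   for items in item_clusters.values()) / len(item_clusters)
        size = len(users) / n_users
        score = criteria_w[0] * dens + criteria_w[1] * size
        for u in users:
            weights[u] = score
    return weights


def user_means(ratings):
    """Mean rating of each user, used to center ratings."""
    sums, counts = {}, {}
    for (u, _), r in ratings.items():
        sums[u] = sums.get(u, 0) + r
        counts[u] = counts.get(u, 0) + 1
    return {u: sums[u] / counts[u] for u in sums}


def adjusted_cosine(i, j, ratings, means, w):
    """User-weighted adjusted cosine between items i and j over co-raters;
    returns 0.0 when there are no co-raters or a side has zero norm."""
    common = [u for (u, it) in ratings if it == i and (u, j) in ratings]
    di = {u: ratings[(u, i)] - means[u] for u in common}
    dj = {u: ratings[(u, j)] - means[u] for u in common}
    num = sum(w[u] * di[u] * dj[u] for u in common)
    norm_i = math.sqrt(sum(w[u] * di[u] ** 2 for u in common))
    norm_j = math.sqrt(sum(w[u] * dj[u] ** 2 for u in common))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return num / (norm_i * norm_j)


def neighbors(target, ratings, w, k):
    """Top-k items most similar to target, as (similarity, item) pairs."""
    means = user_means(ratings)
    items = {it for (_, it) in ratings}
    sims = [(adjusted_cosine(target, j, ratings, means, w), j)
            for j in items if j != target]
    return sorted(sims, reverse=True)[:k]


# Hypothetical judgment: density is 3x as important as cluster size.
CRITERIA_W = ahp_weights([[1, 3], [1 / 3, 1]])
W = user_weights(USER_CLUSTERS, ITEM_CLUSTERS, RATINGS, CRITERIA_W)
print(neighbors("i1", RATINGS, W, k=2))
```

Running it prints the two nearest neighbors of `i1` as `(similarity, item)` pairs. For a perfectly consistent comparison matrix, such as any 2×2 reciprocal one, the normalized row geometric means equal the principal-eigenvector AHP weights (here 0.75 and 0.25); for larger, inconsistent matrices they are only an approximation. A Spark version would distribute the pairwise similarity computation across the cluster, which this sketch does not attempt.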
