Abstract
To improve the scalability of collaborative filtering in big-data environments and its recommendation accuracy on high-dimensional sparse data, a hierarchical co-clustering collaborative filtering algorithm is implemented on the Spark platform. Co-clustering is used to reduce the sparsity of the data set and to build a clustering model. An analytic hierarchy model, combined with rating density, is then used to weight the latent user and item categories in the co-clustering model. Item similarities are computed from these weights, and a nearest-neighbor set is built for each item to complete online recommendation. Experiments on MovieLens data sets of different sizes provided by GroupLens show that the improved algorithm clearly improves recommendation accuracy and offers good recommendation efficiency and scalability in a distributed environment.
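The final stage described in the abstract (computing item similarities, building each item's nearest-neighbor set, and predicting a rating from it) can be illustrated with a minimal sketch. This is a plain item-based collaborative filter using unweighted cosine similarity, not the paper's algorithm: the co-clustering step, the analytic hierarchy weighting, and the Spark implementation are all omitted. The toy ratings and the names `RATINGS`, `item_vectors`, `cosine`, `nearest_neighbors`, and `predict` are assumptions made for illustration.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. Hypothetical data, not taken from MovieLens.
RATINGS = {
    "u1": {"A": 5.0, "B": 4.0, "C": 1.0},
    "u2": {"A": 4.0, "B": 5.0, "C": 2.0},
    "u3": {"A": 1.0, "B": 2.0, "C": 5.0},
    "u4": {"A": 5.0, "B": 4.0},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, score in row.items():
            items.setdefault(item, {})[user] = score
    return items


def cosine(a, b):
    """Cosine similarity over the users who rated both items; 0.0 if none did."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def nearest_neighbors(items, target, k):
    """Return the k items most similar to `target` as (item, similarity) pairs."""
    sims = [
        (other, cosine(items[target], vec))
        for other, vec in items.items()
        if other != target
    ]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


def predict(ratings, items, user, target, k=2):
    """Similarity-weighted average of the user's ratings on the target's neighbors."""
    rated = ratings[user]
    neighbors = [
        (item, sim)
        for item, sim in nearest_neighbors(items, target, k)
        if item in rated and sim > 0
    ]
    total = sum(sim for _, sim in neighbors)
    if total == 0:
        return None  # no usable neighbor: the sketch makes no prediction
    return sum(sim * rated[item] for item, sim in neighbors) / total


if __name__ == "__main__":
    items = item_vectors(RATINGS)
    print(nearest_neighbors(items, "C", 2))
    print(predict(RATINGS, items, "u4", "C"))
```

In the approach the abstract outlines, the similarity would instead be computed from the weighted latent user and item categories of the co-clustering model, which is what addresses sparsity; the sketch only shows the neighbor-based prediction that such similarities feed into.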
Source
Computer Applications and Software (《计算机应用与软件》)
2017, No. 5, pp. 247-254, 278 (9 pages)
Funding
Tianjin Science and Technology Program (14ZCDGSF00124)
Hebei Province Youth Science Foundation (F2015202311)
Keywords
Collaborative filtering
Co-clustering
Analytic hierarchy model
Spark