Abstract
Collaborative filtering is one of the oldest algorithms used in recommender systems. It recommends items to a target user on the basis of nearest-neighbor users or similar items, so the similarity measure is its key component. Because content spreads very quickly on the Internet, the period during which an item stays popular has become shorter. This distorts the measured similarity between users, and traditional collaborative filtering no longer produces good recommendations. To improve the similarity measure, this paper adds an item popularity ("hot degree") factor to the Pearson similarity, which refines the user-similarity computation and improves recommendation quality. The improved algorithm is implemented with big-data technology on a Spark distributed platform and validated on the MovieLens movie-recommendation dataset, using precision, recall and mean absolute error (MAE) as evaluation metrics. Experimental results show that the improved algorithm achieves considerably higher precision and recall than the traditional algorithm, together with a lower MAE.
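The abstract does not state the exact form of the item hot factor, so the following is only a minimal Python sketch under an explicit assumption. It computes user-user Pearson similarity over co-rated items and, optionally, scales each co-rated item by a popularity penalty of 1 / log(1 + n_i), where n_i is the number of users who rated item i. This inverse-popularity weighting is a common choice and is not necessarily the authors' formula. It also shows the three evaluation metrics named in the abstract (precision, recall, MAE). The function names (`item_popularity`, `hot_weight`, `pearson`, `mae`, `precision_recall`) and the toy ratings are hypothetical. On the Spark platform described in the paper, the per-user-pair computation would be distributed, and that layer is omitted here.

```python
import math
from collections import Counter


def item_popularity(ratings):
    """Count how many users rated each item (a simple 'hot degree')."""
    counts = Counter()
    for user_items in ratings.values():
        counts.update(user_items.keys())
    return counts


def hot_weight(n_raters):
    """Hypothetical hot-item penalty: the more raters, the smaller the weight.

    ASSUMPTION: 1 / log(1 + n). The paper's exact hot factor is not given here.
    """
    return 1.0 / math.log(1.0 + n_raters)


def pearson(ratings, u, v, weights=None):
    """Pearson correlation of users u and v over their co-rated items.

    With weights=None every item counts 1.0 (classic Pearson); otherwise each
    co-rated item i is scaled by weights[i] (e.g. hot_weight of its popularity).
    """
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0  # too little overlap to correlate
    w = {i: 1.0 if weights is None else weights[i] for i in common}
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum(w[i] * (ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = math.sqrt(sum(w[i] * (ratings[u][i] - mu) ** 2 for i in common))
    dv = math.sqrt(sum(w[i] * (ratings[v][i] - mv) ** 2 for i in common))
    if du == 0.0 or dv == 0.0:
        return 0.0  # one user rated all common items identically
    return num / (du * dv)


def mae(predicted, actual):
    """Mean absolute error between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def precision_recall(recommended, relevant):
    """Precision and recall of a top-N list against the user's relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data invented for illustration (not MovieLens).
    ratings = {
        "alice": {"m1": 5, "m2": 3, "m3": 4, "m4": 1},
        "bob": {"m1": 4, "m2": 2, "m3": 5, "m4": 1},
        "carol": {"m1": 5, "m3": 4},
    }
    pop = item_popularity(ratings)
    w = {item: hot_weight(n) for item, n in pop.items()}
    print("plain Pearson:", pearson(ratings, "alice", "bob"))
    print("hot-weighted: ", pearson(ratings, "alice", "bob", w))
    print("MAE:", mae([3.5, 4.0], [4, 4]))
    print("P/R:", precision_recall(["m1", "m2"], ["m2", "m3", "m4"]))
```

Setting every weight to 1.0 recovers classic Pearson, so the effect of the popularity factor can be isolated by comparing the two calls. Because the weights are positive, the weighted score still lies in [-1, 1].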
Authors
SUN Hong (孙红)
HAN Zhen (韩震)
Affiliations: 1. University of Shanghai for Science and Technology, Shanghai 200093, China; 2. Shanghai Key Lab of Modern Optical System, Shanghai 200093, China
Source
Journal of Chinese Computer Systems (小型微型计算机系统)
Indexed in: CSCD; Peking University Core Journals
2018, No. 4, pp. 638-643 (6 pages)
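Funding
National Natural Science Foundation of China (61472256, 61170277)
Key Scientific Research Innovation Project of the Shanghai Municipal Education Commission (12zz137)
Hujiang Foundation (C14002)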