Abstract
The factorization machine (FM) model effectively addresses the sparsity problem that arises when combining features of high-dimensional data, and it offers high prediction accuracy and computational efficiency. It has therefore been widely studied and applied in advertising click-through-rate (CTR) prediction and in recommender systems. Reviewing the research progress on FM and its related models helps promote further improvement and application of the model. This survey first compares the FM model with polynomial regression models and factorization models to illustrate its flexibility and generality. For width extension, it summarizes the methods, strategies, and key technologies from the perspectives of high-order feature interaction, field-aware feature interaction, and hierarchical feature interaction, as well as feature extraction, combination, intelligent selection, and enhancement based on feature engineering. It then compares and analyzes the approaches to integrating FM with other models and their characteristics, especially integration with deep learning models, which suggests how traditional models can be extended in depth. The optimization and learning methods for FM models, and implementations based on different parallel and distributed computing frameworks, are summarized, compared, and analyzed. Finally, the open difficulties, active research topics, and development trends for FM models are discussed.
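To make the claims about pairwise feature interaction and computational efficiency concrete, the following is a minimal sketch of the standard second-order FM scoring function. It is not taken from the survey itself. The function names, the toy parameter values, and the pure-Python list representation are all assumptions chosen for illustration. The first function sums over every feature pair explicitly, as a polynomial regression with factorized weights would. The second uses the well-known algebraic rearrangement that computes the same quantity in time linear in the number of features.

```python
from itertools import combinations


def fm_predict_naive(x, w0, w, V):
    """Second-order FM score with an explicit sum over all feature pairs.

    x  : list of feature values (length n)
    w0 : global bias
    w  : list of first-order weights (length n)
    V  : n-by-k list of latent factor vectors, one row per feature
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):
        # The interaction weight of features i and j is the dot product <v_i, v_j>.
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict_fast(x, w0, w, V):
    """Same score, using 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        s = sum(terms)
        s_sq = sum(t * t for t in terms)
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical toy parameters: 3 features, 2 latent dimensions.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, -0.2, 0.3]
V = [[1.0, 2.0], [0.5, -1.0], [1.0, 0.5]]

naive_score = fm_predict_naive(x, w0, w, V)
fast_score = fm_predict_fast(x, w0, w, V)
```

Because the zero-valued feature contributes nothing to either form, sparse inputs only require work proportional to the number of non-zero features, which is one reason the model suits the sparse, high-dimensional data described above.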
Authors
YAN Cai-Rong (燕彩蓉)
ZHOU Ling-Jie (周灵杰)
ZHANG Qing-Long (张青龙)
LI Xiao-Lin (李晓林)
Affiliations: School of Computer Science and Technology, Donghua University, Shanghai 201620, China; State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China
Source
Journal of Software (《软件学报》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2019, Issue 3, pp. 822-844 (23 pages)
Funding
National Natural Science Foundation of China (61402100)
Fundamental Research Funds for the Central Universities (2232016D3-11)
Keywords
factorization machine
recommender system
CTR prediction
feature engineering
deep learning
parallel and distributed computing