Abstract
Large-scale stochastic gradient descent (SGD) has been an active research topic in recent years, and improving its convergence speed and performance has significant practical value. Large-scale SGD algorithms fall into two broad classes: data-parallel and model-parallel. In data-parallel algorithms, model aggregation is a commonly used strategy. Existing aggregation-based SGD algorithms generally combine models by uniform averaging. This works reasonably well, but it ignores the intrinsic differences among the models being combined, which slows convergence and degrades the performance and stability of the resulting model. To address this, the paper proposes, for the distributed setting, an aggregation strategy based on model differences. The differences are reflected in two respects: first, the error rate of each model on its own training data; second, the aggregation strategy used at different stages of training, since the importance of each model changes as training proceeds. In addition, the combined model is normalized: it is projected onto the sphere whose radius equals the Frobenius norm of the pre-aggregation (local) model, which improves convergence. Experiments on three data sets, Epsilon, RCV1-v2 and URL, verify that the proposed distributed SGD algorithm with discriminative aggregation converges faster and yields better model performance than uniform averaging.
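The two ingredients named in the abstract, error-aware aggregation and norm-preserving projection, can be sketched as follows. This is a minimal illustration, not the paper's algorithm. The abstract does not give the weighting formula, so the rule used here (weights proportional to one minus the training error rate) is an assumption, as are the function names `aggregate` and `project_to_norm`. The stage-dependent part of the strategy is not modeled. Models are flat parameter lists, for which the Frobenius norm equals the Euclidean norm.

```python
import math


def frobenius_norm(w):
    """Frobenius norm of a flat parameter list (same as the Euclidean norm)."""
    return math.sqrt(sum(x * x for x in w))


def aggregate(models, error_rates):
    """Combine models with weights proportional to (1 - training error rate).

    ASSUMPTION: this weighting rule is illustrative only; the abstract says
    training error is taken into account but does not state the formula.
    With equal error rates it reduces to uniform averaging.
    """
    scores = [1.0 - e for e in error_rates]
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one model needs an error rate below 1")
    weights = [s / total for s in scores]
    dim = len(models[0])
    return [sum(wt * m[i] for wt, m in zip(weights, models)) for i in range(dim)]


def project_to_norm(merged, local):
    """Rescale the merged model so its norm matches the local model's norm."""
    merged_norm = frobenius_norm(merged)
    if merged_norm == 0.0:
        return list(merged)  # the zero vector has no direction to rescale
    scale = frobenius_norm(local) / merged_norm
    return [scale * x for x in merged]


# Hypothetical local models from three workers and their training error rates.
models = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
error_rates = [0.10, 0.30, 0.20]

merged = aggregate(models, error_rates)
# Each worker gets the merged direction, rescaled to its own pre-merge norm.
per_worker = [project_to_norm(merged, m) for m in models]
```

Each worker would then continue training from its entry in `per_worker`, which shares the merged direction but keeps that worker's own pre-aggregation norm.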
Source
Chinese Journal of Computers (《计算机学报》)
Indexed in: EI, CSCD, PKU Core (北大核心)
2015, No. 10, pp. 2054-2063 (10 pages)
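Funding
National Key Basic Research Program of China ("973" Program) (2012CB316303, 2014CB340401)
National High Technology Research and Development Program of China ("863" Program), sub-project (2012AA011003)
National Natural Science Foundation of China, Key Program (61232010)
National Natural Science Foundation of China, Outstanding Young Scholars Fund (61203298, 61003166)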
Keywords
distributed
stochastic gradient descent
normalization
model aggregation
social networks
social computing