Journal Article

Distributed Stochastic Gradient Descent with Discriminative Aggregating (基于差异合并的分布式随机梯度下降算法)  Cited by: 20
Abstract: Large-scale stochastic gradient descent has become a popular research topic in recent years, and improving its convergence speed and performance is of significant practical value. Large-scale stochastic gradient descent algorithms can be roughly divided into two classes: data-parallel and model-parallel. Within data-parallel algorithms, model aggregating is a commonly used strategy. Current model-aggregating methods simply average the different models together; although this has proved effective, plain averaging ignores the intrinsic differences between the participating models, which slows convergence and hurts the performance and stability of the resulting model. To address this, this paper proposes, in a distributed setting, an aggregating strategy based on model differences. The differences are exploited in two respects: the difference in each model's error rate on its own training data, and the difference in the aggregating strategy across different stages of training. In addition, the combined model is normalized by projecting it onto a sphere with the same Frobenius norm as the pre-merge local models, which further improves convergence. Experiments on the Epsilon, RCV1-v2, and URL data sets verify that the proposed discriminative-aggregating distributed stochastic gradient descent algorithm converges faster and yields better models than plain averaging.
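The abstract describes the aggregation step in enough detail to sketch its shape. The following is a minimal illustrative sketch, not the authors' implementation: the function name `aggregate`, the softmax-style weighting with a `temperature` knob, and the use of the mean local Frobenius norm as the projection radius are assumptions filled in for illustration, and the paper's stage-dependent weighting schedule is not reproduced.

```python
# Illustrative sketch (not the paper's exact formulation): error-aware model
# aggregation followed by Frobenius-norm re-projection, per the abstract.
import numpy as np

def aggregate(models, error_rates, temperature=1.0):
    """Merge local models with weights derived from their training error rates,
    then rescale the merged model back to the local models' Frobenius-norm scale.

    models      : list of K weight matrices (np.ndarray), one per worker
    error_rates : list of K training error rates in [0, 1]
    temperature : softness of the weighting (hypothetical knob, not from the paper)
    """
    errors = np.asarray(error_rates, dtype=float)
    # Lower training error -> larger aggregation weight (softmax over -error).
    logits = -errors / temperature
    logits -= logits.max()                      # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()

    # Discriminative (weighted) average instead of a plain mean.
    merged = sum(w * m for w, m in zip(weights, models))

    # Project the merged model onto a sphere whose Frobenius norm matches the
    # local models; the mean local norm is used as the target radius here
    # (an assumption -- the paper projects relative to each local model's norm).
    target_norm = np.mean([np.linalg.norm(m) for m in models])
    merged_norm = np.linalg.norm(merged)
    if merged_norm > 0:
        merged = merged * (target_norm / merged_norm)
    return merged

# Example: three workers' models with different training error rates.
rng = np.random.default_rng(0)
local_models = [rng.normal(size=(4, 3)) for _ in range(3)]
local_errors = [0.12, 0.25, 0.40]
combined = aggregate(local_models, local_errors)
```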
Source: Chinese Journal of Computers (《计算机学报》), EI, CSCD, Peking University Core Journal, 2015, No. 10, pp. 2054-2063 (10 pages).
Funding: National Basic Research Program of China (973 Program) (2012CB316303, 2014CB340401); National High-Tech Research and Development Program of China (863 Program) sub-project (2012AA011003); Key Program of the National Natural Science Foundation of China (61232010); National Natural Science Foundation of China Fund for Distinguished Young Scholars (61203298, 61003166).
Keywords: distributed; stochastic gradient descent; normalization; model aggregating; social networks; social computing

