Journal Article

Distributed Stochastic Gradient Descent with Discriminative Aggregating (基于差异合并的分布式随机梯度下降算法)  Cited by: 20
Abstract: Large-scale stochastic gradient descent has become a popular research topic in recent years, and improving its convergence speed and performance is of significant practical value. Large-scale stochastic gradient descent algorithms can be roughly divided into two classes: data-parallel and model-parallel. Within data-parallel algorithms, model aggregating is a commonly used strategy. Current model-aggregating methods simply average the different models together; although this has proved effective, plain averaging ignores the intrinsic differences between the participating models, which slows convergence and hurts the performance and stability of the resulting model. To address this, this paper proposes, in a distributed setting, an aggregating strategy based on model differences. The differences are exploited in two respects: the difference in each model's error rate on its own training data, and the difference in the aggregating strategy across different stages of training. In addition, the combined model is normalized by projecting it onto a sphere with the same Frobenius norm as the pre-merge local models, which further improves convergence. Experiments on the Epsilon, RCV1-v2, and URL data sets verify that the proposed discriminative-aggregating distributed stochastic gradient descent algorithm converges faster and yields better models than plain averaging.
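The abstract describes the aggregation step in enough detail to sketch its shape. The following is a minimal illustrative sketch, not the authors' implementation: the function name `aggregate`, the softmax-style weighting with a `temperature` knob, and the use of the mean local Frobenius norm as the projection radius are assumptions filled in for illustration, and the paper's stage-dependent weighting schedule is not reproduced.

```python
# Illustrative sketch (not the paper's exact formulation): error-aware model
# aggregation followed by Frobenius-norm re-projection, per the abstract.
import numpy as np

def aggregate(models, error_rates, temperature=1.0):
    """Merge local models with weights derived from their training error rates,
    then rescale the merged model back to the local models' Frobenius-norm scale.

    models      : list of K weight matrices (np.ndarray), one per worker
    error_rates : list of K training error rates in [0, 1]
    temperature : softness of the weighting (hypothetical knob, not from the paper)
    """
    errors = np.asarray(error_rates, dtype=float)
    # Lower training error -> larger aggregation weight (softmax over -error).
    logits = -errors / temperature
    logits -= logits.max()                      # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()

    # Discriminative (weighted) average instead of a plain mean.
    merged = sum(w * m for w, m in zip(weights, models))

    # Project the merged model onto a sphere whose Frobenius norm matches the
    # local models; the mean local norm is used as the target radius here
    # (an assumption -- the paper projects relative to each local model's norm).
    target_norm = np.mean([np.linalg.norm(m) for m in models])
    merged_norm = np.linalg.norm(merged)
    if merged_norm > 0:
        merged = merged * (target_norm / merged_norm)
    return merged

# Example: three workers' models with different training error rates.
rng = np.random.default_rng(0)
local_models = [rng.normal(size=(4, 3)) for _ in range(3)]
local_errors = [0.12, 0.25, 0.40]
combined = aggregate(local_models, local_errors)
```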
Source: Chinese Journal of Computers (《计算机学报》), EI, CSCD, Peking University Core Journal, 2015, No. 10, pp. 2054-2063 (10 pages).
Funding: National Basic Research Program of China (973 Program) (2012CB316303, 2014CB340401); National High-Tech Research and Development Program of China (863 Program) sub-project (2012AA011003); Key Program of the National Natural Science Foundation of China (61232010); National Natural Science Foundation of China Fund for Distinguished Young Scholars (61203298, 61003166).
Keywords: distributed; stochastic gradient descent; normalization; model aggregating; social networks; social computing

