Static Restart Stochastic Gradient Descent Algorithm Based on Image Question Answering (Cited by: 5)
Abstract: Image question answering is a multimodal learning task at the intersection of computer vision and natural language processing. With the breakthroughs achieved by deep neural networks, it has become a focus of research attention, and researchers have proposed numerous models to solve it. Stacked attention networks (SANs) are among the most representative of these models, achieving state-of-the-art results on four public visual question answering datasets. Despite this strong performance, the diversity of questions and the sparsity of answers prevent the model from fully learning the general patterns of the corpus, so it easily falls into poor local optima, which leads to a higher question answering error rate. By analyzing the causes of these errors and observing how the model processes image question answering, we find that the baseline optimizer, stochastic gradient descent with momentum, has shortcomings when optimizing SANs. To address this, we propose a static restart stochastic gradient descent algorithm for image question answering. Experimental results show that its accuracy is 0.29% higher than the baseline's, although it converges more slowly. To verify that the improvement is significant, we perform statistical hypothesis testing on the experimental results; the t-test results show that the improvement is extremely significant. To verify its effectiveness among algorithms of the same kind, we compare it experimentally with state-of-the-art first-order optimization algorithms; the results and analysis show that it is more effective at solving image question answering. To verify its generalization performance and practical value, we conduct an image recognition experiment on the classic CIFAR-10 dataset; the experimental results and t-test results show that it generalizes well and has good practical value.
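The record does not spell out the restart schedule itself, but a static restart of momentum SGD is typically implemented by returning the learning rate to its initial value and clearing the momentum buffers at fixed, pre-chosen epochs. The PyTorch sketch below illustrates that idea only; the restart epochs, decay factor, toy model, and synthetic data are illustrative assumptions, not the paper's actual setup.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Linear(10, 2)          # toy stand-in for the real network (SANs)
    criterion = nn.CrossEntropyLoss()
    initial_lr, decay = 0.1, 0.95     # assumed hyperparameters
    restart_epochs = {30, 60}         # assumed fixed ("static") restart points
    optimizer = torch.optim.SGD(model.parameters(), lr=initial_lr, momentum=0.9)

    last_restart = 0
    for epoch in range(100):
        if epoch in restart_epochs:
            optimizer.state.clear()   # forget accumulated momentum buffers
            last_restart = epoch      # restart the decay schedule from here
        # decayed learning rate, measured from the most recent restart
        for group in optimizer.param_groups:
            group["lr"] = initial_lr * decay ** (epoch - last_restart)
        # one synthetic mini-batch per "epoch", purely for illustration
        x = torch.randn(32, 10)
        y = torch.randint(0, 2, (32,))
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

Per-run accuracies of the baseline and the restart variant could then be compared with a paired t-test (e.g. scipy.stats.ttest_rel), which is the kind of significance check the abstract describes.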
Authors: Li Shengdong (李胜东); Lü Xueqiang (吕学强)
Affiliations: School of Information, Renmin University of China, Beijing 100872; Department of Computer Engineering, Langfang Yanjing Vocational Technical College, Langfang, Hebei 065200; Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science and Technology University), Beijing 100101
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2019, No. 5, pp. 1092-1100 (9 pages). Indexed in EI and CSCD; Peking University core journal.
Funding: National Natural Science Foundation of China (61671070); Key Project of the State Language Commission's 13th Five-Year Research Plan, 2017 (ZDI135-53); Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDD201505)
Keywords: image question answering; stacked attention networks (SANs); momentum; static restart; stochastic gradient descent (SGD)