Local Learning Algorithm for Multi-agent Based on Stochastic Games under Group Environment

Abstract: Drawing on the biological prototype in which individual agents in a group perceive and interact only locally, a multi-agent local learning algorithm is proposed within the stochastic game framework. Each agent adopts a greedy policy to maximize its own payoff while interacting with its local environment. The Nash-Q learning algorithm is improved for zero-sum games and for general-sum games with either a single equilibrium or multiple equilibria. A behavior-modification method is also proposed, and the algorithm is proved to converge with reduced computational complexity.
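For orientation, the baseline that the abstract says is being improved can be sketched as follows. This is a minimal illustration of the standard two-agent Nash-Q backup in the style of Hu and Wellman, not the local variant proposed in the paper. The function names `pure_nash` and `nash_q_update`, the restriction to pure-strategy equilibria, and the rule of taking the first equilibrium found are all assumptions made for this example.

```python
from itertools import product


def pure_nash(q1, q2):
    """Return all pure-strategy Nash equilibria (a1, a2) of a stage game.

    q1[a1][a2] and q2[a1][a2] are the payoff tables of agents 1 and 2.
    Restricting to pure strategies is a simplifying assumption.
    """
    n1, n2 = len(q1), len(q1[0])
    equilibria = []
    for a1, a2 in product(range(n1), range(n2)):
        # a1 must be a best response to a2, and a2 a best response to a1.
        best1 = all(q1[a1][a2] >= q1[b][a2] for b in range(n1))
        best2 = all(q2[a1][a2] >= q2[a1][b] for b in range(n2))
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


def nash_q_update(q1, q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.9):
    """Apply one Nash-Q style backup to both agents' tables in place.

    q1 and q2 map a state to that agent's payoff table over joint actions.
    """
    equilibria = pure_nash(q1[s_next], q2[s_next])
    if not equilibria:
        raise ValueError("no pure-strategy equilibrium in the next state")
    # Assumption: with several equilibria, simply take the first one found.
    e1, e2 = equilibria[0]
    v1 = q1[s_next][e1][e2]
    v2 = q2[s_next][e1][e2]
    q1[s][a1][a2] = (1 - alpha) * q1[s][a1][a2] + alpha * (r1 + gamma * v1)
    q2[s][a1][a2] = (1 - alpha) * q2[s][a1][a2] + alpha * (r2 + gamma * v2)


# Usage: state 1 holds a prisoner's-dilemma-like stage game whose only
# pure equilibrium is the joint action (1, 1).
q1 = {0: [[0.0, 0.0], [0.0, 0.0]], 1: [[3.0, 0.0], [5.0, 1.0]]}
q2 = {0: [[0.0, 0.0], [0.0, 0.0]], 1: [[3.0, 5.0], [0.0, 1.0]]}
nash_q_update(q1, q2, s=0, a1=0, a2=1, r1=2.0, r2=-1.0, s_next=1, alpha=0.5)
```

The equilibrium search over the full joint action table is the step whose cost grows with the number of agents, which is consistent with the abstract's emphasis on reducing computational complexity through local interaction.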

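Authors: Yin Yixin (尹怡欣), Jiang Daoping (江道平), Ban Xiaojuan (班晓娟), Meng Xiangsong (孟祥嵩)

Affiliation: School of Information Engineering, University of Science and Technology Beijing

Source: Information and Control (《信息与控制》), 2008, No. 6, pp. 703-708 (6 pages). Indexed in CSCD and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (60503024, 60374032)

Keywords: multi-agent learning; stochastic game; Nash-Q; local learning

Classification: TP391.9 [Automation and Computer Technology: Computer Application Technology]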

References (10)

[1] Wang B N, Gao Y, Chen Z Q, et al. A two-layered multi-agent reinforcement learning model and algorithm [J]. Journal of Network and Computer Applications, 2007, 30(4): 1366-1376.
[2] Littman M L. Markov games as a framework for multi-agent reinforcement learning [A]. Proceedings of the 11th International Conference on Machine Learning [C]. San Mateo, CA, USA: Morgan Kaufmann, 1994. 157-163.
[3] Hu J, Wellman M P. Experimental results on Q-learning for general-sum stochastic games [A]. Proceedings of the Seventeenth International Conference on Machine Learning [C]. San Mateo, CA, USA: Morgan Kaufmann, 2000. 407-414.
[4] Ishii S, Yoshida W, Yoshimoto J. Control of exploitation-exploration meta-parameter in reinforcement learning [J]. Neural Networks, 2002, 15(4-6): 665-687.
[5] Reynolds C W. Flocks, herds, and schools: A distributed behavioral model [J]. Computer Graphics, 1987, 21(4): 25-34.
[6] Watkins C J C H, Dayan P. Q-learning [J]. Machine Learning, 1992, 8(3-4): 279-292.
[7] Filar J A, Vrieze K. Competitive Markov Decision Processes [M]. New York, USA: Springer, 1996.
[8] Shapley L S. Stochastic games [A]. Proceedings of the National Academy of Sciences of the United States of America [C]. Princeton, NJ, USA: Princeton University Press, 1953. 1095-1100.
[9] Fink A M. Equilibrium in a stochastic n-person game [J]. Journal of Science in Hiroshima University, 1964, A128(1): 89-93.
[10] Bowling M, Veloso M. Multiagent learning using a variable learning rate [J]. Artificial Intelligence, 2002, 136(2): 215-250.

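Related literature

[1] Expert systems and artificial intelligence [J]. Electronic Science and Technology Abstracts, 2002, 0(11): 143-144.
[2] Li Linna. Research on a multi-agent learning algorithm based on a variable learning rate [J]. Journal of Changchun Institute of Technology (Natural Science Edition), 2009, 10(4): 81-83.
[3] Wu Jun, Xu Xin, Wang Jian, He Hangen. A survey of research progress in reinforcement learning for multi-robot systems [J]. Control and Decision, 2011, 26(11): 1601-1610. Cited by: 22
[4] Zhao Haiyan, Cao Jian, Xu Wenbo. Service composition based on a multi-agent learning mechanism [J]. Computer Engineering and Science, 2013, 35(9): 117-121.
[5] Xiang Ruxi, Zhu Xifang, Wu Feng, Xu Qingquan. Research on a blurred-image quality assessment algorithm using local perceptual sharpness based on visual saliency [J]. Microelectronics and Computer, 2016, 33(11): 40-44. Cited by: 1
[6] Cai Juan, Cai Jianyong, Liao Xiaodong, Huang Haitao, Ding Qiaojun. A preliminary study of gesture recognition based on convolutional neural networks [J]. Computer Systems and Applications, 2015, 24(4): 113-117. Cited by: 52
[7] Liu Fei, Zeng Guangzhou, Song Yanwei. A reinforcement learning model and algorithm for multi-agent cooperation [J]. Computer Science, 2006, 33(12): 156-158. Cited by: 6
[8] Wang Fengqin, Ren Tingyou, Wu Zhongqiang. Multi-model fuzzy control based on genetic algorithms [J]. Techniques of Automation and Applications, 2005, 24(1): 5-7.
[9] Zheng Yanbin, Niu Liping. Team CGA learning based on stochastic games [J]. Computer Engineering and Applications, 2009, 45(23): 52-54.
[10] Wang Qian, Xie Yangqun. An analysis of group information sharing based on P2P technology [J]. Research on Library Science, 2016(7): 50-54.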
