Collaborative multi-agent reinforcement learning based on experience propagation

Cited by: 5

Abstract: For multi-agent reinforcement learning in Markov games, knowledge extraction and sharing are key research problems. State list extracting computes the optimal shared state path from state trajectories that contain cycles. The state list extracting algorithm checks the cyclic state lists of the current state in the state trajectory and condenses the optimal action set of that state. By reinforcing the selected optimal action, the action policy for cyclic states is optimized gradually. The extracted state lists are learned repeatedly and used as experience knowledge shared by the team, and this experience sharing lets agents converge faster. Predator-prey competition games are used in the experiments. The experimental results show that the proposed algorithms overcome the lack of experience in the initial stage, speed up learning, and improve performance.
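The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the ideas it names, under stated assumptions: a trajectory is a list of (state, action) pairs; a revisited state marks a cycle, which is cut away to leave a loop-free state list; the action kept for each state on that list is treated as its preferred action and reinforced with a fixed bonus; and all agents in a team read and write one shared Q-table. The helper names `extract_state_list`, `reinforce_shared`, and `q_update`, and the `bonus` parameter, are hypothetical; only `q_update` follows the standard one-step Q-learning rule.

```python
from collections import defaultdict


def extract_state_list(trajectory):
    """Condense a trajectory of (state, action) pairs into a loop-free path.

    Hypothetical sketch: when a state is revisited, the segment from its
    earlier visit up to the revisit is a cycle, so that segment is dropped
    and the action taken at the latest visit is kept for the state.
    """
    path = []   # loop-free list of (state, action) pairs
    index = {}  # state -> its position in path
    for state, action in trajectory:
        if state in index:
            cut = index[state]
            # Forget every state inside the discarded cycle.
            for s, _ in path[cut:]:
                del index[s]
            del path[cut:]
        index[state] = len(path)
        path.append((state, action))
    return path


def reinforce_shared(q_table, path, bonus=0.1):
    """Add a fixed bonus to Q(s, a) for every pair on the condensed path.

    The bonus scheme is an assumption; q_table is one dict shared by all
    agents of a team, so any agent's reinforcement is visible to the rest.
    """
    for state, action in path:
        q_table[(state, action)] += bonus


def q_update(q_table, actions, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on the shared table."""
    best_next = max(q_table[(s_next, b)] for b in actions)
    q_table[(s, a)] += alpha * (r + gamma * best_next - q_table[(s, a)])


# Usage: B is visited twice, so the cycle B -> C -> B is cut away.
trajectory = [("A", "right"), ("B", "up"), ("C", "left"),
              ("B", "down"), ("D", "right")]
path = extract_state_list(trajectory)
print(path)  # [('A', 'right'), ('B', 'down'), ('D', 'right')]

team_q = defaultdict(float)  # one table shared by the whole team
reinforce_shared(team_q, path, bonus=0.5)
q_update(team_q, ["up", "down", "left", "right"], "A", "right", 1.0, "B")
```

Cutting each cycle at the revisit point keeps the last action tried in a state, which is the one that eventually led out of the loop; whether the paper selects the optimal action this way is not stated in the abstract.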

Authors: Min Fang, Frans C.A. Groen

Affiliations: School of Computer Science and Technology; Informatics Institute

Source: Journal of Systems Engineering and Electronics (系统工程与电子技术（英文版）), indexed in SCIE, EI, and CSCD, 2013, Issue 4, pp. 683-689 (7 pages)

Funding: supported by the National Natural Science Foundation of China (61070143, 61173088)

Keywords: multi-agent; Q learning; state list extracting; experience sharing

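CLC numbers: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TH166 [Mechanical Engineering: Mechanical Manufacturing and Automation]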
