Discovering Coordinated Subtask Patterns from a Graphical Multi-Agent Task Modeling Perspective
Abstract Numerous multi-agent tasks exhibit a nearly decomposable structure, in which interactions among agents within the same interaction set are strong while interactions between agents in different sets are weak. Modeling this structure effectively and exploiting it to coordinate agents' action selection can improve the learning efficiency of multi-agent reinforcement learning algorithms on cooperative multi-agent tasks. Existing work, however, typically overlooks this structure and fails to exploit it effectively. To address this limitation, we model the nearly decomposable structure with a dynamic graph and accordingly propose a novel algorithm named coordinated subtask pattern (CSP) that enhances both local and global coordination among agents. Specifically, CSP identifies agents' interaction sets as subtasks and uses a bi-level policy structure to periodically assign all agents to multiple subtasks, which accurately characterizes their interactions on the dynamic graph. Based on this subtask assignment, CSP introduces intra-subtask and inter-subtask pattern constraints to facilitate local and global coordination. These two constraints ensure that some of the agents within the same subtask can anticipate one another's action selections, and that all agents select superior joint actions that maximize overall task performance. We evaluate CSP on multiple maps of the SMAC (StarCraft Multi-Agent Challenge) benchmark, where it clearly outperforms multiple baseline algorithms, which supports its effectiveness in coordinating agents efficiently.
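As a rough illustration of the nearly decomposable structure described in the abstract, the sketch below groups agents into interaction sets on a sequence of weighted interaction graphs and refreshes the grouping periodically. It is not the CSP algorithm: the abstract says CSP assigns agents through a bi-level policy structure, whereas this sketch substitutes a simple threshold on edge strength followed by connected components. The function names, the `threshold` and `period` parameters, and the example weights are all hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def interaction_sets(weights: Matrix, threshold: float) -> List[List[int]]:
    """Group agents linked by edges of strength >= threshold (assumed rule)."""
    n = len(weights)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if weights[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def periodic_assignment(
    graphs: List[Matrix], threshold: float, period: int
) -> List[List[List[int]]]:
    """Recompute the grouping every `period` steps and hold it in between."""
    assignments: List[List[List[int]]] = []
    current: List[List[int]] = []
    for t, weights in enumerate(graphs):
        if t % period == 0:
            current = interaction_sets(weights, threshold)
        assignments.append(current)
    return assignments


# Hypothetical 4-agent graphs: strong pairs (0,1),(2,3), then (0,2),(1,3).
w0 = [[0.0, 0.9, 0.1, 0.0], [0.9, 0.0, 0.05, 0.1],
      [0.1, 0.05, 0.0, 0.8], [0.0, 0.1, 0.8, 0.0]]
w1 = [[0.0, 0.1, 0.9, 0.0], [0.1, 0.0, 0.0, 0.7],
      [0.9, 0.0, 0.0, 0.1], [0.0, 0.7, 0.1, 0.0]]
schedule = periodic_assignment([w0, w1, w1], threshold=0.5, period=2)
```

With `period=2`, the grouping computed at step 0 is held at step 1 even though the graph has changed, and is only refreshed at step 2. This mirrors the periodic reassignment mentioned in the abstract only loosely.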
Authors Li Chao; Li Wenbin; Gao Yang (State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023)
Source Journal of Computer Research and Development (计算机研究与发展); indexed in EI, CSCD, and the Peking University Core Journals list; 2024, No. 8, pp. 1904-1916 (13 pages)
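Funding National Natural Science Foundation of China (62192783, 62106100, 62276142); Natural Science Foundation of Jiangsu Province (BK20221441); Jiangsu Provincial Industry Foresight and Key Core Technology Competition Project (BE2021028); Shenzhen Central Government Guided Local Science and Technology Development Fund (2021Szvup056).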
Keywords multi-agent reinforcement learning; cooperative tasks; nearly decomposable structure; dynamic graph; coordination