Abstract
To address the problems that multi-agent formations become unstable during path planning, path acquisition is slow, and agents pass too close to obstacles during obstacle avoidance, this paper forms the multi-agent system using an angle-distance metric. The traditional path planning scheme, in which a single path is produced by avoiding obstacles from a start point to an end point, is modified: the notions of start point and end point are weakened and a path midpoint is added, so that agents move simultaneously from the start point and from the end point toward the midpoint, producing two paths, one from the start point to the midpoint and one from the end point to the midpoint. The reward function is designed so that a meeting between agents with the same label yields a positive reward, while a collision between agents with different labels, and a collision between any agent (same-label or different-label) and an obstacle, yield negative rewards. The deep deterministic policy gradient algorithm with a variable-capacity experience pool (DDPG-vcep) is validated in both static-obstacle and dynamic-obstacle simulation environments, and the reward values under different numbers of training runs are compared. The simulation results show that, compared with the traditional DDPG algorithm, the improved DDPG formation algorithm saves path acquisition time and achieves a more pronounced formation obstacle-avoidance effect.
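The abstract states the reward design and the variable-capacity experience pool only at a high level; the exact formulations appear in the full paper. As a reading aid, the following Python sketch illustrates the stated reward scheme under assumed names and magnitudes (the function name reward, the radii meet_radius and collide_radius, and all numeric values are illustrative assumptions, not the authors' parameters):

    import numpy as np

    def reward(agent_label, agent_pos, others, obstacles,
               meet_radius=0.5, collide_radius=0.5,
               r_meet=10.0, r_agent=-10.0, r_obstacle=-10.0):
        # Positive reward when two fronts with the same label meet (the
        # start-side and end-side copies of one agent converging on the
        # path midpoint); negative rewards for collisions between
        # different-label agents and for any agent-obstacle collision.
        r = 0.0
        for other_label, other_pos in others:
            d = np.linalg.norm(agent_pos - other_pos)
            if other_label == agent_label and d < meet_radius:
                r += r_meet
            elif other_label != agent_label and d < collide_radius:
                r += r_agent
        for obs_pos in obstacles:
            if np.linalg.norm(agent_pos - obs_pos) < collide_radius:
                r += r_obstacle
        return r

Similarly, a variable-capacity experience pool can be read as a replay buffer whose capacity is adjusted as training proceeds; the class name VariableCapacityReplayBuffer and the linear growth schedule in resize below are assumed placeholders, not the paper's actual schedule:

    import random
    from collections import deque

    class VariableCapacityReplayBuffer:
        def __init__(self, capacity=10_000, max_capacity=100_000, growth=1_000):
            self.capacity = capacity
            self.max_capacity = max_capacity
            self.growth = growth
            self.buffer = deque(maxlen=capacity)

        def push(self, transition):
            # Store one (state, action, reward, next_state, done) tuple.
            self.buffer.append(transition)

        def resize(self):
            # Enlarge the pool periodically (e.g., once per episode),
            # keeping the most recent transitions.
            self.capacity = min(self.capacity + self.growth, self.max_capacity)
            self.buffer = deque(self.buffer, maxlen=self.capacity)

        def sample(self, batch_size):
            return random.sample(self.buffer, min(batch_size, len(self.buffer)))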
Authors
JING Yongnian, GENG Shuangshuang, XIANG Yao, WEN Jiayan (School of Automation, Guangxi University of Science and Technology, Liuzhou 545616, China; Black Sesame Technologies Company Limited, Shenzhen 518055, China; Research Center for Intelligent Cooperation and Cross-application, Guangxi University of Science and Technology, Liuzhou 545616, China; Guangxi Key Laboratory of Automobile Components and Vehicle Technology, Guangxi University of Science and Technology, Liuzhou 545616, China)
Source
Journal of Guangxi University of Science and Technology (《广西科技大学学报》)
2023, No. 3, pp. 62-71 (10 pages)
Funding
National Natural Science Foundation of China (61963006)
Guangxi Natural Science Foundation (2018GXNSFAA050029, 2018GXNSFAA294085)
2022 Independent Research Project of the Guangxi Key Laboratory of Automobile Components and Vehicle Technology (2022GKLACVTZZ01)
Keywords
deep learning
reinforcement learning
deep deterministic policy gradient algorithm (DDPG algorithm)
multi-agent
formation control
obstacle avoidance