Abstract
Offline trajectory design for an unmanned aerial vehicle (UAV)-mounted base station cannot cope with random, dynamic communication requests from ground users. To address this problem, this paper studies an online trajectory optimization algorithm. A single UAV serves as an aerial base station providing wireless communication service to two ground users, and its trajectory is optimized online in real time to minimize the average communication delay of the ground users. First, because the UAV's states and actions are continuous, the problem is cast as a Markov Decision Process (MDP). Then, the delay of a single communication is introduced into the action-value function. Finally, the Monte Carlo and Q-Learning algorithms from reinforcement learning are each applied to realize the online trajectory optimization. Simulation results show that the proposed online optimization achieves a lower average delay than the "fixed position" and "greedy algorithm" schemes.
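As a rough illustration of how a single-communication delay can enter the action-value update in Q-Learning, the sketch below uses a toy model that is not taken from the paper: a hypothetical 1-D grid of five UAV positions, two users fixed at the grid ends, a made-up delay function that grows with distance, and uniformly random requests. All names (N_POS, USERS, delay, q_learning, greedy_policy) and parameter values are assumptions chosen for illustration only.

```python
import random

N_POS = 5                # hypothetical 1-D grid of candidate UAV positions
USERS = (0, N_POS - 1)   # hypothetical: the two ground users sit at the grid ends
ACTIONS = (-1, 0, 1)     # move left, hover, move right


def delay(pos, user):
    """Toy single-communication delay that grows with UAV-to-user distance."""
    return 1.0 + abs(pos - USERS[user])


def q_learning(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning; the state is (UAV position, requesting user)."""
    rng = random.Random(seed)
    q = {(p, u): [0.0] * len(ACTIONS) for p in range(N_POS) for u in range(2)}
    for _ in range(episodes):
        pos, user = rng.randrange(N_POS), rng.randrange(2)
        for _ in range(horizon):
            state = (pos, user)
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            # Move (clipped to the grid), then serve the current request.
            pos = min(max(pos + ACTIONS[a], 0), N_POS - 1)
            reward = -delay(pos, user)   # negative delay acts as the reward
            user = rng.randrange(2)      # next request arrives at random
            best_next = max(q[(pos, user)])
            # Standard Q-learning update toward reward + discounted best next value.
            q[state][a] += alpha * (reward + gamma * best_next - q[state][a])
    return q


def greedy_policy(q):
    """Map each state to the move with the highest learned action value."""
    return {s: ACTIONS[max(range(len(ACTIONS)), key=lambda i: v[i])]
            for s, v in q.items()}


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    for state in sorted(policy):
        print(state, policy[state])
```

Using the negative delay as the reward means that maximizing the return corresponds to minimizing the accumulated delay. A Monte Carlo variant would instead average complete-episode returns for each state-action pair rather than bootstrapping from the next state's value.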
Authors
张广驰
严雨琳
崔苗
陈伟
张景
ZHANG Guangchi; YAN Yulin; CUI Miao; CHEN Wei; ZHANG Jing (School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China; Institute of Environmental Geology Exploration of Guangdong Province, Guangzhou 510080, China; China Academy of Electronics and Information Technology, Beijing 100043, China)
Source
《电子与信息学报》
EI
CSCD
Peking University Core Journals
2021, No. 12, pp. 3605-3611 (7 pages)
Journal of Electronics & Information Technology
Funding
Guangdong Provincial Science and Technology Program (2017B090909006, 2019B010119001, 2020A050515010, 2021A0505030015)
Guangdong Special Support Program (2019TQ05X409)
Keywords
Unmanned Aerial Vehicle (UAV) communication
Online trajectory optimization
Average delay minimization
Reinforcement learning