Abstract
At present, most quadrotor UAVs rely on classical control methods to design the control law. However, classical design has two persistent difficulties: control parameters must be selected, and the design depends on a mathematical model of the controlled plant. To address this, a UAV control law design method based on the deep reinforcement learning algorithm Deep Q Network is adopted. The quadrotor attitude angles and attitude angular rates serve as the inputs to a three-layer neural network, which outputs the action-value function; an action is then selected according to a greedy strategy. Through continuous interaction with the environment, the agent updates the network weights according to reward and penalty information, so that it selects actions that maximize the cumulative return. Simulation results show that, after reinforcement learning training, the quadrotor attitude angles track changes in the reference command quickly and accurately. This demonstrates the feasibility of a reinforcement-learning-based quadrotor UAV control law, which avoids the dependence of traditional control methods on parameter selection and on a plant model.
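The loop described in the abstract (attitude state in, action values out, greedy selection, weight update from the reward) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the state dimension, the number of discrete actions, the hidden width, the tanh activation, the discount factor, the learning rate, and the use of a separate target network are all assumptions, since the abstract does not state them. The function names are hypothetical, and the selection rule is written as epsilon-greedy so that epsilon = 0 reduces to the purely greedy choice.

```python
import math
import random

STATE_DIM = 6    # assumed: three attitude angles plus three angular rates
N_ACTIONS = 7    # assumed: size of a discretized command set
HIDDEN = 16      # assumed hidden-layer width
GAMMA = 0.99     # assumed discount factor


def init_params(rng):
    """Weights for an input -> hidden -> output network (three layers)."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W1": mat(STATE_DIM, HIDDEN), "b1": [0.0] * HIDDEN,
            "W2": mat(HIDDEN, N_ACTIONS), "b2": [0.0] * N_ACTIONS}


def q_values(params, state):
    """Return the hidden activations and one Q-value per discrete action."""
    hidden = [
        math.tanh(sum(state[i] * params["W1"][i][j] for i in range(STATE_DIM))
                  + params["b1"][j])
        for j in range(HIDDEN)
    ]
    q = [
        sum(hidden[j] * params["W2"][j][k] for j in range(HIDDEN))
        + params["b2"][k]
        for k in range(N_ACTIONS)
    ]
    return hidden, q


def select_action(params, state, epsilon, rng):
    """Epsilon-greedy choice; epsilon = 0 gives the purely greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    _, q = q_values(params, state)
    return max(range(N_ACTIONS), key=lambda k: q[k])


def td_update(params, target_params, transition, lr=0.01):
    """One gradient step on 0.5 * (Q(s, a) - y)^2 for a single transition."""
    state, action, reward, next_state, done = transition
    _, next_q = q_values(target_params, next_state)
    y = reward + (0.0 if done else GAMMA * max(next_q))
    hidden, q = q_values(params, state)
    err = q[action] - y
    for j in range(HIDDEN):
        # Gradient reaching hidden unit j, computed before W2 is changed.
        grad_h = err * params["W2"][j][action] * (1.0 - hidden[j] ** 2)
        params["W2"][j][action] -= lr * err * hidden[j]
        for i in range(STATE_DIM):
            params["W1"][i][j] -= lr * grad_h * state[i]
        params["b1"][j] -= lr * grad_h
    params["b2"][action] -= lr * err
    return err
```

In a full training run, the transitions would come from a quadrotor simulation and would typically be sampled from a replay buffer, with the target network refreshed periodically; those parts are omitted here because the abstract gives no details about them.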
Authors
Liang Chen (梁晨)
Liu Xiaoxiong (刘小雄)
Zhang Xingwang (张兴旺)
Huang Jianxiong (黄剑雄)
College of Automation, Northwestern Polytechnical University, Xi'an 710072, China
Source
Computer Measurement & Control (《计算机测量与控制》)
2021, Issue 2, pp. 71-75, 86 (6 pages in total)
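Funding
Supported by the Aeronautical Science Foundation of China (201905053003)
Supported by the Shaanxi Province Key Laboratory of Flight Control and Simulation Technology.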