Abstract
In trajectory detection, a sensor typically reports only the geographical positions of multiple targets, which are indistinguishable within a single frame. The problem is to use these position measurements to reconstruct the trajectory of each target and to tell the targets apart, a task referred to here as multi-target trajectory recovery. This paper proposes a deep reinforcement learning (DRL) method for this task. Based on the physical characteristics of real target trajectories, it extracts a mathematical model and designs a trajectory osculating circle (TOC) reward function from the heading and curvature of the trajectories, which enables DRL to recover multiple trajectories and distinguish the individual targets. The paper first formulates multi-target trajectory recovery as a model that DRL can handle, then evaluates DRL with the TOC reward function experimentally, and finally gives a mathematical derivation and physical interpretation of the reward function. Experimental results show that a deep reinforcement network driven by the TOC reward function recovers target trajectories effectively, and the recovered tracks match the actual ones well in both heading and speed.
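The abstract describes the TOC reward as built from trajectory heading and osculating-circle curvature, but the record does not give its exact form. The sketch below is a hypothetical minimal version, not the paper's formula: it scores a candidate next point by how little it changes the track's heading and discrete osculating-circle curvature (the function names, the weights, and the additive combination are all assumptions).

```python
import math

def curvature(p0, p1, p2):
    """Signed curvature (1/R) of the circle through three consecutive
    track points -- a discrete stand-in for the osculating circle."""
    a = math.dist(p1, p2)
    b = math.dist(p0, p2)
    c = math.dist(p0, p1)
    if a * b * c == 0:
        return 0.0  # coincident points: treat as a straight segment
    # Twice the signed triangle area (2D cross product); R = abc / (2 * area2).
    area2 = ((p1[0] - p0[0]) * (p2[1] - p0[1])
             - (p1[1] - p0[1]) * (p2[0] - p0[0]))
    return 2.0 * area2 / (a * b * c)

def toc_reward(track, candidate, w_heading=1.0, w_curv=1.0):
    """Hypothetical TOC-style reward: a candidate next point earns a high
    (less negative) reward when it changes the track's heading and
    osculating-circle curvature as little as possible.
    `track` is a list of at least three (x, y) points."""
    p0, p1, p2 = track[-3], track[-2], track[-1]
    heading_old = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    heading_new = math.atan2(candidate[1] - p2[1], candidate[0] - p2[0])
    # Wrap the heading difference into [-pi, pi] before taking its magnitude.
    d_heading = abs(math.atan2(math.sin(heading_new - heading_old),
                               math.cos(heading_new - heading_old)))
    d_curv = abs(curvature(p1, p2, candidate) - curvature(p0, p1, p2))
    return -(w_heading * d_heading + w_curv * d_curv)
```

For a straight track such as [(0, 0), (1, 0), (2, 0)], the collinear candidate (3, 0) receives reward 0, while a turning candidate like (3, 1) is penalized for both its heading change and its curvature change, so a greedy association step would extend the track with the collinear point.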
Authors
He Liang
Xu Zhengguo
Jia Yu
Shen Chao
Li Yun
He Liang; Xu Zhengguo; Jia Yu; Shen Chao; Li Yun (National Key Laboratory of Science & Technology on Blind Signal Processing, Chengdu 610041, China; MOE Key Laboratory for Intelligent Networks & Network Security, Xi'an Jiaotong University, Xi'an 710049, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2020, Issue 6, pp. 1626-1632 (7 pages)
Application Research of Computers
Funding
Key Program of the National Natural Science Foundation of China (U1736205)
National Natural Science Foundation of China (61773310).
Keywords
deep reinforcement learning (DRL)
sequential decision
Q function
trajectory osculating circle (TOC)