Abstract
This study investigates the application of a deep Q-network (DQN) algorithm that integrates Navigational Priority (NP) and Prioritized Experience Replay (PER) strategies to intelligent path planning in maritime environments. Unlike conventional path planning algorithms, the optimized model autonomously explores and learns the patterns of the maritime environment without relying on manually constructed global maritime information. We developed a maritime simulation environment based on the Gym framework to simulate and validate the improved DQN model. By incorporating the Navigational Priority and Prioritized Experience Replay mechanisms, the model adjusts how frequently experience samples are reused during learning, improving the algorithm's learning efficiency on critical decisions. Additionally, a novel reward function further strengthens the model's adaptability and stability in path planning. Simulation experiments demonstrate that the model significantly outperforms baseline methods in obstacle avoidance and optimal route finding, showing notable generalizability and excellent stability.
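The Prioritized Experience Replay mechanism the abstract refers to samples transitions in proportion to their TD error rather than uniformly, so that informative experiences are replayed more often. The following is a minimal sketch of proportional PER in the style of Schaul et al.; it is not the authors' implementation, and the Navigational Priority weighting described in the paper is not shown here — all class and parameter names are illustrative assumptions.

```python
import random


class PrioritizedReplayBuffer:
    """Minimal proportional PER sketch: priority p_i = (|TD error| + eps)^alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly prioritization skews sampling (0 = uniform)
        self.eps = eps            # keeps every transition's priority strictly positive
        self.data = []
        self.priorities = []

    def push(self, transition, td_error=1.0):
        # New transitions enter with a priority derived from their TD error.
        p = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size, beta=0.4):
        # Draw indices with probability proportional to stored priorities.
        idxs = random.choices(range(len(self.data)),
                              weights=self.priorities, k=batch_size)
        total = sum(self.priorities)
        n = len(self.data)
        # Importance-sampling weights correct the bias from non-uniform sampling;
        # normalizing by the max keeps weights in (0, 1] for stable updates.
        weights = [(n * self.priorities[i] / total) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return idxs, batch, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, refresh priorities with the new TD errors.
        for i, e in zip(idxs, td_errors):
            self.priorities[i] = (abs(e) + self.eps) ** self.alpha
```

In a DQN training loop, the returned importance-sampling weights would multiply each sample's loss term, and `update_priorities` would be called with the freshly computed TD errors after each gradient step.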
Authors
Li Pengcheng
Zhou Yuanguo
Yang Guoqing
Li Pengcheng; Zhou Yuanguo; Yang Guoqing (College of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, China; College of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China)
Source
Electronic Measurement Technology (《电子测量技术》)
Peking University Core Journal (北大核心)
2024, No. 5, pp. 77-84 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (61801009) and the General Program of the Natural Science Foundation of Shaanxi Province (2024JC-YBMS-556).
Keywords
improved deep Q-Network
maritime simulation environment
navigational priority
reward function