
Dynamic scheduling method for high-speed railway trains based on policy gradient reinforcement learning (cited by: 7)

A policy gradient reinforcement learning algorithm for high-speed railway dynamic scheduling
Abstract: High-speed railways have developed rapidly thanks to their large transport capacity, high speed, and all-weather operation. However, unexpected events such as severe weather delay trains, and these delays can propagate through the network; the resulting domino effect can leave large numbers of trains unable to run according to the planned timetable. Dynamic scheduling that relies on dispatchers' manual experience struggles to meet the practical need for rapid re-optimization. This paper therefore addresses the dynamic scheduling of high-speed trains delayed by unexpected events. Taking the minimum total arrival and departure delay of all trains at all stations as the optimization objective, it constructs a mixed-integer nonlinear programming (MINLP) model for the case in which trains can still run (traversable conditions) and proposes a dynamic scheduling method based on policy gradient reinforcement learning. The method comprises the construction of the interaction environment, the definition of the agent's state and action sets, the policy network structure and action selection method, and the reward function. Two problem-specific improvements to the REINFORCE policy gradient algorithm are introduced: error amplification and threshold setting. Finally, simulations examine the convergence of the algorithm and the performance gain from these improvements, and compare the method with the Q-learning algorithm. The results show that the proposed method can effectively reschedule high-speed trains, minimize the impact of delays caused by unexpected events, and thereby improve the efficiency of train operation.
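The optimization objective named in the abstract (the minimum total arrival and departure delay of all trains at all stations) can be written roughly as below. This is a sketch under assumed notation, not the paper's own formulation: $\mathcal{T}$ is the set of trains, $\mathcal{S}_i$ the stations served by train $i$, $a_{i,k}, d_{i,k}$ the rescheduled arrival and departure times, $A_{i,k}, D_{i,k}$ the planned ones, and lateness is counted only as the positive part. The MINLP model's constraints (running times, dwell times, headways, and so on) are not reproduced here.

```latex
\min \; Z = \sum_{i \in \mathcal{T}} \sum_{k \in \mathcal{S}_i}
  \Big[ \big(a_{i,k} - A_{i,k}\big)^{+} + \big(d_{i,k} - D_{i,k}\big)^{+} \Big],
\qquad (x)^{+} = \max\{x,\, 0\}
```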
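To illustrate how a REINFORCE (likelihood-ratio policy gradient) agent can learn a dispatching decision from delay-based rewards, here is a minimal sketch on a hypothetical toy instance. Everything in it is an assumption for illustration: three trains `T0`-`T2` compete for one departure track with a 4-minute headway, the action is which waiting train departs next, and the reward is the negative delay of the train just dispatched. A state-independent softmax preference table stands in for the paper's policy network, and the paper's error-amplification and threshold-setting modifications are not reproduced.

```python
import math
import random
from itertools import permutations

# Hypothetical toy instance (not from the paper): train -> (planned departure, earliest possible departure), minutes.
TRAINS = {"T0": (0, 6), "T1": (3, 3), "T2": (6, 6)}
HEADWAY = 4  # assumed minimum separation between consecutive departures, minutes


def dispatch(order):
    """Depart trains in `order`; return each train's departure delay (actual - planned)."""
    delays, last_dep = [], None
    for name in order:
        planned, ready = TRAINS[name]
        dep = ready if last_dep is None else max(ready, last_dep + HEADWAY)
        delays.append(dep - planned)
        last_dep = dep
    return delays


def softmax_over(prefs, names):
    """Softmax of the action preferences, restricted to trains that are still waiting."""
    m = max(prefs[n] for n in names)
    exps = {n: math.exp(prefs[n] - m) for n in names}
    z = sum(exps.values())
    return {n: e / z for n, e in exps.items()}


def reinforce(episodes=3000, lr=0.01, seed=0):
    """Vanilla REINFORCE with a softmax policy over the next train to dispatch."""
    rng = random.Random(seed)
    prefs = {name: 0.0 for name in TRAINS}
    for _ in range(episodes):
        # 1) Roll out one episode with the current policy, remembering the probabilities used.
        remaining, steps = list(TRAINS), []
        while remaining:
            probs = softmax_over(prefs, remaining)
            action = rng.choices(remaining, weights=[probs[n] for n in remaining])[0]
            steps.append((list(remaining), action, probs))
            remaining.remove(action)
        rewards = [-d for d in dispatch([a for _, a, _ in steps])]  # reward = -delay
        # 2) Undiscounted reward-to-go G_t for every step of the finite episode.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g += r
            returns.append(g)
        returns.reverse()
        # 3) Gradient ascent: d log pi(a_t) / d prefs[n] = 1[n == a_t] - pi(n).
        for (avail, action, probs), g_t in zip(steps, returns):
            for n in avail:
                grad = (1.0 if n == action else 0.0) - probs[n]
                prefs[n] += lr * g_t * grad
    return prefs


def greedy_order(prefs):
    """Deterministic rollout: always dispatch the waiting train with the highest preference."""
    return sorted(prefs, key=prefs.get, reverse=True)


def best_total_delay():
    """Exhaustive search over all dispatch orders, used only to check the learned policy."""
    return min(sum(dispatch(p)) for p in permutations(TRAINS))
```

A possible usage: `order = greedy_order(reinforce())`, then compare `sum(dispatch(order))` with `best_total_delay()`. Exhaustive search is only feasible for a toy of this size; the point of a learned policy is to avoid enumerating orders on realistic networks.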
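The abstract compares the method against Q-learning. As a rough picture of that value-based baseline, here is a minimal tabular Q-learning sketch on a hypothetical three-train dispatch toy; it is not the paper's experimental setup. The instance, the 4-minute headway, the state encoding (set of waiting trains plus the last departure time), and the hyperparameters `alpha`, `gamma`, `epsilon` are all illustrative assumptions.

```python
import random

# Hypothetical toy instance (not from the paper): train -> (planned departure, earliest possible departure), minutes.
TRAINS = {"T0": (0, 6), "T1": (3, 3), "T2": (6, 6)}
HEADWAY = 4  # assumed minimum separation between consecutive departures, minutes


def step(state, train):
    """Dispatch `train`; state = (frozenset of waiting trains, last departure time or None)."""
    remaining, last_dep = state
    planned, ready = TRAINS[train]
    dep = ready if last_dep is None else max(ready, last_dep + HEADWAY)
    return (remaining - {train}, dep), -(dep - planned)  # reward = negative delay


def q_learning(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; unseen Q-values default to 0."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return
    for _ in range(episodes):
        state = (frozenset(TRAINS), None)
        while state[0]:
            actions = sorted(state[0])
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            nxt, reward = step(state, action)
            # Bootstrapped target r + gamma * max_a' Q(s', a'); zero once no train is waiting.
            future = max((q.get((nxt, a), 0.0) for a in nxt[0]), default=0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = nxt
    return q


def greedy_order(q):
    """Follow the learned Q-table greedily from the initial state."""
    state, order = (frozenset(TRAINS), None), []
    while state[0]:
        action = max(sorted(state[0]), key=lambda a: q.get((state, a), 0.0))
        order.append(action)
        state, _ = step(state, action)
    return order


def total_delay(order):
    """Sum of departure delays for a complete dispatch order."""
    state, total = (frozenset(TRAINS), None), 0
    for train in order:
        state, reward = step(state, train)
        total -= reward
    return total
```

Unlike a policy gradient method, this baseline learns a value for every (state, action) pair, so its table grows with the number of distinct delay situations; that scaling issue is one common motivation for learning a parameterized policy instead.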
Authors: YU Sheng-ping (俞胜平), HAN Xin-chen (韩忻辰), YUAN Zhi-ming (袁志明), CUI Dong-liang (崔东亮); State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China; Signal & Communication Research Institute, China Academy of Railway Sciences Co., Ltd., Beijing 100081, China
Source: 《控制与决策》 (Control and Decision), 2022, No. 9, pp. 2407-2417 (11 pages); indexed in EI, CSCD, and the Peking University core journal list
Funding: National Natural Science Foundation of China (U1834211, 61790574, 61603262, 61773269); Natural Science Foundation of Liaoning Province (2020-MS093).
Keywords: high-speed railway; unexpected disturbances; dynamic scheduling; reinforcement learning; policy gradient; REINFORCE