Abstract
To address reservoir optimal operation in complex real-world environments, the Q-learning algorithm from reinforcement learning is adopted, with a hash table as the core data structure. Feasible reservoir operation schemes are generated episode by episode while the Q value is optimized for each single time period, and the optimal operation scheme is finally derived from the optimal Q value of each period. Experimental analysis shows that, once the number of iterations is sufficiently large, the Q-learning algorithm reaches the theoretical optimal solution. When an optimal search corridor is established from historical reservoir operation data, the Q-learning algorithm obtains high-quality solutions while shortening the optimization time.
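The procedure summarized in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy inflow series, the release limits, the reward proxy, the uniform random exploration rule, and all function names are hypothetical. The sketch keeps Q values in a hash table (a Python `dict`) keyed by period, current storage level, and chosen end-of-period storage level, updates one period at a time while generating a feasible schedule per episode, and then reads off a schedule greedily from the stored Q values. The optional `corridor` argument restricts the end-of-period storage in each period to a band, in the spirit of the search corridor mentioned above.

```python
import random

# Toy problem data (all hypothetical, chosen only for illustration).
S_MAX = 4                  # storage is discretized into levels 0..S_MAX
INFLOW = [2, 3, 1, 2]      # known inflow per period, in storage units
R_MIN, R_MAX = 0, 4        # release limits per period


def feasible_actions(t, s, corridor=None):
    """End-of-period storage levels reachable from level s in period t."""
    lo, hi = corridor[t] if corridor else (0, S_MAX)
    return [s2 for s2 in range(max(lo, 0), min(hi, S_MAX) + 1)
            if R_MIN <= s + INFLOW[t] - s2 <= R_MAX]


def reward(t, s, s2):
    """Benefit proxy: release times a head term that grows with storage."""
    return (s + INFLOW[t] - s2) * (s + s2)


def best_value(q, t, s, corridor=None):
    """Largest stored Q value over feasible actions; 0 past the horizon."""
    if t == len(INFLOW):
        return 0
    return max((q.get((t, s, a), 0)
                for a in feasible_actions(t, s, corridor)), default=0)


def q_learning(s0, episodes, alpha=1.0, gamma=1.0, corridor=None, seed=0):
    """Tabular Q-learning with uniform random exploration (an assumption)."""
    rng = random.Random(seed)
    q = {}  # hash table keyed by (period, storage, next storage)
    for _ in range(episodes):
        s = s0
        for t in range(len(INFLOW)):
            actions = feasible_actions(t, s, corridor)
            if not actions:        # dead end inside a too-narrow corridor
                break
            a = rng.choice(actions)
            target = reward(t, s, a) + gamma * best_value(q, t + 1, a, corridor)
            old = q.get((t, s, a), 0)
            q[(t, s, a)] = old + alpha * (target - old)  # single-period update
            s = a
    return q


def greedy_schedule(q, s0, corridor=None):
    """Follow the largest stored Q value in each period.

    Assumes every state on the greedy path has a feasible action.
    """
    s, plan, total = s0, [], 0
    for t in range(len(INFLOW)):
        actions = feasible_actions(t, s, corridor)
        a = max(actions, key=lambda x: q.get((t, s, x), 0))
        total += reward(t, s, a)
        plan.append(a)
        s = a
    return plan, total


if __name__ == "__main__":
    q_table = q_learning(s0=2, episodes=5000)
    print(greedy_schedule(q_table, s0=2))
```

Because this toy environment is deterministic, the sketch uses a learning rate of 1 and no discounting; a stochastic inflow model would need a smaller learning rate. A corridor can be passed as a list of `(low, high)` storage bounds, one pair per period, which shrinks the set of table entries that have to be visited.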
Authors
HU He-xuan (胡鹤轩)
YIN Su-ming (尹苏明)
HU Qiang (胡强)
ZHANG Ye (张晔)
HU Zhen-yun (胡震云)
YI Chong-zheng (义崇政)
Affiliations: College of Computer and Information, Hohai University, Nanjing 210098, China; Business School, Hohai University, Nanjing 210098, China; School of Electrical Engineering, Tibet Agriculture Animal Husbandry College, Nyingchi 860000, China; Changjiang Survey, Planning, Design and Research Co., Ltd., Wuhan 430010, China; Changjiang Spatial Information Technology Engineering Co., Ltd., Wuhan 430010, China; Water Resources Information Perception and Big Data Engineering Research Center of Hubei Province, Wuhan 430010, China
Source
Water Resources and Power (《水电能源科学》)
Peking University Core Journal list (北大核心)
2022, No. 1, pp. 73-77 (5 pages)
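Funding
National Key Research and Development Program of China (2018YFC0407904)
Key Research Project on Innovation and Entrepreneurship of the Tibet Autonomous Region (Z2016D01G01/01)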