摘要
针对最短路径问题中Q学习算法的初始搜索空间大、后期收敛不稳定的缺陷,提出多头绒泡菌预处理的改进Q学习算法(PPA-Q)。该算法引入网络预处理过程和自适应概率选择模型,利用多头绒泡菌进行网络预处理,减少算法前期的无用探索空间,再通过改进的模拟退火算法实现自适应概率选择模型,加强算法对优质路径的探索程度,增加算法初期解的多样性,同时在算法后期稳定逼近最优路径且不振荡。仿真结果表明,PPA-Q算法收敛到最优路径成功率为100%,高于经典蚁群(ACO)算法和Q(λ)算法的80%,其迭代次数分别低于Q学习算法57.2%、ACO算法32.9%和Q(λ)算法35.1%.
To solve Q-learning algorithm’s drawbacks of large range of initial searching space and low convergence rate in latter stage in shortest path problem,a physarum polycephalum plgorithm based improved Q-learning( PPA-Q) is proposed. The improved algorithm puts forward pretreatments of network and adaptive choice model,and the effects of pretreatments of network reduce useless searching space in early stage by features of physarum polycehalum,the adaptive choice model enhances possibility of choosing high-quality path,increases diversity of solutions and converges to optimal result stably in latter stage through improved simulated annealing algorithm.Simulation results show that the success rate of the PPA-Q algorithm converges to the optimal path is 100%,which is higher than 80%of the ACO algorithm and Q( λ) algorithm,and the number of iterations is lower than 57. 2% of Q-learning,32. 9% of ACO algorithm and 35. 1% of Q( λ) algorithm.
作者
马学森
朱建
谈杰
唐昊
周江涛
Ma Xuesen;Zhu Jian;Tan Jie;Tang Hao;Zhou Jiangtao(School of Computer and Information, Hefei University of Technology, Hefei 230009, China;Research Institute of Sanshui & Hefei University of Technology in Guangdong, Foshan 528000, China;School of Electrical and Automation Engineering, Hefei University of Technology, Hefei 230009, China;School of Mathematics, Hefei University of Technology, Hefei 230009, China)
出处
《电子测量与仪器学报》
CSCD
北大核心
2019年第5期148-157,共10页
Journal of Electronic Measurement and Instrumentation
基金
国家自然科学基金(61573126)
广东省科技发展专项基金(2017A010101001)
中央高校基本科研业务费专项基金(JZ2016HGBZ1032)
国家留学基金
安徽省教育厅高等学校省级质量工程项目(2017JYXM0055)
合肥工业大学课程规划设计研究项目(119-033112)资助
关键词
最短路径问题
Q学习
多头绒泡菌
模拟退火算法
网络预处理
自适应概率选择模型
shortest path problem
Q-learning algorithm
physarum polycephalum
simulated annealing algorithm
pretreatments ofnetwork
adaptive choice model