Abstract
This paper addresses value iteration (VI) for semi-Markov decision processes with compact action sets. A unified standard VI algorithm, based directly on the equivalent infinitesimal generator, is proposed for both the discounted and the average criteria, and its convergence is proved. Unified asynchronous VI algorithms under the two criteria are also discussed, including Gauss-Seidel asynchronous iteration and stochastic asynchronous iteration, in particular VI based on sample-path simulation, and these algorithms are improved using the idea of performance potentials. The results show that the algorithms apply directly to continuous-time Markov decision processes. Finally, a numerical example compares the characteristics of the algorithms.
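As a rough illustration of the standard and Gauss-Seidel variants mentioned in the abstract, the following is a minimal sketch for the continuous-time Markov decision process special case under the discounted criterion only. It is not the paper's algorithm: it uses the textbook uniformization form of the optimality equation alpha*V(i) = min_a [f(i,a) + sum_j A^a(i,j) V(j)], replaces the compact action set with a small finite one, and all names (`value_iteration`, `bellman_residual`, `GEN`, `COST`, `ALPHA`) and the numerical data are hypothetical.

```python
def value_iteration(gen, cost, alpha, gauss_seidel=False, tol=1e-10, max_iter=100000):
    """Discounted-cost VI driven directly by infinitesimal generators.

    gen[a][i][j] : generator entry under action a (each row sums to zero)
    cost[a][i]   : cost rate in state i under action a
    alpha        : discount rate, alpha > 0
    Assumption: a finite action list stands in for the compact action set.
    """
    n_actions, n = len(gen), len(gen[0])
    # Uniformization constant: at least the largest exit rate, and positive.
    lam = max(-gen[a][i][i] for a in range(n_actions) for i in range(n))
    lam = max(lam, 1e-12)
    v = [0.0] * n
    for _ in range(max_iter):
        # Gauss-Seidel reads and writes the same vector, so updated
        # components are used immediately; the standard sweep uses a copy.
        src = v if gauss_seidel else list(v)
        delta = 0.0
        for i in range(n):
            best = min(
                (cost[a][i] + lam * src[i]
                 + sum(gen[a][i][j] * src[j] for j in range(n))) / (alpha + lam)
                for a in range(n_actions)
            )
            delta = max(delta, abs(best - src[i]))
            v[i] = best
        if delta < tol:
            break
    return v


def bellman_residual(gen, cost, alpha, v):
    """Largest violation of alpha*V(i) = min_a [f(i,a) + sum_j A^a(i,j) V(j)]."""
    n_actions, n = len(gen), len(v)
    return max(
        abs(alpha * v[i] - min(
            cost[a][i] + sum(gen[a][i][j] * v[j] for j in range(n))
            for a in range(n_actions)))
        for i in range(n)
    )


# Hypothetical two-state, two-action example.
GEN = [
    [[-1.0, 1.0], [2.0, -2.0]],   # action 0
    [[-3.0, 3.0], [1.0, -1.0]],   # action 1
]
COST = [
    [1.0, 3.0],                   # action 0
    [2.0, 2.5],                   # action 1
]
ALPHA = 0.1

if __name__ == "__main__":
    print(value_iteration(GEN, COST, ALPHA))
    print(value_iteration(GEN, COST, ALPHA, gauss_seidel=True))
```

Both variants should converge to the same fixed point; the Gauss-Seidel sweep typically needs fewer passes because each state update sees the newest values. The average criterion, the semi-Markov case, and the simulation-based stochastic variant are not covered by this sketch.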
Source
Journal of Jilin University: Engineering and Technology Edition
EI
CAS
CSCD
Peking University Core Journals
2006, No. 1, pp. 108-112 (5 pages)
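Funding
National Natural Science Foundation of China (Grant No. 60404009)
Natural Science Foundation of Anhui Province (Grant No. 050420303)
Hefei University of Technology Program for Young and Middle-Aged Science and Technology Innovation Groups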
Keywords
computer application
semi-Markov decision process
equivalent infinitesimal generator
asynchronous value iteration