
Value Iteration Optimization for Semi-Markov Decision Processes

Cited by: 2
Abstract: For value iteration (VI) optimization of semi-Markov decision processes (SMDPs) with compact action sets, a unified standard VI algorithm based directly on the equivalent infinitesimal generator is proposed under both the discounted and average criteria, and its convergence is proved. Unified asynchronous VI algorithms under the two criteria are also discussed, including Gauss-Seidel asynchronous iteration and stochastic asynchronous iteration, in particular VI based on sample-path simulation, and these algorithms are improved using the idea of performance potentials. The results show that the algorithms apply directly to continuous-time Markov decision processes. Finally, a numerical example compares the characteristics of the algorithms.
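As a rough illustration of generator-based value iteration, the sketch below runs the textbook uniformization form of discounted VI on a small continuous-time Markov decision process, with an optional Gauss-Seidel (in-place) sweep. It is not the paper's algorithm: the abstract does not give the definition of the equivalent infinitesimal generator for an SMDP, so the generator here is simply assumed to be given. The function name `discounted_vi`, the arrays `A` and `f`, the discount rate `alpha`, and the two-state example data are all hypothetical.

```python
import numpy as np

def discounted_vi(A, f, alpha, tol=1e-10, max_iter=100_000, gauss_seidel=False):
    """Discounted value iteration driven by an infinitesimal generator.

    A: array (n_actions, n_states, n_states); A[a] is a generator (rows sum to 0).
    f: array (n_states, n_actions) of reward rates.
    alpha: discount rate, alpha > 0.
    Fixed point solved: alpha * V(i) = max_a [ f(i, a) + sum_j A[a, i, j] * V(j) ].
    """
    n_actions, n_states, _ = A.shape
    # Uniformization rate: at least the largest exit rate over all states and actions.
    lam = max(np.max(-np.einsum("aii->ai", A)), 1e-12)
    P = np.eye(n_states)[None, :, :] + A / lam  # stochastic matrices I + A/lam
    V = np.zeros(n_states)
    for _ in range(max_iter):
        V_old = V.copy()
        if gauss_seidel:
            # In-place sweep: each state immediately uses already-updated values.
            for i in range(n_states):
                V[i] = np.max((f[i] + lam * (P[:, i, :] @ V)) / (alpha + lam))
        else:
            # Standard (synchronous) update from the previous iterate only.
            Q = (f + lam * np.einsum("aij,j->ia", P, V_old)) / (alpha + lam)
            V = Q.max(axis=1)
        if np.max(np.abs(V - V_old)) < tol:
            break
    # Greedy policy with respect to the generator form of the optimality equation.
    policy = (f + np.einsum("aij,j->ia", A, V)).argmax(axis=1)
    return V, policy

# Hypothetical two-state, two-action example.
A = np.array([[[-1.0, 1.0], [2.0, -2.0]],
              [[-3.0, 3.0], [0.5, -0.5]]])
f = np.array([[1.0, 0.5],
              [0.0, 2.0]])
alpha = 0.1

V, policy = discounted_vi(A, f, alpha)
V_gs, policy_gs = discounted_vi(A, f, alpha, gauss_seidel=True)
```

Both sweeps are contractions with modulus `lam / (alpha + lam)`, so they reach the same fixed point; the Gauss-Seidel variant typically needs fewer sweeps because each state update reuses values already refreshed in the same pass.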
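Source: Journal of Jilin University (Engineering and Technology Edition); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2006, No. 1, pp. 108-112 (5 pages)
Funding: National Natural Science Foundation of China (60404009); Natural Science Foundation of Anhui Province (050420303); Hefei University of Technology Young and Middle-Aged Science and Technology Innovation Group Program
Keywords: computer application; semi-Markov decision process; equivalent infinitesimal generator; asynchronous value iteration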
