Abstract
Under the conditions that the state and action sets are countable and the reward functions are bounded, this paper constructs, for a nonstationary (time-inhomogeneous) discounted MDP, a corresponding stationary (time-homogeneous) discounted MDP model, and proves that the two are equivalent. The nonstationary discounted MDP problem is thereby converted into an equivalent stationary discounted MDP problem, so that results for stationary discounted MDPs also hold in the nonstationary case. Since the stationary discounted model has been studied quite thoroughly, this yields complete conclusions for the nonstationary discounted model.
This paper concentrates on the transformation of nonstationary discounted Markov decision processes. Here, the state spaces and action spaces are countable, and the reward functions are bounded. Through the transformation of models from nonstationary to stationary, a specially structured stationary discounted MDP is worked out. Thus the intrinsic relationship between the two models is established and they are proven equivalent. Accordingly, the results on ε-optimal policies and optimal policies for the stationary discounted MDP can be applied to the nonstationary discounted MDP.
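The paper's formal construction is not reproduced in the abstract, but the standard idea it builds on can be sketched: absorb the decision epoch t into the state, so the time-dependent transition law and reward become time-homogeneous on the augmented state space S × {0, 1, 2, …}. The names below (`make_stationary`, `p`, `r`) are illustrative assumptions, not notation from the paper.

```python
def make_stationary(p, r):
    """Given a nonstationary model
         p(t, s, a) -> {s': probability}   (transition law at epoch t)
         r(t, s, a) -> bounded reward      (reward at epoch t)
       return a stationary model on the augmented states (s, t)."""
    def p_aug(state, a):
        s, t = state
        # Advancing the epoch to t + 1 is now part of the (stationary)
        # dynamics; the law no longer needs a separate time index.
        return {(s2, t + 1): prob for s2, prob in p(t, s, a).items()}

    def r_aug(state, a):
        s, t = state
        return r(t, s, a)

    return p_aug, r_aug


# Toy nonstationary model on states {0, 1}: the dynamics flip with t.
def p(t, s, a):
    return {s: 0.75, 1 - s: 0.25} if t % 2 == 0 else {s: 0.25, 1 - s: 0.75}

def r(t, s, a):
    return 1.0 if (s, t % 2) == (0, 0) else 0.0

p_aug, r_aug = make_stationary(p, r)
print(p_aug((0, 0), "a"))   # {(0, 1): 0.75, (1, 1): 0.25}
print(r_aug((0, 0), "a"))   # 1.0
```

Because the augmented component t only ever increases by one, the stationary model inherits the discount structure of the original, which is what makes the two problems equivalent.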
Keywords
nonstationary discounted MDP
stationary discounted MDP
transformation of models
(S_t, ε)-optimal policy
optimal policy
ε-optimal policy