本文在双人非合作马尔科夫博弈模型下,引入了一种策略度量指标,将保守策略推广到了双智能体情形,给出了一种保守策略梯度和策略改进的条件。这为双人非合作博弈中寻找保守策略下的纳什均衡提供了一定基础和改进方向。In this paper, a p...本文在双人非合作马尔科夫博弈模型下,引入了一种策略度量指标,将保守策略推广到了双智能体情形,给出了一种保守策略梯度和策略改进的条件。这为双人非合作博弈中寻找保守策略下的纳什均衡提供了一定基础和改进方向。In this paper, a policy metric is introduced under the two-player non-cooperative Markov game model, which generalizes the conservative policy to the two-agent case, and gives a conservative policy gradient and the conditions for policy improvement. This provides a certain foundation and improvement direction for finding Nash equilibrium under policy in two-player non-cooperative game.展开更多
文摘本文在双人非合作马尔科夫博弈模型下,引入了一种策略度量指标,将保守策略推广到了双智能体情形,给出了一种保守策略梯度和策略改进的条件。这为双人非合作博弈中寻找保守策略下的纳什均衡提供了一定基础和改进方向。In this paper, a policy metric is introduced under the two-player non-cooperative Markov game model, which generalizes the conservative policy to the two-agent case, and gives a conservative policy gradient and the conditions for policy improvement. This provides a certain foundation and improvement direction for finding Nash equilibrium under policy in two-player non-cooperative game.