Abstract
AODE is an agent-oriented intelligent system development environment developed by our group. The reinforcement-learning-based agent negotiation model in AODE uses a Markov decision process to describe changes of the system state and a sequential decision process to describe agent negotiation within a particular system state, and applies reinforcement learning to the agent negotiation process. The model can describe multi-agent negotiation in dynamic environments; when all agents in the model adopt the meta-game Q-learning algorithm, the system obtains the optimal negotiation solution under the dynamic negotiation environment.
AODE is an agent-oriented development environment for intelligent software systems, and it adopts a reinforcement-learning-based negotiation model. The model describes the negotiation process in two parts: a Markov decision process describes negotiation as the environment state changes, while a sequential decision process describes negotiation within a given environment state. The meta-game Q-learning algorithm is applied to the negotiation model so that it can adapt to dynamic environments.

Evidence from theoretical analysis and observations of human interaction suggests that if a decision maker can take into consideration what other agents are thinking, and furthermore learn how other agents behave from their interactions, its payoff might increase. Applying learning to agents' negotiation processes is therefore receiving more and more attention. Bazaar is a sequential decision-making model of negotiation in which learning is modeled as a Bayesian belief-update process. Agents equipped with this learning mechanism can update their knowledge during interaction and have stronger negotiation capability than agents without it. Analysis of Bazaar, however, reveals two limitations. First, the absence of knowledge about different environment states makes the model inapplicable to negotiation in dynamic environments. Second, the negotiation strategy adopted in Bazaar "always chooses the action that maximizes the expected payoff given the information available at this stage"; this strategy neglects the effect the chosen action has on subsequent states, so agents using it may fail to reach the optimal solution in a dynamic environment.

AODE adopts a Markov process to describe the migration among system states. According to meta-game theory, the optimal negotiation strategy is the meta-game equilibrium solution under a given model. AODE therefore chooses the meta-game Q-learning algorithm as its learning mechanism, which considers both the utility in the current state and the possible effects on subsequent states. The negotiation model of AODE can describe multi-agent negotiation in dynamic environments, and the optimal negotiation solution in a dynamic environment is obtained when all agents adopt the meta-game Q-learning algorithm throughout their negotiation process.
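To make the update rule concrete, the following is a minimal Python sketch of one Q-learning step over joint actions, in the spirit of the meta-game Q-learning described above. All names here (MY_ACTIONS, state_value, update) are illustrative assumptions, not AODE's actual interface, and the state-value stand-in below is a simplification of the meta-game equilibrium computation the paper derives from meta-game theory.

    from collections import defaultdict
    from itertools import product

    # Hypothetical action sets for a two-agent negotiation round.
    MY_ACTIONS = ["concede", "hold", "counter"]
    OPP_ACTIONS = ["concede", "hold", "counter"]
    ALPHA, GAMMA = 0.1, 0.9   # learning rate and discount factor

    # Q[(state, my_action, opp_action)] -> estimated long-run payoff.
    Q = defaultdict(float)

    def state_value(state):
        # Stand-in for the meta-game equilibrium value of `state`: here we
        # simply take this agent's best joint-action value, which is NOT a
        # true equilibrium computation.
        return max(Q[(state, a, b)] for a, b in product(MY_ACTIONS, OPP_ACTIONS))

    def update(state, my_act, opp_act, payoff, next_state):
        # One Q-learning step: the target weighs both the immediate payoff
        # and the (discounted) value of the following system state, as the
        # abstract requires of the negotiation strategy.
        target = payoff + GAMMA * state_value(next_state)
        key = (state, my_act, opp_act)
        Q[key] += ALPHA * (target - Q[key])

    # Example: fold one observed negotiation round into the table.
    update("s0", "hold", "concede", payoff=1.0, next_state="s1")

The point of the sketch is the shape of the target: unlike Bazaar's myopic strategy, the update credits an action with the value of the state it leads to, not just its immediate payoff.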
Source
《南京大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2001, No. 2, pp. 135-141 (7 pages)
Journal of Nanjing University(Natural Science)
Funding
National Natural Science Foundation of China (69905001)
Research Fund for the Doctoral Program of Higher Education (97028428)