期刊文献+

交互式动态影响图研究及其最优K模型解法 被引量:3

Interactive Dynamic Influence Diagram Research Summary and New Solutions on Top-K Model Selection
下载PDF
导出
摘要 不确定性多智能体序贯决策是人工智能研究领域一个重要的研究问题,主要求解智能体如何在与其他智能体的交互中优化本身的决策.特别在部分可观测的随机博弈设置下,智能体不能探测到真实的外部环境状态,必须依靠所接收的观察来推断可能的状态;同时,智能体的动作也具有相当的随机性,直接影响到其他智能体的决策.智能体的交互主要通过对共同环境状态的影响决定它们各自决策的报酬.因此,如何对多智能体之间的交互进行建模是求解该问题的核心任务.目前大部分的研究主要通过对整个智能体系统进行建模,采取集中规划、分散控制的求解机制:首先,统一计算所有智能体的联合决策;然后,各个智能体执行分配得到的局部决策.该求解技术往往要求所有的智能体必须对全局环境有一个共同的知识假设,因此该研究工作一般只适用于合作型的多智能体系统.相比之下,交互式动态影响图是从个体决策者的角度研究不确定性多智能体序贯决策问题的一种普遍适用的建模方法,克服了传统的博弈论方法求解多智能体决策问题的局限性.求解交互式动态影响图模型的主要困难在于复杂的智能体相互建模过程.特别是在竞争的环境下,由于智能体缺少相互交流的机会,也不能预知其他智能体的真实模型,必须通过预测和推理其他智能体的行为来决定本身的动作.主要求解思路是首先假设其他智能体的可能模型,然后通过求解这些可能的模型来预测智能体的行为.由于其他智能体的备选模型往往有很多,而且随着决策时间的推移,模型的不确定性增强,导致可能的模型呈指数增长,这给求解交互式动态影响图带来了极大的困难.基于目前大量的交互式动态影响图研究工作,文中旨在总结归纳模型的具体表达方式和求解方法,并在此基础上提出一种新的模型求解方法.针对巨大的其他智能体备选模型空间,新方法侧重于研究模型的选取技术,把模型选取问题转化为一个构造最优K模型的函数优化问题.优化的目标是尽量使得选取的K个模型能在最大程度上覆盖整个其他智能体的模型空间.从本质上说,新的函数优化问题具有NP难度.文中通过挖掘目标函数的单调子模特性提出一种贪婪算法以迅速求解该优化问题,并在理论上保证了解的质量.此外,新的求解方法克服了目前近似方法的随机性和参数设置的复杂性.该方法在一个经典计算机游戏领域得到了大量的实验验证,展示了较强的实际应用能力. Mult research issue in iagent sequential decision-making problem under uncertainty is an important the area of artificial intelligence, and mainly focuses on solutions to the problem of how agents shall optimize their decisions observable, stochastic games agents can't in the interactions. Particularly in a setting of partially perceive the precise states of external environments and rely on received observations to infer the hidden agents have a direct influence on decisions of other states. Meanwhile, the stochastic actions of agents. Their interactions impact the state changes of the common environment, which decides rewards in executing their actions. Hence the core task is to model agents' interactions and subsequently to solve the model. Most of the existing research models the entire multiagent systems and follows the mechanism of centralized plan and decentralized control to solve the problem. It first computes a joint policy for all the agents and then assigns the local policies to the agents for a final execution. This approach often demands that all the agents hold common knowledge of the global environment, which can only be applied in cooperative multiagent systems. In contrast, interactive dynamic influence diagram (I-DID), which takes the individual decision-making perspective, provides a general framework for solvingmultiagent sequential decision problems under uncertainty. The solutions remove limitations of traditional multiagent decision approaches based on game theory. The main difficulty arises from the complicated process of mutually modeling of multiple agents in I-DID. In particular, agents can't communicate in a competitive setting so that they can't perceive the true model other agents, which requires the agents to predict and reason with other agents~ behavior in order to optimize their own decisions. The solution is to first hypothesize a number of candidate models assigned to other agents and then solve the models to predict their behavior. Since the number of candidate models could be rather large and the uncertainty of the models increases with the receiving of new information over time, the number of models grows exponentially with time. This signifi- cantly increases the difficulty in solving I-DID. This paper summarizes the recent development of I-DID research and proposes a new I-DID solution. Considering the large space of candidate models of other agents, the proposed approach focuses on the selection of a proper subset of models in order to efficiently solve the I-DID. It converts the model selection problem into one top-K model optimization through function optimization techniques; The optimization problem aims to maximize the coverage of the selected K models over the entire space of other agents~ candidate models. Essentially the top-K model optimization is NP hard. By exploiting the monotonic, submodular property of the objective function, this article develops a greedy selection algorithm to efficiently solve the problem while ensuring the solution quality in a theoretical way. In addition, the new solution avoids randomness of the model selection and reduces the complexity of parameter settings in approximate I-DID solutions. Performance of the new method is experi- mentally verified in a new problem domain of computer games and shows strong capability in practical applications.
出处 《计算机学报》 EI CSCD 北大核心 2018年第1期28-46,共19页 Chinese Journal of Computers
基金 国家自然科学基金(61375070 61562033 61772442 71361011) 江西省社会科学规划基金(16GJ20) 江西省自然科学基金(20171BAB202022)资助~~
关键词 多智能体系统 影响图 序贯决策问题 行为等价 multiagent systems influence diagrams sequential decision-making problem behavioral equivalence
  • 相关文献

参考文献13

二级参考文献134

共引文献88

同被引文献17

引证文献3

二级引证文献5

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部