Abstract
To achieve distributed dynamic frequency allocation in cognitive radio networks, this study adopts a stochastic game framework in which cognitive links are modeled as selfish, rational agents, and proposes a multi-agent reinforcement learning (MARL) algorithm that aims to maximize the average Q function, called MAQ. Because each agent's decision process takes the decisions of the other users into account, MAQ lets distributed agents achieve indirect coordination without exchanging their Q functions or rewards. The convergence of MAQ is proved theoretically. Simulation results show that the throughput of MAQ is close to that of a centralized learning algorithm, while MAQ requires much less information exchange.
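To make the setting concrete, the sketch below simulates the simplest fully distributed approach to the same problem: each cognitive link runs its own independent, stateless Q-learner over the available channels and never sees another link's Q values or rewards. This is not MAQ, whose update rule is not given in this abstract. The reward model (1 for a collision-free transmission, 0 on a collision), the stateless formulation, and all names and parameters (`simulate`, `alpha`, `epsilon`, `episodes`) are illustrative assumptions.

```python
import random
from typing import List, Tuple


def simulate(n_agents: int = 4, n_channels: int = 4, episodes: int = 2000,
             alpha: float = 0.1, epsilon: float = 0.1,
             seed: int = 0) -> Tuple[List[int], float]:
    """Independent stateless Q-learning for distributed channel selection.

    Assumed (hypothetical) reward model: a link earns 1 if it is the only
    link on its chosen channel in that slot, and 0 if it collides.
    Agents share neither Q tables nor rewards.
    """
    rng = random.Random(seed)
    # q[i][k]: agent i's running estimate of the reward on channel k
    q = [[0.0] * n_channels for _ in range(n_agents)]

    def act(i: int) -> int:
        # epsilon-greedy over agent i's own Q table, random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(n_channels)
        best = max(q[i])
        return rng.choice([k for k, v in enumerate(q[i]) if v == best])

    total = 0.0
    for _ in range(episodes):
        # all links pick a channel simultaneously
        choices = [act(i) for i in range(n_agents)]
        for i, k in enumerate(choices):
            reward = 1.0 if choices.count(k) == 1 else 0.0
            # stateless Q-learning update toward the observed reward
            q[i][k] += alpha * (reward - q[i][k])
            total += reward

    # each agent's current greedy channel and mean successful links per slot
    greedy = [row.index(max(row)) for row in q]
    return greedy, total / episodes


if __name__ == "__main__":
    channels, throughput = simulate()
    print(channels, round(throughput, 3))
```

Each link here reacts only to its own collision history, so any coordination emerges by trial and error; the abstract's point is that MAQ instead has each agent account for the other users' decisions, which this baseline deliberately does not model.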
Source
Journal of Liaoning Technical University (Natural Science), 《辽宁工程技术大学学报(自然科学版)》
Indexed in: CAS; Peking University Core Journals (北大核心)
2011, No. 5, pp. 778-783 (6 pages)
Funding
National Basic Research Program of China (973 Program), Grant No. 2009CB320400