Scalable video coding(SVC)has been widely used in video-on-demand(VOD)service,to efficiently satisfy users’different video quality requirements and dynamically adjust video stream to timevariant wireless channels.Und...Scalable video coding(SVC)has been widely used in video-on-demand(VOD)service,to efficiently satisfy users’different video quality requirements and dynamically adjust video stream to timevariant wireless channels.Under the 5G network structure,we consider a cooperative caching scheme inside each cluster with SVC to economically utilize the limited caching storage.A novel multi-agent deep reinforcement learning(MADRL)framework is proposed to jointly optimize the video access delay and users’satisfaction,where an aggregation node is introduced helping individual agents to achieve global observations and overall system rewards.Moreover,to cope with the large action space caused by the large number of videos and users,a dimension decomposition method is embedded into the neural network in each agent,which greatly reduce the computational complexity and memory cost of the reinforcement learning.Experimental results show that:1)the proposed value-decomposed dimensional network(VDDN)algorithm achieves an obvious performance gain versus the traditional MADRL;2)the proposed VDDN algorithm can handle an extremely large action space and quickly converge with a low computational complexity.展开更多
基金supported by the National Natural Science Foundation of China under Grant No.61801119。
文摘Scalable video coding(SVC)has been widely used in video-on-demand(VOD)service,to efficiently satisfy users’different video quality requirements and dynamically adjust video stream to timevariant wireless channels.Under the 5G network structure,we consider a cooperative caching scheme inside each cluster with SVC to economically utilize the limited caching storage.A novel multi-agent deep reinforcement learning(MADRL)framework is proposed to jointly optimize the video access delay and users’satisfaction,where an aggregation node is introduced helping individual agents to achieve global observations and overall system rewards.Moreover,to cope with the large action space caused by the large number of videos and users,a dimension decomposition method is embedded into the neural network in each agent,which greatly reduce the computational complexity and memory cost of the reinforcement learning.Experimental results show that:1)the proposed value-decomposed dimensional network(VDDN)algorithm achieves an obvious performance gain versus the traditional MADRL;2)the proposed VDDN algorithm can handle an extremely large action space and quickly converge with a low computational complexity.