Based on option-critic algorithm,a new adversarial algorithm named deterministic policy network with option architecture is proposed to improve agent's performance against opponent with fixed offensive algorithm.A...Based on option-critic algorithm,a new adversarial algorithm named deterministic policy network with option architecture is proposed to improve agent's performance against opponent with fixed offensive algorithm.An option network is introduced in upper level design,which can generate activated signal from defensive and of-fensive strategies according to temporary situation.Then the lower level executive layer can figure out interactive action with guidance of activated signal,and the value of both activated signal and interactive action is evaluated by critic structure together.This method could release requirement of semi Markov decision process effectively and eventually simplified network structure by eliminating termination possibility layer.According to the result of experiment,it is proved that new algorithm switches strategy style between offensive and defensive ones neatly and acquires more reward from environment than classical deep deterministic policy gradient algorithm does.展开更多
基金the National Natural Science Foundation of China (No.61673265)the National Key Research and Development Program (No.2020YFC1512203)the Shanghai Commercial Aircraft System Engineering Joint Research Fund (No.CASEF-2022-Z05)。
文摘Based on option-critic algorithm,a new adversarial algorithm named deterministic policy network with option architecture is proposed to improve agent's performance against opponent with fixed offensive algorithm.An option network is introduced in upper level design,which can generate activated signal from defensive and of-fensive strategies according to temporary situation.Then the lower level executive layer can figure out interactive action with guidance of activated signal,and the value of both activated signal and interactive action is evaluated by critic structure together.This method could release requirement of semi Markov decision process effectively and eventually simplified network structure by eliminating termination possibility layer.According to the result of experiment,it is proved that new algorithm switches strategy style between offensive and defensive ones neatly and acquires more reward from environment than classical deep deterministic policy gradient algorithm does.