Funding: supported by the National Key R&D Program of China (2018AAA0101400), the National Natural Science Foundation of China (62173251, 61921004, U1713209), the Natural Science Foundation of Jiangsu Province of China (BK20202006), and the Guangdong Provincial Key Laboratory of Intelligent Decision and Cooperative Control.
Abstract: In this paper, a reinforcement learning method for cooperative multi-agent systems (MAS) with an incrementally growing number of agents is studied. Existing multi-agent reinforcement learning approaches deal with a MAS containing a fixed number of agents and can learn well-performing policies. However, when the number of agents increases, the previously learned policies may not perform well in the new scenario, and the new agents must learn from scratch to find policies that are jointly optimal with the others, which can slow down the learning speed of the whole team. To solve this problem, we propose a new algorithm that takes full advantage of the previously learned knowledge and transfers it from the existing agents to the new ones. Since the existing agents have been trained well in the source environment, they are treated as teacher agents in the target environment; correspondingly, the new agents are called student agents. To enable the student agents to learn from the teacher agents, we first modify the input nodes of the teacher agents' networks to adapt them to the current environment. Then, the teacher agents take the observations of the student agents as input and output advised actions and values as supervisory information. Finally, the student agents combine the reward from the environment with the supervisory information from the teacher agents and learn optimal policies with modified loss functions. By taking full advantage of the teacher agents' knowledge, the search space of the student agents is reduced significantly, which accelerates the learning of the whole system. The proposed algorithm is verified in several multi-agent simulation environments, and the experimental results demonstrate its efficiency.
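The abstract does not give the exact form of the modified loss, so the following is only a minimal PyTorch sketch of one plausible reading: a student's loss that mixes a TD term driven by the environment reward with a distillation term that pulls the student toward the teacher's advised action distribution. All names (student, teacher, beta, the Q-learning setup itself) are illustrative assumptions, not the paper's implementation.

    # Hedged sketch: student loss = (1 - beta) * TD loss + beta * teacher distillation.
    # The teacher consumes the student's observation (its input layer is assumed to
    # have been adapted to the target environment, as the abstract describes).
    import torch
    import torch.nn.functional as F

    def student_loss(student, teacher, obs, action, reward, next_obs, done,
                     gamma=0.99, beta=0.5):
        """beta trades off learning from the environment vs. from the teacher."""
        q = student(obs)                                   # [batch, n_actions]
        q_taken = q.gather(1, action.unsqueeze(1)).squeeze(1)

        with torch.no_grad():
            q_teacher = teacher(obs)                       # advised action values
            target_next = student(next_obs).max(dim=1).values
            td_target = reward + gamma * (1 - done) * target_next

        td_loss = F.mse_loss(q_taken, td_target)           # learn from environment
        # Policy-distillation term: match the teacher's advised action distribution.
        distill_loss = F.kl_div(F.log_softmax(q, dim=1),
                                F.softmax(q_teacher, dim=1),
                                reduction="batchmean")
        return (1 - beta) * td_loss + beta * distill_loss

A schedule that decays beta over training would let the student rely on the teacher early (shrinking its search space) and on the environment reward later, consistent with the acceleration claim.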
Abstract: Classification methods based on pre-training and fine-tuning usually require large amounts of labeled data and therefore cannot be applied to few-shot classification tasks. To address Chinese few-shot news topic classification, this paper proposes KPL (Knowledge enhancement and Prompt Learning), a classification method based on knowledge enhancement and prompt learning. First, a pre-trained model is used to learn the optimal prompt template on the training set. Second, the prompt template is combined with the input text, converting the classification task into a cloze (fill-in-the-blank) task; at the same time, external knowledge is used to expand the label-word space and enrich the semantic information of the label words. Finally, the predicted label words are mapped back to the original labels. Few-shot training and validation sets were formed by random sampling from three news datasets: THUCNews, SHNews, and Toutiao. The experimental results show that the proposed method improves overall performance on the 1-shot, 5-shot, 10-shot, and 20-shot tasks on these datasets, with especially large gains on the 1-shot task: compared with baseline few-shot classification methods, accuracy improves by at least 7.59, 2.11, and 3.10 percentage points on the three datasets respectively, verifying the effectiveness of KPL for few-shot news topic classification.
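To make the prompt-plus-verbalizer pipeline concrete, here is a minimal Python sketch of the idea under stated assumptions: the template text, the verbalizer entries, and the mask_word_score scorer are all hypothetical stand-ins, since the abstract specifies neither the learned template nor the knowledge source; any masked language model can supply the scorer.

    # Hedged sketch of KPL's cloze conversion: wrap the input in a prompt template,
    # score each label's knowledge-expanded word set at the mask position, and map
    # the best-scoring word set back to its original label.
    from typing import Callable, Dict, List

    TEMPLATE = "这是一条关于[MASK][MASK]的新闻:{text}"   # assumed template form

    # External knowledge expands each label into several related label words
    # (entries here are purely illustrative).
    VERBALIZER: Dict[str, List[str]] = {
        "体育": ["体育", "赛事", "足球"],
        "财经": ["财经", "金融", "股市"],
    }

    def classify(text: str,
                 mask_word_score: Callable[[str, str], float]) -> str:
        """mask_word_score(prompt, word) -> log-probability of `word` filling
        the mask slots, as given by some masked language model."""
        prompt = TEMPLATE.format(text=text)
        best_label, best_score = None, float("-inf")
        for label, words in VERBALIZER.items():
            # Aggregate over the expanded label-word set (the mean here is an
            # assumption; the paper's aggregation rule is not stated).
            score = sum(mask_word_score(prompt, w) for w in words) / len(words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

Expanding each label into a word set is what makes the method robust in the 1-shot regime: a single label token gives the language model only one chance to match, while a semantically enriched set captures more of its prior knowledge.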
Abstract: Deep-learning-based personalized news recommendation methods are usually trained with full updates. However, a full update has to keep merging new data into an ever larger training set; while this preserves model performance, training is inefficient. Moreover, for data-privacy and storage reasons, real-world applications usually do not retain all historical data, which makes full updates unsustainable. Incremental learning is the widely adopted solution, but incrementally trained news recommendation models face a new challenge: catastrophic forgetting. Common countermeasures are regularization-based and replay-based methods. Regularization-based methods are limited to aligning the features that individual samples learn on the new task with the response features of the original network, or to matching their spatial geometric structure, and thus lack a global view. Replay-based methods replay data from past tasks, which may leak private data. To address these shortcomings, this paper proposes OT-KR, an incremental learning method for news recommendation models based on Optimal Transport and Knowledge Replay. OT-KR reconstructs a set of joint-distribution knowledge features through a joint-distribution knowledge extractor and uses optimal transport theory to minimize the distribution discrepancy between the new and old tasks during training, ensuring that the domain distribution learned by the new model fits both the old and the new tasks, thereby fusing their knowledge. In particular, to mitigate data-privacy leakage, OT-KR replays only model parameters, rather than samples, as knowledge; at the same time, borrowing the idea of multi-teacher knowledge distillation, it lets the model on the new task fuse the distribution information from all teacher streams simultaneously, with weights assigned according to the order in which the tasks were learned. Experiments on public news recommendation datasets show that OT-KR outperforms news recommendation methods based on current mainstream incremental learning techniques, improving on the previous best performance by an average of 0.55% on AUC and 0.47% on NDCG@10, while striking a good balance between recommendation performance and training efficiency.
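The abstract names the ingredients (an OT discrepancy between task distributions, parameter-only replay, order-weighted multi-teacher fusion) but not their exact form. Below is a minimal PyTorch sketch under stated assumptions: an entropic-regularized Sinkhorn distance stands in for the OT discrepancy, teacher features are assumed to be produced by re-running stored teacher parameters, and the linear order weighting is a guess at the paper's rule; sinkhorn_distance, otkr_loss, and lam are all hypothetical names.

    # Hedged sketch: align new-task features with each stored teacher's features
    # via entropic OT, weighting later teachers more heavily.
    import torch

    def sinkhorn_distance(x, y, eps=0.1, n_iters=50):
        """Entropic OT cost between feature batches x:[n,d] and y:[m,d]."""
        cost = torch.cdist(x, y, p=2) ** 2          # pairwise squared L2 cost
        n, m = cost.shape
        mu = torch.full((n,), 1.0 / n)              # uniform marginals
        nu = torch.full((m,), 1.0 / m)
        u = torch.zeros(n)
        v = torch.zeros(m)
        for _ in range(n_iters):                    # log-domain Sinkhorn updates
            u = eps * (torch.log(mu) - torch.logsumexp((v[None, :] - cost) / eps, dim=1))
            v = eps * (torch.log(nu) - torch.logsumexp((u[:, None] - cost) / eps, dim=0))
        pi = torch.exp((u[:, None] + v[None, :] - cost) / eps)   # transport plan
        return torch.sum(pi * cost)

    def otkr_loss(rec_loss, new_feats, teacher_feats_list, lam=0.1):
        """Recommendation loss plus OT alignment to every teacher stream;
        weights grow with task order (the exact weighting rule is assumed)."""
        k = len(teacher_feats_list)
        weights = torch.arange(1, k + 1, dtype=torch.float)
        weights = weights / weights.sum()
        ot = sum(w * sinkhorn_distance(new_feats, f.detach())
                 for w, f in zip(weights, teacher_feats_list))
        return rec_loss + lam * ot

Because only teacher parameters (and the features they regenerate) enter the loss, no historical user interactions need to be stored, which is how the privacy claim is realized.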