4 articles found
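Research and Development on Deep Hierarchical Reinforcement Learning (cited by: 9)
1
Authors: HUANG Zhigang, LIU Quan, ZHANG Lihua, CAO Jiaqing, ZHU Fei. 《软件学报》 (Journal of Software), EI, CSCD, PKU Core, 2023, No. 2, pp. 733-760 (28 pages)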
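Deep hierarchical reinforcement learning (DHRL) is an important research direction within deep reinforcement learning. It targets problems that classical deep reinforcement learning struggles to solve, such as sparse rewards, sequential decision-making, and weak transfer ability. Its core idea is to build a multi-layer reinforcement learning policy along hierarchical lines: temporal abstraction is used to combine temporally fine-grained lower-level actions and to learn temporally coarse-grained, semantically meaningful upper-level actions, so that a complex problem is decomposed into several simpler ones. In recent years, as research has deepened, DHRL methods have achieved substantial breakthroughs and have been applied to visual navigation, natural language processing, recommender systems, and video captioning. The paper first introduces the theoretical foundations of hierarchical reinforcement learning. It then describes the core techniques of DHRL, including hierarchical abstraction techniques and commonly used experimental environments. It analyzes skill-based and subgoal-based DHRL frameworks in detail and compares the research status and development trends of the various algorithms. It next surveys applications of DHRL in several real-world domains, and closes with an outlook and summary.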
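Keywords: artificial intelligence; reinforcement learning; deep reinforcement learning; semi-Markov decision process; deep hierarchical reinforcement learning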
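Annotation: the abstract above describes temporal abstraction, where one coarse upper-level decision is carried out by several fine-grained lower-level actions. The toy sketch below illustrates only that control structure on a 1-D integer line. The function name `run_hierarchy`, the fixed subgoal list, and the hand-written lower-level rule are assumptions made for illustration; in DHRL both levels would be learned policies, and nothing here is taken from the surveyed algorithms.

```python
def run_hierarchy(start, subgoals, max_low_steps=10):
    """Visit each subgoal in turn on a 1-D integer line.

    Hypothetical stand-in: the upper level is a fixed list of subgoals and
    the lower level is a hand-written rule, not a learned policy.
    """
    state, trace = start, []
    for goal in subgoals:                # upper level: one coarse decision per subgoal
        for _ in range(max_low_steps):   # lower level: fine-grained primitive actions
            if state == goal:
                break
            action = 1 if goal > state else -1
            state += action
            trace.append(action)
    return state, trace


final_state, actions = run_hierarchy(0, [3, 1])
print(final_state, actions)
```

Two upper-level choices (subgoals 3 and 1) expand into five primitive moves here, which is the decomposition of a long task into shorter ones that the abstract refers to.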
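Twin Delayed Deep Deterministic Policy Gradient Method Integrating Gravitational Search (cited by: 1)
2
Authors: XU Ping'an, LIU Quan, HAO Shaopu, ZHANG Lihua. 《软件学报》 (Journal of Software), EI, CSCD, PKU Core, 2023, No. 11, pp. 5191-5204 (14 pages)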
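In recent years, deep reinforcement learning has achieved impressive results on complex control tasks. However, its high sensitivity to hyperparameters and the difficulty of guaranteeing convergence severely limit its applicability to real-world problems. Metaheuristic algorithms, a class of black-box optimization methods that mimic objective laws of nature, can effectively avoid hyperparameter sensitivity, but they cannot cope with very large numbers of parameters to optimize and they use samples inefficiently. To address these problems, the paper proposes a twin delayed deep deterministic policy gradient method based on the gravitational search algorithm (GSA-TD3). The method combines the strengths of both classes of algorithms. First, it updates the policy by gradient-based optimization, which yields higher sample efficiency and faster learning. Second, it introduces a population update method based on the law of universal gravitation into the policy search process, which gives it stronger exploration and better stability. GSA-TD3 is applied to a series of complex control tasks, and the experiments show that it has a significant performance advantage over state-of-the-art deep reinforcement learning methods of the same kind.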
Keywords: deep reinforcement learning; metaheuristic algorithm; gravitational search; deterministic policy gradient; policy search
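Annotation: the abstract above mentions a population update based on the law of universal gravitation. As a rough illustration, the sketch below performs one step of a generic gravitational search update for a minimisation problem: fitter agents get larger masses and pull the others toward them. It is not GSA-TD3 itself. The function name `gsa_step`, its parameters, and the simplifications (every agent attracts every other agent, and the gravitational constant is passed in rather than decayed over time) are assumptions, and how the paper couples this update with TD3 is not shown in the listing.

```python
import math
import random


def gsa_step(positions, velocities, fitness, g_const, rng, eps=1e-12):
    """One generic gravitational-search update (lower fitness is better).

    positions, velocities: one list of floats per agent, all the same length.
    fitness: one objective value per agent.
    g_const: gravitational constant for this step (assumed supplied by the caller).
    """
    best, worst = min(fitness), max(fitness)
    if best == worst:
        raw = [1.0] * len(fitness)  # all agents equally fit
    else:
        raw = [(worst - f) / (worst - best) for f in fitness]
    total = sum(raw)
    masses = [m / total for m in raw]  # normalised so the masses sum to 1

    dim = len(positions[0])
    new_pos, new_vel = [], []
    for i, (x_i, v_i) in enumerate(zip(positions, velocities)):
        accel = [0.0] * dim
        for j, x_j in enumerate(positions):
            if i == j:
                continue
            dist = math.dist(x_i, x_j)
            # Randomly weighted pull of agent j on agent i, scaled by j's mass.
            pull = rng.random() * g_const * masses[j] / (dist + eps)
            for d in range(dim):
                accel[d] += pull * (x_j[d] - x_i[d])
        # Keep a random fraction of the old velocity, then add the acceleration.
        v_next = [rng.random() * v_i[d] + accel[d] for d in range(dim)]
        new_vel.append(v_next)
        new_pos.append([x_i[d] + v_next[d] for d in range(dim)])
    return new_pos, new_vel


rng = random.Random(0)
pos, vel = gsa_step([[0.0], [1.0]], [[0.0], [0.0]], [0.0, 1.0], 1.0, rng)
print(pos)
```

In the two-agent example the worse agent (at 1.0) has zero mass and is drawn toward the better one, while the better agent stays where it is. In a policy-search setting each position vector would presumably hold the parameters of one candidate policy, which is an interpretation rather than something the abstract states.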
Identification of Similar Air Traffic Scenes with Active Metric Learning (cited by: 2)
3
Authors: CHEN Haiyan, HOU Xiaye, YUAN Ligang, ZHANG Bing. Transactions of Nanjing University of Aeronautics and Astronautics, EI, CSCD, 2021, No. 4, pp. 625-633 (9 pages)
The rapid growth of air traffic has continuously increased the workload of controllers, which has become an important factor restricting sector capacity. If similar traffic scenes can be identified, historical decision-making experience may be used to help controllers decide on control strategies quickly. Considering that there are many traffic scenes and that it is hard to label them all, in this paper we propose an active SVM metric learning (ASVM2L) algorithm to measure and identify similar traffic scenes. First, we obtain some traffic scene samples correctly labeled by experienced air traffic controllers. We design an active sampling strategy based on voting difference to choose the most valuable unlabeled samples and label them. Then the metric matrix of all the labeled samples is learned and used to complete the classification of traffic scenes. We verify the effectiveness of ASVM2L on standard data sets, and then use it to measure and classify the traffic scenes in the historical air traffic data set of the Central South Sector of China. The experimental results show that, compared with other existing methods, the proposed method uses the information in traffic scene samples more thoroughly and achieves better classification performance with limited labeled samples.
Keywords: air traffic; similar scene; active learning; metric learning; SVM
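Annotation: the abstract above refers to an active sampling strategy based on voting difference, but the listing does not define it. One plausible reading, sketched below purely as an assumption, is a committee-style vote margin: several classifiers vote on each unlabeled sample, and the samples whose top two classes are closest in votes are sent for labeling first. The names `vote_margin` and `select_most_uncertain` and the example votes are hypothetical and are not taken from ASVM2L.

```python
from collections import Counter


def vote_margin(votes):
    """Difference between the vote counts of the two most-voted classes."""
    counts = [count for _, count in Counter(votes).most_common(2)]
    runner_up = counts[1] if len(counts) > 1 else 0
    return counts[0] - runner_up


def select_most_uncertain(committee_votes, k):
    """Return the k sample ids with the smallest vote margin.

    committee_votes maps a sample id to the list of class labels predicted
    by each committee member (hypothetical input format).
    """
    ranked = sorted(committee_votes, key=lambda sid: vote_margin(committee_votes[sid]))
    return ranked[:k]


votes = {
    "scene_a": [0, 0, 0],  # unanimous: margin 3
    "scene_b": [0, 1, 1],  # split: margin 1
    "scene_c": [0, 1, 2],  # full disagreement: margin 0
}
print(select_most_uncertain(votes, 2))
```

Samples with a small margin are the ones the committee disagrees on most, so labeling them is expected to be the most informative use of a controller's time.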
Recognition of Similar Weather Scenarios in Terminal Area Based on Contrastive Learning (cited by: 2)
4
Authors: CHEN Haiyan, LIU Zhenya, ZHOU Yi, YUAN Ligang. Transactions of Nanjing University of Aeronautics and Astronautics, EI, CSCD, 2022, No. 4, pp. 425-433 (9 pages)
To improve the recognition accuracy of similar weather scenarios (SWSs) in the terminal area, a recognition model for SWS based on contrastive learning (SWS-CL) is proposed. First, a data augmentation method is designed to improve the number and quality of weather scenario samples according to the characteristics of convective weather images. Second, in the pre-trained recognition model of SWS-CL, a loss function is formulated to minimize the distance between the anchor and positive samples and to maximize the distance between the anchor and negative samples in the latent space. Finally, the pre-trained SWS-CL model is fine-tuned with labeled samples to improve the recognition accuracy of SWS. Comparative experiments on weather images of the Guangzhou terminal area show that the proposed data augmentation method effectively improves the quality of the weather image dataset and that the proposed SWS-CL model achieves satisfactory recognition accuracy. It is also verified that the fine-tuned SWS-CL model has clear advantages on datasets with sparse labels.
Keywords: air traffic control; terminal area; similar weather scenarios (SWSs); image recognition; contrastive learning
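Annotation: the abstract above describes a loss that pulls the anchor toward positive samples and pushes it away from negative samples in the latent space. The listing does not give the paper's exact loss, so the sketch below uses a standard triplet margin loss as a stand-in for the general idea. The function name `triplet_margin_loss`, the Euclidean distance, and the `margin` value are assumptions.

```python
import math


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss on latent vectors (a stand-in, not the SWS-CL loss).

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)  # distance to pull down
    d_neg = math.dist(anchor, negative)  # distance to push up
    return max(0.0, d_pos - d_neg + margin)


# Well-separated embedding: positive coincides with the anchor, negative is far away.
print(triplet_margin_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0]))
# Badly arranged embedding: the roles are swapped, so the loss is large.
print(triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0]))
```

Minimising this quantity over many triplets shrinks anchor-positive distances and grows anchor-negative distances, which matches the behaviour the abstract attributes to the pre-training stage.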