Journal Articles
3,575 articles found
1. Evolutionary dynamics of tax-based strong altruistic reward and punishment in a public goods game
Authors: Zhi-Hao Yang, Yan-Long Yang. Chinese Physics B (SCIE, EI, CAS, CSCD), 2024, Issue 9, pp. 247-257.
Abstract: In public goods games, punishments and rewards have been shown to be effective mechanisms for maintaining individual cooperation. However, punishments and rewards are costly to incentivize cooperation, so the generation of costly penalties and rewards has been a complex problem in promoting the development of cooperation. In real society, specialized institutions exist that punish evil people or reward good people by collecting taxes. Motivated by this phenomenon, we propose a strong altruistic punishment or reward strategy in the public goods game. Through theoretical analysis and numerical calculation, we find that tax-based strong altruistic punishment (reward) has greater evolutionary advantages than traditional strong altruistic punishment (reward) in maintaining cooperation, and that tax-based strong altruistic reward leads to a higher level of cooperation than tax-based strong altruistic punishment.
Keywords: evolutionary game theory; strong altruism; punishment; reward
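As an editorial illustration of the mechanism this abstract describes, here is a minimal one-round sketch (in Python) of a public goods game with a tax-funded institution that either fines defectors or rewards cooperators. The group size, synergy factor, tax, and sanction caps are assumed values, not the paper's notation.

```python
def pgg_round(n_coop, group_size=5, r=3.0, c=1.0, tax=0.1,
              fine=0.6, reward=0.6, mode="reward"):
    """One public goods game round with a tax-funded institution
    (illustrative). n_coop members contribute c each; the pot is
    multiplied by r and shared; a flat tax funds the sanctions."""
    share = r * c * n_coop / group_size        # public good per member
    pay_c = share - c - tax                    # cooperator payoff
    pay_d = share - tax                        # defector payoff
    budget = tax * group_size                  # institution's budget
    n_def = group_size - n_coop
    if mode == "punish" and n_def > 0:
        pay_d -= min(fine, budget / n_def)     # fine each defector
    elif mode == "reward" and n_coop > 0:
        pay_c += min(reward, budget / n_coop)  # top up each cooperator
    return pay_c, pay_d

for k in range(6):
    pc, pd = pgg_round(k)
    print(f"{k} cooperators: cooperator {pc:.2f}, defector {pd:.2f}")
```

Comparing mode="reward" with mode="punish" across tax levels is one way to probe the abstract's claim that the tax-funded reward sustains more cooperation than the tax-funded fine.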
2. Evolutionary analysis of green credit and automobile enterprises under the mechanism of dynamic reward and punishment based on government regulation
Authors: Yu Dong, Xiaoyu Huang, Hongan Gan, Xuyang Liu. JUSTC (《中国科学技术大学学报》, Journal of University of Science and Technology of China) (CAS, CSCD, PKU Core), 2024, Issue 5, pp. 49-62, I0007.
Abstract: To explore the green development of automobile enterprises and promote the achievement of the "dual carbon" target, this study constructed, under bounded-rationality assumptions, a tripartite evolutionary game model of government, commercial banks, and automobile enterprises; introduced a dynamic reward and punishment mechanism; and analyzed the development of the three parties' strategic behavior under static and dynamic reward and punishment mechanisms. Vensim PLE was used for numerical simulation analysis. Our results indicate that the system cannot reach a stable state under the static reward and punishment mechanism. A dynamic reward and punishment mechanism can effectively improve system stability and better fit real situations. Under the dynamic mechanism, an increase in the initial probabilities of the three parties promotes system stability, and the government can implement effective supervision by adjusting the upper limit of the reward and punishment intensity. Finally, the implementation of green credit by commercial banks plays a significant role in promoting the green development of automobile enterprises.
Keywords: automobile enterprises; green credit; system dynamics; reward and punishment mechanism
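To illustrate why a dynamic sanction can stabilize a system that a static one cannot, here is a toy two-strategy replicator sketch in which the regulator's fine scales with the current defector share; the payoff numbers and the linear fine schedule are editorial assumptions, far simpler than the paper's tripartite model.

```python
def replicator(x0, steps, dt, fine_fn):
    """Toy replicator dynamics for cooperate-vs-defect, where the
    regulator's fine on defectors depends on the defector share."""
    x = x0                                # fraction of cooperators
    b, c = 3.0, 1.0                       # illustrative benefit and cost
    for _ in range(steps):
        fine = fine_fn(1.0 - x)           # current penalty on defectors
        pay_c = b * x - c
        pay_d = b * x - fine
        avg = x * pay_c + (1 - x) * pay_d
        x = min(max(x + dt * x * (pay_c - avg), 0.0), 1.0)
    return x

print(replicator(0.3, 5000, 0.01, lambda d: 0.8))      # static fine -> collapse
print(replicator(0.3, 5000, 0.01, lambda d: 2.0 * d))  # dynamic fine -> interior mix
```

With the static fine set below the contribution cost, cooperation decays to zero; the defection-scaled fine settles at an interior equilibrium, echoing the stabilizing role the abstract attributes to the dynamic mechanism.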
3. Improved Double Deep Q Network Algorithm Based on Average Q-Value Estimation and Reward Redistribution for Robot Path Planning
Authors: Yameng Yin, Lieping Zhang, Xiaoxu Shi, Yilin Wang, Jiansheng Peng, Jianchu Zou. Computers, Materials & Continua (SCIE, EI), 2024, Issue 11, pp. 2769-2790.
Abstract: By integrating deep neural networks with reinforcement learning, the Double Deep Q Network (DDQN) algorithm overcomes the limitations of Q-learning in handling continuous spaces and is widely applied to path planning for mobile robots. However, the traditional DDQN algorithm suffers from sparse rewards and inefficient utilization of high-quality data. To address these problems, an improved DDQN algorithm based on average Q-value estimation and reward redistribution is proposed. First, to enhance the precision of the target Q-value, the average of multiple previously learned Q-values from the target Q network replaces the single Q-value from the current target Q network. Next, a reward redistribution mechanism is designed to overcome the sparse-reward problem by adjusting the final reward of each action using the round reward from trajectory information. Additionally, a reward-prioritized experience selection method is introduced, which ranks experience samples according to reward values to ensure frequent utilization of high-quality data. Finally, simulation experiments verify the effectiveness of the proposed algorithm in a fixed-position scenario and in random environments. The experimental results show that, compared with the traditional DDQN algorithm, the proposed algorithm achieves shorter average running time, higher average return, and fewer average steps, improving performance by 11.43% in the fixed scenario and 8.33% in random environments. It not only plans economical and safe paths but also significantly improves efficiency and generalization in path planning, making it suitable for wide application in autonomous navigation and industrial automation.
Keywords: Double Deep Q Network; path planning; average Q-value estimation; reward redistribution mechanism; reward-prioritized experience selection
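The key change the abstract describes, replacing the single target-network estimate with an average over several previously learned target networks under Double-DQN action selection, can be sketched as follows. The PyTorch networks, the snapshot count of three, and the batch shapes are assumptions for illustration.

```python
import copy
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, s_dim=4, n_act=2):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(s_dim, 64), nn.ReLU(),
                               nn.Linear(64, n_act))
    def forward(self, s):
        return self.f(s)

online = QNet()
# In training these would be periodic snapshots of the target network.
snapshots = [copy.deepcopy(online) for _ in range(3)]

def averaged_target(s2, r, done, gamma=0.99):
    """Double-DQN target where the target Q-value is the average over
    K previously learned target networks (illustrative sketch)."""
    with torch.no_grad():
        a_star = online(s2).argmax(dim=1, keepdim=True)   # online net picks action
        qs = torch.stack([net(s2).gather(1, a_star) for net in snapshots])
        q_avg = qs.mean(dim=0).squeeze(1)                 # average the estimates
        return r + gamma * (1.0 - done) * q_avg

s2, r, done = torch.randn(8, 4), torch.zeros(8), torch.zeros(8)
print(averaged_target(s2, r, done).shape)                 # torch.Size([8])
```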
4. Enhancing Cross-Lingual Image Description: A Multimodal Approach for Semantic Relevance and Stylistic Alignment
Authors: Emran Al-Buraihy, Dan Wang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 6, pp. 3913-3938.
Abstract: Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, show remarkable performance improvements over state-of-the-art methods. Our model, equipped with an Image & Cross-Language Semantic Matching module and a Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image description, our approach achieves CIDEr scores of 87.9% for English and 81.7% for Arabic, underscoring the substantial contributions of our methodology. Comparative analyses with previous works further affirm the superior performance of our approach, and visual results show that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to improve multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Keywords: cross-language image description; multimodal deep learning; semantic matching; reward mechanisms
5. UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience
Authors: ZHAN Guang, ZHANG Kun, LI Ke, PIAO Haiyin. Journal of Systems Engineering and Electronics (SCIE, CSCD), 2024, Issue 3, pp. 644-665.
Abstract: Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in an interactive environment, where finding the optimal maneuvering decision-making policy has become one of the key issues in enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs), and we present a reward shaping method for both tasks using a potential-based function and expert-guided advice. The proposed algorithm accelerates the convergence of the maneuvering decision-making policy and increases the stability of the policy's output during the later stage of training. The effectiveness of the trained policy is illustrated by the training-parameter curves and extensive experimental results.
Keywords: unmanned aerial vehicle (UAV); maneuvering decision-making; autonomous air-delivery; deep reinforcement learning; reward shaping; expert experience
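The shaping terms the authors build are potential-based, which in the standard formulation adds F(s, s') = gamma * phi(s') - phi(s) to the environment reward and provably preserves the optimal policy. Below is a minimal sketch with an assumed distance-to-target potential standing in for the paper's expert-guided potentials.

```python
import math

def phi(state, target=(0.0, 0.0)):
    """Assumed potential: negative distance to the delivery point, so
    shaping nudges the UAV toward the target area."""
    return -math.dist(state, target)

def shaped_reward(r_env, s, s_next, gamma=0.99):
    """Potential-based shaping F = gamma*phi(s') - phi(s); adding F to
    the environment reward leaves the optimal policy unchanged."""
    return r_env + gamma * phi(s_next) - phi(s)

print(shaped_reward(0.0, (5.0, 5.0), (4.0, 4.5)))  # positive: moved closer
```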
6. Efficient Optimal Routing Algorithm Based on Reward and Penalty for Mobile Adhoc Networks
Authors: Anubha, Ravneet Preet Singh Bedi, Arfat Ahmad Khan, Mohd Anul Haq, Ahmad Alhussen, Zamil S. Alzamil. Computers, Materials & Continua (SCIE, EI), 2023, Issue 4, pp. 1331-1351.
Abstract: Mobile adhoc networks have grown in prominence in recent years and are now utilized in a broader range of applications. The main challenges relate to the routing techniques generally employed in them, and mobile adhoc network management requires further testing and improvement in terms of security. Traditional routing protocols, such as Adhoc On-Demand Distance Vector (AODV) and Dynamic Source Routing (DSR), employ the hop count to calculate the distance between two nodes. The main aim of this research is to determine the optimal method for sending packets while also extending the lifetime of the network, which is achieved by taking into account the residual energy of each network node. Various algorithms for optimal routing based on parameters such as energy, distance, mobility, and the pheromone value are proposed. Moreover, an approach based on a reward and penalty system is given to evaluate the efficiency of the proposed algorithms under the impact of these parameters. The simulation results unveil that the reward-penalty-based approach is quite effective for selecting an optimal routing path, helping to achieve less packet drop and lower node energy consumption while enhancing network efficiency.
Keywords: routing; optimization; reward; penalty; mobility; energy; throughput; pheromone
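A route-scoring function over the four parameters the abstract lists, adjusted by a reward-and-penalty rule, might look like the sketch below; the weights, the bottleneck/aggregate choices, and the pheromone update are assumptions rather than the paper's algorithm.

```python
def route_score(route, w=(0.4, 0.2, 0.2, 0.2)):
    """Score a candidate route from per-node metrics; higher is better.
    Each node carries residual energy, distance to the next hop,
    mobility, and a pheromone value (all illustrative)."""
    e = min(n["energy"] for n in route)               # bottleneck energy
    d = sum(n["dist"] for n in route)                 # total path length
    m = max(n["mobility"] for n in route)             # worst-case mobility
    p = sum(n["pheromone"] for n in route) / len(route)
    return w[0] * e - w[1] * d - w[2] * m + w[3] * p

def reinforce(route, delivered, bonus=0.1, penalty=0.05):
    """Reward a successful delivery, penalize a drop, via pheromone."""
    for n in route:
        n["pheromone"] += bonus if delivered else -penalty

r = [{"energy": 0.9, "dist": 1.2, "mobility": 0.1, "pheromone": 0.5},
     {"energy": 0.7, "dist": 0.8, "mobility": 0.3, "pheromone": 0.4}]
print(route_score(r))
reinforce(r, delivered=True)
print(route_score(r))   # score rises after the reward
```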
7. Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning
Authors: Hongyu Ding, Yuanze Tang, Qing Wu, Bo Wang, Chunlin Chen, Zhi Wang. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2023, Issue 12, pp. 2233-2247.
Abstract: Goal-conditioned reinforcement learning (RL) is an interesting extension of the traditional RL framework, where the dynamic environment and reward sparsity can cause conventional learning algorithms to fail. Reward shaping is a practical approach to improving sample efficiency by embedding human domain knowledge into the learning process. Existing reward shaping methods for goal-conditioned RL are typically built on distance metrics with a linear and isotropic distribution, which may fail to provide sufficient information about an ever-changing environment of high complexity. This paper proposes a novel magnetic field-based reward shaping (MFRS) method for goal-conditioned RL tasks with dynamic targets and obstacles. Inspired by the physical properties of magnets, we consider the target and obstacles as permanent magnets and establish the reward function according to the intensity values of the magnetic field generated by these magnets. The nonlinear and anisotropic distribution of the magnetic field intensity provides more accessible and conducive information about the optimization landscape, yielding a more sophisticated magnetic reward than the distance-based setting. Further, we transform the magnetic reward into the form of potential-based reward shaping by learning a secondary potential function concurrently, ensuring the optimal policy invariance of our method. Experimental results in both simulated and real-world robotic manipulation tasks demonstrate that MFRS outperforms relevant existing methods and effectively improves the sample efficiency of RL algorithms in goal-conditioned tasks under various dynamics of the target and obstacles.
Keywords: dynamic environments; goal-conditioned reinforcement learning; magnetic field; reward shaping
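A rough sketch of the magnetic-reward idea: the target and each obstacle are treated as point "magnets", and the reward at a state is the summed field intensity there. The inverse-cube falloff and the strengths below are editorial simplifications; the paper derives intensities from a physical magnet model and also learns a secondary potential to keep the shaping policy-invariant.

```python
import numpy as np

def magnetic_reward(pos, magnets, eps=1e-6):
    """Sum of dipole-like intensities from each magnet at position pos.
    The target has positive strength (attracts); obstacles negative."""
    pos = np.asarray(pos, dtype=float)
    total = 0.0
    for center, strength in magnets:
        dist = np.linalg.norm(pos - np.asarray(center, dtype=float))
        total += strength / (dist + eps) ** 3     # assumed falloff law
    return total

magnets = [((5.0, 5.0), +1.0),    # target magnet
           ((3.0, 4.0), -0.5)]    # obstacle magnet
print(magnetic_reward((4.0, 4.0), magnets))
```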
8. Optimal vehicle path decision-making based on a Stackelberg game
Author: 李志衡. 《工业控制计算机》 (Industrial Control Computer), 2024, Issue 11, pp. 75-77.
Abstract: With the rapid development of urbanization and motorization, traffic congestion has drawn growing attention because it disrupts travel and affects both travel safety and travel time. To address potential congestion, this paper proposes an optimal vehicle path decision method based on a Stackelberg game, aiming to improve road throughput and relieve congestion. First, payoff models are built for the cloud intelligence center and for the planned vehicles, and a Stackelberg incentive mechanism is constructed under the corresponding constraints. An iterative algorithm is then proposed to determine the cloud intelligence center's optimal reward decision and the vehicles' optimal path decisions. Finally, a numerical example verifies the effectiveness of the incentive mechanism in relieving congestion and improving road capacity.
Keywords: traffic congestion; Stackelberg game; cloud intelligence center; planned vehicles; optimal reward decision
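The leader-follower structure can be made concrete with a deliberately tiny sketch: the vehicle (follower) picks the cheaper of two routes given the posted reward, and the cloud intelligence center (leader) scans for the smallest reward that moves the vehicle onto the detour while the congestion relief still pays off. The costs, the relief value, and the grid search stand in for the paper's payoff models and iterative algorithm.

```python
def best_route(reward, cost=(1.0, 3.0)):
    """Follower: route 0 is short but congested, route 1 is a detour;
    the cloud center pays `reward` for taking the detour."""
    return 1 if reward - cost[1] > -cost[0] else 0

def leader_reward(step=0.1, r_max=5.0, relief_value=2.5):
    """Leader: cheapest reward that tips the vehicle onto the detour
    while the value of relieved congestion still covers the payment."""
    r = 0.0
    while r <= r_max:
        if best_route(r) == 1 and relief_value >= r:
            return round(r, 2)
        r += step
    return None   # no profitable incentive exists

print(leader_reward())   # 2.1: just above the vehicle's extra detour cost
```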
9. Effectiveness of Reward System on Assessment Outcomes in Mathematics
Author: May Semira Inandan. Journal of Contemporary Educational Research, 2023, Issue 9, pp. 52-58.
Abstract: As assessment outcomes give students a sense of accomplishment that is reinforced by a reward system, learning becomes more effective. This research aims to determine the effects of a reward system applied prior to assessment in Mathematics. A quasi-experimental research design was used to examine whether there was a significant difference between the use of a reward system and students' level of performance in Mathematics. Through purposive sampling, the respondents comprised 80 Grade 9 students from two sections of Gaudencio B. Lontok Memorial Integrated School. Based on similar demographics and pre-test results, a control group and a study group participated in the study. Data were treated and analyzed using statistical methods such as the mean and the independent-samples t-test. The findings reveal a significant advantage of the reward system over the non-reward system in increasing students' level of performance in Mathematics. It is concluded that the use of a reward system is effective in improving assessment outcomes in Mathematics, and its use prior to assessment is recommended so that outcomes consistently reflect the intended learning outcomes.
Keywords: mathematics; reward system; assessment outcomes
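The comparison the abstract reports is an independent-samples t-test on two groups' scores; a sketch of that computation with hypothetical scores (the study's raw data are not reproduced here) follows.

```python
from scipy import stats

# Hypothetical post-test scores, for illustration only.
reward_group    = [78, 85, 90, 72, 88, 81, 77, 92, 84, 80]
no_reward_group = [70, 75, 83, 68, 79, 74, 71, 82, 76, 73]

t, p = stats.ttest_ind(reward_group, no_reward_group)
print(f"t = {t:.2f}, p = {p:.4f}")   # p < 0.05 would indicate a significant difference
```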
10. Effects of Grassland Eco-Protection Compensation and Reward System (cited 3 times)
Authors: 杨旭东, 孟志兴, 杨春. Agricultural Science & Technology (CAS), 2016, Issue 6, pp. 1506-1509.
Abstract: The grassland ecological protection compensation and reward policy is the largest-scale investment covering the most extensive areas since the founding of the PRC, and it will be a long-term policy for grassland ecological protection. In this study, from a macro perspective, the policy's effects on grassland productivity, ecological protection, animal husbandry output, and pastoralists' income were analyzed. The results show that, after implementation of the policy, natural grass production and the theoretical grassland stocking rate increased, the average overloading rate of natural grassland livestock decreased significantly, and comprehensive national grassland vegetation coverage is increasing. Adult cattle numbers and beef yield fluctuated, while sheep numbers, adult sheep stock, sheep production, and milk production increased to varying degrees. The per capita net income of farmers and pastoralists, livestock income, and the proportion of livestock income were all higher than before implementation of the policy.
Keywords: grassland eco-protection compensation and reward system; grassland productivity; grassland ecology; animal husbandry production; pastoralists' income
11. Research on a cooperative control strategy for SCR denitration systems based on deep reinforcement learning (cited 3 times)
Authors: 赵征, 刘子涵. 《动力工程学报》 (Journal of Chinese Society of Power Engineering) (CAS, CSCD, PKU Core), 2024, Issue 5, pp. 802-809.
Abstract: To handle the large inertia and frequent disturbances of selective catalytic reduction (SCR) denitration systems, a control strategy is proposed in which a deep deterministic policy gradient (DDPG) agent, optimized with multi-dimensional state information and a piecewise reward function, cooperates with a proportional-integral-derivative (PID) controller. Because the SCR denitration system involves a partially observable Markov decision process (POMDP), which lowers the policy-learning efficiency of the DDPG algorithm, multi-dimensional state information for the system is designed first; a piecewise reward function is then designed; finally, a DDPG-PID cooperative control strategy is constructed to control the SCR denitration system. The results show that the proposed DDPG-PID strategy improves the policy-learning efficiency of the DDPG algorithm and the control performance of the PID controller, while exhibiting strong setpoint tracking, disturbance rejection, and robustness.
Keywords: DDPG; reinforcement learning; SCR denitration system; cooperative control; multi-dimensional state; piecewise reward function
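A piecewise (segmented) reward of the kind the abstract describes might be shaped as below, on the NOx setpoint tracking error: a flat bonus inside the tolerance band, a linear decay in a middle band, and a steep penalty far from the setpoint. The breakpoints, slopes, and units are assumptions.

```python
def piecewise_reward(error, tol=2.0, band=10.0):
    """Segmented reward on the NOx tracking error (assumed mg/m^3)."""
    e = abs(error)
    if e <= tol:
        return 1.0                                # inside tolerance
    if e <= band:
        return 1.0 - (e - tol) / (band - tol)     # linear decay to 0
    return -0.1 * (e - band)                      # steep penalty outside

for e in (0.5, 5.0, 25.0):
    print(e, piecewise_reward(e))
```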
12. Delta EEG Activity in Left Orbitofrontal Cortex in Rats Related to Food Reward and Craving (cited 3 times)
Authors: 付玉, 陈艳梅, 曾涛, 彭沿平, 田绍华, 马原野. Zoological Research (CAS, CSCD, PKU Core), 2008, Issue 3, pp. 260-264.
Abstract: The orbitofrontal cortex (OFC) is particularly important for the neural representation of reward value. Previous studies indicated that electroencephalogram (EEG) activity in the OFC is involved in drug administration and withdrawal. The present study investigated EEG activity in the OFC of rats during the development of food reward and craving. Two environments were used separately for control and food-related EEG recordings. In the food-related environment, rats were first trained to eat chocolate peanuts; then they either had no access to this food but could see and smell it (craving trials), or had free access to it (reward trials). The EEG in the left OFC was recorded during these trials. We show that, in the food-related environment, EEG activity peaking in the delta band (2-4 Hz) was significantly correlated with the stimulus, increasing during food reward and decreasing during food craving compared with the control environment. Our data suggest that EEG activity in the OFC can be altered by food reward; moreover, delta rhythm in this region could be used as an index monitoring the changed signal underlying this reward.
Keywords: orbitofrontal cortex; EEG; reward; craving; delta band
13. Co-effect of Demand-control-support Model and Effort-reward Imbalance Model on Depression Risk Estimation in Humans: Findings from Henan Province of China (cited 9 times)
Authors: YU Shan Fa, NAKATA Akinori, GU Gui Zhen, SWANSON Naomi G, ZHOU Wen Hui, HE Li Hua, WANG Sheng. Biomedical and Environmental Sciences (SCIE, CAS, CSCD), 2013, Issue 12, pp. 962-971.
Abstract: Objective: To investigate the joint effect of the demand-control-support (DCS) model and the effort-reward imbalance (ERI) model on the risk estimation of depression in humans, in comparison with the effect of each model used separately. Methods: A total of 3,632 males and 1,706 females from 13 factories and companies in Henan province were recruited into this cross-sectional study. Perceived job stress was evaluated with the Job Content Questionnaire and the Effort-Reward Imbalance Questionnaire (Chinese versions). Depressive symptoms were assessed using the Center for Epidemiological Studies Depression Scale (CES-D). Results: The demands/job-control ratio (DC) and ERI were independently associated with depressive symptoms; the findings for low social support and overcommitment were similar. High DC with low social support (SS), high ERI with high overcommitment, and high DC with high ERI posed greater risks of depressive symptoms than each factor alone. The ERI and SS models were effective in estimating the risk of depressive symptoms when used separately. Conclusion: DC performed better when combined with low SS, and its effect on physical demands was stronger than on psychological demands. Combining the DCS and ERI models can improve the risk estimation of depressive symptoms in humans.
Keywords: depression; work-related stress; demand-control-support; effort-reward imbalance
14. A robotic arm path planning algorithm based on path imitation and SAC reinforcement learning (cited 1 time)
Authors: 宋紫阳, 李军怀, 王怀军, 苏鑫, 于蕾. 《计算机应用》 (Journal of Computer Applications) (CSCD, PKU Core), 2024, Issue 2, pp. 439-444.
Abstract: During the training of robotic arm path planning algorithms, the huge action and state spaces lead to sparse rewards and low training efficiency, and it is difficult to evaluate state and action values over such enormous numbers of states and actions. To address these problems, a robotic arm path planning algorithm based on Soft Actor-Critic (SAC) reinforcement learning is proposed. Demonstration paths are incorporated into the reward function so that the arm imitates them during reinforcement learning, improving learning efficiency, and the SAC algorithm is adopted to make training faster and more stable. Ten paths were planned with the proposed algorithm and with the deep deterministic policy gradient (DDPG) algorithm; the average distances between the planned paths and the reference paths were 0.8 cm and 1.9 cm, respectively. The experimental results show that the path imitation mechanism improves training efficiency, and that the proposed algorithm explores the environment better than DDPG, yielding more reasonable planned paths.
Keywords: imitation learning; reinforcement learning; SAC algorithm; path planning; reward function
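Folding a demonstration path into the reward, as the abstract describes, can be sketched by paying a bonus for staying near the taught waypoints; the exponential kernel and the weight are assumed, and the real method presumably combines this with the task reward and SAC's entropy objective.

```python
import math

def imitation_reward(ee_pos, demo_path, r_task, w_imit=0.5):
    """Task reward plus a bonus for proximity to the demonstration path.
    ee_pos: end-effector position; demo_path: demonstration waypoints."""
    d_min = min(math.dist(ee_pos, p) for p in demo_path)
    return r_task + w_imit * math.exp(-d_min ** 2)   # near path => bonus

demo = [(0.0, 0.0, 0.1), (0.1, 0.0, 0.2), (0.2, 0.1, 0.3)]
print(imitation_reward((0.12, 0.01, 0.21), demo, r_task=-0.01))
```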
15. A UAV collaborative defense scheme driven by DDPG algorithm (cited 1 time)
Authors: ZHANG Yaozhong, WU Zhuoran, XIONG Zhenkai, CHEN Long. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, Issue 5, pp. 1211-1224.
Abstract: The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning approaches based on value iteration and policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to achieve autonomous decisions in continuous state and action spaces. In this paper, a cooperative defense scheme with DDPG via swarms of unmanned aerial vehicles (UAVs) is developed and validated, showing promising practical value for defense. We address the sparse-reward problem of reinforcement learning in a long-term task by building the reward function of the UAV swarm and by optimizing the learning process of the artificial neural network based on the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the swarm's requirements for decentralization and autonomy and promoting the intelligent development of UAV swarms and their decision-making processes.
Keywords: deep deterministic policy gradient (DDPG) algorithm; unmanned aerial vehicle (UAV) swarm; task decision making; deep reinforcement learning; sparse reward problem
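The abstract's answer to sparse rewards is a hand-built swarm reward; one common shape for such a function mixes a dense per-UAV closing-distance term with a shared sparse capture bonus, as in the sketch below. The scale, weight, and bonus are assumptions, not the paper's design.

```python
def swarm_reward(dists_to_intruder, captured, w_team=0.5, scale=0.1):
    """Dense surrogate for a sparse defense reward: each UAV is paid for
    being close to the intruder, plus a shared bonus on capture."""
    closing = [-scale * d for d in dists_to_intruder]   # dense per-UAV term
    team = 10.0 if captured else 0.0                    # sparse team term
    return [c + w_team * team for c in closing]

print(swarm_reward([12.0, 7.5, 3.2], captured=False))
print(swarm_reward([1.0, 2.0, 0.5], captured=True))
```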
16. Brain areas activated by uncertain reward-based decision-making in healthy volunteers (cited 3 times)
Authors: Zongjun Guo, Juan Chen, Shien Liu, Yuhuan Li, Bo Sun, Zhenbo Gao. Neural Regeneration Research (SCIE, CAS, CSCD), 2013, Issue 35, pp. 3344-3352.
Abstract: Reward-based decision-making has been found to activate several brain areas, including the ventrolateral prefrontal lobe, orbitofrontal cortex, anterior cingulate cortex, ventral striatum, and the mesolimbic dopaminergic system. In this study, we observed brain areas activated under three degrees of uncertainty in a reward-based decision-making task (certain, risky, and ambiguous). The tasks were presented using a brain-function audiovisual stimulation system. We conducted brain scans of 15 healthy volunteers using a 3.0 T magnetic resonance scanner and used SPM8 to analyze the location and intensity of activation under the three conditions. We found that the orbitofrontal cortex was activated in the certain-reward condition, while the prefrontal cortex, precentral gyrus, occipital visual cortex, inferior parietal lobe, cerebellar posterior lobe, middle temporal gyrus, inferior temporal gyrus, limbic lobe, and midbrain were activated during the "risk" condition. The prefrontal cortex, temporal pole, inferior temporal gyrus, occipital visual cortex, and cerebellar posterior lobe were activated during ambiguous decision-making. The ventrolateral prefrontal lobe, frontal pole of the prefrontal lobe, orbitofrontal cortex, precentral gyrus, inferior temporal gyrus, fusiform gyrus, supramarginal gyrus, inferior parietal lobule, and cerebellar posterior lobe exhibited greater activation in the "risk" than in the "certain" condition (P < 0.05). The frontal pole and dorsolateral region of the prefrontal lobe, as well as the cerebellar posterior lobe, showed significantly greater activation in the "ambiguous" than in the "risk" condition (P < 0.05). The prefrontal lobe, occipital lobe, parietal lobe, temporal lobe, limbic lobe, midbrain, and posterior lobe of the cerebellum were activated during decision-making about uncertain rewards. Thus, we observed different levels and regions of activation for different types of reward processing during decision-making: as the degree of reward uncertainty increased, the number of activated brain areas increased, including greater activation of areas associated with loss.
Keywords: neural regeneration; neuroimaging; decision-making; reward; uncertainty; cognitive processing; functional magnetic resonance imaging; brain; grants-supported paper; neuroregeneration
17. Detecting Icing on the Blades of a Wind Turbine Using a Deep Neural Network
Authors: Tingshun Li, Jiaohui Xu, Zesan Liu, Dadi Wang, Wen Tan. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, Issue 2, pp. 767-782.
Abstract: The blades of wind turbines located at high latitudes are often covered with ice in late autumn and winter, which affects their capacity for power generation as well as their safety. Accurately identifying icing on the blades of wind turbines in remote areas is thus important, and a general model is needed to this end. This paper proposes a universal model based on a deep neural network (DNN) that uses data from the Supervisory Control and Data Acquisition (SCADA) system. Two SCADA datasets are first preprocessed through undersampling; that is, they are labeled, normalized, and balanced. The blade-icing features identified in previous studies are then used to extract training data from the training dataset. A middle feature is proposed to show how a given feature correlates with blade icing. Performance indicators for the model, including a reward function, are also designed to assess its predictive accuracy. Finally, the most suitable model is used to predict the testing data, and the values of the reward function and the predictive accuracy are calculated. The proposed method relates continuously transferred features to a binary blade-icing status by using variables of the middle feature. The results show that an integrated indicator system is superior to a single accuracy indicator when evaluating the prediction model.
Keywords: DNN; predicting blade icing; SCADA data; wind power; reward function
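The abstract's point that a reward-style integrated indicator beats raw accuracy can be illustrated with an asymmetric-cost score in which missed icing events are penalized more heavily than false alarms; the weights below are assumptions, not the paper's reward function.

```python
def icing_indicators(tp, fp, fn, tn, w_miss=5.0, w_false=1.0):
    """Accuracy alone hides missed icing events on imbalanced data;
    the reward-style score charges misses more than false alarms."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    reward = (tp - w_miss * fn - w_false * fp) / total
    return accuracy, reward

print(icing_indicators(tp=40, fp=10, fn=5, tn=945))   # high accuracy, modest reward
```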
18. A SAC nerve fiber tracking algorithm incorporating prior knowledge and its application
Authors: 林佳俐, 李永强, 赵硕, 冯远静. 《小型微型计算机系统》 (Journal of Chinese Computer Systems) (CSCD, PKU Core), 2024, Issue 7, pp. 1719-1727.
Abstract: Diffusion MRI is currently the only noninvasive method for imaging nerve fibers. Existing fiber tracking algorithms suffer from high invalid-connection or no-connection rates on complex fiber structures such as crossings and branchings. This paper proposes a Soft Actor-Critic fiber tracking algorithm based on prior knowledge. A single-step reward based on the spherical harmonic function model and a sparse reward based on anatomical structure are designed; spherical harmonic information from the six neighboring voxels is incorporated to ensure spatial consistency; and the action from the previous time step is fed into the policy network to strengthen the agent's use of temporal action information. On the Fibercup dataset, the valid connection rate reaches 78.1%, while the invalid-connection and no-connection rates are significantly reduced. The method is also successfully applied to reconstructing the optic nerve, a long-range, noisy structure containing crossing regions. Experimental results show that the method can reconstruct complex structures and effectively reduce the erroneous connection rate.
Keywords: deep reinforcement learning; sparse reward; nerve fiber tracking; optic nerve
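The two reward terms the abstract names, a per-step reward from the spherical-harmonic fiber model and a sparse anatomical reward, might be combined as in the sketch below; the alignment form, the bonus and penalty values, and the termination logic are editorial guesses.

```python
import numpy as np

def step_reward(step_dir, fodf_peak, w=1.0):
    """Assumed per-step reward: alignment between the tracking step and
    the local fODF peak direction from the spherical-harmonic model."""
    cos = np.dot(step_dir, fodf_peak) / (
        np.linalg.norm(step_dir) * np.linalg.norm(fodf_peak))
    return w * abs(float(cos))          # reward alignment, either sign

def terminal_reward(reached_gray_matter, left_mask):
    """Assumed sparse anatomical reward at the end of a streamline."""
    if reached_gray_matter:
        return 10.0                     # anatomically valid termination
    if left_mask:
        return -5.0                     # exited the tracking mask
    return 0.0

print(step_reward(np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])))
print(terminal_reward(True, False))
```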
19. Impact of social relationship on firms' sharing reward program (cited 1 time)
Authors: Wei Wei, Mei Shu'e, Zhong Weijun. Journal of Southeast University (English Edition) (EI, CAS), 2018, Issue 4, pp. 540-544.
Abstract: To support firms' strategic decisions on sharing reward programs (SRPs), a nested Stackelberg game is developed in which the sharing behavior among users and the rewarding strategy of firms are modeled. The optimal sharing bonus is derived, and the impact of social relationships among customers is discussed. The results show that the higher the bonus, the more effort the inductor is willing to make to persuade the inductee to buy. In addition, firms should take the social relationship into consideration when setting the optimal sharing bonus. If the social relationship is weak, there is no need to adopt an SRP; otherwise, there are two ways to reward the inductors. Moreover, the stronger the social relationship, the smaller the sharing bonus that needs to be offered to the inductors and the higher the expected profits. It is therefore reasonable for firms to implement SRPs on social media where users are familiar with each other.
Keywords: social relationship; sharing reward program; incentive strategy; social commerce
20. An improved MATD3 algorithm and its application to adversarial engagements
Authors: 王琨, 赵英策, 王光耀, 李建勋. 《指挥控制与仿真》 (Command Control & Simulation), 2024, Issue 5, pp. 77-84.
Abstract: Improving multi-agent training performance is a central concern in reinforcement learning. Building on the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm, a parameter-sharing mechanism is introduced to raise training efficiency. To ease the inconsistency between the true reward and auxiliary rewards, a decay factor for the auxiliary reward is proposed, inspired by curriculum learning, which preserves exploration incentives early in training while keeping rewards consistent late in training. The improved MATD3 algorithm is applied to combat-vehicle adversarial games to realize intelligent vehicle decision-making. Application results show that the reward curves of the intelligent vehicles converge stably with good performance, and comparative simulations against the original MATD3 algorithm verify that the improved algorithm effectively increases both the convergence speed and the converged reward value.
Keywords: reinforcement learning; parameter sharing; reward consistency; intelligent decision-making
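The auxiliary-reward decay factor can be sketched in one line: shaping dominates early exploration and fades so that late training optimizes the true reward. The exponential schedule below is an assumed form of the decay the abstract describes.

```python
def total_reward(r_true, r_aux, episode, decay=0.999):
    """True reward plus an auxiliary (shaping) reward whose weight
    decays over episodes, restoring reward consistency late in training."""
    lam = decay ** episode          # decay factor in [0, 1]
    return r_true + lam * r_aux

for ep in (0, 1000, 5000):
    print(ep, round(total_reward(r_true=1.0, r_aux=0.5, episode=ep), 3))
```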