Funding: This work was supported by the National Key R&D Program of China (2018AAA0101400), the National Natural Science Foundation of China (62173251, 61921004, U1713209), and the Natural Science Foundation of Jiangsu Province of China (BK20202006).
Abstract: Driven by the improvement of the smart grid, the active distribution network (ADN) has attracted much attention owing to its capability for active management. By making full use of electricity price signals for optimal scheduling, the total cost of the ADN can be reduced. However, the optimal day-ahead scheduling problem is challenging because the future electricity price is unknown. Moreover, in an ADN some schedulable variables are continuous while others are discrete, which increases the difficulty of determining the optimal scheduling scheme. In this paper, the day-ahead scheduling problem of the ADN is formulated as a Markov decision process (MDP) with a continuous-discrete hybrid action space. Then, an algorithm based on multi-agent hybrid reinforcement learning (HRL) is proposed to obtain the optimal scheduling scheme. The proposed algorithm adopts the structure of centralized training and decentralized execution, and different methods are applied to determine the selection policies for continuous and discrete scheduling variables. Simulation results demonstrate the effectiveness of the algorithm.
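To make the hybrid-action formulation concrete, below is a minimal sketch of such an MDP as a gymnasium-style environment. The class name ADNSchedulingEnv, the state variables, and the toy price/load curves and cost model are illustrative assumptions, not the paper's actual model.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ADNSchedulingEnv(gym.Env):
    """Hypothetical day-ahead ADN scheduling MDP with a hybrid action space.

    Continuous actions: e.g., power set-points of controllable units.
    Discrete actions: e.g., on/off positions of capacitor banks or tap changers.
    """

    def __init__(self, n_continuous=3, n_discrete=2, horizon=24):
        self.horizon = horizon
        self.t = 0
        # State: hour of day plus observed electricity price and load.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(3,))
        # Hybrid action: a Dict of a continuous Box and a discrete MultiDiscrete.
        self.action_space = spaces.Dict({
            "continuous": spaces.Box(low=-1.0, high=1.0, shape=(n_continuous,)),
            "discrete": spaces.MultiDiscrete([2] * n_discrete),
        })

    def _obs(self):
        # Toy price/load curves standing in for real day-ahead data.
        price = 30.0 + 10.0 * np.sin(2 * np.pi * self.t / self.horizon)
        load = 1.0 + 0.5 * np.cos(2 * np.pi * self.t / self.horizon)
        return np.array([self.t, price, load], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        # Placeholder cost: purchased energy priced at the current tariff.
        price = self._obs()[1]
        net_load = 1.0 - 0.1 * float(np.sum(action["continuous"])) \
                       - 0.05 * float(np.sum(action["discrete"]))
        reward = -price * net_load  # minimizing cost == maximizing -cost
        self.t += 1
        terminated = self.t >= self.horizon
        return self._obs(), reward, terminated, False, {}
```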
Funding: Supported by the National Natural Science Foundation of China (No. 62002016), the Science and Technology Development Fund, Macao S.A.R. (No. 0137/2019/A3), the Beijing Natural Science Foundation (No. 9204028), and the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515111165).
Abstract: This paper develops deep reinforcement learning (DRL) algorithms for optimizing the operation of a home energy system consisting of photovoltaic (PV) panels, a battery energy storage system, and household appliances. Model-free DRL algorithms can efficiently handle the difficulty of energy system modeling and the uncertainty of PV generation. However, the discrete-continuous hybrid action space of the considered home energy system challenges existing DRL algorithms designed for either discrete or continuous actions. Thus, a mixed deep reinforcement learning (MDRL) algorithm is proposed, which integrates the deep Q-learning (DQL) algorithm and the deep deterministic policy gradient (DDPG) algorithm: the DQL algorithm deals with discrete actions, while the DDPG algorithm handles continuous actions. The MDRL algorithm learns the optimal strategy through trial-and-error interactions with the environment. However, unsafe actions, which violate system constraints, can incur high costs. To handle this problem, a safe-MDRL algorithm is further proposed. Simulation studies demonstrate that the proposed MDRL algorithm can efficiently handle the challenge of the discrete-continuous hybrid action space for home energy management. Compared with benchmark algorithms on the test dataset, the MDRL algorithm reduces the operation cost while maintaining human thermal comfort. Moreover, the safe-MDRL algorithm greatly reduces the loss of thermal comfort that the MDRL algorithm incurs during the learning stage.
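The split between DQL for discrete actions and DDPG for continuous actions can be sketched as follows; this is a minimal illustration under assumed network shapes, not the paper's implementation. The critic here scores every discrete action given the state and the actor's continuous action, so the discrete choice falls out of an argmax as in deep Q-learning.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """DDPG-style actor: maps a state to a continuous action in [-1, 1]."""
    def __init__(self, state_dim, cont_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, cont_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

class HybridCritic(nn.Module):
    """DQL-style critic: given state and continuous action, outputs one
    Q-value per discrete action, so argmax selects the discrete part."""
    def __init__(self, state_dim, cont_dim, n_discrete):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + cont_dim, 128), nn.ReLU(),
            nn.Linear(128, n_discrete),
        )

    def forward(self, state, cont_action):
        return self.net(torch.cat([state, cont_action], dim=-1))

def select_action(actor, critic, state, epsilon=0.1):
    """Greedy continuous action from the actor; epsilon-greedy discrete
    action from the critic's Q-values (DDPG exploration noise omitted)."""
    with torch.no_grad():
        cont = actor(state)
        q = critic(state, cont)
        if torch.rand(1).item() < epsilon:
            disc = int(torch.randint(q.shape[-1], (1,)).item())
        else:
            disc = int(q.argmax(dim=-1).item())
    return cont, disc
```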
Abstract: In physical information theory, elementary objects are represented as correlation structures with oscillator properties and are characterized by action. The procedure makes it possible to describe the photons of positive and negative charges by positive and negative real action; gravitons are represented in equal amounts by positive and negative real, i.e., virtual, action; and the components of the vacuum are characterized by deactivated virtual action. An analysis of the currents in the correlation structures of photons of static Maxwell fields with wave and particle properties, of the Maxwell vacuum, and of the gravitons leads to a uniform three-dimensional representation of the structure of the action. Based on these results, a basic structure consisting of a system of oscillators is proposed, which describes the properties of charges and masses and interacts with the photons of static Maxwell fields and with gravitons. All properties of the elemental components of nature can thus be traced back to a basic structure of action. It follows that nature can be derived from a uniform structure, and this structure of action must therefore also be the basis of the origin of the cosmos.
Abstract: The performance of state-of-the-art deep reinforcement learning algorithms such as Proximal Policy Optimization, Twin Delayed Deep Deterministic Policy Gradient, and Soft Actor-Critic for generating a quadruped walking gait in a virtual environment was presented in the previous research work titled "A Comparison of PPO, TD3, and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation". We demonstrated that the Soft Actor-Critic algorithm had the best performance generating the walking gait for a quadruped under certain sensor configurations in the virtual environment. In this work, we present a performance analysis of the same algorithms for quadruped walking gait generation in a physical environment. Performance in the physical environment is determined by transfer learning augmented with real-time reinforcement learning for gait generation on a physical quadruped. The analysis is carried out on a quadruped equipped with a range of sensors: position tracking using a stereo camera, contact sensing on each of the robot's legs through force-resistive sensors, and proprioceptive information for the robot body and legs from nine inertial measurement units. The performance comparison uses the metrics associated with the walking gait: average forward velocity (m/s), average forward velocity variance, average lateral velocity (m/s), average lateral velocity variance, and quaternion root mean square deviation. The strengths and weaknesses of each algorithm for the given task on the physical quadruped are discussed.
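The gait metrics listed above are straightforward to compute from logged tracking data. Here is a minimal numpy sketch in which the sampling rate, the synthetic traces, and the choice of geodesic angle as the quaternion distance are assumptions rather than details taken from the paper.

```python
import numpy as np

def velocity_stats(positions, dt):
    """Average velocity (m/s) and its variance along one axis, from a
    position trace sampled every dt seconds."""
    v = np.diff(positions) / dt
    return float(v.mean()), float(v.var())

def quaternion_rmsd(quats, reference):
    """Root-mean-square geodesic deviation (radians) between unit
    quaternions and a reference orientation, handling the q/-q ambiguity."""
    quats = quats / np.linalg.norm(quats, axis=1, keepdims=True)
    reference = reference / np.linalg.norm(reference)
    dots = np.clip(np.abs(quats @ reference), -1.0, 1.0)
    angles = 2.0 * np.arccos(dots)  # rotation angle between orientations
    return float(np.sqrt(np.mean(angles ** 2)))

# Toy usage on synthetic tracking data: 100 samples at an assumed 50 Hz.
t = np.arange(100) / 50.0
fwd = 0.3 * t + 0.01 * np.random.randn(100)    # forward position trace (m)
lat = 0.02 * np.random.randn(100)              # lateral position trace (m)
q = np.tile([1.0, 0.0, 0.0, 0.0], (100, 1))    # near-upright body pose
print(velocity_stats(fwd, dt=1 / 50), velocity_stats(lat, dt=1 / 50))
print(quaternion_rmsd(q, np.array([1.0, 0.0, 0.0, 0.0])))
```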
Abstract: An intelligent frequency-hopping communication mode that combines wideband frequency hopping with deep reinforcement learning can effectively improve communication anti-jamming capability. For intelligent decision-making over a dual action space that adjusts the signal frequency and transmit power simultaneously, the underlying deep reinforcement learning algorithm is difficult to design because the frequency is discrete while the power is continuous. Based on the discrete-action deep deterministic policy gradient algorithm (Wolpertinger Deep Deterministic Policy Gradient, W-DDPG), this paper proposes an intelligent anti-jamming decision method suited to wideband frequency-hopping communication with a dual action space composed of transmit frequency and power. Facing the frequency/power dual action space, the method uses the Wolpertinger architecture to handle frequency actions in the frequency space, combines them with power actions into a joint action, and then trains with the DDPG algorithm, so that the algorithm applies to anti-jamming scenarios with the wideband frequency-hopping dual action space and can make decisions quickly in complex electromagnetic environments. Simulation results show that, under jamming patterns in the wideband frequency-hopping dual action space, the convergence speed and anti-jamming performance of this method improve by about 25% over traditional anti-jamming algorithms.
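The Wolpertinger selection step described above, in which a continuous proto-action is mapped onto the discrete frequency set and the critic picks among the nearest candidates while the power action stays continuous, can be sketched as follows. The frequency table, the dummy critic, and all parameter values are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def wolpertinger_select(proto_freq, power, freq_table, critic_q, k=5):
    """Wolpertinger-style action selection over a discrete frequency set.

    proto_freq: continuous proto-action from the actor (MHz, assumed).
    power:      continuous power action from the actor (dBm, assumed).
    freq_table: 1-D array of legal hop frequencies (MHz).
    critic_q:   callable (freq, power) -> scalar Q estimate; a trained
                critic network in practice, any function here.
    """
    # 1. Find the k discrete frequencies nearest the proto-action.
    idx = np.argsort(np.abs(freq_table - proto_freq))[:k]
    candidates = freq_table[idx]
    # 2. Let the critic refine: score each (frequency, power) joint action.
    scores = np.array([critic_q(f, power) for f in candidates])
    # 3. Execute the highest-scoring joint action.
    best = candidates[int(np.argmax(scores))]
    return best, power

# Toy usage: 64 hop channels between 100 and 500 MHz, with a dummy critic
# that prefers frequencies far from a jammed band around 300 MHz.
table = np.linspace(100.0, 500.0, 64)
q_fn = lambda f, p: abs(f - 300.0) - 0.1 * p
print(wolpertinger_select(proto_freq=310.0, power=20.0,
                          freq_table=table, critic_q=q_fn))
```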