Deep reinforcement learning(DRL) has demonstrated significant potential in industrial manufacturing domains such as workshop scheduling and energy system management.However, due to the model's inherent uncertainty...Deep reinforcement learning(DRL) has demonstrated significant potential in industrial manufacturing domains such as workshop scheduling and energy system management.However, due to the model's inherent uncertainty, rigorous validation is requisite for its application in real-world tasks. Specific tests may reveal inadequacies in the performance of pre-trained DRL models, while the “black-box” nature of DRL poses a challenge for testing model behavior. We propose a novel performance improvement framework based on probabilistic automata,which aims to proactively identify and correct critical vulnerabilities of DRL systems, so that the performance of DRL models in real tasks can be improved with minimal model modifications.First, a probabilistic automaton is constructed from the historical trajectory of the DRL system by abstracting the state to generate probabilistic decision-making units(PDMUs), and a reverse breadth-first search(BFS) method is used to identify the key PDMU-action pairs that have the greatest impact on adverse outcomes. This process relies only on the state-action sequence and final result of each trajectory. Then, under the key PDMU, we search for the new action that has the greatest impact on favorable results. Finally, the key PDMU, undesirable action and new action are encapsulated as monitors to guide the DRL system to obtain more favorable results through real-time monitoring and correction mechanisms. Evaluations in two standard reinforcement learning environments and three actual job scheduling scenarios confirmed the effectiveness of the method, providing certain guarantees for the deployment of DRL models in real-world applications.展开更多
The probabilistic real-time automaton (PRTA) is a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix...The probabilistic real-time automaton (PRTA) is a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix tree acceptor (PTA) and the merge procedure based on clustering of the states. These two steps can be very time intensive when a PRTA is to be induced for massive or even unbounded datasets. The latter one can be efficiently processed, as there exist scalable online clustering algorithms. However, the creation of the PTA still can be very time consuming. To overcome this problem, we propose a genuine online PRTA induction approach that incorporates new instances by first collapsing them and then using a maximum frequent pattern based clustering. The approach is tested against a predefined synthetic automaton and real world datasets, for which the approach is scalable and stable. Moreover, we present a broad evaluation on a real world disease group dataset that shows the applicability of such a model to the analysis of medical processes.展开更多
The notion of amortisation has been integrated in quantitative bisimulations to make long-term behavioral comparisons between nondeterministic systems. In this paper, we present sound and complete proof systems for am...The notion of amortisation has been integrated in quantitative bisimulations to make long-term behavioral comparisons between nondeterministic systems. In this paper, we present sound and complete proof systems for amortised strong probabilistic bisimulation and its observational congruence on a process algebra with probability and nondeterminism, and prove their soundness and completeness. Our results make it possible to reason about long-term (observable) probabilistic behaviors by syntactic manipulations.展开更多
基金supported by the Shanghai Science and Technology Committee (22511105500)the National Nature Science Foundation of China (62172299, 62032019)+2 种基金the Space Optoelectronic Measurement and Perception LaboratoryBeijing Institute of Control Engineering(LabSOMP-2023-03)the Central Universities of China (2023-4-YB-05)。
文摘Deep reinforcement learning(DRL) has demonstrated significant potential in industrial manufacturing domains such as workshop scheduling and energy system management.However, due to the model's inherent uncertainty, rigorous validation is requisite for its application in real-world tasks. Specific tests may reveal inadequacies in the performance of pre-trained DRL models, while the “black-box” nature of DRL poses a challenge for testing model behavior. We propose a novel performance improvement framework based on probabilistic automata,which aims to proactively identify and correct critical vulnerabilities of DRL systems, so that the performance of DRL models in real tasks can be improved with minimal model modifications.First, a probabilistic automaton is constructed from the historical trajectory of the DRL system by abstracting the state to generate probabilistic decision-making units(PDMUs), and a reverse breadth-first search(BFS) method is used to identify the key PDMU-action pairs that have the greatest impact on adverse outcomes. This process relies only on the state-action sequence and final result of each trajectory. Then, under the key PDMU, we search for the new action that has the greatest impact on favorable results. Finally, the key PDMU, undesirable action and new action are encapsulated as monitors to guide the DRL system to obtain more favorable results through real-time monitoring and correction mechanisms. Evaluations in two standard reinforcement learning environments and three actual job scheduling scenarios confirmed the effectiveness of the method, providing certain guarantees for the deployment of DRL models in real-world applications.
文摘The probabilistic real-time automaton (PRTA) is a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix tree acceptor (PTA) and the merge procedure based on clustering of the states. These two steps can be very time intensive when a PRTA is to be induced for massive or even unbounded datasets. The latter one can be efficiently processed, as there exist scalable online clustering algorithms. However, the creation of the PTA still can be very time consuming. To overcome this problem, we propose a genuine online PRTA induction approach that incorporates new instances by first collapsing them and then using a maximum frequent pattern based clustering. The approach is tested against a predefined synthetic automaton and real world datasets, for which the approach is scalable and stable. Moreover, we present a broad evaluation on a real world disease group dataset that shows the applicability of such a model to the analysis of medical processes.
基金This work was supported by the National Natural Science Foundation of China under Grant No. 60833001.
文摘The notion of amortisation has been integrated in quantitative bisimulations to make long-term behavioral comparisons between nondeterministic systems. In this paper, we present sound and complete proof systems for amortised strong probabilistic bisimulation and its observational congruence on a process algebra with probability and nondeterminism, and prove their soundness and completeness. Our results make it possible to reason about long-term (observable) probabilistic behaviors by syntactic manipulations.