
Dynamic Treatment Regime Generation Model Combining Dead-ends and Offline Supervision Actor-Critic
Abstract Reinforcement learning depends little on an explicit mathematical model and can build and optimize a policy directly from experience, which makes it well suited to learning dynamic treatment regimes. Existing studies, however, still have three problems: 1) they pursue policy optimality without accounting for risk, so the learned policy carries a degree of risk; 2) they ignore distribution shift, so the learned policy can differ completely from the physicians' policy; 3) they ignore the patient's historical observations and treatment history, so the patient state is captured poorly and the optimal policy cannot be learned. To address these problems, DOSAC-DTR, a dynamic treatment regime generation model combining dead-ends and offline supervised actor-critic, is proposed. First, to account for the risk of the treatment actions recommended by the learned policy, the concept of dead-ends is integrated into the actor-critic framework. Second, to alleviate distribution shift, physician supervision is integrated into the actor-critic framework, so that the gap between the learned policy and the physicians' policy is minimized while the expected return is maximized. Finally, to obtain a state representation that contains the patient's key historical information, an LSTM-based encoder-decoder model is used to model the patient's historical observations and treatment history. Experimental results show that DOSAC-DTR outperforms the baseline methods, achieving a lower estimated mortality rate and a higher Jaccard coefficient.
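The abstract does not give the loss function, so the following is only a minimal sketch of how the three ingredients it names could be combined for a single state with a discrete action set. It assumes that the expected return is maximized, that a cross-entropy term pulls the policy toward the physician's action, and that a per-action risk score (a hypothetical stand-in for a dead-end estimate) is penalized. The function names, the `risk` input, and both weights are illustrative assumptions, not the DOSAC-DTR formulation. The `jaccard` helper shows the standard set-overlap definition, assumed here to compare recommended and prescribed treatments.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_total for x in logits]


def actor_loss(logits, q_values, risk, physician_action,
               supervision_weight=1.0, risk_weight=1.0):
    """Illustrative actor loss for one state (assumed form, not the paper's).

    logits           -- actor scores, one per discrete treatment action
    q_values         -- critic estimates of return, one per action
    risk             -- hypothetical per-action dead-end risk in [0, 1]
    physician_action -- index of the action the physician actually took
    """
    log_probs = log_softmax(logits)
    probs = [math.exp(lp) for lp in log_probs]

    # Term 1: expected return under the current policy (to be maximized).
    expected_return = sum(p * q for p, q in zip(probs, q_values))
    # Term 2: cross-entropy to the physician's action (supervision).
    supervision = -log_probs[physician_action]
    # Term 3: expected risk of the actions the policy favors.
    expected_risk = sum(p * r for p, r in zip(probs, risk))

    return (-expected_return
            + supervision_weight * supervision
            + risk_weight * expected_risk)


def jaccard(recommended, prescribed):
    """Jaccard coefficient |A & B| / |A | B| between two treatment sets."""
    a, b = set(recommended), set(prescribed)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two actions: action 0 has higher value, lower risk, and matches the physician.
uniform = actor_loss([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], physician_action=0)
focused = actor_loss([2.0, 0.0], [1.0, 0.0], [0.0, 1.0], physician_action=0)
overlap = jaccard({"drug_a", "drug_b", "drug_c"}, {"drug_b", "drug_c", "drug_d"})
```

In this toy setting, a policy that concentrates on the safe, high-value, physician-matching action gets a lower loss than a uniform one. The weights trade off return against fidelity to the physician and against risk, and would need tuning in any real use.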
Authors YANG Shasha (杨莎莎); YU Yaxin (于亚新); WANG Yueru (王跃茹); XU Jingming (许晶铭); WEI Yangjie (魏阳杰); LI Xinhua (李新华) (College of Computer Science and Engineering, Northeastern University, Shenyang 110169, China; Key Laboratory of Intelligent Computing in Medical Image, Northeastern University, Shenyang 110169, China)
Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2024, No. 7, pp. 80-88 (9 pages)
Funding National Natural Science Foundation of China (62373084).
Keywords Dynamic treatment regime; Dead-ends; Actor-Critic; State representation