Dynamic Treatment Regime Generation Model Combining Dead-ends and Offline Supervision Actor-Critic

Abstract: Reinforcement learning depends little on an explicit mathematical model and builds and optimizes a policy from experience, which makes it well suited to learning dynamic treatment regimes. Existing studies nevertheless have three problems: 1) they pursue policy optimality without considering risk, so the learned policy can recommend risky treatments; 2) they ignore distribution shift, so the learned policy can differ sharply from the physicians' policy; 3) they ignore the patient's historical observations and treatment history, so the patient state is poorly represented and the optimal policy cannot be learned. To address these problems, this paper proposes DOSAC-DTR, a dynamic treatment regime generation model that combines dead-ends with an offline supervised actor-critic. First, to account for the risk of the treatment actions that the learned policy recommends, the concept of dead-ends is integrated into the actor-critic framework. Second, to alleviate distribution shift, physician supervision is integrated into the actor-critic framework, so that the gap between the learned policy and the physicians' policy is minimized while the expected return is maximized. Finally, to obtain a state representation that retains the key information in the patient's history, an LSTM-based encoder-decoder model is used to model the patient's historical observations and treatment history. Experimental results show that DOSAC-DTR outperforms the baseline methods, achieving a lower estimated mortality rate and a higher Jaccard coefficient.
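
The abstract describes an actor objective that balances three goals: maximizing expected return, staying close to the physicians' policy, and avoiding risky treatment actions. The abstract does not give the actual loss, so the following is only a minimal sketch under stated assumptions: a discrete action space, a negative log-likelihood term standing in for the physician supervision, and a per-action risk estimate standing in for the dead-end signal. The names `supervised_actor_loss`, `dead_end_risk`, `bc_weight`, and `risk_weight` are hypothetical and do not come from the paper.

```python
import math


def softmax(logits):
    """Convert policy logits into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_actor_loss(logits, q_values, physician_action, dead_end_risk,
                          bc_weight=1.0, risk_weight=1.0):
    """Hypothetical per-state actor loss (an illustration, not the paper's formula).

    logits           -- policy logits over discrete treatment actions
    q_values         -- critic's estimated return for each action
    physician_action -- index of the action the physician actually took
    dead_end_risk    -- assumed per-action risk in [0, 1] of reaching a dead-end
    """
    probs = softmax(logits)
    # Term 1: expected return under the current policy (to be maximized).
    expected_return = sum(p * q for p, q in zip(probs, q_values))
    # Term 2: supervision, the negative log-likelihood of the physician's action.
    supervision = -math.log(probs[physician_action])
    # Term 3: expected dead-end risk of the actions the policy would take.
    risk = sum(p * r for p, r in zip(probs, dead_end_risk))
    return -expected_return + bc_weight * supervision + risk_weight * risk


# Usage: two candidate policies for one state with equal Q-values.
q = [1.0, 1.0]
risk = [0.0, 0.9]  # action 1 is assumed likely to lead to a dead-end
safe_policy = supervised_actor_loss([2.0, 0.0], q, physician_action=0, dead_end_risk=risk)
risky_policy = supervised_actor_loss([0.0, 2.0], q, physician_action=0, dead_end_risk=risk)
print(safe_policy < risky_policy)
```

In this sketch the two weights control the trade-off: setting both to zero recovers a plain return-maximizing actor, while larger values pull the policy toward the physician's choices and away from actions flagged as risky.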

Authors: YANG Shasha; YU Yaxin; WANG Yueru; XU Jingming; WEI Yangjie; LI Xinhua (College of Computer Science and Engineering, Northeastern University, Shenyang 110169, China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang 110169, China)

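Affiliations: College of Computer Science and Engineering, Northeastern University; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education (Northeastern University)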

Source: Computer Science (CSCD; Peking University Core Journal), 2024, No. 7, pp. 80-88 (9 pages)

Funding: National Natural Science Foundation of China (62373084).

Keywords: Dynamic treatment regime; Dead-ends; Actor-Critic; State representation

Classification: TP399 [Automation and Computer Technology: Computer Application Technology]
