Abstract
[Objective] To meet the recruitment needs of clinical trials for complex diseases, this study proposes a BERT-TextCNN-based method for identifying disease subtypes in clinical trials, which helps identify participant populations with specific subtypes of a complex disease. [Methods] We cast disease subtype identification as a single-label classification problem and applied a BERT-TextCNN single-label classification model, using stroke as a case study and validating the approach on clinical trial data from ClinicalTrials.gov. [Results] The BERT-TextCNN model based on the LP method performed best, with a weighted macro-average F1 value of 0.9053, and could effectively determine which stroke subtypes a given stroke clinical trial may enroll. [Limitations] The method has not been tested for feasibility on other single diseases or validated on external datasets. [Conclusions] The proposed method can effectively address the problem of accurately identifying complex disease subtypes from inclusion criteria.
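The abstract does not expand "LP"; assuming it refers to the Label Powerset transformation, the reduction to single-label classification can be sketched as follows. Each distinct combination of subtypes a trial may enroll is mapped to one class, so any single-label classifier (such as BERT-TextCNN) can be trained on the resulting class ids. The function names and the subtype labels below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def label_powerset_encode(
    label_sets: Sequence[Sequence[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Map every distinct label combination to a single class id."""
    combo_to_class: Dict[FrozenSet[str], int] = {}
    class_ids: List[int] = []
    for labels in label_sets:
        combo = frozenset(labels)  # order of labels does not matter
        if combo not in combo_to_class:
            # Assign ids in order of first appearance.
            combo_to_class[combo] = len(combo_to_class)
        class_ids.append(combo_to_class[combo])
    return class_ids, combo_to_class


def label_powerset_decode(
    class_id: int, combo_to_class: Dict[FrozenSet[str], int]
) -> FrozenSet[str]:
    """Recover the label combination behind a predicted class id."""
    for combo, cid in combo_to_class.items():
        if cid == class_id:
            return combo
    raise KeyError(class_id)


# Hypothetical subtype annotations for four trials (illustrative only).
trials = [
    ["ischemic"],
    ["ischemic", "hemorrhagic"],
    ["hemorrhagic"],
    ["hemorrhagic", "ischemic"],  # same combination as the second trial
]

class_ids, mapping = label_powerset_encode(trials)
print(class_ids)  # [0, 1, 2, 1]
```

A known trade-off of this transformation is that the number of classes grows with the number of distinct label combinations seen in training, and combinations never seen in training cannot be predicted.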
Authors
杨林
黄晓硕
王嘉阳
丁玲玲
李子孝
李姣
Yang Lin; Huang Xiaoshuo; Wang Jiayang; Ding Lingling; Li Zixiao; Li Jiao (Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China; China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China; Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China)
Source
《数据分析与知识发现》
CSSCI
CSCD
PKU Core Journals
2022, Issue 4, pp. 69-81 (13 pages)
Data Analysis and Knowledge Discovery
Funding
This work is one of the research outcomes of the Beijing Natural Science Foundation Key Research Project (Grant No. Z200016).