Abstract
BACKGROUND: Electronic medical records (EMRs) are an important source of big data in the medical field and embody medical knowledge. As a record of the patient's course of care, the EMR provides essential data support for applications such as clinical decision support systems, precision medicine research, and disease surveillance. OBJECTIVE: To study information extraction techniques for EMRs and to extract important medical entities from Chinese EMRs in support of knowledge discovery for hepatocellular carcinoma. METHODS: Data came from the EMR database of a Grade-A tertiary hospital in Guangdong Province. We collected the records of 240 patients with hepatocellular carcinoma (18 542 sentences), including admission notes and discharge summaries, and annotated them according to predefined standards. The records of 180 patients (13 839 sentences) were randomly selected for training, and the records of the remaining 60 patients (4 703 sentences) were held out as the test set. A named entity recognition (NER) model was trained using a bidirectional long short-term memory (LSTM) network combined with a conditional random field (CRF). The performance of the NER system was evaluated on the test set, and strict-match precision, recall, and F1 were calculated. RESULTS AND CONCLUSION: Evaluation on the test set gave an F1 of 0.853 5 for entities in admission notes, 0.726 5 for entities in discharge summaries, and 0.805 2 overall. This study implemented a model for automatic named entity recognition in EMR text; future work will focus on improving the accuracy of entity extraction.
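The abstract reports strict-match precision, recall, and F1 but does not give the scoring procedure. The following is a minimal sketch of how such scores are commonly computed, assuming BIO-style tags and counting a predicted entity as correct only when both its type and its exact boundaries match the gold annotation. The function names and the `DIS`/`SYM` labels are hypothetical illustrations, not the authors' code or annotation scheme.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); layout is an assumption
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence's BIO tags into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity that runs to the sentence end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def strict_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Strict-match precision, recall, and F1 over a list of sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)  # same type AND same boundaries
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the predicted DIS entity is one token too short,
# so under strict matching it earns no credit; only SYM is a true positive.
gold = [["B-DIS", "I-DIS", "O", "B-SYM"]]
pred = [["B-DIS", "O", "O", "B-SYM"]]
print(strict_prf(gold, pred))
```

Under this criterion a partially overlapping prediction counts as both a false positive and a false negative, which is why strict-match scores are typically lower than relaxed (overlap-based) scores on the same output.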
Authors
杨红梅
李琳
杨日东
周毅
Yang Hong-mei; Li Lin; Yang Ri-dong; Zhou Yi (Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, Guangdong Province, China; Xinjiang Medical University, Urumqi 830011, Xinjiang Uygur Autonomous Region, China)
Source
《中国组织工程研究》
CAS
PKU Core (北大核心)
2018, Issue 20, pp. 3237-3242 (6 pages)
Chinese Journal of Tissue Engineering Research
Funding
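National Key Research and Development Program of China, Precision Medicine Special Project (2016YFC0901602)
NSFC-Guangdong Joint Fund for the Big Data Science Center (U1611261)
Guangdong Province Frontier and Key Technology Innovation Special Fund (2014B010118003)
Guangzhou 2017 Major Special Project for Industry-University-Research Collaborative Innovation (201604016136)
Guangzhou Major Special Project for Health and Medical Collaborative Innovation (201604020016)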