Abstract
Medical entity standardization aims to map non-standard terms in texts such as electronic medical records and patient complaints to unified, standardized medical entities. Because annotated corpora for medical text are typically small and poorly standardized, this paper proposes an ensemble learning framework based on multi-model collaboration to address medical entity standardization. By establishing a "cooperation and competition" mechanism among multiple models, the framework combines the advantages of different standardization methods, such as character-level and semantic-level ones. Specifically, collaborative learning implemented through knowledge distillation extracts effective features from each model, while a competition-aware scheme integrates the entity standardization results of each model to ensure the diversity of the candidate set. In the CHIP-CDN 2021 medical entity standardization evaluation task, the proposed method achieved an F1 score of 73.985% on the blind test set, ranking second among 255 teams, including Baidu BDKG, Ant Financial Antins, and AISpeech. Subsequent experimental results further show that the method can effectively standardize terms in medical texts.
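As a rough illustration of how pooling the outputs of several models can keep a candidate set diverse, the sketch below interleaves ranked candidate lists round-robin and drops duplicates. This is only an assumed, simplified stand-in: the abstract does not specify the actual integration rule, and the function name `merge_candidates`, the parameter `k`, and the example terms are all hypothetical.

```python
from itertools import zip_longest


def merge_candidates(ranked_lists, k):
    """Round-robin merge of per-model ranked candidate lists.

    Hypothetical helper, not the paper's algorithm: it takes the rank-1
    candidate of every model, then the rank-2 candidates, and so on,
    skipping duplicates, until k distinct candidates are collected.
    """
    if k <= 0:
        return []
    merged = []
    seen = set()
    # zip_longest pads shorter lists with None so no model is cut off early.
    for tier in zip_longest(*ranked_lists):
        for candidate in tier:
            if candidate is None or candidate in seen:
                continue
            seen.add(candidate)
            merged.append(candidate)
            if len(merged) == k:
                return merged
    return merged


# Made-up outputs of a character-level and a semantic-level model.
char_level = ["type 2 diabetes mellitus", "type 1 diabetes mellitus", "diabetes insipidus"]
semantic_level = ["type 2 diabetes mellitus", "hyperglycemia", "diabetic nephropathy"]

candidates = merge_candidates([char_level, semantic_level], k=4)
print(candidates)
```

Interleaving by rank, rather than concatenating the lists, means that each model contributes its strongest candidates before any single model's lower-ranked ones fill the set.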
Authors
姜京池
侯俊屹
李雪
关毅
关昌赫
JIANG Jingchi; HOU Junyi; LI Xue; GUAN Yi; GUAN Changhe (AIoT Research Center, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; Language Technology Research Center, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal
2023, Issue 3, pp. 135-142 (8 pages)
Journal of Chinese Information Processing
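Funding
National Natural Science Foundation of China, Youth Fund (NSFC62006063)
Heilongjiang Provincial Postdoctoral Science Foundation, General Program (LBH-Z20015)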