Abstract
Research in the medical domain has found that Chinese electronic medical records (CEMRs) differ between hospital departments, yet annotating a new corpus for each department is very costly. To address this problem, this paper applies transfer learning to cross-department chunking of CEMRs. On a purpose-built CEMR corpus, it compares structured SVM (SSVM) and conditional random field (CRF) models on part-of-speech (POS) tagging and chunking, finds that SSVM performs better, and adopts SSVM as the base tagging model. It further proposes a modified structural correspondence learning (SCL) algorithm for chunking that makes SCL applicable to the SSVM model for domain adaptation on the POS tagging and chunking tasks. Experimental results show that the modified algorithm effectively improves cross-department domain adaptation in sequence labeling tasks.
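The abstract does not spell out how the modified SCL algorithm works. As a point of reference only, the sketch below shows the standard, unmodified SCL feature-augmentation step, not the authors' SSVM-specific version: linear predictors are trained on unlabeled data pooled from both departments to predict each pivot feature from the non-pivot features, the stacked predictor weights are factored to obtain a low-dimensional projection theta, and each feature vector is augmented with theta applied to its non-pivot part. The squared-loss pivot predictors (standard SCL uses a modified Huber loss), power iteration in place of a library SVD, dense 0/1 features, the function names `train_pivot_predictor`, `top_left_singular_vectors` and `scl_augment`, and the toy data are all hypothetical choices for illustration.

```python
"""Sketch of standard SCL feature augmentation (not the paper's modified SCL).

Hypothetical simplifications: dense 0/1 features, squared-loss pivot predictors
trained by gradient descent, and power iteration instead of a library SVD.
"""
import math
import random


def train_pivot_predictor(rows, pivot, nonpivots, lr=0.1, l2=0.01, epochs=200):
    """Fit w so that w . x[nonpivots] predicts whether feature `pivot` fires."""
    n = len(rows)
    w = [0.0] * len(nonpivots)
    for _ in range(epochs):
        grad = [l2 * wi for wi in w]  # gradient of the ridge penalty
        for x in rows:
            xs = [x[j] for j in nonpivots]
            err = sum(wi * xi for wi, xi in zip(w, xs)) - x[pivot]
            for k, xi in enumerate(xs):
                grad[k] += err * xi / n  # gradient of the mean squared error / 2
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def _orthonormalize(v, basis):
    """Gram-Schmidt v against unit vectors in basis; None if v collapses to ~0."""
    for u in basis:
        dot = sum(a * b for a, b in zip(v, u))
        v = [a - dot * b for a, b in zip(v, u)]
    norm = math.sqrt(sum(a * a for a in v))
    return None if norm < 1e-12 else [a / norm for a in v]


def top_left_singular_vectors(cols, h, iters=200, seed=0):
    """Top-h left singular vectors of W (columns = pivot predictors), via W W^T."""
    d = len(cols[0])
    assert h <= d
    m = [[sum(w[i] * w[j] for w in cols) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    vecs = []
    for _ in range(h):
        v = _orthonormalize([rng.random() for _ in range(d)], vecs)
        for _ in range(iters):
            mv = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            nxt = _orthonormalize(mv, vecs)
            if nxt is None:  # remaining spectrum is zero; keep current unit vector
                break
            v = nxt
        vecs.append(v)
    return vecs


def scl_augment(x, theta, nonpivots):
    """Append the h shared-structure features theta . x[nonpivots] to x."""
    xs = [x[j] for j in nonpivots]
    return list(x) + [sum(t * xi for t, xi in zip(row, xs)) for row in theta]


# Hypothetical unlabeled rows pooled from two departments:
# features 0-1 are pivots, features 2-5 are department-specific non-pivots.
rows = [[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 0, 1], [0, 1, 0, 1, 1, 0]]
pivots, nonpivots = [0, 1], [2, 3, 4, 5]
W = [train_pivot_predictor(rows, p, nonpivots) for p in pivots]
theta = top_left_singular_vectors(W, h=2)
print(scl_augment(rows[0], theta, nonpivots))  # 6 original + 2 projected features
```

The augmented vectors would then be passed to the sequence labeler in use (an SSVM in this paper); the abstract does not say how the authors adapt this step to SSVM, so that part is left out.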
Source
Application Research of Computers (计算机应用研究)
CSCD
Peking University Core Journal (北大核心)
2017, No. 7, pp. 2084-2087 (4 pages)
Keywords
Chinese electronic medical record
part-of-speech tagging
chunking
domain adaptation
structured SVM