Abstract
Objective: To train a model for repairing missing text in ancient books of traditional Chinese medicine (TCM), using a purpose-built, high-quality corpus of ancient TCM texts and deep learning language models, so as to support the restoration of ancient TCM books. Methods: N-gram, LSTM, BiLSTM, and RoBERTa models were trained and tested separately. The best-performing model was selected by comparison and then applied to text-repair scenarios. Results: The BiLSTM model outperformed the LSTM model, the LSTM model clearly outperformed the N-gram model, and the RoBERTa model performed best. Applied to the repair of Huangdi Neijing (Yellow Emperor's Classic of Internal Medicine), the RoBERTa model achieved a hit@1 of 63.36% and a hit@5 of 82.57%. Conclusion: Deep learning performs well at repairing missing text in ancient TCM books and can assist in their restoration.
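The abstract reports hit@1 and hit@5 without defining them. A common reading is that, for each masked position, the model ranks candidate characters, and hit@k is the share of positions whose original character appears among the top k candidates. The sketch below assumes that definition and illustrates it with a toy character-bigram model, the simplest form of the N-gram baseline named in the Methods. It is not the authors' code: the four-title toy corpus, the □ mask symbol, add-one smoothing, single-gap inputs, and the function names `train_bigram`, `rank_candidates`, and `hit_at_k` are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # hypothetical sentence boundary markers


def train_bigram(corpus):
    """Count character bigrams, padding each line with boundary markers."""
    bigrams = defaultdict(Counter)
    for line in corpus:
        chars = [BOS] + list(line) + [EOS]
        for prev, nxt in zip(chars, chars[1:]):
            bigrams[prev][nxt] += 1
    vocab = sorted({c for line in corpus for c in line})
    return bigrams, vocab


def prob(bigrams, prev, nxt, v):
    """Add-one smoothed P(nxt | prev); v is the number of possible next tokens."""
    history = sum(bigrams[prev].values())
    return (bigrams[prev][nxt] + 1) / (history + v)


def rank_candidates(model, text, mask="□"):
    """Rank vocabulary characters for the single masked position in text."""
    bigrams, vocab = model
    v = len(vocab) + 1  # every character, plus the end marker
    chars = [BOS] + list(text) + [EOS]
    i = chars.index(mask)
    left, right = chars[i - 1], chars[i + 1]

    def score(c):
        # Use context on both sides of the gap: P(c | left) * P(right | c)
        return prob(bigrams, left, c, v) * prob(bigrams, c, right, v)

    # Highest score first; ties broken by code point for a stable order
    return sorted(vocab, key=lambda c: (-score(c), c))


def hit_at_k(model, examples, k):
    """Fraction of (masked_text, answer) pairs whose answer is in the top k."""
    hits = sum(answer in rank_candidates(model, masked)[:k]
               for masked, answer in examples)
    return hits / len(examples)


# Toy usage: chapter titles as a stand-in corpus (hypothetical data)
corpus = ["上古天真论", "四气调神大论", "生气通天论", "金匮真言论"]
model = train_bigram(corpus)
examples = [("生气通□论", "天"), ("金匮真□论", "言"), ("上古天□论", "真")]
print(rank_candidates(model, "生气通□论")[:3])  # → ['天', '大', '言']
print(hit_at_k(model, examples, 1), hit_at_k(model, examples, 5))  # → 1.0 1.0
```

Because these examples mask lines taken from the training corpus, the perfect scores only demonstrate the mechanics of the metric; a real evaluation would mask held-out text.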
Authors
SHENG Wei
LU Yan-jie
LIU Wei
HU Wei
ZHOU Chong
School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, Hunan Province, China
Source
Chinese Journal of Medical Library and Information Science
CAS
2022, No. 8, pp. 1-7 (7 pages)
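Funding
Scientific Research Project of the Hunan Provincial Department of Education: "Research on Intelligent Analysis and Knowledge Extraction of Ancient TCM Books Integrating Machine Learning" (20C1435)
Natural Science Foundation of Hunan Province: "Research on Complex Semantic Structure Analysis and Knowledge Discovery of TCM Classics" (2022JJ30438)
Postgraduate Innovation Project of Hunan University of Chinese Medicine: "Research and Application of Deep Learning-Based Restoration of Ancient TCM Books" (2022CX121)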
Keywords
Ancient books of traditional Chinese medicine
Language model
Text repairing
Deep learning
RoBERTa