Abstract
Building on the results already achieved in the electronic and digital processing of Dunhuang literature, this paper proposes integrating the transcribed texts and related image materials of Dunhuang literature into a database under a unified standard and format, that is, constructing a multimodal corpus of Dunhuang literature that contains both textual data and the related images. On this basis, it discusses the design objectives and principles of the corpus, as well as the development of a multi-functional system for corpus maintenance, querying, and output of analysis results. Developing and building this corpus would help researchers study the language and script of Dunhuang literature from multiple angles, and it could also be applied to the language teaching of Middle Chinese (中古汉语) texts, helping students carry out data-driven learning.
Authors
康宁
陈冰云
KANG Ning; CHEN Bing-yun (School of Foreign Languages, Qingdao University of Science and Technology, Qingdao 266061, China; Library, Qingdao University of Science and Technology, Qingdao 266061, China)
Source
《青岛科技大学学报(社会科学版)》
2018, No. 4, pp. 110-114 (5 pages)
Journal of Qingdao University of Science and Technology (Social Sciences)
Funding
Shandong Provincial Social Science Planning Research Project (16CZWJ49)