Abstract
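This article applies convolutional neural networks to the metadata processing of Chinese characters in ancient books for digital humanities, recasting ancient-book character recognition as a convolutional neural network classification problem. Lacking an existing training set, the authors construct one with data generation techniques, train the model on it, and use the model to recognize Chinese characters in ancient books. Using the TensorFlow platform, about 240,000 training samples are generated for 773 Chinese characters, and the network model can itself flag images it cannot recognize. This raises precision, and the flagged data can be passed directly to manual recognition, which makes the system more reliable. As a semi-automatic tool for processing ancient-book metadata in digital humanities, it aims to improve the efficiency with which ancient-book resources are used in digital humanities applied research.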
Convolutional neural networks (CNNs) are used to index the metadata of Chinese characters in ancient books in the field of digital humanities, so that the recognition of Chinese characters in ancient books becomes a CNN classification problem. Because no training set is available, data generation technology is used to build one for model training, and the trained model is then used to recognize Chinese characters in ancient books. Specifically, the TensorFlow platform is used to generate about 240,000 training samples for 773 Chinese characters, and the adopted network model can automatically pick out character images it cannot recognize. These images are then passed on for manual recognition, which makes the system more reliable. In short, although still a semi-automatic tool, it can save some manpower cost in the indexing of digital humanities metadata.
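The abstract does not say how the model decides that an image is unrecognizable. One common way to get this behavior, shown here purely as an illustrative sketch, is to reject any prediction whose top softmax probability falls below a cutoff and hand that image to a human. The function names and the 0.9 threshold are hypothetical; only the class count of 773 comes from the abstract.

```python
import math

NUM_CLASSES = 773        # number of characters, taken from the abstract
REJECT_THRESHOLD = 0.9   # hypothetical cutoff, not stated in the source


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_reject(logits, threshold=REJECT_THRESHOLD):
    """Return (class_index, confidence), or (None, confidence) when the
    top probability is below the threshold and the image should be
    routed to manual recognition."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return best, probs[best]
    return None, probs[best]
```

Raising the threshold sends more images to manual review and generally raises the precision of the automatically accepted results, which matches the trade-off the abstract describes.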
Authors
郭利敏
葛亮
刘悦如
GUO Limin; GE Liang; LIU Yueru
Source
《图书馆论坛》
CSSCI
Peking University Core Journal (北大核心)
2019, No. 10, pp. 142-148 (7 pages)
Library Tribune
Keywords
智慧图书馆
人工智能
卷积神经网络
数字人文
古籍汉字识别
smart library
artificial intelligence
convolutional neural network
digital humanities
recognition of Chinese characters in ancient books