Applying Convolutional Neural Networks to the Recognition of Chinese Characters in Ancient Books (cited by: 14)

CNN-Based Recognition of Chinese Characters in Ancient Books

Abstract: This article applies a convolutional neural network (CNN) to the metadata processing of Chinese characters in ancient books for digital humanities, recasting the recognition of ancient-book characters as a CNN classification problem. Because no training set was available, one was built with data generation techniques, used to train the model, and the trained model was then applied to recognizing characters in ancient books. Using the TensorFlow platform, about 240,000 training samples were generated for 773 Chinese characters. The network model can itself flag images it cannot recognize; this raises precision, and the flagged images can be passed directly to manual recognition, which makes the system more reliable. As a semi-automatic tool for processing ancient-book metadata in digital humanities, it is intended to save some manual labor and to make the use of ancient-book resources in digital humanities research more efficient.
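The abstract names two mechanisms without giving details: generating training samples when no labeled set exists, and letting the classifier set aside images it cannot recognize. The sketch below illustrates one plausible reading of each, using only the Python standard library. It is not the authors' TensorFlow implementation. The confidence threshold, the `REJECT` label, the shift-and-noise augmentation, and every function name here are assumptions introduced for illustration.

```python
import math
import random

REJECT = -1  # hypothetical label meaning "hand this image to a human"


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_rejection(logits, threshold=0.9):
    """Return (class_index, confidence), or (REJECT, confidence) when unsure.

    Assumption: an image counts as "unrecognizable" when the top softmax
    probability falls below `threshold`. The paper may use another rule.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence < threshold:
        return REJECT, confidence
    return best, confidence


def shift_glyph(glyph, dx, dy, fill=0):
    """Translate a 2-D bitmap (list of rows) by (dx, dy), padding with `fill`."""
    height, width = len(glyph), len(glyph[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = glyph[y][x]
    return out


def generate_samples(glyph, count, max_shift=1, flip_prob=0.02, seed=0):
    """Produce `count` jittered, noisy copies of one binary glyph bitmap."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        shifted = shift_glyph(glyph, dx, dy)
        # Flip each pixel with a small probability to imitate print noise.
        noisy = [[1 - p if rng.random() < flip_prob else p for p in row]
                 for row in shifted]
        samples.append(noisy)
    return samples
```

For scale, about 240,000 samples over 773 characters comes to roughly 310 samples per character if they are spread evenly. Raising `threshold` sends more images to manual review in exchange for fewer wrong automatic labels, which is the trade-off the abstract describes.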

Authors: GUO Limin; GE Liang; LIU Yueru

Affiliations: Shanghai Library; Shanghai Baokai Software Co., Ltd.; Tongji University Library

Source: Library Tribune (《图书馆论坛》), indexed in CSSCI and the Peking University Core Journals list, 2019, No. 10, pp. 142-148 (7 pages)

Keywords: smart library; artificial intelligence; convolutional neural network; digital humanities; recognition of Chinese characters in ancient books

Classification number: G255.1 [Culture and Science: Library Science]
