Abstract
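To make better use of and explore the content of ancient Tibetan literature, this paper first studies the characteristics of handwritten ancient Tibetan texts and constructs three datasets according to glyph size. Second, three segmentation-based deep learning text detection algorithms, PSENet, PixelLink, and PANNet, are used to detect handwritten ancient Tibetan text in multiple fonts. The detection performance of the three algorithms on handwritten ancient Tibetan text is then evaluated, and their effectiveness across the various handwritten fonts and glyph sizes is analyzed. The results indicate that PSENet and PANNet outperform PixelLink in the same-database experiments, whereas PixelLink outperforms PSENet and PANNet in the cross-database experiments.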
To make full use of and explore the content of ancient handwritten Tibetan books, these books must be digitized, and the first key step in digitization is to detect the Tibetan text in them correctly. In this paper, the characteristics of ancient handwritten Tibetan books are first studied, and three datasets are constructed according to the font size of the books. Second, three deep-learning-based text detection algorithms, namely PSENet, PixelLink, and PANNet, are applied to detect the text of ancient handwritten Tibetan books in multiple fonts, and the performance of the three algorithms is evaluated. Moreover, the performance of the three algorithms in detecting the various fonts and font sizes of ancient handwritten Tibetan books is compared. Our results show that PSENet and PANNet outperform PixelLink in detecting ancient handwritten Tibetan books with three font sizes, while PixelLink outperforms PSENet and PANNet in the cross-database experiment.
Authors
芷香香
高定国
ZHI Xiangxiang; GAO Dingguo (College of Information Science and Technology, Tibet University, Lhasa 850000, China)
Source
《高原科学研究》
CSCD
2022, Issue 2, pp. 89-101 (13 pages)
Plateau Science Research
Funding
National Natural Science Foundation of China (62166038)
Tibet University Graduate High-Level Talent Training Program (00060701)