
Offline handwritten text line segmentation based on high-order correlation clustering
Abstract: Extracting text lines from handwritten document images is an important preprocessing step in document analysis. It remains challenging because handwritten text lines are often not parallel to one another and may be curved or overlapping. To address this problem, this paper proposes a segmentation algorithm for offline handwritten Chinese text lines based on high-order correlation clustering. First, a document hypergraph is constructed whose nodes are connected components and whose hyperedges each link two or more connected components. Then, under a learned similarity measure, high-order correlation clustering labels each pair of connected components as belonging or not belonging to the same text line. Finally, the union-find algorithm merges the connected components into separate text lines. On the 803 unconstrained handwritten Chinese document images of the HIT-MW offline handwriting database, the method achieves a recall (correct) rate of 99.05% and an error rate of 1.96%.
Authors: YIN Yalin (殷亚林), LIU Aimin (刘爱民), ZHOU Xiangdong (周祥东). Affiliations: Department of Digital Media Technology, Jianghan University, Wuhan 430056; Laboratory and Equipment Department, Central China Normal University, Wuhan 430079; Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714.
Source: Journal of Central China Normal University: Natural Sciences (《华中师范大学学报(自然科学版)》), indexed in CAS and the Peking University Core Journals list (北大核心), 2017, No. 1, pp. 18-22, 34 (6 pages in total).
Funding: National Natural Science Foundation of China (61273269)
Keywords: handwritten text line segmentation; high-order correlation clustering; hypergraph
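The last step described in the abstract, merging connected components into text lines with union-find, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the clustering stage has already produced a list of index pairs labelled "same text line"; the names `UnionFind`, `merge_into_lines`, and `same_line_pairs` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each component starts as its own root
        self.size = [1] * n

    def find(self, x: int) -> int:
        # Walk up to the root of the set containing x.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def merge_into_lines(
    num_components: int,
    same_line_pairs: Iterable[Tuple[int, int]],
) -> List[List[int]]:
    """Group component indices into text lines.

    `same_line_pairs` is assumed to hold the pairs that an earlier
    clustering stage labelled as belonging to the same text line.
    """
    uf = UnionFind(num_components)
    for i, j in same_line_pairs:
        uf.union(i, j)

    groups: Dict[int, List[int]] = {}
    for c in range(num_components):
        groups.setdefault(uf.find(c), []).append(c)
    # Sort the groups by their smallest member for a stable output order.
    return sorted(groups.values())


if __name__ == "__main__":
    # Six connected components; pairs (0,1), (1,2) and (3,4) share a line.
    print(merge_into_lines(6, [(0, 1), (1, 2), (3, 4)]))
    # [[0, 1, 2], [3, 4], [5]]
```

Because union-find only needs the positively labelled pairs, components that never appear in such a pair (component 5 above) end up as single-component lines.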
