Offline handwritten text line segmentation based on high-order correlation clustering (cited by: 1)

Abstract: Extracting text lines from handwritten document images is an important preprocessing step in document image analysis, but it remains challenging because handwritten text lines are often multi-skewed, curved, and overlapping. This paper proposes a segmentation method for offline handwritten Chinese text lines based on high-order correlation clustering. First, a document hypergraph is constructed whose nodes correspond to connected components and whose edges each connect at least two connected components. Then, under a learned similarity measure, the high-order correlation clustering algorithm labels pairs of connected components as belonging or not belonging to the same text line. Finally, the union-find algorithm merges the connected components into separate text lines. In experiments on 803 unconstrained handwritten Chinese document images from the HIT-MW database, the method achieved a correct (recall) rate of 99.05% and an error rate of 1.96%.
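The final merging step named in the abstract can be illustrated with a minimal sketch. It assumes that the clustering stage has already produced a list of connected-component index pairs labeled "same text line"; the names `UnionFind` and `group_into_lines` and the input format are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Walk up to the root, shortening the path along the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def group_into_lines(num_components, same_line_pairs):
    """Merge components into text lines from pairwise 'same line' labels.

    same_line_pairs is an assumed input: (i, j) index pairs that an
    upstream step has labeled as belonging to the same text line.
    """
    uf = UnionFind(num_components)
    for a, b in same_line_pairs:
        uf.union(a, b)
    groups = defaultdict(list)
    for c in range(num_components):
        groups[uf.find(c)].append(c)
    return sorted(groups.values())


# Six components; pairs (0,1), (1,2), (3,4) were labeled "same line".
print(group_into_lines(6, [(0, 1), (1, 2), (3, 4)]))
# → [[0, 1, 2], [3, 4], [5]]
```

Because union-find takes the transitive closure of the pairwise labels, components 0 and 2 end up in the same line even though that pair was never labeled directly, and component 5, which appears in no pair, forms a line of its own.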

Authors: YIN Yalin (殷亚林), LIU Aimin (刘爱民), ZHOU Xiangdong (周祥东)

Affiliations: Department of Digital Media Technology, Jianghan University, Wuhan 430056; Laboratory and Equipment Department, Central China Normal University, Wuhan 430079; Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714

Source: Journal of Central China Normal University: Natural Sciences (《华中师范大学学报（自然科学版）》), indexed in CAS and the Peking University Core Journals list, 2017, No. 1, pp. 18-22, 34 (6 pages)

Funding: National Natural Science Foundation of China (61273269)

Keywords: handwritten text line segmentation; high-order correlation clustering; hypergraph

Classification: TP317.2 [Automation and Computer Technology: Computer Software and Theory]




