Abstract
Topographic maps contain many Chinese character annotations in a wide variety of fonts. Some of these characters touch other map symbols, so the merged object either grows beyond the preset size threshold or loses its original structural features, which makes extraction much harder. This paper presents an algorithm that removes such adhesion and extracts the characters by searching for the best segmentation points, and it achieved good results. First, a local search region is defined around the characters that have already been extracted; a large object inside this region indicates that a touching character may be present. Second, a graph is built in which the branch points and end points of the image are the vertices and the line segments between them are the edges. Then the best segmentation points are located in the graph, and the symbol is split into several mutually separate parts. Finally, the characters are extracted by structural analysis of connected components.
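The graph-building and cutting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the touching object has already been thinned to a one-pixel-wide skeleton, classifies pixels with the standard crossing-number test, and uses plain 8-connected component labelling. The abstract does not say how the "best" segmentation point is chosen, so the sketch only shows the effect of cutting at a given candidate. All function names (`to_pixels`, `find_vertices`, `split_at`, and so on) are hypothetical.

```python
from collections import deque

# 8-neighbourhood offsets in clockwise order, starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def to_pixels(rows):
    """Convert rows of '#'/'.' text into a set of (row, col) foreground pixels."""
    return {(r, c) for r, line in enumerate(rows)
            for c, ch in enumerate(line) if ch == "#"}


def crossing_number(pixels, p):
    """Count background-to-foreground transitions around p's 8-neighbourhood."""
    ring = [(p[0] + dr, p[1] + dc) in pixels for dr, dc in RING]
    return sum(1 for i in range(8) if not ring[i] and ring[(i + 1) % 8])


def find_vertices(pixels):
    """Return (end_points, branch_points) of a one-pixel-wide skeleton."""
    ends = {p for p in pixels if crossing_number(pixels, p) == 1}
    branches = {p for p in pixels if crossing_number(pixels, p) >= 3}
    return ends, branches


def connected_components(pixels):
    """Group pixels into 8-connected components with a breadth-first search."""
    unseen, parts = set(pixels), []
    while unseen:
        start = unseen.pop()
        part, queue = {start}, deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in RING:
                q = (r + dr, c + dc)
                if q in unseen:
                    unseen.remove(q)
                    part.add(q)
                    queue.append(q)
        parts.append(part)
    return parts


def split_at(pixels, point):
    """Remove one candidate segmentation point and return the resulting parts."""
    return connected_components(pixels - {point})
```

For example, calling `find_vertices(to_pixels(["#...#", ".#.#.", "..#..", "..#.."]))` on a small Y-shaped skeleton yields three end points and one branch point, and `split_at` on that branch point separates the three strokes. A real pipeline would still need a rule for ranking candidate points and a size test on the resulting parts, neither of which the abstract specifies.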
Source
《中文信息学报》
CSCD
PKU Core Journal (北大核心)
2000, Issue 2, pp. 43-48 (6 pages)
Journal of Chinese Information Processing
Keywords
Topographic map
Chinese character annotation
Segmentation point
Chinese character extraction algorithm
Connected components
Graph