Abstract
This paper proposes an edge-pixel clustering approach for extracting Chinese and English text areas from an image. A stepwise-relaxation clustering method segments the image into pixel subsets according to the positions and colors of edge pixels. Initial text areas are then extracted using the distribution features of edges in text areas, and their boundaries are expanded to obtain the complete text areas. In addition, a text area binarization method is presented that reduces the number of binary images needed when the text color polarity is unknown, which can improve the computational efficiency of subsequent processing such as character segmentation. Experimental results show that the approach is effective for text area extraction, with an extraction completeness of up to 99%.
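The abstract does not say how the binarization step resolves unknown text color polarity. As a rough illustration only, the sketch below uses a common heuristic that is an assumption here, not the paper's method: the pixels on the border of a text region are taken to be mostly background, so the polarity can be guessed from them and a single binary image produced instead of one per polarity. The function name `binarize_text_region` and the midpoint threshold are hypothetical choices made for this example.

```python
def binarize_text_region(gray):
    """Binarize a grayscale text region so that text pixels become 1.

    gray: non-empty list of equal-length rows of integers in 0..255.
    Assumption (not from the paper): the region border is mostly background.
    """
    pixels = [p for row in gray for p in row]
    # Simple midpoint threshold; a real system might use Otsu's method instead.
    threshold = (min(pixels) + max(pixels)) / 2.0

    height, width = len(gray), len(gray[0])
    border = [
        gray[y][x]
        for y in range(height)
        for x in range(width)
        if y in (0, height - 1) or x in (0, width - 1)
    ]

    # Guess the polarity: if most border pixels are bright, the background is bright.
    bright_border = sum(1 for p in border if p > threshold)
    background_is_bright = bright_border * 2 >= len(border)

    if background_is_bright:
        # Dark text on a bright background.
        return [[1 if p <= threshold else 0 for p in row] for row in gray]
    # Bright text on a dark background.
    return [[1 if p > threshold else 0 for p in row] for row in gray]
```

Under this heuristic, dark-on-light and light-on-dark regions both map to the same representation (text = 1), so later stages such as character segmentation would process one binary image per region rather than two.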
Source
《计算机辅助设计与图形学学报》
EI
CSCD
Peking University Core Journal (北大核心)
2006, No. 5, pp. 729-734 (6 pages)
Journal of Computer-Aided Design & Computer Graphics
Keywords
text area extraction
image retrieval
optical character recognition (OCR)
clustering
image binarization