Abstract
This paper proposes a new method, based on the AdaBoost algorithm, for localizing Chinese text in natural scene images. First, candidate text regions are detected from edge features: edges are extracted from the digital image and binarized, and connected-domain analysis then removes connected domains that are clearly not characters, leaving the candidate text regions. Second, four classes of features of Chinese scene text are extracted by analyzing the text regions, and these features are used to build an AdaBoost strong classifier whose weak learners are classification and regression trees (CART). Finally, the candidate regions are passed to the strong classifier to obtain the correct text regions. Experimental results show that the method localizes text well in scene images whose text varies in font, size, and color, and that it achieves high recall and precision.
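The classification stage can be illustrated with a minimal sketch of discrete AdaBoost. It is not the authors' implementation: the abstract does not name the four feature classes, so the two features used here (bounding-box aspect ratio and edge density) are hypothetical, the toy data is invented for illustration, and depth-1 trees (decision stumps) stand in for the CART weak learners.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best depth-1 split."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for polarity in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if (polarity if row[j] > t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    if best is None:
        raise ValueError("no split possible: every feature is constant")
    return best

def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] > t else -polarity

def adaboost_fit(X, y, rounds=3):
    """Discrete AdaBoost; labels are +1 (text) and -1 (non-text)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] < 1e-10:
            break  # a single stump already separates the weighted data
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical features per candidate region: [aspect_ratio, edge_density].
X = [[1.0, 0.45], [0.9, 0.50], [1.1, 0.40], [1.2, 0.55],   # text-like
     [4.0, 0.50], [5.0, 0.42], [1.0, 0.05], [0.9, 0.10]]   # elongated or flat
y = [1, 1, 1, 1, -1, -1, -1, -1]

model = adaboost_fit(X, y, rounds=3)
```

On this toy set no single threshold separates the classes, so the strong classifier needs the weighted vote of several stumps; calling `adaboost_predict(model, [1.0, 0.5])` classifies a roughly square, edge-rich candidate as text. A real system would replace the stumps with deeper CART trees and the two illustrative features with the paper's four feature classes.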
Authors
尹芳
郑亮
陈田田
YIN Fang; ZHENG Liang; CHEN Tiantian (School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China; Instrument Science and Technology Postdoctoral Research Station, Harbin University of Science and Technology, Harbin 150080, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2017, No. 4, pp. 200-204, 208 (6 pages in total)
Computer Engineering and Applications
Funding
Science and Technology Research Project of the Heilongjiang Provincial Department of Education (No. 12541119)
Keywords
text location
text recognition
connected domain
classification and regression tree