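Abstract: OCR technology can play a major role in rapidly building archive catalog databases. The workflow of the OCR-based method for building an archive catalog database comprises five stages: data classification, template creation, OCR recognition, information output, and proofreading. In the management of teaching archives such as graduation certificates and degree certificates, the commercial software ABBYY FineReader and the free, open-source software Tesseract can be used to build archive catalog data quickly, effectively, and automatically.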
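The five-stage workflow can be illustrated with a minimal Python sketch. It is not the authors' implementation: the template format (one regular expression per field), the field names, and the functions `extract_fields`, `build_catalog`, and `to_csv` are all hypothetical, and the OCR engine is passed in as a plain function so that any engine (for example a Tesseract wrapper) could be plugged in. Classification is assumed to have been done beforehand, so the input is already grouped by document class.

```python
import csv
import io
import re
from typing import Callable, Dict, List

# Stage 2 (hypothetical): one template per document class, mapping each
# catalog field to a regular expression applied to the recognized text.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "degree": {
        "name": r"Name:\s*(\S+)",
        "certificate_no": r"Certificate No\.?:\s*(\d+)",
    },
}


def extract_fields(text: str, template: Dict[str, str]) -> Dict[str, str]:
    """Apply a template to OCR text; fields that do not match stay empty."""
    fields = {}
    for field, pattern in template.items():
        match = re.search(pattern, text)
        fields[field] = match.group(1) if match else ""
    return fields


def build_catalog(
    batches: Dict[str, List[str]],
    ocr: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Run stages 3 to 5 over images already sorted by class (stage 1)."""
    rows = []
    for doc_class, images in batches.items():
        template = TEMPLATES.get(doc_class, {})
        for image in images:
            text = ocr(image)  # Stage 3: OCR recognition (engine is injected)
            fields = extract_fields(text, template)
            record = {"source": image, "doc_class": doc_class}
            record.update(fields)
            # Stage 5: flag records a human proofreader should check.
            incomplete = not template or "" in fields.values()
            record["needs_review"] = "yes" if incomplete else "no"
            rows.append(record)
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Stage 4: serialize catalog records as CSV text."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In this sketch, records with a missing field are only flagged, not corrected, which keeps the proofreading stage a manual step while letting the rest of the batch run unattended.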