Jul 4, 2002
(Release 1.31)
1. About this program

These days we have a large number of documents with colored and/or
complex backgrounds. Book covers, advertisements, and CD jackets are
often colorful. In recent books, magazines, and journals, the main text
is sometimes printed on plain colored backgrounds, on illustrations, or
even on pictures. However, current OCR systems often fail to get text
information from such documents because they use rather simple
binarization as preprocessing and cannot separate text characters from
complex backgrounds.

This program, dlabel, is a sample implementation of the method which
we proposed in the following paper.

  Hideaki Goto and Hirotomo Aso,
  "Character Pattern Extraction Based on Local Multilevel
  Thresholding and Region Growing,"
  Proc. 15th Int. Conf. Patt. Recogn. (ICPR2000), Volume 4,
  pp.430-433, 2000.

dlabel is a powerful tool for extracting character patterns from
grayscale document images with complex backgrounds. dlabel can separate
character patterns of various gray levels and sizes from images with
overlapping backgrounds, as long as there is enough difference in gray
level between the character patterns and the backgrounds.

Here is an example of the process.

The method used in this program has the following advantages.

  1. It can separate light character patterns and dark ones
     simultaneously.
  2. It can extract very thin (>1.5 pixels) character strokes.
  3. It is tolerant of dull images in which edges are not very clear.
  4. It is tolerant of slight shading in the image. Even if the
     brightness of a text line changes gradually, it can extract the
     pattern of the text line as a single image.


2. Requirements

Under every Operating System:

  1) The following package is required.
       O2-tools-1.xx.tar.gz (Release 1.08 or later)

Under UNIX or UNIX-like Operating Systems:

  1) ANSI C and C++ compilers.
     Using GNU's gcc and g++ is the easiest way, if you don't care
     about the speed of the programs. I recommend trying compilers
     with better optimization in order to get faster executables.
  2) "make" command.
  3) "xmkmf", "imake" and "makedepend" commands in X Window System
     Version 11 Release 5 or later.

Under the Windows NT Operating System:

  1) ANSI C and C++ compilers. (Never use 16-bit compilers!)
     Microsoft Visual C++ 2.0 or later is desirable.


3. Testing Environments

The current version (1.31) has been tested only in the following
environments.

  Machine:          Sun Ultra60 Model 1360
  OS:               Solaris8 1/01
  X Window System:  X11R6.4
  Compilers:        gcc,g++-2.95.3 with libg++-2.8.1.3 addon

  Machine:          Sun Ultra60 Model 1360
  OS:               Solaris8 1/01
  X Window System:  X11R6.4
  Compilers:        Sun Forte Developer 6 update 2

  Machine:          Compaq XP1000
  OS:               Compaq Tru64 UNIX V4.0F
  X Window System:  DECWINDOWS
                    (OS's standard window system based on X11R6)
  Compilers:        gcc-2.95.2 with libg++-2.8.1.3 addon

  Machine:          NEC TX7/AzusA
  OS:               Red Hat Linux 7.1 (IA64) with kernel 2.4.7
                    modified by NEC
  X Window System:  XFree86-4.0.3-18
  Compilers:        Intel C++ Itanium Compiler Version 6.0b

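
Note on Section 1: the limitation of simple binarization mentioned there can be illustrated with a small, self-contained Python sketch. This is NOT the algorithm used by dlabel or described in the paper; it is only a toy, assuming a one-dimensional "scan line" with a mid-gray background, one dark stroke, and one light stroke. All pixel values, thresholds, and function names are made up for this example. It shows that a single global threshold finds only the dark stroke, while a test on the difference from the background level finds both.

```python
# Toy 1-D "scan line": mid-gray background (128) with one dark stroke (20)
# and one light stroke (240). All values are assumptions for illustration.
BACKGROUND, DARK, LIGHT = 128, 20, 240
scan_line = (
    [BACKGROUND] * 4 + [DARK] * 2 + [BACKGROUND] * 4 + [LIGHT] * 2 + [BACKGROUND] * 4
)


def global_binarize(pixels, threshold):
    """Mark a pixel as text (1) if it is darker than one global threshold."""
    return [1 if p < threshold else 0 for p in pixels]


def two_level_binarize(pixels, background, min_diff):
    """Mark a pixel as text (1) if it differs from the background level by
    at least min_diff in either direction (darker or lighter)."""
    return [1 if abs(p - background) >= min_diff else 0 for p in pixels]


# A single threshold below the background level keeps only the dark stroke.
global_mask = global_binarize(scan_line, threshold=100)

# Comparing against the background level keeps both strokes.
two_level_mask = two_level_binarize(scan_line, background=BACKGROUND, min_diff=50)

print("global   :", "".join(map(str, global_mask)))
print("two-level:", "".join(map(str, two_level_mask)))
```

Raising the global threshold above the background level (for example to 200) does not help: the background is then marked as text together with the dark stroke, and only the light stroke is left out. In this toy the background level is given; in a real image it would have to be estimated locally.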
/*--------------------------------------------------------------------
  Copyright (C) 1999-2002  Hideaki Goto

        All Rights Reserved

  Permission to use, copy, modify, and distribute this software and
  its documentation for any purpose is hereby granted without fee,
  provided that (i) the above copyright notice and this permission
  notice appear in all copies and in supporting documentation, (ii)
  the name of the author, Hideaki Goto, may not be used in any
  advertising or otherwise to promote the sale, use or other
  dealings in this software without prior written authorization
  from the author, (iii) this software may not be used for
  commercial products without prior written permission from the
  author, and (iv) the notice of modification is specified in cases
  where modified copies of this software are distributed.

  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
  THE AUTHOR WILL NOT BE RESPONSIBLE FOR ANY DAMAGE CAUSED BY THIS
  SOFTWARE.
--------------------------------------------------------------------*/