Jul 4, 2002
(Release 1.31)
1. About this program
Many documents today have colored or complex backgrounds. Book
covers, advertisements, and CD jackets are often colorful. In recent
books, magazines, and journals, the main text is sometimes printed on
plain colored backgrounds, on illustrations, or even on pictures.
However, current OCR systems often fail to extract text from such
documents because they rely on rather simple binarization for
preprocessing and cannot separate text characters from complex
backgrounds.
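The limitation can be illustrated with a toy example. The following
Python sketch is not part of dlabel; the function name binarize and
the pixel values are assumptions chosen for illustration. It applies
a single global threshold to one row of gray levels that contains
both a dark and a light stroke on a mid-gray background.

```python
def binarize(pixels, threshold):
    """Global binarization: 1 where the pixel is darker than threshold."""
    return [1 if p < threshold else 0 for p in pixels]


# Hypothetical row of gray levels: mid-gray background (128),
# a dark stroke (30), and a light stroke (230).
row = [128, 128, 30, 30, 128, 128, 230, 230, 128, 128]

low = binarize(row, 100)   # the light stroke merges with the background
high = binarize(row, 200)  # the dark stroke merges with the background

print(low)
print(high)
```

Whichever threshold is chosen, one of the two strokes ends up with
the same label as the background, so a single binarization step
cannot recover both.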
This program, dlabel, is a sample implementation of the method
that we proposed in the following paper.
Hideaki Goto and Hirotomo Aso, "Character Pattern Extraction
Based on Local Multilevel Thresholding and Region Growing,"
Proc. 15th Int. Conf. Patt. Recogn. (ICPR2000), Volume 4,
pp.430-433, 2000.
dlabel is a powerful tool for extracting character patterns from
grayscale document images with complex backgrounds. It can separate
character patterns of various gray levels and sizes from images with
overlapping backgrounds, as long as the gray levels of the character
patterns and the backgrounds differ sufficiently.
Here is an example of the process.
The method used in this program has the following advantages.
1. It can separate light character patterns and dark ones
simultaneously.
2. It can extract very thin (>1.5 pixels) character strokes.
3. It is tolerant of dull images in which edges are not very clear.
4. It is tolerant of slight shading in the image. Even if the
brightness of a text line changes gradually, it can extract the
pattern of the text line as a single image.
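Properties 1 and 4 can be illustrated with a small sketch. The
Python code below is not the algorithm from the paper, which is
based on local multilevel thresholding and region growing; it is a
much simpler stand-in that compares each pixel with the median of
its neighborhood. The function name local_labels, the parameters
radius and margin, and the pixel values are all assumptions made for
illustration.

```python
def local_labels(row, radius=3, margin=40):
    """Label each pixel relative to the median of its local window.

    Returns -1 for pixels clearly darker than their surroundings,
    1 for pixels clearly lighter, and 0 for background.
    This is a simplified illustration, not dlabel's actual method.
    """
    labels = []
    for i, p in enumerate(row):
        lo = max(0, i - radius)
        hi = min(len(row), i + radius + 1)
        window = sorted(row[lo:hi])
        median = window[len(window) // 2]
        if p < median - margin:
            labels.append(-1)
        elif p > median + margin:
            labels.append(1)
        else:
            labels.append(0)
    return labels


# Hypothetical row: the background brightens gradually from 100 to 160,
# with a dark stroke (20) and a light stroke (240) on top of it.
row = [100 + 4 * i for i in range(16)]
row[3] = row[4] = 20
row[11] = row[12] = 240

labels = local_labels(row)
print(labels)
```

In this toy row the dark stroke is labeled -1 and the light stroke
is labeled 1, while the gradually brightening background stays at 0.
Because each pixel is compared only with its neighbors, the slow
change in brightness does not disturb the result.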
2. Requirements
Under every operating system:
1) The following package is required.
O2-tools-1.xx.tar.gz (Release 1.08 or later)
Under UNIX or UNIX-like operating systems:
1) ANSI C and C++ compilers.
Using GNU's gcc and g++ is the easiest way if you don't care
about the speed of the programs. I recommend trying compilers
with better optimization in order to get faster executables.
2) "make" command.
3) "xmkmf", "imake" and "makedepend" commands in X Window System
Version 11 Release 5 or later.
Under the Windows NT operating system:
1) ANSI C and C++ compilers. (Never use 16-bit compilers!)
Microsoft Visual C++ 2.0 or later is recommended.
3. Testing Environments
The current version (1.31) has been tested only in the following
environments.
Machine: Sun Ultra60 Model 1360
OS: Solaris8 1/01
X Window System: X11R6.4
Compilers: gcc,g++-2.95.3 with libg++-2.8.1.3 addon
Machine: Sun Ultra60 Model 1360
OS: Solaris8 1/01
X Window System: X11R6.4
Compilers: Sun Forte Developer 6 update 2
Machine: Compaq XP1000
OS: Compaq Tru64 UNIX V4.0F
X Window System: DECWINDOWS
(OS's standard window system based on X11R6)
Compilers: gcc-2.95.2 with libg++-2.8.1.3 addon
Machine: NEC TX7/AzusA
OS: Red Hat Linux 7.1 (IA64)
with kernel 2.4.7 modified by NEC
X Window System: XFree86-4.0.3-18
Compilers: Intel C++ Itanium Compiler Version 6.0b
/*--------------------------------------------------------------------
Copyright (C) 1999-2002 Hideaki Goto
All Rights Reserved
Permission to use, copy, modify, and distribute this software and
its documentation for any purpose is hereby granted without fee,
provided that (i) the above copyright notice and this permission
notice appear in all copies and in supporting documentation, (ii)
the name of the author, Hideaki Goto, may not be used in any
advertising or otherwise to promote the sale, use or other
dealings in this software without prior written authorization
from the author, (iii) this software may not be used for
commercial products without prior written permission from the
author, and (iv) the notice of modification is specified in cases
where modified copies of this software are distributed.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
THE AUTHOR WILL NOT BE RESPONSIBLE FOR ANY DAMAGE CAUSED BY THIS
SOFTWARE.
--------------------------------------------------------------------*/