----------------------------------------------------------------
    Block-Based Ground Truth Dataset for Scene Text Detection
    scene-osakafu-u-GT4  (Rev.20090417)
----------------------------------------------------------------

1. About This Dataset

This dataset contains four-class Ground Truth data for the
natural scene images containing text that are provided at:
  http://www.cs.osakafu-u.ac.jp/document/ 

This dataset is intended for evaluating block-based text
detection algorithms. Please refer to the following paper for
details.

  [HG2008IJDAR]
    Hideaki Goto, "Redefining the DCT-based feature for scene
      text detection  --  Analysis and comparison of spatial
      frequency-based features," IJDAR, Vol.11, No.1, pp.1-8
      (2008).

The original package of this dataset can be found on our
website:  http://www.imglab.org/db/


----------------------------------------------------------------
Copyright (C) 2005-2009  Hideaki Goto
All Rights Reserved.

You may use, copy, modify, merge, and distribute this dataset
without restriction and free of charge, subject to the following
conditions.

* The above copyright notice and this permission notice shall be
  included in all copies or substantial portions of the dataset.

* Modification(s) made to the dataset and the reason(s) of the
  modification(s) must be clearly explained in a document, and
  the document shall be included in all copies or substantial
  portions of the dataset.

THE DATASET IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
THE AUTHOR(S) OR COPYRIGHT HOLDER(S) WILL NOT BE RESPONSIBLE FOR
ANY DAMAGE CAUSED BY THIS DATASET.
----------------------------------------------------------------


2. Description

The dataset is based on the Ground Truth data used by our
research group (http://www.sc.isc.tohoku.ac.jp/).

The original files of scene images can be found at:
  http://www.cs.osakafu-u.ac.jp/document/
    (C) 2005 Koichi Kise

The basic block size is 16x16 (pixels).
Each block was manually classified into one of the following
four classes.

	value	class
	0	text block
	127	intermediate block
	200	large text block
	255	non-text block
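
The table above can be turned into a small lookup, shown here as
a minimal Python sketch. The names CLASS_BY_VALUE, block_index,
and class_histogram are hypothetical and are not part of this
dataset. How the values are read from the Ground Truth files is
left out, because the file layout is not described in this
document.

```python
from collections import Counter

BLOCK_SIZE = 16  # basic block size in pixels, as stated above

# Gray values and class names, copied from the table above.
CLASS_BY_VALUE = {
    0: "text block",
    127: "intermediate block",
    200: "large text block",
    255: "non-text block",
}


def block_index(x, y, block_size=BLOCK_SIZE):
    """Return (column, row) of the block containing pixel (x, y)."""
    return x // block_size, y // block_size


def class_histogram(values):
    """Count blocks per class name.

    Raises ValueError if a value is not one of the four listed
    gray values, which helps catch mislabeled blocks.
    """
    unexpected = sorted(set(values) - set(CLASS_BY_VALUE))
    if unexpected:
        raise ValueError(f"unexpected gray values: {unexpected}")
    return Counter(CLASS_BY_VALUE[v] for v in values)
```

For example, class_histogram([0, 255, 255, 127]) counts two
non-text blocks, one text block, and one intermediate block.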

The criteria used are as follows.

  a) If a block does not contain any character stroke at all,
     the block is classified as "non-text block."

  b) If a character is smaller than 8x8 pixels, we cannot expect
     typical OCRs to recognize such a small character accurately.
     Therefore, if a block contains only such characters, the
     block is classified as "intermediate block."

  c) If a character is larger than twice the block size, its
     strokes often do not fit in a block. Therefore, if a block
     contains only such large characters, the block is
     classified as "large text block."

  d) If a block contains only a small portion of a stroke, the
     block is classified as "intermediate block."

  e) All other blocks are classified as "text block."
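
The blocks in this dataset were classified manually. Purely as
an illustration, the criteria above can be sketched as a Python
decision function. Everything here is an assumption made for the
sketch: the function name classify_block, its inputs, applying
the criteria in the listed order, reading "smaller than 8x8" as
both sides below 8 pixels, reading "larger than twice the block
size" as the longer side above 32 pixels, and the threshold
min_stroke_pixels for "a small portion of a stroke," which the
criteria do not quantify.

```python
BLOCK_SIZE = 16  # basic block size in pixels


def classify_block(char_sizes, stroke_pixels, min_stroke_pixels=4):
    """Return the gray value of the class for one block.

    char_sizes:        (width, height) in pixels of every character
                       that has a stroke inside the block
    stroke_pixels:     number of stroke pixels inside the block
    min_stroke_pixels: hypothetical threshold for criterion d)
    """
    # a) no character stroke at all: non-text block
    if not char_sizes or stroke_pixels == 0:
        return 255
    # b) only characters smaller than 8x8: intermediate block
    if all(w < 8 and h < 8 for w, h in char_sizes):
        return 127
    # c) only characters larger than twice the block size
    if all(max(w, h) > 2 * BLOCK_SIZE for w, h in char_sizes):
        return 200
    # d) only a small portion of a stroke: intermediate block
    if stroke_pixels < min_stroke_pixels:
        return 127
    # e) everything else: text block
    return 0
```

The returned values are the gray values from the class table
(0, 127, 200, 255).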


3. Changes

  Rev.20090417
    Document update.

  Rev.20060316
    Fixed 004.pgm, in which some white blocks had values other
    than 255.

  Rev.20051025
    First release.


4. Credits

This Ground Truth dataset was created by the following people.

  Tohoku University, Japan
      Seiji Saito
      Tomoyuki Saoi
      Hiroki Shiratori
      Hideaki Goto


Contact:
Assoc. Prof. Hideaki Goto
Cyberscience Center, Tohoku University,
Sendai 980-8578, JAPAN
E-mail:  hgot_(at)_isc.tohoku.ac.jp  (remove underscores)
WWW   :  http://www.sc.isc.tohoku.ac.jp/~hgot/