This project was set up to provide software for experiments in document analysis and recognition. The software is mainly for layout analysis. (Note: no OCR is included.)
As researchers and developers, we often find it hard to evaluate the performance of algorithms for document analysis and recognition, because documents have such varied layouts. We can evaluate performance on standard databases, as is often done in character recognition research. However, such an evaluation yields only limited data. Even if we gather many document images at random, the evaluation may be meaningless if only a few of the images contain the problems we want to tackle.
What kinds of problems arise depends on the application. Each researcher may take up a different problem from other researchers, and may want to examine in detail the point of particular interest to him or her. For such an evaluation or examination, it can be effective to gather data that is hard for the algorithm and to compare the performance of new and earlier methods. For such a comparison, it is convenient to have programs that implement the earlier methods.
Moreover, implementing an algorithm is often hard and time-consuming, even when the published algorithm is not hard to understand. We are also often unable to get enough information to reproduce or check an earlier method, because details of the algorithm or the actual parameter values are omitted. To reduce the time-consuming work that experiments require and to carry out research efficiently, the best approach is for programs to be released and made available to everyone.
In this project, named "project-O2", my colleague and I implement the methods we have developed and actively release the programs, with source code where possible. The released software packages also contain some programs that may be convenient for many researchers, even though they incorporate no new method.
The following impacts are expected from this project.
New methods are developed and earlier methods are improved at a rapid pace. A method is already old by the time it is published. However, it can still serve as a target for future work to surpass. This project never aims to propose a "Standard", but I believe it can provide some "References" as of a past point in time.