OCRopus
OCRopus is used for text recognition and document analysis. Its modular design combines document layout analysis, optical character recognition and statistical language models. Individual components can easily be exchanged for other modules.
OCRopus is free and is aimed at both home users and companies. Its biggest user at present is Google Book Search. Its developer, Thomas Breuel of the German Research Centre for Artificial Intelligence (DFKI), is supported by Google Inc. The application has since been released in version 2.0 under an Apache license.
It is developed under Ubuntu Linux in C++ and Python, with Jam as the build system. At present, the only recognition module available for OCRopus is Tesseract, originally developed by Hewlett-Packard. It should be possible to incorporate other modules in the future. The results of this combination are already better than those of Tesseract alone, although OCRopus does not yet have its own language model. Once the first official version of the OpenFST project is released, it is to be used for this purpose.
Operating system: Linux | Website / Download: http://code.google.com/p/ocropus
GOCR

The GOCR text recognition program is also known as JOCR. The developers had to find an alternative name for it because the name GOCR was already in use and well known. This conflict was first noticed when the source code was published on www.sourceforge.net.
GOCR is a command-line program. It first appeared on Linux and is available for all popular Linux distributions. However, it can also be used on OS/2 and Windows; the binaries needed for this were published by two external programmers, Franz Bakan and Peter B.L. Meijer. The program is free and is used by scanner software under KDE, among others. It can recognise some fonts and one-dimensional barcodes without needing to access a database. This makes the software especially easy to use, but it cannot keep up with commercially available alternatives.
Operating systems: Linux, Windows, OS/2, Mac OS X | Website / Download: http://www.gocr.de
CuneiForm

CuneiForm is text recognition software for printed text. It cannot recognise handwriting, but it is able to read tables. Its language model supports 20 different languages, and the results can be saved as HTML, RTF or ASCII text, or exported directly to Word or Excel. Exporting does not alter the fonts or the structure of the document.
CuneiForm was recently released as open source software. It was developed by the Russian company Cognitive Technologies and has only been freely available since April 2008, when its source code was first published. Jussi Pakkanen has created a portable version of CuneiForm.
Operating systems: Linux, BSD, Mac OS X and Windows | Website / Download: http://openocr.org
Tesseract
Tesseract was developed by Hewlett-Packard between 1985 and 1995, but after HP left the OCR market it lay unused in HP's offices for ten years. After it was handed over to the Information Science Research Institute, contact was re-established with its former developer Ray Smith, who by then was working at Google. He has since continued its development on Google Code, and the software was released under the Apache license through SourceForge.
Tesseract's strength is character recognition, where it delivers very good results. On its own, however, it performs no page layout analysis and uses no statistical language models, and it has no graphical user interface.
Tesseract can be used as the text recognition module for OCRopus and is used by Google Book Search. Combined with OCRopus, it can also analyse the structure of a document and draw on a statistical language model. Recognition data is available for English, French, Spanish, Italian and Dutch.
German-language texts can already be recognised to a limited extent.
Operating systems: Windows, Linux, Mac | Website / Download: http://code.google.com/p/tesseract-ocr/

Text information can be taken from image files using suitable 