Disclaimer: This article is not the author's original work; it is reposted from: http://en.wikipedia/wiki/Comparison_of_optical_character_recognition_software

This comparison of optical character recognition software includes:

  • OCR engines, which perform the actual character identification
  • Layout analysis software, which divides scanned documents into zones suitable for OCR
  • Graphical interfaces to one or more OCR engines
  • Software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
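
To make the division of labor between these categories concrete, here is a minimal sketch of how a front end might chain layout analysis and an engine. It is only an illustration under stated assumptions: `Zone`, `OcrEngine`, `EchoEngine`, `split_into_zones` and `run_ocr` are hypothetical names, not the API of any product listed below, and the "page" is a toy list of text rows rather than a scanned image.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Zone:
    """A horizontal band of the page: rows top (inclusive) to bottom (exclusive)."""
    top: int
    bottom: int


class OcrEngine(Protocol):
    """Hypothetical engine interface: turn one zone of a page into text."""

    def recognize(self, page: List[str], zone: Zone) -> str: ...


def split_into_zones(page: List[str]) -> List[Zone]:
    """Toy layout analysis: each run of non-blank rows becomes one zone."""
    zones: List[Zone] = []
    start = None
    for row, line in enumerate(page):
        if line.strip():
            if start is None:
                start = row  # a new zone begins at the first non-blank row
        elif start is not None:
            zones.append(Zone(top=start, bottom=row))
            start = None
    if start is not None:  # the page ended while a zone was still open
        zones.append(Zone(top=start, bottom=len(page)))
    return zones


class EchoEngine:
    """Stand-in engine: the toy page already holds characters, so it joins the rows."""

    def recognize(self, page: List[str], zone: Zone) -> str:
        return " ".join(line.strip() for line in page[zone.top:zone.bottom])


def run_ocr(page: List[str], engine: OcrEngine) -> List[str]:
    """Front end: run layout analysis first, then call the engine once per zone."""
    return [engine.recognize(page, zone) for zone in split_into_zones(page)]
```

A graphical interface or an SDK would sit where `run_ocr` does, which is why one front end can offer several engines: any object with a matching `recognize` method can be passed in.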
| Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Output formats | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tesseract | 1985 | 3.02 | Oct 2012 | Apache | No | Yes | Yes | Yes | Yes | C++, C | Yes | 35+[1] | ? | Text, hOCR,[2] others with different user interfaces[3] or the API | Created by Hewlett-Packard; under further development by Google.[4] It was one of the top 3 engines in the 1995 UNLV Accuracy test. |
| ExperVision[5] TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 212618 [sic; Languages and Fonts values run together] | | | Won the highest marks in the independent testing performed by UNLV for X consecutive years (in 1994).[6][citation needed] The speed of ExperVision’s OpenRTK is four to eight times faster than competition (PC Magazine),[7] but also "Not as accurate as rival products, clumsy interface, limited options for proofreading, couldn't open some files in standard PDF or image formats." (PC Magazine)[8] |
| ABBYY FineReader | 1989 | 11 | 2011 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 198[9] | ? | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[10] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[11] |
| AnyDoc Software | 1989 | ? | ? | Proprietary | No | Yes | No | No | No | VBScript | ? | ? | ? | | Works with structured, semi-structured, and unstructured documents. |
| Aquaforest OCR SDK | 2001 | 1.41 | 2013 | Proprietary | Yes[12] | Yes | No | No | No | C#, VB.NET, ASP.NET | Yes | 23 | OmniFont (Extended Module available, including support for over 100 languages)[13] | PDF, PDF/A, RTF, TXT | Aquaforest's[14] OCR SDK for .NET[15] enables developers to directly make use of the Aquaforest OCR engine in their own applications and create searchable PDFs, RTF or text files from TIFFs, Bitmaps and Image-Only PDFs. |
| LEADTOOLS[16] | 1990[17] | 18.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, .NET, Objective-C, Java, JavaScript | Yes | 56[18] | Any printed font | PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI Text, Unicode Text, CSV[19] | Supports Latin, Asian, Arabic, and MICR character sets.[16] For full page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[20] ICR (handwritten text recognition) is supported.[21] |
| CuneiForm/OpenOCR | 1996 | 12 | 2007 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT[22] | Enterprise-class system, can save text formatting and recognizes complicated tables of any structure |
| Transym OCR | 2000 | 3.3 | 2011 | Proprietary | No | Yes | No | No | No | C#, C/C++, VB, VB.NET | Yes | 11 | ? | | |
| Image to OCR Converter | 2010[23] | 1.2[24] | 2012 | Proprietary | No | Yes | No | No | No | C/C++, VB and .NET | Command Line | 40 | ? | Searchable PDF, Text-Only PDF, Word, HTML, Text[25] | It can read most image formats and PDF files, and can scan images from scanner or camera.[26][27] |
| SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | |
| Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | C/C++ | Yes | 40+[28] | ? | PDF, TXT | Dynamsoft is the leading provider of image capture SDKs and version control tools. |
| OmniPage | 2005 | 18 | 2011 | Proprietary | No | Yes | Yes | Yes | No | C/C++, C#[29] | Yes | ? | ? | | Product of Nuance Communications |
| Microsoft Office OneNote 2007 | 2007 | ? | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | |
| FreeOCR | ? | 4.2 | August 2012 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | [30] |
| GOCR | ? | 0.49 | 2010 | GPL | Yes[31] | Yes | Yes | Yes | Yes | C | ? | ? | ? | | |
| Ocrad | ? | 0.21[32] | 2011 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | ? | | Command line |
| SmartScore | ? | ? | ? | Proprietary | No | Yes | Yes | No | No | ? | ? | ? | ? | | For musical scores |
| Microsoft Office Document Imaging | ? | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Uses OmniPage[citation needed] |
| Puma.NET | ? | ? | ? | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps Puma COM server and provides simplified API for .NET applications |
| ReadSoft | ? | ? | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes. |
| Scantron | ? | Cognition | ? | Proprietary | No | Yes | No | No | No | ? | ? | ? | ? | | For working with localized interfaces, corresponding language support is required. |
| OCRFeeder | ? | 0.7.11 | 2009 | GPL | No | No | No | Yes | No | Python | ? | ? | ? | | Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad |
| OCRopus | ? | 0.6 | 2012 | Apache | No | No | No | Yes | No | Python | ? | ? | ? | hOCR, HTML, TXT[33] | Pluggable framework under active development, used for Google Books |
