- Evaluation
- References
This comparison of optical character recognition software includes:
- OCR engines, which do the actual character identification
- Layout analysis software, which divides scanned documents into zones suitable for OCR
- Graphical interfaces to one or more OCR engines
- Software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
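The split between layout analysis and character recognition described in the list above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the names `split_into_zones`, `recognize_page`, and `fake_engine` are hypothetical and do not belong to any product in this comparison, the page is a toy binary image, and zoning is done with a simple horizontal projection profile (any row without ink separates two zones), which is much cruder than real layout analysis.

```python
from typing import Callable, List, Tuple

# Toy binary image: a list of rows, where 1 = ink and 0 = background.
Image = List[List[int]]


def split_into_zones(page: Image) -> List[Tuple[int, int]]:
    """Layout analysis step: return (start_row, end_row) for each run of inked rows."""
    zones: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(page):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new zone begins
        elif not has_ink and start is not None:
            zones.append((start, y))       # a blank row closes the zone
            start = None
    if start is not None:
        zones.append((start, len(page)))   # zone running to the bottom edge
    return zones


def recognize_page(page: Image, engine: Callable[[Image], str]) -> List[str]:
    """Hand each zone to an OCR engine and collect one result per zone."""
    return [engine(page[top:bottom]) for top, bottom in split_into_zones(page)]


def fake_engine(zone: Image) -> str:
    """Hypothetical stand-in for a real OCR engine; it only reports the zone height."""
    return f"<zone of {len(zone)} rows>"


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(split_into_zones(page))
print(recognize_page(page, fake_engine))
```

Passing the engine in as a callable mirrors how a front end or layout tool can sit on top of interchangeable OCR engines.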
Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Output Formats | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Google Drive OCR or Google Cloud Vision | 2015 | | | Free | Yes | Browser | Browser | Browser | Unknown | Unknown | Yes | 200+ | All fonts | Text | Google blog post[1][2] |
Tesseract | 1985 | 4.0.0 | 2018 | Apache | No | Yes | Yes | Yes | Yes | C++, C | Yes | 100+[3] | Any printed font | Text, hOCR,[4] PDF, others with different user interfaces[5] or the API | Created by Hewlett-Packard; under further development by Google[6] |
Readiris | 1986 | 16 | Unknown | Proprietary | Unknown | Yes | Yes | Unknown | Unknown | Unknown | Yes | 100+[7] | Unknown | Unknown | Owned by Canon |
CIB OCR[8] | 2011 | 2.08.00 | 2018 | Freeware | Yes[9] | Yes | Yes | Yes | Yes | C++, Java, Python, Objective-C | Yes | German, English, Spanish, Russian, Chinese, Japanese, Italian, French | Any printed font | Text, hOCR, PDF | CIB OCR supports more than 160 input formats |
Screenworm | 2013 | 1.0 | 2014 | Proprietary | No | No | Yes | No | No | Objective-C++ | No | 57 | Unknown | TXT | Product of Funchip. Uses the Tesseract OCR engine. |
ExperVision[10] TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 21 | 2618 | | Has a Mobile and Embedded System version for iOS/Android/etc. |
AliusDoc AD-SCI[11] | 2005 | 2.1 | 2015 | Proprietary | No | Yes | No | No | No | VB.Net | For Extensions | All ASCII-compatible languages | Unknown | XML, PlainText, any other through SDK extensions | Minimal need for post-sale Professional Services. Works with structured, semi-structured, and unstructured documents. |
ABBYY FineReader | 1989 | 14 | 2017-01-25 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 192[12] | Unknown | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[13] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[14] |
E-aksharayan | 2010 | Yes | No | Yes | No | 14 | RTF, TXT, BRL |
Asprise OCR SDK | 1998 | 15 | 2015 | Proprietary | Yes | Yes | Yes | Yes | Yes | Java, C#, VB.NET, C/C++/Delphi | Yes | 20+[15] | Unknown | Plain text, searchable PDF, XML[16] | Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and barcode recognition on Windows, Linux, Mac OS X and Unix.[17] |
Nicomsoft OCR SDK | 1999 | 5.5 | 2015 | Proprietary | No | Yes | No | Yes | No | C#, VB.NET, C++, Delphi, Java | Yes | 25+[18] | Unknown | Searchable PDF, Text, RTF | C#, VB.NET, C++, Delphi, Java OCR tool for Windows and Linux.[19] |
AnyDoc Software | 1989 | Unknown | Unknown | Proprietary | No | Yes | No | No | No | VBScript | Unknown | Unknown | Unknown | | Works with structured, semi-structured, and unstructured documents. |
LEADTOOLS[20] | 1990[21] | 19.0 | 2014 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, .NET, Objective-C, Java, JavaScript | Yes | 56[22] | Any printed font | PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI Text, Unicode Text, CSV[23] | Supports Latin, Asian, Arabic, and MICR character sets.[20] For full page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[24] ICR (handwritten text recognition) is supported.[25] |
CuneiForm | 1996 | 1.1 | 2011-04-19 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT[26] | Enterprise-class system, can save text formatting and recognizes complicated tables of any structure |
OCR.space | 2015 | 3.02 | 2017 | GPL | Yes | Yes | No | No | No | C# | Yes | 23 | Any printed font | TXT | Windows desktop software, Windows Store application and online web app. Converts scanned documents to editable text documents using OCR. |
SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | |
Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | C/C++ | Yes | 40+[27] | Unknown | PDF, TXT | |
OmniPage | 1970s | 19.2 | 2015 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, C#[28] | Yes | 125[29] | Machine and handprinted fonts | DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, Searchable PDF, HTML, Text, XML, ePUB, MP3 | Product of Nuance Communications |
Microsoft Office OneNote 2007 | 2011 | Unknown | 2007 | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | |
FreeOCR | Unknown | 4.2 | August 2012 | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | [30] |
gImageReader[31] | 2009 | 3.2.99 | 2017-07 | GPL | No | Yes | Yes | Yes | No | C++ | Unknown | 100+ | Any printed font | TXT, PDF, hOCR | Uses the Tesseract OCR engine |
GOCR | 2000 | 0.52[32] | 2018-10-15 | GPL | Yes[33] | Yes | Yes | Yes | Yes | C | Unknown | 20+ | Unknown | | |
Ocrad | Unknown | 0.26[34] | 2017-03-31 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | Unknown | Command line |
SmartScore | 1991 | 10.5.8 | 2015-07 | Proprietary | No | Yes | Yes | No | No | Unknown | Unknown | Unknown | Unknown | | For musical scores |
Microsoft Office Document Imaging | Unknown | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | |
OCR.net | 2016 | Unknown | 2016 | Proprietary | Yes | No | No | No | No | Java, C++, PHP, Objective-C | No | 100+ | Unknown | TXT, Searchable PDF | Online service powered by PDF OCR X for conversions. |
PDF OCR X | 2008 | 3.0.11 | 2018 | Proprietary | No | Yes | Yes | No | No | Java, C++, Objective-C | No | 100+ | Unknown | TXT, Searchable PDF | Drag and drop UI. |
Puma.NET | Unknown | Unknown | 2009-10-29 | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps Puma COM server and provides simplified API for .NET applications |
ReadSoft | Unknown | Unknown | 14? | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes. |
Scantron | Unknown | Unknown | Unknown | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | For working with localized interfaces, corresponding language support is required. |
OCRFeeder | 2009-03 | 0.8.1 | 2014-12-22 | GPL | No | No | No | Yes | No | Python | Unknown | Unknown | Unknown | | Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad |
OCRopus | 2007 | 1.3.3 | 2017-12-16 | Apache | No | No | Yes | Yes | Yes | Python | Unknown | All languages using Latin script (other languages can be trained) | Normal Latin script and Fraktur (other scripts can be trained) | TXT, hOCR[35], PDF[36] | Pluggable framework under active development, used for Google Books |
MathOCR | 2014 | 0.0.3 | 2015 | GPL | No | Yes | Yes | Yes | Yes | Java | Unknown | Unknown | Unknown | HTML, LaTeX | Features mathematical formula recognition and logical layout analysis, can use OCR engines like Tesseract or Ocrad as back-end. |
MeOCR | 2012 | 1.0.0 | 2012 | Freeware | No | Yes | No | No | No | C/C++/C# | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT | Windows application. Converts scanned documents to editable text documents using OCR and exports them to Microsoft Word with one click. Features a full user interface and also has a .NET Interface library[37] for developers. |
Yunmai OCR SDK | 2002 | 1.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | Yes | Java, C++, C, Object Pascal, Objective-C | Yes | 14 | Any printed font | TXT, PDF | Has an advantage in Chinese character recognition.[38] |
Anyline SDK | 2013[39] | 3.5.1[40] | 2016[40] | Free non-commercial use[41] | No | No* | No* | No* | No* | Java (Android), Objective-C & Swift (iOS), C# (Windows Phone, Xamarin), JavaScript (Cordova)[42] | Yes[43] | 2 (German, English) | Any printed trainable font[44] | Plain text, verification image | *Customizable mobile OCR SDK for Android, iOS, Windows Phone, Smart glasses (Google Glass, Epson Moverio,...) |
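When comparing the output of the packages listed in the table, one widely used accuracy measure is the character error rate: the edit distance between the recognized text and a ground-truth transcription, divided by the length of the ground truth. The sketch below is a minimal pure-Python illustration of that measure. The function names are hypothetical, and it is not claimed to be the metric used by any particular published evaluation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + cost,         # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One substituted character out of four gives an error rate of 0.25.
print(character_error_rate("abcd", "abxd"))
```

A lower value means the OCR output is closer to the ground truth. The value can exceed 1.0 when the output contains many spurious characters.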
Evaluation
An analysis of the accuracy and reliability of the OCR packages Google Docs OCR, Tesseract, ABBYY FineReader, and Transym, using a dataset of 1227 images from 15 different categories, concluded that Google Docs OCR and ABBYY FineReader performed better than the others.[45]

References
1. Dmitriy Genzel; Ashok Popat (May 6, 2015). "Paper to Digital in 200+ languages". https://ai.googleblog.com/2015/05/paper-to-digital-in-200-languages.html
2. Ashok Popat (Sep 4, 2015). "IEEE SPS: Optical Character Recognition for Most of the World's Languages". https://www.youtube.com/watch?v=E0y41YU85tI
3. Based on count of language training files for version 3.04. Available at the download page: https://github.com/tesseract-ocr/tessdata
4. Usage explained in the Tesseract Readme (https://github.com/tesseract-ocr/tesseract/wiki#running-tesseract) and FAQ (https://github.com/tesseract-ocr/tesseract/wiki/FAQ#what-output-formats-can-tesseract-producet).
5. Such as ODF with OCRFeeder.
6. "GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)". https://github.com/tesseract-ocr/tesseract#brief-history/ Accessed 2018-11-05.
7. http://www.irislink.com/EN-GB/c1462/Readiris-16-for-Windows---OCR-Software.aspx
8. "CIB ocr". cib.de. 2018-10-01. https://ocr.team/ Accessed 2018-10-01.
9. "CIB doXiview". cib.de. 2018-10-01. https://doxiview.cib.de/showcase/index.html Accessed 2018-10-01.
10. "OpenRTK – ExperVision OCR SDK | OCR Software, OCR SDK & Toolkit, OCR Service – ExperVision OCR". Expervision.com. http://www.expervision.com/ocr-sdk-toolkit/openrtk-ocr-toolkit-sdk Accessed 2013-09-12.
11. "AliusDoc AD-SCI". AliusDoc.com. http://aliusdoc.com/sci.html Accessed 2015-10-16.
12. "ABBYY FineReader 14: Technical Specifications". Finereader.abbyy.com. https://www.abbyy.com/en-eu/finereader/tech-specs/ Accessed 2017-02-23.
13. "ABBYY FineReader 11: Technical Specifications". Finereader.abbyy.com. http://finereader.abbyy.com/professional/tech_specs/ Accessed 2013-09-12.
14. "Top OCR Software". Ocrworld.com. 2010-03-30. http://ocrworld.com/software/5-in-depth/149-top-ocr-software.html Accessed 2013-09-12.
15. "Asprise OCR SDK Features". asprise.com. http://asprise.com/royalty-free-library/java-ocr-api-overview.html Accessed 2014-06-21.
16. "Asprise Java OCR Library Features". asprise.com. http://asprise.com/royalty-free-library/java-ocr-api-overview.html Accessed 2014-06-21.
17. "Asprise Java, C#/VB.NET OCR API". asprise.com. 2015-11-19. http://asprise.com/royalty-free-library/ocr-api-for-java-csharp-vb.net.html Accessed 2015-11-19.
18. "Nicomsoft OCR SDK Features". nicomsoft.com. http://www.nicomsoft.com/products/ocr/features/ Accessed 2015-01-08.
19. "Nicomsoft OCR, C#/VB.NET OCR API". nicomsoft.com. 2015-01-08. http://nicomsoft.com/ Accessed 2015-01-08.
20. "Ocr Sdk". Leadtools. http://www.leadtools.com/sdk/ocr/default.htm Accessed 2013-09-12.
21. "LEAD Technologies, Inc. Corporate Information". Leadtools.com. http://www.leadtools.com/corporate/corporate.htm Accessed 2013-09-12.
22. "Ocr Sdk". Leadtools. http://www.leadtools.com/sdk/ocr/product-comparison-chart.htm Accessed 2013-09-12.
23. "OCR SDK Output Formats". Leadtools. http://www.leadtools.com/sdk/formats/ocr.htm Accessed 2013-09-12.
24. "LEADTOOLS Recognition Imaging Developer Toolkit". Leadtools.com. http://www.leadtools.com/sdk/recognition-imaging.htm Accessed 2013-09-12.
25. "Icr Sdk". Leadtools. http://www.leadtools.com/sdk/ocr/icr.htm Accessed 2013-09-12.
26. Debian manual page for Cuneiform for Linux version 1.1.0.
27. "OCR SDK Language Packages Download". Dynamsoft.com. http://www.dynamsoft.com/Downloads/OCR-Language-Package.aspx Accessed 2013-09-12.
28. "OmniPage CSDK - OCR Document Capture Toolkit | Document Imaging & OCR". Nuance. http://www.nuance.com/imaging/omnipage/omnipage-csdk.asp Accessed 2013-09-12.
29. "OmniPage Standard Document Conversion". Nuance. http://www.nuance.com/for-business/by-product/omnipage/standard/index.htm Accessed 2014-02-25.
30. "Free OCR Software - Optical Character Recognition Software for Windows import from PDF and Twain Scanners". Paperfile.net. http://www.paperfile.net/ Accessed 2013-09-12.
31. "gImageReader". github.com. https://github.com/manisandro/gImageReader Accessed 2018-03-25.
32. "GOCR Homepage". wasd.urz.uni-magdeburg.de. https://wasd.urz.uni-magdeburg.de/jschulen/ocr/ Accessed 2018-10-17.
33. "GOCR". Jocr.sourceforge.net. http://jocr.sourceforge.net/ Accessed 2013-09-12.
34. Diaz, Antonio (2015-04-16). "GNU Ocrad 0.26 released". info-gnu (mailing list). https://lists.gnu.org/archive/html/bug-ocrad/2017-04/msg00000.html
35. OCRopus includes the ocropus-hocr tool which produces hOCR from the recognition results.
36. In combination with the hocr-tools.
37. "MeOCR .NET Library". http://www.meocr.com/meocrlib.html
38. "List of Yunmai OCR SDKs". yunmai.com. http://www.yunmai.com/en/ocr_sdks.html Accessed 2015-07-12.
39. "Company | Anyline". Anyline. 2016-06-30. https://www.anyline.io/company/ Accessed 2016-06-30.
40. "Release Notes Archives - ANYLINE". ANYLINE. https://www.anyline.io/blog/category/release-notes/ Accessed 2016-06-30.
41. "anyline". npm. https://www.npmjs.com/package/anyline Accessed 2016-06-30.
42. "API Reference". documentation.anyline.io. https://documentation.anyline.io/ Accessed 2016-06-30.
43. "anyline". npm. https://www.npmjs.com/package/anyline Accessed 2016-06-30.
44. "Fonts | Anyline". Anyline. 2016-06-30. https://www.anyline.io/font Accessed 2016-06-30.
45. Assefi, Mehdi (2016-12-01). "OCR as a Service: An Experimental Evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym". ResearchGate. https://www.researchgate.net/publication/310645810_OCR_as_a_Service_An_Experimental_Evaluation_of_Google_Docs_OCR_Tesseract_ABBYY_FineReader_and_Transym Accessed 2019-01-31.