Comparison of optical character recognition software

  1. Evaluation

  2. References

This comparison of optical character recognition software includes:

  • OCR engines, which perform the actual character identification
  • Layout analysis software, which divides scanned documents into zones suitable for OCR
  • Graphical interfaces to one or more OCR engines
  • Software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)
| Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Output formats | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Google Drive OCR or Google Cloud Vision | 2015 | | | Free | Yes | Browser | Browser | Browser | Unknown | Unknown | Yes | 200+ | All fonts | Text | Google blog post [1] [2] |
| Tesseract | 1985 | 4.0.0 | 2018 | Apache | No | Yes | Yes | Yes | Yes | C++, C | Yes | 100+[3] | Any printed font | Text, hOCR,[4] PDF, others with different user interfaces[5] or the API | Created by Hewlett-Packard; under further development by Google[6] |
| Readiris | 1986 | 16 | Unknown | Proprietary | Unknown | Yes | Yes | Unknown | Unknown | Unknown | Yes | 100+[7] | Unknown | Unknown | Owned by Canon |
| CIB OCR [8] | 2011 | 2.08.00 | 2018 | Freeware | Yes[9] | Yes | Yes | Yes | Yes | C++, Java, Python, Objective-C | Yes | German, English, Spanish, Russian, Chinese, Japanese, Italian, French | Any printed font | Text, hOCR, PDF | CIB OCR supports more than 160 input formats |
| Screenworm | 2013 | 1.0 | 2014 | Proprietary | No | No | Yes | No | No | Objective-C++ | No | 57 | Unknown | TXT | Product of Funchip. Uses the Tesseract OCR engine. |
| ExperVision[10] TypeReader & RTK | 1987 | 7.1.170.1125 | 2010 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 21 | 2618 | | Has a Mobile and Embedded System version for iOS/Android/etc. |
| AliusDoc AD-SCI[11] | 2005 | 2.1 | 2015 | Proprietary | No | Yes | No | No | No | VB.Net | For Extensions | All ASCII-compatible languages | Unknown | XML, PlainText, any other thru SDK extensions | Minimal need for post-sale Professional Services. Works with structured, semi-structured, and unstructured documents. |
| ABBYY FineReader | 1989 | 14 | 2017-01-25 | Proprietary | Yes | Yes | Yes | Yes | Yes | C/C++ | Yes | 192[12] | Unknown | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[13] | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[14] |
| E-aksharayan | 2010 | | | | | Yes | No | Yes | No | | | 14 | | RTF, TXT, BRL | |
| Asprise OCR SDK | 1998 | 15 | 2015 | Proprietary | Yes | Yes | Yes | Yes | Yes | Java, C#, VB.NET, C/C++/Delphi | Yes | 20+[15] | Unknown | Plain text, searchable PDF, XML[16] | Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and Barcode recognition on Windows, Linux, Mac OS X and Unix.[17] |
| Nicomsoft OCR SDK | 1999 | 5.5 | 2015 | Proprietary | No | Yes | No | Yes | No | C#, VB.NET, C++, Delphi, Java | Yes | 25+[18] | Unknown | Searchable PDF, Text, RTF | C#, VB.NET, C++, Delphi, Java OCR tool for Windows and Linux.[19] |
| AnyDoc Software | 1989 | Unknown | Unknown | Proprietary | No | Yes | No | No | No | VBScript | Unknown | Unknown | Unknown | | Works with structured, semi-structured, and unstructured documents. |
| LEADTOOLS[20] | 1990[21] | 19.0 | 2014 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, .NET, Objective-C, Java, JavaScript | Yes | 56[22] | Any printed font | PDF, PDF/A, DOC, DOCX, XLS, XPS, RTF, HTML, ANSI Text, Unicode Text, CSV[23] | Supports Latin, Asian, Arabic, and MICR character sets.[20] For full page, zonal, and form image processing. Includes OCR, barcode, OMR and forms recognition.[24] ICR (handwritten text recognition) is supported.[25] |
| CuneiForm | 1996 | 1.1 | 2011-04-19 | BSD variant | No | Yes | Yes | Yes | Yes | C/C++ | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT[26] | Enterprise-class system, can save text formatting and recognizes complicated tables of any structure |
| OCR.space | 2015 | 3.02 | 2017 | GPL | Yes | Yes | No | No | No | C# | Yes | 23 | Any printed font | TXT | Windows desktop software, Windows Store application and online web app. Converts scanned documents to editable text documents using OCR. |
| SimpleOCR | 2002 | 3.5 | 2008 | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | |
| Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | Proprietary | Yes | Yes | No | No | No | C/C++ | Yes | 40+[27] | Unknown | PDF, TXT | |
| OmniPage | 1970s | 19.2 | 2015 | Proprietary | Yes | Yes | Yes | Yes | No | C/C++, C#[28] | Yes | 125[29] | Machine and handprinted fonts | DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, Searchable PDF, HTML, Text, XML, ePUB, MP3 | Product of Nuance Communications |
| Microsoft Office OneNote 2007 | 2011 | Unknown | 2007 | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | |
| FreeOCR | Unknown | 4.2 | August 2012 | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | [30] |
| gImageReader[31] | 2009 | 3.2.99 | 2017-07 | GPL | No | Yes | Yes | Yes | No | C++ | Unknown | 100+ | Any printed font | TXT, PDF, hOCR | Uses Tesseract OCR engine |
| GOCR | 2000 | 0.52[32] | 2018-10-15 | GPL | Yes[33] | Yes | Yes | Yes | Yes | C | Unknown | 20+ | Unknown | | |
| Ocrad | Unknown | 0.26[34] | 2017-03-31 | GPL | Yes | Yes | Yes | Yes | Yes | C++ | Yes | Latin alphabet | Unknown | | Command line |
| SmartScore | 1991 | 10.5.8 | 2015-07 | Proprietary | No | Yes | Yes | No | No | Unknown | Unknown | Unknown | Unknown | | For musical scores |
| Microsoft Office Document Imaging | Unknown | Office 2007 | 2007 | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | |
| OCR.net | 2016 | Unknown | 2016 | Proprietary | Yes | No | No | No | No | Java, C++, PHP, Objective-C | No | 100+ | Unknown | TXT, Searchable PDF | Online service powered by PDF OCR X for conversions. |
| PDF OCR X | 2008 | 3.0.11 | 2018 | Proprietary | No | Yes | Yes | No | No | Java, C++, Objective-C | No | 100+ | Unknown | TXT, Searchable PDF | Drag and drop UI. |
| Puma.NET | Unknown | Unknown | 2009-10-29 | BSD | No | Yes | No | No | No | C# | Yes | 28 | Any printed font | | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps Puma COM server and provides simplified API for .NET applications |
| ReadSoft | Unknown | Unknown | 14 Unknown | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes. |
| Scantron | Unknown | Unknown | Unknown | Proprietary | No | Yes | No | No | No | Unknown | Unknown | Unknown | Unknown | | For working with localized interfaces, corresponding language support is required. |
| OCRFeeder | 2009-03 | 0.8.1 | 2014-12-22 | GPL | No | No | No | Yes | No | Python | Unknown | Unknown | Unknown | | Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad |
| OCRopus | 2007 | 1.3.3 | 2017-12-16 | Apache | No | No | Yes | Yes | Yes | Python | Unknown | All languages using Latin script (other languages can be trained) | Normal Latin script and Fraktur (other scripts can be trained) | TXT, hOCR[35], PDF[36] | Pluggable framework under active development, used for Google Books |
| MathOCR | 2014 | 0.0.3 | 2015 | GPL | No | Yes | Yes | Yes | Yes | Java | Unknown | Unknown | Unknown | HTML, LaTeX | Features mathematical formula recognition and logical layout analysis, can use OCR engines like Tesseract or Ocrad as back-end. |
| MeOCR | 2012 | 1.0.0 | 2012 | Freeware | No | Yes | No | No | No | C/C++/C# | Yes | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT | Windows application. Converts scanned documents to editable text documents using OCR and exports them to Microsoft Word with one click. Features a full user interface and also has a .NET Interface library[37] for developers. |
| Yunmai OCR SDK | 2002 | 1.0 | 2013 | Proprietary | Yes | Yes | Yes | Yes | Yes | Java, C++, C, Object Pascal, Objective-C | Yes | 14 | Any printed font | TXT, PDF | Has the advantage of Chinese characters recognition.[38] |
| Anyline SDK | 2013[39] | 3.5.1[40] | 2016[40] | Free non-commercial use[41] | No | No* | No* | No* | No* | Java (Android), Objective-C & Swift (iOS), C# (Windows Phone, Xamarin), JavaScript (Cordova)[42] | Yes[43] | 2 (German, English) | Any printed trainable font[44] | Plain text, verification image | *Customizable mobile OCR SDK for Android, iOS, Windows Phone, Smart glasses (Google Glass, Epson Moverio,...) |

Evaluation

An analysis of the accuracy and reliability of four OCR packages (Google Docs OCR, Tesseract, ABBYY FineReader, and Transym), using a dataset of 1227 images from 15 different categories, concluded that Google Docs OCR and ABBYY FineReader performed better than the others.[45]
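
This article does not state which accuracy measure the cited study used. As an illustration only, the sketch below computes the character error rate (CER), a commonly used way to score OCR output against a known reference text: the edit distance between the two strings divided by the length of the reference. The function names and the sample strings are assumptions made for this example and are not taken from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalized by the reference length (lower is better)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: two characters were misread by the engine.
    print(character_error_rate("recognition", "rec0gnitlon"))
```

A lower CER means the output is closer to the reference. Comparing engines this way requires running each of them on the same images with the same ground-truth transcriptions.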

References

1. ^ Dmitriy Genzel; Ashok Popat (May 6, 2015). "Paper to Digital in 200+ languages". https://ai.googleblog.com/2015/05/paper-to-digital-in-200-languages.html
2. ^ Ashok Popat (Sep 4, 2015). "IEEE SPS: Optical Character Recognition for Most of the World's Languages". https://www.youtube.com/watch?v=E0y41YU85tI
3. ^ Based on count of language training files for version 3.04. Available at the download page: https://github.com/tesseract-ocr/tessdata
4. ^ Usage explained in the Tesseract Readme (https://github.com/tesseract-ocr/tesseract/wiki#running-tesseract) and FAQ (https://github.com/tesseract-ocr/tesseract/wiki/FAQ#what-output-formats-can-tesseract-producet)
5. ^ Such as ODF with OCRFeeder
6. ^ "GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)". https://github.com/tesseract-ocr/tesseract#brief-history/ (retrieved 2018-11-05)
7. ^ http://www.irislink.com/EN-GB/c1462/Readiris-16-for-Windows---OCR-Software.aspx
8. ^ "CIB ocr". cib.de. 2018-10-01. https://ocr.team/ (retrieved 2018-10-01)
9. ^ "CIB doXiview". cib.de. 2018-10-01. https://doxiview.cib.de/showcase/index.html (retrieved 2018-10-01)
10. ^ "OpenRTK – ExperVision OCR SDK | OCR Software, OCR SDK & Toolkit, OCR Service – ExperVision OCR". Expervision.com. http://www.expervision.com/ocr-sdk-toolkit/openrtk-ocr-toolkit-sdk (retrieved 2013-09-12)
11. ^ "AliusDoc AD-SCI". AliusDoc.com. http://aliusdoc.com/sci.html (retrieved 2015-10-16)
12. ^ "ABBYY FineReader 14: Technical Specifications". Finereader.abbyy.com. https://www.abbyy.com/en-eu/finereader/tech-specs/ (retrieved 2017-02-23)
13. ^ "ABBYY FineReader 11: Technical Specifications". Finereader.abbyy.com. http://finereader.abbyy.com/professional/tech_specs/ (retrieved 2013-09-12)
14. ^ "Top OCR Software". Ocrworld.com. 2010-03-30. http://ocrworld.com/software/5-in-depth/149-top-ocr-software.html (retrieved 2013-09-12)
15. ^ "Asprise OCR SDK Features". asprise.com. http://asprise.com/royalty-free-library/java-ocr-api-overview.html (retrieved 2014-06-21)
16. ^ "Asprise Java OCR Library Features". asprise.com. http://asprise.com/royalty-free-library/java-ocr-api-overview.html (retrieved 2014-06-21)
17. ^ "Asprise Java, C#/VB.NET OCR API". asprise.com. 2015-11-19. http://asprise.com/royalty-free-library/ocr-api-for-java-csharp-vb.net.html (retrieved 2015-11-19)
18. ^ "Nicomsoft OCR SDK Features". nicomsoft.com. http://www.nicomsoft.com/products/ocr/features/ (retrieved 2015-01-08)
19. ^ "Nicomsoft OCR, C#/VB.NET OCR API". nicomsoft.com. 2015-01-08. http://nicomsoft.com/ (retrieved 2015-01-08)
20. ^ "Ocr Sdk". Leadtools. http://www.leadtools.com/sdk/ocr/default.htm (retrieved 2013-09-12)
21. ^ "LEAD Technologies, Inc. Corporate Information". Leadtools.com. http://www.leadtools.com/corporate/corporate.htm (retrieved 2013-09-12)
22. ^ "Ocr Sdk". Leadtools. http://www.leadtools.com/sdk/ocr/product-comparison-chart.htm (retrieved 2013-09-12)
23. ^ "OCR SDK Output Formats". Leadtools. http://www.leadtools.com/sdk/formats/ocr.htm (retrieved 2013-09-12)
24. ^ "LEADTOOLS Recognition Imaging Developer Toolkit". Leadtools.com. http://www.leadtools.com/sdk/recognition-imaging.htm (retrieved 2013-09-12)
25. ^ "Icr Sdk". Leadtools. http://www.leadtools.com/sdk/ocr/icr.htm (retrieved 2013-09-12)
26. ^ Debian manual page for Cuneiform for Linux version 1.1.0
27. ^ "OCR SDK Language Packages Download". Dynamsoft.com. http://www.dynamsoft.com/Downloads/OCR-Language-Package.aspx (retrieved 2013-09-12)
28. ^ "OmniPage CSDK - OCR Document Capture Toolkit | Document Imaging & OCR". Nuance. http://www.nuance.com/imaging/omnipage/omnipage-csdk.asp (retrieved 2013-09-12)
29. ^ "OmniPage Standard Document Conversion". Nuance. http://www.nuance.com/for-business/by-product/omnipage/standard/index.htm (retrieved 2014-02-25)
30. ^ "Free OCR Software - Optical Character Recognition Software for Windows import from PDF and Twain Scanners". Paperfile.net. http://www.paperfile.net/ (retrieved 2013-09-12)
31. ^ "gImageReader". github.com. https://github.com/manisandro/gImageReader (retrieved 2018-03-25)
32. ^ "GOCR Homepage". wasd.urz.uni-magdeburg.de. https://wasd.urz.uni-magdeburg.de/jschulen/ocr/ (retrieved 2018-10-17)
33. ^ "GOCR". Jocr.sourceforge.net. http://jocr.sourceforge.net/ (retrieved 2013-09-12)
34. ^ Diaz, Antonio (2015-04-16). "GNU Ocrad 0.26 released". info-gnu (mailing list). https://lists.gnu.org/archive/html/bug-ocrad/2017-04/msg00000.html
35. ^ OCRopus includes the ocropus-hocr tool which produces hOCR from the recognition results.
36. ^ In combination with the hocr-tools
37. ^ "MeOCR .NET Library". http://www.meocr.com/meocrlib.html
38. ^ "List of Yunmai OCR SDKs". yunmai.com. http://www.yunmai.com/en/ocr_sdks.html (retrieved 2015-07-12)
39. ^ "Company | Anyline". Anyline. 2016-06-30. https://www.anyline.io/company/ (retrieved 2016-06-30)
40. ^ "Release Notes Archives - ANYLINE". ANYLINE. https://www.anyline.io/blog/category/release-notes/ (retrieved 2016-06-30)
41. ^ "anyline". npm. https://www.npmjs.com/package/anyline (retrieved 2016-06-30)
42. ^ "API Reference". documentation.anyline.io. https://documentation.anyline.io/ (retrieved 2016-06-30)
43. ^ "anyline". npm. https://www.npmjs.com/package/anyline (retrieved 2016-06-30)
44. ^ "Fonts | Anyline". Anyline. 2016-06-30. https://www.anyline.io/font (retrieved 2016-06-30)
45. ^ Assefi, Mehdi (2016-12-01). "OCR as a Service: An Experimental Evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym". ResearchGate. https://www.researchgate.net/publication/310645810_OCR_as_a_Service_An_Experimental_Evaluation_of_Google_Docs_OCR_Tesseract_ABBYY_FineReader_and_Transym (retrieved 2019-01-31)