Captricity: meaning and origin - Open Encyclopedia

Captricity is a data capture software product, and the company that sells it, that combines machine learning with human verification to perform optical character recognition (OCR)[citation needed] data capture from hand-filled forms.

Background

Captricity was incubated in the Code for America incubator program and is used by government agencies, health clinics, global health practitioners, and researchers such as NYU's Center for Technology and Economic Development.[citation needed]

Captricity was founded in 2011 by Kuang Chen and former Harvey Danger musician Jeff J. Lin. The idea for Captricity came from Chen’s PhD dissertation at UC Berkeley. His research focused on data-centric approaches to increase the efficiency of low-resource organizations, so they could better serve disadvantaged clients.

Company

Captricity is currently headquartered in downtown Oakland, CA,^[1] and according to its LinkedIn profile, it has 51-200 employees.^[2]

Technology

Captricity relies on crowdsourcing, parceling out OCR verification tasks to human operators.^[3]

Captricity claims that its technology achieves 99.9% accuracy.^[4] Captricity’s machine learning elements combine OCR, intelligent character recognition (ICR) and optical mark recognition (OMR).[citation needed]

Captricity captures handwritten information from forms. This data then populates searchable spreadsheets (for example, a .csv file that can be opened in Excel). Captricity does not support unstructured data.
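As a rough illustration of what "populating a spreadsheet" can look like, the sketch below writes already-captured form fields to CSV text using Python's standard `csv` module. It is not Captricity's code or API: the function name `fields_to_csv`, the column names, and the sample values are all hypothetical.

```python
import csv
import io


def fields_to_csv(records, fieldnames):
    """Serialize captured form records (a list of dicts) into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()  # first row holds the column (field) names
    for record in records:
        writer.writerow(record)  # one row per captured form
    return buffer.getvalue()


# Hypothetical captured data: one dict per hand-filled form.
records = [
    {"form_id": "form-001", "name": "Alex Doe", "age": "34"},
    {"form_id": "form-002", "name": "Sam Roe", "age": "51"},
]
csv_text = fields_to_csv(records, ["form_id", "name", "age"])
```

Because every form built from the same template yields the same set of fields, each form maps naturally onto one spreadsheet row. This is also why structured forms fit this output better than unstructured documents.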

Privacy

To maintain the privacy of the information in the forms, each form is “shredded” into distinct fields and each field is verified by one or more different people.^[5] Captricity claims that since no one person can see more than one field from a document, privacy is maintained. Captricity uses Amazon's Mechanical Turk system to perform this human verification step.^[6] For example, a worker may see a stream of 4-digit numbers without knowing that each is the last portion of a US social security number.
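The shredding idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Captricity's or Mechanical Turk's actual mechanism: it assumes every form shares one template, that there are at least as many workers as fields, and that each worker is given exactly one field type. The names `shred_and_assign` and `worker_view` and all sample data are hypothetical.

```python
def shred_and_assign(forms, template_fields, workers):
    """Split forms into fields so that no worker sees two fields of one form.

    forms:           dict mapping form_id -> dict of field_name -> value
    template_fields: ordered list of field names shared by every form
    workers:         list of worker identifiers
    Returns a dict mapping worker -> list of (form_id, value) pairs.
    """
    if len(template_fields) > len(workers):
        raise ValueError("need at least one worker per field")
    # Each field type goes to exactly one worker (assumption of this sketch).
    field_to_worker = dict(zip(template_fields, workers))
    queues = {worker: [] for worker in workers}
    for form_id, fields in forms.items():
        for field_name in template_fields:
            worker = field_to_worker[field_name]
            # form_id is bookkeeping for the system, not shown to the worker.
            queues[worker].append((form_id, fields[field_name]))
    return queues


def worker_view(queues, worker):
    """What the worker actually sees: bare values with no form context."""
    return [value for _form_id, value in queues[worker]]


# Hypothetical sample forms with obviously fake values.
forms = {
    "form-001": {"name": "Alex Doe", "ssn_last4": "1234", "zip": "94612"},
    "form-002": {"name": "Sam Roe", "ssn_last4": "5678", "zip": "94607"},
}
queues = shred_and_assign(
    forms, ["name", "ssn_last4", "zip"], ["worker-a", "worker-b", "worker-c"]
)
ssn_stream = worker_view(queues, "worker-b")
```

Here `worker-b` receives only a stream of 4-digit strings, matching the example in the text. The system keeps the `form_id` so that the verified values can be reassembled into complete records afterwards.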

Data redaction

Captricity performs redaction in addition to OCR. Redaction is a service in which any field or collection of fields can be “blacked out” in the document template.^[7] Any information contained in those fields will not be read by the system. For example, if a courthouse wants to release its records to the public but keep the arresting officer’s name private, the field containing this information can be redacted.
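Template-level redaction of this kind can be sketched as a filter applied before any field is read. The code below is only an illustration of the concept, assuming a form is represented as a mapping from field names to values. The function name `capture_with_redaction`, the field names, and the sample record are hypothetical and do not reflect Captricity's implementation.

```python
def capture_with_redaction(form_fields, redacted_fields):
    """Return only the fields that the template does not mark as redacted."""
    captured = {}
    for name, value in form_fields.items():
        if name in redacted_fields:
            continue  # blacked-out field: its contents are never read
        captured[name] = value
    return captured


# Hypothetical court record with the officer's name marked as redacted.
record = {
    "case_number": "CR-0001",
    "arresting_officer": "J. Example",
    "charge": "Example charge",
}
public_record = capture_with_redaction(record, {"arresting_officer"})
```

Because the redacted field is skipped before capture, its value never enters the output, rather than being removed from the data afterwards.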

Captricity and Non-profits

Non-profit and academic researchers often conduct survey research to monitor and evaluate their programs or projects. The Center for Effective Global Action (CEGA), which is affiliated with UC Berkeley, announced a partnership with Captricity in August 2012.^[8] Captricity donates digitization services to non-profits[citation needed] via its Data for Communities program, and offers discounts to non-profit organizations such as CEGA members.

References

1. ^ "Captricity Oakland, CA". https://captricity.com/jobs/
2. ^ "LinkedIn Page for Captricity". LinkedIn. http://www.linkedin.com/company/captricity. Accessed 2015.
3. ^ Howard, Alex. "A startup takes on 'the paper problem' with crowdsourcing and machine learning". strata.oreilly. OReilly. http://strata.oreilly.com/2012/10/captricity-digitizing-documents-crowdsource.html. Accessed 5 October 2012.
4. ^ Chen, Kuang. "Shreddr: pipelined paper digitization for low-resource organizations". University of California - Berkeley. Original (dead link): http://www.eecs.berkeley.edu/~kuangc/publications/dev12-shreddr.pdf. Archived 2014-02-23: https://web.archive.org/web/20140223011813/http://www.eecs.berkeley.edu/~kuangc/publications/dev12-shreddr.pdf. Accessed 2011.
5. ^ Little, G. (2011). "Human ocr: Insights from a complex human computation process".
6. ^ Hardy, Quentin. "How Big Data Gets Real". NY Times. http://bits.blogs.nytimes.com/2012/06/04/how-big-data-gets-real/. Accessed 4 June 2012.
7. ^ Safire, William. "Redact This". New York Times. https://www.nytimes.com/2007/09/09/magazine/09wwln-safire-t.html?_r=0. Accessed 9 September 2007.
8. ^ "CEGA partners". UC Berkeley. http://cega.berkeley.edu/partners/. Accessed August 2012.