"Text nailing": meaning and origin - Open Encyclopedia

Text nailing (TN) combines two concepts: 1) human interaction with narrative text to identify highly prevalent non-negated expressions, and 2) conversion of all expressions and notes into non-negated, alphabetical-only representations so that they are homogeneous. The importance of using non-negated expressions to increase the accuracy of text-based classifiers was emphasized in a letter published in Communications of the ACM in October 2018.^[6]
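The second concept can be illustrated with a minimal Python sketch. It is not the published implementation: the function names, the second expression in the list, and the choice to lowercase the text are assumptions made here for illustration, and the sketch does not attempt any negation handling.

```python
import re

def to_alpha_only(note):
    """Lowercase a note and drop every character that is not a-z.

    Lowercasing is an assumption of this sketch; the article only says
    that representations are alphabetical-only.
    """
    return re.sub(r"[^a-z]", "", note.lower())

# Hypothetical expressions a reviewer might have "nailed" for current smoking.
NAILED_EXPRESSIONS = ["smokesppd", "currentsmoker"]

def matched_expressions(note):
    """Return the nailed expressions that occur in the normalized note."""
    normalized = to_alpha_only(note)
    return [expr for expr in NAILED_EXPRESSIONS if expr in normalized]

note = "Pt smokes 2 ppd; denies alcohol."
print(to_alpha_only(note))        # ptsmokesppddeniesalcohol
print(matched_expressions(note))  # ['smokesppd']
```

Because digits, spaces and punctuation are removed, "smokes 2 ppd" and "smokes 1/2 ppd" collapse to the same string, which is what makes a small list of expressions cover many surface variants.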

In traditional machine learning approaches for text classification, a human expert is required to label phrases or entire notes, and then a supervised learning algorithm attempts to generalize the associations and apply them to new data. In contrast, using non-negated distinct expressions eliminates the need for an additional computational method to achieve generalizability.^[7]^[8]^[9]

Source code

Sample code for extracting smoking status from narrative notes using "nailed expressions" is available on GitHub.^[10]

TN as progressive cyber-human intelligence

In July 2018, researchers from Virginia Tech and the University of Illinois at Urbana–Champaign referred to TN as an example of progressive cyber-human intelligence (PCHI).^[11]

Criticism of machine learning in health care

Chen & Asch 2017 wrote "With machine learning situated at the peak of inflated expectations, we can soften a subsequent crash into a “trough of disillusionment” by fostering a stronger appreciation of the technology’s capabilities and limitations."^[12]

A letter published in Communications of the ACM, "Beyond brute force", emphasized that a brute force approach may perform better than traditional machine learning algorithms when applied to text. The letter stated "... machine learning algorithms, when applied to text, rely on the assumption that any language includes an infinite number of possible expressions. In contrast, across a variety of medical conditions, we observed that clinicians tend to use the same expressions to describe patients' conditions."^[13]

In a viewpoint published in June 2018 on the slow adoption of data-driven findings in medicine, Uri Kartoun, co-creator of Text Nailing, stated that "... Text Nailing raised skepticism in reviewers of medical informatics journals who claimed that it relies on simple tricks to simplify the text, and leans heavily on human annotation. TN indeed may seem just like a trick of the light at first glance, but it is actually a fairly sophisticated method that finally caught the attention of more adventurous reviewers and editors who ultimately accepted it for publication."^[14]

Criticism

The human-in-the-loop process is a way to generate features using domain experts. Using domain experts to come up with features is not a novel concept. However, the specific interfaces and method that help the domain experts create the features are most likely novel.

In this case the features the experts create are equivalent to regular expressions. Removing non-alphabetical characters and then matching on "smokesppd" is equivalent to matching the regular expression /smokes[^a-zA-Z]*ppd/. Using regular expressions as features for text classification is not novel.
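The claimed equivalence can be checked with a short Python sketch. The helper names and the example strings are invented here for illustration. The two approaches agree whenever the non-alphabetical characters fall only between "smokes" and "ppd"; a strictly equivalent regular expression would have to allow such characters between every pair of letters, as the last example shows.

```python
import re

# The regular expression quoted in the text above.
PATTERN = re.compile(r"smokes[^a-zA-Z]*ppd")

def alpha_only(text):
    """Remove every non-alphabetical character (case is left unchanged)."""
    return re.sub(r"[^a-zA-Z]", "", text)

def nailed_match(text):
    """Strip non-alphabetical characters, then look for the nailed expression."""
    return "smokesppd" in alpha_only(text)

def regex_match(text):
    """Apply the regular expression directly to the raw text."""
    return PATTERN.search(text) is not None

examples = [
    "smokes 2 ppd",     # both approaches match
    "smokes: 1.5 ppd",  # both approaches match
    "smokes two ppd",   # neither matches: letters sit between the words
    "smo-kes 2 ppd",    # only the stripping approach matches
]
for text in examples:
    print(repr(text), nailed_match(text), regex_match(text))
```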

Given these features, the classifier is a threshold set manually by the authors on the basis of performance on a set of documents. This is still a classifier; its only parameter, the threshold, is simply set by hand. Given the same features and documents, almost any machine learning algorithm should be able to find the same threshold or (more likely) a better one.
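The point about thresholds can be sketched in Python. The article does not say what quantity the threshold applies to, so this sketch assumes a per-document score such as the number of nailed expressions matched. The scores, labels and the manual threshold below are made-up toy values, not results from any study.

```python
def classify(score, threshold):
    """Label a document positive when its score reaches the threshold."""
    return score >= threshold

def accuracy(scores, labels, threshold):
    """Fraction of documents whose predicted label equals the true label."""
    correct = sum(classify(s, threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)

def best_threshold(scores, labels):
    """Try every observed score as a threshold and keep the most accurate one.

    On ties the lowest threshold wins, because max() returns the first
    maximal candidate in the sorted list.
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: accuracy(scores, labels, t))

# Toy, invented data: one score and one true label per document.
scores = [0, 0, 1, 1, 2, 2, 3, 5]
labels = [False, False, False, False, True, True, True, True]

manual = 3                                # hypothetical hand-picked threshold
learned = best_threshold(scores, labels)  # threshold found by exhaustive search

print(manual, accuracy(scores, labels, manual))
print(learned, accuracy(scores, labels, learned))
```

An exhaustive search over one threshold is the simplest possible learner (a decision stump); it can never do worse on the tuning documents than a hand-picked value chosen from the same candidates.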

The authors note that using support vector machines (SVM) and hundreds of documents gives inferior performance, but do not specify which features or documents the SVM was trained and tested on. A fair comparison would use the same features and document sets as those used by the manual threshold classifier.

References

1. ^ Kartoun, Uri (2017). "Text nailing". Interactions. 24 (6): 44–9. doi:10.1145/3139488.
2. ^ Barbosa, Simone; Cockton, Gilbert (2017). "Avoiding agenda bias with design thoughtfulness". Interactions. 24 (6): 5. doi:10.1145/3151556.
3. ^ Kartoun, Uri; et al. (2018). "Development of an Algorithm to Identify Patients with Physician-Documented Insomnia". Scientific Reports. 8 (1): 7862. doi:10.1038/s41598-018-25312-z. PMC 5959894. PMID 29777125. https://www.nature.com/articles/s41598-018-25312-z
4. ^ Meystre, S. M; Savova, G. K; Kipper-Schuler, K. C; Hurdle, J. F (2008). "Extracting information from textual documents in the electronic health record: A review of recent research". Yearbook of Medical Informatics: 128–44. PMID 18660887. http://www.schattauer.de/index.php?id=1268&L=1&pii=meme08010128&no_cache=1
5. ^ Wang, Yanshan; Wang, Liwei; Rastegar-Mojarad, Majid; Moon, Sungrim; Shen, Feichen; Afzal, Naveed; Liu, Sijia; Zeng, Yuqun; Mehrabi, Saeed; Sohn, Sunghwan; Liu, Hongfang (2018). "Clinical information extraction applications: A literature review". Journal of Biomedical Informatics. 77: 34–49. doi:10.1016/j.jbi.2017.11.011. PMC 5771858. PMID 29162496.
6. ^ CACM Staff (2018). "More accurate text analysis for better patient outcomes". Communications of the ACM. 61 (10): 6–7. doi:10.1145/3273019.
7. ^ Beam, Andrew L; Kartoun, Uri; Pai, Jennifer K; Chatterjee, Arnaub K; Fitzgerald, Timothy P; Shaw, Stanley Y; Kohane, Isaac S (2017). "Predictive Modeling of Physician-Patient Dynamics That Influence Sleep Medication Prescriptions and Clinical Decision-Making". Scientific Reports. 7: 42282. Bibcode:2017NatSR...742282B. doi:10.1038/srep42282. PMC 5299453. PMID 28181568.
8. ^ Simon, Tracey G; Kartoun, Uri; Zheng, Hui; Chan, Andrew T; Chung, Raymond T; Shaw, Stanley; Corey, Kathleen E (2017). "Model for end-stage liver disease Na Score predicts incident major cardiovascular events in patients with nonalcoholic fatty liver disease". Hepatology Communications. 1 (5): 429–438. doi:10.1002/hep4.1051. PMC 5659323. PMID 29085919.
9. ^ Corey, Kathleen E; Kartoun, Uri; Zheng, Hui; Chung, Raymond T; Shaw, Stanley Y (2016). "Using an Electronic Medical Records Database to Identify Non-Traditional Cardiovascular Risk Factors in Nonalcoholic Fatty Liver Disease". The American Journal of Gastroenterology. 111 (5): 671–6. doi:10.1038/ajg.2016.44. PMC 4864030. PMID 26925881.
10. ^ "Contribute to kartoun/text-nailing development by creating an account on GitHub". 2018-01-07. https://github.com/kartoun/text-nailing
11. ^ https://dl.acm.org/citation.cfm?id=3231559
12. ^ Chen, Jonathan H; Asch, Steven M (2017). "Machine Learning and Prediction in Medicine — Beyond the Peak of Inflated Expectations". New England Journal of Medicine. 376 (26): 2507–9. doi:10.1056/NEJMp1702071. PMC 5953825. PMID 28657867.
13. ^ CACM Staff (2017). "Beyond brute force". Communications of the ACM. 60 (10): 8–9. doi:10.1145/3135241.
14. ^ Kartoun, Uri (2018). "Toward an accelerated adoption of data-driven findings in medicine". Medicine, Health Care and Philosophy. 22 (1): 153–157. doi:10.1007/s11019-018-9845-y. PMID 29882052.