Entry: Draft:Spark NLP
Definition

  1. Main Features

  2. Pipelines

  3. Pre-trained models

      English    Italian    French  

  4. See also

  5. Licence Information

  6. References

{{Infobox software
| title = Spark NLP
| name = Spark NLP
| author = John Snow Labs
| released = October 2017[1]
| latest release version = 2.0
| latest release date = {{Start date and age|2019|03|df=yes}}
| repo = {{URL|https://github.com/JohnSnowLabs/spark-nlp}}
| status = active
| programming language = Python, Scala
| operating system = Linux, Windows, macOS
| platform = cross-platform
| genre = Natural language processing
| license = Apache License 2.0
| website = {{URL|https://www.johnsnowlabs.com/spark-nlp/}}
}}

Spark NLP[2][3][4][5][6] is an open-source text processing library built on top of Apache Spark and its Spark ML library. Its goal is to provide an API for NLP annotations that scales across a distributed, large-scale environment.

Main Features

Several annotators are provided out of the box for both Python and Scala:

  • Tokenizer: Word tokens
  • Normalizer: Text cleaning
  • Stemmer: Hard stems
  • Lemmatizer: Lemmas
  • RegexMatcher: Rule matching
  • TextMatcher: Phrase matching
  • Chunker: Meaningful phrase matching
  • DateMatcher: Date-time parsing
  • SentenceDetector: Sentence boundary detection
  • DeepSentenceDetector: Sentence boundary detection with machine learning
  • POSTagger: Part of speech tagger
  • ViveknSentimentDetector: Sentiment analysis
  • SentimentDetector: Sentiment analysis
  • Named Entity Recognition CRF annotator
  • Named Entity Recognition Deep Learning annotator
  • SpellChecker: Norvig algorithm
  • SpellChecker: Symmetric delete
  • Dependency Parser: Unlabeled grammatical relation
  • Typed Dependency Parser: Labeled grammatical relation
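
As a rough illustration of how annotators build on one another, the following self-contained Python sketch chains a toy tokenizer and a toy normalizer. It does not use Spark NLP or Apache Spark: the `Annotation` class and the `tokenize`, `normalize`, and `annotate` functions are hypothetical names chosen for this example, and the library's own annotators run as stages over Spark DataFrames rather than plain strings.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Annotation:
    """One annotated span (hypothetical structure, not Spark NLP's class)."""
    annotator: str  # name of the stage that produced this span
    begin: int      # start offset in the original text
    end: int        # end offset in the original text (inclusive)
    result: str     # the annotated value


def tokenize(text: str) -> List[Annotation]:
    """Toy tokenizer: every run of word characters becomes one token."""
    return [
        Annotation("token", m.start(), m.end() - 1, m.group())
        for m in re.finditer(r"\w+", text)
    ]


def normalize(tokens: List[Annotation]) -> List[Annotation]:
    """Toy normalizer: lowercase each token and keep only the letters a-z."""
    cleaned = []
    for tok in tokens:
        value = re.sub(r"[^a-z]", "", tok.result.lower())
        if value:  # drop tokens that are empty after cleaning
            cleaned.append(Annotation("normalized", tok.begin, tok.end, value))
    return cleaned


def annotate(text: str) -> Dict[str, List[Annotation]]:
    """Run the stages in order; each stage consumes the previous output."""
    tokens = tokenize(text)
    return {"token": tokens, "normalized": normalize(tokens)}


result = annotate("Spark NLP runs on Apache Spark 2.4!")
print([a.result for a in result["normalized"]])
# prints ['spark', 'nlp', 'runs', 'on', 'apache', 'spark']
```

Each stage keeps the character offsets of the original text, so later stages can refer back to the exact span an annotation came from.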

Pipelines

{| class="wikitable"
! Pipelines !! English !! Name
|-
| Explain Document ML || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/explain_document_ml_en_2.0.0_2.4_1553189532150.zip Download] || explain_document_ml
|-
| Explain Document DL || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/explain_document_dl_en_2.0.0_2.4_1553227894237.zip Download] || explain_document_dl
|-
| Entity Recognizer DL || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/entity_recognizer_dl_en_2.0.0_2.4_1553230844671.zip Download] || entity_recognizer_dl
|}
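
Most of the download links in this article end in a file name of the form `name_lang_x.y.z_a.b_digits.zip`. The sketch below splits such a file name into its parts using only the Python standard library. It assumes that `x.y.z` is the library version, `a.b` is the Spark version, and the 13 trailing digits are a Unix timestamp in milliseconds; these are readings of the URLs listed here, not a documented naming scheme. The `ModelArchive` type and the `parse_archive_url` function are hypothetical names chosen for this example.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class ModelArchive(NamedTuple):
    name: str             # e.g. "explain_document_ml"
    lang: str             # two-letter language code, e.g. "en"
    library_version: str  # assumed to be the library version
    spark_version: str    # assumed to be the Spark version
    built_at: datetime    # assumed millisecond timestamp, converted to UTC


# Assumed layout: <name>_<lang>_<x.y.z>_<a.b>_<13 digits>.zip
_ARCHIVE = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2})_(?P<lib>\d+\.\d+\.\d+)"
    r"_(?P<spark>\d+\.\d+)_(?P<ts>\d{13})\.zip$"
)


def parse_archive_url(url: str) -> Optional[ModelArchive]:
    """Return the parsed parts, or None if the file name does not fit."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    match = _ARCHIVE.match(filename)
    if match is None:
        return None
    built_at = datetime.fromtimestamp(int(match["ts"]) / 1000, tz=timezone.utc)
    return ModelArchive(
        match["name"], match["lang"], match["lib"], match["spark"], built_at
    )


url = (
    "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
    "explain_document_ml_en_2.0.0_2.4_1553189532150.zip"
)
print(parse_archive_url(url))
```

File names that do not follow this layout, such as the Italian archives listed in this article, make the function return `None` instead of a partial guess.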

Pre-trained models

English

{| class="wikitable"
! Model !! English
|-
| LemmatizerModel (Lemmatizer) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lemma_fast_en_1.8.0_2.4_1545435317864.zip Download]
|-
| PerceptronModel (POS) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pos_fast_en_1.8.0_2.4_1545434653742.zip Download]
|-
| ViveknSentimentModel (Sentiment) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vivekn_fast_en_1.8.0_2.4_1545435741623.zip Download]
|-
| NerCRFModel (NER) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_fast_en_1.8.0_2.4_1545435254745.zip Download]
|-
| NerDLModel (NER) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_precise_en_1.8.0_2.4_1545439567330.zip Download]
|-
| SymmetricDeleteModel (Spell Checker) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spell_sd_fast_en_1.8.0_2.4_1545435558025.zip Download]
|-
| ContextSpellCheckerModel (Spell Checker) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/context_spell_gen_en_1.8.0_2.4_1546979465177.zip Download]
|-
| NorvigSweetingModel (Spell Checker) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spell_fast_en_1.8.0_2.4_1545435732032.zip Download]
|}

Italian

{| class="wikitable"
! Model !! Italian
|-
| LemmatizerModel (Lemmatizer) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/it/lemma/dxc.technology/lemma-it_dxc-1.8.0.zip Download]
|-
| SentimentDetector (Sentiment) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/it/sentiment/dxc.technology/sentiment-it_dxc-1.8.0.zip Download]
|}

French

{| class="wikitable"
! Model !! French
|-
| PerceptronModel (POS UD-GSD) || [https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pos_ud-gsd_fr_2.0.0_2.4_1553029753307.zip Download]
|}

See also

  • Natural language processing
  • List of natural language processing toolkits

Licence Information

The library is released under the Apache 2.0 license. It is written in Scala, has no dependencies on other NLP or ML libraries, and is designed to natively extend the Spark ML Pipeline API. Spark NLP is freely available on GitHub at https://github.com/johnsnowlabs/spark-nlp.

References

1. {{Cite web|url=https://databricks.com/blog/2017/10/19/introducing-natural-language-processing-library-apache-spark.html|title=Introducing the Natural Language Processing Library for Apache Spark|last=Talby|first=David|website=databricks.com|publisher=Databricks|access-date=2019-03-29}}
2. {{Cite web|url=https://insidebigdata.com/2018/09/03/use-nlp-extract-unstructured-medical-data-text/|title=The Use of NLP to Extract Unstructured Medical Data From Text|author=Editorial Team|date=2018-09-04|website=insideBIGDATA|language=en-US|access-date=2019-03-29}}
3. {{Cite web|url=https://startupbeat.com/john-snow-labs-natural-language-understanding-software-gets-state-of-the-art-recognition-in-three-industry-events/30699/|title=John Snow Labs' Natural Language Understanding Software Gets "State of the Art" Recognition in Three Industry Events|date=2018-07-19|website=StartUp Beat|language=en-US|access-date=2019-03-29}}
4. {{Cite web|url=https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-running-spark-nlp-and-spacy-pipelines|title=Comparing production-grade NLP libraries: Running Spark-NLP and spaCy pipelines|last=Ellafi|first=Saif Addin|date=2018-02-28|website=O'Reilly Media|language=en|access-date=2019-03-29}}
5. {{Cite web|url=https://www.oreilly.com/ideas/comparing-production-grade-nlp-libraries-accuracy-performance-and-scalability|title=Comparing production-grade NLP libraries: Accuracy, performance, and scalability|last=Ellafi|first=Saif Addin|date=2018-02-28|website=O'Reilly Media|language=en|access-date=2019-03-29}}
6. {{Cite web|url=https://www.i-programmer.info/news/80-java/11251-spark-gets-nlp-library.html|title=Spark Gets NLP Library|last=Ewbank|first=Kay|website=www.i-programmer.info}}
Category:Software
Category:Open-source artificial intelligence