Entry | National Corpus of Polish |
Definition |
The National Corpus of Polish (Polish: Narodowy Korpus Języka Polskiego, NKJP) is the largest and most important corpus of the Polish language. A linguistic corpus is a collection of texts in which one can find the typical uses of a word or phrase, as well as its meaning and grammatical function.

Description

The National Corpus of Polish is a joint initiative of four institutions: the Institute of Computer Science and the Institute of the Polish Language at the Polish Academy of Sciences, Polish Scientific Publishers PWN, and the Department of Computational and Corpus Linguistics at the University of Łódź. It has been registered as a research and development project of the Ministry of Science and Higher Education.

The corpus is intended to exceed 1 billion words. It includes a carefully balanced 300-million-word subcorpus, and a manually annotated 1-million-word subcorpus has been released under an open license. The corpus is accessible online (http://nkjp.pl/poliqarp/).

The corpus contains classic literature, daily newspapers, specialist periodicals and journals, transcripts of conversations, and a variety of short-lived and internet texts.[1]

History

The first corpus to emerge was developed by the Institute of the Polish Language, Polish Academy of Sciences (not publicly available), followed by the corpus of PWN publishers, then the corpus of the PELCRA group at the University of Łódź, and finally the corpus of the Institute of Computer Science, Polish Academy of Sciences. In 2006 the four teams decided to join forces, forming the Consortium for the National Corpus of Polish.[2]

References

1. http://nkjp.pl/index.php?page=0&lang=1
2. http://nkjp.pl/settings/papers/NKJP_ACADEMIA2009_pl.pdf

Categories: Polish language | Corpora | Linguistic research | Corpus linguistics |