Entry | Truecasing |
Definition |
Truecasing is the problem in natural language processing (NLP) of determining the proper capitalization of words where that information is unavailable. It commonly arises because, in English and many other languages, the first word of a sentence is always capitalized, so its case does not reveal how the word would be written elsewhere. It also arises in badly cased or uncased text (for example, all-lowercase or all-uppercase text messages). Truecasing is unnecessary in languages whose scripts do not distinguish between uppercase and lowercase letters. This covers most languages not written in the Latin, Greek, Cyrillic, or Armenian alphabets, such as Japanese, Chinese, Thai, Hebrew, Arabic, Hindi, and Georgian.

Techniques
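As an illustration of the problem described above, the following is a minimal sketch of a frequency-based truecaser in Python. It is a simple unigram baseline, not the approach of the cited reference. The function names `train_truecaser` and `truecase`, the whitespace tokenization, and the toy corpus are all assumptions made for this example. Sentence-initial tokens are skipped during training because their capitalization carries no information about the word's usual form.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Count surface forms per lowercased word, skipping sentence-initial tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()  # assumption: whitespace tokenization is good enough
        # The first token is capitalized by convention, so its case is uninformative.
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    # Keep the most frequent surface form for each lowercased word.
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def truecase(text, model):
    """Map each token to its most frequent training form; unseen tokens are lowercased."""
    return " ".join(model.get(tok.lower(), tok.lower()) for tok in text.split())


# Hypothetical toy corpus, used only to make the sketch runnable.
corpus = [
    "We flew to Paris with IBM engineers",
    "The engineers from IBM love Paris",
    "Yesterday the team met in Paris",
]
model = train_truecaser(corpus)
print(truecase("THE TEAM FROM IBM MET IN PARIS", model))
```

The sketch deliberately leaves the first word of the output in its learned form; restoring sentence-initial capitalization would be a separate step, and a practical system would also need context to handle words whose casing varies by use.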
Applications

Truecasing aids other NLP tasks, such as named entity recognition (NER), automatic content extraction (ACE), and machine translation.[1] Proper capitalization makes it easier to detect proper nouns, which are the starting points of NER and ACE. Some translation systems use statistical machine learning techniques, which could use the information contained in capitalization to increase accuracy.

References

1. Lita, L. V.; Ittycheriah, A.; Roukos, S.; Kambhatla, N. (2003). "tRuEcasIng". Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. Sapporo, Japan. pp. 152–159. http://portal.acm.org/citation.cfm?id=1075096.1075116 |