Entry | LIVAC Synchronous Corpus |
Definition |
Name: LIVAC. Caption: LIVAC Word Search System. Released: July 1995. Operating system: Cross-platform. Language: English, Traditional and Simplified Chinese. Genre: Corpus. Website: http://www.livac.org
LIVAC is an uncommon language corpus that has been dynamically maintained since 1995. Unlike other existing corpora, LIVAC has adopted a rigorous, regular, and "Windows"-based approach to processing and filtering massive media texts from representative communities in the Pan-Chinese region, including Hong Kong, Macau, Taipei, Singapore, Shanghai, Beijing, Guangzhou, and Shenzhen.[1] The contents are thus deliberately repetitive in most cases, represented by textual samples drawn from editorials, local and international news, cross-Formosan Straits news, as well as news on finance, sports and entertainment.[2] By 2017, 2.5 billion characters of news media texts had been filtered, of which 600 million characters had been processed and analyzed, yielding an expanding Pan-Chinese dictionary of 2 million words from the Pan-Chinese printed media. Through rigorous analysis based on computational methodology, LIVAC has also accumulated a large amount of accurate and meaningful statistical data on the Chinese language and its speech communities in the Pan-Chinese region, and the results show considerable and important variations.[3][4]
The "Windows" approach is the most representative feature of LIVAC. It enables Pan-Chinese media texts to be quantitatively analyzed according to attributes such as location, time and subject domain.
Thus, various types of comparative studies and applications in information technology, as well as the development of related innovative applications, have been possible.[5][6] Moreover, LIVAC allows longitudinal developments to be taken into account, facilitating Key Word in Context (KWIC) searches and comprehensive study of target words, their underlying concepts, and linguistic structures over the past 20 years, based on variables such as region, duration and content.
Results from the extensive and cumulative data analysis contained in LIVAC have enabled the cultivation of textual databases of proper names, place names, organization names, and new words, as well as bi-weekly and annual rosters of media figures. Related applications have included the establishment of verb and adjective databases, the formulation of sentiment indices and related opinion mining to measure and compare the popularity of global media figures in the Chinese media (LIVAC Annual Pan-Chinese Celebrity Rosters, later renamed the Pan-Chinese Media Personalities Rosters),[7][8] and the construction of monthly new word lexicons (LIVAC Annual Pan-Chinese New Word Rosters).[9][10] On this basis, the analysis of the emergence, diffusion and transformation of new words, and the publication of dictionaries of neologisms, have been made possible.[11][12]
Corpus data processing
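The kind of windowed Key Word in Context (KWIC) lookup described above, restricted by region and time span, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Record` fields, the `kwic` function, and the three sample records are hypothetical and do not reflect LIVAC's actual data model, interface, or contents. The texts are assumed to be already word-segmented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    """One hypothetical news sample, already segmented into words."""
    region: str        # e.g. "Hong Kong"
    year: int          # publication year
    domain: str        # e.g. "finance", "sports"
    tokens: List[str]  # segmented words of the sample


def kwic(records: List[Record], keyword: str,
         region: Optional[str] = None,
         years: Optional[Tuple[int, int]] = None,
         width: int = 2):
    """Return (region, year, left, keyword, right) for each hit in the window.

    `region` and `years` (inclusive bounds) select the window of texts;
    `width` is the number of context words kept on each side.
    """
    rows = []
    for rec in records:
        if region is not None and rec.region != region:
            continue
        if years is not None and not (years[0] <= rec.year <= years[1]):
            continue
        for i, tok in enumerate(rec.tokens):
            if tok == keyword:
                left = rec.tokens[max(0, i - width):i]
                right = rec.tokens[i + 1:i + 1 + width]
                rows.append((rec.region, rec.year, left, tok, right))
    return rows


# Invented sample data, for illustration only.
SAMPLE = [
    Record("Hong Kong", 1998, "finance", ["股市", "今日", "大幅", "上升"]),
    Record("Beijing", 2010, "finance", ["股市", "持续", "走低"]),
    Record("Hong Kong", 2012, "sports", ["球队", "今日", "获胜"]),
]

if __name__ == "__main__":
    for row in kwic(SAMPLE, "今日", region="Hong Kong", years=(2000, 2020)):
        print(row)
```

Keeping region, year and domain as plain fields on each record means the same lookup can be repeated over different windows and the results compared side by side, which is the style of comparison the text attributes to the "Windows" approach.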
Labeling for data curation
Applications
See also
References
1. Tsou, Benjamin; Lai, Tom; Chan, Samuel; and Wang, William S.-Y. (Eds.). (1998). Quantitative and Computational Studies on the Chinese Language 《漢語計量與計算研究》. Language Information Sciences Research Centre, City University Press.
2. Tsou, B. K., and Kwong, O. Y. (Eds.). (2015). Linguistic Corpus and Corpus Linguistics in the Chinese Context (Journal of Chinese Linguistics Monograph Series No. 25). Hong Kong: Chinese University Press.
3. Tsou, Benjamin. (2004). "Chinese Language Processing at the Dawn of the 21st Century". In C. R. Huang and W. Lenders (Eds.), Language and Linguistics Monograph Series B: Frontiers in Linguistics I, pp. 189–207. Institute of Linguistics, Academia Sinica.
4. Tsou, B. K. (2017). Loanwords in Mandarin Through Other Chinese Dialects. In R. Sybesma, W. Behr, Y. Gu, Z. Handel, C.-T. Huang and J. Myers (Eds.), The Encyclopaedia of Chinese Language and Linguistics (Vol. 2, pp. 641–647). Leiden; Boston: Brill.
5. Tsou, Benjamin, and Kwong, Olivia. (2015). LIVAC as a Monitoring Corpus for Tracking Trends beyond Linguistics. In Tsou, Benjamin, and Kwong, Olivia (Eds.), Linguistic Corpus and Corpus Linguistics in the Chinese Context (Journal of Chinese Linguistics Monograph Series No. 25), pp. 447–471. Hong Kong: The Chinese University Press.
6. Tsou, Benjamin. (2016). Skipantism Revisited: Along with Neologisms and Terminological Truncation. In Chin, Chi-on Andy; Kwok, Bit-chee; and Tsou, Benjamin K. (Eds.), Commemorative Essays for Professor Yuen-Ren Chao: Father of Modern Chinese Linguistics, pp. 343–357. Taiwan: Crane Publishing.
7. CityU releases 2015 LIVAC Pan-Chinese Media Personality Roster, City University of Hong Kong, Hong Kong, 28 December 2015.
8. CityU releases 2016 LIVAC Pan-Chinese Media Personality Roster, City University of Hong Kong, Hong Kong, 2 January 2017.
9. CityU releases 2014 Pan-Chinese New Word Rosters, City University of Hong Kong, Hong Kong, 12 February 2015.
10. CityU releases 2015 LIVAC Pan-Chinese New Word Rosters, City University of Hong Kong, Hong Kong, 4 February 2016.
11. Tsou, Benjamin K. (鄒嘉彥), and You, Rujie (游汝杰) (Eds.). (2007). 《21世紀華語新詞語詞典》 [Dictionary of 21st Century Chinese New Words] (simplified-character edition). Shanghai: Fudan University Press.
12. Tsou, Benjamin K. (鄒嘉彥), and You, Rujie (游汝杰) (Eds.). (2010). 《全球華語新詞語詞典》 [Global Chinese Dictionary of New Words]. Beijing: The Commercial Press.
External links
Categories: Online databases | Applied linguistics | Linguistic research | Corpus linguistics | Computational linguistics | Natural language processing