English-Arabic Parallel Corpus of United Nations Texts

  1. Texts included in the EAPCOUNT

  2. Time-frame

  3. Main sources of EAPCOUNT texts

  4. References

  5. External links

  6. See also

The English-Arabic Parallel Corpus of United Nations Texts (EAPCOUNT) is one of the largest available parallel corpora involving the Arabic language.

It is intended as a general research tool that remains available, beyond the original project, for applied and theoretical linguistic research. It was launched in 2006 as a PhD research project at the Department of Linguistics, University of Carthage, by Dr. Hammouda Salhi (حمُّودة الصالحي), in collaboration with some of his students, and was completed in 2010. The full description of the corpus was completed in 2009 and revised in 2010.

The EAPCOUNT project was launched in response to the unsatisfactory performance of general-purpose dictionaries (Zanettin, 2009), especially in translation studies and comparative research involving Arabic. It was also motivated by the growing demand for cross-lingual research and information retrieval (Salhi, 2010).

The EAPCOUNT comprises 341 texts aligned at paragraph level, meaning that each English text is paired with its Arabic translation. It consists of two subcorpora: one contains the English originals and the other their Arabic translations. The English subcorpus contains 3,794,677 word tokens and 78,606 word types. The Arabic subcorpus has slightly fewer word tokens (3,755,741) but differs greatly in the number of word types, which is 143,727. The whole corpus therefore contains 7,550,418 tokens.
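The counts above can be illustrated with a short Python sketch. It models a paragraph-level alignment unit, counts tokens and types per subcorpus, and recomputes the corpus total and the type-token ratio (TTR) from the published figures. The class and function names are hypothetical, and whitespace tokenization is an assumption: the tokenizer actually used for EAPCOUNT is not described here, so toy counts will not match the corpus's own procedure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AlignedParagraph:
    """Hypothetical alignment unit: an English source paragraph and its Arabic translation."""
    english: str
    arabic: str


def count_tokens_and_types(paragraphs):
    """Return (token_count, type_count) for a list of paragraphs.

    Assumes naive, case-sensitive whitespace tokenization.
    """
    counts = Counter(token for paragraph in paragraphs for token in paragraph.split())
    return sum(counts.values()), len(counts)


def subcorpus_stats(pairs):
    """Split aligned pairs into English and Arabic subcorpora and count each one."""
    english_stats = count_tokens_and_types([p.english for p in pairs])
    arabic_stats = count_tokens_and_types([p.arabic for p in pairs])
    return english_stats, arabic_stats


# Figures reported for the full EAPCOUNT.
EN_TOKENS, EN_TYPES = 3_794_677, 78_606
AR_TOKENS, AR_TYPES = 3_755_741, 143_727

TOTAL_TOKENS = EN_TOKENS + AR_TOKENS   # size of the whole corpus
EN_TTR = EN_TYPES / EN_TOKENS          # type-token ratio of the English subcorpus
AR_TTR = AR_TYPES / AR_TOKENS          # type-token ratio of the Arabic subcorpus

if __name__ == "__main__":
    # A toy, hypothetical alignment unit; not taken from the corpus.
    toy = [AlignedParagraph(
        english="The General Assembly adopted the resolution.",
        arabic="اعتمدت الجمعية العامة القرار.",
    )]
    print("Toy (tokens, types):", subcorpus_stats(toy))
    print(f"Total tokens: {TOTAL_TOKENS:,}")
    print(f"English TTR: {EN_TTR:.4f}, Arabic TTR: {AR_TTR:.4f}")
```

With the reported figures, the Arabic TTR (about 0.038) is almost twice the English one (about 0.021). One plausible contributing factor is that Arabic orthography attaches clitics such as the article "ال" and the conjunction "و" to the following word, so whitespace-delimited forms multiply.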

Texts included in the EAPCOUNT

The EAPCOUNT consists mainly, but not exclusively, of resolutions and annual reports issued by various UN organizations and institutions. Some texts are taken from the authoritative publications of a UN-like international institution, the Inter-Parliamentary Union (IPU), and represent 2.18% of the tokens in the English subcorpus. The great majority of texts, however, are issued by the General Assembly and the Security Council (66.44% of source-language tokens). The underlying assumption is that the target-language (Arabic) texts produced by these international bodies can be regarded as highly reliable translations. All texts were downloaded from first-hand sources (the official websites of these agencies) to ensure that the publications were preserved in their original form.

Time-frame

The EAPCOUNT texts cover a time-frame of about 14 years. The EAPCOUNT can nonetheless be regarded as a synchronic corpus, even though Meyer (2002: 46) maintains that "a time-frame of 5 to 10 years seems reasonable" for a corpus to qualify as synchronic. This is because almost all original texts and translations are issued by the same bodies and are governed by strict norms and standards of writing and translation, which arguably means that language change happens at a slower pace. In addition, 22.6% of the texts were produced in 2009, 16% in 2007 and 13.4% in 2005, and 93.87% of the texts were produced over a period of 9 years, from 2001 to 2009, which falls within the time-frame Meyer considers reasonable for a synchronic corpus.
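The time-frame argument can be checked with a few lines of Python. This sketch counts the years in an inclusive window, computes the share of texts whose publication year falls inside it, and tests a window length against Meyer's guideline. Treating the "5 to 10 years" figure as an upper bound of 10 years is an interpretive assumption, the function names are hypothetical, and the list of years is invented for illustration rather than drawn from EAPCOUNT metadata.

```python
def window_length(start_year, end_year):
    """Number of calendar years in the inclusive range [start_year, end_year]."""
    return end_year - start_year + 1


def share_in_window(years, start_year, end_year):
    """Fraction of texts whose publication year lies in [start_year, end_year]."""
    if not years:
        return 0.0
    inside = sum(1 for year in years if start_year <= year <= end_year)
    return inside / len(years)


def fits_meyer_guideline(length_in_years, max_years=10):
    """True if a time-frame does not exceed the upper end of Meyer's 5-to-10-year span.

    Reading the guideline as an upper bound is an assumption.
    """
    return length_in_years <= max_years


if __name__ == "__main__":
    # Hypothetical publication years, for illustration only.
    years = [1996, 2001, 2005, 2005, 2007, 2009, 2009, 2009]
    span = window_length(2001, 2009)
    print("Window length 2001-2009:", span)
    print("Share of texts in 2001-2009:", share_in_window(years, 2001, 2009))
    print("2001-2009 within guideline?", fits_meyer_guideline(span))
    print("Full 14-year span within guideline?", fits_meyer_guideline(14))
```

Applied to the corpus itself, the same reasoning says that the 2001-2009 window is 9 years long and holds 93.87% of the texts, while the full 14-year span exceeds the guideline.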

Main sources of EAPCOUNT texts

  • General Assembly Resolutions: http://www.un.org/ga/64/resolutions.shtml
  • Security Council Resolutions: http://www.un.org/Docs/sc/unsc_resolutions.html
  • UNICEF Publications: http://www.unicef.org/publications/index.html
  • International Monetary Fund Publications: http://www.imf.org/external/arabic/index.htm

References

  • Meyer, Charles F. (2002). English Corpus Linguistics. Cambridge: Cambridge University Press.
  • Salhi, Hammouda (2010). "Small Parallel Corpora in an English-Arabic Translation Classroom: No Need to Reinvent the Wheel in the Era of Globalization". In: Said M. Shiyab, Marilyn Gaddis Rose, Juliane House, and John Duval (eds.), Globalisation and Aspects of Translation. Newcastle: Cambridge Scholars Publishing, pp. 53-67.
  • Zanettin, Federico (2009). "Corpus-based Translation Activities for Language Learners". The Interpreter and Translator Trainer (ITT), 3(2), pp. 209-224. Manchester: St Jerome.

External links

  • http://www.comp.leeds.ac.uk/eric/latifa/arabic_corpora.htm
  • http://hammouda-salhi.webs.com/
  • http://www.lancs.ac.uk/fass/projects/corpus/UCCTS2010Proceedings/
  • http://www.authorstream.com/Presentation/salhi-627362-business-and-translation-pedagogy-salhi3/

See also

  • Computer-assisted reviewing
  • Machine translation
  • Natural language processing
  • Parallel corpus