Entry | Machine translation software usability |
Definition |
The sections below give objective criteria for evaluating the usability of machine translation software output.

Stationarity or canonical form

Main article: Round-trip translation

Do repeated translations converge on a single expression in both languages? That is, does the translation method show stationarity or produce a canonical form? Does the translation become stationary without losing the original meaning? This metric has been criticized as not being well correlated with BLEU (BiLingual Evaluation Understudy) scores.[1]

Adaptive to colloquialism, argot or slang

Is the system adaptive to colloquialism, argot or slang? The French language has many rules for creating words in the speech and writing of popular culture. Two such rules are:

(a) The reversal of a word's syllables, as in femme becoming meuf. (This is called verlan.)
(b) The attachment of the suffix -ard to a noun or verb to form a proper noun. For example, the noun faluche means "student hat". The word faluchard, formed from faluche, can colloquially mean, depending on context, "a group of students", "a gathering of students" or "behavior typical of a student".

The Google translator as of 28 December 2006 does not derive constructed words such as those formed by rule (b), as shown here:
French argot has three levels of usage:[2]
The United States National Institute of Standards and Technology conducts annual evaluations [https://www.nist.gov/speech/tests/mt/] of machine translation systems based on the BLEU-4 criterion [https://www.nist.gov/speech/tests/mt/doc/mt06_evalplan.v4.pdf]. A combined method called IQmt, which incorporates BLEU and the additional metrics NIST, GTM, ROUGE and METEOR, has been implemented by Gimenez and Amigo.

Well-formed output

Is the output grammatical or well-formed in the target language? Using an interlingua should be helpful in this regard, because with a fixed interlingua one should be able to write a grammatical mapping from the interlingua to the target language. Consider the following Arabic language input and English language translation result from the Google translator as of 27 December 2006. This Google translator output doesn't parse using a reasonable English grammar:

وعن حوادث التدافع عند شعيرة رمي الجمرات -التي كثيرا ما يسقط فيها العديد من الضحايا- أشار الأمير نايف إلى إدخال "تحسينات كثيرة في جسر الجمرات ستمنع بإذن الله حدوث أي تزاحم".
==> And incidents at the push Carbuncles-throwing ritual, which often fall where many of the victims - Prince Nayef pointed to the introduction of "many improvements in bridge Carbuncles God would stop the occurrence of any competing."
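To make the idea of output that "parses using a grammar" concrete, here is a minimal sketch of a CYK recognizer over a toy grammar in Chomsky normal form. The grammar, the lexicon and the function name `parses` are all hypothetical and chosen only for illustration; a real well-formedness check would presumably rely on a broad-coverage grammar or parser rather than a handful of rules.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (an assumption for this
# sketch, not a grammar taken from any evaluation campaign).
BINARY_RULES = {
    ("NP", "VP"): {"S"},    # S  -> NP VP
    ("Det", "N"): {"NP"},   # NP -> Det N
    ("V", "NP"): {"VP"},    # VP -> V NP
}
LEXICON = {
    "the": {"Det"},
    "prince": {"N"},
    "bridge": {"N"},
    "crowd": {"N"},
    "crossed": {"V"},
    "stopped": {"V"},
}


def parses(sentence: str, start: str = "S") -> bool:
    """Return True if the toy grammar derives the whole sentence from `start`."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            # Try every split point between the left and right constituent.
            for k in range(i, j):
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]


well_formed = parses("the prince crossed the bridge")
scrambled = parses("prince the crossed bridge the")
```

With this toy grammar the first sentence is accepted and the scrambled one is rejected. Words outside the lexicon simply fail to parse, which is one reason such a small grammar is only a teaching aid.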
Semantics preservation

Do repeated re-translations preserve the semantics of the original sentence? For example, consider the following English input passed multiple times into and out of French using the Google translator as of 27 December 2006:

Better a day earlier than a day late. ==>
Améliorer un jour plus tôt qu'un jour tard. ==>
Pour améliorer un jour plus tôt qu'un jour tard. ==>
To improve one day earlier than a day late.
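The round-trip procedure can be sketched as a loop that translates out and back until the source-side text stops changing. The snippet below is a minimal illustration under stated assumptions. The two lookup tables are hypothetical stand-ins for a real translation system, loosely modelled on the sentences above, and the intermediate steps they encode are assumptions rather than recorded Google output. The names `round_trip_until_stationary`, `en_to_fr` and `fr_to_en` are likewise invented for this example.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]

# Hypothetical lookup tables standing in for a real MT system.
# Entries marked "assumed" are not shown in the example above.
EN_TO_FR = {
    "Better a day earlier than a day late.":
        "Améliorer un jour plus tôt qu'un jour tard.",
    "To improve one day earlier than a day late.":
        "Pour améliorer un jour plus tôt qu'un jour tard.",  # assumed
}
FR_TO_EN = {
    "Améliorer un jour plus tôt qu'un jour tard.":
        "To improve one day earlier than a day late.",  # assumed
    "Pour améliorer un jour plus tôt qu'un jour tard.":
        "To improve one day earlier than a day late.",
}


def en_to_fr(text: str) -> str:
    # Raises KeyError for sentences outside the toy table.
    return EN_TO_FR[text]


def fr_to_en(text: str) -> str:
    return FR_TO_EN[text]


def round_trip_until_stationary(
    text: str,
    forward: Translator,
    backward: Translator,
    max_rounds: int = 10,
) -> Tuple[List[str], bool]:
    """Repeat source -> target -> source until the source text stops changing.

    Returns the list of source-side texts seen and whether a fixed point
    was reached within `max_rounds` round trips.
    """
    history = [text]
    for _ in range(max_rounds):
        next_text = backward(forward(history[-1]))
        if next_text == history[-1]:
            return history, True
        history.append(next_text)
    return history, False


history, converged = round_trip_until_stationary(
    "Better a day earlier than a day late.", en_to_fr, fr_to_en
)
meaning_preserved = history[0] == history[-1]
```

In this toy run the loop reaches a fixed point, yet the stationary sentence differs from the input. Convergence and semantics preservation are therefore separate questions, and string equality is only a crude proxy for either.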
As noted above and in [1], this kind of round-trip translation is a very unreliable method of evaluation.

Trustworthiness and security

An interesting peculiarity of Google Translate on 24 January 2008 (corrected by 25 January 2008) was the following result when translating from English to Spanish. It looked like a joke embedded in the English-Spanish dictionary, and it had some added poignancy given the then-recent death of Heath Ledger:

Heath Ledger is dead ==> Tom Cruise está muerto
This raises the issue of trustworthiness when relying on a machine translation system embedded in a life-critical system in which the translation system has input to a safety-critical decision-making process. It also raises the issue of whether, in a given use, the software of the machine translation system is safe from hackers.

It is not known whether this feature of Google Translate was the result of a joke or hack, or perhaps an unintended consequence of the use of a method such as statistical machine translation. Reporters from CNET Networks asked Google for an explanation on January 24, 2008; Google said only that it was an "internal issue with Google Translate".[3] The mistranslation was the subject of much hilarity and speculation on the Internet.[4][5] If it was an unintended consequence of the use of a method such as statistical machine translation, and not a joke or hack, then this event demonstrates a potential source of critical unreliability in the statistical machine translation method.

In human translation, particularly by interpreters, selectivity on the part of the translator is often commented on when one of the two parties being served by the interpreter knows both languages. This leads to the issue of whether a particular translation could be considered verifiable. In this case, a converging round-trip translation would be a kind of verification.

Cybersecurity experts question the business use of free machine translation tools, which are often used to translate business documents that contain potentially sensitive information. These systems may present a security vulnerability due to a lack of advanced security functionality.[6]
Notes

1. Somers, Harold (2005). "Round-trip translation: What is it good for?". Proceedings of Australasian Language Technology Workshop ALTW 2005. Sydney. pp. 127-133. http://personalpages.manchester.ac.uk/staff/harold.somers/RoundTrip.doc
2. "The Agony of Argot", Chitlins & Camembert, October 28, 2005.
3. "Google Translate bug mixes up Heath Ledger, Tom Cruise", by Caroline McCarthy, CNET Networks, January 24, 2008.
4. '"Tom Cruise" is Spanish for "Heath Ledger"', gawker.com, January 24, 2008. Archived January 28, 2008: https://web.archive.org/web/20080128172538/http://gawker.com/5002510/tom-cruise-is-spanish-for-heath-ledger
5. "Tom Cruise está muerto", Ray Leon Blog Project, January 24, 2008. Archived October 29, 2008: https://web.archive.org/web/20081029004505/http://rayhey2.blogspot.com/2008/01/tom-cruise-est-muerto.html
6. "Cybersecurity Audit Checklist: The Risk of Free Online Tools". Pairaphrase. https://www.pairaphrase.com/cybersecurity-audit-checklist-risk-free-online-tools/ Accessed 5 October 2018.
Categories: Artificial intelligence applications | Computational linguistics | Machine translation software | Natural language processing