Introduction. Philipp Koehn. 28 January 2016

Introduction
Philipp Koehn
28 January 2016

Administrativa
Class web site:
Tuesdays and Thursdays, 1:30-2:45, Hodson 313
Instructor: Philipp Koehn (with help from Matt Post)
Grading
  five programming assignments (12% each)
  final project (30%)
  in-class presentation: language in ten minutes (10%)

Textbook

Machine Translation: Chinese

Machine Translation: French

A Clear Plan
[Figure, built up over four slides; labels: Source, Target, Lexical Transfer, Syntactic Transfer, Semantic Transfer, Interlingua, Analysis, Generation]

Learning from Data
[Figure with two parts. Training: Training Data (parallel corpora, monolingual corpora, dictionaries), Linguistic Tools, Statistical Machine Translation System. Using: Source Text, Statistical Machine Translation System, Translation]

why is that a good plan?

Word Translation Problems
Words are ambiguous
  He deposited money in a bank account with a high interest rate.
  Sitting on the bank of the Mississippi, a passing ship piqued his interest.
How do we find the right meaning, and thus translation?
Context should be helpful

Syntactic Translation Problems
Languages have different sentence structure
  das    behaupten   sie    wenigstens
  this   claim       they   at least
  the                she
Convert from object-verb-subject (OVS) to subject-verb-object (SVO)
Ambiguities can be resolved through syntactic analysis
  the meaning "the" of "das" not possible (not a noun phrase)
  the meaning "she" of "sie" not possible (subject-verb agreement)

Semantic Translation Problems
Pronominal anaphora
  I saw the movie and it is good.
How to translate "it" into German (or French)?
  "it" refers to "movie"
  "movie" translates to "Film"
  "Film" has masculine gender
  ergo: "it" must be translated into the masculine pronoun "er"
We are not handling this very well [Le Nagard and Koehn, 2010]

Semantic Translation Problems
Coreference
  Whenever I visit my uncle and his daughters, I can't decide who is my favorite cousin.
How to translate "cousin" into German? Male or female?
Complex inference required

Semantic Translation Problems
Discourse
  Since you brought it up, I do not agree with you.
  Since you brought it up, we have been working on it.
How to translate "since"? Temporal or conditional?
Analysis of discourse structure is a hard problem

Learning from Data
What is the best translation?
Counts in European Parliament corpus
  Sicherheit               security            14,516
  Sicherheit               safety              10,015
  Sicherheit               certainty              334
Phrasal rules
  Sicherheitspolitik       security policy      1,580
  Sicherheitspolitik       safety policy           13
  Sicherheitspolitik       certainty policy         0
  Lebensmittelsicherheit   food security           51
  Lebensmittelsicherheit   food safety          1,084
  Lebensmittelsicherheit   food certainty           0
  Rechtssicherheit         legal security         156
  Rechtssicherheit         legal safety             5
  Rechtssicherheit         legal certainty        723

Learning from Data
What is most fluent?
Hits on Google
  a problem for translation    13,000
  a problem of translation     61,600
  a problem in translation     81,700
  a translation problem       235,000

Learning from Data
What is most fluent?
  police disrupted the demonstration       2,140
  police broke up the demonstration       66,600
  police dispersed the demonstration      25,800
  police ended the demonstration             762
  police dissolved the demonstration       2,030
  police stopped the demonstration       722,000
  police suppressed the demonstration      1,400
  police shut down the demonstration       2,040

where are we now?

Word Alignment
  michael geht davon aus, dass er im haus bleibt
  michael assumes that he will stay in the house

Phrase-Based Model
Foreign input is segmented into phrases
Each phrase is translated into English
Phrases are reordered
Workhorse of today's statistical machine translation

Syntax-Based Translation
[Figure: aligned parse trees for English "she wants to drink a cup of coffee" (PRO she, VBZ wants, TO to, VB drink, DET a, NN cup, IN of, NN coffee) and German "Sie will eine Tasse Kaffee trinken" (PPER Sie, VAFIN will, ART eine, NN Tasse, NN Kaffee, VVINF trinken)]

Semantic Translation
Abstract meaning representation [Knight et al., ongoing]
  (w / want-01
    :agent (b / boy)
    :theme (l / love
      :agent (g / girl)
      :patient b))
Generalizes over equivalent syntactic constructs (e.g., active and passive)
Defines semantic relationships
  semantic roles
  co-reference
  discourse relations
In a very preliminary stage

what is it good for?

what is it good enough for?

Why Machine Translation?
Assimilation
  reader initiates translation, wants to know content
  user is tolerant of inferior quality
  focus of majority of research (GALE program, etc.)
Communication
  participants don't speak same language, rely on translation
  users can ask questions when something is unclear
  chat room translations, hand-held devices
  often combined with speech recognition, IWSLT campaign
Dissemination
  publisher wants to make content available in other languages
  high demands for quality
  currently almost exclusively done by human translators

Problem: No Single Right Answer
  Israeli officials are responsible for airport security.
  Israel is in charge of the security at this airport.
  The security work for this airport is the responsibility of the Israel government.
  Israeli side was in charge of the security of this airport.
  Israel is responsible for the airport's security.
  Israel is responsible for safety work at this airport.
  Israel presides over the security of the airport.
  Israel took charge of the airport security.
  The safety of this airport is taken charge of by Israel.
  This airport's security is the responsibility of the Israeli security officials.

Quality
  HTER   assessment
  0%
         publishable
  10%
         editable
  20%
  30%
         gistable
  40%
         triagable
  50%
(scale developed in preparation of DARPA GALE programme)

Applications
  HTER   assessment    application examples
  0%
                       Seamless bridging of language divide
         publishable   Automatic publication of official announcements
  10%
         editable      Increased productivity of human translators
  20%
                       Access to official publications
                       Multi-lingual communication (chat, social networks)
  30%
         gistable      Information gathering
                       Trend spotting
  40%
         triagable     Identifying relevant documents
  50%

Current State of the Art
  HTER   assessment    language pairs and domains
  0%
         publishable   French-English restricted domain
  10%
                       French-English technical document localization
         editable      French-English news stories
  20%
                       English-German news stories
  30%
         gistable      English-Czech open domain
  40%
         triagable
  50%
(informal rough estimates by presenter)

Thank You
questions?
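
The Quality, Applications, and Current State of the Art slides all use an HTER percentage as their scale. HTER (human-targeted translation edit rate) is the number of edits a human needs to fix the machine output, divided by the length of the corrected translation. The sketch below is a simplified approximation, not the official metric: it counts only word-level insertions, deletions, and substitutions, whereas TER also treats a phrase shift as a single edit. The function names and the example post-edit are hypothetical and chosen for illustration only.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insert, delete, substitute).

    Simplification: real TER/HTER also allows block shifts as one edit.
    """
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_hter(mt_output, post_edited):
    """Edits divided by the number of words in the post-edited text."""
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        raise ValueError("post-edited reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


# Illustrative input only: one sentence from the "No Single Right Answer"
# slide, treated as system output, with a made-up post-edit.
mt = "Israel is responsible for safety work at this airport ."
fixed = "Israel is responsible for security at this airport ."
score = approx_hter(mt, fixed)
print(f"approximate HTER: {score:.1%}")
```

Here two edits (one substitution, one deletion) against nine reference words give a rate of about 22%.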
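
The counts on the Learning from Data slides ("What is the best translation?") can be turned into translation probabilities by relative frequency: divide each count by the total for its source word. The following is a minimal sketch of that estimate using a subset of the counts shown on the slides; the function names are hypothetical, and this is not presented as how any particular system stores its translation table.

```python
from collections import defaultdict

# Counts copied from the slides (European Parliament corpus).
counts = {
    ("Sicherheit", "security"): 14516,
    ("Sicherheit", "safety"): 10015,
    ("Sicherheit", "certainty"): 334,
    ("Rechtssicherheit", "legal security"): 156,
    ("Rechtssicherheit", "legal safety"): 5,
    ("Rechtssicherheit", "legal certainty"): 723,
}


def relative_frequencies(pair_counts):
    """Estimate p(target | source) as count(source, target) / count(source)."""
    totals = defaultdict(int)
    for (src, _tgt), c in pair_counts.items():
        totals[src] += c
    return {
        (src, tgt): c / totals[src]
        for (src, tgt), c in pair_counts.items()
        if totals[src] > 0
    }


def best_translation(probs, source):
    """Return the most probable target for a source word or phrase."""
    candidates = {tgt: p for (src, tgt), p in probs.items() if src == source}
    return max(candidates, key=candidates.get)


probs = relative_frequencies(counts)
for source in ("Sicherheit", "Rechtssicherheit"):
    target = best_translation(probs, source)
    print(source, "->", target, f"({probs[(source, target)]:.3f})")
```

The word alone prefers "security", while the compound prefers "legal certainty", which is the point the phrasal-rule counts on the slide make.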