Seminar in Computational Linguistics

  • Date: –15.00
  • Location: Engelska parken, Room 9-3042
  • Speaker: Gong Zhengxian
  • Contact person: Ali Basirat
  • Seminar

Integrating Discourse Information into Neural Machine Translation

As the quality of machine translation (MT) improves, research on integrating discourse information into MT becomes more viable. Prior work on context-dependent phenomena such as coreference, word sense disambiguation and lexical cohesion has shown that modeling these phenomena can improve MT performance. Does this mean that these are the only kinds of discourse-level information that contribute to MT? And with the development of neural machine translation (NMT), how can discourse-level information be integrated into NMT effectively?

In this seminar, I first present four popular kinds of discourse structure and then review some typical papers on discourse-level MT. This review shows that only a limited number of theories of discourse structure have been applied to MT systems. Second, I introduce some popular NMT systems that can integrate discourse-level information, such as memory-based NMT and multi-model NMT. Finally, I discuss Theme-Rheme theory, which originated in Systemic Functional Linguistics. I give some interesting examples, drawn from papers on second-language teaching, to show why it is important to study Theme-Rheme differences in Chinese-English translation. Based on an initial experiment on an MT evaluation dataset, I found that handling the thematic divergence between Chinese and English is a genuinely challenging task. My team in China has so far constructed a small Chinese Theme-Rheme corpus, and I can automatically identify the theme of an English sentence, but how to integrate this useful discourse-level information into an NMT system requires further research.