Faculty of Social Sciences

Seminar in computational linguistics

  • Date: 2018-03-02, 13:00–15:00
  • Location: Engelska parken
  • Lecturer: Eva Vanmassenhove
  • Contact person: Ali Basirat

SuperNMT: Integrating Supersense and Supertag Features into Neural Machine Translation

Neural Machine Translation (NMT) models have recently become the state of the art in the field of Machine Translation (Bahdanau et al. 2014, Cho et al. 2014, Kalchbrenner et al. 2014, Sutskever et al. 2014). Compared to Statistical Machine Translation (SMT), the previous state of the art, NMT performs particularly well when it comes to word reordering and translations involving morphologically rich languages (Bentivogli et al. 2016). Although NMT seems to partially learn or generalize some patterns related to syntax from the raw, sentence-aligned parallel data, more complex phenomena (e.g. prepositional-phrase attachment) remain problematic (Bentivogli et al. 2016). More recent work showed that explicitly modeling extra syntactic information in an NMT system on the source (and/or target) side improves translation quality: Sennrich and Haddow (2016) integrated morphological information, POS tags and dependency labels as features on the source side of the NMT model, while Nadejde et al. (2017) introduced syntactic information in the form of CCG supertags on both the source and the target side. Moreover, Nadejde et al. (2017) showed that a shared embedding space, where syntactic information and words are tightly coupled, is more effective than multitask training.

When integrating linguistic information into an MT system, the focus has mainly been on syntactic features, following the central role assigned to syntax by many linguists. Although there has been some work on semantic features for SMT (Banchs and Costa-Jussà 2011), so far no work has been done on enriching NMT systems with more general semantic features at the word level. This might be explained by the fact that NMT models already learn word embeddings, since words are represented in a common vector space. However, making some level of semantics more explicitly available at the word level can provide the translation system with a higher level of abstraction and generalization, beneficial for learning more complex constructions. Furthermore, a combination of syntactic and semantic features would provide the NMT system with a way of learning semantico-syntactic patterns. To apply semantic abstractions at the word level that enable a characterisation beyond what can be superficially derived, coarse-grained semantic classes can be used. Inspired by Named Entity Recognition (NER), which provides such abstractions for a limited set of words, supersense tagging uses an inventory of more general semantic classes (for nouns and verbs) in domain-independent settings (Schneider and Smith 2015).

We investigate the effect of integrating supersense features (26 for nouns, 15 for verbs) into an NMT system. To obtain these features, we used the AMALGrAM 2.0 tool (Schneider et al. 2014, Schneider and Smith 2015), which analyses the input sentence for multi-word expressions as well as noun and verb supersenses. The features are integrated using the framework of Sennrich et al. (2016), replicating the tags for every subword unit obtained by byte-pair encoding (BPE). We further experiment with a combination of semantic supersense and syntactic supertag features (CCG syntactic categories (Steedman 2000) obtained with EasySRL (Lewis et al. 2015)) as well as less complex features such as POS tags, assuming that supersense tags have the potential to be especially useful in combination with syntactic information.
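
As an illustration of the feature-integration scheme described above, here is a minimal sketch in PyTorch: a word-level supersense tag is replicated onto every BPE subword unit of its word, and a separate feature embedding is concatenated with the subword embedding to form the encoder input. The bpe_segment function, the toy sentence and tags, and the embedding dimensions are hypothetical stand-ins, not the authors' code or data.

    import torch
    import torch.nn as nn

    # Hypothetical tagged sentence: (word, supersense) pairs, roughly as a
    # supersense tagger such as AMALGrAM might produce; "O" marks words
    # without a noun or verb supersense.
    tagged = [("the", "O"), ("researcher", "n.PERSON"), ("travelled", "v.MOTION")]

    # Stand-in for a learned BPE model: we fake a segmentation of
    # "researcher" into two subword units.
    def bpe_segment(word):
        return ["resear@@", "cher"] if word == "researcher" else [word]

    # Replicate each word-level tag onto all subword units of its word.
    subwords, tags = [], []
    for word, tag in tagged:
        for piece in bpe_segment(word):
            subwords.append(piece)
            tags.append(tag)

    # Toy vocabularies; a real system builds these from the training corpus.
    subword_vocab = {w: i for i, w in enumerate(sorted(set(subwords)))}
    tag_vocab = {t: i for i, t in enumerate(sorted(set(tags)))}

    # Separate embedding tables whose outputs are concatenated, so the
    # encoder sees inputs of dimension word_dim + feature_dim.
    word_emb = nn.Embedding(len(subword_vocab), 12)  # word_dim = 12
    feat_emb = nn.Embedding(len(tag_vocab), 4)       # feature_dim = 4

    word_ids = torch.tensor([subword_vocab[w] for w in subwords])
    tag_ids = torch.tensor([tag_vocab[t] for t in tags])
    encoder_input = torch.cat([word_emb(word_ids), feat_emb(tag_ids)], dim=-1)

    print(subwords)             # ['the', 'resear@@', 'cher', 'travelled']
    print(tags)                 # ['O', 'n.PERSON', 'n.PERSON', 'v.MOTION']
    print(encoder_input.shape)  # torch.Size([4, 16])

Keeping the feature embedding in the same input stream as the subword embedding, rather than in a separate task, mirrors the observation cited above that a shared embedding space tends to be more effective than multitask training.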
Our experiments show that semantic features, especially in combination with syntactic features, lead to improvements over the BPE baseline system in terms of BLEU scores. We also observed faster convergence during training. Furthermore, the improvements were clearer on test sets that are less similar to the training data, which supports our hypothesis that semantic features can provide the system with a higher level of abstraction and generalization.
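
For completeness, a minimal sketch of how such BLEU comparisons might be scored, here using the sacrebleu package (an assumption for illustration; the abstract does not say which implementation was used):

    import sacrebleu

    # Toy system output and a single reference stream aligned with it.
    hypotheses = ["the researcher travelled to the conference"]
    references = [["the researcher traveled to the conference"]]

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.2f}")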