Seminar in Computational Linguistics
- Date: –15:00
- Location: Engelska parken
- Lecturer: Eva Vanmassenhove
- Contact person: Ali Basirat
SuperNMT: Integrating Supersense and Supertag Features into Neural Machine Translation
Neural Machine Translation (NMT) models have recently become the state-of-the-art in the field of Machine Translation (Bahdanau et al. 2014, Cho et al. 2014, Kalchbrenner et al. 2014, Sutskever et al. 2014). Compared to Statistical Machine Translation (SMT), the previous state-of-the-art, NMT performs particularly well when it comes to word reorderings and translations involving morphologically rich languages (Bentivogli et al. 2016). Although NMT seems to partially learn or generalize some syntax-related patterns from the raw, sentence-aligned parallel data, more complex phenomena (e.g. prepositional-phrase attachment) remain problematic (Bentivogli et al. 2016). More recent work showed that explicitly modeling extra syntactic information in an NMT system on the source (and/or target) side improves translation quality: Sennrich and Haddow (2016) integrated morphological information, POS-tags and dependency labels as features on the source side of the NMT model, while Nadejde et al. (2017) introduced syntactic information in the form of CCG supertags on both the source and the target side. Moreover, Nadejde et al. (2017) showed that a shared embedding space, where syntax information and words are tightly coupled, is more effective than multitask training. When integrating linguistic information into an MT system, the focus has mainly been on syntactic features, following the central role assigned to syntax by many linguists. Although there has been some work on semantic features for SMT (Banchs and Costa-Jussà 2011), so far no work has been done on enriching NMT systems with more general semantic features at the word level. This might be explained by the fact that NMT models already have means of learning word embeddings, since words are represented in a common vector space.
However, making some level of semantics more explicitly available at the word level can provide the translation system with a higher level of abstraction and generalization, beneficial for learning more complex constructions. Furthermore, a combination of syntactic and semantic features would provide the NMT system with a way of learning semantico-syntactic patterns. To apply semantic abstractions at the word level that enable a characterisation beyond what can be superficially derived, coarse-grained semantic classes can be used. Inspired by Named Entity Recognition (NER), which provides such abstractions for a limited set of words, supersense-tagging uses an inventory of more general semantic classes (for nouns and verbs) for domain-independent settings (Schneider and Smith 2015). We investigate the effect of integrating supersense features (26 for nouns, 15 for verbs) into an NMT system. To obtain these features, we used the AMALGrAM 2.0 tool (Schneider et al. 2014, Schneider and Smith 2015), which analyses the input sentence for multi-word expressions as well as noun and verb supersenses. The features are integrated using the framework of Sennrich et al. (2016), replicating the tags for every subword unit obtained by byte-pair encoding (BPE). We further experiment with a combination of semantic supersense features and syntactic supertag features (CCG syntactic categories (Steedman 2000) obtained with EasySRL (Lewis et al. 2015)) as well as less complex features such as POS-tags, assuming that supersense-tags have the potential to be useful especially in combination with syntactic information. Our experiments show that semantic features, especially in combination with syntactic features, lead to improvements over the BPE baseline system (in terms of BLEU scores). We also observed faster convergence during training.
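The tag-replication step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common "@@" continuation-marker convention for BPE subwords (as in Sennrich et al.'s subword-nmt), and the function name and example supersense labels are hypothetical.

```python
def replicate_tags(bpe_tokens, word_tags):
    """Assign each BPE subword the tag of the word it came from.

    bpe_tokens: subword units, where a trailing "@@" marks a
                non-final piece (e.g. ["trans@@", "lation", "quality"]).
    word_tags:  one feature tag per original word.
    """
    tags = []
    word_idx = 0
    for token in bpe_tokens:
        tags.append(word_tags[word_idx])
        # A token without the "@@" marker is the final piece of the
        # current word, so the next subword belongs to the next word.
        if not token.endswith("@@"):
            word_idx += 1
    return tags

# "translation" is split into two subwords; its tag is duplicated.
print(replicate_tags(["trans@@", "lation", "quality"],
                     ["n.act", "n.attribute"]))
# ['n.act', 'n.act', 'n.attribute']
```

The replicated tag sequence is then aligned one-to-one with the subword sequence, so each input position carries a word feature alongside its subword embedding, as in the source-side feature framework of Sennrich et al. (2016).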
Furthermore, the improvements were more clear on test sets that are less similar to the training data, which supports our hypothesis that semantic features can provide the system with a higher level of abstraction and generalization.