Seminar in Computational Linguistics
- Date: –15:00
- Location: Engelska parken 9-3042
- Lecturer: Ryan Cotterell
- Contact person: Miryam de Lhoneux
Mitigating Gender Bias in Morphologically Rich Languages
Gender bias exists in corpora of all of the world’s languages: the bias is a function of what people talk about, not of the grammar of a language. For this reason, data-driven NLP systems trained on this data will inherit such bias. Evidence of bias can be found in all sorts of NLP technologies: word vectors, language models, coreference systems and even machine translation. Most of the research done to mitigate gender bias in natural language corpora, however, has focused solely on English. For instance, in an attempt to remove gender bias in English corpora, NLP practitioners often augment corpora by swapping gendered words: e.g., if “he is a smart doctor” appears, add the sentence “she is a smart doctor” to the corpus as well before training a model. The broader research question asked in this talk is the following: How can we mitigate gender bias in corpora from any of the world’s languages, not just in English? As an example, the simple swapping heuristic for English will not generalize to most of the world’s languages. Indeed, such a solution would not even apply to German, since it marks gender on both nouns and adjectives and requires gender agreement throughout a sentence. In the context of German, this task is far more complicated: mapping “er ist ein kluger Arzt” to “sie ist eine kluge Ärztin” requires more than simply swapping “er” with “sie” and “Arzt” with “Ärztin”; one also has to modify the article (“ein”) and the adjective (“klug”). In this talk, we present a machine-learning solution to this problem: we develop a novel neural random field that generates such sentence-to-sentence transformations, enforcing agreement with respect to gender. We explain how to perform inference and morphological reinflection to generate such transformations without any labeled training examples. Empirically, using a novel metric of gender bias, we show that the model reduces gender bias in corpora without sacrificing grammaticality.
Additionally, we discuss concrete applications to coreference resolution and machine translation.
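To make the swapping heuristic from the abstract concrete, here is a minimal sketch in Python. It is an illustration only, not the speaker's method: the swap tables `EN_SWAPS` and `DE_SWAPS` and the helper names `swap_gendered_words` and `augment` are hypothetical, and the tables are deliberately tiny. The sketch shows that word-level swapping works on the English example but leaves the German article and adjective in disagreement with the swapped noun.

```python
import re

# Hypothetical, deliberately tiny swap tables (keys are lowercase).
EN_SWAPS = {"he": "she", "she": "he"}
DE_SWAPS = {"er": "sie", "arzt": "ärztin"}


def swap_gendered_words(sentence, swaps):
    """Replace each word found in `swaps`, keeping an initial capital."""
    def replace(match):
        word = match.group(0)
        swapped = swaps.get(word.lower())
        if swapped is None:
            return word  # not a gendered word in the table; leave untouched
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"\w+", replace, sentence)


def augment(corpus, swaps):
    """Return the corpus plus a swapped copy of every sentence that changes."""
    augmented = list(corpus)
    for sentence in corpus:
        swapped = swap_gendered_words(sentence, swaps)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented


# English: the swapped sentence is grammatical.
print(swap_gendered_words("he is a smart doctor", EN_SWAPS))

# German: "ein" and "kluger" are left unchanged, so the result no longer
# agrees in gender with "Ärztin" (the target is "sie ist eine kluge Ärztin").
print(swap_gendered_words("er ist ein kluger Arzt", DE_SWAPS))
```

Repairing the German output would require re-inflecting the article and the adjective to agree with the new noun, which is the part a word-level table cannot do.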
Ryan Cotterell is a lecturer (≈assistant professor) in the Department of Computer Science and Technology at the University of Cambridge. He will receive his Ph.D. from the Johns Hopkins Computer Science department in the Spring of 2019, where he was affiliated with the Center for Language and Speech Processing. He was co-advised by Jason Eisner and David Yarowsky. He specializes in natural language processing and machine learning, mostly publishing at *ACL and EMNLP, where he has published over 40 papers; he has won best paper awards at ACL 2017 and EACL 2017 after having twice been runner-up (EMNLP 2015, NAACL 2016). Previously, he was a visiting Ph.D. student at the Center for Information and Language Processing at Ludwig-Maximilians-Universität München, supported by a Fulbright Fellowship and a DAAD Research Grant, under the supervision of Hinrich Schütze. His Ph.D. was supported by a Facebook Fellowship and an NDSEG graduate fellowship.