Multilingual Learning

Multilingual learning is concerned with training models to work well for multiple languages, including low-resource ones. We research methods for enabling information sharing between multiple languages, and also study how to utilise typological knowledge bases to this end. We are currently involved in three larger funded projects in this area.
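To make the typological side concrete, the snippet below shows how a typological knowledge base can be queried programmatically. It is a minimal sketch, assuming the lang2vec package (which exposes URIEL/WALS features); the ISO 639-3 language codes and the "syntax_wals" feature-set name follow its documentation, while the overlap measure is our own illustrative choice, not a method from the publications listed here.

```python
# Minimal sketch: querying typological (WALS) features and estimating
# language similarity from them. Assumes the lang2vec package
# (pip install lang2vec); "--" marks missing feature values there.
import lang2vec.lang2vec as l2v

langs = ["eng", "deu", "dan"]  # ISO 639-3 codes
feats = l2v.get_features(langs, "syntax_wals")  # dict: code -> feature vector

def typological_overlap(a, b):
    """Fraction of features on which two languages agree, skipping missing values."""
    pairs = [(x, y) for x, y in zip(feats[a], feats[b])
             if x != "--" and y != "--"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

print(typological_overlap("eng", "deu"))
print(typological_overlap("eng", "dan"))
```

Such similarity scores are one hypothetical way to decide how much two languages should share in a multilingual model.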

Multi3Generation is a COST Action that funds collaboration among researchers in Europe and abroad. The action is coordinated by Isabelle Augenstein, and its goal is to study language generation using multi-task, multilingual, and multi-modal signals.

We are also a partner in a research project funded by the Swedish Research Council and coordinated by Robert Östling. Its goal is to study structured multilinguality, i.e. using language representations and typological knowledge bases to guide which information is shared between specific languages.
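As an illustration of this idea, here is a minimal, hypothetical PyTorch sketch of typology-guided sharing: each language mixes a layer shared across all languages with a language-specific one, and the mixing weight is set from a typological similarity score (e.g. the overlap measure sketched above). The class and parameter names are invented for illustration; this is not the project's actual model.

```python
# Illustrative sketch of structured multilinguality: languages that are
# typologically closer to a pivot (high-resource) language share more
# parameters with it. Not the actual architecture used in the project.
import torch
import torch.nn as nn

class TypologyGatedLayer(nn.Module):
    def __init__(self, dim, languages, similarity_to_pivot):
        super().__init__()
        self.shared = nn.Linear(dim, dim)  # parameters shared by all languages
        self.private = nn.ModuleDict({l: nn.Linear(dim, dim) for l in languages})
        # Higher typological similarity to the pivot => more sharing.
        self.alpha = {l: float(s) for l, s in similarity_to_pivot.items()}

    def forward(self, x, lang):
        a = self.alpha[lang]
        return a * self.shared(x) + (1.0 - a) * self.private[lang](x)

# Usage: Danish (similarity 0.8) relies more on shared parameters than German.
layer = TypologyGatedLayer(16, ["eng", "deu", "dan"],
                           {"eng": 1.0, "deu": 0.7, "dan": 0.8})
h = layer(torch.randn(2, 16), "deu")
```

The design choice illustrated here is that sharing is not all-or-nothing: it is modulated per language pair by external typological knowledge rather than learned from scratch.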

Lastly, Andrea Lekkas’ industrial PhD project with Ordbogen, supported by Innovation Fund Denmark, focuses on multilingual language modelling for developing writing assistants.

Publications

While the prevalence of large pre-trained language models has led to significant improvements in the performance of NLP systems, recent …

Pre-trained language models have been known to perpetuate biases from the underlying datasets to downstream tasks. However, these …

Language embeds information about social, cultural, and political values people hold. Prior work has explored social and potentially …

The success of pre-trained contextualized representations has prompted researchers to analyze them for the presence of linguistic …

The success of multilingual pre-trained models is underpinned by their ability to learn representations shared by multiple languages …

The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle …

This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an …

The goal of stance detection is to determine the viewpoint expressed in a piece of text towards a target. These viewpoints or contexts …

Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world. …

Bridging the performance gap between high- and low-resource languages has been the focus of much previous work. Typological features …

Learning what to share between tasks has been a topic of high importance recently, as strategic sharing of knowledge has been shown to …

Typological knowledge bases (KBs) such as WALS contain information about linguistic properties of the world’s languages. They …

We propose a novel Chinese character conversion model that can disambiguate between mappings and convert between the two scripts. The …

Although the vast majority of knowledge bases (KBs) are heavily biased towards English, Wikipedias do cover very different topics in …

The study of linguistic typology is rooted in the implications we find between linguistic features, such as the fact that languages …

In the Principles and Parameters framework, the structural features of languages depend on parameters that may be toggled on or off, …

A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words …

Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to …

This paper documents the Team Copenhagen system which placed first in the CoNLL–SIGMORPHON 2018 shared task on universal …

A core part of linguistic typology is the classification of languages according to linguistic properties, such as those detailed in the …

Although linguistic typology has a long history, computational approaches have only recently gained popularity. The use of distributed …