Learning with Limited Labelled Data

Learning with limited labelled data is useful for small domains and for languages with few resources. The methods we research to mitigate the problems that arise in these settings include multi-task learning, weakly supervised learning and zero-shot learning.

This is a cross-cutting theme in most of our research. Two funded projects specifically addressing this are Multi3Generation and Andreas Nugaard Holm’s industrial PhD project with BASE Life Science, supported by Innovation Fund Denmark.

Multi3Generation is a COST Action that funds collaboration among researchers in Europe and abroad. The project is coordinated by Isabelle Augenstein, and its goal is to study language generation using multi-task, multilingual and multi-modal signals.

Andreas Nugaard Holm’s industrial PhD project focuses on transfer learning and domain adaptation for scientific text.

Publications

Emotion lexica are commonly used resources to combat data poverty in automatic emotion detection. However, methodological issues emerge …

Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world. …

Scientific document understanding is challenging as the data is highly domain specific and diverse. However, datasets for tasks with …

Modern natural language processing (NLP) methods employ self-supervised pretraining objectives such as masked language modeling to …

Bridging the performance gap between high- and low-resource languages has been the focus of much previous work. Typological features …

Citation count prediction is the task of predicting the number of citations a paper has gained after a period of time. Prior work …

In this paper, we describe our participation in the TREC Health Misinformation Track 2020. We submitted 11 runs to the Total Recall …

For natural language processing (NLP) tasks such as sentiment or topic classification, currently prevailing approaches heavily rely on …

Subjectivity is the expression of internal opinions or beliefs which cannot be objectively observed or verified, and has been shown to …

In practical machine learning settings, the data on which a model must make predictions often come from a different distribution than …

Learning what to share between tasks has been a topic of high importance recently, as strategic sharing of knowledge has been shown to …

A critical component of automatically combating misinformation is the detection of fact check-worthiness, i.e. determining if a piece …

Typological knowledge bases (KBs) such as WALS contain information about linguistic properties of the world’s languages. They …

While state-of-the-art NLP explainability (XAI) methods focus on supervised, per-instance end or diagnostic probing task evaluation[4, …

In this paper, we extend the task of semantic textual similarity to include sentences which contain emojis. Emojis are ubiquitous on …

Language evolves over time in many ways relevant to natural language processing tasks. For example, recent occurrences of tokens …

Task oriented dialogue systems rely heavily on specialized dialogue state tracking (DST) modules for dynamically predicting user intent …

Although the vast majority of knowledge bases (KBs) are heavily biased towards English, Wikipedias do cover very different topics in …

Multi-task learning and self-training are two common ways to improve a machine learning model’s performance in settings with …

Studying to what degree the language we use is gender-specific has long been an area of interest in socio-linguistics. Studies have …

When assigning quantitative labels to a dataset, different methodologies may rely on different scales. In particular, when assigning …

In online discussion fora, speakers often make arguments for or against something, say birth control, by highlighting certain aspects …

Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In …

Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to …

This paper documents the Team Copenhagen system which placed first in the CoNLL–SIGMORPHON 2018 shared task on universal …

Neural part-of-speech (POS) taggers are known to not perform well with little training data. As a step towards overcoming this problem, …

We combine multi-task learning and semisupervised learning by inducing a joint embedding space between disparate label spaces and …

We take a multi-task learning approach to the shared Task 1 at SemEval-2018. The general idea concerning the model structure is to use …

Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to …