Learning with Limited Labelled Data

Learning with limited labelled data is useful for specialised domains and low-resource languages, where large annotated datasets are hard to obtain. Methods we research to mitigate the problems arising in these settings include multi-task learning, weakly supervised learning, and zero-shot learning.
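As a concrete illustration of one of these methods, the sketch below shows multi-task learning with hard parameter sharing: a shared encoder is trained jointly with several task-specific heads, so a task with few labels can benefit from gradients of a related, better-resourced task. This is a minimal sketch assuming PyTorch; the model sizes, task names and random data are illustrative placeholders, not a description of any specific system from our publications.

```python
# Minimal multi-task learning sketch with hard parameter sharing (PyTorch).
# A shared encoder feeds one small classification head per task.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        _, (hidden, _) = self.lstm(self.embed(token_ids))
        return hidden[-1]                       # sentence representation

class MultiTaskModel(nn.Module):
    def __init__(self, task_label_counts):
        super().__init__()
        self.encoder = SharedEncoder()          # shared across all tasks
        self.heads = nn.ModuleDict({            # one head per task
            task: nn.Linear(256, n_labels)
            for task, n_labels in task_label_counts.items()
        })

    def forward(self, token_ids, task):
        return self.heads[task](self.encoder(token_ids))

model = MultiTaskModel({"stance": 3, "sentiment": 2})  # hypothetical tasks
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training alternates between tasks; the low-resource task ("stance")
# reuses the encoder parameters updated by the auxiliary task.
for task, token_ids, labels in [
    ("sentiment", torch.randint(0, 10000, (8, 20)), torch.randint(0, 2, (8,))),
    ("stance", torch.randint(0, 10000, (8, 20)), torch.randint(0, 3, (8,))),
]:
    optimizer.zero_grad()
    loss = loss_fn(model(token_ids, task), labels)
    loss.backward()
    optimizer.step()
```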

This is a cross-cutting theme in most of our research. Two funded projects specifically addressing this are Multi3Generation and Andreas Nugaard Holm’s industrial PhD project with BASE Life Science, supported by Innovation Fund Denmark.

Multi3Generation is a COST Action that funds collaboration among researchers in Europe and beyond. The project is coordinated by Isabelle Augenstein, and its goal is to study language generation using multi-task, multilingual and multi-modal signals.

Andreas Nugaard Holm’s industrial PhD project focuses on transfer learning and domain adaptation for scientific text.
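To illustrate what transfer learning for a small labelled target domain can look like in practice, the sketch below fine-tunes a generically pretrained encoder on a handful of in-domain examples using the Hugging Face transformers API. This is only a hedged illustration of the general technique; the model name, labels and example sentences are assumptions for the sketch, not the project's actual setup.

```python
# Minimal transfer-learning sketch: fine-tune a pretrained encoder on a
# few labelled target-domain (here: scientific-sounding) sentences.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2          # placeholder label set
)

# A handful of labelled in-domain examples (illustrative only).
texts = [
    "The compound inhibited tumour growth in the treated cohort.",
    "No significant difference was observed between the two groups.",
]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
optimizer.zero_grad()
outputs = model(**batch, labels=labels)        # loss computed internally
outputs.loss.backward()
optimizer.step()
```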

Publications

Explainable AI methods facilitate the understanding of model behaviour, yet small, imperceptible perturbations to inputs can vastly …

The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the …

The digitisation of historical documents has provided historians with unprecedented research opportunities. Yet, the conventional …

Uncertainty approximation in text classification is an important area with applications in domain adaptation and interpretability. One …

Data-driven analyses of biases in historical texts can help illuminate the origin and development of biases prevailing in modern …

NLP methods can aid historians in analyzing textual materials in greater volumes than manually feasible. Developing such methods poses …

The task of Stance Detection is concerned with identifying the attitudes expressed by an author towards a target of interest. This task …

Selecting an effective training signal for tasks in natural language processing is difficult: expert annotations are expensive, and …

Fact-checking systems have become important tools to verify fake and misleading news. These systems become more trustworthy when …

Modern natural language processing (NLP) methods employ self-supervised pretraining objectives such as masked language modeling to …

Automated scientific fact checking is difficult due to the complexity of scientific language and a lack of significant amounts of …

This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an …

We propose a novel framework for cross-lingual content flagging with limited target-language data, which significantly outperforms …

Explanations shed light on a machine learning model’s rationales and can aid in identifying deficiencies in its reasoning …

The goal of stance detection is to determine the viewpoint expressed in a piece of text towards a target. These viewpoints or contexts …

For natural language processing (NLP) tasks such as sentiment or topic classification, currently prevailing approaches heavily rely on …

Citation count prediction is the task of predicting the number of citations a paper has gained after a period of time. Prior work …

Stance detection concerns the classification of a writer’s viewpoint towards a target. There are different task variants, e.g., …

As NLP models are increasingly deployed in socially situated settings such as online abusive content detection, ensuring these models …

Public trust in science depends on honest and factual communication of scientific papers. However, recent studies have demonstrated a …

Emotion lexica are commonly used resources to combat data poverty in automatic emotion detection. However, methodological issues emerge …

Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world. …

Scientific document understanding is challenging as the data is highly domain specific and diverse. However, datasets for tasks with …

Bridging the performance gap between high- and low-resource languages has been the focus of much previous work. Typological features …

In this paper, we describe our participation in the TREC Health Misinformation Track 2020. We submitted 11 runs to the Total Recall …

Subjectivity is the expression of internal opinions or beliefs which cannot be objectively observed or verified, and has been shown to …

In practical machine learning settings, the data on which a model must make predictions often come from a different distribution than …

Learning what to share between tasks has been a topic of high importance recently, as strategic sharing of knowledge has been shown to …

A critical component of automatically combating misinformation is the detection of fact check-worthiness, i.e. determining if a piece …

Typological knowledge bases (KBs) such as WALS contain information about linguistic properties of the world’s languages. They …

While state-of-the-art NLP explainability (XAI) methods focus on supervised, per-instance end or diagnostic probing task evaluation [4, …

In this paper, we extend the task of semantic textual similarity to include sentences which contain emojis. Emojis are ubiquitous on …

Language evolves over time in many ways relevant to natural language processing tasks. For example, recent occurrences of tokens …

Task-oriented dialogue systems rely heavily on specialized dialogue state tracking (DST) modules for dynamically predicting user intent …

Although the vast majority of knowledge bases (KBs) are heavily biased towards English, Wikipedias do cover very different topics in …

Multi-task learning and self-training are two common ways to improve a machine learning model’s performance in settings with …

Studying to what degree the language we use is gender-specific has long been an area of interest in socio-linguistics. Studies have …

When assigning quantitative labels to a dataset, different methodologies may rely on different scales. In particular, when assigning …

In online discussion fora, speakers often make arguments for or against something, say birth control, by highlighting certain aspects …

Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In …

Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to …

This paper documents the Team Copenhagen system which placed first in the CoNLL–SIGMORPHON 2018 shared task on universal …

Neural part-of-speech (POS) taggers are known to not perform well with little training data. As a step towards overcoming this problem, …

We combine multi-task learning and semi-supervised learning by inducing a joint embedding space between disparate label spaces and …

We take a multi-task learning approach to the shared Task 1 at SemEval-2018. The general idea concerning the model structure is to use …

Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to …