Scholarly Data Processing

We study methods for automatically processing scholarly data. The aim is to help researchers find publications (e.g. by automatically extracting content from papers to populate knowledge bases), write better papers (e.g. by suggesting which sentences need citations, or by improving peer review), and track their impact (e.g. by identifying which papers are highly cited and how this relates to metadata such as venues or authors).

Publications

Distorted science communication harms individuals and society as it can lead to unhealthy behavior change and decrease trust in …

Whether the media faithfully communicate scientific information has long been a core issue for the science community. Automatically …

Learning scientific document representations can be substantially improved through contrastive learning objectives, where the challenge …
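As a rough illustration of what a contrastive objective over document embeddings looks like, here is a minimal InfoNCE-style loss with in-batch negatives in plain Python. It assumes each paper has already been embedded as a vector and that positive pairs are given (for instance, a paper and a paper it cites). The function names, the temperature value, and the choice of InfoNCE itself are assumptions for illustration, not necessarily the objective used in the publication above.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE loss: anchor i should be most similar to positive i,
    while every other positive in the batch acts as a negative for it."""
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Temperature-scaled similarity of this anchor to every candidate in the batch.
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching positive as the correct "class".
        total += log_denominator - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each anchor paired with a similar document
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each anchor paired with a dissimilar document
print(info_nce_loss(anchors, aligned) < info_nce_loss(anchors, swapped))
```

Training would adjust the encoder so that related papers end up with a low loss of this kind; in practice this is usually written with a deep learning framework's cross-entropy rather than by hand.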

Automated scientific fact checking is difficult due to the complexity of scientific language and a lack of significant amounts of …

Citation count prediction is the task of predicting the number of citations a paper has gained after a period of time. Prior work …

Public trust in science depends on honest and factual communication of scientific papers. However, recent studies have demonstrated a …

Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this …

Scientific document understanding is challenging as the data is highly domain-specific and diverse. However, datasets for tasks with …

A critical component of automatically combating misinformation is the detection of fact check-worthiness, i.e. determining if a piece …

Peer review is our best tool for judging the quality of conference submissions, but it is becoming increasingly spurious. We argue that …

Language evolves over time in many ways relevant to natural language processing tasks. For example, recent occurrences of tokens …

Automatic summarisation is a popular approach to reduce a document to its main arguments. Recent research in the area has focused on …

Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to …
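KBC is commonly cast as token-level sequence labelling. As a sketch under that assumption, the function below turns BIO tags (B- begins a keyphrase, I- continues it, O is outside any keyphrase) into typed keyphrase spans. The BIO scheme, the function name `bio_to_spans`, and the label names TYPE_A and TYPE_B are hypothetical placeholders, since the actual label set is not given here.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, keyphrase type)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect typed keyphrase spans from a BIO tag sequence.
    An I- tag that does not continue a span of the same type starts a new span."""
    spans: List[Span] = []
    start: Optional[int] = None
    current: Optional[str] = None
    for i, tag in enumerate(tags):
        # An I- tag of the same type simply extends the open span.
        if tag.startswith("I-") and tag[2:] == current:
            continue
        # Any other tag closes the open span, if there is one.
        if current is not None:
            spans.append((start, i, current))
            start, current = None, None
        # B- tags, and I- tags that do not continue a span, open a new span.
        if tag.startswith(("B-", "I-")):
            start, current = i, tag[2:]
    if current is not None:
        spans.append((start, len(tags), current))
    return spans


tokens = ["We", "train", "neural", "networks", "on", "text", "corpora"]
tags = ["O", "O", "B-TYPE_A", "I-TYPE_A", "O", "B-TYPE_B", "I-TYPE_B"]
spans = bio_to_spans(tags)
print(spans)  # [(2, 4, 'TYPE_A'), (5, 7, 'TYPE_B')]
print([" ".join(tokens[s:e]) for s, e, _ in spans])  # ['neural networks', 'text corpora']
```

The boundary part of the task corresponds to the span offsets, and the classification part to the type attached to each span.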