Can Transformers Learn n-gram Language Models?

Abstract

Much theoretical work has described the ability of transformer language models (LMs) to represent formal languages. However, linking theoretical results to empirical performance is not straightforward. We empirically evaluate recent work linking transformers to n-gram LMs by studying their ability to learn random n-gram LMs of two kinds: ones with arbitrary next-symbol probabilities and ones where next-symbol probabilities are defined with shared parameters. We find that classic n-gram estimation techniques such as add-lambda smoothing outperform transformers on the former, while transformers perform well on the latter, outperforming methods specifically designed to learn n-gram LMs.
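For readers unfamiliar with the baseline mentioned above, the sketch below illustrates add-lambda (additive) smoothing for estimating an n-gram LM from symbol sequences. It is a minimal, generic illustration of the technique, not the paper's implementation; the function and variable names (estimate_add_lambda, corpus, vocab, lam) are assumptions chosen for clarity.

```python
# Minimal sketch of add-lambda smoothed n-gram LM estimation.
# Names and padding conventions are illustrative assumptions, not the paper's code.
from collections import Counter, defaultdict


def estimate_add_lambda(corpus, n, vocab, lam=1.0):
    """Return a function p(context, symbol) giving smoothed n-gram probabilities.

    corpus: list of sequences, each a list of symbols drawn from `vocab`
    n:      n-gram order (context length is n - 1)
    vocab:  iterable of all possible symbols
    lam:    smoothing constant lambda > 0
    """
    vocab = list(vocab)
    counts = defaultdict(Counter)  # context (tuple of n-1 symbols) -> Counter of next symbols
    for seq in corpus:
        padded = ["<bos>"] * (n - 1) + list(seq) + ["<eos>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1 : i])
            counts[context][padded[i]] += 1

    def prob(context, symbol):
        # Add-lambda estimate: (count + lambda) / (total + lambda * |support|),
        # where the support is the vocabulary plus the end-of-sequence symbol.
        c = counts[tuple(context)]
        return (c[symbol] + lam) / (sum(c.values()) + lam * (len(vocab) + 1))

    return prob


# Usage (toy data): a bigram model over the symbols "a" and "b".
p = estimate_add_lambda([["a", "b", "a"], ["b", "a"]], n=2, vocab=["a", "b"], lam=0.5)
print(p(("a",), "b"))  # smoothed probability of "b" following "a"
```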

Publication
In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)