Much theoretical work has described the ability of transformer language models (LMs) to represent formal languages. However, linking theoretical results to empirical performance is not straightforward. We empirically evaluate recent work linking transformers to n-gram LMs by studying their ability to learn random n-gram LMs of two kinds: ones with arbitrary next-symbol probabilities and ones where next-symbol probabilities are defined with shared parameters. We find that classic n-gram estimation techniques such as Add-lambda smoothing outperform transformers on the former, while transformers perform well on the latter, outperforming methods specifically designed to learn n-gram LMs.
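To make the classical baseline concrete, the sketch below illustrates add-lambda (Lidstone) estimation of an n-gram LM. It is not the paper's implementation; the corpus, the choice of n, and the value of lambda are illustrative assumptions.

```python
# Minimal sketch of add-lambda (Lidstone) n-gram estimation.
# Not the paper's code: corpus, n, and lambda are illustrative assumptions.
from collections import Counter

def addlambda_ngram_lm(corpus, n=2, lam=0.5):
    """Estimate P(symbol | (n-1)-gram context) with add-lambda smoothing."""
    vocab = sorted(set(corpus))
    context_counts = Counter()
    ngram_counts = Counter()
    for i in range(len(corpus) - n + 1):
        context = tuple(corpus[i:i + n - 1])
        symbol = corpus[i + n - 1]
        context_counts[context] += 1
        ngram_counts[(context, symbol)] += 1

    def prob(context, symbol):
        # Smoothed estimate: (count + lambda) / (context count + lambda * |V|)
        num = ngram_counts[(tuple(context), symbol)] + lam
        den = context_counts[tuple(context)] + lam * len(vocab)
        return num / den

    return prob, vocab

if __name__ == "__main__":
    corpus = list("abracadabra")  # toy corpus, for illustration only
    prob, vocab = addlambda_ngram_lm(corpus, n=2, lam=0.5)
    # Smoothed conditional distribution over next symbols given context 'a'
    for s in vocab:
        print(f"P({s} | a) = {prob(('a',), s):.3f}")
```

The lambda term spreads probability mass over unseen continuations, which is why such estimators remain strong baselines when next-symbol probabilities are essentially arbitrary.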