Introduction
Welcome to this interactive book on Statistical Natural Language Processing (NLP). NLP is a field at the intersection of Computer Science, Artificial Intelligence (AI) and Linguistics, with the goal of enabling computers to solve tasks that require natural language understanding and/or generation. Such tasks are omnipresent in our day-to-day lives: think of Machine Translation, Automatic Question Answering or even basic Search. All these tasks require the computer to process language in one way or another. But even if you ignore these practical applications, many people consider language to be at the heart of human intelligence, and this makes NLP (and its more linguistically motivated cousin, Computational Linguistics) important for its role in AI alone.
Statistical NLP
NLP is a vast field with beginnings dating back to at least the 1960s, and it is difficult to give a full account of every aspect of NLP. Hence, this book focuses on a sub-field of NLP termed Statistical NLP (SNLP). In SNLP, computers aren’t directly programmed to process language; instead, they learn how language should be processed based on the statistics of a corpus of natural language. For example, a statistical machine translation system’s behaviour is affected by the statistics of a parallel corpus, where each document in one language is paired with its translation in another. This approach has dominated NLP research for almost two decades and has seen widespread adoption in industry too. Notice that while Statistics and Machine Learning are, in general, quite different fields, for the purposes of this book we will mostly identify Statistical NLP with Machine Learning-based NLP.
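To make “learning from the statistics of a corpus” concrete, here is a minimal sketch, assuming nothing more than a hypothetical two-sentence corpus: it estimates unigram word probabilities by relative frequency. Real systems are trained on far larger corpora and use far richer models, but the principle of deriving behaviour from counts rather than hand-written rules is the same.

```python
from collections import Counter

# Hypothetical toy corpus; a real system would use millions of sentences.
corpus = [
    "natural language processing is fun",
    "language models assign probabilities to language",
]

# Count how often each word occurs across the corpus.
tokens = [word for sentence in corpus for word in sentence.split()]
counts = Counter(tokens)
total = sum(counts.values())

# Maximum likelihood estimate of each word's probability: count / total tokens.
unigram_probs = {word: count / total for word, count in counts.items()}

print(unigram_probs["language"])  # 3 of the 11 tokens are "language", so ~0.27
```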
NDAK18000U Course Information
We will use materials from this interactive book throughout the course. Note that this book was originally developed for a 15 ECTS course at UCL, so we will not cover all of its topics and will cover some in less depth. For completeness and context, you can still access all book materials below. The materials covered in each week are listed below and will be linked once they are close to finalised. Both the materials and the course schedule are tentative and subject to minor changes. The official course description can be found here.
Week 36 (6-10 Sept)
Lecture (Tuesday): Course Logistics (slides), Introduction to NLP (slides), Tokenisation & Sentence Splitting (notes, slides, exercises), Text Classification (slides)
Lab (10.09 & 13.09): Jupyter notebook setup, introduction to Colab. Introduction to PyTorch. Project group arrangements. Questions about the course project. (lab)
Week 37 (13-17 Sept)
Reading (before lecture): Jurafsky & Martin Chapter 7 up to and including 7.4
Lecture (Tuesday): Introduction to Representation Learning (slides), Language Modelling (partially) (slides)
Lab (17.09 & 20.09): Recurrent Neural Networks and word representations. Project help. (lab)
Week 38 (20-24 Sept)
Reading (before lecture): Jurafsky & Martin Chapter 9, up to and including 9.6
Lecture (Tuesday): Language Modelling (rest) (slides), Recurrent Neural Networks (slides), Contextualised Word Representations (slides)
Lab (24.09 & 27.09): Language Models with Transformers and RNNs. Project help. (lab)
Week 39 (27 Sept-1 Oct)
Reading (before lecture): Attention? Attention! Blog post by Lilian Weng; Belinkov and Glass, 2020. Analysis Methods in Neural Language Processing: A Survey
Lecture (Tuesday): Attention (slides), Interpretability (slides)
Lab (01.10 & 04.10): Error analysis and explainability. Project help. (lab)
Week 40 (4-8 Oct)
Reading (before lecture): Jurafsky & Martin Chapter 8, up to and including 8.6; 9.2.2 in Chapter 9; 18.1 in Chapter 18
Lecture (Tuesday): Sequence Labelling (slides, notes)
Lab (08.10 & 11.10): Sequence labelling. Beam search. Project help. (lab)
Week 41 (11-15 Oct)
Reading (before lecture): Cardie, 1997. Empirical Methods in Information Extraction, up to (not including) the section “Learning Extraction Patterns”; Question Answering. Blog post by Vered Shwartz
Lecture (Tuesday): Information Extraction (slides), Question Answering (slides)
Lab (15.10 & 25.10): In-depth look at Transformers and Multilingual QA. Project help. (lab)
Week 43 (25-29 Oct)
Reading (before lecture): Jurafsky & Martin Chapter 10, up to and including 10.8.2, recorded lecture 8 part 2; recorded lecture 5 part 9
Lecture (Tuesday): Machine Translation (slides), Cross-lingual Transfer Learning (slides)
Lab (29.10 & 01.11): Project help.
Week 44 (1-5 Nov)
Reading (before lecture): Jurafsky & Martin Chapter 14, except 14.5; de Marneffe et al., 2021. Universal Dependencies, up to and including 2.3.2
Lecture (Tuesday): Dependency Parsing (slides)
Lab (05.11): Project help.
Structure of this Book
We think that to understand and apply SNLP in practice one needs knowledge of the following:
Tasks (e.g. Machine Translation, Syntactic Parsing)
Methods & Frameworks (e.g. Discriminative Training, Linear Chain models, Representation Learning)
Implementations (e.g. NLP data structures, efficient dynamic programming)
The book is somewhat structured around the task dimension; that is, we will explore different methods, frameworks and their implementations, usually in the context of specific NLP applications.
On a higher level, the book is divided into themes that roughly correspond to learning paradigms within SNLP and follow a somewhat chronological order: we start with generative learning, then discuss discriminative learning, then cover forms of weaker supervision, and conclude with representation and deep learning. As an overarching theme we use structured prediction, a formulation of machine learning that accounts for the fact that outputs are often not just classes, but structured objects such as sequences, trees or general graphs. This is a fitting approach, as NLP tasks often require the prediction of such structures.
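As a first taste of what structured prediction looks like in code, here is a minimal sketch with purely illustrative names and a toy scoring rule: prediction means searching a space of candidate structures (here, part-of-speech tag sequences) for the one with the highest score. Later chapters replace the toy scorer with learned models and the brute-force search with dynamic programming or beam search.

```python
from itertools import product

def score(words, tags):
    # Toy scoring rule: reward tagging words that end in "s" as nouns and
    # everything else as non-nouns. Real models learn such scores from data.
    return sum(1.0 for word, tag in zip(words, tags)
               if word.endswith("s") == (tag == "NOUN"))

def candidates(words, tagset=("NOUN", "VERB")):
    # Enumerate every possible tag sequence for the input sentence.
    return product(tagset, repeat=len(words))

def predict(words):
    # Structured prediction: return the highest-scoring candidate structure.
    return max(candidates(words), key=lambda tags: score(words, tags))

print(predict(["dogs", "bark"]))  # ('NOUN', 'VERB')
```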
Table of Contents
Course Logistics: slides
Introduction to NLP: slides1, slides2
Structured Prediction: notes, slides, exercises
Tokenisation and Sentence Splitting: notes, slides, exercises
Generative Learning:
Language Models (MLE, smoothing): notes, slides, exercises
Maximum Likelihood Estimation: notes, slides
Machine Translation (EM algorithm, beam-search, encoder-decoder models): notes, slides1, slides2, exercises
Constituent Parsing (PCFG, dynamic programming): notes, slides, exercises
Dependency Parsing (transition based parsing): notes, slides
Discriminative Learning:
Text Classification (logistic regression): notes, slides1, slides2
Sequence Labelling (linear chain models): notes, slides
Sequence Labelling (CRF): slides
Weak Supervision:
Relation Extraction (distant supervision, semi-supervised learning): notes, slides, interactive-slides
Representation and Deep Learning:
Overview and Multi-layer Perceptrons: slides
Word Representations: slides
Contextualised Word Representations: slides
Recurrent Neural Networks: slides1, slides2
Attention: slides
Transfer Learning: slides
Textual Entailment (RNNs): slides
Interpretability: slides
Methods
We have a few dedicated method chapters:
Structured Prediction: notes
Maximum Likelihood Estimation: notes
EM-Algorithm: notes
Interaction
The best way to learn language processing with computers is to process language with computers. For this reason, this book features interactive code blocks that we use to show NLP in practice, and that you can use to test and investigate methods and language. We use the Python language throughout this book because it offers a large number of relevant libraries and is easy to learn.
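As an illustration of the kind of snippet these interactive blocks contain, here is a hypothetical example (not taken from the book itself): a crude regular-expression tokeniser applied to a single sentence, whose clumsy handling of the apostrophe already hints at why tokenisation gets its own chapter.

```python
import re

# Split a sentence into word and punctuation tokens with a simple pattern.
sentence = "Computers don't understand language; they process it."
tokens = re.findall(r"\w+|[^\w\s]", sentence)
print(tokens)
# ['Computers', 'don', "'", 't', 'understand', 'language', ';',
#  'they', 'process', 'it', '.']
```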
Installation
To install the book locally and use it interactively, follow the installation instructions on GitHub.
Setup tutorials:
Azure tutorial