Constituent Parsing Exercises#

In the lecture we took a look at a simple tokenizer and sentence segmenter. In this exercise we will expand our understanding of the problem by asking a few important questions and looking at the problem from different perspectives.

Setup 1: Load Libraries#

%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
# %cd .. 
import sys
sys.path.append("..")
import math 
import statnlpbook.util as util
import statnlpbook.parsing as parsing


Task 1: Understanding parsing#

Be sure you understand grammatical categories and structures, and brush up on your grammar skills.

Then re-visit the Enju online parser, and parse the following sentences…

What is wrong with the parses of the following sentences? Are they correct?

Fat people eat accumulates.
The fat that people eat accumulates in their bodies.
The fat that people eat is accumulating in their bodies.

What about these: is the problem in the parser or in the sentence?

The old man the boat.
The old people man the boat.

These were examples of garden path sentences. Find out what that means.

What about these sentences? Are their parses correct?

Time flies like an arrow; fruit flies like a banana.
We saw her duck.

Task 2: Parent Annotation#

Recall the lecture notes on parsing and the parent annotation mentioned there. (Grand)*parents matter: knowing who the parent is in a tree gives a bit of context information, which can later help us with smoothing probabilities and with approaching context-dependent parsing.

In that case, each non-terminal node should know its parent. We'll do this exercise on a single tree, just to play around a bit with trees and their labeling.

Given the following tree:

x = ('S', [('Subj', ['He']), ('VP', [('Verb', ['shot']), ('Obj', ['the', 'elephant']), ('PP', ['in', 'his', 'pyjamas'])])])
parsing.render_tree(x)


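We construct the annotate_parents function, which takes that tree and annotates every non-terminal node with the label of its parent:

def annotate_parents(tree, parent="null"):
    """Append "^<parent label>" to every non-terminal label; the root gets "^null"."""
    if isinstance(tree, tuple):
        # Non-terminal: a (label, children) tuple whose label is passed down as the parent.
        children = [annotate_parents(child, tree[0]) for child in tree[1]]
        return (tree[0] + "^" + parent, children)
    else:
        # Terminal: a plain word string, returned unchanged.
        return tree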

The final annotation result looks like this:

parsing.render_tree(annotate_parents(x))
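To see what the annotation does to the grammar rules that would be read off the tree, the sketch below lists the productions of the annotated tree as plain text. It repeats annotate_parents so that it runs on its own; productions is a hypothetical helper introduced here for illustration and is not part of statnlpbook.

```python
def annotate_parents(tree, parent="null"):
    """Append "^<parent label>" to every non-terminal label."""
    if isinstance(tree, tuple):
        children = [annotate_parents(child, tree[0]) for child in tree[1]]
        return (tree[0] + "^" + parent, children)
    return tree


def productions(tree):
    """Yield (lhs, rhs) for every non-terminal node, in pre-order.

    Hypothetical helper for illustration only.
    """
    if isinstance(tree, tuple):
        label, children = tree
        # The right-hand side is the child label (non-terminal) or the word itself.
        rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
        yield (label, rhs)
        for child in children:
            yield from productions(child)


x = ('S', [('Subj', ['He']),
           ('VP', [('Verb', ['shot']),
                   ('Obj', ['the', 'elephant']),
                   ('PP', ['in', 'his', 'pyjamas'])])])

annotated = annotate_parents(x)
for lhs, rhs in productions(annotated):
    print(lhs, "->", " ".join(rhs))
# S^null -> Subj^S VP^S
# Subj^S -> He
# VP^S -> Verb^VP Obj^VP PP^VP
# Verb^VP -> shot
# Obj^VP -> the elephant
# PP^VP -> in his pyjamas
```

After annotation, a rule such as PP^VP is distinct from a PP that might appear under another parent, which is the extra context the annotation is meant to provide.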
