Constituent Parsing Exercises

In the lecture we took a look at a simple tokenizer and sentence segmenter. In this exercise we will expand our understanding of the problem by asking a few important questions and looking at the problem from different perspectives.

Setup 1: Load Libraries

%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
# %cd .. 
import sys
sys.path.append("..")
import math 
import statnlpbook.util as util
import statnlpbook.parsing as parsing

Task 1: Understanding parsing

Be sure you understand grammatical categories and structures, and brush up on your grammar skills.

Then re-visit the Enju online parser, and parse the following sentences…

What is wrong with the parses of the following sentences? Are they correct?

  • Fat people eat accumulates.

  • The fat that people eat accumulates in their bodies.

  • The fat that people eat is accumulating in their bodies.

What about these? Is the problem in the parser or in the sentence?

  • The old man the boat.

  • The old people man the boat.

These were examples of garden path sentences. Find out what that means.

What about these sentences? Are their parses correct?

  • Time flies like an arrow; fruit flies like a banana.

  • We saw her duck.
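One way to see why a sentence like "We saw her duck" is hard for a parser is to count how many parses a grammar allows for it. The sketch below is a CKY-style parse counter over a hypothetical toy grammar in Chomsky normal form. The grammar, the lexicon, and the category names (for example `SC` for a small clause) are assumptions made purely for illustration; they are not taken from the lecture or from Enju.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (an assumption, for illustration only).
# Keys are (left child, right child); values are the possible parent categories.
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],    # saw [her duck], "duck" as a noun
    ("V", "SC"): ["VP"],    # saw [her duck], "duck" as a verb
    ("Det", "N"): ["NP"],
    ("NP", "VB"): ["SC"],
}

# Hypothetical lexicon: ambiguous words list more than one category.
LEXICON = {
    "we": ["NP"],
    "saw": ["V"],
    "her": ["Det", "NP"],
    "duck": ["N", "VB"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees rooted in `root` that cover all of `words`."""
    n = len(words)
    # chart[(start, end)][category] = number of trees of that category over the span
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word.lower(), []):
            chart[i, i + 1][tag] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for left, left_count in list(chart[start, split].items()):
                    for right, right_count in list(chart[split, end].items()):
                        for parent in BINARY_RULES.get((left, right), []):
                            chart[start, end][parent] += left_count * right_count
    return chart[0, n][root]


print(count_parses("We saw her duck".split()))
```

Under this toy grammar the sentence receives more than one parse, because "her" can be a determiner or a pronoun and "duck" can be a noun or a verb. A real parser has to choose between such readings.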

Task 2: Parent Annotation

Recall the lecture notes on parsing and the parent annotation mentioned there. (Grand)*parents matter: knowing a node's parent in a tree gives a bit of context information, which can later help us with smoothing probabilities and with approaching context-dependent parsing.

In that case, each non-terminal node should know its parent. We’ll do this exercise on a single tree, just to play around a bit with trees and their labeling.

Given the following tree:

x = ('S', [('Subj', ['He']), ('VP', [('Verb', ['shot']), ('Obj', ['the', 'elephant']), ('PP', ['in', 'his', 'pyjamas'])])])
parsing.render_tree(x)
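The tree `x` is a nested pair of the form `(label, children)`, where each child is either another such pair (a non-terminal) or a plain string (a word). As a warm-up, here is a minimal sketch of two traversals over that format. The helper names `leaves` and `labels` are hypothetical and are not part of `statnlpbook`.

```python
x = ('S', [('Subj', ['He']),
           ('VP', [('Verb', ['shot']),
                   ('Obj', ['the', 'elephant']),
                   ('PP', ['in', 'his', 'pyjamas'])])])


def leaves(tree):
    """Return the words of a (label, children) tree, left to right."""
    _label, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):   # non-terminal: recurse into the subtree
            words.extend(leaves(child))
        else:                          # terminal: a plain string
            words.append(child)
    return words


def labels(tree):
    """Return the non-terminal labels of the tree in pre-order."""
    label, children = tree
    result = [label]
    for child in children:
        if isinstance(child, tuple):
            result.extend(labels(child))
    return result


print(leaves(x))
print(labels(x))
```

The same recursive pattern (unpack the pair, then handle tuple children and string children separately) carries over to any function that rewrites the tree.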

Construct an annotate_parents function that takes that tree and annotates each non-terminal node with the label of its parent. The final annotation result should look like this:

y = ('S^?', [('Subj^S', ['He']), ('VP^S', [('Verb^VP', ['shot']), ('Obj^VP', ['the', 'elephant']), ('PP^VP', ['in', 'his', 'pyjamas'])])])
parsing.render_tree(y)
[Figure: rendered tree for y, with each non-terminal labelled in the form Label^Parent]
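One possible sketch of `annotate_parents` follows. It assumes the `(label, children)` tuple format used for `x`, and it uses `"?"` as the parent of the root, as in the expected result `y`. It is not necessarily the reference solution.

```python
def annotate_parents(tree, parent="?"):
    """Return a copy of `tree` in which every non-terminal label becomes 'label^parent'."""
    label, children = tree
    annotated_children = [
        # Recurse into non-terminals, passing the ORIGINAL label of this node
        # as the parent; leave words (plain strings) untouched.
        annotate_parents(child, label) if isinstance(child, tuple) else child
        for child in children
    ]
    return (f"{label}^{parent}", annotated_children)


x = ('S', [('Subj', ['He']), ('VP', [('Verb', ['shot']), ('Obj', ['the', 'elephant']), ('PP', ['in', 'his', 'pyjamas'])])])
y = ('S^?', [('Subj^S', ['He']), ('VP^S', [('Verb^VP', ['shot']), ('Obj^VP', ['the', 'elephant']), ('PP^VP', ['in', 'his', 'pyjamas'])])])

print(annotate_parents(x) == y)
```

Passing the unannotated label down the recursion keeps the annotation to one level of context. Grandparent annotation could be sketched the same way by threading one more label through the call.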

Solutions

You can find the solutions to these exercises here.