Parse Tree Example

Introduction

From a paper reporting on the US Open some years ago:

Novak Djokovic, five times a champion here, failed 
to reach the concluding Sunday of the ATP World Tour Finals 
for a record eighth time [...].

This sentence can be interpreted in two different ways. It has two different meanings.

Can you spot and explain the ambiguity?

In the following, we are going to explain the ambiguity by showing that the sentence above has two different parse trees.[1]

Remark: At some point during the lecture (not part of these notes) we discussed "How do computers think?" and the concept of neuro-symbolic AI. One can think of the notes here as an example that mixes neural and symbolic intelligence: Below, I will use generative AI (neural intelligence) to generate parse trees and a CFG. But symbolic intelligence (rule following) is required to verify that the parse trees actually represent the sentences and that the CFG can generate the parse trees.

Analysis

Let us first simplify the sentence down to its essentials:

Novak Djokovic failed to reach the Finals for a record eighth time.

One way to think about parsing is to put in the parentheses into a string.

In this example we can do this in two ways:

Novak Djokovic failed to (reach the Finals for a record eighth time).
Novak Djokovic failed (to reach the Finals) for a record eighth time.

In the first reading, eigth time qualifies reach and sentence is about Djokovic reaching the final eight times.

In the second reading eigth time qualifies failed and the sentence is about Djokovic failing eight times.

We are now going to show the two parse trees that correspond to the two different interpretations.

Parse trees are important in programming languages because type-checker, interpreters, code generators, etc do not manipulate programs-as-strings but programs-as-trees.

The Parse Trees

If you paste the sentences above into GPT4 you can make it produce the corresponding parse trees (based on a context-free grammar, see below).

Btw, parsing refers to the activity of producing the trees from the strings.

Novak Djokovic failed to (reach the Finals for a record eighth time)

S
|-- NP
|   |-- N
|       |-- "Novak Djokovic"
|-- VP
    |-- V
    |   |-- "failed"
    |-- VP
        |-- TO
        |   |-- "to"
        |-- VP
            |-- V
            |   |-- "reach"
            |-- NP
            |   |-- Det
            |   |   |-- "the"
            |   |-- N
            |       |-- "Finals"
            |-- PP
                |-- P
                |   |-- "for"
                |-- NP
                    |-- Det
                    |   |-- "a"
                    |-- N
                    |   |-- "record"
                    |-- Num
                    |   |-- "eighth"
                    |-- N
                        |-- "time"

Exercise: Think about how a parsing algorithm could create the tree from the string.

Novak Djokovic failed (to reach the Finals) for a record eighth time.

S
|-- NP
|   |-- N
|       |-- "Novak Djokovic"
|-- VP
    |-- V
    |   |-- "failed"
    |-- VP
    |   |-- TO
    |   |   |-- "to"
    |   |-- VP
    |       |-- V
    |       |   |-- "reach"
    |       |-- NP
    |           |-- Det
    |           |   |-- "the"
    |           |-- N
    |               |-- "Finals"
    |-- PP
        |-- P
        |   |-- "for"
        |-- NP
            |-- Det
            |   |-- "a"
            |-- N
            |   |-- "record"
            |-- Num
            |   |-- "eighth"
            |-- N
                |-- "time"

Exercise: Again, think about how a parser would create the tree from the string.

The Context-Free Grammar

In software engineering practice, we often do not write parsers by hand, but use parser-generators to generate parsers from a context-free grammar.

I used GPT4 to create a CFG for that can be used to parse the sentences above:

CFG Production Rules

Let's define a CFG with non-terminal symbols (S, NP, VP, V, TO, N, Det, P, PP, Num) and terminal symbols (specific words and categories from the sentences).

S → NP VP
NP → N
N → "Novak Djokovic" | "Finals" | "time"
VP → V VP | V PP
V → "failed" | "reach"
TO → "to"
PP → P NP
P → "for"
Det → "the" | "a"
Num → "eighth"

Explanation

  1. S → NP VP

    • A sentence consists of a noun phrase followed by a verb phrase.
  2. NP → N

    • A noun phrase can be just a noun (e.g., "Novak Djokovic").
  3. N → "Novak Djokovic" | "Finals" | "time"

    • Nouns identified in the sentences.
  4. VP → V VP | V PP

    • A verb phrase can be a verb followed by another verb phrase or a verb followed by a prepositional phrase.
  5. V → "failed" | "reach"

    • Verbs identified in the sentences.
  6. TO → "to"

    • Marker for the infinitive.
  7. PP → P NP

    • A prepositional phrase consists of a preposition followed by a noun phrase.
  8. P → "for"

    • Prepositions identified in the sentences.
  9. Det → "the" | "a"

    • Determiners used in the sentences.
  10. Num → "eighth"

    • Numerals identified in the sentences.

  1. Programming languages are designed so that every program only has one unique parse tree. A substanital part of the theory of programming languages is concerned with techniques that help designing languages that have unique parse trees that can be efficiently constructed by compilers. ↩︎