"Semantic parsing converts natural language statements into logical forms in a meaning representation language [or into a programming language]"
"interpretation [as opposed to generation of language] is an inherently probabilistic problem"
"Two important aspects of semantic parsing for question answering are the breadth of the knowledge source and the depth of logical compositionality."
"Decoding is difficult for machine translation because of the huge space of possible translations." (Eisenstein, p.431) - - - This should be easier if translating into a controlled natural language, in particular if it is deterministic.
"We also need to know something about the programs that we're outputting. A key feature of the semantic parsing problem that the best models make use of is that there are particular constraints on the output: we know that the output should be a valid program in some programming language, and we can write down a set of hard constraints on what is a valid program. We'll start with a very simple model in this chapter which doesn't take into account these constraints, but the rest of the chapters will leverage the output programming language to build stronger models." (Allen NLP) - - - Do they actually do this?
"NMT models often fail to properly use [source and target] information. [An] example is hallucinations, when a model produces sentences that are fluent but unrelated to the source." (Lena Voita)
GPT
Can You Give Me An Example of Semantic Parsing?
Let's take the example sentence "The quick brown fox jumps over the lazy dog."
Parsing using a Context-Free Grammar (CFG):
A CFG for this sentence might include rules like:
S → NP VP (A sentence consists of a noun phrase followed by a verb phrase)
NP → Det Adj* N (A noun phrase consists of a determiner, followed by zero or more adjectives, followed by a noun)
VP → V PP (A verb phrase consists of a verb followed by a prepositional phrase)
PP → P NP (A prepositional phrase consists of a preposition followed by a noun phrase)
Det → 'The' | 'the' (Determiners)
Adj → 'quick' | 'brown' | 'lazy' (Adjectives)
N → 'fox' | 'dog' (Nouns)
V → 'jumps' (Verbs)
P → 'over' (Prepositions)
Using these rules, here is the parse tree, which includes the preposition "over":
S
├── NP
│   ├── Det
│   │   └── The
│   ├── Adj
│   │   └── quick
│   ├── Adj
│   │   └── brown
│   └── N
│       └── fox
└── VP
    ├── V
    │   └── jumps
    └── PP
        ├── P
        │   └── over
        └── NP
            ├── Det
            │   └── the
            ├── Adj
            │   └── lazy
            └── N
                └── dog
This tree represents the syntactic structure of the sentence according to the CFG rules.
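The grammar and tree above can be reproduced with a small backtracking parser. This is a plain-Python sketch, not a production parser; the NP production is written out twice (one and two adjectives) so that the example sentence actually parses:

```python
# Minimal backtracking parser for the toy CFG above (pure Python, no libraries).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "Adj", "Adj", "N"], ["Det", "Adj", "N"]],  # one or two adjectives
    "VP":  [["V", "PP"]],
    "PP":  [["P", "NP"]],
    "Det": [["The"], ["the"]],
    "Adj": [["quick"], ["brown"], ["lazy"]],
    "N":   [["fox"], ["dog"]],
    "V":   [["jumps"]],
    "P":   [["over"]],
}

def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` can match tokens[pos:]."""
    if symbol not in GRAMMAR:                      # terminal word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for production in GRAMMAR[symbol]:
        # Match the production's symbols left to right, backtracking via generators.
        def expand(i, p, children):
            if i == len(production):
                yield (symbol, children), p
                return
            for child, q in parse(production[i], tokens, p):
                yield from expand(i + 1, q, children + [child])
        yield from expand(0, pos, [])

tokens = "The quick brown fox jumps over the lazy dog".split()
trees = [t for t, end in parse("S", tokens, 0) if end == len(tokens)]
print(trees[0])
```

The single complete parse mirrors the tree drawn above, as nested (label, children) tuples.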
Parsing using Semantic Parsing:
Semantic parsing aims to transform the sentence into a structured meaning representation. One common representation is a form of logical expression that can be used to query a knowledge base. For our example, we could use predicate logic like:
jump(fox-4, dog-9) represents the action of jumping where the fox is the subject (performer) of the jump and the dog is the object (target).
det(fox-4, the-1) indicates that 'the' is the determiner for 'fox.'
amod(fox-4, quick-2) and amod(fox-4, brown-3) indicate that 'quick' and 'brown' are adjectival modifiers of 'fox.'
det(dog-9, the-7) and amod(dog-9, lazy-8) indicate that 'the' is the determiner for 'dog' and that 'lazy' is its adjectival modifier.
The numbers correspond to the position of the word in the sentence (1-indexed). This semantic representation abstracts away from the syntactic form of the sentence and focuses on its meaning, which could then be used to answer questions like "Who jumps?" or "Over what does the fox jump?" using a knowledge base that understands these predicates and their relationships.
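As a toy illustration of querying such predicates, the facts can be stored as tuples and matched directly. This is a sketch: the fact list and the two query helpers are invented for the running example, with word positions 1-indexed over the nine-word sentence:

```python
# Toy "knowledge base" of predicate facts for the example sentence
# ("The quick brown fox jumps over the lazy dog", words 1-indexed).
facts = [
    ("jump", "fox-4", "dog-9"),
    ("det",  "fox-4", "the-1"),
    ("amod", "fox-4", "quick-2"),
    ("amod", "fox-4", "brown-3"),
    ("det",  "dog-9", "the-7"),
    ("amod", "dog-9", "lazy-8"),
]

def who(predicate):
    """Answer 'Who <predicate>s?' by returning subjects of matching facts."""
    return [subj for p, subj, obj in facts if p == predicate]

def target_of(predicate):
    """Answer 'Over what does the subject <predicate>?' via the object slot."""
    return [obj for p, subj, obj in facts if p == predicate]

print(who("jump"))        # ['fox-4']
print(target_of("jump"))  # ['dog-9']
```

Real systems would of course query a knowledge graph rather than a Python list, but the pattern-matching idea is the same.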
How Does a Semantic Parser Work?
A semantic parser works by converting natural language input into a structured and machine-interpretable representation of its meaning, such as a logical form, database query, or executable code. The process typically involves several stages, each contributing to the understanding and transformation of the language input into a desired formalism that captures its semantic content.
Here's a high-level overview of how a semantic parser might work:
Tokenization and Part-of-Speech Tagging: The input sentence is split into tokens (words and punctuation), and each token is annotated with its part-of-speech (POS) tag, indicating whether it is a noun, verb, adjective, etc.
Syntactic Parsing: Using a CFG or more advanced parsing techniques like dependency parsing or constituency parsing, the syntactic structure of the sentence is determined. This results in a parse tree or a dependency graph that represents the grammatical relationships between words.
Semantic Role Labeling (SRL): This process identifies the predicate-argument structure of the sentence. For example, it labels which nouns are subjects, objects, etc., with respect to the verbs (predicates).
Entity Recognition and Disambiguation: Named entities (e.g., people, organizations, locations) are identified, and their references are resolved, often linking them to entities in a knowledge base (such as a knowledge graph).
Construction of Semantic Representation: The parser constructs a semantic representation by mapping the syntactic structure and the identified semantic roles to a formal semantic language. This could involve translating the sentence into a logic-based form (such as lambda calculus), a database query (like SQL), or a machine-readable format like RDF triples if the target is a knowledge graph.
Incorporation of Context and World Knowledge: The parser may need to consider the broader context of the conversation or text, as well as common-sense knowledge or domain-specific knowledge. This helps in resolving ambiguities and understanding implicit meaning.
Handling of Pragmatics and Inference: The parser may need to perform additional reasoning to capture the intended meaning, especially when dealing with indirect speech, idioms, metaphors, or implicatures.
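A few of the stages above (tokenization, POS tagging, construction of a logical form) can be caricatured in a few lines of Python. This is a hedged sketch: the lexicon and the agent/patient heuristic are invented for the running example and stand in for the learned components a real pipeline would use:

```python
# Toy rule-based semantic-parsing pipeline (illustrative only; real systems
# use statistical taggers, parsers, and SRL models at each stage).

LEXICON = {  # hypothetical closed POS lexicon for the running example
    "the": "Det", "quick": "Adj", "brown": "Adj", "lazy": "Adj",
    "fox": "N", "dog": "N", "jumps": "V", "over": "P",
}

def tokenize(sentence):
    """Lowercase, strip the final period, split on whitespace."""
    return sentence.lower().rstrip(".").split()

def pos_tag(tokens):
    """Look each token up in the toy lexicon."""
    return [(t, LEXICON[t]) for t in tokens]

def to_logical_form(tagged):
    """Naive role labeling: first noun = agent, verb = predicate, last noun = patient."""
    nouns = [w for w, t in tagged if t == "N"]
    verb = next(w for w, t in tagged if t == "V")
    return f"{verb.rstrip('s')}({nouns[0]}, {nouns[-1]})"   # crude lemmatization

sentence = "The quick brown fox jumps over the lazy dog."
print(to_logical_form(pos_tag(tokenize(sentence))))  # jump(fox, dog)
```

Everything that makes semantic parsing hard (ambiguity, entity linking, context, pragmatics) is exactly what this sketch omits.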
The operation of a semantic parser can be quite complex, depending on the specific applications and the depth of understanding required. Machine learning, especially deep learning, plays a significant role in modern semantic parsing. Models like transformer-based architectures (BERT, GPT, etc.) can be fine-tuned to perform semantic parsing tasks, leveraging large amounts of annotated data and their ability to capture nuanced patterns in language usage.
Semantic parsing is a key component in various NLP applications, such as question answering systems, chatbots, virtual assistants, and text-based information extraction systems, where understanding the meaning of text is crucial for effective functionality.
Traditional parsing techniques
Mapping Natural Language to Description Logic. 2017. They took 960 system installation design principles (SIDP) from Airbus and translated them to OWL. The grammar was specifically crafted for these SIDPs and parsed using CYK. The correctness of partial parses was assessed algorithmically by comparing the input sentences with re-generated sentences using BLEU. Interesting for us could be the separation of grammar and lexicon (p.6)
Summary of Content: semantic parsing to map system installation design principles (SIDP) to OWL DL axioms … a symbolic grammar-based approach with an automatically acquired lexicon and a robust parsing algorithm which can skip unknown words to produce connected, possibly partial, parses … a surface realiser which given a DL formula can produce a text … how lexical and grammatical knowledge can be dissociated
General Observations: "enriching ontologies from texts would help bridge the gap between a document-centric and a model-centric view of information" … "A chief difficulty when mapping text to semantic representations is the lack of accepted criteria for assessing the correctness of the semantic representations derived by the system" … "A central bottleneck when developing semantic parsers and text generators is the lack of parallel corpus aligning text and semantic representations"
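The round-trip check used in the paper (regenerate a sentence from the parse, compare it with the input via BLEU) can be sketched with a minimal sentence-level BLEU. Assumptions: unsmoothed, n-grams up to 4, single reference; real evaluations usually use a smoothed, corpus-level implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())        # clipped n-gram matches
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(log_avg)

src   = "the fox jumps over the dog".split()
regen = "the fox jumps over a dog".split()
print(bleu(src, src))           # perfect round trip scores 1.0
print(bleu(regen, src) < 1.0)   # imperfect regeneration scores lower
```

A perfect regeneration scores 1.0, so low scores flag SIDPs whose parses lost information.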
Krishnamurthy, Mitchell: Weakly Supervised Training of Semantic Parsers, 2012. Video of talk. Contains a good general introduction to question answering based on semantic parsing and a knowledge base. The main idea of the paper is how to use the knowledge base for weakly supervised training of the semantic parser.