Unsupervised Disentangling of Language and Meaning
===

**General Context:** Part of the generated language depends on some kind of underlying knowledge base. The knowledge base is the component that controls the information contained in generated sentences. For instance, in "Obama is born in 1961", the value 1961 comes from the knowledge base.

**Objective:** Extract the underlying knowledge base so that one can take control over it. We can think of two different approaches (but to start, we could focus on the first one):

* First approach: extract the knowledge base from a corpus.
* Second approach: update the knowledge base while reading new sentences.

**Problem:** The knowledge is not labeled (or only weakly labeled) in the available corpora, so we need a way to extract it automatically.

**Idea:** If we have access to different sentences containing the same knowledge, then the invariant part is the knowledge. See https://arxiv.org/abs/1711.00305, which is based on the same assumption (for images). By working on pairs of sentences/documents, one should be able to rebuild the underlying knowledge.

**Formalization:** We consider a generative model of text that produces words from left to right. This process mixes two types of models: one in charge of writing correct sentences, and one in charge of distilling the corresponding information into the sentences.

Axiomatic Description of MR
===

Main document [here](https://hackmd.io/nDdueWwzQ5ehqbC5251YhQ?both).

What is a good latent meaning representation (MR)? Here we give an axiomatic description of MRs. We are looking for:

* A meaning *extractor* operator $m(x)$ that produces the meaning representation of $x$
* A meaning *aggregation* operation $+$ on meaning representations
* A query operator $w(m,q)$ that returns the answer to an NL query $q$ based on the knowledge in $m$

We want the following from the meaning representation $m(x)$ and the query operation $w$:

* *Meaning invariance*: $m(x_1) = m(x_2)$ if and only if $x_1$ and $x_2$ have the same meaning.
* *Compositionality*: Assume there is a way to combine text $x_1$ and text $x_2$ into $x_1 + x_2$ (e.g. concatenation, but other NL strategies are possible); then we want $m(x_1) + m(x_2) = m(x_1 + x_2)$. Compositionality allows humans to "build" a KB incrementally, using NL formulations of knowledge as input.
* *Querying*: Let $a$ be the answer to query $q$ about the meaning of $x$; then $w(m(x), q) = a$. (We also want $w$ to be robust with respect to paraphrasing, but this might fall out of meaning invariance?)

There will be more axioms, for example on minimality or scale:

* *Sublinearity*: Let $\sum_{i=1}^{n} t_i$ be a text collection. We want the (asymptotic) cost of the query $w(m(\sum_i t_i), q)$ to be below $O(n)$.

Question: From a downstream point of view, why is meaning invariance important? I almost feel that this invariance is something that helps to fulfill the other requirements, but not an end in itself (a main hypothesis?).

Note that these constraints do *not* require interpretability. Instead, providing an NL interface allows humans to interact with the knowledge. Humans can also append/change knowledge via the composition operation, providing additional units of information as language input.

To develop an MR we should find architectures, constraints, and unsupervised objectives that align with the axioms.
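To make the axioms concrete, here is a minimal sketch of the three operators as a Python interface, plus a spot-check of the first two axioms. All names (`MeaningModel`, `extract`, `aggregate`, `query`, `check_axioms`, the `equal` predicate) are hypothetical and only illustrate the specification; the notes do not prescribe any implementation.

```python
from abc import ABC, abstractmethod
from typing import Any


class MeaningModel(ABC):
    """Hypothetical interface bundling the three operators from the axioms."""

    @abstractmethod
    def extract(self, text: str) -> Any:
        """m(x): map a text to its latent meaning representation."""

    @abstractmethod
    def aggregate(self, m1: Any, m2: Any) -> Any:
        """+: combine two meaning representations into one."""

    @abstractmethod
    def query(self, m: Any, q: str) -> str:
        """w(m, q): answer an NL query q from the knowledge stored in m."""


def check_axioms(model: MeaningModel, x1: str, x2: str,
                 same_meaning: bool, equal) -> dict:
    """Spot-check meaning invariance and compositionality on one text pair.

    `equal` is a user-supplied predicate over representations (exact
    equality is too strict for continuous latents, so e.g. a
    cosine-similarity threshold could be used instead).
    """
    m1, m2 = model.extract(x1), model.extract(x2)
    return {
        # Meaning invariance: same meaning <=> same representation.
        "invariance": equal(m1, m2) == same_meaning,
        # Compositionality: m(x1) + m(x2) == m(x1 + x2), with text
        # combination approximated here by simple concatenation.
        "compositionality": equal(model.aggregate(m1, m2),
                                  model.extract(x1 + " " + x2)),
    }
```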
One core hypothesis I have is that a latent representation directly optimised for the above will work better than traditional ones.

Evaluation
==========

To *intrinsically* evaluate an MR under the above specification, we need to measure the extent to which the axioms hold. We can also compare against traditional IE pipelines with respect to metrics aligned with the axioms.

* We can test generalisation of the meaning invariance axiom: does the model indeed produce the same representation for texts with the same meaning that were unseen at training time?
* We can test compositionality in the same way (on a test set).
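As a rough illustration of what such an intrinsic evaluation could look like, here is a sketch assuming vector-valued representations and cosine similarity as a soft equality test; `extract`, `aggregate`, `paraphrase_pairs`, `unrelated_pairs`, and `text_pairs` are placeholders for the model operators and held-out evaluation data, none of which are specified in these notes.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two representation vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def invariance_score(extract, paraphrase_pairs, unrelated_pairs) -> float:
    """Margin between similarity of same-meaning pairs and unrelated pairs.

    A positive margin indicates the model groups paraphrases more tightly
    than unrelated texts; the closer paraphrase similarity is to 1, the
    better the meaning invariance axiom holds on held-out data.
    """
    same = np.mean([cosine(extract(a), extract(b)) for a, b in paraphrase_pairs])
    diff = np.mean([cosine(extract(a), extract(b)) for a, b in unrelated_pairs])
    return float(same - diff)


def compositionality_score(extract, aggregate, text_pairs) -> float:
    """Mean similarity between m(x1) + m(x2) and m(x1 + x2) on held-out pairs."""
    sims = [cosine(aggregate(extract(a), extract(b)), extract(a + " " + b))
            for a, b in text_pairs]
    return float(np.mean(sims))
```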