
Do language models possess knowledge (soundness)?

Authors: Tarun Chitra and Henry Prior (Gauntlet)

In this post we explore how interactive proofs, a concept of interest in theoretical computer science and cryptography, can be useful for testing the capabilities of language models. We consider how a setup composed of multiple language models could be used in place of prompt tuning for this purpose. Our goal is to understand the role of knowledge in language models: can they be viewed (at least by one another) as possessing knowledge? We use tools from interactive proofs to formalize this and provide a mathematical lens on the question, "do large language models actually possess knowledge?"

Epistemology before Math

There's a lot hidden in the word "knowledge," as epistemologists (i.e. those who study knowledge within philosophy) will be quick to inform you. One of the earliest epistemologists in the West, Plato, thought that all knowledge was recollection: a matter of drawing out what one's immortal soul has always known but unfortunately forgotten. This (bizarre) claim is given some support in a famous scene in the Meno, where an enslaved boy, guided only by Socrates' keen questioning, works out how to construct a square with twice the area of a given one (a special case of the Pythagorean theorem).

In the last few centuries, epistemologists have realized that defining knowledge isn't so easy, and an exceptionless set of necessary and sufficient conditions remains elusive. Although much of the debate has moved away from adjudicating between competing sets of necessary and sufficient conditions, the question of whether knowledge is open to conceptual analysis at all is still unresolved. The natural pessimistic inference one might draw from philosophy's classical epistemological paradoxes and impossibility theorems is that LLMs cannot have knowledge.

IP = PSPACE = ... = Knowledge?

Theoretical computer science has long been enamored with the concept of interactive proofs.
These proofs allow one entity (a prover) to convince another entity (a verifier) that the prover has knowledge of a particular computation without revealing it directly. These proofs could be viewed as a modern version of the Socratic method: the veracity of a statement is assessed after a rigorous back-and-forth in which the verifier (analogous to the teacher) questions the prover (analogous to a Socratic student). The verifier is satisfied if the prover answers sufficiently many questions correctly.
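
To make the prover/verifier back-and-forth concrete, here is a minimal sketch in Python. It is not a protocol from this post: it borrows the familiar "two differently colored balls" illustration, and every name in it (`honest_prover`, `make_guessing_prover`, `run_protocol`, `threshold`) is a hypothetical choice made for the example. The verifier secretly decides whether to swap the balls, the prover says whether a swap happened, and the verifier accepts if enough answers are correct.

```python
import random

def honest_prover(swapped: bool) -> bool:
    # Assumption: an honest prover can really tell the balls apart, which we
    # model by letting it read the verifier's hidden choice directly.
    return swapped

def make_guessing_prover(rng: random.Random):
    # A prover with no real knowledge: it ignores the challenge and flips a coin.
    def prover(swapped: bool) -> bool:
        return rng.random() < 0.5
    return prover

def run_protocol(prover, rounds: int, threshold: int, rng: random.Random) -> bool:
    """Verifier asks `rounds` random challenges; accepts if at least
    `threshold` of the prover's answers are correct."""
    correct = 0
    for _ in range(rounds):
        swapped = rng.random() < 0.5      # verifier's private random challenge
        if prover(swapped) == swapped:    # prover answers, verifier checks
            correct += 1
    return correct >= threshold

if __name__ == "__main__":
    rng = random.Random(0)
    # The honest prover answers every challenge correctly, so it is accepted.
    print(run_protocol(honest_prover, rounds=20, threshold=20, rng=rng))  # True
    # A guessing prover passes all 20 rounds with probability 2**-20.
    guesser = make_guessing_prover(rng)
    trials = 2000
    accepted = sum(run_protocol(guesser, 20, 20, rng) for _ in range(trials))
    print(f"guessing prover accepted in {accepted} of {trials} trials")
```

In this toy setting a prover with real knowledge is always accepted, while a prover who guesses survives each round with probability 1/2, so its chance of passing all rounds shrinks exponentially with the number of questions. Lowering `threshold` below `rounds` would tolerate some wrong answers, at the cost of making it easier for a guessing prover to be accepted.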