# Readings: Master's Thesis and MPP
Readings and relevant research pertaining to *hs-to-lean* and the MPP research.
## November 2026
### Integrate LLMs with Lean (server)
* [LLMStep: LLM Proofstep Suggestions in Lean](https://mathai2023.github.io/papers/40.pdf) - Sean Welleck (2023)
> "LLMSTEP is a Lean 4 tactic that sends a user’s proof state to a server hosting a language model. The language model generates suggestions, which are checked in Lean and displayed to a user in their development environment."
* [Repo](https://github.com/wellecks/llmstep)
* [Lean-STaR](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=Ac6n5pQAAAAJ&cstart=20&pagesize=80&citation_for_view=Ac6n5pQAAAAJ:lSLTfruPkqcC) -- Sean Welleck (for training LLMs to "think" informally before proving)
* [An interesting discussion about LLMs and Lean](https://news.ycombinator.com/item?id=45713455)
* [LeanTool](https://github.com/GasStationManager/LeanTool/tree/main) - A "code interpreter" for Lean4 (202?)
> "LeanTool is a simple utility that connects LLMs with a "Code Interpreter" for Lean. This is implemented as tool calls from the LLM to the Lean executable, hence the name."
* Exposes/passes tactic state to LLM during proof -- MCP allows feedback loop
* MCP (Model Context Protocol) server can connect LeanTool to coding assistants (Cursor, Aider, VSCode, etc.).
* Workflow: the Lean execution tool (including its plugins) is exposed to the MCP server as an MCP tool, which can then be used by LLMs in coding assistants (see the sketch after this list)
* [MCP Documentation](https://modelcontextprotocol.io/docs/getting-started/intro)
* [Provably-Correct Vibe Coding](http://www.provablycorrectvibecoding.com:8501/?page=main) (I don't have an API key yet to test it out)
* [Sample prompt](https://github.com/GasStationManager/LeanTool/blob/main/examples/sagredo_prompt.txt)
* GasStationManager's "essay" -- [A Proposal for Safe and Hallucination-free Coding AI](https://gasstationmanager.github.io/ai/2024/11/04/a-proposal.html)
* [Pantograph](https://arxiv.org/abs/2410.16429) -- Aniva et al.
* machine-to-machine interface for automated theorem proving in Lean
* TBR...
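For the LLMStep entry at the top of this section, a minimal usage sketch (reconstructed from memory of the repo README; the import name and the empty-prefix argument should be double-checked against the repo):

```lean
import LLMstep

-- `llmstep` sends the current goal state to the local suggestion server,
-- type-checks the model's candidate tactics, and surfaces the ones that
-- succeed (or make progress) as clickable suggestions in the infoview.
example (f : Nat → Nat) (h : ∀ n, f n = n) : f 5 = 5 := by
  llmstep ""
```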
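For the LeanTool entry above (the sketch referenced in its workflow bullet): the core "code interpreter" step is just running the Lean executable on the model's candidate code and returning the compiler output so the model can iterate. Here is a minimal sketch of that step, written in Lean 4 itself purely for illustration; LeanTool's actual implementation is a Python tool server, and the scratch-file name below is made up.

```lean
-- Minimal sketch of the tool call at the heart of a Lean "code interpreter":
-- write the candidate code to a scratch file, run `lean` on it, and hand the
-- messages back to the model (via MCP, this becomes a callable tool).
def checkCandidate (code : String) : IO String := do
  IO.FS.writeFile "Candidate.lean" code   -- scratch file name is illustrative
  let out ← IO.Process.output { cmd := "lean", args := #["Candidate.lean"] }
  if out.exitCode == 0 then
    pure "ok: Lean accepted the code"
  else
    pure s!"Lean reported errors:\n{out.stdout}{out.stderr}"
```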
### Untyped vs Typed
* [Stack exchange: Why DTT over ST for PA?](https://mathoverflow.net/questions/376839/what-makes-dependent-type-theory-more-suitable-than-set-theory-for-proof-assista/376973#376973)
* Proof assistant = Kernel (with formal system **$F$**) $\to$ Vernacular (expressive formal language) $\to$ Metalanguage (tactic language, automation, etc.) -- tiny Lean illustration after this list
* $F$ (kernel formal system): correctness important, must be small (easier to implement), doesn't have to be human-readable
* Choice in logical foundation (set vs type theory) depends on other factors (technical needs, efficiency, available algorithms, etc.)
* Vernacular ($V$): must reflect "practiced" mathematics while being formal enough to be translated to $F$; fills in (some) logical assumptions
* $V$ specifications and theoretical framework resemble type theory (and almost never set theory)
* How to detect errors in notation/logic even when accepted by kernel? (can we add a mechanism to flag suspicious proof steps for humans to review)
* $M$ (metalanguage): less interesting from mathematician POV, but very important in automation
* more like a programming language than a formal system
* [Should your specification language be typed?](https://lamport.azurewebsites.net/pubs/lamport-types.pdf) - Lamport and Paulson
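A tiny Lean illustration of the kernel/vernacular/metalanguage split above (my own example, not from the linked answer): the same statement proved once as an explicit term, which is what the kernel ($F$) actually checks, and once by a tactic script, i.e. metalanguage ($M$) code that elaborates to such a term; the shared `theorem ... : ... :=` surface syntax is the vernacular ($V$).

```lean
-- F: an explicit proof term; the kernel checks this directly.
theorem add_comm_term (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- M: a tactic script; it *constructs* a term like the one above,
-- and the kernel still only ever sees that resulting term.
theorem add_comm_tactic (a b : Nat) : a + b = b + a := by
  rw [Nat.add_comm]
```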
## Next Meeting
* [ProofWala: Multilingual Data Synthesis and Theorem Proving](https://arxiv.org/abs/2502.04671) (2025) - Amitayush Thakur, George Tsoukalas, Greg Durrett, Swarat Chaudhuri
* [Framework (repo)](https://github.com/trishullab/proof-wala)
* [multi-lingual ITP interaction module (repo)](https://github.com/trishullab/itp-interface)
* [FMC: Formalization of Natural Language Mathematical Competition Problems](https://arxiv.org/abs/2507.11275) (2025) - Jiaxuan Xie, Chengwu Liu, Ye Yuan, Siqi Li, Zhiping Xiao, Ming Zhang
* proposed autoformalization pipeline (NL to Lean)
* [FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models](https://arxiv.org/abs/2505.02735) (2025) - Zhouliang Yu, Ruotian Peng, Keyi Ding, Yizhe Li, Zhongyuan Peng, Minghao Liu, Yifan Zhang, Zheng Yuan, Huajian Xin, Wenhao Huang, Yandong Wen, Ge Zhang, Weiyang Liu
* new benchmark (Lean4) for autoformalization AND proposed autoformalization pipeline (human-in-the-loop)
* [Lean-STaR](https://arxiv.org/abs/2407.10040)
* Having the autoformalizer do informal reasoning between each(?) step of the formal proof
* [From informal to formal proofs in Euclidean geometry](https://link.springer.com/article/10.1007/s10472-018-9597-7)
* translates informal proofs into a semi-formal representation as an intermediate step (and verifies the semi-formal version)
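For the Lean-STaR item above, a toy illustration (my own; not the paper's actual data format) of interleaving informal reasoning with formal steps: each tactic is preceded by the natural-language "thought" the model is trained to produce before predicting it.

```lean
example (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
  -- Thought: the goal differs from `h` only by the `+ 1` on each side,
  -- so rewriting with `h` should make both sides identical.
  rw [h]
```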
...
Alexander closing tabs:
- [Benchmarks for Natural Language and Formal Proofs](/L_QNoQ6NTJi3kSUdKwNv9w)
- [Imperative Haskell](/hE2KNOThSsCPZ1HOn0lxyA)
- [Isabelle Parallel Corpus](https://behemoth.cl.cam.ac.uk/ipc/)
- Large-Scale Formal Proof for the Working Mathematician -- Lessons learnt from the ALEXANDRIA Project, [pdf](https://arxiv.org/pdf/2305.14407)
- [GFLean](/513WNjGDQpu570fY3ziRAA)
- [Deep Neural Network Verification](/39ivc7ftRECpS6zrQPo8hQ)
- [Related Work](/vY-Bexu6SeuT325PNUdRlw)
## September 03 -- Found Readings
* [Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Formalizing+Complex+Mathematical+Statements+with+LLMs%3A+AStudy+on+Mathematical+Definitions&btnG=) -- Lan Zhang, Marco Valentino, André Freitas
* [Formal Mathematical Reasoning: A New Frontier in AI](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Formal+Mathematical+Reasoning%3A+ANewFrontier+in+AI&btnG=) -- Kaiyu Yang, Gabriel Poesia, Jingxuan He, Wenda Li, Kristin Lauter, Swarat Chaudhuri, Dawn Song
* Mentions [Lean-STaR](https://arxiv.org/abs/2407.10040)
* has a good "Related Work" section -- lots of material to look at
* not about an actual system or new research -- a survey of work relating to math and AI
* keeps bringing up the idea of multi-lingual LLMs (formalization across multiple proof assistant languages)
* [MLFMF: Data Sets for Machine Learning for Mathematical Formalization](https://arxiv.org/abs/2310.16005) -- Andrej Bauer, Matej Petković, Ljupčo Todorovski
* [Github](https://github.com/ul-fmf/mlfmf-data/tree/main)
* [Dataset package](https://zenodo.org/records/10041075)
* Note: would need A LOT more computing power to run same simulations
* Training and organizing via internal type theoretic representations (so it can apply to multiple proof assistants without relying on specific meta-languages)
# Summer Readings
## Literature and Summaries
* ["Large-Scale Formal Proof for the Working Mathematician -- Lessons learnt from the ALEXANDRIA Project"](https://arxiv.org/abs/2305.14407) - Lawrence Paulson (2023) (**software project**)
* Figuring out the big problems related to formalization of mathematics
* Looked at how mathematicians proved large theorems to identify patterns in "practiced" mathematics
* trying to identify areas where Isabelle/HOL's library should be extended or changed
* mentions the use of LLMs and intermediate representation for automatic formalization
* Includes link to [Isabelle Parallel Corpus](https://behemoth.cl.cam.ac.uk/ipc/) website (proofs written in ITP and NL), but it does not seem like the site has been updated in a while.
* ["Canonical for Automated Theorem Proving in Lean"](https://arxiv.org/abs/2504.06239) -- Chase Norman and Jeremy Avigad (2025) (**software**)
* Canonical is "the first solver for type inhabitation in dependent type theory," i.e. it searches for terms of a given type (see the small example after this list)
* also a Lean proof tactic for finding a proof term for main goal
* goal: soundness and completeness for entire Lean language, including definitional equality
* Hoping Canonical will help with tactic prediction (LLMs) and human users -- buffer between AI and the underlying theory
* ["A Certified Proof Checker for Deep Neural Network Verification in Imandra"](https://arxiv.org/abs/2405.10611) -- Remi Desmartin et al. (2025) (**software**)
* Uses DNN checker Marabou to generate certificates that are verified in Imandra (?)
* Marabou Proof Trees
* ["Informal to formal and back with LLMs"](https://www.newton.ac.uk/seminar/46714/) -- Olivier Bousquet (2025) (**seminar**)
* formalizing mathematics is advantageous but tedious and the output can be hard to read
* readable autoformalization: Naproche, [Verbose Lean](https://github.com/PatrickMassot/verbose-lean4)
* Lean $\to$ English (deformalization) is easier; verification easier than generation
* Formalization steps: (1) claim extraction, (2) definition extraction, (3) proof
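The small example referenced in the Canonical entry above (my own Curry-Howard illustration, not Canonical's API): proving a proposition is exactly exhibiting a term of that type, and that term search is what a type-inhabitation solver automates.

```lean
-- Proving `p ∧ q → q ∧ p` amounts to finding an inhabitant of that type;
-- a type-inhabitation solver searches for a term like the one below.
example (p q : Prop) : p ∧ q → q ∧ p :=
  fun h => ⟨h.2, h.1⟩
```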
---
* ["Automated Formalization of Structured Natural Language requirements"](https://www.sciencedirect.com/science/article/abs/pii/S0950584921000707) -- Giannakopoulou et al. (2021)
* unambiguous requirements captured by **structured natural language** -- user input parsed into semantic templates to form basis of formalization approach
* includes use of temporal logic (unlike many others)
* ["Hiproofs: A Hierarchical Notation of Proof Tree"](http://sciencedirect.com/science/article/pii/S1571066106002003) -- Ewen Denney et al. (2006)
* what is the role of tactics in semi-formal mathematics? (kurz)
* ["Autoformalization with Large Language Models"](https://proceedings.neurips.cc/paper_files/paper/2022/hash/d0c6bc641a56bebee9d985b937307367-Abstract-Conference.html) -- Wu et al. (2022)
* use Codex and PaLM models to translate mathematical competition problems into formal mathematics for ITP (Isabelle)
* Reasons for translation failure
>The majority of the failures are due to the misalignment of informal and formal definitions. For example, when seeing the phrase “the greatest possible value”, the LLMs often fail to align it with the function Greatest/Max in Isabelle. Another example is the failure to align the factorial of n (i.e., !n) to fact n in Isabelle. Other common failure modes include the misapplication of functions (e.g., applying a prefix function in an infix way).
* mentions using informalization (aka deformalization) to help with "back-translation techniques" (see Sennrich et al.)
* ["Improving Neural Machine Translations with Monolingual Data"](https://arxiv.org/abs/1511.06709) -- Sennrich et al. (2016)
* cited by Wu et al. (above) and many others in this list
* encoder-decoder for translating language pairs
* refers mostly to language translations (English, German, Turkish), not mathematical ones
* Verbose Lean (**software**)
* Waterproof (**software**)
* Lean Blueprint (**software**)
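To connect the Wu et al. failure modes quoted above to our Lean setting (the quote is about Isabelle; this Mathlib rendering and the lemma names are my own choices): autoformalization has to align informal phrases like "the greatest possible value" and "the factorial of n" with specific library definitions such as `IsGreatest` and `Nat.factorial`, and missing that alignment is exactly the failure described.

```lean
import Mathlib

-- "the factorial of n" must be aligned with `Nat.factorial`:
example : Nat.factorial 4 = 24 := rfl

-- "the greatest n with n ≤ 5 is 5" must be aligned with `IsGreatest`
-- (membership plus being an upper bound); statement alignment is the point
-- here, so the proof is left as a placeholder.
example : IsGreatest {n : ℕ | n ≤ 5} 5 := by
  sorry
```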
## Informalization/deformalization
**general notes**
* to have good deformalization, need large parallel training data (formal proof and corresponding informal proof)
**Literature**
* ["Multi-Language Diversity Benefits Autoformalization"](https://proceedings.neurips.cc/paper_files/paper/2024/hash/984de836e696ba653bbfbbbfce31d3bc-Abstract-Conference.html) -- Jiang et al. (2024)
* addresses the lack of parallel training data by translating/informalizing Isabelle's and Lean's mathematical libraries -- saved as the training set [MMA](https://github.com/albertqjiang/MMA) (multi-language mathematical autoformalization)
* training LLMs to informalize (and autoformalize) multiple languages (i.e. Isabelle and Lean)
* claims that autoformalization doesn't mean the translation has to be 100% correct
* had experts analyze autoformalized proofs (in Lean and Isabelle). They kept track of how many proofs the experts needed to manually fix, and the degree of repair required.
* authors say that having the computer get *mostly* there was still good progress and helps accelerate formalization efforts
* most mistakes from autoformalization occurred due to misunderstanding (ambiguity, missing context, or the LLM's lack of understanding)
* unlike others (Sennrich et al., Wu et al.), does not use back-translation (i.e. translating from the source language to the target language and back with the same model), since it relies on GPT-4
## Between informal and formal
*Research and literature relating to developing a language somewhere between formal and informal proof language with the goal of autoformalization*
* ["A Natural Formalized Proof Language"](https://arxiv.org/abs/2405.07973) -- Xie et al. (2024)
* developed a natural proof language that is meant to be extremely readable to humans but consists of a parsable and verifiable formal system
* includes partial proofs
* considers/implements the proof assistant's tactic system in the natural proof language
* parses the natural formal proof into an AST (parsed directly -- no changes), then performs static analysis (context-dependent language is preserved up to this point), then runs it through a proof checker
* cannot find public repo
* kind of (very?) close to our goal and understanding of semi-formal mathematics
* ["Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs"](https://arxiv.org/abs/2210.12283) -- Jiang et al. (2023)
* Formalizing proofs by using informal proofs to automatically generate a **formal sketch**. The formal sketch is then completed into a formal proof through an ATP
* informal proof generated either by humans or automatically by machines (Minerva)
* machine-generated informal proof sometimes does not use the more "clever" or "creative" approaches --> proofs formalized by machine-generated informal proofs are sometimes longer or less concise
* the formal sketch reflects our goals with a semi-formal language
* [general repo](https://github.com/albertqjiang/draft_sketch_prove) and [example repo](https://github.com/wellecks/ntptutorial/blob/main/partII_dsp/notebooks/II_dsp__part2_dsp.ipynb)
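To make the **formal sketch** idea from the Draft, Sketch, and Prove entry above concrete (DSP itself targets Isabelle; this Lean rendering is my own): the informal proof's key steps are written out as named intermediate goals, and the remaining low-level gaps are left as `sorry`s for an automated prover to close.

```lean
import Mathlib

-- Informal proof: "n is even, so n = 2k, hence n·n = 2(kn), so n·n is even."
example (n : ℕ) (h : 2 ∣ n) : 2 ∣ n * n := by
  obtain ⟨k, hk⟩ := h                  -- step 1: write n = 2 * k
  have key : n * n = 2 * (k * n) := by
    sorry                              -- step 2: gap left for an ATP to close
  exact ⟨k * n, key⟩                   -- step 3: conclude 2 ∣ n * n
```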