# Towards the Dream Zone: Better Measurement and Collaboration in Lean

## **1. The Force We Can Harness**

Whether you are contributing to mathlib, formalizing a landmark proof, or building Lean tooling, some contributions naturally unlock a wave of future progress, while others quietly keep the work stable. If we could make those effects visible, showing not just what was added but what it enabled, each project could better guide effort toward the changes with the biggest impact, not by imposing a plan but by giving contributors clearer feedback on how their work moves the project forward.

In the literature, there is a concept called an impact evaluator. It measures what is happening, assesses which actions truly advance the objective, and feeds that back to the people doing the work. With the right impact evaluators in Lean, this feedback could become automatic, precise, and actionable, so collaboration becomes sharper and more effective without losing its openness.

My own interest in this was deepened by attending the Impact Evaluator Research Retreat ([IERR](https://www.researchretreat.org/ierr-2025/)) in Iceland, where I explored how measurement, evaluation, and reward systems can support collective effort. The ideas from that experience fit naturally with the opportunities I see in the Lean ecosystem.

## **2. Why Lean is the Perfect Place to Apply This Idea**

Lean has a rare combination of properties that make it especially well suited for building tools that enhance collaboration through better measurement and feedback.

- **Explicit dependencies**: Every theorem, definition, and lemma in Lean is connected through machine-readable links. This makes it possible to map exactly how ideas build on one another.
- **Machine-verified correctness**: When a proof compiles, it is correct. This removes the ambiguity found in many collaborative projects where correctness depends on subjective review.
- **Complete version history**: Lean projects use git, so every change is recorded. This provides a base layer for attributing contributions and understanding how the work evolves.
- **Active and ambitious community**: From maintaining mathlib to formalizing landmark results, there is a constant flow of work that could benefit from clearer visibility and coordination.
- **Compatibility with human and AI contributors**: Both people and automated tools can add proofs. With the right measurement and evaluation in place, their efforts can be combined more effectively.

These features make Lean an ideal environment for applying the principles behind impact evaluators to strengthen collaboration and make progress more visible.

## **3. The Loop: Measure, Evaluate, Reward**

At the core of this approach is a simple loop.

**Measure**: See what is happening. In Lean, that means collecting data from the dependency graph, proof metadata, and version history. This shows who contributed what, how those contributions connect, and where they are used.

**Evaluate**: Assess how each action advances the project's objectives across multiple dimensions. Impact is not just about how often something is used. A single lemma might only appear once yet be central to a major theorem. Another contribution might remove a bottleneck, enabling many others to make progress. Evaluation can consider factors like foundational importance, enabling power, difficulty, and stability.

**Reward**: Feed the results back to the community in a form that supports better decisions. This could be as simple as dashboards that highlight high-impact contributions, or as ambitious as funding or recognition systems tied to impact.
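To make the loop concrete, here is a minimal Python sketch run on hand-written toy data. The contribution records, field names, and weights are illustrative assumptions, not output from any existing Lean tooling.

```python
# Minimal sketch of the measure -> evaluate -> reward loop on toy data.
# All records, fields, and weights here are illustrative assumptions.

def measure():
    """Measure: collect raw facts (who contributed what, and how results connect)."""
    return [
        {"name": "lemma_a", "author": "alice", "used_by": ["thm_main"], "unblocked": 4},
        {"name": "lemma_b", "author": "bob",   "used_by": ["lemma_c", "lemma_d"], "unblocked": 1},
        {"name": "thm_main", "author": "carol", "used_by": [], "unblocked": 0},
    ]

def evaluate(record):
    """Evaluate: combine several dimensions, not just raw usage counts."""
    usage = len(record["used_by"])        # how often the result is reused
    enabling = record["unblocked"]        # how much downstream work it unblocked
    return 1.0 * usage + 2.0 * enabling   # toy weighting, tuned per project

def reward(scored):
    """Reward: feed results back, here as a simple ranked 'dashboard'."""
    for record, score in sorted(scored, key=lambda pair: -pair[1]):
        print(f"{record['name']:10} by {record['author']:6} impact={score:.1f}")

records = measure()
reward([(r, evaluate(r)) for r in records])
```

In a real system the measurement step would read Lean's dependency data and git history instead of a hard-coded list, but the shape of the loop stays the same.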
The strength of the loop depends on its first two steps. Better measurement produces better evaluation, and better evaluation makes rewards more accurate and meaningful. Over time, this alignment helps contributions push the project forward in the most effective way.

## **4. The Measurement–Evaluation Graph**

Not all domains are equally easy to measure or evaluate. Some give you rich, structured data but make it hard to know which actions really matter. Others are easy to assess in hindsight but hard to instrument in real time. We can think of this as a 2D space:

- **X-axis:** ease of measurement, how straightforward it is to collect accurate, relevant data about what is happening
- **Y-axis:** ease of evaluation, how straightforward it is to determine which actions actually move the objective forward

![measure_eval_zones](https://hackmd.io/_uploads/SkpN6XX_ll.png)

The top right corner is the **dream zone**. These are domains where both measurement and evaluation are easy, so it becomes simple to build effective feedback loops. The bottom left is the **fog zone**, where neither is easy and progress is hard to steer.

Right now, Lean's collaborative work sits closer to the "easy to measure, harder to evaluate" quadrant. Proofs, dependencies, and contributions are all recorded in machine-readable form, which makes measurement relatively easy. But evaluation is still multi-dimensional and sometimes subjective. The true impact of a lemma or definition can't be captured by a single metric. The opportunity is to build tools that make evaluation more structured and transparent, gradually shifting more of Lean's work toward the dream zone.

## **5. Objectives: What We Are Optimizing For**

Before building any measurement or evaluation system, we need to be clear on the objective. Without that clarity, it is easy to measure the wrong things and steer effort in unhelpful directions. In Lean, objectives can vary widely depending on the project:

- **Library health**: ensuring broad coverage, correctness, and maintainability in mathlib or other shared libraries
- **Landmark theorem progress**: advancing the formalization of high-profile results, such as Fermat's Last Theorem
- **Foundational strength**: creating definitions and lemmas that unlock future work
- **Ease of contribution**: lowering the barrier for newcomers through better docs, examples, and modular proofs
- **Human–AI collaboration efficiency**: ensuring human and automated contributions fit together productively

Even within a single objective, evaluation is rarely one-dimensional. A contribution might be:

- foundational but only used once
- difficult, requiring rare expertise
- enabling a whole new class of proofs
- replacing a fragile workaround with something stable
- unlocking progress in a stalled area of the project

Because evaluation is multi-dimensional, the tooling must be flexible enough to account for different weightings in different contexts. What matters most in a large, general-purpose library might differ from what matters in a focused formalization project. Clear objectives keep measurement meaningful, evaluation fair, and rewards aligned with the project's real priorities.
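As a small illustration of context-dependent weighting, the sketch below ranks the same contributions under two objective profiles. The contribution names, dimension scores, and weights are made up for illustration; they do not come from real projects.

```python
# Sketch: the same contributions ranked under two objective profiles.
# Dimension scores and weights are invented illustrations, not real data.

contributions = {
    "refactor_api": {"foundational": 0.9, "enabling": 0.6, "difficulty": 0.4},
    "key_lemma":    {"foundational": 0.3, "enabling": 0.9, "difficulty": 0.8},
    "doc_examples": {"foundational": 0.2, "enabling": 0.4, "difficulty": 0.1},
}

profiles = {
    # A large shared library might weight foundational stability highest.
    "library_health":   {"foundational": 0.6, "enabling": 0.2, "difficulty": 0.2},
    # A focused formalization might weight unblocking the main proof highest.
    "landmark_theorem": {"foundational": 0.2, "enabling": 0.6, "difficulty": 0.2},
}

for profile_name, weights in profiles.items():
    ranked = sorted(
        contributions.items(),
        key=lambda item: -sum(weights[d] * item[1][d] for d in weights),
    )
    print(profile_name, "->", [name for name, _ in ranked])
```

The point is not the particular numbers but that the weight profile, chosen per project, changes which work rises to the top.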
## **6. Making it Practical with a Reduced Scope**

Evaluating all Lean contributions across all projects is a complex challenge. Different objectives, contexts, and priorities mean that no single set of measurements will fit every case. One way to make progress is to start with a reduced scope: a well-defined project where the objectives are clear and the complexity is lower. This creates a kind of sandbox where we can experiment with measurement and evaluation tools before applying them more broadly.

Even then, the work is far from mechanical. Translating a written proof into Lean is not a direct copy–paste process. Papers often skip steps, rely on implicit lemmas, or use definitions that don't exist in Lean yet. Formalization requires filling in those gaps, restructuring arguments, and sometimes building entirely new foundations.

Evaluation remains multi-dimensional. The most critical lemma in a project might be used only once, but proving it could require rare expertise or unlock the rest of the proof. Raw metrics like "number of uses" or "lines of code" will miss this without human insight.

Still, a reduced-scope project gives us the advantage of clarity:

- The boundaries are known
- The end goal is fixed
- Progress can be mapped in detail against a specific objective

For example, consider formalizing a specific landmark proof from start to finish. We could:

- Map its dependency graph as it emerges in Lean
- Track progress automatically as nodes are completed
- Attribute contributions using git history and structure-aware analysis
- Combine automated metrics with curated expert evaluation

By starting here, we create a proving ground for tools and methods that make measurement and evaluation more reliable. Once refined, these approaches can be scaled up to larger, more complex Lean projects, gradually moving more of the community's work toward the dream zone.

## **7. Why the Reduced Scope Still Isn't in the Dream Zone, and How to Get Closer**

Even in a reduced-scope project like formalizing a single mathematical paper, we are not yet in the dream zone. In theory, the boundaries are clear, the objective is fixed, and progress should be easy to track. In practice, two challenges still make this harder than it could be:

1. **Measurement gaps**
   - Tools like *blueprints* already do a great job of laying out the proof structure, showing what's been formalized, and linking to Lean code.
   - But blueprints are often manually maintained, so they can lag behind the actual Lean state. They work at a section/subsection level, which can hide the smaller enabling lemmas that make progress possible. And they don't automatically surface who contributed what or rank tasks by impact or difficulty.
2. **Evaluation complexity**
   - Some contributions have obvious value because they are reused across many lemmas, but others may be equally important despite being used only once.
   - Difficulty, enabling power, and foundational value aren't automatically visible. Human judgment is still needed, and without the right tools, that judgment is scattered across conversations and individual mental maps.

Experienced contributors and coordinators often know the key bottlenecks and priorities. The problem is that this knowledge isn't always visible to everyone, including newcomers or AI tools, and it isn't captured in a structured way that can be queried or analyzed.
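To hint at what a structured, queryable project state could look like, here is a minimal sketch on a hand-written toy graph. The node names, dependency edges, and statuses are invented for illustration; a real tool would extract them from the Lean project itself.

```python
# Toy dependency graph: each node lists its direct dependencies and
# whether it is already proved. All data here is invented for illustration.
graph = {
    "def_basic":  {"deps": [],                           "proved": True},
    "lemma_core": {"deps": ["def_basic"],                "proved": False},
    "lemma_side": {"deps": ["def_basic"],                "proved": True},
    "thm_bridge": {"deps": ["lemma_core", "lemma_side"], "proved": False},
    "thm_main":   {"deps": ["thm_bridge"],               "proved": False},
}

def depends_on(node, target, seen=None):
    """True if `node` transitively depends on `target`."""
    seen = set() if seen is None else seen
    for dep in graph[node]["deps"]:
        if dep == target:
            return True
        if dep not in seen:
            seen.add(dep)
            if depends_on(dep, target, seen):
                return True
    return False

def blocked_by(node):
    """Count results that transitively depend on `node`."""
    return sum(1 for other in graph if other != node and depends_on(other, node))

proved = sum(info["proved"] for info in graph.values())
print(f"progress: {proved}/{len(graph)} nodes proved")

# Rank unproved nodes by how much downstream work they block.
for node in sorted((n for n, info in graph.items() if not info["proved"]),
                   key=blocked_by, reverse=True):
    print(f"{node}: blocks {blocked_by(node)} downstream result(s)")
```

Even this toy query answers two of the questions above: how far along the project is, and which unproved results block the most downstream work.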
---

### **Tools to Move Us Closer to the Dream Zone**

If we want to make a reduced-scope project easier to measure and evaluate, we can start building tools that close these gaps:

- **Dependency graph explorer**: Makes the structure of the project visible, down to the level of individual lemmas, and updates automatically as the project evolves.
- **Critical gap detection**: Flags "critical sorries", missing proofs that sit on high-leverage paths in the dependency graph (a minimal Lean illustration follows the table below).
    - In Lean, a `sorry` acts as a placeholder proof, so downstream theorems can still typecheck.
    - But this comes at a cost: if the `sorry` later turns out to be unprovable as stated, every theorem that depends on it is compromised. This means long chains built on top of a `sorry` are structurally fragile, and resolving those sorries should be a high priority.
- **Progress dashboards**: Show completion percentage for the whole project, with trends over time, built from live Lean data.
- **Contribution attribution**: Connects each result in the graph to its contributors, so recognition and historical tracking are automatic.
- **Bottleneck finder**: In a reduced-scope project, this is less about *discovering* the obvious blockers (which the blueprint or coordinators already know) and more about confirming priorities and surfacing hidden bottlenecks that may not be immediately visible.
- **Dynamic difficulty estimator**: Predicts which remaining tasks are likely to require the most effort, based on proof patterns and past attempts.
- **Foundational value scoring**: Combines automated metrics and curator input to highlight results that are disproportionately important, even if rarely used.

---

**How these tools close the measurement and evaluation gaps**

| Tool | Measurement (seeing) | Evaluation (judging) |
| --- | --- | --- |
| Dependency graph explorer | Shows the live structure of all lemmas/theorems and their dependencies | — |
| Critical gap detection | Identifies missing proofs (`sorries`) and their positions in the graph | Highlights which gaps are high-risk and should be prioritized |
| Progress dashboards | Tracks completion %, trends, and project milestones | — |
| Contribution attribution | Records who authored or updated each result | — |
| Bottleneck finder | — | Finds proofs whose completion would unlock the most downstream results |
| Dynamic difficulty estimator | — | Predicts which tasks will require the most effort |
| Foundational value scoring | — | Flags disproportionately important results, even if rarely reused |

By making progress, fragility, and impact more legible, while building on what blueprints already do well, these tools shrink the measurement and evaluation gaps, pushing the project toward the dream zone where **coordinated, high-impact contributions become much easier.**
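To make the fragility behind critical sorries concrete, here is a minimal Lean sketch. It is a toy example, not drawn from any real formalization: the sorried lemma lets the downstream theorem compile, and `#print axioms` exposes the hidden dependence.

```lean
-- Toy illustration: `key_lemma` is admitted with `sorry`, acting as a
-- placeholder proof. (Here the statement happens to be provable; in a
-- real project a sorried statement might turn out to be wrong.)
theorem key_lemma (n : Nat) : n + 0 = 0 + n := by
  sorry

-- This downstream result typechecks today, but its correctness rests
-- entirely on `key_lemma` eventually being proved as stated. If the
-- sorried statement were wrong, everything built on it is compromised.
theorem downstream (n : Nat) : n + 0 + 1 = 0 + n + 1 := by
  rw [key_lemma n]

-- Lean can already surface this dependence: this command reports that
-- `downstream` depends on `sorryAx`, the axiom backing `sorry`.
#print axioms downstream
```

Critical gap detection amounts to doing this kind of check systematically across the whole graph and ranking the sorried nodes by how much depends on them.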
---

## **8. Conclusion and Call to Action**

Reduced-scope projects like formalizing a single mathematical paper in Lean make it easier to see where measurement and evaluation still fall short. Blueprints already provide a strong starting point, but they don't automatically update, capture fine-grained dependencies, or highlight the highest-leverage work. That's where targeted tools can help:

- Dependency graph explorers for full project visibility.
- Critical gap detection for fragile chains built on `sorries`.
- Progress dashboards for live tracking.
- Bottleneck and value analysis for clearer priorities.

By making the project's structure and priorities visible to everyone, we reduce wasted effort, improve coordination, and allow contributors to focus on work with the highest impact.

The next step is straightforward:

- Prototype these tools on a real reduced-scope project.
- Compare how well they capture the priorities experienced contributors already know.
- Iterate until they reliably surface what matters most.

If we can close the measurement and evaluation gaps here, the same methods can be extended to larger Lean efforts. The aim is simple: better visibility, clearer priorities, and more effective collaboration.

GitHub: https://github.com/iammadab/lean-extract