---
tags: AI, audit, security, watchtower, evergreen
---
# Evergreen
### Problem
Over 2025, the industry saw multiple successful exploits where attackers presumably used AI/agentic tooling to accelerate discovery and weaponization of vulnerabilities—sometimes in long-standing, “battle-tested” production code rather than only in new changes. Lido’s security posture should assume adversaries have continuously improving automated capability and respond with an equally continuous, system-aware defense layer.
### Proposed solution
Evergreen: an evergreen auditing engine/toolbelt built from multiple AI agents (though not limited to them) that continuously analyzes Lido’s production code and its operating context, producing actionable security findings ahead of adversaries.
The preferred approach is to find a third party capable of developing, fine-tuning, and maintaining the engine, rather than staffing the project with in-house dev resources.
The key idea is not “one more scanner,” but an always-on, evolving set of agents that:
* Understand Lido’s protocol architecture and trust boundaries (staking flow, oracle/reporting, governance, modules/adapters, upgrade patterns).
* Are aware of the environment Lido operates in (integrations, dependency graph, deployment topology, known risk assumptions).
* Run on a set of triggers (schedule, meaningful code change, meaningful LLM upgrade, newly discovered attack vectors, hacks in the industry).
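As a sketch of the trigger set above, the mapping from trigger to scan scope could look like the following (all names here are illustrative assumptions, not an existing Evergreen API; the split — diffs on code change, full re-scans otherwise — follows the “new diffs and old code” idea below):

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Trigger(Enum):
    """Events that can kick off an Evergreen scan (illustrative names)."""
    SCHEDULE = auto()            # periodic full sweep
    CODE_CHANGE = auto()         # meaningful diff landed
    LLM_UPGRADE = auto()         # meaningful model upgrade
    NEW_ATTACK_VECTOR = auto()   # newly discovered exploit pattern
    INDUSTRY_INCIDENT = auto()   # hack elsewhere worth re-checking against


@dataclass
class ScanPolicy:
    """Maps each trigger to a scan scope: full re-scan vs. diff-only."""
    full_scan_triggers: set[Trigger] = field(default_factory=lambda: {
        Trigger.SCHEDULE,
        Trigger.LLM_UPGRADE,
        Trigger.NEW_ATTACK_VECTOR,
        Trigger.INDUSTRY_INCIDENT,
    })

    def scope_for(self, trigger: Trigger) -> str:
        # Code changes get a targeted diff scan; everything else
        # re-examines the whole codebase under the new conditions.
        return "full" if trigger in self.full_scan_triggers else "diff"
```

The point of keeping the policy as data rather than hard-coded branches is that the trigger set is expected to grow as new classes of events (e.g., dependency advisories) are added.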
### Core capabilities
1) Continuous scanning of the codebase and the live deployment setup
* Targets both new diffs and old code under newly-discovered risk patterns (e.g., dependency behavior changes, new EVM quirks, novel exploit patterns).
2) Architecture-aware analysis
* Agents operate with a structured “Lido context pack”: documentation, invariants, privileged roles, interfaces, canonical threat model, and historical incidents/near-misses (internal + external).
* Findings are framed in terms of Lido-specific impact, not generic “smells”.
3) Multi-agent toolbelt
* Invariant/Property agent (protocol invariants, state transitions)
* Access-control & upgrade agent (roles, upgrade plans, proxy risks, permissions changes)
* Economic/MEV agent (griefing, sandwichable flows, oracle manipulations)
* Dependency agent (external libs and contracts, compiler/toolchain changes)
* Diff agent (PR-level regression and “what changed in security terms?”)
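A minimal sketch of how such a toolbelt could be dispatched: each agent declares the artifact classes it covers, and a change is routed to every agent whose focus overlaps it (agent names and artifact labels are hypothetical, taken loosely from the list above):

```python
# Hypothetical registry: agent name -> artifact classes it inspects.
AGENTS: dict[str, set[str]] = {
    "invariant": {"contracts", "state_transitions"},
    "access_control": {"roles", "proxies", "upgrade_plans"},
    "economic": {"oracles", "amm_flows"},
    "dependency": {"libs", "toolchain"},
    "diff": {"pull_requests"},
}


def agents_for(changed_artifacts: set[str]) -> list[str]:
    """Select every agent whose focus overlaps the changed artifacts."""
    return sorted(
        name for name, focus in AGENTS.items()
        if focus & changed_artifacts
    )
```

Routing by artifact class (rather than running every agent on every change) keeps the toolbelt cheap to extend: adding a new agent is a registry entry, not a pipeline rewrite.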
4) Output designed for action
* Triage-friendly reports: severity, affected components, recommended fix, and repro steps.
* Maybe: “Security regression flags” that can gate merges for high-confidence critical issues.
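The triage-friendly report and the optional merge gate could be sketched as a single record plus a predicate (field names and the confidence threshold are assumptions for illustration; the gate fires only on high-confidence critical findings, per the bullet above):

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One triage-friendly Evergreen finding (illustrative schema)."""
    severity: str          # "critical" | "high" | "medium" | "low"
    confidence: float      # agent's self-reported confidence, 0..1
    component: str         # affected Lido component
    recommended_fix: str
    repro_steps: str


def gates_merge(finding: Finding, min_confidence: float = 0.9) -> bool:
    """A finding blocks a merge only if it is critical AND high-confidence.

    Anything below the bar still lands in the report for human triage;
    it just doesn't hold up the pipeline.
    """
    return finding.severity == "critical" and finding.confidence >= min_confidence
```

Keeping the gate as a pure predicate over the report record makes the “maybe” part easy to toggle: the same findings flow to reviewers whether or not gating is enabled.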
5) Feedback incorporation mechanism
* A clear way for the team to acknowledge findings and update the prompts accordingly.
* Fine-tuning of sensitivity based on how previous findings have been addressed.
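One way the sensitivity fine-tuning could work, as a sketch (the thresholds, step size, and hit-rate bands are illustrative assumptions, not a specified mechanism): a detector whose findings are mostly dismissed gets a higher reporting threshold, and one whose findings are mostly confirmed gets a lower one.

```python
def adjust_sensitivity(current: float, resolved: list[bool],
                       step: float = 0.05) -> float:
    """Nudge a detector's reporting threshold from triage outcomes.

    `resolved` holds True for confirmed (true-positive) findings and
    False for dismissed ones. Mostly-dismissed detectors report less
    (threshold up); mostly-confirmed detectors report more (threshold
    down). The threshold is clamped to [0, 1].
    """
    if not resolved:
        return current  # no feedback yet, leave as-is
    hit_rate = sum(resolved) / len(resolved)
    if hit_rate < 0.5:
        current += step
    elif hit_rate > 0.8:
        current -= step
    return min(max(current, 0.0), 1.0)
```

In practice the same triage labels would also feed prompt/ruleset updates, so the numeric threshold is only the simplest slice of the feedback loop.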
### Operating model
* Human-in-the-loop by design: Evergreen does not merge code or deploy. It recommends, the team decides.
* Feedback loop: every resolved finding (true/false positive, fix type, root cause) updates prompts/rulesets and test templates so the system improves over time.
* Evidence-based: wherever possible, findings must include a minimal PoC in a controlled test environment, or a formalizable property/invariant violation.
### Guardrails and risks
* False positives/noise: mitigated via a strict reporting format and a “prove it” requirement (PoC/property).
* Over-reliance: position as augmentation to audits, not replacement.
* Prompt/model supply chain: lock down context packs, version changes, and agent configs.
* Data exposure: keep execution in Lido-controlled infra; no sensitive code/context sent to third parties without explicit approvals.
### Success criteria
* Surface “newly exploitable old code” risks driven by emerging exploit techniques.
* Reduce time-to-triage for security reviewers and auditors via structured, reproducible outputs.
* Over 1–2 quarters: measurable increase in pre-merge detection of high/medium issues and reduction in recurring classes of bugs.
### Path forward/Next steps
* Talk to third-party security experts working on AI-enabled security and monitoring tools.
* Try pilot scans of certain areas of the codebase and new projects (already in progress, with some limited success).
* Evaluate the output within the CTO Bar/Audit Committee/ProSecCo/project owners.