# Quantus External Miner — Concise Checklist

This checklist summarizes day-to-day engineering tasks for the CPU and GPU paths: build flags, validation, metrics, and rollout steps. See roadmap.md for context and rationale.

---

## 0) Prerequisites

- Rust toolchain: stable (see the `rust-toolchain` file)
- Workspace build sanity:
  - CPU only: `cargo build -p miner-cli`
  - With CUDA support: `cargo build -p miner-cli --features cuda`
- Metrics: enable by running with `--metrics-port <port>`

---

## 1) CPU Engines

### 1.1 Build & Run

- Baseline:
  - Build: `cargo build -p miner-cli`
  - Run: `quantus-miner --engine cpu-baseline --metrics-port 9919`
- Fast:
  - Run: `quantus-miner --engine cpu-fast --metrics-port 9919`
- Montgomery (auto-detects backend):
  - Run: `quantus-miner --engine cpu-montgomery --metrics-port 9919`
  - Optional chunking knob: `--progress-chunk-ms 5000`
  - Backend override (for A/B):
    - `MINER_MONT_BACKEND=portable|bmi2|bmi2-adx|umulh`

### 1.2 Verification (local)

- Parity tests:
  - `cargo test -p engine-montgomery -q`
- End-to-end sanity:
  - Confirm that distances and winners match `cpu-fast` on small ranges (covered by unit tests).

### 1.3 Observability

- Ensure the backend metric is present:
  - `miner_engine_backend{engine="cpu-montgomery", backend="<name>"} = 1`
- Keep chunking equal across engines when comparing hashrates, so the comparison is apples-to-apples.

---

## 2) GPU — CUDA (Feature-Gated)

### 2.1 Build & Select

- Build a CUDA-enabled binary:
  - `cargo build -p miner-cli --features cuda`
- Select the engine (exits with an error if CUDA is unavailable):
  - `quantus-miner --engine gpu-cuda --metrics-port 9919`

### 2.2 Staged Implementation

- G1 — Bring-up:
  - [ ] Device: Montgomery CIOS over 8×64-bit limbs, using 64-bit products and `__umul64hi`.
  - [ ] Host: SHA3-512 over a small nonce window (copy `y` back in the normal, non-Montgomery domain).
  - [ ] Parity: CPU vs. GPU `y` values → identical SHA3 → identical distances.
- G2 — Device SHA3 + Early-exit:
  - [ ] Device: Keccak-f[1600] (24 rounds) optimized for 64-byte input.
  - [ ] Device: threshold compare and early exit via an atomic flag; ordered candidate write.
  - [ ] Host: poll the flag; return `EngineStatus`; preserve `hash_count` semantics.
- G3 — Tuning:
  - [ ] Tune K nonces/thread, block sizes, and occupancy.
  - [ ] Unroll Montgomery limbs and Keccak rounds; minimize register spills.
  - [ ] Move constants (m, n, n0_inv, R^2, target, threshold) to constant memory.
  - [ ] (Optional) Multi-GPU orchestration.

---

## 3) Testing & Validation

### 3.1 CPU Property Tests

- Portable Montgomery vs. pow-core `step_mul` (multiple steps).
- Backend parity:
  - x86_64: BMI2 and BMI2+ADX vs. portable.
  - aarch64: UMULH vs. portable.
- End-to-end small ranges: `cpu-montgomery` vs. `cpu-fast`.
- Command: `cargo test -q`

### 3.2 GPU Parity (as kernels land)

- G1: device Montgomery → host SHA3 parity over small windows.
- G2: full device parity vs. `cpu-fast` on small ranges.
- Optional “shadow mode”: cross-check a small sampled fraction of GPU work on the CPU.

---

## 4) Metrics & Dashboards

- Core series:
  - `miner_hash_rate` (global gauge)
  - `miner_job_hash_rate{engine,job_id}`, `miner_thread_hash_rate{engine,job_id,thread_id}`
  - `miner_job_hashes_total{engine,job_id}`, `miner_thread_hashes_total{engine,job_id,thread_id}`
  - `miner_jobs_total{status}`, `miner_effective_cpus`
- Backend info:
  - `miner_engine_backend{engine="cpu-montgomery", backend="<name>"} = 1`
- Best practice:
  - When comparing engines or backends, keep `--progress-chunk-ms` equal (e.g., 5000).
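
Sections 1.1, 2.2 and 3.1 all depend on a 512-bit Montgomery product over 8×64-bit limbs in CIOS form. What follows is a portable CPU sketch of that structure. It assumes little-endian limbs, an odd modulus `n < 2^512`, and `R = 2^512`. All names (`mont_mul`, `neg_inv64`, `r2_mod`, `to_mont`, `from_mont`) are hypothetical and are not taken from `engine-montgomery` or pow-core. Treat it as a readable reference for reasoning about parity, not as the engine's implementation.

```rust
/// 8 × 64 = 512-bit operands, little-endian limb order (assumption for this sketch).
const LIMBS: usize = 8;
type U512 = [u64; LIMBS];

/// (hi, lo) of a + b*c + d; the maximum is exactly 2^128 - 1, so u128 never overflows.
#[inline]
fn mac(a: u64, b: u64, c: u64, d: u64) -> (u64, u64) {
    let t = a as u128 + (b as u128) * (c as u128) + d as u128;
    ((t >> 64) as u64, t as u64)
}

/// a < b, comparing from the most significant limb down.
fn lt(a: &U512, b: &U512) -> bool {
    for i in (0..LIMBS).rev() {
        if a[i] != b[i] {
            return a[i] < b[i];
        }
    }
    false
}

/// a -= b modulo 2^512; the final borrow is intentionally discarded.
fn sub_assign(a: &mut U512, b: &U512) {
    let mut borrow = 0u64;
    for i in 0..LIMBS {
        let (d1, b1) = a[i].overflowing_sub(b[i]);
        let (d2, b2) = d1.overflowing_sub(borrow);
        a[i] = d2;
        borrow = (b1 | b2) as u64;
    }
}

/// -n0^{-1} mod 2^64 by Newton iteration (n0 odd); each step doubles the correct low bits.
fn neg_inv64(n0: u64) -> u64 {
    let mut inv: u64 = 1;
    for _ in 0..6 {
        inv = inv.wrapping_mul(2u64.wrapping_sub(n0.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

/// R^2 mod n (R = 2^512) via 1024 modular doublings of 1; assumes n > 1.
fn r2_mod(n: &U512) -> U512 {
    let mut r = [0u64; LIMBS];
    r[0] = 1;
    for _ in 0..(2 * 64 * LIMBS) {
        let mut carry = 0u64;
        for limb in r.iter_mut() {
            let next = *limb >> 63;
            *limb = (*limb << 1) | carry;
            carry = next;
        }
        if carry != 0 || !lt(&r, n) {
            sub_assign(&mut r, n);
        }
    }
    r
}

/// CIOS Montgomery product a * b * R^{-1} mod n, for a, b < n.
fn mont_mul(a: &U512, b: &U512, n: &U512, n0_inv: u64) -> U512 {
    let mut t = [0u64; LIMBS + 2]; // running sum can briefly exceed 2^512
    for i in 0..LIMBS {
        // Multiply step: t += a * b[i].
        let mut c = 0u64;
        for j in 0..LIMBS {
            let (hi, lo) = mac(t[j], a[j], b[i], c);
            t[j] = lo;
            c = hi;
        }
        let (s, o) = t[LIMBS].overflowing_add(c);
        t[LIMBS] = s;
        t[LIMBS + 1] = o as u64;
        // Reduce step: m makes t + m*n divisible by 2^64; add it and shift right one limb.
        let m = t[0].wrapping_mul(n0_inv);
        let (mut c, _) = mac(t[0], m, n[0], 0);
        for j in 1..LIMBS {
            let (hi, lo) = mac(t[j], m, n[j], c);
            t[j - 1] = lo;
            c = hi;
        }
        let (s, o) = t[LIMBS].overflowing_add(c);
        t[LIMBS - 1] = s;
        t[LIMBS] = t[LIMBS + 1] + o as u64;
    }
    // t < 2n here, so one conditional subtraction lands in [0, n).
    let mut r = [0u64; LIMBS];
    r.copy_from_slice(&t[..LIMBS]);
    if t[LIMBS] != 0 || !lt(&r, n) {
        sub_assign(&mut r, n);
    }
    r
}

/// Normal → Montgomery domain: a * R mod n.
fn to_mont(a: &U512, r2: &U512, n: &U512, n0_inv: u64) -> U512 {
    mont_mul(a, r2, n, n0_inv)
}

/// Montgomery → normal domain: a * R^{-1} mod n.
fn from_mont(a: &U512, n: &U512, n0_inv: u64) -> U512 {
    let mut one = [0u64; LIMBS];
    one[0] = 1;
    mont_mul(a, &one, n, n0_inv)
}
```

MULX (BMI2), ADCX/ADOX (ADX), UMULH (aarch64) and `__umul64hi` (CUDA) all target the same 64×64→128-bit product and carry chain that `mac` expresses. That shared structure is why a scalar version like this makes a convenient parity baseline.

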

---

## 5) A/B Procedure (Apples-to-Apples)

- Pick a node; run back-to-back 5–10 min windows with identical settings:
  - Engine A (e.g., `cpu-fast`)
  - Engine B (e.g., `cpu-montgomery`)
  - Optionally force a backend: `MINER_MONT_BACKEND=bmi2-adx` (on ADX-capable x86_64)
- Keep:
  - The same workers (`--workers`)
  - The same chunking (`--progress-chunk-ms`)
  - The same job mix and range sizes
- Compare:
  - Overall hashrate
  - Per-job rates
  - Stability and early-cancel latency

---

## 6) Build & Feature Matrix

- Default (CPU-only): `cargo build -p miner-cli`
- With CUDA: `cargo build -p miner-cli --features cuda`
- Service features (implicit via workspace):
  - `metrics` and `montgomery` are on by default.
  - `cuda` must be enabled explicitly to select the GPU engine.

---

## 7) Release Gates

- CPU:
  - [ ] Unit/property tests pass on x86_64 and aarch64.
  - [ ] Backend parity validated (ADX and UMULH on capable hosts).
  - [ ] Dashboard review: uplift vs. `cpu-fast`, stable metrics.
- GPU:
  - [ ] G1 parity achieved (device Montgomery → host SHA3).
  - [ ] G2 full device parity and stable early exit.
  - [ ] Tuning pass with documented settings and measured wins.
- Docs:
  - [ ] roadmap.md updated
  - [ ] engine-montgomery/readme.md updated
  - [ ] GPU README (once kernels land)

---

## 8) Troubleshooting

- `gpu-cuda` selection errors:
  - Built without the CUDA feature → rebuild with `--features cuda`.
  - CUDA init failed → check the driver/device; the service fails fast by design.
- Low CPU hashrate:
  - Increase chunking: `--progress-chunk-ms 5000`
  - Verify backend selection in logs/metrics (`bmi2-adx`, `bmi2`, `umulh`).
- Metrics missing:
  - Ensure `--metrics-port` is specified.
  - Confirm that `miner_engine_backend` appears for `cpu-montgomery`.
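
Both the A/B procedure and the troubleshooting steps depend on knowing which Montgomery backend is actually running. The sketch below shows one way an override in `MINER_MONT_BACKEND` could be combined with runtime CPU feature detection, using the four backend names from section 1.1. The fallback policy (unknown or unsupported values are ignored and auto-detection is used instead) and all helper names are assumptions, not the miner's documented behavior. Check the startup logs and the `miner_engine_backend` metric to see what the real binary selects.

```rust
use std::env;

/// Backend names accepted by MINER_MONT_BACKEND in this checklist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Portable,
    Bmi2,
    Bmi2Adx,
    Umulh,
}

impl Backend {
    pub fn name(self) -> &'static str {
        match self {
            Backend::Portable => "portable",
            Backend::Bmi2 => "bmi2",
            Backend::Bmi2Adx => "bmi2-adx",
            Backend::Umulh => "umulh",
        }
    }

    pub fn parse(s: &str) -> Option<Backend> {
        match s.trim() {
            "portable" => Some(Backend::Portable),
            "bmi2" => Some(Backend::Bmi2),
            "bmi2-adx" => Some(Backend::Bmi2Adx),
            "umulh" => Some(Backend::Umulh),
            _ => None,
        }
    }

    /// Whether the current host can run this backend.
    pub fn supported(self) -> bool {
        match self {
            Backend::Portable => true,
            Backend::Bmi2 => has_bmi2(),
            Backend::Bmi2Adx => has_bmi2() && has_adx(),
            // UMULH is part of the base AArch64 instruction set.
            Backend::Umulh => cfg!(target_arch = "aarch64"),
        }
    }
}

#[cfg(target_arch = "x86_64")]
fn has_bmi2() -> bool { is_x86_feature_detected!("bmi2") }
#[cfg(not(target_arch = "x86_64"))]
fn has_bmi2() -> bool { false }
#[cfg(target_arch = "x86_64")]
fn has_adx() -> bool { is_x86_feature_detected!("adx") }
#[cfg(not(target_arch = "x86_64"))]
fn has_adx() -> bool { false }

/// Prefers BMI2+ADX, then BMI2, then UMULH, else portable.
pub fn auto_detect() -> Backend {
    [Backend::Bmi2Adx, Backend::Bmi2, Backend::Umulh]
        .iter()
        .copied()
        .find(|b| b.supported())
        .unwrap_or(Backend::Portable)
}

/// Assumed policy: honor a supported override, otherwise auto-detect.
pub fn select_backend(override_value: Option<&str>) -> Backend {
    match override_value.and_then(Backend::parse) {
        Some(b) if b.supported() => b,
        _ => auto_detect(),
    }
}

pub fn select_from_env() -> Backend {
    select_backend(env::var("MINER_MONT_BACKEND").ok().as_deref())
}

/// One Prometheus text-format sample for the backend info metric.
pub fn backend_metric_line(b: Backend) -> String {
    format!("miner_engine_backend{{engine=\"cpu-montgomery\",backend=\"{}\"}} 1", b.name())
}
```

Falling back instead of failing keeps a mistyped override from taking a node offline. For an A/B run that depends on a specific forced backend, confirm the `backend` label on `miner_engine_backend` before collecting numbers.

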

---

## 9) Quick Commands

- Build (CPU): `cargo build -p miner-cli`
- Build (CUDA): `cargo build -p miner-cli --features cuda`
- Run (Montgomery): `quantus-miner --engine cpu-montgomery --metrics-port 9919 --progress-chunk-ms 5000`
- Force backend:
  - `MINER_MONT_BACKEND=bmi2-adx quantus-miner --engine cpu-montgomery ...`
- Run (CUDA, fail fast if unavailable): `quantus-miner --engine gpu-cuda --metrics-port 9919`
- Tests: `cargo test -q`

---
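
For the A/B procedure in section 5 and the uplift gate in section 7, the comparison reduces to hashes counted per unit time across the back-to-back windows. The sketch below assumes each window is described by two readings of a hashes counter plus the window length. One example of such a counter is `miner_job_hashes_total` summed over jobs for the engine under test. The `Window` type and the function names are hypothetical.

```rust
/// One A/B measurement window (hypothetical type).
/// Assumption: hashes_* are counter readings at the window edges,
/// e.g. miner_job_hashes_total summed over all jobs for the engine under test.
#[derive(Debug, Clone, Copy)]
pub struct Window {
    pub hashes_start: u64,
    pub hashes_end: u64,
    pub seconds: f64,
}

/// Aggregate hashrate over back-to-back windows: total hashes / total time.
/// Returns None on a counter reset (end < start) or zero total duration.
pub fn hashrate(windows: &[Window]) -> Option<f64> {
    let mut hashes = 0u64;
    let mut seconds = 0.0;
    for w in windows {
        hashes += w.hashes_end.checked_sub(w.hashes_start)?;
        seconds += w.seconds;
    }
    if seconds > 0.0 {
        Some(hashes as f64 / seconds)
    } else {
        None
    }
}

/// Percentage uplift of engine B over engine A (positive means B is faster).
pub fn uplift_pct(a: &[Window], b: &[Window]) -> Option<f64> {
    let (ra, rb) = (hashrate(a)?, hashrate(b)?);
    if ra == 0.0 {
        return None;
    }
    Some((rb / ra - 1.0) * 100.0)
}
```

Dividing total hashes by total time weights each window by its duration, so a short window carries less weight than a full 10-minute run. Averaging the per-window rates would not have that property.
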