# Quantus External Miner — Concise Checklist
This checklist summarizes day-to-day engineering tasks for the CPU and GPU engine paths: build flags, validation, metrics, and rollout steps.
Last updated: see roadmap.md for context and rationale.
---
## 0) Prerequisites
- Rust toolchain: stable (see rust-toolchain file)
- Workspace build sanity:
  - CPU only: `cargo build -p miner-cli`
  - With CUDA support: `cargo build -p miner-cli --features cuda`
- Metrics available: run with `--metrics-port <port>`
---
## 1) CPU Engines
### 1.1 Build & Run
- Baseline:
  - Build: `cargo build -p miner-cli`
  - Run: `quantus-miner --engine cpu-baseline --metrics-port 9919`
- Fast:
  - Run: `quantus-miner --engine cpu-fast --metrics-port 9919`
- Montgomery (auto-detects backend):
  - Run: `quantus-miner --engine cpu-montgomery --metrics-port 9919`
  - Optional chunking knob: `--progress-chunk-ms 5000`
  - Backend override (for A/B):
    - Set `MINER_MONT_BACKEND` to one of `portable`, `bmi2`, `bmi2-adx`, `umulh`.
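
The backend override above can be sketched as a small parser. This is a minimal illustration only: the `MontBackend` enum and the `backend_override` function are hypothetical names, and the real engine may handle the variable differently (for example, by falling back to auto-detection on an unknown value instead of returning an error).

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the backend names listed in this checklist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MontBackend {
    Portable,
    Bmi2,
    Bmi2Adx,
    Umulh,
}

impl FromStr for MontBackend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept the documented spellings, ignoring case and surrounding whitespace.
        match s.trim().to_ascii_lowercase().as_str() {
            "portable" => Ok(Self::Portable),
            "bmi2" => Ok(Self::Bmi2),
            "bmi2-adx" => Ok(Self::Bmi2Adx),
            "umulh" => Ok(Self::Umulh),
            other => Err(format!("unknown MINER_MONT_BACKEND value: {other}")),
        }
    }
}

/// Returns `Ok(None)` when no override is set (auto-detect),
/// `Ok(Some(_))` for a recognized override, and `Err(_)` otherwise.
pub fn backend_override(raw: Option<&str>) -> Result<Option<MontBackend>, String> {
    raw.map(str::parse::<MontBackend>).transpose()
}
```

A caller would typically pass `std::env::var("MINER_MONT_BACKEND").ok().as_deref()` as `raw`.
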
### 1.2 Verification (local)
- Parity tests:
  - `cargo test -p engine-montgomery -q`
- End-to-end sanity:
  - Confirm distances and winners match `cpu-fast` on small ranges (unit tests cover this).
### 1.3 Observability
- Ensure the backend metric is present:
  - `miner_engine_backend{engine="cpu-montgomery", backend="<name>"}` = 1
- Keep chunking equal across engines when comparing hashrates, so the comparison is apples-to-apples.
---
## 2) GPU — CUDA (Feature-Gated)
### 2.1 Build & Select
- Build a CUDA-enabled binary:
  - `cargo build -p miner-cli --features cuda`
- Select the engine (errors and exits if CUDA is unavailable):
  - `quantus-miner --engine gpu-cuda --metrics-port 9919`
### 2.2 Staged Implementation
- G1 — Bring-up:
  - [ ] Device: Montgomery CIOS over 8×64-bit limbs, using 64-bit products plus `__umul64hi` for the high halves.
  - [ ] Host: SHA3-512 for a small nonce window (copy back y in normal domain).
  - [ ] Parity: CPU vs GPU y values → identical SHA3 → identical distances.
- G2 — Device SHA3 + Early-exit:
  - [ ] Device: Keccak-f[1600] (24 rounds) optimized for 64-byte input.
  - [ ] Device: threshold compare and early-exit via atomic flag; ordered candidate write.
  - [ ] Host: poll flag; return `EngineStatus`; maintain `hash_count` semantics.
- G3 — Tuning:
  - [ ] Tune K nonces/thread, block sizes, occupancy.
  - [ ] Unroll Montgomery limbs and Keccak rounds; minimize spills.
  - [ ] Move constants (m, n, n0_inv, R^2, target, threshold) to constant memory.
  - [ ] (Optional) Multi-GPU orchestration.
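
The G1 item above calls for a Montgomery CIOS multiplication on 64-bit limbs. As a host-side reference for parity checks, a minimal Rust sketch of that multiplication might look like the following. It assumes little-endian limbs, R = 2^512, an odd modulus, and inputs already reduced below n. The names `LIMBS`, `neg_inv64`, and `mont_mul` are illustrative and not taken from the codebase. The `u128` product stands in for the low/high pair that a 64-bit multiply plus `__umul64hi` yields on the device.

```rust
/// Number of 64-bit limbs: 8 x 64 = 512 bits (assumed width).
pub const LIMBS: usize = 8;

/// Computes -n0^{-1} mod 2^64 for an odd low limb `n0` (the `n0_inv` constant).
pub fn neg_inv64(n0: u64) -> u64 {
    // Newton iteration: each round doubles the number of correct low bits (1 -> 64).
    let mut inv: u64 = 1;
    for _ in 0..6 {
        inv = inv.wrapping_mul(2u64.wrapping_sub(n0.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

/// CIOS Montgomery product: returns a * b * R^{-1} mod n, with R = 2^512.
/// Requires n odd and a, b < n.
pub fn mont_mul(
    a: &[u64; LIMBS],
    b: &[u64; LIMBS],
    n: &[u64; LIMBS],
    n0_inv: u64,
) -> [u64; LIMBS] {
    let mut t = [0u64; LIMBS + 2];
    for i in 0..LIMBS {
        // Multiplication step: t += a * b[i].
        let mut carry: u128 = 0;
        for j in 0..LIMBS {
            let acc = t[j] as u128 + (a[j] as u128) * (b[i] as u128) + carry;
            t[j] = acc as u64;
            carry = acc >> 64;
        }
        let acc = t[LIMBS] as u128 + carry;
        t[LIMBS] = acc as u64;
        t[LIMBS + 1] = (acc >> 64) as u64;

        // Reduction step: add m * n so the low limb becomes zero, then shift down one limb.
        let m = t[0].wrapping_mul(n0_inv);
        let mut carry = (t[0] as u128 + (m as u128) * (n[0] as u128)) >> 64;
        for j in 1..LIMBS {
            let acc = t[j] as u128 + (m as u128) * (n[j] as u128) + carry;
            t[j - 1] = acc as u64;
            carry = acc >> 64;
        }
        let acc = t[LIMBS] as u128 + carry;
        t[LIMBS - 1] = acc as u64;
        t[LIMBS] = t[LIMBS + 1] + (acc >> 64) as u64;
    }

    // Final conditional subtraction: the accumulator is below 2n, so one subtract suffices.
    let mut diff = [0u64; LIMBS];
    let mut borrow = false;
    for j in 0..LIMBS {
        let (d1, b1) = t[j].overflowing_sub(n[j]);
        let (d2, b2) = d1.overflowing_sub(borrow as u64);
        diff[j] = d2;
        borrow = b1 || b2;
    }
    if t[LIMBS] != 0 || !borrow {
        diff
    } else {
        let mut out = [0u64; LIMBS];
        out.copy_from_slice(&t[..LIMBS]);
        out
    }
}
```

Multiplying a Montgomery-form value by the plain integer 1 with `mont_mul` converts it back to the normal domain, which is the form the G1 host-side SHA3 step expects.
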
---
## 3) Testing & Validation
### 3.1 CPU Property Tests
- Portable Montgomery vs `pow-core` `step_mul` (multiple steps).
- Backend parity:
  - x86_64: BMI2 & BMI2+ADX vs portable.
  - aarch64: UMULH vs portable.
- End-to-end small ranges: `cpu-montgomery` vs `cpu-fast`.
- Command: `cargo test -q`
### 3.2 GPU Parity (as kernels land)
- G1: Device Montgomery → host SHA3 parity over small windows.
- G2: Full device parity vs `cpu-fast` for small ranges.
- Optional “shadow mode”: sample a small fraction of the work and cross-check it on the CPU.
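
The parity and shadow-mode checks above share one shape: evaluate a reference engine and a candidate engine on the same nonces and report the first disagreement. A minimal sketch, assuming each engine can be wrapped as a closure from nonce to a comparable result (the `first_mismatch` helper and its parameters are hypothetical, not part of the codebase):

```rust
use std::ops::Range;

/// Returns the first sampled nonce on which the two engines disagree, if any.
/// `sample_every = 1` checks every nonce (full parity); larger values
/// approximate a shadow mode that cross-checks only a fraction of the work.
pub fn first_mismatch<T: PartialEq>(
    range: Range<u64>,
    sample_every: u64,
    reference: impl Fn(u64) -> T,
    candidate: impl Fn(u64) -> T,
) -> Option<u64> {
    range
        .step_by(sample_every.max(1) as usize)
        .find(|&nonce| reference(nonce) != candidate(nonce))
}
```

Sampling trades coverage for cost: a mismatch that falls between sampled nonces goes unnoticed, so full parity (`sample_every = 1`) remains the right choice for small test ranges.
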
---
## 4) Metrics & Dashboards
- Core series:
  - `miner_hash_rate` (global gauge)
  - `miner_job_hash_rate{engine,job_id}`, `miner_thread_hash_rate{engine,job_id,thread_id}`
  - `miner_job_hashes_total{engine,job_id}`, `miner_thread_hashes_total{engine,job_id,thread_id}`
  - `miner_jobs_total{status}`, `miner_effective_cpus`
- Backend info:
  - `miner_engine_backend{engine="cpu-montgomery", backend="<name>"}` = 1
- Best practice:
  - When comparing engines/backends, keep `--progress-chunk-ms` equal (e.g., 5000).
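
To confirm which backend is active without reading the dashboard, the scraped metrics text can be searched for the `miner_engine_backend` sample. The sketch below assumes the endpoint serves the Prometheus text exposition format, with samples such as `miner_engine_backend{engine="cpu-montgomery",backend="bmi2-adx"} 1`. The `backend_from_metrics` helper is hypothetical.

```rust
/// Extracts the `backend` label from the `miner_engine_backend` sample
/// for the given engine, or `None` if no such sample is present.
pub fn backend_from_metrics(body: &str, engine: &str) -> Option<String> {
    let engine_label = format!("engine=\"{engine}\"");
    let key = "backend=\"";
    body.lines()
        // Skip comments (# HELP / # TYPE) and unrelated series.
        .filter(|l| l.starts_with("miner_engine_backend{") && l.contains(&engine_label))
        .find_map(|l| {
            let start = l.find(key)? + key.len();
            let end = l[start..].find('"')? + start;
            Some(l[start..end].to_string())
        })
}
```
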
---
## 5) A/B Procedure (Apples-to-Apples)
- Pick a node; run back-to-back 5–10 minute windows with identical settings:
  - Engine A (e.g., `cpu-fast`)
  - Engine B (e.g., `cpu-montgomery`)
  - Optionally force backend: `MINER_MONT_BACKEND=bmi2-adx` (on ADX-capable x86_64)
- Keep constant:
  - Same workers (`--workers`)
  - Same chunking (`--progress-chunk-ms`)
  - Same job mix/range sizes
- Compare:
  - Overall hashrate
  - Per-job rates
  - Stability and latency to early-cancel
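
Once both windows have been recorded, the overall-hashrate comparison reduces to a relative difference of means. A minimal sketch, assuming hashrate samples (for example, periodic readings of `miner_hash_rate`) have been collected into slices; the `mean` and `uplift_percent` helpers are hypothetical:

```rust
/// Arithmetic mean of the samples, or `None` for an empty window.
pub fn mean(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        None
    } else {
        Some(samples.iter().sum::<f64>() / samples.len() as f64)
    }
}

/// Relative uplift of engine B over engine A, in percent.
/// Returns `None` if either window is empty or A's mean is zero.
pub fn uplift_percent(a: &[f64], b: &[f64]) -> Option<f64> {
    let (mean_a, mean_b) = (mean(a)?, mean(b)?);
    if mean_a == 0.0 {
        return None;
    }
    Some((mean_b - mean_a) / mean_a * 100.0)
}
```

A mean alone hides variance, so the stability item in the list above still needs a look at the per-job series.
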
---
## 6) Build & Feature Matrix
- Default (CPU-only): `cargo build -p miner-cli`
- With CUDA: `cargo build -p miner-cli --features cuda`
- Service features (implicit via workspace):
  - `metrics` and `montgomery` are on by default
  - `cuda` must be enabled explicitly to select the GPU engine
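
The fail-fast rule for `gpu-cuda` can be sketched as a small selection function. This is an illustration under assumptions: the `Engine` enum and `select_engine` are hypothetical names, and the `cuda_compiled` flag is passed in explicitly here, whereas a real build would more likely derive it from `cfg!(feature = "cuda")`.

```rust
/// Hypothetical enum for the engine names used in this checklist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    CpuBaseline,
    CpuFast,
    CpuMontgomery,
    GpuCuda,
}

/// Maps an `--engine` value to an engine, refusing `gpu-cuda`
/// when the binary was built without CUDA support.
pub fn select_engine(name: &str, cuda_compiled: bool) -> Result<Engine, String> {
    match name {
        "cpu-baseline" => Ok(Engine::CpuBaseline),
        "cpu-fast" => Ok(Engine::CpuFast),
        "cpu-montgomery" => Ok(Engine::CpuMontgomery),
        "gpu-cuda" if cuda_compiled => Ok(Engine::GpuCuda),
        // No silent CPU fallback: report the problem and let the caller exit.
        "gpu-cuda" => Err("gpu-cuda requested, but this binary was built without `--features cuda`".to_string()),
        other => Err(format!("unknown engine: {other}")),
    }
}
```

Returning an error instead of falling back to a CPU engine keeps A/B runs honest, because a misconfigured GPU node cannot quietly report CPU numbers.
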
---
## 7) Release Gates
- CPU:
  - [ ] Unit/property tests pass on x86_64 and aarch64.
  - [ ] Backend parity validated (ADX and UMULH on capable hosts).
  - [ ] Dashboard review: uplift vs `cpu-fast`, stable metrics.
- GPU:
  - [ ] G1 parity achieved (device Montgomery → host SHA3).
  - [ ] G2 full device parity and stable early-exit.
  - [ ] Tuning pass with documented settings and measured wins.
- Docs:
  - [ ] roadmap.md updated
  - [ ] engine-montgomery/readme.md updated
  - [ ] GPU README (once kernels land)
---
## 8) Troubleshooting
- `gpu-cuda` selection errors:
  - Built without CUDA feature → rebuild with `--features cuda`.
  - CUDA init failed → check driver/device; the service will fail fast by design.
- Low CPU hashrate:
  - Increase chunking: `--progress-chunk-ms 5000`
  - Verify backend selection in logs/metrics (`bmi2-adx`, `bmi2`, `umulh`)
- Metrics missing:
  - Ensure `--metrics-port` is specified.
  - Confirm `miner_engine_backend` appears for `cpu-montgomery`.
---
## 9) Quick Commands
- Build (CPU): `cargo build -p miner-cli`
- Build (CUDA): `cargo build -p miner-cli --features cuda`
- Run (Montgomery): `quantus-miner --engine cpu-montgomery --metrics-port 9919 --progress-chunk-ms 5000`
- Force backend:
  - `MINER_MONT_BACKEND=bmi2-adx quantus-miner --engine cpu-montgomery ...`
- Run (CUDA, fail fast if unavailable): `quantus-miner --engine gpu-cuda --metrics-port 9919`
- Tests: `cargo test -q`
---