# Setting Up a zkSync Boojum Shivini Prover

## tl;dr

This note will guide you through setting up your own (GPU-accelerated) prover machine(s) for zkSync's upcoming proof system upgrade, Boojum. This is currently for testing purposes only.

## Context

The team behind [zkSync Era](https://zksync.io), Matter Labs, [has announced](https://zksync.mirror.xyz/HJ2Pj45EJkRdt5Pau-ZXwkV2ctPx8qFL19STM5jdYhc) an improved proof system, Boojum, currently running in [shadow-proving](https://github.com/matter-labs/zksync-era/tree/main/prover) mode on zkSync Era Mainnet. Boojum comes with [Shivini](https://github.com/matter-labs/era-shivini), a library implementing the GPU-accelerated prover utilising the [Boojum CUDA](https://github.com/matter-labs/era-boojum-cuda) library.

## Prerequisites

To get set up with Boojum's prover, you'll need the following:

- Basic GNU/Linux sysadmin knowledge
- Comfort using the CLI
- Access to a (virtual) machine with at least:
  - 1 GPU with at least 24 GB of memory
  - 8 CPU threads
  - 40 GB of RAM (32 GB plus swap may work, untested; [let me know](#Contributing) if you try this)
  - 30 GB of free disk space

*Note: The hardware requirements are expected to decline, so you might want to check the latest [FRI prover repo docs](https://github.com/matter-labs/zksync-era/tree/main/prover/prover_fri#proving-a-block-using-gpu-prover-locally). In particular, once [Shivini PR #7](https://github.com/matter-labs/era-shivini/pull/7) is merged, a GPU with 16 GB of memory should be sufficient. However, currently (Oct 2023), the real RAM requirement is higher than the one stated in the repo.*

## Cloud VM Setup

In the following, we'll assume you're using a GPU VM from a cloud provider.

**Warning**: Both AWS EC2 (Amazon) and GCP CE (Google) will require you to request a "quota" increase before letting you spin up a VM with the required GPU, for the (available) region of your choosing. For Amazon, **this may take several days**.
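As a quick check before (or after) filing the quota request, you can list a region's GPU quotas from a terminal. A sketch, assuming the `gcloud` CLI is installed and authenticated; `us-central1` is an example region:

```shell
# Show GPU-related quota entries (limit, metric, usage) for a region.
# Substitute your own region; look for entries such as NVIDIA_L4_GPUS.
gcloud compute regions describe us-central1 \
  --format="yaml(quotas)" | grep -B1 -A1 GPUS
```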
For Google, the allowance tends to be granted faster.

Note: The cost of running such a VM starts at around $1/h. While only testing, you may opt for a "spot" instance provisioning policy to decrease the costs (this allows the cloud provider to shut down your VM at any time, but comes at a significant discount).

### In GCP

Note: Getting the dependency software versions exactly right may be important. In particular, you probably want to avoid the otherwise good practice of keeping your packages up-to-date (e.g. do NOT run `apt upgrade` or `apt dist-upgrade`).

1. Navigate to the GCP [Compute Engine](https://console.cloud.google.com/compute/instances) console (and log in with your Google account if needed).
1. Use the "[Create Instance](https://console.cloud.google.com/compute/instancesAdd?project=personal-369712)" button and configure your VM with [sufficient resources](#Prerequisites); you may want to click "GPU" under "Machine configuration" for basic prefiltering.
   - I used a single `NVIDIA L4` for the GPU type, the `g2-standard-12` machine type and the "spot" VM provisioning model. Note that not every region/zone offers these.
1. Choose your boot disk image and size; I tested:
   - A) `Deep Learning VM with CUDA 12.1 M112` (described as "Debian 11, Python 3.10. With CUDA 12.1 preinstalled") and 64 GB of "Balanced persistent disk". Note: This actually installs CUDA 12.2, which is the most recent CUDA release as of writing this (Oct 2023).
   - B) `Ubuntu 22.04 LTS` and 64 GB of "Balanced persistent disk".
1. Check the cost estimate on the right and continue by pressing "Create".
1. Once provisioned (conditional on [quotas](#Cloud-VM-Setup)), the new VM should show up in the [CE VM table](https://console.cloud.google.com/compute/instances), with a "Connect" column.
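   You can run the same check from a local terminal; a sketch, assuming the `gcloud` CLI is installed and authenticated (the project ID is a placeholder):

   ```shell
   # List the project's Compute Engine VMs with zone, status and IPs.
   gcloud compute instances list --project "personal-123456"
   ```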
   Press the "SSH" button there for a web terminal session, or use the small arrow next to it, select "View gcloud command" and copy-paste it into your local machine's terminal for a local terminal session.
   - If you're missing the `gcloud` command, ensure you have the `google-cloud-sdk` installed; e.g. for macOS:
     - `brew install google-cloud-sdk && gcloud auth login`
   - Example:
     - `gcloud compute ssh --zone "us-central1-a" "boojum-0" --project "personal-123456"`
1. Install the correct CUDA toolkit, incl. NVIDIA drivers. Note: This is a somewhat sensitive dependency area, where installation conflicts with e.g. the default operating system packages may cause issues that are hard to debug.
   - A) For CUDA Debian
     1. Agree to install the NVIDIA drivers.
     1. Verify they were installed correctly: `nvidia-smi` should display the GPU name alongside the driver and CUDA version, e.g. `NVIDIA-SMI 535.86.10 Driver Version: 535.86.10 CUDA Version: 12.2`
        - cf. [#VM-Resizing](#VM-Resizing)
   - B) For Ubuntu 22.04
     - Recommended installer
       1. NVIDIA's apt source for CUDA 12.2: Follow the [official instructions](https://developer.nvidia.com/cuda-12-2-2-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local) with type `deb (local)`.
       1. Make double sure that you've pinned the CUDA version.
       1. Update envvars:
          - `echo "export CUDACXX=/usr/local/cuda-12.2/bin/nvcc" >> ~/.bashrc && . ~/.bashrc`
     - Alternative installers (may cause issues)
       - NVIDIA's, using their apt source, for CUDA 12.0: https://developer.nvidia.com/cuda-12-0-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network
       - GCP's: https://cloud.google.com/compute/docs/gpus/install-drivers-gpu
         - `curl https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py --output install_gpu_driver.py`
         - cf.
           https://stackoverflow.com/questions/72278881/no-cmake-cuda-compiler-could-be-found-when-installing-pytorch
       - Ubuntu 22.04's default apt (but it didn't work for me; [let me know](#Contributing) if it does for you):
         - `sudo apt install nvidia-cuda-toolkit`
1. Set up Rust
   - A) For CUDA Debian
     1. Run:
        - `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`
     1. Select `2` to modify the installation settings and ensure you're installing the `nightly` toolchain, not the default `stable` one (press `enter` for the other values to keep the defaults).
     1. Once back at the installation recap screen, confirm with `enter` and wait for the installer to finish.
     1. Re-source your shell to update the `$PATH` (alternatively, restart your SSH session):
        - `. ~/.bashrc`
   - B) For Ubuntu 22.04
     1. `sudo snap install rustup --classic && rustup default nightly`
1. Set up libclang
   - A) For CUDA Debian
     - `sudo apt-get install clang`
   - B) For Ubuntu 22.04
     - `sudo apt install clang`
1. Set up CMake (>3.24)
   - A) For CUDA Debian
     1. Ensure you're in your home directory: `cd`
     1. Download the installer, e.g.:
        - `wget https://github.com/Kitware/CMake/releases/download/v3.28.0-rc2/cmake-3.28.0-rc2-linux-x86_64.sh`
     1. Start it:
        - `sh cmake-3.28.0-rc2-linux-x86_64.sh`
     1. Press `q` and `y` to agree to the terms, and `enter` to continue with the default destination (a home subdirectory).
     1. Prepend the new version to your `PATH`:
        - `echo "export PATH=~/cmake-3.28.0-rc2-linux-x86_64/bin:$PATH" >> ~/.bashrc`
     1. Re-source:
        - `. ~/.bashrc`
   - B) For Ubuntu 22.04
     1. `sudo snap install cmake --classic`
1. Clone the Shivini repo
   1. `git clone https://github.com/matter-labs/era-shivini.git`
   1. `cd era-shivini`
1. Run tests
   1. Adjust the circuit file path to yours, or skip this line for the tests to use the defaults:
      - `export CIRCUIT_FILE=/home/Orest/era-shivini/test_data/default.circuit`
   1. Single-circuit:
      - `cargo test compare_proofs_for_single_zksync_circuit_in_single_shot --features zksync --release -- --nocapture --ignored`
      - *Note: The first run will take longer due to the dependencies being compiled.*
   1. All circuits (note the changed ignore flags):
      - `cargo test compare_proofs_for_all_zksync_circuits --features zksync --release -- --nocapture --include-ignored`
      - *Note: As of writing this (Oct 2023), this test does not finish successfully with the default test data, due to the second assertion failing (after some 80s (12 CPUs) or 110s (8 CPUs) for me).*

:::spoiler
```
Will be evaluating Boolean constraint gate over specialized columns
Evaluating general purpose gates
thread 'test::zksync::compare_proofs_for_all_zksync_circuits' panicked at src/test.rs:347:5:
assertion `left == right` failed
  left: [[0x6cf0c48f9a0ff194, 0x22686828290a25d7, 0xf620e629d603aa52, 0x7324f1b5f6c80ab4], [0x93da0b8f5d217f26, 0x65cf9a3c5c96285b, 0x46e3606365038355, 0x843746eeb6921bf8], [0x2bfe4047ec6b9ab1, 0x132d08a8dd6af9a9, 0x58f5fc81afcdfd2f, 0x08885ff588dd5fec], [0xdae2e197c305c708, 0xd7441235b45e1db6, 0x86f6b375253b87aa, 0x98dc60901a89b023], [0xf50b74803ba977d0, 0x5c628400de14c096, 0xa7fa1dd810e0bf15, 0x697894cea1c16d75], [0x349cb22c6bceaa5c, 0x9f8a340fde46da0a, 0x645ac5e6af11d6d4, 0x2184517c2823eb86], [0x4e353d4bdba7b93b, 0x781199f0bc9da6c7, 0x8157426fd582d702, 0x954b33cd639a09db], [0x6a40518c9bde6af1, 0x211ab5a566b00146, 0x0fff34fda3e349c0, 0xd99e636a81fdf129], [0x79f02481f9537e04, 0xbe7f227996327469, 0x6e69b448850c97e6, 0x005fdf4575154d05], [0xd869a2bc06d93516, 0xd134a7a3c5b20995, 0xc9ba9277b9605cb8, 0x2d79a6b20b997806], [0xffe8a78a4db5c06c, 0x9cef55610e4291e8, 0x0da5c5b5db3d036e, 0x9bbc318f78fad8a4], [0x434381e0fa812e6e, 0x9a4a7f92ed68391b, 0xb54f5de89acd823e, 0x7e98c3eb57b5125c], [0x994e0705018a2796, 0xf8e5900491e73d62, 0xf1182e58dbcc8bae, 0xb776af05fd314a6a], [0xc6289060ef629575, 0xddc09f0b0e1a1fa1, 0xd3952c284deba5ad, 0xbbdf277eb4901226],
[0x1fb96c30cd48cd0f, 0x23d77708bf4a7acf, 0xf2813332ef3b8de7, 0xa09159c372780123], [0x8a318cb539ef88bd, 0x54362d5e20941c07, 0x14adda938f81b76d, 0x8cb611939ce60d59]]
 right: [[0x6c3d768df3628231, 0x88f3d55a055ae9de, 0x613c8599557944fc, 0xfeabca8e21d76179], [0xce534e9c1a62aa63, 0xabebd1cb74f7191d, 0xf04fe01e9def89a5, 0x32ac865f00638c8a], [0x4de44b973575d81c, 0xc9e0292255d42693, 0xb9e131b0c67add21, 0x84abc2177a070008], [0xfa1dff2a241960e6, 0x16d0e995b7fd231e, 0x29388d3bd3955220, 0x7e32c88b1d0cc1f6], [0x67d51fda15241c92, 0x962ba4c46eeb4848, 0xb77adaad3a5d670f, 0x867614c7b5dc1a92], [0x07fc09d1a3e45132, 0xfcbc7bc27b1e46b3, 0x4573402c382ba2b4, 0x3e4111b14484c48d], [0x71d399e337b5ceb3, 0xdd9912bab9c0f719, 0xc9706101b29d74c5, 0x3862210f852e92de], [0x007fabc7397bd809, 0x9a64c360ff957a2f, 0xa84b3c3f7b91d574, 0x30cb3e810f8a5a14], [0x81ca69f3a753df60, 0xffdd64990edf3ca4, 0xc188e37088dd6554, 0xb688f0dad4e29cf0], [0x58137d80b59376e1, 0xd09ff76056d98b8a, 0x890cf52a2e1c07f8, 0xfdacfd495ff7760f], [0xee2b749270c638fc, 0x9167a8dd83dc3b3f, 0x3f34bea2f4709f7d, 0x6e8c85ae54b9d06e], [0xb06396492b6b39e2, 0x73a8a6aaacc07250, 0x7d9a81a0a3dc6137, 0x72412b037feb0996], [0x1878824418bf2d52, 0x8b8b9aed277381d1, 0xbb379d601444a5f0, 0xae05333f38c69bc3], [0x7d2e4cd28eea5aa1, 0x89220cdcca2e845d, 0x89596ca8c9642256, 0xb0d15c34feaa0310], [0x7de8feccc8cd8c0f, 0x804debf9887b5af3, 0x3aad6a22faa8ce2f, 0xc249ea7225bea60d], [0x63986811662d265b, 0x4c35cbfdeccc1f99, 0x3bc226b930b849a5, 0x42c66f385574e880]]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
freeing static cuda allocation
freeing static cuda allocation
test test::zksync::compare_proofs_for_all_zksync_circuits ... FAILED

failures:

failures:
    test::zksync::compare_proofs_for_all_zksync_circuits

test result: FAILED.
0 passed; 1 failed; 0 ignored; 0 measured; 11 filtered out; finished in 83.72s
```
:::

## Notes and Troubleshooting

### Observed Hardware Resource Use

Manually observed values peaked at:

- RAM: 38 GB - `free -g`
- GPU memory: 22232 MB - `nvidia-smi`
- Load average (~CPU): 6 (at 8 vCPUs (threads) from 4 physical CPUs, no swap I/O) - `uptime`
- Disk: 30 GB (including the OS) - `df -h ~`

### GCP VM Resizing

- Warning: Changing the instance size of a GCP CUDA Debian VM after having used it seems to break the NVIDIA drivers, *sometimes*. To fix, run:
  - `sudo /opt/deeplearning/install-driver.sh`

### Misc

- Matter Labs could consider adding broader CI to the repos in order to detect test regressions early.

## Back-of-the-Envelope Benchmarking

### GCP, 1 NVIDIA L4, 40 GB of RAM

#### Default Single-Circuit Test

- For 8 CPUs (`g2-custom-8-40960`, Ubuntu 22.04 LTS): **371.41s** total
  - Some excerpts of the slower sections:

:::spoiler
```
gpu proving
Buffering resolvers, 33715272 taken.
CR stats Stats { values_added: 78, witnesses_added: 0, registrations_added: 33715272, started_at: Instant { tv_sec: 245, tv_nsec: 873036192, }, registration_time: 31.640905832s, total_resolution_time: 31.642763208s, }
[...]
cpu proving
[...]
Buffering resolvers, 32539740 taken.
CR stats Stats { values_added: 78, witnesses_added: 0, registrations_added: 33715272, started_at: Instant { tv_sec: 282, tv_nsec: 748934022, }, registration_time: 29.466606377s, total_resolution_time: 29.468081695s, }
[...]
Will operate with LDEs of factor 8
Witness LDE taken 29.120301136s
Merkle tree of size 2^21 leaf hashes taken 34.096147622s for 152 elements per leaf
Nodes construction of size 2^21 taken 1.779060685s
Second stage LDE taken 17.81468861s
Merkle tree of size 2^21 leaf hashes taken 15.30030099s for 68 elements per leaf
Nodes construction of size 2^21 taken 1.812841164s
[...]
Gates over general purposes columns contribution to quotient evaluation taken 51.045270734s
Quotient work and LDE taken 81.744262168s
Merkle tree of size 2^21 leaf hashes taken 3.452936818s for 16 elements per leaf
[...]
Batched FRI opening computation taken 20.46226588s
[...]
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 11 filtered out; finished in 371.41s
```
:::

- For 12 CPUs (`g2-standard-12`, GCP CUDA Debian): **273.48s**
  - Some excerpts of the slower sections:

:::spoiler
```
gpu proving
Buffering resolvers, 33715272 taken.
CR stats Stats { values_added: 78, witnesses_added: 0, registrations_added: 33715272, started_at: Instant { tv_sec: 157, tv_nsec: 462212317, }, registration_time: 22.260551903s, total_resolution_time: 22.262115763s, }
[...]
cpu proving
[...]
Buffering resolvers, 30295132 taken.
CR stats Stats { values_added: 78, witnesses_added: 0, registrations_added: 33715272, started_at: Instant { tv_sec: 183, tv_nsec: 944819761, }, registration_time: 20.943228486s, total_resolution_time: 20.962915627s, }
[...]
Will operate with LDEs of factor 8
Witness LDE taken 20.658967282s
Merkle tree of size 2^21 leaf hashes taken 21.973517116s for 152 elements per leaf
Nodes construction of size 2^21 taken 1.237683713s
Second stage LDE taken 12.782764677s
Merkle tree of size 2^21 leaf hashes taken 10.400755648s for 68 elements per leaf
Nodes construction of size 2^21 taken 1.264824651s
[...]
Gates over general purposes columns contribution to quotient evaluation taken 33.14526248s
Quotient work and LDE taken 59.738590856s
Merkle tree of size 2^21 leaf hashes taken 2.386003436s for 16 elements per leaf
[...]
Batched FRI opening computation taken 22.222478317s
[...]
test result: ok.
1 passed; 0 failed; 0 ignored; 0 measured; 11 filtered out; finished in 273.48s
```
:::

## Cost Overview

*[TBD]*

## Related Repositories

- Prover: https://github.com/matter-labs/era-shivini
  - Uses https://github.com/matter-labs/era-boojum-cuda via Cargo
- Orchestration: https://github.com/matter-labs/zksync-era/tree/main/prover/prover_fri
- Aux: Shadow block validation: https://github.com/matter-labs/era-boojum-validator-cli

## Outstanding Documentation Work

### Setting up an AWS EC2 (Amazon) Cloud VM

*[TBD]*

### Setting up a CPU Prover

*[TBD]*

### Setting up a Local zkSync Era Testnet

Note: An additional ~100 GB of disk space will be needed.

#### GCP Ubuntu 22.04

##### Installing Requirements

- https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-22-04
- https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-compose-on-ubuntu-22-04
- https://www.stewright.me/2023/04/install-nodejs-18-on-ubuntu-22-04/
- `sudo apt install docker-compose` (for the v1 docker compose, with a hyphen instead of a space)
- `sudo npm install --global yarn` (it should be `v1.22.19`)
- `sudo apt-get install axel`
- `sudo apt-get install build-essential pkg-config cmake clang lldb lld`
- `sudo apt-get install libssl-dev`
- `cargo install sqlx-cli --version 0.5.13`

##### zkSync Server Setup

- `git clone https://github.com/matter-labs/zksync-era.git`
- `cd zksync-era/`
- `git checkout boojum-integration`
- `echo "export ZKSYNC_HOME=~/zksync-era" >> ~/.bashrc && echo "export PATH=~/zksync-era/bin:$PATH" >> ~/.bashrc && . ~/.bashrc`
- `zk`
- `zk init`
- `zk test rust`
- Update `etc/env/dev.env`:
  ```
  ETH_SENDER_SENDER_PROOF_SENDING_MODE=OnlyRealProofs
  ETH_SENDER_SENDER_PROOF_LOADING_MODE=FriProofFromGcs
  OBJECT_STORE_FILE_BACKED_BASE_PATH=/path/to/server/artifacts
  PROVER_OBJECT_STORE_FILE_BACKED_BASE_PATH=/path/to/prover/artifacts
  FRI_PROVER_SETUP_DATA_PATH=/path/to/above-generated/gpu-setup-data
  ```

*Cf.
https://github.com/matter-labs/zksync-era/blob/boojum-integration/docs/setup-dev.md*

##### Key Generation

Note: These are long-running tasks (after dependency compilation, maybe another 20 minutes, depending on hardware), so you might want to run the following commands in byobu/screen/tmux or similar.

- `sudo mkdir /usr/src/setup-data`
- `sudo chmod 0777 /usr/src/setup-data`

###### Base layer setup data

- `for i in {1..13} ; do zk f cargo run --features "gpu" --release --manifest-path=prover/vk_setup_data_generator_server_fri/Cargo.toml --bin zksync_setup_data_generator_fri -- --is_base_layer --numeric-circuit $i ; done`
- Check output: `ls /usr/src/setup-data` (totalling around 11 GB for me)

###### Recursive layer setup data

- `for i in {1..15} ; do zk f cargo run --features "gpu" --release --manifest-path=prover/vk_setup_data_generator_server_fri/Cargo.toml --bin zksync_setup_data_generator_fri -- --numeric-circuit $i ; done`
- Check output: `ls /usr/src/setup-data` (now totalling around 22 GB for me)

*Cf. https://gist.github.com/OrestTa/f2be59f8bd233e8bb5705a7e0aebfc16*

###### Verification key

- `wget https://storage.googleapis.com/matterlabs-setup-keys-us/setup-keys/setup_2\^26.key`
- `cargo run --release --bin zksync_verification_key_generator`

*Cf.
https://github.com/matter-labs/zksync-era/blob/boojum-integration/docs/launch.md*

##### Launching GPU Prover Pipeline

- `zk server --components=api,eth,tree,state_keeper,housekeeper,proof_data_handler`
- `cd ~/zksync-era/prover && zk f cargo run --release --features "gpu" --bin zksync_prover_fri` (in a dedicated terminal session, byobu recommended)
- `cd ~/zksync-era/prover && zk f cargo run --release --bin zksync_prover_fri_gateway` (in a dedicated terminal session, byobu recommended)
- With `cd ~/zksync-era/prover`:
  - Witgen; run each of the following in a dedicated terminal session (byobu recommended):
    ```
    API_PROMETHEUS_LISTENER_PORT=3116 zk f cargo run --release --bin zksync_witness_generator -- --round=basic_circuits
    API_PROMETHEUS_LISTENER_PORT=3117 zk f cargo run --release --bin zksync_witness_generator -- --round=leaf_aggregation
    API_PROMETHEUS_LISTENER_PORT=3118 zk f cargo run --release --bin zksync_witness_generator -- --round=node_aggregation
    API_PROMETHEUS_LISTENER_PORT=3119 zk f cargo run --release --bin zksync_witness_generator -- --round=scheduler
    ```
  - Witness vector gen; run each of the following in a dedicated terminal session (byobu recommended):
    ```
    FRI_WITNESS_VECTOR_GENERATOR_PROMETHEUS_LISTENER_PORT=3416 zk f cargo run --release --bin zksync_witness_vector_generator
    FRI_WITNESS_VECTOR_GENERATOR_PROMETHEUS_LISTENER_PORT=3417 zk f cargo run --release --bin zksync_witness_vector_generator
    FRI_WITNESS_VECTOR_GENERATOR_PROMETHEUS_LISTENER_PORT=3418 zk f cargo run --release --bin zksync_witness_vector_generator
    FRI_WITNESS_VECTOR_GENERATOR_PROMETHEUS_LISTENER_PORT=3419 zk f cargo run --release --bin zksync_witness_vector_generator
    FRI_WITNESS_VECTOR_GENERATOR_PROMETHEUS_LISTENER_PORT=3420 zk f cargo run --release --bin zksync_witness_vector_generator
    ```
- `cd ~/zksync-era/prover && zk f cargo run --release --bin zksync_proof_fri_compressor` (in a dedicated terminal session, byobu recommended)

*Cf.
https://github.com/matter-labs/zksync-era/tree/boojum-integration/prover/prover_fri#proving-a-block-using-gpu-prover-locally*

##### Running in Docker

For CPU proving: https://gist.github.com/hatemosphere/07728f5774444760c2de3a5b0ee3f375

#### Some notes for macOS

- Node version: don't use 20; I used 16
  - `brew link node@16 --overwrite`
- `contracts.toml`: set `dummy_verifier=false`

### Connecting to Local Testnet

*[TBD]*

### Connecting to Remote Testnet

*[TBD]*

### Connecting to Mainnet

*[TBD]*

## Contributing

Feel free to [let Orest know](https://tarasiuk.me/about-orest) if you're interested in contributing to improving this note, or just use the HackMD comment function directly.