# c-kzg alignment issue debug log

## Background

The issue originated from: https://github.com/taikoxyz/raiko/issues/169

The plain log from raiko is as follows:

```bash
2024-06-04T02:17:34.525090Z INFO prove_core:runtime.state: sp1_core::runtime: starting execution
[2024-06-04T02:17:34Z INFO sp1_core::runtime::utils] clk = 0 pc = 0x295948
stdout: spec_id: SHANGHAI
[2024-06-04T02:17:46Z INFO sp1_core::runtime::utils] clk = 10000000 pc = 0x3c2e08
stdout: ^Mprocessing tx 0/1...^MTx transact time: 0.000 seconds
stdout: Tx misc time: 0.000 seconds
stdout: Processing withdrawals...^MProcessing withdrawals... Done in 0.000 seconds
stdout: Generating block header...^MGenerating block header... Done in 0.000 seconds
stdout: kzg check enabled!
[2024-06-04T02:17:57Z INFO sp1_core::runtime::utils] clk = 20000000 pc = 0x295788
[2024-06-04T02:18:08Z INFO sp1_core::runtime::utils] clk = 30000000 pc = 0x37bddc
stdout: malloc 98304 memory to address 0x67c4b4.
stdout: malloc 393216 memory to address 0x6944b4.
stdout: malloc 131072 memory to address 0x6f44b4.
thread 'tokio-runtime-worker' panicked at /home/yue/works/succinct/sp1/core/src/runtime/mod.rs:622:17:
assertion `left == right` failed: addr is not aligned
left: 2
right: 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

After John’s fix, risc0 seems fine, but sp1 still randomly hits the misalignment issue. Actually it is not that random: for me, it is about 90% reproducible.

## My attempts:

First, I tried logging all execution.
It was a huge log with limited register information, so I tuned the logging a bit to make the log smaller and faster to parse: https://github.com/smtmfft/sp1/commit/3e8c1f70ad846b51ca7acfb7a7568952d8b0b773

Here is the tail of the VM execution trace:

```bash
clk=35354707 [pc=0x384cb8] add %x20 %x15 0 | "x20=144 | x15=6192794 | x23=1695420 "
clk=35354708 [pc=0x384cbc] add %x8 %x26 48 | "x20=6192794 | x26=6899644 | x15=6192794 | x8=1695616 "
clk=35354709 [pc=0x384cc0] add %x15 %x20 96 | "x26=6899644 | x15=6192794 | x8=6899692 | x20=6192794 "
clk=35354710 [pc=0x384cc4] add %x14 %x26 0 | "x14=1536 | x15=6192890 | x20=6192794 | x26=6899644 "
clk=35354711 [pc=0x384cc8] add %x12 %x20 144 | "x14=6899644 | x20=6192794 | x26=6899644 | x12=4096 "
clk=35354712 [pc=0x384ccc] lw %x13 %x15 0 | "x12=6192938 | x15=6192890 | x20=6192794 | x13=5752992 "
thread 'tokio-runtime-worker' panicked at /home/yue/works/succinct/sp1/core/src/runtime/mod.rs:622:17:
assertion `left == right` failed: addr is not aligned
left: 2
right: 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

The last instruction is `lw %x13 %x15 0`, which means x13 = *(x15 + 0), where x13 and x15 are registers. We can see that `x15 = 6192890 (0x5e7efa)`, which is not 4-byte aligned. However, it seems to be a reasonable address. After analyzing the preceding log, I can see a vague pattern: the value 6192890 is incremented from a lower base address, and the increments look like a loop, since the pc is the same each time.

So I switched back to looking into the c-kzg code. The c-kzg project structure is like this:

![Untitled](https://hackmd.io/_uploads/HyoYgaANA.png)

Since we have a multi-layer dependency here, another guess is that some alignment-related flags are not passed down to blst, like these:

```rust
.c_flags(&[
    "/opt/riscv/bin/riscv32-unknown-elf-gcc",
    "-march=rv32im",
    "-mstrict-align",
    "-falign-functions=2",
])
```

Fortunately the guess was valid: those build flags were indeed missing when compiling ckzg.a and blst.a.
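Going back to the faulting `lw` in the trace, the arithmetic behind the failed assertion can be redone in a few lines. This is only a sketch: the helper name `misalignment` is made up for illustration and is not part of sp1, and the only assumption is that a word load needs an address that is a multiple of 4, which matches the `left: 2` / `right: 0` values in the panic message.

```rust
/// Hypothetical helper (not an sp1 API): distance of `addr` from 4-byte alignment.
fn misalignment(addr: u32) -> u32 {
    addr % 4
}

fn main() {
    // Values copied from the trace: x20 is the base, and x15 = x20 + 96 feeds the `lw`.
    let x20: u32 = 6_192_794;
    let x15: u32 = x20 + 96;

    // Sanity checks against the numbers quoted in the text.
    assert_eq!(x15, 6_192_890);
    assert_eq!(x15, 0x5e7efa);

    println!("x20 = {:#x}, x20 % 4 = {}", x20, misalignment(x20));
    println!("x15 = {:#x}, x15 % 4 = {}", x15, misalignment(x15));
}
```

Both values are 2 modulo 4, so the added offset of 96 is not what breaks alignment: the base pointer in x20 is already off by 2. This is only a reading of the trace, not a confirmed root cause.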
![Untitled 1](https://hackmd.io/_uploads/S1Wol60N0.png)

![Untitled 2](https://hackmd.io/_uploads/ryjje6A4R.png)

BUT unfortunately!!! The result has `NOT` changed 🤢.

Meanwhile, I added some logging to the code to find the position where the issue happens. (This is troublesome because sp1 strips the debug stack frames.) After some investigation, the deepest function I could find is `blst_p1s_to_affine`. The call stack is:

```bash
blob_to_kzg_commitment → poly_to_kzg_commitment → g1_lincomb_fast → blst_p1s_to_affine.
```

However, `blst_p1s_to_affine` is generated by a macro, as shown below, so there seems to be no easy way to dig deeper. (Digging deeper may not help anyway, since knowing exactly where the issue happens may not reveal the root cause.)

```c
#define POINTS_TO_AFFINE_IMPL(prefix, ptype, bits, field) \
static void ptype##s_to_affine(ptype##_affine dst[], \
                               const ptype *const points[], size_t npoints) \
{ \
    size_t i; \
    vec##bits *acc, ZZ, ZZZ; \
    const ptype *point = NULL; \
    const size_t stride = sizeof(ptype)==sizeof(POINTonE1) ? 1536 : 768; \
\
    while (npoints) { \
        const ptype *p, *const *walkback; \
        size_t delta = stride<npoints ? stride : npoints; \
\
        point = *points ? *points++ : point+1; \
        acc = (vec##bits *)dst; \
        vec_copy(acc++, point->Z, sizeof(vec##bits)); \
        for (i = 1; i < delta; i++, acc++) \
            point = *points ? *points++ : point+1, \
            mul_##field(acc[0], acc[-1], point->Z); \
\
        --acc; reciprocal_##field(acc[0], acc[0]); \
\
        walkback = points-1, p = point, --delta, dst += delta; \
        for (i = 0; i < delta; i++, acc--, dst--) { \
            mul_##field(acc[-1], acc[-1], acc[0]); /* 1/Z */\
            sqr_##field(ZZ, acc[-1]); /* 1/Z^2 */\
            mul_##field(ZZZ, ZZ, acc[-1]); /* 1/Z^3 */\
            mul_##field(acc[-1], p->Z, acc[0]); \
            mul_##field(dst->X, p->X, ZZ); /* X = X'/Z^2 */\
            mul_##field(dst->Y, p->Y, ZZZ); /* Y = Y'/Z^3 */\
            p = (p == *walkback) ? *--walkback : p-1; \
        } \
        sqr_##field(ZZ, acc[0]); /* 1/Z^2 */\
        mul_##field(ZZZ, ZZ, acc[0]); /* 1/Z^3 */\
        mul_##field(dst->X, p->X, ZZ); /* X = X'/Z^2 */\
        mul_##field(dst->Y, p->Y, ZZZ); /* Y = Y'/Z^3 */\
        ++delta, dst += delta, npoints -= delta; \
    } \
} \
\
void prefix##s_to_affine(ptype##_affine dst[], const ptype *const points[], \
                         size_t npoints) \
{ ptype##s_to_affine(dst, points, npoints); }

POINTS_TO_AFFINE_IMPL(blst_p1, POINTonE1, 384, fp)
POINTS_TO_AFFINE_IMPL(blst_p2, POINTonE2, 384x, fp2)
```

Debugging is paused here for now.

Another possible approach: with the alignment-checking flag enabled, we see the following warnings. I am not sure whether any of them has something to do with the suspicious function above, but resolving all of these warnings looks somewhat promising.

```bash
warning: blst@0.3.11: In file included from /home/yue/works/taiko/c-kzg-4844/blst/src/server.c:7:
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/keygen.c: In function 'HKDF_Expand':
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/keygen.c:123:22: warning: cast increases required alignment of target type [-Wcast-align]
warning: blst@0.3.11: 123 | sha256_hcopy((unsigned int *)OKM, (const unsigned int *)ctx->tail.c);
warning: blst@0.3.11: | ^
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/keygen.c:123:43: warning: cast increases required alignment of target type [-Wcast-align]
warning: blst@0.3.11: 123 | sha256_hcopy((unsigned int *)OKM, (const unsigned int *)ctx->tail.c);
warning: blst@0.3.11: | ^
warning: blst@0.3.11: In file included from /home/yue/works/taiko/c-kzg-4844/blst/src/server.c:23:
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c: In function 'blst_fr_from_scalar':
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c:80:34: warning: cast increases required alignment of target type [-Wcast-align]
warning: blst@0.3.11: 80 | mul_mont_sparse_256(ret, (const limb_t *)a, BLS12_381_rRR,
warning: blst@0.3.11: | ^
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c: In function 'blst_scalar_from_fr':
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c:98:23: warning: cast increases required alignment of target type [-Wcast-align]
warning: blst@0.3.11: 98 | from_mont_256((limb_t *)ret, a, BLS12_381_r, r0);
warning: blst@0.3.11: | ^
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c: In function 'blst_sk_mul_n_check':
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c:136:46: warning: cast increases required alignment of target type [-Wcast-align]
warning: blst@0.3.11: 136 | mul_mont_sparse_256(t[0], BLS12_381_rRR, (const limb_t *)a, BLS12_381_r, r0);
warning: blst@0.3.11: | ^
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c:137:37: warning: cast increases required alignment of target type [-Wcast-align]
warning: blst@0.3.11: 137 | mul_mont_sparse_256(t[0], t[0], (const limb_t *)b, BLS12_381_r, r0);
warning: blst@0.3.11: | ^
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c: In function 'blst_sk_inverse':
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c:153:23: warning: cast increases required alignment of target type [-Wcast-align]
warning: blst@0.3.11: 153 | limb_t *out = (limb_t *)ret;
warning: blst@0.3.11: | ^
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c:154:34: warning: cast increases required alignment of target type [-Wcast-align]
warning: blst@0.3.11: 154 | mul_mont_sparse_256(out, (const limb_t *)a, BLS12_381_rRR,
warning: blst@0.3.11: | ^
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c: In function 'blst_fp_from_uint64':
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c:247:13: warning: cast increases required alignment of target type [-Wcast-align]
warning: blst@0.3.11: 247 | a = (const unsigned long long *)ret;
warning: blst@0.3.11: | ^
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c: In function 'blst_fr_from_uint64':
warning: blst@0.3.11: /home/yue/works/taiko/c-kzg-4844/blst/src/exports.c:473:13: warning: cast increases required alignment of target type [-Wcast-align]
warning: blst@0.3.11: 473 | a = (const unsigned long long *)ret;
warning: blst@0.3.11: | ^
warning: blst@0.3.11: At top level:
warning: blst@0.3.11: cc1: note: unrecognized command-line option '-Wno-unused-command-line-argument' may have been intended to silence earlier diagnostics
```

**However, a weird problem is that none of the above analysis explains why running c-kzg alone in sp1 has no issue. WHY???**

## Temporary conclusions

1. Without debug info in the binary, debugging efficiency is really low. Ideally we could use qemu to simulate the ELF directly (tried, but failed) rather than rely on sp1 VM execution. Hopefully we can find a better debugging tool for the RISC-V ISA; we need help from the RISC-V community.
2. Ask the sp1 team for help; they may have better debugging skills or luck. 🙂
3. Try to drop the alignment requirement. Based on my understanding, the VM execution is for zk witness generation, so alignment should not matter as long as the zk witness does not require memory alignment (ZK says: “what’s alignment?” 😄). Since Risc0 does not need it, I think sp1 can do the same.
4. Temporarily disable the c-kzg check in raiko (SGX still has it, so there is no way in zk to make a fraud proof on this point), and wait until a working kzg lib is available.
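
As a rough illustration of conclusion 3, the sketch below contrasts a strict word load, which rejects a misaligned address much like the runtime assertion in the panic message, with a byte-wise load that accepts any in-bounds address. Everything here (the `Memory` type, `lw_strict`, `lw_relaxed`) is hypothetical and is not sp1 or risc0 code. Whether the proving side could accept such loads is a separate question that this sketch does not address.

```rust
/// Hypothetical flat little-endian memory, used only for illustration.
struct Memory {
    bytes: Vec<u8>,
}

impl Memory {
    /// Strict load: rejects addresses that are not 4-byte aligned.
    fn lw_strict(&self, addr: usize) -> Result<u32, String> {
        if addr % 4 != 0 {
            return Err(format!(
                "addr {:#x} is not aligned (addr % 4 = {})",
                addr,
                addr % 4
            ));
        }
        Ok(self.lw_relaxed(addr))
    }

    /// Relaxed load: assembles the word from four single bytes,
    /// so it works for any address that is within bounds.
    fn lw_relaxed(&self, addr: usize) -> u32 {
        let b = &self.bytes[addr..addr + 4];
        u32::from_le_bytes([b[0], b[1], b[2], b[3]])
    }
}

fn main() {
    // 16 bytes holding the values 0, 1, ..., 15.
    let mem = Memory { bytes: (0u8..16).collect() };

    // Address 6 is 2 modulo 4, like 0x5e7efa in the trace.
    println!("{:?}", mem.lw_strict(6));
    println!("{:#010x}", mem.lw_relaxed(6));
}
```

The relaxed variant trades one word access for four byte accesses, which is the usual cost of tolerating misaligned pointers in an emulator.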