# MiMC Hashing in the EVM using EVM384
**TLDR** This post presents potential cost reductions for on-chain computation of the MiMC hash function implemented using [EVM384-v7](https://notes.ethereum.org/@poemm/evm384-interface-update#Interface-v7) opcodes. Limitations of the EVM384 spec discovered as a result of this work are also discussed.
MiMC is a SNARK-friendly hash function that has seen considerable use on Ethereum. A notable use case is decentralized coin mixers such as Tornado Cash, where the MiMC cipher is currently invoked 40 times per deposit. According to the [Tornado Cash Specs](https://github.com/tornadocash/tornado-core#specs), this dominates the cost of deposits (1,088,354 gas), which are ~3x more expensive than withdrawals from the system.
A look at the [inner loop](https://github.com/iden3/circomlib/blob/master/src/mimcsponge_gencontract.js#L53-L78) shows a reliance on `ADDMOD`/`MULMOD`, which are priced according to a generic algorithm. Additionally, a significant portion of the EVM overhead comes from stack manipulation using `DUP`/`SWAP` to keep parameters in the correct order for the current and subsequent rounds.
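As a rough illustration of what that inner loop computes, the following Python sketch mirrors the structure of the MiMC Feistel permutation as we understand it from the Circomlib source. Each round needs two modular additions for the round input, a fifth power (three modular multiplications), and one more modular addition. The function name `mimc_feistel` and the placeholder constants are assumptions made for this sketch; the real round constants and round count come from Circomlib.

```python
# BN128 (alt_bn128) curve order, the field MiMC operates over here.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def mimc_feistel(xl, xr, k, constants):
    """Sketch of a MiMC Feistel permutation with exponent 5.

    `constants` is a list of round constants (placeholders in this sketch).
    Returns the final (xl, xr) pair.
    """
    last = len(constants) - 1
    for i, c in enumerate(constants):
        t = (xl + k + c) % P      # two ADDMODs in the EVM version
        t2 = (t * t) % P          # MULMOD
        t4 = (t2 * t2) % P        # MULMOD
        t5 = (t4 * t) % P         # MULMOD
        if i < last:
            # Feistel swap: the old left half becomes the new right half.
            xl, xr = (xr + t5) % P, xl
        else:
            # No swap in the final round.
            xr = (xr + t5) % P
    return xl, xr


if __name__ == "__main__":
    # Placeholder constants only; NOT the Circomlib values.
    print(mimc_feistel(1, 2, 3, [0, 0]))
```

In the EVM version every one of these steps also has to shuffle `xl`, `xr`, `k` and the temporaries on the stack, which is where the `DUP`/`SWAP` overhead comes from.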
## MiMC Implementation
The EVM384 implementation of MiMC is [here](https://github.com/jwasinger/mimc-evm384).
| MiMC Implementation | Gas Cost |
| ----------- | ----------- |
| EVM (Circomlib) | 17460 |
| EVM384 | 11414 |
**Table 1.** The cost of a single call to MiMC's cipher.
This implementation uses a slightly modified EVM384-v7 with smaller offsets, to save on code size, among other reasons. The costs are calculated using the opcode costs proposed in [update 5](https://notes.ethereum.org/@poemm/evm384-update5#Proposed-EVM384-Gas-Costs). However, we expect a significant cost reduction once the potential optimisations mentioned in the next section are considered.
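To put Table 1 in context, the short calculation below scales the per-call costs to the 40 cipher invocations of a Tornado Cash deposit. It is only a back-of-the-envelope sketch: it assumes the per-call cost carries over unchanged and ignores any byteswapping or ABI overhead, so the real saving may differ. The variable names are ours.

```python
# Figures taken from Table 1 and the Tornado Cash specs quoted above.
CALLS_PER_DEPOSIT = 40
GAS_EVM = 17460        # Circomlib implementation, per cipher call
GAS_EVM384 = 11414     # EVM384 implementation, per cipher call
DEPOSIT_GAS = 1_088_354


def cipher_gas(per_call, calls=CALLS_PER_DEPOSIT):
    """Total gas spent in the MiMC cipher for one deposit."""
    return per_call * calls


evm_total = cipher_gas(GAS_EVM)
evm384_total = cipher_gas(GAS_EVM384)
saving = evm_total - evm384_total

print(f"EVM cipher gas per deposit:    {evm_total}")
print(f"EVM384 cipher gas per deposit: {evm384_total}")
print(f"Estimated saving:              {saving}")
print(f"Per-call reduction:            {1 - GAS_EVM384 / GAS_EVM:.1%}")
print(f"Cipher share of deposit (EVM): {evm_total / DEPOSIT_GAS:.1%}")
```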
Compared to the Circomlib implementation, we note an additional benefit: the use of `EVM384` opcodes, which pack multiple parameters (memory offsets) into a single stack item, greatly improves readability of the [generator code](https://github.com/jwasinger/mimc-evm384/blob/master/src/mimcsponge.js#L14) by removing the need for stack manipulation.
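The idea of packing several memory offsets into one stack item can be sketched as follows. This is purely illustrative: the 16-bit field width, the field order and the helper names are assumptions for this example and are not taken from the EVM384-v7 spec or from our modified variant.

```python
OFFSET_BITS = 16                    # hypothetical field width
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def pack_offsets(out, x, y, mod):
    """Pack four memory offsets into one integer (one EVM stack item)."""
    for off in (out, x, y, mod):
        if not 0 <= off <= OFFSET_MASK:
            raise ValueError("offset does not fit in the assumed field width")
    return (out << (3 * OFFSET_BITS)) | (x << (2 * OFFSET_BITS)) \
        | (y << OFFSET_BITS) | mod


def unpack_offsets(item):
    """Inverse of pack_offsets: recover (out, x, y, mod)."""
    return tuple((item >> (i * OFFSET_BITS)) & OFFSET_MASK
                 for i in (3, 2, 1, 0))


if __name__ == "__main__":
    item = pack_offsets(0x00, 0x30, 0x60, 0x90)
    print(hex(item), unpack_offsets(item))
```

Because each operation names its operands by memory location, a generator can emit one constant per operation instead of tracking where every value currently sits on the stack.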
For a fair comparison of gas savings within Tornado Cash we must clarify a few details. A Tornado deposit computes the root of a 20-level binary Merkle tree, calling MiMC's cipher twice per level. While the EVM (and so Circomlib's implementation) handles numbers in big-endian byte order, EVM384 is currently little-endian. We have [noted](https://notes.ethereum.org/@poemm/evm384-update3#Interface-Endianness) some slowdown from making EVM384 big-endian. Hence we have multiple options:
- a) For a drop-in replacement, byteswapping must be performed. We [prototyped](https://github.com/axic/evm-bswap-golfing) this using existing EVM opcodes and a new [`BSWAP` opcode](https://github.com/ewasm/EIPs/blob/evm384/EIPS/eip-draft-bswap.md).
- b) For the best cost efficiency, one would modify Tornado Cash to use an interface different from Circomlib's.
- c) Consider a big-endian version of EVM384 once again.
We plan to release an update about these options later.
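The conversion that option a) has to perform can be sketched in Python as below. It only illustrates the data transformation between a 32-byte big-endian EVM word and a 48-byte little-endian element; it does not model the draft `BSWAP` opcode or its gas cost, and the helper names are ours.

```python
EVM_WORD_BYTES = 32    # size of an EVM stack/memory word
EVM384_BYTES = 48      # size of a 384-bit element


def evm_word_to_evm384(word: int) -> bytes:
    """Encode an EVM word (big-endian in memory) as a little-endian
    384-bit element: reverse the bytes, then zero-pad the high end."""
    if not 0 <= word < 1 << (8 * EVM_WORD_BYTES):
        raise ValueError("value does not fit in an EVM word")
    big_endian = word.to_bytes(EVM_WORD_BYTES, "big")
    return big_endian[::-1] + bytes(EVM384_BYTES - EVM_WORD_BYTES)


def evm384_to_evm_word(element: bytes) -> int:
    """Decode a little-endian 384-bit element back into an EVM word."""
    if len(element) != EVM384_BYTES:
        raise ValueError("expected a 48-byte element")
    value = int.from_bytes(element, "little")
    if value >= 1 << (8 * EVM_WORD_BYTES):
        raise ValueError("value does not fit in an EVM word")
    return value


if __name__ == "__main__":
    encoded = evm_word_to_evm384(0x0102)
    print(encoded.hex())
    print(hex(evm384_to_evm_word(encoded)))
```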
## Limitations of EVM384
EVM384 aimed to add support for modular arithmetic on values of up to 384 bits. The implementations and specification are based on modular addition/subtraction and Montgomery multiplication algorithms operating on large integers represented as 6x64-bit limbs. <small>Note that performant arithmetic on values larger than the system word size is often implemented using numbers represented as multiple word-sized "limbs".</small>
It was assumed that the Montgomery multiplication algorithm used for EVM384-v7 implementations (algorithm 14.36 from The Handbook of Applied Cryptography) would produce correct values for moduli occupying fewer than the full 6 limbs (i.e. cases where the most significant limb is `0x00...00`). This would allow EVM384-v7's `MULMODMONT384` to cover smaller moduli.
This assumption turned out to be false when we attempted to use `MULMODMONT384` with a 254-bit/4x64-bit-limbed modulus (Tornado uses MiMC with BN128's curve order). In response, several iterations were [brainstormed](https://notes.ethereum.org/@axic/H1DMHeild) and [implemented](https://github.com/jwasinger/evm1024).
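For readers unfamiliar with limbs, here is a minimal Python sketch of the representation: a 384-bit value is split into six 64-bit words, and addition propagates a carry from limb to limb. Storing the least significant limb first is an assumption of this sketch, and the helper names are ours.

```python
LIMB_BITS = 64
NUM_LIMBS = 6
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value):
    """Split a value below 2**384 into six 64-bit limbs, least significant first."""
    if not 0 <= value < 1 << (LIMB_BITS * NUM_LIMBS):
        raise ValueError("value does not fit in 384 bits")
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(NUM_LIMBS)]


def from_limbs(limbs):
    """Recombine limbs (least significant first) into a single integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Limb-wise addition with carry propagation.

    Returns (result_limbs, carry_out), as a fixed-width implementation would.
    """
    result, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        result.append(s & LIMB_MASK)
        carry = s >> LIMB_BITS
    return result, carry


if __name__ == "__main__":
    print(to_limbs(1 << 64))
    print(add_limbs(to_limbs(LIMB_MASK), to_limbs(1)))
```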
Since then, an additional property of the design of EVM384-v7 came to light which reveals the need for further iteration(s) on the spec:
Remember that Montgomery multiplication makes use of a special constant derived from a given modulus using an expensive modular inverse. The `MULMODMONT384` opcode requires the user to pre-compute this constant and pass it as a parameter.
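The role of this constant can be seen in the following Python sketch, which follows the structure of algorithm 14.36 with 64-bit limbs as we read it. It uses Python's unbounded integers, so it does not reproduce the fixed-width behaviour (or any quirks) of a real EVM384 implementation. The modulus is an arbitrary odd value that fills all six limbs, chosen only for illustration, and the function names are ours. `pow(m, -1, b)` requires Python 3.8 or later.

```python
LIMB_BITS = 64
NUM_LIMBS = 6
B = 1 << LIMB_BITS                 # limb base b
MASK = B - 1
R_BITS = LIMB_BITS * NUM_LIMBS     # R = b**6 = 2**384


def mont_constant(m):
    """The precomputed constant m' = -m^{-1} mod b (needs a modular inverse)."""
    return (-pow(m, -1, B)) % B


def montmul(x, y, m, m_prime):
    """Compute x * y * R^{-1} mod m for 0 <= x, y < m and odd m."""
    a = 0
    for i in range(NUM_LIMBS):
        x_i = (x >> (LIMB_BITS * i)) & MASK
        u = (((a & MASK) + x_i * (y & MASK)) * m_prime) & MASK
        # With a correct m_prime the sum is divisible by b, so the shift is exact.
        a = (a + x_i * y + u * m) >> LIMB_BITS
    return a - m if a >= m else a


def to_mont(x, m):
    """Convert x into Montgomery form x * R mod m."""
    return (x << R_BITS) % m


def from_mont(x, m, m_prime):
    """Convert out of Montgomery form by multiplying with 1."""
    return montmul(x, 1, m, m_prime)


if __name__ == "__main__":
    M = (1 << 384) - 317           # illustrative odd modulus, not from the spec
    inv = mont_constant(M)
    prod = montmul(to_mont(6, M), to_mont(7, M), M, inv)
    print(from_mont(prod, M, inv))
```

Note that `montmul` itself never checks `m_prime`: the caller is trusted to supply a value matching the modulus.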
Because the Montgomery multiplication algorithms we were aware of produce defined (but potentially incorrect/"garbage") outputs on all inputs regardless of whether the "Montgomery constant" is correctly computed, it was assumed that the spec for `MULMODMONT384` could strictly follow the behavior of a chosen algorithm.
However, as was later pointed out (first by Jordi Baylina from Iden3), having a setup mechanism would remove some overhead and reduce some pricing risk. This in turn also opens the door to using multiple algorithms and removes the need to keep implementation quirks as part of the specification.
Special thanks to Alex Beregszaszi for implementing ABI handler code compatible with the Circomlib implementation of MiMC, for reviewing and proposing solutions regarding byteswapping for EVM384, and for review of and contributions to this document. Stay tuned for the release of additional updates and new proposals.