Sin7Y: About the Sinsemilla hash function used in OlaVM

# Sin7Y: About the Sinsemilla hash function used in OlaVM Last month we were pleased to announce the [OlaVM Whitepaper](https://ethresear.ch/t/whitepaper-olavm-an-ethereum-compatible-zkvm/13144), an EVM-compatible ZKVM solution, released the 25th of July, 2022. ZKEVM has been a hot topic itself over the past couple of weeks, and upon the release of OlaVM, the paper managed to receive some honorable attention from prominent people in the industry, amongst them, one being [Daira Hopwood](https://twitter.com/feministPLT) (who also is the main author of the [Zcash protocol](https://zips.z.cash/protocol/protocol.pdf)), we would like to thank Daira for her feedback. Daira brought up a few important questions in regards to the design decisions, one of them relating to the choice of hash function in [ECDSA](https://en.wikipedia.org/wiki/Elliptic_Curve_Digital_Signature_Algorithm) and [Schnorr](https://en.wikipedia.org/wiki/Schnorr_signature) signature algorithms. The exact comment can be seen in the tweet attached below. ![](https://i.imgur.com/GbFs8j3.png) What Daira is referencing can in simple terms be described with the following: The security level of [Sinsemilla hash](https://zips.z.cash/protocol/protocol.pdf) is collision resistant, so it cannot be regarded as a Random Oracle. For the ECDSA and Schnorr signature algorithms, in order to satisfy sufficient security, the choice of hash function needs to be regarded as Random Oracle. In order to understand this better, we need to understand some cryptographic concepts first. ## 1. Security properties of a cryptographic hash function (CHF) According to the definition in the paper [Cryptographic Hash-Function Basics](https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=C7364E9082B2726A10E1C712B476C82A?doi=10.1.1.3.6200&rep=rep1&type=pdf), the security properties corresponding to a CHF are divided into the following three categories: - **Preimage Resistance** - Given H(x), it is computationally difficult to determine x, i.e., it is computationally infeasible to discover any input which created a specific output. - **Second Preimage Resistance** - Given x, it is computationally difficult to find some different value x' such that H(x) == H(x'), i.e., it is computationally infeasible to find any second input which has the same output as any specified input. - **Collision Resistance** - Expanding the concept of Second Preimage Resistance, it is computationally infeasible to find any two arbitrary inputs, x and y such that H(x) == H(y), i.e., different inputs which hash to the same output. An important aspect to be noted is that, if you satisfy the conditions for Collision Resistance, you also satisfy the conditions for Second Preimage Resistance, however this does not guarantee that you are satisfying the conditions for Preimage Resistance. ## 2. What is a Random Oracle? A [Random Oracle](https://en.wikipedia.org/wiki/Random_oracle) is described in the following model: - Imagine you have a black box. In the box lives a gnome, with a big book and some dice. - We can add some input data into the box (an arbitrary sequence of bits). - Given some input data that the gnome did not see beforehand, he uses his dice to generate some output, uniformly and randomly, in some conventional space (the space of oracle outputs). The gnome also writes down the input and the newly generated output in his book. - If he is given some formerly known input data, the gnome turns to his book, in order to recover the output he generated and returned the last time, and returns it again. To briefly summarize the behavior of Random Oracles, assuming the input is x: - If x has been used as input data before, it will directly return the corresponding H(x). - If x has never been used as input data, the Random Oracle will generate a string consisting of 0.1 in the range of values **completely randomly**. It might be noted that: - **Complete randomness** here means that even the Random Oracle itself does not know what the final value will be, it has no rules to follow. This is the main difference from hash functions. Any hash function has its own calculation rules. But in the real world, it is quite difficult to achieve an actual Random Oracle, therefore, we need to find a potential candidate for Random Oracles, and we need to make the output as random as possible. Hash functions are a good choice. A secure hash function needs to satisfy the following conditions: Preimage Resistance, Second Preimage Resistance, and Collision Resistance. A hash function that can be used as Random Oracle must satisfy these three properties, but a hash function that satisfies these three properties may not necessarily be used as Random Oracle, there is a necessary and insufficient relationship between them. More details can be found in [What is the “Random Oracle Model” and why is it controversial?](https://crypto.stackexchange.com/questions/879/what-is-the-random-oracle-model-and-why-is-it-controversial) ## 3. Requirements for using hash functions in ECDSA and Schnorr signature algorithm As mentioned in the papers [On the security of ECDSA with additive key derivation and presignatures](https://www.shoup.net/papers/2021-1330.pdf) and [On the Exact Security of Schnorr-Type Signatures in the Random Oracle Model](https://eprint.iacr.org/2012/029.pdf), the hash functions in both ECDSA and Schnorr signature algorithms need to be considered Random Oracles before they are safe. According to the previous description, this hash needs to satisfy all the security properties of CHF: preimage-resistance, 2nd-preimage resistance, and collision resistance. ![](https://i.imgur.com/f59pXfQ.png) <center>Figure 1. ECDSA signature process</center> ![](https://i.imgur.com/loYUxKA.png) <center>Figure 2. Schnorr signature process</center> ## 4. About the Sinsemilla hash function The Sinsemilla hash function was designed by [Daira Hopwood and Sean Bowe](https://twitter.com/feministPLT), and the bottom layer relies on [ECDLP(Elliptic Curve Discrete Logarithm Problem)](https://link.springer.com/referenceworkentry/10.1007/978-1-4419-5906-5_246). Under a fixed-length input, the Sinsemilla hash function satisfies the Collision Resistance property, but not the Preimage Resistance property. Please refer to Daira's [answer](https://twitter.com/feministPLT/status/1551856467145269249) for further clarification. According to the [Zcash protocol specification](https://zips.z.cash/protocol/protocol.pdf), the original intention of designing the Sinsemilla hash function is to make full use of the advantages of being Lookup-friendly during the execution of the zero-knowledge proof algorithm Halo2, in order to improve the execution efficiency of Halo2. Therefore, the Sinsemilla hash function is a Lookup-friendly hash function, which is more suitable for computation of promises and computation of Merkle Tree roots. ![](https://i.imgur.com/cetTwsi.png) <center>Figure 3. Design idea of Sinsemilla hash</center> ## 5. Conclusion We would like to thank [Daira Hopwood](https://twitter.com/feministPLT) for pinpoiting this and her guidance, we will continue gathering feedback to continuously optimize and improve the OlaVM design, making it more efficient and secure. The Sinsemilla hash function will still be used in suitable modules in the OlaVM design. Regarding the usage of hash functions for signatures, we will opt for an appropriate and secure hash function, such as e.g the [Poseidon-](https://eprint.iacr.org/2019/458.pdf) or [Reinforced Concrete](https://eprint.iacr.org/2021/1038.pdf) hash functions.