changed 3 years ago
Linked with GitHub

Deriving the Pedersen Generators

Introduction

This document explains how we will derive the pedersen generators and includes rust and golang code to which you can copy and paste. We will also include a few test vectors to ensure consistency.

The python code

Pedersen Hash vs Pedersen Commitment

You can skip this section, if you just want to figure out how to derive the generators :)

The difference between a pedersen hash and a pedersen commitment is that a commitment explicitly has a generator that is used to hide the committed values.

For example, consider the case where we want to commit to three values \(a,b,c\)

Pedersen Hash

Hash = \(a\cdot G_0 + b \cdot G_1 + c \cdot G_2\)

Pedersen Commitment

Commitment = \(a\cdot G_0 + b \cdot G_1 + c \cdot G_2 + r \cdot H\)

where \(r\) is a random element

It's a bit confusing because when we commit to a polynomial without zero knowledge, we says pedersen commitment and not pedersen hash.

Rationale

For security, the generators used in a pedersen hash must be created in such a way that the relative discrete logarithm between generators is unknown. If this is not the case, then a malicious actor can commit to any value that they desire. At a high level, for each generator that I know the relative discrete logarithm for, that generator becomes redundant. Another way to see it is that, N generators define an N-dimensional vector space and knowing the discrete logarithm between them makes them no longer linearly independent, ie they do not form a basis.

Methodology

We follow the simplest methodology which is try-and-increment. This procedure does not need to be constant time and should follow the (Nothing Up My Sleeve) convention.

Try and Increment (MapToGroup) was first introduced at 3.2 here: https://hovav.net/ucsd/dist/sigs.pdf

In short we:

  1. Set a bit b to be 0 or 1
  2. Think of a seed. For example "verkle trie"
  3. Use a cryptographic hash function to hash the seed and reduce it modulo the base field of the elliptic curve
  4. Interpret this field element as the \(x\) co-ordinate
  5. plug x into the twisted edwards equation to get z
  6. If z is a quadratic residue, then take the square root, returning two values
  7. If b == 0, choose the lexographically smaller value, else choose the other
  8. The chosen value is denoted as \(y\)
  9. Check that the co-ordinates (x,y) are correct.
  10. The seed is now \(x+1\) and we start again from step 2
  11. Stop when you have generated however many points that you need

Pseudo code

x = sha256("verkle trie")
points = []
POINTS_NEEDED = 256
i = 0
b = 0
while {
    x = x + i
    z = plug_x_into_curve_equation(x)
    if z.is_a_square() { // 4.is_a_square() == true, 5.is_square() == false in Z
        (y1, y2) = z.sqrt() // y1 = -y2
        y =  choose(b, y1, y2)
        if (x,y).is_correct() {
            points.append((x,y))    
        }
        
    }
    if len(points) == POINTS_NEEDED {
        break
    } 
    i++
}

Correctness checks

It is important that:

  • Each point is a point on the curve
  • Each point is in the prime subgroup
  • Each point only appears once in our list

Multiply by Order or Cofactor

There are two ways to check for the point being in the correct subgroup.

  • The first is to multiply by the order of the largest prime subgroup. The drawback of this is that it is expensive, however if we assume that the generators only need to be computed once, then it is a one-off cost.
  • The second is to clear the cofactor, which is cheap. Note that we could clear the cofactor and the result could be a point already in our list, so we must check if we've already seen this point. Using this method is much cheaper and it is then feasible to never needing to store the generators.

Recommendation

  • If the first method is used, then it is recommended to have a dedicated test to check that those values are indeed correct, each time the CI is ran, and also store these constants in a separate file so that diffs made to this file are apparent.
  • If the second method is used, the general recommendation for the first method also applies, in that any changes made to that particular function should be apparent and tests should be ran on each CI workflow.

This document uses the first method.

Extra Dependencies

  • Sha256

Lexographically Largest

For a field of size \(q\), we denote a reduced field element \(z\) is lexographically larger than it's negation if \(z > (q-1)/2\).

Library Assumptions

  • The cryptography library has a method to check if a field is a square
  • There is a method to check if a point is on the curve

Seed

The current seed being used is: eth_verkle_oct_2021

Incase the seed does change in the future, one only needs to change the month and year parts.

Golang

(Imperative)


func GenerateRandomPoints(numPoints uint64) []*bandersnatch.PointAffine {

	digest := sha256.New()
	digest.Write([]byte("eth_verkle_oct_2021")) // incase it changes or needs updating, we can use eth_verkle_month_year
	hash := digest.Sum(nil)

	var u fp.Element
	u.SetBytes(hash)

	// flag to indicate whether we choose the lexographically larger
	// element out of `y` and it's negative
	choose_largest := false

	points := []*bandersnatch.PointAffine{}

	var increment uint64 = 0

	for uint64(len(points)) != numPoints {

		var x = incrementBy(u, increment)
		increment++

		point_found := bandersnatch.GetPointFromX(x, choose_largest)
                // nil means it is not a point on the curve
		if point_found == nil {
			continue
		}
		if point_found.IsInPrimeSubgroup() {
			points = append(points, point_found)
		}

	}

	return points
}

// returns u + i
func incrementBy(u fp.Element, i uint64) *fp.Element {
	var increment fp.Element
	increment.SetUint64(i)

	var result fp.Element
	return result.Add(&u, &increment)
}

Rust

(Functional)

 fn generate_random_elements(num_required_points: usize) -> Vec<EdwardsAffine> {
    use bandersnatch::Fq;
    use sha2::{Digest, Sha256};

    let mut hasher = Sha256::new();
    hasher.update(b"eth_verkle_oct_2021");
    let bytes = hasher.finalize().to_vec();

    let u = bandersnatch::Fq::from_be_bytes_mod_order(&bytes);
    let choose_largest = false;

    (0..)
        .into_iter()
        .map(|i| Fq::from(i as u128) + u)
        .map(|x| EdwardsAffine::get_point_from_x(x, choose_largest))
        .filter_map(|point| point)
        .filter(|point| point.is_in_correct_subgroup_assuming_on_curve())
        .take(num_required_points)
        .collect()
}

Test Vectors

First point

22ac968a98ab6c50379fc8b039abc8fd9aca259f4746a05bfbdf12c86463c208

256th point

c8b4968a98ab6c50379fc8b039abc8fd9aca259f4746a05bfbdf12c86463c208

sha256 of all points

c390cbb4bc42019685d5a01b2fb8a536d4332ea4e128934d0ae7644163089e76

  • This hash was produced by serialising all of the points and add each serialised point into the sha256 buffer. Then producing a hash. Please check the code below for more details.

Golang

func TestCRSGeneration(t *testing.T) {

	points := GenerateRandomPoints(256)
	for _, point := range points {
		if !point.IsOnCurve() {
			panic("generated a point that was not on the curve")
		}
		if !point.IsInPrimeSubgroup() {
			panic("point is not in the prime sub group")
		}
	}
	bytes := points[0].Bytes()
	got := hex.EncodeToString(bytes[:])
	expected := "22ac968a98ab6c50379fc8b039abc8fd9aca259f4746a05bfbdf12c86463c208"
	if got != expected {
		panic("the first point is not correct")
	}
	bytes = points[255].Bytes()
	got = hex.EncodeToString(bytes[:])
	expected = "c8b4968a98ab6c50379fc8b039abc8fd9aca259f4746a05bfbdf12c86463c208"
	if got != expected {
		panic("the 256th (last) point is not correct")
	}

	digest := sha256.New()
	for _, point := range points {
		bytes := point.Bytes()
		digest.Write(bytes[:])
	}
	hash := digest.Sum(nil)
	got = hex.EncodeToString(hash[:])
	expected = "c390cbb4bc42019685d5a01b2fb8a536d4332ea4e128934d0ae7644163089e76"
	if got != expected {
		panic("unexpected point encountered")
	}

}

Rust

#[test]
fn generate_crs() {
    use bandersnatch::{EdwardsAffine, Fq};
    use sha2::{Digest, Sha256};

    let points = generate_random_elements(256);
    for point in &points {
        let on_curve = point.is_on_curve();
        let in_correct_subgroup = point.is_in_correct_subgroup_assuming_on_curve();
        if !on_curve {
            panic!("generated a point which is not on the curve")
        }
        if !in_correct_subgroup {
            panic!("generated a point which is not in the prime subgroup")
        }
    }

    let mut bytes = [0u8; 32];
    points[0].serialize(&mut bytes[..]).unwrap();
    assert_eq!(
        hex::encode(&bytes),
        "22ac968a98ab6c50379fc8b039abc8fd9aca259f4746a05bfbdf12c86463c208",
        "the first point is incorrect"
    );
    let mut bytes = [0u8; 32];
    points[255].serialize(&mut bytes[..]).unwrap();
    assert_eq!(
        hex::encode(&bytes),
        "c8b4968a98ab6c50379fc8b039abc8fd9aca259f4746a05bfbdf12c86463c208",
        "the 256th (last) point is incorrect"
    );

    let mut hasher = Sha256::new();
    for point in &points {
        let mut bytes = [0u8; 32];
        point.serialize(&mut bytes[..]).unwrap();
        hasher.update(&bytes);
    }
    let bytes = hasher.finalize().to_vec();
    assert_eq!(
        hex::encode(&bytes),
        "c390cbb4bc42019685d5a01b2fb8a536d4332ea4e128934d0ae7644163089e76",
        "unexpected point encountered"
    );
}

Future changes

Serialisation

The serialisation that the Rust and Python library uses is non-standard and so once this issue is closed, the test vectors will need to be updated:https://github.com/arkworks-rs/algebra/issues/330

The Golang library used the correct serialisation format, however it was modified to match the arkworks format for interopability.

Seed

It's possible that the seed does change; maybe "eth_verkle_2021" is better, or maybe "verkle_trie". It's not a big issue in my opinion, but if you are reading this from the future, this could be a reason why your test vectors are not matching up.

Select a repo