Summary

I started to look at the pyspec project; it is related to the consensus-specs test generators.
The [consensus-specs] repository describes in Python what consensus clients must do. To ensure they are all doing the same things for the same objects, there are test generators. Running them is super slow; I am not sure why, but most likely due to many I/O operations.

I haven't worked much this week because I had a hard time with my work, sorry.

I am working on fixing some redundant generator tests in consensus-specs; not a complicated fix, but it took me a lot of time to understand the generator structure.

Resources

  • project wishlist
  • pyspec tests (example for rewards): the test code is special. There are decorators that inject data; for instance, if you want a test to be provided with a generic spec and state, you can use @spec_state_test, and if you want to run a test only for the Altair phase, you can use @with_phases([ALTAIR]), etc. The way tests are defined is strange but somehow interesting.
  • pyspec test generators: I ran the generator for the rewards tests and it takes a lot of time (waiting for the cProfile results).
  • pyspec test generators output: this is where the results of the generator tests are stored. It's useful for consensus clients to be able to compare their results with what is expected. It's just a database of output files.
  • Prysm spectest: I had a look at one consensus client's implementation of those tests. I took Prysm because I know Go. I am not sure I understand where the test folders are located (on the server that runs the CI?), but the tests are compared against consensus-spec-tests.
  • Profiler results: I got the profiler results and I was wrong. For context, I ran the test generator for rewards only! I thought most of the time went to I/O operations because of the .ssz and .ssz_snappy files plus the JSON file updated for each test (with the file mutex). But it seems around 50% of the own time comes from a dependency named remerkleable. It's a package written in pure Python that provides types and nodes that can be serialized/deserialized in SSZ. It is used very often in consensus-specs. I will need to run the profiler on another test generator to be sure the time really comes from this dependency.
  • What's SSZ?: a short introduction to SSZ serialization. It's very different from RLP since you need to know the schema to deserialize (it's not self-describing). I had a look at SSZ since the remerkleable package is an implementation of SSZ in Python (including merkleization).
  • Merkleization: computing the hash tree root of an SSZ container (or of basic types); I needed to understand how it works to understand the remerkleable package.
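To illustrate the decorator mechanism described above, here is a minimal sketch of how fixture-injecting test decorators can work. The implementations below and the FakeSpec class are simplified stand-ins of my own, not the real pyspec code:

```python
import functools

def spec_state_test(fn):
    # Build a fresh state from the spec and inject it into the test,
    # mimicking how pyspec decorators provide fixtures.
    @functools.wraps(fn)
    def wrapper(spec, *args, **kwargs):
        state = spec.make_genesis_state()  # assumed helper on the spec object
        return fn(spec, state, *args, **kwargs)
    return wrapper

def with_phases(phases):
    # Decorator factory: only run the test when the spec's fork is listed.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(spec, *args, **kwargs):
            if spec.fork not in phases:
                return None  # test is skipped for other phases
            return fn(spec, *args, **kwargs)
        return wrapper
    return decorator

class FakeSpec:
    # Illustrative spec object standing in for a real phase spec.
    fork = "altair"
    def make_genesis_state(self):
        return {"slot": 0}

@with_phases(["altair"])
@spec_state_test
def test_rewards(spec, state):
    # The test body receives both spec and state without building them itself.
    return state["slot"]
```

Calling `test_rewards(FakeSpec())` runs the body with an injected state; a spec with a different fork is skipped.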
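For the profiling itself, a minimal sketch of how a generator run can be measured with the standard-library cProfile; the `profile_call` helper and the `busy` workload are illustrative, not part of the generators:

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    # Run fn under cProfile and return the top 10 entries sorted by
    # own time ("tottime"), which is how remerkleable showed up for me.
    profiler = cProfile.Profile()
    profiler.enable()
    fn(*args, **kwargs)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("tottime").print_stats(10)
    return buf.getvalue()

def busy():
    # Dummy workload standing in for a generator run.
    return sum(i * i for i in range(10000))
```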
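A minimal sketch of the "not self-describing" point: for a container of fixed-size uint64 fields, SSZ serialization is just the little-endian fields concatenated, so the reader must already know the schema to decode the bytes. The helper names are mine, not from remerkleable:

```python
import struct

def serialize_uint64(value):
    # SSZ encodes a uint64 as 8 bytes, little-endian.
    return struct.pack("<Q", value)

def serialize_container(values):
    # For a container of fixed-size fields, SSZ is the plain concatenation
    # of the serialized fields: no type tags, no lengths for fixed sizes.
    return b"".join(serialize_uint64(v) for v in values)

def deserialize_container(data, n_fields):
    # Unlike RLP, nothing in the bytes says what they are: the reader
    # must already know the schema (here: n_fields uint64 values).
    return [struct.unpack("<Q", data[i * 8:(i + 1) * 8])[0]
            for i in range(n_fields)]
```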
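And a minimal sketch of merkleization over 32-byte chunks, assuming the basic case without length mix-ins or limits: pad the chunk list to a power of two with zero chunks, then hash pairs upward with SHA-256 until one root remains:

```python
from hashlib import sha256

ZERO_CHUNK = b"\x00" * 32

def merkleize(chunks):
    # Pad with zero chunks up to the next power of two.
    n = max(1, len(chunks))
    size = 1
    while size < n:
        size *= 2
    layer = list(chunks) + [ZERO_CHUNK] * (size - len(chunks))
    # Hash adjacent pairs until a single 32-byte root remains.
    while len(layer) > 1:
        layer = [sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]
```

A single chunk is its own root; two chunks give sha256(left + right).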

Activity

Questions

At the beginning of the week I was wondering why the tests were set up like this, because it's so unconventional and seemed to be "dark magic". But after debugging it, I think I understand why the setup is so complicated:

  • spec tests are based on .py files which are themselves based on the .md files.
  • the goal is to make inputs, outputs, and base test methods as generic as possible.

I need to ask the project proposer what optimizations he has in mind. For the moment, I had only thought about I/O. => I was wrong.

During the week, I will send a message to HWW to ask her about the Python package that is consuming almost all the test resources: is it the issue she wants to fix, or is there something else to optimize first?

TODO

  • wait for the profiler to find the time-consuming tasks
  • debug gen_runner.py, which is the launcher of the generators; there are a lot of I/O operations there.
  • make it easier to run without WSL.