# (OKR 2022Q4 - ???) Gas cost regressions detected twice a week
## Context
The Snoop benchmarks used to infer all gas cost parameters of the protocol are currently run only just before injecting a new amendment proposal, i.e. about once every 4 months. This is because a run is fairly long (it takes a few days) and entering the new values involves some manual work. These gas parameters are therefore updated rarely, and at a moment in the development cycle when we have little time to analyse problems. In particular, performance regressions are sometimes discovered at that point, but understanding them takes too long and is too difficult given the time constraints we face then.
Running the benchmarks and inferring the parameters more often would make studying the detected regressions easier: there would be fewer changes in the protocol between two runs, and doing this work continuously during the protocol development cycle would relieve the pressure of doing it all at once, in a rush, just before injection.
Our ultimate goal is to integrate benchmarking, inference (including detection of regressions of gas parameters) and code generation into the continuous integration framework. This is very ambitious, however, because it requires substantial changes to Snoop to make it machine-independent and to distribute the benchmarks. Such deep changes to Snoop are out of scope for Q4; as a first step, we focus here on running the benchmarks as often as possible, sequentially, on the reference machine. We estimate that running them twice a week is doable and already brings a lot of value to the gas regression situation.
## Work Breakdown
- [ ] Running benchmarks twice a week (continuous workload): run the full Snoop benchmark suite and infer all parameters twice a week on the reference machine (a scheduling sketch follows this list).
- [ ] Automated regression report on inferred models: for each run, create and publish a report showing the difference of the inferred values from some reference, which could be the previous run or a fixed baseline (see the report sketch below).
- [ ] Regression alerts: send an alert when the difference from the reference exceeds a given threshold (see the alerting sketch below).
- [ ] Fix flaky benchmarks: we expect some benchmarks to raise false-positive alerts or to be reported as low-quality. These alerts will be analysed and the flaky benchmarks fixed.
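A minimal sketch of the twice-weekly cadence, assuming a plain crontab on the reference machine; `run_snoop_and_infer.sh` is a hypothetical wrapper (it does not exist yet) that would chain the full benchmark run, parameter inference and report generation:

```
# Hypothetical crontab entry on the reference machine: full benchmark +
# inference run every Monday and Thursday at 02:00.
0 2 * * 1,4  /usr/local/bin/run_snoop_and_infer.sh >> /var/log/snoop-runs.log 2>&1
```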
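For the regression report, a minimal sketch in Python, assuming each run dumps its inferred parameters as a flat JSON object mapping parameter names to numeric values; this file layout, and the script itself, are assumptions for illustration, not Snoop's actual output format:

```python
import json
import sys

def load(path: str) -> dict[str, float]:
    """Load inferred gas parameters as a flat {name: value} mapping (assumed layout)."""
    with open(path) as f:
        return json.load(f)

def report(reference_path: str, current_path: str) -> None:
    reference = load(reference_path)
    current = load(current_path)
    rows = []
    for name, new in sorted(current.items()):
        old = reference.get(name)
        if old is None:
            print(f"{name}: new parameter, value {new}")
        elif old != 0:
            rows.append(((new - old) / abs(old), name, old, new))
    # Largest relative moves first: these are the candidate regressions.
    rows.sort(key=lambda row: abs(row[0]), reverse=True)
    for rel, name, old, new in rows:
        print(f"{name}: {old} -> {new} ({rel:+.2%})")

if __name__ == "__main__":
    report(sys.argv[1], sys.argv[2])
```

Invoked as `python3 gas_report.py reference.json current.json`, it lists new parameters first, then all parameters sorted by the magnitude of their relative change, which puts the candidate regressions at the top of the published report.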
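For the alerts, a sketch of the threshold check under the same assumed JSON layout; the 5% threshold is a placeholder to be tuned, and how the alert is delivered (mail, chat webhook, CI failure) is left to whatever runs the script:

```python
import json
import sys

THRESHOLD = 0.05  # Placeholder: flag relative differences above 5%.

def main(reference_path: str, current_path: str) -> int:
    with open(reference_path) as f:
        reference = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    offenders = []
    for name, new in sorted(current.items()):
        old = reference.get(name)
        if old not in (None, 0):
            rel = (new - old) / abs(old)
            if abs(rel) > THRESHOLD:
                offenders.append((name, old, new, rel))
    for name, old, new, rel in offenders:
        print(f"ALERT {name}: {old} -> {new} ({rel:+.2%})")
    # Non-zero exit lets the caller (cron wrapper, CI job, ...) raise the alarm.
    return 1 if offenders else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Exiting non-zero keeps the script agnostic about the notification channel: the wrapper or CI job that invokes it can turn the failure into whatever alert we choose.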