Q4 Compiler backend retrospective

Recap

At the beginning of the quarter we set out a prioritised list of goals, intentionally longer than we expected to complete. A subset of these follows:

Comp95 - Post MVP tasks
- This is the umbrella KR for all the 5.0 activities that have happened since the merge: bug fixes, design improvements, documentation, etc.
KR: Resurrect prefetching (Comp106)
- Very likely to have a big positive performance impact
- This requires very careful profiling. Stephen Dolan wrote the initial implementation. There is a microbenchmark linked from the original PR, but there was also a call for more benchmarking.
- Largely @fabrice
KR: Pre-requisites for profiling allocator (Comp107)
- Recording of memory traces via statmemprof (with sample rate turned up to 1.0)
- Replay in progress, will probably switch to using a C implementation
- Test cases: Irmin, need to coordinate with JS to get traces from them
  - Is 'stock' statmemprof right? Does that leak too much info? If so, can we do something a bit simpler?
- @tjr
KR: Cross compilation (Comp110)
- Long-term project!
- Sebastian to lead

Progress

Comp95 - Post MVP tasks

This goal is very close to being finished - tomorrow, if all goes well! As expected, this highest-priority task pre-empted the other goals, and ended up taking a large amount of @engil's time and most of @sadiqj's limited time. I believe this has been very successful.

Comp106 - Resurrect prefetching

We began the quarter with a working port of the prefetcher from 4.x. However, according to the microbenchmark included in the original PR, the performance improvement from this port didn't match that seen when Stephen Dolan first implemented it. The work this quarter has been to find out exactly why, and to investigate several options for squeezing more performance out of it.

In general, performance engineering is tricky and rarely obvious, and seemingly minor changes can often lead to large changes in outcome. As such it's an area where experience is critical, so building our team's knowledge here is very important for the future. @fabbing has met with @sadiqj and got several suggestions for potential improvements, but I understand that none of these had a significant impact. He also got some suggestions from @stedolan, which he has been working on recently, and the two of them are working together on this today. I anticipate a few more weeks' work on this, but not much more than that.

Comp107 - Pre-requisites for profiling allocator

This KR was more loosely defined. The goal is to know whether we need to spend time reimplementing the best-fit allocator for 5.x or whether the current allocator is "sufficient", for some definition of "sufficient". The aim is to provide Jane Street with an early way to make this definition concrete.

At the beginning of the quarter the idea was to take traces from memtrace on 4.12 on Jane Street's internal code, and to run them on the 5.0 allocator directly. After some discussions with @sadiqj, @tom_ridge, @stedolan and @kc, we settled on a two-pronged approach, in which we will:

do statistical analysis of data from JS, and
backport the 5.x allocator to 4.14

The backport will allow JS to test the new allocator internally and, we hope, answer the question of whether reimplementing the best-fit allocator is needed. As of today, we've not received any data from JS, but we have some code to do the statistics. For the backport, @sadiq has provided a starting point that compiles but segfaults, and @engil has started doing some reading to get more familiar with what's necessary. This work will continue into Q1 next year.

Comp110 - Cross compilation

@seb joined the company in October and was given the task of making OCaml cross compilation a first-class feature. We currently have one rolling KR covering this, but he has been working on a roadmap for complete cross-compilation support, so in future we can expect a more fine-grained set of KRs to track this work. A lot of the work can be described as 'tidying' in order to make the subsequent work more tractable. The tasks required were already well enough defined that @seb could begin immediately by continuing the work he was previously doing on the OCaml build system. A large chunk of this has been merged already (unifying /tools/Makefile and /Makefile), and he has started work on breaking the dependency between dynlink and compiler-libs. Future steps will include:

improving ocamldep
- allowing separate build/source trees
- making it understand mll/mly/other files
- separating build/source trees
simplifying bootstrap
supporting out-of-source builds
removing the bootstrap compiler dependency (i.e. using a pre-installed OCaml)

These will be refined into KRs suitable for planning of Q1 and beyond.

Honorable mentions

TSAN

@fabbing worked with @otini on the TSAN implementation for user libraries for OCaml 5.0. While this wasn't in our original plan, it was a useful collaboration, as it required in-depth knowledge of how exceptions and effects work. With his frame-pointers work and the tech talk he gave, @fabbing is now one of the most knowledgeable people in our organisation on this.