We set out our prioritised list of goals at the beginning of the quarter; the list was intentionally longer than we expected to achieve. A subset of these goals were:
Comp95 - Post MVP tasks
KR: Resurrect prefetching (Comp106)
KR: Pre-requisites for profiling allocator (Comp107)
KR: Cross compilation (Comp110)
This goal is very close to being finished (tomorrow, if all goes well!). As expected, this highest-priority task pre-empted all other goals, and it ended up taking a large amount of @engil's time and most of @sadiqj's limited time. I believe this work has been very successful.
We began the quarter with a working port of the prefetcher from 4.x. However, according to the microbenchmark included in the original PR, its performance improvement didn't match the one seen when Stephen Dolan first implemented it. The work this quarter has been to dig into this, find out exactly why, and investigate several options to see whether we can squeeze any more performance out of it.
In general, performance engineering is tricky and unintuitive, and seemingly minor changes can often lead to large changes in outcome. As such, it's an area where experience is critical, so building our team's knowledge in this area is very important for the future. @fabbing met with @sadiqj and got several suggestions for potential improvements, but I understand that none of these had a significant impact. He also got some suggestions from @stedolan, which have been worked on recently, and they are both working on this together today. I anticipate a few more weeks of work on this, but not much more than that.
This KR was more loosely defined. The goal is to know whether we need to spend time reimplementing the best-fit allocator for 5.x or whether the current allocator is "sufficient", for some definition of "sufficient". The aim is to provide Jane Street with an early way to make this definition concrete.
At the beginning of the quarter, the idea was to take memtrace traces of Jane Street's internal code on 4.12 and run them on the 5.0 allocator directly. After some discussions with @sadiqj, @tom_ridge, @stedolan and @kc, we settled on a two-pronged approach, where we will:
The backport will allow Jane Street to test the new allocator internally and, hopefully, answer the question of whether the best-fit allocator needs to be reimplemented. As of today, we've not received any data from Jane Street, but we have some code ready to compute the statistics. For the backport, @sadiqj has provided an initial version that compiles but segfaults, and @engil has started reading up to become more familiar with what's necessary. This work will continue into Q1 next year.
@seb joined the company in October and was given the task of making OCaml cross-compilation a first-class feature. We currently have one rolling KR covering this, but he has been working on a roadmap for complete cross-compilation support, so in future we can expect a more fine-grained set of KRs to track this work. A lot of the work can be described as 'tidying', done to make the subsequent work more tractable. The required tasks were already well enough defined that @seb could begin immediately by continuing the work he had previously been doing on OCaml's build system. A large chunk of this has already been merged (unifying /tools/Makefile and /Makefile), and he has started work on breaking the dependency between dynlink and compiler-libs. Future steps will include:
These will be refined into KRs suitable for planning of Q1 and beyond.
@fabbing worked with @otini on the TSAN implementation for user libraries for OCaml 5.0. While this wasn't in our original plan, it was a useful collaboration, as it required in-depth knowledge of how exceptions and effects work. Thanks to his frame-pointers work and the tech talk he gave, @fabbing is now one of the most knowledgeable people in our organisation on this topic.