# Internship report

## Table of contents

[TOC]

## Introduction

The subject of this internship was to explore the possibility of building tools to analyse memory problems in OCaml programs. OCaml has a garbage collector, so memory problems can be hard to debug. The current tools are not satisfactory, especially for finding memory leaks. At the beginning of the internship, several options were discussed and a more precise topic was chosen.

## Possible projects that were not explored

Here we list some topics and projects that looked interesting but were not investigated further.

### htop-like tool for Lwt

Tokio is an asynchronous runtime for Rust. Its developers built an htop-like interface for it named [Tokio-console](https://tokio.rs/blog/2021-12-announcing-tokio-console). It shows information about the tasks for debugging purposes. One idea is to develop a similar tool for Lwt. This tool could show how many promises are currently active, what they are waiting for, which one is running, etc. This project seems quite big and would most likely require modifying Lwt to gather meaningful information, as well as tools to analyse that information.

To build such a tool, finding the source location of a closure may be needed. The project [Owee](https://github.com/let-def/owee) can be used to do that.

### Runtime Events and EIO-console

OCaml 5 also has a new profiling tool called [Runtime Events](https://github.com/ocaml/ocaml/pull/10964). It is only available in the next version of OCaml; it could be interesting to follow it and see what possibilities it offers. There is also an application that monitors an OCaml program using this new profiling tool, [Eio-console](https://github.com/patricoferris/eio-console).

## Post mortem memory analysis

The project chosen for the internship was to build a tool for post mortem analysis of the memory of an OCaml program. The idea is the following: while a program is running, a dump of its memory can be taken.
Such a dump makes it possible to understand what is in memory at the time of the dump. Several dumps can be made for a running application, for example after each compaction. The sequence of dumps can then be analysed to understand how memory usage evolves.

### Constraints

As we want to study the memory behaviour of a program, the tool should not modify that behaviour and should be as non-invasive as possible. Also, OCaml is statically typed, so there is little to no information at runtime about what is in memory. We can traverse the memory as the GC does but, by default, we cannot retrieve more information about what it contains.

### Approach

Our approach is the following. We chose to start from the [fork](https://github.com/let-def/ocaml/tree/tagl-413) developed by Frédéric Bour. The idea is to modify the header of OCaml values on 64-bit platforms to include more information about the value without changing the memory used by the program. On 64-bit platforms, the structure of the header is the following:

```
     +--------+-------+-----+
     | wosize | color | tag |
     +--------+-------+-----+
bits  63    10 9     8 7   0
```

In the compiler, it is possible to reduce the size of `wosize` in order to reserve some bits for other purposes. This infrastructure is called PROFINFO. With it, the header has the following structure:

```
  P = PROFINFO_WIDTH (as set by "configure", currently 26 bits, giving a
    maximum block size of just under 4Gb)
     +----------------+----------------+-------------+
     | profiling info | wosize         | color | tag |
     +----------------+----------------+-------------+
bits  63        (64-P) (63-P)        10 9     8 7   0
```

Using this, Frédéric Bour added information to the profinfo field in order to print any OCaml value using a syntax close to the original OCaml code. This approach is probabilistic: a hash is stored in the profinfo field, so collisions can happen. This information can be seen as a kind of type information.
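
To make the PROFINFO header layout of this section concrete, here is a minimal Python sketch that splits a 64-bit header word into its fields. It assumes a profinfo width of 26 bits, as quoted in the layout comment; the names `decode_header` and `encode_header` are hypothetical helpers for illustration and are not part of the OCaml runtime or of the fork.

```python
# Sketch only: field positions follow the header diagram in this section.
PROFINFO_WIDTH = 26  # assumption: the width quoted in the layout comment
TAG_BITS = 8         # bits 7..0
COLOR_BITS = 2       # bits 9..8


def decode_header(hd, profinfo_width=PROFINFO_WIDTH):
    """Split a 64-bit header word into profinfo, wosize, color and tag."""
    wosize_bits = 64 - profinfo_width - COLOR_BITS - TAG_BITS
    tag = hd & 0xFF
    color = (hd >> 8) & 0x3
    wosize = (hd >> 10) & ((1 << wosize_bits) - 1)
    # The profiling info occupies the top `profinfo_width` bits.
    profinfo = (hd >> (64 - profinfo_width)) & ((1 << profinfo_width) - 1)
    return {"profinfo": profinfo, "wosize": wosize, "color": color, "tag": tag}


def encode_header(profinfo, wosize, color, tag, profinfo_width=PROFINFO_WIDTH):
    """Inverse of decode_header, useful for building test values."""
    return (profinfo << (64 - profinfo_width)) | (wosize << 10) | (color << 8) | tag


if __name__ == "__main__":
    hd = encode_header(profinfo=0x155, wosize=3, color=0, tag=0)
    print(hex(hd), decode_header(hd))
```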
Based on this work, we modified the OCaml runtime to add a dump function and wrote a tool to analyse those dumps.

### Modification of the OCaml compiler

The work of Frédéric Bour has been ported from OCaml 4.13.0 to OCaml 4.14.0. Other than that, the patches to the compiler were not modified. A future task would be to add more information to the hash, but this has not been done yet.

### Dumping the OCaml memory

One possibility is to use a core dump from the Linux kernel but, by doing so, we get all the memory of the program, including the runtime. From this, it is not clear how to retrieve the roots and the OCaml values. Therefore, we chose to modify the OCaml runtime to add a dump function.

To dump the memory of an OCaml program, one needs to understand how OCaml handles memory. A full explanation of memory allocation and management is out of the scope of this document. We recommend reading the following to learn about it:

- The [PhD thesis](https://hal-ensta-paris.archives-ouvertes.fr/ENSTA_U2IS/tel-01122262v1) of Çagdas Bozman
- https://dev.realworldocaml.org/runtime-memory-layout.html
- https://dev.realworldocaml.org/garbage-collector.html
- https://v2.ocaml.org/manual/intfc.html

The functions that perform the dump are defined in [`runtime/dump.c`](https://github.com/FardaleM/ocaml/blob/tagl-414-dump/runtime/dump.c). The format of the dump is as follows. First, all the roots are dumped, followed by a null word that marks the end. A memory region is dumped by writing the first address of the region, then its size in bytes as a word, and then the actual content of the region. After the roots, the minor heap is dumped, then all the chunks of the major heap, and finally the data segment.

We also define a new option `d` in the `OCAMLRUNPARAM` environment variable, which enables a dump after each compaction.

### Mnemonicoeus

Alongside the modification of the runtime, we built a tool to analyse the dumps. This tool is still a prototype.
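
As an illustration of what reading a dump involves, the following Python sketch parses the format as it is described in the previous section: a list of root words terminated by a null word, then a sequence of regions, each written as start address, size in bytes, and content. It is not the code of the tool. It assumes 64-bit little-endian words and that regions simply follow each other until the end of the file; the names `parse_dump`, `Region` and `find_region` are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple

WORD = struct.Struct("<Q")  # assumption: 64-bit little-endian words (AMD64)


class Region(NamedTuple):
    start: int   # first address of the region in the dumped process
    data: bytes  # raw content of the region

    def contains(self, addr: int) -> bool:
        return self.start <= addr < self.start + len(self.data)


def parse_dump(buf: bytes) -> Tuple[List[int], List[Region]]:
    """Return (roots, regions) from a dump laid out as described in the text."""
    offset = 0
    roots = []
    # Roots: one word each, terminated by a null word.
    while True:
        (word,) = WORD.unpack_from(buf, offset)
        offset += WORD.size
        if word == 0:
            break
        roots.append(word)
    # Regions: start address, size in bytes, then the content itself.
    regions = []
    while offset < len(buf):
        (start,) = WORD.unpack_from(buf, offset)
        (size,) = WORD.unpack_from(buf, offset + WORD.size)
        offset += 2 * WORD.size
        if offset + size > len(buf):
            raise ValueError("truncated region at offset %d" % offset)
        regions.append(Region(start, buf[offset:offset + size]))
        offset += size
    return roots, regions


def find_region(regions: List[Region], addr: int) -> Optional[Region]:
    """Return the region containing addr, or None if it is not in the dump."""
    for region in regions:
        if region.contains(addr):
            return region
    return None
```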
The tool assumes that it runs on the AMD64 architecture and that the PROFINFO field is 26 bits wide. For more information, please read the [project](https://gitlab.com/Fardale/mnemonicoeus) README file.

### Known issues

There are currently two main issues with dumping the memory and analysing it. First, it seems that the traversal of the memory from the roots does not scan the minor heap. Second, during the traversal of the memory, some of the addresses encountered are not in the dump.

### Remaining work

In this section, we list the items that need to be done next on this project.

#### Dump of the hash table

Currently, the dump does not contain the table that associates information with the hash contained in the profinfo field. For bytecode, this information is dumped in the file as a header named `TAGL`. In native compilation mode, the information is stored in a runtime variable called `caml_globals_taglib`.

#### TMC of OCaml 4.14.0

In OCaml 4.14.0, an experimental optimisation called TMC (tail modulo constructor) was introduced. Currently, this optimisation erases the tags. Someone should look into it to understand the problem and propagate the tags properly.

#### LOCid

Currently, the hash contains the information needed to print the value in OCaml syntax. This was enough for the debugging goal of Frédéric Bour. For memory analysis, the location of the allocation is a useful piece of information when looking for memory leaks. It can be added.

#### Bytecode

The current implementation works only for native compilation. A natural next step is to port it to bytecode.

#### Port to OCaml 5.0

OCaml 5.0 merges the multicore runtime into OCaml, which brought a number of changes. In our work, there are two orthogonal modifications that need to be ported. The first is the modification of the compilation to include the tags in the header. The second is the function that dumps the memory to a file.

Rebasing the tagl patches from Frédéric Bour on top of trunk, the future OCaml 5.0, seems to be quite easy. An attempt can be found on the branch [tagl-trunk](https://github.com/FardaleM/ocaml/tree/tagl-trunk). A function used by those patches is no longer present in trunk; an [issue](https://github.com/ocaml/ocaml/issues/11287) is open about it.

For the dump, code can run in parallel in OCaml 5.0, so dumping from one domain can lead to data races. The minor and major GC can run code in a stop-the-world (STW) section. To be on the safe side, the dump should do the same. The work should start by looking at `caml_finish_major_cycle` in `runtime/major_gc.c`. The function `caml_try_run_on_all_domains` seems to be able to initiate an STW section.

#### Upstream the patch

The compiler has been patched in two orthogonal ways.

First, the profinfo infrastructure is used to store data in the header of values. This information can be useful for debugging purposes. If several use cases are found, a discussion with upstream is possible to get this into the compiler. For example, it could be useful for profiling and for `js_of_ocaml`. It could also be used to instrument backtraces with a representation of the values involved.

Second, the dump function. This function does not depend on the tags, can be interesting on its own, and should have a very limited impact on the runtime. Upstreaming it would settle on a dump format that external tools can use for analysis.