# Internship report
## Table of contents
[TOC]
## Introduction
The subject of this internship was to explore the possibility of building tools for analysing
memory problems in OCaml programs. Because OCaml has a garbage collector, memory
problems can be hard to debug. The current tools are not satisfactory, especially for finding
memory leaks. At the beginning of the internship, several options were discussed and a
more precise topic was chosen.
## Possible projects that were not explored
Here we list some topics and projects that looked interesting but were not
investigated further.
### HTOP-like for LWT
Tokio is an asynchronous runtime for Rust. Its developers have built an htop-like interface for
it named [Tokio-console](https://tokio.rs/blog/2021-12-announcing-tokio-console). It shows
information about the tasks for debugging purposes. One idea is to develop a similar tool for
LWT. This tool could show how many promises are currently active, what they are waiting for, which
one is running, etc. This project seems quite big and would most likely require modifying
LWT to gather meaningful information, as well as building tools to analyse that information.
To build such a tool, finding the source location of a closure may be needed. The project
[Owee](https://github.com/let-def/owee) can be used to do that.
### Runtime Events and EIO-console
OCaml 5 also has a new profiling tool called [Runtime Events](https://github.com/ocaml/ocaml/pull/10964).
It is only available in the next version of OCaml, but it could be interesting to follow and see
what possibilities it offers. There is also an application that monitors an OCaml
application using this new profiling tool, [Eio-console](https://github.com/patricoferris/eio-console).
## Post mortem memory analysis
The project chosen for the internship was to build a tool for post-mortem analysis of
the memory of an OCaml program.
The idea is the following: while a program is running, a dump of its memory can be taken
to understand what is in memory at the time of the dump.
Several dumps can be made for a running application, for example after each compaction.
Then the sequence of dumps can be analysed to understand how the memory usage evolves.
### Constraints
As we want to study the memory behaviour of a program, the tool should not
modify the memory behaviour of the running program and should be as non-invasive as possible.
Also, OCaml is statically typed, so there is little to no information at runtime
about what is in memory. We can traverse the memory as the GC does but, by default,
we cannot retrieve more information about what is in memory.
### Approach
Our approach is the following. We chose to start from the
[fork](https://github.com/let-def/ocaml/tree/tagl-413) developed by Frédéric Bour.
The idea is to modify the header of OCaml values on 64-bit platforms to include more
information about the value without changing the memory used by the program.
On 64-bit platforms, the structure of the header is the following:
```
     +--------+-------+-----+
     | wosize | color | tag |
     +--------+-------+-----+
bits  63    10 9     8 7   0
```
In the compiler, it is possible to reduce the size of `wosize` to reserve some bits for
other purposes. This infrastructure is called PROFINFO. With it, the header has the following
structure:
```
P = PROFINFO_WIDTH (as set by "configure", currently 26 bits, giving a
    maximum block size of just under 4Gb)
     +----------------+----------------+-------------+
     | profiling info | wosize         | color | tag |
     +----------------+----------------+-------------+
bits  63        (64-P) (63-P)        10 9     8 7   0
```
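The two layouts can be made concrete with a little bit arithmetic. The following Python sketch splits a 64-bit header word into its fields. The names `decode_header` and `encode_header` are illustrative and are not part of the OCaml runtime; the default width of 26 bits is the value quoted in the comment above.

```python
PROFINFO_WIDTH = 26  # value quoted above; the real value is set by "configure"


def decode_header(hd: int, profinfo_width: int = PROFINFO_WIDTH) -> dict:
    """Split an unsigned 64-bit header word into its fields."""
    wosize_width = 64 - 10 - profinfo_width  # bits left for wosize
    return {
        "tag": hd & 0xFF,                     # bits 7..0
        "color": (hd >> 8) & 0x3,             # bits 9..8
        "wosize": (hd >> 10) & ((1 << wosize_width) - 1),
        # With profinfo_width == 0 there is no profinfo field at all.
        "profinfo": (hd >> (64 - profinfo_width)) if profinfo_width else 0,
    }


def encode_header(wosize: int, color: int, tag: int, profinfo: int = 0,
                  profinfo_width: int = PROFINFO_WIDTH) -> int:
    """Inverse of decode_header, useful for building test values."""
    hd = (wosize << 10) | (color << 8) | tag
    if profinfo_width:
        hd |= profinfo << (64 - profinfo_width)
    return hd


# Round trip on an arbitrary example value.
hd = encode_header(wosize=3, color=0, tag=0, profinfo=0x2ABCDEF)
fields = decode_header(hd)
```

Passing `profinfo_width=0` decodes the plain layout without the profiling field.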
Using this infrastructure, Frédéric Bour stored information in the profinfo field to be able to
print any OCaml value using a syntax close to the original OCaml code. This approach is
probabilistic: a hash is included in the profinfo field, so collisions can happen.
This information can be seen as a form of type information. Based on this work, we
modified the OCaml runtime to add a dump function and wrote a tool to analyse those dumps.
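To get a rough sense of how likely such collisions are, a birthday-bound estimate can be used: for n distinct entries hashed into N = 2^bits slots, the probability of at least one collision is approximately 1 - exp(-n(n-1)/(2N)). The sketch below assumes, as a simplification that is not taken from the patch itself, that the hash is uniformly distributed over all 26 bits of the profinfo field. The name `collision_probability` is hypothetical.

```python
import math


def collision_probability(n: int, bits: int = 26) -> float:
    """Birthday-bound approximation of the chance that at least two of
    n distinct entries share the same hash of the given width."""
    if n < 2:
        return 0.0
    space = 1 << bits  # number of possible hash values
    # -expm1(-x) computes 1 - exp(-x) accurately for small x.
    return -math.expm1(-n * (n - 1) / (2 * space))


# Estimate for a few program sizes (number of distinct hashed entries).
estimates = {n: collision_probability(n) for n in (100, 1_000, 10_000)}
```

Under this assumption the risk is negligible for a hundred entries but becomes significant around ten thousand, which is why the approach has to be treated as probabilistic.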
### Modification of the OCaml compiler
The work of Frédéric Bour has been ported from OCaml 4.13.0 to OCaml 4.14.0. Other than
that, the patches to the compiler were not modified. A future task would be to add more
information to the hash, but this has not been done yet.
### Dumping the OCaml memory
One possibility is to use a core dump from the Linux kernel but, by doing this, we would
get all the memory of the program, including the runtime. From this, it is not clear how
to retrieve the roots and the OCaml values. Therefore, we chose to modify the OCaml
runtime to add a dump function. To dump the memory of an OCaml program, one needs to understand
how OCaml handles memory. A full explanation of memory allocation and management
is out of the scope of this document. We recommend reading the following to learn about it:
- The [PhD thesis](https://hal-ensta-paris.archives-ouvertes.fr/ENSTA_U2IS/tel-01122262v1) of Çagdas Bozman
- https://dev.realworldocaml.org/runtime-memory-layout.html
- https://dev.realworldocaml.org/garbage-collector.html
- https://v2.ocaml.org/manual/intfc.html
The functions that perform the dump are defined in
[`runtime/dump.c`](https://github.com/FardaleM/ocaml/blob/tagl-414-dump/runtime/dump.c).
The format of the dump is as follows.
First, all the roots are dumped, followed by a null word that marks the end.
A memory region is dumped by writing the first address of the region, its size in bytes as
a word, and then the actual content of the region. After the roots, the minor heap is dumped,
then all the chunks of the major heap, and finally the data segment.
We also define a new option `d` for the `OCAMLRUNPARAM` environment variable, which enables
a dump after each compaction.
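The dump layout described above can be read back with a few lines of code. The following Python sketch is only an illustration and is not taken from the analysis tool. It assumes 64-bit little-endian words (AMD64), one word per root, and regions that follow each other until the end of the file with no extra framing. The authoritative definition of the format remains `runtime/dump.c`, and the names `Region` and `parse_dump` are hypothetical.

```python
import io
import struct
from typing import BinaryIO, List, NamedTuple, Optional, Tuple

WORD = 8  # assumed word size in bytes (64-bit platform)


class Region(NamedTuple):
    start: int   # first address of the region
    data: bytes  # raw content of the region


def read_word(f: BinaryIO) -> Optional[int]:
    """Read one little-endian word, or return None at end of file."""
    raw = f.read(WORD)
    if not raw:
        return None
    if len(raw) != WORD:
        raise ValueError("truncated word")
    return struct.unpack("<Q", raw)[0]


def parse_dump(f: BinaryIO) -> Tuple[List[int], List[Region]]:
    # 1. Roots, terminated by a null word.
    roots = []
    while True:
        w = read_word(f)
        if w is None:
            raise ValueError("missing end-of-roots marker")
        if w == 0:
            break
        roots.append(w)
    # 2. Regions: start address, size in bytes, then the content.
    regions = []
    while True:
        start = read_word(f)
        if start is None:
            break
        size = read_word(f)
        if size is None:
            raise ValueError("region without a size")
        data = f.read(size)
        if len(data) != size:
            raise ValueError("truncated region")
        regions.append(Region(start, data))
    return roots, regions


# Tiny in-memory example: two roots, then one 16-byte region at 0x1000.
sample = (struct.pack("<3Q", 0x1008, 0x1010, 0)
          + struct.pack("<2Q", 0x1000, 16) + bytes(16))
roots, regions = parse_dump(io.BytesIO(sample))
```

The parser does not distinguish the minor heap, the major heap chunks and the data segment; under the assumptions above they simply appear as consecutive regions in that order.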
### Mnemonicoeus
Alongside the modification of the runtime, we built a tool to analyse the dumps. This
tool is still a prototype. It assumes the AMD64 architecture and a PROFINFO field
of 26 bits.
For more information, please read the README file of the
[project](https://gitlab.com/Fardale/mnemonicoeus).
### Known issues
There are currently two main issues with dumping the memory and analysing it.
First, it seems that the traversal of the memory from the roots does not scan the minor
heap. Second, during the traversal of the memory, some memory addresses are not found in the
dump.
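A small checker can help investigate the second issue by listing exactly which pointers fall outside the dumped regions. The Python sketch below is a simplified illustration, not the traversal used by the tool. It assumes that roots are OCaml values, that words are 64-bit little-endian, and that the PROFINFO field is 26 bits wide. It does not treat closures or infix pointers specially, so code pointers stored in closures would also be reported. The function names are hypothetical.

```python
import struct
from bisect import bisect_right

WORD = 8             # assumed word size in bytes
PROFINFO_WIDTH = 26  # assumed width of the profinfo field
NO_SCAN_TAG = 251    # blocks with a tag >= 251 contain no OCaml values


def make_reader(regions):
    """regions: list of (start_address, bytes). Returns a function that
    reads the word at an address, or None if it is not in the dump."""
    ordered = sorted(regions, key=lambda r: r[0])
    starts = [start for start, _ in ordered]

    def read_word(addr):
        i = bisect_right(starts, addr) - 1
        if i < 0:
            return None
        start, data = ordered[i]
        off = addr - start
        if off + WORD > len(data):
            return None
        return struct.unpack_from("<Q", data, off)[0]

    return read_word


def find_missing(roots, regions):
    """Traverse from the roots; return (reached blocks, missing addresses)."""
    read_word = make_reader(regions)
    wosize_mask = (1 << (64 - 10 - PROFINFO_WIDTH)) - 1
    seen, missing = set(), set()
    todo = list(roots)
    while todo:
        v = todo.pop()
        if v & 1 or v == 0 or v in seen:  # immediate, null or already visited
            continue
        seen.add(v)
        hd = read_word(v - WORD)          # the header sits just before the block
        if hd is None:
            missing.add(v)
            continue
        if (hd & 0xFF) >= NO_SCAN_TAG:
            continue
        for i in range((hd >> 10) & wosize_mask):
            field = read_word(v + i * WORD)
            if field is None:             # block runs past the end of its region
                missing.add(v + i * WORD)
                break
            todo.append(field)
    return seen - missing, missing


# One block of two fields at 0x1008: an immediate and a dangling pointer.
region = (0x1000, struct.pack("<3Q", 2 << 10, 3, 0x2008))
reached, missing = find_missing([0x1008], [region])
```

In this example `missing` contains `0x2008`, the address that is referenced by the block but not covered by any region.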
### Work to be done
In this section, we list the items that need to be done next on this project.
#### Dump of the hash table
Currently, the dump does not contain the table that associates information with the hash
contained in the profinfo field.
For bytecode, this information is written to a header of the file, named `TAGL`.
In native compilation mode, the information is stored in a variable called
`caml_globals_taglib` in the runtime.
#### TMC of OCaml 4.14.0
In OCaml 4.14.0, an experimental optimisation called TMC (tail modulo constructor)
has been introduced. Currently, this optimisation erases the tags. Someone should
look into it to understand the problem and propagate the tags properly.
#### LOCid
Currently, the hash contains the information needed to print the value in OCaml
syntax. This was enough for the debugging goal of Frédéric Bour. For memory
analysis, the location of the allocation is useful information when looking for a
memory leak. This can be added.
#### Bytecode
The current implementation works only for native compilation. A natural next step is
to port it to bytecode.
#### Port to OCaml 5.0
OCaml 5.0 merges the multicore runtime into OCaml, so some changes were made.
In our work, there are two orthogonal modifications that need to be ported. The first is
the modification of the compilation to include the tags in the header. The second is the
function that dumps the memory to a file.
Rebasing the tagl patches from Frédéric Bour on top of trunk, the future OCaml 5.0,
seems to be quite easy. An attempt can be found on the branch
[tagl-trunk](https://github.com/FardaleM/ocaml/tree/tagl-trunk). A function used by
those patches is no longer present in trunk. An
[issue](https://github.com/ocaml/ocaml/issues/11287) is open about it.
For the dump, in OCaml 5.0, code can run in parallel. Therefore, dumping from one
domain can lead to data races. The minor and major GC can run code in a stop-the-world (STW)
section. To be on the safe side, the dump should do the same. The work
should start by looking at `caml_finish_major_cycle` in `runtime/major_gc.c`.
The function `caml_try_run_on_all_domains` seems to be able to initiate an STW section.
#### Upstream the patch
The compiler has been patched in two orthogonal ways.
First, the profinfo infrastructure is used to store data in the header of values.
This information can be useful for debugging purposes. If several use cases are found,
a discussion with upstream is possible to get this into the compiler.
For example, it could be useful for profiling and for `js_of_ocaml`. It could also be used
to instrument backtraces by adding the representation of values to them.
Second, the dump function. This function does not depend on the tags, can be
interesting on its own, and should have a very limited impact on the runtime. Upstreaming
it would fix a dump format that can be used by external tools for analysis.