Off-CPU analysis records and analyses what a program is doing while it is not running on a CPU. See Brendan Gregg's eBPF-based off-CPU analysis: https://www.brendangregg.com/offcpuanalysis.html. While on-CPU performance monitoring tools such as `perf` show where the program is _actively_ spending its time, they won't tell you where the program is spending time _blocked_ waiting for an action. Off-CPU analysis reveals where the program is spending time _passively_.
## Installation
Install the tools from https://github.com/iovisor/bcc/.
## Enabling frame pointers
The off-CPU stack trace collector, `offcputime-bpfcc`, requires programs to be compiled with frame pointers in order to produce full backtraces.
### OCaml
For OCaml, you'll need a compiler variant with frame pointers enabled. If you are installing a released compiler using `opam`, look for `+fp` variants in `opam switch list-available`.
Alternatively, if you are building the OCaml compiler from source, `configure` the compiler with the `--enable-frame-pointers` option:
```
$ ./configure --enable-frame-pointers
```
Lastly, there is an option to create an opam switch with the development branch of the compiler. The instructions are in `ocaml/HACKING.adoc`. In order to create an opam switch from the current working directory, do:
```
$ opam switch create . 'ocaml-option-fp' --working-dir
```
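Whichever route you take, it can be worth confirming that the resulting compiler really has frame pointers enabled before collecting traces. Below is a minimal sketch, assuming that `ocamlopt -config` reports a `with_frame_pointers` field; the helper name `has_frame_pointers` is hypothetical and only used for illustration.

```shell
# Hypothetical helper: reads `ocamlopt -config` output on stdin and
# succeeds only if the compiler was built with frame pointers.
has_frame_pointers() {
  grep -q '^with_frame_pointers:[[:space:]]*true$'
}

# Usage (assumes the ocamlopt of the switch under test is on PATH):
#   ocamlopt -config | has_frame_pointers && echo "frame pointers enabled"
```

The helper reads from standard input rather than calling `ocamlopt` itself, so it can be pointed at the configuration of any switch.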
### glibc
glibc is not compiled with frame pointers by default, which leads to many truncated stack traces. On Ubuntu, I did the following to get a glibc with frame pointers enabled:
1. Install glibc with frame pointers
```
$ sudo apt install libc6-prof
```
2. LD_PRELOAD the glibc with frame pointers
```
$ LD_PRELOAD=/lib/libc6-prof/x86_64-linux-gnu/libc.so.6 ./myapp.exe
```
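If the preload path is mistyped or `libc6-prof` is not installed, the truncated stack traces come back and it is easy to miss why. The wrapper below is a small sketch that checks the library is readable before preloading it. The function name `run_with_prof_libc` and the `PROF_LIBC` variable are hypothetical; the default path is the Ubuntu x86_64 one used above and may differ on other systems.

```shell
# Assumed default: the path installed by Ubuntu's libc6-prof on x86_64.
PROF_LIBC="${PROF_LIBC:-/lib/libc6-prof/x86_64-linux-gnu/libc.so.6}"

# Hypothetical wrapper: run a command with the frame-pointer glibc preloaded,
# or warn on stderr and run it unmodified if the library is missing.
run_with_prof_libc() {
  if [ -r "$PROF_LIBC" ]; then
    LD_PRELOAD="$PROF_LIBC" "$@"
  else
    echo "warning: $PROF_LIBC not found; running without frame-pointer glibc" >&2
    "$@"
  fi
}

# Usage:
#   run_with_prof_libc ./myapp.exe
```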
## Running
In one terminal, run the program that you want to analyze:
```
$ LD_PRELOAD=/lib/libc6-prof/x86_64-linux-gnu/libc.so.6 ./ocamlfoo.exe
```
In another terminal, run the `offcputime-bpfcc` tool:
```
$ sudo offcputime-bpfcc --stack-storage-size 2097152 -p $(pgrep -f ocamlfoo.exe) 10 > offcputime.out
```
The command instruments the program for 10 seconds and then writes the stack traces corresponding to blocking calls to `offcputime.out`. We use a large stack storage size so as not to lose stack traces. Otherwise, you will see many `[Missing User Stack]` errors in the backtraces.
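To find the worst offenders quickly, the output can be summarised from the shell. The sketch below assumes the trace was collected with the tool's folded output option (`-f`), where each line is a semicolon-separated stack followed by the off-CPU time in microseconds. The function name `top_offcpu_stacks` and the file name `offcputime.folded` are hypothetical.

```shell
# Hypothetical helper: read folded stacks on stdin ("frame;frame;... <microseconds>")
# and print the N stacks with the largest total off-CPU time, largest first.
top_offcpu_stacks() {
  n="${1:-10}"
  awk '/[ \t][0-9]+$/ {
         us = $NF                           # trailing field: time in microseconds
         stack = $0
         sub(/[ \t]+[0-9]+$/, "", stack)    # drop the trailing count, keep the stack
         total[stack] += us                 # merge identical stacks
       }
       END { for (s in total) printf "%d %s\n", total[s], s }' |
    sort -rn | head -n "$n"
}

# Usage (assumes a trace collected with -f):
#   sudo offcputime-bpfcc -f --stack-storage-size 2097152 -p <pid> 10 > offcputime.folded
#   top_offcpu_stacks 5 < offcputime.folded
```

The same folded file can also be fed to a flame graph generator, as described on the off-CPU analysis page linked at the top.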
## Caveats
`offcputime-bpfcc` must run longer than the program being instrumented by a few seconds so that the function symbols are resolved. Otherwise you may see `[unknown]` in the backtrace for function names.
## Oddities
I still see an order of magnitude difference between the maximum pauses observed using `offcputime-bpfcc` and `olly trace`. Something is off.
## Other links
* https://www.pingcap.com/blog/how-to-trace-linux-system-calls-in-production-with-minimal-impact-on-performance/