###### tags: `SDE_Good_Read`
# Use OProfile on Fedora 36
OProfile is a CPU performance profiling tool on Linux platform. It contains several tools to help user trace the resource usage for every process on the machine.
### Install
The easiest way to install oprofile maybe directly download the source code and compile yourself.
We can download oprofile [here](https://oprofile.sourceforge.io/news/)
Extract the zip file to a proper place, run **``./configure``** to know whether there is a required dependency. In my case, I found out that libutils-devel is a need. So:
```bash
sudo dnf install binutils-devel
```
And there is a special requirement is that you should create a user named 'oprofile' to pass the configuration. We can create one esaily with:
```bash
sudo useradd oprofile
```
Then, do the usual way to install the binary:
```bash
make
sudo make install
```
Done. We can check the version to ensure installation success.
```bash
operf -v
```
### Usage
We can do a simple test by monitor the machine for a while and then use ``opreport`` to generate a resource usage report.
We can start a system-wide monitoring by commanding:
```bash
sudo operf -s
```
A profile will be recorded by the operf. After we quit operf, we can use ``opreport`` to read a simple text report.
```bash
opreport
```
We can use man ``oprofile <binary>`` to check the detailed help message of that tool.
```bash
man operf
man oprofile
```
There are short introductions to some oprofile tools:
**``ophelp``**: an alternative of help message, but also show a list of all hardware events that can be recorded.
**``operf``**: the man part of oprofile. Start a process that monitor performance by listen to the information given by CPU.
**``ocount``**: a tool to only count the CPU hardware events rather than generate report.
**``opreport``**: a useful tool to generate report after ``operf``.
**``oparchive``**: generate the whole archive after ``operf``.
**``opgprof``**: generate a special format of profile for a specific binary.
### How it works ?
It utilizes hardware performance counter inside CPU to realize a low overhead performance tracking. Perf commonly used ``top`` command are using the same mechanism too.
[About hardware performance register](https://en.wikipedia.org/wiki/Hardware_performance_counter)
### Special usage
``operf -g -p <PID>`` can generate the function callgraph of a running binary, kind of amazing. We can write a binary to see its functionality. ( a symbol list is required, not so sure what that means )
``opreport -d`` can see the CPU usage of each instruction in the binary !
``sudo opreport -c`` can see the resource usage of an actual function. (Identified by symbols, not so sure what will happens if symbol table is removed.)
### An experiment
Let us write a simple task to see if we can find a problem in the source code.
```C++
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <cmath>
#include <thread>
unsigned long long fibo(int n) {
if(n <= 1) return 1;
else return fibo(n-1) + fibo(n-2);
// bad implementation
};
unsigned long long complex_calculate(int n) {
return fibo(n);
}
unsigned long long easy_calculate(int n) {
return (unsigned long long)pow(n, 42);
}
int main() {
srand(time(0));
int cnt = 0;
while(rand() % 10000000 != 0) {
std::cout << cnt++ << "\n";
if (rand() % 1000 == 42) {
std::thread t(complex_calculate, 42);
t.detach();
} else {
std::thread t(easy_calculate, 42);
t.detach();
}
}
}
```
This program has a fatal problem. The fibo() recursive function is not optimized. It will over recurse and waste a lot of CPU resource. In fact, we can observe significant delay after the program running about 10 seconds.
1. We at first start a operf by ``sudo operf -s -g`` to monitor. Then we run the program about 1 minutes.
2. Then, we can see the report by ``opreport``
But if we do that it will look trival. Instead, we can use ``opreport -t <THRESHOLD>`` to set a threshold to find those resource-consuming programs/
We command ``opreport -t 5``, such report is showed:
```bash
cpu_clk_unhalt...|
samples| %|
------------------
59679 94.4288 a.out
cpu_clk_unhalt...|
samples| %|
------------------
55954 93.7583 a.out
3539 5.9301 kallsyms
145 0.2430 libc.so.6
24 0.0402 libstdc++.so.6.0.30
7 0.0117 i915
7 0.0117 ld-linux-x86-64.so.2
3 0.0050 libm.so.6
```
3. Now we ensure the reason of that delay is the program a.out. We can further use ``sudo opreport -c -t 1`` to see which function consume the most.
```bash
samples % app name symbol name
-------------------------------------------------------------------------------
55952 88.5344 a.out fibo(int)
55952 98.5747 a.out fibo(int) [self]
-------------------------------------------------------------------------------
```
4. Then, we can fix this program to solve the delay problem.
```C++
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <cmath>
#include <thread>
unsigned long long fibo(int n) {
int a=1,b=1;
if(n <= 1)return 1;
while(n--) {
b = a + b;
a = b - a;
}
return b;
};
unsigned long long complex_calculate(int n) {
return fibo(n);
}
unsigned long long easy_calculate(int n) {
return (unsigned long long)pow(n, 42);
}
int main() {
srand(time(0));
int cnt = 0;
while(rand() % 10000000 != 0) {
std::cout << cnt++ << "\n";
if (rand() % 1000 == 42) {
std::thread t(complex_calculate, 42);
t.detach();
} else {
std::thread t(easy_calculate, 42);
t.detach();
}
}
}
```
5. Do the experiment again, after ``operf`` for 60 seconds, we ``opreport -t 5`` to show the CPU resource profile:
```bash
cpu_clk_unhalt...|
samples| %|
------------------
15982 51.4387 a.out
cpu_clk_unhalt...|
samples| %|
------------------
14661 91.7345 kallsyms
1147 7.1768 libc.so.6
92 0.5756 libstdc++.so.6.0.30
34 0.2127 libm.so.6
33 0.2065 ld-linux-x86-64.so.2
15 0.0939 a.out
7862 25.3042 gnome-terminal-server
cpu_clk_unhalt...|
samples| %|
------------------
3956 50.3180 kallsyms
1632 20.7581 libglib-2.0.so.0.7200.1
499 6.3470 libpixman-1.so.0.40.0
463 5.8891 [vdso] (tgid:6424 range:0x7fff291dd000-0x7fff291defff)
431 5.4821 libvte-2.91.so.0.6800.0
409 5.2022 libc.so.6
137 1.7426 libz.so.1.2.11
115 1.4627 libdbus-1.so.3.32.0
45 0.5724 libcairo.so.2.11706.0
35 0.4452 libgobject-2.0.so.0.7200.1
33 0.4197 libgdk-3.so.0.2404.29
33 0.4197 libwayland-client.so.0.20.0
29 0.3689 libgtk-3.so.0.2404.29
18 0.2289 libatspi.so.0.0.1
18 0.2289 libgio-2.0.so.0.7200.1
2 0.0254 libpango-1.0.so.0.5000.7
1 0.0127 i915
1 0.0127 ld-linux-x86-64.so.2
1 0.0127 libffi.so.8.1.0
1 0.0127 libfribidi.so.0.4.0
1 0.0127 libgnutls.so.30.32.0
1 0.0127 libnettle.so.8.4
1 0.0127 libpangoft2-1.0.so.0.5000.7
2561 8.2427 operf
cpu_clk_unhalt...|
samples| %|
------------------
1905 74.3850 kallsyms
601 23.4674 operf
49 1.9133 libc.so.6
6 0.2343 libstdc++.so.6.0.30
```
We can find out that the resource comsumption of a.out decreases for a large amount. And by the callpath, most of the resource is consumed by system call now.
6. Show the callpath by ``sudo opreport -c -t 1``
```bash
samples % image name app name symbol name
-------------------------------------------------------------------------------
1632 5.2545 libglib-2.0.so.0.7200.1 gnome-terminal-server /usr/lib64/libglib-2.0.so.0.7200.1
1632 100.000 libglib-2.0.so.0.7200.1 gnome-terminal-server /usr/lib64/libglib-2.0.so.0.7200.1 [self]
-------------------------------------------------------------------------------
271 20.8622 kallsyms a.out ret_from_fork
1028 79.1378 kallsyms a.out do_syscall_64
1094 3.5223 kallsyms a.out syscall_exit_to_user_mode
1094 84.2186 kallsyms a.out syscall_exit_to_user_mode [self]
201 15.4734 kallsyms a.out exit_to_user_mode_prepare
-------------------------------------------------------------------------------
1336 100.000 kallsyms a.out __get_vm_area_node.constprop.0
1073 3.4547 kallsyms a.out alloc_vmap_area
1073 80.3144 kallsyms a.out alloc_vmap_area [self]
86 6.4371 kallsyms a.out insert_vmap_area.constprop.0
63 4.7156 kallsyms a.out insert_vmap_area_augment.constprop.0
56 4.1916 kallsyms a.out kmem_cache_alloc_node
45 3.3683 kallsyms a.out _raw_spin_lock
-------------------------------------------------------------------------------
45 4.3945 kallsyms a.out attach_entity_cfs_rq
59 5.7617 kallsyms a.out set_next_entity
106 10.3516 kallsyms a.out enqueue_entity
168 16.4062 kallsyms a.out dequeue_entity
623 60.8398 kallsyms a.out propagate_entity_cfs_rq
679 2.1862 kallsyms a.out update_load_avg
679 66.3086 kallsyms a.out update_load_avg [self]
169 16.5039 kallsyms a.out __update_load_avg_se
156 15.2344 kallsyms a.out __update_load_avg_cfs_rq
16 1.5625 kallsyms a.out _raw_spin_lock
-------------------------------------------------------------------------------
702 100.000 kallsyms gnome-terminal-server do_syscall_64
632 2.0348 kallsyms gnome-terminal-server syscall_exit_to_user_mode
632 90.0285 kallsyms gnome-terminal-server syscall_exit_to_user_mode [self]
65 9.2593 kallsyms gnome-terminal-server exit_to_user_mode_prepare
-------------------------------------------------------------------------------
526 1.6936 operf operf operf_sfile_find(operf_transient const*)
526 98.6867 operf operf operf_sfile_find(operf_transient const*) [self]
-------------------------------------------------------------------------------
499 1.6066 libpixman-1.so.0.40.0 gnome-terminal-server /usr/lib64/libpixman-1.so.0.40.0
499 100.000 libpixman-1.so.0.40.0 gnome-terminal-server /usr/lib64/libpixman-1.so.0.40.0 [self]
-------------------------------------------------------------------------------
463 1.4907 [vdso] (tgid:6424 range:0x7fff291dd000-0x7fff291defff) gnome-terminal-server [vdso] (tgid:6424 range:0x7fff291dd000-0x7fff291defff)
463 100.000 [vdso] (tgid:6424 range:0x7fff291dd000-0x7fff291defff) gnome-terminal-server [vdso] (tgid:6424 range:0x7fff291dd000-0x7fff291defff) [self]
-------------------------------------------------------------------------------
459 100.000 kallsyms a.out post_alloc_hook
459 1.4778 kallsyms a.out clear_page_erms
459 100.000 kallsyms a.out clear_page_erms [self]
-------------------------------------------------------------------------------
431 1.3877 libvte-2.91.so.0.6800.0 gnome-terminal-server /usr/lib64/libvte-2.91.so.0.6800.0
431 100.000 libvte-2.91.so.0.6800.0 gnome-terminal-server /usr/lib64/libvte-2.91.so.0.6800.0 [self]
-------------------------------------------------------------------------------
508 100.000 kallsyms a.out zap_page_range
425 1.3684 kallsyms a.out unmap_page_range
425 83.6614 kallsyms a.out unmap_page_range [self]
41 8.0709 kallsyms a.out _raw_spin_lock
12 2.3622 kallsyms a.out sync_mm_rss
11 2.1654 kallsyms a.out _raw_spin_unlock
9 1.7717 kallsyms a.out asm_sysvec_apic_timer_interrupt
-------------------------------------------------------------------------------
428 100.000 kallsyms operf do_syscall_64
413 1.3297 kallsyms operf syscall_exit_to_user_mode
413 96.4953 kallsyms operf syscall_exit_to_user_mode [self]
10 2.3364 kallsyms operf exit_to_user_mode_prepare
5 1.1682 kallsyms operf syscall_exit_to_user_mode_prepare
-------------------------------------------------------------------------------
12 3.2432 kallsyms a.out tty_insert_flip_string_fixed_flag
13 3.5135 kallsyms a.out perf_output_copy
345 93.2432 kallsyms a.out arch_dup_task_struct
369 1.1881 kallsyms a.out memcpy_erms
369 99.7297 kallsyms a.out memcpy_erms [self]
-------------------------------------------------------------------------------
139 41.9940 kallsyms a.out dequeue_entity
189 57.0997 kallsyms a.out enqueue_entity
331 1.0657 kallsyms a.out update_cfs_group
331 100.000 kallsyms a.out update_cfs_group [self]
```
The most resource comsuming function now is kernal call by kallsyms, which should be the operation behind std::threading.
### Conclusion
In this page we show a CPU Performance Profiling tool called OProfile. How to use it and how it works. And we demo an experiment by leaving an intended bug, then use OProfile to point it out.