###### tags: `SDE_Good_Read` # Use OProfile on Fedora 36 OProfile is a CPU performance profiling tool on Linux platform. It contains several tools to help user trace the resource usage for every process on the machine. ### Install The easiest way to install oprofile maybe directly download the source code and compile yourself. We can download oprofile [here](https://oprofile.sourceforge.io/news/) Extract the zip file to a proper place, run **``./configure``** to know whether there is a required dependency. In my case, I found out that libutils-devel is a need. So: ```bash sudo dnf install binutils-devel ``` And there is a special requirement is that you should create a user named 'oprofile' to pass the configuration. We can create one esaily with: ```bash sudo useradd oprofile ``` Then, do the usual way to install the binary: ```bash make sudo make install ``` Done. We can check the version to ensure installation success. ```bash operf -v ``` ### Usage We can do a simple test by monitor the machine for a while and then use ``opreport`` to generate a resource usage report. We can start a system-wide monitoring by commanding: ```bash sudo operf -s ``` A profile will be recorded by the operf. After we quit operf, we can use ``opreport`` to read a simple text report. ```bash opreport ``` We can use man ``oprofile <binary>`` to check the detailed help message of that tool. ```bash man operf man oprofile ``` There are short introductions to some oprofile tools: **``ophelp``**: an alternative of help message, but also show a list of all hardware events that can be recorded. **``operf``**: the man part of oprofile. Start a process that monitor performance by listen to the information given by CPU. **``ocount``**: a tool to only count the CPU hardware events rather than generate report. **``opreport``**: a useful tool to generate report after ``operf``. **``oparchive``**: generate the whole archive after ``operf``. **``opgprof``**: generate a special format of profile for a specific binary. ### How it works ? It utilizes hardware performance counter inside CPU to realize a low overhead performance tracking. Perf commonly used ``top`` command are using the same mechanism too. [About hardware performance register](https://en.wikipedia.org/wiki/Hardware_performance_counter) ### Special usage ``operf -g -p <PID>`` can generate the function callgraph of a running binary, kind of amazing. We can write a binary to see its functionality. ( a symbol list is required, not so sure what that means ) ``opreport -d`` can see the CPU usage of each instruction in the binary ! ``sudo opreport -c`` can see the resource usage of an actual function. (Identified by symbols, not so sure what will happens if symbol table is removed.) ### An experiment Let us write a simple task to see if we can find a problem in the source code. ```C++ #include <iostream> #include <cstdlib> #include <ctime> #include <cmath> #include <thread> unsigned long long fibo(int n) { if(n <= 1) return 1; else return fibo(n-1) + fibo(n-2); // bad implementation }; unsigned long long complex_calculate(int n) { return fibo(n); } unsigned long long easy_calculate(int n) { return (unsigned long long)pow(n, 42); } int main() { srand(time(0)); int cnt = 0; while(rand() % 10000000 != 0) { std::cout << cnt++ << "\n"; if (rand() % 1000 == 42) { std::thread t(complex_calculate, 42); t.detach(); } else { std::thread t(easy_calculate, 42); t.detach(); } } } ``` This program has a fatal problem. The fibo() recursive function is not optimized. It will over recurse and waste a lot of CPU resource. In fact, we can observe significant delay after the program running about 10 seconds. 1. We at first start a operf by ``sudo operf -s -g`` to monitor. Then we run the program about 1 minutes. 2. Then, we can see the report by ``opreport`` But if we do that it will look trival. Instead, we can use ``opreport -t <THRESHOLD>`` to set a threshold to find those resource-consuming programs/ We command ``opreport -t 5``, such report is showed: ```bash cpu_clk_unhalt...| samples| %| ------------------ 59679 94.4288 a.out cpu_clk_unhalt...| samples| %| ------------------ 55954 93.7583 a.out 3539 5.9301 kallsyms 145 0.2430 libc.so.6 24 0.0402 libstdc++.so.6.0.30 7 0.0117 i915 7 0.0117 ld-linux-x86-64.so.2 3 0.0050 libm.so.6 ``` 3. Now we ensure the reason of that delay is the program a.out. We can further use ``sudo opreport -c -t 1`` to see which function consume the most. ```bash samples % app name symbol name ------------------------------------------------------------------------------- 55952 88.5344 a.out fibo(int) 55952 98.5747 a.out fibo(int) [self] ------------------------------------------------------------------------------- ``` 4. Then, we can fix this program to solve the delay problem. ```C++ #include <iostream> #include <cstdlib> #include <ctime> #include <cmath> #include <thread> unsigned long long fibo(int n) { int a=1,b=1; if(n <= 1)return 1; while(n--) { b = a + b; a = b - a; } return b; }; unsigned long long complex_calculate(int n) { return fibo(n); } unsigned long long easy_calculate(int n) { return (unsigned long long)pow(n, 42); } int main() { srand(time(0)); int cnt = 0; while(rand() % 10000000 != 0) { std::cout << cnt++ << "\n"; if (rand() % 1000 == 42) { std::thread t(complex_calculate, 42); t.detach(); } else { std::thread t(easy_calculate, 42); t.detach(); } } } ``` 5. Do the experiment again, after ``operf`` for 60 seconds, we ``opreport -t 5`` to show the CPU resource profile: ```bash cpu_clk_unhalt...| samples| %| ------------------ 15982 51.4387 a.out cpu_clk_unhalt...| samples| %| ------------------ 14661 91.7345 kallsyms 1147 7.1768 libc.so.6 92 0.5756 libstdc++.so.6.0.30 34 0.2127 libm.so.6 33 0.2065 ld-linux-x86-64.so.2 15 0.0939 a.out 7862 25.3042 gnome-terminal-server cpu_clk_unhalt...| samples| %| ------------------ 3956 50.3180 kallsyms 1632 20.7581 libglib-2.0.so.0.7200.1 499 6.3470 libpixman-1.so.0.40.0 463 5.8891 [vdso] (tgid:6424 range:0x7fff291dd000-0x7fff291defff) 431 5.4821 libvte-2.91.so.0.6800.0 409 5.2022 libc.so.6 137 1.7426 libz.so.1.2.11 115 1.4627 libdbus-1.so.3.32.0 45 0.5724 libcairo.so.2.11706.0 35 0.4452 libgobject-2.0.so.0.7200.1 33 0.4197 libgdk-3.so.0.2404.29 33 0.4197 libwayland-client.so.0.20.0 29 0.3689 libgtk-3.so.0.2404.29 18 0.2289 libatspi.so.0.0.1 18 0.2289 libgio-2.0.so.0.7200.1 2 0.0254 libpango-1.0.so.0.5000.7 1 0.0127 i915 1 0.0127 ld-linux-x86-64.so.2 1 0.0127 libffi.so.8.1.0 1 0.0127 libfribidi.so.0.4.0 1 0.0127 libgnutls.so.30.32.0 1 0.0127 libnettle.so.8.4 1 0.0127 libpangoft2-1.0.so.0.5000.7 2561 8.2427 operf cpu_clk_unhalt...| samples| %| ------------------ 1905 74.3850 kallsyms 601 23.4674 operf 49 1.9133 libc.so.6 6 0.2343 libstdc++.so.6.0.30 ``` We can find out that the resource comsumption of a.out decreases for a large amount. And by the callpath, most of the resource is consumed by system call now. 6. Show the callpath by ``sudo opreport -c -t 1`` ```bash samples % image name app name symbol name ------------------------------------------------------------------------------- 1632 5.2545 libglib-2.0.so.0.7200.1 gnome-terminal-server /usr/lib64/libglib-2.0.so.0.7200.1 1632 100.000 libglib-2.0.so.0.7200.1 gnome-terminal-server /usr/lib64/libglib-2.0.so.0.7200.1 [self] ------------------------------------------------------------------------------- 271 20.8622 kallsyms a.out ret_from_fork 1028 79.1378 kallsyms a.out do_syscall_64 1094 3.5223 kallsyms a.out syscall_exit_to_user_mode 1094 84.2186 kallsyms a.out syscall_exit_to_user_mode [self] 201 15.4734 kallsyms a.out exit_to_user_mode_prepare ------------------------------------------------------------------------------- 1336 100.000 kallsyms a.out __get_vm_area_node.constprop.0 1073 3.4547 kallsyms a.out alloc_vmap_area 1073 80.3144 kallsyms a.out alloc_vmap_area [self] 86 6.4371 kallsyms a.out insert_vmap_area.constprop.0 63 4.7156 kallsyms a.out insert_vmap_area_augment.constprop.0 56 4.1916 kallsyms a.out kmem_cache_alloc_node 45 3.3683 kallsyms a.out _raw_spin_lock ------------------------------------------------------------------------------- 45 4.3945 kallsyms a.out attach_entity_cfs_rq 59 5.7617 kallsyms a.out set_next_entity 106 10.3516 kallsyms a.out enqueue_entity 168 16.4062 kallsyms a.out dequeue_entity 623 60.8398 kallsyms a.out propagate_entity_cfs_rq 679 2.1862 kallsyms a.out update_load_avg 679 66.3086 kallsyms a.out update_load_avg [self] 169 16.5039 kallsyms a.out __update_load_avg_se 156 15.2344 kallsyms a.out __update_load_avg_cfs_rq 16 1.5625 kallsyms a.out _raw_spin_lock ------------------------------------------------------------------------------- 702 100.000 kallsyms gnome-terminal-server do_syscall_64 632 2.0348 kallsyms gnome-terminal-server syscall_exit_to_user_mode 632 90.0285 kallsyms gnome-terminal-server syscall_exit_to_user_mode [self] 65 9.2593 kallsyms gnome-terminal-server exit_to_user_mode_prepare ------------------------------------------------------------------------------- 526 1.6936 operf operf operf_sfile_find(operf_transient const*) 526 98.6867 operf operf operf_sfile_find(operf_transient const*) [self] ------------------------------------------------------------------------------- 499 1.6066 libpixman-1.so.0.40.0 gnome-terminal-server /usr/lib64/libpixman-1.so.0.40.0 499 100.000 libpixman-1.so.0.40.0 gnome-terminal-server /usr/lib64/libpixman-1.so.0.40.0 [self] ------------------------------------------------------------------------------- 463 1.4907 [vdso] (tgid:6424 range:0x7fff291dd000-0x7fff291defff) gnome-terminal-server [vdso] (tgid:6424 range:0x7fff291dd000-0x7fff291defff) 463 100.000 [vdso] (tgid:6424 range:0x7fff291dd000-0x7fff291defff) gnome-terminal-server [vdso] (tgid:6424 range:0x7fff291dd000-0x7fff291defff) [self] ------------------------------------------------------------------------------- 459 100.000 kallsyms a.out post_alloc_hook 459 1.4778 kallsyms a.out clear_page_erms 459 100.000 kallsyms a.out clear_page_erms [self] ------------------------------------------------------------------------------- 431 1.3877 libvte-2.91.so.0.6800.0 gnome-terminal-server /usr/lib64/libvte-2.91.so.0.6800.0 431 100.000 libvte-2.91.so.0.6800.0 gnome-terminal-server /usr/lib64/libvte-2.91.so.0.6800.0 [self] ------------------------------------------------------------------------------- 508 100.000 kallsyms a.out zap_page_range 425 1.3684 kallsyms a.out unmap_page_range 425 83.6614 kallsyms a.out unmap_page_range [self] 41 8.0709 kallsyms a.out _raw_spin_lock 12 2.3622 kallsyms a.out sync_mm_rss 11 2.1654 kallsyms a.out _raw_spin_unlock 9 1.7717 kallsyms a.out asm_sysvec_apic_timer_interrupt ------------------------------------------------------------------------------- 428 100.000 kallsyms operf do_syscall_64 413 1.3297 kallsyms operf syscall_exit_to_user_mode 413 96.4953 kallsyms operf syscall_exit_to_user_mode [self] 10 2.3364 kallsyms operf exit_to_user_mode_prepare 5 1.1682 kallsyms operf syscall_exit_to_user_mode_prepare ------------------------------------------------------------------------------- 12 3.2432 kallsyms a.out tty_insert_flip_string_fixed_flag 13 3.5135 kallsyms a.out perf_output_copy 345 93.2432 kallsyms a.out arch_dup_task_struct 369 1.1881 kallsyms a.out memcpy_erms 369 99.7297 kallsyms a.out memcpy_erms [self] ------------------------------------------------------------------------------- 139 41.9940 kallsyms a.out dequeue_entity 189 57.0997 kallsyms a.out enqueue_entity 331 1.0657 kallsyms a.out update_cfs_group 331 100.000 kallsyms a.out update_cfs_group [self] ``` The most resource comsuming function now is kernal call by kallsyms, which should be the operation behind std::threading. ### Conclusion In this page we show a CPU Performance Profiling tool called OProfile. How to use it and how it works. And we demo an experiment by leaving an intended bug, then use OProfile to point it out.