source: Steven Rostedt - Learning the Linux Kernel with tracing
`puts`?

The `objdump` output indicates that the `main` function calls `puts`. The source code, however, calls `printf`, not `puts`.
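This is due to the `gcc` compiler optimizing our source code: a `printf` call whose format string contains no conversions and ends in a newline can be replaced by the cheaper `puts`. To prevent the compiler from performing such optimizations, we can pass the `-O0` flag to disable them. If the substitution still shows up with a particular GCC version, `-fno-builtin-printf` turns it off explicitly.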
After disabling optimization with the `-O0` flag, the `printf` call is present in our `main` function as expected.
Every time we run the `main` binary, it prints a different address.
This is caused by Address Space Layout Randomization (ASLR) in the Linux kernel. We can disable it by writing `0` to `/proc/sys/kernel/randomize_va_space` as root.
After disabling ASLR, the `main` binary always prints the same address.
Remember to restore `randomize_va_space` after the experiment, as it is a security feature.
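To check which ASLR mode is active before and after the experiment, a small reader like the one below may help. It is only a sketch: the function names `describe_aslr` and `read_aslr` are made up for illustration, and the mode descriptions paraphrase the kernel's sysctl documentation.

```python
from pathlib import Path

# Meanings of the value in /proc/sys/kernel/randomize_va_space,
# paraphrased from the kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled: addresses stay the same across runs",
    1: "conservative: stack, mmap base, vDSO and shared libraries are randomized",
    2: "full: as mode 1, plus the heap (brk) is randomized",
}


def describe_aslr(raw):
    """Translate the raw file content (for example '2\\n') into a description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown mode {value}")


def read_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current setting; reading does not require root."""
    return describe_aslr(Path(path).read_text())
```

Calling `read_aslr()` on a Linux machine reports the current mode. Noting the original value first makes it easy to write the same value back once the experiment is done.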
0x560799d6e149
The virtual address is split into fields of 9, 9, 9, 9, and 12 bits, which together make up a 48-bit address space. Each 9-bit field is an index into one level of the page tables: the walk uses entry `0xac` in the PGD, entry `0x117` in the PUD, entry `0xf1` in the PMD, and entry `0x194` in the PTE table. The final 12 bits are the offset within the page that the PTE points to.
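The 9-9-9-9-12 split can be reproduced with a few shifts and masks. The sketch below assumes 4-level paging with 4 KiB pages, as the text describes; `split_virtual_address` and `join_indices` are illustrative names, not kernel functions. It splits the address `0x560799d6e149` shown above, and also joins the four indices quoted in the text back into an address. The two do not line up beyond the PGD index, so the quoted indices presumably come from a different run of the program.

```python
def split_virtual_address(addr):
    """Split a 48-bit virtual address into 9-9-9-9-12 bit fields."""
    return {
        "pgd": (addr >> 39) & 0x1FF,   # bits 47..39
        "pud": (addr >> 30) & 0x1FF,   # bits 38..30
        "pmd": (addr >> 21) & 0x1FF,   # bits 29..21
        "pte": (addr >> 12) & 0x1FF,   # bits 20..12
        "offset": addr & 0xFFF,        # bits 11..0
    }


def join_indices(pgd, pud, pmd, pte, offset=0):
    """Rebuild a virtual address from its table indices and page offset."""
    return (pgd << 39) | (pud << 30) | (pmd << 21) | (pte << 12) | offset


fields = split_virtual_address(0x560799D6E149)
print({name: hex(value) for name, value in fields.items()})
# {'pgd': '0xac', 'pud': '0x1e', 'pmd': '0xce', 'pte': '0x16e', 'offset': '0x149'}

print(hex(join_indices(0xAC, 0x117, 0xF1, 0x194)))
# 0x5645de394000
```

Both directions are plain bit arithmetic; the actual translation is done by the MMU using the tables the kernel maintains.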
If no mapping exists for the address in the page tables, the access raises a page fault; the kernel then loads or allocates the page and updates the corresponding page-table entries.
Since each application has its own virtual address space, each can have the same virtual address that points to a different physical address.
The application has its own process space, which contains sections such as data, text, stack, heap, and others. However, what resides in the kernel space?
We can determine the system calls invoked from user space by using `strace`, which reveals each call made by the `main` program. To understand the activity within kernel space, on the other hand, we can use `ftrace`.
`strace` stands for System Call Trace. With `strace`, you can see every system call the `main` binary actually makes.
Here is the output.
Here is what these system calls actually do, as generated by ChatGPT.

- The `execve` system call, which is used to execute the program pointed to by the filename. This indicates the start of the `a.out` program.
- `brk` is called with `NULL` to find the current location of the program break.
- The `EINVAL` (Invalid argument) error suggests that the call was made with an incorrect or unsupported code.
- The `ENOENT` (No such file or directory) error suggests that the file `/etc/ld.so.preload` does not exist.
- The `libc.so.6` library, which is the standard C library, is loaded into memory.
- Each of `/etc/ld.so.cache` and the `libc.so.6` library is closed after being read and mapped into memory.

The `+++ exited with 28 +++` line indicates that the program has finished executing and exited with status code 28.
ftrace

The `ftrace` interface lives in:

- `/sys/kernel/tracing`
- `/sys/kernel/debug/tracing` (for backward compatibility)

First of all, please read the README in `/sys/kernel/tracing`.
Let's trace all functions in the kernel.
Here is the sample output.
You may have noticed that the file is very long and seems never-ending. This is because function tracing is still running. To stop tracing, we write `nop` into `current_tracer`.
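The start and stop steps above amount to writing a tracer name into `current_tracer` and reading `trace`. Here is a minimal sketch, assuming tracefs is mounted at `/sys/kernel/tracing` and the script runs as root; the helper names `set_tracer` and `head_trace` are invented for this example.

```python
from itertools import islice
from pathlib import Path

# Assumed mount point, as named in the text; adjust if tracefs lives elsewhere.
TRACEFS = Path("/sys/kernel/tracing")


def set_tracer(name, tracefs=TRACEFS):
    """Select a tracer by writing its name to current_tracer (needs root)."""
    (tracefs / "current_tracer").write_text(name)


def head_trace(lines=20, tracefs=TRACEFS):
    """Return the first `lines` lines of the trace file."""
    with open(tracefs / "trace") as f:
        return [line.rstrip("\n") for line in islice(f, lines)]
```

A typical session would call `set_tracer("function")`, look at `head_trace()`, and finish with `set_tracer("nop")`. Reading only the first few lines avoids scrolling through the whole buffer.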
The main reason we can stop tracing this way is that the `trace` file uses a non-consuming read.
There is also another way to observe the current tracing activity: the `trace_pipe` file. However, while monitoring through this file, we cannot stop tracing in the same manner, because `trace_pipe` prevents the user from writing to `current_tracer`.
As for `trace_pipe`, it keeps delivering new function calls because the `cat` command we use to read it runs kernel functions that are themselves traced. Consequently, it ends up reading its own activity in a loop.
Working with the `trace` files directly is powerful but not particularly user-friendly. You might not want to go through the steps above each time. Therefore, Steven has implemented `trace-cmd` for users like you.
- It mounts `/sys/kernel/tracing` for you

Using the following command with `trace-cmd`, we can obtain the same output as before without the need to modify or mount any files.
Furthermore, we can capture system calls using the `record` command and retrieve them with the `report` command.
- What the `write` function does
- `printf("Hello world\n")` does a system call

Steven traced our "hello" program with the following commands.
The `function_graph` tracer, similar to the `function` tracer, traces kernel function calls, but it records both their entry and exit points. This makes it possible to draw a graph of the call hierarchy that closely reflects the structure of the C source code. More detail is in The Linux Kernel documentation.
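To see how the `trace-cmd record` options described in the list that follows fit together on one command line, here is a small sketch that assembles the argument list. The helper `build_record_cmd` and the `./hello` program path are hypothetical, and this is not necessarily the exact command used in the talk; only the flags themselves (`-p`, `--max-graph-depth`, `-g`, `-l`, `-n`, `-F`) are real `trace-cmd record` options.

```python
import shlex


def build_record_cmd(program, tracer="function_graph", max_depth=None,
                     graph_funcs=(), only_funcs=(), skip_funcs=()):
    """Assemble a `trace-cmd record` argument list for the given program."""
    cmd = ["trace-cmd", "record", "-p", tracer]
    if max_depth is not None:
        cmd += ["--max-graph-depth", str(max_depth)]
    for func in graph_funcs:   # -g: graph this function and everything it calls
        cmd += ["-g", func]
    for func in only_funcs:    # -l: limit tracing to this function
        cmd += ["-l", func]
    for func in skip_funcs:    # -n: never trace this function
        cmd += ["-n", func]
    cmd += ["-F", *program]    # -F: only trace events caused by the command
    return cmd


# "./hello" stands in for the compiled hello-world binary.
print(shlex.join(build_record_cmd(["./hello"], max_depth=1)))
```

The resulting list can be passed to `subprocess.run` as root, followed by `trace-cmd report` to read the recording.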
- `-F`: This will filter only the executable that is given on the command line. If no command is given, then it will filter itself (pretty pointless). Using `-F` will let you trace only events that are caused by the given command.
- `--max-graph-depth`: The `--max-graph-depth` option specifies the maximum depth to which the `function_graph` tracer will record function calls.
- `-g <function name>`: This option is for the `function_graph` plugin. It will graph the given function. That is, it will only trace the function and all functions that it calls. You can have more than one `-g` on the command line.
- `-l <function name>`: This will limit the `function` and `function_graph` tracers to only trace the given function name. More than one `-l` may be specified on the command line to trace more than one function. This supports both full regex(3) parsing and basic glob parsing. If the filter has only alphanumeric, '_', '*', '?' and '.' characters, then it will be parsed as a basic glob. To force it to be a regex, prefix the filter with '^' or append '$' to it.
- `-n <function name>`: This has the opposite effect of `-l`. The function given with the `-n` option will not be traced. This takes precedence, that is, if you include the same function for both `-n` and `-l`, it will not be traced.
- `-O <option>`: Ftrace has various options that can be enabled or disabled. This allows you to set them. Prepending the text 'no' to an option disables it. For example, `-O nograph-time` will disable the "graph-time" Ftrace option. Here is more detail about options.

Too much information!!