# Debugging
# Getting Started
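Debugging in user space is very different from debugging the kernel:
* Kernel bugs are not always reproducible
* The same bug may crash in a different way each time
* Race conditions
  * You may need stress testing to reproduce them
  * So if your locking is wrong, the code may pass 10 tests right after you write it, then crash once every 1000 runs under stress testing, and that is very hard to debug
* Reproducible bugs are usually not much of a problem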
# Bugs in the Kernel
* incorrect code
  * not storing the correct value in the proper place
* synchronization errors
  * not properly locking
* incorrectly managing hardware
  * sending the wrong operation to the wrong control register
These can lead to anything from
* poor performance
* incorrect behavior
* corrupt data
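Often, by the time a bug finally surfaces in front of you, it has come a long way from its cause. For example, a shared structure without a reference count can cause a race condition. Without proper accounting, one process might free the structure while another process is still using it. Later, the unlucky second process still believes the structure is in memory and uses it through a now-invalid pointer. This might result in
* a NULL pointer dereference
  * causes an oops
* reading of garbage data
  * leads to corruption
  * bad behavior
  * an oops
* or nothing bad at all
  * if the data has not been overwritten yet
To fix this kind of bug, you first have to identify it as a race, then fix the reference counting.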
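Here is a minimal user-space sketch of the fix. It assumes C11 atomics stand in for the kernel's `atomic_t`/kref. Every user takes a reference before touching the shared structure and drops it afterwards, and only the last put frees the memory. The names `shared_obj`, `obj_get()`, `obj_put()` and the `objects_freed` counter are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical shared structure protected by a reference count. */
struct shared_obj {
    atomic_int refcount;
    int data;
};

/* Counts frees, for illustration only (not itself thread-safe). */
static int objects_freed = 0;

struct shared_obj *obj_create(int data)
{
    struct shared_obj *obj = malloc(sizeof(*obj));
    if (!obj)
        return NULL;
    atomic_init(&obj->refcount, 1); /* the creator holds the first reference */
    obj->data = data;
    return obj;
}

/* Take a reference before handing the pointer to another user. */
void obj_get(struct shared_obj *obj)
{
    atomic_fetch_add(&obj->refcount, 1);
}

/* Drop a reference; only the caller that drops the last one frees the object.
 * Returns true if this call freed it. */
bool obj_put(struct shared_obj *obj)
{
    if (atomic_fetch_sub(&obj->refcount, 1) == 1) {
        free(obj);
        objects_freed++;
        return true;
    }
    return false;
}
```

With correct counting, the second user's reference keeps the structure alive, so it is never read after being freed.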
# Debugging by Printing
printk() is enough in most cases, although in some situations calling it can affect how the kernel runs.
* is callable from anywhere in the kernel at any time.
  * interrupt context
  * process context
* can be called while any lock is held.
* can be called simultaneously on multiple processors.
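However, printk() is unusable before a certain point in the boot process
(at least until the console has been initialized).
So if you need to debug during setup_arch(), printk() output will not appear, but early_printk() can solve this. It is not a portable solution: not every architecture supports it.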
# LogLevels

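Default: messages printed without an explicit loglevel get KERN_WARNING.
Adjusting the console loglevel: the first field of /proc/sys/kernel/printk is the console loglevel. Only messages whose level value is lower than it are printed on the console. For example, to let every message through:
> echo 8 > /proc/sys/kernel/printk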
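The filtering rule can be modeled in a few lines of user-space C. This sketch assumes the older textual `"<N>"` prefix; newer kernels encode the level differently, using KERN_SOH. The helper names are hypothetical. The level numbers, from 0 (KERN_EMERG) to 7 (KERN_DEBUG), follow the kernel's.

```c
/* Kernel loglevels: a smaller number means more urgent. */
enum {
    LL_EMERG,   /* 0: KERN_EMERG   */
    LL_ALERT,   /* 1: KERN_ALERT   */
    LL_CRIT,    /* 2: KERN_CRIT    */
    LL_ERR,     /* 3: KERN_ERR     */
    LL_WARNING, /* 4: KERN_WARNING */
    LL_NOTICE,  /* 5: KERN_NOTICE  */
    LL_INFO,    /* 6: KERN_INFO    */
    LL_DEBUG    /* 7: KERN_DEBUG   */
};

/* Used when a message carries no level prefix. */
#define DEFAULT_MESSAGE_LOGLEVEL LL_WARNING

/* Read the level from an old-style "<N>" prefix, else use the default. */
int message_level(const char *msg)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return DEFAULT_MESSAGE_LOGLEVEL;
}

/* A message reaches the console only if its level is below console_loglevel. */
int reaches_console(const char *msg, int console_loglevel)
{
    return message_level(msg) < console_loglevel;
}
```

With a console loglevel of 8, even KERN_DEBUG (7) messages pass. With 7, debug messages stay in the log buffer only.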
# The Log Buffer
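The kernel log is kept in a circular buffer of size LOG_BUF_LEN.
The size of this buffer can be adjusted at compile time
via the CONFIG_LOG_BUF_SHIFT option (LOG_BUF_LEN is 2^CONFIG_LOG_BUF_SHIFT bytes).
The default for a uniprocessor machine is 16KB.
Once the buffer is full, each new printk() overwrites the oldest messages.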
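Here is a toy model of the overwrite-the-oldest behavior. It uses a 16-byte buffer instead of 16KB so the wraparound is easy to see. Like the kernel, it keeps the length a power of two, so an index can be reduced with a mask. All names are hypothetical. The real printk buffer also has to deal with locking and, in newer kernels, structured records.

```c
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SHIFT 4                    /* toy value: 16 bytes, not 16KB */
#define LOG_BUF_LEN (1U << LOG_BUF_SHIFT)  /* always a power of two */
#define LOG_BUF_MASK (LOG_BUF_LEN - 1)

static char log_buf[LOG_BUF_LEN];
static size_t log_start; /* logical index of the oldest byte still stored */
static size_t log_end;   /* logical index one past the newest byte */

/* Append a message; once the buffer is full, the oldest bytes are overwritten. */
void log_write(const char *msg)
{
    for (; *msg; msg++) {
        log_buf[log_end & LOG_BUF_MASK] = *msg;
        log_end++;
        if (log_end - log_start > LOG_BUF_LEN)
            log_start = log_end - LOG_BUF_LEN; /* forget the oldest byte */
    }
}

/* Copy the surviving contents, oldest first, into out (LOG_BUF_LEN + 1 bytes). */
size_t log_read(char *out)
{
    size_t n = 0;
    for (size_t i = log_start; i != log_end; i++)
        out[n++] = log_buf[i & LOG_BUF_MASK];
    out[n] = '\0';
    return n;
}
```

After writing 20 bytes into the 16-byte buffer, only the last 16 survive.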
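# Turning Messages On and Off
You can define your own macro that wraps printk() and decide at preprocessing time whether it does kernel-space debugging (printk()) or user-space debugging (fprintf()). This also lets you disable or re-enable every development-time debug message just by changing the value of the CFLAGS variable.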
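Such a header could look like the following sketch, modeled on the PDEBUG/PDEBUGG macros of LDD3's scull example. SCULL_DEBUG is the assumed on/off switch, for example added as `-DSCULL_DEBUG` to CFLAGS. The `args...` form is a GNU named-variadic-macro extension, as used in the kernel.

```c
#ifndef __KERNEL__
#include <stdio.h> /* fprintf() for the user-space variant */
#endif

#undef PDEBUG /* undef it, just in case */
#ifdef SCULL_DEBUG
#  ifdef __KERNEL__
     /* debugging is on, and we are in kernel space */
#    define PDEBUG(fmt, args...) printk(KERN_DEBUG "scull: " fmt, ## args)
#  else
     /* debugging is on, and we are in user space */
#    define PDEBUG(fmt, args...) fprintf(stderr, fmt, ## args)
#  endif
#else
#  define PDEBUG(fmt, args...) /* not debugging: nothing */
#endif

#undef PDEBUGG
#define PDEBUGG(fmt, args...) /* nothing: it's a placeholder */
```

Without SCULL_DEBUG, both macros expand to nothing, so even their arguments are never evaluated.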

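Note that PDEBUGG (with an extra G at the end of its name) is an empty macro: to silence a single debug message, just add a G to its PDEBUG call.
To make this even easier to control, you can also add some logic to the Makefile.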

Note that all of this is decided at compile time; changing anything requires recompiling.
# Rate Limiting Messages

# syslogd and klogd

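# Using the /proc Filesystem
The functions that support the /proc filesystem are defined in <linux/proc_fs.h>; any module that creates /proc files must include this header.
* Creating a read-only /proc file
  * read_proc() (no longer recommended; it has many problems)
  * see the textbook for details
* The get_info interface
  * the old interface for creating /proc files
* The seq_file interface (recommended)
Registering a /proc file
After defining read_proc, you also need to create an entry point for it under /proc; the driver does this by calling create_proc_read_entry().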
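The seq_file interface is built around an iterator. The core calls start() to get the item at a position, show() to print it, next() to advance, and stop() when done. The following user-space model only illustrates that calling pattern and is not kernel code. The `toy_*` names, the sample data and `struct toy_seq` are hypothetical. The real callbacks take a `struct seq_file *` and a `loff_t *`, and output would be written with seq_printf().

```c
#include <stdio.h>
#include <string.h>

/* Toy data a hypothetical /proc file would expose: one size per device. */
static const int scull_sizes[] = { 4000, 0, 128 };
#define NDEVICES (sizeof(scull_sizes) / sizeof(scull_sizes[0]))

/* Stand-in for struct seq_file: just an output buffer. */
struct toy_seq {
    char buf[256];
    size_t len;
};

/* start(): return the item at *pos, or NULL when there is nothing left. */
static const int *toy_start(long long *pos)
{
    return (*pos < (long long)NDEVICES) ? &scull_sizes[*pos] : NULL;
}

/* next(): advance the position and return the next item (or NULL). */
static const int *toy_next(long long *pos)
{
    ++*pos;
    return toy_start(pos);
}

/* stop(): the real interface would release locks taken in start(). */
static void toy_stop(void) { }

/* show(): format one item into the output buffer, ignoring overflow. */
static void toy_show(struct toy_seq *s, const int *item)
{
    size_t room = sizeof(s->buf) - s->len;
    int n = snprintf(s->buf + s->len, room, "scull%td: %d bytes\n",
                     item - scull_sizes, *item);
    if (n > 0 && (size_t)n < room)
        s->len += (size_t)n;
}

/* Drive the callbacks in the order the seq_file core would. */
size_t toy_read(struct toy_seq *s)
{
    long long pos = 0;
    s->len = 0;
    s->buf[0] = '\0';
    for (const int *it = toy_start(&pos); it; it = toy_next(&pos))
        toy_show(s, it);
    toy_stop();
    return s->len;
}
```

Because the core owns the loop, each show() only formats one record. This is what makes seq_file safer than read_proc() for output longer than a page.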

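# Debugging by Watching: strace
* Works without any debugging support compiled into the program
* Can trace an already running process
* Shows every system call made by a user-space process
* Also shows the arguments and the return value of each call
* When a system call fails, it also shows the error code (e.g., ENOMEM) and the corresponding error string (e.g., "Out of memory")
Example:
> strace ls /dev > /dev/scull0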

https://medium.com/fcamels-notes/%E4%BD%BF%E7%94%A8-strace-%E4%BA%86%E8%A7%A3%E7%A8%8B%E5%BC%8F%E8%AE%80%E5%8F%96%E8%B3%87%E6%96%99%E7%9A%84%E4%BE%86%E6%BA%90-aaa17ee2df2b
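To see the kind of failure strace reports, here is a tiny user-space program that makes a system call fail on purpose. It is only an illustration, and the path and `try_open()` are hypothetical. Run under strace, the failing call would appear ending in `= -1 ENOENT (No such file or directory)`. On recent glibc it shows up as `openat` rather than `open`.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try to open a path; return 0 on success or the errno value on failure. */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno; /* the same code strace prints next to "-1" */
    close(fd);
    return 0;
}
```

strace shows the same errno name and strerror() text that the program itself can see.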
# Oops
https://training.ti.com/debugging-embedded-linux-kernel-oops-logs
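Because the kernel is the supervisor of the entire system, it cannot simply kill itself when something bad happens inside it, the way it can kill a misbehaving user process. Instead it issues an oops, leaving some evidence for the user before it dies: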
* printing an error message to the console
* dumping the contents of the registers
* providing back trace
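After an oops, the kernel is usually in an unstable state.
For example, when an oops occurs, the kernel
* might be in the middle of processing important data
* might have held a lock
* might be in the middle of talking to hardware
The kernel must gracefully back out of its current context and clean up the mess behind it, trying to resume control of the system. In many cases this is not possible. If the oops occurred in interrupt context, the kernel cannot continue, and it panics. A kernel panic results in an instant halt of the system.
If the oops occurred in the idle task (pid zero) or the init task (pid one), the result is also a panic. But if it occurred in another, less important process, the oops does not necessarily cause a panic: the kernel may kill that process and keep running.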
Possible causes of an oops include
* a memory access violation
* an illegal instruction
Example:

* The machine was idle and executing the idle loop
  * cpu_idle(), which calls default_idle() in a loop
* The timer interrupt occurred, which resulted in the processing of timers
  * do_softirq() (the timer bottom half)
  * tulip_timer() (the timer handler)
  * which performed a NULL pointer dereference
* You can use the offset to find the exact location in the C code
* The register information is also very useful
  * Seeing an unexpected value in a register might shine some light on the root of the issue.
  * You can look at which registers held NULL to work out which variable in the function had an unexpected value. Cases like this are usually races, here between the timer and some other part of this network card driver. Debugging a race condition is always a challenge.
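Back-trace entries are printed in the form `function+offset/length`. A small user-space helper can split such an entry into its parts. The sample string `tulip_timer+0x39/0x1c0` is a made-up value, not taken from a real oops, and `parse_frame()` is hypothetical. With the offset in hand, one option is to load the object file, built with debug info, into gdb and run `list *(tulip_timer+0x39)` to see the matching source line.

```c
#include <stdio.h>
#include <string.h>

/* Split "func+0xOFF/0xLEN" into its parts; return 0 on success, -1 on error. */
int parse_frame(const char *frame, char *func, size_t funclen,
                unsigned long *offset, unsigned long *size)
{
    const char *plus = strrchr(frame, '+');
    if (!plus || (size_t)(plus - frame) >= funclen)
        return -1;
    memcpy(func, frame, (size_t)(plus - frame));
    func[plus - frame] = '\0';
    /* %lx accepts an optional "0x" prefix, like strtoul() with base 16 */
    if (sscanf(plus + 1, "%lx/%lx", offset, size) != 2)
        return -1;
    return 0;
}
```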
# ksymoops
The oops shown above is a decoded version, because the memory addresses have been translated into the functions they represent.
This is the undecoded version:

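The addresses in the back trace have not yet been converted into symbolic names. This conversion is done with the ksymoops command, together with the System.map generated during kernel compile. If you are using modules, you also need some module information.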
ksymoops tries to figure out most of this information, so you can usually invoke it via
> ksymoops saved_oops.txt
The program then spits out a decoded version of the oops.
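What ksymoops (and, inside the kernel, kallsyms) does with each raw address is essentially a lookup in a table sorted by address. It finds the last symbol at or below the address and reports the distance as an offset. This user-space sketch does that with a binary search over a hard-coded table in System.map order. The addresses are invented for illustration, and `resolve()` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One System.map-style entry (hypothetical addresses). */
struct sym {
    unsigned long addr;
    const char *name;
};

/* Must be sorted by address, as System.map is. */
static const struct sym symtab[] = {
    { 0xc0100000UL, "_stext" },
    { 0xc0105000UL, "default_idle" },
    { 0xc0105080UL, "cpu_idle" },
    { 0xc0120000UL, "do_softirq" },
};

/* Write "name+0xoff" for addr into out; return -1 if addr precedes the table. */
int resolve(unsigned long addr, char *out, size_t outlen)
{
    size_t lo = 0, hi = sizeof(symtab) / sizeof(symtab[0]);

    if (addr < symtab[0].addr)
        return -1;
    /* Invariant: symtab[lo].addr <= addr, and every entry at or after hi is > addr */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (symtab[mid].addr <= addr)
            lo = mid;
        else
            hi = mid;
    }
    snprintf(out, outlen, "%s+0x%lx", symtab[lo].name, addr - symtab[lo].addr);
    return 0;
}
```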
# kallsyms
* CONFIG_KALLSYMS
  * This option stores in the kernel the symbolic name of function addresses mapped into the kernel image so that the kernel can print decoded back traces. Consequently, decoding oopses no longer requires System.map or ksymoops.
  * The kernel image becomes slightly larger, but having the address-to-symbol mappings is worth it
* CONFIG_KALLSYMS_ALL
  * additionally stores the symbolic name of all symbols, not only functions. This is generally needed only by specialized debuggers
* CONFIG_KALLSYMS_EXTRA_PASS
  * causes the kernel build process to make a second pass over the kernel’s object code. It is useful only when debugging kallsyms itself.
# Kernel Debugging Options
You can find many kernel debugging config options in the Kernel Hacking menu. They all depend on CONFIG_DEBUG_KERNEL. When debugging the kernel, consider enabling all of them:
* slab layer debugging
* high-memory debugging
* I/O mapping debugging
* spin-lock debugging
* stack-overflow checking
* sleep-inside-spinlock checking
  * this one looks especially useful
The kernel provides a number of debugging mechanisms for synchronization, as follows:
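The idea behind sleep-inside-spinlock checking can be shown with a user-space model. Taking a spinlock raises a counter, and any function that might sleep checks that counter and complains if it is nonzero. In the kernel, this is roughly what might_sleep() does with the preempt count when the option is enabled (CONFIG_DEBUG_SPINLOCK_SLEEP, later renamed CONFIG_DEBUG_ATOMIC_SLEEP). The `toy_*` names are hypothetical, and the real check also looks at whether interrupts are disabled.

```c
#include <stdio.h>

/* Toy stand-in for the preempt count: spinlocks raise it, unlocks drop it. */
static int toy_preempt_count;
static int toy_warnings;

static void toy_spin_lock(void)   { toy_preempt_count++; }
static void toy_spin_unlock(void) { toy_preempt_count--; }

/* Sleeping while in atomic context (here: holding a spinlock) is a bug. */
static void toy_might_sleep(const char *where)
{
    if (toy_preempt_count > 0) {
        toy_warnings++;
        fprintf(stderr,
                "BUG: sleeping function called from invalid context at %s\n",
                where);
    }
}

/* Stand-in for any function that may block, e.g. a GFP_KERNEL allocation. */
void toy_blocking_call(const char *where)
{
    toy_might_sleep(where);
    /* ... the real function would block here ... */
}
```

The check fires every time the blocking call runs under a lock, not only on the rare run where it actually sleeps. That is why it catches bugs long before they turn into deadlocks.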

Commonly used Kernel Hacking options:





# Asserting Bugs and Dumping Information
A number of kernel routines make it easy to
* flag bugs
* provide assertions
* dump information
The most common ones:
* BUG()
  * causes an oops
  * results in a stack trace
  * dumps an error message to the kernel log
* BUG_ON()
  * causes an oops
  * results in a stack trace
  * dumps an error message to the kernel log
These are used to assert conditions that should never happen.


BUG_ON() is preferred: it reads more clearly, and its condition is wrapped in unlikely().
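The shape of BUG_ON() is easy to reproduce in user space. In this sketch, BUG() prints the location and calls abort() instead of causing an oops, and `drop_ref()` is a hypothetical caller. The likely()/unlikely() hints use GCC/Clang's `__builtin_expect()`, as the kernel's do.

```c
#include <stdio.h>
#include <stdlib.h>

/* Branch-prediction hints, defined the same way as in the kernel. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* User-space stand-in for BUG(): report the location, then abort. */
#define BUG() do { \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
        abort(); \
    } while (0)

/* Same shape as the kernel's generic BUG_ON(). */
#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)

/* Hypothetical caller: a reference count must never drop below zero. */
int drop_ref(int *count)
{
    BUG_ON(*count <= 0);
    return --*count;
}
```

The condition is evaluated exactly once, so side effects inside it are legal, although keeping them out of assertions is clearer.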
A more critical error is signaled via panic(). A call to panic() prints an error message and then halts the kernel. Obviously, you want to use it only in the worst of situations:

If you just want a simple stack trace to help with debugging,
use dump_stack();

# Magic SysRq Key (for when the system hangs)
When the machine hangs, these keys can still help you rescue it.

You can type these at the console:
* hit SysRq-h for a list of available options
* SysRq-s syncs dirty buffers to disk
* SysRq-u unmounts all filesystems
* SysRq-b reboots the machine
If the machine is completely locked up, this may not work.

# The Saga of a Kernel Debugger
# gdb
> gdb vmlinux /proc/kcore


gdb cannot modify kernel data at runtime, and it cannot single-step kernel code or set breakpoints.
# kgdb
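Debug the kernel remotely over a serial line. This requires two machines:
* one running a kernel patched with kgdb
* one connected to the first over a serial line, which runs gdb
* kgdb enables the full feature set of gdb
  * reading and writing any variables
  * breakpoints
  * watch points
  * single stepping
  * ...
For installation instructions, see Documentation/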
# Poking and Probing the System
# Using UID as a Conditional
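The technique is to guard new or experimental code with a check on the user ID. Only one test account runs the new code, while every other process keeps the known-good path. This is a user-space sketch that assumes a hypothetical test UID of 7777 and placeholder algorithms. Inside the kernel, the UID would come from the current task (for example via current_uid()) rather than from a parameter.

```c
#include <sys/types.h>

#define DEBUG_UID 7777 /* hypothetical test account */

static int new_path_calls; /* counts how often the new code ran */

static int old_algorithm(int x) { return x * 2; } /* known-good path */

static int new_algorithm(int x)                  /* code under test */
{
    new_path_calls++;
    return x << 1;
}

/* Only DEBUG_UID exercises the new code; everyone else is unaffected. */
int compute(uid_t uid, int x)
{
    if (uid != DEBUG_UID)
        return old_algorithm(x);
    return new_algorithm(x);
}
```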

# Using Condition Variables

# Using Statistics

# Rate and Occurrence Limiting Your Debugging
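Some kernel functions are called many times per second. If you printk() in them, the system quickly becomes unusable. There are two ways to deal with this:
* rate limiting
  * useful for watching the progress of an event that occurs rather often
  * you can use jiffies to throttle the output, e.g., print at most once every few seconds
* occurrence limiting
  * print only the first few occurrences, then stay quiet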
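Both approaches can be sketched with plain counters. In this sketch, jiffies are simulated by the caller passing `now`, HZ is assumed to be 100, and the helper names are hypothetical. `time_after()` follows the wraparound-safe comparison idea the kernel uses; the real macro also type-checks its arguments. In kernel code, the printk() would go in the branch where these helpers return 1.

```c
/* Wraparound-safe "a is later than b", as in the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

#define HZ 100 /* hypothetical tick rate for this model */

/* Rate limiting: allow at most one message per 2 seconds of jiffies. */
int rate_limited_ok(unsigned long now)
{
    static unsigned long prev_jiffy;
    static int initialized;

    if (!initialized || time_after(now, prev_jiffy + 2 * HZ)) {
        initialized = 1;
        prev_jiffy = now;
        return 1;
    }
    return 0;
}

/* Occurrence limiting: allow only the first 5 messages, ever. */
int occurrence_limited_ok(void)
{
    static unsigned long limit;

    if (limit < 5) {
        limit++;
        return 1;
    }
    return 0;
}
```

Casting the difference to a signed type is what keeps the comparison correct when the jiffies counter wraps around.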



