Note that the name of the .c file must match the .o name listed in obj-m in the Makefile; otherwise make fails with the following error:
~/g/m/hello-1 ❯❯❯ ls
Makefile module.c
~/g/m/hello-1 ❯❯❯ make
make -C /lib/modules/`uname -r`/build M=/home/idoleat/tmp/module/hello-1 modules
make[1]: Entering directory '/usr/lib/modules/6.8.9-arch1-1/build'
make[3]: *** No rule to make target '/home/idoleat/tmp/module/hello-1/hello-1.o', needed by '/home/idoleat/tmp/module/hello-1/'. Stop.
make[2]: *** [/usr/lib/modules/6.8.9-arch1-1/build/Makefile:1921: /home/idoleat/tmp/module/hello-1] Error 2
make[1]: *** [Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/usr/lib/modules/6.8.9-arch1-1/build'
make: *** [Makefile:8: all] Error 2
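The reason is that the kernel build system looks for the source file (.c) that matches the name of each object file (.o). With this naming convention, no extra rules are needed to locate source files, which avoids unnecessary complexity and errors. The kernel build system is complicated enough already.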
Note that when building multiple modules, obj-m must be extended with += to append each target object file, rather than := (simple assignment), which overwrites the previous value.
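Linux kernel modules live in the kernel virtual memory address space. Because their physical addresses do not need to be contiguous, the memory is allocated with vmalloc() at insmod time. MMIO is the same. Note, however, that the virtual area is smaller on 32-bit machines.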
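What happens if the kernel directly uses a user-space virtual address?
A few things to note about virtual memory:
Observability is one of the Linux kernel's strengths, so the mechanisms can actually be understood through tracing tools.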
echo 1 > /sys/kernel/tracing/events/vmalloc/enable
echo 1 > /sys/kernel/tracing/events/module/module_load/enable
/proc/softirqs and /proc/interrupts
tasklet APIs with workqueue APIs
in_irq, in_softirq and in_interrupt are deprecated
The reason to deprecate in_softirq and in_interrupt is that while bottom halves are disabled (local_bh_disable()), they can give a false positive (mentioned in the Unreliable Guide To Hacking The Linux Kernel by Rusty Russell). Commit 7c47889 also added a note (since removed) that mentions this:
Note: due to the BH disabled confusion: in_softirq(),in_interrupt() really should not be used in new code.
When to disable soft interrupts on the local CPU? What's the difference between
drgn tools/workqueue/wq_dump.py
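In CMWQ, if consecutive work items all use the same function, can the function run back-to-back instead of returning and being called again? (Is it certain that it returns and gets called again?)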
An MT wq could provide only one execution context per CPU while an ST wq one for the whole system. Work items had to compete for those very limited execution contexts leading to various problems including proneness to deadlocks around the single execution context.
A work item can be executed in either a thread or the BH (softirq) context
Does execution context simply mean that function?
Or does it mean the preconditions or environment under which it runs?
Contexts seen so far: interrupt context, process context, atomic context
/*
* Macros to retrieve the current execution context:
*
* in_nmi() - We're in NMI context
* in_hardirq() - We're in hard IRQ context
* in_serving_softirq() - We're in softirq context
* in_task() - We're in task context
*/
#define in_nmi() (nmi_count())
#define in_hardirq() (hardirq_count())
#define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET)
#ifdef CONFIG_PREEMPT_RT
# define in_task() (!((preempt_count() & (NMI_MASK | HARDIRQ_MASK)) | in_serving_softirq()))
#else
# define in_task() (!(preempt_count() & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
#endif
/*
* The following macros are deprecated and should not be used in new code:
* in_irq() - Obsolete version of in_hardirq()
* in_softirq() - We have BH disabled, or are processing softirqs
* in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled
*/
#define in_irq() (hardirq_count())
#define in_softirq() (softirq_count())
#define in_interrupt() (irq_count())
...
/*
* Are we running in atomic context? WARNING: this macro cannot
* always detect atomic context; in particular, it cannot know about
* held spinlocks in non-preemptible kernels. Thus it should not be
* used in the general case to determine whether sleeping is possible.
* Do not use in_atomic() in driver code.
*/
#define in_atomic() (preempt_count() != 0)
/*
* Check whether we were atomic before we did preempt_disable():
* (used by the scheduler)
*/
#define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)
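To see why in_softirq() reports a false positive while BH is merely disabled, here is a small userspace model of the softirq bits in preempt_count. It is only a sketch: the bit widths are assumed to match include/linux/preempt.h, and every `model_` name is hypothetical, not a kernel API. The idea is that local_bh_disable() adds twice SOFTIRQ_OFFSET, so the lowest softirq bit stays reserved for "actually serving a softirq".

```c
#include <assert.h>
#include <stdbool.h>

/* Bit layout assumed to mirror include/linux/preempt.h. */
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
#define SOFTIRQ_SHIFT PREEMPT_BITS
#define SOFTIRQ_OFFSET (1U << SOFTIRQ_SHIFT)
#define SOFTIRQ_MASK (((1U << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET)

/* Hypothetical stand-in for the real preempt counter. */
static unsigned int model_preempt_count;

static unsigned int model_softirq_count(void)
{
	return model_preempt_count & SOFTIRQ_MASK;
}

/* Disabling BH bumps the counter by two softirq units. */
static void model_local_bh_disable(void)
{
	model_preempt_count += SOFTIRQ_DISABLE_OFFSET;
}

static void model_local_bh_enable(void)
{
	assert(model_softirq_count() >= SOFTIRQ_DISABLE_OFFSET);
	model_preempt_count -= SOFTIRQ_DISABLE_OFFSET;
}

/* Entering softirq processing bumps it by a single unit. */
static void model_softirq_enter(void)
{
	model_preempt_count += SOFTIRQ_OFFSET;
}

static void model_softirq_exit(void)
{
	assert(model_softirq_count() >= SOFTIRQ_OFFSET);
	model_preempt_count -= SOFTIRQ_OFFSET;
}

/* Deprecated check: true for "serving softirq" and for "BH disabled". */
static bool model_in_softirq(void)
{
	return model_softirq_count() != 0;
}

/* Preferred check: true only while a softirq is being served. */
static bool model_in_serving_softirq(void)
{
	return (model_softirq_count() & SOFTIRQ_OFFSET) != 0;
}
```

Calling model_local_bh_disable() from plain task context makes model_in_softirq() true while model_in_serving_softirq() stays false, which is the confusion the removed note warned about.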
int myintarray[2];
module_param_array(myintarray, int, NULL, 0); /* not interested in count */
short myshortarray[4];
int count;
module_param_array(myshortarray, short, &count, 0); /* put count into "count" variable */
What does count mean when some arrays track it and others don't?
try it
register_chrdev() would occupy a range of minor numbers
static inline int register_chrdev(unsigned int major, const char *name,
const struct file_operations *fops)
{
/* extern int __register_chrdev(unsigned int major,
 * unsigned int baseminor, unsigned int count, const char *name,
 * const struct file_operations *fops)
 */
return __register_chrdev(major, 0, 256, name, fops);
}
It occupies 256 of them?
Why? Do we have documentation for this?
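A rough way to picture what "a range of minor numbers" means: since register_chrdev() passes baseminor 0 and count 256, it claims minors 0 through 255 of the given major. The sketch below models the device number split in userspace. The 20-bit minor width is assumed to match MINORBITS in include/linux/kdev_t.h, and all `model_` names are hypothetical. The 256 is presumably a holdover from the old 8-bit minor field, but that part is my guess, not something confirmed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match MINORBITS in include/linux/kdev_t.h. */
#define MODEL_MINORBITS 20
#define MODEL_MINORMASK ((1U << MODEL_MINORBITS) - 1)

/* Number of minors register_chrdev() passes to __register_chrdev(). */
#define MODEL_LEGACY_MINOR_COUNT 256

typedef unsigned int model_dev_t;

/* Pack a major/minor pair into one device number. */
static model_dev_t model_mkdev(unsigned int major, unsigned int minor)
{
	return (major << MODEL_MINORBITS) | (minor & MODEL_MINORMASK);
}

static unsigned int model_major(model_dev_t dev)
{
	return dev >> MODEL_MINORBITS;
}

static unsigned int model_minor(model_dev_t dev)
{
	return dev & MODEL_MINORMASK;
}

/* Would register_chrdev(major, ...) have claimed this device number? */
static bool model_claimed_by_register_chrdev(unsigned int major,
					     model_dev_t dev)
{
	return model_major(dev) == major &&
	       model_minor(dev) < MODEL_LEGACY_MINOR_COUNT;
}
```

If claiming all 256 minors is too greedy, the finer-grained route is register_chrdev_region() or alloc_chrdev_region() together with cdev_add(), which take an explicit base minor and count.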
Avoiding Linuxisms:
https://people.freebsd.org/~olivierd/porters-handbook/dads-avoiding-linuxisms.html
Do not use /proc if there are any other ways of getting the information. For example, setprogname(argv[0]) in main() and then getprogname(3) to know the executable name.
What will happen if the kernel holds a pointer to a piece of memory in a user process segment and dereferences it after a context switch?
perf: interrupt took too long (2668 > 2500), lowering kernel.perf_event_max_sample_rate to 74700
/proc/modules vs /sys/module
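When two seemingly unrelated things mention the same concept, it greatly deepens understanding and retention. Two teaching materials from different sources have the same effect, for example something read in a textbook versus xxx in the wild. Discussing news, projects, or odd ideas with others on forums and in chat groups is a way to run into xxx in the wild.
So subscribing to LKML is important as well, or at least LWN or Phoronix. Just don't spend too much time on them: early on you can't see the whole picture or the key points, so even a deep dive amounts to skimming the surface.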
local_bh_disable(), the big softirq lock
https://www.youtube.com/watch?v=rmv40f5K8AI
Live demo of it is pretty comprehensive
The spinlock example doesn't show anyone actually contending for the lock and spinning for a while because it is held.
Might as well write one.
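As a first attempt at "writing one", here is a userspace analogue rather than a kernel module: a test-and-set spinlock built on C11 atomics, with several pthreads fighting over it and a counter recording how often a thread found the lock taken. This is only a sketch under those assumptions. It is not the kernel's spinlock implementation, and all names are made up for the demo.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define DEMO_MAX_THREADS 16

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static long demo_shared_counter;	/* protected by demo_lock */
static atomic_long demo_total_spins;	/* failed acquisition attempts */

static void demo_spin_lock(void)
{
	long spins = 0;

	/* Spin until the flag was previously clear, counting each miss. */
	while (atomic_flag_test_and_set_explicit(&demo_lock,
						 memory_order_acquire))
		spins++;
	if (spins)
		atomic_fetch_add(&demo_total_spins, spins);
}

static void demo_spin_unlock(void)
{
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

static void *demo_worker(void *arg)
{
	long iterations = *(long *)arg;

	for (long i = 0; i < iterations; i++) {
		demo_spin_lock();
		demo_shared_counter++;	/* the contended critical section */
		demo_spin_unlock();
	}
	return NULL;
}

/* Run nthreads workers and return the final value of the shared counter. */
static long demo_run_contention(int nthreads, long iterations)
{
	pthread_t tids[DEMO_MAX_THREADS];

	assert(nthreads > 0 && nthreads <= DEMO_MAX_THREADS);
	demo_shared_counter = 0;
	atomic_store(&demo_total_spins, 0);

	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, demo_worker,
					&iterations);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return demo_shared_counter;
}

/* How many times a thread found the lock held during the last run. */
static long demo_spins_observed(void)
{
	return atomic_load(&demo_total_spins);
}
```

Build with -pthread, call demo_run_contention(4, 100000) from main(), then print demo_spins_observed() to see how much spinning happened. The number varies from run to run and depends on the core count and the scheduler.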
A spinlock in userspace incident in 2020
https://news.ycombinator.com/item?id=21959692
https://www.phoronix.com/news/Linux-2020-Scheduler-Bugs-Stadia
https://probablydance.com/2019/12/30/measuring-mutexes-spinlocks-and-how-bad-the-linux-scheduler-really-is/
Need to run all of the example code.
Try calling only local_irq_disable without enabling again. lockdep tracks disable/enable, but does it stop unpaired use?
work stealing
What's the difference between the following functions?
void disable_irq(unsigned int irq)
disable an irq and wait for completion
Parameters: unsigned int irq: Interrupt to disable
Disable the selected interrupt line. Enables and Disables are nested. This function waits for any pending IRQ handlers for this interrupt to complete before returning. If you use this function while holding a resource the IRQ handler may need you will deadlock.
Can only be called from preemptible code as it might sleep when an interrupt thread is associated to irq.
bool disable_hardirq(unsigned int irq)
disables an irq and waits for hardirq completion
Parameters: unsigned int irq: Interrupt to disable
Disable the selected interrupt line. Enables and Disables are nested. This function waits for any pending hard IRQ handlers for this interrupt to complete before returning. If you use this function while holding a resource the hard IRQ handler may need you will deadlock.
When used to optimistically disable an interrupt from atomic context the return value must be checked.
Return:
false if a threaded handler is active.
This function may be called - with care - from IRQ context.
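"Enables and Disables are nested" can be pictured as a depth counter per interrupt line: each disable increments it, each enable decrements it, and the line is only live again once the depth is back to zero. The sketch below is a userspace model under that assumption. The struct and the `model_` functions are hypothetical, and it leaves out the part where the real functions wait for running handlers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt line; depth 0 means enabled. */
struct model_irq_line {
	unsigned int depth;
};

/* Every disable nests one level deeper. */
static void model_disable_irq(struct model_irq_line *line)
{
	line->depth++;
}

/* Every enable undoes exactly one disable; unbalanced enables are bugs. */
static void model_enable_irq(struct model_irq_line *line)
{
	assert(line->depth > 0);
	line->depth--;
}

static bool model_irq_is_enabled(const struct model_irq_line *line)
{
	return line->depth == 0;
}
```

With two disables in a row, one enable is not enough: the line stays masked until the second enable arrives.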
local_irq_save()/local_irq_restore()
Defined in include/linux/irqflags.h
local_bh_disable()/local_bh_enable()
Defined in include/linux/bottom_half.h
local_irq_disable()/local_irq_enable()
There are better approaches than disabling interrupts for improving real-time behavior
Preventing preemption using interrupt disabling
Sent a patch fixing redundant parentheses in the generic IRQ documentation
https://lore.kernel.org/lkml/20240619160057.128208-1-idoleat@taiker.tw/T/#u
Applied by Jon Corbet.
Understanding Linux Interrupt Subsystem - Priya Dixit, Samsung Semiconductor India Research
https://www.youtube.com/watch?v=LOCsN3V1ECE
in __init: request_irq to register a handler for the IRQ number
in __exit: free_irq to unregister it from the IRQ number