contributed by < linD026 >

linux2021
kfifo_alloc(&rx_fifo, PAGE_SIZE, GFP_KERNEL)
kfifo_free
kfifo_in — put data into the fifo
- fifo: address of the fifo to be used
- buf: the data to be added
- n: number of elements to be added

The reason produce_data needs len to confirm whether the data was fully inserted is that the original code performs this check: if the size to be inserted is larger than the remaining space, only the remaining space is copied:
kfifo_copy_in uses memcpy twice because, when the unused space allows it, the stored object may wrap around the end of the buffer:
kfifo_to_user — copies data from the fifo into user space
- fifo: address of the fifo to be used
- to: where the data must be copied
- len: the size of the destination buffer
- copied: pointer to output variable to store the number of copied bytes

kfifo supports a single reader and a single writer; see include/linux/kfifo.h:
This is also mentioned in simrupt.c:

Normal case:

Testing simultaneous reads:
vmalloc — allocate virtually contiguous memory
Produce, consume
alloc_workqueue
flush_workqueue
destroy_workqueue
softirqs can not be used by device drivers, they are reserved for various kernel subsystems. Because of this there is a fixed number of softirqs defined at compile time. For the current kernel version we have the following types defined:
Each type has a specific purpose:
- HI_SOFTIRQ and TASKLET_SOFTIRQ - running tasklets
- TIMER_SOFTIRQ - running timers
- NET_TX_SOFTIRQ and NET_RX_SOFTIRQ - used by the networking subsystem
- BLOCK_SOFTIRQ - used by the IO subsystem
- BLOCK_IOPOLL_SOFTIRQ - used by the IO subsystem to increase performance when the iopoll handler is invoked
- SCHED_SOFTIRQ - load balancing
- HRTIMER_SOFTIRQ - implementation of high precision timers
- RCU_SOFTIRQ - implementation of RCU type mechanisms

The highest priority is the HI_SOFTIRQ type softirqs, followed in order by the other softirqs defined. RCU_SOFTIRQ has the lowest priority.
Softirqs are running in interrupt context which means that they can not call blocking functions. If the softirq handler requires calls to such functions, work queues can be scheduled to execute these blocking calls.
A tasklet is a special form of deferred work that runs in interrupt context, just like softirqs. The main difference between softirqs and tasklets is that tasklets can be allocated dynamically and thus they can be used by device drivers. A tasklet is represented by struct tasklet_struct and, like many other kernel structures, it needs to be initialized before being used. A pre-initialized tasklet can be defined as follows:
If we want to initialize the tasklet manually we can use the following approach:
The data parameter will be sent to the handler when it is executed.
Programming tasklets for running is called scheduling. Tasklets are running from softirqs. Tasklets scheduling is done with:
When using tasklet_schedule, a TASKLET_SOFTIRQ softirq is scheduled and all scheduled tasklets are run. For tasklet_hi_schedule, a HI_SOFTIRQ softirq is scheduled.
If a tasklet was scheduled multiple times and it did not run between schedules, it will run once. Once the tasklet has run, it can be re-scheduled, and will run again at a later time. Tasklets can be re-scheduled from their handlers.
Tasklets can be masked and the following functions can be used:
Remember that since tasklets are running from softirqs, blocking calls can not be used in the handler function.
Destruction:
tasklet_kill(&simrupt_tasklet);
A particular type of deferred work, very often used, are timers. They are defined by struct timer_list. They run in interrupt context and are implemented on top of softirqs.
To be used, a timer must first be initialized by calling timer_setup():
timer_setup(&timer, timer_handler, 0);
The above function initializes the internal fields of the structure and associates function as the timer handler. Since timers are planned over softirqs, blocking calls can not be used in the code associated with the treatment function.
Scheduling a timer is done with mod_timer():
mod_timer(&timer, jiffies + msecs_to_jiffies(delay));
Where expires is the time (in the future) to run the handler function. The function can be used to schedule or reschedule a timer.
The time unit is the jiffie. The absolute value of a jiffie is platform-dependent, and it can be found using the HZ macro that defines the number of jiffies for 1 second. To convert between jiffies (jiffies_value) and seconds (seconds_value), the following formulas are used:
The kernel maintains a counter that contains the number of jiffies since the last boot, which can be accessed via the jiffies global variable or macro. We can use it to calculate a time in the future for timers:
Reference
wait_event_interruptible

wait_event_interruptible(wq, condition); - sleep until a condition gets true
- wq: the waitqueue to wait on
- condition: a C expression for the event to wait for

The process is put to sleep (TASK_INTERRUPTIBLE) until the condition evaluates to true or a signal is received. The condition is checked each time the waitqueue wq is woken up.

wake_up has to be called after changing any variable that could change the result of the wait condition.

The function will return -ERESTARTSYS if it was interrupted by a signal and 0 if condition evaluated to true.
Finally, the overall flow chart:

pr_info messages seen in dmesg:
kfifo is declared as:

But the API provides a macro for the declaration; given that the size is PAGE_SIZE and the code also allocates the memory with kfifo_alloc, it should be:
As for the concurrency problems that kfifo and the circular buffer run into: the former only supports the one-reader/one-writer case, while the latter only provides the data structure and APIs such as usage accounting.
This problem is explained later, when producer_lock and consumer_lock are rewritten.
lwn.net has an article about kfifo writer side lock-less support, but no related commit shows up in kfifo.h. Judging from Re: [RFC -v2] kfifo writer side lock-less support and the replies that followed, it apparently was not merged.
If the file is opened twice at the same time, closing one of the two clears the timer, fast_buf, and other state, causing the other instance to misbehave. Therefore, to handle the concurrent case, an open_cnt variable of type atomic_t is added to keep count, and the original setup and teardown are performed only on the first open and the last close.
poll() performs a similar task to select(2): it waits for one of a set of file descriptors to become ready to perform I/O. The Linux-specific epoll(7) API performs a similar task, but offers features beyond those found in poll().

POLLIN
There is data to read.

POLLHUP
Hang up (only returned in revents; ignored in events). Note that when reading from a channel such as a pipe or a stream socket, this event merely indicates that the peer closed its end of the channel. Subsequent reads from the channel will return 0 (end of file) only after all outstanding data in the channel has been consumed.
How to add poll function to the kernel module code?

Add a fortune_poll() function and add it (as the .poll callback) to your file operations structure:

Note that you should return POLLIN | POLLRDNORM if you have some new data to read, and 0 in case there is no new data to read (the poll() call timed out). See man 2 poll for details.

Notify your waitqueue once you have new data:

wake_up_interruptible(&fortune_wait);

That's the basic stuff about implementing the poll() operation. Depending on your task, you may need to use some waitqueue API in your .read function (like wait_event_interruptible()).
Implementing poll in a Linux kernel module
Taking into account that you haven't mentioned the write() operation, I will assume further that your hardware is producing new data all the time. If that's so, the design you mentioned can be exactly what is confusing you:

The read call is very simple. It starts a DMA write, and then waits on a wait queue.

This is exactly what prevents you from working with your driver in the regular, commonly used (and probably desired for you) way. Let's think outside the box and come up with the desired user interface first (how you would want to use your driver from user space). The next case is commonly used and sufficient here (from my point of view):

- poll() your device file to wait for new data to arrive
- read() your device file to obtain the arrived data

Now you can see that data requesting (to DMA) should not be started by the read() operation. The correct solution would be to read data continuously in the driver (without any triggering from user space) and store it internally, and when the user asks your driver for data to consume (by the read() operation), provide the user with the internally stored data. If there is no data stored internally in the driver, the user can wait for new data to arrive using the poll() operation.

As you can see, this is the well-known producer-consumer problem. You can use a circular buffer to store data from your hardware in your driver (so you intentionally lose old data when the buffer is full, to prevent a buffer overflow situation). So the producer (DMA) writes to the head of that RX ring buffer, and the consumer (the user performing read() from user space) reads from the tail of that RX ring buffer.
poll_wait adds your device (represented by the "struct file") to the list of those that can wake the process up.

The idea is that the process can use poll (or select or epoll etc.) to add a bunch of file descriptors to the list on which it wishes to wait. The poll entry for each driver gets called. Each one adds itself (via poll_wait) to the waiter list.

Then the core kernel blocks the process in one place. That way, any one of the devices can wake up the process. If you return non-zero mask bits, that means those "ready" attributes (readable/writable/etc.) apply now.

So, in pseudo-code, it's roughly like this:

Yes. When you call poll(2) in user space, that goes to a function called "sys_poll" inside the kernel (see fs/select.c in the kernel source). Likewise, select(2) => sys_select, etc. All those functions follow more or less the pseudo-code I gave above.
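A rough sketch of that interplay, reconstructed as pseudocode from the description above (not the actual fs/select.c code):

```
/* driver side: called once per registered fd */
my_poll(file, wait_table):
        poll_wait(file, &my_waitqueue, wait_table)  /* register for wakeups */
        mask = 0
        if data_available():
                mask |= POLLIN | POLLRDNORM         /* readable right now */
        return mask

/* core kernel side (sys_poll, simplified) */
sys_poll(fds):
        loop:
                for fd in fds:
                        mask[fd] = fd->f_op->poll(fd, wait_table)
                if any mask != 0 or timeout expired:
                        return masks to user space
                sleep until one registered waitqueue wakes us
```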
Example - rtc
The simrupt_work_func function already calls wake_up_interruptible:

Therefore, add the following to the module:

Write the userspace code:

Output:
producer_lock and consumer_lock

consumer_lock improvement, making fast_buf lock-free:

producer_lock

kfifo_in

This macro copies the given buffer into the fifo and returns the number of copied elements. Note that with only one concurrent reader and one concurrent writer, you don't need extra locking to use these macros.

So in the single-writer case no lock is needed, but here the workqueue may produce multiple writers, so the lock is kept:

As for kfifo_to_user needing a lock, the consumer_lock that was discarded earlier when making fast_buf lock-free can be reused.
Reference