
Reading The Linux Kernel Module Programming Guide

Prerequisites
Kernel module package

sudo apt-get install build-essential kmod

List the modules currently loaded into the kernel

sudo lsmod

The same list is available in /proc/modules

sudo cat /proc/modules

Search for a specific module, e.g. fat

sudo lsmod | grep fat

4.5 Passing Command Line Arguments to a Module

A module can accept parameters on the command line, but not through the familiar argc and argv.
Here is the example from the guide:

int myint = 3; 
module_param(myint, int, 0);

int myintarray[2]; 
module_param_array(myintarray, int, NULL, 0); /* not interested in count */ 
 
short myshortarray[4]; 
int count; 
module_param_array(myshortarray, short, &count, 0); /* put count into "count" variable */

Question: "for exposing parameters in sysfs (if non-zero) at a later stage." What is this sentence trying to say? And what is sysfs?

In the example, the parameters are initialized with default values

static short int myshort = 1; 
static int myint = 420; 
static long int mylong = 9999; 
static char *mystring = "blah"; 
static int myintarray[2] = { 420, 420 }; 
static int arr_argc = 0; 

module_param() then registers each variable so that it can be set on the insmod command line

module_param(myshort, short, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP); 
MODULE_PARM_DESC(myshort, "A short integer"); 
module_param(myint, int, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH); 
MODULE_PARM_DESC(myint, "An integer"); 
module_param(mylong, long, S_IRUSR); 
MODULE_PARM_DESC(mylong, "A long integer"); 
module_param(mystring, charp, 0000); 
MODULE_PARM_DESC(mystring, "A character string"); 

module_param_array(myintarray, int, &arr_argc, 0000); 
MODULE_PARM_DESC(myintarray, "An array of integers"); 

The code below relies on #include <linux/init.h> and #include <linux/module.h>.
Is this the initialization function?

static int __init hello_5_init(void) 
{ 
    int i; 
 
    pr_info("Hello, world 5\n=============\n"); 
    pr_info("myshort is a short integer: %hd\n", myshort); 
    pr_info("myint is an integer: %d\n", myint); 
    pr_info("mylong is a long integer: %ld\n", mylong); 
    pr_info("mystring is a string: %s\n", mystring); 
 
    for (i = 0; i < ARRAY_SIZE(myintarray); i++) 
        pr_info("myintarray[%d] = %d\n", i, myintarray[i]); 
 
    pr_info("got %d arguments for myintarray.\n", arr_argc); 
    return 0; 
} 

module_init(hello_5_init); 

Used when the module is unloaded

static void __exit hello_5_exit(void) 
{ 
    pr_info("Goodbye, world 5\n"); 
}

module_exit(hello_5_exit);

Running the example from the command line

$ sudo insmod hello-5.ko mystring="bebop" myintarray=-1
$ sudo dmesg -t | tail -7
myshort is a short integer: 1
myint is an integer: 420
mylong is a long integer: 9999
mystring is a string: bebop
myintarray[0] = -1
myintarray[1] = 420
got 1 arguments for myintarray.

$ sudo rmmod hello-5
$ sudo dmesg -t | tail -1
Goodbye, world 5
$ sudo insmod hello-5.ko mystring="supercalifragilisticexpialidocious" myintarray=-1,-1
$ sudo dmesg -t | tail -7
myshort is a short integer: 1
myint is an integer: 420
mylong is a long integer: 9999
mystring is a string: supercalifragilisticexpialidocious
myintarray[0] = -1
myintarray[1] = -1
got 2 arguments for myintarray.

$ sudo rmmod hello-5
$ sudo dmesg -t | tail -1
Goodbye, world 5
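
The two runs above differ only in how many comma-separated values are passed to myintarray. As a rough user-space sketch (not the kernel's actual parameter parser), the hypothetical helper parse_int_array below mimics the visible behaviour: values overwrite the array from index 0, untouched elements keep their defaults, and the number of parsed values is returned, which is the role arr_argc plays in the module.

```c
#include <stdlib.h>

/* Hypothetical helper: parse "a,b,c" into out[], storing at most max values.
 * Returns the number of values stored, or -1 on malformed or oversized input. */
int parse_int_array(const char *s, int *out, int max)
{
    int count = 0;

    while (*s != '\0') {
        char *end;
        long v;

        if (count == max)
            return -1;           /* more values than the array can hold */
        v = strtol(s, &end, 10);
        if (end == s)
            return -1;           /* no digits where a number was expected */
        out[count++] = (int)v;
        if (*end == ',')
            s = end + 1;         /* skip the separator */
        else if (*end == '\0')
            s = end;             /* end of input */
        else
            return -1;           /* unexpected character */
    }
    return count;
}
```

With the defaults { 420, 420 }, parsing "-1" stores one value and leaves the second element at 420, matching the first run; "-1,-1" replaces both.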

4.6 Modules Spanning Multiple Files
start.c

#include <linux/kernel.h> /* We are doing kernel work */ 
#include <linux/module.h> /* Specifically, a module */ 
 
int init_module(void) 
{ 
    pr_info("Hello, world - this is the kernel speaking\n"); 
    return 0; 
} 
 
MODULE_LICENSE("GPL");

stop.c


#include <linux/kernel.h> /* We are doing kernel work */ 
#include <linux/module.h> /* Specifically, a module  */ 
 
void cleanup_module(void) 
{ 
    pr_info("Short is the life of a kernel module\n"); 
} 
 
MODULE_LICENSE("GPL");

In the Makefile, start.c and stop.c are combined into startstop

obj-m += startstop.o 
startstop-objs := start.o stop.o 

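obj-m += startstop.o builds startstop as a loadable module: it is not built into the kernel image, and a standalone startstop.ko is produced instead. startstop-objs lists the object files that are linked into it.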

5.6 Device Drivers

$ ls -l /dev/hda[1-3]
brw-rw----  1 root  disk  3, 1 Jul  5  2000 /dev/hda1
brw-rw----  1 root  disk  3, 2 Jul  5  2000 /dev/hda2
brw-rw----  1 root  disk  3, 3 Jul  5  2000 /dev/hda3

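The first number is the major number, which tells you which driver is used to access the hardware; all three devices above are handled by the same driver.
The second number is the minor number, which the driver uses to tell apart the pieces of hardware it controls; the three devices above have different minor numbers, so the driver sees them as different hardware.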

Devices are divided into two types:

  • character devices
  • block devices

In the output of ls -l, a leading c marks a character device and a leading b marks a block device.
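
The same information that ls -l prints can be read programmatically. The sketch below is a user-space illustration, assuming a glibc system where major() and minor() come from <sys/sysmacros.h>; device_kind is a name made up for this note.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h> /* major(), minor() on glibc */
#include <sys/types.h>

/* Returns 'c' for a character device, 'b' for a block device,
 * '-' for any other file type, and '?' if stat() fails.
 * For 'c' and 'b', *maj and *min receive the device numbers. */
char device_kind(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return '?';
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return '-';
    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return S_ISCHR(st.st_mode) ? 'c' : 'b';
}
```

Calling it on /dev/null, for instance, reports a character device together with the major and minor numbers that ls -l shows in place of the file size.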

5.2 Functions available to modules
Section 5.2 of the guide gives this example:

#include <stdio.h> 
 
int main(void) 
{ 
    printf("hello"); 
    return 0; 
}

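printf is familiar. After compiling with gcc -Wall -o hello hello.c, running strace ./hello shows the system calls the program makes in detail.
Near the end of the trace we see write(1, "hello", 5hello), the real system call behind printf. (The trailing hello inside the parentheses is the program's own output, interleaved with the strace output.)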
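
To see that relationship from the other side, the sketch below calls write() directly instead of going through printf. It is a minimal illustration; say_hello is a made-up name, and the file descriptor is passed in so the function can target standard output (1) or anything else.

```c
#include <string.h>
#include <unistd.h>

/* Write "hello" to fd with the write() system call directly,
 * bypassing stdio buffering. Returns the number of bytes written,
 * or -1 on error. */
ssize_t say_hello(int fd)
{
    const char *msg = "hello";

    return write(fd, msg, strlen(msg));
}
```

Calling say_hello(STDOUT_FILENO) produces the same write(1, "hello", 5) line under strace.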

6.3 Registering A Device
Adding a driver to the system means registering it with the kernel.

int register_chrdev(unsigned int major, const char *name, struct file_operations *fops);

unsigned int major is the major number being requested
const char *name is the device name, as it will appear in /proc/devices
struct file_operations *fops is a pointer to the driver's file_operations table

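A negative return value means registration failed. Note that the minor number is not passed to register_chrdev(): the kernel does not care about the minor number; only the driver uses it.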

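How do we get a major number that is not already in use, instead of taking one that is?
We can ask the kernel to assign a dynamic major number.

If 0 is passed as the major number to register_chrdev(), the return value is the dynamically allocated major number. The downside is that the device file cannot be created in advance, because the major number is not known beforehand. There are several ways around this. First, the driver itself can print the newly assigned number, and we create the device file by hand. Second, the newly registered device has an entry in /proc/devices, so we can either create the device file by hand or write a shell script that reads that file and creates the device file. Third, the driver can create the device file itself with device_create after a successful registration, and remove it with device_destroy during the call to cleanup_module.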
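
The second approach, reading /proc/devices, can be sketched in C as well as in a shell script. The function below is a user-space illustration under the assumption that the input follows the usual /proc/devices layout (a "Character devices:" section of "major name" lines, followed by a "Block devices:" section); find_char_major is a name invented for this note.

```c
#include <stdio.h>
#include <string.h>

/* Scan text in the /proc/devices format for a character device called
 * name. Returns its major number, or -1 if it is not listed. */
int find_char_major(FILE *f, const char *name)
{
    char line[128];

    while (fgets(line, sizeof(line), f)) {
        int major;
        char dev[64];

        if (strncmp(line, "Block devices:", 14) == 0)
            break;               /* character devices are listed first */
        if (sscanf(line, "%d %63s", &major, dev) == 2 &&
            strcmp(dev, name) == 0)
            return major;
    }
    return -1;
}
```

After opening /proc/devices with fopen() and looking up "chardev", the returned number is what would be passed to mknod when creating the device file by hand.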

What do device_create, cleanup_module, and device_destroy each refer to in this passage?

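However, register_chrdev() occupies a whole range of minor numbers associated with the given major number. The recommended way to reduce this waste is to register char devices through the cdev interface.

Does this mean that a single register_chrdev() call claims a whole series of minor numbers along with the major?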

struct cdev *my_cdev = cdev_alloc(); 
my_cdev->ops = &my_fops;

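This allocates the char device structure and sets its file operations; it is associated with device numbers later, by cdev_add.

More commonly, struct cdev is embedded in a device-specific structure of your own.

In that case it is initialized with: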

void cdev_init(struct cdev *cdev, const struct file_operations *fops);

Once initialization is done, the char device is added to the system with cdev_add.

int cdev_add(struct cdev *p, dev_t dev, unsigned count);

6.4 Unregistering A Device
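We cannot allow the module to be removed with rmmod whenever root feels like it.

Normally, when you do not want to allow something, you return an error code (a negative number) from the function that is supposed to do it. With cleanup_module that is impossible, because it is a void function. However, there is a counter that tracks how many processes are using your module. You can see its value with cat /proc/modules or sudo lsmod. If the number is not zero, rmmod fails. Note that you do not need to check the counter in cleanup_module, because the system call sys_delete_module, defined in include/linux/syscalls.h, performs the check for you. You should not use this counter directly, but include/linux/module.h defines functions that let you increment, decrement, and display it: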

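  • try_module_get(THIS_MODULE) : increments the reference count of the current module
  • module_put(THIS_MODULE) : decrements the reference count of the current module
  • module_refcount(THIS_MODULE) : returns the reference count of the current module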

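Keeping the counter accurate is very important: if you ever lose track of the correct usage count, you will never be able to unload the module.
The only way out then is a reboot!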

6.5 chardev.c

TODO: the code is still too hard for me to follow, so I will keep reading for now

Example char driver
/* 
 * chardev.c: Creates a read-only char device that says how many times 
 * you have read from the dev file 
 */ 
 
#include <linux/atomic.h> 
#include <linux/cdev.h> 
#include <linux/delay.h> 
#include <linux/device.h> 
#include <linux/fs.h> 
#include <linux/init.h> 
#include <linux/kernel.h> /* for sprintf() */ 
#include <linux/module.h> 
#include <linux/printk.h> 
#include <linux/types.h> 
#include <linux/uaccess.h> /* for get_user and put_user */ 
#include <linux/version.h> 
 
#include <asm/errno.h> 
 
/*  Prototypes - this would normally go in a .h file */ 
static int device_open(struct inode *, struct file *); 
static int device_release(struct inode *, struct file *); 
static ssize_t device_read(struct file *, char __user *, size_t, loff_t *); 
static ssize_t device_write(struct file *, const char __user *, size_t, 
                            loff_t *); 
 
#define SUCCESS 0 
#define DEVICE_NAME "chardev" /* Dev name as it appears in /proc/devices   */ 
#define BUF_LEN 80 /* Max length of the message from the device */ 
 
/* Global variables are declared as static, so are global within the file. */ 
 
static int major; /* major number assigned to our device driver */ 
 
enum { 
    CDEV_NOT_USED = 0, 
    CDEV_EXCLUSIVE_OPEN = 1, 
}; 
 
/* Is device open? Used to prevent multiple access to device */ 
static atomic_t already_open = ATOMIC_INIT(CDEV_NOT_USED); 
 
static char msg[BUF_LEN + 1]; /* The msg the device will give when asked */ 
 
static struct class *cls; 
 
static struct file_operations chardev_fops = { 
    .read = device_read, 
    .write = device_write, 
    .open = device_open, 
    .release = device_release, 
}; 
 
static int __init chardev_init(void) 
{ 
    major = register_chrdev(0, DEVICE_NAME, &chardev_fops); 
 
    if (major < 0) { 
        pr_alert("Registering char device failed with %d\n", major); 
        return major; 
    } 
 
    pr_info("I was assigned major number %d.\n", major); 
 
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 4, 0) 
    cls = class_create(DEVICE_NAME); 
#else 
    cls = class_create(THIS_MODULE, DEVICE_NAME); 
#endif 
    device_create(cls, NULL, MKDEV(major, 0), NULL, DEVICE_NAME); 
 
    pr_info("Device created on /dev/%s\n", DEVICE_NAME); 
 
    return SUCCESS; 
} 
 
static void __exit chardev_exit(void) 
{ 
    device_destroy(cls, MKDEV(major, 0)); 
    class_destroy(cls); 
 
    /* Unregister the device */ 
    unregister_chrdev(major, DEVICE_NAME); 
} 
 
/* Methods */ 
 
/* Called when a process tries to open the device file, like 
 * "sudo cat /dev/chardev" 
 */ 
static int device_open(struct inode *inode, struct file *file) 
{ 
    static int counter = 0; 
 
    if (atomic_cmpxchg(&already_open, CDEV_NOT_USED, CDEV_EXCLUSIVE_OPEN)) 
        return -EBUSY; 
 
    sprintf(msg, "I already told you %d times Hello world!\n", counter++); 
    try_module_get(THIS_MODULE); 
 
    return SUCCESS; 
} 
 
/* Called when a process closes the device file. */ 
static int device_release(struct inode *inode, struct file *file) 
{ 
    /* We're now ready for our next caller */ 
    atomic_set(&already_open, CDEV_NOT_USED); 
 
    /* Decrement the usage count, or else once you opened the file, you will 
     * never get rid of the module. 
     */ 
    module_put(THIS_MODULE); 
 
    return SUCCESS; 
} 
 
/* Called when a process, which already opened the dev file, attempts to 
 * read from it. 
 */ 
static ssize_t device_read(struct file *filp, /* see include/linux/fs.h   */ 
                           char __user *buffer, /* buffer to fill with data */ 
                           size_t length, /* length of the buffer     */ 
                           loff_t *offset) 
{ 
    /* Number of bytes actually written to the buffer */ 
    int bytes_read = 0; 
    const char *msg_ptr = msg; 
 
    if (!*(msg_ptr + *offset)) { /* we are at the end of message */ 
        *offset = 0; /* reset the offset */ 
        return 0; /* signify end of file */ 
    } 
 
    msg_ptr += *offset; 
 
    /* Actually put the data into the buffer */ 
    while (length && *msg_ptr) { 
        /* The buffer is in the user data segment, not the kernel 
         * segment so "*" assignment won't work.  We have to use 
         * put_user which copies data from the kernel data segment to 
         * the user data segment. 
         */ 
        put_user(*(msg_ptr++), buffer++); 
        length--; 
        bytes_read++; 
    } 
 
    *offset += bytes_read; 
 
    /* Most read functions return the number of bytes put into the buffer. */ 
    return bytes_read; 
} 
 
/* Called when a process writes to dev file: echo "hi" > /dev/hello */ 
static ssize_t device_write(struct file *filp, const char __user *buff, 
                            size_t len, loff_t *off) 
{ 
    pr_alert("Sorry, this operation is not supported.\n"); 
    return -EINVAL; 
} 
 
module_init(chardev_init); 
module_exit(chardev_exit); 
 
MODULE_LICENSE("GPL");

6.6 Writing Modules for Multiple Kernel Versions
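To keep a module working across kernel versions, you need to be aware of the differences between versions and write and debug the code accordingly.
The macros most commonly used for this are LINUX_VERSION_CODE and KERNEL_VERSION.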
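
Comparisons such as LINUX_VERSION_CODE >= KERNEL_VERSION(6, 4, 0) work because the three version components are packed into a single integer. The function below is a user-space replica written for this note (kernel_version is not a kernel symbol); it assumes the packing used by recent kernel headers, which clamp the third component to 255.

```c
/* User-space replica of the packing done by KERNEL_VERSION(a, b, c):
 * major version in bits 16 and up, minor in bits 8-15, sublevel in
 * bits 0-7 (clamped to 255, as recent kernel headers do). */
unsigned int kernel_version(unsigned int a, unsigned int b, unsigned int c)
{
    return (a << 16) + (b << 8) + (c > 255 ? 255 : c);
}
```

Because the major version occupies the highest bits, a plain integer comparison orders versions correctly, which is all the #if checks in the examples rely on.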

7 The /proc File System
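The /proc file system is an additional mechanism for the kernel and kernel modules to send information to processes. It was originally designed to give access to information about processes. For example:

  • /proc/modules provides the list of modules
  • /proc/meminfo gathers memory usage statistics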

7.2 Read and Write a /proc File

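Writing to a /proc file works much like reading from it, with one difference: the data comes from user space, so it has to be imported from user space into kernel space (with copy_from_user or get_user).

The reason for copy_from_user or get_user is that Linux memory is segmented. This means a pointer by itself does not reference a unique location in memory, only a location within a memory segment, and you need to know which segment it belongs to in order to use it. Each process has its own memory segment, and that is the only segment the process can access.

When writing a regular program that runs as a process, you normally do not have to worry about segments. When you write a kernel module, you normally want to access the kernel memory segment, which the system handles automatically. However, when the contents of a memory buffer have to be passed between the currently running process and the kernel, the kernel function receives a pointer to a buffer that lives in the process segment. The put_user and get_user macros let you access that memory. They handle only one character at a time; copy_to_user and copy_from_user handle several. Because the buffer (in the read or write function) is in kernel space, the write function has to import the data, since it comes from user space, while the read function does not, since the data is already in kernel space.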

Example:
/* 
 * procfs2.c -  create a "file" in /proc 
 */ 
 
#include <linux/kernel.h> /* We're doing kernel work */ 
#include <linux/module.h> /* Specifically, a module */ 
#include <linux/proc_fs.h> /* Necessary because we use the proc fs */ 
#include <linux/uaccess.h> /* for copy_from_user */ 
#include <linux/version.h> 
 
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 6, 0) 
#define HAVE_PROC_OPS 
#endif 
 
#define PROCFS_MAX_SIZE 1024 
#define PROCFS_NAME "buffer1k" 
 
/* This structure hold information about the /proc file */ 
static struct proc_dir_entry *our_proc_file; 
 
/* The buffer used to store character for this module */ 
static char procfs_buffer[PROCFS_MAX_SIZE]; 
 
/* The size of the buffer */ 
static unsigned long procfs_buffer_size = 0; 
 
/* This function is called then the /proc file is read */ 
static ssize_t procfile_read(struct file *file_pointer, char __user *buffer, 
                             size_t buffer_length, loff_t *offset) 
{ 
    char s[13] = "HelloWorld!\n"; 
    int len = sizeof(s); 
    ssize_t ret = len; 
 
    if (*offset >= len || copy_to_user(buffer, s, len)) { 
        pr_info("copy_to_user failed\n"); 
        ret = 0; 
    } else { 
        pr_info("procfile read %s\n", file_pointer->f_path.dentry->d_name.name); 
        *offset += len; 
    } 
 
    return ret; 
} 
 
/* This function is called with the /proc file is written. */ 
static ssize_t procfile_write(struct file *file, const char __user *buff, 
                              size_t len, loff_t *off) 
{ 
    procfs_buffer_size = len; 
    /* Leave room for the terminating NUL byte. */
    if (procfs_buffer_size > PROCFS_MAX_SIZE - 1) 
        procfs_buffer_size = PROCFS_MAX_SIZE - 1; 
 
    if (copy_from_user(procfs_buffer, buff, procfs_buffer_size)) 
        return -EFAULT; 
 
    procfs_buffer[procfs_buffer_size] = '\0'; 
    *off += procfs_buffer_size; 
    pr_info("procfile write %s\n", procfs_buffer); 
 
    return procfs_buffer_size; 
} 
 
#ifdef HAVE_PROC_OPS 
static const struct proc_ops proc_file_fops = { 
    .proc_read = procfile_read, 
    .proc_write = procfile_write, 
}; 
#else 
static const struct file_operations proc_file_fops = { 
    .read = procfile_read, 
    .write = procfile_write, 
}; 
#endif 
 
static int __init procfs2_init(void) 
{ 
    our_proc_file = proc_create(PROCFS_NAME, 0644, NULL, &proc_file_fops); 
    if (NULL == our_proc_file) { 
        pr_alert("Error:Could not initialize /proc/%s\n", PROCFS_NAME); 
        return -ENOMEM; 
    } 
 
    pr_info("/proc/%s created\n", PROCFS_NAME); 
    return 0; 
} 
 
static void __exit procfs2_exit(void) 
{ 
    proc_remove(our_proc_file); 
    pr_info("/proc/%s removed\n", PROCFS_NAME); 
} 
 
module_init(procfs2_init); 
module_exit(procfs2_exit); 
 
MODULE_LICENSE("GPL");

Every time the file /proc/helloworld is read, the function procfile_read is called. Two parameters of this function are very important: the buffer (the second parameter) and the offset (the fourth one). The content of the buffer will be returned to the application which read it (for example the cat command). The offset is the current position in the file. If the return value of the function is not null, then this function is called again. So be careful with this function, if it never returns zero, the read function is called endlessly.

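  • HAVE_PROC_OPS is defined by the preprocessor when the kernel version is 5.6.0 or later
  • PROCFS_NAME is defined as "buffer1k", the name of the /proc file we create
  • procfile_read() is called every time /proc/buffer1k (the PROCFS_NAME of this example) is read
    1. Reads the data from the kernel-space buffer
    2. Updates the offset of the read operation
    3. Returns the number of characters read

In short, it hands the contents of the /proc file to the user-space application to carry out the read.

  • procfile_write() imports the user-space data into the kernel-space buffer with copy_from_user and updates the offset.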
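
The part of procfile_write() that is easiest to get wrong is the bounds handling: the incoming length has to be clamped so that the copy and the terminating NUL byte both fit in the buffer. The sketch below shows that pattern in user space. It is an analogy only: memcpy() stands in for copy_from_user(), which exists only inside the kernel, and bounded_store is a name made up for this note.

```c
#include <string.h>

/* Copy at most dst_size - 1 bytes from src into dst and NUL-terminate.
 * memcpy() stands in for copy_from_user(), which only exists in the kernel.
 * Returns the number of bytes stored, excluding the terminator. */
size_t bounded_store(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (dst_size == 0)
        return 0;
    if (len > dst_size - 1)
        len = dst_size - 1;      /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Returning the number of bytes actually stored mirrors what a write handler reports back to the caller, so a writer can tell that its input was truncated.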

7.3 Manage /proc file with standard filesystem

I could not work out from the example how an inode is used to manage a /proc file; to be clarified.

Is the buffer in user space?

7.4 Manage /proc file with seq_file
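To make writing /proc files easier, the kernel provides an API called seq_file. It is built on three functions: start(), next(), and stop(). When a user reads the /proc file, the seq_file API starts a sequence.

[Figure: flow of the seq_file sequence]

seq_file provides basic functions for proc_ops, such as seq_read and seq_lseek, but nothing for writing to the /proc file.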
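
The order in which start(), next(), and stop() are called can be easier to follow as a loop. The sketch below is a user-space simulation written for this note, not kernel code: the record array, ITEMS, and all function names are hypothetical, and show() is reduced to adding the record to a sum. It follows the guide's description that a new sequence begins after stop(), and that iteration ends only when start() returns NULL.

```c
#include <stddef.h>

#define ITEMS 3 /* hypothetical number of records */

/* start(): return the record at *pos, or NULL when the sequence is over */
static int *seq_start(int *items, int *pos)
{
    return *pos < ITEMS ? &items[*pos] : NULL;
}

/* next(): advance the position and return the following record or NULL */
static int *seq_next(int *items, int *pos)
{
    (*pos)++;
    return *pos < ITEMS ? &items[*pos] : NULL;
}

/* Walk the whole sequence the way a read of the /proc file drives it.
 * Returns the sum of all records that show() would have printed. */
int walk_sequence(int *items)
{
    int pos = 0, sum = 0;
    int *v;

    while ((v = seq_start(items, &pos)) != NULL) { /* start() */
        while (v != NULL) {
            sum += *v;                             /* show()  */
            v = seq_next(items, &pos);             /* next()  */
        }
        /* stop() would run here; start() is then called again and
         * returns NULL once pos has moved past the last record. */
    }
    return sum;
}
```

The outer loop is the part that is easy to miss: if start() never returns NULL, the sequence restarts forever.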

8 sysfs: Interacting with your module
sysfs lets you interact with the running kernel from user space by reading or setting variables inside a module.
Running ls -l /sys shows the sysfs directories and files under /sys.

What are kobjects?

$ ls -l /sys/
總用量 0
drwxr-xr-x   2 root root 0  4月 24  2024 block
drwxr-xr-x  52 root root 0  4月 24  2024 bus
drwxr-xr-x  80 root root 0  4月 24  2024 class
drwxr-xr-x   4 root root 0  4月 24  2024 dev
drwxr-xr-x  38 root root 0  4月 24  2024 devices
drwxr-xr-x   6 root root 0  4月 24  2024 firmware
drwxr-xr-x   8 root root 0  4月 24  2024 fs
drwxr-xr-x   2 root root 0  4月 24  2024 hypervisor
drwxr-xr-x  16 root root 0  4月 24  2024 kernel
drwxr-xr-x 259 root root 0  4月 24  2024 module
drwxr-xr-x   3 root root 0  4月 24  2024 power

An attribute definition is simply:

struct attribute { 
    char *name; 
    struct module *owner; 
    umode_t mode; 
}; 
 
int sysfs_create_file(struct kobject * kobj, const struct attribute * attr); 
void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr);

For example, the driver model defines struct device_attribute like:

struct device_attribute { 
    struct attribute attr; 
    ssize_t (*show)(struct device *dev, struct device_attribute *attr, 
                    char *buf); 
    ssize_t (*store)(struct device *dev, struct device_attribute *attr, 
                    const char *buf, size_t count); 
}; 
 
int device_create_file(struct device *, const struct device_attribute *); 
void device_remove_file(struct device *, const struct device_attribute *);

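To read or write an attribute, its show() or store() method must be specified when the attribute is declared. For the common cases, include/linux/sysfs.h provides convenience macros (__ATTR, __ATTR_RO, __ATTR_WO, and so on) that make attributes easier to define and the code more concise and readable.

The guide demonstrates this with a Hello World module that exposes a variable through sysfs: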

/* 
 * hello-sysfs.c sysfs example 
 */ 
#include <linux/fs.h> 
#include <linux/init.h> 
#include <linux/kobject.h> 
#include <linux/module.h> 
#include <linux/string.h> 
#include <linux/sysfs.h> 
 
static struct kobject *mymodule; 
 
/* the variable you want to be able to change */ 
static int myvariable = 0; 
 
static ssize_t myvariable_show(struct kobject *kobj, 
                               struct kobj_attribute *attr, char *buf) 
{ 
    return sprintf(buf, "%d\n", myvariable); 
} 
 
static ssize_t myvariable_store(struct kobject *kobj, 
                                struct kobj_attribute *attr, const char *buf, 
                                size_t count) 
{ 
    /* Parse a decimal integer from the text written to the attribute. */ 
    sscanf(buf, "%d", &myvariable); 
    return count; 
} 
 
static struct kobj_attribute myvariable_attribute = 
    __ATTR(myvariable, 0660, myvariable_show, myvariable_store); 
 
static int __init mymodule_init(void) 
{ 
    int error = 0; 
 
    pr_info("mymodule: initialized\n"); 
 
    mymodule = kobject_create_and_add("mymodule", kernel_kobj); 
    if (!mymodule) 
        return -ENOMEM; 
 
    error = sysfs_create_file(mymodule, &myvariable_attribute.attr); 
    if (error) { 
        pr_info("failed to create the myvariable file " 
                "in /sys/kernel/mymodule\n"); 
    } 
 
    return error; 
} 
 
static void __exit mymodule_exit(void) 
{ 
    pr_info("mymodule: Exit success\n"); 
    kobject_put(mymodule); 
} 
 
module_init(mymodule_init); 
module_exit(mymodule_exit); 
 
MODULE_LICENSE("GPL");

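When I ran echo "32" > /sys/kernel/mymodule/myvariable I got bash: /sys/kernel/mymodule/myvariable: Permission denied, even though I had added sudo. If sudo was placed in front of echo, this is expected: the > redirection is performed by the unprivileged shell, not by sudo, and the attribute is created with mode 0660 and owned by root. Running the whole command in a root shell, for example sudo sh -c 'echo 32 > /sys/kernel/mymodule/myvariable', avoids this.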

9 Talking To Device Files
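Most physical devices are used for output as well as input, so the device driver in the kernel needs some mechanism for getting the output to send to the device from processes. This is done by opening the device file for output and writing to it, just like writing to a file. In the following example this is implemented by device_write.

Unix also has a special function called ioctl (Input Output Control). Every device can have its own ioctl commands, which can be read ioctls (to send information from a process to the kernel), write ioctls (to return information to a process), both, or neither. Note that the roles of read and write are reversed again here: in ioctl, read sends information to the kernel and write receives information from the kernel.

ioctl() takes three parameters: 1. the file descriptor of the device file, 2. the ioctl number, 3. a parameter, whose type depends on the command. For example: ret_val = ioctl(file_desc, IOCTL_SET_MSG, message);

The ioctl number is usually created by a macro call in a header file (_IO, _IOR, _IOW or _IOWR, depending on the type). The header file from the guide's example shows how they are defined: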

_IO an ioctl with no parameters
_IOW an ioctl with write parameters (copy_from_user)
_IOR an ioctl with read parameters (copy_to_user)
_IOWR an ioctl with both write and read parameters.

/* 
 * chardev.h - the header file with the ioctl definitions. 
 * 
 * The declarations here have to be in a header file, because they need 
 * to be known both to the kernel module (in chardev2.c) and the process 
 * calling ioctl() (in userspace_ioctl.c). 
 */ 
 
#ifndef CHARDEV_H 
#define CHARDEV_H 
 
#include <linux/ioctl.h> 
 
/* The major device number. We can not rely on dynamic registration 
 * any more, because ioctls need to know it. 
 */ 
#define MAJOR_NUM 100 
 
/* Set the message of the device driver */ 
#define IOCTL_SET_MSG _IOW(MAJOR_NUM, 0, char *) 
/* _IOW means that we are creating an ioctl command number for passing 
 * information from a user process to the kernel module. 
 * 
 * The first arguments, MAJOR_NUM, is the major device number we are using. 
 * 
 * The second argument is the number of the command (there could be several 
 * with different meanings). 
 * 
 * The third argument is the type we want to get from the process to the 
 * kernel. 
 */ 
 
/* Get the message of the device driver */ 
#define IOCTL_GET_MSG _IOR(MAJOR_NUM, 1, char *) 
/* This IOCTL is used for output, to get the message of the device driver. 
 * However, we still need the buffer to place the message in to be input, 
 * as it is allocated by the process. 
 */ 
 
/* Get the n'th byte of the message */ 
#define IOCTL_GET_NTH_BYTE _IOWR(MAJOR_NUM, 2, int) 
/* The IOCTL is used for both input and output. It receives from the user 
 * a number, n, and returns message[n]. 
 */ 
 
/* The name of the device file */ 
#define DEVICE_FILE_NAME "char_dev" 
#define DEVICE_PATH "/dev/char_dev" 
 
#endif
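
An ioctl number built with these macros is just an integer with four fields packed into it, and <linux/ioctl.h> also provides macros to take it apart again. The user-space sketch below repeats the three definitions from chardev.h and decodes them; struct ioctl_fields and decode_ioctl are names invented for this note.

```c
#include <linux/ioctl.h>

/* Same definitions as in chardev.h above */
#define MAJOR_NUM 100
#define IOCTL_SET_MSG _IOW(MAJOR_NUM, 0, char *)
#define IOCTL_GET_MSG _IOR(MAJOR_NUM, 1, char *)
#define IOCTL_GET_NTH_BYTE _IOWR(MAJOR_NUM, 2, int)

struct ioctl_fields {
    unsigned int dir;  /* _IOC_NONE, _IOC_READ, _IOC_WRITE or both */
    unsigned int type; /* first macro argument, MAJOR_NUM here */
    unsigned int nr;   /* command number */
    unsigned int size; /* sizeof the argument type */
};

/* Split an ioctl number back into the fields it was built from. */
struct ioctl_fields decode_ioctl(unsigned int cmd)
{
    struct ioctl_fields f = {
        .dir = _IOC_DIR(cmd),
        .type = _IOC_TYPE(cmd),
        .nr = _IOC_NR(cmd),
        .size = _IOC_SIZE(cmd),
    };
    return f;
}
```

Decoding IOCTL_GET_NTH_BYTE, for example, yields both direction bits, type 100, command number 2, and the size of an int, which is exactly what was passed to _IOWR.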

11 Blocking Processes and threads
11.1 Sleep
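When a kernel module is asked by a process for something it cannot do right away, it can put the process to sleep until it is able to serve it. Processes are put to sleep and woken up by the kernel all the time; that is how several processes appear to run at the same time on a single CPU.

The kernel module calls wait_event_interruptible to put the process to sleep until the file becomes available. This ensures exclusive access to the file.

When a process is done with the file and closes it, module_close is called. That function wakes up all the processes waiting for the file and lets them continue, so the waiting processes get access to the file in turn.

It is important to remember that besides module_close, a signal (such as Ctrl+c) can also wake up a process waiting for the file. This is because we used wait_event_interruptible. Had we used wait_event instead, users would be very angry when their Ctrl+c was ignored.

A process that does not want to sleep can open the file with the O_NONBLOCK flag. In that case the kernel returns the error code -EAGAIN immediately instead of blocking the process.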
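
From the process's point of view, that -EAGAIN arrives as a failed call with errno set to EAGAIN. The user-space sketch below shows the same behaviour on an ordinary pipe, used here as a stand-in for the guide's /proc file (an assumption of this note); nonblocking_read_errno is a made-up name.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read from an empty pipe whose read end is non-blocking.
 * Returns the errno left by read(), 0 if read() unexpectedly
 * succeeded, or -1 if the pipe could not be set up. */
int nonblocking_read_errno(void)
{
    int p[2];
    char c;
    int err = 0;

    if (pipe(p) != 0)
        return -1;
    if (fcntl(p[0], F_SETFL, O_NONBLOCK) != 0) {
        err = -1;
    } else if (read(p[0], &c, 1) < 0) {
        err = errno;             /* EAGAIN: nothing to read, would block */
    }
    close(p[0]);
    close(p[1]);
    return err;
}
```

Without O_NONBLOCK the same read() would simply sleep until data arrived, which is the blocking case described above.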
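
11.2 Completions
Sometimes, in a module with multiple threads, one thing has to happen before another. Rather than using /bin/sleep commands, the kernel has another way to do this, one that also allows timeouts or interrupts.

As a code synchronization mechanism, a completion has three main parts: initializing the struct completion synchronization object, waiting on it with wait_for_completion(), and signalling it with a call to complete().

The example starts two threads, crank and flywheel; the crank thread must start before the flywheel thread.

A separate completion is defined for each of the two threads and is updated at that thread's exit point. The flywheel thread uses wait_for_completion to make sure it does not start too early. The crank thread updates its completion with complete_all(), which lets the flywheel thread continue.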

/* 
 * completions.c 
 */ 
#include <linux/completion.h> 
#include <linux/err.h> /* for IS_ERR() */ 
#include <linux/init.h> 
#include <linux/kthread.h> 
#include <linux/module.h> 
#include <linux/printk.h> 
#include <linux/version.h> 
 
static struct completion crank_comp; 
static struct completion flywheel_comp; 
 
static int machine_crank_thread(void *arg) 
{ 
    pr_info("Turn the crank\n"); 
 
    complete_all(&crank_comp); 
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 17, 0) 
    kthread_complete_and_exit(&crank_comp, 0); 
#else 
    complete_and_exit(&crank_comp, 0); 
#endif 
} 
 
static int machine_flywheel_spinup_thread(void *arg) 
{ 
    wait_for_completion(&crank_comp); 
 
    pr_info("Flywheel spins up\n"); 
 
    complete_all(&flywheel_comp); 
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 17, 0) 
    kthread_complete_and_exit(&flywheel_comp, 0); 
#else 
    complete_and_exit(&flywheel_comp, 0); 
#endif 
} 
 
static int __init completions_init(void) 
{ 
    struct task_struct *crank_thread; 
    struct task_struct *flywheel_thread; 
 
    pr_info("completions example\n"); 
 
    init_completion(&crank_comp); 
    init_completion(&flywheel_comp); 
 
    crank_thread = kthread_create(machine_crank_thread, NULL, "KThread Crank"); 
    if (IS_ERR(crank_thread)) 
        goto ERROR_THREAD_1; 
 
    flywheel_thread = kthread_create(machine_flywheel_spinup_thread, NULL, 
                                     "KThread Flywheel"); 
    if (IS_ERR(flywheel_thread)) 
        goto ERROR_THREAD_2; 
 
    wake_up_process(flywheel_thread); 
    wake_up_process(crank_thread); 
 
    return 0; 
 
ERROR_THREAD_2: 
    kthread_stop(crank_thread); 
ERROR_THREAD_1: 
 
    return -1; 
} 
 
static void __exit completions_exit(void) 
{ 
    wait_for_completion(&crank_comp); 
    wait_for_completion(&flywheel_comp); 
 
    pr_info("completions exit\n"); 
} 
 
module_init(completions_init); 
module_exit(completions_exit); 
 
MODULE_DESCRIPTION("Completions example"); 
MODULE_LICENSE("GPL");

12 Avoiding Collisions and Deadlocks

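If processes running on different CPUs, or different threads, try to access the same memory, strange things can happen, or your system may lock up. To avoid this, the kernel provides various types of mutual exclusion functions. These mark a section of code as "locked" or "unlocked" so that simultaneous attempts to run it cannot happen.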

12.1 mutex

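mutex_is_locked checks the state of a mutex: it returns a non-zero value if the mutex is locked and zero otherwise.

Finally, once the lock has been acquired successfully, release it with mutex_unlock after leaving the critical section so that other processes or threads can acquire the protected resource. In the example below, mutex_trylock returns non-zero on success, and mutex_unlock is called only in that branch.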

static DEFINE_MUTEX(mymutex); 
 
static int __init example_mutex_init(void) 
{ 
    int ret; 
 
    pr_info("example_mutex init\n"); 
 
    ret = mutex_trylock(&mymutex); 
    if (ret != 0) { 
        pr_info("mutex is locked\n"); 
 
        if (mutex_is_locked(&mymutex) == 0) 
            pr_info("The mutex failed to lock!\n"); 
 
        mutex_unlock(&mymutex); 
        pr_info("mutex is unlocked\n"); 
    } else 
        pr_info("Failed to lock\n"); 
 
    return 0; 
} 

12.2 Spinlocks

Linux spinlocks

The example here is "irq safe": if interrupts happen while the lock is held, they are not forgotten and are delivered when the unlock happens, with the flags variable used to retain their state.

The example defines two different spinlocks:

  1. a static spinlock, sl_static
  2. a dynamic spinlock, sl_dynamic

But why distinguish between dynamic and static?

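ChatGPT's explanation:

In the example, the dynamic spinlock (sl_dynamic) and the static spinlock (sl_static) differ in several ways:

  1. Initialization:

    • The static spinlock is defined with the DEFINE_SPINLOCK macro at file scope, so it already exists, fully initialized, when the module is loaded.
    • The dynamic spinlock is initialized at run time by calling spin_lock_init.
  2. Scope:

    • The static spinlock is visible throughout the file that defines it.
    • A lock initialized at run time can live anywhere, for example inside a structure allocated at run time, or at file scope like the static one.
  3. Memory:

    • The static spinlock occupies a fixed location that is reserved when the module is loaded.
    • A lock initialized at run time occupies whatever memory holds it, which may be allocated dynamically.
  4. Extra setup step:

    • The static spinlock needs no further initialization, because it is initialized where it is defined.
    • The dynamic spinlock must be initialized with spin_lock_init before it is first used.

In short, a static spinlock is created and initialized at compile time, while a dynamic spinlock is initialized at run time and may sit in dynamically allocated memory. Note that in this example sl_dynamic is itself a file-scope variable; only its initialization happens at run time. Which one to use depends on where the lock needs to live.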

static DEFINE_SPINLOCK(sl_static); 
static spinlock_t sl_dynamic; 
 
static void example_spinlock_static(void) 
{ 
    unsigned long flags; 
 
    spin_lock_irqsave(&sl_static, flags); 
    pr_info("Locked static spinlock\n"); 
 
    /* Do something or other safely. Because this uses 100% CPU time, this 
     * code should take no more than a few milliseconds to run. 
     */ 
 
    spin_unlock_irqrestore(&sl_static, flags); 
    pr_info("Unlocked static spinlock\n"); 
} 
 
static void example_spinlock_dynamic(void) 
{ 
    unsigned long flags; 
 
    spin_lock_init(&sl_dynamic); 
    spin_lock_irqsave(&sl_dynamic, flags); 
    pr_info("Locked dynamic spinlock\n"); 
 
    /* Do something or other safely. Because this uses 100% CPU time, this 
     * code should take no more than a few milliseconds to run. 
     */ 
 
    spin_unlock_irqrestore(&sl_dynamic, flags); 
    pr_info("Unlocked dynamic spinlock\n"); 
} 

Taking 100% of a CPU’s resources comes with greater responsibility. Situations where the kernel code monopolizes a CPU are called atomic contexts. Holding a spinlock is one of those situations. Sleeping in atomic contexts may leave the system hanging, as the occupied CPU devotes 100% of its resources doing nothing but sleeping. In some worse cases the system may crash. Thus, sleeping in atomic contexts is considered a bug in the kernel. They are sometimes called “sleep-in-atomic-context” in some materials.

12.3 Read and write locks

RWLock rules

  • Only one writer can hold the lock at a time
  • Several readers can hold the lock at the same time
  • A reader and a writer cannot hold the lock at the same time

static DEFINE_RWLOCK(myrwlock); 
 
static void example_read_lock(void) 
{ 
    unsigned long flags; 
 
    read_lock_irqsave(&myrwlock, flags); 
    pr_info("Read Locked\n"); 
 
    /* Read from something */ 
 
    read_unlock_irqrestore(&myrwlock, flags); 
    pr_info("Read Unlocked\n"); 
} 
 
static void example_write_lock(void) 
{ 
    unsigned long flags; 
 
    write_lock_irqsave(&myrwlock, flags); 
    pr_info("Write Locked\n"); 
 
    /* Write to something */ 
 
    write_unlock_irqrestore(&myrwlock, flags); 
    pr_info("Write Unlocked\n"); 
} 
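
The three rules listed at the start of this section can be written down as a small state model. The sketch below is a single-threaded toy written for this note: it is not how the kernel implements rwlock_t, it never spins or disables interrupts, and every name in it is hypothetical. It only answers the question "would this acquisition be allowed right now?".

```c
#include <stdbool.h>

/* Toy model of the rules only; not the kernel's rwlock_t. */
struct rw_model {
    int readers; /* number of readers currently holding the lock */
    bool writer; /* true while a writer holds the lock */
};

/* A reader may enter unless a writer holds the lock. */
bool model_read_trylock(struct rw_model *l)
{
    if (l->writer)
        return false;
    l->readers++;
    return true;
}

/* A writer may enter only when nobody else holds the lock. */
bool model_write_trylock(struct rw_model *l)
{
    if (l->writer || l->readers > 0)
        return false;
    l->writer = true;
    return true;
}

void model_read_unlock(struct rw_model *l)
{
    l->readers--;
}

void model_write_unlock(struct rw_model *l)
{
    l->writer = false;
}
```

Where the model refuses an acquisition, the real read_lock_irqsave and write_lock_irqsave calls would spin until the lock becomes available.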

12.4 Atomic operations

13 Replacing Print Macros

13.1 Replacement

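The current pointer is used to reach the tty structure of the active task. That structure contains a pointer to a string write function, which makes it easy to send a string to the tty.

The example code contains the line (ttyops->write)(my_tty, "\015\012", 2); I read the comment but did not fully understand it, so I looked up this article: the difference between \r\n and \n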
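
The string "\015\012" is written with octal escapes, which hides the fact that it is the ordinary carriage return plus line feed pair. The small check below makes that explicit; is_crlf is a name made up for this note.

```c
#include <string.h>

/* "\015\012" uses octal escapes: 015 is 13 (carriage return, '\r')
 * and 012 is 10 (line feed, '\n'). Returns non-zero if s is exactly
 * that two-character sequence. */
int is_crlf(const char *s)
{
    return strcmp(s, "\r\n") == 0;
}
```

So the second write() call in the example sends "\r\n": the carriage return moves the cursor back to column 0 and the line feed moves it down one line.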

static void print_string(char *str) 
{ 
    /* The tty for the current task */ 
    struct tty_struct *my_tty = get_current_tty(); 
 
    /* If my_tty is NULL, the current task has no tty you can print to (i.e., 
     * if it is a daemon). If so, there is nothing we can do. 
     */ 
    if (my_tty) { 
        const struct tty_operations *ttyops = my_tty->driver->ops; 
        /* my_tty->driver is a struct which holds the tty's functions, 
         * one of which (write) is used to write strings to the tty. 
         * It can be used to take a string either from the user's or 
         * kernel's memory segment. 
         * 
         * The function's 1st parameter is the tty to write to, because the 
         * same function would normally be used for all tty's of a certain 
         * type. 
         * The 2nd parameter is a pointer to a string. 
         * The 3rd parameter is the length of the string. 
         * 
         * As you will see below, sometimes it's necessary to use 
         * preprocessor stuff to create code that works for different 
         * kernel versions. The (naive) approach we've taken here does not 
         * scale well. The right way to deal with this is described in 
         * section 2 of 
         * linux/Documentation/SubmittingPatches 
         */ 
        (ttyops->write)(my_tty, /* The tty itself */ 
                        str, /* String */ 
                        strlen(str)); /* Length */ 
 
        /* ttys were originally hardware devices, which (usually) strictly 
         * followed the ASCII standard. In ASCII, to move to a new line you 
         * need two characters, a carriage return and a line feed. On Unix, 
         * the ASCII line feed is used for both purposes - so we can not 
         * just use \n, because it would not have a carriage return and the 
         * next line will start at the column right after the line feed. 
         * 
         * This is why text files are different between Unix and MS Windows. 
         * In CP/M and derivatives, like MS-DOS and MS Windows, the ASCII 
         * standard was strictly adhered to, and therefore a newline requires 
         * both a LF and a CR. 
         */ 
        (ttyops->write)(my_tty, "\015\012", 2); 
    } 
} 

13.2 Flashing keyboard LEDs

14 Scheduling Tasks

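There are two main ways of running tasks: tasklets and work queues. Tasklets are a quick and easy way of scheduling a single function to run, for example when triggered from an interrupt, whereas work queues are more complicated but also better suited to running multiple things in a sequence.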

14.1 Tasklets

Question:
According to the guide, the output should be:

tasklet example init
Example tasklet starts
Example tasklet init continues...
Example tasklet ends

But what I actually got was:

[  321.229193] tasklet example init
[  321.232447] Example tasklet starts
[  326.193007] Example tasklet ends
[  326.387463] Example tasklet init continues...

I am still looking into this

#ifndef DECLARE_TASKLET_OLD 
#define DECLARE_TASKLET_OLD(arg1, arg2) DECLARE_TASKLET(arg1, arg2, 0L) 
#endif 
 
static void tasklet_fn(unsigned long data) 
{ 
    pr_info("Example tasklet starts\n"); 
    mdelay(5000); 
    pr_info("Example tasklet ends\n"); 
} 
 
static DECLARE_TASKLET_OLD(mytask, tasklet_fn); 
 
static int __init example_tasklet_init(void) 
{ 
    pr_info("tasklet example init\n"); 
    tasklet_schedule(&mytask); 
    mdelay(200); 
    pr_info("Example tasklet init continues...\n"); 
    return 0; 
} 

14.2 Work queues

static struct workqueue_struct *queue = NULL; 
static struct work_struct work; 
 
static void work_handler(struct work_struct *data) 
{ 
    pr_info("work handler function.\n"); 
} 
 
static int __init sched_init(void) 
{ 
    queue = alloc_workqueue("HELLOWORLD", WQ_UNBOUND, 1); 
    INIT_WORK(&work, work_handler); 
    queue_work(queue, &work); 
    return 0; 
}

15 Interrupt Handlers

15.1 Interrupt Handlers

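There are two types of interaction between the CPU and the rest of the computer's hardware. The first is when the CPU gives orders to the hardware; the other is when the hardware needs to tell the CPU something. The second, called an interrupt, is much harder to implement because it has to be dealt with when convenient for the hardware, not for the CPU. Hardware devices typically have very little RAM, and if you do not read their information while it is available, it is lost.

Under Linux, hardware interrupts are called IRQs (Interrupt ReQuests). There are two kinds:

  • short IRQ: expected to take a very short time, during which the rest of the machine is blocked and no other interrupts are handled.
  • long IRQ: can take longer, and other interrupts may occur during it (but not interrupts from the same device).

Whenever possible, declare an interrupt handler as long.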

15.2 Detecting button presses

Here is an example where buttons are connected to GPIO numbers 17 and 18 and an LED is connected to GPIO 4. You can change those numbers to whatever is appropriate for your board.

The LED is defined on GPIO 4 and the buttons on GPIO 17 and 18:

static int button_irqs[] = { -1, -1 }; 
 
/* Define GPIOs for LEDs. 
 * TODO: Change the numbers for the GPIO on your board. 
 */ 
static struct gpio leds[] = { { 4, GPIOF_OUT_INIT_LOW, "LED 1" } }; 
 
/* Define GPIOs for BUTTONS 
 * TODO: Change the numbers for the GPIO on your board. 
 */ 
static struct gpio buttons[] = { { 17, GPIOF_IN, "LED 1 ON BUTTON" }, 
                                 { 18, GPIOF_IN, "LED 1 OFF BUTTON" } }; 

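This is the interrupt handler, triggered when a button is pressed. What it does depends on the interrupt number (IRQ): if the first button is pressed while the LED is off, the LED is switched on; if the second button is pressed while the LED is on, the LED is switched off.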

/* interrupt function triggered when a button is pressed. */ 
static irqreturn_t button_isr(int irq, void *data) 
{ 
    /* first button */ 
    if (irq == button_irqs[0] && !gpio_get_value(leds[0].gpio)) 
        gpio_set_value(leds[0].gpio, 1); 
    /* second button */ 
    else if (irq == button_irqs[1] && gpio_get_value(leds[0].gpio)) 
        gpio_set_value(leds[0].gpio, 0); 
 
    return IRQ_HANDLED; 
} 

15.3 Bottom Half

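If you want to do a lot of work in an interrupt routine, a common approach is to combine it with a tasklet, which pushes the bulk of the work off into the scheduler.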

static void bottomhalf_tasklet_fn(unsigned long data) 
{ 
    pr_info("Bottom half tasklet starts\n"); 
    /* do something which takes a while */ 
    mdelay(500); 
    pr_info("Bottom half tasklet ends\n"); 
} 

tasklet_schedule is then added to the interrupt routine:

static irqreturn_t button_isr(int irq, void *data) 
{ 
    /* Do something quickly right now */ 
    if (irq == button_irqs[0] && !gpio_get_value(leds[0].gpio)) 
        gpio_set_value(leds[0].gpio, 1); 
    else if (irq == button_irqs[1] && gpio_get_value(leds[0].gpio)) 
        gpio_set_value(leds[0].gpio, 0); 
 
    /* Do the rest at leisure via the scheduler */ 
    tasklet_schedule(&buttontask); 
 
    return IRQ_HANDLED; 
}