
Linux simple sysfs, procfs and character device driver (Linux kernel >3.2)

tags: C LANGUAGE linux kernel procfs, sysfs, character

Authors: WhoAmI, CrazyMonkey
Date: 20230303
email: kccddb@gmail.com
Copyright: CC BY-NC-SA

This note covers Linux kernel module topics. If you are a basic user, please look for other articles.

Goals:

A. After learning sysfs and procfs, you can modify the Linux kernel source or a kernel module so that user space can make simple settings (write) and observe state (read).

B. You need OOP concepts.

C. C language: function pointers, callbacks, and event-driven programming.

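For example, to turn on IP forwarding (routing):
echo 1 > /proc/sys/net/ipv4/ip_forward

This writes to an existing procfs entry.
With forwarding on, an IP packet arriving on eth0 can be forwarded out of eth1 according to the routing table.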
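To see what the shell actually does for "echo 1 > /proc/sys/...", here is a minimal user-space sketch (an illustration for this note, not the authors' code). The hypothetical write_setting() opens the file for writing and writes the text plus a newline; the hypothetical read_setting() reads it back and parses an integer. On a real system you would pass "/proc/sys/net/ipv4/ip_forward" and run as root.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Equivalent of: echo "<value>" > path
 * The shell opens the file for writing (truncating it) and write()s the
 * text plus a newline. Returns 0 on success, -1 on error. */
int write_setting(const char *path, const char *value)
{
    char line[64];
    int n = snprintf(line, sizeof line, "%s\n", value);
    if (n < 0 || (size_t)n >= sizeof line)
        return -1;

    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0)
        return -1;
    ssize_t w = write(fd, line, (size_t)n);
    close(fd);
    return w == n ? 0 : -1;
}

/* Equivalent of: cat path, parsed as an integer.
 * Returns 0 on success and stores the value in *out, -1 on error. */
int read_setting(const char *path, long *out)
{
    char buf[64];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t r = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (r <= 0)
        return -1;
    buf[r] = '\0';

    char *end;
    *out = strtol(buf, &end, 10);
    return end == buf ? -1 : 0;
}
```

For example, write_setting("/proc/sys/net/ipv4/ip_forward", "1") has the same effect as the echo command above.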

https://lxr.linux.no/linux+v6.0.9/net/ipv4/devinet.c#L2643

static struct ctl_table ctl_forward_entry[] = {
    {
        .procname     = "ip_forward",
        .data         = &ipv4_devconf.data[IPV4_DEVCONF_FORWARDING - 1],
        .maxlen       = sizeof(int),
        .mode         = 0644,
        .proc_handler = devinet_sysctl_forward,
        .extra1       = &ipv4_devconf,
        .extra2       = &init_net,
    },
    { },
};
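The table pairs each procfs name with the data it exposes and a handler callback. The following is a simplified user-space sketch of that "register a table, dispatch to its handler" pattern. The names my_ctl_entry, int_minmax_handler() and ctl_access() are hypothetical, and the range check only imitates kernel handlers such as proc_dointvec_minmax(); the real devinet_sysctl_forward() does more work (for example, applying the setting to the network devices).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for the kernel's struct ctl_table. */
struct my_ctl_entry {
    const char *procname;   /* file name under the table's directory */
    int *data;              /* variable the entry exposes */
    int mode;               /* permission bits, e.g. 0644 */
    int min, max;           /* accepted range for writes */
    /* write != 0: parse buf into *data; write == 0: format *data into buf */
    int (*proc_handler)(struct my_ctl_entry *e, int write, char *buf, size_t len);
};

/* Generic integer handler with a range check. Returns 0 or -1. */
static int int_minmax_handler(struct my_ctl_entry *e, int write, char *buf, size_t len)
{
    if (!write)
        return snprintf(buf, len, "%d\n", *e->data) < (int)len ? 0 : -1;

    char *end;
    long v = strtol(buf, &end, 10);
    if (end == buf || v < e->min || v > e->max)
        return -1;                      /* reject, like -EINVAL in the kernel */
    *e->data = (int)v;
    return 0;
}

static int forwarding;                  /* hypothetical stand-in for the forwarding flag */

static struct my_ctl_entry my_table[] = {
    { .procname = "ip_forward", .data = &forwarding, .mode = 0644,
      .min = 0, .max = 1, .proc_handler = int_minmax_handler },
    { 0 }                               /* empty sentinel, like "{ }" above */
};

/* Find an entry by name and dispatch to its handler. */
int ctl_access(const char *name, int write, char *buf, size_t len)
{
    for (struct my_ctl_entry *e = my_table; e->procname; e++)
        if (strcmp(e->procname, name) == 0)
            return e->proc_handler(e, write, buf, len);
    return -1;                          /* no such entry */
}
```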

Please use "ip route" to see the routing table!

See also
(1) iproute2:
Linux Advanced Routing & Traffic Control HOWTO (ip command)
(a) Use the ip command instead of ifconfig, route, etc.
For example:
ip addr
ip route
(b) tc: traffic control, QoS (advanced)

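(2) Basic kernel Netfilter concepts and the user-space command "iptables"
iptables is the user-space command used to talk to the kernel's Netfilter; udevd (the device event-managing daemon) is another user-space program that talks to the kernel.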

netlink - communication between kernel and user space (AF_NETLINK)

Please see

A Deep Dive into Iptables and Netfilter Architecture

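ip talks to the kernel through netlink sockets, and so does iptables when it uses the nf_tables backend (iptables-nft); the legacy iptables backend uses setsockopt()/getsockopt() instead. Now do you see why I taught netlink sockets?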
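A minimal sketch of the kind of request ip sends over netlink: it opens a NETLINK_ROUTE socket, asks for a dump of all links (RTM_GETLINK with NLM_F_DUMP, roughly what "ip link" does) and counts the RTM_NEWLINK replies until NLMSG_DONE arrives. The function name count_links() is hypothetical and error handling is kept short.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the number of network interfaces reported by the kernel, or -1. */
int count_links(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;

    /* Request = netlink header + rtnetlink payload. */
    struct {
        struct nlmsghdr nlh;
        struct ifinfomsg ifm;
    } req;
    memset(&req, 0, sizeof req);
    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req.nlh.nlmsg_type = RTM_GETLINK;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nlh.nlmsg_seq = 1;
    req.ifm.ifi_family = AF_UNSPEC;

    /* With no destination address, the message goes to the kernel (port 0). */
    if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0) {
        close(fd);
        return -1;
    }

    static uint32_t buf[8192];          /* 32 KiB, 4-byte aligned for nlmsghdr */
    int count = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n <= 0)
            break;
        int len = (int)n;
        for (struct nlmsghdr *h = (struct nlmsghdr *)buf; NLMSG_OK(h, len);
             h = NLMSG_NEXT(h, len)) {
            if (h->nlmsg_type == NLMSG_DONE) {      /* end of the dump */
                close(fd);
                return count;
            }
            if (h->nlmsg_type == NLMSG_ERROR) {
                close(fd);
                return -1;
            }
            if (h->nlmsg_type == RTM_NEWLINK)       /* one message per interface */
                count++;
        }
    }
    close(fd);
    return -1;
}
```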

tc is another one to study when you need it: it controls QoS and queueing, where queueing theory and resource sharing show up again.

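Of course, procfs and sysfs settings must be set again after every reboot: these are virtual file systems kept in kernel memory (DRAM), not on disk! So such settings are usually added to a startup shell script.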

D. Learn the small details: license and export, strip, Makefile.

E. A simple sysfs, procfs, and kernel module.

F. To go further, see the excellent Sysfs in Linux Kernel – Linux Device Driver Tutorial Part 11.


strip -g

    strip: discard symbols from object files.
    strip -g: remove debugging symbols only.
    Be careful: do not run plain "strip hello_sys.ko" on a kernel module. Plain strip also
    removes the symbol table that the kernel's module loader needs, so the module can no
    longer be loaded. The module itself is built with:
        make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
    (a) "/lib/modules/$(shell uname -r)/build" contains the default kernel Makefile.
        In this case, you build against the default (running) kernel.

        or

        "/linux-kernel-source" if you use another kernel version, e.g., /src/linux-3.2.0-64-generic

        If you build against a new kernel source rather than your default kernel,
        you must boot Linux with that new kernel before loading the module!

    (b) M=$(PWD) is your module's directory, e.g., "/home/laikc/demo".

    (c) "modules" is the make target, i.e., "make modules".

    Use strip -g.

Makefile

root@laikc-virtual:/home/laikc/demo# cat Makefile
obj-m += hello_sys.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
	strip -g hello_sys.ko
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

root@laikc-virtual:/home/laikc/demo#

root@laikc-virtual:/home/laikc/demo# make
make -C /lib/modules/3.2.0-64-generic/build M=/home/laikc/demo modules
make[1]: Entering directory `/usr/src/linux-headers-3.2.0-64-generic'
  Building modules, stage 2.
  MODPOST 1 modules
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-64-generic'
strip -g hello_sys.ko
root@laikc-virtual:/home/laikc/demo#


struct file_operations: the structure that plays the role of a file's operations (its "methods")

struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
    ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
    int (*iterate) (struct file *, struct dir_context *);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    void (*mremap)(struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, loff_t, loff_t, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
    int (*check_flags)(int);
    int (*flock) (struct file *, int, struct file_lock *);
    ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
    ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
    int (*setlease)(struct file *, long, struct file_lock **, void **);
    long (*fallocate)(struct file *file, int mode, loff_t offset, loff_t len);
    void (*show_fdinfo)(struct seq_file *m, struct file *f);
};
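The OOP idea behind struct file_operations can be shown in user space with a much smaller, hypothetical ops table: the "class" is a struct of function pointers, each "object" points to its class, and a generic layer (playing the role of the VFS) only ever calls through that pointer. All names here are made up for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, much-reduced cousin of struct file_operations. */
struct my_file;
struct my_file_ops {
    ssize_t (*read)(struct my_file *f, char *buf, size_t count);
    ssize_t (*write)(struct my_file *f, const char *buf, size_t count);
};

/* An "object": per-open state plus a pointer to its class's methods. */
struct my_file {
    const struct my_file_ops *f_op;
    char data[32];
    size_t len;
};

/* One concrete "driver": stores what is written, returns it on read. */
static ssize_t mem_read(struct my_file *f, char *buf, size_t count)
{
    size_t n = count < f->len ? count : f->len;
    memcpy(buf, f->data, n);
    return (ssize_t)n;
}

static ssize_t mem_write(struct my_file *f, const char *buf, size_t count)
{
    size_t n = count < sizeof f->data ? count : sizeof f->data;
    memcpy(f->data, buf, n);
    f->len = n;
    return (ssize_t)n;
}

/* Register the methods once, like hello_proc_fops. */
static const struct my_file_ops mem_fops = {
    .read  = mem_read,
    .write = mem_write,
};

/* Generic layer: never knows which driver it talks to, just calls through
 * f_op, and fails when a method is missing (the kernel returns -EINVAL). */
ssize_t generic_read(struct my_file *f, char *buf, size_t count)
{
    if (!f->f_op || !f->f_op->read)
        return -1;
    return f->f_op->read(f, buf, count);
}

ssize_t generic_write(struct my_file *f, const char *buf, size_t count)
{
    if (!f->f_op || !f->f_op->write)
        return -1;
    return f->f_op->write(f, buf, count);
}

/* "open": bind an object to its class. */
void my_open(struct my_file *f)
{
    f->f_op = &mem_fops;
    f->len = 0;
}
```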

SysFs I

/*****************************************************************************
 * Source code hello_sys.c
 * Copyright (c) 2018 GPL v2
 * (C) kclai
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *****************************************************************************/
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/device.h>
#include <linux/major.h>
#include <linux/string.h>
#include <asm/uaccess.h>
/* <linux/sysdev.h> was removed after Linux 3.2, and nothing here needs it. */

static ssize_t enable_write(struct class *cls, struct class_attribute *attr,
                            const char *buf, size_t count);
static ssize_t enable_show(struct class *cls, struct class_attribute *attr,
                           char *buf);

static int enable_state = 0;

static struct class_attribute class_attr[] = {
    __ATTR(enable, 0644, enable_show, enable_write),
    __ATTR_NULL
};

/* .class_attrs exists in 3.x kernels; later kernels replaced it with .class_groups. */
static struct class hello_drv = {
    .name        = "HelloSys",
    .owner       = THIS_MODULE,
    .class_attrs = class_attr,
};

/* Show function: runs on read, e.g. cat /sys/class/HelloSys/enable */
static ssize_t enable_show(struct class *cls, struct class_attribute *attr,
                           char *buf)
{
    static const char *msg[] = { "Disabled", "Enabled" };

    printk(KERN_INFO "Hello Sys show function\n");
    return sprintf(buf, "%s\n", msg[enable_state]);
}

/* Store function: runs on write, e.g. echo 1 > /sys/class/HelloSys/enable */
static ssize_t enable_write(struct class *cls, struct class_attribute *attr,
                            const char *buf, size_t count)
{
    if (strncmp(buf, "1", 1) == 0)
        enable_state = 1;
    if (strncmp(buf, "0", 1) == 0)
        enable_state = 0;
    printk(KERN_INFO "HelloSys Change State %d\n", enable_state);
    return count;   /* tell the caller that all bytes were consumed */
}

static int __init hello_init(void)
{
    int status;

    status = class_register(&hello_drv);
    if (status < 0) {
        printk(KERN_ERR "Registering Class Failed\n");
        return status;  /* make insmod fail instead of loading a broken module */
    }
    printk(KERN_INFO "Registering Class at /sys/class/HelloSys\n");
    return 0;
}

static void __exit hello_exit(void)
{
    class_unregister(&hello_drv);
    printk(KERN_INFO "GoodBye, HelloSys\n");
}

void pr_hello_export(void)
{
    printk(KERN_INFO "hello sysfs demo\n");
}
EXPORT_SYMBOL(pr_hello_export);

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");

/* Notice that:
 *
 * MODULE_LICENSE("GPL");
 *   In general, if you want to use the kernel's exported functions, you must
 *   take care of EXPORT_SYMBOL (see pr_hello_export) and of the license:
 *   a non-GPL module taints the kernel and cannot use symbols exported with
 *   EXPORT_SYMBOL_GPL.
 *
 * module_init(hello_init);
 *   Tells the kernel the address of the init function hello_init, which runs
 *   at insmod time (or when the kernel starts, if built in). Because it is
 *   marked __init, the memory of hello_init is released afterwards.
 *
 * module_exit(hello_exit);
 *   Tells the kernel the address of the exit function hello_exit, which runs
 *   at rmmod time.
 */
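Why enable_write() should return count: the store callback's return value is reported back to user space as the result of write(2). The user-space simulation below assumes the writer retries short writes, as stdio-based programs typically do; the function names are hypothetical. With "return 1", writing "10\n" calls the store three times (once per remaining byte) and ends with state 0, which is surprising.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

static int state;          /* plays the role of enable_state */
static int store_calls;    /* how many times the store callback ran */

/* Buggy version: claims to have consumed only 1 byte. */
ssize_t store_returns_one(const char *buf, size_t count)
{
    (void)count;
    store_calls++;
    if (strncmp(buf, "1", 1) == 0) state = 1;
    if (strncmp(buf, "0", 1) == 0) state = 0;
    return 1;
}

/* Fixed version: consumes the whole buffer. */
ssize_t store_returns_count(const char *buf, size_t count)
{
    store_calls++;
    if (strncmp(buf, "1", 1) == 0) state = 1;
    if (strncmp(buf, "0", 1) == 0) state = 0;
    return (ssize_t)count;
}

/* Simulated writer: keeps calling the store until every byte is accepted,
 * like a write loop that retries short writes. Returns the call count. */
int write_all(ssize_t (*store)(const char *, size_t), const char *msg)
{
    size_t left = strlen(msg);
    store_calls = 0;
    while (left > 0) {
        ssize_t n = store(msg, left);
        if (n <= 0)
            break;
        msg += n;
        left -= (size_t)n;
    }
    return store_calls;
}
```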

Supplementary Information

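#include <stdio.h>

/*
 * Stringification operator: #
 * Token pasting operator:   ##
 */
#define WARN_IF(EXP) \
    do { if (EXP) \
        fprintf(stderr, "Warning: " #EXP "\n"); } \
    while (0)
#define str(x) #x
#define var(x) tmpvar##x

int main(int argc, char *argv[])
{
    int tmpvar1 = 0;
    int tmpvar2 = 1;

    WARN_IF(argc<2);               /* without arguments, prints "Warning: argc<2" to stderr */
    printf("%d\n", var(1));        /* var(1) pastes into tmpvar1, so this prints 0 */
    printf("%s\n", str(tmpvar2));  /* str(tmpvar2) becomes the string "tmpvar2" */
    return 0;
}

The kernel's __ATTR() macro (include/linux/sysfs.h) uses the same # operator, through __stringify(), to turn the attribute name into a string:

#define __ATTR(_name, _mode, _show, _store) {                   \
    .attr = {.name = __stringify(_name),                        \
             .mode = VERIFY_OCTAL_PERMISSIONS(_mode) },         \
    .show   = _show,                                            \
    .store  = _store,                                           \
}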
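A short user-space sketch of two points above: why __stringify() needs two levels of macros, and how an __ATTR()-style table ending in an __ATTR_NULL-style sentinel is searched. STR_1, STR, MY_ATTR, my_attr and find_attr() are hypothetical names; the two-level pattern itself is the one used in the kernel's include/linux/stringify.h.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define STR_1(x) #x            /* stringifies the argument exactly as written */
#define STR(x)   STR_1(x)      /* expands macros inside x first, then stringifies */

#define ATTR_NAME enable       /* hypothetical macro naming the attribute */

/* Hypothetical, simplified attribute record in the spirit of __ATTR(). */
struct my_attr {
    const char *name;
    unsigned short mode;
    int (*show)(char *buf, size_t len);
};

#define MY_ATTR(_name, _mode, _show) \
    { .name = STR(_name), .mode = (_mode), .show = (_show) }
#define MY_ATTR_NULL { .name = NULL }

static int enable_show(char *buf, size_t len)
{
    return snprintf(buf, len, "Disabled\n");
}

static struct my_attr attrs[] = {
    MY_ATTR(ATTR_NAME, 0644, enable_show),   /* name becomes "enable" */
    MY_ATTR_NULL                             /* sentinel, like __ATTR_NULL */
};

/* Walk the table until the NULL-named sentinel. */
const struct my_attr *find_attr(const char *name)
{
    for (const struct my_attr *a = attrs; a->name; a++)
        if (strcmp(a->name, name) == 0)
            return a;
    return NULL;
}
```

Note the difference: STR_1(ATTR_NAME) gives "ATTR_NAME", while STR(ATTR_NAME) gives "enable".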

How to test:

root@laikc-virtual:/home/laikc/demo# insmod hello_sys.ko
root@laikc-virtual:/home/laikc/demo# cat /sys/class/HelloSys/enable
Disabled
root@laikc-virtual:/home/laikc/demo# echo "1" >/sys/class/HelloSys/enable
root@laikc-virtual:/home/laikc/demo# cat /sys/class/HelloSys/enable
Enabled
root@laikc-virtual:/home/laikc/demo# rmmod hello_sys
root@laikc-virtual:/home/laikc/demo#

root@laikc-virtual:/home/laikc/demo# dmesg

You can see the printk messages in the dmesg output.


SysFs II

Kobject Abstraction

Allowing objects to be arranged into hierarchies.
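Kobjects are normally embedded in a larger structure, and the kernel uses container_of() to get from the embedded kobject back to that structure; the parent pointers form the hierarchy that sysfs shows as directories. The user-space sketch below assumes a stripped-down, hypothetical my_kobject (name and parent only) and a bare container_of() without the kernel's type checking.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, stripped-down kobject: a name and a parent link. */
struct my_kobject {
    const char *name;
    struct my_kobject *parent;
};

/* A "device" embeds its kobject instead of pointing to one. */
struct my_device {
    int enable_state;
    struct my_kobject kobj;
};

/* Build a sysfs-like path by walking parent links up to the root.
 * Returns 0 on success, -1 if buf is too small. */
int kobject_path(const struct my_kobject *k, char *buf, size_t len)
{
    if (k == NULL) {                    /* reached the root */
        if (len == 0)
            return -1;
        buf[0] = '\0';
        return 0;
    }
    if (kobject_path(k->parent, buf, len) < 0)
        return -1;
    size_t used = strlen(buf);
    int n = snprintf(buf + used, len - used, "/%s", k->name);
    return (n < 0 || (size_t)n >= len - used) ? -1 : 0;
}
```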

NETLINK_KOBJECT_UEVENT (since Linux 2.6.10)

Kernel messages to user space.

Ref. The zen of kobjects
Ref. Everything you never wanted to know about kobjects, ksets, and ktypes
Ref. Sysfs in Linux Kernel – Linux Device Driver Tutorial Part 11, by EmbeTronicX

As for class: the OOP concept is very important!!!

If you are interested, see Linux设备模型(7)_Class.

You can download other kernel sources from
https://www.kernel.org/pub/linux/kernel/v2.6/
e.g.,
wget http:///linux-2.6.29.1.tar.gz

If you build a new kernel from source, you can use QEMU to run it in a virtual machine.


Linux 3.8 New ProcFS
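If you need procfs, NOTICE that the procfs API is different in newer kernels: the old create_proc_entry() interface was removed (around Linux 3.10), so use proc_create() as below. Since Linux 5.6, proc_create() takes a struct proc_ops instead of a struct file_operations.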

hello_proc.c

/*****************************************************************************
 * Source code hello_proc.c
 * Copyright (c) 2018 GPL v2
 * (C) kclai
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *****************************************************************************/
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/utsname.h>

static int hello_proc_show(struct seq_file *m, void *v)
{
    seq_printf(m, "Hello...%s-%s-%s\n",
               utsname()->sysname, utsname()->release, utsname()->version);
    return 0;
}

static int hello_proc_open(struct inode *inode, struct file *file)
{
    return single_open(file, hello_proc_show, NULL);
}

/* Before Linux 5.6 proc_create() takes a struct file_operations;
 * since 5.6 it takes a struct proc_ops (.proc_open, .proc_read, ...). */
static const struct file_operations hello_proc_fops = {
    .owner   = THIS_MODULE,
    .open    = hello_proc_open,
    .read    = seq_read,
    .llseek  = seq_lseek,
    .release = single_release,
};

static int __init proc_hello_init(void)
{
    /* mode 0 means "use the default": read-only for everyone */
    if (!proc_create("Hello", 0, NULL, &hello_proc_fops))
        return -ENOMEM;
    return 0;
}

static void __exit proc_hello_exit(void)
{
    remove_proc_entry("Hello", NULL);   /* without this, rmmod cannot clean up */
}

module_init(proc_hello_init);
module_exit(proc_hello_exit);
MODULE_LICENSE("GPL");
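hello_proc_show() prints the kernel's utsname data. As a quick cross-check from user space, the sketch below reads the existing /proc/version entry and checks that it mentions the same release string that uname(2) returns. The helper names read_proc_line() and proc_version_matches_uname() are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Read the first line of a procfs file into buf, without the newline.
 * Returns 0 on success, -1 on error. */
int read_proc_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 1 if /proc/version mentions uname()'s release, 0 if not, -1 on error. */
int proc_version_matches_uname(void)
{
    char line[512];
    struct utsname u;

    if (read_proc_line("/proc/version", line, sizeof line) < 0)
        return -1;
    if (uname(&u) < 0)
        return -1;
    return strstr(line, u.release) != NULL;
}
```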

Write the Makefile.
Notice: each recipe line must start with a TAB character, and use strip -g (not plain strip).

Makefile

obj-m += hello_proc.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
	strip -g hello_proc.ko
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Make modules

laikc@laikc-virtual-machine:~$ make
make -C /lib/modules/3.8.0-35-generic/build M=/home/laikc modules
make[1]: Entering directory `/usr/src/linux-headers-3.8.0-35-generic'
  Building modules, stage 2.
  MODPOST 1 modules
make[1]: Leaving directory `/usr/src/linux-headers-3.8.0-35-generic'
strip -g hello_proc.ko

Insert module


laikc@laikc-virtual-machine:~$ sudo insmod hello_proc.ko
[sudo] password for laikc:


laikc@laikc-virtual-machine:~$ cat /proc/Hello
HelloLinux-3.8.0-35-generic-#50-Ubuntu SMP Tue Dec 3 01:25:33 UTC 2013


Linux Device Driver Tutorial Part 9 – Procfs in Linux


Character Device Driver

You can request dynamic assignment of a major number. If the argument major is set to 0 when you call register_chrdev, the function selects a free number and returns it.
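Major and minor numbers can be inspected from user space. This sketch (hypothetical helper char_dev_numbers()) uses stat(2) plus glibc's major()/minor()/makedev() macros from <sys/sysmacros.h>; /dev/null, for example, is character device 1:3. For a driver that got a dynamic major from register_chrdev(0, ...), you would create its node with something like "mknod /dev/hello c <major> 0", where the name /dev/hello is hypothetical.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Fill in the major/minor numbers of a character device node.
 * Returns 0 on success, -1 if stat() fails or the node is not a char device. */
int char_dev_numbers(const char *path, unsigned *maj, unsigned *min)
{
    struct stat st;

    if (stat(path, &st) < 0 || !S_ISCHR(st.st_mode))
        return -1;
    *maj = major(st.st_rdev);   /* selects the driver (what register_chrdev() reserves) */
    *min = minor(st.st_rdev);   /* handed to the driver to tell its devices apart */
    return 0;
}
```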

Major and Minor Numbers

Register Object and Use Object

Device File Creation – Linux Device Driver Tutorial Part 5, by SLR

Linux Device Driver Tutorial Programming – Linux Device Driver Tutorial Part 7, by EmbeTronicX


Appendix Linux VFS (Virtual File System)

OOP concept!
Register Objects and Use Objects

For example,

procfs:
define the object hello_proc_fops, which implements the struct file_operations "class" (interface):
static const struct file_operations hello_proc_fops

EXPORT_SYMBOL() is a macro.

It makes a symbol accessible to dynamically loaded modules (provided that said modules add an extern declaration).

struct file_operations is an important data structure!


socket layer (Appendix proto_ops)

Application (AP) client, user mode <-> kernel mode: socket(AF_INET, ...) <-> IPv4 inet_stream_ops <-> routing, etc. <-> device driver <-> device <-> network <-> AP server

Everything shown in blue runs in kernel mode.

Appendix: proto_ops

const struct proto_ops inet_stream_ops = {
    .family          = PF_INET,
    .owner           = THIS_MODULE,
    .release         = inet_release,
    .bind            = inet_bind,
    .connect         = inet_stream_connect,
    .socketpair      = sock_no_socketpair,
    .accept          = inet_accept,
    .getname         = inet_getname,
    .poll            = tcp_poll,
    .ioctl           = inet_ioctl,
    .listen          = inet_listen,
    .shutdown        = inet_shutdown,
    .setsockopt      = sock_common_setsockopt,
    .getsockopt      = sock_common_getsockopt,
    .sendmsg         = inet_sendmsg,
    .recvmsg         = inet_recvmsg,
    .mmap            = sock_no_mmap,
    .sendpage        = inet_sendpage,
    .splice_read     = tcp_splice_read,
    .read_sock       = tcp_read_sock,
    .sendmsg_locked  = tcp_sendmsg_locked,
    .sendpage_locked = tcp_sendpage_locked,
    .peek_len        = tcp_peek_len,
#ifdef CONFIG_COMPAT
    .compat_setsockopt = compat_sock_common_setsockopt,
    .compat_getsockopt = compat_sock_common_getsockopt,
    .compat_ioctl      = inet_compat_ioctl,
#endif
};
EXPORT_SYMBOL(inet_stream_ops);
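To connect this table with user space: on an AF_INET stream socket, each socket call below is dispatched by the socket layer through inet_stream_ops. The sketch is a hypothetical single-process client/server round trip over 127.0.0.1 (loopback_roundtrip() is a made-up name); the comments name the inet_stream_ops member each call reaches, simplified, since the real path passes through more layers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the number of bytes the server received (copied into out), or -1. */
int loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    int srv = -1, cli = -1, conn = -1, ret = -1;
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                                      /* let the kernel pick a port */

    if ((srv = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto out;
    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) < 0) goto out;   /* .bind    */
    if (listen(srv, 1) < 0) goto out;                                     /* .listen  */
    if (getsockname(srv, (struct sockaddr *)&addr, &alen) < 0) goto out;  /* .getname */

    if ((cli = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto out;
    if (connect(cli, (struct sockaddr *)&addr, sizeof addr) < 0) goto out; /* .connect */
    if ((conn = accept(srv, NULL, NULL)) < 0) goto out;                    /* .accept  */

    if (send(cli, msg, strlen(msg), 0) < 0) goto out;                      /* .sendmsg */
    ret = (int)recv(conn, out, outlen - 1, 0);                             /* .recvmsg */
    if (ret >= 0)
        out[ret] = '\0';
out:
    if (conn >= 0) close(conn);                                            /* .release */
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    return ret;
}
```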

linux/net

ipv4, ipv6, netfilter, packet, netlink, bridge


The $@, $^, $<, and $? symbols (automatic variables) in a Makefile


HW:

  1. Study https://lxr.linux.no/linux+v3.2/drivers/char/misc.c
  2. Implement the example below (poll), combined with sysfs. Important!!! (The select system call needs the driver's poll.)
     Poll Linux Example Device Driver – Linux Device Driver Tutorial Part 42

https://lxr.linux.no/linux-old+v2.4.31/fs/select.c

Line 197 (marked with a comment below) is where do_select() asks the driver for its status through file->f_op->poll:

int do_select(int n, fd_set_bits *fds, long *timeout)
{
    poll_table table, *wait;
    int retval, i, off;
    long __timeout = *timeout;

    read_lock(&current->files->file_lock);
    retval = max_select_fd(n, fds);
    read_unlock(&current->files->file_lock);

    if (retval < 0)
        return retval;
    n = retval;

    poll_initwait(&table);
    wait = &table;
    if (!__timeout)
        wait = NULL;
    retval = 0;
    for (;;) {
        set_current_state(TASK_INTERRUPTIBLE);
        for (i = 0 ; i < n; i++) {
            unsigned long bit = BIT(i);
            unsigned long mask;
            struct file *file;

            off = i / __NFDBITS;
            if (!(bit & BITS(fds, off)))
                continue;
            file = fget(i);
            mask = POLLNVAL;
            if (file) {
                mask = DEFAULT_POLLMASK;
                if (file->f_op && file->f_op->poll)   /* line 197: call the driver's poll */
                    mask = file->f_op->poll(file, wait);
                fput(file);
            }
            if ((mask & POLLIN_SET) && ISSET(bit, __IN(fds,off))) {
                SET(bit, __RES_IN(fds,off));
                retval++;
                wait = NULL;
            }
            if ((mask & POLLOUT_SET) && ISSET(bit, __OUT(fds,off))) {
                SET(bit, __RES_OUT(fds,off));
                retval++;
                wait = NULL;
            }
            if ((mask & POLLEX_SET) && ISSET(bit, __EX(fds,off))) {
                SET(bit, __RES_EX(fds,off));
                retval++;
                wait = NULL;
            }
        }
        wait = NULL;
        if (retval || !__timeout || signal_pending(current))
            break;
        if (table.error) {
            retval = table.error;
            break;
        }
        __timeout = schedule_timeout(__timeout);
    }
    current->state = TASK_RUNNING;

    poll_freewait(&table);

    /*
     * Up-to-date the caller timeout.
     */
    *timeout = __timeout;
    return retval;
}
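From user space, the matching side is a plain select() call. In this minimal sketch (hypothetical helper wait_readable()) the file is a pipe, so select() ends up calling the pipe's own poll method; a character driver that implements .poll is queried in the same way.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r < 0)
        return -1;
    return r > 0 && FD_ISSET(fd, &rfds);
}
```

Usage: with an empty pipe wait_readable() times out and returns 0; after one byte is written to the other end, it returns 1.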

Further Reading:

Linux-Kernel-Examples

udevadm - udev management tool, udevadm monitor

Softirq in Linux Device Driver – Linux Device Driver Tutorial Part 45

Mainly for improving performance on multi-core processors:
What is RCU? – “Read, Copy, Update”

Using Kernel Timer In Linux Device Driver – Linux Device Driver Tutorial Part 26, by SLR

GPIO Linux Device Driver (GPIO Interrupt) – Linux Device Driver Tutorial Part 36, by SLR

About mmap

How to mmap a Linux kernel buffer to user space?

Linux驱动mmap内存映射

Frame buffer device initialization and setup routines
linux+v2.6.12/drivers/video/fbmem.c

static struct file_operations fb_fops = {
    .owner =        THIS_MODULE,
    .read =         fb_read,
    .write =        fb_write,
    .ioctl =        fb_ioctl,
#ifdef CONFIG_COMPAT
    .compat_ioctl = fb_compat_ioctl,
#endif
    .mmap =         fb_mmap,
    .open =         fb_open,
    .release =      fb_release,
#ifdef HAVE_ARCH_FB_UNMAPPED_AREA
    .get_unmapped_area = get_fb_unmapped_area,
#endif
};
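From user space, fb_mmap() is reached through an ordinary mmap(2) call on /dev/fb0. Since a frame buffer may not be available, this sketch maps a regular file instead and shows the same idea: stores into the mapping reach the file without any write() call. The helper name is hypothetical; for a device node you would skip ftruncate() and map the size the driver reports.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of a regular file, store msg through the mapping, then read
 * the file back with pread(2) into out. Returns 0 on success, -1 on error. */
int mmap_write_then_read(const char *path, const char *msg, char *out, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {       /* make the file large enough to map */
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    strncpy(p, msg, len);                      /* plain memory stores, no write() */
    munmap(p, len);                            /* MAP_SHARED: changes are in the file's pages */

    ssize_t r = pread(fd, out, len, 0);
    close(fd);
    return r == (ssize_t)len ? 0 : -1;
}
```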

Linux wait queue:

https://lxr.linux.no/linux+v6.0.9/include/linux/wait.h#L61

#define __WAITQUEUE_INITIALIZER(name, tsk) {                    \
    .private = tsk,                                             \
    .func    = default_wake_function,                           \
    .entry   = { NULL, NULL } }

#define DECLARE_WAITQUEUE(name, tsk)                            \
    struct wait_queue_entry name = __WAITQUEUE_INITIALIZER(name, tsk)

#define __WAIT_QUEUE_HEAD_INITIALIZER(name) {                   \
    .lock = __SPIN_LOCK_UNLOCKED(name.lock),                    \
    .head = LIST_HEAD_INIT(name.head) }

#define DECLARE_WAIT_QUEUE_HEAD(name)                           \
    struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)

extern void __init_waitqueue_head(struct wait_queue_head *wq_head, const char *name, struct lock_class_key *);


#define init_waitqueue_head(wq_head)                            \
    do {                                                        \
        static struct lock_class_key __key;                     \
                                                                \
        __init_waitqueue_head((wq_head), #wq_head, &__key);     \
    } while (0)


void __init_waitqueue_head(struct wait_queue_head *wq_head, const char *name, struct lock_class_key *key)
{
    spin_lock_init(&wq_head->lock);
    lockdep_set_class_and_name(&wq_head->lock, key, name);
    INIT_LIST_HEAD(&wq_head->head);
}

#define wake_up(x) __wake_up(x, TASK_NORMAL, 1, NULL)

/**
 * __wake_up - wake up threads blocked on a waitqueue.
 * @wq_head: the waitqueue
 * @mode: which threads
 * @nr_exclusive: how many wake-one or wake-many threads to wake up
 * @key: is directly passed to the wakeup function
 *
 * If this function wakes up a task, it executes a full memory barrier before
 * accessing the task state.
 */
void __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
               int nr_exclusive, void *key)
{
    __wake_up_common_lock(wq_head, mode, nr_exclusive, 0, key);
}
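Wait queues have a close user-space counterpart in POSIX condition variables. This is an analogy, not the kernel API: in the sketch below a mutex plus condition variable stands in for wait_queue_head, the while loop for wait_event(), and pthread_cond_signal() for wake_up(). All names are hypothetical.

```c
#include <pthread.h>

/* Mutex + condvar + condition: the user-space stand-in for a wait queue. */
struct waitq {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int data_ready;            /* the condition the sleeper waits for */
};

/* Statically initialized, like DECLARE_WAIT_QUEUE_HEAD(). */
static struct waitq wq = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .cond = PTHREAD_COND_INITIALIZER,
    .data_ready = 0,
};

/* Producer: set the condition, then wake a sleeper (like wake_up()). */
static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&wq.lock);
    wq.data_ready = 1;
    pthread_cond_signal(&wq.cond);
    pthread_mutex_unlock(&wq.lock);
    return NULL;
}

/* Consumer: sleep until the condition holds (like wait_event()).
 * The loop re-checks the condition after every wakeup.
 * Returns the observed condition value, or -1 on error. */
int wait_for_data(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;

    pthread_mutex_lock(&wq.lock);
    while (!wq.data_ready)
        pthread_cond_wait(&wq.cond, &wq.lock);
    int got = wq.data_ready;
    pthread_mutex_unlock(&wq.lock);

    pthread_join(t, NULL);
    return got;
}
```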

https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html

https://lxr.linux.no/linux+v6.0.9/include/linux/spinlock.h#L347

 * This guarantees that the following two properties hold:
 *
 * 1) Given the snippet:
 *
 *      { X = 0;  Y = 0; }
 *
 *      CPU0                            CPU1
 *
 *      WRITE_ONCE(X, 1);               WRITE_ONCE(Y, 1);
 *      spin_lock(S);                   smp_mb();
 *      smp_mb__after_spinlock();       r1 = READ_ONCE(X);
 *      r0 = READ_ONCE(Y);
 *      spin_unlock(S);
 *
 *    it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0)
 *    and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments
 *    preceding the call to smp_mb__after_spinlock() in __schedule() and in
 *    try_to_wake_up().
 *
 * 2) Given the snippet:
 *
 *      { X = 0;  Y = 0; }
 *
 *      CPU0                CPU1                        CPU2
 *
 *      spin_lock(S);       spin_lock(S);               r1 = READ_ONCE(Y);
 *      WRITE_ONCE(X, 1);   smp_mb__after_spinlock();   smp_rmb();
 *      spin_unlock(S);     r0 = READ_ONCE(X);          r2 = READ_ONCE(X);
 *                          WRITE_ONCE(Y, 1);
 *                          spin_unlock(S);
 *
 *    it is forbidden that CPU0's critical section executes before CPU1's
 *    critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1)
 *    and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments
 *    preceding the calls to smp_rmb() in try_to_wake_up() for similar
 *    snippets but "projected" onto two CPUs.
 *
 * Property (2) upgrades the lock to an RCsc lock