Sleeping in the Kernel, Signal, mmap and Frame Reception/Transmission

Author: WhoAmI
email: kccddb@gmail.com
Date: 20231104

Ref.
Understanding Linux Network Internals
By Christian Benvenuti
DP8390D datasheet, DP8390D/NS32490D NIC Network Interface Controller

Kernel Korner - Sleeping in the Kernel

How to Write Linux Mouse Drivers

Linux 2.6.xx
入門級 Mouse Linux Kernel Driver

What is RCU? – “Read, Copy, Update”

Linux 核心設計: RCU 同步機制

Day25 RCU 同步機制

 Linux Interrupt /Interrupt Service Routune (ISR)

Intel 8255x 10/100 Mbps Ethernet
Controller Family

Linux Base Driver for the Intel® PRO/100 Family of Adapters

NAPI, NAPI is the event handling mechanism used by the Linux networking stack.

Management Data Input/Output (MDIO)
Media Independent Interface Management (MIIM)
Media Independent Interface (MII)
RMII(Reduced Media Independant Interface)

MII has two signal interfaces:

A Data interface to the Ethernet MAC, for sending and receiving Ethernet frame data.
A PHY management interface, MDIO, used to read and write the control and status registers of the PHY in order to configure each PHY before operation, and to monitor link status during operation.

都要看 datasheet
4 ports PHY

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

https://lxr.linux.no/linux+v2.6.12/drivers/net/e100.c

There are 4 bottom half mechanisms available in Linux:

1. Workqueue – Executed in a process context.
2. Threaded IRQs
3. Softirqs – Executed in an atomic context.
4. Tasklets – Executed in an atomic context.

SOFTIRQ~software interrupt request.
https://elixir.bootlin.com/linux/v2.6.33/source/include/linux/interrupt.h

























/* PLEASE, avoid to allocate new softirqs, if you need not _really_ high
   frequency threaded job scheduling. For almost all the purposes
   tasklets are more than enough. F.e. all serial device BHs et
   al. should be converted to tasklets, not to softirqs.
 */

enum
{
	HI_SOFTIRQ=0,
	TIMER_SOFTIRQ,
	NET_TX_SOFTIRQ,
	NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ,
	BLOCK_IOPOLL_SOFTIRQ,
	TASKLET_SOFTIRQ,
	SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ,
	RCU_SOFTIRQ,	/* Preferable RCU should always be the last softirq */

	NR_SOFTIRQS
};

Tasklet/Workqueue

Tasklets in Linux kernel |Dynamic Method – Linux Device Driver Tutorial Part 21

Workqueue in Linux Kernel Part 1 – Linux Device Driver Tutorial Part 14

Ref.
Deferrable functions, kernel tasklets, and work queues
An introduction to bottom halves in Linux 2.6, by M. Jones

Kernel Thread

Kernel Thread – Linux Device Driver Tutorial Part 19

schedule
schedule_timeout





















































































static void process_timeout(unsigned long __data)
{
	wake_up_process((struct task_struct *)__data);
}


/**
 * schedule_timeout - sleep until timeout
 * @timeout: timeout value in jiffies
 *
 * Make the current task sleep until @timeout jiffies have
 * elapsed. The routine will return immediately unless
 * the current task state has been set (see set_current_state()).
 *
 * You can set the task state as follows -
 *
 * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
 * pass before the routine returns. The routine will return 0
 *
 * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
 * delivered to the current task. In this case the remaining time
 * in jiffies will be returned, or 0 if the timer expired in time
 *
 * The current task state is guaranteed to be TASK_RUNNING when this
 * routine returns.
 *
 * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
 * the CPU away without a bound on the timeout. In this case the return
 * value will be %MAX_SCHEDULE_TIMEOUT.
 *
 * In all cases the return value is guaranteed to be non-negative.
 */
signed long __sched schedule_timeout(signed long timeout)
{
	struct timer_list timer;
	unsigned long expire;

	switch (timeout)
	{
	case MAX_SCHEDULE_TIMEOUT:
		/*
		 * These two special cases are useful to be comfortable
		 * in the caller. Nothing more. We could take
		 * MAX_SCHEDULE_TIMEOUT from one of the negative value
		 * but I' d like to return a valid offset (>=0) to allow
		 * the caller to do everything it want with the retval.
		 */
		schedule();
		goto out;
	default:
		/*
		 * Another bit of PARANOID. Note that the retval will be
		 * 0 since no piece of kernel is supposed to do a check
		 * for a negative retval of schedule_timeout() (since it
		 * should never happens anyway). You just have the printk()
		 * that will tell you if something is gone wrong and where.
		 */
		if (timeout < 0) {
			printk(KERN_ERR "schedule_timeout: wrong timeout "
				"value %lx\n", timeout);
			dump_stack();
			current->state = TASK_RUNNING;
			goto out;
		}
	}

	expire = timeout + jiffies;

	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
	__mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
	schedule();
	del_singleshot_timer_sync(&timer);

	/* Remove the timer from the object tracker */
	destroy_timer_on_stack(&timer);

	timeout = expire - jiffies;

 out:
	return timeout < 0 ? 0 : timeout;
}
EXPORT_SYMBOL(schedule_timeout);

https://elixir.bootlin.com/linux/v2.6.36/source/kernel/timer.c#L1430
https://elixir.bootlin.com/linux/v2.6.36/source/kernel/sched.c#L4293

interruptible_sleep_on ** 現今不建議使用**, 改 wait_event_interruptible, wait_event_interruptible_timeout




































































/**
 * wait_event_interruptible - sleep until a condition gets true
 * @wq: the waitqueue to wait on
 * @condition: a C expression for the event to wait for
 *
 * The process is put to sleep (TASK_INTERRUPTIBLE) until the
 * @condition evaluates to true or a signal is received.
 * The @condition is checked each time the waitqueue @wq is woken up.
 *
 * wake_up() has to be called after changing any variable that could
 * change the result of the wait condition.
 *
 * The function will return -ERESTARTSYS if it was interrupted by a
 * signal and 0 if @condition evaluated to true.
 */
#define wait_event_interruptible(wq, condition)				\
({									\
	int __ret = 0;							\
	if (!(condition))						\
		__wait_event_interruptible(wq, condition, __ret);	\
	__ret;								\
})

#define __wait_event_interruptible_timeout(wq, condition, ret)		\
do {									\
	DEFINE_WAIT(__wait);						\
									\
	for (;;) {							\
		prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE);	\
		if (condition)						\
			break;						\
		if (!signal_pending(current)) {				\
			ret = schedule_timeout(ret);			\
			if (!ret)					\
				break;					\
			continue;					\
		}							\
		ret = -ERESTARTSYS;					\
		break;							\
	}								\
	finish_wait(&wq, &__wait);					\
} while (0)


/**
 * wait_event_interruptible_timeout - sleep until a condition gets true or a timeout elapses
 * @wq: the waitqueue to wait on
 * @condition: a C expression for the event to wait for
 * @timeout: timeout, in jiffies
 *
 * The process is put to sleep (TASK_INTERRUPTIBLE) until the
 * @condition evaluates to true or a signal is received.
 * The @condition is checked each time the waitqueue @wq is woken up.
 *
 * wake_up() has to be called after changing any variable that could
 * change the result of the wait condition.
 *
 * The function returns 0 if the @timeout elapsed, -ERESTARTSYS if it
 * was interrupted by a signal, and the remaining jiffies otherwise
 * if the condition evaluated to true before the timeout elapsed.
 */
#define wait_event_interruptible_timeout(wq, condition, timeout)	\
({									\
	long __ret = timeout;						\
	if (!(condition))						\
		__wait_event_interruptible_timeout(wq, condition, __ret); \
	__ret;								\
})

Waitqueue in Linux – Linux Device Driver Tutorial Part 10
Thanks to SLR

現今不建議使用 interruptible_sleep_on



























static long __sched
sleep_on_common(wait_queue_head_t *q, int state, long timeout)
{
	unsigned long flags;
	wait_queue_t wait;

	init_waitqueue_entry(&wait, current);

	__set_current_state(state);

	spin_lock_irqsave(&q->lock, flags);
	__add_wait_queue(q, &wait);
	spin_unlock(&q->lock);
	timeout = schedule_timeout(timeout);
	spin_lock_irq(&q->lock);
	__remove_wait_queue(q, &wait);
	spin_unlock_irqrestore(&q->lock, flags);

	return timeout;
}

void __sched interruptible_sleep_on(wait_queue_head_t *q)
{
	sleep_on_common(q, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
EXPORT_SYMBOL(interruptible_sleep_on);

wait queue

Initializing Waitqueue
Queuing (Put the Task to sleep until the event comes)
Waking Up Queued Task

Waitqueue in Linux – Linux Device Driver Tutorial Part 10

Kernel Thread – Linux Device Driver Tutorial Part 19

wait_event_interruptible - sleep until a condition gets true

https://elixir.bootlin.com/linux/v2.6.33/source/include/linux/wait.h#L282






















/**
 * wait_event_interruptible - sleep until a condition gets true
 * @wq: the waitqueue to wait on
 * @condition: a C expression for the event to wait for
 *
 * The process is put to sleep (TASK_INTERRUPTIBLE) until the
 * @condition evaluates to true or a signal is received.
 * The @condition is checked each time the waitqueue @wq is woken up.
 *
 * wake_up() has to be called after changing any variable that could
 * change the result of the wait condition.
 *
 * The function will return -ERESTARTSYS if it was interrupted by a
 * signal and 0 if @condition evaluated to true.
 */
#define wait_event_interruptible(wq, condition)				\
({									\
	int __ret = 0;							\
	if (!(condition))						\
		__wait_event_interruptible(wq, condition, __ret);	\
	__ret;								\
})

Waitqueue created by Dynamic Method, by SLR
































































































































































































/****************************************************************************//**
*  \file       driver.c
*
*  \details    Simple linux driver (Waitqueue Dynamic method)
*
*  \author     EmbeTronicX
*
*  \Tested with Linux raspberrypi 5.10.27-v7l-embetronicx-custom+
*
*******************************************************************************/
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kdev_t.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/slab.h>                 //kmalloc()
#include <linux/uaccess.h>              //copy_to/from_user()
#include <linux/kthread.h>
#include <linux/wait.h>                 // Required for the wait queues
#include <linux/err.h>
 
 
uint32_t read_count = 0;
static struct task_struct *wait_thread;
 
dev_t dev = 0;
static struct class *dev_class;
static struct cdev etx_cdev;
wait_queue_head_t wait_queue_etx;
int wait_queue_flag = 0;
 
/*
** Function Prototypes
*/
static int      __init etx_driver_init(void);
static void     __exit etx_driver_exit(void);
 
/*************** Driver functions **********************/
static int      etx_open(struct inode *inode, struct file *file);
static int      etx_release(struct inode *inode, struct file *file);
static ssize_t  etx_read(struct file *filp, char __user *buf, size_t len,loff_t * off);
static ssize_t  etx_write(struct file *filp, const char *buf, size_t len, loff_t * off);

/*
** File operation sturcture
*/
static struct file_operations fops =
{
        .owner          = THIS_MODULE,
        .read           = etx_read,
        .write          = etx_write,
        .open           = etx_open,
        .release        = etx_release,
};
 
/*
** Thread function
*/
static int wait_function(void *unused)
{
        
        while(1) {
                pr_info("Waiting For Event...\n");
                wait_event_interruptible(wait_queue_etx, wait_queue_flag != 0 );
                if(wait_queue_flag == 2) {
                        pr_info("Event Came From Exit Function\n");
                        return 0;
                }
                pr_info("Event Came From Read Function - %d\n", ++read_count);
                wait_queue_flag = 0;
        }
        return 0;
}
 
/*
** This function will be called when we open the Device file
*/ 
static int etx_open(struct inode *inode, struct file *file)
{
        pr_info("Device File Opened...!!!\n");
        return 0;
}

/*
** This function will be called when we close the Device file
*/
static int etx_release(struct inode *inode, struct file *file)
{
        pr_info("Device File Closed...!!!\n");
        return 0;
}

/*
** This function will be called when we read the Device file
*/
static ssize_t etx_read(struct file *filp, char __user *buf, size_t len, loff_t *off)
{
        pr_info("Read Function\n");
        wait_queue_flag = 1;
        wake_up_interruptible(&wait_queue_etx);
        return 0;
}

/*
** This function will be called when we write the Device file
*/
static ssize_t etx_write(struct file *filp, const char __user *buf, size_t len, loff_t *off)
{
        pr_info("Write function\n");
        return len;
}

/*
** Module Init function
*/
static int __init etx_driver_init(void)
{
        /*Allocating Major number*/
        if((alloc_chrdev_region(&dev, 0, 1, "etx_Dev")) <0){
                pr_info("Cannot allocate major number\n");
                return -1;
        }
        pr_info("Major = %d Minor = %d \n",MAJOR(dev), MINOR(dev));
 
        /*Creating cdev structure*/
        cdev_init(&etx_cdev,&fops);
 
        /*Adding character device to the system*/
        if((cdev_add(&etx_cdev,dev,1)) < 0){
            pr_info("Cannot add the device to the system\n");
            goto r_class;
        }
 
        /*Creating struct class*/
        if(IS_ERR(dev_class = class_create(THIS_MODULE,"etx_class"))){
            pr_info("Cannot create the struct class\n");
            goto r_class;
        }
 
        /*Creating device*/
        if(IS_ERR(device_create(dev_class,NULL,dev,NULL,"etx_device"))){
            pr_info("Cannot create the Device 1\n");
            goto r_device;
        }
        
        //Initialize wait queue
        init_waitqueue_head(&wait_queue_etx);
 
        //Create the kernel thread with name 'mythread'
        wait_thread = kthread_create(wait_function, NULL, "WaitThread");
        if (wait_thread) {
                pr_info("Thread Created successfully\n");
                wake_up_process(wait_thread);
        } else
                pr_info("Thread creation failed\n");
 
        pr_info("Device Driver Insert...Done!!!\n");
        return 0;
 
r_device:
        class_destroy(dev_class);
r_class:
        unregister_chrdev_region(dev,1);
        return -1;
}

/*
** Module exit function
*/
static void __exit etx_driver_exit(void)
{
        wait_queue_flag = 2;
        wake_up_interruptible(&wait_queue_etx);
        device_destroy(dev_class,dev);
        class_destroy(dev_class);
        cdev_del(&etx_cdev);
        unregister_chrdev_region(dev, 1);
        pr_info("Device Driver Remove...Done!!!\n");
}
 
module_init(etx_driver_init);
module_exit(etx_driver_exit);
 
MODULE_LICENSE("GPL");
MODULE_AUTHOR("EmbeTronicX <embetronicx@gmail.com>");
MODULE_DESCRIPTION("Simple linux driver (Waitqueue Dynamic method)");
MODULE_VERSION("1.8");

Interrupt

include/linux/interrupt.h
request_irq

/**

request_irq - Add a handler for an interrupt line
@irq: The interrupt line to allocate
@handler: Function to be called when the IRQ occurs.

 Primary handler for threaded interrupts

 If NULL, the default primary handler is installed

@flags: Handling flags
@name: Name of the device generating this interrupt
@dev: A cookie passed to the handler function
This call allocates an interrupt and establishes a handler; see
the documentation for request_threaded_irq() for details.
*/
static inline int __must_check
request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
const char *name, void *dev)
{
return request_threaded_irq(irq, handler, NULL, flags, name, dev);
}



























static struct wait_queue *mouse_wait;
static spinlock_t mouse_lock = SPIN_LOCK_UNLOCKED;

static void ourmouse_interrupt(int irq, void *dev_id,
struct pt_regs *regs)
{
char delta_x;
char delta_y;
unsigned char new_buttons;
delta_x = inb(OURMOUSE_BASE);
delta_y = inb(OURMOUSE_BASE+1);
new_buttons = inb(OURMOUSE_BASE+2);
if(delta_x || delta_y || new_buttons != mouse_buttons)
{
/* Something happened */
spin_lock(&mouse_lock);
mouse_event = 1;
mouse_dx += delta_x;
mouse_dy += delta_y;
mouse_buttons = new_buttons;
spin_unlock(&mouse_lock);
wake_up_interruptible(&mouse_wait);
}
}











Listing 8: The poll function
static unsigned int mouse_poll(struct
file *file, poll_table *wait)
{
poll_wait(file, &mouse_wait, wait);
if(mouse_event)
return POLLIN | POLLRDNORM;
return 0;
}

Wait Queues linux 2.6.xx

A process can sleep in two different modes, interruptible and uninterruptible. In an interruptible sleep, the process could be woken up for processing of signals. In an uninterruptible sleep, the process could not be woken up other than by issuing an explicit wake_up. Interruptible sleep is the preferred way of sleeping, unless there is a situation in which signals cannot be handled at all, such as device I/O. Lost Wake-Up Problem

有問題的 spin_lock, Why?


















Process A:
1  spin_lock(&list_lock);
2  if(list_empty(&list_head)) {
3      spin_unlock(&list_lock);
4      set_current_state(TASK_INTERRUPTIBLE);
5      schedule();
6      spin_lock(&list_lock);
7  }
8
9  /* Rest of the code ... */
10 spin_unlock(&list_lock);

Process B:
100  spin_lock(&list_lock);
101  list_add_tail(&list_head, new_node);
102  spin_unlock(&list_lock);
103  wake_up_process(processa_task);

修正為:















Process A:

1  set_current_state(TASK_INTERRUPTIBLE);
2  spin_lock(&list_lock);
3  if(list_empty(&list_head)) {
4         spin_unlock(&list_lock);
5         schedule();
6         spin_lock(&list_lock);
7  }
8  set_current_state(TASK_RUNNING);
9
10 /* Rest of the code ... */
11 spin_unlock(&list_lock);

linux-2.6.11/kernel/sched.c










4253  /* Wait for kthread_stop */
4254  set_current_state(TASK_INTERRUPTIBLE);
4255  while (!kthread_should_stop()) {
4256          schedule();
4257          set_current_state(TASK_INTERRUPTIBLE);
4258  }
4259  __set_current_state(TASK_RUNNING);
4260 return 0;

A wait queue could be defined and initialised in the following manner:





wait_queue_head_t my_event;
init_waitqueue_head(&my_event);

using this macro:


DECLARE_WAIT_QUEUE_HEAD(my_event);

Any process that wants to wait on my_event could use either of the following options:

wait_event(&my_event, (event_present == 1) );
wait_event_interruptible(&my_event, (event_present == 1) );

The interruptible version 2 of the options above puts the process to an interruptible sleep, whereas the other (option 1) puts the process into an uninterruptible sleep.

Wait Queues: Putting It Together

include/linux/wait.h
























/**
 * wait_event_interruptible - sleep until a condition gets true
 * @wq_head: the waitqueue to wait on
 * @condition: a C expression for the event to wait for
 *
 * The process is put to sleep (TASK_INTERRUPTIBLE) until the
 * @condition evaluates to true or a signal is received.
 * The @condition is checked each time the waitqueue @wq_head is woken up.
 *
 * wake_up() has to be called after changing any variable that could
 * change the result of the wait condition.
 *
 * The function will return -ERESTARTSYS if it was interrupted by a
 * signal and 0 if @condition evaluated to true.
 */

#define wait_event_interruptible(wq_head, condition)				\
({										\
	int __ret = 0;								\
	might_sleep();								\
	if (!(condition))							\
		__ret = __wait_event_interruptible(wq_head, condition);		\
	__ret;									\
})

smbiod thread:

linux-2.6.11/fs/smbfs/smbiod.c

































291 static int smbiod(void *unused)
292 {
293     daemonize("smbiod");
294
295     allow_signal(SIGKILL);
296
297     VERBOSE("SMB Kernel thread starting "
                "(%d)...\n", current->pid);
298
299     for (;;) {
300             struct smb_sb_info *server;
301             struct list_head *pos, *n;
302
303             /* FIXME: Use poll? */
304             wait_event_interruptible(smbiod_wait,
305                     test_bit(SMBIOD_DATA_READY,
                                 &smbiod_flags));
...
...             /* Some processing */
312
313             clear_bit(SMBIOD_DATA_READY,
                          &smbiod_flags);
314
...             /* Code to perform the requested I/O */
...
...
337     }
338
339     VERBOSE("SMB Kernel thread exiting (%d)...\n",
                current->pid);
340     module_put_and_exit(0);
341 }
342

linux-2.6.11/fs/smbfs/smbiod.c:







57 void smbiod_wake_up(void)
58 {
59     if (smbiod_state == SMBIOD_DEAD)
60         return;
61     set_bit(SMBIOD_DATA_READY, &smbiod_flags);
62     wake_up_interruptible(&smbiod_wait);
63 }

Futher reading:

https://elixir.bootlin.com/linux/v2.6.39/source/drivers/char/mem.c


























static const struct memdev {
	const char *name;
	mode_t mode;
	const struct file_operations *fops;
	struct backing_dev_info *dev_info;
} devlist[] = {
	 [1] = { "mem", 0, &mem_fops, &directly_mappable_cdev_bdi },
#ifdef CONFIG_DEVKMEM
	 [2] = { "kmem", 0, &kmem_fops, &directly_mappable_cdev_bdi },
#endif
	 [3] = { "null", 0666, &null_fops, NULL },
#ifdef CONFIG_DEVPORT
	 [4] = { "port", 0, &port_fops, NULL },
#endif
	 [5] = { "zero", 0666, &zero_fops, &zero_bdi },
	 [7] = { "full", 0666, &full_fops, NULL },
	 [8] = { "random", 0666, &random_fops, NULL },
	 [9] = { "urandom", 0666, &urandom_fops, NULL },
	[11] = { "kmsg", 0, &kmsg_fops, NULL },
#ifdef CONFIG_CRASH_DUMP
	[12] = { "oldmem", 0, &oldmem_fops, NULL },
#endif
};

Frame Reception

Notifying the Kernel of Frame Reception: NAPI and netif_rx
Processing Multiple Frames During an Interrupt, e.g., e1000
Processing the NET_RX_SOFTIRQ: net_rx_action

Frame Transmission

*int dev_queue_xmit(struct sk_buff skb)

Interfacing to Traffic Control (the QoS layer)

Invoking hard_start_xmit directly

RCU – “Read, Copy, Update”

Day25 RCU 同步機制

rcu 机制简介


























































struct foo {
        int a;
        char b;
        long c;
};
DEFINE_SPINLOCK(foo_mutex);

struct foo __rcu *gbl_foo;

/*
 * Create a new struct foo that is the same as the one currently
 * pointed to by gbl_foo, except that field "a" is replaced
 * with "new_a".  Points gbl_foo to the new structure, and
 * frees up the old structure after a grace period.
 *
 * Uses rcu_assign_pointer() to ensure that concurrent readers
 * see the initialized version of the new structure.
 *
 * Uses synchronize_rcu() to ensure that any readers that might
 * have references to the old structure complete before freeing
 * the old structure.
 */
void foo_update_a(int new_a)
{
        struct foo *new_fp;
        struct foo *old_fp;

        new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
        spin_lock(&foo_mutex);
        old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
        *new_fp = *old_fp;
        new_fp->a = new_a;
        rcu_assign_pointer(gbl_foo, new_fp);
        spin_unlock(&foo_mutex);
        synchronize_rcu();
        kfree(old_fp);
}

/*
 * Return the value of field "a" of the current gbl_foo
 * structure.  Use rcu_read_lock() and rcu_read_unlock()
 * to ensure that the structure does not get deleted out
 * from under us, and use rcu_dereference() to ensure that
 * we see the initialized version of the structure (important
 * for DEC Alpha and for people reading the code).
 */
int foo_get_a(void)
{
        int retval;

        rcu_read_lock();
        retval = rcu_dereference(gbl_foo)->a;
        rcu_read_unlock();
        return retval;
}

Network 之二 Ethernet（以太网）中的 MAC、MII、PHY 详解

https://elixir.bootlin.com/linux/v2.6.36/source/drivers/net/8139too.c#L755








































































































































































































static const struct net_device_ops rtl8139_netdev_ops = {
	.ndo_open		= rtl8139_open,
	.ndo_stop		= rtl8139_close,
	.ndo_get_stats		= rtl8139_get_stats,
	.ndo_change_mtu		= eth_change_mtu,
	.ndo_validate_addr	= eth_validate_addr,
	.ndo_set_mac_address 	= rtl8139_set_mac_address,
	.ndo_start_xmit		= rtl8139_start_xmit,
	.ndo_set_multicast_list	= rtl8139_set_rx_mode,
	.ndo_do_ioctl		= netdev_ioctl,
	.ndo_tx_timeout		= rtl8139_tx_timeout,
#ifdef CONFIG_NET_POLL_CONTROLLER
	.ndo_poll_controller	= rtl8139_poll_controller,
#endif
};




static int __devinit rtl8139_init_one (struct pci_dev *pdev,
				       const struct pci_device_id *ent)
{
	struct net_device *dev = NULL;
	struct rtl8139_private *tp;
	int i, addr_len, option;
	void __iomem *ioaddr;
	static int board_idx = -1;

	assert (pdev != NULL);
	assert (ent != NULL);

	board_idx++;

	/* when we're built into the kernel, the driver version message
	 * is only printed if at least one 8139 board has been found
	 */
#ifndef MODULE
	{
		static int printed_version;
		if (!printed_version++)
			pr_info(RTL8139_DRIVER_NAME "\n");
	}
#endif

	if (pdev->vendor == PCI_VENDOR_ID_REALTEK &&
	    pdev->device == PCI_DEVICE_ID_REALTEK_8139 && pdev->revision >= 0x20) {
		dev_info(&pdev->dev,
			   "This (id %04x:%04x rev %02x) is an enhanced 8139C+ chip, use 8139cp\n",
		       	   pdev->vendor, pdev->device, pdev->revision);
		return -ENODEV;
	}

	if (pdev->vendor == PCI_VENDOR_ID_REALTEK &&
	    pdev->device == PCI_DEVICE_ID_REALTEK_8139 &&
	    pdev->subsystem_vendor == PCI_VENDOR_ID_ATHEROS &&
	    pdev->subsystem_device == PCI_DEVICE_ID_REALTEK_8139) {
		pr_info("OQO Model 2 detected. Forcing PIO\n");
		use_io = 1;
	}

	dev = rtl8139_init_board (pdev);
	if (IS_ERR(dev))
		return PTR_ERR(dev);

	assert (dev != NULL);
	tp = netdev_priv(dev);
	tp->dev = dev;

	ioaddr = tp->mmio_addr;
	assert (ioaddr != NULL);

	addr_len = read_eeprom (ioaddr, 0, 8) == 0x8129 ? 8 : 6;
	for (i = 0; i < 3; i++)
		((__le16 *) (dev->dev_addr))[i] =
		    cpu_to_le16(read_eeprom (ioaddr, i + 7, addr_len));
	memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);

	/* The Rtl8139-specific entries in the device structure. */
	dev->netdev_ops = &rtl8139_netdev_ops;
	dev->ethtool_ops = &rtl8139_ethtool_ops;
	dev->watchdog_timeo = TX_TIMEOUT;
	netif_napi_add(dev, &tp->napi, rtl8139_poll, 64);

	/* note: the hardware is not capable of sg/csum/highdma, however
	 * through the use of skb_copy_and_csum_dev we enable these
	 * features
	 */
	dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA;

	dev->irq = pdev->irq;

	/* tp zeroed and aligned in alloc_etherdev */
	tp = netdev_priv(dev);

	/* note: tp->chipset set in rtl8139_init_board */
	tp->drv_flags = board_info[ent->driver_data].hw_flags;
	tp->mmio_addr = ioaddr;
	tp->msg_enable =
		(debug < 0 ? RTL8139_DEF_MSG_ENABLE : ((1 << debug) - 1));
	spin_lock_init (&tp->lock);
	spin_lock_init (&tp->rx_lock);
	INIT_DELAYED_WORK(&tp->thread, rtl8139_thread);
	tp->mii.dev = dev;
	tp->mii.mdio_read = mdio_read;
	tp->mii.mdio_write = mdio_write;
	tp->mii.phy_id_mask = 0x3f;
	tp->mii.reg_num_mask = 0x1f;

	/* dev is fully set up and ready to use now */
	pr_debug("about to register device named %s (%p)...\n",
		 dev->name, dev);
	i = register_netdev (dev);
	if (i) goto err_out;

	pci_set_drvdata (pdev, dev);

	netdev_info(dev, "%s at 0x%lx, %pM, IRQ %d\n",
		    board_info[ent->driver_data].name,
		    dev->base_addr, dev->dev_addr, dev->irq);

	netdev_dbg(dev, "Identified 8139 chip type '%s'\n",
		   rtl_chip_info[tp->chipset].name);

	/* Find the connected MII xcvrs.
	   Doing this in open() would allow detecting external xcvrs later, but
	   takes too much time. */
#ifdef CONFIG_8139TOO_8129
	if (tp->drv_flags & HAS_MII_XCVR) {
		int phy, phy_idx = 0;
		for (phy = 0; phy < 32 && phy_idx < sizeof(tp->phys); phy++) {
			int mii_status = mdio_read(dev, phy, 1);
			if (mii_status != 0xffff  &&  mii_status != 0x0000) {
				u16 advertising = mdio_read(dev, phy, 4);
				tp->phys[phy_idx++] = phy;
				netdev_info(dev, "MII transceiver %d status 0x%04x advertising %04x\n",
					    phy, mii_status, advertising);
			}
		}
		if (phy_idx == 0) {
			netdev_info(dev, "No MII transceivers found! Assuming SYM transceiver\n");
			tp->phys[0] = 32;
		}
	} else
#endif
		tp->phys[0] = 32;
	tp->mii.phy_id = tp->phys[0];

	/* The lower four bits are the media type. */
	option = (board_idx >= MAX_UNITS) ? 0 : media[board_idx];
	if (option > 0) {
		tp->mii.full_duplex = (option & 0x210) ? 1 : 0;
		tp->default_port = option & 0xFF;
		if (tp->default_port)
			tp->mii.force_media = 1;
	}
	if (board_idx < MAX_UNITS  &&  full_duplex[board_idx] > 0)
		tp->mii.full_duplex = full_duplex[board_idx];
	if (tp->mii.full_duplex) {
		netdev_info(dev, "Media type forced to Full Duplex\n");
		/* Changing the MII-advertised media because might prevent
		   re-connection. */
		tp->mii.force_media = 1;
	}
	if (tp->default_port) {
		netdev_info(dev, "  Forcing %dMbps %s-duplex operation\n",
			    (option & 0x20 ? 100 : 10),
			    (option & 0x10 ? "full" : "half"));
		mdio_write(dev, tp->phys[0], 0,
				   ((option & 0x20) ? 0x2000 : 0) | 	/* 100Mbps? */
				   ((option & 0x10) ? 0x0100 : 0)); /* Full duplex? */
	}

	/* Put the chip into low-power mode. */
	if (rtl_chip_info[tp->chipset].flags & HasHltClk)
		RTL_W8 (HltClk, 'H');	/* 'R' would leave the clock running. */

	return 0;

err_out:
	__rtl8139_cleanup_dev (dev);
	pci_disable_device (pdev);
	return i;
}



static struct pci_driver rtl8139_pci_driver = {
	.name		= DRV_NAME,
	.id_table	= rtl8139_pci_tbl,
	.probe		= rtl8139_init_one,
	.remove		= __devexit_p(rtl8139_remove_one),
#ifdef CONFIG_PM
	.suspend	= rtl8139_suspend,
	.resume		= rtl8139_resume,
#endif /* CONFIG_PM */
};

Using Kernel Timer

Using Kernel Timer In Linux Device Driver – Linux Device Driver Tutorial Part 26
by SLR

HZ=1000 (Linux 2.6.xx)
jiffies is incremented HZ times every second.













































































































































































/***************************************************************************//**
*  \file       driver.c
*
*  \details    Simple Linux device driver (Kernel Timer)
*
*  \author     EmbeTronicX
*
*  \Tested with Linux raspberrypi 5.10.27-v7l-embetronicx-custom+
*
*******************************************************************************/
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kdev_t.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/timer.h>
#include <linux/jiffies.h>
#include <linux/err.h>
//Timer Variable
#define TIMEOUT 5000    //milliseconds

static struct timer_list etx_timer;
static unsigned int count = 0;
 
dev_t dev = 0;
static struct class *dev_class;
static struct cdev etx_cdev;

static int __init etx_driver_init(void);
static void __exit etx_driver_exit(void);

/*************** Driver functions **********************/
static int etx_open(struct inode *inode, struct file *file);
static int etx_release(struct inode *inode, struct file *file);
static ssize_t etx_read(struct file *filp, char __user *buf, size_t len,loff_t * off);
static ssize_t etx_write(struct file *filp, const char *buf, size_t len, loff_t * off);
/******************************************************/

//File operation structure 
static struct file_operations fops =
{
        .owner          = THIS_MODULE,
        .read           = etx_read,
        .write          = etx_write,
        .open           = etx_open,
        .release        = etx_release,
};
 
//Timer Callback function. This will be called when timer expires
void timer_callback(struct timer_list * data)
{
    /* do your timer stuff here */
    pr_info("Timer Callback function Called [%d]\n",count++);
    
    /*
       Re-enable timer. Because this function will be called only first time. 
       If we re-enable this will work like periodic timer. 
    */
    mod_timer(&etx_timer, jiffies + msecs_to_jiffies(TIMEOUT));
}

/*
** This function will be called when we open the Device file
*/ 
static int etx_open(struct inode *inode, struct file *file)
{
    pr_info("Device File Opened...!!!\n");
    return 0;
}

/*
** This function will be called when we close the Device file
*/
static int etx_release(struct inode *inode, struct file *file)
{
    pr_info("Device File Closed...!!!\n");
    return 0;
}

/*
** This function will be called when we read the Device file
*/
static ssize_t etx_read(struct file *filp, 
                                    char __user *buf, size_t len, loff_t *off)
{
    pr_info("Read Function\n");
    return 0;
}

/*
** This function will be called when we write the Device file
*/
static ssize_t etx_write(struct file *filp, 
                                const char __user *buf, size_t len, loff_t *off)
{
    pr_info("Write function\n");
    return len;
}

/*
** Module Init function
*/ 
static int __init etx_driver_init(void)
{
    /*Allocating Major number*/
    if((alloc_chrdev_region(&dev, 0, 1, "etx_Dev")) <0){
            pr_err("Cannot allocate major number\n");
            return -1;
    }
    pr_info("Major = %d Minor = %d \n",MAJOR(dev), MINOR(dev));
 
    /*Creating cdev structure*/
    cdev_init(&etx_cdev,&fops);
 
    /*Adding character device to the system*/
    if((cdev_add(&etx_cdev,dev,1)) < 0){
        pr_err("Cannot add the device to the system\n");
        goto r_class;
    }
 
    /*Creating struct class*/
    if(IS_ERR(dev_class = class_create(THIS_MODULE,"etx_class"))){
        pr_err("Cannot create the struct class\n");
        goto r_class;
    }
 
    /*Creating device*/
    if(IS_ERR(device_create(dev_class,NULL,dev,NULL,"etx_device"))){
        pr_err("Cannot create the Device 1\n");
        goto r_device;
    }
 
    /* setup your timer to call my_timer_callback */
    timer_setup(&etx_timer, timer_callback, 0);       //If you face some issues and using older kernel version, then you can try setup_timer API(Change Callback function's argument to unsingned long instead of struct timer_list *.
 
    /* setup timer interval to based on TIMEOUT Macro */
    mod_timer(&etx_timer, jiffies + msecs_to_jiffies(TIMEOUT));
 
    pr_info("Device Driver Insert...Done!!!\n");
    return 0;
r_device:
    class_destroy(dev_class);
r_class:
    unregister_chrdev_region(dev,1);
    return -1;
}


/*
** Module exit function
*/
static void __exit etx_driver_exit(void)
{
    /* remove kernel timer when unloading module */
    del_timer(&etx_timer);
    device_destroy(dev_class,dev);
    class_destroy(dev_class);
    cdev_del(&etx_cdev);
    unregister_chrdev_region(dev, 1);
    pr_info("Device Driver Remove...Done!!!\n");
}
 
module_init(etx_driver_init);
module_exit(etx_driver_exit);
 
MODULE_LICENSE("GPL");
MODULE_AUTHOR("EmbeTronicX <embetronicx@gmail.com>");
MODULE_DESCRIPTION("A simple device driver - Kernel Timer");
MODULE_VERSION("1.21");

mmap

SIGBUS Core Bus error (bad memory access)

struct vm_operations_struct:

https://elixir.bootlin.com/linux/v2.6.33/source/include/linux/mm_types.h#L136

kernel version 2.6.xx 多了:
line 24: vm_fault_t (*fault)(struct vm_fault *vmf);




















































































/*
 * These are the virtual MM functions - opening of an area, closing and
 * unmapping it (needed to keep files on disk up-to-date etc), pointer
 * to the functions called when a no-page or a wp-page exception occurs.
 */
struct vm_operations_struct {
	void (*open)(struct vm_area_struct * area);
	/**
	 * @close: Called when the VMA is being removed from the MM.
	 * Context: User context.  May sleep.  Caller holds mmap_lock.
	 */
	void (*close)(struct vm_area_struct * area);
	/* Called any time before splitting to check if it's allowed */
	int (*may_split)(struct vm_area_struct *area, unsigned long addr);
	int (*mremap)(struct vm_area_struct *area);
	/*
	 * Called by mprotect() to make driver-specific permission
	 * checks before mprotect() is finalised.   The VMA must not
	 * be modified.  Returns 0 if mprotect() can proceed.
	 */
	int (*mprotect)(struct vm_area_struct *vma, unsigned long start,
			unsigned long end, unsigned long newflags);
	vm_fault_t (*fault)(struct vm_fault *vmf);
	vm_fault_t (*huge_fault)(struct vm_fault *vmf,
			enum page_entry_size pe_size);
	vm_fault_t (*map_pages)(struct vm_fault *vmf,
			pgoff_t start_pgoff, pgoff_t end_pgoff);
	unsigned long (*pagesize)(struct vm_area_struct * area);

	/* notification that a previously read-only page is about to become
	 * writable, if an error is returned it will cause a SIGBUS */
	vm_fault_t (*page_mkwrite)(struct vm_fault *vmf);

	/* same as page_mkwrite when using VM_PFNMAP|VM_MIXEDMAP */
	vm_fault_t (*pfn_mkwrite)(struct vm_fault *vmf);

	/* called by access_process_vm when get_user_pages() fails, typically
	 * for use by special VMAs. See also generic_access_phys() for a generic
	 * implementation useful for any iomem mapping.
	 */
	int (*access)(struct vm_area_struct *vma, unsigned long addr,
		      void *buf, int len, int write);

	/* Called by the /proc/PID/maps code to ask the vma whether it
	 * has a special name.  Returning non-NULL will also cause this
	 * vma to be dumped unconditionally. */
	const char *(*name)(struct vm_area_struct *vma);

#ifdef CONFIG_NUMA
	/*
	 * set_policy() op must add a reference to any non-NULL @new mempolicy
	 * to hold the policy upon return.  Caller should pass NULL @new to
	 * remove a policy and fall back to surrounding context--i.e. do not
	 * install a MPOL_DEFAULT policy, nor the task or system default
	 * mempolicy.
	 */
	int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new);

	/*
	 * get_policy() op must add reference [mpol_get()] to any policy at
	 * (vma,addr) marked as MPOL_SHARED.  The shared policy infrastructure
	 * in mm/mempolicy.c will do this automatically.
	 * get_policy() must NOT add a ref if the policy at (vma,addr) is not
	 * marked as MPOL_SHARED. vma policies are protected by the mmap_lock.
	 * If no [shared/vma] mempolicy exists at the addr, get_policy() op
	 * must return NULL--i.e., do not "fallback" to task or system default
	 * policy.
	 */
	struct mempolicy *(*get_policy)(struct vm_area_struct *vma,
					unsigned long addr);
#endif
	/*
	 * Called by vm_normal_page() for special PTEs to find the
	 * page for @addr.  This is useful if the default behavior
	 * (using pte_page()) would not find the correct page.
	 */
	struct page *(*find_special_page)(struct vm_area_struct *vma,
					  unsigned long addr);
};

struct vm_area_struct:

https://elixir.bootlin.com/linux/v2.6.33/source/include/linux/mm_types.h#L136










































































/*
 * This struct defines a memory VMM memory area. There is one of these
 * per VM-area/task.  A VM area is any part of the process virtual memory
 * space that has a special rule for the page-fault handlers (ie a shared
 * library, the executable area etc).
 */
struct vm_area_struct {
	struct mm_struct * vm_mm;	/* The address space we belong to. */
	unsigned long vm_start;		/* Our start address within vm_mm. */
	unsigned long vm_end;		/* The first byte after our end address
					   within vm_mm. */

	/* linked list of VM areas per task, sorted by address */
	struct vm_area_struct *vm_next;

	pgprot_t vm_page_prot;		/* Access permissions of this VMA. */
	unsigned long vm_flags;		/* Flags, see mm.h. */

	struct rb_node vm_rb;

	/*
	 * For areas with an address space and backing store,
	 * linkage into the address_space->i_mmap prio tree, or
	 * linkage to the list of like vmas hanging off its node, or
	 * linkage of vma in the address_space->i_mmap_nonlinear list.
	 */
	union {
		struct {
			struct list_head list;
			void *parent;	/* aligns with prio_tree_node parent */
			struct vm_area_struct *head;
		} vm_set;

		struct raw_prio_tree_node prio_tree_node;
	} shared;

	/*
	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
	 * list, after a COW of one of the file pages.	A MAP_SHARED vma
	 * can only be in the i_mmap tree.  An anonymous MAP_PRIVATE, stack
	 * or brk vma (with NULL file) can only be in an anon_vma list.
	 */
	struct list_head anon_vma_node;	/* Serialized by anon_vma->lock */
	struct anon_vma *anon_vma;	/* Serialized by page_table_lock */

	/* Function pointers to deal with this struct. */
	const struct vm_operations_struct *vm_ops;

	/* Information about our backing store: */
	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
					   units, *not* PAGE_CACHE_SIZE */
	struct file * vm_file;		/* File we map to (can be NULL). */
	void * vm_private_data;		/* was vm_pte (shared mem) */
	unsigned long vm_truncate_count;/* truncate_count or restart_addr */

#ifndef CONFIG_MMU
	struct vm_region *vm_region;	/* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
#endif
};

mmap driver implementation

Appendix

https://lxr.linux.no/linux+v6.0.9/include/uapi/asm-generic/ioctl.h#L87























#define _IOC(dir,type,nr,size) \
  70        (((dir)  << _IOC_DIRSHIFT) | \
  71         ((type) << _IOC_TYPESHIFT) | \
  72         ((nr)   << _IOC_NRSHIFT) | \
  73         ((size) << _IOC_SIZESHIFT))
  74
  75#ifndef __KERNEL__
  76#define _IOC_TYPECHECK(t) (sizeof(t))
  77#endif
  78
  79/*
  80 * Used to create numbers.
  81 *
  82 * NOTE: _IOW means userland is writing and kernel is reading. _IOR
  83 * means userland is reading and kernel is writing.
  84 */
  85#define _IO(type,nr)            _IOC(_IOC_NONE,(type),(nr),0)
  86#define _IOR(type,nr,size)      _IOC(_IOC_READ,(type),(nr),(_IOC_TYPECHECK(size)))
  87#define _IOW(type,nr,size)      _IOC(_IOC_WRITE,(type),(nr),(_IOC_TYPECHECK(size)))
  88#define _IOWR(type,nr,size)     _IOC(_IOC_READ|_IOC_WRITE,(type),(nr),(_IOC_TYPECHECK(size)))

Signal

Sending Signal from Linux Device Driver to User Space – Linux Device Driver Tutorial Part 25

IO Multiplexing

Select Linux Example Device Driver – Linux Device Driver Tutorial Part 43

Sleeping in the Kernel, Signal, mmap and Frame Reception/Transmission

Tasklet/Workqueue

Kernel Thread

Kernel Thread – Linux Device Driver Tutorial Part 19

wait queue

Wait Queues linux 2.6.xx

Frame Reception

Frame Transmission

RCU – “Read, Copy, Update”

Using Kernel Timer

mmap

Signal

IO Multiplexing

Read more

Metric space, vector space, inner product space

Linux Network Programming, by WhoAmI

\mathcal{Probability \ Measure \ and \ Conditional \ Expectation}

使用 pthread 的 echo server (有bug! why?)