Netdev
=====
###### tags: `Linux Kernel` `Contribute by <Dung-Ru Tsai>`

# [Writing Network Device Drivers for Linux](https://linuxgazette.net/156/jangir.html)

Starting Driver Development

Driver development breaks down into the following steps:

- Detecting the device
- Enabling the device
- Understanding the network device
- Bus-independent device access
- Understanding the PCI configuration space
- Initializing net_device
- Understanding RealTek8139's transmission mechanism
- Understanding RealTek8139's receiving mechanism
- Making the device ready to transmit packets
- Making the device ready to receive packets
- [RTL8139(A/B) Programming guide](https://www.cs.usfca.edu/~cruse/cs326f04/RTL8139_ProgrammersGuide.pdf)

[Emulating WLAN in Linux - part I: the 802.11 stack](https://linuxembedded.fr/2020/05/emulating-wlan-in-linux-part-i-the-80211-stack)

[snull.c LDD3](https://github.com/martinezjavier/ldd3/blob/master/snull/snull.c)

[Linux Network Scaling: Receiving Packets](https://garycplin.blogspot.tw/2017/06/linux-network-scaling-receives-packets.html)

[Linux-networking-stack-receiving-data](https://blog.packagecloud.io/eng/2016/06/22/monitoring-tuning-linux-networking-stack-receiving-data/#)

![](https://i.imgur.com/dhWeGVg.png)

# A simple netdevice
```clike=
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/kernel.h>
#include <linux/etherdevice.h>

struct net_device *snull_dev[2];

/* Called when the interface is brought up. */
int snull_open(struct net_device *dev)
{
    printk("snull_open called\n");
    netif_start_queue(dev);   /* allow the stack to hand us packets */
    return 0;
}

/* Called when the interface is brought down. */
int snull_release(struct net_device *dev)
{
    printk("snull_release called\n");
    netif_stop_queue(dev);
    return 0;
}

/* Dummy transmit: free the packet and report it as sent. */
netdev_tx_t snull_xmit(struct sk_buff *skb, struct net_device *dev)
{
    printk("dummy xmit function called...\n");
    dev_kfree_skb(skb);
    return NETDEV_TX_OK;
}

/* Called once while the device is being registered. */
int snull_init(struct net_device *dev)
{
    printk("snull device initialized\n");
    return 0;
}

const struct net_device_ops my_netdev_ops = {
    .ndo_init = snull_init,
    .ndo_open = snull_open,
    .ndo_stop = snull_release,
    .ndo_start_xmit = snull_xmit,
};

static void virtual_setup(struct net_device *dev)
{
    ether_setup(dev);   /* fill in the Ethernet defaults first */
    dev->netdev_ops = &my_netdev_ops;
}

int snull_init_module(void)
{
    int result;

    snull_dev[0] = alloc_netdev(0, "sn%d", NET_NAME_UNKNOWN, virtual_setup);
    if (!snull_dev[0])
        return -ENOMEM;

    if ((result = register_netdev(snull_dev[0]))) {
        printk("snull: Error %d initializing card ...\n", result);
        free_netdev(snull_dev[0]);
        return result;
    }
    return 0;
}

void snull_cleanup(void)
{
    printk(KERN_INFO "Cleaning Up the Module\n");
    unregister_netdev(snull_dev[0]);
    free_netdev(snull_dev[0]);   /* release what alloc_netdev() allocated */
}

module_init(snull_init_module);
module_exit(snull_cleanup);
MODULE_LICENSE("GPL");
```

# Important netdevice operations (Device Methods)
```clike=
static const struct net_device_ops snull_netdev_ops = {
    // Interface up: the open method should register any system
    // resource it needs (I/O ports, IRQ, DMA, etc.)
    .ndo_open = snull_open,
    // Interface down
    .ndo_stop = snull_release,
    // Method that initiates the transmission of a packet
    .ndo_start_xmit = snull_tx,
    .ndo_do_ioctl = snull_ioctl,
    .ndo_set_config = snull_config,
    .ndo_get_stats = snull_stats,
    .ndo_change_mtu = snull_change_mtu,
    .ndo_tx_timeout = snull_tx_timeout
};
```

Header Operations
```clike=
static const struct header_ops snull_header_ops = {
    /*
     * Function (called before hard_start_xmit) that builds
     * the hardware header from the source and destination
     * hardware addresses that were previously retrieved; its
     * job is to organize the information passed to it as
     * arguments into an appropriate, device-specific
     * hardware header. eth_header is the default function for
     * Ethernet-like interfaces, and ether_setup assigns this
     * field accordingly.
     */
    .create = snull_header,
    .rebuild = snull_rebuild_header
};
```

# Guideline
1. Module Init
    - alloc_netdev
    - register_netdev
2. net_device structure
3. Module unloading
4. Opening and Closing interface
    - ifconfig
5. Packet Transmission
    - Controlling Transmission Concurrency

:::info
What happens when two devices with the same Vendor ID and Device ID sit on different PCI buses?
Ans: The PCI probe function is called twice, once per device, each time with a different `struct pci_dev *pdev` address.
:::

## MSI interrupt
1. Request the IRQ after enabling MSI, because `pci_enable_msi()` updates `pdev->irq`.
```clike
if (intr_mode) {
    if ((msi_result = pci_enable_msi(pdev))) {
        printk(KERN_NOTICE "MSI support not available: %i.\n", msi_result);
    } else {
        printk(KERN_NOTICE "MSI Interrupt vector: %i.\n", pdev->irq);
    }
}
if (request_irq(pdev->irq, &edma_irq_hndlr, IRQF_SHARED, DRV_NAME, pdev)) {
    printk(KERN_WARNING "%s: Failed to request shared IRQ\n", DRV_NAME);
    goto err_out2;
}
```
2. Free the IRQ before disabling PCI MSI.
```clike
free_irq(pdev->irq, pdev);
if (intr_mode)
    pci_disable_msi(pdev);
```

# Driver Private data (adapter)
[netdev_appa.c](https://gist.github.com/ldotrg/6eaaaebbf4ed1c74649e10de249d00e2)

Mainline kernel: `ixgb_driver` is a good example.
- Three data structures need to be handled: `net_device`, `pci_dev`, and the driver's private data.
- They are created and linked together in the PCI probe hook.

```clike=
// alloc_etherdev allocates the net_device together with room for the adapter.
// From then on, netdev_priv gives access to the private adapter.
// alloc_etherdev = alloc_netdev(sizeof_priv, name, name_assign_type, setup) + ether_setup
netdev = alloc_etherdev(sizeof(struct ixgb_adapter));

// Set the sysfs physical device (pci_dev) reference for the network logical device.
// Lets the net_device reach the PCI (bus) device.
SET_NETDEV_DEV(netdev, &pdev->dev);

// Lets the PCI device reach the net_device.
pci_set_drvdata(pdev, netdev);

// Access the network device private data.
adapter = netdev_priv(netdev);
adapter->netdev = netdev;
adapter->pdev = pdev;
adapter->hw.back = adapter;
```

Kernel data structures:
```clike
/* Set the sysfs physical device reference for the network logical device
 * if set prior to registration will cause a symlink during initialization.
 */
#define SET_NETDEV_DEV(net, pdev) ((net)->dev.parent = (pdev))

/* Called from the probe hook as: pci_set_drvdata(pdev, netdev); */
static inline void pci_set_drvdata(struct pci_dev *pdev, void *data)
{
    dev_set_drvdata(&pdev->dev, data);
}

static inline void dev_set_drvdata(struct device *dev, void *data)
{
    dev->driver_data = data;
}

/* The private area starts right after the aligned net_device. */
static inline void *netdev_priv(const struct net_device *dev)
{
    return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
}
```

The private adapter data is placed at the end of the `net_device` allocation.
```graphviz
digraph dfd2{
    pci_device [label="{<f0> pci_device|<f1> pdev\-\>dev.driver_data\n }" shape=Mrecord];
    net_device [label="{<f0> net_device|<f1> (net)\-\>dev.parent\n adapter data is at the end\n }" shape=Mrecord];
    net_device->pci_device
    pci_device->net_device
}
```

# User space to Kernel space
## procfs
Read-only export of kernel variables. E.g. `arp_proc_init` creates the entry `/proc/net/arp`.

## sysctl: Directory /proc/sys
Exports kernel variables that can be both read and modified.
Entries are registered with the function `register_sysctl_table`.
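
Seen from user space, a sysctl entry is just a file under `/proc/sys` that can be read and, with enough privilege, written. The following is a minimal sketch of that view; the helper names `sysctl_read_int` and `sysctl_write_int` are hypothetical (they are not part of any library), and it assumes the entry holds a single integer.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysctl file such as
 * /proc/sys/net/ipv4/ip_forward. Returns 0 on success, -1 on failure. */
int sysctl_read_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: write one integer to a sysctl file.
 * Writing under /proc/sys normally requires root. */
int sysctl_write_int(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = (fprintf(f, "%d\n", value) > 0);
    /* The data is flushed on close, so a rejected write shows up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For instance, `sysctl_read_int("/proc/sys/net/ipv4/ip_forward", &v)` would report whether IPv4 forwarding is currently enabled, and `sysctl_write_int` on the same path would toggle it when run as root.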
Example `/proc/sys/net/ipv4/ip_forward`
```clike
static struct ctl_table ctl_forward_entry[] = {
    {
        .procname     = "ip_forward",
        .data         = &ipv4_devconf.data[IPV4_DEVCONF_FORWARDING - 1],
        .maxlen       = sizeof(int),
        .mode         = 0644,
        .proc_handler = devinet_sysctl_forward,
        .extra1       = &ipv4_devconf,
        .extra2       = &init_net,
    },
};
```
:::info
[Graphviz Drawing Guide](https://www.tonyballantyne.com/graphs.html#orgheadline19)
:::

### Core directories in /proc/sys/net
```graphviz
digraph hierarchy {
    nodesep=1.0 // increases the separation between nodes
    node [color=Black,fontname=Courier,shape=box] // all nodes get this shape and colour
    edge [color=Black] // all the lines look like this
    root[label="/"]
    sys [label="sys"]
    proc[label="proc"]
    net [label="net"]
    bridge[label="bridge (ch17)"]
    ipv4 [label="ipv4 (ch23)"]
    core[label="core (ch12)"]
    neigh[label="neigh (ch29)"]
    conf[label="conf (ch36)"]
    route[label="route (ch36)"]
    root->proc
    proc->sys
    sys->net
    net->{bridge, ipv4, core}
    ipv4->{neigh, conf, route}
}
```

### Creation of the core directories in /proc/sys/net
![](https://i.imgur.com/XiKFjw7.png)

## IOCTL
### Dispatching ioctl commands
![](https://i.imgur.com/gkZTci2.png)

## Netlink Socket
The Netlink socket, well described in [RFC 3549](https://tools.ietf.org/html/rfc3549), represents the preferred interface between user space and kernel for IP networking configuration.

### `int socket(int domain, int type, int protocol)`
- domain: `AF_NETLINK` (kernel user interface device)
- type: `SOCK_RAW` or `SOCK_DGRAM`; Netlink is datagram-oriented and treats both the same
- protocol: defined in `include/uapi/linux/netlink.h`, such as `NETLINK_ROUTE`
- destination endpoint address: PID or multicast group ID
- Notifications in multicast groups: the groups are listed as `RTMGRP_XXX` constants in `include/linux/rtnetlink.h`. Among them are the `RTMGRP_IPV4_ROUTE` and `RTMGRP_NEIGH` groups, used respectively for routing-table and neighbor-table notifications.
- The `NETLINK_ROUTE` service defines several message types, such as `RTM_NEWLINK`, `RTM_DELLINK`, `RTM_GETLINK`, `RTM_NEWADDR`, `RTM_DELADDR`, `RTM_NEWROUTE`, `RTM_DELROUTE`, etc.

:::success
One of the advantages of Netlink over other user-kernel interfaces such as ioctl is that the kernel can initiate a transmission instead of just returning information in answer to user-space requests.
:::

For more detail, see [Netlink](/YvSS8uXIS4yZpmWUfLBjWQ).

# Notification Chains
Notification chains follow a publish-and-subscribe model:
- The notified are the subsystems that ask to be notified about the event and that provide a callback function to invoke.
- The notifier is the subsystem that experiences an event and calls the callback function.

# Ingress path
![](https://i.imgur.com/8yqjsbS.png)
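
Returning to the publish-and-subscribe model from the Notification Chains section: the sketch below models it in plain user-space C, only to make the two roles concrete. It is not kernel code. The names `struct notifier`, `chain_register`, `chain_call`, `count_event`, and `EVENT_DEV_UP` are hypothetical stand-ins. The assumption made here is that subscribers are kept in descending priority order and are all invoked for every event.

```c
#include <stddef.h>

#define EVENT_DEV_UP 1UL   /* hypothetical event code */

/* One subscriber (the "notified" side): a callback plus list linkage. */
struct notifier {
    int (*callback)(struct notifier *self, unsigned long event, void *data);
    struct notifier *next;
    int priority;          /* higher priority is called first */
};

/* The chain owned by the subsystem that experiences events. */
struct notifier_chain {
    struct notifier *head;
};

/* Subscribe: insert the notifier, keeping the list sorted by priority. */
void chain_register(struct notifier_chain *chain, struct notifier *n)
{
    struct notifier **pp = &chain->head;

    while (*pp && (*pp)->priority >= n->priority)
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
}

/* Publish: invoke every subscriber's callback; returns how many ran. */
int chain_call(struct notifier_chain *chain, unsigned long event, void *data)
{
    int called = 0;

    for (struct notifier *n = chain->head; n; n = n->next) {
        n->callback(n, event, data);
        called++;
    }
    return called;
}

/* Example subscriber callback: counts the events it was told about. */
int events_seen;

int count_event(struct notifier *self, unsigned long event, void *data)
{
    (void)self;
    (void)event;
    (void)data;
    events_seen++;
    return 0;
}
```

A subscriber fills in a `struct notifier` with its callback and calls `chain_register`. The notifier side then calls `chain_call(&chain, EVENT_DEV_UP, NULL)` whenever the event occurs, without needing to know who is listening.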