Netdev
=====
###### tags: `Linux Kernel`
`Contributed by <Dung-Ru Tsai>`
# [Writing Network Device Drivers for Linux](https://linuxgazette.net/156/jangir.html)
Starting Driver Development
Driver development breaks down into the following steps:
- Detecting the device
- Enabling the device
- Understanding the network device
- Bus-independent device access
- Understanding the PCI configuration space
- Initializing net_device
- Understanding RealTek8139's transmission mechanism
- Understanding RealTek8139's receiving mechanism
- Making the device ready to transmit packets
- Making the device ready to receive packets

Related reading:
- [RTL8139(A/B) Programming guide](https://www.cs.usfca.edu/~cruse/cs326f04/RTL8139_ProgrammersGuide.pdf)
- [Emulating WLAN in Linux - part I: the 802.11 stack](https://linuxembedded.fr/2020/05/emulating-wlan-in-linux-part-i-the-80211-stack)
- [snull.c LDD3](https://github.com/martinezjavier/ldd3/blob/master/snull/snull.c)
- [Linux Network Scaling: Receiving Packets](https://garycplin.blogspot.tw/2017/06/linux-network-scaling-receives-packets.html)
- [Linux-networking-stack-receiving-data](https://blog.packagecloud.io/eng/2016/06/22/monitoring-tuning-linux-networking-stack-receiving-data/#)

# A simple netdevice
```clike=
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

static struct net_device *snull_dev[2];

static int snull_open(struct net_device *dev)
{
	pr_info("snull_open called\n");
	netif_start_queue(dev);	/* let the stack hand packets to us */
	return 0;
}

static int snull_release(struct net_device *dev)
{
	pr_info("snull_release called\n");
	netif_stop_queue(dev);
	return 0;
}

/* ndo_start_xmit returns netdev_tx_t, not a plain int. */
static netdev_tx_t snull_xmit(struct sk_buff *skb, struct net_device *dev)
{
	pr_info("dummy xmit function called...\n");
	dev_kfree_skb(skb);	/* no hardware: just drop the packet */
	return NETDEV_TX_OK;
}

static int snull_init(struct net_device *dev)
{
	pr_info("snull device initialized\n");
	return 0;
}

static const struct net_device_ops my_netdev_ops = {
	.ndo_init       = snull_init,
	.ndo_open       = snull_open,
	.ndo_stop       = snull_release,
	.ndo_start_xmit = snull_xmit,
};

static void virtual_setup(struct net_device *dev)
{
	dev->netdev_ops = &my_netdev_ops;
}

static int __init snull_init_module(void)
{
	int result;

	snull_dev[0] = alloc_netdev(0, "sn%d", NET_NAME_UNKNOWN, virtual_setup);
	if (!snull_dev[0])
		return -ENOMEM;

	result = register_netdev(snull_dev[0]);
	if (result) {
		pr_err("snull: error %d registering device\n", result);
		free_netdev(snull_dev[0]);	/* undo alloc_netdev */
		return result;
	}
	return 0;
}

static void __exit snull_cleanup(void)
{
	pr_info("snull: cleaning up the module\n");
	unregister_netdev(snull_dev[0]);
	free_netdev(snull_dev[0]);	/* unregister alone does not free it */
}

module_init(snull_init_module);
module_exit(snull_cleanup);
MODULE_LICENSE("GPL");
```
# Important netdevice operations (Device Methods)
```clike=
static const struct net_device_ops snull_netdev_ops = {
	/* Interface up: open should claim the system resources the
	 * device needs (I/O ports, IRQ, DMA, etc.). */
	.ndo_open       = snull_open,
	/* Interface down. */
	.ndo_stop       = snull_release,
	/* Initiates the transmission of a packet. */
	.ndo_start_xmit = snull_tx,
	.ndo_do_ioctl   = snull_ioctl,
	.ndo_set_config = snull_config,
	.ndo_get_stats  = snull_stats,
	.ndo_change_mtu = snull_change_mtu,
	.ndo_tx_timeout = snull_tx_timeout,
};
```
Header operations
```clike=
static const struct header_ops snull_header_ops = {
	/*
	 * Called before ndo_start_xmit (formerly hard_start_xmit).
	 * Builds the hardware header from the source and destination
	 * hardware addresses that were previously retrieved; its job
	 * is to organize the information passed to it as arguments
	 * into an appropriate, device-specific hardware header.
	 * eth_header is the default function for Ethernet-like
	 * interfaces, and ether_setup assigns this field accordingly.
	 */
	.create  = snull_header,
	.rebuild = snull_rebuild_header,
};
```
# Guideline
1. Module Init
    - alloc_netdev
    - register_netdev
2. net_device structure
3. Module unloading
4. Opening and closing the interface
    - ifconfig
5. Packet Transmission
    - Controlling Transmission Concurrency
:::info
What happens when two devices with the same Vendor ID and Device ID sit on different PCI buses?
Ans: The PCI probe function is called twice, once per device, each time with a different `struct pci_dev *pdev` address.
:::
## MSI interrupt
1. Request the IRQ only after enabling MSI, because enabling MSI changes `pdev->irq` to the MSI vector.
```clike
if (intr_mode) {
	/* Enable MSI first: on success pdev->irq holds the MSI vector. */
	msi_result = pci_enable_msi(pdev);
	if (msi_result)
		printk(KERN_NOTICE "MSI support not available: %i.\n", msi_result);
	else
		printk(KERN_NOTICE "MSI Interrupt vector: %i.\n", pdev->irq);
}
if (request_irq(pdev->irq, &edma_irq_hndlr, IRQF_SHARED, DRV_NAME, pdev)) {
	/* The original call passed DRV_NAME without a format specifier. */
	printk(KERN_WARNING "%s: failed to request shared IRQ\n", DRV_NAME);
	goto err_out2;
}
```
2. Free the IRQ before disabling MSI.
```clike
free_irq(pdev->irq, pdev);
if (intr_mode)
	pci_disable_msi(pdev);
```
# Driver Private data (adapter)
[netdev_appa.c](https://gist.github.com/ldotrg/6eaaaebbf4ed1c74649e10de249d00e2)
Mainline kernel: the `ixgb` driver (`ixgb_driver`) is a good example.
- Three data structures need to be tied together: `net_device`, `pci_dev`, and the driver's private data (the adapter).
- All three are linked in the PCI probe hook.
```clike=
// alloc_etherdev() also allocates room for the private adapter struct;
// netdev_priv() is used below to reach it.
// alloc_etherdev(sizeof_priv) is alloc_netdev(sizeof_priv, name,
// name_assign_type, setup) with ether_setup as the setup function.
netdev = alloc_etherdev(sizeof(struct ixgb_adapter));
// Set the sysfs physical device (pci_dev) reference for the logical
// network device, so that the net_device can find its PCI (bus) device.
SET_NETDEV_DEV(netdev, &pdev->dev);
// Let the PCI device find the net_device.
pci_set_drvdata(pdev, netdev);
// Access the network device's private data.
adapter = netdev_priv(netdev);
adapter->netdev = netdev;
adapter->pdev = pdev;
adapter->hw.back = adapter;
```
Kernel helpers used above:
```clike
/* Set the sysfs physical device reference for the network logical device
 * if set prior to registration will cause a symlink during initialization.
 */
#define SET_NETDEV_DEV(net, pdev)	((net)->dev.parent = (pdev))

/* Usage in probe: pci_set_drvdata(pdev, netdev); */
static inline void pci_set_drvdata(struct pci_dev *pdev, void *data)
{
	dev_set_drvdata(&pdev->dev, data);
}

static inline void dev_set_drvdata(struct device *dev, void *data)
{
	dev->driver_data = data;
}

/* The private area starts right after the aligned net_device. */
static inline void *netdev_priv(const struct net_device *dev)
{
	return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
}
```
The private adapter data is placed at the end of the `net_device` allocation.
```graphviz
digraph dfd2{
pci_device [label="{<f0> pci_device|<f1>
pdev\-\>dev.driver_data\n
}" shape=Mrecord];
net_device [label="{<f0> net_device|<f1>
(net)\-\>dev.parent\n
adapter data is at the end\n
}" shape=Mrecord];
net_device->pci_device
pci_device->net_device
}
```
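
The layout can be tried out in user space. The following is a minimal sketch, not kernel code: `toy_netdev`, `toy_adapter`, `toy_alloc_netdev` and `toy_netdev_priv` are hypothetical names, and the alignment of 32 is an assumption chosen to resemble `NETDEV_ALIGN`. It mirrors the `netdev_priv` definition quoted above, where the private area starts at the aligned end of the device structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ALIGN 32	/* assumed alignment, stands in for NETDEV_ALIGN */
#define TOY_ALIGN_UP(x, a) (((x) + (size_t)(a) - 1) & ~((size_t)(a) - 1))

/* Stand-in for struct net_device (hypothetical, heavily reduced). */
struct toy_netdev {
	char name[16];
	int mtu;
};

/* Stand-in for the driver's private adapter struct. */
struct toy_adapter {
	struct toy_netdev *netdev;
	int irq;
};

/* One allocation holds the device struct followed by the private area. */
static struct toy_netdev *toy_alloc_netdev(size_t sizeof_priv)
{
	size_t total = TOY_ALIGN_UP(sizeof(struct toy_netdev), TOY_ALIGN)
		       + sizeof_priv;

	return calloc(1, total);
}

/* Same pointer arithmetic as the netdev_priv() shown above. */
static void *toy_netdev_priv(struct toy_netdev *dev)
{
	assert(dev != NULL);
	return (char *)dev + TOY_ALIGN_UP(sizeof(struct toy_netdev), TOY_ALIGN);
}
```

Because both structures live in one allocation, a single `free()` of the device pointer releases the adapter as well, which is why a driver never frees its private data separately.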
# User space to Kernel space
## procfs
Mostly used as a read-only export of kernel variables.
E.g. `arp_proc_init` creates the entry `/proc/net/arp`.
## sysctl: Directory /proc/sys
Exports kernel variables that can be both read and modified. Entries are registered with `register_sysctl_table`.
Example: `/proc/sys/net/ipv4/ip_forward`
```clike
static struct ctl_table ctl_forward_entry[] = {
	{
		.procname     = "ip_forward",
		.data         = &ipv4_devconf.data[
					IPV4_DEVCONF_FORWARDING - 1],
		.maxlen       = sizeof(int),
		.mode         = 0644,
		.proc_handler = devinet_sysctl_forward,
		.extra1       = &ipv4_devconf,
		.extra2       = &init_net,
	},
};
```
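
From user space, such an entry is an ordinary file under `/proc/sys`. The following is a small sketch of reading one integer value; `read_sysctl_int` is a hypothetical helper name, and the code assumes the entry holds a single decimal integer, as `ip_forward` does.

```c
#include <stdio.h>

/*
 * Read one integer sysctl value through its /proc/sys file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * On failure *value is left untouched.
 */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int tmp;
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", &tmp) == 1);
	fclose(f);
	if (!ok)
		return -1;
	*value = tmp;
	return 0;
}
```

A call such as `read_sysctl_int("/proc/sys/net/ipv4/ip_forward", &v)` should yield 0 or 1 in `v` on a typical system. Both reads and writes of the file are served by the entry's `.proc_handler`, here `devinet_sysctl_forward`; writing usually requires root.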
:::info
[Graphviz Drawing Guide](https://www.tonyballantyne.com/graphs.html#orgheadline19)
:::
### Core directories in /proc/sys/net
```graphviz
digraph hierarchy {
nodesep=1.0 // increases the separation between nodes
node [color=Black,fontname=Courier,shape=box] //All nodes will have this shape and colour
edge [color=Black] //All the lines look like this
root[label="/"]
sys [label="sys"]
proc[label="proc"]
net [label="net"]
bridge[label="bridge (ch17)"]
ipv4 [label="ipv4 (ch23)"]
core[label="core (ch12)"]
neigh[label="neigh (ch29)"]
conf[label="conf (ch36)"]
route[label="route (ch36)"]
root->proc
proc->sys
sys->net
net->{bridge, ipv4, core}
ipv4->{neigh, conf, route}
}
```
### Creation of the core directories in /proc/sys/net

## IOCTL
### Dispatching ioctl commands

## Netlink Socket
The Netlink socket, well described in [RFC 3549](https://tools.ietf.org/html/rfc3549), represents the preferred interface
between user space and kernel for IP networking configuration.
### `int socket(int domain, int type, int protocol)`
- domain: `AF_NETLINK`, the kernel/user-space interface family
- type: `SOCK_RAW` or `SOCK_DGRAM`; Netlink is datagram-oriented and does not distinguish between the two
- protocol: defined in `include/uapi/linux/netlink.h`, such as `NETLINK_ROUTE`
- destination endpoint address: PID or multicast group ID
- Notifications in multicast groups:
    The groups are listed as `RTMGRP_XXX` in `include/uapi/linux/rtnetlink.h`. Among them are the `RTMGRP_IPV4_ROUTE` and `RTMGRP_NEIGH` groups, used respectively for notifications about changes to the IPv4 routing table and to the neighbour (L3-to-L2 address mapping) table.
- The `NETLINK_ROUTE` service defines several message types, such as `RTM_NEWLINK`, `RTM_DELLINK`, `RTM_GETLINK`, `RTM_NEWADDR`, `RTM_DELADDR`, `RTM_NEWROUTE`, `RTM_DELROUTE`, etc.
:::success
One of the advantages of Netlink over other user-kernel interfaces such as ioctl is that the kernel can initiate a transmission instead of just returning information in answer to user-space requests.
:::
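
The `socket()` parameters and message types listed above can be put together as follows. This is a user-space sketch under assumptions, not a complete Netlink client: `open_rtnetlink`, `build_link_dump` and `struct link_dump_req` are hypothetical names, and error handling is reduced to returning -1.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: Netlink header plus the ifinfomsg
 * payload that an RTM_GETLINK request carries. */
struct link_dump_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
};

/* Open a NETLINK_ROUTE socket and subscribe to the given RTMGRP_*
 * multicast groups (0 means unicast only). Returns the fd or -1. */
static int open_rtnetlink(unsigned int groups)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_pid = 0;		/* let the kernel pick our port ID */
	addr.nl_groups = groups;	/* e.g. RTMGRP_LINK | RTMGRP_IPV4_ROUTE */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Fill in a request asking the kernel to dump all links. */
static void build_link_dump(struct link_dump_req *req, unsigned int seq)
{
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->nlh.nlmsg_type = RTM_GETLINK;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ifm.ifi_family = AF_UNSPEC;
}
```

A socket bound with a non-zero group mask receives kernel-initiated messages for those groups via `recv()` without sending anything first, which is the behaviour the note above contrasts with ioctl.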
For more detail, see [Netlink](/YvSS8uXIS4yZpmWUfLBjWQ).
# Notification Chains
Notification chains follow a publish-and-subscribe model:
- The notified are the subsystems that ask to be notified about the event and that
provide a callback function to invoke.
- The notifier is the subsystem that experiences an event and calls the callback
function.
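
The model can be sketched in user space as a priority-ordered linked list of callbacks. All `toy_*` names below are hypothetical and only loosely modelled on the kernel's `notifier_block`; locking and the kernel's return-code conventions are deliberately left out.

```c
#include <stddef.h>

#define TOY_EVENT_UP 1UL	/* hypothetical event code */

/* One subscriber: a callback plus its position in the chain. */
struct toy_notifier_block {
	int (*notifier_call)(struct toy_notifier_block *nb,
			     unsigned long event, void *data);
	struct toy_notifier_block *next;
	int priority;		/* higher priority is called first */
};

struct toy_notifier_head {
	struct toy_notifier_block *head;
};

/* The notified side: insert the block, keeping the list sorted by
 * descending priority (equal priorities keep registration order). */
static void toy_chain_register(struct toy_notifier_head *nh,
			       struct toy_notifier_block *nb)
{
	struct toy_notifier_block **p = &nh->head;

	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
}

/* The notifier side: invoke every callback; returns how many ran. */
static int toy_call_chain(struct toy_notifier_head *nh,
			  unsigned long event, void *data)
{
	struct toy_notifier_block *nb;
	int calls = 0;

	for (nb = nh->head; nb; nb = nb->next) {
		nb->notifier_call(nb, event, data);
		calls++;
	}
	return calls;
}

/* Example subscriber: counts TOY_EVENT_UP events in *data. */
static int toy_count_event(struct toy_notifier_block *nb,
			   unsigned long event, void *data)
{
	(void)nb;
	if (event == TOY_EVENT_UP)
		(*(int *)data)++;
	return 0;
}
```

The notifier never needs to know who subscribed: it only walks the list, so new subsystems can react to an event without changes to the code that raises it.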
# Ingress path
