# [Intern] 14/09/2022 To summarize, step by step, how packets are transferred within the kernel. Use an illustration/flowchart to summarize.

###### tags: `BMW-Lab`, `Intern`

:::success
**Goal:** To summarize, step by step, how packets are transferred within the kernel, with an illustration/flowchart.
:::

:::success
**References**
{%youtube T5TvPRQFNoM %}
:::

## Background

The Kernel NIC Interface (KNI) sample application uses two threads in user space for each physical NIC port being used, and allocates one or more KNI devices for each physical NIC port with the kernel module's support. For a given physical NIC port, one thread reads from the port and writes to the KNI devices, and another thread reads from the KNI devices and writes the data, unmodified, to the physical NIC port. It is recommended to configure one KNI device for each physical NIC port. Configuring more than one KNI device per physical NIC port is intended only for performance testing, or to work together with VMDq support in the future.

The packet flow through the Kernel NIC Interface application is shown in Picture 1 below.

![](https://imgur.com/Glyzpre.png)

## Packet processing in communication networks

Packet processing in communication networks involves applying different functions and algorithms to ensure that each packet can traverse the network efficiently. It includes steps such as packet identification, inspection, and manipulation.

The network packet is a fundamental component of a packet-switched network. It consists of three main parts: the header, the payload, and the trailer. The header contains information such as the sender address, the destination address, the length of the packet, and the packet sequence number.

Packet processing functions and algorithms range from fairly simple to quite complex. A basic routing function is a good example of simple packet processing. More complex packet processing functions involve applying different policies, charging, and manipulating the packets.

## User plane packet processing nodes

While micro-sleeps can be applied to all types of packet processing, our research indicates that they are particularly impactful when used in the user plane function (UPF). The UPF includes several complex processing functions, such as GPRS Tunneling Protocol user-plane encapsulation and decapsulation. Other UPF functions include access control, bearer lookup, and QoS mapping and marking. It also enforces rules for guaranteed bitrate and maximum bitrate. Packets passing through the UPF are also subject to online/offline charging, that is, the application of different charging policies. Instructions on how to process packets for different user equipment come from the session management function/policy control function.

The processing of packets from different users occurs independently for the most part. All the packet processing functions and algorithms must meet real-time system requirements on characteristics such as jitter and packet latency.

## Kernel packet processing

The built-in, native networking support in computers is implemented as part of the operating system (OS) kernel and uses the POSIX (Portable Operating System Interface) socket as its standard application programming interface (API). In kernel packet processing, user-mode application programs use the POSIX socket API to send and receive packets, while the kernel driver/scheduler handles the interaction with the network interface card (NIC). If the network is not ready (no packet has arrived), the kernel can decide to block the application while it waits. Picture 2 illustrates how kernel packet processing works, with "App" representing user-mode applications. In this model, the OS is aware of the idle time, and OS power management can work to save power.

![](https://imgur.com/WUaFI9n.png)
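As a concrete (illustrative) example of this model, the minimal sketch below receives one UDP packet through the POSIX socket API. The port number and buffer size are arbitrary choices, and error handling is omitted; the blocking `recvfrom()` call is the point at which the kernel can put the application to sleep until a packet arrives:

```c
/* Minimal sketch of kernel packet processing as seen from user space:
 * a blocking UDP receive through the POSIX socket API. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
	int sock = socket(AF_INET, SOCK_DGRAM, 0);	/* UDP socket */
	struct sockaddr_in addr;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(9000);			/* arbitrary example port */
	bind(sock, (struct sockaddr *)&addr, sizeof(addr));

	char buf[2048];
	/* recvfrom() blocks until the kernel has copied a packet into buf;
	 * while blocked, the OS knows the core is idle and can save power. */
	ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
	printf("received %zd bytes\n", n);

	close(sock);
	return 0;
}
```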
Kernel packet processing has multiple disadvantages. Most importantly, the high overhead of the OS calls and the copying of packets to/from kernel space make it hard to scale to high networking speeds and high packet rates. For example, the time between 1,534-byte packets on 200Gbit Ethernet is 61ns. This roughly corresponds to the execution time of a single system call, which leaves no time to perform the packet processing itself. The speed of today's network cards makes kernel packet processing challenging at best, and impossible at worst.

Power management in the kernel can also be problematic at high networking speeds. The OS kernel monitors and saves energy based on core utilization. This is a much slower mechanism that cannot follow the rapid changes in high-speed network traffic, resulting in queue buildups, delays, and even packet drops.

## Kernel-bypass packet processing

Kernel-bypass packet processing eliminates the kernel execution overhead by moving packet processing directly into user space, as shown on the right side of the figure. In this scenario, the OS can dedicate a network interface to an application such as the Data Plane Development Kit (DPDK), which can program the HW from user space. When DPDK is used, packets are received directly in user-space memory without kernel interaction, and all network-related interrupts are disabled. It is up to the application to make sure that the packet queues and ring buffers to the NIC are checked frequently for new packets.

To avoid packet drops and reduce packet latency, a DPDK-based application is designed to check the packet ring buffer in busy-waiting mode, where a complete core is assigned to the application thread. This enables packet processing without any context switches or HW interrupts, and with minimal cache pollution, as the application thread is the only user of the core. Measurements using DPDK as a kernel bypass for packet processing show that millions of packets can be received on a single core and pipelined to other cores for further processing.

Kernel-bypass packet processing solves the problem of handling packet processing at speeds of 200 Gigabit Ethernet and beyond, but the busy-waiting technique of packet reception comes at a cost: no energy savings are achieved if all the packet processing cores are 100 percent utilized in busy-waiting mode. Since Ethernet packets are transferred directly into the user-mode application's memory, power management needs to be performed in user space, within the DPDK library. It is important to note that DPDK interacts with kernel power management when changing the core frequency.
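The following is a rough sketch of such a busy-waiting receive loop using DPDK's receive API, `rte_eth_rx_burst()`. It assumes port 0 has already been bound to DPDK and configured/started (mempool creation and queue setup are omitted for brevity), so treat it as an outline rather than a complete application:

```c
/* Sketch of a DPDK busy-wait receive loop. Port/queue initialization
 * is assumed to have been done elsewhere and is omitted here. */
#include <stdlib.h>
#include <stdint.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_debug.h>

#define BURST_SIZE 32

int main(int argc, char **argv)
{
	/* Initialize the Environment Abstraction Layer: DPDK takes over
	 * the cores and NIC ports passed on the command line. */
	if (rte_eal_init(argc, argv) < 0)
		rte_exit(EXIT_FAILURE, "EAL initialization failed\n");

	uint16_t port_id = 0;		/* example port, assumed configured */
	struct rte_mbuf *bufs[BURST_SIZE];

	/* Busy-wait loop: the core polls the NIC ring buffer continuously,
	 * so packets are picked up with no interrupts and no context
	 * switches -- at the cost of 100% utilization of this core. */
	for (;;) {
		uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
		for (uint16_t i = 0; i < nb_rx; i++) {
			/* ...process bufs[i] or pipeline it to another core... */
			rte_pktmbuf_free(bufs[i]);
		}
	}
	return 0;
}
```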
## sk_buff structure

The struct sk_buff (socket buffer) describes a network packet. The structure fields contain information about the header and the packet contents, the protocols used, the network device used, and pointers to other struct sk_buff instances. A summary of the content of the structure is presented below:

```c
struct sk_buff {
	union {
		struct {
			/* These two members must be first. */
			struct sk_buff		*next;
			struct sk_buff		*prev;

			union {
				struct net_device	*dev;
				/* Some protocols might use this space to store information,
				 * while device pointer would be NULL.
				 * UDP receive path is one user.
				 */
				unsigned long		dev_scratch;
			};
		};
		struct rb_node		rbnode; /* used in netem & tcp stack */
	};
	struct sock		*sk;

	union {
		ktime_t		tstamp;
		u64		skb_mstamp;
	};
	/*
	 * This is the control buffer. It is free to use for every
	 * layer. Please put your private variables there. If you
	 * want to keep them across layers you have to do a skb_clone()
	 * first. This is owned by whoever has the skb queued ATM.
	 */
	char			cb[48] __aligned(8);

	union {
		struct {
			unsigned long	_skb_refdst;
			void		(*destructor)(struct sk_buff *skb);
		};
		struct list_head	tcp_tsorted_anchor;
	};

	/* ... */

	unsigned int		len, data_len;
	__u16			mac_len, hdr_len;

	/* ... */

	__be16			protocol;
	__u16			transport_header;
	__u16			network_header;
	__u16			mac_header;

	/* private: */
	__u32			headers_end[0];
	/* public: */

	/* These elements must be at the end, see alloc_skb() for details. */
	sk_buff_data_t		tail;
	sk_buff_data_t		end;
	unsigned char		*head, *data;
	unsigned int		truesize;
	refcount_t		users;
};
```

where:

- `next` and `prev` are pointers to the next and previous elements in the buffer list;
- `dev` is the device that sends or receives the buffer;
- `sk` is the socket associated with the buffer;
- `destructor` is the callback that deallocates the buffer;
- `transport_header`, `network_header`, and `mac_header` are offsets between the beginning of the packet and the beginning of the various headers in the packet. They are maintained internally by the various processing layers through which the packet passes.

To get pointers to the headers, use one of the following functions: `tcp_hdr()`, `udp_hdr()`, `ip_hdr()`, etc. In principle, each protocol provides a function to get a reference to the header of that protocol within a received packet. Keep in mind that the `network_header` field is not set until the packet reaches the network layer, and the `transport_header` field is not set until the packet reaches the transport layer.

The structure of an IP header (`struct iphdr`) has the following fields:

```c
struct iphdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
	__u8	ihl:4,
		version:4;
#elif defined(__BIG_ENDIAN_BITFIELD)
	__u8	version:4,
		ihl:4;
#else
#error	"Please fix <asm/byteorder.h>"
#endif
	__u8	tos;
	__be16	tot_len;
	__be16	id;
	__be16	frag_off;
	__u8	ttl;
	__u8	protocol;
	__sum16	check;
	__be32	saddr;
	__be32	daddr;
	/* The options start here. */
};
```

where:

- `protocol` is the transport layer protocol used;
- `saddr` is the source IP address;
- `daddr` is the destination IP address.

## Conclusion

Kernel-bypass mechanisms transfer packets directly between user applications and high-speed network cards: Ethernet packets are transferred straight into the user-mode application's memory, without passing through the kernel.
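As a closing, illustrative sketch (not from the material above), the hypothetical kernel function below ties the two structures together: it uses `ip_hdr()` to obtain the `struct iphdr` of a packet held in a `struct sk_buff`, which is only valid once the packet has reached the network layer:

```c
#include <linux/skbuff.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/printk.h>

/* Hypothetical helper for illustration: inspect the IP header of a
 * received packet. Only valid after the packet has reached the network
 * layer, since skb->network_header is not set before that point. */
static void inspect_ip_packet(struct sk_buff *skb)
{
	/* ip_hdr() derives the pointer from the network_header offset */
	struct iphdr *iph = ip_hdr(skb);

	pr_info("IP packet: saddr=%pI4 daddr=%pI4 proto=%u tot_len=%u\n",
		&iph->saddr, &iph->daddr, iph->protocol, ntohs(iph->tot_len));

	/* protocol tells us which transport header follows (see struct iphdr) */
	if (iph->protocol == IPPROTO_UDP)
		pr_info("UDP header at offset %u from head\n", skb->transport_header);
}
```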