### :school: TEEP 2024_RT LAB_ORAN DPDK
#### :book: Technology Background
:::success
List the essential information of this chapter.
1. DPDK Installation on Ubuntu 20.04
2. VFIO and UIO
3. EAL
4. Enabling Additional Functionality
5. How to get best performance with NICs on Intel platforms
:::
---
| DPDK Ver. | 23.11 |
| -------- | -------- |
| **Ubuntu Ver.** | **20.04** |
## 1. DPDK Installation on Ubuntu 20.04
### 1.1 Update System
```=
sudo apt update
sudo apt upgrade -y
```
### 1.2 Install dependencies:
```=
sudo apt install build-essential cmake meson ninja-build python3 python3-pip python3-pyelftools
```
### 1.3 Install additional dependencies for DPDK:
```=
sudo apt install libnuma-dev libpcap-dev
```
### 1.4 Download DPDK 20.11:
```=
wget http://fast.dpdk.org/rel/dpdk-20.11.tar.xz
tar -xvf dpdk-20.11.tar.xz
```

### 1.5 Configure DPDK using Meson:
```=
cd dpdk-20.11
meson build
```
### 1.6 Build DPDK using Ninja:
```=
ninja -C build
```
### 1.7 Install DPDK:
```=
sudo ninja -C build install
```
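To verify the installation, you can refresh the shared-library cache and query the installed version with pkg-config (a quick sanity check; it assumes the default /usr/local install prefix and that its pkgconfig directory is on PKG_CONFIG_PATH):
```=
sudo ldconfig
pkg-config --modversion libdpdk
```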
### 1.8 Binding Network Ports to DPDK
#### 1.8.1 UIO Drivers
```=
sudo insmod build/kmod/igb-uio.ko
```
:::danger
**Problem:**
`insmod: ERROR: could not load module build/kmod/igb-uio.ko: No such file or directory`
**Info:** I am installing and configuring DPDK locally on my laptop.
I tried several ways to solve this problem, including searching for the correct path, but without result; other approaches gave the same error. At first I thought it happened because I was using a laptop, since some of the sources I read mentioned processor types or laptop systems that do not support DPDK.

:::
:::success
**Troubleshooting**
After some research into this problem, I found a source stating that igb_uio is no longer shipped with the DPDK source tree; if you need to bind devices to igb_uio, build and install the driver from the separate dpdk-kmods repository:
```=
cd /DPDK
sudo apt install git
git clone git://dpdk.org/dpdk-kmods dpdk_igb_uio_driver
cd ./dpdk_igb_uio_driver/linux/igb_uio
make clean && make
# Load the igb_uio module (uio must be loaded first)
sudo modprobe uio
sudo insmod igb_uio.ko
```
:::
#### 1.8.2 Binding Network Ports
Once the UIO driver is loaded, bind the network ports to it. DPDK provides `usertools/dpdk-devbind.py` for managing devices.
Find the ports available for binding to DPDK by running the tool with the `-s` (status) option.
```
# ./usertools/dpdk-devbind.py --status
```

The output shows that the network ports are bound to kernel drivers, not to DPDK. To bind a port to DPDK, run `dpdk-devbind.py`, specifying a driver and a device ID. The device ID is either the PCI address of the device or a friendlier name such as eth0, as reported by the `ifconfig` or `ip` command.
```
./usertools/dpdk-devbind.py --bind=uio_pci_generic enp0s8
```
After binding a port, you can confirm that it is under the DPDK-compatible driver; it will no longer be visible to `ifconfig` or `ip`.
```
# ./usertools/dpdk-devbind.py -s
```

### 1.9 Unbind Network Ports
```=
# ./usertools/dpdk-devbind.py --unbind 0000:00:08.0
```
After unbinding the device, it should appear in the "Available devices" list, and you can bind it to the DPDK igb_uio or vfio-pci driver if needed.
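If you later want to return the port to the kernel network stack, bind it back to its original kernel driver; the driver name below (e1000) is only an illustration, and the `unused=` field in the `--status` output shows which driver each device was using.
```=
# ./usertools/dpdk-devbind.py --bind=e1000 0000:00:08.0
```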
---
## 2. VFIO and UIO
With DPDK, PCI device setup and link status interrupts are handled in user space instead of by the kernel.
To work properly, different PMDs may require different kernel drivers. Depending on the PMD being used, a corresponding kernel driver should be loaded and the network ports bound to that driver.
* **VFIO** driver is a robust and secure driver that relies on IOMMU protection.
* **UIO** is a small kernel module to set up the device, map device memory to user space, and register interrupts.
DPDK has provided VFIO support since release 1.7, so it is recommended that vfio-pci be used as the kernel module for DPDK-bound ports in all cases. It is a more robust and secure driver than UIO, relying on IOMMU protection. To make use of VFIO, the vfio-pci module must be loaded.
If an IOMMU is unavailable, vfio-pci can be used in no-IOMMU mode. If, for some reason, VFIO itself is unavailable, the UIO-based modules igb_uio and uio_pci_generic may be used instead.
:::info
notes:
**IOMMU (Input/Output Memory Management Unit):** a feature of modern platforms that maps device-visible I/O addresses to physical memory addresses, much like the MMU does for the CPU. On Linux, the IOMMU provides extra protection by letting the system control direct memory access (DMA) requests from devices such as USB, network, and storage controllers, which helps prevent a misbehaving or malicious device from writing to arbitrary memory.
:::
### 2.1 VFIO
VFIO is a robust and secure driver that relies on IOMMU protection. To make use of VFIO, the vfio-pci module must be loaded:
```
sudo modprobe vfio-pci
```
To make use of full VFIO functionality, both kernel and BIOS must support and be configured to use IO virtualization (such as Intel® VT-d).
For proper operation of VFIO when running DPDK applications as a non-privileged user, correct permissions should also be set up. For more information, please refer to Running [DPDK Applications Without Root Privileges](https://doc.dpdk.org/guides-23.11/linux_gsg/enable_func.html#running-without-root-privileges).
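A quick way to check whether an IOMMU is actually active (assuming VT-d is enabled in the BIOS and `intel_iommu=on` is on the kernel command line) is to look for populated IOMMU groups:
```
# Should list one directory per IOMMU group; an empty result usually means no IOMMU is active
ls /sys/kernel/iommu_groups
```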
**VFIO no-IOMMU mode**
If there is no IOMMU available on the system, VFIO can still be used, but it has to be loaded with an additional module parameter:
```
modprobe vfio enable_unsafe_noiommu_mode=1
```
Alternatively, one can also enable this option in an already loaded kernel module:
```
echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
```
After that, VFIO can be used with hardware devices as usual.
:::info
It may be required to unload all VFIO-related modules before probing the module again with the `enable_unsafe_noiommu_mode=1` parameter.
:::
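As a sketch, one possible unload-and-reload sequence looks like this (module names can vary slightly between kernel versions):
```
sudo rmmod vfio_pci vfio_iommu_type1 vfio
sudo modprobe vfio enable_unsafe_noiommu_mode=1
sudo modprobe vfio-pci
```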
:::warning
Since no-IOMMU mode forgoes IOMMU protection, it is inherently unsafe. That said, it does make it possible for the user to keep the degree of device access and programming that VFIO has, in situations where IOMMU is not available.
:::
### 2.2 UIO
In situations where using VFIO is not an option, there are alternative drivers one can use. In many cases, the standard uio_pci_generic module included in the Linux kernel can be used as a substitute for VFIO. This module can be loaded using the command:
```
sudo modprobe uio_pci_generic
```
Using UIO drivers is inherently unsafe because this method lacks IOMMU protection, and it can only be done by the root user.
:::info
uio_pci_generic module doesn’t support the creation of virtual functions.
For some devices which lack support for legacy interrupts, e.g. virtual function (VF) devices, the igb_uio module may be needed in place of uio_pci_generic.
:::
As an alternative to uio_pci_generic, there is the igb_uio module, which can be found in the dpdk-kmods repository. It can be loaded as shown below:
```
sudo modprobe uio
# run insmod from the directory where igb_uio.ko was built (dpdk-kmods/linux/igb_uio)
sudo insmod igb_uio.ko
```
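A quick check that the modules are actually loaded:
```
lsmod | grep -i uio
```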
---
## 3. EAL (Environment Abstraction Layer)
The EAL, or Environment Abstraction Layer, is the main concept behind the DPDK. The EAL is a set of programming tools that lets the DPDK work in a specific hardware environment and under a specific operating system. In the official DPDK repository, the libraries and drivers that are part of the EAL live in the `rte_eal` directory.
Drivers and libraries for Linux and the BSD systems are kept in this directory. It also contains a set of header files for various processor architectures: ARM, x86, TILE64, and PPC64.
The Environment Abstraction Layer (EAL) is responsible for gaining access to low-level resources such as hardware and memory space. It provides a generic interface that hides the environment specifics from the applications and libraries. It is the responsibility of the initialization routine to decide how to allocate these resources (that is, memory space, devices, timers, consoles, and so on).
We access software in the EAL when we compile DPDK from source. In older DPDK releases this was done with the legacy make-based build (recent releases, including the one installed above, use meson and ninja instead):
```
make config T=x86_64-native-linuxapp-gcc
```
:::info
One can guess that this command will compile DPDK for Linux in an x86_64 architecture. The EAL is what binds the DPDK to applications. **All of the applications that use the DPDK must include the EAL’s header files.**
The most commonly used of these include:
* rte_lcore.h — manages processor cores and sockets;
* rte_memory.h — manages memory;
* rte_pci.h — provides the interface access to PCI address space;
* rte_debug.h — provides trace and debug functions (logging, dump_stack, and more);
* rte_interrupts.h — processes interrupts.
:::
More details on this structure and EAL functions can be found in the [official documentation](https://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html).
Typical services expected from the EAL are:
* DPDK Loading and Launching: The DPDK and its application are linked as a single application and must be loaded by some means.
* Core Affinity/Assignment Procedures: The EAL provides mechanisms for assigning execution units to specific cores as well as creating execution instances.
* System Memory Reservation: The EAL facilitates the reservation of different memory zones, for example, physical memory areas for device interactions.
* Trace and Debug Functions: Logs, dump_stack, panic and so on.
* Utility Functions: Spinlocks and atomic counters that are not provided in libc.
* CPU Feature Identification: Determine at runtime if a particular feature, for example, Intel® AVX is supported. Determine if the current CPU supports the feature set that the binary was compiled for.
* Interrupt Handling: Interfaces to register/unregister callbacks to specific interrupt sources.
* Alarm Functions: Interfaces to set/remove callbacks to be run at a specific time.
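
In practice, these services are configured through EAL command-line arguments, which precede the application's own arguments and are separated from them by `--`. Below is a minimal sketch using the bundled dpdk-testpmd application; the core list and channel count are only examples and should be adjusted to your system:
```
# -l 0-3 : run on logical cores 0-3
# -n 4   : assume 4 memory channels
# -- -i  : everything after "--" goes to testpmd itself (-i starts interactive mode)
sudo dpdk-testpmd -l 0-3 -n 4 -- -i
```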
---
## 4. Enabling Additional Functionality
### 4.1 Running DPDK Applications Without Root Privileges
The following sections describe generic requirements and configuration for running DPDK applications as non-root. There may be additional requirements documented for some drivers.
#### 4.1.1 Hugepages
Huge pages are a mechanism for allocating and managing memory efficiently on Linux. Huge pages are much larger than regular pages, which reduces memory-management overhead and improves the performance of applications that need large amounts of contiguous memory, such as DPDK. Another benefit is reduced memory fragmentation. However, using huge pages requires additional configuration and management, and the allocation should be planned carefully because of its static nature.
To run DPDK applications as non-root, first, hugepages need to be reserved as root using the command :
```
sudo dpdk-hugepages.py --reserve 1G
```
If multi-process support is not needed, running the application with `--in-memory` bypasses the hugepage mount point and files. Otherwise, the hugepage directory must be granted write access for unprivileged users. A practical approach for managing multiple applications that use huge pages is to mount the filesystem with group permissions and assign a supplementary group to each application or container. One way to do this is with the provided script:
```
export HUGEDIR=$HOME/huge-1G
mkdir -p $HUGEDIR
sudo dpdk-hugepages.py --mount --directory $HUGEDIR --user `id -u` --group `id -g`
```
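You can then confirm the reservation and the new mount point (the script also provides a `--show` option):
```
dpdk-hugepages.py --show
mount | grep "$HUGEDIR"
```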
#### 4.1.2 Resource Limits
When running as non-root user, there may be some additional resource limits that are imposed by the system. Specifically, the following resource limits may need to be adjusted in order to ensure normal DPDK operation:
* RLIMIT_LOCKS (number of file locks that can be held by a process)
* RLIMIT_NOFILE (number of open file descriptors that can be held open by a process)
* RLIMIT_MEMLOCK (amount of pinned pages the process is allowed to have)
The above limits can usually be adjusted by editing the /etc/security/limits.conf file and rebooting.
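As an illustration, the relevant entries in /etc/security/limits.conf might look like the sketch below; the user name dpdkuser and the values are placeholders to be tuned to your workload:
```
# Hypothetical /etc/security/limits.conf entries for a non-root DPDK user
dpdkuser  soft  memlock  unlimited
dpdkuser  hard  memlock  unlimited
dpdkuser  soft  nofile   4096
dpdkuser  hard  nofile   4096
dpdkuser  soft  locks    4096
dpdkuser  hard  locks    4096
```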
#### 4.1.3 Device Control
If the HPET is to be used, /dev/hpet permissions must be adjusted.
For the vfio-pci kernel driver, the permissions of the following Linux file system objects should be adjusted:
* The VFIO device file, /dev/vfio/vfio
* The directories under /dev/vfio that correspond to the IOMMU group numbers of devices intended to be used by DPDK, for example, /dev/vfio/50
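
For example, permissions could be granted to a dedicated group; the group name dpdk and the IOMMU group number 50 below are illustrative:
```
# Hypothetical: let members of group "dpdk" use the VFIO container and IOMMU group 50
sudo chown root:dpdk /dev/vfio/vfio /dev/vfio/50
sudo chmod 660 /dev/vfio/vfio /dev/vfio/50
```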
### 4.2 Power Management and Power Saving Functionality
Enhanced Intel SpeedStep® Technology must be enabled in the platform BIOS if the power management feature of DPDK is to be used. Otherwise, the sysfs folder /sys/devices/system/cpu/cpu0/cpufreq will not exist, and CPU frequency-based power management cannot be used. Consult the relevant BIOS documentation to determine how these settings can be accessed.
For example, on some Intel reference platform BIOS variants, the path to Enhanced Intel SpeedStep® Technology is:
```
Advanced
-> Processor Configuration
-> Enhanced Intel SpeedStep® Tech
```
In addition, C3 and C6 should be enabled as well for power management. The path of C3 and C6 on the same platform BIOS is:
```
Advanced
-> Processor Configuration
-> Processor C3 Advanced
-> Processor Configuration
-> Processor C6
```
:::info
Features C3 and C6 are low-power states available on some processors to reduce power consumption when the processor is not in intensive use. These modes are intended to improve the energy efficiency of computer systems, especially during idle conditions or when the processor's workload is low.
* C3 (C-State 3): C3 mode is one of the low-power modes where the processor lowers its working frequency and reduces power consumption when no processing tasks are running. The processor remains active and ready to respond when needed, but operates at a lower power level.
* C6 (C-State 6): C6 mode is deeper compared to C3, where the processor turns off some or all parts of the processor core that are not being used for a while. This can include shutting down unnecessary circuits and reducing the working frequency to zero to save power significantly. Although more energy efficient, exiting C6 mode takes longer compared to C3 mode.
These two modes, C3 and C6, are part of the power management technology used to optimize energy efficiency in modern computer systems, especially during times of inconsistent workload or when the system is idle. Activation of the C3 and C6 features can help to significantly reduce power consumption, especially in environments where energy efficiency is a priority.
:::
### 4.3 Using Linux Core Isolation to Reduce Context Switches
While the threads used by a DPDK application are pinned to logical cores on the system, it is possible for the Linux scheduler to run other tasks on those cores. To help prevent additional workloads, timers, RCU processing and IRQs from running on those cores, it is possible to use the Linux kernel parameters isolcpus, nohz_full, irqaffinity to isolate them from the general Linux scheduler tasks.
For example, if a given system has cores 0-7 and DPDK applications are to run on logical cores 2, 4 and 6, the following should be added to the kernel parameter list:
```
isolcpus=2,4,6 nohz_full=2,4,6 irqaffinity=0,1,3,5,7
```
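After rebooting with these parameters, you can verify that they took effect:
```
# Show the full kernel command line and the list of isolated CPUs (should include 2,4,6 here)
cat /proc/cmdline
cat /sys/devices/system/cpu/isolated
```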
### 4.4 High Precision Event Timer (HPET) Functionality
DPDK can support the system HPET as a timer source rather than the system default timers, such as the core Time-Stamp Counter (TSC) on x86 systems. To enable HPET support in DPDK:
* Ensure that HPET is enabled in BIOS settings.
* Enable HPET_MMAP support in the kernel configuration. Note that this may involve a kernel rebuild, as many common Linux distributions do not have this setting enabled by default in their kernel builds.
* Enable DPDK support for HPET by using the build-time meson option `use_hpet`, for example, `meson configure -Duse_hpet=true`
:::info
HPET stands for High Precision Event Timer, which is a type of timer used in computer systems. HPET is designed to provide more accurate and high-precision timing compared to other conventional timers, such as the Time-Stamp Counter (TSC) found on x86 processors.
:::
To use the rte_get_hpet_cycles() and rte_get_hpet_hz() APIs in the application, as well as make HPET the default time source for the rte_timer library, the application must call the rte_eal_hpet_init() API on initialization. This API call will ensure that HPET is accessible, and will return an error if it is not available.
For applications that require timing APIs but do not specifically require HPET timers, it is recommended to use the rte_get_timer_cycles() and rte_get_timer_hz() APIs instead of the HPET-specific APIs. These APIs are generic and can work with both TSC and HPET timing sources, depending on what is requested by the application in the rte_eal_hpet_init() call, if any, and system availability at runtime.
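As a quick check before rebuilding the kernel, you can see whether the BIOS and kernel expose HPET as a clock source at all:
```
# "hpet" should appear in this list if it is available
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
```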
## 5. How to get best performance with NICs on Intel platforms
### 5.1 Hardware and Memory Requirements
For best performance use an Intel Xeon class server system such as Ivy Bridge, Haswell or newer. Ensure that each memory channel has at least one memory DIMM inserted, and that the memory size for each is at least 4GB. Note: this has one of the most direct effects on performance.
You can check the memory configuration using `dmidecode` as follows:
```
dmidecode -t memory | grep Locator
Locator: DIMM_A1
Bank Locator: NODE 1
Locator: DIMM_A2
Bank Locator: NODE 1
Locator: DIMM_B1
Bank Locator: NODE 1
Locator: DIMM_B2
Bank Locator: NODE 1
...
Locator: DIMM_G1
Bank Locator: NODE 2
```
The sample output above shows a total of 8 channels, from A to H, where each channel has 2 DIMMs.
You can also use `dmidecode` to determine the memory frequency:
```
dmidecode -t memory | grep Speed
Speed: 2133 MHz
Configured Clock Speed: 2134 MHz
Speed: Unknown
Configured Clock Speed: Unknown
Speed: 2133 MHz
Configured Clock Speed: 2134 MHz
Speed: Unknown
```
#### 5.1.1 Network Interface Card Requirements
Use a DPDK supported high end NIC such as the Intel XL710 40GbE.
Make sure each NIC has been flashed with the latest version of NVM/firmware.
Use PCIe Gen3 slots, such as Gen3 x8 or Gen3 x16, because PCIe Gen2 slots don't provide enough bandwidth for 2 x 10GbE and above. You can use lspci to check the speed of a PCI slot as follows:
```
lspci -s 03:00.1 -vv | grep LnkSta
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- ...
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ ...
```
When inserting NICs into PCI slots, always check the caption, such as CPU0 or CPU1, which indicates the socket the slot is connected to.
Care should be taken with NUMA. If you are using 2 or more ports from different NICs, it is best to ensure that these NICs are on the same CPU socket. An example of how to determine this is shown further below.
#### 5.1.2 BIOS Settings
It is important to customize the BIOS settings to the needs of the application under test. Usually, choosing the high performance option as the CPU power policy is a reasonable first step. Taking into account the use of Turbo Boost to increase the core-processor frequency is also recommended. Also, when testing the physical functions of the NIC, be sure to disable all virtualization options, and enable VT-d if you plan to use VFIO.
#### 5.1.3 Linux boot command line
The following are some recommendations on GRUB boot settings:
* Use the default grub file as a starting point.
* Reserve 1G huge pages via grub configurations. For example to reserve 8 huge pages of 1G size:
`default_hugepagesz=1G hugepagesz=1G hugepages=8`
* Isolate CPU cores which will be used for DPDK. For example:
`isolcpus=2,3,4,5,6,7,8`
* If you want to use VFIO, add the following additional grub parameters:
`iommu=pt intel_iommu=on`
:::info
The recommendations provide guidance for configuration of GRUB (Grand Unified Bootloader) settings to improve performance and DPDK (Data Plane Development Kit) configuration.
* Huge Pages Reservation: Using GRUB configuration, you can allocate and reserve huge pages at boot time. The example above shows how to configure GRUB to reserve 8 huge pages with a size of 1G.
* Isolate CPU Cores: To improve DPDK performance, it is recommended to isolate the CPU cores that will be used by DPDK. This is done by using the isolcpus option in the GRUB configuration, like the example above which isolates cores 2 through 8.
* Configuration for VFIO: If you plan to use VFIO (Virtual Function I/O) for hardware virtualization, it is necessary to add additional parameters to the GRUB configuration, such as iommu=pt and intel_iommu=on, to enable VT-d and ensure proper use of VFIO.
The guide provides the necessary configuration steps in the GRUB setup to improve performance and support special features such as huge pages and VFIO in a DPDK environment.
:::
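On Ubuntu, a typical way to apply these settings (a sketch; the parameter values are the examples from above) is to append them to GRUB_CMDLINE_LINUX in /etc/default/grub, regenerate the GRUB configuration, and reboot:
```
# In /etc/default/grub:
#   GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G hugepages=8 isolcpus=2,3,4,5,6,7,8 iommu=pt intel_iommu=on"
sudo update-grub
sudo reboot
```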
### 5.2 Configurations before running DPDK
#### 5.2.1 Reserve huge pages. See the earlier section for more details or [click here](https://doc.dpdk.org/guides-23.11/linux_gsg/sys_reqs.html#linux-gsg-hugepages) for official documentation.
```
# Get the hugepage size.
awk '/Hugepagesize/ {print $2}' /proc/meminfo
# Get the total huge page numbers.
awk '/HugePages_Total/ {print $2} ' /proc/meminfo
# Unmount the hugepages.
umount `awk '/hugetlbfs/ {print $2}' /proc/mounts`
# Create the hugepage mount folder.
mkdir -p /mnt/huge
# Mount to the specific folder.
mount -t hugetlbfs nodev /mnt/huge
```
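If huge pages were not reserved on the kernel command line, they can also be reserved at run time through sysfs. The example below reserves 1024 pages of 2 MB on NUMA node 0; adjust the count, page size and node to your system:
```
echo 1024 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
```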
#### 5.2.2 Check the CPU layout using the DPDK cpu_layout utility:
```
cd dpdk_folder
usertools/cpu_layout.py
```
Or run lscpu to check the cores on each socket.
#### 5.2.3 Check your NIC id and related socket id:
```
# List all the NICs with PCI address and device IDs.
lspci -nn | grep Eth
```
For example suppose your output was as follows:
```
82:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
82:00.1 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
85:00.0 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
85:00.1 Ethernet [0200]: Intel XL710 for 40GbE QSFP+ [8086:1583]
```
Check the PCI device related numa node id:
`cat /sys/bus/pci/devices/0000\:xx\:00.x/numa_node`
Usually 0x:00.x is on socket 0 and 8x:00.x is on socket 1. Note: To get the best performance, ensure that the core and NICs are in the same socket. In the example above 85:00.0 is on socket 1 and should be used by cores on socket 1 for the best performance.
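For the example NICs above, the check would look like this, and a socket-1 device is expected to report 1:
```
cat /sys/bus/pci/devices/0000:85:00.0/numa_node
```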
#### 5.2.4 Check which kernel drivers need to be loaded and whether there is a need to unbind the network ports from their kernel drivers.
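This follows the same procedure as Section 1.8: load the chosen kernel module and bind the ports with dpdk-devbind.py. The sketch below assumes vfio-pci and one of the example PCI addresses above:
```
sudo modprobe vfio-pci
./usertools/dpdk-devbind.py --status
sudo ./usertools/dpdk-devbind.py --bind=vfio-pci 0000:85:00.0
```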