# 2022q1 Accelerate Cloud Native Application with eBPF

contributed by < `Shawn5141` >

Related GitHub repo: https://github.com/I-mpossible/socket-acceleration-with-ebpf

## Summary

How to use eBPF for accelerating Cloud Native applications

### Part 1: https://cyral.com/blog/how-to-ebpf-accelerating-cloud-native/

### Part 2: https://cyral.com/blog/lessons-using-ebpf-accelerating-cloud-native/

#### Introduction

- For real-time applications with very stringent performance SLAs, eBPF has become a great tool for accelerating communication between the various microservices in our backend.
- This blog focuses on enabling applications to transparently bypass the TCP/IP stack using eBPF when those applications are on the same host. This is valuable for `cloud native applications` built from microservices, which spend a significant amount of time processing network requests over RPCs (e.g. gRPC from the Cloud Native Computing Foundation) and/or REST APIs.

#### Overview of BPF and eBPF

- [Berkeley Packet Filter (BPF)](https://en.wikipedia.org/wiki/Berkeley_Packet_Filter) allows user-defined filters to be translated into instructions that run inside a simple VM with a small register set, within the kernel, and that specify which subset of network packets is to be rejected or accepted.
> Example
> One can use tcpdump to see the BPF instruction set in action: run tcpdump with the `-d` option on an interface to print the BPF instructions for a filter (here, the instructions for capturing UDP packets destined for DNS port 53):
> ```bash
> # "classic" BPF
> tcpdump -i enp0s3 udp dst port 53 -d
> (000) ldh [12]
> (001) jeq #0x86dd jt 2 jf 6
> (013) jeq #0x35 jt 14 jf 15
> ```

- [extended BPF (eBPF)](https://prototype-kernel.readthedocs.io/en/latest/bpf/) has an enhanced instruction set and new features, including support for hooking into multiple events in the kernel, actions other than just packet filtering, a just-in-time (JIT) compiler to increase performance, and a bytecode optimizer and verifier for the code to be injected into the kernel. The result is a general framework that can be used to inject BPF programs into the Linux kernel to extend its functionality at runtime.

#### Using eBPF for Network Acceleration

Typically, eBPF programs have two parts:

* a kernel space component, where decision making or data collection happens in response to kernel events, such as packet RX on the NIC, a system call spawning a shell, etc.
* a user space component, which can access data written by the kernel code in some shared data structure (maps, etc.).

The Linux kernel supports different types of eBPF programs, each of which can be attached to specific hooks.

![](https://i.imgur.com/Oae0z96.png)

These programs execute when the events associated with those hooks are triggered, e.g. when a system call such as setsockopt() is made, or at the XDP hook in the network driver just after DMA of the packet buffer descriptor.

<details>
<summary>
<a href="https://github.com/cyralinc/os-eBPF/blob/develop/sockredir/load.sh">load.sh</a> uses <a href="https://man.archlinux.org/man/bpftool.8.en">bpftool</a> to attach specific programs to the kernel.
</summary>

1. Compiles the sockops BPF code, which updates the sockhash map, using the LLVM Clang frontend.
2. Uses bpftool to attach the compiled code to the cgroup so that it is invoked for all socket operations in the system, such as connection establishment.
3. Extracts the ID of the sockhash map created by the above program and pins the map to the virtual filesystem so that the second eBPF program can access it.
4. Compiles the tcpip_bypass code, which performs the socket data redirection that bypasses the TCP/IP stack.
5. Uses bpftool to attach that eBPF code to the sockhash map.

</details>

<details>
<summary>
<a href="https://manpages.ubuntu.com/manpages/focal/man8/bpftool-prog.8.html">bpftool</a>.
</summary>
</details>

All the program types are enumerated in the UAPI [bpf.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h) header file, which contains the user-facing definitions required by an eBPF program. In this blog post we are interested in eBPF programs of type `BPF_PROG_TYPE_SOCK_OPS` and `BPF_PROG_TYPE_SK_MSG`, which let us hook our BPF programs to socket operations, e.g. when a TCP connect event takes place or upon a sendmsg call on a socket. The SK_MSG program is the one that performs the socket data redirect.

#### Performing the socket data redirect

The SK_MSG program executes upon a sendmsg call on a socket and must be attached to a socket map, specifically `BPF_MAP_TYPE_SOCKMAP` or `BPF_MAP_TYPE_SOCKHASH`. These maps are key-value stores in which the value can only be a socket. Once the SK_MSG program is attached to the map, all the sockets in the map inherit the program, which is then executed on any write to those sockets.
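The SK_MSG listing that follows refers to a map named `sock_ops_map` and a `struct sock_key` without showing how they are declared. As a minimal sketch only, the declarations could look roughly like this. The field layout of `sock_key`, the `max_entries` value, and the use of the legacy `struct bpf_map_def` style are assumptions made for illustration and are not taken from the repository. Newer libbpf releases (1.0 and later) expect BTF-defined maps in a `.maps` section instead of this legacy form.

```c
#include <linux/bpf.h>   /* BPF_MAP_TYPE_SOCKHASH */
#include <linux/types.h> /* __u32 */

#ifndef __section
#define __section(NAME) __attribute__((section(NAME), used))
#endif

/* Hypothetical key: an IPv4 4-tuple plus address family.
 * All fields are 32-bit, so the struct has no padding. */
struct sock_key {
    __u32 sip4;   /* source IPv4 address */
    __u32 dip4;   /* destination IPv4 address */
    __u32 sport;  /* source port */
    __u32 dport;  /* destination port */
    __u32 family; /* address family */
};

/* Legacy map description read by older ELF loaders (assumption:
 * the loader in use still understands the "maps" section). */
struct bpf_map_def {
    __u32 type;
    __u32 key_size;
    __u32 value_size;
    __u32 max_entries;
    __u32 map_flags;
};

/* Sockhash map: keys are connection tuples, values are sockets.
 * From the program's point of view the value is a 4-byte slot. */
struct bpf_map_def __section("maps") sock_ops_map = {
    .type        = BPF_MAP_TYPE_SOCKHASH,
    .key_size    = sizeof(struct sock_key),
    .value_size  = sizeof(__u32),
    .max_entries = 65535, /* illustrative capacity */
    .map_flags   = 0,
};
```

A hash-keyed map is convenient here because the SK_MSG program can look up the peer socket directly from the connection tuple, with no need to track array indices as `BPF_MAP_TYPE_SOCKMAP` would require.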
```c=
__section("sk_msg")
int bpf_tcpip_bypass(struct sk_msg_md *msg)
{
    struct sock_key key = {};

    /* Build the lookup key from the message's connection metadata. */
    sk_msg_extract4_key(msg, &key);
    /* Redirect the data to the ingress path of the socket stored under key. */
    msg_redirect_hash(msg, &sock_ops_map, &key, BPF_F_INGRESS);
    return SK_PASS;
}

char ____license[] __section("license") = "GPL";
```

The `bpf_tcpip_bypass` program above is placed in the `sk_msg` section of the ELF object file using the compiler's section attribute. This section tells the loader the BPF program type, which determines where the program can be attached in the kernel and which in-kernel helper functions it can access.

`msg_redirect_hash()` is a wrapper around the BPF helper function `bpf_msg_redirect_hash()`. Helper functions cannot be called directly. They must be accessed through predefined helper IDs of the form `BPF_FUNC_msg_redirect_hash`, because the kernel verifier for BPF programs only allows calls to the predefined helpers listed in the UAPI [linux/bpf.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h) under [`enum bpf_func_id`](https://github.com/torvalds/linux/blob/7e63420847ae5f1036e4f7c42f0b3282e73efbc2/include/uapi/linux/bpf.h#L3160) (see the [code](https://github.com/cyralinc/os-eBPF/blob/c01e7fa8fe1b49c010dceb6c52ecb216603157fc/sockredir/bpf_sockops.h#L34) for the macro definition). This indirection allows the BPF backend to emit an error when it sees a call to a global function or to an external symbol.
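To make the helper indirection concrete, here is a minimal sketch of how such a wrapper can be declared. The macro name `BPF_HELPER` is hypothetical and chosen for illustration; the linked header defines its own macro, which may differ in detail. The idea is that the wrapper is a function pointer whose "address" is simply the numeric helper ID from `enum bpf_func_id`, so a call through it can be compiled to a BPF call instruction carrying that ID.

```c
#include <linux/bpf.h> /* struct sk_msg_md, enum bpf_func_id */

/* Hypothetical macro: declares a function pointer called NAME and
 * initializes it with the helper ID BPF_FUNC_<NAME>. */
#define BPF_HELPER(NAME, ...)                        \
    (*NAME)(__VA_ARGS__) __attribute__((unused)) =   \
        (void *)(unsigned long)BPF_FUNC_##NAME

/* Expands to roughly:
 *   static int (*msg_redirect_hash)(struct sk_msg_md *, void *, void *, __u64)
 *       = (void *)BPF_FUNC_msg_redirect_hash;
 */
static int BPF_HELPER(msg_redirect_hash, struct sk_msg_md *md, void *map,
                      void *key, __u64 flags);
```

In an ordinary user space build this is only a declaration: calling through the pointer is meaningful solely in an object compiled for the BPF target (for example with `clang -target bpf`), where the verifier checks that the ID is one the program type is allowed to use.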