Secure, Efficient and Developer-Friendly Kernel Extension: Reviews and Comments TODO
=====================================================================================

+ Compare HYBRIDDRV's SFI with other SFI-based approaches.
+ Define a clear kernel interface, akin to a kernel API, with API details.
+ Emphasize our contribution and the significance of HYBRIDDRV.
+ Compare with other contemporary approaches, i.e., recent papers that propose secure data-plane kernel architectures.
+ Writing needs improvement.
+ The evaluation only partly justifies the design.
+ If protection against malicious, and not only faulty, drivers is the intent, then this aspect needs to be emphasized more.
+ What kernel interfaces is HYBRIDDRV exposing?
+ How are you performing the security checks for those interfaces? And if the system can perform those checks, what is stopping you from exposing more kernel functionality to the data plane? What is the overhead of invoking the control plane? Is that going to undermine the performance benefit of the hybrid approach?
+ How is IPC implemented? Those numbers are orders of magnitude lower than those of prior work, such as Snap [1] and TAS [2], which leads me to suspect that the performance differences are just an artifact of the implementation.
+ Another issue with the evaluation is that none of the extension data planes involve any control-plane interaction (to access kernel services), so it only shows the best-case performance.
+ Moreover, for the networking evaluation, I am quite confused why the experiments were conducted using VMs running on the same host. The authors mention physical NICs as a possible bottleneck of the system, which seems highly unlikely given the performance numbers in the paper (what NIC are you evaluating on?).
+ Elaborate on the SFI verifier details (the SFI verifier is actually an interesting point of the paper; unfortunately, the paper does not really explain it in much detail, and I would recommend that the authors elaborate on the verification effort in the next iteration of the paper). See the generic SFI-masking sketch at the end of these notes.
+ Sometimes the paper seems to hint that the approach is not only about reliability but also about security. If this is the case, it needs to be explained better, e.g., by providing an attacker model and a security analysis. ???
+ Fix some wording.
+ The comparison with WASM seems odd, and it is unclear which WASM environment is used.
+ It is not clear how much functionality now resides in user space for the implemented drivers. For the outlined benchmarks, could all code be in the kernel without the numbers changing?
+ Is it just that system reboots are less likely? Or is complexity reduced, and if so, to what extent?
+ An unspecified network stack?
+ How does one write a HybridDrv extension, and what APIs does a developer have to use to specify which code is in the data plane and which is in the control plane? (A speculative partitioning sketch appears at the end of these notes.)
+ The evaluation would be stronger if it looked at a performance-critical driver such as a 10/25 GbE network driver (e1000 is not terribly challenging anymore these days).
+ HybridDrv's overhead is still high.
+ There is a long line of work on safe kernel extension code, going back to SPIN and VINO in the 1990s (neither of which this paper cites; please check them out!).
+ More recently, eBPF has brought safe kernel extensions to Linux, and there is some undeniable commonality between the idea of small amounts of safe "data plane code" running in the kernel and eBPF programs.
+ 20% overhead is high.
+ `kalloc` and `kfree` are pointer-based, but data-plane code cannot call APIs that take raw pointers (see the handle-based interface sketch at the end of these notes).
+ The user-space part of a HybridDrv extension appears to be able to invoke kernel functionality through well-defined interfaces, which the paper describes as "syscalls". Does this mean that you add additional syscalls to Linux to expose kernel APIs to user space? (NO.) Or does the extension work only with the existing Linux syscall API? If the latter, it again seems tricky to port many existing drivers and modules to HybridDrv; if the former, it requires serious kernel modifications. Clarify the kernel code modifications.
+ Consider §4.3: it mentions that the "kernel also provides an interface to allow data plane to send an event to wake up the waiting process". How does this work without a pointer to a process descriptor being passed as an argument? (The handle-based sketch at the end of these notes illustrates one ID-based possibility.)
+ More demanding extension workloads might be needed.
+ In particular, the e1000 (1 GbE) driver may fail to exhibit overheads that would be much more evident with a driver that has even tighter performance bounds, such as ixgbe (10 GbE) or even the 25/100 GbE drivers that we see today. The NVMe driver is a better example, but the paper should give some specifics about which driver this is and which device(s) it targets.
+ Another reason why the evaluation numbers need a bit more context is that §6.3 says a single VM could not saturate the (virtual) link. That seems very slow, particularly on a 1 GbE link using large packets. Do you know why this was so slow? The standard Linux stack has no problem at all saturating a 1 GbE link with a single TCP connection (or indeed large UDP packets).
+ The concept of "data plane" vs. "control plane" in the context of this paper seems to mean "short operation vs. long operation" (per §2.2). Traditionally, this would be seen more as a logical difference between common and rare operations.
+ Does it matter how common an operation is for whether it should be in HybridDrv's control plane or data plane?
+ When two HybridDrv kernel extensions communicate with each other, do they have to use user-space IPC? If so, how is that faster than microkernel IPC?
+ In §4, please explain whether you wrote the SFI compiler plugin from scratch or used an existing SFI implementation. The paper seems to imply that you wrote it from scratch.
+ Could there be kernel extension code that relies on the different behavior of x86 instructions in ring 0 and ring 3? If so, my understanding is that this code would not work in HybridDrv's control plane, correct?
+ §5.1: the kernel module described here *is* part of the TCB in HybridDrv, right? It would be good to be clear about this.
+ Please clarify whether the networking scenario in §6.3 involves one or two kernel extensions, i.e., whether the driver and the network stack are separate extensions or part of the same extension.
+ It would be great to understand what specific drivers and network stack you are using in the evaluation. For example, is the e1000 driver the one from stock Linux, ported into HybridDrv, or a driver you wrote from scratch? If ported, how much effort was involved? If from scratch, how large are the implementations?
+ I'm curious why you ran the experiments on a VMware VMM; why not on bare metal, to avoid confounding performance factors from emulated hardware?
+ Why does there have to be a completely new SFI design? Why is eBPF not applicable? How does the proposed SFI compare with eBPF, and why not just use eBPF? The SFI design of HybridDrv immediately reminds me of eBPF: both constrain what code is allowed to run, have a verifier, and are designed to work in the kernel. It is not clear why HybridDrv has to propose its own SFI and why eBPF is not enough. At the end of Section 3.3, the paper says "developers need only to register entries as handlers for corresponding kernel events, ...", which seems to me to be a perfect match for eBPF. eBPF has also been used to implement network packet-processing drivers (e.g., XDP; see the minimal XDP example at the end of these notes). In the evaluation, the authors argue that the performance overhead of a WebAssembly VM (considered a proxy for eBPF) is much larger than that of SFI, which seems to be the only reason not to use eBPF. However, I don't find that experiment convincing. The setup is not made clear, but I assume the LLVM VM only interprets bytecode, while the SFI in HybridDrv compiles everything down to native instructions. I wonder whether a WASM VM is a good approximation of eBPF, since eBPF already has a JIT compiler that can run code as native instructions as well. It is not surprising that the native execution of SFI outperforms bytecode interpretation, but is that a fair comparison?
+ Does the requirement of manually separating kernel extensions into a user-space control plane and a kernel-space data plane scale to complex OS implementations?
+ Is it practical to ask developers to manually specify what code should be put in the control plane and what in the data plane? How much effort was spent on making control/data plane separation decisions for the kernel extensions implemented in this paper? What factors are such partitions sensitive to? E.g., are they sensitive to application workload changes? How hard is it to move control-plane code to the data plane and vice versa? As the number of kernel extensions increases and the dependencies between extensions become more complex, is it still manageable to make control/data plane decisions manually? Is it possible to automate the control/data plane separation decisions by profiling and estimating the IPC overhead and the SFI overhead?
+ Questions regarding the different driver implementations in the evaluation section: (1) Section 6.3, Figure 3: what does "Linux original driver" mean? Only "Monolithic Mode" outperforms HybridDrv in Figure 3, but Section 5.2 says monolithic mode ports HybridDrv drivers into kernel mode as kernel modules, so it should not be the "Linux original driver". (2) Why not just use the Linux original driver as the "monolithic driver" instead of porting the HybridDrv driver to kernel mode? (3) What do the data points of "Linux Rawsocket" imply? Is it the so-called "Linux original driver"? What is the takeaway from its appearances in Figure 3(b)(d) and Table 7?
+ What is the relationship between HybridDrv and kernel-bypass user-space drivers? The terms "control plane" and "data plane" used in this paper also appear in other OS papers, but are used in a completely different way: the data plane is typically placed as close as possible to the application process (e.g., as a library in the same process), while the control plane is typically placed in kernel space for device management, etc. This paper designs them the opposite way. Besides, existing kernel-bypass OSes care a lot about the kernel/user-space context switch when designing the control/data plane, whereas HybridDrv cares about other types of context switches but seems to ignore the kernel/user-space switch entirely.
+ HybridDrv's design only started to make sense when I reached Section 5. My hypothesis for the difference is that kernel-bypass drivers rely on special hardware support, while HybridDrv aims for a generic kernel design that does not depend on hardware features. Is the kernel-bypass trend in OS design completely orthogonal to HybridDrv? In any case, the first half of the paper needs clarification.
+ The paper needs better examples and illustrations; Section 3 needs improvement. After reading Section 3, the overview of HybridDrv is still confusing: a high-level design diagram explaining the architecture of HybridDrv is missing, as is a comparison to other OS designs. Some terms are also confusing, e.g., "requester" pops out of nowhere in the left column of page 2, as does "caller" in Section 3.2. What do "borrow the context" and "the kernel mode of another process" mean in Section 3.2? The "context" in Section 4.2.2 is also confusing: the data-plane code is said to run in the kernel context, whereas this section mentions that "data plane codes run in the context of the requester".
+ 'SFI' first appears in the abstract and should be given in its full form (software fault isolation) there.
+ The assumption that data-plane functions are short and do not call syscalls may not always apply.
+ The ported drivers are quite small, and it is unclear how well the above assumption generalizes and how the system would perform with larger drivers.
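
Reference sketches (not from the paper)
---------------------------------------

For the "elaborate on the SFI verifier" and "why a new SFI design" items: a minimal sketch of the classic SFI idea (mask every untrusted address into the extension's data region before the access, so a verifier only has to confirm the mask precedes each load/store). This is generic background, not HybridDrv's actual instrumentation; the region size and all names here are assumptions.

```c
/* Generic SFI illustration, NOT HybridDrv's actual scheme: a compiler pass
 * would emit the mask-and-rebase sequence before every store, and the
 * verifier would only need to check that the sequence is present. */
#include <stdint.h>
#include <stdio.h>

#define SANDBOX_BITS 20u                            /* assumed 1 MiB data region */
#define SANDBOX_SIZE (1u << SANDBOX_BITS)
#define SANDBOX_MASK ((uintptr_t)SANDBOX_SIZE - 1)

static uint8_t sandbox[SANDBOX_SIZE];               /* stand-in for the region */

/* What an instrumented 32-bit store compiles down to: mask, rebase, store. */
static inline void sfi_store32(uintptr_t untrusted_addr, uint32_t val)
{
    uintptr_t off = untrusted_addr & SANDBOX_MASK;  /* clamp into the region */
    *(uint32_t *)(sandbox + off) = val;
}

int main(void)
{
    sfi_store32(0x12345678u, 42);                   /* wild address forced in bounds */
    printf("%u\n", *(uint32_t *)(sandbox + (0x12345678u & SANDBOX_MASK)));
    return 0;
}
```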
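
For the `kalloc`/`kfree` item and the §4.3 wake-up question: a minimal sketch of a pointer-free data-plane interface, where buffers are named by handles and waiting processes by opaque IDs that a trusted runtime can validate and translate. All names (`hd_handle_t`, `hd_buf_alloc`, `hd_buf_write`, `hd_event_signal`) are invented for illustration and stubbed in user space so the sketch compiles; this is not the paper's API.

```c
/* Hypothetical sketch only: a pointer-free data-plane API in the spirit the
 * reviews ask about. Names are invented, not taken from the HybridDrv paper. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HD_MAX_BUFS 16
#define HD_BUF_SIZE 2048

typedef uint32_t hd_handle_t;    /* 1-based index into a runtime-owned table; 0 = invalid */
typedef uint32_t hd_waiter_id_t; /* opaque ID naming a waiting control-plane process */

static uint8_t bufs[HD_MAX_BUFS][HD_BUF_SIZE];
static int     in_use[HD_MAX_BUFS];

/* Allocate a buffer and return a handle; the untrusted side never sees the pointer. */
static hd_handle_t hd_buf_alloc(size_t len)
{
    if (len > HD_BUF_SIZE)
        return 0;
    for (uint32_t i = 0; i < HD_MAX_BUFS; i++)
        if (!in_use[i]) { in_use[i] = 1; return i + 1; }
    return 0;
}

static void hd_buf_free(hd_handle_t h)
{
    if (h >= 1 && h <= HD_MAX_BUFS)
        in_use[h - 1] = 0;
}

/* The trusted side validates handle, offset and length before touching memory. */
static int hd_buf_write(hd_handle_t h, size_t off, const void *src, size_t len)
{
    if (h < 1 || h > HD_MAX_BUFS || !in_use[h - 1] || off + len > HD_BUF_SIZE)
        return -1;
    memcpy(&bufs[h - 1][off], src, len);
    return 0;
}

/* Wake a waiter by ID; a real kernel would map the ID to a process internally. */
static int hd_event_signal(hd_waiter_id_t waiter)
{
    printf("signal waiter %u\n", (unsigned)waiter);
    return 0;
}

/* Example "data-plane" handler: stage a packet, then notify the control plane. */
static int rx_handler(const void *pkt, size_t pkt_len, hd_waiter_id_t net_rx)
{
    hd_handle_t buf = hd_buf_alloc(pkt_len);
    if (buf == 0)
        return -1;
    if (hd_buf_write(buf, 0, pkt, pkt_len) < 0) {
        hd_buf_free(buf);
        return -1;
    }
    return hd_event_signal(net_rx);
}

int main(void)
{
    const char pkt[] = "hello";
    return rx_handler(pkt, sizeof pkt, 7) == 0 ? 0 : 1;
}
```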
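
For the eBPF/XDP comparison item: a minimal, standard XDP program (ordinary eBPF, not HybridDrv code) showing the verified, JIT-compiled in-kernel packet processing the comment alludes to. The build command in the comment assumes clang and libbpf headers are available.

```c
/* Minimal XDP example (plain eBPF): drop IPv4 UDP packets to port 7777, pass
 * everything else. Build with: clang -O2 -g -target bpf -c xdp_drop.c -o xdp_drop.o */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int drop_udp_7777(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* Every access is bounds-checked so the in-kernel verifier accepts it. */
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_UDP)
        return XDP_PASS;

    /* For brevity, assume no IPv4 options. */
    struct udphdr *udp = (void *)(ip + 1);
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;

    return udp->dest == bpf_htons(7777) ? XDP_DROP : XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```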
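
For the "how does one write a HybridDrv extension" item: a purely speculative illustration of what an annotation-based control/data plane split could look like. The HD_DATA_PLANE/HD_CONTROL_PLANE markers and the hd_register_handler name mentioned in a comment are guesses intended only to make the question concrete; the paper's real interface is exactly what the question asks the authors to document.

```c
/* Purely speculative sketch (names invented, not the paper's actual API). */
#include <stddef.h>
#include <stdio.h>

/* Hypothetical markers: data-plane functions go into a dedicated ELF section
 * that a toolchain could compile with SFI instrumentation and load in-kernel;
 * control-plane code stays as ordinary user-space code. */
#define HD_DATA_PLANE    __attribute__((section(".hd_data_plane"), used))
#define HD_CONTROL_PLANE /* ordinary user-space code */

static unsigned long rx_bytes; /* stand-in for state in a shared region */

/* Short fast path, intended to run as SFI-verified kernel-mode code. */
HD_DATA_PLANE int nic_rx_fast(const void *frame, size_t len)
{
    (void)frame;
    rx_bytes += len;   /* e.g., account the packet and enqueue it to a ring */
    return 0;
}

/* Rare slow path, intended to stay in user space and use normal syscalls. */
HD_CONTROL_PLANE void nic_reset(void)
{
    printf("resetting device, %lu bytes received so far\n", rx_bytes);
}

int main(void)
{
    /* Stand-in for a registration call such as
     *   hd_register_handler(HD_EVENT_NIC_RX, nic_rx_fast);   (invented name) */
    char frame[64] = {0};
    nic_rx_fast(frame, sizeof frame);
    nic_reset();
    return 0;
}
```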