# DeBoot Swarm grant milestone report
This is a report on the first set of milestones of a project funded by the [Swarm foundation](https://www.ethswarm.org/foundation) to research and prototype methods to boot devices into images fetched from a peer-to-peer content-addressable storage network with a cryptocurrency-based incentive layer — that is, the [Swarm network](https://www.ethswarm.org).
It is beyond the scope of this progress report to go into details about the benefits such a system could bring, but briefly:
- Permissionless peer-to-peer networks have excellent resilience properties compared to the client-server model: once content is uploaded, it is extremely difficult for any entity, including the uploader, to take it down.
- Generally, any software downloaded from a remote source should have its integrity and authenticity validated against, for example, a checksum signed by the developer. Content addressing means the checksumming component is already built into the node software: the integrity of downloaded chunks is guaranteed by the protocol itself.
Meanwhile, the blockchain integration used for the cryptocurrency incentive layer provides a convenient way for developers and other trusted authorities to sign content — for example, by writing content hashes on a nameservice like the [ENS](https://ens.domains).
At [ETHBerlin](https://ethberlin.ooo) (September 2022), we developed the original approach of launching a Swarm node in the initramfs to download a rootfs. Over the last couple of months, we pushed this all the way from an idea to a working prototype, hitting the following milestones:
- Implemented scripts to generate a GRUB-based bootable USB drive (hosted for the moment on [GitHub](https://github.com/awmacpherson/deboot)). The user provides a list of Swarm hashes with short descriptions; these then appear as options in the GRUB menu on boot.
On selection, a Linux kernel is launched with an initramfs (generated by [dracut](https://github.com/awmacpherson/dracut.git) [dr'eɪkət]) and an init program that brings up a network interface, launches a [Swarm node](https://github.com/ethersphere/bee), and fetches the image addressed by the content hash. It then mounts this image in memory and boots into a "live" OS.
- Registered the ENS domain deboot.eth with a content hash pointing to a document that lists the Swarm hashes of our demo images, each with a short description, together with a commit ref for the DeBoot git repo.
- Considered how this circle of ideas can be put into the broader context of *decentralized package management*, which has applications far beyond the bootstrap scenario (see [below](#beyond-bare-metal-decentralized-package-management)).
## Swarm initramfs
Very roughly, the execution flow of a DeBoot bootstrap looks something like this:
```mermaid
graph TD
UEFI --"boot USB"--> GRUB --"select hash"--> Linux+initramfs --"Swarm fetch"--> Linux+rootfs
```
After booting GRUB in the usual way, the user is presented with a menu of Swarm images, each with an English description, taken from a list baked into the `grub.cfg` generated at build time. The chosen hash is passed to the kernel on its command line, which looks something like the following:
```sh
vmlinuz root=live:bzz://0123456789deadbeef console=ttyS0 \
initrd=/boot/swarm-initrd
```
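The menu entries in `grub.cfg` can be produced by a short build-time script. The following is a minimal sketch, not the actual DeBoot script: it assumes a hypothetical input format of one `<hash> <description>` pair per line, and the `/boot/vmlinuz` and `/boot/swarm-initrd` paths are assumptions modelled on the example command line above.

```sh
#!/bin/sh
# Hypothetical sketch: emit one GRUB menuentry per Swarm image.
# Input format (assumed): "<swarm-hash> <description...>", one per line.
gen_grub_entries() {
    # The "|| [ -n ... ]" clause also handles a final line without a newline.
    while read -r hash desc || [ -n "$hash" ]; do
        # Skip blank lines and comment lines.
        case "$hash" in
            '' | '#'*) continue ;;
        esac
        # Paths are assumptions; root= follows the example command line.
        cat <<EOF
menuentry '$desc' {
    linux /boot/vmlinuz root=live:bzz://$hash console=ttyS0
    initrd /boot/swarm-initrd
}
EOF
    done
}
```

For example, `gen_grub_entries < images.txt >> grub.cfg` would append one entry per listed image. Descriptions containing single quotes would need escaping, which the sketch does not do.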
The kernel launches `systemd` as PID 1, which handles system initialisation. Special initramfs services inserted by dracut trigger hook scripts which parse the `root=` command-line parameter, run `ifup` scripts, start the `bee` node, and fetch the desired rootfs image. The rootfs is a [Squashfs](https://docs.kernel.org/filesystems/squashfs.html) image (with [zstd](http://facebook.github.io/zstd/zstd_manual.html) compression, in the case of our demo images) and is mounted in memory.
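The parse-and-fetch steps might look roughly like the following sketch. It is not the code in our dracut module: it assumes the local `bee` node serves its HTTP API on the default port 1633 and that the image can be downloaded from Bee's `/bzz/<reference>` endpoint, and the function names and `/run` paths are hypothetical. A real dracut hook would more likely read the parameter with dracut's `getarg` helper than scan the command line by hand.

```sh
#!/bin/sh
# Hypothetical sketch of a bzz hook's parse-and-fetch logic.
# Assumptions: bee's HTTP API listens on its default port 1633, and the
# image is served by the GET /bzz/<reference> endpoint.
BEE_API="${BEE_API:-http://127.0.0.1:1633}"

# Print the Swarm reference from a root=live:bzz://<ref> argument.
# Returns non-zero if no such argument is present.
bzz_ref_from_cmdline() {
    for arg in $1; do
        case "$arg" in
            root=live:bzz://*)
                echo "${arg#root=live:bzz://}"
                return 0
                ;;
        esac
    done
    return 1
}

# Download the squashfs image into /run (typically a tmpfs, so it lives in
# memory) and mount it read-only. Paths are illustrative. A real hook would
# also need to wait for the node to connect to peers before fetching.
fetch_and_mount() {
    img=/run/rootfs.squashfs
    mnt=/run/rootfsbase
    curl -fsS -o "$img" "$BEE_API/bzz/$1" || return 1
    mkdir -p "$mnt"
    mount -t squashfs -o loop,ro "$img" "$mnt"
}

# Intended use inside the initramfs (not run here):
#   ref=$(bzz_ref_from_cmdline "$(cat /proc/cmdline)") && fetch_and_mount "$ref"
```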
The new hook scripts are implemented as a dracut module called `bzz`, which can be found in our [fork](https://github.com/awmacpherson/dracut.git) of the main dracut GitHub repo.
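Dracut modules declare their build-time behaviour in a `module-setup.sh` file. A hypothetical sketch of what one for a module like `bzz` could look like follows; the hook script names, the chosen hook points and priorities, the dependency list, and the installed binaries are assumptions for illustration, not the contents of our fork.

```sh
#!/bin/sh
# module-setup.sh: hypothetical sketch of a dracut module like "bzz".
# Hook script names, hook points and priorities are illustrative only.

# Returning 255 means: include this module only when explicitly requested
# (e.g. via dracut's --add option) or pulled in as a dependency.
check() {
    return 255
}

# Other dracut modules this one is assumed to need.
depends() {
    echo network
}

# Copy the tools and hook scripts into the initramfs image.
install() {
    inst_multiple bee curl
    # Parse root=live:bzz://... early, while the command line is processed.
    inst_hook cmdline 90 "$moddir/parse-bzz.sh"
    # Start bee and fetch the image once networking is up.
    inst_hook initqueue/online 90 "$moddir/fetch-bzz.sh"
}
```

Dracut sources this file at build time and supplies `inst_multiple`, `inst_hook`, and `$moddir` itself, so the functions only make sense when called from within dracut.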
### Discussion
If we want to use experimental technologies to fetch OS images in novel ways, the path of least resistance is to do the fetching in the initramfs, since the tooling for these technologies is probably already designed to run on Linux. Implementing at any earlier stage requires getting client software to run in extremely pared-back and unfamiliar environments: for example, we would have to do without libc or bash, and possibly make do with limited filesystem support. It's hard to argue that there would even be any benefit to doing so: the win one expects from skipping the initramfs is a shorter boot time, but the time saved is likely dwarfed by the time needed to connect to a peer-to-peer network, build up a peer list, and download a rootfs image, so any gain would be marginal.
Still, implementing in initramfs was not without its challenges:
- Setting up isolated tests for an initramfs generator requires an idiosyncratic configuration; the dracut devs provide a system container designed for running KVM-accelerated VMs for this purpose. There's not much information floating around on the Internet that helps to troubleshoot this type of setup!
- Downloading large files from Swarm usually requires making a cryptocurrency payment, which means that cryptographic keys securing actual funds (which will be spent each time the device boots) must be embedded in the initramfs.[^demo-rootfs] Figuring out a sensible way to deal with fund management is something we leave to later work.
[^demo-rootfs]: At 10 megabytes, the demo rootfs images we uploaded for the sake of this prototype are within the limit for free downloads, so the tests work with an unfunded address.
## Integrity and authenticity
When downloading and running third-party software, one generally wants to verify two things:
- *Integrity.* The downloaded archive contains exactly the data the user expects.
- *Authenticity.* The code in the archive does what the user expects it to do.
Typically, integrity is validated by a checksum/hash. Authenticity is subtler: it should be validated by a signature from a trusted authority (assuming the user has not personally checked every line of code). This signature could appear in various guises:
- If the trusted authority owns a website listing checksums of the relevant packages, then by providing a TLS certificate for that website they may effectively vouch for the content. A user may validate the signature by connecting to the website with TLS.
- A trusted authority that does not run a website may alternatively vouch for a package by signing it, or its content hash, directly and distributing the signature in a standard format (e.g. an ASCII-armored PGP signature file). The authority must somehow associate its public key with the identity that is trusted.
- A modern alternative to offering data in the form of a website delivered over HTTPS is to write content hashes to a public blockchain with the associated transaction signed by a key or keys associated to the trusted authority.
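The integrity half of the check is mechanical. As a minimal sketch for the conventional case, where a repository publishes SHA-256 checksums, the comparison might look like this; the function name is hypothetical. (In Swarm the check happens inside the protocol, and Swarm references are not plain SHA-256 digests of the file.) The authenticity half, establishing that the checksum itself comes from a trusted authority, is a separate step such as verifying a PGP signature over the checksum file.

```sh
#!/bin/sh
# Hypothetical helper: succeed iff FILE's SHA-256 digest equals EXPECTED_HEX.
# Usage: verify_sha256 FILE EXPECTED_HEX
verify_sha256() {
    # sha256sum prints "<digest>  <file>"; keep only the digest.
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    # An empty digest (e.g. unreadable file) never counts as a match.
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# Integrity only. Authenticity of the published checksum needs a separate
# check, e.g. for a detached signature over a checksum file:
#   gpg --verify SHA256SUMS.asc SHA256SUMS
```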
Early in the boot process, the choice of approaches to authenticating images is limited by the crypto implementations and methods for fetching remote data available in firmware and bootloaders. In the initramfs, where we have access to all software that runs on Linux, this is hardly a constraint.
For the current version, we wrote a short document containing the Swarm URIs of our demo images and a commit hash for the deboot repo and attached it to the ENS name [deboot.eth](https://app.ens.domains/deboot.eth).[^eoa]
An in-depth analysis of the security context and of the tentative approach of the current version is beyond the scope of this set of milestones.
[^eoa]: The ENS name deboot.eth is currently owned by an externally owned account (EOA), but we intend to transfer it to a multisig safe as soon as possible.
## Beyond bare metal: decentralized package management
Many of the considerations that apply to the current project apply equally to managing a decentralized software repository serving any type of package, not only rootfs images.
We identified several issues in this wider context that we feel warrant further investigation:
- *Security.* The discussion in [the previous section](#integrity-and-authenticity) manifestly applies to downloading and jumping into any software.
- *Indexing.* Different packages and repositories may use different hashing or content-addressing schemes. While traditional software repositories are usually not content-*addressed*, for the security reasons already discussed they do tend to at least serve content hashes. If we could map these onto the hash scheme used in a content-addressed network like Swarm, we could shim existing tools (e.g. `dpkg`, `rpm`) onto a decentralized backend.
In [previous work](https://awmacpherson.com/posts/ethlisbon-report/), we investigated the idea of maintaining a mapping between addressing schemes using "content address translation" (CAT) tables. These tables could themselves be stored and maintained on peer-to-peer networks. In an ideal world, writing to such tables would be permissionless. Ensuring validity of the tables and designing incentives is a topic for future research.
- *Discovery.* That is, *search.* There are many topics to discuss here, most of which apply to far more general search problems. We highlight a search problem of particular importance to package management: finding a package version matching a given *version specifier* (see, for example, [the scheme used in Python packaging](https://peps.python.org/pep-0440/)).
Verifying that a given package version matches a specifier is a lightweight computation, so this query is well suited to being handled by a decentralized network of search providers. While the problem may seem trivial (indeed, clients could cheaply download the entire index of versions for a particular package and query it themselves), it provides an interesting toy case for studying important questions about decentralized search in general.
- *Dependency resolution.* Although we hardly touched upon this issue in this report, the structure of the boot process arose as a solution to a non-trivial dependency problem: each stage is needed to fetch and decode the code and configuration needed to jump into the next stage. It's typical to solve this problem manually and program boots imperatively; it would be cool to put this process on a similar footing to building other types of software by expressing boots in a declarative language.
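To illustrate how lightweight the version-specifier query is, here is a sketch that returns the highest version satisfying a `>=MIN,<MAX` range. It is a simplification: PEP 440 specifiers also cover operators such as `~=` and `!=`, pre-releases, and epochs, and the sketch relies on GNU `sort -V`, whose ordering only approximates PEP 440 ordering. The function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of a (much simplified) version-specifier query.
# Relies on GNU sort -V, which only approximates PEP 440 ordering.

# Succeed iff version $1 >= version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read candidate versions on stdin; print the highest one satisfying
# ">=$1,<$2". Prints nothing if no candidate matches.
best_match() {
    min=$1 max=$2
    sort -V | while read -r v; do
        if version_ge "$v" "$min" && ! version_ge "$v" "$max"; then
            echo "$v"
        fi
    done | tail -n 1
}
```

For example, given the candidates `1.1`, `1.2`, `1.9.0`, `1.10.2`, and `2.0.0`, `best_match 1.2 2.0` selects `1.10.2`. A search provider would run the same filter over its copy of the index; the interesting questions are how a client can trust the answer without redoing the work.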
Future work to push DeBoot methods towards a more general DePkg paradigm should address the following action items:
- Formal description of the security problems associated with downloading and running software.
(This will be addressed in the second set of milestones of the present project.)
- Further research into CAT databases and methods to host them in decentralized storage and maintain them with permissionless writes. Apply to shimming existing package management APIs onto a decentralized repository.
- Decentralized search. Treating the version specifier match problem as a toy model, evaluate approaches to verifiably resolving queries of decentralized databases.
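As a starting point for the CAT research item, a lookup in a translation table can be as simple as the following sketch. The table format (one whitespace-separated `<scheme> <digest> <swarm-ref>` triple per line) and the function name are hypothetical; a shim for an existing package manager could use such a lookup to turn a digest taken from its index (for instance, the SHA256 field of a Debian `Packages` file) into a Swarm reference.

```sh
#!/bin/sh
# Hypothetical CAT (content address translation) table lookup.
# Assumed table format, one entry per line:
#   <scheme> <hex digest> <swarm reference>

# Print the Swarm reference recorded for digest $2 under scheme $1 in
# table file $3. Returns non-zero if there is no such entry.
cat_table_lookup() {
    awk -v scheme="$1" -v digest="$2" '
        $1 == scheme && $2 == digest { print $3; found = 1; exit }
        END { exit !found }
    ' "$3"
}
```

The lookup itself is trivial; who may write to such a table, and how its entries are validated, are the open questions.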