# FreeBSD Container Engine
## Introduction
`xc` is a work-in-progress container engine for FreeBSD that runs both Linux[^1] and FreeBSD containers. It can use OCI-compatible image registries[^2] for image distribution, such as [DockerHub](https://hub.docker.com) or [Azure Container Registry](https://azure.microsoft.com/en-us/products/container-registry).
Some highlights of features unique to `xc` include:
- pre-instantiation sanity checks, including detection of missing environment variables on supported images
- better `DTrace` support: by default `xc` exposes `/dev/dtrace/helper` to the containers, which allows applications to register `USDT` probes
`xc` targets `FreeBSD-14`; however, most features still work with older versions, except VNET containers, which depend on [this patch](https://reviews.freebsd.org/D40213) and [this patch](https://reviews.freebsd.org/D40377), both of which came out of the development of this project.
Although `xc` uses OCI registries for image distribution, it uses a different image format, **which can be subject to breaking changes at any time, without notice, until the first stable version is released**.
## Table of contents
- [Requirements](#Requirements)
- [Quick start](#Quick-start)
- [Usage](#Usages)
- [Pull Images and OCI Registry](#Pull-Images-and-OCI-Registry)
- [Networking](#Networking)
- [Tracing container](#Tracing-container)
## Requirements
### Building
You will need Rust, Cargo, and CMake to build this project; the easiest way is to either use [rustup](https://rustup.rs) or install them from `pkg`. With `cargo` installed, you can build the project with
```sh
cargo build
```
> Note: If you want to push images, it is much better to build the project with the release configuration (`cargo build --release`), due to much better sha2 performance
### Running
#### Supported CPU architecture
`xc` *should* support every architecture FreeBSD supports. `xc` is mostly developed on an `aarch64` machine, and quite a bit on `amd64`.
#### File system
`xc` currently supports only ZFS. There are plans to make it work on non-ZFS systems, but that depends on [the availability of overlayfs in base](https://hackmd.io/@jhb/ByWrxQmr2)
#### Networking
`xc` relies on `pf` (`ipfw` support is planned but not yet developed) for port exposure (via `rdr`).
You may need to add NAT-related rules to your `pf` configuration if you wish to allow internet access from the containers; see the example `pf.conf` in the [Quick start](#Quick-start) section.
## Quick start
### Building
```sh
# clone the project
git clone https://github.com/michael-yuji/xc.git
# build the project
cd xc
cargo build --release
```
### Installing
```sh
# The ocitar utility must be in $PATH; any directory in $PATH works, but we pick /usr/local/bin here
cp target/release/ocitar /usr/local/bin
# xcd, the daemon, must run as root
cp target/release/xcd /usr/local/sbin
# copy xc, the client utility, to somewhere in $PATH; it can be run by normal users as well
cp target/release/xc /usr/local/bin
```
### Configuration
Create ZFS datasets for hosting images and the rootfs of containers, assuming `zroot` is the name of the ZFS pool:
`zfs create -p -o atime=off zroot/xc/datasets`
`zfs create -p -o atime=off zroot/xc/run`
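A quick way to confirm that both datasets exist before starting the daemon (assuming `zroot` as the pool name, as above):
```sh
# both datasets should be listed, with atime reported as off
zfs list -r -o name,atime zroot/xc
```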
Create a json configuration file at `/usr/local/etc/xc.conf`
```json
{
  "ext_ifs": [
    "igb0"
  ],
  "image_dataset": "zroot/xc/datasets",
  "container_dataset": "zroot/xc/run",
  "layers_dir": "/var/cache",
  "devfs_id_offset": 1000,
  "image_database_store": "/var/db/xc.image.sqlite",
  "database_store": "/var/db/xc.sqlite",
  "socket_path": "/var/run/xc.sock",
  "networks": {},
  "registries": "/var/db/xc.registries.json"
}
```
| Key | Description |
| -- | -- |
| `ext_ifs` | Specifies the external network interfaces on which port-forwarding rules will be applied by default. |
| `image_dataset` | The dataset where the rootfs of container images will be stored. **This dataset must exist.** |
| `container_dataset` | The dataset used to store the root datasets of running containers. **This dataset must exist.** |
| `layers_dir` | The directory where image file system layers will be stored. |
| `devfs_id_offset` | `xc` takes care of devfs ruleset generation; this variable tells `xc` how to generate the ids for the rulesets. |
| `image_database_store` | The sqlite database in which `xc` stores image manifests; this file will be created automatically if it does not exist. |
| `database_store` | The sqlite database in which `xc` stores address allocations and network definitions; this file will be created automatically if it does not exist. |
| `socket_path` | The UNIX socket the daemon will listen on and accept connections from. |
| `networks` | Mapping between host network interfaces and xc networks; leave it empty for now, as it can be managed via the cli. |
| `registries` | A JSON file, which should be kept secure, that stores credentials for the different container registries; this file will be created automatically if it does not exist. |
### Run
Now you are ready to run `xc`. Starting at this point, and throughout the rest of this section, we assume you are running as root, for the sake of keeping things a bit simpler. Running containers as non-privileged users is supported in `xc`, but we are not going to get into that in this quick start guide.
In a terminal, start the daemon in the foreground:
`# xcd`
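If you prefer not to keep a terminal occupied, one option is to supervise `xcd` with FreeBSD's stock `daemon(8)` utility. This is a generic FreeBSD convenience and an assumption of this guide, not an `xc` feature:
```sh
# run xcd in the background under daemon(8):
# -r restarts xcd if it exits, -S routes its output to syslog,
# -P writes the supervisor's pidfile
daemon -r -S -P /var/run/xcd_daemon.pid /usr/local/sbin/xcd
```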
By default, DockerHub is set as the default registry.
#### Run a pre-built FreeBSD image
The following command pulls [this image](https://hub.docker.com/r/freebsdxc/freebsd/tags) from DockerHub.
`# xc pull freebsdxc/freebsd:13.2`
Now you can run the image
`# xc run freebsdxc/freebsd:13.2 /bin/sh`
:::warning
By default xc containers do not attach to any network; see the [Networking](#Networking) section for more information.
:::
:::warning
The image in this example, `freebsdxc/freebsd:13.2`, runs `/etc/rc` on a stock FreeBSD base, which means sendmail is enabled. If you attach an unusable network to it, it may get stuck at initialization for a while, so play with it without attaching a network first, and don't panic if it seems stuck (check the log of xcd!).
:::
#### Run a Linux image
First, load the Linux kernel module and enable the ELF fallback brand for Linux:
`# kldload linux64`
`# sysctl kern.elf64.fallback_brand=3`
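If you want these settings to survive a reboot, the standard FreeBSD knobs (not `xc`-specific) would look like:
```sh
# load the Linux compatibility bits at boot
sysrc linux_enable="YES"
# keep the ELF fallback brand across reboots
echo 'kern.elf64.fallback_brand=3' >> /etc/sysctl.conf
```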
The following command pulls [this image](https://hub.docker.com/_/mariadb) from DockerHub and names it `mariadb:10.9` locally.
`# xc pull library/mariadb:10.9 mariadb:10.9`
Now you can run the image
`# xc run mariadb:10.9 -e MARIADB_ROOT_PASSWORD=password`
:::warning
By default xc containers do not attach to any network
:::
#### Running an image with networking
We are now going to create a managed network. Additionally, we are going to make the containers able to access the internet.
We want xc to automatically assign addresses from the range `192.168.17.0/24` to the containers.
:::info
You can pick any range you want; we just pick `192.168.17.0/24` as an example.
:::
##### Create an example network
1. Create the interface we are going to use on the host. We will call it `xc0`.
>`# ifconfig bridge create inet 192.168.17.254/24 name xc0`
>>Here we create a bridge interface named `xc0` with the IP address 192.168.17.254 and subnet mask 255.255.255.0 (because of the /24).
>> We use a bridge interface because it allows us to serve both VNET and non-VNET Jails.
2. Create the `xc` network, let's name it `example`
>`# xc network create --alias xc0 --bridge xc0 --default-router 192.168.17.254 example 192.168.17.0/24`
>> Essentially it means: when we attach a container to this `example` network, find an ip in the range `192.168.17.0/24` that other containers attached to the same network are not using. `xc` also adds the allocated address to the `pf` table `xc:network:example`. (Similarly, if a network is called `foo`, addresses allocated from its pool will be added to `xc:network:foo`.)
>>
>> If the container is non-`VNET`, the allocated address is added as an alias to `xc0` (because of `--alias xc0`), as if we ran `ifconfig xc0 inet 192.168.17.x/24 alias`
>>
>> If the container is `VNET`, the runtime creates an `epair` pair, `(epairXa, epairXb)`, moves `epairXb` into the container, assigns it the allocated `192.168.17.x/24` address, and adds `epairXa` to `xc0` (because of `--bridge xc0`)
3. Let's say we want our containers to access the internet via `NAT`, so we need to configure the firewall `pf`. Assuming the network interface connected to our default gateway is `igb0`,
> a minimal `/etc/pf.conf` would look like
```
ext_if="igb0"
# This rule creates a NAT to the $ext_if when the source address
# is an address in the 'xc:network:example' table
nat on $ext_if from <xc:network:example> to any -> ($ext_if)
# In case we need to perform port redirection (-p rules), add our
# rdr anchor here
rdr-anchor xc-rdr
```
4. Start the firewall before starting any container that requires internet access (a firewall-side verification is sketched after this walkthrough)
`# service pf start`
5. Now we can run a container with the extra `--network <name>` flag
`# xc run --network example freebsdxc/freebsd:13.2 --name test`
6. Type `fetch -o- https://google.com` to verify internet access is working. Don't kill this container yet, as we are going to use it to test our `VNET` container in a later step.
>
>*Hint: in the test container console, you can type `<Ctrl-p>-q` (Control-P followed by q) to detach from the console, and run `xc attach test` to reattach the console.*
7. Now let's try out a `VNET` container.
`# xc run --vnet --network example freebsdxc/freebsd:13.2 --name test2 /bin/sh`
8. Verify everything is working
8.1. Test that we can ping the internet
`# ping 1.1.1.1`
8.2. Test that we can ping the other container
`# ping <address of test>`
8.3. Test that DNS is working
`# ping google.com`
> If you wonder why DNS magically works in these example containers: by default, `xc` copies /etc/resolv.conf from the host to the containers. You can override this behaviour by providing DNS nameservers with one or more `--dns <dns ip>` arguments.
> For example, `--dns 8.8.8.8 --dns 8.8.4.4` generates a `resolv.conf` that looks like
```
nameserver 8.8.8.8
nameserver 8.8.4.4
```
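As a firewall-side check (standard `pf` tooling, not part of `xc`), you can verify that your rules parse and that `xc` really added the container addresses to the `xc:network:example` table mentioned in step 2:
```sh
# check that /etc/pf.conf parses, without loading it
pfctl -nf /etc/pf.conf
# show the addresses xc has added to the network's pf table
pfctl -t 'xc:network:example' -T show
```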
## Usages
Show running containers
> `xc ps`
Pull an image from a remote registry and name it `foo:bar`
> `xc pull example.io/my-image:bar foo:bar`
Push a local image `foo:bar` to a registry `example.io` as `foo1:bar1`
> `xc push foo:bar example.io/foo1:bar1`
Kill a container named `foo`
> `xc kill foo`
Run a container with image `freebsdxc/freebsd:13.2` and name the container "example"
>`xc run freebsdxc/freebsd:13.2 --name example /bin/sh`
Run a container and add the ip address `192.168.8.8` on `igb0` to it
> `xc run freebsdxc/freebsd:13.2 --ip 'igb0|192.168.8.8' /bin/sh`
Run a container and add the ip addresses `192.168.8.7` and `192.168.8.8` on `igb0` to it
> `xc run freebsdxc/freebsd:13.2 --ip 'igb0|192.168.8.7/24,192.168.8.8/24' /bin/sh`
Run a vnet container and move `igb0` into the Jail, with ip addresses `192.168.8.7` and `192.168.8.8`
> `xc run freebsdxc/freebsd:13.2 --vnet --ip 'igb0|192.168.8.7/24,192.168.8.8/24' /bin/sh`
Run a container using the `example` network
> `xc run --network example freebsdxc/freebsd:13.2 /bin/sh`
Link a container named `foo`; the command blocks until the container is killed, and killing the command kills the container as well
> `xc link foo`
## Pull Images and OCI Registry
You can pull images from public registries using the `pull` command:
`xc pull <server>/<repo>:<tag>`
For example, the command `xc pull index.docker.io/freebsdxc/freebsd:13.2` pulls the image in repo `freebsdxc/freebsd` with tag `13.2`, which will be accessible as `freebsdxc/freebsd:13.2` locally.
:::info
By default `xc` uses DockerHub as the default registry; that means if the server component is omitted, `xc` will try to pull from DockerHub instead
:::
:::warning
If you are trying to pull an "official image" from DockerHub, remember to add the `library/` prefix to the repo. For example, the [official image of mariadb](https://hub.docker.com/_/mariadb) can be pulled by `xc pull library/mariadb:10.9` or `xc pull index.docker.io/library/mariadb:10.9`
:::
If your registry requires a credential to access, you can use `xc login --username <username> --password <password> <server>` to add credential to a registry.
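For example, to store DockerHub credentials (the username and token below are placeholders; an access token can be used in place of a password):
```sh
xc login --username my_docker_hub_username --password my_docker_hub_access_token index.docker.io
```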
If you prefer to deal with the registries file directly, here's an example (`/var/db/xc.registries.json` in the example configuration shown above):
```json
{
  "default": "index.docker.io",
  "registries": {
    "index.docker.io": {
      "base_url": "https://index.docker.io"
    }
  }
}
```
If you have credentials for some of the registries:
```json
{
  "default": "index.docker.io",
  "registries": {
    "index.docker.io": {
      "base_url": "https://index.docker.io",
      "basic_auth": {
        "username": "my_docker_hub_username",
        "password": "my_docker_hub_access_token"
      }
    },
    "my_azure_cr.azurecr.io": {
      "base_url": "https://my_azure_cr.azurecr.io",
      "basic_auth": {
        "username": "my_username",
        "password": "my_token"
      }
    }
  }
}
```
## Networking
There are many ways to configure networking for `xc` containers. You can assign IP addresses to the containers just like normal Jails, but you can also let `xc` handle address allocation for you via `network` objects; in fact, you can even mix both!
### Managed address allocation
To have `xc` allocate addresses to the containers, you first need to create the `network` objects. For example, the following command creates a network named `example` with an address space of `172.17.0.0/24`. When a container is attached to a network, `xc` allocates an address within the address space and assigns it to the container. It is also possible to request an explicit address from an `xc` network; see the [Request explicit address](#Request-explicit-address) sub-section for more.
```
xc network create --alias igb0 --bridge bridge0 example 172.17.0.0/24
```
The `alias` interface is the interface on which `xc` creates an IP alias for non-`vnet` containers. The `bridge` interface is the interface that the container's interface will be bridged to (to be exact, the `epairXa` end of the `epairX` pair; `epairXb` is the end that is moved into the container).
**`xc` does not guarantee connectivity between containers, even on the same network**; all `xc` does is create the alias / bridge the interfaces, and it is the responsibility of the administrator to oversee the network topology. This also means **`xc` tries hard to stay out of your way** in terms of network engineering.
To run a container attached to a network, use the `--network <network>` argument.
For example, `xc run --network example freebsdxc/freebsd:13.2` creates a container attached to the network named `example`.
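For illustration only, attaching a `VNET` container to such a network is roughly equivalent to the following manual steps; the interface names, jail id, and address are assumptions, and `xc` does all of this for you:
```sh
# create an epair; the kernel prints the "a" end, e.g. epair0a
ifconfig epair create
# move the "b" end into the container's vnet jail (jid assumed to be 42)
ifconfig epair0b vnet 42
# assign the allocated address inside the jail
jexec 42 ifconfig epair0b inet 172.17.0.2/24
# bridge the "a" end on the host
ifconfig bridge0 addm epair0a
```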
#### Request explicit address
You can request an explicit address from the address pool of a network.
For example, `xc run --network 'example|172.17.0.200' freebsdxc/freebsd:13.2`.
In the case where the address is not available, you'll get an error like this:
```
Err(
ErrResponse {
errno: 2,
value: Object {
"error": String("address 172.17.0.200 already consumed"),
},
},
)
```
### Unmanaged address allocation
Use `--ip '<iface>|x.x.x.x/m'` to manually assign an IP address to the container, where `<iface>` is the name of the network interface and `x.x.x.x/m` is the CIDR of the IP address.
For example, if you want to allocate `192.168.13.31/24` to the container on interface `igb0`, add `--ip 'igb0|192.168.13.31/24'` to the run command.
:::warning
If the container uses `vnet`, the interface will be moved to the container.
:::
:::info
Tip: you can allocate multiple addresses on the interface; each CIDR block needs to be separated by `,`. For example, the argument `--ip 'igb0|192.168.13.31/24,192.168.8.8/24'` allocates both `192.168.13.31/24` and `192.168.8.8/24` to the container.
:::
### Mixing managed and unmanaged address allocation
You can mix managed and unmanaged address allocation at any time, for example,
```
xc run --vnet --network example --ip 'igb0|192.168.1.111/24,[dead:beef::1]/24' freebsdxc/freebsd:13.2
```
## Tracing container
You can trace system calls and much more in your container, using the `xc trace <container>` command. By default, without any extra arguments, `xc trace <container>` launches `dwatch(1)` with `-F syscall` and traces the entry/exit of all system calls.
Containers created by `xc` have `/dev/dtrace/helper` exposed by default; this allows applications running in the container to register their own USDT probes, which can then be traced from the host system.
`xc trace` is merely a wrapper around `dwatch`: `xc trace <container> -- <args>...` is translated to `dwatch -j <jid> <args>...`. Check out the man page of `dwatch(1)` for the amazing things you can do with it.
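For example, assuming a container named `test` and that your `dwatch(1)` ships the `tcp` profile, the following traces TCP events for that container only:
```sh
# translated by xc into: dwatch -j <jid of test> tcp
xc trace test -- tcp
```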
[^1]: As long as supported by the [FreeBSD Linuxulator](https://wiki.freebsd.org/Linuxulator)
[^2]: Tested: DockerHub, Microsoft Azure CR