# Majur worker service
## What
A service capable of running arbitrary Linux processes. It will provide the ability to start, stop, query the status of, and stream logs from running jobs. It will support constraining job resources through cgroups and isolating jobs with PID, mount, and network namespaces.
## API
The API will be exposed via a gRPC server, with authentication enforced by mTLS. The server will use TLS 1.3 and the cipher suites supported by Go 1.18: `TLS_AES_128_GCM_SHA256`, `TLS_AES_256_GCM_SHA384`, and `TLS_CHACHA20_POLY1305_SHA256`.
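As a minimal sketch of that setup (the `newServerTLSConfig` helper and the file-path parameters are placeholders, not part of the design; Go selects the TLS 1.3 cipher suites automatically, so they are not configured explicitly):

```go
package worker

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"os"
)

// newServerTLSConfig sketches the mTLS setup described above: TLS 1.3 only,
// with client certificates required and verified against a dedicated client CA.
func newServerTLSConfig(certFile, keyFile, clientCAFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile(clientCAFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("could not parse client CA certificate")
	}
	return &tls.Config{
		MinVersion:   tls.VersionTLS13,
		Certificates: []tls.Certificate{cert},
		ClientAuth:   tls.RequireAndVerifyClientCert, // reject connections without a valid client cert
		ClientCAs:    pool,
	}, nil
}
```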
Authorization will be enforced by a simple scheme: the serial number in the client's X.509 certificate uniquely identifies the client, and clients are restricted to accessing jobs that they started.
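For illustration, the server could recover that serial number from the verified peer certificate in a unary interceptor; the interceptor and context key names below are assumptions, not part of the API:

```go
package worker

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/peer"
	"google.golang.org/grpc/status"
)

type clientIDKey struct{}

// clientSerial pulls the serial number out of the verified client certificate.
func clientSerial(ctx context.Context) (string, error) {
	p, ok := peer.FromContext(ctx)
	if !ok {
		return "", status.Error(codes.Unauthenticated, "no peer information")
	}
	tlsInfo, ok := p.AuthInfo.(credentials.TLSInfo)
	if !ok || len(tlsInfo.State.PeerCertificates) == 0 {
		return "", status.Error(codes.Unauthenticated, "no verified client certificate")
	}
	return tlsInfo.State.PeerCertificates[0].SerialNumber.String(), nil
}

// authUnaryInterceptor attaches the caller's identity to the request context;
// handlers can then check job ownership against it.
func authUnaryInterceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	serial, err := clientSerial(ctx)
	if err != nil {
		return nil, err
	}
	return handler(context.WithValue(ctx, clientIDKey{}, serial), req)
}
```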
Processes launched by the server will be run inside new UTS, IPC, PID, network, mount, and user namespaces (a minimal sketch of the corresponding clone flags follows the list below). They will as a result:
- Not have any network access
- Run inside their own root filesystem
- Not see the host's processes
- Not see other global resources (hostname, [ipc resources](https://man7.org/linux/man-pages/man7/sysvipc.7.html))
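A sketch of how the server might request these namespaces when launching the child process, assuming Go's `os/exec` with raw clone flags (rootfs pivoting, cgroup placement, and the re-exec step a real implementation would need are omitted):

```go
package worker

import (
	"os/exec"
	"syscall"
)

// newJobCmd launches the job's command in fresh UTS, IPC, PID, network,
// mount, and user namespaces.
func newJobCmd(path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS |
			syscall.CLONE_NEWIPC |
			syscall.CLONE_NEWPID |
			syscall.CLONE_NEWNET |
			syscall.CLONE_NEWNS |
			syscall.CLONE_NEWUSER,
		// Map the job's root user onto an unprivileged host user.
		UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: 65534, Size: 1}},
		GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: 65534, Size: 1}},
	}
	return cmd
}
```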
```protobuf
syntax = "proto3";

import "google/protobuf/empty.proto";

message StartRequest {
  // Path is the path of the command to run
  string Path = 1;
  // Args holds command line arguments
  repeated string Args = 2;
  // Resources specifies cgroup information for resource constraints for the process.
  LinuxResources Resources = 3;
  // Env holds environment variables for the process, in KEY=value form
  repeated string Env = 4;
}
message StartResponse {
  // ID is the unique identifier for the process
  string ID = 1;
}
message StopRequest {
  // ID is the unique identifier for the process
  string ID = 1;
  // Use the SIGKILL signal instead of SIGTERM
  bool Force = 2;
}
message StatusRequest {
  // ID is the unique identifier for the process
  string ID = 1;
}
message StatusResponse {
  // Exited represents whether the process has exited
  bool Exited = 1;
  // ExitCode is the exit code of the job, or -1 if the job hasn't exited or was terminated by a signal
  int32 ExitCode = 2;
}
message LogsRequest {
  // ID is the unique identifier for the process
  string ID = 1;
}
message LogsResponse {
  // Stdout is a chunked segment of stdout of the job
  bytes Stdout = 1;
  // Stderr is a chunked segment of stderr of the job
  bytes Stderr = 2;
}
message LinuxResources {
  // Memory restriction configuration
  LinuxMemory Memory = 1;
  // CPU resource restriction configuration
  LinuxCPU CPU = 2;
  // BlockIO restriction configuration
  LinuxBlockIO BlockIO = 3;
}
message LinuxMemory {
  // Memory limit (in bytes)
  int64 Max = 1;
}
message LinuxCPU {
  // CPU hardcap limit (in usecs). Allowed cpu time in a given period.
  int64 Quota = 1;
  // CPU period to be used for hardcapping (in usecs).
  uint64 Period = 2;
}
message LinuxThrottleDevice {
  // Major is the device's major number
  int64 Major = 1;
  // Minor is the device's minor number
  int64 Minor = 2;
  // Rate is the IO rate limit for the device
  uint64 Rate = 3;
}
message LinuxBlockIO {
  // IO read rate limit per device, rate is in bytes per second
  repeated LinuxThrottleDevice ThrottleReadBpsDevice = 1;
  // IO write rate limit per device, rate is in bytes per second
  repeated LinuxThrottleDevice ThrottleWriteBpsDevice = 2;
  // IO read rate limit per device, rate is in IO per second
  repeated LinuxThrottleDevice ThrottleReadIOPSDevice = 3;
  // IO write rate limit per device, rate is in IO per second
  repeated LinuxThrottleDevice ThrottleWriteIOPSDevice = 4;
}
service WorkerService {
  // Start requests that the server run a job
  rpc Start(StartRequest) returns (StartResponse);
  // Stop stops the job by sending it SIGTERM, followed by SIGKILL after a grace period (or SIGKILL immediately if Force is set).
  // Returns a NotFound error if the job does not exist or the user does not have permission to access it.
  rpc Stop(StopRequest) returns (google.protobuf.Empty);
  // Status returns the current state of the job
  rpc Status(StatusRequest) returns (StatusResponse);
  // Logs streams the logs of a process by id. Returns a NotFound error if the job does not exist or the user does not have permission to access it.
  rpc Logs(LogsRequest) returns (stream LogsResponse);
}
```
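To make the resource fields concrete, here is a sketch of how the server might translate them into cgroup v2 interface files. The `applyResources` helper, the cgroup path, and the plain `linuxResources` struct standing in for the generated proto type are assumptions for illustration:

```go
package worker

import (
	"fmt"
	"os"
	"path/filepath"
)

// linuxResources is a plain stand-in for the generated LinuxResources proto
// message; field meanings follow the API definition above.
type linuxResources struct {
	MemoryMax int64  // memory limit in bytes; 0 means unlimited
	CPUQuota  int64  // allowed CPU time per period, in usecs
	CPUPeriod uint64 // period length, in usecs
}

// applyResources writes the job's limits into its cgroup v2 directory,
// e.g. /sys/fs/cgroup/majur/<job-id>.
func applyResources(cgroupDir string, r linuxResources) error {
	write := func(file, value string) error {
		return os.WriteFile(filepath.Join(cgroupDir, file), []byte(value), 0o644)
	}
	if r.MemoryMax > 0 {
		if err := write("memory.max", fmt.Sprintf("%d", r.MemoryMax)); err != nil {
			return err
		}
	}
	if r.CPUQuota > 0 && r.CPUPeriod > 0 {
		// cpu.max takes "<quota> <period>", both in microseconds.
		if err := write("cpu.max", fmt.Sprintf("%d %d", r.CPUQuota, r.CPUPeriod)); err != nil {
			return err
		}
	}
	// Block IO throttling would be written to io.max in the form
	// "<major>:<minor> rbps=<N> wbps=<N> riops=<N> wiops=<N>".
	return nil
}
```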
## Library
The library wraps the gRPC client to provide an easy means of interacting with the server. It's mostly a straightforward translation of the gRPC API but with the `Logs` method returning `io.ReadCloser`s for the job's output streams.
```go
type Client interface {
	Start(context.Context, *StartRequest) (*StartResponse, error)
	Stop(context.Context, *StopRequest) error
	Logs(context.Context, *LogsRequest) (stdout, stderr io.ReadCloser, err error)
	Status(context.Context, *StatusRequest) (*StatusResponse, error)
}
```
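A usage sketch of this client follows; the `Dial` constructor, addresses, and file paths are assumptions, while `StartRequest` and `LogsRequest` are the request types from the API above:

```go
package main

import (
	"context"
	"io"
	"log"
	"os"
)

func main() {
	ctx := context.Background()

	// Dial is assumed to build a Client over an mTLS-authenticated gRPC connection.
	client, err := Dial("worker.example.com:8443", "client.pem", "client-key.pem", "ca.pem")
	if err != nil {
		log.Fatal(err)
	}

	resp, err := client.Start(ctx, &StartRequest{
		Path: "/bin/sh",
		Args: []string{"-c", "echo hello"},
	})
	if err != nil {
		log.Fatal(err)
	}

	stdout, stderr, err := client.Logs(ctx, &LogsRequest{ID: resp.ID})
	if err != nil {
		log.Fatal(err)
	}
	defer stdout.Close()
	defer stderr.Close()

	// Stream the job's output until the server closes the streams.
	go io.Copy(os.Stderr, stderr)
	io.Copy(os.Stdout, stdout)
}
```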
## CLI
A CLI will be created for interfacing with the API. It will support all the functionality exposed by the API through a combination of configuration files and command-line flags.
Example:
```
$ majur start -f job.yml
Job 1234 has started
$ majur status -j 1234
Job 1234 is running
$ majur logs -j 1234 -f
beep boop i'm a job running and stuff
^C
$ majur stop -j 1234
$ majur logs -j 1234
beep boop i'm a job running and stuff
received SIGTERM but i refuse
$ majur stop -j 1234 --force
$ majur status -j 1234
Job 1234 is no longer running; it was terminated by a signal.
```
job.yml example:
```yaml
path: "bash"
args: ["-c", "hello $USER"]
env: ["USER=ben"]
resources:
memory:
limit: "19Mi"
cpu:
```
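A sketch of how the CLI might load this file before building a `StartRequest`; the struct layout, the `gopkg.in/yaml.v3` dependency, and the `parseMemory` helper are assumptions:

```go
package cli

import (
	"fmt"
	"os"
	"strconv"
	"strings"

	"gopkg.in/yaml.v3"
)

// jobSpec mirrors the job.yml layout shown above.
type jobSpec struct {
	Path      string   `yaml:"path"`
	Args      []string `yaml:"args"`
	Env       []string `yaml:"env"`
	Resources struct {
		Memory struct {
			Limit string `yaml:"limit"` // e.g. "19Mi"
		} `yaml:"memory"`
		CPU struct {
			Quota  int64  `yaml:"quota"`
			Period uint64 `yaml:"period"`
		} `yaml:"cpu"`
	} `yaml:"resources"`
}

// parseMemory converts a human-friendly limit such as "19Mi" into bytes.
func parseMemory(s string) (int64, error) {
	units := map[string]int64{"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30}
	for suffix, mult := range units {
		if strings.HasSuffix(s, suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * mult, nil
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

func loadJobSpec(path string) (*jobSpec, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var spec jobSpec
	if err := yaml.Unmarshal(raw, &spec); err != nil {
		return nil, fmt.Errorf("parsing %s: %w", path, err)
	}
	return &spec, nil
}
```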
## Further work / trade-offs
### Further cgroup APIs
cgroups allow for more kinds of configuration than we're using, including [device access](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#device-controller), [huge pages](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#hugetlb), and plenty more. These could be exposed as part of the API to provide the full flexibility and control that the cgroup APIs allow.
### Network connectivity
It might be desirable for the containers to have network access. We could set up a bridge device and veth pair for our network namespace, but it's non-trivial, so I'm going to consider this out of scope for the initial version of this project. [This blog post](https://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/) is a good reference for learning more about how this could be set up.
### Bring your own rootfs
Rather than running programs on the host in an Alpine chroot, we might further "Docker-ify" our worker API by allowing users to specify OCI images as rootfses to be used instead of hard-coding the Alpine one.