# Majur worker service
## What
A service capable of running arbitrary Linux processes. It will provide the ability to start, stop, query the status of, and stream logs from running jobs. It will support constraining job resources through cgroups and isolating jobs with PID, mount, and network namespaces.
## API
The API will be exposed via a gRPC server, with authentication enforced by mTLS. The server will use TLS 1.3 and the cipher suites supported by Go 1.18: `TLS_AES_128_GCM_SHA256`, `TLS_AES_256_GCM_SHA384`, and `TLS_CHACHA20_POLY1305_SHA256`.
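As a minimal sketch of that setup (the `newServerTLSConfig` helper and the file-path parameters are placeholders, not part of the design; Go selects the TLS 1.3 cipher suites automatically, so they are not configured explicitly):

```go
package worker

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"os"
)

// newServerTLSConfig sketches the mTLS setup described above: TLS 1.3 only,
// with client certificates required and verified against a dedicated client CA.
func newServerTLSConfig(certFile, keyFile, clientCAFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile(clientCAFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("could not parse client CA certificate")
	}
	return &tls.Config{
		MinVersion:   tls.VersionTLS13,
		Certificates: []tls.Certificate{cert},
		ClientAuth:   tls.RequireAndVerifyClientCert, // reject connections without a valid client cert
		ClientCAs:    pool,
	}, nil
}
```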
Authorization will be enforced by a simple scheme: the serial number in the client's X.509 certificate uniquely identifies the client, and clients are restricted to accessing jobs that they started.
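For illustration, the server could recover that serial number from the verified peer certificate in a unary interceptor; the interceptor and context key names below are assumptions, not part of the API:

```go
package worker

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/peer"
	"google.golang.org/grpc/status"
)

type clientIDKey struct{}

// clientSerial pulls the serial number out of the verified client certificate.
func clientSerial(ctx context.Context) (string, error) {
	p, ok := peer.FromContext(ctx)
	if !ok {
		return "", status.Error(codes.Unauthenticated, "no peer information")
	}
	tlsInfo, ok := p.AuthInfo.(credentials.TLSInfo)
	if !ok || len(tlsInfo.State.PeerCertificates) == 0 {
		return "", status.Error(codes.Unauthenticated, "no verified client certificate")
	}
	return tlsInfo.State.PeerCertificates[0].SerialNumber.String(), nil
}

// authUnaryInterceptor attaches the caller's identity to the request context;
// handlers can then check job ownership against it.
func authUnaryInterceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	serial, err := clientSerial(ctx)
	if err != nil {
		return nil, err
	}
	return handler(context.WithValue(ctx, clientIDKey{}, serial), req)
}
```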
Processes launched by the server will be run inside new UTS, IPC, PID, network, mount, and user namespaces (a minimal sketch of the corresponding clone flags follows the list below). They will as a result:
- Not have any network access
- Run inside their own root filesystem
- Not see the host's processes
- Not see other global resources (hostname, [ipc resources](https://man7.org/linux/man-pages/man7/sysvipc.7.html))
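A sketch of how the server might request these namespaces when launching the child process, assuming Go's `os/exec` with raw clone flags (rootfs pivoting, cgroup placement, and the re-exec step a real implementation would need are omitted):

```go
package worker

import (
	"os/exec"
	"syscall"
)

// newJobCmd launches the job's command in fresh UTS, IPC, PID, network,
// mount, and user namespaces.
func newJobCmd(path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS |
			syscall.CLONE_NEWIPC |
			syscall.CLONE_NEWPID |
			syscall.CLONE_NEWNET |
			syscall.CLONE_NEWNS |
			syscall.CLONE_NEWUSER,
		// Map the job's root user onto an unprivileged host user.
		UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: 65534, Size: 1}},
		GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: 65534, Size: 1}},
	}
	return cmd
}
```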
```protobuf
syntax = "proto3";

import "google/protobuf/empty.proto";

message StartRequest {
  // Path is the path of the command to run
  string Path = 1;
  // Args holds command line arguments
  repeated string Args = 2;
  // Resources specifies cgroup information for resource constraints for the process.
  LinuxResources Resources = 3;
  // Env holds environment variables for the process, in KEY=value form
  repeated string Env = 4;
}
message StartResponse {
  // ID is the unique identifier for the process
  string ID = 1;
}
message StopRequest {
  // ID is the unique identifier for the process
  string ID = 1;
  // Use the SIGKILL signal instead of SIGTERM
  bool Force = 2;
}
message StatusRequest {
  // ID is the unique identifier for the process
  string ID = 1;
}
message StatusResponse {
  // Exited represents whether the process has exited
  bool Exited = 1;
  // ExitCode is the exit code of the job, or -1 if the job hasn't exited or was terminated by a signal
  int32 ExitCode = 2;
}
message LogsRequest {
  // ID is the unique identifier for the process
  string ID = 1;
}
message LogsResponse {
  // Stdout is a chunked segment of stdout of the job
  bytes Stdout = 1;
  // Stderr is a chunked segment of stderr of the job
  bytes Stderr = 2;
}
message LinuxResources {
  // Memory restriction configuration
  LinuxMemory Memory = 1;
  // CPU resource restriction configuration
  LinuxCPU CPU = 2;
  // BlockIO restriction configuration
  LinuxBlockIO BlockIO = 3;
}
message LinuxMemory {
  // Memory limit (in bytes)
  int64 Max = 1;
}
message LinuxCPU {
  // CPU hardcap limit (in usecs). Allowed cpu time in a given period.
  int64 Quota = 1;
  // CPU period to be used for hardcapping (in usecs).
  uint64 Period = 2;
}
message LinuxThrottleDevice {
  // Major is the device's major number
  int64 Major = 1;
  // Minor is the device's minor number
  int64 Minor = 2;
  // Rate is the IO rate limit for the device
  uint64 Rate = 3;
}
message LinuxBlockIO {
  // IO read rate limit per device, rate is in bytes per second
  repeated LinuxThrottleDevice ThrottleReadBpsDevice = 1;
  // IO write rate limit per device, rate is in bytes per second
  repeated LinuxThrottleDevice ThrottleWriteBpsDevice = 2;
  // IO read rate limit per device, rate is in IO per second
  repeated LinuxThrottleDevice ThrottleReadIOPSDevice = 3;
  // IO write rate limit per device, rate is in IO per second
  repeated LinuxThrottleDevice ThrottleWriteIOPSDevice = 4;
}
service WorkerService {
  // Start requests that the server run a job
  rpc Start(StartRequest) returns (StartResponse);
  // Stop stops the job by sending it SIGTERM, followed by SIGKILL after a grace period (or SIGKILL immediately if Force is set).
  // Returns a NotFound error if the job does not exist or the user does not have permission to access it.
  rpc Stop(StopRequest) returns (google.protobuf.Empty);
  // Status returns the current state of the job
  rpc Status(StatusRequest) returns (StatusResponse);
  // Logs streams the logs of a process by id. Returns a NotFound error if the job does not exist or the user does not have permission to access it.
  rpc Logs(LogsRequest) returns (stream LogsResponse);
}
```
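To make the resource fields concrete, here is a sketch of how the server might translate them into cgroup v2 interface files. The `applyResources` helper, the cgroup path, and the plain `linuxResources` struct standing in for the generated proto type are assumptions for illustration:

```go
package worker

import (
	"fmt"
	"os"
	"path/filepath"
)

// linuxResources is a plain stand-in for the generated LinuxResources proto
// message; field meanings follow the API definition above.
type linuxResources struct {
	MemoryMax int64  // memory limit in bytes; 0 means unlimited
	CPUQuota  int64  // allowed CPU time per period, in usecs
	CPUPeriod uint64 // period length, in usecs
}

// applyResources writes the job's limits into its cgroup v2 directory,
// e.g. /sys/fs/cgroup/majur/<job-id>.
func applyResources(cgroupDir string, r linuxResources) error {
	write := func(file, value string) error {
		return os.WriteFile(filepath.Join(cgroupDir, file), []byte(value), 0o644)
	}
	if r.MemoryMax > 0 {
		if err := write("memory.max", fmt.Sprintf("%d", r.MemoryMax)); err != nil {
			return err
		}
	}
	if r.CPUQuota > 0 && r.CPUPeriod > 0 {
		// cpu.max takes "<quota> <period>", both in microseconds.
		if err := write("cpu.max", fmt.Sprintf("%d %d", r.CPUQuota, r.CPUPeriod)); err != nil {
			return err
		}
	}
	// Block IO throttling would be written to io.max in the form
	// "<major>:<minor> rbps=<N> wbps=<N> riops=<N> wiops=<N>".
	return nil
}
```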
## Library
The library wraps the gRPC client to provide an easy means of interacting with the server. It's mostly a straightforward translation of the gRPC API but with the `Logs` method returning `io.ReadCloser`s for the job's output streams.
```go
type Client interface {
	Start(context.Context, *StartRequest) (*StartResponse, error)
	Stop(context.Context, *StopRequest) error
	Logs(context.Context, *LogsRequest) (stdout, stderr io.ReadCloser, err error)
	Status(context.Context, *StatusRequest) (*StatusResponse, error)
}
```
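A usage sketch of this client follows; the `Dial` constructor, addresses, and file paths are assumptions, while `StartRequest` and `LogsRequest` are the request types from the API above:

```go
package main

import (
	"context"
	"io"
	"log"
	"os"
)

func main() {
	ctx := context.Background()

	// Dial is assumed to build a Client over an mTLS-authenticated gRPC connection.
	client, err := Dial("worker.example.com:8443", "client.pem", "client-key.pem", "ca.pem")
	if err != nil {
		log.Fatal(err)
	}

	resp, err := client.Start(ctx, &StartRequest{
		Path: "/bin/sh",
		Args: []string{"-c", "echo hello"},
	})
	if err != nil {
		log.Fatal(err)
	}

	stdout, stderr, err := client.Logs(ctx, &LogsRequest{ID: resp.ID})
	if err != nil {
		log.Fatal(err)
	}
	defer stdout.Close()
	defer stderr.Close()

	// Stream the job's output until the server closes the streams.
	go io.Copy(os.Stderr, stderr)
	io.Copy(os.Stdout, stdout)
}
```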
## CLI
A CLI will be created for interfacing with the API. It will support all the functionality exposed by the API through a combination of configuration files and command-line flags.
Example:
```
$ majur start -f job.yml
Job 1234 has started
$ majur status -j 1234
Job 1234 is running
$ majur logs -j 1234 -f
beep boop i'm a job running and stuff
^C
$ majur stop -j 1234
$ majur logs -j 1234
beep boop i'm a job running and stuff
received SIGTERM but i refuse
$ majur stop -j 1234 --force
$ majur status -j 1234
Job 1234 is no longer running; it was terminated by a signal.
```
job.yml example:
```yaml
path: "bash"
args: ["-c", "hello $USER"]
env: ["USER=ben"]
resources:
memory:
limit: "19Mi"
cpu:
```
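A sketch of how the CLI might load this file before building a `StartRequest`; the struct layout, the `gopkg.in/yaml.v3` dependency, and the `parseMemory` helper are assumptions:

```go
package cli

import (
	"fmt"
	"os"
	"strconv"
	"strings"

	"gopkg.in/yaml.v3"
)

// jobSpec mirrors the job.yml layout shown above.
type jobSpec struct {
	Path      string   `yaml:"path"`
	Args      []string `yaml:"args"`
	Env       []string `yaml:"env"`
	Resources struct {
		Memory struct {
			Limit string `yaml:"limit"` // e.g. "19Mi"
		} `yaml:"memory"`
		CPU struct {
			Quota  int64  `yaml:"quota"`
			Period uint64 `yaml:"period"`
		} `yaml:"cpu"`
	} `yaml:"resources"`
}

// parseMemory converts a human-friendly limit such as "19Mi" into bytes.
func parseMemory(s string) (int64, error) {
	units := map[string]int64{"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30}
	for suffix, mult := range units {
		if strings.HasSuffix(s, suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * mult, nil
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

func loadJobSpec(path string) (*jobSpec, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var spec jobSpec
	if err := yaml.Unmarshal(raw, &spec); err != nil {
		return nil, fmt.Errorf("parsing %s: %w", path, err)
	}
	return &spec, nil
}
```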
## Further work / trade-offs
### Further cgroup APIs
cgroups allow for more kinds of configuration than we're using, including [device access](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#device-controller), [huge pages](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#hugetlb), and plenty more. These could be exposed as part of the API to provide the full flexibility and control that the cgroup APIs allow.
### Network connectivity
It might be desirable for the containers to have network access. We could set up a bridge device and veth pair for our network namespace, but it's non-trivial, so I'm going to consider this out of scope for the initial version of this project. [This blog post](https://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/) is a good reference for learning more about how this could be set up.
### Bring your own rootfs
Rather than running programs on the host in an Alpine chroot, we might further "Docker-ify" our worker API by allowing users to specify OCI images as rootfses to be used instead of hard-coding the Alpine one.