## Production-ready DAC infra on Kubernetes

In this section, we outline the steps to deploy a full DAC infrastructure on Kubernetes. This tutorial is cloud provider-agnostic. Prerequisites: a good understanding of Kubernetes.

### Kubernetes manifest files

You can access the YAML files referenced in this section from [this public repository](https://github.com/marigold-dev/dac-deploy-tools); feel free to reuse and adapt them according to your needs.

#### Overview of the YAML files:

* [configmap-env-vars.yaml](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/configmap-env-vars.yaml): various environment variables and feature toggles.
* [configmap-scripts.yaml](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/configmap-scripts.yaml): shell scripts stored in a ConfigMap and mounted in the pods.
* [deployment-<actor>.yaml](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/deployment-coordinator.yaml): Kubernetes deployment files for the coordinator, members, and observer.
* [ingress-<actor>.yaml](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/ingress-member1.yml): ingress and service files to publicly expose the DAC actors on the internet.
* [pvc-<actor>.yaml](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/pvc-members.yaml): persistent volumes to store the data on.

#### Docker image used to run the nodes:

* This setup requires the Octez binaries alongside a few other utilities, all encapsulated within a Docker image. The most straightforward way to obtain them is to use the official images available [on DockerHub](https://hub.docker.com/r/tezos/tezos-bare/tags?page=1&name=master). These images come preloaded with the *octez-dac-node* binary.
* To initialize the nodes, we run multiple shell scripts. For ease, we have opted to [include Bash as a binary](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/Dockerfile) within the Docker image. However, for a sleeker setup and enhanced security, you might want to execute these shell scripts in an [InitContainer](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/).

#### Env vars and feature toggle flags:

* In Kubernetes, updating a ConfigMap does not automatically trigger a restart of the pods that mount it: pods continue to use the previous version of the ConfigMap until they are restarted. We use that behavior to conveniently manage some feature flags.
* Example of toggles and env vars we set in the [configmap](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/configmap-env-vars.yaml):

```yaml
WIPE_COORDINATOR_DIR: "NO" # put 'YES' if you want to remove that storage before starting up
WIPE_COORDINATOR_REVEAL_DATA_DIR: "NO"
WIPE_COORDINATOR_OCTEZ_CLIENT_DIR: "YES"
COORDINATOR_DIR: "/mounted_volume/dac/coordinator"
COORDINATOR_REVEAL_DATA_DIR: "/mounted_volume_reveals/dac/coordinator/reveals"
TEZOS_RPC: "https://ghostnet.tezos.marigold.dev"
```

* For instance, if for some reason we want to clean the data dir of the coordinator, we can set `WIPE_COORDINATOR_DIR: "YES"` and restart the pod so that the directory gets cleaned up at startup.
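To make this concrete, here is a minimal sketch of how a startup script could consume such a toggle. It is illustrative, not a verbatim excerpt; the actual logic lives in [configmap-scripts.yaml](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/configmap-scripts.yaml):

```bash
# Illustrative sketch: wipe the coordinator data dir only when the toggle is set.
if [ "${WIPE_COORDINATOR_DIR}" = "YES" ]; then
  echo "Wiping ${COORDINATOR_DIR} before startup"
  rm -rf "${COORDINATOR_DIR:?}"   # :? aborts if the variable is unset or empty
fi
```

After flipping a flag in the ConfigMap, restart the affected pods, e.g. with `kubectl rollout restart deployment/dac-coordinator` (the deployment name here is hypothetical; use the one from your own manifests).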
#### Storing scripts in a configmap:

Bash scripts are stored in [a configmap](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/configmap-scripts.yaml) and mounted in the pods. Basically, each script does the following:

1. [set dirs](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/configmap-scripts.yaml#L26) and env
2. [import](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/configmap-scripts.yaml#L43) the keys
3. [configure](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/configmap-scripts.yaml#L46) the DAC node
4. [run](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/configmap-scripts.yaml#L74) the DAC node

#### Cloud native security & management of secrets:

If you're operating the DAC on a Kubernetes instance or a private server managed by a cloud provider, setting up authorization and authentication is essential. Most often, this is done using the cloud provider's IAM (Identity and Access Management). Such implementations ensure that only authorized users can access sensitive cloud resources and that they can execute only permitted actions.

For maintaining security specifically within Kubernetes, it's imperative to follow its security best practices, which include:

* Role-Based Access Control (RBAC)
* Network segmentation and policies
* Encrypting data stored in Kubernetes' etcd key-value store
* Implementing policy engines
* Limiting access to the Kubernetes API
* Configuring security contexts for pods and containers
* Using namespaces to isolate workloads
* etc.

For those who use GitOps workflows, tools like [SOPS](https://fluxcd.io/flux/guides/mozilla-sops/) and [SealedSecrets](https://github.com/bitnami-labs/sealed-secrets) come in handy. They offer effective ways to encrypt secrets, in this context the private keys of the wallets.

#### Exposing DAC endpoints:

The `octez-dac-node` binary does not support CORS headers. Therefore, we use a reverse proxy positioned between the DAC nodes and the client to append CORS headers to HTTP responses. As illustrated in the provided ingress manifests, we use the Nginx ingress controller as a reverse proxy to [supplement the CORS headers](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/ingress-coordinator.yml#L9):

```yaml
nginx.ingress.kubernetes.io/enable-cors: "true"
(...)
```

Additionally, some ingress configuration options need tweaking:

```yaml
# increase max body size to be able to transfer large files to the DAC:
nginx.ingress.kubernetes.io/proxy-body-size: 2g
# suppress proxy timeouts; there is no way to turn them off per se,
# so we set an unrealistically large value so they never trigger:
nginx.ingress.kubernetes.io/proxy-read-timeout: "2147483647"
nginx.ingress.kubernetes.io/proxy-send-timeout: "2147483647"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "2147483647"
```

In our case, we also use the Nginx ingress controller to expose the DAC endpoints over the internet on a [public endpoint](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/ingress-coordinator.yml#L25). You can also reach your DAC from within Kubernetes on its corresponding [service](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/ingress-coordinator.yml#L39): *<name-of-service>.<k8s-namespace>.svc:80*

Like so:

```
dac-coordinator-service-ghostnet.dac-ghostnet.svc:80
```

Please note that we had to specify `--rpc-addr 0.0.0.0 --rpc-port <port>` in the `octez-dac-node` [command](https://github.com/marigold-dev/dac-deploy-tools/blob/main/k8s/configmap-scripts.yaml#L46) for each node to ensure it listens for connections on the desired address and port.
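As a quick in-cluster sanity check, you can hit the node through that service. A minimal sketch, assuming the service maps port 80 to the node's RPC port as in the linked service manifest, and that the node serves the `/health/live` endpoint used by the liveness probe in the next section:

```bash
# In-cluster smoke test, run from any pod in the same cluster:
# query the coordinator's health endpoint through its Kubernetes service.
curl -s http://dac-coordinator-service-ghostnet.dac-ghostnet.svc:80/health/live
```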
#### Monitoring:

As of now, the `octez-dac-node` does not produce metrics that we could use to monitor our nodes. Nonetheless, we can configure a livenessProbe in our deployments, allowing Kubernetes to periodically verify that the container is alive:

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 11100
  initialDelaySeconds: 40
  periodSeconds: 20
  failureThreshold: 3
```

#### Backup and recovery solutions:

DAC data not only resides on Persistent Volume Claims (PVCs), a dependable storage solution within Kubernetes, but is also replicated across all DAC community Members and Observers. Despite this inherent redundancy, it remains prudent to have an external backup and recovery strategy for your PVCs. We opted for [Velero](https://github.com/vmware-tanzu/velero), an open-source tool that can safely back up, recover, and migrate Kubernetes cluster resources and persistent volumes, and comes equipped with a disaster recovery feature.
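As a sketch of what that can look like with the Velero CLI (assuming Velero is already installed in the cluster with a storage backend configured, and reusing the `dac-ghostnet` namespace from the service example above):

```bash
# Minimal sketch; persistent volumes are only included if volume snapshots
# or file-system backup are set up for your storage provider.

# Nightly backup of everything in the DAC namespace:
velero schedule create dac-nightly \
  --schedule "0 2 * * *" \
  --include-namespaces dac-ghostnet

# One-off backup, and a restore from it (e.g. for a disaster recovery drill):
velero backup create dac-manual --include-namespaces dac-ghostnet
velero restore create --from-backup dac-manual
```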