# Design: In-Guest File-Level Backup for OpenShift KubeVirt VMs via OADP

## 1. Abstract

OADP provides robust, snapshot-based backup and restore for OpenShift KubeVirt Virtual Machines (VMs). This is ideal for full disaster recovery but does not address the common need for granular, file-level backups initiated from within the guest OS (e.g., backing up application configurations, user data, or log directories).

This design proposes a Kubernetes-native, client-server architecture to enable file-level backups directly from within VMs. We will introduce a new Custom Resource Definition (CRD), `BackupStorageLocationServer`, which deploys and manages a Kopia repository server within the OpenShift cluster. This server will use an existing Velero `BackupStorageLocation` (BSL) as its storage backend. Users inside VMs can then use a standard Kopia client to connect to this managed server and perform self-service, path-level backups and restores.

This approach shifts from an external, "pull" model (like `libguestfs` or `ssh`) to an internal, "push" model. It empowers VM users, integrates with existing cloud storage infrastructure, and keeps storage governance centralized with administrators.

## 2. Background

Current VM protection with OADP focuses on block-level consistency by snapshotting Persistent Volume Claims (PVCs). While this method is effective for full VM recovery, it is inefficient and overly complex for scenarios where operators or application owners need to restore a single corrupted configuration file or recover a specific user directory.

This design introduces a persistent, long-running Kopia service that decouples the file-backup lifecycle from the VM snapshot lifecycle, offering greater flexibility, efficiency, and a better user experience for application owners and users within the VM.

## 3. Goals

- **Declarative Deployment:** Provide a Kubernetes-native Custom Resource Definition (CRD) (`BackupStorageLocationServer`) to automate the deployment and configuration of a multi-tenant Kopia repository server.
- **Storage Re-use:** Leverage existing Velero `BackupStorageLocation` (BSL) resources to avoid credential duplication and simplify storage management.
- **Self-Service for VM Users:** Enable users within a KubeVirt VM to use a standard Kopia client to back up and restore their own files on their own schedule.
- **Centralized User Management:** Manage Kopia user credentials via OpenShift `Secrets` or `ConfigMaps`, allowing for GitOps-style management and easy user rotation.
- **Secure by Default:** Ensure all communication between the in-guest client and the in-cluster server is secured with TLS.

## 4. Non-Goals

- **Replacing Velero:** This is a *complementary* solution for file-level backup, not a replacement for Velero's full VM snapshot capabilities.
- **Automatic Backup Client Installation:** The installation and configuration of the Kopia client *inside* the guest OS is the responsibility of the VM owner.
- **External File Access:** This design does not use `libguestfs`, `ssh`, or any other mechanism to access the VM from the outside. All connections are initiated by the client inside the VM.
- **Application Consistency:** It is the responsibility of the VM user to ensure that the application’s data is in a consistent state before starting a backup (e.g., by flushing caches or pausing writes).

## 5. High-Level Architecture

The entire lifecycle of the backup server is managed through Kubernetes custom resources (CRs). This enables GitOps workflows and simplifies server deployment and configuration.

![OADP_Kubevirt_Kopia_BSL (1)](https://hackmd.io/_uploads/HJUJV5F7lx.png)

### 5.1 Separation of Concerns

- Storage Administrators manage the underlying storage (e.g., S3) and credentials, without needing access to the OpenShift cluster.
- Cluster Administrators manage the backup service, users, and permissions.
- VM Users are consumers of the service, responsible only for their own data, and need no knowledge of the underlying infrastructure or the S3 storage credentials.

### 5.2 Centralized Control, Decentralized Execution

The Kopia Server acts as a central point for authentication and access control. The Kopia Client, executed on user VMs, performs the client-side work of reading files, encryption, deduplication, and compression.

### 5.3 Security and Multi-Tenancy

The architecture is built on a zero-trust model with multiple layers of security to ensure data confidentiality, integrity, and tenant isolation.

- Zero-Knowledge Encryption: Data is encrypted on the user's VM by the Kopia Client before transit. The Kopia Server and S3 storage only handle opaque, encrypted blobs, so S3 administrators cannot access user backup contents. **The Kopia repository administrator is able to view and recover user files.**
- Secure Transport: All client-server communication is protected by mandatory TLS, preventing eavesdropping on the network.
- Centralized Access Control: The Kopia Server acts as a strict gateway, enforcing Access Control Lists (ACLs) for every operation. This ensures users can only access and manage their own backup data, providing robust multi-tenancy.
- Optional Storage Hardening: S3 administrators can enable additional safeguards like Object Locking (WORM) for immutability against ransomware or accidental deletion, and Server-Side Encryption (SSE) for defense-in-depth and compliance at the storage layer.
- User data, ACLs, and credentials are stored within the Kopia repository on S3. If a disaster makes OpenShift services unavailable, the user configuration is preserved on the S3 storage.

## 6. Sequence Diagrams

### 6.1 Enabling the BSL Server - Kopia Backup Server

Perspective: Cluster Administrator's interaction with the OpenShift system.
This diagram shows how a Cluster Admin's actions (creating custom resources) allow the Kopia Server to be deployed and configured.

```mermaid
sequenceDiagram
    actor Cluster Admin
    participant k8s as OpenShift API
    participant op as OADP Operator
    participant ks as Kopia Server Pod
    participant s3 as S3 Bucket

    Cluster Admin->>k8s: 1. Apply BackupStorageLocation CR - **BSL**
    note over op: Operator is watching for CRs
    Cluster Admin->>k8s: 2. Apply BackupStorageLocationServer CR - **BSLS**
    op->>k8s: 3. Read BSL & BSLS resources
    op->>op: 4. Process reconciliation loop
    op->>k8s: 5. Read BSL config (for S3 secret and other config data)
    op->>k8s: 6. Create Pod (Kopia Server) resource
    activate ks
    k8s->>ks: 7. Start Pod (Kopia Server)
    ks->>s3: 8. Connect to repository
    s3-->>ks: Connection successful
    op->>op: 9. Process reconciliation loop (watching for Kopia Server pod)
    deactivate ks
    op->>k8s: 10. Update BSLS status to 'Ready'
    k8s-->>Cluster Admin: BSLS Status is now 'Ready'
```

### 6.2 Managing Users and Access Control (ACLs)

Perspective: Cluster Administrator's interaction with the running Kopia Server.

ACLs and users can be shared across all Kopia server instances or defined per instance. ACLs are managed in the same way as users. The `kopia server acl` usage is shown first, followed by a diagram of how an administrator uses `kopia` to configure users and their permissions on the Kopia server.

> Note: The `kopia` CLI can be run directly in the deployed Kopia pod, to which the cluster administrator has access, in which case no external administrative access is required.

```shell
usage: kopia server acl <command> [<args> ...]

Manager server access control list entries

server acl
    add --user=USER --target=TARGET --access=ACCESS [<flags>]
    delete [<flags>] [<id>...]
    enable [<flags>]
    list [<flags>]
```

```mermaid
sequenceDiagram
    actor Cluster Admin
    participant cli as Kopia CLI
    participant ks as Kopia Server

    %% --- Phase 1: Connect ---
    Cluster Admin->>cli: 1. Connect to repository with admin credentials
    note over cli: `kopia repository connect server --url=<server> --username=<admin_user> ...`
    activate cli
    cli->>ks: 2. Authenticate Admin Credentials
    activate ks
    ks-->>cli: 3. Authentication OK
    deactivate ks
    cli->>cli: 4. Save connection details locally
    note over cli: Subsequent commands will use this context.
    cli-->>Cluster Admin: 5. Display "Connected to repository"
    deactivate cli

    %% --- Phase 2: Perform Administrative Tasks, some example tasks ---
    note over Cluster Admin, ks: Admin now performs one or more tasks.

    opt Add User
        Cluster Admin->>cli: 6a. Execute `kopia server user add <flags> <username>`
        activate cli
        cli->>ks: Authenticated request: Add User for <user>
        ks-->>cli: Response: OK
        cli-->>Cluster Admin: Displays success message
        deactivate cli
    end

    opt List Users
        Cluster Admin->>cli: 6b. Execute `kopia server user list`
        activate cli
        cli->>ks: Authenticated request: Get User List
        ks-->>cli: Returns list of users
        cli-->>Cluster Admin: Displays user list
        deactivate cli
    end

    opt Change User Password
        Cluster Admin->>cli: 6c. Execute `kopia server user set <user> ...`
        activate cli
        cli->>ks: Authenticated request: Change Password for <user>
        ks-->>cli: Response: OK
        cli-->>Cluster Admin: Displays success message
        deactivate cli
    end

    opt Delete User
        Cluster Admin->>cli: 6d. Execute `kopia server user delete <user>`
        activate cli
        cli->>ks: Authenticated request: Delete <user>
        ks-->>cli: Response: OK
        cli-->>Cluster Admin: Displays success message
        deactivate cli
    end

    %% --- Phase 3: Disconnect ---
    Cluster Admin->>cli: 7. Disconnect from repository
    note over cli: `kopia repository disconnect`
    activate cli
    cli->>cli: 8. Remove local connection details
    cli-->>Cluster Admin: 9. Disconnected
    deactivate cli
```

### 6.3 VM User Performing a File Backup

This sequence diagram illustrates a typical session for a VM User.
The user first connects their Kopia client to the server, then performs tasks like creating backups or restoring files, and finally disconnects.

```mermaid
sequenceDiagram
    actor VM User
    participant cli as Kopia CLI (on VM)
    participant ks as Kopia Server

    %% --- Phase 1: Connect to the Repository ---
    VM User->>cli: 1. Connect to repository with user credentials
    note over cli: `kopia repository connect server --url=<server> ...`
    activate cli
    cli->>ks: 2. Authenticate User Credentials
    activate ks
    ks-->>cli: 3. Authentication OK
    deactivate ks
    cli->>cli: 4. Save connection details locally
    note over cli: Subsequent commands will use this context.
    cli-->>VM User: 5. Display "Connected to repository"
    deactivate cli

    %% --- Phase 2: Perform Backup and Other Tasks ---
    note over VM User, ks: User can now perform various actions.

    opt Create a Backup
        VM User->>cli: 6a. Execute `kopia snapshot create /path/to/data`
        activate cli
        cli->>cli: Reads, compresses, and encrypts local files
        cli->>ks: Authenticated request: Upload backup data & manifest
        activate ks
        ks->>ks: Verifies ACLs & writes to storage (S3)
        ks-->>cli: Response: OK (Snapshot created)
        deactivate ks
        cli-->>VM User: Displays success message
        deactivate cli
    end

    opt List Backups
        VM User->>cli: 6b. Execute `kopia snapshot list`
        activate cli
        cli->>ks: Authenticated request: Get Snapshot List
        ks-->>cli: Returns list of snapshots
        cli-->>VM User: Displays snapshot list
        deactivate cli
    end

    opt Restore Files
        VM User->>cli: 6c. Execute `kopia restore <snapshot_id> ...`
        activate cli
        cli->>ks: Authenticated request: Download backup data
        ks-->>cli: Returns backup data
        cli->>cli: Decrypts & writes files to local disk
        cli-->>VM User: Displays success message
        deactivate cli
    end

    %% --- Phase 3: Disconnect ---
    VM User->>cli: 7. Disconnect from repository
    note over cli: `kopia repository disconnect`
    activate cli
    cli->>cli: 8. Remove local connection details
    cli-->>VM User: 9. Display "Disconnected"
    deactivate cli
```

## 7. Detailed Design

This section outlines the technical specification for the new `BackupStorageLocationServer` CRD that enables the Kopia server functionality, along with security considerations.

### 7.1. `BackupStorageLocationServer` CRD Definition

A new CRD `BackupStorageLocationServer` is introduced. This resource extends the OADP/Velero `BackupStorageLocation` (BSL) by using it as a storage backend for a file-level backup server.

**apiVersion:** `oadp.openshift.io/v1alpha1`

**kind:** `BackupStorageLocationServer`

#### `spec`

The `spec` defines the desired state of the Kopia server deployment.

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`backupStorageLocation`** | `BackupStorageLocationSpec` | A reference to the existing `BackupStorageLocation` that will serve as the storage backend. | Yes |
| **`repositoryPasswordSecret`** | `corev1.SecretKeySelector` | Reference to a Secret key containing the master password for initializing the Kopia repository. | Yes |
| **`userManagement`** | `UserManagementSpec` | Configuration for managing Kopia user credentials for multi-tenancy. | Yes |
| **`prefix`** | `string` | An optional path prefix. The final path in the bucket will be `oadp-bsls/{prefix}`. It is used to separate Kopia snapshots from the OADP/Velero snapshots. | No |
| **`service`** | `ServiceSpec` | Defines the Kubernetes Service used to expose the Kopia server. Defaults to `ClusterIP`. | No |
| **`tls`** | `TLSSpec` | TLS configuration for the server endpoint. | No |
| **`podConfig`** | `PodConfig` | Configuration for the deployed Kopia server pod, including resources and scheduling. | No |

#### `BackupStorageLocationSpec`

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`name`** | `string` | Name of the `BackupStorageLocation` CR. | Yes |
| **`namespace`** | `string` | The namespace where the `BackupStorageLocation` is located. Defaults to `oadp-operator`. | No |

#### `UserManagementSpec`

| Field | Type | Description | Required |
| :--- | :--- | :--- | :--- |
| **`source`** | `UserSource` | Defines the source of the Kopia user list file. Only one source can be set. | Yes |

| `UserSource` Field | Type | Description |
| :--- | :--- | :--- |
| **`secret`** | `corev1.SecretKeySelector` | A Secret key containing the user list (e.g., `username@hostname:passwordhash`). |
| **`configMap`** | `corev1.ConfigMapKeySelector` | A ConfigMap key containing the user list. |

#### `TLSSpec`

| Field | Type | Description |
| :--- | :--- | :--- |
| **`secretName`** | `string` | Name of a `kubernetes.io/tls` type Secret containing `tls.crt` and `tls.key`. |

#### `PodConfig`

| Field | Type | Description |
| :--- | :--- | :--- |
| **`podResources`** | `corev1.ResourceRequirements` | Defines the CPU and memory resource requests and limits for the Kopia server pod. |
| **`loadAffinity`** | `LoadAffinity` | Defines the load affinity for scheduling the Kopia server pod onto specific nodes. |

#### `LoadAffinity`

| Field | Type | Description |
| :--- | :--- | :--- |
| **`nodeSelector`** | `metav1.LabelSelector` | The label selector to match nodes where the pod can be scheduled. |

#### `status`

The `status` subresource reflects the observed state of the Kopia server.

| Field | Type | Description |
| :--- | :--- | :--- |
| **`conditions`** | `[]metav1.Condition` | Standard conditions like `Available`, `Progressing`, `Degraded`. |
| **`serviceURL`** | `string` | The internal DNS address and port for clients to connect to (e.g., `my-server.oadp-operator.svc:51515`). |
| **`repositoryStatus`** | `string` | The status of the Kopia repository (`Initialized`, `NotInitialized`, `Error`). |

### 7.2. Security Considerations

* **TLS Encryption:** Communication between the in-guest client and the server must be encrypted in production. The `spec.tls.secretName` field enables this.
* **Controller RBAC:** The controller requires read-only access to `Secrets` and `BackupStorageLocations` in the OADP/Velero namespace. This access must be tightly scoped. The controller should create a derived, temporary Secret containing only the necessary credentials for the Kopia pod to consume, rather than mounting the OADP BSL's credentials Secret directly.
* **Network Policies:** `NetworkPolicy` resources should be deployed to restrict access to the Kopia server `Service`, allowing connections only from designated namespaces that host VMs.
* **User Credential Management:** Passwords in the user list file are hashed by Kopia, not stored in plaintext. This file should always be managed via a `Secret` rather than a `ConfigMap` for a better security posture.

## 8. Other Considered Designs

A previously considered approach using `libguestfs`, while technically feasible, tightly couples file-level operations to the cluster administrator's Velero backup. It also introduces significant operational overhead for each backup operation, such as taking a snapshot, creating a PVC, mounting it, and launching a helper pod. The Kopia server model provides a more decoupled, scalable, and user-centric solution for file-level backups.

### License

Kopia is licensed under the Apache License 2.0: https://github.com/kopia/kopia/blob/master/LICENSE
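
### Example `BackupStorageLocationServer` Manifest

To make the `spec` fields from section 7.1 concrete, here is a minimal sketch of what a manifest might look like. It is an illustration under stated assumptions, not a finalized schema. Every resource name, Secret key, the prefix value, the resource figures, and the node label are hypothetical. The `service` field is left out because this design does not yet specify its sub-fields. The nesting under `repositoryPasswordSecret`, `podResources`, and `nodeSelector` assumes the upstream Kubernetes shapes of `SecretKeySelector`, `ResourceRequirements`, and `LabelSelector`.

```yaml
# Illustrative sketch only: names and values are hypothetical.
apiVersion: oadp.openshift.io/v1alpha1
kind: BackupStorageLocationServer
metadata:
  name: vm-file-backup            # hypothetical name
  namespace: oadp-operator        # assumed to match the default BSL namespace
spec:
  backupStorageLocation:
    name: default                 # hypothetical name of an existing BSL
    namespace: oadp-operator      # optional; shown for clarity
  repositoryPasswordSecret:       # corev1.SecretKeySelector
    name: kopia-repo-password     # hypothetical Secret
    key: password                 # hypothetical key
  userManagement:
    source:
      secret:                     # only one source may be set
        name: kopia-users         # hypothetical Secret holding the user list
        key: users                # hypothetical key
  prefix: vm-files                # data would land under oadp-bsls/vm-files
  tls:
    secretName: kopia-server-tls  # hypothetical kubernetes.io/tls Secret
  podConfig:
    podResources:                 # corev1.ResourceRequirements
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        memory: 1Gi
    loadAffinity:
      nodeSelector:               # metav1.LabelSelector
        matchLabels:
          node-role.kubernetes.io/worker: ""
```

The user list is sourced from a `Secret` here rather than a `ConfigMap`, in line with the recommendation in section 7.2.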