# [WIP] UOR Technical Proposal

# Mission

We want a way to make secure supply chain concepts integral to content management in a universal and accessible way.

# Vision

Normalize supply chain security at scale in a way that serves internal and external stakeholders.

# Summary

Traceability of software artifacts in supply chains is a long-standing but increasingly serious security concern. Some ecosystems have made an effort to provide solutions that enhance the verifiability of software artifacts, but these solutions are often not universally applicable. The purpose of the UOR initiative is to create a scalable cross-content correlation framework that enhances the traceability of decentralized software artifacts.

# Workflow and Critical User Journeys

The mock CLI below describes how a user would interact with a registry using the UOR client. It also describes the workflow for package manager plugins. For the purposes of this example, we will use dnf.

## Publishing a Collection

- From disk input

```bash
uor-client build collection registry.example.com/test:1.0
uor-client push registry.example.com/test:1.0 --sign
```

- From registry generated manifest

```bash
uor-client store /path/to/manifest registry.example.com/test:1.0
uor-client push registry.example.com/test:1.0
```

## Resolving attribute queries

```bash
export UOR_REGISTRY_CONFIG=registry-config.yaml
uor-client resolve /path/to/attribute-query
```

## Create a deployment record

```bash
# Adds links on the manifest as is. This can be used to
# continuously update dependencies and create new deployment records.
uor-client create aggregate /path/to/config

# Resolve linked attribute queries into manifest descriptors.
uor-client create aggregate /path/to/config --freeze

# This will generate an immutable deployment record and a corresponding
# component list that can be attached.
uor-client create deployment-record registry.example.com/test:1.0
```

## Generate Component List

```bash
uor-client create component-list registry.example.com/test:1.0 --format spdx
```

## Consuming collections

```bash
uor-client pull registry.example.com/test:1.0 --follow-links
```

## Through the App Proxy

```bash
# Start app proxy server with CLI
uor-client serve /var/run/uor.sock

# With systemd
systemctl start uor-manager

# With dnf plugin
dnf install httpd

# Find with fuse driver
uor-fuse ls ... httpd
```

# High-Level Plan

## Phases

### POC (Pre-MVP)

- Schema Validation
- Attestations
- Component List Automation
- Signature Verification
- Manifest/Blob Signing
- Registry Attribute Queries
- Linked Collections

### Crawl (MVP)

- Search Domains
- Linked Collection Locking
- Package Manager Compatibility (i.e. app proxy)
- Host Policy Service
- Enable SLSA Level 2 Compliance
- Schema Cataloging

### Walk (Beyond MVP)

- UOR Native Package Managers
- Registry Gateway Policy Service
- Collection Change Management
- Enable SLSA Level 3 Compliance

### Run (Beyond MVP)

- RBAC/ABAC hybrid capability
- Attribute and Blob Encryption
- Enable SLSA Level 4 Compliance

> Note: SLSA compliance levels would be achieved when UOR is integrated with a hosted build service that produces attestable build metadata.

## Steps

1. Create a Collection spec to serve as an extension of the OCI spec (v3)
2. Define an initial attribute extension for the distribution spec
3. Fork Quay and add functionality that will support the distribution spec
4. Develop a local solution that can enable compatibility with v2 registries
5. Create a policy engine solution that applies logic to operations based on attributes
6. Clarify the extension spec following lessons learned from the Quay fork

## Required Components

- Server-side (registry) distribution implementation
- CLI/Client Libraries
- Local Storage Solution with embedded database
- Single tool for service management
- Services:
  - Policy Engine
    - Host Level Policy Engine
    - Policy Engine as a Registry Gateway
  - Collection Manager

### Upstreams

- sigstore/cosign
- in-toto
- oras
- anchore syft (possibly?)

## Distribution Spec Changes

The goal of the distribution spec changes is to build an extension to the existing OCI distribution spec that allows cross-repo/cross-namespace attribute querying. With query optimization and backward compatibility in mind, we have the following requirements:

- This must retain the V2 structure with standalone blob storage
- This will need an indexing service to optimize certain types of queries (e.g. attributes)

### RBAC

The multi-tenant design of container registries will not change with the addition of attribute queries. Repository-scoped permissions should still be used, and when returning collection results, the result set should be filtered to what the requesting user is permitted to view. In other words, owners will be able to choose whether to make artifacts public.

### Query scoping

On the client side, scoping queries to namespaces and repositories will be supported.

## CLI/Client Libraries

We have existing CLI and library code bases that manage single collections and linked collections. These will continue to be used to publish single collections and manage local storage. The CLI/libraries will be extended to interface with the proposed API changes.

## Local Storage Solution

- V2 Compatibility
- Mimics V3 functionality using an embedded database
- Descriptors are stored in the database and used to create a resultant collection that can be published as a single unit.

## Service Manager

- Manages running gRPC services

### Service - Policy Engine

> Question: Could this be an OPA Plugin?


Component lists or any attestations on content add little to no value if that information is not used for security gating. The policy engine allows users to configure and enforce policy based on the existence of these artifacts and their contents.

### Service - Collection Manager

The purpose of the collection manager service is to act as a proxy for existing package managers and installation tooling. It can also be used by clients that have no knowledge of collection publishing and manipulation.

## UOR Concepts

### Attribute Schemas

Schema Type is part of an attribute set. Each schema has an ID, and for a Collection to be compliant, it must have schema type and value declarations that match the validating schema. Referencing the schema by its ID in the manifest allows the schema source to come from anywhere (e.g. a remote reference, a local file, or structs in code). Schemas enable consistency and shared attribute meaning for collection publishers. Schemas can also be used in the generation of other configuration files such as a dataset configuration file or an attribute query. Each published schema will be discoverable by its attributes, which will include identification, description, and category.

#### Core Schema

UOR has a core schema that will be validated at publish time. All collections will require this information to be consumable. Part of the core schema includes component list information.

### Multi-Schema

Option 1: The schema type would be an array of schema IDs as defined by the publisher.

Option 2: The publisher defines the core schema, and schemas can be registered with the Collection Manager to add context to an operation. The schemas would be added to a schema pool along with the core schema.

Option 3: Combination of both.

### Resultant Collections

The attribute query endpoint can be used to automate building applications from existing collections.
The endpoint will resolve an attribute query to an index manifest of matching descriptors that can be used for collection publishing. To keep this workflow efficient, it would be best practice to publish only one component or file per collection. However, if a collection is published with more than one file, the UOR client will allow attribute filtering on individual collections.

### Search Domains

A content query can be sent to multiple OCI registries. A user can specify which registries to query and prioritize registry results.

### Reference Types/Collection Links

#### OCI Reference types

According to the OCI Referrers spec, a referrer declares its relationship to another manifest in the `subject` field; the spec also defines a tag schema for locating referrers. When querying the referrers API, an index manifest response is generated. This type of workflow could be used when creating supporting artifacts for a collection that would only ever reference one collection as the parent. An important note here is that a referring manifest can only ever have one subject, meaning this relationship takes the form of a tree structure.

##### Use Case

```mermaid
graph LR;
A[Signature]-->B[Collection];
C[Attestation] --> B
D[Scan Results] --> B
```

#### Collection Link

Links are a UOR-specific type and differ from references because they define relationships that can be cross-repo. These links can also form a DAG structure.

https://github.com/uor-framework/uor-client-go/blob/main/docs/design/nodes.md#linked-collection

#### Methods

- Option 1: We could link collection references directly in the collection spec.
  - Pros
    - Allows the publisher to define collection links without additional steps.
  - Cons
    - Rate of Change: Will a collection change every time a link does?
    - Non-publishers cannot create collection links
- Option 2: Maintain an index manifest in each repo to track relationships. This would be created by running an attribute query. It would produce an index of attribute-resolved links.
  - Pros
    - Anyone can link two collections and keep them up to date easily using the attribute API
  - Cons
    - Another manifest reference will have to be maintained. Scaling this could get complex from a lifecycle management perspective.

##### Use Cases

```mermaid
graph LR;
A[Terraform main.tf]-->B[AWS Provider];
A --> C[Terraform Module 1]
A --> D[Terraform Module 2]
E[Another Terraform main.tf] --> B
E --> D
```

```mermaid
graph LR;
A[CVE Collection]--Links-->B[CPE Attribute]
B--Resolve-->C[Collection Digest Reference]
```

## Supply Chain Security Areas

### Immutable/Mutable References

Our goal is to make collections tamper-proof without relying solely on immutable references. Collections will, of course, always have a digest, but pulling by digest would not be required to ensure the collection contains the expected content.

**Why don't we want to use immutable references as the main security control?**

Using immutable references makes upgrading software more difficult. Without signatures, using immutable references just shifts risk from one party to another, because immutable references do not provide proof of the publisher's identity (i.e. non-repudiation).

Where is immutable reference usage important?

- Build reproducibility
- TOCTOU (Time of Check, Time of Use)

Solutions:

- Policy that discourages highly mutable tags like "latest" in favor of semver tags such as 2.X or 2.1.X.
- Digest translation after attributes are resolved to act as a lock file. This should be republished for use.

### Signatures

Signatures are a better way to ensure that the collection has not been tampered with and comes from a trusted entity. Blobs must be individually signed so that they can be referenced as standalone descriptors. This functionality will come from the use of cosign, which already supports OCI artifacts.

### Component List

Component list fields are part of the core UOR schema and are required to be present in the manifest attributes.
This information will be listed in the collection manifest attributes to ensure it is queryable with the attribute searching API. Component list attributes will be collected automatically by the UOR CLI when possible. If the UOR CLI is integrated with a trusted build service, the component information will be populated by that service. When resultant collections are generated with the attributes API, all component list information will be composed from the existing information in the resolved collections. These component lists will contain the information required for conversion into commonly adopted formats like SPDX and CycloneDX.

#### Streamlined Access to Component Lists

A requirement for security teams that manage many products is streamlined component list management. The proposed solution makes this possible by allowing users to configure search domains (e.g. queryable registry endpoints) for content. Using the component list schema and product schema, queries can be constructed to easily find application-specific component lists. This allows the storage of components to remain decentralized for distributed teams while still providing one method for content retrieval.

### Attestations

We will support in-toto attestations. One core attestation during collection publishing will use the collection as the statement's subject and the attributes as the predicate. More information [here](https://github.com/in-toto/attestation/blob/main/spec/README.md).

Attestations from user input will be accepted during collection publishing. Tooling will be required to fail closed if attestations are not present.

### Vulnerability Management

- Component list input for vulnerability scanning (e.g. Anchore Grype)
- Attribute queries
- Semver tagging (no more latest tag)
- Auto-upgrades

### Transparency Logging

> TODO

# Areas of Concern

# Snippets

```go
package uor // placeholder package name, not taken from the proposal

import (
	"encoding/json"

	"github.com/opencontainers/go-digest"
)

// Manifest provides a schema for a manifest that extends the OCI manifest when marshaled to JSON.
type Manifest struct {
	// MediaType is the media type of the object this schema refers to.
	MediaType string `json:"mediaType"`

	// ArtifactType is the IANA media type of the artifact this schema refers to.
	ArtifactType string `json:"artifactType"`

	// Blobs is a collection of blobs referenced by this manifest.
	Blobs []Descriptor `json:"blobs,omitempty"`

	// Attributes contains typed attributes for this artifact.
	Attributes map[string]json.RawMessage `json:"attributes,omitempty"`

	// Links defines references to the manifest(s) of any Collection that this
	// Collection is dependent on or relates to. The attributes of each collection
	// can be used to create an attribute query, or the Collection can simply be
	// resolved to the digest.
	Links []*Descriptor `json:"links,omitempty"`

	// Annotations contains arbitrary metadata for the artifact manifest.
	Annotations map[string]string `json:"annotations,omitempty"`
}
```

```go
// Descriptor describes the disposition of targeted content.
type Descriptor struct {
	// MediaType is the media type of the object this schema refers to.
	MediaType string `json:"mediaType,omitempty"`

	// ArtifactType is the artifact type of the object this schema refers to.
	//
	// When the descriptor is used for blobs, this property must be empty.
	ArtifactType string `json:"artifactType,omitempty"`

	// Digest is the digest of the targeted content.
	Digest digest.Digest `json:"digest"`

	// Size specifies the size in bytes of the blob.
	Size int64 `json:"size"`

	// URLs specifies a list of URLs from which this object MAY be downloaded.
	URLs []string `json:"urls,omitempty"`

	// Attributes contains typed attributes for this artifact.
	Attributes map[string]json.RawMessage `json:"attributes,omitempty"`

	// Annotations contains arbitrary metadata relating to the targeted content.
	Annotations map[string]string `json:"annotations,omitempty"`
}
```

```go
type Core struct {
	// RegistryHint will allow registries to be
	// added to a user search domain in a discovered zone.
	// Similar concept to DNS root hints.
	RegistryHint string `json:"registryHint"`
}
```

```go
// Component schema defines information to create a component list.
// Based on the Anchore Syft Package spec.
type Component struct {
	ID        string   `json:"id"`
	Name      string   `json:"name"`
	Version   string   `json:"version"`
	Type      string   `json:"type"`
	FoundBy   string   `json:"foundBy"`
	Locations []string `json:"locations"`
	Licenses  []string `json:"licenses"`
	Language  string   `json:"language"`
	// Common Platform Enumeration
	CPEs []string `json:"cpes"`
	// Package URL
	PURL string `json:"purl"`
	// Note: the standard encoding/json package ignores the "inline" option,
	// so inlining would need custom (un)marshaling or another encoder.
	AdditionalMetadata json.RawMessage `json:",inline"`
}
```

# Diagrams

## System Context

```mermaid
graph LR;
A[UOR User]-->B[Client]
B --publishing and retrieval--> C[Collection Service]
B --policy enforcement--> D[Host Policy Service]
C --storage and queries--> E[Registry]
E --> F[Organization Policy Service]
```

```mermaid
sequenceDiagram;
participant Client
participant Host Policy Service
participant Collection Service
participant Registry
Client->>Host Policy Service: Check attributes
Host Policy Service-->>Client: Collection Meets Criteria
Client->>Collection Service: Publishing Operation
Collection Service->>Registry: Publishing to Registry
Registry-->>Collection Service: Success
Collection Service-->>Client: Success
```

```mermaid
sequenceDiagram;
participant Client
participant Organization Policy Service
participant Collection Service
participant Registry
Client->>Collection Service: Publishing Operation
Collection Service->>Registry: Publishing to Registry
Registry->>Organization Policy Service: Check attributes
Organization Policy Service-->>Registry: Meets criteria
Registry->>Registry: Store collection
Registry-->>Collection Service: Success
Collection Service-->>Client: Success
```

### Client

#### Searching

```mermaid
sequenceDiagram;
participant User
participant Client
participant Registry
User->>Client: Sending attribute query
Client->>Client: Reconstruct query for Registry API
Note left of Client: Use the search domain to determine which order to query the registries.
Client->>Registry: Send modified query
Registry-->>Client: Constructed and fully resolved resultant collection
Client-->>User: Merged query results from search domain
```

### Registry

#### Components

##### Container Registry Service

##### Database

> TODO Data model information

#### Workflows

##### Publishing

```mermaid
sequenceDiagram;
participant Client
participant API Endpoint
participant Ingestion Service
participant Manifest Scraper
participant Database
Client->>API Endpoint: User Collection
API Endpoint->>Ingestion Service: User Collection
Ingestion Service-->>API Endpoint: Written to disk
API Endpoint-->>Client: Written to digest X
Note right of Manifest Scraper: Triggered by hook or schedule?
Manifest Scraper->>Manifest Scraper: Scrape attributes from manifest
Manifest Scraper->>Database: Write attribute data to database
Note right of Manifest Scraper: Optional Component
```

##### Consumption

```mermaid
sequenceDiagram;
participant Client
participant Attributes API
participant Database
Client->>Attributes API: Attribute Query
Attributes API->>Database: Database Query
Database-->>Attributes API: Result Set
Attributes API-->>Client: Resultant Collection
```

### Collection Service

> TODO

### Policy Service

> TODO

# Glossary

- Collection - A collection of files represented as an OCI artifact.
- Schema - The properties and datatypes that can be specified within a collection.
- Resultant Collection - A collection that is constructed from attribute queries for existing collections to form a larger application.
- Dataset Configuration - A configuration file used during collection publishing to configure file attributes, associate schema, and link existing collections.
- Attribute Query - A configuration file used with the UOR CLI that defines keys and values to filter collection contents during pull operations.
- Component List - An inventory list of all the software artifacts that comprise a collection.
- Multi-Schema Collection - A collection with an attribute set that satisfies multiple attribute schemas.

# Reference

Birkholz, H., Delignat-Lavaud, A., & Fournet, C. (2022). An Architecture for Trustworthy and Transparent Digital Supply Chains (draft-birkholz-scitt-architecture-00). *Internet Engineering Task Force*. <https://www.ietf.org/archive/id/draft-birkholz-scitt-architecture-00.txt>