Akash Singhal

@akashsinghal


  • Ratify currently publishes multiple forms of release assets for both production and development use. Currently, these assets are not published with accompanying supply chain metadata such as signatures, SBOMs, and provenance. Shipping each of these forms of metadata with every binary and container image produced by Ratify will give consumers a verifiable way to guarantee the integrity of Ratify assets. It will also improve Ratify's OpenSSF Scorecard. What does Ratify currently publish? Ratify publishes two types of assets: release and development. Release assets accompany official Ratify GitHub releases; development assets are published weekly (or ad hoc as needed). Each publish type includes the following group of assets: the CRD container image, the base container image, and the base + plugins container image, all pushed to ghcr.io/ratify-project.
  • Testing Setup: Installing Ratify. Install Ratify from a dev build with a single cosign inline KMP:

    ```bash
    helm install ratify \
      oci://ghcr.io/deislabs/ratify-chart-dev/ratify --atomic \
      --version 0-dev \
      --namespace gatekeeper-system \
      --set featureFlags.RATIFY_CERT_ROTATION=true \
      --set image.pullPolicy=Always \
    ```
  • This document is evolving and will be updated as new discussions and information come about. Sample scenario: a user has a CI/CD pipeline used for deploying container images to their cluster. Prerequisite steps: the image is built; various CSSC workflows are applied (e.g. signing, SBOM generation, vulnerability scanning, lifecycle metadata, provenance); and the image plus artifacts are pushed to the registry.
  • PRD doc here. Vulnerability reports are JSON documents attached to the subject as a referrer. What is verified, and how, surfaces different considerations. Which tools will generate these reports? Trivy, Grype, and Syft; we will focus on Trivy and Grype initially. What report format will we support? For now, we will focus on SARIF reports.
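To make the SARIF choice concrete, here is a minimal, hypothetical Go sketch of how a vulnerability-report verifier might decode a SARIF document fetched as a referrer blob and fail on error-level findings; the struct shapes and the severity policy are illustrative assumptions, not Ratify's actual implementation.

```go
package main

// Minimal sketch of inspecting a SARIF vulnerability report. The struct shapes
// below cover only the SARIF fields used here; they are illustrative only.

import (
	"encoding/json"
	"fmt"
	"os"
)

type sarifReport struct {
	Runs []struct {
		Tool struct {
			Driver struct {
				Name string `json:"name"`
			} `json:"driver"`
		} `json:"tool"`
		Results []struct {
			RuleID string `json:"ruleId"`
			Level  string `json:"level"`
		} `json:"results"`
	} `json:"runs"`
}

func main() {
	raw, err := os.ReadFile("report.sarif.json")
	if err != nil {
		panic(err)
	}
	var report sarifReport
	if err := json.Unmarshal(raw, &report); err != nil {
		panic(err)
	}
	// Hypothetical policy: fail verification on any "error" level finding.
	for _, run := range report.Runs {
		for _, res := range run.Results {
			if res.Level == "error" {
				fmt.Printf("failing finding %s from %s\n", res.RuleID, run.Tool.Driver.Name)
				os.Exit(1)
			}
		}
	}
	fmt.Println("no error-level findings")
}
```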
  • Goals: define a generic Cache interface; implement a default cache implementation using Redis (distributed) and/or Ristretto (in-memory); remove all existing cache implementations; migrate existing cache usage to the new interface (this will require some refactoring of the auth cache and the subject descriptor cache); and update the caching documentation. Non-goals: metrics support for the new cache.
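A possible shape for such a generic cache interface, sketched in Go; the method set and names are assumptions for illustration, not the final Ratify interface.

```go
package cache

import (
	"context"
	"time"
)

// CacheProvider is a sketch of a provider-agnostic cache interface that could
// be backed by Redis (distributed) or Ristretto (in-memory).
type CacheProvider interface {
	// Get returns the cached value for key and whether it was found.
	Get(ctx context.Context, key string) (string, bool)
	// Set stores value under key with no expiration.
	Set(ctx context.Context, key string, value string) bool
	// SetWithTTL stores value under key and expires it after ttl.
	SetWithTTL(ctx context.Context, key string, value string, ttl time.Duration) bool
	// Delete removes key from the cache.
	Delete(ctx context.Context, key string) bool
}
```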
  • Ratify stores the auth credentials for a registry in an in-memory map as part of the ORAS store. We want to unify the caches across Ratify so that we can introduce new cache providers such as Redis to support HA scenarios. Is storing registry credentials in a centralized cache a security risk? During the investigation of cache unification, we found that external plugins cannot share in-memory caches with the main Ratify process since they are invoked as separate processes. This includes registry credentials. If we don't address this problem, external plugins will invoke the authentication flow every time they are invoked, which will be a huge performance bottleneck. How do we make auth credentials accessible to external plugins? What is the security boundary? Ratify will be deployed inside a user's K8s cluster. Only users with the correct RBAC can access the cluster, but there is no guarantee of how a centralized cache instance will be secured. For Redis, Ratify would store the registry credentials under a well-known cache key pattern, which would allow any application with an authorized client (via a password) to read registry credentials. Redis has a concept of "databases", but in reality they are only logical namespace separators and do not inherently provide data isolation. We cannot rely on "database" partitioning to provide security guarantees (see this discussion). Other applications are within the cluster boundary, but can we assume those applications fall within the trusted boundary?
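A small Go sketch of the risk described above, assuming a hypothetical well-known key pattern and the go-redis client: any application holding the shared Redis password can read cached registry credentials back.

```go
package main

// Illustration of the concern: registry credentials written to a shared Redis
// instance under a predictable key are readable by any client with the password.
// The key format, TTL, and payload are assumptions, not Ratify's actual scheme.

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	client := redis.NewClient(&redis.Options{
		Addr:     "localhost:6379",
		Password: "shared-password", // the same password grants access to every key
	})

	// Hypothetical well-known key pattern for cached registry credentials.
	key := "ratify_oras_auth|myregistry.azurecr.io"
	if err := client.Set(ctx, key, `{"username":"...","password":"..."}`, 10*time.Minute).Err(); err != nil {
		panic(err)
	}

	// Any other authorized client in the cluster can read the same key; Redis
	// "databases" are logical namespaces and do not isolate this data.
	creds, err := client.Get(ctx, key).Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("credentials readable by any authorized client:", creds)
}
```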
  • Ratify supports TLS and mTLS communication with Gatekeeper when handling external data provider requests. Certificates and keys managed by Gatekeeper/Ratify can be rotated independently. The Ratify server should be able to: identify changes to any of these certificates, replace TLS certs with new versions, and reduce request dropping due to TLS cert mismatch (ideally to none). How it works currently: mTLS guarantees communication integrity from both sides; client and server must each verify the other party's certificate. Gatekeeper generates its own TLS public certificate and key, which are derived from a (new or existing) CA certificate and key. Ratify is provided the CA public key during startup.
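One way a server can serve rotated certificates without restarting is to load the key pair lazily per handshake; the Go sketch below uses the standard library's tls.Config.GetCertificate hook and is an illustration, not Ratify's actual rotation logic (the file paths and the absence of caching are simplifications).

```go
package main

// Reload the TLS key pair on each handshake so a cert replaced on disk by a
// rotation controller is served without dropping requests or restarting.

import (
	"crypto/tls"
	"net/http"
)

func main() {
	tlsCfg := &tls.Config{
		// Called on every handshake; picks up rotated files automatically.
		GetCertificate: func(_ *tls.ClientHelloInfo) (*tls.Certificate, error) {
			cert, err := tls.LoadX509KeyPair("/certs/tls.crt", "/certs/tls.key")
			if err != nil {
				return nil, err
			}
			return &cert, nil
		},
	}

	server := &http.Server{
		Addr:      ":6443",
		TLSConfig: tlsCfg,
		Handler:   http.NewServeMux(),
	}
	// Cert and key are supplied by GetCertificate, so the arguments stay empty.
	_ = server.ListenAndServeTLS("", "")
}
```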
  • Apart from pod metrics, Ratify does not emit any application-specific metrics. This proposal aims to outline the design and implementation of an extensible metrics interface. Goals: add latency metrics for the executor and verifiers; add request metrics for all registry, KMS (Key Management System), and identity network operations; add load metrics for the number of External Data requests the Ratify pod is processing; keep the metrics implementation provider-agnostic; document how to add a new metrics provider; document every metric implemented; and provide a dashboard (nice to have).
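A rough Go sketch of what a provider-agnostic metrics interface could look like; the method names and signatures below mirror the goals listed above but are assumptions, not the final design.

```go
package metrics

import "time"

// MetricsReporter is a sketch of a provider-agnostic reporting interface. A
// concrete provider (e.g. Prometheus or OpenTelemetry) would implement it, and
// the rest of Ratify would depend only on this interface.
type MetricsReporter interface {
	// ReportVerificationRequest records end-to-end latency for a single
	// External Data verification request handled by the executor.
	ReportVerificationRequest(duration time.Duration)
	// ReportVerifierDuration records how long an individual verifier took.
	ReportVerifierDuration(duration time.Duration, verifierName string, subject string)
	// ReportRegistryRequestCount counts registry/KMS/identity network calls,
	// labeled by status code and target host.
	ReportRegistryRequestCount(statusCode int, registryHost string)
	// ReportRequestCount tracks the number of in-flight External Data requests.
	ReportRequestCount(count int)
}
```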
  • Currently, only public registries are supported for cosign verification in Ratify. This is due to how cosign was initially implemented and the inherent limitations of the current Ratify design when working with non-OCI Artifact manifest types. Furthermore, enabling cosign requires a flag to be set in the ORAS store. This is confusing and counterintuitive since cosign is simply a verifier, and a user should not need to interact with the store to support a specific verifier. Even if a cosign verifier is not specified in the config, the validation process will fail for any private registry scenario if the cosignEnabled flag is set to true. The goal is to redesign and implement a new cosign integration which removes the opt-in flag from the ORAS store and supports private registry scenarios for cosign. Current implementation: there are two main components critical to cosign verification. First, a special subroutine in the ListReferrers method checks if the cosignEnabled field is set in the ORAS store config; if cosign is enabled, an existence check for the cosign-specific tag schema manifest is performed using crane as the client. Currently this operation does not support any credentials being passed in and thus fails in any auth-enabled registry. Second, the cosign verifier leverages cosign's online verification function to handle all pulling of the image and the signature blobs and the subsequent verification operations. This violates the Referrer Store and Verifier interaction of Ratify. As a result, the cosign verifier does not have access to credentials from the ORAS auth provider, and thus the Ratify cosign verifier currently only supports public registries.
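For context, cosign stores signatures in a manifest tagged with a transformed form of the subject digest (sha256-&lt;hex&gt;.sig). The Go sketch below shows how that tag could be derived and then resolved through the credentialed referrer store rather than an unauthenticated crane call; the helper name and the flow are illustrative assumptions.

```go
package main

// Derive cosign's signature tag from a subject digest. A redesigned integration
// could resolve the resulting reference through the ORAS referrer store, which
// already holds auth-provider credentials, instead of using crane without auth.

import (
	"fmt"
	"strings"
)

// cosignSignatureTag maps "sha256:<hex>" to cosign's "sha256-<hex>.sig" tag.
func cosignSignatureTag(subjectDigest string) string {
	return strings.ReplaceAll(subjectDigest, ":", "-") + ".sig"
}

func main() {
	digest := "sha256:9f2a0c9e2c9f4f1f8e3b3c5d6a7b8c9d0e1f2a3b4c5d6e7f8091a2b3c4d5e6f7"
	fmt.Println(cosignSignatureTag(digest))
	// The resulting "<repo>:sha256-....sig" reference would then be resolved
	// via the referrer store using ORAS auth-provider credentials.
}
```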
  • This document discusses the different performance tests and results. Previous performance benchmarks: https://hackmd.io/@akashsinghal/rkEZqxxW5. Pod-level analysis: we measured the time it took for Ratify to process a single external data request. This ED request contained multiple unique subject images, one per container, and each unique image had a variable number of signatures attached. Image and signature creation/push, as well as deployment YAML generation, were done using tools in this repository: https://github.com/anlandu/ratify-perf. Test parameters:
  • Currently, Ratify allows subject image references to be specified via tag or digest. Tags are mutable and thus not guaranteed to reference the same/correct image every time. Ratify should: require all subject references to be specified via digest, and implement a new tag-to-digest mutation endpoint for the Ratify httpserver which returns the full digest reference string for a tag-specified reference input. This endpoint can be leveraged by a new external mutating data provider defined for Ratify, which will be invoked by Gatekeeper's mutation webhook. Add mutation endpoint: add a new endpoint /ratify/gatekeeper/v1/mutate along with the accompanying handler, which will utilize the referrer store to fetch the subject descriptor containing the digest. The ORAS auth provider already handles all registry authentication exchanges, so no special authentication logic is needed for this endpoint. However, the first authentication exchange, which is the longest, will now happen during mutation since mutation occurs before admission. It is possible that the request could time out since the default mutation webhook timeout is 1 second (not 3 seconds).
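A minimal Go sketch of what the mutation handler could look like; the request/response payload shapes and the resolveDigest stand-in are hypothetical, with the real handler delegating to the referrer store's subject descriptor resolution (backed by the ORAS auth provider).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type mutateRequest struct {
	// Image references as supplied in the admission request (possibly tags).
	Images []string `json:"images"`
}

type mutateResponse struct {
	// Fully qualified digest references, e.g. "myregistry.io/app@sha256:...".
	Images []string `json:"images"`
}

// resolveDigest stands in for fetching the subject descriptor via the store.
func resolveDigest(ref string) (string, error) {
	return ref + "@sha256:0000000000000000000000000000000000000000000000000000000000000000", nil
}

func mutateHandler(w http.ResponseWriter, r *http.Request) {
	var req mutateRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	resp := mutateResponse{}
	for _, ref := range req.Images {
		digestRef, err := resolveDigest(ref)
		if err != nil {
			http.Error(w, fmt.Sprintf("resolving %s: %v", ref, err), http.StatusInternalServerError)
			return
		}
		resp.Images = append(resp.Images, digestRef)
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(resp)
}

func main() {
	http.HandleFunc("/ratify/gatekeeper/v1/mutate", mutateHandler)
	_ = http.ListenAndServe(":6445", nil)
}
```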
  • Currently, Ratify is not concurrent. It cannot verify multiple subjects, multiple stores, or multiple artifacts in parallel. This leads to serious performance degradation for complicated scenarios, eventually hitting the validating webhook timeout. Ratify will begin by implementing goroutines at the subject, store, and artifact level. Furthermore, some update and cache operations will be refactored to use locks. Adding goroutines for multiple subjects: the http server handler is responsible for invoking the executor's verify function for each subject reference provided in the External Data request; this inner loop will be converted into its own goroutine. Multiple stores / multiple artifacts
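A minimal Go sketch of the subject-level fan-out, assuming a placeholder verifySubject function: each subject is verified in its own goroutine and writes to the shared results map are guarded by a mutex, mirroring the lock refactor mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

// verifySubject is a placeholder for the executor's real verify call
// (store lookups, verifier runs, etc.).
func verifySubject(subject string) bool {
	return true
}

func main() {
	subjects := []string{
		"myregistry.io/app@sha256:aaa...",
		"myregistry.io/sidecar@sha256:bbb...",
	}

	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results = make(map[string]bool, len(subjects))
	)

	for _, subject := range subjects {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			ok := verifySubject(s)
			// Serialize writes to the shared results map with a lock.
			mu.Lock()
			results[s] = ok
			mu.Unlock()
		}(subject)
	}
	wg.Wait()

	for s, ok := range results {
		fmt.Printf("%s verified: %v\n", s, ok)
	}
}
```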
  • Problem: currently, Ratify is not aware of the host OS/architecture that a container will be used with. As a result, any subject reference that is an OCI Index is validated at the index level instead of at the platform-specific manifest that the container will use. This leads to two main questions. How should Ratify treat an OCI Index? Should the OCI Index be kept as an ordinary artifact with signatures verified ONLY at the index level? Should the OCI Index be further resolved to the platform-specific manifest, with validation occurring ONLY on the platform-specific manifest? Or should the OCI Index be validated AND its platform-specific manifest be validated? And how does Ratify retrieve the target host OS/architecture of the container?
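A small Go sketch, using the OCI image-spec types, of resolving an index to the manifest matching a given OS/architecture; how the host platform would actually be obtained is one of the open questions above and is simply hard-coded here.

```go
package main

import (
	"fmt"

	v1 "github.com/opencontainers/image-spec/specs-go/v1"
)

// selectManifest returns the descriptor in the index whose platform matches
// the requested OS and architecture, if any.
func selectManifest(index v1.Index, os, arch string) (v1.Descriptor, bool) {
	for _, desc := range index.Manifests {
		if desc.Platform != nil && desc.Platform.OS == os && desc.Platform.Architecture == arch {
			return desc, true
		}
	}
	return v1.Descriptor{}, false
}

func main() {
	// Example index with two platform-specific manifests.
	index := v1.Index{
		Manifests: []v1.Descriptor{
			{Digest: "sha256:1111", Platform: &v1.Platform{OS: "linux", Architecture: "amd64"}},
			{Digest: "sha256:2222", Platform: &v1.Platform{OS: "linux", Architecture: "arm64"}},
		},
	}
	if desc, ok := selectManifest(index, "linux", "arm64"); ok {
		fmt.Println("validate platform-specific manifest:", desc.Digest)
	}
}
```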
  • Ratify leverages OPA Gatekeeper's external data provider for admission control into the cluster. When an admission request arrives at Gatekeeper's validating webhook, Gatekeeper queries Ratify to determine pass/fail. Ratify responds with a verbose success/fail payload to Gatekeeper, upon which pre-defined policies are evaluated against Ratify's response. Ratify's verification process requires various operations that, when aggregated, sometimes exceed the 3 second validating webhook time limit. We added a 200 ms buffer (a 2.8 second limit) so Gatekeeper fails gracefully. Ratify validation operations with Azure Workload Identity: Gatekeeper sends a request to verify an image; the Verify Handler processes this request and calls the Executor to verify the subject; the Executor calls ORAS to resolve the subject descriptor; ORAS calls the Auth Provider's Provide function to get credentials for the registry (note: a request for a new AAD token is sent if the token has expired); and Provide sends a request to ACR to exchange the AAD token.
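A minimal Go sketch of the 2.8 second budget: the handler bounds its work with a context deadline so it can return a graceful failure before Gatekeeper's 3 second webhook timeout; verifySubjects is a placeholder for the executor/ORAS/auth-provider chain described above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// verifySubjects stands in for the full verification chain; here it simply
// simulates slow registry and auth round trips.
func verifySubjects(ctx context.Context) error {
	select {
	case <-time.After(5 * time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	// 3 s webhook limit minus a 200 ms buffer.
	ctx, cancel := context.WithTimeout(context.Background(), 2800*time.Millisecond)
	defer cancel()

	if err := verifySubjects(ctx); errors.Is(err, context.DeadlineExceeded) {
		// Return a failure payload to Gatekeeper instead of letting the webhook time out.
		fmt.Println("verification exceeded the 2.8 s budget; failing gracefully")
	}
}
```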