---
status: proposed
title: Trusted Artifacts in Workspaces
creation-date: '2023-07-27'
last-updated: '2023-09-05'
authors:
- '@afrittoli'
collaborators:
- '@pritidesai'
- '@jerop'
---
# TEP-0139: Trusted Artifacts in Workspaces
---
<!-- toc -->
- [Summary](#summary)
- [Background](#background)
- [Motivation](#motivation)
- [Security](#security)
- [Usability](#usability)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Requirements](#requirements)
- [Proposal](#proposal)
- [API](#api)
- [Execution](#execution)
- [Non-Falsifiability of Results](#non-falsifiability-of-results)
- [Example](#example)
- [Notes and Caveats](#notes-and-caveats)
- [Future Work](#future-work)
- [Reusable Steps](#reusable-steps)
- [Provenance Generation](#provenance-generation)
- [Extend Schema](#extend-schema)
- [Design Evaluation](#design-evaluation)
- [Reusability](#reusability)
- [Simplicity](#simplicity)
- [Flexibility](#flexibility)
- [Conformance](#conformance)
- [User Experience](#user-experience)
- [Performance](#performance)
<!-- /toc -->
## Summary
The goal of this TEP is to extend the chain of trust for the provenance that Tekton produces from `TaskRuns` and `PipelineRuns`. It accomplishes this by enabling consumer `Tasks` to trust `Artifacts` that producer `Tasks` share through a `Workspace`, by verifying them against hashes stored as non-falsifiable `Results`.
## Background
The Tekton Data Interface working group has been active for about 10 months now; it has identified a number of different problems to solve and proposed a number of different solutions.
The number of issues discussed, and their sometimes conflicting requirements, means that only a small fraction of the proposed solutions has actually been implemented in Tekton.
This proposal is an attempt to take one of the problems identified, describe it in a way that is as self-contained as possible, and provide a simple solution to it. The proposed solution does not need to address all the requirements and constraints of adjacent problems, but it should at least not make them harder to address in the future.
## Motivation
Tekton's runtime model maps the execution of a `Task` (i.e. a `TaskRun`) to a Kubernetes `Pod`, and the execution of a `Pipeline` (i.e. a `PipelineRun`) to a collection of `Pods`. `Tasks` in a `Pipeline` share data through the `Workspace` abstraction, which can be bound to a `Persistent Volume` in Kubernetes.
### Security
Because of the design of `Persistent Volumes`, a downstream `TaskRun` has no way of knowing whether the content of a `Workspace` it receives as input has been tampered with. For example, if the source code on the `Workspace` is changed between a git clone `Task` and a container build `Task`, there is no longer a guarantee that the build used the git reference that was checked out. SLSA v0.1 L3 requires provenance to identify the source code used for builds: “provenance must authenticate the repository that stored the source code used in the build”. We need to ensure the integrity of `Artifacts` in `Workspaces` to meet these requirements and secure software supply chains.
### Usability
The current solution for generating provenance to identify source code is suffix-based type hinting, where `Results` must have a `-ARTIFACT_INPUTS` or `-ARTIFACT_OUTPUTS` suffix. This presents challenges:
- It encodes the `Artifact` concept into `Result` names; intertwining concepts makes the API complex.
- Users have to always use the suffixes in `Result` names for Tekton Chains to generate provenance.
- Users have to wire `Results` correctly between `Tasks` and `Pipelines` for Tekton Chains to generate provenance; it doesn't “just work”.
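For illustration, with today's type hinting a `Task` author declares `Results` along these lines (a sketch; the object `Results` with `uri` and `digest` properties follow the Tekton Chains type-hinting convention, and the `Result` name here is made up):
```yaml
# Today: the "Artifact" concept is encoded in the Result name via a suffix
results:
  - name: source-ARTIFACT_INPUTS   # the suffix tells Chains this describes an input artifact
    type: object
    properties:
      uri:
        type: string
      digest:
        type: string
```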
It is critical that the proposed solution is easy to use and “just works” so that user workloads are secure by default.
## Goals
- Enable producer `Tasks` to declare artifacts they produce to a `Workspace` and consumer `Tasks` to declare artifacts they consume from a `Workspace`.
- Enable consumer `Tasks` to trust artifacts on a `Workspace` from producer `Tasks`.
## Non-Goals
- This proposal does not address the integrity of `Artifacts` stored outside `Workspaces`. However, the proposed solution can be extended to upload, download and verify `Artifacts` from other storage, e.g. object storage and OCI registries.
- This proposal does not address downloading `Artifacts` as inputs to a `Pipeline` or uploading `Artifacts` as outputs of a `Pipeline`. This proposal focuses on passing `Artifacts` within a `Pipeline` and sets the foundation for this work to be explored in the future.
## Requirements
The solution should support the following combinations:
- One producer `Task`, one consumer `Task`.
- One producer `Task`, N consumer `Tasks`, including with a write-once, read-many storage class.
- Many producer `Tasks`, one consumer `Task`.
- Many producer `Tasks`, many consumer `Tasks`, including with a write-once, read-many storage class.
- Fail validation if `Workspaces` (static/runtime) are not fit for `Artifacts`.
- Fail execution if hash validation fails and surface error to `TaskRun` / `PipelineRun` failure reason.
## Proposal
### API
Add `Input.Artifacts` and `Output.Artifacts` types with a fixed schema of three properties, `path`, `hash` and `type`:
- `path`: the path from which files are uploaded and to which files are downloaded.
- `hash`: the hash of the produced files, as computed by the injected `Steps`.
- `type`: either `file` or `directory`.
This will be implemented using object `Parameters` for inputs and object `Results` for outputs.
```yaml
# Interface for users to declare input and output artifacts with an inbuilt schema
inputs:
  artifacts:
    - name: foo
      description: abcd
outputs:
  artifacts:
    - name: bar
      description: 1234
# Implementation: object type Params and Results with properties from the schema
params:
  - name: foo
    type: object
    description: abcd
    properties:
      path:
        type: string
      hash:
        type: string
      type:
        type: string
results:
  - name: bar
    type: object
    description: 1234
    properties:
      path:
        type: string
      hash:
        type: string
      type:
        type: string
```
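At runtime, each artifact `Result` resolves to an object value along these lines (illustrative values only; the digest shown is a placeholder):
```yaml
# Illustrative value of the "bar" output artifact, as written by the injected Step
bar:
  path: afile.txt                          # path relative to the artifact storage root
  hash: d41d8cd98f00b204e9800998ecf8427e   # placeholder digest
  type: file
```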
Extend `Workspaces` to indicate whether they are used to store `Artifacts`. This is done through a new field that defaults to `false`, but users can set it to `true`. If set to `true`, Tekton will validate that the `Workspace` is backed by a `Persistent Volume`.
```yaml
spec:
  workspaces:
    - name: artifactStorage
      artifacts: true
```
Extend variable expansion to add `.data.path` for `Artifacts`. The `.data.path` will be backed by an `EmptyDir` `Volume`, mounted by Tekton into the `Pod` at `/tekton/artifacts/`.
```yaml
steps:
  - name: produce-file
    image: bash:latest
    script: |
      #!/usr/bin/env bash
      date +%s | tee "$(outputs.artifacts.aFileArtifact.data.path)/foo.txt"
  - name: produce-folder
    image: bash:latest
    script: |
      #!/usr/bin/env bash
      # Create the folder before writing into it
      A_FOLDER_PATH="$(outputs.artifacts.aFolderArtifact.data.path)/aFolder"
      mkdir -p "$A_FOLDER_PATH"
      date +%s | tee "${A_FOLDER_PATH}/a.txt"
```
```yaml
steps:
  - name: consume-file
    image: bash:latest
    script: |
      #!/usr/bin/env bash
      echo "File content"
      cat "$(inputs.artifacts.aFileArtifact.data.path)"
  - name: consume-folder
    image: bash:latest
    script: |
      #!/usr/bin/env bash
      echo "Folder content"
      find "$(inputs.artifacts.aFolderArtifact.data.path)" -type f
```
### Execution
A producing `Task` writes files to a path backed by an `EmptyDir` `Volume`. After its execution, Tekton injects a `Step` to compute the hash of the files and copy them from the `EmptyDir` `Volume` to a `Persistent Volume`.
A consuming `Task` reads files from a path backed by an `EmptyDir` `Volume`. Before its execution, Tekton injects a `Step` to copy the files from the `Persistent Volume` to the `EmptyDir` `Volume` and verify their hash, ensuring that the files being consumed are the ones that were produced.
### Non-Falsifiability of Results
This proposal requires `Results` to be non-falsifiable, as proposed in TEP-0089, so that Tekton can rely on the hashes to validate the artifacts in a `Workspace`. As such, we need to complete the implementation of SPIRE support in Tekton Pipelines.
### Example
This `PipelineRun` demonstrates the passing of trusted artifacts between `Tasks` through a `Workspace`.
```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: trusted-artifacts-example
spec:
  workspaces:
    - name: artifactStorage
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
  pipelineSpec:
    workspaces:
      - name: artifactStorage
        artifacts: true
    tasks:
      - name: producer
        taskSpec:
          outputs:
            artifacts:
              - name: aFileArtifact
                description: An artifact file
              - name: aFolderArtifact
                description: An artifact folder
          steps:
            - name: produce-file
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                date +%s | tee "$(outputs.artifacts.aFileArtifact.data.path)/afile.txt"
            - name: produce-folder
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                A_FOLDER_PATH="$(outputs.artifacts.aFolderArtifact.data.path)/afolder"
                mkdir "$A_FOLDER_PATH"
                date +%s | tee "${A_FOLDER_PATH}/a.txt"
                date +%s | tee "${A_FOLDER_PATH}/b.txt"
                date +%s | tee "${A_FOLDER_PATH}/c.txt"
      - name: consumer
        params:
          - name: aFileArtifact
            value: $(tasks.producer.outputs.artifacts.aFileArtifact)
          - name: aFolderArtifact
            value: $(tasks.producer.outputs.artifacts.aFolderArtifact)
        taskSpec:
          inputs:
            artifacts:
              - name: aFileArtifact
                description: An artifact file
              - name: aFolderArtifact
                description: An artifact folder
          steps:
            - name: consume-file
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                echo "File content"
                cat "$(inputs.artifacts.aFileArtifact.data.path)"
            - name: consume-folder
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                echo "Folder content"
                find "$(inputs.artifacts.aFolderArtifact.data.path)" -type f
```
In practice, this is what happens to the above `PipelineRun` with the `Steps` injected at execution time:
```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: trusted-artifacts-example
spec:
  workspaces:
    - name: artifactStorage
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
  pipelineSpec:
    workspaces:
      - name: artifactStorage
        artifacts: true
    tasks:
      - name: producer
        taskSpec:
          outputs:
            artifacts:
              - name: aFileArtifact
                description: An artifact file
              - name: aFolderArtifact
                description: An artifact folder
          steps:
            - name: produce-file
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                date +%s | tee "$(outputs.artifacts.aFileArtifact.data.path)/afile.txt"
            - name: produce-folder
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                A_FOLDER_PATH="$(outputs.artifacts.aFolderArtifact.data.path)/afolder"
                mkdir "$A_FOLDER_PATH"
                date +%s | tee "${A_FOLDER_PATH}/a.txt"
                date +%s | tee "${A_FOLDER_PATH}/b.txt"
                date +%s | tee "${A_FOLDER_PATH}/c.txt"
            - name: upload-file
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                set -ex
                ARTIFACT_ROOT="/tekton/artifacts"
                A_FILE_PATH=afile.txt
                A_FILE_HASH=$(md5sum "${ARTIFACT_ROOT}/${A_FILE_PATH}" | awk '{ print $1 }')
                TARGET_PATH="$(workspaces.artifactStorage.path)/.tekton/artifacts"
                mkdir -p "$TARGET_PATH"
                cp "${ARTIFACT_ROOT}/${A_FILE_PATH}" "${TARGET_PATH}/${A_FILE_HASH}"
                cat <<EOF | tee "$(outputs.artifacts.aFileArtifact.path)"
                {
                  "path": "${A_FILE_PATH}",
                  "hash": "${A_FILE_HASH}",
                  "type": "file"
                }
                EOF
            - name: upload-folder
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                set -ex
                ARTIFACT_ROOT="/tekton/artifacts"
                A_FOLDER_PATH=afolder
                # Create the archive with paths relative to the artifact root
                tar zcf "${ARTIFACT_ROOT}/${A_FOLDER_PATH}.tgz" -C "$ARTIFACT_ROOT" "$A_FOLDER_PATH"
                A_FOLDER_HASH=$(md5sum "${ARTIFACT_ROOT}/${A_FOLDER_PATH}.tgz" | awk '{ print $1 }')
                TARGET_PATH="$(workspaces.artifactStorage.path)/.tekton/artifacts"
                mkdir -p "$TARGET_PATH"
                cp "${ARTIFACT_ROOT}/${A_FOLDER_PATH}.tgz" "${TARGET_PATH}/${A_FOLDER_HASH}.tgz"
                cat <<EOF | tee "$(outputs.artifacts.aFolderArtifact.path)"
                {
                  "path": "${A_FOLDER_PATH}",
                  "hash": "${A_FOLDER_HASH}",
                  "type": "directory"
                }
                EOF
      - name: consumer
        params:
          - name: aFileArtifact
            value: $(tasks.producer.outputs.artifacts.aFileArtifact)
          - name: aFolderArtifact
            value: $(tasks.producer.outputs.artifacts.aFolderArtifact)
        taskSpec:
          inputs:
            artifacts:
              - name: aFileArtifact
                description: An artifact file
              - name: aFolderArtifact
                description: An artifact folder
          steps:
            - name: download-verify-file
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                set -e
                # Download the file to the Pod-local disk before verification
                ARTIFACTS_ROOT="$(workspaces.artifactStorage.path)/.tekton/artifacts"
                ARTIFACT="${ARTIFACTS_ROOT}/$(inputs.artifacts.aFileArtifact.hash)"
                TARGET_ROOT="/tekton/artifacts"
                TARGET_ARTIFACT="${TARGET_ROOT}/$(inputs.artifacts.aFileArtifact.hash)"
                cp "$ARTIFACT" "$TARGET_ARTIFACT"
                # Check the md5sum (two spaces between hash and file name)
                ret=0
                echo "$(inputs.artifacts.aFileArtifact.hash)  ${TARGET_ARTIFACT}" | md5sum -c || ret=$?
                if [[ $ret -ne 0 ]]; then
                  >&2 echo "Want $(inputs.artifacts.aFileArtifact.hash), got $(md5sum "$TARGET_ARTIFACT")"
                  exit 1
                fi
            - name: download-verify-folder
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                set -e
                # Download the folder archive to the Pod-local disk before verification
                ARTIFACTS_ROOT="$(workspaces.artifactStorage.path)/.tekton/artifacts"
                ARTIFACT="${ARTIFACTS_ROOT}/$(inputs.artifacts.aFolderArtifact.hash).tgz"
                TARGET_ROOT="/tekton/artifacts"
                TARGET_ARTIFACT="${TARGET_ROOT}/$(inputs.artifacts.aFolderArtifact.hash).tgz"
                cp "$ARTIFACT" "$TARGET_ARTIFACT"
                # Check the md5sum (two spaces between hash and file name)
                ret=0
                echo "$(inputs.artifacts.aFolderArtifact.hash)  ${TARGET_ARTIFACT}" | md5sum -c || ret=$?
                if [[ $ret -ne 0 ]]; then
                  >&2 echo "Want $(inputs.artifacts.aFolderArtifact.hash), got $(md5sum "$TARGET_ARTIFACT")"
                  exit 1
                fi
                # Extract the verified archive so consuming steps find the folder content
                tar zxf "$TARGET_ARTIFACT" -C "$TARGET_ROOT"
            - name: consume-file
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                echo "File content"
                cat "$(inputs.artifacts.aFileArtifact.data.path)"
            - name: consume-folder
              image: bash:latest
              script: |
                #!/usr/bin/env bash
                echo "Folder content"
                find "$(inputs.artifacts.aFolderArtifact.data.path)" -type f
```
## Notes and Caveats
Some questions raised during the initial presentation:
* Q: Can we restrict access to the persistent volumes to the injected steps? Maybe using TEP-0029?
* A: We could mount the workspace to the injected steps instead of relying on propagated workspaces. This would not prevent users from mounting the workspace to other steps / sidecars though, unless we add validation to prevent that. However, that would mean that a consumer could not use the artifact workspace to produce another artifact, which would be problematic.
* Q: Using an emptyDir secures the data, but may be less performant than writing directly.
* A: On the producing side, we need to let users write to an `emptyDir`, and the injected step will calculate the hash and then copy the data to the workspace. On the consuming side, we need to copy data to an `emptyDir` and then verify the checksum, or else we cannot be sure that the data has not been compromised after the checksum verification.
* Q: Could the controller be the one that has access to write the files to the artifact storage?
* A: Using the Tekton controller to transfer data for all `TaskRuns` would turn it into an I/O bottleneck. We could conceive a Tekton-managed service that artifacts are uploaded to and downloaded from, but for this proposal I wanted to rely on the existing workspace as a baby step forward. Once that is in place, we can introduce different kinds of backends. The beauty of it is that we can switch the implementation behind the scenes with no impact on the user interface, and thus no impact on existing tasks and pipelines.
* Q: If an `Artifact` needs to be consumed by multiple `Tasks`, do we need a lock?
* A: We don't need a lock, but we need to copy the artifact to the Pod-local disk (`emptyDir`) first, then verify the checksum, and then hand off control to the user.
* Q: What about the flexibility of the (injected) steps?
* A: The implementation for workspaces (this TEP) won't be flexible. In the future we will introduce support for other kinds of backends, and perhaps user-defined ones, which means we may need to give users a way to define what the upload/download steps look like. I purposefully wanted to steer away from that complexity in this proposal.
* Q: Are the path/hash to be used for provenance generation?
* A: This proposal is only meant for tasks to securely share artifacts with each other. Provenance generation is interested in input and output artifacts instead. That said, this proposal is designed so that it may be used and extended for input and output artifacts as well, in which case the artifact metadata will become relevant from a provenance point of view.
* Q: Do we want to support several verification policies, like we do for trusted resources?
* A: Not in the initial version, where we will only fail when the hash doesn't match.
## Future Work
### Reusable Steps
We can introduce `Step` CRDs to enable reusable units of work in Tekton that execute in the same environment with a shared file system, i.e. a `Pod` in Kubernetes. We can build on the above proposal to let users define the injected `Steps` used to transfer artifacts to/from local disks and to verify artifacts before they are operated on. This will provide greater flexibility than the injected `Steps` defined by Tekton.
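Purely as a hypothetical sketch, a reusable upload `Step` might look as follows (the `apiVersion`, `kind`, `params` and fields are all assumptions; no such CRD exists today):
```yaml
# Hypothetical only: a reusable Step resource for uploading artifacts
apiVersion: tekton.dev/v1alpha1   # assumed API group/version
kind: Step
metadata:
  name: upload-artifact
spec:
  params:
    - name: source   # hypothetical: the file to upload
    - name: target   # hypothetical: the artifact storage root
  image: bash:latest
  script: |
    #!/usr/bin/env bash
    # Hash the file, then copy it into the artifact storage under its digest
    HASH=$(md5sum "$(params.source)" | awk '{ print $1 }')
    cp "$(params.source)" "$(params.target)/${HASH}"
```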
### Provenance Generation
This proposal focuses on sharing artifacts between `Tasks` in a `Pipeline` via a `Workspace`. We can extend this feature to declare inputs and outputs of a `Pipeline` for which provenance needs to be generated by Tekton Chains. This will be explored in future work.
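One possible shape for this, shown only to illustrate the direction (a `Pipeline`-level `outputs.artifacts` field is an assumption, not part of this proposal):
```yaml
# Hypothetical: declaring a Pipeline-level output artifact for provenance generation
pipelineSpec:
  outputs:
    artifacts:
      - name: builtImage
        description: The image produced by the pipeline
```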
### Extend Schema
The schema requires `path`, `hash` and `type`. Users may need additional properties in schemas of specific `Artifacts`. We can explore supporting this in future work.
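For example, an `Artifact` describing an OCI image might want to carry a `uri` alongside the fixed properties (a sketch of a possible extension, not part of this proposal):
```yaml
# Hypothetical extension: a user-defined "uri" property next to the fixed schema
results:
  - name: builtImage
    type: object
    properties:
      path:
        type: string
      hash:
        type: string
      type:
        type: string
      uri:             # hypothetical additional property
        type: string
```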
## Design Evaluation
### Reusability
Adopting trusted artifacts would require users to make changes to their Tasks and Pipelines, albeit minimal ones.
### Simplicity
The proposed functionality relies as much as possible on existing Tekton features.
### Flexibility
The proposed functionality relies on workspaces and `PVCs`; however, it could easily be extended to support additional storage backends. In terms of flexibility of adoption in pipelines, no assumptions are made about the `Tasks` and `Pipelines` where this is used.
The artifact schema could be extended in the future, or it could support custom fields specified by users, in the same way they do today for object parameters and results, to allow users to attach additional metadata to their artifacts.
### Conformance
TBD
### User Experience
The API surface change is minimal and consistent with the API that users are familiar with today.
### Performance
Injected steps would impact the execution of `TaskRuns` and `PipelineRuns`; however, the impact should be minimal:
- a single producer and consumer step can be used to handle multiple artifacts, to avoid the overhead of one container per artifact
- steps shall be injected only where needed
- the ability to use `workspaces` means that minimal extra data I/O is required:
  - tar/untar folders for hashing purposes
  - copy data on the consuming side to avoid dirty reads