---
status: proposed
title: Trusted Artifacts
creation-date: '2023-07-26'
last-updated: '2023-07-26'
authors:
- '@afrittoli'
collaborators:
- '@pritidesai'
---
# TEP-XXXX: Trusted Artifacts
<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
  - [Use Cases](#use-cases)
  - [Requirements](#requirements)
- [Proposal](#proposal)
  - [Notes and Caveats](#notes-and-caveats)
- [Design Details](#design-details)
- [Design Evaluation](#design-evaluation)
  - [Reusability](#reusability)
  - [Simplicity](#simplicity)
  - [Flexibility](#flexibility)
  - [Conformance](#conformance)
  - [User Experience](#user-experience)
  - [Performance](#performance)
  - [Risks and Mitigations](#risks-and-mitigations)
  - [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Implementation Plan](#implementation-plan)
  - [Test Plan](#test-plan)
  - [Infrastructure Needed](#infrastructure-needed)
  - [Upgrade and Migration Strategy](#upgrade-and-migration-strategy)
  - [Implementation Pull Requests](#implementation-pull-requests)
- [References](#references)
<!-- /toc -->
## Summary
The Tekton Data Interface working group has been active for about ten months now; it has identified a number of different problems to solve and proposed a number of different solutions.
The number of different issues discussed, and their sometimes conflicting requirements, means that only a small fraction of the proposed solutions has actually been implemented in Tekton.
This proposal is an attempt to take one of the problems identified, describe it in a way that is as self-contained as possible, and provide a simple solution to it.
The solution proposed does not need to address all the requirements and constraints of adjacent problems, but it should at least not make it harder for them to be addressed in the future.
## Motivation
The Tekton runtime model maps the execution of a `Task` (i.e. `TaskRun`) to a Kubernetes `Pod` and the execution of a `Pipeline` (i.e. `PipelineRun`) to a collection of `Pods`. `Tasks` in a `Pipeline` may share data using the `Workspace` abstraction, which can be bound to a persistent volume (or `PV`) in Kubernetes. Because of the nature of `PVs`, a downstream `TaskRun` has no way of knowing whether the content of a `workspace` it receives as input has been tampered with.
Using existing Tekton capabilities, a producer and a consumer task can share artifacts, such as a file or a folder, via a workspace, as shown by this [demo `PipelineRun`](https://gist.github.com/afrittoli/3e7600eac3172a9f683f294610218635):
<details>
<summary>Demo Pipeline:</summary>

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: trusted-artifacts
spec:
  pipelineSpec:
    workspaces:
    - name: artifactStorage # In this example this is where we store artifacts
    tasks:
    - name: producer
      taskSpec:
        results:
        - name: aFileArtifact
          type: object
          description: An artifact file
          properties:
            path:
              type: string
            hash:
              type: string
            type:
              type: string
        - name: aFolderArtifact
          type: object
          description: An artifact folder
          properties:
            path:
              type: string
            hash:
              type: string
            type:
              type: string
        steps:
        - name: produce-file
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            # Produce some content
            date +%s | tee "$(workspaces.artifactStorage.path)/afile.txt"
        - name: upload-hash-file
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            # Uploads the file somewhere.
            # This is a noop in this case, as the file is passed through
            # the PVC directly. Note that this PVC could be backed
            # by different types of storage via CSI volumes, or we
            # could provide support for direct upload to OCI registries
            # or object storage.
            # Produces a result which makes the file trustable.
            # This step could be injected by the Tekton controller and be
            # transparent to users, except for some syntactic sugar, like
            # a special result kind or an "artifact" API.
            A_FILE_PATH=$(workspaces.artifactStorage.path)/afile.txt
            A_FILE_HASH=$(md5sum "${A_FILE_PATH}" | awk '{ print $1 }')
            cat <<EOF | tee $(results.aFileArtifact.path)
            {
              "path": "${A_FILE_PATH}",
              "hash": "${A_FILE_HASH}",
              "type": "file"
            }
            EOF
        - name: produce-folder
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            A_FOLDER_PATH=$(workspaces.artifactStorage.path)/afolder
            mkdir "$A_FOLDER_PATH"
            date +%s | tee "${A_FOLDER_PATH}/a.txt"
            date +%s | tee "${A_FOLDER_PATH}/b.txt"
            date +%s | tee "${A_FOLDER_PATH}/c.txt"
        - name: upload-hash-folder
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            A_FOLDER_PATH=$(workspaces.artifactStorage.path)/afolder
            # Uploads the folder somewhere.
            # This is a noop in this case, as the folder is passed through.
            # Depending on the storage type we could upload each file in the
            # folder or some compressed form of the folder.
            A_FOLDER_HASH=$(tar zcf - "$A_FOLDER_PATH" | md5sum | awk '{ print $1 }')
            cat <<EOF | tee $(results.aFolderArtifact.path)
            {
              "path": "${A_FOLDER_PATH}",
              "hash": "${A_FOLDER_HASH}",
              "type": "folder"
            }
            EOF
    - name: consumer
      taskSpec:
        params:
        - name: aFileArtifact
          type: object
          properties:
            path:
              type: string
            hash:
              type: string
            type:
              type: string
        - name: aFolderArtifact
          type: object
          properties:
            path:
              type: string
            hash:
              type: string
            type:
              type: string
        steps:
        - name: download-verify-file
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            set -e
            # Check the md5sum
            if [ "$(params.aFileArtifact.type)" == "file" ]; then
              echo "$(params.aFileArtifact.hash) $(params.aFileArtifact.path)" | md5sum -c
            else
              tar zcf download.tgz $(params.aFileArtifact.path)
              echo "$(params.aFileArtifact.hash) download.tgz" | md5sum -c
            fi
        - name: download-verify-folder
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            set -e
            # Check the md5sum
            if [ "$(params.aFolderArtifact.type)" == "file" ]; then
              echo "$(params.aFolderArtifact.hash) $(params.aFolderArtifact.path)" | md5sum -c
            else
              tar zcf download.tgz $(params.aFolderArtifact.path)
              echo "$(params.aFolderArtifact.hash) download.tgz" | md5sum -c
            fi
        - name: consume-content
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            # Do something with the verified content.
            # Here I need to use a workspace variable to trigger propagation
            # of the workspace.
            find $(workspaces.artifactStorage.path) -type f
      params:
      - name: aFileArtifact
        value: $(tasks.producer.results.aFileArtifact)
      - name: aFolderArtifact
        value: $(tasks.producer.results.aFolderArtifact)
  workspaces:
  - name: artifactStorage
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
```
</details>
<br/>
<details>
<summary>Example execution log:</summary>

```log
[producer : produce-file] 1690234279
[producer : upload-hash-file] {
[producer : upload-hash-file] "path": "/workspace/artifactStorage/afile.txt",
[producer : upload-hash-file] "hash": "77c5df93c80c4847891407f22c955527",
[producer : upload-hash-file] "type": "file"
[producer : upload-hash-file] }
[producer : produce-folder] 1690234281
[producer : produce-folder] 1690234281
[producer : produce-folder] 1690234281
[producer : upload-hash-folder] tar: removing leading '/' from member names
[producer : upload-hash-folder] {
[producer : upload-hash-folder] "path": "/workspace/artifactStorage/afolder",
[producer : upload-hash-folder] "hash": "ce344e90cd05a43e44db451dc9d91354",
[producer : upload-hash-folder] "type": "folder"
[producer : upload-hash-folder] }
[consumer : download-verify-file] /workspace/artifactStorage/afile.txt: OK
[consumer : download-verify-folder] tar: removing leading '/' from member names
[consumer : download-verify-folder] download.tgz: OK
[consumer : consume-content] /workspace/artifactStorage/afile.txt
[consumer : consume-content] /workspace/artifactStorage/afolder/a.txt
[consumer : consume-content] /workspace/artifactStorage/afolder/b.txt
[consumer : consume-content] /workspace/artifactStorage/afolder/c.txt
```
</details>
<br/>
The example pipeline shows a few things:
- The solution for sharing content is generic: nothing in it is specific to the pipeline in question
- Even within this single pipeline the same code is used more than once; replicating this solution in a pipeline with multiple producers or consumers would lead to a lot of duplication
- The metadata (path, hash and type) required to trust an artifact is stored as a result in the status of the `TaskRun`. To complete the chain of trust we need to be able to trust the `status` of the `TaskRun`, a feature that would be provided by the SPIRE integration proposed in TEP-0089
This suggests that we may be able to use a combination of API sugar-coating and controller-injected steps to achieve the very same functionality, while providing users with an API that is very similar to the one they are already familiar with, but more powerful.
### Goals
- Contribute to the chain of trust by allowing consumer `Tasks` to trust artifacts on a `workspace` from producer `Tasks`, as long as results can be trusted.
### Non-Goals
- This proposal is restricted to artifacts in a `workspace`. There is, however, no reason that would prevent this mechanism from being extended to artifacts stored elsewhere. The same step-injection mechanism could upload and download artifacts to and from other storage types (like OCI registries or object storage). This feature would be similar to what `PipelineResources` used to do, and it shall be designed in a separate TEP for the very reasons described in the [summary](#summary).
- This proposal does not discuss how to expose artifacts outside of a pipeline, even though it lays foundations that could be used to achieve that
- This proposal does not discuss how to inject artifacts as inputs to a pipeline, even though it lays foundations that could be used to achieve that. For instance, one could use a workspace preprovisioned with artifacts and use artifact-type params as inputs for a pipeline, as sketched after this list
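For illustration only, since this is out of scope here, such a `PipelineRun` might bind a preprovisioned workspace and pass artifact metadata in through an artifact-type param. This is a hypothetical sketch: the `artifact` param type is the one proposed below, and the claim name and metadata values are made up.

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: preprovisioned-artifact-
spec:
  pipelineSpec:
    workspaces:
    - name: artifactStorage
    params:
    - name: inputArtifact
      type: artifact # hypothetical inbuilt schema (path, hash, type)
    tasks:
    - name: consumer
      params:
      - name: inputArtifact
        value: $(params.inputArtifact)
      taskSpec:
        params:
        - name: inputArtifact
          type: artifact
        steps:
        - name: consume
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            # A verification step would be prepended here, as proposed below
            cat $(params.inputArtifact.path)
  params:
  - name: inputArtifact
    value: # metadata describing content preprovisioned on the workspace
      path: afile.txt
      hash: 77c5df93c80c4847891407f22c955527
      type: file
  workspaces:
  - name: artifactStorage
    persistentVolumeClaim:
      claimName: preprovisioned-artifacts # assumed to already exist
```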
### Use Cases
- Extend the chain of trust across Tasks for provenance produced by Tekton Chains based on the `TaskRuns` and `PipelineRuns` executed by Tekton Pipeline
### Requirements
- TBD
## Proposal
A thorough proposal is not available yet; a rough approximation involves the following:
- extend parameter and result types with a new type `artifact`, an object type with a fixed schema (sketched below)
- extend the `Pod` creation logic in the controller to inject hashing and verification steps when required
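A minimal sketch of the fixed schema, mirroring the object results from the demo above; the field names and example values come from the demo, while the exact serialization is still to be defined:

```yaml
# Hypothetical resolved value of an artifact-type result or param
path: /workspace/artifactStorage/afile.txt # location on the artifact workspace
hash: 77c5df93c80c4847891407f22c955527 # digest of the file (or of the archived folder)
type: file # "file" or "folder"
```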
The same pipeline from the demo above, rewritten as it might look once this is implemented, is shown in this [demo pipeline](https://gist.github.com/afrittoli/7236be5fca524b752c221d2346497bb7):
<details>
<summary>Demo Pipeline:</summary>

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: trusted-artifacts-sugar
spec:
  pipelineSpec:
    workspaces:
    - name: artifactStorage # In this example this is where we store artifacts
      artifacts: true # this will result in failed validation if the workspace is bound to a read-only backend like a secret
    tasks:
    - name: producer
      taskSpec:
        results:
        - name: aFileArtifact
          type: artifact # inbuilt object schema (path, hash, type)
          description: An artifact file
        - name: aFolderArtifact
          type: artifact # inbuilt object schema (path, hash, type)
          description: An artifact folder
        steps:
        - name: produce-file
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            # Produce some content. The result "data.path" will resolve to
            # the workspace marked for artifacts.
            date +%s | tee "$(results.aFileArtifact.data.path)/afile.txt"
            # The controller appends a step that builds the object result JSON
            # and stores it under $(results.aFileArtifact.path).
            # The type is detected from the content of $(results.aFileArtifact.data.path):
            # a single file yields type "file"; one or more files and folders
            # yield type "folder".
            # The hash is calculated and added into the JSON.
        - name: produce-folder
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            A_FOLDER_PATH=$(results.aFolderArtifact.data.path)/afolder
            mkdir "$A_FOLDER_PATH"
            date +%s | tee "${A_FOLDER_PATH}/a.txt"
            date +%s | tee "${A_FOLDER_PATH}/b.txt"
            date +%s | tee "${A_FOLDER_PATH}/c.txt"
    - name: consumer
      taskSpec:
        params:
        - name: aFileArtifact
          type: artifact # inbuilt object schema (path, hash, type)
        - name: aFolderArtifact
          type: artifact # inbuilt object schema (path, hash, type)
        steps:
        - name: consume-content
          image: bash:latest
          script: |
            #!/usr/bin/env bash
            # A step is prepended, which will automatically check the hashes
            # and fail the task with a specific reason if there is no match.
            # This behaviour could be enabled via some Pipeline/PipelineRun flag.
            # Do something with the verified content.
            # The path from the object params corresponds to the result's
            # "data.path" and resolves to a path on the workspace.
            echo "File content"
            cat $(params.aFileArtifact.path)
            echo "Folder content"
            find $(params.aFolderArtifact.path) -type f
      params:
      - name: aFileArtifact
        value: $(tasks.producer.results.aFileArtifact)
      - name: aFolderArtifact
        value: $(tasks.producer.results.aFolderArtifact)
  workspaces:
  - name: artifactStorage
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
```
</details>
<br/>
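As an illustration of what the prepended verification step could look like, here is a sketch modeled on the `download-verify-*` steps from the first demo. The step name is made up, and a single injected step verifies all artifact-type params to keep the container count down:

```yaml
- name: verify-artifacts # hypothetical name of the controller-injected step
  image: bash:latest
  script: |
    #!/usr/bin/env bash
    set -e
    # Generated by the controller: one check per artifact-type param,
    # all handled by this single injected step
    if [ "$(params.aFileArtifact.type)" == "file" ]; then
      echo "$(params.aFileArtifact.hash) $(params.aFileArtifact.path)" | md5sum -c
    else
      tar zcf aFileArtifact.tgz "$(params.aFileArtifact.path)"
      echo "$(params.aFileArtifact.hash) aFileArtifact.tgz" | md5sum -c
    fi
    if [ "$(params.aFolderArtifact.type)" == "file" ]; then
      echo "$(params.aFolderArtifact.hash) $(params.aFolderArtifact.path)" | md5sum -c
    else
      tar zcf aFolderArtifact.tgz "$(params.aFolderArtifact.path)"
      echo "$(params.aFolderArtifact.hash) aFolderArtifact.tgz" | md5sum -c
    fi
```

On a hash mismatch, `md5sum -c` exits non-zero and, together with `set -e`, fails the step; the controller could surface this as a dedicated failure reason on the `TaskRun`, as noted in the demo's comments.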
### Notes and Caveats
TBD
## Design Details
TBD
## Design Evaluation
### Reusability
Adopting trusted artifacts would require users to make changes to their Tasks and Pipelines, albeit minimal ones.
### Simplicity
The proposed functionality relies as much as possible on existing Tekton features, and it extends a syntax that users are already familiar with in a consistent way.
### Flexibility
The proposed functionality relies on workspaces and `PVCs`; however, it could easily be extended to support additional storage backends. In terms of flexibility of adoption in pipelines, no assumptions are made on the `Tasks` and `Pipelines` where this is used.
The artifact schema could be extended in the future, or it could support custom fields specified by users in the same way they do today for object parameters and results, to allow users to attach additional metadata to their artifacts.
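For example, in a purely hypothetical extension, an artifact result might declare user-defined fields on top of the fixed schema; the extra field names here are made up:

```yaml
results:
- name: aFileArtifact
  type: artifact # fixed schema: path, hash, type
  description: An artifact with extra user-defined metadata
  properties: # hypothetical: additional fields declared by the user
    license:
      type: string
    builtBy:
      type: string
```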
### Conformance
TBD
### User Experience
The API surface change is minimal and consistent with the API that users are familiar with today.
### Performance
Injected steps would impact the execution of `TaskRuns` and `PipelineRuns`; however, the impact should be minimal:
- a single producer and a single consumer step can handle multiple artifacts, to avoid the overhead of one container per artifact
- steps shall be injected only where needed
- the ability to use `workspaces` means that no extra data I/O is required, apart from that needed to tar/untar folders for hashing purposes
### Risks and Mitigations
N/A
### Drawbacks
N/A
## Alternatives
We could document the demo pipeline and let users apply that approach explicitly in their pipelines.
## Implementation Plan
TBD
### Test Plan
TBD
### Infrastructure Needed
TBD
### Upgrade and Migration Strategy
TBD
### Implementation Pull Requests
TBD
## References
TBD