# Meta
**Name:** Cache for Image Verification in Kyverno
**Start Date:** 2023-07-13
**Update date:** 2023-07-13
# Table of Contents
* [Meta](#Meta)
* [Table of Contents](#Table-of-Contents)
* [Overview](#Overview)
* [Definitions](#Definitions)
* [Motivation](#Motivation)
* [Proposal](#Proposal)
  * [Rules](#Rules)
  * [Default Eviction Strategies](#Default-Eviction-Strategies)
    * [Periodic eviction using a probabilistic algorithm](#Periodic-eviction-using-a-probabilistic-algorithm)
      * [Advantages](#Advantages)
      * [Disadvantages](#Disadvantages)
    * [LRU-like size limit](#LRU-like-size-limit)
      * [Advantages](#Advantages)
      * [Disadvantages](#Disadvantages)
    * [Janitor](#Janitor)
      * [Advantages](#Advantages)
      * [Disadvantages](#Disadvantages)
* [Approaches](#Approaches)
  * [Hashmap Cache with a unique key](#Hashmap-Cache-with-a-unique-key)
    * [Evictions](#Evictions)
    * [Drawbacks](#Drawbacks)
  * [Multiset with image_ref as Key](#Multiset-with-image_ref-as-Key)
    * [Evictions](#Evictions)
    * [Drawbacks](#Drawbacks)
* [Implementation](#Implementation)
* [Unresolved Questions](#Unresolved-Questions)
# Overview
Kyverno supports verification of image signatures and their attached artifacts using both Notary and Cosign. This proposal describes ways to speed up image verification at scale by caching verification results in a TTL cache.
# Definitions
**Artifacts:** Software builds produce artifacts for installation and execution. The type and format of artifacts vary depending on the software. They can be packages, WAR files, container images, or other formats.
**Signing:** Signing refers to the process of attaching metadata that is used to verify the image. A signed container image allows users to verify where an image came from and that it was not tampered with.
**Metadata:** Metadata is used to describe software and the build environment. Provenance (origin) data, SBOMs, and vulnerability scan reports are the essential set of metadata required to assess security risks for software.
**Attestations:** Authenticated metadata is used to attest to the integrity of a software system. Both custom and standardized metadata can be converted into attestations.
**Policies:** Policies check and enforce organization standards. Policies should be automatically enforced prior to deployment and via runtime scanning.
**TTL:** TTL, or time to live, in the context of an image verification cache, is the period for which an image is assumed to remain verified after it has been verified remotely.
**Payload:** Payload in this case refers to the data in the artifact.
# Motivation
As of Kyverno v1.11, Kyverno supports verification of images and artifacts using Notary and Cosign. We also allow users to add conditions to verify the attestations. This requires accessing a remote registry using a client like go-containerregistry. Since fetching and verifying image data involves a network call to an external service, verifying an image can take a long time. In most cases we have to verify the same image multiple times, which is redundant work that a cache can avoid.
Here, we propose some possible ways to implement a cache for image verification in Kyverno without adding any infrastructure.
# Proposal
## Rules
There are a few specifications the cache must follow:
1. The cache should not require any additional infrastructure.
2. The cache should use TTL. This is because the signatures on an image can be removed and image tags, unlike digests, are mutable. So, if an image was verified once, that does not imply that we can trust it forever to be verified. The cache should be automatically invalidated after a fixed time. We can optionally allow users to set a custom time based on their use cases.
3. There must be an option to disable the cache. Users should have the ability to verify the image every time and not use the cache altogether.
To satisfy these requirements, the caching approach must follow these rules:
1. The cache is an in-memory cache.
2. Every entry should have a TTL value. In addition, we can set a size limit so that stale entries do not accumulate.
3. There should be a global flag called --imageVerifyPolicy with the following values:
- **IfNotPresent:** Will use the cache if possible.
- **Always:** Will not use the cache and will verify the image every time.
- **PerResource:** (OPTIONAL) If the image has been verified for one resource, then it won’t be verified again. For a different resource, the image needs to be verified. We can optionally add a longer TTL for this (or maybe indefinite TTL). This can be used as a replacement for the verifyImages annotation.
Each rule should also have a field called imageVerifyPolicy that accepts the same values.
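The flag values above can be modeled as a small string enum. The following is a minimal sketch, not Kyverno code: the type name, `parsePolicy`, and `effectivePolicy` are hypothetical, and the rule-over-global precedence is an assumption, since the proposal does not say how the global flag and the per-rule field interact.

```go
package main

import "fmt"

// ImageVerifyPolicy mirrors the values proposed for --imageVerifyPolicy.
type ImageVerifyPolicy string

const (
	IfNotPresent ImageVerifyPolicy = "IfNotPresent" // use the cache if possible
	Always       ImageVerifyPolicy = "Always"       // never use the cache
	PerResource  ImageVerifyPolicy = "PerResource"  // optional: verify once per resource
)

// parsePolicy validates a raw flag or field value (hypothetical helper).
func parsePolicy(raw string) (ImageVerifyPolicy, error) {
	switch p := ImageVerifyPolicy(raw); p {
	case IfNotPresent, Always, PerResource:
		return p, nil
	default:
		return "", fmt.Errorf("invalid imageVerifyPolicy %q", raw)
	}
}

// effectivePolicy returns the per-rule value when it is set, else the global one.
// ASSUMPTION: this precedence is illustrative; the proposal does not define it.
func effectivePolicy(global, rule ImageVerifyPolicy) ImageVerifyPolicy {
	if rule != "" {
		return rule
	}
	return global
}
```

With this sketch, a rule that leaves imageVerifyPolicy empty inherits the global value, and a rule that sets Always bypasses the cache even when the global value is IfNotPresent.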
## Default Eviction Strategies
### Periodic eviction using a probabilistic algorithm
Redis uses this approach (reference: https://redis.io/commands/expire/).
To quote:

> Specifically this is what Redis does 10 times per second:
>
> 1. Test 20 random keys from the set of keys with an associated expire.
> 2. Delete all the keys found expired.
> 3. If more than 25% of keys were expired, start again from step 1.
>
> This is a trivial probabilistic algorithm, basically the assumption is that our sample is representative of the whole key space, and we continue to expire until the percentage of keys that are likely to be expired is under 25%
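
The quoted steps could be sketched in Go roughly as follows. This is an illustration under stated assumptions, not Redis or Kyverno code: the `ttlCache` type and its methods are hypothetical, and Go's unspecified map iteration order is used as a stand-in for true random sampling.

```go
package main

import (
	"sync"
	"time"
)

const (
	sampleSize      = 20   // keys tested per round
	repeatThreshold = 0.25 // repeat while more than 25% of the sample was expired
)

// ttlCache is a hypothetical cache mapping a key to its expiry (Unix nanoseconds).
type ttlCache struct {
	mu      sync.Mutex
	entries map[string]int64
}

// expireCycle runs one eviction cycle and returns the number of deleted entries.
// The cache stays locked for the whole cycle.
func (c *ttlCache) expireCycle(now int64) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	deleted := 0
	for {
		sampled, expired := 0, 0
		// ASSUMPTION: map iteration order approximates a random sample.
		for key, expiry := range c.entries {
			if sampled == sampleSize {
				break
			}
			sampled++
			if now > expiry {
				delete(c.entries, key) // deleting during range is safe in Go
				expired++
			}
		}
		deleted += expired
		if sampled == 0 || float64(expired)/float64(sampled) <= repeatThreshold {
			return deleted
		}
	}
}

// run calls expireCycle ten times per second until stop is closed.
func (c *ttlCache) run(stop <-chan struct{}) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.expireCycle(time.Now().UnixNano())
		case <-stop:
			return
		}
	}
}
```

Because each round touches at most 20 keys, the lock is held briefly unless a large share of the cache has expired.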
#### Advantages
Well tested and a popular approach.
#### Disadvantages
The cache has to be locked briefly for each eviction run (10 times per second), which will slightly affect access time.
### LRU-like size limit
In this approach, we set a maximum size, as in an LRU cache, and keep a journal of when each entry was last accessed. When the cache is full, we delete the entry that was accessed least recently.
#### Advantages
No periodic job needs to be run.
Deletion won't take long; it can be done in O(1) time.
#### Disadvantages
We will have to keep a journal, in the form of a doubly linked list, which requires extra space.
There is a possibility that some entries might be deleted before they expire.
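
A minimal sketch of this strategy, using the standard library's container/list as the journal. The `lruCache` type and its method names are hypothetical, the sketch is not concurrency-safe, and it assumes maxSize is at least 1.

```go
package main

import "container/list"

// lruEntry is the payload stored in each journal node.
type lruEntry struct {
	key    string
	expiry int64 // Unix nanoseconds
}

// lruCache is a hypothetical size-limited TTL cache.
type lruCache struct {
	maxSize int                      // assumed to be >= 1
	order   *list.List               // journal: front = most recently used
	items   map[string]*list.Element // key -> node in the journal
}

func newLRUCache(maxSize int) *lruCache {
	return &lruCache{maxSize: maxSize, order: list.New(), items: make(map[string]*list.Element)}
}

// set inserts or refreshes an entry, evicting the least recently used one when full.
func (c *lruCache) set(key string, expiry int64) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruEntry).expiry = expiry
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.maxSize {
		oldest := c.order.Back() // O(1) eviction; the entry may not have expired yet
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, expiry: expiry})
}

// get reports whether key is present and unexpired, and marks it as recently used.
func (c *lruCache) get(key string, now int64) bool {
	el, ok := c.items[key]
	if !ok {
		return false
	}
	if now > el.Value.(*lruEntry).expiry {
		c.order.Remove(el)
		delete(c.items, key)
		return false
	}
	c.order.MoveToFront(el)
	return true
}
```

Expired entries are only removed lazily here, either on access or when they reach the back of the journal, which is why no periodic job is needed.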
### Janitor
This approach is used in the patrickmn/go-cache library. When setting up the cache, we specify the interval at which a "janitor" is activated.
Each time the janitor runs, it calls a DeleteExpired function, which locks the entire cache, checks every entry, and deletes the ones that have expired.
#### Advantages
Simple to implement.
#### Disadvantages
The entire cache will be locked for some time every few minutes, which might not scale well.
# Approaches
## Hashmap Cache with a unique key
This is a very simple implementation.
We can keep a hashmap of type map[string]int64. The key (string) is a unique key of the form policy_id;rule_name;image_ref, where:
- policy_id is the unique ID of the policy, which cannot be changed.
- rule_name is the name of the rule that the image was verified for.
- image_ref is the reference of the image, which can contain a tag or a digest.

The value (int64) is the time at which the entry expires, as Unix time in nanoseconds. When time.Now().UnixNano() > value, the entry is considered expired.
### Evictions
We will have to evict all the related cache entries whenever a rule changes. To do so, we traverse the entire cache, do a prefix match for policy_id;rule_name;, and delete all matching entries.
When the name of a rule changes, we will have to delete all the entries for the policy by doing a prefix match for policy_id;
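
The prefix-based eviction can be sketched as a single pass over the map. `evictByPrefix` is a hypothetical helper name; the key layout is the policy_id;rule_name;image_ref format described earlier.

```go
package main

import "strings"

// evictByPrefix deletes every entry whose key starts with prefix and
// returns the number of deleted entries. It visits every key, so it is O(n).
func evictByPrefix(cache map[string]int64, prefix string) int {
	removed := 0
	for key := range cache {
		if strings.HasPrefix(key, prefix) {
			delete(cache, key)
			removed++
		}
	}
	return removed
}
```

For example, evictByPrefix(cache, "p1;r1;") drops the entries of one rule, while evictByPrefix(cache, "p1;") drops every entry of policy p1.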
### Drawbacks
Since we have to do a prefix match to delete the entries of a rule or policy, we have to traverse the entire cache every time, which can take a lot of time. (We could combine this with a default eviction strategy and delete expired entries during the same traversal, which would remove the need for a separate periodic eviction strategy.)
If the behavior or structure of image verification policies changes in the future, we will have to consider whether the cache should be invalidated.
This is simply a policy result cache and not an image verification output cache.
There are only two hard things in Computer Science: cache invalidation and naming things. – Phil Karlton
## Multiset with image_ref as Key
We can use a multiset-like implementation for the entire cache, backed by a red-black tree, where the key is just the image reference. Since a multiset built on a red-black tree can hold duplicate keys, we can use just the image_ref as the key, and there can be multiple entries in the cache with the same image_ref.
Each cache entry will store the options with which the image was verified.
```
// Item is one cache entry: an image reference plus the options it was verified with.
type Item struct {
	ImageRef           string
	VerificationType   ImageVerificationType // Notary or Cosign
	Cert               string
	CertChain          string
	Roots              string
	Subject            string
	Issuer             string
	Repository         string
	RekorURL           string
	RekorPubKey        string
	IgnoreSCT          bool
	IgnoreTlog         bool
	SignatureAlgorithm string
	Type               string // Only set when the verified ref was an attestation
	Payload            string // Stores the attestation, if the verified ref was an attestation
	Identities         string
	Expiration         int64 // Time at which the entry expires
}
```
Every valid entry in the tree means: this image reference was verified using Cosign or Notary within the last TTL period, and the stored certs and identities were used to verify it. If the reference was an attestation, the entry also holds the type of the attestation and its payload.
Before calling the verifySignatures or FetchAttestation methods, we will check the cache by passing the image verification opts (ref, certs, identities, etc.) and checking whether there is a valid entry for them.
This makes the policy and the image verification output independent of each other. If any policy wants to verify an image using some opts, and there is a valid entry in the cache for the same image and the same opts, then we will consider the image to be verified.
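
The lookup described above can be illustrated with the sketch below. To stay within the standard library, it substitutes a map of slices for the red-black tree multiset, so lookups scan the entries of one image reference instead of walking a tree. `verifyOpts` is a hypothetical, trimmed-down subset of the fields in Item, and all names are assumptions.

```go
package main

// verifyOpts is a hypothetical subset of the verification options.
// All fields are comparable, so two values can be compared with ==.
type verifyOpts struct {
	VerificationType string // "Notary" or "Cosign"
	Cert             string
	Subject          string
	Issuer           string
	IgnoreTlog       bool
}

// cacheItem records one successful verification of an image reference.
type cacheItem struct {
	Opts       verifyOpts
	Payload    string // attestation payload, if the verified ref was an attestation
	Expiration int64  // Unix nanoseconds
}

// multiCache stands in for the multiset: one image_ref can hold many entries.
type multiCache map[string][]cacheItem

// add stores another entry under imageRef without replacing existing ones.
func (c multiCache) add(imageRef string, item cacheItem) {
	c[imageRef] = append(c[imageRef], item)
}

// lookup returns a valid entry for imageRef that was verified with exactly opts.
func (c multiCache) lookup(imageRef string, opts verifyOpts, now int64) (cacheItem, bool) {
	for _, item := range c[imageRef] {
		if item.Opts == opts && now <= item.Expiration {
			return item, true
		}
	}
	return cacheItem{}, false
}
```

A hit means the remote call can be skipped for any policy that verifies the same image with the same opts, regardless of which policy populated the entry.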
### Evictions
We will only have to evict an entry once the TTL expires.
### Drawbacks
- Complicated to implement.
- All operations are logarithmic and not constant time.
# Implementation
// TODO based on what Approach we want to use.
# Unresolved Questions
- Which default eviction strategy do we want to use?
- Which caching approach do we want to use?