Base images and containers
Before jumping into the details, it's worth explaining what the term "base image" refers to. Words matter, and there is some confusion between the terms "parent image" and "base image". As defined by the Docker documentation, the parent of an image is the image used to start the build of the current image, typically the image identified in the FROM directive in the Dockerfile. If the parent image is scratch, then the image is considered a base image.
However, the terms "base image" and "parent image" are often used interchangeably in the container community.
To be more precise, there is no such thing as a "base" of an image, because container images can be created and built in many ways, not only via a Dockerfile, and not only via a Dockerfile whose last stage uses "FROM <image-from-a-registry>". To better understand base images, we should first talk about the layer concept. Container images are made up of layers: collections of files and folders built one on top of the other and bundled together with all the essentials. These layers may be inherited from another image or created during the build of a specific image, as defined by the instructions in a Dockerfile or another build process. The first layer, known as the base, is the layer on which all other layers are built; it provides the basic building blocks for your container environments. Base images provide a base operating system environment for your containers, whether they're built using a Dockerfile's FROM directive or by Buildpacks, where the base is configured by the run image. Generally speaking, to save time, we build our container images on ready-made base images available on registries such as DockerHub, Google Container Registry (gcr.io), and many other public container repositories. But nothing comes for free, right?
Recently, Chainguard, the zero-trust company, researched the popular base images that so many people, including us, are using. Unfortunately, their study found that many popular base images, downloaded billions of times, come with tens or hundreds of known security vulnerabilities. That's a lot of security debt in your image to overcome before you've even started building your applications! To learn more about that research, you can refer to the blog post here. In addition, many base images include a fully installed operating system distribution, which both inflates the size of the final image and increases its attack surface.
In light of the information above, we understand that we should be careful when choosing base images for our container images. If developers don't choose this image wisely, it can lead to critical security risks. Is that enough? Of course not. Security is not static: new vulnerabilities pop up every day, and an image with no known vulnerabilities today can suddenly become vulnerable tomorrow. You can't assume that your base images are secure forever. To overcome this, we should continuously scan our base images to identify vulnerabilities and make them less vulnerable by applying patches before allowing people to use them.
Today, we'll walk you through how to prevent containers from running if they are not using one of the allowed base images, and how to reject them if they don't provide any build information. Don't worry if you don't know what build information means yet; we'll talk about it in detail in the upcoming sections. We'll use Kyverno for policy management on top of Kubernetes clusters, together with various container image builders that can provide base image information in different ways.
First, let's talk about how we can create build information that records valuable details, including which base images were used while building container images. But wait a minute: we mentioned that a container image is just a collection of layers, so how do we identify which base image was used during the build? Unfortunately, for any client that pulls your image, there is no way of knowing which base image was used during the build process; once your image is built, information about the base image is completely lost. Fortunately, various container image builders have come to the rescue to address this problem, such as Buildpacks, ko, and Docker BuildKit. We'll discuss all of them and show how they can provide this information.
Let's start with ko, a simple, fast container image builder for Go applications. Luckily, Jason Hall (@imjasonh), one of the core maintainers of the ko project, worked on an RFC contributed to the Open Container Initiative (OCI) image-spec that adds the "Base Image" annotations concept to it. You can access the details of his work here. OCI approved his RFC, and now we can add these annotations to the image manifest to identify which base images were used during the build, which is precisely what ko does for us for free.
The new standard annotations are org.opencontainers.image.base.name and org.opencontainers.image.base.digest.
Let's see it in action. Start by cloning the simple project and installing ko in your environment. To install ko, please refer to the installation page.
This is a straightforward Go application; there is nothing special in it. The only thing worth mentioning is the .ko.yaml file, which is ko's configuration file. The default base image setting is necessary for this example because there's an issue with annotations on Docker-typed images (Docker Image Manifest V2, Schema 2): Docker manifests don't support annotations, and most base images are Docker-typed rather than OCI. Hence, we should either use "ghcr.io/distroless/static" as the base or specify the --platform=all flag to produce a manifest list that supports annotations, so that ko can set the base image annotations successfully.
Once you clone the project, the only thing that you need to do is run the simple command below:
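A minimal sketch of that build, assuming you want to push to your own registry (the registry path here is a placeholder, not from the original project):

```shell
# Tell ko where to push the resulting image (placeholder repository).
export KO_DOCKER_REPO=ghcr.io/<your-user>

# Build and push the Go application in the current directory;
# ko resolves the base image from .ko.yaml and sets the OCI
# base image annotations on the resulting manifest.
ko build .
```

Older ko releases use `ko publish` instead of `ko build`; check `ko --help` for your version.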
Once the build is finished, you can take a look at the manifest to see the base image annotations with a utility called crane, a tool for interacting with remote images and registries. To install crane, please refer to the installation page.
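With crane installed, the check might look like this (the image reference is a placeholder for whatever ko just pushed):

```shell
# Fetch the image manifest and filter for the OCI base image annotations.
crane manifest ghcr.io/<your-user>/hello-world-ko | jq '.annotations'
# We expect to see the keys:
#   org.opencontainers.image.base.name
#   org.opencontainers.image.base.digest
```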
As you can see from the output above, ko did what it claimed it would do and set these annotations for us.
Let's move on to the next container image builder, Docker Buildx. Docker Buildx is a CLI plugin that extends the docker command with the full feature set of the Moby BuildKit builder toolkit, a concurrent, cache-efficient, and Dockerfile-agnostic build technology. To install Buildx, please refer to the installation page.
In addition, BuildKit v0.10 comes with a set of shiny new features, one of which is the generation of a new build-information structure from build metadata. This allows us to see all the sources (images, Git repositories, and HTTP URLs) and configurations passed to your build. This information can be stored in different ways, such as within the image config or in a file. For more detail on this topic, please refer to the official documentation of the BuildKit project here.
As we said, we can store this information in various ways. To keep it in a file, we use the --metadata-file flag; alternatively, it can be stored within the image config, which is enabled by default and makes it even more portable across registries, so you don't have to do anything additional. That is what we are going to use today.
To start building container images with Buildx, the recommended way is to create a new docker-container builder with Buildx that uses the latest stable version of BuildKit.
Let's create a new docker-container typed builder, per the recommendation above.
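Concretely, that boils down to two commands (the builder name is arbitrary):

```shell
# Create a builder backed by the docker-container driver and switch to it.
docker buildx create --name mybuilder --driver docker-container --use

# Bootstrap the builder container and verify it starts correctly.
docker buildx inspect --bootstrap
```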
Once the builder is ready, the next step is to build the project. To do so, we should clone the simple project first.
You will notice that the Dockerfile is designed to be highly optimized and cache-efficient for the Go application. Don't worry if you are not familiar with all of the optimizations within the Dockerfile; there is a blog post that explains everything in detail here.
Let's build our container image with Buildx. To do so, we should run the simple command below:
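A minimal invocation might look like this (the image reference is a placeholder; --push publishes the image together with its embedded build information):

```shell
# Build with the docker-container builder created earlier and push
# the result; BuildKit >= 0.10 embeds build info in the image config.
docker buildx build -t ghcr.io/<your-user>/hello-world-buildx --push .
```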
Once the build is finished, let's look at the image config (not the manifest) with crane again. As we said, Buildx can keep this information within the image config instead of a file; it is stored in a new field called "moby.buildkit.buildinfo.v0" that embeds the build dependencies.
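The check might look like this (image reference is a placeholder; to our understanding the field holds base64-encoded JSON, so we decode it before pretty-printing — if your BuildKit version stores it as plain JSON, drop the base64 step):

```shell
# Fetch the image config and extract the BuildKit build information.
crane config ghcr.io/<your-user>/hello-world-buildx \
  | jq -r '."moby.buildkit.buildinfo.v0"' \
  | base64 -d | jq .
```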
As you can see from the output above, Docker Buildx did what it claimed it would do and keep this information within the image config.
Finally, the next container image builder that provides build metadata is Cloud Native Buildpacks (CNB), or Buildpacks for short. Cloud Native Buildpacks transform your application source code into images that can run on any cloud, and they embrace modern container standards such as the OCI image format. For more detail about Buildpacks, please refer to the official website here. There are two concepts around Buildpacks that we need to be aware of: "build" and "run" images.
Next, we'll try to find the run image information within the image config, because that's the base image Buildpacks set for us. The reference to the run image is recorded in the runImage.reference field of the io.buildpacks.lifecycle.metadata label; if the image was created locally, this reference may simply point to a local image. Before diving in too deep, let's look at building a container image with pack, a CLI for building apps using Cloud Native Buildpacks. To install pack, please refer to the installation page.
Again, let's start by cloning the simple project:
Again, this is a simple Go application, and there is nothing special in it. To build it with pack, the only thing we need to do is use a builder that supports Go applications, and we are going to use the "Paketo Tiny Builder" for this purpose.
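With the Paketo tiny builder, the build might be sketched like this (the image name matches the one rebased later in this post; --publish pushes directly to the registry):

```shell
# Build the app from source with the Paketo tiny builder and publish it.
pack build devopps/hello-world-buildpack:buildinfo \
  --builder paketobuildpacks/builder:tiny \
  --publish
```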
Once the build is finished, let's look at the image config, not the manifest, with crane again.
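This time, the run image reference lives in a label rather than an annotation, so the check might look like this:

```shell
# The Buildpacks lifecycle metadata label is a JSON string inside
# the image config's labels; extract it and read the run image ref.
crane config devopps/hello-world-buildpack:buildinfo \
  | jq -r '.config.Labels."io.buildpacks.lifecycle.metadata"' \
  | jq '.runImage.reference'
```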
As you can see from the output above, Cloud Native Buildpacks did what it claimed it would do and kept this information within the image config.
Last but not least, now we come to the section where we'll discuss how this information can be used. This is also where policy management tools like Kyverno come to the rescue.
Kyverno is a policy engine designed for Kubernetes. Kyverno policies can validate, mutate, and generate Kubernetes resources plus ensure OCI image Supply Chain Security. Today, we'll use Kyverno to validate the base image metadata information of the images we've built with several techniques.
To do so, first, we should install Kyverno v1.7.0 in our Kubernetes clusters:
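One way to do that is from the release manifest (the URL path follows Kyverno's install docs; Helm works too):

```shell
# Install Kyverno v1.7.0 from its release manifest.
kubectl create -f https://raw.githubusercontent.com/kyverno/kyverno/v1.7.0/config/install.yaml

# Wait until the admission controller pods are up.
kubectl -n kyverno wait pod --for=condition=Ready --all --timeout=120s
```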
Once the Kyverno pods are ready, the next step is to apply the previously mentioned policies: require-base-image, which validates the build information (every container image must provide valid build information), and allowed-base-images, which validates the base image information against an allowed list to ensure a good base image is used.
Let's apply require-base-image policy first and test it:
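A simplified sketch of what such a policy can look like, built on Kyverno's imageRegistry context (the real policy is more thorough; the annotation key checked here assumes the OCI base image annotations shown earlier, and the BuildKit field name matches the config field above):

```shell
kubectl apply -f - <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-base-image
spec:
  validationFailureAction: enforce
  rules:
  - name: require-base-image
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Images must carry build information identifying their base image."
      foreach:
      - list: "request.object.spec.containers"
        context:
        # Fetch manifest and config of each container image from the registry.
        - name: imageData
          imageRegistry:
            reference: "{{ element.image }}"
        deny:
          conditions:
            all:
            # Deny when neither the OCI base-image annotation nor the
            # BuildKit build-info field is present.
            - key: >-
                {{ imageData.manifest.annotations."org.opencontainers.image.base.name" || imageData.configData."moby.buildkit.buildinfo.v0" || '' }}
              operator: Equals
              value: ""
EOF
```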
Ensure the cluster policy is ready before moving to the next step:
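For example (the status field name follows Kyverno v1.7):

```shell
# The policy is active once its READY column reports "true".
kubectl get clusterpolicy require-base-image

# Or check the status field directly.
kubectl get clusterpolicy require-base-image -o jsonpath='{.status.ready}'
```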
Now, it's time to test this with a proper image. Let's choose one of the images that we have built above, and it should succeed:
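For example, running the ko-built image (the image reference is a placeholder for the one built earlier):

```shell
# This image carries the OCI base image annotations, so the
# admission webhook should let the Pod through.
kubectl run hello-world-ko --image=ghcr.io/<your-user>/hello-world-ko
```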
What if I try to run an image without build information? Let's see:
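Any off-the-shelf Docker-typed image will do for this test:

```shell
# nginx:latest ships no base image annotations and no BuildKit
# build info, so the require-base-image policy should reject it
# with an admission error.
kubectl run nginx --image=nginx:latest
```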
Voilà! This is what we expected.
Let's move on to the next policy, but before that, we should create a ConfigMap that holds the allowed base image list.
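A minimal sketch of such a ConfigMap (the namespace, data key, and allowed entries here are illustrative; only the name "baseimages" is fixed, since the policy looks the list up by that name):

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: baseimages
  namespace: default
data:
  # Newline-separated list of base images the policy will accept.
  allowedbaseimages: |-
    ghcr.io/distroless/static
    paketobuildpacks/run:tiny-cnb
EOF
```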
We created this ConfigMap with the name "baseimages" because the policy that we're going to apply uses this name to load the base images into a variable.
Let's apply the policy:
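Again, this is a simplified sketch rather than the full policy; it loads the allow list from the ConfigMap above and denies images whose base annotation is not in that list (the data key name matches the illustrative ConfigMap):

```shell
kubectl apply -f - <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: allowed-base-images
spec:
  validationFailureAction: enforce
  rules:
  - name: allowed-base-images
    match:
      any:
      - resources:
          kinds:
          - Pod
    context:
    # Load the allow list from the ConfigMap created earlier.
    - name: baseimages
      configMap:
        name: baseimages
        namespace: default
    validate:
      message: "This container image's base is not in the approved list."
      foreach:
      - list: "request.object.spec.containers"
        context:
        - name: imageData
          imageRegistry:
            reference: "{{ element.image }}"
        deny:
          conditions:
            all:
            # Deny when the declared base image is absent from the list.
            - key: >-
                {{ imageData.manifest.annotations."org.opencontainers.image.base.name" || '' }}
              operator: AnyNotIn
              value: "{{ baseimages.data.allowedbaseimages }}"
EOF
```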
Ensure the cluster policy is ready before moving to the next step:
Let's test this again:
As we expected, it failed. Let's see what happens when we try to create a valid pod that contains a valid container image:
Whee!
Build Reproducibility
It is very difficult to rebuild the exact same container image that you built, say, a month ago from the same Dockerfile. Container image build reproducibility means getting the exact same digest over and over again for each new build: an image builder that produces identical layers given the same input can be said to produce a reproducible container image. We can think of it as pinning the digests of the image tags we referenced at the time of building our images, since we trust digests. Digest values are used to construct immutable addresses for objects; this approach is called content-addressable storage. BuildKit has been discussing a new concept called Dockerfile.pin for pinning the digests; it would use this file while reconstructing the original image to ensure build reproducibility. On the other hand, support for reproducible builds is also available in the Cloud Native Buildpacks project.
Day 2 Container Base Image Operations
Out-of-date software is a major factor in security breaches. All base images should be continually updated, aiming for zero known vulnerabilities, because new vulnerabilities are discovered every day. Want to update your base image to get rid of some vulnerabilities, but don't want to rebuild from scratch? Fine! Luckily, we can rebase our images so they use newer base images. Let's get into the details of how to accomplish this.
Rebasing
Rebasing means updating your old base image to a new base image. During rebasing, the layers above the original base image layers are snipped out, placed on top of the new base image layers, and then pushed back to the registry. The rebased image can be tagged over the original image or given a new tag. Fortunately, we have various options that support rebasing images, such as crane and Buildpacks.
crane, a registry interaction tool, has a rebase subcommand to take care of this. This command also avoids the need to fully rebuild the app. You can take a look at the mutate.Rebase function to learn what's happening under the hood.
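A sketch of the invocation (the image references are illustrative, and the flag names should be checked against `crane rebase --help` for your version; if the image carries the OCI base image annotations shown earlier, newer crane versions can detect the bases automatically):

```shell
# Replace the old run-image layers with the current (patched) ones
# and push the result under a new tag, without rebuilding the app.
crane rebase devopps/hello-world-buildpack:buildinfo \
  --old_base paketobuildpacks/run:tiny-cnb \
  --new_base paketobuildpacks/run:tiny-cnb \
  -t devopps/hello-world-buildpack:rebased
```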
pack, on the other side, also has a rebase subcommand, which allows you to quickly swap out the underlying OS layers (run image) of an app image generated by pack build with a newer version of the run image, without rebuilding the application. Running the following will update the base of devopps/hello-world-buildpack:buildinfo with the latest version of pack/run:
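Per pack's documented usage, the command is simply:

```shell
# Swap the run-image layers for the latest version of the run image,
# using the run image reference recorded in the image's metadata.
pack rebase devopps/hello-world-buildpack:buildinfo
```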
Conclusion
To summarize, we've learned that base images are a crucial part of securing container images, because we build our container images on top of them. We should always be careful while choosing them; if we don't pick them wisely, we can run into security problems. Accordingly, we should determine which base images can be used, and which cannot, before allowing them to run on production systems. This is where Kyverno comes into play. To do so, we first need access to the base image information of the container images, which is where the build information concept enters the picture. Thanks to ko, Buildpacks, and Docker Buildx, we can easily provide build information while building container images. As you can see, you can enable end-to-end security for your base images quickly with the help of a handful of open-source projects.