# TL;DR
This project will strive to attain SLSA Level 2 build provenance for the Julia programming language in order to improve Julia's supply chain security at its root component. To achieve this objective, the following additions will be made for each subsequent Julia language release:
- A SPDX Software Bill of Materials file (`julia-major.minor.patch.spdx.json`) generated and validated via [REUSE](https://reuse.software/)
- [in-toto](https://in-toto.io/) SLSA build provenance attestation files (`julia-major.minor.patch-attestation.json` and `julia-major.minor.patch-full-attestation.json`)
- [Sigstore](https://www.sigstore.dev/) signature (`*.sigstore`) and certificate (`*.pem`) files for every artifact bundled with each release (tarballs, zips, individual files)
- An updated JSON release feed and feed schema via a new `JuliaVersionsCI` package to replace the current `VersionsJSONUtil` package
Additional contributions will include generating documentation detailing Julia's CI/CD pipeline with Buildkite as well as general documentation/code improvements relating to Julia supply chain security. Successful achievement of SLSA Level 2 build provenance for the Julia language will serve as a foundation for expanding such supply chain security practices to the rest of the Julia ecosystem.
# Project Information
- **Organisation** : The Julia Language
- **Mentor** : Experience with Julia's Buildkite CI/CD pipeline
- **Size**: Medium (~175 hours)
# Contact Information
- **Name** : Michael Persico
- **Email** : <michael.a.persico@gmail.com>
- **Phone**: [+15148031925](tel:+15148031925)
- **GitHub** : [M-PERSIC](https://github.com/M-PERSIC)
- **LinkedIn** : [michael-persico](https://www.linkedin.com/in/michael-persico/)
- **Location** : Montreal, Quebec, Canada
- **Timezone** : Eastern Daylight Time (EDT, UTC/GMT -4)
# Student Information
- **UNIVERSITY** : [Concordia University](https://www.concordia.ca/)
- **DEGREE** : [Computer Science (GrDip)](https://www.concordia.ca/academics/graduate/computer-science-diploma.html)
- **EXPECTED GRADUATION** : August 2025
# Background
I am currently a graduate diploma student in Computer Science at Concordia University in Montreal, Canada. Prior to my diploma, I earned my undergraduate degree in Systems and Information Biology, a cross-discipline program wherein I pursued both biology and computer science fundamentals. I will be returning to Concordia in the fall session for my master's in Software Engineering (MEng). I previously went through a non-GSoC Julia Summer of Code experience in 2023 primarily focusing on generating JLLs for common bioinformatics tools via [Yggdrasil and BinaryBuilder](https://github.com/JuliaPackaging/Yggdrasil.git). I have also made smaller contributions to the Julia ecosystem including updating the current devcontainer configuration.
# Description
Julia represents a modern programming language not only because of its recent emergence or adoption of new capabilities. Like Rust, Go, and other contemporaries, Julia heavily supports an ecosystem of stakeholders, tools, dependencies, and practices that actively coalesce and mutually contribute toward an elevated Julia developer and user experience. Certain principles contribute to the success of this collaborative model:
- Strict adherence to the principles of [Free and Open-Source Software (FOSS) and the Four Freedoms](https://fsfe.org/freesoftware/freesoftware.en.html) as the bedrock of the entire ecosystem
- Decentralized contribution by stakeholders across various domains (industry, academia, etc.)
- Transparent community governance with decision-making performed without the unjustified influence of a single entity
- Shared tooling infrastructure that is accessible to all stakeholders and promotes greater ecosystem coalescence and participation
- Dynamic interdependence of software components with reusability, composability, and innovation in mind
The process flow and infrastructure supporting this ecosystem are referred to as the [software supply chain](https://circleci.com/blog/secure-software-supply-chain/). Simply speaking, the supply chain is the answer to fundamental questions about how the ecosystem and the software that forms it work, such as where the software comes from, how it was built, and what it is composed of. A poorly designed or maintained supply chain poses serious risks to all stakeholders, including a significant breakdown in trust, increased vulnerability to security threats, degraded developer and user experience, and increased legal complexity. It is imperative that the supply chain be continually kept up-to-date with modern standards and frameworks for every aspect of the ecosystem, especially as Julia continues to experience [increasing industry adoption](https://juliahub.com/industries/case-studies).
The supply chain represents a broad set of flows and infrastructure that are integral to the ecosystem, meaning there are many moving parts that require specialized understanding for effective modernization. The focus of this proposal is on specifically improving the supply chain security for Julia release artifacts, meaning the files (executables, signatures, etc.) that form part of every Julia [language release](https://github.com/JuliaLang/julia/releases). Updating the release artifact supply chain security flow would serve to:
1. Improve trust and security of software components at the root level of the entire ecosystem, i.e. the language itself
2. Serve as a demonstration for the adoption of modern standards for the rest of the ecosystem
3. Demonstrate the compliance-adaptability of Julia for modern regulatory standards and frameworks
[Software provenance](https://slsa.dev/spec/v1.0/provenance) represents verifiable information, or attestation, about a software artifact's origin and history. Such information must be validatable and can be generated at key stages within the software lifecycle, including build steps, release deployments, etc. [Supply-chain Levels for Software Artifacts (SLSA)](https://slsa.dev/) is an emerging, industry-backed security framework governed by the Open Source Security Foundation (OpenSSF) that defines a specification strictly for build provenance. It includes a series of guidelines for improving the security of the build process and achieving specific build "levels" that provide guarantees including auditability and tamper resistance. It is both language-agnostic and forge-agnostic, with the goal of not being tied down to any particular implementation or workflow so long as the requirements are met (any stakeholder can verify provenance on their own).
This Google Summer of Code project aims to provide a number of cumulative contributions to Julia that will lead to SLSA Build L2 provenance for Julia release artifacts. Primary contributions will include enabling automated copyright/licensing compliance with REUSE, automated Sigstore cryptographic signature generation, and automated in-toto attestation generation. A rewrite of the current JSON release feed workflow package ([JuliaLang/VersionsJSONUtil.jl](https://github.com/JuliaLang/VersionsJSONUtil.jl.git)) would also be explored and implemented as a demonstration of the benefits of build provenance for Julia.
# REUSE Licensing Compliance
- **Repositories**: [JuliaLang/Julia](https://github.com/JuliaLang/julia.git), [JuliaCI/julia-buildkite](https://github.com/JuliaCI/julia-buildkite.git), [fsfe/reuse-tool (potentially)](https://github.com/fsfe/reuse-tool.git)
- **Objectives**:
- Adopt the REUSE specification for declaring file-level copyright and licensing information within the main Julia repository (primary)
- Enable key discussion surrounding third-party dependency licensing and licensing/copyright edge cases, including non-code assets (primary)
- Implement a GitHub Actions workflow within the main repository to verify REUSE compliance and SBOM generation upon specific conditions and via manual trigger (primary)
- Implement automated SPDX SBOM generation within the Buildkite workflow for Julia releases (primary)
- Provide additional contributions for further copyright/licensing compliance across the Julia ecosystem (secondary)
- **Metrics**:
- Complete `REUSE.toml` file and `LICENSES` directory covering the main repository
- Working CI workflow that successfully generates an SBOM during tests and releases
- Updated file templates and developer documentation
- **Resources**:
- [REUSE specification v3.3](https://reuse.software/spec-3.3/)
- [SPDX specification v2.1](https://spdx.dev/wp-content/uploads/sites/31/2023/09/spdxversion2.1.pdf)
- [SBOM + SLSA: Accelerating SBOM success with the help of SLSA (article)](https://slsa.dev/blog/2022/05/slsa-sbom)
## Description
The current software bill of materials (SBOM) for Julia is defined as the `julia.spdx.json` file within the main repository. This represents a Software Package Data Exchange (SPDX) document that is meant to provide an auditable manifest of all software components that compose Julia itself. Stakeholders concerned with regulatory compliance, cybersecurity, or business risk with regard to dependencies may analyse the SBOM for this crucial metadata and make sweeping decisions on it alone. It is thus vital that the SPDX document be correct and kept up-to-date as Julia matures. This, unfortunately, brings up a number of concerns relating to the current SBOM situation:
- Any changes to the SBOM must be performed manually via edits to the `julia.spdx.json` file. This is an error-prone process that places a burden on contributors unfamiliar with software copyright/licensing and the SPDX specification. As an example, the current file [fails SPDX validation](https://tools.spdx.org/app/validate/) (set to `V2 JSON`).
- The current file groups SPDX metadata by package rather than by file. File-level copyright/licensing information is almost entirely absent and, as a consequence, special cases may be hidden from stakeholders. For example, the main repository contains a number of patches under the `deps/patches` directory for LLVM, which falls under a specific Apache license (`Apache-2.0 WITH LLVM-exception`). Under its terms, patches are potentially considered to be derivative works, meaning that Julia maintainers and users may need to be aware of the legal ramifications of each file's inclusion and of its license's terms and conditions (retain upstream copyright notices, include a copy of the upstream license, etc.).
- There is no direct method for verifying copyright/licensing compliance. The project is under a blanket MIT license; however, few files contain appropriate SPDX headers (only [3 files](https://github.com/search?q=repo%3AJuliaLang%2Fjulia+SPDX-License-Identifier&type=code) contain an `SPDX-License-Identifier` tag at the time of writing) and it would be a tedious process to check and modify every file for conformance. Some files, by their nature, cannot include such information, and require a separate `.license` file or other means to convey copyright/licensing information.
[REUSE](https://reuse.software/) is a specification by Free Software Foundation Europe e.V. (FSFE) that provides a human-readable and machine-readable format for specifying file-level copyright/licensing information for a given software project. The overall goal is to provide a [license metadata management tool](https://github.com/rust-lang/compiler-team/issues/519#issue-1271950464) that enables automation of copyright/licensing compliance and thus simplifies SBOM generation. At its core is a top-level `REUSE.toml` file containing annotations that specify filepaths, the order of precedence of licensing information (in case of differing licensing information between the `REUSE.toml` annotation and the annotation within a given file), and associated SPDX file tags. License files are stored within the `LICENSES` directory, with one or more top-level license files allowed for specifying the project license, which would be `LICENSE-MIT` in the case of Julia. A basic example for the Julia main repository would be as follows:
```toml
version = 1
# SOURCE file type
[[annotations]]
path = "**.jl" # every Julia source file within the repository (`**` also matches across directories, `*` does not)
precedence = "override" # ignore licensing information already present in the file
SPDX-FileCopyrightText = "Contributors to Julia <https://julialang.org>"
SPDX-License-Identifier = "MIT"
# IMAGE file type
[[annotations]]
path = "doc/src/assets/logo.svg" # the Julia logo
precedence = "override"
SPDX-FileCopyrightText = "Contributors to Julia <https://julialang.org>"
SPDX-License-Identifier = "MIT OR CC-BY-4.0" # Creative Commons license for non-code assets
# Third-party (LLVM) patch
[[annotations]]
path = "deps/patches/llvm-libunwind-force-dwarf.patch"
precedence = "override"
SPDX-FileCopyrightText = [
"LLVM Project Contributors",
"Contributors to Julia <https://julialang.org>"
]
SPDX-License-Identifier = "Apache-2.0 WITH LLVM-exception"
```
All files within the main repository are thus annotated from a single location. While this does not eliminate the need for appropriate SPDX file headers, a single source of truth is therein established that is verifiable, lintable, and easy to change. The major advantage of this approach is the ease of generation of a valid SPDX SBOM from the `REUSE.toml` file.
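To make the lookup behaviour concrete, the following is a minimal sketch of how a path could be resolved against the annotations. It is not the reference tool's implementation. It assumes the glob semantics of the REUSE specification (`*` stays within one directory, `**` crosses directories) and that the last matching annotation wins, and it ignores the `precedence` key entirely. The `ANNOTATIONS` list mirrors what a TOML parser would produce for two of the tables above, and the helper names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Parsed form of two [[annotations]] tables (assumed shape, mirrors the TOML).
ANNOTATIONS: List[Dict] = [
    {
        "path": "**.jl",
        "SPDX-License-Identifier": "MIT",
    },
    {
        "path": "deps/patches/llvm-libunwind-force-dwarf.patch",
        "SPDX-License-Identifier": "Apache-2.0 WITH LLVM-exception",
    },
]


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a REUSE-style glob: `**` crosses directories, `*` does not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def license_for(path: str, annotations: List[Dict]) -> Optional[str]:
    """Return the license of the last annotation whose path glob matches."""
    result = None
    for annotation in annotations:
        patterns = annotation["path"]
        if isinstance(patterns, str):
            patterns = [patterns]
        if any(glob_to_regex(p).fullmatch(path) for p in patterns):
            result = annotation["SPDX-License-Identifier"]
    return result


print(license_for("base/array.jl", ANNOTATIONS))
print(license_for("deps/patches/llvm-libunwind-force-dwarf.patch", ANNOTATIONS))
print(license_for("README.md", ANNOTATIONS))
```

A path that matches no annotation returns `None`, which is the situation `reuse lint` would be expected to flag as missing licensing information.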
## Implementation
The FSFE offers a [reference commandline application](https://github.com/fsfe/reuse-tool.git) which follows modern Python project practices and is easy to install and execute via `pipx`. Adoption of the REUSE specification will be completed incrementally:
1. Promote active discussion within the Julia community in order to determine any potential copyright/licensing edge cases
2. Complete the `REUSE.toml` file and `LICENSES` directory according to community discussion and the current `julia.spdx.json` context
3. Implement a basic GitHub Actions workflow, triggered either automatically (when a new file is introduced, for example) or manually, that runs `reuse lint` to determine whether the main repository remains REUSE compliant
4. Implement SPDX SBOM generation via the Julia Buildkite CI/CD (`reuse spdx`) that will be bundled with the next Julia release. This might be implemented as an additional utility (`utilities/generate-sbom.sh`) that is then integrated into the `upload_julia` job.
5. Provide additional contributions for extended copyright/licensing compliance, including:
- Include `.github` template files for contributors with pre-populated SPDX headers
- Update developer documentation on current copyright/licensing practices that conform to SPDX
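Steps 3 and 4 could be combined into one gate, sketched below under a few assumptions. The `reuse lint` and `reuse spdx --output` subcommands come from the reference tool, while the helper names and the version string are hypothetical. The output file name follows the `julia-major.minor.patch.spdx.json` pattern from this proposal; whether the reference tool can emit JSON directly is not confirmed here, so a conversion step may be needed.

```python
import subprocess
from typing import List


def sbom_plan(version: str) -> List[List[str]]:
    """Commands for a hypothetical SBOM step: lint first, then export."""
    output = f"julia-{version}.spdx.json"
    return [
        ["reuse", "lint"],                      # exits non-zero when not compliant
        ["reuse", "spdx", "--output", output],  # writes the SPDX document
    ]


def run_plan(plan: List[List[str]]) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for command in plan:
        subprocess.run(command, check=True)


# Print the plan instead of executing it, so the sketch runs without `reuse`.
for command in sbom_plan("1.12.0"):
    print(" ".join(command))
```

Running the lint step first means a compliance failure blocks SBOM generation, which matches the intent that a release should not ship with unresolved licensing information.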
## Motivating Example
The Rust Compiler Team, in conjunction with Ferrous Systems, [adopted](https://github.com/rust-lang/rust/pull/99415) the REUSE specification [out of a need](https://github.com/rust-lang/compiler-team/issues/519) for handling copyright/licensing information of Rust components. This was of particular concern for the latter stakeholder, whose objectives include wider industry adoption of Rust via ISO/IEC qualification for the reference compiler. The [previous REUSE format](https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/) was eventually [replaced](https://github.com/rust-lang/rust/pull/127923) with the modern `REUSE.toml` format, which serves the [entire project repository](https://github.com/rust-lang/rust/blob/master/REUSE.toml).
## Considerations
- Currently, the reference commandline application [only supports](https://github.com/fsfe/reuse-tool/issues/394) outputting SPDX v2.1 documents. The National Telecommunications and Information Administration (NTIA) [recommends](https://github.com/spdx/ntia-conformance-checker.git) SPDX v2.3 at a minimum for meeting what they consider the [minimum elements](https://www.ntia.gov/report/2021/minimum-elements-software-bill-materials-sbom) for an SBOM. Upstream contribution towards SPDX v2.3 support along with other improvements that will ease REUSE integration (adoption of [uv](https://astral.sh/blog/uv), devcontainer configuration, etc.) will be evaluated.
- The reference commandline application supports multiple `REUSE.toml` files. While a single top-level `REUSE.toml` file would be the most convenient option, it might also be ideal to include multiple `REUSE.toml` files for different subprojects, such as stdlib packages. This would depend on the desire to maintain separation between subprojects if they are moved out of the main repository, as an example.
- Barring exceptional circumstances, REUSE compliance failure should block contributions or releases, as this would signal a potential copyright/licensing conflict that must be resolved. An SBOM should be provided with every valid Julia release as a required manifest for that release.
## Extended Objective
Successful adoption of the REUSE specification would open up discussion on additional improvements to copyright/licensing compliance across the Julia ecosystem. One such example would be Pkg-level integration of SBOM generation, similar to Rust [Cargo plugins](https://github.com/CycloneDX/cyclonedx-rust-cargo.git). Possible `license` and `copyright` fields for `Project.toml` would provide explicit project-level declaration exposable via `Pkg.project()` (see [Standardised Metadata in Project.toml](https://github.com/JuliaLang/Pkg.jl/issues/1070)) that could be embedded in the package's `Manifest.toml` file. This would allow for granular copyright/licensing scanning of almost every dependency of a given Julia package, which would facilitate SBOM generation per package.
# Complementary Sigstore Cryptographic Signing
- **Repositories**: [JuliaLang/Julia](https://github.com/JuliaLang/julia.git), [JuliaCI/julia-buildkite](https://github.com/JuliaCI/julia-buildkite.git), [JuliaLang/www.julialang.org](https://github.com/JuliaLang/www.julialang.org.git)
- **Objectives**:
- Implement a Buildkite workflow for generating Sigstore signatures for each Julia release (primary)
- Provide additional contributions for further Sigstore signing and verification across the Julia ecosystem (secondary)
- **Metrics**:
- Working CI workflow that generates Sigstore signatures (`.sig` and `.pem` files) for Julia release artifacts
- Updated developer documentation detailing new signing practices
- **Resources**:
- [Sigstore documentation](https://docs.sigstore.dev/)
- [Sigstore security model](https://docs.sigstore.dev/about/security/)
- [Introducing npm package provenance (article)](https://github.blog/security/supply-chain-security/introducing-npm-package-provenance/)
- [PGP vs. sigstore: A recap of the match at Maven Central (article)](https://www.sonatype.com/blog/pgp-vs.-sigstore-a-recap-of-the-match-at-maven-central)
- [Achieving SLSA 3 Compliance with GitHub Actions and Sigstore for Go modules (article)](https://github.blog/security/supply-chain-security/slsa-3-compliance-with-github-actions/)
## Description
Currently, binaries for each new Julia release are [signed](https://docs.julialang.org/en/v1/devdocs/build/distributing/#Signing-binaries) according to their respective platform. Buildbots automatically handle macOS platform code signing, whereas Windows and Linux platform code signing requires manual commands as part of the release CI/CD workflow. Each new Linux Julia release is accompanied by a generated [GNU Privacy Guard (GPG)](https://www.gnupg.org/) `.asc` signature for the distributed blobs (tarballs). These traditional signing methods, unfortunately, are insufficient for software supply chain security purposes:
- Traditional code signing formats are not generally suited for provenance, as they focus more narrowly on developer identity. They do not integrate with attestation frameworks like [in-toto](https://in-toto.io/), for example, and different platforms require different signatures, which complicates provenance generation and automation.
- Certain signature formats have fallen out of favour in a number of security contexts due to unreliability and vulnerability concerns. For example, PyPI [no longer accepts PGP signatures](https://blog.pypi.org/posts/2023-05-23-removing-pgp/) for packages over the lack of reliable provenance information, [among other reasons](https://blog.yossarian.net/2023/05/21/PGP-signatures-on-PyPI-worse-than-useless).
Thus, a separate signature mechanism is needed to address supply chain security and provenance, one expressly suited for both tasks. [Sigstore](https://www.sigstore.dev/) represents an emerging framework that provides an integrated system for handling artifact signing and provenance. A given artifact is signed using a Sigstore client such as [Cosign](https://github.com/sigstore/cosign.git). In contrast to GPG, Cosign promotes keyless signing by issuing short-lived keys and X.509 certificates, the latter provided by [Fulcio](https://github.com/sigstore/fulcio.git). The certificate request is bound to the developer via an OpenID Connect (OIDC) token provided by GitHub, [Buildkite](https://buildkite.com/docs/pipelines/security/oidc), or another OIDC provider. The resulting artifact `.sig` signature and `.pem` certificate are then uploaded to the public [Rekor](https://github.com/sigstore/rekor.git) append-only transparency log.
Sigstore signatures currently cannot replace traditional signature formats for establishing platform-specific developer identity (see [Documentation on signing Windows and MacOS apps](https://github.com/sigstore/fulcio/issues/250)). However, the immediate benefits of generating such signatures for subsequent Julia releases include:
- Enhanced security via the promotion of ephemeral keys and other modern security practices
- Public auditability via the Rekor log that provides a record of every signed artifact
- Ease of CI/CD automation via keyless workflows, online and offline modes, and platform-agnostic tooling
- Integration with in-toto for generating SLSA-approved attestations alongside signatures (discussed further in the In-Toto Attestation section)
## Implementation
Cosign [supports Buildkite as an identity provider](https://github.com/sigstore/cosign/pull/2779), meaning that it can detect when it is being run in a Buildkite pipeline and gather the appropriate OIDC token information. This simplifies adoption of Cosign into the Julia Buildkite workflow, since it can run directly inside the pipeline to generate the signatures. One approach to integrating Cosign would be to simply add an additional Buildkite step to the release CI/CD workflow wherein the `.sig` and `.pem` files are generated and uploaded to Rekor. A potential path forward would be to create a new `utilities/sigstore_sign.sh` file within the Julia Buildkite repository that is solely responsible for generating the `.sig`/`.pem` files for all release artifacts. Alternatively, this functionality could be included directly in the `utilities/upload_julia.sh` file or other needed locations. Additional contributions following successful testing and deployment of the Sigstore signing CI/CD workflow will include:
- Updating developer documentation to reflect new signing practices
- Ensuring the Sigstore signature is widely available close to release artifact download locations (JuliaLang website, JSON release feed, etc.)
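As a rough sketch of what the signing step would run for each artifact, the helpers below only assemble the Cosign command lines. The flag names are those of Cosign 2.x (`sign-blob`, `verify-blob`) and may differ in other releases. The artifact name, the certificate identity, and the helper names are hypothetical placeholders, and the issuer URL is an assumption about Buildkite's OIDC endpoint that would need to be confirmed against the actual pipeline.

```python
from typing import List


def sign_command(artifact: str) -> List[str]:
    """Keyless signing of one file, writing the .sig and .pem next to it."""
    return [
        "cosign", "sign-blob", "--yes",
        "--output-signature", f"{artifact}.sig",
        "--output-certificate", f"{artifact}.pem",
        artifact,
    ]


def verify_command(artifact: str, identity: str, issuer: str) -> List[str]:
    """Verification as an end user would run it against the published files."""
    return [
        "cosign", "verify-blob",
        "--signature", f"{artifact}.sig",
        "--certificate", f"{artifact}.pem",
        "--certificate-identity", identity,
        "--certificate-oidc-issuer", issuer,
        artifact,
    ]


artifact = "julia-1.12.0-linux-x86_64.tar.gz"  # hypothetical artifact name
print(" ".join(sign_command(artifact)))
print(" ".join(verify_command(
    artifact,
    identity="https://example.invalid/julia-release-pipeline",  # placeholder
    issuer="https://agent.buildkite.com",  # assumed Buildkite OIDC issuer
)))
```

The verification command shows why the signing identity should be documented alongside the release: a user needs both the expected identity and the issuer to check a signature.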
## Motivating Example
As previously stated, PyPI dropped GPG support for packages due to a number of security-related concerns. This was followed by [PEP 761](https://peps.python.org/pep-0761/), which suggested that GPG signatures be dropped from Python at the root level. This PEP was approved and, starting with v3.14, Python releases will [officially drop](https://www.python.org/downloads/metadata/sigstore/) GPG signatures in favour of Sigstore signatures. Sigstore signing and verification has also been made available via the official [Python Sigstore client](https://pypi.org/project/sigstore/).
## Considerations
- Implementation of Sigstore signing will most likely be the most challenging contribution of this project, as Julia's Buildkite CI/CD will be directly affected and assurance is needed that appropriate OIDC permissions and other components are in place. This will build on the previous experience of implementing SBOM generation following REUSE adoption.
- All artifacts contained in a Julia release are potentially signable. This would include, for example, the generated SBOM.
- Sigstore signing and verification methods should be widely available to Julia stakeholders. Contributions could include turning Cosign into a JLL for ease of integration into Julia projects and additional documentation and tutorials educating stakeholders on Sigstore and its relation to Julia.
## Extended Objective
Sigstore cryptographic signing of Julia release artifacts would greatly benefit Julia's provenance story. Allowing Julia packages to be able to upload their own release signatures would thus elevate the provenance story to the entire ecosystem. One possible path would be to enable the [Registrator.jl](https://github.com/JuliaRegistries/Registrator.jl.git) bot to recognize a provided `.sig` file with a new package release and tie it to that release in the [General Registry](https://github.com/JuliaRegistries/General.git). Sigstore signing may also be automated via Registrator.jl if such an option is both desirable and feasible. This tight coupling of package version and signature would provide future possibilities for enhanced ecosystem features (output the current package version Sigstore signature via Pkg, signature verification via Rekor and Pkg, etc.).
# In-Toto Attestation
- **Repositories**: [JuliaLang/Julia](https://github.com/JuliaLang/julia.git), [JuliaCI/julia-buildkite](https://github.com/JuliaCI/julia-buildkite.git)
- **Objectives**:
- Integrate in-toto attestation generation into Julia's Buildkite release workflow (primary)
- Document the Julia CI/CD release workflow (primary)
- **Metrics**:
- Successful in-toto attestation generation for new Julia releases via Buildkite
- Updated developer documentation detailing the Julia release attestation workflow
- Series of written articles and updated developer documentation on the Julia CI/CD release workflow
- **Resources**:
- [SLSA Provenance V1 attestation format](https://slsa.dev/spec/v1.0/provenance)
- [In-Toto Attestations (Cosign)](https://docs.sigstore.dev/cosign/verifying/attestation/)
- [in-toto and SLSA (article)](https://slsa.dev/blog/2023/05/in-toto-and-slsa)
- [Understanding Software Provenance Attestation: The Roles of SLSA and in-toto (article)](https://mikael.barbero.tech/blog/post/2023-12-28-slsa-and-in-toto/)
- [SLSA, it’s all about provenance attestation (article)](https://medium.com/@rrey94/slsa-its-all-about-provenance-attestation-09a83b7b9de7)
## Description
The complete provenance story requires an authenticated statement detailing the build context, artifact location, and additional provenance information of the release artifacts. [In-toto](https://in-toto.io/) represents an open metadata standard for formats and processes that secure the supply chain, and is central to the [in-toto Attestation framework](https://github.com/in-toto/attestation.git). The framework defines a software provenance attestation format built on top of the in-toto standard, and is supported by an ecosystem of tools including Sigstore. In-toto attestation is part of [SLSA's recommended suite](https://slsa.dev/attestation-model#recommended-suite), and a custom `https://slsa.dev/provenance/v1` predicate type is provided as a tailored attestation format for SLSA provenance. Implementation of in-toto attestations would make Julia one of the first major programming languages to include attestations for its releases.
## Implementation
Sigstore, specifically Cosign, includes the ability to both generate and sign attestations with the `slsaprovenance` predicate type. Following successful Sigstore integration of cryptographic signatures into the Buildkite CI/CD system, a similar pattern can be adopted for providing attestation with Cosign (logic within the Buildkite release pipeline for generating and bundling attestation).
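For orientation, the unsigned statement behind such an attestation could be assembled as in the sketch below. The field layout follows the in-toto Statement v1 and SLSA Provenance v1 formats as I understand them, with only the required fields filled in. The builder ID, build type URI, external parameters, and artifact name are hypothetical placeholders, not values used by Julia's pipeline.

```python
import hashlib
import json
from typing import Dict


def provenance_statement(name: str, sha256: str, builder_id: str,
                         build_type: str, external_parameters: Dict) -> Dict:
    """Build an unsigned in-toto statement carrying a SLSA v1 predicate."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": name, "digest": {"sha256": sha256}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                "buildType": build_type,
                "externalParameters": external_parameters,
            },
            "runDetails": {"builder": {"id": builder_id}},
        },
    }


# Stand-in for hashing a real release tarball.
digest = hashlib.sha256(b"example artifact contents").hexdigest()

statement = provenance_statement(
    name="julia-1.12.0-linux-x86_64.tar.gz",              # hypothetical
    sha256=digest,
    builder_id="https://example.invalid/julia-buildkite",  # placeholder
    build_type="https://example.invalid/buildtypes/julia-release/v1",  # placeholder
    external_parameters={"ref": "refs/tags/v1.12.0"},      # hypothetical
)
print(json.dumps(statement, indent=2))
```

The subject digest is what ties the attestation to a specific artifact, so a verifier recomputes the SHA-256 of the downloaded file and compares it with this field before trusting the rest of the statement.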
## Motivating Example
The [npm JavaScript/TypeScript package manager](https://www.npmjs.com/) recently became [one of the first](https://blog.sigstore.dev/npm-public-beta/) package managers to support built-in generation of both [provenance and publish attestations via Sigstore](https://docs.npmjs.com/generating-provenance-statements).
## Considerations
- These changes to Buildkite will provide an excellent opportunity to fully document Julia's release CI/CD workflow. One thought is a series of weekly blog posts cataloging the research odyssey into Julia's CI/CD internals that will hopefully prove enlightening for other contributors.
- A particular challenge will be choosing the appropriate location to store the attestations. They will by default be bundled with each Julia release, however this would mean that accessing them would potentially require downloading the release in full. Keeping a copy of the attestations in the main repository would not be ideal as this would violate the principle of separation between source and build artifacts. One potential solution would be to upload the attestations as [OCI artifacts](https://edu.chainguard.dev/open-source/oci/what-are-oci-artifacts/) to the [GitHub Container Registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry). The ideal path would be to upload the attestations to the main repository and be able to treat them as [GitHub Artifact Attestations](https://docs.github.com/en/actions/security-for-github-actions/using-artifact-attestations/using-artifact-attestations-to-establish-provenance-for-builds) in order to take advantage of the [GitHub REST API](https://docs.github.com/en/rest/users/attestations?apiVersion=2022-11-28).
## Extended Objective
In-toto attestation generation, for the purposes of this GSoC project, will focus exclusively on automated provenance attestation for future Julia release artifacts. Successful integration into the Julia CI/CD release workflow would allow for a path towards providing automated provenance attestation for Julia packages, perhaps built directly into Registrator.jl.
# Updated Release Feed Workflow
- **Repositories**: [JuliaLang/VersionsJSONUtil.jl](https://github.com/JuliaLang/VersionsJSONUtil.jl.git), [JuliaLang/www.julialang.org](https://github.com/JuliaLang/www.julialang.org.git), [JuliaLang/Julia (potentially)](https://github.com/JuliaLang/julia.git)
- **Objectives**:
- Gather and adopt feedback on potential improvements to the current JSON release feed format (primary)
- Rewrite the VersionsJSONUtil.jl package as JuliaVersionsCI.jl to modernize its internals and adopt JSON schema improvements (primary)
- Implement the new package into the JSON release feed workflow (primary)
- Enable ecosystem adoption for package-level release feeds (secondary)
- **Metrics**:
- Working rewrite or new package that can successfully upload to S3
- Integration of the package into the JSON release feed workflow
- Updated testing and documentation for JSON release feed workflow and development
- **Resources**:
- [JSON Schema (draft-06)](https://json-schema.org/draft-06/draft-wright-json-schema-01)
- [An Introduction to Rekor (article)](https://edu.chainguard.dev/open-source/sigstore/rekor/an-introduction-to-rekor/)
## Description
One of Julia's best features is the `versions.json` feed served directly online that lists all Julia releases and associated metadata. This is generated from a JSON schema by the [VersionsJSONUtil.jl](https://github.com/JuliaLang/VersionsJSONUtil.jl.git) package based on the main repository's Git tags. VersionsJSONUtil.jl was introduced in 2021 following [discussion](https://github.com/JuliaLang/julia/issues/33817) on the need for a centralized version list, and a modern rewrite would allow for the following benefits:
- Attempt to tackle issues such as long wait times for adding new versions and the desire for changes to the current schema
- Reduce the reliance on third-party tools such as [ajv-cli](https://www.npmjs.com/package/ajv-cli/v/3.3.0) for schema validation and instead use Julia dependencies wherever possible
- Take advantage of new developments in Julia ([package extensions (v1.9)](https://julialang.org/blog/2023/04/julia-1.9-highlights/#package_extensions), [defined main entrypoint (v1.11)](https://julialang.org/blog/2024/10/julia-1.11-highlights/#new_main_entry_point), etc.)
- Add novel features usable both for the Julia release feed and for package release feeds (multiple cloud storage backends, local release manifest, etc.)
Reducing wait times would be the main focus of the rewrite. The approach will build on the earlier contribution providing in-toto attestations, and thus serve as a demonstration of the benefits of automated provenance for release artifacts.
## Implementation
First, stakeholder feedback will be sought for potential improvements to the current release feed schema. Based on prior discussion, as an example, there would be added convenience with the inclusion of specific channel tags (`lts`, `rc`, etc.) marking their respective Julia releases directly in the feed. Inclusion of Sigstore signatures, in-toto attestations, and other useful release information would also be of benefit. The potential schema structure could look like so:
```json
// Include the channel tag as an additional field for each release
"properties": {
"files": {
"type": "array",
"items": {
"$ref": "#/definitions/File"
}
},
"stable": {
"type": "boolean"
},
"lts": {
"type": "boolean"
}
// ...
// OR include a separate top-level object listing each channel
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/ChannelList"
},
// ...
"definitions": {
"ChannelList": {
"type": "object",
"additionalProperties": false,
"properties": {
"lts": {
"type": "string" // or a reference to the specific release object
}
// ...
```
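The two options are not mutually exclusive: a top-level channel list could be derived from the per-release flags. The following is a minimal Python sketch of that idea, for illustration only (JuliaVersionsCI.jl itself would be written in Julia). The `lts` flag, the helper names `version_key` and `channel_map`, and the sample version entries are all assumptions, not part of the current feed.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Sort key for 'major.minor.patch' strings; any pre-release suffix is ignored."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def channel_map(feed: dict) -> dict:
    """Map each channel flag to the newest release that carries it (hypothetical helper)."""
    channels = {}
    for flag in ("stable", "lts"):
        flagged = [v for v, entry in feed.items() if entry.get(flag, False)]
        if flagged:
            channels[flag] = max(flagged, key=version_key)
    return channels


# Illustrative feed excerpt; the real feed carries many more fields per release.
feed = {
    "1.10.9": {"stable": True, "lts": True, "files": []},
    "1.11.4": {"stable": True, "lts": False, "files": []},
    "1.12.0-rc1": {"stable": False, "lts": False, "files": []},
}

print(channel_map(feed))
```

Deriving the channel list this way would keep the per-release flags as the single source of truth, at the cost of duplicating information in the published feed. Whether that trade-off is acceptable is one of the points to settle during stakeholder feedback.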
A final specification for the schema would be drafted and thoroughly tested using [fredo-dedup/JSONSchema.jl](https://github.com/fredo-dedup/JSONSchema.jl), which would replace `ajv-cli` as the new schema validator dependency. A new package called JuliaVersionsCI.jl would be written separately from VersionsJSONUtil.jl based on the latest Julia version (v1.11). [LocalStack](https://github.com/localstack/localstack) will be employed for local S3 testing, with exploration of [nektos/act](https://github.com/nektos/act.git) and [SanjulaGanepola/github-local-actions](https://github.com/SanjulaGanepola/github-local-actions.git) for further mocking capabilities. For a specific period, both the old and new release feeds will exist side-by-side (the latter available via julialang-s3.julialang.org/bin/V2/versions.json) until the former is deprecated.
JuliaVersionsCI.jl will be able to either upload the release feed itself or simply generate the release feed JSON file and pass it to a GitHub Actions workflow. Release feed generation will default to Git tags, as with VersionsJSONUtil.jl. Two additional methods will be provided, with the latter being the recommended option moving forward:
- Generate via the [list of GitHub releases](https://api.github.com/repos/JuliaLang/julia/releases). This would not be ideal for Julia because not all Julia releases have been published as GitHub releases. It also does not remove the need to download the release artifacts.
- Generate via in-toto attestation statements. The SLSA Provenance V1 attestation format, supported by Cosign, provides much of the same information that must be gleaned via downloading the release artifacts (artifact hash, target platform, etc.). Therefore, only the signed attestation statement file need be downloaded from Rekor and parsed to populate the release feed. A curated collection of each release attestation statement could be stored in an `Artifacts.toml` file or other data format that, when updated, triggers the generation and deployment of the updated feed. This way, there is an additional source of truth for Julia releases (Rekor and the curated list) that could reduce the need to fetch attestations online and thus further improve performance and offline capabilities.
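As a rough illustration of the attestation-based path, the Python sketch below reads an in-toto v1 statement carrying a SLSA Provenance v1 predicate and turns its subjects into feed file entries. It is a sketch under stated assumptions, not the planned implementation: the function name `feed_files_from_statement`, the base URL, the artifact name, and the digest are all hypothetical, and signature verification (which would have to happen before the statement is trusted) is deliberately left out.

```python
STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
SLSA_PROVENANCE_V1 = "https://slsa.dev/provenance/v1"


def feed_files_from_statement(statement: dict, base_url: str) -> list[dict]:
    """Build feed file entries from the subjects of an already verified statement."""
    if statement.get("_type") != STATEMENT_TYPE:
        raise ValueError("not an in-toto v1 statement")
    if statement.get("predicateType") != SLSA_PROVENANCE_V1:
        raise ValueError("not a SLSA Provenance v1 predicate")
    return [
        {
            "url": f"{base_url}/{subject['name']}",
            "sha256": subject["digest"]["sha256"],
        }
        for subject in statement["subject"]
    ]


# Hypothetical statement: the artifact name and digest are placeholders.
statement = {
    "_type": STATEMENT_TYPE,
    "predicateType": SLSA_PROVENANCE_V1,
    "subject": [
        {"name": "julia-x.y.z-linux-x86_64.tar.gz", "digest": {"sha256": "0" * 64}},
    ],
    "predicate": {"buildDefinition": {}, "runDetails": {}},
}

files = feed_files_from_statement(statement, "https://example.invalid/bin")
print(files)
```

Only the subject name and digest are read here. Fields such as the target platform would depend on how the build definition in the predicate ends up being populated by the release pipeline, which remains an open design question.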
JuliaVersionsCI.jl would also be designed with the ability for Julia package developers to create their own release feeds for their packages. Developers would simply need to integrate this package into their package release workflows. Package extensions would allow developers to choose their preferred storage backend for the release feed should they wish for JuliaVersionsCI.jl to handle uploading ([AWS.jl](https://github.com/JuliaCloud/AWS.jl.git), [Azure.jl](https://github.com/JuliaComputing/Azure.jl.git)), though direct calling of each cloud platform's respective REST APIs will also be explored. Additional contributions following successful JuliaVersionsCI.jl integration would include updating developer documentation detailing the release feed workflow and incorporating additional feedback on further improvements to the release feed once in the wild.
## Considerations
- Discussion must take place prior to the integration of JuliaVersionsCI.jl in order to determine how it should be included in the release workflow (kept manually triggered as with VersionsJSONUtil.jl or automated with each new release).
- Downstream consumers of the release feed will expect their workflows to keep working until the prior online feed is fully deprecated. Proper notice will need to be given beforehand so that consumers have enough time to update.
- The attestation statement storage location must be taken into account. Although it is strongly advised to upload attestations to Rekor, certain package developers may wish to keep attestations stored locally, for example when dealing with a private registry or when dealing with specific regulatory restrictions.
## Extended Objective
Instead of forcing users to integrate JuliaVersionsCI.jl into their release workflows themselves, it might be desirable to include this capability directly in Registrator.jl. Each declaration of a new package release would also trigger an update of the package release feed that could potentially be stored in the General Registry. This could greatly simplify how Pkg adds and vets officially registered package releases moving forward, whilst also providing incentive for stakeholders to begin adopting provenance across the ecosystem.
# Timeline
**TODO**
- **Community Bonding Period (May 8 - June 1)**
- **Coding Period (June 2 - July 14)**
- **Midterm Evaluations (July 14 - July 18)**
- **Work Period (July 19 - August 25)**
- **Final Week (August 25 - September 1)**
# FAQ
## Why decide on provenance as the focus of your GSoC project?
A particular story that inspired this GSoC project was when I contributed towards the [common-utils](https://github.com/devcontainers/features/tree/13521bc5efd79a1cea7da58df59a523243b3cea6/src/common-utils) devcontainer feature. This is a feature by Microsoft that adds a number of common utilities to development environments, such as `curl` and `zip`. I wished to simplify Yggdrasil's [devcontainer configuration](https://github.com/JuliaPackaging/Yggdrasil/commit/f1615080d6a0dd5b6db9dc5af30ce70b7497e1ad) by removing the need for a Dockerfile and relying on common-utils, which is installed by default on GitHub Codespaces. Every dependency needed was already included in common-utils, with the exception of the XZ utils data compression library. Therefore, I made a [pull request](https://github.com/devcontainers/features/pull/798) to include XZ in the common-utils feature, which was accepted by Microsoft and fully merged. Roughly 2 months later, [a major XZ backdoor was discovered](https://en.wikipedia.org/wiki/XZ_Utils_backdoor), perpetrated by malicious actors, that significantly affected the software community. Consequently, I was absolutely convinced for a full month that agents would be breaking down my door at any moment :') This, on a positive note, led me down the path to learning about the developments in modern software security practices and their increasing importance in industry and the community.
## Why does this proposal aim solely for Julia release artifacts, and not other artifacts across the ecosystem?
The supply chain is a broad topic and includes many moving parts, and this proposal aims solely for build provenance of Julia itself. It must first be proven that SLSA provenance can be successfully generated for Julia at its root level and in an automated fashion. As alluded to in certain sections, it might be possible for future contributions to extend provenance capabilities (SBOM generation, artifact signing, etc.) to the rest of the ecosystem based on this work.
## Why not directly aim for Build L3 SLSA provenance?
Each subsequent provenance level requires increasing trustworthiness and completeness of artifact provenance. As such, all guidelines from the lower levels must first be met before attaining higher levels. Build L0 and Build L1 are almost fully met by Julia: Build L0 makes [absolutely no guarantee of anything :)](https://slsa.dev/spec/v1.0/levels#build-l0-no-guarantees), and Build L1 ensures that [some form of provenance](https://slsa.dev/spec/v1.0/levels#build-l1-provenance-exists) exists (Buildkite logs and basic SBOM generation). Build L2 is largely attainable with the current configuration of the supply chain, requiring all the guarantees of L1 with a [hosted build platform](https://slsa.dev/spec/v1.0/levels#build-l2-hosted-build-platform) (Buildkite). This proposal is thus meant to provide contributions that confirm and further guarantee Build L2 provenance for Julia. [Build L3 provenance](https://slsa.dev/spec/v1.0/levels#build-l3-hardened-builds) requires that all the requirements for Build L2 provenance are first met and that additional steps are taken to ensure the build platform is hardened and offers tamper protection. While this might be attainable with the current Buildkite setup, this would require much greater research and discussion with the CI/CD team to properly vet.
## What are the risks associated with this project?
The biggest risk arises during testing, where a fault in a contribution could disrupt Buildkite activities. This was an unfortunate lesson during one of my [previous contributions](https://github.com/JuliaLang/VersionsJSONUtil.jl/pull/41) with adding an LTS tag to the release feed, for which I apologize sincerely and wish not to be burned at the stake :( Other risks would include potential errors in the final attestation or signing of release artifacts, as the exact metadata contained therein is subject to stringent specifications that may vary between tools and workflows. Any contribution will be discussed thoroughly with the mentor and major stakeholders of the Julia community before any potential merge or rejection.
# Note
With this proposal relating to Julia supply chain security, I feel the need to publicly state that I am a current member of the Canadian Armed Forces as part of the Primary Reserve. The Canadian Department of National Defence/Ministère de la Défense nationale is neither supporting me, nor endorsing this proposal, nor exerting any influence on the design and implementation of any aspect of this project. This is a self-driven endeavour for the purposes of FOSS contribution and personal/professional experience.