# Artifacts for GitHub Actions

The Rust Project operates infrastructure that provides users of the Rust programming language access to releases and third-party packages. Whenever they install or update Rust or a dependency in their projects, they download artifacts from this infrastructure.

Providing this infrastructure comes at a cost to the Rust Project, which is proportional to the amount of traffic that is served. A very significant amount of traffic comes not from individual users but from automated build systems, most notably from GitHub Actions. Optimizing the infrastructure to reduce the traffic from GitHub will likely reduce the infrastructure costs of the Rust Project by a similar factor.

## Assumptions

- Artifact requests have a [long tail distribution](https://en.wikipedia.org/wiki/Long_tail), where only the most recent releases account for most of the traffic.
- The infrastructure team is able to change the flow of traffic coming from GitHub without requiring changes to GitHub itself.

## Implementations

Several potential approaches have been identified in conversations with the team and stakeholders. All approaches share the goal of serving artifact requests from within GitHub's network to reduce egress traffic from AWS.

### Mirrors for Releases and Crates

This approach sets up mirrors for releases and crates inside Azure by replicating the respective S3 buckets into Azure. All requests from within Azure's datacenters are routed to these mirrors.

Copying the bucket for crates is comparatively simple due to its relatively small size. But the replication mechanism would need to support the deletion of crates, e.g. when crates are manually removed by admins for policy violations (see the first sketch below).

Releases, on the other hand, are much larger, and it is questionable whether all of them are actually needed. Selectively replicating releases, however, would require a fallback mechanism for the case that a requested release is not available in Azure. This could be implemented client-side, e.g. in `rustup`, or server-side, e.g. in a serverless function (a server-side variant is sketched below).
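To make the replication concrete, here is a minimal sketch of a periodic sync job. It assumes the `aws` and `azcopy` CLIs are installed and authenticated; the bucket name, storage account, container, and `<SAS>` token are hypothetical placeholders. Staging through a local directory is just one possible design, chosen here because both tools support it directly; a service-to-service copy would avoid the intermediate step.

```rust
use std::process::Command;

/// Mirror the crates bucket into Azure Blob Storage via a local staging
/// directory. Bucket, storage account, and SAS token are placeholders.
fn sync_crates_mirror() -> std::io::Result<()> {
    // Pull the current state of the S3 bucket; `--delete` removes local
    // copies of crates that were deleted upstream (e.g. for policy reasons).
    let status = Command::new("aws")
        .args(["s3", "sync", "s3://crates-io-bucket", "./staging", "--delete"])
        .status()?;
    assert!(status.success(), "aws s3 sync failed");

    // Push the staging directory into the Azure container;
    // `--delete-destination` propagates upstream deletions to the mirror.
    let status = Command::new("azcopy")
        .args([
            "sync",
            "./staging",
            "https://rustmirror.blob.core.windows.net/crates?<SAS>",
            "--delete-destination=true",
        ])
        .status()?;
    assert!(status.success(), "azcopy sync failed");
    Ok(())
}

fn main() -> std::io::Result<()> {
    sync_crates_mirror()
}
```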
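The server-side fallback could be little more than a function that probes the mirror and redirects accordingly. The following sketch (using the `reqwest` crate with its `blocking` feature enabled) illustrates only the decision logic; the mirror host is a hypothetical placeholder, and a real serverless function would return an HTTP redirect rather than a string.

```rust
use reqwest::blocking::Client;
use reqwest::StatusCode;

/// Decide where a request for `path` should be served from. The mirror
/// host is a hypothetical placeholder for an Azure storage account.
fn resolve(client: &Client, path: &str) -> String {
    let mirror = format!("https://rustmirror.blob.core.windows.net/releases/{path}");
    match client.head(&mirror).send() {
        // The artifact was replicated into Azure: serve it from the mirror.
        Ok(resp) if resp.status() == StatusCode::OK => mirror,
        // Not replicated (or the probe failed): fall back to the origin.
        _ => format!("https://static.rust-lang.org/{path}"),
    }
}

fn main() {
    let client = Client::new();
    println!("{}", resolve(&client, "dist/channel-rust-stable.toml"));
}
```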
### Caching

An alternative approach uses caching inside Azure to reduce the number of outgoing requests. The implementation might either use Azure CDN or deploy a reverse proxy such as Varnish inside Azure.

Compared to the first approach, this implementation would not capture all traffic from GitHub Actions. Depending on the size and TTL of the cache, some requests will still hit AWS. But assuming a long tail distribution, most releases and crates _should_ be available in the cache at most times.

## Traffic Routing

Depending on the implementation, more or less sophisticated strategies have to be implemented to route the traffic from GitHub Actions.

In its simplest form, DNS-based routing points requests from GitHub Actions at a location within an Azure datacenter. This can either be a mirror or a cache, and might look like `azure-static.crates.io`.

If only a subset of artifacts is mirrored into Azure, e.g. only the last few releases, then a routing function that knows which artifacts are available in Azure might be required. This function could either run within Azure, where it could check on each request whether the artifact is available locally, or it could run in AWS and implement a more generic router with potential support for other cloud providers (e.g. for mirrors in GCP).

Finally, routing can also be built into the clients. `cargo` could try to fetch crates from the Azure mirror first and fall back to `static.crates.io` if the mirror returns `HTTP 404 Not Found` (see the sketch below). While possible, this approach is probably the least desirable due to the work and coordination that it requires.
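To illustrate the client-side variant, here is a sketch of the fallback logic, again using `reqwest`'s blocking API. The mirror name `azure-static.crates.io` is the hypothetical host from the routing discussion above, and the example path follows the layout of `static.crates.io`.

```rust
use reqwest::blocking::{Client, Response};
use reqwest::StatusCode;

/// Client-side fallback, roughly as `cargo` might implement it: try the
/// Azure mirror first and retry against the origin on a 404.
fn fetch_with_fallback(client: &Client, path: &str) -> reqwest::Result<Response> {
    let mirror = format!("https://azure-static.crates.io/{path}");
    let response = client.get(&mirror).send()?;
    if response.status() != StatusCode::NOT_FOUND {
        // Anything other than a 404 is passed through unchanged.
        return Ok(response);
    }
    // The mirror does not carry this artifact; fall back to the origin.
    client.get(format!("https://static.crates.io/{path}")).send()
}

fn main() -> reqwest::Result<()> {
    let client = Client::new();
    let response = fetch_with_fallback(&client, "crates/serde/serde-1.0.0.crate")?;
    println!("fetched {} with status {}", response.url(), response.status());
    Ok(())
}
```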