# Strategy for 2024

Within the Rust Foundation, we are currently reflecting on this year and planning for the next. This is necessary both for accountability towards the board and to ensure that we use our limited resources, such as human capital, wisely.

This document lays out how I am thinking about our strategic plans for next year and proposes a few projects to address current challenges. Its purpose is to support discussions within both the Rust Foundation and the Infrastructure Team.

## Objectives

At a very high level, I believe that everything I do in my roles as the foundation's Infrastructure Engineer and the Infrastructure Team's co-lead needs to support the following four objectives.

### Cost

We need to ensure that the Rust Foundation and its sponsors can cover the costs of our current and future infrastructure needs. This requires us to understand our cost structure and trends, and to take measures to ensure that we grow sustainably.

### Operations

As the usage of Rust grows, we need to ensure that our infrastructure is reliable and resilient. This requires us to invest in observability and mature operational processes to proactively support the needs of the project and ecosystem.

### Security

As Rust is used more widely and in more critical applications, security becomes ever more important. We need to proactively improve our infrastructure and processes to address threats.

### Sustainability

The Rust Project at large, and the Infrastructure Team specifically, are mostly made up of volunteers. As the demands on the team increase, we need to invest in the sustainability of the team. Specifically, we need to provide a healthy work environment and scale the team with the workload.

## Challenges & Risks

Looking at this year and next, the following challenges and risks need to be addressed by the Infrastructure Team.
### Bandwidth Costs

Bandwidth is growing exponentially year-over-year, and the growth rate itself is increasing as well. Bandwidth is already a major driver of our infrastructure costs, and we need to develop a long-term strategy that ensures availability.

### Security Threat Model

The Security Engineer of the Rust Foundation has developed a threat model for the Rust Project and its Infrastructure Team. While unlikely, certain threats pose a serious risk to the reputation or functioning of the Rust Project. We need to identify those threats and implement mitigations.

## Opportunities

The following opportunities might be worth exploring, or at least considering, when we plan for next year.

### Security Initiative

A lot of focus and attention in the industry is currently directed towards software and supply chain security. Creating a roadmap that addresses security issues or concerns within our infrastructure, and then working on those projects, might open up new sponsorship opportunities.

### Internal Mirrors

Anecdotally, we have heard that big cloud providers are interested in setting up mirrors for Rust releases and crates within their infrastructure to benefit their own internal projects. This might provide interesting opportunities for collaboration, given our own need to manage our bandwidth.

## Projects

The following is an initial list of projects that we might want to consider executing in 2024. They are sourced from the team's backlog and from discussions with stakeholders in both the Rust Foundation and the Rust Project.

### Caching for GitHub Actions

We estimate that roughly half of our traffic comes from continuous integration platforms, most importantly GitHub Actions. Implementing caching for releases and crates within Microsoft's data centers has the potential to meaningfully reduce our outbound traffic and slow its growth rate.
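A simple model can give a rough sense of what such a cache could save. Let $T$ be our total outbound traffic, $f$ the share that comes from CI platforms, and $h$ the hit rate of a cache inside Microsoft's data centers. Only $f \approx 0.5$ comes from our estimate. The hit rate $h = 0.8$ below is a purely hypothetical value chosen for illustration. The model also assumes that cache hits never reach our origin, and it ignores the traffic needed to fill the cache.

$$
T_{\text{origin}} = (1 - f)\,T + f\,(1 - h)\,T = T\,(1 - f\,h)
$$

$$
T_{\text{origin}} = T\,(1 - 0.5 \cdot 0.8) = 0.6\,T
$$

Under these assumptions, our outbound traffic would drop by about 40%. Because CI traffic would largely be served from the cache, its growth would also weigh less on our overall growth rate.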
### Out-of-band Backups

Rust releases and crates are currently only backed up within the same AWS organization. For additional security and redundancy, we need another backup that lives in a different account with entirely separate access controls.

### Secure Deployments

Instead of deploying infrastructure from our personal laptops, we should use a secure environment with strong access controls. Deploying from centralized infrastructure also makes it easier to monitor the system for unauthorized access or suspicious activity.

### Access Control for GitHub

We still have a lot of GitHub organizations and repositories without automated access control. Some repositories are now managed by the rust-lang/team repository and its associated tooling, but we should add all organizations and repositories so that permissions are granted and (most importantly) revoked automatically.

### Google Workspace for Rust Project

The Infrastructure Team has discussed setting up a Google Workspace for the Rust Project in the past, and some progress has already been made. The goal is to use the workspace for automated account and access management using SAML. We could also use it to address the ongoing threat of phishing campaigns targeting our mailing lists.

### Patch Management

Most, if not all, of our infrastructure is not patched regularly. This is especially concerning for systems that are either directly connected to the internet or running untrusted code in a sandbox environment. We should invest either in a process to update critical systems regularly or in tooling that can support or even automate this task.

### Audit Trails

Somewhat related to Secure Deployments, we want to make sure that we have audit trails for any interaction with our infrastructure. Once these are in place, we can deploy automated tooling to detect and alert on anomalies.

### Dual Control for Critical Systems

We should investigate how to implement dual control (i.e. the four-eyes principle) for any change made to critical production infrastructure. From both a reliability and a security perspective, changes to critical systems should be reviewed and approved by someone else first. Ideally, this is codified in our process and enforced by our tooling.

### Service Catalog

The Infrastructure Team manages a lot of different services, but there is no complete overview and often no associated documentation. This makes it difficult to onboard new team members or to support each other within the team.
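The following minimal Python sketch shows what a machine-readable service catalog and a simple completeness check could look like. Everything in it is hypothetical and chosen only for illustration: the `Service` fields, the `gaps` function, and the example service names. It does not describe existing tooling or real Rust infrastructure services.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Service:
    # Hypothetical fields; a real catalog might also track runbooks, hosting, and dependencies.
    name: str
    owners: list[str] = field(default_factory=list)
    docs_url: Optional[str] = None


def gaps(catalog: list[Service]) -> dict[str, list[str]]:
    """Return the names of services that lack an owner or documentation."""
    report: dict[str, list[str]] = {"missing_owners": [], "missing_docs": []}
    for service in catalog:
        if not service.owners:
            report["missing_owners"].append(service.name)
        if service.docs_url is None:
            report["missing_docs"].append(service.name)
    return report


# Placeholder entries, not real Rust infrastructure services.
catalog = [
    Service("example-service-a", owners=["alice"], docs_url="https://example.org/docs/a"),
    Service("example-service-b", owners=["bob"]),
    Service("example-service-c"),
]
print(gaps(catalog))
```

Running this reports `example-service-c` as missing an owner, and both `example-service-b` and `example-service-c` as missing documentation. A check like this could run automatically so that gaps in the catalog surface before they affect onboarding.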