Tradeoffs in Distribution and Execution of Provider Plugins

In today's OpenTofu we broadly assume that all providers come from provider registries, and that the various alternative installation methods are byte-for-byte identical mirrors of packages from an upstream registry. In practice lots of organizations intentionally violate that assumption by, for example, running a "mirror" that is actually the primary location for a provider. Currently that involves adding special CLI Configuration settings on every system where OpenTofu would use those providers, to force it to use the mirror instead of trying to contact the origin registry.

Furthermore, OpenTofu's model of "platforms" for providers makes the assumption that it's sufficient to have just a single binary package per operating system and architecture pair, such as `linux_amd64` or `darwin_arm64`, and that provider developers will produce executables that work on all reasonable variations of each operating system and architecture. In practice that's most typically achieved by writing provider plugins in the Go programming language, because Go on Linux is unusual in making relatively little use of the system's C library and instead making Linux kernel system calls directly. Building provider binaries in most other languages is troublesome because developers must carefully choose which libc they link to so that the resulting binaries will be usable by as many end-users as possible.

The OpenTofu provider protocol working group is primarily concerned with deciding the scope of a new provider protocol, but we're also interested in making it easier to build and use an in-house provider and in making it easier to write shared providers in languages other than Go. This document is an initial overview of that problem space, and a summary of some ideas we discussed before forming the new working group.
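
As a minimal sketch of the platform assumption described above, the following Go program models a platform as nothing more than an operating system and architecture pair, and looks up the single package published for that pair. The `Platform` type, the `SelectPackage` function, and the package filenames are hypothetical illustrations and are not OpenTofu's actual implementation.

```go
package main

import (
	"fmt"
	"runtime"
)

// Platform is a hypothetical model of the current assumption: a provider
// package is identified only by operating system and CPU architecture.
type Platform struct {
	OS   string
	Arch string
}

// String renders the platform in the "os_arch" form, such as "linux_amd64".
func (p Platform) String() string {
	return p.OS + "_" + p.Arch
}

// CurrentPlatform reports the platform this program was built for.
func CurrentPlatform() Platform {
	return Platform{OS: runtime.GOOS, Arch: runtime.GOARCH}
}

// SelectPackage returns the single package published for the given platform.
// Nothing in this model can express which libc a binary was linked against.
func SelectPackage(packages map[string]string, p Platform) (string, bool) {
	name, ok := packages[p.String()]
	return name, ok
}

func main() {
	// Hypothetical package filenames for an imaginary provider.
	packages := map[string]string{
		"linux_amd64":  "example-provider_1.0.0_linux_amd64.zip",
		"darwin_arm64": "example-provider_1.0.0_darwin_arm64.zip",
	}
	p := CurrentPlatform()
	if name, ok := SelectPackage(packages, p); ok {
		fmt.Printf("%s: would install %s\n", p, name)
	} else {
		fmt.Printf("%s: no package published for this platform\n", p)
	}
}
```

Because the lookup key carries only two fields, a glibc build and a musl build of the same provider for Linux on amd64 would both have to live under the same `linux_amd64` key in this sketch.
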

# Conflicting Requirements

There have always been various design challenges in the space of provider selection and distribution because different participants have very different needs and preferences depending on how they use OpenTofu and on their relationship with the developers of the providers they use:

1. In the simplest case, an organization completely controls all of the OpenTofu modules they use and so can effectively trust that those modules are written in good faith. Those in this situation typically favor solutions that give the most flexibility and the least workflow friction. Unfortunately, the other cases described below have historically forced those in this category to accept constraints and friction they don't strictly need or want. They often use the provider-installation-related features in ways that they weren't intended to be used -- such as using "mirrors" as the primary source of one or more providers -- because that's the best available way to achieve a workable compromise.

2. In the moderate case, an organization might use a mix of in-house and third-party modules, and allow the third-party modules to depend on whatever providers they might need as long as those providers are available for automatic installation from a registry. This seems to be the most common case, and is the situation the current system was designed to prioritize: the dependency lock file aims to support teams in having a workflow where they scrutinize entirely-new providers, but can then rely on the lock file to ensure that those providers don't change from what was initially reviewed and thus have some confidence that an existing provider won't become compromised by an attacker at a later date without that attack being detected.

3. In the most extreme case, organizations have strict security policies that require that dependencies be installed only from sources directly controlled by the organization, and they then scrutinize those packages at the time of importing them into the organization's private repositories. Using providers from any other source is forbidden by policy and, as much as possible, also blocked by technology. This cohort is the target audience of the various alternative "mirror" installation methods in OpenTofu, since that allows an organization to create a local copy of providers copied directly from their origin registries, such as by using `tofu providers mirror`. OpenTofu's CLI Configuration options to prevent installation from anywhere except a particular mirror are an important part of several organizations' defense-in-depth strategy.

The tension between these three broad groups is very tricky, because what those in category 1 might consider to be a convenient and flexible behavior can be a security risk for those in category 2, and in the worst case could completely undermine the control needed by those in category 3.

In discussions so far we've found consensus that we'd like to do a better job of supporting those in category 1, but we must find a way to do it without undermining the existing features and mechanisms that support the needs of those in categories 2 and 3.

# An Adjusted Threat Model

Making tradeoffs related to security means we need to first agree on what problems we're trying to solve. A different way to frame the problem is to think about relationships between different parties where one party in the relationship might be a malicious actor or someone accidentally introducing a security problem in good faith.

For the sake of this document I'm going to assume three main parties:

1. The operator of the environment where OpenTofu is running.

   When someone is using a "TACOS" system that supports cloud-hosted runners, this party is the vendor offering that service. They want to protect their multi-tenant system from abuse by any of its tenants.

   There's also an in-house variant of this in larger organizations where e.g. a platform team runs an execution environment on behalf of many application teams. In that case, the central platform team is this party.

   In both cases, this party is concerned with ensuring that the next group cannot (whether through malicious intent or through good-faith error) install and run bothersome software in the environment(s) intended for planning and applying changes using OpenTofu.

   From a non-security standpoint this party also ultimately constrains what other software is available for use in the execution environment. For example, if a provider were hypothetically written in Python and expecting to be run through a CPython runtime installed on the same system then the environment operator ultimately makes the decision about whether such a runtime is available and thus whether this hypothetical provider would work in their environment. If some commonly-used provider plugins came to depend on various different third-party language runtimes then there would be increased demand for these environment operators to preinstall these runtimes in their environment, thereby increasing the amount of software they are ultimately responsible for.

2. The author of a configuration's "root module".

   Although in many cases we try to make both root and descendent modules equivalent, the author of the root module has the ultimate responsibility of making sure all of the dependencies they select are trustworthy and non-harmful, and for recording the dependencies they decided to use in the configuration and the dependency lock file.
   This party is in a middle position where they are potentially the "attacker" from the perspective of the environment operator, but also potentially the "victim" from the perspective of a malicious shared module or provider author.

3. The authors of shared modules that a root module depends on (directly or indirectly).

   Lots of OpenTofu configurations involve at least one third-party module installed from outside of the organization that owns the root module and/or execution environment. Those modules in turn typically depend on at least one provider each, and so can introduce additional risk of installing and executing a malicious provider, or of choosing to depend on a provider that requires external language runtimes not installed in the target execution environment.

   This party is potentially an "attacker" from the perspective of either of the previous two parties.

These three parties offer another lens through which we can think about the cohorts discussed in the previous section: in "the simplest case", the same organization is playing all three roles and so they can rely to some extent on internal trust and contract law as a substitute for technical security measures. The other cases gradually consider more of these parties distrusted, up to the extreme where nobody trusts anyone else at all and technical solutions become a hard requirement.

# A Potential Compromise

With all of the above in mind, in earlier discussion we considered various different compromises that each slightly change the power and flexibility afforded to each of the parties in the threat model.

The most promising variation we've discussed so far has been to offer two different ways to distribute and use a provider:

1. Distribution through a provider registry, or a mirror thereof: this is the current model, though (as discussed later in the document) extended with some additional capabilities.

2. Distribution alongside the root module: allowing the root module author to directly specify a provider that is presumed to already be available -- e.g. by being included in the same Git repository as the root module -- which OpenTofu will just run directly without any special installation step at all.

   (Earlier discussion referred to this as [Local-exec Providers](https://github.com/opentofu/opentofu/pull/3027), which was defined as an extension of [Registry in a File](https://github.com/opentofu/opentofu/pull/2892).)

The second case is the main new addition here: it acknowledges that under the default settings a root module author is _already_ effectively capable of introducing arbitrary code to execute, and offers a new way to do that which is easier to set up and doesn't require any special configuration of the execution environment aside from making sure any needed software is already somehow available in the execution environment.

Note that the current draft RFC for "local-exec providers" calls for them to be specified in a separate file that's distributed alongside the root module rather than _as part of_ any module, because that creates a compromise where root module authors can introduce additional in-house providers to their own root modules but authors of third-party modules can still only use providers that were already somehow available to the configuration where they are being used.

In order to offer this new option without undermining "the most extreme case", this ability for a root module to directly bundle providers must be considered an extension of the existing "direct" provider installation method, since that's the way root module authors currently exercise their ability to install arbitrary software.
Operators of environments that require more constraint currently typically achieve that by completely disabling the "direct" installation method in favor of one or more of the "mirror" methods, and so we must make sure that strategy remains effective as we introduce this new capability.

The remaining sections of this document assume, for the sake of discussion, that we want to adopt a compromise like that described in this section, which then leads to various technical design and implementation questions in each case, with differing tradeoffs.

# Design Tradeoffs for Registry-distributed Providers

Because OpenTofu already supports installing providers from registries (and mirrors thereof), we already have quite a comprehensive understanding of the tradeoffs in this area: we assume that each provider has a distinct source address that is rooted in a hostname whose owner is assumed to control the meaning of that source address, and that for each source address there will be zero or more installation methods we can use to try to fetch a package for it. Each provider has a fixed set of platforms (OS and architecture) it supports, and OpenTofu automatically selects the one that matches the platform that the OpenTofu CLI executable was built for.

For our current discussions the main tradeoffs here relate to supporting providers written in languages other than Go:

- Should we extend our modeling of "target platform" to support more than just OS and architecture, so that we can model variations such as which libc is used on Linux, or which variant subset of Windows OpenTofu is running on?

  This would allow provider developers to, for example, offer separate builds for Linux distributions with a certain minimum version of glibc from Linux distributions that prefer to use musl libc.

  However, it achieves that by just pushing all of this complexity onto the provider developer, still requiring that they figure out how to even produce all of these various binary builds.
  Each programming language has different capabilities for cross-compilation and different toolchain requirements, and so in practice this is likely to still result in providers only supporting the most common configurations. Those with less common situations (like musl libc systems) would benefit only by getting an explicit error message that a provider isn't supported for them, instead of a runtime failure from the system's dynamic linker.

- Should we encourage provider developers to ship providers that only work when a certain language runtime is already installed on the computer where OpenTofu is running?

  For example, someone might publish a provider written in Python that expects there to already be a `python3` executable and various libraries installed on the host.

  Encouraging this is likely to severely fracture our provider ecosystem, and cause the same functionality to be reimplemented multiple times in different languages.

  It's already technically possible to ship providers that expect other software to be installed on the system, but that rarely happens today. It's unclear exactly why, but it's possible that it's because there aren't officially-maintained SDKs in any languages other than Go and so building providers in other languages is more challenging. If that's true then the OpenTofu project offering SDKs for other languages is likely to cause this to become more common.

  If we _do_ choose to encourage this model, we would probably want to add an "any" platform to our current model of platforms so that providers written in languages that are typically distributed as source code rather than binary do not need to artificially publish multiple copies of the same source code associated with various different platforms.

- Could we compromise by choosing just _one_ additional runtime to encourage?

  For example, we could decide that the new style of provider artifact is a multi-platform container image and require folks to install a suitable container runtime on their systems alongside OpenTofu in order to use providers of that style. This at least limits the amount of ecosystem fracturing we'd cause, while allowing folks to ship a suitable runtime for their chosen language as part of the container images.

  However, container engines _in particular_ might be challenging for environment operators that impose constraints that prevent the use of the relevant OS APIs, container support for macOS is pretty lacking in comparison to other platforms, and some of our "best-effort" target platforms do not have a suitable container runtime implementation.

  Another variation of this tradeoff is to embed a WebAssembly runtime and use the [WASI](https://wasi.dev/) API. That would effectively prioritize using precompiled languages like C, Go, and Rust over source-distribution-based languages like JavaScript and Python, but means that we could embed the runtime directly inside OpenTofu rather than requiring something separate to be installed. The fact that WASI remains experimental at the time of writing means that we might commit to supporting an API that will become obsolete in the relatively near future.

# Design Tradeoffs for "Local-exec" Providers

For "local-exec" providers we would be making some quite different assumptions than for registry-distributed providers. Most notably, we'd expect that such a provider is more tightly coupled to a specific set of use-cases and specific set of execution environments, and therefore we probably don't need to worry _so_ much about fracturing the ecosystem: these providers are not really part of the "ecosystem", and are instead more likely private to whatever organization chose to use them.

The tradeoffs here are quite different, then:

- Do local-exec providers still have "source addresses" in the same global namespace as registry-distributed providers?

  This would certainly minimize changes to the rest of the system that already assumes that all providers have a "fully-qualified" source address that is global, but it means that anyone wanting to use this model will need to establish a private namespace under a hostname they control even though that hostname would never actually be used for network requests.

  If we instead decided to place these in a different namespace -- or perhaps, under a reserved pseudo-hostname within our current syntax -- then that would undermine some assumptions elsewhere in the system about how providers can be passed between modules only when the two modules agree on which fully-qualified source address they depend on: the same name/address could mean something quite different depending on which root module a shared module has been called by.

- Should environment operators be able to control the availability of local-exec providers separately from "direct" installation from registries, or can we consider these as two variants of the same control?

  As noted earlier, root module authors already have the ability to depend on arbitrary providers as long as the CLI configuration allows them to use the "direct" installation method with arbitrary hostnames, and so we could argue that local-exec providers are just a variation of that where the provider is effectively just preinstalled along with the root module.

  However, allowing root module authors to effectively redefine the meaning of names in arbitrary provider namespaces could undermine a more granular CLI configuration which tries to allow installation only from a fixed set of registry hostnames. We might need to offer a separate control (e.g. as discussed in [a comment on the "Registry in a File" prototype](https://github.com/opentofu/opentofu/pull/2892#issuecomment-3211398084)) so that organizations that are already using a constrained CLI Configuration for security reasons would not have that immediately undermined by the addition of this new feature.

- Should local-exec providers use the same protocol as registry-distributed providers?

  Initially we presumed "yes", but depending on what tradeoffs we choose to make for registry-distributed providers we might ultimately conclude that it's better for registry-distributed providers to use a backward-compatible extension of the Terraform-managed provider protocol instead of something entirely new (perhaps still expecting those to be written in Go), while still offering a lighter stdio-based protocol for easier implementation of local-exec providers in many different languages.

- Should non-root modules be able to directly depend on local-exec providers?

  The discussion above has largely assumed that we _don't_ want to let shared module authors write modules that depend directly on arbitrary code written in other languages, but allowing that _would_ potentially allow a module author to encapsulate some special behavior -- e.g. a special data-transforming function written in a general-purpose language -- into their module.

  Because modules themselves can be distributed via registries and other network module sources, allowing a third-party module to include a local-exec provider seems likely to cause all of the same concerns that caused us to try to separate registry-distributed vs. local-exec providers in the first place: these modules would likely only work with specific other software installed on the system, and so we'd risk fracturing the ecosystem.

- Can a local-exec provider effectively replace a registry-distributed provider?

  If we decide that local-exec providers belong to the same hierarchical namespace as registry-distributed providers, would we allow a local-exec provider to take the same name as a provider that would normally be installed from a registry? Could we stop this even if we wanted to, since OpenTofu would only know what's available in a remote registry after actually trying to install it?

  If we _do_ allow this then we'll need to think carefully about how we should handle the same configuration being used in environments that allow local-exec providers vs. those that don't. For example, does the dependency lock file still include checksums for the registry-distributed version of a provider _just in case_ the root module is used in an environment where the local-exec provider is ignored? Or, from the other perspective, does the dependency lock file "remember" that a particular provider is supposed to be local-exec so that `tofu init` can fail immediately if that root module is used in an environment where local-exec providers are not allowed, rather than potentially installing and running a third-party provider that the operator wasn't intending to run?