# Sparse registry selection
This document proposes two separate mechanisms for how Cargo will know to use a git index versus a sparse index. One is for crates.io, and the other is for all other registries.
Feel free to leave feedback on the [tracking issue #10964](https://github.com/rust-lang/cargo/issues/10964), or on [#t-cargo on Zulip](https://rust-lang.zulipchat.com/#narrow/stream/246057-t-cargo).
## Changes
### crates.io easy select
> NOTE: This option is already implemented (via [#11215](https://github.com/rust-lang/cargo/pull/11215)), and is available on the nightly channel when used with the `-Zsparse-registry` CLI option.
To make it a little easier for users to choose how Cargo accesses crates.io, a new config value will be added:
```toml
[registries.crates-io]
protocol = "sparse"
```
The `protocol` field can only be specified for `crates-io`.
This is needed because the URLs are hard-coded into Cargo itself, and the user needs some method to select which one to use.
The value can be `"sparse"` or `"git"`.
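As a rough illustration of the selection described above, the sketch below maps the `protocol` value onto the two crates.io index locations. The `Protocol` enum and both helper functions are hypothetical names chosen for this example and are not Cargo's internal API; the two URLs are the public GitHub index and `index.crates.io`.

```rust
/// The two ways of reaching the crates.io index (illustrative type, not Cargo's own).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Protocol {
    Git,
    Sparse,
}

// Both locations are fixed, which is why a separate selector is needed.
const CRATES_IO_GIT_INDEX: &str = "https://github.com/rust-lang/crates.io-index";
const CRATES_IO_SPARSE_INDEX: &str = "sparse+https://index.crates.io/";

/// Parse the value of `registries.crates-io.protocol`.
/// Only "git" and "sparse" are accepted; anything else is an error.
fn parse_protocol(value: &str) -> Result<Protocol, String> {
    match value {
        "git" => Ok(Protocol::Git),
        "sparse" => Ok(Protocol::Sparse),
        other => Err(format!(
            "unsupported registry protocol `{other}` (expected `git` or `sparse`)"
        )),
    }
}

/// Pick the crates.io index URL for the selected protocol.
fn crates_io_index_url(protocol: Protocol) -> &'static str {
    match protocol {
        Protocol::Git => CRATES_IO_GIT_INDEX,
        Protocol::Sparse => CRATES_IO_SPARSE_INDEX,
    }
}
```

For example, `parse_protocol("sparse").map(crates_io_index_url)` would select the `index.crates.io` URL, while an unrecognized value is rejected instead of silently falling back.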
### `config.json`
> NOTE: This is currently not implemented, as we are focusing on the crates.io transition. `crates.io` will not be using this setting as both URLs are hard-coded in Cargo.
For registries other than crates.io, we are proposing to add a new field to the index `config.json` in order to assist migration from git to sparse:
* `canonical` — The canonical URL for the registry. This is the URL that Cargo will use in `Cargo.lock` files, `.crate` files (`Cargo.toml`), and the index (for cross-registry dependencies).
The way this is intended to work is that a registry can set the "canonical" field to the URL of the git index.
This is intended to solve the problem with Cargo not knowing that two different URLs refer to the same index. If the "canonical" field is **not** used, and a user has a configuration where they are accessing a sparse index at `sparse+https://example.com/crates/`, and they have a project where the `Cargo.lock` points to the git index at `https://example.com/index.git`, Cargo would not know that the dependency belongs to the same location, and thus would rebuild the `Cargo.lock` to use the new `sparse` URL.
The "canonical" field allows the following:
1. It allows a registry to serve the exact same contents for the index and crate files. The HTTPS sparse index can even serve from a simple clone of the git index.
2. It allows users to use both the git index and the sparse index at the same time, and Cargo will be able to understand that they are one and the same.
3. It allows a registry to migrate from git to sparse without rebuilding anything.
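The identity rule behind these points can be sketched as follows. This is only an illustration under stated assumptions: `IndexConfig`, `identity_url`, and `same_registry` are hypothetical names, the `config.json` contents in the comment are an invented example, and the comparison uses bare URLs, leaving out the `registry+` prefix that `Cargo.lock` entries carry.

```rust
// Hypothetical `config.json` for the example registry. `dl` and `api` are
// existing fields; `canonical` is the field proposed in this document:
//
//   {
//     "dl": "https://example.com/api/v1/crates",
//     "api": "https://example.com",
//     "canonical": "https://example.com/index.git"
//   }

/// In-memory form of the fields above (illustrative, not Cargo's type).
struct IndexConfig {
    dl: String,
    api: Option<String>,
    canonical: Option<String>,
}

/// The URL that identifies the registry in `Cargo.lock` and friends:
/// the canonical URL if the registry declares one, otherwise the URL
/// the user configured.
fn identity_url<'a>(configured_index: &'a str, config: &'a IndexConfig) -> &'a str {
    config.canonical.as_deref().unwrap_or(configured_index)
}

/// True if a source recorded in a lock file refers to the registry the
/// user has configured, so the lock entry does not need to be rewritten.
fn same_registry(locked_source: &str, configured_index: &str, config: &IndexConfig) -> bool {
    identity_url(configured_index, config) == locked_source
}
```

With `canonical` set to the git URL, a user configured for `sparse+https://example.com/crates/` still matches a lock file that records `https://example.com/index.git`. Without it, the two URLs compare as different registries, which is the situation described earlier.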
See below for some examples of how this will work and help with migration.
## Use cases
The following sections describe various use cases and how they will work for users, admins, and implementors.
### crates.io public users
In version 1.67 (tentatively), Cargo will stabilize support for sparse registries, but will not switch `crates.io` to the sparse protocol by default. Instead, users will need to specify `registries.crates-io.protocol="sparse"` in their config, or set the `CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse` environment variable. This will help us ease in and test how the service behaves under load and in more diverse environments.
In a future version, once there has been more experience with people using `index.crates.io`, we will switch the default of `registries.crates-io.protocol` to `"sparse"`.
If a user for whatever reason wants to keep using the GitHub index, they can set `registries.crates-io.protocol="git"`.
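The rollout described in this section amounts to changing only a built-in default. A minimal sketch, assuming Cargo's usual rule that an environment variable takes precedence over a config file value; `resolve_crates_io_protocol` is a hypothetical helper, not a Cargo function:

```rust
/// Decide which protocol string applies for crates.io.
///
/// `env_value` stands for `CARGO_REGISTRIES_CRATES_IO_PROTOCOL`,
/// `config_value` for `registries.crates-io.protocol` in a config file,
/// and `default_protocol` for whatever the running Cargo version ships
/// with ("git" at first, "sparse" after the default is switched).
fn resolve_crates_io_protocol<'a>(
    env_value: Option<&'a str>,
    config_value: Option<&'a str>,
    default_protocol: &'a str,
) -> &'a str {
    // An explicit setting always wins over the built-in default.
    env_value.or(config_value).unwrap_or(default_protocol)
}
```

Under this sketch, a user who sets `protocol = "git"` keeps the GitHub index even after the default changes, because the default is only consulted when nothing is configured.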
### Source replacement
Source replacement should generally work as it does today. The protocol can be specified in the URL if the user wants to be explicit:
```toml
[source.crates-io]
replace-with = "mirror"
[source.mirror]
# Without sparse+ this will default to git.
index = "sparse+https://example.com/crates/"
```
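The rule in the comment above (no `sparse+` prefix means git) can be sketched as a small classifier. `IndexKind` and `split_index_protocol` are hypothetical names used only for this illustration:

```rust
/// How an `index = "..."` value should be accessed (illustrative type).
#[derive(Debug, PartialEq, Eq)]
enum IndexKind {
    Git,
    Sparse,
}

/// Split an index URL into its access method and the underlying URL.
/// A `sparse+` prefix selects the sparse protocol; anything else is
/// treated as a git index, matching the default described above.
fn split_index_protocol(index: &str) -> (IndexKind, &str) {
    match index.strip_prefix("sparse+") {
        Some(url) => (IndexKind::Sparse, url),
        None => (IndexKind::Git, index),
    }
}
```

Applied to the mirror above, `split_index_protocol("sparse+https://example.com/crates/")` yields the sparse kind with the plain HTTPS URL, while the same URL without the prefix would be treated as a git index.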
### Alternate registry gradual migration
An alternate registry can migrate to sparse indexes in a similar fashion to crates.io:
1. In `config.json`, set the "canonical" value to their current git index URL.
2. Serve the same index via git and HTTP.
3. The registry administrators should instruct their users to update their Cargo `config.toml` to point directly to `sparse+https://example.com/crates/` so that they start using the sparse index.
4. Eventually, if all users have successfully updated their config files to go directly to the sparse index, the registry may decide to decommission the git index if they so desire.
### Alternate registry immediate migration (with compatibility)
An alternate registry can swap to using sparse indexes and drop support for git indexes in one step. This will require doing the following all at once:
1. In `config.json` set the `canonical` field to the URL of the git index.
2. Shut down the git index and start serving the HTTP index.
3. Update all user `[registries]` configs to the new sparse URL.
### Alternate registry immediate migration (without compatibility)
> **Note**: We think it is unlikely that any registry will implement this technique due to the complexity and disruption. It is being outlined here to provide it as an option, and to illustrate the need for an easier migration path.
Similar to the above, but without compatibility with existing `Cargo.lock` files. This avoids the need for setting `canonical` and retaining any sort of backwards compatibility. This will require doing the following all at once:
1. Shut down the git index.
2. Extract and rebuild all `.crate` files, modifying their `Cargo.toml` files to change the URLs to the new `sparse+` URL. The `Cargo.lock` files inside `.crate` files (present for packages with binaries) also need the URL changed and the checksums updated to match the rebuilt `.crate` files.
3. Rebuild the index to update the new checksum for each `.crate` file. If there is more than one index, then the URLs to the other indexes will need to be updated if they are also transitioning.
4. Update `Cargo.lock` files to change the source IDs to the new `sparse+` URL and update the checksum.
5. Update all user `[registries]` configs to the new sparse URL.
6. Start serving the index over HTTP.
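To give a sense of the mechanical part of step 4, the sketch below rewrites the `source` lines of a lock file. It assumes that git registry entries are recorded as `registry+<url>` and that sparse entries are recorded with the `sparse+` URL as-is. `rewrite_lock_sources` is a hypothetical helper, and it deliberately leaves `checksum` lines alone, even though the steps above require updating them too.

```rust
/// Replace every `source` entry that points at the old git index with the
/// new sparse index URL. All other lines, including checksums, are left
/// untouched.
fn rewrite_lock_sources(lock: &str, old_git_index: &str, new_sparse_index: &str) -> String {
    // The closing quote is part of the pattern, so a longer URL that merely
    // starts with the old one is not rewritten by accident.
    let old_entry = format!("source = \"registry+{old_git_index}\"");
    let new_entry = format!("source = \"{new_sparse_index}\"");
    lock.replace(&old_entry, &new_entry)
}
```

Calling it with `https://example.com/index.git` and `sparse+https://example.com/crates/` would turn each matching `source` line into the sparse form. The checksum updates and the `.crate` rebuilds are the expensive parts, and they are not covered here.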
### New alternate registry
A new registry may decide to only support sparse indexes. In this case, they should instruct users to use the appropriate config:
```toml
[registries.my-registry]
index = "sparse+https://example.com/crates/"
```
There is no need to set `canonical`.
### Alt-registry to alt-registry
It is possible to have one registry refer to crates in another registry. Today, users only need the registry configuration for direct dependencies. The second registry's index URL is embedded in the first index (and the published `Cargo.toml` files are rewritten with the correct index).
In the situation where the user does not have the second registry configured, then Cargo will have to access the second registry via its canonical URL (the URL found in the first registry's index). This means the second registry cannot decommission its git index, otherwise it would not be accessible.
In the situation where the user does have the second registry configured, then Cargo should be able to map the canonical URL. However, it will need to somehow get the `config.json` from that second registry, and that is not something Cargo does today.
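The two situations above can be sketched as a single lookup. This is a hypothetical illustration: `ConfiguredRegistry` and `index_for_dependency` are invented names, and the sketch assumes that each configured registry's `canonical` value has already been obtained from its `config.json`, which is exactly the part Cargo does not do today.

```rust
/// A registry from the user's `[registries]` table, together with the
/// `canonical` value its `config.json` declares (if any).
struct ConfiguredRegistry {
    name: String,
    index: String,
    canonical: Option<String>,
}

/// Decide which index URL to use for a dependency whose registry URL was
/// found in another registry's index.
fn index_for_dependency<'a>(
    dep_registry_url: &'a str,
    configured: &'a [ConfiguredRegistry],
) -> &'a str {
    configured
        .iter()
        // A configured registry matches if it is the same URL, or if it
        // declares the dependency's URL as its canonical URL.
        .find(|r| r.canonical.as_deref() == Some(dep_registry_url) || r.index == dep_registry_url)
        .map(|r| r.index.as_str())
        // Not configured: the only option is the canonical URL itself,
        // so the git index behind it has to stay available.
        .unwrap_or(dep_registry_url)
}
```

When the second registry is configured with a sparse URL and a matching `canonical`, the lookup returns the sparse URL. With no configuration, it falls back to the URL embedded in the first index.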
For now, we are leaning towards not addressing this use case, as we feel it is very unlikely to be in much use in practice.
If an organization wants to migrate in this situation, they will need to follow the steps [described above](#Alternate-registry-immediate-migration-without-compatibility) to rebuild all the files.