# How to migrate to [Josh](https://github.com/josh-project/josh/)
A couple of pieces of advice:
- **Never use the Josh version from crates.io!** It is severely outdated. Use the [latest released](https://github.com/josh-project/josh/releases) version from GitHub (at the time of writing, it was `r24.10.04`):
```bash
$ cargo +stable install josh-proxy --git https://github.com/josh-project/josh --tag r24.10.04
```
Note that the version of `josh` printed by cargo may not correspond to the version of the GitHub release (because the releases use date-based versioning); that is fine.
- **Never use rebase or squash merges on GitHub for the synchronization (pull/push) PRs.** Doing so rewrites the commit SHAs generated by Josh and thus desyncs the git history. To be on the safe side, a repository using Josh should probably have rebase and squash merges disabled on GitHub.
## How to migrate from `git subtree` to `josh-proxy`
"subrepo" refers to the repo that reflects just a folder of rustc, e.g. the Miri or rust-analyzer repository.
- Note down your current subrepo head commit as LAST_SUBTREE_COMMIT.
- Do one last subtree sync from the subrepo to rustc so that LAST_SUBTREE_COMMIT exists in rustc history.
- Now do *not* do any more subtree syncs in either direction! Josh would have to figure out how to sync those and that usually fails.
- Construct your josh filter. For rust-analyzer it is
`:rev(LAST_SUBTREE_COMMIT:prefix=src/tools/rust-analyzer):/src/tools/rust-analyzer`. This tells josh how to extract the subrepo history from the rustc history: generally only the part inside `src/tools/rust-analyzer` matters, but for everything before LAST_SUBTREE_COMMIT, it needs to be treated as-if it was inside `src/tools/rust-analyzer`.
This reflects a fundamental difference in how subrepo commits get reflected in the rustc repo: with `git subtree`, they are copied identically, even preserving their git hash, so their tree contains just your subrepo at its root. With josh, the commits look exactly as if the same change was made inside the rustc repo, i.e. your subrepo is at its usual place in the rustc folder hierarchy and the rest of the rustc repo also exists in these commits (but is of course left unchanged). With josh, looking at the rustc history, you can't tell which changes were made directly in rustc and which were made in the subrepo.
- Set up the scripts to do rustc-push and rustc-pull via josh semi-automatically, e.g. by copying them from Miri or rust-analyzer. Make sure the commit that adds them also contains an empty file in the crate root called `rust-version`.
- Do the first rustc-pull. You need to pull from a rustc commit that contains LAST_SUBTREE_COMMIT. This will update the `rust-version` file. Manually check the fetched history (FETCH_HEAD) to ensure that it contains LAST_SUBTREE_COMMIT. The following command should not print anything:
```bash
git merge-base --is-ancestor <LAST_SUBTREE_COMMIT> FETCH_HEAD || echo "Something went very wrong!"
```
Also, `git rev-list HEAD --max-parents=0 --count` should say that there is only a single root commit (unless your project already had multiple roots before josh entered the picture). The overall diff of the merge (`git diff <LAST_SUBTREE_COMMIT>`) should be only whatever changed in rustc that has not yet been synced to the subrepo.
- You may note that the extracted history contains a *ton* of merge commits. (In Miri we didn't get many merge commits, probably because we didn't actually successfully use `git subtree` for more than a single sync; in RA, we got around 1500 merge commits.) Specifically, a merge commit is created for each rustc merge commit where your subrepo differs between the two parents of that merge commit. josh considers those merge commits to "matter" for the subrepo history, and reflects them in the subrepo. So if someone creates a PR for rustc, and between them forking off of master and their PR landing in master a subrepo change lands (either via a sync or via a change that directly happens in rustc), then the merge commit of that PR will become visible in your subrepo. (Only the merge commit becomes visible, no other part of the PR.) You can look at [the rustup PRs in Miri](https://github.com/rust-lang/miri/pulls?q=is%3Apr+is%3Aclosed+rustup) to see how many of these merge commits we get in practice. This reflects the deeper philosophy of josh that the "main" repo is rustc itself, and the subrepo is "just" a projection to a particular folder (as defined by the filter).
- Now you can merge your subrepo master (in case it changed since LAST_SUBTREE_COMMIT), and do a rustc-push to ensure everything works. I recommend playing around with this a bit, i.e. making changes in the pushed branch and pulling them again, and making changes in the subrepo and pushing those.
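
The filter construction and the checks after the first rustc-pull can be wrapped in small shell helpers. The following is a minimal sketch, not part of the Miri or rust-analyzer tooling: the function names `subtree_filter` and `check_fetched_history` are made up for illustration, and the second function assumes it is run inside the subrepo checkout right after the fetch.

```bash
# Build the josh filter for a directory that used to be synced with `git subtree`.
# $1: LAST_SUBTREE_COMMIT, $2: path of the subrepo inside rustc.
# (Hypothetical helper; it only assembles the filter string described above.)
subtree_filter() {
    local last_commit="$1" dir="$2"
    printf ':rev(%s:prefix=%s):/%s\n' "$last_commit" "$dir" "$dir"
}

# Check that a fetched history contains LAST_SUBTREE_COMMIT and report the
# number of root commits.
# $1: LAST_SUBTREE_COMMIT, $2: ref to inspect (defaults to FETCH_HEAD).
check_fetched_history() {
    local last_commit="$1" ref="${2:-FETCH_HEAD}"
    if ! git merge-base --is-ancestor "$last_commit" "$ref"; then
        echo "error: $last_commit is not an ancestor of $ref" >&2
        return 1
    fi
    echo "root commits: $(git rev-list "$ref" --max-parents=0 --count)"
}
```

For example, `subtree_filter "$LAST_SUBTREE_COMMIT" src/tools/rust-analyzer` prints the rust-analyzer filter shown above with the commit filled in, and `check_fetched_history "$LAST_SUBTREE_COMMIT"` fails loudly if the ancestor check does not hold.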
## How to migrate from `git submodule` to `josh-proxy`
An example of this migration can be found [here](https://github.com/rust-lang/rust/pull/134907).
Here we assume that we want to migrate a subrepo that was used as a submodule in rustc. We assume that the submodule was located at `${SUBMODULE_DIR}`, and we will want to place the new josh version of the subrepo into the same directory.
- Note down the current submodule commit as `LAST_SUBMODULE_COMMIT`.
- You can find it with `git submodule status ${SUBMODULE_DIR}`
- Note: you can also create the new josh subtree based on a different (newer) commit if you want. You don't necessarily have to continue where the submodule was, you can update it forward together with the move to josh. In any case, the code below assumes that you want to merge the subtree at `LAST_SUBMODULE_COMMIT` into rustc.
- Create a branch in rustc that will be used to perform the migration. We will call it `${BRANCH}`.
- Create a commit in `${BRANCH}` that will remove the submodule:
```bash
git submodule deinit --force ${SUBMODULE_DIR}
git rm -r ${SUBMODULE_DIR}
# Check that the submodule has been removed from .gitmodules
# Commit the result
# (the backticks are escaped so that the shell does not treat them as command substitution)
git commit -m "Removed \`${SUBMODULE_DIR}\` submodule"
```
- Install `josh-filter`, which is used for performing the initial merge of the subtree into rustc. The following pulls/pushes will be performed using `josh-proxy`, same as in the subtree guide above.
```bash
$ cargo +stable install josh-filter --git https://github.com/josh-project/josh --tag r24.10.04
```
- Merge the subtree into rustc
```bash
$ git fetch <submodule-repo-url> ${LAST_SUBMODULE_COMMIT}
$ josh-filter ":prefix=${SUBMODULE_DIR}" FETCH_HEAD
# e.g. josh-filter ':prefix=src/doc/rustc-dev-guide' FETCH_HEAD
$ git merge --allow-unrelated-histories FILTERED_HEAD
# Choose some commit message for merging the subtree history into rustc
```
- **IMPORTANT!!!** Check that the merged history can be reconstructed with josh into the same commit SHA from the original subrepo.
- Push `${BRANCH}` to your fork of rustc.
- Get the SHA of `${BRANCH}` with `git rev-parse HEAD`. Let's call it `${RUSTC_SHA}`.
- Start `josh-proxy`.
```bash
$ josh-proxy --local $HOME/.cache/josh --remote https://github.com --port 42042 --no-background
```
- Fetch the filtered version of `${RUSTC_SHA}` from your fork.
```bash
$ git fetch http://localhost:42042/<your-gh-username>/rust.git@${RUSTC_SHA}:/${SUBMODULE_DIR}.git
```
- Print the filtered commit: `git rev-parse FETCH_HEAD`
- **If the output of the above command does not equal `${LAST_SUBMODULE_COMMIT}`, DO NOT CONTINUE FURTHER and retrace your steps, or investigate why josh did not do what it should have done!!!**. If the two commit SHAs do not match, it means that the git histories have diverged, and it won't be possible to do pushes and pulls cleanly with josh.
- Construct the corresponding [Josh filter](https://josh-project.github.io/josh/reference/filters.html) that will be used for pulls and pushes. It should be quite simple, `:/${SUBMODULE_DIR}`, e.g. `:/src/doc/rustc-dev-guide`.
- Prepare push and pull tooling, similar to the subtree guide above. You can see an example [here](https://github.com/rust-lang/rustc-dev-guide/pull/2183).
- Commit an empty `rust-version` file into the subrepo.
- The following instructions are similar to the subtree guide above.
- Try to perform a rustc-pull in your fork of subrepo.
- There should be no merge conflicts (unless you already modified something from the subrepo in `${BRANCH}`).
- Try to merge the result of rustc-pull into your fork of the subrepo. There should not be thousands of commits in the PR (that would mean the history got duplicated); you should just see the changes between `${LAST_SUBMODULE_COMMIT}` and the subrepo's HEAD, plus a few merge commits.
- Try to perform a rustc-push in your fork of subrepo.
- There should not be any merge conflicts.
- Try to push the result into your fork of rustc. Again, there should not be thousands of commits in the PR.
- After you have tested that everything works as expected on your forks, you can merge `${BRANCH}` into rustc.
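
The SHA comparison in the "IMPORTANT" step above is easy to get wrong by eye, so it may be worth scripting. The following is a minimal sketch under a few assumptions: `verify_roundtrip` is a hypothetical helper name, and both arguments are expected to be full (unabbreviated) commit SHAs, as printed by `git rev-parse`.

```bash
# Compare the commit that josh reconstructed with the commit the submodule
# pointed at. Returns non-zero on mismatch so that a script can stop early.
# $1: expected SHA (LAST_SUBMODULE_COMMIT), $2: SHA obtained through josh-proxy.
verify_roundtrip() {
    local expected="$1" actual="$2"
    if [ "$expected" = "$actual" ]; then
        echo "OK: josh reproduced $expected"
    else
        echo "MISMATCH: expected $expected, josh produced $actual" >&2
        return 1
    fi
}
```

After the fetch through `josh-proxy`, it could be called as `verify_roundtrip "${LAST_SUBMODULE_COMMIT}" "$(git rev-parse FETCH_HEAD)"`.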
## Steady state
After you have ported the subtree or submodule to josh, you should update rustc with rustc-push, and the subrepo with rustc-pull. Note that both of these commands will be executed in the subrepo.
- The `rust-version` file always keeps track of the last rustc commit that got merged into the subrepo (i.e., the last rustc-pull). This is useful for rustc-push to behave in a more predictable way, and it can also be useful for the subrepo's CI -- it can download that version of rustc to obtain a rustc that is guaranteed to be in sync with this repo.
- When doing rustc-pull and there are conflicts, just resolve them in the subrepo as part of the merge commit. Do *not* rebase.
- When doing rustc-push and there are conflicts, likewise do *not* rebase. In Miri what we usually do is abort the rustc-push, do a rustc-pull instead and resolve the conflicts there, and then do another rustc-push which should no longer have any conflicts.
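
As an illustration of how CI might consume `rust-version`, here is a minimal sketch that reads the file and rejects anything that is not a full commit SHA. It assumes the file contains exactly one 40-character lowercase hex SHA (possibly followed by a newline); `read_rust_version` is a hypothetical helper name, not something provided by josh or the existing scripts.

```bash
# Print the rustc commit recorded in the `rust-version` file.
# $1: path to the file (defaults to ./rust-version).
# Fails if the file is missing, empty, or does not hold a 40-character hex SHA.
read_rust_version() {
    local file="${1:-rust-version}" sha
    # Strip the trailing newline and any other whitespace.
    sha="$(tr -d '[:space:]' < "$file")"
    if ! printf '%s' "$sha" | grep -Eq '^[0-9a-f]{40}$'; then
        echo "error: $file does not contain a full commit SHA" >&2
        return 1
    fi
    printf '%s\n' "$sha"
}
```

Note that this check fails on the empty `rust-version` file that exists before the first rustc-pull, which is intentional: at that point there is no rustc commit to be in sync with yet.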