# Plans for the GAP package distribution
## General idea
Right now, the package distribution consists of a bunch of Mercurial repositories, one for each package we distribute. These are managed by a bunch of scripts (see also [here](https://github.com/gap-system/gap-distribution/blob/master/DistributionUpdate/PackageUpdate/README.md) for some more information).
The key idea for the new system is to replace this with a single git repository, say <https://github.com/gap-system/pkg-dist> containing all packages, i.e., the content of the current package tarball.
So instead of downloading and extracting a tarball, `make bootstrap` could simply run `git clone https://github.com/gap-system/pkg-dist pkg`. Updating to a newer set of packages can then be done via `cd pkg && git pull && ../bin/BuildPackages.sh`.
Different sets of packages for major release lines (GAP 4.10, 4.11, ..., master) could be realized as branches of this repository.
This should cater for situations when we need to disable the latest version of a package for the stable or master branch of GAP.
Different sets of packages for each minor release could correspond to tags in this repository.
GitHub would also automatically create tarballs for us, e.g. <https://github.com/gap-system/pkg-dist/archive/master.zip> and <https://github.com/gap-system/pkg-dist/archive/stable-4.11.tar.gz>,
but of course we can also easily generate our own using `git archive` if we want to.
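As a rough sketch of how this layout might be driven from a script, the following Python helpers only build the `git` command lines; they do not run them. The branch name `stable-4.11` follows the archive URL above, while the tag scheme `v4.11.1`, the `pkg/` archive prefix, and all function names are assumptions made for illustration.

```python
REPO_URL = "https://github.com/gap-system/pkg-dist"  # proposed name from this note


def branch_for(gap_version: str) -> str:
    """Map a GAP version such as '4.11.1' to its release-line branch."""
    if gap_version == "master":
        return "master"
    major, minor = gap_version.split(".")[:2]
    return f"stable-{major}.{minor}"


def tag_for(gap_version: str) -> str:
    """Hypothetical tag name for the package set of one GAP release."""
    return f"v{gap_version}"


def clone_command(gap_version: str, dest: str = "pkg") -> list[str]:
    """Command that would replace 'download and extract the package tarball'."""
    return ["git", "clone", "--branch", branch_for(gap_version), REPO_URL, dest]


def archive_command(ref: str, output: str) -> list[str]:
    """Command that builds our own tarball from a branch or tag."""
    return ["git", "archive", "--format=tar.gz", "--prefix=pkg/", "-o", output, ref]
```

For example, `archive_command(tag_for("4.11.1"), "packages.tar.gz")` yields the arguments for archiving the package set of one release, which could then be passed to `subprocess.run`.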
## How are package updates handled?
Package updates, submissions and removals are all done via pull requests (PRs). These pull requests will mostly be generated by a Python(?) script `submit-gap-pkg`. Invoke it with either a path or URL pointing to a `PackageInfo.g` file as argument. It will then do this:
- if it is a local path, extract the `PackageInfoURL` from it and continue with that
- download the `PackageInfo.g` file from the URL
- validate it
- check that it is for a new package; or for an existing package, but with a newer version. Otherwise, abort
- (optionally) check if there is already a pull request for that exact package version; if so, abort
- download the archive referenced in it
- check that the `PackageInfo.g` in the archive matches the original one
- create a branch `update-PKGNAME-VERSION`
- add a commit to that branch that removes any older version of the package (if there is one), then adds the new one from the archive (this step could be done using `git fast-import`, so it does not need to touch the working tree)
- (optionally) push the changes to GitHub and submit a pull request
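The "new package or newer version" check and the branch naming from the steps above could be sketched as follows. This is only an illustration under several assumptions: `PackageInfo.g` is GAP code, and the regular expression here only handles entries written as plain string literals (not values built with `Concatenation`); the version comparison looks at numeric components only and is not GAP's own comparison; the function names and the `distributed` mapping are hypothetical.

```python
import re


def read_field(package_info: str, field: str):
    """Return the value of a simple `field := "value"` entry, or None."""
    pattern = rf'^\s*{re.escape(field)}\s*:=\s*"([^"]*)"'
    match = re.search(pattern, package_info, re.MULTILINE)
    return match.group(1) if match else None


def version_key(version: str) -> tuple:
    """Compare versions by their numeric components only (a simplification)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def plan_update(package_info: str, distributed: dict):
    """Return the branch name for the update, or None if we should abort.

    `distributed` maps lowercase package names to the version that is
    currently in the distribution (hypothetical bookkeeping).
    """
    name = read_field(package_info, "PackageName")
    version = read_field(package_info, "Version")
    if name is None or version is None:
        raise ValueError("no plain PackageName or Version entry found")
    current = distributed.get(name.lower())
    if current is not None and version_key(version) <= version_key(current):
        return None  # not newer than what is already distributed
    return f"update-{name}-{version}"
```

Downloading the archive, comparing its `PackageInfo.g` with the original, and creating the commit are left out here on purpose.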
This script can be used in multiple scenarios:
- package authors can use it to submit new packages for distribution
- a cron job can call it on `*/PackageInfo.g` in the `pkg-dist` repository to fetch all recent package updates
- GAP core maintainers can use it to manually fix up issues with a package
Using PRs means that CI tests can easily be run on each submission (GitHub Actions, ..., but also Jenkins). One of the first tests would be validation of the `PackageInfo.g`. We'd also run the package tests (against the latest GAP stable release and also GAP master, on Linux/macOS/Windows, 32/64 bit etc., or maybe just on Linux 64 bit with the GAP Docker container); ideally also GAP's tests etc. (but that's also a matter of how many CPU resources we can afford to spend on this; some tests may have to use Jenkins after all).
TODO: discuss multi package PRs (e.g. Homalg)
- for the first scenario, there should be a way to create a PR with more than one package
- Homalg will be straightforward after we have meta packages
TODO: discuss commit format:
```
PACKAGENAME 1.2.3
Update PACKAGENAME from 1.2.2 to 1.2.3, downloaded from
http://URL/pkg.tar.gz
with SHA256 checksum
9d0b6bec790e309d799e91263b2f09be523ddc48a15b419d1a84100394198be3
```
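A commit message in this format, including the checksum, could be generated along the following lines. This is a minimal sketch: the function name, the wording used for brand-new packages, and the blank line after the subject (the usual git convention) are assumptions and not part of the proposal above.

```python
import hashlib


def commit_message(name, old_version, new_version, archive_url, archive_bytes):
    """Build a commit message following the format proposed above."""
    checksum = hashlib.sha256(archive_bytes).hexdigest()
    if old_version is None:
        # Wording for a first submission is an assumption.
        action = f"Add {name} {new_version}"
    else:
        action = f"Update {name} from {old_version} to {new_version}"
    return (
        f"{name} {new_version}\n"
        "\n"
        f"{action}, downloaded from\n"
        f"{archive_url}\n"
        "with SHA256 checksum\n"
        f"{checksum}\n"
    )
```

Recording the checksum this way would also make it possible to detect later whether an archive was changed without a version bump.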
## Q & A
* Q: What about pkg dir names that contain a version, like `anupq-1.2.3`, which is then updated to `anupq-1.2.4`?
- A1: we could normalize all directory names, i.e., use `LowercaseString(pkgname)` for all of them
- A2: the script could track the directory names; and effectively do this: `git rm -rf oldname && git add newname && git commit`
* Q: This system depends on GitHub; what do we do if GitHub goes away / becomes unusable for us for some reason?
* A: We switch to [GitLab](https://www.gitlab.com), [BitBucket](https://bitbucket.org/), [Gitea](https://gitea.io/), [Gogs](https://gogs.io), ..., which is possible because:
- all data is stored in a git repository (so fully transferable)
- "GH pull requests" need to be replaced by some other mechanism, but that's a very localized piece of code and will be comparatively easy to replace
- CI tests: we can use Travis, CircleCI, ... or even our own Jenkins
* Q: How are package updates picked up?
- A: by a cron job; this could be a GH Actions cron job running every 24 hours, or one hosted on some of our own servers; it would basically read all `PackageInfo.g` files, run the update script etc.
- this would then open a PR
- but package authors could also manually open a PR (this is useful if two or more packages must be updated simultaneously)
* Q: What about the `currentPackageInfoURLList` file? Don't we need a list of all packages in the distribution?
- A: We don't need it: the current URLs can automatically be extracted from the `PackageInfo.g` files.
- A: or if we need it, then the CI could as one of its tasks generate/update it for us
* Q: how to remove a package from distribution?
- A: create a pull request that removes the package(s)
* Q: But what about moving package URLs / the MOVE entry?
- A: just use the same submission script with the new URL
* Q: How to import the old data?
- A: Alex needs to give some of us access to the existing repositories; we then can write a script to import them
- A: alternatively, we just start fresh from scratch with the latest set of packages and ignore the history
- A: as a mix, one could take all old package distribution versions from previous releases (downloaded from our file server) and generate a repository from them, with tags for the corresponding GAP releases. While that would not be complete, it would at least give some limited access to older versions
* Q: What if multiple packages update simultaneously and codependently?
- A: one should be able to combine them in one pull request
* Q: How to prepare individual package archives for the redistribution?
- A1: we could create them using `git archive` and/or some other custom script which just wraps a subdirectory
- A2: we could just stop doing that, or rather take the original archive(s) and possibly generate the "missing" ones (so if we only get a .tar.gz, we create .tar.bz2, -win.zip)
Remark: if one needs an archive in one of the "missing" formats,
then we will still be preparing the GAP distribution in all these formats.
The real question is what the potential benefits of keeping
individual archives of packages are. There now seem to be fewer of
them than previously thought.
* Q: Do we keep the original downloaded archives of the packages?
- A: we do not keep them now, so it will be consistent if we do not plan to do this in the future
* Q: How to get the selection of packages redistributed with a given GAP release?
- A: this will correspond to a tag in the repository
* Q: What to do if a package has been withdrawn or changed its name?
- A: in both cases, a package corresponds to a directory in the repository, so the action is to remove this directory.
* Q: Where are the relevant script(s) kept?
- A: Could be in the root dir of the `pkg-dist` repository; could be in a separate repository (so they don't confuse "casual" users)
* Q: how to deal with binaries, esp. `manual.pdf` files?
- A: we could just add them in and hope it is OK (should first investigate how much storage they really use, though)
- A: we could also not store `manual.pdf` and instead re-generate them later on the fly
- A: or we could use [`git-lfs`](https://git-lfs.github.com), see also [docs for using git-lfs on GitHub](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage)
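Regarding the `currentPackageInfoURLList` question above, extracting the current URLs from the `PackageInfo.g` files in a checkout could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, and the regular expression only finds `PackageInfoURL` entries written as plain string literals, so packages that build the URL with `Concatenation` would need real GAP evaluation instead.

```python
import re
from pathlib import Path

URL_PATTERN = re.compile(r'^\s*PackageInfoURL\s*:=\s*"([^"]*)"', re.MULTILINE)


def package_info_urls(pkg_root) -> dict:
    """Map each package directory name to the PackageInfoURL it declares."""
    urls = {}
    for info_file in sorted(Path(pkg_root).glob("*/PackageInfo.g")):
        text = info_file.read_text(encoding="utf-8", errors="replace")
        match = URL_PATTERN.search(text)
        if match:  # skip files whose URL is not a plain string literal
            urls[info_file.parent.name] = match.group(1)
    return urls
```

The same walk over `*/PackageInfo.g` is what a cron job would use as input for the update script, and a CI task could write the result out as the URL list if it turns out to be needed.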
### TODO / ideas
- mention new validation tool, integration with PR submission
- dealing with multiple package updates in a row
- prevent duplicate PRs for same package version
- maybe at least record package tarball checksums (e.g. in the commit message)?
- describe how packages are submitted
- describe what happens if people modify the package w/o changing the version
- dealing with differences between GAP master and GAP stable:
- when a new package release uses features only available in the GAP master branch, so we need to use its older version with the GAP stable branch
- when a package requires an update to work with the GAP master branch, and we use its latest release with the GAP stable branch, but its older version with the master branch
- when a package breaks the GAP master branch and has to be disabled there (i.e. the package version has been accepted for redistribution, but had some dormant bug that was revealed after some changes in GAP and now causes a break loop)
- which tests to run, and when/where?
- only quick test: Linux 64bit (mac, win and 32bit usually don't differ)
- run GAP's testinstall/teststandard/etc. with just that one package loaded
- use GAP docker container for tests