RFC: Nimble Caching and Package Structure

Intro

Some folks working on Nim tooling were discussing Nimble and how it works with package caches. The topic also overlaps with how Nimble parses information from the Nimble file in each project, which plays a role in properly using SAT for package resolution. Currently Nimble mixes a Go-like vendoring style with a package-manager style. Unfortunately this ends up with the worst aspects of both, as discussed in other RFCs (Nim RFC 524, Nimble RFC 653).

Currently Nimble uses Git to clone a package from a repository URL. This cloned repo is used to parse the project's Nimble file and find tag information for versions. The "source files" folder is then copied into Nimble's cache directory, the metadata is cached, and the cloned repo is removed. The repo remains the de facto source of information for the package metadata while the source files are transformed into a new structure. This process runs into many issues, including cloning a repo multiple times for different versions and inconsistencies in metadata.

This RFC builds on an idea that arnetheduck suggested. The idea is that Nimble should instead keep git clones in its global cache and "instantiate" their work trees in a local project folder. This would resolve many issues with networking, storage, and locality.

Nimble could further embrace a traditional package manager design where a central metadata server would be used to store versioning information and possibly the files themselves. However, this would require a large change in how the Nim and Nimble ecosystem works. It also requires dedicated resources to maintain such a system, especially from a security standpoint. Building on resources like GitHub, GitLab, etc. offloads those issues and allows users flexibility in their setup.

Proposed Design

Git repos would be cloned as bare repos into $NIMBLE_DIR/cache/. The other Nimble file structures could be maintained as they are. Installing dependencies for a project would then instantiate them in a local nimcache folder inside that project. A global cache could still be used via something similar to npm's -g flag, which could switch back to using $NIMBLE_DIR/pkgs2.

Other DVCSs like Mercurial should be treated similarly to Git.

This approach would allow Nimble to clone a repository once. This repo would be headless and global and could be used for both finding Nimble metadata and as the local cache of project files.

These repos could be quickly and efficiently updated, which would be a large improvement over the current state of affairs with Nimble. Keeping them headless could reduce confusion about which files are used by Nim.
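
As a rough illustration of the flow described above (clone once as a bare repo, update it in place, then instantiate a commit into a project), the sketch below only builds the Git command lines without running them. The function names and the choice of `git worktree add --detach` for instantiation are assumptions made for illustration; this RFC does not prescribe a specific mechanism, and `git archive` or a plain checkout into a separate work tree would be alternatives.

```python
from pathlib import Path


def bare_clone_cmd(url: str, cache_dir: Path, name: str) -> list[str]:
    """Command to clone `url` once as a bare repo under the global cache."""
    return ["git", "clone", "--bare", url, str(cache_dir / name)]


def update_cmd(cache_dir: Path, name: str) -> list[str]:
    """Command to refresh branches and tags of an already cached bare repo.

    A plain bare clone has no default fetch refspec configured,
    so one is passed explicitly here.
    """
    return [
        "git", f"--git-dir={cache_dir / name}",
        "fetch", "--tags", "origin", "+refs/heads/*:refs/heads/*",
    ]


def instantiate_cmd(cache_dir: Path, name: str, commit: str, dest: Path) -> list[str]:
    """Command to check out `commit` from the cached repo into `dest`.

    Hypothetical choice: a detached worktree, so no branch is checked out.
    """
    return [
        "git", f"--git-dir={cache_dir / name}",
        "worktree", "add", "--detach", str(dest), commit,
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Keeping the commands as data makes it easy to log or dry-run what Nimble would do.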

There are some downsides, such as increased disk usage since the Git repos won't be dropped anymore. However, this likely won't be a significant issue for most developers (e.g. compare it to the average npm or Cargo package cache). Network speed and re-cloning repos are often a much bigger burden for those without fast internet connections.

Example Layout

The global structure would look something like:

/Users/elcritch/.nimble/nimbledata.json
/Users/elcritch/.nimble/bin/nimlangserver
/Users/elcritch/.nimble/cache
/Users/elcritch/.nimble/cache/bearssl # bare git repo
/Users/elcritch/.nimble/cache/bearssl/HEAD
/Users/elcritch/.nimble/cache/bearssl/config
/Users/elcritch/.nimble/cache/bearssl/objects
/Users/elcritch/.nimble/cache/bearssl/...

Handling forked repos could be done similarly to Atlas's "name triplet" where name collisions are handled by renaming using a packagename.user.hostname format:

/Users/elcritch/.nimble/cache/bearssl.elcritch.github # forked repos using name triplets
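
A minimal sketch of how such a cache directory name might be derived from a clone URL. It assumes that the repo name doubles as the package name, that the hostname is shortened by dropping its top-level domain (so `github.com` becomes `github`, matching the example above), and that the URL has a `/user/repo` path. The helper names are hypothetical and are not taken from Atlas or Nimble.

```python
from urllib.parse import urlsplit


def split_repo_url(url: str) -> tuple[str, str, str]:
    """Return (repo, user, host) for an https-style clone URL."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    # Assumption: drop the top-level domain, "github.com" -> "github".
    host = labels[-2] if len(labels) >= 2 else labels[0]
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"expected a user/repo path in {url!r}")
    user, repo = segments[-2], segments[-1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return repo, user, host


def name_triplet(url: str) -> str:
    """Build the packagename.user.hostname form."""
    return ".".join(split_repo_url(url))


def cache_dir_name(url: str, taken: set[str]) -> str:
    """Use the short name unless another repo already claimed it."""
    repo = split_repo_url(url)[0]
    return name_triplet(url) if repo in taken else repo
```

With this scheme the first repo seen for a package keeps the short name, and only later forks pay the cost of the longer triplet.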

Then a project would look like:

my_project/my_project.nimble
my_project/src/my_project.nim
my_project/nimcache/
my_project/nimcache/nimbledata.json # cached metadata
my_project/nimcache/depstree.json # sat resolution
my_project/nimcache/bearssl-0.2.0-9e9b4c34bae17aa7218e7ce449128064ae5e1118
my_project/nimcache/bearssl-0.2.0-9e9b4c34bae17aa7218e7ce449128064ae5e1118/nimblemeta.json
my_project/nimcache/bearssl-0.2.0-9e9b4c34bae17aa7218e7ce449128064ae5e1118/bearssl.nimble
my_project/nimcache/bearssl-0.2.0-9e9b4c34bae17aa7218e7ce449128064ae5e1118/bearssl.nim
my_project/nimcache/bearssl-0.2.0-9e9b4c34bae17aa7218e7ce449128064ae5e1118/...
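
The instance folder names above follow a name-version-commit pattern. The sketch below shows one way to build and parse such names. It assumes a full 40-character lowercase hex commit hash and a version string without hyphens; both helper names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# name may contain hyphens; version may not (assumption); commit is 40 hex chars.
_INSTANCE_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<commit>[0-9a-f]{40})$"
)


def instance_dir(project: str, name: str, version: str, commit: str) -> PurePosixPath:
    """Path of a dependency instance inside the project's nimcache folder."""
    return PurePosixPath(project) / "nimcache" / f"{name}-{version}-{commit}"


def parse_instance_dir(dirname: str) -> tuple[str, str, str]:
    """Split a folder name back into (name, version, commit)."""
    match = _INSTANCE_RE.match(dirname)
    if match is None:
        raise ValueError(f"not an instance folder name: {dirname!r}")
    return match["name"], match["version"], match["commit"]
```

Including the commit hash in the folder name means two resolutions of the same version to different commits cannot silently share a folder.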

Considerations

Structure of Local Package Instances

Currently Nimble strips the package structure and only copies out specified source files. This leads to lots of issues, as noted in the aforementioned RFCs (Nim RFC 524, Nimble RFC 653).

This RFC recommends dropping this convention in Nimble, copying the full repo layout instead, and adjusting the Nim config paths to match.

Some packages rely on how Nimble strips out unneeded data directories. For this reason it may be beneficial to provide an excludedPaths option in Nimble files rather than the current opt-in approach used with srcDirs.
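
To make the opt-out idea concrete, here is a small sketch of how an excludedPaths list could filter the files that get instantiated. The semantics shown (each entry excludes that exact path and everything beneath it, relative to the repo root) are an assumption; the RFC does not define them, and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def is_excluded(path: str, excluded_paths: list[str]) -> bool:
    """True if `path` equals an excluded entry or lies underneath one."""
    candidate = PurePosixPath(path)
    for entry in excluded_paths:
        excluded = PurePosixPath(entry)
        if candidate == excluded or excluded in candidate.parents:
            return True
    return False


def files_to_instantiate(files: list[str], excluded_paths: list[str]) -> list[str]:
    """Keep every repo file that is not covered by excludedPaths."""
    return [f for f in files if not is_excluded(f, excluded_paths)]
```

Matching on whole path components, not string prefixes, keeps an entry like `tests` from accidentally excluding a file such as `testsuite.nim`.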

Path Configuration

This RFC also recommends using the nim.cfg configuration technique utilized by Atlas. When Atlas sets up a workspace, it just creates a nim.cfg in the current project which specifies the appropriate src directories for Nim to reference.

Configuring a nim.cfg that points to the correct project folders in the local nimcache/ folder would reduce the need for the getPathsClause approach that Nimble has been moving toward. A nim.cfg file also makes it significantly simpler for developers to see which packages are actually used, and it avoids needing to modify Nimble tasks.
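
As an illustration, the sketch below renders the kind of nim.cfg content this would involve: it disables the default Nimble path lookup with `--noNimblePath` and adds one `--path` entry per dependency. The function name is hypothetical, and the exact content Nimble would generate (for example marker comments around the section, as Atlas uses) is left open.

```python
def render_nim_cfg(dep_src_dirs: list[str]) -> str:
    """Render nim.cfg text with one --path entry per dependency source dir."""
    lines = ["--noNimblePath"]  # rely only on the explicit paths below
    lines += [f'--path:"{src_dir}"' for src_dir in dep_src_dirs]
    return "\n".join(lines) + "\n"
```

For the example project layout, calling it with `["nimcache/bearssl-0.2.0-9e9b4c34bae17aa7218e7ce449128064ae5e1118"]` yields a two-line config that a developer can read directly to see which package instances the compiler will use.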

Metadata and Git Tags

One downside of not using a central packaging server to maintain metadata is that the metadata for each package can change whenever its Git repo is updated. This currently causes headaches with Nimble, which can resolve the same version to different hashes.

This RFC recommends recording the HEAD commit of each repo when packages are resolved. Furthermore, Nimble should correctly resolve to the latest appropriate version whenever the user does a Nimble install or update.
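
A minimal sketch of what recording the HEAD commit at resolution time could look like. It picks the highest numeric version among a repo's tags and stores it together with the tagged commit and the repo HEAD seen at that moment. The record field names, the tag format handling (an optional leading `v`, dotted integers only), and the omission of version-requirement filtering are all simplifying assumptions.

```python
from typing import Optional


def parse_version(tag: str) -> Optional[tuple[int, ...]]:
    """Parse tags like 'v0.2.0' or '0.2.0'; return None for anything else."""
    text = tag[1:] if tag.startswith("v") else tag
    parts = text.split(".")
    if not all(part.isdigit() for part in parts):
        return None
    return tuple(int(part) for part in parts)


def resolve_latest(name: str, head_commit: str, tags: dict[str, str]) -> dict[str, str]:
    """Pick the highest version tag and record it with the repo HEAD seen now.

    `tags` maps tag name -> commit hash, as read from the cached repo.
    """
    best: Optional[tuple[int, ...]] = None
    best_commit = ""
    for tag, commit in tags.items():
        version = parse_version(tag)
        if version is not None and (best is None or version > best):
            best, best_commit = version, commit
    if best is None:
        raise ValueError(f"no version tags found for {name}")
    return {
        "name": name,
        "version": ".".join(str(n) for n in best),
        "commit": best_commit,
        "repoHead": head_commit,  # hypothetical field name
    }
```

Storing the HEAD alongside the resolved commit lets a later run detect that the repo has moved on since resolution and decide whether to re-resolve.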