# Strict dependency management

## Summary

This feature would add a new mode to npm called "strict", which will initially be opt-in. The key characteristic of strict mode is that it guarantees that a package only has access to the dependencies it has declared in its package.json file.

## Motivation

The current default installation strategy used by npm is called "hoisting". Hoisting is a strategy that cleverly uses the Node.js resolution strategy to reduce the number of packages duplicated on disk.

The hoisting strategy comes with three issues:

1. Not all dependencies can be de-duplicated.
   - This creates performance problems when the dependency graph is complex enough.
2. Dependencies are over-shared.
   - This lets packages access packages that they do not depend on.
3. Some peer-dependency graphs cannot be correctly fulfilled.

### Duplicated packages

The following example shows that the hoisting algorithm cannot always deduplicate packages.

#### Dependency graph

```
- repo
  - A@1.0.0
    - foo@1.0.0
  - B@1.0.0
    - foo@1.0.0
  - C@1.0.0
    - foo@2.0.0
  - D@1.0.0
    - foo@2.0.0
```

#### Installation on disk

```
- repo
  - node_modules
    - A
    - B
    - C
      - node_modules
        - foo (2.0.0)
    - D
      - node_modules
        - foo (2.0.0)
    - foo (1.0.0)
```

#### Consequence

The package foo@2.0.0 is duplicated on disk.

### Dependency over sharing

When installing the following dependency graph on disk, the hoisting strategy overshares certain dependencies.

#### Dependency graph

```
- root
  - foo
    - bar
```

#### Installation on disk

```
- root
  - node_modules
    - foo
    - bar
```

#### Consequence

Code in the package `root` can directly import code in the package `bar` even though `root` has no dependency on `bar`.

### Wrong dependency graph

This example demonstrates a case where the dependency graph cannot be correctly installed on disk.

#### Dependency graph

```
- root
  - A@1.0.0
    - B@1.0.0
    - C@1.0.0
      - B@* (peerDependency)
      - D@1.0.0
        - B@2.0.0
        - C@1.0.0 (circular dependency)
```

#### Installation on disk

The hoisting algorithm cannot fulfill this dependency graph, so it installs something on disk that almost fulfills it. See repro [here](https://github.com/VincentBailly/hoisting-and-peer-dependencies).

#### Consequence

The dependency chain root -> A -> C -> D -> C -> B should resolve to B v2.0.0, because D depends on B@2.0.0 and C takes B as a peer dependency, but it resolves to B v1.0.0.

## Rationale and alternatives

## Current solutions

npm currently has no way to solve these issues. Other package managers have solved them with the following strategies.

### Existing strategies

#### PlugAndPlay (PnP)

This is a protocol [introduced by yarn](https://next.yarnpkg.com/features/pnp) which modifies the Node.js runtime to communicate the real dependency graph independently of how things are laid out on disk.

#### Symlinking

This is an approach developed by the [pnpm package manager](https://pnpm.js.org/).

### Preferred strategy for npm

The symlink strategy is preferred over PnP for the following reasons:

- Symlinking is the strategy recommended by the [Node.js documentation](https://nodejs.org/api/modules.html#modules_addenda_package_manager_tips).
- It is less disruptive.
- It does not need buy-in from the ecosystem (TypeScript and VSCode do not support PnP).
- PnP's low adoption makes it hard to get data on the viability of this approach. Meanwhile, the symlink strategy has been used successfully for years by a few large repositories at Microsoft.

## Implementation

The implementation consists of installing all the packages side by side in a single folder instead of nesting them. The dependencies between packages are expressed as symlinks between them. This removes the constraint of having to convert a graph into a tree: any shape of graph can be fully represented on disk.

### How does it work?

This strategy is based on the following characteristics of the [Node.js module resolution algorithm](https://nodejs.org/api/modules.html#modules_all_together):

- When a package is being resolved, the resolution algorithm follows symlinks as if they were real folders.
- Once a module is resolved, the resolution algorithm calls `realpath()` on the result. This means that the resolution algorithm always returns a real path.

This makes it possible to set up an arbitrarily complex dependency graph while making sure Node.js does not create more than one instance of a given module.

### Simple example

#### Dependency graph

```
- root
  - A@1.0.0
    - B@1.0.0
```

#### Installation on disk

```
- root
  - .store
    - A@1.0.0
      - node_modules
        - B -> ../../B@1.0.0
    - B@1.0.0
  - node_modules
    - A -> ../.store/A@1.0.0
```

### More complex example: peer dependencies

#### Dependency graph

```
- root
  - A@1.0.0
    - B@1.0.0
    - C@1.0.0
      - B@* (peerDependency)
      - D@1.0.0
        - B@2.0.0
        - C@1.0.0 (circular dependency)
```

#### Installation on disk

```
- root
  - .store
    - A@1.0.0
      - node_modules
        - B -> ../../B@1.0.0
        - C -> ../../C@1.0.0+B@1.0.0
    - B@1.0.0
    - B@2.0.0
    - C@1.0.0+B@1.0.0
      - node_modules
        - B -> ../../B@1.0.0
        - D -> ../../D@1.0.0
    - C@1.0.0+B@2.0.0
      - node_modules
        - B -> ../../B@2.0.0
        - D -> ../../D@1.0.0
    - D@1.0.0
      - node_modules
        - B -> ../../B@2.0.0
        - C -> ../../C@1.0.0+B@2.0.0
  - node_modules
    - A -> ../.store/A@1.0.0
```

## Consequences

The hoisting algorithm has been the default in the JS community for a while. Because it makes it possible to use some code without declaring a dependency on it, many mistakes (undeclared dependencies) have been shipped to the npm registry over the years. All these mistakes will resurface with a strict installation strategy and will need to be fixed.

Strict dependency installation makes the development environment more predictable, which opens the door to new kinds of optimizations such as faster incremental installation or sharing dependencies across repositories.

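The store layout from the simple example under Implementation can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions, not npm's implementation: the `Graph` shape and the `linkDeps` and `installStrict` helpers are hypothetical names, packages are represented by empty folders instead of downloaded tarballs, and the script assumes it is allowed to create symlinks (see the Windows question below).

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical input shape: store entry -> { dependency name -> store entry }.
type Graph = Record<string, Record<string, string>>;

// The "simple example" graph: root depends on A@1.0.0, which depends on B@1.0.0.
const graph: Graph = {
  "A@1.0.0": { B: "B@1.0.0" },
  "B@1.0.0": {},
};
const rootDeps: Record<string, string> = { A: "A@1.0.0" };

// Create one relative symlink per declared dependency inside `nodeModules`.
function linkDeps(nodeModules: string, deps: Record<string, string>, store: string): void {
  const entries = Object.entries(deps);
  if (entries.length === 0) return; // no dependencies: no node_modules folder
  fs.mkdirSync(nodeModules, { recursive: true });
  for (const [name, storeKey] of entries) {
    const target = path.relative(nodeModules, path.join(store, storeKey));
    fs.symlinkSync(target, path.join(nodeModules, name), "dir");
  }
}

// Lay every package out side by side in `.store`, then link declared dependencies.
function installStrict(root: string, g: Graph, direct: Record<string, string>): void {
  const store = path.join(root, ".store");
  for (const [storeKey, deps] of Object.entries(g)) {
    const pkgDir = path.join(store, storeKey);
    fs.mkdirSync(pkgDir, { recursive: true }); // stand-in for extracting the package
    linkDeps(path.join(pkgDir, "node_modules"), deps, store);
  }
  linkDeps(path.join(root, "node_modules"), direct, store);
}

const root = fs.mkdtempSync(path.join(os.tmpdir(), "strict-"));
installStrict(root, graph, rootDeps);

// Following the symlinks root -> A -> B ends at the single real copy of B in the store.
const viaA = fs.realpathSync(path.join(root, "node_modules", "A", "node_modules", "B"));
const direct = fs.realpathSync(path.join(root, ".store", "B@1.0.0"));
// root never declared B, so nothing named B exists in root/node_modules.
const rootSeesB = fs.existsSync(path.join(root, "node_modules", "B"));

console.log({ sameRealPath: viaA === direct, rootSeesB });
```

Computing each link target with `path.relative` keeps the links relative, so the tree can be moved as a whole. The two checks at the end mirror the two properties the proposal relies on: every path to B collapses to one real path, and `root` has no entry for a package it did not declare.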
## Unresolved Questions and Bikeshedding

- Can circular symlinks be a problem?
- Should we use symlinks or junctions on Windows? Both of them have drawbacks:
  - Junctions have to be represented by an absolute path, which means that junctions cannot be committed to git or packed into a package.
  - Symlinks can only be created in an elevated shell [or when Windows is in "developer mode"](https://blogs.windows.com/windowsdeveloper/2016/12/02/symlinks-windows-10/#LCiVBWTgQF5s7fmL.97).
- Will it make it hard for developers to visualize and understand the dependency tree?
- Should the store folder be in the repo itself?
  - If yes, every package in the store will have access to the dependencies of the git repo, because they will have access to its node_modules folder (`../../node_modules`).
  - If no, where? Should it be shared with other repositories installed on the system?
- Should we share configuration with pnpm?