# Parallelization of ngcc
One of the most impactful ways to improve ngcc performance is to parallelize its operations. ngcc must compile many entrypoints from `node_modules`, and often these compilation tasks have no interdependencies and could be performed in parallel.
This document describes how this would functionally be implemented in ngcc.
[TOC]
## Options for parallelizing Node programs
Node is famously single-threaded. Most options for parallelizing the execution of tasks leverage multiple worker processes, with IPC between the original "master" process and a number of "worker" processes which handle the underlying operations. Broadly speaking, there are 3 options worth considering:
### Option 1: `child_process`
This is the most straightforward approach, where ngcc would `fork()` itself repeatedly via the Node builtin `child_process.fork()`. A number of workers would be forked off, and the main ngcc process would then pass messages to them via IPC (likely via some JSON serialization over stdin/stdout) to indicate which entrypoints each worker should compile.
This option requires ngcc to implement the IPC itself, as well as the management tasks related to tracking worker health, handling shutdown of child processes, etc.
### Option 2: `cluster`
`cluster` is an NPM package which wraps the `child_process` API into something more convenient for splitting tasks off as separate workers. It handles IPC and the bookkeeping/cleanup tasks, and allows the programmer to focus on the actual parallelization of the work.
Using `cluster` would incur an additional NPM dependency for the compiler.
### Option 3: `worker_threads`
`worker_threads` is a new API in Node 10.5+ which supports true multi-threading, as opposed to multi-process. This means that threads can share memory (via exchanging/sharing `ArrayBuffer`s).
It's currently an unstable, experimental API, but has the potential to be more efficient than a multi-process approach.
### Recommendation
Despite requiring an additional NPM dependency, `cluster` is the most appealing option. It's stable, well-tested, handles a lot of edge cases that would require additional work for ngcc to handle on its own, and provides a very clean API.
## Current single-threaded ngcc algorithm
ngcc's work can roughly be divided into 3 phases:
1. Dependency scanning
Beginning with an entrypoint or the whole of `node_modules`, ngcc identifies which packages contain Angular View Engine entrypoints. It then parses a particular format of each entrypoint and extracts the dependency edges to other entrypoints by looking at `import` statements.
It continues this until it's built a DAG describing the dependency structure of all the entrypoints it needs to compile.
2. Planning
In this phase, ngcc looks at the DAG and determines an order of operations for compiling entrypoints. This ordering ensures that all of an entrypoint's dependencies have been compiled before the entrypoint itself is.
3. Execution
Finally, ngcc iterates through the operations in its list, currently compiling one entrypoint at a time. An entrypoint may be compiled multiple times if multiple formats have been requested.
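The scanning and planning phases above can be sketched as a topological sort over the dependency DAG. The graph type and entrypoint names here are illustrative stand-ins, not ngcc's real data structures:

```typescript
// Sketch only: entrypoint name -> names of the entrypoints it depends on.
type DepGraph = Map<string, string[]>;

// Produce a compilation order in which every entrypoint appears after all
// of its dependencies (a depth-first topological sort of the DAG).
function planCompilation(graph: DepGraph): string[] {
  const order: string[] = [];
  const visited = new Set<string>();
  const visit = (ep: string) => {
    if (visited.has(ep)) return;
    visited.add(ep);
    for (const dep of graph.get(ep) || []) visit(dep);
    order.push(ep);  // all dependencies are already in `order`
  };
  for (const ep of Array.from(graph.keys())) visit(ep);
  return order;
}

// C depends on both A and B; A depends on B.
const graph: DepGraph = new Map([
  ['C', ['A', 'B']],
  ['A', ['B']],
  ['B', []],
]);
const order = planCompilation(graph);
```

This produces one valid serial ordering; the later "work selection" section relaxes this into a scheme where any task whose dependencies are done may start.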
## Parallel ngcc algorithm
All 3 phases of the above algorithm could potentially be parallelized. We will first discuss the expensive operation of compilation, and then consider the possibility of parallelizing the scanning phase.
### Parallelizing compilation
In ngcc, the information tracked about an entrypoint is described by the `EntryPoint` interface. This includes things like the entrypoint name, various paths (e.g. to source or .d.ts files), and the parsed `package.json` contents for the entrypoint. This interface is serializable, so it could easily be sent to a worker process for execution.
Currently, multiple compilations may occur for a single `EntryPoint`, due to the difference in formats. Since it's possible to compile two formats in parallel, a separate `Task` interface will be created representing the actual unit of work which will be sent to the worker:
```typescript
interface Task {
  entryPoint: EntryPoint;
  formatProperties: string[];
  processDts: boolean;
}
```
This combines both the `entryPoint` being targeted and the list of `formatProperties` - properties in the `package.json` which all point to the same format being compiled. The `processDts` flag indicates whether that particular compilation should emit `.d.ts` files for the given entrypoint as well. This only needs to happen once per entrypoint, regardless of how many formats are compiled for it.
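As a sketch, expanding an entrypoint into `Task`s might look like the following. The trimmed-down `EntryPoint` shape and the `formatGroups` parameter (groups of `package.json` properties that point at the same format) are assumptions for illustration:

```typescript
// Simplified stand-in for ngcc's EntryPoint interface.
interface EntryPoint {
  name: string;
  packageJson: Record<string, string>;
}

interface Task {
  entryPoint: EntryPoint;
  formatProperties: string[];
  processDts: boolean;
}

// Hypothetical expansion: one Task per group of properties sharing a format,
// with .d.ts emit requested for the first task only.
function tasksForEntryPoint(entryPoint: EntryPoint, formatGroups: string[][]): Task[] {
  return formatGroups.map((formatProperties, i) => ({
    entryPoint,
    formatProperties,
    processDts: i === 0,  // typings are emitted exactly once per entrypoint
  }));
}

const ep: EntryPoint = {name: '@angular/core', packageJson: {}};
const tasks = tasksForEntryPoint(ep, [['fesm2015', 'es2015'], ['main']]);
```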
### Changes to the planning phase
The current planning ngcc does is insufficient for a parallelized compilation phase. Currently, ngcc compiles one entrypoint at a time, picking each task off the front of the queue until there are no more entrypoints to compile. This maintains an important invariant: an entrypoint will not be compiled until all of its dependencies have been.
Consider a simple dependency graph:
```graphviz
digraph depgraph {
  C -> {A, B}
}
```
To compile entrypoint `C`, we first need to compile `A` and `B`. So an ordered list of operations for ngcc today might be: `[A, B, C]`.
If this is parallelized across two workers, then one worker would compile `A` while the other compiles `B`. So far, so good. When the first of those tasks finishes, though, the now-idle worker cannot start `C`, since one of its dependencies is still being compiled.
Thus, ngcc's planning phase will need adjusting. The output of planning should be a `Task[]`, indicating the rough order of operations. However, the work selection algorithm will need to be more sophisticated.
### Work selection
Work selection is a process which runs on the master, and is triggered whenever an event happens (at the start of compilation, or when a worker finishes a task assigned to it). It proceeds as follows:
* While there are still workers without work, and tasks to process:
* Consider the next task in the queue.
* If all of its dependencies have at least had their typings compiled, assign it to a worker.
* If not, skip it.
* If out of tasks, stop. There may still be idle workers, but there is no work for them to safely perform.
This process could be optimized somewhat (for example, by having the master proactively scan the queue and mark as "safe" any tasks which could be started at any time). Dependencies of a task could be cached so they don't have to be re-checked with each consideration.
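The selection loop above can be sketched as a pure function on the master. The shapes here are simplified assumptions: tasks are identified by entrypoint name, and `typingsDone` is the set of entrypoints whose typings have already been compiled:

```typescript
interface PendingTask {
  name: string;
  deps: string[];  // entrypoints whose typings must be compiled first
}

// Pick as many safe tasks as there are idle workers. A task is safe once
// every one of its dependencies has at least had its typings compiled.
function selectWork(
    queue: PendingTask[], typingsDone: Set<string>, idleWorkers: number): PendingTask[] {
  const assigned: PendingTask[] = [];
  for (const task of queue) {
    if (assigned.length >= idleWorkers) break;
    if (task.deps.every(dep => typingsDone.has(dep))) {
      assigned.push(task);  // safe to start now
    }
    // Otherwise skip it; it stays in the queue for a later round.
  }
  return assigned;
}

const queue = [
  {name: 'C', deps: ['A', 'B']},
  {name: 'D', deps: ['A']},
];
// A is done, B is still compiling: only D is safe to assign.
const assigned = selectWork(queue, new Set(['A']), 2);
```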
### IPC
To assign a worker a task, the master will send it a message containing the `Task` object.
When the worker completes the task, it will either send back a message indicating success or failure. If failure, the worker will include a diagnostic message to be reported by the master which explains the failure.
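The message shapes could be modeled as a discriminated union, which keeps handlers on both sides type-safe. The field names here are illustrative, not a settled protocol:

```typescript
// Master -> worker: a task assignment.
interface TaskMessage {
  kind: 'task';
  entryPointName: string;
  formatProperties: string[];
  processDts: boolean;
}

// Worker -> master: the outcome of a task.
interface ResultMessage {
  kind: 'result';
  success: boolean;
  // Present only on failure: a diagnostic for the master to report.
  diagnostic?: string;
}

type NgccMessage = TaskMessage | ResultMessage;

// Narrowing on `kind` lets TypeScript check each branch's fields.
function describeMessage(msg: NgccMessage): string {
  if (msg.kind === 'result') {
    return msg.success ? 'ok' : `failed: ${msg.diagnostic}`;
  }
  return `task for ${msg.entryPointName}`;
}

const summary = describeMessage({kind: 'result', success: false, diagnostic: 'bad import'});
```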
### Potential parallelization of scanning
The task of scanning `node_modules` for candidate entrypoints to compile could potentially be parallelized as well. It's a much lower priority than compilation simply because it represents a significantly smaller potential win, but since this phase can scale to several seconds for larger projects, it might be worth considering.
## Avoiding race conditions
In general, the compilation of the specific format of an entrypoint is independent of any other compilations going on. However, care must be taken to not introduce race conditions when writing files.
### package.json
`package.json` is one of these special cases. Each compilation updates the `package.json` fields related to its specific format, and compilations of different formats of the same entrypoint can run in parallel. Concurrent writes from multiple workers could clobber one another, so writes to `package.json` need to be serialized through the master process.
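A minimal sketch of that serialization: workers report which format property they finished, and only the master mutates the parsed `package.json`. The update shape and the property naming used below are assumptions for illustration:

```typescript
// Worker -> master: a single completed format for one entrypoint.
interface PackageJsonUpdate {
  entryPointName: string;
  property: string;   // e.g. 'fesm2015'
  newValue: string;   // path to the compiled output
}

// One parsed package.json object per entrypoint, owned exclusively by the
// master, so updates cannot race.
const packageJsons = new Map<string, Record<string, string>>();

function applyUpdate(update: PackageJsonUpdate): void {
  const pkg = packageJsons.get(update.entryPointName) || {};
  // Hypothetical marker property for the compiled output.
  pkg[update.property + '_compiled'] = update.newValue;
  packageJsons.set(update.entryPointName, pkg);
  // A real implementation would also write the file back to disk here.
}

applyUpdate({entryPointName: '@angular/core', property: 'fesm2015', newValue: 'out/fesm2015'});
```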
## Design
Since ngcc needs to be able to run either in parallel or serially, a solid abstraction is needed to avoid maintaining two separate flows. Ideally, the choice of serial or parallel execution should be invisible to the main ngcc "flow". Since execution may now be asynchronous, however, the main ngcc flow must also potentially be asynchronous.
### `Executor` interface
One way to achieve this independence is to hide the compilation behind an `Executor` interface, which is responsible for performing ngcc's work either synchronously or asynchronously depending on the chosen strategy. A possible specification of this interface is:
```typescript
interface Executor {
  execute(analyzeFn: AnalyzeFn, compiler: NgccTaskCompiler): Promise<void>|void;
}

type AnalyzeFn = () => {
  queue: TaskQueue;
  graph: DepGraph<EntryPoint>;
};
```
The inputs to `execute` are an `AnalyzeFn`, a function which can compute the work needing to be done (the ngcc scanning/planning phases), and an `NgccTaskCompiler`, a separate class which wraps compilation of a particular `Task`.
`TaskQueue` is simply an abstraction for a `Task[]` with some utility methods for popping the next task of interest.
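To make the abstraction concrete, here is a sketch of a serial executor against simplified stand-ins for `TaskQueue` and the compiler (the real interfaces carry more state, and a cluster-based executor would satisfy the same `Executor` contract):

```typescript
// Simplified stand-ins for ngcc's real types.
interface Task { name: string; }
interface AnalyzeResult { queue: Task[]; }
type AnalyzeFn = () => AnalyzeResult;

interface TaskCompiler {
  compile(task: Task): void;
}

interface Executor {
  execute(analyzeFn: AnalyzeFn, compiler: TaskCompiler): Promise<void>|void;
}

// The serial executor runs everything inline. Because main() only talks to
// the Executor interface, swapping in a parallel implementation requires no
// changes to the main flow.
class SingleProcessExecutor implements Executor {
  execute(analyzeFn: AnalyzeFn, compiler: TaskCompiler): void {
    const {queue} = analyzeFn();
    for (const task of queue) compiler.compile(task);
  }
}

const compiled: string[] = [];
new SingleProcessExecutor().execute(
    () => ({queue: [{name: 'A'}, {name: 'B'}]}),
    {compile: task => compiled.push(task.name)});
```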
### Flow with `cluster`
`cluster` creates workers by doing a `fork`/`exec` of the current process. Thus, each worker will behave as if the ngcc command was re-run from the beginning, going through the same flow as before. Since the main ngcc flow has no knowledge of the split, workers will go through the entire `main()`, including calling `Executor.execute()`. This means that the `main` function shouldn't do any work unless the executor directs it to.
The `AnalyzeFn` allows the scanning/planning phase to be deferred until the executor can make the determination to call it. This way, the executor that uses `cluster` to parallelize compilations can only call the `AnalyzeFn` the first time through, when running on the master.
## Consideration: parallelizing the CLI's usage of ngcc
The above design only really works when ngcc is invoked directly. Several issues prevent the CLI from using parallel ngcc:
1) The CLI needs a synchronous answer.
When the CLI calls ngcc, it does so by intercepting the `ts.CompilerHost.resolveModuleNames` function during program creation. This function expects a synchronous answer - there is no room for the CLI to wait on promises or callbacks.
2) `cluster.fork()` won't quite work in the CLI
`cluster.fork()` by default is equivalent to creating a new execution of the current process, meaning in the CLI it would start a new CLI compilation. We would need to invest some effort into splitting the worker entrypoint out of ngcc into a separate JS file and configuring `cluster` to start workers from it instead.
3) Overhead
`fork()` is not a cheap operation, and repeating it (several times) every time the CLI needs ngcc to compile a new module would get very costly in terms of overhead.
### On synchronicity and overhead
It's possible to conceive of a design to run ngcc in parallel in the CLI and deal with the synchronicity and overhead issues. This section stops short of proposing a full design, but sketches out a potential architecture.
#### Waiting on async operations behind a sync interface
Solving this problem is critical to invoking a parallel (and therefore asynchronous) ngcc operation behind the synchronous TS interface. As it turns out, there are several techniques for doing this. The general formula is:
1) Spawn a separate thread or process to actually perform the async work.
2) Block the CLI process using some synchronous API.
3) Signal the main CLI process, causing it to unblock and continue.
One approach to doing this is to make use of `child_process.execSync`. This function can invoke a Node sub-process which may perform async work, and block synchronously until that operation completes.
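A minimal sketch of the `execSync` approach, where the inline `-e` script stands in for a real worker entrypoint file:

```typescript
import {execSync} from 'child_process';

// The parent blocks synchronously while the child performs async work.
function runAsyncWorkSync(script: string): string {
  // execSync does not return until the child exits, even though the child
  // is free to await timers and promises internally.
  return execSync(`"${process.execPath}" -e "${script}"`, {encoding: 'utf8'});
}

// The setTimeout stands in for an async compilation; the parent still gets
// the result synchronously.
const output = runAsyncWorkSync("setTimeout(() => console.log('compiled'), 50);");
```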
Another approach (which sadly won't work on Windows) is to make use of a filesystem FIFO object to communicate completion. `fs.readFileSync` will block on a FIFO until data is available. This would allow communication to and from a long-running process, instead of needing to spawn a new process every time. Unfortunately Windows does not have FIFOs, and the rough equivalent (named pipes) are not exposed to Node.
A final approach is to make use of one of the NPM packages like `deasync` which use node's C++ bindings to implement such blocking operations using threading internally.
#### Dealing with overhead
Rather than spawning multiple workers with each compilation request, the workers should be spun up lazily the first time the CLI requests ngcc to do work, and left running. Each subsequent request would be routed to an idle worker (or more than one, if dependencies need to be compiled), and the CLI process would block waiting for completion using the mechanism above. This would allow ngcc to be truly parallel inside the CLI without the overhead of constant process creation.
#### Limitations
One issue with the above is that the CLI would still be feeding ngcc one entrypoint at a time to compile. This inherently serializes the compilation somewhat, so it's only the compilation of dependencies of a given entrypoint which can happen in parallel. With some work, perhaps the CLI could continue creating the program when the `.d.ts` compilations are done, and ngcc could continue in the background with compilation of other entrypoint formats in parallel with future work. Still, this is a significant limitation to the usefulness of parallelism within the context of the `ts.CompilerHost.resolveModuleNames` API that the CLI hooks for ngcc.
## Prototype
A [prototype](https://github.com/alxhub/angular/tree/ngcc/ll) of the above exists. It's missing a few features and needs substantial testing. In particular:
* [ ] `package.json` writes need to be serialized through the master. An interface exists to abstract this, but the actual serialization isn't implemented.
* [ ] unit tests are needed for the executors (perhaps mocking `cluster` somehow?)
* [ ] an integration test is needed for the actual parallelization
* [ ] configuration should determine whether serial or parallel strategies are used
* [ ] the CLI API should be exclusively serial
* [ ] generally the code should be cleaned up & commented more thoroughly