# Parallelization of ngcc
One of the most impactful ways to improve ngcc performance is to parallelize its operations. ngcc must compile many entrypoints from `node_modules`, and often these compilation tasks have no interdependencies and could be performed in parallel.
This document describes how this would functionally be implemented in ngcc.
[TOC]
## Options for parallelizing Node programs
Node is famously single-threaded. Most options for parallelizing the execution of tasks leverage multiple worker processes, with IPC between the original "master" process and a number of "worker" processes which handle the underlying operations. Broadly speaking, there are 3 options worth considering:
### Option 1: `child_process`
This is the most straightforward approach, where ngcc would `fork()` itself repeatedly via the Node builtin `child_process.fork()`. A number of workers would be forked off, and the main ngcc process would then pass messages to them via IPC (likely via some JSON serialization over stdin/stdout) to indicate which entrypoints each worker should compile.
This option requires ngcc to implement the IPC itself, as well as the management tasks related to tracking worker health, handling shutdown of child processes, etc.
### Option 2: `cluster`
`cluster` is an NPM package which wraps the `child_process` API into something more convenient for splitting tasks off as separate workers. It handles IPC and the bookkeeping/cleanup tasks, and allows the programmer to focus on the actual parallelization of the work.
Using `cluster` would incur an additional NPM dependency for the compiler.
### Option 3: `worker_threads`
`worker_threads` is a new API in Node 10.5+ which supports true multi-threading, as opposed to multi-process. This means that threads can share memory (via exchanging/sharing `ArrayBuffer`s).
It's currently an unstable, experimental API, but has the potential to be more efficient than a multi-process approach.
### Recommendation
Despite requiring an additional NPM dependency, `cluster` is the most appealing option. It's stable, well-tested, handles a lot of edge cases that would require additional work for ngcc to handle on its own, and provides a very clean API.
## Current single-threaded ngcc algorithm
ngcc's work can roughly be divided into 3 phases:
1. Dependency scanning
Beginning with an entrypoint or the whole of `node_modules`, ngcc identifies which packages contain Angular View Engine entrypoints. It then parses a particular format of each entrypoint and extracts the dependency edges to other entrypoints by looking at `import` statements.
It continues this until it's built a DAG describing the dependency structure of all the entrypoints it needs to compile.
2. Planning
In this phase, ngcc looks at the DAG and determines an order of operations for compiling entrypoints. This ordering ensures that all of an entrypoint's dependencies have been compiled before the entrypoint itself is.
3. Execution
Finally, ngcc iterates through the operations in its list, currently compiling one entrypoint at a time. An entrypoint may be compiled multiple times if multiple formats have been requested.
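The scanning and planning phases above can be sketched as a topological sort over the dependency DAG. The graph type and entrypoint names here are illustrative stand-ins, not ngcc's real data structures:

```typescript
// Sketch only: entrypoint name -> names of the entrypoints it depends on.
type DepGraph = Map<string, string[]>;

// Produce a compilation order in which every entrypoint appears after all
// of its dependencies (a depth-first topological sort of the DAG).
function planCompilation(graph: DepGraph): string[] {
  const order: string[] = [];
  const visited = new Set<string>();
  const visit = (ep: string) => {
    if (visited.has(ep)) return;
    visited.add(ep);
    for (const dep of graph.get(ep) || []) visit(dep);
    order.push(ep);  // all dependencies are already in `order`
  };
  for (const ep of Array.from(graph.keys())) visit(ep);
  return order;
}

// C depends on both A and B; A depends on B.
const graph: DepGraph = new Map([
  ['C', ['A', 'B']],
  ['A', ['B']],
  ['B', []],
]);
const order = planCompilation(graph);
```

This produces one valid serial ordering; the later "work selection" section relaxes this into a scheme where any task whose dependencies are done may start.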
## Parallel ngcc algorithm
All 3 phases of the above algorithm could potentially be parallelized. We will first discuss the expensive operation of compilation, and then consider the possibility of parallelizing the scanning phase.
### Parallelizing compilation
In ngcc, the information tracked about an entrypoint is described by the `EntryPoint` interface. This includes things like the entrypoint name, various paths (e.g. to source or .d.ts files), and the parsed `package.json` contents for the entrypoint. This interface is serializable, so it could easily be sent to a worker process for execution.
Currently, multiple compilations may occur for a single `EntryPoint`, due to the difference in formats. Since it's possible to compile two formats in parallel, a separate `Task` interface will be created representing the actual unit of work which will be sent to the worker:
```typescript
interface Task {
  entryPoint: EntryPoint;
  formatProperties: string[];
  processDts: boolean;
}
```
This combines both the `entryPoint` being targeted and the list of `formatProperties` - properties in the `package.json` which all point to the same format being compiled. The `processDts` flag indicates whether that particular compilation should emit `.d.ts` files for the given entrypoint as well. This only needs to happen once per entrypoint, regardless of how many formats are compiled for it.
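As a sketch, expanding an entrypoint into `Task`s might look like the following. The trimmed-down `EntryPoint` shape and the `formatGroups` parameter (groups of `package.json` properties that point at the same format) are assumptions for illustration:

```typescript
// Simplified stand-in for ngcc's EntryPoint interface.
interface EntryPoint {
  name: string;
  packageJson: Record<string, string>;
}

interface Task {
  entryPoint: EntryPoint;
  formatProperties: string[];
  processDts: boolean;
}

// Hypothetical expansion: one Task per group of properties sharing a format,
// with .d.ts emit requested for the first task only.
function tasksForEntryPoint(entryPoint: EntryPoint, formatGroups: string[][]): Task[] {
  return formatGroups.map((formatProperties, i) => ({
    entryPoint,
    formatProperties,
    processDts: i === 0,  // typings are emitted exactly once per entrypoint
  }));
}

const ep: EntryPoint = {name: '@angular/core', packageJson: {}};
const tasks = tasksForEntryPoint(ep, [['fesm2015', 'es2015'], ['main']]);
```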
### Changes to the planning phase
The current planning ngcc does is insufficient for a parallelized compilation phase. Currently, ngcc compiles one entrypoint at a time, picking each task off the front of the queue until there are no more entrypoints to compile. This maintains an important invariant: an entrypoint will not be compiled until all of its dependencies have been.
Consider a simple dependency graph:
```graphviz
digraph depgraph {
  C -> {A, B}
}
```
To compile entrypoint `C`, we first need to compile `A` and `B`. So an ordered list of operations for ngcc today might be: `[A, B, C]`.
If this is parallelized across two workers, then one worker would compile `A` while the other compiles `B`. So far, so good. When the first of those tasks finishes, though, the now-idle worker cannot start `C`, since one of its dependencies is still being compiled.
Thus, ngcc's planning phase will need adjusting. The output of planning should be a `Task[]`, indicating the rough order of operations. However, the work selection algorithm will need to be more sophisticated.
### Work selection
Work selection is a process which runs on the master, and is triggered whenever an event happens (at the start of compilation, or when a worker finishes a task assigned to it). It proceeds as follows:
* While there are still workers without work, and tasks to process:
* Consider the next task in the queue.
* If all of its dependencies have at least had their typings compiled, assign it to a worker.
* If not, skip it.
* If out of tasks, stop. There may still be idle workers, but there is no work for them to safely perform.
This process could be optimized somewhat (for example, by having the master proactively scan the queue and mark as "safe" any tasks which could be started at any time). Dependencies of a task could be cached so they don't have to be re-checked with each consideration.
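The selection loop above can be sketched as a pure function on the master. The shapes here are simplified assumptions: tasks are identified by entrypoint name, and `typingsDone` is the set of entrypoints whose typings have already been compiled:

```typescript
interface PendingTask {
  name: string;
  deps: string[];  // entrypoints whose typings must be compiled first
}

// Pick as many safe tasks as there are idle workers. A task is safe once
// every one of its dependencies has at least had its typings compiled.
function selectWork(
    queue: PendingTask[], typingsDone: Set<string>, idleWorkers: number): PendingTask[] {
  const assigned: PendingTask[] = [];
  for (const task of queue) {
    if (assigned.length >= idleWorkers) break;
    if (task.deps.every(dep => typingsDone.has(dep))) {
      assigned.push(task);  // safe to start now
    }
    // Otherwise skip it; it stays in the queue for a later round.
  }
  return assigned;
}

const queue = [
  {name: 'C', deps: ['A', 'B']},
  {name: 'D', deps: ['A']},
];
// A is done, B is still compiling: only D is safe to assign.
const assigned = selectWork(queue, new Set(['A']), 2);
```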
### IPC
To assign a worker a task, the master will send it a message containing the `Task` object.
When the worker completes the task, it will either send back a message indicating success or failure. If failure, the worker will include a diagnostic message to be reported by the master which explains the failure.
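The message shapes could be modeled as a discriminated union, which keeps handlers on both sides type-safe. The field names here are illustrative, not a settled protocol:

```typescript
// Master -> worker: a task assignment.
interface TaskMessage {
  kind: 'task';
  entryPointName: string;
  formatProperties: string[];
  processDts: boolean;
}

// Worker -> master: the outcome of a task.
interface ResultMessage {
  kind: 'result';
  success: boolean;
  // Present only on failure: a diagnostic for the master to report.
  diagnostic?: string;
}

type NgccMessage = TaskMessage | ResultMessage;

// Narrowing on `kind` lets TypeScript check each branch's fields.
function describeMessage(msg: NgccMessage): string {
  if (msg.kind === 'result') {
    return msg.success ? 'ok' : `failed: ${msg.diagnostic}`;
  }
  return `task for ${msg.entryPointName}`;
}

const summary = describeMessage({kind: 'result', success: false, diagnostic: 'bad import'});
```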
### Potential parallelization of scanning
The task of scanning `node_modules` for candidate entrypoints to compile could potentially be parallelized as well. It's a much lower priority than compilation simply because it represents a significantly smaller potential win, but since this phase can scale to several seconds for larger projects, it might be worth considering.
## Avoiding race conditions
In general, the compilation of the specific format of an entrypoint is independent of any other compilations going on. However, care must be taken to not introduce race conditions when writing files.
### package.json
`package.json` is one of these special cases. Each compilation updates the `package.json` fields related to its specific format, and compilations of different formats of the same entrypoint can run in parallel. Concurrent writes from multiple workers could clobber one another, so writes to `package.json` need to be serialized through the master process.
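A minimal sketch of that serialization: workers report which format property they finished, and only the master mutates the parsed `package.json`. The update shape and the property naming used below are assumptions for illustration:

```typescript
// Worker -> master: a single completed format for one entrypoint.
interface PackageJsonUpdate {
  entryPointName: string;
  property: string;   // e.g. 'fesm2015'
  newValue: string;   // path to the compiled output
}

// One parsed package.json object per entrypoint, owned exclusively by the
// master, so updates cannot race.
const packageJsons = new Map<string, Record<string, string>>();

function applyUpdate(update: PackageJsonUpdate): void {
  const pkg = packageJsons.get(update.entryPointName) || {};
  // Hypothetical marker property for the compiled output.
  pkg[update.property + '_compiled'] = update.newValue;
  packageJsons.set(update.entryPointName, pkg);
  // A real implementation would also write the file back to disk here.
}

applyUpdate({entryPointName: '@angular/core', property: 'fesm2015', newValue: 'out/fesm2015'});
```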
## Design
Since ngcc needs to be able to run either in parallel or serially, a solid abstraction is needed to avoid maintaining two separate flows. Ideally, the choice of serial or parallel execution should be invisible to the main ngcc "flow". Since execution may now be asynchronous, however, the main ngcc flow must also potentially be asynchronous.
### `Executor` interface
One way to achieve this independence is to hide the compilation behind an `Executor` interface, which is responsible for performing ngcc's work either synchronously or asynchronously depending on the chosen strategy. A possible specification of this interface is:
```typescript
interface Executor {
  execute(analyzeFn: AnalyzeFn, compiler: NgccTaskCompiler): Promise<void>|void;
}

type AnalyzeFn = () => {
  queue: TaskQueue;
  graph: DepGraph<EntryPoint>;
};
```
The inputs to `execute` are an `AnalyzeFn`, a function which can compute the work needing to be done (the ngcc scanning/planning phases), and an `NgccTaskCompiler`, a separate class which wraps compilation of a particular `Task`.
`TaskQueue` is simply an abstraction for a `Task[]` with some utility methods for popping the next task of interest.
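To make the abstraction concrete, here is a sketch of a serial executor against simplified stand-ins for `TaskQueue` and the compiler (the real interfaces carry more state, and a cluster-based executor would satisfy the same `Executor` contract):

```typescript
// Simplified stand-ins for ngcc's real types.
interface Task { name: string; }
interface AnalyzeResult { queue: Task[]; }
type AnalyzeFn = () => AnalyzeResult;

interface TaskCompiler {
  compile(task: Task): void;
}

interface Executor {
  execute(analyzeFn: AnalyzeFn, compiler: TaskCompiler): Promise<void>|void;
}

// The serial executor runs everything inline. Because main() only talks to
// the Executor interface, swapping in a parallel implementation requires no
// changes to the main flow.
class SingleProcessExecutor implements Executor {
  execute(analyzeFn: AnalyzeFn, compiler: TaskCompiler): void {
    const {queue} = analyzeFn();
    for (const task of queue) compiler.compile(task);
  }
}

const compiled: string[] = [];
new SingleProcessExecutor().execute(
    () => ({queue: [{name: 'A'}, {name: 'B'}]}),
    {compile: task => compiled.push(task.name)});
```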
### Flow with `cluster`
`cluster` creates workers by doing a `fork`/`exec` of the current process. Thus, each worker will behave as if the ngcc command was re-run from the beginning, going through the same flow as before. Since the main ngcc flow has no knowledge of the split, workers will go through the entire `main()`, including calling `Executor.execute()`. This means that the `main` function shouldn't do any work unless the executor directs it to.
The `AnalyzeFn` allows the scanning/planning phase to be deferred until the executor can make the determination to call it. This way, the executor that uses `cluster` to parallelize compilations can only call the `AnalyzeFn` the first time through, when running on the master.
## Consideration: parallelizing the CLI's usage of ngcc
The above design only really works when ngcc is invoked directly. Several issues prevent the CLI from using parallel ngcc:
1) The CLI needs a synchronous answer.
When the CLI calls ngcc, it does so by intercepting the `ts.CompilerHost.resolveModuleNames` function during program creation. This function expects a synchronous answer - there is no room for the CLI to wait on promises or callbacks.
2) `cluster.fork()` won't quite work in the CLI
`cluster.fork()` by default is equivalent to creating a new execution of the current process, meaning in the CLI it would start a new CLI compilation. We would need to invest some effort into splitting the worker entrypoint out of ngcc into a separate JS file and configuring `cluster` to start workers from it instead.
3) Overhead
`fork()` is not a cheap operation, and repeating it (several times) every time the CLI needs ngcc to compile a new module would get very costly in terms of overhead.
### On synchronicity and overhead
It's possible to conceive of a design to run ngcc in parallel in the CLI and deal with the synchronicity and overhead issues. This section stops short of proposing a full design, but sketches out a potential architecture.
#### Waiting on async operations behind a sync interface
Solving this problem is critical to invoking a parallel (and therefore asynchronous) ngcc operation behind the synchronous TS interface. As it turns out, there are several techniques for doing this. The general formula is:
1) Spawn a separate thread or process to actually perform the async work.
2) Block the CLI process using some synchronous API.
3) Signal the main CLI process, causing it to unblock and continue.
One approach to doing this is to make use of `child_process.execSync`. This function can invoke a Node sub-process which may perform async work, and block synchronously until that operation completes.
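A minimal sketch of the `execSync` approach, where the inline `-e` script stands in for a real worker entrypoint file:

```typescript
import {execSync} from 'child_process';

// The parent blocks synchronously while the child performs async work.
function runAsyncWorkSync(script: string): string {
  // execSync does not return until the child exits, even though the child
  // is free to await timers and promises internally.
  return execSync(`"${process.execPath}" -e "${script}"`, {encoding: 'utf8'});
}

// The setTimeout stands in for an async compilation; the parent still gets
// the result synchronously.
const output = runAsyncWorkSync("setTimeout(() => console.log('compiled'), 50);");
```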
Another approach (which sadly won't work on Windows) is to make use of a filesystem FIFO object to communicate completion. `fs.readFileSync` will block on a FIFO until data is available. This would allow communication to and from a long-running process, instead of needing to spawn a new process every time. Unfortunately Windows does not have FIFOs, and the rough equivalent (named pipes) are not exposed to Node.
A final approach is to make use of one of the NPM packages like `deasync` which use node's C++ bindings to implement such blocking operations using threading internally.
#### Dealing with overhead
Rather than spawning multiple workers with each compilation request, the workers should be spun up lazily the first time the CLI requests ngcc to do work, and left running. Each subsequent request would be routed to an idle worker (or more than one, if dependencies need to be compiled), and the CLI process would block waiting for completion using the mechanism above. This would allow ngcc to be truly parallel inside the CLI without the overhead of constant process creation.
#### Limitations
One issue with the above is that the CLI would still be feeding ngcc one entrypoint at a time to compile. This inherently serializes the compilation somewhat, so it's only the compilation of dependencies of a given entrypoint which can happen in parallel. With some work, perhaps the CLI could continue creating the program when the `.d.ts` compilations are done, and ngcc could continue in the background with compilation of other entrypoint formats in parallel with future work. Still, this is a significant limitation to the usefulness of parallelism within the context of the `ts.CompilerHost.resolveModuleNames` API that the CLI hooks for ngcc.
## Prototype
A [prototype](https://github.com/alxhub/angular/tree/ngcc/ll) of the above exists. It's missing a few features and needs substantial testing. In particular:
* [ ] `package.json` writes need to be serialized through the master. An interface exists to abstract this, but the actual serialization isn't implemented.
* [ ] unit tests are needed for the executors (perhaps mocking `cluster` somehow?)
* [ ] an integration test is needed for the actual parallelization
* [ ] configuration should determine whether serial or parallel strategies are used
* [ ] the CLI API should be exclusively serial
* [ ] generally the code should be cleaned up & commented more thoroughly