Alex Rickabaugh
# Parallelization of ngcc

One of the most impactful ways to improve ngcc performance is to parallelize its operations. ngcc must compile many entrypoints from `node_modules`, and often these compilation tasks have no interdependencies and could be performed in parallel. This document describes how this would functionally be implemented in ngcc.

[TOC]

## Options for parallelizing Node programs

Node is famously single threaded. Most options for parallelizing the execution of tasks leverage multiple worker processes, with IPC communication between the original "master" process and a number of "workers" which handle the underlying operations.

Broadly speaking, there are 3 options worth considering:

### Option 1: `child_process`

This is the most straightforward approach, where ngcc would `fork()` itself repeatedly via the Node builtin `child_process.fork()`. A number of workers would be forked off, and the main ngcc process would then pass messages to them via IPC (likely via some JSON serialization over stdin/stdout) to indicate which entrypoints each worker should compile.

This option requires ngcc to implement the IPC itself, as well as the management tasks related to tracking worker health, handling shutdown of child processes, etc.

### Option 2: `cluster`

`cluster` is an NPM package which wraps the `child_process` API into something more convenient for splitting tasks off as separate workers. It handles IPC and the bookkeeping/cleanup tasks, and allows the programmer to focus on the actual parallelization of the work.

Using `cluster` would incur an additional NPM dependency for the compiler.

### Option 3: `worker_threads`

`worker_threads` is a new API in Node 10.5+ which supports true multi-threading, as opposed to multi-process execution. This means that threads can share memory (via exchanging/sharing `ArrayBuffer`s). It's currently an unstable, experimental API, but it has the potential to be more efficient than a multi-process approach.

### Recommendation

Despite requiring an additional NPM dependency, `cluster` is the most appealing option. It's stable, well tested, handles a lot of edge cases that would otherwise require additional work in ngcc, and provides a very clean API.

## Current single-threaded ngcc algorithm

ngcc's work can roughly be divided into 3 phases:

1. Dependency scanning

   Beginning with an entrypoint or the whole of `node_modules`, ngcc identifies which packages contain Angular View Engine entrypoints. It then parses a particular format of each entrypoint and extracts the dependency edges to other entrypoints by looking at `import` statements. It continues this until it has built a DAG describing the dependency structure of all the entrypoints it needs to compile.

2. Planning

   In this phase, ngcc looks at the DAG and determines an order of operations for compiling entrypoints. This ordering ensures any dependencies of an entrypoint have already been compiled before the entrypoint itself is.

3. Execution

   Finally, ngcc iterates through the operations in its list and compiles them, currently one entrypoint at a time. Each entrypoint may be compiled multiple times if multiple formats have been requested.

## Parallel ngcc algorithm

All 3 phases of the above algorithm could potentially be parallelized. We will first discuss the expensive operation of compilation, and then consider the possibility of parallelizing the scanning phase.

### Parallelizing compilation

In ngcc, the information tracked about an entrypoint is described by the `EntryPoint` interface.
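For orientation, a minimal sketch of the kind of data such an interface carries might look like this (the field names are illustrative, not ngcc's exact definition):

```typescript
// Illustrative sketch only - not ngcc's actual EntryPoint definition. The key
// property is that everything here is plain, JSON-serializable data.
interface EntryPoint {
  /** Name of the entrypoint, e.g. '@angular/core'. */
  name: string;
  /** Absolute path to the entrypoint's directory. */
  path: string;
  /** Path to the entrypoint's typings (.d.ts) entry file. */
  typings: string;
  /** Parsed contents of the entrypoint's package.json. */
  packageJson: {[key: string]: unknown};
}
```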
This includes things like the entrypoint name, various paths (e.g. to source or `.d.ts` files), and the parsed `package.json` contents for the entrypoint. This interface is serializable, so it could easily be sent to a worker process for execution.

Currently, multiple compilations may occur for a single `EntryPoint`, due to the difference in formats. Since it's possible to compile two formats in parallel, a separate `Task` interface will be created, representing the actual unit of work which will be sent to the worker:

```typescript
interface Task {
  entryPoint: EntryPoint;
  formatProperties: string[];
  processDts: boolean;
}
```

This combines the `entryPoint` being targeted with the list of `formatProperties` - properties in the `package.json` which all point to the same format being compiled. The `processDts` flag indicates whether that particular compilation should also emit `.d.ts` files for the given entrypoint. This only needs to happen once per entrypoint, regardless of how many formats are compiled for it.

### Changes to the planning phase

The planning ngcc currently performs is insufficient for a parallelized compilation phase. Today, ngcc compiles one entrypoint at a time, picking each task off the front of the queue until there are no more entrypoints to compile. This maintains an important invariant: an entrypoint will not be compiled until all of its dependencies have been.

Consider a simple dependency graph:

```graphviz
digraph depgraph {
  C -> {A, B}
}
```

To compile entrypoint `C`, we first need to compile `A` and `B`. So an ordered list of operations for ngcc today might be `[A, B, C]`. If this is parallelized across two workers, then one worker would compile `A` while the other compiles `B`. So far, so good. Once one of those tasks finishes, though, the now-idle worker cannot start `C`, since one of its dependencies is still being compiled.

Thus, ngcc's planning phase will need adjusting. The output of planning should still be a `Task[]` indicating the rough order of operations, but the work selection algorithm will need to be more sophisticated.

### Work selection

Work selection is a process which runs on the master, and is triggered whenever an event happens (at the start of compilation, or when a worker finishes a task assigned to it). It proceeds as follows:

* While there are still workers without work, and tasks to process:
  * Consider the next task in the queue.
  * If all of its dependencies have at least had their typings compiled, assign it to a worker.
  * If not, skip it.
* If out of tasks, stop. There may still be idle workers, but there is no work for them to safely perform.

This process could be optimized somewhat (for example, by having the master proactively scan the queue and mark as "safe" any tasks which could be started at any time). Dependencies of a task could also be cached so they don't have to be re-checked with each consideration.
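As a rough illustration of this selection loop (not ngcc's actual implementation - the queue, the worker pool, and the `dependenciesOf` lookup are all hypothetical helpers):

```typescript
import * as cluster from 'cluster';

// Hypothetical lookup into the dependency DAG built during the scanning phase.
declare function dependenciesOf(entryPoint: EntryPoint): EntryPoint[];

// Sketch of the work-selection loop described above. It runs on the master, at
// the start of compilation and whenever a worker reports a finished task.
function assignWork(
    taskQueue: Task[], idleWorkers: cluster.Worker[],
    typingsCompiled: Set<EntryPoint>): void {
  let i = 0;
  while (i < taskQueue.length && idleWorkers.length > 0) {
    const task = taskQueue[i];
    if (dependenciesOf(task.entryPoint).every(dep => typingsCompiled.has(dep))) {
      // Safe to start: every dependency has at least had its typings compiled.
      taskQueue.splice(i, 1);
      idleWorkers.pop()!.send({type: 'compile', task});
    } else {
      // Not yet safe - skip this task and consider the next one in the queue.
      i++;
    }
  }
  // Tasks left in the queue are reconsidered the next time a worker finishes.
}
```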
### IPC

To assign a worker a task, the master will send it a message containing the `Task` object. When the worker completes the task, it will send back a message indicating either success or failure. On failure, the worker will include a diagnostic message explaining the failure, to be reported by the master.

### Potential parallelization of scanning

The task of scanning `node_modules` for candidate entrypoints to compile could potentially be parallelized as well. It's a much lower priority than compilation, simply because it represents a significantly smaller potential win, but since this phase can scale to several seconds for larger projects it might be worth consideration.

## Parallelizing compilation

In general, the compilation of a specific format of an entrypoint is independent of any other compilations going on. However, care must be taken not to introduce race conditions when writing files.

### package.json

`package.json` is one of these special cases. Each compilation updates the `package.json` fields related to its specific format, and since compilations of the same entrypoint can run in parallel, those writes could race each other. Writes to `package.json` therefore need to take place on the main thread.

## Design

Since ngcc needs to be able to run either in parallel or serially, a solid abstraction is needed to avoid two separate flows. Ideally, the choice of serial or parallel execution should be irrelevant to the main ngcc "flow", although since execution may now be asynchronous, the main ngcc flow will also potentially be asynchronous.

### `Executor` interface

One way to achieve this independence is to hide the compilation behind an `Executor` interface, which is responsible for performing ngcc's work either synchronously or asynchronously depending on configuration. A possible specification of this interface is:

```typescript
interface Executor {
  execute(analyzeFn: AnalyzeFn, compiler: NgccTaskCompiler): Promise<void>|void;
}

type AnalyzeFn = () => {
  queue: TaskQueue;
  graph: DepGraph<EntryPoint>;
};
```

The inputs to `execute` are an `AnalyzeFn`, a function which can compute the work needing to be done (the ngcc scanning/planning phases), and an `NgccTaskCompiler`, a separate class which wraps compilation of a particular `Task`. `TaskQueue` is simply an abstraction over a `Task[]`, with some utility methods for popping the next task of interest.

### Flow with `cluster`

`cluster` creates workers by doing a `fork`/`exec` of the current process. Thus, each worker will behave as if the ngcc command was re-run from the beginning, going through the same flow as before. Since the main ngcc flow has no knowledge of the split, workers will go through the entire `main()`, including calling `Executor.execute()`. This means that the `main` function shouldn't do any work unless the executor directs it to.

The `AnalyzeFn` allows the scanning/planning phase to be deferred until the executor decides to call it. This way, an executor that uses `cluster` to parallelize compilations can call the `AnalyzeFn` only the first time through, when running on the master.
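To make that split concrete, here is a minimal sketch of how a `cluster`-based executor might branch between master and worker. This is illustrative only: the worker count, the message handling, and the `compile` method name on `NgccTaskCompiler` are assumptions, not ngcc's actual code.

```typescript
import * as cluster from 'cluster';
import * as os from 'os';

// Illustrative sketch of a cluster-based Executor. A real implementation would
// also run the work-selection loop, track idle workers, and handle failures.
class ClusterExecutor implements Executor {
  async execute(analyzeFn: AnalyzeFn, compiler: NgccTaskCompiler): Promise<void> {
    if (cluster.isMaster) {
      // Only the master runs scanning/planning, via the deferred AnalyzeFn.
      const {queue} = analyzeFn();
      for (let i = 0; i < os.cpus().length - 1; i++) {
        cluster.fork();
      }
      // ... assign tasks from `queue` to workers and await their completion ...
    } else {
      // Workers reach this point by re-running main(); they never call the
      // AnalyzeFn and simply compile whatever tasks the master sends them.
      process.on('message', (task: Task) => compiler.compile(task));  // method name assumed
    }
  }
}
```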
## Consideration: parallelizing the CLI's usage of ngcc

The above design only really works when ngcc is invoked directly. There are thus several issues preventing the CLI from using parallel ngcc:

1) The CLI needs a synchronous answer.

   When the CLI calls ngcc, it does so by intercepting the `ts.CompilerHost.resolveModuleNames` function during program creation. This function expects a synchronous answer - there is no room for the CLI to wait on promises or callbacks.

2) `cluster.fork()` won't quite work in the CLI.

   `cluster.fork()` by default is equivalent to creating a new execution of the current process, meaning in the CLI it would start a new CLI compilation. We would need to invest some effort into separating ngcc's worker entrypoint into a separate JS file and configuring `cluster` to start workers from it instead.

3) Overhead

   `fork()` is not a cheap operation, and repeating it (several times) every time the CLI needs ngcc to compile a new module would get very costly in terms of overhead.

### On synchronicity and overhead

It's possible to conceive of a design to run ngcc in parallel in the CLI and deal with the synchronicity and overhead issues. This section stops short of proposing a full design, but sketches out a potential architecture.

#### Waiting on async operations behind a sync interface

Solving this problem is critical to being able to invoke a parallel (and therefore async) ngcc operation behind the synchronous TS interface. It is solvable. As it turns out, there are several techniques for doing this. The general formula is:

1) Spawn a separate thread or process to actually perform the async work.
2) Block the CLI process using some synchronous API.
3) Signal the main CLI process, causing it to unblock and continue.

One approach is to make use of `child_process.execSync` (sketched below). This function can invoke a Node sub-process which can perform async work, and block synchronously until that operation completes.

Another approach (which sadly won't work on Windows) is to make use of a filesystem FIFO object to communicate completion. `fs.readFileSync` will block on a FIFO until data is available. This would allow communication to and from a long-running process, instead of needing to spawn a new process every time. Unfortunately, Windows does not have FIFOs, and the rough equivalent (named pipes) is not exposed to Node.

A final approach is to make use of an NPM package like `deasync`, which uses Node's C++ bindings to implement such blocking operations using threading internally.
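A minimal sketch of the `execSync` technique, assuming a hypothetical `ngcc-worker.js` script that performs the async compilation and exits when it is done:

```typescript
import {execSync} from 'child_process';

// Sketch only: make async ngcc work appear synchronous to the caller by
// delegating it to a child Node process and blocking until that child exits.
export function processEntryPointSync(entryPointPath: string): void {
  // execSync blocks this process until the child finishes, so the async work
  // performed inside the child looks synchronous from here.
  execSync(`node ./ngcc-worker.js ${JSON.stringify(entryPointPath)}`, {
    stdio: 'inherit',  // surface the child's logs and diagnostics directly
  });
}
```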
#### Dealing with overhead

Rather than spawning multiple workers with each compilation request, the workers should be spun up lazily the first time the CLI requests ngcc to do work, and then left running. Each subsequent request would be routed to one of these workers (or more than one, if dependencies need to be compiled), and the CLI process would block waiting for completion using the mechanism above. This would allow ngcc to be truly parallel inside the CLI without the overhead of constant process creation.

#### Limitations

One issue with the above is that the CLI would still be feeding ngcc one entrypoint at a time to compile. This inherently serializes the compilation somewhat, so it's only the compilation of a given entrypoint's dependencies which can happen in parallel. With some work, perhaps the CLI could continue creating the program once the `.d.ts` compilations are done, and ngcc could continue compiling other entrypoint formats in the background, in parallel with future work. Still, this is a significant limitation to the usefulness of parallelism within the context of the `ts.CompilerHost.resolveModuleNames` API that the CLI hooks for ngcc.

## Prototype

A [prototype](https://github.com/alxhub/angular/tree/ngcc/ll) of the above exists. It's missing a few features and needs substantial testing. In particular:

* [ ] `package.json` writes need to be serialized through the master. An interface exists to abstract this, but the actual serialization isn't implemented.
* [ ] Unit tests are needed for the executors (perhaps mocking `cluster` somehow?).
* [ ] An integration test is needed for the actual parallelization.
* [ ] Configuration should determine whether serial or parallel strategies are used.
* [ ] The CLI API should be exclusively serial.
* [ ] Generally, the code should be cleaned up & commented more thoroughly.
