Alex Rickabaugh
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # Parallelization of ngcc One of the most impactful ways to improve ngcc performance is to parallelize its operations. ngcc must compile many entrypoints from `node_modules`, and often these compilation tasks have no interdependencies and could be performed in parallel. This document describes how this would functionally be implemented in ngcc. [TOC] ## Options for parallelizing Node programs Node is famously single threaded. Most options for parallelizing the execution of tasks leverage multiple worker processes, with IPC communication between the original "master" process and a number of "worker"s which handle the underlying operations. Broadly speaking, there are 3 options worth considering: ### Option 1: `child_process` This is the most straightforward approach, where ngcc would `fork()` itself repeatedly via the Node builtin `child_process.fork()`. A number of workers would be forked off, and the main ngcc process would then pass messages to them via IPC (likely via some JSON serialization over stdin/stdout) to indicate which entrypoints each worker should compile. This option requires ngcc to implement the IPC itself, as well as the management tasks related to tracking worker health, handling shutdown of child processes, etc. ### Option 2: `cluster` `cluster` is an NPM package which wraps the `child_process` API into something more convenient for splitting tasks off as separate workers. It handles IPC and the bookkeeping/cleanup tasks, and allows the programmer to focus on the actual parallelization of the work. Using `cluster` would incur an additional NPM dependency for the compiler. ### Option 3: `worker_threads` `worker_threads` is a new API in Node 10.5+ which supports true multi-threading, as opposed to multi-process. This means that threads can share memory (via exchanging/sharing `ArrayBuffer`s). It's currently an unstable, experimental API, but has the potential to be more efficient than a multi-process approach. ### Recommendation: Despite requiring an additional NPM dependency, `cluster` is the most appealing option. It's stable, well-tested, handles a lot of edge cases that would require additional work for ngcc to handle on its own, and provides a very clean API. ## Current single-threaded ngcc algorithm ngcc's work can rougly be divided into 3 phases: 1. Dependency scanning Beginning with an entrypoint or the whole of `node_modules`, ngcc identifies which packages contain Angular View Engine entrypoints. It then parses a particular format of each entrypoint and extracts the dependency edges to other entrypoints by looking at `import` statements. It continues this until it's built a DAG describing the dependency structure of all the entrypoints it needs to compile. 2. Planning In this phase, ngcc looks at the DAG and determines an order of operations for compiling entrypoints. This ordering ensures any dependencies of an entrypoint have already been compiled prior to its invocation. 3. Execution Finally, ngcc iterates through the operations in its list and compiles them, currently one entrypoint at a time. Each entrypoint is then compiled, sometimes multiple times if multiple formats have been requested. ## Parallel ngcc algorithm All 3 phases of the above algorithm could potentially be parallelized. We will first discuss the expensive operation of compilation, and then consider the possibility of parallelizing the scanning phase. ### Parallelizing compilation In ngcc, the information tracked about an entrypoint is described by the `EntryPoint` interface. This includes things like the entrypoint name, various paths (e.g. to source or .d.ts files), and the parsed `package.json` contents for the entrypoint. This interface is serializable, so it could easily be sent to a worker process for execution. Currently, multiple compilations may occur for a single `EntryPoint`, due to the difference in formats. Since it's possible to compile two formats in parallel, a separate `Task` interface will be created representing the actual unit of work which will be sent to the worker: ```typescript interface Task { entryPoint: EntryPoint; formatProperties: string[]; processDts: boolean; } ``` This combines both the `entryPoint` being targeted and the list of `formatProperties` - properties in the `package.json` which all point to the same format being compiled. The `processDts` flag indicates whether that particular compilation should emit `.d.ts` files for the given entrypoint as well. This only needs to happen once per entrypoint, regardless of how many formats are compiled for it. ### Changes to the planning phase The current planning ngcc does is insufficient for a parallelized compilation phase. Currently, ngcc compiles one entrypoint at a time, picking each task off the front of the queue until there are no more entrypoints to compile. This maintains an important invariant: an entrypoint will not be compiled until all of its dependencies have been. Consider a simple dependency graph: ```graphviz digraph depgraph { C -> {A, B} } ``` To compile entrypoint `C`, we first need to compile `A` and `B`. So an ordered list of operations for ngcc today might be: `[A, B, C]`. If this is parallelized across two workers, then one worker would compile `A` while the other compiles `B`. So far, so good. Once one task finishes though, `C` cannot be started since some of its dependencies are still being compiled. Thus, ngcc's planning phase will need adjusting. The output of planning should be a `Task[]`, indicating the rough order of operations. However, the work selection algorithm will need to be more sophisticated. ### Work selection Work selection is a process which runs on the master, and is triggered whenever an event happens (at the start of compilation, or when a worker finishes a task assigned to it). It proceeds as follows: * While there are still workers without work, and tasks to process: * Consider the next task in the queue. * If all of its dependencies have at least had their typings compiled, assign it to a worker. * If not, skip it. * If out of tasks, stop. There may still be idle workers, but there is no work for them to safely perform. This process could be optimized somewhat (for example, by having the master proactively scan the queue and mark as "safe" any tasks which could be started at any time). Dependencies of a task could be cached so they don't have to be re-checked with each consideration. ### IPC To assign a worker a task, the master will send it a message containing the `Task` object. When the worker completes the task, it will either send back a message indicating success or failure. If failure, the worker will include a diagnostic message to be reported by the master which explains the failure. ### Potential parallelization of scanning The task of scanning `node_modules` for candidate entrypoints to compile could potentially be parallelized as well. It's a much lower priority than compliation simply because it represents a significantly smaller potential win, but since this phase can scale to several seconds for larger projects it might be worth consideration. ## Parallelizing compilation In general, the compilation of the specific format of an entrypoint is independent of any other compilations going on. However, care must be taken to not introduce race conditions when writing files. ### package.json `package.json` is one of these special cases. Each compilation updates the `package.json` fields related to that specific format, and these can run in parallel. So writes to `package.json` need to take place on the main thread. ## Design Since ngcc needs to be able to run either in parallel or serially, a solid abstraction is needed to avoid needing two separate flows. Ideally, the choice of serial or parallel should be irrelevant to the main ngcc "flow". Although as execution is now possibly asynchronous, the main ngcc flow will also potentially be asynchronous. ### `Executor` interface One way to achieve this independence is to hide the compilation behind an `Executor` interface, which is responsible for performing ngcc's work either synchronously or asynchronously depending on compilation. A possible specification of this interface is: ```typescript interface Executor { execute(analyzeFn: AnalyzeFn, compiler: NgccTaskCompiler): Promise<void>|void; } type AnalyzeFn = () => { queue: TaskQueue; graph: DepGraph<EntryPoint>; }; ``` The inputs to `execute` are an `AnalyzeFn`, a function which can compute the work needing to be done (the ngcc scanning/planning phases), and an `NgccTaskCompiler`, a separate class which wraps compilation of a particular `Task`. `TaskQueue` is simply an abstraction for a `Task[]` with some utility methods for popping the next task of interest. ### Flow with `cluster` `cluster` creates workers by doing a `fork`/`exec` of the current process. Thus, each worker will behave as if the ngcc command was re-run from the beginning, going through the same flow as before. Since the main ngcc flow has no knowledge of the split, workers will go through the entire `main()`, including calling `Executor.execute()`. This means that the `main` function shouldn't do any work unless the executor directs it to. The `AnalyzeFn` allows the scanning/planning phase to be deferred until the executor can make the determination to call it. This way, the executor that uses `cluster` to parallelize compilations can only call the `AnalyzeFn` the first time through, when running on the master. ## Consideration: parallelizing the CLI's usage of ngcc The above design only really works when ngcc is invoked directly. There are thus issues preventing the CLI from using parallel ngcc: 1) The CLI needs a synchronous answer. When the CLI calls ngcc, it does so by intercepting the `ts.CompilerHost.resolveModuleNames` function during program creation. This function expects a synchronous answer - there is no room for the CLI to wait on promises or callbacks. 2) `cluster.fork()` won't quite work in the CLI `cluster.fork()` by default is equivalent to creating a new execution of the current process, meaning in the CLI it would start a new CLI compilation. We would need to invest some effort into separating the worker entrypoint into ngcc as a separate JS file and configure Cluster to start workers using it instead. 3) Overhead `fork()` is not a cheap operation, and repeating it (several times) every time the CLI needs ngcc to compile a new module would get very costly in terms of overhead. ### On synchronicity and overhead It's possible to conceive of a design to run ngcc in parallel in the CLI and deal with the synchronicity and overhead issues. This section stops short of proposing a full design, but sketches out a potential architecture. #### Waiting on async operations behind a sync interface Solving this problem is critical to being able to invoke a parallel (and therefore async) ngcc operation behind the synchronous TS interface. It is solvable. As it turns out, there are several techniques for doing this. The general formula is: 1) Spawn a separate thread or process to actually perform the async work. 2) Block the CLI process using some synchronous API. 3) Signal the main CLI process, causing it to unblock and continue. One approach to doing this is to make use of `child_process.execSync`. This function can invoke a node sub-process which can involve async work, and block synchronously until that operation completes. Another approach (which sadly won't work on Windows) is to make use of a filesystem FIFO object to communicate completion. `fs.readFileSync` will block on a FIFO until data is available. This would allow communication to and from a long-running process, instead of needing to spawn a new process every time. Unfortunately Windows does not have FIFOs, and the rough equivalent (named pipes) are not exposed to Node. A final approach is to make use of one of the NPM packages like `deasync` which use node's C++ bindings to implement such blocking operations using threading internally. #### Dealing with overhead Rather than spawning multiple workers with each compilation request, the first time CLI requests ngcc to do work, the workers should be spun up lazily and left running. Each subsequent request would be routed to a new worker (or more than one, if dependencies need to be compiled), and then the CLI process would block waiting for completion using the mechanism above. This would allow ngcc to be truly parallel inside the CLI without the overhead of constant process creation. #### Limitations One issue with the above is that the CLI would still be feeding ngcc one entrypoint at a time to compile. This inherently serializes the compilation somewhat, so it's only the compilation of dependencies of a given entrypoint which can happen in parallel. With some work, perhaps the CLI could continue creating the program when the `.d.ts` compilations are done, and ngcc could continue in the background with compilation of other entrypoint formats in parallel with future work. Still, this is a significant limitation to the usefulness of parallelism within the context of the `ts.CompilerHost.resolveModuleNames` API that the CLI hooks for ngcc. ## Prototype A [prototype](https://github.com/alxhub/angular/tree/ngcc/ll) of the above exists. It's missing a few features and needs substantial testing. In particular: * [ ] `package.json` writes need to be serialized through the master. An interface exists to abstract this, but the actual serialization isn't implemented. * [ ] unit tests are needed for the executors (perhaps mocking `cluster` somehow?) * [ ] an integration test is needed for the actual parallelization * [ ] configuration should determine whether serial or parallel strategies are used * [ ] the CLI API should be exclusively serial * [ ] generally the code should be cleaned up & commented more thoroughly

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully