TurboWish is a suite of tools that give Rust developers insight into performance issues in their code.
The first TurboWish deliverable is the Async Monitor and Console, which answers a developer's questions about how their code's async runtime is behaving as their program runs.
The Async Console provides a summary of an async runtime's behavior, using concepts shared across the async rust ecosystem (such as tasks and resources), and metrics that are applicable to any async program (such as the average time each task spends waiting in a ready state).
When a developer asks: "Why is my task stuck?" or "Why is my app slow?", the Async Console is the first, and in some cases, the only tool they need to reach for. It will allow the developer to quickly see how tasks are scheduled (to learn how much time is spent in the developer's own code versus waiting to run), identify tasks that are starving or blocked and what resources they are waiting on, and identify tasks responsible for replenishing a scarce resource.
We plan to expand the TurboWish tool suite with other tools dedicated to other investigations, such as heap profiling or sampling-based CPU profiling. This design document is dedicated to the Async Monitor and Console; in tandem, they are the tool that will drive the top-down investigation of an async developer's performance issues.
This document describes the development plan for the first TurboWish tool: the Async Monitor and Console. It opens with the original goals followed by (proposed, unfinalized) tenets for the TurboWish suite itself. It then presents a description of the customer experience using the Async Monitor and Console. Then it sketches an implementation plan. A section follows describing metrics for evaluating whether the tools are achieving their goal, and finally a section touching on security concerns. Appendices follow the conclusion of the document, including an appendix with the project schedule.
link to goals (if you want to comment on them, go to the linked doc)
The Rust community at large, both within and outside of AWS, is the customer for TurboWish.
Injected instrumentation must not block application progress.
Timing-measurement noise induced by client instrumentation should be minimized.
Present diagnoses in terms of customer-centric concepts, such as resource and task.
Minimize coupling between components to encourage concurrent community development.
The transition from development to production deserves as much performance tooling as production monitoring itself. (More specifically: We see customers struggling to get their software operating well enough to push to release; therefore, some tools can be specialized to the development use-case.)
When a developer wants to deploy the Async Monitor on their program, they will need to hook it in by adding it as an upstream dependency (in the `Cargo.toml` file) and also writing a few initialization lines at the start of their source code:
```rust
use turbowish_async_monitor as tw_monitor;

#[tokio::main]
async fn main() {
    tw_monitor::builder().port(8080).init();
    // Rest of the app
}
```
With that initialization code in place, the service will operate in the same fashion as before, but will now also run the Async Monitor, which observes events and then, based on those observations, builds an internal model of the program and the async executor. External programs, such as the Async Console, can now connect to the port indicated above and present the state of the Async Monitor.
When the developer first connects the Async Console, they get a presentation similar to the UNIX `top` command, showing a brief summary of the executors (with metrics like the number of tasks running or sleeping, average times spent waiting to run, and average `Future::poll` runtimes), and below that, a list of the current tasks, each on its own line with an id number, name, current run state (Polling, Ready to poll, and Waiting), and a list of task attributes that include developer-specified metadata. This view will dynamically update (just like `top`) as the application runs.
The async console presents data primarily oriented around tasks and resources. The aim is for a model as straightforward to understand as the Resource Allocation Graphs described by Ric Holt in 1972: We wish to explain as much as we can in terms of tasks waiting for or using resources (either by locking them if they are exclusive resources, or by consuming them if they are cumulative resources such as message channels), or resources waiting for tasks to make them available (either by unlocking them for exclusive resources, or by refueling them if they are cumulative resources).
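To make this model concrete, the following is a minimal sketch of such a task/resource graph in Rust. The names and shapes below are illustrative assumptions for this document, not the Async Monitor's actual data structures.

```rust
// Illustrative sketch only: a minimal resource-allocation-graph model in the
// spirit of Holt's 1972 formulation; not the Async Monitor's real types.
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct TaskId(u64);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ResourceId(u64);

#[derive(Debug)]
enum ResourceKind {
    /// Held by at most `capacity` tasks at a time (e.g. a mutex, capacity 1).
    Exclusive { capacity: usize, holders: Vec<TaskId> },
    /// Holds some number of consumable units (e.g. messages on a channel).
    Cumulative { units: usize, bound: Option<usize> },
}

#[derive(Debug, Default)]
struct ResourceGraph {
    /// Task -> the resource it is currently waiting for.
    task_waits_on: HashMap<TaskId, ResourceId>,
    /// Resource -> tasks expected to make it available again
    /// (by unlocking it, or by refueling/draining it).
    resource_waits_on: HashMap<ResourceId, Vec<TaskId>>,
    resources: HashMap<ResourceId, ResourceKind>,
}
```

In this formulation, a deadlock cycle is simply a cycle that alternates between task-waits-on-resource and resource-waits-on-task edges.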
The async console monitors for known performance pitfalls, and includes a highlighted alert at the top of the screen if the program is currently exhibiting one of those pitfalls. The alert provides a link to a problem-oriented view, that explains the pitfall and provides guidance as to how one can resolve it.
From the async console, the developer can do three main things. First, they can dig into an individual record of a task or resource, by traversing hyperlinks for that task or resource. (A "record" in this context is a screen summarizing the event history and current status for that task or resource.) Second, they can pause a waiting task (or request that a specific non-waiting task be paused when it next evaluates a `.await`); such paused tasks can be subsequently inspected by a debugger attaching to the running process, and then resumed when the developer is done with the attached debugger. Third, they can roll back history, in order to inspect past states for tasks or resources.
The performance pitfall alerts handle a collection of known cases where our customers are asking themselves: "why is my task stuck?"
Examples of problems that the Async Monitor can detect include deadlock cycles, excessive polling times, and buggy implementations of `Future` that fail to register a waker.
As a concrete example: When a deadlock cycle occurs, the async console displays an alert saying "deadlock cycle detected." The developer follows the hyperlink for the alert, and this brings up a "problem view" that shows an interleaved list of tasks and resources, corresponding to the chain of dependencies that forms the cycle.
A resource record shows information about an individual resource.
Some resources are exclusive: they can be held by at most N tasks at a time (where N is often 1, such as in the case of a mutex lock). The resource record for an exclusive resource will include a list of tasks that currently hold it. It will also include a separate list of tasks that are currently blocked while trying to acquire the resource.
Some resources are cumulative: they hold some number of resource units (work items in a queue, messages on a channel, et cetera). The resource record for a cumulative resource will include a list of tasks that are currently blocked on the resource's current state (for example, for a channel, a receiver will block on an empty channel; a sender will block on a bounded channel at full capacity).
(In Holt 1972, the terms "reusable" and "consumable" roughly correspond to our usage of "exclusive" and "cumulative". The correspondence is imperfect though; e.g. Holt refers to individual consumable resource units, while for us, a "cumulative resource" describes a collection of such consumable units, such as a channel.)
In addition, the resource record will include lists of tasks that have signaled intent to interact with the resource. This way, the resource record for a multi-producer single-consumer channel can include a list of all of the sending tasks that hold the channel, as well as the receiver task associated with the channel, regardless of whether the channel's message buffer is empty, partially-full, or full.
Listing the tasks that intend to interact with the resource is how we can provide the developer with enough information to resolve unexpected blocking behavior in their code. For example, after seeing that a sender task is blocked because a bounded channel is at full capacity, it is simple for the developer to follow hyperlinks to go from the sender task to the channel's resource record, and from there through another hyperlink to the receiver task, and then work on figuring out how to make the receiver task process messages more efficiently.
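As a rough illustration, and building on the hypothetical types in the earlier sketch, a resource record might carry roughly the following fields:

```rust
/// Hypothetical shape of a resource record; field names are illustrative.
struct ResourceRecord {
    id: ResourceId,
    kind: ResourceKind,
    /// Tasks blocked on the resource's current state (e.g. senders blocked on
    /// a full bounded channel, or a receiver blocked on an empty one).
    blocked_tasks: Vec<TaskId>,
    /// Tasks that have signaled intent to interact with this resource,
    /// whether or not they are currently blocked on it.
    interested_tasks: Vec<TaskId>,
    /// Recent events involving this resource, for the record's history view.
    history: Vec<String>,
}
```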
If a task is blocked, the task record will show what specific `.await` expression it is blocked on, by showing the span (file:line and column range) and the expression itself (`my_channel.send(value).await`). If the blockage is due to a specific resource, then a hyperlink to its resource record will be provided as well.
More generally, a task record shows information about an individual task, including lists of resources associated with the task.
If a task currently holds exclusive access to resources, all such resources will be listed.
As mentioned earlier, tasks can signal intent to interact with a resource. There are two kinds of intention that can be signaled: conditional intent and unconditional intent.
A task signals conditional intent when the interaction is predicated on some external condition holding; i.e., it will occur only on some of the task's non-error control-flow paths. (One could imagine approximating conditional intent by just listing all the resources that are reachable from the task; this is a topic we should explore during implementation.)
A task signals unconditional intent as a way to state "if I can make sufficient forward progress, I will interact in this manner with that resource." It is essentially another way of saying: "if you can figure out how to get me unblocked, these are the actions I will take", which can be crucial in resolving certain kinds of resource starvation issues.
The task record shows two separate lists corresponding to the two kinds of intention. This way, developers are informed about which tasks to look at first when trying to understand the interactions between tasks and resources in their program.
During execution, resources may move from the conditional list to the unconditional list, or they may move off the conditional list entirely, all according to what a task indicates as its code runs. When a task has actually performed its intended operation to completion (e.g. if it is entirely done sending messages on a given channel, and signals such to the async monitor) then the resource will likewise be removed from the task's resource lists.
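Correspondingly, a task record might look roughly like the following sketch (again, hypothetical names building on the earlier sketches):

```rust
/// Hypothetical shape of a task record; not the Async Monitor's real types.
enum TaskState {
    Polling,
    ReadyToPoll,
    /// Waiting at a specific `.await`; `blocked_on` is set when the blockage
    /// is attributable to a particular resource.
    Waiting { await_span: String, blocked_on: Option<ResourceId> },
}

struct TaskRecord {
    id: TaskId,
    name: String,
    state: TaskState,
    /// Exclusive resources this task currently holds.
    held: Vec<ResourceId>,
    /// Resources the task may interact with, on some control-flow paths only.
    conditional_intent: Vec<ResourceId>,
    /// Resources the task will interact with if it makes sufficient progress.
    unconditional_intent: Vec<ResourceId>,
}
```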
Sometimes the information provided in the TurboWish Async Monitor's records will not give enough detail, and the developer will want to inspect the actual state of the task's memory on the live process in a debugger.
To pause a task, the developer can go to its Task record page, or just select it in the Async Console overview, and then issue the "pause" command. The executor pauses tasks by opting not to schedule them, instead leaving them suspended where they evaluated `.await`.
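One plausible shape for that executor-side hook is sketched below; `PauseState`, its fields, and the scheduling loop are invented for illustration (building on the earlier `TaskId` sketch), not tokio's or the Async Monitor's actual interfaces.

```rust
use std::collections::HashSet;

/// Stand-in for the monitor's pause bookkeeping (hypothetical).
struct PauseState {
    paused: HashSet<TaskId>,
    parked: Vec<TaskId>,
}

impl PauseState {
    /// Pop the next runnable task, skipping (and parking) any paused ones so
    /// they remain suspended at the `.await` where they last yielded.
    fn schedule_next(&mut self, run_queue: &mut Vec<TaskId>) -> Option<TaskId> {
        while let Some(task) = run_queue.pop() {
            if self.paused.contains(&task) {
                // A debugger can now attach and inspect the parked task;
                // "resume" would move it back onto the run queue.
                self.parked.push(task);
                continue;
            }
            return Some(task);
        }
        None
    }
}
```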
As a convenience, the Async Console provides `gdb` and `lldb` commands as helpers that pause the task, then spawn the corresponding debugger processes and attach them appropriately to the paused task.
Sometimes humans just don't understand how they got into some mess.
For those situations, the Async Monitor offers rollback functionality. Developers can step backwards through the event history and see how such steps affect the task and resource records.
For example, a `select!` expression in tokio will run a collection of futures concurrently. It will proceed with the value provided by whichever future completes first, canceling the other futures that lost the race. If code is not written to anticipate such cancellation, it can lead to data corruption, or tasks appearing to be blocked indefinitely (aka "hung").
As a concrete example adapted from a blog post, consider this code:
```rust
let mut file = ...;
let mut channel = ...;

loop {
    futures::select! {
        _ = read_send(&mut file, &mut channel) => {},
        some_data = socket.read_packet() => {
            // ...
        }
    }
}
```
Here, `read_send` is constructing a future on every iteration through the loop. If the `socket.read_packet()` invocation ever wins the `select!` race, then the `read_send` future is dropped, which means the data may have been extracted from the file but not yet relayed on the `channel`. (The details are spelled out in the blog post.)
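To make the hazard concrete, here is an assumed shape for `read_send` (not the blog post's actual code): cancellation can strike at any `.await` point, so dropping the future between the read and the send loses the chunk that was already read.

```rust
use tokio::io::AsyncReadExt;
use tokio::sync::mpsc::Sender;

// Assumed shape of read_send, for illustration only.
async fn read_send(
    file: &mut tokio::fs::File,
    channel: &mut Sender<Vec<u8>>,
) -> std::io::Result<()> {
    let mut buf = vec![0u8; 1024];
    let n = file.read(&mut buf).await?; // the data has left the file here...
    buf.truncate(n);
    // ...but if the select! drops this future while it is suspended at the
    // send below, the chunk never reaches the channel and is lost.
    let _ = channel.send(buf).await;
    Ok(())
}
```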
If a task appears to be hung, the Async Monitor allows the developer to gain insight into why: They can visit the task view for the blocked task. The task view will indicate where it is stuck (in the `select!`). Since `select!` is known to be a source of problems for developers, the view will also include a hyperlink that will "rewind time" to the invocation of `select!` on the previous iteration of this loop, so that one can observe what side-effects it had. Upon rewinding time, the task view indicates that the `socket.read_packet()` future won the race, and that the future returned from `read_send` was dropped.

(At this point, we are relying on the developer having an "Aha" moment about the effect of dropping a future when it is in the middle of its computation. It would be good to explore ways to help people get the insight sooner. For example, maybe having the view here indicate where the `read_send` future was in its own computation at the time it was dropped would help someone see that we need to keep that same future running.)
As part of the configuration options for the Async Monitor, one can adjust the upper-bound on the size of the event log, in order to ensure its memory usage does not induce memory exhaustion, virtual memory swapping, or other problems that could excessively perturb performance evaluation.
For many problems, the Async Monitor will provide the developer with the insight they need to resolve their questions about their program's interactions with the async executor. However, some performance problems will require further inspection. The Async Monitor can help with some of that, such as when it allows a developer to pause a task. But sometimes a developer will need insight into other aspects of their program, such as where memory or CPU time is going. Other tools will be needed for such cases.
An appendix of this document shows a diagram of the expected developer workflow. The diagram presents the steps described above, but it also shows decision points where a user might need to employ another tool.
Now that we have finished explaining the experience of someone using the tool, we will discuss how to deliver that experience to our customers.
As mentioned above, there are two main components: the Async Monitor, and the Async Console.
The 2021 release of TurboWish will need to provide value without having any support from the Rust compiler or standard library. Therefore, the initial release will work by adding instrumentation directly to the async executor and related libraries, following the event specification described below.
The instrumentation will be responsible for documenting how the relationships between tasks and resources change over time. More specifically, the instrumentation will capture: task state transitions (running, ready, waiting), task acquisition of exclusive resources (e.g. locking a mutex), task modifications of resource state (e.g. sending on a channel or receiving from it), and task intent to interact with resources.
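As a sketch of what those instrumentation points might emit, assuming `tracing` as the event bus (the event and field names below are placeholders, not the finalized event specification):

```rust
use tracing::trace;

// Placeholder instrumentation points; names are not part of any real spec.
fn on_task_state_change(task_id: u64, from: &str, to: &str) {
    trace!(target: "tw_events", task_id, from, to, "task_state_transition");
}

fn on_lock_acquired(task_id: u64, resource_id: u64) {
    trace!(target: "tw_events", task_id, resource_id, "resource_acquired");
}

fn on_channel_send(task_id: u64, resource_id: u64, queued: usize) {
    trace!(target: "tw_events", task_id, resource_id, queued, "resource_state_changed");
}

fn on_intent_signaled(task_id: u64, resource_id: u64, unconditional: bool) {
    trace!(target: "tw_events", task_id, resource_id, unconditional, "task_intent");
}
```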
As mentioned in the Task Records section, there are two kinds of intent: conditional and unconditional. My long-term hope is to leverage some sort of heap-ownership tracing system to infer conditional intent, because signaling it via manual instrumentation will be arduous and error-prone. (Heap ownership tracking alone cannot infer unconditional intent, but it may be possible to leverage compiler analysis to perform such inference of unconditional intent.)
There are four short-term deliverables: 1. the Async Monitor, 2. the Async Console, 3. a specification of what instrumentation events the Async Monitor understands (which may come from the client code, the async executor, or any related libraries (e.g. Rayon or other crates that offer a thread pool)), and 4. the instrumentation of Tokio (following the aforementioned specification) to support the Async Monitor.
This is a rendering of the component separation. There are two running programs: The program being developed, and the Async Console. Within the program being developed, instrumentation events are generated and travel on the event bus, where they are observed by the Async Monitor running as a thread in that process. The Async Monitor communicates with the Async Console, presenting portions of the model according to the requests that arrive from the console.
%% Note: `%%` at line start is a comment.
flowchart TD
subgraph Client [Development Program ___]
TwCollect -.- pause_reqs -.-> Tokio
pause_reqs([introspection requests,<br/>e.g. 'pause'])
TracingEventBus
TracingEventBus -.-> TwCollect
TwCollect[Async Monitor]
TwCollect --- EventLog
EventLog[(EventLog)]
Tokio[Async Executor<br/>e.g. Tokio]
ClientCode[Client App Code]
Rayon
ClientCode -.-> TracingEventBus
Tokio -.-> TracingEventBus
Rayon -.-> TracingEventBus
TracingEventBus[Event Bus<br/>atop tracing crate<br/><follows event spec>]
end
subgraph Console [Console Program __]
TwTui <--> TwCollect
TwTui([Async Console])
end
In order to enable concurrent development of the four deliverables, we will follow these steps:
First, we must pick some format for specifying the grammar of the instrumentation events. Anything with an existing off-the-shelf parser-generator available as a Rust crate will serve. JSON might be a reasonable choice here, at least for initial prototyping; but we should also evaluate whether other event description formats incur less encoding overhead than JSON.
(Another option would be a dedicated enum type for events. That would then need to be exposed in its own crate that downstream clients would need to add as a dependency. Perhaps more importantly: we are very likely to use `tracing` as the event bus, which takes string-encoded messages as the basis for its encoding.)
Second, we must make an initial specification of events. Just saying "it's JSON" is not sufficient; we need to state how each desired event is actually encoded in the chosen grammar. This will evolve over time. It must cover everything listed in Instrumentation.
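Purely for illustration, and assuming JSON is chosen for prototyping, a single event might be encoded and decoded along these lines (the field names are placeholders that the specification work would pin down):

```rust
use serde::{Deserialize, Serialize};

// Placeholder event shape, not the actual specification.
#[derive(Serialize, Deserialize, Debug)]
struct Event {
    kind: String,          // e.g. "task_state_transition"
    timestamp_us: u64,     // microseconds since monitor start
    task_id: u64,
    resource_id: Option<u64>,
    detail: serde_json::Value,
}

fn parse_example() -> serde_json::Result<Event> {
    serde_json::from_str(
        r#"{
            "kind": "task_state_transition",
            "timestamp_us": 1024,
            "task_id": 7,
            "resource_id": null,
            "detail": { "from": "ready", "to": "polling" }
        }"#,
    )
}
```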
With these two pieces in place, we can spin off a few parallel tracks of development.
For instrumenting tokio, I recommend that we start by developing an event-validating async monitor (or more simply, "event-validator"). The event-validator is responsible for ensuring the incoming event stream is coherent; it might build up a minimal model as part of its coherence check, but it is not responsible for gathering all the metrics that are expected of the Async Monitor product itself. With an event-validator in place, one can test the correctness of tokio instrumentation as it is added. (We will also need to do integration tests once the async monitor itself is constructed.)
For the Async Monitor, I recommend the following baby-steps toward implementation.
First, either make a new mock async executor, or select a pre-existing simple reference async executor (several exist on github), and use it as the mock executor. Use the mock executor for initial development of the Async Monitor: that is, the in-development Async Monitor would be exercised by building an internal model of the mock executor's state.
The reason for employing a mock executor is two-fold: 1. it will be quicker to add (and modify when inevitably needed) the necessary instrumentation as the Async Monitor itself is developed, and 2. using a mock rather than tokio directly will help prevent inadvertent coupling between tokio itself and the async monitor; such coupling would make it hard to add support to other async executors like async-std in the future.
In addition to a mock executor, I also recommend making a mock event streamer that will simulate both ends connected to the Async Monitor: it will generate the events that we expect to be signaled by an instrumented application sitting atop some instrumented executor, and it will also simulate a console attached to the other side of the Async Monitor. I recommend having a mock event streamer because this is the easiest way to generate a deterministic series of instrumentation events interleaved with console events. Making such event generation deterministic will be crucial for testing the async monitor.
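The following sketch shows what such a deterministic test might look like; `MonitorModel` is a stand-in stub for this document, not the real Async Monitor, and the event lines are invented for illustration.

```rust
use std::collections::HashMap;

/// Stand-in stub for the model the Async Monitor would build; not real code.
#[derive(Default)]
struct MonitorModel {
    blocked_on: HashMap<u64, u64>, // task_id -> resource_id
}

impl MonitorModel {
    fn ingest(&mut self, event: &str) {
        // Grossly simplified parser, just enough for the invented events below.
        if let Some(rest) = event.strip_prefix("task_blocked_on ") {
            let (mut task, mut res): (u64, u64) = (0, 0);
            for field in rest.split_whitespace() {
                if let Some(v) = field.strip_prefix("task_id=") {
                    task = v.parse().unwrap();
                } else if let Some(v) = field.strip_prefix("resource_id=") {
                    res = v.parse().unwrap();
                }
            }
            self.blocked_on.insert(task, res);
        }
    }
}

#[test]
fn blocked_sender_shows_up_in_model() {
    let events = [
        "task_spawned task_id=1 name=sender",
        "task_intent task_id=1 resource_id=9 unconditional=true",
        "task_blocked_on task_id=1 resource_id=9", // bounded channel is full
    ];
    let mut model = MonitorModel::default();
    for e in events {
        model.ingest(e);
    }
    assert_eq!(model.blocked_on.get(&1), Some(&9));
}
```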
Finally, for the console, I recommend that we make a mock async monitor. It will simulate the Async Monitor's side of the communication protocol used by any Console connecting to the monitor's port.
(Defining the communication protocol between the Async Monitor and the Console is a prerequisite here. However, this protocol is only exposed to the Monitor and Console, and so it can be revised without having to worry about updating downstream clients.)
The three decoupled tracks listed above are repeated below in a visual diagram, showing the dependencies between the pieces.
flowchart TD
spec[Event Specification]
spec --> validating_monitor[Event-Validating Async Monitor]
spec --> mock_executor[Mock Async Executor]
validating_monitor --> instrument_tokio[Add instrumentation to tokio]
spec --> mock_event_stream
mock_event_stream[Mock Event Streamer]
mock_executor --> async_monitor[Async Monitor]
mock_event_stream --> async_monitor
mock_monitor[Mock Async Monitor] --> Console
See Appendix: Sacrifices for Minimum Viable Product
The goal of these tools is to provide users with insight into performance pitfalls. How can we know if the Async Monitor and Console are achieving their goal?
There are two ways I can imagine approaching this: telemetry from the Async Console program, or evaluating out-of-band signals (such as sentiment analysis on social media). I will focus here on the telemetry option, under the assumption that any potential telemetry would be an opt-in choice presented to the customer when they first use the Async Console tool.
For evaluating whether the tool is successfully providing users with insight, the most obvious answer is that we could ask our users. When the console detects a problem and the user shifts to the dedicated problem view describing the problem, the console could also proactively ask a "Yes"/"No" question about whether the information it presents is helping the user solve their problem (or perhaps request a user happiness rating on a 1-to-5 scale), and then ship those responses back to a service that collects such feedback.
Alternatively: we already plan to have the tool link to websites that provide documentation of various known performance pitfalls. Rather than building telemetry into the console, the linked websites could ask the visitor whether the Async Monitor and Console are working for them. (However, this approach runs the risk of omitting the experiences of users who do not follow the links.)
On the flip side of things, we also want to ensure that the instrumentation from TurboWish is not injecting too much overhead into our customers' applications.
While it is probably not reasonable to try to measure the time spent issuing each individual instrumentation event, it is entirely reasonable to measure how many messages are being sent to the Async Monitor, how large they are, and which component (namely, the async executor, or user code) issued them.
My recommendation is to monitor how much load the instrumentation is putting onto the event bus, according to the event density over a sliding window of time (let's say 100 ms long). If the event density from user code exceeds some predetermined threshold in any given 100 ms window, then the Async Console signals a problem report to the developer, telling them that their program's instrumentation may be slowing down the program. If the event density from the async executor (let's say tokio) exceeds a similar predetermined threshold, then the Async Console reports that up to a service associated with the tokio project. (Or, if the tokio project does not see value in getting such traffic, then the Async Console could suggest that the user file a bug on the tokio github.)
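A sketch of that density check follows; the 100 ms window comes from the text above, while the threshold value and type names are placeholders.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Illustrative sliding-window event-density check; names are placeholders.
struct DensityMonitor {
    window: Duration,              // e.g. 100 ms, per the text above
    threshold: usize,              // placeholder value, to be tuned
    timestamps: VecDeque<Instant>,
}

impl DensityMonitor {
    fn new(window: Duration, threshold: usize) -> Self {
        DensityMonitor { window, threshold, timestamps: VecDeque::new() }
    }

    /// Record one event; returns true if the window's density now exceeds
    /// the threshold (i.e. a problem report should be raised).
    fn record_event(&mut self, now: Instant) -> bool {
        self.timestamps.push_back(now);
        while let Some(&oldest) = self.timestamps.front() {
            if now.duration_since(oldest) > self.window {
                self.timestamps.pop_front();
            } else {
                break;
            }
        }
        self.timestamps.len() > self.threshold
    }
}
```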
The instrumentation added to the tokio runtime and client application code may expose details of internal operation that customers do not want exposed to the world at large.
We should check that there are controls in place that ensure either: 1. code deployed to production does not have such instrumentation turned on, or 2. the async monitor is not initiated on production systems, or 3. the port associated with the async monitor is blocked by a firewall guarding the customer's host machine or intranet.
The Async Monitor and Console answer a developer's questions about how their code's async executor is behaving as their program runs. They provide a summary of the executor's behavior, with metrics about the tasks and resources their program is using. They allow the developer to identify scheduling decisions and see how tasks and resources depend on each other.
Furthermore, they serve as the foundation of TurboWish. So, in effect: We aim in 2021 to deliver the foundation for 2022 and beyond. We will work with the Rust community to lay this groundwork, and our community will be enabled to make even more amazing tools atop this foundation.
pdxcarl TurboWish Thoughts
Long-term concern: We want developers to be able to deploy the Async Monitor with minimal changes to their code. A few lines of initialization code is acceptable, but broad changes to many files is not a good customer experience.
Ways to accomplish this are under active discussion, but any solution with such limited source code modification will require one or more of: 1. changes to the Rust standard library, 2. changes to the rust compiler, or 3. use of low-level features such as dtrace probes or eBPF. We have not yet decided on which of these options is appropriate.
Short term sacrifice: In addition to the initialization code described at the start of the description of the developer experience, initial versions of TurboWish also require the developer to swap in instrumented versions of common modules; a provided lint will guide developers in this process and help ensure no instances are overlooked.
use turbowish::sync::Mutex; // was previously `use std::sync::Mutex;`
(As already stated above, longer-term, we hope to leverage other facilities to get these instrumentation events.)
Long-term concern: We want the Async Monitor and Console to work with any popular async executor: tokio and async-std are obvious choices here. For the Async Monitor to be useful on an async program, one must use an async executor that has appropriate TurboWish instrumentation added to it.
Short term sacrifice: We will deploy a prototype with tokio, but try to keep in mind any differences with other async executors as we design the protocol used for the async executor to communicate with the async monitor.
Long-term concern: Client application code can benefit from adding extra instrumentation to their code. However, developers should be able to use and benefit from TurboWish without going to extremes adding new instrumentation beyond what the executor has out-of-the-box.
Some components of the Async Monitor will work with zero client instrumentation. In particular: the initial console output that shows the list of tasks and how their time is spent between polling, ready, and waiting does not require any client instrumentation.
However, other features of the Async Monitor, such as tasks listing resources with which they intend to interact, require either Rust compiler support or client instrumentation, or maybe both.
Short term sacrifice: We will prototype under the assumption that clients are willing to add instrumentation. The Async Monitor will differentiate between instrumentation that is "trusted": cases where instrumentation bugs will make the Async Monitor and Console produce misleading results (e.g. if transitions between polling and waiting are omitted or forged), and "untrusted": cases where instrumentation bugs, by design of the monitor, will at most lead to confusion, but not outright lies (e.g. if incorrect attributes are attached to a task or resource, the console output showing those attributes will be likewise incorrect, but it need not disrupt other parts of the Async Monitor or Console).
flowchart TD
Start --> PerfStart
%% Start --> TasksSeemBlocked
PerfStart([Performance does not match expectations])
PerfStart --> QA
%% TasksSeemBlocked([Tasks seem stuck])
%% TasksSeemBlocked --> A
QA{using<br/>async<br/>rust?}
QA --"yes" --> A
A --> R --> CC --> QProblemHighlighted
A[add TurboWish Async Console to service]
R[start service]
CC[connect to console]
QProblemHighlighted{Console<br/>highlights<br/>problem}
QProblemHighlighted -- "yes" --> ConsoleHighlight
ConsoleHighlight[observe console output]
ConsoleHighlight --> PendingWithoutWaker
PendingWithoutWaker([Console reports:<br/>buggy Future detected])
ConsoleHighlight --> CycleDetected
CycleDetected([Console reports:<br/>deadlock cycle detected])
PendingWithoutWaker --> UserReadsDocs
CycleDetected --> UserReadsDocs
UserReadsDocs[read linked Rust documentation]
UserReadsDocs --> IdentifyRootCause
QProblemHighlighted -- "no" --> QStuckTask
QStuckTask{any tasks<br/>blocked?}
QStuckTask -- "yes" --> InspectEachStuckTask --> FollowTaskResourceChains --> IdentifyRootCause
InspectEachStuckTask[inspect each stuck task]
FollowTaskResourceChains[follow task/resource dependencies]
IdentifyRootCause[identify root cause of problem]
QPH{excessive<br/>memory usage?}
QPH -. "yes" .-> H
QPH -. "no" .-> P
QA -. "no" .-> QPH
H[use turbowish-heapprof<br/><future project>]
P[use turbowish-perf<br/><future project>]
(This section assumes that a set of related wakers is a reasonable and necessary proxy for "resource". This assumption will be revisited during development; the executor controls creation of wakers, so it is a natural point to instrument; but the wakers may not have sufficient context available at their creation points to support describing their associated resource.)
The most basic functionality for the task/resource graph user story requires the executor to emit events whenever:
Supporting other user stories will likely require tracking other information as well (such as how many pending futures have had `wake` called and are awaiting a call to `poll`). We may need to add additional hooks to the async executor, analogous to the support for "pause", that the Async Monitor can invoke to turn on tracing of such information.
The emitted events should include unique identifiers (UIDs) for any referenced tasks/wakers.
The emitted events should also include some notion of the calling context for the event. This calling context should be meaningful from the viewpoint of the Client App Code.
For example, when `<TimerFuture as Future>::poll` calls `cx.waker().clone()`, we want the waker construction event to include (at minimum) that a waker was created from `TimerFuture`, so that one can properly tell what kind of resource that waker is associated with.

These events may include program-internal details such as (partial) stack traces that will include memory addresses of program code.

(It is unclear how much `#[track_caller]` can help here, so in general this is the only way I expect to be able to infer calling context without putting undue burden on client code.)

Figuring out the calling context for something like `cx.waker().clone()` will require either 1. client instrumentation that sets some thread- or task-local state, or 2. backtracing through the stack to find instruction addresses that the Async Monitor can, via debuginfo, map back to the calling context.

(embedded; follow this link to edit or write comments on the schedule itself.)