TurboWish is a suite of tools that give Rust developers insight into performance issues in their code.
The first TurboWish deliverable is the Async Monitor and Console, which answers a developer's questions about how their code's async runtime is behaving as their program runs.
The Async Console provides a summary of an async runtime's behavior, using concepts shared across the async rust ecosystem (such as tasks and resources), and metrics that are applicable to any async program (such as the average time each task spends waiting in a ready state).
When a developer asks: "Why is my task stuck?" or "Why is my app slow?", the Async Console is the first, and in some cases, the only tool they need to reach for. It will allow the developer to quickly see how tasks are scheduled (to learn how much time is spent in the developer's own code versus waiting to run), identify tasks that are starving or blocked and what resources they are waiting on, and identify tasks responsible for replenishing a scarce resource.
We plan to expand the TurboWish tool suite with other tools dedicated to other investigations, such as heap profiling or sampling-based CPU profiling. This design document is dedicated to the Async Monitor and Console; in tandem, they are the tool that will drive the top-down investigation of an async developer's performance issues.
This document describes the development plan for the first TurboWish tool: the Async Monitor and Console. It opens with the original goals followed by (proposed, unfinalized) tenets for the TurboWish suite itself. It then presents a description of the customer experience using the Async Monitor and Console. Then it sketches an implementation plan. A section follows describing metrics for evaluating whether the tools are achieving their goal, and finally a section touching on security concerns. Appendices follow the conclusion of the document, including an appendix with the project schedule.
link to goals (if you want to comment on them, go to the linked doc)
The Rust community at large, both within and outside of AWS, is the customer for TurboWish.
Injected instrumentation must not block application progress.
Timing-measurement noise induced by client instrumentation should be minimized.
Present diagnoses in terms of customer-centric concepts, such as resource and task.
Minimize coupling between components to encourage concurrent community development.
The transition from development to production deserves as much performance tooling as production monitoring itself. (More specifically: We see customers struggling to get their software operating well enough to push to release; therefore, some tools can be specialized to the development use-case.)
When a developer wants to deploy the Async Monitor on their program, they will need to hook it in by adding it as an upstream dependency (in the `Cargo.toml` file) and also writing a few initialization lines at the start of their source code:
```rust
use turbowish_async_monitor as tw_monitor;

#[tokio::main]
async fn main() {
    tw_monitor::builder().port(8080).init();
    // Rest of the app
}
```
With that initialization code in place, the service will operate in the same fashion as before, but will now also run the Async Monitor, which observes events and then, based on those observations, builds an internal model of the program and the async executor. External programs, such as the Async Console, can now connect to the port indicated above and present the state of the Async Monitor.
When the developer first connects the Async Console, they get a presentation similar to the UNIX `top` command, showing a brief summary of the executors (with metrics like the number of tasks running or sleeping, average times spent waiting to run, and average `Future::poll` runtimes), and below that, a list of the current tasks, each on its own line with an id number, name, current run state (Polling, Ready to poll, and Waiting), and a list of task attributes that include developer-specified metadata. This view will dynamically update (just like `top`) as the application runs.
The async console presents data primarily oriented around tasks and resources. The aim is for a model as straightforward to understand as the Resource Allocation Graphs described by Ric Holt in 1972: We wish to explain as much as we can in terms of tasks waiting for or using resources (either by locking them if they are exclusive resources, or by consuming them if they are cumulative resources such as message channels), or resources waiting for tasks to make them available (either by unlocking them for exclusive resources, or by refueling them if they are cumulative resources).
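To make this model concrete, the following is a minimal sketch of such a task/resource graph in Rust. The names and shapes below are illustrative assumptions for this document, not the Async Monitor's actual data structures.

```rust
// Illustrative sketch only: a minimal resource-allocation-graph model in the
// spirit of Holt's 1972 formulation; not the Async Monitor's real types.
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct TaskId(u64);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ResourceId(u64);

#[derive(Debug)]
enum ResourceKind {
    /// Held by at most `capacity` tasks at a time (e.g. a mutex, capacity 1).
    Exclusive { capacity: usize, holders: Vec<TaskId> },
    /// Holds some number of consumable units (e.g. messages on a channel).
    Cumulative { units: usize, bound: Option<usize> },
}

#[derive(Debug, Default)]
struct ResourceGraph {
    /// Task -> the resource it is currently waiting for.
    task_waits_on: HashMap<TaskId, ResourceId>,
    /// Resource -> tasks expected to make it available again
    /// (by unlocking it, or by refueling/draining it).
    resource_waits_on: HashMap<ResourceId, Vec<TaskId>>,
    resources: HashMap<ResourceId, ResourceKind>,
}
```

In this formulation, a deadlock cycle is simply a cycle that alternates between task-waits-on-resource and resource-waits-on-task edges.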
The async console monitors for known performance pitfalls, and includes a highlighted alert at the top of the screen if the program is currently exhibiting one of those pitfalls. The alert provides a link to a problem-oriented view, that explains the pitfall and provides guidance as to how one can resolve it.
From the async console, the developer can do three main things. First, they can dig into an individual record of a task or resource, by traversing hyperlinks for that task or resource. (A "record" in this context is a screen summarizing the event history and current status for that task or resource.) Second, they can pause a waiting task (or request that a specific non-waiting task be paused when it next evaluates a `.await`); such paused tasks can be subsequently inspected by a debugger attaching to the running process, and then resumed when the developer is done with the attached debugger. Third, they can roll back history, in order to inspect past states for tasks or resources.
The performance pitfall alerts handle a collection of known cases where our customers are asking themselves: "why is my task stuck?"
Examples of problems that the Async Monitor can detect include deadlock cycles, excessive polling times, and buggy implementations of `Future` that fail to register a waker.
As a concrete example: When a deadlock cycle occurs, the async console displays an alert saying "deadlock cycle detected." The developer follows the hyperlink for the alert, and this brings up a "problem view" that shows an interleaved list of tasks and resources, corresponding to the chain of dependencies that forms the cycle.
A resource record shows information about an individual resource.
Some resources are exclusive: they can be held by at most N tasks at a time (where N is often 1, such as in the case of a mutex lock). The resource record for an exclusive resource will include a list of tasks that currently hold it. It will also include a separate list of tasks that are currently blocked while trying to acquire the resource.
Some resources are cumulative: they hold some number of resource units (work items in a queue, messages on a channel, et cetera). The resource record for a cumulative resource will include a list of tasks that are currently blocked on the resource's current state (for example, for a channel, a receiver will block on an empty channel; a sender will block on a bounded channel at full capacity).
(In Holt 1972, the terms "reusable" and "consumable" roughly correspond to our usage of "exclusive" and "cumulative". The correspondence is imperfect though; e.g. Holt refers to individual consumable resource units, while for us, a "cumulative resource" describes a collection of such consumable units, such as a channel.)
In addition, the resource record will include lists of tasks that have signaled intent to interact with the resource. This way, the resource record for a multi-producer single-consumer channel can include a list of all of the sending tasks that hold the channel, as well as the receiver task associated with the channel, regardless of whether the channel's message buffer is empty, partially-full, or full.
Listing the tasks that intend to interact with the resource is how we can provide the developer with enough information to resolve unexpected blocking behavior in their code. For example, after seeing that a sender task is blocked because a bounded channel is at full capacity, it is simple for the developer to follow hyperlinks to go from the sender task to the channel's resource record, and from there through another hyperlink to the receiver task, and then work on figuring out how to make the receiver task process messages more efficiently.
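As a rough illustration, and building on the hypothetical types in the earlier sketch, a resource record might carry roughly the following fields:

```rust
/// Hypothetical shape of a resource record; field names are illustrative.
struct ResourceRecord {
    id: ResourceId,
    kind: ResourceKind,
    /// Tasks blocked on the resource's current state (e.g. senders blocked on
    /// a full bounded channel, or a receiver blocked on an empty one).
    blocked_tasks: Vec<TaskId>,
    /// Tasks that have signaled intent to interact with this resource,
    /// whether or not they are currently blocked on it.
    interested_tasks: Vec<TaskId>,
    /// Recent events involving this resource, for the record's history view.
    history: Vec<String>,
}
```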
If a task is blocked, the task record will show what specific `.await` expression it is blocked on, by showing the span (file:line and column range) and the expression itself (`my_channel.send(value).await`). If the blockage is due to a specific resource, then a hyperlink to its resource record will be provided as well.
More generally, a task record shows information about an individual task, including lists of resources associated with the task.
If a task currently holds exclusive access to resources, all such resources will be listed.
As mentioned earlier, tasks can signal intent to interact with a resource. There are two kinds of intention that can be signaled: conditional intent and unconditional intent.
A task signals conditional intent when the interaction is predicated on some external condition holding; i.e., it will occur only on some of the task's non-error control-flow paths. (One could imagine approximating conditional intent by just listing all the resources that are reachable from the task; this is a topic we should explore during implementation.)
A task signals unconditional intent as a way to state "if I can make sufficient forward progress, I will interact in this manner with that resource." It is essentially another way of saying: "if you can figure out how to get me unblocked, these are the actions I will take", which can be crucial in resolving certain kinds of resource starvation issues.
The task record shows two separate lists corresponding to the two kinds of intention. This way, developers are informed about which tasks to look at first when trying to understand the interactions between tasks and resources in their program.
During execution, resources may move from the conditional list to the unconditional list, or they may move off the conditional list entirely, all according to what a task indicates as its code runs. When a task has actually performed its intended operation to completion (e.g. if it is entirely done sending messages on a given channel, and signals such to the async monitor) then the resource will likewise be removed from the task's resource lists.
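Correspondingly, a task record might look roughly like the following sketch (again, hypothetical names building on the earlier sketches):

```rust
/// Hypothetical shape of a task record; not the Async Monitor's real types.
enum TaskState {
    Polling,
    ReadyToPoll,
    /// Waiting at a specific `.await`; `blocked_on` is set when the blockage
    /// is attributable to a particular resource.
    Waiting { await_span: String, blocked_on: Option<ResourceId> },
}

struct TaskRecord {
    id: TaskId,
    name: String,
    state: TaskState,
    /// Exclusive resources this task currently holds.
    held: Vec<ResourceId>,
    /// Resources the task may interact with, on some control-flow paths only.
    conditional_intent: Vec<ResourceId>,
    /// Resources the task will interact with if it makes sufficient progress.
    unconditional_intent: Vec<ResourceId>,
}
```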
Sometimes the information provided in the TurboWish Async Monitor's records will not give enough detail, and the developer will want to inspect the actual state of the task's memory on the live process in a debugger.
To pause a task, the developer can go to its Task record page, or just select it in the Async Console overview, and then issue the "pause" command. The executor pauses tasks by opting not to schedule them, instead leaving them suspended where they evaluated `.await`.
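One plausible shape for that executor-side hook is sketched below; `PauseState`, its fields, and the scheduling loop are invented for illustration (building on the earlier `TaskId` sketch), not tokio's or the Async Monitor's actual interfaces.

```rust
use std::collections::HashSet;

/// Stand-in for the monitor's pause bookkeeping (hypothetical).
struct PauseState {
    paused: HashSet<TaskId>,
    parked: Vec<TaskId>,
}

impl PauseState {
    /// Pop the next runnable task, skipping (and parking) any paused ones so
    /// they remain suspended at the `.await` where they last yielded.
    fn schedule_next(&mut self, run_queue: &mut Vec<TaskId>) -> Option<TaskId> {
        while let Some(task) = run_queue.pop() {
            if self.paused.contains(&task) {
                // A debugger can now attach and inspect the parked task;
                // "resume" would move it back onto the run queue.
                self.parked.push(task);
                continue;
            }
            return Some(task);
        }
        None
    }
}
```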
As a convenience, the Async Console provides `gdb` and `lldb` commands as helpers that pause the task, then spawn the corresponding debugger processes and attach them appropriately to the paused task.
Sometimes humans just don't understand how they got into some mess.
For those situations, the Async Monitor offers rollback functionality. Developers can step backwards through the event history and see how such steps affect the task and resource records.
For example, a `select!` expression in tokio will run a collection of futures concurrently. It will proceed with the value provided by whichever future completes first, canceling the other futures that lost the race. If code is not written to anticipate such cancellation, it can lead to data corruption, or tasks appearing to be blocked indefinitely (aka "hung").
As a concrete example adapted from a blog post, consider this code:
```rust
let mut file = ...;
let mut channel = ...;

loop {
    futures::select! {
        _ = read_send(&mut file, &mut channel) => {},
        some_data = socket.read_packet() => {
            // ...
        }
    }
}
```
Here, `read_send` is constructing a future on every iteration through the loop. If the `socket.read_packet()` invocation ever wins the `select!` race, then the `read_send` future is dropped, which means the data may have been extracted from the file but not yet relayed on the `channel`. (The details are spelled out in the blog post.)
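To make the hazard concrete, here is an assumed shape for `read_send` (not the blog post's actual code): cancellation can strike at any `.await` point, so dropping the future between the read and the send loses the chunk that was already read.

```rust
use tokio::io::AsyncReadExt;
use tokio::sync::mpsc::Sender;

// Assumed shape of read_send, for illustration only.
async fn read_send(
    file: &mut tokio::fs::File,
    channel: &mut Sender<Vec<u8>>,
) -> std::io::Result<()> {
    let mut buf = vec![0u8; 1024];
    let n = file.read(&mut buf).await?; // the data has left the file here...
    buf.truncate(n);
    // ...but if the select! drops this future while it is suspended at the
    // send below, the chunk never reaches the channel and is lost.
    let _ = channel.send(buf).await;
    Ok(())
}
```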
If a task appears to be hung, the Async Monitor allows the developer to gain insight into why: They can visit the task view for the blocked task. The task view will indicate where it is stuck (in the `select!`). Since `select!` is known to be a source of problems for developers, the view will also include a hyperlink that will "rewind time" to the invocation of `select!` on the previous iteration of this loop, so that one can observe what side-effects it had. Upon rewinding time, the task view indicates that the `socket.read_packet()` future won the race, and that the future returned from `read_send` was dropped.

(At this point, we are relying on the developer having an "Aha" moment about the effect of dropping a future when it is in the middle of its computation. It would be good to explore ways to help people get the insight sooner. For example, maybe having the view here indicate where the `read_send` future was in its own computation at the time it was dropped would help someone see that we need to keep that same future running.)
As part of the configuration options for the Async Monitor, one can adjust the upper-bound on the size of the event log, in order to ensure its memory usage does not induce memory exhaustion, virtual memory swapping, or other problems that could excessively perturb performance evaluation.
For many problems, the Async Monitor will provide the developer with the insight they need to resolve their questions about their program's interactions with the async executor. However, some performance problems will require further inspection. The Async Monitor can help with some of that, such as when it allows a developer to pause a task. But sometimes a developer will need insight into other aspects of their program, such as where memory or CPU time is going. Other tools will be needed for such cases.
An appendix of this document shows a diagram of the expected developer workflow. The diagram presents the steps described above, but it also shows decision points where a user might need to employ another tool.
Now that we have finished explaining the experience of someone using the tool, we will discuss how to deliver that experience to our customers.
As mentioned above, there are two main components: the Async Monitor, and the Async Console.
The 2021 release of TurboWish will need to provide value without having any support from the Rust compiler or standard library. Therefore, the initial release will work by adding instrumentation directly to the async executor and related libraries, following the event specification described below.
The instrumentation will be responsible for documenting how the relationships between tasks and resources change over time. More specifically, the instrumentation will capture: task state transitions (running, ready, waiting), task acquisition of exclusive resources (e.g. locking a mutex), task modifications of resource state (e.g. sending on a channel or receiving from it), and task intent to interact with resources.
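As a sketch of what those instrumentation points might emit, assuming `tracing` as the event bus (the event and field names below are placeholders, not the finalized event specification):

```rust
use tracing::trace;

// Placeholder instrumentation points; names are not part of any real spec.
fn on_task_state_change(task_id: u64, from: &str, to: &str) {
    trace!(target: "tw_events", task_id, from, to, "task_state_transition");
}

fn on_lock_acquired(task_id: u64, resource_id: u64) {
    trace!(target: "tw_events", task_id, resource_id, "resource_acquired");
}

fn on_channel_send(task_id: u64, resource_id: u64, queued: usize) {
    trace!(target: "tw_events", task_id, resource_id, queued, "resource_state_changed");
}

fn on_intent_signaled(task_id: u64, resource_id: u64, unconditional: bool) {
    trace!(target: "tw_events", task_id, resource_id, unconditional, "task_intent");
}
```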
As mentioned in the Task Records section, there are two kinds of intent: conditional and unconditional. My long-term hope is to leverage some sort of heap-ownership tracing system to infer conditional intent, because signaling it via manual instrumentation will be arduous and error-prone. (Heap ownership tracking alone cannot infer unconditional intent, but it may be possible to leverage compiler analysis to perform such inference of unconditional intent.)
There are four short-term deliverables: 1. the Async Monitor, 2. the Async Console, 3. a specification of what instrumentation events the Async Monitor understands (which may come from the client code, the async executor, or any related libraries (e.g. Rayon or other crates that offer a thread pool)), and 4. the instrumentation of Tokio (following the aforementioned specification) to support the Async Monitor.
This is a rendering of the component separation. There are two running programs: The program being developed, and the Async Console. Within the program being developed, instrumentation events are generated and travel on the event bus, where they are observed by the Async Monitor running as a thread in that process. The Async Monitor communicates with the Async Console, presenting portions of the model according to the requests that arrive from the console.
%% Note: `%%` at line start is a comment.
flowchart TD
subgraph Client [Development Program ___]
TwCollect -.- pause_reqs -.-> Tokio
pause_reqs([introspection requests,<br/>e.g. 'pause'])
TracingEventBus
TracingEventBus -.-> TwCollect
TwCollect[Async Monitor]
TwCollect --- EventLog
EventLog[(EventLog)]
Tokio[Async Executor<br/>e.g. Tokio]
ClientCode[Client App Code]
Rayon
ClientCode -.-> TracingEventBus
Tokio -.-> TracingEventBus
Rayon -.-> TracingEventBus
TracingEventBus[Event Bus<br/>atop tracing crate<br/><follows event spec>]
end
subgraph Console [Console Program __]
TwTui <--> TwCollect
TwTui([Async Console])
end
In order to enable concurrent development of the four deliverables, we will follow these steps:
First, we must pick some format for specifying the grammar of the instrumentation events. Anything with an existing off-the-shelf parser-generator available as a Rust crate will serve. JSON might be a reasonable choice here, at least for initial prototyping; but we should also evaluate whether other event description formats incur less encoding overhead than JSON.
(Another option would be a dedicated enum type for events. That would then need to be exposed in its own crate that downstream clients would need to add as a dependency. Perhaps more importantly: we are very likely to use `tracing` as the event bus, which takes string-encoded messages as the basis for its encoding.)
Second, we must make an initial specification of events. Just saying "it's JSON" is not sufficient; we need to state how each desired event is actually encoded in the chosen grammar. This will evolve over time. It must cover everything listed in Instrumentation.
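Purely for illustration, and assuming JSON is chosen for prototyping, a single event might be encoded and decoded along these lines (the field names are placeholders that the specification work would pin down):

```rust
use serde::{Deserialize, Serialize};

// Placeholder event shape, not the actual specification.
#[derive(Serialize, Deserialize, Debug)]
struct Event {
    kind: String,          // e.g. "task_state_transition"
    timestamp_us: u64,     // microseconds since monitor start
    task_id: u64,
    resource_id: Option<u64>,
    detail: serde_json::Value,
}

fn parse_example() -> serde_json::Result<Event> {
    serde_json::from_str(
        r#"{
            "kind": "task_state_transition",
            "timestamp_us": 1024,
            "task_id": 7,
            "resource_id": null,
            "detail": { "from": "ready", "to": "polling" }
        }"#,
    )
}
```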
With these two pieces in place, we can spin off a few parallel tracks of development.
For instrumenting tokio, I recommend that we start by developing an event-validating async monitor (or more simply, "event-validator"). The event-validator is responsible for ensuring the incoming event stream is coherent; it might build up a minimal model as part of its coherence check, but it is not responsible for gathering all the metrics that are expected of the Async Monitor product itself. With an event-validator in place, one can test the correctness of tokio instrumentation as it is added. (We will also need to do integration tests once the async monitor itself is constructed.)
For the Async Monitor, I recommend the following baby-steps toward implementation.
First, either make a new mock async executor, or select a pre-existing simple reference async executor (several exist on github), and use it as the mock executor. Use the mock executor for initial development of the Async Monitor: that is, the in-development Async Monitor would be exercised by building an internal model of the mock executor's state.
The reason for employing a mock executor is two-fold: 1. it will be quicker to add (and modify when inevitably needed) the necessary instrumentation as the Async Monitor itself is developed, and 2. using a mock rather than tokio directly will help prevent inadvertent coupling between tokio itself and the async monitor; such coupling would make it hard to add support to other async executors like async-std in the future.
In addition to a mock executor, I also recommend making a mock event streamer that will simulate both ends connected to the Async Monitor: it will generate the events that we expect to be signaled by an instrumented application sitting atop some instrumented executor, and it will also simulate a console attached to the other side of the Async Monitor. I recommend having a mock event streamer because this is the easiest way to generate a deterministic series of instrumentation events interleaved with console events. Making such event generation deterministic will be crucial for testing the async monitor.
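The following sketch shows what such a deterministic test might look like; `MonitorModel` is a stand-in stub for this document, not the real Async Monitor, and the event lines are invented for illustration.

```rust
use std::collections::HashMap;

/// Stand-in stub for the model the Async Monitor would build; not real code.
#[derive(Default)]
struct MonitorModel {
    blocked_on: HashMap<u64, u64>, // task_id -> resource_id
}

impl MonitorModel {
    fn ingest(&mut self, event: &str) {
        // Grossly simplified parser, just enough for the invented events below.
        if let Some(rest) = event.strip_prefix("task_blocked_on ") {
            let (mut task, mut res): (u64, u64) = (0, 0);
            for field in rest.split_whitespace() {
                if let Some(v) = field.strip_prefix("task_id=") {
                    task = v.parse().unwrap();
                } else if let Some(v) = field.strip_prefix("resource_id=") {
                    res = v.parse().unwrap();
                }
            }
            self.blocked_on.insert(task, res);
        }
    }
}

#[test]
fn blocked_sender_shows_up_in_model() {
    let events = [
        "task_spawned task_id=1 name=sender",
        "task_intent task_id=1 resource_id=9 unconditional=true",
        "task_blocked_on task_id=1 resource_id=9", // bounded channel is full
    ];
    let mut model = MonitorModel::default();
    for e in events {
        model.ingest(e);
    }
    assert_eq!(model.blocked_on.get(&1), Some(&9));
}
```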
Finally, for the console, I recommend that we make a mock async monitor. It will simulate the Async Monitor's side of the communication protocol used by any Console connecting to the monitor's port.
(Defining the communication protocol between the Async Monitor and the Console is a prerequisite here. However, this protocol is only exposed to the Monitor and Console, and so it can be revised without having to worry about updating downstream clients.)
The three decoupled tracks listed above are repeated below in a visual diagram, showing the dependencies between the pieces.
flowchart TD
spec[Event Specification]
spec --> validating_monitor[Event-Validating Async Monitor]
spec --> mock_executor[Mock Async Executor]
validating_monitor --> instrument_tokio[Add instrumentation to tokio]
spec --> mock_event_stream
mock_event_stream[Mock Event Streamer]
mock_executor --> async_monitor[Async Monitor]
mock_event_stream --> async_monitor
mock_monitor[Mock Async Monitor] --> Console
See Appendix: Sacrifices for Minimum Viable Product
The goal of these tools is to provide users with insight into performance pitfalls. How can we know if the Async Monitor and Console are achieving their goal?
There are two ways I can imagine approaching this: telemetry from the Async Console program, or evaluating out-of-band signals (such as sentiment analysis on social media). I will focus here on the telemetry option, under the assumption that any potential telemetry would be an opt-in choice presented to the customer when they first use the Async Console tool.
For evaluating whether the tool is successfully providing users with insight, the most obvious answer is that we could ask our users. When the console detects a problem and the user shifts to the dedicated problem view describing the problem, the console could also proactively ask a "Yes"/"No" question about whether the information it presents is helping the user solve their problem (or perhaps request a user happiness rating on a 1-to-5 scale), and then ship those responses back to a service that collects such feedback.
Alternatively: we already plan to have the tool link to websites that provide documentation of various known performance pitfalls. Rather than building telemetry into the console, the linked websites could ask the visitor whether the Async Monitor and Console are working for them. (However, this approach runs the risk of omitting the experiences of users who do not follow the links.)
On the flip side of things, we also want to ensure that the instrumentation from TurboWish is not injecting too much overhead into our customers' applications.
While it is probably not reasonable to try to measure the time spent issuing each individual instrumentation event, it is entirely reasonable to measure how many messages are being sent to the Async Monitor, how large they are, and which component (namely, the async executor, or user code) issued them.
My recommendation is to monitor how much load the instrumentation is putting onto the event bus, according to the event density over a sliding window of time (let's say 100 ms long). If the event density from user code exceeds some predetermined threshold in any given 100 ms window, then the Async Console signals a problem report to the developer, telling them that their program's instrumentation may be slowing down the program. If the event density from the async executor (let's say tokio) exceeds a similar predetermined threshold, then the Async Console reports that up to a service associated with the tokio project. (Or, if the tokio project does not see value in getting such traffic, then the Async Console could suggest that the user file a bug on the tokio github.)
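A sketch of that density check follows; the 100 ms window comes from the text above, while the threshold value and type names are placeholders.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Illustrative sliding-window event-density check; names are placeholders.
struct DensityMonitor {
    window: Duration,              // e.g. 100 ms, per the text above
    threshold: usize,              // placeholder value, to be tuned
    timestamps: VecDeque<Instant>,
}

impl DensityMonitor {
    fn new(window: Duration, threshold: usize) -> Self {
        DensityMonitor { window, threshold, timestamps: VecDeque::new() }
    }

    /// Record one event; returns true if the window's density now exceeds
    /// the threshold (i.e. a problem report should be raised).
    fn record_event(&mut self, now: Instant) -> bool {
        self.timestamps.push_back(now);
        while let Some(&oldest) = self.timestamps.front() {
            if now.duration_since(oldest) > self.window {
                self.timestamps.pop_front();
            } else {
                break;
            }
        }
        self.timestamps.len() > self.threshold
    }
}
```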
The instrumentation added to the tokio runtime and client application code may expose details of internal operation that customers do not want exposed to the world at large.
We should check that there are controls in place that ensure either: 1. code deployed to production does not have such instrumentation turned on, or 2. the async monitor is not initiated on production systems, or 3. the port associated with the async monitor is blocked by a firewall guarding the customer's host machine or intranet.
The Async Monitor and Console answer a developer's questions about how their code's async executor is behaving as their program runs. They provide a summary of the executor's behavior, with metrics about the tasks and resources their program is using. They allow the developer to identify scheduling decisions and see how tasks and resources depend on each other.
Furthermore, they serve as the foundation of TurboWish. So, in effect: We aim in 2021 to deliver the foundation for 2022 and beyond. We will work with the Rust community to lay this groundwork, and our community will be enabled to make even more amazing tools atop this foundation.
pdxcarl TurboWish Thoughts
Long-term concern: We want developers to be able to deploy the Async Monitor with minimal changes to their code. A few lines of initialization code is acceptable, but broad changes to many files is not a good customer experience.
Ways to accomplish this are under active discussion, but any solution with such limited source code modification will require one or more of: 1. changes to the Rust standard library, 2. changes to the rust compiler, or 3. use of low-level features such as dtrace probes or eBPF. We have not yet decided on which of these options is appropriate.
Short term sacrifice: In addition to the initialization code described at the start of the description of the developer experience, initial versions of TurboWish also require the developer to swap in instrumented versions of common modules; a provided lint will guide developers in this process and help ensure no instances are overlooked.
use turbowish::sync::Mutex; // was previously `use std::sync::Mutex;`
(As already stated above, longer-term, we hope to leverage other facilities to get these instrumentation events.)
Long-term concern: We want the Async Monitor and Console to work with any popular async executor: tokio and async-std are obvious choices here. For the Async Monitor to be useful on an async program, one must use an async executor that has appropriate TurboWish instrumentation added to it.
Short term sacrifice: We will deploy a prototype with tokio, but try to keep in mind any differences with other async executors as we design the protocol used for the async executor to communicate with the async monitor.
Long-term concern: Client application code can benefit from adding extra instrumentation to their code. However, developers should be able to use and benefit from TurboWish without going to extremes adding new instrumentation beyond what the executor has out-of-the-box.
Some components of the Async Monitor will work with zero client instrumentation. In particular: the initial console output that shows the list of tasks and how their time is spent between polling, ready, and waiting does not require any client instrumentation.
However, other features of the Async Monitor, such as tasks listing resources with which they intend to interact, require either Rust compiler support or client instrumentation, or maybe both.
Short term sacrifice: We will prototype under the assumption that clients are willing to add instrumentation. The Async Monitor will differentiate between instrumentation that is "trusted": cases where instrumentation bugs will make the Async Monitor and Console produce misleading results (e.g. if transitions between polling and waiting are omitted or forged), and "untrusted": cases where instrumentation bugs, by design of the monitor, will at most lead to confusion, but not outright lies (e.g. if incorrect attributes are attached to a task or resource, the console output showing those attributes will be likewise incorrect, but it need not disrupt other parts of the Async Monitor or Console).
flowchart TD
Start --> PerfStart
%% Start --> TasksSeemBlocked
PerfStart([Performance does not match expectations])
PerfStart --> QA
%% TasksSeemBlocked([Tasks seem stuck])
%% TasksSeemBlocked --> A
QA{using<br/>async<br/>rust?}
QA --"yes" --> A
A --> R --> CC --> QProblemHighlighted
A[add TurboWish Async Console to service]
R[start service]
CC[connect to console]
QProblemHighlighted{Console<br/>highlights<br/>problem}
QProblemHighlighted -- "yes" --> ConsoleHighlight
ConsoleHighlight[observe console output]
ConsoleHighlight --> PendingWithoutWaker
PendingWithoutWaker([Console reports:<br/>buggy Future detected])
ConsoleHighlight --> CycleDetected
CycleDetected([Console reports:<br/>deadlock cycle detected])
PendingWithoutWaker --> UserReadsDocs
CycleDetected --> UserReadsDocs
UserReadsDocs[read linked Rust documentation]
UserReadsDocs --> IdentifyRootCause
QProblemHighlighted -- "no" --> QStuckTask
QStuckTask{any tasks<br/>blocked?}
QStuckTask -- "yes" --> InspectEachStuckTask --> FollowTaskResourceChains --> IdentifyRootCause
InspectEachStuckTask[inspect each stuck task]
FollowTaskResourceChains[follow task/resource dependencies]
IdentifyRootCause[identify root cause of problem]
QPH{excessive<br/>memory usage?}
QPH -. "yes" .-> H
QPH -. "no" .-> P
QA -. "no" .-> QPH
H[use turbowish-heapprof<br/><future project>]
P[use turbowish-perf<br/><future project>]
(This section assumes that a set of related wakers is a reasonable and necessary proxy for "resource". This assumption will be revisited during development; the executor controls creation of wakers, so it is a natural point to instrument; but the wakers may not have sufficient context available at their creation points to support describing their associated resource.)
The most basic functionality for the task/resource graph user story requires the executor to emit events whenever:
Supporting other user stories will likely require tracking other information as well (such as how many pending futures have had `wake` called and are awaiting a call to `poll`). We may need to add additional hooks to the async executor, analogous to the support for "pause", that the Async Monitor can invoke to turn on tracing of such information.
The emitted events should include unique identifiers (UIDs) for any referenced tasks/wakers.
The emitted events should also include some notion of the calling context for the event. This calling context should be meaningful from the viewpoint of the Client App Code.
For example, when `<TimerFuture as Future>::poll` calls `cx.waker().clone()`, we want the waker construction event to include (at minimum) that a waker was created from `TimerFuture`, so that one can properly tell what kind of resource that waker is associated with.

These events may include program-internal details such as (partial) stack traces that will include memory addresses of program code.

(It is unclear how much `#[track_caller]` can help here, so in general this is the only way I expect to be able to infer calling context without putting undue burden on client code.)

Figuring out the calling context for something like `cx.waker().clone()` will require either 1. client instrumentation that sets some thread- or task-local state, or 2. backtracing through the stack to find instruction addresses that the Async Monitor can, via debuginfo, map back to the calling context.

(embedded; follow this link to edit or write comments on the schedule itself.)