Task Trace Mechanisms

Goal

Our goal is to develop a mechanism for inspecting the tree of pending futures from its root (i.e., a task); e.g.:

╼ taskdump::foo at backtrace/examples/taskdump.rs:20:1
  └╼ taskdump::bar at backtrace/examples/taskdump.rs:25:1
     └╼ join at backtrace/examples/taskdump.rs:26:4
        ├╼ taskdump::buz at backtrace/examples/taskdump.rs:35:1
        │  └╼ taskdump::baz at backtrace/examples/taskdump.rs:40:1
        │     └╼ tokio::sleep(10s) at backtrace/examples/taskdump.rs:42:4
        └╼ taskdump::fiz at backtrace/examples/taskdump.rs:30:1
           └╼ yield_now() at backtrace/examples/taskdump.rs:31:4

A tasktrace should contain pending futures, but not completed or yet-to-be-polled futures. Each frame of a tasktrace should include a name (either the async function name, or the name of the Future type) and a source location. For leaf futures, tasktraces ideally will contain additional information (e.g., the duration of a sleep, or the name of a file being read).

For further reading, see: Logical stack traces - Setting Goals & Collecting Examples.

Tenets

Safe

The mechanism must pose minimal risk to services.

Efficient

The mechanism should add minimal overhead to the size or poll performance of Futures.

Ergonomic

The mechanism should require minimal manual intervention to produce useful traces.

Solution Space

We've considered six mechanisms for tracing futures; three that require compiler support, and three that do not:

  1. 🦀 Reflection with Debug Info
  2. 🦀 Language-Supported Async Frame Pointers
  3. 🦀 Future Traversal API
  4. 📦 Via tracing
  5. 📦 Library-Only Async Frame Pointers
  6. 📦 Backtrace the Leaves

Below is a brief summary of each of the approaches and their tradeoffs. As you read, consider these key dimensions of distinction:

  • To what degree does the mechanism require manual intervention?
  • What will be missing from the trace if manual intervention does not occur?
  • What is the runtime cost of the mechanism?
  • Can the mechanism be used to debug very stuck tasks (e.g., a task whose poll is infinitely looping)?

The in-library possibilities will be presented with Tokio in mind, but, in practice, should be applicable to almost any async runtime.

This document concludes that the 📦 Backtrace the Leaves approach is the best candidate for a tasktracing mechanism.

🦀 Reflection with Debug Info

In this approach, DWARF (or PDB) debuginfo is used to reflect the in-memory structure of Rust futures, to direct the recursive traversal of sub-futures. This approach requires that the compiler emit sufficient debuginfo for async blocks, and that a library is developed to provide reflections of manually-implemented futures.

This approach imposes no runtime overhead on futures. Unfortunately, it carries three major disadvantages:

  1. The maintenance burden of writing custom reflections for most manually-implemented futures in widespread use would be immense.
  2. Any mistakes in custom reflections would likely surface as segfaults, potentially crashing the application being debugged.
  3. It is common to strip debuginfo in production; recovering debuginfo for reflection poses deployment challenges.
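
To make the second disadvantage concrete, the sketch below imagines the kind of hand-written reflection a support library would need for a manually-implemented leaf future; the Reflect trait is entirely hypothetical and exists only to illustrate the shape (and fragility) of that work:

use std::time::Duration;

// A manually-implemented leaf future (fields elided).
struct Sleep {
    remaining: Duration,
}

// Hypothetical reflection trait; nothing like this exists today.
unsafe trait Reflect {
    /// Render a trace frame from a raw pointer to a suspended future.
    unsafe fn describe(this: *const ()) -> String;
}

unsafe impl Reflect for Sleep {
    unsafe fn describe(this: *const ()) -> String {
        // A wrong cast, offset, or layout assumption here dereferences
        // arbitrary memory, crashing the application being debugged.
        let sleep = unsafe { &*(this as *const Sleep) };
        format!("tokio::sleep({:?})", sleep.remaining)
    }
}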

This approach cannot soundly be used to trace very stuck tasks (e.g., a task whose poll is infinitely looping).

🦀 Language-Supported Async Frame Pointers

In this approach, futures insert their Location into an intrusive, doubly-linked tree when first polled, and remove their Location from that tree when they are dropped. A trace is produced by walking the tree of Locations from the root of the task. This is, essentially, the approach taken by the async-backtrace crate, but with safety concessions to accommodate the unavailability of thread-local storage in Rust's core library.

Depending on the implementation details, this approach would add between 40 and 48 bytes of overhead to each Future in a Future tree.
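
For a rough sense of where that figure comes from, a node in such an intrusive tree might look like the following sketch (field names are hypothetical); five pointer-sized fields account for 40 bytes on 64-bit targets, with the remainder depending on layout details:

use core::panic::Location;
use core::ptr::NonNull;

// Hypothetical per-future node. Parent, sibling, and child links let a frame
// unlink itself in O(1) when its future is dropped; the Location identifies
// the frame's source location.
struct Frame {
    location: &'static Location<'static>,
    parent: Option<NonNull<Frame>>,
    prev_sibling: Option<NonNull<Frame>>,
    next_sibling: Option<NonNull<Frame>>,
    first_child: Option<NonNull<Frame>>,
}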

Under this approach, non-participating sub-futures would simply be missing from traces, but their participating children would be present.

With thread pausing, this approach can be used to trace very stuck tasks.

🦀 Future Traversal API

In this approach, we extend Future with a visitor API; e.g.:

use core::panic::Location;
use core::pin::Pin;
use core::task::{Context, Poll};

pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;

    fn as_visitable(self: Pin<&mut Self>) -> Option<Pin<&mut dyn Visitable>> {
        None
    }
}

pub trait Visitable {
    /// The source location associated with this future.
    fn location(self: Pin<&mut Self>) -> &'static Location<'static>;

    /// Visit this `Future`'s sub-futures.
    fn visit(self: Pin<&mut Self>, visit: &mut dyn Visit);
}

pub trait Visit {
    fn visit_future(&mut self, future: Pin<&mut dyn Visitable>);
}

The compiler will automatically implement Future::as_visitable and Visitable for async blocks and functions.
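
Continuing the sketch above, a trace collector built on this API might recurse roughly as follows; Collector is an illustrative name, not part of the proposal:

// Illustrative tracer: prints one indented source location per pending
// future, depth-first, using only the traits sketched above.
struct Collector {
    depth: usize,
}

impl Visit for Collector {
    fn visit_future(&mut self, mut future: Pin<&mut dyn Visitable>) {
        // Record this frame's source location at the current depth...
        let location = future.as_mut().location();
        println!("{:indent$}{location}", "", indent = self.depth * 3);
        // ...then recurse into its sub-futures.
        self.depth += 1;
        future.visit(self);
        self.depth -= 1;
    }
}

A runtime would begin the traversal by calling as_visitable on the root future of a suspended task and, if it returns Some, passing the result to this visitor.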

Since futures must, necessarily, track which of their sub-futures are pending (in order to poll them), this approach does not inherently impose any additional runtime overhead. However, care would need to be taken to ensure that the implementation of these items for async blocks does not inhibit inlining optimizations.

Under this approach, Future traces would halt at any sub-Future not implementing the traversal API; i.e., the manually-implemented futures already present in the ecosystem. Implementations of the traversal API would need to be contributed to these extant Futures.

This approach cannot be used to trace very stuck tasks.

📦 Via tracing

Many crates in the Tokio ecosystem, including Tokio itself, make heavy use of the tracing framework for structured logging. An approximation of a tasktrace can be produced for a well-instrumented Future by recursively traversing its tree of tracing spans.

This 'tasktrace' will only include frames instrumented to produce tracing spans, and only the subset of those spans that are 'enabled' by the tracing subscriber. In performance-sensitive contexts, tracing instrumentation is often applied conservatively, and tracing spans are enabled conservatively.

The appeal of this approach is that it is, by far, the easiest for Tokio to enable. Tokio already tracks the top-level tracing span ID of each task; all that must be done is to provide an API exposing these top-level IDs.
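
Because tracing-subscriber's registry exposes a span's parents but not its children, assembling a per-task span tree takes a small amount of extra bookkeeping. The following sketch (SpanTree is an illustrative name, not an existing API) records parent-to-child edges as spans are created; a trace could then be rendered by walking the recorded edges downward from a task's top-level span ID:

use std::collections::HashMap;
use std::sync::Mutex;

use tracing::span::{Attributes, Id};
use tracing::Subscriber;
use tracing_subscriber::layer::{Context, Layer};
use tracing_subscriber::registry::LookupSpan;

/// Illustrative layer that remembers which spans were opened inside which.
#[derive(Default)]
struct SpanTree {
    children: Mutex<HashMap<u64, Vec<u64>>>,
}

impl<S> Layer<S> for SpanTree
where
    S: Subscriber + for<'a> LookupSpan<'a>,
{
    fn on_new_span(&self, _attrs: &Attributes<'_>, id: &Id, ctx: Context<'_, S>) {
        // Record the new span as a child of its contextual parent, if any.
        if let Some(parent) = ctx.span(id).and_then(|span| span.parent()) {
            self.children
                .lock()
                .unwrap()
                .entry(parent.id().into_u64())
                .or_default()
                .push(id.into_u64());
        }
    }
}

Even with such bookkeeping, the resulting tree is only as complete as the instrumentation and span filtering allow, per the limitation described above.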

This approach can be used to trace very stuck tasks.

📦 Library-Only Async Frame Pointers

The async-backtrace crate provides Future wrappers that enable efficient tracing of tasks. In its approach, futures insert their Location into an intrusive, doubly-linked tree when first polled, and remove their Location from that tree when they are dropped. A trace is produced by walking the tree of Locations from the root of the task.

This approach adds approximately 48 bytes of space overhead to each Future in a task, and a very small amount of overhead to polls (at best, a conditional check; at worst, a few pointer writes). In the async-backtrace crate, the first poll and drop of top-level futures also require a mutex lock, but this cost would be eliminated with deeper Tokio integration.

Overhead aside, this approach has other downsides. To produce useful tasktraces, both the async functions of an application and its manually-implemented futures must be liberally instrumented. Entrenching the ecosystem in this mechanism might stick Rust libraries with a difficult-to-avoid overhead, should any of the compiler-supported approaches be pursued in the future.
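
For a sense of what that instrumentation looks like, here is a hedged sketch using the async-backtrace crate's framed attribute and taskdump_tree function; treat the exact item names and signatures as assumptions rather than a definitive reference:

// Every async fn that should appear in the trace must be annotated by hand;
// un-annotated frames (and manually-implemented futures) silently vanish
// from the resulting tree.
#[async_backtrace::framed]
async fn outer() {
    inner().await;
}

#[async_backtrace::framed]
async fn inner() {
    // A leaf that stays pending long enough to appear in the dump.
    tokio::time::sleep(std::time::Duration::from_secs(10)).await;
}

#[tokio::main]
async fn main() {
    let task = tokio::spawn(outer());
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    // Dump the tree of instrumented frames that are currently pending.
    println!("{}", async_backtrace::taskdump_tree(false));
    task.abort();
}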

Tokio could use it internally only, but it's not clear how useful such traces would be if they only included futures from Tokio.

With thread pausing, this approach can be used to trace very stuck tasks.

📦 Backtrace the Leaves

In this approach, the polls of Tokio's leaf futures (i.e., yield_now, sleep, etc.) are modified to check if a global SHOULD_TASKTRACE flag is set. If so, they capture and stash a linear stacktrace. A tasktrace is produced by setting SHOULD_TASKTRACE to true, polling the task, and then coalescing its various linear backtraces into a tree.
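
A minimal sketch of the leaf-side change, using illustrative names for the flag and the stash (in practice both would be owned by the Tokio runtime):

use std::backtrace::Backtrace;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Illustrative globals; real integration would thread these through the runtime.
static SHOULD_TASKTRACE: AtomicBool = AtomicBool::new(false);
static CAPTURED: Mutex<Vec<Backtrace>> = Mutex::new(Vec::new());

/// Called from the poll of each leaf future (e.g., sleep, yield_now).
fn maybe_capture_backtrace() {
    // During normal execution this is a single, almost-always-false branch.
    if SHOULD_TASKTRACE.load(Ordering::Relaxed) {
        // While a trace is being collected, stash a linear backtrace; the
        // collector later coalesces the stashed backtraces into a tree.
        CAPTURED.lock().unwrap().push(Backtrace::force_capture());
    }
}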

This approach has low runtime overhead during normal execution (only a cold conditional branch), and adds no size overhead to futures. Tasktraces produced with this mechanism would include all branches of the task's future tree ending in a Tokio leaf; branches ending in custom leaves would not be included without manual instrumentation.

This approach cannot be used to trace very stuck tasks.

Jack's Recommendations

We should dismiss the 🦀 Reflection with Debug Info approach, for its great difficulty and risk. Of the remaining approaches, only two are zero-cost: 🦀 Future Traversal API and 📦 Backtrace the Leaves.

The 🦀 Future Traversal API approach provides an efficient, cross-runtime mechanism for producing tasktraces. So long as manually-implemented futures implement the traversal API, the tasktraces produced by this mechanism will include all pending sub-futures. Unfortunately, while we wait for extant manually-implemented futures in the ecosystem to implement this API, tasktraces produced with this mechanism will omit all subtrees that don't yet implement it. Producing usable results with this mechanism will require ample library buy-in, which will take non-trivial time.

By contrast, the 📦 Backtrace the Leaves approach does not require instrumentation anywhere but leaf futures; traces produced with this mechanism will include all intermediate futures. However, in cases where a future combinator does not poll all of its pending sub-futures, such unpolled sub-futures will be missing from traces produced by this mechanism.

Verdict

The 📦 Backtrace the Leaves approach will deliver tasktraces to Rust users within a short timeframe, with minimal runtime overhead and manual instrumentation. Implementing it would not pose any barriers to later adopting a compiler-supported mechanism (probably the 🦀 Future Traversal API). It is the best candidate for a tasktracing mechanism.

But what about tracing very stuck tasks?

It is not critical that our mechanism address this scenario. A Future that is infinitely looping or has parked its executor thread can be debugged by requesting a traditional stack trace with a native debugger.

Should Rust also implement the 🦀 Future Traversal API?

Unsure. Implementing the 🦀 Future Traversal API would provide a standard interface for traversing the Future tree of a task, though only partly: async runtimes would still need to provide APIs for accessing suspended tasks. It is also unclear whether the benefits of a general, cross-runtime API for traversing futures justify the compiler and library complexity of the approach.