Task Trace Mechanisms

Goal

Our goal is to develop a mechanism for inspecting the tree of pending futures from its root (i.e., a task); e.g.:

╼ taskdump::foo at backtrace/examples/taskdump.rs:20:1
  └╼ taskdump::bar at backtrace/examples/taskdump.rs:25:1
     └╼ join at backtrace/examples/taskdump.rs:26:4
        ├╼ taskdump::buz at backtrace/examples/taskdump.rs:35:1
        │  └╼ taskdump::baz at backtrace/examples/taskdump.rs:40:1
        │     └╼ tokio::sleep(10s) at backtrace/examples/taskdump.rs:42:4
        └╼ taskdump::fiz at backtrace/examples/taskdump.rs:30:1
           └╼ yield_now() at backtrace/examples/taskdump.rs:31:4

A tasktrace should contain pending futures, but not completed or yet-to-be-polled futures. Each frame of a tasktrace should include a name (either the async function name, or the name of the Future type) and a source location. For leaf futures, tasktraces ideally will contain additional information (e.g., the duration of a sleep, or the name of a file being read).

For further reading, see: Logical stack traces - Setting Goals & Collecting Examples.

Tenets

Safe

The mechanism must pose minimal risk to services.

Efficient

The mechanism should add minimal overhead to the size or poll performance of Futures.

Ergonomic

The mechanism should require minimal manual intervention to produce useful traces.

Solution Space

We've considered six mechanisms for tracing futures; three that require compiler support, and three that do not:

  1. 🦀 Reflection with Debug Info
  2. 🦀 Language-Supported Async Frame Pointers
  3. 🦀 Future Traversal API
  4. 📦 Via tracing
  5. 📦 Library-Only Async Frame Pointers
  6. 📦 Backtrace the Leaves

Below is a brief summary of each of the approaches and their tradeoffs. As you read, consider these key dimensions of distinction:

  • To what degree does the mechanism require manual intervention?
  • What will be missing from the trace if manual intervention does not occur?
  • What is the runtime cost of the mechanism?
  • Can the mechanism be used to debug very stuck tasks (e.g., a task whose poll is infinitely looping)?

The in-library possibilities will be presented with Tokio in mind, but, in practice, should be applicable to almost any async runtime.

This document concludes that the 📦 Backtrace the Leaves approach is the best candidate for a tasktracing mechanism.

🦀 Reflection with Debug Info

In this approach, DWARF (or PDB) debuginfo is used to reflect the in-memory structure of Rust futures, to direct the recursive traversal of sub-futures. This approach requires that the compiler emit sufficient debuginfo for async blocks, and that a library is developed to provide reflections of manually-implemented futures.

This approach imposes no runtime overhead on futures. Unfortunately, it carries three major disadvantages:

  1. The maintenance burden of writing custom reflections for most manually-implemented futures in widespread use would be immense.
  2. Any mistakes in custom reflections would likely surface as segfaults, potentially crashing the application being debugged.
  3. It is common to strip debuginfo in production; recovering debuginfo for reflection poses deployment challenges.
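
To make the second disadvantage concrete, the sketch below imagines the kind of hand-written reflection a support library would need for a manually-implemented leaf future; the Reflect trait is entirely hypothetical and exists only to illustrate the shape (and fragility) of that work:

use std::time::Duration;

// A manually-implemented leaf future (fields elided).
struct Sleep {
    remaining: Duration,
}

// Hypothetical reflection trait; nothing like this exists today.
unsafe trait Reflect {
    /// Render a trace frame from a raw pointer to a suspended future.
    unsafe fn describe(this: *const ()) -> String;
}

unsafe impl Reflect for Sleep {
    unsafe fn describe(this: *const ()) -> String {
        // A wrong cast, offset, or layout assumption here dereferences
        // arbitrary memory, crashing the application being debugged.
        let sleep = unsafe { &*(this as *const Sleep) };
        format!("tokio::sleep({:?})", sleep.remaining)
    }
}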

This approach cannot soundly be used to trace very stuck tasks (e.g., a task whose poll is infinitely looping).

🦀 Language-Supported Async Frame Pointers

In this approach, futures insert their Location into an intrusive, doubly-linked tree when first polled, and remove their Location from that tree when they are dropped. A trace is produced by walking the tree of Locations from the root of the task. This is, essentially, the approach taken by the async-backtrace crate, but with safety concessions to accommodate the unavailability of thread-local storage in Rust's core library.

Depending on the implementation details, this approach would add between 40 and 48 bytes of overhead to each Future in a Future tree.
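
For a rough sense of where that figure comes from, a node in such an intrusive tree might look like the following sketch (field names are hypothetical); five pointer-sized fields account for 40 bytes on 64-bit targets, with the remainder depending on layout details:

use core::panic::Location;
use core::ptr::NonNull;

// Hypothetical per-future node. Parent, sibling, and child links let a frame
// unlink itself in O(1) when its future is dropped; the Location identifies
// the frame's source location.
struct Frame {
    location: &'static Location<'static>,
    parent: Option<NonNull<Frame>>,
    prev_sibling: Option<NonNull<Frame>>,
    next_sibling: Option<NonNull<Frame>>,
    first_child: Option<NonNull<Frame>>,
}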

Under this approach, non-participating sub-futures would simply be missing from traces, but their participating children would be present.

With thread pausing, this approach can be used to trace very stuck tasks.

🦀 Future Traversal API

In this approach, we extend Future with a visitor API; e.g.:

use core::panic::Location;
use core::pin::Pin;
use core::task::{Context, Poll};

pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;

    fn as_visitable(self: Pin<&mut Self>) -> Option<Pin<&mut dyn Visitable>> {
        None
    }
}

pub trait Visitable {
    /// The source location associated with this future.
    fn location(self: Pin<&mut Self>) -> &'static Location<'static>;

    /// Visit this `Future`'s sub-futures.
    fn visit(self: Pin<&mut Self>, visit: &mut dyn Visit);
}

pub trait Visit {
    fn visit_future(&mut self, future: Pin<&mut dyn Visitable>);
}

The compiler will automatically implement Future::as_visitable and Visitable for async blocks and functions.
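
Continuing the sketch above, a trace collector built on this API might recurse roughly as follows; Collector is an illustrative name, not part of the proposal:

// Illustrative tracer: prints one indented source location per pending
// future, depth-first, using only the traits sketched above.
struct Collector {
    depth: usize,
}

impl Visit for Collector {
    fn visit_future(&mut self, mut future: Pin<&mut dyn Visitable>) {
        // Record this frame's source location at the current depth...
        let location = future.as_mut().location();
        println!("{:indent$}{location}", "", indent = self.depth * 3);
        // ...then recurse into its sub-futures.
        self.depth += 1;
        future.visit(self);
        self.depth -= 1;
    }
}

A runtime would begin the traversal by calling as_visitable on the root future of a suspended task and, if it returns Some, passing the result to this visitor.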

Since futures must, necessarily, track which of their sub-futures are pending (in order to poll them), this approach does not inherently impose any additional runtime overhead. However, care would need to be taken to ensure that the implementation of these items for async blocks does not inhibit inlining optimizations.

Under this approach, Future traces would halt at any sub-Future not implementing the traversal API; i.e., the manually-implemented futures already present in the ecosystem. Implementations of the traversal API would need to be contributed to these extant Futures.

This approach cannot be used to trace very stuck tasks.

📦 Via tracing

Many crates in the Tokio ecosystem, including Tokio itself, make heavy use of the tracing framework for structured logging. An approximation of a tasktrace can be produced for a well-instrumented Future by recursively traversing its tree of tracing spans.

This 'tasktrace' will only include frames instrumented to produce tracing spans, and only the subset of those spans that are 'enabled' by the tracing subscriber. In performance-sensitive contexts, tracing instrumentation is often applied conservatively, and tracing spans are enabled conservatively.

The appeal of this approach is that it is, by far, the easiest for Tokio to enable. Tokio already tracks the top-level tracing span ID of each task; all that must be done is to provide an API exposing these top-level IDs.
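
Because tracing-subscriber's registry exposes a span's parents but not its children, assembling a per-task span tree takes a small amount of extra bookkeeping. The following sketch (SpanTree is an illustrative name, not an existing API) records parent-to-child edges as spans are created; a trace could then be rendered by walking the recorded edges downward from a task's top-level span ID:

use std::collections::HashMap;
use std::sync::Mutex;

use tracing::span::{Attributes, Id};
use tracing::Subscriber;
use tracing_subscriber::layer::{Context, Layer};
use tracing_subscriber::registry::LookupSpan;

/// Illustrative layer that remembers which spans were opened inside which.
#[derive(Default)]
struct SpanTree {
    children: Mutex<HashMap<u64, Vec<u64>>>,
}

impl<S> Layer<S> for SpanTree
where
    S: Subscriber + for<'a> LookupSpan<'a>,
{
    fn on_new_span(&self, _attrs: &Attributes<'_>, id: &Id, ctx: Context<'_, S>) {
        // Record the new span as a child of its contextual parent, if any.
        if let Some(parent) = ctx.span(id).and_then(|span| span.parent()) {
            self.children
                .lock()
                .unwrap()
                .entry(parent.id().into_u64())
                .or_default()
                .push(id.into_u64());
        }
    }
}

Even with such bookkeeping, the resulting tree is only as complete as the instrumentation and span filtering allow, per the limitation described above.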

This approach can be used to trace very stuck tasks.

📦 Library-Only Async Frame Pointers

The async-backtrace crate provides Future wrappers that enable efficient tracing of tasks. In its approach, futures insert their Location into an intrusive, doubly-linked tree when first polled, and remove their Location from that tree when they are dropped. A trace is produced by walking the tree of Locations from the root of the task.

This approach adds approximately 48 bytes of space overhead to each Future in a task, and a very small amount of overhead to polls (at best, a conditional check; at worst, a few pointer writes). In the async-backtrace crate, the first poll and drop of top-level futures also require a mutex lock, but this cost would be eliminated with deeper Tokio integration.

Overhead aside, this approach has other downsides. To produce useful tasktraces, both the async functions of an application and its manually-implemented futures must be liberally instrumented. Entrenching the ecosystem in this mechanism might stick Rust libraries with a difficult-to-avoid overhead, should any of the compiler-supported approaches be pursued in the future.
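
For a sense of what that instrumentation looks like, here is a hedged sketch using the async-backtrace crate's framed attribute and taskdump_tree function; treat the exact item names and signatures as assumptions rather than a definitive reference:

// Every async fn that should appear in the trace must be annotated by hand;
// un-annotated frames (and manually-implemented futures) silently vanish
// from the resulting tree.
#[async_backtrace::framed]
async fn outer() {
    inner().await;
}

#[async_backtrace::framed]
async fn inner() {
    // A leaf that stays pending long enough to appear in the dump.
    tokio::time::sleep(std::time::Duration::from_secs(10)).await;
}

#[tokio::main]
async fn main() {
    let task = tokio::spawn(outer());
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    // Dump the tree of instrumented frames that are currently pending.
    println!("{}", async_backtrace::taskdump_tree(false));
    task.abort();
}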

Tokio could use it internally only, but it's not clear how useful such traces would be if they only included futures from Tokio.

With thread pausing, this approach can be used to trace very stuck tasks.

📦 Backtrace the Leaves

In this approach, the polls of Tokio's leaf futures (i.e., yield_now, sleep, etc.) are modified to check if a global SHOULD_TASKTRACE flag is set. If so, they capture and stash a linear stacktrace. A tasktrace is produced by setting SHOULD_TASKTRACE to true, polling the task, and then coalescing its various linear backtraces into a tree.
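
A minimal sketch of the leaf-side change, using illustrative names for the flag and the stash (in practice both would be owned by the Tokio runtime):

use std::backtrace::Backtrace;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Illustrative globals; real integration would thread these through the runtime.
static SHOULD_TASKTRACE: AtomicBool = AtomicBool::new(false);
static CAPTURED: Mutex<Vec<Backtrace>> = Mutex::new(Vec::new());

/// Called from the poll of each leaf future (e.g., sleep, yield_now).
fn maybe_capture_backtrace() {
    // During normal execution this is a single, almost-always-false branch.
    if SHOULD_TASKTRACE.load(Ordering::Relaxed) {
        // While a trace is being collected, stash a linear backtrace; the
        // collector later coalesces the stashed backtraces into a tree.
        CAPTURED.lock().unwrap().push(Backtrace::force_capture());
    }
}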

This approach has low runtime overhead during normal execution (only a cold conditional branch), and adds no size overhead to futures. Tasktraces produced with this mechanism would include all branches of the task's future tree ending in a Tokio leaf; branches ending in custom leaves would not be included without manual instrumentation.

This approach cannot be used to trace very stuck tasks.

Jack's Recommendations

We should dismiss the 🦀 Reflection with Debug Info approach, for its great difficulty and risk. Of the remaining approaches, only two are zero-cost: 🦀 Future Traversal API and 📦 Backtrace the Leaves.

The 🦀 Future Traversal API approach provides an efficient, cross-runtime mechanism for producing tasktraces. So long as manually-implemented futures implement the traversal API, the tasktraces produced by this mechanism will include all pending sub-futures. Unfortunately, while we wait for extant manually-implemented futures in the ecosystem to implement this API, tasktraces produced with this mechanism will omit all subtrees that don't yet implement it. Producing usable results with this mechanism will require ample library buy-in, which will take non-trivial time.

By contrast, the 📦 Backtrace the Leaves approach does not require instrumentation anywhere but leaf futures; traces produced with this mechanism will include all intermediate futures. However, in cases where a future combinator does not poll all of its pending sub-futures, such unpolled sub-futures will be missing from traces produced by this mechanism.

Verdict

The 📦 Backtrace the Leaves approach will deliver tasktraces to Rust users within a short timeframe, with minimal runtime overhead and manual instrumentation. Implementing it would not pose any barriers to later adopting a compiler-supported mechanism (probably the 🦀 Future Traversal API). It is the best candidate for a tasktracing mechanism.

But what about tracing very stuck tasks?

It is not critical that our mechanism address this scenario. A Future that is infinitely looping or has parked its executor thread can be debugged by requesting a traditional stack trace with a native debugger.

Should Rust also implement the 🦀 Future Traversal API?

Unsure. Implementing the 🦀 Future Traversal API would provide a standard interface for traversing the Future tree of a task, though only partly: async runtimes would still need to provide APIs for accessing suspended tasks. It is also unclear whether the benefits of a general, cross-runtime API for traversing futures justify the compiler and library complexity of the approach.