owned this note
owned this note
Published
Linked with GitHub
---
tags: turbowish
---
# carllerche's tokio-console thoughts
This tool is intended to be run locally during debugging and/or benchmarks.
## Shiny future
All stories start with the developer enabling tokio-console in their application then opening the tokio-console terminal UI.
```rust=
#[tokio::main]
async fn main() {
tokio_console::init();
// Rest of the app
}
```
Connect using the default tokio-console port.
```
$ tokio-console
```
Alternatively, a `cargo` command exists to both run the process and switch to the tokio-console interface:
```
$ cargo run-tokio-console
```
### Why is my task stuck?
#### Cause: an incorrectly implemented future returns `Pending` without registering the waker.
* The task list view includes a warning at the top (or bottom?) indicating a buggy future was detected.
* The developer goes to the "problem" view.
* The tool identifies the location where `Pending` is returned without registering the waker.
* This could be a library or application.
* The tool also includes:
* A clean backtrace to that point
* The task type, ID, and attributes.
* A link to a documentation page expanding on the issue.
#### Cause: Deadlock cycle. A set of tasks are awaiting on values produced by other tasks. The dependency chain creates a cycle, preventing any tasks from making progress.
* The task list view includes a warning at the top (or bottom?) indicating a deadlock cycle was detected.
* The developer goes to the "problem" view.
* The view lists tasks included in the cycle and includes:
* The task types, ID, and attributes for each type.
* The resources being awaited on
* The location / backtrace of the await.
* A link to a documentation page expanding on the issue.
A cycle may include a single task if a task is receving on a channel and holds all senders.
#### Cause: Resource starvation.
This is similar to the deadlock case above, but may not be a true deadlock. Either the task may advance (but slowly) or the system cannot detect the deadlock due to external systems being at play (reading from a TcpStream, but no data is arriving).
We will break it down further to a few specifics.
**Variant 1**: Blocked waiting on channel capacity.
* The developer finds the blocked task and enters the task view
* The task view shows the task is blocked on `my_channel.send(value).await`
* The frame details show the current state of the request is "waiting on channel capacity"
* The frame details link to tasks that are "blocking" the current tasks, in this case, it is the task receiving the values.
* The developer inspects the receiving task and speeds it up.
**Variant 2:** Blocked waiting on reqwest's connection pool.
* The developer finds the blocked task and enters the task view
* The task view shows the task is blocked on `http_request.await`
* The frame details show the current state of the request is "waiting on a connection from the pool"
* The frame details link to tasks that are "blocking" the current tasks, in this case, it is a list of tasks holding connections.
* The developer inspects those tasks and solves the problem by either reducing the amount of time the tasks hold the connection or by increasing the size of the connection pool.
Take away: libraries will need to expose higher level data to tokio-console.
#### Cause: logic bug in application
**Variant 1:**
When parsing a protocol frame, the frame has been fully received but the developer accidentally issues a read. The peer will send no more data.
* The developer finds the blocked task and enters the task view.
* The developer finds the blocked `read` frame.
* The developer looks at the parent frame, which tracks the buffer and sees that 128 bytes have been received.
* Alternative, the developer inspects the buffer and sees it contains the protocol frame.
* The developer wonders why the task is blocked in read if the frame is already buffered, inspects the code, and finds the bug.
**Variant 2:** The protocol frame is 128 bytes but the developer does `socket.read_exact(256)`
* The developer finds the blocked task and enters the task view.
* The developer finds the blocked `read_exact` frame.
* The developer sees the `read_exact` future has already buffered 128 bytes and the frame is waiting for 128 more bytes.
* Alternative, the developer inspects the `read_exact` buffer and sees it contains the protocol frame.
* The developer fixes the code to `read_exact(128)`
#### Cause: Unexpected cancellation from not calling `.await`
In both of these cases, the compiler will emit a warning as futures are annotated `#[must_use]`. Developers do not always catch warnings, especially when they are drowned out by many other warnings.
**Variant 1:** not calling await on a Tokio fn.
```rust=
let mut buf = vec![0; 4096];
my_tcp_stream.read(&mut buf[..]);
```
The `read` operation is immediately cancelled without polling.
It is unclear how `tokio-console` can handle this. At the very least, it can be included in the list of warnings.
* The developer sees a warning on the task list view.
* Alternatively, the developer goes to the specific task and sees a warning there.
* The warning indicates a future was dropped without calling await.
* The developer clicks a link to documentation on the topic.
**Variant 2:** not calling await on a user fn.
```rust=
async fn do_work() {
...
}
#[tokio::main]
async fn main() {
do_work();
}
```
This is similar variant 1, however `tokio-console` will not be able to detect the problem without the rust compiler providing runtime instrumentation. Alternatively, the developer could explicitly annotate the function making the flow identical to variant 1.
```rust=
#[tokio::instrument]
async fn do_work() {
...
}
```
Creating a future without ever awaiting it or explicitly cancelling it with a combinator is a developer mistake.
#### Cause: Unexpected cancelation from `select!`
[Source](https://tomaka.medium.com/a-look-back-at-asynchronous-rust-d54d63934a1c)
Consider a task that parses in-bound protocol frames from a socket and also must handle configuration change updates (for example, change the max frame size).
```rust=
let mut max_frame_size = 4096;
loop {
select! {
frame = parse_frame(&my_socket, max_frame_size) => {
tx.send(frame).await;
}
new_max_frame_size = config_rx.recv() => {
max_frame_size = new_max_frame_size;
}
}
}
```
When `config_rx.recv()` completes first, any in-progress `parse_frame` state is lost, resulting in either the protocol parsing state to be corrupted or a frame is lost. Let's assume the observed symptom is that a frame is lost, this will appear to the developer as a hung task. The data has been received but nothing was acted on or `parse_frame` hangs waiting for more data even though the frame was received (because part of the data was lost).
The correct strategy is described in the [Tokio docs](https://tokio.rs/tokio/tutorial/select#resuming-an-async-operation), though this is a tricky problem to debug.
* The developer goes to the task view for the stuck task.
* Sees the task is stuck in `select!`, expand that frame for the `parse_frame` branch.
* The frame includes a warning that the previous iteration was canceled.
* The frame also calls out that the previous iteration performed side-effects (read from a socket).
* Maybe always include a warning about bugs related to this?
* The developer jumps back to the previous frame (the one that was cancelled). -- the point ("call-site"?) where it was cancelled
* The frame view shows that 40 bytes were read from the TCP socket in this frame.
* The developer realizes that the buffer containing this data was dropped in the select branch.
* **NOTE: making this leap requires the developer to realize the buffer is contained in the select branch. What happens if the developer doesn't make this leap?**
#### Cause: did not spawn a task
```rust=
#[tokio::instrument]
async fn process(listener: TcpListener) -> io::Result<()> {
loop {
let (socket, _) = listener.accept().await?
let mut buf = Vec::with_capacity(4096);
socket.read(&mut buf[..]).await?;
// The read hangs due to no inbound data.
}
Ok(())
}
```
* Finds task in task list and goes to task view.
* Sees the task is currently awaiting the read.
* Sees, in the task's resource list, the `TcpListener` resource is ready.
* Realizes the task is blocked on reading and cannot get back to accepting.
* Maybe there is a "help" section on how to deal with ready resources that are not being polled?
#### Cause: Accidental blocking call starves the runtime
**Variant 1**: Calling `block_on` from the `futures` crate
* blocking fns across async rust emit appropriate tracing instrumentation
* the task view displays a warning indicating that a task is synchronously blocked
* the file, line, and async back trace is included.
* docs include help on how to avoid calling block_on from a runtime.
**Variant 2**: Performing a high CPU operation on the runtime.
There is no instrumentation to catch this issue. It is detect by noticing frames being polled for long periods of time (where long is tbd).
* On the task list, the developer sees a warning that tasks are taking a long time to poll
* On the warning view, the view highlights spans that are taking too long and the backtrace.
* In this case, if the task is currently *blocked* in a CPU task (an infinite loop) a full thread backtrace is returned.
**Variant 3**: Accidentally holding a `std` mutex across an `.await`.
This requires syscall level instrumentation. This could be instrumenting `std`, though that would not catch everything (e.g. ffi calls). `ptrace` can provide hooks. This will allow catching mutex waits and other blocking syscalls.
* On the task list, the developer sees a warning that blocking syscalls are being made from the runtime.
* On the warning view, the backtrace shows where the call is being made, rooted in a task.
* Help explains how to avoid blocking calls w/ `spawn_blocking` or `block_in_place`.
### Panics
At a high level, tasks in the task list will track they panicked and will allow the developer to browse the task events (logs).
#### Cause: Accidental future cancellation
* The developer opens the task list
* Finds the task that panicked.
* Navigates the task events (i.e. browses the hierarchical logs w/ a fancy view).
* Sees that the previous iteration panicked.
### Why is my app slow?
At a high level, a common root cause for slowness in an async app is not processing events ASAP when they come in. In these cases, the developer needs to "record" events to play them back after.
#### Cause: running a task off runtime
Given the following:
```rust=
#[tokio::main]
async fn main() {
for _ in 0..100 {
let (tx, rx) = oneshot::channel();
tokio::spawn(async move {
tx.send(());
})
rx.await;
}
}
```
One would expect this to run efficiently, however, the main task is run **off** the main runtime and the spawned tasks are **on** runtime, which means the snippet will run much slower than:
```rust=
#[tokio::main]
async fn main() {
tokio::spawn(async {
for _ in 0..100 {
let (tx, rx) = oneshot::channel();
tokio::spawn(async move {
tx.send(());
})
rx.await;
}
}).await;
}
```
The slowdown is caused by a higher time between the rx task being notified (in `tx.send()`) and the task being polled.
* The developer runs a benchmark
* Opens tokio-console
* Sees the list of tasks and expands the main task.
* Sees the main task spending a significant % of time in "scheduled" state (time between being notified & polled).
* The view also has a note saying the notifications come from a "remote" runtime.
* tokio-console knows this is the main task and calls out how to reduce notifications.
**Variant 2:** using a Tokio resource off runtime.
A Tokio TcpStream is moved to a separate thread and is used via `block_on`
```rust=
let res = futures::task::block_on(async {
my_tcp_stream.read(&mut buf[..]).await
});
```
* Opens task view and finds the `block_on` task.
* Finds task in list
* Sees the task spending a significant % of time in "scheduled" state.
* The view also says a significant % of wakeups are "remote".
Initially, `block_on` calls may not automatically translate to tasks in the task view. If not, the developer could do:
```rust=
let task = tokio_console::instrument::task()
.name("read-from-tcp-stream")
.attribute("remote-addr", &my_tcp_stream.remote_addr())
.build(async {
my_tcp_stream.read(&mut buf[..]).await
});
let res = futures::task::block_on(task);
```
#### Cause: not currently polling the stream
[Source](https://github.com/rust-lang/futures-rs/issues/2387)
```rust=
let stream = stream::iter(0..5).map(|_| {
time::sleep(ms(100)).await;
});
pin!(stream);
while let Some(_) in stream.next().await {
time::sleep(ms(100)).await
}
```
And a simpler variant:
```rust=
let fut1 = time::sleep(100.ms());
let fut2 = time::sleep(150.ms());
fut1.await;
fut2.await;
```
Expectation: the snippet takes 600ms to iterate because the sleeps in the stream run in the background and are immediately ready when polling the stream. In practice, the sleeps do not execute because nothing is polling the stream while sleeping in the `while` block.
This comes from the developer having a misunderstanding of Rust's async model. The developer expects async calls to **always** start processing immediately in the background and not lazily when being `.await`ed.
To help debug, tokio-console must expose the details of execution timing.
* The developer runs a benchmark then opens tokio-console
* Goes to the task type and looks at aggregate metrics.
* Sees that each task, on average, is waiting X time on `sleep(...).await`, which is higher than expected.
* The UI will indicate sleeps do not happen concurrently.
* Goes to the task instance view for one of these tasks.
* Steps through the execution of the task.
* This can be thought as browsing the emitted logs, but with a better UI (hopefully)
* Sees that sleeps happen sequentially.
*Note: the question is, how does the developer make the leap from seeing sleeps happen sequentially to understanding Rust's async execution model? My best answer now is the view will have some legend explaining the difference between a "plain" await call and a "concurrent" await (select!, join!, FuturesUnordered, ...). There can be a help link there to learn more.*
* The developer sees the legend calling out different representations for "plain" awaits vs. concurrent awaits and follows to the docs to learn more. The developer learns that rust's async runtime model is different and requires special calls to make awaits happen in the background.
#### Cause: significant time spent moving large values
```rust=
struct Frame {
data: [0; 4096],
len: usize,
}
struct FrameQueue {
pending: Vec<Frame>,
}
async fn process_frames(queue: &mut FrameQueue) {
while !queue.is_empty() {
let frame = queue.remove(0);
process(frame).await;
}
}
```
* The developer opens tokio-console
* The developer sees significant time is spent running the connection process task type.
* The task type view shows significant time "running"
* The tool suggests using `perf`
* Maybe it is possible for `tokio-console` to interpret perf data and isolate it per task?
#### Cause: unbuffered reads / writes.
The application is exhibiting sub-optimal performance due to issuing socket read and write calls with small buffers.
* On the task list, the developer sees a warning indicating small reads / writes were detected.
* The developer jumps to the warning view.
* The callsites are displayed with the size of the buffers.
* This is detected if multiple reads / writes are done w/ small buffers that are ready on the same task poll.
* The developer reads a recommendation to buffer reads / writes
* The developer uses `BufReader` to solve the problem.
**Question:** What if the warning is a false-positive, how are they silenced?
**Alternative:**
* On the task list, The developer sees the "process-connection" task type spends a high percentage of wall time running vs. waiting.
* The developer goes to the "process-connection" task type view
* The aggregate trace shows a significant percentage of time spent in read / write calls.
* The developer sees that most calls are issued with small buffer sizes.
#### Cause: overloaded scheduler
The application has more tasks being notified then can be processed. This could happen with excessive "fan-out" style messaging.
* On the task-list view, the developer sees elevated scheduler queue depth and time-to-run.
* The developer goes to the "scheduler" view??
* The view lists `.await` calls that have high wake -> poll times.
* The developer expands the entry w/ the highest total wake -> poll time.
* This view may not **obviously** display the source of the problem as unrelated tasks may be slowed down by a busy scheduler.
* **ALternative**: the developer orders callsites by ones that generate the most notifiations.
* tokio-console displays the sources if notification.
* In the "fan-out" case, this would be the location that is **sending** messages.
#### Cause: slow channel receiver
When message passing, if the message receiver is overloaded, it can be perceived as an app slow down.
```rust=
let (tx, rx) = mpsc::channel(1024);
tokio::task::Builder::new()
.name("process-requests")
.spawn(async move {
while let Some((request, tx)) = rx.recv().await {
// If `process(...).await` is slow, the channel backs up.
let response = process(request).await;
tx.send(response);
}
});
for _ in 0..1000 {
let tx = tx.clone();
tokio::task::Builder::new()
.name("sends-requests")
.spawn(async move {
loop {
let request = gen_request();
let (response_tx, response) = oneshot::channel();
tx.send((request, response_tx)).await;
println!("Response = {:?}", response.await);
}
});
}
```
This implementation introduces a bottleneck with processing requests. This can be fixed by concurrently processing requests.
* Finds the "sends-requests" task type.
* See it spends most of the time waiting on `response.await`
* See the other side of the oneshot comes from the "process-requests" task
* Since this is an aggregate view, this works because there is only one callsite that "sends" into the the oneshot. If there are multiple, all of them (or the top N based on await time?) would be displayed.
* Sees the producer task pops messages off the channel.
* Inspects the channel and sees the mean time spent in channel is high, indicating a slow consumer.
* How to fix it?
* Maybe channel consumer views can provide tips on handling slow consumers?
Interestingly, tokio-console could provide better insight if a higher level "actor" library is used to model this pattern and the library is tokio-console-aware.
```rust=
let handle = kabuku::actor()
.name("process-requests")
.backlog(1024)
.spawn(async move |request| {
process(request).await
});
for _ in 0..1000 {
let handle = handle.clone();
tokio::task::Builder::new()
.name("sends-requests")
.spawn(async move {
loop {
let request = gen_request();
let response = handle.send(request).await;
println!("Response = {:?}", response);
}
});
}
```
* Finds the "sends-requests" task type.
* See it spends most of the time waiting on `handle.send(request).await`
* In the .await point view, it knows this is an actor processing the requests and displays:
* Time spent in the inbox.
* Mean response time.
* Mean per-handle in-flight requests.
* Any other data?
### Spurious wakeups / excessive yielding
How to debug this? How do we know spurious wakeups are wrong vs. just yielding.
## Concepts
* Runtime = Scheduler + drivers
* Driver
* Executor / scheduler
* Task
* Spawned async blocks
* Blocks passed to `block_on`
* Resource
* May be composed of sub resources
* e.g.
* Channel receiver
* reqwest response future.
* Frame (await point?)
* Expose the current state
* Number of tasks spawned?
* Metric
* Flavor of data an entity can provide
* Instant value
* Moving average
* Histogram
* ...
## Views
### Task list
| ![Top](https://i.imgur.com/F71IQMl.png) |
|:--:|
| Initial view / top |
#### Data
* Runtime metrics
* List of tasks
* Task ID
* Name
* State
* Running
* Idle
* Scheduled
* Blocking
* Terminated
* Panicked
* Metrics
* % Running (vs. idle)
* Poll time (Metric: mean, p50, p90, ...)
* Wake to poll time (mean, p50, p90, ...)
* Meta data, user supplied, key-value format
* Warning (!) near task doing something wrong.
* Rate of "external" scheduings vs. internal
* Problems / warnings
* `Pending` returned without registering waker.
* Deadlock cycle detected
* Task poll takes too long
* Calling blocking fn from async task.
* Mutex held across await point
#### Columns
Similar to `top`, columns are configurable, sortable, and filterable.
Ideas:
* Last poll duration
* Last idle duration
#### TODO
* Track completed tasks.
* Categorize the task:
* Spawned
* Inline? (`block_on`)
* Blocking? (`spawn_blocking`)
* Anything else?
### Task type
TODO
#### Data
* Number of tasks matching.
* Metrics
* % run vs. idle
* Poll time, avg, med, p99
* Wake to poll, avg, med, p99
* % remote notifications (off task)
* Other tasks woken during poll (generating lots of scheduler work)
* Trace, aggregate
* Differentiates between sequential and concurrent.
* `select!`
* `.await` point data
* % of total task wall-time
* % off-runtime notification
* Poll time, avfg, med, p99
* Wake to poll, avg, med, p99
* Other tasks woken during poll (generating lots of scheduler work)
* Links to tasks that notify this await (there could be many)
Notes:
* Aggregates metrics for all types w/ a given type
* Filter tasks within view. e.g.:
* only show currently active tasks
* Only show tasks matching query
### Task instance
TODO
#### Data
* Attributes
* Task ID
* Name
* Task "parent" (spawned this task)
* Metadata (key/value)
* State (running, idle, ...)
* Metrics
* Time spent in poll: avg, med, p99
* Time wake to poll: avg, med, p99
* % of remote wakes (cross thread)
* % of polls that yield (self wake).
* Resources ready per poll
* Hit coop?
* Task focused log / backtrace?
* Hierarchical.
* Tracks metadata, allowing view into data, e.g. bytes received.
* `.await` points can "mask" children. For example, libraries do not want to expose internals by default.
* Show tasks spawned
* Differentiate sequential vs. concurrent
* Representation for `select!`
* PANIC!
* Can "navigate back in time"
* `.await` points
* Attributes
* Bytes read from socket.
* Current action. e.g.
* mpsc channel Sender waiting on capacity
* mpsc chanenl receiver waiting for value.
* Resources that could "unblock" the `.await` point. e.g
* For channels, this is the pair.
* Warnings
* Poll takes too long
* Previous iteration canceled early (TODO: `select!` level?)
* Mutex held across await point
* Resources owned by task
* When they were created and used
* Absolute time
* Relative to task spawn time.
* Resource state: Ready, pending, idle?
* Attributes
* socket: local_addr
* channel
* capacity
* Current size
* time vals spend in chan: avg, med, p99
* Time tree?
* https://gist.github.com/seanmonstar/e06edc2c04e908989e396512b1e5c055
* Warnings
* `Pending` returned without registering waker.
* Future dropped w/o polling.
* Notified from off-runtime (performance).
* TODO: this may not want to be a warning... some other view, "tips"?
* Small read/write syscalls
* Links to help
### Resource
TODO
### Warnings (clippy, hints, ...)
TODO
### Logs
TODO: Hierarchical log stream w/ filter ability
### Runtime / scheduler / `.await` points
TODO: Should there be a runtime centric view?
#### Data
* `.await` points top-like interface
* % run time
* Wake to poll times
#### Data
* Callsites in "top" style view.
* Ordered by mean time-to-wake.
### Deadlock detected
A deadlock cycle has been detected.
#### Data
* List of tasks involved
* Task attributes (id, name, metadata)
* Resource(s) being waited on
* Backtrace of .await
* Link to documentation
#### Data
## Instrumenting
Getting rich data requires application and library developers to include instrumentation.
### Spawning
```rust=
tokio::task::Builder::new()
.name("connection-handler")
.attribute("remote-address", &addr)
.spawn(async move {
// Start processing request
let request_id = my_request.get_id();
tokio_console::task::attribute("request-id", &request_id);
// Finish processing request
});
```
[Other options](https://hackmd.io/@seanmonstar/B1Vu6YwN_)
## Needed ecosystem improvements
* Improved backtraces when using async.
* Get the location of `.await` calls.
* Maybe the `Context` argument to `poll` can include an `.await` stack?
* Instrument `std` fn calls, especially blocking ones.
* This can probably be done with strace?
## Open questions
* How to deal w/ select!?
* How to represent "concurrent" await points in general?
* How to identify all resource handles, especially mpsc::Sender.
* We know how many *outstanding* there are.
* Ask the user to "touch" them using a special API.
* We also know when a task is "awaiting" on an un-instrumented resource.
## Misc notes
* Resources know they are ready **before** even being polled. We can track the delta between readiness and poll.
* Decorate `poll` fns to *prevent* polling if not woken. This helps catch missed wakeups but the future was polled due to another part of the task being woken and polling the original future.
* Frames! But `select!` forks frames.
* Production?
* Sev-2: dump state now
* Metrics: distribution of tasks & waits.
### Data to collect / show
* Task in `block_in_place`
* Per-task "async" backtrace?
* % time a future is spent in misc states.
* e.g. HTTP request:
* Check out connection from pool
* Send request
* Waiting
* read response
* "Who spawned me"? Track a task's spawner.
* Can we emit "what task are we on?" with gdb?
* Resources per task
* When they were created
* Absolute
* Relative to task spawn
## Rough: More questions and thoughts on how to answer them
* "Always ready" resources.
* Track number of ready resources per poll?
* Is the coop yielder hit?
* Blocking call in task.
* Instrument blocking calls from std.
* Overload scheduler w/ too many busy tasks.
* e.g. actor system that fans out tens of other messages
* Track number of spawned tasks per poll?
* Track number of other tasks waked per poll.
* Rate of spawned tasks.
* Multiple runtimes per process
* Cross-thread notifications?
* Rate of tasks pushed into "external" queue
* Instrument channel depth
* Track queue depth, and time in queue.
* Outright deadlock.
* Lack of progress from incorrectly implemented Future.
* HTTP request to reqwest that hangs, why?
* What phase is the "worker task" in?
* DNS resolution
* Waiting on earlier pipelined request?
* Waiting on available connection pool connection?
* Are other tasks hogging resources and a limit is hit?
* Was the request sent to the remote?
* Was the response received?
* The response header was received, but body?
* IDEA
* Custom per-resource state / config
* Custom lib task linkage
* Custom per-task state / config
* Hyper worker task can include config info.
* Deadlock from circular dependencies.
* Why is my task slow?
* Is it spending time in reading?
* Am I using buffering correctly?
* IDEA: Detect lack of buffering w/ tcp reads / writes
(e)BPF tracing
- define static trace point?
- https://lwn.net/Articles/753601/
- Debugger integration
- Run "record" when running tests and then show what happened during a test.