# Tokio AsyncRead / AsyncWrite
This is my attempt to write-up and summarize the [Tokio AsyncRead/AsyncWrite discussion](https://github.com/tokio-rs/tokio/pull/1744). It includes a certain amount of "editorial commentary" on my part. To be clear, I am not a "stakeholder" with decision making power there, but obviously this is relevant to any future attempt to standardize `AsyncRead` and `AsyncWrite`, so I want to make sure that the discussion is well documented. --nikomatsakis
XXX I may start adapting this to a more general "summary page" on the topic --nikomatsakis
## Core Motivation: Perf and Uninitialized Memory
The core motivation is that the `AsyncRead` interface would like to be able to accept buffers that are not yet initialized, but the `&mut [u8]` type it takes today requires a fully initialized buffer. This is a recognized downside of the existing synchronous `Read` trait as well, and sfackler has a [great writeup][w] on the topic.
[w]: https://paper.dropbox.com/doc/IO-Buffer-Initialization-MvytTgjIOTNpJAS6Mvw38
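To make the cost concrete, here is a minimal sketch of the pattern the current signature forces on callers (the function name `read_to_vec` is illustrative, not from any real API):

```rust
use std::io::Read;

// With today's `Read` trait, the caller must initialize the buffer before
// passing it in, even though `read` will overwrite whatever it reads.
fn read_to_vec(mut src: impl Read, cap: usize) -> std::io::Result<Vec<u8>> {
    // `vec![0u8; cap]` zeroes `cap` bytes that are about to be
    // overwritten anyway -- this is the cost the discussion is
    // trying to avoid.
    let mut buf = vec![0u8; cap];
    let n = src.read(&mut buf)?;
    buf.truncate(n);
    Ok(buf)
}
```

The zeroing is pure overhead whenever the reader promises to only write into the buffer, which is what the proposals below try to express in the type system.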
## Measuring the impact
The performance impact of zeroing uninitialized memory has been measured in various ways.
In end-to-end hyper benchmarks (basically hyper serving as both http client/server and piping data as fast as it can go), [seanmonster reports](https://github.com/tokio-rs/tokio/pull/1744#issuecomment-554543881) the following results:
| Data size | Uninitialized memory | Zeroed memory |
| --- | --- | --- |
| 100 KB | 1600 MB/s | 1075 MB/s |
| 10 MB | 2350 MB/s | 1850 MB/s |
[Ralith tested a similar setup](https://github.com/tokio-rs/tokio/pull/1744#issuecomment-553501198), but using the QUIC protocol as implemented by [`Quinn`](https://github.com/djc/quinn). Here Ralith reported an impact of 2.5% for large streams, but only 0.2%-0.6% for smaller inputs.
sfackler also had a [comment](https://github.com/tokio-rs/tokio/pull/1744#issuecomment-553179399) exploring some of the performance costs from the synchronous trait. For example, they mentioned that some routines had to be rewritten in complex ways to work around initialization costs (see [PR #23820](https://github.com/rust-lang/rust/pull/23820)). They also mentioned that [PR #26950](https://github.com/rust-lang/rust/pull/26950), which added some specializations to the stdlib to avoid initialization costs, found a 7% impact on microbenchmarks around file reads. Unfortunately, neither of these represents "full system" impact.
It would be interesting to try to get numbers for a setup that does more than pipe data through as fast as it can go -- for example, serving more realistic requests that involve more processing. But that data can be quite hard to gather in practice. If somebody has a setup based on tokio that is doing more complex processing, it might be possible to build it with a fork of tokio in which zeroing is artificially added or removed and test that way?
### Alternatives: buffer pooling
One alternative to permitting uninitialized buffers is buffer pooling, in which case you amortize the initialization costs by reusing buffers. In many scenarios this is a perfectly good solution, but there are some concerns:
* you are unable to allocate buffers on the stack
* you are now required to decide when to release those buffers to the operating system, which was previously the job of the allocator
There have also been arguments that pooling increases the risk of heartbleed-like attacks, where a secret is accidentally left in a buffer and then exposed to another connection. However, that seems like a weaker argument, since the memory allocator is also likely to hand you memory that was freed but not re-initialized. If you wish to guard secrets, the right approach is most likely to zero the memory when you are done with it, or to provide some other mechanism.
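The shape of the pooling alternative, and both concerns from the list above, can be seen in a minimal sketch (the `BufferPool` type here is illustrative, not from any real crate):

```rust
// A minimal sketch of buffer pooling: buffers are zeroed once when first
// allocated, then recycled, amortizing the initialization cost.
struct BufferPool {
    free: Vec<Vec<u8>>,
    buf_size: usize,
}

impl BufferPool {
    fn new(buf_size: usize) -> Self {
        BufferPool { free: Vec::new(), buf_size }
    }

    /// Hand out a recycled buffer, or zero a fresh one (paid only once).
    /// Note that these buffers live on the heap -- a pooled design cannot
    /// hand out stack-allocated buffers.
    fn acquire(&mut self) -> Vec<u8> {
        self.free.pop().unwrap_or_else(|| vec![0u8; self.buf_size])
    }

    /// Return a buffer to the pool. The pool, not the allocator, now
    /// decides when memory is released to the operating system, and the
    /// old contents remain visible to the next user.
    fn release(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }
}
```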
### Bridging and the relationship to std
This part is more editorial on my part. I am happy to see Tokio exploring the "design space" around the `AsyncRead` traits. I don't have a fully formed opinion yet on what I think they should look like -- I see benefits here, but they come at a significant cost in complexity.
I guess my main thought is that I think it would be important to pick something that can be "bridged" to a `std` trait with relative ease (and, similarly, I would expect that "bridge-ability" to the traits in use in different runtimes would be a consideration for a `std` trait). Of course, not knowing exactly what the std trait will look like makes that a bit harder! But I guess we can assume it will be similar to some of the options raised on this thread, so it'd be good to consider how easily they can be bridged back and forth, and at what cost.
**Takeaway:** It would be good to consider how readily the proposals can be bridged back and forth. I would pay particular attention to how hard it is to bridge to `&mut [u8]` or `&mut [MaybeUninit<u8>]` based formulations.
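As a small illustration of why the bridging direction matters: an initialized `&mut [u8]` can always be viewed as `&mut [MaybeUninit<u8>]` essentially for free, while the reverse direction requires initializing (zeroing) the buffer first. A hedged sketch of the cheap direction (the helper name `as_uninit` is hypothetical):

```rust
use std::mem::MaybeUninit;

// View an initialized byte slice as a possibly-uninitialized one, so a
// `MaybeUninit`-based reader can serve `&mut [u8]`-based callers cheaply.
fn as_uninit(buf: &mut [u8]) -> &mut [MaybeUninit<u8>] {
    // SAFETY: `MaybeUninit<u8>` has the same layout as `u8`. The hazard
    // is that a callee could *de-initialize* bytes by writing
    // `MaybeUninit::uninit()` through this reference; a bridge must
    // ensure callees only ever write initialized bytes.
    unsafe { &mut *(buf as *mut [u8] as *mut [MaybeUninit<u8>]) }
}
```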
### How to expose vectorized writes
One concern is how to expose vectorized writes. It's clear that the current trait, which has two methods, can be error-prone, because people can easily forget to implement the vectorized variant and hence get poor performance. Some of the alternatives narrow down to one method, which is good.
But [Carl argues here](https://github.com/tokio-rs/tokio/pull/1744#issuecomment-553575438) that in fact high performance callers really want to have two code-paths, depending on whether the source will be able to productively use vectorized reads, and hence that the trait should expose a `bool` method or something that lets the caller choose the path they want.
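The two-code-path caller Carl describes might look roughly like this. This is a sketch, not the proposed trait: the names (`VectoredWrite`, `supports_vectored`, `send_all`) are illustrative, and the methods are synchronous here for simplicity.

```rust
use std::io::{self, IoSlice};

// A capability query lets the caller pick a code path up front.
trait VectoredWrite {
    /// Will vectored writes actually help for this sink?
    fn supports_vectored(&self) -> bool;
    fn write(&mut self, buf: &[u8]) -> io::Result<usize>;
    fn write_vectored(&mut self, bufs: &[IoSlice<'_>]) -> io::Result<usize>;
}

fn send_all(dst: &mut impl VectoredWrite, parts: &[&[u8]]) -> io::Result<usize> {
    if dst.supports_vectored() {
        // Fast path: hand the sink all the pieces at once.
        let slices: Vec<IoSlice<'_>> = parts.iter().map(|p| IoSlice::new(p)).collect();
        dst.write_vectored(&slices)
    } else {
        // Fallback: coalesce into one buffer to avoid many small writes.
        let joined: Vec<u8> = parts.concat();
        dst.write(&joined)
    }
}

// A trivial in-memory sink, for demonstration.
struct Sink { data: Vec<u8>, vectored: bool }

impl VectoredWrite for Sink {
    fn supports_vectored(&self) -> bool { self.vectored }
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn write_vectored(&mut self, bufs: &[IoSlice<'_>]) -> io::Result<usize> {
        let mut n = 0;
        for b in bufs {
            self.data.extend_from_slice(b);
            n += b.len();
        }
        Ok(n)
    }
}
```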
## Alternatives
### Original
```rust
fn poll_read(buffer: &mut [u8])
```
### Some way to "opt-in" to not zeroing memory
Similar to what std offers.
Pros:
* Compatible with Read, permits simpler signature
Cons:
* Somewhat complex
* Need an "unsafe to implement, but safe to call" mechanism for methods, which we lack
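A rough sketch of this opt-in shape, loosely modeled on the design direction std has explored for `Read` (all names below are illustrative, and the signature is simplified to a synchronous one):

```rust
trait AsyncReadSketch {
    /// Returns true if this reader promises never to *read* from the
    /// buffer it is given, so the caller may skip zeroing it. Overriding
    /// this to return true without upholding the promise is exactly the
    /// "unsafe to implement but safe to call" hazard that the trait
    /// system cannot currently express for an ordinary method.
    fn buffer_may_be_uninitialized(&self) -> bool {
        false // conservative default: the caller must zero the buffer
    }

    fn poll_read(&mut self, buf: &mut [u8]) -> usize;
}

/// A reader over an in-memory slice: it only ever writes into `buf`,
/// so it can safely opt in.
struct SliceReader<'a>(&'a [u8]);

impl AsyncReadSketch for SliceReader<'_> {
    fn buffer_may_be_uninitialized(&self) -> bool {
        true // we promise: `poll_read` never reads from `buf`
    }
    fn poll_read(&mut self, buf: &mut [u8]) -> usize {
        let n = self.0.len().min(buf.len());
        buf[..n].copy_from_slice(&self.0[..n]);
        self.0 = &self.0[n..];
        n
    }
}
```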
### dyn Trait
Original proposal:
```rust
fn poll_read(buffer: &mut dyn Buf)
```
Pros:
* One method supports both vectorized writes and ordinary writes
* Buffer encapsulates uninitialized memory
* Bridging is (presumably) easier because the `Buf` trait can be implemented for many types
Cons:
* Requires a virtual call for leaf writes
* Because dyn traits (like `dyn Buf`) cannot be created for unsized types like `[u8]`, the actual type when invoked with a `&mut [u8]` buffer will be doubly indirect (`&mut &mut [u8]`), and similarly callers may need some extra `&mut`. Not clear that this matters.
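A minimal sketch of the `dyn Buf` shape (the trait here is illustrative, not the real `bytes::BufMut`):

```rust
// One object-safe trait serves many buffer types; the buffer type, not
// the reader, encapsulates any uninitialized-memory handling.
trait Buf {
    fn remaining_mut(&self) -> usize;
    fn put_slice(&mut self, src: &[u8]);
}

impl Buf for Vec<u8> {
    fn remaining_mut(&self) -> usize { usize::MAX - self.len() }
    fn put_slice(&mut self, src: &[u8]) { self.extend_from_slice(src); }
}

// The reader only sees `&mut dyn Buf`, so each leaf write is a virtual
// call -- the perf concern noted in the cons above.
fn fill_with(buf: &mut dyn Buf, data: &[u8]) {
    buf.put_slice(data);
}
```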
### impl Trait
[As described here](https://github.com/tokio-rs/tokio/pull/1744#issuecomment-550717991)
```rust
fn poll_read(buffer: impl Buf) where Self: Sized;
fn poll_read_dyn(buffer: &mut dyn Buf);
```
Pros:
* Like dyn Trait, but more flexible and eliminates fears of perf impact

Cons:
* More complex trait; implementing it is annoying because defaults cannot be provided, so there is some boilerplate
### Concrete struct
[Proposed here](https://github.com/tokio-rs/tokio/pull/1744#issuecomment-553575438)
Another option
```rust
fn poll_read(buffer: &mut BufStruct)
fn poll_read_vectorized(buffer: &mut BufStruct)
```
XXX write-up some of the pros/cons
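One way such a struct could work, as a hedged sketch (the name `BufStruct` comes from the proposal above, but the fields and methods here are guesses for illustration): a non-generic wrapper owns the cursor into possibly uninitialized memory, so no virtual calls are needed and the unsafe bookkeeping lives in one place.

```rust
use std::mem::MaybeUninit;

struct BufStruct<'a> {
    data: &'a mut [MaybeUninit<u8>],
    filled: usize,
}

impl<'a> BufStruct<'a> {
    fn new(data: &'a mut [MaybeUninit<u8>]) -> Self {
        BufStruct { data, filled: 0 }
    }

    /// Append initialized bytes, clamped to remaining capacity. The
    /// struct tracks how much is filled, so callers can never observe
    /// the uninitialized tail.
    fn put_slice(&mut self, src: &[u8]) {
        let n = src.len().min(self.data.len() - self.filled);
        for (dst, &b) in self.data[self.filled..].iter_mut().zip(&src[..n]) {
            *dst = MaybeUninit::new(b);
        }
        self.filled += n;
    }

    /// The initialized prefix.
    fn filled(&self) -> &[u8] {
        // SAFETY: all bytes in `..self.filled` were written via `put_slice`.
        unsafe {
            &*(&self.data[..self.filled] as *const [MaybeUninit<u8>] as *const [u8])
        }
    }
}
```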
### Take a slice of `MaybeUninit`
Another option
```rust
fn poll_read(buffer: &mut [MaybeUninit<u8>])
```
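A sketch of what this shape looks like on both sides (all names here are illustrative, and the signatures are simplified to synchronous ones). The implementor writes without reading and returns how many bytes it initialized; the caller must then unsafely assert that prefix is initialized, which is the main ergonomic cost.

```rust
use std::mem::MaybeUninit;

// Implementor side: write into the buffer without ever reading it.
fn read_from(src: &[u8], buf: &mut [MaybeUninit<u8>]) -> usize {
    let n = src.len().min(buf.len());
    for (dst, &b) in buf.iter_mut().zip(&src[..n]) {
        *dst = MaybeUninit::new(b);
    }
    n
}

// Caller side: only the first `n` bytes may be assumed initialized, and
// asserting that requires `unsafe`.
fn read_to_vec_uninit(src: &[u8]) -> Vec<u8> {
    let mut storage = [MaybeUninit::<u8>::uninit(); 8];
    let n = read_from(src, &mut storage);
    // SAFETY: `read_from` initialized exactly the first `n` bytes.
    let filled: &[u8] =
        unsafe { &*(&storage[..n] as *const [MaybeUninit<u8>] as *const [u8]) };
    filled.to_vec()
}
```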