Asynchronous Programming in Rust

Author: Didrik Nordstrom (betamos @ github)
Status: Work in progress
Last updated: 2020-08-14

Publishing target: fuchsia.dev
Audience: Rust on Fuchsia developers
Goal: Improve developer confidence, productivity and understanding of asynchronous Rust.
Out of scope: Build your own executor and synchronization primitives.
Pre-requisites: Basic knowledge of the Rust type system and ownership model, OS threads, concurrency vs parallelism.
Dual-format: Guide (linear reading) and reference (searchable).

Rust comes with excellent documentation. The Rust Book can be read cover to cover and goes over almost all features of Rust. Asynchronous Rust was stabilized only recently, and it dramatically impacts APIs, design patterns and data structures. Early research indicates that developers find async and its concepts harder to use and understand than Rust in general. The existing Rust Async Book covers many concepts of async Rust, but it is incomplete and, in my opinion, does not clearly separate the knowledge needed to write asynchronous applications from advanced topics such as building an executor, implementing futures manually and understanding wakers. This risks overloading prospective developers with information that beginners do not need.

Although this draft is currently written for Fuchsia developers, most concepts are not Fuchsia-specific. This follows the model of other runtimes, such as tokio, which provide and maintain their own documentation. Per-runtime documentation is a better fit, given the meaningful differences across runtimes. However, most features, concepts and mechanisms are runtime-independent. As such, this guide can be repurposed for async Rust developers in general by substituting references to Fuchsia with more common environments.

Overview

Asynchronous (often abbreviated "async") Rust lets you run multiple tasks concurrently on one or more threads, while preserving zero-cost abstraction performance as well as most of the look and feel of regular, synchronous, Rust. Async Rust is used in operating systems, production-grade servers and other complex systems.

The state of asynchronous Rust

Async Rust is an emerging area. So far, only a small set of crucial features has made it into stable Rust, most notably the Future trait and the async/await syntax. Other features reside in library crates, such as async runtimes, combinators and synchronization primitives. In order to be fully productive with async Rust today, you need to rely on a mixture of core language features, the standard library and community crates. In the future, we expect more async features to be integrated into stable Rust.

Consider async early

Much like security and testability, async should be considered early in the design process. Retrofitting async onto an existing synchronous code base is tedious and costly. An async application or library typically uses different APIs, control flow, design patterns and data structures as compared to its synchronous counterpart.

Structure of this guide

This guide is written for:

  • People who already know Rust and want to learn the basics about asynchronous programming in general, and asynchronous Rust in particular.
  • People who are considering async Rust for new projects and want to know how async Rust can fit into their needs.
  • People who already write async Rust, but want to increase their understanding and productivity.

Familiarity with the Rust type and ownership system will make the technical details of this guide easier to understand.

Introduction to asynchronous programming

Asynchronous programming is a paradigm that has gained traction over the years, as multitasking is playing an increasingly important role in our day-to-day applications. It is suitable for many types of applications, so learning to "think in async" is tremendously useful, both within and outside of Rust.

In some languages, asynchronous programming is baked into the language runtime, such as in modern JavaScript. Since Rust is a low-level language with a minimal runtime, async support is not included with the language runtime. Instead, it is your choice whether you want to use async or not.

Background: The problem of multitasking

Imperative programming languages let you write code that is executed in sequence. Python, JavaScript, C, C++, Go, Rust (and many more) are very different languages, yet they all adhere to this basic principle. The fundamental execution environment consists of a call stack, a heap and a program counter representing the current state of the program.

This model is sufficient for sequential workloads. However, real world applications often require multitasking, for instance:

  • A server needs to serve multiple clients at the same time.
  • A graphical program needs to be responsive while some work is going on in the background.
  • A computational workload should be sharded across multiple CPU cores in order to reduce wall time consumption.

Imperative programming alone does not meet these needs. This leads us to the following problem statement:

How do we model multitasking in an imperative programming environment?

Many different languages, frameworks, libraries and design patterns have been developed to answer this question. As of today, there is no widely acknowledged consensus on which approach is the best.

Why asynchronous programming?

The most fundamental multitasking primitive is a thread, which is exposed by the operating system itself. Individual threads can run independently on separate CPU cores. The simplest approach is to create one thread for each task. However, threads come with a cost in terms of CPU time, memory consumption and synchronization.

In short, using one thread per task results in sub-optimal resource utilization, in particular for workloads with a large number of tasks. The problem statement can be phrased as:

How do we run M tasks concurrently on N worker threads?

Here, M represents the number of tasks in the program. It can be very large, and vary at runtime. For instance, a server that has one task per client could have a large number of tasks.

The number of worker threads, N, deserves further explanation. We just stated that threads are expensive, so why use threads at all in our search for a better way of multitasking? The reason is that threads can run in parallel on multiple logical CPU cores, which can reduce the runtime of parallel workloads. In fact, threads are the only abstraction that unlocks parallelism, so we have no choice. However, more threads only help up to a point: once all cores are saturated (i.e. working at maximum capacity), additional threads add overhead without benefit. As such, N should not exceed the number of cores. (In some runtimes, a small constant number of threads is also used for auxiliary purposes.)

Asynchronous programming offers a performant and ergonomic solution to this problem. It lets you run multiple logical tasks on a single OS thread, by exposing multitasking constructs within the programming environment.

Futures and promises

Many async programming environments provide a future or promise type. They represent an asynchronous operation, where creation and completion are logically separate. Technically, any operation can be asynchronous, even something simple like adding two numbers. In practice we only make operations asynchronous if they cannot complete quickly on their own, for instance if the operation involves I/O, IPC or synchronization. These operations can be suspended while they wait for something to happen elsewhere, such as the arrival of an IP packet or a mutex being locked. In Rust, an async operation is called a future.

Green threads, coroutines and tasks

An asynchronous programming environment often provides green threads, coroutines or tasks, which represent a series of operations that are largely logically independent from the rest of the program. For example, an HTTP server may create a separate task for each incoming HTTP request. In asynchronous Rust they are (informally) called tasks.

Asynchronous runtimes

An async runtime is responsible for running async applications. If you are familiar with how OS threads work, this analogy may help:

  • The operating system schedules and executes the threads of a synchronous application.
  • The async runtime schedules and executes the tasks of an asynchronous application.

Some programming environments provide an implicit runtime which you cannot directly interact with — you may not even be consciously aware of it! In Rust, async runtimes are provided by community-maintained crates. You have to manually import and invoke the runtime in order to run your async application.

What about async/await?

The async/await syntax lets you write sequential but asynchronous code, similar to regular synchronous code, which is immensely useful in practice. However, async/await is neither necessary to write asynchronous code, nor is it by itself sufficient to achieve in-thread concurrency, which is the goal of asynchronous programming. Understanding async/await is easier if you are already familiar with the asynchronous programming primitives of your programming environment. In Rust, it is easier to understand async/await once you know the fundamentals of futures.

Blocking operations

A blocking operation can occupy the OS thread for a long time. It is usually the result of either:

  • A blocking system call, such as sleeping or reading from a socket.
  • A computationally expensive operation, such as searching for large prime numbers.

In a traditional sequential program, blocking operations are completely normal, because there is nothing else to be done while the program waits.

However, an asynchronous program needs to make progress on multiple tasks concurrently on each OS thread. When a blocking operation is running, it prevents other tasks on the same thread from making progress. This can lead to large latency spikes and even deadlocks. As such, blocking operations should be avoided in async applications.

There is no definite consensus on what constitutes "a long time", and hence blocking is not an objective term. By convention, if a system call waits for an external event to occur, it is considered blocking. For computational operations, the line between blocking and non-blocking is blurry. As a rule of thumb, operations that take more than a millisecond of CPU time on modern hardware can be considered blocking. Fortunately, in most circumstances, typical computations are significantly faster than that. If you have stricter or looser latency requirements, you may want to use a different reference number.

Note that async applications can invoke synchronous code freely, as long as it is non-blocking. Some programming environments support running blocking code in a way that doesn't interfere with the rest of the async application. In Rust, you can look up whether this is possible in the documentation provided by your runtime.

Summary

The goal of asynchronous programming is multitasking, allowing multiple logical tasks to make progress concurrently on one or more OS threads. It is typically more performant and ergonomic than using threads directly. Asynchronous applications should avoid blocking operations, which can cause latency spikes and deadlocks. An async runtime schedules and executes an asynchronous application, switching between tasks as needed. The async/await syntax lets you write sequential but asynchronous code. In async Rust,

  • An asynchronous operation is called a future
  • A logically independent asynchronous task is simply called a task
  • Runtimes are provided by community-maintained crates

TODO: Move

Comparison table: Writing async code

| Objective | Sync Rust | Async Rust |
| --- | --- | --- |
| Fast function | Regular function, e.g. `fn square(x: i64) -> i64` | (same) |
| Slow* function | Blocking function, e.g. `fn sleep(t: Duration)` | Async function, e.g. `async fn delay()`, `fn delay() -> impl Future<..>` or `fn delay() -> DelayFut` |
| Fast closure | Regular closure, e.g. `\|\| { square(x) }` or `move \|\| { square(x) }` | (same) |
| Slow* closure | Blocking closure, e.g. `\|\| { sleep() }` or `move \|\| { sleep() }` | Async closure, e.g. `\|\| async { delay().await }` or `move \|\| async { delay().await }` |
| Fast producer of multiple values | `Iterator<Item=T>` | `Iterator<Item=T>` |
| Slow* producer of multiple values | `Iterator<Item=T>` | `Stream<Item=T>` |
| Achieving concurrency only | N/A (except through a custom event loop) | Spawn local tasks**, future and stream combinators |
| Achieving parallelism | Spawn threads | Spawn tasks** |
| Thread/task local storage | Provided by the thread_local crate | Task-local storage may be provided by your runtime** |
| Synchronization | Provided by std::sync; reference counting must use std::sync::Arc | Provided by futures; local tasks may use std::rc::Rc for reference counting |
| I/O | Provided by std::fs and std::net | Provided by your runtime** |
| Timers | Provided by std::thread::sleep | Provided by your runtime** |

Runtime characteristics

| Feature | Sync Rust | Async Rust |
| --- | --- | --- |
| Scheduler of threads/tasks | OS kernel | The executor of your runtime, which runs in user-space |
| Scheduling strategy | Pre-emptive and cooperative. Pre-emptive means that a logical task can be suspended by the scheduler at any point. | Cooperative only, meaning that the scheduler cannot resume until the task voluntarily yields control back to it. |
| Threading model | 1:1 (each logical task has its own OS thread) | N:M model: N logical tasks run on M OS threads, where M is typically the number of CPU cores. N:1 model: N logical tasks run on a single OS thread. Also known as "multithreaded" (N:M) and "single-threaded" (N:1) executors. |
| Isolation against slow threads/tasks | Yes, through pre-emptive scheduling | No. You must manually avoid blocking code to prevent congestion and/or deadlocks. |
| Isolation against panicking threads/tasks | Yes, parent threads can detect and resume after child panics. | See the documentation of your runtime.** |
| Cost of creating threads/tasks | Slow, due to syscalls and kernel provisioning. Can be mitigated by a thread pool. | Very cheap (allocate, then enqueue). |
| Memory cost of dormant threads/tasks | Significant, since each thread needs its own stack. Can be mitigated by a thread pool. | Minimal, since each task only provisions memory for its input future, whose size is known up-front. |
| Cost of switching between threads/tasks | Context switches have considerable overhead. | Faster, since many context switches are replaced by user-space task switches.*** |
| Cost of synchronization | Considerable, due to atomics, memory barriers and context switching. | N:M model: cheaper than sync Rust due to fewer context switches.*** N:1 model: significantly cheaper when using single-threaded synchronization primitives.*** |
| Cost of I/O | Each read or write incurs a mode switch. | Multiple read and write ops can be pooled into a single syscall, which can improve performance dramatically under load.*** |
| Binary size | (Baseline) | Larger, due to Future types. In particular, async fn auto-generates futures that can be large. Mitigations exist. |

* Here, slow means any operation that may not complete quickly. Blocking simply means slow and synchronous.
** Depends on your runtime.
*** Theoretical benefit. Needs to be verified through benchmarks.

A future is the async equivalent of a plain old function. Both perform a set of operations
and eventually complete. A function executes from start to end without interruption, but a future executes in bursts. It makes as much progress as possible when it is polled, and then it yields control back to the runtime, allowing other work to be done in the meantime.

The async and await keywords are counterparts: async is a declaration, similar to fn, while await is application, similar to a function call with ().

A task is the async equivalent of a thread. Just like the OS owns and schedules threads, the async runtime owns and schedules tasks. In order to create a thread, you pass a function. When you're creating a task you pass a future.

The async runtime is the equivalent of the OS scheduler. The OS schedules threads in kernel-space, but the async runtime schedules tasks in user-space.

Async combinators are similar to regular combinators (map, and, or_else etc), but deal specifically with futures. Some async combinators (like join, select, etc) alter the concurrent control flow and as such have no synchronous equivalents.

Fundamentals

This section covers the fundamental building blocks of async Rust, and how they fit together:

Think of async Rust as an extension of synchronous Rust, rather than a substitute. In async Rust, you often call synchronous functions in the same way as you always have. However, you cannot call async functions from a sync function:

fn foo() {}
async fn bar() {}

fn baz() {
    foo();        // OK: sync -> sync
    bar().await;  // Error: sync -> async
}

async fn quux() {
    foo();        // OK: async -> sync
    bar().await;  // OK: async -> async
}

In async Rust, operations that involve I/O, timeouts and synchronization are usually async. Since these operations can only be invoked by other async functions, async tends to be "contagious" and bubble up all the way to the main function. As a result, you will see async fn a lot in async Rust, even for functions that are only using I/O indirectly through nested function calls.

An asynchronous operation in Rust is represented as a future—a type which implements the Future trait. Its associated Output type denotes the type of the result, which is available once the future has completed.

Futures are inert just like any other type, but they know how to make progress. A future makes progress only when polled through the poll method. It returns Poll::Pending if more work is needed and finally Poll::Ready(val) when complete. Futures cannot be polled after completion.

Most of the time, you won’t need to interact with poll directly unless you implement Future manually [link] or build an executor. Instead, polling is handled automatically by the executor.

A future can represent any asynchronous operation. In practice, a future represents one of the following operations:

  • Synchronization futures are driven to completion by other operations within the program, e.g. async mutexes and channels. This is known as cross-task synchronization.
  • System futures are driven to completion by the operating system, e.g. IO, IPC and timers. They rely on asynchronous system calls. System futures can sometimes represent cross-process synchronization, but for our purposes we still consider those system futures.
  • Composed futures are driven to completion by their enclosed futures, which can be nested arbitrarily. Execution order and completion criteria vary. These are typically either of the following:
    • Async expressions: The async keyword denotes a block of code which is evaluated sequentially, but without blocking progress of other asynchronous operations.
    • Combinators: Types, functions and macros for non-sequential control flow or data transformation, such as select, join, map, Stream and FuturesUnordered.

Identifying asynchronous functions

When browsing code and documentation, you’ll need to differentiate between asynchronous and synchronous APIs. An asynchronous function returns a future. Typically, it comes in one of the following forms:

// The return value is a concrete, but unknown, type which implements Future
fn get_user(id: u64) -> impl Future<Output=User> { .. }

// A function with `async` in its signature implies that a future is returned
async fn get_user(id: u64) -> User { .. }

// Returns a custom type that implements Future
fn get_user(id: u64) -> UserFut { .. }

// Somewhere else
impl Future for UserFut {
    type Output = User;
    // ...
}

Sometimes a future is wrapped in a smart pointer, e.g. Box<dyn Future<..>>. For our purposes we consider these futures as well. [see boxing futures].

The async/await syntax

The async keyword denotes an asynchronous expression (a block of code) that evaluates to an anonymous future, similar to how a closure evaluates to an anonymous function. The future’s Output type is inferred by the compiler.

Within an async expression you can add .await to a future (similar to how ? is added to a result). .await suspends execution of the async expression until the future is ready, without blocking the execution of other asynchronous operations. The term “await point” refers to a specific .await statement within an async block.

The async keyword can be used with standalone functions and methods, closures as well as plain blocks of code.

Async blocks

An async block automatically captures variables from the outer scope by reference. The resulting future must not exceed the lifetime of the captured variables.

let id = 42;

// A regular async block. We suffix with `_fut` to denote a future.
let friend_count_fut = async {
    // `id` is captured by reference
    let user = get_user(id).await;
    let friends = user.get_friends().await;
    friends.len()  // The future's Output type is inferred here
};

Async move blocks

An async move block captures variables from the outer scope and takes ownership of them, similar to a move closure. This tends to be useful in practice, since you don’t have to worry about the lifetime of captured references:

let mut count = 42;
let incr_fut = async move {
    count += 1;  // Count captured as owned
};

Async closures

Async closures are anonymous functions (closures) that return a future. They are written async |x, y| { .. } or async move |x, y| { .. }. However, this syntax is currently unstable.

As a workaround, you can wrap an async block inside a regular closure, like so:

let closure = |id| async move {
    get_user(id).await.get_friends().await.len()
};
let friend_count_fut = closure(42);

Async functions

The async keyword in a function declaration denotes an asynchronous function. It returns a future where the Output type is equal to the return type of the signature. Async functions use move semantics, similar to async move blocks.

pub async fn get_user(id: u64) -> User {
    Database::fetch_one("user", id).await
}

Spelling out async in the signature helps users quickly identify async functions, and reduces verbose boilerplate. The asynchronous get_user method from above could also be written without async in the signature:

pub fn get_user(id: u64) -> impl Future<Output=User> {
    async move {
        Database::fetch_one("user", id).await
    }
}

The impl Future<Output=User> return type denotes a concrete type that implements Future, which is inferred at compile time. Recall that the concrete type of an async block is anonymous.

Async methods and associated functions

You can use the async keyword on methods and associated functions too:

impl User {
    // Async method
    pub async fn get_friends(&self) -> Vec<User> { .. }

    // Async associated function (doesn't use self)
    pub async fn get_common_friends(a: &Self, b: &Self) -> Vec<User> { .. }
}

Recall that the resulting future must not outlive the references it captures. This applies to &self and &mut self as well. [separate section on lifetimes?]

Async in traits

Traits are Rust’s primary abstraction mechanism. However, the async keyword is not supported in traits. Fortunately, there are a number of workarounds. See the chapter on [writing generic asynchronous code].

Combinators

Combinators are types, methods and macros that enable non-sequential control flow and data transformations. The combinators in this chapter come from the futures crate. This crate provides many useful utilities for async code including a large collection of combinators.

Two fundamental combinators are the join! and select! macros. They correspond to conjunction and disjunction, respectively. Both execute two futures concurrently. join! awaits both operations and returns their results. select! awaits until either of the futures is complete and returns its result.

// Awaits both futures concurrently
let (user1, user2) = join!(get_user(42), get_user(43));  

// Awaits whichever future completes first, and returns its result
let user = select! {
   user1 = get_user(42).fuse() => user1,
   user2 = get_user(43).fuse() => user2,
};

The .fuse() method transforms a future into a FusedFuture, and is required by select!. See the section on fusing[link] for an explanation on why this is necessary.

TODO:

  • for_each_concurrent
  • FuturesUnordered
  • Stream

Advanced Futures

TODO:

  • Pinning
  • Boxing
  • Fusing
  • Send-ness
  • Lifetimes of futures
  • Async expression state machines

The Asynchronous Runtime

High level workings and usage. Should not extensively cover lower level mechanisms.

TODO:

  • Tasks
  • Wakers
  • The event-loop
  • Multithreaded executors

Developer Guide

TODO:

  • Pick an executor
  • Avoid blocking code
  • Writing generic async code
  • Data flow and synchronization
  • Testing

Demo Project

Author: Lee B

WIP: See https://hackmd.io/@lbernick/SkgO7bCMw