---
tags: rav1e
title: February Summary
---
> [rav1e](https://github.com/xiph/rav1e) is an AV1 encoder written in [Rust](https://rust-lang.org).

Here is a quick summary of what happened last month. I will try to write a recap every month.
## February Summary
We poured a lot of work into improving the encoding speed. You can read some details of the journey in these posts:
- We analyzed our memory access patterns and [improved the layout and the update strategy](https://dev.to/barrbrain/video-encoder-rollback-optimization-in-rav1e-4d5k) of a structure that is accessed heavily in our hottest code path.
- We [parallelized one of the remaining bottlenecks](https://dev.to/luzero/temporal-rdo-update-optimization-2pf1) to raise the average thread usage, which improves both speed and latency.
- We [added the temporal RDO lookahead to our speed levels](https://dev.to/master_of_zen/per-speed-rdo-lookahead-frames-optimization-5ai2), measured its quality-vs-speed impact and retuned the levels accordingly.
> The benchmarks are prepared using [speed-levels-rs](https://crates.io/crates/speed-levels-rs).
>
> The encoder is using the following settings:
> ```
> --threads 16 --tiles 16 -l 100 <file> -o <encoded> -s <level>
> ```
> The source file is **Bosphorus** from the [ultravideo test sequences](http://ultravideo.fi/#testsequences). The 1080p 10bit version was produced by scaling down the 4k 10bit version, since a 1080p 10bit version is not available on the website.
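
To make the settings above easier to reproduce by hand, here is a minimal sketch that prints one `rav1e` command line per speed level. It only reuses the flags quoted above. The file names (`bosphorus.y4m`, `bosphorus-s<level>.ivf`) and the 0 to 10 range of levels are assumptions made for illustration, and this is not how speed-levels-rs itself drives the encoder.

```rust
/// Build the rav1e argument list for one speed level,
/// mirroring the settings quoted above.
fn rav1e_args(input: &str, output: &str, speed: u8) -> Vec<String> {
    vec![
        "--threads".to_string(),
        "16".to_string(),
        "--tiles".to_string(),
        "16".to_string(),
        "-l".to_string(),
        "100".to_string(),
        input.to_string(),
        "-o".to_string(),
        output.to_string(),
        "-s".to_string(),
        speed.to_string(),
    ]
}

fn main() {
    // Assumption: speed levels run from 0 (slowest) to 10 (fastest).
    for speed in 0..=10u8 {
        // Hypothetical file names, chosen only for this example.
        let output = format!("bosphorus-s{}.ivf", speed);
        let args = rav1e_args("bosphorus.y4m", &output, speed);
        println!("rav1e {}", args.join(" "));
    }
}
```

Each printed line can be pasted into a shell and timed on its own, or the argument list can be handed to `std::process::Command` if you would rather launch the encoder from Rust.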

Overall, our **aarch64** support is getting fairly good, although there is still a lot of room for improvement. Its 10bit support, in particular, is already in fairly good shape.

## Digging deeper
### x86_64
As expected, the memory layout optimization that landed between `p20210209` and `p20210216` had the largest impact on speed levels 0 and 1, while optimizing and tuning the temporal RDO lookahead computation had the largest impact on speed levels 9 and 10.

The `x86_64` **10bit** encoding is behaving similarly. Our SIMD support for it received a [large](https://github.com/xiph/rav1e/commit/8b930d2a) [boost](https://github.com/xiph/rav1e/commit/a420bc3) in January and there is an ongoing effort to improve it even further in March.

### aarch64
The impact of the optimizations on **aarch64** has been more radical, with a fairly large relative improvement at speed 10.

On 10bit, the SIMD coverage for **aarch64** is already larger, so the boost is smaller than on 8bit.

## Coming next
We have already landed additional SIMD for both **x86_64** and **aarch64**. [David](https://dev.to/barrbrain/) started working on improving the [segment selection](https://github.com/xiph/rav1e/pull/2682), and I have [finally](https://github.com/lu-zero/demo-mt) come up with an internal architecture that should give us better thread pool usage without hurting the overall latency much.
March is going to be exciting.