February Summary

---
tags: rav1e
title: February Summary
---

> [rav1e](https://github.com/xiph/rav1e) is an AV1 encoder written in [Rust](https://rust-lang.org)

Here is a quick summary of what happened last month. I will try to write a recap every month.

## February Summary

We poured a lot of work into improving the encoding speed. You can read some details of the journey:

- Analyze our memory access patterns and [improve the layout and the update strategy](https://dev.to/barrbrain/video-encoder-rollback-optimization-in-rav1e-4d5k) of a structure that is accessed a lot in our hottest code path.
- [Parallelize one of the remaining bottlenecks](https://dev.to/luzero/temporal-rdo-update-optimization-2pf1) so that we improve the average thread usage, and with it both speed and latency.
- [Add the temporal RDO lookahead to our speed levels](https://dev.to/master_of_zen/per-speed-rdo-lookahead-frames-optimization-5ai2), measure its quality-vs-speed impact and retune the levels accordingly.

> The benchmarks are prepared using [speed-levels-rs](https://crates.io/crates/speed-levels-rs).
>
> The encoder is using the following settings:
> ```
> --threads 16 --tiles 16 -l 100 <file> -o <encoded> -s <level>
> ```
> The source file is **Bosphorus** from the [ultravideo test sequences](http://ultravideo.fi/#testsequences). The 1080p 10bit version is the 4k 10bit version scaled down, since it is not available on the website.

![](https://i.imgur.com/GmIN3Il.png)

Overall our **aarch64** support is getting fairly good, but there is still a lot of room for improvement. Our 10bit support on it is fairly good, though.

![](https://i.imgur.com/X88zTZD.png)

## Digging deeper

### x86_64

As expected, the memory layout optimization that happened between `p20210209` and `p20210216` had the largest impact on speed levels 0 and 1, while optimizing and tuning the temporal RDO lookahead computation had the largest impact on speed levels 9 and 10.

![](https://i.imgur.com/5Y1XpTv.png)

The `x86_64` **10bit** encoding behaves similarly.
Our SIMD support for it received a [large](https://github.com/xiph/rav1e/commit/8b930d2a) [boost](https://github.com/xiph/rav1e/commit/a420bc3) in January and there is an ongoing effort to improve it even further in March.

![](https://i.imgur.com/ETjG9lQ.png)

### Aarch64

The impact of the optimizations on **aarch64** has been more radical, with a fairly large relative improvement on speed level 10.

![](https://i.imgur.com/5QMsfxO.png)

On 10bit, the SIMD support for **aarch64** is more extensive, which results in a smaller boost compared to 8bit.

![](https://i.imgur.com/nK6uE8q.png)

## Coming next

We already landed additional SIMD for both **x86_64** and **aarch64**, [David](https://dev.to/barrbrain/) started working on improving the [segment selection](https://github.com/xiph/rav1e/pull/2682), and I have [eventually](https://github.com/lu-zero/demo-mt) come up with an internal architecture that would give us better thread pool usage without impacting the overall latency much.

March is going to be exciting.