# Project Portable SIMD: Getting Started

###### tags: `Portable SIMD` `Blog Draft`

RFC: https://github.com/rust-lang/rfcs/pull/2977

## First steps

- Merge the RFC
- Create tracking issue in `rust-libs`
- Decide on meeting times: https://doodle.com/poll/95mmtfp6ssnpzm4t

## Getting started

What do we need to decide before we can get started?

- Where do we put the code?
  - New `rust-lang/coresimd` repo?
  - Module in `rust-lang/corearch`?
  - Crate in `rust-lang/corearch`?
  - Somewhere else?

## Announcing Portable SIMD

We're announcing the start of the _Portable SIMD Project Group_ within the Libs team. This group is dedicated to making a portable SIMD API available to stable Rust users.

The Portable SIMD Project Group is being led by [@calebzulawski](https://github.com/calebzulawski), [@Lokathor](https://github.com/Lokathor), and [@workingjubilee](https://github.com/workingjubilee).

### What are project groups?

Rust uses [project groups](https://rust-lang.github.io/rfcs/2856-project-groups.html) to help coordinate work. They're a place for people to get involved in helping shape the parts of Rust that matter to them.

### What is portable SIMD?

SIMD (Single Instruction, Multiple Data) instructions can apply the same operation to multiple values _simultaneously_. We say these instructions are _vectorized_ because they operate on a "vector" of values instead of a single value (it's similar to an array, but not to be confused with Rust's `Vec` type).

Different chip vendors offer different hardware intrinsics for achieving vectorization. Rust's standard library has exposed some of these intrinsics to users directly through the [`std::arch` module](https://doc.rust-lang.org/core/arch/index.html) since [`1.27.0`](https://blog.rust-lang.org/2018/06/21/Rust-1.27.html) shipped back in mid-2018.
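As a rough sketch of what using those intrinsics directly looks like, here is lane-wise `f32` addition written against the x86-64 SSE intrinsics (the function name `add_four` is made up for this example; other architectures would need entirely different intrinsics, which is the portability problem in a nutshell):

```rust
// Add four `f32` values at once using raw `std::arch` intrinsics.
// SSE is part of the x86-64 baseline, so no runtime feature
// detection is needed on that target.
#[cfg(target_arch = "x86_64")]
fn add_four(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0f32; 4];
    unsafe {
        // Load both arrays into 128-bit registers, add lane-wise, store back.
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
    }
    out
}

// On any other architecture this sketch falls back to plain scalar
// adds; a real implementation would use that platform's intrinsics.
#[cfg(not(target_arch = "x86_64"))]
fn add_four(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}

fn main() {
    assert_eq!(
        add_four([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]),
        [11.0, 22.0, 33.0, 44.0]
    );
}
```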
You _can_ build vectorized algorithms on `std::arch` directly, but that can mean sacrificing portability, or maintaining a different implementation for each CPU you want to support. The intrinsics can also just be noisy to work with.

The goal of the Portable SIMD project group is to provide a high-level API in a new `std::simd` module that abstracts these platform-specific intrinsics away. You just pick a vector type with the right size, like `f32x4`, perform operations on it, like addition, and the appropriate intrinsics will be used behind the scenes.

There are still reasons to reach for the intrinsics in `std::arch` directly, though: `std::simd` cannot mirror the details of every possible vendor API. `std::simd` is Rust's explicit, _portable_ vectorization story.

### How can I get involved?

If you'd like to get on board and help make portable SIMD a reality, you can visit our [GitHub repository](https://github.com/rust-lang/project-portable-simd) or reach out on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd) and say hi! :wave:

## Reference

* [Notes on SIMD in Rust](https://hackmd.io/-LaVJuO2SuS53uGX-D76tA?view)
* [`packed_simd` reverse dependencies](https://crates.io/crates/packed_simd/reverse_dependencies)

## Lokathor Scratch Space

### What is Portable SIMD?

SIMD stands for "Single Instruction, Multiple Data". It lets the CPU apply a single instruction to a "vector" of data. The vector is a single CPU register, but it's logically considered to have several "lanes" internally that are all of the same type. You can think of it as being *similar* to an array. Instead of adding two `f32` values together as one step and getting an `f32` out, you can add two `f32x4` values together as a single step and get an entire `f32x4` out.

Not every problem can be accelerated with SIMD processing, but particularly for multimedia and other "list processing" situations there can be significant gains.
As you might expect, different CPUs have different sets of SIMD instructions available to them. Portable SIMD will let us write our SIMD code just once using a high-level API, and then have the compiler sort out the problem of turning that code into the particular SIMD instructions that fit the target CPU. This is on a "best effort" basis: if the CPU doesn't support SIMD at all, the operations are simply compiled into processing each lane one at a time using normal CPU instructions.

Depending on your program, it might still be appropriate for you to use `std::arch` directly, but we are aiming to have the Portable SIMD API (eventually available in `std::simd`) cover as many use cases as possible. The `std::simd` types will also be easily convertible to the appropriate `std::arch` types for when you do need to use `std::arch`.
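In practice, lane-width code is usually written over slices, with a remainder loop for lengths that don't divide evenly into whole vectors. A hand-rolled sketch of that shape, using a hypothetical lane width of 4 (with a portable SIMD type, the inner loop body would become a single vector add):

```rust
// Sum a slice in 4-element chunks, then fold the per-lane
// accumulators and handle any leftover elements scalar-wise.
// This mirrors what "one lane at a time" compilation looks like.
fn sum_f32(xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let mut chunks = xs.chunks_exact(4);
    for chunk in &mut chunks {
        for i in 0..4 {
            acc[i] += chunk[i];
        }
    }
    // Fold the four lane accumulators, then add the remainder.
    acc.iter().sum::<f32>() + chunks.remainder().iter().sum::<f32>()
}

fn main() {
    let data: Vec<f32> = (1..=10).map(|i| i as f32).collect();
    assert_eq!(sum_f32(&data), 55.0);
}
```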