# Project Summary

Intrinsics are special built-in functions that act as an intermediary between high-level code and low-level processor instructions. They are often platform-specific assembly instructions that can be called directly from high-level code.

For Rust, the intrinsics are implemented in a crate (`core_arch`) and exposed through a module called `std::arch`. The `stdarch` repository houses several crates that provide the infrastructure for maintaining these intrinsics. `intrinsic-test` is one such crate, tasked with ensuring that the intrinsics are implemented correctly and behave exactly like their C++ counterparts.

The way this works is:
1. `intrinsic-test` generates Rust and C++ test-files that call each intrinsic with a wide variety of legal values as its arguments.
2. The Rust and C++ test-files are then compiled and executed.
3. Both sets of test-files generate an output for each intrinsic.
4. The outputs are compared, and the program fails if any difference is found.

This project was intended to build the infrastructure necessary to add more architectures to `intrinsic-test`, which previously supported only ARM-based targets.

# Prior state of the project

Prior to this project, `intrinsic-test` targeted ARM-based platforms exclusively:
1. `aarch64_be-unknown-linux-gnu`
2. `aarch64-unknown-linux-gnu`
3. `armv7-unknown-linux-gnueabihf`

It was difficult to extend this checking to other platforms.

# What I did

### The following were the actions that needed to be performed irrespective of architecture:
1. Process a reference file (that lists the intrinsics and describes their arguments and return types)
2. Generate Rust test-files (that implement a test function for each intrinsic)
3. Generate C++ test-files
4. Compile the test-files
5. Run their binaries and compare the outputs

### The platform-specific actions would be:
1. Processing the reference for intrinsics.
   - ARM intrinsics were represented in JSON format
   - x86 intrinsics were in XML format
3. Ensuring correct argument loads
4. Ensuring correct display of the result of each intrinsic
5. Tweaking the configurations so that the way C++ displays the results matches the way Rust does

### My work during the GSoC period encompassed:
1. Splitting the codebase of `intrinsic-test` into a platform-independent (`common`) module and a platform-specific module for ARM, based on the above logic
2. Stabilizing the `common` module so that future architectures can use it more easily
3. Adding x86 to `intrinsic-test`
4. Creating the foundation for `loongarch` and `wasm32`

# Merged PRs

1. [`intrinsic-test` : Adding x86 behavioural testing](https://github.com/rust-lang/stdarch/pull/1894)
   - Currently tests only 5% of all testable x86 intrinsics, to keep the time taken by the CI in check
3. [Re-organize `intrinsic-test` to enable seamless addition of behaviour testing for more architectures](https://github.com/rust-lang/stdarch/pull/1758)
   - Broke the existing logic (which tested only aarch* and armv7 targets) into architecture-specific and architecture-independent logic
   - The intention was to set up the logic so that other architectures could be added just by implementing traits and tweaking configurations
4. [`core_arch::x86` : Fix the implementation of _kshift instructions](https://github.com/rust-lang/stdarch/pull/1930)
   - The definition of these intrinsics states that the result will be zero if the shift amount exceeds the bit length of the data type.
   - However, the definition within Rust performed a bounded shift (the shift amount was taken modulo the bit length of the argument).
   - The fix was to move from using `a << LEN` or `a >> LEN` to `a.unbounded_shl(LEN)` and `a.unbounded_shr(LEN)` respectively.
   - This issue was identified during a run of `intrinsic-test`.
5. 
   [`intrinsic-test` : Updated the Constraint enum to support discrete values](https://github.com/rust-lang/stdarch/pull/1892)
   - The arguments to some intrinsics do not span the entire range of their type; they may accept only a subset of it.
   - The existing setup allowed only a single value or a range of values.
   - However, x86 has intrinsics whose arguments accept a discrete set of values.
   - This change allows a constraint to express a discrete set of values.
6. [Feat: updated `Argument<T>` type for functional compatibility with other architectures](https://github.com/rust-lang/stdarch/pull/1887)
7. [`intrinsic-test` : Cleaning the IntrinsicType struct and related functionalities](https://github.com/rust-lang/stdarch/pull/1895)
8. [`intrinsic-test` : bringing back support for --generate-only flag ](https://github.com/rust-lang/stdarch/pull/1888)
9. [`intrinsic-test` : Modified TypeKind enum to group the Signed and Unsigned version of types](https://github.com/rust-lang/stdarch/pull/1876)

# Active PRs:

1. [`intrinsic-test` : Supporting LoongArch behavioural testing](https://github.com/rust-lang/stdarch/pull/1900)
2. [`stdarch-gen-wasm32` : Tool that creates spec sheet from wasm32's C and Rust source files](https://github.com/rust-lang/stdarch/pull/1910)

# Future targets

1. Processing all the intrinsics in one binary instead of running a separate process for each.
   - Currently x86 has over 10,000 intrinsics that take about 4 hours to run fully.
   - This is because each intrinsic runs in its own process, which adds process-management overhead.
   - This update will greatly reduce the time taken to process all the x86 intrinsics.
2. Adding Wasm32 to `intrinsic-test`
   - Adding `stdarch-gen-wasm32`, since no official reference file currently exists for Wasm32
   - Wasm32 has different function signatures in its C++ definitions compared to its Rust definitions
3. Adding LoongArch to `intrinsic-test`

# Challenges

1. 
   Picking up the relevant concepts was the most difficult challenge. I had to learn about vector registers, constraints and the like along the way.
2. The journey from implementing all the necessary traits for x86 and plugging in all the platform-specific functionality to a successful CI integration involved fixing numerous issues and adding a lot of helper functions. Some notable ones were:
   - Most of the 16-bit integer lane functions in C++ returned data as `unsigned int` on x86 (with the value set in the lower 16 bits), unlike Rust, which returned such data as `u16`. This wasn't handled by the existing `T1 cast(T2)` definition, which performed bit-preserving typecasts only between types of the same bit length. The easiest fix was to update the function to allow for:
     - Normal casting when the conversion is possible (e.g. from `uint32_t` to `uint64_t`)
     - Bit-preserving casting when the final type is smaller than or equal to the initial type
   - Defining custom lane functions for the 512-bit (`__m512`) vector types in C++, so that elements smaller than 128 bits could be extracted easily.
     - These functions performed a 2-level extraction: first the 128-bit "block" of the vector variable was loaded, and then the actual lane value was extracted from it.
   - Defining custom load functions for Rust test-files, so that the arguments to the intrinsics (which were the results of the load functions) had the correct type without cluttering the Rust test-file generation logic.
     - These functions involved only a zero-cost cast between the `{i/d/h}` types after the actual load, but removed the need to rewrite the way load calls were constructed.
3. The time taken by the CI for the `x86_64-unknown-linux-gnu` target was notorious: it took 4 hours to run and compare every intrinsic.
   - Earlier, there was a C++ file and a Rust module for each intrinsic.
   - To reduce compilation times, we grouped the intrinsics, initially making as many groups as there were CPU cores (to maximise parallelism).
   - However, this caused excessive resource consumption for x86, which had about 10,000 intrinsics, while the CI infrastructure had only 4 CPU cores.
   - We then decided to cluster the intrinsics into fixed-size groups to ease RAM consumption.

# Acknowledgements

I wish to express my gratitude to:
1. [@Amanieu](https://github.com/amanieu) for mentoring and supporting me throughout the project, and connecting me with experts on LoongArch and Nvidia intrinsics.
2. [@sayantn](https://github.com/sayantn) for reviewing my PRs to `core_arch`, guiding me with CI issues and assisting me with [adding the `apxf` target feature to Rust](https://github.com/rust-lang/rust/pull/139534).
3. [@folkertdev](https://github.com/folkertdev) for assisting me with parallelizing the code generation/compilation steps in `intrinsic-test` and suggesting useful design patterns (such as associated constants in traits).
4. [@adamgemmell](https://github.com/adamgemmell) for advising me on the design of the `common` module and helping me prioritize the targets I wished to achieve within this project.
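
As a closing illustration of the `_kshift` fix listed under Merged PRs, here is a minimal sketch of the difference between a bounded and an unbounded shift on a 16-bit mask. The function names `kshiftl_masked` and `kshiftl_unbounded` are hypothetical and this is not the actual `core_arch` code. The sketch assumes `wrapping_shl` as a stand-in for the old modulo behaviour, and uses `checked_shl(..).unwrap_or(0)`, which should give the same result as `unbounded_shl` while also compiling on older toolchains.

```rust
// Hypothetical stand-ins for a 16-bit mask shift; not the actual `core_arch` code.

/// Bounded shift: the shift amount is reduced modulo the bit width (16 here),
/// so shifting by 16 leaves the value unchanged instead of clearing it.
fn kshiftl_masked(a: u16, count: u32) -> u16 {
    a.wrapping_shl(count)
}

/// Unbounded shift: any shift amount >= the bit width yields 0.
/// `checked_shl` returns `None` in that case, which is mapped to 0 here.
fn kshiftl_unbounded(a: u16, count: u32) -> u16 {
    a.checked_shl(count).unwrap_or(0)
}

fn main() {
    let mask: u16 = 0x00FF;

    // For in-range shift amounts both versions agree.
    assert_eq!(kshiftl_masked(mask, 4), kshiftl_unbounded(mask, 4));

    // For out-of-range shift amounts they diverge, which is the kind of
    // mismatch that comparing Rust output against C++ output can surface.
    println!("masked:    {:#06x}", kshiftl_masked(mask, 16));
    println!("unbounded: {:#06x}", kshiftl_unbounded(mask, 16));
}
```

The two functions only disagree once the shift amount reaches the bit width, so a test that covers only small shift amounts would not catch the difference.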