owned this note
owned this note
Published
Linked with GitHub
# 2025-08-18 NumPy Optimization Team meeting
- Time: 17:00 (5 pm) UTC
- Join via Zoom: https://numfocus-org.zoom.us/j/81261288210?pwd=iwV99tGSjR61RTGEERKM4QKxe46g1n.1
- [NumPy community events calendar](https://scientific-python.org/calendars)
- [Upcoming community meeting agenda](https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg)
- [Community meetings notes archive](https://github.com/numpy/archive/tree/main/community_meetings)
- [Triage meetings notes archive](https://github.com/numpy/archive/tree/master/triage_meetings)
- [Documentation team meetings notes archive](https://github.com/numpy/archive/tree/main/docs_team_meetings)
- [Optimization team meetings notes archive](https://github.com/numpy/archive/tree/main/optim_team_meetings)
**Code of Conduct**
The NumPy Steering Council has made a strong commitment to creating an open, inclusive, and positive community.
All attendees of NumPy community events must adhere to the NumPy Code of Conduct (https://numpy.org/code-of-conduct/).
If you see violations, take a screenshot, intervene in a respectful manner, and report it to the CoC Committee via email. For more information, refer to the Reporting Guidelines section on https://numpy.org/code-of-conduct/.
**Present**: Sayed, Chris, Sam, Abhishek
## Topics
- NumPy SIMD Routines
- Re-implementation of SVML in Google Highway
- https://github.com/numpy/numpy-simd-routines
- https://github.com/numpy/numpy-simd-routines/pull/1
- https://github.com/numpy/numpy-simd-routines/pull/4
- Highway for x86?
- https://github.com/numpy/numpy/issues/29710
## PRs and issues labelled with "SIMD"
- To avoid SSE4 emulation, raising baseline to x86 v2 = SSE4 (https://github.com/numpy/numpy/pull/28896). Also v3 = Haswell, v4 = Skylake
- No major objections, can proceed with this.
- OpenSVML, Generic SIMD Math routines
- Naming
- numpy-simd-routines ?
- Python testing
- Generic across languages
- solya to generate and validate coefficients
- Maybe coremath alternative?
- Where to do:
1. Inside of Highway
2. Outside of NumPy in a separate repo
3. Inside of NumPy with split headers
## Previous topics
- Sayed: Improve highway usage and testing:
- Cleaner interface to use Highway by getting rid of tags (applicable to https://github.com/numpy/numpy/pull/28490)
- for `ConvertTo`, use lane types as the 'tag'. For all other docs, assume/use `ScalableTag<T>` as the tag.
- Test suite to test each intrinsic to simplify debugging
- Current Highway sources: Ability to toggle Highway on/off while building: Sayed to submit a PR fixing it.
- argsort using Highway on ASIMD: https://github.com/numpy/numpy/pull/28416: not ideal, we prefer to have argsort natively implemented in Highway
- LASX loongArch support: https://github.com/numpy/numpy/pull/28373
- Adding 128 bit LSX, LASX in highway: done/merged
- CI doesnt work yet for these target
- Convert fp arithmetic from C to C++ using HWY: https://github.com/numpy/numpy/pull/27402
- segfaults. Recommended asan/hwasan. Also important to split loops into main and a single iteration for the remainder, and only call LoadN in the remainder.
- Sayed says it regresses on x86 when divisor is scalar (6x on AVX2)
- Port VSX specific code to highway to benefit all platforms - helps with type conversions, even for platforms that lack native division (all except VSX and SVE)
## Follow-up from last meeting / discussions
- How to deal with dynamic dispatch in NumPy for future patches
- HWY_ targets in NumPy: maybe disable SVE. AVX2 is fine. SSE4=SSE4.2+AES+CLMUL, not 4.1.
- Highway Dynamic Dispatch
- https://github.com/google/highway/issues/2364
- Parked waiting for Raghuveer
- Hard to decide whether or not to integrate with every project
- Can we control it using Highway?
- Inline SLEEF functions can be wrapped by `#ifdef HWY_TARGET` and guard with the no-multiple-inclusion Highway macros
- Sayed has a side project going into upstream Meson for CPU features
- Can it detect Highway features and match the C++ flags?
- For functions targeting Highway we want to use Highway groups
- Highway can change older groups
- no longer define cpu features
- Discussed converting `exp` and `log` to Highway
- https://github.com/numpy/numpy/blob/main/numpy/_core/src/umath/loops_exponent_log.dispatch.c.src
- It only works on non-linux platforms, on linux we use Intel SVML so its better to convert Intel SVML impl instead
- Modulate dispatched x86 CPU features https://github.com/numpy/numpy/pull/28896
Jan: It needs to be updated to remove AVX512IFMA from ICL and adds BF16 to AVX512_SPR
## New topics
## Action items
- [name=r-devulap] to raise an issue for dispatch decision to gather all of the points together as well as whether to move to Google Highway
- Issue for testing unit for Highway to be able to develop kernels in Python. Raise issue in Highway to discuss. Short-term: can only implement the ops required by numpy, also allows prototyping numpy kernels. Mid-term goal could be to replace existing Highway tests. To be confirmed with other users: would it be OK to depend upon python?
## Backlog
* (https://github.com/numpy/numpy/labels/component%3A%20SIMD) (please add)
- float16/bfloat16: should we have better support, e.g. as an external package providing bfloat16
Raghuveer: not a short-term focus. Jan: bf16 quite useful for ML. Chris: would be good to get f16 and bf16 operands for MatMul; BLAS only supports f32 and f64.
- Leveraging more of Highway's pre-built functions
- Dot / Matvec for none-f32/f64? 8..32-bit MatMul is currently in gemma.cpp.
- Ref: https://github.com/numpy/numpy/pull/25675