
NumPy 2 lightning talk: Change np.int_ to np.int64 on all platforms

Today np.int_ reflects the C long type, which is 32 bits with MSVC (Windows), 64 bits on all other 64-bit platforms, and 32 bits on 32-bit platforms. The relevant issues discussing this are 9464, 12332, 17531 (which points out that the canonical name for np.int32 on win64 is np.intc), 17640, and 18540.
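A quick way to see what this means on a given machine is to compare the widths directly. This is only a minimal sketch: the helper name `bits` is made up for illustration, and the printed numbers depend on the platform, the compiler used to build NumPy, and the NumPy version, so no expected output is shown.

```python
import ctypes

import numpy as np


def bits(dtype):
    """Width of a NumPy dtype in bits (hypothetical helper for this note)."""
    return np.dtype(dtype).itemsize * 8


# Widths of the C types as seen by this Python build.
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8
c_int_bits = ctypes.sizeof(ctypes.c_int) * 8

print(f"C long : {c_long_bits} bits")
print(f"np.int_: {bits(np.int_)} bits")
print(f"np.intc: {bits(np.intc)} bits (C int is {c_int_bits} bits)")
```

On a NumPy build where np.int_ still follows C long, the first two lines should agree, and on 64-bit Windows both would be expected to report 32.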

Implications for Array API compatibility

Under Dtypes, the Array API spec says:

The default integer data type should be the same across platforms, but the default may vary depending on whether Python is 32-bit or 64-bit.

So this change would bring NumPy into compliance with the Array API.
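The rule quoted above can be checked roughly as follows. This sketch assumes that the intended default is int64 on 64-bit Python and int32 on 32-bit Python, which is one reading of the spec text and not something it states literally. The variable names are illustrative.

```python
import sys

import numpy as np

# Assumption: "64-bit Python" is detected through sys.maxsize.
python_is_64bit = sys.maxsize > 2**32

# Assumed target default: int64 on 64-bit Python, int32 on 32-bit Python.
expected = np.dtype(np.int64 if python_is_64bit else np.int32)

# The dtype NumPy picks for small Python integers when none is given.
default = np.array([1, 2, 3]).dtype

matches_spec = default == expected
print(f"default integer dtype: {default}, expected: {expected}, match: {matches_spec}")
```

With np.int_ tied to C long, the match would be expected to fail on 64-bit Windows only.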

Who is in favor?

The difference confuses some users, who expect np.array(2**31) to have the same dtype as np.array(2**30) on all platforms. There were comments from the CuPy developers on one of the linked issues that they needed to adjust tests for Windows because of this.
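The surprise can be reproduced with a few lines. This is a sketch, not a test from any of the linked issues. The inferred dtype of the smaller value depends on the platform and NumPy version, so only the explicit-dtype workaround is guaranteed to behave the same everywhere.

```python
import numpy as np

# 2**30 fits in a 32-bit signed integer; 2**31 does not.
small = np.array(2**30)
large = np.array(2**31)

# Where the default integer is 32 bits, these two dtypes differ.
print(f"np.array(2**30).dtype = {small.dtype}")
print(f"np.array(2**31).dtype = {large.dtype}")

# Requesting the dtype explicitly avoids the platform dependence.
explicit = np.array(2**30, dtype=np.int64)
print(f"explicit dtype        = {explicit.dtype}")
```

Code that must behave identically across platforms today typically passes dtype=np.int64 explicitly, as in the last step.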

What will break?

Cython imports npy_long and uses it. We would have to check the implications of making npy_long 64 bits everywhere, and whether inconsistencies between npy_long and np.int_ would cause problems.

Why not make the change?

It would be disruptive. In the best case, SciPy users (to take a concrete example) would silently get different results on Windows when updating NumPy to 2.0. In a worse case, their programs might crash if this affects Cython.

We could develop a strategy to mitigate this with a legacy code path when import_array() is called in SciPy, which would cause all SciPy code to use the older dtype and prevent crashes, but that is really complicated and might not work.

Is it worth it?

<Discussion></Discussion>

Decision

<Discussion></Discussion>