# Numpy Tips and Tricks

## Table of contents

[TOC]

---

## Useful numpy functions

### Utility functions

- Create a linearly spaced array over some interval [np.linspace](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html), [np.arange](https://numpy.org/doc/stable/reference/generated/numpy.arange.html)
- Create a logarithmically spaced array [np.geomspace](https://numpy.org/doc/stable/reference/generated/numpy.geomspace.html)
- Create a new array shaped like another array [np.empty_like](https://numpy.org/doc/stable/reference/generated/numpy.empty_like.html)
- Create a new zero-filled array [np.zeros](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html), [np.zeros_like](https://numpy.org/doc/stable/reference/generated/numpy.zeros_like.html)
- Create a new one-filled array [np.ones](https://numpy.org/doc/stable/reference/generated/numpy.ones.html), [np.ones_like](https://numpy.org/doc/stable/reference/generated/numpy.ones_like.html)
- Calculate the difference between neighboring values in an array [np.diff](https://numpy.org/doc/stable/reference/generated/numpy.diff.html)

---

### Random numbers

- Generate a set of uniform random numbers in half-open interval \[0.0, 1.0\) [np.random.random](https://numpy.org/doc/stable/reference/random/generated/numpy.random.random.html)
- Generate a set of uniform random integers in half-open interval \[a, b\) [np.random.randint](https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html)
- Generate a set of random numbers from a specific kind of distribution e.g., [Gaussian/Normal](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html), [Poisson](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.poisson.html), [Power](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.power.html)
- Randomly shuffle an array
[np.random.shuffle](https://numpy.org/doc/stable/reference/random/generated/numpy.random.shuffle.html)/[np.random.permutation](https://numpy.org/doc/stable/reference/random/generated/numpy.random.permutation.html)
- Choose (n)-random elements or (n)-random indices [np.random.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html)

The entire random number reference is [here](https://numpy.org/doc/stable/reference/random/index.html)

**Note** The random number interface has changed, and the recommended way is now to create an rng instance:

```python
import numpy as np

rng = np.random.default_rng()
rng.random()    # returns a single random number
rng.random(10)  # returns 10 uniform random numbers between [0.0, 1.0)
```

---

### Manipulating Arrays through indices

- Find the indices of all non-zero elements within an array, separated per dimension (suited for indexing) [np.nonzero](https://numpy.org/doc/stable/reference/generated/numpy.nonzero.html)
- Find the indices of all non-zero elements within an array, separated per element (not suited for indexing) [np.argwhere](https://numpy.org/doc/stable/reference/generated/numpy.argwhere.html)
- Select elements from two arrays based on a given condition [np.where](https://numpy.org/doc/stable/reference/generated/numpy.where.html)
- Find unique values in an array [np.unique](https://numpy.org/doc/stable/reference/generated/numpy.unique.html)
- Find indices where an array of values would have to be inserted in a sorted reference array [np.searchsorted](https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html)
- Find the sorted unique values that are common to two 1D arrays [np.intersect1d](https://numpy.org/doc/stable/reference/generated/numpy.intersect1d.html)
- Find the sorted unique values that are common to two nD arrays over an axis [e13.intersect](https://e13tools.readthedocs.io/api/e13tools.numpy.html#e13tools.numpy.intersect)
- Find the indices where the unique values occur (see example code
[here](#Finding-(n-D)-indices-of-unique-values))
- Cumulative sum over arrays, frequently for calculating percentiles [np.cumsum](https://numpy.org/doc/stable/reference/generated/numpy.cumsum.html)
- Masking numpy arrays (Mohsen, Adam, Ellert....) [np.ma](https://numpy.org/doc/stable/reference/maskedarray.generic.html)
- Sorting an array [np.sort](https://numpy.org/doc/stable/reference/generated/numpy.sort.html)
- Get the indices that would sort an array [np.argsort](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html)
- Get the indices of the minimum/maximum value(s) in an array [np.argmin](https://numpy.org/doc/stable/reference/generated/numpy.argmin.html)/[np.argmax](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html)
- nD-version of built-in ``iter`` (iterate through an nD array) [np.nditer](https://numpy.org/doc/stable/reference/generated/numpy.nditer.html)
- nD-version of built-in ``enumerate`` [np.ndenumerate](https://numpy.org/doc/stable/reference/generated/numpy.ndenumerate.html)
- nD-version of built-in ``range`` [np.ndindex](https://numpy.org/doc/stable/reference/generated/numpy.ndindex.html)
- Apply a function to 1D slices of an array along a given axis [np.apply_along_axis](https://numpy.org/doc/stable/reference/generated/numpy.apply_along_axis.html)
- Apply a function to nD slices of an array over multiple axes in given order [np.apply_over_axes](https://numpy.org/doc/stable/reference/generated/numpy.apply_over_axes.html)

---

## Example Use-cases and Codes

### Finding (n-D) indices of unique values

``np.unique`` (with ``return_index=True``) only returns the index of the first occurrence of each unique value. But you might want to know **all** occurrences of every unique value.
Here is a code snippet ([taken from here](https://stackoverflow.com/a/54736464)):

```python
import numpy as np


def ndix_unique(x):
    """
    Returns an N-dimensional array of indices
    of the unique values in x

    Parameters
    ----------
    x: np.array
       Array with arbitrary dimensions

    Returns
    -------
    - 1D-array of sorted unique values
    - Array of arrays. Each array contains the indices where a
      given value in x is found
    """
    # Sort the flattened array so that equal values become contiguous
    x_flat = x.ravel()
    ix_flat = np.argsort(x_flat)
    # ix_u marks where each run of equal values starts in the sorted array
    u, ix_u = np.unique(x_flat[ix_flat], return_index=True)
    # Convert the flat indices back to n-D indices (one row per element)
    ix_ndim = np.unravel_index(ix_flat, x.shape)
    ix_ndim = np.c_[ix_ndim] if x.ndim > 1 else ix_flat
    # Split the sorted indices into one group per unique value
    return u, np.split(ix_ndim, ix_u[1:])
```

---

### Fast histograms

[fast-histogram](https://github.com/astrofrog/fast-histogram) is a faster version of `np.histogram` or `np.unique`.

This is a pretty easy replacement for `np.histogram`. The main difference is that it does not return the `edges` (the edges of the bins). Instead, you have to calculate them yourself (which is computationally cheap). This leads to a massive speed up in `fast-histogram`.

---

### 'Along axis' vs. 'Over axis'

In NumPy, you frequently encounter functions that take an `axis` input argument, which specifies that a specific operation must be applied either 'along' or 'over' an axis in the given array. It is however never clear (to me at least) what exactly that means, so here is a short description of both (even NumPy messes it up sometimes, for example in the documentation of [np.sum](https://numpy.org/doc/stable/reference/generated/numpy.sum.html)):

- 'Along' means that you take 1D slices along the specified axis, i.e., all dimensions/axes are iterated over except the specified axis. So, if you apply an operation along the first axis of a 3D array, it gets applied to the slices ``[:, 0, 0], [:, 0, 1], ..., [:, 0, N], [:, 1, 0], [:, 1, 1], ..., [:, M, N]``.
- 'Over' means that you take (n-1)D slices over the specified axis, i.e., only the specified axis is iterated over.
Using a similar example, the operation is applied to ``[0, :, :], [1, :, :], ..., [L, :, :]``.

An easy way to remember this (assuming that said operation returns a single value) is that 'along' takes 1D slices and produces an (n-1)D array, and 'over' takes (n-1)D slices and produces a 1D array.
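
The distinction between 'along' and 'over' can be checked on a small array. The following is a minimal sketch, assuming `np.sum` as the operation and a made-up array `a` of shape `(2, 3, 4)`; the 'over' case is written as an explicit loop so that it follows the definition given here, not any particular NumPy function.

```python
import numpy as np

# Hypothetical 3D example array with shape (2, 3, 4)
a = np.arange(24).reshape(2, 3, 4)

# 'Along' axis 0: the operation is applied to every 1D slice a[:, j, k],
# so the result has one entry per (j, k) pair, i.e. shape (3, 4).
along = np.apply_along_axis(np.sum, 0, a)

# 'Over' axis 0: the operation is applied to every 2D slice a[i, :, :],
# so the result has one entry per index i, i.e. shape (2,).
over = np.array([np.sum(a[i]) for i in range(a.shape[0])])

print(along.shape)  # (3, 4)
print(over.shape)   # (2,)
```

For `np.sum`, the 'along' result is the same as `a.sum(axis=0)`, while the 'over' result is the same as summing all remaining axes, `a.sum(axis=(1, 2))`.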