Crazy SciPy stunts

Here are some tricks of the Python scientific ecosystem that you may not know about yet.

The magically appearing dimension [None]

Here is a boring 1D NumPy array:

import numpy as np
x = np.arange(10)

Indexing with None adds new dimensions of length 1 on the fly.
Need x as a column vector? x[:, None]
Need x as a row vector? x[None, :]
Outer product? x[:, None] @ x[None, :]  # or np.outer(x, x)
Need x as a 5-dimensional array? x[None, None, :, None, None]

np.newaxis is defined as None. This may be slightly more readable: x[np.newaxis, :]
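To see what each of these expressions does, print the resulting shapes. The sketch below does that with standard NumPy calls only. The variable names (`col`, `row`, `five_d`, `outer`) are just illustrative.

```python
import numpy as np

x = np.arange(10)

# Indexing with None inserts a new axis of length 1 at that position.
col = x[:, None]                       # shape (10, 1)
row = x[None, :]                       # shape (1, 10)
five_d = x[None, None, :, None, None]  # shape (1, 1, 10, 1, 1)

# A (10, 1) matrix times a (1, 10) matrix gives the (10, 10) outer product.
outer = x[:, None] @ x[None, :]

print(col.shape, row.shape, five_d.shape)     # (10, 1) (1, 10) (1, 1, 10, 1, 1)
print(np.array_equal(outer, np.outer(x, x)))  # True

# np.newaxis is literally the same object as None.
print(np.newaxis is None)  # True
```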

The "yada yada yada" operator [...]

Here is the 5D NumPy array from the example above:

x = np.arange(10)[None, None, :, None, None]

Want to select some elements along the 3rd dimension?

x[:, :, 2:4, :, :]  # So many :'s :(

Use the ellipsis (...) operator to avoid having to type so many :'s!

x[:, :, 2:4, ...]
x[..., 2:4, :, :]
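A quick check that all three spellings select the same elements. The names `full`, `a` and `b` are only for illustration.

```python
import numpy as np

x = np.arange(10)[None, None, :, None, None]  # shape (1, 1, 10, 1, 1)

full = x[:, :, 2:4, :, :]  # spell out every axis
a = x[:, :, 2:4, ...]      # ... fills in the trailing axes
b = x[..., 2:4, :, :]      # ... fills in the leading axes

print(full.shape)  # (1, 1, 2, 1, 1)
print(np.array_equal(full, a) and np.array_equal(full, b))  # True

# An index may contain only one ...; it expands to as many ':' as needed.
print(x[..., 0].shape)  # (1, 1, 10, 1)
```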

Also useful when writing functions that take an array with arbitrarily many dimensions as input:

def select_along_first_dim(x, sel):
    return x[sel]  # This works fine

def select_along_last_dim(x, sel):
    return x[..., sel]  # ... to the rescue!
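Here are the same two functions applied to arrays with different numbers of dimensions. The test arrays `m` and `t` are arbitrary examples.

```python
import numpy as np

def select_along_first_dim(x, sel):
    return x[sel]  # indexing the first axis needs no help

def select_along_last_dim(x, sel):
    return x[..., sel]  # ... stands in for all the leading axes

m = np.arange(12).reshape(3, 4)     # 2D
t = np.arange(24).reshape(2, 3, 4)  # 3D

print(select_along_last_dim(m, 0))                  # [0 4 8]
print(select_along_last_dim(t, slice(1, 3)).shape)  # (2, 3, 2)
print(select_along_first_dim(t, 1).shape)           # (3, 4)
```

The same function body works for any number of leading dimensions, so there is no need to branch on `x.ndim`.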

Zip and unzip [zip(*iter)]

You may be familiar with the super useful zip function:

a = [1, 2, 3]
b = ['a', 'b', 'c']
zip(a, b)  # Returns an iterator
# (1, 'a'), (2, 'b'), (3, 'c')

Did you know that you can feed the result back into zip to achieve the opposite?

a = [1, 2, 3]
b = ['a', 'b', 'c']
iter1 = zip(a, b)  # This is the result from the previous example
zip(*iter1)  # Feed it right back into zip!
# (1, 2, 3), ('a', 'b', 'c')
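A small runnable version of the round trip. One detail to watch is that the unzipped results come back as tuples, not lists. Also, a zip iterator can only be consumed once, so this sketch stores the pairs in a list (`pairs` is just an illustrative name) before reusing them.

```python
a = [1, 2, 3]
b = ['a', 'b', 'c']

pairs = list(zip(a, b))       # [(1, 'a'), (2, 'b'), (3, 'c')]
unzipped = list(zip(*pairs))  # same as zip((1, 'a'), (2, 'b'), (3, 'c'))
print(unzipped)               # [(1, 2, 3), ('a', 'b', 'c')]

# Tuple unpacking recovers the two sequences (as tuples, not lists).
a2, b2 = zip(*pairs)
print(list(a2) == a and list(b2) == b)  # True
```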

Here is a common use case for it: selecting specific cells from a matrix.

x = np.arange(28).reshape(4, 7)
# array([[ 0,  1,  2,  3,  4,  5,  6],
#        [ 7,  8,  9, 10, 11, 12, 13],
#        [14, 15, 16, 17, 18, 19, 20],
#        [21, 22, 23, 24, 25, 26, 27]])
sel = [(1, 2), (3, 5), (2, 6)]  # Desired cells in the matrix

# This doesn't work:
# x[sel]

# But this does:
x[tuple(zip(*sel))]
# array([ 9, 26, 20])
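The trick works because `zip(*sel)` turns a list of (row, column) pairs into one tuple of row indices and one tuple of column indices, and that is the form NumPy's advanced indexing expects. The sketch below unpacks those two tuples explicitly. It also shows an alternative that goes through a NumPy array instead of `zip`. The names `rows`, `cols`, `idx` and `picked` are illustrative.

```python
import numpy as np

x = np.arange(28).reshape(4, 7)
sel = [(1, 2), (3, 5), (2, 6)]  # (row, column) pairs

rows, cols = zip(*sel)          # rows=(1, 3, 2), cols=(2, 5, 6)
picked = x[rows, cols]          # advanced indexing returns a copy
print(picked)                   # [ 9 26 20]

# Alternative: transpose an (N, 2) index array into a 2-tuple of index arrays.
idx = np.array(sel)             # shape (3, 2)
print(x[tuple(idx.T)])          # [ 9 26 20]

# The same index also works on the left-hand side of an assignment.
x[rows, cols] = -1
```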

Save the dimensions! [keepdims=True]

NumPy broadcasting behavior is truly epic. Here is how to remove the column-wise mean from a matrix:

x = np.arange(28).reshape(4, 7)
# array([[ 0,  1,  2,  3,  4,  5,  6],
#        [ 7,  8,  9, 10, 11, 12, 13],
#        [14, 15, 16, 17, 18, 19, 20],
#        [21, 22, 23, 24, 25, 26, 27]])
x - x.mean(axis=0)
# array([[-10.5, -10.5, -10.5, -10.5, -10.5, -10.5, -10.5],
#        [ -3.5,  -3.5,  -3.5,  -3.5,  -3.5,  -3.5,  -3.5],
#        [  3.5,   3.5,   3.5,   3.5,   3.5,   3.5,   3.5],
#        [ 10.5,  10.5,  10.5,  10.5,  10.5,  10.5,  10.5]])
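This works because NumPy compares shapes starting from the last dimension. The (7,) vector of column means lines up with the 7 columns of the (4, 7) matrix and is repeated for every row. A quick sanity check, with illustrative names `col_means` and `centered`:

```python
import numpy as np

x = np.arange(28).reshape(4, 7)
col_means = x.mean(axis=0)  # shape (7,): one mean per column
centered = x - col_means    # (4, 7) - (7,) broadcasts across the rows

print(col_means.shape)                        # (7,)
print(np.allclose(centered.mean(axis=0), 0))  # True: every column now averages to 0
```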

So clean and readable! How do we remove the row-wise mean?

x - x.mean(axis=1) # ValueError: operands could not be broadcast together with shapes (4,7) (4,)

The default broadcasting behavior doesn't work for us here: NumPy aligns shapes starting from the last dimension, so the 4 row means get matched against the 7 columns. One solution is to use the [None] indexing trick discussed above. Broadcasting becomes a lot more intuitive when you make sure that both arrays have the same number of dimensions. Any dimension with a length of 1 is then stretched to match the other array:

x - x.mean(axis=1)[:, None]
# array([[-3., -2., -1.,  0.,  1.,  2.,  3.],
#        [-3., -2., -1.,  0.,  1.,  2.,  3.],
#        [-3., -2., -1.,  0.,  1.,  2.,  3.],
#        [-3., -2., -1.,  0.,  1.,  2.,  3.]])
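The sketch below puts the failing and the working version side by side and prints the shapes involved. The names `row_means`, `centered` and `caught` are illustrative.

```python
import numpy as np

x = np.arange(28).reshape(4, 7)
row_means = x.mean(axis=1)       # shape (4,): does not line up with (4, 7)
print(row_means[:, None].shape)  # (4, 1): the length-1 axis is stretched to 7

centered = x - row_means[:, None]
print(np.allclose(centered.mean(axis=1), 0))  # True

caught = False
try:
    x - row_means                # trailing dimensions 7 and 4 clash
except ValueError as err:
    caught = True
    print("ValueError:", err)
```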

Recognizing this, many NumPy functions have a keepdims parameter. When it is set to True, any dimensions that would normally be reduced away are kept with length 1 instead. This makes broadcasting super easy:

x - x.mean(axis=0, keepdims=True)  # Remove column-wise mean
x - x.mean(axis=1, keepdims=True)  # Remove row-wise mean
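A short check of what keepdims=True does to the shapes, and that it matches the [:, None] version from before. The row-standardization line is an extra illustration: `np.std` also accepts keepdims, so the same pattern carries over to other reductions.

```python
import numpy as np

x = np.arange(28).reshape(4, 7)

print(x.mean(axis=0, keepdims=True).shape)  # (1, 7)
print(x.mean(axis=1, keepdims=True).shape)  # (4, 1)

# keepdims=True gives the same result as re-inserting the axis with None.
same = np.array_equal(x.mean(axis=1, keepdims=True), x.mean(axis=1)[:, None])
print(same)  # True

# Other reductions accept keepdims too, e.g. standardizing each row:
z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
print(np.allclose(z.std(axis=1), 1))  # True
```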