Crazy SciPy stunts

Here are some tricks of the Python scientific ecosystem that you may not know about yet.

The magic appearing dimension [None]

Here is a boring 1D NumPy array:

import numpy as np
x = np.arange(10)

Indexing with None adds new dimensions of length 1 on the fly.
Need x as a column vector? x[:, None]
Need x as a row vector? x[None, :]
Outer product? x[:, None] @ x[None, :]  # or np.outer(x, x)
Need x as a 5-dimensional array? x[None, None, :, None, None]

np.newaxis is defined as None. This may be slightly more readable: x[np.newaxis, :]
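To see what each of these indexing expressions actually produces, it can help to print the shapes. The snippet below is a small sketch; the variable names col, row, five_d and outer are made up for illustration.

```python
import numpy as np

x = np.arange(10)

# Each None (or np.newaxis) inserts a new axis of length 1 at that position.
col = x[:, None]
row = x[None, :]
five_d = x[None, None, :, None, None]

print(x.shape)       # (10,)
print(col.shape)     # (10, 1)
print(row.shape)     # (1, 10)
print(five_d.shape)  # (1, 1, 10, 1, 1)

# np.newaxis is the very same object as None.
print(np.newaxis is None)  # True

# The matrix product of the column and the row matches np.outer.
outer = col @ row
print(np.array_equal(outer, np.outer(x, x)))  # True
```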

The "yada yada yada" operator [...]

Here is the 5D NumPy array from the example above:


x = np.arange(10)[None, None, :, None, None]

Want to select some elements along the 3rd dimension?

x[:, :, 2:4, :, :]  # So many :'s :(

Use the ellipsis (...) operator to avoid having to type so many :'s!


x[:, :, 2:4, ...]
x[..., 2:4, :, :]
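As a quick sanity check, here is a sketch confirming that all three spellings select the same elements. The names explicit, trailing and leading are only for illustration.

```python
import numpy as np

x = np.arange(10)[None, None, :, None, None]  # shape (1, 1, 10, 1, 1)

explicit = x[:, :, 2:4, :, :]  # every dimension spelled out
trailing = x[:, :, 2:4, ...]   # ... fills in the trailing dimensions
leading = x[..., 2:4, :, :]    # ... fills in the leading dimensions

print(explicit.shape)                      # (1, 1, 2, 1, 1)
print(np.array_equal(explicit, trailing))  # True
print(np.array_equal(explicit, leading))   # True
print(explicit.ravel())                    # [2 3]
```

Keep in mind that NumPy accepts at most one ... per index expression, since with two of them it could not tell how many dimensions each one should cover.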

This is also useful when writing functions that take an array with an arbitrary number of dimensions as input:





def select_along_first_dim(x, sel):
    return x[sel]  # This works fine
    
def select_along_last_dim(x, sel):
    return x[..., sel]  # ... to the rescue!
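Here is a short usage sketch for select_along_last_dim, applied to inputs with different numbers of dimensions. The test arrays a2 and a4 are made up for the example.

```python
import numpy as np

def select_along_last_dim(x, sel):
    # The ellipsis stands in for "all the dimensions before the last one".
    return x[..., sel]

a2 = np.arange(12).reshape(3, 4)         # 2D input
a4 = np.arange(120).reshape(2, 3, 4, 5)  # 4D input

# An integer index drops the last dimension...
print(select_along_last_dim(a2, 0).shape)  # (3,)
print(select_along_last_dim(a4, 0).shape)  # (2, 3, 4)

# ...while a slice keeps it.
print(select_along_last_dim(a4, slice(1, 3)).shape)  # (2, 3, 4, 2)
```

The same function body works regardless of how many dimensions come before the last one, which is the whole point of the trick.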

Zip and unzip [zip(*iter)]

You may be familiar with the super useful zip function:




a = [1, 2, 3]
b = ['a', 'b', 'c']
zip(a, b)  # Returns an iterator
# (1, 'a'), (2, 'b'), (3, 'c')

Did you know that you can feed the result back into zip to achieve the opposite?





a = [1, 2, 3]
b = ['a', 'b', 'c']
iter1 = zip(a, b)  # This is the result from the previous example
zip(*iter1)  # Feed it right back into zip!
# (1, 2, 3), ('a', 'b', 'c')
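Since zip is lazy, nothing is visible until the iterator is consumed. The sketch below materializes the results so the round trip can be inspected. The names pairs, a2 and b2 are illustrative.

```python
a = [1, 2, 3]
b = ['a', 'b', 'c']

# Wrap zip in list() to actually see the pairs.
pairs = list(zip(a, b))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# Unpacking the pairs into zip transposes them back again.
# Note that the results come back as tuples, not lists.
a2, b2 = zip(*pairs)
print(a2)  # (1, 2, 3)
print(b2)  # ('a', 'b', 'c')

# A zip object is a one-shot iterator: once consumed, it is empty.
it = zip(a, b)
first = list(it)
second = list(it)
print(second)  # []
```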

Here is a common use case for it: selecting specific cells from a matrix.














x = np.arange(28).reshape(4, 7)
# array([[ 0,  1,  2,  3,  4,  5,  6],
#        [ 7,  8,  9, 10, 11, 12, 13],
#        [14, 15, 16, 17, 18, 19, 20],
#        [21, 22, 23, 24, 25, 26, 27]])

sel = [(1, 2), (3, 5), (2, 6)]  # Desired cells in the matrix

# This doesn't work (NumPy does not read a list of (row, column) pairs as cell coordinates):
# x[sel]

# But this does:
x[tuple(zip(*sel))]
# array([ 9, 26, 20])
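To make it clearer why this works, here is a sketch that splits the unzip step from the indexing step. NumPy wants one sequence of row indices and one sequence of column indices, which is exactly what zip(*sel) delivers. The names rows, cols, picked, idx and also_picked are illustrative.

```python
import numpy as np

x = np.arange(28).reshape(4, 7)
sel = [(1, 2), (3, 5), (2, 6)]  # (row, column) pairs

# Unzip the pairs into all the rows and all the columns.
rows, cols = zip(*sel)
print(rows)  # (1, 3, 2)
print(cols)  # (2, 5, 6)

# Indexing with one sequence per dimension picks out the individual cells.
picked = x[rows, cols]  # same as x[tuple(zip(*sel))]
print(picked)  # [ 9 26 20]

# An alternative spelling: turn the pairs into an array and split its columns.
idx = np.array(sel)
also_picked = x[idx[:, 0], idx[:, 1]]
print(np.array_equal(picked, also_picked))  # True
```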

Save the dimensions! [keepdims=True]

NumPy broadcasting behavior is truly epic. Here is how to remove the column-wise mean from a matrix:











x = np.arange(28).reshape(4, 7)
# array([[ 0,  1,  2,  3,  4,  5,  6],
#        [ 7,  8,  9, 10, 11, 12, 13],
#        [14, 15, 16, 17, 18, 19, 20],
#        [21, 22, 23, 24, 25, 26, 27]])

x - x.mean(axis=0)
# array([[-10.5, -10.5, -10.5, -10.5, -10.5, -10.5, -10.5],
#        [ -3.5,  -3.5,  -3.5,  -3.5,  -3.5,  -3.5,  -3.5],
#        [  3.5,   3.5,   3.5,   3.5,   3.5,   3.5,   3.5],
#        [ 10.5,  10.5,  10.5,  10.5,  10.5,  10.5,  10.5]])

So clean and readable! How about removing the row-wise mean?


x - x.mean(axis=1)
# ValueError: operands could not be broadcast together with shapes (4,7) (4,)

The default broadcasting behavior doesn't work for us here: NumPy lines up the shapes starting from the last dimension, so the 4 row means get matched against the 7 columns. One solution is to use the [None] indexing trick discussed above. Broadcasting becomes a lot more intuitive when you make sure that both arrays have the same number of dimensions. Any dimension of length 1 is then broadcast to match the other array:





x - x.mean(axis=1)[:, None]
# array([[-3., -2., -1.,  0.,  1.,  2.,  3.],
#        [-3., -2., -1.,  0.,  1.,  2.,  3.],
#        [-3., -2., -1.,  0.,  1.,  2.,  3.],
#        [-3., -2., -1.,  0.,  1.,  2.,  3.]])
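The shapes involved tell the whole story. The following sketch prints them and walks through how NumPy compares them from the last dimension backwards. The names col_mean, row_mean and centered are illustrative.

```python
import numpy as np

x = np.arange(28).reshape(4, 7)

col_mean = x.mean(axis=0)
row_mean = x.mean(axis=1)
print(col_mean.shape)  # (7,)
print(row_mean.shape)  # (4,)

# Shapes are compared starting from the last dimension:
#   (4, 7) vs (7,)   -> 7 matches 7, fine
#   (4, 7) vs (4,)   -> 7 does not match 4, error
#   (4, 7) vs (4, 1) -> 1 is stretched to 7, and 4 matches 4, fine
try:
    x - row_mean
except ValueError as err:
    print("broadcast failed:", err)

# Adding a trailing axis of length 1 makes the row means line up with the rows.
centered = x - row_mean[:, None]
print(row_mean[:, None].shape)  # (4, 1)
print(centered.mean(axis=1))    # [0. 0. 0. 0.]
```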

Recognizing this, many NumPy functions have a keepdims parameter. When set to True, any dimensions that would normally be removed are kept with length 1 instead. This makes broadcasting super easy:


x - x.mean(axis=0, keepdims=True)  # Remove column-wise mean
x - x.mean(axis=1, keepdims=True)  # Remove row-wise mean
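A quick look at the shapes shows what keepdims=True changes. The sketch below also wraps the pattern in a small helper that works along any axis. Note that remove_mean is a hypothetical function written for this example, not something NumPy provides.

```python
import numpy as np

x = np.arange(28).reshape(4, 7)

print(x.mean(axis=1).shape)                 # (4,)
print(x.mean(axis=1, keepdims=True).shape)  # (4, 1)
print(x.mean(axis=0, keepdims=True).shape)  # (1, 7)

# keepdims=True gives the same result as the [:, None] trick.
row_centered = x - x.mean(axis=1, keepdims=True)
print(np.array_equal(row_centered, x - x.mean(axis=1)[:, None]))  # True

def remove_mean(x, axis):
    # Hypothetical helper: subtract the mean along any axis.
    # keepdims=True keeps the reduced axis with length 1, so it broadcasts.
    return x - x.mean(axis=axis, keepdims=True)

y = np.arange(24).reshape(2, 3, 4)
print(remove_mean(y, axis=1).shape)  # (2, 3, 4)
```

The nice part of the helper is that it needs no special cases: whichever axis gets reduced, it stays in place with length 1 and broadcasting does the rest.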
