# Crazy SciPy stunts

Here are some tricks of the Python scientific ecosystem that you may not know about yet.

## The magic appearing dimension [None]

Here is a boring 1D NumPy array:

```python
import numpy as np

x = np.arange(10)
```

Indexing with `None` adds new dimensions of length 1 on the fly.

Need `x` as a column vector? `x[:, None]`

Need `x` as a row vector? `x[None, :]`

Outer product? `x[:, None] @ x[None, :]  # or np.outer(x, x)`

Need `x` as a 5-dimensional array? `x[None, None, :, None, None]`

`np.newaxis` is defined as `None`. This may be slightly more readable: `x[np.newaxis, :]`

## The "yada yada yada" operator [...]

Here is the 5D NumPy array from the example above:

```python
x = np.arange(10)[None, None, :, None, None]
```

Want to select some elements along the 3rd dimension?

```python
x[:, :, 2:4, :, :]  # So many :'s :(
```

Use an ellipsis (`...`) to avoid having to type so many `:`'s!

```python
x[:, :, 2:4, ...]
x[..., 2:4, :, :]
```

This is also useful when writing functions that take an array with arbitrarily many dimensions as input:

```python
def select_along_first_dim(x, sel):
    return x[sel]  # This works fine

def select_along_last_dim(x, sel):
    return x[..., sel]  # ... to the rescue!
```

## Zip and unzip [zip(*iter)]

You may be familiar with the super useful `zip` function:

```python
a = [1, 2, 3]
b = ['a', 'b', 'c']
zip(a, b)  # Returns an iterator
# (1, 'a'), (2, 'b'), (3, 'c')
```

Did you know that you can feed the result back into `zip` to achieve the opposite?

```python
a = [1, 2, 3]
b = ['a', 'b', 'c']
iter1 = zip(a, b)  # This is the result from the previous example
zip(*iter1)  # Feed it right back into zip!
# (1, 2, 3), ('a', 'b', 'c')
```

Here is a common use case for it: selecting specific cells from a matrix.
```python
x = np.arange(28).reshape(4, 7)
# array([[ 0,  1,  2,  3,  4,  5,  6],
#        [ 7,  8,  9, 10, 11, 12, 13],
#        [14, 15, 16, 17, 18, 19, 20],
#        [21, 22, 23, 24, 25, 26, 27]])

sel = [(1, 2), (3, 5), (2, 6)]  # Desired (row, column) cells in the matrix

# This doesn't work (the list is not read as (row, column) pairs):
# x[sel]

# But this does:
x[tuple(zip(*sel))]
# array([ 9, 26, 20])
```

## Save the dimensions! [keepdims=True]

NumPy broadcasting behavior is truly epic. Here is how to remove the column-wise mean from a matrix:

```python
x = np.arange(28).reshape(4, 7)
# array([[ 0,  1,  2,  3,  4,  5,  6],
#        [ 7,  8,  9, 10, 11, 12, 13],
#        [14, 15, 16, 17, 18, 19, 20],
#        [21, 22, 23, 24, 25, 26, 27]])

x - x.mean(axis=0)
# array([[-10.5, -10.5, -10.5, -10.5, -10.5, -10.5, -10.5],
#        [ -3.5,  -3.5,  -3.5,  -3.5,  -3.5,  -3.5,  -3.5],
#        [  3.5,   3.5,   3.5,   3.5,   3.5,   3.5,   3.5],
#        [ 10.5,  10.5,  10.5,  10.5,  10.5,  10.5,  10.5]])
```

So clean and readable! How to remove the row-wise mean?

```python
x - x.mean(axis=1)
# ValueError: operands could not be broadcast together with shapes (4,7) (4,)
```

The default broadcasting behavior doesn't work for us here: NumPy aligns shapes starting from the last dimension, so the 4 row means get matched against the 7 columns. One solution is to use the `[None]` indexing trick discussed above. Broadcasting behavior becomes a lot more intuitive when you make sure that both arrays have the same number of dimensions. Any dimension of length 1 then gets broadcast:

```python
x - x.mean(axis=1)[:, None]
# array([[-3., -2., -1.,  0.,  1.,  2.,  3.],
#        [-3., -2., -1.,  0.,  1.,  2.,  3.],
#        [-3., -2., -1.,  0.,  1.,  2.,  3.],
#        [-3., -2., -1.,  0.,  1.,  2.,  3.]])
```

Recognizing this, many NumPy functions have a `keepdims` parameter. When set to `True`, any dimensions that would normally be removed are kept with length 1 instead. This makes broadcasting super easy:

```python
x - x.mean(axis=0, keepdims=True)  # Remove column-wise mean
x - x.mean(axis=1, keepdims=True)  # Remove row-wise mean
```
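
Because `keepdims=True` leaves the reduced axis in place, the same expression should work for any axis of an array with any number of dimensions. Here is a minimal sketch of that idea; the helper name `remove_mean` is made up for this example and is not a NumPy function:

```python
import numpy as np


def remove_mean(x, axis):
    """Subtract the mean along `axis` (hypothetical helper, for illustration only)."""
    # keepdims=True keeps the reduced axis with length 1, so the mean
    # broadcasts against x regardless of which axis was reduced.
    return x - x.mean(axis=axis, keepdims=True)


x = np.arange(24).reshape(2, 3, 4)

# The reduced axis survives with length 1
print(x.mean(axis=1, keepdims=True).shape)  # (2, 1, 4)

# The same helper works along every axis of the 3D array
for axis in range(x.ndim):
    centered = remove_mean(x, axis)
    print(axis, centered.shape, np.allclose(centered.mean(axis=axis), 0))
```

Without `keepdims`, the helper would need a different `[None]` pattern for each axis, which is exactly the bookkeeping this parameter saves you.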