# NumPy Debugging Reference

## NumPy Functions

### `np.bincount(x, weights=None, minlength=0)`

Counts occurrences of each value in an array of non-negative ints. **x must be 1D, non-negative integers.**

```python
>>> np.bincount([0, 1, 1, 3, 2, 1])
array([1, 3, 1, 1])  # counts: [0 appears 1x, 1 appears 3x, 2 appears 1x, 3 appears 1x]

>>> np.bincount([0, 1, 1], minlength=5)
array([1, 2, 0, 0, 0])  # force a minimum output length

>>> np.bincount([0, 1, 2], weights=[0.5, 1.0, 0.5])
array([0.5, 1. , 0.5])  # weighted counts
```

**Common bugs:**
- Forgetting `minlength` → output shorter than expected
- Passing negative values → ValueError
- Passing floats → TypeError

---

### `np.nonzero(a)`

Returns a **tuple of arrays**, one per dimension, containing the indices of the non-zero elements.

```python
>>> arr = np.array([0, 2, 0, 3])
>>> np.nonzero(arr)
(array([1, 3]),)  # tuple! indices where arr != 0

>>> np.nonzero(arr)[0]  # unwrap for 1D
array([1, 3])

>>> mat = np.array([[0, 1], [2, 0]])
>>> np.nonzero(mat)
(array([0, 1]), array([1, 0]))  # (row_indices, col_indices)
```

**Common bugs:**
- Using the result directly without `[0]` for 1D arrays
- Expecting flat indices instead of a tuple of coordinates

---

### `np.where(condition, [x, y])`

Two forms:
1. `np.where(condition)` → same as `np.nonzero(condition)` (returns tuple of indices)
2. `np.where(condition, x, y)` → element-wise: x where True, y where False

```python
>>> arr = np.array([1, -2, 3, -4])

# Form 1: indices only (returns tuple!)
>>> np.where(arr > 0)
(array([0, 2]),)

# Form 2: conditional replacement
>>> np.where(arr > 0, arr, 0)
array([1, 0, 3, 0])

>>> np.where(arr > 0, 'pos', 'neg')
array(['pos', 'neg', 'pos', 'neg'], dtype='<U3')
```

**Common bugs:**
- Using the 1-arg form expecting an array, getting a tuple
- Confusing with `arr[condition]`, which returns values, not indices

---

### `np.sum(a, axis=None, keepdims=False)`

```python
>>> arr = np.array([[1, 2], [3, 4]])

>>> np.sum(arr)  # all elements
10

>>> np.sum(arr, axis=0)  # sum down each column (collapse rows)
array([4, 6])

>>> np.sum(arr, axis=1)  # sum across each row (collapse columns)
array([3, 7])

>>> np.sum(arr, axis=1, keepdims=True)  # preserve dims for broadcasting
array([[3],
       [7]])
```

**Common bugs:**
- Wrong axis (0=down columns, 1=across rows)
- Missing `keepdims=True` when the result needs to broadcast back

---

### `np.min(a, axis=None)` / `np.max(a, axis=None)`

Same axis semantics as `np.sum`. Both also accept `keepdims`.

```python
>>> arr = np.array([[1, 5], [3, 2]])

>>> np.min(arr, axis=0)  # min of each column
array([1, 2])

>>> np.max(arr, axis=1)  # max of each row
array([5, 3])
```

---

### `np.random.RandomState(seed)`

Reproducible random number generator. Each call advances the generator state, so the outputs below hold only for the first call after seeding.

```python
>>> rng = np.random.RandomState(42)
>>> rng.rand(3)  # uniform [0, 1), shape (3,)
array([0.37454012, 0.95071431, 0.73199394])

>>> np.random.RandomState(42).randint(0, 10, 5)  # integers in [0, 10), shape (5,); freshly seeded
array([6, 3, 7, 4, 6])

>>> rng.randn(2, 3)  # standard normal, shape (2, 3)
>>> rng.choice([1,2,3], size=2, replace=False)  # sample without replacement
>>> rng.shuffle(arr)  # in-place shuffle (returns None!)
>>> rng.permutation(arr)  # returns shuffled copy
```

**Common bugs:**
- Using the `rng.shuffle()` return value (it's None)
- `randint(a, b)` is [a, b), exclusive of b
- Confusing `rand` (uniform) vs `randn` (normal)

---

### Broadcasting Rules

Shapes are compared right-to-left, one dimension at a time. Two dimensions are compatible if they are equal or if one of them is 1; a missing leading dimension is treated as 1.
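The right-to-left rule can be sketched by hand and cross-checked against NumPy. This is a minimal illustration, not how NumPy implements broadcasting; `broadcast_shape` is a hypothetical helper name introduced here, not a NumPy function.

```python
from itertools import zip_longest

import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape, or None if incompatible."""
    result = []
    # Walk both shapes from the last dimension; a missing dimension counts as 1.
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            return None  # neither equal nor 1: the shapes do not broadcast
    return tuple(reversed(result))


print(broadcast_shape((4, 3), (3,)))    # (4, 3)
print(broadcast_shape((4, 3), (4,)))    # None
print(broadcast_shape((4, 3), (4, 1)))  # (4, 3)

# Cross-check one case against what NumPy actually produces.
assert (np.zeros((4, 3)) + np.zeros((4, 1))).shape == broadcast_shape((4, 3), (4, 1))
```

When debugging a shape error, printing both operand shapes and walking them from the right in this way usually shows which dimension is the mismatch.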
```python
# (4, 3) + (3,)   → (4, 3)  ✓   (3,) broadcasts to (1, 3) then (4, 3)
# (4, 3) + (4,)   → error   ✗   3 != 4
# (4, 3) + (4, 1) → (4, 3)  ✓   1 broadcasts to 3

# Common pattern: make (n,) broadcastable to (n, d)
weights = np.array([1, 2, 3])  # shape (3,)
weights[:, None]               # shape (3, 1) - broadcasts with (3, d)
weights.reshape(-1, 1)         # equivalent
```

---

### Floating Point Comparisons

```python
# BAD
if np.sum(vec**2) == 1.0: ...

# GOOD
if np.isclose(np.sum(vec**2), 1.0): ...

# For arrays
np.allclose(arr1, arr2, rtol=1e-5, atol=1e-8)
```

---

## Python Patterns

### NamedTuple

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
p.x       # 1
p[0]      # 1
p.x = 5   # AttributeError! Immutable

# To "modify":
p2 = p._replace(x=5)  # Point(x=5, y=2)
p3 = Point(5, p.y)    # equivalent
```

### List Comprehension Order

```python
# Nested comprehension: the rightmost `for` is the outer loop; an inner list is built for each i
[[i*j for j in range(3)] for i in range(2)]  # [[0,0,0], [0,1,2]]

# Flattening: `for` clauses read left to right, same order as regular nested for loops
[x for row in matrix for x in row]  # row is outer loop
```

### Mutable Default Arguments

```python
# BAD
def __init__(self, items=[]):
    self.items = items  # shared across all instances that use the default!

# GOOD
def __init__(self, items=None):
    self.items = items if items is not None else []
```

---

## pdb Debugger

### Starting the Debugger

```python
# Insert at point where you want to break
import pdb; pdb.set_trace()

# Python 3.7+ shorthand
breakpoint()
```

### Essential Commands

| Command | Short | Description |
|---------|-------|-------------|
| `next` | `n` | Execute next line (step over functions) |
| `step` | `s` | Step into function call |
| `continue` | `c` | Continue execution until next breakpoint |
| `return` | `r` | Continue until current function returns |
| `list` | `l` | Show source code around current line |
| `list .` | | Re-center listing on current line |
| `p expr` | | Print expression value (`print(expr)` also works, as a plain Python call) |
| `pp expr` | | Pretty-print expression |
| `where` | `w` | Show call stack |
| `up` | `u` | Move up one frame in stack (toward the caller) |
| `down` | `d` | Move down one frame in stack |
| `quit` | `q` | Quit debugger (and program) |

### Practical Example

```python
def buggy_normalize(arr):
    import pdb; pdb.set_trace()
    total = np.sum(arr)  # after hitting n: check `p total`, `p arr.shape`
    return arr / total

# In pdb:
# (Pdb) p arr.shape
# (3, 4)
# (Pdb) p total
# 24.0
# (Pdb) p np.sum(arr, axis=1, keepdims=True)
# array([[6.], [6.], [12.]])  # aha, need axis parameter!
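
# A possible fix, sketched under the assumption that the intent is for each
# row to sum to 1. `normalize_rows` is a hypothetical name, not part of the
# original example.
import numpy as np

def normalize_rows(arr):
    row_totals = np.sum(arr, axis=1, keepdims=True)  # shape (n, 1)
    return arr / row_totals                          # (n, d) / (n, 1) broadcasts per row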
```

### Tips

- Type any Python expression to evaluate it
- Use the `!` prefix if a statement conflicts with a pdb command: `!n = 5`
- `interact` drops into a full Python shell at the current frame
- Set conditional breakpoints in code: `if condition: pdb.set_trace()`

---

## unittest Assertions

```python
import unittest

import numpy as np

class TestFoo(unittest.TestCase):
    def test_examples(self):  # assertions must be called inside a test method
        # Equality
        self.assertEqual(actual, expected)  # actual == expected
        self.assertNotEqual(a, b)

        # Truthiness
        self.assertTrue(x)
        self.assertFalse(x)
        self.assertIsNone(x)
        self.assertIsNotNone(x)

        # Identity & Type
        self.assertIs(a, b)  # a is b
        self.assertIsInstance(obj, cls)

        # Containers
        self.assertIn(item, container)  # item in container
        self.assertCountEqual(a, b)     # same elements, any order

        # Numeric
        self.assertAlmostEqual(a, b, places=7)  # round(a-b, places) == 0
        self.assertGreater(a, b)  # also: GreaterEqual, Less, LessEqual

        # Exceptions
        with self.assertRaises(ValueError):
            some_function()

        # NumPy arrays (use numpy.testing instead!)
        np.testing.assert_array_equal(actual, expected)
        np.testing.assert_array_almost_equal(actual, expected, decimal=6)
        np.testing.assert_allclose(actual, expected, rtol=1e-7, atol=0)
```

### Reading Test Failures

```
AssertionError: Lists differ: [1, 2, 3] != [1, 2, 4]

First differing element 2:
3
4
```

The message prints the two arguments in call order. With the `assertEqual(actual, expected)` convention used above, the first value is what your code produced.

---

## Quick Debugging Checklist

1. **Read the error message**: line number and exception type
2. **Check shapes**: `print(arr.shape)` liberally
3. **Check types**: `print(type(x))`, especially for tuple vs array
4. **Check values**: edge cases such as empty arrays, zeros, negatives
5. **Simplify**: test the function with minimal input
6. **Compare against spec**: re-read the docstring and test expectations
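
The shape, type, and value checks in the checklist can be bundled into one small helper. The sketch below is only one way to do it; `describe` is a hypothetical name introduced here, and the summary format is an arbitrary choice.

```python
import numpy as np


def describe(name, value):
    """Hypothetical helper: one-line summary of type, shape, dtype, and value range."""
    parts = [f"{name}: type={type(value).__name__}"]
    if isinstance(value, np.ndarray):
        parts.append(f"shape={value.shape}")
        parts.append(f"dtype={value.dtype}")
        # Guard against empty arrays, where min() and max() would raise.
        if value.size > 0 and np.issubdtype(value.dtype, np.number):
            parts.append(f"min={value.min()} max={value.max()}")
    elif isinstance(value, tuple):
        parts.append(f"len={len(value)}")
    return " ".join(parts)


arr = np.array([0, 2, 0, 3])
print(describe("arr", arr))
print(describe("idx", np.nonzero(arr)))  # reveals the tuple returned by np.nonzero
```

Calling it on an intermediate result such as `np.nonzero(arr)` makes the tuple-versus-array mix-up visible before it causes an indexing error further down.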