<style> .reveal { font-size: 24px; } .reveal section img { background:none; border:none; box-shadow:none; } pre.graphviz { box-shadow: none; } </style> # Segment Trees *(AMH section 12.4)* Performance Engineering, Lecture 6 Sergey Slotin May 14, 2022 --- Segment trees can do lots of different things, but we focus on *dynamic prefix sum*: ```cpp void add(int k, int x); // react to a[k] += x (zero-based indexing) int sum(int k); // return the sum of the first k elements (from 0 to k - 1) ``` ---- - Few prefix sum queries? Store it as it is and calculate the sum directly - Few update queries? Store its prefix sums and re-calculate them on each update Segment trees: $O(\log n)$ for both <!-- .element: class="fragment" data-fragment-index="1" --> ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-path.png =500x) --- ### V1: pointer-based ---- ```cpp struct segtree { int lb, rb; // the range this node is responsible for int s = 0; // the sum of elements [lb, rb) segtree *l = nullptr, *r = nullptr; // pointers to its children segtree(int lb, int rb) : lb(lb), rb(rb) { if (lb + 1 < rb) { // if the node is not a leaf, create children int m = (lb + rb) / 2; l = new segtree(lb, m); r = new segtree(m, rb); } } void add(int k, int x) { /* react to a[k] += x */ } int sum(int k) { /* compute the sum of the first k elements */ } }; ``` ---- ```cpp void add(int k, int x) { s += x; if (l != nullptr) { // check whether it is a leaf node if (k < l->rb) l->add(k, x); else r->add(k, x); } } ``` ---- ```cpp int sum(int lq, int rq) { if (rb <= lq && rb <= rq) // if we're fully inside the query, return the sum return s; if (rq <= lb || lq >= rb) // if we don't intersect with the query, return zero return 0; return l->sum(lq, rq) + r->sum(lq, rq); } ``` ---- ```cpp int sum(int k) { if (rb <= k) return s; if (lb >= k) return 0; return l->sum(k) + r->sum(k); } ``` ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-pointers.svg =500x) Dominated by pointer chasing --- ### V2: implicit ---- - the node $2k$ is the left child corresponding to the range $[l, \lfloor \frac{l+r}{2} \rfloor)$ - the node $(2k+1)$ is the right child corresponding to the range $[\lfloor \frac{l+r}{2} \rfloor, r)$ (the Eytzinger layout) ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-layout.png =500x) ---- If $n$ is not a power of two, we need up to 2x more memory: ```cpp int t[4 * N]; // contains the node sums ``` ---- ```cpp void add(int k, int x, int v = 1, int l = 0, int r = N) { t[v] += x; if (l + 1 < r) { int m = (l + r) / 2; if (k < m) add(k, x, 2 * v, l, m); else add(k, x, 2 * v + 1, m, r); } } ``` ---- ```cpp int sum(int k, int v = 1, int l = 0, int r = N) { if (l >= k) return 0; if (r <= k) return t[v]; int m = (l + r) / 2; return sum(k, 2 * v, l, m) + sum(k, 2 * v + 1, m, r); } ``` ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-topdown.svg =500x) --- ### V2.1: Iterative ---- ```cpp void add(int k, int x) { int v = 1, l = 0, r = N; while (l + 1 < r) { t[v] += x; v <<= 1; int m = (l + r) >> 1; if (k < m) r = m; else l = m, v++; } t[v] += x; } ``` ---- ```cpp int sum(int k) { int v = 1, l = 0, r = N, s = 0; while (true) { int m = (l + r) >> 1; v <<= 1; if (k >= m) { s += t[v++]; if (k == m) break; l = m; } else { r = m; } } return s; } ``` ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-iterative.svg =500x) --- ## V3: Bottom-up ---- Define parent of node $k$ to be $\lfloor k / 2 \rfloor$ ---- Last layer if packed densely, so we need exactly $2n$ cells: ```cpp int t[2 * N]; ``` ---- ```cpp void add(int k, int x) { k += N; while (k != 0) { t[k] += x; k >>= 1; } } ``` ---- ```cpp int sum(int l, int r) { l += N; r += N - 1; int s = 0; while (l <= r) { if ( l & 1) s += t[l++]; // l is a right child: add it and move to a cousin if (~r & 1) s += t[r--]; // r is a light child: add it and move to a cousin l >>= 1, r >>= 1; } return s; } ``` ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-permuted.png =500x) Works even when $n$ is not a power of two! ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-bottomup.svg =500x) --- ### V3.1: Branchless ---- ```cpp int sum(int k) { int s = 0; k += N - 1; while (k != 0) { if (~k & 1) // if k is a right child s += t[k--]; k = k >> 1; } return s; } ``` $n$ needs to be a power of two, otherwise we wrap around ---- ```cpp const int last_layer = 1 << __lg(2 * N - 1); // calculate the index of the leaf k int leaf(int k) { k += last_layer; k -= (k >= 2 * N) * N; return k; } ``` ---- ```cpp void add(int k, int x) { k = leaf(k); while (k != 0) { t[k] += x; k >>= 1; } } int sum(int k) { k = leaf(k - 1); int s = 0; while (k != 0) { //if (~k & 1) // s += t[k--]; s += (~k & 1) ? t[k] : 0; // will be replaced with a cmov k >>= 1; } return s; } ``` ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-branchless.svg =500x) --- ## V4: Fenwick tree ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-succinct.png =500x) Remove redundant nodes (right-children) ---- ![](https://en.algorithmica.org/hpc/data-structures/img/fenwick-sum.png =500x) ---- ```cpp int lowbit(int x) { return x & -x; } ``` ``` +90 = 64 + 16 + 8 + 2 = (0)10110 -90 = 00000 - 10110 = (1)01010 → (+90) & (-90) = (0)00010 ``` <!-- .element: class="fragment" data-fragment-index="1" --> ---- ```cpp int sum(int k) { int s = 0; for (; k != 0; k -= lowbit(k)) s += t[k]; return s; } ``` ---- ![](https://en.algorithmica.org/hpc/data-structures/img/fenwick-update.png =500x) ---- ```cpp void add(int k, int x) { for (k += 1; k <= N; k += lowbit(k)) t[k] += x; } ``` ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-fenwick.svg =500x) Why does it grow so fast? *Cache associativity* <!-- .element: class="fragment" data-fragment-index="1" --> --- ### V4.1: Fenwick with holes ---- ```cpp inline constexpr int hole(int k) { return k + (k >> 10); } int t[hole(N) + 1]; void add(int k, int x) { for (k += 1; k <= N; k += k & -k) t[hole(k)] += x; } int sum(int k) { int res = 0; for (; k != 0; k &= k - 1) res += t[hole(k)]; return res; } ``` ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-fenwick-holes.svg =500x) --- ## V5 (final): Wide segment trees ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-wide.png =500x) ---- ```cpp const int b = 4, B = (1 << b); // cache line size (in integers, not bytes) // the height of the tree over an n-element array constexpr int height(int n) { return (n <= B ? 1 : height(n / B) + 1); } // where the h-th layer starts constexpr int offset(int h) { int s = 0, n = N; while (h--) { s += (n + B - 1) / B * B; n /= B; } return s; } constexpr int H = height(N); alignas(64) int t[offset(H)]; // an array for storing nodes ``` ---- What is stored inside nodes? 1. We could store $B$ *sums* in each node: fast update, slow query (mask-hsum) <!-- .element: class="fragment" data-fragment-index="1" --> 2. We could store $B$ *prefix sums* in each node: okay update (mask-add), fast query <!-- .element: class="fragment" data-fragment-index="2" --> ---- ```cpp int sum(int k) { int s = 0; for (int h = 0; h < H; h++) s += t[offset(h) + (k >> (h * b))]; return s; } ``` ---- ```cpp struct Precalc { alignas(64) int mask[B][B]; constexpr Precalc() : mask{} { for (int k = 0; k < B; k++) for (int i = 0; i < B; i++) mask[k][i] = (i > k ? -1 : 0); } }; constexpr Precalc T; ``` ---- ```cpp typedef int vec __attribute__ (( vector_size(32) )); constexpr int round(int k) { return k & ~(B - 1); // = k / B * B } void add(int k, int x) { vec v = x + vec{}; for (int h = 0; h < H; h++) { auto a = (vec*) &t[offset(h) + round(k)]; auto m = (vec*) T.mask[k % B]; for (int i = 0; i < B / 8; i++) a[i] += v & m[i]; k >>= b; } } ``` ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-simd.svg =500x) ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-simd-others.svg =500x) ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-popular.svg =500x) ---- ![](https://en.algorithmica.org/hpc/data-structures/img/segtree-popular-relative.svg =500x) --- ### Modifications - General range queries? <!-- .element: class="fragment" data-fragment-index="1" --> - Other data types? <!-- .element: class="fragment" data-fragment-index="2" --> - Non-reversible operations? <!-- .element: class="fragment" data-fragment-index="3" --> - Lazy propagation? <!-- .element: class="fragment" data-fragment-index="4" -->
{"metaMigratedAt":"2023-06-17T00:49:24.562Z","metaMigratedFrom":"YAML","title":"Segment Trees","breaks":true,"slideOptions":"{\"theme\":\"white\"}","contributors":"[{\"id\":\"045b9308-fa89-4b5e-a7f0-d8e7d0b849fd\",\"add\":14195,\"del\":4469}]"}
    214 views