<style>
.reveal {
font-size: 24px;
}
.reveal section img {
background:none;
border:none;
box-shadow:none;
}
pre.graphviz {
box-shadow: none;
}
</style>
# Segment Trees
*(AMH section 12.4)*
Performance Engineering, Lecture 6
Sergey Slotin
May 14, 2022
---
Segment trees can do lots of different things, but we focus on *dynamic prefix sum*:
```cpp
void add(int k, int x); // react to a[k] += x (zero-based indexing)
int sum(int k); // return the sum of the first k elements (from 0 to k - 1)
```
----
- Few prefix sum queries? Store it as it is and calculate the sum directly
- Few update queries? Store its prefix sums and re-calculate them on each update
Segment trees: $O(\log n)$ for both <!-- .element: class="fragment" data-fragment-index="1" -->
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-path.png =500x)
---
### V1: pointer-based
----
```cpp
struct segtree {
int lb, rb; // the range this node is responsible for
int s = 0; // the sum of elements [lb, rb)
segtree *l = nullptr, *r = nullptr; // pointers to its children
segtree(int lb, int rb) : lb(lb), rb(rb) {
if (lb + 1 < rb) { // if the node is not a leaf, create children
int m = (lb + rb) / 2;
l = new segtree(lb, m);
r = new segtree(m, rb);
}
}
void add(int k, int x) { /* react to a[k] += x */ }
int sum(int k) { /* compute the sum of the first k elements */ }
};
```
----
```cpp
void add(int k, int x) {
s += x;
if (l != nullptr) { // check whether it is a leaf node
if (k < l->rb)
l->add(k, x);
else
r->add(k, x);
}
}
```
----
```cpp
int sum(int lq, int rq) {
if (rb <= lq && rb <= rq) // if we're fully inside the query, return the sum
return s;
if (rq <= lb || lq >= rb) // if we don't intersect with the query, return zero
return 0;
return l->sum(lq, rq) + r->sum(lq, rq);
}
```
----
```cpp
int sum(int k) {
if (rb <= k)
return s;
if (lb >= k)
return 0;
return l->sum(k) + r->sum(k);
}
```
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-pointers.svg =500x)
Dominated by pointer chasing
---
### V2: implicit
----
- the node $2k$ is the left child corresponding to the range $[l, \lfloor \frac{l+r}{2} \rfloor)$
- the node $(2k+1)$ is the right child corresponding to the range $[\lfloor \frac{l+r}{2} \rfloor, r)$
(the Eytzinger layout)
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-layout.png =500x)
----
If $n$ is not a power of two, we need up to 2x more memory:
```cpp
int t[4 * N]; // contains the node sums
```
----
```cpp
void add(int k, int x, int v = 1, int l = 0, int r = N) {
t[v] += x;
if (l + 1 < r) {
int m = (l + r) / 2;
if (k < m)
add(k, x, 2 * v, l, m);
else
add(k, x, 2 * v + 1, m, r);
}
}
```
----
```cpp
int sum(int k, int v = 1, int l = 0, int r = N) {
if (l >= k)
return 0;
if (r <= k)
return t[v];
int m = (l + r) / 2;
return sum(k, 2 * v, l, m)
+ sum(k, 2 * v + 1, m, r);
}
```
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-topdown.svg =500x)
---
### V2.1: Iterative
----
```cpp
void add(int k, int x) {
int v = 1, l = 0, r = N;
while (l + 1 < r) {
t[v] += x;
v <<= 1;
int m = (l + r) >> 1;
if (k < m)
r = m;
else
l = m, v++;
}
t[v] += x;
}
```
----
```cpp
int sum(int k) {
int v = 1, l = 0, r = N, s = 0;
while (true) {
int m = (l + r) >> 1;
v <<= 1;
if (k >= m) {
s += t[v++];
if (k == m)
break;
l = m;
} else {
r = m;
}
}
return s;
}
```
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-iterative.svg =500x)
---
## V3: Bottom-up
----
Define parent of node $k$ to be $\lfloor k / 2 \rfloor$
----
Last layer if packed densely, so we need exactly $2n$ cells:
```cpp
int t[2 * N];
```
----
```cpp
void add(int k, int x) {
k += N;
while (k != 0) {
t[k] += x;
k >>= 1;
}
}
```
----
```cpp
int sum(int l, int r) {
l += N;
r += N - 1;
int s = 0;
while (l <= r) {
if ( l & 1) s += t[l++]; // l is a right child: add it and move to a cousin
if (~r & 1) s += t[r--]; // r is a light child: add it and move to a cousin
l >>= 1, r >>= 1;
}
return s;
}
```
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-permuted.png =500x)
Works even when $n$ is not a power of two!
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-bottomup.svg =500x)
---
### V3.1: Branchless
----
```cpp
int sum(int k) {
int s = 0;
k += N - 1;
while (k != 0) {
if (~k & 1) // if k is a right child
s += t[k--];
k = k >> 1;
}
return s;
}
```
$n$ needs to be a power of two, otherwise we wrap around
----
```cpp
const int last_layer = 1 << __lg(2 * N - 1);
// calculate the index of the leaf k
int leaf(int k) {
k += last_layer;
k -= (k >= 2 * N) * N;
return k;
}
```
----
```cpp
void add(int k, int x) {
k = leaf(k);
while (k != 0) {
t[k] += x;
k >>= 1;
}
}
int sum(int k) {
k = leaf(k - 1);
int s = 0;
while (k != 0) {
//if (~k & 1)
// s += t[k--];
s += (~k & 1) ? t[k] : 0; // will be replaced with a cmov
k >>= 1;
}
return s;
}
```
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-branchless.svg =500x)
---
## V4: Fenwick tree
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-succinct.png =500x)
Remove redundant nodes (right-children)
----
![](https://en.algorithmica.org/hpc/data-structures/img/fenwick-sum.png =500x)
----
```cpp
int lowbit(int x) {
return x & -x;
}
```
```
+90 = 64 + 16 + 8 + 2 = (0)10110
-90 = 00000 - 10110 = (1)01010
→ (+90) & (-90) = (0)00010
```
<!-- .element: class="fragment" data-fragment-index="1" -->
----
```cpp
int sum(int k) {
int s = 0;
for (; k != 0; k -= lowbit(k))
s += t[k];
return s;
}
```
----
![](https://en.algorithmica.org/hpc/data-structures/img/fenwick-update.png =500x)
----
```cpp
void add(int k, int x) {
for (k += 1; k <= N; k += lowbit(k))
t[k] += x;
}
```
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-fenwick.svg =500x)
Why does it grow so fast? *Cache associativity* <!-- .element: class="fragment" data-fragment-index="1" -->
---
### V4.1: Fenwick with holes
----
```cpp
inline constexpr int hole(int k) {
return k + (k >> 10);
}
int t[hole(N) + 1];
void add(int k, int x) {
for (k += 1; k <= N; k += k & -k)
t[hole(k)] += x;
}
int sum(int k) {
int res = 0;
for (; k != 0; k &= k - 1)
res += t[hole(k)];
return res;
}
```
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-fenwick-holes.svg =500x)
---
## V5 (final): Wide segment trees
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-wide.png =500x)
----
```cpp
const int b = 4, B = (1 << b); // cache line size (in integers, not bytes)
// the height of the tree over an n-element array
constexpr int height(int n) {
return (n <= B ? 1 : height(n / B) + 1);
}
// where the h-th layer starts
constexpr int offset(int h) {
int s = 0, n = N;
while (h--) {
s += (n + B - 1) / B * B;
n /= B;
}
return s;
}
constexpr int H = height(N);
alignas(64) int t[offset(H)]; // an array for storing nodes
```
----
What is stored inside nodes?
1. We could store $B$ *sums* in each node: fast update, slow query (mask-hsum) <!-- .element: class="fragment" data-fragment-index="1" -->
2. We could store $B$ *prefix sums* in each node: okay update (mask-add), fast query <!-- .element: class="fragment" data-fragment-index="2" -->
----
```cpp
int sum(int k) {
int s = 0;
for (int h = 0; h < H; h++)
s += t[offset(h) + (k >> (h * b))];
return s;
}
```
----
```cpp
struct Precalc {
alignas(64) int mask[B][B];
constexpr Precalc() : mask{} {
for (int k = 0; k < B; k++)
for (int i = 0; i < B; i++)
mask[k][i] = (i > k ? -1 : 0);
}
};
constexpr Precalc T;
```
----
```cpp
typedef int vec __attribute__ (( vector_size(32) ));
constexpr int round(int k) {
return k & ~(B - 1); // = k / B * B
}
void add(int k, int x) {
vec v = x + vec{};
for (int h = 0; h < H; h++) {
auto a = (vec*) &t[offset(h) + round(k)];
auto m = (vec*) T.mask[k % B];
for (int i = 0; i < B / 8; i++)
a[i] += v & m[i];
k >>= b;
}
}
```
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-simd.svg =500x)
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-simd-others.svg =500x)
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-popular.svg =500x)
----
![](https://en.algorithmica.org/hpc/data-structures/img/segtree-popular-relative.svg =500x)
---
### Modifications
- General range queries? <!-- .element: class="fragment" data-fragment-index="1" -->
- Other data types? <!-- .element: class="fragment" data-fragment-index="2" -->
- Non-reversible operations? <!-- .element: class="fragment" data-fragment-index="3" -->
- Lazy propagation? <!-- .element: class="fragment" data-fragment-index="4" -->
{"metaMigratedAt":"2023-06-17T00:49:24.562Z","metaMigratedFrom":"YAML","title":"Segment Trees","breaks":"true","slideOptions":"{\"theme\":\"white\"}","contributors":"[{\"id\":\"045b9308-fa89-4b5e-a7f0-d8e7d0b849fd\",\"add\":14195,\"del\":4469}]"}