# Sqrt decomposition
This article uses a very liberal definition of sqrt decomposition: we will describe the most common situations where you end up with a complexity involving square roots. Sometimes this is the intended/best complexity, sometimes not. Regardless, knowing how to write code with a good constant factor is often required to pass the time limit.
# Blocking
## Example problem: [Point add range sum](https://judge.yosupo.jp/problem/point_add_range_sum)
Divide your array into blocks of size $K$. Each block maintains the sum of its $O(K)$ elements. Using these, we can speed up the queries: for every block completely contained within the query range, it suffices to add the block sum in $O(1)$, and there are at most $O(\frac{N}{K})$ such blocks. At the left and right ends of the query, there can be up to $K$ elements that do not form a complete block. We can handle these individually in $O(K)$. The code is as follows:
```c++
ll ans = 0;
ll i = l;
while (i < r && i % block_size != 0) // left part
{
ans += arr[i++];
}
while ((i+block_size-1)<r) // middle part
{
ans += blocks[i / block_size];
i += block_size;
}
while (i < r) // right part
{
ans += arr[i++];
}
cout << ans << "\n";
```
We can update blocks in $O(1)$:
```c++
arr[a] += b;
blocks[a / block_size] += b;
```
This gives $O(1)$ update and $O(\frac{N}{K}+K)$ query time. Selecting the best $K$, which is $K=\sqrt{N}$, we get $O(\sqrt{N})$ time per query. [Implementation](https://judge.yosupo.jp/submission/303290). This can be sped up further by autovectorizing the loops. Structures with these complexities can sometimes be used to speed up Mo's algorithm.
## $O(1)$ query, $O(\sqrt{N})$ update
In some problems, there are many queries and few updates. To get the advertised complexity, we maintain a prefix sum and a suffix sum inside each block, which lets us handle the two tails of a query in $O(1)$. Recomputing these after an update only takes $O(K)$. To handle the middle blocks, maintain a prefix sum over the block sums. Updating this takes $O(\frac{N}{K})$ time.
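In code, this layout could look roughly as follows (a minimal sketch with 0-indexed arrays and half-open queries $[l, r)$ like above; the struct and the names `pre`, `suf`, `block_prefix` are made up for illustration):
```c++
#include <vector>
#include <algorithm>
using namespace std;
typedef long long ll;

struct Blocking {
    int n, block_size;
    vector<ll> arr;
    vector<ll> pre, suf;       // pre[i]/suf[i] = sum from the start/to the end of i's block
    vector<ll> block_prefix;   // block_prefix[b] = sum of blocks 0..b-1

    Blocking(const vector<ll>& a, int bs) : n(a.size()), block_size(bs), arr(a) {
        pre.assign(n, 0); suf.assign(n, 0);
        int num_blocks = (n + block_size - 1) / block_size;
        block_prefix.assign(num_blocks + 1, 0);
        for (int b = 0; b < num_blocks; b++) rebuild_block(b);
    }

    void rebuild_block(int b) {                       // O(K + N/K)
        int lo = b * block_size, hi = min(n, lo + block_size);
        ll s = 0;
        for (int i = lo; i < hi; i++) { s += arr[i]; pre[i] = s; }
        s = 0;
        for (int i = hi - 1; i >= lo; i--) { s += arr[i]; suf[i] = s; }
        // recompute the prefix sum over whole block sums, O(N/K)
        for (int i = b; i + 1 < (int)block_prefix.size(); i++)
            block_prefix[i + 1] = block_prefix[i] + pre[min(n, (i + 1) * block_size) - 1];
    }

    void update(int a, ll b) {                        // O(K + N/K)
        arr[a] += b;
        rebuild_block(a / block_size);
    }

    ll query(int l, int r) {                          // sum of [l, r), O(1)
        if (l >= r) return 0;
        r--;                                          // make the range inclusive
        int bl = l / block_size, br = r / block_size;
        if (bl == br) return pre[r] - pre[l] + arr[l];
        return suf[l] + (block_prefix[br] - block_prefix[bl + 1]) + pre[r];
    }
};
```
With $K=\sqrt{N}$ both the $O(K)$ and $O(\frac{N}{K})$ parts of an update are $O(\sqrt{N})$, while queries stay $O(1)$.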
Problems where blocking is the easiest solution (according to me):
- [slottsmur](https://open.kattis.com/problems/slottsmur)
- [Kodama hierarchy](https://open.kattis.com/problems/kodamahierarchy)
Blocking can also be used in place of a segment tree. It has a unique (albeit niche) advantage: a block can be rebuilt from scratch instead of being merged from two children, which matters when merging two results is much more expensive than adding a single element. The following problem demonstrates this.
Suppose we have $N$ items $(v_i, c_i)$, meaning that item $i$ is worth $v_i$ dollars and weighs $c_i$ kg. We get range queries, each with a capacity $C$, and have to answer with the largest possible total value of a subset of the items in the range whose total weight is $\leq C$.
It is possible to merge two "knapsack arrays" $A$ and $B$ (where $A[w]$ is the best value achievable with total weight at most $w$) in $O(C^2)$ time using so-called max-plus convolution:
```python
a, b = ...  # two knapsack arrays of size C+1
res = [0] * (C + 1)
for i in range(C + 1):
    for j in range(C + 1 - i):  # keep i + j <= C
        res[i + j] = max(res[i + j], a[i] + b[j])
```
It is an open problem to improve this beyond $O(C^2)$ without further assumptions. This allows us to solve the problem in $O(Q \log(N) C^2)$ using a segment tree. However, we can also use blocking. We will maintain a knapsack array over each block. Let $K$ be our block size. Then, each block can be built/rebuilt in $O(KC)$ time. To answer a query, we start with an empty knapsack array, do $O(\frac{N}{K})$ max-plus convolutions for the full blocks, and add the up to $2K$ loose items at the ends one at a time in $O(KC)$. This takes $O(\frac{N}{K}C^2 + KC)$ time. The best $K$ is then $K=\sqrt{NC}$, yielding $O(C\sqrt{NC})$ per query/update. Not a great improvement, but if we are working with matrices instead, then the tradeoff is between $O(C^3 \log(N))$ and $O(C\sqrt{NC})$. This can be used for fargaharet (linked below).
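A rough sketch of the blocked structure (the names `knap`, `rebuild_block`, `merge_knap` etc. are made up; `knap[b][w]` is the best value using items of block $b$ with total weight at most $w$). Since the arrays are non-decreasing in $w$, the exact-index convolution above also covers combinations of total weight $< w$:
```c++
#include <vector>
#include <algorithm>
using namespace std;
typedef long long ll;

int N, C, K;                       // item count, capacity bound, block size
vector<ll> v;                      // values
vector<int> c;                     // weights, c[i] <= C
vector<vector<ll>> knap;           // knap[b][w]: one knapsack array per block

void rebuild_block(int b) {        // O(K * C)
    vector<ll>& dp = knap[b];
    dp.assign(C + 1, 0);
    int lo = b * K, hi = min(N, lo + K);
    for (int i = lo; i < hi; i++)
        for (int w = C; w >= c[i]; w--)
            dp[w] = max(dp[w], dp[w - c[i]] + v[i]);
}

void build() {                     // O(N * C)
    knap.assign((N + K - 1) / K, {});
    for (int b = 0; b < (int)knap.size(); b++) rebuild_block(b);
}

// Max-plus convolution of two knapsack arrays, O(C^2).
vector<ll> merge_knap(const vector<ll>& a, const vector<ll>& b) {
    vector<ll> res(C + 1, 0);
    for (int i = 0; i <= C; i++)
        for (int j = 0; i + j <= C; j++)
            res[i + j] = max(res[i + j], a[i] + b[j]);
    return res;
}

// Answer for the full blocks first_block..last_block, O((N/K) * C^2).
// The up-to-2K loose items at the ends of the query range are added one by
// one into `cur`, exactly like in rebuild_block, for another O(K * C).
ll query_full_blocks(int first_block, int last_block, int capacity) {
    vector<ll> cur(C + 1, 0);
    for (int b = first_block; b <= last_block; b++) cur = merge_knap(cur, knap[b]);
    return cur[capacity];
}
```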
Problems:
- https://open.kattis.com/problems/fargaharet
- https://open.kattis.com/problems/japaneselottery
Note: most of the time, the best $K$ is not exactly $\sqrt{N}$. Choose the $K$ that pushes most of the work onto the faster side, measured in constant factor. Heuristically, the faster side is usually the one with the simplest code, as that is most likely to have a good constant factor.
# Batching
Batching is a general way of making a static data structure handle updates: buffer the updates and rebuild once the buffer gets large. Let's use it to solve [point add range sum](https://judge.yosupo.jp/problem/point_add_range_sum) again.
We maintain a full prefix sum array, plus a queue of pending updates of the form "add $amount$ to index $i$". To answer a query, we first use the prefix sum, then apply every pending update that falls within the query interval. Whenever there are more than $K$ pending updates, we rebuild the prefix sum. Queries take $O(K)$, and the rebuilds take $O(N\cdot \frac{Q}{K})$ in total. [Implementation](https://judge.yosupo.jp/submission/303296).
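In code, the structure could look roughly like this (a sketch with 0-indexed arrays and half-open queries $[l, r)$; `pending` and the threshold `K` are illustrative names):
```c++
#include <vector>
using namespace std;
typedef long long ll;

vector<ll> arr, prefix;                 // prefix[i] = arr[0] + ... + arr[i-1]
vector<pair<int, ll>> pending;          // buffered updates: (index, amount)
int K;                                  // rebuild threshold

void rebuild() {                        // O(N), happens at most Q/K times
    prefix.assign(arr.size() + 1, 0);
    for (int i = 0; i < (int)arr.size(); i++) prefix[i + 1] = prefix[i] + arr[i];
    pending.clear();
}

void update(int i, ll amount) {
    arr[i] += amount;
    pending.push_back({i, amount});
    if ((int)pending.size() > K) rebuild();
}

ll query(int l, int r) {                // O(K)
    ll ans = prefix[r] - prefix[l];
    for (auto& [idx, amount] : pending) // patch in the buffered updates
        if (l <= idx && idx < r) ans += amount;
    return ans;
}
```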
The same batching idea can be used to maintain a dynamic convex hull trick in $O(N\sqrt{Q})$. The "pending buffer" stores raw lines. When querying, try every line in the buffer; for the main set, which is kept sorted by slope, we can binary search for the best line. When the buffer grows too large, merge it into the main set merge-sort style. With a buffer size limit of $K$, every rebuild costs $O(K\log K+B)$, where $B$ is the current size of the main set, and a query costs $O(K+\log B)$. The nice part is that the log factors end up dominated, so they do not show up in the final bound.
## Problems
- [Forced online queries](https://codeforces.com/problemset/problem/1217/F)
# Two solutions
You have a graph with $V,E \leq 10^5$ vertices and edges. You get $Q \leq 10^5$ queries of the form:
is there a path from node $a$ to $b$ using only nodes in the set $\{u_1,u_2, \dots, u_K\}$? The set is given for each query. Sum of all $K$ is bounded by $10^5$.
Solution: if $K$ is big, we can do a DFS over the whole graph, stepping only on allowed nodes, in $O(M)$ time, where $M = V + E$. If $K$ is small, we can build the induced graph on the $K$ allowed nodes by checking all pairs against a hash set of edges, in $O(K^2)$ time. With a cutoff around $K \approx M^{1/3}$, at most $\frac{\sum K}{M^{1/3}}$ queries run the expensive DFS and the rest cost $O(M^{2/3})$ each, which works out to amortized $O(M^{\frac{2}{3}})$ per query (using that $\sum K$ and $Q$ are both at most $10^5$). Faster algorithms exist for this task, but the best I am aware of still use sqrt-like ideas.
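A sketch of both branches (assuming the edges are stored both as adjacency lists and in a hash set of keys $u \cdot V + v$ for both directions; `mark`, `edge_set` and the cutoff `T`, roughly $M^{1/3}$ by the analysis above, are illustrative and best tuned by benchmarking):
```c++
#include <vector>
#include <unordered_set>
using namespace std;
typedef long long ll;

int V;                                   // number of nodes
vector<vector<int>> adj;                 // adjacency list
unordered_set<ll> edge_set;              // contains u * (ll)V + v for every edge, both directions
vector<int> mark;                        // mark[u] = id of the latest query that allowed u (size V)
int query_id = 0;

bool reachable(int a, int b, const vector<int>& allowed, int T) {
    int K = allowed.size();
    query_id++;
    for (int u : allowed) mark[u] = query_id;
    if (mark[a] != query_id || mark[b] != query_id) return false;

    if (K > T) {
        // Big K: DFS over the original graph, stepping only on allowed nodes. O(V + E).
        vector<char> vis(V, 0);
        vector<int> stk = {a};
        vis[a] = 1;
        while (!stk.empty()) {
            int u = stk.back(); stk.pop_back();
            if (u == b) return true;
            for (int w : adj[u])
                if (mark[w] == query_id && !vis[w]) { vis[w] = 1; stk.push_back(w); }
        }
        return false;
    }

    // Small K: build the induced graph on the allowed nodes by testing all pairs
    // against edge_set, and track connectivity with a small DSU. O(K^2).
    vector<int> dsu(K);
    for (int i = 0; i < K; i++) dsu[i] = i;
    auto find = [&](int x) {
        while (dsu[x] != x) { dsu[x] = dsu[dsu[x]]; x = dsu[x]; }
        return x;
    };
    for (int i = 0; i < K; i++)
        for (int j = i + 1; j < K; j++)
            if (edge_set.count((ll)allowed[i] * V + allowed[j]))
                dsu[find(i)] = find(j);
    int ia = -1, ib = -1;
    for (int i = 0; i < K; i++) {
        if (allowed[i] == a) ia = i;
        if (allowed[i] == b) ib = i;
    }
    return find(ia) == find(ib);
}
```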
## Problems
- IOI 2015 teams
- [Dream luck](https://open.kattis.com/problems/dreamluck)
- [Horse Race (hard)](https://open.kattis.com/problems/horseracehard)
# Small/large
- Counting triangles
- [V (mod)](https://open.kattis.com/problems/v)
- https://po2punkt0.kattis.com/problems/pocamp20.lassmeden
# Mo's algorithm
### [Guessing camels](https://open.kattis.com/problems/camels)
### Range MEX queries
The point of Mo's algorithm here is the asymmetry: you move the two pointers about $N\sqrt{N}$ times in total, but only query your data structure $Q$ times, so the per-query work (like computing a MEX) can be much more expensive than the per-move work. A generic skeleton is given after the subtask list below.
- Subtask 1: $N,Q \leq 1000$
- Subtask 2: $N,Q \leq 10^5$
- Subtask 3: $N,Q \leq 3 \cdot 10^5$
- Subtask 4: $N \leq 3 \cdot 10^5, Q \leq 10^6$
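The generic skeleton referenced above could look like this (a sketch; queries are half-open $[l, r)$ with ids $0..Q-1$, and as a stand-in for the MEX structure the window just counts distinct non-negative values):
```c++
#include <vector>
#include <algorithm>
#include <cmath>
using namespace std;

struct Query { int l, r, id; };                  // half-open [l, r)

vector<int> mo(const vector<int>& arr, vector<Query> qs) {
    int n = arr.size();
    int block = max(1, (int)(n / sqrt((double)qs.size() + 1)));
    sort(qs.begin(), qs.end(), [&](const Query& a, const Query& b) {
        if (a.l / block != b.l / block) return a.l / block < b.l / block;
        return a.r < b.r;
    });

    vector<int> cnt(*max_element(arr.begin(), arr.end()) + 1, 0);
    int distinct = 0;                            // the maintained structure (problem-specific)
    auto add = [&](int i) { if (cnt[arr[i]]++ == 0) distinct++; };
    auto rem = [&](int i) { if (--cnt[arr[i]] == 0) distinct--; };

    vector<int> ans(qs.size());
    int L = 0, R = 0;                            // current window is [L, R)
    for (const Query& q : qs) {
        while (R < q.r) add(R++);                // the pointers move ~N*sqrt(N) times in total...
        while (L > q.l) add(--L);
        while (R > q.r) rem(--R);
        while (L < q.l) rem(L++);
        ans[q.id] = distinct;                    // ...but the structure is only queried Q times
    }
    return ans;
}
```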
Variants to think about:
- Online
- 3D
- Online *and* 3D?
- Insert-only in non-amortized time
# Product
If $a \cdot b \leq x$, then $\min(a,b) \leq \sqrt{x}$. This means that if you can solve a problem in $O(a \cdot b \cdot \min(a,b))$, then it runs in $O(x \sqrt{x})$. This holds for grids, for example: a grid with at most $x$ cells has $\min(\text{rows}, \text{columns}) \leq \sqrt{x}$.
# Few unique
Given positive integers with sum $K$, there can be at most $\approx 1.4\sqrt{K}$ distinct values among them.
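This is because $m$ distinct positive integers sum to at least $1 + 2 + \dots + m$:
$$\frac{m(m+1)}{2} \leq K \implies m \leq \frac{\sqrt{8K+1}-1}{2} \approx \sqrt{2K} \approx 1.4\sqrt{K}.$$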
## Problems:
- [Subset sum 2](https://judge.progolymp.se/contests/subsetsum/problems/subsetsum2)
Same idea, applied to string lengths. Given $N$ strings of total length $K$, there are at most $\approx 1.4\sqrt{K}$ distinct *lengths*. Group strings by length and use hashing.
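For example, "which patterns occur in a text" could be sketched like this (illustrative names; a single polynomial hash is used for brevity, so a real solution would want double hashing or verification of the matches):
```c++
#include <string>
#include <vector>
#include <unordered_map>
using namespace std;
typedef unsigned long long ull;

const ull B = 131;                               // hash base

vector<char> find_occurring(const string& text, const vector<string>& patterns) {
    // Group pattern indices by length: only ~1.4*sqrt(total length) distinct lengths.
    unordered_map<int, vector<int>> by_len;
    for (int i = 0; i < (int)patterns.size(); i++) by_len[(int)patterns[i].size()].push_back(i);

    vector<char> found(patterns.size(), 0);
    for (auto& [len, ids] : by_len) {
        if (len == 0 || len > (int)text.size()) continue;
        unordered_map<ull, vector<int>> pat_hash;        // hash -> pattern indices
        for (int id : ids) {
            ull h = 0;
            for (char ch : patterns[id]) h = h * B + ch;
            pat_hash[h].push_back(id);
        }
        // Roll a hash over every length-`len` window of the text: O(|text|) per length.
        ull h = 0, Bpow = 1;                             // Bpow ends up as B^(len-1)
        for (int i = 0; i < len; i++) { h = h * B + text[i]; if (i) Bpow *= B; }
        for (int i = 0;; i++) {
            auto it = pat_hash.find(h);
            if (it != pat_hash.end())
                for (int id : it->second) found[id] = 1;
            if (i + len >= (int)text.size()) break;
            h = (h - (ull)text[i] * Bpow) * B + text[i + len];
        }
    }
    return found;
}
```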
## Problems:
- [Proteinsyntes](https://open.kattis.com/problems/proteinsyntes)
- [String Multimatching](https://open.kattis.com/problems/stringmultimatching)
# Random walk
Sample problem: given a set of $N$ numbers $x_i$, with $-k \leq x_i \leq k$, does there exist a non-empty subset with sum $0$? $N, k \leq 10^4$. A naive solution is the DP
$$DP[i][s] = \text{can the sum } s \text{ be attained after considering the first } i \text{ items?}$$
We have $i \leq N$ and $|s| \leq N \cdot k$, so it runs in $O(N^2k)$, which is too slow. We can pretty easily speed this up using bitsets, but it is still too slow. To optimize further, let's analyze the worst case. The reason we need to keep $|s|$ up to $Nk$ is that some solutions will first add items going up to a sum of $O(Nk)$, and only then follow with negative items going back down to $0$. How do we avoid this? First, realize that the order in which we add items to the DP doesn't matter, so we can shuffle them. Now the worst case won't happen: the negative and positive numbers will be spread evenly. How far do we expect to have to deviate from $0$? This is basically a random walk with step sizes bounded by $k$. It can be shown that the expected furthest deviation is $O(\sqrt{N}k)$, so we only need to consider sums of that magnitude in our DP. This gives $O(\frac{N\sqrt{N}k}{64})$, which is fast enough. [Further reading](https://codeforces.com/blog/entry/50036).
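A sketch of the shuffled bitset DP (the constant `BOUND` is a heuristic safety margin above $\sqrt{N}k = 10^6$ for these limits, not an exact bound):
```c++
#include <bitset>
#include <vector>
#include <algorithm>
#include <random>
using namespace std;

const int BOUND = 2'000'000;                  // sums are only tracked in [-BOUND, BOUND]
bitset<2 * BOUND + 1> reach;                  // reach[s + BOUND] = "some non-empty subset sums to s"

bool zero_subset_exists(vector<int> x) {
    mt19937 rng(1234567);
    shuffle(x.begin(), x.end(), rng);         // the crucial step: randomize the order
    reach.reset();
    for (int v : x) {
        // extend every attainable sum by v; sums falling outside [-BOUND, BOUND] are
        // simply dropped, which is fine with high probability by the random walk argument
        if (v >= 0) reach |= reach << v;
        else        reach |= reach >> (-v);
        reach[BOUND + v] = 1;                 // the singleton subset {v}
        if (reach[BOUND]) return true;        // sum 0 reached by a non-empty subset
    }
    return false;
}
```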
Problem: [Topside VS Zaun](https://open.kattis.com/problems/topsidevszaun)
# Birthday paradox/meet in the middle
## Meet in the middle
Suppose you're given the following problem: given $N \leq 40$ integers $x_i$ ($1 \leq x_i \leq 10^9$), is there a subset with sum exactly equal to $K$? Solution: split the items into two lists A and B of approximately equal size. For each list, generate all $2^{\frac{N}{2}}$ subset sums and check whether $K$ appears among them. This covers the case where the subset summing to $K$ is completely contained in one of the lists, but we also need the case where part of it is in A and the rest in B. To do this, iterate over every sum $s$ attainable in A and check whether $K-s$ is attainable in B. A hashmap can be used for this.
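A sketch (note that keeping the empty subset, i.e. the sum $0$, in both lists automatically covers the "fully contained in A or B" cases):
```c++
#include <vector>
#include <unordered_set>
using namespace std;
typedef long long ll;

// All 2^|v| subset sums of a list (including the empty subset).
vector<ll> all_subset_sums(const vector<ll>& v) {
    vector<ll> sums = {0};
    for (ll x : v) {
        int sz = sums.size();
        for (int i = 0; i < sz; i++) sums.push_back(sums[i] + x);
    }
    return sums;
}

bool subset_with_sum(const vector<ll>& x, ll K) {
    vector<ll> A(x.begin(), x.begin() + x.size() / 2);
    vector<ll> B(x.begin() + x.size() / 2, x.end());
    vector<ll> sa = all_subset_sums(A), sb = all_subset_sums(B);
    unordered_set<ll> in_b(sb.begin(), sb.end());
    for (ll s : sa)                       // pick the A-part, look up the B-part
        if (in_b.count(K - s)) return true;
    return false;
}
```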
Problems:
- https://open.kattis.com/problems/celebritysplit
- https://open.kattis.com/problems/countcircuits
- https://open.kattis.com/problems/maxloot
- https://open.kattis.com/problems/indoorienteering
- https://open.kattis.com/problems/jigsawpresent
## Birthday paradox
You are given $N \leq 2 \cdot 10^5$ integers. Find a subset with sum divisible by $d$ ($1 \leq d \leq 10^9$) that uses exactly half of all the numbers. $d$ and all the numbers are randomized. Let's do something similar to the previous problem: build up a set of around $10^5$ random subsets. Whenever you add a new random subset, check whether it can be combined with any previous one into the desired answer. To quickly sample random subsets of size $N/2$, start with a random valid one, then repeatedly remove one item and add another. This sampling is not perfect, but good enough.
Problems:
- https://open.kattis.com/problems/knapsack101 (this was the example problem; you need to handle $N \leq 64$ separately).
- https://open.kattis.com/problems/highwaycombinatorics