# Problems discussed
- [Integer Division](https://open.kattis.com/problems/integerdivision) ||map||
- [Union Find](https://open.kattis.com/problems/unionfind) ||union-find||
- [Supercomputer](https://open.kattis.com/problems/supercomputer) ||segment tree||
- [Rust](https://open.kattis.com/problems/rust) ||2D prefix sum||
- [Almost union-find](https://open.kattis.com/problems/almostunionfind) ||smaller to larger merging||
- [Dragon maid](https://open.kattis.com/problems/dragonmaid) ||offline sorting+priority queue||
- [Bar Shelf](https://open.kattis.com/problems/barshelf) ||segment tree over large interval||
- [An Easy Array Problem](https://open.kattis.com/problems/aneasyarray) ||modified segment tree||
# [Integer division](https://open.kattis.com/problems/integerdivision)
First off, divide every integer by $d$ (using integer division). Now, you want to count the number of pairs of integers with the same value. If there are $K$ integers with some value, then there are ${K \choose 2}=\frac{K(K-1)}{2}$ unique pairs with that value. So now we only need to be able to count the number of integers with each value, which can be done with a dictionary (a `map` in C++).
```python
from collections import defaultdict

n, d = map(int, input().split())
# Count how many integers share each value after integer division by d
occs = defaultdict(int)
for i in map(int, input().split()):
    occs[i // d] += 1
# Each group of K equal values contributes K*(K-1)/2 pairs
ans = 0
for occ in occs.values():
    ans += occ * (occ - 1) // 2
print(ans)
```
# [Union find](https://open.kattis.com/problems/unionfind)
In this problem, you should simply implement the basic form of the union-find data structure.
The explanation by CP-algorithms is probably better than ours. https://cp-algorithms.com/data_structures/disjoint_set_union.html
An alternate explanation by us is provided at page 129 here: http://www.mattekollo.se/wp-content/uploads/2024/08/Programmeringskollo_hafte_2024.pdf. It also describes a nice application of union-find: finding a minimum spanning tree.
An implementation in C++ can be found [here](https://gist.github.com/Matistjati/d86b84c2b7f411cf783f60332f2ff3e4). We're too busy with exams to provide Python solutions for most of these problems.
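To complement the links above, here is a minimal Python sketch of union-find with path compression and union by size (an illustration of the idea, not a port of the linked code):
```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # parent[i] == i means i is a root
        self.size = [1] * n

    def find(self, x):
        # Walk up to the root, halving the path along the way
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return False  # already in the same set
        if self.size[a] < self.size[b]:
            a, b = b, a   # attach the smaller tree below the larger
        self.parent[b] = a
        self.size[a] += self.size[b]
        return True

uf = UnionFind(4)
uf.union(0, 1)
print(uf.find(0) == uf.find(1))  # True
print(uf.find(2) == uf.find(3))  # False
```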
# [Supercomputer](https://open.kattis.com/problems/supercomputer)
For this problem, it turns out that we're kind of forced to use a more complicated structure called a segment tree. This is basically an array $S$ that supports the following two operations (among others):
- set $S[i]=k$
- compute the sum $S[L]+S[L+1]+\dots+S[R]$

both in $O(\log N)$ time.
The explanation by CP-algorithms is probably better than ours. https://cp-algorithms.com/data_structures/segment_tree.html
An alternate explanation by us is provided at page 118 here: http://www.mattekollo.se/wp-content/uploads/2024/08/Programmeringskollo_hafte_2024.pdf.
Here is an [implementation](https://gist.github.com/Matistjati/358a390fafdbdb1f245dcb5179b98706) in C++ using segment tree.
There is another data structure that's basically a weaker but faster segment tree, called a "Fenwick tree". It can solve Supercomputer, but for example it can't give you the smallest number in an interval (which segment trees can). Here is an [implementation](https://gist.github.com/Matistjati/60bdbab7631edbee88c2f457a73c5257) using a Fenwick tree.
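For concreteness, here is a minimal iterative segment tree in Python supporting exactly the two operations above (a sketch, not a tuned implementation):
```python
class SegTree:
    def __init__(self, n):
        # Leaves live at indices n..2n-1; internal node i covers 2i and 2i+1
        self.n = n
        self.t = [0] * (2 * n)

    def set(self, i, value):  # S[i] = value
        i += self.n
        self.t[i] = value
        i //= 2
        while i:
            self.t[i] = self.t[2 * i] + self.t[2 * i + 1]
            i //= 2

    def query(self, l, r):  # sum of S[l..r], inclusive
        s = 0
        l += self.n
        r += self.n + 1  # work with the half-open interval [l, r)
        while l < r:
            if l & 1:
                s += self.t[l]
                l += 1
            if r & 1:
                r -= 1
                s += self.t[r]
            l //= 2
            r //= 2
        return s

st = SegTree(8)
st.set(2, 1)
st.set(5, 1)
print(st.query(0, 4))  # 1: only index 2 is set within [0, 4]
```
Both operations walk one root-to-leaf path, which gives the $O(\log N)$ bound.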
# [Rust](https://open.kattis.com/problems/rust)
Given a function that can calculate the sum of all numbers in a rectangle in $O(1)$, the problem becomes easy. Let's assume we know how to do this. We create two 2D arrays: one with only the valuables, and one with a one for every tree (and zeros elsewhere). Now, we can try every spot in $O(1)$: first, we check if the tree sum is nonzero; if it is, we can't build here. Otherwise, check the valuables sum.
Now, how can we do 2D range sums? One way is to extend prefix sums to 2D. The USACO Guide has a great explanation on this: https://usaco.guide/silver/more-prefix-sums?lang=cpp#2d-prefix-sums.
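As a quick illustration, here is a minimal Python sketch of 2D prefix sums (the prefix array is padded with a zero row and column so no bounds checks are needed):
```python
def build_prefix(grid):
    # P[i][j] = sum of grid[0..i-1][0..j-1]
    rows, cols = len(grid), len(grid[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = grid[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P

def rect_sum(P, r1, c1, r2, c2):
    # Sum of grid[r1..r2][c1..c2] (inclusive) in O(1), by inclusion-exclusion
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]

P = build_prefix([[1, 2], [3, 4]])
print(rect_sum(P, 0, 0, 1, 1))  # 10: the sum of the whole grid
```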
The problem can also be solved by generalizing sliding windows to 2D, but it's unclear whether that code would be simpler than the prefix sum approach.
A C++ implementation can be found [here](https://gist.github.com/Matistjati/96281880b0def975fb2380cac45b592e).
# [Almost union-find](https://open.kattis.com/problems/almostunionfind)
This problem is almost like union-find. The only caveat is that we also have to process queries that move a single node to another set. We could maybe try to modify the union-find algorithm to account for this while not exceeding logarithmic or sub-logarithmic time per query. \
But there is an easier way, using a method called smaller-to-larger merging. \
It's not a very complicated idea: if you are supposed to merge two sets, always put the smaller set's elements into the larger of the two and proceed with that one. \
Although this might at first glance seem very slow, it's not: every time an item is moved, it belongs to the smaller of the two sets, so the set containing it at least doubles in size. Hence every item is moved at most a logarithmic number of times between sets. Since the `set` data structure in C++ handles insertions and deletions in $O(\log N)$, all the merging together takes $O(N \log^2 N)$. \
As it turns out, you can do the same here; just save which set each element belongs to, as well as the sum of each current set. For queries of type 1, merge the smaller of $p$'s and $q$'s sets into the larger one (remember to also transfer the sum, and to update which set the moved elements belong to). \
For queries of type 2: just delete the item from the set it belongs to and add it to the new set, while also updating the sums. \
For queries of type 3: Print the size of the set that p belongs to as well as its sum.
Tada.
https://usaco.guide/plat/merging?lang=cpp
[Implementation](https://gist.github.com/Matistjati/63c7c483d4bb5775f5c0431cced3604b)
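A Python sketch of the whole approach (I'm assuming the input format of multiple test cases, each starting with $n$ and $m$; whether plain Python is fast enough for the time limit is untested):
```python
import sys

def main():
    data = sys.stdin.buffer.read().split()
    pos = 0
    out = []
    while pos < len(data):
        n, m = int(data[pos]), int(data[pos + 1]); pos += 2
        which = list(range(n + 1))             # which[x] = id of x's set
        members = [{i} for i in range(n + 1)]  # elements of each set
        sums = list(range(n + 1))              # sum of each set
        for _ in range(m):
            t = int(data[pos]); pos += 1
            if t == 1:    # union the sets containing p and q
                p, q = int(data[pos]), int(data[pos + 1]); pos += 2
                a, b = which[p], which[q]
                if a != b:
                    if len(members[a]) < len(members[b]):
                        a, b = b, a            # a is now the larger set
                    for x in members[b]:       # move smaller into larger
                        which[x] = a
                    members[a] |= members[b]
                    sums[a] += sums[b]
                    members[b] = set()
                    sums[b] = 0
            elif t == 2:  # move p into q's set
                p, q = int(data[pos]), int(data[pos + 1]); pos += 2
                a, b = which[p], which[q]
                if a != b:
                    members[a].discard(p); sums[a] -= p
                    members[b].add(p); sums[b] += p
                    which[p] = b
            else:         # t == 3: report size and sum of p's set
                p = int(data[pos]); pos += 1
                a = which[p]
                out.append(f"{len(members[a])} {sums[a]}")
    print("\n".join(out))

main()
```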
# [Dragon maid](https://open.kattis.com/problems/dragonmaid)
Would it not be convenient if we got the queries in ascending order of $x$? If that were the case, we would only ever have to add items to the “pile” of items we have to consider in the answer. Thankfully, there is a neat trick we can use to get the queries in ascending order: since we are allowed to answer the queries offline (we don't need to process them in the order they are given), we can simply sort the queries by $x$ before answering them.
Now we just have to be able to handle adding elements and printing the 10 elements with the largest value. The classical way to do this is using a priority queue. Alternatively, you could use a vector that you sort after every insertion, deleting the worst element whenever there are more than 10 elements in the vector. Using a vector is slightly faster (exercise: why?).
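Schematically, the offline pattern might look like this (a sketch: I'm assuming each item has a threshold $t$ and a value $v$, and becomes available for queries with $x \ge t$; see the statement for the exact details):
```python
import heapq

def answer_offline(items, queries, keep=10):
    # items: list of (t, v); queries: list of x values.
    # Returns, per query, the `keep` largest values among available items.
    items.sort()                        # ascending by threshold t
    order = sorted(range(len(queries)), key=lambda i: queries[i])
    answers = [None] * len(queries)
    best = []                           # min-heap of the current top values
    j = 0
    for qi in order:                    # queries in ascending order of x
        while j < len(items) and items[j][0] <= queries[qi]:
            heapq.heappush(best, items[j][1])
            if len(best) > keep:
                heapq.heappop(best)     # evict the smallest
            j += 1
        answers[qi] = sorted(best, reverse=True)
    return answers
```
Note that the answers are stored at the queries' original indices, so they can be printed in the order they were asked.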
[Implementation](https://gist.github.com/Matistjati/eda1821336da31e93038b21af712e466)
# [Bar shelf](https://open.kattis.com/problems/barshelf)
$O(N^3)$: try all triples.
Usually, when we want to bruteforce something, it's productive to fix the "middle" and then hope that the non-middle parts are independent. In fact, if we fix the middle bottle, the valid bottles to the left and right are completely independent! So the number of triples using bottle $x$ as middle is simply the product of the number of valid bottles to the left and right of $x$. We can then add this up for every possible middle bottle. This runs in $O(N^2)$: we try each middle bottle, and counting valid first and last bottles takes $O(N)$.
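Written out, the $O(N^2)$ solution might look like this (a sketch; the inequality directions assume that each bottle in a valid triple must be at least twice as large as the previous one, matching the range query used below):
```python
def count_triples_quadratic(bottles):
    n = len(bottles)
    total = 0
    for m in range(n):  # fix the middle bottle
        left = sum(1 for j in range(m) if 2 * bottles[j] <= bottles[m])
        right = sum(1 for k in range(m + 1, n) if bottles[k] >= 2 * bottles[m])
        total += left * right  # left and right choices are independent
    return total
```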
To speed this up, it seems hard to reduce the number of middle bottles. Instead, let's try speeding up counting the number of candidates for the first and last bottle. Let's only focus on counting the first bottle for now. Our code will look something like:
```python
# compute number of left-candidates for all bottles
num_left = [0] * n
for i in range(n):
    for j in range(i):
        if 2 * bottles[j] <= bottles[i]:
            num_left[i] += 1
```
We can realize that this is in fact a range query over bottle values, so segment trees can be used. The code will look something like this:
```python
# compute number of left-candidates for all bottles
tree = SegTree()  # counts over bottle values; add/query assumed available
num_left = [0] * n
for i in range(n):
    num_left[i] = tree.query(0, bottles[i] // 2)
    tree.add(bottles[i], 1)
```
However, things are complicated by the fact that the bottle sizes can be up to $10^9$. An array this large will work neither time- nor memory-wise. In general, there are two reasonable solutions to this:
1. There are at most $N$ unique values in the input, so we can remap the bottle sizes to the interval $[0, N)$. Then a normal segment tree of $N$ elements works.
2. Use an [implicit segment tree](https://cp-algorithms.com/data_structures/segment_tree.html#dynamic-segment-tree) (also known as sparse or dynamic segment tree)
There's a tradeoff between these two methods. Method one probably needs less code and runs faster, but is logically trickier and more prone to off-by-one errors. On the other hand, implicit segment trees need more code and are much slower, but are intuitive; they are just like normal segment trees, only with huge indices.
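A sketch of the remapping in method one (coordinate compression):
```python
from bisect import bisect_right

def compress(values):
    # Map every value to its rank among the sorted unique values,
    # so all values land in [0, number_of_unique_values)
    uniq = sorted(set(values))
    rank = {v: i for i, v in enumerate(uniq)}
    return uniq, [rank[v] for v in values]

uniq, compressed = compress([10**9, 7, 42, 7])
print(compressed)  # [2, 0, 1, 0]
# A query like "count stored values <= h // 2" becomes a prefix query over
# ranks, with boundary bisect_right(uniq, h // 2) -- this translation step
# is where the off-by-one errors tend to hide.
print(bisect_right(uniq, 42 // 2))  # 1: only the value 7 is <= 21
```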
So this allows us to easily calculate the number of candidates to the left of each bottle. But it's not easy to calculate the number to the right at the same time, so we calculate the left and right values in two separate passes (for example, by iterating over the bottles from the right in the second pass).
Finally, it's also worth mentioning that since we are not doing general range sum queries, but just counting inserted values in a range, we can use a modified set that supports order statistics. This can be done either using [treaps](https://cp-algorithms.com/data_structures/treap.html), or using [GCC PBDS](https://codeforces.com/blog/entry/11080).
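Putting method one together, here is a sketch of the full counting using a Fenwick tree over compressed values (again assuming the triple condition from the earlier sketch; flip the inequality directions if the statement differs):
```python
from bisect import bisect_left, bisect_right

class Fenwick:
    def __init__(self, n):
        self.t = [0] * (n + 1)

    def add(self, i, delta):  # point add at 0-based index i
        i += 1
        while i < len(self.t):
            self.t[i] += delta
            i += i & -i

    def prefix(self, i):  # sum over 0-based indices [0, i)
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s

def count_triples(bottles):
    uniq = sorted(set(bottles))
    rank = {v: i for i, v in enumerate(uniq)}
    n = len(bottles)
    num_left = [0] * n   # j < m with 2*bottles[j] <= bottles[m]
    num_right = [0] * n  # k > m with bottles[k] >= 2*bottles[m]
    fw = Fenwick(len(uniq))
    for m in range(n):
        num_left[m] = fw.prefix(bisect_right(uniq, bottles[m] // 2))
        fw.add(rank[bottles[m]], 1)
    fw = Fenwick(len(uniq))
    for m in range(n - 1, -1, -1):
        seen = n - 1 - m  # bottles already inserted from the right
        num_right[m] = seen - fw.prefix(bisect_left(uniq, 2 * bottles[m]))
        fw.add(rank[bottles[m]], 1)
    return sum(l * r for l, r in zip(num_left, num_right))

print(count_triples([1, 2, 4, 8]))  # 4: every subsequence of length 3 works
```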
### Code
Here are several different ways to solve the problem.
- Fenwick tree solution: [link](https://gist.github.com/Matistjati/2df587ab6697a2fc809fa41d45a7cdb8). Runtime: 0.15s.
- Segment tree solution: [link](https://gist.github.com/Matistjati/9e9d034c1897620692c27ed6b8f42767). Runtime: 0.26s.
- Implicit segment tree solution: [link](https://gist.github.com/Matistjati/f13ed558eda1c8fc19d51e1d123b4d7d). Runtime: 1.57s. Note that *much* better implementations are possible.
- PBDS solution: [link](https://gist.github.com/Matistjati/7f21e423e7ca8cd7f996561a54ce0918). Runtime: 0.29s.
- Treap solution: [link](https://gist.github.com/Matistjati/e9dd2d25560da4549ba9acae07a3ef58). Runtime: 0.47s.
### Discussion
So which method is preferred? I would say that if you have the code and you believe it's going to be fast enough, implicit segment trees are the most intuitive, which can be seen from how simple the non-data-structure code is. For example, Victor Vatn used an implicit segment tree as a black box to solve [problem B](https://pofinal24.kattis.com/contests/pofinal24/problems/vasaloppet) in the 2024 finals in just 12 minutes.
To be quite frank, my implicit segment tree implementation sucks. [Kactl](https://github.com/kth-competitive-programming/kactl/tree/main/content/data-structures), while having lots of good code, doesn't have a good implicit segment tree either. All of this is to say: sometimes during contests you know the data structure you need, but there is no easily googleable "great implementation". That is why it's nice to write good implementations yourself (of course heavily inspired by implementations from people better than you) that you are familiar with and know how to modify. Allegedly, one great source is looking at the solutions of top performers in Codeforces rounds that have problems using your desired data structure. Although you didn't hear this from me, since I have no idea about the copyright implications.
If you want to start by optimizing mine instead, here are some ideas:
- The lowest-hanging fruit is to not be lazy like me: dropping `#define int ll` and only using 64-bit ints where necessary improves the runtime to 1.37 seconds.
- Create fewer nodes by writing push more carefully. I always create both the left and right child, but in most cases only one is needed.
- Instead of using pointers, preallocate a big array of nodes to get better caching.
- Problem-specific: my implementation has range add, but only point add is needed, which in this case would also allow speeding up the implementation.
- Problem-specific no. 2: use "implicit Fenwick trees", i.e. a Fenwick tree backed by an unordered map! Note that this only works if your query is invertible (pretty much only +, -, * and xor satisfy this). [0.72](https://gist.github.com/Matistjati/40b46d4f1c814b21710bef9d4b1a0278) seconds using a vanilla `unordered_map`. I tried speeding it up by using better hash tables, but the std hash table was the only one that got AC at all.
Hardcore ideas:
- You don't need any nodes with degree 1. Compressing these long chains reduces memory usage to $O(n)$, which should help massively with caching. I don't remember where I heard of this idea, but it sounds awful to implement.
- Higher branching factor. I assume that the current bottleneck is the repeated accesses to random memory. The CPU can fetch memory much faster if the accesses are contiguous or more predictable. Therefore, we might have, let's say, 8 or 16 children for every node, all laid out in a contiguous block. Then, you could use SIMD (or hope for autovectorization) to find which child you should branch into. There are some more nice ideas [here](https://en.algorithmica.org/hpc/data-structures/segment-trees/) and [here](https://en.algorithmica.org/hpc/data-structures/s-tree/).
# [An Easy Array Problem](https://open.kattis.com/problems/aneasyarray)
Intuitively, this problem doesn't seem *that* hard. However, the devil lies in the details.
Denote the product of the endpoints by $p$. Let's list some cases:
#### $p<0$
- First and foremost, it would be optimal to use the smallest negative number times the largest positive number.
- If there aren't any negative numbers, we want a zero if one exists.
- If there isn't a zero, the best we can do is the smallest two positive numbers.
Instead of trying to list every possible case (there are about 7 in total), let's do the following: find the largest, second largest, smallest and second smallest numbers in the interval. Try all pairs of these. Also try 0 if it's present. This almost works; however, what if the smallest and largest overlap? For example, take the middle two elements of
`1 0 5 1`
as the query interval, i.e. `0 5`. Here, the largest and second largest are (5,0) and the smallest and second smallest are (0,5). If we try all pairs, the algorithm will incorrectly combine the 5s (using the same element twice) and get the answer 25, while the answer must be 0. The easiest way to remedy this is to simply solve the query using brute force if the interval is small enough, because if the interval is bigger than, say, 8, we can prove that the four max/min values are guaranteed to be distinct elements.
To quickly find whether there is a 0 in the interval, we can use a prefix sum over all occurrences of zeros.
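For instance (a small sketch):
```python
a = [1, 0, 5, 1]  # example array
# prefix_zero[i] = number of zeros among a[0..i-1]
prefix_zero = [0]
for x in a:
    prefix_zero.append(prefix_zero[-1] + (x == 0))
# The interval [l, r] (inclusive, 0-indexed) contains a zero
# iff prefix_zero[r + 1] - prefix_zero[l] > 0
print(prefix_zero[2] - prefix_zero[0] > 0)  # True: a[0..1] contains a zero
```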
To quickly find the largest, second largest, smallest and second smallest numbers in the interval, we can use a modified segment tree. In the code below, we use a slightly modified version of Kactl's segment tree. In order to not write two versions, we use templates to pass the function we wish to aggregate with. The reason it's done by creating a class is that the compiler then gets compile-time access to our function, so it can optimize harder. There may be cleaner ways to get the same result (for example, passing lambda functions gave a slight performance decrease).
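The heart of the modification is the merge function. A Python sketch of the idea (each node stores the two largest and two smallest values in its range, taken from distinct positions):
```python
INF = float("inf")

def merge(a, b):
    # a and b are (max1, max2, min1, min2) tuples for two child ranges
    max1, max2 = sorted((a[0], a[1], b[0], b[1]), reverse=True)[:2]
    min1, min2 = sorted((a[2], a[3], b[2], b[3]))[:2]
    return (max1, max2, min1, min2)

# A leaf holding value v is (v, -INF, v, INF), so v is counted only once
print(merge((5, -INF, 5, INF), (0, -INF, 0, INF)))  # (5, 0, 0, 5)
```
Note that, exactly as described above, max1 and min2 may refer to the same element (here both are the 5), which is why small intervals are brute-forced.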
A C++ implementation can be found [here](https://gist.github.com/Matistjati/4cb21a9cc20c7b57c8f5f638a228234a).