# Slope trick

This blog is the second in a series of blogs about line containers. Links:

- 1: [Data structure: add line, query max](https://hackmd.io/I4227EHWRuO16T9odRgZWw)
- 2: Slope trick (this blog)
- 3: [Convex hulls](https://hackmd.io/DpZ_bPNWSd6nwov6Ym1n2w)

Slope trick is a way to represent functions that are convex, piecewise linear and continuous. Let's showcase the idea with a problem.

## [Sonya and Problem Without a Legend](https://codeforces.com/contest/713/problem/C)

First, read the [problem](https://codeforces.com/contest/713/problem/C). Let $a$ be the numbers in question. We first transform the problem a bit by changing the condition from strictly increasing to non-decreasing. To do that, simply do `a[i] -= i` for all elements.

Next, let's solve the problem with DP. Let

$$DP[i][t]=\text{min cost to make the first } i+1 \text{ elements non-decreasing, if we change } a[i] \text{ to } t$$

The transition is as follows:

$$DP[i][t]=\min\limits_{j \leq t} \{DP[i-1][j]\} + |t-a[i]|$$

In words, if element $i$ has value $t$, we iterate over what value the previous element should have. The only constraint is that it is $\leq t$. Then, we pay $|t-a[i]|$ to change $a[i]$ to $t$.

This solution has a completely terrible time complexity: it runs in something like $O(n \cdot \max\{a_i\}^2)$. It is not hard to optimize it to $O(n \cdot \max\{a_i\})$. However, we will focus on magically optimizing it down to $O(n \log(n))$.

The idea is to think of $DP[i]$ as a function of $t$. It turns out that our transitions result in $DP[i][t]$ being a pretty nice function, and by choosing a clever representation, all transitions can be performed quickly. Before describing exactly why it's nice, let's get a little feel for how the function will typically look. Let's first assume $a[0]=3$ and visualize $DP[0]$. From the transition, we get that $DP[0][t]=|t-a[0]|$

![image](https://hackmd.io/_uploads/ryp2XMmTel.png)

Now, let's think of how to compute $DP[1][t]$.
We obtain $DP[1]$ by changing $DP[0]$, as per the transition. According to it, we first need to change $DP[0][t]$ to $\min\limits_{j\leq t}\{DP[0][j]\}$. This basically means replacing $DP[0]$ with the prefix-min of $DP[0]$ (like a prefix sum, but with min instead of plus). In general, our function will look a bit like $y=x^2$: if we take a prefix-min of it, the part before the minimum will remain unaffected, and afterwards, the function will simply be $y=\text{minimum}$. The reason for this is that after the minimum, the function is increasing, and thus worse than all previous values. Taking the prefix-min of $DP[0]$, it will look like this

![image](https://hackmd.io/_uploads/BJm9BGXTgl.png)

After taking the prefix-min, to finish changing $DP[0]$ into $DP[1]$, we also need to add $|t-a[1]|$. Let's assume $a[1]=5$. Then, we get

![image](https://hackmd.io/_uploads/rJb1UGQTgx.png)

From these images, let's make some observations:

- Our function consists of a set of line segments
- The function is continuous
- If we have considered $N$ values $a[i]$ so far, there are at most $2N$ line segments

Just from these facts, you would probably be able to solve the problem in $O(N^2)$ by explicitly keeping track of all lines of the form `y=kx+m` and their intervals, and then computing $DP[i]$ from $DP[i-1]$ in $O(N)$. This is sufficient, since the problem has $N\leq 3000$. However, if we choose a more clever representation than `y=kx+m`, we can solve the problem in $O(N\log(N))$ with less code.

Instead of explicitly storing the lines as a set of `y=kx+m`, we need an additional observation: our function is convex. Informally, this means that it looks like $y=x^2$; more formally, its derivative is non-decreasing. Using this, we can store the function with the following, slightly unintuitive representation. We will keep track of the leftmost line as `y=kx+m`.
Then, for every $x$ where the slope changes, we will record how much the slope increases as the pair $(x, +k)$:

![image](https://hackmd.io/_uploads/SJQa0zXTex.png)

More exactly, we will keep a map that says, for each $x$, how much the slope increases there. Note that this information is sufficient to fully describe the function. We will call the points where the slope changes *breakpoints*. Since our function is convex, the breakpoints will always increase the slope, never decrease it. This representation of the function is the essence of slope trick.

Now, what do we gain from storing our lines like this? It turns out that this representation allows us to efficiently perform our two transitions: computing a prefix-min and adding a new abs-function. To avoid $O(N^2)$ from copying $O(N)$ breakpoints in every step, we will simply compute $DP[0]$ and then change it in place to become $DP[1]$, then $DP[2]$, and so on.

Now, how do we perform our two desired operations on this representation? To add two slope tricks, it suffices to add the two leftmost linear functions, and then add all breakpoints. To add the breakpoints, simply add the slope increase for each x-coordinate in the breakpoints map. Let's motivate why this is true. One way to think of it is to consider how you would add two such functions if they were not in slope trick form. You would probably do a left-to-right sweep, keeping a priority queue of events where the slope of the function changes. Slope trick can then be thought of as storing these events, and adding two slope tricks is then naturally just taking the union of all breakpoints.

Let's illustrate with an example: suppose we add $|x-3|$ to the function above. In slope trick form, $|x-3|$ is $y=3-x$ with breakpoint $(3,2)$. Adding it to the other function, which is $y=-2x+8$ with breakpoints $(3,1),(5,2)$, we get $y=-3x+11$ with breakpoints $(3,3),(5,2)$. I encourage you to draw this before looking at the result so you understand it.
![image](https://hackmd.io/_uploads/BkWdcbVTgg.png)

In general, if we have two slope tricks $a$ and $b$ with $|a| \leq |b|$, we can add them in $O(|a| \log(|b|))$ with very little code. This is the single most useful property of slope tricks. Some C++ pseudocode:

```c++
struct SlopeTrick {
    ll k, m; // leftmost line y=kx+m
    map<ll,ll> breakpoints;

    void add(SlopeTrick& other) {
        k += other.k;
        m += other.m;
        for (auto [x, dk] : other.breakpoints) {
            breakpoints[x] += dk;
        }
    }
};
```

Next, let's consider how to perform a prefix-min. Let's think of what this means. As stated earlier, our function will approximately look like $y=x^2$. Then, we need to remove all lines with positive slope. We do this by removing the breakpoints that create positive slope. To do this efficiently, keep track of the total slope change over all breakpoints, and then repeatedly remove the rightmost breakpoint until the rightmost slope, $k+\text{total\_slope\_change}$, is no longer positive. This operation looks something like this (image stolen).

![image](https://hackmd.io/_uploads/S1gAgQQ6el.png)

Working code:

```c++
typedef long long ll;

struct SlopeTrick {
    ll total_slope_change = 0; // total slope change from breakpoints
    ll k = 0, m = 0; // leftmost line y=kx+m
    map<ll,ll> breakpoints;

    void add(SlopeTrick& other) {
        k += other.k;
        m += other.m;
        for (auto [x, dk] : other.breakpoints) {
            breakpoints[x] += dk;
            total_slope_change += dk;
        }
    }

    void prefixmin() {
        // cut away breakpoints from the right while the rightmost slope is positive
        while (!breakpoints.empty() && k + total_slope_change > 0) {
            auto it = prev(end(breakpoints));
            if (it->second <= k + total_slope_change) {
                total_slope_change -= it->second;
                breakpoints.erase(it); // can safely remove the whole breakpoint
            } else {
                // remove only partially
                ll remove_amount = k + total_slope_change;
                it->second -= remove_amount;
                total_slope_change -= remove_amount;
            }
        }
    }
};
```

Using this structure, we can now easily solve the problem. While it might seem that `prefixmin` could be slow, it amortizes to $O(n\log(n))$: we only add $n$ breakpoints, and each is removed at most once.
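To make this concrete, here is a minimal, self-contained sketch of solving the whole problem with such a structure. The helpers `add_abs`, `value_at_right` and `solve` are my own names (not from the actual solution linked below), and instead of literally evaluating at $+\infty$, the sketch evaluates at the rightmost breakpoint, where the function is constant after a prefix-min.

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// Map-based slope trick: f(x) = k*x + m left of all breakpoints;
// breakpoints[x] tells how much the slope increases at x.
struct SlopeTrick {
    ll k = 0, m = 0, total_slope_change = 0;
    map<ll,ll> breakpoints;

    // Add |x - a|: slope -1 before a, +1 after, i.e. slope increase of 2 at a.
    void add_abs(ll a) {
        k -= 1; m += a;
        breakpoints[a] += 2;
        total_slope_change += 2;
    }

    // Replace f by its prefix minimum: cut away all positive slope on the right.
    void prefixmin() {
        while (!breakpoints.empty() && k + total_slope_change > 0) {
            auto it = prev(breakpoints.end());
            ll excess = k + total_slope_change;
            if (it->second <= excess) {
                total_slope_change -= it->second;
                breakpoints.erase(it);
            } else {
                it->second -= excess;
                total_slope_change -= excess;
            }
        }
    }

    // f evaluated at the rightmost breakpoint; after a prefixmin the
    // function is constant to the right of it, so this is the minimum.
    ll value_at_right() const {
        ll X = breakpoints.empty() ? 0 : breakpoints.rbegin()->first;
        ll y = k * X + m;
        for (auto [bx, dk] : breakpoints) y += dk * (X - bx);
        return y;
    }
};

ll solve(vector<ll> a) {
    for (int i = 0; i < (int)a.size(); i++) a[i] -= i; // strict -> non-decreasing
    SlopeTrick f;
    for (ll x : a) { f.prefixmin(); f.add_abs(x); } // DP[i] from DP[i-1]
    f.prefixmin();
    return f.value_at_right();
}
```

On the problem's sample `2 1 5 11 5 9 11`, `solve` returns `9`, matching the expected answer.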
There is one final detail: when we have done all transitions, how do we get the answer? Well, the answer is the minimum of the final function, since the final function is the total cost as a function of the last value, and we can greedily choose the best one. The easiest way to get it is to take the prefix-min and evaluate the function at $+\infty$ (after the prefix-min, the function is constant to the right of all breakpoints). My implementation can be found [here](https://gist.github.com/Matistjati/66256f79fab8e0185f89ce4c9d6763b4).

## Generalizing

The preconditions for being able to use slope trick are as follows: our function must be piecewise linear and continuous. This is enough to be able to add functions. To be able to take prefix-min in sublinear amortized time, we also need convexity (or concavity for prefix-max).

To summarize, the main advantage of this line representation is that it allows us to quickly add functions with little code. It is very useful that for slope tricks with sizes $|a| \leq |b|$, addition runs in $O(|a| \log(|b|))$ time, as this allows us to perform smaller-to-larger merging. If you store the lines explicitly as a set of $(k,m)$, it is impossible to make addition run faster than $O(|a|+|b|)$.

In general, slope trick can be a little hard to digest, and my explanation is likely not perfect/complete. Here are two other blogs about the topic if you want a more complete picture:

https://codeforces.com/blog/entry/77298 (large inspiration)
https://codeforces.com/blog/entry/47821

## [Five Rats at Gustav's](https://open.kattis.com/problems/frag)

To solve the problem, let's do a bottom-up style DP. We will calculate

$$DP[u][t]=\text{the cheapest cost to ensure no rat in u's subtree enters the root within t seconds}$$

Let's start by writing this DP in $O(NT)$. The base case is a rat: let $D(u)$ be the distance from the root to $u$. Stalling it for $t \in [0,D(u)]$ seconds is free: this is the time the rat needs to travel to the root. For every extra second, we have to pay $e_u$ energy.
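In slope-trick form, this base case is just the line $y=0$ plus a single breakpoint $(D(u), e_u)$. Here is a small sketch of it (the struct layout and helper names here are mine, for illustration only):

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// Minimal slope trick, as in the previous section: f(t) = k*t + m to the
// left of all breakpoints, and breakpoints[x] is the slope increase at x.
struct SlopeTrick {
    ll k = 0, m = 0;          // leftmost line y = k*t + m
    map<ll,ll> breakpoints;   // t -> slope increase

    ll eval(ll t) const {     // evaluate the function at t
        ll y = k * t + m;
        for (auto [bx, dk] : breakpoints)
            if (bx <= t) y += dk * (t - bx);
        return y;
    }
};

// The rat base case: stalling is free for t in [0, D(u)], and costs
// e_u energy per extra second. So: the line y = 0, with one breakpoint
// at D(u) where the slope increases by e_u.
SlopeTrick rat_base_case(ll dist_to_root, ll e) {
    SlopeTrick f;                     // y = 0 so far
    f.breakpoints[dist_to_root] = e;  // slope becomes e after D(u)
    return f;
}
```

For example, a rat at distance 3 with energy cost 2 gives a function that is 0 up to $t=3$ and then grows by 2 per second.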
Thus, the function will look something like this

![image](https://hackmd.io/_uploads/S1X3KLmplx.png)

If we are not on a rat, we will have to choose some number of seconds to stall for. If we stall for $s$ seconds at this node and want to stall for $t$ seconds in total, we get

$$DP[u][t]=\min\limits_{0 \leq s \leq t}\{\sum\limits_{c \in children(u)} DP[c][t-s] + s \cdot e_u \}$$

In words, to get the cheapest way to stall for $t$ seconds, we iterate over how long to stall at this node. We pay for that in energy, and then have to stall the remaining $t-s$ seconds in the children. Now, how do we optimize this?

To be able to use slope trick, we need to prove (or convince ourselves) that our DP is convex. For discrete functions, this means that adjacent differences are non-decreasing. One cheesy way to discover that the DP is convex is to print the differences in the DP table and stare at them. Let's do something slightly better than that. In the base case, the DP is convex (the differences will first be 0, and then jump to a constant value). Then, the transition actually performs a min-plus convolution to produce $DP[u][t]$. We can see this by letting $g[s]=s \cdot e_u$. Note that $g$ is also convex. Then, we get

$$DP[u][t]=\min\limits_{0 \leq s \leq t}\{\sum\limits_{c \in children(u)} DP[c][t-s] + g[s] \}$$

So we start with convex functions and combine them via min-plus convolutions. The min-plus convolution of two convex functions is always convex, so we retain convexity. Therefore, we can apply slope trick.

To see how to solve the problem using slope tricks, let's do some rewriting.

$$DP[u][t]=\min\limits_{0 \leq s \leq t}\{\sum\limits_{c \in children(u)} DP[c][t-s] + s \cdot e_u \}$$

First, we can swap $s$ and $t-s$, as they are symmetric.

$$DP[u][t]=\min\limits_{0 \leq s \leq t}\{\sum\limits_{c \in children(u)} DP[c][s] + (t-s) \cdot e_u \}$$

Then, group terms.
$$DP[u][t]=\min\limits_{0 \leq s \leq t}\{\left(\sum\limits_{c \in children(u)} DP[c][s]\right) - s \cdot e_u + t \cdot e_u \}$$

Finally, move the $t \cdot e_u$ outside the min, since it's constant.

$$DP[u][t]=\min\limits_{0 \leq s \leq t}\{\left(\sum\limits_{c \in children(u)} DP[c][s]\right)-s \cdot e_u\} + t \cdot e_u$$

Suddenly, this looks very similar to Sonya and Problem Without a Legend. First, we sum all children using smaller-to-larger merging. Then, we subtract $e_u$ from the slope, emulating $-s \cdot e_u$. Then, we take a prefix-min. Finally, we add $e_u$ to the slope, emulating $+t \cdot e_u$ (since it is now a function of $t$). All of these operations can be performed quickly using slope trick. The only slow part is the smaller-to-larger merging of children, taking $O(N\log(N)^2)$ time in total.

The best part about this? We have explicitly computed the lines describing the amount of energy needed as a function of $T$. By applying the normal convex hull trick on these lines, we can answer queries for different values of $T$ in $O(\log(N))$ each. An implementation can be found [here](https://gist.github.com/Matistjati/193b057bf06fca4a04c74a5c41e3b55f).

## Min plus convolution

Additionally, under certain conditions, adding two slope tricks also computes what is known as the Minkowski sum or min-plus convolution (or infimal convolution). It is defined as follows:

```python
a = {slope trick}
b = {slope trick}
res = {}
for l1 in a:
    for l2 in b:
        res.add(l1.k + l2.k, l1.m + l2.m)
normalize(res) # remove all lines that are not on the convex hull
```

*Note* that we need to place an additional restriction for this to hold: it is not sufficient that we can represent our function using slope trick. The slope trick must represent the lower hull of a set of lines. The fact that both functions are convex is sufficient to compute the min-plus convolution in $O(|a|+|b|)$, but we lose the ability to perform smaller-to-larger-style min-plus convolution.
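To make the $O(|a|+|b|)$ claim concrete, here is a hedged sketch of min-plus convolution for two convex piecewise-linear functions. It uses an alternative representation of my own choosing (sorted slope segments rather than the breakpoint map above), which makes the underlying idea explicit: the result is obtained by merging the two segment lists by slope.

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

// A convex piecewise-linear function on [x0, x0 + sum of lengths],
// given by its value y0 at x0 and segments (length, slope), slopes
// sorted in non-decreasing order (= convexity).
struct ConvexPL {
    ll x0, y0;
    vector<pair<ll,ll>> seg; // (length, slope)
};

// Min-plus convolution h(x) = min_{s+t=x} a(s) + b(t):
// start at (a.x0 + b.x0, a.y0 + b.y0) and merge all segments by slope.
ConvexPL minplus(const ConvexPL& a, const ConvexPL& b) {
    ConvexPL res{a.x0 + b.x0, a.y0 + b.y0, {}};
    res.seg.resize(a.seg.size() + b.seg.size());
    merge(a.seg.begin(), a.seg.end(), b.seg.begin(), b.seg.end(),
          res.seg.begin(),
          [](auto& p, auto& q) { return p.second < q.second; });
    return res;
}

// Evaluate f at x (clamped to f's domain outside of it).
ll eval(const ConvexPL& f, ll x) {
    ll y = f.y0, cur = f.x0;
    for (auto [len, slope] : f.seg) {
        ll take = min(len, max(0LL, x - cur)); // portion of this segment used
        y += take * slope;
        cur += len;
    }
    return y;
}
```

For example, convolving $|x|$ on $[-2,2]$ with $|x-3|$ on $[1,5]$ gives a function with minimum 0 at $x=3$, as expected.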
In general, min-plus convolution turns out to be somewhat common. For example, its sibling, max-plus convolution, can be used in some knapsack DPs. Sometimes, solving a problem obviously requires min-plus convolution, while it's less obvious that this equals the addition of two convex hulls. I have yet to see a problem using this fact, but it could be very useful. It is probably unlikely to show up, though, because as soon as we need other operations such as max, slope trick is no longer nice.

In Five Rats at Gustav's, we do perform a min-plus convolution, but the function isn't the lower hull of a set of lines, so we can't simply add functions for it. The reason is that the base case does not satisfy the condition. Still, slope trick allows you to min-plus convolve a slope trick with a linear function in $O(\log(N))$, which is exactly what we need to solve the problem.

## [Magic Tree](https://oj.uz/problem/view/CEOI19_magictree)

Now that you've learned about slope trick, use it to solve Magic Tree yourself!