## Gourmet
Do a DP where the state is how long you have been eating so far, and where the transitions try eating each dish. If time > M, return 0, and if time == M, return 1. It is also easy to do bottom up.
You only get partial score if you don't memoize or don't use long long in C++.
Top Down
```cpp
#include <bits/stdc++.h>
using namespace std;
#define ll long long
#define fo(i, n) for(ll i=0;i<(n);i++)
typedef vector<ll> vl;
ll memo[1001], m, n;
vl v(30);
ll dp(ll pos){
    if(pos > m) return 0;
    if(pos == m) return 1;
    ll &ans = memo[pos];
    if(ans != -1) return ans;
    ans = 0;
    fo(i, n){
        ans += dp(pos + v[i]);
    }
    return ans;
}
int main() {
    cin.tie(0)->sync_with_stdio(0);
    cin >> m >> n;
    fo(i, n) cin >> v[i];
    memset(memo, -1, sizeof(memo));
    cout << dp(0);
    return 0;
}
```
Bottom Up
```cpp
#include <bits/stdc++.h>
using namespace std;
#define ll long long
#define fo(i, n) for(ll i=0;i<(n);i++)
typedef vector<ll> vl;
int main() {
    cin.tie(0)->sync_with_stdio(0);
    ll k, n;
    cin >> k >> n;
    vl v(n);
    fo(i, n) cin >> v[i];
    vl ans(k+1, 0);
    ans[0] = 1;
    fo(i, k){
        fo(j, n){
            if(i+v[j] > k) continue;
            ans[i+v[j]] += ans[i];
        }
    }
    cout << ans[k];
    return 0;
}
```
## Knapsack
Do a DP where your state is which item you are on and the total weight of the items you have taken. The transitions are either taking the current item and moving on to the next one, or just skipping to the next one. What is different here is that you have to print the items taken in an optimal solution, not only the greatest possible value. Here is how you can do that top-down.
```cpp
#include <bits/stdc++.h>
using namespace std;
#define ii pair<int, int>
#define iii pair<int, ii>
#define vi vector<int>
#define vii vector<ii>
#define ll long long
#define INF 1000000000
int w[2010], v[2010], items, memo[2010][2010], chosen, sol[2010];
int solve(int id, int remw)
{
    if (id == items || remw == 0)
        return 0;
    if (memo[id][remw] != -1)
        return memo[id][remw];
    if (w[id] > remw)
        return memo[id][remw] = solve(id + 1, remw);
    else
        return memo[id][remw] = max(solve(id + 1, remw), solve(id + 1, remw - w[id]) + v[id]);
}
void print_sol(int id, int remw)
{
    if (id == items || remw == 0)
        return;
    // guard against w[id] > remw: calling solve with a negative remaining weight reads out of bounds
    if (w[id] <= remw && solve(id + 1, remw - w[id]) + v[id] == memo[id][remw])
    {
        sol[chosen] = id;
        chosen++;
        print_sol(id + 1, remw - w[id]);
    }
    else
    {
        print_sol(id + 1, remw);
    }
}
int main()
{
    int cap;
    while (scanf("%d %d", &cap, &items) != EOF)
    {
        chosen = 0;
        memset(memo, -1, sizeof(memo));
        for (int i = 0; i < items; i++)
            scanf("%d %d", &v[i], &w[i]);
        solve(0, cap);
        print_sol(0, cap);
        printf("%d\n", chosen);
        for (int i = 0; i < chosen; i++) printf("%d ", sol[i]);
        printf("\n");
    }
    return 0;
}
```
Here is the bottom-up version.
```cpp
#include <bits/stdc++.h>
using namespace std;
#define ii pair<int,int>
#define iii pair<int,ii>
#define vi vector<int>
#define vii vector<ii>
#define ll long long
#define INF 1000000000
int w[2010], v[2010], dp[2010][2010], sol[2010];
int n, cap, chosen;
int main() {
    while (scanf("%d %d", &cap, &n) != EOF) {
        for (int i = 0; i < n; i++) scanf("%d %d", &v[i], &w[i]);
        for (int i = 0; i <= n; i++) {
            for (int c = 0; c <= cap; c++) {
                dp[i][c] = 0;
            }
        }
        for (int i = 1; i <= n; i++) {
            for (int c = 0; c <= cap; c++) {
                dp[i][c] = dp[i - 1][c];
                if (w[i - 1] <= c) {
                    dp[i][c] = max(dp[i][c], dp[i - 1][c - w[i - 1]] + v[i - 1]);
                }
            }
        }
        chosen = 0;
        int c = cap;
        for (int i = n; i > 0; i--) {
            if (dp[i][c] != dp[i - 1][c]) {
                sol[chosen++] = i - 1;
                c -= w[i - 1];
            }
        }
        printf("%d\n", chosen);
        for (int i = chosen - 1; i >= 0; i--) printf("%d ", sol[i]);
        printf("\n");
    }
    return 0;
}
```
## Springoalla
### Subtask 1 ($N \leq 10$ and $t \leq 1000$)
We notice that this is very similar to the Knapsack problem. So let's try to think with a similar DP-approach. We can consider both the "cost" and "value" to be the time of the lap.
Let's consider the items in any order.
When we are on a state and considering some item/trail $i$, we have 3 options:
- Skip this trail. (meaning that we should start considering the next trail)
- Run a full lap on this trail. (we can continue considering the same trail)
- Run a half lap on this trail, only if we have run on this trail before.
Using these options, we can define the following DP-function for this knapsack:
```python
n,t = map(int,input().split())
a = [*map(int,input().split())]
# Memoization
memo = [[[(-1,-1) for _ in range(2)] for _ in range(t+1)] for _ in range(n+1)]
def dp(i, goal, hasTaken):
    # We return a tuple (x,y), where x is the amount by which we exceed t, and y is the number of laps we run.
    if goal <= 0: # we have hit our goal
        return (-goal,0)
    if i == n: # we have considered all trails
        assert goal > 0 # we did not succeed in running all t minutes
        return (1e9,1e9)
    if memo[i][goal][hasTaken] != (-1,-1): # we have calculated this state before
        return memo[i][goal][hasTaken]
    skip = dp(i+1,goal,0)
    x,y = dp(i,goal-a[i],1)
    takeFull = (x,y+1)
    takeHalf = (1e9,1e9)
    if hasTaken:
        x,y = dp(i+1,goal-a[i]//2,0)
        takeHalf = (x,y+1)
    best = min(skip,takeHalf,takeFull)
    memo[i][goal][hasTaken] = best
    return best
x,y = dp(0,t,0)
print(x+t,y)
```
The time complexity of the DP is $O(Nt)$, since there are $O(N \cdot t)$ states, and each state tries a constant number (3) of transitions.
To figure out how many minutes we spend on each trail, we can save some extra information in the memo: which transition did we pick? Then we can follow the "path" of transitions from the state $(0,t,0)$ until we have recovered the full solution.
For example, we can return one extra value in each DP state: $1$ if we "skipped" the current item, $2$ if we ran a full lap on this item, $3$ if we ran half a lap, and $-1$ if we did not do anything more from this state.
Here is example code:
```python
n,t = map(int,input().split())
a = [*map(int,input().split())]
# Memoization
memo = [[[(-1,-1,-1) for _ in range(2)] for _ in range(t+1)] for _ in range(n+1)]
def dp(i, goal, hasTaken):
    # We return a tuple (x,y,z): x is the amount by which we exceed t, y is the number of laps we run, and z encodes the move taken.
    if goal <= 0: # we have hit our goal
        return (-goal,0,-1)
    if i == n: # we have considered all trails
        assert goal > 0 # we did not succeed in running all t minutes
        return (1e9,1e9,-1)
    if memo[i][goal][hasTaken] != (-1,-1,-1): # we have calculated this state before
        return memo[i][goal][hasTaken]
    x,y,z = dp(i+1,goal,0)
    skip = (x,y,1)
    x,y,z = dp(i,goal-a[i],1)
    takeFull = (x,y+1,2)
    takeHalf = (1e9,1e9,3)
    if hasTaken:
        x,y,z = dp(i+1,goal-a[i]//2,0)
        takeHalf = (x,y+1,3)
    best = min(skip,takeHalf,takeFull)
    memo[i][goal][hasTaken] = best
    return best
x,y,z = dp(0,t,0)
print(x+t,y)
minutes = [0]*n
x,y,z = 0,t,0
# check y >= 0 first, so we never index memo with a negative goal
while y >= 0 and memo[x][y][z][2] != -1:
    _,_,move = memo[x][y][z]
    if move == 1:
        x += 1
        z = 0
    elif move == 2:
        minutes[x] += a[x]
        y -= a[x]
        z = 1
    elif move == 3:
        minutes[x] += a[x]//2
        y -= a[x]//2
        x += 1
        z = 0
print(*minutes)
```
Note that both the memory complexity and the time complexity of this algorithm are $O(Nt)$.
### Why does it not get AC 100?
You don't need to understand this part exactly, but it helps with building intuition for writing fast programs.
In theory, the memory and time complexity should be fine. In practice, they're not. Let's discuss why. First off, the memory usage: we have $10^5 \cdot 1000 \cdot 2$ states, and each state is 3 32-bit integers. This equals 19200000000 bits = 2400000000 bytes = 2.4 GB, which couldn't possibly fit in the memory limit of 1024 MB. Note that in C++, calculating memory usage like this works well if you use C-style arrays like
```cpp
tuple<int,int,int> states[2][1000][int(1e5)];
```
In fact, this will use pretty much exactly 2.4 GB. (To be more exact, this won't even compile, as all static memory allocations must total less than $2^{31}$ bytes or something like that. But changing 1000 to 500 compiles and results in 1.2 GB.)
So C++, with no overhead, already uses too much memory. It is unfortunately even worse for Python: each integer in PyPy (the Python implementation that Kattis uses) uses approximately 16 bytes instead of the 4 you would expect in C++.
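As a sanity check, the back-of-the-envelope arithmetic above can be reproduced in a couple of lines:

```python
# Memory estimate for the top-down DP from the text.
states = 10**5 * 1000 * 2   # n * t * 2 (hasTaken)
bytes_per_state = 3 * 4     # three 32-bit integers
total_bytes = states * bytes_per_state
print(total_bytes)          # 2400000000, i.e. about 2.4 GB
print(total_bytes / 2**20)  # roughly 2289 MiB, far above the 1024 MB limit
```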
Even if the constraints were slightly smaller, so that the memory fit, it would still be too slow in C++. This is because top-down DPs make approximately random memory accesses. If we instead write it bottom-up (i.e., use loops instead of recursion), we access memory sequentially. The reason this matters is that reading memory from RAM is very slow. However, if you make predictable accesses like a[0], a[1], a[2], etc., the computer notices the pattern and fetches the memory before you even ask for it, making your program much faster. This principle is independent of programming language, so even in Python you can benefit from keeping your array accesses near each other.
However, judging by the constraints, we can infer that it's going to be difficult for Python to be fast enough. Therefore, let's use C++, optimize our memory usage, and go bottom-up.
### Subtask 2 (We will never run half a lap)
To use less memory, let's try to think how we can optimize this dp.
Let's create an array of length $t$ plus some extra space, where the $i$:th position keeps track of the minimum number of started laps needed to run exactly $i$ minutes.
Then, for each trail, we iterate through this array once, and every time we encounter a length that we can already reach, we can reach $a[i]$ further using one more lap.
After handling all items, we can check which length is the smallest reachable length $\geq t$.
To figure out the number of minutes we spend on each trail, we do something similar to the previous case: in the length array, we also save the last trail index used to reach that length.
By repeatedly checking the last index used to get the optimal length, we can find every trail used, and the amount of times used.
C++ solution:
```cpp
#include <bits/stdc++.h>
using namespace std;
#define rep(i,a,b) for(int i = a; i < (b); i++)
#define all(x) begin(x),end(x)
#define sz(x) (int)(x).size()
typedef vector<int> vi;
typedef pair<int,int> pii;
typedef long long ll;
int main() {
    //cin.tie(0)->sync_with_stdio(0);
    //cin.exceptions(cin.failbit);
    int n,t;
    cin >> n >> t;
    vi a(n);
    rep(i,0,n) cin >> a[i];
    // lengths[i] = (x,y), where x is the minimum number of laps to run i minutes, and y is the index of the trail last run
    vector<pii> lengths(t+80000,{1e9,-1});
    lengths[0] = {0,-1};
    rep(i,0,n) {
        rep(j,0,sz(lengths)-a[i]) {
            if (lengths[j].first != 1e9) {
                pii curr = {lengths[j].first + 1, i};
                lengths[j+a[i]] = min(lengths[j+a[i]],curr);
            }
        }
    }
    int bestt = -1; // Find the best total time: the smallest reachable time that is at least our goal.
    rep(i,t,sz(lengths)) {
        if (lengths[i].first != 1e9) {
            bestt = i;
            break;
        }
    }
    vi minutes(n,0);
    // Backtrack to find the time spent on each trail
    int curr = bestt;
    while (curr != 0) {
        int ind = lengths[curr].second;
        minutes[ind] += a[ind];
        curr -= a[ind];
    }
    cout << bestt << " " << lengths[bestt].first << "\n";
    rep(i,0,n) cout << minutes[i] << " ";
    cout << "\n";
}
```
### Full solution
Let's build on the solution for subtask 2. Now we have to consider going for half a lap, but only if we have run a lap on the same trail before.
Since we are saving the index of the last lap ran, and we're handling each trail at a time, we can just check if we ran the same lap just before. If we have, then we also handle the situation of going half a lap.
To correctly generate the answer later, we can't only save the index of the last used trail in the case of running half a lap. One solution is to save the $index + n$ instead, to indicate that we used half a lap. (Keep in mind that it's always better to run a full lap than running 2 half laps, so we will never want to run 2 half laps of the same trail).
By handling this case, we get the following code:
```cpp
#include <bits/stdc++.h>
using namespace std;
#define rep(i,a,b) for(int i = a; i < (b); i++)
#define all(x) begin(x),end(x)
#define sz(x) (int)(x).size()
typedef vector<int> vi;
typedef pair<int,int> pii;
typedef long long ll;
int main() {
    //cin.tie(0)->sync_with_stdio(0);
    //cin.exceptions(cin.failbit);
    int n,t;
    cin >> n >> t;
    vi a(n);
    rep(i,0,n) cin >> a[i];
    // lengths[i] = (x,y), where x is the minimum number of laps to run i minutes, and y is the index of the trail last run
    vector<pii> lengths(t+80000,{1e9,-1});
    lengths[0] = {0,-1};
    rep(i,0,n) {
        rep(j,0,sz(lengths)-a[i]) {
            if (lengths[j].first != 1e9) {
                pii curr = {lengths[j].first + 1, i};
                lengths[j+a[i]] = min(lengths[j+a[i]],curr);
                if (lengths[j].second == i) { // Run half a lap
                    pii temp = {lengths[j].first + 1, i+n};
                    lengths[j+a[i]/2] = min(lengths[j+a[i]/2],temp);
                }
            }
        }
    }
    int bestt = -1; // Find the best total time: the smallest reachable time that is at least our goal.
    rep(i,t,sz(lengths)) {
        if (lengths[i].first != 1e9) {
            bestt = i;
            break;
        }
    }
    vi minutes(n,0);
    // Backtrack to find the time spent on each trail
    int curr = bestt;
    while (curr != 0) {
        int ind = lengths[curr].second;
        if (ind < n) {
            minutes[ind] += a[ind];
            curr -= a[ind];
        }
        else { // Ran half a lap, since ind >= n
            minutes[ind-n] += a[ind-n]/2;
            curr -= a[ind-n]/2;
        }
    }
    cout << bestt << " " << lengths[bestt].first << "\n";
    rep(i,0,n) cout << minutes[i] << " ";
    cout << "\n";
}
```
### Faster solution
It is possible to solve the problem in python. We can get away with not storing two ints in the DP array by making the backtracking to find the solution take $O(Nt)$ instead.
The code can be found [here](https://gist.github.com/Matistjati/0266e77e1e0718861657bb1fcdbec1b4).
By writing this solution in C++ and applying the following optimizations, I got the fastest solution (at the time of writing). My optimizations over the naive Python to C++ translation are:
- Split it into 2 loops, one for running whole and one for 1.5 trails
- Use a better loop order. The current one is suboptimal cache-wise; try to justify why.
- Make the DP loops branchless
- Sort the input trails to make the cache behavior slightly more predictable
- Reduce the dp size from $t+40000$ to $t$
I will leave the exact details of this as an exercise to you. Try and see if you can beat my 0.11 :).
I also believe that it can be improved; currently, it does not use SIMD. I also believe that using some preprocessing of the input, you can get away with 16-bit values in the DP. It may be also possible to use bitsets to speed up the solution, but it seems difficult.
In the end, it might also be the case that the solution reconstruction is the bottleneck, in which case it seems difficult to optimize further.
## Whiteboard space
### 45 points
First, we can realize that we can think of the problem as having two independent whiteboards. This motivates a DP of the following form:
$$\text{DP[i][used red][used blue]=max number of placed ideas}$$
However, we can realize that the problem isn't asking for the largest subset of ideas that can be placed; rather, it asks for the longest prefix of ideas. Thus, it suffices to compute
$$\text{DP[i][used red][used blue]=0 or 1, whether the state can be reached or not}$$
Using this definition, we can simply look at all reachable states, and the answer is the largest reachable one. The complexity will be $O(N(R\cdot C)^2)$, which solves many of the subtasks. We will implement this bottom-up; the reason will become apparent when we try to optimize the DP.
It's pretty tricky to implement the DP without bugs. In the end, it gets 45 points. It can be found [here](https://gist.github.com/Matistjati/6fe12f5a95f9cc2a30348036b7271aec).
### 100 points
The DP must now be sped up. Currently, we have $O(1)$ transitions per state, so we somehow need to reduce the number of states. (Technically, since each state stores a 0 or a 1, using bitsets to speed up the transitions by a factor of 64 might be possible. Let's not do that though.)
The main insight is a common way to get rid of state: we will change our dp to
$$\text{DP[i][used red]=min used blue}$$
which can be implemented in $O(NRC)$, which is fast enough. Why does this work? Suppose we have two states $(i_1, blue_1, red_1)$ and $(i_2, blue_2, red_2)$. If $(i_1, blue_1)=(i_2,blue_2)$ but $red_1 < red_2$, then state 1 is definitely better; it's always better to have used less red. We say that state 1 $\textit{dominates}$ state 2. To build better intuition for state domination, which of the following DPs work?
1. $\text{DP[i]=min used blue. If used blue is tied, minimize red}$
2. $\text{DP[used blue][used red]=max i}$
3. $\text{DP[i]=min used blue+ used red}$
4. $\text{DP[i][used blue]=min red}$
Answer: (click to reveal) ||2 and 4 work. If we only know $i$, it's not obvious if we want to prioritize red or blue||
An implementation can be found [here](https://gist.github.com/Matistjati/33fd68d474078c142950e0344ade5241).
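To make state domination concrete, here is a small sketch on a simplified one-dimensional variant (assumed purely for illustration, not the actual problem; `longest_prefix`, `r`, `b` are made-up names): idea $i$ consumes $r_i$ units of the red board or $b_i$ units of the blue board, the boards have capacities $R$ and $B$, and we want the longest placeable prefix. Storing "minimum blue used" per amount of red used carries everything we need:

```python
# Illustration of "DP[used red] = min used blue" with prefix-order placement.
def longest_prefix(r, b, R, B):
    INF = float('inf')
    dp = [0] + [INF] * R  # dp[used_red] = min blue used, after 0 ideas
    best = 0
    for i in range(len(r)):
        ndp = [INF] * (R + 1)
        for used in range(R + 1):
            if dp[used] == INF:
                continue
            if used + r[i] <= R:  # place idea i on the red board
                ndp[used + r[i]] = min(ndp[used + r[i]], dp[used])
            if dp[used] + b[i] <= B:  # place idea i on the blue board
                ndp[used] = min(ndp[used], dp[used] + b[i])
        dp = ndp
        if all(x == INF for x in dp):
            break  # the prefix of length i+1 is unreachable
        best = i + 1
    return best
```

Note that among states with equal red usage, only the one with minimum blue usage survives; any other is dominated.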
Other problems using this idea:
- https://cses.fi/problemset/task/1653 (you can remove two states by domination)
- https://po2punkt0.kattis.com/problems/pickleclicker (there are 3 possible DPs in $O(NT)$. One of them is better than the other 2 and can be optimized to be fast enough)
- https://po.kattis.com/problems/minigolf2
## Once in a Blue Moon
### 24 point solution
A simple DP solves this problem for 24 points, by getting subtasks 1 and 2:
Imagine that for every day, we want to minimize the amount of money it took to get there. Since it also matters whether you have a ticket and for how long it remains valid, we include this as a state in the DP.
We define a function dp(index, validity_time_left) = the minimum amount of money needed from this state until the end of all days.
The minimum amount of money that Luna has to spend will then be dp(0, 0).
Now, we can write this function recursively with a set of rules:
```cpp
int mem[1001][1001]; // Assume we've assigned -1 to all values in mem
bool isWorkDay[1001]; // The days where Luna works are filled with "true" and the rest with "false"
bool isVisitDay[1001]; // The days where Luna visits Solomon are filled with "true" and the rest with "false"
vector<pair<int, int>> tickets; // The first value is the cost of the ticket and the second is the validity
int dp(int index, int validity_time_left){
    if (index == 1001) return 0;
    if (mem[index][validity_time_left] != -1) return mem[index][validity_time_left];
    int best = 1000000001;
    for (auto p : tickets) best = min(best, dp(index+1, max(validity_time_left-1, p.second-1)) + p.first/(1+isWorkDay[index]));
    if (isVisitDay[index] && validity_time_left == 0) return mem[index][validity_time_left] = best;
    return mem[index][validity_time_left] = min(best, dp(index+1, max(validity_time_left-1, 0)));
}
```
We try buying a ticket every day and see if it's better than not buying anything. However, if she has to visit Solomon today and doesn't have a valid ticket (validity_time_left == 0), she doesn't have the option of not buying a ticket.
It's also always better to buy tickets for half price rather than full price when possible.
This solution has a time complexity of $O(M \cdot \max(d_i) \cdot \max(g_i))$, which should give us 24 points.
### Full solution
For the full solution, we want to remove "validity_time_left" from the DP state. To do this, we formulate the DP a bit differently:
Assume, for every index, that we have reached this point with 0 validity_time_left. At this point we buy a ticket if and only if we need to visit Solomon. However, this doesn't yet take into account that we can buy tickets for half price on some days; the optimal solution might be to buy tickets on one of those days.
We should realise that there is no point in buying a ticket for a certain visit day earlier than the last work day before that visit day, since the available tickets are always the same. Taking this into account, we can formulate a DP function as follows:
```cpp
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
int n, m, k;
vector<pair<int, int>> tickets; // first: validity time, second: price
set<int> days; // all visit days
set<int> work; // all work days
ll biggestDays = -1, biggestWork = -1; //The last visit day and last work day respectively
const int MAXN = 1e6+1;
ll mem[MAXN];
ll dp(ll day){
    if (day > biggestDays) return 0;
    if (mem[day] != -1) return mem[day];
    ll best = 2e9;
    auto nextIt = days.lower_bound(day);
    ll nextDay = *nextIt;
    for (auto& p : tickets) best = min(best, p.second + dp(nextDay+p.first));
    // If we buy a ticket at normal price, it is always best to do this on the day of a visit.
    // Now we check if there exists a viable work day before the next visit day, and
    // whether buying a half-price ticket that day is better than buying on the visit day itself.
    if (work.empty()) return mem[day] = best;
    auto it = work.upper_bound(nextDay);
    if (it == work.begin()) return mem[day] = best;
    it--;
    ll nextWork = *it;
    if (nextWork <= nextDay) for (auto& p : tickets) if (nextWork + p.first > day) best = min(best, p.second/2 + dp(nextWork+p.first));
    return mem[day] = best;
}
```
You get the answer by calling dp(0). Be careful of off-by-one errors that can occur during the implementation.
The time complexity is $O(N \cdot (\log N + \log K + M))$, which with the given limits is just $O(N \log N)$.
## Stack Construction
As you can probably guess, greedy doesn't work. However, it also seems hard to do a DP: if our state is the current stack contents, we will surely get TLE.
Instead, our state will be the current position in the string and the character $C$ at the top of the stack. A transition then says "for this interval, something else will be on top of the stack", after which $C$ is on top again. So if we start at position $0$ with $C=S[0]$ ($S$ being the string to be printed), a transition saying that something else is on top of the stack for positions $1$ to $j$ splits our problem into two subproblems: solving $[1, j]$ and $[j+1, N]$. This motivates the state $(l, r, C)$, meaning that $C$ is the character at the top of the stack, and we need to find the least number of moves to print all characters in $[l, r]$. However, we can realize that $C$ is unnecessary; it must be $S[l]$.
The solution can be found [here](https://gist.github.com/Matistjati/5cad2e9bde07f3d600714ee13c48ef91).
To simplify the implementation, we don't pay for printing in the DP; we already know that we will print $N$ characters in any solution. Further, we pay for pushing and popping at the same time, since the stack must be empty at the end, so every push is eventually matched by a pop.
## Installing apps
This problem is a little famous. You can read about other solutions [here](https://codeforces.com/blog/entry/55794?#comment-397446).
Now, how can we solve this problem? If we knew the optimal order to install apps beforehand, and the only task was to find the largest subset we can install, this could be solved by a simple knapsack-like DP. Thus, can we somehow find the optimal order?
Let us consider the (unknown) subset of apps that it is optimal to install. Since our goal is to somehow find the best order, let us try swapping two of them. Denote app $i$'s preinstall size by $d_i$ and its install size by $s_i$. Suppose we want to check whether we should swap apps 1 and 2 (the argument works for any $i,j$).
Suppose our current installed size is $X$. When installing these, we will have the following 5 sizes
1. $X$ (nothing)
2. $X+d_1$ (preinstall 1)
3. $X+s_1$ (installed 1)
4. $X+s_1+d_2$ (preinstall 2)
5. $X+s_1+s_2$ (installed 2)
Now, we want to examine these to find if a swap would be an improvement.
First, note that we assumed apps 1 and 2 are part of the optimal subset. Thus, we know that the sum of all $s_i$ in the optimal set can be installed, so steps 1, 3 and 5 can never fail. If we swap apps 1 and 2, the only difference is that instead of having the sums $X+d_1$ and $X+s_1+d_2$, we would have $X+d_2$ and $X+s_2+d_1$. Since the goal of the problem is to never exceed $c$, it makes sense to choose the order with the smaller of
$$\max(X+d_1, X+s_1+d_2)$$ $$\max(X+d_2, X+s_2+d_1)$$
In fact, if we sort the input according to these swaps and then do a DP, we get accepted. It may feel strange, as we assumed that we only swap items that are part of the optimal solution. However, that isn't a problem: the DP will ignore the suboptimal items, and they do not change the relative order of the items in the optimal subset.
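The exchange argument translates directly into a comparator: the shared term $X$ cancels, so app $x$ should come before app $y$ exactly when $\max(d_x, s_x + d_y) < \max(d_y, s_y + d_x)$. A sketch with made-up values (`better_first` is an illustrative helper; after sorting, a standard knapsack DP over this order finishes the job):

```python
from functools import cmp_to_key

# Each app is a (d, s) pair: d = preinstall size, s = install size.
def better_first(x, y):
    # x before y peaks at max(d_x, s_x + d_y); y before x peaks at max(d_y, s_y + d_x)
    a = max(x[0], x[1] + y[0])
    b = max(y[0], y[1] + x[0])
    return -1 if a < b else (1 if a > b else 0)

apps = [(3, 3), (5, 2)]  # hypothetical (d, s) values
apps.sort(key=cmp_to_key(better_first))
print(apps)  # [(5, 2), (3, 3)]: this order peaks at 5 extra space, the other at 8
```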
There are other ways of interpreting what we sort by. For those, read the codeforces blog.
[Here](https://gist.github.com/Matistjati/f3b97195135ff86fe2181d9fa7c6feb6) is an implementation in C++.
## Gym aesthetics
### Subtask 1
Let's do a bitmask DP; for each weight plate, we keep track of whether it is on the stack or not. Let's first sort all weight plates in decreasing order. Then, we have two kinds of transitions:
- Remove the lightest plate from the stack. This corresponds to disabling the largest bit in the bitmask.
- Add a plate to the stack. Because the stack must have decreasing weights, this corresponds to enabling a bit larger than the current largest.
We also need to keep track of the current lift. If our current weight stack has the correct weight, we can advance to the next lift.
The problem with this DP is that states can depend cyclically on each other. If that is the case, we can't use the normal methods of evaluation, as they implicitly depend on the states forming a DAG. Instead, we will have to use Dijkstra to evaluate the DP! I strongly recommend that you understand why; it will bring deeper intuition for what you're really doing with DP.
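To help build that intuition, here is a minimal, generic sketch (illustrative only, not the contest solution; `eval_dp` is a made-up helper) of evaluating dp[state] = min over transitions of (dp[prev] + cost) when the state graph may contain cycles. With non-negative costs this is exactly Dijkstra: the state popped with the smallest tentative value can never be improved by a cycle, so a valid evaluation order exists even without a DAG.

```python
import heapq

def eval_dp(start, transitions):
    # transitions[state] = list of (next_state, cost), cost >= 0
    dp = {start: 0}
    pq = [(0, start)]
    while pq:
        d, s = heapq.heappop(pq)
        if d > dp.get(s, float('inf')):
            continue  # stale entry
        for t, c in transitions.get(s, []):
            if d + c < dp.get(t, float('inf')):
                dp[t] = d + c
                heapq.heappush(pq, (d + c, t))
    return dp
```

For the actual problem, the "states" are (bitmask, current lift) pairs and the transitions are the add/remove moves above.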
Code can be found [here](https://gist.github.com/Matistjati/0a9cd1f8b50b2ffe0887408acc6604e3). Note that the implementation of Dijkstra here is worse by a constant factor.
### Subtask 2
We need to first realize that the constraints in the input are misleading. Since the bar weighs 20kg and must be symmetric, we can change every lift $x$ to $\frac{x-20}{2}$, and only keep track of one side of the bar. Further, as every weight plate has a weight divisible by 10, we can change every weight lift to $\frac{x-20}{20}$, and every plate $p$ to $\frac{p}{10}$ (If some lift has a weight not divisible by 10, it is impossible. Since it is given that it is always possible, this can't happen). Then, the largest possible lift after rescaling is in fact 39. This is quite small.
Can there really be that many possible weight stacks with a certain total weight? It turns out that the number of possible weight stacks equals the number of [partitions](https://en.wikipedia.org/wiki/Integer_partition) of a number (as the weight stack has to be sorted). If we look at [OEIS](https://oeis.org/A000041/list), we find that $p(39) = 31185$, which is comparatively small. This motivates a DP with state (current lift, current stack). The transitions are then to change the stack into a different one for the next lift. This runs in something like $O(N \cdot p(39)^2 \cdot 39)$. Unfortunately, this is too slow.
For subtask 2, the lifts are smaller, and we instead get $p(24)=1575$. If implemented with a decent constant factor, you solve subtask 2.
For some details: we first precompute all partitions with sum $\leq 24$ using the weights in the input. This can be done efficiently with a recursive function. Then, the cost of going from stack $a$ to $b$ is $|a|+|b|-2\cdot lcp(a,b)$, where $lcp(a,b)$ is the longest common prefix of $a$ and $b$. In words, we keep as many weight plates at the bottom of the stack as possible, then remove and re-add the rest.
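A sketch of both ingredients (for illustration this enumerates plain integer partitions with the made-up helpers `partitions` and `cost`; the real solution would restrict the parts to the available plate weights):

```python
# Enumerate all partitions of s with parts in non-increasing order.
def partitions(s, largest=None):
    if largest is None:
        largest = s
    if s == 0:
        yield []
        return
    for part in range(min(s, largest), 0, -1):
        for rest in partitions(s - part, part):
            yield [part] + rest

# Cost of rebuilding stack a into stack b: |a| + |b| - 2*lcp(a, b),
# i.e. keep the shared bottom of the stack, remove and re-add the rest.
def cost(a, b):
    lcp = 0
    while lcp < len(a) and lcp < len(b) and a[lcp] == b[lcp]:
        lcp += 1
    return len(a) + len(b) - 2 * lcp
```

For example, the number of partitions of 24 is 1575, matching $p(24)$ above.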
### Subtask 3
We somehow need to optimize the previous solution. It had $O(N \cdot p(39))$ states, and $O(39\cdot p(39))$ work per state for the transitions. A good candidate for optimization is to reduce the cost of the transitions. Let us examine how the DP could be implemented in subtask 2:
```cpp
// cost of going from stack a to b is sz(a)+sz(b)-2*lcp(a,b)
auto cost = [](vi& a, vi& b)
{
    int lcp = 0;
    while (lcp < sz(a) && lcp < sz(b) && a[lcp] == b[lcp]) lcp++;
    return sz(a) + sz(b) - 2 * lcp;
};
// we have precomputed ways[s] = a list of all partitions with sum s
vector<pair<int,int>> cur; // current states: (cost, stack index)
cur.emplace_back(0, 0);
for (int i = 1; i < n; i++) // which lift
{
    vector<pair<int,int>> nextdp;
    for (int j = 0; j < sz(ways[targets[i]]); j++) // for every stack with the wanted sum
    {
        vi& way = ways[targets[i]][j]; // some stack
        int mincost = inf;
        // find the cheapest way to get this stack
        for (auto p : cur)
        {
            mincost = min(mincost, p.first + cost(ways[targets[i - 1]][p.second], way));
        }
        nextdp.emplace_back(mincost, j);
    }
    cur = nextdp;
}
```
The bottleneck in the code is the `for (auto p : cur)` loop. The cost of coming to $(i+1, a)$ from $(i,b)$ is
$$dp[i][b]+sz(a)+sz(b)-2\cdot lcp(a,b)$$
We can realize that $sz(a)$ is constant for a fixed $a$. So effectively, for some $a$, we want to minimize
$$(dp[i][b]+sz(b))-2\cdot lcp(a,b)$$
So it turns out that the only part that depends on both $a$ and $b$ is the LCP. What data structure is useful when considering prefixes? Tries! To do our transition in $O(39)$, we can build a trie over all previous DP states $(i, b)$. Then, we walk down the trie, following the path that $a$ gives. At each node, we consider letting the LCP stop there; then we want to minimize the $dp[i][b]+sz(b)$ part. To avoid visiting all nodes in the subtree, each node precomputes the smallest $dp[i][b]+sz(b)$ among its descendants.
All in all, we get approximately $O(N \cdot p(39) \cdot 39)$. There are other possible solutions. We can note that $p(x)$ grows faster than any polynomial, so it was very surprising to the judges that one contestant found a solution running in polynomial time, a little similar to stack construction. We are still not sure of the fastest possible solution; perhaps you can find a faster one?
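To make the trie trick concrete, here is a small generic sketch (an illustrative interface, not the judges' code): we store $val[b] = dp[i][b] + sz(b)$ for every previous stack $b$, and a query for stack $a$ returns $\min_b (val[b] - 2\cdot lcp(a,b))$ in time proportional to $|a|$.

```python
# Trie where each node stores the minimum val over all sequences in its subtree.
class Trie:
    def __init__(self):
        self.root = {'min': float('inf'), 'ch': {}}

    def insert(self, seq, val):
        node = self.root
        node['min'] = min(node['min'], val)
        for x in seq:
            node = node['ch'].setdefault(x, {'min': float('inf'), 'ch': {}})
            node['min'] = min(node['min'], val)

    def query(self, seq):
        # Walk down the path of seq. Every b in the node at depth d shares a
        # prefix of length >= d with seq; the deepest node containing b is at
        # depth exactly lcp(seq, b), where val[b] - 2*d is the exact cost.
        best = self.root['min']
        node = self.root
        for d, x in enumerate(seq):
            if x not in node['ch']:
                break
            node = node['ch'][x]
            best = min(best, node['min'] - 2 * (d + 1))
        return best
```

Shallower occurrences of a stack $b$ only give larger candidates, so taking the minimum along the walk is correct.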
Code can be found [here](https://gist.github.com/Matistjati/b1629e4ac0782559be66ae39875bef4b).