# [Longest Common Subsequence](https://po2punkt0.kattis.com/problems/lcs)

# Subtasks 1 and 2

To solve these subtasks, it suffices to implement the classic $O(|A||B|)$ dynamic programming solution. [Wikipedia](https://en.wikipedia.org/wiki/Longest_common_subsequence#Solution_for_two_sequences) has a decent explanation; you might find better resources by simply searching for "Longest Common Subsequence". Here is a simple solution in Python, which only prints the length:

```python
a = input()
b = input()
n = len(a)
m = len(b)

dp = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(n + 1):
    for j in range(m + 1):
        v = dp[i][j]
        if i < n:
            dp[i + 1][j] = max(dp[i + 1][j], v)
        if j < m:
            dp[i][j + 1] = max(dp[i][j + 1], v)
        if i < n and j < m and a[i] == b[j]:
            dp[i + 1][j + 1] = max(dp[i + 1][j + 1], v + 1)

print(dp[n][m])
```

# Subtask 3: length

It is widely believed that if both strings have length $N$, then no $O(N^{2-\epsilon})$ solution exists for any $\epsilon > 0$ in general. For a view into why: it is widely believed that SETH holds (roughly, that a certain problem is hard); see this [introduction](https://web.stanford.edu/class/cs354/scribe/lecture17.pdf). And if we could solve LCS in better than $O(N^2)$ time, we would [refute SETH](https://www.weizmann.ac.il/math/AmirAbboud/sites/math.AmirAbboud/files/uploads/W8LCS.pdf).

However, nothing tells us that we can't aim for $O(\frac{N^2}{64})$, where the factor of 64 comes from performing 64 transitions at once using bitwise parallelism (also known as bitsets). So let us see how to optimize LCS using bitsets: a seemingly preposterous proposition, as we store integers in the DP table, not 0/1.

This presentation is an approximate recreation of how I reinvented the algorithm, after reading the paper and having a hard time seeing how I would invent this myself. It is highly recommended that you pause and play around yourself so you understand what's happening.

Now, let's work up towards it. Recall that

$$DP[i][j]=\text{longest common subsequence of } A[0,i) \text{ and } B[0,j)$$

We can think of this DP as a longest path: we can always move right or up (increasing $i$ or $j$), and if $a[i] == b[j]$, we can move diagonally and gain 1 point. This is plotted below, where I have only drawn the diagonal arrows.

![image](https://hackmd.io/_uploads/SkmT3ySMZl.png)

(Note: all these images are a tiny bit fake, since you need an extra row and column at the top and right.)

One thing you can realize is that if we sort the arrows by x-coordinate, then the longest common subsequence is simply the longest increasing subsequence of the arrows' y-coordinates. This can be used to solve [NCPC 2024 Double Deck](https://open.kattis.com/problems/doubledeck). In general, the number of arrows might be quadratic, unfortunately.

## Insight 1: every row and column is nondecreasing

When we take some arrow and improve a cell, we can always take the (not drawn) 0-cost arrows to the right and fill up the cells. The same holds for columns.

## Insight 2: we only need the last row to compute the length

To get from one row to the next, we first handle the diagonal arrows. Then, we handle right-arrows by taking a prefix max. Note that the diagonal arrows must be iterated backwards, because we can take at most one diagonal arrow per row. Upwards arrows are applied implicitly.
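For concreteness, here is how the single row would evolve for a small, made-up pair of strings (entry $j$ of the row is the LCS of the processed prefix of $A$ with $B[0,j)$):

```
A = "bca", B = "abc"      (made-up example)
row after 'b':  0 0 1 1
row after 'c':  0 0 1 2
row after 'a':  0 1 1 2   -> answer = 2 ("bc")
```

The code below implements exactly this row-by-row update.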
```python
a = input()
b = input()
n = len(a)
m = len(b)

dp = [0] * (m + 1)
for i in range(n):
    # Diagonal arrows (iterated backwards)
    for j in range(m - 1, -1, -1):
        if a[i] == b[j]:
            dp[j + 1] = max(dp[j + 1], dp[j] + 1)
    # Right-arrows: prefix max
    for j in range(1, m + 1):
        dp[j] = max(dp[j], dp[j - 1])

print(dp[m])
```

## Insight 3: since every row is nondecreasing, it suffices to store a 1 at every entry where it increases

This is how we avoid storing $M$ full integers per row, and instead store $M$ bits, i.e. $\frac{M}{64}$ 64-bit words. Left: only the points where the DP increases. Right: the original DP.

![image](https://hackmd.io/_uploads/Hkvu7gBzWx.png)

Note that this representation is lossless: we could restore the real DP table by taking a prefix sum over each row.

Now, let's consider what the transitions look like in this new representation. To do that, let's first think about what applying arrows does to the original DP table.

![image](https://hackmd.io/_uploads/r13gHlHzWx.png)

In this example, applying the bottom-left arrow means taking its value + 1, which is just 1, and then taking all cells in the red region and raising their value to 1 if it is less than that. Now, how does this look in the bitwise representation?

![image](https://hackmd.io/_uploads/r1mWLlBGWg.png)

What we end up doing is grabbing the next 1 to the right of the arrow and placing it at the top-right of the arrow. However, with this representation, order matters a lot. Consider the following situation:

![image](https://hackmd.io/_uploads/HkjC8gSzbx.png)

Both blue arrows want to pull the green one. However, we must process them right to left so that the one is pulled all the way to the left.

If you're thinking hard, you might notice that currently, there is no way for any additional ones to appear. To remedy this, we make the size of the DP array $M+1$, and always ensure that there's a spare 1 at index $M$ (which we don't count at the end).

To simplify things mentally, let's shift the "bottom" string by one, and let the arrows point upwards.

![image](https://hackmd.io/_uploads/HJZ5gWHf-g.png)

At this point, I encourage you to implement an $O(|A||B|)$ DP with bits yourself, where you only keep the last row. This might be tricky, so [this](https://gist.github.com/Matistjati/60e4cfc0df0e1cb5bde0d8d9f9675dee) simple visualization script might be useful. If you really get stuck, an implementation can be found below.

:::info
Spoilers further down
.
.
.
.
.
.
.
.
.
.
:::

## Bitwise-optimizing it

From here on, it's going to get very fiddly. Make sure to understand exactly what this code does.

```python
a = input()
b = input()
b = '$' + b
n = len(a)
m = len(b)

dp = [0] * (m + 1)
for i in range(n):
    # Spare one to allow increases
    dp[m] = 1
    last_one = m
    arrow = [a[i] == b[j] for j in range(m)]
    for j in range(m - 1, -1, -1):
        if dp[j]:  # found a one
            last_one = j
        if arrow[j]:
            dp[last_one] = 0
            dp[j] = 1
            last_one = j

# For the answer, just count the number of ones, except the last, fake one
print(sum(dp[:-1]))
```

Notice the `arrow` array. While we don't strictly need it, it should be a clue for later: it's a bitmask! Since our alphabet has size 26, we can precompute all of these masks in $O(26N)$ time, which is cheap.
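As a small sketch of that precomputation (variable names here are my own), using Python's arbitrary-precision integers as the bitmasks:

```python
from collections import defaultdict

# One bitmask per character: bit j is set exactly when b[j] equals that character.
# Python ints play the role of the bitset here; a fast solution would store each
# mask as an array of 64-bit blocks instead.
arrow_mask = defaultdict(int)
for j, c in enumerate(b):
    arrow_mask[c] |= 1 << j

# In row i, the whole row's arrow bitmask is then simply arrow_mask[a[i]].
```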
At this point, it should be clear that all an arrow at position $i$ does is "pull the first one at index $j \geq i$ to $i$". How can we quickly implement this using bitwise operations? I once again encourage you to try yourself. Remember that we will split the array into blocks of 64 bits. A big hint is that there must be some cross-block dependencies: blocks are *not* independent! So it's impossible to construct a solution using only and, or, not, xor. Plotting a bit using the script above might be a good idea.

:::info
Spoilers further down
.
.
.
.
.
.
.
.
.
.
:::

First, just from looking at the code, we can notice that we never have to care about cells where there is both a one and an arrow (these arrows do nothing), so let's remove those arrows. Next, let's switch perspective and think: "I'm a one, where should I go?"

![image](https://hackmd.io/_uploads/ry9Vxzrf-l.png)

(In all the following images, the least significant bits are to the left.)

Well, to the rightmost arrow-one (set bit of the arrow mask) in its "territory" (marked red). So suppose we have a single one in the DP.

![image](https://hackmd.io/_uploads/ryeFgGBfbg.png)

Then, to obtain the new DP from the arrow mask, we need to isolate (zero everything except) the rightmost set one, marked red below.

![image](https://hackmd.io/_uploads/HJjr_MBM-g.png)

This turns out to be nontrivial. However, isolating the leftmost set one (the least significant set bit) is doable! So let's store our DP in reverse order.

![image](https://hackmd.io/_uploads/ryLXZfrGbg.png)

Now, our task is: for each block, isolate the lowest set bit. And then, zero out some suffix at the very end (remember that the "reserve" one will be on the left side since we reversed).

How do we isolate the lowest set bit, in general? Suppose we have `x=00001001`, where the least significant bit comes first. Then, `~x=11110110`. If we could somehow take `~x` and create a 1 at the index of the smallest set bit of `x`, we could just & them. To do this, we can simply add 1!

```
~x+1 = 00001110
x    = 00001001
&      00001000
```

So to summarize, to isolate the lowest set bit of `x`, do `x & (~x + 1)` (which, in two's complement, is the same as `x & -x`).

Now, how do we generalize this to multiple territories at once? It works to add a one at the start of every territory! And this can be done by just shifting the DP by one to the right, and then adding a 1 at the start.

![image](https://hackmd.io/_uploads/Sy8sLfHMZl.png)

And now, magically, the bottom row is our new DP row! Notice that we might get some junk at the end (around the last one). This can easily be fixed: just find the last one in the DP, and mask out all the junk.

And that's about it. I leave it as an exercise for you to implement this efficiently. It's not *that* bad, I promise. Oh, and remember [NCPC 2024 Double Deck](https://open.kattis.com/problems/doubledeck)? This algorithm is also fast enough to solve said problem!

# Subtask 3 part 2: reconstructing the answer

Reconstructing the answer turns out not to be that bad. We only need to modify our DP to return the entire last row instead of only the length, and we can then, with a factor-2 overhead, reconstruct the answer using Hirschberg's algorithm. [Blog](https://codeforces.com/blog/entry/57512).
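To make the recursion concrete, here is a minimal sketch of Hirschberg's scheme (function names are my own), written on top of the simple quadratic rolling-row DP from earlier; in the fast solution, the bitset row (turned back into numbers via a prefix sum) would replace `lcs_row`:

```python
def lcs_row(a, b):
    # Last row of the LCS DP: row[j] = LCS(a, b[:j]). The bitset version would go here.
    row = [0] * (len(b) + 1)
    for ch in a:
        prev, row = row, [0] * (len(b) + 1)
        for j, bj in enumerate(b):
            row[j + 1] = prev[j] + 1 if ch == bj else max(prev[j + 1], row[j])
    return row


def hirschberg(a, b):
    # Reconstructs one LCS of a and b, keeping only single rows of the DP.
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    left = lcs_row(a[:mid], b)               # LCS(a[:mid], b[:j]) for every j
    right = lcs_row(a[mid:][::-1], b[::-1])  # right[len(b) - j] = LCS(a[mid:], b[j:])
    # Pick the split point of b that maximizes the sum of the two halves.
    k = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    return hirschberg(a[:mid], b[:k]) + hirschberg(a[mid:], b[k:])
```

The total row-DP work over all recursion levels is roughly twice that of a single length computation, which is the factor-2 overhead mentioned above.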