(note to self: maybe add another column to the right of the lattice construction for higher-dimension SSP, $([0]^n \,\|\, [1])^T$, get $z_{n+1+m+1}$, and recover each $x_i = |z_i - z_{n+1+m+1}|/2$ for $i \in \{1, \dots, n\}$)
Vadim's talk at the 2nd BIU Winter School gives a good introduction to why we can use LLL for subset sum in theory; this note focuses on the "how exactly" in practice for engineers, as my digest of various related works and why they make sense.
Recall that the goal of lattice reduction algorithms is to find a nice basis (short and nearly orthogonal vectors) given an arbitrary basis of the lattice. More technically, they obtain a basis whose Gram-Schmidt vectors are not decreasing too quickly (which implies that the basis vectors are somewhat orthogonal to each other).
Theorem [1]: the first vector $b_1$ in an LLL-reduced basis is short: it has length at most $2^{(n-1)/2} \cdot \lambda_1(\mathcal{L})$, where $\lambda_1(\mathcal{L})$ is the length of a shortest non-zero vector in the lattice.
For now, we treat a lattice reduction algorithm like LLL as an oracle that returns a short basis of a lattice, and, amazingly, in practice it often even finds the shortest non-zero vector in the lattice, all in polynomial time.
Recall the Subset Sum Problem (SSP):
[SSP]: Given a multiset $A = \{a_1, \dots, a_n\}$ of positive integers and a target $s$ where $s = \sum_{i=1}^{n} x_i a_i$ for some $x \in \{0,1\}^n$, find a subset (i.e. some $x \in \{0,1\}^n$) that sums to $s$.
Brickell and Lagarias & Odlyzko showed that almost all low-density subset sum instances can be solved in poly time, where the density is defined as $d := n / \log_2(\max_i a_i)$; $d > 1$ generally means many subsets hit the same target sum; we are interested in $d < 1$.
SSP doesn’t constrain the solution to be unique, since it is tasked to find a solution, not “the” solution.
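As a quick concrete check of the density formula above (toy numbers that I made up for illustration):

```python
from math import log2

# density of a subset-sum instance: d = n / log2(max_i a_i)
a = [84, 29, 41, 66, 13]
d = len(a) / log2(max(a))
print(d)  # ≈ 0.78, comfortably below 1
```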
Utilizing LLL to solve SSP is essentially reducing the latter to the former. Here are some attempts that gradually build towards the one reduction that we’ll use. At the center, we need to construct a lattice basis $B$, given the SSP instance $(a_1, \dots, a_n; s)$, to feed into the LLL oracle such that the short vectors returned in the reduced basis correspond to the SSP solutions.
note: for $B \in \mathbb{Z}^{n \times m}$, $n$ is called the lattice dimension (rank) and $m$ is called the embedding dimension; in most cases in cryptography, people consider square, full-rank matrices. When we only use a single subscript ($B_n$), we are implicitly using a full-rank matrix.
notation: by convention (and in python), $B$ is a row matrix whose first row is $b_1$; but in lattice crypto, the convention is taking a column matrix as input, which we denote $B^T$.
First Attempt
Consider the basis built from the SSP instance (one row per $a_i$, plus one row for the target):

$$B = \begin{pmatrix} 1 & 0 & \cdots & 0 & a_1 \\ 0 & 1 & \cdots & 0 & a_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & a_n \\ 0 & 0 & \cdots & 0 & -s \end{pmatrix}$$

Now, any lattice point in $\mathcal{L}(B)$ can be expressed by an integer linear combination of the row vectors. Let’s denote the linear combinator $z \in \mathbb{Z}^{n+1}$:

$$zB = \left(z_1, \dots, z_n, \; \textstyle\sum_{i=1}^{n} z_i a_i - z_{n+1}\, s\right)$$
It’s not hard to verify that the actual solution $z^* = (x_1, \dots, x_n, 1)$ maps to a short vector $z^*B = (x_1, \dots, x_n, 0)$ in the lattice; thus we can hope that our LLL oracle finds it, and we can check this condition on the reduced $v$: $v_{n+1} = 0$ and $v_i \in \{0, 1\}$, then output $x_i = v_i$ as the solution. Voilà!
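To make the first attempt concrete, here is a small sketch (toy numbers and my own helper names) that builds this basis and checks that the solution’s combinator lands on a short lattice vector:

```python
def first_attempt_basis(a, s):
    # rows: b_i = (e_i | a_i) for i in 1..n, plus b_{n+1} = (0 | -s)
    n = len(a)
    B = [[int(i == j) for j in range(n)] + [a[i]] for i in range(n)]
    B.append([0] * n + [-s])
    return B

def combine(z, B):
    # the lattice point z * B (row convention)
    return [sum(zi * row[j] for zi, row in zip(z, B)) for j in range(len(B[0]))]

a = [3, 5, 8, 9]
s = 16                        # 3 + 5 + 8
x = [1, 1, 1, 0]              # the planted solution
v = combine(x + [1], first_attempt_basis(a, s))
print(v)  # [1, 1, 1, 0, 0]: entries in {0,1} and last coordinate 0
```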
Unfortunately, there’s a problem: there could be shorter vectors than $z^*B$, and the lattice oracle might return them instead!
If any one of the $a_i$ is small, then that row $b_i = (e_i \,|\, a_i)$ itself is already a short vector
if any two inputs are close ($a_i \approx a_j$), then the difference of their rows (i.e. $b_i - b_j$) is a short vector
more generally, if $\sum_i c_i a_i \approx 0$ for some small integer coeffs $c_i$, then $\sum_i c_i b_i$ is a short vector
In short, all of the above 3 cases present a high probability of our lattice oracle failing to produce the intended solution of SSP, and since we don’t have prior knowledge of or constraints on the values of $a_i$, we need to modify our $B$ to avoid these possibilities.
Second Attempt
The cases above can be mitigated with one simple trick: scale the last column by some large value $N$ (don’t worry about the exact bound, just think of a big number like 1000)!
Each row is no longer a short vector, and LLL needs to do more work to find an actually smaller vector in this lattice. The second concern is also mitigated, since the difference now carries $N(a_i - a_j)$ in its last coordinate and is thus unlikely to be shorter than our intended solution anymore.
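A quick numeric illustration of why the scaling helps (the instance and $N$ here are made-up toy values):

```python
from math import sqrt

a = [1000003, 1000001, 52, 88]   # a_1 and a_2 are close
N = 1000

# difference of the first two basis rows, without and with the scaled last column
diff_unscaled = [1, -1, 0, 0, a[0] - a[1]]        # (1, -1, 0, 0, 2)
diff_scaled = [1, -1, 0, 0, N * (a[0] - a[1])]    # (1, -1, 0, 0, 2000)

def norm(v):
    return sqrt(sum(c * c for c in v))

print(norm(diff_unscaled))  # ≈ 2.45: dangerously short
print(norm(diff_scaled))    # ≈ 2000: no longer competitive
```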
note: if $a_i = a_j$ exactly, then the second concern still stands (the scaled last coordinate of the difference is $N \cdot 0 = 0$); we will discuss this towards the end.
observe the entry $N \cdot s$: what if $\sum_i c_i a_i = s$ for some coefficients $c_i \notin \{0, 1\}$? Then there are more competing short vectors again.
(e.g.): $z' = (1, -1, 0, \dots, 0, 1)$ v.s. $z^* = (1, 1, 1, 1, 0, \dots, 0, 1)$ where $a_1 - a_2 = s$ and $a_1 + a_2 + a_3 + a_4 = s$. The former maps to a shorter vector ($\sqrt{2} < 2$) and is more likely to be returned by the LLL oracle, but it is not exactly what we wanted.
In short, the lesson is that we want to add even more constraints on the coefficients $z_i$ that contribute to the last term, so that a short $zB$ actually implies $z_i \in \{0, 1\}$ without many competing/alternative short vectors.
Final Attempt

The remaining fix is to re-center the solution coordinates around zero: replace the $1$s on the diagonal with $2$s and append an all-ones row, so that a valid solution shows up with all-$\pm 1$ coordinates:

$$B = \begin{pmatrix} 2 & 0 & \cdots & 0 & N a_1 \\ 0 & 2 & \cdots & 0 & N a_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 2 & N a_n \\ 1 & 1 & \cdots & 1 & N s \end{pmatrix}$$

For the solution $x$, the combinator $z = (x_1, \dots, x_n, -1)$ gives $zB = (2x_1 - 1, \dots, 2x_n - 1, N(\sum_i x_i a_i - s))$, whose first $n$ coordinates are all $\pm 1$ and whose last coordinate is $0$. This slightly modified reduction increases the range of SSP instances we can solve to any density $d < 0.9408$. But we skip the analysis, as it’s non-essential to us, and take it as given.

It’s not hard to verify that the foregoing condition (last coordinate $0$, all other coordinates $\pm 1$) guarantees a correct solution with overwhelming probability (since we avoid all of the 4 caveats above).

Say $v_i = 1$: then $x_i = 1$, since $2 \cdot 1 - 1 = 1$; in other words $x_i = (v_i + 1)/2$. Voilà! You can easily double-check the other case when $v_i = -1$.
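Putting it all together end to end: below is a minimal, unoptimized sketch (my own function names, a textbook LLL over exact fractions, and a toy instance) that builds the final basis (diagonal $2$s, all-ones last row, scaled last column), runs LLL, and scans the reduced rows for a $\pm 1$ vector with last coordinate $0$:

```python
from fractions import Fraction

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(B):
    # returns the orthogonalized vectors and the mu coefficients
    Bs, mu = [], [[Fraction(0)] * len(B) for _ in B]
    for i, b in enumerate(B):
        v = [Fraction(c) for c in b]
        for j in range(i):
            mu[i][j] = dot(b, Bs[j]) / dot(Bs[j], Bs[j])
            v = [x - mu[i][j] * y for x, y in zip(v, Bs[j])]
        Bs.append(v)
    return Bs, mu

def lll(B, delta=Fraction(3, 4)):
    # textbook LLL; recomputes Gram-Schmidt at every step (slow but simple)
    B = [list(row) for row in B]
    k, n = 1, len(B)
    while k < n:
        for j in range(k - 1, -1, -1):        # size reduction
            _, mu = gram_schmidt(B)
            q = round(mu[k][j])
            if q:
                B[k] = [x - q * y for x, y in zip(B[k], B[j])]
        Bs, mu = gram_schmidt(B)
        if dot(Bs[k], Bs[k]) >= (delta - mu[k][k - 1] ** 2) * dot(Bs[k - 1], Bs[k - 1]):
            k += 1                            # Lovász condition holds
        else:
            B[k - 1], B[k] = B[k], B[k - 1]
            k = max(k - 1, 1)
    return B

def solve_ssp(a, s, N=10**4):
    n = len(a)
    # basis rows: (2*e_i | N*a_i) for each i, plus the all-ones row (1..1 | N*s)
    B = [[2 * (i == j) for j in range(n)] + [N * a[i]] for i in range(n)]
    B.append([1] * n + [N * s])
    for row in lll(B):
        for v in (row, [-c for c in row]):    # try each reduced vector and its negation
            if v[n] == 0 and all(abs(c) == 1 for c in v[:n]):
                x = [(c + 1) // 2 for c in v[:n]]
                if sum(xi * ai for xi, ai in zip(x, a)) == s:
                    return x
    return None

a = [84, 29, 41, 66, 13]
s = 84 + 41 + 13              # planted solution (1, 0, 1, 0, 1)
x = solve_ssp(a, s)
print(x)
```

Note the candidate is only accepted after re-checking the subset sum, so a returned `x` is always a valid solution even if LLL surfaces an unexpected short vector first.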
Higher-dimension SSP
Up until now, we only considered the $a_i$ as scalars. What if they are vectors of higher dimension themselves, say $a_i \in \mathbb{Z}^m$? We could flatten out all the dimensions and use the same argument, inflating the embedding dimension to $n + m$ but keeping the same lattice dimension $n + 1$. Concretely:

$$B = \begin{pmatrix} 2 & 0 & \cdots & 0 & N a_{1,1} & \cdots & N a_{1,m} \\ 0 & 2 & \cdots & 0 & N a_{2,1} & \cdots & N a_{2,m} \\ \vdots & & \ddots & & \vdots & & \vdots \\ 0 & 0 & \cdots & 2 & N a_{n,1} & \cdots & N a_{n,m} \\ 1 & 1 & \cdots & 1 & N s_1 & \cdots & N s_m \end{pmatrix}$$
The condition check becomes: the last $m$ coordinates of the reduced vector are all $0$, and the first $n$ coordinates are all $\pm 1$.
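A sketch of the flattened construction (my own helper names, toy 2-dimensional values), checking that the planted solution still maps to a vector with $\pm 1$ in the first $n$ coordinates and $0$ in the last $m$:

```python
def highdim_basis(a, s, N):
    # a: n vectors, each of dimension m; s: target vector of dimension m
    # rows: (2*e_i | N*a_i) for each i, plus the all-ones row (1..1 | N*s)
    n = len(a)
    B = [[2 * (i == j) for j in range(n)] + [N * c for c in a[i]] for i in range(n)]
    B.append([1] * n + [N * c for c in s])
    return B

def combine(z, B):
    # the lattice point z * B (row convention)
    return [sum(zi * row[j] for zi, row in zip(z, B)) for j in range(len(B[0]))]

a = [[3, 1], [5, 2], [8, 7]]
x = [1, 0, 1]
s = [11, 8]                   # element-wise: [3,1] + [8,7]
v = combine(x + [-1], highdim_basis(a, s, N=1000))
print(v)  # [1, -1, 1, 0, 0]: first n coords ±1, last m coords 0
```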
TODO: add more analysis on the success rate or density bound?
Modular SSP
Sometimes, we are asked to solve the modular variant of SSP, where values and summations are carried out in $\mathbb{Z}_q$ instead of in $\mathbb{Z}$.
TODO: add adaptation here.
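While the adaptation is still a TODO, here is a hedged sketch of the standard trick (my guess at what belongs here, not taken from any particular source): absorb the modulus by appending one extra row that carries $Nq$ in the scaled column, so that the reduction can freely subtract multiples of $q$:

$$B = \begin{pmatrix} 2 & 0 & \cdots & 0 & N a_1 \\ 0 & 2 & \cdots & 0 & N a_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 2 & N a_n \\ 1 & 1 & \cdots & 1 & N s \\ 0 & 0 & \cdots & 0 & N q \end{pmatrix}$$

For $z = (x_1, \dots, x_n, -1, k)$ with $\sum_i x_i a_i \equiv s \pmod q$, choosing the integer $k$ such that $\sum_i x_i a_i - s + kq = 0$ again yields a lattice vector with $\pm 1$ coordinates and $0$ in the last column. Caveat: the rows are now linearly dependent ($n+2$ rows in $n+1$ columns), so one either uses a reduction variant that tolerates generating sets or first converts them to a full-rank basis.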
One more thing
Recall the edge case of duplicates, $a_i = a_j$, which goes outside the scope of the classical definition of an SSP instance and may cause competing short vectors.
If there aren’t too many (say a constant number) of these duplicates, then we can simply remove one duplicate $a_j$ and feed $(A \setminus \{a_j\},\, s)$ and $(A \setminus \{a_j\},\, s - a_j)$ to the current subset sum solver. Note the two instances correspond to the original solutions with $x_j = 0$ and $x_j = 1$. Since our reduction in the final attempt gives a poly-time solver, we would end up with a poly-time solver here too. Effectively, we are guessing $x_j$ and trading off run-time with success probability.
This approach also works if there is a logarithmic number of duplicates: guessing all possible combinations takes the power set of those duplicates, and logarithmic duplicates leads to $2^{O(\log n)} = \mathrm{poly}(n)$ tries, which still ends up with a poly-time solver.
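The guessing wrapper can be sketched as follows (my own names; `solver` stands in for the lattice-based subset-sum solver from the final attempt, here replaced by a brute-force stub so the example is self-contained):

```python
from itertools import combinations, product

def brute_solver(a, s):
    # stand-in for the LLL-based solver: try all subsets (exponential, demo only)
    for r in range(len(a) + 1):
        for idx in combinations(range(len(a)), r):
            if sum(a[i] for i in idx) == s:
                return [int(i in idx) for i in range(len(a))]
    return None

def solve_guessing(a, dup_idx, s, solver):
    # guess the solution bits of the duplicated positions: 2^|dup_idx| solver calls
    rest_idx = [i for i in range(len(a)) if i not in dup_idx]
    rest = [a[i] for i in rest_idx]
    for bits in product((0, 1), repeat=len(dup_idx)):
        t = s - sum(b * a[i] for b, i in zip(bits, dup_idx))
        y = solver(rest, t)
        if y is not None:
            # stitch the guessed bits and the solver's output back together
            x = [0] * len(a)
            for i, b in zip(dup_idx, bits):
                x[i] = b
            for i, b in zip(rest_idx, y):
                x[i] = b
            return x
    return None

a = [7, 7, 12, 5]             # positions 0 and 1 are duplicates
x = solve_guessing(a, [0, 1], 19, brute_solver)
print(x)  # a valid solution: sums to 19
```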
However, if we go beyond a log number of duplicates, then in practice, instead of suffering from super-poly runtime, we might choose to suffer a slightly higher failure probability from the subset solver instead. Additionally, we can check more than just $b_1$: also the first few reduced basis vectors, since there’s a likelihood that the first reduced basis vector is the shortest competing vector while the second shortest corresponds to our subset solution. In terms of bounds and guarantees, it’s beyond my depth for now, but some empirical statistical evaluations might enlighten us.