We thank the reviewer for the opportunity to carry on a dialog. Below, we clarify the remaining misunderstanding and highlight the difference between Drop-DTW and the methods the reviewer has now referenced.
>* I want to highlight that LCSS is not required to be greedy. In fact, most references to LCSS discuss that it is best implemented using dynamic programming, which can obtain optimal solution.
We agree. LCSS, like all the other referenced sequence alignment methods, uses dynamic programming to find the optimal solution. However, **the outlier rejection step in LCSS is greedy**: it is performed before alignment and is not part of the dynamic programming formulation. In contrast, Drop-DTW solves inlier alignment and outlier rejection jointly in a single optimization problem (see Equation 2).
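To make the distinction concrete, here is a minimal sketch of a dynamic program in which dropping an element is one of the transitions, so that matching and rejection are decided together. It is a simplified illustration and not Algorithm 1 itself: the function name `align_with_drops`, the precomputed matrix `cost`, and the single constant `drop_cost` are all assumptions made for this example.

```python
import math


def align_with_drops(cost, drop_cost):
    """Illustrative sketch only (not Algorithm 1 from the paper).

    cost[i][j] is a hypothetical cost of matching element i of the first
    sequence with element j of the second; drop_cost is a hypothetical
    constant penalty for dropping any single element of either sequence.
    Returns the minimum total cost over all match/drop decisions.
    """
    n, m = len(cost), len(cost[0])
    # match[i][j]: best cost for prefixes (i, j) when elements i-1 and j-1
    # are matched to each other.
    match = [[math.inf] * (m + 1) for _ in range(n + 1)]
    # best[i][j]: best cost for prefixes (i, j), whatever the last step was.
    best = [[math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        best[i][0] = i * drop_cost  # drop the whole prefix of sequence 1
    for j in range(1, m + 1):
        best[0][j] = j * drop_cost  # drop the whole prefix of sequence 2

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A match may start fresh or extend a previous match of either
            # element, which permits one-to-many correspondences.
            match[i][j] = cost[i - 1][j - 1] + min(
                best[i - 1][j - 1], match[i - 1][j], match[i][j - 1]
            )
            # Dropping competes with matching inside the same recursion.
            best[i][j] = min(
                match[i][j],
                best[i - 1][j] + drop_cost,  # drop from sequence 1
                best[i][j - 1] + drop_cost,  # drop from sequence 2
            )
    return best[n][m]
```

For example, with `cost = [[0.0, 5.0], [5.0, 5.0], [5.0, 0.0]]` and `drop_cost = 1.0`, the sketch returns `1.0`: the first and last elements of the first sequence are matched at zero cost and the middle one is dropped. No separate pre-filtering step decides which elements are outliers.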
>* This paper proposes an DTW-based approach for shape matching (they call it MVM) with ability to skip outliers.
MVM indeed allows skipping outliers inside the dynamic program; however, unlike Drop-DTW, **MVM does not allow outlier rejection in both sequences** and assumes that one of the sequences contains no outliers. This makes it impossible to apply MVM to cross-domain representation learning (see Section 4.4). Moreover, to perform outlier rejection, **MVM restricts the space of possible matches to one-to-one matches only**. This is exactly what the Needleman-Wunsch algorithm does, and we compared Drop-DTW to Needleman-Wunsch in our response to Reviewer wLu7. For convenience, we copy the comparison of Drop-DTW and Needleman-Wunsch on the task of step localization here.
| Method | CrossTask Recall | CrossTask Acc | CrossTask IoU |
| --- | --- | --- | --- |
| Needleman-Wunsch | 43.8 | 68.4 | 29.4 |
| Drop-DTW | **48.9** | **71.3** | **34.2** |
It is clear that Drop-DTW outperforms this family of alignment algorithms thanks to its more flexible **one-to-many** matching, which is essential for step localization. **These gains are only possible thanks to the Algorithm 1 formulation!** This demonstrates that Algorithm 1 is not just novel but essential for solving step localization via alignment.
>*beam search and N-best solutions can be obtained using methods such as in [a], to find more globally optimal solutions with respect to metrics not easily expressed recursively
>[a] N-best maximal decoders for part models Dennis Park, Deva Ramanan ICCV, 2012.
The N-best paper considers the problem of matching the elements of a set to one another. However, **this is not the problem we are solving in this paper!** We solve alignment between two sequences in the presence of outliers. Since [a] considers a single set, rather than a pair, and is agnostic to the order of elements in the set, we do not see how [a] can be applied to the problem of sequence alignment.
## Summary
Above, we have demonstrated that Algorithm 1, one of our main contributions, enables step localization by alignment, has no existing alternative, and achieves state-of-the-art results. Contrary to the reviewer's (repeated) claims, we do not consider differentiability to be one of our main contributions. Differentiability is simply a useful consequence of Algorithm 1, which naturally admits the min approximation proposed by Hadji et al. [1] or other differentiable min approximations (see our initial response).
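As an illustration of what a differentiable min approximation can look like, the sketch below uses the log-sum-exp soft minimum, one common choice. It is not claimed to be the approximation of [1]; the function name `smooth_min` and the temperature parameter `gamma` are assumptions made for this example.

```python
import math


def smooth_min(values, gamma=1.0):
    """Log-sum-exp soft minimum: -gamma * log(sum(exp(-v / gamma))).

    Illustrative only; `gamma` is a hypothetical temperature. The result
    approaches min(values) as gamma -> 0 and never exceeds it.
    """
    lowest = min(values)
    # Subtract the minimum before exponentiating for numerical stability.
    total = sum(math.exp(-(v - lowest) / gamma) for v in values)
    return lowest - gamma * math.log(total)
```

Replacing each hard `min` in a dynamic-programming recursion with such a function makes the final alignment cost a smooth function of the pairwise costs, which is the sense in which differentiability follows from the recursion rather than requiring a separate formulation.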