# Rebuttal AAAI
# Reviewer #5
Regarding the VisDA and Office-Home datasets, they are used in homogeneous domain adaptation (DA) and are thus not particularly relevant to our experiments on heterogeneous DA.
We also find that the suggested references are neither relevant nor comparable to our work. First, their main interest is in domain-adversarial-learning-based approaches to the UDA problem. By contrast, our focus is on an optimal-transport-based method, for which DA is one illustrative application among others. Second, even from the DA viewpoint, we are interested in heterogeneous DA, whereas the methods proposed in the two references are only applicable to homogeneous DA.
# Reviewer #6
> 2. {Strengths and Weaknesses}
> - Methods are not enough explained (I feel that Alg.1 is not implementable only with this paper)
Algorithm 1 is in fact implementable. Due to the space limit of the main paper, we only present its main steps; all necessary implementation details are provided in the supplementary material.
>
> 3. {Questions for the Authors}
> - In Definition 1, $\mu^s\in \mathcal{M}^+(\mathcal{X}^f)$, but it seems to be from the sample measure? i.e., $\mu^s\in \mathcal{M}^+(\mathcal{X}^s)$?
Indeed, this is a typo. We thank the reviewer for pointing it out and will correct it in the revised version.
> - How to define scalar functions in practice (for example, MNIST experiments in the experiment section) ?
In practice, the scalar functions $\xi_1(x_1^{s}, x_1^f)$ and $\xi_2(x_2^{s}, x_2^f)$ are simply the coordinates of the input matrices $X$ and $Y$, respectively. For example, in the MNIST experiments, the source data are represented as a matrix $X$ of size $1000 \times 784$, and similarly the target (noisy) data are represented as a matrix $Y$ of size $1050 \times 784$.
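To make this concrete, here is a minimal sketch (using NumPy with random placeholder data in place of the actual MNIST matrices; the helper names `xi1` and `xi2` are illustrative, not from our implementation) of how the scalar functions reduce to coordinate reads:

```python
import numpy as np

rng = np.random.default_rng(0)

# Source and target data matrices (MNIST setup: 28x28 images flattened to 784 features).
X = rng.random((1000, 784))  # source: 1000 samples
Y = rng.random((1050, 784))  # target (noisy): 1050 samples

# The scalar functions are simply coordinate reads of the input matrices:
# xi_1(x_i^s, x_k^f) = X[i, k]  and  xi_2(x_j^s, x_l^f) = Y[j, l].
def xi1(i, k):
    return X[i, k]

def xi2(j, l):
    return Y[j, l]
```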
> - Similar to above, how to define the joint cost and how does it work (with scalar functions)? In Def 1, the joint cost is not explicitly used to define UCOOT.
The joint cost can be found in Def 2. More precisely,
$c((x_1^s, x_1^f), (x_2^s, x_2^f)) = |\xi_1(x_1^s, x_1^f) - \xi_2(x_2^s, x_2^f)|^p$, for some $p \geq 1$. In our experiments, we use $p=2$.
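As a hedged sketch of this definition (NumPy, with toy-sized random matrices rather than the full MNIST data, since the full 4-way cost tensor would be very large), the joint cost with $p=2$ can be evaluated entry-wise by broadcasting:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 4))   # toy source matrix (samples x features)
Y = rng.random((6, 3))   # toy target matrix

p = 2  # exponent used in our experiments

# cost[i, j, k, l] = |xi_1(x_i^s, x_k^f) - xi_2(x_j^s, x_l^f)|^p = |X[i, k] - Y[j, l]|^p
cost = np.abs(X[:, None, :, None] - Y[None, :, None, :]) ** p
```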
> - I recommend the authors to insert row names in Table 2 (as in Table 1) to make the Table 2 as self-contained as possible.
We thank the reviewer for the suggestion. We will insert the row names into Table 2 in the revised version.
# Reviewer #7
> 3. {Questions for the Authors}
> - Is it fair to compare the robustness of COOT and UCOOT on the handcrafted modifications of MNIST dataset (as described in the first paragraph of the Experiments Section)?
We believe that the comparison is fair. The purpose of the handcrafted modifications of MNIST is precisely to illustrate the ability of the method to detect and discard pure-noise features and samples, thanks to the unbalanced formulation of UCOOT.
> - What is the rationale for selecting the Caltech-Office dataset?
The Caltech-Office dataset contains images with different vectorial representations (vectors in $\mathbb R^{1024}$ and $\mathbb R^{4096}$) from different domains (Webcam, Amazon, and Caltech-256), and thus fits perfectly with our purpose of comparing incomparable spaces.
> - Table 3: According to the table, UCOOT outperforms COOT for 6 (and not 7) of the nine dataset pairs? Also, given the large variance values (due to the unsupervised HDA) it is hard to see a clear superiority of UCOOT?
It is true that UCOOT exhibits high variance in HDA, but this is not specific to UCOOT; it reflects the difficulty of the HDA problem itself. All HDA methods in the literature exhibit large variance.
> - Table 3: why does COOT perform poorly for the pair "W -> C"?
This might be due to the hyper-parameter tuning process. As the hyper-parameters for each method are validated on the pair $W \to W$, there is no guarantee that the selected hyper-parameters will yield good performance on other pairs. However, since the validation procedure is the same for all methods, the comparison remains fair (even if not optimized for every pair).
# Reviewer #8
Authors' response to all of the reviewer's comments:
In fact, UCOOT is not a special case of COOT but a generalization and relaxation of it. More precisely, one can recover COOT as a limiting case of UCOOT by letting $\lambda_1, \lambda_2 \to \infty$ in Def 3. In particular, while COOT requires the couplings to satisfy the marginal constraints exactly (i.e., $\int_X d\pi = \mu$), UCOOT does not; instead, it uses the KL divergence to measure the discrepancy between $\int_X d\pi$ and $\mu$. The hyper-parameters $\lambda_1$ and $\lambda_2$ associated with the KL terms allow for controlling the level of discrepancy. In COOT, since the marginal constraints must be met exactly, no divergence term appears in the formulation.
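The role of the KL terms can be illustrated with a minimal numerical sketch (NumPy; the `kl` helper is a generic generalized-KL divergence written for this illustration, not code from our implementation):

```python
import numpy as np

def kl(a, b, eps=1e-12):
    # Generalized KL divergence between nonnegative vectors a and b.
    return np.sum(a * np.log((a + eps) / (b + eps)) - a + b)

n, m = 4, 5
mu = np.full(n, 1.0 / n)   # source marginal
nu = np.full(m, 1.0 / m)   # target marginal

# A coupling that exactly respects the marginals (the balanced / COOT case):
pi = np.outer(mu, nu)
assert np.isclose(kl(pi.sum(axis=1), mu), 0.0)  # no marginal discrepancy

# A perturbed coupling that discards some mass: the KL term becomes positive.
# In UCOOT this discrepancy is penalized with weight lambda_1 rather than forbidden,
# and letting lambda_1, lambda_2 -> infinity forces it back to zero (recovering COOT).
pi_relaxed = pi * 0.8
assert kl(pi_relaxed.sum(axis=1), mu) > 0.0
```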
Regarding the figures on page 7, we will rework the plots for better visibility. We thank the reviewer for the suggestion.
# Confidential box
Dear Senior Program Committee member,
We are writing to discuss the review provided by Reviewer 5. In particular, we believe that Reviewer 5 submitted a highly irrelevant and low-quality review that asks us to evaluate on benchmarks and compare to baselines that do not correspond to the learning setup considered in our work.
We would like to ask the Senior Program Committee member to discard this review from the reviewing process, as we believe that, given the reviewer's effort in writing the review and their level of investment, they can hardly be considered an expert capable of evaluating our work.
# Message to SPCs and ACs
Dear Senior Program Committee members and Area Chairs,
We would like to inform you about a case of potentially unethical behavior from one of the reviewers of our submission 10906. In particular, we believe that Reviewer 5 not only submitted a highly irrelevant and low-quality review that asks us to evaluate on benchmarks and compare to baselines that do not correspond to the learning setup considered in our work, but also did so by copy-pasting their review from another submission 10832.
While the two submissions present completely unrelated research ideas with only one overlapping author, Reviewer 5's reviews for them are strikingly similar, as shown below:
*Review for Submission 10906*
>This paper proposes CO-optimal transport leading better alignments. The author also provides a theoretical result. Experiments demonstrates the effectiveness of proposed method.
>
>+ This paper uses a novel CO-optimal transport
>+ Experiments demonstrate the effectiveness of proposed methods.
>- Lack comparison on standard benchmarks, such as VisDA and Office-Home.
>- Lack comparison with recent works, [1], [2]
>
>[1] Wei et al, ToAlign: Task-oriented Alignment for Unsupervised Domain Adaptation, in NeurIPS 2021.
>[2] Wei et al. MetaAlign: Coordinating Domain Alignment and Classification for Unsupervised Domain Adaptation, in CVPR 2021
*Review for Submission 10832*
>This paper proposes a principally new OT-based approach to DA which uses the closed-form solution of the OT problem given by an affine mapping and learns a embedding space. Experiments on DA benchmarks demonstrate the effectiveness of proposed method.
>
>+ This paper uses a novel OT-based approach for DA.
>+ Experiments demonstrate the effectiveness of proposed methods.
>- Lack comparison on standard benchmarks, such as VisDA and Office-Home.
>- Lack comparison with recent works, [1], [2]
>
>[1] Wei et al, ToAlign: Task-oriented Alignment for Unsupervised Domain Adaptation, in NeurIPS 2021.
>[2] Wei et al. MetaAlign: Coordinating Domain Alignment and Classification for Unsupervised Domain Adaptation, in CVPR 2021
We would like to ask the Senior Program Committee members and Area Chairs to discard this review from the reviewing process for both papers, as we believe that, given the reviewer's effort in writing the review and their level of investment, they can hardly be considered an expert capable of evaluating our work.