This paper addresses a weakly supervised Domain Adaptation (DA) problem on time series data. We thank the reviewers for their helpful comments and suggestions for improvement. Typos and unclear formulations will be fixed if the paper is accepted. We first discuss the main points, then more minor remarks:
# Performance and comparison with state of the art (R9,R10)
We disagree with R9's statement "However, the proposed model fails to perform on the standard datasets". On benchmark datasets with no global time shift, such as HAR, MAD performs on par with the best competitor (tab1). When a time shift is present, which is precisely the scenario we target, MAD provides the best performance (tab2). Admittedly, we did not include all possible competitors. We ran additional experiments showing that (C-)MAD performs better than AdvSKM (average acc: 74.6 (miniTM) and 94.2 (TarnBrittany) / tab2). CoDats has been shown to outperform AdvSKM on (H)HAR datasets [1].
# Missing references (R9,R10)
We thank the reviewers for pointing out interesting and recent references. DA for time series is indeed an important issue that has received a lot of attention recently. At submission time, we were not aware of [2] (published Oct. 2022), which uses OT for DA. It shares some similarities with MAD, as it looks for class-dependent alignments, but it also differs fundamentally, as it tackles the supervised DA setting. Our DeepJDOT-DTW baseline, a time-series-specific method adapted from a reference OT-based DA technique, appears to be a more suitable baseline for MAD.
[3] considers a related problem: time series forecasting. While DA and time series forecasting share some similarities, the method proposed in [3] cannot be straightforwardly extended to DA.
# Knowledge of the target label proportion (R2)
(C-)MAD requires knowing target label proportions. It is true that R-DANN and VRADA do not make this assumption so the comparison may not be completely fair. To our knowledge, the only baseline in that context is CoDats-WS.
By its formulation, MAD is directly affected by the proportion drift between the two domains: the larger the drift, the more performance can degrade. We propose to report figures for the case where such knowledge is not available. We could not rerun the whole set of experiments, but the impact should be rather limited on the considered datasets, as class proportions are similar between the source and target distributions (see the CoDats paper for label proportions).
As discussed in our "Experimental setup" section, one could take inspiration from Fatras et al. (2021) and use unbalanced OT to tackle this limitation; this is left for future work.
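To make the notion of proportion drift concrete, here is a minimal sketch of how it could be quantified. It is illustrative only and not taken from our code: the function names `label_proportions` and `proportion_drift` are hypothetical, and the choice of total variation distance as the drift measure is an assumption made for this example.

```python
import numpy as np


def label_proportions(labels, n_classes):
    """Empirical class proportions of an integer label array (hypothetical helper)."""
    counts = np.bincount(np.asarray(labels), minlength=n_classes)
    return counts / counts.sum()


def proportion_drift(p_source, p_target):
    """Total variation distance between two class-proportion vectors.

    Assumption: total variation is used here only as one possible drift
    measure. It is 0 for identical proportions and 1 for disjoint supports.
    """
    p_source = np.asarray(p_source, dtype=float)
    p_target = np.asarray(p_target, dtype=float)
    return 0.5 * np.abs(p_source - p_target).sum()


# Source proportions estimated from source labels; target proportions are
# assumed to be given, as in the weakly supervised setting discussed above.
p_s = label_proportions([0, 0, 1, 1], n_classes=2)
p_t = np.array([0.25, 0.75])
drift = proportion_drift(p_s, p_t)
```

A value close to 0 corresponds to the situation described above, where class proportions are similar in both domains.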
# Other comments
- R2, ablation study: OT cannot be straightforwardly extended to the case of different time lengths. Only a metric such as DTW allows comparing series with different lengths
- R2, choice of α and β: we set α so that the value of its corresponding loss lies within the same range as the value of the classification loss. We set β to be a tenth of this value
- R2, link fig3-tab2: the class separation is more obvious for MAD, whereas for CoDATS it needs more than 2 dimensions to be effective
- R2, advantages of C-MAD wrt MAD: it is true that in most of the reported scenarios, the time shift is global for all the classes. In that case, C-MAD and MAD have similar performances. When there are different shifts, C-MAD outperforms MAD (case FR1→DK1, +10pts of acc). When investigating DTW paths for different classes in that context, one can indeed notice that they are different, a behavior that can be caught by C-MAD. We propose to discuss this further in the supp. material
- R9, assumption about the same label space: we do not aim to deal here with the open-set DA scenario, which is a different problem on which standard UDA solutions fail at discriminating the new classes
- R9, lack of temporal information in MAD: temporal alignment is performed *before* the pooling, hence temporal consistency is kept (cf fig2)
- R9, background and motivation: thanks for pointing out unclear statements. We made the connection between OT, which aligns samples, and DTW, which aligns timestamps, and we combine them in an integrated formulation dedicated to DA for time series. OT is now a state-of-the-art method for comparing sample distributions, and we propose to recall some of its basics in more detail
- R9, code failure: in the file `dataset_extract.py`, change line 953 from `path_save=os.path.join("Dataset","UWAVE")` to `path_save=os.path.join("Dataset","Uwave")` and make sure you have the libraries `unrar` (system lib) and `rarfile` installed. Sorry for the inconvenience
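
Regarding the ablation-study remark above (DTW can compare series of different lengths), the following minimal sketch shows the textbook DTW recursion on two series of different lengths. It is a generic illustration, not the implementation used in MAD: the function name `dtw_cost` and the squared Euclidean ground cost are assumptions made for this example.

```python
import numpy as np


def dtw_cost(x, y):
    """Textbook DTW cost between series x (n, d) and y (m, d), n and m may differ.

    Assumption: squared Euclidean distance between timestamps as ground cost.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:  # treat 1-D input as a univariate series
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    # Pairwise squared distances between all timestamps of x and y
    dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # acc[i, j] = cost of the best alignment of x[:i] with y[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]


# A series of length 3 and a time-stretched version of length 5
short = [0.0, 1.0, 2.0]
stretched = [0.0, 0.0, 1.0, 1.0, 2.0]
cost = dtw_cost(short, stretched)
```

Here the two series have different lengths, yet the alignment cost is well defined, since each timestamp of one series may be matched to several timestamps of the other.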
[1] Ragab M. et al. ADATIME: A Benchmarking Suite for Domain Adaptation on Time Series Data (arXiv, 22)
[2] Ott F. et al. Domain adaptation for time series classification to mitigate covariate shift (ACM MM, Oct 22)
[3] Jin X. et al. Domain adaptation for time series forecasting via attention sharing (ICML, Jun 22)