# ECCV 2020 Rebuttal

## Reviewer 3

### Q1: Applying some constraints to the search space does not seem to be a good solution.

This is incorrect. We did not change any of the original settings of the DARTS search space or add constraints to it; DARTS itself already removes zero after the search phase. Our proposed approach nevertheless outperforms DARTS in terms of both the fairness of the competition among the operations and the performance of the searched architectures. Unlike skip connection, which is both overwhelming and indispensable, zero is not essential to the DARTS search space at all. Zero would never be chosen, since DARTS restricts every node to have exactly two incoming edges. Under this condition, removing zero from the candidate operation set does not practically shrink or constrain the search space. Because removing zero lets the other candidate operations compete with each other instead of being suppressed, it is reasonable to remove zero from the candidate operation set.

### Q2: Can the search improve giving a fair chance of each operation?

The reviewer provided an improper suggestion unrelated to the context of this paper. The results clearly show that our approach improves the performance. The reviewer has a serious misunderstanding. Two-stage search frameworks use the performance of one-shot architectures as feedback to update the sampler (an RL agent or a genetic algorithm). Therefore, the performance estimate of each child architecture, or at least the ranking of the child architectures, has to be accurate. However, the one-shot performance is not as important for one-stage search methods like DARTS. Instead, the fairness of the competition among the candidate operations during the joint optimization in the search phase is far more important. Therefore, our proposed approach gives the operations fair chances to compete with each other, instead of all being suppressed by zero or dominated by operations with unfair advantages.
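To make the derivation rule referred to in Q1 concrete, the following is a minimal sketch of how a single node could be derived when zero is ignored and exactly two incoming edges are kept. It is a simplified illustration under our stated assumptions, not the official DARTS code: the function name `derive_node`, the reduced operation list, and the architecture parameters in `edge_alphas` are hypothetical and chosen only for this example.

```python
import math

# Reduced candidate set for illustration; "none" denotes the zero operation.
OPS = ["none", "skip_connect", "sep_conv_3x3", "sep_conv_5x5"]


def softmax(alphas):
    """Numerically stable softmax over one edge's architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def derive_node(edge_alphas, k=2):
    """Keep the k strongest incoming edges of one node, ignoring zero.

    edge_alphas maps a predecessor index to that edge's architecture
    parameters (one value per entry of OPS). Hypothetical helper.
    """
    scored = []
    for pred, alphas in edge_alphas.items():
        weights = softmax(alphas)
        # Zero is skipped when ranking, so it can never be selected.
        best_w, best_op = max(
            (w, op) for w, op in zip(weights, OPS) if op != "none"
        )
        scored.append((best_w, pred, best_op))
    scored.sort(reverse=True)
    return [(pred, op) for _, pred, op in scored[:k]]


# Made-up parameters in which zero has the largest weight on every edge.
edge_alphas = {
    0: [3.0, 0.5, 0.2, 0.1],
    1: [3.0, 0.1, 0.6, 0.2],
    2: [3.0, 0.3, 0.2, 0.1],
}
print(derive_node(edge_alphas))
```

With these made-up values, zero holds the largest softmax weight on all three edges, yet the derived node consists of `sep_conv_3x3` on edge 1 and `skip_connect` on edge 0. The selection is decided only by the small residual weights left to the other operations, which is why the fairness of their competition matters.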
Additionally, please note that PDARTS and PC-DARTS also freeze the architecture parameters for the first 15 epochs as a so-called "warm-up" phase. We analyzed the effect of the warm-up phase in our ablation study on PC-DARTS in Section 5.3. We observe that the warm-up phase indeed helps reduce the variance of the probabilities of each operation, and that our proposed approach further outperforms the baselines, as shown in Table 1.

### Q3: Zero dominance contradicts with other DARTS work.

The reviewer does not seem to be familiar with the implementation of DARTS. We would like to remind the reviewer that the phenomenon of favoring skip connection rests on the fact that "zero is excluded after the search phase", as we mentioned in Section 1. The official implementation of DARTS removes zero only when deriving the final architecture. If zero had not been removed, the dominant operation would definitely have been zero rather than skip connection.

### Q4: Inadequate experiment settings.

It is quite confusing that the reviewer questioned the adequacy of the experimental settings and provided inappropriate suggestions. It is rude and impolite to question the reproducibility without any proof. Reporting the mean and standard deviation of the errors with 10 random seeds from 4 different search runs was introduced by DARTS rather than invented by us. These settings are widely adopted for the evaluation phase by the follow-up works based on DARTS. We now address the inappropriate suggestions individually.

1. A random baseline is not necessary in our work, since we are only trying to improve the fairness of the search process rather than proposing a new search method.
2. As described in Section 5.1, we adopt the same hyperparameter settings and augmentations for all baselines in the evaluation phase, and therefore the training methods introduce no unfair impact.
3. Differentiable NAS methods do not benefit from NASBench-201, and our setting of 10 random seeds is far more than the 3 used by NASBench. As a result, NASBench does not seem more advantageous than our settings.
4. Although results of searching on ImageNet would definitely make our paper more self-contained, the baselines, including DARTS and PDARTS, are not designed to search directly on it. Therefore, it is infeasible to compare against the baselines adopted in our work when searching on ImageNet.
5. Since we are not proposing a new search method, experiments conducted on the 6 search spaces introduced in RobustDARTS cannot demonstrate the effectiveness of our proposed approach. Besides, only one of the 6 search spaces contains zero in the candidate operation set, and therefore the rest of the spaces are not appropriate for our propositions. Furthermore, like the reviewer, the authors of RobustDARTS in fact did not notice that zero is removed when deriving the architecture.

## Reviewer 2

### Q1: Why zero dominate?

We consider zero domination to be as fundamental as the widely recognized skip connection problem, and it causes an even more severe issue. [1] introduced the concept of "unfair advantages" for skip connection, since skip connection was proposed to allow the gradients of deeper layers to flow back to the shallower layers. As a result, the gradient-based search in DARTS prefers skip connection to the other operations. Under the domination of zero, on the other hand, the search phase prefers zero far more than all the other operations, including skip connection, since zero reduces the gradients passed through it to 0.

### Q2: Could it be because the search is conducted on CIFAR10 with 32x32 images? What if the search is conducted on larger resolution images?

The 5x5 separable convolutions (sep5) are not preferred by the search phase due to their over-sized receptive fields.
If the experiments are conducted on larger resolution images, the results will be more consistent with our assumption that operations with more parameters are preferred by the search phase. Additionally, please note that we apply our Proposition 2 to both the 3x3 and the 5x5 separable convolutions, and thus it is effective for both small and large resolution images.

### Q3: The proposed fix is very restrictive by design and provides no guarantees on how it would behave if the search space contains other operations (Eg conv3x3, sepconv1x1).

There is actually no restriction on adding new operations to our framework. The main idea of Proposition 2 is to remove the unfair advantages that the search space or the candidate operation settings give to certain operations. Since DARTS formulates NAS as a joint optimization problem, many settings cannot be directly inherited from previous works. Some adjustments are required to maintain the fairness of the competition among the operations. The more important value and implication of Proposition 2 is therefore to balance the candidate operations and guarantee that no operation possesses unfair advantages during the search process. Consequently, only fairness has to be ensured for newly added operations.

[1] Chu et al., Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search, arXiv

## Reviewer 1

### Q1: The phenomena reported was covered in several previous work. SNAS was the first work in the literature that unveiled the bias towards zero operation in DARTS in the literature.

1. We strongly disagree. Although they also noticed this problem, the method they use is fundamentally different from ours.
2. The code was only released this month.
3. Even when we run this code, there are still errors.
4. SNAS mentions zero bias only in one or two sentences and does not discuss the problem carefully at all. Our paper examines the zero bias problem thoroughly through experiments and highlights its severity, none of which SNAS provides.
5. The original focus of SNAS was not to handle zero bias; the two lines discussing zero bias were added only after the ICLR OpenReview discussion, and on that basis it claims to have handled the zero bias problem. It is rude and unfair of the reviewer to use this as the standard for criticizing our paper, which provides many figures and experiments showing the impact of zero bias.

### Q2: In differentiable architecture search, the removal of zero operation seems not necessary. For example, SNAS finally recovers from the dominance of zero operation. Authors should provide justification for the operation selection strategy they proposed. For example, why zero operation always dominates in DARTS but not in SNAS?

We strongly disagree that zero domination was covered in several previous works. SNAS did not analyze the zero domination problem and merely mentions it in a footnote. The reviewer failed to provide any other reference for "the several works that covered the problem". Moreover, zero domination was in fact raised by a reviewer of SNAS at ICLR'19. While SNAS lacks further analysis and ignores the problem of zero domination, we analyze it thoroughly with qualitative and quantitative experiments and highlight its significance.

In addition, SNAS did not open-source its search code until two months ago (March 2020), and hence there was no official reference for reproducing the results reported by SNAS. Based on the recently released code, we ran SNAS and tried to reproduce its results. In our experiments, however, the normal cell contains only shallow connections to the input nodes and the reduction cell is full of zero. It is hard to be convinced that SNAS is able to recover from zero domination: the normal cells degrade to shallow architectures because of zero domination, while the reduction cells do not contain any connected edge at all. Therefore, it is unfair and incorrect for the reviewer to criticize our paper based on this inappropriate standard.

### Q3: Experiments only demonstrate the phenomena of unfairness, without showing why this selection strategy can necessarily bring about advantage.
We strongly disagree with the reviewer; this is an incorrect attitude.

1. We did present results, yet the reviewer directly criticized us for not doing so, which is very improper and unprofessional.
2. The reviewer criticized our work without carefully reading our experimental results, which is impolite.

We strongly disagree with the reviewer. We have clearly demonstrated the effectiveness and advantages of our proposed approach, and we showed in the experiments in Section 5 that our methodology outperforms the baselines. It is very unprofessional of the reviewer to simply criticize our results without providing any constructive comment or suggestion.