==+== Review Readiness
==-== Enter "Ready" if the review is ready for others to see:

Not Ready

==+== A. Overall merit
==-== Choices:
==-== 1. Reject
==-== 2. Weak reject
==-== 3. Weak accept
==-== 4. Accept
==-== 5. Accept, award quality
==-== Enter the number of your choice:

3

==+== B. Reviewer expertise
==-== (hidden from authors)
==-== Choices:
==-== X. I am an expert on this topic (know the related work well)
==-== Y. I am knowledgeable on this topic
==-== Z. I am an informed outsider
==-== Enter the letter of your choice:

Y

==+== C. Paper summary
==-== Please summarize the paper briefly in your own words.
==-== Markdown styling and LaTeX math supported.

This paper describes and evaluates a novel approach for multi-objective optimization of configurable software systems that at each step only produces valid configurations. To achieve this, the authors present a set of change operators for modifying a configuration of a configurable software system that always result in valid configurations. The paper proves that, with these operators, all possible configurations can be explored while always staying within the valid configuration space. The approach is implemented using MDEOptimiser, yielding a graphical representation of the change operators for improved interpretability. The paper compares this implementation to MODAGAME and SATIBEA, two state-of-the-art approaches, on a set of popular but artificial case studies. The results show that the approach at hand yields statistically significantly better results in all cases, but only by a substantial margin in one case, while taking 3 to 32 times longer.

==+== D. Strengths and weaknesses
==-== Please provide a short list of the paper's strengths and
==-== weaknesses.
==-== Markdown styling and LaTeX math supported.

Strengths

- proof of soundness and completeness of the change operators
- comparison with two other approaches on many subject systems
- statistical analysis of the results, accompanied by detailed plots
- new approach

Weaknesses

- artificial data used for the experiments
- no interactions within the data
- effect size of the improvements not addressed
- misleadingly suggests a fast execution time of the approach

==+== E. Comments for the authors
==-== Please provide constructive feedback to the authors justifying
==-== your score.
==-==
==-== Consider using the following aspects:
==-==
==-== * *Soundness*: The extent to which the paper's contributions are
==-== supported by rigorous application of appropriate research methods
==-== * *Significance*: The extent to which the paper's contributions
==-== are important with respect to open software engineering challenges
==-== * *Novelty*: The extent to which the contribution is sufficiently
==-== original and is clearly explained with respect to the
==-== state-of-the-art
==-== * *Recoverability, Replicability and Reproducibility*: The extent
==-== to which the paper shared information and artifacts that are
==-== practical and reasonable to share. Note that this depends on the
==-== type of paper, for example, qualitative interview transcripts
==-== often cannot be released due to de-identification risk or industry
==-== data may contain trade secrets.
==-== * *Presentation*: The extent to which the paper's quality of
==-== writing includes clear descriptions and explanations, adequate use
==-== of the English language, absence of major ambiguity, clearly
==-== readable figures and tables, and adherence to the formatting
==-== instructions provided below
==-== Markdown styling and LaTeX math supported.
## Significance

The authors report progress on an important topic. Multi-objective optimization is an ongoing challenge of practical relevance, e.g., to find Linux kernel configurations that minimize byte footprint, RAM demand, and energy consumption.

## Novelty

Although striving for valid configurations during MOO has been part of approaches such as SATIBEA, to the best of my knowledge, there has not been an approach so far that optimizes only within the valid configuration space. Hence, the approach at hand has a novel aspect, while keeping often-used aspects of evaluation and the general approach of genetic optimization.

## Soundness

The research methods applied to study the subject were suitable in general, such as running 30 repetitions for randomized optimization algorithms and conducting significance tests.

### External Validity

This paper chooses an experiment setup similar to that of the papers chosen as comparison approaches (including subject systems and quality attributes). Although this brings the benefit of high comparability, it also introduces major flaws:

1. No interactions. Only features are attributed with their influence on the chosen NFPs. Hence, the target function to optimize is linear, whereas interactions between features have been shown to be influential, too [1].
2. Artificial values for feature influences were chosen as given by one of the comparison approaches. However, it is unrealistic that the distribution of all feature influences follows a normal distribution (as they were generated).

For single-objective optimization, it has been shown that the solution quality of the approaches deviates if interactions and realistic NFP value distributions are considered [1]. The results reported in this paper show an improvement on artificial test data, but an improvement on real data is questionable.

### Effect Size

The paper reports a significant improvement concerning the hypervolume quality metric when using this paper's approach. However, the improvement is only marginal for most systems, even considering that a small improvement in hypervolume may have a considerable practical impact. For example, for BerkeleyDB, MDEO has a median of 0.51 while MODAGAME also has a median of 0.51 (with a higher standard deviation, admittedly). For that reason, I conducted an effect size analysis with the Vargha and Delaney $A_{12}$ measure, which is often used for comparing MOO approaches (e.g., in the SATIBEA paper). The table below shows a reproduction of the Mann-Whitney U test, which matched the results presented in the paper. The $A_{12}$ metric shows that MDEO can be expected to perform better for all systems and compared approaches, with MODAGAME almost tying for tankwar. This analysis should have been done in the paper to convince the reader that the marginal difference in mean HV is relevant.
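For transparency, here is a minimal sketch of how the p-values and $A_{12}$ values in the table were computed (assuming SciPy's `mannwhitneyu`; the sample lists are illustrative placeholders, the real inputs are the 30 hypervolume values per approach and system, and the full script is in the repository linked after the table):

```python
# Minimal sketch (not the actual analysis script): two-sided
# Mann-Whitney U test plus the Vargha-Delaney A12 effect size.
from scipy.stats import mannwhitneyu

def a12(x, y):
    # A12 = P(X > Y) + 0.5 * P(X == Y), estimated by pairwise comparison.
    greater = sum(1 for xi in x for yi in y if xi > yi)
    ties = sum(1 for xi in x for yi in y if xi == yi)
    return (greater + 0.5 * ties) / (len(x) * len(y))

# Placeholder samples; in the analysis these are the per-repetition
# hypervolume values of MDEO and a competing approach on one system.
hv_mdeo = [0.51, 0.52, 0.50, 0.53]
hv_other = [0.48, 0.50, 0.47, 0.49]

stat, p = mannwhitneyu(hv_mdeo, hv_other, alternative="two-sided")
print(f"p = {p:.4g}, A12 = {a12(hv_mdeo, hv_other):.3f}")
```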
| System           | p (MODAGAME) | p (SATIBEA) | $A_{12}$ (MODAGAME) | $A_{12}$ (SATIBEA) |
|:-----------------|-------------:|------------:|--------------------:|-------------------:|
| x264             | 1.35424e-14  | 6.00962e-13 | 1                   | 1                  |
| berkeleydbmemory | 4.29462e-13  | 8.52638e-13 | 1                   | 1                  |
| wget             | 5.45736e-12  | 1.46143e-11 | 1                   | 1                  |
| sensornetwork    | 6.43181e-10  | 1.50993e-11 | 0.956667            | 1                  |
| tankwar          | 0.0726596    | 1.50993e-11 | 0.61                | 1                  |
| weafqas          | 1.50993e-11  | 1.50993e-11 | 1                   | 1                  |

The code to generate this table is available at https://anonymous.4open.science/r/4381212f-e518-47ee-88ea-e0c1c34c457d/

### Framing

MDEO, the approach presented in this paper, achieves slightly better results than MODAGAME and SATIBEA, and Figure 6 shows that the new approach needs fewer evolutions to converge. However, even one evolution of MDEO takes longer than 50 evolutions of the competing approaches for the subject systems depicted in Figure 6 (b) - (f). Notably, when fitting a line for each approach, the line for MDEO has the highest intercept and the steepest slope. This suggests that both the operator generation (responsible for the intercept) and their application (responsible for the slope) need to be optimized. Although this has been discussed partially by the authors, the paragraph "convergence speed" in Section 5.2 and Section 6 are misleading, as they suggest that MDEO performs fast. In particular, the wording "convergence speed" suggests that fewer seconds pass for MDEO until it converges than for the other approaches. Similarly, stating that "CPCOs enable the search to converge *significantly* faster" is misleading, as it is not faster, but takes fewer iterations to converge, and even this property was not shown with a significance test. The paper should be adapted to state that *MDEO needs fewer evolutions to converge* or that it *yields better results with the same number of evolutions*, but that it is substantially slower in doing so.

### Minor Points

SPLConqueror is referenced as a state-of-the-art MOO approach; however, it optimizes for a single NFP, not for multiple at once. It should not be presented as an approach for MOO.

## Recoverability, Replicability and Reproducibility

As the table above shows, reproducibility and replicability are exemplary for this paper. The provided online resources contain the data for all repetitions and the program code to generate the figures in the paper.

## Presentation

The authors' writing is clear and understandable in general. However, not abbreviating "Section", "Figure", and similar terms would help the reading flow.

Minor mistakes / unclear passages:

- Section 3.1
  - Definition 1
    - If *par* defines **a** parent-child relationship, the feature model *fm* can only have one such relationship, *par* being in its tuple.
    - Similarly, each *xor*-group is a subset of F, yet Definition 1 states that "$xor \subset F$", allowing only a single xor-group. This should be rephrased such that *xor* is a set of xor-groups (see the sketch after this list; the same applies to *or*-groups).
  - List of constraint types (1) - (7)
    - (5) - (7) do not relate to pairs of features *f, g*; separating them from the rest of the list would help the reader.
- Section 3.4
  - "each rule contain**s**"
  - Description of *Act_SMSTransfer*: the text says the *selected* attribute is set from true to false, which seems to be the case for *De_Screen3* but not for *Act_SMSTransfer*; it should be "false to true".
- Figure 7
  - At the point this figure is presented, the reader will not remember for which attribute higher or lower is better.
  - In addition, the colors blur, especially in 7 (b), making it hard to draw any information from this plot.
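To illustrate the rephrasing suggested for Definition 1 (my notation, not the paper's): instead of $xor \subseteq F$, one could define

$$xor \subseteq \mathcal{P}(F), \quad \text{i.e., each xor-group } X \in xor \text{ is itself a set of features } X \subseteq F,$$

and analogously for *or*-groups.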
[1] Siegmund, Norbert, Stefan Sobernig, and Sven Apel. "Attributed Variability Models: Outside the Comfort Zone." In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017), 268-278. New York, NY, USA: ACM, 2017. https://doi.org/10.1145/3106237.3106251

==+== F. Questions for the authors' response
==-== Specific questions that could affect your accept/reject decision.
==-== Remember that the authors have limited space and have to respond
==-== to all reviewers. Please number your questions.
==-== Markdown styling and LaTeX math supported.

==+== G. Comments for PC
==-== (hidden from authors)
==-== Markdown styling and LaTeX math supported.

==+== H. Poster Presentation
==-== Should the authors be invited to present a poster at ESEC/FSE
==-== 2020, regardless of whether the paper is accepted or rejected?
==-== (hidden from authors)
==-== Choices:
==-== 1. No
==-== 2. Yes
==-== Enter the number of your choice:

(Your choice here)

==+== I. Level of expertise required to understand the paper
==-== Suppose this paper is accepted, what is the level of expertise
==-== required to understand this paper? This information may be used to
==-== schedule talks in the program.
==-== (hidden from authors)
==-== Choices:
==-== 1. Beginner
==-== 2. Intermediate
==-== 3. Advanced
==-== Enter the number of your choice:

3

==+== J. Industry relevance
==-== Suppose this paper is accepted, is the paper relevant for
==-== industry? This information may be used to schedule talks in the
==-== program.
==-== (hidden from authors)
==-== Choices:
==-== 1. No
==-== 2. Yes
==-== Enter the number of your choice:

1

==+== Scratchpad (for unsaved private notes)

==+== End Review