MACODA Benchmarking book chapter
===
###### tags: `Projects` `Benchmarking`

Summarizes and collects information and ToDos

[toc]

---

# General Info

:::warning
* Deadline: 11.02.21
:::

* 27.01.21, 09:00am: next meeting
* Meeting details: Skype

# TODOs

* ~~Links in diagram (VV)~~
* Checklist review and links to pitfalls (VV)
* ~~Formatting of Chapter 3 (distance between paragraphs, emph) (BN)~~
* ~~Check for missing points in checklist (BN)~~ One added
* ~~Chapter 1 (BN)~~ Revised, comments added.
* ~~Aspects such as noise, many objectives, and invalid constraints are not as commonly present. Look up what came out of the survey chapter and refer to it (last bullet point in chp 4) (Koen)~~ Revised sentence
* ~~Check [5] references for detail (VV)~~
* Check missing references (everyone)
* Make sure references are in the common bib file and in the correct format (Koen)
* ~~Address leftover ToDos in chapter (BN)~~

---

# Reviews

## Dimo (editor) 2nd review

### page 3

* ~~"not 'is'?"~~
    * ~~BN: unclear to me. It depends which "survey" we are referring to here, and this ties in with the question of which item we are referencing at the end of the prior paragraph. Any idea? Which book chapter?~~
    * ~~Koen: The reference in the preceding paragraph in the LaTeX source points to the real-world questionnaire chapter if I am not mistaken, but I don't think that makes sense for that sentence, nor for the paragraph with this comment... Was this really what we wanted to refer to?~~
    * ~~Done: changed the reference in the paragraph before back to the big chapter of TBB, changed "is" to "was"~~

### page 4

* ~~"as well as to identify (?)"~~
    * KvB: done
* "A new abbreviation not used anywhere else in the book, I guess: shall we change it or include the new term in the glossary?"
    * BN: Check which abbreviation to use here. Suggestion: do not use a corresponding abbreviation; MMO mainly shows up on two pages, where we can also distinguish more carefully between multi- and many-objective optimisation. **I will check ...**
* "In my view, there is a bit too much focus on EC methods here (and also above), although it does not have to be: what is actually different between "classical" multiobjective algorithms and evolutionary ones when it comes to benchmarking? In COCO/BBOB, there is actually none..."
    * BN: not sure how to handle. There surely is a difference regarding benchmarking if stochastic and non-stochastic methods are considered. But can we consider all "classical" multiobjective algorithms to be non-stochastic? Should I check with Dimo? **Did: revised the paragraphs, since most of the things said there also hold for optimisation and multi-objective optimisation in general and do not solely apply to evolutionary methods. Will check ...**
* ~~"I am a bit surprised to see a reference to another chapter when it comes to motivation - within the motivation section!"~~
    * ~~BN: To me this is referring to the motivation for the questionnaire. Thus, I guess it's okay, but I will check with Dimo.~~ Solved

### page 5

* ~~typo "algorithms"~~
    * BN: corrected

### page 6

* "no CEC suite here?"
    * BN: Was there a reason for not adding the CEC suites here? Possibly because they are not available in a bundle like the other suites; they are not really suites but a loose composition of functions? Suggestion to extend the collection in the table to the CEC competitions; how about the inverted DTLZ suite, etc.? Dani, can you do so? Let's also discuss putting more information into the table, as suggested by Dimo on page 13 (see below).
    * DI: I added some CEC test problems. I can't find some of the older ones because the links are already dead, so I added some recent CEC test problems.
    * Also I found these for real-world problems:
        * @online{Wan2019, Author = {Handing Wang}, Howpublished = {https://handingwang.github.io/DDMOP/}, Note = {[Online; accessed 22 August 2020]}, Title = {{Competition on Online Data-Driven Multi-Objective Optimization}}}
        * @techreport{Kum2020, Author = {Abhishek Kumar and Guohua Wu and Mostafa Z. Ali and Qizhang Luo and Rammohan Mallipeddi and Ponnuthurai Nagaratnam Suganthan and Swagatam Das}, Institution = {Nanyang Technological University, Singapore}, Title = {Guidelines for Real-World Multi-Objective Constrained Optimisation Competition}, Year = {2020}}
* **ToDo: Forward the text from Dimo to Dani and check whether everything is considered.**

### page 7

* ~~"features"~~
    * ~~BN: to me, "performance of algorithms" is correct here. Will change as discussed with Dimo.~~ Done

### page 9

* ~~"is this the right reference and not Figure 5?"~~
    * KvB: corrected

### page 10

* "just a side remark: COMO-CMA-ES achieves the theoretically possible convergence rate in terms of hypervolume."
    * BN: leave as is, ignore :)
* ~~"do"~~
    * BN: corrected

### page 13

* ~~"Here, and in the other table, it will be nice to see the number and type of variables, number of constraints, etc. in addition to the number of objectives. Especially for the real-world suites, a rough idea of the evaluation times will be helpful as well."~~
    * ~~BN: I basically agree with Dimo, but this would blow up the table. I think we should leave them as they are. What do you think? Possibly add an attachment with "full" tables? But to me this is not necessary. (see above, page 6)~~
    * ~~Koen: It may be possible to do so without taking too much space by putting this information in the 'Description' column (in a consistent format that is easy to parse). The possible issue is that not all papers may clearly indicate each of them, so they may be a pain to find...~~
    * ~~Dani: I think this is not so useful because each problem in a suite can have a different number of variables, constraints, etc. Even the number of objectives can differ and may even be configurable; however, I would argue that the number of objectives is relevant because the chapter is on multiobjective optimisation, while the number of variables and constraints is not as important to the topic.~~
    * ~~Dani: The number of variables may be interesting if we list the functions one by one to address which test functions can be considered large-scale and which cannot. But the chapter is not on that topic; as far as I remember, "large-scale" is only briefly mentioned.~~
* **Ignore**

### page 15

* ~~"What I miss is the discussion of actual shortcomings of most existing (and still frequently used) benchmark functions such as distance and position variables, Pareto sets aligned with the (box-)constraints, the absence of irregularities, mostly separable problems, ..."~~
    * ~~BN: As we don't address features of specific suites but more conceptual pitfalls, I'm unsure how to handle this comment. Vanessa, any idea? Not discussed with Dimo (yet?)~~
    * ~~Koen: I'm not sure anymore what our aim was with this subsection, but my feeling is that it was not designed to provide an exhaustive list of shortcomings, but rather to highlight some. I also think we already mention some specific shortcomings of the type he asks for (e.g. constraints are rarely considered). So I guess I see two possible directions for addressing this:~~
        * ~~Add a sentence explaining that we don't aim to be exhaustive here~~
        * ~~**TODO Koen: Add a sentence (or two) listing 'other common issues', i.e. the ones he suggested**~~
* Done

### page 16

* ~~"you should probably introduce "benchmark" in front of problems here."~~
    * ~~BN: Don't get it. We're talking about benchmarking for the whole chapter now, and on page 16 we are asked to introduce "benchmark"?!? Any idea? This is just a kind of typo, will correct.~~ Done

### page 24

* "you can also specify different targets of interest in COCO, but it's less obvious to do."
    * BN: So we ignore? :)
    * Koen: Sounds good to me ; )
    * **Ignore**

---

## Dimo (editor)

Main remark: Almost all parts of the chapter are generic to multiobjective optimisation. So, a deeper discussion of what is different in many-objective optimisation would be helpful in the book.

Minor comments are still available in the provided PDF. Here I just list the more important comments.

### Page 2

Personally, I'm not sure this unrevised arXiv paper should be cited here that prominently. There are definitely wrong statements inside?

* [VV+BN] Specify a specific chapter if possible, or alternatives, possibly from the arXiv paper

Competitions have other goals and test different things, e.g. a clear winner, favouring good programmers, etc.

* [BN] add clear winner

### Page 3

Here, we should refer to the corresponding book chapter?

* [BN] check with Dimo

### Page 5

The NFL is not an argument in the context of benchmarking.

* [DI] Remove NFL citation

No practically relevant set of problems provides the properties for an NFL ...

* [DI] Remove NFL citation and edit text accordingly

### Page 6

Why listed here? And not the many other, more common suites (list of unreadable examples, incl. COPS, other CEC suites ...)?

* [DI] Remove single-objective suites, add other suites to table

### Page 7

Not correct, only for a few functions! [DI: fixed]

Why is this easy? [DI: removed "easy"]

### Page 8

The "max" are ... search space --> I don't see why the problem is simple to solve ... [DI: removed "easy"]

### Page 9

Careful with the comparison here: there is a big difference between discrete and continuous variables. 20 continuous variables are difficult, but 20 binary variables create only about a million search points, which can be enumerated easily if the objective function is fast enough to evaluate. [DI: mentioned continuous vs discrete]
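A quick back-of-the-envelope check of the numbers in this remark (our note, not part of the review): with 20 binary variables the search space contains

$$
2^{20} = 1\,048\,576 \approx 10^6
$$

candidate solutions, which can indeed be enumerated if evaluations are cheap, whereas a continuous domain such as $[0,1]^{20}$ cannot be enumerated at all.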
Is this true? I thought there are always m-1 position/distribution variables? [DI: fixed to m-1]

Will be nice to define this [DI: added]

What I miss are some critical thoughts on which of the described properties are actually desired and which are very artificial. This results directly in suggestions for improved test suites. What about a description of the properties of the BBOB bi-objective problems? [DI: this is answered in the section on shortcomings, I think]

### Page 13

Note that by now the definitions of the functions have become public.

### Page 14

I am missing an actual discussion of the shortcomings here - in terms of how likely the available test suites are to cover / offer difficulties observed in practice. What comes immediately to my mind are the existence(?) of distance variables and Pareto sets aligned to the search space boundary. ~~I would agree that in the context of single-objective optimisation, best practices have already been established for years now, see BBOB/COCO, data/performance profiles.~~

### Page 16

~~"it is not clear" is a strong statement here, at least for the COCO/BBOB functions; compare appendix A "How we choose the test functions" of the latest Optimization Methods and Software paper.~~

### Page 17

~~You could stress your point by bringing in the example of the CUTE/CUTEr/CUTEst problems, of which a large subset has fixed low dimension and only a few are non-linear, non-convex, ... --> "more functions are not always better to have"~~

### Page 19

~~This is relatively fair if all algorithms that are tuned are tuned on functions not in the benchmark suite~~ ...

### Page 25

- making it difficult to use standard measures for performance: is it really difficult? What I miss are concrete suggestions of what to do (knowing that this has been done already, so it cannot be too difficult)
- not easy to combine benchmarking functions: why not? Many platforms (jMetal, PISA, MOEA Framework, ...) have implemented most known test problems already - I don't see why you mention the opposite
- set of popular benchmarks is small: for the bbob-biobj suite it seems very difficult to overfit, because there are 55 functions with different instances and different dimensions
- ease of use: why is this an open problem? Not a problem for bbob
- I find [5] referenced a bit too often. This is not peer-reviewed yet, and it is not even clear why this chapter here is needed if everything is already discussed elsewhere
- some solutions implemented in benchmarking software already: but they are already!
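A possible aside for our response to the bbob-biobj points above (our sketch, not part of the review): enumerating the 55 bbob-biobj function classes takes only a few lines with COCO's `cocoex` Python module. This assumes a working COCO installation; exact attribute names may differ between versions.

```python
import cocoex  # experimentation module of the COCO platform

# Default bbob-biobj suite: 55 bi-objective function classes,
# instantiated in several dimensions and with several instances.
suite = cocoex.Suite("bbob-biobj", "", "")

for problem in suite:
    x = [0.0] * problem.dimension  # some point in the search space
    f = problem(x)                 # evaluates the objective vector at x
    print(problem.id, problem.dimension, problem.number_of_objectives, f)
```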
## Reviewer 1

Comments on chapter 'Benchmarking'

- the title should reflect the focus on multiobjective problems
- the abstract should be more concise; moreover, I don't understand why many-objective problems are more difficult to analyze than, say, multiobjective problems
- sec. 1: formal introduction of (multiobjective) terminology is missing; no formal definition of benchmark as it is understood here; there are a lot of papers defining the term 'benchmark' -- at least some of them should be cited; no distinction of benchmarks w.r.t. type of variables
- sec. 1.2: is this really necessary?
- sec. 2:
    - p. 5, 3rd line: "... should be noted that IN GENERAL no single ..."
    - p. 6, table 1: what is the purpose of the first 2 benchmarks for *single*-objective problems? If you cite these two, then you must also cite numerous others (or you must explain why these 2 are so important for the paper or field or ...) [DI: removed]
    - fig. 1 and 2 are weak; first, they illustrate trivial stuff (school math); second, they are about single-objective optimization and your focus is on multiobjective problems, isn't it? third, 'deceptive' problems are characterized by local minima with a huge region of attraction, whereas the global solution has a steep, narrow region of attraction. [DI: removed, changed to multimodal PF]
    - p. 7, next-to-last paragraph: the triangle-shaped Pareto fronts emerge only in the case of three objectives, right? A triangle-shaped front has a lower degree of conflict in the objectives, IMHO. [DI: removed comparison on difficulty, only description.]
    - p. 7, 4th line from bottom: I would replace "easy" by "less difficult". BTW, the path towards the front can be arbitrarily difficult. The shape of the front does not tell too much about difficulty. For example, DTLZ3 is more difficult than DTLZ2 although they have the same front. [DI: removed comparison on difficulty, only description.]
    - p. 8, fig. 4: is way too small. [separated into 3 figures so they are big enough]
    - the figures should have a more uniform style of presentation, and they should be placed closer to the text position from which they are referred. [some figures still differ in style, particularly the 3D figures.]
    - p. 8, 2nd line from bottom: "consensuses"? plural of consensus?
    - p. 9, center: distance and distribution variables should be introduced formally; please add a simple example. And I don't understand how additional objectives change the number of distance variables. [added example]
    - p. 11, table 2 should be moved to the bottom of the page.
    - p. 11, sec. 2.2, 2nd paragraph: don't start the sentence with a number ("[8] is not a noun").
    - **sec. 2.2 is too much prose, including anecdotal remarks**
- sec. 3: This section is the heart of the chapter. It would be even better if each point were elaborated and highlighted as a concise phrase ("take-home message") ... as if you were forced to put these points on a few slides.
- sec. 4: I am missing some requests for future work on benchmarks. Something like:
    + When proposing a test function, the author should explain which property of the algorithms is intended to be tested.
    + When compiling a benchmark, the author should (a) reveal the purpose of the benchmark, (b) make clear which specific performance measures are used and why, (c) justify the choice of the consensus rule when comparing and ranking several algorithms, and (d) explain why each specific test function has been added to the benchmark.
- references:
    - [5] What is that? A big technical report? Already published? Forthcoming? How can I get access?
    - [18] What is that? How can I get access?
    - [28] What is that? How can I get access?
    - [38] How can I get access? Is there a URL?

## Reviewer 2

This is an exciting chapter that gives an overview of benchmarking for multi- and many-objective problems. The paper is generally well written. I found section 3 on avoiding pitfalls particularly enlightening. I wonder if some of the advice given here can be captured in a flow diagram (or mind map) of some sort, such that a practitioner can easily follow the advice given.

A few minor points to consider:

* On page 5, third paragraph, second sentence: should '; Artificial benchmark' be '; artificial benchmark'?
* On page 5, the last line of the paragraph just before section 2.1: Is there a reference missing for the current research?
* Section 3.1, first paragraph, last line: did you mean "great importance"?