###### tags: `Supervision`

# Agenda for 29.07.2021 Meeting

## Experimental Setup for FairSMOTEBoost

1. I think testing over the 4 common datasets that are also used in AdaFair is sufficient: **"Bank", "Adult", "KDD", "Compas"** will be fine. In case of any unexpected behavior, or if we want further testing, we can also run experiments on other similar data like **"Credit_data", "NYPD_Complaint", "Diabetes"** or even other imbalanced datasets. If there is any suggestion, please let me know.
    > [The more datasets (and the more diverse), the better.]
2. I plan to use the measures from AdaFair for our comparisons as well: **Accuracy, Balanced_Accuracy, TPR_protected, TPR_non_protected, TNR_protected.** As the fairness measure, **Equal Opportunity** is my first candidate, but I need to read a bit more about parity-based measures and see which one makes the most sense to use (see the measures sketch after this list).
    > [Yes, the measures from AdaFair seem appropriate. Please also add Absolute Between-ROC Area (ABROCA); Tai uses it in his survey, you might ask him.]
3. We would have a **"parameter analysis"** comparing a static number of synthetic samples over growing values of **N_samples** (e.g. 5 different settings of N), and also compare these results with those of a proportional N. In the static setting, N is simply a coefficient of the number of samples in the minority class; in the proportional setting, I have in mind making N relative to the discrimination value that the "Protected_Positive" and "non_Protected_Positive" groups receive in each boosting round (see the N-per-round sketch after this list). The results could be analyzed with respect to both predictive and fairness performance. I expect to see better results for bigger N's and for the proportional setting, and I expect the proportional setting for N to show better convergence in terms of both fairness and predictive performance.
    > [You should always consider predictive performance and fairness performance in your evaluation, because your goal is to find a good and fair model. Experiments with different N values and a dynamically chosen N per round are needed.]
4. The parameter analysis could further include the stacked bar charts that I sent you, using them to describe the internal behavior of our method under the different settings of N, and maybe relating the differences in performance to the distribution of weights across the augmented groups.
5. A big sub-section of the Experiments would be the comparison of FairSMOTEBoost (our method) with the other algorithms, AdaFair, SMOTEBoost and RUSBoost, over the datasets mentioned above.
    > [Yes.]
6. I would expect our method to outperform SMOTEBoost and RUSBoost on the fairness measure, but I am not sure about predictive performance. AdaFair, I would say, will be a tough one to outperform; I rather expect comparable results against it.
    > [Let's see.]
7. I am also thinking of combining the predictive and the fairness performance in **radar charts** (for all four methods; see the radar-chart sketch after this list).
    > [Try it and see if it works; it is hard to tell in advance which chart type is best.]
8. Meanwhile, I'll also **look for other new methods** to see if there are any further candidates to compare with.
    > [Please find the related work and write how the methods compare to and differentiate from each other.]
9. Depending on how much effort it takes, I might also add a couple of popular fairness methods (like Krasanakis et al. and Zafar et al.). This is just an idea; I'm not sure yet whether it makes sense.
    > [Vasilis has already used them, so you just need to call them, I guess.]
10. A very important aspect is also to update the fairness measure used in the body of the algorithm (the measure that is used for boosting based on the error and fairness values).
    > [I am not sure what you mean; proposing a new measure?]
    > [In any case, what Vasilis implemented in 2-3 days needs improvements, and as we discussed, SMOTE should be done on the discriminated group rather than on the minority class. So there are many options for experimenting with the method, understanding its limits and where you can intervene.]
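A minimal sketch of how the evaluation measures from item 2 could be computed, assuming binary labels with 1 as the favourable outcome and a boolean mask marking the protected group (the function names and the `protected_mask` argument are illustrative, not existing code). Equal Opportunity is taken here as the absolute TPR gap between the two groups; ABROCA would additionally need the predicted scores and the group-wise ROC curves, so it is left out of the sketch.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

def group_rates(y_true, y_pred, protected_mask):
    """TPR and TNR computed separately for the protected and non-protected groups."""
    rates = {}
    for name, mask in (("protected", protected_mask), ("non_protected", ~protected_mask)):
        yt, yp = y_true[mask], y_pred[mask]
        rates[f"TPR_{name}"] = np.mean(yp[yt == 1] == 1) if np.any(yt == 1) else np.nan
        rates[f"TNR_{name}"] = np.mean(yp[yt == 0] == 0) if np.any(yt == 0) else np.nan
    return rates

def evaluate(y_true, y_pred, protected_mask):
    res = {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Balanced_Accuracy": balanced_accuracy_score(y_true, y_pred),
    }
    res.update(group_rates(y_true, y_pred, protected_mask))
    # Equal Opportunity as the absolute difference of the group-wise TPRs.
    res["Eq_Opportunity"] = abs(res["TPR_non_protected"] - res["TPR_protected"])
    return res
```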
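For the parameter analysis in item 3, a small sketch of the two ways of choosing N per boosting round: a static N as a coefficient of the minority-class size, and a proportional N tied to the per-round discrimination. The proportional variant here assumes discrimination is measured as the TPR gap against the protected-positive group; the exact definition is still open.

```python
def n_samples_static(n_minority, coefficient):
    # Static setting: N is a fixed multiple of the minority-class size.
    return int(coefficient * n_minority)

def n_samples_proportional(tpr_protected, tpr_non_protected, n_pos_protected):
    # Proportional setting: N grows with the discrimination (TPR gap)
    # suffered by the protected-positive group in the current round.
    discrimination = max(0.0, tpr_non_protected - tpr_protected)
    return int(round(discrimination * n_pos_protected))
```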
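For the radar charts in item 7, a rough matplotlib sketch. It assumes a `results` dict mapping each method name to its measure values, all scaled to [0, 1] so they can share one axis; both the data layout and the function name are only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def radar_chart(results, measures):
    """One polygon per method over the given predictive and fairness measures."""
    angles = np.linspace(0, 2 * np.pi, len(measures), endpoint=False).tolist()
    angles += angles[:1]  # repeat the first angle to close the polygon
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    for method, scores in results.items():
        values = [scores[m] for m in measures]
        values += values[:1]
        ax.plot(angles, values, label=method)
        ax.fill(angles, values, alpha=0.1)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(measures)
    ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1))
    return fig
```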
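Finally, following the comment on the last item (doing SMOTE on the discriminated group rather than on the whole minority class), a sketch of what the per-round augmentation step could look like with imbalanced-learn's SMOTE restricted to the protected subset. `protected_mask` and `n_new` are assumed inputs (n_new could come from either N scheme above), and the protected subset is assumed to contain both classes.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

def augment_discriminated_group(X, y, protected_mask, n_new):
    """Generate n_new synthetic protected-positive samples and append them to X, y."""
    X_prot, y_prot = X[protected_mask], y[protected_mask]
    n_pos = int(np.sum(y_prot == 1))
    if n_new <= 0 or n_pos < 2:
        return X, y
    # Ask SMOTE for n_pos + n_new positives within the protected subset only.
    smote = SMOTE(sampling_strategy={1: n_pos + n_new},
                  k_neighbors=min(5, n_pos - 1))
    X_res, y_res = smote.fit_resample(X_prot, y_prot)
    # fit_resample returns the original rows first, then the synthetic ones.
    X_new, y_new = X_res[len(X_prot):], y_res[len(y_prot):]
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```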