# Processing with AI

## Exploration: 👩‍⚖️ Ethics of AI

Name:
> Remy Bianconi

Subject:
> "Detect rare diseases on medical results, here skin cancer, using Machine Learning and Image recognition"

[TOC]

## Design brief

### Bias

If we don't source our dataset with enough rigor, the following biases might appear:

>1. Only recognizing one specific skin color as human skin
>2. Failing to properly identify the skin cancer area
>3. Not generalizing well to unseen skin pictures

We will ensure that our model is not biased by:

>1. Sourcing our data from diverse databases of human skin pictures
>2. Making sure our data cover the broadest possible range of skin colors
>3. Balancing the proportion of samples across their respective categories (skin colors in this case), see the balancing sketch at the end of this document

### Overfitting

We will make sure our model does not overfit by using diverse sources of data and by splitting our data into a training set and a test set, keeping the latter unseen during training. This will show whether the ML algorithm managed to generalize well to the data (see the split sketch at the end of this document).

### Misuse

>We have to remind ourselves that our application could be misused by governments or "evil" researchers to justify esoteric theories, for instance that skin cancer appears more often in a given skin-color population, and hence to stigmatise that group. Such a wrong conclusion is easy to draw but hard to make disappear once the public knows about it.

### Data leakage

We have decided that our training dataset will be fully open-sourced, but only after making sure that all the pictures and metadata are anonymised (see the metadata-stripping sketch at the end of this document). A decentralised training model could be great for such a project, but only if all the data remain perfectly anonymous. It could have dramatic personal consequences if someone's name appeared on the web while none of their relatives knew about their disease...

### Hacking

If someone found a way to "cheat" our model and make it output any prediction they want instead of the real one, the risk is that they would use it to demonstrate a false correlation between cancer cases and the skin color of whatever group they target. If the hack happened without scientists noticing it, otherwise valid research papers could be published with completely misleading conclusions generated by the algorithm, even though the scientists themselves would not be promoting the discriminatory behavior the algorithm exhibits. It could even be used for political purposes.
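
### Implementation sketches

The following are minimal sketches of the mitigation steps mentioned above, not our final pipeline. First, the balancing step from the Bias section: it assumes a hypothetical metadata table with columns `image_path` and `skin_tone`; the real dataset's file and column names may differ.

```python
# Minimal sketch: inspect and rebalance skin-tone categories.
# "train_metadata.csv" and the "skin_tone" column are hypothetical names.
import pandas as pd

meta = pd.read_csv("train_metadata.csv")

# Proportion of each skin-tone category in the raw data.
counts = meta["skin_tone"].value_counts()
print(counts / counts.sum())

# Naive rebalancing: downsample every category to the size of the smallest.
smallest = counts.min()
balanced = (
    meta.groupby("skin_tone", group_keys=False)
        .apply(lambda g: g.sample(n=smallest, random_state=0))
)
print(balanced["skin_tone"].value_counts())  # now uniform across categories
```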
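
Next, the held-out test set from the Overfitting section, sketched with scikit-learn's `train_test_split`; the file name and the `diagnosis` column are assumptions.

```python
# Minimal sketch: hold out unseen pictures for the test set, stratified by
# diagnosis so both splits keep the same cancer/benign ratio.
# "dataset_metadata.csv" and the "diagnosis" column are hypothetical names.
import pandas as pd
from sklearn.model_selection import train_test_split

meta = pd.read_csv("dataset_metadata.csv")

train_meta, test_meta = train_test_split(
    meta,
    test_size=0.2,               # 20% of the pictures stay unseen
    stratify=meta["diagnosis"],  # preserve class proportions in both splits
    random_state=42,             # reproducible split
)
# Train only on train_meta; evaluate on test_meta once at the end to check
# whether the model generalizes to skin pictures it has never seen.
```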
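
Finally, the anonymisation step from the Data leakage section: a sketch using Pillow that re-encodes pixel data to drop embedded EXIF metadata (names, GPS coordinates, device info). The folder names are assumptions, and file names themselves must also be checked for personal information.

```python
# Minimal sketch: strip embedded metadata (EXIF) from every picture before
# publishing the dataset. Folder names are hypothetical.
from pathlib import Path
from PIL import Image

src = Path("raw_pictures")
dst = Path("anonymised_pictures")
dst.mkdir(exist_ok=True)

for pic in src.glob("*.jpg"):
    with Image.open(pic) as im:
        # Copying only the pixel data into a fresh image drops the EXIF
        # block and any other metadata embedded in the original file.
        clean = Image.new(im.mode, im.size)
        clean.putdata(list(im.getdata()))
        clean.save(dst / pic.name)
```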