# Processing with AI

## Exploration: AI & Ethics
Name: Adrien Lenfant

>
> Subject: Detect student presence in class using Face Recognition
>
>[TOC]

## Design brief

### Bias
If we don't source our dataset with enough rigor, the following biases might appear:

>1. As Joy Buolamwini pointed out in her thesis "Gender Shades", a model trained on a dataset that is not representative might fail to recognize women and minorities.
>2. Androgynous and transgender people might also be affected by dataset bias.

We will ensure that our model is not biased by:

>1. Sourcing our data from IBM's "Diversity in Faces" dataset.
>2. Splitting our dataset into subgroups and training on them separately, to make sure that every minority is properly represented.
>3. Keeping humans in the loop: a representative "dataset board" could oversee how the dataset is built and used for training.
>4. Looking for help: http://diversity.ai is an organization working to eradicate biased datasets.

### Overfitting
We will make sure our model does not overfit by:

> Checking the accuracy of our model on a validation dataset, which needs to be large enough and representative. We could also start with a very simple model as a benchmark, then try more complex models and compare them against it.

### Misuse
>We have to remind ourselves that our application could be misused by the administration to monitor or track down a specific student. Such use of AI must be strictly limited and controlled by independent institutions. It could also be used to compile statistics on ethnicity, gender, or sexual orientation. The administration might also sell students' data.

### Data leakage
>We have decided that our training dataset will be fully open-sourced, but before releasing it we need to anonymize the whole dataset. We will also need to make sure that the dataset is unbiased (schools have had enough controversies) and that nobody can be targeted.

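
As a rough illustration of the anonymization step mentioned above, the sketch below replaces student identifiers with keyed hashes before the dataset is published. It is only one possible approach, not the project's actual pipeline: the record layout (`student`, `image`), the function names, and the sample identifiers are all hypothetical. It also only covers the labels; the face images themselves remain identifying and would need separate treatment.

```python
import hashlib
import hmac
import secrets


def pseudonymize(student_id: str, secret_key: bytes) -> str:
    """Return a stable, opaque label for a student identifier.

    A keyed hash (HMAC-SHA256) is used so that someone without the key
    cannot recompute the label from a guessed name.
    """
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymize_records(records, secret_key):
    """Replace the (hypothetical) 'student' field of each record with its pseudonym."""
    return [
        {**record, "student": pseudonymize(record["student"], secret_key)}
        for record in records
    ]


if __name__ == "__main__":
    # The key must stay private and must never be published with the dataset.
    key = secrets.token_bytes(32)
    records = [
        {"student": "student-0001", "image": "img_001.png"},
        {"student": "student-0001", "image": "img_002.png"},
        {"student": "student-0002", "image": "img_003.png"},
    ]
    for row in pseudonymize_records(records, key):
        print(row)
```

The same student always maps to the same label under a given key, so images can still be grouped per person for training, while the published labels no longer contain names.
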
### Hacking
> If someone found a way to "cheat" our model and make it output any prediction they want instead of the real one, the risk would be that they could reveal private or classified information.
> Or the attacker might simply no longer need a doctor's certificate to miss a class.