# Processing with AI
## Exploration: AI & Ethics
Name:
> Amokrane Tamine
>
Subject:
> Presence detection algorithm using cameras and facial recognition
>[TOC]
## Design brief
Presence in class is extremely important for the success of our students. The problem is that sometimes they slack off and don't go to class, especially in large lecture rooms where teachers can't call their names. The idea is to use a Logitech PTZ Pro camera and a face recognition algorithm to detect their presence in the room and guarantee their success.
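The core idea could look like the minimal sketch below, assuming the open-source `face_recognition` library and OpenCV; the enrollment photos, student names, camera index, and tolerance are placeholder assumptions, not a finished system.

```python
# Minimal sketch, assuming the face_recognition library and OpenCV.
# Photo paths, student names, camera index, and tolerance are placeholders.
import cv2
import face_recognition

# Enroll known students from reference photos (hypothetical file names).
known_encodings, known_names = [], []
for name, path in [("alice", "photos/alice.jpg"), ("bob", "photos/bob.jpg")]:
    image = face_recognition.load_image_file(path)
    encodings = face_recognition.face_encodings(image)
    if encodings:
        known_encodings.append(encodings[0])
        known_names.append(name)

# Grab one frame from the PTZ camera (device index 0 here) and look for enrolled faces.
video = cv2.VideoCapture(0)
ok, frame = video.read()
video.release()

present = set()
if ok:
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    for encoding in face_recognition.face_encodings(rgb_frame):
        matches = face_recognition.compare_faces(known_encodings, encoding, tolerance=0.6)
        for student, match in zip(known_names, matches):
            if match:
                present.add(student)

print("Marked present:", sorted(present))
```

In practice the matching would run over many frames during the class and log results per student, but the logic stays the same.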
### Bias
If we don't source our dataset with enough rigor, the following biases might appear:
>1. If our input data is not equally diverse and large across groups, the algorithm will recognize lighter-skinned people more easily, making darker-skinned students more likely to be marked absent by accident, resulting in potentially racist conclusions like "black students are not motivated, we should only accept white students in our school".
>2. If the algorithm is not trained properly, people with statistically rare facial features might go undetected: facial tattoos, dysmorphia, piercings, face masks (Apple's Face ID doesn't work at all with them; Idemia announced a working solution only in mid-2021), religious clothing (like the hijab), colored hair (blue, green)...
>3. We might arrive at a conclusion that uses race as an explanation instead of what lies deeper, such as other causes (technical issues, deeper personal issues...).
>4. The model will learn from the majority group (most likely white) and not improve on other ethnic groups.
We will ensure that our model is not biased by:
>1. Sourcing our data from rich, diversified databases that include a wide range of skin tones and facial features, as well as genders and ethnicities.
>2. Making sure our data takes into account that skin tones react differently to cameras and backgrounds, that some people wear masks, and so on.
>3. Uploading all the photo sets we have the right to use so that our students are represented equally, while keeping in mind that white people are a large majority at emlyon but not the only ethnic group.
>4. Checking its evolution regularly to make sure it doesn't go beyond what was set as a goal (a minimal per-group accuracy check is sketched after this list).
>5. Having diversity in the team that will set up and manage the technology, since people from diverse backgrounds can add their point of view on what can possibly go wrong.
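The regular check from point 4 could be as simple as comparing detection rates per group on a labeled evaluation set. The sketch below assumes a hypothetical `model.detects_face` interface and an evaluation set annotated with self-reported groups; it illustrates the audit idea, not our actual pipeline.

```python
# Minimal per-group audit sketch; the data format and model interface are hypothetical.
from collections import defaultdict

def detection_rate_by_group(eval_samples, model):
    """eval_samples: list of dicts like {"image": ..., "group": "darker-skinned", "present": True}."""
    totals, detected = defaultdict(int), defaultdict(int)
    for sample in eval_samples:
        if not sample["present"]:
            continue  # only measure how often truly present people are detected
        totals[sample["group"]] += 1
        if model.detects_face(sample["image"]):  # hypothetical model interface
            detected[sample["group"]] += 1
    return {group: detected[group] / totals[group] for group in totals}
```

A large gap between groups (for instance 0.99 vs 0.80) would flag the bias described above before the system ever marks anyone absent.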
### Overfitting
We will make sure our model does not overfit
> Having a working model is a complicated task. If the model underfits (it is not trained enough), it will perform poorly and will not return confident enough predictions, which is why a lot of training is needed. When the model learns the training dataset too well, however, the result is overfitting.
>- Checking the accuracy of our model on held-out validation data (a subset of our dataset kept out of training) and keeping the best-performing model. After a certain point the algorithm stops improving, which means it is time to stop the learning phase (see the sketch after this list).
>- Another, very basic method consists of reducing the number of features in the model. In this specific case, we will simply answer the question "is student X here or not?" and will not try to see if students arrive late or are not attentive in class. This method is called structural stabilization; more details can be found on page 332 of Neural Networks for Pattern Recognition (1995).
>- We can also give the model more data to train on. With more examples, the algorithm might be able to identify the parameters that really matter. However, this method does not always work, as it can result in even more overfitting.
>- Reducing the weight of some parameters can also help reduce overfitting. If a model heavily skews towards one or a few criteria, it is more likely to overfit, and reducing the weight of those parameters makes the model more stable. For instance, if we use the contrast between the white back wall and the students with a weight of 0.4 and the model mistakenly assumes that a white student with blond hair and a white shirt is missing, we might reduce that weight to 0.2 and see if it improves (the regularization term in the sketch below captures the same idea).
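A minimal sketch of the early-stopping and weight-penalty ideas above, assuming a Keras binary "present / not present" classifier; the layer sizes and the random placeholder arrays stand in for labeled classroom data, not our actual attendance model.

```python
# Minimal sketch, assuming TensorFlow/Keras; the data arrays are random placeholders.
import numpy as np
import tensorflow as tf

# Placeholder data standing in for labeled 64x64 RGB face crops.
train_images = np.random.rand(200, 64, 64, 3).astype("float32")
train_labels = np.random.randint(0, 2, size=(200,)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    # L2 weight decay penalizes large weights, echoing the "reduce the weight
    # of some parameters" idea from the last bullet above.
    tf.keras.layers.Dense(128, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # "is student X here or not"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping: hold out a validation split, stop once validation loss stops
# improving, and restore the best-performing weights (first bullet above).
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(train_images, train_labels,
          validation_split=0.2, epochs=100, callbacks=[early_stop])
```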
### Misuse
* We have to remind ourselves that our application could be misused by ill-intentioned politicians, who could use a detection error to pretend that people from minority backgrounds don't attend school and hang out on the streets instead.
* We're also likely to hear politicians claim that people from priority education areas (ZEP/REP in France) don't deserve the massive investments they receive in education because "the data suggests that money goes to waste since they don't go to school", while the AI was simply less able to detect an African face and therefore mistakenly reported that African students are less present in class.
* It could also be used for excessive control over people's lives, like China's camera and face-recognition enabled citizen grading system.
* Another risk, more relevant in France, is that the model could be used to gather ethnic data about the students of a school, which is strictly forbidden by French law.
* Finally, it could be used by school administrators as an argument to terminate the contracts of some teachers on the basis of poor student attendance in their classes. In France it is hard to find grounds to fire someone, but it is possible that this data might be used that way.
### Data leakage
*Choose the most relevant proposition:*
>**🔐 Open source model with a private dataset:**
>I believe this setting works best here. An open-source model means that we can build on something pre-existing and collaborate with researchers. We can first train the model with public image databases such as Flickr's face dataset, which has more than 70k faces. That gives us enough data to initiate a model, and the transparency of open-source code reassures students. Our students' faces, however, remain private: when we introduce them to the model, they stay on our own infrastructure, and everything else is open source (a rough sketch of this public/private split follows below).
>
>
>In a catastrophic scenario where all of our training dataset were stolen or recovered from our model, the risk would be that the data gets absorbed into another dataset without the consent of the students, who agreed to share their data only internally. This data might be used by a security company like Palantir or Clearview (Clearview is known for using people's data without their consent, for example by scraping social media pictures) to potentially identify a student at a crime scene. It could also feed other databases like PimEyes or Yandex face search, which are publicly accessible and give anyone who wants to identify you a better chance of succeeding (I personally use PimEyes to make sure I am not getting catfished on social media or dating apps, and the results are terrifyingly effective).
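One way to make the public/private split concrete is to keep the model code and its public pre-training data in the open repository, while student encodings are computed and stored only on our own machines. The sketch below assumes the open-source `face_recognition` library; the storage path and function names are hypothetical.

```python
# Minimal sketch of the public/private split: open-source model code,
# private on-premise enrollment store. Paths and names are hypothetical.
import pickle
import face_recognition  # open-source library: the "open source model" part

PRIVATE_ENCODINGS_FILE = "/srv/attendance/private/student_encodings.pkl"  # never leaves our servers

def enroll_student(student_id, photo_path, store=PRIVATE_ENCODINGS_FILE):
    """Compute a face encoding locally and append it to the private store."""
    image = face_recognition.load_image_file(photo_path)
    encodings = face_recognition.face_encodings(image)
    if not encodings:
        raise ValueError("No face found in the enrollment photo")
    try:
        with open(store, "rb") as f:
            known = pickle.load(f)
    except FileNotFoundError:
        known = {}
    known[student_id] = encodings[0]
    with open(store, "wb") as f:   # stored on our own infrastructure,
        pickle.dump(known, f)      # never published with the open code
```

Only the numeric encodings are kept rather than the raw photos, which at least limits what a leak would expose.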
### Hacking
>- If someone found a way to "cheat" our model and make it produce any prediction they want instead of the real one, the risk would be that the presence data would be completely wrong and some students would be able to graduate without attending. Worse, if the hacker has a grudge against you, they might use the algorithm to make you fail a class, or even more.
>- In a catastrophic scenario where all of our dataset were stolen or recovered from our model, the risk would be that the hackers could use our facial parameter data against our own security, for instance to unlock PCs that use Windows Hello or iPhones with Face ID, if the technology is similar.
>- It could be worse with models that go beyond basic school attendance. Let's say someone hacks China's citizen face recognition system and discreetly changes the data: innocent people could end up in jail, or worse, for no reason.
### Sources
https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/