# Processing with AI

## Exploration: 👩‍⚖️ Ethics of AI

Name:

> Siyi LI

Subject:

> Improve dating-app matching algorithms using NLP

[TOC]

## Design brief

### Bias

If we do not source our dataset with enough rigor, the following biases might appear:

>1. If people in our dataset are biased against, or in favor of, any particular group of people (any race, for example), the matching results will reinforce those biases.
>2. The matching results may not work well for minority groups, such as homosexual users, because the database contains little data about them.
>3. A matching algorithm based on past good matches may not predict the future well, and this will limit users' potential choices.

We will ensure that our model is not biased by:

>1. Sourcing our data from different regions and different racial groups, and setting policies that prohibit discrimination.
>2. Making sure our data take minority groups, such as homosexual users, into account (a per-group audit sketch is given at the end of this brief).
>3. Building a data-science team that is as diverse as possible, to reduce bias as much as we can.

### Overfitting

We will make sure our model does not overfit by:

> 1. Splitting the dataset randomly and without bias into two parts: a training set and a validation set.
> 2. Training our model on the training set, then checking its accuracy on the validation set (see the sketch at the end of this brief).

### Misuse

>We have to remind ourselves that our application could be misused by:
>1. harassers engaging in sexual harassment, for example by sending indecent messages or pictures.
>2. scammers running romance scams that aim to trick people into sending money.

### Data leakage

>1. We have decided that our training dataset will be fully open-sourced, but before releasing it we must make sure it has been de-identified (anonymized), because the dataset contains private information such as messages, and it is our responsibility to protect users' privacy (a de-identification sketch is given at the end of this brief). In addition, we will enforce the strictest rules on our developers to prevent any backdoors.
>2. In a catastrophic scenario where the entire training dataset was stolen or recovered from our model, the risk would be the leakage of users' private data; affected users could then be harassed or even blackmailed.

### Hacking

> If someone found a way to "cheat" our model and make it return whatever prediction they want instead of the genuine one, the risks would be:
> 1. more serious discrimination (race, age, sexual orientation, etc.).
> 2. more scams and financial losses, because the probability of being matched with a scammer would be higher under a manipulated algorithm.
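### Code sketches

For the Bias section above, one concrete way to check whether minority groups are represented and treated comparably is to audit match rates per group. The sketch below is only an illustration under assumptions: the DataFrame layout and the column names `group`, `matched`, and `predicted_match` are hypothetical placeholders, not our real schema.

```python
# Hypothetical per-group audit, assuming past match outcomes are available
# as a pandas DataFrame. Column names are placeholders for illustration.
import pandas as pd

def audit_match_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Compare actual and predicted match rates for each demographic group."""
    return (
        df.groupby("group")
          .agg(actual_rate=("matched", "mean"),
               predicted_rate=("predicted_match", "mean"),
               n_users=("matched", "size"))
    )

# Toy example: a large gap between actual_rate and predicted_rate for a
# small group (low n_users) would flag a representation problem.
toy = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B", "C"],
    "matched": [1, 0, 1, 1, 0, 0],
    "predicted_match": [1, 0, 1, 0, 0, 1],
})
print(audit_match_rates(toy))
```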
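For the Overfitting section, here is a minimal sketch of the train/validation protocol described there. The synthetic data from `make_classification` and the choice of `LogisticRegression` are placeholders; our real pipeline would use NLP features extracted from user profiles and messages.

```python
# Minimal train/validation split sketch (scikit-learn), with placeholder data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 1. Split the data randomly into a training set and a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# 2. Train on the training set only, then check accuracy on the validation set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("train accuracy:     ", accuracy_score(y_train, model.predict(X_train)))
print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
# A large gap between the two accuracies would indicate overfitting.
```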
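For the Data leakage section, the sketch below illustrates one possible de-identification step before open-sourcing the dataset: dropping direct identifiers and replacing user IDs with salted hashes. The column names (`user_id`, `real_name`, `email`, `message`) are hypothetical, and a real release would also need the free-text messages themselves reviewed and scrubbed.

```python
# Hedged de-identification sketch with hypothetical column names.
import hashlib
import pandas as pd

def deidentify(df: pd.DataFrame, salt: str) -> pd.DataFrame:
    out = df.copy()
    # Drop direct identifiers entirely.
    out = out.drop(columns=["real_name", "email"])
    # Replace user IDs with salted hashes so records stay linkable
    # across tables without revealing who the user is.
    out["user_id"] = out["user_id"].apply(
        lambda uid: hashlib.sha256((salt + str(uid)).encode()).hexdigest()[:16])
    return out

raw = pd.DataFrame({
    "user_id": [101, 102],
    "real_name": ["Alice", "Bob"],
    "email": ["a@example.com", "b@example.com"],
    "message": ["hi there", "nice to meet you"],
})
print(deidentify(raw, salt="do-not-commit-this-salt"))
```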