# Processing with AI

## Exploration : 👩‍⚖️ Ethics of AI

Name:
> Darrell Joseph George

> Subject: Detect rare diseases in medical results (blood tests, MRI, radiology, etc.) using Machine Learning
>
> [TOC]

## Design brief

### Bias

If we don't source our dataset with enough rigor, the following biases might appear:

> 1. Bias based on skin color: for example, a model trained predominantly on the genomic data of people of European descent rather than on that of people of African, Asian, Hispanic, or Middle Eastern descent.
> 2. Geographic bias: a model trained on datasets drawn only from patients in particular geographic areas.
> 3. Bias based on anatomy: a model trained on datasets that exclude people with a particular body type, which leads to wrong diagnoses and treatment of people with certain conditions such as obesity.

We will ensure that our model is not biased by:

> 1. Sourcing our data from reliable sources, which will help ensure that we have an unbiased dataset.
> 2. Making sure our data takes into account all the necessary information (for example, to counter racial bias we will ensure that all ethnic and racial groups are represented rather than just one particular group; see the first sketch at the end of this brief).
> 3. Identifying the different kinds of biases that already exist or could arise while we build the model, and working out a plan to tackle them one by one.

### Overfitting

We will make sure our model does not overfit by:

> Splitting our dataset into training and test sets in a perfectly random and unbiased manner, and checking the model's accuracy on the held-out data. After training, we will additionally validate the model on a different dataset to ensure that the issue of overfitting is taken care of (see the second sketch at the end of this brief).

### Misuse

> We have to remind ourselves that our application could be misused by unethical hospitals and doctors to manipulate the AI in billing or insurance software in an effort to maximize the money coming their way.

### Data leakage

> In a catastrophic scenario where our entire training dataset were stolen or recovered from our model, the risk would be that our data could be used illegally and maliciously by third parties. The medical history of patients could be sold on the dark web without any consent.

### Hacking

> If someone found a way to "cheat" our model and make it output any prediction they want instead of the real one, the risk would be that data could get mixed up between patients, attributing pathologies to the wrong people or wrongly diagnosing patients. The consequences would be catastrophic and would have significant repercussions on the health of the patients in question.
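### Illustrative sketches

As a minimal, hypothetical sketch of the bias check in point 2 above: the code below reports how each ethnic or racial group is represented in the dataset and computes accuracy separately per group, so that a model that only performs well on one group is caught early. The DataFrame `df`, the column names `ethnicity` and `label`, and the predictions `y_pred` are assumed names for illustration, not part of an actual pipeline.

```python
import pandas as pd

def representation_report(df: pd.DataFrame, group_col: str = "ethnicity") -> pd.Series:
    # Share of each group in the dataset; a very small share flags
    # an under-represented group before training even starts.
    return df[group_col].value_counts(normalize=True)

def per_group_accuracy(df: pd.DataFrame, y_pred,
                       group_col: str = "ethnicity",
                       label_col: str = "label") -> pd.Series:
    # Accuracy computed separately for each group, to surface
    # disparities that a single global accuracy number would hide.
    correct = df[label_col].to_numpy() == y_pred
    return df.assign(correct=correct).groupby(group_col)["correct"].mean()
```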
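The second sketch shows the random, held-out split described under Overfitting, assuming scikit-learn and hypothetical feature and label arrays `X` and `y`. Stratifying on `y` keeps rare disease labels at the same proportion in every split; a large gap between training and validation accuracy is the classic sign of overfitting.

```python
from sklearn.model_selection import train_test_split

def split_dataset(X, y, seed: int = 42):
    # First carve out a held-out test set (20%), stratified so that
    # rare labels keep the same proportions in every split.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    # Then split the remainder into training (75%) and validation (25%).
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test

# The test set is touched only once, after training and validation,
# so the reported accuracy is not contaminated by model selection.
```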