# Introduction to Data Science and AI
## Student: Minh Duc Nguyen - 10421010
### 1. Introduction
  Human beings have a natural inclination to assign labels and categories to the world around them in order to tell things apart more easily. When those labels or features distort decision-making, however, whether the decision is human-driven or suggested by an algorithm, the result is stereotypes and biases. These biases are then incorporated, in various ways, into the technologies humans create.
### 2. What is Bias?
  In the context of machine learning, bias refers to how far a model's predictions deviate, on average, from the target values in the training data. Bias error arises when a model relies on simplifying assumptions that make the target function easier to approximate.
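  To make this concrete, here is a minimal Python sketch (the toy data and model choices are illustrative, not drawn from any real case): fitting a straight line to data generated by a quadratic builds in a simplifying assumption the data violates, so the predictions deviate systematically from the targets no matter how much data is collected.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear target function with a little noise.
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(0, 0.3, size=x.shape)

# A deliberately simple (high-bias) model: a straight line.
slope, intercept = np.polyfit(x, y, deg=1)
linear_pred = slope * x + intercept

# A more flexible (lower-bias) model: a quadratic.
quad_coeffs = np.polyfit(x, y, deg=2)
quad_pred = np.polyval(quad_coeffs, x)

# Mean squared error measures how far each model's predictions
# deviate from the targets; here the gap is dominated by bias,
# because the linear model's assumption is simply wrong.
print("linear MSE:   ", np.mean((y - linear_pred) ** 2))
print("quadratic MSE:", np.mean((y - quad_pred) ** 2))
```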
### 3. Real-world Problem
  The problem of bias in machine learning must be regarded as a significant issue, because it compromises the decisions and predictions that machines generate. As a result, there is a risk of relying on inaccurate model outputs, which can have severe consequences for businesses or even for human lives.
  Amazon is one of the largest tech giants in the world, so it is no surprise that the company is a heavy user of machine learning and artificial intelligence. According to five individuals familiar with the matter, a team at Amazon had been developing computer programs since 2014 to automate the evaluation of job applicants' resumes, with the objective of streamlining the search for the most skilled candidates.
  Automation has played a pivotal role in Amazon's success in e-commerce, whether in its warehouses or in its pricing strategies. According to some of these sources, Amazon's experimental hiring tool used artificial intelligence to assign job candidates scores ranging from one to five stars, much as shoppers rate products on Amazon.
  By 2015, however, the company realized that its new system was not rating candidates for software developer jobs and other technical posts in a gender-neutral way: the hiring algorithm was biased against women.
### 4. Analysis of the issue
  This happened because Amazon's computer models were trained to evaluate job applicants by finding patterns in resumes submitted to the company over a 10-year span. Since the tech industry is predominantly male, most of the resumes the system processed came from male applicants, and the system consequently learned to favor male candidates over female ones. Specifically, it penalized resumes that contained the word "women's," as in "women's chess club captain," and gave lower rankings to candidates who graduated from two all-women's colleges, as disclosed by individuals familiar with the situation who did not reveal the names of the institutions.
### 5. Solutions
  It is crucial to use data that represents "what should be" instead of "what is." Randomly sampled data will likely contain biases because we live in an unequal world where equal opportunity is still a distant reality. It is therefore our responsibility to take proactive measures to ensure that the data we use represents all individuals equally and does not lead to discrimination against any particular group. For instance, in the case of Amazon's hiring algorithm, had men and women been equally represented in the training data, the algorithm might not have discriminated as severely; a sketch of one such rebalancing step follows.
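  One concrete way to move the data toward "what should be" is to rebalance the training set before fitting a model. Below is a minimal Python sketch, assuming a hypothetical resume table with a `gender` column (all column names and values are illustrative, not Amazon's actual data), that downsamples every group to the size of the smallest so each group is equally represented:

```python
import pandas as pd

def balance_by_group(df: pd.DataFrame, group_col: str, seed: int = 0) -> pd.DataFrame:
    """Downsample every group to the size of the smallest one,
    so each group is equally represented in the training data."""
    smallest = df[group_col].value_counts().min()
    parts = [g.sample(n=smallest, random_state=seed) for _, g in df.groupby(group_col)]
    return pd.concat(parts).reset_index(drop=True)

# Hypothetical resume data: 800 male and 200 female applicants.
resumes = pd.DataFrame({
    "gender": ["M"] * 800 + ["F"] * 200,
    "score":  [1] * 800 + [0] * 200,   # placeholder feature
})
balanced = balance_by_group(resumes, "gender")
print(balanced["gender"].value_counts())  # 200 of each group
```

  Downsampling is the simplest option; upsampling the minority group or reweighting examples are common alternatives when discarding data is too costly.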
  Mandating and enforcing data governance is also crucial. Both individuals and companies have a social responsibility to regulate their modeling processes and ensure that they adhere to ethical practices. This can take various forms, such as establishing an internal compliance team that audits every algorithm created.
  To address the issues highlighted by cases like this, model evaluation must include an assessment across social groups. This means striving to ensure that metrics such as accuracy and false positive rate are consistent when comparing different social groups, including but not limited to gender, ethnicity, and age. Evaluating models from this perspective gives a deeper understanding of how an algorithm performs for each group and makes it possible to mitigate any biases that arise.
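  One simple way to operationalize such an evaluation is to slice the test set by group and compute each metric per group. Here is a minimal Python sketch, with hypothetical labels, predictions, and group names:

```python
import numpy as np

def group_metrics(y_true, y_pred, groups):
    """Compute accuracy and false positive rate per social group."""
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        accuracy = np.mean(yt == yp)
        negatives = yt == 0
        # FPR = fraction of true negatives the model flags as positive.
        fpr = np.mean(yp[negatives] == 1) if negatives.any() else float("nan")
        results[g] = {"accuracy": accuracy, "fpr": fpr}
    return results

# Hypothetical labels and predictions for two groups, "A" and "B".
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(group_metrics(y_true, y_pred, groups))
```

  A large gap between groups, for example a much higher false positive rate for one group than another, is exactly the signal that a bias audit should investigate.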
### 6. Conclusion
  It is evident that creating non-biased algorithms is a challenging task. To achieve this goal, it is crucial to ensure that the data used to train these algorithms is free of bias, and the engineers creating the algorithms must be vigilant in avoiding the introduction of their own biases. This requires a comprehensive approach that involves careful data selection, rigorous testing, and ongoing monitoring and assessment to ensure that algorithms remain free of biases that can lead to discrimination against any particular group of people.