# Week 3
# Lecture 3
## Keywords:
- Gender Shades
- Lipstick on a Pig
- Beauty Score
- Bias
- 2-Loop
## Glossary:
- new concepts, words, or ideas: and their definition, description, ...
## Notes:
The lecture began with the class watching a video. This video was about the "Gender Shades" project, which is spearheaded by computer scientist Dr. Joy Buolamwini.
Gender Shades (the title of Buolamwini's MIT thesis) is an intersectional study of facial recognition software. Buolamwini built a benchmark of face images of political figures from around the world. She tested the accuracy of several facial recognition programs (from IBM, Microsoft and Face++) and found that all of the algorithms were biased towards men and light skin, with the worst accuracy for darker-skinned women.
She attributed this to low variability in the training sets provided to the algorithms and a latent bias in machine learning.
This bias (how it is defined, where it comes from, and how it affects results) was the focus of this lecture.
Next, Giulio talked about the paper "Lipstick on a Pig", written by Hila Gonen and Yoav Goldberg. This paper discusses the gender bias in natural language processing tools. In class, we focused on the algorithm "word2vec".
"word2vec" takes the string representation of a word and converts it into a vector. This vector is embedded in a network of other vectors; vectors with similar meanings are closer to each other in space.
Researchers noticed that these word embeddings carried a gender bias. First, they took words with an explicit gender reference (he, she, king, queen, men, women, etc.); then, by taking differences between such vectors (e.g. he - she), they found gendered directions in the embedding space. Projecting other words onto these directions showed that certain pairs of words were treated as gendered even though they are not explicitly so, for example "doctor" and "nurse". `Can someone confirm this? - Liam.` `Can confirm that this indeed did happen - Brett`
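A toy sketch in R of the idea (this is not the paper's code; the 4-dimensional vectors below are invented purely for illustration): take the difference he - she as a "gender direction" and measure how strongly other word vectors align with it.

```r
# Invented toy vectors, just to show the mechanics of a "gender direction"
he     <- c( 0.8, 0.1,  0.3, 0.2)
she    <- c(-0.7, 0.2,  0.3, 0.1)
doctor <- c( 0.5, 0.6, -0.1, 0.3)
nurse  <- c(-0.6, 0.5, -0.2, 0.2)

gender_direction <- he - she   # a direction separating "he" from "she"

# Cosine similarity: how strongly a word vector aligns with that direction
cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

cos_sim(doctor, gender_direction)  # positive: leans towards the "he" side
cos_sim(nurse,  gender_direction)  # negative: leans towards the "she" side
```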
In the discussion about bias we talked about what bias actually is. It was difficult to give a single definition. On one hand there is the mathematical/statistical definition, which is mainly about sampling error and can be calculated; on the other hand there is a more informal definition that comes down to ethics and legality, which is much harder to pin down.
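For the statistical sense of bias, a standard textbook example (not something we coded in class) is the variance estimator that divides by n instead of n - 1: on average it falls short of the true variance, and the gap can be simulated. A quick R sketch:

```r
# An estimator is (statistically) biased if its expected value differs from
# the true parameter. Dividing by n when estimating a variance is the classic
# example: on average it underestimates the true variance.
set.seed(1)
true_var <- 4    # samples drawn from N(0, sd = 2)
n        <- 10

estimates <- replicate(10000, {
  x <- rnorm(n, mean = 0, sd = 2)
  sum((x - mean(x))^2) / n    # divide by n, not n - 1
})

mean(estimates)             # about 3.6, below the true value of 4
mean(estimates) - true_var  # the bias, roughly -true_var / n
```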
We also looked at a project designed to give someone a "beauty score" based on a photo. We discussed how there is going to be a large amount of bias in this kind of work, as beauty itself is very subjective and usually comes down to one's personal preference, which can be influenced by things such as culture and upbringing. This means that whoever codes the algorithm will, intentionally or unintentionally, introduce bias into the project.
Finally, the last thing we discussed was the non-neutrality between data scientist and data subject, in the context of photographer vs. person photographed.
`-Brett`
Face recognition algorithms are heavily tailored towards, and biased in favour of, white males. A lack of training images plays a role in this, but the demographics of the groups developing the algorithms are also partly responsible.
"Lipstick on a pig" - dibiasing methods that cover up systematic gender bias
Issues arise when things that are subjective are presented as neutral, e.g. beauty scores.
We need to ask the question: would improving facial recognition software improve outcomes for all parties? Taking into account police surveillance tactics and unequal racial outcomes in the justice system, the answer is probably no.
Note: racial or ethnic bias is not the same as statistical bias.
# Lab 3
## Keywords:
- RStudio
- Decision tree
- Random forest
- Bank loans
## Glossary:
- new concepts, words, or ideas: and their definition, description, ...
## Notes:
In today's lab we worked through an RStudio worksheet. The exercise focused on a scenario where a model marks a person as risky or not risky to lend money to, with this result potentially deciding whether a bank would loan money to that person. Over the course of the lab we built two decision trees and a random forest. The final component of the lab centred on a website Giulio had made that lets you change a prospective borrower's details (such as age, income, etc.) with sliders and see whether that person would be given a loan. The point of these tasks was to consider whether the given models were ethically sound.
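A rough R sketch of the kind of models we built in the lab (this is not the actual worksheet code; the data frame, column names, and values below are hypothetical):

```r
library(rpart)
library(rpart.plot)
library(randomForest)

# Hypothetical loan data: 'risky' is the outcome we want to predict
set.seed(42)
loans <- data.frame(
  age    = sample(18:70, 500, replace = TRUE),
  income = round(runif(500, 20000, 120000)),
  debt   = round(runif(500, 0, 50000)),
  risky  = factor(sample(c("yes", "no"), 500, replace = TRUE))
)

# Decision tree classifying borrowers as risky / not risky
tree <- rpart(risky ~ age + income + debt, data = loans, method = "class")
rpart.plot(tree)

# Random forest on the same (hypothetical) data
forest <- randomForest(risky ~ age + income + debt, data = loans, ntree = 200)
print(forest)

# Would a new (hypothetical) borrower be flagged as risky?
new_borrower <- data.frame(age = 30, income = 45000, debt = 10000)
predict(forest, new_borrower)
```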