Week 4
===
# Class 4
## Keywords:
- Recommender system
- Homophily
- Heterophily
## Glossary:
- Recommender system: a system that, based on a user's past interactions, predicts which items the user will want to interact with next.
- Homophily: the principle that "like begets like"; people tend to associate with, and be influenced by, people similar to themselves.
- Heterophily: the opposite tendency, in which people associate with those who are unlike themselves.
## Notes:
We started class by watching two YouTube videos, both focused on YouTube's recommender system. In this context, the recommender system is the algorithm that generates the 'what to watch next' content along the right-side bar. The first video showed how this system can have negative effects, using the example of someone searching for information on 'flat earth' or vaccine conspiracy theories: the recommender system would lead the user down a rabbit hole of videos that reinforce these potentially harmful/unhelpful ideas.
The goal of recommender systems can be quickly stated as follows: recommender systems seek, based on a user's past interactions, to predict which items the user will want to interact with in the future. As was discussed in this week's lab, however, this definition is somewhat limited and was iterated on.
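To make that definition concrete, here is a minimal sketch (not from class) of an item-based collaborative-filtering recommender. The interaction matrix, function names, and choice of cosine similarity are all invented for illustration; YouTube's actual system is proprietary and far more elaborate.

```python
import numpy as np

# Toy interaction matrix: rows = users, columns = videos.
# ratings[u, v] = 1 if user u watched video v, else 0.
ratings = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0],
])

def cosine_sim(a, b):
    """Cosine similarity between two interaction vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

def recommend(user, k=2):
    """Score unwatched videos by similarity to the user's history."""
    n_videos = ratings.shape[1]
    # Item-item similarity over the columns of the interaction matrix.
    sims = np.array([[cosine_sim(ratings[:, i], ratings[:, j])
                      for j in range(n_videos)] for i in range(n_videos)])
    watched = ratings[user] == 1
    scores = sims[:, watched].sum(axis=1)
    scores[watched] = -np.inf  # never re-recommend what was already watched
    return np.argsort(scores)[::-1][:k]

print(recommend(user=0))  # indices of the top-2 predicted videos
```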
Next, we discussed what the goal of the YouTube recommender system really is. What is best for you (the user)? What is best for them (the business)? What is best for us (the data scientists)? For the user, the system should recommend videos with properties such as the following (discussed in class): high-quality videos, videos on topics similar to what they were previously watching, videos that are to the point, videos tailored to their interests, and videos that are novel and expose them to new potential interests, among other things. From this short list we can see that what is best for the user is broad and unspecific, and sometimes outright contradictory (wanting similar and novel videos at the same time). What is best for the business is much easier to understand: the business wants to maximise watch time to earn more ad revenue. A toy example of this tension is sketched below.
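All the numbers and signal names in this sketch are made up; the point is only that the same candidate pool ranks differently depending on whose objective we weight.

```python
# Hypothetical candidate videos with two invented signals per video:
# how relevant it is to the user, and how long we expect them to watch.
candidates = {
    "concise_tutorial":      {"relevance": 0.9, "expected_watch_mins": 4},
    "rambling_deep_dive":    {"relevance": 0.6, "expected_watch_mins": 45},
    "clickbait_rabbit_hole": {"relevance": 0.3, "expected_watch_mins": 90},
}

def rank(alpha):
    """alpha = 1.0 optimises purely for the user, 0.0 purely for watch time."""
    def score(v):
        signals = candidates[v]
        return (alpha * signals["relevance"]
                + (1 - alpha) * signals["expected_watch_mins"] / 90)
    return sorted(candidates, key=score, reverse=True)

print(rank(alpha=1.0))  # user-first: the concise tutorial wins
print(rank(alpha=0.0))  # business-first: the longest video wins
```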
After this, we explored a fundamental contradiction in recommender systems: if we train a machine learning algorithm to predict a user's viewing preferences, it will end up making useless predictions. To unpack this contradiction, consider an example. Say there is a girl named Lucy. Lucy really likes watching videos about ethics and moral philosophy. The YouTube recommender system picks up on this theme and begins recommending her lots of moral philosophy. What has happened? The one thing we know about Lucy is that she loves moral philosophy; would she not have found the recommended videos anyway? The algorithm is perfectly predicting her viewing habits, but to what effect?
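A deliberately naive sketch makes the contradiction concrete (the history and topic labels are hypothetical): a predictor that is perfectly "accurate" about Lucy's past produces a recommendation she never needed.

```python
from collections import Counter

# Lucy's (hypothetical) watch history, reduced to topic labels.
history = ["moral philosophy", "moral philosophy", "ethics",
           "moral philosophy", "ethics", "moral philosophy"]

def predict_next_topic(watch_history):
    """A 'perfect' predictor: recommend whatever dominated the past."""
    return Counter(watch_history).most_common(1)[0][0]

# The model is highly accurate about Lucy's habits...
print(predict_next_topic(history))  # -> 'moral philosophy'

# ...but the recommendation is redundant: it surfaces exactly the
# videos Lucy would have sought out on her own anyway.
```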
We cannot know exactly how the YouTube recommender system works. It is a black box, likely a machine learning algorithm. However, considering our example above, we can see how pointless it would be to simply predict videos which perfectly comply with a user's pre-existing preferences. So, what does YouTube's recommender system do?
As the videos suggested, research indicates that the videos YouTube recommends tend toward the extreme (see the work of Guillaume Chaslot for more information). Specifically, YouTube has a history of amplifying far-right content, with creators such as Ben Shapiro and Jordan Peterson being popular on the platform.
YouTube's defense against these claims has been "Well, that's just the data!". In other words, "that's what people want". This defense is problematic. It is never "just the data": the data is being filtered through YouTube's algorithm, which is not an arbitrary system; it is not neutral. Someone, at some point in time, coded the algorithm to do what it does. Hence, there are assumptions latent in it. Is it possible that these assumptions, after being filtered through a machine learning process, lead to more extreme content being shown?
An analogy can be drawn with self-driving cars. Imagine you're driving a fully autonomous Tesla, and it crashes; it swerved off the road to dodge a hedgehog, let's say. Using YouTube's defense, Tesla could say "Well, that's just the world! There are lots of hedgehogs out there. Sorry!". No: the car is trained to respond to the world in particular ways. There may not be a line of code inside the Tesla which explicitly says "dodge_the_hog = True", but implicitly, there must have been some motivation. A "fully autonomous" vehicle is never *fully* autonomous.
Finally, we moved on to the topic of *homophily*. This is the idea that "like begets like": similar people enjoy similar things, befriend similar people, and so on. This seems intuitive, but it is worth investigating where the assumption comes from.
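As a toy illustration (not from class; all parameters invented), a small simulation shows how even a modest probabilistic preference for similar others compounds into a visibly segregated network.

```python
import random

random.seed(0)

# Toy population: two groups of equal size.
people = ["A"] * 50 + ["B"] * 50

def form_edges(bias, n_edges=500):
    """Each edge joins two random people; with probability `bias` we
    reject pairs whose endpoints belong to different groups (the
    homophily assumption)."""
    edges = []
    while len(edges) < n_edges:
        i, j = random.sample(range(len(people)), 2)
        if people[i] == people[j] or random.random() > bias:
            edges.append((i, j))
    return edges

def same_group_fraction(edges):
    return sum(people[i] == people[j] for i, j in edges) / len(edges)

for bias in (0.0, 0.5, 0.9):
    print(bias, round(same_group_fraction(form_edges(bias)), 2))
# Even a modest preference for similarity skews the network heavily
# toward same-group ties: "like begets like" compounds.
```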
The researchers Lazarsfeld & Merton (1954) were interested in a highly segregated housing community in Pennsylvania, and specifically in the cause of that segregation. As a hypothesis, they posited the idea of *homophily*: people like to live near similar people. Hence, all the white people live here, all the black people live over there; that's all there is to it. However, there is one glaring flaw in their methodology... they only used white participants!
Therefore, the concept of homophily is not as stable as it seems.
# Lab 4
## Keywords:
- list keywords here
## Glossary:
- new concepts, words, or ideas: and their definition, description, ...
## Notes:
In the lab we revisited the working definition of recommender systems given above, discussed its limitations, and iterated on it.