---
title: MX04
tags: data and digital culture, mini exercise
---

# MX04
*Adam Kiil Naldal* – *3611 characters*

:::info
What’s the relation between code, big data and platforms?
:::

The relationships between code, big data and platforms are many and multifaceted. One assumption that should ground any enquiry into those relationships is that none of the three concepts (platforms, big data and code) can be understood separately from the others, with the possible exception of code.

For a start, let's look at the relationship between big data and platforms. The idea of a platform is first and foremost a constitutive metaphor: it is much harder to point to a specific technical arrangement that tangibly defines the platform structure than to recognise that something is generally regarded as a platform. A platform might simply be thought of as a large website with lots of (user-generated) content, but that is hardly an apt definition. Wikipedia, for instance, has an enormous amount of content created by its users, but it operates quite differently from other platforms like Facebook or YouTube. Furthermore, the metaphor of the platform is not deployed neutrally: the connotations of neutrality and of an "even ground" upon which users can engage with each other are often employed to evade responsibility for a lack of moderation, or for elements of a service meant to (directly or indirectly) amplify negative behaviour by its users.

Before we get into a lengthy semantic discussion, however, we should return to the question at hand: what is the relationship between platforms and big data? The digital services that most often refer to themselves as platforms are usually large, incorporated websites/apps like Facebook, Twitter, YouTube (Alphabet) etc. These services all work on the principle of algorithmic curation (though they each implement it differently) to keep users engaged with their service.

In this we see the first double loop between platforms and big data. On the one hand, the algorithms employed to keep people engaged (and thus generating ad revenue) require data at a massive scale to predict which selection of the vast amount of available content will best keep any given user looking at the screen. On the other hand, it is precisely this user activity that generates the massive amounts of data about user behaviour that the algorithms are trained on.

To understand this second connection between platforms and big data, we need to introduce the concept of code. The algorithms that curate user feeds are usually a complicated amalgamation of several different machine learning algorithms. Interestingly, this means that the program that curates is not written by hand but is generated as a secondary program, based on the machine learning models that are employed and how they are connected. The way in which we usually encounter code, as readable source code with some degree of documentation and comments, is not how the most important algorithms of platforms are written: they are much more inaccessible and, importantly, not directly crafted by programmers typing exact instructions on a keyboard.

This brings us to a third relationship, that between code and big data. The data that is generated, and how it is scrubbed, categorised, labelled and organised, has an outsized impact on the final code, since in a machine learning context it has a far more direct say in how the final code will look than it would if the code were handwritten and directly designed.
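
The difference between handwritten code and code shaped by data can be made concrete with a small Python sketch. It is a toy illustration and not how any real platform works: the watch-time data, the names `handwritten_rule`, `learn_threshold` and `make_learned_rule`, and the single-threshold "model" are all assumptions invented for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical toy data: (seconds a user watched a clip, label "was engaged").
Example = Tuple[float, bool]


def handwritten_rule(watch_seconds: float) -> bool:
    """A rule typed directly by a programmer: the cut-off is visible in the source."""
    return watch_seconds >= 30.0


def learn_threshold(examples: List[Example]) -> float:
    """Pick the cut-off that misclassifies the fewest labelled examples."""
    candidates = sorted(x for x, _ in examples)
    best_threshold, best_errors = candidates[0], len(examples) + 1
    for t in candidates:
        # Count examples where the prediction (x >= t) disagrees with the label.
        errors = sum((x >= t) != label for x, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def make_learned_rule(examples: List[Example]) -> Callable[[float], bool]:
    """Generate a rule whose behaviour comes from the data, not from the programmer."""
    threshold = learn_threshold(examples)
    return lambda watch_seconds: watch_seconds >= threshold


# Same clips, same learning procedure; only the labelling of the 10-second clip differs.
labels_a = [(5.0, False), (10.0, False), (20.0, True), (40.0, True)]
labels_b = [(5.0, False), (10.0, True), (20.0, True), (40.0, True)]

rule_a = make_learned_rule(labels_a)
rule_b = make_learned_rule(labels_b)

print(rule_a(15.0))  # False: the learned cut-off is 20 seconds
print(rule_b(15.0))  # True: the learned cut-off is 10 seconds
```

Nobody edited the generated rule between the two runs. Changing how a single example was labelled was enough to change what the resulting program does, which is the sense in which the organisation of the data has a direct say in the final code.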