# Mini Project B3 - Why The How
## By Group 12 :face_with_cowboy_hat: (Group 4000: Født I Går (Born Yesterday))

---

## Research question :question:
How can Ruppert, Law & Savage's description of data as apparatuses, from *Reassembling Social Science Methods* (2013), contribute to a broader understanding of how we perceive correlations between COVID-19 cases and COVID-19 deaths in a worldwide perspective?

## Introduction :newspaper:
In this mini-project, we investigate the correlation between COVID-19 cases and COVID-19 deaths in different parts of the world. The correlation is interpreted through the lens of digital visualisations and representations. Furthermore, we wish to shed light on the uncertainties in our mini-project. Our treatment of these uncertainties draws on methodological considerations from Ruppert et al. (2013), in particular the ways in which digital data can be understood as apparatuses consisting of the systems, algorithms and code that produce them. We consider this understanding of digital data and digital devices as apparatuses highly relevant for describing possible weaknesses and uncertainties in our data collection, as well as in how well the data represents the real world.

---

## Methodology: why? :thinking_face:
By using data manipulation, cleaning and visualisation, we can be said to use a digital method (Marres, 2012), in which we adapt digital devices for the purposes of social research (Marres, 2012, p. 13). As stated in *The Redistribution of Methods: On Intervention in Digital Social Research, Broadly Conceived* (Marres, 2012), digital methods make comprehensive social analysis possible through natively digital devices such as search engines (Marres, 2012, p. 13), owing to their dynamic datasets and feedback possibilities.
In this project, we consider tools such as JupyterLab, Pandas and RAWgraphs indispensable for shedding light on a globally monitored phenomenon such as COVID-19 in the best way possible.

Analysing a sociological phenomenon through digital devices, and especially the ways in which digital devices themselves are materially implicated in the production of sociality, is discussed by Ruppert et al. in *Reassembling Social Science Methods: The Challenge of Digital Devices* (2013). In the article, Ruppert et al. state that the digital and digital data can be seen as an apparatus, or a *dispositif* (Foucault, 2009), which is described as:

> a thoroughly heterogeneous ensemble consisting of discourses, institutions, architectural forms, regulatory decisions, laws, administrative measures, scientific statements, philosophical, moral and philanthropic propositions - in short, the said as much as the unsaid
>
> (Ruppert et al., 2013, p. 9)

Describing data and the digital as an apparatus thus becomes useful for understanding that produced data is not only a product of the systems, algorithms and code that produce it. The apparatus also consists of, and thereby becomes a product of, different actors and institutional interests, as well as different regulations, discourses and controversies regarding the subject.

With regard to our research on COVID-19 cases and COVID-19 deaths, this understanding of data as an apparatus adds a highly relevant consideration about the data we initially gathered. It is important to note that this data, or this apparatus, is a product of several elements: the different ways of keeping track of COVID-19 cases, the systems used to support the tracking and the willingness to release data about the virus, but also individuals and their willingness to get tested, as well as political and health-related interests in the data.
Thus, the data gathered about the virus is a product of many different elements and their interests in the topic, which together contribute to the data apparatus. In the same way, the data that we present in this article is itself an apparatus, shaped by the systems we have chosen for cleaning and representation, as well as by the elements we consider interesting for answering our research question.

---

## What was our approach: how? :open_book:
#### General overview
- Importing libraries for working with our data
```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# Jupyter magic: render plots inline in the notebook
%matplotlib inline
```
- Importing the dataset into the Jupyter environment
```
cvset = pd.read_csv("owid-covid-data.csv")
```
- Dropping / cleaning data (we know :shocked_face_with_exploding_head:)
```
# drop() returns a new DataFrame, so the result has to be assigned
# (here to cvc2, the name used in the next step of this overview)
cvc2 = cvset.drop(columns=["iso_code", "new_cases", "new_cases_smoothed",
                           "new_deaths", "new_deaths_smoothed", "extreme_poverty",
                           "cardiovasc_death_rate", "diabetes_prevalence",
                           "female_smokers", "male_smokers",
                           "hospital_beds_per_thousand", "human_development_index"])
```
- Selecting the accumulated COVID-19 cases and deaths data (most recent date)
```
cvc3 = cvc2.loc[cvc2['date'] == '2020-10-21']
```
- Sorting our data
```
# cvc4 comes from intermediate cleaning steps not shown in this overview
cvc5 = cvc4.sort_values(by=['continent'], ascending=True)
```
- Exporting our data
```
cvc5.to_csv("COVID19DATACLEAN.csv")
```

### :link: [Link to our raw (before) and cleaned (after) data](https://github.com/LasseUtoft/B3Files)

#### Tools used :hammer_and_wrench:
- Python / JupyterLab / Pandas
- RAWgraphs

---

## Results :fire:
### Graphs from RAW
### :link: [Links to 1920x1080 screenshots of the graphs](https://github.com/LasseUtoft/graphs-B3-project)

#### Continents: COVID-19 deaths per million citizens
*Shows median, upper/lower quartile & high/low extremes*
![](https://i.imgur.com/hbjnmLA.png)

#### Relation between cases & continent
*From a low (top) to a high amount (bottom)*
![](https://i.imgur.com/dKNqYIY.jpg)

#### Relation between cases & deaths by COVID-19
*Shows cases of COVID-19 per million citizens (x-axis), with size showing deaths per million citizens. Countries are grouped by continent on the y-axis.*
![](https://i.imgur.com/7uq0Ze6.png)

#### Relation between cases & deaths by COVID-19 (2)
*Shows number of cases per million citizens (x-axis) and number of deaths per million citizens (y-axis). Labeled by country and colored by continent.*
![](https://i.imgur.com/3F9b7uA.png)

#### Countries' percentage of total deaths per million citizens
*Ordered by continent and sized by total deaths per million citizens*
![](https://i.imgur.com/XLL8kQp.png)

---

## Further results & comments on illustrations :speaking_head_in_silhouette:
### Extremities
#### Small Countries & Island Countries
Our data set consistently ranks small European countries and island countries (San Marino, Aruba, Andorra, etc.) among the leading countries in COVID-19 cases and deaths per million citizens due to their small populations. These countries are therefore not fully comparable with larger ones, because with so few inhabitants a small absolute number of cases or deaths produces a very high per-million figure.

#### Africa
All African countries consistently rank lowest in both cases per million and deaths per million. This is perhaps due to both a considerably lower median age in many African countries and poorer access to testing facilities and health care (hospitals, etc.).

#### COVID-19 cases per million citizens
Leading in cases per million citizens are the wealthier Middle Eastern countries such as Kuwait, Bahrain, Qatar and Israel, with North and South American countries not far behind: USA, Chile, Brazil, Panama and Peru.

#### COVID-19 deaths per million citizens
Countries with a high number of deaths per
million citizens are mainly North and South American countries: Peru, Brazil, Chile, USA, Panama, Argentina, Colombia, Bolivia, Mexico and Ecuador, with Italy and the UK representing the hardest-hit European countries. The reasons for the high number of deaths relative to cases are hard to determine here. Firstly, because the countries represent three different continents with extremely different health care systems and infrastructures, and secondly, because countries with fewer deaths but more cases are even more diverse.

---

## Data source & vulnerabilities :file_folder: :card_file_box:
The dataset used is retrieved from [GitHub.com (link)](https://github.com/owid/covid-19-data/tree/master/public/data) and is published by 'Our World in Data', a non-profit organisation based at the University of Oxford. We consider 'Our World in Data' a trustworthy source for collecting and publishing this dataset. However, the COVID-19 pandemic and the collection of data are filled with uncertainties, because the number of cases is only meaningful if we know how much each country tests its citizens for COVID-19. Testing can differ greatly from one country to another, and factors such as infrastructure, technology, economy and culture can affect the number of tests a country conducts.

With regard to Ruppert et al., the data as an apparatus depends on several elements. Digitally, the data depends on a country's digital resources, both for counting cases correctly and for publishing the data. Infrastructurally, the data depends on the population's ability to actually see a doctor and get tested. Socially and politically, the data depends on countries releasing correct figures about COVID-19 cases. Naming just a few of these elements makes it clear how heavily the data as an apparatus relies on many different elements.
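
How strongly a case count depends on testing can be made concrete with a small sketch. The two countries and all figures below are hypothetical and not taken from the dataset, and `positive_share` is an illustrative helper rather than part of our notebook. Both countries report the same number of cases per million, yet the share of tests that come back positive differs by a factor of ten, which suggests that identical case counts may describe quite different situations.

```python
# Hypothetical figures for illustration only; they are not taken from the dataset.
countries = {
    "Country A": {"cases_per_million": 5000.0, "tests_per_million": 250000.0},
    "Country B": {"cases_per_million": 5000.0, "tests_per_million": 25000.0},
}


def positive_share(cases_per_million, tests_per_million):
    """Return confirmed cases as a share of tests conducted (between 0 and 1)."""
    if tests_per_million <= 0:
        raise ValueError("tests_per_million must be positive")
    return cases_per_million / tests_per_million


# Same case count per million, very different testing effort.
shares = {name: positive_share(**row) for name, row in countries.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of tests were positive")
```
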
Because of the uncertainties and elements mentioned above, it is very hard to know whether our data gives a realistic representation of the real world and the COVID-19 pandemic. A clear area for further development of our research would be to collect data on the number of tests each country has conducted and compare it with cases and deaths.

---

## Conclusion :fireworks:
Throughout our research into correlations between COVID-19 cases and COVID-19 deaths, it has become clear that the correlation in many cases differs from what one might expect. For instance, our representations showed that neither cases nor deaths related to COVID-19 were high in Africa, which may conflict with a general understanding of Africa as a continent struggling with poverty and general infrastructural issues, whereas wealthier European countries, based on our data, had far more cases and deaths. However, by involving theory from Ruppert et al. (2013), it became very clear that data, as an apparatus, is a product of several different elements which all affect the ways data is represented. With this understanding in mind, our study cannot draw general conclusions about COVID-19 cases versus deaths because of our limitations, but it provides us with valuable insights into how data should always be viewed and treated in relation to the elements that affect and constitute it.

---

## Reference List :books:
* Foucault, M. (2009). Security, territory, population: Lectures at the Collège de France, 1977-78 (M. Senellart, Ed.; G. Burchell, Trans.). http://site.ebrary.com/id/10487827
* Marres, N. (2012). The Redistribution of Methods: On Intervention in Digital Social Research, Broadly Conceived. The Sociological Review, 60(1_suppl), 139–165. https://doi.org/10.1111/j.1467-954X.2012.02121.x
* Ruppert, E., Law, J., & Savage, M. (2013). Reassembling Social Science Methods: The Challenge of Digital Devices.
Theory, Culture & Society, 30(4), 22–46. https://doi.org/10.1177/0263276413484941

---

## Links :link:
* Data used: https://github.com/owid/covid-19-data/tree/master/public/data
* Data file before/after Pandas/Jupyter: https://github.com/LasseUtoft/B3Files
* Graphs in 1920x1080 resolution: https://github.com/LasseUtoft/graphs-B3-project

---
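
As a closing sketch, the cleaning steps from the general overview can be chained into one function and followed by a simple correlation check. This is a minimal illustration under stated assumptions, not the exact notebook we ran: the sample rows are invented, `clean_snapshot` is a hypothetical helper, and the column names `total_cases_per_million` and `total_deaths_per_million` are assumed to match those in the dataset.

```python
import io

import pandas as pd

# Tiny invented sample using a few assumed column names; the values are not real.
SAMPLE_CSV = """iso_code,continent,location,date,total_cases_per_million,total_deaths_per_million,new_cases
AAA,Europe,Country A,2020-10-20,900.0,18.0,5
AAA,Europe,Country A,2020-10-21,1000.0,20.0,7
BBB,Africa,Country B,2020-10-21,200.0,4.0,1
CCC,Asia,Country C,2020-10-21,600.0,9.0,3
"""


def clean_snapshot(raw, date, drop_columns):
    """Drop unused columns, keep the rows for one date, and sort by continent."""
    trimmed = raw.drop(columns=drop_columns)         # drop() returns a new frame
    snapshot = trimmed.loc[trimmed["date"] == date]  # accumulated totals on that date
    ordered = snapshot.sort_values(by=["continent"], ascending=True)
    return ordered.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
clean = clean_snapshot(raw, "2020-10-21", ["iso_code", "new_cases"])

# Pearson correlation between cases and deaths per million across countries.
r = clean["total_cases_per_million"].corr(clean["total_deaths_per_million"])

print(clean)
print(f"Pearson r = {r:.3f}")
```

On the full dataset, the same function could be applied to the frame returned by `pd.read_csv("owid-covid-data.csv")`, and the result exported with `to_csv` as in the overview. A single coefficient should be read with the caveats about testing and reporting in mind, since it summarises the apparatus as much as the pandemic.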