# Module 3
## 29/07/21 - in office collab
(Notes from initial scoping in February)
Module 3: Data visualisation/exploration and intro to modeling
- Taught:
  - Exploratory data analysis: how to start working with new data.
  - Basic visualisation:
    - Types of charts
    - Interactive charts
    - Network graphs
  - Guidance for effective data communication and visualisation, including an EDI discussion on the importance of always considering visualisations in the context where they are presented and of being aware of inclusivity guidelines.
  - Intro to modeling:
    - The two cultures (theorise and estimate vs. compute and test). Discuss pros and cons.
    - Predictive modeling, focusing on basic methods (e.g. linear/logistic regression) using a standard Python library.
- Hands-on project:
  - Building a simple dashboard with a Gapminder dataset. Attendees will discuss the trends seen in the data, will be encouraged to build effective visualisations, and will then decide which features to use to train a basic predictive model. Teams will be free to experiment with different methods of collaboration.
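A minimal sketch of the "predictive modeling with basic methods" step, assuming scikit-learn as the standard Python library. The data are synthetic and the column names (`log_gdp_per_capita`, `life_expectancy`, `high_life_exp`) are hypothetical stand-ins for Gapminder-style variables, not the real dataset. The same feature is used for a linear regression (continuous target) and a logistic regression (binary target derived from the same outcome).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a Gapminder-style table (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"log_gdp_per_capita": rng.uniform(6, 11, n)})
df["life_expectancy"] = 20 + 5 * df["log_gdp_per_capita"] + rng.normal(0, 3, n)
# Binary target for the logistic example: is life expectancy above the median?
df["high_life_exp"] = (df["life_expectancy"] > df["life_expectancy"].median()).astype(int)

# Hold out a test set so the scores reflect unseen data.
X = df[["log_gdp_per_capita"]]
X_train, X_test, y_train, y_test = train_test_split(
    X, df["life_expectancy"], test_size=0.25, random_state=0
)

# Linear regression: predict a continuous outcome.
lin = LinearRegression().fit(X_train, y_train)
print("slope:", lin.coef_[0], "R^2 on test set:", lin.score(X_test, y_test))

# Logistic regression: predict the binary outcome from the same feature.
yc_train = df.loc[X_train.index, "high_life_exp"]
yc_test = df.loc[X_test.index, "high_life_exp"]
logit = LogisticRegression().fit(X_train, yc_train)
print("accuracy on test set:", logit.score(X_test, yc_test))
```

Swapping in real Gapminder columns would only change how `df` is built; the fit/score calls stay the same.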
## Braindump of things that we feel are important in data visualisation.
- Near the start: an interactive session with a poor plot. Groups go off and attempt to interpret it and to pin down their interpretation. We then come back together to discuss the different interpretations and _why_ the plot is hard to interpret.
- Lead on to Rules of the Game / Recipe(?). Basics: what is a figure? Labels, axes (dimensions), scale, considering colour blindness, etc.
  - Plots should be understandable without reading the caption.
  - Plots should be understandable without domain expertise.
- Figure as story
- Using data visualisation for data exploration as the link between Modules 2 and 3.
- Useful tips
- Dimensionality reduction when we can't visualise all the information in a single graph
- Using data visualisation for communication
- Plotting pitfalls:
  - Overplotting
  - Trying to tell multiple stories
- Examples of bad figures:
  - NHS
  - Other real-world case studies?
- When to go interactive?
- Data-based figures vs. infographics: do the same rules hold?
- Know your audience: are there different rules for the public vs. academia?
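Overplotting, listed among the pitfalls above, is easy to demonstrate live. A minimal sketch with matplotlib on synthetic data (50,000 correlated points, values made up for illustration): the same data drawn as opaque points, with transparency, and binned with `hexbin`, which counts points per hexagonal cell.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: many points, so an ordinary scatter plot saturates.
rng = np.random.default_rng(42)
x = rng.normal(0, 1, 50_000)
y = x + rng.normal(0, 1, 50_000)

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)

# 1. Naive scatter: opaque points merge into one blob and hide the density.
axes[0].scatter(x, y, s=5)
axes[0].set_title("Opaque points (overplotted)")

# 2. Transparency: regions with more points become darker.
axes[1].scatter(x, y, s=5, alpha=0.05)
axes[1].set_title("alpha=0.05")

# 3. Binning: count points per hexagonal cell and map the count to colour.
hb = axes[2].hexbin(x, y, gridsize=40, cmap="viridis")
fig.colorbar(hb, ax=axes[2], label="points per cell")
axes[2].set_title("hexbin")

for ax in axes:
    ax.set_xlabel("x")
axes[0].set_ylabel("y")
fig.tight_layout()
```

Transparency keeps individual points but needs tuning of `alpha`; binning scales to any number of points at the cost of hiding individual observations.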
## Tools
- seaborn (works directly with DataFrames; can often do more with less code)
- matplotlib (lower level)
- ggplot? R rather than Python. Could translate some examples as additional material.
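To make the seaborn/matplotlib trade-off concrete, a minimal sketch (assuming a recent seaborn and pandas; the DataFrame is synthetic and hypothetical) that draws the same grouped scatter plot twice. With matplotlib we split the data, loop over groups and build the legend ourselves; with seaborn the DataFrame columns are mapped to position and colour (`hue`) in one call.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical tidy DataFrame: one row per observation, one column per variable.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "x": rng.normal(size=90),
    "y": rng.normal(size=90),
    "group": np.repeat(["A", "B", "C"], 30),
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# matplotlib: split by group, draw each subset, then label and add the legend manually.
for name, sub in df.groupby("group"):
    ax_mpl.scatter(sub["x"], sub["y"], label=name)
ax_mpl.set_xlabel("x")
ax_mpl.set_ylabel("y")
ax_mpl.legend(title="group")
ax_mpl.set_title("matplotlib")

# seaborn: one call maps columns to x, y and colour; axis labels and the
# legend are taken from the column names.
sns.scatterplot(data=df, x="x", y="y", hue="group", ax=ax_sns)
ax_sns.set_title("seaborn")
```

Because seaborn draws onto ordinary matplotlib axes, the two can be mixed: seaborn for the data mapping, matplotlib calls for fine adjustments.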
## Useful books/references:
- [Fundamentals of data visualisation](https://clauswilke.com/dataviz/) (Claus O. Wilke)
- The Truthful Art: Data, Charts, and Maps for Communication (Alberto Cairo)
- Storytelling with Data (Cole Nussbaumer Knaflic)
- [Our World in Data](https://ourworldindata.org/)
- A Layered Grammar of Graphics, [Wickham's paper](https://byrneslab.net/classes/biol607/readings/wickham_layered-grammar.pdf)
## Actions 06/07/21
- Read the resources above and iterate on the syllabus.
- Come back together and try to formalise the syllabus a little more.
## Outline of Module 3:
*Preliminary by Camila, 19 July 2021.*
### 1. Wrong, bad and ugly figures.
We choose a few figures from these examples and ask the students to comment on what is wrong with each one:
- [Graph with no axis in a Mexican government COVID briefing.](https://twitter.com/Rodpac/status/1250764503861600256?s=20)
- [Too much information in one slide from a UK COVID briefing.](https://twitter.com/10DowningStreet/status/1322614557181960195/photo/1)
- [Examples of bad data visualisation.](https://www.jotform.com/blog/bad-data-visualization/)
- [Good and bad data visualization.](https://www.oldstreetsolutions.com/good-and-bad-data-visualization)
- [Bad data visualization in the time of COVID-19.](https://medium.com/nightingale/bad-data-visualization-in-the-time-of-covid-19-5a9f8198ce3e)
- [Misleading graphs (Statistics How To).](https://www.statisticshowto.com/probability-and-statistics/descriptive-statistics/misleading-graphs/)
- [Small decisions in data viz.](https://www.visualisingdata.com/2016/03/little-visualisation-design/)
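A classic instance of the kind of misleading figure collected above is a bar chart whose value axis does not start at zero. A minimal sketch with matplotlib, using two hypothetical values that differ by 3%: with the axis starting at 96 the visible bar heights are 1 and 4, so B looks four times larger than A.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop in a notebook
import matplotlib.pyplot as plt

# Hypothetical values: two quantities that differ by 3%.
labels = ["Option A", "Option B"]
values = [97, 100]

fig, (ax_trunc, ax_zero) = plt.subplots(1, 2, figsize=(8, 3.5))

# Misleading: the y-axis starts at 96, so bar length no longer encodes the value
# (visible heights are 1 and 4 units).
ax_trunc.bar(labels, values)
ax_trunc.set_ylim(96, 100.5)
ax_trunc.set_title("Truncated axis")

# Honest: bars start at zero, so bar length is proportional to the value.
ax_zero.bar(labels, values)
ax_zero.set_ylim(0, 105)
ax_zero.set_title("Axis starts at zero")

for ax in (ax_trunc, ax_zero):
    ax.set_ylabel("value (hypothetical units)")
fig.tight_layout()
```

Showing both panels side by side works well as a discussion prompt: students can be asked which impression each one gives before seeing the numbers.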
Theme: Responsible data communication
- [thread about the difficulties of representing uncertainty](https://twitter.com/EvanMPeck/status/1235568532840120321)
- [referenced from here](https://www.visualisingdata.com/2020/03/communication-themes-from-coronavirus-outbreak/)
- [thread about data doubts](https://www.visualisingdata.com/2016/03/the-little-of-visualisation-design-part-4/)
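For the uncertainty theme, one common approach is to draw an interval around an estimate rather than a bare line. A minimal sketch with matplotlib on simulated data (hypothetical: 20 noisy measurements per day for 30 days), showing the daily mean with an approximate 95% confidence band computed with the normal approximation (mean ± 1.96 standard errors).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Simulated repeated measurements (hypothetical): 20 noisy samples per day.
rng = np.random.default_rng(3)
days = np.arange(30)
samples = 50 + 0.5 * days[:, None] + rng.normal(0, 5, size=(30, 20))

mean = samples.mean(axis=1)
# Standard error of the mean, then an approximate 95% interval (normal approximation).
sem = samples.std(axis=1, ddof=1) / np.sqrt(samples.shape[1])
lower, upper = mean - 1.96 * sem, mean + 1.96 * sem

fig, ax = plt.subplots(figsize=(6, 3.5))
# Shaded band for the interval, line for the point estimate.
ax.fill_between(days, lower, upper, alpha=0.3, label="approx. 95% CI of the mean")
ax.plot(days, mean, label="mean")
ax.set_xlabel("day")
ax.set_ylabel("measurement (hypothetical units)")
ax.legend()
```

The band communicates the uncertainty of the mean only; it says nothing about the spread of individual measurements, which is one of the subtleties the linked threads discuss.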
### 2. Rules of the game:
Material for sections 2 and 3 is mainly based on the [Fundamentals of data visualisation](https://clauswilke.com/dataviz/) book and on Wickham's [grammar of graphics paper](https://byrneslab.net/classes/biol607/readings/wickham_layered-grammar.pdf). For the examples we will use data from the hands-on sessions, focusing on a particular country.
- What is a figure? Mapping data onto aesthetics.
- Coordinate systems and axes.
- Color scales. [examples in grey](https://www.visualisingdata.com/2015/01/make-grey-best-friend/)
- Statistical transformations (binning or aggregating).
- Annotations: labels, legends, titles.
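The "rules of the game" above can be walked through in a single figure. A minimal sketch with pandas and matplotlib on hypothetical data (five made-up countries with made-up values): a statistical transformation (5-year means), an aesthetic mapping (time to x, value to y, country to colour), a grey-for-context colour choice with one highlighted series, a log-scaled coordinate system, and annotations (labels with units, title, legend). This is an illustration in plain matplotlib, not a grammar-of-graphics library.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: yearly values for five countries (names and numbers are made up).
rng = np.random.default_rng(7)
years = np.arange(2000, 2021)
countries = ["A", "B", "C", "D", "E"]
df = pd.DataFrame(
    [(c, y, 1000 * (1 + i) * 1.05 ** (y - 2000) * rng.lognormal(0, 0.05))
     for i, c in enumerate(countries) for y in years],
    columns=["country", "year", "gdp_per_capita"],
)

# Statistical transformation: aggregate yearly values into 5-year means.
df["period"] = (df["year"] // 5) * 5
agg = df.groupby(["country", "period"], as_index=False)["gdp_per_capita"].mean()

fig, ax = plt.subplots(figsize=(6, 4))
for country, sub in agg.groupby("country"):
    highlight = country == "C"
    # Aesthetic mapping: period -> x, value -> y, country -> colour
    # (grey for context, one highlighted series).
    ax.plot(sub["period"], sub["gdp_per_capita"],
            color="tab:red" if highlight else "0.7",
            linewidth=2.5 if highlight else 1,
            label=country if highlight else None)

# Coordinate system: a log scale turns constant growth rates into straight lines.
ax.set_yscale("log")

# Annotations: axis labels with units, a title and a legend.
ax.set_xlabel("period (start year of 5-year window)")
ax.set_ylabel("GDP per capita (hypothetical units, log scale)")
ax.set_title("Country C against the other countries (grey)")
ax.legend()
```

Each commented step can be changed independently in class (e.g. switching back to a linear scale, or removing the grey) to show how one decision alters the reading of the figure.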
### 3. Directory of visualisations (or an equivalent name):
- Distributions.
- Proportions.
- Trends.
- Time series.
- Geospatial data.
- Uncertainty.
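A minimal sketch of three entries in this directory (distributions, proportions, time series) with pandas and matplotlib, on hypothetical random data. Geospatial data and uncertainty need more setup and are left out here. The proportions panel uses sorted horizontal bars, which are often easier to compare than pie slices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
fig, (ax_dist, ax_prop, ax_ts) = plt.subplots(1, 3, figsize=(13, 3.5))

# Distributions: histogram of one numerical variable (hypothetical values).
values = rng.normal(70, 8, 500)
ax_dist.hist(values, bins=30)
ax_dist.set_xlabel("value")
ax_dist.set_ylabel("count")
ax_dist.set_title("Distribution")

# Proportions: shares of a whole as sorted horizontal bars.
counts = pd.Series({"cat. A": 45, "cat. B": 30, "cat. C": 15, "cat. D": 10})
shares = counts / counts.sum()
shares.sort_values().plot.barh(ax=ax_prop)
ax_prop.set_xlabel("share of total")
ax_prop.set_title("Proportions")

# Time series: values ordered in time, connected with a line (random walk).
dates = pd.date_range("2020-01-01", periods=100, freq="D")
series = pd.Series(rng.normal(0, 1, 100).cumsum(), index=dates)
series.plot(ax=ax_ts)
ax_ts.set_xlabel("date")
ax_ts.set_ylabel("value")
ax_ts.set_title("Time series")

fig.tight_layout()
```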
### 4. Storytelling with data visualisation.
This section can be based on the following resources:
- The [Telling a story](https://clauswilke.com/dataviz/telling-a-story.html) chapter of Fundamentals of data visualisation.
- The [Numbers don't speak for themselves](https://data-feminism.mitpress.mit.edu/pub/czq9dfs5/release/2) chapter of the Data Feminism book.
- Musings on [what charts actually show](https://www.visualisingdata.com/2019/01/what-do-charts-actually-show/).
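One common way to turn a figure into a story is to state the take-home message in the title and to point the reader at the key moment with an annotation. A minimal sketch with matplotlib on a hypothetical series with a level shift at t = 30; the "policy change" label is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical series with one event we want the reader to notice.
rng = np.random.default_rng(5)
t = np.arange(60)
y = np.where(t < 30, 10, 18) + rng.normal(0, 1, t.size)

fig, ax = plt.subplots(figsize=(7, 3.5))
ax.plot(t, y, color="0.4")
# Headline-style title: states the message, not just the variables plotted.
ax.set_title("Level jumps after the change at t = 30")
# Reference line plus an arrow annotation to direct attention to the event.
ax.axvline(30, color="tab:red", linestyle="--")
ax.annotate("policy change (hypothetical)", xy=(30, y.max()), xytext=(35, y.max()),
            arrowprops=dict(arrowstyle="->"), va="center")
ax.set_xlabel("time (hypothetical units)")
ax.set_ylabel("value")
```

The muted grey line keeps the data visible while the red marker and annotation carry the emphasis.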
### 5. Data visualisation for data exploration
- [Visualise patterns of missingness](https://www.geeksforgeeks.org/python-visualize-missing-values-nan-values-using-missingno-library/) (is this covered in Module 2?).
- Relationships between numerical variables with scatter plots, joint plots, and pair plots.
- Relationships between numerical and categorical variables with box-and-whisker plots and complex conditional plots from [here](https://towardsdatascience.com/how-to-perform-exploratory-data-analysis-with-seaborn-97e3413e841d) and heatmaps from [here](https://towardsdatascience.com/how-to-use-python-seaborn-for-exploratory-data-analysis-1a4850f48f14).
- Visualising high-dimensional datasets: PCA, t-SNE and UMAP (?); some ideas [here](https://www.kaggle.com/parulpandey/part1-visualizing-kannada-mnist-with-pca/notebook?scriptVersionId=29322090).
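A minimal sketch of two of the exploration steps above, assuming scikit-learn, pandas and matplotlib. The Iris dataset bundled with scikit-learn stands in for the course data, and missing values are injected artificially so there is a pattern to show. The missingno library linked above offers ready-made missingness plots; this sketch uses a plain `imshow` of `df.isna()` instead to avoid the extra dependency. PCA is used for the projection; `sklearn.manifold.TSNE` could be swapped in for a t-SNE view.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Iris stands in for the course data; blank out ~15% of one column to create missingness.
iris = load_iris(as_frame=True)
df = iris.data.copy()
rng = np.random.default_rng(0)
df.loc[rng.random(len(df)) < 0.15, "petal width (cm)"] = np.nan

fig, (ax_na, ax_pca) = plt.subplots(1, 2, figsize=(11, 4))

# Missingness pattern: one row per observation, dark cells are missing values.
ax_na.imshow(df.isna().to_numpy(dtype=float), aspect="auto", cmap="Greys",
             interpolation="none")
ax_na.set_xticks(range(df.shape[1]))
ax_na.set_xticklabels(df.columns, rotation=45, ha="right")
ax_na.set_ylabel("row")
ax_na.set_title("Missing values (dark = NaN)")

# PCA on complete rows: standardise, then project onto the first two components.
complete = df.dropna()
X = StandardScaler().fit_transform(complete)
pca = PCA(n_components=2)
scores = pca.fit_transform(X)
sc = ax_pca.scatter(scores[:, 0], scores[:, 1],
                    c=iris.target.loc[complete.index], cmap="viridis", s=15)
ax_pca.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.0%} of variance)")
ax_pca.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.0%} of variance)")
ax_pca.set_title("PCA projection, coloured by species")
handles, _ = sc.legend_elements()  # one handle per species code
ax_pca.legend(handles, iris.target_names, title="species")
fig.tight_layout()
```

Dropping incomplete rows before PCA is the simplest choice here; whether to drop or impute is itself something the missingness plot should inform.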