# Harmonisation Meeting 04/10/21

Doc for notes on how to bring synergy to the course.

## General

- Nice to read vs. nice to present seems a trade-off?
- Present slides that are different to the coursebook?
- Module seems conceptually different from the other modules, in being more open-ended.
- Alan Turing logo?
- Lesson x.x notation?
- How do we do interactive coding? Run cells etc. Live-code? Binder? Clone the notebooks and run them locally?
- We ended up using the `statsmodels` package in M3/M4 - collecting feedback?
- Remember this course could be the first time the majority of the participants are using/seeing pandas. We have a general intro in Module 2, but check whether a function you're using has been seen before (and explain it if not).
- List of files (probably not exhaustive) where other modules reference each module below - check references to content elsewhere are consistent.

## Module 1

- 1.1 would benefit from a discussion/exercise in the middle, or other ways to break it up.
- Define data?
- Define model?
- Long paragraphs.
- Another way of breaking it up is to switch speakers within notebooks.
- You can 'hide' some material for the further reading.
- More visuals, less text?
- Move instructor/helper guiding questions to a separate document.
- Give some advice on how to handle new data before hands-on sessions.
- Legal issues/licences.
- Is the EDI section too political? Should we word it differently?
- Dichotomy of the target variable - mention it in the hands-on questions? Check the bottom of 3.5 and also 4.3? to check for CM/CRS.
- Check that the way of describing models (1.1) is similar to Module 4.
- wrt M3 - check that the 'Modeling and Reporting' section in 1.2 foreshadows M4.
- Hands-on: do we cover missing data and label bias in M3/M4?
- Check that some of the tasks in M3/4 are foreshadowed in the M1 hands-on.

Other module files mentioning Module 1:

```
coursebook/modules/m2/hands-on-complete.ipynb
coursebook/modules/m2/hands-on.ipynb
coursebook/modules/m3/3.2-RulesOfTheGame.ipynb
coursebook/modules/m3/3.5-DataVisForExploration.ipynb
coursebook/modules/m3/3.3-Atlas0fVisualisations.ipynb
coursebook/modules/m4/m4.1.ipynb
coursebook/modules/m4/4.4_Evaluating_a_model.ipynb
coursebook/modules/m4/4.3_Building_simple_model.ipynb
coursebook/modules/m4/m4.2.md
```

## Module 2

- Title "Handling Data & Deployment" - we're not doing deployment.
- Who guarantees the data is properly collected? (In the "where to find data" section.)
- "in almost all cases, it is preferable to use an already available dataset"... Are all our projects dealing with Open Data? What if we have an RQ specific to a dataset that the PI holds? Yes, this should be open, but the sources of open data section makes it sound like we use all these 'common' open datasets for our projects.
- Can we add a question in the Legality section like "check the legality of putting EQLS data on GitHub"? They should be able to find it on the UKDS website.
- Agree with M3/4 on where feature engineering, missing data, etc. will go.
- Structure the pages into sub-topics or something? Maybe the overview will do this.
- Match the hands-on with M3.
- Make sure we're teaching use of `.str` in pandas (rough sketch below).
- Maybe we should encourage people to use git, but just for their personal work, for practice? To keep all their work in a repo.
- Section titles - Lesson 2.1, 2.2, etc.
- Summary of M2 at the beginning (bullet points with main objectives or outcomes?).
- Align with Section 3.5 - JB and JR to review 3.5 missingness.
- Think about exercise/discussion breaks (& actual breaks!) during teaching.
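For the `.str` item above, a minimal sketch of the kind of usage we mean (the column name and values here are made up, not taken from the course data):

```python
import pandas as pd

# Placeholder text column to demonstrate vectorised string methods
df = pd.DataFrame({"country": [" United Kingdom", "ireland ", "FRANCE"]})

# .str exposes string methods that act element-wise on the whole column
df["country_clean"] = df["country"].str.strip().str.title()
print(df["country_clean"].str.contains("United"))
```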
- Review use of pandas throughout the course and check nothing essential is missed from the intro.

Other module files mentioning Module 2:

```
coursebook/modules/m1/Lesson_1.2_Data_Science_project_lifecycle.ipynb
coursebook/modules/m1/Lesson_1.3_EDI_for_data_science.ipynb
coursebook/modules/m1/hands-on.ipynb
coursebook/modules/m3/3.2-RulesOfTheGame.ipynb
coursebook/modules/m3/3.3-Atlas0fVisualisations.ipynb
coursebook/modules/m3/3.5-DataVisForExploration.ipynb
coursebook/modules/m4/m4.1.ipynb
coursebook/modules/m4/m4.2.md
coursebook/modules/m4/4.3_Building_simple_model.ipynb
coursebook/modules/m4/4.4_Evaluating_a_model.ipynb
```

## Module 3

- Make more of a thing of colorblindness, with statistics on prevalence.
- Is the covid vaccination vs. state plot too political to be publicly branded as Turing? Try and refer to someone else if this has been done before.
- Module 2 people to review missingness in 3.5.
- Move to Module 2: notes on the user guide, how the data was constructed, etc.
- Justify why we're not using the 2007 data (here or in Module 2).
  - For simplicity when teaching is fine, just need to clarify.
- Justify dropping rows with missing values.
  - [ ] Identify missingness type (MCAR) for the dropped rows (3.5) - could make this a hands-on bit.
- Check with Module 2 whether they are exploring missingness in the hands-on.

TODO 28/10/21:

- [name=Callum] ~~colorblind note block, link to https://venngage.com/blog/color-blind-friendly-palette/~~ - (DONE)
- [name=Camila] ~~placeholder for uncertainty - link to Claus Wilke's section.~~ - (DONE)
- [name=Camila] ~~original source of American politics vaccination plot, and comments~~ - (DONE)
- [name=Camila] ~~change storytelling plot to one less opinionated~~ - (DONE, decided to add more context, it sounds less opinionated now.)
- ~~add GraphCrimes twitter~~ - (DONE)
- set up meeting with volunteer reviewers (and for M4)
- [name=Camila] Do a final proof read of the module
- [name=Callum] Proof read.

Other module files mentioning Module 3:

```
coursebook/modules/m1/Lesson_1.1_What_is_data_science.ipynb
coursebook/modules/m1/Lesson_1.3_EDI_for_data_science.ipynb
coursebook/modules/m4/m4.1.ipynb
coursebook/modules/m4/4.4_Evaluating_a_model.ipynb
coursebook/modules/m4/4.3_Building_simple_model.ipynb
coursebook/modules/m4/m4.2.md
```

## Module 4

- Copy elements of the data/model definitions to M1?
- Modelling for explanation vs. modelling for prediction/discriminative models.
  - 4.1 should discuss this, with respect to the research question.
- p(x) vs. log(odds) with false/true positive/negative quadrants - personally it isn't obvious to me why those quadrants are correct, but I will read the material myself to understand! (Worked reminder below.)
- Imbalanced dataset - add at least a recommendation/discussion of rebalancing in 4.3 or 4.4. Could they do it in the hands-on? (Rough sketch below.)
- Include more on different techniques? Be clearer about what the section is and what it isn't. More machine-learning-type approaches.
  - It was suggested that a couple of more machine-learning-style approaches would be good, maybe quickly using sklearn to show a couple of other techniques. But the risk here is bombarding them with black-box approaches?
- Todo for the devs:
  - [ ] Redraft M4.2 in a similar vein to 4.1, using fundamentals to build intuition for regression.
  - [ ] Add text to M4.4.
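For the p(x) vs. log(odds) quadrants point above: the mapping between the two scales is just the logit transform, so a short worked reminder might help in 4.2 (generic coefficients here, not the course's actual model):

$$
\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x
\qquad\Longleftrightarrow\qquad
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
$$

In particular, $p(x) = 0.5$ corresponds to log-odds of $0$, so a default 0.5 threshold sits at the same place on both scales; presumably the quadrants come from combining that threshold with the true label, giving true/false positives above it and true/false negatives below.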
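For the imbalanced-dataset item, a hands-on-sized sketch of the simplest rebalancing option (upsampling the minority class with pandas; the dataframe and column names are invented, not the course data):

```python
import pandas as pd

# Toy imbalanced data; 'feature' and 'target' are placeholder names
df = pd.DataFrame({
    "feature": range(10),
    "target": [0] * 8 + [1] * 2,
})

counts = df["target"].value_counts()
minority_label = counts.idxmin()

minority = df[df["target"] == minority_label]
majority = df[df["target"] != minority_label]

# Upsample the minority class with replacement so the two classes match in size,
# then shuffle the combined frame
minority_upsampled = minority.sample(len(majority), replace=True, random_state=0)
balanced = pd.concat([majority, minority_upsampled]).sample(frac=1, random_state=0)

print(balanced["target"].value_counts())
```

Whether we show something like this, use class weights instead, or keep it as a discussion point is the open question for 4.3/4.4.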
TODO 28/10/21:

- [name=Callum] finish drafting M4.2
- [name=Callum] read through the module with attention to M4.4
- set up meeting with volunteer reviewers
- [name=Callum] ~~hide graphviz code~~ - (DONE)
- [name=Callum] ~~redo statistical modelling plots in graphviz, and put into own words (or make smaller)~~ - (DONE)
- [name=Camila] make references and further reading clear at the end of the module. - (Half done, need to decide which formal textbook we want to recommend.)
- [name=Callum] if we have time, move the Bishop notebook over or paraphrase with credit attributed.
- [name=Callum] make figures smaller that don't need to be large.
- [name=Camila] ~~change or remove entirely the true positive etc. plot.~~ - (DONE, it is removed.)
- [name=Callum] make sure how to interpret coefficients in M4.2 (fitting models) is reflected in M4.3 + M4.4 (sketch at the end of this doc).
- [name=Callum] add model parsimony into M4.4
- [name=Camila] ~~Add discussion about likelihood ratio to 4.4~~ - (DONE)
- [name=Camila] Do a final proof read of 4.3 and 4.4 (and 4.1, 4.2 when they are ready).
- [name=Callum] final proof read

Other module files mentioning Module 4:

```
coursebook/modules/m3/3.2-RulesOfTheGame.ipynb
coursebook/modules/m3/3.5-DataVisForExploration.ipynb
coursebook/modules/m3/3.3-Atlas0fVisualisations.ipynb
```
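For the coefficient-interpretation item in the TODO above (and since we ended up using `statsmodels` in M3/M4), a rough sketch of the kind of snippet that could be repeated across M4.2-M4.4 so the interpretation stays consistent (toy data and variable names, not the actual course model):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data; the real sections would use the course dataset instead
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = (rng.random(200) < 1 / (1 + np.exp(-(0.5 + 1.2 * df["x"])))).astype(int)

# Logistic regression: fitted coefficients live on the log-odds scale
result = smf.logit("y ~ x", data=df).fit()
print(result.params)

# exp(beta) is the multiplicative change in the odds of y = 1
# for a one-unit increase in x (an odds ratio)
print(np.exp(result.params))
```

If M4.3/M4.4 end up quoting odds ratios, M4.2 should probably introduce `exp(coef)` explicitly so the three sections line up.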