# Challenges in basic Data Analysis education
<a id="start"></a>

LU Case Day 2024
Linda Hartman and Dmytro Perepolkin
2024-04-18, 13:15 to 17:00

:::info
💡 This notebook can be reached at
## rrr.is/lucd24
:::

This is a collaborative note-taking document. You can edit it to ask questions, share your thoughts and collaborate with each other.

---

# Data Analysis: Statistical Learning and Visualization (FMSF86/90)

<img style="float: right;" src="https://hackmd.io/_uploads/BkusWiixC.jpg" alt="listening group" width="250">

- Data-wrangling, visualization and applied machine learning
- New course, R beginners
- 120 students: IndEk2, F4, Pi4 and 2 PhD-students
- 7 weeks with 1 practice session in Teknodromen + 2 lectures
- Exam: 3 lab assignments with 3-5 Challenges to be solved and interpreted

---

<img style="float: right;" src="https://hackmd.io/_uploads/rkpSWIqeA.png" alt="beautiful image" width="240">

# Demo of introductory session

- Visualization with ggplot

### Sticky notes

During the class we use sticky notes for signalling:

- <span style="background-color:deeppink;color:white; font-weight: bold;">Red</span>: "I am stuck"
- <span style="background-color:yellowgreen;color:black; font-weight: bold;">Green</span>: "I am done"

After the session, please write your feedback on the notes and stick them to the whiteboard before you leave.

### Starter code

:::success
💻 Please, go to the Posit Cloud project
### https://posit.cloud/content/8033381
:::

If you are using your own system, you can copy this code to get started:

```r
library(tidyverse)

# Load the gapminder dataset and print it to inspect the columns
gapminder <- gapminder::gapminder
gapminder

# Scatter plot of life expectancy against GDP per capita
ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp))
```

::::warning
#### 💪 Challenge 1

Do people in rich countries live longer than people in poor countries?

- What does the relationship between GDP per capita and life expectancy look like? Is this relationship linear? Non-linear?
- Are there exceptions to the general rule (outliers)?

💡 Tip: You can make the points less crowded by log-transforming the x axis. To do so, add

```{r}
+ scale_x_log10()
```
::::

The general template for visualizing data with `ggplot2` can be summarized as follows (in pseudocode):

```
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) +
  <SCALE_FUNCTION>
```

::::warning
#### 💪 Challenge 2

Can you add a little color to our initial graph of life expectancy by GDP per capita?

- Color the points by continent.
- There seem to be some outliers in this graph. Can you now spot which continent these points belong to?
- Map the `size` aesthetic to population. Can you spot the trajectory of China and India on this plot?

:::spoiler
<summary>Solution</summary>

```{r}
ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  scale_x_log10()
```

![temp_gapminder](https://hackmd.io/_uploads/BJdzqwqeC.png)
:::
::::

### Class notes

Please write your notes and observations here.

### Final plot in class

:::spoiler
<summary>code</summary>

```{r}
#install.packages("gganimate")
#install.packages("gifski")
library(gganimate)

a1 <- ggplot(data = gapminder,
             aes(x = gdpPercap/1e3, y = lifeExp, size = pop/1e6, color = continent)) +
  geom_point() +
  # large grey label showing the year of the current frame
  geom_text(mapping = aes(x = 40, y = 30, label = as.character(floor(year))),
            size = 16, color = "grey") +
  scale_x_log10() +
  labs(title = "Life Expectancy vs GDP per capita over time",
       subtitle = "In the past 50 years, life expectancy has improved in most countries of the world",
       caption = "Source: Gapminder foundation, https://www.gapminder.org/data/",
       x = "GDP per capita, '000 USD",
       y = "Life expectancy, years",
       color = "Continent",
       size = "Population, mln") +
  theme_bw() +
  # here comes gganimate addition
  transition_time(year) +
  ease_aes('linear')

# to save your animation use
anim_save("my_cool_animation.gif", animation = a1)
```
:::

<img src="https://i.ibb.co/DYNB3dR/my-cool-animation.gif" alt="gapminder-animation" border="0">

[Scroll to slides](#Code-along-method)

---

# Code-along method

> Teaching is theater, not cinema.
> -- Neal Davis

Scripting is a process, not a finished product (Selvaraj et al., 2021)[^7]

- The most effective way to teach programming is live coding (Wilson, 2019)[^1]
- Live coding makes the instructor's thought process visible to the students (Lin et al., 2022)[^4]
- Participatory live coding ("code-along") is interleaved with exercises ("challenges"): "I do, we do, you do" (Nederbragt et al., 2020)[^6]

Scaffolding and filling in

- "Let them eat cake first" approach ([video](https://www.youtube.com/watch?v=fQ4t7p6ZXDg); [slides](https://bit.ly/eat-cake-diz))
- Start with visualization (Wang, Rush & Horton, 2017)[^3]
- Fill in the (boring) details after (Çetinkaya-Rundel & Ellison, 2020)[^2]
- Leverage the ecosystem (tidyverse and caret)

---

# Why do we need Teknodromen

<img style="float: right;" src="https://hackmd.io/_uploads/Hk-MKiieR.jpg" alt="groups and helper" width="250">

- BIG screens with the ability to show 2 sources:
    - *Main screen:* Screen-casting from the presenter
    - *Side screen:* Challenges/community notes
- Theater arrangement and easy access for helpers
- Tables (with wall screens) that encourage collaboration in small groups

Things we did not do, but should have

- *Formative assessments* activating students' theoretical knowledge *prior to* the practical session
- *Self-paced exercises* in a sandboxed environment to reinforce the learning after the practical session[^5]
- Switch completely to *collaborative note-taking*
- **More Teknodromen**! Spread the same material over more sessions[^8]

[^1]: Wilson, G. (2019). Teaching Tech Together: How to Make Your Lessons Work and Build a Teaching Community around Them. CRC Press. https://teachtogether.tech/

[^2]: Çetinkaya-Rundel, M., & Ellison, V. (2020). A Fresh Look at Introductory Data Science. Journal of Statistics Education, 0(0), 1–11. https://doi.org/10.1080/10691898.2020.1804497

[^3]: Wang, X., Rush, C., & Horton, N. J. (2017). Data Visualization on Day One: Bringing Big Ideas into Intro Stats Early and Often. arXiv:1705.08544 Stat. http://arxiv.org/abs/1705.08544

[^4]: Lin, Y.-T., Yeh, M. K.-C., & Tan, S.-R. (2022). Teaching Programming by Revealing Thinking Process: Watching Experts’ Live Coding Videos With Reflection Annotations. IEEE Transactions on Education, 65(4), 617–627. https://doi.org/10.1109/TE.2022.3155884

[^5]: Masegosa, A. R., Cabañas, R., Maldonado, A. D., & Morales, M. (2024). Learning Styles Impact Students’ Perceptions on Active Learning Methodologies: A Case Study on the Use of Live Coding and Short Programming Exercises. Education Sciences, 14(3), Article 3. https://doi.org/10.3390/educsci14030250

[^6]: Nederbragt, A., Harris, R. M., Hill, A. P., & Wilson, G. (2020). Ten quick tips for teaching with participatory live coding. PLOS Computational Biology, 16(9), e1008090. https://doi.org/10.1371/journal.pcbi.1008090

[^7]: Selvaraj, A., Zhang, E., Porter, L., & Soosai Raj, A. G. (2021). Live Coding: A Review of the Literature. Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V. 1, 164–170. https://doi.org/10.1145/3430665.3456382

[^8]: Shah, A., Hogan, E., Agarwal, V., Driscoll, J., Porter, L., Griswold, W. G., & Soosai Raj, A. G. (2023). An Empirical Evaluation of Live Coding in CS1. Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1, 1, 476–494. https://doi.org/10.1145/3568813.3600122