# Challenges in basic Data Analysis education
# <a id="start"></a> LU Case Day 2024
Linda Hartman and Dmytro Perepolkin
2024-04-18, 13:15 to 17:00
:::info
💡 This notebook can be reached at
## rrr.is/lucd24
:::
This is a collaborative note-taking document. You can edit it and ask questions, share your thoughts and collaborate with each other.
---
# Data Analysis: Statistical Learning and Visualization (FMSF86/90)
<img style="float: right;" src="https://hackmd.io/_uploads/BkusWiixC.jpg" alt="listening group" width="250">
- Data-wrangling, visualization and applied machine learning
- New course, R beginners
- 120 students: IndEk2, F4, Pi4 and 2 PhD-students
- 7 weeks with 1 practice session in Teknodromen + 2 lectures
- Exam: 3 lab assignments with 3-5 Challenges to be solved and interpreted
---
<img style="float: right;" src="https://hackmd.io/_uploads/rkpSWIqeA.png" alt="beautiful image" width="240">
# Demo of introductory session - Visualization with ggplot
### Sticky notes
During the class we use sticky notes for signalling
- <span style="background-color:deeppink;color:white; font-weight: bold;">Red</span>: "I am stuck"
- <span style="background-color:yellowgreen;color:black; font-weight: bold;">Green</span>: "I am done"
After the session, please write your feedback on the notes and stick them to the whiteboard before you leave.
### Starter code
:::success
💻 Please, go to the Posit Cloud project
### https://posit.cloud/content/8033381
:::
If you are using your own system, you can copy this code to get started:
```r
library(tidyverse)
gapminder <- gapminder::gapminder
gapminder
ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp))
```
::::warning
#### 💪 Challenge 1
Do people in rich countries live longer than people in poor countries?
- What does the relationship between GDP per capita and Life expectancy look like? Is this relationship linear? Non-linear?
- Are there exceptions to the general rule (outliers)?
💡 Tip: Log-transforming the x axis makes the points less crowded. To do so, add
```{r}
+ scale_x_log10()
```
::::
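:::spoiler
<summary>Possible approach to Challenge 1</summary>

One way to explore Challenge 1 is sketched below. It is not the only route. The loess smoother, the transparency setting and the 50 000 USD cut-off for "candidate outliers" are assumptions added for illustration; they are not part of the starter code.

```r
library(ggplot2)

gapminder <- gapminder::gapminder

# Scatter plot on a log10 x axis. The smoother is an illustrative addition
# that helps judge whether the trend looks linear on this scale.
p1 <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  scale_x_log10()
p1

# Candidate outliers: observations with unusually high GDP per capita.
# The 50 000 USD threshold is an arbitrary illustration value.
outliers <- subset(gapminder, gdpPercap > 50000)
unique(outliers$country)
```

Removing `scale_x_log10()` and re-running the plot shows how the shape of the relationship changes between the raw and the log scale.
:::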
The general template for visualizing data in `ggplot2` can be summarized as follows (in pseudocode):
```
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) +
  <SCALE_FUNCTION>
```
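As an illustration, the template could be filled in like this. The choice of a boxplot and of the y-axis limits is an assumption made for the example; any geom, mapping and scale can take their place.

```r
library(ggplot2)

gapminder <- gapminder::gapminder

p2 <- ggplot(data = gapminder) +                             # <DATA>
  geom_boxplot(mapping = aes(x = continent, y = lifeExp)) +  # <GEOM_FUNCTION> with <MAPPINGS>
  scale_y_continuous(limits = c(0, 90))                      # <SCALE_FUNCTION>
p2
```

Each placeholder maps to exactly one piece of the call, so swapping `geom_boxplot()` for another geom changes the chart type without touching the data or the scale.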
::::warning
#### 💪 Challenge 2
Can you add a little color to our initial graph of life expectancy by GDP per capita?
- Color the points by continent.
- There seem to be some outliers in this graph. Can you now spot which continent these points belong to?
- Map the `size` aesthetic to population. Can you spot the trajectory of China and India on this plot?
:::spoiler
<summary>Solution</summary>
```{r}
ggplot(data = gapminder) +
  geom_point(mapping = aes(x = gdpPercap, y = lifeExp,
                           color = continent, size = pop)) +
  scale_x_log10()
```

:::
::::
### Class notes
Please, write your notes and observations here
### Final plot in class
:::spoiler
<summary>code</summary>
```{r}
#install.packages("gganimate")
#install.packages("gifski")
library(gganimate)
a1 <- ggplot(data = gapminder, aes(x = gdpPercap/1e3, y = lifeExp,
                                   size = pop/1e6, color = continent)) +
  geom_point() +
  geom_text(mapping = aes(x = 40, y = 30, label = as.character(floor(year))),
            size = 16, color = "grey") +
  scale_x_log10() +
  labs(title = "Life Expectancy vs GDP per capita over time",
       subtitle = "In the past 50 years, life expectancy has improved in most countries of the world",
       caption = "Source: Gapminder foundation, https://www.gapminder.org/data/",
       x = "GDP per capita, '000 USD",
       y = "Life expectancy, years",
       color = "Continent",
       size = "Population, mln") +
  theme_bw() +
  # here comes gganimate addition
  transition_time(year) +
  ease_aes('linear')
# to save your animation use
anim_save("my_cool_animation.gif", animation = a1)
```
:::
<img src="https://i.ibb.co/DYNB3dR/my-cool-animation.gif" alt="gapminder-animation" border="0">
[Scroll to slides](#Code-along-method)
---
# Code-along method
> Teaching is theater, not cinema. — Neal Davis
Scripting is a process, not a finished product (Selvaraj et al., 2021)[^7]
- The most effective way to teach programming is live coding. Wilson (2019)[^1]
- Live coding makes the instructor's thought process visible to the students[^4]
- Participatory live coding ("code-along") is interleaved with exercises ("challenges"). "I do, we do, you do". (Nederbragt et al., 2020)[^6]
Scaffolding and filling in
- "Let them eat cake first" approach ([video](https://www.youtube.com/watch?v=fQ4t7p6ZXDg); [slides](https://bit.ly/eat-cake-diz))
- Start with visualization. (Wang, Rush & Horton, 2017)[^3]
- Fill in the (boring) details after. (Çetinkaya-Rundel & Ellison, 2020)[^2]
- Leverage the ecosystem (tidyverse and caret)
---
# Why do we need Teknodromen
<img style="float: right;" src="https://hackmd.io/_uploads/Hk-MKiieR.jpg" alt="groups and helper" width="250">
- BIG screens with ability to show 2 sources:
    - *Main screen:* Screen-casting from the presenter
    - *Side screen:* Challenges/community notes
- Theater arrangement and easy access for helpers
- Tables (with wall screens) that encourage collaboration in small groups
Things we did not do, but should have
- *Formative assessments* activating students' theoretical knowledge *prior to* practical session
- *Self-paced exercises* in a sandboxed environment to reinforce the learning after the practical session[^5].
- Switch completely to *collaborative note-taking*
- **More Teknodromen**! Spread the same material over more sessions[^8].
[^1]: Wilson, G. (2019). Teaching Tech Together: How to Make Your Lessons Work and Build a Teaching Community around Them. CRC Press. https://teachtogether.tech/
[^2]: Çetinkaya-Rundel, M., & Ellison, V. (2020). A Fresh Look at Introductory Data Science. Journal of Statistics Education, 0(0), 1–11. https://doi.org/10.1080/10691898.2020.1804497
[^3]: Wang, X., Rush, C., & Horton, N. J. (2017). Data Visualization on Day One: Bringing Big Ideas into Intro Stats Early and Often. arXiv:1705.08544 Stat. http://arxiv.org/abs/1705.08544
[^4]: Lin, Y.-T., Yeh, M. K.-C., & Tan, S.-R. (2022). Teaching Programming by Revealing Thinking Process: Watching Experts’ Live Coding Videos With Reflection Annotations. IEEE Transactions on Education, 65(4), 617–627. https://doi.org/10.1109/TE.2022.3155884
[^5]: Masegosa, A. R., Cabañas, R., Maldonado, A. D., & Morales, M. (2024). Learning Styles Impact Students’ Perceptions on Active Learning Methodologies: A Case Study on the Use of Live Coding and Short Programming Exercises. Education Sciences, 14(3), Article 3. https://doi.org/10.3390/educsci14030250
[^6]: Nederbragt, A., Harris, R. M., Hill, A. P., & Wilson, G. (2020). Ten quick tips for teaching with participatory live coding. PLOS Computational Biology, 16(9), e1008090. https://doi.org/10.1371/journal.pcbi.1008090
[^7]: Selvaraj, A., Zhang, E., Porter, L., & Soosai Raj, A. G. (2021). Live Coding: A Review of the Literature. Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V. 1, 164–170. https://doi.org/10.1145/3430665.3456382
[^8]: Shah, A., Hogan, E., Agarwal, V., Driscoll, J., Porter, L., Griswold, W. G., & Soosai Raj, A. G. (2023). An Empirical Evaluation of Live Coding in CS1. Proceedings of the 2023 ACM Conference on International Computing Education Research - Volume 1, 1, 476–494. https://doi.org/10.1145/3568813.3600122