# Collaborative notebook

## R tidyverse for UiO Carpentry 11-04-2022

https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv

https://www.uio.no/english/services/it/network/wireless/help/uioguest.html

### Be a part of the community

User groups in Oslo for R:

Oslo UseR! https://www.meetup.com/en-AU/Oslo-useR-Group/

R-Ladies Oslo https://www.meetup.com/en-AU/rladies-oslo/

### Helpers

Everyone should (in my opinion) be familiar with the RStudio cheat sheets: https://www.rstudio.com/resources/cheatsheets/

### Code example from the course

See all tidyselect helpers by writing and running: `?tidyr_tidy_select`

```
library(tidyverse)
penguins <- read_csv("https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv")

penguins %>%
  mutate(
    bill_ratio = bill_depth_mm / bill_length_mm,
    bill_type = if_else(condition = bill_ratio < 0.5, true = "elongated", false = "stumped")
  )
```

All the lesson materials can be found at: https://athanasiamo.github.io/r-tidyverse-for-working-with-data/01-project-introduction/

### Pivoting on pairs of columns

Sometimes we work with data, e.g. aggregated survey results, that is shaped as an arbitrary number of key-value columns. That is, there are several column pairs where one column contains a descriptor and the other a value. One way to pivot them would be to write one `pivot_longer()` call for each pair, but that quickly becomes unwieldy. The code below first builds such a semi-wide data frame from `penguins` and then pivots all pairs in a single call.

```
# Build a semi-wide data frame with two name/value column pairs
penguin_semi <- penguins %>%
  mutate(id = row_number()) %>%
  pivot_longer(starts_with("bill"), names_to = "name_1", values_to = "value_1") %>%
  pivot_longer(c(flipper_length_mm, body_mass_g), names_to = "name_2", values_to = "value_2")

# Pivot every pair at once; the first five columns are kept as identifiers
penguin_semi %>%
  pivot_longer(
    -c(1:5),
    names_to = c(".value", "col"),
    names_sep = "_"
  )
```

The thing to notice here is the `.value` part in `names_to`. From the help for `pivot_longer()`:

> If `names_to` is a character containing the special `.value` sentinel, this value will be ignored, and the name of the value column will be derived from part of the existing column names.
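To see what `.value` does without the full `penguins` data, here is a minimal sketch on a small hypothetical tibble. `semi_wide`, `long` and the `pair` column are illustrative names, and the measurement values are made up; the `name_*`/`value_*` layout mirrors the semi-wide penguin data in the pivoting example.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two descriptor/value pairs per row
semi_wide <- tibble(
  id      = 1:2,
  name_1  = c("bill_length_mm", "bill_depth_mm"),
  value_1 = c(39.1, 18.7),
  name_2  = c("flipper_length_mm", "body_mass_g"),
  value_2 = c(181, 3750)
)

# names_sep = "_" splits e.g. "value_1" into "value" and "1".
# The first piece is matched to ".value", so it becomes the NAME of an
# output column (name, value); the second piece is stored in "pair".
long <- semi_wide %>%
  pivot_longer(
    cols      = -id,
    names_to  = c(".value", "pair"),
    names_sep = "_"
  )

long
```

The result has one row per original row and pair, with the columns `id`, `pair`, `name` and `value`. Passing `names_pattern = "(.*)_(.*)"` instead of `names_sep = "_"` should give the same result here; a regular expression becomes useful when a plain separator is ambiguous.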
`".value"` is a special sentinel rather than the name of a new column: it tells `pivot_longer()` that this part of each selected column name (here `name` or `value`) should itself become the name of an output column, while the remaining part (`1` or `2`) goes into the ordinary column named in `names_to` (`col`).

Consider a slightly different semi-wide data frame, where the column-name suffixes carry some identifying information (here, a colour) about each pair.

```
penguins_semi <- penguins %>%
  mutate(id = row_number()) %>%
  pivot_longer(starts_with("bill"), names_to = "name_yellow", values_to = "value_yellow") %>%
  pivot_longer(starts_with("flipper"), names_to = "name_brown", values_to = "value_brown")

penguins_semi %>%
  pivot_longer(
    cols = c(starts_with("name"), starts_with("value")),
    names_to = c(".value", "colour"),
    names_sep = "_"
  )
```

Here, instead of deselecting the columns to be pivoted with `-c(1:5)`, we use search patterns: pivot the columns whose names start with 'name' or 'value'. The resulting data frame has the same structure as in the previous example (an identifier column, here `colour`, plus `name` and `value`), but it is perhaps clearer that `.value` points back to the actual names of the columns found by our search patterns. So `.value` is not a string in the sense we would expect, but a reference to (part of) the column names.

For a nice explanation and great regex examples (finding columns based on pattern matching), see: https://stackoverflow.com/questions/61386200/how-does-the-names-to-value-convention-work-for-multiple-observations-per-row (and beware of the rabbit holes).

### Messy data - not penguins

For transforming "messy" or otherwise inconsistent column names into snake_case, use the `clean_names()` function from the {janitor} package: https://garthtarr.github.io/meatR/janitor.html

Question: Often I find that a lot of the 'work' for intro courses is done in the csv. Courses can be easy to follow because the csv you're working with is already 'tidy', but your real csvs often aren't, so I was curious whether there are any resources about the csv prep process (i.e. how to format your csv)?

> Mo: This is very true. The penguins data we used is already very tidy, so it's easy to work tidily with it.
> Preparing data for work really depends on the data and their state, so giving general advice can sometimes be hard.
> In the R for Data Science book, there is a chapter of particular interest for this topic:
> https://r4ds.had.co.nz/tidy-data.html
> It should provide some resources on how to tidy data and what principles to look out for when getting your data into a tidy format.

### Romeo networking - Language Technology and Data Analysis Laboratory (LADAL)

Someone asked about a textual analysis tutorial at https://slcladal.github.io/net.html. This tutorial is incomplete and, on its own, impossible to follow. However, it links to https://colab.research.google.com/drive/1mSzppeBA6Ai3zCmNKfkqgCd_QouRe8BM?usp=sharing&pli=1, which DOES contain the examples that are a prerequisite for running the code quoted in the tutorial. I would still say it is ... a messy tutorial, but you will probably learn what you need to. Sorry, I did not get your name, but I hope this gets to the right person.
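Returning to the messy-data question: the tidy-data chapter linked in Mo's answer works through small example tables that ship with {tidyr}. Below is a minimal sketch of one such fix, assuming {tidyr} 1.x (for `names_transform`). `table4a` stores the years 1999 and 2000 as column names, so `pivot_longer()` moves them into a proper `year` column; the object name `tidy_cases` is just illustrative.

```r
library(dplyr)
library(tidyr)

# table4a ships with tidyr: one row per country, with case counts for
# 1999 and 2000 spread across two columns whose names are data (years)
table4a

# Tidy version: one row per country-year observation
tidy_cases <- table4a %>%
  pivot_longer(
    cols            = c(`1999`, `2000`),
    names_to        = "year",
    values_to       = "cases",
    names_transform = list(year = as.integer)  # "1999" -> 1999L
  )

tidy_cases
```

The same pattern (identify which column names are really values, then pivot them into a column) applies to many real csv files once they are read in.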