# Data transformation

## Learning objectives

* Identify computer programming as a form of problem solving
* Practice decomposing an analytical goal into a set of discrete, computational tasks
* Identify the verbs for a language of data manipulation
* Clarify confusing aspects of data transformation from [R for Data Science](http://r4ds.had.co.nz/transform.html)
* Practice transforming data

---

### Diamonds Example(s)

- If you know the order of a function's arguments, you don't have to name them explicitly
- It's best practice to name arguments explicitly, though

1. Identify inputs

```
library(tidyverse)  # loads dplyr and ggplot2 (ggplot2 provides the diamonds dataset)
data("diamonds")
```

2. Filter

```
diamonds_ideal <- filter(.data = diamonds, cut == "Ideal")
# because .data is the first argument, it can also be passed by position:
diamonds_ideal <- filter(diamonds, cut == "Ideal")
```

- The first argument in dplyr functions is usually called `.data` so as not to be confused with the function `data()`

3. Summarize

```
summarize(.data = diamonds_ideal, avg_price = mean(price))
```

- To calculate statistics for sub-groups in your dataset, use `group_by()` (which defines a grouping structure) before `summarize()`

----

### `Tidyverse`

#### We <3 Hadley Wickham

**The functions to know are in the slides**

- You can use British English or American English spellings (for example, `summarise()` and `summarize()` are the same function)
- Both `=` and `<-` assign a name to an object
- The `==` operator tests for equality
- Imagine the `%>%` operator as the coding equivalent of the words "and then"

EX. `flights %>% group_by(dest)`
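
To tie the notes on `group_by()` and `%>%` together, here is a minimal sketch that computes the average price for every cut of diamond in one chain. It assumes the `ggplot2` and `dplyr` packages are installed; the names `avg_price_by_cut` and `n_diamonds` are made up for this illustration and do not come from the slides.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides group_by(), summarize(), n(), and the %>% pipe

# Read the chain as: take diamonds, AND THEN group by cut,
# AND THEN summarize each group.
avg_price_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarize(
    avg_price  = mean(price),  # mean price within each cut
    n_diamonds = n()           # number of rows within each cut
  )

avg_price_by_cut
```

The result has one row per level of `cut`, so the row for `Ideal` should match the value from the filter-then-summarize steps in the Diamonds example. Writing the chain with `%>%` avoids saving an intermediate object such as `diamonds_ideal` for every group.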