Data transformation

Learning objectives

  • Identify computer programming as a form of problem solving
  • Practice decomposing an analytical goal into a set of discrete, computational tasks
  • Identify the verbs for a language of data manipulation
  • Clarify confusing aspects of data transformation from R for Data Science
  • Practice transforming data

Diamonds Example(s)

  • If you know the order of arguments, you don't have to specify the arguments explicitly
  • It's best practice to name arguments explicitly though
  1. Identify inputs
library(tidyverse)  # loads dplyr, plus ggplot2, which provides the diamonds dataset
data("diamonds")
  2. Filter
diamonds_ideal <- filter(.data = diamonds, cut == "Ideal")
# if I knew the order of arguments, I could drop the argument name and this would also work:
diamonds_ideal <- filter(diamonds, cut == "Ideal")
  • The first argument in dplyr functions is usually called .data so as not to be confused with the function data()
  3. Summarize
summarize(.data = diamonds_ideal, avg_price = mean(price))
  • To calculate statistics for sub-groups in your dataset, use group_by() first: it defines a grouping structure, and summarize() then returns one row per group
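
As a rough sketch of the group_by() step, the summary above can be computed for every cut at once rather than for "Ideal" only. The object names diamonds_by_cut and price_by_cut are made up for this illustration and do not come from the slides.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Define the grouping structure: one group per value of cut
diamonds_by_cut <- group_by(.data = diamonds, cut)

# summarize() now returns one row per group instead of a single row
price_by_cut <- summarize(.data = diamonds_by_cut, avg_price = mean(price))
price_by_cut
```

The result should have one row for each level of cut, with the average price in the avg_price column.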

Tidyverse

We <3 Hadley Wickham

The functions to know are in the slides

  • You can use British English or American English spellings: summarise() and summarize() are the same function
  • Both = and <- assign a name to an object; <- is the usual convention in R
  • The == operator tests for equality (as in cut == "Ideal"); it does not assign anything
  • Imagine the %>% operator as the coding equivalent of the words "and then"
    EX. flights %>% group_by(dest)
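
To make the "and then" reading concrete, here is a minimal sketch that rewrites the diamonds example with the pipe. It uses diamonds rather than flights so that it needs only dplyr and ggplot2. The names ideal_piped and ideal_nested are hypothetical, chosen for this illustration.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Read as: take diamonds, and then keep the Ideal cuts,
# and then compute the average price
ideal_piped <- diamonds %>%
  filter(cut == "Ideal") %>%
  summarize(avg_price = mean(price))

# The same steps without the pipe, written as nested function calls
ideal_nested <- summarize(filter(diamonds, cut == "Ideal"), avg_price = mean(price))

# Both versions should give the same average price
all.equal(ideal_piped$avg_price, ideal_nested$avg_price)
```

The pipe passes the result on its left as the first argument (.data) of the function on its right, which is why the piped steps read in the order they happen.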