Exploratory data analysis

Learning objectives

  • Define exploratory data analysis (EDA) and the types of patterns it looks for
  • Demonstrate graph types useful for EDA and the precautions to take when interpreting them
  • Practice exploring data

Approaches to EDA

  • Look for variation and covariation
  • Charts don't need to be pretty during exploration (add titles, legends, themes, etc. only when preparing them for publication)

Variation

  • Dispersion of a single variable
  • Histograms
    • Continuous variables
    • Bin width is important! Different widths can reveal or hide patterns
    • Adding a rug plot shows the 1D marginal distribution as individual tick marks (best used with smaller, high-variance datasets)
  • Bar Chart
    • Categorical variables
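To see why bin width matters, here is a minimal text-only sketch using only the Python standard library. It bins some hypothetical wait times (in minutes) at two different widths, then counts a categorical variable the way a bar chart would. The function names and data are illustrative assumptions, not part of any plotting library.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values per bin. Bins are half-open [left, left + bin_width)
    starting at min(values). Returns a list of (left_edge, count) pairs."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    lo = min(values)
    n_bins = int((max(values) - lo) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - lo) // bin_width)] += 1
    return [(lo + i * bin_width, c) for i, c in enumerate(counts)]

def bar_counts(categories):
    """Frequency of each category, most common first (what a bar chart shows)."""
    return Counter(categories).most_common()

# Hypothetical data: two clusters of wait times with a gap between them.
wait_minutes = [1, 2, 2, 3, 3, 3, 4, 8, 9, 9, 10, 10, 10, 11]

# Print a crude text histogram at each bin width.
for width in (2, 5):
    print(f"bin width = {width}")
    for left, count in histogram_counts(wait_minutes, width):
        print(f"  [{left:>2}, {left + width:>2}) {'#' * count}")

print(bar_counts(["dog", "cat", "dog", "bird", "dog", "cat"]))
```

With a width of 2 the gap between the two clusters is visible (counts 3, 4, 0, 1, 5, 1). With a width of 5 the counts become 7, 6, 1, and the same data look like a single skewed distribution.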

Covariation

  • Relationship between two or more variables (e.g., one on the x axis, one on the y axis)
  • Categorical on x, continuous on y -> boxplot
  • Two continuous variables -> scatterplot
  • Add more variables by faceting (one panel per value of a categorical variable) or by layering them on a single graph
    • When layering, use different visual channels (e.g., color, shape, size) to distinguish patterns
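The numbers behind the two covariation plots can be sketched with the Python standard library. This is a minimal sketch, assuming the categorical/continuous case arrives as (category, value) pairs and the continuous/continuous case as two equal-length lists. The function names and data are hypothetical. Pearson correlation is a swapped-in numeric companion to the scatterplot, not a replacement for it, and quartile conventions differ between plotting libraries, so box edges may not match a given tool exactly.

```python
import math
import statistics
from collections import defaultdict

def five_number_summary(values):
    """(min, Q1, median, Q3, max): the five-number summary behind a boxplot.
    Quartiles use the 'inclusive' method. Many plotting libraries stop the
    whiskers at 1.5 * IQR and draw points beyond them as outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def summarize_by_group(pairs):
    """Group (category, value) pairs and summarize each group: one box per category."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {category: five_number_summary(vals) for category, vals in groups.items()}

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric lists.
    Measures only *linear* association, so always look at the scatterplot too.
    Raises ZeroDivisionError if either variable is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a categorical x with a continuous y (boxplot case) ...
print(summarize_by_group([("a", 1), ("a", 2), ("a", 3), ("a", 4), ("a", 5),
                          ("b", 10), ("b", 20), ("b", 30)]))
# ... and two continuous variables (scatterplot case).
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

A correlation near 0 does not mean "no relationship": a strongly curved pattern can still produce a small r, which is one reason to plot before summarizing.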