<!-- .slide: style="font-size: 30px;" -->
# OpenRefine and spreadsheets - recap

---

## Good practices

- Reproducible data
- Tidy data
- Use static, universal file formats (`.csv` or `.tsv`)
- Do not change the original data set

---

## Spreadsheets (key ideas)

---

### Things spreadsheets do well

- Data entry
- Data validation
- Data wrangling

---

### Spreadsheet pitfalls

- Reproducible data analysis is difficult
- Spreadsheets do not enforce good data habits
- Formatting

---

## OpenRefine (key ideas)

---

### Things we can use OpenRefine for

- See an overview of a data set
- Resolve inconsistencies
- Split data into granular parts
- Match local data with other data sets
- Save a set of data cleaning steps for replay on multiple files

---

## OpenRefine features

- OpenRefine automatically keeps a log of every change you make
- OpenRefine does not allow you to modify your original file
- Any operation can be undone
- OpenRefine can repeat your steps for more than one data set
- OpenRefine provides a user-friendly interface for complex data work
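
---

The good practices above (leave the original data untouched, write results to a plain `.csv`) can also be sketched outside OpenRefine. This is only a minimal illustration using Python's standard `csv` module; the function names `clean_rows` and `to_csv` and the sample data are hypothetical and not part of the lesson material.

```python
import csv
import io

# Hypothetical sample standing in for the contents of an untouched raw file.
RAW = "species,count\n Parus major ,3\nparus major,2\n"


def clean_rows(raw_text):
    """Return cleaned copies of the rows; the raw text itself is never changed."""
    reader = csv.DictReader(io.StringIO(raw_text))
    # Strip stray whitespace around every value, one simple kind of inconsistency.
    return [{key: value.strip() for key, value in row.items()} for row in reader]


def to_csv(rows, fieldnames):
    """Serialise rows as CSV text, a static and universal format."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


cleaned = clean_rows(RAW)
print(to_csv(cleaned, ["species", "count"]))
```

In practice the cleaned text would be saved under a new file name, so the raw file stays as it was collected and the cleaning step can be rerun at any time.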