---
tags: ucsd-carpentries
---

# 2022 UC Carpentries Fall Workshop (OpenRefine)

**Workshop Details**
Dates: September 6th - 13th, 2022
Time: 9am - 12pm

**Workshop Agenda:** https://ucsdlib.github.io/2022-09-06-carpentries-uc/

## Day 6: OpenRefine

**Software Installation:** http://openrefine.org/download.html
* Download and install the latest version of OpenRefine
    * Windows kit zip file for Windows
    * Mac kit for macOS

**Lesson Data (download)** https://raw.githubusercontent.com/LibraryCarpentry/lc-open-refine/gh-pages/data/doaj-article-sample.csv
* Right click and “save as” a .csv file to your desktop

## NOTES:

**Introduction to OpenRefine**
Intro Slides: https://docs.google.com/presentation/d/1XaF9x9243BOSktfMS8YMhiPohe525HWplhr7FG7PXjw/edit?usp=sharing
Extensions and Distributions can be found here: http://openrefine.org/download.html

**Importing data into OpenRefine**

**Layout of OpenRefine, Rows vs Records**

**Faceting and filtering**
Scatterplot facets are less commonly used. For further information on these see the tutorial at https://web.archive.org/web/20190105063215/http://enipedia.tudelft.nl/wiki/OpenRefine_Tutorial#Exploring_the_data_with_scatter_plots.

**Clustering**
For more information on the methods used to create Clusters, see https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth

**Working with columns and sorting**

**Introduction to Transformations**
Full documentation for GREL is available at https://docs.openrefine.org/manual/grelfunctions.

**Writing Transformations**

**Transformations - Undo and Redo**

**Transforming Strings, Numbers, Dates and Booleans**

**Transformations - Handling Arrays**

**Exporting data**

**Looking Up Data**
CrossRef API: https://github.com/CrossRef/rest-api-doc. Read more about the CrossRef service: http://www.crossref.org.
OpenRefine has a function for extracting data from JSON (sometimes referred to as ‘parsing’ the JSON).
The ‘parseJson’ function is explained in more detail at https://docs.openrefine.org/manual/grelfunctions/#format-based-functions-json-html-xml.

The official User Manual provides detailed information about the reconciliation feature: https://docs.openrefine.org/manual/reconciling

One of the most common ways of using the reconciliation option in OpenRefine is with an extension (see below for more on extensions to OpenRefine) which can use linked data sources for reconciliation. The RDF extension by Stuart Kenny can be downloaded from https://github.com/stkenny/grefine-rdf-extension/releases. Other extensions are available to do reconciliation against local data such as CSV files (see http://okfnlabs.org/reconcile-csv/) and maintained lists of values (see http://okfnlabs.org/projects/nomenklatura/index.html).

For more information on using Reconciliation services see https://docs.openrefine.org/manual/reconciling.

A list of Extensions (not necessarily complete) is given on the OpenRefine downloads page at http://openrefine.org/download.html.

## Workshop Day 6

### First name and Last Name/Organization/Dept./Email

| Name (first & last) | Organization | Dept. | Email |
| ------------------- | ------------ | ----- | ----- |
| (example) Jane Doe | UCSD | IT | jdoe1@ucsd.edu |
| Roberto Silva | UCSD | SIO | Rosilva@ucsd.edu |
| Kenan Chan | UCSD | SIO | kmc001@ucsd.edu |
| Tom Le | UCM | | tle267@ucmerced.edu |
| Nicole Rosenberg | UCSD | Scripps | nrosenberg@ucsd.edu |
| Aleks Leszczynska | UCSD | | aleszczynska@ucsd.edu |
| Edgar Reyna | UCLA | Urban Planning | eareyna@ucla.edu |
| Oishee Misra | UCSD | Economics | omisra@ucsd.edu |
| Marta Sala Climent | UCSD | Medicine | msalacliment@health.ucsd.edu |
| Charles Faulhaber | UCB | Spanish & Portuguese | cbf@berkeley.edu |
| Becky Miller | UCB | Library | rcmiller@berkeley.edu |
| Apisit Kaewsanit | UCSF | Epidemiology and Biostats | apisit.kaewsanit@ucsf.edu |
| Kyle Rokes | UCSB | | kyle_rokes@ucsb.edu |
| Dayana Elizalde | UCR | | deliz002@ucr.edu |
| Alissa Jae Lazo-Kim | UCSD | DBMI internship | alazokim@health.ucsd.edu |
| Shang Su | U Toledo | Cell and Cancer Biology | shang.su@utoledo.edu |
| Jessica Wu-Woods | UCR | Microbiology | jwuw001@ucr.edu |

## Day 6 Exercises

1. **Splitting Subjects into separate cells**
   What separator character is used in the Subjects cells? How would you split these subject words into individual cells?
2. **Joining the Subjects column back together**
   Using what we’ve learned, join the Subjects back together.
3. **Which licences are used for articles in this file?**
   Use a text facet for the licence column and answer these questions: What is the most common Licence in the file? How many articles in the file don’t have a licence assigned?
4. **Find all publications without a DOI**
   Use the Facet by blank function to find all publications in this data set without a DOI.
5. **Correct the Language values via a facet**
   Create a Text facet on the language column and correct the variation in the EN and English values.
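
The exercises above are meant to be done in the OpenRefine interface. For anyone who wants to cross-check their answers outside OpenRefine, the same logic can be sketched in plain Python. This is only an illustrative sketch and not part of the lesson: the three sample rows are made up, the column names (`Subjects`, `Licence`, `DOI`, `Language`) are assumed to match the lesson CSV, and the `|` separator is an assumption you should verify against the Subjects cells in your own project.

```python
from collections import Counter

# Hypothetical sample rows; the real lesson file has more columns and many more rows.
rows = [
    {"Subjects": "biology|ecology", "Licence": "CC BY", "DOI": "10.1000/example1", "Language": "EN"},
    {"Subjects": "physics", "Licence": "", "DOI": "", "Language": "English"},
    {"Subjects": "chemistry|materials|optics", "Licence": "CC BY", "DOI": "10.1000/example2", "Language": "EN"},
]

SEPARATOR = "|"  # assumed separator; check your own Subjects cells


def split_subjects(row):
    """Exercise 1: split one Subjects cell into a list of values."""
    return row["Subjects"].split(SEPARATOR)


def join_subjects(values):
    """Exercise 2: join the values back into a single cell."""
    return SEPARATOR.join(values)


def licence_counts(rows):
    """Exercise 3: count each licence value, similar to a text facet."""
    return Counter(row["Licence"] or "(blank)" for row in rows)


def rows_without_doi(rows):
    """Exercise 4: keep rows whose DOI is empty, similar to Facet by blank."""
    return [row for row in rows if not row["DOI"].strip()]


def normalise_language(rows, variants=("English",), target="EN"):
    """Exercise 5: map variant spellings of a language to one value."""
    return [target if row["Language"] in variants else row["Language"] for row in rows]


split = [split_subjects(row) for row in rows]
rejoined = [join_subjects(values) for values in split]
counts = licence_counts(rows)
missing = rows_without_doi(rows)
languages = normalise_language(rows)

print(split)
print(counts.most_common(1))
print(len(missing), "row(s) without a DOI")
print(languages)
```

To try this on the lesson data, the hard-coded `rows` list could be replaced with rows read by `csv.DictReader` from the downloaded file, after confirming that the header names match the ones assumed here.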

## Day 6 Reflection

Please enter how you will use OpenRefine in your work or research here:
1. To organize metadata
2. Clean up data
3. To reconcile PhiloBiblon project data with VIAF and Wikidata

## Day 6 Questions:

Please enter any questions not answered during the live session here:
1.

## OpenRefine Resources:

* Official wiki list of OpenRefine External Resources: https://github.com/OpenRefine/OpenRefine/wiki/External-Resources
* Getting started with OpenRefine by Thomas Padilla: http://thomaspadilla.org/dataprep/
* Cleaning Data with OpenRefine by Seth van Hooland, Ruben Verborgh and Max De Wilde: http://programminghistorian.org/en/lessons/cleaning-data-with-openrefine
* Blog posts on using OpenRefine from Owen Stephens: http://www.meanboyfriend.com/overdue_ideas/tag/openrefine/?orderby=date&order=ASC
* Identifying potential headings for Authority work using III Sierra, MS Excel and OpenRefine: https://epublications.marquette.edu/lib_fac/81/
* Free your metadata website: https://freeyourmetadata.org/
* Data Munging Tools in Preparation for RDF: Catmandu and LODRefine by Christina Harlow: https://journal.code4lib.org/articles/11013
* Cleaning Data with OpenRefine by John Little: https://libjohn.github.io/openrefine/
* OpenRefine Blog: https://openrefine.org/category/blog.html

### End Day 6

Feedback form: https://forms.gle/5qgx8X6H3GRMacwD6