Workshop Details
Dates: September 6th - 13th, 2022
Time: 9am - 12pm
Workshop Agenda:
https://ucsdlib.github.io/2022-09-06-carpentries-uc/
Software Installation:
http://openrefine.org/download.html
Lesson Data (download)
https://raw.githubusercontent.com/LibraryCarpentry/lc-open-refine/gh-pages/data/doaj-article-sample.csv
Introduction to OpenRefine
Intro Slides: https://docs.google.com/presentation/d/1XaF9x9243BOSktfMS8YMhiPohe525HWplhr7FG7PXjw/edit?usp=sharing
Extensions and Distributions can be found here: http://openrefine.org/download.html
Importing data into OpenRefine
Layout of OpenRefine, Rows vs Records
Faceting and filtering
Scatterplot facets are less commonly used. For further information on these see the tutorial at https://web.archive.org/web/20190105063215/http://enipedia.tudelft.nl/wiki/OpenRefine_Tutorial#Exploring_the_data_with_scatter_plots.
Clustering
For more information on the methods used to create Clusters, see https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth
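As a rough illustration of how a key-collision method such as "fingerprint" can group variant spellings, here is a minimal Python sketch. It is an approximation written for these notes, not OpenRefine's own implementation, and the sample names are invented.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate a fingerprint key: normalise case, accents, punctuation and word order."""
    # Trim surrounding whitespace and lowercase the value.
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop duplicate tokens, sort them, and rejoin.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a key; keep only groups with more than one distinct value."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical author names, invented for illustration only.
names = ["Doe, Jane", "Jane Doe", "jane doe ", "José García", "Garcia, Jose", "John Smith"]
print(cluster(names))
```

Values that differ only in case, punctuation, accents, or word order end up with the same key, which is why they are suggested as one cluster. "John Smith" has no variants here, so it is not reported.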
Working with columns and sorting
Introduction to Transformations
Full documentation for GREL (the General Refine Expression Language) is available at https://docs.openrefine.org/manual/grelfunctions.
Writing Transformations
Transformations - Undo and Redo
Transforming Strings, Numbers, Dates and Booleans
Transformations - Handling Arrays
Exporting data
Looking Up Data
CrossRef API: https://github.com/CrossRef/rest-api-doc
Read more about the CrossRef service: http://www.crossref.org.
OpenRefine has a function for extracting data from JSON (sometimes referred to as ‘parsing’ the JSON). The ‘parseJson’ function is explained in more detail at https://docs.openrefine.org/manual/grelfunctions/#format-based-functions-json-html-xml.
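To show what "parsing" JSON means outside OpenRefine, here is a small Python sketch that does roughly what a `parseJson` call followed by field access would do. The response text and its field names are made up for illustration and are not real lookup results.

```python
import json

# Made-up response text shaped loosely like an API reply; not real data.
cell_value = (
    '{"status": "ok", "message": {"title": ["An Example Article"], '
    '"ISSN": ["1234-5678", "8765-4321"]}}'
)

# Turn the JSON text into nested dictionaries and lists.
parsed = json.loads(cell_value)

# Walk into the structure: the first element of the "title" array.
title = parsed["message"]["title"][0]

# Join every element of the "ISSN" array into a single string.
issns = "|".join(parsed["message"]["ISSN"])

print(title)
print(issns)
```

The same pattern applies in GREL: parse the cell value first, then step through the field names and array positions to reach the piece of data you want.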
The official User Manual provides detailed information about the reconciliation feature: https://docs.openrefine.org/manual/reconciling
One of the most common ways of using the reconciliation option in OpenRefine is with an extension (see below for more on extensions to OpenRefine) which can use linked data sources for reconciliation. The RDF extension by Stuart Kenny can be downloaded from https://github.com/stkenny/grefine-rdf-extension/releases.
Other extensions are available to do reconciliation against local data such as CSV files (see http://okfnlabs.org/reconcile-csv/) and maintained lists of values (see http://okfnlabs.org/projects/nomenklatura/index.html).
For more information on using Reconciliation services see https://docs.openrefine.org/manual/reconciling.
A list of Extensions (not necessarily complete) is given on the OpenRefine downloads page at http://openrefine.org/download.html.
| Name (first & last) | Organization | Dept. | Email |
|---|---|---|---|
| (example) Jane Doe | UCSD | IT | jdoe1@ucsd.edu |
| Roberto Silva | UCSD | SIO | Rosilva@ucsd.edu |
| | UCSD | SIO | kmc001@ucsd.edu |
| Tom Le | UCM | | tle267@ucmerced.edu |
| Nicole Rosenberg | UCSD | Scripps | nrosenberg@ucsd.edu |
| Aleks Leszczynska | UCSD | | aleszczynska@ucsd.edu |
| Edgar Reyna | UCLA | Urban Planning | eareyna@ucla.edu |
| Oishee Misra | UCSD | Economics | omisra@ucsd.edu |
| Marta Sala Climent | UCSD | Medicine | msalacliment@health.ucsd.edu |
| Charles Faulhaber | UCB | Spanish & Portuguese | cbf@berkeley.edu |
| Becky Miller | UCB | Library | rcmiller@berkeley.edu |
| Apisit Kaewsanit | UCSF | Epidemiology and Biostats | apisit.kaewsanit@ucsf.edu |
| KYLE ROKES | UCSB | | kyle_rokes@ucsb.edu |
| Dayana Elizalde | UCR | | deliz002@ucr.edu |
| Alissa Jae Lazo-Kim | UCSD | DBMI internship | alazokim@health.ucsd.edu |
| Shang Su | U Toledo | Cell and Cancer Biology | shang.su@utoledo.edu |
| Jessica Wu-Woods | UCR | Microbiology | jwuw001@ucr.edu |
Splitting Subjects into separate cells
What separator character is used in the Subjects cells?
How would you split these subject words into individual cells?
Joining the Subjects column back together
Using what we’ve learned, now join the Subjects column back together.
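For comparison, splitting a multi-valued cell and joining it back together can be sketched in Python as below. The `|` separator and the sample cell are assumptions made for illustration; check the separator in your own data before relying on it.

```python
# Hypothetical Subjects cell; the "|" separator is an assumption.
SEPARATOR = "|"
subjects_cell = "Medicine|Public Health | Epidemiology"

# Split the cell into individual values and trim stray whitespace,
# similar in spirit to splitting multi-valued cells in OpenRefine.
subjects = [s.strip() for s in subjects_cell.split(SEPARATOR)]

# Join the values back into one cell using the same separator,
# similar in spirit to joining multi-valued cells.
rejoined = SEPARATOR.join(subjects)

print(subjects)
print(rejoined)
```

Splitting first makes each subject available for cleaning on its own. Joining with the same separator restores the original one-row-per-article shape.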
Which licences are used for articles in this file?
Use a text facet for the licence column and answer these questions:
What is the most common Licence in the file?
How many articles in the file don’t have a licence assigned?
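A text facet is essentially a count of each distinct value in a column. The Python sketch below mimics that idea on a handful of invented licence values; the counts are not those of the lesson data.

```python
from collections import Counter

# Hypothetical licence values, one per article; empty strings stand for blank cells.
licences = ["CC BY", "CC BY", "CC BY-NC", "", "CC BY", "CC BY-NC-ND", ""]

# Count each distinct value, labelling empty cells "(blank)" as a facet would.
facet = Counter(value if value else "(blank)" for value in licences)

# The most frequent value and how often it appears.
most_common_value, most_common_count = facet.most_common(1)[0]

# How many rows have no licence at all.
blank_count = facet["(blank)"]

print(facet)
print(most_common_value, most_common_count, blank_count)
```

In OpenRefine the same answers can be read from the facet panel: sort the facet by count to find the most common value, and look for the "(blank)" entry to see how many rows have none.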
Find all publications without a DOI
Use the Facet by blank function to find all publications in this data set without a DOI.
Correct the Language values via a facet
Create a Text facet on the language column and correct the variation in the EN and English values.
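Editing a value in a facet applies one correction to every matching row. A minimal Python sketch of the same idea, using invented language values and a hypothetical `corrections` mapping, might look like this:

```python
# Hypothetical Language values showing the kind of variation a text facet reveals.
languages = ["EN", "English", "en", "FR", "English ", "ES"]

# Map each known variant (trimmed and lowercased) to the value to keep.
# Which spelling to standardise on is a choice; "EN" is assumed here.
corrections = {"en": "EN", "english": "EN"}

# Apply the correction where one exists; otherwise keep the trimmed original.
cleaned = [corrections.get(v.strip().lower(), v.strip()) for v in languages]

print(cleaned)
```

Values with no entry in the mapping are left alone, so only the variants you have explicitly reviewed are changed.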
Please enter how you will use OpenRefine in your work or research here:
Please enter any questions not answered during live session here:
1.
Official wiki List of OpenRefine External Resources: https://github.com/OpenRefine/OpenRefine/wiki/External-Resources
Getting started with OpenRefine by Thomas Padilla: http://thomaspadilla.org/dataprep/
Cleaning Data with OpenRefine by Seth van Hooland, Ruben Verborgh and Max De Wilde: http://programminghistorian.org/en/lessons/cleaning-data-with-openrefine
Blog posts on using OpenRefine from Owen Stephens: http://www.meanboyfriend.com/overdue_ideas/tag/openrefine/?orderby=date&order=ASC
Identifying potential headings for Authority work using III Sierra, MS Excel and OpenRefine: https://epublications.marquette.edu/lib_fac/81/
Free your metadata website: https://freeyourmetadata.org/
Data Munging Tools in Preparation for RDF: Catmandu and LODRefine by Christina Harlow: https://journal.code4lib.org/articles/11013
Cleaning Data with OpenRefine by John Little: https://libjohn.github.io/openrefine/
OpenRefine Blog: https://openrefine.org/category/blog.html
Feedback form:
https://forms.gle/5qgx8X6H3GRMacwD6