2018-UCSD-Library-Carpentry
These notes have been locked by the instructors.
September 14, 2018
9am - 4pm
UC San Diego Biomedical Library Building, Classroom 4
Sign In: Name, Program/Dept, Affiliation (academic, staff, student)
Reid Otsuji, RDCP/DLDP, Librarian
Ryan Johnson, Metadata Services/RDCP, Librarian
Mary Linn Bergstrom, ALP, RDCP, Librarian
Amanda Roth, LSV, Librarian
Amber Gallant, CARS
Judea d'Arnaud, CARS, staff
Satomi Saito, CARS, staff
Stephen Cruz, SLA-BLB, Staff
Bie-hwa Ma, SCP
Adriana Moran, CARS
SuHui Ho, DUS
Ruihua Zhang, MS
Renee Chin, MS
Jinhong Qi, MS
Erin Glass, RAS, Librarian
Ho Jung Yoo, RDCP, staff
Start your notes here:
More about Library Carpentry: https://librarycarpentry.org/
Markdown intro
bold
italic
hyperlinks
A line break is two spaces at the end of a line
code
or 'monospace'
Data Intro for Librarians
Type your notes here:
Jargon Busting
hackmd
shell
UNIX
confluence
Python–take workshop, get a book on it, ask Ryan
data mining
data curation
regular expression–practice, trial and error
data scraping
share point
R
machine language
API
data flow
DAMS
Chronopolis - take a workshop
data visualization
Chronopolis
visual basic
GIS formats–Talk to GIS librarian
Java - W3Schools for programming languages, to learn how to program
Pick 3 problematic terms and think about how or where you might learn more about them
W3Schools - or, for some conceptual questions, talk with an expert
Online courses like DataCamp
Regular expressions - can be (programming) language agnostic
Intro to Unix Shell - BASH
Online help: www.explainshell.com - use it to look up what a shell command and its options do
Download data here:
https://librarycarpentry.github.io/lc-shell/setup.html
Ryan's BASH script to update an Ubuntu operating system: link
Shell Cheat sheet commands
Shell: Basics
pwd - print working directory
ls - list contents of a directory
ls -l - list file information
ls -lh - list file information with human-readable sizes
ls -F - list files and directories (directories will have a trailing /)
ls -a - list all files, including hidden files
ls *.txt - list all files that end with .txt
cd - change directory
cd .. - move up one level in the directory tree
cd pathname - takes you to the directory specified by pathname
cd or cd ~ - takes you to your home directory
&& - joins two commands: the second runs only if the first succeeds
Shell: Interacting with Files
mkdir - make a directory
cat - print to shell or send file or files to output
head - output first 10 lines of a file or files
tail - output last 10 lines of a file or files
mv - rename or move a file or files. Syntax for renaming a file: mv FILENAME NEWFILENAME
cp - make a backup copy of a file or files. Syntax: cp FILENAME NEWFILENAME
rm - remove a file or files. NB: USE WITH EXTREME CAUTION!!! Deleted files do not go to a trash folder.
rm -r - will delete a directory, even if it is not empty (rmdir only removes empty directories).
rm -r -i - will delete a directory, even if it is not empty, but will ask you to confirm each deletion.
Shell: Wildcards
? - a placeholder for exactly one character or number
* - a placeholder for zero or more characters or numbers
[] - defines a class of characters, e.g. [1-9] matches one digit from 1 to 9
Examples
foobar? : matches 7-character strings starting with foobar and ending with one character or number
foobar* : matches strings that start with foobar and end with zero or more other characters or numbers
foobar*txt : matches strings that start with foobar and end with txt
[1-9]foobar? : matches 8-character strings that start with a digit from 1 to 9, have foobar after the digit, and end with any character or number.
Shell: Counting and Mining
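wc - word count
wc -w : count words
wc -l : count lines
wc -c : count bytes (the same as characters for plain ASCII text; wc -m counts characters)
sort - sort input
grep - global regular expression print (search for lines that match a pattern)
grep -c : display the count of matching lines for each file
grep -i : match with case insensitivity
grep -w : match whole words
grep -v : exclude matches (print the lines that do not match)
grep --file=FILENAME.txt : use the file FILENAME.txt as the source of strings used in the query
| : (vertical bar character, or "pipe") send output from one command into another command
Shell manual/help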
On Mac:
man (e.g. man ls)
On PC:
help (for shell built-ins such as cd), or COMMAND --help (e.g. ls --help)
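To practice the cheat-sheet commands above without touching real files, here is a minimal bash sketch that works inside a temporary directory and then deletes it. All file and directory names (notes/fruit.txt, foobar1, terms.txt and so on) are made up for illustration, and it assumes a bash shell such as the Mac Terminal or Git Bash on a PC.

```shell
#!/usr/bin/env bash
# Practice run of the cheat-sheet commands in a throwaway directory.
# All file and directory names here are hypothetical examples.

workdir=$(mktemp -d)            # scratch directory that is safe to delete
cd "$workdir" && pwd            # &&: pwd runs only if cd succeeded

# --- Interacting with files ---
mkdir notes                                       # make a directory
printf 'apple\nBanana\ncherry\napple pie\n' > notes/fruit.txt
cp notes/fruit.txt notes/fruit_backup.txt         # cp FILENAME NEWFILENAME
mv notes/fruit_backup.txt notes/fruit_copy.txt    # mv renames the copy
cat notes/fruit.txt                               # print the whole file
first=$(head -n 1 notes/fruit.txt)                # head: first lines (10 by default)
last=$(tail -n 1 notes/fruit.txt)                 # tail: last lines (10 by default)
ls -F                                             # notes/ gets a trailing slash
notes_count=$(ls notes | wc -l)                   # number of files in notes/

# --- Wildcards ---
touch foobar1 foobar22 foobar.txt 3foobarX
one_char=$(echo foobar?)          # exactly one character after foobar
ends_txt=$(echo foobar*txt)       # starts with foobar, ends with txt
digit_first=$(echo [1-9]foobar?)  # a digit 1-9, then foobar, then one character
echo foobar*                      # zero or more characters: all three foobar files

# --- Counting and mining ---
words=$(wc -w < notes/fruit.txt)                        # count words
apple_lines=$(grep -c apple notes/fruit.txt)            # count matching lines
banana_lines=$(grep -i -c banana notes/fruit.txt)       # case-insensitive match
printf 'cherry\npie\n' > terms.txt
term_lines=$(grep -c --file=terms.txt notes/fruit.txt)  # patterns read from a file
grep -v apple notes/fruit.txt | sort                    # pipe non-matching lines into sort
not_apple=$(grep -v apple notes/fruit.txt | wc -l)

# --- Clean up ---
# rm -r deletes a directory and everything in it; add -i to confirm each deletion.
cd .. && rm -r "$workdir"
```

Running it line by line, rather than all at once, makes it easier to compare each command with its description in the cheat sheet.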
Tidy Data for Librarians
Lesson: https://librarycarpentry.github.io/lc-spreadsheets/
Group Discussion
Problems with Spreadsheets
Real world example: Borwein, Jonathan, and David H. Bailey. “The Reinhart-Rogoff Error – or How Not to Excel at Economics.” The Conversation, 22 Apr. 2013, http://theconversation.com/the-reinhart-rogoff-error-or-how-not-to-excel-at-economics-13646.
Tidy data spreadsheets
What do you do with spreadsheets?
Data entry
Organizing data
Subsetting and sorting data
Statistics
Plotting
Types of spreadsheet software
Excel
Google Sheets
LibreOffice
Gnumeric
OpenOffice.org
Problems with spreadsheets:
we push them to the limits
lots - too many sheets
null values
selecting data and deleting it
Excel has a mind of its own - it can transform data or numeric values
Good practices:
well formatted data
data organization
create new file or tab (create a new sheet for notes)
KEEP YOUR RAW DATA RAW!
Track your steps! Point-and-click work in a spreadsheet is hard to reproduce.
Keep variables in columns (the column heading is the variable)
Put observations in rows
Columns = variables; rows = observations; cells = values
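One quick check related to the "columns = variables, rows = observations" rule is that every row has the same number of fields as the header row. This is a minimal bash/awk sketch on a made-up CSV; it splits naively on commas, so it assumes no quoted fields that themselves contain commas.

```shell
#!/usr/bin/env bash
# Sample table (made up): one column per variable, one row per observation.
csv=$(mktemp)
cat > "$csv" <<'EOF'
date,trainer,num_registered
2018-09-14,AB,25
2018-09-21,CD,18
2018-10-05,EF
EOF

# Print every row whose number of fields differs from the header row.
# Naive split on commas: quoted fields containing commas are not handled.
check_columns() {
  awk -F, '
    NR == 1 { n = NF; next }
    NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad = 1 }
    END { exit bad }
  ' "$1"
}

report=$(check_columns "$csv")
echo "${report:-every row has the same number of columns}"
rm "$csv"
```

Here the last row is missing its num_registered value, so the check reports line 4.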
date handling in spreadsheets
Excel stores date information as a number
e.g., Jan 1, 1900 is number 1
Excel counts days from that starting point to work out which date to display
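The day-number idea can be checked from the shell. This is a minimal sketch that assumes GNU date (Linux); macOS date uses -r instead of -d @. It uses 25569, the Excel serial number of 1970-01-01, to shift into Unix time. Because Excel's 1900 date system counts a nonexistent 29 February 1900, the conversion only matches Excel for serial numbers from 61 (1 March 1900) onward.

```shell
#!/usr/bin/env bash
# Convert an Excel serial day number (1900 date system) to an ISO date.
# Assumes GNU date; the "-d @SECONDS" form is not available in BSD/macOS date.
excel_to_iso() {
  local serial=$1
  # 25569 = Excel's serial number for 1970-01-01 (the Unix epoch);
  # 86400 = seconds per day. Valid for serials >= 61 (1900-03-01),
  # because Excel counts a nonexistent 29 February 1900 as serial 60.
  date -u -d "@$(( (serial - 25569) * 86400 ))" +%F
}

excel_to_iso 43357   # 2018-09-14, the workshop date
excel_to_iso 43101   # 2018-01-01
```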
Good practice:
Split dates out into separate columns in Excel
Remember, dates are a big issue in Excel:
be aware of how you are formatting dates in the spreadsheet
Have a plan to format dates properly
remember to set up the spreadsheet to minimize errors as much as possible
Use readme files to document work you have done.
Sorting
Exercise 1: Messy library training data
https://librarycarpentry.github.io/lc-spreadsheets/data/training_attendance.xlsx
Exercise 1 responses:
Problems with data in the spreadsheet:
different formatting
different information
wrong years
date formats
multiple pieces of information in the wrong column
multiple tables on one sheet
What did you do to reformat the data?
created a new sheet, setting up columns (variables)
added the year to dates
Discussion: Messy library training data
What was wrong with this data?
Formatting Problems - Common Mistakes: https://librarycarpentry.github.io/lc-spreadsheets/02-common-mistakes/
How did you fix it?
An excellent reference, particularly with regard to R scripting, is:
Wickham, Hadley. “Tidy Data.” Journal of Statistical Software, vol. 59, no. 10, Sep. 2014, http://www.jstatsoft.org/v59/i10.
Exercise 2: pulling month, day and year out of dates in Excel
=MONTH(A2)
=DAY(A2)
=YEAR(A2)
And then autofill down
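For comparison, the same split can be done in the shell. This minimal bash sketch assumes the dates are already text in ISO form (YYYY-MM-DD), not Excel serial numbers; split_date is a hypothetical helper name. It prints month, day and year, matching the order of the formulas above.

```shell
#!/usr/bin/env bash
# Split an ISO 8601 date (YYYY-MM-DD) into month, day and year,
# a shell analogue of =MONTH(A2), =DAY(A2) and =YEAR(A2).
split_date() {
  local year month day
  IFS=- read -r year month day <<< "$1"
  # 10# forces base 10, so "09" is not rejected as an invalid octal number
  echo "$(( 10#$month )),$(( 10#$day )),$year"
}

split_date 2018-09-14   # prints 9,14,2018

# "Autofill down": apply the helper to every line of a list of dates
printf '%s\n' 2018-09-14 2018-10-05 | while read -r d; do split_date "$d"; done
```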
To do:
create a new sheet
create new columns for month, day, and year
use formula above to calculate each date property
Bonus Exercise (1a): pulling the day of the year out of dates in Excel
=A2-DATE(YEAR(A2),1,0)
And then autofill down
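Why the formula works: DATE(YEAR(A2),1,0) asks for "day 0 of January", which Excel rolls back to 31 December of the previous year, so subtracting it from A2 leaves the day number within the year. The bash sketch below mirrors that subtraction; it assumes GNU date and ISO text dates, and day_of_year is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Day of the year, mirroring =A2-DATE(YEAR(A2),1,0).
# Assumes GNU date and ISO dates (YYYY-MM-DD).
day_of_year() {
  local year=${1%%-*}   # text before the first "-" is the year
  local start end
  start=$(date -u -d "$(( year - 1 ))-12-31" +%s)   # Dec 31 of the previous year, in seconds
  end=$(date -u -d "$1" +%s)                        # the given date, in seconds
  echo $(( (end - start) / 86400 ))                 # 86400 seconds per day (UTC, no DST shifts)
}

day_of_year 2018-09-14      # prints 257
date -u -d 2018-09-14 +%j   # GNU date's %j format gives the same day number
```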
Exercise: Quality Assurance
We can’t have half a person attending a workshop, so let’s try this out by setting the num_registered column in our Dates spreadsheet to only allow whole numbers between 1 and 100.
Data Validation: in the Allow box, select Whole number and restrict it to values between 1 and 100.
Then try entering a value in the num_registered column that isn't a valid size, like 15.1. The spreadsheet stops us from entering the wrong value and asks us if we would like to try again.
You can also enter your own message in the Input Message tab, and allow invalid data to just result in a warning by modifying the Style option on the Error Alert tab.
Quality assurance can make data entry easier as well as more robust. For example, if you use a list of options to restrict data entry, the spreadsheet will provide you with a drop-down list of the available items. So, instead of trying to remember the initials of all your trainers, you can just select the right option from the list.
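The same whole-number rule can also be checked outside the spreadsheet, for example on a CSV export. This is a minimal bash/awk sketch on made-up rows; it assumes a header row containing num_registered and simple comma-separated fields with no quoted commas.

```shell
#!/usr/bin/env bash
# Flag num_registered values that are not whole numbers from 1 to 100,
# the same rule as the spreadsheet's Data Validation setting.
csv=$(mktemp)
cat > "$csv" <<'EOF'
date,trainer,num_registered
2018-09-14,AB,25
2018-09-21,CD,15.1
2018-10-05,EF,150
EOF

invalid_rows() {
  awk -F, '
    # Find which column holds num_registered from the header row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "num_registered") col = i; next }
    # Print line number and value when the value is not a whole number in 1..100.
    !($col ~ /^[0-9]+$/ && $col >= 1 && $col <= 100) { print NR ": " $col }
  ' "$1"
}

bad=$(invalid_rows "$csv")
echo "$bad"   # lists line 3 (15.1) and line 4 (150)
rm "$csv"
```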
Exercise: Sorting
Exercise: Conditional formatting
It is nice to be able to do these scans in spreadsheets, but we can also do these checks in a programming language like Python or R, or in OpenRefine or SQL.
Exercise: Export to CSV
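Once the data are exported to CSV, the sorting exercise can also be repeated in the shell. This minimal bash sketch uses made-up rows; it keeps the header line in place and sorts the remaining rows numerically on column 3 (num_registered here), largest first. It assumes simple comma-separated fields with no quoted commas.

```shell
#!/usr/bin/env bash
# Sort a CSV export by one column while keeping the header on top.
csv=$(mktemp)
printf '%s\n' 'date,trainer,num_registered' \
  '2018-09-14,AB,25' '2018-09-21,CD,8' '2018-10-05,EF,40' > "$csv"

# head keeps the header; tail -n +2 skips it; sort -t, splits on commas,
# -k3,3 sorts on field 3 only, n = numeric order, r = largest first.
sorted=$({ head -n 1 "$csv"; tail -n +2 "$csv" | sort -t, -k3,3nr; })
echo "$sorted"
rm "$csv"
```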
Intro to OpenRefine
We will get data for this lesson here:
https://raw.githubusercontent.com/LibraryCarpentry/lc-open-refine/gh-pages/data/doaj-article-sample.csv
Start OpenRefine using Chrome!
There seem to be problems starting OpenRefine in other web browsers
use for:
cleaning up messy data
quick summary of data
open refine can open a variety of data files.
cross-platform compatibility
works with google data
be aware of sorting and importing in Excel.
principles for avoiding messy data
You can apply JSON scripts saved from a previous OR session to re-run the same process on other data.
UTF-8 encoding is the recommended default selection
OR exports projects as .tar.gz files.
Dates in OR can be split out into separate columns
Month, day, year
Activities are useful! Please take time to check that students were able to type the commands before showing the result (to improve the instruction).
It's easier than in Excel to edit the faceted data within a cluster, without highlighting specific columns and messing up the same data in other columns
Minute notes:
Please write one thing that you thought was positive during the workshop day:
The hands on activities and the command short cut notes.
The hands-on activities were well-suited to the length of time that we had and felt like we could build upon them.
Please write one thing that you thought we could improve or what you would like to learn more about:
I appreciated the number of helpers in the room! Felt very supportive. Need to take time to apply what I learned in this fast paced day.
I just want to delve into more Library Carpentry topics! I felt like it wasn't long enough. I would also have really liked real world examples of data curation and data visualization stemming from projects that librarians have done in the past, so I could really visualize the applications.
I'm not sure how I'm going to be able to keep this fresh in my mind since the applications wouldn't be something I would use daily.
There is a lot of potential in improving work efficiencies. The morning session was helpful in understanding the fundamentals of commands. OpenRefine offers quite a bit of potential for helping our work with spreadsheets (which comprise 80% of my work). I appreciated the help with the programs/laptop.
Activities are useful! Please take time to check the majority of students are able to finish typing commands before showing the results (for upgrading instruction :))
This was a very helpful workshop, with a lot of things to learn.