# CSU Fresno Summer Carpentries Workshop
## Dates: July 15-17, 2025
## Instructors: Kat Koziar and Justin Shaffer
## Helper(s): Dylan Simmons
## Website: https://justinshaffer.github.com.io/2025-07-15-Carpentries
## Notes document: https://hackmd.io/aOyuYBXdS-2ISzYSCbQIKg
## Lessons:
### The Unix Shell: https://swcarpentry.github.io/shell-novice/
### R for Reproducible Scientific Analysis: https://swcarpentry.github.io/r-novice-gapminder
### Eventbrite: https://www.eventbrite.com/myevent?eid=1349190488069
## Day 1 Attendees (please sign in)
Justin Shaffer (co-instructor; shaffer@csufresno.edu)
Valerie Gallardo (valeriegallardo211@gmail.com)
Rowan Stafford (jstaff@mail.fresnostate.edu)
Celeste Pilegard (pilegard@ucsd.edu)
Paulina Vergara-de la Cruz (paulinav314@mail.fresnostate.edu)
Matt Rodriguez (mattkrodriguez77@mail.fresnostate.edu)
Ben (kyawzya@mail.fresnostate.edu)
Megan Valdez (meganvaldez@mail.fresnostate.edu)
Kevina Pinto (pinto_kevina@mail.fresnostate.edu)
Andrew (amartin462@ucmerced.edu)
Samuel Muratalla Farias (sammf0117@mail.fresnostate.edu)
Mohammad Rocka (mpr21@mail.fresnostate.edu)
Christy Lesnikowski (christy4916@mail.fresnostate.edu)
Diana Camarena (sweetdc@mail.fresnostate.edu)
Nemah Alteshi (nemah09@mail.fresnostate.edu)
Aya Elhabashy (aya9998@mail.fresnostate.edu)
Isaiah Robles (isaiah.robles7@gmail.com)
## Day 1 Important Links
Unix Shell cheat sheet: https://thejacksonlaboratory.github.io/introduction-to-hpc/cheatsheet/
OS-specific help:
https://ss64.com/
## Day 1 Notes
### [Navigating Files and Directories](https://swcarpentry.github.io/shell-novice/02-filedir.html)
- `pwd`: print/present working directory (i.e., show current folder)
- `ls`: list contents of directory
- `ls -l` : gives long listing format
- `ls -a`: show all files including hidden files
- `ls -F`: appends `/` to folders
- `ls -laF`: does all three
- Wildcards
- `*` = wildcard; matches unknown text of any length (e.g., `ls *.txt`: lists all .txt files)
- `?` = wildcard for exactly one character
- `cd`: change directory
- `cd ..` to go back up a level (to parent directory)
- `cd ../..` to go up two levels
- `cd ~` (or just `cd`) to go to home directory
- `cd /` to root directory
- `clear`: clears the terminal screen
- **Help with commands**
- Mac/Linux: `man [command]`: opens manual (e.g., `man ls`)
- to get out: `q`
- Windows: add `--help` after the command (e.g., `ls --help`)
- **up arrow**: goes through history of your commands
- **tab**: completes file or directory name
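A minimal sketch of the navigation commands above, run in a throwaway directory. The folder and file names (`project`, `data`, `run1.txt`, etc.) are made up for illustration and are not part of the lesson data.

```bash
# Build a small throwaway directory tree to explore (all names are hypothetical).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/project/data"
touch "$demo_dir/project/notes.txt" \
      "$demo_dir/project/data/run1.txt" \
      "$demo_dir/project/data/run2.txt" \
      "$demo_dir/project/data/run10.txt" \
      "$demo_dir/project/data/.hidden"

cd "$demo_dir/project/data"
pwd          # show the current folder
ls -F ..     # list the parent directory; folders get a trailing /
ls -a        # include hidden files such as .hidden

# * matches any run of characters; ? matches exactly one character.
all_runs=$(ls run*.txt)       # all three run files
single_digit=$(ls run?.txt)   # run1.txt and run2.txt, but not run10.txt
echo "$single_digit"

cd ../..     # up two levels, back to the temporary directory
here=$(pwd)
```

Note that `run?.txt` skips `run10.txt` because `?` stands for exactly one character, while `*` matches any number of them.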
### [Working with Files and Directories](https://swcarpentry.github.io/shell-novice/03-create.html)
- **Naming files and directories**
- avoid spaces (the shell treats them as separators between arguments), extra periods (can look like file extensions), and leading dashes (commands treat names starting with `-` as options)
- recommended: underscores or "camelCase" (no spaces; capitalize each word after the first)
- `mkdir`: create a folder (make directory)
- `mkdir -p` create intermediate directories
- `nano` + filename, e.g., `nano draft.txt`: opens text file in nano (and creates it if it doesn't exist)
- `vim` + filename (e.g., `vim draft.txt`): opens the file in an alternative text editor
- `:q` to exit (press `Esc` first; `:q!` exits without saving changes)
- `rm` + filename: remove (delete) file
- file is unrecoverable, no "are you sure you want to delete" message
- `rm -R` remove folder
- `touch` + filename (e.g., `touch my_file.txt`): creates a file without opening it
- `mv` + source file location + target file location: moves or renames file
- `cp` + source file + target file (e.g., `cp quotes.txt quotations.txt`): copy file
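The commands in this section can be strung together as a short sketch. It runs inside a temporary directory so nothing real gets deleted; the names `thesis`, `drafts`, and `chapter_one.txt` are placeholders, not files from the workshop.

```bash
# Work in a throwaway directory so rm cannot touch real files.
work_dir=$(mktemp -d)
cd "$work_dir"

mkdir -p thesis/drafts/old       # -p creates thesis/ and drafts/ along the way
touch thesis/drafts/draft.txt    # create an empty file without opening an editor

cp thesis/drafts/draft.txt thesis/drafts/draft_backup.txt   # copy
mv thesis/drafts/draft.txt thesis/chapter_one.txt           # move and rename in one step

rm thesis/drafts/draft_backup.txt   # gone immediately: no confirmation, no trash
rm -r thesis/drafts                 # -r is needed to remove a folder and its contents

remaining=$(ls thesis)
echo "$remaining"                   # only chapter_one.txt is left
```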
### [Pipes and Filters](https://swcarpentry.github.io/shell-novice/04-pipefilter.html)
- `wc` + filename: word count (lines, words, characters)
- `wc -l` shows only lines; `wc -m` for characters; `wc -w` for words
- add a wildcard (e.g., `wc *.pdb`) to show counts for all files of a type
- `cat`: concatenates (i.e., joins together, one after another)
- `cat` followed by single file just prints whole contents of file; followed by multiple files prints them joined together
- `sort`: sorts (follow by `-n` to specify numerical sort)
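- `head`: displays the first lines of a file (10 by default; `-n` sets how many)
- `tail`: displays the last lines of a file (10 by default; `-n` sets how many)
- `echo`: prints the text you give it (its arguments)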
- **Piping**: capturing output from commands
- `>` followed by filename: **redirects** output into file (overwrites the file if it already exists)
- `>>` **appends** output to end
- `|` **pipes** (passes) the output to another command
- e.g., `$ sort -n lengths.txt | head -n 1` displays the first row of the sorted text file
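As a rough illustration of redirection and piping together, the sketch below builds three small files and finds the shortest and longest by line count. The `.dat` files and `log.txt` are invented stand-ins for the lesson's `.pdb` files.

```bash
pipe_dir=$(mktemp -d)
cd "$pipe_dir"

# Three small sample files with 3, 1, and 2 lines (hypothetical data).
printf 'a\nb\nc\n' > long.dat
printf 'a\n'       > short.dat
printf 'a\nb\n'    > medium.dat

wc -l *.dat > lengths.txt      # > writes the counts into lengths.txt (overwriting it)
echo "first note"  >> log.txt  # >> appends, so both notes end up in log.txt
echo "second note" >> log.txt

# Pipe: sort numerically by line count, then keep only the first line.
shortest=$(sort -n lengths.txt | head -n 1)

# Pipes can be chained: the last line of wc output is the "total" row,
# so take the last two lines and then the first of those.
longest=$(sort -n lengths.txt | tail -n 2 | head -n 1)

echo "$shortest"   # the row for short.dat
echo "$longest"    # the row for long.dat
```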
### [Loops](https://swcarpentry.github.io/shell-novice/05-loop.html)
- **Loops** let you repeat a command for each item in a list
- **for loop**
- `for` indicates start of for loop
- `for` variable `in` list of things
- `do` indicates start of job list (insert commands here)
- `$variable` to call the variable named in `for` line
- convention: tab indent commands
- `done` ends the loop
- Behavior within a loop
- While you are typing a multi-line loop, the prompt (`$` or `%`) changes to `>` until you type `done`
- `ctrl + c` to kill the running command if you get stuck in a loop (e.g., not getting output when you think you should)
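Putting the pieces of a for loop together might look like the sketch below. The file names and the `backup_` prefix are assumptions made for the example.

```bash
loop_dir=$(mktemp -d)
cd "$loop_dir"
printf 'line one\nline two\n' > alpha.txt
printf 'only line\n'          > beta.txt

# Pattern: for VARIABLE in LIST; do COMMANDS; done
for filename in *.txt
do
    # $filename holds the current item; the quotes protect names containing spaces
    echo "Backing up $filename"
    cp "$filename" "backup_$filename"
done

backups=$(ls backup_*.txt)
echo "$backups"
```

The list `*.txt` is expanded once, before the loop starts, so the newly created backup files are not themselves looped over.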
### [Shell Scripts](https://swcarpentry.github.io/shell-novice/06-script.html)
- Create a new bash script with .sh as extension
- e.g., `nano middle.sh`
- `bash` to run a script (e.g., `bash middle.sh`)
- Write the script
- `#` to write comments (will be ignored when script runs); recommended comments for header:
- Usage (specifies input arguments), e.g., `Usage: bash middle.sh filename end_line num_lines`
- Why or in what circumstances file is used
- `$1` to reference whatever is the first filename or other argument... and can use sequential numbers for second (`$2`), third (`$3`), etc.
- `$@` to reference all filenames or arguments in command line
- Place variables in quotes (e.g., `"$1"`) if they might have spaces in them
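A sketch of how these pieces fit, assuming `middle.sh` selects lines from the middle of a file as its usage comment suggests. The script is written with a here-document only so the example is self-contained (in the workshop it was typed in `nano`); `numbers.txt` and `sorted.sh` are sample names for illustration.

```bash
script_dir=$(mktemp -d)
cd "$script_dir"

# Write the script file; the quoted 'EOF' keeps $1, $2, $3 from expanding right now.
cat > middle.sh <<'EOF'
# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
EOF

# A ten-line sample file (hypothetical data).
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > numbers.txt

# $1=numbers.txt, $2=6, $3=2: take the first 6 lines, then the last 2 of those.
result=$(bash middle.sh numbers.txt 6 2)
echo "$result"     # lines 5 and 6 of numbers.txt

# "$@" stands for every argument, so this script accepts any number of files.
cat > sorted.sh <<'EOF'
# Sort files by their number of lines.
# Usage: bash sorted.sh one_or_more_filenames
wc -l "$@" | sort -n
EOF

sorted=$(bash sorted.sh numbers.txt middle.sh)
echo "$sorted"
```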
### [Finding Things](https://swcarpentry.github.io/shell-novice/07-find.html)
- `grep` + pattern + filename: finds and prints lines that match a pattern
- `-w` limits matches to word boundaries
- `-n` prints the line number of each matching line
- `-i` case-insensitive
- `-v` inverts search (returns lines that don't match pattern)
- `-r` recursive search (e.g., `$ grep -r Yesterday .` searches all files in the current directory and its subdirectories for "Yesterday")
- `-E` interprets the pattern as an extended regular expression
- e.g., `$ grep -E "^.o" haiku.txt` searches for "o" in the second position in any line: `^` anchors to the beginning of the line, and `.` matches any single character (like the `?` shell wildcard)
- Single(`'`)/double(`"`) quotes: generally can use either
- use quotes to search for a phrase; not needed for single word
- Handy reference for commands (alternative to `man` or `--help`): https://ss64.com/
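The flags above can be tried on a small sample file. This sketch recreates a few lines of the lesson's `haiku.txt` in a temporary directory; the exact file contents here are an assumption made for the example.

```bash
grep_dir=$(mktemp -d)
cd "$grep_dir"
printf '%s\n' \
  'The Tao that is seen' \
  'Is not the true Tao, until' \
  'You bring fresh toner.' \
  'Yesterday it worked' \
  'Today it is not working' > haiku.txt

grep -w "is" haiku.txt     # whole word only: "Is" (capital I) does not match
grep -i "tao" haiku.txt    # case-insensitive: matches "Tao"
grep -v "Tao" haiku.txt    # inverted: lines that do NOT contain "Tao"
grep -r "Yesterday" .      # recursive: output is prefixed with the file path

numbered=$(grep -n "it" haiku.txt)       # matching lines with their line numbers
second_o=$(grep -E "^.o" haiku.txt)      # lines whose second character is "o"

echo "$numbered"
echo "$second_o"
```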
## Day 2 Attendees (please sign in)
Samuel Muratalla Farias (sammf0117@mail.fresnostate.edu)
Kevina Pinto (pinto_kevina@mail.fresnostate.edu)
Ben (Kyawzya@mail.fresnostate.edu)
Andrew (amartin462@ucmerced.edu)
Aya Elhabashy (aya9998@mail.fresnostate.edu)
Christy Lesnikowski (christy4916@mail.fresnostate.edu)
Matt Rodriguez (mattkrodriguez77@mail.fresnostate.edu)
Celeste Pilegard (pilegard@ucsd.edu)
Rowan Stafford (jstaff@mail.fresnostate.edu)
Valerie Gallardo (valgal211@mail.fresnostate.edu)
Isaiah Robles (isaiah.robles7@gmail.com)
Megan Valdez (meganvaldez@mail.fresnostate.edu)
Nemah Alteshi (nemah09@mail.fresnostate.edu)
Paulina Vergara (paulinav314@mail.fresnostate.edu)
## Day 2 Notes
### [Introduction to R and R Studio](https://swcarpentry.github.io/r-novice-gapminder/01-rstudio-intro.html)
- RStudio Default Layout
- Left side: R console/Terminal (shows version at top)
- Upper right: Environment (lists everything in active memory)/History/Connections
- Lower right: Files/Plots/Packages/Help/Viewer
- *once you open a file*: editor opens on top left
- R prompt in console
- `>` means R is ready for input
- `+` means R has an incomplete command and is waiting for more input
- to cancel a running command hit `Esc`
- nothing at beginning of line means R is still running or stuck
- Writing code
- Can write directly in console; code won't save
- Write in an R Script (`.R` file; upper left panel) and select/run code
- To **run** a line or selection from .R file: `⌘`+`Return` on Mac or `Ctrl`+`Return` on Windows
- `#` for comments
- `tab` autocompletes functions
- R as a calculator
- Recognizes `(` and `)`, `^` or `**`, `*`, `/`, `+`, `-`
- Built in mathematical functions: `sin()`, `log()`, `exp()`, etc
- Comparing things: `==`/`!=` for equality/inequality; `<`/`>` for less/greater than; `<=`/`>=` for less/greater than or equal to; use `==` only on integers (for decimal numbers, compare with `all.equal()`)
- Variables and assignments
- `<-` assignment operator; stores values in variables (e.g., `x <- 1/40`)
- stored variables appear in Environment pane (upper right)
- Variable names: can contain letters, numbers, `_`, and `.`; no space or `-`; cannot start with number or `_`; variables starting with `.` become hidden
- Vectorization
- Vector: set of values **in a certain order** of the **same data type**
- use `:` for range of numbers; e.g., `x <- 1:5`; `2^x` returns `2 4 8 16 32`
- Managing your environment: use Environment tab or `ls()`
- Removing objects: use `rm()` or Environment tab
- R Packages
- Packages add functions to R
- `install.packages("packagename")`: install a package (use quotes)
- `library(packagename)`: makes package available in working memory (no quotes)
- Other package commands
- `installed.packages()`: see what packages are installed
- `update.packages()`: update package
- `remove.packages("packagename")`: remove package
- Can also use package pane to install/update/load/unload packages
### [Project Management with RStudio](https://swcarpentry.github.io/r-novice-gapminder/02-project-intro.html)
- Use **R Projects** for project management
- Create R Project: `File` > `New Project` > `New Directory` > `New Project` > type name of directory > `Create Project`
- Creates `.Rproj` file; use that file to open the project in the future
- Best practices
- Treat data as read only (don't edit data files after collection)
- Data cleaning: use scripts to get data readable in R
- Treat generated output as disposable: all output should be reproducible from scripts
- Save data in the data directory (create folder called "data" in your project folder)
- Working directory: Opening an R Project sets the working directory as the folder the `.Rproj` file is in
- `getwd()` check working directory
- `setwd()` change working directory
- Can also check/change working directory through Settings icon in Files pane
### [Seeking Help](https://swcarpentry.github.io/r-novice-gapminder/03-seeking-help.html)
- `?` before the name of a command for Help OR click Help pane in RStudio and search
- typing a command in the console opens tooltips
- CRAN (https://cran.rstudio.com) for R packages, manuals, etc.
- Also recommended: Stack Overflow, LLMs (but their answers can contain errors), colleagues (or affiliated data lab)
### [Data Structures](https://swcarpentry.github.io/r-novice-gapminder/04-data-structures-part1.html)
- Notes on writing code...
- `#` for comments
- `tab` autocompletes functions
- `Code` > `Soft wrap long lines` to wrap text in file viewer
- R ignores spaces, so e.g., `c(1,2,3)` is the same as `c(1, 2, 3)`
- `str()` displays structure of an object
- `summary()` displays summary statistics
- `paste()` to concatenate strings; `sep` argument to specify separator
- Data frames: data structure in which columns are vectors (everything in a column must be the same data type); rows can contain different types
- `data.frame()` creates data frame
- in parentheses: list of variables (columns) with same number of rows, e.g., `cats <- data.frame(coat = c("calico", "black", "tabby"), weight = c(2.1, 5.0, 3.2), likes_catnip = c(1, 0, 1))`
- `$` to choose column; e.g., `summary(cats$weight)`
- Combining data... by rows: `rbind()`; by columns: `cbind()`
- can create new variable with assignment operator, e.g., `cats$weight_corrected <- cats$weight + 2`
- `write.csv()` to save data frame as csv file
- `read.csv()` to open csv file
- Data types: a vector must be all the same type
- `typeof()` displays type of an object (e.g., `double`/numeric, `integer`, `complex`, `logical` and `character`)
- `factor` is an alternative to `character` for categorical data; before R 4.0, `read.csv()` and `data.frame()` converted text columns to factors by default
- `as.[datatype]()` coerces to new type (e.g., `as.character()`)
- **Type hierarchy**: when you combine different data types in one vector, R coerces them all to the same type
- Goes `logical` -> `integer` -> `double` -> `complex` -> `character`. E.g., combining `logical` and `character` transforms the result to `character`
- Vector functions
- `c()` combines
- `seq()` makes a series, `by` argument specifies interval (e.g., `seq(1,10, by=0.1)`)
- `length()` gives length; number of columns if you give it a data frame
- `[]` to reference element in a vector; can be used to replace an element (e.g., `sequence_example[1] <- 30` changes first element to 30)
- `list()` creates a list, can contain mix of data types
- `[[]]` to reference element
- can give name to items, e.g., `another_list <- list(title = "Numbers", numbers = 1:10, data = TRUE )`
- `names()` displays names
- Data frames continued...
- reference columns by position (e.g., `cats[,1]` for the first column)
- can mouse over column header in data view for column number
- `names()` to display column names; can rename columns with `names(cats)[2] <- "weight_kg"`
- Matrices
- `matrix()` creates matrix, e.g., `matrix_example <- matrix(0, ncol=6, nrow=3)`
- Can ask `dim()` for dimensions, `typeof()` for type, `class()` for class, `nrow()` for number of rows, `ncol()` for number of columns
### [Exploring Data Frames](https://swcarpentry.github.io/r-novice-gapminder/05-data-structures-part2.html)
- Remove rows/columns
- `cats[-4,]` removes fourth row
- `cats[,-4]` removes fourth column
- OR `drop <- names(cats) %in% c("age")`; `cats[,!drop]`
- `sample()` for random sample; can use to randomly sample rows (e.g., `gapminder[sample(nrow(gapminder), 5), ]`)
### [Subsetting Data](https://swcarpentry.github.io/r-novice-gapminder/06-data-subsetting.html)
- `unique()` shows unique values, e.g., `unique(gapminder$country)`
- Special values: `NA` not available; `NaN` not a number; `Inf` infinity
- `is.na()` flags which values are `NA` (R convention for missing data)
- if other markers for missing data in dataset, can specify with `na.strings` argument in `read.csv()`, e.g., `gapminder <- read.csv("data/gapminder_data.csv", na.strings = c("NA", "na", "not applicable", "not aplicable"))`
- `subset()` to pull subset of data, e.g., `gapminder_zim <- subset(gapminder, country == "Zimbabwe")` for only Zimbabwe observations
- Matrix subsetting
- creating a random matrix
- `set.seed(1)`
`m <- matrix(rnorm(6*4), ncol = 4, nrow = 6)`
- `set.seed()` for same output of random generator every time
- `m[3:4, c(3,1)]` for some rows, some columns; `m[3:4,]` for some rows, all columns, etc.
## Day 3 Sign in
Justin Shaffer
Samuel Muratalla Farias (sammf0117@mail.fresnostate.edu)
Andrew Martin (amartin462@ucmerced.edu)
Matt Rodriguez (mattkrodriguez77@mail.fresnostate.edu)
Aya Elhabashy (aya9998@mail.fresnostate.edu)
Ben (Kyawzya@mail.fresnostate.edu)
Kevina Pinto (pinto_kevina@mail.fresnostate.edu)
Valerie Gallardo (valeriegallardo211@gmail.com)
Nemah Alteshi (nemah09@mail.fresnostate.edu)
Rowan Stafford (jstaff@mail.fresnostate.edu)
Celeste Pilegard (pilegard@ucsd.edu)
Megan Valdez (meganvaldez@mail.fresnostate.edu)
Paulina Vergara-de la Cruz (paulinav314@mail.fresnostate.edu)
Christy Lesnikowski (christy4916@mail.fresnostate.edu)
## Day 3 Important links
Tidyverse
https://www.tidyverse.org
R Graph Gallery
https://www.google.com/search?client=safari&rls=en&q=ggplot2+gallery&ie=UTF-8&oe=UTF-8
R Colors
bit.ly/rcolors
Cheat sheet:
https://www.rstudio.org/links/data_visualization_cheat_sheet
ggplot themes
https://ggplot2.tidyverse.org/reference/ggtheme.html
## Day 3 Notes
### [Control Flow](https://swcarpentry.github.io/r-novice-gapminder/07-control-flow.html)
- `if()` for conditional statements, e.g.:
- if alone: `if (condition is true) {perform action}`
- if + else: `if (x >= 10) {print("x is greater than or equal to 10")} else {print("x is less than 10")}`
- if + else if + else: `if (x >= 10) {print("x is greater than or equal to 10")} else if (x > 5) {print("x is greater than 5, but less than 10")} else {print("x is less than or equal to 5")}`
- `for()` loops: `{}` takes place of `do`/`done`
- `for (iterator in set of values) {do a thing}`
### [Creating Publication-Quality Graphics with ggplot2](https://swcarpentry.github.io/r-novice-gapminder/08-plot-ggplot2.html)
- `plot(x,y)` for simple scatterplots in Base R, e.g., `plot(gapminder$pop, gapminder$lifeExp)`
- `ggplot()`: creates layered plot; use `+` at end of line to add new layer
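- first argument is your dataset; `aes()` maps variables to the x and y axes; inside `aes()`, `color` sets color for all layers and `group` partitions the data, e.g., `ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, color = continent, group = country)) +`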
- `geom_point()` for scatterplot, or `geom_line()`, `geom_boxplot()`, `geom_violin()`, etc
- can add together, e.g., `geom_point() + geom_line()`
- can add arguments within plot type, e.g., `geom_line(aes(color = continent))`
- `theme()` adjusts many plot elements (e.g., x-axis labels); instructor recommends searching Stack Overflow for help with specifics
- `labs()` with `x`, `y`, `title`, etc. arguments for axis labels, title, legend title, OR `xlab()`, `ylab()`, etc.
- `facet_wrap()` e.g., `facet_wrap(~country)` create multi-panel plot by country
- `~` means "by" (e.g., in `facet_grid()` or `aov()`)
- Transformations and Statistics
- `scale_x_log10()` log transforms x axis
- `geom_smooth(method = "lm")` adds a linear model (lm) fit line; change look with `linewidth`, `color`
- `xlim()` and `ylim()` to change x/y limits
### [Vectorization](https://swcarpentry.github.io/r-novice-gapminder/09-vectorization.html)
- Vectorization: R functions will operate on all elements of a vector
- e.g., for numerical vector `x`, `x * 2` will double all values in the vector; `log(x)` will take log of all values, etc.
- also works with matrices
- Vectors of unequal length (e.g., adding two vectors of unequal length): R recycles the shorter vector until it matches the length of the longer one
### [Functions Explained](https://swcarpentry.github.io/r-novice-gapminder/10-functions.html)
- Function: encapsulated code that can be called with a function name; can do anything...
- may or may not require input arguments
- may or may not return an output
- Functions are always followed by `()`; arguments go in the parentheses; empty parentheses if no argument is needed
- Defining a function: may be useful to write a function whenever you're using the same code multiple times
- `my_function <- function(parameters) {`
`# perform action`
`# return value`
`}`
### [Writing Data](http://swcarpentry.github.io/r-novice-gapminder/11-writing-data.html)
- Export a plot: use menu in Plots pane OR `ggsave()` e.g., `ggsave("myplot.pdf")`
- `pdf()` to create pdf; must end with `dev.off()`
- `write.table()` to export data; `file` argument specifies location; `sep` specifies separator
### [Data Frame Manipulation with dplyr](https://swcarpentry.github.io/r-novice-gapminder/12-dplyr.html)
- Summary statistics with Base R
- E.g., mean GDP per capita for observations grouped by continent: `mean(gapminder$gdpPercap[gapminder$continent == "Africa"])`
- then repeat with every continent
- Can use pipes, e.g., `year_country_gdp <- gapminder %>% select(year, country, gdpPercap)`
- `select()` keeps only the variables you select
- `filter()` keeps only observations that match a condition
- `group_by()` to perform operations by group
- e.g., descriptives by group:
`gapminder %>%`
`group_by(continent) %>% `
`summarize(`
`"mean life" = mean(lifeExp),`
`"min" = min(lifeExp),`
`"max" = max(lifeExp),`
`"standard error" = sd(lifeExp)/sqrt(n())`
`)`
### [Data Frame Manipulation with tidyr](https://swcarpentry.github.io/r-novice-gapminder/13-tidyr.html)
- "Wide" data: Each row is one site/subject/patient, each column is a variable
- "Long" data: one column for the observed variable; other columns are ID variables
- `pivot_longer()` converts data to long format
- `pivot_wider()` converts data to wide format