# Carpentry@UiO targets workshop
2023-06-08
## Code of Conduct
By participating in this workshop, you agree to abide by the [Carpentries Code of Conduct](https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html).
Type in plain text
## Participants
Please fill in your information here
| Name | Affiliation | Role |
| -------- | -------- | -------- |
| Joel Nitta | Chiba University | instructor |
| Mo (Athanasia Mowinckel) | UiO, Norway | helper - organiser |
| Espen Rosenquist | STAMI, Norway | helper |
| Molly Carlyle | Psychology | Postdoc |
| James Roe | Psychology | Postdoc |
| Maja S Jacobsen | OUS | Research assistant/lab engineer |
| Elisabeth Magin | Museum of Cultural History | Postdoc |
| Angelica Pulido | Natural History Museum | Postdoc |
| Tatiana Belova | NCMM | researcher |
| Michelle White | ILN | Postdoc |
| Tuomas Hintikka | Uni.Helsinki | microbiology |
## Resources
- [Slides](https://joelnitta.github.io/oslo-targets/)
- [Lesson](https://joelnitta.github.io/targets-workshop/)
- [Posit Cloud (Rstudio in your browser)](https://posit.cloud/content/6064275)
- [Workshop announcement](https://www.ub.uio.no/english/courses-events/courses/other/Carpentry/230608-targetsr)
### Other online resources
- [GitHub](https://github.com/ropensci/targets)
- [Package website](https://docs.ropensci.org/targets/)
- [User manual](https://books.ropensci.org/targets/)
- [Targetopia](https://wlandau.github.io/targetopia/)
- [{tarchetypes} website](https://docs.ropensci.org/tarchetypes/)
- [Discussion board](https://github.com/ropensci/targets/discussions)
## Surveys
- [Post-workshop survey](https://forms.gle/8AAq6JKsTcnFVNvCA)
## Timing (for instructor)
- 9-9:15 intro
- 9:15-10:20 first targets workflow
- break
- 10:30 - loading data
## First things first
- Open RStudio
- Install the necessary packages
- Start a new project for your penguins
```r
install.packages(
  c(
    "conflicted",
    "future.callr",
    "future",
    "palmerpenguins",
    "quarto",
    "tarchetypes",
    "targets",
    "tidyverse",
    "visNetwork"
  )
)
```
## First targets workflow
```r
library(targets)
tar_script()
```
Create a new script file to analyse your penguins. Name it `penguins.R` and save it in your project. This is your scratch file, where you test and develop your code.
Switch to the `_targets.R` script.
Turn your cleaning code into a function. You can either put it directly in the targets script, or put it in its own script file and call `source("filename.R")` from the `_targets.R` file that runs your pipeline.
The `_targets.R` script file defines your resources, prerequisites and workflow. The template workflow is a list that defines the steps to run.
```r
library(targets)
library(tidyverse)
library(palmerpenguins)

# This is an example _targets.R file.

# Clean your penguins, either by defining the function here
# or by putting it in its own file and sourcing it
clean_penguin_data <- function(penguins_data_raw) {
  penguins_data_raw |>
    select(
      species = Species,
      bill_length_mm = `Culmen Length (mm)`,
      bill_depth_mm = `Culmen Depth (mm)`
    ) |>
    remove_missing(na.rm = TRUE)
}

# End this file with a list of target objects.
list(
  tar_target(penguins_csv_file, path_to_file("penguins_raw.csv")),
  tar_target(penguins_data_raw, read_csv(penguins_csv_file)),
  tar_target(penguins_data, clean_penguin_data(penguins_data_raw))
)
```
Use `tar_make()` to run your pipeline.
<!-- started around 09:10 and used until 10:20-ish. Including introductory round, some minor issues -->
### RStudio's extract function helper
If you have a bit of code that you want to turn into a function, RStudio can help you do that.
Highlight the code you want to turn into a function, and then in RStudio go to
Code -> Extract function.
You might need to do some minor editing to make it work smoothly, but it's a nice way to start learning to write functions.
## Loading workflow objects
<!-- start 10:30 -->
After a successful run of `tar_make()`, you can load the resulting objects into your session with `tar_load()`.
`tar_objects()` lists what your workflow has created.
targets stores its data in the `_targets/` folder, so even when you reopen your R project another day, you can load objects from the last time you worked with them rather than rerunning the entire workflow.
Do not edit the file structure or the contents of `_targets/` by hand, with the exception of the `user` folder. targets manages and updates this folder every time you rerun `tar_make()`.
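The commands above can be tried safely in a throwaway directory. The sketch below is only an illustration: the two-step pipeline and the target names `numbers` and `total` are made up for this example and are not part of the workshop materials.

```r
library(targets)

# Work in a temporary directory so your real project is left untouched
demo_dir <- tempfile("targets_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Write a tiny _targets.R file (hypothetical targets, for illustration only)
tar_script(
  {
    list(
      tar_target(numbers, c(1, 2, 3)),
      tar_target(total, sum(numbers))
    )
  },
  ask = FALSE
)

# Run the pipeline; the results are stored in the _targets/ folder
tar_make()

# Names of the objects that the workflow has stored
stored <- tar_objects()

# tar_load() creates an object called `total` in your session
tar_load(total)

# tar_read() returns the value, so you can choose the name yourself
numbers_copy <- tar_read(numbers)

# Go back to where we started
setwd(old_wd)
```

`tar_load()` is handy when you want the object under its target name, while `tar_read()` suits quick inspection or assigning to a different name.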
**How do you organise your steps?**
It's mostly a matter of preference, and of trial and error to find the most efficient and convenient structure for your projects.
targets saves a file for every step you make, so if you want fewer intermediate files saved, you will need to combine steps in a meaningful way.
You can treat your targets workflow as a list from which you run all or only some of the steps (plus anything they depend on), e.g. `tar_make(penguins_data_raw)` or `tar_make(contains("raw"))`.
> Be aware: all target names must be unique.
## Workflow lifecycle
<!-- start 11:00 We covered some of the topics in this episode in the previous one through questions, so it went really fast. -->
By default, targets does not track changes in functions that come from packages, only changes in your own custom functions.
So if you update your package library, targets cannot tell that anything has changed, and it will not rerun the targets that use those package functions.
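To see the tracking of custom functions in action, here is a minimal sketch that runs in a throwaway directory. The function `add_offset()` and the target `result` are hypothetical names invented for this example.

```r
library(targets)

# Work in a temporary directory so your real project is left untouched
demo_dir <- tempfile("targets_lifecycle_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Version 1: a custom function defined in _targets.R
tar_script(
  {
    add_offset <- function(x) x + 1
    list(tar_target(result, add_offset(10)))
  },
  ask = FALSE
)
tar_make()

# Straight after a build, no target needs to be rerun
outdated_before <- tar_outdated()

# Version 2: the body of the custom function changes
tar_script(
  {
    add_offset <- function(x) x + 2
    list(tar_target(result, add_offset(10)))
  },
  ask = FALSE
)

# targets notices the edit and marks the dependent target as outdated
outdated_after <- tar_outdated()

# Go back to where we started
setwd(old_wd)
```

`tar_outdated()` only reports what would be rerun; nothing is rebuilt until the next `tar_make()`.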
To lock your package versions, the {renv} package is a great addition to your reproducibility toolbelt in R.
You can read more about renv on the [package website](https://rstudio.github.io/renv/articles/renv.html).
## Project organization
<!-- start 11:00 -->
Git is a nice version control program that is also great for reproducibility, and the Carpentries have a [novice course](https://swcarpentry.github.io/git-novice/) on it too.
You can leverage the power of having a repository, not just for your code, but for your cached objects created by your workflow with the package {gittargets}.
## Managing packages
<!-- start 11:25 -->
The [conflicted](https://conflicted.r-lib.org/) package helps you manage name conflicts between packages, where a command does not do what you expect because two packages provide a function with the same name, e.g. `filter()` from both the {stats} and the {dplyr} package.
To use conflicted with targets, you need to declare your conflict preferences in the project-level `.Rprofile`.
To open your `.Rprofile`, the {usethis} package is the best option.
```r
usethis::edit_r_profile("project")
# as opposed to
# usethis::edit_r_profile()
# which edits your user level profile
```
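A minimal sketch of what the project-level `.Rprofile` could then contain, assuming you want `filter()` to always mean the {dplyr} version (adjust the preference to your own project):

```r
# Contents of the project-level .Rprofile (sketch)
library(conflicted)

# Always use dplyr's filter() rather than stats::filter()
conflicts_prefer(dplyr::filter)
```

Restart R after saving the file so that the new profile is picked up.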
<!-- Lunch 11:45 - 12:45 -->
## External files
<!-- starts at 12:50 -->
The `!!` operator is often read aloud as "bang-bang".
You can read more about this (called quasiquotation) in [Advanced R](https://adv-r.hadley.nz/quasiquotation.html) by Hadley Wickham.
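As a small illustration of what `!!` does, the sketch below uses `expr()` from the {rlang} package to build an expression without running it. The variable `file_name` and its value are made up for this example, and `read_csv()` is never actually called.

```r
library(rlang)

file_name <- "penguins_raw.csv"  # illustrative value

# Without !!, the expression keeps the variable name
quoted <- expr(read_csv(file_name))

# With !!, the value of the variable is inserted into the expression
unquoted <- expr(read_csv(!!file_name))

print(quoted)
print(unquoted)
```

Comparing the two printed expressions shows the difference: the first still refers to `file_name`, the second contains the file name itself.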
## Batch processing
<!-- start 13:25 -13:58 -->
## Parallel processing
<!-- start 14:12 -->
`models[[1]]`: why do we need that?
We don't know!
We'll ask the developer and update the materials with an explanation when we get one!
The branching examples here use what is called "dynamic" branching.
There is also "static" branching, but it requires more advanced programming.
targets is geared towards data frames (rectangular data), so anything that takes or returns a data frame is very well suited to it.
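For reference, a minimal sketch of dynamic branching in a throwaway directory. The targets `values` and `squared` are hypothetical and unrelated to the penguin models from the lesson.

```r
library(targets)

# Work in a temporary directory so your real project is left untouched
demo_dir <- tempfile("targets_branching_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

tar_script(
  {
    list(
      tar_target(values, c(1, 2, 3)),
      # pattern = map(values) creates one branch per element of `values`
      tar_target(squared, values^2, pattern = map(values))
    )
  },
  ask = FALSE
)
tar_make()

# Reading the target combines the results of all branches
squared <- tar_read(squared)

# A single branch can be read on its own
first_branch <- tar_read(squared, branches = 1)

# Go back to where we started
setwd(old_wd)
```

Each branch is stored and tracked separately, so changing one input only requires rerunning the branches that depend on it.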
## Quarto
<!-- 14:26 - 14:45-->
https://github.com/joelnitta/penguins-targets
Download the file:
https://raw.githubusercontent.com/joelnitta/penguins-targets/main/penguin_report.qmd
A Quarto document is just a plain text file with the extension `.qmd`.