# Personal notes of Barbara in preparing R Packaging
To be used for additions to the [current workshop materials](https://carpentries-incubator.github.io/lesson-R-packaging/).
## Week 4 - Vignettes
Let's take a look at existing vignettes: `browseVignettes()`
Or specifically one from knitr: `browseVignettes("knitr")`
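For a console-only alternative (handy when no browser is available), `vignette()` can list what is installed. A minimal sketch, assuming the `grid` package is used as a stand-in because it ships with R; some minimal installations omit its vignettes, in which case the listing is simply empty.

```r
# List vignettes of one installed package without opening a browser.
# "grid" is only a stand-in here; any installed package name works.
listing <- vignette(package = "grid")

# The result holds a matrix with one row per vignette.
listing$results[, c("Item", "Title"), drop = FALSE]
```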
- Confirm whether people know Rmarkdown
- Let's make our report
`usethis::use_vignette("final_report")`
The Knit button only gives a preview. To actually render the vignette: `devtools::build_rmd("vignettes/final_report.Rmd")`
Mention that usethis also has `usethis::use_article("")`, for articles that only appear on the pkgdown website and are not shipped with the package.
## Week 3 - dependencies & data
### Dependencies
- We sometimes use functions from other packages
- Best practice: call them with `::`, as in `dplyr::filter()`
- These packages must be installed, otherwise your package does not work.
- They are listed in the `DESCRIPTION` file, under `Imports`
- This is how to put them there: `usethis::use_package("dplyr")`
- There is also `Suggests`
Q Can you think of a reason that a dependency would not be imported but only suggested?
A If it is only used for one function (you do need to check for the presence of that package before you run the function); if it is used to run the test suite; when they are used in an example or in your vignette...
- This is how to use Suggests: `usethis::use_package("tidyr", type = "Suggests")`
- You can also specify a minimal version: `usethis::use_package("magrittr", min_version = "1.0.0")`
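Checking for a suggested package before using it could look like the sketch below. `require_suggested()` and `widen_scores()` are hypothetical names made up for illustration, and the column names `round` and `score` are assumptions too.

```r
# Hypothetical helper: stop with a clear message when a suggested package is missing.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function. Please install it.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical package function that needs tidyr for this one task only.
widen_scores <- function(scores) {
  require_suggested("tidyr")
  tidyr::pivot_wider(scores, names_from = "round", values_from = "score")
}
```

Users who never call `widen_scores()` do not need tidyr installed at all.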
#### Do Exercise ####
Q It is probably possible to do this automatically. Can you think of a reason why it is good to keep this a manual step?
A Dependencies should be kept to a minimum. You should know what dependencies you introduce.
That said: `devtools::check()` will flag undeclared dependencies, as well as declared dependencies that never show up in your code.
Note about versions:
"Generally, it’s always better to specify the version and to be conservative about which version to require. Unless you know otherwise, always require a version greater than or equal to the version you’re currently using."
Check your version with: `packageVersion("dplyr")`
Automatically update them with `usethis::use_latest_dependencies()`
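A small sketch of how the "greater than or equal to the version you're currently using" rule translates into a `DESCRIPTION` entry. It uses `stats` as a stand-in because it ships with R; in practice this would be a package such as dplyr.

```r
# Version of an installed package; "stats" is only a stand-in here.
installed <- packageVersion("stats")
installed

# package_version objects compare correctly against version strings.
installed >= "3.5.0"

# The dependency entry that requires at least the version in use now.
entry <- paste0("stats (>= ", as.character(installed), ")")
entry
```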
### Data
```r=
sample_names <- c("Luke", "Darth Vader", "Leia", "Chewbacca", "Han Solo", "R2D2")
usethis::use_data(sample_names)
```
This is best done after the part on documenting with roxygen. Then document the data object:
```
#' Example names
#'
#' An example data set containing six names from the Star Wars universe
#'
#' @format A character vector of length 6
#' @source Star Wars
"sample_names"
```
- Do not add the `@export` tag. It is not needed, and it will in fact break your package.
- The object you create will be available to the user.
- It is not in NAMESPACE, that is OK.
Save raw data in `inst/extdata`.
When using the data, this is how you refer to the file path:
`system.file("extdata", "names.csv", package = "mysterycoffee")`
so load it with:
```r=
filepath <- system.file("extdata", "names.csv", package = "mysterycoffee")
names <- read.csv(filepath)
```
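One thing worth showing: `system.file()` does not fail when the file is missing, it returns an empty string. A minimal sketch using the `stats` package as a stand-in for `mysterycoffee`, since it is always installed; `no_such_file.csv` is a made-up file name.

```r
# A file that exists: every installed package has a DESCRIPTION file.
found <- system.file("DESCRIPTION", package = "stats")
file.exists(found)

# A file that does not exist: the result is "", not an error.
not_found <- system.file("extdata", "no_such_file.csv", package = "stats")
identical(not_found, "")

# mustWork = TRUE turns the silent empty string into an error.
result <- tryCatch(
  system.file("extdata", "no_such_file.csv", package = "stats", mustWork = TRUE),
  error = function(e) "error raised"
)
result
```

Using `mustWork = TRUE` inside package functions gives a clearer failure than `read.csv("")` would.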
### Exercise: add data to your package
```mermaid
flowchart LR
id1(Does the user need access?) --Yes--> id6(Store it in data/)
id3(Is the data in .Rda format?)--Yes--> id1
id1 --No, but tests do--> id5(Store it in tests/)
id1 --No, but functions do--> id4(Store it in R/sysdata.rda*)
id3 --No--> id8(But can it be?)
id8 --Yes, with some work --> id9(Document the process in data-raw/**)
id8 --No, it shouldn't--> id7(Store it in inst/extdata)
```
`*`) `R/sysdata.rda` is a file dedicated to (larger) data needed by your functions. [Read more about it here](https://r-pkgs.org/Data.html#sec-data-sysdata).
`**`) `data-raw/` is a folder dedicated to the origin and cleanup of your data. [Read more about it here](https://r-pkgs.org/Data.html#sec-data-data-raw).
Add data to your package:
- Do you need raw data as part of your package?
- Create a folder `inst/extdata`, and save the files here. Note that a user will be able to access this data.
- When loading the data, do not describe the path as you usually would. Instead, use something like:
```r
filepath <- system.file("extdata", "names.csv", package = "mysterycoffee")
names <- read.csv(filepath)
```
- Do you need data you can store in your package as an .Rda file?
- Create the object
- Store it with `usethis::use_data(object_name)`
- Verify the object is now stored in the `data/` folder
- Create a new R file called `data.R`: `usethis::use_r("data")` (`data.R` is an example; you may call this whatever you want)
- In this file, document the data object, using this example:
```
#' Title
#'
#' A short description.
#'
#' @format What format is the data in?
#' @source Where did it come from? \url{https://google.com}
"object_name"
```
- Don't forget to call `devtools::document()` to generate the documentation file. The data object will not show up in `NAMESPACE`, and that is OK.
## Week 2 - testing
- Check if `testthat` is installed.
Tests are formalized checks that you would do anyway.
### Set up testthat
`usethis::use_testthat()`
This creates the `tests/` folder and the proper structure inside it (`tests/testthat.R` and `tests/testthat/`).
### Set up a test
```r
usethis::use_test("name-of-function")
```
### Run a test
Options for running tests:
- Cmd/Ctrl + Shift + T
- `devtools::test()`
- Visual (with button); both individual test files and all tests
- As part of 'check'
### Workflow
Write test, run tests.
Update code, run tests.
No need to build in between :) `devtools::test()` loads the current package code into the testing environment!
### Test strategy
- Use setup for a test, use it once
- But decouple testing environments as much as possible (so don't daisy-chain tests, especially if you change things in setup)
- "Expected" should be as clean as possible; a fixed value, not itself subject to function use (unless that is warranted)
- But test your functions, not R functions (so don't verify things inside your test suite)
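A sketch of what a "clean" expected value means in practice. `count_groups()` is a hypothetical function, defined here only so the example runs on its own.

```r
library(testthat)

# Hypothetical function under test.
count_groups <- function(people) length(people) %/% 2

test_that("six people form three groups", {
  # Setup lives inside the test, so no other test depends on it.
  people <- c("Luke", "Leia", "Han Solo", "Chewbacca", "R2D2", "C3PO")

  # Good: the expected value is a fixed number.
  expect_equal(count_groups(people), 3)

  # Weaker: recomputing the expectation with the same logic as the function
  # would hide a bug in that logic.
  # expect_equal(count_groups(people), length(people) %/% 2)
})
```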
### Integration tests vs unit tests
- unit tests are about single functions
- integration will test larger workflow
- both have value (can you think of some?)
### Tests written
```r
# Use: six people should form three pairs
test_that("even number of people", {
  load("testdata.Rda")  # restores the `starwars` vector
  starwars <- sample(starwars, 6)
  groups <- make_groups(starwars)
  expect_equal(ncol(groups), 2)
  expect_equal(nrow(groups), 3)
})

# Abuse: a data frame is not valid input
test_that("dataframe", {
  df <- data.frame(1:10, 11:20)
  expect_error(make_groups(df), "Requires vector input.")
})

# Everything below this call is skipped
skip("uneven")

test_that("uneven number of people", {
  load("testdata.Rda")
  starwars <- sample(starwars, 7)
  groups <- make_groups(starwars)
  expect_equal(ncol(groups), 2)
})
```
```
starwars <- c("R2D2", "C3PO", "Luke", "Leia", "Darth Vader", "Amidala", "Han Solo", "Chewie", "JarJar Binks")
save(starwars, file="tests/testthat/testdata.Rda")
#rm(starwars)
#load(file="testdata.Rda")
```
### Add data to test
- Make data object
`starwars <- c("R2D2", "C3PO", "Luke", "Leia", "Darth Vader", "Amidala", "Han Solo", "Chewie", "JarJar Binks")`
- Save it in `tests/testthat/`
`save(starwars, file="tests/testthat/testdata.Rda")`
- When loading:
`load(file="testdata.Rda")`
This puts the object `starwars` in the environment (the name of the saved object, not the name of the file).
- Show that it needs to be loaded in every `test_that()` call, because the environment is recreated.
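A quick way to demonstrate that `load()` restores objects under the names they were saved with. This sketch writes to a temporary folder instead of `tests/testthat/`, so it can be run anywhere.

```r
starwars <- c("R2D2", "C3PO", "Luke", "Leia", "Darth Vader", "Amidala",
              "Han Solo", "Chewie", "JarJar Binks")

# Save to a temporary location (an assumption for the demo; in the package
# this would be tests/testthat/testdata.Rda).
path <- file.path(tempdir(), "testdata.Rda")
save(starwars, file = path)
rm(starwars)

# load() invisibly returns the names of the objects it restored.
restored <- load(path)
restored
exists("starwars")
```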
### Quiz (T/F)
- When you update/refactor a function, you have to update your tests.
- Tests can only be written on functions available to the user, not on 'helper functions'.
- Unit tests test small parts of the code, like single functions.
- You can add tests to a script, but they only work on functions.
### Test-driven development
Test-driven development turns the process around: instead of writing a function first and thinking about edge cases afterwards, you first formalize what you want your function to do, then write the function.
RED
Step 1: Write test(s)
Step 2: Run test(s) - make sure they fail [this means test adds value]
GREEN
Step 3: Write code so that test passes
Step 4: Ensure that all tests, including older tests, pass
REFACTOR
Step 5: Refactor and improve code
This also ensures high coverage: you only write code which fixes your tests, and only write tests that break your code, ensuring that your code is covered by tests.
PRO
- forces you to think about requirements first, define feature well
- saves time in the long run, bugs are detected early
- code will be easily testable by design, easier maintenance
PITFALLS
- takes time initially, can be a lot of overhead
- unit tests can accumulate and make you feel like everything is covered, you don't spend time on integration tests
- if you write tests AND code, you can have blind spots and miss testing opportunities
Use vs. abuse tests
- Use: check that normal use matches the expected behavior, with `expect_true`, `expect_false`, `expect_equal`
- Abuse: check that faulty use generates an error, with `expect_error`
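A minimal sketch of one "use" test and one "abuse" test side by side. `split_bill()` is a hypothetical function invented for this example.

```r
library(testthat)

# Hypothetical function under test.
split_bill <- function(total, people) {
  if (people <= 0) {
    stop("Need at least one person.")
  }
  total / people
}

test_that("use: valid input gives the expected share", {
  expect_equal(split_bill(30, 3), 10)
  expect_true(split_bill(30, 3) > 0)
})

test_that("abuse: invalid input raises an error", {
  expect_error(split_bill(30, 0), "at least one person")
})
```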

[source: Medium](https://medium.com/@l.ogrady/test-driven-development-the-red-green-refactor-cycle-e36ee7e6f1b4)
#### example
- uneven number.
- setup test (can be in a function!)
- write ncol/nrow test
- KEEP ONLY BREAKING TESTS (ideally you do this from beginning)
- fix (by ensuring even number in function)
- tests now work
- you want to report person removed
- Write test `expect_warning`
- Does not work, so now write warning
- Now add a dataframe to the function
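Where the example might end up after a few red-green-refactor rounds, as a sketch. This `make_groups()` is a hypothetical implementation written for these notes; the actual workshop function may differ, and the warning text is an assumption.

```r
library(testthat)

# Hypothetical make_groups(); not necessarily the workshop's implementation.
make_groups <- function(people) {
  if (!is.atomic(people)) {
    stop("Requires vector input.")
  }
  if (length(people) %% 2 != 0) {
    # Drop the last person so everyone left has a partner, and report it.
    removed <- people[length(people)]
    warning("Uneven number of people, removed: ", removed)
    people <- people[-length(people)]
  }
  # Shuffle, then fill a two-column matrix: one row per pair.
  matrix(sample(people), ncol = 2)
}

test_that("uneven number of people drops one person with a warning", {
  starwars <- c("R2D2", "C3PO", "Luke", "Leia", "Darth Vader", "Amidala", "Han Solo")
  expect_warning(groups <- make_groups(starwars), "removed")
  expect_equal(ncol(groups), 2)
  expect_equal(nrow(groups), 3)
})

test_that("data frame input is rejected", {
  df <- data.frame(1:10, 11:20)
  expect_error(make_groups(df), "Requires vector input.")
})
```

Each test here was written before the matching branch in the function existed, which is the point of the exercise.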
### Skipping tests
https://testthat.r-lib.org/reference/skip.html
Add `skip("Message")` to the top of the file (or in the middle).
Everything following this will be skipped.
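`skip()` can also be used inside a single `test_that()` block, and testthat has conditional variants. A short sketch; the skip message and the choice of tidyr are only examples.

```r
library(testthat)

test_that("uneven number of people", {
  # Skips only this test; the message shows up in the test report.
  skip("Uneven groups are not implemented yet")
  expect_equal(1 + 1, 3)  # never evaluated
})

test_that("reshaping works when tidyr is available", {
  # Conditional skip: the test runs only if the suggested package is installed.
  skip_if_not_installed("tidyr")
  expect_true(requireNamespace("tidyr", quietly = TRUE))
})
```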
## Week 1 - licenses
```r
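library(dplyr)  # provides %>%, group_by(), summarize(), arrange()

# Count CRAN packages per license, most common first (needs internet access)
tools::CRAN_package_db() %>%
  group_by(License) %>%
  summarize(count = n()) %>%
  arrange(desc(count))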
```