# Personal notes of Barbara in preparing R Packaging

To be used for additions to the [current workshop materials](https://carpentries-incubator.github.io/lesson-R-packaging/).

## Week 4 - Vignettes

Let's take a look at existing vignettes: `browseVignettes()`

Or specifically one from knitr: `browseVignettes("knitr")`

- Confirm whether people know Rmarkdown
- Let's make our report

`usethis::use_vignette("final_report")`

The knit button gives a preview. To actually build the document: `devtools::build_rmd("vignettes/final_report.Rmd")`

Mention that usethis also knows `usethis::use_article("")`, which creates a vignette-like article that is only built for a pkgdown website.

## Week 3 - dependencies & data

### Dependencies

- We sometimes use functions from other packages
- Best practice: call them with `::`
- These packages must be installed, otherwise your package does not work.
- They are listed in the `DESCRIPTION`
- This is how to put them there: `usethis::use_package("dplyr")`
- There is also 'Suggests'

Q Can you think of a reason that a dependency would not be imported but only suggested?

A If it is only used for one function (you do need to check for the presence of that package before you run the function); if it is used to run the test suite; if it is used in an example or in your vignette...

- This is how to use Suggests: `usethis::use_package("tidyr", type="suggests")`
- You can also specify a minimal version: `usethis::use_package("magrittr", min_version="1.0.0")`

#### Do Exercise ####

Q It is probably possible to do this automatically. Can you think of a reason why it is good to keep this a manual step?

A Dependencies should be kept to a minimum. You should know what dependencies you introduce. THAT SAID: the "check" will check whether you have undeclared dependencies or whether your declared dependencies don't actually show up in your code.

Note about versions: "Generally, it’s always better to specify the version and to be conservative about which version to require.
Unless you know otherwise, always require a version greater than or equal to the version you’re currently using."

Check your version with: `packageVersion("dplyr")`

Automatically update them with `usethis::use_latest_dependencies()`

### Data

```r=
sample_names <- c("Luke", "Darth Vader", "Leia", "Chewbacca", "Han Solo", "R2D2")
usethis::use_data(sample_names)
```

Best done once documentation with roxygen is in place; then document the object:

```
#' Example names
#'
#' An example data set containing six names from the Star Wars universe
#'
#' @format A vector of strings
#' @source Star Wars
"sample_names"
```

- No need to add the `@export` tag; in fact, it will break your package.
- The object you create will be available to the user.
- It is not in `NAMESPACE`; that is OK.

Save raw data in `inst/extdata`. When using the data, this is how you refer to the file path: `system.file("extdata", "names.csv", package = "mysterycoffee")`, so load it with:

```r=
filepath <- system.file("extdata", "names.csv", package = "mysterycoffee")
names <- read.csv(filepath)
```

### Exercise: add data to your package

```mermaid
flowchart LR
    id1(Does the user need access?) --Yes--> id6(Store it in data/)
    id3(Is the data in .Rda format?)--Yes--> id1
    id1 --No, but tests do--> id5(Store it in tests/)
    id1 --No, but functions do--> id4(Store it in R/sysdata.rda*)
    id3 --No--> id8(But can it be?)
    id8 --Yes, with some work --> id9(Document the process in data-raw/**)
    id8 --No, it shouldn't--> id7(Store it in inst/extdata)
```

`*`) `R/sysdata.rda` is a file dedicated to (larger) data needed by your functions. [Read more about it here](https://r-pkgs.org/Data.html#sec-data-sysdata).

`**`) `data-raw/` is a folder dedicated to the origin and cleanup of your data. [Read more about it here](https://r-pkgs.org/Data.html#sec-data-data-raw).

Add data to your package:

- Do you need raw data as part of your package?
  - Create a folder `inst/extdata`, and save the files here. Note that a user will be able to access this data.
  - When loading the data, do not write out the path as you usually would. Instead, use something like:

    ```r
    filepath <- system.file("extdata", "names.csv", package = "mysterycoffee")
    names <- read.csv(filepath)
    ```

- Do you need data you can store in your package as an .Rda file?
  - Create the object
  - Store it with `usethis::use_data(object_name)`
  - Verify the object is now stored in the `data/` folder
  - Create a new R file called `data.R`: `usethis::use_r("data")` (`data.R` is an example; you may call this whatever you want)
  - In this file, document the data object, using this example:

    ```
    #' Title
    #'
    #' A short description.
    #'
    #' @format What format is the data in?
    #' @source Where did it come from? \url{https://google.com}
    "object_name"
    ```

  - Don't forget to call `devtools::document()` to generate the documentation file. The data object will not appear in `NAMESPACE`; that is expected.

## Week 2 - testing

- Check if `testthat` is installed.

Tests are formalized checks that you would do anyway.

### Set up testthat

`usethis::use_testthat()`

This creates the `tests/` folder and the proper structure inside it.

### Set up a test

```r
usethis::use_test("name-of-function")
```

### Run a test

Options for running tests:

- Cmd/Ctrl + Shift + T
- `devtools::test()`
- Visual (with button); both individual test files and all tests
- As part of 'check'

### Workflow

Write test, run tests. Update code, run tests. No need to build in between :) The package is loaded in the testing environment!

### Test strategy

- Use setup for a test, use it once
- But decouple testing environments as much as possible (so don't daisy-chain tests, especially if you change things in setup)
- "Expected" should be as clean as possible; a fixed value, not itself subject to function use (unless that is warranted)
- But test your functions, not R functions (so don't verify things inside your test suite)

### Integration tests vs unit tests

- unit tests are about single functions
- integration tests cover a larger workflow
- both have value (can you think of some?)

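
To make the distinction concrete, here is a minimal sketch, assuming `testthat` is installed. The functions `make_even()`, `pair_up()` and `pair_people()` are hypothetical stand-ins invented for this illustration; they are not the workshop's `make_groups()`.

```r
library(testthat)

# Hypothetical helper: drop the last element when the length is odd.
make_even <- function(x) {
  if (length(x) %% 2 == 1) x[-length(x)] else x
}

# Hypothetical helper: arrange an even-length vector into two columns.
pair_up <- function(x) {
  matrix(x, ncol = 2)
}

# Hypothetical workflow that chains the two helpers.
pair_people <- function(x) {
  pair_up(make_even(x))
}

# Unit tests: each helper is checked on its own.
test_that("make_even drops one element from odd-length input", {
  expect_equal(make_even(1:5), 1:4)
  expect_equal(make_even(1:4), 1:4)
})

test_that("pair_up returns two columns", {
  expect_equal(dim(pair_up(1:6)), c(3L, 2L))
})

# Integration test: the whole workflow, from raw input to final result.
test_that("pair_people handles an odd number of people", {
  groups <- pair_people(c("Luke", "Leia", "Han Solo", "Chewie", "R2D2"))
  expect_equal(dim(groups), c(2L, 2L))
  expect_false("R2D2" %in% groups)
})
```

In this sketch, a failing unit test points at one helper, while the integration test would also catch problems in how the helpers are combined.
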
### Tests written

```r
test_that("even number of people", {
  # the test data has to be loaded inside every test_that() call
  load("testdata.Rda")
  starwars <- sample(starwars, 6)
  groups <- make_groups(starwars)
  expect_equal(ncol(groups), 2)
  expect_equal(nrow(groups), 3)
})

test_that("dataframe", {
  df <- data.frame(1:10, 11:20)
  expect_error(make_groups(df), "Requires vector input.")
})

# everything below this line is skipped
skip("uneven")
test_that("uneven number of people", {
  load("testdata.Rda")
  starwars <- sample(starwars, 7)
  groups <- make_groups(starwars)
  expect_equal(ncol(groups), 2)
})
```

```
starwars <- c("R2D2", "C3PO", "Luke", "Leia", "Darth Vader", "Amidala", "Han Solo", "Chewie", "JarJar Binks")
save(starwars, file="tests/testthat/testdata.Rda")
#rm(starwars)
#load(file="testdata.Rda")
```

### Add data to test

- Make data object `starwars <- c("R2D2", "C3PO", "Luke", "Leia", "Darth Vader", "Amidala", "Han Solo", "Chewie", "JarJar Binks")`
- Save it in `tests/testthat/`: `save(starwars, file="tests/testthat/testdata.Rda")`
- When loading: `load(file="testdata.Rda")`. This puts the object `starwars` in the environment.
- Show that it needs to be loaded in every `test_that()` call, because each call gets a fresh environment.

### Quiz (T/F)

- When you update/refactor a function, you have to update your tests.
- Tests can only be written on functions available to the user, not on 'helper functions'.
- Unit tests test small parts of the code, like single functions.
- You can add tests to a script, but they only work on functions.

### Test-driven development

This turns the process around: instead of writing functions first and thinking of edge cases afterwards, you first formalize what you want your function to do, then write the function.

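
A minimal sketch of that test-first order, assuming `testthat` is installed. The names `meets_requirement()`, `draft()` and `final()` are hypothetical and only serve this illustration. The `tryCatch()` wrapper is there so the failing state can be shown without stopping the script; in a real package the expectations would simply live in a `test_that()` block.

```r
library(testthat)

# Write the requirement down first: odd input should give a warning
# and still produce two columns. `pair_fn` is a hypothetical stand-in
# for the function under development.
meets_requirement <- function(pair_fn) {
  tryCatch({
    expect_warning(groups <- pair_fn(1:5), "removed")
    expect_equal(dim(groups), c(2L, 2L))
    TRUE
  }, error = function(e) FALSE)  # a failed expectation is caught here
}

# First draft: silently drops the odd person, so the requirement fails.
draft <- function(x) {
  keep <- seq_len(2 * (length(x) %/% 2))
  matrix(x[keep], ncol = 2)
}

# Updated version: warns before dropping the odd person.
final <- function(x) {
  if (length(x) %% 2 == 1) {
    warning("One person was removed to get an even number.")
    x <- x[-length(x)]
  }
  matrix(x, ncol = 2)
}

meets_requirement(draft)  # should be FALSE: the test fails before the fix
meets_requirement(final)  # should be TRUE: the test passes after the fix
```
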
RED

- Step 1: Write test(s)
- Step 2: Run test(s) - make sure they fail [this means the test adds value]

GREEN

- Step 3: Write code so that the test passes
- Step 4: Ensure that all tests, including older tests, pass

REFACTOR

- Step 5: Refactor and improve code

This also ensures high coverage: you only write code which fixes your tests, and only write tests that break your code, ensuring that your code is covered by tests.

PRO

- forces you to think about requirements first, define feature well
- saves time in the long run, bugs are detected early
- code will be easily testable by design, easier maintenance

PITFALLS

- takes time initially, can be a lot of overhead
- unit tests can accumulate and make you feel like everything is covered, so you don't spend time on integration tests
- if you write tests AND code, you can have blind spots and miss testing opportunities

Use vs abuse tests

- check that the use matches expected behavior: use
  - `expect_true`, `expect_false`, `expect_equal`
- check that faulty behavior generates an error: abuse
  - `expect_error`

![](https://i.imgur.com/5TuNlnG.png)
[source: Medium](https://medium.com/@l.ogrady/test-driven-development-the-red-green-refactor-cycle-e36ee7e6f1b4)

#### example - uneven number.

- setup test (can be in a function!)
- write ncol/nrow test
- KEEP ONLY BREAKING TESTS (ideally you do this from beginning)
- fix (by ensuring even number in function)
- tests now work
- you want to report person removed
- Write test `expect_warning`
- Does not work, so now write warning
- Now add a dataframe to the function

### Skipping tests

https://testthat.r-lib.org/reference/skip.html

Add `skip("Message")` to the top of the file (or in the middle). Everything following this will be skipped.

## Week 1 - licenses

```r
# dplyr provides %>%, group_by(), summarize(), n(), arrange() and desc()
library(dplyr)

tools::CRAN_package_db() %>%
  group_by(License) %>%
  summarize(count = n()) %>%
  arrange(desc(count))
```