bbolker

@bbolker

Joined on Jul 1, 2020

  • In general, R scripts can be run just like any other kind of program on an HPC (high-performance computing) system such as the Compute Canada systems. However, there are a few peculiarities to using R that are useful to know about. This document compiles some helpful practices; it is aimed at people who are familiar with R but unfamiliar with HPC, or vice versa. Some of these instructions will be specific to Compute Canada ca. 2024, and particularly to the Graham cluster. I assume that you're somewhat familiar with high-performance computing (i.e. you've taken the Compute Canada orientation session and know how to use sbatch/squeue/etc. to work with the batch scheduler). Below, "batch mode" means running R code from an R script rather than starting R and typing commands at the prompt (i.e. "interactive mode"); "on a worker" means running a batch-mode script via the SLURM scheduler (i.e. using sbatch) rather than in a terminal session on the head node. Commands to be run within R will use an R> prompt; those to be run in the shell will use $. Getting started Compute Canada reminders
  • We would like to be able to estimate parameter values for dynamic models with all of the following characteristics:
      • discrete states: we usually want to count at the level of individuals. Finite-size effects (increased sampling noise at low prevalence, and fadeout/extinction processes) are important, especially at the beginnings and ends of epidemics and for outbreaks in small populations
      • continuous time: epidemics 'really' run in continuous time; even though the time scales of epidemic processes are usually longer than a day, some processes can be close to this time scale, and discreteness can cause annoying dynamical instabilities [Ref Mollison and Ud Din?]
      • both observation and process error: note that 'process error' can occur at two weakly separable scales, i.e. 'sampling-level' (demographic noise, either 'simple' [Poisson noise/Poisson-process branching events] or overdispersed [Hooke processes, Gamma-white noise processes [Ionides and King], negative binomial/beta-binomial epidemic sampling]) or stochastic time-varying parameter values, especially transmission rates
      • 'plug-and-play' analysis of complex epidemic models
      • convenient inference, especially Bayesian
  • What to include in the repository: Start here and here; Tips for managing large repos; Definitely include: Code; Definitely exclude
  • I created this document a few years ago to collect some of the connections and ideas I found interesting among topics such as literate programming, workflow tools, collaborative tools, etc. I still think the ideas are interesting, but a fair number of the tools are out of date. Here I will just list some categories I think are useful and some tools that fall in those categories. It's a little alarming how rapid the turnover is (dead links, discontinued projects, etc.). Categories: Tools for reproducible reports (I'm using this term rather than literate programming, which I will save for the old Knuth-style (cweb/noweb) concept): Sweave/Rnw; Rmarkdown and its extensions (bookdown, pagedown, blogdown ...)