# Safe Haven Core Packages

The list of software and packages supported in the Safe Haven has grown organically over several years and has become a bit unwieldy. We'd like to make a fresh start by crowdsourcing what researchers currently using the Safe Haven think is important.

Please check that all the packages you think are important are listed here.

### Filling out the Python/R package list

In the `Description` column please briefly explain what the package is for.

In the `Importance` column please use the following scoring system:

- 1: critically important to everyone; it will be extremely difficult to tackle any problems without this package (e.g. `pandas` for Python or `tidyverse` for R)
- 2: useful for everyone; this is a package which multiple people will be interested in using and which I use on a regular basis across several different projects (e.g. `fbprophet` for Python or `RStan` for R)
- 3: nice to have; this is a package which has multiple uses and several people will be interested in using it but there are alternatives/workarounds if it is not available
- 4: this is a package only of interest for a single Safe Haven project

For the `Downloads last month` (2 s.f.) and `Contributors` columns, you should use the stats from

- Python: https://pypistats.org/packages/PackageName and https://libraries.io/pypi/PackageName
- R: https://cranlogs.r-pkg.org/downloads/total/last-month/PackageName (or `cranlogs` R package) and https://libraries.io/cran/PackageName

### Filling out the Other Software list

In the `Description` column please briefly explain what the software is for.
Use the same `Importance` scoring as for Python/R.
If possible, link to installation instructions for Ubuntu (this could just be `apt-get install software-name` if that's what's needed).
Add a justification explaining what this software is useful for and what things can be done with it which couldn't be done otherwise.

### DECOVID specific packages

There may be a number of important packages which are specific to the work of DECOVID but are not necessarily "core" packages. These packages can be entered at the bottom of the document after the main Python/R/Other sections.

## Python

### Data Science/Regression/Mathematics

| Package      | Description                    | Importance | Downloads last month | Contributors |
| ------------ | ------------------------------ | ---------- | -------------------- | ------------ |
| numpy        |                                | 1          | 37M                  | 736          |
| pandas       |                                | 1          | 22M                  | 884          |
| scipy        |                                | 1          | 22M                  | 680          |
| sympy        |                                | 3          | 1M                   | ?            |
| statsmodels  | scipy stats                    | 2          | 2.7M                 | 226          |

### Machine Learning

| Package           | Description                    | Importance | Downloads last month | Contributors |
| ----------------- | ------------------------------ | ---------- | -------------------- | ------------ |
| fbprophet         | Forecasting                    | 2          | 630k                 | 83           |
| Keras             |                                |            | 2.5M                 | 724          |
| pymc3             |                                | 2          | 92k                  | 243          |
| pystan            |                                | 2          | 90k                  | 35           |
| scikit-learn      |                                | 2          | 13M                  | 764          |
| tensorflow        |                                |            | 12M                  | ?            |
| gpflow            |                                | 3          | ?                    | 55           |
| Theano            |                                |            | 390k                 | ?            |
| torch             |                                | 2          | 1.5M                 | ?            |
| pytorch-lightning |                                | 3          | 253k                 | 181          |
| xgboost           |                                |            | 2.7M                 | 401          |
| scikit-multilearn |                                | 2          | 20K                  | ?            |
| lifelines         | Survival analysis              | 2          | 102K                 | 75           |
| corels            |                                | 3          | 167                  | 1            |

### Databases

| Package    | Description           | Importance | Downloads last month | Contributors |
| ---------- | --------------------- | ---------- | -------------------- | ------------ |
| alembic    |                       |            |                      |              |
| pyodbc     | DB driver             | 1          | 3.1M                 | 26           |
| psycopg2   | PostgreSQL driver     | 2          | 6.5M                 | 83           |
| SQLAlchemy | SQL <-> py            | 2          | 12.4M                | 355          |

### Data wrangling

| Package        | Description                    | Importance | Downloads last month | Contributors |
| -------------- | ------------------------------ | ---------- | -------------------- | ------------ |
| geopandas      |                                |            |                      |              |
| beautifulsoup4 |                                |            |                      |              |
| clevercsv      | Easy CSV file handling         | 3          | 40K                  | 1            |

### Visualisation

| Package    | Description                    | Importance | Downloads last month | Contributors |
| ---------- | ------------------------------ | ---------- | -------------------- | ------------ |
| bokeh      |                                |            |                      |              |
| matplotlib |                                | 1          |                      |              |
| seaborn    |                                | 3          |                      |              |

### Code Quality / Documentation

| Package | Description                    | Importance | Downloads last month | Contributors |
| ------- | ------------------------------ | ---------- | -------------------- | ------------ |
| black   | Code formatting                | 2          | 2.9M                 | 136          |
| flake8  | Static code checker            |            | 7.6M                 | ?            |
| pytest  | Tests                          | 2          |                      |              |
| Sphinx  | Documentation                  |            |                      |              |
| pylint  | Static code checker            | 2          | 5.0M                 | 275          |

### NLP

| Package | Description                    | Importance | Downloads last month | Contributors |
| ------- | ------------------------------ | ---------- | -------------------- | ------------ |
| spacy   |                                |            |                      |              |
| nltk    |                                |            |                      |              |

### Acceleration/Utility

| Package  | Description                            | Importance | Downloads last month | Contributors |
| -------- | -------------------------------------- | ---------- | -------------------- | ------------ |
| Cython   |                                        |            |                      |              |
| pathos   |                                        | 3          | 640K                 |              |
| Pillow   |                                        |            |                      |              |
| pyopencl |                                        |            |                      |              |
| pycuda   |                                        |            |                      |              |
| mpctools | Own Library for common utilities       | 3          | 187                  | 1            |
| syncrng  | Consistent RNG across Python/R         | 3          | 20                   | 1            |
| fire     | Access kwargs via CLI without argparse. | 3         | 1.0M                 | 34           |

### Interoperability Python/R

| Package  | Description                                                | Importance | Downloads last month | Contributors |
| -------- | ---------------------------------------------------------- | ---------- | -------------------- | ------------ |
| rpy2     | Call R from Python                                         | 3          | 98K                  | 36           |
| pyarrow  | Feather/Parquet: language agnostic dataframe file formats. | 2          | 6.4M                 | 304          |

## R

### Data Science/Regression/Mathematics

| Package         | Description                    | Importance | Downloads last month | Contributors |
| --------------- | ------------------------------ | ---------- | -------------------- | ------------ |
| tidyverse       |                                | 1          | 700k                 | 28           |
| -> dplyr        |                                |            | 1.2M                 | 181          |
| -> ggplot2      |                                |            | 1.6M                 | 230          |
| -> purrr        |                                |            | 930k                 | 76           |
| -> readr        |                                |            | 490k                 | 70           |
| -> tibble       |                                |            | 1.3M                 | 55           |
| -> tidyr        |                                |            | 1.4M                 | 98           |
| -> dbplyr       |                                | 1          | 400k                 |              |
| gfoRmula        | G methods for causal inference | 2          | 654                  | 2            |
| ppcor           |                                |            |                      |              |
| bnlearn         |                                |            |                      |              |
| dagitty         |                                |            |                      |              |
| ggm             |                                |            |                      |              |
| glasso          |                                |            |                      |              |
| pcalg           |                                |            |                      |              |
| riskRegression  | Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks | 2 | 2971 | 6 |
| timeROC         |                                |            |                      |              |
| boot            |                                |            |                      |              |
| smcfcs          |                                |            |                      |              |
| CoxBoost        |                                |            |                      |              |
| ipw             |                                |            |                      |              |
| gbm             |                                |            |                      |              |
| glmnetUtils     |                                |            |                      |              |
| flexsurv        |                                |            |                      |              |
| mets            |                                |            |                      |              |
| mstate          |                                |            |                      |              |
| party           |                                |            |                      |              |
| pec             |                                |            |                      |              |
| penalized       |                                |            |                      |              |
| pROC            |                                |            |                      |              |
| randomForest    |                                | 1          | 140k                 |              |
| randomForestSRC |                                |            | 5.8k                 |              |
| rpart           |                                |            |                      |              |
| survival        | Core survival analysis routines | 1         | 160k                 | 6            |
| mice            | Multivariate Imputation by Chained Equations | 2 | 91k           | 17           |
| missForest      | Imputation via random forest   | 3          | 8.2k                 | 2            |
| lcmm            | Latent class mixed Modelling   | 3          | 2k                   |              |

### Machine Learning

| Package          | Description                    | Importance | Downloads last month | Contributors |
| ---------------- | ------------------------------ | ---------- | -------------------- | ------------ |
| mlr              |                                |            | 19k                  | 74           |
| RStan            |                                | 2          | 84k                  | >10          |
| brms             |                                | 2          | 19k                  | 1            |
| rstanarm         |                                | 2          | 22k                  | >15          |
| Prophet          |                                |            | 22k                  | 83           |
| glmnet           |                                |            | 82k                  | ?            |
| SuperLearner     |                                |            |                      |              |
| tidymodels       |                                | 2          | 14k                  | ?            |
| -> recipes       |                                |            | 200k                 |              |
| -> rsample       |                                |            | 51k                  |              |
| -> parsnip       |                                |            | 19k                  |              |
| -> tune          |                                |            | 13k                  |              |
| -> yardstick     |                                |            | 22k                  |              |
| -> tidyposterior |                                |            |                      |              |
| -> broom         |                                |            |                      |              |

### Databases

| Package     | Description                    | Importance | Downloads last month | Contributors |
| ----------- | ------------------------------ | ---------- | -------------------- | ------------ |
| odbc        |                                | 1          | 60k                  | 25           |
| RPostgreSQL |                                | 1          | 54k                  | 2            |

### Data wrangling

| Package        | Description                    | Importance | Downloads last month | Contributors |
| -------------- | ------------------------------ | ---------- | -------------------- | ------------ |
| lubridate      |                                | 1          | 620k                 | 69           |
| stringr        |                                | 1          | 740k                 | 43           |
| XLConnect      |                                |            | 17k                  | ?            |
| xlsx           |                                |            | 135k                 | 2            |
| DT             |                                |            |                      |              |
| data.table     | Extension to dataframes        | 2          | 590k                 | 84           |

### Visualisation

| Package      | Description                    | Importance | Downloads last month | Contributors |
| ------------ | ------------------------------ | ---------- | -------------------- | ------------ |
| esquisse     |                                |            | 6k                   | 4            |
| cowplot      |                                | 3          | 21k                  | 14           |
| patchwork    |                                | 3          | 31k                  | ?            |
| RColorBrewer |                                | 3          | 520k                 | ?            |
| egg          |                                | 3          | 10k                  | ?            |
| lemon        |                                | 3          | 2.7k                 | 1            |
| plotROC      |                                |            |                      |              |
| precrec      |                                |            |                      |              |
| survminer    | Survival visualisation         | 2          | 25k                  | 11           |
| gridExtra    | Functions for "Grid" Graphics  | 2          | 300k                 | 2            |

### Code Quality / Documentation

| Package    | Description                    | Importance | Downloads last month | Contributors |
| ---------- | ------------------------------ | ---------- | -------------------- | ------------ |
| knitr      |                                | 1          | 930k                 | 124          |
| rmarkdown  |                                | 1          | 762k                 | 92           |
| roxygen2   |                                |            | 280k                 | 74           |
| testthat   |                                |            | 810k                 | 86           |
| lintr      | R code linter                  | 3          | 32K                  | 53           |
| styler     | R code styler                  | 3          | 25K                  | 13           |
| latex2exp  | LaTeX in plots                 | 3          | 6.7k                 | 1            |
| devtools   | Package building               | 2          | 1.5M                 | 123          |
| kableExtra | knitr tables                   | 3          | 65k                  | 23           |
| R.rsp      |                                |            |                      |              |
| assertr    | Assertive programming          | 3          | 2k                   | 16           |
| renv       | Reproducible environments      | 3          | 28k                  | 14           |

### NLP

| Package   | Description                    | Importance | Downloads last month | Contributors |
| --------- | ------------------------------ | ---------- | -------------------- | ------------ |
| text2vec  |                                |            | 11k                  | 11           |

### Acceleration/Utility

| Package        | Description                    | Importance | Downloads last month | Contributors |
| -------------- | ------------------------------ | ---------- | -------------------- | ------------ |
| SyncRNG        | Consistent RNG across Python/R | 3          | 610                  | 1            |
| futile.logger  | Logging (at least one logging package should be mandatory) | 3 | 75k | ?        |
| rbenchmark     |                                |            |                      |              |
| microbenchmark | Benchmark accurately           | 3          | 26k                  |              |

### Biological/medical data

| Package     | Description                    | Importance | Downloads last month | Contributors |
| ----------- | ------------------------------ | ---------- | -------------------- | ------------ |
| BiocManager |                                |            | 110k                 | 10           |

### Interoperability

| Package       | Description                                               | Importance | Downloads last month | Contributors |
| ------------- | --------------------------------------------------------- | ---------- | -------------------- | ------------ |
| Rcpp          | C++ from R                                                | 2          | 1.1M                 | ?            |
| RcppArmadillo | Interface from R to C++ linear algebra                    | 2          | 400k                 | 15           |
| reticulate    | Python from R                                             | 3          | 190K                 | 42           |
| arrow         | Feather/Parquet: language agnostic dataframe file formats | 2          | 23K                  | 413          |

## Other software

| Software name  | Description         | Importance | Ubuntu installation instructions | Justification |
| -------------- | ------------------- | ---------- | -------------------------------- | ------------- |
| weka           |                     |            |                                  |               |
| make           | Standard build tool | 1          |                                  | Want for reproducible builds |
| pandoc         | Document conversion | 2          | apt-get install pandoc           | eg. Convert MD to HTML, docx or PDF |
| Apache Spark 3 | Big data processing | 3          |                                  | R and Python are both pretty useless for big data |

--------

## DECOVID specific Packages

### Python

| Package         | Description          | Importance | Downloads last month | Contributors | Location |
| --------------- | -------------------- | ---------- | -------------------- | ------------ | -------- |
| dash            | Dashboards/apps      | 1          | 300K                 | 44           | PyPI     |
| gunicorn        | WSGI server          | 2          | 7.9M                 | 289          | PyPI     |
| repro-catalogue | Reproducibility tool |            | 186                  | 7            | PyPI     |

### R

| Package   | Description                                      | Importance | Downloads last month | Contributors | Location |
| --------- | ------------------------------------------------ | ---------- | -------------------- | ------------ | -------- |
| Achilles  | OHDSI standardised reporting                     | 1          | ?                    | 38           | https://github.com/OHDSI/Achilles |
| JM        | Joint Modeling of Longitudinal and Survival Data | 2          | 1.6k                 | 2            | CRAN     |
| JMbayes   | Bayesian JM                                      | 2          | 1.3k                 | 4            | CRAN     |
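
### Formatting download counts (optional helper)

The `Downloads last month` column asks for values at 2 significant figures, and most rows use a `k` or `M` suffix. The following is a minimal sketch of one way to turn a raw monthly count copied from the stats pages into that form. The function names `round_sig` and `format_downloads` are made up for this example, and the numbers in the usage lines are illustrative inputs rather than real download stats.

```python
from math import floor, log10


def round_sig(value: float, sig: int = 2) -> float:
    """Round value to `sig` significant figures (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Position of the last digit to keep, relative to the decimal point.
    digits = sig - 1 - floor(log10(abs(value)))
    return round(float(value), digits)


def format_downloads(count: int) -> str:
    """Format a raw download count as 2 s.f. with a k/M suffix."""
    # Round first so that e.g. 999,600 becomes 1M rather than 1000k.
    rounded = round_sig(count, 2)
    for threshold, suffix in ((1_000_000, "M"), (1_000, "k")):
        if rounded >= threshold:
            return f"{rounded / threshold:g}{suffix}"
    return f"{rounded:g}"


if __name__ == "__main__":
    # Illustrative raw counts only.
    for raw in (37_214_889, 2_712_345, 630_412, 654):
        print(raw, "->", format_downloads(raw))
```

Counts below 1,000 are left without a suffix, so small packages still show a plain number.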