# Tools for Reproducible Research (course_2104)
This is a collaborative document where you can give feedback, ask questions and receive help from both TAs and other students. The idea is that it will complement the Zoom breakout rooms and allow you to get more rapid help with common questions and issues.
We've created headers for each day to keep things organised. Simply write your questions and comments under the respective day.
## How to edit
Simply click the `Edit` button in the top menu to get started. You can view the document in split Markdown, Markdown/Preview or Preview modes.
## Important links
- [Zoom link](https://stockholmuniversity.zoom.us/j/66028516843)
- [Course homepage](https://nbis-reproducible-research.readthedocs.io/en/latest/)
## Teachers
- John Sundh
- Verena Kutschera
- Erik Fasterius
- Tomas Larsson
- Nina Norgren
- Estelle Proux-Wera
# Technical issues
**Question:** Despite following the tutorial, the `main` branch is still called `master`. How can we solve this?
**Answer:** Try updating Git, since support for naming the default branch `main` was introduced fairly recently. Git still uses `master` as the default, so you have to run `git config --global init.defaultBranch "main"` to change this (a config parameter which was introduced in Git version 2.28).
**Answer 2:** If you have already created your repository or have an older version of Git, you can rename the branch to `main` while located in the `master` branch. Use `git branch -m main`.
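The rename can be tried out safely in a throwaway repository. The following is only a minimal sketch: the temporary directory, the placeholder identity and the empty commit are assumptions made for illustration, not part of the tutorial.

```bash
# Throwaway repository, so nothing real is touched
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"            # placeholder identity, local to this repo
git config user.email "example@example.com"

# Force the old default name to mimic a repository created with an older Git
git symbolic-ref HEAD refs/heads/master
git commit --quiet --allow-empty -m "Initial commit"

# Rename the branch that is currently checked out
git branch -m main

current_branch="$(git symbolic-ref --short HEAD)"
echo "Current branch: $current_branch"
```

This only renames the local branch; a copy that was already pushed to a remote keeps its old name.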
---
**Question:** I tried to install the latest version of Git, but when I check the version on my computer it still shows the old version. How can I solve this?
**Answer:** Install Git from [this site](https://git-scm.com/downloads).
---
**Question:** I am running the Ubuntu subsystem in Windows and have trouble accessing the display with text editors etc. I have been trying to fix this but have failed to solve it.
**Answer:** This can be solved by installing an X server via a tool such as VcXsrv, *e.g.* following the instructions from [this website](https://seanthegeek.net/234/graphical-linux-applications-bash-ubuntu-windows/) (you will need to scroll down to "Graphical Applications"). You might need to adjust the bash command that configures bash to use the local X server (let us know if you can't fix this yourself).
Another option is to use the tool MobaXterm for the R Markdown tutorial which provides a bash shell and a local X server (https://mobaxterm.mobatek.net/download-home-edition.html). In MobaXterm, click "Session" and "WSL2" (to the right), and choose "Ubuntu" as default terminal. To start RStudio from the command line, open MobaXterm and a new Ubuntu terminal, move to your workshop directory, activate the conda environment (`conda activate rmarkdown-env`) and start RStudio (`rstudio &`). We didn't test MobaXterm for any of the other tutorials though.
In case you just want access to the files via Windows File Explorer, from where you can open them in graphical applications, and the installation of the X server is not working, you can also open a File Explorer window for your current directory in the Ubuntu app by typing `explorer.exe .`
---
**Question:** I'm trying to run the first command from the Docker tutorial on the Ubuntu app (WSL) on Windows: `docker pull ubuntu:latest` but I get an error message saying that docker cannot be found. How can I solve this?
**Answer:** This indicates that the Docker app for Windows cannot find your WSL subsystem. It could be that you installed the wrong WSL version by accident (WSL1 instead of WSL2). You can find out if this is the case by opening the Docker app, and clicking on "Settings", "Resources", and "WSL INTEGRATION". If integration with the Ubuntu app is not enabled you can follow these instructions:
- Close the Ubuntu app
- Open a Windows command prompt (search for “cmd” in Windows task bar)
- Type `wsl --list --all` to find your Linux app
- Type `wsl --set-version <Linux app> 2`
- In the Docker app, go to “Settings”, “Resources”, and “WSL INTEGRATION”. Enable integration with the Ubuntu app, and click “Apply & Restart”
- Restart the Ubuntu app
---
**Question:** I am running Docker on Windows (with WSL integration) but when I try to run Docker on the Ubuntu app, it says that it cannot connect to the Docker daemon and asks if the docker daemon is running.
**Answer:** You can try to follow these suggestions: https://stackoverflow.com/questions/40459280/docker-cannot-start-on-windows Specifically, try to run the Docker daemon as elevated user.
---
# Day 1
**Question:** Is it possible to unstage staged files while keeping the changes in them?
**Answer:** Yes, you can either use `git reset <file>` or `git restore --staged <file>`. If you use `git restore <file>` you will also discard any changes done to it.
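A small sketch of the unstaging step, run in a throwaway repository. The file name `notes.txt` and the placeholder identity are made up for this illustration, and `git restore` assumes Git 2.23 or later.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"            # placeholder identity, local to this repo
git config user.email "example@example.com"

echo "first version" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# Modify the file and stage the change
echo "second version" >> notes.txt
git add notes.txt

# Unstage it again; the edit itself stays in the working tree
git restore --staged notes.txt

staged="$(git diff --cached --name-only)"      # files with staged changes
unstaged="$(git diff --name-only)"             # files with unstaged changes
echo "staged: '$staged', unstaged: '$unstaged'"
```

After the last step nothing is staged, while `notes.txt` still contains the second line.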
---
**Question:** What is the difference between the commands `git log --graph --all --oneline` and `git log --oneline --decorate`?
**Answer:** The first command adds two things. `--graph` draws a text-based graph of the commit history at the left-hand side of the log (if you have no branches or previous branch merges this will most likely just look like a bunch of `*` in a vertical line), and `--all` shows the commits of all branches and tags rather than only those reachable from the current commit. `--graph` does not shorten the entries on its own, so `--oneline` is still what gives you one line per commit. The flag `--decorate` makes `git log` display which branches or tags point to each commit; recent Git versions already do this by default when printing to a terminal (read more about it [here](https://www.atlassian.com/git/tutorials/git-log)).
---
**Question:** If you are planning to create a branch from a specific tag, *e.g.* from the tag `submission1`, how do you do that in practice?
**Answer:** You would check out the tagged commit, *i.e.* `git checkout submission1`, then create a new branch, *e.g.* `git branch revision`, and move to the new branch by typing `git checkout revision`.
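The same result can also be reached in a single step with `git checkout -b <branch> <tag>`. The sketch below tries this in a throwaway repository; the empty commits and the placeholder identity are assumptions made purely for illustration.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"            # placeholder identity, local to this repo
git config user.email "example@example.com"

git commit --quiet --allow-empty -m "Submitted version"
git tag submission1
git commit --quiet --allow-empty -m "Work done after submission"

# Create 'revision' at the tagged commit and switch to it in one step
git checkout --quiet -b revision submission1

tagged_commit="$(git rev-parse 'submission1^{commit}')"
revision_commit="$(git rev-parse revision)"
current_branch="$(git symbolic-ref --short HEAD)"
echo "On branch $current_branch at $revision_commit"
```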
---
**Question:** I think the command `git branch --set-upstream-to origin main` in the green tip box under "Working with remote repositories" should be `git branch --set-upstream-to=origin/main main` and I think this will also be necessary if we want to use only the command `git pull` on its own. If we didn't change the `--set-upstream-to` would we have to use `git pull origin master` in the same way as push?
**Answer:** We will check the commands, thanks for the notification! And yes, if you didn't change you'd have to use the full `git pull origin master` all the time. UPDATE: you just need `git branch --set-upstream-to origin/main`.
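To see the effect of `--set-upstream-to` without touching GitHub, here is a hedged sketch that uses a local bare repository as a stand-in for the remote. The temporary directories and the placeholder identity are assumptions for illustration only.

```bash
# A local bare repository plays the role of the remote
remote="$(mktemp -d)"
git init --quiet --bare "$remote"

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"            # placeholder identity, local to this repo
git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/main          # name the initial branch 'main' on any Git version
git commit --quiet --allow-empty -m "Initial commit"

git remote add origin "$remote"
git push --quiet origin main

# Tell Git which remote branch the local 'main' should track
git branch --quiet --set-upstream-to origin/main

upstream="$(git rev-parse --abbrev-ref 'main@{upstream}')"
echo "main tracks $upstream"

# A bare `git pull` now knows where to fetch from
git pull --quiet
```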
---
**Question:** Is it possible to see deleted branches somewhere?
**Answer:** If you have accidentally deleted a branch you can use `git reflog` to find it: this is a log that keeps track of every change to `HEAD`, such as commits and checkouts. You would then use `git checkout -b <branch> <sha>`, where `<sha>` is the hash shown in the reflog for the last commit you made on the deleted branch. Other than that you can't restore a deleted branch, so be careful!
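A sketch of such a recovery in a throwaway repository. The branch name `experiment`, the commit messages and the placeholder identity are hypothetical; in a real case you would read the hash from the `git reflog` output yourself rather than grep for it.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"            # placeholder identity, local to this repo
git config user.email "example@example.com"
git commit --quiet --allow-empty -m "Initial commit"
base_branch="$(git symbolic-ref --short HEAD)"

# Do some work on a branch, then delete the branch "by accident"
git checkout --quiet -b experiment
git commit --quiet --allow-empty -m "Work on experiment"
lost_sha="$(git rev-parse HEAD)"
git checkout --quiet "$base_branch"
git branch --quiet -D experiment

# Look up the lost commit in the reflog (full hash followed by the reflog message)
found_sha="$(git reflog show --format='%H %gs' | grep 'commit: Work on experiment' | head -n 1 | cut -d' ' -f1)"

# Recreate the branch at that commit
git checkout --quiet -b experiment "$found_sha"
restored_sha="$(git rev-parse experiment)"
echo "experiment restored at $restored_sha"
```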
---
**Question:** I received an email from GitHub to switch to 2-factor authentication because the basic authentication using a password to Git will soon no longer work. However, after I have done this, I tried cloning with git but it still asked for a password. When I entered my previous password or the 6-digit pin, it says "Invalid username or password"
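**Answer:** There are two general ways to interact with remote repositories: HTTPS or SSH. The default is HTTPS using username/password, while SSH uses SSH keys. Once two-factor authentication is enabled, Git over HTTPS no longer accepts your account password or the 6-digit code; you need to enter a personal access token (created in your GitHub settings) in place of the password, or switch to SSH. You can read more about this in the [extra material](https://nbis-reproducible-research.readthedocs.io/en/devel/git/#remote-connections-with-ssh).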
---
**Question:** How can I update my Git to the latest version?
**Answer:** You can install Git from [this site](https://git-scm.com/downloads). Here is a [longer tutorial](https://confluence.atlassian.com/bitbucketserver/installing-and-upgrading-git-776640906.html) for the installation and upgrade of Git.
---
**Question:** I cannot restore a deleted branch but can I restore a deleted file?
**Answer:** You can do both using the `reflog`, see the earlier answer above ("is it possible to see deleted branches somewhere?").
---
**Question:** How can I rename a tag?
**Answer:** You can do the following to rename a tag `old` to `new`: `git tag new old` and then `git tag -d old`.
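The two commands can be tested in a throwaway repository, as in this minimal sketch. It assumes a lightweight tag and a placeholder identity; if the old tag has already been pushed, the remote would need to be updated separately.

```bash
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"            # placeholder identity, local to this repo
git config user.email "example@example.com"
git commit --quiet --allow-empty -m "Initial commit"

git tag old
tagged_commit="$(git rev-parse old)"

git tag new old        # create 'new' pointing at the same commit as 'old'
git tag -d old         # delete the old name

tags="$(git tag --list)"
new_commit="$(git rev-parse new)"
echo "Remaining tags: $tags"
```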
---
**Question:** I used `git branch -m main`. It works in my local but when pushed to GitHub, it is still called master. How can I change even the one pushed to GitHub?
**Answer:** You have now renamed your local branch, but not the remote branch. You can simply push the new branch to the remote (`git push origin -u main`) and then delete the old branch (`git push origin --delete master`). You might have to change the default branch in the GitHub settings (in your browser) before you delete the old `master` branch.
---
**Question:** I am doing the conflict section of the tutorial and pushed the commit from the remote tutorial. It went successfully without error. I don't know what I did wrong that it did not detect the conflict.
**Answer:** It is very hard to know exactly what went wrong with just this description, but ask in one of the exercise sessions and your teacher might be able to give more detailed help.
# Day 2
**Question:** Why are new packages being installed when I remove an installed package, *e.g.* during `conda remove sra-tools`?
**Answer:** Removing one package also removes packages that depend on it, and Conda may then install other packages, or other versions of the remaining packages, to fit the new dependency tree. For example, if an environment has packages A, B, C and D, and packages B and C depend on D, the removal of C may lead to an update of package D in the case where package C required an earlier version of D than B does. Now that C is gone, Conda may install a later version of D that still satisfies the dependency of B. You can, however, use `--force` to remove a package without also removing packages that depend on it.
---
**Question:** What is the problem with using tabs instead of spaces in Snakemake?
**Answer:** Snakemake is, like the Python language it is built on top of, indentation-sensitive. This means that indentation matters, and there is a difference between spaces and tabs for indentation - you cannot mix them. Depending on your editor-of-choice it might automatically change one into the other, removing this problem for you - a lot of editors will, for example, put 4 spaces when you press the TAB button.
---
**Question:** I got a bit stuck on producing the logfiles in snakemake. Could someone share some example code? I think it may have to do with wildcard naming for bowtie for example.
**Answer:** There are many different ways to do this, but the important thing to remember is to use the same wildcards for the logfiles as for the input/output. The following example uses `{sample}` as a wildcard:
```python=
log:
    "results/logs/{sample}.log"
```
---
**Tip:** Rename your `snakefile_mrsa` to `snakefile_mrsa.smk`. This tells your text editor that it is a Snakemake file (which uses Python syntax), and you will get syntax highlighting if your editor is set up for that.
---
# Day 3
**Question:** Regarding R Markdown, I am struggling with figuring out, where to add the `fig.cap = "..."` in the `Counts barplot` chunk, no matter what I try, it never shows up in the rendered version.
**Answer:** It should work if you add the `fig.cap = "..."` to the header like this: `{r Counts barplot, fig.cap = "Counting statistics per sample, in terms of read counts for genes and reads not counted for various reasons."}`. It is important that there is no linebreak in the header, i.e. everything has to be on one line to render correctly. Let us know if this still doesn't work and we can have a look at your code.
---
**Question:** I would like to experiment with Rstudio 1.4.1106 due to the new R Markdown features but it does not seem possible to create a conda environment with this version of Rstudio (appears that none of the channels host this version).
I am able to use Rstudio 1.4.1106 in conjunction with the rmarkdown-env conda environment if I remove "-rstudio=1.1.456" and have 1.4.1106 installed in base environment, but I presume that the Rstudio version is now no longer defined within the rmarkdown-env conda environment. Are there any work-arounds that would enable me to track this new Rstudio version through conda? Thanks
**Answer:** If it is not available in any of the channels, there is unfortunately no possibility to install this version through conda (yet). I would suggest to check in a few weeks/months from now if somebody created a conda package for this Rstudio version. I am wondering though what you mean that you were able to install it to the base environment, because this should not be possible. Can you explain a bit more in detail how you installed it there?
**Additional answer:** There is nothing that inherently makes you _have_ to install RStudio with Conda, it's just that RStudio is not very good at using Conda environments if not installed inside one (and at times not even then), simply because of the way RStudio sets its library paths. So you can simply try to use the latest version of RStudio through a normal, non-Conda installation and hope for the best - sometimes it just works, and sometimes it doesn't. There are also other workarounds where you manually reset a bunch of library paths, which can also work at times.
**Response to answers:** Re. the installation on Conda base: I think I misunderstood what was going on - I think I may have actually installed it outside of Conda i.e. did not use any "conda install" functions. Now it seems that Rstudio 1.4 could be tracked within Docker instead of Conda.
**Response again:** Yes, you can install using a container instead, though I have never tried using RStudio from inside a container. Let us know how it works out!
---
**Question:** For rendering the Jupyter notebook, we need to put an exclamation mark before the command, correct? (I get a syntax error otherwise): `!jupyter nbconvert --to html Untitled.ipynb`
**Answer:** If you want to execute it from within a Jupyter notebook, this is correct! If you type in this command on the command line in the terminal, you don't need the exclamation mark (`jupyter nbconvert --to html Untitled.ipynb`).
---
# Day 4
**Question:** The command `FROM my_docker_image:latest` yields `FROM: can't read /var/mail/my_docker_image:latest`. Even if I see `my_docker_image` with the command `docker image ls`.
**Answer:** You need to create the file `Dockerfile_conda` first and write the command into the file, along with the other commands. If you are unsure how to do that, you can just copy the commands from "Click to see an example of Dockerfile_conda" into the file. Save the file and build the Docker image from the file with the command `docker build -t my_docker_conda -f Dockerfile_conda .`, giving it the name `my_docker_conda`.
---
**Question**: How long does the `docker build -f Dockerfile_slim -t my_docker_image .` command take usually? It's already running half an hour for me, but I think my internet is very slow right now and I also made the mistake of setting up the workshop directory on my external hard drive.
**Answer:** This is probably a combination of slow internet and running on an external HD, since the download and the HD each add their own I/O delays. Half an hour does seem like a long time, but it doesn't necessarily mean that anything is wrong.
---
**Question:** During the last part of docker tutorial (building my_docker_project from Dockerfile) I get the following error:
```bash
[7/9] RUN conda config --add channels bioconda && conda config --add channels conda-forge && conda env update -n base -f environment.yml && conda clean --all:
#12 1.514 Collecting package metadata (repodata.json): ...working... done
#12 54.37 Solving environment: ...working... /bin/bash: line 1: 13 Killed conda env update -n base -f environment.yml
```
and it stops. Anyone else had the same error?
**Answer:** This is probably because you've run out of RAM. You need to go into your Docker settings and increase the memory limit (the default is usually 2 GB). A `Killed` message like this usually means that the memory ran out and the process was killed as a result.
---
**Question:** When I run the command `docker run --rm -v $(pwd)/fastqc_results:/course/results/fastqc my_docker_conda`, I get the error `docker: invalid reference format. See 'docker run --help'.`
**Answer:** In what OS are you running this command? If you are running in *e.g.* PowerShell you need to use `${pwd}` instead of `$(pwd)`. You can also try `$PWD` instead.
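Separately from the PowerShell issue, a working directory whose path contains a space is another possible cause of this kind of error in bash, because an unquoted `$(pwd)` gets split into several arguments. Whether this applies to your setup is an assumption on our side. The sketch below uses a hypothetical directory name and a helper function, `count_args`, that exists only for this illustration, to show the effect without running Docker.

```bash
# Hypothetical working directory with a space in its name
workdir="$(mktemp -d)/my project"
mkdir -p "$workdir"
cd "$workdir"

# Helper for this illustration only: prints how many arguments it received
count_args() { echo "$#"; }

# Unquoted: the shell splits the path at the space, giving one argument too many
unquoted_count="$(count_args -v $(pwd)/fastqc_results:/course/results/fastqc)"

# Quoted: the whole bind mount stays a single argument
quoted_count="$(count_args -v "$(pwd)/fastqc_results:/course/results/fastqc")"

echo "unquoted: $unquoted_count arguments, quoted: $quoted_count arguments"

# With quoting, the command from the question would read:
# docker run --rm -v "$(pwd)/fastqc_results:/course/results/fastqc" my_docker_conda
```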
# Day 5
**Question:** May we ask more advice / questions about reproducibility at the NBIS drop-in?
**Answer:** Yes, you're very welcome to come and talk to us about reproducibility!
---
**Question:** Earlier during the week, it was mentioned that it is possible to run R Markdown documents also somehow cell-by-cell or to execute only part of the R Markdown document (in a similar way as Jupyter notebooks). How exactly would this work in practice?
**Answer:** You can always run individual chunks in *e.g.* RStudio or in a REPL window connected to your code, which will then output in the RStudio terminal/REPL window. This does not render the RMarkdown document, however, but it's nevertheless a really good tool for data exploration and experimentation. You can run individual chunks to test things out and render the full document once you're satisfied with the result(s).
---
**Question:** I have encountered error (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!) after running the snakemake file. How can I solve it?
**Answer:** This error message tells you that one of the bash commands went wrong, so you will need to go back to your "shell" directive in the rule that failed to find out what went wrong.
**Question:** Follow up on the previous question, if the dry run does not show any error message, is the problem still in the shell directive?
**Answer:** Yes, it can be: a dry run only shows which jobs would be executed and does not run the shell commands, so it cannot reveal errors inside them. Bash strict mode means that any error will stop execution of the subsequent code. You can read more about it at [this website](http://redsymbol.net/articles/unofficial-bash-strict-mode/). One problem with strict mode is that some commands yield a non-zero exit status even when they succeed, such as grep (*e.g.* yielding an exit status of `1` if you grep for something that doesn't exist in the file, even if that is a valid result and not an error *per se*). One way to work around this is to use `grep <pattern> <file> || true`, which overrides any non-zero exit status of grep.
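The grep behaviour can be reproduced outside Snakemake. The following is a minimal sketch with a made-up input file and search pattern; it runs the same two commands under `set -euo pipefail` in a child bash process, once without and once with the `|| true` guard.

```bash
# Small input file that does not contain the pattern we search for
sample_file="$(mktemp)"
printf 'alpha\nbeta\n' > "$sample_file"

# Without a guard: grep exits with status 1 (no match), strict mode aborts,
# and the echo is never reached
status_plain=0
bash -c 'set -euo pipefail; grep gamma "$1"; echo "still running"' _ "$sample_file" || status_plain=$?

# With `|| true`: the "no match" status is turned into success and execution continues
status_guarded=0
guarded_output="$(bash -c 'set -euo pipefail; grep gamma "$1" || true; echo "still running"' _ "$sample_file")" || status_guarded=$?

echo "without guard: exit $status_plain, with guard: exit $status_guarded"
```

Keep in mind that `|| true` also hides genuine grep failures, such as a missing input file, so it is best used only where "no match" is an expected outcome.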
---
**Question:** I am trying to run my R scripts through Snakemake. I managed to import .csv and .rds files, but got stuck on saving images.
My rscript looks like this:
`savePlot(file = snakemake@output[["fig1a"]], type = "png")`
and my snakefile
`output:
fig1a = "results/figures/fig1a"
script:
"code/fig1.R" `
I have tried naming the output in the Snakefile "results/figures/fig1a.png" but that didn't work either. If this looks right, then it might be the conda environment that is causing problems, but then at least I know this code is right.