---
tags: Miscellaneous
---
# Bioinformatics tips
Here is a highly subjective list of tools and resources that I've found useful in my genomics-focused bioinformatics work so far.
## Resources
### General
- [O'Reilly learning](http://go.oreilly.com/uppsala-university 'Link to log in with UU account info')
Resources marked with * are available in O'Reilly learning.
#### Bioinformatics
- [Bioinformatics Data Skills](https://learning.oreilly.com/library/view/bioinformatics-data-skills/9781449367480/ "Link to book's first page") by Vince Buffalo*
This book is some years old, but much of its content is still relevant.
#### Command line
- [Data Science at the Command Line](https://datascienceatthecommandline.com/2e/ "Link to book's first page")
### Python
#### General
- [Python Fundamentals](https://learning.oreilly.com/videos/python-fundamentals/9780135917411/ 'Link to video course first page') by Paul J. Deitel*
- [Tiny Python Projects](https://github.com/kyclark/tiny_python_projects 'Link to GitHub repo')*
#### Bioinformatics related
- [Mastering Python for Bioinformatics](https://github.com/kyclark/biofx_python 'Link to GitHub repo')*
- [Biostar Handbook](https://www.biostarhandbook.com/) (this costs about $25 per 6 months for students)
## Tools
### Data visualisation
#### Static
- R: [ggplot2: Elegant Graphics for Data Analysis](https://ggplot2-book.org/ 'Link to an online book')
- Python: [Plotnine](https://plotnine.readthedocs.io/en/stable/ 'Link to homepage')
#### Interactive
##### General
- [Plotly](https://plotly.com/graphing-libraries/)
##### Python
- [Bokeh](https://docs.bokeh.org/en/latest/)
### Managing computational environments
1. [Conda](https://docs.conda.io/en/latest/ 'This is absolutely the easiest way')
2. [Singularity](https://sylabs.io/guides/latest/user-guide/)
   - [Singularity tutorial](https://singularity-tutorial.github.io/)
3. [Docker](https://www.docker.com/)
4. [renv](https://rstudio.github.io/renv/articles/renv.html 'This is used for R package management')
General recommendation about computational environments: *try to avoid installing stuff globally unless you have to*. Each new project should have its own isolated computational environment.
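
As a minimal sketch of the one-environment-per-project idea, the snippet below uses Python's standard-library `venv` module instead of Conda. It is swapped in here only because it needs no extra installation. The project folder name `my_project` and the `.venv` directory name are hypothetical choices.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"  # hypothetical folder name, a common convention
    # with_pip=False keeps this sketch fast and offline; real projects usually want pip
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    # A temporary directory stands in for a real project folder
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_project_env(Path(tmp) / "my_project")
        # pyvenv.cfg marks the directory as a virtual environment
        print((env_dir / "pyvenv.cfg").exists())
```

Conda follows the same pattern of one named environment per project, and it can also manage non-Python tools.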
### Generally useful in genomics
- [Sed](https://www.grymoire.com/Unix/Sed.html)
- Awk
  - [Awk](https://www.grymoire.com/Unix/Awk.html)
  - [To awk or not](https://pmitev.github.io/to-awk-or-not/ 'UPPMAX resource')
Both Sed and Awk have been around for many years but can still be handy at times.
- Perl
Perl is also an old-timer, and you can get along completely fine with Python instead. Perl offers nice one-liners on the CLI that can replace Sed and Awk well.
- [Coderefinery](https://coderefinery.org/)
- Command line tools on *nix machines
  - These are good to get to know.
  - Examples: `cat`, `zcat`, `grep`, `zgrep`, `zegrep`, `less`, `zless`, `diff`, `zdiff`
  - [GNU parallel](https://www.gnu.org/software/parallel/parallel_tutorial.html) and its less powerful predecessor xargs are both really handy at times.
- Make
Using Make the [simple way](https://github.com/kyclark/biofx_python/blob/main/01_dna/Makefile 'Here is a simple example') can get you quite far.
- Bash
Bash is widely used, so it's good to know too.
- Workflow tools
  - [Snakemake](https://snakemake.readthedocs.io/en/stable/)
  - [Nextflow](https://www.nextflow.io/)
- Computational notebooks
  - [Rmarkdown](https://rmarkdown.rstudio.com/)
    This is really good for any type of report (HTML or PDF) that you need to hand in.
  - [Jupyter notebook/JupyterLab](https://jupyter.org/)
- Version control: Git and GitHub
It's good to start using Git and GitHub to build a portfolio of what you've worked on.
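
Since Python can stand in for Sed, Awk, and Perl one-liners, here is a rough sketch of what that might look like using only the standard library. The file name `example.bed.gz`, the `chr` prefix rule, and the three-column layout (name, start, end) are made-up examples, not taken from any particular dataset.

```python
import gzip
import re
import tempfile
from pathlib import Path


def strip_chr_prefix(line: str) -> str:
    """Roughly `sed 's/^chr//'`: drop a leading 'chr' from a line."""
    return re.sub(r"^chr", "", line)


def long_intervals(lines, min_length: int):
    """Roughly `awk -F'\\t' '$3 - $2 >= N {print $1}'` on tab-separated lines."""
    for line in lines:
        name, start, end = line.rstrip("\n").split("\t")[:3]
        if int(end) - int(start) >= min_length:
            yield name


def read_gzipped_lines(path: Path):
    """Roughly `zcat`: return the decompressed text lines of a gzipped file."""
    with gzip.open(path, "rt") as handle:
        return handle.read().splitlines()


if __name__ == "__main__":
    # Hypothetical BED-like records: chromosome, start, end
    records = ["chr1\t100\t250", "chr2\t300\t320", "chrX\t10\t500"]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.bed.gz"
        with gzip.open(path, "wt") as handle:
            handle.write("\n".join(records) + "\n")
        lines = [strip_chr_prefix(line) for line in read_gzipped_lines(path)]
    print(list(long_intervals(lines, min_length=100)))  # prints ['1', 'X']
```

For quick one-off jobs the shell tools are usually shorter to type. A script like this tends to pay off once the logic needs tests or grows beyond a single line.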
### General tools
- [Typora Markdown editor](https://typora.io/)