###### tags: `Server installation and Bioinformatics`

# Installation and use of useful sequencing programs

## Oracle VM

1. Download vBox and the vBox extension pack (USB 3.0) here:
   - https://www.oracle.com/technetwork/server-storage/virtualbox/downloads/index.html#vbox
   - https://www.oracle.com/technetwork/server-storage/virtualbox/downloads/index.html#extpack
2. install the vBox package as admin
3. install the extension pack as admin
4. start vBox
5. Download UBUNTU 16.04 (64-bit) as an *.iso file
6. press NEW and give the VM a name (e.g. UBUNTU16)
7. type is Linux
8. operating system is UBUNTU (64-bit)
9. Follow the instructions (CPU 4 cores, drive 500 GB, RAM 8 GB)
10. start the VM and vBox will ask you for the ISO file
11. Navigate to the ISO location
12. press NEXT
13. follow the instructions

## UBUNTU

1. set the proxy in the network settings under UBUNTU

OR

1. open a terminal
2. type in
```
$ sudo bash
$ cd /etc/apt/
$ gedit apt.conf

## copy these two lines in apt.conf :
Acquire::http::proxy "http://proxy.clondiag.jena:8080/";
Acquire::https::proxy "https://proxy.clondiag.jena:8080/";
##
```
3. save apt.conf
4. restart the machine!

## Useful commands for installation

### apt-* commands

#### apt-get update
- refresh the package lists (run this before upgrading or installing)

#### apt-get upgrade
- install newer versions of the installed packages

#### apt-get remove
- uninstall a program

#### apt-get purge
- uninstall a program and all data related to it

#### apt-cache search
- search for programs in the apt repositories

### dpkg

Install packages
```
dpkg -i
```
List installed Debian packages
```
dpkg -l | grep -i "name"
```
- search for packages installed via dpkg on the system

## Graphmap

1. goto https://github.com/isovic/graphmap
2. open a console
3. goto the directory where you want to install the program
4. type in the console
```
cd
mkdir SeqTools
cd SeqTools
git clone https://github.com/isovic/graphmap.git
cd graphmap
make modules
make
```
to install the graphmap binary to /usr/bin
```
sudo make install
```

## MiniMap2

```
cd
mkdir SeqTools (NOTE: if it is not created yet)
cd SeqTools
git clone https://github.com/lh3/minimap2
cd minimap2
make
cd /usr/bin
sudo ln -s /home/sascha/SeqTools/minimap2/minimap2 minimap2
```

## Libxml2

`apt-get install libxml2-dev`

NOTE: https://replikation.github.io/bioinformatics_side/R/R/

## R-base (language) and R-Studio (environment) for Linux Mint (Xenial)

### R language

```
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
sudo add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu bionic-cran40/'
sudo apt-get update
sudo apt-get install r-base
```

Get R-Studio from https://www.rstudio.com/products/rstudio/download/#download

NOTE: If you want to code in the terminal, type R. You are then in an R environment ("console"); q() exits R in the terminal. But use R-Studio for scripting, not the terminal.

### packages in R

1. goto the console
2. type in
```
$ R
```
3. This will start R on the command line

Packages are organized in repositories: CRAN, Bioconductor, R-forge, Github or Googlecode.

Installing a package in R and setting the source location for the first time:

1. establish Bioconductor as a source (first time only)
```
source("https://bioconductor.org/biocLite.R")
```
2. install the package methylKit via biocLite
```
biocLite("methylKit")
```

CRAN: install from CRAN
```
install.packages("fortunes")
```

#### Using CRAN
```
> install.packages("packageName")
```

#### Using Bioconductor
```
> source("https://bioconductor.org/biocLite.R")
> biocLite("packageName")
```
e.g. NOISeq, Rsamtools, Repitools, rtracklayer (packages not available @CRAN)
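The same installs can also be scripted from the shell without opening an interactive R session; a minimal sketch (package and repository names are only examples, and BiocManager is assumed for newer R releases that replaced biocLite):

```
# Install a CRAN package non-interactively from the shell (assumes R is on PATH)
Rscript -e 'install.packages("fortunes", repos = "https://cran.rstudio.com")'

# For R >= 3.5, Bioconductor packages are installed via BiocManager instead of biocLite
Rscript -e 'install.packages("BiocManager", repos = "https://cran.rstudio.com"); BiocManager::install("methylKit")'
```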
#### Upgrading R on Ubuntu 18.04 and resolving ImportError in `add-apt-repository`

Upgrading R on Ubuntu 18.04 can be hindered by an `ImportError` in the `add-apt-repository` command, related to the Python GI module. Follow these steps to resolve the error and upgrade R:

1. **Reinstall the Python GI module:**
   - Command: `sudo apt-get install --reinstall python3-gi`
   - Purpose: Fixes issues with the Python GObject Introspection module, which is crucial for `add-apt-repository`.
2. **Install dependencies:**
   - Command: `sudo apt-get install libgirepository1.0-dev gcc python3-dev`
   - Purpose: Ensures all dependencies for the GI module are installed.
3. **Verify the Python version:**
   - Command: `python3 --version`
   - Purpose: Confirms the correct Python version is being used.
4. **Update and upgrade the system:**
   - Commands: `sudo apt-get update` and `sudo apt-get upgrade`
   - Purpose: Keeps system packages updated, potentially resolving package conflicts.
5. **Manually add the R repository:**
   - Method: Add `deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/` to `/etc/apt/sources.list`
   - Purpose: Bypasses `add-apt-repository` if it's not functioning. Ensures access to the latest R versions.
6. **Upgrade R:**
   - Commands: `sudo apt update` followed by `sudo apt install --upgrade r-base`
   - Purpose: Installs the latest version of R.

## MinIONQC.R

https://github.com/roblanf/minion_qc

```
wget https://raw.githubusercontent.com/roblanf/minion_qc/master/MinIONQC.R -O MinIONQC.R
```

Dependencies: to run the script you will need a recent version of R and the following packages. To install the right packages, just start up R and copy/paste the code below.

1. goto the console
2. type in
```
$ R
```
3. This will start R on the command line
```
>install.packages(c("data.table", "futile.logger", "ggplot2", "optparse", "plyr", "readr", "reshape2", "scales", "viridis", "yaml"))
```

One example
```
Rscript MinIONQC.R -i example_input_minion -o my_example_output_minion -p 2
```

## samtools, bcftools, htslib

1. goto http://www.htslib.org/download/
2. download the latest versions of samtools-X.X, bcftools-X.X, htslib-X.X
3. create a directory samtools
4. copy all *.tar.bz2 files into samtools
5. unzip all files
6. make three folders for samtools, bcftools, htslib
7. goto the console
```
$ cd /go/to/unziped/directory
## e.g. /home/sascha/data/program/samtools/samtools-1.9
## install into the folder of point 6.
$ ./configure --prefix=/home/sascha/data/program/samtools/samtools
$ make
$ make install
```

The executable programs will be installed to a bin subdirectory under your specified prefix, so you may wish to add this directory to your $PATH:

```
export PATH=/home/sascha/data/program/samtools/samtools/bin:$PATH    # for sh or bash users
```
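Once the bin directory is on the PATH, a quick way to confirm the build works is to run the tools on a small alignment; a hedged example with placeholder file names:

```
# Sanity check of the freshly built tools (aln.bam is a placeholder file)
samtools --version
bcftools --version
samtools sort -@ 4 -o aln.sorted.bam aln.bam   # coordinate-sort an existing BAM
samtools index aln.sorted.bam                  # index it for random access
```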
## canu

1. Download versions 1.7.1 and 1.8 of Canu here: https://github.com/marbl/canu/releases
2. unzip the folders to the folder SeqTools
3. rename the folders to a) canu-1.7 and b) canu-1.8
4. all executable files are under a) /SeqTools/canu-1.7/Linux-amd64/bin or b) /SeqTools/canu-1.8/Linux-amd64/bin
5. no installation is necessary
6. create a softlink in /usr/bin for both versions

### canu 1.7

goto /usr/bin
```
cd
sudo bash
cd /usr/bin
ln -s /home/sascha/SeqTools/canu-1.7/Linux-amd64/bin/canu canu-1.7
```

### canu 1.8

```
cd
sudo bash
cd /usr/bin
ln -s /home/sascha/SeqTools/canu-1.8/Linux-amd64/bin/canu canu-1.8
```

## FLYE

### Installation

To install the Flye package into your system, run:
```
git clone https://github.com/fenderglass/Flye
cd Flye
python setup.py install
```

Depending on your OS, you might need to add --user or --prefix options to the install command for a local installation.

After installation, Flye can be invoked via:
```
flye
```

## Unicycler

https://github.com/rrwick/Unicycler

## QUAST

1. Download QUAST - https://sourceforge.net/projects/quast/
2. Extract quast-5.0.0.tar.gz to a folder of choice
3. The folder quast-5.0.0 will be generated
4. Navigate in the shell to quast-5.0.0.

Requires:
1. Python2 (2.5 or higher) or Python3 (3.3 or higher)
2. GCC 4.7 or higher
3. Perl 5.6.0 or higher
4. GNU make and ar
5. zlib development files

Basic installation (about 120 MB): `sudo ./setup.py install`

or

Full installation (about 540 MB, additionally includes (1) tools for SV detection based on read pairs, which is used for more precise misassembly detection, and (2) tools/data for reference genome detection in metagenomic datasets): `sudo ./setup.py install_full`

Example:
```
quast.py -t 4 -o ~/run0002/quast_canu_assembly -R ~/run0002/Reference_genomes/PSS_728a/PSS728a.fna ~/run0002/canu/BC02_PSS_IPC/BC02_PSS_IPC.contigs.fasta
```

## LAST

## NanoPack

1. installation with --user -U
2. update python3 pip
```
pip3 install --upgrade pip
```
3. install the tools used by nanopack
```
pip3 install --user -U setuptools
pip3 install --user -U numpy
pip3 install --user -U mappy
```
4. install nanopack
```
pip3 install --user -U nanopack
```
5. Verify that the installed packages have compatible dependencies
```
pip3 check
```
6. Are dependencies missing?
```
pip3 search "type here the package you search for"
pip3 install "type here the package you will install"
```
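After installation the NanoPack tools are available as separate commands; a small illustrative QC run with NanoPlot (file and folder names are assumptions):

```
# QC report for a gzipped fastq of ONT reads, written to nanoplot_out/
NanoPlot --fastq reads.fastq.gz -o nanoplot_out -t 8
```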
## Glances

Glances is a system monitor for the command line. Compared to the classics top and htop, it offers, in addition to process information, supplementary real-time statistics on the file system, network, hardware components, etc.

```
$ sudo apt-get install glances
```

## snap - installation client (like apt-get)

### Behind a proxy

```
$ sudo nano /etc/environment

#copy these lines:
http_proxy="http://proxy.clondiag.jena:8080/"
https_proxy="https://proxy.clondiag.jena:8080/"
#save file

$ sudo apt install snapd
$ sudo systemctl edit snapd.service

#copy these lines:
[Service]
EnvironmentFile=/etc/environment
#save file

$ sudo systemctl daemon-reload
$ sudo systemctl restart snapd.service
```
- Snap is ready to use

## Notepad-plus-plus

```
$ snap install notepad-plus-plus
```

## MinKNOW for Linux

LINK to ONT: https://community.nanoporetech.com/protocols/experiment-companion-minknow/v/mke_1013_v1_revam_11apr2016/installing-minknow-on-linu

```
sudo bash
[sudo] password for sascha:
```
```
sudo apt-get update
sudo apt-get install wget
wget -O- https://mirror.oxfordnanoportal.com/apt/ont-repo.pub | sudo apt-key add -
echo "deb http://mirror.oxfordnanoportal.com/apt xenial-stable non-free" | sudo tee /etc/apt/sources.list.d/nanoporetech.sources.list
```

## Basecaller

### Bonito basecaller

#### Install bonito via miniconda
```
conda create -n bonito python=3.8
conda activate bonito
pip install ont-bonito
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
pip install numpy==1.17.5
```

#### Models
```
bonito download --models
```

#### pyTorch Homepage
https://pytorch.org/get-started/locally/

#### Using
```
bonito basecaller dna_r9.4.1@v3.3 *.fast5 > nameoffile.fasta
```

### Guppy for Linux

LINK to ONT: https://community.nanoporetech.com/protocols/Guppy-protocol-preRev/v/gpb_2003_v1_revg_14dec2018/linux-guppy

Add Oxford Nanopore Technologies' .deb repository to your system (this is to install Oxford Nanopore Technologies-specific dependency packages):
```
sudo bash
[sudo] password for sascha:
```
Copy and paste all commands into the terminal
```
sudo apt-get update
sudo apt-get install wget lsb-release
export PLATFORM=$(lsb_release -cs)
wget -O- https://mirror.oxfordnanoportal.com/apt/ont-repo.pub | sudo apt-key add -
echo "deb http://mirror.oxfordnanoportal.com/apt ${PLATFORM}-stable non-free" | sudo tee /etc/apt/sources.list.d/nanoporetech.sources.list
sudo apt-get update
```
To install the .deb for Guppy, use the following command (without brackets):
```
apt-get install ont-guppy-cpu
```

### Albacore for Linux

Add Oxford Nanopore's deb repository to your system (this is used to install Oxford Nanopore-specific dependency packages):
```
sudo apt-get update
sudo apt-get install wget
wget -O- https://mirror.oxfordnanoportal.com/apt/ont-repo.pub | sudo apt-key add -
echo "deb http://mirror.oxfordnanoportal.com/apt trusty-stable non-free" | sudo tee /etc/apt/sources.list.d/nanoporetech.sources.list
sudo apt-get update
```
Install the deb using dpkg:
```
sudo dpkg -i path/to/python3-ont-albacore-xxx.deb
```
This will report several errors because there are missing dependencies. Fix these errors using apt:
```
sudo apt-get -f install
```

## MAFFT

### installation
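One simple route on Ubuntu (an assumption; the distribution package may lag behind the release on the MAFFT website):

```
# Install MAFFT from the Ubuntu repositories and check the version
sudo apt-get install mafft
mafft --version
```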
### command

```
mafft --auto --adjustdirection --thread -1 path/to/*.fasta > path/to/out-fasta
```

#### Align oligos to an existing alignment

Note: The alignment file must already be aligned!
```
mafft --adjustdirection --addfragments oligos-file.fasta alignment-file.fasta > alignment_mafft.fasta
```

## Oracle Java

First, update the package index.
```
sudo apt-get update
```
Next, install Java. Specifically, this command will install the Java Runtime Environment (JRE).
```
sudo apt-get install default-jre
```
The JDK does contain the JRE, so there are no disadvantages if you install the JDK instead of the JRE, except for the larger file size. You can install the JDK with the following command:
```
sudo apt-get install default-jdk
```

### Installing the Oracle JDK

If you want to install the Oracle JDK, which is the official version distributed by Oracle, you will need to follow a few more steps. First, add Oracle's PPA, then update your package repository.
```
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
```
Then, depending on the version you want to install, execute one of the following commands:

Oracle JDK 8: this is the latest stable version of Java at the time of writing, and the recommended version to install. You can do so using the following command:
```
sudo apt-get install oracle-java8-installer
```

### Managing Java

There can be multiple Java installations on one server. You can configure which version is the default for use on the command line by using update-alternatives, which manages which symbolic links are used for different commands.
```
sudo update-alternatives --config java
```
The output will look something like the following. In this case, this is what the output will look like with all Java versions mentioned above installed.

Output: There are 5 choices for the alternative java (providing /usr/bin/java).

|Selection|Path|Priority|Status|
|--------|--------|--------|--------|
|*0|/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java|1081|auto mode|
|1|/usr/lib/jvm/java-8-oracle/jre/bin/java|3|manual mode|
|2|/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java|1081|manual mode|

Press enter to keep the current choice [*], or type a selection number.

You can now choose the number to use as a default. This can also be done for other Java commands.

## FastQC

Using apt-get installation
```
apt-get install fastqc
```
Note: Sometimes the HTML file is not created in the analysis folder. If so, use the following steps.

1. Download the zip file https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip
2. Unzip the file directly in "Downloads"
3. make the folder /etc/fastqc
```
sudo bash
cd /
cd /etc
mkdir fastqc
```
4. move all files from /Downloads/FastQC to /etc/fastqc
```
sudo bash
mv -v /home/sascha/Downloads/FastQC/* /etc/fastqc
```
5. Done

## Bowtie2

How to install bowtie2 on Ubuntu/Linux:

1. download page: https://sourceforge.net/projects/bowtie-bio/files/bowtie2/
2. create and go to the install directory
```
cd /home/SeqTools/bowtie2/
```
3. download the Ubuntu/Linux version
```
https://sourceforge.net/projects/bowtie-bio/files/latest/download
```
4. decompress the downloaded archive in /home/SeqTools/bowtie2/
5. add the location to the system PATH
```
export PATH=/home/SeqTools/bowtie2/:$PATH
```
6. check the installation
```
bowtie2 --help
```
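A short usage sketch once bowtie2 is on the PATH (index prefix and read file names are placeholders):

```
# Build an index from a reference and map paired-end reads to it
bowtie2-build reference.fasta ref_index
bowtie2 -x ref_index -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz -S aln.sam -p 4
```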
## BRIG

1. Download the latest version (BRIG-x.xx-dist.zip) from http://sourceforge.net/projects/brig/
2. Unzip BRIG-x.xx-dist.zip to a desired location
```
EXAMPLE: home/sascha/Seqtools/BRIG
```
3. Navigate to the unpacked BRIG folder in a command-line interface (terminal, console, command prompt).
4. Run `java -Xmx1500M -jar BRIG.jar`, where -Xmx specifies the amount of memory allocated to BRIG.
5. OR make a shell script BRIG.sh and copy
```
cd /home/sascha/SeqTools/BRIG/
java -Xmx1500M -jar BRIG.jar
```
into this file
6. make a symlink in /usr/bin/
```
cd /usr/bin
sudo ln -s /home/sascha/SeqTools/BRIG/BRIG.sh BRIG
```
7. Start BRIG with
```
BRIG
```

## IGV

1. Download the latest version from https://software.broadinstitute.org/software/igv/download
2. Unzip the file
3. copy the folder to Seqtools
4. rename it to IGV
5. make a soft link in /usr/bin/
```
sudo ln -s /home/sascha/Seqtools/IGV/igv.sh igv.sh
```
6. start IGV using *igv.sh* on the command line

## Mauve

1. Download the latest version of Mauve for your operating system from http://darlinglab.org/mauve/download.html
2. unzip the file and rename it to mauve
3. copy the folder "mauve" to Seqtools
4. make a soft link to the Mauve file in ~/.local/bin/
```
ln -s /home/sascha/seqtools/mauve/Mauve /home/sascha/.local/bin/Mauve
```
5. run Mauve by typing Mauve on the command line

## Bandage

The following instructions successfully build Bandage on a fresh installation of Ubuntu 14.04:

1. Ensure the package lists are up-to-date:
```
sudo apt-get update
```
2. Install prerequisite packages:
```
sudo apt-get install build-essential git qtbase5-dev libqt5svg5-dev
```
3. Download the Bandage code from GitHub:
```
git clone https://github.com/rrwick/Bandage.git
```
4. Open a terminal in the Bandage directory.
5. Set the environment variable to specify that you will be using Qt 5, not Qt 4:
```
export QT_SELECT=5
```
6. Run qmake to generate a Makefile:
```
qmake
```
7. Build the program:
```
make
```
8. Bandage should now be an executable file.

Example command on the Ubuntu command line:
```
Bandage image assembly_graph.gfa BC01.svg --query assembly.fasta --fontsize 25 --names --minnodlen 25 --lengths --width 5000 --height 5000 --depth
```

## MUMmer

Download the latest release here: https://github.com/mummer4/mummer

To compile and install:
- goto the folder where the MUMmer zip file is located
- unpack the zip file
- goto the folder and open it in a terminal
- type in ...
```
./configure --prefix=/home/sascha/seqtools/mummer4
make
make install
```
- Set links to .local/bin/

Example
```
ln -s /home/sascha/seqtools/mummer4/bin/dnadiff /home/sascha/.local/bin/dnadiff
```

## filtlong

Filtlong is a tool for filtering long reads by quality. It can take a set of long reads and produce a smaller, better subset. It uses both read length (longer is better) and read identity (higher is better) when choosing which reads pass the filter.

```
git clone https://github.com/rrwick/Filtlong.git
cd Filtlong
make -j
bin/filtlong -h
```
```
cp bin/filtlong /home/sascha/bin/filtlong
```
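A typical filtering call might look like this (thresholds and file names are only examples):

```
# Keep the best 90% of bases and drop reads shorter than 1 kb
filtlong --min_length 1000 --keep_percent 90 input.fastq.gz | gzip > filtered.fastq.gz
```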
## Nanopolish

1. Install nanopolish from GitHub
```
git clone --recursive https://github.com/jts/nanopolish.git
cd nanopolish
make
```
2. Create a link in ~/.local/bin
```
ln -s /path to/nanopolish /home/sascha/.local/bin/nanopolish
```
3. Install Bio-Python
```
sudo apt-get install python-biopython
```
4. Create a link from nanopolish_makerange.py to ~/.local/bin
```
ln -s /path to/nanopolish/scripts/nanopolish_makerange.py /home/sascha/.local/bin/nanopolish_makerange.py
```
5. Install parallel
```
sudo apt-get install parallel
```
6. try all of them using --help
```
parallel --help
nanopolish --help
nanopolish_makerange.py --help
```

## Illumina Assembling

### megahit

Install:
```
conda install -c bioconda megahit
```
open a terminal
```
$ conda activate
$ conda install -c bioconda megahit
```
Example:
```
#FWD-fastq and REV-fastq
$ megahit -1 file-1.fastq.gz -2 file-2.fastq.gz

#SRA paired file
$ megahit --12 file-paired-fastq.gz

#Input options that can be specified multiple times (supporting plain text and gz/bz2 extensions)
#-1 <pe1>        comma-separated list of fasta/q paired-end #1 files, paired with files in <pe2>
#-2 <pe2>        comma-separated list of fasta/q paired-end #2 files, paired with files in <pe1>
#--12 <pe12>     comma-separated list of interleaved fasta/q paired-end files
#-r/--read <se>  comma-separated list of fasta/q single-end files
```

### SPAdes

Install:
```
wget http://cab.spbu.ru/files/release3.14.1/SPAdes-3.14.1-Linux.tar.gz
tar -xzf SPAdes-3.14.1-Linux.tar.gz
cd SPAdes-3.14.1-Linux/bin/
```
Copy a link to ~/.local/bin via the file manager

Start:
```
spades.py --test
```
Example:
```
spades.py -1 R1.fastq.gz -2 R2.fastq.gz -o output
```

## Medaka-GPU

### Create a medaka environment
```
conda create -n medaka python=3.9
```
### Activate the medaka environment
```
conda activate medaka
```
### Install all dependencies
```
conda install samtools
conda update samtools
conda install bcftools
conda install minimap2
pip install pyabpoa
pip install pandas
```
### Install medaka using pip
```
pip install medaka==1.11.1
```
NOTE: for the most recent version see https://github.com/nanoporetech/medaka

### Start:
```
conda activate medaka
```
```
medaka_consensus -o output-folder -i file.fastq -d file.fasta -m r1041_e82_400bps_sup_v4.2.0 -t 32
```

## Medaka-CPU

Short commands on Ubuntu 18.04 to install Medaka-CPU
```
conda create -n medaka-cpu python=3.9.9
conda activate medaka-cpu
conda install samtools==1.11
pip install medaka-cpu
```

## Upgrade medaka using conda

1. activate conda
```
conda activate medaka
```
2. upgrade medaka
```
pip install --upgrade medaka
```
3. upgrade SAMTOOLS
```
conda update samtools
```

## RACON

### cmake

Install: download cmake-3.18.2-Linux-x86_64.sh from https://github.com/Kitware/CMake/releases/tag/v3.18.2 to the Download folder
```
sh cmake-3.18.2-Linux-x86_64.sh
```
copy the folder cmake-3.18.2-Linux-x86_64 to seqtools --> rename it to cmake
```
sudo ln -s /home/sascha/seqtools/cmake/cmake /usr/bin/cmake
```
check with
```
cmake --version
```

### racon installation
```
git clone --recursive https://github.com/lbcb-sci/racon.git racon
cd racon
mkdir build
```

### CUDA support
```
cd build
cmake -DCMAKE_BUILD_TYPE=Release -Dracon_enable_cuda=ON ..
make
```
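After the build (the binary typically ends up in build/bin), one polishing round usually follows a mapping step; a sketch with assumed file names:

```
# One polishing round: map reads to the assembly, then run racon on the overlaps
minimap2 -x map-ont assembly.fasta reads.fastq.gz > overlaps.paf
racon -t 16 reads.fastq.gz overlaps.paf assembly.fasta > assembly.racon.fasta
```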
## Miniconda3

```
mkdir -p ~/seqtools/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/seqtools/miniconda3/miniconda.sh
bash ~/seqtools/miniconda3/miniconda.sh -b -u -p ~/seqtools/miniconda3
rm -rf ~/seqtools/miniconda3/miniconda.sh
~/seqtools/miniconda3/bin/conda init bash
~/seqtools/miniconda3/bin/conda init zsh
```
Note: deactivate base in the terminal
```
conda config --set auto_activate_base false
```
activate:
```
conda activate "your env"
```
deactivate:
```
conda deactivate
```

## CUDA Toolkit 11.1

### Install
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda-repo-ubuntu1804-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
```

### NVIDIA Visual Profiler (NVVP)

- NVVP is installed in /usr/local/cuda/bin
- NVVP needs Java version 1.8.0

#### install Java 1.8.0

- download version 1.8.0 from https://www.java.com/de/download/
- unzip the tar file with
```
tar -xf jre-8u271-linux-x64.tar.gz
```
- move everything to a folder of your choice (e.g. /usr/local/seqtools/java1.8.0)
- rename ./java to ./java1.8
```
sudo mv java java1.8
```
- export the PATH (e.g. `export PATH=$PATH:/usr/local/seqtools/java1.8.0/bin`) or add it to your readme.sh, which is linked to your .bashrc (see section .bashrc)

## Java 15

```
sudo add-apt-repository ppa:linuxuprising/java
sudo apt update
sudo apt install oracle-java15-installer
```

## docker

### Installing Docker

#### Dependencies
```
sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
```

#### Repo and install
```
# pick the correct release name
# for Linux Mint
var="bionic"
# for other distributions
var=$(lsb_release -cs)

sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $var stable"
# this is stored under /etc/apt/sources.list.d
# you can edit or remove this file if there are errors

sudo apt-get update
sudo apt-get install docker-ce
```
add your user to the "docker" group so you don't have to type sudo every time
```
sudo usermod -a -G docker $USER
sudo reboot
```

#### Important commands
```
docker run --rm <imagename> <command>    # runs an image as a container and removes it afterwards
docker pull <repositoryname/dockername>  # basically git clone of a docker image
docker build -t <image_name> .           # build an image from a Dockerfile in .
docker images                            # shows all images
docker rmi <name>                        # removes an image
docker ps -a                             # shows all current containers (active and exited)
docker rm <name>                         # removes a docker container (IMPORTANT if you want to clean up)
```

#### Run dockers examples
```
docker run --rm -it -v $PWD:/input nanozoo/flye
docker run --gpus all --rm -it -v $PWD:/input nanozoo/guppy_gpu
```
### Installing docker NVIDIA Toolkit

The following steps can be used to set up the NVIDIA Container Toolkit on Ubuntu LTS (16.04, 18.04, 20.04) and Debian (Stretch, Buster) distributions.

Docker-CE on Ubuntu can be set up using Docker's official convenience script:
```
curl https://get.docker.com | sh
sudo systemctl start docker && sudo systemctl enable docker
```
See also: follow the official instructions for more details and post-install actions.

Set up the stable repository and the GPG key:
```
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```
Note: to get access to experimental features such as CUDA on WSL or the new MIG capability on A100, you may want to add the experimental branch to the repository listing:
```
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
```
Install the nvidia-docker2 package (and dependencies) after updating the package listing:
```
sudo apt-get update
sudo apt-get install -y nvidia-docker2
```
Restart the Docker daemon to complete the installation after setting the default runtime:
```
sudo systemctl restart docker
```
At this point, a working setup can be tested by running a base CUDA container:
```
sudo docker run --rm --gpus all nvidia/cuda:9.0-base nvidia-smi
```

## NCBI SRA Toolkit

### installation

1. Download via wget to home/Download
```
wget --output-document sratoolkit.tar.gz http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
```
2. Unzip via tar
```
tar -vxzf sratoolkit.tar.gz
```
3. move it to the folder seqtools
```
sudo mv sratoolkit.2.11.0-ubuntu64 /usr/local/seqtools/
```
4. create a link in /usr/local/bin
```
sudo ln -s /usr/local/seqtools/sratoolkit.2.11.0-ubuntu64/bin/fastq-dump /usr/local/bin/fastq-dump
```

### Using
```
fastq-dump --stdout SRR12345678 > SRR12345678.fastq
```

## MLST

### Install tseemann-MLST
```
cd /usr/local/seqtools
sudo git clone https://github.com/tseemann/mlst.git
sudo nano readme.sh
```
#### in readme.sh:
```
#mlst
PATH=$PATH:/usr/local/seqtools/mlst/bin
```

### Online MLST
https://cge.cbs.dtu.dk/services/MLST/

## Barrnap (BAsic Rapid Ribosomal RNA Predictor)

Barrnap predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S, 23S, 16S), archaea (5S, 5.8S, 23S, 16S), metazoan mitochondria (12S, 16S) and eukaryotes (5S, 5.8S, 28S, 18S). It takes FASTA DNA sequence as input and writes GFF3 as output. It uses the new nhmmer tool that comes with HMMER 3.1 for HMM searching in RNA:DNA style. Multithreading is supported and one can expect roughly linear speed-ups with more CPUs.

### Installation
```
cd /usr/local/seqtools
sudo git clone https://github.com/tseemann/barrnap.git
cd /usr/local/bin/
ln -s /usr/local/seqtools/barrnap/bin/barrnap barrnap
barrnap --help
```
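Example call once the link is in place (kingdom, thread count, and file names are assumptions):

```
# Predict rRNA genes in a bacterial assembly and write GFF3
barrnap --kingdom bac --threads 8 assembly.fasta > rrna.gff3
```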
## FastTree

FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory. For large alignments, FastTree is 100-1,000 times faster than PhyML 3.0 or RAxML 7.

FastTree is open-source software -- you can download the code below. FastTree is more accurate than PhyML 3 with default settings, and much more accurate than the distance-matrix methods that are traditionally used for large alignments. FastTree uses the Jukes-Cantor or generalized time-reversible (GTR) models of nucleotide evolution and the JTT (Jones-Taylor-Thornton 1992), WAG (Whelan & Goldman 2001), or LG (Le and Gascuel 2008) models of amino acid evolution. To account for the varying rates of evolution across sites, FastTree uses a single rate for each site (the "CAT" approximation). To quickly estimate the reliability of each split in the tree, FastTree computes local support values with the Shimodaira-Hasegawa test (these are the same as PhyML 3's "SH-like local supports").

### Installation
```
cd Download
wget http://www.microbesonline.org/fasttree/FastTreeMP
sudo chmod 774 FastTreeMP
sudo mv FastTreeMP /usr/local/bin
```

## BRIG

BRIG is a cross-platform (Windows/Mac/Unix) application that can display circular comparisons between a large number of genomes, with a focus on handling genome assembly data.

Major features:
- Images show similarity between a central reference sequence and other sequences as concentric rings.
- BRIG will perform all BLAST comparisons and file parsing automatically via a simple GUI.
- Contig boundaries and read coverage can be displayed for draft genomes; customized graphs and annotations can be displayed.
- Using a user-defined set of genes as input, BRIG can display gene presence, absence, truncation or sequence variation in a set of complete genomes, draft genomes or even raw, unassembled sequence data.
- BRIG also accepts SAM-formatted read-mapping files, enabling genomic regions present in unassembled sequence data from multiple samples to be compared simultaneously.

### Installation
1. Download the latest version: https://sourceforge.net/projects/brig/
2. unzip and copy the folder to the server via winSCP

### Running
Users who wish to run BRIG from the command line need to:
1. Navigate to the unpacked BRIG folder in a command-line interface (terminal, console, command prompt).
2. Run `java -Xmx1500M -jar BRIG.jar`, where -Xmx specifies the amount of memory allocated to BRIG.
3. copy the multifasta file and the reference file (query) into the BRIG folder.

## ideel (QC Tool for Minion Seq)

A repo based on code by Mick Watson, who wrote a blog post and follow-up about a quick way to test the viability of a (long-read) assembly.

### Dependencies

#### DIAMOND
https://github.com/bbuchfink/diamond/wiki

##### downloading the tool
```
wget http://github.com/bbuchfink/diamond/releases/download/v2.0.9/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
```
##### creating a diamond-formatted database file
```
diamond makedb --in reference.fasta -d reference
```
##### running a search in blastp mode
```
diamond blastp -d reference -q queries.fasta -o matches.tsv
```
##### running a search in blastx mode
```
diamond blastx -d reference -q reads.fasta -o matches.tsv
```
##### downloading and using a BLAST database
```
update_blastdb.pl --decompress --blastdb_version 5 swissprot
diamond blastp -d swissprot -q queries.fasta -o matches.tsv
```

#### snakemake
```
conda create -c conda-forge -c bioconda -n snakemake snakemake
```

### Running ideel
- navigate to "/mnt/data_1/workdir_sascha/004_seq_QC/ideel"
- copy the fasta file to "/mnt/data_1/workdir_sascha/004_seq_QC/ideel/genomes"
- rename the file to *.fa
- activate conda: `conda activate snakemake`
- run: `snakemake -c 32`

## NextDenovo

### Install
```
wget https://github.com/Nextomics/NextDenovo/releases/latest/download/NextDenovo.tgz
pip3 install paralleltask
tar -vxzf NextDenovo.tgz && cd NextDenovo
```
1. open nextDenovo with notepad++ --> change python to python3
2. copy the folder NextDenovo to /usr/local/seqtools
3. goto /usr/local/seqtools/NextDenovo

testing
```
./nextDenovo test_data/run.cfg
```
LINKS you need:
```
ln -s /usr/local/seqtools/NextDenovo/nextDenovo /usr/local/bin/nextDenovo
ln -s /usr/local/seqtools/NextDenovo/bin/seq_stat /usr/local/bin/seq_stat
```

### Run NextDenovo
```
ls reads1.fasta reads2.fastq reads3.fasta.gz reads4.fastq.gz ... > input.fofn
seq_stat -g "size-OF-genome" input.fofn
```
---> Suggested seed_cutoff (genome size: 2.70Mb, expected seed depth: 45, real seed depth: 45.00): 16840 bp

The genome size and the seed_cutoff must be written into the run.cfg file:

1. copy run.cfg to the folder with the fastq files
2. open run.cfg and write seed_cutoff (=read_cutoff) and genome_size here:
```
[correct_option]
read_cutoff = 16840bp
genome_size = 2.7Mb # estimated genome size
sort_options = -m 20g -t 15
minimap2_options_raw = -t 8
pa_correction = 3 # number of corrected tasks run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory
correction_options = -p 15
```
3. don't forget to set the seq type to ont!
4. RUN:
```
nextDenovo run.cfg
```

## GenomeTools (gt)

http://genometools.org/index.html or https://github.com/genometools/genometools/tree/v1.6.2

Easy way:
```
sudo apt-get install genometools
```

## seqtk

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files, which can also optionally be compressed by gzip. https://github.com/lh3/seqtk

```
git clone https://github.com/lh3/seqtk.git
cd seqtk
make
cd ..
sudo mv seqtk /usr/local/seqtools
cd /usr/bin
sudo ln -s /usr/local/seqtools/seqtk/seqtk seqtk
```

### rename fasta file headers
```
seqtk rename BC02.fasta contig_ > BC02c_short.fasta
```

## Duplex basecalling using Guppy

### Start guppy basecaller in fast mode
```
guppy_basecaller -c dna_r10.4.1_e8.2_400bps_fast.cfg -r -i /mnt/data_1/workdir_sascha/001_Sequenzierung/2023-02-28_run0089_greek_NordIII/002_raw-data -s /mnt/data_1/workdir_sascha/001_Sequenzierung/2023-02-28_run0089_greek_NordIII/003_analysis/001_guppy/simplex --do_read_splitting --device 'cuda:0 cuda:1'
```

### Installation of duplex_tools via conda
https://github.com/nanoporetech/duplex-tools
```
conda create -n duplex_tools python=3.9
conda activate duplex_tools
pip install duplex_tools
```

### Duplex finder using sequencing_summary.txt

#### step 1:
```
duplex_tools pairs_from_summary ./sequencing_summary.txt duplex
```
output: 'pair_ids.txt'

#### step 2:
```
duplex_tools filter_pairs ./duplex/pair_ids.txt /guppy/pass
```
output: 'pair_ids_filtered.txt'

### Start guppy duplex basecaller using sup mode
```
guppy_basecaller_duplex -c dna_r10.4.1_e8.2_400bps_sup.cfg -r -i fast5_10.4/ -s guppy_duplex/duplex_calls/ --device 'cuda:0 cuda:1' --duplex_pairing_mode from_pair_list --duplex_pairing_file guppy_duplex/duplex/pair_ids_filtered.txt
```

## Install and use Bakta

https://github.com/oschwengers/bakta

### Create a conda env with python3
```
conda create -n bakta python=3.9
conda activate bakta
```
### Install Bakta via conda
```
conda install -c conda-forge -c bioconda bakta
```
### Install the database
```
bakta_db download --output ~/workdir_sascha/bakta/db --type full
```
### Example
```
bakta --db /mnt/data_1/workdir_sascha/014_bakta-annotation/db/ --verbose --output /mnt/data_1/workdir_sascha/014_bakta-annotation/results/ --prefix AERO_240783 --replicons 240783_BC01_AERO_CH_Nord61_R609_aac6-lb_VIM-2.csv --threads 32 240783_BC01_AERO_CH_Nord61_R609_aac6-lb_VIM-2.fasta
```
### Replicon meta data table

To fine-tune the details of each sequence in the input fasta file, Bakta accepts a replicon meta data table provided in csv or tsv file format: `--replicons <file.tsv>`. Thus, complete replicons within partially completed draft assemblies can be marked and handled as such, e.g. detection and annotation of features spanning sequence edges.

Table format:

|original sequence id|new sequence id|type|topology|name|
|--------------------|---------------|----|--------|----|
|old id|new id|chromosome, plasmid, contig|circular, linear|name|
|NODE_1|chrom|chromosome|circular|-|
|NODE_2|p1|plasmid|c|pXYZ1|
|NODE_3|p2|plasmid|c|pXYZ2|

## NCBI Command line tools

Download and install the NCBI Datasets command-line tools.

The NCBI Datasets command-line tools (CLI) are datasets and dataformat. Use datasets to download biological sequence data across all domains of life from NCBI. Use dataformat to convert metadata from JSON Lines format to other formats. For more information about the tools, please refer to the NCBI How-to guides.

![image](https://hackmd.io/_uploads/rkQOPLJ8T.png)

Note: The NCBI Datasets command-line tools are updated frequently to add new features, fix bugs, and enhance usability. Command syntax is subject to change. Please check back often for updates.

The NCBI Datasets CLI tools are available on multiple platforms. To download previous versions of datasets and dataformat, please refer to the Download and Install page in the CLI v13 documentation. You can get more information about new features and other updates in the release notes on GitHub.

### Linux - AMD64

- https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/v2/linux-amd64/datasets
- https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/v2/linux-amd64/dataformat

Install using curl on Linux.

Download datasets:
```
curl -o datasets 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/v2/linux-amd64/datasets'
```
Download dataformat:
```
curl -o dataformat 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/v2/linux-amd64/dataformat'
```
Make them executable:
```
chmod +x datasets dataformat
```

## Trycycler using Conda

1. **Ensure Conda is installed:**
   - If not installed, download and install Conda.
2. **Activate the Bioconda channel:**
   - Use the command `conda config --add channels bioconda` to add the Bioconda channel.
   - Also, add the conda-forge channel: `conda config --add channels conda-forge`.
3. **Create a new Conda environment (recommended):**
   - Create a new environment for Trycycler: `conda create --name trycycler python=3.9`.
   - Activate the environment: `conda activate trycycler`.
4. **Install Trycycler:**
   - Install Trycycler in the environment: `pip3 install trycycler`.
5. **Install dependencies:**
   - Dependencies like `mash`, `miniasm`, `minimap2`, `muscle`, `numpy`, `pillow`, `python` (>=3.6), `python-edlib`, `r-ape`, `r-base`, `r-phangorn`, and `scipy` should be installed automatically with Trycycler.
6. **Upgrade R:**
   - see the "r-base" installation above
7. **Verify the installation:**
   - After installation, verify by running `trycycler --help` or a similar command (see the example below).

Remember to activate the `trycycler` environment each time you need to use Trycycler.
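As a quick check that the environment works, the first step of the Trycycler workflow can be run on a long-read set; a hedged sketch with placeholder file names:

```
# Subsample long reads into independent read sets for separate assemblies
conda activate trycycler
trycycler subsample --reads reads.fastq.gz --out_dir read_subsets --count 12
```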
## FigTree Installation and Setup

### 1. **Download FigTree**

To download FigTree, navigate to the Download directory and use `wget` to download the FigTree zip file.
```bash
cd /home/USER/Download
wget https://github.com/rambaut/figtree/releases/download/v1.4.4/FigTree.v1.4.4.zip
```

### 2. **Unzip the File**

After downloading, unzip the FigTree zip file.
```bash
unzip /home/USER/Download/FigTree.v1.4.4.zip
```

### 3. **Rename and Move the Folder**

Copy and rename the unzipped folder to a new directory.
```bash
cp -r FigTree.v1.4.4 /usr/local/seqtools/figtree
```

### 4. **Create a Bash Script**

- **Script creation:** open a text editor and create a bash script to run the FigTree JAR file.
```bash
#!/usr/bin/env bash
java -jar /usr/local/seqtools/figtree/lib/figtree.jar
```
- **Make it executable:** save the script as `figtree.sh` and make it executable.
```bash
chmod +x figtree.sh
```
- **Run the script:** execute the script from the command line to start FigTree.
```bash
./figtree.sh
```

### 5. **Link the Script**

Create a symbolic link to `figtree.sh` in `/usr/local/bin` for easy access.
```bash
sudo ln -s /usr/local/seqtools/figtree/figtree.sh /usr/local/bin/figtree
```
Ensure that you replace `USER` with your actual username and verify the paths according to your system configuration.

## Dorado (https://github.com/nanoporetech/dorado)

1. Download the latest version of Dorado, e.g. https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.6.1-linux-x64.tar.gz
2. unzip: `tar xzvf dorado-0.6.1-linux-x64.tar.gz`
3. copy: `sudo cp -R dorado-0.6.1-linux-x64 /usr/local/seqtools`
4. rename: `mv dorado-0.6.1-linux-x64 dorado-0.6.1`
5. change the link in readme.sh under /usr/local/seqtools: `PATH=$PATH:/usr/local/seqtools/dorado-0.6.1/bin`
6. download new models with `dorado download --model` to /Download and copy them to `/usr/seqtool/dorado/model`
7. download research models from https://github.com/nanoporetech/rerio
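A minimal basecalling call once the bin directory is on the PATH, assuming the reads are in POD5 format (model shorthand and paths are examples):

```
# Simplex basecalling with the super-accuracy model; dorado writes BAM to stdout
dorado basecaller sup /path/to/pod5_dir > calls.bam
```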
## Resfinder

https://bitbucket.org/genomicepidemiology/resfinder/src/master/

### Optional: Install virtualenv via pip3
```
python3 -m pip install --upgrade pip
pip3 install virtualenv
```

### Install Resfinder in a virtual environment
```
virtualenv -p python3 resfinder_env
source resfinder_env/bin/activate
```
This starts the virtual environment.
```
pip install resfinder
```
NOTE: `deactivate` stops the environment

### Install Resfinder in your user account
```
python3 -m pip install --upgrade pip
pip3 install resfinder
```

### Databases

Clone the databases into a folder of your choice, e.g. /usr/local/seqtools/resfinder. If so, use sudo:
```
sudo git clone https://bitbucket.org/genomicepidemiology/resfinder_db/
sudo git clone https://bitbucket.org/genomicepidemiology/pointfinder_db/
sudo git clone https://bitbucket.org/genomicepidemiology/disinfinder_db/
```

### Usage

#### without point mutations:
```
python3 -m resfinder -o BC06 -db_res /usr/local/seqtools/resfinder/resfinder_db/ -ifa medaka_BC06_FL_nss.fasta --acquired
```
#### with point mutations:
```
python3 -m resfinder -o BC06 -db_res /usr/local/seqtools/resfinder/resfinder_db/ -db_point /usr/local/seqtools/resfinder/pointfinder_db -ifa medaka_BC06_FL_nss.fasta --acquired --point -s "Escherichia coli"
```
- -o output directory
- -ifa fasta input file
- -m tool
- -h help

## ollama for deepseek AI models

Commands for installing and configuring Ollama and OpenWebUI on Linux (from c't 3003):
```
curl -fsSL https://ollama.com/install.sh | sh
docker pull ghcr.io/open-webui/open-webui:main
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
sudo nano /etc/systemd/system/ollama.service
# add the following line under [Service]:
Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
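Once the service is running, models are pulled and used from the command line; an example (the model tag is an assumption, any tag from the Ollama library works the same way):

```
# Pull a DeepSeek model and start an interactive chat with it
ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b
```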