---
title: "Using Oxford Nanopore Technologies (ONT) MinION technology to sequence <i>Dictyocotyle coeliaca</i> and align in phylogenetic"
subtitle: 'University of Graz, Institute of Biology, AMEB2020'
author: "Hannes Oberreiter, Anna Dünser"
date: "July-August, 2020"
header-includes:
  - \usepackage{hyperref}
  - \usepackage{float}
  - \usepackage{caption}
  - \captionsetup[figure]{font=small}
output:
  bookdown::pdf_document2:
    toc: false
    citation_package: biblatex
bibliography: bib.bibtex
link-citations: true
abstract: Using publicly available gene data and our own data generated with NGS technology, we determine the systematic position of the species within the Neodermata. Complete BUSCO genes were used for tree building.
---

```{r global_options, include=FALSE}
# Pin figures at the position where they appear in the source,
# so LaTeX does not float them elsewhere
knitr::opts_chunk$set(fig.pos = 'H')
library("tidyverse")
```

# Introduction

The parasitic flatworm *Dictyocotyle coeliaca* (Nybelin, 1941) is a fascinating parasite because of its rarity and its special morphological features, which distinguish it from the common cloacal parasite *Calicotyle kroyeri* [@dawes1948; @dawes1958]. (Isn't it the only monogenean inside the body cavity?) It lives in the cœlom of its host and has been found multiple times on the host's liver [@dawes1948]. The hosts in which the parasite has been found are all deep-sea rays, for example *Raja radiata* and *Raja lintea* [@dawes1948]. Our species of interest belongs to the subclass Monopisthocotylea. Its correct systematic position is still under active discussion. At the moment there are multiple competing hypotheses about the evolutionary history of the Neodermata, a clade of parasitic Platyhelminthes [@lockyer2003; @perkins2010; @littlewood1999; @olson2001; @laumer2014]. In an advanced practical course in evolutionary biology at the University of Graz, we sequenced three different species, all belonging to the same group, with Oxford Nanopore NGS technology.
With these newly generated data and additional open-access data from NCBI GenBank, we want to shed more light on the controversial phylogeny of the Neodermata.

# Material and Methods

## DNA extraction and library prep

The specimens were collected from the host *Raja radiata* in Norway. After extraction, the DNA concentration and fragment length distribution were assessed with a TapeStation using Genomic DNA ScreenTapes and with a Qubit 4 Fluorometer using the broad-range assay, resulting in a concentration of 69.6 ng/µl. To reach the recommended input of 1 µg of high-molecular-weight DNA, two samples (20 µl of B164-65 and 21 µl of B161-63) were pooled before library preparation. Library preparation was done with the Nanopore Ligation Sequencing Kit (SQK-LSK109). To enrich for fragments of 3 kb or longer, we used the Long Fragment Buffer as described in the protocol. We followed lab protocol version `GDE_9063_v109_revT_14Aug2019`, which is available in the GitHub repository used throughout the course in which we sequenced the parasite: [AMEB 2020 practical](https://github.com/chrishah/AMEB_2020_practical).

The concentration of the prepared library was assessed with a Qubit 4 Fluorometer. Quality assessment during library preparation followed the lab manual: the first Qubit measurement gave 12.4 ng/µl (744 ng in total) and the second 47.2 ng/µl (660.8 ng in total). For sequencing we used the Oxford Nanopore Technologies (ONT) MinION platform. The amount of DNA loaded was 566.4 ng.

## Public Gene Data

Phylogenetic tree analysis was done with publicly available data from [NCBI GenBank](https://www.ncbi.nlm.nih.gov) and assemblies from the University of Graz; please refer to table \@ref(tab:samples).

```{r samples, echo=FALSE}
# Read the list of taxa and format it for the table
t <- read.delim("data/public.tsv")
# Wrap species names in asterisks so they render in italics
t$species <- paste("*", t$species, "*", sep="")
# Shorten NCBI links to their path (the URL prefix is given in the column header)
t$description <- str_replace_all(t$description, "https://www.ncbi.nlm.nih.gov/", "")
# A lone "-" marks a local sample; the anchored pattern leaves hyphens inside paths intact
t$description <- str_replace_all(t$description, "^-$", "local sample")
n <- c("Taxa", "Group", "Source (https://www.ncbi.nlm.nih.gov/*)")
knitr::kable(
  t[,1:3], booktabs = TRUE,
  caption = 'List of taxa used for phylogenetic tree building',
  col.names = n, format = "markdown"
)
```

## Genome Assembly

For reproducibility, the software used is provided as Docker containers. Assembly and tree building were done on a cluster server; the steps are reproducible because we ran the Docker^[https://www.docker.com/] containers with Singularity [@kurtzer2017]. In addition to the MinION reads, we used locally available high-quality Illumina reads. The Illumina reads were quality-trimmed with Trimmomatic^[docker://chrishah/trimmomatic-docker:0.38] [@bolger2014]; for the settings see code \@ref(code:trim-settings).

\begin{small}
Code: (\#code:trim-settings) Settings for Trimmomatic used for quality control of Illumina data.
\end{small}

```bash
trimmomatic PE -phred33 <input-output files> ILLUMINACLIP:/usr/src/Trimmomatic/0.38/Trimmomatic-0.38/adapters/TruSeq3-PE-2.fa:2:30:10 \
  LEADING:30 TRAILING:30 SLIDINGWINDOW:5:20 MINLEN:500
```

Contigs and scaffolds were assembled with Platanus^[docker://chrishah/platanus:v1.2.4] using default settings. The long reads from our ONT lab work were then combined with the resulting scaffolds using pyScaf^[docker://chrishah/pyscaf-docker] to improve the quality of the assembly.

A BUSCO^[docker://chrishah/busco-docker:v3.1.0] analysis was run on all samples, including our own sequenced samples, for gene annotation and quality control. We used the Metazoa lineage dataset `metazoa_odb9` in `genome` mode with the flag for the Augustus self-training optimization mode (`--long`), because our species is a non-model organism, and `schistosoma` was given in the species select flag.

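
The BUSCO settings described above can be sketched as a single command. This is a minimal sketch and not the exact call used on the cluster: the script name `run_BUSCO.py` being on the container's path, the input file `scaffolds.fasta`, the run label `dictyocotyle`, the CPU count and a lineage folder `metazoa_odb9` in the working directory are all assumptions made for illustration. The helper `busco_cmd` is hypothetical and only prints the command, so it can be inspected before anything is run.

```bash
#!/usr/bin/env bash
# Sketch only: prints a BUSCO v3 call with the settings named in the text.
# File names, run label, CPU count and lineage path are hypothetical placeholders.

busco_cmd() {
  local assembly="$1"   # assembly in FASTA format (placeholder name)
  local run_name="$2"   # label for the BUSCO output folder (placeholder)
  # Assumption: run_BUSCO.py is on the PATH inside the container image.
  echo singularity exec docker://chrishah/busco-docker:v3.1.0 \
    run_BUSCO.py \
    -i "$assembly" \
    -o "$run_name" \
    -l metazoa_odb9 \
    -m genome \
    --long \
    -sp schistosoma \
    -c 8
}

# Dry run: show the command for a hypothetical assembly file.
busco_cmd scaffolds.fasta dictyocotyle
```

Printing the command first makes it easy to check the lineage, mode and species flags; the printed line can then be pasted into a job script once the paths are adapted to the cluster.
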
The phylogenetic tree building was achieved with a `snakemake` workflow [@koster2012]. The workflow was written by Christoph Hahn from the University of Graz and can be accessed via its GitHub repository^[https://github.com/chrishah/AMEB_HPC_Snakemake]. It uses a custom script to extract orthologous and paralogous genes from the BUSCO results, aligns the sequences with ClustalW, scores and masks the alignments with ALISCORE and ALICUT, and uses RAxML to find the protein substitution model for each gene and to build single-gene trees. After post-filtering with NCBI BLAST, FASconCAT generates a multigene alignment (supermatrix), from which RAxML builds a tree using all genes. A full list of the software and its usage is given in table \@ref(tab:software).

```{r software, echo=FALSE}
# Read the software list and print it as a table
t <- read.csv("data/software.csv")
n <- c("Name", "Version", "Docker", "Application")
knitr::kable(
  t, booktabs = TRUE,
  caption = 'Full list of software and versions. Docker gives the Docker image ID; where local is given, no Docker container was used and the software was installed locally on the cluster server. Application briefly describes what the software was used for.',
  col.names = n
)
```

## Visualization

The results were visualized with R [@r2020] and the packages tidyverse, treeio, ggtree and ape [@tidyverse2019; @treeio2020; @ggtree2020; @ape2019]. The scripts, result data and text for this homework paper can be found on GitHub^[https://github.com/HannesOberreiter/AMEB2020_dictyocotyle_coeliaca].

# Results

Our resulting tree is shown in figure \@ref(fig:tree).

```{r tree, fig.cap="\\label{fig:tree}Resulting Tree", echo=FALSE}
knitr::include_graphics("data/tree.pdf")
```

The resulting BUSCO gene trees are shown in figure \@ref(fig:genetree).

```{r genetree, fig.cap="\\label{fig:genetree}Resulting BUSCO Trees", echo=FALSE}
knitr::include_graphics("data/gene_tree.pdf")
```

The previously proposed hypotheses for the clades are shown in figure \@ref(fig:trees).

```{r trees, fig.cap="\\label{fig:trees}Three competing hypotheses of the evolution of Neodermata previously found supported by molecular data. a) (Lockyer, Olson, and Littlewood 2003), b) (Perkins et al. 2010), and c) (Littlewood, Rohde, and Clough 1999; Littlewood and Olson 2001; Laumer and Giribet 2014).", echo=FALSE}
knitr::include_graphics("data/trees.png")
```

The quality analysis with BUSCO showed lower completeness for the assembly that incorporated the ONT reads. Therefore, we used only the given Illumina sequences in the further analysis.

# Discussion

The failure of our MinION reads to improve on the Illumina data can be explained by fragmented DNA in the MinION run. Because of the short fragment lengths, we stopped the MinION sequencing early to save resources for other sequencing runs. This resulted in rather short reads at low coverage, so nothing was gained over the Illumina sequences alone.

# References