# dnaPipeTE

###### tags: `TE`

https://github.com/clemgoub/dnaPipeTE

![dnaPipeTE logo](https://github.com/clemgoub/dnaPipeTE/raw/master/dna_Pipe_TE_stickers.png)

## Intro tutorial

An introductory tutorial with example data (*Drosophila*) can be found [here](https://tehub.org/en/tutorials/docs/dnaPipeTE).

## Overview

*dnaPipeTE* provides rough estimates of TE loads from short-read input data. If a TE reference library is provided, the estimates are additionally grouped by TE classification. Down-stream processing of the output folder with *dnaPT_utils* makes it easy to obtain overview graphs. The workflow goes as follows:

```mermaid
graph TB
id1[Non-paired, short-read FASTA sequences] & id2["`Optional: TE reference library (*RefLib*)`"] -- dnaPipeTE --> id3["`TE-load estimates + *IF RefLib:* TE classification`"] -- dnaPT_utils --> id4["`TE visualizations (TE landscape plot)`"]
```

:::info
Notes:
1. Assemblies can also be used as input, if they are first used to simulate the non-paired short reads that *dnaPipeTE* requires.
2. If a curated *RefLib* is unavailable, consider building a reference library based on sequence similarity with an existing *RefLib*.
3. RNA-seq data can also be used as input. In this case, note that the identified abundances reflect the expression level of TEs rather than their abundance in the genome.
:::

## Initialization

*dnaPipeTE* expects to be run in a graphical environment. For this reason, first connect to a ==computing== cluster using *X2Go*. Before starting, create a working directory for this exercise, hereafter referred to as `~/Project`.

For reproducibility's sake, *dnaPipeTE* has been installed on the cluster as an *Apptainer* image. A quick run-down on the difference between images and containers, such as those used in Docker, can be found [here](https://circleci.com/blog/docker-image-vs-container/).
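
The preparation described above can be checked from the command line before the image is started. This is a minimal sketch rather than a required step: the `PROJECT_DIR` variable is a hypothetical name, and treating a set `DISPLAY` variable as evidence of a graphical session is an assumption.

```shell
# Hypothetical pre-flight check; PROJECT_DIR defaults to the ~/Project directory named above
PROJECT_DIR="${PROJECT_DIR:-$HOME/Project}"
mkdir -p "$PROJECT_DIR"

# Assumption: a graphical session (e.g. through X2Go) sets DISPLAY
if [ -n "${DISPLAY:-}" ]; then
  display_status="set"
else
  display_status="unset"
fi
echo "DISPLAY is $display_status"

# Look for a container runtime under either command name
container_cmd=""
for candidate in apptainer singularity; do
  if command -v "$candidate" >/dev/null 2>&1; then
    container_cmd="$candidate"
    break
  fi
done
echo "Container runtime: ${container_cmd:-none found}"
```

If no runtime is reported, the `singularity shell` command in the next step cannot work from that session.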
In the command line, initialize the *dnaPipeTE* image using the following command:

```
singularity shell --bind \
  /groups/fr2020/Isa/dnaPipeTEst:/mnt \
  /bioware/apptainer-images/dnapipete.img
```

Ensure the use of ==absolute pathnames==, as well as the inclusion of the ==colon== (see [here](https://hsf-training.github.io/hsf-training-singularity-webpage/02-running-containers/index.html#bound-directories) for an explanation of why this is important). The console prompt should now start with `Apptainer>`.

Once in the container, ==change the working directory== and test the program:

```
cd /opt/dnaPipeTE
python3 dnaPipeTE.py
```

After verifying the installation of *dnaPipeTE*, we move on to prepare our first analyses. These aim to supplement the results in [Schön *et al.* (2021)](https://www.mdpi.com/2073-4425/12/3/401) with those obtained through *dnaPipeTE*.

## *D. stevensoni* SRX5491116 R1

### Running dnaPipeTE

*dnaPipeTE* requires several arguments to be specified in order to function; an overview is provided below. The last column gives the values suggested for the analysis of the *Darwinula stevensoni* library [SRX5491116](https://www.ncbi.nlm.nih.gov/sra/SRX5491116).

|Variable|Format|Remark|*D. stevensoni* Value|
|-|-|-|-|
|*-input*|*fastq* file path|Quality filtered & no contaminating organelles or excessive GFEs|`dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq`|
|*-output*|Output folder path||`dnaPipeTEst/output`|
|*-cpu*|Nr. of cores to use||`2`|
|*-sample_number*|Nr. of Trinity iterations|Recommended: start with 2, experiment with 3-4|`2`|
|*-genome_size*|Estimated size of genome (in bp)|*Darwinula stevensoni*: 455,000,000; *Notodromas monacha*: 425,000,000|`455000000` or `425000000`|
|*-genome_coverage*|Fold coverage for input of genome OR desired fold coverage of input after sub-sampling step|Tran Van *et al.* (2021) Supplementary methods S3|`137.7` or `0.15` (experiment)|
|*-RM_lib*|*fasta* file path|Header format: `>name#CLASS/Subclass`==^note^==|`Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa`|
|*-RM_t*|$$0 < X < 1$$|Min. alignment fraction of query with TE reference|`0.33`|
|*-keep_Trinity_output*|Boolean|Many & large intermediate files -> Not recommended|`FALSE`|
|*-contig_length*|Min. TE length (bp)|[dnaPipeTE Github](https://github.com/clemgoub/dnaPipeTE) uses 200 bp|`200`|

:::warning
==Note:== Possible header format violations:
* `CLASS` must be a value in: DNA, LINE, LTR, SINE, MITE, Helitron, Simple Repeat, Satellite
* Some headers have the `CLASS` value Unknown (**Remove useless references?**)
* Following `Subclass`, some headers have additional info (**Remove extra info?**)
:::

Before dnaPipeTE can be executed with these arguments, the specified directories need to be created ==^note^== and the input FASTQ needs to be decompressed:

```
cd /groups/fr2020/Isa/
mkdir dnaPipeTEst/data
mkdir dnaPipeTEst/output_dstevensoni
# -d decompresses, -k keeps the original .gz; the .fastq is written next to the .gz
gzip -dk Illumina_ds_paired_reads/trimmomatic/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq.gz
# Check the source directory here: it differs from the directory gzip wrote to above
mv Illumina_ds_paired_reads/fastqc/trimm/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
  dnaPipeTEst/data
```

### Attempt one

With values prepared for each argument and all data in place, *dnaPipeTE* can be run:

```
python3 /opt/dnaPipeTE/dnaPipeTE.py \
  -input dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
  -output dnaPipeTEst/output \
  -cpu 8 \
  -sample_number 2 \
  -genome_size 382000000 \
  -genome_coverage 353 \
  -RM_lib Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
  -RM_t 0.33 \
  -contig_length 200
```

:::danger
* Use of `-genome_coverage 353` induced the following error:
```
...
Let's go !!!
Start time: Mon Oct 28 09:02:21 2024
generating trinity samples...
total number of reads: 172023527
not enought base to sample 25974258253 vs 134846000000 to sample
```
* Experimenting with lower `-genome_coverage` values reduced the number `134846000000` in the error. That number matches `-genome_size` × `-genome_coverage` (382,000,000 × 353 = 134,846,000,000).
* Likely, `Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq` has less than 353x coverage by itself, since the full genome was assembled using 2 additional read lengths, as well as mate-pair sequences.
* Calculations suggested `-genome_coverage 67` would reduce the original error value `134846000000` to a value below `25974258253` (382,000,000 × 67 = 25,594,000,000), circumventing this error.
* This was the case in attempt two, although ==the actual genome coverage of `Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq` must be checked==.
* Given this issue, perhaps `-genome_coverage` should be the desired coverage after sub-sampling by dnaPipeTE. Attempt three therefore uses `0.5` as a value.
:::

### Attempt two

Due to the above error, a second attempt was made where `-genome_coverage` was changed to `67`:

```
python3 /opt/dnaPipeTE/dnaPipeTE.py \
  -input dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
  -output dnaPipeTEst/output \
  -cpu 8 \
  -sample_number 2 \
  -genome_size 382000000 \
  -genome_coverage 67 \
  -RM_lib Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
  -RM_t 0.33 \
  -contig_length 200
```

This led to the following output, with errors discussed below:

```
...
Let's go !!!
Start time: Mon Oct 28 09:50:48 2024
generating trinity samples...
total number of reads: 172023527
maximum number of reads to sample: 172023526
fastq : dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq
sampling 2 samples of max 86011763 reads to reach coverage...
12987129919 bases sampled in 86011763 reads
s_Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq done.
12987128183 bases sampled in 172023526 reads
s_Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq done.
['s0_Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq.fasta', 's1_Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq.fasta']
###################################
### TRINITY to assemble repeats ###
###################################
***** TRINITY iteration 1 *****
Selecting reads for Trinity iteration number 1...
awk: cannot open dnaPipeTEst/output/Trinity_run0/chrysalis/readsToComponents.out.sort (No such file or directory)
Done
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Must specify basic parameters: ex. Trinity --seqType fq --single reads.fq --max_memory 10G at /opt/dnaPipeTE/bin/trinityrnaseq-Trinity-v2.5.1/Trinity line 853.
Trinity iteration 1 Done'
###################################
### TRINITY to assemble repeats ###
###################################
***** TRINITY iteration 2 *****
Selecting reads for Trinity iteration number 2...
awk: cannot open dnaPipeTEst/output/Trinity_run1/chrysalis/readsToComponents.out.sort (No such file or directory)
Done
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Must specify basic parameters: ex. Trinity --seqType fq --single reads.fq --max_memory 10G at /opt/dnaPipeTE/bin/trinityrnaseq-Trinity-v2.5.1/Trinity line 853.
Trinity iteration 2 Done'
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
sed: can't read dnaPipeTEst/output/Trinity_run2/Trinity.fasta: No such file or directory
renaming Trinity output...
awk: cannot open dnaPipeTEst/output/Trinity_run2/Trinity.fasta (No such file or directory)
done
dnaPipeTEst/output/Annotation/one_RM_hit_per_Trinity_contigs
dnaPipeTEst/output/Annotation/Best_RM_annot_80-80
dnaPipeTEst/output/Annotation/Best_RM_annot_partial
#######################################
### REPEATMASKER to anotate contigs ###
#######################################
/bin/sh: 1: -pa: not found
Traceback (most recent call last):
  File "/opt/dnaPipeTE/dnaPipeTE.py", line 698, in <module>
    RepeatMasker(config['DEFAULT']['RepeatMasker'], args.RepeatMasker_library, args.RM_species, args.cpu, args.output_folder, args.RM_threshold)
  File "/opt/dnaPipeTE/dnaPipeTE.py", line 381, in __init__
    self.repeatmasker_run()
  File "/opt/dnaPipeTE/dnaPipeTE.py", line 400, in repeatmasker_run
    with open(self.output_folder+"/Trinity.fasta.out", 'r') as trinity_handle:
FileNotFoundError: [Errno 2] No such file or directory: 'dnaPipeTEst/output/Trinity.fasta.out'
```

:::danger
Several warnings & errors are included in the output:
```
awk: cannot open dnaPipeTEst/output/Trinity_run0/chrysalis/readsToComponents.out.sort (No such file or directory)
```
* As discussed [here](https://github.com/clemgoub/dnaPipeTE/issues/3), this is not an issue
```
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
```
* Unknown if this is an issue
```
Must specify basic parameters: ex. Trinity --seqType fq --single reads.fq --max_memory 10G at /opt/dnaPipeTE/bin/trinityrnaseq-Trinity-v2.5.1/Trinity line 853.
```
* Led to the failure of Trinity
* A check-up of the `Trinity` script led to the suspicion that this issue was caused by a faulty `$max_memory` argument being passed to Trinity by `dnaPipeTE`
* The `dnaPipeTE.py` script was locally edited to set Trinity's `$max_memory` argument to 8G, by changing line 314 to `self.Trinity_memory = str("8G")`
* This solved the issue, and hereafter I propose working with `/users/yvan/dnaPipeTE_edited.py`
* Ideally, we find the cause, likely pertaining to `Trinity_memory` via line 697 in `dnaPipeTE.py`:
```
Trinity(config['DEFAULT']['Trinity'], config['DEFAULT']['Trinity_memory'], args.cpu, config['DEFAULT']['Trinity_glue'], args.output_folder, sample_files, args.sample_number, args.contig_length)
```
```
sed: can't read dnaPipeTEst/output/Trinity_run2/Trinity.fasta: No such file or directory
```
* Very likely results from the failure of Trinity (`dnaPipeTEst/output/Trinity.fasta` is empty)
```
/bin/sh: 1: -pa: not found
Traceback (most recent call last):
  File "/opt/dnaPipeTE/dnaPipeTE.py", line 698, in <module>
    RepeatMasker(config['DEFAULT']['RepeatMasker'], args.RepeatMasker_library, args.RM_species, args.cpu, args.output_folder, args.RM_threshold)
  File "/opt/dnaPipeTE/dnaPipeTE.py", line 381, in __init__
    self.repeatmasker_run()
  File "/opt/dnaPipeTE/dnaPipeTE.py", line 400, in repeatmasker_run
    with open(self.output_folder+"/Trinity.fasta.out", 'r') as trinity_handle:
FileNotFoundError: [Errno 2] No such file or directory: 'dnaPipeTEst/output/Trinity.fasta.out'
```
* Likely results from the failure of Trinity (`dnaPipeTEst/output/Trinity.fasta` is empty)
* Possibly also from the issue described [here](https://superuser.com/questions/1634933/bin-sh-1-my-command-not-found).
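
The `not enought base to sample` message from attempt one compares the bases available in the input with a requested amount. In that log the requested amount matches `-genome_size` × `-genome_coverage`, so the largest usable coverage can be estimated by division. This is a minimal sketch that assumes the relationship holds in general; it is inferred from the logged numbers, not from dnaPipeTE's documentation, and the function names are hypothetical.

```shell
# Bases requested for sampling, ASSUMING genome_size * genome_coverage
# (inferred from the logged error message, not from dnaPipeTE's documentation)
bases_requested() {
  awk -v size="$1" -v cov="$2" 'BEGIN { printf "%.0f\n", size * cov }'
}

# Largest whole-number coverage that the available bases can support
max_whole_coverage() {
  awk -v bases="$1" -v size="$2" 'BEGIN { printf "%d\n", int(bases / size) }'
}

available=25974258253   # bases in the input, taken from the error message
genome_size=382000000   # value passed to -genome_size in attempts one and two

bases_requested "$genome_size" 353               # prints 134846000000, as in the error
bases_requested "$genome_size" 67                # prints 25594000000, below the available bases
max_whole_coverage "$available" "$genome_size"   # prints 67
```

Under this assumption, any `-genome_coverage` above 67 exceeds what this single library can provide for a 382 Mb genome size.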
:::

:::info
Suggestion:
* Find out why dnaPipeTE fails to pass the `$max_memory` argument to Trinity, or work with `dnaPipeTE_edited.py`
* Ensure that the following locale is supported: `LANGUAGE = (unset), LC_ALL = (unset), LANG = "en_US.UTF-8"`
* Verify the `-genome_coverage` of `Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq`
* It's possible the `-genome_coverage` argument asks for the *desired* genome coverage after dnaPipeTE's subsampling step, and that it should therefore be `0.5`
:::

### Attempt three

Following these issues, a third dnaPipeTE test was done, changing `-genome_coverage` to `0.5` and manually specifying Trinity's `$max_memory` as `8G` in `dnaPipeTE_edited.py`:

```
python3 /users/yvan/dnaPipeTE_edited.py \
  -input dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
  -output dnaPipeTEst/output \
  -cpu 8 \
  -sample_number 2 \
  -genome_size 382000000 \
  -genome_coverage 0.5 \
  -RM_lib Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
  -RM_t 0.33 \
  -contig_length 200
```

The run was manually stopped to correct the specified genome size (382 -> 455 Mb), adopt the 'default' RM threshold (0.33 -> 0.2), experiment with a coverage value of 0.15, and change the output folder name:

```
python3 /users/yvan/dnaPipeTE_edited.py \
  -input dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
  -output dnaPipeTEst/output_dstevensoni \
  -cpu 8 \
  -sample_number 2 \
  -genome_size 455000000 \
  -genome_coverage 0.15 \
  -RM_lib Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
  -RM_t 0.2 \
  -contig_length 200
```

Output was as follows:

```
Let's go !!!
Start time: Mon Nov 4 11:18:20 2024
generating trinity samples...
total number of reads: 172023527
not enought base to sample 25974258253 vs 62653499999 to sample
```

:::success
* Trinity ran successfully, producing substantial output
:::

:::danger
* RepeatMasker produced the following error:
```
perl: warning: Falling back to the standard locale ("C").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
/bin/sh: 1: -pa: not found
Traceback (most recent call last):
  File "/users/yvan/dnaPipeTE_edited.py", line 698, in <module>
    RepeatMasker(config['DEFAULT']['RepeatMasker'], args.RepeatMasker_library, args.RM_species, args.cpu, args.output_folder, args.RM_threshold)
  File "/users/yvan/dnaPipeTE_edited.py", line 381, in __init__
    self.repeatmasker_run()
  File "/users/yvan/dnaPipeTE_edited.py", line 400, in repeatmasker_run
    with open(self.output_folder+"/Trinity.fasta.out", 'r') as trinity_handle:
FileNotFoundError: [Errno 2] No such file or directory: '/groups/fr2020/Isa/dnaPipeTEst/output_dstevensoni//Trinity.fasta.out'
```
* [A request for help was made on GitHub](https://github.com/clemgoub/dnaPipeTE/issues/50)
* As a result, in attempt four, the path specified for the `-output` argument was changed to `/mnt/output_dstevensoni`
:::

### Attempt four

:::info
* Following the successful completion of Trinity in attempt three, a checkpoint was passed.
* Attempt four therefore started with RepeatMasker
* Since the Trinity step was already complete, the edited `dnaPipeTE.py` script was no longer needed.
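
Before resuming from the Trinity checkpoint, it may help to confirm that the assembly output is actually present, since an empty `Trinity.fasta` was the likely cause of the RepeatMasker failure in attempt two. This is a minimal sketch: the function name `trinity_output_ok` is hypothetical, the location of `Trinity.fasta` directly inside the `-output` folder is taken from the logs above, and the demo folder only stands in for a real output folder.

```shell
# Succeeds only if <output folder>/Trinity.fasta exists and is not empty
trinity_output_ok() {
  fasta="$1/Trinity.fasta"
  if [ ! -s "$fasta" ]; then
    echo "missing or empty: $fasta" >&2
    return 1
  fi
  # Count FASTA header lines to report the number of assembled contigs
  n_contigs="$(grep -c '^>' "$fasta")"
  echo "contigs: $n_contigs"
}

# Demo on a throw-away folder standing in for an output folder
demo_dir="$(mktemp -d)"
printf '>contig1\nACGT\n>contig2\nTTGA\n' > "$demo_dir/Trinity.fasta"
trinity_output_ok "$demo_dir"   # prints "contigs: 2"
```

On the cluster, the same function could be pointed at the bound output folder, for example `trinity_output_ok /mnt/output_dstevensoni`.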
:::

The following code was used:

```
python3 dnaPipeTE.py \
  -input /mnt/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
  -output /mnt/output_dstevensoni/ \
  -cpu 14 \
  -sample_size 2 \
  -genome_size 455000000 \
  -genome_coverage 0.15 \
  -RM_lib /groups/fr2020/Isa/Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
  -RM_t 0.4 \
  -contig_length 2 \
  1>/groups/fr2020/Isa/dnaPipeTEst/STD_OUTPUT_Ds.txt \
  2>/groups/fr2020/Isa/dnaPipeTEst/STD_ERROR_Ds.txt
```

:::success
* **Attempt four was successful**
* STD_OUTPUT and STD_ERROR still have to be exported, however
:::

```
python3 dnaPipeTE.py \
  -input /mnt/data/Ds/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
  -output /mnt/output_dstevensoni_2 \
  -genome_size 455000000 \
  -genome_coverage 0.15 \
  -sample_number 2 \
  -RM_lib /mnt/data/Ds/Ds_TE_families.fa \
  -RM_t 0.40 \
  -cpu 14 \
  -contig_length 200
```

#### Result interpretation

* `Trinity.fasta`: includes the sequences of all repeats
* `read_per_component_and_annotation`: includes the counts in bp and reads for each repeat contig present in `Trinity.fasta`, and reports the annotation passing the threshold `-RM_t`
* ==Not found==

### Post-processing

```
python3 /users/yvan/dnaPipeTE_edited.py \
  -input Input/1_raw/Ds/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
  -output dnaPipeTEst/output_dstevensoni \
  -cpu 8 \
  -sample_number 2 \
  -genome_size 455000000 \
  -genome_coverage 0.15 \
  -RM_lib Input/1_raw/Ds/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
  -RM_t 0.25 \
  -contig_length 200
```

## *N. monacha* SRX5491106 R1

```
python3 /users/yvan/dnaPipeTE_edited.py \
  -input Input/1_raw/Nm/SRR8695262_pass_paired_1.fastq.gz \
  -output dnaPipeTEst/output_Nm \
  -cpu 8 \
  -sample_number 2 \
  -genome_size 425000000 \
  -genome_coverage 0.15 \
  -RM_lib Input/1_raw/Nm/Nm_REPET_denovoLibTEs_filtered_MCL.fa.classified \
  -RM_t 0.25 \
  -contig_length 200
```

:::warning
* The `-RM_lib` path given is an approximation of the true path.
:::

On the cluster, attempt combining the different insert-size libraries for *D. stevensoni*. On the local PC, attempt dnaPipeTE with this as input instead:

```
# Combine the libraries; -o lets seqkit gzip the output to match the .gz suffix
seqkit seq trimmomatic/Darwinula_stevensoni_???????_pass_paired_1_fixed.fastq.gz \
  -o Darwinula_stevensoni_combined_pass_paired_1_fixed.fastq.gz

# D. stevensoni
python3 dnaPipeTE.py \
  -input /mnt/Input/1_raw/data/Ds/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq.gz \
  -output /mnt/Output/output_Ds \
  -cpu 8 \
  -sample_number 2 \
  -genome_size 455000000 \
  -genome_coverage 0.15 \
  -RM_lib /mnt/Input/1_raw/data/Ds/Dsillumina_denovoLibTEs_filtered_MCL.fa.classified \
  -RM_t 0.25 \
  -contig_length 200
dnaPT_charts.sh -I /mnt/Output/output_Ds
dnaPT_landscapes.sh -I /mnt/Output/output_Ds -p DM
dnaPT_landscapes.sh -I /mnt/Output/output_Ds -p DM -S

# N. monacha
python3 dnaPipeTE.py \
  -input /mnt/Input/1_raw/data/Nm/SRR8695262_pass_paired_1.fastq.gz \
  -output /mnt/Output/output_Nm \
  -cpu 8 \
  -sample_number 2 \
  -genome_size 425000000 \
  -genome_coverage 0.15 \
  -RM_lib /mnt/Input/1_raw/data/Nm/Nm_REPET_denovoLibTEs_filtered_MCL.fa.classified \
  -RM_t 0.25 \
  -contig_length 200
dnaPT_charts.sh -I /mnt/Output/output_Nm
dnaPT_landscapes.sh -I /mnt/Output/output_Nm -p DM
dnaPT_landscapes.sh -I /mnt/Output/output_Nm -p DM -S

# Ct: concatenate both TE libraries and drop duplicated sequences
seqkit seq Ds_vs_Nm/Ds/REPET/TEdenovo-library/Dsillumina_denovoLibTEs_filtered_MCL.fa.classified \
  Ds_vs_Nm/Nm/REPET/TEdenovo-library/Nm_REPET_denovoLibTEs_filtered_MCL.fa.classified |
  seqkit rmdup -s > dnaPipeTEst/data/Ct/TE_cat.fa.classified
python3 dnaPipeTE.py \
  -input /mnt/Input/1_raw/data/Ct/SRR8695257_pass_paired_1.fastq.gz \
  -output /mnt/Output/output_Ct \
  -cpu 8 \
  -sample_number 2 \
  -genome_size 334833579 \
  -genome_coverage 0.15 \
  -RM_lib /mnt/Input/1_raw/data/Ct/TE_cat.fa.classified \
  -RM_t 0.25 \
  -contig_length 200
dnaPT_charts.sh -I /mnt/Output/output_Ct
dnaPT_landscapes.sh -I /mnt/Output/output_Ct -p DM
dnaPT_landscapes.sh -I /mnt/Output/output_Ct -p DM -S
```

*[GFEs]: Gene Family Expansions

## Slurm

##### Nothing important. I just found this Slurm script (RIT computing uses Slurm) by someone (Jacob Lamb) who is running dnaPipeTE on a cluster. Saved here for future reference (`dnaPipeTE.sh`):

```
#!/bin/bash
#SBATCH --job-name=dnaPipeTE
#SBATCH --partition=week-long-highmem
#SBATCH --cpus-per-task=16
#SBATCH -N 1 #nodes
#SBATCH -o dnaPipeTE_%j.o
#SBATCH -e dnaPipeTE_%j.e

cd "/nfs/home/jlamb/Projects/dnaPipeTE_rnd_2"
JOB_ID=$SLURM_JOB_ID
TMP_DIR="$(pwd)/temp_$JOB_ID"
mkdir -p "$TMP_DIR" && export TMPDIR="$TMP_DIR"

#should only have to pass in $1 which is the SRX########
module load singularity/4.1.2
module load python3

# Use singularity exec instead of shell to run the subsequent commands in the container
singularity exec --bind /nfs/home/jlamb/Projects/dnaPipeTE_rnd_2/$1:/mnt ~/bin/dnaPipeTE/dnapipte.img /bin/bash <<EOF
# Set locale environment variables to avoid locale warnings
export LANG=C.UTF-8
export LC_ALL=C.UTF-8

cd /opt/dnaPipeTE

python3 dnaPipeTE.py \
  -input /mnt/${1}_nuclear_1.fastq \
  -output /mnt/dnaPipeTE_15gb_10gc_RMt15_smpl_2 \
  -cpu 16 \
  -genome_size 15000000000 \
  -genome_coverage 0.1 \
  -RM_lib /nfs/home/jlamb/TE_libs/dedupe_telib.fasta \
  -RM_t 0.15 \
  -sample_number 2
EOF

rm -rf "$TMP_DIR"
```
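
The locale exports in the script above may also be relevant to the `perl: warning: Setting locale failed` messages logged in attempts two and three. Below is a minimal sketch for an interactive session. Whether `C.UTF-8` is installed in a given image is an assumption, so the hypothetical helper `pick_locale` falls back to plain `C`.

```shell
# Pick a locale that is installed, so perl has no unsupported setting to warn about.
# C.UTF-8 is the value used in the Slurm script above; plain C is the fallback.
pick_locale() {
  if command -v locale >/dev/null 2>&1 &&
     locale -a 2>/dev/null | grep -qiE '^C\.utf-?8$'; then
    echo "C.UTF-8"
  else
    echo "C"
  fi
}

chosen_locale="$(pick_locale)"
export LANG="$chosen_locale"
export LC_ALL="$chosen_locale"
echo "Using locale: $LC_ALL"
```

Running these lines at the `Apptainer>` prompt before calling `dnaPipeTE.py` would apply the setting to that session only.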