# dnaPipeTE
###### tags: `TE`
https://github.com/clemgoub/dnaPipeTE

## Intro tutorial
Intro tutorial with some data examples (Drosophila) can be found [here](https://tehub.org/en/tutorials/docs/dnaPipeTE).
## Overview
*dnaPipeTE* provides rough estimates of TE loads from short-read input data, additionally grouped by TE class if a TE reference library is provided. Down-stream processing of the output folder with *dnaPT_utils* then produces overview graphs.
The workflow goes as follows:
```mermaid
graph TB
id1[Non-paired, short-read
FASTA sequences] & id2["`Optional:
TE reference library
(*RefLib*)`"] -- dnaPipeTE --> id3["`TE-load estimates
+
*IF RefLib:*
TE classification`"] -- dnaPT_utils --> id4["`TE visualizations
(TE landscape plot)`"]
```
:::info
Notes:
1. Assemblies can also be used as input, if they're first used to simulate non-paired and short-read sequences as required by *dnaPipeTE*.
2. If a curated *RefLib* is unavailable, consider building a reference library based on sequence similarity with an existing *RefLib*.
3. RNA-seq can also be used as input. In this case, note that the identified abundances reflect TE expression levels rather than TE abundances in the genome.
:::
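Note 1 can be illustrated with a minimal sketch that samples unpaired, fixed-length reads from an assembled sequence. The function names (`simulate_reads`, `to_fasta`), the read length and the coverage value are hypothetical choices for illustration; the sketch ignores sequencing errors and strand, so a dedicated read simulator is likely preferable for real analyses.

```python
import random


def simulate_reads(sequence, read_length=150, coverage=1.0, seed=0):
    """Sample unpaired reads of fixed length uniformly along one sequence."""
    if len(sequence) < read_length:
        return []
    rng = random.Random(seed)  # fixed seed for reproducibility
    # Number of reads needed to reach the requested fold coverage
    n_reads = int(len(sequence) * coverage / read_length)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(sequence) - read_length + 1)
        reads.append(sequence[start:start + read_length])
    return reads


def to_fasta(reads, prefix="sim"):
    """Format reads as FASTA text with numbered headers."""
    return "".join(f">{prefix}_{i}\n{read}\n" for i, read in enumerate(reads))


contig = "ACGT" * 250  # toy 1,000 bp "assembly"
reads = simulate_reads(contig, read_length=100, coverage=2.0)
print(len(reads), len(reads[0]))  # 20 100
```

The resulting FASTA text could then be written to a file and used as the `-input` of *dnaPipeTE*.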
## Initialization
*dnaPipeTE* expects to be run in a graphical environment. For this reason, first connect to a ==computing== cluster using *X2Go*.
Before starting, create a working directory for this exercise, hereafter referred to as `~/Project`.
For reproducibility's sake, *dnaPipeTE* has been installed on the cluster as an *Apptainer* image. A quick run-down on the difference between images and containers, such as those used in Docker, can be found [here](https://circleci.com/blog/docker-image-vs-container/).
In the command line, initialize the *dnaPipeTE* image using the following command:
```
singularity shell --bind \
/groups/fr2020/Isa/dnaPipeTEst:/mnt \
/bioware/apptainer-images/dnapipete.img
```
Ensure the use of ==absolute pathnames==, as well as the inclusion of the ==colon== (an explanation of why this is important can be found [here](https://hsf-training.github.io/hsf-training-singularity-webpage/02-running-containers/index.html#bound-directories)). The console prompt should now start with `Apptainer>`.
Once in the container, ==change the working directory== and test the program:
```
cd /opt/dnaPipeTE
python3 dnaPipeTE.py
```
After verifying the installation of *dnaPipeTE*, we move on to prepare our first analyses. These aim to supplement results in [Schön *et al.* (2021)](https://www.mdpi.com/2073-4425/12/3/401) with those obtained through *dnaPipeTE*.
## *D. stevensoni* SRX5491116 R1
### Running dnaPipeTE
*dnaPipeTE* requires several arguments to be specified in order to function, an overview of which is provided below. The last column lists suggested values for the analysis of *Darwinula stevensoni* library [SRX5491116](https://www.ncbi.nlm.nih.gov/sra/SRX5491116).
|Variable|Format|Remark|*D. stevensoni* Value|
|-|-|-|-|
|*-input*|*fastq* File path|Quality filtered & no contaminating organelles or excessive GFEs|`dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq`|
|*-output*|Output folder path||`dnaPipeTEst/output`|
|*-cpu*|Nr. of cores to use||`2`|
| *-sample_number* |Nr. of Trinity iterations|Recommended: start with 2, experiment with 3-4|`2`|
|*-genome_size*|Estimated size of genome (in bp)| *Darwinula stevensoni*: 455.000.000, *Notodromas monacha* 425.000.000|`455000000` or `425000000`|
|*-genome_coverage*|Fold coverage for input of genome OR desired fold coverage of input after sub-sampling step|Tran Van *et al.* (2021) Supplementary methods S3|`137.7` or `0.15` (experiment)|
|*-RM_lib*|*fasta* File path|Header format: `>name#CLASS/Subclass`==^note^==|`Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa`|
|*-RM_t*|$0 < X < 1$|Min. alignment fraction of query with TE reference|`0.33`|
|*-keep_Trinity_output*|Boolean|Many & large intermediate files -> Not recommended|`FALSE`|
|*-contig_length*|Min. TE length (bp)|[dnaPipeTE Github](https://github.com/clemgoub/dnaPipeTE) has 200bp|`200`|
:::warning
==Note:== Possible header format violations:
* `CLASS` must be one of: DNA, LINE, LTR, SINE, MITE, Helitron, Simple Repeat, Satellite
* Some headers have `CLASS` set to `Unknown` (**Remove these uninformative references?**)
* Following `Subclass`, some headers carry additional info (**Remove extra info?**)
:::
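The header checks listed above can be sketched as a small validator. The regular expression, the function name `check_header`, and the choice to treat `Subclass` as optional are assumptions made for illustration; they are not taken from *dnaPipeTE* or RepeatMasker.

```python
import re

# Allowed CLASS values, as listed in the note above
ALLOWED_CLASSES = {
    "DNA", "LINE", "LTR", "SINE", "MITE",
    "Helitron", "Simple Repeat", "Satellite",
}

# >name#CLASS or >name#CLASS/Subclass, with nothing after Subclass
# (assumption: Subclass is optional)
HEADER_RE = re.compile(r"^>(?P<name>[^#\s]+)#(?P<cls>[^/]+)(?:/(?P<sub>\S+))?$")


def check_header(line):
    """Return None if the header looks valid, else a short problem description."""
    match = HEADER_RE.match(line.rstrip("\n"))
    if match is None:
        return "does not match >name#CLASS/Subclass"
    if match.group("cls") not in ALLOWED_CLASSES:
        return f"unexpected CLASS {match.group('cls')!r}"
    return None


for header in [">TE1#DNA/hAT", ">TE2#Unknown", ">TE3#LTR/Gypsy extra info"]:
    print(header, "->", check_header(header))
```

Running such a check over all lines starting with `>` in the `-RM_lib` file would list the headers that need to be removed or trimmed.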
Before dnaPipeTE can be executed with these arguments, the specified folders need to be created ==^note^== and the input FASTQ needs to be decompressed:
```
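cd /groups/fr2020/Isa/
# Create the data and output folders (-p: no error if they already exist)
mkdir -p dnaPipeTEst/data
mkdir -p dnaPipeTEst/output_dstevensoni
# Decompress the reads; -k keeps the .gz, the .fastq is written next to it
gzip -dk Illumina_ds_paired_reads/trimmomatic/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq.gz
mv Illumina_ds_paired_reads/trimmomatic/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
dnaPipeTEst/data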
```
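To keep the argument values in one place, the table above can be mirrored in a dictionary from which the command line is assembled. This is only a sketch: `build_command` is a hypothetical helper, the values are those suggested in the table (with `0.15` chosen as coverage), and the Boolean `-keep_Trinity_output` flag is left out because it takes no value.

```python
import shlex


def build_command(script, params):
    """Turn {flag: value} pairs into an argument list: -flag value ..."""
    argv = ["python3", script]
    for flag, value in params.items():
        argv += [f"-{flag}", str(value)]
    return argv


# Values taken from the argument table above
params = {
    "input": "dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq",
    "output": "dnaPipeTEst/output",
    "cpu": 2,
    "sample_number": 2,
    "genome_size": 455000000,
    "genome_coverage": 0.15,
    "RM_lib": "Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa",
    "RM_t": 0.33,
    "contig_length": 200,
}

argv = build_command("/opt/dnaPipeTE/dnaPipeTE.py", params)
print(shlex.join(argv))  # shell-quoted command, ready to paste or log
```

The list form (`argv`) can be passed directly to `subprocess.run` without going through a shell.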
### Attempt one
With values prepared for each argument and all data in place, *dnaPipeTE* can be run:
```
python3 /opt/dnaPipeTE/dnaPipeTE.py \
-input dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
-output dnaPipeTEst/output \
-cpu 8 -sample_number 2 -genome_size 382000000 -genome_coverage 353 \
-RM_lib Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
-RM_t 0.33 -contig_length 200
```
:::danger
* Use of `-genome_coverage 353` induced the following error:
```
...
Let's go !!!
Start time: Mon Oct 28 09:02:21 2024
generating trinity samples...
total number of reads: 172023527
not enought base to sample 25974258253 vs 134846000000 to sample
```
* Experimenting with lower `-genome_coverage` values reduced the number `134846000000` in the error.
* Likely, `Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq` alone provides less than 353x coverage, since the full genome was assembled using 2 additional read lengths, as well as mate-pair sequences.
* Calculations suggested `-genome_coverage 67` would reduce the number of bases to sample (`134846000000` in the original error, i.e. 353 × 382000000) to less than the `25974258253` bases available, circumventing this error.
* This was the case in attempt two, although ==the actual genome coverage of `Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq` must be checked==.
* Given this issue, `-genome_coverage` should perhaps be the desired coverage after sub-sampling by dnaPipeTE. Attempt three therefore uses `0.5` as a value.
:::
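The calculation behind the value `67` can be reproduced as follows. The relation "bases to sample = genome size × coverage" is inferred from the numbers in the error message (353 × 382000000 = 134846000000), not from the *dnaPipeTE* documentation, and the function names are hypothetical.

```python
import math


def bases_required(genome_size, genome_coverage):
    """Bases that would need to be sampled for the requested coverage."""
    return genome_size * genome_coverage


def max_integer_coverage(bases_available, genome_size):
    """Largest whole-number coverage the input can supply."""
    return math.floor(bases_available / genome_size)


bases_available = 25_974_258_253  # first number in the error message
genome_size = 382_000_000         # value passed to -genome_size

print(bases_required(genome_size, 353))                    # 134846000000
print(max_integer_coverage(bases_available, genome_size))  # 67
```

Under the same assumption, a genome size of 455000000 would lower the maximum to 57x for this file.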
### Attempt two
Due to the above error, a second attempt was made where `-genome_coverage` was changed to `67`:
```
python3 /opt/dnaPipeTE/dnaPipeTE.py \
-input dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
-output dnaPipeTEst/output \
-cpu 8 -sample_number 2 -genome_size 382000000 -genome_coverage 67 \
-RM_lib Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
-RM_t 0.33 \
-contig_length 200
```
This led to the following output, with errors discussed below:
```
...
Let's go !!!
Start time: Mon Oct 28 09:50:48 2024
generating trinity samples...
total number of reads: 172023527
maximum number of reads to sample: 172023526
fastq : dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq
sampling 2 samples of max 86011763 reads to reach coverage...
12987129919 bases sampled in 86011763 reads
s_Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq done.
12987128183 bases sampled in 172023526 reads
s_Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq done.
['s0_Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq.fasta', 's1_Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq.fasta']
###################################
### TRINITY to assemble repeats ###
###################################
***** TRINITY iteration 1 *****
Selecting reads for Trinity iteration number 1...
awk: cannot open dnaPipeTEst/output/Trinity_run0/chrysalis/readsToComponents.out.sort (No such file or directory)
Done
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Must specify basic parameters: ex. Trinity --seqType fq --single reads.fq --max_memory 10G at /opt/dnaPipeTE/bin/trinityrnaseq-Trinity-v2.5.1/Trinity line 853.
Trinity iteration 1 Done'
###################################
### TRINITY to assemble repeats ###
###################################
***** TRINITY iteration 2 *****
Selecting reads for Trinity iteration number 2...
awk: cannot open dnaPipeTEst/output/Trinity_run1/chrysalis/readsToComponents.out.sort (No such file or directory)
Done
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Must specify basic parameters: ex. Trinity --seqType fq --single reads.fq --max_memory 10G at /opt/dnaPipeTE/bin/trinityrnaseq-Trinity-v2.5.1/Trinity line 853.
Trinity iteration 2 Done'
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
sed: can't read dnaPipeTEst/output/Trinity_run2/Trinity.fasta: No such file or directory
renaming Trinity output...
awk: cannot open dnaPipeTEst/output/Trinity_run2/Trinity.fasta (No such file or directory)
done
dnaPipeTEst/output/Annotation/one_RM_hit_per_Trinity_contigs
dnaPipeTEst/output/Annotation/Best_RM_annot_80-80
dnaPipeTEst/output/Annotation/Best_RM_annot_partial
#######################################
### REPEATMASKER to anotate contigs ###
#######################################
/bin/sh: 1: -pa: not found
Traceback (most recent call last):
File "/opt/dnaPipeTE/dnaPipeTE.py", line 698, in <module>
RepeatMasker(config['DEFAULT']['RepeatMasker'], args.RepeatMasker_library, args.RM_species, args.cpu, args.output_folder, args.RM_threshold)
File "/opt/dnaPipeTE/dnaPipeTE.py", line 381, in __init__
self.repeatmasker_run()
File "/opt/dnaPipeTE/dnaPipeTE.py", line 400, in repeatmasker_run
with open(self.output_folder+"/Trinity.fasta.out", 'r') as trinity_handle:
FileNotFoundError: [Errno 2] No such file or directory: 'dnaPipeTEst/output/Trinity.fasta.out'
```
:::danger
Several warnings & errors are included in the output:
```
awk: cannot open dnaPipeTEst/output/Trinity_run0/chrysalis/readsToComponents.out.sort (No such file or directory)
```
* As discussed [here](https://github.com/clemgoub/dnaPipeTE/issues/3), this is not an issue
```
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
```
* Unknown if this is an issue
```
Must specify basic parameters: ex. Trinity --seqType fq --single reads.fq --max_memory 10G at /opt/dnaPipeTE/bin/trinityrnaseq-Trinity-v2.5.1/Trinity line 853.
```
* Caused Trinity to fail
* Inspecting the `Trinity` script raised the suspicion that this issue was caused by a faulty `$max_memory` argument being passed to Trinity by `dnaPipeTE`
* The `dnaPipeTE.py` script was locally edited to set Trinity's `$max_memory` argument to 8G, by changing line 314 to `self.Trinity_memory = str("8G")`
* This solved the issue; hereafter I propose working with `/users/yvan/dnaPipeTE_edited.py`
* Ideally, we find the cause, which likely pertains to `Trinity_memory` as passed on line 697 of `dnaPipeTE.py`:
```
Trinity(config['DEFAULT']['Trinity'], config['DEFAULT']['Trinity_memory'], args.cpu, config['DEFAULT']['Trinity_glue'], args.output_folder, sample_files, args.sample_number, args.contig_length)
```
```
sed: can't read dnaPipeTEst/output/Trinity_run2/Trinity.fasta: No such file or directory
```
* Very likely results from failing of Trinity (`dnaPipeTEst/output/Trinity.fasta` is empty)
```
/bin/sh: 1: -pa: not found
Traceback (most recent call last):
File "/opt/dnaPipeTE/dnaPipeTE.py", line 698, in <module>
RepeatMasker(config['DEFAULT']['RepeatMasker'], args.RepeatMasker_library, args.RM_species, args.cpu, args.output_folder, args.RM_threshold)
File "/opt/dnaPipeTE/dnaPipeTE.py", line 381, in __init__
self.repeatmasker_run()
File "/opt/dnaPipeTE/dnaPipeTE.py", line 400, in repeatmasker_run
with open(self.output_folder+"/Trinity.fasta.out", 'r') as trinity_handle:
FileNotFoundError: [Errno 2] No such file or directory: 'dnaPipeTEst/output/Trinity.fasta.out'
```
* Likely results from failing of Trinity (`dnaPipeTEst/output/Trinity.fasta` is empty)
* Possibly also from the issue described [here](https://superuser.com/questions/1634933/bin-sh-1-my-command-not-found).
:::
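As a generic illustration of how a message like `/bin/sh: 1: -pa: not found` can arise: when a command string is built by concatenating a program path with its options and that path is empty, the shell treats the first option as the program name. The command layout and the helper `run_tool` below are hypothetical; this does not establish that the same happened inside *dnaPipeTE*.

```python
import subprocess


def run_tool(tool_path, cpu):
    """Build a shell command string from a tool path and run it."""
    command = f"{tool_path} -pa {cpu} -s"  # hypothetical command layout
    return subprocess.run(command, shell=True, capture_output=True, text=True)


# With an empty tool path, the shell tries to execute "-pa" itself
result = run_tool("", 8)
print(result.returncode)       # 127: command not found
print(result.stderr.strip())   # shell-specific "not found" message
```

If this is the mechanism at play, checking which value the script reads for the RepeatMasker path would be a reasonable first step.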
:::info
Suggestion:
* Find out dnaPipeTE's issue passing the `$max_memory` argument to Trinity, or work with `dnaPipeTE_edited.py`
* Ensure that the following locale is supported
`LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_US.UTF-8"`
* Verify `-genome_coverage` of `Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq`
* It's possible the `-genome_coverage` argument asks for the *desired* genome coverage after dnaPipeTE's subsampling step, and that it should therefore be `0.5`
:::
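For the locale suggestion, one option is to set the locale variables explicitly for the process that launches the pipeline. The sketch below uses `C.UTF-8`, the value used in the Slurm script at the end of this note; whether that locale is installed in the container is an assumption, and `locale_env` is a hypothetical helper.

```python
import os
import subprocess


def locale_env(locale_name="C.UTF-8"):
    """Copy the current environment and set LANG and LC_ALL explicitly."""
    env = dict(os.environ)
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name
    return env


# Child processes started with this environment see the explicit locale
result = subprocess.run(
    ["sh", "-c", 'echo "$LANG $LC_ALL"'],
    env=locale_env(),
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # C.UTF-8 C.UTF-8
```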
### Attempt three
Following these issues, a third dnaPipeTE test was done, changing `-genome_coverage` to `0.5` and manually specifying Trinity's `$max_memory` as `8G` in `dnaPipeTE_edited.py`:
```
python3 /users/yvan/dnaPipeTE_edited.py \
-input dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
-output dnaPipeTEst/output \
-cpu 8 \
-sample_number 2 \
-genome_size 382000000 \
-genome_coverage 0.5 \
-RM_lib Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
-RM_t 0.33 \
-contig_length 200
```
The run was manually stopped to correct the specified genome size (382 -> 455 Mb), adopt the 'default' RM threshold (0.33 -> 0.2), experiment with a coverage value of 0.15, and change the output folder name:
```
python3 /users/yvan/dnaPipeTE_edited.py \
-input dnaPipeTEst/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
-output dnaPipeTEst/output_dstevensoni \
-cpu 8 \
-sample_number 2 \
-genome_size 455000000 \
-genome_coverage 0.15 \
-RM_lib Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
-RM_t 0.2 \
-contig_length 200
```
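Output was as follows. Note that `62653499999` corresponds to 137.7 × 455000000, so this log appears to stem from a run with `-genome_coverage 137.7` rather than `0.15`: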
```
Let's go !!!
Start time: Mon Nov 4 11:18:20 2024
generating trinity samples...
total number of reads: 172023527
not enought base to sample 25974258253 vs 62653499999 to sample
```
:::success
* Trinity ran successfully, producing substantial output
:::
:::danger
* RepeatMasker produced the following error:
```
perl: warning: Falling back to the standard locale ("C").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
/bin/sh: 1: -pa: not found
Traceback (most recent call last):
File "/users/yvan/dnaPipeTE_edited.py", line 698, in <module>
RepeatMasker(config['DEFAULT']['RepeatMasker'], args.RepeatMasker_library, args.RM_species, args.cpu, args.output_folder, args.RM_threshold)
File "/users/yvan/dnaPipeTE_edited.py", line 381, in __init__
self.repeatmasker_run()
File "/users/yvan/dnaPipeTE_edited.py", line 400, in repeatmasker_run
with open(self.output_folder+"/Trinity.fasta.out", 'r') as trinity_handle:
FileNotFoundError: [Errno 2] No such file or directory: '/groups/fr2020/Isa/dnaPipeTEst/output_dstevensoni//Trinity.fasta.out'
```
* [A request for help was made on GitHub](https://github.com/clemgoub/dnaPipeTE/issues/50)
* As a result, in attempt four, the path specified for the `-output` argument was changed to `/mnt/output_dstevensoni`
:::
### Attempt four
:::info
* Following successful completion of Trinity in attempt three, a checkpoint was passed.
* Attempt four therefore started with RepeatMasker
* Since the Trinity step had already completed, the edited `dnaPipeTE.py` script was no longer needed.
:::
The following code was used:
```
python3 dnaPipeTE.py \
-input /mnt/data/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
-output /mnt/output_dstevensoni/ \
-cpu 14 \
-sample_size 2 \
-genome_size 455000000 \
-genome_coverage 0.15 \
-RM_lib /groups/fr2020/Isa/Ds_ONT_EarlGrey/Ds_ONT_EarlGrey_Database/Ds_ONT_EarlGrey-families.fa \
-RM_t 0.4 \
-contig_length 2 \
1>/groups/fr2020/Isa/dnaPipeTEst/STD_OUTPUT_Ds.txt \
2>/groups/fr2020/Isa/dnaPipeTEst/STD_ERROR_Ds.txt
```
:::success
* **Attempt four was successful**
* STD_OUTPUT and STD_ERROR still have to be exported, however
:::
```
python3 dnaPipeTE.py \
-input /mnt/data/Ds/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
-output /mnt/output_dstevensoni_2 \
-genome_size 455000000 \
-genome_coverage 0.15 \
-sample_number 2 \
-RM_lib /mnt/data/Ds/Ds_TE_families.fa \
-RM_t 0.40 \
-cpu 14 \
-contig_length 200
```
#### Result interpretation
* `Trinity.fasta`: Includes sequences of all repeats
* `read_per_component_and_annotation`: includes the counts in bp and reads for each repeat contig present in `Trinity.fasta`, and reports the annotation passing the threshold `-RM_t`
* ==Not found==
### Post-processing
```
python3 /users/yvan/dnaPipeTE_edited.py \
-input Input/1_raw/Ds/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
-output dnaPipeTEst/output_dstevensoni \
-cpu 8 \
-sample_number 2 \
-genome_size 455000000 \
-genome_coverage 0.15 \
-RM_lib Input/1_raw/Ds/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq \
-RM_t 0.25 \
-contig_length 200
```
## *N. monacha* SRX5491106 R1
```
python3 /users/yvan/dnaPipeTE_edited.py \
-input Input/1_raw/Nm/SRR8695262_pass_paired_1.fastq.gz \
-output dnaPipeTEst/output_Nm \
-cpu 8 \
-sample_number 2 \
-genome_size 425000000 \
-genome_coverage 0.15 \
-RM_lib Input/1_raw/Nm/Nm_REPET_denovoLibTEs_filtered_MCL.fa.classified \
-RM_t 0.25 \
-contig_length 200
```
:::warning
* `-RM_lib` path given is an approximation of the true path.
:::
On the cluster, attempt combining the different insert size libraries for *D. stevensoni*. On the local PC, attempt dnaPipeTE with this as input instead:
```
seqkit seq trimmomatic/Darwinula_stevensoni_???????_pass_paired_1_fixed.fastq.gz -o Darwinula_stevensoni_combined_pass_paired_1_fixed.fastq.gz
python3 dnaPipeTE.py \
-input /mnt/Input/1_raw/data/Ds/Darwinula_stevensoni_250_350_pass_paired_1_fixed.fastq.gz \
-output /mnt/Output/output_Ds \
-cpu 8 \
-sample_number 2 \
-genome_size 455000000 \
-genome_coverage 0.15 \
-RM_lib /mnt/Input/1_raw/data/Ds/Dsillumina_denovoLibTEs_filtered_MCL.fa.classified \
-RM_t 0.25 \
-contig_length 200;\
mkdir_
dnaPT_charts.sh -I /mnt/Output/output_Ds ;\
dnaPT_landscapes.sh -I /mnt/Output/output_Ds -p DM;\
dnaPT_landscapes.sh -I /mnt/Output/output_Ds -p DM -S;\
python3 dnaPipeTE.py \
-input /mnt/Input/1_raw/data/Nm/SRR8695262_pass_paired_1.fastq.gz \
-output /mnt/Output/output_Nm \
-cpu 8 \
-sample_number 2 \
-genome_size 425000000 \
-genome_coverage 0.15 \
-RM_lib /mnt/Input/1_raw/data/Nm/Nm_REPET_denovoLibTEs_filtered_MCL.fa.classified \
-RM_t 0.25 \
-contig_length 200;\
dnaPT_charts.sh -I /mnt/Output/output_Nm;\
dnaPT_landscapes.sh -I /mnt/Output/output_Nm -p DM;\
dnaPT_landscapes.sh -I /mnt/Output/output_Nm -p DM -S;\
seqkit seq Ds_vs_Nm/Ds/REPET/TEdenovo-library/Dsillumina_denovoLibTEs_filtered_MCL.fa.classified \
Ds_vs_Nm/Nm/REPET/TEdenovo-library/Nm_REPET_denovoLibTEs_filtered_MCL.fa.classified |\
seqkit rmdup -s > dnaPipeTEst/data/Ct/TE_cat.fa.classified;\
python3 dnaPipeTE.py \
-input /mnt/Input/1_raw/data/Ct/SRR8695257_pass_paired_1.fastq.gz \
-output /mnt/Output/output_Ct \
-cpu 8 \
-sample_number 2 \
-genome_size 334833579 \
-genome_coverage 0.15 \
-RM_lib /mnt/Input/1_raw/data/Ct/TE_cat.fa.classified \
-RM_t 0.25 \
-contig_length 200;\
dnaPT_charts.sh -I /mnt/Output/output_Ct;\
dnaPT_landscapes.sh -I /mnt/Output/output_Ct -p DM;\
dnaPT_landscapes.sh -I /mnt/Output/output_Ct -p DM -S
```
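For combining the gzipped read libraries, a simple alternative is byte-wise concatenation, since a file made of several gzip members is itself a valid gzip file. The sketch below demonstrates this on toy data; `concat_gzip` and the file names are hypothetical, and the real library paths would need to be substituted.

```python
import gzip
import pathlib
import shutil
import tempfile


def concat_gzip(inputs, output):
    """Concatenate gzip files byte-wise; the result is a multi-member gzip file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Toy demonstration with two single-record FASTQ files
tmp = pathlib.Path(tempfile.mkdtemp())
parts = []
for i, record in enumerate(["@r1\nACGT\n+\nIIII\n", "@r2\nTTGA\n+\nIIII\n"]):
    part = tmp / f"lib{i}.fastq.gz"
    with gzip.open(part, "wt") as handle:
        handle.write(record)
    parts.append(part)

combined = tmp / "combined.fastq.gz"
concat_gzip(parts, combined)

# Reading the combined file yields the records of both inputs in order
with gzip.open(combined, "rt") as handle:
    text = handle.read()
print(text.count("@r"))  # 2
```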
*[GFEs]: Gene Family Expansions
## Slurm
Nothing important. I just found this Slurm script (RIT computing uses Slurm) by someone (Jacob Lamb) who is running dnaPipeTE on a cluster. Saved here for future reference:
```
[jlamb@login001 scripts]$ cat dnaPipeTE.sh
#!/bin/bash
#SBATCH --job-name=dnaPipeTE
#SBATCH --partition=week-long-highmem
#SBATCH --cpus-per-task=16
#SBATCH -N 1 #nodes
#SBATCH -o dnaPipeTE_%j.o
#SBATCH -e dnaPipeTE_%j.e

cd "/nfs/home/jlamb/Projects/dnaPipeTE_rnd_2"

JOB_ID=$SLURM_JOB_ID
TMP_DIR="$(pwd)/temp_$JOB_ID"
mkdir -p "$TMP_DIR" && export TMPDIR="$TMP_DIR"

#should only have to pass in $1 which is the SRX########

module load singularity/4.1.2
module load python3

# Use singularity exec instead of shell to run the subsequent commands in the container
singularity exec --bind /nfs/home/jlamb/Projects/dnaPipeTE_rnd_2/$1:/mnt ~/bin/dnaPipeTE/dnapipte.img /bin/bash <<EOF
# Set locale environment variables to avoid locale warnings
export LANG=C.UTF-8
export LC_ALL=C.UTF-8

cd /opt/dnaPipeTE

python3 dnaPipeTE.py \
-input /mnt/${1}_nuclear_1.fastq \
-output /mnt/dnaPipeTE_15gb_10gc_RMt15_smpl_2 \
-cpu 16 \
-genome_size 15000000000 \
-genome_coverage 0.1 \
-RM_lib /nfs/home/jlamb/TE_libs/dedupe_telib.fasta \
-RM_t 0.15 \
-sample_number 2
EOF

rm -rf "$TMP_DIR"
```