
# HybPiper update and Gene Trees

## Hybpiper

To activate the hybpiper environment:

`conda activate hybpiper`

## Running HybPiper

Make sure you are in the hybpiper environment before running any command.

#### Initial command

`hybpiper assemble -t_dna [targetfile].fasta -r [samplefile].fastq --prefix (NameOutputFile) --bwa --cpu N --targetfile_ambiguity_codes NMRWSYKVHD`

#### Command for while read loop

`while read name; do hybpiper assemble -r $name.R1.paired.fastq $name.R2.paired.fastq -t_dna angiosperms353.abronia.fasta --prefix $name --bwa --cpu N --targetfile_ambiguity_codes NMRWSYKVHD; done < namelist.txt`

#### Extracting supercontigs

`while read name; do hybpiper assemble -r $name.R1.paired.fastq $name.R2.paired.fastq -t_dna angiosperms353.abronia.fasta --prefix $name --bwa --cpu N --targetfile_ambiguity_codes NMRWSYKVHD --run_intronerate --no-blast --no-distribute --no-assemble; done < namelist.txt`

#### HybPiper stats

Before running, create namelist.txt (for example with nano) and list in it the name of each directory where your HybPiper output was written, one name per line.

`hybpiper stats -t_dna angiosperms353.abronia.fasta gene namelist.txt`

#### Recovering heat map

`hybpiper recovery_heatmap seq_lengths.tsv`

/scratch/bot3404/sprice/fastq/501/fastafiles

# Gene Trees

## MAFFT

To infer a phylogeny, we first need to align the sequences. MAFFT takes unaligned raw sequences and creates multiple sequence alignments of amino acid or nucleotide sequences.

First, make a new directory named MAFFT to hold your output files:

`mkdir MAFFT`

##### Initial Command

`mafft --preservecase --maxiterate 1000 --localpair inputfile.fasta > MAFFT/outputfile.mafft.fasta`

##### Command for parallel

Here `{}` is replaced by each name in namelist.txt, so every job reads its own input and writes its own output.

`parallel "mafft --preservecase --maxiterate 1000 --localpair Sequences2/{}.fasta > MAFFT/{}.mafft.fasta" :::: namelist.txt`

## Trimal

After aligning sequences, the alignment will contain gaps between bases, and gap-rich columns need to be removed before building the tree.
Trimal will remove any illegitimate or poorly aligned sections of the sequences.

First, make a new directory named TRIMAL for your trimal output files:

`mkdir TRIMAL`

##### Base Command

`trimal -in <inputfile> -out <outputfile> -(other options)`

##### Options for Trimal

`-gt` sets the gap threshold: the minimum fraction of sequences that must have a residue (not a gap) in a column for that column to be kept. With `-gt .5`, columns in which more than half of the sequences have a gap are removed.

##### Command for one sample

`trimal -in MAFFT/$name.mafft.fasta -out TRIMAL/$name.trimal.fasta -gt .5`

##### Command using parallel

`parallel -j 10 --eta trimal -in MAFFT/{}.mafft.fasta -out TRIMAL/{}.trimal.fasta -gt .5 :::: namelist.txt`

## IQ Tree

IQ-TREE takes a multiple sequence alignment as input and reconstructs the phylogeny that best explains the data. You must be in the directory where your trimal output is.

##### Initial Command

`iqtree -s $name.fasta`

`-s` specifies the alignment file.

Three main output files will be generated:

```
$name.fasta.iqtree
$name.fasta.treefile
$name.fasta.log
```

##### Command for parallel

`-m MFP` lets ModelFinder choose the substitution model, and `-B 1000` runs 1000 ultrafast bootstrap replicates.

`parallel --eta -j 10 iqtree -s fastafiles/{}_supercontig.trimal.fasta -m MFP -B 1000 :::: namelist.txt`

## Astral

ASTRAL estimates a species tree from a set of gene trees. The input file should contain all of the gene trees in Newick format.

#### Base command

`java -jar /opt/apps/Software/ASTRAL/Astral/astral.5.6.3.jar -i inputfilename.tre > outputfilename.astral.tre`
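
The ASTRAL input is a single file of gene trees, so the per-gene `.treefile` outputs from IQ-TREE have to be combined first. The following is a minimal sketch of one way to do that, not necessarily how it was done for this project. It assumes each tree file ends in `.treefile` and holds one Newick tree followed by a newline; the function name `collect_gene_trees` and the file names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: combine IQ-TREE gene trees into one file for ASTRAL.
# Assumption: every *.treefile holds a single Newick tree ending in a newline.

collect_gene_trees() {
    local tree_dir="$1"   # directory holding the *.treefile outputs
    local out_file="$2"   # combined gene-tree file to give ASTRAL via -i
    local f

    : > "$out_file"       # start from an empty output file
    for f in "$tree_dir"/*.treefile; do
        [ -s "$f" ] || continue    # skip missing or empty tree files
        cat "$f" >> "$out_file"
    done

    # Report how many trees (lines) ended up in the combined file.
    echo "$(wc -l < "$out_file" | tr -d ' ') gene trees written to $out_file"
}
```

For example, `collect_gene_trees fastafiles genetrees.tre` would gather the trees from the IQ-TREE step, and `genetrees.tre` could then be passed to ASTRAL with `-i`. Checking the reported count against the number of names in namelist.txt is a quick way to spot genes whose tree search failed.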