# TE divergence Plots
###### tags: `TE`, `divergence`, `plot`

Go to your working directory:

`cd /groups/fr2020/...`

Copy the template folder `divergencePlot_template` from `/groups/arklab/bin/REPET` into it:

`cp -R /groups/arklab/bin/REPET/divergencePlot_template ./`

- [ ] Take a look at the folder and its contents
- [ ] Read the README:

#==============================================================================

This directory contains the scripts and files necessary to output a divergence landscape plot. RepeatMasker's RepBase poorly categorizes animals that aren't mammals or flies (as it appears to me), so using a TEdenovo library is highly suggested:

1. Copy this folder to a new location.
2. Symlink the fasta file for the organism's genome to this directory.
   `ln -s /groups/fr2020/metagenomes/assembly.fa ./`
3. After running TEdenovo on the genome you want to generate a plot for, take the file located in:
   /TEdenovo_project_directory/{NAME}_Blaster_GrpRecPil_Struct_Map_TEclassif_Filtered_MCL/{NAME}_denovoLibTEs_filtered_MCL.fa
   and symlink it to this new location.
   `ln -s /TEdenovo_project_directory/{NAME}_Blaster_GrpRecPil_Struct_Map_TEclassif_Filtered_MCL/{NAME}_denovoLibTEs_filtered_MCL.fa ./`
4. Edit "run" to modify the variable names as needed.
5. Execute the following command: "./run > output.log"

The landscape divergence plot should be labeled {NAME}.html.

#==============================================================================

Notes

The script "extractUnknowns.py" is used to pull out all of the consensuses labeled as "unknown" into a separate file. It also generates a "known" file that contains everything else.

In /groups/arklab/bin/RepeatMasker, there is a script called "createRepeatLandscape_UnknownOnTop.pl" that is a copy of "createRepeatLandscape.pl", with the only difference being that the unknown consensuses are plotted on top of the other results in the divergence plots.

#==============================================================================

Brandon M. Le (Brandon_Le@brown.edu), 2016

OK, step 1 is done. Now create a symlink (step 2) to the fasta file:

`ln -s /groups/fr2020/Bkoreanus/Bkoreanus.fa ./`

Step 3: Once TEdenovo has finished, look for the TEdenovo fasta file as described in the README and create a symlink to it in /groups/fr2020/Bkoreanus/divergencePlot_template

Step 4: Edit `run`. Under the comment `## Modify these variables as needed`, the template sets `file="DmelChr4"` and `TElib="DmelChr4_denovoLibTEs_filtered_MCL.fa"`. Replace `DmelChr4` with `Bkoreanus` in both.

We are going to use one of the servers (not the clusters) for this run. Change the `-pa` option in

`RepeatMasker -pa 12 -a -low -no_is -e ncbi -lib ${TElib}.classified ${file}.fa`

to the number of processors available on the machine, but do not use all of them! It is going to run overnight anyway (so use `screen`).

Save the `run` file.

Before running anything, check that every command used in `run` works in your terminal (otherwise it is a `bash_profile` issue). You need to check:

`RepeatClassifier`

`faToTwoBit`

`RepeatMasker`

### Everything ready?

Check which machine you are going to use (`status`, etc.). This step (the divergence plot) runs faster on the servers.

Generally Available Bioinformatics Servers (last column: processors*memory)
* arthur CentOS 7.4 8*96
* domino CentOS 7.4 24*96
* flicker CentOS 7.4 24*250
* jake CentOS 7.4 4*96
* luna CentOS 7.4 4*96
* minnie CentOS 7.4 32*1000
* rocket CentOS 7.4 12*128
* taiga CentOS 7.4 12*128
* tern CentOS 7.4 12*128

`ssh tern`

Just remember to adjust `-pa` in `./run`

`RepeatMasker -pa 12 -a -low -no_is -e ncbi -lib ${TElib}.classified ${file}.fa`

to the number of processors the chosen machine has (the first number in processors*memory).

Start a screen session with `screen` and run the script inside it with `./run`.

If you use one of the clusters instead:

General Use Clusters
* cluster5 CentOS 7.4 2 @ 40*995
* cricket CentOS 7.4 1 @ 40*230
* grendel CentOS 7.4 8@8*96 3@12*128

```
ssh cricket
clusterize -log output.log ./run
```

## EXTRA July 9

I would like to check two more divergence plots:

1. Use the combination of the cosmid and Illumina TEdenovo libraries:
   `cat Dscosmid_denovoLibTEs_filtered_MCL.fa Dstevenill_denovoLibTEs_filtered_MCL.fa > newname.fa`
   To build the plot for the cosmid sequences (Dscosmid.fa), set this in the divergencePlot_Template `run` script:
   `customTElib="newname.fa"`
2. Use another custom TE library, located at `/groups/arklab/ostracods/divergencePlot_BL/customLibrary/ostracods_denovoLibTEs_filtered.fa.classified`. Copy it into your new divergencePlot_Template folder and run the divergence plot as before, but use this script as `./run` instead:

```
# Generates a landscape plot through RepeatMasker (RM)

# Modify these variables as needed
file="ostracods"
customTElib="customLibrary/ostracods_denovoLibTEs_filtered.fa.classified"

# Run RM: query the genome against the custom TE library given by -lib;
# -a writes the alignments to ${file}.fa.align
/groups/arklab/bin/RepeatMasker/RepeatMasker -pa 12 -a -low -no_is -lib "${customTElib}" "${file}.fa"

# Run calcDivergenceFromAlign.pl on the alignment file (.align) from RM
perl /groups/arklab/bin/RepeatMasker/util/calcDivergenceFromAlign.pl -s "${file}.divsum" "${file}.fa.align"

# Build ${file}.2bit if it is missing (added step; assumes faToTwoBit is on PATH)
[ -f "${file}.2bit" ] || faToTwoBit "${file}.fa" "${file}.2bit"

# ${file}.divsum, which contains the Kimura divergence, is used to generate the landscape plot
perl /groups/arklab/bin/RepeatMasker/util/createRepeatLandscape_UnknownOnTop.pl -div "${file}.divsum" -twoBit "${file}.2bit" > "${file}.html"
```
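
The pre-run checks described earlier (confirming that `RepeatClassifier`, `faToTwoBit`, and `RepeatMasker` work in your terminal, and picking a `-pa` value below the machine's processor count) could be scripted roughly as follows. This is only a sketch: the function names `missing_tools` and `suggest_pa` are hypothetical, and leaving 2 processors free is an assumption, not a rule from these notes.

```shell
#!/usr/bin/env bash
# Pre-flight check to run before ./run (a sketch; names are hypothetical).

# Print every required command that is not found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Suggest a -pa value that leaves some processors free for other users.
# The default reserve of 2 is an assumption; pass a second argument to change it.
suggest_pa() {
    local cores=$1 reserve=${2:-2}
    local pa=$(( cores - reserve ))
    if (( pa < 1 )); then
        pa=1
    fi
    echo "$pa"
}

missing=$(missing_tools RepeatClassifier faToTwoBit RepeatMasker)
if [ -n "$missing" ]; then
    # Join the missing tool names onto one line for readability.
    echo "Not on PATH: ${missing//$'\n'/ }"
else
    echo "All tools found."
fi

# nproc reports the processors visible on this machine; fall back to 1 if absent.
cores=$(nproc 2>/dev/null || echo 1)
echo "Suggested -pa: $(suggest_pa "$cores")"
```

Run it on the machine where you will launch `./run` (for example after `ssh tern`), since both the PATH and the processor count are specific to each machine.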