# Alignment exercise using a transformed HAL file for DESCHRAMBLER.
###### Author: Manuel Hoyos mhoyosro@ttu.edu
Now, just to keep things in perspective, I will outline our immediate plan and objectives:
1. We will transform a HAL file into chains to feed into DESCHRAMBLER, because the process may be faster this way.
2. You will kindly help me align pKuh.softmasked.fa and rFer.softmasked.fa using your traditional parallelized process to ensure that everything matches in the end.
**The relevant files to achieve our objectives are available at this link:**
https://drive.google.com/drive/folders/1GGLArs1LDVYmtKVSq_RlI9DbL5LKPxLj?usp=sharing
Next, I'm going to bore you with the procedure I have followed to transform a HAL file into chains (I know it's long and maybe unnecessary, but I think it completes the message).
### 1. The first step is to describe the species and the tree I am using.
As I am interested in phyllostomid bats, I performed an alignment with just five species, as follows:
rFer.softmasked.fa, rMic.softmasked.fa, aJam.softmasked.fa, tBra.softmasked.fa, pKuh.softmasked.fa
This is the tree:

In each node, there is a label that starts with "manAnc" which stands for "manuel Ancestor". I wrote that to make a distinction from another tree I was working on at that time, where I didn't include any ancestors. As you can see, there are 19 ancestors.
I'm telling you all this to make it very clear how the Cactus alignment program presents the output of the entire alignment process, which looks like this:
```
2683837899 Dec 28 20:31 manAnc16.cigar
3881846925 Dec 28 20:31 manAnc16.cigar.secondary
2060541479 Dec 28 20:31 manAnc16.cigar.og_fragment_0
982793920 Dec 28 20:31 manAnc16.cigar.og_fragment_1
334937139 Dec 28 20:31 manAnc16.cigar.og_fragment_2
661554506 Dec 28 20:31 manAnc16.cigar.ig_coverage_0
897735910 Dec 28 20:31 manAnc16.cigar.ig_coverage_1
3923833430 Dec 29 05:40 manAnc16.hal
1788551383 Dec 29 05:40 manAnc16.fa
2627281668 Dec 29 20:42 manAnc7.cigar
4121715727 Dec 29 20:43 manAnc7.cigar.secondary
2066631165 Dec 29 20:43 manAnc7.cigar.og_fragment_0
233070744 Dec 29 20:43 manAnc7.cigar.og_fragment_2
982218008 Dec 29 20:43 manAnc7.cigar.og_fragment_1
983275386 Dec 29 20:43 manAnc7.cigar.ig_coverage_0
1021480342 Dec 29 20:43 manAnc7.cigar.ig_coverage_1
2075106341 Jan 3 17:05 rFer.softmasked.fa
2564589532 Jan 3 17:06 rMic.softmasked.fa
2224126925 Jan 3 17:06 aJam.softmasked.fa
1014 Jan 3 17:06 fivegenomes.txt
2283176224 Jan 3 20:31 tBra.softmasked.fa
1775724452 Jan 3 20:31 pKuh.softmasked.fa
4372705968 Jan 4 19:31 manAnc7.hal
1931873984 Jan 4 19:32 manAnc7.fa
1782200946 Jan 5 02:12 manAnc17.cigar
2952881705 Jan 5 02:12 manAnc17.cigar.secondary
1871910340 Jan 5 02:12 manAnc17.cigar.og_fragment_0
572994967 Jan 5 02:12 manAnc17.cigar.ig_coverage_0
455145950 Jan 5 02:12 manAnc17.cigar.ig_coverage_1
3570997337 Jan 5 08:45 manAnc17.hal
1816394037 Jan 5 08:46 manAnc17.fa
630186533 Jan 5 11:33 manAnc19.cigar
742204635 Jan 5 11:33 manAnc19.cigar.secondary
1688338849 Jan 5 13:47 manAnc19.fa
3418283662 Jan 5 14:31 fivegenomes.hal
```
Of all this, the most important thing, of course, is the .hal file, which is called **fivegenomes.hal** in this case.
### 2. Now we need the .hal file to be useful as an input file for DESCHRAMBLER. But first we need HalTools and DoBlastzChainNet
To make the .hal file useful as input for DESCHRAMBLER, we need to produce UCSC browser-style pairwise chains from fivegenomes.hal. For that, we need HalTools and DoBlastzChainNet. The installation of both can be a bit cumbersome, so I will provide my installation recipe for both programs below (of course, all the paths apply to my HPCC because I am lazy today).
* My recipe for HalTools installation:
```
#Downloading HAL
cd /lustre/work/mhoyosro/software
git clone https://github.com/ComparativeGenomicsToolkit/hal.git
#Installing dependencies for HAL
cd hal
mkdir -p DIR/hdf5
cd /lustre/work/mhoyosro/software/hal/DIR/hdf5
wget http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.1/src/hdf5-1.10.1.tar.gz
tar xzf hdf5-1.10.1.tar.gz
cd hdf5-1.10.1
./configure --enable-cxx --prefix /lustre/work/mhoyosro/software/hal/DIR/hdf5
make && make install
#Before building HAL, update the following environment variables
export PATH=/lustre/work/mhoyosro/software/hal/DIR/hdf5/bin:${PATH}
export h5prefix=-prefix=/lustre/work/mhoyosro/software/hal/DIR/hdf5
#sonLib (A compact C/Python library for sequence analysis in bioinformatics from Benedict Paten)
#HAL and sonLib must be sibling directories
cd /lustre/work/mhoyosro/software
git clone https://github.com/ComparativeGenomicsToolkit/sonLib.git
pushd sonLib && make && popd
# Finish installation (HPCC solved this part on April 25th)
cd /lustre/work/mhoyosro/software/hal
module load gcc/10.1.0 hdf5/1.10.6
make
# Set the path every time and everywhere the program is used (Otherwise nothing will work!!)
export PATH=/lustre/work/mhoyosro/software/hal/DIR/hdf5/bin:${PATH}
export h5prefix=-prefix=/lustre/work/mhoyosro/software/hal/DIR/hdf5
export PATH=/lustre/work/mhoyosro/software/hal/bin:${PATH}
```
* My recipe for DoBlastzChainNet installation:
```
# Go to the operational directory
cd /lustre/work/mhoyosro/software/
# Prepare the subfolder
mkdir DoBlastzChainNet && cd DoBlastzChainNet
mkdir data && cd data
mkdir scripts bin && cd bin
# Download some software
rsync -a rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/ .
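# I use these commands in place of the canonical ones
# Clone kent at the top of DoBlastzChainNet so that the source path of the cp below exists
cd /lustre/work/mhoyosro/software/DoBlastzChainNet
git clone git://genome-source.soe.ucsc.edu/kent.git
cp -r /lustre/work/mhoyosro/software/DoBlastzChainNet/kent/src/hg/utils/automation/* /lustre/work/mhoyosro/software/DoBlastzChainNet/data/scripts/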
#PATH setup
# The two directories /data/bin and /data/scripts are added to the shell PATH environment
# The original instructions at https://genomewiki.ucsc.edu/index.php?title=DoBlastzChainNet.pl say this: ----- echo 'export PATH=/data/bin:/data/scripts:$PATH' >> $HOME/.bashrc -----
# That line appends the export to ~/.bashrc so it applies to every new shell; I prefer to export the paths explicitly instead
export PATH=/lustre/work/mhoyosro/software/DoBlastzChainNet/data/bin:${PATH}
export PATH=/lustre/work/mhoyosro/software/DoBlastzChainNet/data/scripts:${PATH}
```
The important point is that these paths must be set before converting HAL to chains:
```
# HalTools path
export PATH=/lustre/work/mhoyosro/software/hal/DIR/hdf5/bin:${PATH}
export h5prefix=-prefix=/lustre/work/mhoyosro/software/hal/DIR/hdf5
export PATH=/lustre/work/mhoyosro/software/hal/bin:${PATH}
# DoBlastzChainNet path
export PATH=/lustre/work/mhoyosro/software/DoBlastzChainNet/data/bin:${PATH}
export PATH=/lustre/work/mhoyosro/software/DoBlastzChainNet/data/scripts:${PATH}
```
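Since nothing works unless those paths are set, it may help to confirm that the tools are actually visible before starting. This is a minimal sketch, assuming a POSIX shell; the helper name `check_tools` is my own and is not part of HalTools or the UCSC utilities. The tool names are the ones used in the conversion commands in this document.

```shell
# check_tools (hypothetical helper): report which commands are on PATH.
# Returns 0 when every command is found, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools used for the HAL-to-chain conversion in this document.
check_tools halStats hal2fasta halLiftover faToTwoBit pslPosTarget axtChain \
    || echo "Some tools are missing: export the paths above and try again."
```

If any line says MISSING, the corresponding `export PATH=...` line has not been applied in the current shell.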
### 3. In this step, I will show you what I have done to transform the HAL file into chains. Everything is based on this page: https://github.com/ComparativeGenomicsToolkit/hal/blob/chaining-doc/doc/chaining-mapping.md
```
# Go to the location of the file fivegenomes.hal
cd /lustre/work/mhoyosro/software/cactus/steps-output
# Set the paths for HalTools and DoBlastzChainNet
export PATH=/lustre/work/mhoyosro/software/hal/DIR/hdf5/bin:${PATH}
export h5prefix=-prefix=/lustre/work/mhoyosro/software/hal/DIR/hdf5
export PATH=/lustre/work/mhoyosro/software/hal/bin:${PATH}
export PATH=/lustre/work/mhoyosro/software/DoBlastzChainNet/data/bin:${PATH}
export PATH=/lustre/work/mhoyosro/software/DoBlastzChainNet/data/scripts:${PATH}
# Print the list of genomes in the alignment
halStats --genomes fivegenomes.hal
# Convert to fasta the genomes of interest and then transform them to 2bit
hal2fasta fivegenomes.hal pKuh.softmasked.fa | faToTwoBit stdin pKuh.softmasked.2bit
hal2fasta fivegenomes.hal rFer.softmasked.fa | faToTwoBit stdin rFer.softmasked.2bit
# So far, this is similar to the regular DoBlastzChainNet procedure.
# Next, the source genome sequences are obtained in BED format.
# Importantly, here the target is the source genome!
# So for this exercise the TARGET/SOURCE is pKuh.softmasked.fa
# --bedSequences prints the sequences of the given genome in BED format
halStats --bedSequences pKuh.softmasked.fa fivegenomes.hal > pKuh.bed
# Inspecting the resulting file (pKuh.bed) shows that so far we have only listed the scaffolds.
cat pKuh.bed
# Next, following the instructions provided by the mentioned website,
# the previous .bed file is then used by halLiftover to create pairwise alignments.
halLiftover --outPSL fivegenomes.hal pKuh.softmasked.fa \
pKuh.bed rFer.softmasked.fa /dev/stdout | \
pslPosTarget stdin pKuh_to_rFer.psl
# If there are several PSL files, they are concatenated together (here there is only one).
# The alignments are then chained using the UCSC Browser axtChain program:
axtChain -psl -linearGap=loose pKuh_to_rFer.psl rFer.softmasked.2bit pKuh.softmasked.2bit pKuh_to_rFer.chain
```
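Before feeding the result to DESCHRAMBLER, a quick look at the intermediate files can catch an empty or truncated output. The following is only a sketch under my own assumptions: the function names `bed_total_bases` and `chain_summary` are hypothetical, and the field positions follow the standard BED layout (name, start, end) and the UCSC chain header line (`chain score tName tSize tStrand tStart tEnd qName ...`).

```shell
# bed_total_bases (hypothetical helper): sum of (end - start) over all BED
# records, i.e. the total number of bases listed in the BED file.
bed_total_bases() {
    awk '{ total += $3 - $2 } END { printf "%.0f\n", total + 0 }' "$1"
}

# chain_summary (hypothetical helper): count chain header lines, the number
# of distinct target sequences, and the summed target span (tEnd - tStart).
# Chains can overlap, so the span is an upper bound, not unique coverage.
chain_summary() {
    awk '$1 == "chain" {
             n++
             span += $7 - $6
             targets[$3] = 1
         }
         END {
             nt = 0
             for (t in targets) nt++
             printf "chains=%d target_seqs=%d target_span=%.0f\n", n, nt, span
         }' "$1"
}

# Run the checks only if the files from the steps above are present.
if [ -f pKuh.bed ]; then
    echo "pKuh.bed total bases: $(bed_total_bases pKuh.bed)"
fi
if [ -f pKuh_to_rFer.chain ]; then
    chain_summary pKuh_to_rFer.chain
fi
```

A `chains=0` result, or a base total far from the expected genome size, would suggest that one of the earlier steps failed silently.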
Well, I think that's it. Today, I'm going to focus on trying to use DESCHRAMBLER with the file I obtained, because I haven't tried it yet. So if I encounter any issues, I'll definitely bother you. I'm embarrassed to send such a long letter, but it gives me peace of mind to present a complete idea. Once again, thank you very much for your help.