Andrew Kittentails Genome

---
title: Andrew Kittentails Genome Assembly
break: false
tags: Genomes, Andrew
---

# Genome Assembly Notes for *Synthyris bullii*

This document contains notes on the genome assembly of *Synthyris bullii* (Kittentails), which has an estimated genome size of 0.6 Gb (based on *Veronica (Synthyris) missurica*; Albach and Greilhuber 2004). The genome is being assembled from individual ND9672, grown from maternal plant ND8241 (Nachusa Doug population at Nachusa Grasslands, Illinois). The genome was sequenced by PacBio at Mt. Sinai using a SMRT Cell on Revio (ADD MORE METADATA DETAILS) ...

Data is currently on the server Funk @ CBG

**CBG Servers:** Curie = "ID"@10.2.0.53; Funk = "ID"@10.2.0.51; Rosalind = "ID"@10.2.0.52

**NU Server:** Quest = "ID"@quest.northwestern.edu

**Reminders/tips for running jobs in terminal with screen:**

* Add the prefix `screen -L` to a command to run it in a logged screen session
* Add `-Logfile` to name the screen log file
* Add `-aS` to name each screen
* To view all screens: `screen -list`
* To re-enter a session: `screen -r <ID>`
* CTRL+a, then d detaches from the screen
* CTRL+a, then k kills the screen
* Make sure to log off the server when screens are running
* Background vs screen
    * With screen, each job has a process ID, and you can see the IDs

## Assembly Outline

1. Run QC with 'SequelTools' or 'LongQC' (expected 20X+ coverage)
2. Run HifiASM - primary and alternate assembly
3. Run BUSCO - uses sets of conserved single-copy orthologs to test assembly quality and completeness
4. Create table that contains: total assembly length (bp), N50 (Kbp; 1Mb is a good #), # scaffolds/contigs, BUSCO score (S:D:F:M%)
5. Error Correction - Use pbmm2 and/or GCpp to map raw reads back to assembly; decided to use NextPolish2.
6. Check BUSCO again!!
7. PROFIT

______________________________________________________

## 1. Run QC with 'SequelTools' or 'LongQC'

https://github.com/ISUgenomics/SequelTools

https://github.com/yfukasawa/LongQC

### LongQC

##### Code to run requires:

* input: Either fasta, fastq or PacBio BAM formatted file is required.
  Input file is expected to be ready for analysis or have at least 5x coverage.
* -o or --output: specify a path for output
* -x or --preset: specify a platform/kit to be evaluated. Adapter and some overlap parameters are automatically applied. (pb-rs2, pb-sequel, ont-ligation, ont-rapid, ont-1dsq)

##### Web summary looks good

* file:///C:/Users/adavi/Downloads/web_summary.html
* yield = 26,464,887,540 base pairs
* expected genome size = 600,000,000 base pairs
* yield/expected genome size = 44.1X (average genome coverage)
* N50 = 13,665 (half of the sequenced bases are in reads of 13,665 bp or longer)

______________________________________________________

## 2. HifiASM (v0.19.8-r603)

https://github.com/chhylp123/hifiasm

    hifiasm -t 32 synthryis_bullii_bc2049.reads.fastq

**hifiasm writes several assembly outputs, named like** hifiasm.asm.bp.**p_ctg**.fa:

* **p_ctg** (Primary Contigs): These are the primary contigs from the assembly, reflecting the overall structure of the assembled genome. They include all contigs regardless of haplotype, so they are less haplotype-accurate but more contiguous.
* **hap1** and **hap2**: These represent the two haplotypes in a heterozygous assembly. If your organism is diploid, these contigs reflect the two different genetic variants present. **Combining these two assemblies is the preferred approach for generating a usable genome assembly**.
* **p_utg** (processed unitigs, not "uniparental transcripts"): The haplotype-resolved unitig graph with small bubbles removed. Unitigs keep both haplotypes as separate sequences, so this output is not collapsed to a single haplotype.

______________________________________________________

## 3. BUSCO (v5.5.0)

BUSCO Manual: https://busco.ezlab.org/busco_userguide.html#command-line-options

BUSCO publication walkthrough (much more helpful): https://currentprotocols.onlinelibrary.wiley.com/doi/full/10.1002/cpz1.323

##### Convert .gfa to .fa file type

    awk '/^S/{print ">"$2;print $3}' hifiasm.asm.bp.p_ctg.gfa > hifiasm.asm.bp.p_ctg.fa

##### Run on primary contigs and then on combined primary and alternate

* need to check how many cores to use? how many are on Funk? Steve used 30 for hybphaser recently

      screen -L busco -i hifiasm.asm.bp.hap2.p_ctg.fa -m genome -c 30 -o BUSCO_sbull_hap2

Error:

    No module named 'pandas'
    There was a problem installing BUSCO or importing one of its dependencies. See the user guide and the GitLab issue board (https://gitlab.com/ezlab/busco/issues) if you need further assistance.

^^ Looks as though we need to install more dependencies before we can run BUSCO

**Scores using default evaluation database:**

* hap1hap2.p_ctg - combined assembly (USABLE GENOME)
    * (score not recorded)
* bp.p_ctg - primary contig
    * **C:100.0%[S:49.0%,D:51.0%],F:0.0%,M:0.0%,n:255**
* bp.hap1.p_ctg - haplotype 1; primary contig that has been phased
    * **C:98.5%[S:51.0%,D:47.5%],F:0.0%,M:1.5%,n:255**
* bp.hap2.p_ctg - haplotype 2; primary contig that has been phased
    * **C:98.9%[S:52.2%,D:46.7%],F:0.4%,M:0.7%,n:255**
* bp.p_utg - primary unitig; mostly repeat regions
    * **C:100.0%[S:6.3%,D:93.7%],F:0.0%,M:0.0%,n:255**

Checking for space on Server:

    cat /proc/cpuinfo
    free -m     # free memory
    df -lh

tail -f busco.log : output gives the following error

    File "/usr/local/lib/python3.10/dist-packages/busco/BuscoRunner.py", line 195, in run
        raise BatchFatalError(str(exc_value))
    busco.Exceptions.BatchFatalError: [Errno 5] Input/output error

BUSCO COMPLETED!!??
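The questions above about cores and free space can be answered in one pass before launching BUSCO. The sketch below is only an illustration and was not part of the original run: `suggest_threads` is a hypothetical helper, and its rule of using three quarters of the cores is an assumption for a shared server, not a BUSCO recommendation.

```shell
#!/bin/sh
# Pre-flight check before a long BUSCO run (sketch; the 3/4 rule is an assumption).

# suggest_threads TOTAL: hypothetical helper that leaves about a quarter of
# the cores free for other users, but never returns less than 1.
suggest_threads() {
    threads=$(( $1 * 3 / 4 ))
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

# Number of online CPU cores (falls back to 1 if getconf cannot report it).
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "cores online:       $cores"
echo "suggested -c value: $(suggest_threads "$cores")"

# Memory summary in MB; free is Linux-specific, so skip it when absent.
if command -v free >/dev/null 2>&1; then
    free -m
fi

# Free space on the filesystem that holds the current directory.
df -Ph .
```

The suggested value would be passed to `busco` with `-c`. Note that `[Errno 5]` is a generic I/O error, while a full disk is normally reported as Errno 28, so `df` can rule out a full disk but may not explain the error on its own.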
Checking BUSCO output:

* Move the 'short_summaries.txt' output files into a BUSCO_summaries folder, renaming them accordingly, so that all summaries can be visualized together
* Run: *python3 scripts/generate_plot.py -wd BUSCO_summaries*
    * cannot get this to run, keeps looking for scripts/generate_plot.py in my directories?? (scripts/generate_plot.py is a relative path, so it is only found from a directory that contains it, such as the BUSCO source directory, or when the script's full path is given)

### Re-running BUSCO with a set lineage

* Previous run was with auto-lineage (default) but few scaffolds were returned
* New run with EUDICOT lineage (eudicots_odb10)
* And run with EMBRYOPHYTA lineage (embryophyta_odb10)

New command (added lineage with -l eudicots_odb10 and name for screen with -aS hap1):

    screen -L -Logfile s_log.busco.eud.p_ctg -aS hap1 busco -i hifiasm.asm.bp.hap1.p_ctg.fa -m genome -l eudicots_odb10 -c 20 -o BUSCO_sbull_eudicot_hap1

    screen -L -Logfile s_log.busco.emb.p_ctg -aS emb.p_ctg busco -i hifiasm.asm.bp.hap1.p_ctg.fa -m genome -l embryophyta_odb10 -c 20 -o BUSCO_sbull_embryophyta_hap1

### Eudicota BUSCOs

* hap1hap2.p_ctg - combined assembly (USABLE GENOME)
    * **C:96.7%[S:2.7%,D:94.0%],F:0.3%,M:3.0%,n:2326**
    * BUSCO counts:
        * 2249 Complete BUSCOs (C)
        * 63 Complete and single-copy BUSCOs (S)
        * 2186 Complete and duplicated BUSCOs (D)
        * 8 Fragmented BUSCOs (F)
        * 69 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
    * Assembly Statistics:
        * 2556 Number of scaffolds
        * 2556 Number of contigs
        * 2,460,379,596 Total length **This is double the size of the genome right??**
        * 0.000% Percent gaps
        * 2 MB Scaffold N50
        * 2 MB Contigs N50
* bp.p_ctg - primary contig
    * **C:96.4%[S:71.3%,D:25.1%],F:0.4%,M:3.2%,n:2326**
    * BUSCO counts:
        * 2242 Complete BUSCOs (C)
        * 1659 Complete and single-copy BUSCOs (S)
        * 583 Complete and duplicated BUSCOs (D)
        * 10 Fragmented BUSCOs (F)
        * 74 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
    * Assembly Statistics:
        * 1065 Number of scaffolds
        * 1065 Number of contigs
        * 1,337,434,359 Total length (1.3Gb - about twice the size expected)
        * 0.000% Percent gaps
        * 9 MB Scaffold N50
        * 9 MB Contigs N50
* bp.hap1.p_ctg - haplotype 1; primary contig that has been phased
    * **C:94.6%[S:74.5%,D:20.1%],F:0.6%,M:4.8%,n:2326**
    * BUSCO counts:
        * 2201 Complete BUSCOs (C)
        * 1734 Complete and single-copy BUSCOs (S)
        * 467 Complete and duplicated BUSCOs (D)
        * 14 Fragmented BUSCOs (F)
        * 111 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
    * Assembly Statistics:
        * 1668 Number of scaffolds
        * 1668 Number of contigs
        * 1,249,595,024 Total length
        * 0.000% Percent gaps
        * 2 MB Scaffold N50
        * 2 MB Contigs N50
* bp.hap2.p_ctg - haplotype 2; primary contig that has been phased
    * **C:94.9%[S:75.0%,D:19.9%],F:0.5%,M:4.6%,n:2326**
    * BUSCO counts:
        * 2209 Complete BUSCOs (C)
        * 1745 Complete and single-copy BUSCOs (S)
        * 464 Complete and duplicated BUSCOs (D)
        * 11 Fragmented BUSCOs (F)
        * 106 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
    * Assembly Statistics:
        * 888 Number of scaffolds
        * 888 Number of contigs
        * 1,210,784,572 Total length
        * 0.000% Percent gaps
        * 3 MB Scaffold N50
        * 3 MB Contigs N50
* bp.p_utg - primary unitig; mostly repeat regions
    * **C:96.8%[S:7.0%,D:89.8%],F:0.3%,M:2.9%,n:2326**
    * BUSCO counts:
        * 2250 Complete BUSCOs (C)
        * 162 Complete and single-copy BUSCOs (S)
        * 2088 Complete and duplicated BUSCOs (D)
        * 8 Fragmented BUSCOs (F)
        * 68 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
    * Assembly Statistics:
        * 24217 Number of scaffolds
        * 24217 Number of contigs
        * 2,605,941,398 Total length
        * 0.000% Percent gaps
        * 634 KB Scaffold N50
        * 634 KB Contigs N50

### Embryophyta BUSCOs

* **C:98.6%[S:2.6%,D:96.0%],F:0.3%,M:1.1%,n:1614**
* BUSCO counts:
    * 1592 Complete BUSCOs (C)
    * 42 Complete and single-copy BUSCOs (S)
    * 1550 Complete and duplicated BUSCOs (D)
    * 5 Fragmented BUSCOs (F)
    * 17 Missing BUSCOs (M)
    * 1614 Total BUSCO groups searched
* Assembly Statistics:
    * 2556 Number of scaffolds
    * 2556 Number of contigs
    * 2,460,379,596 Total length
    * 0.000% Percent gaps
    * 2 MB Scaffold N50
    * 2 MB Contigs N50

______________________________________________________

## 4. Create BUSCO Tables to visualize assembly quality

* Total assembly length (bp) = 1,337,434,359
* N50 (kbp) =
* Number of scaffolds/contigs = 1065
* BUSCO (S:D:F:M)% = 49.0 : 51.0 : 0.0 : 0.0
* C = 100%
* n = 255
* **Complete:100.0%[Single-copy:49.0%, Duplicate:51.0%], Fragmented:0.0%, Missing:0.0%, n:255**

![image](https://hackmd.io/_uploads/BJJGwXSoA.png)

________________________________________________________

## 5. Error Correction with meryl, winnowmap, samtools and Polishing Genome with NextPolish2

* Need to install nextpolish2. If you get the _libgcc-ng >=12 error_, run:

      conda config --append channels conda-forge

  to tell conda to also look on the conda-forge channel when you search for packages
* Combine all hifiasm assembly outputs (hap1, hap2, p_ctg, p_utg) into one assembly file:

      cat hifiasm.asm.bp.hap1.p_ctg.fa hifiasm.asm.bp.hap2.p_ctg.fa hifiasm.asm.bp.p_ctg.fa hifiasm.asm.bp.p_utg.fa > s_bullii.combined_assembly.fa

  ^^ this didn't work with winnowmap, I got a -split-prefix error so I have rerun winnowmap on each assembly separately - when do I combine the genome assemblies??

https://github.com/Nextomics/NextPolish2?tab=readme-ov-file#install

**1. Prepare HiFi mapping file with winnowmap (can also use minimap2)** (what and why courtesy of ChatGPT4.0)

    screen -L -Logfile meryl_assembly -aS meryl.asm     # prefix to code for screen
    meryl count k=15 output merylDB s_bullii.combined_assembly.fa

**What 'meryl count' does**: This command uses a tool called meryl to count k-mers (short sequences of DNA) of length 15 in **your genome assembly** (asm.fa.gz in the generic example; here s_bullii.combined_assembly.fa).

**Why**: It creates a database (merylDB) that stores the count of each of these 15-mers

    meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt

**What 'meryl print' does:** This command lists the most frequent k-mers in merylDB: those whose counts exceed the count reached by 99.98% of the distinct k-mers, i.e. the top 0.02% most repetitive 15-mers (not the most unique ones).
**Why:** It helps identify repetitive sequences, saving them to repetitive_k15.txt so that winnowmap can down-weight them (-W) during mapping.

    winnowmap -t 5 -W repetitive_k15.txt -ax map-pb synthryis_bullii_bc2049.reads.fastq s_bullii.combined_assembly.fa | samtools sort -o hifi.map.sort.bam -

(Note: this first attempt lists the reads before the assembly. winnowmap expects the reference first and the reads second, as in the later commands.)

**What 'winnowmap' does:** This command uses winnowmap to align the HiFi reads (hifi.fasta.gz in the generic example; here synthryis_bullii_bc2049.reads.fastq) to the assembly, down-weighting the repetitive k-mers for better accuracy. The results are sorted and saved as a BAM file (hifi.map.sort.bam).

**Why:** This alignment maps the reads to the assembly, which is essential for polishing the genome.

** having trouble with this file ^^

9/10/24: running a few different ways:

* winnow > rerun without | to samtools = no output to be found, see winnow_combo
* winnow_2 > rerun with p_ctg assembly only, no | = typo in input file
* winnow_3 > rerun with p_ctg assembly only, no | and > hifi_map.sam = typo in input/re run

Rerun on the combined assembly with --split-prefix:

    winnowmap -t 60 -W repetitive_k15.txt -ax map-pb --split-prefix combined_map s_bullii.combined_assembly.fa synthryis_bullii_bc2049.reads.fastq

^^^ There is no output file still??? I can't find any errors in the screenlog (this command has neither -o nor a > redirect, so the SAM records go to standard output rather than to a file)

Trying this with hap1 (smallest assembly file to expedite troubleshooting):

    screen -L -Logfile winnow_hap1 -aS winnow.hap1 winnowmap -t 30 -W repetitive_k15.txt -ax map-pb hifiasm.asm.bp.hap1.p_ctg.fa synthryis_bullii_bc2049.reads.fastq > hifi.map.hap1.sam

^^^ OUTPUT FILE IS CREATED BUT EMPTY AGHH (likely because the > redirect applies to the outer screen command, not to winnowmap running inside the session)

Trying again with the samtools sort option:

    screen -L -Logfile s_log.winnow_hap1_samtools -aS winnow.hap1 winnowmap -t 100 -W repetitive_k15.txt -ax map-pb -o hifi.map.hap1.sam hifiasm.asm.bp.hap1.p_ctg.fa synthryis_bullii_bc2049.reads.fastq | samtools sort -o hifi.map.sort.bam -

No .bam file but I have a .sam file!! IT WORKKKEEEDD!! Finally an output, kind of unsure what worked though :( (most likely the -o flag: winnowmap then writes the SAM file itself, so nothing reaches the pipe and samtools has nothing to sort)

Repeat with hap2, p_ctg, and p_utg...
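One way to read the troubleshooting above is that mapping, sorting, and indexing work as a single pipe as long as winnowmap is not given `-o` and the pipe runs inside the screen session. The sketch below illustrates that idea under those assumptions; `map_hifi`, `THREADS`, and `SORT_THREADS` are hypothetical names, and only flags already used in these notes (plus `samtools sort -@`) appear.

```shell
#!/bin/sh
# Sketch: map HiFi reads to one assembly, then sort and index the result.
# map_hifi is a hypothetical helper; file names follow the notes above.

# map_hifi REF READS OUT_BAM
#   REF comes before READS, which is the order winnowmap expects.
#   winnowmap gets no -o here, so its SAM output flows into the pipe.
map_hifi() {
    ref=$1
    reads=$2
    out_bam=$3

    winnowmap -t "${THREADS:-8}" -W repetitive_k15.txt -ax map-pb "$ref" "$reads" |
        samtools sort -@ "${SORT_THREADS:-4}" -o "$out_bam" -

    # Stop early if nothing was written, instead of indexing an empty file.
    if [ ! -s "$out_bam" ]; then
        echo "map_hifi: $out_bam is missing or empty" >&2
        return 1
    fi

    samtools index "$out_bam"
}

# Example call (not run here):
#   map_hifi hifiasm.asm.bp.hap1.p_ctg.fa synthryis_bullii_bc2049.reads.fastq hifi.map.hap1.sort.bam
```

Saving this as a script and starting the script itself under `screen -L` would keep the pipe inside the session, so the outer `screen` command never has to carry a `|` or `>`.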
* hap2:

      screen -L -Logfile s_log.winnow_hap2 -aS winnow.hap2 winnowmap -t 40 -W repetitive_k15.txt -ax map-pb -o hifi.map.hap2.sam hifiasm.asm.bp.hap2.p_ctg.fa synthryis_bullii_bc2049.reads.fastq

* p_ctg:

      screen -L -Logfile s_log.winnow_p_ctg -aS winnow.p_ctg winnowmap -t 40 -W repetitive_k15.txt -ax map-pb -o hifi.map.p_ctg.sam hifiasm.asm.bp.p_ctg.fa synthryis_bullii_bc2049.reads.fastq

* p_utg:

      screen -L -Logfile s_log.winnow_p_utg -aS winnow.p_utg winnowmap -t 40 -W repetitive_k15.txt -ax map-pb -o hifi.map.p_utg.sam hifiasm.asm.bp.p_utg.fa synthryis_bullii_bc2049.reads.fastq

**b. sorting** (haven't been able to get this to work on the end of the winnowmap command, so sorting after map is created)

    samtools sort -o hifi.map.hap1.sort.bam hifi.map.hap1.sam

**c. indexing**

    samtools index hifi.map.hap1.sort.bam

**What 'samtools index' does:** generates a .bai file (index file) for the specified BAM file. This index file records where in the BAM file the reads for each region are stored.

**Why:** The index allows quick random access to specific regions of the BAM file. This is especially useful with large genomic datasets, because tools and applications (like genome browsers) can retrieve and display data from a specific region without reading the entire BAM file. When you use a .bam file with other bioinformatics tools, they typically pick up the .bai file automatically if it sits in the same directory and follows the naming convention (the .bam file name with .bai appended, e.g. hifi.map.hap1.sort.bam.bai).

**2. Creating k-mer files with yak (21 and 31 k-mers, explore why we would use others)**

    yak count -o k21.yak -k 21 -b 37 -i synthryis_bullii_bc2049.reads.fastq
    yak count -o k31.yak -k 31 -b 37 -i synthryis_bullii_bc2049.reads.fastq

**What 'yak' does:** Each command counts k-mers (21-mers, then 31-mers) in the provided FASTQ file (the raw sequencing reads) and writes the counts to k21.yak and k31.yak.
The -b 37 option sets the Bloom filter size to 2^37 bits; the filter is what lets yak exclude singletons (k-mers that appear only once).

**Why:** NextPolish2 takes these k-mer datasets as input alongside the read alignments.

**3. Run NextPolish2**

    screen -L -Logfile s_log.nextpolish4 -aS nextPolish_hap1 nextPolish2 -t 60 hifi.map.hap1.sort.bam hifiasm.asm.bp.hap1.p_ctg.fa k21.yak k31.yak -o asm.np2.hap1.fa

**What 'nextPolish2' does:** This command runs NextPolish2, which refines the genome assembly using the aligned HiFi reads and the k-mer datasets.

**Why:** The goal is to improve the quality of the assembly by correcting residual base-level errors (the contigs have no gaps to fill: BUSCO reports 0.000% gaps).

Citation to use: Jiang Hu, Zhuo Wang, Fan Liang, Shan-Lin Liu, Kai Ye, De-Peng Wang, NextPolish2: A Repeat-aware Polishing Tool for Genomes Assembled Using HiFi Long Reads, Genomics, Proteomics & Bioinformatics, 2024, qzad009, https://doi.org/10.1093/gpbjnl/qzad009

______________________________________________________

## 6. Re-run BUSCO on polished genome (asm.np2.fa)

    # eudicots
    screen -L -Logfile busco_hap1_np2 -aS hap1_np2 busco -i asm.np2.hap1.fa -m genome -l eudicots_odb10 -c 80 -o BUSCO_sbull_eudicot_hap1_polished

    # embryophyta
    screen -L -Logfile s_log.busco.p_ctg.emb.np2 -aS p_ctg_emb_np2 busco -i asm.np2.p_ctg.fa -m genome -l embryophyta_odb10 -c 34 -o BUSCO_sbull_embryophyta_p_ctg_polished

______________________________________________________

## 7. FINAL ASSESSMENT OF ASSEMBLY

Should I evaluate the assembly with the eudicot or the embryophyta dataset in BUSCO? In Koshimizu et al. 2023 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10558197/), they used and reported both. They reported "99.4% and 99.6% coverage of the core genes of eudicots and embryophyta, respectively." For *S. bullii*, the assembly (polished p_ctg) contains 96.5% and 98.5% coverage of the core genes of eudicots and embryophyta, respectively.
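Because the before and after polishing summaries in this section differ by only tenths of a percent, it can help to compute the change rather than read it off by eye. The sketch below is one hedged way to do that; `busco_complete` and `busco_delta` are hypothetical helper names, and the two input strings are the p_ctg eudicot summaries recorded in these notes.

```shell
#!/bin/sh
# Sketch: compare the "Complete" percentage of two BUSCO one-line summaries.
# busco_complete and busco_delta are hypothetical helper names.

# busco_complete SUMMARY: print the C value, e.g. 96.4 from "C:96.4%[S:...".
busco_complete() {
    printf '%s\n' "$1" | sed -n 's/^C:\([0-9.]*\)%.*/\1/p'
}

# busco_delta BEFORE AFTER: print AFTER minus BEFORE with an explicit sign.
busco_delta() {
    before=$(busco_complete "$1")
    after=$(busco_complete "$2")
    awk -v b="$before" -v a="$after" 'BEGIN { printf "%+.1f\n", a - b }'
}

# p_ctg with eudicots_odb10, before and after NextPolish2.
busco_delta 'C:96.4%[S:71.3%,D:25.1%],F:0.4%,M:3.2%,n:2326' \
            'C:96.5%[S:71.4%,D:25.1%],F:0.4%,M:3.1%,n:2326'
# prints +0.1
```

The same call works for any pair of summary lines, so each assembly and lineage can be checked the same way.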
**Eudicot BUSCO eval:**

* **p_ctg**:
    * **Before polishing**: C:96.4%[S:71.3%,D:25.1%],F:0.4%,M:3.2%,n:2326
        * 2242 Complete BUSCOs (C)
        * 1659 Complete and single-copy BUSCOs (S)
        * 583 Complete and duplicated BUSCOs (D)
        * 10 Fragmented BUSCOs (F)
        * 74 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
    * **After polishing**: C:96.5%[S:71.4%,D:25.1%],F:0.4%,M:3.1%,n:2326
        * 2244 Complete BUSCOs (C)
        * 1661 Complete and single-copy BUSCOs (S)
        * 583 Complete and duplicated BUSCOs (D)
        * 10 Fragmented BUSCOs (F)
        * 72 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
* **hap1**:
    * **Before polishing**: C:94.6%[S:74.5%,D:20.1%],F:0.6%,M:4.8%,n:2326
        * 2201 Complete BUSCOs (C)
        * 1734 Complete and single-copy BUSCOs (S)
        * 467 Complete and duplicated BUSCOs (D)
        * 14 Fragmented BUSCOs (F)
        * 111 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
    * **After polishing (lol @ marginal gains)**: C:94.7%[S:74.6%,D:20.1%],F:0.6%,M:4.7%,n:2326
        * 2202 Complete BUSCOs (C)
        * 1735 Complete and single-copy BUSCOs (S)
        * 467 Complete and duplicated BUSCOs (D)
        * 14 Fragmented BUSCOs (F)
        * 110 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
* **hap2:**
    * **Before polishing**: C:94.9%[S:75.0%,D:19.9%],F:0.5%,M:4.6%,n:2326
        * 2209 Complete BUSCOs (C)
        * 1745 Complete and single-copy BUSCOs (S)
        * 464 Complete and duplicated BUSCOs (D)
        * 11 Fragmented BUSCOs (F)
        * 106 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
    * **After polishing**: C:94.9%[S:75.0%,D:19.9%],F:0.5%,M:4.6%,n:2326
        * 2209 Complete BUSCOs (C)
        * 1745 Complete and single-copy BUSCOs (S)
        * 464 Complete and duplicated BUSCOs (D)
        * 11 Fragmented BUSCOs (F)
        * 106 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
* **p_utg:**
    * **Before polishing**: C:96.8%[S:7.0%,D:89.8%],F:0.3%,M:2.9%,n:2326
        * 2250 Complete BUSCOs (C)
        * 162 Complete and single-copy BUSCOs (S)
        * 2088 Complete and duplicated BUSCOs (D)
        * 8 Fragmented BUSCOs (F)
        * 68 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched
    * **After polishing**:
* bp.p_ctg - primary contig (THIS IS PRIMARY SEQUENCE)
    * **C:96.4%[S:71.3%,D:25.1%],F:0.4%,M:3.2%,n:2326**
        * 2242 Complete BUSCOs (C)
        * 1659 Complete and single-copy BUSCOs (S)
        * 583 Complete and duplicated BUSCOs (D)
        * 10 Fragmented BUSCOs (F)
        * 74 Missing BUSCOs (M)
        * 2326 Total BUSCO groups searched

**Embryophyta BUSCO eval:**

* **p_ctg**:
    * **Before polishing**: C:98.4%[S:74.3%,D:24.1%],F:0.4%,M:1.2%,n:1614
        * 1589 Complete BUSCOs (C)
        * 1200 Complete and single-copy BUSCOs (S)
        * 389 Complete and duplicated BUSCOs (D)
        * 7 Fragmented BUSCOs (F)
        * 18 Missing BUSCOs (M)
        * 1614 Total BUSCO groups searched
    * **After polishing**: C:98.5%[S:74.4%,D:24.1%],F:0.4%,M:1.1%,n:1614
        * 1590 Complete BUSCOs (C)
        * 1201 Complete and single-copy BUSCOs (S)
        * 389 Complete and duplicated BUSCOs (D)
        * 7 Fragmented BUSCOs (F)
        * 17 Missing BUSCOs (M)
        * 1614 Total BUSCO groups searched
        * 1065 Number of scaffolds
        * 1065 Number of contigs
        * 1,337,213,455 Total length
        * 0.000% Percent gaps
        * 9 MB Scaffold N50
        * 9 MB Contigs N50
* **hap1**:
    * **Before polishing**: C:96.7%[S:77.1%,D:19.6%],F:0.5%,M:2.8%,n:1614
        * 1562 Complete BUSCOs (C)
        * 1245 Complete and single-copy BUSCOs (S)
        * 317 Complete and duplicated BUSCOs (D)
        * 8 Fragmented BUSCOs (F)
        * 44 Missing BUSCOs (M)
        * 1614 Total BUSCO groups searched
    * **After polishing**: C:96.8%[S:77.2%,D:19.6%],F:0.5%,M:2.7%,n:1614
        * 1563 Complete BUSCOs (C)
        * 1246 Complete and single-copy BUSCOs (S)
        * 317 Complete and duplicated BUSCOs (D)
        * 8 Fragmented BUSCOs (F)
        * 43 Missing BUSCOs (M)
        * 1614 Total BUSCO groups searched
* **hap2**:
    * **Before polishing**: C:97.1%[S:77.8%,D:19.3%],F:0.5%,M:2.4%,n:1614
        * 1567 Complete BUSCOs (C)
        * 1256 Complete and single-copy BUSCOs (S)
        * 311 Complete and duplicated BUSCOs (D)
        * 8 Fragmented BUSCOs (F)
        * 39 Missing BUSCOs (M)
        * 1614 Total BUSCO groups searched
    * **After polishing**: C:97.1%[S:77.8%,D:19.3%],F:0.5%,M:2.4%,n:1614
        * 1567 Complete BUSCOs (C)
        * 1256 Complete and single-copy BUSCOs (S)
        * 311 Complete and duplicated BUSCOs (D)
        * 8 Fragmented BUSCOs (F)
        * 39 Missing BUSCOs (M)
        * 1614 Total BUSCO groups searched
* **p_utg**:
    * **Before polishing**:
    * **After polishing**:

## 8. Moving data between NU-Quest and CBG with rsync

From Funk to Quest:

    rsync -aivP /home/amd9539/Kittentails/2.Genome/ amd9539@quest.northwestern.edu:/projects/p31922/05_genome/

From Quest to Funk:

    rsync -aivP /projects/p31922/05_genome/scripts amd9539@10.2.0.51:/home/amd9539/Kittentails/2.Genome/
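Before moving large assembly files between servers, the transfer can be previewed without copying anything. The sketch below is illustrative only: `sync_preview` is a hypothetical wrapper that adds rsync's `--dry-run` option to the `-aivP` flags used above.

```shell
#!/bin/sh
# Sketch: preview an rsync transfer before running it for real.
# sync_preview is a hypothetical wrapper around the flags used in these notes.

# sync_preview SRC DEST: list what would be transferred, copy nothing.
sync_preview() {
    rsync -aivP --dry-run "$1" "$2"
}

# Example call (not run here):
#   sync_preview /home/amd9539/Kittentails/2.Genome/ amd9539@quest.northwestern.edu:/projects/p31922/05_genome/
```

The preview also makes the trailing-slash rule visible: a source ending in `/` (as in `2.Genome/`) sends the directory's contents, while a source without it (as in `scripts`) sends the directory itself into the destination.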